Documente Academic
Documente Profesional
Documente Cultură
V. raghavendra
Contents
1 Introduction to Matrices
1.1 Definition of a Matrix . . . . . .
1.1.1 Special Matrices . . . . .
1.2 Operations on Matrices . . . . .
1.2.1 Multiplication of Matrices
1.2.2 Inverse of a Matrix . . . .
1.3 Some More Special Matrices . . .
1.3.1 Submatrix of a Matrix . .
1.4 Summary . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7
7
8
8
10
14
16
16
20
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
21
21
24
25
31
33
38
42
43
47
49
50
51
.
.
.
.
.
.
.
.
.
53
53
57
60
64
66
68
71
78
81
. . . .
. . . .
. . . .
. . . .
. . . .
Space
. . . .
. . . .
. . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
CONTENTS
4 Linear Transformations
4.1 Definitions and Basic Properties
4.2 Matrix of a linear transformation
4.3 Rank-Nullity Theorem . . . . . .
4.4 Similarity of Matrices . . . . . .
4.5 Change of Basis . . . . . . . . . .
4.6 Summary . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
83
83
86
89
93
95
98
.
.
.
.
.
.
.
99
99
99
106
108
114
118
120
.
.
.
.
123
123
129
132
137
7 Differential Equations
7.1 Introduction and Preliminaries . . . . . . . . .
7.2 Separable Equations . . . . . . . . . . . . . . .
7.2.1 Equations Reducible to Separable Form
7.3 Exact Equations . . . . . . . . . . . . . . . . .
7.3.1 Integrating Factors . . . . . . . . . . . .
7.4 Linear Equations . . . . . . . . . . . . . . . . .
7.5 Miscellaneous Remarks . . . . . . . . . . . . . .
7.6 Initial Value Problems . . . . . . . . . . . . . .
7.6.1 Orthogonal Trajectories . . . . . . . . .
7.7 Numerical Methods . . . . . . . . . . . . . . . .
143
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
145
145
148
148
150
152
155
158
159
164
165
.
.
.
.
.
.
.
.
.
167
167
170
170
173
175
176
178
181
185
CONTENTS
II
. . . .
. . . .
. . . .
Point
. . . .
. . . .
. . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Laplace Transform
191
191
194
195
197
197
197
199
205
10 Laplace Transform
10.1 Introduction . . . . . . . . . . . . . . . . . . . . .
10.2 Definitions and Examples . . . . . . . . . . . . .
10.2.1 Examples . . . . . . . . . . . . . . . . . .
10.3 Properties of Laplace Transform . . . . . . . . .
10.3.1 Inverse Transforms of Rational Functions
10.3.2 Transform of Unit Step Function . . . . .
10.4 Some Useful Results . . . . . . . . . . . . . . . .
10.4.1 Limiting Theorems . . . . . . . . . . . . .
10.5 Application to Differential Equations . . . . . . .
10.6 Transform of the Unit-Impulse Function . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
207
207
207
208
210
215
216
217
217
218
221
11 Appendix
11.1 Permutation/Symmetric Groups
11.2 Properties of Determinant . . . .
11.3 Dimension of M + N . . . . . . .
11.4 Condition for Exactness . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
223
223
227
231
232
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
CONTENTS
Chapter 1
Introduction to Matrices
1.1
Definition of a Matrix
a11
a21
A= .
..
am1
a12
a22
..
.
am2
..
.
a1n
a11
a21
a2n
.. or A = ..
.
.
amn
am1
a12
a22
..
.
am2
..
.
a1n
a2n
.. ,
.
amn
where aij is the entry at the intersection of the ith row and j th column. In a more concise
manner, we also write Amn = [aij ] or A = [aij ]mn or A = [aij ]. We shall mostly be concerned
1 3 7
with matrices having real numbers, denoted R, as entries. For example, if A =
then
4 5 6
a11 = 1, a12 = 3, a13 = 7, a21 = 4, a22 = 5, and a23 = 6.
A matrix having only one column is called a column vector; and a matrix with only one
row is called a row vector. Whenever a vector is used, it should be understood from
the context whether it is a row vector or a column vector. Also, all the vectors
will be represented by bold letters.
Definition 1.1.2 (Equality of two Matrices) Two matrices A = [aij ] and B = [bij ] having the
same order m n are equal if aij = bij for each i = 1, 2, . . . , m and j = 1, 2, . . . , n.
In other words, two matrices are said to be equal if they have the same order and their corresponding entries are equal.
Example 1.1.3 The linear system of equations 2x + 3y = 5 and 3x + 2y = 5 can be identified
2 3 : 5
with the matrix
. Note that x and y are indeterminate and we can think of x being
3 2 : 5
associated with the first column and y being associated with the second column.
7
1.1.1
Special Matrices
Definition 1.1.4
For example,
0
0
0
0 0
and 023 =
0
0 0
0
.
0
2. A matrix that has the same number of rows as the number of columns, is called a square
matrix. A square matrix is said to have order n if it is an n n matrix.
3. The entries a11 , a22 , . . . , ann of an nn square matrix A = [aij ] are called the diagonal entries
(the principal diagonal) of A.
4. A square matrix A = [aij ] is said to be a diagonal matrix if aij = 0 for i 6= j. In other words,
the non-zero
entries appear only on the principal diagonal. For example, the zero matrix 0n
4 0
and
are a few diagonal matrices.
0 1
1 0 0
1 0
For example, I2 =
and I3 = 0 1 0 . The subscript n is suppressed in case the
0 1
0 0 1
order is clear from the context or if no confusion arises.
6. A square matrix A = [aij ] is said to be an upper triangular matrix if aij = 0 for i > j.
A square matrix A = [aij ] is said to be a lower triangular matrix if aij = 0 for i < j.
A square matrix A
0 1
For example, 0 3
0 0
is said to be triangular if it is
0 0
4
1 is upper triangular, 1 0
0 1
2
0
0 is lower triangular.
1
Exercise 1.1.5 Are the following matrices upper triangular, lower triangular or both?
1. .
..
..
..
..
.
.
.
0
0 ann
2. The square matrices 0 and I or order n.
3. The matrix diag(1, 1, 0, 1).
1.2
Operations on Matrices
Definition 1.2.1 (Transpose of a Matrix) The transpose of an m n matrix A = [aij ] is defined as the n m matrix B = [bij ], with bij = aji for 1 i m and 1 j n. The transpose of
A is denoted by At .
1 4
That is, if A =
0 1
vector and vice-versa.
1
5
t
then A = 4
2
5
0
1 . Thus, the transpose of a row vector is a column
2
Definition 1.2.3 (Addition of Matrices) let A = [aij ] and B = [bij ] be two m n matrices.
Then the sum A + B is defined to be the matrix C = [cij ] with cij = aij + bij .
Note that, we define the sum of two matrices only when the order of the two matrices are same.
Definition 1.2.4 (Multiplying a Scalar to a Matrix) Let A = [aij ] be an m n matrix. Then
for any element k R, we define kA = [kaij ].
1 4 5
5 20 25
For example, if A =
and k = 5, then 5A =
.
0 1 2
0 5 10
Theorem 1.2.5 Let A, B and C be matrices of order m n, and let k, R. Then
1. A + B = B + A
2. (A + B) + C = A + (B + C)
(commutativity).
(associativity).
3. k(A) = (k)A.
4. (k + )A = kA + A.
Proof. Part 1.
Let A = [aij ] and B = [bij ]. Then
A + B = [aij ] + [bij ] = [aij + bij ] = [bij + aij ] = [bij ] + [aij ] = B + A
as real numbers commute.
The reader is required to prove the other parts as all the results follow from the properties of
real numbers.
Definition 1.2.6 (Additive Inverse) Let A be an m n matrix.
1. Then there exists a matrix B with A + B = 0. This matrix B is called the additive inverse of
A, and is denoted by A = (1)A.
2. Also, for the matrix 0mn , A + 0 = 0 + A = A. Hence, the matrix 0mn is called the additive
identity.
Exercise 1.2.7
10
1 1
2
9. Let A = 2 3 and B =
1
0 1
1.2.1
3 1
. Compute A + B t and B + At .
1 2
Multiplication of Matrices
n
X
k=1
ai1
ai2
ain
and Bnr
.
.. b
1j
..
. b2j
=
.
..
..
.
..
. bmj
..
.
..
.
..
.
..
.
then
AB = [(AB)ij ]mr and (AB)ij = ai1 b1j + ai2 b2j + + ain bnj .
Observe that the product AB is defined if and only if
the number of columns of A = the number
of rows
of B.
a b c
For example, if A =
and B = x y z t then
d e f
u v w s
a + bx + cu a + by + cv
AB =
d + ex + f u d + ey + f v
a + bz + cw
d + ez + f w
a + bt + cs
.
d + et + f s
(1.2.1)
+b x
t +c u
s .
That is, if Rowi (B) denotes the i-th row of B for 1 i 3, then the matrix product AB can be
re-written as
a Row1 (B) + b Row2 (B) + c Row3 (B)
AB =
.
(1.2.2)
d Row1 (B) + e Row2 (B) + f Row3 (B)
11
Similarly, observe that if Colj (A) denotes the j-th column of A for 1 j 3, then the matrix
product AB can be re-written as
AB
(1.2.3)
AB
a1 B
a2 B
an B
1
Example 1.2.10 Let A = 1
0
matrix multiplication to
1
2 0
0 1 and B = 0
0
1 1
0 1
0
1 . Use the row/column method of
1 1
12
Remark 1.2.12 Note that if A is a square matrix of order n and if B is a scalar matrix of order
n then
the matrix product is not commutative. For example, consider
AB = BA. In
general,
1 1
1 0
A=
and B =
. Then check that the matrix product
0 0
1 0
2 0
1
AB =
6
=
0 0
1
1
= BA.
1
Theorem 1.2.13 Suppose that the matrices A, B and C are so chosen that the matrix multiplications are defined.
1. Then (AB)C = A(BC). That is, the matrix multiplication is associative.
2. For any k R, (kA)B = k(AB) = A(kB).
3. Then A(B + C) = AB + AC. That is, multiplication distributes over addition.
4. If A is an n n matrix then AIn = In A = A.
5. For any square matrix A of order n and D = diag(d1 , d2 , . . . , dn ), we have
the first row of DA is d1 times the first row of A;
for 1 i n, the ith row of DA is di times the ith row of A.
A similar statement holds for the columns of A when A is multiplied on the right by D.
Proof. Part 1.
p
X
bk cj and (AB)i =
=1
n
X
aik bk .
k=1
Therefore,
A(BC) ij
=
=
=
n
X
aik BC
k=1
p
n X
X
Part 5.
k=1 =1
p X
n
X
(AB)C
kj
n
X
aik
k=1
aik bk cj =
=1 k=1
p
X
bk cj
=1
p
n X
X
k=1 =1
aik bk cj
t
X
aik bk cj =
AB i cj
=1
ij
n
X
k=1
13
0 1 0
4. Let A = 0 0 1 . Compute A + 3A2 A3 and aA3 + bA + cA2 .
1 0 0
5. Let A and B be two matrices. If the matrix addition A + B is defined, then prove that
(A + B)t = At + B t . Also, if the matrix product AB is defined then prove that (AB)t = B t At .
6. Let A = [a1 , a2 , . . . , an ] and B t = [b1 , b2 , . . . , bn ]. Then check that order of AB is 11, whereas
BA has order n n. Determine the matrix products AB and BA.
7. Let A and B be two matrices such that the matrix product AB is defined.
(a) If the first row of A consists entirely of zeros, prove that the first row of AB also consists
entirely of zeros.
(b) If the first column of B consists entirely of zeros, prove that the first column of AB also
consists entirely of zeros.
(c) If A has two identical rows then the corresponding rows of AB are also identical.
(d) If B has two identical columns then the corresponding columns of AB are also identical.
1 0
1 1 2
8. Let A = 1 2 1 and B = 0 1. Use the row/column method of matrix multipli1 1
0 1
1
cation to compute the
(a) first row of the matrix AB.
(b) third row of the matrix AB.
(c) first column of the matrix AB.
(d) second column of the matrix AB.
(e) first column of B t At .
(f ) third column of B t At .
(g) first row of B t At .
(h) second row of B t At .
9. Let A and B be the matrices given in Exercise 1.2.14.8. Compute A At , (3AB)t 4B t A
and 3A 2At .
10. Let n be a positive integer. Compute An for the following matrices:
1 1 1
1 1 1
1 1
0 1 1 ,
1 1 1 .
,
0 1
0 0 1
1 1 1
Can you guess a formula for An and prove it by induction?
14
(c) The matrix products AB and BA are defined and they have the same order but AB 6= BA.
a
a+b
12. Let A be a 3 3 matrix satisfying A b = b c . Determine the matrix A.
c
0
a
ab
13. Let A be a 2 2 matrix satisfying A
=
. Can you construct the matrix A satisfying
b
a
the above? Why!
1.2.2
Inverse of a Matrix
a
1. Let A =
c
b
.
d
1
1
adbc
d
c
b
.
a
1 2
2. Let A = 2 3
3 4
[b d] for
2
1
,
0
4
3
2 0
1
4 . Then A1 = 0
3 2 .
6
1 2 1
Theorem 1.2.19 Let A and B be two matrices with inverses A1 and B 1 , respectively. Then
1. (A1 )1 = A.
0
0
15
2. (AB)1 = B 1 A1 .
3. (At )1 = (A1 )t .
Proof. Proof of Part 1.
By definition AA1 = A1 A = I. Hence, if we denote A1 by B, then we get AB = BA = I. Thus,
the definition, implies B 1 = A, or equivalently (A1 )1 = A.
Proof of Part 2.
Verify that (AB)(B 1 A1 ) = I = (B 1 A1 )(AB).
Proof of Part 3.
We know AA1 = A1 A = I. Taking transpose, we get
(AA1 )t = (A1 A)t = I t (A1 )t At = At (A1 )t = I.
Hence, by definition (At )1 = (A1 )t .
We will again come back to the study of invertible matrices in Sections 2.2 and 2.5.
Exercise 1.2.20
1. Let A be an invertible matrix and let r be a positive integer. Prove that
(A1 )r = Ar .
cos() sin()
cos() sin()
2. Find the inverse of
and
.
sin()
cos()
sin() cos()
3. Let A1 , A2 , . . . , Ar be invertible matrices. Prove that the product A1 A2 Ar is also an invertible matrix.
4. Let xt = [1, 2, 3] and yt = [2, 1, 4]. Prove that xyt is not invertible even though xt y is
invertible.
5. Let A be an n n invertible matrix. Then prove that
(a) A cannot have a row or column consisting entirely of zeros.
(b) any two rows of A cannot be equal.
(c) any two columns of A cannot be equal.
(d) the third row of A cannot be equal to the sum of the first two rows, whenever n 3.
(e) the third column of A cannot be equal to the first
whenever n 3.
1
1
6. Suppose A is a 2 2 matrix satisfying (I + 3A) =
2
2 0
1
7. Let A be a 3 3 matrix such that (I A)1 = 0
3 2 . Determine the matrix A
1 2 1
[Hint: See Example 1.2.18.2 and Theorem 1.2.19.1].
8. Let A be a square matrix satisfying A3 + A 2I = 0. Prove that A1 = 12 A2 + I .
9. Let A = [aij ] be an invertible matrix and let p be a nonzero real number. Then determine the
inverse of the matrix B = [pij aij ].
16
1.3
Definition 1.3.1
At = A.
1 2
3
0 1 2
Example 1.3.2
1. Let A = 2 4 1 and B = 1 0 3 . Then A is a symmetric
2 3 0
3 1 4
matrix and B is a skew-symmetric matrix.
1
1
1
2. Let A = 12
1
6
3
12
1
6
0
. Then A is an orthogonal matrix.
26
Exercise 1.3.3
1. Let A be a real square matrix. Then S =
1
t
(A
A
)
is
skew-symmetric,
and A = S + T.
2
1
2 (A
+ At ) is symmetric, T =
2. Show that the product of two lower triangular matrices is a lower triangular matrix. A similar
statement holds for upper triangular matrices.
3. Let A and B be symmetric matrices. Show that AB is symmetric if and only if AB = BA.
4. Show that the diagonal entries of a skew-symmetric matrix are zero.
5. Let A, B be skew-symmetric matrices with AB = BA. Is the matrix AB symmetric or skewsymmetric?
6. Let A be a symmetric matrix of order n with A2 = 0. Is it necessarily true that A = 0?
7. Let A be a nilpotent matrix. Prove that there exists a matrix B such that B(I + A) = I =
(I + A)B [ Hint: If Ak = 0 then look at I A + A2 + (1)k1 Ak1 ].
1.3.1
Submatrix of a Matrix
Definition 1.3.4 A matrix obtained by deleting some of the rows and/or columns of a matrix is
said to be a submatrix of the given matrix.
1 4 5
For example, if A =
, a few submatrices of A are
0 1 2
1
1
[1], [2],
, [1 5],
0
0
5
, A.
2
17
1 4
1 4
But the matrices
and
are not submatrices of A. (The reader is advised to give
1 0
0 2
reasons.)
Let A be an n m matrix and B be an m
p matrix. Suppose r < m. Then, we can decompose
H
the matrices A and B as A = [P Q] and B =
; where P has order n r and H has order r p.
K
That is, the matrices P and Q are submatrices of A and P consists of the first r columns of A and
Q consists of the last m r columns of A. Similarly, H and K are submatrices of B and H consists
of the first r rows of B and K consists of the last m r rows of B. We now prove the following
important theorem.
Theorem 1.3.5 Let A = [aij ] = [P Q] and B = [bij ] =
H
be defined as above. Then
K
AB = P H + QK.
Proof. First note that the matrices P H and QK are each of order np. The matrix products P H
and QK are valid as the order of the matrices P, H, Q and K are respectively, nr, rp, n(mr)
and (m r) p. Let P = [Pij ], Q = [Qij ], H = [Hij ], and K = [kij ]. Then, for 1 i n and
1 j p, we have
(AB)ij
=
=
m
X
k=1
r
X
k=1
aik bkj =
r
X
aik bkj +
k=1
m
X
Pik Hkj +
m
X
aik bkj
k=r+1
Qik Kkj
k=r+1
Remark 1.3.6 Theorem 1.3.5 is very useful due to the following reasons:
1. The order of the matrices P, Q, H and K are smaller than that of A or B.
2. It may be possible to block the matrix in such a way that a few blocks are either identity
matrices or zero matrices. In this case, it may be easy to handle the matrix product using the
block form.
3. Or when we want to prove results using induction, then we may assume the result for r r
submatrices and then look for (r + 1) (r + 1) submatrices, etc.
a b
1 2 0
For example, if A =
and B = c d , Then
2 5 0
e f
AB =
1 2
2 5
a
c
b
0
a + 2c
+
[e f ] =
d
0
2a + 5c
0 1 2
If A = 3
1
4 , then A can be decomposed as follows:
2 5 3
b + 2d
.
2b + 5d
18
0 1 2
A= 3
1
4
2 5 3
0 1 2
A= 3
1
4
2 5 3
, or
0 1
A= 3
1
2 5
4 , or
3
and so on.
m m
s s
1 2
1 2
Suppose A = n1
and B = r1
P Q
E F . Then the matrices P, Q, R, S and
n2
R S
r2
G H
E, F, G, H, are called the blocks of the matrices A and B, respectively.
Even if A+B is defined, the orders of P and E may not be same and hence, we
may not be able
P +E Q+F
to add A and B in the block form. But, if A+B and P +E is defined then A+B =
.
R+G S+H
Similarly, if the product AB is defined, the product P E need not be defined. Therefore, we
can talk of matrix product AB as block product
of matrices, if both
the products AB and P E are
P E + QG P F + QH
defined. And in this case, we have AB =
.
RE + SG RF + SH
That is, once a partition of A is fixed, the partition of B has to be properly
chosen for purposes of block addition or multiplication.
Exercise 1.3.7
1/2 0 0
1 0
2. Let A = 0 1 0 , B = 2 1
0 0 1
3 0
(a) the first row of AC,
of Theorems 1.2.5
0
2
0 and C = 2
1
3
and 1.2.13.
2 2 6
1 2 5 . Compute
3 4 10
sin
x1
. If x =
and y =
cos
x2
19
3 2
4 3
(a) Let A =
and B =
. Compute tr(A) and tr(B).
2 2
5 1
(b) Then for two square matrices, A and B of the same order, prove that
i. tr (A + B) = tr (A) + tr (B).
ii. tr (AB) = tr (BA).
(c) Prove that there do not exist matrices A and B such that AB BA = cIn for any c 6= 0.
7. Let A and B be two m n matrices with real entries. Then prove that
(a) Ax = 0 for all n 1 vector x with real entries implies A = 0, the zero matrix.
(b) Ax = Bx for all n 1 vector x with real entries implies A = B.
1 2 2 1
1 0 0 1
1 1 2 1
0 1 1 1
10. Let A =
and B =
. Compute the matrix product AB
1 1 1 1
0 1 1 0
0 1 0 1
1 1 1 1
using the block matrix multiplication.
P Q
11. Let A =
. If P, Q, R and S are symmetric, is the matrix A symmetric? If A is
R S
symmetric, is it necessary that the matrices P, Q, R and S are symmetric?
A11 A12
12. Let A be a 3 3 matrix and let A =
, where A11 is a 2 2 invertible matrix and
A21
c
c is a real number.
20
B11 B12
(a) If p =
is non-zero, prove that B =
is the inverse of A, where
B21 p1
1
1
1
B11 = A1
A21 A1
, B12 = A1
and B21 = p1 A21 A1
12 p
11 + A11 A12 p
11
11 A
11 .
0 1 2
0 1 2
1 4 and 3
1
4 .
(b) Find the inverse of the matrices 1
2 1 1
2 5 3
c A21 A1
11 A12
(A + xyt )1 = A1 A1 xyt A1 .
This formula gives the information about the inverse when an invertible matrix is modified by
a rank one matrix.
15. Let J be an n n matrix having each entry 1.
(a) Prove that J 2 = nJ.
(b) Let 1 , 2 , 1 , 2 R. Prove that there exist 3 , 3 R such that
(1 In + 1 J) (2 In + 2 J) = 3 In + 3 J.
(c) Let , R with 6= 0 and + n 6= 0 and define A = In + J. Prove that A is
invertible.
16. Let A be an upper triangular matrix. If A A = AA then prove that A is a diagonal matrix.
The same holds for lower triangular matrix.
1.4
Summary
In this chapter, we started with the definition of a matrix and came across lots of examples. In
particular, the following examples were important:
1. The zero matrix of size m n, denoted 0mn or 0.
2. The identity matrix of size n n, denoted In or I.
3. Triangular matrices
4. Hermitian/Symmetric matrices
5. Skew-Hermitian/skew-symmetric matrices
6. Unitary/Orthogonal matrices
We also learnt product of two matrices. Even though it seemed complicated, it basically tells
the following:
1. Multiplying by a matrix on the left to a matrix A is same as row operations.
2. Multiplying by a matrix on the right to a matrix A is same as column operations.
Chapter 2
Introduction
ii. b = 0 then the system has infinite number of solutions, namely all x R.
2. Consider a system with 2 equations in 2 unknowns. The equation ax + by = c represents a
line in R2 if either a 6= 0 or b 6= 0. Thus the solution set of the system
a1 x + b 1 y = c1 , a2 x + b 2 y = c2
is given by the points of intersection of the two lines. The different cases are illustrated by
examples (see Figure 1).
(a) Unique Solution
x + 2y = 1 and x + 3y = 1. The unique solution is (x, y)t = (1, 0)t .
Observe that in this case, a1 b2 a2 b1 6= 0.
(b) Infinite Number of Solutions
x + 2y = 1 and 2x + 4y = 2. The solution set is (x, y)t = (1 2y, y)t = (1, 0)t + y(2, 1)t
with y arbitrary as both the equations represent the same line. Observe the following:
i. Here, a1 b2 a2 b1 = 0, a1 c2 a2 c1 = 0 and b1 c2 b2 c1 = 0.
ii. The vector (1, 0)t corresponds to the solution x = 1, y = 0 of the given system
whereas the vector (2, 1)t corresponds to the solution x = 2, y = 1 of the system
x + 2y = 0, 2x + 4y = 0.
(c) No Solution
x+2y = 1 and 2x+4y = 3. The equations represent a pair of parallel lines and hence there
is no point of intersection. Observe that in this case, a1 b2 a2 b1 = 0 but a1 c2 a2 c1 6= 0.
21
22
1
2
No Solution
Pair of Parallel lines
1 and 2
ii. a line passing through (3, 1, 0) with direction ratios (0, 1, 1), and
iii. a line passing through (1, 4, 0) with direction ratios (0, 1, 1).
The readers are advised to supply the proof.
= b1
= b2
..
.
(2.1.1)
= bm
2.1. INTRODUCTION
23
x1
b1
a11 a12 a1n
a21 a22 a2n
x2
b2
A= .
..
.. , x = .. , and b = ..
..
..
.
.
.
.
.
am1 am2 amn
xn
bm
The matrix A is called the coefficient matrix and the block matrix [A b] , is called the
augmented matrix of the linear system (2.1.1).
Remark 2.1.2
1. The first column of the augmented matrix corresponds to the coefficients of
the variable x1 .
2. In general, the j th column of the augmented matrix corresponds to the coefficients of the
variable xj , for j = 1, 2, . . . , n.
3. The (n + 1)th column of the augmented matrix consists of the vector b.
4. The ith row of the augmented matrix represents the ith equation for i = 1, 2, . . . , m.
That is, for i = 1, 2, . . . , m and j = 1, 2, . . . , n, the entry aij of the coefficient matrix A
corresponds to the ith linear equation and the j th variable xj .
Definition 2.1.3 For a system of linear equations Ax = b, the system Ax = 0 is called the
associated homogeneous system.
Definition 2.1.4 (Solution of a Linear System) A solution of Ax = b is a column vector y
with entries y1 , y2 , . . . , yn such that the linear system (2.1.1) is satisfied by substituting yi in place
of xi . The collection of all solutions is called the solution set of the system.
That is, if yt = [y1 , y2 , . . . , yn ] is a solution of the linear system Ax = b then Ay = b holds.
For example, from Example 3.3a, we see that the vector yt = [1, 1, 1] is a solution of the system
1 1
1
Ax = b, where A = 1 4
2 , xt = [x, y, z] and bt = [3, 7, 13].
4 10 1
We now state a theorem about the solution set of a homogeneous system. The readers are
advised to supply the proof.
Theorem 2.1.5 Consider the homogeneous linear system Ax = 0. Then
1. The zero vector, 0 = (0, . . . , 0)t , is always a solution, called the trivial solution.
2. Suppose x1 , x2 are two solutions of Ax = 0. Then k1 x1 + k2 x2 is also a solution of Ax = 0
for any k1 , k2 R.
Remark 2.1.6
24
2.1.1
A Solution Method
0 1 1
Solution: In this case, the augmented matrix is 2 0 3
1 1 1
along the following steps.
= 5, x + y + z = 3.
2
5 and the solution method proceeds
3
=5
=2
=3
2
0
1
0 3
1 1
1 1
1
0
1
0
1
1
5
2 .
3
= 52
=2
=3
3
2
1
1
5
2
2 .
3
1 0
0 1
0 1
= 52
=2
= 12
x + 32 z
y+z
y 12 z
3
2
1
21
5
2
2 .
1
2
1 0
0 1
0 0
= 52
=2
= 23
= 52
=2
=1
3
2
1
23
5
2
2 .
23
2
3 .
1
0
0
0
1
0
3
2
1
1
5
2
2 .
1
The last equation gives z = 1. Using this, the second equation gives y = 1. Finally, the first
equation gives x = 1. Hence the solution set is {(x, y, z)t : (x, y, z) = (1, 1, 1)}, a unique solution.
In Example 2.1.7, observe that certain operations on equations (rows of the augmented matrix)
helped us in getting a system in Item 5, which was easily solvable. We use this idea to define
elementary row operations and equivalence of two linear systems.
2.1. INTRODUCTION
25
Definition 2.1.8 (Elementary Row Operations) Let A be an m n matrix. Then the elementary row operations are defined as follows:
1. Rij : Interchange of the ith and the j th row of A.
2. For c 6= 0, Rk (c): Multiply the k th row of A by c.
3. For c 6= 0, Rij (c): Replace the j th row of A by the j th row of A plus c times the ith row of A.
Definition 2.1.9 (Equivalent Linear Systems) Let [A b] and [C d] be augmented matrices of
two linear systems. Then the two linear systems are said to be equivalent if [C d] can be obtained
from [A b] by application of a finite number of elementary row operations.
Definition 2.1.10 (Row Equivalent Matrices) Two matrices are said to be row-equivalent if
one can be obtained from the other by a finite number of elementary row operations.
Thus, note that linear systems at each step in Example 2.1.7 are equivalent to each other. We
also prove the following result that relates elementary row operations with the solution set of a
linear system.
Lemma 2.1.11 Let Cx = d be the linear system obtained from Ax = b by application of a single
elementary row operation. Then Ax = b and Cx = d have the same solution set.
Proof. We prove the result for the elementary row operation Rjk (c) with c 6= 0. The reader is
advised to prove the result for other elementary operations.
In this case, the systems Ax = b and Cx = d vary only in the k th equation. Let (1 , 2 , . . . , n )
be a solution of the linear system Ax = b. Then substituting for i s in place of xi s in the k th and
j th equations, we get
ak1 1 + ak2 2 + akn n = bk , and aj1 1 + aj2 2 + ajn n = bj .
Therefore,
(ak1 + caj1 )1 + (ak2 + caj2 )2 + + (akn + cajn )n = bk + cbj .
(2.1.2)
(2.1.3)
2.1.2
We first define the Gauss elimination method and give a few examples to understand the method.
26
[A b] = .
..
..
..
..
..
..
.
.
.
.
.
am1
am2
amm
amn
c1n
c2n
..
.
cmn
d1
d2
..
.
dm
bm
c11
0
..
.
0
c12
c22
..
.
0
..
.
c1m
c2m
..
.
cmm
by application of elementary row operations. This elimination process is also called the forward
elimination method.
We have already seen an example before defining the notion of row equivalence. We give two
more examples to illustrate the Gauss elimination method.
Example 2.1.14 Solve the following linear system by Gauss elimination method.
x + y + z = 3, x + 2y + 2z = 5, 3x + 4y + 4z = 11
1 1 1
3
Solution: Let A = 1 2 2 and b = 5 . The Gauss Elimination method starts with the
3 4 4
11
augmented matrix [A b] and proceeds as follows:
1. Replace 2nd equation by 2nd equation minus the 1st equation.
x+y+z
=3
1 1 1
0 1 1
y+z
=2
3x + 4y + 4z = 11
3 4 4
3
2 .
11
1 1 1
x+y+z =3
0 1 1
y+z
=2
0 1 1
y+z
=2
3
2 .
2
1 1 1
x+y+z =3
0 1 1
y+z
=2
0 0 0
3
2 .
0
Thus, the solution set is {(x, y, z)t : (x, y, z) = (1, 2 z, z)} or equivalently {(x, y, z)t : (x, y, z) =
(1, 2, 0) + z(0, 1, 1)}, with z arbitrary. In other words, the system has infinite number of
solutions. Observe that the vector yt = (1, 2, 0) satisfies Ay = b and the vector zt = (0, 1, 1) is
a solution of the homogeneous system Ax = 0.
2.1. INTRODUCTION
27
Example 2.1.15 Solve the following linear system by Gauss elimination method.
x + y + z = 3, x + 2y + 2z = 5, 3x + 4y + 4z = 12
1 1 1
3
Solution: Let A = 1 2 2 and b = 5 . The Gauss Elimination
3 4 4
12
augmented matrix [A b] and proceeds as follows:
1. Replace 2nd equation by 2nd equation minus the 1st equation.
x+y+z
=3
1 1 1
y+z
=2
0 1 1
3x + 4y + 4z = 12
3 4 4
2. Replace 3rd equation by 3rd equation minus 3 times 1st equation.
x+y+z =3
1 1 1
0 1 1
y+z
=2
y+z
=3
0 1 1
3
2 .
12
3
2 .
3
1 1 1 3
x+y+z =3
0 1 1 2 .
y+z
=2
0 0 0 1
0
=1
0x + 0y + 0z = 1.
This can never hold for any value of x, y, z. Hence, the system has no solution.
Remark 2.1.16 Note that to solve a linear system Ax = b, one needs to apply only the row
operations to the augmented matrix [A b].
Definition 2.1.17 (Row Echelon Form of a Matrix) A matrix C is said to be in the row echelon form if
1. the rows consisting entirely of zeros appears after the non-zero rows,
2. the first non-zero entry in a non-zero row is 1. This term is called the leading term or a
leading 1. The column containing this term is called the leading column.
3. In any two successive non-rows, the leading 1 in the lower row occurs farther to the right than
the leading 1 in the higher row.
1
1 0
2
3
0
1
4
2
1
1 0
2
3
1
1 0
2
3
0
1
4
2
and 0
0
0
0 ,
0
0
0
1
4
0
0
0
1
0
0
0
1
1
2
1
4
0
0 0
0
0
0 0
are not in row-echelon form.
28
Definition 2.1.19 (Basic, Free Variables) Let Ax = b be a linear system consisting of m equations in n unknowns. Suppose the application of Gauss elimination method to the augmented matrix
[A b] yields the matrix [C d].
1. Then the variables corresponding to the leading columns (in the first n columns of [C d]) are
called the basic variables.
2. The variables which are not basic are called free variables.
The free variables are called so as they can be assigned arbitrary values. Also, the basic variables
can be written in terms of the free variables and hence the value of basic variables in the solution
set depend on the values of the free variables.
Remark 2.1.20 Observe the following:
1. In Example 2.1.14, the solution set was given by
(x, y, z) = (1, 2 z, z) = (1, 2, 0) + z(0, 1, 1), with z arbitrary.
That is, we had x, y as two basic variables and z as a free variable.
2. Example 2.1.15 didnt have any solution because the row-echelon form of the augmented matrix
had a row of the form [0, 0, 0, 1].
3. Suppose the application of row operations to [A b] yields the matrix [C d] which is in row
echelon form. If [C d] has r non-zero rows then [C d] will consist of r leading terms or r
leading columns. Therefore, the linear system Ax = b will have r basic variables
and n r free variables.
Before proceeding further, we have the following definition.
Definition 2.1.21 (Consistent, Inconsistent) A linear system is called consistent if it admits
a solution and is called inconsistent if it admits no solution.
We are now ready to prove conditions under which the linear system Ax = b is consistent or
inconsistent.
Theorem 2.1.22 Consider the linear system Ax = b, where A is an m n matrix and xt =
(x1 , x2 , . . . , xn ). If one obtains [C d] as the row-echelon form of [A b] with dt = (d1 , d2 , . . . , dm )
then
1. Ax = b is inconsistent (has no solution) if [C
0t = (0, . . . , 0).
2. Ax = b is consistent (has a solution) if [C d] has no row of the form [0t 1]. Furthermore,
(a) if the number of variables equals the number of leading terms then Ax = b has a unique
solution.
(b) if the number of variables is strictly greater than the number of leading terms then Ax = b
has infinite number of solutions.
2.1. INTRODUCTION
29
Proof. Part 1: The linear equation corresponding to the row [0t 1] equals
0x1 + 0x2 + + 0xn = 1.
Obviously, this equation has no solution and hence the system Cx = d has no solution. Thus, by
Theorem 2.1.12, Ax = b has no solution. That is, Ax = b is inconsistent.
Part 2: Suppose [C d] has r non-zero rows. As [C d] is in row echelon form there exist
positive integers 1 i1 < i2 < . . . < ir n such that entries ci for 1 r are leading
terms. This in turn implies that the variables xij , for 1 j r are the basic variables and the
remaining n r variables, say xt1 , xt2 , . . . , xtnr , are free variables. So for each , 1 r, one
P
obtains xi +
ck xk = d (k > i in the summation as [C d] is an upper triangular matrix). Or
k>i
equivalently,
xi = d
r
X
j=+1
cij xij
nr
X
s=1
cts xts
for 1 l r.
r
X
cij dj .
j=2
Thus, by Theorem 2.1.12 the system Ax = b is consistent. In case of Part 2a, there are no free
variables and hence the unique solution is given by
xn = dn , xn1 = dn1 dn , . . . , x1 = d1
n
X
cij dj .
j=2
In case of Part 2b, there is at least one free variable and hence Ax = b has infinite number of
solutions. Thus, the proof of the theorem is complete.
We omit the proof of the next result as it directly follows from Theorem 2.1.22.
Corollary 2.1.23 Consider the homogeneous system Ax = 0. Then
1. Ax = 0 is always consistent as 0 is a solution.
2. If m < n then n m > 0 and there will be at least n m free variables. Thus Ax = 0 has
infinite number of solutions. Or equivalently, Ax = 0 has a non-trivial solution.
We end this subsection with some applications related to geometry.
Example 2.1.24
1. Determine the equation of the line/circle that passes through the points
(1, 4), (0, 1) and (1, 4).
Solution: The general equation of a line/circle in 2-dimensional plane is given by a(x2 +
y 2 ) + bx + cy + d = 0, where a, b, c and d are the unknowns. Since this curve passes through
the given points, we have
a((1)2 + 42 ) + (1)b + 4c + d
= =0
= =0
= = 0.
a((0) + 1 ) + (0)b + 1c + d
a((1) + 4 ) + (1)b + 4c + d
16
3
d, 0, 13
d, d). Hence, taking d = 13, the equation
Solving this system, we get (a, b, c, d) = ( 13
of the required circle is
3(x2 + y 2 ) 16y + 13 = 0.
30
=0
a + 3b + 2c + d =
=0
2a b + 2c + d =
= 0.
Solving this system, we get (a, b, c, d) = ( 43 d, d3 , 23 d, d). Hence, taking d = 3, the equation
of the required plane is 4x y + 2z + 3 = 0.
2 3 4
3. Let A = 0 1 0 .
0 3 4
(a) Find a non-zero xt R3 such that Ax = 2x.
0
leads to the augmented matrix 0
0
xt = (1, 0, 0).
Ax
3
3
4
4 0
0 0 . Check that a non-zero solution is given by
2 0
Solution of Part 3b: Solving for Ay = 4y is same as solving for (A 4I)y = 0. This
2 3 4 0
leads to the augmented matrix 0 5 0 0 . Check that a non-zero solution is given by
0 3 0 0
t
y = (2, 0, 1).
Exercise 2.1.25
1. Determine the equation of the curve y = ax2 + bx + c that passes through
the points (1, 4), (0, 1) and (1, 4).
2. Solve the following linear system.
(a) x + y + z + w = 0, x y + z + w = 0 and x + y + 3z + 3w = 0.
(b) x + 2y = 1, x + y + z = 4 and 3y + 2z = 1.
(c) x + y + z = 3, x + y z = 1 and x + y + 7z = 6.
(d) x + y + z = 3, x + y z = 1 and x + y + 4z = 6.
(e) x + y + z = 3, x + y z = 1, x + y + 4z = 6 and x + y 4z = 1.
3. For what values of c and k, the following systems have i) no solution, ii) a unique solution
and iii) infinite number of solutions.
(a) x + y + z = 3, x + 2y + cz = 4, 2x + 3y + 2cz = k.
(b) x + y + z = 3, x + y + 2cz = 7, x + 2y + 3cz = k.
(c) x + y + 2z = 3, x + 2y + cz = 5, x + 2y + 4z = k.
(d) kx + y + z = 1, x + ky + z = 1, x + y + kz = 1.
(e) x + 2y z = 1, 2x + 3y + kz = 3, x + ky + 3z = 2.
2.1. INTRODUCTION
31
(f ) x 2y = 1, x y + kz = 1, ky + 4z = 6.
4. For what values of a, does the following systems have i) no solution,
and iii) infinite number of solutions.
5. Find the condition(s) on x, y, z so that the system of linear equations given below (in the
unknowns a, b and c) is consistent?
(a) a + 2b 3c = x, 2a + 6b 11c = y, a 2b + 7c = z
(b) a + b + 5c = x, a + 3c = y, 2a b + 4c = z
(c) a + 2b + 3c = x, 2a + 4b + 6c = y, 3a + 6b + 9c = z
6. Let A be an n n matrix. If the system A2 x = 0 has a non trivial solution then show that
Ax = 0 also has a non trivial solution.
7. Prove that we need to have 5 set of distinct points to specify a general conic in 2-dimensional
plane.
8. Let ut = (1, 1, 2) and vt = (1, 2, 3). Find condition on x, y and z such that the system
cut + dvt = (x, y, z) in the unknowns c and d is consistent.
2.1.3
Gauss-Jordan Elimination
The Gauss-Jordan method consists of first applying the Gauss Elimination method to get the rowechelon form of the matrix [A b] and then further applying the row operations as follows. For
example, consider Example 2.1.7. We start with Step 5 and apply row operations once again. But
this time, we start with the 3rd row.
I. Replace 2nd equation by 2nd equation minus the 3rd equation.
x + 32 z = 52
1 0 23
0 1 0
y
=2
z
=1
0 0 1
II. Replace 1st equation by 1st equation minus
x =1
y =1
z =1
3
2
5
2
1 .
1
1
0
0
0 0
1 0
0 1
1
1 .
1
III. Thus, the solution set equals {(x, y, z)t : (x, y, z) = (1, 1, 1)}.
Definition 2.1.26 (Row-Reduced Echelon Form) A matrix C is said to be in the row-reduced
echelon form or reduced row echelon form if
1. C is already in the row echelon form;
2. the leading column containing the leading 1 has every other entry zero.
A matrix which is in the row-reduced echelon form is also called a row-reduced echelon matrix.
32
0
1
0
0
0
B are in row echelon form. If C and D are
0
1
0
2
then C = 0
1
1 and D =
0
0
0
0
0
1
1 0
2
2
1
0 0
1 and B = 0
0
0
0 0
0
the row-reduced echelon
forms
of A
1
1 0
0
0
0
0 0
1
0
.
0
0 0
0
1
4
. Then A and
1
and B, respectively
Definition 2.1.28 (Back Substitution/Gauss-Jordan Method) The procedure to get The rowreduced echelon matrix from the row-echelon matrix is called the back substitution. The elimination process applied to obtain the row-reduced echelon form of the augmented matrix is called the
Gauss-Jordan elimination method.
That is, the Gauss-Jordan elimination method consists of both the forward elimination and the
backward substitution.
Remark 2.1.29 Note that the row reduction involves only row operations and proceeds from left
to right. Hence, if A is a matrix consisting of first s columns of a matrix C, then the row-reduced
form of A will consist of the first s columns of the row-reduced form of C.
The proof of the following theorem is beyond the scope of this book and is omitted.
Theorem 2.1.30 The row-reduced echelon form of a matrix is unique.
Remark 2.1.31 Consider the linear system Ax = b. Then Theorem 2.1.30 implies the following:
1. The application of the Gauss Elimination method to the augmented matrix may yield different
matrices even though it leads to the same solution set.
2. The application of the Gauss-Jordan method to the augmented matrix yields the same matrix and also the same solution set even though we may have used different sequence of row
operations.
Example 2.1.32 Consider Ax = b, where A is a 3 3 matrix. Let [C d] be the row-reduced
echelon form of [A b]. Also, assume that the first column of A has a non-zero entry. Then the
possible choices for the matrix [C d] with respective solution sets are given below:
1 0 0 d1
1. 0 1 0 d2 . Ax = b has a unique solution, (x, y, z) = (d1 , d2 , d3 ).
0 0 1 d3
1 0 d1
1
2. 0 1 d2 , 0 0
0 0
0 0 0 1
choice of , .
1 0 d1
1
3. 0 1 d2 , 0 0
0 0 0 0
0 0
lutions for every choice
0 d1
1
1 d2 or 0 0
0 1
0 0
0
1
0
of
d1
1
d2 , 0
0
0
, .
0 0
0 0
0
0
d1
1 . Ax = b has no solution for any
0
d1
0 . Ax = b has Infinite number of so0
33
Exercise 2.1.33
1. Let Ax = b be a linear system in 2 unknowns. What are the possible choices
for the row-reduced echelon form of the augmented matrix [A b]?
2. Find the row-reduced echelon form of the following matrices:
1 1
0 0 1
0 1 1 3
0 1 1
3
3
1 0 3 , 0 0 1 3 , 2 0 3 ,
1
1
3 0 7
1 1 0 0
5 1 0
1 1
2 3
3 3
.
2
2
2 2
3. Find all the solutions of the following system of equations using Gauss-Jordan method. No
other method will be accepted.
x
+y
z
2.2
2u
+ u
+ v
+2v
v
v
+ w
+2w
=
=
=
=
2
3
3
5
Elementary Matrices
In the previous section, we solved a system of linear equations with the help of either the Gauss
Elimination method or the Gauss-Jordan method. These methods required us to make row operations on the augmented matrix. Also, we know that (see Section 1.2.1 ) the row-operations
correspond to multiplying a matrix on the left. So, in this section, we try to understand the matrices which helped us in performing the row-operations and also use this understanding to get some
important results in the theory of square matrices.
Definition 2.2.1 A square matrix E of order n is called an elementary matrix if it is obtained
by applying exactly one row operation to the identity matrix, In .
Remark 2.2.2 Fix a positive integer n. Then the elementary matrices of order n are of three types
and are as follows:
1. E corresponds to the interchange of the ith and the j th row of I .
ij
3. For c 6= 0, Eij (c) is obtained by replacing the j th row of In by the j th row of In plus c times
the ith row of In .
Example 2.2.3
1 0 0
c 0
E23 = 0 0 1 , E1 (c) = 0 1
0 1 0
0 0
1 2 3 0
1 2
2. Let A = 2 0 3 4 and B = 3 4
3 4 5 6
2 0
nd
rd
change of 2 and 3 row. Verify that
1 0 0 1
E23 A = 0 0 1 2
0 1 0 3
0
1
has
0
1
0
0
c .
1
3
5
3
0
6 . Then B is obtained from A by the inter4
2
0
4
3 0
1
3 4 = 3
5 6
2
2 3
4 5
0 3
0
6 = B.
4
34
0 1 1
3. Let A = 2 0 3
1 1 1
readers are advised
2
1 0
5 . Then B = 0 1
3
0 0
to verify that
0 1
0 1 is the row-reduced echelon form of A. The
1 1
B = E32 (1) E21 (1) E3 (1/3) E23 (2) E23 E12 (2) E13 A.
Or equivalently, check that
E13 A =
E23 A2
E3 (1/3)A4
E32 (1)A6
1 1 1 3
1 1 1 3
A1 = 2 0 3 5 , E12 (2)A1 = A2 = 0 2 1 1 ,
0 1 1 2
0 1 1 2
1 1 1 3
1 1 1 3
A3 = 0 1 1 2 , E23 (2)A3 = A4 = 0 1 1 2 ,
0 2 1 1
0 0 3 3
1 1 1 3
1 0 0 1
A5 = 0 1 1 2 , E21 (1)A5 = A6 = 0 1 1 2 ,
0 0 1 1
0 0 1 1
1 0 0 1
B = 0 1 0 1 .
0 0 1 1
35
0 x1 + 1 x2 + 0 x3 + + 0 xn
0 x1 + 0 x2 + 1 x3 + + 0 xn =
..
. =
0
..
.
0 x1 + 0 x2 + 0 x3 + + 1 xn
0.
That is, the final system of homogeneous system is given by In x = 0. Or equivalently, the rowreduced echelon form of the augmented matrix [A 0] is [In 0]. That is, the row-reduced echelon
form of A is In .
3 = 4
Suppose that the row-reduced echelon form of A is In . Then using Remark 2.2.4.4, there exist
elementary matrices E1 , E2 , . . . , Ek such that
E1 E2 Ek A = In .
(2.2.4)
Now, using Remark 2.2.4, the matrix Ej1 is an elementary matrix and is the inverse of Ej for
1 j k. Therefore, successively multiplying Equation (2.2.4) on the left by E11 , E21 , . . . , Ek1 ,
we get
1
A = Ek1 Ek1
E21 E11
and thus A is a product of elementary matrices.
4 = 1
Suppose A = E1 E2 Ek ; where the Ei s are elementary matrices. As the elementary matrices
are invertible (see Remark 2.2.4) and the product of invertible matrices is also invertible, we get
the required result.
As an immediate consequence of Theorem 2.2.5, we have the following important result.
Theorem 2.2.6 Let A be a square matrix of order n.
1. Suppose there exists a matrix C such that CA = In . Then A1 exists.
2. Suppose there exists a matrix B such that AB = In . Then A1 exists.
Proof. Suppose there exists a matrix C such that CA = In . Let x0 be a solution of the
homogeneous system Ax = 0. Then Ax0 = 0 and
x0 = In x0 = (CA)x0 = C(Ax0 ) = C0 = 0.
That is, the homogeneous system Ax = 0 has only the trivial solution. Hence, using Theorem 2.2.5,
the matrix A is invertible.
36
0 0 1
Example 2.2.8 Find the inverse of the matrix 0 1 1 using the Gauss-Jordan method.
1 1 1
0 0 1 1 0 0
Solution: let us apply the Gauss-Jordan method to the matrix 0 1 1 0 1 0 .
1 1 1 0 0 1
0 0 1 1 0 0
1 1 1 0 0 1
1. 0 1 1 0 1 0 R13 0 1 1 0 1 0
1 1 1 0 0 1
0 0 1 1 0 0
1 1 1 0 0 1 1 1 0 1 0 1
R
(1)
31
2. 0 1 1 0 1 0 0 1 0 1 1 0
0 0 1 1 0 0 R32 (1) 0 0 1 1 0 0
1 1 0 1 0 1
1 0 0 0 1 1
3. 0 1 0 1 1 0 R21 (1) 0 1 0 1 1 0 .
0 0 1 1 0 0
0 0 1 1
0 0
0 1 1
Thus, the inverse of the given matrix is 1 1 0 .
1
0 0
Exercise 2.2.9
1. Find the
1 2 3
1
(i) 1 3 2 , (ii) 2
2 4 7
2
3 3
2 1 1
0 0 2
3 2 , (iii) 1 2 1 , (iv) 0 2 1.
4 7
1
2 0 1
0 0
1 1 0
1 0
2
0 1 0 , 0 1 0 , 0 1 0 , 5 1
0 0 1
0 0 1
0 0 1
0 0
2 1
0
0 0
0 , 0 1
1
1 0
1
0 0
0 , 1 0
0
0 1
1
0 .
0
2 1
3. Let A =
.
1 2
I2 .
1 1
4. Let B = 0 1
0 0
I3 .
37
1
1 . Determine elementary matrices E1 , E2 and E3 such that E3 E2 E1 B =
3
We now state another important result whose proof is immediate from Theorem 2.2.10 and
Theorem 2.2.5 and hence the proof is omitted.
38
Theorem 2.2.11 Let A be an n n matrix. Then the two statements given below cannot hold
together.
1. The system Ax = b has a unique solution for every b.
2. The system Ax = 0 has a non-trivial solution.
Exercise 2.2.12
1. Let A and B be two square matrices of the same order such that B = P A.
Prove that A is invertible if and only if B is invertible.
2. Let A and B be two m n matrices. Then prove that the two matrices A, B are row-equivalent
if and only if B = P A, where P is product of elementary matrices. When is this P unique?
3. Let bt = [1, 2, 1, 2]. Suppose A is a 4 4 matrix such that the linear system Ax = b has
no solution. Mark each of the statements given below as true or false?
(a) The homogeneous system Ax = 0 has only the trivial solution.
(b) The matrix A is invertible.
(c) Let ct = [1, 2, 1, 2]. Then the system Ax = c has no solution.
2.3
the
the
the
the
the
Rank of a Matrix
In the previous section, we gave a few equivalent conditions for a square matrix to be invertible.
We also used the Gauss-Jordan method and the elementary matrices to compute the inverse of a
square matrix A. In this section and the subsequent sections, we will mostly be concerned with
m n matrices.
Let A by an m n matrix. Suppose that C is the row-reduced echelon form of A. Then the
matrix C is unique (see Theorem 2.1.30). Hence, we use the matrix C to define the rank of the
matrix A.
Definition 2.3.1 (Row Rank of a Matrix) Let C be the row-reduced echelon form of a matrix
A. The number of non-zero rows in C is called the row-rank of A.
For a matrix A, we write row-rank (A) to denote the row-rank of A. By the very definition, it is
clear that row-equivalent matrices have the same row-rank. Thus, the number of non-zero rows in
either the row echelon form or the row-reduced echelon form of a matrix are equal. Therefore, we
just need to get the row echelon form of the matrix to know its rank.
1 2 1 1
Example 2.3.2
1. Determine the row-rank of A = 2 3 1 2 .
1 1 2 1
Solution: The row-reduced echelon form of A is obtained as follows:
1 2 1 1
1 2
1 1
1 2 1 1
1
2 3 1 2 0 1 1 0 0 1 1 0 0
1 1 2 1
0 1 1 0
0 0 2 0
0
0
1
0
0 1
0 0 .
1 0
39
1 2
2. Determine the row-rank of A = 2 3
1 1
Solution: row-rank(A) = 2 as one has
1 2
2 3
1 1
1 1
1 2
0 1
1
2
1
0
0
1 1
2 2 .
1 1
following:
2
1 1 1
1 2
1 1 0 0 0 1
1 1 0 0
0 0
1 1
1 0
0 0
1
0 .
0
The following remark related to the augmented matrix is immediate as computing the rank only
involves the row operations (also see Remark 2.1.29).
Remark 2.3.3 Let Ax = b be a linear system with m equations in n unknowns. Then the rowreduced echelon form of A agrees with the first n columns of [A b], and hence
row-rank(A) row-rank([A b]).
Now, consider an m n matrix A and an elementary matrix E of order n. Then the product AE
corresponds to applying column transformation on the matrix A. Therefore, for each elementary
matrix, there is a corresponding column transformation as well. We summarize these ideas as
follows.
Definition 2.3.4 The column transformations obtained by right multiplication of elementary matrices are called column operations.
1 2 3 1
Example 2.3.5 Let A = 2 0 3 2. Then
3 4 5 3
1
0
A
0
0
0
0
1
0
0
1
0
0
0
1 3
0
=
2 3
0
3 5
1
2 1
0 2
4 3
1
0
and A
0
0
0
1
0
0
0 1
1 2
0 0
=
2 0
1 0
3 4
0 1
3 0
3 0 .
5 0
Remark 2.3.6 After application of a finite number of elementary column operations (see Definition 2.3.4) to a matrix A, we can obtain a matrix B having the following properties:
1. The first nonzero entry in each column is 1, called the leading term.
2. Column(s) containing only 0s comes after all columns with at least one non-zero entry.
3. The first non-zero entry (the leading term) in each non-zero column moves down in successive
columns.
We define column-rank of A as the number of non-zero columns in B.
It will be proved later that row-rank(A) = column-rank(A). Thus we are led to the following
definition.
Definition 2.3.7 The number of non-zero rows in the row-reduced echelon form of a matrix A is
called the rank of A, denoted rank(A).
40
Theorem 2.3.8 Let A be a matrix of rank r. Then there exist a finite number of elementary
matrices E1 , E2 , . . . , Es and F1 , F2 , . . . , F such that
Ir 0
E1 E2 . . . Es A F1 F2 . . . F =
.
0 0
Proof. Let C be the row-reduced echelon matrix of A. As rank(A) = r, the first r rows of C are
non-zero rows. So by Theorem 2.1.22, C will have r leading columns, say i1 , i2 , . . . , ir . Note that,
th row and zero, elsewhere.
for 1 s r, the ith
s column will have 1 in the s
We now apply column operations to the matrix C. Let D be the matrix obtained from C by
successively interchanging the sth and ith
column of C for 1 s r. Then D has the form
s
Ir B
, where B is a matrix of an appropriate size. As the (1, 1) block of D is an identity matrix,
0 0
the block (1, 2) can be made the zero matrix by application of column operations to D. This gives
the required result.
The next result is a corollary of Theorem 2.3.8. It gives the solution set of a homogeneous
system Ax = 0. One can also obtain this result as a particular case of Corollary 2.1.23.2 as by
definition rank(A) m, the number of rows of A.
Corollary 2.3.9 Let A be an m n matrix. Suppose rank(A) = r < n. Then Ax = 0 has infinite
number of solutions. In particular, Ax = 0 has a non-trivial solution.
Proof. By Theorem 2.3.8, there
exist
elementary matrices E1 , . . . , Es and F1 , . . . , F such that
Ir 0
E1 E2 Es A F1 F2 F =
. Define P = E1 E2 Es and Q = F1 F2 F . Then the
0 0
Ir 0
matrix P AQ =
. As Ei s for 1 i s correspond only to row operations, we get AQ =
0 0
C 0 , where C is a matrix of size m r. Let Q1 , Q2 , . . . , Qn be the columns of the matrix
Q. Then check that AQi = 0 for i = r + 1, . . . , n. Hence, the required results follows (use
Theorem 2.1.5).
Exercise 2.3.10
1. Determine ranks of the coefficient and the augmented matrices that appear
in Exercise 2.1.25.2.
2. Let P and Q be invertible matrices such that the matrix product P AQ is defined. Prove that
rank(P AQ) = rank(A).
2 4 8
1 0 0
3. Let A =
and B =
. Find P and Q such that B = P AQ.
1 3 2
0 1 0
4. Let A and B be two matrices. Prove that
(a) if A + B is defined, then rank(A + B) rank(A) + rank(B),
41
6. Let A be an m n matrix of rank r. Then prove that A can be written as A = BC, where both
B and C have rank r and B is of size m r and C is of size r n.
7. Let A and B be two matrices such that AB is defined and rank(A) = rank(AB). Then prove
that A = ABX for some matrix X. Similarly, if BA is defined and rank (A) = rank (BA),
then
A #= Y BA for some matrix Y." [Hint: #Choose invertible matrices P, Q satisfying P AQ =
"
A1
0
"
C
0
0
A2 A3
, P (AB) = (P AQ)(Q1 B) =
. Now find R an invertible matrix with P (AB)R =
0
0
0
#
"
#
0
C 1 A1 0
. Define X = R
Q1 .]
0
0
0
8. Suppose the matrices B and C are invertible and the involved partitioned products are defined,
then prove that
1
A B
0
C 1
=
.
C 0
B 1 B 1 AC 1
A11 A12
B11 B12
and B =
. Also, assume that A11 is
A21 A22
B21 B22
invertible and define P = A22 A21 A1
11 A12 . Then prove that
A11
A12
(a) A is row-equivalent to the matrix
,
0
A22 A21 A1
11 A12
1
1
1
A11 + (A1
(A21 A1
(A1
11 A12 )P
11 )
11 A12 )P
.
(b) P is invertible and B =
P 1 (A21 A1
P 1
11 )
9. Suppose A1 = B with A =
We end this section by giving another equivalent condition for a square matrix to be invertible.
To do so, we need the following definition.
Definition 2.3.11 A n n matrix A is said to be of full rank if rank(A) = n.
Theorem 2.3.12 Let A be a square matrix of order n. Then the following statements are equivalent.
1. A is invertible.
2. A has full rank.
3. The row-reduced form of A is In .
Proof. 1 = 2
Let if possible rank(A) = r < n. Then there exists an invertible matrix P (a product of
B1 B2
elementary matrices) such that P A =
, where B1 is an r r matrix. Since A is invertible,
0
0
C1
let A1 =
, where C1 is an r n matrix. Then
C2
1
P = P In = P (AA
) = (P A)A
B1
=
0
B2
0
C1
B1 C1 + B2 C2
=
.
C2
0
Thus, P has n r rows consisting of only zeros. Hence, P cannot be invertible. A contradiction.
Thus, A is of full rank.
2 = 3
42
Suppose A is of full rank. This implies, the row-reduced echelon form of A has all non-zero
rows. But A has as many columns as rows and therefore, the last row of the row-reduced echelon
form of A is [0, 0, . . . , 0, 1]. Hence, the row-reduced echelon form of A is In .
3 = 1
Using Theorem 2.2.5.3, the required result follows.
2.4
Existence of Solution of Ax = b
In Section 2.2, we studied the system of linear equations in which the matrix A was a square matrix.
We will now use the rank of a matrix to study the system of linear equations even when A is not
a square matrix. Before proceeding with our main result, we give an example for motivation and
observations. Based on these observations, we will arrive at a better understanding, related to the
existence and uniqueness results for the linear system Ax = b.
Consider a linear system Ax = b. Suppose the application of the Gauss-Jordan method has
reduced the augmented matrix [A b] to
1
0
2 1
0
0
2 8
0
1
1 3
0
0
5 1
0
0
0
1
0
1
2
[C d] =
.
0
1
1 4
0
0 0
0
0
0
0 0
0
0
0 0
0
0
0 0
0
0
0 0
Then to get the solution set, we observe the following.
Observations:
1. The number of non-zero rows in C is 4. This number is also equal to the number of non-zero
rows in [C d]. So, there are 4 leading columns/basic variables.
2. The leading terms appear in columns 1, 2, 5 and 6. Thus, the respective variables x1 , x2 , x5
and x6 are the basic variables.
3. The remaining variables, x3 , x4 and x7 are free variables.
Hence, the solution set is given by
x1
8 2x3 + x4 2x7
8
2
1
2
x2
1 x3 3x4 5x7 1
1
3
5
0
x
1
0
0
x3
3
x4
= 0 + x3 0 + x4 1 + x7 0 ,
x4 =
0
x5
0
1
2 + x7
x6
4
0
0
1
4 x7
x7
x7
0
0
0
1
where x3 ,x4 and x7 are
arbitrary.
2
2
1
8
5
3
1
1
0
1
0
0
Let x0 = 0 , u1 = 0 , u2 = 1 and u3 = 0 .
1
0
0
2
1
0
4
0
0
1
0
0
2.5. DETERMINANT
43
Then it can easily be verified that Cx0 = d, and for 1 i 3, Cui = 0. Hence, it follows that
Ax0 = d, and for 1 i 3, Aui = 0.
A similar idea is used in the proof of the next theorem and is omitted. The proof appears on
page 76 as Theorem 3.3.26.
Theorem 2.4.1 (Existence/Non-Existence Result) Consider a linear system Ax = b, where
A is an m n matrix, and x, b are vectors of orders n 1, and m 1, respectively. Suppose
rank (A) = r and rank([A b]) = ra . Then exactly one of the following statement holds:
1. If r < ra , the linear system has no solution.
2. if ra = r, then the linear system is consistent. Furthermore,
(a) if r = n then the solution set contains a unique vector x0 satisfying Ax0 = b.
(b) if r < n then the solution set has the form
{x0 + k1 u1 + k2 u2 + + knr unr : ki R, 1 i n r},
where Ax0 = b and Aui = 0 for 1 i n r.
Remark 2.4.2 Let A be an m n matrix. Then Theorem 2.4.1 implies that
1. the linear system Ax = b is consistent if and only if rank(A) = rank([A b]).
2. the vectors ui , for 1 i n r, correspond to each of the free variables.
Exercise 2.4.3 In the introduction, we gave 3 figures (see Figure 2) to show the cases that arise
in the Euclidean plane (2 equations in 2 unknowns). It is well known that in the case of Euclidean
space (3 equations in 3 unknowns), there
1. is a figure to indicate the system has a unique solution.
2. are 4 distinct figures to indicate the system has no solution.
3. are 3 distinct figures to indicate the system has infinite number of solutions.
Determine all the figures.
2.5
Determinant
In this section, we associate a number with each square matrix. To do so, we start with the following
notation. Let A be an n n matrix.
Then for each positive integers i s 1 i k and j s for
1 j , we write A(1 , . . . , k 1 , . . . , ) to mean that submatrix of A, that is obtained by
deleting the rows corresponding to i s and the columns corresponding to j s of A.
1 2
3
1 2
1
, A(1|3) =
2 . Then A(1|2) =
2 7
2
7
3
and A(1, 2|1, 3) =
4
With the notations as above, we have the following inductive definition of determinant of a
matrix. This definition is commonly known as the expansion of the determinant along the first
row. The students with a knowledge of symmetric groups/permutations can find the definition of
the determinant in Appendix 11.1.15. It is also proved in Appendix that the definition given below
does correspond to the expansion of determinant along the first row.
44
Definition 2.5.2 (Determinant of a Square Matrix) Let A be a square matrix of order n. The
determinant of A, denoted det(A) (or |A|) is defined by
if A = [a] (n = 1),
a,
n
P
det(A) =
(1)1+j a1j det A(1|j) ,
otherwise.
j=1
Example 2.5.3
1. Let A = [2]. Then det(A) = |A| = 2.
a b
2. Let A =
. Then, det(A) = |A| = a A(1|1) b A(1|2) = ad bc. For example, if
c d
1 2
1 2
= 1 5 2 3 = 1.
A=
then det(A) =
3 5
3 5
= a11 (a22 a33 a23 a32 ) a12 (a21 a33 a31 a23 )
1 2
Let A = 2 3
1 2
Exercise
1 2
0 4
i)
0 0
0 0
3
3 1
2 2 1 + 3 2 3 = 4 2(3) + 3(1) = 1.
1. Then |A| = 1
2 2
1 2
1 2
2
7 8
3 0 0 1
1 a a2
3 2
0 2 0 5
2
, ii)
6 7 1 0 , iii) 1 b b .
2 3
2
1 c c
0 5
3 2 0 6
2.5. DETERMINANT
45
Since det(In ) = 1, where In is the n n identity matrix, the following remark gives the determinant of the elementary matrices. The proof is omitted as it is a direct application of Theorem 2.5.6.
n
X
j=1
(1)k+j akj det A(k|j) .
2 2 6
Example 2.5.9
1. Let A = 1 3 2 . Determine det(A).
1 1 2
2 2 6
1 1 3 R21 (1)
Solution: Check that 1 3 2 R1 (2) 1 3 2
1 1 2 R31 (1)
1 1 2
rem 2.5.6, det(A) = 2 1 2 (1) = 4.
1
0
0
1
2
0
3
1 . Thus, using Theo1
2 2 6 8
1 1 2 4
2. Let A =
1 3 2 6 . Determine det(A).
3 3 5 8
Solution: The successive application of row operations R1 (2), R21 (1), R31 (1), R41 (3), R23
and R34 (4) and the application of Theorem 2.5.6 implies
1
0
det(A) = 2 (1)
0
0
1
2
0
0
3
4
1 2
= 16.
1 0
0 4
Observe that the row operation R1 (2) gives 2 as the first product and the row operation R23
gives 1 as the second product.
Remark 2.5.10
1. Let ut = (u1 , u2 ) and vt = (v1 , v2 ) be two vectors in R2 . Consider the
parallelogram on vertices P = (0, 0)t , Q = u, R = u + v and
S = v (see Figure 3). Then
u1 u2
.
Area (P QRS) = |u1 v2 u2 v1 |, the absolute value of
v1 v2
46
uv
T
w
R
S v
Qu
uv
(u)(v)
2
p
p
(u)2 + (v)2 (u v)2 = (u1 v2 u2 v1 )2
= |u1 v2 u2 v1 |.
The vector u v is perpendicular to the plane containing both u and v. Note that if u3 =
v3 = 0, then we can think of u and v as vectors in the XY -plane and in this case (u v) =
|u1 v2 u2 v1 | = Area(P QRS). Hence, if is the angle between the vector w and the vector
u v, then
w1
volume (P ) = Area(P QRS) height = |w (u v)| = u1
v
1
w2
u2
v2
w3
u3 .
v3
In general, for any n n matrix A, it can be proved that | det(A)| is indeed equal to the volume
of the n-dimensional parallelepiped. The actual proof is beyond the scope of this book.
Exercise 2.5.11 In each of the questions given below, use Theorem 2.5.6 to arrive at your answer.
a b c
a b a + b + c
a b c
1. Let A = e f g , B = e f g and C = e f e + f + g for some complex
h j h + j +
h j
h j
numbers and . Prove that det(B) = det(A) and det(C) = det(A).
1 3
2. Let A = 2 3
1 5
1 1 0
2
1 and B = 1 0 1. Prove that 3 divides det(A) and det(B) = 0.
0 1 1
3
2.5. DETERMINANT
2.5.1
47
Adjoint of a Matrix
Definition 2.5.12 (Minor, Cofactor of a Matrix) The number det (A(i|j)) is called the (i, j)th
minor of A. We write Aij = det (A(i|j)) . The (i, j)th cofactor of A, denoted Cij , is the number
(1)i+j Aij .
Definition 2.5.13 (Adjoint of a Matrix) Let A be an n n matrix. The matrix B = [bij ] with
bij = Cji , for 1 i, j n is called the Adjoint of A, denoted Adj(A).
3
4
2 7
1 . Then Adj(A) = 3 1 5 as
2
1
0 1
1 2
Example 2.5.14 Let A = 2 3
1 2
n
P
n
P
n
P
aij Cij =
j=1
j=1
aij Cj =
j=1
n
P
j=1
1
Adj(A).
det(A)
(2.5.2)
Proof. Part 1: It directly follows from Remark 2.5.8 and the definition of the cofactor.
Part 2: Fix positive integers i, with 1 i 6= n. And let B = [bij ] be a square matrix
whose th row equals the ith row of A and the remaining rows of B are the same as that of A.
Then by construction, the ith and th rows of B are equal. Thus, by Theorem 2.5.6.5, det(B) =
0. As A(|j) = B(|j) for 1 j n, using Remark 2.5.8, we have
0 = det(B)
=
=
n
n
X
X
(1)+j bj det B(|j) =
(1)+j aij det B(|j)
j=1
n
X
j=1
(1)+j aij det A(|j) =
j=1
n
X
aij Cj .
(2.5.3)
j=1
A Adj(A)
=
ij
n
X
k=1
n
X
aik Adj(A) kj =
aik Cjk =
k=1
0,
det(A),
if i 6= j,
if i = j.
1
det(A) Adj(A)
= In . Hence, by
48
1 1 0
Example 2.5.16 For A = 0 1 1 ,
1 2 1
1/2 1/2
Theorem 2.5.15.3, A1 = 1/2 1/2
1/2
3/2
1 1
Adj(A) = 1
1
1 3
1/2
1/2 .
1/2
1
1 and det(A) = 2. Thus, by
1
The next corollary is a direct consequence of Theorem 2.5.15.3 and hence the proof is omitted.
if j = k,
if j 6= k.
The next result gives another equivalent condition for a square matrix to be invertible.
Theorem 2.5.18 A square matrix A is non-singular if and only if A is invertible.
1
Proof. Let A be non-singular. Then det(A) 6= 0 and hence A1 = det(A)
Adj(A).
Now, let us assume that A is invertible. Then, using Theorem 2.2.5, A = E1 E2 Ek , a product
of elementary matrices. Also, by Remark 2.5.7, det(Ei ) 6= 0 for each i, 1 i k. Thus, by a
repeated application of the first three parts of Theorem 2.5.6 gives det(A) 6= 0. Hence, the required
result follows.
We are now ready to prove a very important result that related the determinant of product of
two matrices with their determinants.
Theorem 2.5.19 Let A and B be square matrices of order n. Then
det(AB) = det(A) det(B) = det(BA).
Proof. Step 1. Let A be non-singular. Then by Theorem 2.5.15.3, A is invertible. Hence, using
Theorem 2.2.5, A = E1 E2 Ek , a product of elementary matrices. Then a repeated application
of the first three parts of Theorem 2.5.6 gives
det(AB)
=
=
=
=
=
Thus, if A is non-singular then det(AB) = det(A) det(B). This will be used in the second step.
Step 2. Let A be singular. Then using Theorem 2.5.18
A is not invertible. Hence, there exists
C1
an invertible matrix P such that P A = C, where C =
. So, A = P 1 C and therefore
0
det(AB)
=
=
=
C1 B
det((P 1 C)B) = det(P 1 (CB)) = det P 1
0
C1 B
det(P 1 ) det
as P 1 is non-singular
0
det(P ) 0 = 0 = 0 det(B) = det(A) det(B).
2.5. DETERMINANT
49
The next result relates the determinant of a matrix with the determinant of its transpose. As
an application of this result, determinant can be computed by expanding along any column as well.
Theorem 2.5.20 Let A be a square matrix. Then det(A) = det(At ).
Proof. If A is a non-singular, Corollary 2.5.17 gives det(A) = det(At ).
If A is singular, then by Theorem 2.5.18, A is not invertible. Therefore, At is also not invertible
t
(as At is invertible implies A1 = (At )1 )). Thus, using Theorem 2.5.18 again, det(At ) = 0 =
det(A). Hence the required result follows.
2.5.2
Cramers Rule
Let A be a square matrix. Then using Theorem 2.2.10 and Theorem 2.5.18, one has the following
result.
Theorem 2.5.21 Let A be a square matrix. Then the following statements are equivalent:
1. A is invertible.
2. T he linear system Ax = b has a unique solution for every b.
3. det(A) 6= 0.
Thus, Ax = b has a unique solution for every b if and only if det(A) 6= 0. The next theorem
gives a direct method of finding the solution of the linear system Ax = b when det(A) 6= 0.
Theorem 2.5.22 (Cramers Rule) Let A be an n n matrix. If det(A) 6= 0 then the unique
solution of the linear system Ax = b is
xj =
det(Aj )
,
det(A)
for j = 1, 2, . . . , n,
where Aj is the matrix obtained from A by replacing the jth column of A by the column vector b.
Proof. Since det(A) 6= 0, A1 =
x=
1
Adj(A). Thus, the linear system Ax = b has the solution
det(A)
1
Adj(A)b. Hence xj , the jth coordinate of x is given by
det(A)
xj =
b1 a12
b2 a22
In Theorem 2.5.22 A1 = .
..
..
.
bn an2
a11 a1n1 b1
a12 a2n1 b2
till An = .
..
.. .
..
..
.
.
.
a1n ann1 bn
..
.
a1n
a11
a2n
a21
.. , A2 = ..
.
.
ann
an1
b1
b2
..
.
bn
a13
a23
..
.
an3
..
.
a1n
a2n
.. and so on
.
ann
50
1 2
2.6
Miscellaneous Exercises
Exercise 2.6.1
3
1
1 and b = 1.
2
1
2 1
3 1 = 0.
2 1
2. Prove that every 2 2 matrix A satisfying tr(A) = 0 and det(A) = 0 is a nilpotent matrix.
3. Let A and B be two non-singular matrices. Are the matrices A + B and A B non-singular?
Justify your answer.
4. Let A be an n n matrix. Prove that the following statements are equivalent:
(a) A is not invertible.
(b) rank(A) 6= n.
(c) det(A) = 0.
6. Let x1 , x2 , . . . , xn be fixed reals numbers and define A = [aij ]nn with aij = xj1
. Prove that
i
Q
det(A) =
(xj xi ). This matrix is usually called the Van-der monde matrix.
1i<jn
7. Let A = [aij ]nn with aij = max{i, j}. Prove that det A = (1)n1 n.
1
8. Let A = [aij ]nn with aij = i+j1
. Using induction, prove that A is invertible. This matrix
is commonly known as the Hilbert matrix.
2.7. SUMMARY
51
(a) If all the entries in odd positions are multiplied with 1 then the value of the determinant
doesnt change.
(b) If all entries in even positions are multiplied with 1 then the determinant
i. does not change if the matrix is of even order.
2.7
Summary
In this chapter, we started with a system of linear equations Ax = b and related it to the augmented
matrix [A |b]. We applied row operations to [A |b] to get its row echelon form and the row-reduced
echelon forms. Depending on the row echelon matrix, say [C |d], thus obtained, we had the following
result:
1. If [C |d] has a row of the form [0 |1] then the linear system Ax = b has not solution.
2. Suppose [C |d] does not have any row of the form [0 |1] then the linear system Ax = b has
at least one solution.
(a) If the number of leading terms equals the number of unknowns then the system Ax = b
has a unique solution.
(b) If the number of leading terms is less than the number of unknowns then the system
Ax = b has an infinite number of solutions.
The following conditions are equivalent for an n n matrix A.
1. A is invertible.
2. The homogeneous system Ax = 0 has only the trivial solution.
3. The row reduced echelon form of A is I.
4. A is a product of elementary matrices.
5. The system Ax = b has a unique solution for every b.
6. The system Ax = b has a solution for every b.
7. rank(A) = n.
8. det(A) 6= 0.
Suppose the matrix A in the linear system Ax = b is of size m n. Then exactly one of the
following statement holds:
1. if rank(A) < rank([A |b]), then the system Ax = b has no solution.
52
Chapter 3
Recall that the set of real numbers were denoted by R and the set of complex numbers were denoted
by C. Also, we wrote F to denote either the set R or the set C.
Let A be an m n complex matrix. Then using Theorem 2.1.5, we see that the solution set of
the homogeneous system Ax = 0, denoted V , satisfies the following properties:
1. The vector 0 V as A0 = 0.
2. If x V then A(x) = (Ax) = 0 for all C. Hence, x V for any complex number .
In particular, x V whenever x V .
3. Let x, y V . Then for any , C, x, y V and A(x + y) = 0 + 0 = 0. In particular,
x + y V and x + y = y + x. Also, (x + y) + z = x + (y + z).
That is, the solution set of a homogeneous linear system satisfies some nice properties. We use
these properties to define a set and devote this chapter to the study of the structure of such sets.
We will also see that the set of real numbers, R, the Euclidean plane, R2 and the Euclidean space,
R3 , are examples of this set. We start with the following definition.
Definition 3.1.1 (Vector Space) A vector space over F, denoted V (F) or in short V (if the field
F is clear from the context), is a non-empty set, satisfying the following axioms:
1. Vector Addition: To every pair u, v V there corresponds a unique element u v in V
(called the addition of vectors) such that
(a) u v = v u (Commutative law).
(c) There is a unique element 0 in V (the zero vector) such that u 0 = u, for every u V
(called the additive identity).
(d) For every u V there is a unique element u V such that u (u) = 0 (called the
additive inverse).
2. Scalar Multiplication: For each u V and F, there corresponds a unique element
u in V (called the scalar multiplication) such that
(a) ( u) = () u for every , F and u V.
53
54
(b) ( + ) u = ( u) ( u).
1
1
1
0 = ( u) = ( ) u = 1 u = u
Example 3.1.4 The readers are advised to justify the statements made in the examples given below.
1. Let A be an m n matrix with complex entries and suppose rank(A) = r n. Let V denote
the solution set of Ax = 0. Then using Theorem 2.4.1, we know that V contains at least the
trivial solution, the 0 vector. Thus, check that the set V satisfies all the axioms stated in
Definition 3.1.1 (some of them were proved to motivate this chapter).
55
2. The set R of real numbers, with the usual addition and multiplication of real numbers (i.e.,
+ and ) forms a vector space over R.
3. Let R2 = {(x1 , x2 ) : x1 , x2 R}. Then for x1 , x2 , y1 , y2 R and R, define
(x1 , x2 ) (y1 , y2 ) = (x1 + y1 , x2 + y2 ) and (x1 , x2 ) = (x1 , x2 ).
Then R2 is a real vector space.
4. Let Rn = {(a1 , a2 , . . . , an ) : ai R, 1 i n} be the set of n-tuples of real numbers. For
u = (a1 , . . . , an ), v = (b1 , . . . , bn ) in V and R, we define
u v = (a1 + b1 , . . . , an + bn ) and u = (a1 , . . . , an )
(called component wise operations). Then V is a real vector space. This vector space Rn is
called the real vector space of n-tuples.
Recall that the symbol i represents the complex number
1.
= (z1 + w1 , . . . , zn + wn ), and
= (z1 , . . . , zn ).
Then it can be verified that Cn forms a vector space over C (called complex vector space) as
well as over R (called real vector space). Whenever there is no mention of scalars, it will
always be assumed to be C, the complex numbers.
Remark 3.1.5 If the scalars are C then i(1, 0) = (i, 0) is allowed. Whereas, if the scalars
are R then i(1, 0) 6= (i, 0).
7. Fix a positive integer n and let Pn (R) denote the set of all polynomials in x of degree n
with coefficients from R. Algebraically,
Pn (R) = {a0 + a1 x + a2 x2 + + an xn : ai R, 0 i n}.
Let f (x) = a0 + a1 x + a2 x2 + + an xn , g(x) = b0 + b1 x + b2 x2 + + bn xn Pn (R) for some
ai , bi R, 0 i n. It can be verified that Pn (R) is a real vector space with the addition and
scalar multiplication defined by
f (x) g(x) =
f (x) =
56
f (x) g(x) =
a0 + a1 x + + ap xp for R.
f (x) =
9. Let P(C) be the set of all polynomials with complex coefficients. Then with respect to vector
addition and scalar multiplication defined coefficient-wise, the set P(C) forms a vector space.
10. Let V = R+ = {x R : x > 0}. This is not a vector space under usual operations of
addition and scalar multiplication (why?). But R+ is a real vector space with 1 as the additive
identity if we define vector addition and scalar multiplication by
u v = u v and u = u for all u, v R+ and R.
11. Let V = {(x, y) : x, y R}. For any R and x = (x1 , x2 ), y = (y1 , y2 ) V , let
x y = (x1 + y1 + 1, x2 + y2 3) and x = (x1 + 1, x2 3 + 3).
Then V is a real vector space with (1, 3) as the additive identity.
12. Let M2 (C) denote the set of all 2 2 matrices with complex entries. Then M2 (C) forms a
vector space with vector addition and scalar multiplication defined by
a1 a2
b1 b2
a1 + b 1 a2 + b 2
a1 a2
AB =
=
, A=
.
a3 a4
b3 b4
a3 + b 3 a4 + b 4
a3 a4
13. Fix positive integers m and n and let Mmn (C) denote the set of all m n matrices with complex entries. Then Mmn (C) is a vector space with vector addition and scalar multiplication
defined by
A B = [aij ] [bij ] = [aij + bij ],
A = [aij ] = [aij ].
( f )(x)
=
=
15. Let V and W be vector spaces over F, with operations (+, ) and (, ), respectively. Let
V W = {(v, w) : v V, w W }. Then V W forms a vector space over F, if for every
(v1 , w1 ), (v2 , w2 ) V W and R, we define
(v1 , w1 ) (v2 , w2 ) =
(v1 , w1 ) =
(v1 + v2 , w1 w2 ), and
( v1 , w1 ).
v1 + v2 and w1 w2 on the right hand side mean vector addition in V and W , respectively.
Similarly, v1 and w1 correspond to scalar multiplication in V and W, respectively.
57
( f )(x)
=
=
5. Prove that C(R), the set of all real valued continuous functions on R forms a vector space if
for all x R, we define
(f g)(x)
( f )(x)
3.1.1
=
=
Subspaces
Definition 3.1.7 (Vector Subspace) Let S be a non-empty subset of V. The set S over F is
said to be a subspace of V (F) if S in itself is a vector space, where the vector addition and scalar
multiplication are the same as that of V (F).
Example 3.1.8
1. Let V (F) be a vector space. Then the sets given below are subspaces of V.
They are called trivial subspaces.
(a) S = {0}, consisting only of the zero vector 0 and
(b) S = V , the whole space.
2. Let S = {(x, y, z) R3 : x + 2y z = 0}. Then S is a subspace of R3 (S is a plane in R3
passing through the origin).
3. Let S = {(x, y, z) R3 : x + y + z = 0, x y z = 0}. Then S is a subspace of R3 (S is a
line in R3 passing through the origin).
4. Let S = {(x, y, z) R3 : z 3x = 0}. Then S is a subspace of R3 .
5. The vector space Pn (R) is a subspace of the vector space P(R).
6. Prove that S = {(x, y, z) R3 : x + y + z = 3} is not a subspace of R3 (S is still a plane in
R3 but it does not pass through the origin).
7. Prove that W = {(x, 0) R2 : x R} is a subspace of R2 .
58
We are now ready to prove a very important result in the study of vector subspaces. This result
basically tells us that if we want to prove that a non-empty set W is a subspace of a vector space
V (F) then we just need to verify only one condition. That is, we dont have to prove all the axioms
stated in Definition 3.1.1.
Theorem 3.1.9 Let V (F) be a vector space and let W be a non-empty subset of V . Then W is a
subspace of V if and only if u + v W whenever , F and u, v W .
Proof. Let W be a subspace of V and let u, v W . Then for every , F, u, v W and
hence u + v W .
Now, let us assume that u + v W whenever , F and u, v W . Need to show, W is a
subspace of V . To do so, observe the following:
1. Taking = 1 and = 1, we see that u + v W for every u, v W .
2. Taking = 0 and = 0, we see that 0 W .
3. Taking = 0, we see that u W for every F and u W and hence using Theorem 3.1.3.3, u = (1)u W as well.
4. The commutative and associative laws of vector addition hold as they hold in V .
5. The axioms related with scalar multiplication and the distributive laws also hold as they hold
in V .
Thus, we have the required result.
Exercise 3.1.10
59
(d) All the sets given below are subspaces of C([1, 1]) (see Example 3.1.4.14).
i. W = {f C([1, 1]) : f (1/2) = 0}.
(b) {(z1 , z2 , . . . , zn ) : z1 + z2 = z3 }.
1 1 1
8. Let A = 2 0 1 . Are the sets given below subspaces of R3 ?
1 1 0
(a) W = {xt R3 : Ax = 0}.
60
10. Fix a positive integer n. Then Mn (C) is a complex vector space with usual operations of
matrix addition and scalar multiplication. Are the sets W Mn (C), given below, subspaces
of Mn (C)? Give reasons.
(a) W = {A : A = A}, the set of Hermitian matrices.
3.1.2
Linear Span
(3.1.1)
in the unknowns a, b, c R has a solution. The augmented matrix of Equation (3.1.1) equals
1 2 3 4
0 1 3 5 and it has the solution 1 = 4, 2 = 10 and 3 = 5.
0 0 1 5
2. Is (4, 5, 5) a linear combination of the vectors (1, 2, 3), (1, 1, 4) and (3, 3, 2)?
Solution: The vector (4, 5, 5) is a linear combination if the linear system
a(1, 2, 3) + b(1, 1, 4) + c(3, 3, 2) = (4, 5, 5)
(3.1.2)
61
in the unknowns a, b, c R has a solution. The row reduced echelon form of the augmented
1 0 2
3
matrix of Equation (3.1.2) equals 0 1 1 1 . Thus, one has an infinite number of
0 0 0
0
solutions. For example, (4, 5, 5) = 3(1, 2, 3) (1, 1, 4).
3. Is (4, 5, 5) a linear combination of the vectors (1, 2, 1), (1, 0, 1) and (1, 1, 0).
Solution: The vector (4, 5, 5) is a linear combination if the linear system
a(1, 2, 1) + b(1, 0, 1) + c(1, 1, 0) = (4, 5, 5)
(3.1.3)
1 1 1 4
Equation (3.1.3) gives 0 1 21 32 . Thus, Equation (3.1.3) has no solution and hence
0 0 0 1
(4, 5, 5) is not a linear combination of the given vectors.
Exercise 3.1.13
1. Prove that every x R3 is a unique linear combination of the vectors
(1, 0, 0), (2, 1, 0), and (3, 3, 1).
2. Find condition(s) on x, y and z such that (x, y, z) is a linear combination of (1, 2, 3), (1, 1, 4)
and (3, 3, 2)?
3. Find condition(s) on x, y and z such that (x, y, z) is a linear combination of the vectors
(1, 2, 1), (1, 0, 1) and (1, 1, 0).
Definition 3.1.14 (Linear Span) Let S = {u1 , u2 , . . . , un } be a non-empty subset of a vector
space V (F). The linear span of S is the set defined by
L(S) =
{1 u1 + 2 u2 + + n un : i F, 1 i n}
(3.1.4)
1 0
2y x
0 1
x y . Thus, we need 2x y z = 0 and hence
0 0 z + y 2x
62
1 1 1
x
2xy
. Thus,
to Equation (3.1.7) gives 0 1 12
3
0 0 0 xy+z
L(S) = {(x, y, z) : x y + z = 0}.
(c) S = {(1, 2, 3), (1, 1, 4), (3, 3, 2)}.
Solution: We need to find condition(s) on x, y, z such that the linear system
a(1, 2, 3) + b(1, 1, 4) + c(3, 3, 2) = (x, y, z)
in the unknowns a, b, c is always consistent. An application of Gauss elimination method
gives 5x 7y + 3z = 0 as the required condition. Thus,
L(S) = {(x, y, z) : 5x 7y + 3z = 0}.
3. S = {(1, 2, 3, 4), (1, 1, 4, 5), (3, 3, 2, 3)} R4 . Determine L(S).
Solution: The readers are advised to show that
L(S) = {(x, y, z, w) : 2x 3y + w = 0, 5x 7y + 3z = 0}.
Exercise 3.1.16 For each of the sets S, determine the geometric representation of L(S).
1. S = {1} R.
2. S = { 1014 } R.
3. S = { 15} R.
4. S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)} R3 .
5. S = {(1, 0, 1), (0, 1, 0), (3, 0, 3)} R3 .
6. S = {(1, 0, 1), (1, 1, 0), (3, 4, 3)} R3 .
7. S = {(1, 2, 1), (2, 0, 1), (1, 1, 1)} R3 .
8. S = {(1, 0, 1, 1), (0, 1, 0, 1), (3, 0, 3, 1)} R4 .
Definition 3.1.17 (Finite Dimensional Vector Space) A vector space V (F) is said to be finite
dimensional if we can find a subset S of V , having finite number of elements, such that V = L(S).
If such a subset does not exist then V is called an infinite dimensional vector space.
Example 3.1.18
space.
1. The set {(1, 2), (2, 1)} spans R2 and hence R2 is a finite dimensional vector
63
2. The set {1, 1 + x, 1 x + x2 , x3 , x4 , x5 } spans P5 (C) and hence P5 (C) is a finite dimensional
vector space.
3. Fix a positive integer n and consider the vector space Pn (R). Then Pn (C) is a finite dimensional vector space as Pn (C) = L({1, x, x2 , . . . , xn }).
4. Recall P(C), the vector space of all polynomials with complex coefficients. Since degree of a
polynomial can be any large positive integer, P(C) cannot be a finite dimensional vector space.
Indeed, checked that P(C) = L({1, x, x2 , . . . , xn , . . .}).
Lemma 3.1.19 (Linear Span is a Subspace) Let S be a non-empty subset of a vector space
V (F). Then L(S) is a subspace of V (F).
Proof. By definition, S L(S) and hence L(S) is non-empty subset of V. Let u, v L(S).
Then, there exist a positive integer n, vectors wi S and scalars i , i F such that u =
1 w1 + 2 w2 + + n wn and v = 1 w1 + 2 w2 + + n wn . Hence,
au + bv = (a1 + b1 )w1 + + (an + bn )wn L(S)
for every a, b F as ai + bi F for i = 1, . . . , n. Thus using Theorem 3.1.9, L(S) is a vector
subspace of V (F).
Remark 3.1.20 Let W be a subspace of a vector space V (F). If S W then L(S) is a subspace
of W as W is a vector space in its own right.
Theorem 3.1.21 Let S be a non-empty subset of a vector space V. Then L(S) is the smallest
subspace of V containing S.
Proof. For every u S, u = 1.u L(S) and hence S L(S). To show L(S) is the smallest
subspace of V containing S, consider any subspace W of V containing S. Then by Remark 3.1.20,
L(S) W and hence the result follows.
Exercise 3.1.22
6. Let x1 = (1, 0, 0), x2 = (1, 1, 0), x3 = (1, 2, 0), x4 = (1, 1, 1) and let S = {x1 , x2 , x3 , x4 }.
Determine all xi such that L(S) = L(S \ {xi }).
64
In this section, we saw that a vector space has infinite number of vectors. Hence, one can start
with any finite collection of vectors and obtain their span. It means that any vector space contains
infinite number of other vector subspaces. Therefore, the following questions arise:
1. What are the conditions under which, the linear span of two distinct sets are the same?
2. Is it possible to find/choose vectors so that the linear span of the chosen vectors is the whole
vector space itself?
3. Suppose we are able to choose certain vectors whose linear span is the whole space. Can we
find the minimum number of such vectors?
We try to answer these questions in the subsequent sections.
3.2
Linear Independence
(3.2.2)
65
1
(c1 v1 + + cp vp ) L(v1 , v2 , . . . , vp )
cp+1
ci
F for 1 i p. Thus, the result follows.
as cp+1
We now state a very important corollary of Theorem 3.2.4 without proof. The readers are
advised to supply the proof for themselves.
Corollary 3.2.5 Let S = {u1 , . . . , un } be a subset of a vector space V (F). If S is linearly
1. dependent then there exists a k, 2 k n with L(u1 , . . . , uk ) = L(u1 , . . . , uk1 ).
2. independent and there is a vector v V with v 6 L(S) then {u1 , . . . , un , v} is also a linearly
independent subset of V.
Exercise 3.2.6
1. Consider the vector space R2 . Let u1 = (1, 0). Find all choices for the vector
u2 such that {u1 , u2 } is linearly independent subset of R2 . Does there exist vectors u2 and u3
such that {u1 , u2 , u3 } is linearly independent subset of R2 ?
2. Let S = {(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 1)} R4 . Does (1, 1, 2, 1) L(S)? Furthermore,
determine conditions on x, y, z and u such that (x, y, z, u) L(S).
3. Show that S = {(1, 2, 3), (2, 1, 1), (8, 6, 10)} R3 is linearly dependent.
4. Show that S = {(1, 0, 0), (1, 1, 0), (1, 1, 1)} R3 is linearly independent.
5. Prove that {u1 , u2 , . . . , un } is a linearly independent subset of V (F) if and only if {u1 , u1 +
u2 , . . . , u1 + + un } is linearly independent subset of V (F).
66
10. Suppose V is a collection of vectors such that V (C) as well as V (R) are vector spaces. Prove
that the set {u1 , . . . , uk , iu1 , . . . , iuk } is a linearly independent subset of V (R) if and only if
{u1 , . . . , uk } is a linear independent subset of V (C).
11. Let M be a subspace of V and let u, v V . Define K = L(M, u) and H = L(M, v). If v K
and v 6 M prove that u H.
12. Let A Mn (R) and let x and y be two non-zero vectors such that Ax = 3x and Ay = 2y.
Prove that x and y are linearly independent.
2 1 3
13. Let A = 4 1 3 . Determine non-zero vectors x, y and z such that Ax = 6x, Ay = 2y
3 2 5
and Az = 2z. Use the vectors x, y and z obtained here to prove the following.
(a) A2 v = 4v, where v = cy + dz for any real numbers c and d.
(b) The set {x, y, z} is linearly independent.
6 0 0
(d) Let D = 0 2 0 . Then AP = P D.
0 0 2
14. Let P and Q be subspaces of Rn such that P + Q = Rn and P Q = {0}. Prove that each
u Rn is uniquely expressible as u = uP + uQ , where uP P and uQ Q.
3.3
Bases
Definition 3.3.1 (Basis of a Vector Space) A basis of a vector space V is a subset B of V such
that B is a linearly independent set in V and the linear span of B is V . Also, any element of B is
called a basis vector.
Remark 3.3.2 Let B be a basis of a vector space V (F). Then for each v V , there exist vectors
n
P
u1 , u2 , . . . , un B such that v =
i ui , where i F, for 1 i n. By convention, the linear
i=1
span of an empty set is {0}. Hence, the empty set is a basis of the vector space {0}.
Lemma 3.3.3 Let B be a basis of a vector space V (F). Then each v V is a unique linear
combination of the basis vectors.
Proof. On the contrary, assume that there exists v V that is can be expressed in at least two
ways as linear combination of basis vectors. That means, there exists a positive integer p, scalars
i , i F and vi B such that
v = 1 v1 + 2 v2 + + p vp and v = 1 v1 + 2 v2 + + p vp .
3.3. BASES
67
(3.3.1)
Since the vectors are from B, by definition (see Definition 3.3.1) the set S = {v1 , v2 , . . . , vp } is a
linearly independent subset of V . This implies that the linear system c1 v1 + c2 v2 + + cp vp = 0 in
the unknowns c1 , c2 , . . . , cp has only the trivial solution. Thus, each of the scalars i i appearing
in Equation (3.3.1) must be equal to 0. That is, i i = 0 for 1 i p. Thus, for 1 i p,
i = i and the result follows.
Example 3.3.4
2. The set {(1, 1), (1, 1)} is a basis of the vector space R2 (R).
3. Fix a positive integer n and let ei = (0, . . . , 0, |{z}
1 , 0, . . . , 0) Rn for 1 i n. Then
ith
place
(c) B = {e1 , e2 , e3 } with e1 = (1, 0, 0), e2 = (0, 1, 0) and e3 = (0, 0, 1) is the standard basis
of R3 .
(d) B = {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)} is the standard basis of R4 .
4. Let V = {(x, y, 0) : x, y R} R3 . Then B = {(2, 0, 0), (1, 3, 0)} is a basis of V.
5. Let V = {(x, y, z) R3 : x + y z = 0} be a vector subspace of R3 . As each element
(x, y, z) V satisfies x + y z = 0. Or equivalently z = x + y and hence
(x, y, z) = (x, y, x + y) = (x, 0, x) + (0, y, y) = x(1, 0, 1) + y(0, 1, 1).
Hence {(1, 0, 1), (0, 1, 1)} forms a basis of V.
6. Let V = {a + ib : a, b R} be a complex vector space. Then any element a + ib V equals
a + ib = (a + ib) 1. Hence a basis of V is {1}.
7. Let V = {a + ib : a, b R} be a real vector space. Then {1, i} is a basis of V (R) as
a + ib = a 1 + b i for a, b R and {1, i} is a linearly independent subset of V (R).
8. In C2 , (a+ ib, c+ id) = (a+ ib)(1, 0)+ (c+ id)(0, 1). So, {(1, 0), (0, 1)} is a basis of the complex
vector space C2 .
9. In case of the real vector space C2 , (a + ib, c + id) = a(1, 0) + b(i, 0) + c(0, 1) + d(0, i). Hence
{(1, 0), (i, 0), (0, 1), (0, i)} is a basis.
10. B = {e1 , e2 , . . . , en } is the standard basis of Cn . But B is not a basis of the real vector space
Cn .
Before coming to the end of this section, we give an algorithm to obtain a basis of any finite
dimensional vector space V . This will be done by a repeated application of Corollary 3.2.5. The
algorithm proceeds as follows:
Step 1: Let v1 V with v1 6= 0. Then {v1 } is linearly independent.
68
Step 2: If V = L(v1 ), we have got a basis of V. Else, pick v2 V such that v2 6 L(v1 ). Then by
Corollary 3.2.5.2, {v1 , v2 } is linearly independent.
Step i: Either V = L(v1 , v2 , . . . , vi ) or L(v1 , v2 , . . . , vi ) 6= V.
In the first case, {v1 , v2 , . . . , vi } is a basis of V. In the second case, pick vi+1 V with
vi+1 6 L(v1 , v2 , . . . , vi ). Then, by Corollary 3.2.5.2, the set {v1 , v2 , . . . , vi+1 } is linearly
independent.
This process will finally end as V is a finite dimensional vector space.
Exercise 3.3.5
whenever
n
P
i=1
9. Let A = 0
0
of V = {(x, y, z, u) R4 : x y z = 0, x + z u = 0}.
0 1 1 0
1 2 3 0 . Find a basis of V = {xt R5 : Ax = 0}.
0 0 0 1
10. Prove that {1, x, x2 , . . . , xn , . . .} is a basis of the vector space P(R). This basis has an infinite
number of vectors. This is also called the standard basis of P(R).
11. Let ut = (1, 1, 2), vt = (1, 2, 3) and wt = (1, 10, 1). Find a basis of L(u, v, w). Determine
a geometrical representation of L(u, v, w)?
12. Prove that {(1, 0, 0), (1, 1, 0), (1, 1, 1)} is a basis of C3 . Is it a basis of C3 (R)?
3.3.1
We first prove a result which helps us in associating a non-negative integer to every finite dimensional
vector space.
Theorem 3.3.6 Let V be a vector space with basis {v1 , v2 , . . . , vn }. Let m be a positive integer
with m > n. Then the set S = {w1 , w2 , . . . , wm } V is linearly dependent.
Proof. We need to show that the linear system
1 w1 + 2 w2 + + m wm = 0
(3.3.2)
3.3. BASES
69
w2 =
..
. =
wm
n
n
n
X
X
X
1
aj1 vj + 2
aj2 vj + + m
ajm vj = 0.
j=1
j=1
j=1
Or equivalently,
m
X
i=1
i a1i
v1 +
m
X
i a2i
i=1
v2 + +
m
X
i=1
i ani
vn = 0.
(3.3.3)
i a1i =
m
X
i=1
i a2i = =
m
X
i ani = 0.
i=1
Therefore, finding i s satisfying Equation (3.3.2) reduces to solving the homogeneous system A =
a11 a12 a1m
1
a21 a22 a2m
2
0 where = . and A = .
..
.. .
..
..
..
.
.
.
an1 an2 anm
m
Since n < m, Corollary 2.1.23.2 (here the matrix A is m n) implies that A = 0 has a nontrivial solution. hence Equation (3.3.2) has a non-trivial solution and thus {w1 , w2 , . . . , wm } is a
linearly dependent set.
Corollary 3.3.7 Let B1 = {u1 , . . . , un } and B2 = {v1 , . . . , vm } be two bases of a finite dimensional
vector space V . Then m = n.
Proof. Let if possible, m > n. Then by Theorem 3.3.6, {v1 , . . . , vm } is a linearly dependent
subset of V , contradicting the assumption that B2 is a basis of V . Hence we must have m n. A
similar argument implies n m and hence m = n.
Let V be a finite dimensional vector space. Then Corollary 3.3.7 implies that the number of
elements in any basis of V is the same. This number is used to define the dimension of any finite
dimensional vector space.
Definition 3.3.8 (Dimension of a Finite Dimensional Vector Space) Let V be a finite dimensional vector space. Then the dimension of V , denoted dim(V ), is the number of elements in a
basis of V .
Note that Corollary 3.2.5.2 can be used to generate a basis of any non-trivial finite dimensional
vector space.
70
Example 3.3.9 The dimension of vector spaces in Example 3.3.4 are as follows:
1. dim(R) = 1 in Example 3.3.4.1.
2. dim(R2 ) = 2 in Example 3.3.4.2.
3. dim(V ) = 2 in Example 3.3.4.4.
4. dim(V ) = 2 in Example 3.3.4.5.
5. dim(V ) = 1 in Example 3.3.4.6.
6. dim(V ) = 2 in Example 3.3.4.7.
7. dim(C2 ) = 2 in Example 3.3.4.8.
8. dim(C2 (R)) = 4 in Example 3.3.4.9.
9. For fixed positive integer n, dim(Rn ) = n in Example 3.3.4.3 and in Example 3.3.4.10, one
has dim(Cn ) = n and dim(Cn (R)) = 2n.
Thus, we see that the dimension of a vector space dependents on the set of scalars.
Example 3.3.10 Let V be the set of all functions f : Rn R with the property that f (x + y) =
f (x) + f (y) and f (x) = f (x) for all x, y Rn and R. For any f, g V, and t R, define
(f g)(x) = f (x) + g(x) and (t f )(x) = f (tx).
Then it can be easily verified that V is a real vector space. Also, for 1 i n, define the functions
ei (x) = ei (x1 , x2 , . . . , xn ) = xi . Then it can be easily verified that the set {e1 , e2 , . . . , en } is a
basis of V and hence dim(V ) = n.
The next theorem follows directly from Corollary 3.2.5.2 and Theorem 3.3.6. Hence, the proof
is omitted.
Theorem 3.3.11 Let S be a linearly independent subset of a finite dimensional vector space V.
Then the set S can be extended to form a basis of V.
Theorem 3.3.11 is equivalent to the following statement:
Let V be a vector space of dimension n. Suppose, we have found a linearly independent subset
{v1 , . . . , vr } of V with r < n. Then it is possible to find vectors vr+1 , . . . , vn in V such that
{v1 , v2 , . . . , vn } is a basis of V. Thus, one has the following important corollary.
Corollary 3.3.12 Let V be a vector space of dimension n. Then
1. any set consisting of n linearly independent vectors forms a basis of V.
2. any subset S of V having n vectors with L(S) = V forms a basis of V .
Exercise 3.3.13
2. Let W1 and W2 be two subspaces of a vector space V such that W1 W2 . Show that W1 = W2
if and only if dim(W1 ) = dim(W2 ).
3.3. BASES
71
3. Consider the vector space C([, ]). For each integer n, define en (x) = sin(nx). Prove that
{en : n = 1, 2, . . .} is a linearly independent set.
[Hint: For any positive integer , consider the set {ek1 , . . . , ek } and the linear system
1 sin(k1 x) + 2 sin(k2 x) + + sin(k x) = 0 for all x [, ]
in the unknowns 1 , . . . , n . Now for suitable values of m, consider the integral
Z
sin(mx) (1 sin(k1 x) + 2 sin(k2 x) + + sin(k x)) dx
3.3.2
In this subsection, we will study results that are intrinsic to the understanding of linear algebra,
especially results associated with matrices. We start with a few exercises that should have appeared
in previous sections of this chapter.
1. Let V = {A M2 (C) : tr(A) = 0}, where tr(A) stands for the trace of the
a b
matrix A. Show that V is a real vector space and find its basis. Is W =
: c = b
c a
a subspace of V ?
Exercise 3.3.14
2. Find the dimension and a basis for each of the subspaces given below.
(a) sln (R) = {A Mn (R) : tr(A) = 0}.
(b) Sn (R) = {A Mn (R) : A = At }.
72
Definition 3.3.15 Let A Mmn (C) and let R1 , R2 , . . . , Rm Cn be the rows of A and a1 , a2 , . . . , an
Cm be its columns. We define
1. RowSpace(A), denoted R(A), as R(A) = L(R1 , R2 , . . . , Rm ) Cn ,
2. ColumnSpace(A), denoted C(A), as C(A) = L(a1 , a2 , . . . , an ) Cm ,
3. NullSpace(A), denoted N (A), as N (A) = {xt Cn : Ax = 0}.
4. Range(A), denoted Im(A), as Im(A) = {y : Ax = y for some xt Cn }.
Note that the column space of A consists of all b such that Ax = b has a solution. Hence,
C(A) = Im(A). The above subspaces can also be defined for A , the conjugate transpose of A. We
illustrate the above definitions with the help of an example and then ask the readers to solve the
exercises that appear after the example.
1
Example 3.3.16 Compute the above mentioned subspaces for A = 1
1
Solution: Verify the following
1
1
2
2 1
1 .
2 7 11
(d) N (A ) is a subspace of Cm ,
1 2 1 3 2
2
4
0
6
0 2 2 2 4
1 0 2 5
2. Let A =
2 2 4 0 8 and B = 3 5 1 4 .
4 2 5 6 10
1 1 1
2
(a) Find the row-reduced echelon forms of A and B.
(b) Find P1 and P2 such that P1 A and P2 B are in row-reduced echelon form.
(c) Find a basis each for the row spaces of A and B.
(d) Find a basis each for the range spaces of A and B.
(e) Find bases of the null spaces of A and B.
(f ) Find the dimensions of all the vector subspaces so obtained.
3.3. BASES
73
Lemma 3.3.18 Let A Mmn (C) and let B = EA for some elementary matrix E. Then R(A) =
R(B) and dim(R(A)) = dim(R(B)).
Proof. We prove the result for the elementary matrix Eij (c), where c 6= 0 and 1 i < j m.
The readers are advised to prove the results for other elementary matrices. Let R1 , R2 , . . . , Rm be
the rows of A. Then B = Eij (c)A implies
R(B)
m
X
=1
(m
X
=1
+m Rm : R, 1 m}
)
R + i (cRj ) : R, 1 m
R : R, 1 m
= L(R1 , . . . , Rm ) = R(A)
We omit the proof of the next result as the proof is similar to the proof of Lemma 3.3.18.
Lemma 3.3.19 Let A Mmn (C) and let C = AE for some elementary matrix E. Then C(A) =
C(C) and dim(C(A)) = dim(C(C)).
The first and second part of the next result are a repeated application of Lemma 3.3.18 and
Lemma 3.3.19, respectively. Hence the proof is omitted. This result is also helpful in finding a basis
of a subspace of Cn .
Corollary 3.3.20 Let A Mmn (C). If
1. B is in row-reduced echelon form of A then R(A) = R(B). In particular, the non-zero rows
of B form a basis of R(A) and dim(R(A)) = dim(R(B)) = Row rank(A).
2. the application of column operations gives a matrix C that has the form given in Remark 2.3.6,
then dim(C(A)) = dim(C(C)) = Column rank(A) and the non-zero columns of C form a basis
of C(A).
Before proceeding with applications of Corollary 3.3.20, we first prove that for any A Mmn (C),
Row rank(A) = Column rank(A).
Theorem 3.3.21 Let A Mmn (C). Then Row rank(A) = Column rank(A).
Proof. Let R1 , R2 , . . . , Rm be the rows of A and C1 , C2 , . . . , Cn be the columns of A. Let
Row rank(A) = r. Then by Corollary 3.3.20.1, dim L(R1 , R2 , . . . , Rm ) = r. Hence, there exists
vectors
ut1 = (u11 , . . . , u1n ), ut2 = (u21 , . . . , u2n ), . . . , utr = (ur1 , . . . , urn ) Rn
with
Ri L(ut1 , ut2 , . . . , utr ) Rn , for all i, 1 i m.
Therefore, there exist real numbers ij , 1 i m, 1 j r such that
R1 =
11 ut1
+ +
1r utr
r
X
i=1
1i ui1 ,
r
X
i=1
1i ui2 , . . . ,
r
X
i=1
1i uin
74
R2 =
21 ut1
+ +
2r utr
i=1
m1 ut1
+ +
mr utr
r
X
r
X
i=1
2i ui1 ,
r
X
2i ui2 , . . . ,
i=1
mi ui1 ,
r
X
r
X
2i uin
i=1
mi ui2 , . . . ,
i=1
r
X
mi uin
i=1
11
12
1r
u
i=1 1i i1
21
22
2r
..
C1 =
= u11 . + u21 . + + ur1 . .
.
r
..
..
..
P
mi ui1
m1
m2
mr
r
P
i=1
12
1r
11
u
i=1 1i ij
2r
21
22
..
Cj =
+
+
u
=
u
+
u
. .
2j
rj
1j
.
.
r .
..
..
..
P
mi uij
mr
m2
m1
i=1
(3.3.4)
Let S be a subset of Rn and let V = L(S). Then Theorem 3.3.6 and Corollary 3.3.20.1 to obtain
a basis of V . The algorithm proceeds as follows:
1. Construct a matrix A whose rows are the vectors in S.
2. Apply row operations on A to get B, a matrix in row echelon form.
3. Let B be the set of non-zero rows of B. Then B is a basis of L(S) = V.
Example 3.3.23
of L(S).
1. Let S = {(1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1)} R4 . Find a basis
1
1
Solution: Here A =
1
1
1
1 1
1
0
1 1 1
. Then B =
0
1
0 1
1 1 1
0
1
1
0
0
1
0
1
0
1
0
is the row echelon form
0
0
3.3. BASES
75
of A and hence B = {(1, 1, 1, 1), (0, 1, 0, 0), (0, 0, 1, 0)} is a basis of L(S). Observe that
the non-zero rows of B can be obtained, using the first, second and fourth or the first,
third and fourth rows of A. Hence the subsets {(1, 1, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1)} and
{(1, 1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1)} of S are also bases of L(S).
2. Let V = {(v, w, x, y, z) R5 : v+x+z = 3y} and W = {(v, w, x, y, z) R5 : wx = z, v = y}
be two subspaces of R5 . Find bases of V and W containing a basis of V W.
Solution: Let us find a basis of V W. The solution set of the linear equations
v + x 3y + z = 0, w x z = 0 and v = y
is
(v, w, x, y, z)t = (y, 2y, x, y, 2y x)t = y(1, 2, 0, 1, 2)t + x(0, 0, 1, 0, 1)t.
Thus, a basis of V W is B = {(1, 2, 0, 1, 2), (0, 0, 1, 0, 1)}. Similarly, a basis of V is B1 =
{(1, 0, 1, 0, 0), (0, 1, 0, 0, 0), (3, 0, 0, 1, 0), (1, 0, 0, 0, 1)}
and
that
of
W
is
B2 = {(1, 0, 0, 1, 0), (0, 1, 1, 0, 0), (0, 1, 0, 0, 1)}. To find a basis of V containing a basis of
V W , form a matrix whose rows are the vectors in B and B1 (see the first matrix in
Equation(3.3.5)) and apply row operations without disturbing the first two rows that have
come from B. Then after a few row operations, we get
1 2 0 1 2
1 2 0 1 2
0 0 1 0 1
0 0 1 0 1
0 1 0 0 0
1 0 1 0 0
(3.3.5)
.
0 0 0 1 3
0 1 0 0 0
0 0 0 0 0
3 0 0 1 0
0 0 0 0 0
1 0 0 0 1
Thus, a required basis of V is {(1, 2, 0, 1, 2), (0, 0, 1, 0, 1), (0, 1, 0, 0, 0), (0, 0, 0, 1, 3)}. Similarly, a required basis of W is {(1, 2, 0, 1, 2), (0, 0, 1, 0, 1), (0, 0, 1, 0, 1)}.
Exercise 3.3.24
1. If M and N are 4-dimensional subspaces of a vector space V of dimension
7 then show that M and N have at least one vector in common other than the zero vector.
2. Let V = {(x, y, z, w) R4 : x + y z + w = 0, x + y + z + w = 0, x + 2y = 0} and
W = {(x, y, z, w) R4 : x y z + w = 0, x + 2y w = 0} be two subspaces of R4 . Find bases
and dimensions of V, W, V W and V + W.
3. Let W1 and W2 be two subspaces of a vector space V . If dim(W1 ) + dim(W2 ) > dim(V ), then
prove that W1 W2 contains a non-zero vector.
4. Give examples to show that the Column Space of two row-equivalent matrices need not be
same.
5. Let A Mmn (C) with m < n. Prove that the columns of A are linearly dependent.
6. Suppose a sequence of matrices A = B0 B1 Bk1 Bk = B satisfies
R(Bl ) R(Bl1 ) for 1 l k. Then prove that R(B) R(A).
Before going to the next section, we prove the rank-nullity theorem and the main theorem of
system of linear equations (see Theorem 2.4.1).
Theorem 3.3.25 (Rank-Nullity Theorem) For any matrix A Mmn (C),
dim(C(A)) + dim(N (A)) = n.
76
Proof. Let dim(N (A)) = r < n and let {u1 , u2 , . . . , ur } be a basis of N (A). Since {u1 , . . . , ur }
is a linearly independent subset in Rn , there exist vectors ur+1 , . . . , un Rn (see Corollary 3.2.5.2)
such that {u1 , . . . , un } is a basis of Rn . Then by definition,
C(A) =
We need to prove that {Aur+1 , . . . , Aun } is a linearly independent set. Consider the linear system
1 Aur+1 + 2 Aur+2 + + nr Aun = 0.
(3.3.6)
(3.3.7)
Theorem 3.3.25 is part of what is known as the fundamental theorem of linear algebra (see
Theorem 5.2.14). As the final result in this direction, We now prove the main theorem on linear
systems stated on page 43 (see Theorem 2.4.1) whose proof was omitted.
Theorem 3.3.26 Consider a linear system Ax = b, where A is an m n matrix, and x, b are
vectors of orders n 1, and m 1, respectively. Suppose rank (A) = r and rank([A b]) = ra . Then
exactly one of the following statement holds:
1. If r < ra , the linear system has no solution.
2. if ra = r, then the linear system is consistent. Furthermore,
(a) if r = n, then the solution set of the linear system has a unique n 1 vector x0 satisfying
Ax0 = b.
(b) if r < n, then the set of solutions of the linear system is an infinite set and has the form
{x0 + k1 u1 + k2 u2 + + knr unr : ki R, 1 i n r},
where x0 , u1 , . . . , unr are n1 vectors satisfying Ax0 = b and Aui = 0 for 1 i nr.
3.3. BASES
77
Proof. Proof of Part 1. As r < ra , the (r + 1)-th row of the row-reduced echelon form of [A b]
has the form [0, 1]. Thus, by Theorem 1, the system Ax = b is inconsistent.
Proof of Part 2a and Part 2b. As r = ra , using Corollary 3.3.20, C(A) = C([A, b]). Hence, the
vector b C(A) and therefore there exist scalars c1 , c2 , . . . , cn such that b = c1 a1 + c2 a2 + cn an ,
where a1 , a2 , . . . , an are the columns of A. That is, we have a vector xt0 = [c1 , c2 , . . . , cn ] that
satisfies Ax = b.
If in addition r = n, then the system Ax = b has no free variables in its solution set and thus
we have a unique solution (see Theorem 2.1.22.2a).
Whereas the condition r < n implies that the system Ax = b has n r free variables in its
solution set and thus we have an infinite number of solutions (see Theorem 2.1.22.2b). To complete
the proof of the theorem, we just need to show that the solution set in this case has the form
{x0 + k1 u1 + k2 u2 + + knr unr : ki R, 1 i n r}, where Ax0 = b and Aui = 0 for
1 i n r.
To get this, note that using the rank-nullity theorem (see Theorem 3.3.25) rank(A) = r implies
that dim(N (A)) = n r. Let {u1 , u2 , . . . , unr } be a basis of N (A). Then by definition Aui = 0
for 1 i n r and hence
A(x0 + k1 u1 + k2 u2 + + knr unr ) = Ax0 + k1 0 + + knr 0 = b.
Thus, the required result follows.
1 1 0 1 1 0 1
Example 3.3.27 Let A = 0 0 1 2 3 0 2 and V = {xt R7 : Ax = 0}. Find a basis
0 0 0 0 0 1 1
and dimension of V .
Solution: Observe that x1 , x3 and x6 are the basic variables and the rest are the free variables.
Writing the basic variables in terms of free variables, we get
1
1
1
1
x7 x2 x4 x5
x1
0
0
0
1
x2
x
2
2
3
2
0
x 2x 2x 3x
4
5
3 7
x4
= x2 0 + x4 1 + x5 0 + x7 0 .
x4 =
0
1
0
0
x5
x5
1
0
0
0
x6
x7
1
0
0
0
x7
x7
(3.3.8)
Therefore, if we let ut1 = 1, 1, 0, 0, 0, 0, 0 , ut2 = 1, 0, 2, 1, 0, 0, 0 , ut3 = 1, 0, 3, 0, 1, 0, 0
and ut4 = 1, 0, 2, 0, 0, 1, 1 then S = {u1 , u2 , u3 , u4 } is the basis of V . The reasons are as follows:
1. For Linear independence, we consider the homogeneous system
c 1 u1 + c 2 u2 + c 3 u3 + c 4 u4 = 0
(3.3.9)
in the unknowns c1 , c2 , c3 and c4 . Then relating the unknowns with the free variables x2 , x4 , x5
and x7 and then comparing Equations (3.3.8) and (3.3.9), we get
(a) c1 = 0 as the 2-nd coordinate consists only of c1 .
78
Remark 3.3.28 The vectors u1 , u2 , . . . , unr in Theorem 3.3.26.2b correspond to expressing the
solution set with the help of the free variables. This is done by writing the basic variables in terms
of the free variables and then writing the solution set in such a way that each ui corresponds to a
specific free variable.
The following are some of the consequences of the rank-nullity theorem. The proof is left as an
exercise for the reader.
Exercise 3.3.29
(f ) There exists a linearly independent subset {b1 , b2 , . . . , bk } of Rm such that the system
Ax = bi for 1 i k is consistent.
3.4
Ordered Bases
79
Example 3.4.2
1. Consider the vector space P2 (R) with basis {1 x, 1 + x, x2 }. Then one can
take either B1 = 1 x, 1 + x, x2 or B2 = 1 + x, 1 x, x2 as ordered bases. Also for any
element a0 + a1 x + a2 x2 P2 (R), one has
a0 + a1 x + a2 x2 =
a0 a1
a0 + a1
(1 x) +
(1 + x) + a2 x2 .
2
2
2. Let V = {(x, y, z) : x + y = z} and let B = {(1, 1, 0), (1, 0, 1)} be a basis of V . Then check
that (3, 4, 7) = 4(1, 1, 0) + 7(1, 0, 1) V.
That is, as ordered bases (u1 , u2 , . . . , un ), (u2 , u3 , . . . , un , u1 ) and (un , un1 , . . . , u2 , u1 ) are
different even though they have the same set of vectors as elements. To proceed further, we now
define the notion of coordinates of a vector depending on the chosen ordered basis.
Definition 3.4.3 (Coordinates of a Vector) Let B = (v1 , v2 , . . . , vn ) be an ordered basis of a
vector space V . If an element v V is expressible as
v = 1 v1 + 2 v2 + + n vn for some scalars1 , 2 , . . . , n
then the tuple (1 , 2 , . . . , n ) is called the coordinate of the vector v with respect to the ordered
basis B and is denoted by [v]B = (1 , . . . , n )t , a column vector.
Example 3.4.4
[p(x)]B1 =
2
a0 +a1 , [p(x)]B2
2
a2
2
a0 a1
2
a2
a2
1
and [p(x)]B3 = a0 a
.
2
a0 +a1
2
4
y
2. In Example 3.4.2.2, (3, 4, 7) B =
and (x, y, z) B = (z y, y, z) B =
.
7
z
3. Let the ordered bases of R3 be B1 = (1, 0, 0), (0, 1, 0), (0, 0, 1) , B2 = (1, 0, 0), (1, 1, 0), (1, 1, 1)
and B3 = (1, 1, 1), (1, 1, 0), (1, 0, 0) . Then
(1, 1, 1) =
=
n
X
l=1
80
!
n
n
n
n
n
X
X
X
X
X
v=
i vi =
i
aji uj =
aji i uj .
i=1
i=1
j=1
j=1
i=1
a11 a1n
1
!
t
n
n
n
a21 a2n 2
X
X
X
[v]B1 =
a1i i ,
a2i i , . . . ,
ani i = .
.. .. = A[v]B2 ,
..
..
.
. .
i=1
i=1
i=1
an1 ann
n
where A = [v1 ]B1 , [v2 ]B1 , . . . , [vn ]B1 . Hence, we have proved the following theorem.
Theorem 3.4.5 Let V be an n-dimensional vector space
with bases B1 = (u1 ,u2 , . . . , un ) and
B2 = (v1 , v2 , . . . , vn ). Define an n n matrix A by A = [v1 ]B1 , [v2 ]B1 , . . . , [vn ]B1 . Then, A is an
0 2 0
2. Check that A = [(1, 1, 1)]B1 , [(1, 1, 1)]B1 , [(1, 1, 0)]B1 = 0 2 1 as
1 1 0
[(1, 1, 1)]B1
[(1, 1, 0)]B1
[(1, 1, 1)]B1
xy
0
[(x, y, z)]B1 = y z = 0
z
1
yx
2 0
2 +z
= A [(x, y, z)]B2 .
2 1 xy
2
1 0
xz
4. Observe that the matrix A is invertible and hence [(x, y, z)]B2 = A1 [(x, y, z)]B1 .
In the next chapter, we try to understand Theorem 3.4.5 again using the ideas of linear transformations/functions.
Exercise 3.4.7
3.5. SUMMARY
81
[v]B1
a1
0
a0 a1 + 2a2 a3 1
=
a0 a1 + a2 2a3 = 1
a0 + a1 a2 + a3
1
1
0
0
0
0
1
0
0
0
0
1
0
a0 + a1 a2 + a3
a1
= [v]B2 .
a2
a3
2. Let B = (2, 1, 0), (2, 1, 1), (2, 2, 1) be an ordered basis of R3 . Determine the coordinates of
(1, 2, 1) and (4, 2, 2) with respect B.
3.5
Summary
In this chapter, we started with the definition of vector spaces over F, the set of scalars. The set F
was either R, the set of real numbers or C, the set of complex numbers.
It was important to note that given a non-empty set V of vectors with a set F of scalars, we
need to do the following:
1. first define vector addition and scalar multiplication and
2. then verify the axioms in Definition 3.1.1.
If all the axioms are satisfied then V is a vector space over F. To check whether a non-empty subset
W of a vector space V over F is a subspace of V , we only need to check whether u + v W for all
u, v W and u W for all F and u W .
We then came across the definition of linear combination of vectors and the linear span of
vectors. It was also shown that the linear span of a subset S of a vector space V is the smallest
subspace of V containing S. Also, to check whether a given vector v is a linear combination of the
vectors u1 , u2 , . . . , un , we need to solve the linear system
c 1 u1 + c 2 u2 + + c n un = v
in the unknowns c1 , . . . , cn . This corresponds to solving the linear system Ax = b. It was also
shown that the geometrical representation of the linear span of S = {u1 , u2 , . . . , un } is equivalent
to finding conditions on the coordinates of the vector b such that the linear system Ax = b is
consistent, where the matrix A is formed with the coordinates of the vector ui as the i-th column
of the matrix A.
By definition, S = {u1 , u2 , . . . , un } is linearly independent subset in V (F) if the homogeneous
system Ax = 0 has only the trivial solution in F, else S is linearly dependent, where the matrix A
is formed with the coordinates of the vector ui as the i-th column of the matrix A.
We then had the notion of the basis of a finite dimensional vector space V and the following
results were proved.
1. A linearly independent set can be extended to form a basis of V .
2. Any two bases of V have the same number of elements.
This number was defined as the dimension of V and we denoted it by dim(V ).
The following conditions are equivalent for an n n matrix A.
1. A is invertible.
82
Chapter 4
Linear Transformations
4.1
In this chapter, it will be shown that if V is a real vector space with dim(V ) = n then V looks like
Rn . On similar lines a complex vector space of dimension n has all the properties that are satisfied
by Cn . To do so, we start with the definition of functions over vector spaces that commute with
the operations of vector addition and scalar multiplication.
Definition 4.1.1 (Linear Transformation, Linear Operator) Let V and W be vector spaces
over the same scalar set F. A function (map) T : V W is called a linear transformation if for
all F and u, v V the function T satisfies
T ( u) = T (u) and T (u + v) = T (u) T (v),
where +, are binary operations in V and , are the binary operations in W . In particular, if
W = V then the linear transformation T is called a linear operator.
We now give a few examples of linear transformations.
Example 4.1.2
1. Define T : RR2 by T (x) = (x, 3x) for all x R. Then T is a linear
transformation as
T (x) = (x, 3x) = (x, 3x) = T (x) and
T (x + y) = (x + y, 3(x + y) = (x, 3x) + (y, 3y) = T (x) + T (y).
2. Let V, W and Z be vector spaces over F. Also, let T : V W and S : W Z be linear
transformations. Then, for each v V , the composition of T and S is defined by S T (v) =
S T (v) . It is easy to verify that S T is a linear transformation. In particular, if V = W ,
one writes T 2 in place of T T .
3. Let xt = (x1 , x2 , . . . , xn ) Rn . Then for a fixed vector at = (a1 , a2 , . . . , an ) Rn , define
n
P
T : Rn R by T (xt ) =
ai xi for all xt Rn . Then T is a linear transformation. In
i=1
particular,
(a) T (xt ) =
n
P
i=1
84
Before proceeding further with some more definitions and results associated with linear transformations, we prove that any linear transformation sends the zero vector to a zero vector.
Proposition 4.1.3 Let T : V W be a linear transformation. Suppose that 0V is the zero vector
in V and 0W is the zero vector of W. Then T (0V ) = 0W .
Proof. Since 0V = 0V + 0V , we have
T (0V ) = T (0V + 0V ) = T (0V ) + T (0V ).
So T (0V ) = 0W as T (0V ) W.
From now on, we write 0 for both the zero vector of the domain and codomain.
Definition 4.1.4 (Zero Transformation) Let V and W be two vector spaces over F and define
T : V W by T (v) = 0 for every v V. Then T is a linear transformation and is usually called
the zero transformation, denoted 0.
Definition 4.1.5 (Identity Operator) Let V be a vector space over F and define T : V V by
T (v) = v for every v V. Then T is a linear transformation and is usually called the Identity
transformation, denoted I.
Definition 4.1.6 (Equality of two Linear Operators) Let V be a vector space and let T, S :
V V be a linear operators. The operators T and S are said to be equal if T (x) = S(x) for all
xV.
We now prove a result that relates a linear transformation T with its value on a basis of the
domain space.
Theorem 4.1.7 Let V and W be two vector spaces over F and let T : V W be a linear trans
formation. If B = u1 , . . . , un is an ordered basis of V then for each v V , the vector T (v) is
a linear combination of T (u1 ), . . . , T (un ) W. That is, we have full information of T if we know
T (u1 ), . . . , T (un ) W , the image of basis vectors in W .
Proof. As B is a basis of V, for every v V, we can find c1 , . . . , cn F such that v =
c1 u1 + + cn un , or equivalently [v]B = (1 , . . . , n )t . Hence, by definition
T (v) = T (c1 u1 + + cn un ) = c1 T (u1 ) + + cn T (un ).
That is, we just need to know the vectors T (u1 ), T (u2 ), . . . , T (un ) in W to get T (v) as [v]B =
(1 , . . . , n )t is known in V . Hence, the required result follows.
85
(b) T (A) = I + A
(c) T (A) = A2
(b) T 6= 0, S 6= 0, S T 6= 0, T S = 0.
(c) S 2 = T 2 , S 6= T.
(d) T 2 = I, T 6= I.
86
4.2
In the previous section, we learnt the definition of a linear transformation. We also saw in Example 4.1.2.5 that for each A Mmn (C), there exists a linear transformation TA : Cn Cm given
by TA (xt ) = Ax for each xt Cn . In this section, we prove that every linear transformation over
finite dimensional vector spaces corresponds to a matrix. Before proceeding further, we advise the
reader to recall the results on ordered basis, studied in Section 3.4.
Let V and W be finite dimensional vector spaces over F with dimensions n and m, respectively.
Also, let B1 = (v1 , . . . , vn ) and B2 = (w1 , . . . , wm ) be ordered bases of V and W , respectively.
If T : V W is a linear transformation then Theorem 4.1.7 implies that T (v) W is a linear
combination of the vectors T (v1 ), . . . , T (vn ). So, let us find the coordinate vectors [T (vj )]B2 for
each j = 1, 2, . . . , n. Let us assume that
[T (v1 )]B2 = (a11 , . . . , am1 )t , [T (v2 )]B2 = (a12 , . . . , am2 )t , . . . , [T (vn )]B2 = (a1n , . . . , amn )t .
Or equivalently,
T (vj ) = a1j w1 + a2j w2 + + amj wm =
m
X
aij wi for j = 1, 2, . . . , n.
!
n
n
n
m
m
n
X
X
X
X
X
X
T (x) = T
xj vj =
xj T (vj ) =
xj
aij wi =
aij xj wi .
j=1
j=1
j=1
a1j xj
a11
j=1
a21
..
[T (x)]B2 =
= .
n .
..
P
amj xj
am1
j=1
i=1
i=1
j=1
(4.2.1)
i=1
..
.
a1n
x1
x2
a2n
.. .. = A [x]B1 ,
. .
amn
xn
(4.2.2)
87
where
a11
a21
A= .
..
am1
a12
a22
..
.
am2
a1n
a2n
.. = [T (v1 )]B2 , [T (v2 )]B2 , . . . , [T (vn )]B2 .
.
amn
..
.
(4.2.3)
The above observations lead to the following theorem and the subsequent definition.
Theorem 4.2.1 Let V and W be finite dimensional vector spaces over F with dimensions n and
m, respectively. Let T : V W be a linear transformation. Also, let B1 and B2 be ordered bases
of V and W, respectively. Then there exists a matrix A Mmn (F), denoted A = T [B1 , B2 ], with
A = [T (v1 )]B2 , [T (v2 )]B2 , . . . , [T (vn )]B2 such that
[T (x)]B2 = A [x]B1 .
m
X
i=1
(4.2.4)
We now give a few examples to understand the above discussion and Theorem 4.2.1.
Q = ( sin , cos )
Q = (0, 1)
P = (x , y )
P = (cos , sin )
P = (1, 0)
P = (x, y)
Example 4.2.4
1. Let T : R2 R2 be a function that counterclockwise rotates every point in R2
by an angle , 0 < 2. Then using Figure 4.1 it can be checked that x = OP cos( + ) =
OP cos cos sin sin = x cos y sin and similarly y = x sin + y cos . Or equivalently, if B = (e1 , e2 ) is the standard ordered basis of R2 , then using T (1, 0) = (cos , sin )
and T (0, 1) = ( sin , cos ), we get
cos
T [B, B] = [T (1, 0)]B , [T (0, 1)]B =
sin
sin
.
cos
(4.2.5)
88
x+y
1
=
x 2y
1
2xy 1
2
= 3y
= 32
[T (x, y)]B1 = (x + y, x 2y) B1 =
[T (x, y)]B2 = (x + y, x 2y) B
1
x
and
2
y
x+y
3
23
2
xy
2
3. Let B1 = (1, 0, 0), (0, 1, 0), (0, 0, 1) and B2 = (1, 0), (0, 1) be ordered bases of R3 and R2 ,
respectively. Define T : R3 R2 by T (x, y, z) = (x + y z, x + z). Then
1 1 1
.
T [B1 , B2 ] = [(1, 0, 0)]B2 , [(0, 1, 0)]B2 , [(0, 0, 1)]B2 =
1 0 1
Check that [T (x, y, z)]B2 = (x + y z, x + z)t = T [B1, B2 ] [(x, y, z)]B1 .
4. Let B1 = (1, 0, 0), (0, 1, 0), (0, 0, 1) , B2 = (1, 0, 0), (1, 1, 0), (1, 1, 1) be two ordered bases of
R3 . Define T : R3 R3 by T (xt ) = x for all xt R3 . Then
[T (1, 0, 0)]B2
[T (0, 0, 1)]B2
[T (0, 1, 0)]B2
T [B2 , B1 ] =
1 1 0
[(1, 0, 0)t , (1, 1, 0)t , (0, 1, 1)t] = 0 1 1 ,
0 0
1
1 1 1
[[T (1, 0, 0)]B1 , [T (1, 1, 0)]B1 , [T (1, 1, 1)]B1 ] = 0 1 1 ,
0 0 1
Remark 4.2.5
1. Let V and W be finite dimensional vector spaces over F with order bases
B1 = (v1 , . . . , vn ) and B2 of V and W , respectively. If T : V W is a linear transformation
then
(a) T [B1 , B2 ] = [T (v1 )]B2 , [T (v2 )]B2 , . . . , [T (vn )]B2 .
89
(b) [T (x)]B2 = T [B1 , B2 ] [x]B1 for all x V . That is, the coordinate vector of T (x) W is
obtained by the matrix of the linear transformation with the coordinate vector of x V .
2. Let A Mmn (R). Then A induces a linear transformation TA : Rn Rm defined by
TA (xt ) = Ax for all xt Rn . Let B1 and B2 be the standard ordered bases of Rn and Rm ,
respectively. Then it can be easily verified that TA [B1 , B2 ] = A.
Exercise 4.2.6
1. Let T : R2 R2 be a function that reflects every point in R2 about the line
y = mx. Find its matrix with respect to the standard ordered basis of R2 .
2. Let T : R3 R3 be a function that reflects every point in R3 about the X-axis. Find its
matrix with respect to the standard ordered basis of R3 .
3. Let T : R3 R3 be a function that counterclockwise rotates every point in R3 around the
positive Z-axis by an angle , 0 < 2. Prove that T is a linear operator and find its
matrix with respect to the standard ordered basis of R3 .
4. Define a function D : Pn (R)Pn (R) by
D(a0 + a1 x + a2 x2 + + an xn ) = a1 + 2a2 x + + nan xn1 .
Prove that D is a linear operator and find the matrix of D with respect to the standard ordered
basis of Pn (R). Observe that the image of D is contained in Pn1 (R).
5. Let T be a linear operator in R2 satisfying T (3, 4) = (0, 1) and T (1, 1) = (2, 3). Let B =
(1, 0), (1, 1) be an ordered basis of R2 . Compute T [B, B].
6. For each linear transformation given in Example 4.1.2, find its matrix of the linear transform
with respect to standard ordered bases.
4.3
Rank-Nullity Theorem
We are now ready to related the rank-nullity theorem (see Theorem 3.3.25 on 75) with the ranknullity theorem for linear transformation. To do so, we first define the range space and the null
space of any linear transformation.
Definition 4.3.1 (Range Space and Null Space) Let V be finite dimensional vector space over
F and let W be any vector space over F. Then for a linear transformation T : V W , we define
1. C(T ) = {T (x) : x V } as the range space of T and
2. N (T ) = {x V : T (x) = 0} as the null space of T .
We now prove some results associated with the above definitions.
Proposition 4.3.2 Let V be a vector space over F with basis {v1 , . . . , vn }. Also, let W be a vector
spaces over F. Then for any linear transformation T : V W ,
1. C(T ) = L(T (v1 ), . . . , T (vn )) is a subspace of W and dim(C(T ) dim(W ).
2. N (T ) is a subspace of V and dim(N (T ) dim(V ).
3. The following statements are equivalent.
90
Remark 4.3.3
=
=
=
L (1, 0, 1, 2), (1, 1, 0, 5), (1, 1, 0, 5)
L (1, 0, 1, 2), (1, 1, 0, 5)
{(1, 0, 1, 2) + (1, 1, 0, 5) : , R}
{( + , , , 2 + 5) : , R}
{(x, y, z, w) R4 : x + y z = 0, 5y 2z + w = 0}
and
N (T ) =
=
=
=
=
Exercise 4.3.5
{(x, y, z) R3 : T (x, y, z) = 0}
{(x, y, z) R3 : (x y + z, y z, x, 2x 5y + 5z) = 0}
{(x, y, z) R3 : x y + z = 0, y z = 0,
{(x, y, z) R
{(0, y, y) R
x = 0, 2x 5y + 5z = 0}
: y z = 0, x = 0}
: y R} = L((0, 1, 1))
91
(c) prove that T 3 = T and find the matrix of T with respect to the standard basis.
4. Find a linear operator T : R3 R3 for which C(T ) = L (1, 2, 0), (0, 1, 1), (1, 3, 1) ?
5. Let {v1 , v2 , . . . , vn } be a basis of a vector space V (F). If W (F) is a vector space and
w1 , w2 , . . . , wn W then prove that there exists a unique linear transformation T : V W
such that T (vi ) = wi for all i = 1, 2, . . . , n.
We now state the rank-nullity theorem for linear transformation. The proof of this result is
similar to the proof of Theorem 3.3.25 and it also follows from Proposition 4.3.2. Hence, we omit
the proof.
Theorem 4.3.6 (Rank Nullity Theorem) Let V be a finite dimensional vector space and let
T : V W be a linear transformation. Then (T ) + (T ) = dim(V ). That is,
dim(R(T )) + dim(N (T )) = dim(V ).
Using the Rank-nullity theorem, we give a short proof of the following result.
Corollary 4.3.7 Let V be a finite dimensional vector space and let T : V V be a linear operator.
Then the following statements are equivalent.
1. T is one-one.
2. T is onto.
3. T is invertible.
Proof. By Proposition 4.3.2, T is one-one if and only if N (T ) = {0}. By Theorem 4.3.6 N (T ) =
{0} implies dim(C(T )) = dim(V ). Or equivalently, T is onto.
Now, we know that T is invertible if T is one-one and onto. But we have just shown that T is
one-one if and only if T is onto. Thus, we have the required result.
Remark 4.3.8 Let V be a finite dimensional vector space and let T : V V be a linear operator.
If either T is one-one or T is onto then T is invertible.
Theorem 4.3.9 Let V and W be finite dimensional vector spaces over F and let T : V W be a
linear transformation. Also assume that T is one-one and onto. Then
1. for each w W, the set T 1 (w) is a set consisting of a single element.
2. the map T 1 : W V defined by T 1 (w) = v whenever T (v) = w is a linear transformation.
92
Proof. Since T is onto, for each w W there exists v V such that T (v) = w. So, the set
T 1 (w) is non-empty.
Suppose there exist vectors v1 , v2 V such that T (v1 ) = T (v2 ). Then the assumption, T is
one-one implies v1 = v2 . This completes the proof of Part 1.
We are now ready to prove that T 1 , as defined in Part 2, is a linear transformation. Let
w1 , w2 W. Then by Part 1, there exist unique vectors v1 , v2 V such that T 1 (w1 ) = v1
and T 1 (w2 ) = v2 . Or equivalently, T (v1 ) = w1 and T (v2 ) = w2 . So, for any 1 , 2 F,
T (1 v1 + 2 v2 ) = 1 w1 + 2 w2 . Hence, by definition, for any 1 , 2 F, T 1 (1 w1 + 2 w2 ) =
1 v1 + 2 v2 = 1 T 1 (w1 ) + 2 T 1 (w2 ). Thus the proof of Part 2 is over.
Definition 4.3.10 (Inverse Linear Transformation) Let V and W be finite dimensional vector
spaces over F and let T : V W be a linear transformation. If the map T is one-one and onto,
then the map T 1 : W V defined by
T 1 (w) = v whenever T (v) = w
is called the inverse of the linear transformation T.
Example 4.3.11
1. Let T : R2 R2 be defined by T (x, y) = (x+y, xy). Then T 1 : R2 R2
xy
is defined by T 1 (x, y) = ( x+y
2 , 2 ). One can see that
T T 1 (x, y)
x+y xy
,
)
2
2
x+y xy x+y xy
= (
+
,
= T (T 1 (x, y)) = T (
where I is the identity operator. Hence, T T 1 = I. Verify that T 1 T = I. Thus, the map
T 1 is indeed the inverse of T.
2. For (a1 , . . . , an+1 ) Rn+1 , define the linear transformation T : Rn+1 Pn (R) by
T (a1 , a2 , . . . , an+1 ) = a1 + a2 x + + an+1 xn .
Then it can be checked that T 1 : Pn (R)Rn+1 is defined by T 1 (a1 + a2 x+ + an+1 xn ) =
(a1 , a2 , . . . , an+1 ) for all a1 + a2 x + + an+1 xn Pn (R).
Exercise 4.3.12
1. Let V be a finite dimensional vector space and let T : V W be a linear
transformation. Then
(a) prove that N (T ) and C(T ) are also finite dimensional.
(b) prove that
93
4.4
Similarity of Matrices
Let V be a finite dimensional vector space with ordered basis B. Then we saw that any linear
operator T : V V corresponds to a square matrix of order dim(V ) and this matrix was denoted
by T [B, B]. In this section, we will try to understand the relationship between T [B1 , B1 ] and
T [B2 , B2 ], where B1 and B2 are distinct ordered bases of V . This will enable us to understand the
reason for defining the matrix product somewhat differently.
Theorem 4.4.1 (Composition of Linear Transformations) Let V, W and Z be finite dimensional vector spaces with ordered bases B1 , B2 and B3 , respectively. Also, let T : V W and
S : W Z be linear transformations. Then the composition map S T : V Z (see Figure 4.2)
is a linear transformation and
T [B1 , B2 ]mn
(V, B1 , n)
S[B2 , B3 ]pm
(W, B2 , m)
(Z, B3 , p)
m
X
j=1
X
m
j=1
p
X
(T [B1 , B2 ])jt
p
X
k=1
(T [B1 , B2 ])jt vj
k=1
m
X
(T [B1 , B2 ])jt S(vj )
j=1
p
X
(S[B2 , B3 ])kj wk =
m
X
( (S[B2 , B3 ])kj (T [B1 , B2 ])jt )wk
k=1 j=1
S[B2 , B3 ] T [B1 , B2 ] 1t
T [B1 , B2 ]1t
S[B , B ] T [B , B ]
T [B1 , B2 ]2t
2
3
1
2 2t
= S[B2 , B3 ]
[(S T ) (ut )]B3 =
.
.
..
.
.
S[B2 , B3 ] T [B1, B2 ] pt
T [B1 , B2 ]pt
94
Hence, (S T )[B1 , B3 ] = [(S T )(u1 )]B3 , . . . , [(S T )(un )]B3 = S[B2 , B3 ] T [B1, B2 ] and the proof
of the theorem is over.
Proposition 4.4.2 Let V be a finite dimensional vector space and let T, S : V V be two linear
operators. Then (T ) + (S) (T S) max{(T ), (S)}.
Proof. We first prove the second inequality.
Suppose v N (S). Then (T S)(v) = T (S(v) = T (0) = 0 gives N (S) N (T S). Therefore,
(S) (T S).
We now use Theorem 4.3.6 to see that the inequality (T ) (T S) is equivalent to showing
C(T S) C(T ). But this holds true as C(S) V and hence T (C(S)) T (V ). Thus, the proof of
the second inequality is over.
For the proof of the first inequality, assume that k = (S) and {v1 , . . . , vk } is a basis of N (S).
Then {v1 , . . . , vk } N (T S) as T (0) = 0. So, let us extend it to get a basis {v1 , . . . , vk , u1 , . . . , u }
of N (T S).
Claim: {S(u1 ), S(u2 ), . . . , S(u )} is a linearly independent subset of N (T ).
It is easily seen that {S(u1 ), . . . , S(u )} is a subset of N (T ). So, let us solve the linear system
c1 S(u1 ) + c2 S(u2 ) + + c S(u ) = 0 in the unknowns c1 , c2 , . . . , c . This system is equivalent
P
P
to S(c1 u1 + c2 u2 + + c u ) = 0. That is,
ci ui N (S). Hence,
ci ui is a unique linear
i=1
i=1
c1 u1 + c2 u2 + + c u = 1 v1 + 2 v2 + + k vk
(4.4.1)
95
2. Let B = 1, x, x2 , x3 be an ordered basis of P3 (R). Define T : P3 (R)P3 (R) by
T (1) = 1, T (x) = 1 + x, T (x2 ) = (1 + x)2 and T (x3 ) = (1 + x)3 .
Prove that T is an invertible linear operator. Also, find T [B, B] and T 1 [B, B].
We end this section with definition, results and examples related with the notion of isomorphism.
The result states that for each fixed positive integer n, every real vector space of dimension n is
isomorphic to Rn and every complex vector space of dimension n is isomorphic to Cn .
Definition 4.4.6 (Isomorphism) Let V and W be two vector spaces over F. Then V is said to
be isomorphic to W if there exists a linear transformation T : V W that is one-one, onto and
invertible. We also denote it by V
= W.
Theorem 4.4.7 Let V be a vector space over R. If dim(V ) = n then V
= Rn .
Proof. Let B be the standard ordered basis of Rn and let B1 = v1 , . . . , vn be an ordered basis
of V . Define a map T : V Rn by T (vi ) = ei for 1 i n. Then it can be easily verified that T
is a linear transformation that is one-one, onto and invertible (the image of a basis vector is a basis
vector). Hence, the result follows.
A similar idea leads to the following result and hence we omit the proof.
Theorem 4.4.8 Let V be a vector space over C. If dim(V ) = n then V
= Cn .
Example 4.4.9
1. The standard ordered basis of Pn (C) is given by 1, x, x2 , . . . , xn . Hence,
define T : Pn (C)Cn+1 by T (xi ) = ei+1 for 0 i n. In general, verify that T (a0 +
a1 x + + an xn ) = (a0 , a1 , . . . , an ) and T is linear transformation which is one-one, onto
and invertible. Thus, the vector space Pn (C)
= Cn+1 .
2. Let V = {(x, y, z, w) R4 : x y + z w = 0}. Suppose that B is the standard ordered basis of
R3 and B1 = (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1) is the ordered basis of V . Then T : V R3
defined by T (v) = T (y z +w, y, z, w) = (y, z, w) is a linear transformation and T [B1 , B] = I3 .
Thus, T is one-one, onto and invertible.
4.5
Change of Basis
Let V be a vector space with ordered bases B1 = (u1 , . . . , un ) and B2 = (v1 , . . . , vn }. Also, recall
that the identity linear operator I : V V is defined by I(x) = x for every x V. If
a11
a
21
I[B2 , B1 ] = [I(v1 )]B1 , [I(v2 )]B1 , . . . , [I(vn )]B1 = .
..
an1
then by definition of I[B2 , B1 ], we see that vi = I(vi ) =
n
P
j=1
a12
a22
..
.
an2
..
.
a1n
a2n
..
.
ann
proved the following result which also appeared in another form in Theorem 3.4.5.
96
Theorem 4.5.1 (Change of Basis Theorem) Let V be a finite dimensional vector space with
ordered bases B1 = (u1 , u2 , . . . , un } and B2 = (v1 , v2 , . . . , vn }. Suppose x V with [x]B1 =
(1 , 2 , . . . , n )t and [x]B2 = (1 , 2 , . . . , n )t . Then [x]B1 = I[B2 , B1 ] [x]B2 . Or equivalently,
1
a11
2 a21
. = .
.. ..
n
an1
a12
a22
..
.
..
.
an2
a1n
a2n
..
.
ann
1
2
. .
..
n
Remark 4.5.2 Observe that the identity linear operator I : V V is invertible and hence by Theorem 4.4.4 I[B2 , B1 ]1 = I 1 [B1 , B2 ] = I[B1 , B2 ]. Therefore, we also have [x]B2 = I[B1 , B2 ] [x]B1 .
Let V be a finite dimensional vector space with ordered bases B1 and B2 . Then for any linear
operator T : V V the next result relates T [B1, B1 ] and T [B2 , B2 ].
Theorem 4.5.3 Let B1 = (u1 , . . . , un ) and B2 = (v1 , . . . , vn ) be two ordered bases of a vector
space V . Also, let A = [aij ] = I[B2 , B1 ] be the matrix of the identity linear operator. Then for any
linear operator T : V V
T [B2 , B2 ] = A1 T [B1 , B1 ] A = I[B1 , B2 ] T [B1 , B1 ] I[B2 , B1 ].
(4.5.2)
Proof. The proof uses Theorem 4.4.1 by representing T [B1 , B2 ] as (IT )[B1 , B2 ] and (T I)[B1 , B2 ],
where I is the identity operator on V (see Figure 4.3). By Theorem 4.4.1, we have
T [B1 , B1 ]
(V, B1 )
(V, B1 )
I T
I[B1 , B2 ]
I[B1 , B2 ]
T I
(V, B2 )
(V, B2 )
T [B2 , B2 ]
Figure 4.3: Commutative Diagram for Similarity of Matrices
T [B1 , B2 ] =
Thus, using I[B2 , B1 ] = I[B1 , B2 ]1 , we get I[B1 , B2 ] T [B1 , B1 ] I[B2 , B1 ] = T [B2 , B2 ] and the result
follows.
Let T : V V be a linear operator on V . If dim(V ) = n then each ordered basis B of V
gives rise to an n n matrix T [B, B]. Also, we know that for any vector space we have infinite
number of choices for an ordered basis. So, as we change an ordered basis, the matrix of the linear
transformation changes. Theorem 4.5.3 tells us that all these matrices are related by an invertible
matrix (see Remark 4.5.2). Thus we are led to the following remark and the definition.
Remark 4.5.4 The Equation (4.5.2) shows that T [B2 , B2 ] = I[B1 , B2 ] T [B1 , B1 ] I[B2 , B1 ]. Hence,
the matrix I[B1 , B2 ] is called the B1 : B2 change of basis matrix.
97
Definition 4.5.5 (Similar Matrices) Two square matrices B and C of the same order are said to
be similar if there exists a non-singular matrix P such that P 1 BP = C or equivalently BP = P C.
Example 4.5.6
1. Let B1 = 1 + x, 1 + 2x + x2, 2 + x and B2 = 1, 1 + x, 1 + x + x2 be ordered
bases of P2 (R). Then I(a + bx + cx2 ) = a + bx + cx2 . Thus,
1 1 2
I[B2 , B1 ] = [[1]B1 , [1 + x]B1 , [1 + x + x2 ]B1 ] = 0 0 1
1 0 1
0 1
I[B1 , B2 ] = [[1 + x]B2 , [1 + 2x + x2 ]B2 , [2 + x]B2 ] = 1 1
0 1
and
1
1 .
0
0 0 2
4/5 1 8/5
0 1 1
1 1 4 and T [B2 , B2 ] = 2/5 2 9/5 . Also, check that I[B2 , B1 ] = 2
1 0
0 1 0
8/5 0 1/5
1 1 1
and
2 2 2
T [B1 , B1 ] I[B2 , B1 ] = I[B2 , B1 ] T [B2 , B2 ] = 2 4
5 .
2
1
0
Exercise 4.5.7
1. Let V be an n-dimensional vector space and let T : V V be a linear
operator. Suppose T has the property that T n1 6= 0 but T n = 0.
(a) Prove that there exists u V with {u, T (u), . . . , T n1 (u)}, a
0 0
1 0
0 1
(b) For B = u, T (u), . . . , T n1 (u) prove that T [B, B] =
.
.
.
0 0
basis of V.
0
0
0
..
..
.
1
0
0
0 .
..
.
0
(c) Let A be an n n matrix satisfying An1 6= 0 but An = 0. Then prove that A is similar
to the matrix given in Part 1b.
98
4.6
Summary
Chapter 5
Introduction
In the previous chapters, we learnt about vector spaces and linear transformations that are maps
(functions) between vector spaces. In this chapter, we will start with the definition of inner product
that helps us to view vector spaces geometrically.
5.2
100
i,j=1
i,j=1
1. Verify that inner products defined in Examples 3 4, are indeed inner products.
2. Let hx, yi = 0 for every vector y of an inner product space V . prove that x = 0.
Definition 5.2.5 (Length/Norm of a Vector) Let V be a vector
p space. Then for any vector
u V, we define the length (norm) of u, denoted kuk, by kuk = hu, ui, the positive square root.
A vector of norm 1 is called a unit vector.
Example 5.2.6
1. Let V be an inner product space and u V . Then for any scalar , it is
easy to verify that kuk = kuk.
101
4. Prove that for any two continuous functions f (x), g(x) C([1, 1]), the map hf (x), g(x)i =
R1
1 f (x) g(x)dx defines an inner product in C([1, 1]).
5. Fix an ordered basis B = (u1 , . . . , un ) of a complex vector space V . Prove that h , i defined
n
P
by hu, vi =
ai bi , whenever [u]B = (a1 , . . . , an )t and [v]B = (b1 , . . . , bn )t is indeed an inner
i=1
product in V .
A very useful and a fundamental inequality concerning the inner product is due to Cauchy and
Schwarz. The next theorem gives the statement and a proof of this inequality.
Theorem 5.2.8 (Cauchy-Schwarz inequality) Let V (F) be an inner product space. Then for
any u, v V
|hu, vi| kuk kvk.
(5.2.1)
Equality holds in Equation (5.2.1) if and only if the vectors u and v are linearly dependent. Furu
u
i
.
thermore, if u 6= 0, then v = hv,
kuk kuk
Proof. If u = 0, then the inequality (5.2.1) holds trivially. Hence, let u 6= 0. Also, by the third
hv, ui
,
property of inner product, hu + v, u + vi 0 for all F. In particular, for =
kuk2
0
=
=
Or, in other words |hv, ui|2 kuk2 kvk2 and the proof of the inequality is over.
If u 6= 0 then hu + v, u + vi = 0 if and only of u + v = 0. Hence, equality holds in (5.2.1) if
D
E
hv, ui
u
u
and only if =
. That is, u and v are linearly dependent and in this case v = v, kuk
kuk .
2
kuk
Let V be a real vector space. Then for every u, v V, the Cauchy-Schwarz inequality (see
hu,vi
(5.2.1)) implies hat 1 kuk
kvk 1. Also, we know that cos : [0, ] [1, 1] is an one-one and
onto function. We use this idea, to relate inner product with the angle between two vectors in an
inner product space V .
Definition 5.2.9 (Angle between two vectors) Let V be a vector space and let u, v V . Suppose is the angle between u, v. We define
cos =
hu, vi
.
kuk kvk
hu, vi
is called the angle between
kuk kvk
102
b
A
b 2 + c2 a2
.
2bc
Proof. Let the coordinates of the vertices A, B and C be 0, u and v, respectively. Then
~ = u, AC
~ = v and BC
~ = v u. Thus, we need to prove that
AB
cos(A) =
Now, using the properties of an inner product and Definition 5.2.9, it follows that
kvk2 + kuk2 kv uk2 = 2 hu, vi = 2 kvkkuk cos(A).
Thus, the required result follows.
Definition 5.2.11 (Orthogonal Complement) Let W be a subset of a vector space V with inner
product h , i. Then the orthogonal complement of W in V , denoted W , is defined by
W = {v V : hv, wi = 0 for all w W }.
Exercise 5.2.12 Let W be a subset of a vector space V . Then prove that W is a subspace of V .
Example 5.2.13
1. Let R4 be endowed with the standard inner product. Fix two vectors ut =
t
(1, 1, 1, 1), v = (1, 1, 1, 0) R4 . Determine two vectors z and w such that u = z + w, z is
parallel to v and w is orthogonal to v.
Solution: Let zt = kvt = (k, k, k, 0) for some k R and let wt = (a, b, c, d). As w is
orthogonal to v, hw, vi = 0 and hence a + b c = 0. Thus, c = a + b and
(1, 1, 1, 1) = ut = zt + wt = (k, k, k, 0) + (a, b, a + b, d).
Comparing the corresponding coordinates, we get
d = 1, a + k = 1, b + k = 1 and a + b k = 1.
Solving for a, b and k gives a = b =
1
3 (2, 2, 4, 3).
2
3
and k =
1
3.
Thus, zt =
1
3 (1, 1, 1, 0)
and wt =
2. Let R3 be endowed with the standard inner product and let P = (1, 1, 1), Q = (2, 1, 3) and
R = (1, 1, 2) be three vertices of a triangle in R3 . Compute the angle between the sides P Q
103
and P R.
Solution: Method 1: The sides are represented by the vectors
~ = (3, 0, 1).
P~Q = (2, 1, 3) (1, 1, 1) = (1, 0, 2), P~R = (2, 0, 1) and RQ
.
2
We end this section by stating and proving the fundamental theorem of linear algebra. To do
this, recall that for a matrix A Mn (C), A denotes the conjugate transpose of A, N (A) = {v
Cn : Av = 0} denotes the null space of A and R(A) = {Av : v Cn } denotes the range space of A.
The readers are also advised to go through Theorem 3.3.25 (the rank-nullity theorem for matrices)
before proceeding further as the first part is stated and proved there.
Theorem 5.2.14 (Fundamental Theorem of Linear Algebra) Let A be an n n matrix with
complex entries and let N (A) and R(A) be defined as above. Then
1. dim(N (A)) + dim(R(A)) = n.
2. N (A) = R(A ) and N (A ) = R(A) .
3. dim(R(A)) = dim(R(A )).
1. Answer the following questions when R3 is endowed with the standard inner
(a) Let ut = (1, 1, 1). Find vectors v, w R3 that are orthogonal to u and to each other.
(b) Find the equation of the line that passes through the point (1, 1, 1) and is parallel to the
vector (a, b, c) 6= (0, 0, 0).
104
(e) Find the equation of the plane that contains the point (2, 2, 1) and is perpendicular to
the line with parametric equations x = t 1, y = 3t + 2, z = t + 1.
(f ) Let P = (3, 0, 2), Q = (1, 2, 1) and R = (2, 1, 1) be three points in R3 .
i. Find the area of the triangle with vertices P, Q and R.
~
ii. Find the area of the parallelogram built on vectors P~Q and QR.
iii. Find a nonzero vector orthogonal to the triangle with vertices P, Q and R.
~ with kxk = 2.
iv. Find all vectors x orthogonal to P~Q and QR
v. Choose one of the vectors x found in part 1(f )iv. Find the volume of the paral~ and x. Do you think the volume would be
lelepiped built on vectors P~Q and QR
different if you choose the other vector x?
(g) Find the equation of the plane that contains the lines (x, y, z) = (1, 2, 2) + t(1, 1, 0) and
(x, y, z) = (1, 2, 2) + t(0, 1, 2).
(h) Let ut = (1, 1, 1) and vt = (1, k, 1). Find k such that the angle between u and v is
/3.
= (2, 1, 1) as its
(i) Let p1 be a plane that passes through the point A = (1, 2, 3) and has n
normal vector. Then
i. find the equation of the plane p2 which is parallel to p1 and passes through the point
(1, 2, 3).
(j) In the parallelogram ABCD, ABkDC and ADkBC and A = (2, 1, 3), B = (1, 2, 2), C =
(3, 1, 5). Find
i. the coordinates of the point D,
ii. the cosine of the angle BCD.
iii. the area of the triangle ABC
iv. the volume of the parallelepiped determined by the vectors AB, AD and the vector
(0, 0, 7).
(k) Find the equation of a plane that contains the point (1, 1, 2) and is orthogonal to the line
with parametric equation x = 2 + t, y = 3 and z = 1 t.
(l) Find a parametric equation of a line that passes through the point (1, 2, 1) and is orthogonal to the plane x + 3y + 2z = 1.
2. Let {et1 , et2 , . . . , etn } be the standard basis of Rn . Then prove that with respect to the standard
inner product on Rn , the vectors ei satisfy the following:
(a) kei k = 1 for 1 i n.
105
106
(c) hx, yi = 0 whenever kx + yk2 = kxk2 + kyk2 and kx + iyk2 = kxk2 + kiyk2 .
15. Let h , i denote the standard inner product on Cn (C) and let A Mn (C). That is, hx, yi =
x y for all xt , yt Cn . Prove that hAx, yi = hx, A yi for all x, y Cn .
16. Let (V, h , i) be an n-dimensional inner product space and let u V be a fixed vector with
kuk = 1. Then give reasons for the following statements.
(a) Let S = {v V : hv, ui = 0}. Then dim(S ) = n 1.
5.2.1
We start this subsection with the definition of an orthonormal set. Then a theorem is proved that
implies that the coordinates of a vector with respect to an orthonormal basis are just the inner
products with the basis vectors.
Definition 5.2.16 (Orthonormal Set) Let S = {v1 , v2 , . . . , vn } be a set of non-zero, mutually
orthogonal vectors in an inner product space V . Then S is called an orthonormal set if kvi k = 1
for 1 i n. If S is also a basis of V then S is called an orthonormal basis of V.
Example 5.2.17
1. Consider R2 with the standard inner product. Then a few orthonormal sets
in R2 are (1, 0), (0, 1) , 12 (1, 1), 12 (1, 1) and 15 (2, 1), 15 (1, 2) .
2. Let Rn be endowed with the standard inner product. Then by Exercise 5.2.15.2, the standard
ordered basis (et1 , et2 , . . . , etn ) is an orthonormal set.
Theorem 5.2.18 Let V be an inner product space and let {u1 , u2 , . . . , un } be a set of non-zero,
mutually orthogonal vectors of V.
1. Then the set {u1 , u2 , . . . , un } is linearly independent.
2. Let v =
n
P
i=1
3. Let v =
n
P
i=1
i ui V . Then kvk2 = k
n
P
i=1
i ui k2 =
n
P
i=1
|i |2 kui k2 ;
n
X
i=1
n
X
i=1
|hv, ui i|2 .
(5.2.2)
in the unknowns c1 , c2 , . . . , cn . As h0, ui = 0 for each u V and huj , ui i = 0 for all j 6= i, we have
0 = h0, ui i = hc1 u1 + c2 u2 + + cn un , ui i =
n
X
j=1
cj huj , ui i = ci hui , ui i.
107
As ui 6= 0, hui , ui i 6= 0 and therefore ci = 0 for 1 i n. Thus, the linear system (5.2.2) has only
the trivial solution. Hence, the proof of Part 1 is complete.
For Part 2, we use a similar argument to get
k
n
X
i=1
i ui k
=
=
n
X
i=1
n
X
i ui ,
i=1
n
X
i=1
n
X
j=1
i ui
j hui , uj i =
n
X
i=1
n
X
i=1
ui ,
n
X
j uj
j=1
i i hui , ui i =
n
X
i=1
|i |2 kui k2 .
DP
E
Pn
n
Note that hv, ui i =
u
,
u
=
j
j
i
j=1
j=1 j huj , ui i = j . Thus, the proof of Part 3 is
complete.
Part 4 directly follows using Part 3 as the set {u1 , u2 , . . . , un } is a basis of V. Therefore, we
have obtained the required result.
In view of Theorem 5.2.18, we inquire into the question of extracting an orthonormal basis from a
given basis. In the next section, we describe a process (called the Gram-Schmidt Orthogonalization
process) that generates an orthonormal set from a given set containing finitely many vectors.
Remark 5.2.19 The last two parts of Theorem 5.2.18 can be rephrased as follows:
Let B = v1 , . . . , vn be an ordered orthonormal basis of an inner product space V and let u V .
Then
[u]B = (hu, v1 i, hu, v2 i, . . . , hu, vn i)t .
Exercise 5.2.20
1. Let B =
Also, compute [(x, y)]B .
2. Let B = 13 (1, 1, 1), 12 (1, 1, 0), 16 (1, 1, 2), be an ordered basis of R3 . Determine [(2, 3, 1)]B .
Also, compute [(x, y, z)]B .
3. Let ut = (u1 , u2 , u3 ), vt = (v1 , v2 , v3 ) be two vectors in R3 . Then recall that their cross
product, denoted u v, equals
ut vt = (u2 v3 u3 v2 , u3 v1 u1 v3 , u1 v2 u2 v1 ).
Use this to find an orthonormal basis of R3 containing the vector
1 (1, 2, 1).
6
4. Let ut = (1, 1, 2). Find vectors vt , wt R3 such that v and w are orthogonal to u and to
each other as well.
5. Let A be an n n orthogonal matrix. Prove that the rows/columns of A form an orthonormal
basis of Rn .
6. Let A be an n n unitary matrix. Prove that the rows/columns of A form an orthonormal
basis of Cn .
7. Let {ut1 , ut2 , . . . , utn } be an orthonormal basis of Rn . Prove that the n n matrix A =
[u1 , u2 , . . . , un ] is an orthogonal matrix.
108
5.3
Suppose we are given two non-zero vectors u and v in a plane. Then in many instances, we need
to decompose the vector v into two components, say y and z, such that y is a vector parallel to u
and z is a vector perpendicular (orthogonal) to u. We do this as follows (see Figure 5.3):
u
=
is a unit vector in the direction of u. Also, using trigonometry, we know that
Let u
. Then u
kuk
~
~ = kOP
~ k cos(). Or using Definition 5.2.9,
cos() = kOQk and hence kOQk
~ k
kOP
u
u
kuk i kuk
and z = v hv,
u
u
kuk i kuk .
R
~ =v
OR
~ =
OQ
hv,ui
kuk2 u
hv,ui
kuk2 u
u
u
kuk i kuk k.
v
is parallel to v
kvk2
1
1
~ = (1, 1, 1, 1)t ~z = (2, 2, 4, 3)t.
(1, 1, 1, 0)t and w
3
3
109
Note that Proju (w) is parallel to u and Projv2 (w) is parallel to v2 . Hence, we have
u
1
1
t
(a) w1 = Proju (w) = hw, ui kuk
2 = 4 u = 4 (1, 1, 1, 1) is parallel to u,
(c) w3 = w w1 w2 =
7
t
44 (3, 3, 5, 1)
3
t
11 (1, 1, 2, 4)
is parallel to v2 and
That is, from the given vector subtract all the orthogonal components that are obtained as
orthogonal projections. If this new vector is non-zero then this vector is orthogonal to the
previous ones.
Theorem 5.3.2 (Gram-Schmidt Orthogonalization Process) Let V be an inner product space.
Suppose {u1 , u2 , . . . , un } is a set of linearly independent vectors in V. Then there exists a set
{v1 , v2 , . . . , vn } of vectors in V satisfying the following:
1. kvi k = 1 for 1 i n,
2. hvi , vj i = 0 for 1 i 6= j n and
3. L(v1 , v2 , . . . , vi ) = L(u1 , u2 , . . . , ui ) for 1 i n.
Proof. We successively define the vectors v1 , v2 , . . . , vn as follows.
u1
Step 1: v1 =
.
ku1 k
Step 2: Calculate w2 = u2 hu2 , v1 iv1 , and let v2 =
w2
.
kw2 k
w3
.
kw3 k
(5.3.2)
As the set {u1 , u2 , . . . , un } is linearly independent, it can be verified that kwi k 6= 0 and hence
wi
we define vi = kw
.
ik
We prove this by induction on n, the number of linearly independent vectors. For n = 1, v1 =
As u is an element of a linearly independent set, u1 6= 0 and thus v1 6= 0 and
kv1 k2 = hv1 , v1 i = h
u1
ku1 k .
u1
u1
hu1 , u1 i
,
i=
= 1.
ku1 k ku1 k
ku1 k2
110
1. kvi k = 1 for 1 i n 1,
2. hvi , vj i = 0 for 1 i 6= j n 1, and
3. L(v1 , v2 , . . . , vi ) = L(u1 , u2 , . . . , ui ) for 1 i n 1.
Using (5.3.2), we define
wn = un hun , v1 iv1 hun , v2 iv2 hun , vn1 ivn1 .
(5.3.3)
wn
We first show that wn 6 L(v1 , v2 , . . . , vn1 ). This will imply that wn 6= 0 and hence vn = kw
nk
is well defined. Also, kvn k = 1.
On the contrary, assume that wn L(v1 , v2 , . . . , vn1 ). Then, by definition, there exist scalars
1 , 2 , . . . , n1 , not all zero, such that
wn = 1 v1 + 2 v2 + + n1 vn1 .
So, substituting 1 v1 + 2 v2 + + n1 vn1 for wn in (5.3.3), we get
un = 1 + hun , v1 i v1 + 2 + hun , v2 i v2 + + ( n1 + hun , vn1 i vn1 .
That is, un L(v1 , v2 , . . . , vn1 ). But L(v1 , . . . , vn1 ) = L(u1 , . . . , un1 ) using the third induction assumption. Hence un L(u1 , . . . , un1 ). A contradiction to the given assumption that the
set of vectors {u1 , . . . , un } is linearly independent.
Also, it can be easily verified that hvn , vi i = 0 for 1 i n 1. Hence, by the principle of
mathematical induction, the proof of the theorem is complete.
We illustrate the Gram-Schmidt process by the following example.
Example 5.3.3
1. Let {(1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1)} R4 . Find a set {v1 , v2 , v3 } that is
orthonormal and L( (1, 1, 1, 1), (1, 0, 1, 0), (0, 1, 0, 1) ) = L(v1t , v2t , v3t ).
Solution: Let ut1 = (1, 0, 1, 0), ut2 = (0, 1, 0, 1) and ut3 = (1, 1, 1, 1). Then v1t = 12 (1, 0, 1, 0).
Also, hu2 , v1 i = 0 and hence w2 = u2 . Thus, v2t = 12 (0, 1, 0, 1) and
w3 = u3 hu3 , v1 iv1 hu3 , v2 iv2 = (0, 1, 0, 1)t.
Therefore, v3t =
1 (0, 1, 0, 1).
2
Solution: Let (x, y, z) R3 with (1, 2, 1), (x, y, z) = 0. Then x + 2y + z = 0 or equivalently,
x = 2y z. Thus,
(x, y, z) = (2y z, y, z) = y(2, 1, 0) + z(1, 0, 1).
Observe that the vectors (2, 1, 0) and (1, 0, 1) are both orthogonal to (1, 2, 1) but are not
orthogonal to each other.
Method 1: Consider { 16 (1, 2, 1), (2, 1, 0), (1, 0, 1)} R3 and apply the Gram-Schmidt
process to get the result.
Method 2: This method can be used only if the vectors are from R3 . Recall that in R3 , the
cross product of two vectors u and v, denoted u v, is a vector that is orthogonal to both the
vectors u and v. Hence, the vector
(1, 2, 1) (2, 1, 0) = (0 1, 2 0, 1 + 4) = (1, 2, 5)
is orthogonal to the vectors (1, 2, 1) and (2, 1, 0) and hence the required orthonormal set is
1
{ 16 (1, 2, 1),
(2, 1, 0), 1
(1, 2, 5)}.
5
30
111
(a) Let {u1 , u2 , . . . , uk } be a linearly independent subset of V. Then Gram-Schmidt orthogonalization process gives an orthonormal set {v1 , v2 , . . . , vk } of V with
L(v1 , v2 , . . . , vi ) = L(u1 , u2 , . . . , ui ) for 1 i k.
(b) Let W be a subspace of V with a basis {u1 , u2 , . . . , uk }. Then {v1 , v2 , . . . , vk } is also a
basis of W .
(c) Suppose {u1 , u2 , . . . , un } is a linearly dependent subset of V . Then there exists a smallest
k, 2 k n such that wk = 0.
Idea of the proof: Linear dependence (see Corollary 3.2.5) implies that there exists a
smallest k, 2 k n such that
L(u1 , u2 , . . . , uk ) = L(u1 , u2 , . . . , uk1 ).
Also, by Gram-Schmidt orthogonalization process
L(u1 , u2 , . . . , uk1 ) = L(v1 , v2 , . . . , vk1 ).
Thus, uk L(v1 , v2 , . . . , vk1 ) and hence by Remark 5.2.19
uk = huk , v1 iv1 + huk , v2 iv2 + + huk , vk1 ivk1 .
So, by definition wk = 0.
2. Let S be a countably infinite set of linearly independent vectors. Then one can apply the
Gram-Schmidt process to get a countably infinite orthonormal set.
3. Let B = (e1 , e2 , . . . , en ) be the standard ordered basis of Rn . Suppose Rn has an orthonormal
set {v1 , v2 , . . . , vn }. Then we can find real numbers ij , 1 i, j n such that
[vi ]B = (1i , 2i , . . . , ni )t for i = 1, 2, . . . , n.
11 12 1n
21 22 2n
Ann = .
..
.. .
..
..
.
.
.
n1
n2
nn
0 = hvi , vj i =
Note that,
At A =
v1t
v2t
.
..
vnt
1 0
0 1
. .
.. ..
0 0
[v1 , v2 , . . . , vn ]
..
.
0
0
.. = In .
.
n
P
s=1
si sj .
n
P
j=1
2ji ,
(5.3.4)
kv1 k2
hv1 , v2 i
hv2 , v1 i
kv2 k2
=
..
..
..
.
.
.
hvn , v1 i hvn , v2 i
hv1 , vn i
hv2 , vn i
..
.
2
kvn k
112
11 21 n1
11 12
12 22 n2 21 22
At A = .
..
.. ..
..
.
.
.
.
.
.
. .
.
n1 n2
1n 2n nn
..
.
1n
2n
.. = In .
.
nn
Recall that the matrix A is an orthogonal matrix. So, we see that the rows/columns of an
n n orthogonal matrix gives rise to an orthonormal basis of Rn . Using a similar idea, one
can show that the rows/columns of an n n unitary matrix gives rise to an orthonormal basis
of the complex vector space Cn .
Definition 5.2.11 started with a subspace of an inner product space V and looked at its complement. Be now look at the orthogonal complement of a subset of an inner product space V and the
associated results.
Definition 5.3.5 (Orthogonal Subspace of a Set) Let V be an inner product space. Let S be
a non-empty subset of V . We define
S = {v V : hv, si = 0 for all s S}.
Example 5.3.6 Let V = R.
1. S = {0}. Then S = R.
2. S = R, Then S = {0}.
3. Let S be any subset of R containing a non-zero real number. Then S = {0}.
4. Let S = {(1, 2, 1)} R3 . Then using Example 5.3.3.2, S = L({(2, 1, 0), (1, 0, 1)}).
We now state the result which gives the existence of an orthogonal subspace of a finite dimensional inner product space.
Theorem 5.3.7 Let S be a subset of a finite dimensional inner product space V, with inner product
h , i. Then
1. S is a subspace of V.
2. Let W = L(S). Then the subspaces W and S = W are complementary. That is, V =
W + S = W + W .
3. Moreover, hu, wi = 0 for all w W and u S .
Proof. We leave the prove of the first part to the reader. The prove of the second part is as
follows:
Let dim(V ) = n and dim(W ) = k. Let {w1 , w2 , . . . , wk } be a basis of W. By Gram-Schmidt
orthogonalization process, we get an orthonormal basis, say {v1 , v2 , . . . , vk } of W. Then, for any
v V,
k
X
v
hv, vi ivi S .
i=1
113
3. Let A and B be two n n orthogonal matrices. Then prove that AB and BA are both
orthogonal matrices. Prove a similar result for unitary matrices.
4. Prove the statements made in Remark 5.3.4.3 about orthogonal matrices. State and prove a
similar result for unitary matrices.
5. Let A be an n n upper triangular matrix. If A is also an orthogonal matrix then prove that
A = In .
6. Determine an orthonormal basis of R4 containing the vectors (1, 2, 1, 3) and (2, 1, 3, 1).
7. Consider the real inner product space C[1, 1] with hf, gi =
R1
12. Let v, w Rn , n 1 with kuk = kwk = 1. Prove that there exists an orthogonal matrix A
such that Av = w. Prove also that A can be chosen such that det(A) = 1.
114
5.4
Recall that given a k-dimensional vector subspace of a vector space V of dimension n, one can
always find an (n k)-dimensional vector subspace W0 of V (see Exercise 3.3.13.5) satisfying
W + W0 = V
and W W0 = {0}.
The subspace W0 is called the complementary subspace of W in V. We first use Theorem 5.3.7 to get
the complementary subspace in such a way that the vectors in different subspaces are orthogonal.
That is, hw, vi = 0 for all w W and v W0 . We then use this to define an important class of
linear transformations on an inner product space, called orthogonal projections.
Definition 5.4.1 (Orthogonal Complement and Orthogonal Projection) Let W be a subspace of a finite dimensional inner product space V .
1. Then W is called the orthogonal complement of W in V. We represent it by writing V =
W W in place of V = W + W .
2. Also, for each v V there exist unique vectors w W and u W such that v = w + u.
We use this to define
PW : V V by PW (v) = w.
Then PW is called the orthogonal projection of V onto W .
Exercise 5.4.2 Let W be a subspace of a finite dimensional inner product space V . Use V =
W W to define the orthogonal projection operator PW of V onto W . Prove that the maps
PW and PW are indeed linear transformations.
Example 5.4.3
1. Let V = R3 and W = {(x, y, z) R3 : x + y z = 0}. Then it can be easily
verified that {(1, 1, 1)} is a basis of W as for each (x, y, z) W , we have x + y z = 0
and hence
h(x, y, z), (1, 1, 1)i = x + y z = 0 for each (x, y, z) W.
Also, using Equation (5.3.1), for every xt = (x, y, z) R3 , we have u =
, x+2y+z
, x+y+2z
) and x = w + u. Let
w = ( 2xy+z
3
3
3
x+yz
(1, 1, 1),
3
2 1 1
1
1 1
1
1
A=
1 2 1 and B = 1
1 1 .
3
3
1
1 2
1 1 1
Then by definition, PW (x) = w = Ax and PW (x) = u = Bx. Observe that A2 = A, B 2 = B,
At = A, B t = B, A B = 03 , B A = 03 and A + B = I3 , where 03 is the zero matrix of size
3 3 and I3 is the identity matrix of size 3. Also, verify that rank(A) = 2 and rank(B) = 1.
2. Let W = L( (1, 2, 1) ) R3 . Then using Example 5.3.3.2, and Equation (5.3.1), we get
W = L({(2, 1, 0), (1, 0, 1)}) = L({(2, 1, 0), (1, 2, 5)}),
u = ( 5x2yz
, 2x+2y2z
, x2y+5z
) and w = x+2y+z
(1, 2, 1) with (x, y, z) = w + u. Hence,
6
6
6
6
for
1 2 1
5 2 1
1
1
A = 2 4 2 and B = 2 2 2 ,
6
6
1 2 1
1 2 5
115
Let x R(TA ) . Then 0 = hx, TA (y)i = hTA (x), yi for every y Rn . Hence, by
Exercise 2 TA (x) = 0. That is, x N (A) and hence R(TA ) N (TA ).
We now state and prove the main result related with orthogonal projection operators.
Theorem 5.4.6 Let W be a vector subspace of a finite dimensional inner product space V and let
PW : V V be the orthogonal projection operator of V onto W .
1. Then N (PW ) = {v V : PW (v) = 0} = W = R(PW ).
2. Then R(PW ) = {PW (v) : v V } = W = N (PW ).
3. Then PW PW = PW , PW PW = PW .
116
4. Let 0V denote the zero operator on V defined by 0V (v) = 0 for all v V . Then PW PW =
0V and PW PW = 0V .
5. Let IV denote the identity operator on V defined by IV (v) = v for all v V . Then IV =
PW PW , where we have written instead of + to indicate the relationship PW PW = 0V
and PW PW = 0V .
6. The operators PW and PW are self-adjoint.
Proof. Part 1: Let u W . As V = W W , we have u = 0 + u for 0 W and u W .
Hence by definition, PW (u) = 0 and PW (u) = u. Thus, W N (PW ) and W R(PW ).
Also, suppose that v N (PW ) for some v V . As v has a unique expression as v = w + u
for some w W and some u W , by definition of PW , we have PW (v) = w. As v N (PW ), by
definition, PW (v) = 0 and hence w = 0. That is, v = u W . Thus, N (PW ) W .
One can similarly show that R(PW ) W . Thus, the proof of the first part is complete.
Part 2: Similar argument as in the proof of Part 1.
Part 3, Part 4 and Part 5: Let v V and let v = w + u for some w W and u W .
Then by definition,
(PW PW )(v) = PW PW (v) = PW (w) = w & PW (v) = w,
(5.4.1)
(PW PW )(v) = PW PW (v) = PW (w) = 0 and
(5.4.2)
(PW PW )(v)
(5.4.3)
Hence, applying Exercise 2 to Equations (5.4.1), (5.4.2) and (5.4.3), respectively, we get PW PW =
PW , PW PW = 0V and IV = PW PW .
Part 6: Let u = w1 + x1 and v = w2 + x2 , where w1 , w2 W and x1 , x2 W . Then, by
definition hwi , xj i = 0 for 1 i, j 2. Thus,
hPW (u), vi = hw1 , vi = hw1 , w2 i = hu, w2 i = hu, PW (v)i
and the proof of the theorem is complete.
The next theorem is a generalization of Theorem 5.4.6 when a finite dimensional inner product
space V can be written as V = W1 W2 Wk , where Wi s are vector subspaces of V . That
is, for each v V there exist unique vectors v1 , v2 , . . . , vk such that
1. vi Wi for 1 i k,
2. hvi , vj i = 0 for each vi Wi , vj Wj , 1 i 6= j k and
3. v = v1 + v2 + + vk .
We omit the proof as it basically uses arguments that are similar to the arguments used in the
proof of Theorem 5.4.6.
Theorem 5.4.7 Let V be a finite dimensional inner product space and let W1 , W2 , . . . , Wk be vector
subspaces of V such that V = W1 W2 Wk . Then for each i, j, 1 i 6= j k, there exist
orthogonal projection operators PWi : V V of V onto Wi satisfying the following:
1. N (PWi ) = Wi = W1 W2 Wi1 Wi+1 Wk .
2. R(PWi ) = Wi .
3. PWi PWi = PWi .
117
4. PWi PWj = 0V .
5. PWi is a self-adjoint operator, and
6. IV = PW1 PW2 PWk .
Remark 5.4.8
and equality holds if and only if w = PW (v). Since PW (v) W, we see that
d(v, W ) = inf {kv wk : w W } = kv PW (v)k.
That is, PW (v) is the vector nearest to v W. This can also be stated as: the vector
PW (v) solves the following minimization problem:
inf kv wk = kv PW (v)k.
wW
Exercise 5.4.9
1. Let A Mn (R) be an idempotent matrix and define TA : Rn Rn by
TA (v) = Av for all vt Rn . Recall the following results from Exercise 4.3.12.5.
(a) TA TA = TA
(b) N (TA ) R(TA ) = {0}.
The linear map TA need not be an orthogonal projection operator as R(TA ) need not be
equal to N (TA ). Here TA is called a projection operator of Rn onto R(TA ) along N (TA ).
118
5.4.1
The minimization problem stated above arises in lot of applications. So, it is very helpful if the
matrix of the orthogonal projection can be obtained under a given basis.
i=1
n
a
a
a
22
2k
P
21
..
vi =
aji ej , for 1 i k then Ank = .
, [v]B2 =
.
.
.
.
.
.
.
.
.
j=1
.
.
.
n
P
ani hv, vi i
an1 an2 ank
i=1
k
P
i=1 a1i hv, vi i
..
and [PW (v)]B2 =
. Also, as observed in Remark 5.3.4.3, At A = Ik . That is, for
.
P
ani hv, vi i
1 i, j k,
i=1
n
X
s=1
asi asj =
1
0
if i = j
if i =
6 j.
(5.4.4)
119
AAt [v]B2
a11
a12
A .
..
a1k
..
.
an1
an2
..
.
ank
a1i hv, vi i
i=1
P
n
a2i hv, vi i
i=1
..
P
n
ani hv, vi i
i=1
n
n
n n
P
P
P P
s=1 as1 i=1 asi hv, vi i
i=1 s=1 as1 asi hv, vi i
n
n
n n
P P
..
..
.
.
n
n
n
n
P P
P
ask
asi hv, vi i
ask asi hv, vi i
s=1
a21
a22
..
.
a2k
P
n
i=1
i=1
k
P
s=1
a hv, vi i
i=1 1i
k
hv, v1 i
P
hv, v2 i
a2i hv, vi i
A . = i=1
..
.
..
hv, vk i
P
k
ani hv, vi i
i=1
[PW (v)]B2 = PW [B2 , B2 ] [v]B2 .
Thus PW [B2 , B2 ] = AAt and hence we have proved the following theorem.
Theorem 5.4.10 Let W be a k-dimensional subspace of Rn and let PW be the corresponding orthogonal projection of Rn onto W . Also assume that B = (v1 , v2 , . . . , vk ) is an orthonormal ordered
basis of W. Then the matrix of PW in the standard ordered basis (e1 , e2 , . . . , en ) of Rn is AAt (a
symmetric matrix), where A = [v1 , v2 , . . . , vk ] is an n k matrix.
We illustrate the above theorem with the help of an example. One can also see Example 5.4.3.
Example 5.4.11 Let W = {(x, y, z, w) R4 : x = y, z = w} be a subspace of W. Then an orthonor
mal ordered basis of W and W is 12 (1, 1, 0, 0), 12 (0, 0, 1, 1) and 12 (1, 1, 0, 0), 12 (0, 0, 1, 1) ,
respectively. Let PW : R4 R4 be an orthogonal projection of R4 onto W . Then
1
2
1
2
where B =
A=
0
0
1
0
2
1
0
and PW [B, B] = AA = 2
1
0
2
0
1
2
1
2
1
2
0
0
1
2
1
2
0
0
1. PW [B, B] is symmetric,
2
0
0
1 ,
2
1
2
. Verify that
120
x+y
, z+w
, xy
, zw
2
2
2
2
t
and hence
x+y
z+w
(1, 1, 0, 0) +
(0, 0, 1, 1)
PW (x, y, z, w) =
2
2
2. Let W be a subspace of an inner product space V and let P : V V be the orthogonal projection of V onto W . Let B be an orthonormal ordered basis of V. Then prove that
t
(P [B, B]) = P [B, B].
3. Let W1 = {(x, 0) : x R} and W2 = {(x, x) : x R} be two subspaces of R2 . Let PW1 and
PW2 be the corresponding orthogonal projection operators of R2 onto W1 and W2 , respectively.
Compute PW1 PW2 and conclude that the composition of two orthogonal projections need not
be an orthogonal projection?
4. Let W be an (n 1)-dimensional subspace of Rn . Suppose B is an orthogonal ordered basis
of Rn obtained by extending an orthogonal ordered basis of W. Define
T : Rn Rn by T (v) = w0 w
whenever v = w + w0 for some w W and w0 W . Then
(a) prove that T is a linear transformation,
(b) find T [B, B] and
5.5
QR Decomposition
The next result gives the proof of the QR decomposition for real matrices. A similar result holds
for matrices with complex entries. The readers are advised to prove that for themselves. This decomposition and its generalizations are helpful in the numerical calculations related with eigenvalue
problems (see Chapter 6).
Theorem 5.5.1 (QR Decomposition) Let A be a square matrix of order n with real entries.
Then there exist matrices Q and R such that Q is orthogonal and R is upper triangular with
A = QR.
In case, A is non-singular, the diagonal entries of R can be chosen to be positive. Also, in this
case, the decomposition is unique.
Proof. We prove the theorem when A is non-singular. The proof for the singular case is left as
an exercise.
Let the columns of A be x1 , x2 , . . . , xn . Then {x1 , x2 , . . . , xn } is a basis of Rn and hence
the Gram-Schmidt orthogonalization process gives an ordered basis (see Remark 5.3.4), say B =
(v1 , v2 , . . . , vn ) of Rn satisfying
L(v1 , v2 , . . . , vi ) = L(x1 , x2 , . . . , xi ),
for 1 i 6= j n.
(5.5.5)
kvi k = 1, hvi , vj i = 0,
5.5. QR DECOMPOSITION
121
11
0
QR =
12
22
..
.
0
..
.
(5.5.6)
1n
2n
11 12 1n
0 22 2n
[v1 , v2 , . . . , vn ] .
..
..
..
..
.
.
.
0
0 nn
n
X
11 v1 , 12 v1 + 22 v2 , . . . ,
in vi
i=1
[x1 , x2 , . . . , xn ] = A.
Thus, we see that A = QR, where Q is an orthogonal matrix (see Remark 5.3.4.1) and R is an
upper triangular matrix.
The proof doesnt guarantee that for 1 i n, ii is positive. But this can be achieved by
replacing the vector vi by vi whenever ii is negative.
1
Uniqueness: suppose Q1 R1 = Q2 R2 then Q1
2 Q1 = R2 R1 . Observe the following properties
of upper triangular matrices.
1. The inverse of an upper triangular matrix is also an upper triangular matrix, and
2. product of upper triangular matrices is also upper triangular.
Thus the matrix R2 R11 is an upper triangular matrix. Also, by Exercise 5.3.8.3, the matrix Q1
2 Q1
1
is an orthogonal matrix. Hence, by Exercise 5.3.8.5, R2 R1 = In . So, R2 = R1 and therefore
Q2 = Q1 .
Let A = [x1 , x2 , . . . , xk ] be an n k matrix with rank (A) = r. Then by Remark 5.3.4.1c, the
Gram-Schmidt orthogonalization process applied to {x1 , x2 , . . . , xk } yields a set {v1 , v2 , . . . , vr } of
orthonormal vectors of Rn and for each i, 1 i r, we have
L(v1 , v2 , . . . , vi ) = L(x1 , x2 , . . . , xj ), for some j, i j k.
Hence, proceeding on the lines of the above theorem, we have the following result.
Theorem 5.5.2 (Generalized QR Decomposition) Let A be an n k matrix of rank r. Then
A = QR, where
1. Q = [v1 , v2 , . . . , vr ] is an n r matrix with Qt Q = Ir ,
2. L(v1 , v2 , . . . , vr ) = L(x1 , x2 , . . . , xk ), and
3. R is an r k matrix with rank (R) = r.
122
1 0 1 2
0 1 1 1
Example 5.5.3
1. Let A =
1 0 1 1 . Find an orthogonal matrix Q and an upper tri0 1 1 1
angular matrix R such that A = QR.
Solution: From Example 5.3.3, we know that
1
1
1
v1 = (1, 0, 1, 0), v2 = (0, 1, 0, 1) and v3 = (0, 1, 0, 1).
2
2
2
(5.5.7)
1
(1, 0, 1, 0)t.
2
(5.5.8)
Thus, using Equations (5.5.7), (5.5.8) and Q = v1 , v2 , v3 , v4 , we get
1
0
0
2 0
2 32
2
2
1
0
0
2 0
2
0
2
2
Q= 1
and
R
=
0
0
2 0
0
2
0
2
1
0
0
0
1
1
0
0
2
2
2
that A = QR is indeed correct.
1 1 1 0
1 0 2 1
t
2. Let A =
1 1 1 0 . Find a 43 matrix Q satisfying Q Q = I3 and an upper triangular
1 0 2 1
matrix R such that A = QR.
Solution: Let us apply the Gram Schmidt orthogonalization process to the columns of A.
That is, apply the process to the subset {(1, 1, 1, 1), (1, 0, 1, 0), (1, 2, 1, 2), (0, 1, 0, 1)} of R4 .
Let u1 = (1, 1, 1, 1). Define v1 = 12 u1 . Let u2 = (1, 0, 1, 0). Then
w2 = (1, 0, 1, 0) hu2 , v1 iv1 = (1, 0, 1, 0) v1 =
1
(1, 1, 1, 1).
2
1 (0, 1, 0, 1).
2
Hence,
Q = [v1 , v2 , v3 ] =
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2 ,
1
2
2 1
and R = 0 1
0 0
3
0
1 0 .
0
2
Chapter 6
In this chapter, the linear transformations are from the complex vector space Cn to itself. Observe
that in this case, the matrix of the linear transformation is an n n matrix. So, in this chapter,
all the matrices are square matrices and a vector x means x = (x1 , x2 , . . . , xn )t for some positive
integer n.
Example 6.1.1 Let A be a real symmetric matrix. Consider the following problem:
Maximize (Minimize) xt Ax such that x Rn and xt x = 1.
To solve this, consider the Lagrangian
L(x, ) = xt Ax (xt x 1) =
n X
n
X
i=1 j=1
n
X
aij xi xj (
x2i 1).
i=1
L
= 2an1 x1 + 2an2 x2 + + 2ann xn 2xn .
xn
Therefore, to get the points of extremum, we solve for
(0, 0, . . . , 0)t = (
L L
L t
L
,
,...,
) =
= 2(Ax x).
x1 x2
xn
x
We therefore need to find a R and 0 6= x Rn such that Ax = x for the extremal problem.
Let A be a matrix of order n. In general, we ask the question:
For what values of F, there exist a non-zero vector x Fn such that
Ax = x?
123
(6.1.1)
124
Here, Fn stands for either the vector space Rn over R or Cn over C. Equation (6.1.1) is equivalent
to the equation
(A I)x = 0.
By Theorem 2.4.1, this system of linear equations has a non-zero solution, if
rank (A I) < n,
or equivalently
det(A I) = 0.
So, to solve (6.1.1), we are forced to choose those values of F for which det(AI) = 0. Observe
that det(A I) is a polynomial in of degree n. We are therefore lead to the following definition.
Definition 6.1.2 (Characteristic Polynomial, Characteristic Equation) Let A be a square
matrix of order n. The polynomial det(A I) is called the characteristic polynomial of A and is
denoted by pA () (in short, p(), if the matrix A is clear from the context). The equation p() = 0 is
called the characteristic equation of A. If F is a solution of the characteristic equation p() = 0,
then is called a characteristic value of A.
Some books use the term eigenvalue in place of characteristic value.
Theorem 6.1.3 Let A Mn (F). Suppose = 0 F is a root of the characteristic equation. Then
there exists a non-zero v Fn such that Av = 0 v.
Proof. Since 0 is a root of the characteristic equation, det(A 0 I) = 0. This shows that the
matrix A 0 I is singular and therefore by Theorem 2.4.1 the linear system
(A 0 In )x = 0
has a non-zero solution.
Remark 6.1.4 Observe that the linear system Ax = x has a solution x = 0 for every F.
So, we consider only those x Fn that are non-zero and are also solutions of the linear system
Ax = x.
Definition 6.1.5 (Eigenvalue and Eigenvector) Let A Mn (F) and let the linear system Ax =
x has a non-zero solution x Fn for some F. Then
1. F is called an eigenvalue of A,
2. x Fn is called an eigenvector corresponding to the eigenvalue of A, and
3. the tuple (, x) is called an eigen-pair.
Remark 6.1.6 To understand the difference between a characteristic value and an eigenvalue, we
give the following
example.
0 1
Let A =
. Then pA () = 2 + 1. Also, define the linear operator TA : F2 F2 by
1 0
TA (x) = Ax for every x F2 .
1. Suppose F = C, i.e., A M2 (C). Then the roots of p() = 0 in C are i. So, A has (i, (1, i)t )
and (i, (i, 1)t ) as eigen-pairs.
2. If A M2 (R), then p() = 0 has no solution in R. Therefore, if F = R, then A has no
eigenvalue but it has i as characteristic values.
125
Remark 6.1.7
1. Let A Mn (F). Suppose (, x) is an eigen-pair of A. Then for any c
F, c 6= 0, (, cx) is also an eigen-pair for A. Similarly, if x1 , x2 , . . . , xr are linearly indepenr
P
dent eigenvectors of A corresponding to the eigenvalue , then
ci xi is also an eigenvector
i=1
1 1
2. Let A =
. Then p() = (1 )2 . Hence, the characteristic equation has roots 1, 1. That
0 1
is, 1 is a repeated eigenvalue. But the system (A I2 )x = 0 for x = (x1 , x2 )t implies that
x2 = 0. Thus, x = (x1 , 0)t is a solution of (A I2 )x = 0. Hence using Remark 6.1.7.1, (1, 0)t
is an eigenvector. Therefore, note that 1 is a repeated eigenvalue whereas there is
only one eigenvector.
1 0
3. Let A =
. Then p() = (1 )2 . Again, 1 is a repeated root of p() = 0. But in this
0 1
case, the system (A I2 )x = 0 has a solution for every xt R2 . Hence, we can choose
any two linearly independent vectors xt , yt from R2 to get (1, x) and (1, y) as the
two eigen-pairs. In general, if x1 , x2 , . . . , xn Rn are linearly independent vectors then
(1, x1 ), (1, x2 ), . . . , (1, xn ) are eigen-pairs of the identity matrix, In .
1 2
. Then p() = (3)(+1) and its roots are 3, 1. Verify that the eigen-pairs
2 1
are (3, (1, 1)t ) and (1, (1, 1)t). The readers are advised to prove the linear independence of
the two eigenvectors.
1 1
5. Let A =
. Then p() = 2 2 + 2 and its roots are 1 + i, 1 i. Hence, over R, the
1 1
matrix A has no eigenvalue. Over C, the reader is required to show that the eigen-pairs are
(1 + i, (i, 1)t ) and (1 i, (1, i)t ).
4. Let A =
Exercise 6.1.9
2. Find
eigen-pairs
C, for each
over
of the following matrices:
1
1+i
i
1+i
cos sin
cos
,
,
and
1i
1
1 + i
i
sin cos
sin
sin
.
cos
126
n
P
j=1
5. Prove that the matrices A and At have the same set of eigenvalues. Construct a 2 2 matrix
A such that the eigenvectors of A and At are different.
6. Let A be a matrix such that A2 = A (A is called an idempotent matrix). Then prove that its
eigenvalues are either 0 or 1 or both.
7. Let A be a matrix such that Ak = 0 (A is called a nilpotent matrix) for some positive integer
k 1. Then prove that its eigenvalues are all 0.
2 1
2 i
8. Compute the eigen-pairs of the matrices
and
.
1 0
i 0
Theorem 6.1.10 Let A = [aij ] be an n n matrix with eigenvalues 1 , 2 , . . . , n , not necessarily
n
n
n
P
P
Q
i .
distinct. Then det(A) =
i and tr(A) =
aii =
i=1
i=1
i=1
(6.1.2)
n
Y
i =
i=1
n
Y
i .
i=1
Also,
a11
a21
det(A In ) = .
..
an1
a12
a22
..
..
.
.
an2
= a0 a1 + 2 a2 +
a1n
a2n
..
.
ann
(6.1.3)
(6.1.4)
for some a0 , a1 , . . . , an1 F. Note that an1 , the coefficient of (1)n1 n1 , comes from the
product
(a11 )(a22 ) (ann ).
n
P
So, an1 =
aii = tr(A) by definition of trace.
i=1
(1)n ( 1 )( 2 ) ( n ).
(6.1.5)
n
X
i=1
i } =
n
X
i .
i=1
127
Exercise 6.1.11
1. Let A be a skew symmetric matrix of order 2n + 1. Then prove that 0 is an
eigenvalue of A.
2. Let A be a 3 3 orthogonal matrix (AAt = I). If det(A) = 1, then prove that there exists a
non-zero vector v R3 such that Av = v.
Let A be an n n matrix. Then in the proof of the above theorem, we observed that the
characteristic equation det(A I) = 0 is a polynomial equation of degree n in . Also, for some
numbers a0 , a1 , . . . , an1 F, it has the form
n + an1 n1 + an2 2 + a1 + a0 = 0.
Note that, in the expression det(A I) = 0, is an element of F. Thus, we can only substitute
by elements of F.
It turns out that the expression
An + an1 An1 + an2 A2 + a1 A + a0 I = 0
holds true as a matrix identity. This is a celebrated theorem called the Cayley Hamilton Theorem.
We state this theorem without proof and give some implications.
Theorem 6.1.12 (Cayley Hamilton Theorem) Let A be a square matrix of order n. Then A
satisfies its characteristic equation. That is,
An + an1 An1 + an2 A2 + a1 A + a0 I = 0
holds true as a matrix identity.
Some of the implications of Cayley Hamilton Theorem are as follows.
0 1
Remark 6.1.13
1. Let A =
. Then its characteristic polynomial is p() = 2 . Also,
0 0
for the function, f (x) = x, f (0) = 0, and f (A) = A 6= 0. This shows that the condition
f () = 0 for each eigenvalue of A does not imply that f (A) = 0.
2. Suppose we are given a square matrix A of order n and we are interested in calculating A
where is large compared to n. Then we can use the division algorithm to find numbers
0 , 1 , . . . , n1 and a polynomial f () such that
f () n + an1 n1 + an2 2 + a1 + a0
+0 + 1 + + n1 n1 .
128
1 n1
[A
+ an1 An2 + + a1 I].
an
Exercise 6.1.14 Find inverse of the following matrices by using the Cayley Hamilton Theorem
2 3
i) 5 6
1 1
4
7
2
1
ii) 1
0
1 1
1 1
1 1
1 2 1
iii) 2 1 1 .
0 1 2
Theorem 6.1.15 If 1 , 2 , . . . , k are distinct eigenvalues of a matrix A with corresponding eigenvectors x1 , x2 , . . . , xk , then the set {x1 , x2 , . . . , xk } is linearly independent.
Proof. The proof is by induction on the number m of eigenvalues. The result is obviously true
if m = 1 as the corresponding eigenvector is non-zero and we know that any set containing exactly
one non-zero vector is linearly independent.
Let the result be true for m, 1 m < k. We prove the result for m+1. We consider the equation
c1 x1 + c2 x2 + + cm+1 xm+1 = 0
(6.1.6)
=
=
=
(6.1.7)
(c) if A is nonsingular then BA1 and A1 B have the same set of eigenvalues.
6.2. DIAGONALIZATION
129
(e) det(A + xyt ) equals (1 + 0 ) det(A) for any R. This result is known as the
Shermon-Morrison formula for determinant.
(b) Give an example to show that if u and v are linearly independent then 1 + 2 need not
be an eigenvalue of A + B.
6.2
c1
c2
cn
u1 + u2 + +
un .
1
2
n
Diagonalization
Let A Mn (F) and let TA : Fn Fn be the corresponding linear operator. In this section, we ask
the question does there exist a basis B of Fn such that TA [B, B], the matrix of the linear operator
TA with respect to the ordered basis B, is a diagonal matrix. it will be shown that for a certain
class of matrices, the answer to the above question is in affirmative. To start with, we have the
following definition.
Definition 6.2.1 (Matrix Digitalization) A matrix A is said to be diagonalizable if there exists
a non-singular matrix P such that P 1 AP is a diagonal matrix.
Remark 6.2.2 Let A Mn (F) be a diagonalizable matrix with eigenvalues 1 , 2 , . . . , n . By definition, A is similar to a diagonal matrix D = diag(1 , 2 , . . . , n ) as similar matrices have the
same set of eigenvalues and the eigenvalues of a diagonal matrix are its diagonal entries.
Example 6.2.3 Let A =
0 1
1 0
130
1. Let V = R2 . Then A has no real eigenvalue (see Example 6.1.7 and hence A doesnt have
eigenvectors that are vectors in R2 . Hence, there does not exist any non-singular 2 2 real
matrix P such that P 1 AP is a diagonal matrix.
2. In case, V = C2 (C), the two complex eigenvalues of A are i, i and the corresponding eigenvectors are (i, 1)t and (i, 1)t , respectively. Also, (i, 1)t and (i, 1)t can be taken as a basis
i i
1
2
of C (C). Define U = 2
. Then
1 1
U AU =
i
0
0
i
Theorem 6.2.4 Let A Mn (R). Then A is diagonalizable if and only if A has n linearly independent eigenvectors.
Proof. Let A be diagonalizable. Then there exist matrices P and D such that
P 1 AP = D = diag(1 , 2 , . . . , n ).
Or equivalently, AP = P D. Let P = [u1 , u2 , . . . , un ]. Then AP = P D implies that
Aui = di ui for 1 i n.
Since ui s are the columns of a non-singular matrix P, using Corollary 4.3.7, they form a linearly
independent set. Thus, we have shown that if A is diagonalizable then A has n linearly independent
eigenvectors.
Conversely, suppose A has n linearly independent eigenvectors ui , 1 i n with eigenvalues
i . Then Aui = i ui . Let P = [u1 , u2 , . . . , un ]. Since u1 , u2 , . . . , un are linearly independent, by
Corollary 4.3.7, P is non-singular. Also,
AP
1 0
0
0 2 0
[u1 , u2 , . . . , un ] . .
= P D.
. . ...
..
0
0 n
6.2. DIAGONALIZATION
131
exactly mi linearly independent eigenvectors. Thus, for each i, 1 i k, the homogeneous linear
system (A i I)x = 0 has exactly mi linearly independent vectors in its solution set. Therefore,
dim ker(A i I) mi . Indeed dim ker(A i I) = mi for 1 i k follows from a simple
counting argument.
Now suppose that for each i, 1 i k, dim ker(A i I) = mi . Then for each i, 1 i k,
we can choose mi linearly independent eigenvectors. Also by Corollary 6.1.16, the eigenvectors
k
P
corresponding to distinct eigenvalues are linearly independent. Hence A has n =
mi linearly
independent eigenvectors. Hence by Theorem 6.2.4, A is diagonalizable.
i=1
2 1 1
Example 6.2.7
1. Let A = 1 2 1 . Then pA () = (2 )2 (1 ). Hence, the eigen0 1 1
values of A are 1, 2, 2. Verify that 1, (1, 0, 1)t and ( 2, (1, 1, 1)t are the only eigen-pairs.
That is, the matrix A has exactly one eigenvector corresponding to the repeated eigenvalue 2.
Hence, by Theorem 6.2.4, A is not diagonalizable.
2 1 1
2. Let A = 1 2 1 . Then pA () = (4 )(1 )2 . Hence, A has eigenvalues 1, 1, 4.
1 1 2
Verify that u1 = (1, 1, 0)t and u2 = (1, 0, 1)t are eigenvectors corresponding to 1 and
u3 = (1, 1, 1)t is an eigenvector corresponding to the eigenvalue 4. As u1 , u2 , u3 are linearly
independent, by Theorem 6.2.4, A is diagonalizable.
Note that the vectors u1 and u2 (corresponding to the eigenvalue 1) are not orthogonal. So,
in place of u1 , u2 , we will take the orthogonal
vectors u2 and
w = 2u1 u2 as eigenvectors.
Now define U = [ 13 u3 , 12 u2 , 16 w] =
1
3
1
3
1
3
1
2
12
1
6
2
6
1
6
Observe that A is a symmetric matrix. In this case, we chose our eigenvectors to be mutually
orthogonal. This result is true for any real symmetric matrix A. This result will be proved
later.
cos sin
cos
sin
Exercise 6.2.8
1. Are the matrices A =
and B =
for some
sin cos
sin cos
, 0 2, diagonalizable?
2. Find the eigen-pairs of A = [aij ]nn , where aij = a if i = j and b, otherwise.
A 0
3. Let A Mn (R) and B Mm (R). Suppose C =
. Then prove that C is diagonalizable
0 B
if and only if both A and B are diagonalizable.
4. Let T : R5 R5 be a linear operator with rank (T I) = 3 and
N (T ) = {(x1 , x2 , x3 , x4 , x5 ) R5 | x1 + x4 + x5 = 0, x2 + x3 = 0}.
(a) Determine the eigenvalues of T ?
132
5. Let A be a non-zero square matrix such that A2 = 0. Prove that A cannot be diagonalized.
[Hint: Use Remark 6.2.2.]
6. Are the following matrices diagonalizable?
1 3 2 1
1 0 1
1
0 2 3 1
, ii) 0 0 1 , iii) 0
i)
0 0 1 1
0 2 0
0
0 0 0 4
6.3
3 3
2 i
.
5 6 and iv)
i 0
3 4
Diagonalizable Matrices
In this section, we will look at some special classes of square matrices that are diagonalizable. Recall
t
that for a matrix A = [aij ], A = [aji ] = At = A , is called the conjugate transpose of A. We also
recall the following definitions.
Definition 6.3.1 (Special Matrices)
1. Let B =
i 1
. Then B is skew-Hermitian.
1 i
1 i
1 1
2. Let A =
and B =
. Then A is a unitary matrix and B is a normal matrix.
i 1
1 1
Definition 6.3.3 (Unitary Equivalence) Let A, B Mn (C). They are called unitarily equivalent if there exists a unitary matrix U such that A = U BU. As U is a unitary matrix, U = U 1 .
Hence, A is also unitarily similar to B.
Exercise 6.3.4
1. Let A be a square matrix such that U AU is a diagonal matrix for some
unitary matrix U . Prove that A is a normal matrix.
133
1 1 4
2 1 3 2
A = 0 2 2 and B = 0 1
2 .
0 0 3
0 0
3
U1 AU1
x
u2
U1 [Ax, Au2 , , Auk ] = . [1 x, Au2 , , Auk ]
..
uk
1 x x
u2 (1 x)
..
.
uk (1 x)
..
.
x Auk
u2 (Auk )
=
..
.
uk (Auk )
1
0
..
.
0
134
=
=
=
1 0
1 0
U1
A U1
0 U2
0 U2
1 0
1 0
1 0
1 0
U A U1
=
U1 AU1
0 U2 1
0 U2
0 U2
0 U2
1 0
1 0 1 0
1
0
1 0
=
=
.
0 U2 0 B 0 U2
0 U2 BU2
0 D2
135
4 4
10
9
4. Show that the matrices A =
and B =
are similar. Is it possible to find a
0 4
4 2
unitary matrix U such that A = U BU ?
5. Let A be a 2 2 orthogonal matrix. Then prove the following:
cos sin
(a) if det(A) = 1, then A =
for some , 0 < 2. That is, A countersin cos
clockwise rotates every point in R2 by an angle .
cos
sin
(b) if det A = 1, then A =
for some , 0 < 2. That is, A reflects evsin cos
ery point in R2 about a line passing through origin. Determine
thisline. Or equivalently,
1 0
1
there exists a non-singular matrix P such that P AP =
.
0 1
6. Let A be a 3 3 orthogonal matrix. Then prove the following:
(a) if det(A) = 1, then A is a rotation about a fixed axis, in the sense that A has an eigenpair (1, x) such that the restriction of A to the plane x is a two dimensional rotation
in x .
(b) if det A = 1, then A corresponds to a reflection through a plane P, followed by a rotation
about the line through origin that is orthogonal to P.
2 1 1
7. Let A = 1 2 1 . Find a non-singular matrix P such that P 1 AP = diag (4, 1, 1). Use
1 1 2
this to compute A301 .
8. Let A be a Hermitian matrix. Then prove that rank(A) equals the number of non-zero eigenvalues of A.
Remark 6.3.8 Let A and B be the 2 2 matrices in Exercise 6.3.7.4. Then A and B were similar
matrices but they were not unitarily equivalent. In numerical calculations, unitary transformations
are preferred as compared to similarity transformations due to the following main reasons:
1. Exercise 6.3.7.3d implies that an orthonormal change of basis does not alter the sum of squares
of the absolute values of the entries. This need not be true under a non-singularity change of
basis.
2. For a unitary matrix U, U 1 = U and hence unitary equivalence is computationally simpler.
3. Also there is no round-off error in the operation of conjugate transpose.
We next prove the Schurs Lemma and use it to show that normal matrices are unitarily diagonalizable. The proof is similar to the proof of Theorem 6.3.5. We give it again so that the readers
have a better understanding of unitary transformations.
Lemma 6.3.9 (Schurs Lemma) Let A Mn (C). Then A is unitarily similar to an upper triangular matrix.
Proof. We will prove the result by induction on n. The result is clearly true for n = 1. Let the
result be true for n = k 1. we need to prove the result for n = k.
136
Let (1 , x) be an eigen-pair of a k k matrix A with kxk = 1. Let us extend the set {x}, a
linearly independent set, to form an orthonormal basis {x, u2 , u3 , . . . , uk } (using Gram-Schmidt
Orthogonalization) of Ck . Then U1 = [x u2 uk ] is a unitary matrix and
u2
U1 AU1 = U1 [Ax Au2 Auk ] = . [1 x Au2 Auk ] =
..
uk
1
0
..
.
0
2 1
1 1 1
2
1 1
0
2. Let A = 0 2 1, B = 0 1
0 and U = 12 1 1 0 . Prove that A and B
0 0 3
0 0
3
0 0
2
are unitarily equivalent via the unitary matrix U . Hence, conclude that the upper triangular
matrix obtained in the Schurs Lemma need not be unique.
We end this chapter with an application of the theory of diagonalization to the study of conic
sections in analytic geometry and the study of maxima and minima in analysis.
6.4
137
Definition 6.4.1 (Bilinear Form) Let A be an n n real symmetric matrix. A bilinear form in
x = (x1 , x2 , . . . , xn )t , y = (y1 , y2 , . . . , yn )t is an expression of the type
n
X
Q(x, y) = yt Ax =
aij xi yj .
i,j=1
n
X
aij xi yj .
i,j=1
Observe that if A = In then the bilinear (sesquilinear) form reduces to the standard real (complex) inner product. Also, it can be easily seen that H(x, y) is linear in x, the first component
and conjugate linear in y, the second component. The expression Q(x, x) is called the quadratic
form and H(x, x) the Hermitian form. We generally write Q(x) and H(x) in place of Q(x, x) and
H(x, x), respectively. It can be easily shown that for any choice of x, the Hermitian form H(x) is
a real number. Hence, for any real number , the equation H(x) = , represents a conic in Cn .
1
Example 6.4.3 Let A =
2+i
H(x) =
=
=
2i
. Then A = A and for x = (x1 , x2 ) ,
2
x Ax = (x1 , x2 )
1
2+i
2i
2
x1
x2
where Re denotes the real part of a complex number. This shows that for every choice of x the
Hermitian form is always real. Why?
The main idea of this section is to express H(x) as sum of squares and hence determine the
possible values that it can take. Note that if we replace x by cx, where c is any complex number,
then H(x) simply gets multiplied by |c|2 and hence one needs to study only those x for which
kxk = 1, i.e., x is a normalized vector.
Let A = A Mn (C). Then by Theorem 6.3.5, the eigenvalues i , 1 i n, of A are real
and there exists a unitary matrix U such that U AU = D diag(1 , 2 , . . . , n ). Now define,
z = (z1 , z2 , . . . , zn ) = U x. Then kzk = 1, x = U z and
H(x) = z U AU z = z Dz =
n
X
i=1
p
r
2
p
2
X
X
p
i |zi | =
|i | zi
|i | zi .
2
i=1
(6.4.1)
i=p+1
Thus, the possible values of H(x) depend only on the eigenvalues of A. Since U is an invertible
matrix, the components zi s of z = U x are commonly known as linearly independent linear forms.
Also, note that in Equation (6.4.1), the number p (respectively r p) seems to be related to the
number of eigenvalues of A that are positive (respectively negative). This is indeed true. That is,
in any expression of H(x) as a sum of n absolute squares of linearly independent linear forms, the
number p (respectively r p) gives the number of positive (respectively negative) eigenvalues of A.
This is stated as the next lemma and it popularly known as the Sylvesters law of inertia.
138
Lemma 6.4.4 Let A Mn (C) be a Hermitian matrix and let x = (x1 , x2 , . . . , xn ) . Then every
Hermitian form H(x) = x Ax, in n variables can be written as
H(x) = |y1 |2 + |y2 |2 + + |yp |2 |yp+1 |2 |yr |2
where y1 , y2 , . . . , yr are linearly independent linear forms in x1 , x2 , . . . , xn , and the integers p and
r satisfying 0 p r n, depend only on A.
Proof. From Equation (6.4.1) it is easily seen that H(x) has the required form. We only need to
show that p and r are uniquely determined by A. Hence, let us assume on the contrary that there
exist positive integers p, q, r, s with p > q such that
H(x)
= (Y1 , 0 ). Then
non-zero solution and let y
H(
y) = |y1 |2 + |y2 |2 + + |yp |2 = (|zq+1 |2 + + |zs |2 ).
Now, this can hold only if y1 = y2 = = yp = 0, which gives a contradiction. Hence p = q.
Similarly, the case r > s can be resolved. Thus, the proof of the lemma is over.
Remark 6.4.5 The integer r is the rank of the matrix A and the number r 2p is sometimes called
the inertial degree of A.
We complete this chapter by understanding the graph of
ax2 + 2hxy + by 2 + 2f x + 2gy + c = 0
for a, b, c, f, g, h R. We first look at the following example.
Example 6.4.6 Sketch the graph of 3x2 + 4xy + 3y 2 = 5.
2
3 2
y]
2 3
1
2 3
0
2
2
Now, let u =
x+y
and v =
2
xy
.
2
3x + 4xy + 3y
x
and the eigen-pairs of the matrix
y
" 1
0 2
1 12
1
2
12
Then
3
= [x, y]
2
"
= [x, y]
=
1
2
1
2
2 x
3 y
1
2
12
#
5 0
0 1
" 1
2
1
2
5 0 u
u, v
= 5u2 + v 2 .
0 1 v
1
2
12
#
x
y
139
2
Thus, the given graph reduces to 5u2 + v 2 = 5 or equivalently to u2 + v5 = 1. Therefore, the given
graph represents an ellipse with the principal axes u = 0 and v = 0 (correspinding to the line
x + y = 0 and x y = 0, respectively). See Figure 6.4.6.
y = x
y=x
Proposition 6.4.8 For fixed real numbers a, b, c, g, f and h, consider the general conic
ax2 + 2hxy + by 2 + 2gx + 2f y + c = 0.
Then prove that this conic represents
1. an ellipse if ab h2 > 0,
2. a parabola if ab h2 = 0, and
3. a hyperbola if ab h2 < 0.
x
a h
2
2
Proof. Let A =
. Then ax + 2hxy + by = x y A
is the associated quadratic
h b
y
form. As A is a symmetric matrix, by Corollary 6.3.6, the eigenvalues 1 , 2 of A are both real,
the corresponding
u
and A is unitarily diagonalizable with A =
1 ,u2 are
torthonormal
eigenvectors
t
1 0
u
u1 x
u1
. Let
=
. Then ax2 + 2hxy + by 2 = 1 u2 + 2 v 2 and the
u1 u2
v
ut2 y
0 2
ut2
equation of the conic section in the (u, v)-plane, reduces to
1 u2 + 2 v 2 + 2g1 u + 2f1 v + c = 0.
(6.4.2)
140
d3
2 .
ii. If d3 < 0, the solution set of the corresponding conic is an empty set.
(c) If d2 6= 0. Then the given equation is of the form Y 2 = 4aX for some translates X = x+
and Y = y + and thus represents a parabola.
3. 1 > 0 and 2 < 0. In this case, ab h2 = det(A) = 1 2 < 0. Let 2 = 2 with 2 > 0.
Then Equation (6.4.2) can be rewritten as
1 (u + d1 )2 2 (v + d2 )2 = d3 for some d1 , d2 , d3 R
(6.4.3)
1 (u + d1 ) + 2 (v + d2 )
1 (u + d1 ) 2 (v + d2 ) = 0
or equivalently, a pair of intersecting straight lines in the (u, v)-plane.
=1
d3
d3
or equivalently, a hyperbola in the (u, v)-plane, with principal axes u + d1 = 0 and
v + d2 = 0.
4. 1 , 2 > 0. In this case, ab h2 = det(A) = 1 2 > 0 and Equation (6.4.2) can be rewritten
as
1 (u + d1 )2 + 2 (v + d2 )2 = d3 for some d1 , d2 , d3 R.
(6.4.4)
We consider the following three subcases to understand this.
(a) Let d3 = 0. Then Equation (6.4.4) reduces to a pair of perpendicular lines u + d1 = 0
and v + d2 = 0 in the (u, v)-plane.
(b) Let d3 < 0. Then the solution set of Equation (6.4.4) is an empty set.
(c) Let d3 > 0. Then Equation (6.4.4) reduces to the ellipse
1 (u + d1 )2
2 (v + d2 )2
+
=1
d3
d3
whose principal axes are u + d1 = 0 and v + d2 = 0.
141
u
x
Remark 6.4.9 Observe that the condition
= u1 u2
implies that the principal axes of
y
v
the conic are functions of the eigenvectors u1 and u2 .
Exercise 6.4.10 Sketch the graph of the following surfaces:
1. x2 + 2xy + y 2 6x 10y = 3.
2. 2x2 + 6xy + 3y 2 12x 6y = 5.
3. 4x2 4xy + 2y 2 + 12x 8y = 10.
4. 2x2 6xy + 5y 2 10x + 4y = 7.
As a last application, we consider the following problem that helps us in understanding the
quadrics. Let
ax2 + by 2 + cz 2 + 2dxy + 2exz + 2f yz + 2lx + 2my + 2nz + q = 0
(6.4.5)
be a general quadric. Then to get the geometrical idea of this quadric, do the following:
x
2l
a d e
(6.4.6)
4. Depending on which i =
6 0, rewrite Equation (6.4.6). That is, if 1 6= 0 then rewrite
2 2
l1
2
1 y1 + 2l1 y1 as 1 y1 + 1 l11 .
5. Use the condition x = P y to determine the center and the planes of symmetry of the quadric
in terms of the original system.
4
2 1 1
Solution: For Part 1, observe that A = 1 2 1, b = 2 and q = 2. Also, the orthonormal
4
1 1 2
1
1
1
4 0 0
3
2
6
1
1
1
t
matrix P = 3
and P AP = 0 1 0 . Hence, the quadric reduces to 4y12 + y22 +
2
6
1
2
0 0 1
0
3
6
10
2
2
2
y3 + 3 y1 + 2 y2 6 y3 + 2 = 0. Or equivalently to
5
1
1
9
4(y1 + )2 + (y2 + )2 + (y3 )2 =
.
12
4 3
2
6
142
9
So, the standard form of the quadric is 4z12 + z22 + z32 = 12
, where the center is given by (x, y, z)t =
5
1 1 t
3 1 3 t
P(4 3, 2, 6) = ( 4 , 4, 4 ) .
3 0 0
For Part 2, observe that A = 0 1 0, b = 0 and q = 10. In this case, we can rewrite the
0 0 1
quadric as
y2
3x2
z2
=1
10
10
10
which is the equation of a hyperboloid consisting of two sheets.
The calculation of the planes of symmetry is left as an exercise to the reader.
Part I
143
Chapter 7
Differential Equations
7.1
There are many branches of science and engineering where differential equations naturally arise.
Now a days there are applications to many areas in medicine, economics and social sciences. In
this context, the study of differential equations assumes importance. In addition, in the elementary
study of differential equations, we also see the applications of many results from analysis and linear
algebra. Without spending more time on motivation, (which will be clear as we go along) let us
start with the following notations. Suppose that y is a dependent variable and x is an independent
variable. The derivatives of y (with respect to x) are denoted by
y =
dy
d2 y
d(k) y
, y = 2 , . . . , y (k) = (k)
dx
dx
dx
for k 3.
The independent variable will be defined for an interval I; where I is either R or an interval
a < x < b R. With these notations, we ask the question: what is a differential equation?
A differential equation is a relationship between the independent variable and the unknown
dependent functions along with its derivatives.
Definition 7.1.1 (Ordinary Differential Equation, ODE) An equation of the form
f x, y, y , . . . , y (n) = 0
for x I
(7.1.1)
3. y = x + cos y;
2
4. (y ) + y = 0.
5. y + y = 0.
145
146
6. y + y = 0.
7. y (3) = 0.
8. y + m sin y = 0.
(7.1.2)
and move to higher orders later. With this in mind let us look at:
Definition 7.1.8 (General Solution) A function y(x, c) is called a general solution of Equation
(7.1.2) on an interval I R, if y(x, c) is a solution of Equation (7.1.2) for each x I, for a fixed
c R but c is arbitrary.
Remark 7.1.9 The family of functions {y(., c) : c R} is called a one parameter family of functions and c is called a parameter. In other words, a general solution of Equation (7.1.2) is nothing
but a one parameter family of solutions of the Equation (7.1.2).
Example 7.1.10
1. Show that for each k R, y = kex is a solution of y = y. This is a general
solution as it is a one parameter family of solutions. Here the parameter is k.
Solution: This can be easily verified.
147
2. Determine a differential equation for which a family of circles with center at (1, 0) and arbitrary radius, a is an implicit solution.
Solution: This family is represented by the implicit relation
(x 1)2 + y 2 = a2 ,
(7.1.3)
3. Consider the one parameter family of circles with center at (c, 0) and unit radius. The family
is represented by the implicit relation
(x c)2 + y 2 = 1,
2
where c is a real constant. Show that y satisfies yy + y 2 = 1.
Solution: We note that, differentiation of the given equation, leads to
(7.1.5)
(x c) + yy = 0.
Now, eliminating c from the two equations, we get
(yy )2 + y 2 = 1.
In Example 7.1.10.2, we see that y is not defined explicitly as a function of x but implicitly
1
defined by Equation (7.1.3). On the other hand y =
is an explicit solution in Example
1x
7.1.5.2. Solving a differential equation means to find a solution.
Let us now look at some geometrical interpretations of the differential Equation (7.1.2). The
Equation (7.1.2) is a relation between x, y and the slope of the function y at the point x. For
1
instance, let us find the equation of the curve passing through (0, ) and whose slope at each point
2
x
(x, y) is . If y is the required curve, then y satisfies
4y
x
1
dy
= , y(0) = .
dx
4y
2
It is easy to verify that y satisfies the equation x2 + 4y 2 = 1.
Exercise 7.1.11
(a) y 2 + sin(y ) = 1.
(b) y + (y )2 = 2x.
(c) (y )3 + y 2y 4 = 1.
2. Find a differential equation satisfied by the given family of curves:
(a) y = mx, m real (family of lines).
(b) y 2 = 4ax, a real (family of parabolas).
(c) x = r2 cos , y = r2 sin , is a parameter of the curve and r is a real number (family
of circles in parametric representation).
3. Find the equation of the curve C which passes through (1, 0) and whose slope at each point
x
(x, y) is
.
y
148
7.2
Separable Equations
(7.2.1)
1 dy
dy =
g(y) dx
dy
= G(y) + c,
g(y)
1
,
1 ex+c
2. Solve y = y 2 .
1
, where c is a constant; is the required solution.
x+c
Observe that the solution is defined, only if x+c 6= 0 for any x. For example, if we let y(0) = a,
a
then y =
exists as long as ax 1 6= 0.
ax 1
7.2.1
There are many equations which are not of the form 7.2.1, but by a suitable substitution, they can
be reduced to the separable form. One such class of equation is
y =
g1 (x, y)
g2 (x, y)
y
or equivalently y = g( )
x
where g1 and g2 are homogeneous functions of the same degree in x and y, and g is a continuous
function. In this case, we use the substitution, y = xu(x) to get y = xu + u. Thus, the above
equation after substitution becomes
xu + u(x) = g(u),
which is a separable equation in u. For illustration, we consider some examples.
149
Example 7.2.2
1. Find the general solution of 2xyy y 2 + x2 = 0.
Solution: Let I be any interval not containing 0. Then
y
y
2 y ( )2 + 1 = 0.
x
x
Letting y = xu(x), we have
2u(u x + u) u2 + 1 = 0 or 2xuu + u2 + 1 = 0 or equivalently
2u du
1
= .
2
1 + u dx
x
On integration, we get
1 + u2 =
c
x
or
x2 + y 2 cx = 0.
The general solution can be re-written in the form
c
c2
(x )2 + y 2 = .
2
4
This represents a family of circles with center ( 2c , 0) and radius
c
2.
2. Find the equation of the curve passing through (0, 1) and whose slope at each point (x, y) is
x
.
2y
Solution: If y is such a curve then we have
dy
x
=
and y(0) = 1.
dx
2y
Notice that it is a separable equation and it is easy to verify that y satisfies x2 + 2y 2 = 2.
3. The equations of the type
dy
a1 x + b 1 y + c1
=
dx
a2 x + b 2 y + c2
can also be solved by the above method by replacing x by x + h and y by y + k, where h and k
are to be chosen such that
a1 h + b 1 k + c1 = 0 = a2 h + b 2 k + c2 .
This condition changes the given differential equation into
then the equation reduces to the form y = g( xy ).
Exercise 7.2.3
dy
= x(ln x)(ln y).
dx
dy
(b) y 1 cos1 +(ex + 1)
= 0.
dx
(a)
d
, r(0) = a.
dr
dy
a1 x + b 1 y
=
. Thus, if x 6= 0
dx
a2 x + b 2 y
150
dy
, y(0) = 0.
dx
dy
xy+2
=
.
dx
x + y + 2
4. Solve y = y y 2 and use it to determine lim y. This equation occurs in a model of popux
lation.
7.3
Exact Equations
As remarked, there are no general methods to find a solution of Equation (7.1.2). The Exact
Equations is yet another class of equations that can be easily solved. In this section, we introduce
this concept.
Let D be a region in xy-plane and let M and N be real valued functions defined on D. Consider
an equation
dy
M (x, y) + N (x, y)
= 0, (x, y) D.
(7.3.1)
dx
In most of the books on Differential Equations, this equation is also written as
M (x, y)dx + N (x, y)dy = 0, (x, y) D.
(7.3.2)
Definition 7.3.1 (Exact Equation) Equation (7.3.1) is called Exact if there exists a real valued
twice continuously differentiable function f : R2 R (or the domain is an open subset of R2 ) such
that
f
f
= M and
= N.
(7.3.3)
x
y
Remark 7.3.2 If Equation (7.3.1) is exact, then
f
f dy
df (x, y)
+
=
= 0.
x
y dx
dx
This implies that f (x, y) = c (where c is a constant) is an implicit solution of Equation (7.3.1). In
other words, the left side of Equation (7.3.1) is an exact differential.
dy
Example 7.3.3 The equation y + x dx
= 0 is an exact equation. Observe that in this example,
f (x, y) = xy.
(7.3.4)
151
Note: If the Equation (7.3.1) or Equation (7.3.2) is exact, then there is a function f (x, y)
satisfying f (x, y) = c for some constant c, such that
d(f (x, y)) = M (x, y)dx + N (x, y)dy = 0.
Let us consider some examples, where Theorem 7.3.4 can be used to easily find the general
solution.
Example 7.3.5
1. Solve
2xey + (x2 ey + cos y )
dy
= 0.
dx
M
N
= 2xey and
= 2xey .
y
x
Therefore, the given equation is exact. Hence, there exists a function G(x, y) such that
G
= 2xey and
x
G
= x2 ey + cos y.
y
The first partial differentiation when integrated with respect to x (assuming y to be a constant)
gives,
G(x, y) = x2 ey + h(y).
But then
G
(x2 ey + h(y))
=
=N
y
y
implies dh
dy = cos y or h(y) = sin y + c where c is an arbitrary constant. Thus, the general
solution of the given equation is
x2 ey + sin y = c.
The solution in this case is in implicit form.
2. Find values of and m such that the equation
y 2 + mxy
dy
=0
dx
N
M
= 2y and
= my.
y
x
Hence for the given equation to be exact, m = 2. With this condition on and m, the equation
reduces to
dy
y 2 + 2xy
= 0.
dx
This equation is not meaningful if = 0. Thus, the above equation reduces to
d
(xy 2 ) = 0
dx
whose solution is
xy 2 = c
for some arbitrary constant c.
152
M
y
N
x
M = 3x2 ey x2 and N = x3 ey + y 2 .
(3x2 ey x2 )dx = x3 ey
x3
+ h(y)
3
x3
y3
+
=c
3
3
7.3.1
Integrating Factors
On may occasions,
M (x, y) + N (x, y)
dy
= 0, or equivalently M (x, y)dx + N (x, y)dy = 0
dx
may not be exact. But the above equation may become exact, if we multiply it by a proper factor.
For example, the equation
ydx dy = 0
an exact equation. Such a factor (in this case, ex ) is called an integrating factor for the given
equation. Formally
Definition 7.3.6 (Integrating Factor) A function Q(x, y) is called an integrating factor for the
Equation (7.3.1), if the equation
Q(x, y)M (x, y)dx + Q(x, y)N (x, y)dy = 0
is exact.
Example 7.3.7
1. Solve the equation ydx xdy = 0, x, y > 0.
Solution: It can be easily verified that the given equation is not exact. Multiplying by
equation reduces to
1
xy ,
the
1
1
ydx
xdy = 0, or equivalently d (ln x ln y) = 0.
xy
xy
Thus, by definition,
is G(x, y) =
1
is an integrating factor. Hence, a general solution of the given equation
xy
1
= c, for some constant c R. That is,
xy
153
x(3y + 2x)
2
1
=
.
y x+y
xy x + y
(7.3.5)
(7.3.6)
(7.3.7)
(7.3.8)
Method 2: Here the terms M = 4y 2 + 3xy and N = (3xy + 2x2 ) are polynomial in x and
y. Therefore, we suppose that x y is an integrating factor for some , R. We try to find
this and .
Multiplying the terms M (x, y) and N (x, y) with x y , we get
M (x, y) = x y 4y 2 + 3xy , and N (x, y) = x y (3xy + 2x2 ).
For the new equation to be exact, we need
M (x, y)
N (x, y)
=
. That is, the terms
y
x
y
must be equal. Solving for and , we get = 5 and = 1. That is, the expression 5 is
x
also an integrating factor for the given differential equation. This integrating factor leads to
G(x, y) =
y3
y2
+ h(y)
x4
x3
154
y3
y2
3 + g(x).
4
x
x
Thus, we need h(y) = g(x) = c, for some constant c R. Hence, the required solution by this
method is
y 2 y + x = cx4 .
G(x, y) =
Remark 7.3.8
1. If Equation (7.3.1) has a general solution, then it can be shown that Equation
(7.3.1) admits an integrating factor.
2. If Equation (7.3.1) has an integrating factor, then it has many (in fact infinitely many)
integrating factors.
3. Given Equation (7.3.1), whether or not it has an integrating factor, is a tough question to
settle.
4. In some cases, we use the following rules to find the integrating factors.
(a) Consider a homogeneous equation M (x, y)dx + N (x, y)dy = 0. If
M x + N y 6= 0,
then
1
Mx + Ny
is an Integrating Factor.
(b) If the functions M (x, y) and N (x, y) are polynomial functions in x, y; then x y works
as an integrating factor for some appropriate values of and .
(c) Theequation M
(x, y)dx + N (x, y)dy = 0 has e
1 M
N
is a function of x alone.
N
y
x
f (x)dx
g(y)dy
(d) The equation
as an integrating factor, if
M (x, y)dx
+ N (x, y)dy = 0 has e
N
1 M
is a function of y alone.
g(y) =
M
y
x
(e) For the equation
yM1 (xy)dx + xN1 (xy)dy = 0
1
with M x N y 6= 0, the function
is an integrating factor.
Mx Ny
Exercise 7.3.9
1. Show that the following equations are exact and hence solve them.
dr
+ r(cos sin ) = 0.
d
y
x
dy
(b) (ex ln y + ) + ( + ln x + cos y)
= 0.
x
y
dx
dy
=0
dx
is exact.
3. What are the conditions on f (x), g(y), (x), and (y) so that the equation
((x) + (y)) + (f (x) + g(y))
is exact.
dy
=0
dx
155
4. Verify that the following equations are not exact. Further find suitable integrating factors to
solve them.
dy
= 0.
dx
dy
(b) y 2 + (3xy + y 2 1)
= 0.
dx
dy
(c) y + (x + x3 y 2 )
= 0.
dx
dy
(d) y 2 + (3xy + y 2 1)
= 0.
dx
(a) y + (x + x3 y 2 )
7.4
Linear Equations
Some times we might think of a subset or subclass of differential equations which admit explicit
solutions. This question is pertinent when we say that there are no means to find the explicit
dy
solution of
= f (x, y) where f is an arbitrary continuous function in (x, y) in suitable domain
dx
of definition. In this context, we have a class of equations, called Linear Equations (to be defined
shortly) which admit explicit solutions.
Definition 7.4.1 (Linear/Nonlinear Equations)
Let p(x) and q(x) be real-valued piecewise continuous functions defined on interval I = [a, b]. The
equation
y + p(x)y = q(x), x I
(7.4.1)
dy
. Equation (7.4.1) is called Linear non-homogeneous
dx
if q(x) 6= 0 and is called Linear homogeneous if q(x) = 0 on I.
is called a linear equation, where y stands for
A first order equation is called a non-linear equation (in the independent variable) if it is neither a
linear homogeneous nor a non-homogeneous linear equation.
Example 7.4.2
p(x)dx ( or
Rx
d P (x)
(e
y) = eP (x) q(x).
dx
eP (x) q(x)dx.
156
In other words,
y = ce
P (x)
+e
P (x)
eP (x) q(x)dx
(7.4.2)
Rx
y = y(a)e
P (x)
+e
P (x)
Zx
eP (s) q(s)ds.
(7.4.3)
Hence, P (x) = (1)dx = x. Substituting for P (x) in Equation (7.4.2), we get y = cex as
the required general solution.
We can just use the second part of the above proposition to get the above result, as k = 1.
2. The general solution of xy = y, x I (0 6 I) is y = cx1 , where c is an arbitrary constant.
Notice that no non-zero solution exists if we insist on the condition lim y = 0.
x0,x>0
A class of nonlinear Equations (7.4.1) (named after Bernoulli (1654 1705)) can be reduced to
linear equation. These equations are of the type
y + p(x)y = q(x)y a .
(7.4.5)
(7.4.6)
nemx dx = Aemx +
n
.
m
157
Equivalently
y=
1
Aemx +
n
m
158
7.5
Miscellaneous Remarks
In Section 7.4, we have learned to solve the linear equations. There are many other equations,
though not linear, which are also amicable for solving. Below, we consider a few classes of equations
dy
which can be solved. In this section or in the sequel, p denotes
or y . A word of caution is needed
dx
here. The method described below are more or less ad hoc methods.
1. Equations solvable for y:
Consider an equation of the form
y = f (x, p).
(7.5.1)
of equivalently p = g(x, p,
).
dx
x
p
dx
dx
(7.5.2)
Equation (7.5.2) can be viewed as a differential equation in p and x. We now assume that
Equation (7.5.2) can be solved for p and its solution is
h(x, p, c) = 0.
(7.5.3)
If we are able to eliminate p between Equations (7.5.1) and (7.5.3), then we have an implicit
solution of the Equation (7.5.1).
Solve y = 2px xp2 .
dp
dp
2xp
dx
dx
dy
by p, we get
dx
or (p + 2x
dp
)(1 p) = 0.
dx
So, either
p + 2x
dp
= 0 or p = 1.
dx
That is, either p2 x = c or p = 1. Eliminating p from the given equation leads to an explicit
solution
r
c
y = 2x
c or y = x.
x
The first solution is a one-parameter family of solutions, giving us a general solution. The
latter one is a solution but not a general solution since it is not a one parameter family of
solutions.
2. Equations in which the independent variable x is missing:
These are equations of the type f (y, p) = 0. If possible we solve for y and we proceed.
Sometimes introducing an arbitrary parameter helps. We illustrate it below.
Solve y 2 + p2 = a2 where a is a constant.
Solution: We equivalently rewrite the given equation, by (arbitrarily) introducing a new
parameter t by
y = a sin t, p = a cos t
from which it follows
dy
dy
dy
= a cos t; p =
=
dt
dx
dt
dx
dt
159
dx
1 dy
=
= 1 or x = t + c.
dt
p dt
= p(3p2 1).
dp
dx dp
Therefore,
3 4 1 2
p p +c
4
2
(regarding p as a parameter). The desired solution in this case is in the parametric form,
given by
3
1
x = t3 t 1 and y = t4 t2 + c
4
2
where c is an arbitrary constant.
y=
Remark 7.5.1 The readers are again informed that the methods discussed in 1), 2), 3) are
more or less ad hoc methods. It may not work in all cases.
1. Find the general solution of y = (1 + p)x + p2 .
dx
Hint: Differentiate with respect to x to get
= (x + 2p) ( a linear equation in x). Express
dp
the solution in the parametric form
Exercise 7.5.2
7.6
As we had seen, there are no methods to solve a general equation of the form
y = f (x, y)
and in this context two questions may be pertinent.
1. Does Equation (7.6.1) admit solutions at all (i.e., the existence problem)?
(7.6.1)
160
2. Is there a method to find solutions of Equation (7.6.1) in case the answer to the above question
is in the affirmative?
The answers to the above two questions are not simple. But there are partial answers if some
additional restrictions on the function f are imposed. The details are discussed in this section.
For a, b R with a > 0, b > 0, we define
S = {(x, y) R2 : |x x0 | a, |y y0 | b} I R.
Definition 7.6.1 (Initial Value Problems) Let f : S R be a continuous function on a S.
The problem of finding a solution y of
y = f (x, y), (x, y) S, x I with y(x0 ) = y0
(7.6.2)
f (s, y(s))ds.
(7.6.3)
x0
In the absence of any knowledge of a solution of IVP (7.6.2), we now try to find an approximate
solution. Any solution of the IVP (7.6.2) must satisfy the initial condition y(x0 ) = y0 . Hence, as a
crude approximation to the solution of IVP (7.6.2), we define
y0 = y0 for all x [x0 h, x0 + h].
Now the Equation (7.6.3) appearing in Proposition 7.6.2, helps us to refine or improve the approximate solution y0 with a hope of getting a better approximate solution. We define
Z x
y1 = yo +
f (s, y0 )ds
x0
x0
As yet we have not checked a few things, like whether the point (s, yn (s)) S or not. We
formalise the theory in the latter part of this section. To get ourselves motivated, let us apply the
above method to the following IVP.
161
ds = 1 x.
(1 s)ds = 1 x +
yn = 1 x +
x2
.
2!
x2
x3
xn
+ + (1)n .
2!
3!
n!
lim yn = ex .
This example justifies the use of the word approximate solution for the yn s.
We now formalise the above procedure.
Definition 7.6.4 (Picards Successive Approximations) Consider the IVP (7.6.2). For x
I with |x x0 | a, define inductively
y0 (x)
yn (x)
y0 and for n = 1, 2, . . . ,
Z x
y0 +
f (s, yn1 (s))ds.
(7.6.4)
x0
162
x0
|yn y0 | M |x x0 | M h b.
This shows that (x, yn ) S whenever |x x0 | h. Hence (x, yk ) S for k = n holds and therefore
the proof of the proposition is complete.
Let us again come back to Example 7.6.3 in the light of Proposition 7.6.2.
Example 7.6.6 Compute the successive approximations to the IVP
y = y, 1 x 1, |y 1| 1 and y(0) = 1.
(7.6.5)
y for y 0.
(7.6.6)
0ds = 0.
A similar argument implies that yn (x) 0 for all n = 2, 3, . . . and lim yn (x) 0. Also, it can be
n
163
The following result is about the existence of a unique solution to a class of IVPs. We state the
theorem without proof.
Theorem 7.6.8 (Picards Theorem on Existence and Uniqueness) Let S = {(x, y) : |x
f
x0 | a, |y y0 | b}, and a, b > 0. Let f : SR be such that f as well as
are continuous on
y
S. Also, let M, K R be constants such that
|f | M, |
f
| K on S.
y
Let h = min{a, b/M }. Then the sequence of successive approximations {yn } (defined by Equation
(7.6.4)) for the IVP (7.6.2) uniformly converges on |x x0 | h to a solution of IVP (7.6.2).
Moreover the solution to IVP (7.6.2) is unique.
Remark 7.6.9 The theorem asserts the existence of a unique solution on a subinterval |x x0 | h
of the given interval |x x0 | a. In a way it is in a neighbourhood of x0 and so this result is also
called the local existence of a unique solution. A natural question is whether the solution exists on
the whole of the interval |x x0 | a. The answer to this question is beyond the scope of this book.
Whenever we talk of the Picards theorem, we mean it in this local sense.
Exercise 7.6.10
y, y(0) = 0, x 0
x2
has solutions y1 0 as well as y2 =
, x 0. Why does the existence of the two solutions
4
not contradict the Picards theorem?
4. Consider the IVP
y = y, y(0) = 1 in {(x, y) : |x| a, |y| b}
for any a, b > 0.
(a) Compute the interval of existence of the solution of the IVP by using Theorem 7.6.8.
(b) Show that y = ex is the solution of the IVP which exists on whole of R.
This again shows that the solution to an IVP may exist on a larger interval than what is being
implied by Theorem 7.6.8.
164
7.6.1
Orthogonal Trajectories
One among the many applications of differential equations is to find curves that intersect a given
family of curves at right angles. In other words, given a family F, of curves, we wish to find curve
(or curves) which intersect orthogonally with any member of F (whenever they intersect). It is
important to note that we are not insisting that should intersect every member of F, but if they
intersect, the angle between their tangents, at every point of intersection, is 90 . Such a family
of curves is called orthogonal trajectories of the family F. That is, at the common point of
intersection, the tangents are orthogonal. In case, the family F1 and F2 are identical, we say that
the family is self-orthogonal.
Before procedding to an example, let us note that at the common point of intersection, the
product of the slopes of the tangent is 1. In order to find the orthogonal trajectories of a family
of curves F, parametrized by a constant c, we eliminate c between y and y . This gives the slope at
any point (x, y) and is independent of the choice of the curve. Below, we illustrate, how to obtain
the orthogonal trajectories.
Example 7.6.11 Compute the orthogonal trajectories of the family F of curves given by
F :
y 2 = cx3 ,
(7.6.7)
(7.6.8)
3cx2
3 cx3
3y
=
=
.
2y
2x y
2x
(7.6.9)
At the point (x, y), if any curve intersects orthogonally, then (if its slope is y ) we must have
y =
2x
.
3y
x2
+ c.
3
(7.6.10)
for which the given family F are a general solution. Equation (7.6.10) is obtained by the elimination
of the constant c appearing in F (x, y, c) = 0 using the equation obtained by differentiating this
equation with respect to x.
Step 2: The differential equation for the orthogonal trajectories is then given by
y =
1
.
f (x, y)
(7.6.11)
Final Step: The general solution of Equation (7.6.11) is the orthogonal trajectories of the given
family.
In the following, let us go through the steps.
165
Example 7.6.12 Find the orthogonal trajectories of the family of stright lines
y = mx + 1,
(7.6.12)
x
.
1y
(7.6.13)
(7.6.14)
where c is an arbitrary constant. In other words, the orthogonal trajectories of the family of straight
lines (7.6.12) is the family of circles given by Equation (7.6.14).
Exercise 7.6.13
1. Find the orthogonal trajectories of the following family of curves (the constant c appearing below is an arbitrary constant).
(a) y = x + c.
(b) x2 + y 2 = c.
(c) y 2 = x + c.
(d) y = cx2 .
(e) x2 y 2 = c.
2. Show that the one parameter family of curves y 2 = 4k(k + x), k R are self orthogonal.
3. Find the orthogonal trajectories of the family of circles passing through the points (1, 2) and
(1, 2).
7.7
Numerical Methods
All said and done, the Picards Successive approximations is not suitable for computations on
computers. In the absence of methods for closed form solution (in the explicit form), we wish to
explore how computers can be used to find approximate solutions of IVP of the form
y = f (x, y),
y(x0 ) = y0 .
(7.7.1)
In this section, we study a simple method to find the numerical solutions of Equation (7.7.1).
The study of differential equations has two important aspects (among other features) namely, the
qualitative theory, the latter is called Numerical methods for solving Equation (7.7.1). What is
presented here is at a very rudimentary level nevertheless it gives a flavour of the numerical method.
To proceed further, we assume that f is a good function (there by meaning sufficiently
differentiable). In such case, we have
y(x + h) = y + hy +
h2
y +
2!
166
which suggests a crude approximation y(x + h) y + hf (x, y) (if h is small enough), the symbol
means approximately equal to. With this in mind, let us think of finding y, where y is the
x x0
solution of Equation (7.7.1) with x > x0 . Let h =
and define
n
xi = x0 + ih, i = 0, 1, 2, . . . , n.
That is, we have divided the interval [x0 , x] into n equal intervals with end points x0 , x1 , . . . , x = xn .
x0
x1
x2
xn = x
k = 0, 1, 2, . . . , n 1.
This method of calculation of y1 , y2 , . . . , yn is called the Eulers method. The approximate solution
of Equation (7.7.1) is obtained by linear elements joining (x0 , y0 ), (x1 , y1 ), . . . , (xn , yn ).
y
3
y2
y n1
y1
yn
y
0
x0
x1
x2
x3
x4
x n1 x n
Chapter 8
Introduction
Second order and higher order equations occur frequently in science and engineering (like pendulum
problem etc.) and hence has its own importance. It has its own flavour also. We devote this section
for an elementary introduction.
Definition 8.1.1 (Second Order Linear Differential Equation) The equation
p(x)y + q(x)y + r(x)y = c(x), x I
(8.1.1)
1. The equation
y +
9
sin y = 0
168
(8.1.2)
Then for any two real number c1 , c2 , the function c1 y1 + c2 y2 is also a solution of Equation (8.1.2).
It is to be noted here that Theorem 8.1.5 is not an existence theorem. That is, it does not assert
the existence of a solution of Equation (8.1.2).
Definition 8.1.6 (Solution Space) The set of solutions of a differential equation is called the
solution space.
For example, all the solutions of the Equation (8.1.2) form a solution space. Note that y(x) 0
is also a solution of Equation (8.1.2). Therefore, the solution set of a Equation (8.1.2) is non-empty.
A moments reflection on Theorem 8.1.5 tells us that the solution space of Equation (8.1.2) forms a
real vector space.
Remark 8.1.7 The above statements also hold for any homogeneous linear differential equation.
That is, the solution space of a homogeneous linear differential equation is a real vector space.
The natural question is to inquire about its dimension. This question will be answered in a
sequence of results stated below.
We first recall the definition of Linear Dependence and Independence.
Definition 8.1.8 (Linear Dependence and Linear Independence) Let I be an interval in R
and let f, g : I R be continuous functions. we say that f, g are said to be linearly dependent if
there are real numbers a and b (not both zero) such that
af (t) + bg(t) = 0 for all t I.
The functions f (), g() are said to be linearly independent if f (), g() are not linear dependent.
To proceed further and to simplify matters, we assume that p(x) 1 in Equation (8.1.2) and
that the function q(x) and r(x) are continuous on I.
In other words, we consider a homogeneous linear equation
y + q(x)y + r(x)y = 0, x I,
(8.1.3)
(8.1.4)
where A and B are prescribed real constants. Then Equation (8.1.3), with initial conditions given
by Equation (8.1.4) has a unique solution on I.
8.1. INTRODUCTION
169
(8.1.5)
The unique solutions y1 and y2 exist by virtue of Theorem 8.1.9. We now claim that y1 and y2 are
linearly independent. Consider the system of linear equations
y1 (x) + y2 (x) = 0,
(8.1.6)
where and are unknowns. If we can show that the only solution for the system (8.1.6) is
= = 0, then the two solutions y1 and y2 will be linearly independent.
Use initial condition on y1 and y2 to show that the only solution is indeed = = 0. Hence
the result follows.
We now show that any solution of Equation (8.1.3) is a linear combination of y1 and y2 . Let
be any solution of Equation (8.1.3) and let d1 = (x0 ) and d2 = (x0 ). Consider the function
defined by
(x) = d1 y1 (x) + d2 y2 (x).
By Definition 8.1.3, is a solution of Equation (8.1.3). Also note that (x0 ) = d1 and (x0 ) = d2 .
So, and are two solution of Equation (8.1.3) with the same initial conditions. Hence by Picards
Theorem on Existence and Uniqueness (see Theorem 8.1.9), (x) (x) or
(x) = d1 y1 (x) + d2 y2 (x).
Thus, the equation (8.1.3) has two linearly independent solutions.
Remark 8.1.11
1. Observe that the solution space of Equation (8.1.3) forms a real vector space
of dimension 2.
2. The solutions y1 and y2 corresponding to the initial conditions
y1 (x0 ) = 1, y1 (x0 ) = 0,
170
1
1
1
.
1
Exercise 8.1.13
1. State whether the following equations are second-order linear or secondorder non-linear equaitons.
(a) y + y sin x = 5.
(b) y + (y )2 + y sin x = 0.
(c) y + yy = 2.
8.2
In this section, we wish to study some more properties of second order equations which have nice
applications. They also have natural generalisations to higher order equations.
Definition 8.2.1 (General Solution) Let y1 and y2 be a fundamental system of solutions for
y + q(x)y + r(x)y = 0, x I.
(8.2.1)
8.2.1
Wronskian
In this subsection, we discuss the linear independence or dependence of two solutions of Equation
(8.2.1).
Definition 8.2.2 (Wronskian) Let y1 and y2 be two real valued continuously differentiable function on an interval I R. For x I, define
y1 y1
W (y1 , y2 ) :=
y2 y2
=
y1 y2 y1 y2 .
171
Example 8.2.3
(8.2.2)
2. Let y1 = x2 |x|, and y2 = x3 for x (1, 1). Let us now compute y1 and y2 . From analysis,
we know that y1 is differentiable at x = 0 and
y1 (x) = 3x2 if x < 0 and y1 (x) = 3x2 if x 0.
Therefore, for x 0,
and for x < 0,
y1
W (y1 , y2 ) =
y2
y
W (y1 , y2 ) = 1
y2
y1 x3
=
y2 x3
3x2
=0
3x2
y1 x3
=
y2 x3
3x2
= 0.
3x2
It is also easy to note that y1 , y2 are linearly independent on (1, 1). In fact,they are linearly
independent on any interval (a, b) containing 0.
Given two solutions y1 and y2 of Equation (8.2.1), we have a characterisation for y1 and y2 to
be linearly independent.
Theorem 8.2.4 Let I R be an interval. Let y1 and y2 be two solutions of Equation (8.2.1). Fix
a point x0 I. Then for any x I,
Z x
W (y1 , y2 ) = W (y1 , y2 )(x0 ) exp(
q(s)ds).
(8.2.3)
x0
Consequently,
W (y1 , y2 )(x0 ) 6= 0 W (y1 , y2 ) 6= 0 for all x I.
Proof. First note that, for any x I,
W (y1 , y2 ) = y1 y2 y1 y2 .
So
d
W (y1 , y2 ) =
dx
=
=
=
y1 y2 y1 y2
(8.2.4)
q(x)W (y1 , y2 ).
So, we have
W (y1 , y2 ) = W (y1 , y2 )(x0 ) exp
x0
(8.2.5)
(8.2.6)
(8.2.7)
q(s)ds .
172
Remark 8.2.5
1. If the Wronskian W (y1 , y2 ) of two solutions y1 , y2 of (8.2.1) vanish at a point
x0 I, then W (y1 , y2 ) is identically zero on I.
2. If any two solutions y1 , y2 of Equation (8.2.1) are linearly dependent (on I), then W (y1 , y2 )
0 on I.
Theorem 8.2.6 Let y1 and y2 be any two solutions of Equation (8.2.1). Let x0 I be arbitrary.
Then y1 and y2 are linearly independent on I if and only if W (y1 , y2 )(x0 ) 6= 0.
Proof. Let y1 , y2 be linearly independent on I.
To show: W (y1 , y2 )(x0 ) 6= 0.
Suppose not. Then W (y1 , y2 )(x0 ) = 0. So, by Theorem 2.4.1 the equations
c1 y1 (x0 ) + c2 y2 (x0 ) = 0 and c1 y1 (x0 ) + c2 y2 (x0 ) = 0
(8.2.8)
admits a non-zero solution d1 , d2 . (as 0 = W (y1 , y2 )(x0 ) = y1 (x0 )y2 (x0 ) y1 (x0 )y2 (x0 ).)
Let y = d1 y1 + d2 y2 . Note that Equation (8.2.8) now implies
y(x0 ) = 0 and y (x0 ) = 0.
Therefore, by Picards Theorem on existence and uniqueness of solutions (see Theorem 8.1.9), the
solution y 0 on I. That is, d1 y1 + d2 y2 0 for all x I with |d1 | + |d2 | 6= 0. That is, y1 , y2 is
linearly dependent on I. A contradiction. Therefore, W (y1 , y2 )(x0 ) 6= 0. This proves the first part.
Suppose that W (y1 , y2 )(x0 ) 6= 0 for some x0 I. Therefore, by Theorem 8.2.4, W (y1 , y2 ) 6= 0
for all x I. Suppose that c1 y1 (x) + c2 y2 (x) = 0 for all x I. Therefore, c1 y1 (x) + c2 y2 (x) = 0 for
all x I. Since x0 I, in particular, we consider the linear system of equations
c1 y1 (x0 ) + c2 y2 (x0 ) = 0 and c1 y1 (x0 ) + c2 y2 (x0 ) = 0.
(8.2.9)
But then by using Theorem 2.4.1 and the condition W (y1 , y2 )(x0 ) 6= 0, the only solution of the
linear system (8.2.9) is c1 = c2 = 0. So, by Definition 8.1.8, y1 , y2 are linearly independent.
173
Proof. Let x0 I. Let y(x0 ) = a, y (x0 ) = b. Here a and b are known since the solution y is
given. Also for any x0 I, by Theorem 8.2.6, W (y1 , y2 )(x0 ) 6= 0 as y1 , y2 are linearly independent
solutions of Equation (8.2.1). Therefore by Theorem 2.4.1, the system of linear equations
c1 y1 (x0 ) + c2 y2 (x0 ) = a and c1 y1 (x0 ) + c2 y2 (x0 ) = b
(8.2.10)
8.2.2
We are going to show that in order to find a fundamental system for Equation (8.2.1), it is sufficient
to have the knowledge of a solution of Equation (8.2.1). In other words, if we know one (non-zero)
solution y1 of Equation (8.2.1), then we can determine a solution y2 of Equation (8.2.1), so that
{y1 , y2 } forms a fundamental system for Equation (8.2.1). The method is described below and is
usually called the method of reduction of order.
Let y1 be an every where non-zero solution of Equation (8.2.1). Assume that y2 = u(x)y1 is a
solution of Equation (8.2.1), where u is to be determined. Substituting y2 in Equation (8.2.1), we
have (after a bit of simplification)
u y1 + u (2y1 + py1 ) + u(y1 + py1 + qy1 ) = 0.
By letting u = v, and observing that y1 is a solution of Equation (8.2.1), we have
v y1 + v(2y1 + py1 ) = 0
which is same as
d
(vy12 ) = p(vy12 ).
dx
This is a linear equation of order one (hence the name, reduction of order) in v whose solution is
Z x
vy12 = exp(
p(s)ds), x0 I.
x0
174
x0
1
exp(
y12 (s)
x0
p(t)dt)ds, x0 I
x
x0
1
exp(
y12 (s)
p(t)dt)ds.
x0
It is left as an exercise to show that y1 , y2 are linearly independent. That is, {y1 , y2 } form a
fundamental system for Equation (8.2.1).
We illustrate the method by an example.
Example 8.2.10 Given that e y1 =
1
, x 1 is a solution of
x
x2 y + 4xy + 2y = 0,
(8.2.11)
determine another solution y2 of (8.2.11), such that the solutions y1 , y2 , for x 1 are linearly
independent.
4
Solution: With the notations used above, note that x0 = 1, p(x) = , and y2 (x) = u(x)y1 (x),
x
where u is given by
u
=
=
Z1 x
1
Z s
1
exp
p(t)dt ds
y12 (s)
1
1
exp ln(s4 ) ds
2
y1 (s)
s2
1
ds = 1 ;
s4
x
1
1
.
x x2
1
1
1
1
already appears in y1 , we can take y2 = 2 . So,
and 2 are the required two
x
x
x
x
linearly independent solutions of (8.2.11).
Exercise 8.2.11 In the following, use the given solution y1 , to find another solution y2 so that the
two solutions y1 and y2 are linearly independent.
1. y = 0, y1 = 1, x 0.
2. y + 2y + y = 0, y1 = ex , x 0.
3. x2 y xy + y = 0, y1 = x, x 1.
4. xy + y = 0, y1 = 1, x 1.
5. y + xy y = 0, y1 = x, x 1.
8.3
175
(8.3.1)
(8.3.2)
Equation (8.3.2) is called the characteristic equation of Equation (8.3.1). Equation (8.3.2) is
a quadratic equation and admits 2 roots (repeated roots being counted twice).
Case 1: Let 1 , 2 be real roots of Equation (8.3.2) with 1 6= 2 .
Then e1 x and e2 x are two solutions of Equation (8.3.1) and moreover they are linearly independent
(since 1 6= 2 ). That is, {e1 x , e2 x } forms a fundamental system of solutions of Equation (8.3.1).
Case 2: Let 1 = 2 be a repeated root of p() = 0.
Then p (1 ) = 0. Now,
d
(L(ex )) = L(xex ) = p ()ex + xp()ex .
dx
But p (1 ) = 0 and therefore,
L(xe1 x ) = 0.
Hence, e1 x and xe1 x are two linearly independent solutions of Equation (8.3.1). In this case, we
have a fundamental system of solutions of Equation (8.3.1).
Case 3: Let = + i be a complex root of Equation (8.3.2).
So, i is also a root of Equation (8.3.2). Before we proceed, we note:
Lemma 8.3.2 Let y = u + iv be a solution of Equation (8.3.1), where u and v are real valued
functions. Then u and v are solutions of Equation (8.3.1). In other words, the real part and the
imaginary part of a complex valued solution (of a real variable ODE Equation (8.3.1)) are themselves
solution of Equation (8.3.1).
Proof. exercise.
176
Exercise 8.3.3
(a) y 4y + 3y = 0.
(b) 2y + 5y = 0.
(c) y 9y = 0.
(b) y + 6y + 5y = 0.
(c) y + 5y = 0.
(8.3.3)
Show that it has a solution if and only if B = 0. Compare this with Exercise 4. Also, show
that if B = 0, then there are infinitely many solutions to (8.3.3).
8.4
Throughout this section, I denotes an interval in R. we assume that q(), r() and f () are real valued
continuous function defined on I. Now, we focus the attention to the study of non-homogeneous
equation of the form
y + q(x)y + r(x)y = f (x).
(8.4.1)
We assume that the functions q(), r() and f () are known/given. The non-zero function f ()
in (8.4.1) is also called the non-homogeneous term or the forcing function. The equation
y + q(x)y + r(x)y = 0.
(8.4.2)
(8.4.3)
L(y) =
0.
(8.4.4)
177
2. Let z be any solution of (8.4.1) on I and let z1 be any solution of (8.4.2). Then y = z + z1
is a solution of (8.4.1) on I.
Proof. Observe that L is a linear transformation on the set of twice differentiable function on I.
We therefore have
L(y1 ) = f and L(y2 ) = f.
The linearity of L implies that L(y1 y2 ) = 0 or equivalently, y = y1 y2 is a solution of (8.4.2).
For the proof of second part, note that
L(z) = f
and L(z1 ) = 0
implies that
L(z + z1 ) = L(z) + L(z1 ) = f.
Thus, y = z + z1 is a solution of (8.4.1).
(8.4.5)
Remark 8.4.4 The above results tell us that to solve (i.e., to find the general solution of (8.4.1))
or the IVP (8.4.5), we need to find the general solution of the homogeneous equation (8.4.2) and a
particular solution yp of (8.4.1). To repeat, the two steps needed to solve (8.4.1), are:
1. compute the general solution of (8.4.2), and
2. compute a particular solution of (8.4.1).
Then add the two solutions.
Step 1. has been dealt in the previous sections. The remainder of the section is devoted to step 2.,
i.e., we elaborate some methods for computing a particular solution yp of (8.4.1).
178
Exercise 8.4.5
8.5
Variation of Parameters
In the previous section, calculation of particular integrals/solutions for some special cases have
been studied. Recall that the homogeneous part of the equation had constant coefficients. In this
section, we deal with a useful technique of finding a particular solution when the coefficients of the
homogeneous part are continuous functions and the forcing function f (x) (or the non-homogeneous
term) is piecewise continuous. Suppose y1 and y2 are two linearly independent solutions of
y + q(x)y + r(x)y = 0
(8.5.1)
on I, where q(x) and r(x) are arbitrary continuous functions defined on I. Then we know that
y = c1 y 1 + c2 y 2
is a solution of (8.5.1) for any constants c1 and c2 . We now vary c1 and c2 to functions of x, so
that
y = u(x)y1 + v(x)y2 , x I
(8.5.2)
is a solution of the equation
y + q(x)y + r(x)y = f (x),
on I,
(8.5.3)
where f is a piecewise continuous function defined on I. The details are given in the following
theorem.
Theorem 8.5.1 (Method of Variation of Parameters) Let q(x) and r(x) be continuous functions defined on I and let f be a piecewise continuous function on I. Let y1 and y2 be two linearly
independent solutions of (8.5.1) on I. Then a particular solution yp of (8.5.3) is given by
Z
Z
y2 f (x)
y1 f (x)
yp = y1
dx + y2
dx,
(8.5.4)
W
W
where W = W (y1 , y2 ) is the Wronskian of y1 and y2 . (Note that the integrals in (8.5.4) are the
indefinite integrals of the respective arguments.)
179
Proof. Let u(x) and v(x) be continuously differentiable functions (to be determined) such that
yp = uy1 + vy2 , x I
(8.5.5)
(8.5.6)
u y1 + v y2 = 0.
(8.5.7)
(8.5.8)
Since yp is a particular solution of (8.5.3), substitution of (8.5.5) and (8.5.8) in (8.5.3), we get
u y1 + q(x)y1 + r(x)y1 + v y2 + q(x)y2 + r(x)y2 + u y1 + v y2 = f (x).
As y1 and y2 are solutions of the homogeneous equation (8.5.1), we obtain the condition
u y1 + v y2 = f (x).
(8.5.9)
We now determine u and v from (8.5.7) and (8.5.9). By using the Cramers rule for a linear system
of equations, we get
y2 f (x)
y1 f (x)
u =
and v =
(8.5.10)
W
W
(note that y1 and y2 are linearly independent solutions of (8.5.1) and hence the Wronskian, W 6= 0
for any x I). Integration of (8.5.10) give us
Z
Z
y2 f (x)
y1 f (x)
u=
dx and v =
dx
(8.5.11)
W
W
( without loss of generality, we set the values of integration constants to zero). Equations (8.5.11)
and (8.5.5) yield the desired results. Thus the proof is complete.
Before, we move onto some examples, the following comments are useful.
Remark 8.5.2
1. The integrals in (8.5.11) exist, because y2 and W (6= 0) are continuous functions and f is a piecewise continuous function. Sometimes, it is useful to write (8.5.11) in
the form
Z x
Z x
y2 (s)f (s)
y1 (s)f (s)
u=
ds and v =
ds
W
(s)
W (s)
x0
x0
where x I and x0 is a fixed point in I. In such a case, the particular solution yp as given
by (8.5.4) assumes the form
Z x
Z
y2 (s)f (s)
y1 (s)f (s)
yp = y1
ds + y2 )xx0
ds
(8.5.12)
W (s)
W (s)
x0
for a fixed point x0 I and for any x I.
2. Again, we stress here that, q and r are assumed to be continuous. They need not be constants.
Also, f is a piecewise continuous function on I.
3. A word of caution. While using (8.5.4), one has to keep in mind that the coefficient of y in
(8.5.3) is 1.
180
Example 8.5.3
1
, x 0.
2 + sin x
2
2
y + 2y = x
x
x
and two linearly independent solutions of the corresponding homogeneous part are y1 = x and
y2 = x2 . Here
x x2
= x2 , x > 0.
W = W (x, x2 ) =
1 2x
x2 x
dx + x2
x2
x3
x3
= + x3 =
.
2
2
= x
xx
dx
x2
The readers should note that the methods of Section 8.7 are not applicable as the given equation
is not an equation with constant coefficients.
Exercise 8.5.4
181
0
1
if 0 x < 1
with y(0) = 0 = y (0).
if x 1.
8.6
This section is devoted to an introductory study of higher order linear equations with constant
coefficients. This is an extension of the study of 2nd order linear equations with constant coefficients
(see, Section 8.3).
The standard form of a linear nth order differential equation with constant coefficients is given
by
Ln (y) = f (x) on I,
(8.6.1)
where
dn
dn1
d
+
a
+ + an1
+ an
1
n
n1
dx
dx
dx
is a linear differential operator of order n with constant coefficients, a1 , a2 , . . . , an being real constants (called the coefficients of the linear equation) and the function f (x) is a piecewise continuous
function defined on the interval I. We will be using the notation y (n) for the nth derivative of y. If
f (x) 0, then (8.6.1) which reduces to
Ln
Ln (y) = 0 on I,
(8.6.2)
is called a homogeneous linear equation, otherwise (8.6.1) is called a non-homogeneous linear equation. The function f is also known as the non-homogeneous term or a forcing term.
Definition 8.6.1 A function y defined on I is called a solution of (8.6.1) if y is n times differentiable and y along with its derivatives satisfy (8.6.1).
Remark 8.6.2
1. If u and v are any two solutions of (8.6.1), then y = u v is also a solution
of (8.6.2). Hence, if v is a solution of (8.6.2) and yp is a solution of (8.6.1), then u = v + yp
is a solution of (8.6.1).
2. Let y1 and y2 be two solutions of (8.6.2). Then for any constants (need not be real) c1 , c2 ,
y = c1 y 1 + c2 y 2
is also a solution of (8.6.2). The solution y is called the superposition of y1 and y2 .
3. Note that y 0 is a solution of (8.6.2). This, along with the super-position principle, ensures
that the set of solutions of (8.6.2) forms a vector space over R. This vector space is called the
solution space or space of solutions of (8.6.2).
182
As in Section 8.3, we first take up the study of (8.6.2). It is easy to note (as in Section 8.3) that
for a constant ,
Ln (ex ) = p()ex
where,
p() = n + a1 n1 + + an
(8.6.3)
Definition 8.6.3 (Characteristic Equation) The equation p() = 0, where p() is defined in
(8.6.3), is called the characteristic equation of (8.6.2).
Note that p() is of polynomial of degree n with real coefficients. Thus, it has n zeros (counting
with multiplicities). Also, in case of complex roots, they will occur in conjugate pairs. In view of
this, we have the following theorem. The proof of the theorem is omitted.
Theorem 8.6.4 ex is a solution of (8.6.2) on any interval I R if and only if is a root of
(8.6.3)
1. If 1 , 2 , . . . , n are distinct roots of p() = 0, then
e1 x , e2 x , . . . , en x
are the n linearly independent solutions of (8.6.2).
2. If 1 is a repeated root of p() = 0 of multiplicity k, i.e., 1 is a zero of (8.6.3) repeated k
times, then
e1 x , xe1 x , . . . , xk1 e1 x
are linearly independent solutions of (8.6.2), corresponding to the root 1 of p() = 0.
3. If 1 = + i is a complex root of p() = 0, then so is the complex conjugate 1 = i.
Then the corresponding linearly independent solutions of (8.6.2) are
y1 = ex cos(x) + i sin(x) and y2 = ex cos(x) i sin(x) .
These are complex valued functions of x. However, using super-position principle, we note
that
y1 + y2
y1 y2
= ex cos(x) and
= ex sin(x)
2
2i
are also solutions of (8.6.2). Thus, in the case of 1 = +i being a complex root of p() = 0,
we have the linearly independent solutions
ex cos(x) and ex sin(x).
Example 8.6.5
183
(a) y + y = 0.
184
2. Find a linear differential equation with constant coefficients and of order 3 which admits the
following solutions:
(a) cos x, sin x and e3x .
(b) ex , e2x and e3x .
(c) 1, ex and x.
3. Solve the following IVPs:
(a) y iv y = 0, y(0) = 0, y (0) = 0, y (0) = 0, y (0) = 1.
4.
n1
dn y
y
n1 d
+
a
x
+ + a0 y = 0, x I
n1
n
n1
dx
dx
(8.6.4)
is called the homogeneous Euler-Cauchy Equation (or just Eulers Equation) of degree n.
(8.6.4) is also called the standard form of the Euler equation. We define
L(y) = xn
dn y
dn1 y
+ an1 xn1 n1 + + a0 y.
n
dx
dx
( 1) ( n + 1) + an1 ( 1) ( n + 2) + + a0 = 0.
(8.6.5)
Essentially, for finding the solutions of (8.6.4), we need to find the roots of (8.6.5), which
is a polynomial in . With the above understanding, solve the following homogeneous Euler
equations:
(a) x3 y + 3x2 y + 2xy = 0.
(b) x3 y 6x2 y + 11xy 6y = 0.
(c) x3 y x2 y + xy y = 0.
dy)
dt .
(b) using mathematical induction, show that xn dn y = D(D 1) (D n + 1) y(t).
(c) with the new (independent) variable t, the Euler equation (8.6.4) reduces to an equation
with constant coefficients. So, the questions in the above part can be solved by the method
just explained.
185
We turn our attention toward the non-homogeneous equation (8.6.1). If yp is any solution of
(8.6.1) and if yh is the general solution of the corresponding homogeneous equation (8.6.2), then
y = yh + yp
is a solution of (8.6.1). The solution y involves n arbitrary constants. Such a solution is called the
general solution of (8.6.1).
Solving an equation of the form (8.6.1) usually means to find a general solution of (8.6.1).
The solution yp is called a particular solution which may not involve any arbitrary constants.
Solving (8.6.1) essentially involves two steps (as we had seen in detail in Section 8.3).
Step 1: a) Calculation of the homogeneous solution yh and
b) Calculation of the particular solution yp .
In the ensuing discussion, we describe the method of undetermined coefficients to determine yp .
Note that a particular solution is not unique. In fact, if yp is a solution of (8.6.1) and u is any
solution of (8.6.2), then yp + u is also a solution of (8.6.1). The undetermined coefficients method
is applicable for equations (8.6.1).
8.7
(8.7.6)
k
to obtain
p()
Ln (yp ) = kex .
Thus, yp =
k x
e is a particular solution of Ln (y) = kex .
p()
186
Modification Rule: If is a root of the characteristic equation, i.e., p() = 0, with multiplicity
r, (i.e., p() = p () = = p(r1) () = 0 and p(r) () 6= 0) then we take, yp of the form
yp = Axr ex
and obtain the value of A by substituting yp in Ln (y) = kex .
Example 8.7.1
Solution: Here f (x) = 2ex with k = 2 and = 1. Also, the characteristic polynomial,
p() = 2 4. Note that = 1 is not a root of p() = 0. Thus, we assume yp = Aex . This on
substitution gives
Aex 4Aex = 2ex = 3Aex = 2ex .
So, we choose A =
2
, which gives a particular solution as
3
yp =
2ex
.
3
3Aex x3 + 6x2 + 6x
3Aex x3 + 3x2 Ax3 ex = 2ex .
1
x3 ex
, and thus a particular solution is yp =
.
3
3
187
Case II. f (x) = ex k1 cos(x) + k2 sin(x) ; k1 , k2 , , R
We first assume that + i is not a root of the characteristic equation, i.e., p( + i) 6= 0. Here,
we assume that yp is of the form
yp = ex A cos(x) + B sin(x) ,
and then comparing the coefficients of ex cos x and ex sin x (why!) in Ln (y) = f (x), obtain the
values of A and B.
Modification Rule: If + i is a root of the characteristic equation, i.e., p( + i) = 0, with
multiplicity r, then we assume a particular solution as
yp = xr ex A cos(x) + B sin(x) ,
and then comparing the coefficients in Ln (y) = f (x), obtain the values of A and B.
Example 8.7.3
Solution: Here, = 1 and = 1. Thus +i = 1+i, which is not a root of the characteristic
equation p() = 2 + 2 + 2 = 0. Note that the roots of p() = 0 are 1 i.
Thus, let us assume yp = ex (A sin x + B cos x) . This gives us
(4B + 4A)ex sin x + (4B + 4A)ex cos x = 4ex sin x.
Comparing the coefficients of ex cos x and ex sin x on both sides, we get A B = 1 and
1
A + B = 0. On solving for A and B, we get A = B = . So, a particular solution is
2
ex
yp =
(sin x cos x) .
2
2. Find a particular solution of
y + y = sin x.
Solution: Here, = 0 and = 1. Thus + i = i, which is a root with multiplicity r = 1,
of the characteristic equation p() = 2 + 1 = 0.
So, let yp = x (A cos x + B sin x) . Substituting this in the given equation and comparing the
1
coefficients of cos x and sin x on both sides, we get B = 0 and A = . Thus, a particular
2
1
solution is yp =
x cos x.
2
Exercise 8.7.4 Find a particular solution for the following differential equations:
1. y y + y y = ex cos x.
2. y + 2y + y = sin x.
3. y 2y + 2y = ex cos x.
188
and then compare the coefficient of xk in Ln (yp ) = f (x) to obtain the values of Ai for 0 i m.
Example 8.7.5 Find a particular solution of
y y + y y = x2 .
Solution: As p(0) 6= 0, we assume
yp = A2 x2 + A1 x + A0
which on substitution in the given differential equation gives
2A2 + (2A2 x + A1 ) (A2 x2 + A1 x + A0 ) = x2 .
Comparing the coefficients of different powers of x and solving, we get A2 = 1, A1 = 2 and
A0 = 0. Thus, a particular solution is
yp = (x2 + 2x).
Finally, note that if yp1 is a particular solution of Ln (y) = f1 (x) and yp2 is a particular solution
of Ln (y) = f2 (x), then a particular solution of
Ln (y) = k1 f1 (x) + k2 f2 (x)
is given by
yp = k1 yp1 + k2 yp2 .
In view of this, one can use method of undetermined coefficients for the cases, where f (x) is a linear
combination of the functions described above.
Example 8.7.6 Find a particular soltution of
y + y = 2 sin x + sin 2x.
1
sin(2x).
3
Exercise 8.7.7 Find a particular solution for the following differential equations:
1. y y + y y = 5ex cos x + 10e2x.
2. y + 2y + y = x + ex .
3. y + 3y 4y = 4ex + e4x .
4. y + 9y = cos x + x2 + x3 .
5. y 3y + 4y = x2 + e2x sin x.
6. y + 4y + 6y + 4y + 5y = 2 sin x + x2 .
189
190
Chapter 9
Introduction
X
an (x x0 )n
(9.1.1)
n=0
is called a power series in x around x0 . The point x0 is called the centre, and an s are called the
coefficients.
In short, a0 , a1 , . . . , an , . . . are called the coefficient of the power series and x0 is called the
centre. Note here that an R is the coefficient of (x x0 )n and that the power series converges for
x = x0 . So, the set
X
S = {x R :
an (x x0 )n converges}
n=0
is a non-empty. It turns out that the set S is an interval in R. We are thus led to the following
definition.
Example 9.1.2
x3
x5
x7
+
+ .
3!
5!
7!
191
192
2. Any polynomial
a0 + a1 x + a2 x2 + + an xn
is a power series with x0 = 0 as the centre, and the coefficients am = 0 for m n + 1.
Definition 9.1.3 (Radius of Convergence) A real number R 0 is called the radius of convergence of the power series (9.1.1), if the expression (9.1.1) converges for all x satisfying
|x x0 | < R and R is the largest such number.
From what has been said earlier, it is clear that the set of points x where the power series (9.1.1)
is convergent is the interval (R + x0 , x0 + R), whenever R is the radius of convergence. If R = 0,
the power series is convergent only at x = x0 .
Let R > 0 be the radius of convergence of the power series (9.1.1). Let I = (R + x0 , x0 + R).
In the interval I, the power series (9.1.1) converges. Hence, it defines a real valued function and we
denote it by f (x), i.e.,
X
f (x) =
an (x x0 )n , x I.
n=1
Such a function is well defined as long as x I. f is called the function defined by the power
series (9.1.1) on I. Sometimes, we also use the terminology that (9.1.1) induces a function f on I.
It is a natural question to ask how to find the radius of convergence of a power series (9.1.1).
We state one such result below but we do not intend to give a proof.
Theorem 9.1.4
1. Let
n=1
n=1
n=1
lim
p
n
|an |
9.1. INTRODUCTION
193
Remark 9.1.5 If the reader is familiar with the concept of lim sup of a sequence, then we have a
modification p
of the above theorem.
n
In case,
|an | does p
not tend to a limit as n , then the above theorem holds if we replace
p
lim n |an | by lim sup n |an |.
n
P
1. Consider the power series
(x+1)n . Here x0 = 1 is the centre and an = 1
n=0
p
for all n 0. So, n |an | = n 1 = 1. Hence, by Theorem 9.1.4, the radius of convergenceR = 1.
Example 9.1.6
X (1)n (x + 1)2n+1
. In this case, the centre is
(2n + 1)!
n0
p
|a2n+1 | = 0 and
2n+1
lim
Thus, lim
lim
(1)n
.
(2n + 1)!
p
|a2n | = 0.
2n
p
n
|an | exists and equals 0. Therefore, the power series converges for all x R.
n=1
p
|a2n+1 | = 0 and
2n+1
lim
p
Thus, lim n |an | does not exist.
lim
p
|a2n | = 1.
2n
n=1
x2n reduces to
n=1
n=1
un converges for all u with |u| < 1. Therefore, the original power
series converges whenever |x2 | < 1 or equivalently whenever |x| < 1. So, the radius of convergence is R = 1. Note that
X
1
=
x2n for |x| < 1.
1 x2
n=1
4. Consider the power series
n0
nn xn . In this case,
n
|an | = n nn = n. doesnt have any finite
n1
X xn
1
1
5. The power series
has coefficients an =
and it is easily seen that lim = 0
n n!
n!
n!
n0
and the power series converges for all x R. Recall that it represents ex .
194
9.1.1
Now we quickly state some of the important properties of the power series. Consider two power
series
X
X
an (x x0 )n and
bn (x x0 )n
n=0
n=0
with radius of convergence R1 > 0 and R2 > 0, respectively. Let F (x) and G(x) be the functions
defined by the two power series defined for all x I, where I = (R + x0 , x0 + R) with R =
min{R1 , R2 }. Note that both the power series converge for all x I.
With F (x), G(x) and I as defined above, we have the following properties of the power series.
1. Equality of Power Series
The two power series defined by F (x) and G(x) are equal for all x I if and only if
an = bn for all n = 0, 1, 2, . . . .
In particular, if
n=0
(an + bn )(x x0 )n
n=0
Essentially, it says that in the common part of the regions of convergence, the two power
series can be added term by term.
3. Multiplication of Power Series
Let us define
c0 = a0 b0 , and inductively cn =
n
X
anj bj .
j=1
n=0
cn (x x0 )n .
X
X
k
j
bk (x x0 )
aj (x x0 )
j=0
k=0
is cn =
n
X
j=1
n=1
nan (x x0 )n .
anj bj .
195
Note that it also has R1 as the radius of convergence as by Theorem 9.1.4 lim
n
and
p
p
p
1
lim n |nan | = lim n |n| lim n |an | = 1
.
n
n
n
R1
Let 0 < r < R1 . Then for all x (r + x0 , x0 + r), we have
p
n
|an | = R11
X
d
F (x) = F (x) =
nan (x x0 )n .
dx
n=1
In other words, inside the region of convergence, the power series can be differentiated term
by term.
In the following, we shall consider power series with x0 = 0 as the centre. Note that by a
transformation of X = x x0 , the centre of the power series can be shifted to the origin.
Exercise 9.1.1
1. which of the following represents a power series (with centre x0 indicated in
the brackets) in x?
(a) 1 + x2 + x4 + + x2n +
(x0 = 0).
(c) 1 + x|x| + x |x | + + x |x | +
(x0 = 0).
(x0 = 0).
x5
x2n+1
x3
+
+ (1)n
+
3!
5!
(2n + 1)!
x2
x4
x2n
= 1
+
+ (1)n
+ .
2!
4!
(2n)!
= x
Find the radius of convergence of f (x) and g(x). Also, for each x in the domain of convergence,
show that
f (x) = g(x) and g (x) = f (x).
[Hint: Use Properties 1, 2, 3 and 4 mentioned above. Also, note that we usually call f (x) by
sin x and g(x) by cos x.]
3. Find the radius of convergence of the following series centred at x0 = 1.
(a) 1 + (x + 1) +
(x+1)2
2!
+ +
2
(x+1)n
n!
+ .
9.2
(9.2.1)
Let a and b be analytic around the point x0 = 0. In such a case, we may hope to have a solution y
in terms of a power series, say
X
y=
ck xk .
(9.2.2)
k=0
In the absence of any information, let us assume that (9.2.1) has a solution y represented by (9.2.2).
We substitute (9.2.2) in Equation (9.2.1) and try to find the values of ck s. Let us take up an example
for illustration.
196
(9.2.3)
X
y=
cn xn .
(9.2.4)
n=0
Then y =
n=0
n=0
n=0
cn xn = 0
n=0
or, equivalently
0=
(n + 2)(n + 1)cn+2 x +
n=0
cn x =
n=0
n=0
cn
.
(n + 1)(n + 2)
Therefore, we have
c2 = c2!0 ,
c4 = (1)2 c4!0 ,
..
.
c0
c2n = (1)n (2n)!
,
c3 = c3!1 ,
c5 = (1)2 c5!1 ,
..
.
c1
c2n+1 = (1)n (2n+1)!
.
X
X
(1)n x2n
(1)n x2n+1
+ c1
(2n)!
(2n + 1)!
n=0
n=0
9.3
197
Earlier, we saw a few properties of a power series and some uses also. Presently, we inquire the
question, namely, whether an equation of the form
y + a(x)y + b(x)y = f (x), x I
(9.3.1)
admits a solution y which has a power series representation around x I. In other words, we are
interested in looking into an existence of a power series solution of (9.3.1) under certain conditions
on a(x), b(x) and f (x). The following is one such result. We omit its proof.
Theorem 9.3.1 Let a(x), b(x) and f (x) admit a power series representation around a point x =
x0 I, with non-zero radius of convergence r1 , r2 and r3 , respectively. Let R = min{r1 , r2 , r3 }.
Then the Equation (9.3.1) has a solution y which has a power series representation around x0 with
radius of convergence R.
Remark 9.3.2 We remind the readers that Theorem 9.3.1 is true for Equations (9.3.1), whenever
the coefficient of y is 1.
Secondly, a point x0 is called an ordinary point for (9.3.1) if a(x), b(x) and f (x) admit power
series expansion (with non-zero radius of convergence) around x = x0 . x0 is called a singular
point for (9.3.1) if x0 is not an ordinary point for (9.3.1).
The following are some examples for illustration of the utility of Theorem 9.3.1.
Exercise 9.3.3
1. Examine whether the given point x0 is an ordinary point or a singular point
for the following differential equations.
(a) (x 1)y + sin xy = 0, x0 = 0.
(b) y +
sin x
x1 y
= 0, x0 = 0.
9.4
9.4.1
Legendre Equation plays a vital role in many problems of mathematical Physics and in the theory
of quadratures (as applied to Numerical Integration).
Definition 9.4.1 The equation
(1 x2 )y 2xy + p(p + 1)y = 0, 1 < x < 1
where p R, is called a Legendre Equation of order p.
(9.4.1)
198
Equation (9.4.1) was studied by Legendre and hence the name Legendre Equation.
Equation (9.4.1) may be rewritten as
y
2x
p(p + 1)
y +
y = 0.
2
(1 x )
(1 x2 )
2x
p(p + 1)
and
are analytic around x0 = 0 (since they have power series
1 x2
1 x2
expressions with centre at x0 = 0 and with R = 1 as the radius of convergence). By Theorem
9.3.1, a solution y of (9.4.1) admits a power series solution (with centre at x0 = 0) with radius of
P
convergence R = 1. Let us assume that y =
ak xk is a solution of (9.4.1). We have to find the
The functions
k=0
k=0
k=0
k=0
Hence, for k = 0, 1, 2, . . .
ak+2 =
(p k)(p + k + 1)
ak .
(k + 1)(k + 2)
= p(p+1)
2! a0 ,
(p2)(p+3)
=
a2
34
a0 ,
= (1)2 p(p2)(p+1)(p+3)
4!
a3 = (p1)(p+2)
a1 ,
3!
2 (p1)(p3)(p+2)(p+4)
a5 = (1)
a1
5!
etc. In general,
a2m = (1)m
and
a2m+1 = (1)m
It turns out that both a0 and a1 are arbitrary. So, by choosing a0 = 1, a1 = 0 and a0 = 0, a1 = 1
in the above expressions, we have the following two solutions of the Legendre Equation (9.4.1),
namely,
p(p + 1) 2
(p 2m + 2) (p + 2m 1) 2m
x + + (1)m
x +
2!
(2m)!
(9.4.2)
(p 1)(p + 2) 3
(p 2m + 1) (p + 2m) 2m+1
x + + (1)m
x
+ .
3!
(2m + 1)!
(9.4.3)
y1 = 1
and
y2 = x
Remark 9.4.2 y1 and y2 are two linearly independent solutions of the Legendre Equation (9.4.1).
It now follows that the general solution of (9.4.1) is
y = c1 y 1 + c2 y 2
where c1 and c2 are arbitrary real numbers.
(9.4.4)
9.4.2
199
Legendre Polynomials
In many problems, the real number p, appearing in the Legendre Equation (9.4.1), is a non-negative
integer. Suppose p = n is a non-negative integer. Recall
ak+2 =
(n k)(n + k + 1)
ak , k = 0, 1, 2, . . . .
(k + 1)(k + 2)
(9.4.5)
(2n)!
1 3 5 (2n 1)
=
.
n
2
2 (n!)
n!
(n 1)n
(2n 2)!
an = n
2(2n 1)
2 (n 1)!(n 2)!
(2n 2m)!
.
2n m!(n m)!(n 2m)!
Hence,
M
X
(1)m
m=0
where M =
(2n 2m)!
xn2m ,
m)!(n 2m)!
2n m!(n
(9.4.6)
n
n1
when n is even and M =
when n is odd.
2
2
Proposition 9.4.4 Let p = n be a non-negative even integer. Then any polynomial solution y of
(9.4.1) which has only even powers of x is a multiple of Pn (x).
Similarly, if p = n is a non-negative odd integer, then any polynomial solution y of (9.4.1) which
has only odd powers of x is a multiple of Pn (x).
Proof. Suppose that n is a non-negative even integer. Let y be a polynomial solution of (9.4.1).
By (9.4.4)
y = c1 y 1 + c2 y 2 ,
200
where y1 is a polynomial of degree n (with even powers of x) and y2 is a power series solution with
odd powers only. Since y is a polynomial, we have c2 = 0 or y = c1 y1 with c1 6= 0.
Similarly, Pn (x) = c1 y1 with c1 6= 0. which implies that y is a multiple of Pn (x). A similar proof
holds when n is an odd positive integer.
We have an alternate way of evaluating Pn (x). They are used later for the orthogonality properties of the Legendre polynomials, Pn (x)s.
Theorem 9.4.5 (Rodrigues
Formula) The Legendre polynomials Pn (x) for n = 1, 2, . . . , are
given by
1 dn 2
Pn (x) = n
(x 1)n .
(9.4.7)
2 n! dxn
Proof. Let V (x) = (x2 1)n . Then
(x2 1)
d
dx V
d
V (x) = 2nx(x2 1)n = 2nxV (x).
dx
Now differentiating (n + 1) times (by the use of the Leibniz rule for differentiation), we get
(x2 1)
By denoting, U (x) =
dn+2
V (x)
dxn+2
dn
dxn V
dn+1
2n(n + 1) dn
V (x) +
V (x)
n+1
dx
1 2 dxn
dn+1
dn
2nx n+1 V (x) 2n(n + 1) n V (x) = 0.
dx
dx
+ 2(n + 1)x
(x), we have
0
0.
This tells us that U (x) is a solution of the Legendre Equation (9.4.1). So, by Proposition 9.4.4, we
have
dn
Pn (x) = U (x) = n (x2 1)n for some R.
dx
Also, let us note that
dn 2
(x 1)n
dxn
=
=
Therefore,
dn
{(x 1)(x + 1)}n
dxn
n!(x + 1)n + terms containing a factor of (x 1).
dn 2
n
(x
1)
= 2n n! or, equivalently
n
dx
x=1
1 dn 2
n
(x
1)
=1
n
n
2 n! dx
x=1
and thus
Pn (x) =
1 dn 2
(x 1)n .
2n n! dxn
Example 9.4.6
1. When n = 0, P0 (x) = 1.
2. When n = 1, P1 (x) =
1 d 2
(x 1) = x.
2 dx
201
1 d2 2
1
3
1
(x 1)2 = {12x2 4} = x2 .
22 2! dx2
8
2
2
(9.4.8)
(1 x2 )Pm
(x) + m(m + 1)Pm (x) = 0.
(9.4.9)
(9.4.10)
Multiplying Equation (9.4.9) by Pm (x) and Equation (9.4.10) by Pn (x) and subtracting, we get
n(n + 1)
=
m(m + 1)
Z
0.
1
Pn (x)Pm (x)dx
(1 x2 )Pm
(x) Pn (x) (1 x2 )Pn (x) Pm (x) dx
2
(1 x
)Pm
(x)Pn (x)dx
+ (1 x
x=1
)Pm
(x)Pn (x)
x=1
x=1
2
x=1
Theorem 9.4.8 For n = 0, 1, 2, . . .
Z
Pn2 (x) dx =
2
.
2n + 1
(9.4.11)
Proof. Let us write V (x) = (x2 1)n . By the Rodrigues formula, we have
2 n
Z 1
Z 1
1
d
dn
Pn2 (x) dx =
V (x) n V (x)dx.
n
n
dx
dx
1
1 n!2
Let us call I =
Z1
dn
dn
V (x) n V (x)dx. Note that for 0 m < n,
n
dx
dx
dm
dm
V
(1)
=
V (1) = 0.
dxm
dxm
(9.4.12)
202
d2n
V (x) (1)n V (x)dx = (2n)!
dx2n
2 n
(1 x ) dx = (2n)! 2
R2
(1 x2 )n dx.
We now state an important expansion theorem. The proof is beyond the scope of this book.
Theorem 9.4.9 Let f (x) be a real valued continuous function defined in [1, 1]. Then
f (x) =
n=0
2n + 1
where an =
2
Z1
an Pn (x), x [1, 1]
f (x)Pn (x)dx.
Legendre polynomials can also be generated by a suitable function. To do that, we state the
following result without proof.
Theorem 9.4.10 Let Pn (x) be the Legendre polynomial of degree n. Then
X
1
=
Pn (x)tn , t 6= 1.
1 2xt + t2
n=0
(9.4.13)
1
admits a power series expansion in t (for small t) and the
The function h(t) =
1 2xt + t2
n
coefficient of t in Pn (x). The function h(t) is called the generating function for the Legendre
polynomials.
Exercise 9.4.11
Pn+1
(x)
xPn (x)
xPn (x)
Pn1
(x)
+ (n + 1)Pn (x).
(9.4.14)
(9.4.15)
(9.4.16)
The relations (9.4.14), (9.4.15) and (9.4.16) are called recurrence relations for the Legendre
polynomials, Pn (x). The relation (9.4.14) is also known as Bonnets recurrence relation. We will
now give the proof of (9.4.14) using (9.4.13). The readers are required to proof the other two
recurrence relations.
203
Differentiating the generating function (9.4.13) with respect to t (keeping the variable x fixed),
we get
X
1
2 32
(1 2xt + t ) (2x + 2t) =
nPn (x)tn1 .
2
n=0
Or equivalently,
n=0
nPn (x)tn1 .
n=0
1
n=0
Pn (x)t = (1 2xt + t )
nPn (x)tn1 .
n=0
The two sides and power series in t and therefore, comparing the coefficient of tn , we get
xPn (x) Pn1 (x) = (n + 1)Pn (x) + (n 1)Pn1 (x) 2n x Pn (x).
This is clearly same as (9.4.14).
To prove (9.4.15), one needs to differentiate the generating function with respect to x (keeping
t fixed) and doing a similar simplification. Now, use the relations (9.4.14) and (9.4.15) to get the
relation (9.4.16). These relations will be helpful in solving the problems given below.
Exercise 9.4.12
y(1) = 10.
R1
(b)
R1
(c)
R1
n(n + 1)
n(n + 1)
and Pn (1) = (1)n1
.
2
2
204
Part II
Laplace Transform
205
Chapter 10
Laplace Transform
10.1
Introduction
In many problems, a function f (t), t [a, b] is transformed to another function F (s) through a
relation of the type:
Z b
K(t, s)f (t)dt
F (s) =
a
where K(t, s) is a known function. Here, F (s) is called integral transform of f (t). Thus, an integral
transform sends a given function f (t) into another function F (s). This transformation of f (t) into
F (s) provides a method to tackle a problem more readily. In some cases, it affords solutions to
otherwise difficult problems. In view of this, the integral transforms find numerous applications
in engineering problems. Laplace transform is a particular case of integral transform (where f (t)
is defined on [0, ) and K(s, t) = est ). As we will see in the following, application of Laplace
transform reduces a linear differential equation with constant coefficients to an algebraic equation,
which can be solved by algebraic methods. Thus, it provides a powerful tool to solve differential
equations.
It is important to note here that there is some sort of analogy with what we had learnt during the
study of logarithms in school. That is, to multiply two numbers, we first calculate their logarithms,
add them and then use the table of antilogarithm to get back the original product. In a similar
way, we first transform the problem that was posed as a function of f (t) to a problem in F (s),
make some calculations and then use the table of inverse Laplace transform to get the solution of
the actual problem.
In this chapter, we shall see same properties of Laplace transform and its applications in solving
differential equations.
10.2
208
Definition 10.2.2 (Laplace Transform) Let f : [0, ) R and s R. Then F (s), for s R
is called the Laplace transform of f (t), and is defined by
Z
L(f (t)) = F (s) =
f (t)est dt
0
Remark 10.2.3
b 0
|f (t)| M et for all t > 0 and for some real numbers and M with M > 0.
Then the Laplace transform of f exists.
2. Suppose F (s) exists for some function f . Then by definition, lim
Rb
b 0
lim F (s) = 0.
10.2.1
Examples
lim
.
b s 0
s b s
0
Note that if s > 0, then
esb
lim
= 0.
b s
Example 10.2.5
209
Thus,
F (s) =
1
,
s
for s > 0.
In the remaining part of this chapter, whenever the improper integral is calculated, we will not
explicitly write the limiting process. However, the students are advised to provide the details.
2. Find the Laplace transform F (s) of f (t), where f (t) = t, t 0.
Solution: Integration by parts gives
Z st
Z
e
test
st
dt
F (s) =
te dt =
+
s
s
0
0
0
1
=
for s > 0.
s2
3. Find the Laplace transform of f (t) = tn , n a positive integer.
Solution: Substituting st = , we get
Z
F (s) =
est tn dt
0
Z
1
e n d
=
sn+1 0
n!
=
for s > 0.
sn+1
4. Find the Laplace transform of f (t) = eat , t 0.
Solution: We have
Z
Z
at
at st
L(e ) =
e e dt =
0
1
sa
e(sa)t dt
for s > a.
a sin(at)
dt
s 0
s
0
Z
st
1
a sin(at) est
2 cos(at) e
=
a
dt
s
s
s 0
s
s
0
Note that the limits exist only when s > 0. Hence,
Z
a2 + s2
1
s
cos(at)est dt = . Thus L(cos(at)) = 2
;
s2
s
a
+
s2
0
a
, s > 0.
s2 + a2
s > 0.
210
1
7. Find the Laplace transform of f (t) = , t > 0.
t
Solution: Note that f (t) is not a bounded function near t = 0 (why!). We will still show that
the Laplace transform of f (t) exists.
Z
Z
1
s
1
d
est dt =
e
L( ) =
( substitute = st)
s
t
t
0
0
Z
Z
1
1
1
1
=
2 e d =
2 1 e d.
s 0
s 0
Z
1
Recall that for calculating the integral
2 1 e d, one needs to consider the double inte0
gral
(x2 +y 2 )
dxdy =
Z
x2
2
2 Z
1
1
1
dx =
2 e d .
2 0
2 1 e d =
We now put the above discussed examples in tabular form as they constantly appear in applications of Laplace transform to differential equations.
f (t)
L(f (t))
f (t)
L(f (t))
1
, s>0
s
1
, s>0
s2
eat
1
, s>a
sa
tn
sin(at)
sinh(at)
n!
sn+1
s2
s2
, s>0
a
, s>0
+ a2
cos(at)
a
, s>a
a2
cosh(at)
s2
s2
s
, s>0
+ a2
s
, s>a
a2
10.3
211
The above lemma is immediate from the definition of Laplace transform and the linearity of the
definite integral.
Example 10.3.2
s > |a|.
2. Similarly,
L(sinh(at)) =
1
2
1
1
sa s+a
a
,
s2 a2
s > |a|.
1
.
s(s + 1)
Solution:
L1
1
=
s(s + 1)
=
L1
L1
1
1
s s+1
1
1
L1
= 1 et .
s
s+1
1
is f (t) = 1 et .
s(s + 1)
Theorem 10.3.3 (Scaling by a) Let f (t) be a piecewise continuous function with Laplace trans1 s
form F (s). Then for a > 0, L(f (at)) = F ( ).
a a
Proof. By definition and the substitution z = at, we get
Z
Z
1 s z
L(f (at)) =
est f (at)dt =
e a f (z)dz
a 0
0
Z
1 as z
1 s
=
e
f (z)dz = F ( ).
a 0
a a
Exercise 10.3.4
s2
1
1
+
, find f (t).
+ 1 2s + 1
The next theorem relates the Laplace transform of the function f (t) with that of f (t).
Theorem 10.3.5 (Laplace Transform of Differentiable Functions) Let f (t), for t > 0, be a
differentiable function with the derivative, f (t), being continuous. Suppose that there exist constants
M and T such that |f (t)| M et for all t T. If L(f (t)) = F (s) then
L (f (t)) = sF (s) f (0) for s > .
(10.3.1)
212
1
a
2a
2a
So, by definition,
L f (t) =
=
b
Z
st
lim f (t)e lim
b
b
0
est f (t)dt
0
b
f (t)(s)est dt
f (0) + sF (s).
We can extend the above result for nth derivative of a function f (t), if f (t), . . . , f (n1) (t), f (n) (t)
exist and f (n) (t) is continuous for t 0. In this case, a repeated use of Theorem 10.3.5, gives the
following corollary.
Corollary 10.3.6 Let f (t) be a function with L(f (t)) = F (s). If f (t), . . . , f (n1) (t), f (n) (t) exist
and f (n) (t) is continuous for t 0, then
L f (n) (t) = sn F (s) sn1 f (0) sn2 f (0) f (n1) (0).
(10.3.2)
L f (t) = s2 F (s) sf (0) f (0).
(10.3.3)
Corollary 10.3.7 Let f (t) be a piecewise continuous function for t 0. Also, let f (0) = 0. Then
L(f (t)) = sF (s) or equivalently L1 (sF (s)) = f (t).
Example 10.3.8
s
.
s2 + 1
1
s
Solution: We know that L1 ( 2
) = sin t. Then sin(0) = 0 and therefore, L1 ( 2
)=
s +1
s +1
cos t.
2. Find the Laplace transform of f (t) = cos2 (t).
Solution: Note that f (0) = 1 and f (t) = 2 cos t sin t = sin(2t). Also,
L( sin(2t)) =
2
.
s2 + 4
213
1
s
2
+1
s2 + 4
s2 + 2
.
s(s2 + 4)
Lemma 10.3.9 (Laplace Transform of tf (t)) Let f (t) be a piecewise continuous function with
L(f (t)) = F (s). If the function F (s) is differentiable, then
d
F (s).
ds
d
Equivalently, L1 ( F (s)) = tf (t).
ds
Z
Proof. By definition, F (s) =
est f (t)dt. The result is obtained by differentiating both sides
L(tf (t)) =
with respect to s.
Suppose we know the Laplace transform of a f (t) and we wish to find the Laplace transform of
f (t)
the function g(t) =
. Suppose that G(s) = L(g(t)) exists. Then writing f (t) = tg(t) gives
t
d
F (s) = L(f (t)) = L(tg(t)) = G(s).
ds
Rs
R
Thus, G(s) = F (p)dp for some real number a. As lim G(s) = 0, we get G(s) = F (p)dp.
s
f (t)
. Then
t
Z
L(g(t)) = G(s) = F (p)dp.
Example 10.3.11
2as
a
. Hence L(t sin(at)) = 2
.
s2 + a2
(s + a2 )2
4
.
(s 1)3
1
Solution: We know L(et ) =
and
s12
4
d
1
d
1
=
2
=
2
.
(s 1)3
ds
(s 1)2
ds2 s 1
d
d
By lemma 10.3.9, we know that L(tf (t)) = ds
F (s). Suppose ds
F (s) = G(s). Then g(t) =
1
1 d
L G(s) = L ds F (s) = tf (t). Therefore,
2
d
d
1
L1
F
(s)
=
L
G(s)
= tg(t) = t2 f (t).
ds2
ds
214
Proof. By definition,
Z t
Z
L
f ( ) d =
0
est
Z
f ( ) d
dt =
est f ( ) d dt.
We dont go into the details of the proof of the change in the order of integration. We assume that
the order of the integrations can be changed and therefore
Z Z t
Z Z
st
e f ( ) d dt =
est f ( ) dt d.
0
Thus,
L
f ( ) d
=
=
=
est f ( ) d dt
f ( ) dt d =
es(t )s f ( ) dt d
0
Z
Z
es f ( )d
es(t ) dt
0
Z
Z
1
s
e f ( )d
esz dz = F (s) .
s
0
0
e
st
Rt
1. Find L( 0 sin(az)dz).
a
Solution: We know L(sin(at)) = 2
. Hence
s + a2
Z t
a
a
1
=
.
L(
sin(az)dz) = 2
2)
2 + a2 )
s
(s
+
a
s(s
0
Example 10.3.13
2. Find L
Z
4
.
s(s 1)
1
. So,
s1
Z t
4
1 1
1
1
L
= 4L
=4
e d = 4(et 1).
s(s 1)
ss1
0
Lemma 10.3.14 (s-Shifting) Let L(f (t)) = F (s). Then L(eat f (t)) = F (s a) for s > a.
Proof.
L(eat f (t)) =
=
eat f (t)est dt =
F (s a)
f (t)e(sa)t dt
s > a.
215
Example 10.3.15
2. Find L1
s5
(s5)2 +36
Solution: By s-Shifting, if L(f (t)) = F (s) then L(eat f (t)) = F (s a). Here, a = 5 and
1
s
2
s + 36
=L
s
2
s + 62
= cos(6t).
10.3.1
Let F (s) be a rational function of s. We give a few examples to explain the methods for calculating
the inverse Laplace transform of F (s).
Example 10.3.16
Solution: F (s) =
f (t) =
(s + 1)(s + 3)
s(s + 2)(s + 8)
find f (t).
3
1
35
+
+
. Thus,
16s 12(s + 2) 48(s + 8)
1
35
3
+ e2t + e8t .
16 12
48
Solution: F (s) = 4
4s + 3
s2 + 2s + 5
find f (t).
s+1
1
2
. Thus,
(s + 1)2 + 22
2 (s + 1)2 + 22
1
f (t) = 4et cos(2t) et sin(2t).
2
3. Denominator of F has Repeated Real Roots:
If F (s) =
3s + 4
(s + 1)(s2 + 4s + 4)
find f (t).
Solution: Here,
F (s) =
3s + 4
3s + 4
a
b
c
=
=
+
+
.
(s + 1)(s2 + 4s + 4)
(s + 1)(s + 2)2
s + 1 s + 2 (s + 2)2
1
s+1
1
s+2
+
2
(s+2)2
1
s+1
1
s+2
d
1
+ 2 ds
(s+2)
. Thus,
216
10.3.2
est dt =
esa
, s > 0.
s
c
g(t)
f(t)
d+a
L g(t) = eas F (s).
st
g(t)dt =
Z
a
s(t+a)
0
as
= e
est f (t a)dt
f (t)dt = e
as
est f (t)dt
F (s).
e5s
s2 4s5
e5s
s2 4s5
1
F (s), with F (s) = s2 4s5
. Since s2 4s 5 = (s 2)2 32
1
3
1
L1 (F (s)) = L1
= sinh(3t)e2t .
3 (s 2)2 32
3
=e
5s
217
1
U5 (t) sinh 3(t 5) e2(t5) .
3
0
t cos t
t < 2
t > 2.
0
t < 2
(t 2) cos(t 2) + 2 cos(t 2) t > 2.
s2 1
s
+ 2 2
2
2
(s + 1)
s +1
10.4
10.4.1
The following two theorems give us the behaviour of the function f (t) when t 0+ and when
t .
Theorem 10.4.1 (First Limit Theorem) Suppose L(f (t)) exists. Then
lim f (t) = lim sF (s).
s
t0+
as lim est = 0.
Example 10.4.2
1. For t 0, let Y (s) = L(y(t)) = a(1 + s2 )1/2 . Determine a such that
y(0) = 1.
Solution: Theorem 10.4.1 implies
as
a
1 = lim sY (s) = lim
= lim
. Thus, a = 1.
s
s (1 + s2 )1/2
s ( 12 + 1)1/2
s
(s + 1)(s + 3)
find f (0+ ).
s(s + 2)(s + 8)
Solution: Theorem 10.4.1 implies
2. If F (s) =
(s + 1)(s + 3)
= 1.
s(s + 2)(s + 8)
On similar lines, one has the following theorem. But this theorem is valid only when limt f (t)
exists and is finite.
218
Theorem 10.4.3 (Second Limit Theorem) Suppose L(f (t)) exists. Then
lim f (t) = lim sF (s)
s0
s0
est f (t)dt
Z t
= f (0) + lim lim
es f ( )d
s0 t 0
Z t
= f (0) + lim
lim es f ( )d = lim f (t).
= f (0) + lim
s0
0 s0
2(s + 3)
find lim f (t).
t
s(s + 2)(s + 8)
Solution: From Theorem 10.4.3, we have
s0
lim s
s0
6
3
2(s + 3)
=
= .
s(s + 2)(s + 8)
16
8
f ( )g(t )d.
Check that
1. (f g)(t) = g f (t).
2. If f (t) = cos(t) then (f f )(t) =
t cos(t) + sin(t)
.
2
Theorem 10.4.6 (Convolution Theorem) If F (s) = L(f (t)) and G(s) = L(g(t)) then
L
Z
f ( )g(t )d = F (s) G(s).
1
Remark 10.4.7 Let g(t) = 1 for all t 0. Then we know that L(g(t)) = G(s) = . Thus, the
s
Convolution Theorem 10.4.6 reduces to the Integral Lemma 10.3.12.
10.5
219
F (s) =
nonhomogeneous part
(as + b)f0
af1
+ 2
.
2 + bs + c
as
as
+
bs + c}
|
{z
(10.5.1)
initial conditions
Now, if we know that G(s) is a rational function of s then we can compute f (t) from F (s) by using
the method of partial fractions (see Subsection 10.3.1 ).
Example 10.5.2
y 4y 5y = f (t) =
t
t+1
if 0 t < 5
.
if t 5
1
e5s
+
.
2
s
s
Which gives
s
e5s
1
+
+ 2
(s + 1)(s 5) s(s + 1)(s 5) s (s + 1)(s 5)
1
5
1
e5s
6
5
1
+
+
+
+
6 s5 s+1
30
s s+1 s5
1
30 24
25
1
+
2 +
+
.
150
s
s
s+1 s5
Y (s) =
=
Hence,
y(t)
5e5t
et
1 e(t5)
e5(t5)
+
+ U5 (t) +
+
6
6
5
6
30
1
+
30t + 24 25et + e5t .
150
Remark 10.5.3 Even though f (t) is a discontinuous function at t = 5, the solution y(t) and
y (t) are continuous functions of t, as y exists. In general, the following is always true:
Let y(t) be a solution of ay + by + cy = f (t). Then both y(t) and y (t) are continuous functions
of time.
220
Example 10.5.4
1. Consider the IVP ty (t) + y (t) + ty(t) = 0, with y(0) = 1 and y (0) = 0.
Find L(y(t)).
Solution: Applying Laplace transform, we have
d
d 2
s Y (s) sy(0) y (0) + (sY (s) y(0)) Y (s) = 0.
ds
ds
Using initial conditions, the above equation reduces to
d 2
(s + 1)Y (s) s sY (s) + 1 = 0.
ds
This equation after simplification can be rewritten as
Y (s)
s
= 2
.
Y (s)
s +1
1
Therefore, Y (s) = a(1 + s2 ) 2 . From Example 10.4.2.1, we see that a = 1 and hence
1
Y (s) = (1 + s2 ) 2 .
2. Show that y(t) =
f ( )g(t )d is a solution of
where L[g(t)] =
1
.
s2 + as + b
F (s)
1
= F (s) 2
. Hence,
s2 + as + b
s + as + b
Z t
y(t) = (f g)(t) =
f ( )g(t )d.
0
1
a
a
. Hence,
s2 + a2
Z
1
1 t
y(t) = f (t) sin(at) =
f ( ) sin(a(t ))d.
a
a 0
F (s)
1
=
s2 + a2
a
F (s)
y (t) =
y( )d + t 4 sin t,
with y(0) = 1.
Solution: Taking Laplace transform of both sides and using Theorem 10.3.5, we get
sY (s) 1 =
Y (s)
1
1
+ 2 4 2
.
s
s
s +1
s2 1
1
1
= 2 2
.
s(s2 + 1)
s
s +1
10.6
221
h (t) =
1
h
t<0
0t<h
t > h.
1
U0 (t) Uh (t) . By linearity of the Laplace transform, we get
h
Dh (s) =
1 1 ehs
.
h
s
Remark 10.6.2
1. Observe that in Example 10.6.1, if we allow h to approach 0, we obtain a
new function, say (t). That is, let
(t) = lim h (t).
h0
This new function is zero everywhere except at the origin. At origin, this function tends
to infinity. In other words, the graph of the function appears as a line of infinite height at
the origin. This new function, (t), is called the unit-impulse function (or Diracs delta
function).
2. We can also write
1
U0 (t) Uh (t) .
h0 h
3. In the strict mathematical sense lim h (t) does not exist. Hence, mathematically speaking,
h0
h (t)dt = 1,
for all h.
1 ehs
. Now, if we take the limit of both sides, as h approaches
hs
zero (apply LHospitals rule), we get
sehs
1 ehs
= lim
= 1.
h0
h0
hs
s
L((t)) = lim
222
Chapter 11
Appendix
11.1
Permutation/Symmetric Groups
1
2
Example 11.1.2
1. In general, we represent a permutation by =
(1) (2)
This representation of a permutation is called a two row notation for .
n
(n)
2 3
1 2
, 2 =
2 3
1 3
1 2 3
4 =
, 5 =
2 3 1
3
2
1 2 3
,
2 1 3
3
1 2 3
, 6 =
(11.1.1)
2
3 2 1
, 3 =
1 2
3 1
Remark 11.1.3
1. Let Sn . Then is determined if (i) is known for i = 1, 2, . . . , n. As
is both one to one and onto, {(1), (2), . . . , (n)} = S. So, there are n choices for (1)
(any element of S), n 1 choices for (2) (any element of S different from (1)), and so on.
Hence, there are n(n 1)(n 2) 3 2 1 = n! possible permutations. Thus, the number of
elements in Sn is n!. That is, |Sn | = n!.
2. Suppose that , Sn . Then both and are one to one and onto. So, their composition
map , defined by ( )(i) = (i) , is also both one to one and onto. Hence, is
also a permutation. That is, Sn .
3. Suppose Sn . Then is both one to one and onto. Hence, the function 1 : SS
defined by 1 (m) = if and only if () = m for 1 m n, is well defined and indeed
223
.
224
2. Let =
1
4
2
2
3 4
3 5
1 2 m r n
= (m r) (m ) =
= (r l) (r m).
1 2 r m n
1 2 m r n
Similarly check that =
.
1 2 m r n
225
With the above definitions, we state and prove two important results.
Theorem 11.1.8 For any Sn , can be written as composition (product) of transpositions.
Proof. We will prove the result by induction on n(), the number of inversions of . If n() = 0,
then = Idn = (1 2) (1 2). So, let the result be true for all Sn with n() k.
For the next step of the induction, suppose that Sn with n( ) = k + 1. Choose the smallest
positive number, say , such that
(i) = i, for i = 1, 2, . . . , 1 and () 6= .
As is a permutation, there exists a positive number, say m, such that () = m. Also, note that
m > . Define a transposition by = ( m). Then note that
( )(i) = i, for i = 1, 2, . . . , .
So, the definition of number of inversions and m > implies that
n( )
n
X
i=1
X
i=1
n
X
i=+1
n
X
i=+1
n
X
i=+1
< (m ) +
= n( ).
n
X
i=+1
226
227
Sn
sgn()
n
Y
ai(i) .
i=1
Sn
Remark 11.1.16
1. Observe that det(A) is a scalar quantity. The expression for det(A) seems
complicated at the first glance. But this expression is very helpful in proving the results related
with properties of determinant.
2. If A = [aij ] is a 3 3 matrix, then using (11.1.1),
det(A)
sgn()
ai(i)
i=1
Sn
= sgn(1 )
3
Y
3
Y
i=1
sgn(4 )
3
Y
i=1
3
Y
i=1
3
Y
ai3 (i) +
i=1
3
Y
i=1
3
Y
ai6 (i)
i=1
= a11 a22 a33 a11 a23 a32 a12 a21 a33 + a12 a23 a31 + a13 a21 a32 a13 a22 a31 .
Observe that this expression for det(A) for a 3 3 matrix A is same as that given in (2.5.1).
11.2
Properties of Determinant
228
6. if B is obtained from A by replacing the th row by itself plus k times the mth row, for 6= m
then det(B) = det(A).
7. if A is a triangular matrix then det(A) = a11 a22 ann , the product of the diagonal elements.
8. If E is an elementary matrix of order n then det(EA) = det(E) det(A).
9. A is invertible if and only if det(A) 6= 0.
10. If B is an n n matrix then det(AB) = det(A) det(B).
11. det(A) = det(At ), where recall that At is the transpose of the matrix A.
Proof. Proof of Part 1. Suppose B = [bij ] is obtained from A = [aij ] by the interchange of the
th and mth row. Then bj = amj , bmj = aj for 1 j n and bij = aij for 1 i 6= , m n, 1
j n.
Let = ( m) be a transposition. Then by Proposition 11.1.4, Sn = { : Sn }. Hence
by the definition of determinant and Example 11.1.14.2, we have
det(B)
sgn()
Sn
= sgn( )
Sn
sgn( )
n
Y
bi( )(i)
i=1
sgn( ) sgn() b1( )(1) b2( )(2) b( )() bm( )(m) bn( )(n)
X
Sn
bi(i) =
i=1
Sn
n
Y
Sn
as sgn( ) = 1
= det(A).
Proof of Part 2. Suppose that B = [bij ] is obtained by multiplying the mth row of A by c 6= 0.
Then bmj = c amj and bij = aij for 1 i 6= m n, 1 j n. Then
det(B)
Sn
Sn
Sn
c det(A).
Sn
expression for determinant, contains one entry from each row. Hence, from the condition that A
has a row consisting of all zeros, the value of each term is 0. Thus, det(A) = 0.
Proof of Part 4. Suppose that the th and mth row of A are equal. Let B be the matrix obtained
from A by interchanging the th and mth rows. Then by the first part, det(B) = det(A). But the
assumption implies that B = A. Hence, det(B) = det(A). So, we have det(B) = det(A) = det(A).
Hence, det(A) = 0.
229
Sn
Sn
Sn
= det(B) + det(A).
Proof of Part 6. Suppose that B = [bij ] is obtained from A by replacing the th row by itself plus
k times the mth row, for 6= m. Then bj = aj +k amj and bij = aij for 1 i 6= m n, 1 j n.
Then
X
det(B) =
sgn()b1(1) b2(2) b() bm(m) bn(n)
Sn
Sn
Sn
Sn
Sn
use Part 4
det(A).
Proof of Part 7. First let us assume that A is an upper triangular matrix. Observe that if Sn
is different from the identity permutation then n() 1. So, for every 6= Idn Sn , there exists
a positive integer m, 1 m n 1 (depending on ) such that m > (m). As A is an upper
triangular matrix, am(m) = 0 for each (6= Idn ) Sn . Hence the result follows.
A similar reasoning holds true, in case A is a lower triangular matrix.
Proof of Part 8. Let In be the identity matrix of order n. Then using Part 7, det(In ) = 1. Also,
recalling the notations for the elementary matrices given in Remark 2.2.2, we have det(Eij ) = 1,
(using Part 1) det(Ei (c)) = c (using Part 2) and det(Eij (k) = 1 (using Part 6). Again using
Parts 1, 2 and 6, we get det(EA) = det(E) det(A).
Proof of Part 9. Suppose A is invertible. Then by Theorem 2.2.5, A is a product of elementary
matrices. That is, there exist elementary matrices E1 , E2 , . . . , Ek such that A = E1 E2 Ek . Now
a repeated application of Part 8 implies that det(A) = det(E1 ) det(E2 ) det(Ek ). But det(Ei ) 6= 0
for 1 i k. Hence, det(A) 6= 0.
Now assume that det(A) 6= 0. We show that A is invertible. On the contrary, assume that A
is not invertible. Then by Theorem 2.2.5, the matrix A is not of full rank. That is there exists a
positive integer r < n such that rank(A) = r. So, there exist elementary matrices E1 , E2 , . . . , Ek
B
such that E1 E2 Ek A =
. Therefore, by Part 3 and a repeated application of Part 8,
0
det(E1 ) det(E2 ) det(Ek ) det(A) = det(E1 E2 Ek A) = det
B
0
= 0.
230
But det(Ei ) 6= 0 for 1 i k. Hence, det(A) = 0. This contradicts our assumption that
det(A) 6= 0. Hence our assumption is false and therefore A is invertible.
Proof of Part 10. Suppose A is not invertible. Then by Part 9, det(A) = 0. Also, the product matrix AB is also not invertible. So, again by Part 9, det(AB) = 0. Thus, det(AB) = det(A) det(B).
Now suppose that A is invertible. Then by Theorem 2.2.5, A is a product of elementary matrices.
That is, there exist elementary matrices E1 , E2 , . . . , Ek such that A = E1 E2 Ek . Now a repeated
application of Part 8 implies that
det(AB)
=
=
Proof of Part 11. Let B = [bij ] = At . Then bij = aji for 1 i, j n. By Proposition 11.1.4, we
know that Sn = { 1 : Sn }. Also sgn() = sgn( 1 ). Hence,
X
det(B) =
sgn()b1(1) b2(2) bn(n)
Sn
Sn
Sn
det(A).
Remark 11.2.2
1. The result that det(A) = det(At ) implies that in the statements made in
Theorem 11.2.1, where ever the word row appears it can be replaced by column.
2. Let A = [aij ] be a matrix satisfying a11 = 1 and a1j = 0 for 2 j n. Let B be the submatrix
of A obtained by removing the first row and the first column. Then it can be easily shown that
det(A) = det(B). The reason being is as follows:
for every Sn with (1) = 1 is equivalent to saying that is a permutation of the elements
{2, 3, . . . , n}. That is, Sn1 . Hence,
X
X
sgn()a2(2) an(n)
det(A) =
sgn()a1(1) a2(2) an(n) =
Sn
Sn1
Sn ,(1)=1
We are now ready to relate this definition of determinant with the one given in Definition 2.5.2.
Theorem 11.2.3 Let A be an n n matrix. Then det(A) =
n
P
j=1
(1)1+j a1j det A(1|j) , where
recall that A(1|j) is the submatrix of A obtained by removing the 1st row and the j th column.
Proof. For 1 j n, define two matrices
0
a21
Bj = .
..
an1
0
a22
..
.
..
.
a1j
a2j
..
.
an2
anj
0
a2n
..
.
ann
nn
a1j
a2j
and Cj = .
..
anj
0
a21
..
.
0
a22
..
.
..
.
an1
an2
0
a2n
..
.
ann
nn
11.3. DIMENSION OF M + N
231
n
X
det(Bj ).
(11.2.2)
j=1
We now compute det(Bj ) for 1 j n. Note that the matrix Bj can be transformed into Cj by
j 1 interchanges of columns done in the following manner:
first interchange the 1st and 2nd column, then interchange the 2nd and 3rd column and so on
(the last process consists of interchanging the (j 1)th column with the j th column. Then by
Remark 11.2.2 and Parts 1 and 2 of Theorem 11.2.1, we have det(Bj ) = a1j (1)j1 det(Cj ).
Therefore by (11.2.2),
det(A) =
n
X
j=1
n
X
(1)j1 a1j det A(1|j) =
(1)j+1 a1j det A(1|j) .
j=1
11.3
Dimension of M + N
Theorem 11.3.1 Let V (F) be a finite dimensional vector space and let M and N be two subspaces
of V. Then
dim(M ) + dim(N ) = dim(M + N ) + dim(M N ).
(11.3.3)
Proof. Since M N is a vector subspace of V, consider a basis B1 = {u1 , u2 , . . . , uk } of M N.
As, M N is a subspace of the vector spaces M and N, we extend the basis B1 to form a basis
BM = {u1 , u2 , . . . , uk , v1 , . . . , vr } of M and also a basis BN = {u1 , u2 , . . . , uk , w1 , . . . , ws } of N.
We now proceed to prove that the set B2 = {u1 , u2 , . . . , uk , w1 , . . . , ws , v1 , v2 , . . . , vr } is a basis
of M + N.
To do this, we show that
1. the set B2 is linearly independent subset of V, and
2. L(B2 ) = M + N.
The second part can be easily verified. To prove the first part, we consider the linear system of
equations
1 u1 + + k uk + 1 w1 + + s ws + 1 v1 + + r vr = 0.
(11.3.4)
This system can be rewritten as
1 u1 + + k uk + 1 w1 + + s ws = (1 v1 + + r vr ).
The vector v = (1 v1 + + r vr ) M, as v1 , . . . , vr BM . But we also have v = 1 u1 + +
k uk + 1 w1 + + s ws N as the vectors u1 , u2 , . . . , uk , w1 , . . . , ws BN . Hence, v M N
and therefore, there exists scalars 1 , . . . , k such that v = 1 u1 + 2 u2 + + k uk .
Substituting this representation of v in Equation (11.3.4), we get
(1 1 )u1 + + (k k )uk + 1 w1 + + s ws = 0.
But then, the vectors u1 , u2 , . . . , uk , w1 , . . . , ws are linearly independent as they form a basis.
Therefore, by the definition of linear independence, we get
i i = 0, for 1 i k and j = 0 for 1 j s.
232
11.4
Let D be a region in xy-plane and let M and N be real valued functions defined on D. Consider
an equation
M (x, y(x))dx + N (x, y(x))dy = 0, (x, y(x)) D.
(11.4.5)
Definition 11.4.1 (Exact Equation) The Equation (11.4.5) is called Exact if there exists a real
valued twice continuously differentiable function f such that
f
f
= M and
= N.
x
y
Theorem 11.4.2 Let M and N be smooth in a region D. The equation (11.4.5) is exact if and
only if
N
M
=
.
(11.4.6)
y
x
Proof. Let Equation (11.4.5) be exact. Then there is a smooth function f (defined on D) such
f
2f
2f
M
N
that M = f
x and N = y . So, y = yx = xy = x and so Equation (11.4.6) holds.
Conversely, let Equation (11.4.6) hold. We now show that Equation (11.4.6) is exact. Define
G(x, y) on D by
Z
G(x, y) =
G
x
G
G
M
N
=
=
.
x y
y x
y
x
So
x (N
G
y )
= 0 or N
M (x, y) + N
dy
dx
G
y
=
=
=
=
G
is independent of x. Let (y) = N G
y or N = (y) + y . Now
G
G
dy
+
+ (y)
x
y
dx
Z
G G dy
d
dy
+
+
(y)dy
x
y dx
dy
dx
Z
d
d
G(x, y(x)) +
(y)dy
where y = y(x)
dx
dx
Z
d
f (x, y)
where f (x, y) = G(x, y) + (y)dy
dx