A1 Mathematics
Linear Algebra
Topic 1
Systems of linear equations
1.1 Introduction
Let us begin by discussing the simple equation
\[ ax = b \tag{1.1} \]
This is a linear equation in the unknown x. It is linear because x appears
on the left-hand side in a very simple form: raised to the power one, and
multiplied by a constant. There are no terms such as $x^2$, $\log(x)$ or $\sqrt{x+2}$
on the left-hand side. The only other term in the equation – the b on the
right-hand side – is a constant.
We can think of the left-hand side of eqn (1.1) as the result of a function
that maps x (a scalar) to ax (another scalar):
\[ f(x) = ax \]
The adjective ‘linear’ arises from the fact that the graph of this function is
a straight line (and in particular, one that goes through the origin). More
formally, we say that f is a linear operator if it has the properties
\[ f(x_1 + x_2) = f(x_1) + f(x_2) \qquad \text{and} \qquad f(cx) = c\,f(x) \]
for any inputs $x_1$, $x_2$ and any constant $c$.
More generally, a linear equation in the n unknowns x1, x2, …, xn has
the form
\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \]
where the coefficients a1, a2, …, an and the right-hand side b are all
constants. We will assume that the constants and the unknowns are
numbers in the real domain, denoted ℝ, but it should be pointed out that
some applications (e.g. quantum mechanics) give rise to linear equations
involving numbers in the complex domain.
A system of m such equations in n unknowns can be written compactly in
matrix form as
\[ A\mathbf{x} = \mathbf{b} \tag{1.2} \]
We can think of the left-hand side of eqn (1.2) as the result of a function
that maps x (a vector in ℝⁿ) to Ax (a vector in ℝᵐ):
\[ f(\mathbf{x}) = A\mathbf{x} \]
[Figure: the map f sends points such as P and Q in ℝⁿ to their images in ℝᵐ; the origin O is mapped to the origin.]
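As a quick numerical check of the two linearity properties, here is a
minimal sketch (not part of the original notes; numpy and the particular
matrix A are assumptions made purely for illustration):

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, -1.0, 4.0]])   # arbitrary 2x3 example: f maps R^3 to R^2

def f(x):
    return A @ x

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
c = 2.5

print(np.allclose(f(x + y), f(x) + f(y)))  # True (additivity)
print(np.allclose(f(c * x), c * f(x)))     # True (homogeneity)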
1.2 Two ways of thinking about Ax = b
Consider the 2 × 2 system
\[ \begin{bmatrix} 1 & 1 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \]
We need to find values for x1 and x2 such that the matrix-vector product
on the left-hand side is equal to the vector on the right-hand side.
The first, 'row-oriented' view reads the system row by row, as a pair of
simultaneous equations:
\[ x_1 + x_2 = 3 \]
\[ 4x_1 - x_2 = 2 \]
Our task is to find a point in ℝ² with coordinates (x1, x2) such that both
equations are satisfied simultaneously. Geometrically, we can visualize
plotting two lines:
[Figure: the lines x1 + x2 = 3 and 4x1 − x2 = 2 plotted in the (x1, x2) plane; they intersect at the solution point (1, 2).]
The alternative ‘column-oriented’ view interprets the system as
\[ x_1 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \]
This time we have two vectors in ℝ², namely i + 4j and i − j, and our task
is to combine them (by choosing suitable scalar multipliers x1 and x2) in
such a way that the resultant vector is 3i + 2j. Geometrically, we can
visualize constructing a parallelogram:
[Figure: the parallelogram construction. The vector i + 4j is taken once and the vector i − j is stretched to 2(i − j); together they form a parallelogram whose diagonal is the resultant 3i + 2j.]
The ‘right’ linear combination of the column vectors is 1(i + 4j) + 2(i - j).
Hence x1 = 1, x2 = 2 is the (unique) solution of the original system.
The row- and column-oriented pictures look very different, but they lead,
of course, to the same solution. They are two sides of the same coin.
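Here is a minimal computational sketch of the two views (numpy and this
code are assumptions of this write-up, not part of the original notes):

import numpy as np

A = np.array([[1.0, 1.0],
              [4.0, -1.0]])
b = np.array([3.0, 2.0])

x = np.linalg.solve(A, b)                                # row view: intersect the lines
print(x)                                                 # [1. 2.]

# column view: the same x weights the columns of A to reproduce b
print(np.allclose(x[0] * A[:, 0] + x[1] * A[:, 1], b))   # True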
The column-oriented view may seem less natural, but we will see that it
can be very helpful in understanding and solving general m × n linear
systems, whether homogeneous (Ax = 0) or non-homogeneous (Ax = b).
In particular, the column-oriented view provides us with an alternative
way of posing (and then answering!) the two fundamental questions that
arise in relation to any linear system:
Is there any solution at all?
If there is a solution, is it unique, or are there infinitely many?
1.3 Echelon form
All of the systems you studied in P1 involved n equations in n unknowns.
We now want to relax that restriction, and see what can happen when
the coefficient matrix A is rectangular (m ≠ n). There are three scenarios:
fewer equations than unknowns (m < n), the same number of equations
as unknowns (m = n), and more equations than unknowns (m > n).
[Three example systems, one for each scenario, displayed in matrix-vector form, each coefficient matrix showing the staircase ('echelon') pattern.]
In any row that does not consist entirely of zeros, the leftmost nonzero
entry is given a special name: the leading entry. We can now proceed to
a formal definition. A matrix is in (row-)echelon form if:
(1) any rows consisting entirely of zeros appear at the bottom;
(2) in each nonzero row, the leading entry lies strictly to the right of
the leading entry in the row above.
Some authors add a third condition, that each of the leading entries must
be 1. This complicates the elimination process slightly (each row has to
be divided through by its leading entry) but it simplifies back-substitution.
We will not insist on this; the main thing is to get the pattern right.
Here are some examples of matrices that are not in echelon form:
[Three example matrices violating the echelon pattern, e.g. a zero row above a nonzero row, or a leading entry not strictly to the right of the one above.]
Note that if a square matrix is reduced to echelon form, we get a bonus:
the determinant is simply the product of the entries on the diagonal. Be
careful, however. Although ERO1 (remarkably) does not affect the
determinant, ERO2 and ERO3 both do. Swapping two rows negates the
determinant, while scaling a row scales the determinant by the same
factor.
Example
Reduce the following matrix to echelon form.
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 2 & 4 & 6 & -7 & 8 \\ 1 & 4 & 2 & 2 & 1 \\ 4 & 8 & 12 & -2 & -8 \end{bmatrix} \]
Solution
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 0 & 0 & -3 & 6 \\ 0 & 2 & -1 & 4 & 0 \\ 0 & 0 & 0 & 6 & -12 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow R2 - 2R1 \\ R3 \leftarrow R3 - R1 \\ R4 \leftarrow R4 - 4R1 \end{matrix} \]
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 2 & -1 & 4 & 0 \\ 0 & 0 & 0 & -3 & 6 \\ 0 & 0 & 0 & 6 & -12 \end{bmatrix} \quad R2 \leftrightarrow R3 \]
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 2 & -1 & 4 & 0 \\ 0 & 0 & 0 & -3 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad R4 \leftarrow R4 + 2R3 \]
Optional extra #1: if we wanted all the leading entries to be 1, we could
have scaled the rows en route, but we can also scale them now:
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 1 & -\tfrac{1}{2} & 2 & 0 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow \tfrac{1}{2}R2 \\ R3 \leftarrow -\tfrac{1}{3}R3 \\ \\ \end{matrix} \]
Optional extra #2: having done this, we could carry on and clear out the
columns vertically above each leading 1, working from right to left. This
eventually leads to the reduced row-echelon form:
\[ \begin{bmatrix} 1 & 2 & 3 & 0 & -3 \\ 0 & 1 & -\tfrac{1}{2} & 0 & 4 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{matrix} R1 \leftarrow R1 + 2R3 \\ R2 \leftarrow R2 - 2R3 \\ \\ \\ \end{matrix} \]
\[ \begin{bmatrix} 1 & 0 & 4 & 0 & -11 \\ 0 & 1 & -\tfrac{1}{2} & 0 & 4 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad R1 \leftarrow R1 - 2R2 \]
Reduced row-echelon form is the simplest form that a matrix (or really,
the underlying system of equations) can take. It makes back-substitution
a breeze, and furthermore it is unique, whereas an ordinary row-echelon
form (even with leading 1’s) is not. Then again, we need to work harder
to obtain the reduced version in the first place.
For Matlab enthusiasts, there is a function called rref for computing the
reduced row-echelon form of a matrix.
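For readers working in Python instead, sympy offers an equivalent of
Matlab's rref; a minimal sketch on the matrix of the worked example
(sympy itself is an assumption here, as the notes only mention Matlab):

from sympy import Matrix

M = Matrix([[1, 2, 3, -2, 1],
            [2, 4, 6, -7, 8],
            [1, 4, 2, 2, 1],
            [4, 8, 12, -2, -8]])

R, pivot_cols = M.rref()   # reduced row-echelon form + pivot column indices
print(R)                   # leading 1s with zeros above and below
print(pivot_cols)          # (0, 1, 3): the columns of the leading entries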
1.4 Rank and nullity
Let us first refresh the notion of linear independence: vectors
v1, v2, …, vk are linearly independent if the only linear combination of
them that equals the zero vector is the one with all coefficients zero, i.e.
\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0} \quad \Longrightarrow \quad c_1 = c_2 = \cdots = c_k = 0 \]
[Examples: several small sets of vectors in ℝ³, some linearly independent and some not.]
We now need to introduce the idea of the kernel (or null space) of a
matrix. In Section 1.1 we discussed how an m × n matrix A defines a
linear transformation from ℝⁿ to ℝᵐ. Essentially, the kernel of A is the set
of all vectors in ℝⁿ that get mapped to the zero vector in ℝᵐ. It is denoted
ker(A). The dimension of this set, i.e. the number of linearly independent
vectors it contains, is called the nullity of A. More formally,
\[ \ker(A) = \{ \mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0} \}, \qquad \operatorname{nullity}(A) = \dim \ker(A) \]
It is clear that ker(A) always contains the zero vector, 0. If this is the only
vector in ker(A), then by convention, nullity(A) = 0. If ker(A) contains
vectors other than 0, then nullity(A) > 0. One way to determine nullity(A)
is to solve the homogeneous system Ax = 0, then count the number of
linearly independent solutions. We will do this in the next section. There
is, however, a neat shortcut based on the rank-nullity theorem. Here
rank(A) is the number of linearly independent rows (equivalently,
columns) of A, most easily found by counting the nonzero rows in an
echelon form of A. For an m × n matrix A the theorem states that
rank(A) + nullity(A) = n
Example
Find the rank and nullity of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 3 & 5 \\ 2 & 3 & 4 & 7 \\ 3 & 4 & 5 & 9 \end{bmatrix} \]
Solution
\[ \begin{bmatrix} 1 & 2 & 3 & 5 \\ 0 & -1 & -2 & -3 \\ 0 & -2 & -4 & -6 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow R2 - 2R1 \\ R3 \leftarrow R3 - 3R1 \end{matrix} \]
\[ \begin{bmatrix} 1 & 2 & 3 & 5 \\ 0 & -1 & -2 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad R3 \leftarrow R3 - 2R2 \]
There are two nonzero rows, so rank(A) = 2. Since the matrix has n = 4
columns, nullity(A) = n – rank(A) = 4 – 2 = 2.
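A quick cross-check of this example, assuming numpy (whose matrix_rank
uses a tolerance-based SVD rather than exact elimination):

import numpy as np

A = np.array([[1.0, 2.0, 3.0, 5.0],
              [2.0, 3.0, 4.0, 7.0],
              [3.0, 4.0, 5.0, 9.0]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank       # rank-nullity theorem: nullity = n - rank
print(rank, nullity)              # 2 2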
1.5 Solving Ax = 0
A linear system in which the right-hand side vector is filled with zeros is
said to be homogeneous. Suppose we are given a matrix A and asked to
find the general solution of Ax = 0. This is exactly the same as being
asked to find ker(A). To start with, we can quickly determine nullity(A) as
outlined above. If it turns out to be zero, ker(A) contains only the zero
vector, and we can declare immediately that Ax = 0 has only the trivial
solution x = 0. If nullity(A) is not zero, then ker(A) contains vectors other
than 0, and Ax = 0 has infinitely many solutions. To find them, we use
reduction to echelon form and a ‘special’ back-substitution process.
Example
Find the general solution of Ax = 0 where
\[ A = \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix} \]
Solution
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ 0 & 0 & 3 & 3 & -9 \\ 0 & 0 & 9 & 9 & -27 \\ 0 & 0 & -9 & -9 & 27 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow R2 + 2R1 \\ R3 \leftarrow R3 + 4R1 \\ R4 \leftarrow R4 - 2R1 \end{matrix} \]
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ 0 & 0 & 3 & 3 & -9 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{matrix} \\ \\ R3 \leftarrow R3 - 3R2 \\ R4 \leftarrow R4 + 3R2 \end{matrix} \]
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ 0 & 0 & 3 & 3 & -9 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]
The variables are now split into two groups. Columns C1 and C3 contain
the leading entries on the echelon. The corresponding variables
x1 and x3 are variously referred to as the leading variables, key variables,
bound variables or pivot variables. The remaining columns (C2, C4, C5)
are associated with the variables x2, x4 and x5. These are called the free
variables (or sometimes, the non-leading variables). To find the general
solution, we first assign arbitrary parameters to the free variables, say
\[ x_2 = \alpha, \qquad x_4 = \beta, \qquad x_5 = \gamma \]
Solving row R2 for x3:
\[ 3x_3 + 3x_4 - 9x_5 = 0 \]
\[ 3x_3 + 3\beta - 9\gamma = 0 \]
\[ x_3 = -\beta + 3\gamma \]
Solving row R1 for x1:
\[ 2x_1 + 3x_2 + 2x_3 + x_4 - 4x_5 = 0 \]
\[ 2x_1 + 3\alpha + 2(-\beta + 3\gamma) + \beta - 4\gamma = 0 \]
\[ x_1 = -\tfrac{3}{2}\alpha + \tfrac{1}{2}\beta - \gamma \]
Collecting the five components into a vector, the general solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \alpha \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \gamma \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \tag{1.3} \]
The three vectors appearing here form a basis for ker(A):
\[ \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \]
In this instance, ker(A) is seen to be a 3-dimensional subspace of ℝ⁵.
Hence nullity(A) – which is just the dimension of ker(A) – is equal to 3.
This agrees with the initial ‘prediction’ of nullity(A) that was made using
the rank-nullity theorem. It is good practice to check that each of the
kernel basis vectors does actually satisfy Ax = 0, and this can readily be
verified. It follows that any linear combination of the basis vectors, as in
eqn (1.3), must also satisfy Ax = 0.
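As a sketch of the same computation in software (scipy is an assumption
here, not something the notes use): scipy's null_space returns an
orthonormal kernel basis, so its vectors differ entry by entry from the
hand-derived ones, but they span the same subspace:

import numpy as np
from scipy.linalg import null_space

A = np.array([[ 2.0,   3.0,  2.0,  1.0,  -4.0],
              [-4.0,  -6.0, -1.0,  1.0,  -1.0],
              [-8.0, -12.0,  1.0,  5.0, -11.0],
              [ 4.0,   6.0, -5.0, -7.0,  19.0]])

N = null_space(A)                  # columns: an orthonormal basis of ker(A)
print(N.shape[1])                  # 3, i.e. nullity(A) = 3
print(np.allclose(A @ N, 0.0))     # True: every basis vector satisfies Ax = 0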
1.6 Solving Ax = b
With a homogeneous linear system Ax = 0, the answer to the question of
solution existence is always affirmative, since (as discussed above) there
will always be at least the trivial solution x = 0. With a non-homogeneous
system Ax = b, the answer to this question is no longer automatic: there
is now a possibility that the system will have no solution.
[Example: echelon reduction of the augmented matrix [A | b] of a system of 4 equations in 5 unknowns, in which a contradictory row appears.]
A contradictory row is one that is entirely zero in the A-part but
nonzero on the right-hand side:
\[ \left[\begin{array}{ccccc|c} 0 & 0 & 0 & 0 & 0 & k \end{array}\right], \qquad k \neq 0 \]
It represents the impossible equation 0 = k, so the system has no
solution. Here, the mismatch between the column ranks of [A | b] and A (3
and 2 respectively) indicates that the vector b in ℝ⁴ cannot be expressed
as a linear combination of the 5 columns of A.
Suppose instead that the reduction produces no contradictory row, so
that every row that is zero in the A-part is also zero on the right:
\[ \left[\begin{array}{ccccc|c} 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] \]
We can then infer that the original system Ax = b is consistent, i.e. it has
at least one solution. Taking the row-oriented view, the absence of a
contradictory equation indicates that the m hyperplanes in ℝⁿ have at
least one point in common. Taking the column-oriented view, the fact
that [A | b] and A have the same column rank indicates that the vector b
in ℝᵐ can be expressed – in at least one way – as a linear combination
of the n columns of A.
If the above analysis shows that there is at least one solution, we turn to
the question of uniqueness: is there just one solution, or a whole family
of solutions? The answer for Ax = b, just as for Ax = 0, is determined by
the nullity of A. If nullity(A) = 0, the solution is unique, and it can be found
by ‘standard’ back-substitution (leading variables only, no free variables).
If nullity(A) > 0, the solution involves a corresponding number of arbitrary
parameters, and its general form has to be worked out by ‘special’ back-
substitution (separate treatment of leading and free variables).
Example
Find the general solution of Ax = b where
\[ A = \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 7 \\ -2 \\ 8 \\ -22 \end{bmatrix} \]
Solution
\[ \left[\begin{array}{ccccc|c} 2 & 3 & 2 & 1 & -4 & 7 \\ -4 & -6 & -1 & 1 & -1 & -2 \\ -8 & -12 & 1 & 5 & -11 & 8 \\ 4 & 6 & -5 & -7 & 19 & -22 \end{array}\right] \]
\[ \left[\begin{array}{ccccc|c} 2 & 3 & 2 & 1 & -4 & 7 \\ 0 & 0 & 3 & 3 & -9 & 12 \\ 0 & 0 & 9 & 9 & -27 & 36 \\ 0 & 0 & -9 & -9 & 27 & -36 \end{array}\right] \quad \begin{matrix} \\ R2 \leftarrow R2 + 2R1 \\ R3 \leftarrow R3 + 4R1 \\ R4 \leftarrow R4 - 2R1 \end{matrix} \]
\[ \left[\begin{array}{ccccc|c} 2 & 3 & 2 & 1 & -4 & 7 \\ 0 & 0 & 3 & 3 & -9 & 12 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] \quad \begin{matrix} \\ \\ R3 \leftarrow R3 - 3R2 \\ R4 \leftarrow R4 + 3R2 \end{matrix} \]
There is no contradictory row, so the system is consistent. As before,
we assign arbitrary parameters to the free variables:
\[ x_2 = \alpha, \qquad x_4 = \beta, \qquad x_5 = \gamma \]
Next we solve for the leading variables, remembering to incorporate the
right-hand side entries (this time they are not zero!). Solving row R2 for
x3 gives
\[ 3x_3 + 3x_4 - 9x_5 = 12 \]
\[ 3x_3 + 3\beta - 9\gamma = 12 \]
\[ x_3 = 4 - \beta + 3\gamma \]
Solving row R1 for x1:
\[ 2x_1 + 3x_2 + 2x_3 + x_4 - 4x_5 = 7 \]
\[ 2x_1 + 3\alpha + 2(4 - \beta + 3\gamma) + \beta - 4\gamma = 7 \]
\[ x_1 = -\tfrac{1}{2} - \tfrac{3}{2}\alpha + \tfrac{1}{2}\beta - \gamma \]
The general solution is therefore
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ 4 \\ 0 \\ 0 \end{bmatrix} + \alpha \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \gamma \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \tag{1.4} \]
This has the structure
\[ \mathbf{x} = \mathbf{x}_p + (\text{any vector in } \ker(A)) \]
where the particular solution is xp = (−1/2, 0, 4, 0, 0)ᵀ, and the three
basis vectors of ker(A) are exactly those found in the previous section.
It is easily checked that xp does satisfy Ax = b:
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix} \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ 4 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 7 \\ -2 \\ 8 \\ -22 \end{bmatrix} \]
Of course this is not the only possible choice for xp. The general solution
19
contains three arbitrary parameters, and we can pick any values we like
for those. For example, setting α = 1, β = 0 and γ = −1 in eqn (1.4), we get
another solution
\[ \mathbf{x} = \begin{bmatrix} -1 & 1 & 1 & 0 & -1 \end{bmatrix}^{\mathsf{T}} \]
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 7 \\ -2 \\ 8 \\ -22 \end{bmatrix} \]
and could therefore serve just as well as the particular solution xp. An
infinite number of other (equally valid) choices could be made. Note,
however, that any expression for the general solution of Ax = b must
always include the whole of the kernel:
\[ \mathbf{x} = \mathbf{x}_p + (\text{any vector in } \ker(A)) \]
In closing it is worth recalling the ‘first’ form of the general solution in eqn
(1.4), and reflecting on the significance of the rank and nullity of the
coefficient matrix A.
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ 4 \\ 0 \\ 0 \end{bmatrix} + \alpha \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \gamma \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \]
The rank of A (here 2) counts the leading variables, while the nullity
(here 3) counts the free variables, each of which contributes an arbitrary
parameter. Note also how the leading variables x1 and x3
are 'bound' to the free variables in certain prescribed ratios.
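A short numerical check of this structure, assuming numpy: any choice of
the three parameters should leave Ax = b satisfied.

import numpy as np

A = np.array([[ 2.0,   3.0,  2.0,  1.0,  -4.0],
              [-4.0,  -6.0, -1.0,  1.0,  -1.0],
              [-8.0, -12.0,  1.0,  5.0, -11.0],
              [ 4.0,   6.0, -5.0, -7.0,  19.0]])
b = np.array([7.0, -2.0, 8.0, -22.0])

xp = np.array([-0.5, 0.0, 4.0, 0.0, 0.0])      # particular solution
v1 = np.array([-1.5, 1.0, 0.0, 0.0, 0.0])      # kernel basis vectors
v2 = np.array([ 0.5, 0.0, -1.0, 1.0, 0.0])
v3 = np.array([-1.0, 0.0, 3.0, 0.0, 1.0])

for alpha, beta, gamma in [(0, 0, 0), (1, 0, -1), (2.5, -7.0, 0.3)]:
    x = xp + alpha * v1 + beta * v2 + gamma * v3
    print(np.allclose(A @ x, b))               # True for every choice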
Flowchart for Ax = b (matrix A is m × n):

  rank(A) = n ?  ('full column rank')
      |
      +-- Yes --> nullity(A) = 0 --> UNIQUE SOLUTION
      |           (all n variables are leading variables!)
      |
      +-- No  --> nullity(A) > 0 --> INFINITELY MANY SOLNS
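The flowchart condenses into a few lines of code; a sketch assuming
numpy, with a hypothetical helper name, and (as in the flowchart itself)
assuming the system is consistent:

import numpy as np

def classify(A):
    # Apply the flowchart: full column rank means a unique solution.
    n = A.shape[1]
    nullity = n - np.linalg.matrix_rank(A)
    if nullity == 0:
        return "unique solution (all variables are leading variables)"
    return f"infinitely many solutions ({nullity} free parameters)"

print(classify(np.array([[1.0, 1.0], [4.0, -1.0]])))   # unique solution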
Example (A1 2004 Q7)
A set of linear equations is given by Ax = b where
\[ A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 3 & 1 \\ 2 & 3 & 5 \\ 1 & 3 & 4 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 3 \\ 4 \\ 7 \\ 5 \end{bmatrix} \]
[The worked solution of this example has not survived in these notes.]
Topic 2
Norms and conditioning
2.1 Vector norms
Going back to the example in the opening paragraph, we get the desired
outcome: the length of the 1-component vector [6.2] is greater than that
of [−2.8].
We are very used to calculating the length of a vector in this way, but in
fact a whole family of ‘length’ measures can be defined as follows:
\[ \text{length of } \begin{bmatrix} x_1 & x_2 \end{bmatrix}^{\mathsf{T}} = \left( |x_1|^p + |x_2|^p \right)^{1/p} \]
\[ \text{length of } \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}^{\mathsf{T}} = \left( |x_1|^p + |x_2|^p + |x_3|^p \right)^{1/p} \]
\[ \text{length of } \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\mathsf{T}} = \left( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \right)^{1/p} \]
This measure of length is called the vector p-norm, written ‖x‖_p.
Taking p = 2 recovers the familiar Pythagorean length, the 2-norm:
\[ \|\mathbf{x}\|_2 = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{\mathbf{x}^{\mathsf{T}}\mathbf{x}} \]
It is often simpler (e.g. when minimizing the length of a vector, see Q7 of
the tutorial sheet) to work with the square of the 2-norm:
\[ \|\mathbf{x}\|_2^2 = \mathbf{x} \cdot \mathbf{x} = \mathbf{x}^{\mathsf{T}}\mathbf{x} \]
Taking p = 1 gives the 1-norm:
\[ \|\mathbf{x}\|_1 = \sum_i |x_i| \]
i.e. the sum of the absolute values of the components. The 1-norm is
sometimes referred to as the 'taxicab' or 'Manhattan' norm, since it
measures the distance a taxi must cover between two points on a
rectangular grid of streets. Letting p → ∞ gives the ∞-norm (or max-norm):
\[ \|\mathbf{x}\|_\infty = \max_i |x_i| \]
i.e. the maximum of the absolute values of the components. You are
asked to justify this in Q3 of the tutorial sheet.
It is also possible to come up with vector norms other than the p-norm,
but any proposed norm must possess certain common-sense properties
in order to qualify as a meaningful measure of length. These are similar
to the properties of the absolute value function. Specifically,
(i) ‖x‖ = 0 if and only if x = 0
(ii) ‖αx‖ = |α| ‖x‖ for any scalar α
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality)
It can be shown that these basic properties imply another common-
sense property that we would expect any norm to exhibit:
\[ \|\mathbf{x}\| \geq 0 \quad \text{for all } \mathbf{x} \]
A good way to understand vector norms is to study how the unit circle in
ℝ² looks under the three common p-norms:
[Figure: the 'unit circles' {x : ‖x‖ = 1} in ℝ²: a diamond with vertices at ±1 on the axes for the 1-norm, the familiar circle for the 2-norm, and a square for the ∞-norm.]
Example
Find the 1-, 2-, 5-, and ∞-norms of the vector x = [3 −5 4 −1]ᵀ.
Solution
\[ \|\mathbf{x}\|_1 = |3| + |-5| + |4| + |-1| = 13 \]
\[ \|\mathbf{x}\|_2 = \left( 3^2 + (-5)^2 + 4^2 + (-1)^2 \right)^{1/2} = \sqrt{51} \approx 7.14 \]
\[ \|\mathbf{x}\|_5 = \left( |3|^5 + |-5|^5 + |4|^5 + |-1|^5 \right)^{1/5} = 4393^{1/5} \approx 5.35 \]
\[ \|\mathbf{x}\|_\infty = \max\{ |3|, |-5|, |4|, |-1| \} = 5 \]
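The same four norms can be checked with numpy's norm function (a sketch;
numpy is an assumption, the notes do no computing here):

import numpy as np

x = np.array([3.0, -5.0, 4.0, -1.0])
for p in (1, 2, 5, np.inf):
    print(p, np.linalg.norm(x, ord=p))
# 1 -> 13.0,  2 -> 7.141...,  5 -> 5.353...,  inf -> 5.0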
2.2 Matrix norms
Can we extend the notion of a norm from vectors to matrices? Intuition
suggests that a matrix such as
\[ \begin{bmatrix} 100 & 200 \\ 75 & 400 \end{bmatrix} \]
must somehow be ‘bigger’ than a matrix such as
\[ \begin{bmatrix} 1 & 3 \\ 1 & 2 \end{bmatrix} \]
One idea comes quickly to mind. We could place all the entries of the
matrix into a single vector, then apply the vector p-norm introduced
above. If we took p = 2, this would be equivalent to squaring each entry
of the matrix, summing and taking the square root. This is indeed a
commonly used matrix norm known as the Frobenius norm:
\[ \|A\|_{\text{Fro}} = \sqrt{ \sum_i \sum_j A_{ij}^2 } \]
Another approach, and one that turns out to be much more useful, is to
think about the action of a matrix on an arbitrary vector. As we noted in
the first lecture, an m × n matrix A can be viewed as a linear operator
that maps a vector x in ℝⁿ to a vector Ax in ℝᵐ. If we take the norm of
the ‘output’ vector Ax and divide it by the norm of the ‘input’ vector x, we
should get some clue as to the ‘magnifying power’ of the matrix A (and
thus its inherent ‘size’ or ‘magnitude’). We are therefore motivated to
calculate the ratio
\[ \frac{ \|A\mathbf{x}\|_p }{ \|\mathbf{x}\|_p } \]
assuming of course that x is not the zero vector. The trouble is that,
unless A happens to be the identity matrix (or some multiple thereof),
this ratio will depend on the direction of the input vector x. To use a very
simple example, consider the diagonal matrix
5 0
A
0 2
0
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6
-1
-2
-3
Ax
max
p
A p x 0 x p
With this definition, the matrix p-norm provides an upper bound on the
'magnifying power' of A (as measured by applying the vector p-norm to
both the input vector x and output vector Ax). It means that we are
assured of satisfying the following important inequality for any vector x,
regardless of its direction:
\[ \|A\mathbf{x}\|_p \leq \|A\|_p \, \|\mathbf{x}\|_p \]
In general, whenever the inequality ‖Ax‖ ≤ ‖A‖ ‖x‖
is satisfied for all vectors x in ℝⁿ, the matrix norm ‖A‖ is said to be
compatible with, or induced from, the vector norm used as a metric for x
and Ax.
The matrix norm induced by the vector 1-norm turns out to be
\[ \|A\|_1 = \max_j \sum_i |A_{ij}| \]
which is the maximum absolute column sum of A (remember that i
counts down the rows, and j counts across the columns). We could just
take this on trust, but it would be nice to verify that the inequality
\[ \|A\mathbf{x}\|_1 \leq \|A\|_1 \, \|\mathbf{x}\|_1 \]
is actually satisfied for any vector x. The proof requires some messy
fiddling with summations and absolute values. The indices i and j are
understood to run from 1 to m and 1 to n respectively.
\[
\begin{aligned}
\|A\mathbf{x}\|_1 &= \sum_i \Big| \sum_j A_{ij} x_j \Big| \\
&\leq \sum_i \sum_j |A_{ij} x_j| \qquad (*) \\
&= \sum_i \sum_j |A_{ij}| \, |x_j| \\
&= \sum_j |x_j| \sum_i |A_{ij}| \\
&\leq \sum_j |x_j| \left( \max_j \sum_i |A_{ij}| \right) \\
&= \|\mathbf{x}\|_1 \, \|A\|_1 \\
&= \|A\|_1 \, \|\mathbf{x}\|_1
\end{aligned}
\]
The justification for the step marked (*) is the scalar version of the
triangle inequality mentioned above, |a + b| ≤ |a| + |b|, extended to any
number of terms.
Similarly, the matrix norm induced by the vector ∞-norm turns out to be
\[ \|A\|_\infty = \max_i \sum_j |A_{ij}| \]
which is the maximum absolute row sum of A. To verify the validity of
this definition, we need to show that the inequality
\[ \|A\mathbf{x}\|_\infty \leq \|A\|_\infty \, \|\mathbf{x}\|_\infty \]
is certain to be satisfied for any vector x. This time the proof is a little
shorter, but still requires careful study to follow the reasoning. You might
find it helpful to make up a few small but concrete ‘test cases’ (e.g. with
A = [2 3] and x = [3 1]ᵀ) and trace what happens at each step. To make
it interesting, ensure you include a few negative terms in both A and x!
\[
\begin{aligned}
\|A\mathbf{x}\|_\infty &= \max_i \Big| \sum_j A_{ij} x_j \Big| \\
&\leq \max_i \sum_j |A_{ij} x_j| \qquad (*) \\
&= \max_i \sum_j |A_{ij}| \, |x_j| \\
&\leq \left( \max_i \sum_j |A_{ij}| \right) \left( \max_j |x_j| \right) \\
&= \|A\|_\infty \, \|\mathbf{x}\|_\infty
\end{aligned}
\]
The matrix 2-norm, which satisfies
\[ \|A\mathbf{x}\|_2 \leq \|A\|_2 \, \|\mathbf{x}\|_2 \]
is harder to compute in general. For a square, symmetric matrix A,
however, it is simply the largest of the absolute values of the eigenvalues:
\[ \|A\|_2 = \max_i |\lambda_i| \]
You might like to think about this for the unit circle to ellipse mapping
discussed above. More generally, the 2-norm of A turns out to be its
largest singular value (this is beyond the scope of the present course).
Example
Find the 1-norm, ∞-norm and Frobenius norm of the matrix
\[ A = \begin{bmatrix} 3 & 2 & 6 \\ 5 & 8 & 4 \\ 4 & 1 & 3 \end{bmatrix} \]
Solution
The 1-norm is the maximum absolute column sum:
\[ \|A\|_1 = \max\{ 3+5+4,\; 2+8+1,\; 6+4+3 \} = \max\{12, 11, 13\} = 13 \]
The ∞-norm is the maximum absolute row sum:
\[ \|A\|_\infty = \max\{ 3+2+6,\; 5+8+4,\; 4+1+3 \} = \max\{11, 17, 8\} = 17 \]
The Frobenius norm is
\[ \|A\|_{\text{Fro}} = \sqrt{ 3^2 + 2^2 + 6^2 + 5^2 + 8^2 + 4^2 + 4^2 + 1^2 + 3^2 } = \sqrt{180} \approx 13.4 \]
Example
Find the 2-norm of the matrix
\[ A = \begin{bmatrix} 1 & 2 \\ 2 & -3 \end{bmatrix} \]
Solution
This matrix is square and symmetric, so its 2-norm can be calculated by
finding the eigenvalues.
\[ \det \begin{bmatrix} 1-\lambda & 2 \\ 2 & -3-\lambda \end{bmatrix} = 0 \]
\[ (1-\lambda)(-3-\lambda) - 4 = 0 \]
\[ \lambda^2 + 2\lambda - 7 = 0 \]
\[ \lambda = -1 \pm 2\sqrt{2}, \qquad \text{i.e. } \lambda_1 \approx 1.83, \; \lambda_2 \approx -3.83 \]
Hence
\[ \|A\|_2 = \max\{ |\lambda_1|, |\lambda_2| \} = 1 + 2\sqrt{2} \approx 3.83 \]
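Both examples above can be checked with numpy's matrix norms (a sketch;
numpy is an assumption):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -3.0]])
print(np.linalg.norm(A, 2))               # 3.828... = 1 + 2*sqrt(2)
print(max(abs(np.linalg.eigvalsh(A))))    # same value via the eigenvalues

B = np.array([[3.0, 2.0, 6.0],
              [5.0, 8.0, 4.0],
              [4.0, 1.0, 3.0]])
print(np.linalg.norm(B, 1))               # 13.0 (max absolute column sum)
print(np.linalg.norm(B, np.inf))          # 17.0 (max absolute row sum)
print(np.linalg.norm(B, 'fro'))           # 13.416... = sqrt(180)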
Example
Given a particular matrix A and vector x, compare ‖Ax‖_p with the
products ‖A‖_q ‖x‖_p for the various combinations of p and q.
Solution
\[ \|\mathbf{x}\|_1 = 0.5, \qquad \|\mathbf{x}\|_\infty = 0.3 \qquad \text{(vector norms of } \mathbf{x}\text{)} \]
\[ \|A\mathbf{x}\|_1 = 0.91, \qquad \|A\mathbf{x}\|_\infty = 0.82 \qquad \text{(vector norms of } A\mathbf{x}\text{)} \]
[Table: each combination of ‖Ax‖_p against ‖A‖_q ‖x‖_p, with a column indicating whether the inequality ‖Ax‖_p ≤ ‖A‖_q ‖x‖_p is satisfied; the matrix A and the full table have not survived.]
Moral: don’t mix and match incompatible matrix and vector norms!
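Since the original data for this example have not survived, here is a
sketch with made-up values showing the same moral: a matched pairing of
norms always holds, while a mismatched pairing can fail.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0]])
x = np.array([1.0, 1.0])

# Matched pairing: ||Ax||_inf <= ||A||_inf ||x||_inf holds by construction.
lhs = np.linalg.norm(A @ x, ord=np.inf)                                  # 2.0
print(lhs <= np.linalg.norm(A, np.inf) * np.linalg.norm(x, ord=np.inf))  # True

# Mismatched pairing: ||A||_1 ||x||_inf = 1.0 < 2.0, so the 'bound' fails.
print(lhs <= np.linalg.norm(A, 1) * np.linalg.norm(x, ord=np.inf))       # False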
2.3 Conditioning of linear systems
Consider the 2 × 2 system
\[ \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
If we perturb the top right coefficient of A by 1%,
\[ \begin{bmatrix} 1 & 1.01 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0.995 \\ 0.995 \end{bmatrix} \]
Similarly, if we perturb the first term of the right hand side vector by 1%,
\[ \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2.02 \\ 0 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.01 \\ 1.01 \end{bmatrix} \]
In both cases, a 1% change in the data produces only a 1% change in the
solution: the system is well-conditioned.
Consider another 2 × 2 system
\[ \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
When we perturb the top right coefficient of A by 1%,
\[ \begin{bmatrix} 1 & 1.01 \\ 0.99 & 1.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1.98 \end{bmatrix} \]
When we perturb the first term of the right hand side vector by 1%,
\[ \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2.02 \\ 2 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2.01 \\ 0.01 \end{bmatrix} \]
This time, 1% changes in the data produce changes of order 100% in the
solution: the system is ill-conditioned.
We can gain some insight into what is happening if we plot the two ‘base
case’ situations (well-conditioned on the left, ill-conditioned on the right):
[Figure: the two pairs of lines plotted in the (x1, x2) plane. The well-conditioned pair cross at a healthy angle at (1, 1); the ill-conditioned pair are nearly parallel and cross at a shallow angle at (1, 1).]
The reason for the ill-conditioned behaviour is now clear. The lines in the
right-hand plot are so close to parallel that even small changes in slope
and/or intercept cause the intersection point to move far away from the
‘base case’ solution point (1, 1).
If the lines were parallel, the determinant of the coefficient matrix would
be zero, which might prompt the idea of using det(A) as an indicator of
possible conditioning trouble. The ill-conditioned system above has
\[ A = \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \]
such that det(A) = 0.02, which seems close to zero. Unfortunately, the
determinant is useless for predicting ill-conditioning because a simple
rescaling of the problem leads to a change in the determinant. If we
multiply the equations of the ill-conditioned system by 10, we get
\[ \begin{bmatrix} 10 & 10 \\ 9.9 & 10.1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix} \]
such that det(A) = 2. The matrix appears to be much ‘less singular’, but
the solution remains just as sensitive to 1% (or other small) perturbations
of the entries in A and/or b.
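A quick numerical illustration of this point, assuming numpy (whose
cond(A, p) computes exactly the ‖A‖ ‖A⁻¹‖ product introduced below):

import numpy as np

A = np.array([[1.00, 1.00],
              [0.99, 1.01]])

print(np.linalg.det(A), np.linalg.cond(A, np.inf))            # 0.02  201.0
print(np.linalg.det(10 * A), np.linalg.cond(10 * A, np.inf))  # 2.0   201.0
# Scaling changes the determinant by 10^2 but leaves the conditioning alone.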
Suppose now that the data of a general system Ax = b are perturbed by
small amounts δA and δb, with relative sizes measured by the ratios
\[ \frac{\|\delta A\|}{\|A\|} \quad \text{and} \quad \frac{\|\delta \mathbf{b}\|}{\|\mathbf{b}\|} \]
We would like to be able to predict the effect on the solution, via the
corresponding relative error ratio
\[ \frac{\|\delta \mathbf{x}\|}{\|\mathbf{x}\|} \]
Any convenient norm can be used, but it is essential that the matrix norm
and vector norm be ‘compatible’ in the sense defined previously.
Here are the gory details. Suppose we have a linear system defined by a
matrix A, a right-hand side vector b, and an exact solution x, such that
\[ A\mathbf{x} = \mathbf{b} \]
The matrix A is assumed to be square and invertible. First consider a
perturbation of A alone: the perturbed solution x + δx satisfies
\[ (A + \delta A)(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} \]
Subtracting Ax from the left and b from the right,
\[ A\,\delta\mathbf{x} + \delta A\,(\mathbf{x} + \delta\mathbf{x}) = \mathbf{0} \]
\[ \delta\mathbf{x} = -A^{-1}\,\delta A\,(\mathbf{x} + \delta\mathbf{x}) \]
Taking norms and applying the 'compatibility inequality' twice,
\[ \|\delta\mathbf{x}\| = \|A^{-1}\,\delta A\,(\mathbf{x} + \delta\mathbf{x})\| \leq \|A^{-1}\| \, \|\delta A\| \, \|\mathbf{x} + \delta\mathbf{x}\| \]
Hence
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \|A^{-1}\| \, \|\delta A\| \]
or equivalently
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \left( \|A\| \, \|A^{-1}\| \right) \frac{\|\delta A\|}{\|A\|} \tag{2.1} \]
Next consider a perturbation of b alone: the perturbed solution satisfies
\[ A(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b} \]
Subtracting Ax = b,
\[ A\,\delta\mathbf{x} = \delta\mathbf{b} \]
Premultiplying by A⁻¹,
\[ \delta\mathbf{x} = A^{-1}\,\delta\mathbf{b} \]
Taking norms and applying the 'compatibility inequality',
\[ \|\delta\mathbf{x}\| = \|A^{-1}\,\delta\mathbf{b}\| \leq \|A^{-1}\| \, \|\delta\mathbf{b}\| \]
Also, since b = Ax,
\[ \|\mathbf{b}\| = \|A\mathbf{x}\| \leq \|A\| \, \|\mathbf{x}\| \]
Combining the two inequalities and rearranging,
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \left( \|A\| \, \|A^{-1}\| \right) \frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|} \tag{2.2} \]
Remarkably, the bracketed term ‖A‖ ‖A⁻¹‖ appears in both eqn (2.1) and
eqn (2.2). It is called the condition number of the matrix A:
\[ \kappa(A) = \|A\| \, \|A^{-1}\| \]
In terms of κ(A), the two bounds read
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \kappa(A)\,\frac{\|\delta A\|}{\|A\|} \qquad \text{and} \qquad \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \kappa(A)\,\frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|} \]
Example
Find the condition numbers for the two systems considered at the start of
this section. Also, for the ill-conditioned system, verify that the actual
relative errors caused by the perturbations in A and b are within the
bounds predicted by the condition number. Use the infinity norm.
Solution
For the well-conditioned system,
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} \]
hence
\[ \kappa_\infty(A) = \|A\|_\infty \, \|A^{-1}\|_\infty = 2 \times 1 = 2 \quad \text{(which is low)} \]
For the ill-conditioned system,
\[ A = \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 50.5 & -50 \\ -49.5 & 50 \end{bmatrix} \]
hence
\[ \kappa_\infty(A) = \|A\|_\infty \, \|A^{-1}\|_\infty = 2 \times 100.5 = 201 \quad \text{(which is high)} \]
Recall that the perturbation changed the coefficient matrix from
\[ A = \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \quad \text{to} \quad A + \delta A = \begin{bmatrix} 1 & 1.01 \\ 0.99 & 1.01 \end{bmatrix} \]
such that
\[ \delta A = \begin{bmatrix} 0 & 0.01 \\ 0 & 0 \end{bmatrix} \]
This caused the solution to change from
\[ \mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{to} \quad \mathbf{x} + \delta\mathbf{x} = \begin{bmatrix} 0 \\ 1.98 \end{bmatrix} \]
such that
\[ \delta\mathbf{x} = \begin{bmatrix} -1 \\ 0.98 \end{bmatrix} \]
We therefore expect
\[ \frac{\|\delta\mathbf{x}\|_\infty}{\|\mathbf{x} + \delta\mathbf{x}\|_\infty} \leq \kappa_\infty(A)\,\frac{\|\delta A\|_\infty}{\|A\|_\infty} \]
\[ \frac{1}{1.98} \leq 201 \times \frac{0.01}{2} \]
\[ 0.505 \leq 1.005 \]
which checks out.
Similarly, the perturbation changed the right-hand side from
\[ \mathbf{b} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \text{to} \quad \mathbf{b} + \delta\mathbf{b} = \begin{bmatrix} 2.02 \\ 2 \end{bmatrix} \]
such that
\[ \delta\mathbf{b} = \begin{bmatrix} 0.02 \\ 0 \end{bmatrix} \]
and the solution to change from
\[ \mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{to} \quad \mathbf{x} + \delta\mathbf{x} = \begin{bmatrix} 2.01 \\ 0.01 \end{bmatrix} \]
such that
\[ \delta\mathbf{x} = \begin{bmatrix} 1.01 \\ -0.99 \end{bmatrix} \]
We therefore expect
\[ \frac{\|\delta\mathbf{x}\|_\infty}{\|\mathbf{x}\|_\infty} \leq \kappa_\infty(A)\,\frac{\|\delta\mathbf{b}\|_\infty}{\|\mathbf{b}\|_\infty} \]
\[ \frac{1.01}{1} \leq 201 \times \frac{0.02}{2} \]
\[ 1.01 \leq 2.01 \]
which again checks out.
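Both checks can be reproduced numerically; a sketch assuming numpy:

import numpy as np

A = np.array([[1.00, 1.00],
              [0.99, 1.01]])
b = np.array([2.0, 2.0])
kappa = np.linalg.cond(A, np.inf)                    # 201.0
x = np.linalg.solve(A, b)                            # [1, 1]

# Perturbation of A (the eqn 2.1 bound)
dA = np.array([[0.0, 0.01], [0.0, 0.0]])
x_new = np.linalg.solve(A + dA, b)                   # [0, 1.98...]
lhs = np.linalg.norm(x_new - x, np.inf) / np.linalg.norm(x_new, np.inf)
rhs = kappa * np.linalg.norm(dA, np.inf) / np.linalg.norm(A, np.inf)
print(lhs, "<=", rhs)                                # 0.505 <= 1.005

# Perturbation of b (the eqn 2.2 bound)
db = np.array([0.02, 0.0])
x_new = np.linalg.solve(A, b + db)                   # [2.01, 0.01]
lhs = np.linalg.norm(x_new - x, np.inf) / np.linalg.norm(x, np.inf)
rhs = kappa * np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)
print(lhs, "<=", rhs)                                # 1.01 <= 2.01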