A1 Mathematics
Linear Algebra
Topic 1
Systems of linear equations
1.1 Introduction
Let us begin by discussing the simple equation
\[ ax = b \tag{1.1} \]
This is a linear equation in the unknown x. It is linear because x appears
on the left-hand side in a very simple form: raised to the power one, and
multiplied by a constant. There are no terms such as $x^2$, $\log(x)$ or $\sqrt{x+2}$
on the left-hand side. The only other term in the equation – the b on the
right-hand side – is a constant.
We can think of the left-hand side of eqn (1.1) as the result of a function
that maps x (a scalar) to ax (another scalar):
\[ f(x) = ax \]
The adjective ‘linear’ arises from the fact that the graph of this function is
a straight line (and in particular, one that goes through the origin). More
formally, we say that f is a linear operator if it has the properties
\[ f(x_1 + x_2) = f(x_1) + f(x_2) \qquad \text{and} \qquad f(cx) = c\,f(x) \]
for any inputs $x_1$, $x_2$ and any constant $c$.
More generally, a linear equation in the n unknowns x1, x2, …, xn has
the form
\[ a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b \]
where the coefficients a1, a2, …, an and the right-hand side b are all
constants. We will assume that the constants and the unknowns are
numbers in the real domain, denoted ℝ, but it should be pointed out that
some applications (e.g. quantum mechanics) give rise to linear equations
involving numbers in the complex domain.
A system of m such equations in n unknowns can be written compactly in
matrix form as
\[ A\mathbf{x} = \mathbf{b} \tag{1.2} \]
We can think of the left-hand side of eqn (1.2) as the result of a function
that maps x (a vector in ℝⁿ) to Ax (a vector in ℝᵐ):
\[ f(\mathbf{x}) = A\mathbf{x} \]
[Figure: the map f sends points such as P and Q in ℝⁿ to their images in ℝᵐ; the origin O is mapped to the origin.]
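As a quick numerical check of the two linearity properties, here is a
minimal sketch (not part of the original notes; numpy and the particular
matrix A are assumptions made purely for illustration):

import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [3.0, -1.0, 4.0]])   # arbitrary 2x3 example: f maps R^3 to R^2

def f(x):
    return A @ x

rng = np.random.default_rng(0)
x, y = rng.standard_normal(3), rng.standard_normal(3)
c = 2.5

print(np.allclose(f(x + y), f(x) + f(y)))  # True (additivity)
print(np.allclose(f(c * x), c * f(x)))     # True (homogeneity)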
1.2 Two ways of thinking about Ax = b
Consider the 2 × 2 system
\[ \begin{bmatrix} 1 & 1 \\ 4 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \]
We need to find values for x1 and x2 such that the matrix-vector product
on the left-hand side is equal to the vector on the right-hand side.
The first, 'row-oriented' view reads the system row by row, as a pair of
simultaneous equations:
\[ x_1 + x_2 = 3 \]
\[ 4x_1 - x_2 = 2 \]
Our task is to find a point in ℝ² with coordinates (x1, x2) such that both
equations are satisfied simultaneously. Geometrically, we can visualize
plotting two lines:
[Figure: the lines x1 + x2 = 3 and 4x1 − x2 = 2 plotted in the (x1, x2) plane; they intersect at the solution point (1, 2).]
The alternative ‘column-oriented’ view interprets the system as
\[ x_1 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + x_2 \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix} \]
This time we have two vectors in ℝ², namely i + 4j and i − j, and our task
is to combine them (by choosing suitable scalar multipliers x1 and x2) in
such a way that the resultant vector is 3i + 2j. Geometrically, we can
visualize constructing a parallelogram:
[Figure: the parallelogram construction. The vector i + 4j is taken once and the vector i − j is stretched to 2(i − j); together they form a parallelogram whose diagonal is the resultant 3i + 2j.]
The ‘right’ linear combination of the column vectors is 1(i + 4j) + 2(i - j).
Hence x1 = 1, x2 = 2 is the (unique) solution of the original system.
The row- and column-oriented pictures look very different, but they lead,
of course, to the same solution. They are two sides of the same coin.
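Here is a minimal computational sketch of the two views (numpy and this
code are assumptions of this write-up, not part of the original notes):

import numpy as np

A = np.array([[1.0, 1.0],
              [4.0, -1.0]])
b = np.array([3.0, 2.0])

x = np.linalg.solve(A, b)                                # row view: intersect the lines
print(x)                                                 # [1. 2.]

# column view: the same x weights the columns of A to reproduce b
print(np.allclose(x[0] * A[:, 0] + x[1] * A[:, 1], b))   # True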
The column-oriented view may seem less natural, but we will see that it
can be very helpful in understanding and solving general m × n linear
systems, whether homogeneous (Ax = 0) or non-homogeneous (Ax = b).
In particular, the column-oriented view provides us with an alternative
way of posing (and then answering!) the two fundamental questions that
arise in relation to any linear system:
Is there any solution at all?
If there is a solution, is it unique, or are there infinitely many?
1.3 Echelon form
All of the systems you studied in P1 involved n equations in n unknowns.
We now want to relax that restriction, and see what can happen when
the coefficient matrix A is rectangular (m ≠ n). There are three scenarios:
fewer equations than unknowns (m < n), the same number of equations
as unknowns (m = n), and more equations than unknowns (m > n).
[Three example systems, one for each scenario, displayed in matrix-vector form, each coefficient matrix showing the staircase ('echelon') pattern.]
In any row that does not consist entirely of zeros, the leftmost nonzero
entry is given a special name: the leading entry. We can now proceed to
a formal definition. A matrix is in (row-)echelon form if:
(1) any rows consisting entirely of zeros appear at the bottom;
(2) in each nonzero row, the leading entry lies strictly to the right of
the leading entry in the row above.
Some authors add a third condition, that each of the leading entries must
be 1. This complicates the elimination process slightly (each row has to
be divided through by its leading entry) but it simplifies back-substitution.
We will not insist on this; the main thing is to get the pattern right.
Here are some examples of matrices that are not in echelon form:
[Three example matrices violating the echelon pattern, e.g. a zero row above a nonzero row, or a leading entry not strictly to the right of the one above.]
Note that if a square matrix is reduced to echelon form, we get a bonus:
the determinant is simply the product of the entries on the diagonal. Be
careful, however. Although ERO1 (remarkably) does not affect the
determinant, ERO2 and ERO3 both do. Swapping two rows negates the
determinant, while scaling a row scales the determinant by the same
factor.
Example
Reduce the following matrix to echelon form.
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 2 & 4 & 6 & -7 & 8 \\ 1 & 4 & 2 & 2 & 1 \\ 4 & 8 & 12 & -2 & -8 \end{bmatrix} \]
Solution
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 0 & 0 & -3 & 6 \\ 0 & 2 & -1 & 4 & 0 \\ 0 & 0 & 0 & 6 & -12 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow R2 - 2R1 \\ R3 \leftarrow R3 - R1 \\ R4 \leftarrow R4 - 4R1 \end{matrix} \]
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 2 & -1 & 4 & 0 \\ 0 & 0 & 0 & -3 & 6 \\ 0 & 0 & 0 & 6 & -12 \end{bmatrix} \quad R2 \leftrightarrow R3 \]
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 2 & -1 & 4 & 0 \\ 0 & 0 & 0 & -3 & 6 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad R4 \leftarrow R4 + 2R3 \]
Optional extra #1: if we wanted all the leading entries to be 1, we could
have scaled the rows en route, but we can also scale them now:
\[ \begin{bmatrix} 1 & 2 & 3 & -2 & 1 \\ 0 & 1 & -\tfrac{1}{2} & 2 & 0 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow \tfrac{1}{2}R2 \\ R3 \leftarrow -\tfrac{1}{3}R3 \\ \\ \end{matrix} \]
Optional extra #2: having done this, we could carry on and clear out the
columns vertically above each leading 1, working from right to left. This
eventually leads to the reduced row-echelon form:
\[ \begin{bmatrix} 1 & 2 & 3 & 0 & -3 \\ 0 & 1 & -\tfrac{1}{2} & 0 & 4 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{matrix} R1 \leftarrow R1 + 2R3 \\ R2 \leftarrow R2 - 2R3 \\ \\ \\ \end{matrix} \]
\[ \begin{bmatrix} 1 & 0 & 4 & 0 & -11 \\ 0 & 1 & -\tfrac{1}{2} & 0 & 4 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad R1 \leftarrow R1 - 2R2 \]
Reduced row-echelon form is the simplest form that a matrix (or really,
the underlying system of equations) can take. It makes back-substitution
a breeze, and furthermore it is unique, whereas an ordinary row-echelon
form (even with leading 1’s) is not. Then again, we need to work harder
to obtain the reduced version in the first place.
For Matlab enthusiasts, there is a function called rref for computing the
reduced row-echelon form of a matrix.
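For readers working in Python instead, sympy offers an equivalent of
Matlab's rref; a minimal sketch on the matrix of the worked example
(sympy itself is an assumption here, as the notes only mention Matlab):

from sympy import Matrix

M = Matrix([[1, 2, 3, -2, 1],
            [2, 4, 6, -7, 8],
            [1, 4, 2, 2, 1],
            [4, 8, 12, -2, -8]])

R, pivot_cols = M.rref()   # reduced row-echelon form + pivot column indices
print(R)                   # leading 1s with zeros above and below
print(pivot_cols)          # (0, 1, 3): the columns of the leading entries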
1.4 Rank and nullity
Let us first refresh the notion of linear independence: vectors
v1, v2, …, vk are linearly independent if the only linear combination of
them that equals the zero vector is the one with all coefficients zero, i.e.
\[ c_1\mathbf{v}_1 + c_2\mathbf{v}_2 + \cdots + c_k\mathbf{v}_k = \mathbf{0} \quad \Longrightarrow \quad c_1 = c_2 = \cdots = c_k = 0 \]
[Examples: several small sets of vectors in ℝ³, some linearly independent and some not.]
We now need to introduce the idea of the kernel (or null space) of a
matrix. In Section 1.1 we discussed how an m × n matrix A defines a
linear transformation from ℝⁿ to ℝᵐ. Essentially, the kernel of A is the set
of all vectors in ℝⁿ that get mapped to the zero vector in ℝᵐ. It is denoted
ker(A). The dimension of this set, i.e. the number of linearly independent
vectors it contains, is called the nullity of A. More formally,
\[ \ker(A) = \{ \mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0} \}, \qquad \operatorname{nullity}(A) = \dim \ker(A) \]
It is clear that ker(A) always contains the zero vector, 0. If this is the only
vector in ker(A), then by convention, nullity(A) = 0. If ker(A) contains
vectors other than 0, then nullity(A) > 0. One way to determine nullity(A)
is to solve the homogeneous system Ax = 0, then count the number of
linearly independent solutions. We will do this in the next section. There
is, however, a neat shortcut based on the rank-nullity theorem. Here
rank(A) is the number of linearly independent rows (equivalently,
columns) of A, most easily found by counting the nonzero rows in an
echelon form of A. For an m × n matrix A the theorem states that
rank(A) + nullity(A) = n
Example
Find the rank and nullity of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 3 & 5 \\ 2 & 3 & 4 & 7 \\ 3 & 4 & 5 & 9 \end{bmatrix} \]
Solution
\[ \begin{bmatrix} 1 & 2 & 3 & 5 \\ 0 & -1 & -2 & -3 \\ 0 & -2 & -4 & -6 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow R2 - 2R1 \\ R3 \leftarrow R3 - 3R1 \end{matrix} \]
\[ \begin{bmatrix} 1 & 2 & 3 & 5 \\ 0 & -1 & -2 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \quad R3 \leftarrow R3 - 2R2 \]
There are two nonzero rows, so rank(A) = 2. Since the matrix has n = 4
columns, nullity(A) = n – rank(A) = 4 – 2 = 2.
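A quick cross-check of this example, assuming numpy (whose matrix_rank
uses a tolerance-based SVD rather than exact elimination):

import numpy as np

A = np.array([[1.0, 2.0, 3.0, 5.0],
              [2.0, 3.0, 4.0, 7.0],
              [3.0, 4.0, 5.0, 9.0]])

rank = np.linalg.matrix_rank(A)
nullity = A.shape[1] - rank       # rank-nullity theorem: nullity = n - rank
print(rank, nullity)              # 2 2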
1.5 Solving Ax = 0
A linear system in which the right-hand side vector is filled with zeros is
said to be homogeneous. Suppose we are given a matrix A and asked to
find the general solution of Ax = 0. This is exactly the same as being
asked to find ker(A). To start with, we can quickly determine nullity(A) as
outlined above. If it turns out to be zero, ker(A) contains only the zero
vector, and we can declare immediately that Ax = 0 has only the trivial
solution x = 0. If nullity(A) is not zero, then ker(A) contains vectors other
than 0, and Ax = 0 has infinitely many solutions. To find them, we use
reduction to echelon form and a ‘special’ back-substitution process.
Example
Find the general solution of Ax = 0 where
\[ A = \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix} \]
Solution
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ 0 & 0 & 3 & 3 & -9 \\ 0 & 0 & 9 & 9 & -27 \\ 0 & 0 & -9 & -9 & 27 \end{bmatrix} \quad \begin{matrix} \\ R2 \leftarrow R2 + 2R1 \\ R3 \leftarrow R3 + 4R1 \\ R4 \leftarrow R4 - 2R1 \end{matrix} \]
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ 0 & 0 & 3 & 3 & -9 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \quad \begin{matrix} \\ \\ R3 \leftarrow R3 - 3R2 \\ R4 \leftarrow R4 + 3R2 \end{matrix} \]
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ 0 & 0 & 3 & 3 & -9 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix} \]
The variables are now split into two groups. Columns C1 and C3 contain
the leading entries on the echelon. The corresponding variables
x1 and x3 are variously referred to as the leading variables, key variables,
bound variables or pivot variables. The remaining columns (C2, C4, C5)
are associated with the variables x2, x4 and x5. These are called the free
variables (or sometimes, the non-leading variables). To find the general
solution, we first assign arbitrary parameters to the free variables, say
\[ x_2 = \alpha, \qquad x_4 = \beta, \qquad x_5 = \gamma \]
Solving row R2 for x3:
\[ 3x_3 + 3x_4 - 9x_5 = 0 \]
\[ 3x_3 + 3\beta - 9\gamma = 0 \]
\[ x_3 = -\beta + 3\gamma \]
Solving row R1 for x1:
\[ 2x_1 + 3x_2 + 2x_3 + x_4 - 4x_5 = 0 \]
\[ 2x_1 + 3\alpha + 2(-\beta + 3\gamma) + \beta - 4\gamma = 0 \]
\[ x_1 = -\tfrac{3}{2}\alpha + \tfrac{1}{2}\beta - \gamma \]
Collecting the five components into a vector, the general solution is
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \alpha \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \gamma \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \tag{1.3} \]
The three vectors appearing here form a basis for ker(A):
\[ \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix}, \qquad \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \]
In this instance, ker(A) is seen to be a 3-dimensional subspace of ℝ⁵.
Hence nullity(A) – which is just the dimension of ker(A) – is equal to 3.
This agrees with the initial ‘prediction’ of nullity(A) that was made using
the rank-nullity theorem. It is good practice to check that each of the
kernel basis vectors does actually satisfy Ax = 0, and this can readily be
verified. It follows that any linear combination of the basis vectors, as in
eqn (1.3), must also satisfy Ax = 0.
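As a sketch of the same computation in software (scipy is an assumption
here, not something the notes use): scipy's null_space returns an
orthonormal kernel basis, so its vectors differ entry by entry from the
hand-derived ones, but they span the same subspace:

import numpy as np
from scipy.linalg import null_space

A = np.array([[ 2.0,   3.0,  2.0,  1.0,  -4.0],
              [-4.0,  -6.0, -1.0,  1.0,  -1.0],
              [-8.0, -12.0,  1.0,  5.0, -11.0],
              [ 4.0,   6.0, -5.0, -7.0,  19.0]])

N = null_space(A)                  # columns: an orthonormal basis of ker(A)
print(N.shape[1])                  # 3, i.e. nullity(A) = 3
print(np.allclose(A @ N, 0.0))     # True: every basis vector satisfies Ax = 0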
1.6 Solving Ax = b
With a homogeneous linear system Ax = 0, the answer to the question of
solution existence is always affirmative, since (as discussed above) there
will always be at least the trivial solution x = 0. With a non-homogeneous
system Ax = b, the answer to this question is no longer automatic: there
is now a possibility that the system will have no solution.
[Example: echelon reduction of the augmented matrix [A | b] of a system of 4 equations in 5 unknowns, in which a contradictory row appears.]
A contradictory row is one that is entirely zero in the A-part but
nonzero on the right-hand side:
\[ \left[\begin{array}{ccccc|c} 0 & 0 & 0 & 0 & 0 & k \end{array}\right], \qquad k \neq 0 \]
It represents the impossible equation 0 = k, so the system has no
solution. Here, the mismatch between the column ranks of [A | b] and A (3
and 2 respectively) indicates that the vector b in ℝ⁴ cannot be expressed
as a linear combination of the 5 columns of A.
Suppose instead that the reduction produces no contradictory row, so
that every row that is zero in the A-part is also zero on the right:
\[ \left[\begin{array}{ccccc|c} 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] \]
We can then infer that the original system Ax = b is consistent, i.e. it has
at least one solution. Taking the row-oriented view, the absence of a
contradictory equation indicates that the m hyperplanes in ℝⁿ have at
least one point in common. Taking the column-oriented view, the fact
that [A | b] and A have the same column rank indicates that the vector b
in ℝᵐ can be expressed – in at least one way – as a linear combination
of the n columns of A.
If the above analysis shows that there is at least one solution, we turn to
the question of uniqueness: is there just one solution, or a whole family
of solutions? The answer for Ax = b, just as for Ax = 0, is determined by
the nullity of A. If nullity(A) = 0, the solution is unique, and it can be found
by ‘standard’ back-substitution (leading variables only, no free variables).
If nullity(A) > 0, the solution involves a corresponding number of arbitrary
parameters, and its general form has to be worked out by ‘special’ back-
substitution (separate treatment of leading and free variables).
Example
Find the general solution of Ax = b where
\[ A = \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix}, \qquad \mathbf{b} = \begin{bmatrix} 7 \\ -2 \\ 8 \\ -22 \end{bmatrix} \]
Solution
\[ \left[\begin{array}{ccccc|c} 2 & 3 & 2 & 1 & -4 & 7 \\ -4 & -6 & -1 & 1 & -1 & -2 \\ -8 & -12 & 1 & 5 & -11 & 8 \\ 4 & 6 & -5 & -7 & 19 & -22 \end{array}\right] \]
\[ \left[\begin{array}{ccccc|c} 2 & 3 & 2 & 1 & -4 & 7 \\ 0 & 0 & 3 & 3 & -9 & 12 \\ 0 & 0 & 9 & 9 & -27 & 36 \\ 0 & 0 & -9 & -9 & 27 & -36 \end{array}\right] \quad \begin{matrix} \\ R2 \leftarrow R2 + 2R1 \\ R3 \leftarrow R3 + 4R1 \\ R4 \leftarrow R4 - 2R1 \end{matrix} \]
\[ \left[\begin{array}{ccccc|c} 2 & 3 & 2 & 1 & -4 & 7 \\ 0 & 0 & 3 & 3 & -9 & 12 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array}\right] \quad \begin{matrix} \\ \\ R3 \leftarrow R3 - 3R2 \\ R4 \leftarrow R4 + 3R2 \end{matrix} \]
There is no contradictory row, so the system is consistent. As before,
we assign arbitrary parameters to the free variables:
\[ x_2 = \alpha, \qquad x_4 = \beta, \qquad x_5 = \gamma \]
Next we solve for the leading variables, remembering to incorporate the
right-hand side entries (this time they are not zero!). Solving row R2 for
x3 gives
\[ 3x_3 + 3x_4 - 9x_5 = 12 \]
\[ 3x_3 + 3\beta - 9\gamma = 12 \]
\[ x_3 = 4 - \beta + 3\gamma \]
Solving row R1 for x1:
\[ 2x_1 + 3x_2 + 2x_3 + x_4 - 4x_5 = 7 \]
\[ 2x_1 + 3\alpha + 2(4 - \beta + 3\gamma) + \beta - 4\gamma = 7 \]
\[ x_1 = -\tfrac{1}{2} - \tfrac{3}{2}\alpha + \tfrac{1}{2}\beta - \gamma \]
The general solution is therefore
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ 4 \\ 0 \\ 0 \end{bmatrix} + \alpha \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \gamma \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \tag{1.4} \]
This has the structure
\[ \mathbf{x} = \mathbf{x}_p + (\text{any vector in } \ker(A)) \]
where the particular solution is xp = (−1/2, 0, 4, 0, 0)ᵀ, and the three
basis vectors of ker(A) are exactly those found in the previous section.
It is easily checked that xp does satisfy Ax = b:
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix} \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ 4 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 7 \\ -2 \\ 8 \\ -22 \end{bmatrix} \]
Of course this is not the only possible choice for xp. The general solution
19
contains three arbitrary parameters, and we can pick any values we like
for those. For example, setting α = 1, β = 0 and γ = −1 in eqn (1.4), we get
another solution
\[ \mathbf{x} = \begin{bmatrix} -1 & 1 & 1 & 0 & -1 \end{bmatrix}^{\mathsf{T}} \]
\[ \begin{bmatrix} 2 & 3 & 2 & 1 & -4 \\ -4 & -6 & -1 & 1 & -1 \\ -8 & -12 & 1 & 5 & -11 \\ 4 & 6 & -5 & -7 & 19 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \\ 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 7 \\ -2 \\ 8 \\ -22 \end{bmatrix} \]
and could therefore serve just as well as the particular solution xp. An
infinite number of other (equally valid) choices could be made. Note,
however, that any expression for the general solution of Ax = b must
always include the whole of the kernel:
\[ \mathbf{x} = \mathbf{x}_p + (\text{any vector in } \ker(A)) \]
In closing it is worth recalling the ‘first’ form of the general solution in eqn
(1.4), and reflecting on the significance of the rank and nullity of the
coefficient matrix A.
\[ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix} = \begin{bmatrix} -\tfrac{1}{2} \\ 0 \\ 4 \\ 0 \\ 0 \end{bmatrix} + \alpha \begin{bmatrix} -\tfrac{3}{2} \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} + \beta \begin{bmatrix} \tfrac{1}{2} \\ 0 \\ -1 \\ 1 \\ 0 \end{bmatrix} + \gamma \begin{bmatrix} -1 \\ 0 \\ 3 \\ 0 \\ 1 \end{bmatrix} \]
The rank of A (here 2) counts the leading variables, while the nullity
(here 3) counts the free variables, each of which contributes an arbitrary
parameter. Note also how the leading variables x1 and x3
are 'bound' to the free variables in certain prescribed ratios.
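A short numerical check of this structure, assuming numpy: any choice of
the three parameters should leave Ax = b satisfied.

import numpy as np

A = np.array([[ 2.0,   3.0,  2.0,  1.0,  -4.0],
              [-4.0,  -6.0, -1.0,  1.0,  -1.0],
              [-8.0, -12.0,  1.0,  5.0, -11.0],
              [ 4.0,   6.0, -5.0, -7.0,  19.0]])
b = np.array([7.0, -2.0, 8.0, -22.0])

xp = np.array([-0.5, 0.0, 4.0, 0.0, 0.0])      # particular solution
v1 = np.array([-1.5, 1.0, 0.0, 0.0, 0.0])      # kernel basis vectors
v2 = np.array([ 0.5, 0.0, -1.0, 1.0, 0.0])
v3 = np.array([-1.0, 0.0, 3.0, 0.0, 1.0])

for alpha, beta, gamma in [(0, 0, 0), (1, 0, -1), (2.5, -7.0, 0.3)]:
    x = xp + alpha * v1 + beta * v2 + gamma * v3
    print(np.allclose(A @ x, b))               # True for every choice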
Flowchart for Ax = b (matrix A is m × n):

  rank(A) = n ?  ('full column rank')
      |
      +-- Yes --> nullity(A) = 0 --> UNIQUE SOLUTION
      |           (all n variables are leading variables!)
      |
      +-- No  --> nullity(A) > 0 --> INFINITELY MANY SOLNS
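The flowchart condenses into a few lines of code; a sketch assuming
numpy, with a hypothetical helper name, and (as in the flowchart itself)
assuming the system is consistent:

import numpy as np

def classify(A):
    # Apply the flowchart: full column rank means a unique solution.
    n = A.shape[1]
    nullity = n - np.linalg.matrix_rank(A)
    if nullity == 0:
        return "unique solution (all variables are leading variables)"
    return f"infinitely many solutions ({nullity} free parameters)"

print(classify(np.array([[1.0, 1.0], [4.0, -1.0]])))   # unique solution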
Example (A1 2004 Q7)
A set of linear equations is given by Ax = b where
\[ A = \begin{bmatrix} 1 & 1 & 2 \\ 2 & 3 & 1 \\ 2 & 3 & 5 \\ 1 & 3 & 4 \end{bmatrix}, \qquad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 3 \\ 4 \\ 7 \\ 5 \end{bmatrix} \]
[The worked solution of this example has not survived in these notes.]
Topic 2
Norms and conditioning
2.1 Vector norms
Going back to the example in the opening paragraph, we get the desired
outcome: the length of the 1-component vector [6.2] is greater than that
of [−2.8].
We are very used to calculating the length of a vector in this way, but in
fact a whole family of ‘length’ measures can be defined as follows:
\[ \text{length of } \begin{bmatrix} x_1 & x_2 \end{bmatrix}^{\mathsf{T}} = \left( |x_1|^p + |x_2|^p \right)^{1/p} \]
\[ \text{length of } \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}^{\mathsf{T}} = \left( |x_1|^p + |x_2|^p + |x_3|^p \right)^{1/p} \]
\[ \text{length of } \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}^{\mathsf{T}} = \left( |x_1|^p + |x_2|^p + \cdots + |x_n|^p \right)^{1/p} \]
This measure of length is called the vector p-norm, written ‖x‖_p.
Taking p = 2 recovers the familiar Pythagorean length, the 2-norm:
\[ \|\mathbf{x}\|_2 = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{\mathbf{x}^{\mathsf{T}}\mathbf{x}} \]
It is often simpler (e.g. when minimizing the length of a vector, see Q7 of
the tutorial sheet) to work with the square of the 2-norm:
\[ \|\mathbf{x}\|_2^2 = \mathbf{x} \cdot \mathbf{x} = \mathbf{x}^{\mathsf{T}}\mathbf{x} \]
Taking p = 1 gives the 1-norm:
\[ \|\mathbf{x}\|_1 = \sum_i |x_i| \]
i.e. the sum of the absolute values of the components. The 1-norm is
sometimes referred to as the 'taxicab' or 'Manhattan' norm, since it
measures the distance a taxi must cover between two points on a
rectangular grid of streets. Letting p → ∞ gives the ∞-norm (or max-norm):
\[ \|\mathbf{x}\|_\infty = \max_i |x_i| \]
i.e. the maximum of the absolute values of the components. You are
asked to justify this in Q3 of the tutorial sheet.
It is also possible to come up with vector norms other than the p-norm,
but any proposed norm must possess certain common-sense properties
in order to qualify as a meaningful measure of length. These are similar
to the properties of the absolute value function. Specifically,
(i) ‖x‖ = 0 if and only if x = 0
(ii) ‖αx‖ = |α| ‖x‖ for any scalar α
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (the triangle inequality)
It can be shown that these basic properties imply another common-
sense property that we would expect any norm to exhibit:
\[ \|\mathbf{x}\| \geq 0 \quad \text{for all } \mathbf{x} \]
A good way to understand vector norms is to study how the unit circle in
ℝ² looks under the three common p-norms:
[Figure: the 'unit circles' {x : ‖x‖ = 1} in ℝ²: a diamond with vertices at ±1 on the axes for the 1-norm, the familiar circle for the 2-norm, and a square for the ∞-norm.]
Example
Find the 1-, 2-, 5-, and ∞-norms of the vector x = [3 −5 4 −1]ᵀ.
Solution
\[ \|\mathbf{x}\|_1 = |3| + |-5| + |4| + |-1| = 13 \]
\[ \|\mathbf{x}\|_2 = \left( 3^2 + (-5)^2 + 4^2 + (-1)^2 \right)^{1/2} = \sqrt{51} \approx 7.14 \]
\[ \|\mathbf{x}\|_5 = \left( |3|^5 + |-5|^5 + |4|^5 + |-1|^5 \right)^{1/5} = 4393^{1/5} \approx 5.35 \]
\[ \|\mathbf{x}\|_\infty = \max\{ |3|, |-5|, |4|, |-1| \} = 5 \]
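The same four norms can be checked with numpy's norm function (a sketch;
numpy is an assumption, the notes do no computing here):

import numpy as np

x = np.array([3.0, -5.0, 4.0, -1.0])
for p in (1, 2, 5, np.inf):
    print(p, np.linalg.norm(x, ord=p))
# 1 -> 13.0,  2 -> 7.141...,  5 -> 5.353...,  inf -> 5.0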
2.2 Matrix norms
Can we extend the notion of a norm from vectors to matrices? Intuition
suggests that a matrix such as
\[ \begin{bmatrix} 100 & 200 \\ 75 & 400 \end{bmatrix} \]
must somehow be ‘bigger’ than a matrix such as
\[ \begin{bmatrix} 1 & 3 \\ 1 & 2 \end{bmatrix} \]
One idea comes quickly to mind. We could place all the entries of the
matrix into a single vector, then apply the vector p-norm introduced
above. If we took p = 2, this would be equivalent to squaring each entry
of the matrix, summing and taking the square root. This is indeed a
commonly used matrix norm known as the Frobenius norm:
\[ \|A\|_{\text{Fro}} = \sqrt{ \sum_i \sum_j A_{ij}^2 } \]
Another approach, and one that turns out to be much more useful, is to
think about the action of a matrix on an arbitrary vector. As we noted in
the first lecture, an m × n matrix A can be viewed as a linear operator
that maps a vector x in ℝⁿ to a vector Ax in ℝᵐ. If we take the norm of
the ‘output’ vector Ax and divide it by the norm of the ‘input’ vector x, we
should get some clue as to the ‘magnifying power’ of the matrix A (and
thus its inherent ‘size’ or ‘magnitude’). We are therefore motivated to
calculate the ratio
\[ \frac{ \|A\mathbf{x}\|_p }{ \|\mathbf{x}\|_p } \]
assuming of course that x is not the zero vector. The trouble is that,
unless A happens to be the identity matrix (or some multiple thereof),
this ratio will depend on the direction of the input vector x. To use a very
simple example, consider the diagonal matrix
5 0
A
0 2
0
-6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6
-1
-2
-3
Ax
max
p
A p x 0 x p
With this definition, the matrix p-norm provides an upper bound on the
'magnifying power' of A (as measured by applying the vector p-norm to
both the input vector x and output vector Ax). It means that we are
assured of satisfying the following important inequality for any vector x,
regardless of its direction:
\[ \|A\mathbf{x}\|_p \leq \|A\|_p \, \|\mathbf{x}\|_p \]
In general, whenever the inequality ‖Ax‖ ≤ ‖A‖ ‖x‖
is satisfied for all vectors x in ℝⁿ, the matrix norm ‖A‖ is said to be
compatible with, or induced from, the vector norm used as a metric for x
and Ax.
The matrix norm induced by the vector 1-norm turns out to be
\[ \|A\|_1 = \max_j \sum_i |A_{ij}| \]
which is the maximum absolute column sum of A (remember that i
counts down the rows, and j counts across the columns). We could just
take this on trust, but it would be nice to verify that the inequality
\[ \|A\mathbf{x}\|_1 \leq \|A\|_1 \, \|\mathbf{x}\|_1 \]
is actually satisfied for any vector x. The proof requires some messy
fiddling with summations and absolute values. The indices i and j are
understood to run from 1 to m and 1 to n respectively.
\[
\begin{aligned}
\|A\mathbf{x}\|_1 &= \sum_i \Big| \sum_j A_{ij} x_j \Big| \\
&\leq \sum_i \sum_j |A_{ij} x_j| \qquad (*) \\
&= \sum_i \sum_j |A_{ij}| \, |x_j| \\
&= \sum_j |x_j| \sum_i |A_{ij}| \\
&\leq \sum_j |x_j| \left( \max_j \sum_i |A_{ij}| \right) \\
&= \|\mathbf{x}\|_1 \, \|A\|_1 \\
&= \|A\|_1 \, \|\mathbf{x}\|_1
\end{aligned}
\]
The justification for the step marked (*) is the scalar version of the
triangle inequality mentioned above, |a + b| ≤ |a| + |b|, extended to any
number of terms.
Similarly, the matrix norm induced by the vector ∞-norm turns out to be
\[ \|A\|_\infty = \max_i \sum_j |A_{ij}| \]
which is the maximum absolute row sum of A. To verify the validity of
this definition, we need to show that the inequality
\[ \|A\mathbf{x}\|_\infty \leq \|A\|_\infty \, \|\mathbf{x}\|_\infty \]
is certain to be satisfied for any vector x. This time the proof is a little
shorter, but still requires careful study to follow the reasoning. You might
find it helpful to make up a few small but concrete ‘test cases’ (e.g. with
A = [2 3] and x = [3 1]ᵀ) and trace what happens at each step. To make
it interesting, ensure you include a few negative terms in both A and x!
\[
\begin{aligned}
\|A\mathbf{x}\|_\infty &= \max_i \Big| \sum_j A_{ij} x_j \Big| \\
&\leq \max_i \sum_j |A_{ij} x_j| \qquad (*) \\
&= \max_i \sum_j |A_{ij}| \, |x_j| \\
&\leq \left( \max_i \sum_j |A_{ij}| \right) \left( \max_j |x_j| \right) \\
&= \|A\|_\infty \, \|\mathbf{x}\|_\infty
\end{aligned}
\]
The matrix 2-norm, which satisfies
\[ \|A\mathbf{x}\|_2 \leq \|A\|_2 \, \|\mathbf{x}\|_2 \]
is harder to compute in general. For a square, symmetric matrix A,
however, it is simply the largest of the absolute values of the eigenvalues:
\[ \|A\|_2 = \max_i |\lambda_i| \]
You might like to think about this for the unit circle to ellipse mapping
discussed above. More generally, the 2-norm of A turns out to be its
largest singular value (this is beyond the scope of the present course).
Example
Find the 1-norm, ∞-norm and Frobenius norm of the matrix
\[ A = \begin{bmatrix} 3 & 2 & 6 \\ 5 & 8 & 4 \\ 4 & 1 & 3 \end{bmatrix} \]
Solution
The 1-norm is the maximum absolute column sum:
\[ \|A\|_1 = \max\{ 3+5+4,\; 2+8+1,\; 6+4+3 \} = \max\{12, 11, 13\} = 13 \]
The ∞-norm is the maximum absolute row sum:
\[ \|A\|_\infty = \max\{ 3+2+6,\; 5+8+4,\; 4+1+3 \} = \max\{11, 17, 8\} = 17 \]
The Frobenius norm is
\[ \|A\|_{\text{Fro}} = \sqrt{ 3^2 + 2^2 + 6^2 + 5^2 + 8^2 + 4^2 + 4^2 + 1^2 + 3^2 } = \sqrt{180} \approx 13.4 \]
Example
Find the 2-norm of the matrix
\[ A = \begin{bmatrix} 1 & 2 \\ 2 & -3 \end{bmatrix} \]
Solution
This matrix is square and symmetric, so its 2-norm can be calculated by
finding the eigenvalues.
\[ \det \begin{bmatrix} 1-\lambda & 2 \\ 2 & -3-\lambda \end{bmatrix} = 0 \]
\[ (1-\lambda)(-3-\lambda) - 4 = 0 \]
\[ \lambda^2 + 2\lambda - 7 = 0 \]
\[ \lambda = -1 \pm 2\sqrt{2}, \qquad \text{i.e. } \lambda_1 \approx 1.83, \; \lambda_2 \approx -3.83 \]
Hence
\[ \|A\|_2 = \max\{ |\lambda_1|, |\lambda_2| \} = 1 + 2\sqrt{2} \approx 3.83 \]
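Both examples above can be checked with numpy's matrix norms (a sketch;
numpy is an assumption):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, -3.0]])
print(np.linalg.norm(A, 2))               # 3.828... = 1 + 2*sqrt(2)
print(max(abs(np.linalg.eigvalsh(A))))    # same value via the eigenvalues

B = np.array([[3.0, 2.0, 6.0],
              [5.0, 8.0, 4.0],
              [4.0, 1.0, 3.0]])
print(np.linalg.norm(B, 1))               # 13.0 (max absolute column sum)
print(np.linalg.norm(B, np.inf))          # 17.0 (max absolute row sum)
print(np.linalg.norm(B, 'fro'))           # 13.416... = sqrt(180)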
Example
Given a particular matrix A and vector x, compare ‖Ax‖_p with the
products ‖A‖_q ‖x‖_p for the various combinations of p and q.
Solution
\[ \|\mathbf{x}\|_1 = 0.5, \qquad \|\mathbf{x}\|_\infty = 0.3 \qquad \text{(vector norms of } \mathbf{x}\text{)} \]
\[ \|A\mathbf{x}\|_1 = 0.91, \qquad \|A\mathbf{x}\|_\infty = 0.82 \qquad \text{(vector norms of } A\mathbf{x}\text{)} \]
[Table: each combination of ‖Ax‖_p against ‖A‖_q ‖x‖_p, with a column indicating whether the inequality ‖Ax‖_p ≤ ‖A‖_q ‖x‖_p is satisfied; the matrix A and the full table have not survived.]
Moral: don’t mix and match incompatible matrix and vector norms!
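Since the original data for this example have not survived, here is a
sketch with made-up values showing the same moral: a matched pairing of
norms always holds, while a mismatched pairing can fail.

import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 0.0]])
x = np.array([1.0, 1.0])

# Matched pairing: ||Ax||_inf <= ||A||_inf ||x||_inf holds by construction.
lhs = np.linalg.norm(A @ x, ord=np.inf)                                  # 2.0
print(lhs <= np.linalg.norm(A, np.inf) * np.linalg.norm(x, ord=np.inf))  # True

# Mismatched pairing: ||A||_1 ||x||_inf = 1.0 < 2.0, so the 'bound' fails.
print(lhs <= np.linalg.norm(A, 1) * np.linalg.norm(x, ord=np.inf))       # False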
2.3 Conditioning of linear systems
Consider the 2 × 2 system
\[ \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
If we perturb the top right coefficient of A by 1%,
\[ \begin{bmatrix} 1 & 1.01 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0.995 \\ 0.995 \end{bmatrix} \]
Similarly, if we perturb the first term of the right hand side vector by 1%,
\[ \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2.02 \\ 0 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1.01 \\ 1.01 \end{bmatrix} \]
In both cases, a 1% change in the data produces only a 1% change in the
solution: the system is well-conditioned.
Consider another 2 × 2 system
\[ \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
When we perturb the top right coefficient of A by 1%,
\[ \begin{bmatrix} 1 & 1.01 \\ 0.99 & 1.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1.98 \end{bmatrix} \]
When we perturb the first term of the right hand side vector by 1%,
\[ \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2.02 \\ 2 \end{bmatrix} \quad \Longrightarrow \quad \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 2.01 \\ 0.01 \end{bmatrix} \]
This time, 1% changes in the data produce changes of order 100% in the
solution: the system is ill-conditioned.
We can gain some insight into what is happening if we plot the two ‘base
case’ situations (well-conditioned on the left, ill-conditioned on the right):
[Figure: the two pairs of lines plotted in the (x1, x2) plane. The well-conditioned pair cross at a healthy angle at (1, 1); the ill-conditioned pair are nearly parallel and cross at a shallow angle at (1, 1).]
The reason for the ill-conditioned behaviour is now clear. The lines in the
right-hand plot are so close to parallel that even small changes in slope
and/or intercept cause the intersection point to move far away from the
‘base case’ solution point (1, 1).
If the lines were parallel, the determinant of the coefficient matrix would
be zero, which might prompt the idea of using det(A) as an indicator of
possible conditioning trouble. The ill-conditioned system above has
\[ A = \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \]
such that det(A) = 0.02, which seems close to zero. Unfortunately, the
determinant is useless for predicting ill-conditioning because a simple
rescaling of the problem leads to a change in the determinant. If we
multiply the equations of the ill-conditioned system by 10, we get
\[ \begin{bmatrix} 10 & 10 \\ 9.9 & 10.1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 20 \\ 20 \end{bmatrix} \]
such that det(A) = 2. The matrix appears to be much ‘less singular’, but
the solution remains just as sensitive to 1% (or other small) perturbations
of the entries in A and/or b.
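A quick numerical illustration of this point, assuming numpy (whose
cond(A, p) computes exactly the ‖A‖ ‖A⁻¹‖ product introduced below):

import numpy as np

A = np.array([[1.00, 1.00],
              [0.99, 1.01]])

print(np.linalg.det(A), np.linalg.cond(A, np.inf))            # 0.02  201.0
print(np.linalg.det(10 * A), np.linalg.cond(10 * A, np.inf))  # 2.0   201.0
# Scaling changes the determinant by 10^2 but leaves the conditioning alone.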
Suppose now that the data of a general system Ax = b are perturbed by
small amounts δA and δb, with relative sizes measured by the ratios
\[ \frac{\|\delta A\|}{\|A\|} \quad \text{and} \quad \frac{\|\delta \mathbf{b}\|}{\|\mathbf{b}\|} \]
We would like to be able to predict the effect on the solution, via the
corresponding relative error ratio
\[ \frac{\|\delta \mathbf{x}\|}{\|\mathbf{x}\|} \]
Any convenient norm can be used, but it is essential that the matrix norm
and vector norm be ‘compatible’ in the sense defined previously.
Here are the gory details. Suppose we have a linear system defined by a
matrix A, a right-hand side vector b, and an exact solution x, such that
\[ A\mathbf{x} = \mathbf{b} \]
The matrix A is assumed to be square and invertible. First consider a
perturbation of A alone: the perturbed solution x + δx satisfies
\[ (A + \delta A)(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} \]
Subtracting Ax from the left and b from the right,
\[ A\,\delta\mathbf{x} + \delta A\,(\mathbf{x} + \delta\mathbf{x}) = \mathbf{0} \]
\[ \delta\mathbf{x} = -A^{-1}\,\delta A\,(\mathbf{x} + \delta\mathbf{x}) \]
Taking norms and applying the 'compatibility inequality' twice,
\[ \|\delta\mathbf{x}\| = \|A^{-1}\,\delta A\,(\mathbf{x} + \delta\mathbf{x})\| \leq \|A^{-1}\| \, \|\delta A\| \, \|\mathbf{x} + \delta\mathbf{x}\| \]
Hence
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \|A^{-1}\| \, \|\delta A\| \]
or equivalently
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \left( \|A\| \, \|A^{-1}\| \right) \frac{\|\delta A\|}{\|A\|} \tag{2.1} \]
Next consider a perturbation of b alone: the perturbed solution satisfies
\[ A(\mathbf{x} + \delta\mathbf{x}) = \mathbf{b} + \delta\mathbf{b} \]
Subtracting Ax = b,
\[ A\,\delta\mathbf{x} = \delta\mathbf{b} \]
Premultiplying by A⁻¹,
\[ \delta\mathbf{x} = A^{-1}\,\delta\mathbf{b} \]
Taking norms and applying the 'compatibility inequality',
\[ \|\delta\mathbf{x}\| = \|A^{-1}\,\delta\mathbf{b}\| \leq \|A^{-1}\| \, \|\delta\mathbf{b}\| \]
Also, since b = Ax,
\[ \|\mathbf{b}\| = \|A\mathbf{x}\| \leq \|A\| \, \|\mathbf{x}\| \]
Combining the two inequalities and rearranging,
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \left( \|A\| \, \|A^{-1}\| \right) \frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|} \tag{2.2} \]
Remarkably, the bracketed term ‖A‖ ‖A⁻¹‖ appears in both eqn (2.1) and
eqn (2.2). It is called the condition number of the matrix A:
\[ \kappa(A) = \|A\| \, \|A^{-1}\| \]
In terms of κ(A), the two bounds read
\[ \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x} + \delta\mathbf{x}\|} \leq \kappa(A)\,\frac{\|\delta A\|}{\|A\|} \qquad \text{and} \qquad \frac{\|\delta\mathbf{x}\|}{\|\mathbf{x}\|} \leq \kappa(A)\,\frac{\|\delta\mathbf{b}\|}{\|\mathbf{b}\|} \]
Example
Find the condition numbers for the two systems considered at the start of
this section. Also, for the ill-conditioned system, verify that the actual
relative errors caused by the perturbations in A and b are within the
bounds predicted by the condition number. Use the infinity norm.
Solution
For the well-conditioned system,
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix} \]
hence
\[ \kappa_\infty(A) = \|A\|_\infty \, \|A^{-1}\|_\infty = 2 \times 1 = 2 \quad \text{(which is low)} \]
For the ill-conditioned system,
\[ A = \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \quad \text{and} \quad A^{-1} = \begin{bmatrix} 50.5 & -50 \\ -49.5 & 50 \end{bmatrix} \]
hence
\[ \kappa_\infty(A) = \|A\|_\infty \, \|A^{-1}\|_\infty = 2 \times 100.5 = 201 \quad \text{(which is high)} \]
Recall that the perturbation changed the coefficient matrix from
\[ A = \begin{bmatrix} 1 & 1 \\ 0.99 & 1.01 \end{bmatrix} \quad \text{to} \quad A + \delta A = \begin{bmatrix} 1 & 1.01 \\ 0.99 & 1.01 \end{bmatrix} \]
such that
\[ \delta A = \begin{bmatrix} 0 & 0.01 \\ 0 & 0 \end{bmatrix} \]
This caused the solution to change from
\[ \mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{to} \quad \mathbf{x} + \delta\mathbf{x} = \begin{bmatrix} 0 \\ 1.98 \end{bmatrix} \]
such that
\[ \delta\mathbf{x} = \begin{bmatrix} -1 \\ 0.98 \end{bmatrix} \]
We therefore expect
\[ \frac{\|\delta\mathbf{x}\|_\infty}{\|\mathbf{x} + \delta\mathbf{x}\|_\infty} \leq \kappa_\infty(A)\,\frac{\|\delta A\|_\infty}{\|A\|_\infty} \]
\[ \frac{1}{1.98} \leq 201 \times \frac{0.01}{2} \]
\[ 0.505 \leq 1.005 \]
which checks out.
Similarly, the perturbation changed the right-hand side from
\[ \mathbf{b} = \begin{bmatrix} 2 \\ 2 \end{bmatrix} \quad \text{to} \quad \mathbf{b} + \delta\mathbf{b} = \begin{bmatrix} 2.02 \\ 2 \end{bmatrix} \]
such that
\[ \delta\mathbf{b} = \begin{bmatrix} 0.02 \\ 0 \end{bmatrix} \]
and the solution to change from
\[ \mathbf{x} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \quad \text{to} \quad \mathbf{x} + \delta\mathbf{x} = \begin{bmatrix} 2.01 \\ 0.01 \end{bmatrix} \]
such that
\[ \delta\mathbf{x} = \begin{bmatrix} 1.01 \\ -0.99 \end{bmatrix} \]
We therefore expect
\[ \frac{\|\delta\mathbf{x}\|_\infty}{\|\mathbf{x}\|_\infty} \leq \kappa_\infty(A)\,\frac{\|\delta\mathbf{b}\|_\infty}{\|\mathbf{b}\|_\infty} \]
\[ \frac{1.01}{1} \leq 201 \times \frac{0.02}{2} \]
\[ 1.01 \leq 2.01 \]
which again checks out.
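Both checks can be reproduced numerically; a sketch assuming numpy:

import numpy as np

A = np.array([[1.00, 1.00],
              [0.99, 1.01]])
b = np.array([2.0, 2.0])
kappa = np.linalg.cond(A, np.inf)                    # 201.0
x = np.linalg.solve(A, b)                            # [1, 1]

# Perturbation of A (the eqn 2.1 bound)
dA = np.array([[0.0, 0.01], [0.0, 0.0]])
x_new = np.linalg.solve(A + dA, b)                   # [0, 1.98...]
lhs = np.linalg.norm(x_new - x, np.inf) / np.linalg.norm(x_new, np.inf)
rhs = kappa * np.linalg.norm(dA, np.inf) / np.linalg.norm(A, np.inf)
print(lhs, "<=", rhs)                                # 0.505 <= 1.005

# Perturbation of b (the eqn 2.2 bound)
db = np.array([0.02, 0.0])
x_new = np.linalg.solve(A, b + db)                   # [2.01, 0.01]
lhs = np.linalg.norm(x_new - x, np.inf) / np.linalg.norm(x, np.inf)
rhs = kappa * np.linalg.norm(db, np.inf) / np.linalg.norm(b, np.inf)
print(lhs, "<=", rhs)                                # 1.01 <= 2.01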