
Introduction to Algebra and Linear Algebra

Michael D. Alder
March 13, 2007
Contents
1 Introduction 9
1.1 Four Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.1 Problem One . . . . . . . . . . . . . . . . . . . . . . . 9
1.1.2 Problem Two . . . . . . . . . . . . . . . . . . . . . . . 10
1.1.3 Problem Three . . . . . . . . . . . . . . . . . . . . . . 11
1.1.4 Problem Four . . . . . . . . . . . . . . . . . . . . . . . 11
1.2 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2 Sets and Logic 17
2.1 Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.1 Propositional Calculus . . . . . . . . . . . . . . . . . . 18
2.2.2 Predicate Calculus . . . . . . . . . . . . . . . . . . . . 23
2.3 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.3.1 Aside: Russell's Paradox . . . . . . . . . . . . . . . . . 28
2.3.2 The Algebra of Sets . . . . . . . . . . . . . . . . . . . . 29
2.3.3 Another Aside: Defining N . . . . . . . . . . . . . . . . 30
2.3.4 Set Differences . . . . . . . . . . . . . . . . . . . . . . 31
2.3.5 Cartesian Products of Sets . . . . . . . . . . . . . . . . 31
2.3.6 Sets of Sets . . . . . . . . . . . . . . . . . . . . . . . . 32
2.4 Relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.1 Equivalence Relations and Partitions . . . . . . . . . . 35
2.4.2 Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.3 Finite and Infinite sets . . . . . . . . . . . . . . . . . . 42
3 Algebra and Arithmetic 45
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.2 Unary Operations, Operators . . . . . . . . . . . . . . . . . . 45
3.3 Binary Operations . . . . . . . . . . . . . . . . . . . . . . . . 46
3.4 Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.4.1 Subgroups . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.4.2 Group Actions . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Rings and Fields . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.1 Rings . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
3.5.2 Fields . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.6 Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.6.1 The Euclidean Algorithm in Z . . . . . . . . . . . . . . 60
3.6.2 Properties of the gcd . . . . . . . . . . . . . . . . . . . 63
3.6.4 Primes . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.6.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.6.6 Linear Congruences . . . . . . . . . . . . . . . . . . . . 65
3.6.7 Properties of Congruences . . . . . . . . . . . . . . . . 65
3.6.8 Examples . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.6.9 Division in Congruences . . . . . . . . . . . . . . . . . 67
3.6.11 Examples: . . . . . . . . . . . . . . . . . . . . . . . . . 67
4 Complex Numbers 69
4.1 A Little History Lesson . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Traditional Notations . . . . . . . . . . . . . . . . . . . . . . . 72
4.3 The Fundamental Theorem of Algebra . . . . . . . . . . . . . 75
4.4 Why C is so cool . . . . . . . . . . . . . . . . . . . . . . . . . 78
5 Abstract Vector Spaces 81
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
5.2 Vector Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
5.3 (Linear) Subspaces . . . . . . . . . . . . . . . . . . . . . . . . 85
5.4 Spanning Sets and Bases . . . . . . . . . . . . . . . . . . . . . 87
5.5 Direct Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.5.1 Changing the basis for a space . . . . . . . . . . . . . . 93
5.6 Linear Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
5.6.1 Kernels and Images . . . . . . . . . . . . . . . . . . . . 100
5.6.2 Isomorphisms . . . . . . . . . . . . . . . . . . . . . . . 101
5.7 Change of Basis . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.7.1 Matrix Representations of Linear Maps . . . . . . . . . 105
5.7.2 Changing the Basis . . . . . . . . . . . . . . . . . . . . 108
5.8 Similarities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6 Banach and Hilbert Spaces 117
6.1 Beyond Vector Spaces . . . . . . . . . . . . . . . . . . . . . . 117
6.2 Normed Vector Spaces: Banach Spaces . . . . . . . . . . . . . 118
6.3 Inner Product Spaces: Hilbert Spaces . . . . . . . . . . . . . . 120
6.3.1 Hilbert Spaces . . . . . . . . . . . . . . . . . . . . . . . 123
6.4 The Gram-Schmidt Orthogonalisation Process . . . . . . . . . 123
7 Linear and Quadratic Maps 127
7.1 Determinants . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
7.2 Orthogonal Maps . . . . . . . . . . . . . . . . . . . . . . . . . 132
7.3 Quadratic Forms on Rⁿ . . . . . . . . . . . . . . . . . . . . . . 136
7.4 Diagonalising Symmetric Matrices . . . . . . . . . . . . . . . . 141
7.5 Classifying Quadratic Forms . . . . . . . . . . . . . . . . . . . 145
7.6 Multivariate Gaussian (Normal) distributions . . . . . . . . . 146
7.7 Eigenvalues and Eigenspaces . . . . . . . . . . . . . . . . . . . 149
7.8 Oranges and Mirrors . . . . . . . . . . . . . . . . . . . . . . . 156
8 Dynamical Systems 161
8.1 Low Dimensional Vector Fields . . . . . . . . . . . . . . . . . 161
8.2 Linear Algebra and Linear Systems . . . . . . . . . . . . . . . 169
8.3 Block Decompositions . . . . . . . . . . . . . . . . . . . . . . 171
8.4 A (stolen) Worked Example . . . . . . . . . . . . . . . . . . . 173
8.4.1 A little calculation . . . . . . . . . . . . . . . . . . . . 173
8.5 Complex Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . 176
8.6 The Jordan Canonical Form of a Matrix . . . . . . . . . . . . 177
Chapter 1
Introduction
1.1 Four Problems
I am going to give you some idea of what the course is about by introducing
you to four problems.
1.1.1 Problem One
I take an orange, roughly a solid ball. I mark on a North pole, equator, and
a point on the equator.
I hold the orange up in what I shall call the initial position.
Now I stick a knitting needle through the centre at some definite angle I shan't trouble to specify, and rotate the orange about the needle by some angle which I shall call α. This changes the orientation of the orange but not its centre, to what I shall call the intermediate position. Now I withdraw the needle and choose another direction to stick it through the centre of the orange, and rotate about the needle by an angle I shall call β. This takes me to the final position of the orange.
I now ask the following question: Is there a way of poking the needle through the orange in this final position and rotating by an angle which will return the orange in one rotation to its initial position?
Anybody with a gut feeling about this? We are testing your geometric-
physical intuition here!
Idea of how to tackle this problem
We abstract to R³: we put the origin at the centre of the orange and we rotate all of R³. We have that the first rotation specified a map α : R³ → R³. It leaves the origin (centre of the orange) fixed, leaves a line in R³ (the knitting needle) fixed, and we can take the north pole, and two points on the equator, as the vectors
\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} and \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}.
This is then followed by a similar map β : R³ → R³. The composite map β ∘ α is the result of doing α first then β.
Now if there is a similar map that returns the orange and all of R³ to its original position, then doing it backwards would go from the initial position to the final position in one move. Call this map γ. Then since β ∘ α certainly exists as a map, and if γ exists then it seems likely that γ⁻¹ exists (γ done backwards), the problem can be restated as:
If α and β are rigid rotations of R³ about the origin, is β ∘ α a rigid rotation about the origin? To solve this problem we need to say precisely what we mean by a rigid rotation about the origin of R³. It is very often true in Science and Engineering that solving a problem requires us to state the problem itself with great clarity - with much more precision than English allows. This is why mathematics works.
Anybody want to revise their gut feelings about the answer?
One of the purposes of this course will be to solve this problem and indeed
convince you that the solution is (a) correct and (b) obvious. Since you
mostly are not yet in this happy position, you will have learnt something
when we get to this point.
What is perhaps less obvious is that what you learn is important, not because
oranges are important but because (a) showing you how to sharpen your
physical intuitions is important, and (b) it doesn't stop there.
1.1.2 Problem Two
We again start with an orange in its initial position and again I stick a
knitting needle through the centre. The next bit requires some visual imag-
ination: I imagine I stick a plane mirror through the centre of the orange,
orthogonal to the needle, and I reflect the orange through the mirror, on
both sides of the mirror.
This gives the intermediate position of the orange. I call the position of the plane mirror M and also use M : R³ → R³ to denote the map that reflects all of R³ in the mirror.
Now I stick a needle through and put in another mirror N orthogonal to
the needle and passing through the centre of the orange. Reflection in this new mirror (of the reflection in the first mirror) gives the final position of the orange and I represent this by the map N ∘ M.
Question: Is it possible to put a mirror in some position to return from the
final position to the initial position in one hit?
Question: Is it possible to stick a knitting needle through the orange and do
a rotation to return the orange to its initial position in one hit?
Again, this is a test of your geometric intuitions, and again later in the course
I shall provide the solution and convince you the answer is obvious. Since
you don't find the answer obvious at present, this will constitute progress.
You will have learnt something.
1.1.3 Problem Three
You stand in front of a large wall mirror and look at yourself. You wink your
right eye and the mirror image of you winks its left eye, so you deduce that
the mirror has swapped your left side and your right side.
Now you lie horizontally in front of the mirror and again you wink your right
eye and the image winks its left eye. But now the mirror has taken the top
eye to the bottom eye and vice-versa.
Question: How does the mirror know which way up you are?
It is fairly obvious that this is the wrong question, so another question is,
what makes us think this is a question at all?
Again, later in the course, you will have the conceptual tools to provide a sat-
isfactory answer to both questions and will have learnt something important
about the nature of thinking and understanding the world.
1.1.4 Problem Four
I define a vector field on Rⁿ as a map V : Rⁿ → Rⁿ. (Think n = 2 or 3.)
We picture this as a map taking points in Rⁿ and assigning them little arrows, and we attach the tails of the arrows to the points to get a picture of lots of little arrows as shown in figure 1.1.1.
Figure 1.1.1: A Vector Field on R².
This is not, of course, the only possible interpretation of a map f : Rⁿ → Rⁿ; we saw another one when we talked about rigid rotations in R³. But this
one has its uses. You should get used to the idea of a piece of mathematics
having multiple interpretations; it happens a lot.
Mathematica can draw these in R² and R³. Use
<< Graphics`PlotField`
PlotVectorField[{-y, x}, {x, -1, 1}, {y, -1, 1}]
to get the picture of figure 1.1.1. Note that Mathematica has shrunk the little arrows so they don't get in each other's way.
Now I ask the question: If I started at (1, 0) and moved so that my velocity at every point is given by the little arrow at the point, what curve do I have?
For the above example, it should be intuitively obvious that the curve is a circle.
I shall prove this. Let (x(t), y(t)) be the curve given by x = cos(t), y = sin(t). Then at t = 0, I am at (1, 0). The velocity vector is
\begin{pmatrix} \dot{x} \\ \dot{y} \end{pmatrix} = \begin{pmatrix} -\sin t \\ \cos t \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}
as required.
What we have done here is known in old fashioned language as solving an
initial value problem for a system of ordinary differential equations.
Now I shall show how to solve all such systems.
Write the vector field
V : R² → R², \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} -y \\ x \end{pmatrix}
as a matrix equation
[V] = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}
check:
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} -y \\ x \end{pmatrix}.
Now calculate e^{t[V]}.
We know
e^x = 1 + x + x²/2! + x³/3! + . . . + xⁿ/n! + . . .
is a convergent series for e^x when x ∈ R.
To extend exponentiation to square matrices is, in principle, straightforward. If A is a square matrix, A¹ = A, A² = AA, A³ = AAA, and so on, and A⁰ = I, the identity matrix. So if A is a square matrix
e^A = I + A + (1/2!)A² + (1/3!)A³ + . . . + (1/n!)Aⁿ + . . .
To believe this always makes sense we need to believe the infinite series converges, and its limit is a square matrix of the same size as the original one.
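To see the definition in action numerically, here is a small sketch in Python (not part of the course, which used Mathematica above for pictures; the function names are the standard numpy/scipy ones). It sums the first twenty terms of the series for a square matrix and compares the answer with a library matrix exponential.

import numpy as np
from scipy.linalg import expm

def exp_series(A, terms=20):
    """Approximate e^A by summing I + A + A^2/2! + ... term by term."""
    result = np.zeros_like(A, dtype=float)
    term = np.eye(A.shape[0])           # the first term of the series, A^0 = I
    for n in range(terms):
        result += term
        term = term @ A / (n + 1)       # next term: multiply by A, divide by (n+1)
    return result

V = np.array([[0.0, -1.0],
              [1.0,  0.0]])             # the matrix [V] of the vector field above
print(exp_series(0.7 * V))              # the truncated series
print(expm(0.7 * V))                    # scipy's matrix exponential, for comparison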
Lets do it for t[V ].
We have
e
t[V ]
= I +t[V ] +
t
2
2!
[V ]
2
+ +
t
n
n!
[V ]
n
+
now [V ] =
_
0 1
1 0
_
so [V ]
2
=
_
0 1
1 0
_ _
0 1
1 0
_
=
_
1 0
0 1
_
and [V ]
3
=
_
0 1
1 0
_ _
1 0
0 1
_
=
_
0 1
1 0
_
and [V ]
4
=
_
1 0
0 1
_ _
1 0
0 1
_
=
_
1 0
0 1
_
.
So [V ]
5
= [V ], [V ]
6
= [V ]
2
, [V ]
7
= [V ]
3
, [V ]
8
= I and so on.
So
e^{t[V]} = \begin{pmatrix} 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + \cdots & -t + \frac{t^3}{3!} - \frac{t^5}{5!} + \frac{t^7}{7!} - \cdots \\ t - \frac{t^3}{3!} + \frac{t^5}{5!} - \frac{t^7}{7!} + \cdots & 1 - \frac{t^2}{2!} + \frac{t^4}{4!} - \frac{t^6}{6!} + \cdots \end{pmatrix} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}
To get the curve starting at (1, 0) we multiply this vector by the matrix
\begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos t \\ \sin t \end{pmatrix}
which was the solution.
To get the solution curve for any other initial point, just operate on that initial point:
\begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a\cos t - b\sin t \\ a\sin t + b\cos t \end{pmatrix}
and this is the solution curve for x(0) = a, y(0) = b.
Exercise 1.1.1. Here are a few for you to play with:
1. Turn V \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ y \end{pmatrix} into a matrix equation.
2. Plot the vector field of V using Mathematica.
3. Calculate e^{t[V]}.
4. Solve the system of equations directly using old fashioned methods: ẋ = x, ẏ = y.
5. Check that the solution to the IVP with x(0) = 1, y(0) = 1 is in agreement with the old fashioned method.
1.2 Remarks
This course is basically about the four problems discussed. There is a lot
of infrastructure, preliminary material to be covered in order to solve these
problems, groups and group actions, complex numbers, to name the most
crucial. Some of the ideas are very abstract and very new. The main objective, which I stress, is to make our ideas ultra-precise, and we attack this by making up a lot of definitions. You will be glad to hear that the
considerable amount of work we shall do will solve a large number of other
problems at the same time.
About the only assumption I shall make is that you can read, write and
multiply matrices. Just to be on the safe side I point out that you can
multiply a column matrix of height n (an n × 1 matrix) by a 1 × n row matrix on the left by moving your left index finger along the row, your right
index finger down the column, multiplying the corresponding numbers and
keeping a running total. So
\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \end{pmatrix} = (3)(1) + (4)(2) = 11
If you have more columns in the second matrix, you just do each column at
a time and have a result which also has more columns:
\begin{pmatrix} 3 & 4 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} 11 & 51 \end{pmatrix}
If you have more than one row in the left hand matrix, you repeat for each
row, writing the corresponding answers in the corresponding rows of the
output matrix:
\begin{pmatrix} 3 & 4 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 5 \\ 2 & 9 \end{pmatrix} = \begin{pmatrix} 11 & 51 \\ 4 & 19 \end{pmatrix}
If we have A an n × m matrix and B an m × k matrix then we can multiply to get AB an n × k matrix, for any positive integers m, n, k. If the numbers don't match up we can't do the multiplication.
Why we follow this peculiar rule will become clear later.
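The finger-along-the-row, finger-down-the-column rule is mechanical enough to write out as a few lines of Python (explicit loops rather than a library call, purely to mirror the rule; the examples are the ones above):

def mat_mul(A, B):
    """Multiply an n x m matrix A by an m x k matrix B, entry by entry."""
    n, m = len(A), len(A[0])
    m2, k = len(B), len(B[0])
    if m != m2:
        raise ValueError("the numbers don't match up, so we can't multiply")
    C = [[0] * k for _ in range(n)]
    for i in range(n):                    # for each row of A
        for j in range(k):                # and each column of B
            total = 0
            for s in range(m):            # move along the row and down the column
                total += A[i][s] * B[s][j]
            C[i][j] = total
    return C

print(mat_mul([[3, 4]], [[1], [2]]))                   # [[11]]
print(mat_mul([[3, 4]], [[1, 5], [2, 9]]))             # [[11, 51]]
print(mat_mul([[3, 4], [2, 1]], [[1, 5], [2, 9]]))     # [[11, 51], [4, 19]]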
If anything is obscure, if I assume you know things you don't, it is your job to
tell me as soon as possible. It is NOT your job to merely copy down what
I say and write like a mediaeval scribe and hope you can mug up enough to
pass the examination.
Chapter 2
Sets and Logic
2.1 Basics
Mathematicians are obsessed with sets. Aren't we all, I hear you say, but
the word was sets.
A set is a collection of objects, which we generally refer to as the elements
of a set. We generally use braces { and } to enclose the elements of the set, e.g. the non-negative integers (natural numbers) are the set
N = {0, 1, 2, 3, . . .}
If A is a set and x an element of A then we write x ∈ A and we write y ∉ A if y is not an element of A. Note that we have not restricted the types of elements that we can put into a set. The set with no elements is called the empty set and is generally denoted by ∅.
Example 2.1.1. The set of human beings in this room.
Example 2.1.2. The set of natural numbers, N. We have informally that
N = {0, 1, 2, . . .}
This is informal because there is an essential fluffiness in the three dots which mean and so on, and which presupposes that your guess as to what comes next is the same as mine. A proper definition shouldn't have to rely on guessing games.
Example 2.1.3. The set of purple camels in this room
Example 2.1.4. The set of two headed students in this room
Remark 2.1.1. The last two sets are actually the same set, ∅. I shall prove
this soon.
I shall suppose that you know what the set N is. Actually you don't, but you are sufficiently familiar with it that it is a reasonable starting point. (Note that if anyone said to you What is the number two? you might have the greatest difficulty in giving a satisfactory answer. You have learnt how to add
and subtract and even multiply and divide such things, mainly, alas, using
a pocket calculator. Familiarity may well have given you some contempt for
the numbers. But if you are the thoughtful type you must confront the fact
that although you can recognise them and do sums, you have not in fact got
the foggiest idea of what one of them actually is. I recommend that you give
this matter some thought.)
I shall also suppose that you know what the sets Z, the integers, with (informally) Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . .}, R the real numbers, and Q the rational numbers (fractions) are. Again, you don't have any sensible idea of what these are, but you can recognise their usual names and you can do sums with them on your calculator. For the present, this will suffice.
2.2 Logic
Logic comes in two sorts, known as Propositional Calculus and Predicate
Calculus.
2.2.1 Propositional Calculus
The first of these is about propositions or sentences which are strings of sym-
bols together with an interpretation so that the sentences mean something
and are either True or False. In studying logic we are not concerned with
whether a sentence is true or not, but only with what follows if it is or isn't.
Thus the syllogism
If God exists and is good then this is the best of all possible
worlds
This is not the best of all possible worlds
Therefore either God is not good or He does not exist.
is a valid argument. Whether you believe the conclusion (the last line)
will depend on whether you accept both of the two premisses (the rst two
lines), but if you do accept them then you must accept the conclusion. The
reason you must is not because there is a law saying that people who are
illogical should be shot, although it might be a better world if there were,
but because this is how language is used. You can, if you wish, decide to be
illogical just like all the people in the arts departments, but although you
may feel liberated from the laws of logic, all it means is that you are refusing
to use language the way the rest of us do. It would no doubt be liberating
to call cats dogs whenever you felt like it, but people would soon afterwards
stop listening to anything you said.
The form of the above argument is:
P AND Q ⇒ R
NOT R
Therefore (NOT P) OR (NOT Q)
The simplest and best known rule of inference is called modus ponens.
Modus Ponens
P
P ⇒ Q
∴ Q
This means that if P is any sentence which is true, and if the sentence If P
is true then Q is true, for any sentence Q, is true, then Q must be true.
The symbol ⇒ is called implies and you need to note that it makes sense
between sentences. It is not a fashionable kind of = sign and anybody writ-
ing anything where the things on either side are numbers or sets gets an
automatic fail.
Example 2.2.1. Something can go wrong
If something can go wrong it will go wrong
Therefore something will go wrong
Usually we leave out the first premise and state the second, and leave the
person we are talking to to work out what was omitted and what was implied.
The fact that most of you can do this with high reliability tells us that
somehow you have picked up the rules of Logic.
A variant (called in some quarters Modus Morons) goes:
Modus Morons An invalid but popular rule of inference:
Q
P ⇒ Q
∴ P
As an example I shall prove that I am the pope. Since I am not the pope,
you may safely conclude that Modus Morons is not a valid rule of inference!
The pope is infallible when pronouncing on matters of faith and
morals.
If I am the pope I pronounce that all matters are matters of faith
and morals.
If I am the pope I further pronounce that it is not raining in the
classroom.
It is not raining in the classroom.
Therefore I am the pope.
The first four lines are perfectly sound. It follows that if P is the statement
I am the pope and if you accept the first line (and the pope was declared
infallible sometime early in the twentieth century or late in the nineteenth,
I am not sure of the date) then it follows that the second statement is true
because it is clearly a matter of faith and morals because it says something
about faith and morals, namely that they encompass everything. In which
case it follows that if I am the pope then my pronouncement on it not raining
in the classroom must be true. The catch is in the last line which follows
from the rest by Modus Morons which is not in fact valid.
This might sound as if it is of little practical concern, but two thirds of
the first year students in M131 this year relied on it in induction proofs.
Consequently they got zero marks for their assignments and the relevant
examination question. Make sure you don't imitate them.
Two common forms of argument which are not valid are the argumentum ab auctoritate or argument from authority. This has the form:
IF A is an authority THEN any assertion of A is true
A is an authority
A asserts P
Therefore P (is true)
Related to this is the Argumentum ad Hominem (argument to the man)
which goes
If A is stupid/evil/ugly then any assertion of A is False
A is stupid/evil/ugly
A asserts P
Therefore P is False
For example:
George Bush is stupid and wicked
George Bush thinks the War in Iraq is a good idea
Therefore the war in Iraq is a bad idea
Now whether the war in Iraq is a good idea or a bad one is not established
by this argument. We can show this by replacing the war in Iraq is a good
idea by eating food is a good idea. Even if George Bush is stupid and
wicked, he may be right about some things.
The standard training in Logic for about a thousand years required the stu-
dent to take an argument and reduce it to syllogistic form by breaking it up
into separate assertions joined by logical connectives (and, or, implies) and
inserting suppressed premisses where necessary. I give an example of this
process; the suppressed premisses are indicated by parentheses. Note that
this can be done in many ways. The text of the argument was extracted
from a letter to The Australian Newspaper.
Example 2.2.2. K Lynch believes that Angela Shanahan is wicked because
she would not use therapy with embryonic stem cells to save the life of her
child on the grounds that the embryos were going to destruction anyway.
Using the same moral criterion, the sacrifice of Jews by Dr. Mengele to develop life saving therapies was justified since the Jews were going to destruction anyway.
Analysis
Argument A: for any x, x is going to be destroyed anyway ⇒ using x to save lives is morally justified.
Argument A ⇒ Dr Mengele was morally justified to use Jews to save lives.
(Dr. Mengele was not morally justified to use Jews to save lives.)
(Therefore Argument A is not sound.)
(K Lynch believes Argument A is sound.)
(Therefore K Lynch has a false belief.)
Exercise 2.2.1. Does this argument establish that Angela Shanahan is not
wicked?
Exercise 2.2.2. If you were K Lynch, how would you respond to this?
There are several ways of joining up sentences into compound sentences: we
have already seen one with P ⇒ Q but I could also have written P & Q to mean P and Q (are both true) or P ∨ Q to mean that either P is true or Q is true. I can also, for any sentence P obtain the denial of P which is written ¬P or ∼P. Thus if P is true, ¬P is false and vice versa, and if P is false, ¬P is true and vice versa.
Some of you who are doing Computer Science will have met these before or
will shortly.
Exercise 2.2.3. The de Morgan Laws are two in number and assert:
For any sentences P and Q,
¬(P ∨ Q) = (¬P) & (¬Q)
¬(P & Q) = (¬P) ∨ (¬Q)
To declare that two propositions are equal is to mean that whenever one is
true so is the other.
Take some examples of particular P and Q and check that the laws hold.
Exercise 2.2.4. The Truth Table for & is as follows:
P Q P & Q
T T T
T F F
F T F
F F F
This gives every possible combination of cases of the truth values of P and Q and in the final column gives the truth value of P & Q.
Draw up a Truth Table for or and one for ⇒. If you have trouble with the last, there is a definition of ⇒ which goes:
P ⇒ Q = ¬(P & ¬Q)
Fill in the table:
P Q ¬Q (P & ¬Q) ¬(P & ¬Q)
? ? ? ? ?
Exercise 2.2.5. Prove the deMorgan Laws using Truth Tables.
Exercise 2.2.6. A compound proposition is called a tautology when the
truth table entries for it are all True. The de Morgan laws give two such
tautologies. Perhaps the simplest tautology is P ∨ ¬P which clearly has value True whether P is True or False.
Find a tautology involving three propositions P, Q and R and confirm that it is one. It should have eight rows in the Truth Table.
Remark 2.2.1. Note that tautologies appear to be true statements but don't actually say anything about the world at all. They are true in the most boring possible way; they are, in effect, statements about how language is correctly used.
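Truth tables are mechanical enough that a short program can build and check them. The sketch below (Python; the helper is my own, not anything standard) runs through every assignment of True and False to the named propositions and reports whether a compound proposition is a tautology. It checks P ∨ ¬P, one de Morgan law written as an equivalence, and confirms that Modus Morons is not one.

from itertools import product

def is_tautology(names, formula):
    """True if formula(...) is True for every assignment of truth values to the names."""
    return all(formula(*values)
               for values in product([True, False], repeat=len(names)))

print(is_tautology("P", lambda P: P or not P))                   # True
print(is_tautology("PQ",
      lambda P, Q: (not (P or Q)) == ((not P) and (not Q))))     # True: a de Morgan law
print(is_tautology("PQ",
      lambda P, Q: (not (Q and ((not P) or Q))) or P))           # False: Modus Morons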
2.2.2 Predicate Calculus
The second level of Logic, the Predicate Calculus, deals with things called
predicates or open sentences. These are generalisations of sentences, having
one (or more) special symbols called placeholders in them. There has been
a tendency to use the letters at the tail end of the alphabet for these.
Example 2.2.3. x > 3 is a predicate over the natural numbers. This means
that you are allowed to substitute the name of any natural number for the
symbol x and the result is a proposition or sentence. If you substitute 1 for
x the sentence is false and if you substitute 4 for x you get a true sentence.
The last example might also be a predicate over Z or Q or R. Usually the
context tells you which is intended, but it is considered good manners to tell
people if there is any chance of uncertainty.
Example 2.2.4. x is more than two metres tall is a predicate over the set
of people in this room.
There are two ways of turning a predicate into a sentence, the first is to substitute some allowable value, the name of an element of the set for which the predicate is defined, and the second is to use one of two quantifiers.
The first is the universal quantifier which puts For every x in front of the predicate P(x). We write this:
∀x ∈ S, P(x)
where S is the set of possible names of things for which P(x) makes sense.
Example 2.2.5. For every x in this room, x is more than two metres tall.
This translates as everybody in this room is more than two metres tall. It
is a perfectly respectable proposition, it just happens to be false.
The second is the existential quantifier which puts there exists an x such that in front of the predicate P(x). We write this:
∃x ∈ S, P(x)
Example 2.2.6. There exists an x in this room such that x is more than
two metres tall.
This is usually translated as Somebody in this room is more than two metres
tall. I don't know if this is true or not. What is clear is that it either is or isn't.
Example 2.2.7. ∃x ∈ N, x > 3
This says that there is at least one natural number bigger than three. It is a
true statement.
You can see that the ⇒ sign also makes sense when it is used between predicates.
Example 2.2.8. ∀x ∈ Z, x > 5 ⇒ x > 3
In this case it is the implication which is being quantified, and the result is a true statement.
It should be obvious to you after a few moments though that you can have
two placeholders in a predicate and you can quantify over both.
Example 2.2.9. ∀x ∈ Z, ∃y ∈ Z, x < y
This says that for every integer there is at least one which is bigger, which is true.
Note that reversing the order of the quantifiers changes everything:
∃y ∈ Z, ∀x ∈ Z, x < y
says that there is some integer which is bigger than every integer (including
itself). This is false.
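You can get a feel for the difference by machine-checking the two orders on a small finite set. All of Z cannot be enumerated, so the sketch below (Python) uses the predicate x ≠ y on a three element set, where the same swap of quantifiers flips the truth value:

S = {0, 1, 2}

# "for every x there exists a y with x != y": true, any other element will do
print(all(any(x != y for y in S) for x in S))      # True

# "there exists a y such that for all x, x != y": false, since x = y always fails it
print(any(all(x != y for x in S) for y in S))      # False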
Human beings are extremely good at working out rules of how to use language
from quite small samples. They work them out in the sense of being able
to use them, but not, usually, in the sense of being able to state what they
are. About two and a half thousand years ago, Aristotle worked out the
rules of Logic and wrote them down, and for about a thousand years, Logic
was in the school syllabus for all educated Europeans. It was believed that
knowing the rules of Logic sharpened your thinking and made it easy to
deal with quite complicated arguments. Certainly this was before the age of
ubiquitous bullshit in which we now live, so there may be something in the
idea.
Since you have absorbed through your skins the correct way to use Logic in
easy cases, at least I hope so, and since we do not want to turn you into
Logicians, merely sharpen your capacity for logical thought, I shall stop here.
Anybody interested will find books on Logic in the Library. Anybody with
an interest in Computer Science will find it useful to be familiar with the
elements of classical or Aristotelian Logic. There are a number of general-
isations that have been devised in modern times (since about 1920) some
of which have a bearing on computing, for example Temporal Logic. Other
Logics have looked at what you can do if you allow more than just two values
of TRUE and FALSE, an important case being the probabilistic logic where
you allow all the numbers between 0 and 1 as possible values, 0 being FALSE
and 1 being TRUE, and a number like 0.5 being a reasonable value for the
truth value of When I toss this coin it will come down Heads. I mention
these things because they are good clean fun and might amuse as well as
instruct.
2.3 Sets
We use predicates over sets to define subsets. The subset is called the truth
set of the predicate.
Example 2.3.1. Z⁺ ≜ {z ∈ Z : z > 0}
This is read: Z plus is defined to be the set of those integers z which are greater than zero.
I use the symbol ≜ to tell you that the thing on the left is defined to mean the thing on the right. This is different from the usual = sign, since 4 = 2 + 2 is a statement which is true and 5 = 2 + 2 is a statement that is false. We generally try to prove statements in Mathematics (when we have some reason to think they are true) but nobody would try to prove a definition. It helps clear thinking to use different symbols for different ideas. Life is full
of muddle and reducing the amount of it helps keep it simple.
Example 2.3.2. A ≜ {x ∈ R | 1 < x ≤ 3}. You may find that this set is described by (1, 3] in Calculus.
Some people use | where I use : , read as such that. I find the | symbol gets hard to read when it is used in predicates with a modulus sign in, ({x ∈ R | |x − 1| < 0}) and my handwriting is bad enough as it is.
Definition 2.3.1. Two sets A and B are equal if they contain the same
elements, that is every element of A is also in B and every element of B is
also in A.
I shall write this more algebraically as
A = B iff ∀x, x ∈ A ⇒ x ∈ B and ∀x, x ∈ B ⇒ x ∈ A
the expression iff is short for if and only if and is sometimes replaced by the double implication symbol ⇔.
Note that I haven't told you what set the predicate x ∈ A is defined over.
What possible values are there for x? In such cases we rather suppose that
there is a universe of discourse containing every conceivable thing we might
want to talk about. If we call this U then it contains x and perhaps also A
and B and we could regard it as containing all possible sets, so everything
is in it. There is a bit of a problem with this which I shall come back
to. If I leave out the name of the set for my predicate, assume that it is
some sufficiently big set for the predicate to make sense which will contain
everything we might want it to.
Note that we usually list each element in a set only once but if we repeat
the same element it doesn't change the set. Also the order in which we list
elements of sets does not matter, as sets are only characterised by which
elements they contain. Hence {1, 2} = {2, 1}, since both sets contain the elements 1 and 2. Also, the set {2, 1 + 1, 2 + 0} is equal to the set {2} which
has only one thing in it.
I could define the empty set as
∅ ≜ {x : x ≠ x}
Since any x is the name of something, it must surely be the case that the thing is itself, whatever it is. So the predicate x ≠ x isn't true no matter which x you plug in.
We can now show that the set of purple camels in this room is the empty
set:
I claim the statement There exists a purple camel in this room is false. If
you dispute this you can prove me wrong by pointing to a purple camel.
Nobody did so my claim stands.
It follows that if P(x) is short for x is a purple camel in this room I am claiming that ∃x P(x) is false. Hence ∀x ¬P(x) is true. Hence A ≜ {x : P(x)} = ∅. To show from the definition that A = ∅ what I have to prove is that for every x, x ∈ A ⇒ x ∈ ∅ and also ∀x, x ∈ ∅ ⇒ x ∈ A. This is very easy because there are no such x.
Exercise 2.3.1. Convince yourself by looking at examples that
¬(∃x ∈ S, P(x)) = ∀x ∈ S, ¬P(x)
is always true.
Exercise 2.3.2. The above is a generalisation of one of the deMorgan Laws.
Which one? What do you get if you generalise the other?
The same argument shows that the set B of two headed students in this
class is also the empty set. It follows that the statement Every two headed
student in this room is a purple camel is a true statement. Such statements
are said to be vacuously true and it is a good description.
This would all be rather silly (although harmless fun) were it not for the fact
that mathematics also uses vacuously true statements on occasion.
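Programming languages follow the same convention: a universally quantified statement over an empty collection counts as true, and an existentially quantified one counts as false. A tiny sketch in Python:

two_headed_students = []                  # nobody produced one, so the set is empty

# "every two headed student in this room is a purple camel" is vacuously true
print(all(s == "purple camel" for s in two_headed_students))    # True

# "there exists a two headed student in this room" is false
print(any(True for s in two_headed_students))                   # False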
Definition 2.3.2. Let A be a set. The set B is called a subset of A if every element of B is in A. We use the notation B ⊆ A. If B ⊆ A and B ≠ A then we say B is a proper subset of A and we write B ⊂ A.
Formally we write:
B ⊆ A iff ∀x, x ∈ B ⇒ x ∈ A
Remark 2.3.1. The above definition uses the expression Let A be a set. This is slightly weird but is widely used. You should also note that the above formal definition is a predicate with two placeholders A and B. Since we want the definition to be a true statement, there is an implied universal quantification in there. Unfortunately there is a lot of universal quantification which gets left out and you are supposed to put it in.
If S is the collection of all possible sets, then we should really write:
∀A ∈ S, ∀B ∈ S, B ⊆ A iff ∀x ∈ S, x ∈ B ⇒ x ∈ A
There is a big problem with doing this. It draws your attention to the
collection of all possible sets, and you may reasonably think that S is itself
a set. Moreover a set which contains itself as an element. Sets which contain
themselves as subsets are one thing, and we meet them all the time, because
they all do, even the empty set, but a set which contains itself as an element
is definitely weird. This leads us to consider the subset of S of those sets which contain themselves (as elements) and another subset of those which do
not.
2.3.1 Aside: Russell's Paradox
I shall use R to denote the set of those sets which do not contain themselves
as elements. Then I ask the innocent seeming question, does R contain itself
or not?
We ought, being clear thinking adults, to be able to decide this by a little
careful analysis. Suppose R does contain itself. Then it is, by definition, the collection of all those sets which do not contain themselves, so it damn well shouldn't be there, and so we deduce that
R ∈ R
is definitely false.
So it must be true that
R ∉ R
Since then R is a set which does not contain itself, it definitely belongs in R and must be there after all.
If your head is not spinning a bit then you haven't understood the whole idea.
I have used R to denote this impossible set after Bertrand Russell who
invented it. What it shows is that going around turning anything you fancy
into a set is fraught with logical paradox.
This threw Mathematics into a nasty spin for a while in the early years of the
last century. Russell sent a letter to Gottlob Frege, a mathematician who had just finished proof reading a book which put all of mathematics on a sound footing of set theory (he thought) when he got Russell's letter which started off Dear Frege, Consider the set of all sets which do not contain themselves . . . It came as a bit of a shock, because his life's work had suddenly had the
bottom shot out of it.
We have since found ways of avoiding Russell's Paradox, and they involve saying that S is not a set. It is something altogether bigger and hairier.
You can do a Google search on Russell's Paradox and find out all about it. It turns out to have a bearing on some results of Kurt Gödel about what can and can't be proved in Mathematics. Google also Gödel's Theorem for some innocent entertainment. Some fruitloops have claimed that Gödel's Theorem proves that people are not robots which happen to be made of meat instead of silicon and steel. This shows that fruitloops should stay away from
mathematics. Of course you are all meat machines. Modern medicine relies
upon the fact.
You are allowed to argue with me, indeed encouraged to, but only after you
have understood Gödel's Theorem.
2.3.2 The Algebra of Sets
Now we return to the business of saying some conventional things about sets
and we shall use the nasty sloppy convention of letting things be things,
without asking whether they want to be them or not, and avoid proper formal
quantification so as to slither around difficult problems in metamathematics.
Definition 2.3.3. For any sets A and B, the intersection of A and B, denoted by A ∩ B, is defined to be the set
A ∩ B ≜ {x : x ∈ A and x ∈ B}
The union of A and B, denoted A ∪ B, is defined to be the set
A ∪ B ≜ {x : x ∈ A or x ∈ B}
Example 2.3.3. Let A = {0, 1, 2, 3, 4, . . .} be the set of all natural numbers and B = {x ∈ Z | x² = 4}. Then B = {−2, 2} and so A ∪ B = {−2, 0, 1, 2, 3, . . .} and A ∩ B = {2}.
Example 2.3.4. Let A denote the set of human beings in this room, B (for
Blokes) denote the subset of male human beings in this room, and S (for
Sheilas) denote the female human beings in this room. Then B ∪ S = A and B ∩ S = ∅ unless we have a hermaphrodite or two.
In the Algebra of Sets we perform operations on sets using ∩ and ∪ as our
operations (similar to how we use addition and multiplication for numbers).
We use parentheses to indicate precedence.
Theorem 2.3.1. For any sets A, B, C,
1. Commutativity:
(a) A ∩ B = B ∩ A,
(b) A ∪ B = B ∪ A,
2. Associativity:
(a) (A ∩ B) ∩ C = A ∩ (B ∩ C),
(b) (A ∪ B) ∪ C = A ∪ (B ∪ C),
3. Distributivity:
(a) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C),
(b) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C).
Remark 2.3.2. Proving these claims is very easy and follows from corresponding statements about Logic: if for example I use P ∧ Q to mean P and Q for any propositions P, Q, then (P ∧ Q) ∧ R = P ∧ (Q ∧ R), both meaning the claim that all three propositions are true. Likewise if P ∨ Q means P or Q, it is easy to see that
(P ∧ Q) ∨ R = (P ∨ R) ∧ (Q ∨ R)
The left hand side says that both P and Q are true, or R is true; this is the same as saying that either P is true or R is true, and also either Q is true or R is true. I hope you can convince yourself of the corresponding Logical statements by thinking about what they mean.
Now
x ∈ (A ∩ B) ∪ C iff x ∈ A ∩ B or x ∈ C
iff (x ∈ A ∧ x ∈ B) ∨ x ∈ C
and the rest is logic. Which I have assumed, rightly or wrongly, that you can all do.
Definition 2.3.4. Two sets A and B are called disjoint iff A ∩ B = ∅.
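Python's built-in sets use the same two operations, written & for intersection and | for union, so the laws of Theorem 2.3.1 can at least be spot-checked on particular sets (a check on examples, not a proof):

A = {0, 1, 2, 3, 4}
B = {-2, 2}
C = {1, 2, 7}

print(A & B)                                # intersection: contains just 2
print(A | B)                                # union: -2, 0, 1, 2, 3, 4

# spot-check the distributive laws
print((A & B) | C == (A | C) & (B | C))     # True
print((A | B) & C == (A & C) | (B & C))     # True

# and disjointness: A and {9, 10} have empty intersection
print(A & {9, 10} == set())                 # True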
2.3.3 Another Aside: Defining N
It should be clear after a little thought that ∅ ≠ {∅}. The first set has got nothing in it but the second set has one thing in it, that thing being a set, in fact the empty set. Note that
∅ ∈ {∅}
and also
∅ ⊆ {∅}
and these are quite different statements.
You can see that the set {∅, {∅}} has two things in it.
If you choose the rules of the game carefully, you have here a way of defining the natural numbers. We could write them as
∅, {∅}, {∅, {∅}}, {∅, {∅}, {∅, {∅}}}, . . .
This is counting
0, 1, 2, 3, . . .
We note that every element of the sequence is the set of all its predecessors.
This gives a way of using the set-builder notation to say what each of the
natural numbers is. In effect what we do is to announce that ∅ is the first of them and that whenever we have a full collection of them, there is a next
one which is the set of all those we already have.
This approach was invented by John von Neumann, a Hungarian-American mathematician who also invented the modern computer architecture, did
basic work in Quantum Mechanics and functional analysis, and proved it is
possible to build a self-replicating robot.
Some books have N starting with 1 and not including 0. The von Neumann
definition gives us a good reason to start with 0 which is why I did. And if
anyone should ask you what the number two is, you can now tell them. It is
the set {∅, {∅}}. That should shut them up.
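The construction can even be carried out literally on a computer, using frozensets so that sets can be elements of other sets. A small sketch (Python; the function name is mine):

def von_neumann(n):
    """The von Neumann numeral for n: each number is the set of all its predecessors."""
    current = frozenset()                  # zero is the empty set
    for _ in range(n):
        current = current | {current}      # successor: throw the number itself in as an element
    return current

two = von_neumann(2)
print(len(two))                            # 2: the number two has exactly two elements
print(von_neumann(1) in two)               # True: 1 is an element of 2
print(von_neumann(1) < two)                # True: 1 is also a proper subset of 2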
2.3.4 Set Differences
Definition 2.3.5. For two sets A and B the set difference A ∖ B is the set of all elements of A which are not in B.
Another way of expressing this is:
A ∖ B ≜ {x : x ∈ A and x ∉ B}
or as
x ∈ A ∖ B ⇔ x ∈ A and x ∉ B
Example 2.3.5. Let A = {1, 5, 7, 9} and B = {2, 3, 5, 9}. Then A ∖ B = {1, 7}.
2.3.5 Cartesian Products of Sets
Consider the real plane. We often describe points in the plane through their
coordinates, e.g. (1, 2) or (1, 17), when it becomes R². In this case we distinguish the points (1, 2) and (2, 1). Hence (1, 2) is not the same as the set {1, 2}. We call (x, y) an ordered pair. We say that two ordered pairs (x, y) and (s, t) are equal if x = s and y = t and unequal otherwise. For example (1, 2) ≠ (2, 1) and also (1, 2) ≠ (1, 3).
In the ordered pair (x, y) we call x the first component and y the second component.
Definition 2.3.6. Let A and B be sets. Then the Cartesian product of A and B is
A × B = {(a, b) | a ∈ A, b ∈ B}
We can generalise this to n sets.
Definition 2.3.7. Let n be a positive integer and A_1, . . . , A_n sets. Then
A_1 × ⋯ × A_n = {(a_1, . . . , a_n) | a_i ∈ A_i}
is the Cartesian product of A_1, . . . , A_n.
We denote A × ⋯ × A by Aⁿ when there are n terms in the product. Hence Rⁿ in the particular case A = R.
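For finite sets the construction is available directly in Python as itertools.product; a quick sketch:

from itertools import product

A = {1, 2}
B = {"a", "b", "c"}

AxB = set(product(A, B))                   # the set of ordered pairs (a, b)
print(len(AxB))                            # 6, which is |A| times |B|
print((1, "c") in AxB)                     # True
print(("c", 1) in AxB)                     # False: order matters in ordered pairs

print(len(set(product(A, repeat=3))))      # A x A x A, written A^3, has 8 elements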
Exercise 2.3.3. What is the result if one or more of the A_i = ∅?
2.3.6 Sets of Sets
Sometimes, as discussed earlier, we have to consider sets whose elements are
sets. In this case we have to be quite careful. Suppose that S is a set whose elements are sets. For example, S = {A, B, C}. Then if x ∈ A it does not mean that x ∈ S. The elements of S are sets and x might not be a set. We distinguish carefully between x and {x}; the former is some entity which might or might not be a set, while the second is a set with one element in it. Confusing ∈ and ⊆ is the sign of a disordered mind.
Definition 2.3.8. Let A be a set. The power set of A is the set of all subsets of A. It is denoted P(A).
Exercise 2.3.4. Show that if the set A is finite and has n elements, P(A) is also finite and has 2ⁿ elements.
Remark 2.3.3. Since I haven't defined the term finite you will have to produce an intuitively convincing argument rather than a proper proof, but using the school ideas will suffice for the present.
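For a small finite set the power set can simply be generated and counted; a sketch using itertools (an illustration of the exercise rather than a proof of it):

from itertools import combinations

def power_set(A):
    """The set of all subsets of A, each subset packaged as a frozenset."""
    elements = list(A)
    return {frozenset(c)
            for r in range(len(elements) + 1)
            for c in combinations(elements, r)}

P = power_set({1, 2, 3})
print(len(P))                              # 8, which is 2**3
print(frozenset() in P)                    # True: the empty set is a subset of everything
print(frozenset({1, 3}) in P)              # True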
Suppose we have a collection of sets U_j, one such for every j ∈ J where J is another set called the index set. So far we have dealt with the case where J has only two or at most a finite collection of elements. There is no particular reason for continuing with this restriction.
Definition 2.3.9. If U_j, j ∈ J is any collection of sets, ⋃_{j∈J} U_j is a set and
x ∈ ⋃_{j∈J} U_j iff ∃j ∈ J, x ∈ U_j
Exercise 2.3.5. Translate this into English!
Exercise 2.3.6. What is ⋃_{j∈J} U_j when J = ∅?
Definition 2.3.10. If U_j, j ∈ J is any collection of sets, ⋂_{j∈J} U_j is a set and
x ∈ ⋂_{j∈J} U_j iff ∀j ∈ J, x ∈ U_j
Exercise 2.3.7. Translate this into English!
Exercise 2.3.8. What is ⋂_{j∈J} U_j when J = ∅?
There is a definition of a cartesian product over any set which is rather more technical and which we shan't need.
2.4 Relations
Suppose a Little Green Man from outer space came down and discovered
that two people were married and asked what it meant to be married.
Since being married means different things to different people we would have a definite problem. The LGM, we may suppose, knows almost nothing but can distinguish between different species and has worked out how to tell a
plant from an animal and a male animal from a female animal, at least most
of the time.
The mathematician's answer to the question would be to say that being married is a relation between human beings. This precludes being married to your pet hamster. The relation has a number of properties: for a start if a human being x is married to another y then it is also true that y is married to x. It is tempting to write M(x, y) as shorthand for x is married to y so we can tell the LGM that
∀x, y ∈ H, M(x, y) ⇒ M(y, x)
where H denotes the set of human beings.
It is illegal to marry yourself, so we can assure the LGM that
∀x ∈ H, ¬M(x, x)
After that it gets difficult. Should we tell the LGM that male human beings (x ∈ H_M) can marry only female human beings (y ∈ H_F)? Not in Canada.
Should we tell it that a man can be married to only one woman? Not in Saudi Arabia. Should we say that a man can be married to someone above a definite age? In some cultures at some times it has been possible to marry children or an adult to a child. And anyway, could you keep on specifying enough rules to do the job? Some cultures insist on the local priest or shaman officiating, others require only a declaration to each other by the parties concerned.
The mathematician's way is to specify the set of people concerned and then give a list of who is married to whom. It might be a longish list but this would do it. In this case we have that being married would be a subset of H × H, the cartesian product of the set of all human beings with itself. We write M ⊆ H × H for the set of married people and write (x, y) ∈ M instead of M(x, y).
In general, we have:
Definition 2.4.1. A binary relation R between sets A and B is a subset, R ⊆ A × B.
We can define a ternary relation (x gave y to z) on three sets in the same way. I shan't bother much with n-ary relations for n > 2.
Example 2.4.1. < is a binary relation on R (that is a relation between R and R) given by
(x, y) ∈ <  iff  ∃z ∈ R⁺, y = x + z
We write x < y instead of (x, y) ∈ < and if R is a binary relation I shall generally write xRy as shorthand for (x, y) ∈ R.
Example 2.4.2. Let R denote the relation of owning something, a relation
human beings take seriously. It is going to be a subset of A × B where A is
the set of possible owners and B is the set of things that can be owned. If
we decide that a company can own things, then A will have to be the union
of a set of human beings and a set of other things including companies. If
joint ownership is allowed, then we might have to allow A to include P(H), the set of all subsets of human beings. This would include companies. Does
a dog own its bone or a bird its nest? If so, dogs and birds would have to be
in A as well.
Can you own only material objects, or could you own a promise to buy your
shares in Oil at some future date for a fixed price? Economists think you can.
So the set B is going to take some careful thought too. Can you both own
some things and be owned by others? In Roman times a slave was owned
and could also own things. There have been cases of slaves who owned more
money than their masters. Does your pet cat own you? The cat might think
so. Working out what ownership actually means is very tricky. Better to
turn back to mathematics where we always have precise definitions to work
with.
Example 2.4.3. We use the notation M_{2,2}(R) to denote the set of 2 × 2 matrices with real entries. For any A, B ∈ M_{2,2}(R) write ARB iff there is a sequence of elementary row operations that can convert A to B. Then R is a binary relation on M_{2,2}(R). We shall say that A and B are ERO equivalent when ARB.
It is immediate that
∀A ∈ M_{2,2}(R), ARA
(Just do any ERO and then undo it the next time. Or add zero times one row to another.) It is immediate that
∀A, B ∈ M_{2,2}(R), ARB ⇒ BRA
and also that
∀A, B, C ∈ M_{2,2}(R), ARB & BRC ⇒ ARC
There are many other relations which share these properties. For example,
the relation = on the set N satisfies the conditions in an admittedly uninter-
esting way since numbers are equal only to themselves.
2.4.1 Equivalence Relations and Partitions
The definition of a relation is too general (just think of any subset of A × A). Some special relations are particularly useful and we will discuss a few below.
Definition 2.4.2. Let A be any set and ∼ a relation on the set A, that is, ∼ ⊆ A × A.
(i) ∼ is called reflexive iff
∀x ∈ A, x ∼ x
That is, if the subset {(x, x) : x ∈ A} ⊆ ∼.
(ii) ∼ is called symmetric iff
∀x, y ∈ A, x ∼ y ⇒ y ∼ x
That is, if whenever (x, y) ∈ ∼, then (y, x) ∈ ∼.
Figure 2.4.1: An equivalence on cows.
(iii) ∼ is called transitive iff
∀x, y, z ∈ A, x ∼ y & y ∼ z ⇒ x ∼ z
That is, if (x, y), (y, z) ∈ ∼, then (x, z) ∈ ∼
Definition 2.4.3. An equivalence relation on a set A is a binary relation which is reflexive, symmetric and transitive; that is, it satisfies all the three conditions in the last definition.
Exercise 2.4.1. Show that = is an equivalence relation (that is where the
word equivalence comes in!) and that > is not an equivalence relation.
It is very easy to construct an equivalence relation on small sets.
Suppose we have a set C of cows in a field. I want to construct an equivalence relation on the set C. I can do this by herding the cows into clusters, one in the north corner, one in the east, one in the south, one in the west and a fifth cluster in the middle.
I show a picture of the five clusters of cows in figure 2.4.1 with them all neatly separated by fences. Now I define an equivalence by: cow x is equivalent to cow y iff the two cows are in the same cluster.
It is easy to see that the relation is reflexive since each cow is in the same cluster as itself; it is symmetric because if x and y are in the same cluster then so are y and x. And it is transitive because if x ∼ y and y ∼ z then x, y and z are all cows together in the same cluster.
This works for other things besides cows. In fact it works for everything. We
now make this claim precise:
Definition 2.4.4. A partition of a set A is a set P of non-empty subsets of A such that ∀x ∈ A, x belongs to exactly one member of P.
In the cow example, the partition elements are the clusters of cows. There
are five of them.
For another example, consider the even and odd positive integers: this is a partition of the positive integers into two subsets.
Definition 2.4.5. Let A be any set and ∼ an equivalence relation on A. For any x ∈ A, define E_x ≜ {y ∈ A : x ∼ y}. The set E_x is called the equivalence class of x with respect to the equivalence relation ∼.
REMARKS
(a) Observe that x ∈ E_x, since x ∼ x.
(b) x ∼ y if and only if E_x = E_y. Proof: Suppose x ∼ y and a ∈ E_x. Then a ∼ x ∼ y and by transitivity a ∼ y and therefore a ∈ E_y. Hence every element of E_x is in E_y. On the other hand if b ∈ E_y then b ∼ y ∼ x and so b ∈ E_x and therefore E_x = E_y. If E_x = E_y then x ∈ E_y and so x ∼ y.
(c) Let P = {E_x : x ∈ A}. Then P is a partition of A. First it is clear that E_x ⊆ A. Also, E_x = E_y or E_x ∩ E_y = ∅. Proof: Suppose E_x ∩ E_y ≠ ∅. Then there exists an element a ∈ E_x ∩ E_y, so a ∼ x and a ∼ y. By transitivity, x ∼ y and by (b) E_x = E_y.
(d) Let P be any partition of A. Define the relation R on A as follows: (x, y) ∈ R if and only if x and y lie in the same part of P. Then R is an equivalence relation on A.
Hence we have proved
Theorem 2.4.1. Let A be any non-empty set. Then the equivalence classes
of an equivalence relation partition A. Moreover, any partition defines an
equivalence relation on A whose equivalence classes are the parts of the par-
tition.
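For a finite set the passage from an equivalence relation to its equivalence classes is easy to mechanise. The sketch below (Python, my own names; the relation handed in is assumed to really be an equivalence relation) herds the elements into clusters exactly as with the cows:

def equivalence_classes(A, related):
    """The equivalence classes of the relation 'related' on the finite collection A."""
    classes = []
    for x in A:
        for cls in classes:
            if related(x, next(iter(cls))):    # x joins a cluster if it is related
                cls.add(x)                     # to a representative of that cluster
                break
        else:
            classes.append({x})                # otherwise x starts a new cluster
    return classes

# integers related when they leave the same remainder on division by 3
print(equivalence_classes(range(10), lambda x, y: x % 3 == y % 3))
# three classes: {0, 3, 6, 9}, {1, 4, 7} and {2, 5, 8} (in some order)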
Exercise 2.4.2. We put an equivalence relation ∼ on R².
∀ (x_1, y_1), (x_2, y_2) ∈ R², (x_1, y_1) ∼ (x_2, y_2) iff x_1 = x_2
Verify that this is an equivalence relation on R² and describe the equivalence classes.
Exercise 2.4.3. How many distinct relations are there on a set with three
elements? How many of them are equivalence relations? With four elements?
2.4.2 Maps
Maps or functions are a special case of a binary relation between (generally
different) sets X and Y . We use them extensively to represent input-output
systems, of which there are rather a lot, for example rotations and moving
objects. In the last case, the input is the time and the output is where the
object is at that time.
For any binary relation R ⊆ A × B, we have a set of ordered pairs which is
R.
Definition 2.4.6. For any binary relation R ⊆ A × B, the domain of R, written dom(R), is defined by:
dom(R) ≜ {a ∈ A : ∃b ∈ B, (a, b) ∈ R}
Definition 2.4.7. For any binary relation R between A and B, the codomain is B. This is sometimes called the range of R.
Definition 2.4.8. For any binary relation R ⊆ A × B, the image of R, written im(R), is defined by: im(R) ≜ {b ∈ B : ∃a ∈ A, (a, b) ∈ R}
Exercise 2.4.4. Let R be the relation < on R. Give the domain and image
of R.
Exercise 2.4.5. Let R denote the relation on the cows of the last section.
Give the domain and range of the relation R.
Definition 2.4.9. If R is a binary relation between A and B (that is, R ⊆ A × B) then R⁻¹ is the inverse relation between B and A given by
(b, a) ∈ R⁻¹ ⇔ (a, b) ∈ R
Remark 2.4.1. The inverse relation is always defined.
Exercise 2.4.6. The relation square from R to R consists of all pairs (x, x²), x ∈ R. Give the inverse relation.
Definition 2.4.10. A relation f between sets X and Y is a map iff
1. dom(f) = X
2. ∀x ∈ X, ∀y, z ∈ Y, (x, y) ∈ f & (x, z) ∈ f ⇒ y = z
Exercise 2.4.7. Translate the above into English and construct some examples of relations which are not maps and others which are when X is the set {u, v, w} and Y is the set {a, b, c, d, e}.
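Definition 2.4.10 turns directly into a mechanical check for finite sets. In the sketch below (Python) a relation is simply given as a set of ordered pairs; the little example relations are my own and are not on the X and Y of the exercise, so they do not answer it for you.

def is_map(R, X, Y):
    """Decide whether the relation R, a set of pairs from X x Y, is a map from X to Y."""
    domain_ok = {a for (a, b) in R} == set(X)              # condition 1: dom(R) = X
    single_valued = all(len({b for (a2, b) in R if a2 == a}) <= 1
                        for a in X)                        # condition 2: one output per input
    return domain_ok and single_valued

X, Y = {1, 2, 3}, {10, 20}
print(is_map({(1, 10), (2, 10), (3, 20)}, X, Y))           # True
print(is_map({(1, 10), (2, 20)}, X, Y))                    # False: 3 has no output
print(is_map({(1, 10), (1, 20), (2, 10), (3, 10)}, X, Y))  # False: 1 has two outputs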
Figure 2.4.2: A picture of a map.
We write a map f between X and Y as f : X → Y , and when (x, y) ∈ f we write f(x) = y. We may write the pair of lines:
f : X → Y
x ↦ y
to say that f is a map and f(x) = y. We talk informally of f taking x to y.
We also draw a picture like figure 2.4.2 to provide us with a picture of the kind of things maps are. The image of f in figure 2.4.2 is shown on the
codomain.
Informally, a map from X to Y is a set of inputs, X, and a set of outputs
which are in Y but need not be all of it, and each input has precisely one
output.
Example 2.4.4. f(x) = x² is a map from R to R. So are most of the other functions you have met. The inverse relation f⁻¹ is not a map, as you should have established.
Remark 2.4.2. Some people call what I have called maps functions or map-
pings. The terms transformation and operator are also used. The reason for
the muddle is that the idea kept coming up in different contexts and people
needed a name for it and were too thick to notice it had been done before.
Definition 2.4.11. Two maps f : A → B and g : C → D are equal if
1. they have the same domain (A = C)
2. they have the same codomain (B = D)
3. for all a ∈ A, f(a) = g(a)
Sometimes we weaken this to insist only that the image of f is the image
of g and both are contained in B and in D. The word range is sometimes
used to mean the codomain and sometimes it means the image of f. This is
annoying and means you have to ask someone what they mean exactly. If it
is the author of a book you can only guess or hope he tells you.
Definition 2.4.12. Suppose f : X → Y is a map and A ⊆ X. Then the image of A by f is written f(A) and is defined by:
f(A) ≜ {y ∈ Y : ∃x ∈ A, f(x) = y}
Exercise 2.4.8. Translate this into English
The image of X is then the same thing as the image of f. The image of a set consisting of a single point, {x}, is necessarily a set consisting of a single point, {y}, and we confuse ourselves by referring to y as the image of x.
Definition 2.4.13. If f : X → Y is any map and V ⊆ Y is a non-empty subset, then the pre-image of V by f, written f⁻¹(V ), is defined by:
f⁻¹(V ) = {x ∈ X : f(x) ∈ V }
That is, the preimage of a set V is the set of all things that get mapped into
V by f.
Denition 2.4.14. A map f : A B is called onto if for every b B there
is some (that means at least one) a A such that f(a) = b.
Exercise 2.4.9. Write this out in formal algebra.
Note that f is onto if the codomain of f equals the image f(A).
Exercise 2.4.10. Which of the following maps are onto?
1. f : R → R, f(x) = x²
2. det : M₂,₂(R) → R (which calculates the determinant of each matrix)
3. arctan(x)
4. f : N → N, f(x) = 2x
Definition 2.4.15. A map f : A → B is called 1-1 if f(x) = f(y) implies x = y.
Exercise 2.4.11. Write this out in formal algebra.
Exercise 2.4.12. Which of the following maps are 1-1?
1. f : R → R, f(x) = x²
2. det : M₂,₂(R) → R (which calculates the determinant of each matrix)
3. arctan(x)
4. f : N → N, f(x) = 2x
Definition 2.4.16. A map f : A → B is called bijective or a bijection if and only if f is 1-1 and onto.
Exercise 2.4.13. Define a bijection in formal terms.
Remark 2.4.3. The term bijection is relatively new and in old books it was called a one-one correspondence.
Exercise 2.4.14. Which of the above maps are bijections? Can you redefine the domain and codomain to make them bijections?
Definition 2.4.17. Let f : A → B and g : B → C be maps. Then the map g ∘ f : A → C defined by (g ∘ f)(a) = g(f(a)) is called the composition of f and g.
Exercise 2.4.15. Consider the permutations f, g : {1, 2, 3} → {1, 2, 3} defined by
f = ( 1 2 3        g = ( 1 2 3
      2 3 1 )            2 1 3 )
What is g ∘ f? f ∘ g?
Note that for a finite set a bijection to itself is otherwise known as a permutation.
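If you want to check your answer to Exercise 2.4.15 by machine, here is a small sketch of my own (the dictionaries and names are my choice, not the text's): permutations stored as dictionaries and composed exactly as Definition 2.4.17 prescribes, (g ∘ f)(a) = g(f(a)).

    # f sends 1->2, 2->3, 3->1; g sends 1->2, 2->1, 3->3.
    f = {1: 2, 2: 3, 3: 1}
    g = {1: 2, 2: 1, 3: 3}

    def compose(outer, inner):
        # (outer o inner)(a) = outer(inner(a))
        return {a: outer[inner[a]] for a in inner}

    print(compose(g, f))   # g o f: {1: 1, 2: 3, 3: 2}
    print(compose(f, g))   # f o g: {1: 3, 2: 2, 3: 1}
    # The two answers differ, so composing permutations is not commutative.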
Definition 2.4.18. If A is any non-empty set then the identity function on A is the function id_A : A → A defined by id_A(a) = a for all a ∈ A.
Definition 2.4.19. Let f : A → B be a function. Then a function f⁻¹ : B → A is called the inverse function of f if f⁻¹ ∘ f = id_A and f ∘ f⁻¹ = id_B.
Remark 2.4.4. This is rather confusing notation and leads people to think that when I write U = f⁻¹(V) that f⁻¹ exists and is a map. This is wrong. Look at f(x) = x² and observe that the pre-image of the set [1, 4] is the set [−2, −1] ∪ [1, 2], which makes perfectly good sense, but there is no inverse map defined from R to R. There is of course a perfectly good inverse relation but it is not a map.
The previous definition means three things:
1. f⁻¹ is a map
2. f⁻¹(f(a)) = a for all a ∈ A and
3. f(f⁻¹(b)) = b for all b ∈ B.
In particular, inverse functions need not exist. For example, the function f : R → R, f(x) = x² has no inverse map. It is easy to see why: first −1 couldn't be sent anywhere because nothing got sent to it by f! Second, +1 couldn't go anywhere because we wouldn't know whether to send it to +1 or −1, because f sent both of them to +1. And maps have to have unique outputs for any input.
Theorem 2.4.2. A map f : A → B has an inverse if and only if it is a bijection.
Proof:
Suppose first that f is a bijection. Define f⁻¹ : B → A with f⁻¹(b) = a iff f(a) = b. We first need to show that f⁻¹ is a map. That is, we have to show that for each b ∈ B there is one and only one a with f⁻¹(b) = a. Let b ∈ B. Since f is onto there exists an a ∈ A with f(a) = b (that is just the definition of onto). Now suppose there was also a c ∈ A with f(c) = b. But then f(a) = b = f(c) and since f is 1-1 this implies a = c, and hence there is only one a with f(a) = b and we have proved f⁻¹ is a function.
The other two conditions now follow immediately.
Now suppose that f has an inverse. Then there is a map f⁻¹ with f⁻¹(f(a)) = a for all a ∈ A and f(f⁻¹(b)) = b for all b ∈ B.
To show onto: Let b ∈ B. Let a = f⁻¹(b). Then f(a) = f(f⁻¹(b)) = b.
To show 1-1: Suppose f(a) = f(c). Then f⁻¹(f(a)) = f⁻¹(f(c)) and hence a = c. ∎
Note that it is possible to write down rules that do NOT define maps. To see that f : A → B really is a map (for which we often say it is well-defined) we really have to check that for every a ∈ A there is one and only one b ∈ B with f(a) = b.
2.4.3 Finite and Infinite sets
We define two sets A and B as having the same cardinality iff there is a bijection between A and B. This gives an equivalence relation on all sets.
We define, for any n ∈ N, the set
n = {j ∈ N : 0 < j ≤ n}
Definition 2.4.20. A set is finite iff it has the same cardinality as some n and we say it has cardinality n. The set 0 is the empty set and has cardinality zero.
Remark 2.4.5. For this to amount to anything we need to be sure that there is no bijection between n and m unless n = m. Try to prove this result if you are feeling very brave.
Definition 2.4.21. A set is infinite if it is not finite.
It is now possible to prove a number of theorems although I shall merely state them:
Theorem 2.4.3. The union, intersection and cartesian product of a pair of finite sets are all finite.
Theorem 2.4.4. The power set of a finite set is finite.
In fact if the set has cardinality n then the power set has cardinality 2ⁿ.
Definition 2.4.22. A set is said to be countably infinite if there is a bijection between it and N.
Remark 2.4.6. It is not immediately obvious but the integers Z and the rationals Q are both countably infinite. The reals R are not. It is however true that R and R² have the same cardinality, and that the unit interval [0, 1] also has the same cardinality. We sometimes say they all have the cardinality of the continuum. There is a lot to be said about the difference between finite and infinite sets but I have too much to do to say much.
Galileo noted that infinite sets have the property that it is possible to have a bijection between the set and a proper subset of it; his example was the map dub : N → N, n ↦ 2n. This means that there are as many even numbers as there are numbers. For any finite set it is impossible to put the set in 1-1 correspondence with a proper subset of itself. You might consider trying to prove this.
Exercise 2.4.16. Google Hilbert's Infinite Hotel for some entertainment and enlightenment.
Chapter 3
Algebra and Arithmetic
3.1 Introduction
In the first section, I am going to continue with the approach of the last chapter to define a whole lot of new terms to describe things that are described in what is called abstract algebra. This is rather a matter of going through the zoo and pointing at the weird beasts; the last beasts we come to are the rings and fields, and I shall merely give their definitions and a few examples.
The second section, Arithmetic, is about the properties of a rather well known ring, Z, the integers. This section is described in the official syllabus so I have to cover it, but although interesting in its own right it is completely irrelevant to the rest of the course. I find it a bit of an embarrassment, and I don't embarrass easily. I have lifted it from Alice Niemeyer's lectures last year and cut it to the minimum.
3.2 Unary Operations, Operators
Definition 3.2.1. A unary operation or operator on a set X is a map T : X → X.
Example 3.2.1. An n × n matrix is an operator on Rⁿ.
Example 3.2.2. Transposition is a unary operation on n × n matrices.
Example 3.2.3. On the space C^∞(R, R) of infinitely differentiable functions from R to R, the identity map I is an operator, as is aI for any a ∈ R. So is the operation of differentiation, usually written D. So is D + aI, and since we can compose operators, so is (D + aI)(D + bI): this takes the function f ∈ C^∞(R, R) to
d²f/dx² + (a + b) df/dx + ab f
Solving the second order linear differential equation
d²f/dx² + 4 df/dx + 2f = eˣ sin(x)
is a matter of finding the preimage under the operator of the function eˣ sin(x). You solved such equations last year.
There are obviously lots more. Note that a map such as the determinant from n × n matrices to R is not an operator.
These suggest a way of looking at an operator as a procedure which does something. If there is an inverse map then it can be regarded as undoing it! The more ways you have of thinking about mathematical objects the better.
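If you have a computer handy you can check the claim about (D + aI)(D + bI) on a concrete function. This is a sketch of my own using the sympy library (not mentioned in the notes); the test function is an arbitrary choice.

    import sympy as sp

    x, a, b = sp.symbols("x a b")
    f = sp.exp(x) * sp.sin(x)          # any infinitely differentiable test function

    D = lambda g: sp.diff(g, x)        # the differentiation operator

    inner = D(f) + b * f               # (D + bI) applied to f
    lhs = D(inner) + a * inner         # (D + aI)(D + bI) applied to f
    rhs = sp.diff(f, x, 2) + (a + b) * sp.diff(f, x) + a * b * f

    print(sp.simplify(lhs - rhs))      # 0, agreeing with the formula above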
3.3 Binary Operations
Definition 3.3.1 (Binary Operation). Let G be a set. A binary operation on G is a function that maps an ordered pair of elements of G to an element of G.
Formally, a binary operation ∗ on a set G is a map
∗ : G × G → G
Note that a binary operation is closed in the sense that it combines two elements of a set to yield a third element that again lies in the set. This means that it never goes outside the set.
Example 3.3.1. 1. Addition on Z is a binary operation: any ordered pair of elements of Z is mapped to another element of Z.
2. Addition is also a binary operation on N, R, C.
3. Subtraction − is a binary operation on R.
4. Subtraction − is not a binary operation on N.
5. Multiplication is also a binary operation on Z, N, R and C.
6. Let G = Mₙ,ₙ(R) be the set of all n × n matrices with entries in R. Then matrix multiplication is a binary operation on G.
Exercise 3.3.1. Which of the following operations are binary operations?
The addition + on the vector space Rⁿ.
The dot product on the vector space Rⁿ for n > 1.
The cross product on the vector space R³.
Now we can look at some nice properties that binary operations sometimes have.
Definition 3.3.2. Let ∗ be a binary operation on a set G. Then we say that ∗ is commutative if g ∗ h = h ∗ g for all g, h ∈ G.
We are very used to commutative binary operations, since addition and multiplication on N, Z, R and C are all commutative: it does not matter in which order we perform the binary operation. Do we know any non-commutative binary operations? Yes we do!
Example 3.3.2. Matrix multiplication is not commutative, e.g.

    [ 0 1 0 ]   [ 0 0 1 ]   [ 0 1 0 ]
    [ 1 0 1 ] ∗ [ 0 1 0 ] = [ 1 0 1 ]
    [ 1 0 2 ]   [ 1 0 0 ]   [ 2 0 1 ]

However, multiplying the same two matrices in the other order,

    [ 0 0 1 ]   [ 0 1 0 ]   [ 1 0 2 ]
    [ 0 1 0 ] ∗ [ 1 0 1 ] = [ 1 0 1 ]
    [ 1 0 0 ]   [ 1 0 2 ]   [ 0 1 0 ]
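A quick numerical check of Example 3.3.2, using the numpy library (my own addition, not part of the notes):

    import numpy as np

    A = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [1, 0, 2]])
    B = np.array([[0, 0, 1],
                  [0, 1, 0],
                  [1, 0, 0]])

    print(A @ B)                         # [[0 1 0] [1 0 1] [2 0 1]]
    print(B @ A)                         # [[1 0 2] [1 0 1] [0 1 0]]
    print(np.array_equal(A @ B, B @ A))  # False: matrix multiplication is not commutative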
Another interesting property binary operations can have is the following:
Definition 3.3.3. Let ∗ be a binary operation on a set G. Then we say that ∗ is associative if (a ∗ b) ∗ c = a ∗ (b ∗ c) for all a, b, c ∈ G.
Most of the examples of binary operations we have seen are associative.
1. Matrix multiplication is associative but not commutative.
2. Function composition is associative but not commutative.
3. The cross product on R³ is neither commutative nor associative.
Exercise 3.3.2. Let A be a finite set and write Bij(A) for the set of all bijections between A and itself. As remarked earlier, we can call these permutations in the case where A is finite.
Let ∘ : Bij(A) × Bij(A) → Bij(A) denote the operation of composing the bijections. It is easy to see that the composition of two bijections is a bijection (Check this!) so we have a well defined binary operator on Bij(A). Is it associative? Commutative?
It may have crossed your mind that quite a lot of common structure is noticeable in the various bits of mathematics we have done. For example, we can add and subtract and multiply integers and also real numbers and also 2 × 2 matrices. This is only the tip of a humungous iceberg.
3.4 Groups
Definition 3.4.1. Suppose ∗ : G × G → G is a binary operation. We shall say that (G, ∗) has an identity if there is some element e ∈ G such that
∀g ∈ G, e ∗ g = g ∗ e = g
Exercise 3.4.1. How many of the sets and binary operations discussed so far have an identity?
Definition 3.4.2. Suppose (G, ∗) has an identity e. We say that it has inverses iff
∀g ∈ G, ∃h ∈ G : g ∗ h = h ∗ g = e
We usually write the inverse of g as g⁻¹, which we read as "g inverse".
Exercise 3.4.2. How many of the sets and binary operations discussed so far have inverses?
Definition 3.4.3. A group is a pair (G, ∗) where G is a non-empty set and ∗ : G × G → G is a binary operation on G such that
1. ∗ is associative
2. (G, ∗) has an identity, e
3. Every g ∈ G has an inverse, g⁻¹, such that g ∗ g⁻¹ = g⁻¹ ∗ g = e
We care about groups because there are so many of them. The little dears turn up all over the place and you can't do much serious science or mathematics without finding one. The symmetries of a crystal are specified by a group; the symmetries of quantum mechanics involving things like SU(2) are groups (Lie groups actually) and we shall meet some of them later.
Exercise 3.4.3. Make a list of all the groups you can think of. Confirm that they are groups.
Exercise 3.4.4. Let Z₁₂ denote the integers as seen on a clock. Then we can add numbers as the clock does so that 11 + 2 = 1. I shall imagine a clock which has a zero where most ordinary clocks have a twelve. Then I could write out an addition table for what is called arithmetic modulo twelve. I could also do multiplication in this system.
Is the set of twelve integers with clock addition a group? Check carefully.
Exercise 3.4.5. Suppose I had a crazy clock with only 11 hours on it, so that it produced the time every hour as a number between 0 and 10. Three hours after nine o'clock would be one o'clock and so on. Show that these numbers (known as the integers mod eleven) also form a group under the addition mod 11.
Exercise 3.4.6. Define, for each positive integer n, a relation ∼ₙ on Z by
∀a, b ∈ Z, a ∼ₙ b ⇔ ∃m ∈ Z, nm = (b − a)
Show that ∼ₙ is an equivalence on Z. What are the equivalence classes for n = 2, 3, 12?
Show how we can turn the equivalence classes into a group by adding a sample element from each class and finding the resulting class. Show the sum class does not depend on the samples you chose.
The groups are called Zₙ, the cyclic groups.
Exercise 3.4.7. I decide to define a binary operation on the students in this class. Given students A and B, my operation simply chooses the taller. I suppose no two people are exactly the same height.
Does this operation make the class into a group? If not, why not? Is there an identity and if so who?
Exercise 3.4.8. For a finite group (G, ∗), you could write out the multiplication table for ∗, all of it. Unlike the multiplication tables for integers you wouldn't have to stop after they got a bit big. Here is a set of four elements and a multiplication table. Confirm that it defines a group.
∗ e a b c
e e a b c
a a b c e
b b c e a
c c e a b
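Confirming the axioms from a finite table is tedious by hand but completely mechanical, so here is a Python sketch (my own, not part of the notes) that checks the table above really does define a group:

    # The operation table above, as a dictionary: table[(x, y)] = x * y.
    G = ["e", "a", "b", "c"]
    rows = {"e": "e a b c", "a": "a b c e", "b": "b c e a", "c": "c e a b"}
    table = {(x, G[i]): rows[x].split()[i] for x in G for i in range(4)}

    op = lambda x, y: table[(x, y)]

    closed = all(op(x, y) in G for x in G for y in G)
    assoc = all(op(op(x, y), z) == op(x, op(y, z)) for x in G for y in G for z in G)
    identity = all(op("e", x) == x and op(x, "e") == x for x in G)
    inverses = all(any(op(x, y) == "e" and op(y, x) == "e" for y in G) for x in G)

    print(closed, assoc, identity, inverses)   # True True True True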
Exercise 3.4.9. How many other groups are there with four elements?
Exercise 3.4.10. Let n = {j ∈ Z : 1 ≤ j ≤ n}. A permutation of n is a bijection from n to itself. Verify that the permutations of n form a group under the operation of composition. How many elements are there in the group? Is the group always abelian?
Definition 3.4.4. The permutation group on n is called Sₙ.
Exercise 3.4.11. Many of the groups which you have met are abelian, meaning that the binary operation is commutative. Can you find a non-abelian finite group? This is rather a lot of work if you do it the stupid way and very easy if you are smart.
Exercise 3.4.12. What is the smallest possible group? Write out its operation table.
Exercise 3.4.13. What is the next smallest group? Write out its operation table.
Exercise 3.4.14. What is the smallest non-abelian group? Write out its operation table.
Exercise 3.4.15. Keep going until you run out of patience or reach groups of order 8.
Two results which may make filling in multiplication tables a whole lot easier:
Theorem 3.4.1. In any group (G, ∗) the right cancellation law holds:
∀a, b, c ∈ G, a ∗ b = c ∗ b ⇒ a = c
Proof:
a ∗ b = c ∗ b
⇒ (a ∗ b) ∗ b⁻¹ = (c ∗ b) ∗ b⁻¹
⇒ a ∗ (b ∗ b⁻¹) = c ∗ (b ∗ b⁻¹)
⇒ a ∗ e = c ∗ e
⇒ a = c ∎
Theorem 3.4.2. In any group (G, ∗) the left cancellation law holds:
∀a, b, c ∈ G, a ∗ b = a ∗ c ⇒ b = c
Proof: This is an easy exercise. Note that a kindlier person might have explained after each deduction just what axiom or property is being used to get from one line to the next. See if you can go through the above proof confirming that every line does indeed follow from the line above by some rule you can cite. It should be possible to pretend you are a little robot and can only do things by following exact rules which are named, and then see if you can get through the argument. If not, it does not stand up and you should complain. Maybe shouting Exterminate! while you do so.
3.4.1 Subgroups
If you look hard at the above operation table for a group of order 4, and compare it with the smallest possible group with one element e and table e ∗ e = e, you will notice that every group has a subgroup consisting of just this element. It is a group in its own right and also a subset of a group, so it is called a subgroup:
Definition 3.4.5. If (G, ∗) is a group and H ⊆ G is a subset so that (H, ∗) is also a group then H is said to be a subgroup of G and we write H < G.
It follows that every group has at least two subgroups, {e} and the group itself. These are called improper subgroups; all the others (if any) are called proper subgroups.
If you look at the table for the 4-group again you will see another subgroup: H = {e, b} gives a subgroup of order two.
Definition 3.4.6. The order of a group is the number of elements in it. We write |H| for the order of H.
Exercise 3.4.16. Draw an equilateral triangle and label the vertices A, B, C. We can rotate the triangle into itself and specify them by permutations. Note that every permutation is either a rotation or a reflection of the triangle. In fact the group is the same as S₃. Write out its operation table, and find all the subgroups.
Exercise 3.4.17. The above group is also called the Dihedral group D₃ because it is related to the symmetries of a planar figure. There is also a group D₄ which is the set of symmetries of a square. How many elements are there in D₄? Write out its operation table and list all the subgroups.
Exercise 3.4.18. We can regard D₄ as a subgroup of S₄. Show it is a proper subgroup by giving a permutation of the letters A, B, C, D which is not a symmetry of the square.
Exercise 3.4.19. You can get subgroups from a group by taking an element g ∈ G and calculating g, g² = g ∗ g, g³, ..., gⁿ, ... Do this for the finite groups you have looked at so far. Since the group is finite you must repeat after some point. Show that the powers of any element of a group G must form a subgroup of G.
Exercise 3.4.20. Find a subgroup of (R², +). Find all subgroups!
Exercise 3.4.21. Define a relation ∼_H on a finite group G having subgroup H < G by
∀x, y ∈ G, x ∼_H y ⇔ ∃h ∈ H, x = y ∗ h
1. Show ∼_H is an equivalence relation on G.
2. Show that H itself is one equivalence class.
3. Show that any two equivalence classes have the same number of elements.
4. Deduce that H < G ⇒ |H| divides |G|
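Exercise 3.4.21 can be explored concretely before you prove anything. The sketch below (my own illustration; the choice of group and subgroup is arbitrary) takes the subgroup H = {0, 4, 8} of (Z₁₂, +), lists the equivalence classes, and checks that they all have |H| elements, so |H| divides |G|.

    n = 12
    G = list(range(n))                 # the group Z_12 under addition mod 12
    H = [0, 4, 8]                      # a subgroup

    # x ~_H y  iff  x = y + h for some h in H (the text writes the operation multiplicatively)
    def equivalence_class(y):
        return tuple(sorted((y + h) % n for h in H))

    classes = {equivalence_class(y) for y in G}
    for c in sorted(classes):
        print(c)                       # (0, 4, 8) (1, 5, 9) (2, 6, 10) (3, 7, 11)
    print(len(classes) * len(H) == len(G))   # True: |H| divides |G|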
3.4.2 Group Actions
You will have noticed that many groups are groups of bijections (in a sense all are, up to change of name). This suggests an intimate relation between a group and the things the group elements act on to change them into other things. For example, the dihedral group D₃ acts on an equilateral triangle to move it into itself.
Generally we say that a group G acts on a set X when this sort of thing happens. Formally:
Definition 3.4.7. A group G is said to have an action on a set X when
φ : G × X → X
is a map with the properties
1. ∀x ∈ X, φ(e, x) = x, where e ∈ G is the group identity
2. ∀g, h ∈ G, ∀x ∈ X, φ(gh, x) = φ(g, φ(h, x))
We often write φ(g, x) as g(x), treating g as a map from X to X.
Exercise 3.4.22. Make a list of all the group actions you can think of.
Remark 3.4.1. Observe that every group acts naturally on itself.
Exercise 3.4.23. Explain!
Remark 3.4.2. Observe that a general solution to a vector field on Rⁿ defines an action of R on Rⁿ.
Exercise 3.4.24. Explain!
3.5 Rings and Fields
This section is mostly about jargon. It is pretty much a sequence of definitions, and the only reason for doing them is that they are standard and get used a lot. If you read a book or paper which mentions non-commutative rings, at least you won't be intimidated because you'll know what they are.
3.5.1 Rings
Most of the algebraic systems you have met at school have not just one binary operation but two or even more. One of the things we are doing is gradually building up to recover more complicated structures such as the real numbers or vector spaces. So now we introduce a set which is a group and which also has a second operation. I shall call the group operation addition and the second operation multiplication, but this does not mean that they are exactly what you mean by them in the case of numbers. They are just convenient names. Similarly I shall use 1 to denote a multiplicative identity without meaning that it has to be the number 1.
Definition 3.5.1. A ring is a set X with a binary operation ⊕ called addition and a second binary operation ⊗ called multiplication such that the following properties hold:
1. (X, ⊕) is an abelian group with 0 as identity and the inverse of x ∈ X is written as −x.
2. ⊗ is associative, that is
∀x, y, z ∈ X, (x ⊗ y) ⊗ z = x ⊗ (y ⊗ z)
3. ⊗ is left distributive over ⊕, that is
∀x, y, z ∈ X, x ⊗ (y ⊕ z) = (x ⊗ y) ⊕ (x ⊗ z)
4. ⊗ is right distributive over ⊕, that is
∀x, y, z ∈ X, (y ⊕ z) ⊗ x = (y ⊗ x) ⊕ (z ⊗ x)
Remark 3.5.1. There are various definitions of a ring: some insist on a multiplicative identity. I follow the better authors here.
Example 3.5.1. The integers Z form the best known ring.
Exercise 3.5.1. Verify the claim that Z is a ring.
Example 3.5.2. A real polynomial of one variable is a finite string of non-zero numbers together with an indeterminate symbol (usually x) which occurs, with a positive integer as index, next to each of the numbers in the string. For example, 3 − 2x + 4x². The set of real polynomials p(x) = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ forms a ring. Adding them is done by adding the coefficients, and multiplying them is done in the usual way by using the sum law for indices.
Exercise 3.5.2. Verify the above claim.
Remark 3.5.2. Alternatively we could define a real polynomial in one indeterminate as a finite string of the form a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ where the aⱼ ∈ R and aₙ ≠ 0. This has the advantage that we can make a nice easy definition of the degree:
Definition 3.5.2. The above polynomial with highest power aₙxⁿ, aₙ ≠ 0, is said to be a polynomial of degree n.
Note that if you add two polynomials the degree of the sum is at most the higher of the degrees of the components, and if you multiply two polynomials of degrees n, m the degree of the product is n + m.
Exercise 3.5.3. Define a polynomial in two indeterminates. Is the set of such things a ring?
Much of what can be done with integers can also be done with any ring including the polynomials. For example we can look for factors, we can define primes or something like them.
Example 3.5.3. The set Mₙ,ₙ of square matrices is a ring.
Exercise 3.5.4. Verify the above claim.
Definition 3.5.3. A subring of a ring (X, ⊕, ⊗) is a subset U ⊆ X which is a ring under the same operations ⊕, ⊗.
Example 3.5.4. The set of functions F([a, b]) is a ring. The set C⁰([a, b]) of continuous functions from [a, b] to R is a subring. The addition of functions is as usual, and the multiplication of functions also as usual.
Exercise 3.5.5. Verify the above claims.
Example 3.5.5. The clock integers, the integers mod 12, form a ring. We can write down both a finite binary operation table for addition and for multiplication mod 12.
There is nothing special about the number twelve, we could choose any positive integer n and take the integers with addition mod n, that is, n ≡ 0 and, when m = an + r for 0 ≤ r < n, m ≡ r. The resulting ring is commonly called Zₙ. The ≡ symbol is called congruence and an exercise will ask you to show that ≡ is in fact an equivalence relation on Z.
Exercise 3.5.6. Verify the above claims.
If we look at the addition table for Z₄ we get
+ 0 1 2 3
0 0 1 2 3
1 1 2 3 0
2 2 3 0 1
3 3 0 1 2
The multiplication table is
⊗ 0 1 2 3
0 0 0 0 0
1 0 1 2 3
2 0 2 0 2
3 0 3 2 1
The non-zero elements certainly do not form a group since 2 ⊗ 2 = 0. For a ring they don't have to, but as we shall see later, for a field they do. Also we have 2 ⊗ 1 = 2 ⊗ 3 = 2, which means we do not have cancellation, which we must have in a group. So Z₄ is not a field. In order for it to be a ring we need to believe that ⊗ is associative and also left and right distributive. Since it is commutative, if it is left distributive it is also right distributive, so we only have to look at two issues.
Now associativity, we believe, holds in Z, we have been using it for years, so
∀a, b, c ∈ Z, a × (b × c) = (a × b) × c
where × denotes multiplication of integers. And if two numbers are the same, reducing them mod four, that is getting the remainders when we divide by four, will leave them the same. So if you give me three elements of Z₄ I just do multiplication on them the two ways as if they are in Z and get the same answer, then knock off fours and also get the same answer. So I have that ⊗ is associative in Z₄ and indeed in Zₙ for any n ∈ Z⁺. And exactly the same argument works for distributivity: it works in Z, has done for years, and if two numbers in Z are equal they stay equal when I do the reduction mod four.
So Z₄ is a ring.
Doing this for Z₅ there is a difference. The + is still a commutative binary operation as before. The multiplication table for the non-zero elements is:
⊗ 1 2 3 4
1 1 2 3 4
2 2 4 1 3
3 3 1 4 2
4 4 3 2 1
A look at the symmetry of the table tells us it is commutative and has an identity, 1, and checking that the number 1 occurs in every row precisely once shows we have inverses, so the table must be an abelian group. Hence Z₅ is a field.
It is worth noticing that a table like the above could have the columns permuted without changing the information in any way. Likewise it could have the rows permuted. We can still read off a ⊗ b from such a modified table. Ordering them by increasing digits is convenient but inessential.
Another point worth making is that the use of numbers hasn't really got a lot to do with it; changing the names to anything whatever, as long as we had distinct symbols and did the replacement consistently, would not change anything very important, the underlying structure would be the same.
So I shall set up a translation system: I shall rewrite 1 to 0 in the above table, I shall rewrite 4 to 2, I shall rewrite 2 to 1 and I shall leave 3 unchanged. I shall also rewrite ⊗ to ⊕ just for the hell of it.
If I do this I get the table
⊕ 0 1 3 2
0 0 1 3 2
1 1 2 0 3
3 3 0 2 1
2 2 3 1 0
Now I swap the last two columns and then the last two rows and I get
⊕ 0 1 2 3
0 0 1 2 3
1 1 2 3 0
2 2 3 0 1
3 3 0 1 2
This is just old Z₄, so we have not only discovered that the non-zero elements of Z₅ are a group, we know that, up to a change of name, it is in fact Z₄.
The business of changing names to protect the guilty is very common and we shall see a lot of it in future. Objects that have the same structure but with the names changed are said to be isomorphic, which is Greek for having the same shape. This is a matter of much practical importance and I shall return to it later.
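The renaming argument is also easy to check by machine. In the sketch below (my own illustration; the dictionary encodes the translation 1→0, 2→1, 3→3, 4→2 used above), the relabelling turns multiplication mod 5 on {1, 2, 3, 4} into addition mod 4, which is exactly what "isomorphic" means here.

    rename = {1: 0, 2: 1, 3: 3, 4: 2}   # the translation used in the text

    ok = True
    for x in [1, 2, 3, 4]:
        for y in [1, 2, 3, 4]:
            left = rename[(x * y) % 5]                 # multiply in Z_5*, then rename
            right = (rename[x] + rename[y]) % 4        # rename first, then add in Z_4
            ok = ok and (left == right)
    print(ok)   # True: (Z_5*, x) and (Z_4, +) have the same structure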
Polynomial Rings
We regard polynomials as simply formal expressions with an indeterminate x, but of course each polynomial determines a function
p : R → R, x ↦ a₀ + a₁x + ⋯ + aₙxⁿ
We may define higher order polynomials with more than one indeterminate, for example, p(x, y) = a₀ + a₁x + b₁y + a₂x² + b₂y² + c₂xy + ⋯ + aₙxⁿ, and these define functions p : Rᵏ → R for k > 1 being the number of indeterminates.
Any set of polynomials with some fixed set of indeterminates gives a ring.
Exercise 3.5.7. Verify the above claims.
I said we could do some of the same sort of thing for rings as we can for the integers. We can try dividing one polynomial into another to see if we get a polynomial out or if there is a remainder. Dividing x − 2 into x² + 2x + 3 gives the quotient x + 4 and the remainder 11:
x² + 2x + 3 = (x − 2)(x + 4) + 11
If there is a remainder it must be a polynomial having degree less than that of the polynomial we are dividing by. If the remainder is zero, we say the first polynomial divides the second.
Theorem 3.5.1. If the real polynomial function p(x) has a zero at a ∈ R, that is, if p(a) = 0, then (x − a) divides the polynomial p, and if (x − a) divides p then p(a) = 0.
Proof: Proving the first half first, we can divide (x − a) into p to get a polynomial q plus a remainder which must be a real number. In other words
p(x) = (x − a)q(x) + r
If we evaluate both at a then the result is 0 = 0 + r so the remainder r must be zero.
Conversely, if p = (x − a)q then since the right hand side is zero when x = a so is the left hand side. ∎
Later we shall make use of this result in showing that the composite of rotations in R³ is a rotation.
Exercise 3.5.8. Would it make sense to divide one matrix into another and have a remainder? If so give an example; if not, why not?
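Polynomial division with remainder can also be done with the sympy library; here is a sketch of my own (sp.div is that library's division-with-remainder routine), reproducing the example above.

    import sympy as sp

    x = sp.symbols("x")
    p = x**2 + 2*x + 3
    d = x - 2

    q, r = sp.div(p, d, x)          # quotient and remainder
    print(q, r)                     # x + 4   11
    print(sp.expand(d*q + r - p))   # 0, confirming p = (x - 2)(x + 4) + 11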
3.5.2 Fields
A field is a special sort of ring, one where the non-zero elements of the ring form an abelian group under the multiplication. If they form a non-abelian group, the object is called a skew-field. The only fields you know so far are Q and R.
Exercise 3.5.9. Write down the addition and multiplication tables for Z₂ (which has only the numbers 0, 1 in it) and verify it is a field. Now repeat for Z₃, Z₅ and Z₇. Is Z₄ a field? Verify it is a ring.
This concludes the survey of the algebra zoo. There are more exotic things in it, but this has introduced you to the more important of the beasts.
In order to remember the names and what they mean, it is easiest to remember the more common concrete examples, so if you read about rings, think integers, and if you read about fields think R. A few more examples would be even better, because then there is a good chance you will be able to remember the rules which define them.
All of these beasts are important in applications (some in such areas as coding theory) or we wouldn't trouble to tell you about them. Fields and finite groups are used in such things as proving the insolubility of the quintic by extracting roots. (You know the formula for solving the quadratic equation. There is a similar formula for solving cubics and quartics, but there isn't a similar formula for higher degree polynomials. Knowing this saves a lot of time that might be spent looking for one.)
3.6 Arithmetic
This section has been lifted wholesale from Alice Niemeyer's notes from last year and deals with the properties of a rather well known ring, the integers, Z.
We begin by defining a natural relation on the integers.
Definition 3.6.1 (The Divides Relation).
An integer x is said to divide another integer y if there exists an integer n such that y = nx. We write x|y to mean x divides y.
Lemma 3.6.1 (Division Algorithm). If m is a positive integer and n is an integer, then there exist unique integers q and r such that
n = mq + r and 0 ≤ r < m.
The integer r is called the remainder.
Example 3.6.1.
If we take m = 4 and n = 23, then q = 5 and r = 3 are the only integers which satisfy the Division Algorithm.
Warning: r must be between 0 and m.
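For a positive modulus, Python's divmod produces exactly the q and r of the Division Algorithm, including the warning above that 0 ≤ r < m even when n is negative. A small sketch of my own:

    def division_algorithm(n, m):
        # returns (q, r) with n = m*q + r and 0 <= r < m, for m > 0
        q, r = divmod(n, m)
        return q, r

    print(division_algorithm(23, 4))    # (5, 3):   23 = 4*5 + 3
    print(division_algorithm(-23, 4))   # (-6, 1): -23 = 4*(-6) + 1, and r is still in [0, 4)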
Definition 3.6.2 (Greatest Common Divisor).
The greatest common divisor of two positive integers x and y is defined by
gcd(x, y) = max{i ∈ N : i|x and i|y}.
Example 3.6.2.
The divisors of 8 are 1, 2, 4, 8 and the divisors of 20 are 1, 2, 4, 5, 10, 20. So gcd(8, 20) = 4.
The gcd of two distinct primes is always 1.
Theorem 3.6.2. The greatest common divisor of two non-zero integers a and b can be expressed as a linear combination of a and b:
gcd(a, b) = am + bn for some integers m and n.
Definition 3.6.3. The least common multiple of two positive integers x and y is defined by
LCM(x, y) = min{i ∈ N : x|i and y|i}.
Lemma 3.6.3. For all non-zero integers x and y,
LCM(x, y) = xy / gcd(x, y).
Definition 3.6.4. Two positive integers are relatively prime if their gcd is equal to 1.
Example 3.6.3.
Any two distinct primes are relatively prime.
15 and 8 are relatively prime.
3.6.1 The Euclidean Algorithm in Z
Let a₁, a₂ ∈ Z with 0 ≤ a₂ ≤ a₁. Then we can compute the greatest common divisor d = gcd(a₁, a₂) of a₁ and a₂ by the following algorithm and find integers x, y ∈ Z such that
d = xa₁ + ya₂.
Repeatedly employ the Division Algorithm in Z to obtain:
a₁ = q₁a₂ + a₃ with 0 ≤ a₃ < a₂
a₂ = q₂a₃ + a₄ with 0 ≤ a₄ < a₃
...
aᵢ = qᵢaᵢ₊₁ + aᵢ₊₂ with 0 ≤ aᵢ₊₂ < aᵢ₊₁
...
As aᵢ < aᵢ₋₁ < ⋯ < a₂ ≤ a₁ there has to be an n ∈ Z⁺ such that aₙ₊₂ = 0. In this case we say the algorithm has stopped. Then the last equation with a non-zero remainder is:
aₙ₋₁ = qₙ₋₁aₙ + aₙ₊₁ with 0 < aₙ₊₁ < aₙ.
Now we rewrite all these equations to solve them for their remainder and we get
a₁ − q₁a₂ = a₃
a₂ − q₂a₃ = a₄
...
aₙ₋₂ − qₙ₋₂aₙ₋₁ = aₙ
aₙ₋₁ − qₙ₋₁aₙ = aₙ₊₁
Starting with the last equation
aₙ₋₁ − qₙ₋₁aₙ = aₙ₊₁
replace the occurrence of aᵢ for the largest i on the left hand side by the equation solving for aᵢ. That is we begin by inserting the second last equation:
aₙ₋₁ − qₙ₋₁(aₙ₋₂ − qₙ₋₂aₙ₋₁) = aₙ₊₁,
that gives
(1 + qₙ₋₁qₙ₋₂)aₙ₋₁ − qₙ₋₁aₙ₋₂ = aₙ₊₁.
Now the third last equation solves for aₙ₋₁ and we can insert this. If we repeatedly do this we find integers x, y ∈ Z such that
xa₁ + ya₂ = aₙ₊₁.
We will see later that aₙ₊₁ = gcd(a₁, a₂). Let us first look at an example though.
Example 3.6.4. Let a₁ = 558 and a₂ = 423.
558 = 1 · 423 + 135
423 = 3 · 135 + 18
135 = 7 · 18 + 9
18 = 2 · 9 + 0
Then the last non-zero remainder is 9 so it will turn out that 9 = gcd(a₁, a₂).
Now we do the reverse substitution and get
9 = 135 − 7 · 18
  = 135 − 7(423 − 3 · 135)
  = 22 · 135 − 7 · 423
  = 22 · (558 − 423) − 7 · 423
  = 22 · 558 − 29 · 423
Example 3.6.5. We use the Euclidean Algorithm to find the gcd of 56 and 1450.
1450 = 25 · 56 + 50
56 = 1 · 50 + 6
50 = 8 · 6 + 2
6 = 3 · 2 + 0
Therefore gcd(56, 1450) = 2.
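The forward divisions together with the reverse substitution make up what is usually called the extended Euclidean algorithm. A compact sketch of my own (not from the notes) that reproduces the numbers of Example 3.6.4:

    def extended_gcd(a, b):
        # returns (d, x, y) with d = gcd(a, b) and d = x*a + y*b
        if b == 0:
            return a, 1, 0
        d, x, y = extended_gcd(b, a % b)
        return d, y, x - (a // b) * y

    print(extended_gcd(558, 423))   # (9, 22, -29): 9 = 22*558 - 29*423
    print(extended_gcd(1450, 56))   # the gcd is 2, as in Example 3.6.5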
Theorem 3.6.4. Let 0 = aₙ₊₂ < aₙ₊₁ < aₙ < ⋯ < a₂ ≤ a₁ be the integers computed by the Euclidean Algorithm for a₁, a₂. Then
aₙ₊₁ = gcd(a₁, a₂).
Proof:
Let d = gcd(a₁, a₂). Then d divides a₁ and a₂. As a₃ = a₁ − q₁a₂ it follows that d divides a₃. So d divides a₂ and a₃ and therefore, by the next equation, d divides a₄. Repeatedly applying this argument we see that d divides aₙ₊₁.
On the other hand the Euclidean Algorithm stopped when aₙ₊₂ = 0, which means that aₙ = qₙaₙ₊₁ and thus aₙ₊₁ divides aₙ. As aₙ₋₁ = qₙ₋₁aₙ + aₙ₊₁ it follows that aₙ₊₁ divides aₙ₋₁. Repeatedly applying this argument we see that aₙ₊₁ divides a₁ and a₂. Thus aₙ₊₁ is a common divisor of a₁ and a₂. As d is the greatest common divisor it follows that aₙ₊₁ ≤ d; but we also showed that d divides aₙ₊₁, so d ≤ aₙ₊₁. Hence d = aₙ₊₁. ∎
3.6.2 Properties of the gcd
1. If a, b ∈ Z and gcd(a, b) = 1 and a divides bc then a divides c.
Proof:
Let x, y ∈ Z be such that xa + yb = 1; then xac + ybc = c. Suppose that bc = az. Then xac + yaz = c and hence a(xc + yz) = c and therefore a divides c.
2. For a, b ∈ Z and m ∈ Z, m ≥ 0, we have
m gcd(a, b) = gcd(am, bm).
Proof:
Let d = gcd(a, b); then d | a and d | b and hence md | ma and md | mb, showing that md is a common divisor.
Now there are α, β ∈ Z such that d = αa + βb. Let c be a common divisor of am and bm, that is c | am and c | bm; then c | αam and c | βbm and hence c | m(αa + βb) = md, and therefore md is the greatest common divisor.
Definition 3.6.5. Let a, b ∈ Z. Then c is called a common multiple of a and b if a and b both divide c. Further, ℓ ∈ Z is called the least common multiple of a and b, denoted lcm(a, b), if ℓ is a common multiple and ℓ divides every other common multiple.
Theorem 3.6.3. For a, b ∈ Z we have
gcd(a, b) · lcm(a, b) = ab.
Proof:
We prove this result by distinguishing whether gcd(a, b) is 1 or not.
Case 1: gcd(a, b) = 1. Then there is an x ∈ Z such that lcm(a, b) = ax. Since b divides lcm(a, b) = ax and gcd(a, b) = 1, b divides x by Property 1, say x = bx′. Thus lcm(a, b) = abx′ and therefore ab | lcm(a, b). On the other hand lcm(a, b) | ab and thus ab is the lcm(a, b).
Case 2: If d = gcd(a, b) then there are x, y ∈ Z such that a = xd and b = yd. Then gcd(x, y) = 1 (because by Property 2 it follows that d gcd(x, y) = gcd(dx, dy) = gcd(a, b) = d and so gcd(x, y) = 1). Now by Case 1
gcd(x, y) · lcm(x, y) = xy.
Multiplying this equation by d² we get
d gcd(x, y) · d lcm(x, y) = dx · dy
which means gcd(a, b) · lcm(a, b) = ab.
Note that this gives us a way to compute least common multiples in Euclidean Domains as we already know how to compute greatest common divisors.
3.6.4 Primes
A positive integer p is called a prime if whenever p divides ab for a, b ∈ Z then p divides a or p divides b.
Lemma. Let p be a prime. Then the only divisors of p are ±1, ±p.
Suppose a is a divisor of p. Then there is an integer b such that p = ab. Hence p divides ab (as p · 1 = ab) and by the definition of prime, either p divides a or p divides b. Hence ab divides a or ab divides b. But then either there is an integer c such that abc = a, and so bc = 1, and so b = ±1; or there is an integer d such that abd = b, and so ad = 1, and so a = ±1.
Theorem [Unique Factorisation Theorem]. Every positive integer n is either 1 or can be expressed as a product of prime integers, and this factorisation is unique except for the order of the factors.
Proof:
We first show that the following holds: Every positive integer n is either 1 or can be expressed as a product of prime integers.
We prove this result by induction on n. Note first that if n = 1 then the statement holds.
Now suppose that the statement has been proved for all integers n with n < k for some positive integer k. Then we can show that it also holds for k. If k is a prime, then the statement holds.
If k is not a prime, then k = ab with 1 < a, b < k. Hence the theorem holds for a and for b. Thus there are primes p₁, ..., pᵣ and q₁, ..., qₛ such that a = p₁ ⋯ pᵣ and b = q₁ ⋯ qₛ. Then k = ab = p₁ ⋯ pᵣ q₁ ⋯ qₛ and so k can be written as a product of primes.
Now we show that the expression is unique. Suppose n > 1 and suppose that n = p₁ ⋯ pᵣ = q₁ ⋯ qₛ. Then p₁ divides q₁ ⋯ qₛ and so p₁ divides qⱼ for some j. As qⱼ is also prime, it follows that p₁ = qⱼ and we may assume without loss of generality that p₁ = q₁. Thus cancellation gives p₂ ⋯ pᵣ = q₂ ⋯ qₛ.
We can repeat this argument until on one side there are no primes left. Then we have to end up with 1 = 1, since 1 = qₜ ⋯ qₛ is not possible. Thus we have found that p₁ = q₁ and p₂ = q₂ etc.
If we write a positive integer as a product of primes in increasing order we get
n = p₁^(m₁) ⋯ pᵣ^(mᵣ) with p₁ < p₂ < ... < pᵣ
and we call this the standard form for n.
3.6.5 Examples
a = 31752 = 2³ · 3⁴ · 7².
b = 126000 = 2⁴ · 3² · 5³ · 7.
Note that the standard forms of integers can be used to find their GCDs.
gcd(a, b) = 2³ · 3² · 7 = 504.
3.6.6 Linear Congruences
Let n ∈ Z⁺ and a, b ∈ Z; then we write
a ≡ b (mod n) if n | (a − b).
Now n | (a − b) if and only if a − b ∈ nZ, where nZ is the set {nz : z ∈ Z}.
3.6.7 Properties of Congruences
1. If
a ≡ b (mod n) and a′ ≡ b′ (mod n)
then
a + a′ ≡ b + b′ (mod n) and aa′ ≡ bb′ (mod n).
2. If d > 0 and d | n then
a ≡ b (mod n) ⇒ a ≡ b (mod d).
If n divides a − b and d divides n then d divides a − b.
3. If
a ≡ b (mod n₁), a ≡ b (mod n₂), ..., a ≡ b (mod nₖ)
then
a ≡ b (mod lcm(n₁, ..., nₖ)).
This is easy to see as lcm(n₁, ..., nₖ) divides a − b, since a − b is a common multiple of n₁, ..., nₖ.
3.6.8 Examples
1. Find the last decimal digit of 7¹⁹.
Solution:
7¹⁹ ≡ (49)⁹ · 7 (mod 10)
    ≡ (−1)⁹ · 7 (mod 10)
    ≡ −7 (mod 10)
    ≡ 3 (mod 10)
So the last digit is 3.
2. Fermat thought that F₅ = 2^(2⁵) + 1 = 2³² + 1 was prime but Euler showed that 641 divides F₅:
Proof: We must show that 2³² ≡ −1 (mod 641). Now
2⁶ ≡ 64 (mod 641)
10 · 2⁶ ≡ 640 ≡ −1 (mod 641)
5 · 2⁷ ≡ −1 (mod 641)
5⁴ · 2²⁸ ≡ (−1)⁴ (mod 641)
625 · 2²⁸ ≡ 1 (mod 641)
(−16) · 2²⁸ ≡ 1 (mod 641)
2³² ≡ −1 (mod 641)
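Both computations are easy to confirm with Python's built-in modular exponentiation; this little check is my own addition (pow(a, k, n) computes a^k mod n):

    print(pow(7, 19, 10))       # 3, the last decimal digit of 7**19
    print(pow(2, 32, 641))      # 640, i.e. 2**32 = -1 (mod 641)
    print((2**32 + 1) % 641)    # 0, so 641 really does divide F_5 = 2**32 + 1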
3. Let n = p₁^(α₁) ⋯ pₖ^(αₖ) be the decomposition of n into primes. Then
a ≡ b (mod n) ⇔ a ≡ b (mod pᵢ^(αᵢ)) for each i = 1, ..., k.
This is clear because n is the lcm of the moduli. This is often easier than working modulo n.
3.6.9 Division in Congruences
In general ax ≡ ay (mod n) with a ≢ 0 (mod n) does not imply x ≡ y (mod n). As Zₙ is not always an integral domain, the cancellation law (ax = ay with a ≠ 0 implies x = y) might not hold. For example, 2 · 8 ≡ 2 · 2 (mod 12) but 8 ≢ 2 (mod 12).
We would like to work out when we can divide with congruences.
Theorem 3.6.10.
1. ax ≡ ay (mod n) ⇒ x ≡ y (mod n/gcd(a, n))
2. If ax ≡ ay (mod n) and gcd(a, n) = 1, then x ≡ y (mod n).
Proof:
1.
ax ≡ ay (mod n)
⇒ n | a(x − y)
⇒ n/gcd(n, a) divides (a/gcd(n, a))(x − y)
⇒ n/gcd(n, a) divides (x − y)
⇒ x ≡ y (mod n/gcd(n, a))
Note that we use gcd(n/gcd(n, a), a/gcd(n, a)) = 1.
2. This is just a special case of (1).
3.6.11 Examples:
2 · 8 ≡ 2 · 2 (mod 12) and hence 8 ≡ 2 (mod 12/gcd(2, 12)), which shows that 8 ≡ 2 (mod 6).
3x ≡ 12y (mod 49) implies x ≡ 4y (mod 49).
If p is a prime and a ≢ 0 (mod p) then gcd(a, p) = 1 and so ax ≡ ay (mod p) implies x ≡ y (mod p).
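A tiny numerical check of Theorem 3.6.10 on the first example (a sketch of my own):

    from math import gcd

    a, x, y, n = 2, 8, 2, 12
    print((a * x - a * y) % n == 0)          # True: 2*8 = 2*2 (mod 12)
    print((x - y) % n == 0)                  # False: cancelling a is NOT allowed mod 12
    print((x - y) % (n // gcd(a, n)) == 0)   # True: but 8 = 2 (mod 6), as the theorem says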
Chapter 4
Complex Numbers
4.1 A Little History Lesson
There is a certain mysticism about complex numbers, much of which comes from meeting them relatively late in life. The student has it explained that there is an imaginary number, i, which is the square root of −1, and this is certainly imaginary because it is obvious to any fool that −1 doesn't have a square root. When much younger, you might have been troubled by 4 − 7 = −3 which is just as obviously impossible and meaningless because you can't take seven from four. Since you met this when you were young and gullible and the teacher assured you that −3 is a perfectly respectable number, just not one which is used for counting apples, you went along with it and bought the integers. Now, when your brains have nearly ossified, you are asked to believe in square roots of negative numbers and those poor little brains resist the new idea with even more determination than when you first met negative numbers.
The fact is all numbers are imaginary. The question is, can we find a use for them? If so, we next work out what the rules are for messing around with them. And that's all there is to it. As long as we can devise some consistent rules we are in business. Naturally, when we do it, all the practical people scream "you can't do that! It makes no sense!". After a few years they get used to it and take them for granted. I bet the first bloke who tried to sell arabic notation to a banker had the hell of a time. Now try selling them the Latin numerals as a sensible way to keep bank balances. Up to about 1400 everybody did, now they'd laugh at you.
It's a silly old world and no mistake.
We got from the natural numbers (used for counting apples and coins and sheep and daughters, and other sorts of negotiable possessions) to the integers by finding a use for negative numbers. It certainly meant that we could now subtract any two natural numbers in any order, so made life simpler in some respects. You could also subtract any two integers, so the problem of subtraction was solved. The extra numbers could be used for various things, essentially keeping track of debts and having a sort of direction to counting. They didn't seem to do any harm, so after regarding them with grave suspicion and distrust for a generation or so, people gradually got used to them.
We got from the integers to the rational numbers by finding that although you could divide four by two you couldn't divide two by four. So mathematicians invented fractions, around five thousand years ago. While schoolies all around were saying "Five into three does not go", some bright spark said to himself "But what if it did?" Then you could divide by pretty much anything, except zero. Dividing by zero wasn't the sort of thing practical folk felt safe with anyway, so that was OK. And people found you could use these new-fangled numbers for measuring things that didn't, like sheep and daughters, naturally come in lumps. Then the Greeks found (oh horror!) that the diagonal of the unit square hasn't got a length, not in Q anyway. The Pythagorean Society, which was the first and last religious cult based on Mathematics, threatened to send out the death squads to deal with anyone spreading this around. The saner mathematicians just invented the real numbers (decimals) which of course, to a Pythagorean, were evil and wicked and heretical and the product of the imagination, and a diseased one at that.
So there is a long history of people being appalled and horrified by new numbers and feeling that anyone spouting that sort of nonsense should be kicked out if not stoned to death. Square root of minus one indeed! Whatever rubbish will they come up with next? Telling us the world isn't flat, I shouldn't wonder.
Meanwhile back in the mathematical world where we can make up whatever we want, let's approach it from a different angle.
The real line is just that, a line. We can do arithmetic with the points on the line, adding and subtracting and multiplying and dividing any pair of points so long as we don't try to divide by zero.
Suppose you wanted to do the same on something else. Say a circle. Could you do that? Could you make up some rules for adding and multiplying points on a circle? Well you could certainly regard the points as angles from some zero line and add them up. If you did it would be rather like the clock arithmetic of the last chapter except you would be working mod 2π instead of mod 12. You could also do multiplication the same way if you wanted.
The mathematician yawns. Boooooring! Nothing really new here.
Could you take R² and make up a rule for adding and multiplying the points in the plane? Well, you can certainly add and subtract them because this is a well known vector space and we can always add vectors. Confusing the vectors with little arrows and the points at the end of them with the arrows themselves is fairly harmless.
So we can do some arithmetic on R².
What about multiplying points in R²? If we asked you to invent a way to do this you would probably just multiply the corresponding components so
    [ 3 ]   [ 2 ]   [ 6 ]
    [ 4 ] ⊙ [ 1 ] = [ 4 ]
I have used ⊙ for my very own new rule for multiplying points in R². This is not very exciting because we are not really doing anything new, we just do to each component what we do to R. Booooring! I shall look for a more interesting rule for ⊙.
To get anything new out we would have to jumble the components up in some way.
Howsabout making
    [ 0 ]   [ 0 ]   [ −1 ]
    [ 1 ] ⊙ [ 1 ] = [  0 ]                                (4.1.1)
This would mean that we could identify the X-axis with the usual real number line and −1 would have a square root! This is beginning to look a bit less boring. We have got something new for sure; the question is can we make up some consistent rules for doing all possible multiplications of points in R².
If we are going to regard the X-axis as the original real line, then we must have the rule
    [ a ]   [ c ]   [ ac ]
    [ 0 ] ⊙ [ 0 ] = [  0 ]                                (4.1.2)
for any a and c.
The question now is, how much latitude have we got for making up the general rule for multiplying any two points? After all, making up really bizarre systems is too easy; we would like to have a sort of minimal extension of the rules for R, not something simply dotty.
One reasonable restriction to make is that
    [ 1 ]   [ c ]   [ c ]
    [ 0 ] ⊙ [ d ] = [ d ]                                 (4.1.3)
which makes the real number 1 sitting in the plane a multiplicative identity for the whole plane, not just for R. Another would be that we really would like to have inverses for the non-zero points. And it would be nice if the system were distributive, that is a ⊙ (b + c) = a ⊙ b + a ⊙ c for every three points a, b, c ∈ R².
I strongly recommend you try to make up some rules which obey these constraints and see how little choice you have left. This will give you some feel for what makes mathematicians do mathematics and you will find it rather fun. Probably. If you write down the rules implied by equations 4.1.1, 4.1.2 and 4.1.3 and try to extend them you will find there is basically only one way to do it. You will come out with the rule:
    [ a ]   [ c ]   [ ac − bd ]
    [ b ] ⊙ [ d ] = [ ad + bc ]                           (4.1.4)
This is not immediately obvious which is perhaps why it took until the seventeenth century before these numbers became known, and took about another century before most western mathematicians knew about them. And three centuries later, you know about them. They are called the Complex Numbers for no very good reason. It would have been more sensible to call them the planar numbers but it's too late to be sensible now.
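Rule 4.1.4 is easy to play with on a machine. The sketch below is my own (the name cmul and the sample points are made up); it implements the rule on pairs, checks that the point (0, 1) really squares to −1, and compares with Python's built-in complex multiplication.

    def cmul(p, q):
        # the rule (a, b) * (c, d) = (ac - bd, ad + bc) of equation 4.1.4
        a, b = p
        c, d = q
        return (a * c - b * d, a * d + b * c)

    print(cmul((0, 1), (0, 1)))              # (-1, 0): the point on the Y-axis squares to -1
    print(cmul((3, 4), (2, 1)))              # (2, 11)
    print(complex(3, 4) * complex(2, 1))     # (2+11j), the same answer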
4.2 Traditional Notations
We can do something to make remembering the rule of equation 4.1.4 a bit easier. Instead of writing a point as a column matrix (which is the sensible way to write elements of R²) we write the two components horizontally and instead of a comma to separate them we write +i, so we get a + ib to mean the point with first coordinate a and second coordinate b, with the understanding that this is not just R² we are talking about but R² equipped with a new and spunky multiplication. We shall call this new object (R², +, ⊙). No we shan't, just kidding, we shall call it C.
The advantage of this notation is that we can now do the multiplication remembering only two things, first that everything is associative and distributive:
(a + ib) ⊙ (c + id) = ac + iad + ibc + i²bd
and second, if i² = −1 we get
(a + ib) ⊙ (c + id) = ac + iad + ibc + i²bd = ac − bd + i(ad + bc)
So if you simply carry on as if you are adding and multiplying using the usual rules and also that i² = −1 then you can do all the multiplication you want.
To summarise, we have the following rules for C:
1. C is the set R² together with rules for adding and multiplying the elements of R².
2. The elements of C are written (a + ib), with (0 + ib) being shortened to ib and (a + i0) being shortened to a.
3. The addition of elements of C is just as in R², so (a + ib) + (c + id) = ((a + c) + i(b + d)).
4. The multiplication follows from the rule i² = −1 and using the distributive law with wild abandon. It follows that multiplication distributes over addition in C.
We shall not use any special symbol such as ⊙ to denote multiplication in C because we don't in R. So if w and z are elements of C we just write wz for the result of multiplying them in C.
In addition we have a special map ¯ : C → C which is defined by a + ib ↦ a − ib. We refer to a − ib as the conjugate of a + ib, and write the conjugate of z as z̄. It is obvious that if you do it twice you get back to where you started. We have the following rules for conjugation:
1. ∀w, z ∈ C, the conjugate of w + z is w̄ + z̄
2. ∀w, z ∈ C, the conjugate of wz is w̄ z̄
3. ∀z ∈ C, z z̄ ∈ R
These are all easy to check by simply expanding w = u + iv and z = a + ib and doing the sums. The last one gives us (a + ib)(a − ib) = a² + b² (leaving out +i0) and this is called the square of the modulus of z. You will note that it is the same as the square of the distance of z from the origin. We write |z|² = z z̄. This depends on the definition:
Definition 4.2.1. The modulus of a complex number a + ib is the real number √(a² + b²) and is written |a + ib|.
It is obvious that this makes the modulus of a complex number its distance from the origin.
Can we divide in C? Yes, except by zero:
w/z = (w z̄)/(z z̄)
and the denominator is real and zero only when z is.
You can get used to doing sums with complex numbers very quickly by the time tested method of doing a lot of them. Make up your own if you can't find any. The rest of the section is good clean fun and you will enjoy trying the exercises. It is essential that you tackle them honestly.
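Python already knows these rules, so you can check your hand calculations. A short sketch of my own showing the conjugate, the modulus and the division recipe w/z = w z̄ / (z z̄):

    w = 1 + 2j
    z = 3 - 4j

    print(z.conjugate())                             # (3+4j)
    print(abs(z))                                    # 5.0, the modulus sqrt(3**2 + 4**2)
    print(w * z.conjugate() / (z * z.conjugate()))   # dividing via the conjugate recipe
    print(w / z)                                     # the same number, Python's own division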
Exercise 4.2.1. Draw the origin and a region around it in R² big enough to include the unit circle. Draw in half a dozen points and mark them and their coordinates. Now multiply each of them by i and draw in the results. What does the map ×i : C → C, z ↦ iz do?
Exercise 4.2.2. Show that if two complex numbers are on the unit circle, so is their product. In fact show that ∀w, z ∈ C, |wz| = |w| |z|.
Exercise 4.2.3. Find a complex number which has the property that multiplication by it rotates R² about the origin by π/4.
Exercise 4.2.4. Find a complex number which has the property that multiplication by it rotates R² about the origin by an angle θ.
Exercise 4.2.5. Find a complex number which has the property that multiplying by it scales the plane by a factor of two, so that every point is moved twice as far from the origin as it started, in a straight line.
Exercise 4.2.6. Find a complex number which has the property that multiplying by it (a) rotates the plane by an angle θ and then (b) scales the plane by an amount r for r ∈ R⁺.
Exercise 4.2.7. What conclusions do you draw from this sequence of exercises?
Exercise 4.2.8. Has i got a square root? A cube root?
Exercise 4.2.9. Take the set of points you used to investigate multiplication by i and square them all. This is sampling the map f : C → C, z ↦ z². Keep track of where each one goes. Give a description in English and pictures of what this map does to (a) the unit circle (b) the interior of the unit circle and (c) the exterior of the unit circle. Give arguments for believing that (d) every complex number with modulus less or equal to one has two square roots except for 0, and (e) every number with modulus greater than 1 has two square roots.
Exercise 4.2.10. Whaddya reckon on cube roots? Find three complex numbers which have cube 1.
4.3 The Fundamental Theorem of Algebra
If you experimented with the square function f : C → C, z ↦ z² then you will have discovered that it takes the unit circle and wraps it around itself twice; the squaring function doubles the angle the point subtends from the positive real axis. Points inside the unit disc get rotated and squashed towards the centre. If you imagine doing this with a sort of rubbery disc, then it is clear that every point in the disc has two square roots which are opposite each other and also in the disc. I say it is clear; well it can be established by algebra or by visualising a bit of weird stretching and shrinking of a disc shaped piece of plasticene. Some prefer one way, some the other. Some experiments show this is also true for cube roots and indeed for nth roots for any positive integer n. So we not only have ensured that −1 has a square root, we have that every negative number has a square root, and with a bit of an effort, every complex number has a square root too. In fact two of them unless we take zero.
Exercise 4.3.1. If you do not find arguments about discs made of chewing gum entirely convincing, you might like to write out the complex number z = a + ib in the polar form (r, θ) with r ∈ R⁺ (except when z = 0) and a = r cos(θ), b = r sin(θ). Note that if you square (r, θ) you get (r², 2θ). You should have discovered this in doing the last sequence of exercises. It follows that the square roots of (r, θ) are ±(√r, θ/2). Verify this carefully for about a dozen complex numbers by calculating the square root and then by squaring them.
This is a bit like inventing negative numbers so you could take seven from four and finding that you could take any one of the new numbers from any other: it is closed under subtraction. Well, C is closed under taking nth roots.
The last exercise should suggest how to prove this.
Exercise 4.3.2. Calculate the nth roots of 1. 1 is often called unity in this context.
Exercise 4.3.3. Show that the nth roots of unity form a group under complex multiplication. Show this group is isomorphic to the cyclic group Zₙ.
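If you want to see the roots of unity before proving anything about them, here is a sketch of my own using the cmath module: it computes the nth roots of unity from the polar form and checks that the product of any two of them is again one of them, which is most of Exercise 4.3.3.

    import cmath

    n = 6
    roots = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    print(all(abs(r**n - 1) < 1e-9 for r in roots))   # True: each root satisfies z**n = 1
    # closure: the product of the k-th and l-th roots is the (k+l mod n)-th root
    closed = all(abs(roots[k] * roots[l] - roots[(k + l) % n]) < 1e-9
                 for k in range(n) for l in range(n))
    print(closed)                                     # True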
It goes further than this: we can not only solve z² + w = 0 for any known complex number w, we could also solve az² + bz + c = 0 for any a, b, c ∈ C. We can use the usual formula for the quadratic to do so.
Exercise 4.3.4. Use the quadratic formula to solve
(1 − 2i)z² − (3 + 4i)z + (5 − 2i) = 0
Hence write the quadratic
z² − ((3 + 4i)/(1 − 2i))z + (5 − 2i)/(1 − 2i)
in the form (z − (a + ib))(z − (u + iv)) for a, b, u, v ∈ R.
Since it is clear that any quadratic equation with complex coefficients can be solved and factorised in linear terms, it is worth noting that if the original quadratic has only real coefficients, then there are two factors (z − u)(z − v) and either both u and v are real or u and v are both complex with u = v̄.
All this generalises to higher degree polynomials. It is well known that there is a formula for solving cubics and another for quartics, so the arguments that we have used for quadratics work there too. We note that we can always assume a polynomial has highest coefficient 1 by just dividing by the leading coefficient: the new polynomial has the same zeros as the old one. We have the following for any polynomial with leading coefficient 1:
Theorem 4.3.1. [Fundamental Theorem of Algebra] For any polynomial of degree n over C, with leading coefficient 1,
p(n) = a₀ + a₁z + a₂z² + ⋯ + aₙ₋₁zⁿ⁻¹ + zⁿ
there is a factorisation into linear terms:
p(n) = (z − u₁)(z − u₂) ⋯ (z − uₙ)
for some n complex numbers u₁, u₂, ..., uₙ.
Proof:
This is going to be a very dirty and sketchy argument which can be made rigorous but takes a lot of work and some algebraic topology to do it. Since you don't do Algebraic Topology until fourth year I can only sketch this part of the argument.
First I am going to put the complex plane inside the unit disc, D² = {z ∈ C : |z| ≤ 1}, by sending (r, θ) to ((2/π) arctan(r), θ). This contracts the plane R² so it fits in the interior of the unit disc; it is a differentiable map and is 1-1 and onto the interior and the inverse map is also differentiable; this makes it a diffeomorphism. Call it φ. I assume that we carry over the multiplication and addition using the usual rules in C. Now I take a polynomial of degree 1, z − w over C where w is some complex number. This defines a polynomial function f, z ↦ z − w on C and this in turn defines a map on the interior of the unit disc by my diffeomorphism, φ. I note that as I get closer and closer to the boundary, the map from the interior of the unit disc to the interior of the unit disc given by g = φ ∘ f ∘ φ⁻¹ gets closer and closer to the identity. I therefore extend g to be the identity on the boundary of the unit disc. Then this new map g : D² → D² is continuous and is the identity on the boundary which therefore gets left fixed.
Now it may be proved by algebraic topology that if you take a circular disc of putty or plasticene or chewing gum and stick it onto another such disc which is rigid, in such a way that you do not tear the first disc and keep the boundary fixed, then there has to be at least one point on the deformed disc of putty which is over the centre of the rigid disc. There may be more than one, but no point of the interior fails to be covered by some putty (or plasticene or chewing gum). This is intuitively plausible. The not-tearing condition means that the map specifying the position of the putty before and after is continuous.
It follows that there is a point which gets sent to zero. Well, we knew this of course, it is the point w.
Now let $p(n) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1}z^{n-1} + z^n$ be a polynomial of degree $n$ for any $n \in \mathbb{Z}^+$. This gives a polynomial map
$$f : \mathbb{C} \to \mathbb{C}, \qquad z \mapsto a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1}z^{n-1} + z^n.$$
This in turn gets transformed to a map $g : D^2 \to D^2$ by the same process as before. We have $g = \varphi \circ f \circ \varphi^{-1}$ on the interior of the disc, and we observe that on the boundary of the disc the lower order terms vanish and the map becomes just a wrapping of the bounding circle around itself $n$ times. And $g$ is continuous.

Now an extension of the topological result, which I shall have to ask you to take on faith at present (although it is awfully easy to believe it ought to hold), is that there is still a complete covering of every point of the interior of the disc by the mapping. So there is some $u \in \mathbb{C}$ which gets sent to zero. This means that $z - u$ is a factor of the polynomial $p(n) = z^n + a_{n-1}z^{n-1} + a_{n-2}z^{n-2} + \cdots + a_1 z + a_0$, which means it can be written as a product of $(z - u)$ and a polynomial of degree $n - 1$. But the same argument applies to this
one (alternatively there is an induction argument in here), so the polynomial must factorise as
$$p(n) = (z - u_1)(z - u_2)\cdots(z - u_n)$$
as required. $\square$
Exercise 4.3.5. If $u, v$ are two complex numbers and $u + v$ is real and $uv$ is real, show that either $u = \bar{v}$ or both $u$ and $v$ are real.

Exercise 4.3.6. Suppose $p(2)$ is a quadratic polynomial over $\mathbb{C}$ which has all its coefficients real numbers. Show that either (a) the roots of the polynomial are both real or (b) the roots occur as a pair which are conjugates.

Exercise 4.3.7. Suppose $p(3)$ is a cubic over $\mathbb{C}$ with real coefficients. By considering the related cubic function show that there has to be at least one real root of $p(3)$, and deduce that again the roots are either all real or one is real and the other two are conjugate.

Exercise 4.3.8. Take a quartic polynomial (of degree 4) with real coefficients. It factorises into two quadratics $(z^2 + az + b)(z^2 + cz + d)$. Show that you can choose particular complex numbers for $a, b, c, d$, none of them real, to recover a quartic with only real coefficients.

Solve the two quadratic equations you obtain to obtain four roots of the quartic.

Frame a plausible conjecture about all polynomials over $\mathbb{C}$ having real coefficients. Prove your conjecture. You might find it useful to show that if $p(z)$ has real coefficients, then $p(\bar{z}) = \overline{p(z)}$.
4.4 Why C is so cool

If you look at the abstract rules which $\mathbb{C}$ satisfies, we observe that we have a set of thingies and rules for adding them and multiplying them. The addition is associative and commutative, it has an identity (the complex number 0) and additive inverses, the negative of the complex number. We can also multiply these thingies, and multiplication too is associative and has an identity. So far we have got a ring. In addition, the ring is commutative and the non-zero elements have multiplicative inverses, so it is a field. There are not too many fields, so a new one is like meeting a long lost relative.

Since you have been indoctrinated into using certain rules, it is nice to know that you don't have to change when meeting something new. All the algebraic habits you have acquired so painfully in school can be kept intact. Almost. $\mathbb{R}$ and $\mathbb{Q}$ have an order on them, and $\mathbb{C}$ does not. And $\mathbb{C}$ has got the conjugation operation on it, which $\mathbb{R}$ and $\mathbb{Q}$ do not. But in the main, we can feel reasonably safe in doing algebra with complex numbers.

Being a field, we can use it for building vector spaces. Everything we did for vector spaces over $\mathbb{R}$, such as talking about bases and dimension and having matrices, we can do over $\mathbb{C}$.

Those of you who have heard of SU(2) from Quantum Mechanics will get to find out about it. It is a group, the elements of which are complex matrices.
Exercise 4.4.1. Put
$$U = \begin{bmatrix} w & x \\ y & z \end{bmatrix}, \qquad w, x, y, z \in \mathbb{C}.$$
Then $U^*$, the conjugate transpose, is the matrix
$$U^* = \begin{bmatrix} \bar{w} & \bar{y} \\ \bar{x} & \bar{z} \end{bmatrix}.$$
We say that $U$ is unitary iff $UU^*$ is the identity matrix $I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$. We say that $U$ is a special unitary matrix iff it is unitary and $\det(U) = 1$, that is iff $wz - xy = 1$.

1. Show that the set of special unitary $2 \times 2$ matrices forms a group. This is SU(2).

2. Show that $UU^* = I_2 \iff U^*U = I_2$.

3. Show that the above conditions on $U$ force $z = \bar{w}$ and $y = -\bar{x}$.

4. Show that we could have defined SU(2) as the set of $2 \times 2$ complex matrices
$$U = \begin{bmatrix} w & x \\ -\bar{x} & \bar{w} \end{bmatrix}$$
with $w\bar{w} + x\bar{x} = 1$. Use the implicit function theorem to deduce that SU(2) is a three dimensional manifold. (Hint: You might find it worth proving first that SO(2) is a one dimensional manifold, that is, a curve.)
The last part will take you off the syllabus a bit, but I put in these exercises to educate you as well as to entertain you.
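As a quick numerical sanity check of the claims in the exercise, here is a sketch using numpy; the particular values of $w$ and $x$ are my own arbitrary choices and are not part of the exercise.

```python
import numpy as np

# Any w, x in C with |w|^2 + |x|^2 = 1; these values are chosen arbitrarily.
w, x = (3 + 4j) / 13, 12j / 13            # 25/169 + 144/169 = 1

U = np.array([[w, x],
              [-np.conj(x), np.conj(w)]])

# U should be unitary with determinant 1, i.e. an element of SU(2).
assert np.allclose(U @ U.conj().T, np.eye(2))
assert np.isclose(np.linalg.det(U), 1)

# Closure: the product of two such matrices is again special unitary.
V = np.array([[0.8, 0.6j],
              [0.6j, 0.8]])
P = U @ V
assert np.allclose(P @ P.conj().T, np.eye(2))
assert np.isclose(np.linalg.det(P), 1)
print("U and UV are both in SU(2)")
```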
Chapter 5
Abstract Vector Spaces
5.1 Introduction
I expect you will have found the abstraction of Chapters two and three somewhat stressful. It looks rather like a game where the job is to build up systems according to rules. You will have wondered, if you are alive, why anyone bothers to do this.

The reason is that it saves a lot of time and energy. We shall shortly be looking at vector spaces over $\mathbb{C}$. We could do this by going through all the material you did on Linear Algebra in first year but with complex numbers instead of real numbers. Then if we had some other field, we could look at vector spaces over that, and so on. Similarly, we could look at some of the few hundred interesting groups, all one at a time. This would take a large slice out of your lives, and you would discover that quite a lot of things looked rather similar. You would notice that the theorems tended to get copied out all over again, with only the names changed. This would be very wasteful, and you would probably complain to each other that you have done it all before.

The way out is to deal with the abstraction. What we do is to work out what properties of the actual vector space the theorems depend on. If another vector space shares the same properties, the theorems will hold for that one as well. This means we focus on the properties we care about, so we make a list of properties and call anything that has them an abstract vector space. We get these properties by looking at concrete cases, abstracting the properties, and throwing everything else away. This is scary the first time you meet it but you get used to it.

We have already done this with groups before you really got to terms with
individual cases; shortly we shall look hard at some particularly important groups. With vector spaces we are doing it the other way around: you already know something about $\mathbb{R}^n$; up to a choice of coordinate axes you live in $\mathbb{R}^3$. Now we are going to do the abstraction: we shall take out the crucial properties all the arguments depend upon, and this way we can deal with a squillion things all at once. Hang onto your hats.
5.2 Vector Spaces
Definition 5.2.1. Let $F$ be a field (for example $\mathbb{R}$ or $\mathbb{C}$). A set $V$ of objects is called a vector space over $F$ if for all $u, v, w \in V$ and $k, \ell \in F$ the following hold:

1. $u + v \in V$ ($V$ is closed under addition)
2. $u + v = v + u$ (addition is commutative)
3. $u + (v + w) = (u + v) + w$ (addition is associative)
4. there is $0 \in V$ such that $u + 0 = u$; the element $0$ is called the zero vector
5. for each $u \in V$ there is a $-u \in V$ such that $u + (-u) = 0$; the element $-u$ is called the additive inverse
6. $ku \in V$
7. $k(u + v) = ku + kv$
8. $(k + \ell)u = ku + \ell u$
9. $k(\ell u) = (k\ell)u$
10. $1u = u$.

The above list is known in some quarters as the ten commandments.

The elements of $V$ are called vectors and the elements of $F$ are called scalars.
Examples

1. The $n$-dimensional real Euclidean (vector) space which you have seen in first year is defined as the set of ordered $n$-tuples
$$\mathbb{R}^n = \left\{ (x_1, x_2, \ldots, x_n)^T : x_1, x_2, \ldots, x_n \in \mathbb{R} \right\},$$
with vector addition and scalar multiplication defined by:

(i) $x + y = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n)^T$,

(ii) $kx = (kx_1, kx_2, \ldots, kx_n)^T$, for $k \in \mathbb{R}$,

is a vector space.

Note that I write my elements of $\mathbb{R}^n$ as a column, and when for typographic reasons I don't, I put a little T for transpose to tell you to turn the row sideways.
Remarks 5.2.1.

(a) We refer to an $n$-tuple $(x_1, x_2, \ldots, x_n)^T$ as a point in $\mathbb{R}^n$ or as a vector.

(b) The zero vector is $0 = (0, 0, \ldots, 0)^T$, with $n$ zeros.

(c) The additive inverse of the vector $x$ is $-x = (-x_1, -x_2, \ldots, -x_n)^T$.

(d) $\mathbb{R}^n$ is the Cartesian product of $n$ copies of $\mathbb{R}$, which becomes a vector space over $\mathbb{R}$ through addition of vectors and scalar multiplication.
2. The set
$$\mathbb{C}^n = \left\{ (x_1, x_2, \ldots, x_n)^T \mid x_1, x_2, \ldots, x_n \in \mathbb{C} \right\},$$
with addition and scalar multiplication defined componentwise (same as for $\mathbb{R}^n$), is an $n$-dimensional complex vector space.

3. Let $F[a, b]$ denote the set of all real functions defined on the interval $[a, b]$, that is, $F[a, b] = \{ f \mid f : [a, b] \to \mathbb{R} \}$. If we define addition of two functions (as usual) pointwise and scalar multiplication (as usual) pointwise, that is, for $f, g \in F[a, b]$ and $k \in \mathbb{R}$ we have $(f + g)(x) = f(x) + g(x)$ and $(kf)(x) = kf(x)$, then $F[a, b]$ is a vector space.

Note that, if we let $a = -\infty$ and $b = \infty$ in the last case, then we have the vector space $F(\mathbb{R})$.

As an exercise, show that $F[a, b]$ satisfies the 10 defining properties of a vector space.
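To see example 3 concretely, here is a minimal sketch in plain Python (the helper names are my own) of the pointwise operations, with a spot check of a couple of the axioms at a few sample points.

```python
# Vectors here are functions [a, b] -> R; operations are defined pointwise.
def add(f, g):
    """Pointwise sum of two functions: (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def scale(k, f):
    """Pointwise scalar multiple: (k f)(x) = k * f(x)."""
    return lambda x: k * f(x)

zero = lambda x: 0.0          # the zero vector of F[a, b]

f = lambda x: x ** 2
g = lambda x: 3 * x + 1

# Spot-check some of the ten axioms at sample points of [a, b] = [0, 1].
for x in (0.0, 0.25, 0.5, 1.0):
    assert add(f, g)(x) == add(g, f)(x)          # commutativity of addition
    assert add(f, zero)(x) == f(x)               # zero vector behaves correctly
    assert scale(2, add(f, g))(x) == add(scale(2, f), scale(2, g))(x)  # axiom 7
print("pointwise axioms hold at the sample points")
```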
4. Let $M_{m,n}$ denote the set of $m \times n$ matrices with entries in $F$. You are familiar with matrix scalar multiplication and matrix addition. The set $M_{m,n}$ defined in this way is a vector space.

There are things which have some of the properties that look a bit like a vector space at first glance but which fail to satisfy all the conditions. For example, suppose we look at all possible mixtures of rosemary, sage, parsley and thyme, giving a cupful of $r$ grams of rosemary, $s$ of sage, $p$ of parsley and $t$ of thyme for any possible values of $r, s, p, t$. This looks like a vector space (the vector spice) but it isn't, because we can't have negative values and so a cupful doesn't have an additive inverse cup which can be added to a given cup to give the zero cup which has no spice in it at all. On the other hand it looks like a bit of a vector space of dimension four. One sixteenth of $\mathbb{R}^4$, in fact.
Theorem 5.2.1. Let $V$ be a vector space, $u \in V$ and $k$ a scalar. Then

(a) $0u = 0$

(b) $k0 = 0$

(c) $(-1)u = -u$

(d) $ku = 0$ if and only if $k = 0$ or $u = 0$.

We will prove parts (a) and (d) and leave the rest as exercises.

Proof (a).
$(1 + 0)u = 1u + 0u$ (axiom 8)
$(1 + 0)u = 1u$ (definition of 0 in $F$)
$1u = 1u + 0u$ (definition of $=$)
$u = u + 0u$ (axiom 10)
$-u + u = -u + (u + 0u)$ (property of $=$)
$0 = -u + (u + 0u)$ (definition of $-u$)
$0 = (-u + u) + 0u$ (associativity)
$0 = 0 + 0u$ (definition of $-u$)
$0 = 0u$ (property of 0)
$0u = 0$ (definition of $=$). $\square$
For part (b) I suggest you look at $k(0 + 0)$. For part (c) try adding $1u$ to both sides, but be careful you don't get the logic reversed!

Proof (d). Suppose $ku = 0$. If $k = 0$ there is nothing more to show; if $k \neq 0$, then
$$ku = 0 \implies u = \tfrac{1}{k}(ku) = \tfrac{1}{k}0 = 0,$$
by part (b) of the theorem.

Conversely, suppose $k = 0$ or $u = 0$; then from parts (a) and (b) of the theorem, $ku = 0$. $\square$
5.3 (Linear) Subspaces
Definition 5.3.1. Let $U$ be a non-empty subset of a vector space $V$ over the field $F$. Then $U$ is a linear subspace of $V$ if the following conditions hold.

(1) If $u, v \in U$, then $u + v \in U$ (we say that the subset $U$ is closed under addition).

(2) If $u \in U$ and $k \in F$, then $ku \in U$ (we say that the subset $U$ is closed under scalar multiplication).

Note that it is not true that $\mathbb{R}^2$ is a linear subspace of $\mathbb{R}^3$, although there are obvious ways of finding subspaces of $\mathbb{R}^3$ which look a lot like $\mathbb{R}^2$, for example the plane $z = 0$.

The term linear subspace will be shortened to just subspace from now on. There are in fact other sorts of subspace, but we shall not be discussing them in this chapter, so we shall save space and typing by ignoring them. The vector spice, which was in effect a subset of $\mathbb{R}^4$, is an example of something it would be legitimate to call a subspace of $\mathbb{R}^4$ in the general sense, but it is not a linear subspace.
Remarks

1. A subspace $U$ of a vector space $V$ over $F$ is itself a vector space over $F$. That is, with the induced operations of vector addition and scalar multiplication, the subspace $U$ is a vector space over the same field $F$.

Exercise 5.3.1. Show that I could have defined a linear subspace as a subset which is itself a vector space with the induced operations, and that this would have been equivalent to the definition given above.
2. It is clear from the definition that $0$ is always an element of a subspace $U$.

3. The sets $V$ and $\{0\}$ are always subspaces of $V$. These are called the trivial subspaces. (As an exercise prove that $V$ and $\{0\}$ are subspaces using the definition above.)

Theorem 5.3.1. Let $U$ be a subset of a vector space $V$ over $F$. Then $U$ is a subspace of $V$ if and only if the following conditions hold:

(a) $0 \in U$, and

(b) if $u, v \in U$ and $k, \ell \in F$, then $ku + \ell v \in U$.
Proof. Exercise.
Examples of subspaces

(a) The solution set, $U$ say, of the homogeneous system of equations in $n$ unknowns,
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= 0\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= 0\\
&\ \,\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= 0,
\end{aligned}$$
which we can write in matrix notation as $Ax = 0$, where $A$ is an $m \times n$ matrix and $x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$, is a subspace of $\mathbb{R}^n$.

To see this, we use the properties of matrix addition and scalar multiplication. For then, it is clear that if $k \in \mathbb{R}$ and $x \in U$, then $A(kx) = kAx = 0$. Also, if $x, y \in U$, then $A(x + y) = Ax + Ay = 0 + 0 = 0$.
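A quick numerical illustration of example (a), using numpy; the particular matrix below is my own arbitrary choice, any $A$ will do.

```python
import numpy as np

# An arbitrary 2 x 4 matrix; U is the solution set of A x = 0 in R^4.
A = np.array([[1., 2., 0., -1.],
              [0., 1., 1.,  3.]])

# Two particular solutions, found by hand.
x = np.array([2., -1., 1., 0.])   # row 1: 2 - 2 = 0, row 2: -1 + 1 = 0
y = np.array([7., -3., 0., 1.])   # row 1: 7 - 6 - 1 = 0, row 2: -3 + 3 = 0

assert np.allclose(A @ x, 0) and np.allclose(A @ y, 0)

# Closure under addition and scalar multiplication: still solutions.
assert np.allclose(A @ (x + y), 0)
assert np.allclose(A @ (5.0 * x), 0)
print("x + y and 5x are again solutions of Ax = 0")
```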
(b) Recall that $F(\mathbb{R})$ denotes the vector space of all functions from $\mathbb{R}$ into $\mathbb{R}$. Then the subset $C^0(\mathbb{R})$ of all continuous functions, with the induced operations of addition and scalar multiplication, is a subspace.

To see this you need to recall that the sum of two continuous functions is continuous and the scalar multiple of a continuous function is continuous. These two observations now prove that $C^0(\mathbb{R})$ is a subspace of $F(\mathbb{R})$.
(c) Recall that $M_{n,n}$ denotes the vector space of all $n \times n$ matrices. Also recall that a matrix $A \in M_{n,n}$ is symmetric if $A = A^T$. The set of all symmetric matrices is a subspace of $M_{n,n}$.

Exercise. Let $C^{(m)}(\mathbb{R})$ denote the set of functions which are $m$ times differentiable and where the $m$th derivative is continuous. Prove that $C^{(m)}(\mathbb{R})$ is a subspace of $C^0(\mathbb{R})$ (which in turn is a subspace of $F(\mathbb{R})$).
Definition 5.3.2. Let $V$ be a vector space and let $u_1, u_2, \ldots, u_r$ be in $V$. Then the vector
$$w = k_1 u_1 + k_2 u_2 + \cdots + k_r u_r$$
is called a linear combination of the vectors $u_1, u_2, \ldots, u_r$. (Observe that $w \in V$.)

Theorem 5.3.2. Let $V$ be a vector space over a field $F$ and $u_1, u_2, \ldots, u_r \in V$. Then the set of all linear combinations of $u_1, u_2, \ldots, u_r$ is a subspace of $V$.

Proof. Exercise.

Definition 5.3.3. The subspace consisting of all linear combinations of $u_1, u_2, \ldots, u_r$ is denoted
$$W = \operatorname{span}\{u_1, u_2, \ldots, u_r\}.$$
We then say that the vectors $u_1, u_2, \ldots, u_r$ span $W$, or that the vectors $u_1, u_2, \ldots, u_r$ are a spanning set for $W$.
5.4 Spanning Sets and Bases
The idea of a spanning set is one which I visualise as a set of extensible ladders.

We imagine that someone has two ladders, one stuck on the end of the other at an angle, possibly a right angle. They are aligned with a wall and there is a paintbrush on the end of the second ladder, as in figure 5.4.1.

I have shown the blue blob that you get with one choice of extensions, and another blue blob for a second (dotted) choice. It is pretty clear that you can reach any point on the wall with these two ladders, and put a blue blob on
Figure 5.4.1: A spanning set for the plane.
Figure 5.4.2: A redundant spanning set for the plane.
it, and also that you couldn't make do with only one ladder. A single ladder would give us blue blobs along a line only.

Figure 5.4.2 shows that with three extensible ladders, all at different angles, you can get to any point on the wall in an infinite number of different ways. In particular you can get to the origin in infinitely many ways.

The magic number two in this case gives a minimal set of extensible ladders which can reach every point, and is, of course, the dimension of the wall. If we wanted to reach every point in $\mathbb{R}^3$ we would obviously need three ladders, but not just any three. A badly chosen three would still allow us to paint blobs only on a wall or at least some plane. This would happen in the case when the three ladders were in the same plane. This is similar to the case where we have two ladders which are collinear: they could reach only along a single line.

To be able to span the largest possible space with three ladders we need to
have them linearly independent. This needs a careful definition which makes precise and general the idea that the ladders point in different directions. The usual definition is a bit opaque and needs some explanation, but here it is:

Definition 5.4.1. A set of vectors $\{u_1, u_2, \ldots, u_k\}$ in a vector space $V$ over a field $F$ is linearly independent iff for any $a_i \in F$,
$$a_1 u_1 + a_2 u_2 + \cdots + a_k u_k = 0 \implies a_1 = a_2 = \cdots = a_k = 0.$$
The following theorem explains a major consequence of this definition:

Theorem 5.4.1. A set of vectors is linearly independent precisely when there is only one way to express any point in the span as a linear combination of the vectors.

This is easy to see because if there were more than one way to get to any one point on the wall, there would be more than one way to get to the origin, which contradicts the definition of linear independence. For example, based on figure 5.4.2,
$$1\begin{bmatrix} 2 \\ 1 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} 1 \\ 6 \end{bmatrix} - \frac{1}{3}\begin{bmatrix} 5 \\ -3 \end{bmatrix} = 0.$$
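You can check such a dependence relation mechanically; here is a numpy sketch (the three vectors are those in the relation above, with the sign on the last entry as I have reconstructed it):

```python
import numpy as np

u1 = np.array([2., 1.])
u2 = np.array([1., 6.])
u3 = np.array([5., -3.])

# The stated dependence relation: 1*u1 - (1/3)*u2 - (1/3)*u3 = 0.
assert np.allclose(1.0 * u1 - (1/3) * u2 - (1/3) * u3, 0)

# Equivalently, the 2 x 3 matrix with these columns has rank 2 < 3,
# so the three vectors cannot be linearly independent in R^2.
A = np.column_stack([u1, u2, u3])
print("rank =", np.linalg.matrix_rank(A))   # prints 2
```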
Exercise 5.4.1. Prove the last theorem properly for any field.
This tells us that if a set of vectors is not linearly independent, that is the set is linearly dependent, then we can get not only to the origin but to any point in the span of the set in more than one way. Since any one vector determines a point, we can get to that point in more than one way: one is just using that vector alone, so it follows that at least one vector in the set can be expressed as a linear combination of the others. (It does not follow that every vector in the set can be expressed as a linear combination of the others. For example, the set of three vectors in $\mathbb{R}^2$
$$\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\}$$
is linearly dependent, since two times the first minus the second plus zero times the third is the zero vector. But we cannot express the last as a linear combination of the first two.)
This can be seen algebraically by getting to zero by a sum of scaled vectors not all of whose coefficients are zero: $a_1 u_1 + a_2 u_2 + \cdots + a_k u_k = 0$. Then if $a_j \neq 0$ we can write
$$a_j u_j = -a_1 u_1 - a_2 u_2 - \cdots - a_k u_k$$
(with the $a_j u_j$ term omitted from the right hand side), and dividing by the non-zero $a_j$ expresses $u_j$ as a linear combination of the rest.
Exercise 5.4.2. Show this works when $u_j = 0$, and hence deduce that any set of vectors containing the zero vector is linearly dependent.

It follows therefore that if we have a linearly dependent set of vectors, we can always throw away at least one and get a set which has the same span as before.
Given any set of vectors we can continue throwing vectors away until we have a minimal spanning set for the subspace spanned by the original set. This is called a basis for the subspace spanned by the set.

Definition 5.4.2. A basis for a vector space is a set of vectors which

1. is linearly independent
2. spans the space

It follows that every vector space has a basis, since we could start off with every vector in the space! The basis need not however be finite.

As well as starting off with a spanning set and chopping it down in size, we could go in from underneath and build up to a basis. First we choose one vector known to be in the space (which may or may not be a subspace of a bigger space). If this does not span the space, we choose another vector also in the space but not in the span of the first. This gives us two vectors which must be linearly independent since neither can be expressed as a linear combination of the other. If this still does not span the space, we can choose a third not in the span of the first two, and keep on going until we have a spanning set for the whole space.

It is an important theorem that if you start off to do this and I start off to do the same thing, we will not usually get the same vectors, but there is always the same number of vectors we wind up with. In general, the set of vectors you get and the set I get may not be finite, but there is always a bijection between them.

The cardinality of the set, in other words, is a property of the space, and is called its dimension.
I give the finite dimensional version of the theorem:

Theorem 5.4.2. If $\{u_1, u_2, \ldots, u_k\}$ and $\{v_1, v_2, \ldots, v_m\}$ are both linearly independent sets of vectors and if the span of the two sets is the same, then $k = m$.

Proof: See text book.

This justifies the following definition:
Definition 5.4.3. The dimension of a vector space $V$ is the cardinality of a basis for $V$.

Exercise 5.4.3. Confirm that the space of polynomials of degree less than or equal to $n$ is a vector space, find a basis for it, and hence deduce its dimension.

The argument that a vector space has a dimension depends on being able to show that if you have two different bases, the cardinality has to be the same. This does not have to be finite in general. This works for any field.

(Note it does not work for rings. The thing that satisfies the axioms for a vector space but is over a ring instead of a field is called a module over the ring. Modules are much more complicated objects than vector spaces, as is suggested by the next exercise.)

Exercise 5.4.4. Show that two linearly independent vectors in $\mathbb{Z} \times \mathbb{Z}$ need not span the space.
Example 5.4.1. Determine whether $x = (1, 1+i, 1-i)^T$, $y = (i, 6, 1+i)^T$ and $z = (1+i, 2-i, 3+i)^T$ span $\mathbb{C}^3$.

Solution. We have to determine whether $\mathbb{C}^3 = \operatorname{span}\{x, y, z\}$. That is, given any vector $(a, b, c)^T \in \mathbb{C}^3$, we wish to find whether we can write it as
$$(a, b, c)^T = k_1 x + k_2 y + k_3 z = k_1(1, 1+i, 1-i)^T + k_2(i, 6, 1+i)^T + k_3(1+i, 2-i, 3+i)^T,$$
which is equivalent to solving the system of equations with augmented matrix
$$\left[\begin{array}{ccc|c} 1 & i & 1+i & a\\ 1+i & 6 & 2-i & b\\ 1-i & 1+i & 3+i & c \end{array}\right].$$
Performing the row operations $R_2 \to R_2 - (1+i)R_1$ and $R_3 \to R_3 - (1-i)R_1$ gives
$$\left[\begin{array}{ccc|c} 1 & i & 1+i & a\\ 0 & 7-i & 2-3i & b-(1+i)a\\ 0 & 0 & 1+i & c-(1-i)a \end{array}\right].$$
The coefficient matrix is now upper triangular with non-zero pivots $1$, $7-i$ and $1+i$, so back substitution gives
$$k_3 = \frac{c-(1-i)a}{1+i}, \qquad k_2 = \frac{b-(1+i)a-(2-3i)k_3}{7-i}, \qquad k_1 = a - i\,k_2 - (1+i)k_3.$$
Hence, given any vector $(a, b, c)^T$ of complex numbers, we have shown that we can write it as
$$(a, b, c)^T = k_1(1, 1+i, 1-i)^T + k_2(i, 6, 1+i)^T + k_3(1+i, 2-i, 3+i)^T.$$
Hence we have shown
$$\mathbb{C}^3 = \operatorname{span}\{x, y, z\}. \qquad \square$$
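If you want to double-check the conclusion by machine, here is a short numpy sketch: three vectors span $\mathbb{C}^3$ exactly when the matrix having them as columns is invertible (equivalently has rank 3, or non-zero determinant).

```python
import numpy as np

x = np.array([1, 1 + 1j, 1 - 1j])
y = np.array([1j, 6, 1 + 1j])
z = np.array([1 + 1j, 2 - 1j, 3 + 1j])

M = np.column_stack([x, y, z])
print("det =", np.linalg.det(M))            # non-zero, so the columns span C^3
print("rank =", np.linalg.matrix_rank(M))   # 3

# Solve for the coefficients of an arbitrary target vector, e.g. (a, b, c) = (1, 2, 3).
k = np.linalg.solve(M, np.array([1, 2, 3], dtype=complex))
assert np.allclose(M @ k, [1, 2, 3])
print("coefficients:", k)
```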
5.5 Direct Sums
If $V$ is a vector space over a field $F$ then it has a basis, and we could split the basis elements into two collections. The first collection would then span a subspace $U$ of $V$ and the second would span another, $W$, say.

It is obvious that any vector in $V$ can be written as a sum of basis elements and hence as a sum of two vectors, one in $U$ and one in $W$.

In such a case we write $V = U \oplus W$ and call $V$ the direct sum of $U$ and $W$.
Definition 5.5.1. For any vector space $V$ over a field $F$, $V$ is the direct sum of subspaces $U$ and $W$ iff $U \cap W = \{0\}$ and
$$\forall v \in V,\ \exists u \in U,\ \exists w \in W:\ v = u + w.$$

This doesn't say anything about splitting a basis up, but we can recover this by taking a basis for $U$ and another for $W$ and observing that the union is a basis for $V$.

Exercise 5.5.1. Prove that if $V = U \oplus W$ and $B$ is a basis for $U$ and $C$ is a basis for $W$, then $B \cup C$ is a basis for $V$.

Given a subspace $U$ of $V$, we can take a basis for $U$ and then extend it to a basis for $V$. The extra elements in the basis for $V$ not in the basis for $U$ themselves span a subspace $W$, and must be a basis for $W$, since if the set were linearly dependent then the whole set would be and couldn't be a basis for $V$. This shows we have constructed a subspace $W$ such that $U \oplus W = V$. We say that $W$ is a complement for $U$ in $V$. Some algebraists use the term direct summand, because all this makes sense in any abelian group, since we are not using the scaling properties here.

Definition 5.5.2. In any vector space $V$, given a subspace $U$, a subspace $W$ is a complement of $U$ in $V$ iff $U \oplus W = V$.

Note that complements are not usually unique.

Exercise 5.5.2. Show that there are an infinite number of complements to any line in $\mathbb{R}^2$.

Theorem 5.5.1. $\dim(U) + \dim(W) = \dim(U \oplus W)$.

Proof:
This comes directly from the fact that we can take the union of a basis for $U$ and a basis for $W$ to get a basis for $U \oplus W$.
5.5.1 Changing the basis for a space
The fact that there are a lot of different bases for the same space has much practical importance. We shall see later that for significant problems there can be a basis in which the calculations are easy, and others for which they are almost impossible. The standard technique then is to transform the problem into the natural basis, solve it there, and then transform the answer back.

A particular case of this is calculating the position of the planet Mars in the night sky six months from now. The question is, where do you point your
Figure 5.5.1: A nice vector field in $\mathbb{R}^2$, with some flow curves.
telescope? The answer will be some altitude and some azimuth, two angles. This is the coordinate system in which we want an answer. But the motion of Mars does not depend on your telescope, it depends on the Sun. So we work out from the angles of your telescope today, taking into account where you are on Earth and what the date and time are, where Mars is in its orbit around the Sun, and also where Earth is in its orbit. Then we transfer to a coordinate system centred on the Sun to work out where Mars will be six months from now, and where you will be six months from now. Then we transfer back to the telescope system of coordinates and we have the answer. Doing it all in telescope terms is an appalling prospect.

As another example of the importance of changing bases, directly relevant to us since we shall be looking at vector fields but not doing astronomy, figure 5.5.1 shows a fairly simple vector field in $\mathbb{R}^2$.

Exercise 5.5.3. Join the arrows from a collection of different starting points to get the flow of the vector field of figure 5.5.1. This will be a family of curves showing the solutions to the system of Ordinary Differential Equations with different initial starting points. I have drawn two to give the idea.

The second vector field is much nastier.

Exercise 5.5.4. Join the arrows from a collection of different starting points to get the phase portrait or flow of the vector field of figure 5.5.2.

The point is, they are essentially the same vector field and we can get from one to the other by simply changing the basis in which we represent them both. It is much easier both to solve the equations and to draw the flow in the first coordinate system than in the second.

Many problems are essentially similar to this one, so we shall be concerned with changing the basis of a space into something appropriate to the problem. This takes us naturally into the next section.
Figure 5.5.2: A nasty vector field in $\mathbb{R}^2$.
5.6 Linear Maps
Your recommended text uses the term linear transformation for what I shall call a linear map. While I would not wish to suggest that there is anything wrong with the recommended text, you will note that the recommended reading, Hirsch and Smale, use the term linear map. What splendid people Hirsch and Smale are, having the same taste and judgment in matters of terminology as me!

Given that we have vector spaces, which are sets, we can certainly have maps between the sets. These particular sets have some structure on them, and we would like to take not just any old maps but those maps which preserve the structure. The intuitive idea is best looked at through concrete examples, as always, so we do so now.

Example 5.6.1. Let $U = V = \mathbb{R}^2$ and define $f : \mathbb{R}^2 \to \mathbb{R}^2$ to be a rotation about the origin by $\pi/4$ in the positive sense. Figure 5.6.1 gives a picture of this. If you think of the first $\mathbb{R}^2$ as a sheet of transparent green plastic and the second $\mathbb{R}^2$ as a sheet of red plastic (both with the axes marked on), we pick up the green sheet, rotate it a bit, and plonk it down on the red sheet. Then the green points are directly over the red points the map sends them to.

Since we shall be concerned with rotations in three dimensions later (remember the orange) this is worth looking at hard.

We have then that the domain of $f$ is the set of initial positions of the points and the image is the set of final positions of the points. The map $f$ is 1-1 and onto. It has an inverse map: rotate by the same angle in the opposite direction.

This particular map has some additional important properties: first it sends
Figure 5.6.1: A rotation of $\mathbb{R}^2$ regarded as a map $f : \mathbb{R}^2 \to \mathbb{R}^2$.
Figure 5.6.2: A stretching of $\mathbb{R}^2$ regarded as another map $g : \mathbb{R}^2 \to \mathbb{R}^2$.
straight lines to straight lines and the origin to the origin. Second, it leaves all distances between points the same after as they were before. We say that the distance is invariant under the map.

Compare this with the second map I want to look at: this one stretches everything by a factor of two. I show a picture of it in figure 5.6.2. I call this one $g$ so there is no risk of the two quite different maps being confused.

This map is also 1-1 and onto. It has an inverse which shrinks everything to one half the original size. It still sends straight lines to straight lines and the origin to the origin. But it does not preserve distances, it doubles them.

For my last example I describe the elephant's footstep map. It crushes the whole plane onto the X-axis as if an elephant trod on it. I have shown, in figure 5.6.3, the elephant's foot just before the crunch, although this is not strictly speaking necessary. After all, it might not have been an elephant did it. We only require a before and after description, given neatly by the
Figure 5.6.3: A projection of $\mathbb{R}^2$ regarded as another map $e : \mathbb{R}^2 \to \mathbb{R}^2$.
map.

Note that this map is neither 1-1 nor onto, but it still takes the origin to the origin and straight lines to straight lines (possibly points). It preserves distances only between points on the same horizontal line. To get away with the claim that it preserves lines I need to assume that the elephant also tramples points below the X-axis upwards to the axis.

This is not entirely silly, although you may think it has nothing much to do with mathematics. You are quite wrong about that. But it is certainly lacking in precision, so let's specify the three examples carefully.
$$f_\theta\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos(\theta) - y\sin(\theta) \\ x\sin(\theta) + y\cos(\theta) \end{bmatrix}$$
describes the rotation of the plane by an angle $\theta$, so putting $\theta = \pi/4$ will describe the rotation map.
$$g_s\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} sx \\ sy \end{bmatrix}$$
describes the stretching by a factor $s$ for any $s$ in $\mathbb{R}$, so putting $s = 2$ will specify our second map.
$$e\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}$$
will specify our elephant's footstep map. It sounds classier if I call this a projection, so from now on I shall.
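All three maps are given by matrices acting on column vectors, so they are easy to play with numerically; here is a numpy sketch with $\theta = \pi/4$ and $s = 2$ as in the text.

```python
import numpy as np

theta, s = np.pi / 4, 2.0

rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
stretch = np.array([[s, 0.],
                    [0., s]])
projection = np.array([[1., 0.],
                       [0., 0.]])   # the elephant's footstep map

p = np.array([1., 2.])
for name, M in [("f", rotation), ("g", stretch), ("e", projection)]:
    print(name, "sends", p, "to", M @ p)

# The rotation preserves lengths, the stretch doubles them, the projection does neither.
print(np.linalg.norm(rotation @ p), np.linalg.norm(stretch @ p), np.linalg.norm(p))
```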
The property of sending the origin to the origin and straight lines to straight lines is the one thing they all have in common. It would be nice to be able to say that in algebra too. Fortunately this is easy:

Definition 5.6.1. A map $f : U \to V$ between vector spaces over a field $F$ is linear iff

1. $\forall u \in U,\ \forall t \in F:\ f(tu) = tf(u)$
2. $\forall u, v \in U:\ f(u + v) = f(u) + f(v)$

The first property is called homogeneity and the second is called additivity.
Exercise 5.6.1. Go through the three maps described above and verify that they are linear.

Exercise 5.6.2. Prove that a linear map $f : \mathbb{R}^2 \to \mathbb{R}^2$ takes the origin to the origin and a straight line to a straight line or possibly a point.

Exercise 5.6.3. Find another linear map from $\mathbb{R}^2$ to $\mathbb{R}^2$ which is not a composite of some of the above three maps (for various $\theta$ and $s$).

Exercise 5.6.4. Prove that the composite of linear maps is linear.

These properties are crucial. They will keep cropping up all over the place, from discussing movements in space to solving Ordinary Differential Equations. Don't forget them.

The central fact that makes linear maps so important is that if we know what a linear map does to a set of basis elements, we know what it does to everything.
Theorem 5.6.1. If $U, V$ are vector spaces over a field $F$ and if the set of vectors $\{u_1, u_2, \ldots, u_n\} \subseteq U$ spans $U$, then a linear map $f : U \to V$ is specified once the set $\{f(u_1), f(u_2), \ldots, f(u_n)\} \subseteq V$ is known.

Proof: Since the set $\{u_1, u_2, \ldots, u_n\}$ spans $U$, we know that any $u \in U$ can be written as $u = a_1 u_1 + a_2 u_2 + \cdots + a_n u_n$ for elements $a_j,\ j \in [1:n]$, of $F$. Whereupon
$$f(u) = f(a_1 u_1 + a_2 u_2 + \cdots + a_n u_n) = a_1 f(u_1) + \cdots + a_n f(u_n)$$
by the linearity of $f$. $\square$
There is a very standard basis for $\mathbb{R}^2$ consisting of the vectors
$$e_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$
It follows that if I know, for a linear map $f : \mathbb{R}^2 \to \mathbb{R}^2$, that $f(e_1) = \begin{bmatrix} a \\ b \end{bmatrix}$ and $f(e_2) = \begin{bmatrix} c \\ d \end{bmatrix}$, then I have used only the four numbers $a, b, c, d$ to specify the map. If I write them in the form
$$\begin{bmatrix} a & c \\ b & d \end{bmatrix}$$
then this array is a complete specification of the linear map $f : \mathbb{R}^2 \to \mathbb{R}^2$. It follows also that if I want to calculate the value of $f$ on a vector, I can do it by:
$$f\begin{bmatrix} x \\ y \end{bmatrix} = x f(e_1) + y f(e_2) = x\begin{bmatrix} a \\ b \end{bmatrix} + y\begin{bmatrix} c \\ d \end{bmatrix} = \begin{bmatrix} xa + cy \\ xb + dy \end{bmatrix}.$$
If we write the array to the left of the vector we get the rule for a matrix multiplying a column vector:
$$\begin{bmatrix} a & c \\ b & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} ax + cy \\ bx + dy \end{bmatrix}.$$
So the rule for matrix multiplication comes from a decision to write the vectors as columns. Had we written them all as rows we'd have had a different rule. Conversely, if this is the rule for matrix multiplication and we propose to put the maps to the left of the things they act on, then it would be sensible to write vectors as columns and not as rows. This last consideration seems to have escaped the writers of the text book and also the writers of the first year linear algebra notes. If for typographical reasons I want to fit in a vector as a row I shall put a little index T for transpose to tell you to turn it sideways, as in $[x\ y]^T$.
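This recipe, "the columns of the matrix are the images of the basis vectors", is easy to automate; here is a numpy sketch (the helper name is my own), using the rotation from earlier as the example linear map.

```python
import numpy as np

def matrix_of(f, n):
    """Build the matrix of a linear map f : R^n -> R^m from its values
    on the standard basis: column j is f(e_j)."""
    basis = np.eye(n)
    return np.column_stack([f(basis[:, j]) for j in range(n)])

theta = np.pi / 4
def rotate(v):
    x, y = v
    return np.array([x * np.cos(theta) - y * np.sin(theta),
                     x * np.sin(theta) + y * np.cos(theta)])

A = matrix_of(rotate, 2)
v = np.array([3., -1.])
# Applying the map directly and multiplying by its matrix agree.
assert np.allclose(rotate(v), A @ v)
print(A)
```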
Exercise 5.6.5. Work out what the rule for operating on vectors by matrices
would be if we had them all as rows.
It is obvious that this works for any vector spaces with a finite spanning set, and all that happens is that the number of columns becomes the dimension of the domain $U$ (the number of vectors in a basis, that is, a minimal spanning set), while the number of rows is the dimension of the codomain $V$. We have therefore proved:

Theorem 5.6.2. Any linear map $f : U \to V$ between finite dimensional vector spaces over a field $F$ can be represented as an $n \times m$ matrix (with $n$ rows and $m$ columns), where the dimension of $U$ is $m$ and the dimension of $V$ is $n$, and where the matrix has entries elements of $F$.
Note that we didn't have to say anything about the entries of the matrix except that they are elements of $F$. No worrying about complex numbers as special cases, precisely because we define everything as an abstract vector space over $F$ for any field $F$ you feel inclined to use.

Note also that we can use any spanning set or any basis. You have only used the standard basis in $\mathbb{R}^n$ so far. (The standard basis in $\mathbb{C}^n$ is the same!) But it is important to realise that we could choose any basis in $U$ and any basis in $V$ and still get a matrix. Obviously we need to take care to ensure that we are clear about which pair of bases we are using. For many vector spaces there is no natural or standard basis, so the matrix representing the linear map must have the two bases specified carefully. For other linear maps there is, as we shall see, a basis which is natural for the linear map.
5.6.1 Kernels and Images
For any linear map $f : U \to V$ between vector spaces over the same field $F$, we have two important theorems. The first is that the image of $f$ is a subspace of $V$.

Theorem 5.6.3. For any linear map $f : U \to V$ between vector spaces over the same field $F$, the image of $f$ is a subspace of $V$.

Proof:
$$v \in \operatorname{im}(f),\ t \in F \implies \exists u \in U:\ f(u) = v \implies f(tu) = tv \implies tv \in \operatorname{im}(f).$$
Hence $\operatorname{im}(f)$ is closed under scalar multiplication.

Similarly,
$$v, v' \in \operatorname{im}(f) \implies \exists u, u' \in U:\ f(u) = v \text{ and } f(u') = v' \implies f(u + u') = v + v' \implies v + v' \in \operatorname{im}(f).$$
Hence $\operatorname{im}(f)$ is closed under addition. $\square$

The second theorem says that the kernel of $f$ is a linear subspace of $U$; the kernel is the subspace of $U$ that $f$ sends to the zero vector. Your text book may call it the nullspace of $f$.
Exercise 5.6.6. Write out a formal definition of the kernel of a linear map, that is, translate the above definition into algebra.

Theorem 5.6.4. For any linear map $f : U \to V$ between vector spaces over the same field $F$, $\ker(f)$ is a linear subspace of $U$.

Exercise 5.6.7. Provide a proof.

Remark 5.6.1. We looked at this case earlier but wrote it out as a system of linear equations all equal to zero. The old fashioned language says the same thing but in a clunkier way.

An important theorem which I hope you did last year is:

Theorem 5.6.5 (Rank-Nullity). For any linear map $f : U \to V$ between vector spaces over the same field $F$, if $U$ is finite dimensional then $\operatorname{im}(f)$ is finite dimensional and $\dim(\ker(f)) + \dim(\operatorname{im}(f)) = \dim(U)$. Later we shall prove this.
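You can see Rank-Nullity at work numerically; a numpy sketch with an arbitrary matrix of my own choosing (a map from $\mathbb{R}^5$ to $\mathbb{R}^3$, so $\dim(U) = 5$):

```python
import numpy as np

# A linear map f : R^5 -> R^3 (row 2 is twice row 1, so the rank is 2).
A = np.array([[1., 2., 0., 1., -1.],
              [2., 4., 0., 2., -2.],
              [0., 0., 1., 1.,  1.]])

# Three explicit kernel vectors, found by solving Ax = 0 by hand; they are
# the columns of K.
K = np.array([[-2.,  1.,  0., 0., 0.],
              [-1.,  0., -1., 1., 0.],
              [ 1.,  0., -1., 0., 1.]]).T

assert np.allclose(A @ K, 0)                 # they lie in ker(f)
assert np.linalg.matrix_rank(K) == 3         # and are linearly independent

rank = np.linalg.matrix_rank(A)              # dim(im f) = 2
print(rank, "+", 3, "=", A.shape[1])         # 2 + 3 = 5 = dim(U)
assert rank + 3 == A.shape[1]
```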
5.6.2 Isomorphisms
An important idea in mathematics is that two objects may be essentially the same but with the names changed. For example, $\mathbb{R}^2$ is a vector space where the elements are ordered pairs of real numbers and we write them as columns, each column having two numbers in it. We could define, for any $a \in \mathbb{R}^2$, a shift map $s_a : \mathbb{R}^2 \to \mathbb{R}^2$, $x \mapsto x + a$. This is not a linear map: it takes straight lines to straight lines, but it doesn't take the origin to the origin, it takes it to $a$. If $x$ and $y$ are any two points of $\mathbb{R}^2$ then $s_a(x + y) = x + y + a \neq s_a(x) + s_a(y) = x + y + 2a$.

It is easy to see that $s_a$ is 1-1 and onto, that is, a bijection. It would be tempting to regard $s_a$ as an arrow with its tail at the origin and its head at the point $a$, and think of it as slightly different from the point at the head.

Given two such shifts, $s_a$ and $s_b$, it makes sense to add them: in fact this is just the composite map. In other words we have:
$$\forall a, b \in \mathbb{R}^2:\ s_a \circ s_b = s_{a+b}.$$
It also makes sense to scale shift maps:
$$\forall a \in \mathbb{R}^2,\ \forall t \in \mathbb{R}:\ t\,s_a = s_{ta}.$$
Now you can rather easily convince yourself that the result of taking the set of all such shift maps is a vector space over $\mathbb{R}$ of dimension 2. It has a rather
obvious basis: the first element shifts the plane to the right by one unit and the second element shifts it up by one unit. In fact it looks an awful lot like $\mathbb{R}^2$.

Exercise 5.6.8. Show that the space of all shift maps is indeed a vector space over $\mathbb{R}$ and that there is a linear bijection between it and $\mathbb{R}^2$.
Definition 5.6.2. A map $f : U \to V$ is called affine iff $f = g + s$ where $g$ is linear and $s : V \to V$ is a shift. That is to say, there is a $v \in V$ such that
$$\forall u \in U:\ f(u) = g(u) + v.$$

Remark 5.6.2. Contrary to the beliefs of a great many engineers, the function $f(x) = mx + c$ is not a linear map unless $c = 0$.
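A two-line check of the remark in plain Python: additivity fails for $f(x) = mx + c$ whenever $c \neq 0$.

```python
m, c = 2.0, 5.0
f = lambda x: m * x + c

x, y = 1.0, 3.0
print(f(x + y), f(x) + f(y))   # 13.0 versus 18.0: they differ by c
# Only when c = 0 does f(x + y) = f(x) + f(y) hold for all x and y.
```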
Calling the space of all shift maps on $\mathbb{R}^2$ by the name $\mathcal{S}_2$, we have a vague feeling that $\mathcal{S}_2$ and $\mathbb{R}^2$ are basically the same as far as the structure is concerned, although we have to concede that they are different as far as the actual objects are concerned, since an ordered pair of numbers is different from a map from ordered pairs to ordered pairs.
Theorem 5.6.6. If $f : U \to V$ is a linear bijection between vector spaces over the same field $F$, then the inverse map is linear.

Proof:
$$\forall v \in V\ \exists u \in U:\ f(u) = v.$$
Then
$$\forall t \in F:\ f(tu) = tf(u) = tv \implies f^{-1}(tv) = tu = t\,f^{-1}(v).$$
Also,
$$\forall v' \in V\ \exists u' \in U:\ f(u') = v',\ \text{and}\ f(u + u') = f(u) + f(u') = v + v' \implies f^{-1}(v + v') = u + u' = f^{-1}(v) + f^{-1}(v'). \qquad \square$$
Definition 5.6.3. A linear bijection between vector spaces over the same field is called an isomorphism, and the spaces are said to be isomorphic.

Remark 5.6.3. Informally we think of two isomorphic spaces as being pretty much the same but with the names changed to protect the guilty. Physicists typically fail to make any distinction between isomorphic objects, and this leads to a certain amount of muddle and confusion. Aren't you lucky this is a mathematics unit; we could have had you totally screwed up by now if we'd tried.

It follows from putting these things together that $\mathbb{R}^2$ and $\mathcal{S}_2$ are isomorphic. There are less obvious cases.
Exercise 5.6.9. Take the set of all linear maps from $\mathbb{R}$ to $\mathbb{R}^2$ and add and scale them as you are used to doing for maps, but define the operations carefully. Show the space $L(\mathbb{R}, \mathbb{R}^2)$ is isomorphic to $\mathbb{R}^2$.

Exercise 5.6.10. Do the same for $L(\mathbb{R}^2, \mathbb{R})$. This is called the dual space to $\mathbb{R}^2$.

Exercise 5.6.11. Show that if $V$ is any vector space over a field $F$, the vector space $L(F, V)$ is well defined and is isomorphic to $V$.

Exercise 5.6.12. Is it true that $L(V, F)$, the dual space to $V$, is always isomorphic to $V$? If so prove it, if not give a counterexample.

Exercise 5.6.13. Prove that the composite of two isomorphisms is an isomorphism. Hence show that isomorphism is an equivalence relation on the set of vector spaces. If it is a set!
Definition 5.6.4. A linear map from a vector space $V$ to itself is called a linear operator on $V$. In shortened form we use merely the term operator, leaving it to the context to decide if the linearity is implied.

Not all operators are isomorphisms of course. We have that:

Theorem 5.6.7. If $T$ is an operator on a finite dimensional vector space $V$, then the following are equivalent:

1. $T$ has trivial kernel, that is $\ker(T) = \{0\}$
2. $T$ is 1-1
3. $T$ is onto
4. $T$ is an isomorphism

Exercise 5.6.14. Prove the above theorem.

Exercise. Prove that if two vector spaces are isomorphic they have the same dimension. Hint: take a basis for one and show that the image of the basis by the isomorphism is also a basis.
Exercise 5.6.15. Prove the Rank-Nullity Theorem. The idea is that you choose a basis for $\ker(f)$ and extend it to a basis for $U$. The extra basis elements, not those in $\ker(f)$, are a basis for a complement $W$ of $\ker(f)$, and on this complement $f$ is 1-1 and onto the image, so it is an isomorphism between the complement and the image, which therefore have the same dimension. Then we use theorem 5.5.1. It should give you some revision of the ideas in first year.
5.7 Change of Basis
Suppose $U$ is a vector space of (finite) dimension $n \in \mathbb{N}$ over a field $F$, and $V$ is another, of dimension $m \in \mathbb{N}$, also over $F$, and $f : U \to V$ is a linear map.

Definition 5.7.1. An ordered basis for $U$ is a basis with an order on it, so we can talk of the first, second, et cetera, basis element.
Theorem 5.7.1. Take $B = (u_1, u_2, \ldots, u_n)$ to be an ordered basis for $U$. Then there is a linear bijection
$$B_n : F^n \to U, \qquad \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \mapsto b_1 u_1 + b_2 u_2 + \cdots + b_n u_n.$$
Proof:
The map is defined for all $b = (b_1, b_2, \ldots, b_n)^T \in F^n$, and
$$B_n(tb) = tb_1 u_1 + tb_2 u_2 + \cdots + tb_n u_n = t(b_1 u_1 + b_2 u_2 + \cdots + b_n u_n) = tB_n(b),$$
so $B_n$ is homogeneous.
Also, for all $b, b' \in F^n$,
$$B_n(b + b') = (b_1 + b'_1)u_1 + (b_2 + b'_2)u_2 + \cdots + (b_n + b'_n)u_n = (b_1 u_1 + \cdots + b_n u_n) + (b'_1 u_1 + \cdots + b'_n u_n) = B_n(b) + B_n(b'),$$
so $B_n$ is additive.

Together these show $B_n$ is linear.
It remains to show $B_n$ is a bijection.

Since $B$ is a basis, it spans $U$, so for every $u \in U$ there is some $b \in F^n$ with $B_n(b) = u$; so $B_n$ is onto.

Since $B$ is a basis, the vectors are linearly independent, and if $B_n(b) = u$ the coefficients $(b_1, b_2, \ldots, b_n)$ are unique by theorem 5.4.1, so $B_n$ is 1-1. So $B_n$ is a bijection. $\square$
Definition 5.7.2. The map $B_n$ is called a parametrisation of $U$.

In other words, an ordered basis for $U$ sets up an isomorphism between $U$ and $F^n$.

Similarly the ordered basis $C$ for $V$ gives an isomorphism $C_m$ between $V$ and $F^m$.
Exercise 5.7.1. Show that the isomorphism from $\mathbb{R}^2$ to itself given by the standard basis $(e_1, e_2)$ is the identity map when the basis is in that order, but that changing the order to $(e_2, e_1)$ gives a different isomorphism.

Exercise 5.7.2. Show that the same holds for the same basis for $\mathbb{C}^2$.

Exercise 5.7.3. The swap of the two basis elements in the preceding exercises is said to change the parity of the basis. Note that it is equivalent to a reflection. In $\mathbb{R}^3$ make a list of the different orderings of the standard basis elements $e_1, e_2, e_3$ and decide which of them change the parity of the basis. This has everything to do with being left handed or right handed! Suppose we got a communication from deep space from Little Green Men and they sent us a description of the world seen from their point of view. How could we tell whether or not we had a basis for space having the same parity as the LGM? Could you do it by sending them a circularly polarised beam of light and telling them how it was polarised in our system?
5.7.1 Matrix Representations of Linear Maps
The map $B_n^{-1}$ is called a representation of $U$, and you can think of it as a naming system which gives for each point $u$ of $U$ a name of $u$ which is a column of scalars from $F$.

We now have the diagram

              f
        U ---------> V
        ^            ^
    B_n |            | C_m
        |            |
       F^n --------> F^m
              ?

and we can fill in the question mark with $[f]$, which is a matrix with entries in $F$.
Theorem 5.7.2. There is a unique $m \times n$ matrix $[f]$ which, regarded as a map from $F^n$ to $F^m$, has the property that
$$C_m \circ [f] = f \circ B_n.$$
Proof:
Take $e_1 = (1, 0, \ldots, 0)^T$ in $F^n$. Going by the top left route, this goes to the first basis element in $U$, namely $u_1$. This is taken by $f$ to $f(u_1) \in V$, which has some expression $b_1 v_1 + b_2 v_2 + \cdots + b_m v_m$. The numbers $(b_1, b_2, \ldots, b_m)^T$ are the first column of the matrix $[f]$. We get $n - 1$ more columns in the same way, so column $j$ is
$$b^j = \begin{bmatrix} b^j_1 \\ b^j_2 \\ \vdots \\ b^j_m \end{bmatrix} \quad\text{precisely when}\quad f(u_j) = \sum_{i \in [1:m]} b^j_i v_i.$$
This tells us what the matrix is, and we see that the top left route $f \circ B_n$ takes $e_j$ to $f(u_j)$, and the bottom right route takes $e_j$ to $b^j$ by $[f]$ and $b^j$ to $f(u_j)$ by $C_m$, so we get to the same result both ways around the square. (We say the diagram commutes.) Since this holds for basis elements $e_j$ and all maps are linear, it holds for all points. The matrix has to be unique since a different matrix would take some point to a different element of $F^m$ and so the diagram would not commute. $\square$
Definition 5.7.3. The matrix $[f]$ obtained by this process is called the representation of $f$ with respect to the bases $B$ and $C$.

Remark 5.7.1. It is called a representation because an element of a vector space can be a very strange thing and a linear map rather abstract. Turning linear maps into matrices means we can do sums easily.

This shows that we can obtain a matrix representing a linear map between any two finite dimensional vector spaces: all it takes is a choice of a basis for each of the spaces. In the case where there is a standard basis, as for example $\mathbb{R}^n$, this is the usual but not the only choice. Some other choice, as remarked earlier, may be highly desirable for particular problems.
Exercise 5.7.4. This one is important! Suppose I have linear maps
$$\mathbb{R}^n \xrightarrow{\ g\ } \mathbb{R}^m \xrightarrow{\ f\ } \mathbb{R}^k.$$
Then there is certainly a matrix $[g]$ representing $g$ with respect to the standard basis, another $[f]$ representing $f$ likewise, and there must also be one $[f \circ g]$ representing $f \circ g$. Show that
$$[f \circ g] = [f] \cdot [g],$$
where $\cdot$ denotes matrix multiplication.

Remark 5.7.2. This is doing it backwards. We in fact make up the rule for matrix multiplication precisely so that the above result will be true. Understand this and you understand why we have the strange rule for multiplying matrices that we use.
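A numerical spot check of Exercise 5.7.4 using numpy (the two matrices are arbitrary choices of mine):

```python
import numpy as np

G = np.array([[1., 2.],
              [0., 1.],
              [3., -1.]])        # g : R^2 -> R^3
F = np.array([[2., 0., 1.],
              [1., 1., 0.]])     # f : R^3 -> R^2

v = np.array([4., -2.])

# Applying g then f to a vector agrees with multiplying by the product F @ G.
assert np.allclose(F @ (G @ v), (F @ G) @ v)
print("[f o g] =\n", F @ G)
```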
Exercise 5.7.5. If $I_2$ is the identity map from $\mathbb{R}^2$ to $\mathbb{R}^2$, and we choose the ordered basis $(e_1, e_2)$ for the domain and the basis $(e_2, e_1)$ for the codomain, find the matrix $[I_2]$.

Exercise 5.7.6. Write down the matrices representing the identity map for every choice of a pair of orderings of the standard basis for $\mathbb{R}^3$.
Exercise 5.7.7. Suppose we chose the basis
$$\left( ie_1 = \begin{bmatrix} i \\ 0 \end{bmatrix},\ ie_2 = \begin{bmatrix} 0 \\ i \end{bmatrix} \right)$$
in $\mathbb{C}^2$. What is the matrix representing the identity map from $\mathbb{C}^2$ to $\mathbb{C}^2$ if we use the standard basis in the standard order for the domain and this imaginary version of it for the codomain? What if it were the other way around? What if we used the ordered basis $(ie_2, e_1)$? How many different matrices represent the identity if you go through all possible permutations?

Exercise 5.7.8. Using the same bases for $\mathbb{C}^2$ as in the last exercise, represent the reflection map from $\mathbb{C}^2$ to itself that interchanges the first and second components of every vector.
Exercise 5.7.9. Is the conjugation map from $\mathbb{C}$ to $\mathbb{C}$ linear? It is certainly linear regarding $\mathbb{C}$ as $\mathbb{R}^2$ and asking if it is linear over $\mathbb{R}$. But is it linear as a one dimensional complex space over $\mathbb{C}$?
Exercise 5.7.10. Let $(e_1 + e_2, e_1 - e_2)$ be an ordered basis for $\mathbb{R}^2$ and let $f : \mathbb{R}^2 \to \mathbb{R}^2$ be the reflection in the $Y$-axis. Calculate $[f]$ with respect to this choice of basis in both domain and codomain.
5.7.2 Changing the Basis
Now suppose we have two bases each for $U$ and $V$. This also leads to a diagram which contains the earlier one.

              f
        U ---------> V
        ^            ^
    B_n |            | C_m
        |            |
       F^n --------> F^m
        |    [f]     |
      S |            | T
        v            v
       F^n --------> F^m
             [f']

Here $B'_n : F^n \to U$ and $C'_m : F^m \to V$ are the parametrisations coming from the second pair of bases; they map the bottom copies of $F^n$ and $F^m$ up to $U$ and $V$. The bottom part of the diagram is as before, but I have supposed another ordered basis $B'$ for $U$ and another ordered basis $C'$ for $V$.
This could happen if you had one coordinate frame in $\mathbb{R}^3$ and I had another. We would both assign a name consisting of a triple of numbers to a point in space, but they would be different numbers. If the point was rotated about your axes by a matrix, I would need to describe it as a rotation about my axes plus a shift. The rotation matrices would be different. So we need to be able to figure out how to get from yours to mine and vice-versa.

There is an isomorphism $(B'_n)^{-1} \circ B_n$ which I call $S$, and another $(C'_m)^{-1} \circ C_m$ which I call $T$. These are represented by transition matrices, also called $S$ and $T$ in what follows, and the idea is that if I have an element $x$ of $F^n$ which is the $B$-name for some object $u$ in $U$ (meaning the name in the $B$ basis, that is, the sequence of coefficients which expand $u$ in terms of the elements of $B$ in the given order) and if $y$ is the $B'$-name for the same $u$, then the transition matrix for $S$ from $B$ to $B'$ takes $B$-names to the corresponding $B'$-names, and similarly the transition matrix $T$ takes $C$-names to $C'$-names of the same objects.

The matrix for $S$ is easy to calculate:
I look to see where $e_1 = (1, 0, \ldots, 0)^T$ goes. $B_n$ takes it to $u_1$, the first basis element. Write
$$u_1 = c_1 u'_1 + c_2 u'_2 + \cdots + c_n u'_n,$$
where $B' = (u'_1, u'_2, \ldots, u'_n)$ is the second basis. Then the first column of $[S]$, the matrix for $S$, is $(c_1, c_2, \ldots, c_n)^T$, and the other columns are calculated in the same way.

Similarly for the matrix $[T]$ for $T$.
Then we have

Theorem 5.7.3. $[f'][S] = [T][f]$, where the composition $\circ$ has become matrix multiplication because that is how matrices work. Alternatively,
$$[f'] = [T]\,[f]\,[S]^{-1}.$$

This tells you how to calculate the new representation of $f$ with respect to $B', C'$ if we know the representation of $f$ with respect to $B, C$.
Exercise 5.7.11. Suppose we have the identity map $I_2$ from $\mathbb{R}^2$ to itself, and we choose first the standard basis with the usual ordering on both domain and codomain, and then the reverse ordering on both domain and codomain. Calculate $[I_2]$ and $[I'_2]$ by computing $[S]$ and $[T]$.
Special Application

Suppose $U = \mathbb{R}^n$ and $V = \mathbb{R}^m$ and $F = \mathbb{R}$, and $B$ is the standard basis for $\mathbb{R}^n$, $C$ is the standard basis for $\mathbb{R}^m$, and $f$ has matrix $[f]$ with respect to the standard bases.

Let $B'$ and $C'$ denote new bases for $\mathbb{R}^n$ and $\mathbb{R}^m$ respectively:
$$B' = (u'_1, u'_2, \ldots, u'_n), \qquad C' = (v'_1, v'_2, \ldots, v'_m).$$
We want to calculate $[f']$, the matrix for $f$ with respect to $B'$ and $C'$.
5.7. CHANGE OF BASIS 109
The lower part of the diagram now becomes rather trivial:
R
n
R
n
R
m
R
m
-
-
6 6
I
n
I
m
f
[f]
R
n
R
m -
? ?
[f

]
B

n
C

-
S T
Now $B'_n$ takes $(1, 0, \ldots, 0)^T$ to the first basis element, which is $c_1 e_1 + c_2 e_2 + \cdots + c_n e_n$ for some scalars $c_i$, and this is just the column $(c_1, c_2, \ldots, c_n)^T$: the standard name of the first basis element. Similarly for all the other columns. The matrix for $B'_n$ is thus simply the ordered list of elements of the basis written as columns, and likewise for $C'_m$. Since here $S = (B'_n)^{-1}$ and $T = (C'_m)^{-1}$, the matrix $[f']$ therefore is
$$[f'] = [T]\,[f]\,[S]^{-1} = (C'_m)^{-1}\,[f]\,B'_n.$$
Example 5.7.1. Suppose I have the linear map from $\mathbb{R}^2$ to $\mathbb{R}^3$ with matrix
$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$$
in the standard basis.

I choose the ordered basis
$$B = \left( \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 4 \end{bmatrix} \right)$$
for $\mathbb{R}^2$ and the basis
$$C = \left( \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix} \right)$$
for $\mathbb{R}^3$. Then I ask for the matrix $M$ representing the map specified by $A$ with respect to these two bases $B$ and $C$.
I regard the bases as naming systems which assign names to actual vectors in $\mathbb{R}^2$ and $\mathbb{R}^3$. One of the oddities of this is that the names and the objects named are the same kinds of things. It is usually hard to confuse a thing like a dog with its name, one is a hairy object that runs around digging up bones and the other is a noise or a string of letters of the alphabet, but in this case both objects and names are columns of numbers. This gives lots of scope for muddle and confusion.

To minimise the risk of getting muddled and confused, I recycle an earlier picture:

              f
        U ---------> V
        ^            ^
    B_2 |            | C_3
        |            |
       R^2 --------> R^3
              M

$U$ is actually $\mathbb{R}^2$ and $V$ is actually $\mathbb{R}^3$. $f$ is the map represented (relative to the standard basis) by the matrix $A$. We want to find $M$, the matrix representing the same map with respect to the basis $B$ in the domain and $C$ in the codomain.
One way is to say it is simply $C_3^{-1} A B_2$, where $B_2$ and $C_3$ are the transition matrices which change the basis from the new one to the standard one. This amounts to going around the square the long way around.

Now the basis $B$ determines the map $B_2$ which takes the names of vectors under the basis $B$ to their standard names. The first column of the matrix representing $B_2$ is the image of $[1, 0]^T$. This is code for the first basis element, which is $[1, 2]^T$. Similarly the second column of the matrix representing $B_2$ is the second basis element. So the matrix representing $B_2$ is just the basis $B$ in order:
$$B_2 = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.$$
Similarly
$$C_3 = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 2 \\ 0 & 1 & 1 \end{bmatrix},$$
and the matrix $M$ representing the map specified by $A$ in the new bases is
$$M = \begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 2 \\ 0 & 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}.$$
This solves the problem of computing $M$, with a bit of help from MATLAB or Mathematica. In fact the inverse of $C_3$ is just
$$\begin{bmatrix} 0 & 1 & 2 \\ 1 & 0 & 2 \\ 0 & 1 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} -2 & 1 & 2 \\ -1 & 0 & 2 \\ 1 & 0 & -1 \end{bmatrix}.$$
This gives the final result
$$M = \begin{bmatrix} 35 & 81 \\ 29 & 67 \\ -12 & -28 \end{bmatrix},$$
which is the representation of the map in the new pair of bases.
which is the representation of the map in the new pair of bases.
The other way to do it involves diagram chasing. We reason that the rst
column of M tells us what the name [1, 0]
T
gets sent to. Now [1, 0]
T
is the
B-name (name in the basis B) of the vector [1, 2]
T
. This gets taken by f to
_
_
1 2
3 4
5 6
_
_
_
1
2
_
=
_
_
5
11
17
_
_
112 CHAPTER 5. ABSTRACT VECTOR SPACES
Now we need to determine the C-name of this vector in the basis C. It is
the column
_
_
a
b
c
_
_
where
_
_
5
11
17
_
_
= a
_
_
0
1
0
_
_
+b
_
_
1
0
1
_
_
+c
_
_
2
2
1
_
_
This comes out, by solving simultaneous equations, as
_
_
a
b
c
_
_
=
_
_
35
29
12
_
_
We get the second column in the same way.
I leave you to verify that you get the same matrix M whichever way you do it. One way requires a diagram chase and forces you to think about what everything means; the other involves only matrix inversion and multiplication. Different kinds of people have different ideas of the best way. You now have a choice. Until you are confident that you know what you are doing and are no longer muddled, I recommend you do it both ways and look to see if they give the same answer.
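For readers who want the arithmetic done by machine, here is a minimal MATLAB sketch of both routes for Example 5.7.1 (the variable names are mine, not part of the example):

>> A  = [1 2; 3 4; 5 6];        % matrix of f in the standard bases
>> B2 = [1 3; 2 4];             % basis B of R^2, written as columns
>> C3 = [0 1 2; 1 0 2; 0 1 1];  % basis C of R^3, written as columns
>> M  = C3 \ (A * B2)           % the long way around the square: C3^(-1) * A * B2
>> C3 \ (A * [1; 2])            % diagram chase for the first column of M

Both commands should reproduce the numbers found above.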
Remark 5.7.3. It is very easy to test the ability to do this in an examination.
Be warned!
5.8 Similarities
A particular case which will be of interest later is when we have a linear map f : V → V, the domain and the codomain being the same space. It is natural to have the same basis for each of them. Recall that a linear map from a space to itself is often called a linear operator, or operator for short. In this case, we get the diagram
[Diagram: three rows of R^n. Top row: R^n --f--> R^n, the actual space. Middle row: R^n --[f]--> R^n, the standard names, joined to the top row by the identity maps I_n. Bottom row: R^n --[f']--> R^n, the names assigned by the new basis B'_n, joined to the middle row by the renaming map T on both sides.]

Here [f] is the matrix for f with respect to the standard basis and [f'] is the matrix with respect to the new basis B'. Then we have the simple change of basis formula

\[ [f'] = T [f] T^{-1}, \quad \text{where } T = (B'_n)^{-1}. \]
Example 5.8.1. In R^2 let B be the ordered basis (e_1 + e_2, e_1 − e_2). Express the map f that reflects in the Y axis as a matrix with respect to both the standard basis and the given basis.
Solution: We take the first basis element e_1 and look to see where the map takes it, and we observe it gets multiplied by −1; then the second basis vector e_2, and observe that it is left fixed by the map f. So the matrix [f] of f with respect to the standard basis is

\[ [f] = \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \]

For the second basis we take the first element e_1 + e_2 and see that f takes it to the vector −e_1 + e_2, which is the negative of the second basis element. So
the first column of the matrix with respect to this basis is

\[ \begin{pmatrix} 0 \\ -1 \end{pmatrix} \]

The second basis element e_1 − e_2 is taken by f to −e_1 − e_2, which is the negative of the first basis element. So the complete matrix is

\[ [f'] = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix} \]

The matrix B_2 is

\[ \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]

and its inverse is

\[ \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]

Then the magic formula says

\[ [f'] = B_2^{-1} [f] B_2 \]

That is,

\[ \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \]

which is easily verified.
It is worth noting that in this easy case it was quicker to do it directly, but for most cases the matrix calculation is much quicker, particularly if you use MATLAB or Mathematica.
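By way of illustration, here is a short MATLAB check of that verification (again, the variable names are mine):

>> F  = [-1 0; 0 1];          % [f] in the standard basis
>> B2 = [1 1; 1 -1];          % the new basis vectors as columns
>> inv(B2) * F * B2           % should print [0 -1; -1 0], which is [f']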
Definition 5.8.1. Linear operators f and g are said to be similar iff there is an invertible map T such that f = T ∘ g ∘ T^{-1}.
Then if we take any basis in the domain and codomain of a real vector space, and then choose another basis, the two matrices representing a given operator with respect to the different choices are similar.
Exercise 5.8.1. Prove that similarity is an equivalence relation on square matrices over F, for any field F.
Chapter 6
Banach and Hilbert Spaces
6.1 Beyond Vector Spaces
In defining an abstract vector space, we extracted some of the properties of R^n and turned them into a definition. This gives us something which we know exists and which we then generalised to being over any field. But R^n has a number of important properties which we ignored: in particular it makes sense to talk of the length of a vector in R^n, or if you prefer the distance of the point from the origin, and hence the distance between any pair of points u, v by calculating the distance of u − v from the origin. This simply does not make sense in an abstract vector space, nor in some of the examples. For instance, F(R) is the space of all maps from R to R and we would need to be able to talk of the distance of any such function from the origin. It is hard to see how to do this. If we restricted ourselves to C^0([0, 1]) then it might be possible, since continuous functions on compact intervals attain their maximum and minimum, and the greater of the absolute values of these two numbers might (and indeed does) make a sensible value of the distance of the function from the zero function. If we are to extend the notion of a distance to such spaces as this, there is a need for something beyond a vector space.
Another property of R^n is that we can draw line segments in it and calculate the angle between them. Again there is no such possibility for abstract vector spaces. The idea of a line segment does make sense in an abstract vector space, and I leave you to define them, but it is difficult to see what the angle between f(x) = tx^2 and g(x) = tx^3, where t ∈ [0, 1], might be in F(R); but it might make sense in some other vector spaces besides R^n, and indeed does.
So there are some properties dealing with distances from the origin of vectors
and angles between lines in vector spaces. In this chapter we extend the idea
of an abstract vector space to encompass these ideas.
6.2 Normed Vector Spaces: Banach Spaces
The proper name for the distance of a point from the origin in a vector space is a norm, and we define a norm on a vector space V over the field R as follows:
Definition 6.2.1. A norm on a real vector space V is a map

|| || : V → R,   v ↦ ||v||

with the properties

1. ∀v ∈ V, ||v|| ≥ 0 and ||v|| = 0 ⇔ v = 0
2. ∀v ∈ V, ∀t ∈ R, ||tv|| = |t| ||v||
3. ∀u, v ∈ V, ||u + v|| ≤ ||u|| + ||v||

The last property is called the triangle inequality. The second property uses the fact that R has an absolute value function on it. This is something like a norm itself, in that it has the properties that |t| ∈ R and:

1. ∀t ∈ R, |t| ≥ 0 and |t| = 0 ⇔ t = 0
2. ∀s, t ∈ R, |st| = |s| |t|
3. ∀s, t ∈ R, |s + t| ≤ |s| + |t|

These critical properties are also shared by the modulus of a complex number, which means that everything we say in this chapter works for vector spaces over C.
Definition 6.2.2. A vector space V with a norm || || is called a normed vector space.
If we have a normed vector space we automatically get a metric on it, which is a way of assigning a distance to any pair of points:
Definition 6.2.3. A metric space is a set X and a map d : X × X → R such that:
1. ∀x, y ∈ X, d(x, y) ≥ 0 and d(x, y) = 0 ⇔ x = y
2. ∀x, y ∈ X, d(x, y) = d(y, x)
3. ∀x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z)
Again, the last axiom (defining property) is called the triangle inequality. If you translate these properties into English, you get something like:

1. The distance between any pair of different points is a positive real number, and the distance of any point from itself is zero
2. The distance from here to the pub is the same as the distance from the pub to here, wherever the pub is and wherever here is
3. The distance between here and the pub can't be shortened by going via some other place.
I strenuously recommend that you do translations like this for every abstract
set of algebraic properties. You need to understand what they mean, not
just to memorise a meaningless string of symbols.
Theorem 6.2.1. A normed vector space is a metric space when we define

d(u, v) = ||u − v||

Exercise 6.2.1. Prove this. It is not difficult, and the exercise is known as axiom bashing and is good for your souls.
Exercise 6.2.2. Define an infinite sequence of points in a metric space. Define the limit of such a sequence when it exists. Hence define the term continuous for maps between possibly different metric spaces.
Exercise 6.2.3. Define a Cauchy sequence in a metric space (X, d) as a sequence of points (x_n : n ∈ N) such that

∀ε ∈ R^+, ∃N ∈ N, n, m > N ⇒ d(x_n, x_m) < ε
Show that if a sequence in X has a limit x then it is a Cauchy sequence.
Remark 6.2.1. Cauchy sequences ought to converge to limits in any respectable metric space. But the rational numbers Q form a metric space and Q has holes in it, notably √2 ∉ Q. So the obvious sequence 1, 1.4, 1.41, 1.414, … which is in Q does not have a limit in Q despite being a Cauchy sequence. This leads to the following definition:
Definition 6.2.4. A metric space (X, d) is said to be complete if every Cauchy sequence in X has a limit.
Intuitively, complete metric spaces have no holes in them; if a sequence is getting closer and closer to something then there is always something there for it to get closer to!
Definition 6.2.5. A normed vector space which is complete in the induced metric is called a Banach space.
This is worth mentioning because Banach spaces are the natural setting for doing Calculus. Everything that you did in first year in Calculus can be done in an abstract Banach space. This saves a lot of repetition when it comes to extending Calculus to nastier things than R^n.
6.3 Inner Product Spaces: Hilbert Spaces
On R^n we have the good old dot product. We want to shift this up to abstract spaces too. We shall therefore extract the critical properties of the dot product, and say anything having these properties is an inner product.
Definition 6.3.1. An inner product ⟨ , ⟩ on a real vector space V is a map

⟨ , ⟩ : V × V → R,   (u, v) ↦ ⟨u, v⟩

such that the following properties hold:

1. ∀u, v ∈ V, ⟨u, v⟩ = ⟨v, u⟩
2. ∀u ∈ V, ⟨u, u⟩ ≥ 0 and ⟨u, u⟩ = 0 ⇔ u = 0
3. ∀u, v ∈ V, ∀t ∈ R, ⟨tu, v⟩ = t⟨u, v⟩
4. ∀u, v, w ∈ V, ⟨u + v, w⟩ = ⟨u, w⟩ + ⟨v, w⟩

The last two properties are sometimes called the bilinearity properties of ⟨ , ⟩.
Exercise 6.3.1. Show that ordinary multiplication on R satisfies these axioms.
Exercise 6.3.2. Show that the dot product on R^n satisfies these axioms.
Exercise 6.3.3. Show that on C, we do not get these axioms satisfied by multiplication, but we do if we define ⟨w, z⟩ = w \bar{z} (where \bar{z} is the complex conjugate of z) and change the first axiom to: ⟨u, v⟩ is the complex conjugate of ⟨v, u⟩. We call such a thing a complex inner product.
Exercise 6.3.4. Define a complex inner product on C^n.
Exercise 6.3.5. Have another look at Exercise 4.4.1 in the light of the last exercise and define SO(2) and SO(3) for the real analogues of SU(n).
Definition 6.3.2. A real vector space V and an inner product ⟨ , ⟩ on V gives a real inner product space, (V, ⟨ , ⟩).
Theorem 6.3.1 (Cauchy-Schwarz Inequality). For any real inner product space (V, ⟨ , ⟩),

∀u, v ∈ V,   (⟨u, v⟩)^2 ≤ ⟨u, u⟩⟨v, v⟩

Proof:
For any s, t ∈ R we have that

0 ≤ ⟨su − tv, su − tv⟩
0 ≤ s^2 ⟨u, u⟩ + t^2 ⟨v, v⟩ − 2st⟨u, v⟩
2st⟨u, v⟩ ≤ s^2 ⟨u, u⟩ + t^2 ⟨v, v⟩

Putting s^2 = ⟨v, v⟩ and t^2 = ⟨u, u⟩ we have

2 √⟨u, u⟩ √⟨v, v⟩ ⟨u, v⟩ ≤ 2 ⟨u, u⟩⟨v, v⟩
⟨u, v⟩ ≤ √(⟨u, u⟩⟨v, v⟩)
(⟨u, v⟩)^2 ≤ ⟨u, u⟩⟨v, v⟩

(The last step squares both sides; if ⟨u, v⟩ happens to be negative, apply the same argument with −u in place of u.) □
Remark 6.3.1. This is not exactly an obvious result but it has important
uses as we shall see.
Theorem 6.3.2. A real inner product space is a normed space.
Proof: We have to show that we can get a norm for every v ∈ V from the inner product. Given ⟨ , ⟩ we define

||v|| = √⟨v, v⟩

It remains to show that this is a norm, that is, it satisfies the axioms.
The first property for a norm follows immediately from the second property for an inner product, and the second property for a norm follows from the third property for an inner product. This leaves the triangle inequality.
We have

||u + v||^2 = ⟨u + v, u + v⟩ = ⟨u, u⟩ + ⟨v, v⟩ + 2⟨u, v⟩ = ||u||^2 + 2⟨u, v⟩ + ||v||^2

But the Cauchy-Schwarz inequality ensures

2⟨u, v⟩ ≤ 2 ||u|| ||v||
||u + v||^2 ≤ (||u|| + ||v||)^2
||u + v|| ≤ ||u|| + ||v||

□
Remark 6.3.2. Note that the Cauchy-Schwarz inequality can now be put in the form

−1 ≤ ⟨u, v⟩ / (||u|| ||v||) ≤ 1

(for non-zero u and v), and this is precisely what we need to provide us with the cosine of the angle between the vectors u and v as seen from the origin. In other words, we can calculate the angle between any two vectors in a real inner product space.
The following exercises should give some point to all this effort:
Exercise 6.3.6. Show that the space C^0[−π, π] of continuous functions from [−π, π] to R is a vector space, and show that if we define

⟨f, g⟩ = ∫_{−π}^{π} f(t) g(t) dt

then we have an inner product on this space.
Exercise 6.3.7. Show that in this inner product space sin and cos are orthogonal.
Exercise 6.3.8. Show that in this space, sin(nx) and sin(mx) are orthogonal for distinct positive integers n, m.
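These integrals can of course be done by hand, but a quick numerical spot-check is reassuring. The following MATLAB sketch (the helper name ip is mine, and it assumes the built-in integral function) is not a proof, just a sanity check:

>> ip = @(f, g) integral(@(t) f(t).*g(t), -pi, pi);
>> ip(@sin, @cos)                      % essentially 0: sin and cos are orthogonal
>> ip(@(t) sin(2*t), @(t) sin(3*t))    % essentially 0: sin(2t) and sin(3t) are orthogonal
>> ip(@sin, @sin)                      % pi: sin is not orthogonal to itself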
Remark 6.3.3. This now means that we can do in the space C^0[−π, π] a large amount of what we can do in R^2 and R^3. Since we have geometric intuitions developed in the last two spaces, we can see a great many things as painfully obvious and hardly in need of proof. But by proving them using only the axioms for an inner product space we have also proved them in C^0[−π, π], where our geometric intuitions are not terribly well developed. This is a powerful technique for seeing the obvious in infinite dimensional spaces and explains all this obsession with abstraction. We do it because it is very, very useful. All of what is called Fourier Theory, an enormously important piece of Mathematics with a huge number of applications in Physics and Engineering, depends on this strategy.
6.3.1 Hilbert Spaces
The space C^0[−π, π] is not a complete space in the metric derived from the norm which is in turn derived from the inner product. The definition of a real Hilbert space is:
Definition 6.3.3. A Real Hilbert Space is a complete real inner product space, that is, a vector space over R with a real inner product in which every Cauchy sequence converges to a limit in the space.
There is a corresponding definition of a complex Hilbert space:
Definition 6.3.4. A Complex Hilbert Space is a complete complex inner product space, that is, a vector space over C with a complex inner product which is complete.
Remark 6.3.4. When a space is not complete it is possible to make it complete by adding additional points to force limits to all Cauchy sequences. Such a space is called a completion. If this is done to the space Q it gives R. I shall not elaborate on this because it comes under the heading of Topology rather than Algebra, which is the main point of this unit.
6.4 The Gram-Schmidt Orthogonalisation Process
The reason for abstracting to Hilbert Spaces was so that we could do in them the easy things we can do in R^2 and R^3. One of these is constructing a special basis which is orthogonal. The standard basis is orthogonal, that is, any two vectors are at right angles, or ⟨e_i, e_j⟩ = 0 if i ≠ j. In fact they are orthonormal, which is to say ⟨e_i, e_j⟩ = δ_ij, where δ_ij = 0 if i ≠ j and δ_jj = 1. But if I give you any non-zero point in R^2 we can turn it into an orthonormal basis for R^2.
Take any u ∈ R^2 − {0}. Then the first thing we do is to fix it up so that it has unit length by making

û = u / ||u||

Now we need a vector orthogonal to û (and u), so choose v not in the span of u. This is not generally orthogonal to u; in other words, ⟨u, v⟩ ≠ 0. The picture of figure 6.4.1 shows the situation.
The idea is to work out what the vector representing the line segment from q to v is, and this is the new vector orthogonal to u. Then all we have to do is to normalise it and we are done: we get û, v̂ as our basis.
[Figure 6.4.1: Getting an orthogonal vector. The unit vector û, a vector v not in its span, the projection q of v onto the line through û, and the difference v − q, which is orthogonal to û.]
The easy way to get the line from q to v is to observe that if we add it to the line from 0 to q we get v, so it is v − q. But q is the projection of v on u, which we can calculate as I shall now remind you.
Definition 6.4.1. Given two points u, v in a real inner product space, the projection of v on u is the point

(⟨v, u⟩ / ⟨u, u⟩) u

Exercise 6.4.1. Show that this works correctly in R^2 and R^3 in particular cases.
It follows that in figure 6.4.1,

q = ⟨v, û⟩ û

and

v̂ = (v − ⟨v, û⟩ û) / ||v − ⟨v, û⟩ û||

This gives us an orthonormal basis for R^2 which contains û, a unit norm vector in the span of the initial vector u. It also gives us a way of finding an orthonormal basis for any two dimensional real inner product space, since all the constructions make perfectly good sense in any such space.
In R^3 we can continue the process: having got an orthonormal basis for some plane in R^3, we simply choose a vector not in the plane and subtract off its projection on the plane. And again we can do the same in any three dimensional inner product space. And finally, we can obtain an orthogonal basis for any real inner product space by continuing to subtract off projections. We can suppose there always is a basis, and proceed to order it and
orthonormalise it just as we have been doing. In each case we have a finite dimensional subspace for which we have an orthonormal basis, and a vector not in the subspace. Then we need to subtract off the projection of this last vector on the subspace from the vector to get an orthogonal vector, and finally normalise it.
This process is called the Gram-Schmidt Orthogonalisation Process.
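For concreteness, here is a minimal MATLAB sketch of the process for the ordinary dot product on R^n (the function name gramschmidt is mine, and the columns of V are assumed linearly independent):

function Q = gramschmidt(V)
    % Columns of V go in; orthonormal columns spanning the same subspace come out.
    Q = zeros(size(V));
    for k = 1:size(V, 2)
        w = V(:, k);
        for j = 1:k-1
            w = w - (Q(:, j)' * V(:, k)) * Q(:, j);   % subtract off the projections
        end
        Q(:, k) = w / norm(w);                        % normalise what is left
    end
end

Exactly the same loop makes sense in any inner product space, once the dot product and norm are replaced by the given inner product and its induced norm.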
Exercise 6.4.2. Suppose you have a finite dimensional subspace U of a real inner product space V, an orthogonal basis B for U, and a vector v ∈ V − U. Calculate the projection of v on U and hence obtain an orthogonal basis for the space spanned by B and v.
Exercise 6.4.3. In the space T[−1, 1] of all polynomial functions from [−1, 1] to R, with the inner product ⟨p, q⟩ = ∫_{−1}^{1} p(x) q(x) dx, take the subspace spanned by (1, x, x^2) and use the Gram-Schmidt process to construct a basis for this subspace.
Chapter 7
Linear and Quadratic Maps
Because linear maps can be specified in many ways, it is tempting to use only matrices to represent them, and this can be carried to the point where we lose sight of the underlying map. So there are lots of books which focus exclusively on matrices. This can make some results look incredibly obscure: if you merely memorise formulae this may not matter, but if you need to invent formulae it makes your life impossible. This difference of perspective comes up in several areas of applications, the next section giving a good example of why it is better in general to focus on the underlying maps.
Exercise 7.0.1. Show that the identity matrix on R^5 has an infinite number of square roots, that is, matrices A such that A^2 = I. Hint: Do this by matrix manipulations only if you want to take forever. Thinking in pictures of maps in lower dimensions can save a lot of time.
7.1 Determinants
The space R^2 has a property that we have not abstracted to the general case of real vector spaces, and certainly not to vector spaces over any other field. It has a standard measure on it, the Lebesgue measure, otherwise known as area. This is a map from a collection of subsets of R^2 (those that can confidently be said to have an area) to R, which assigns to each subset its area. Calling this map A, we have A([0, 1] × [0, 1]) = 1. We believe that the subset [0, 1] × [0, 1], usually known as the unit square, actually has an area and that the area is in fact 1. If we define D^2 = {x ∈ R^2 : ||x|| ≤ 1} and call it the unit disc, we usually agree that A(D^2) = π.
It should be obvious that none of this makes any sense in an abstract real vector space, but you might reasonably feel it ought to. It doesn't make sense in a Banach space or a Hilbert space either, but we could make it make sense by defining what we mean by a general measure. This ought to generalise the idea of length in R, area in R^2, and volume in R^3. If I were to do this, you would then be in a good position to make sense of probability theory, which uses these ideas extensively.
I leave this as a subject for the reflective student to investigate. You can try making up some axioms for measures and see whether you can define them on abstract spaces and see when you can make them make sense. This is good clean innocent fun and keeps you from corrupting your minds by hanging around street corners or watching television.
Exercise 7.1.1. A Probability Measure on a space X is a measure P with the property that P(X) = 1. Obtain a probability measure on R. It has to assign some non-negative real number to every interval (a, b) ⊆ R.
Instead of defining area properly, I am going to pretend we know what area is, when we are merely used to knowing how to find the areas of particular sets, a rather different thing.
A linear map f from R^2 to R^2 will take subsets with an area to other subsets with a usually different area. In particular, if I define the unit interval and the unit square by I ≡ [0, 1] ⊂ R and I^2 ≡ I × I ⊂ R^2, I can ask what a linear map does to the unit square and ask what the area of the image of the unit square is.
It is fairly obvious that the image is usually a parallelogram, although it may degenerate into a line segment in particular cases. Figure 7.1.1 shows the image of the interval I_x = {(x, y)^T ∈ R^2 : 0 ≤ x ≤ 1, y = 0} and labels it a, and also the image of I_y = {(x, y)^T ∈ R^2 : x = 0, 0 ≤ y ≤ 1}, which is labelled b. I shall also confuse a with f(e_1) and b with f(e_2) in the horribly sloppy way you are used to.
I have written the coordinates of a and b underneath: in this form it is the matrix for f with respect to the standard basis.
The interior of the parallelogram with vertices at 0, a, b, a + b then has some definite area, which is a property of f. What makes this of interest is that it is some number which we can interpret as the area stretching factor of f, because if I had started with a smaller or bigger square than I^2, the result would have been a smaller or larger parallelogram in the same proportion. Nor would it matter where the original square was. Moreover, if we assume that we can get estimates for the area of some other shape such as a disc by covering it with smaller and smaller squares which are disjoint and adding up the areas of these, getting the right answer as the limit of this process, then it follows that the area of the new shape after it has been stretched by
f will also be in the same proportion. So there is a number A(f) which tells us how much f stretches (or compresses) the area of any subset of R^2 that has an area.

[Figure 7.1.1: Image of the unit square by a linear map: the parallelogram spanned by a = f(e_1) = (a_1, a_2)^T and b = f(e_2) = (b_1, b_2)^T, with the coordinates written underneath.]

It would be nice to be able to compute this number, and it does not look to be too hard to do it.
First I chop off the dotted pink triangle shown and move it to the green one, to get a different parallelogram with the same area, and underneath it I write the new vector pair. This transforms the original diagram to a simpler one, but leaves the area unchanged. I show this process in figure 7.1.2.
The new pair of vectors was obtained from the old by subtracting a multiple of b from a, and the multiple is chosen so as to make the y coordinate of the new a zero. This gives:

\[ \begin{pmatrix} a_1 - \frac{a_2}{b_2} b_1 & b_1 \\ a_2 - \frac{a_2}{b_2} b_2 & b_2 \end{pmatrix} = \begin{pmatrix} a_1 - \frac{a_2}{b_2} b_1 & b_1 \\ 0 & b_2 \end{pmatrix} \]

I suggest you write this below the diagram and check that it does what I claim.
Finally I chop off the green triangle and move it to the blue triangle of figure 7.1.3 to get the new pair of vectors:

\[ \begin{pmatrix} a_1 - \frac{a_2}{b_2} b_1 & 0 \\ 0 & b_2 \end{pmatrix} \]

This subtracts from b a suitable multiple of a' to give b'.
It is now easy to see that we have transformed the original parallelogram
into a rectangle of width a_1 − (a_2/b_2) b_1 and height b_2. We multiply these to get the area, which is therefore a_1 b_2 − a_2 b_1.

[Figure 7.1.2: Transformed image of the unit square by a linear map. New vectors: a' = (a_1 − (a_2/b_2) b_1, 0)^T and b = (b_1, b_2)^T.]

[Figure 7.1.3: Further transformed image of the unit square by a linear map. New vectors: a' = (a_1 − (a_2/b_2) b_1, 0)^T and b' = (0, b_2)^T.]

I shall call this value det(f), since it is clearly a property of the map f, and refer to it as the determinant of f.
There is one feature of this which is a bit puzzling, namely it can be negative! Areas cannot; one of the axioms we would need if we were going to say what an abstract measure is would say that areas cannot be negative. What happens can be seen by taking some fixed b and rotating a until it crosses over. When it is along the same line, the area is zero, which it should be, but on the other side it is negative. The sign is telling us something. If we imagine the unit square painted a lurid pink on the top side and blue on the other, then the map corresponding to negative area would be one which had flipped the square over as well as deforming it into a parallelogram, thus giving us a blue parallelogram instead of a lurid pink one.
So the determinant gives us an oriented area, and if we want only the area we should simply take the absolute value. In fact the orientation is important information, so we keep it in the definition of the determinant.
Note that the only properties of area which we used were first that the areas of disjoint subsets sum to give the area of the whole, second that if we simply shift a region it does not change the area, and finally that the area of a rectangle is the height multiplied by the width. These all look reasonable and ought to generalise to volume and indeed higher dimensions than three.¹
It is immediate that for f : R^3 → R^3 there is a similar function det which again assigns a number to the linear map f, which is the (oriented) volume of the image of the unit cube in R^3. And it is true (and easy to believe) that there is a determinant function det : L(R^n, R^n) → R which generalises it. We give the formula for the n × n case for completeness. First a definition:
Definition 7.1.1. If A is an n × n matrix then the cofactor of the term a^i_j, where i indexes the rows and j the columns, is the matrix obtained by removing row i and column j, and is called A^i_j.
Then the number det(A), usually written |A|, is the sum

\[ \sum_{j \in [1:n]} a^1_j (-1)^{1+j} \det(A^1_j) \]
We could have chosen in place of the first row any row, say the i-th, and written

\[ \det(A) = \sum_{j \in [1:n]} a^i_j (-1)^{i+j} \det(A^i_j) \]

¹ But see the Banach-Tarski paradox! Try some googling here.
In fact we could also have interchanged the rows and columns and got the same answer; all versions, however, involve reducing the determinant to a sum of n determinants of size n − 1. This means that it can take a long time to calculate the determinant of a big matrix by hand. Some thought shows that we are taking every set of choices of one element from each column that aren't in the same row and multiplying them together, weighting by a sign of ±1, then summing. It makes the determinant an alternating tensor, a remark which may mean rather more to you some day than it does today.
Example 7.1.1.

\[ \begin{vmatrix} 1 & 3 & 2 \\ 4 & 1 & 0 \\ 2 & 1 & 1 \end{vmatrix} = 1 \begin{vmatrix} 1 & 0 \\ 1 & 1 \end{vmatrix} - 3 \begin{vmatrix} 4 & 0 \\ 2 & 1 \end{vmatrix} + 2 \begin{vmatrix} 4 & 1 \\ 2 & 1 \end{vmatrix} = 1 \cdot 1 - 3 \cdot 4 + 2 \cdot (4 - 2) = 1 - 12 + 4 = -7 \]
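If you want a machine check of this kind of expansion, MATLAB's det function does it directly (the 2 × 2 example below is my own, chosen to illustrate the a_1 b_2 − a_2 b_1 formula derived earlier):

>> det([1 3 2; 4 1 0; 2 1 1])     % -7, as in the hand expansion above
>> det([2 7; 3 5])                % 2*5 - 3*7 = -11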
Exercise 7.1.2. Provide a plausible argument to show that if f, g : R^2 → R^2 are linear maps, then

det(f ∘ g) = det(f) det(g)

and

det(f^{-1}) = 1 / det(f)

when f^{-1} exists.
Deduce that f : R^n → R^n has an inverse iff det(f) ≠ 0.
Exercise 7.1.3. If we change the basis of R^2 and represent the map f with respect to the new basis (on both domain and codomain), show that this cannot change the determinant.
The following is not immediately obvious from our definition but can be proved (see the text book):
Theorem 7.1.1. ∀A ∈ M_{n,n}(R), det(A) = det(A^T)
where the transpose A^T of the matrix A is obtained by swapping rows and columns.
7.2 Orthogonal Maps
In this section we discuss the rotations and reections which you will recall
from the rst lecture, which you probably think was about oranges.
I start off in R^2, which is where life is simplest, and I use the standard inner or dot product on that space. I shall however take care to use only arguments and ideas which make sense on R^n and indeed on any inner product space. The idea is: think R^2 but write inner product spaces V.
I want to restrict myself to maps which keep all distances unchanged, so I define an orthogonal map f as a linear map from an inner product space V to V which satisfies

∀u, v ∈ V,   ⟨f(u), f(v)⟩ = ⟨u, v⟩

I have framed this so that it makes sense on any inner product space. For real inner product spaces the term is orthogonal, and for complex inner products we use the term unitary. Attending for the time being only to the real case, and indeed focussing on R^2, a number of things follow.
First, an orthogonal map must preserve distances from the origin, by putting u = v. That is, ||u|| = ||f(u)|| for every u ∈ V. It must also therefore preserve all distances, since the distance of u from v is ||u − v||.
Second, the composite of any two orthogonal maps is also orthogonal, since if f and g are orthogonal,

∀u, v ∈ V,   ⟨g(f(u)), g(f(v))⟩ = ⟨f(u), f(v)⟩ = ⟨u, v⟩

Third, every orthogonal map is a linear bijection and hence has a linear inverse. This follows immediately from the first property, since no distinct points can be confounded by f (it would change the distance), so f is 1-1, and since it is 1-1 on a basis and the dimension is fixed, the image of a basis is a basis. I leave you to complete the argument.
Fourth, the linear inverse is also orthogonal. If you suppose f(u') = u and f(v') = v then the logic is very obvious and you should make sure you can do it.
Exercise 7.2.1. Show that a matrix A represents an orthogonal map iff A^T = A^{-1}.
Since the composition of maps is associative and the identity map is certainly orthogonal, it follows that the orthogonal maps from any inner product space to itself form a group. The same holds for complex inner products.
Definition 7.2.1. The group of all orthogonal maps from R^n to R^n is called the orthogonal group and is written O(n). The set of all unitary maps from C^n to itself is called the Unitary group and is written U(n).
These groups are extremely important in Physics, and O(n) is important in Engineering.
[Figure 7.2.1: An orthogonal map from R^2 to R^2: f(e_1) lies on the unit circle, and the two possible positions for f(e_2) are marked.]
Both of these groups are what are known as Lie groups² because they are not discrete things like the finite groups but are continuous. They are actually manifolds, the generalisation of curves and surfaces. I shall show shortly that O(2) is actually a pair of circles! The group O(3) is a three dimensional space which is a bit harder to visualise.
To set about visualising O(2) it is a good idea to look at matrices representing the linear maps. Suppose we take the basis element e_1 and ask where it goes to under an orthogonal map f. We know that ||e_1|| = 1, so f(e_1) must lie on the unit circle.
Definition 7.2.2.

S^1 ≡ {x ∈ R^2 : ||x|| = 1}

I have drawn a picture of this in figure 7.2.1.
Now given that f(e_2) must also lie on the unit circle, for the same reason, and that it has to stay at the same distance from f(e_1) as e_1 was from e_2, a distance of √2 if we believe Pythagoras' theorem, there are only two possible places for it to be, indicated by the two blue dots.
If we specify the position of f(e_1) by the angle θ then we have

\[ f(e_1) = \begin{pmatrix} \cos(\theta) \\ \sin(\theta) \end{pmatrix} \]

and the location of f(e_2) is either

\[ \begin{pmatrix} -\sin(\theta) \\ \cos(\theta) \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} \sin(\theta) \\ -\cos(\theta) \end{pmatrix} \]

This ensures that since the two basis vectors had a dot product of zero to start with, they still had one afterwards.

² Lie rhymes with flea, not with fly. If you learn nothing else from the course at least you can sound educated if you know this.

It follows that there are two sorts
of orthogonal maps from R^2 to R^2: for every θ ∈ [0, 2π) we have those with matrix

\[ \begin{pmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{pmatrix} \]

and those with matrix

\[ \begin{pmatrix} \cos(\theta) & \sin(\theta) \\ \sin(\theta) & -\cos(\theta) \end{pmatrix} \]

Since θ is an angle, we have one such matrix for every θ, justifying my claim that O(2) is two circles. The orthogonal group actually comes in two parts; it is disconnected.
A little thought shows that the first of these is a rotation by the angle θ and the second is a reflection in the line through the origin at angle θ/2.
Exercise 7.2.2. Verify this claim.
Looking at these two matrices we note that the columns have norm 1 and the inner product between the two is zero. The transpose of the matrix then has the property A^T A = I. It is easy to see that it follows from this that AA^T = AA^T AA^{-1} = AA^{-1} = I, and hence A^T = A^{-1}. This has to work for any orthogonal matrix regarded as a linear map from R^n to R^n.
Exercise 7.2.3. Justify this claim. State it as a theorem and prove it.
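A numerical spot-check of these facts in MATLAB (no substitute for the proof asked for above; the random angle is my choice of test case):

>> theta = 2*pi*rand;
>> R = [cos(theta) -sin(theta); sin(theta)  cos(theta)];   % rotation
>> S = [cos(theta)  sin(theta); sin(theta) -cos(theta)];   % reflection
>> R'*R, S'*S              % both print the identity, up to rounding error
>> [det(R) det(S)]         % prints [1 -1]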
Since det(A) = det(A^T) it follows that when A is orthogonal, (det(A))^2 = 1. This tells us that there are two sorts of orthogonal matrix, those with determinant 1 and those with determinant −1. A quick check shows that in R^2 the determinant 1 matrices are the rotations and the determinant −1 orthogonal matrices are the reflections.
It follows that there is a subgroup of O(n) consisting of those orthogonal maps having determinant one, since the subset is closed under multiplication and inversion. Note that the reflections are not closed under multiplication and are therefore not a subgroup.
Definition 7.2.3. The subgroup of O(n) having determinant 1 is called the Special Orthogonal Group and is written SO(n).
Exercise 7.2.4. Verify that this also holds in R^3: there are two possible values of the determinant, giving two sorts of orthogonal map.
Exercise 7.2.5. Show that SO(2) is a subgroup of SO(3) in more than one way. How many ways?
Exercise 7.2.6. Show that SO(n) is a subgroup of SO(n + 1) in more than one way.
The rotations of R^2 therefore form a group, SO(2), which looks pretty much like the unit circle except that the points in it are 2 × 2 matrices instead of little dots. This is a bit like noting that the space F[a, b] is a vector space and pretending that f(x) = x^2 is a little dot in this space. This is a bit mind boggling the first time you meet it, but all it really amounts to is throwing away all the details of what things actually are and keeping only the relationships between them. We can think of S^1 as a space of points in the plane, or a collection of angles which we can add. Or we can think of it as complex numbers of modulus 1 which we can multiply. Or we can think of it as the set of special orthogonal 2 × 2 matrices. I hope you are beginning to grasp the idea of an isomorphism, because the angles form a group, the complex numbers form a group, and the special orthogonal matrices form a group. And these three groups are isomorphic. Which is to say, they are basically the same, but we may find it convenient to change the names.
If you don't find this sort of thing shocking the first time you meet it you haven't got the point. Like roller coasters, you can get to enjoy the sensation.
Remark 7.2.1. It is premature to call the elements of SO(3) rotations, since it is not entirely obvious that each element of SO(3) has some fixed axis around which the rotation takes place. Once we have proved this we shall have answered the first two questions of the introduction and have finished with oranges.
7.3 Quadratic Forms on R^n
Although we have focussed on matrices as representing linear maps, they can also be used to represent quadratic functions on R^n. In the simple case where n = 2 a quadratic function looks like ax^2 + 2bxy + cy^2 + dx + ey + f. We are interested in the general shape of the graph, so we first ignore the constant term f, which simply moves the graph up or down. So I shall put f = 0.
I can next do a bit of rewriting to get it as

a(x − u)^2 + 2b(x − u)(y − v) + c(y − v)^2 + f'

for some suitably chosen u and v, and I shall throw away the f' again and note that this is ax^2 + 2bxy + cy^2 except for a shift of coordinates. And shifting the graph doesn't change its shape either.
It suffices to look only at this case, then, if we wish to understand the qualitative properties of quadratic functions to R. In this form they are known as quadratic forms, for historical reasons. Such a function has a certain symmetry about the origin, since changing x to −x and y to −y leaves the value unchanged.

[Figure 7.3.1: The graph of the quadratic form 2x^2 + 3xy + 3y^2.]
We can represent

ax^2 + 2bxy + cy^2     (7.3.1)

as a matrix by writing it as

\[ \begin{pmatrix} x & y \end{pmatrix} \begin{pmatrix} a & b \\ b & c \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \qquad (7.3.2) \]

which explains why we stuck the 2 in the first equation.
This matrix is symmetric (A = A^T) and, provided we mentally insert the [x y] and its transpose in the right places, we can identify the quadratic form with the matrix.
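As a small illustration of this identification (the numbers here are mine), the form 2x^2 + 3xy + 3y^2 corresponds to the symmetric matrix with a = 2, b = 1.5, c = 3, and the two ways of evaluating it agree:

>> Q = [2 1.5; 1.5 3];
>> q = @(x, y) [x y] * Q * [x; y];
>> q(1, 2) - (2*1^2 + 3*1*2 + 3*2^2)     % prints 0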
The graph of a quadratic form on R^2 is going to be a surface, and there are not really very many possibilities for the general shape of the surfaces we can get.
Figure 7.3.1 shows a MATLAB plot of a particular case, obtained by entering into MATLAB the text:
>> [X,Y] = meshgrid([-2:0.1:2]);
>> Z= 2*X.*X + 3* X.*Y + 3 *Y.*Y;
>> meshc(X,Y,Z)
[Figure 7.3.2: The graph of the quadratic form 2x^2 − 9xy + 3y^2.]
If we change the function to
Z= 2*X.*X - 9*X.*Y + 3*Y.*Y;
that is, f(x, y) = 2x^2 − 9xy + 3y^2, we get the qualitatively different graph shown in figure 7.3.2.
Underneath the graph you can see some contour curves. For the first quadratic form they are all ellipses and for the second they are hyperbolae.
I recommend (if you haven't met MATLAB) that you go to the Mathematics Computer Lab and get it running and then type in the quoted text. The pictures you get can be rotated into different positions, giving a good idea of the shape of the surface.
The last type of quadratic surface is the first one upside down, and figure 7.3.3 shows the MATLAB graph of
Z= -4*X.*X - 3*X.*Y - 3*Y.*Y;
There is one final type of surface called a degenerate form, and the simplest example is f(x, y) = x^2. This has as graph a parabolic trough, as shown in figure 7.3.4. The degenerate forms are a sort of boundary between the parabolic and hyperbolic types, and have measure zero in the space of quadratic forms (!), meaning that the chances of getting one if you choose random coefficients are negligible.
[Figure 7.3.3: The graph of the quadratic form −4x^2 − 3xy − 3y^2.]
Exercise 7.3.1. Describe the space of quadratic forms on R^2.
The four types are essentially the only four kinds of surface that you can get from a quadratic form, up to rotation and scaling. They are either paraboloids one way up as in f(x, y) = x^2 + y^2, or paraboloids the other way up as in f(x, y) = −x^2 − y^2, or hyperboloids as in f(x, y) = x^2 − y^2, or degenerate parabolic troughs. If we take any of these four templates and scale the x and y axes by different amounts and rotate the axes, then we can get any other quadratic form. This is obviously an important result which generalises to quadratic forms on R^n. It all comes from change of basis ideas, which explains why we spent so much time on that.
Looking at the matrices for our templates we see that we have:

\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \quad \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \]

as the four standard forms, to one of which we can reduce any symmetric matrix

\[ \begin{pmatrix} a & b \\ b & c \end{pmatrix} \]

by a change of basis! This is the claim, an important one, telling us a lot about quadratic forms on R^2 and indeed on R^n. This amount of simplification and classification makes life very much easier, and given that life is complicated enough, such a classification result is well worth proving.
[Figure 7.3.4: The graph of the degenerate quadratic form x^2.]
It would also be nice to be able to extract the change of basis matrix which does the job. The procedure for getting it solves a whole raft of other apparently unrelated problems at the same time.
First some definitions to add clarity:
Definition 7.3.1. A quadratic form f(x) (and the associated symmetric matrix) is said to be positive definite iff f(x) ≥ 0 and f(x) = 0 ⇔ x = 0.
It should be obvious that the form x^2 + y^2 is an example. It should also be obvious that rotating the space will not stop a positive definite function being positive definite, neither will reflecting it, so it follows that if a matrix A is positive definite, so is QAQ^{-1} for orthogonal Q.
Definition 7.3.2. A quadratic form f(x) is negative definite iff −f is positive definite.
It is obvious that the remark about invariance under similarity by an orthogonal matrix applies also to negative definite matrices. It is clear that we have characterised the matrices

\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} \]

as being the archetypes for positive definite and negative definite quadratic forms.
This leaves the other forms, which are called indefinite, and which will be positive in some regions and negative in others; they are characterised by the matrix

\[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

together with the degenerate forms, which I shall largely ignore.
Exercise 7.3.2. Show that f(x, y) = x^2 − 2xy + y^2 is a degenerate form and find a basis in which it has as matrix the fourth standard form.
Now I note that the standard form matrices above are all diagonal, that is, the elements x_{ij} with i ≠ j are zero. The other property they have is that the diagonal elements are ±1, except for the degenerate case where one is zero.
If we have a diagonal matrix

\[ \begin{pmatrix} a & 0 \\ 0 & c \end{pmatrix} \]

then it is easy to see that we can reduce it to one of the standard forms by simply rescaling x and y: if ax^2 + cy^2 is the quadratic form, and both a, c are positive, replace x by X = √a x and y by Y = √c y and we get X^2 + Y^2, which is in standard form. If either a or c is zero then we cannot do this and the form is degenerate. If both are zero we don't have a quadratic form at all.
Exercise 7.3.3. Do this for the case when one or both of a, c is negative.
So the key question is: can we diagonalise a symmetric matrix by choosing a suitable basis?
Exercise 7.3.4. Do this by assuming that there is a rotation matrix Q so that the symmetric matrix A has QAQ^{-1} = D where D is diagonal, and solving for Q.
There is a clunky way of doing this (see the last exercise!) which does not generalise to higher dimensions than two without colossal effort, and there is a smart way which I shall now show you.
7.4 Diagonalising Symmetric Matrices
We look at examples in dimension two but try to use methods which will
generalise to higher dimensions.
Suppose we first assume that the matrix A is positive definite and that there is some rotation which will produce a new basis in which the matrix is diagonal. Then it will look, in this coordinate system, like ax^2 + cy^2 for some positive a, c. Thinking of this as a linear map, the matrix

\[ A' = \begin{pmatrix} a & 0 \\ 0 & c \end{pmatrix} \]

has the property that it stretches everything along the x axis by a factor of a and along the y axis by a factor of c. For the original matrix A therefore there are two orthogonal directions (the new basis vectors determine them) so that if we choose a vector, u, along the first, A(u) = au, and if we choose a vector, v, along the second, A(v) = cv. The trick is to find these directions, particularly when we don't know in advance what a and c are.
Definition 7.4.1. A vector u ∈ R^n such that ∃λ ∈ R, A(u) = λu is called an eigenvector of A for λ. The number λ is called an eigenvalue for A. The set of such eigenvectors for a fixed eigenvalue is called the eigenspace for that eigenvalue; it is a line through the origin when the eigenvalues are all different. When two eigenvalues are the same and the rest are different, if there are two linearly independent eigenvectors with that eigenvalue, it will be a plane through the origin. As we shall see, it is possible to have two eigenvalues the same without having two corresponding eigenlines determining an eigenplane.
So the problem is to compute the eigenvectors and eigenvalues for our matrix A, which we know to be symmetric.
We can write Au = λu as (A − λI_n)u = 0, where I_n is the identity matrix on R^n. If the matrix A − λI_n were invertible, the only solution to this equation would be u = 0, so we know that if there is a suitable eigenvector, A − λI_n is not invertible. From which it follows that it has zero for its determinant. This tells us that det(A − λI_n) = 0. Let us look at this in the case where n = 2.
We get

\[ \begin{pmatrix} a & b \\ b & c \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]

for the matrix, and this reduces to

\[ \begin{pmatrix} a - \lambda & b \\ b & c - \lambda \end{pmatrix} \]

The determinant of this is the equation

ac + λ^2 − λa − λc − b^2 = λ^2 − (a + c)λ + ac − b^2

a quadratic in λ. And we have that this is zero, which we know gives us either two different real roots (the eigenvalues) or a complex conjugate pair. For the cases we are studying we have good reason to expect the first alternative.
This gives us the eigenvalues. By substituting in the matrix equation

(A − λI_2)u = 0

for each of the known values of λ we can extract a suitable u.
Example 7.4.1. Take

\[ A = \begin{pmatrix} 2 & 1 \\ 1 & 4 \end{pmatrix} \]

Then

\[ A - \lambda I_2 = \begin{pmatrix} 2 - \lambda & 1 \\ 1 & 4 - \lambda \end{pmatrix} \]

and the determinant is

(2 − λ)(4 − λ) − 1 = λ^2 − 6λ + 7

Solving the quadratic we get λ = 3 ± √2 ≈ 1.586, 4.414.
Putting the first of these back in the matrix equation we get

\[ \begin{pmatrix} 2 - (3 - \sqrt{2}) & 1 \\ 1 & 4 - (3 - \sqrt{2}) \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

or

\[ \begin{pmatrix} \sqrt{2} - 1 & 1 \\ 1 & \sqrt{2} + 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

This gives two equations, y = (1 − √2)x and x = −(√2 + 1)y, and these are the same equation, a useful check on whether you have done the sums right. This is a line, and we can choose a basis vector along it:

\[ \begin{pmatrix} 1 \\ 1 - \sqrt{2} \end{pmatrix} \]

The norm of this is √(4 − 2√2), and dividing by the norm to make this a unit length vector we get

\[ \begin{pmatrix} \frac{1}{\sqrt{4 - 2\sqrt{2}}} \\ \frac{1 - \sqrt{2}}{\sqrt{4 - 2\sqrt{2}}} \end{pmatrix} \]

The other line comes out as y = (1 + √2)x, and we note that the product of the gradients is −1; or alternatively the vectors

\[ \begin{pmatrix} 1 \\ 1 - \sqrt{2} \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 + \sqrt{2} \end{pmatrix} \]

have a dot product of zero. The norm of the second is √(4 + 2√2). This tells us that the matrix

\[ Q = \begin{pmatrix} \frac{1}{\sqrt{4 - 2\sqrt{2}}} & \frac{1}{\sqrt{4 + 2\sqrt{2}}} \\ \frac{1 - \sqrt{2}}{\sqrt{4 - 2\sqrt{2}}} & \frac{1 + \sqrt{2}}{\sqrt{4 + 2\sqrt{2}}} \end{pmatrix} \]

is an orthogonal matrix which should have determinant one, which I shall leave you to confirm. Note that by choosing the negative of one of the basis vectors we should have got determinant −1, so be careful to choose correctly! Then Q is a rotation matrix which changes the basis, and you should verify that

Q^T A Q = D

where D is the diagonal matrix of eigenvalues. I recommend you use Mathematica and be prepared to use the Simplify[%] command a lot.
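If you prefer MATLAB, its eig function finds the eigenvalues and orthonormal eigenvectors of a symmetric matrix in one go (a numerical rather than an exact check):

>> A = [2 1; 1 4];
>> [Q, D] = eig(A)       % D is diag(3 - sqrt(2), 3 + sqrt(2)), about 1.586 and 4.414
>> Q' * A * Q            % reproduces D, confirming the diagonalisation
>> det(Q)                % +1 or -1; negate one column of Q if you want a rotation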
Remark 7.4.1. Most examples are better chosen to have easier numbers, so the sums won't be as bad when you get a question in an examination asking you to diagonalise a matrix!
Definition 7.4.2. The polynomial det(A − λI_n), of degree n for an n × n matrix A, is called the characteristic polynomial for A.
Exercise 7.4.1. Show that the characteristic polynomial is a property of the underlying linear map, not just of the matrix representation.
Exercise 7.4.2. Hence show that the sum of the diagonal elements of any square matrix, the trace of the matrix, is also invariant under change of basis and is hence, like the determinant, a property of the operator.
Remark 7.4.2. It is a curious fact (the Cayley-Hamilton Theorem) that if the characteristic polynomial for A is a_0 + a_1 x + a_2 x^2 + ⋯ + a_n x^n, then replacing x by A always gives

a_0 I_n + a_1 A + a_2 A^2 + ⋯ + a_n A^n = 0

You will find a proof in Hirsch and Smale's Differential Equations, Dynamical Systems and Linear Algebra, which is recommended to the brightest students.
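You can watch the Cayley-Hamilton Theorem happen numerically in MATLAB; poly(A) returns the coefficients of the characteristic polynomial, highest power first (the random 3 × 3 matrix is just my choice of test case):

>> A = randn(3);
>> p = poly(A);                                   % coefficients of the characteristic polynomial
>> p(1)*A^3 + p(2)*A^2 + p(3)*A + p(4)*eye(3)     % the zero matrix, up to rounding error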
It is also a fact that for a symmetric real matrix, the eigenvalues all exist and
are real. In this case the matrix is said to be diagonalisable and the above
process is guaranteed to do it. In fact you can always choose the matrix Q
to be a special orthogonal matrix, just as in the example above. In general
there are plenty of non-symmetric matrices which can be diagonalised, but
not by orthogonal matrices. And there are plenty of matrices which are not
diagonalisable with real entries:
Example 7.4.2.

\[ \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \]

has characteristic polynomial (in x, because λ takes too long to type)

\[ \begin{vmatrix} 0 - x & 1 \\ -1 & 0 - x \end{vmatrix} = x^2 + 1 \]

which has imaginary roots i and −i.
On the other hand

\[ \begin{pmatrix} 5 & -3 \\ 6 & -4 \end{pmatrix} \]

has characteristic polynomial

\[ \begin{vmatrix} 5 - x & -3 \\ 6 & -4 - x \end{vmatrix} = x^2 - x - 2 \]

which has two perfectly good real roots, 2 and −1. If you look at the meaning of a negative eigenvalue it is fairly obvious, and it means that the determinant is negative. Note that the determinant of the original matrix must be the determinant of Q^T A Q = D, which is the product of the eigenvalues. This makes a useful check on whether you have got the eigenvalues right.
We shall see later that there are matrices which are not diagonalisable at all! If you compute the eigenvectors you should find that they are no longer orthogonal. This is the best thing that can happen when you try to diagonalise a matrix that isn't symmetric. The worst thing is that it can't be done!
7.5 Classifying Quadratic Forms
Although I haven't proved it, it is the case that every symmetric matrix has an orthogonal basis in which it is diagonal, and it follows that every quadratic form can, in this new basis, be represented as ax^2 + cy^2. If we now do a stretching by the factor 1/√|a| on the x axis, and 1/√|c| on the y axis, we reduce it further to one of the forms x^2 + y^2 or x^2 − y^2 or −x^2 + y^2 or −x^2 − y^2. The only possible obstruction to this is if a or c is zero. They cannot both be zero or we don't have a quadratic form at all. The two cases x^2 − y^2 and −x^2 + y^2 are not different because we can rotate one into the other. Similarly the case f(x, y) = y^2 is just f(x, y) = x^2 rotated by π/2. So there are qualitatively three types of non-degenerate quadratic form on R^2 and one degenerate type. This was the original claim.
On R^n we can do exactly the same thing: the matrix can be diagonalised and then the diagonal terms made ±1. We can rotate axes so that we can place all the +1s at the top and the −1s at the bottom: swapping axes does not give us anything new, so the only question is where the changeover occurs. We could have all n being +1, or all but the last being +1 and the last −1, and so on. The number of +1s can range from n to zero, and there are therefore n + 1 non-degenerate quadratic forms on R^n up to some rotating and scaling. I leave you to count the possible degenerate forms.
Remark 7.5.1. This solves the problem of specifying the shapes of the graphs of quadratic functions on R^2 and also classifies all quadratic functions on R^n. Those with n +1s in the final matrix are positive definite. If you can persuade yourself that it is a good idea to be able to classify these things, then you will also be persuaded that choosing a good basis for a map has a lot going for it.
Remark 7.5.2. Being able to diagonalise a matrix by finding a new basis to represent the underlying linear map has applications to more than just quadratic forms. Exponentiating a diagonal matrix is rather easy, and so this has implications for solving systems of ODEs, as indicated in Chapter one.
Exercise 7.5.1. Show that a symmetric positive definite 2 × 2 matrix having both eigenvalues the same is a scalar multiple of the identity.
7.6 Multivariate Gaussian (Normal) distributions
In figure 7.6.1 I show the gaussian distribution on R^2, up to scaling by a constant to make the volume under the surface equal to 1. The form is not very general, because I have given the simplest one. You will note that it is the function

f(x, y) = exp(−(x^2 + y^2)/2)

The general form is

f(x) = K exp(−(x − a)^T C^{-1} (x − a) / 2)

which we can write as

f(x) = K exp(−q(x)/2)

where q is a quadratic form on R^n.
The constant K is chosen so that ∫_{R^n} f = 1.
The matrix C in the quadratic form is a parameter specifying the generalisation of the variance, and the vector a tells us where the centre of symmetry or highest value of the function occurs. The one dimensional version is

f(x) = (1 / (σ√(2π))) exp(−(x − a)^2 / (2σ^2))

[Figure 7.6.1: The graph of the Normal or Gaussian Distribution (more or less): the surface exp(−(x^2 + y^2)/2).]
The (covariance) matrix C is positive definite and symmetric, and it is easy to see that the inverse must be too.
Exercise 7.6.1. Prove that a positive definite symmetric matrix must have a positive definite symmetric inverse.
Exercise 7.6.2. Prove that a positive definite symmetric matrix has a unique positive definite symmetric square root.
Exercise 7.6.3. Prove that the contours of a gaussian function on R^2, that is, the sets {x ∈ R^2 : f(x) = c} for different c ∈ R^+, are ellipses.
Exercise 7.6.4. Characterise the surfaces having constant value of the gaussian function on R^3.
Given a cluster of data points in R^2 and a desire to model it by a normal distribution, we have to find the most sensible place to put a gaussian function over the data: the maximum likelihood method of making the best choice evaluates, for each possible a and C, the function over each data point and multiplies the answers together. This gives a value for each choice of parameters, called the likelihood of that choice. It is not difficult to prove that this is a maximum when a is at the centroid of the data, and when C has term c_{ij} being the average value of (x_j − a_j)(x_i − a_i). This is clearly symmetric, and it is easy to show the matrix is positive definite. The calculation in terms of the data points is very simple.
Exercise 7.6.5. Write a set of N data points in R^2 as a 2 × N matrix, D. Express the centroid of the data set as the product of some particular matrix by D. Express the centralised data set, which has the centroid subtracted from each point, in matrix terms. Call this new matrix B. Show that (BB^T)/N gives the covariance matrix. Show that for any invertible matrix A, AA^T is symmetric and positive definite. Deduce that the covariance matrix is always positive definite except when it has determinant zero, and explain what this exceptional case tells us about the data.
Exercise 7.6.6. Confirm that this all works for data sets in R^n.
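Here is a small numerical illustration of the recipe in Exercise 7.6.5 (the data values are invented purely for the demonstration):

>> D = [1 2 4 5 8; 1 3 2 7 9];     % 2 x N matrix of data points, one point per column
>> N = size(D, 2);
>> a = D * ones(N, 1) / N          % the centroid, as a matrix product
>> B = D - a * ones(1, N);         % the centralised data
>> C = B * B' / N                  % the covariance matrix: symmetric and positive (semi)definite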
Since modelling by normal distributions or by mixtures of normal distribu-
tions is very popular, this is quite important. It is often required to integrate
a gaussian function with some not very nice covariance matrix, and it is usu-
ally easier to transform back into the spherically symmetric form and do the
integration there.
Exercise 7.6.7. Suppose you have a bivariate normal distribution (that is, a gaussian on R^2) and you draw a straight line in R^2. Consider the section of the bivariate normal over the line. This is a function of one variable. Show that, up to a scaling constant, this too is a gaussian function. Now show that for a multivariate gaussian on R^n and any non-trivial affine (linear plus a shift) subspace of R^n, the restriction of the function to the affine subspace is, up to a multiplicative constant, another multivariate gaussian on the subspace.
Remark 7.6.1. The above problem was tackled by some postgraduate students once in Engineering for severely practical purposes. They screwed up. I solved it in seconds. See if you can too. Hint: write the new function as a composite and observe that the composite of a quadratic function and an affine function is quadratic.
7.7 Eigenvalues and Eigenspaces
The situation for non-symmetric matrices is rather different and can give some unpleasant surprises to the person who is used to the symmetric case. One of the problems is dealing with complex eigenvalues; another is dealing with the case where there are two eigenvalues which are the same. We look at the second case.
The ring of matrices has some properties which make it very different from fields or from the ring of integers or the ring of polynomials. For example, if A^n = 0 in the integers, then A = 0. This is false in the case of matrices. For example:

\[ A = \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} \]

Then it is simple to confirm that A^2 = 0, where 0 is the matrix with every element zero.
Exercise 7.7.1. Is it true in the ring of polynomials that p^n = 0 ⇒ p = 0?
Exercise 7.7.2. How many square roots of the multiplicative identity are there in R, in Z, in C, in the ring of polynomials, and in the ring of 2 × 2 matrices?
If we calculate the eigenvalues of the above matrix A we get characteristic polynomial (1 − x)(−1 − x) + 1 = x^2. The equation x^2 = 0 has only the two roots 0 and 0. This shows that A is not diagonalisable, or it would have to be similar to the zero matrix, that is A = P^{-1}DP for D = 0, and hence would have to be the zero matrix, which it isn't.
So repeated eigenvalues can certainly give some unpleasant surprises to the unwary.
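A one-line MATLAB check of the example (using the sign pattern reconstructed above):

>> A = [1 1; -1 -1];
>> A^2            % the zero matrix
>> eig(A)         % both eigenvalues are 0, yet A is not the zero matrix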
A matrix A with the property that there is some power of it equal to zero is called nilpotent. Formally:
Definition 7.7.1. An n × n matrix A is nilpotent iff ∃k ∈ Z^+, A^k = 0. If k is the smallest positive such power we say the matrix has order k.
Suppose s, t are distinct eigenvalues for a 2 × 2 matrix A. Then I claim
Theorem 7.7.1. (A − sI_2)(A − tI_2) = 0.
Proof: I know there is an eigenvector u corresponding to the eigenvalue s such that Au = su, since s satisfies det(A − sI_2) = 0, so there is something other than 0 in the kernel of A − sI_2, and any non-zero element of this kernel will do for u. Since (A − sI_2)(u) = 0 we have that Au = su. Similarly
there is some non-zero v such that Av = tv. Moreover u and v are linearly
independent if s ,= t since if they were we would have v = u and Av =
Au = su = tv, giving v = (s/t)u forcing s = t, contradiction. Hence we
have two linearly independent vectors u and v in R
2
which therefore give us
a basis.
Now note that (AsI
2
)(AtI
2
) = (AtI
2
)(AsI
2
) by multiplying both
sides out. That is, the two matrices (A sI
2
), (A tI
2
) commute.
Now we take any element w ∈ R^2 and write it as w = au + bv for a, b ∈ R.
Then

    (A - sI_2)(A - tI_2)(au + bv) = (A - sI_2)(A - tI_2)au + (A - sI_2)(A - tI_2)bv
                                  = (A - tI_2)(A - sI_2)au + (A - sI_2)(A - tI_2)bv
                                  = (A - tI_2)0 + (A - sI_2)0
                                  = 0 + 0 = 0
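A numerical sanity check of Theorem 7.7.1 (my own sketch, not from the text;
the matrix is an arbitrary choice with distinct eigenvalues):

    import numpy as np

    A = np.array([[2.0, 1.0], [0.0, 3.0]])     # eigenvalues 2 and 3, distinct
    s, t = np.linalg.eigvals(A)
    I2 = np.eye(2)
    print((A - s * I2) @ (A - t * I2))         # the zero matrix, up to rounding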
Exercise 7.7.3. Confirm that if we have n distinct eigenvalues s_1, s_2, . . . , s_n
for A ∈ M_{n,n} then

    ∏_{i ∈ [1:n]} (A - s_i I_n) = 0

by extending the above argument.
Now consider the case where we have repeated eigenvalues for a 2 × 2 matrix
A. Suppose the repeated eigenvalue is s.
Now the above argument shows that we have at least one non-zero eigenvec-
tor, but the argument to provide a second linearly independent one collapses.
Exercise 7.7.4. Confirm the last observation.
We do however still have,

    (A - sI_2)(A - sI_2) = 0
To see this has to be true, take the matrix A and make a very tiny change
to the entries to alter the characteristic equation so that the eigenvalues are
now s_1 and s_2, both real, which are close to s but not actually equal to it
and are different. Since the characteristic equation is a quadratic, and in fact
the quadratic x^2 - trace(A)x + det(A), we can get this result by taking the
top element of the second column and increasing it slightly. This will reduce
the constant term and so move the quadratic down a small amount, getting
two distinct real roots. Then we have (A - s_1 I_2)(A - s_2 I_2) = 0 by the above
argument. Now we take a curve in the (four dimensional) space of matrices
back to A, with A_ε being the line of matrices terminating at ε = 0 in A_0 = A,
and with one eigenvalue being s_1(ε) and the other s_2(ε), and
lim_{ε→0} s_1 = lim_{ε→0} s_2 = s.
As we go along the line to A, we have at every point (A_ε - s_1 I_2)(A_ε - s_2 I_2) = 0,
so we certainly have

    lim_{ε→0} (A_ε - s_1 I_2)(A_ε - s_2 I_2) = 0

Now all the maps are continuous: the characteristic polynomial is continuous
and the roots are a continuous function of the matrix entries, and the matrix
is a continuous function of a parameter along the curve in the space of
matrices. And the limit as ε → 0 of A_ε is A. So it must be the case that

    (A - sI_2)(A - sI_2) = 0

as claimed.
Alternative Proof: I show this without using continuity arguments.
Again let s be the sole eigenvalue for a 2 × 2 matrix A. There are two cases
to consider.
Case 1: Two linearly independent eigenvectors, u, v with Au = su, Av = sv.
Then for any w ∈ R^2, w = au + bv and Aw = aAu + bAv = asu + bsv = sw.
So (A - sI_2)w = 0. This shows A - sI_2 = 0, so certainly (A - sI_2)^2 = 0.
Case 2: Only one eigenline, with basis u. Then span(u) = ker(A - sI_2) and by
the rank nullity theorem the image of A - sI_2 is a one dimensional subspace
U ⊆ R^2. Then A - sI_2 takes U to U, and any non-zero basis vector v for U
is taken to tv by A - sI_2, for some t ∈ R. So Av = tv + sv = (s + t)v. This
makes s + t an eigenvalue for A restricted to U and hence an eigenvalue for
A. This can only be possible if t = 0, since s is the only eigenvalue for A.
This ensures that v is in the kernel of A - sI_2 (as well as being in its image!).
So v is some multiple of u and (A - sI_2)^2 = 0.
The following exercises should make you more familiar with nilpotent matri-
ces:
Exercise 7.7.5. If A is an n × n matrix which is nilpotent, show that A^n = 0.
Exercise 7.7.6. Show that if N is nilpotent then so is any matrix represent-
ing it in any basis, and all such matrices have the same order of nilpotence.
Exercise 7.7.7. Show that the matrix

    A = [ 0  1  2 ]
        [ 0  0  1 ]
        [ 0  0  0 ]
is nilpotent.
Exercise 7.7.8. Show that if A is a square matrix with all entries on and
below the diagonal zero, then it is nilpotent.
Exercise 7.7.9. Show that if A is a square matrix with all entries on and
above the diagonal zero, then it is nilpotent.
Exercise 7.7.10. Show that the transpose of a nilpotent operator is nilpotent.
Exercise 7.7.11. What possible values are there for the eigenvalues of a
nilpotent matrix?
We can strengthen the last result to show that the dimension stays at two.
This is obvious because (A - sI_2)^2 = 0, so the kernel is the whole space, but it
is useful to use the continuity idea here too, because it generalises to higher
dimensions.
Suppose we perturb the matrix so as to get two distinct real roots. Then there
are distinct eigenvectors which certainly span R^2. As we return continuously
to the case of repeated roots, the eigenlines may get closer and closer and
coincide in the limit, which appears to destroy the argument. But we can, if
this happens, use the Gram-Schmidt process to choose an orthogonal basis,
and this can be made to change continuously as we proceed along the curve.
The eigenbasis collapses in the limit but the orthogonalised basis still exists
and has two elements, so the dimension stays the same.
Definition 7.7.2. If s is an eigenvalue of an n × n matrix A which is repeated
k times, then we say that s has multiplicity k.
Theorem 7.7.2. If s is a real eigenvalue of an n × n real matrix A with
multiplicity k, and all other eigenvalues are real, then ker(A - sI_n)^k has
dimension k.
Sketch Proof: This generalises the above argument to polynomials of degree
n. The case of repeated roots is indicated in figure 7.7.1 by an arrow.
It is intuitively plausible that we can perturb the polynomial slightly and
turn any repeated root into distinct roots. This can be done by making
small changes to the matrix entries of A. If k = n then we would certainly
have this result in dimension 2 since ker(A - sI_2)^2 = R^2. The same argument
as for the dimension two case will give us the dimension n case. If k < n,
we have some other factors for the other eigenvalues to put in the product,
but again we can perturb the particular k eigenvalues slightly and again a
continuity argument will produce the required result. This is true whether
the other eigenvalues are all distinct or if they have some other multiplicity,
or even if they are complex. We can always perturb A slightly to make them
all different, and then head continuously back to A.

[Figure 7.7.1: The graph of a polynomial with repeated roots.]
Definition 7.7.3. When s is a real eigenvalue of an n × n real matrix A and
s has multiplicity k, ker(A - sI_n)^k is called the generalised eigenspace for the
eigenvalue s.
Note that the eigenspace for the eigenvalue s is just the space of all eigenvectors
u in R^n such that Au = su, and this is always a subspace of the
generalised eigenspace for s, since if Au = su then (A - sI_n)(u) = 0 and so
certainly (A - sI_n)^k(u) = 0 for k ≥ 1. It follows that the dimension of the
eigenspace is never greater than the multiplicity of the eigenvalue. The
dimension of the generalised eigenspace is in fact equal to the multiplicity of the
eigenvalue. (Your textbook talks about algebraic and geometric multiplicity
which doesn't seem to me terribly useful or informative.)
Example 7.7.1. The matrix A = 2I_2 has eigenvalue 2 with multiplicity 2
and every vector is in the eigenspace. And A - 2I_2 = 0. On the other hand
the matrix

    A = [  1   1 ]
        [ -1  -1 ]

has an eigenspace which is one dimensional with basis (1, -1)^T and both
eigenvalues zero. But the generalised eigenspace is the whole space since
A^2 = (A - 0I_2)^2 = 0.
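Here is a quick check of this example with NumPy (my own sketch, not from
the text):

    import numpy as np

    A = np.array([[1.0, 1.0], [-1.0, -1.0]])
    print(np.linalg.eigvals(A))              # both eigenvalues are 0, up to rounding
    print(2 - np.linalg.matrix_rank(A))      # dim ker(A) = 1: the eigenspace is the line spanned by (1, -1)^T
    print(2 - np.linalg.matrix_rank(A @ A))  # dim ker(A^2) = 2: the generalised eigenspace is all of R^2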
The following easy theorem is crucial in exponentiating matrices:
Theorem 7.7.3. If all eigenvalues of a matrix A are real and the same,
then A has a decomposition A = S +N for a diagonalisable matrix S and a
nilpotent matrix N, and S and N commute.
Proof: We know that if the eigenvalue is s then A - sI_n satisfies
(A - sI_n)^n = 0, so it is nilpotent, and A = sI_n + (A - sI_n), which is the required
decomposition. Furthermore,

    sI_n(A - sI_n) = sA - s^2 I_n = (A - sI_n)sI_n

so S and N commute.
Theorem 7.7.4 (Primary Decomposition Theorem). The dimension of
the generalised eigenspace for an eigenvalue s, for the case where all eigenvalues
of an n × n real matrix A are real, is the multiplicity of the eigenvalue
s. Moreover, R^n is the direct sum of the generalised eigenspaces.
Sketch Proof: Again we rely on there being, close to any polynomial having
all real roots but with some multiplicities, another polynomial which has all
distinct real roots. The formal proof of this requires us to say what we mean
by close, but this is not difficult to define and I leave you to invent a suitable
formal definition. It also requires us to be able to prove that if I specify
the characteristic equation, you can always choose matrix entries to get that
equation. This is equivalent to solving a set of n multilinear equations with
n^2 variables, so there are a lot of different ways to do this. Finally, we need
to believe that the roots are a continuous function of the coefficients in any
polynomial. For quadratics, cubics and quartics there is a formula to give the
roots and the formula provides a continuous function of the coefficients, so
it is certainly true in these cases. The general proof requires some analysis.
Given these assumptions, we need only perturb the given characteristic polynomial
by changing the matrix entries slightly to recover the case of n distinct
real eigenvalues and a basis of distinct eigenvectors. By constructing a
continuous parametrisation for a line in the space of matrices (which are, after
all, a vector space of dimension n^2) we can continuously change back to the
given matrix, and hence to the given characteristic equation. The situation
is then essentially the same as the special cases.
We now have a collection of subspaces, the generalised eigenspaces, and the
sum of the dimensions of these is n. And we have again by continuity that

    ∏_{i ∈ [1:k]} (A - s_i I_n)^{r_i} = 0        (7.7.1)

where the multiplicity of the eigenvalue s_i is r_i and there are k distinct
generalised eigenspaces. If we have k distinct subspaces and every vector
v in R^n is in one, then v can be in only one, for if distinct generalised
eigenspaces had non-trivial intersection, the sum of them could not span R^n,
and equation 7.7.1 ensures that it does.
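You can also see Theorem 7.7.2 at work numerically. The following sketch is
my own (the 3 × 3 matrix is an arbitrary choice) and compares the dimension
of the eigenspace with that of the generalised eigenspace for a repeated
eigenvalue:

    import numpy as np

    A = np.array([[2.0, 1.0, 0.0],
                  [0.0, 2.0, 0.0],
                  [0.0, 0.0, 5.0]])         # eigenvalue 2 with multiplicity 2, eigenvalue 5 with multiplicity 1
    E = A - 2.0 * np.eye(3)
    print(3 - np.linalg.matrix_rank(E))      # dim ker(A - 2I)   = 1: only one eigenline for 2
    print(3 - np.linalg.matrix_rank(E @ E))  # dim ker(A - 2I)^2 = 2: the full multiplicity, as the theorem says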
Exercise 7.7.12. If A is a 3 × 3 matrix over R with one real eigenvalue
s repeated three times, show without using continuity arguments that
(A - sI_3)^3 = 0. Hint: show that the kernel of (A - sI_3) has dimension at least one
and use the rank-nullity theorem to investigate (A - sI_3) on the image.
Remark 7.7.1. There is a somewhat opaque proof of the last theorem in
Appendix III of Hirsch and Smale; also a typo on the first page, with one
symbol printed in place of another. I shall give a proof that it works in
dimensions two and three which does not use continuity arguments and leave
you to try to extend it.
First look at the case of R^2. Either we have two different eigenvalues, in which
case each has its own eigenspace of dimension 1 and the result follows, or
the two eigenvalues are the same, say s, in which case the result showing
(A - sI_2)^2 = 0 shows that the generalised eigenspace is the whole space and
again the theorem holds.
Now look at the case of R^3. If there are three different real eigenvalues for
A then the result is certainly true because each eigenvalue has an eigenspace
which is a line and eigenvectors from different eigenspaces are linearly independent.
If all three eigenvalues are the same, s, then (A - sI_3)^3 = 0 by
an earlier exercise, and the generalised eigenspace is the whole space, so the
theorem also holds in this case. The last case is where one eigenvalue, s, has
multiplicity 2 and the other, t, has multiplicity 1. There is an eigenspace V
for t of dimension 1 (it couldn't be more than 1 because any other eigenvector
for t would mean t must occur more than once as a root in the characteristic
polynomial). Take U = ker(A - sI_3)^2. We certainly have U ∩ V = 0, since
if v ∈ U ∩ V we have both Av = tv and A^2 v = t^2 v and 2sAv = 2stv, so
(A - sI_3)^2(v) = t^2 v - 2stv + s^2 v = (t - s)^2 v = 0, which means s = t, which
is impossible, or v = 0.
Now U contains at least one eigenspace W of dimension 1, otherwise A - sI_3
would be invertible and have non-zero determinant. If it has in fact got two
distinct eigenlines, then ker(A - sI_3) has dimension 2 and must be the same
as U. So the only case left is where W = ker(A - sI_3) has dimension one.
We look at the map A - sI_3 from R^3 to R^3. We note that it takes V to itself,
v going to (t - s)v. Since the dimension of the kernel is one, the dimension
of the image is two, so take K to be a one dimensional subspace of the image
which is not V, so that the image is K ⊕ V.
Now since A - sI_3 takes everything in R^3 to K ⊕ V it certainly takes K ⊕ V
to itself. It is a linear operator on a real two dimensional space and therefore
has either two real eigenvalues or a complex conjugate pair. Since t is clearly
one of the eigenvalues, the other must also be real. It has to be s, since
otherwise the original A would have three distinct eigenvalues. It has to
have an eigenspace of dimension one, but this must be W again, this being
the only eigenline for s. W is not just the kernel, it is also in the image of
A - sI_3. Consequently, (A - sI_3)^2 has a two dimensional kernel, U, and again
R^3 = U ⊕ V.
Remark 7.7.2. Doing it for n = 4 follows the same pattern: if all eigenvalues
are different then it is trivial, if all eigenvalues are the same it is still trivial,
so we need to worry about the two cases when we have three the same and
one left over (which is a minor extension of the above argument for two the
same and one left over) or the case where we have two eigenvalues each with
multiplicity 2.
Exercise 7.7.13. Do the last case in R^4.
7.8 Oranges and Mirrors
Now we have built up some useful machinery for describing the maps in the
original problems, we can easily solve the first three. I restate them in case
you have forgotten.
Problem One
I take an orange, roughly a solid ball. I mark on a North pole,
equator, and a point on the equator.
I hold the orange up in what I shall call the initial position.
Now I stick a knitting needle through the centre at some definite
angle I shan't trouble to specify, and rotate the orange about
the needle by some angle. This changes
the orientation of the orange but not its centre, to what I shall
call the intermediate position. Now I withdraw the needle and
choose another direction to stick it through the centre of the
orange, and rotate about the needle by some other angle.
This takes me to the final position of the orange.
I now ask the following question: Is there a way of poking the
needle through the orange in this final position and rotating by
an angle which will return the orange in one rotation to its
initial position?
Solution:
The set of special orthogonal rotations of R^3 is a group, SO(3), and it follows
that the two rotations, certainly elements of SO(3), compose to give an
element of SO(3). We shall be done if we can show that every element of
SO(3) has an axis which is left fixed by the map. The characteristic equation
of the matrix is a cubic polynomial. Every cubic has one real root, since
the limit of a_0 + a_1 x + a_2 x^2 + a_3 x^3 as x → ∞ is ±∞ depending on the sign
of a_3, and the limit as x → -∞ is ∞ multiplied by the opposite sign.
There is therefore some value of x for which the function takes the value zero,
by the intermediate value theorem. So there is at least one real eigenvalue
and a corresponding eigenvector. The eigenvalue gives the stretching value
along the eigenvector, and since a special orthogonal matrix leaves distances
invariant, the eigenvalue must be ±1. If all the eigenvalues are real, then
they must all be ±1, and since the determinant is their product and is +1,
at least one of the eigenvalues is +1. If two are complex conjugates, their
product is positive, so the real eigenvalue is also positive and must be +1. So
there has to be an eigenvalue of +1.
So there are three cases: one real eigenvalue of 1 and a complex conjugate
pair, in which case there is an eigenvector with eigenvalue 1, or three real
eigenvalues, one being 1 and the other two being -1, and again there is an
eigenvector for the eigenvalue 1. The last case is when every eigenvalue is
1, and again at least one eigenvector with eigenvalue 1. (In fact the only
possibility here is that A is the identity matrix, but we don't need to prove
this.)
The eigenspace for this eigenvalue 1 is therefore a line through the origin
which is left fixed by the map, that is, it is the axis of a rotation. It follows
that every special orthogonal map on R^3 is a rotation. The inverse of the
composite is the required rotation that restores the orange (and all of R^3) to
its original position.
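If you want to see this numerically, here is a sketch in Python with NumPy
(my own illustration; the two rotations and their angles are arbitrary choices):

    import numpy as np

    def rot_z(a):                               # rotation by angle a about the z axis
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

    def rot_x(a):                               # rotation by angle a about the x axis
        c, s = np.cos(a), np.sin(a)
        return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

    R = rot_x(0.7) @ rot_z(1.1)                 # the composite of the two rotations
    vals, vecs = np.linalg.eig(R)
    k = np.argmin(np.abs(vals - 1.0))           # the eigenvalue closest to 1
    axis = np.real(vecs[:, k])
    print(vals[k])                              # 1, up to rounding
    print(np.allclose(R @ axis, axis))          # True: this line is the axis of the composite rotation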
Problem Two
We again start with an orange in its initial position and again
I stick a knitting needle through the centre. The next bit re-
quires some visual imagination: I imagine I stick a plane mirror
through the centre of the orange, orthogonal to the needle, and
I reflect the orange through the mirror, on both sides of the
mirror.
This gives the intermediate position of the orange. I call the
position of the plane mirror M and also use M : R^3 → R^3 to
denote the map that reflects all of R^3 in the mirror.
Now I stick a needle through and put in another mirror N
orthogonal to the needle and passing through the centre of the
orange. Reflection in this new mirror (of the reflection in the
first mirror) gives the final position of the orange and I represent
this by the map NM.
Question: Is it possible to put a mirror in some position to
return from the final position to the initial position in one hit?
Question: Is it possible to stick a knitting needle through the or-
ange and do a rotation to return the orange to its initial position
in one hit?
Solution:
The first map M is an orthogonal map which has an eigenvalue of -1 with
eigenspace the line of the needle, and so is the second, N. Each leaves the
orthogonal plane fixed, so there are two eigenvalues of +1 in both cases.
The determinant for both is therefore -1, which is what we expect of an
orthogonal but not special orthogonal map. The composite has determinant
(-1)(-1) = +1 and is therefore, by the previous solution, a rotation and
not a reflection. The answer to the first question is therefore no, and to the
second one yes.
It is worth remarking that this result can be obtained by pure geometry. Each
reflection leaves a plane fixed. If the mirrors actually coincide the composite
is the identity, which is a rotation of an admittedly trivial sort. If they do
not, the planes intersect in a line which is going to be the axis of rotation
of the composite, since it is left fixed by both reflections and hence by the
composite. Looking at the plane orthogonal to this line, it is easy to see that
the composite is a rotation by twice the smaller of the angles between the
planes. (Take the line which is the intersection of this plane with one of the
mirrors, say M. Then a point on this line is on M so is left fixed by the
reflection M but is reflected by N to twice the angle between M and N. It is
easy to see that the corresponding result for a point on the mirror N forces
the unit circle in the plane orthogonal to the intersection of the mirrors to
be rotated.)
Problem Three
You stand in front of a large wall mirror and look at yourself.
You wink your right eye and the mirror image of you winks its
left eye, so you deduce that the mirror has swapped your left
side and your right side.
Now you lie horizontally in front of the mirror and again you
wink your right eye and the image winks its left eye. But now
the mirror has taken the top eye to the bottom eye and vice-
versa.
Question: How does the mirror know which way up you are?
Solution:
The problem here lies in the vagueness of natural language and the large
number of implicit assumptions we make in saying things. It is necessary to
translate into a more precise language in order to see what is going on.
When you say that when you wink your right eye the image winks its left
eye what you are doing is replacing the image in the mirror by a composite
of two maps. One is a reflection in the plane of bilateral symmetry passing
through your body. Doing this interchanges your right and left, but leaves
you in the wrong place because you coincide with yourself. So you follow this
reflection through your symmetry plane with a rotation about the centre of
the mirror: your reflection walks around the other side of the mirror and looks
at you as you wink. This decomposition into two maps is done unconsciously
but quickly. Why is it done? The existence of bilateral symmetry in things
like the human body seems to make it instinctive to extend the symmetry
that we learn when we identify other people as much like ourselves: we note
that if someone else were moved to be where we are, they'd look similar to
us. So having an image of yourself requires you to be able to play this game
of shifting bodies around in space. We learn this at an early age, at least
most of us do. The same sort of thing is presumably involved in the common
assumption that since poking a needle into my leg hurts me, poking a needle
into your leg will probably hurt you. Contrary to anything Bill Clinton said,
he doesn't feel your pain, but he can make a good guess at some of it by
performing this transformation. So shifting other people by imaginary maps
to coincide with ourselves happens a lot in much more general senses than
the merely geometric, but it happens there too, and the inverse map is also
one which we use instinctively.
Thus we see the image as being pretty much like us (our bilateral symmetry
contributes substantially to this perception; in a world in which all men
had their right ear ten times as big as their left and women the other way
around, we should probably feel that our image in a mirror had suffered a
sex change. It would be difficult to check this suggestion.) but shifted and
reflected through our plane of symmetry.
It would be possible to decompose the reflection in the mirror as a sudden
jump from the other side of the mirror landing head-first where your feet are,
followed by a reflection in a horizontal plane passing through our middles. We
don't choose this particular decomposition, probably because we see other
people walking around with their heads up and their feet down (with respect
to the gravity field) quite a lot but we seldom see people jumping and landing
on their heads. Maybe if we didn't live in a gravity field we'd regard it as a
reasonable decomposition. So we definitely have preferred decompositions of
the map.
When you lie down in front of the mirror you do a similar sort of decompo-
sition, but you still use your plane of bilateral symmetry, which is now in a
different place. The decomposition being different, there is a slight temptation
to think the mirror is doing something different. It isn't, you are.
This solves the first three problems. Now we proceed to the last.
Chapter 8
Dynamical Systems
We now start on the study of autonomous systems of ordinary differential
equations, which is an important application of Linear Algebra ideas, some
of which we have already covered and some more of which remain to be
developed. In all that follows, V is a finite dimensional vector space and
indeed is usually R^n for some positive integer n.
8.1 Low Dimensional Vector Fields
Reference: Chapter one of Hirsch and Smale
I cannot improve on the treatment of Hirsch and Smale so I copy their first
chapter. Figure 8.1.1 shows what kinds of solutions you get to the extremely
simple ODE ẋ = ax when a > 0. They have the form x = Ae^{at} for any
constant A, and I have given a few curves for different choices of A. In this
case I made a = 1, but if a is positive this does not change the qualitative
picture much, as is shown in figure 8.1.2 where I made a = 1/2.
On the other hand if I make a negative I get a different kind of family of
possible solutions, as in figure 8.1.3.
If the value of a is zero I just get the constant functions out, as shown in
figure 8.1.4.
We can take the set R then and divide it into two regions with 0 as the
dividing point, and as we move slowly through the exponent space R so we
slowly change the solution curves, with a bifurcation point at the origin.
Note that as long as a ≠ 0, perturbing the parameter a does not make
much difference qualitatively to the solution family. We refer to this as
stability under change of parameter. Normally in Physics we cannot measure
Figure 8.1.1: Solutions for positive exponent.
Figure 8.1.2: Solutions for smaller positive exponent.
Figure 8.1.3: Solutions for negative exponent.
Figure 8.1.4: Solutions for exponent zero.
Figure 8.1.5: Both parameters positive.
parameters to infinite precision and so it is highly desirable that our models
of the world have this kind of stability, so that if we are a little bit wrong
in the parameter we are not wildly wrong in the kinds of predictions we get
out.
Note that although there is no compulsion to interpret the independent
variable t as the time, I shall tend to assume it is, so that x(t) is the position
along the x-axis of a point at time t. This is called a one (dimensional)
dynamical system.
Next we look at a system of ordinary differential equations, and again we
look at some very simple cases.
    ẋ = ax
    ẏ = by
Now there are four parameter-stable possibilities: both a, b positive, a > 0,
b < 0, both a, b negative, and a < 0, b > 0. Since the equations are
uncoupled, which means that the first has only x in it and the second only
y, it is rather easy to draw the graphs of all four possibilities. It will be
more useful to draw the vector fields for them. This means we attach to the
point (x, y)^T the vector (ax, by)^T saying how fast a solution is moving at that
point and in which direction it is moving. Mathematica draws these rather
quickly. In figure 8.1.5 I show what happens if both a and b are positive and
in figure 8.1.6 what happens if they are both negative.
In figure 8.1.7 I show what happens if one is positive and the other negative:
it doesn't much matter which way around, because rotating the picture of
one by π/2 gives the other.
Figure 8.1.6: Both parameters negative.
Figure 8.1.7: One parameter positive, the other negative.
Figure 8.1.8: Go with the flow.
Again, these vector fields are parameter stable in that so long as neither is
zero, wobbling the parameters will not change the qualitative features much.
If both are positive, we rush away from the origin, both negative we rush
towards the origin, one of each and we come sweeping in along one axis and
go sweeping out along the other. This, of course, is if I treat this as a two
dimensional dynamical system.
I can sketch some solution curves by starting at some point and following the
arrows so that the arrow is always tangent to the path. This is what solving
a system of ODEs means.
You can see that I will get expressions like

    [ x(t) ]   [ Ae^{at} ]
    [ y(t) ] = [ Be^{bt} ]

and changing the starting point (A, B)^T will give a family of curves like those
in figure 8.1.8 for the case when the parameters have different signs.
Families of curves like this are called phase portraits of the dynamical system
(of ODEs), although it is probably easier to plot vector fields these days. On
the other hand, a solution can tell you very clearly where things are headed:
a curve that is a closed ellipse is quite different from one that spirals in to
the origin, but it can be hard to tell which is which by looking at the vector
fields.
It is seldom that we get uncoupled systems: more usually we get them coupled
as in:
    ẋ = 5x + 3y
    ẏ = -6x - 4y
Solving this sort of dynamical system is what the rest of this course is about.
The obvious question is, what has this to do with Linear Algebra? The
answer is an awful lot.
Exercise 8.1.1. You can use Mathematica to draw the vector field by
typing in
<< Graphics`PlotField`
PlotVectorField[{5x + 3y, -6x - 4y}, {x, -1, 1}, {y, -1, 1}]
Changing the formula to different choices of the four parameters will give new
vector fields almost instantaneously, and it is recommended that you try out
lots of numbers to see what happens. You will find that qualitatively there are
not as many possibilities as you might imagine. This is important, because
again we do not always want the exact answer, we want to know the general
properties of the dynamical system.
Again, the question of stability under perturbation of parameters is important.
Can you find some values for the four numbers a, b, c, d such that for the
system
    ẋ = ax + by
    ẏ = cx + dy
there are bifurcations at these points and dramatic changes in the behaviour?
Experiment with a, d close to zero, b > 0 and c < 0, and draw the vector fields.
What other bifurcations can you find?
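Not part of the exercise: if you would rather experiment in Python than
Mathematica, a rough equivalent using NumPy and matplotlib is the following
sketch (the parameter values are just the ones from the coupled example
above; change them and replot):

    import numpy as np
    import matplotlib.pyplot as plt

    a, b, c, d = 5.0, 3.0, -6.0, -4.0
    x, y = np.meshgrid(np.linspace(-1, 1, 20), np.linspace(-1, 1, 20))
    plt.quiver(x, y, a * x + b * y, c * x + d * y)   # the vector (ax + by, cx + dy) at each point
    plt.show()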
There is another and quite different sense in which the word stable is used.
In rather sloppy terminology, we say that a point of a vector field is an
equilibrium point when the vector at that point is the zero vector. Obviously,
if you start at an equilibrium point you stay there. Now it is not in the
nature of the world that things stay fixed indefinitely, and the question is
what happens if we make a small change to the location of a point. If you
think about what happens in the case of the vector field 8.1.5 you can see that
shifting away from the origin (the equilibrium point) by only a very small
amount will result in getting pushed out even further (and faster!). We say
that the equilibrium is unstable. It corresponds to things like a rod balancing
on its end or a house of cards. For a while it may just stay there but one
small disturbance from outside and the system changes rapidly. By contrast
a ball on a flat table doesn't change its position after a small perturbation,
and in the case of the dynamical system given by the circular flow

    [ cos(t)  -sin(t) ]
    [ sin(t)   cos(t) ]

a small wobble to the equilibrium point doesn't take the point any further
away, it keeps the same distance. Even if the flow were elliptical instead of
circular, the distance would not increase for long and would return to its
earlier position. Such flows (and vector fields) are said to be stable. And
finally, in figure 8.1.6 you can see that a small disturbance from the equilibrium
point is followed by a tendency to return closer to it. Such a flow (or
vector field) is called asymptotically stable. A ball at the bottom of a bowl
is an example of such a system; a pendulum is another. More complicated
dynamical systems may have many equilibria, and working out which are
stable, which are unstable and which are asymptotically stable is of enormous
importance. The vector field for figure 8.1.6 is called a sink, that for
figure 8.1.5 a source.
I have used the word flow without defining it:
Definition 8.1.1. A flow on a space X is a group action of (R, +) on the
space, that is, a map
    f : R × X → X
satisfying
1. ∀x ∈ X, f(0, x) = x
2. ∀s, t ∈ R, ∀x ∈ X, f(s, f(t, x)) = f(s + t, x)
Exercise 8.1.2. Translate this into English. I suggest you treat R as the
time.
Exercise 8.1.3. Describe a flow on the sphere S^2 which has two stable equilibria,
one which has two equilibria, one a source and one a sink, and one
which has two unstable equilibria. Could you have a flow on S^2 with three
equilibria? Sketch a flow on R^2 with three unstable equilibria, and another
with three equilibria, two asymptotically stable and one not. Hint: Think
balls rolling on a surface.
The job of making the notion of stability for an equilibrium precise is trickier
than it may look. I shan't deal with it in this course but it is a fascinating
area of work. See the rest of Hirsch and Smale to get some idea of what is
involved. This goes far beyond the linear systems we are going to be looking
at: the simple linear case is, however, a very important first step.
Another reason for studying linear systems is that they are more general than
you might suppose. For example the second order linear ODE ẍ + aẋ + bx = 0
which you would have studied in first year can be rewritten as a system in
R^2 by putting y = ẋ. This gives
    ẋ = y
    ẏ = -bx - ay
This generalises: k-th order linear ODE systems on R^m can be turned into
first order linear systems on R^{mk}.
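As a concrete sketch of this rewriting (my own illustration, with arbitrarily
chosen coefficients), the second order equation ẍ + aẋ + bx = 0 becomes a
2 × 2 first order system:

    import numpy as np
    from scipy.linalg import expm             # matrix exponential; this anticipates the next section

    a, b = 0.5, 2.0                           # damping and stiffness, chosen for illustration
    A = np.array([[0.0, 1.0],
                  [-b, -a]])                  # (x, y)' = A (x, y) with y = dx/dt
    w0 = np.array([1.0, 0.0])                 # x(0) = 1, x'(0) = 0
    print(expm(3.0 * A) @ w0)                 # (x(3), x'(3))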
8.2 Linear Algebra and Linear Systems
To see what this has to do with Linear Algebra, note that we can write the
vector field as F(x) = Ax where A is a matrix, and then the system of ODEs
in R^2 can be written
    ẋ = Ax
which makes the solution x(t) = e^{tA}x(0) look rather natural as a generalisation
of the one dimensional case.
The notation rather suggests this, but we can verify with very little effort
that in the uncoupled case this is right.
Exercise 8.2.1. Show that for a diagonal matrix A the above method of
exponentiating the matrix is just a vector form of the methods for solving the
one dimensional case.
For the case where A is not diagonal, we still have that if we try x(t) = e^{tA},
write out the exponential as an infinite sum and differentiate it, we confirm
that ẋ(t) = Ax.
Exercise 8.2.2. Verify this claim. Differentiate e^{tA} where A is a matrix.
Exercise 8.2.3. Show that if x(0) = u then the unique solution to ẋ(t) =
Ax is x(t) = e^{tA}u. Do this by looking at the difference between any two
solutions.
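Here is a numerical sanity check of that claim (my own sketch; the matrix is
the coupled example from earlier and the initial value is arbitrary):

    import numpy as np
    from scipy.linalg import expm
    from scipy.integrate import solve_ivp

    A = np.array([[5.0, 3.0], [-6.0, -4.0]])
    u = np.array([1.0, 1.0])                   # initial value x(0) = u
    t = 2.0
    num = solve_ivp(lambda s, x: A @ x, (0.0, t), u, rtol=1e-10, atol=1e-12)
    print(expm(t * A) @ u)                     # the matrix exponential solution
    print(num.y[:, -1])                        # direct numerical integration agrees closely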
It is obvious that we are restricting ourselves to the case where F is linear, and
also to the case where it does not depend on t, the case of an autonomous
system. This is because it is hard enough to solve the autonomous linear
ODEs; the non-linear, non-autonomous ones are usually too hard. We can
however learn a lot by approximating the non-linear systems by linear ones.
In general then we shall be looking at vector fields on R^n given by linear
maps. The aim is to understand this important case thoroughly in order to
be able to (a) deal with a number of significant practical problems and (b)
to be in a position to tackle even nastier cases later on.
The last exercise shows, as stated in Chapter one, that given the vector field
F(x) = Ax where A is a matrix, the solution is
    x(t) = e^{tA}u
where u is an initial value at t = 0. This solves the system. So all we have
to do is to exponentiate the matrix. This is a fairly horrible job even for
small matrices if we do it by writing down the infinite series, and what other
method is there?
Exercise 8.2.4. If A = PDP^{-1}, show that for every positive integer n,
A^n = PD^nP^{-1}, and deduce that e^A = Pe^DP^{-1}.
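A quick numerical check of the exercise (my own sketch; the matrix is an
arbitrary diagonalisable example):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[5.0, 3.0], [-6.0, -4.0]])   # real, distinct eigenvalues, so diagonalisable
    vals, P = np.linalg.eig(A)                 # columns of P are eigenvectors, so A = P D P^{-1}
    print(expm(A))
    print(P @ np.diag(np.exp(vals)) @ np.linalg.inv(P))   # the same matrix: e^A = P e^D P^{-1}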
Most of the rest of the chapter is about making the job of exponentiating
a matrix a more tractable job than it looks. We shall be much concerned
with finding a new basis with respect to which it is easy. For example, if the
matrix were symmetric, then as our exploration of quadratic forms indicated,
there is a basis which is merely a rotation of the standard basis in which the
matrix is diagonal. If we do this, we find a basis in which the differential
equations are uncoupled and we know how to solve them (exponentiating a
diagonal matrix is rather trivial!).
Exercise 8.2.5. Verify that last remark.
Unfortunately we shall not be meeting with symmetric matrices in general,
so life is a little more complicated. As you will see.
Exercise 8.2.6. Read Chapter Two of Hirsch and Smale. Take notes and
think of questions about the material. It is a sort of history lesson but one with
important consequences. It deals with Newton's work on planetary motion
which changed the world. It is not a coincidence that Western civilisation
is much richer than any other. There is no essential difference between you
and a villager in India or Bangladesh; they are not any less intelligent than
you and probably work a lot harder. But you are more closely related to Isaac
Newton, either by genes or by social connections, which is why you are here
reading and listening to Mathematics instead of standing in a field picking
turnips or rice. Your good fortune. Make sure you deserve it.
8.3 Block Decompositions
Suppose A : R^n → R^n has one eigenvalue s of multiplicity n. Then we can
write N = A - sI_n and S = sI_n and get A = S + N, where S is diagonal (in
any basis) and N is nilpotent. It is easy to see that NS = s(A - sI_n) = SN,
so the matrices S and N commute. This result generalises to any A with
real eigenvalues: it can be decomposed as the sum of a diagonalisable and a
nilpotent operator which commute. This makes exponentiating them easy,
we just have to find the right basis.
We have the Primary Decomposition Theorem, theorem 7.7.4, which tells us
that if A has real eigenvalues, there is a decomposition of R^n as the direct
sum of the generalised eigenspaces of A. An important property of these
generalised eigenspaces is that they are invariant under A, that is, if V_s is a
generalised eigenspace of A for the eigenvalue s, then AV_s ⊆ V_s.
Theorem 8.3.1. If V_s is a generalised eigenspace of A for the eigenvalue s,
then AV_s ⊆ V_s.
Proof: For any v ∈ R^n,

    v ∈ V_s ⟺ (A - sI_n)^r(v) = 0
            ⇒ (A - sI_n)^{r-1}(A - sI_n)(v) = 0
            ⇒ (A - sI_n)^r(A - sI_n)(v) = 0
            ⇒ (A - sI_n)(v) ∈ V_s
            ⇒ Av - sv ∈ V_s
            ⇒ Av ∈ V_s
The Primary Decomposition Theorem tells us that for an operator A on R^n
with real eigenvalues, R^n decomposes into a direct sum of the generalised
eigenspaces, and the last result tells us that these eigenspaces are mapped
into themselves by A. This means that we can choose a basis for R^n which
is built out of bases for each of the generalised eigenspaces, and in this basis,
which I shall call the generalised eigenbasis, we can represent A as a matrix
in block diagonal form:
    [ [A_1]                    ]
    [       [A_2]              ]
    [             . . .        ]
    [                   [A_k]  ]

In this decomposition, each [A_j] is an r_j × r_j matrix specifying how V_j is
taken into itself. All the unmarked spaces are zeros.
Exercise 8.3.1. Confirm the above claim.
Theorem 8.3.2 (S-N Decomposition Theorem). If A is an operator on
R^n with all eigenvalues real, then it is the sum of a diagonalisable operator
and a nilpotent operator which commute.
Proof:
We showed in Theorem 7.7.3 that on R^n every operator with only one eigenvalue
can be written as the sum of a diagonal operator sI_n and a nilpotent
operator N, so we can apply this to each of the A_j in the generalised eigenbasis.
So we can subtract off the diagonal part of A, which is the block diagonal matrix

    [ s_1 I_{r_1}                              ]
    [            s_2 I_{r_2}                   ]
    [                         . . .            ]
    [                               s_k I_{r_k} ]

whose j-th block is s_j times the r_j × r_j identity matrix. The rest is a diagonal
block array of nilpotent matrices, which gives a nilpotent matrix. Since within
each block the diagonal part s_j I commutes with the nilpotent part, the total
nilpotent matrix commutes with the total diagonal part.
Remark 8.3.1. The above result pretty much wraps up the problem of
exponentiating square matrices with real eigenvalues, since we have that if
A and B commute,
    e^{A+B} = e^A e^B
by an easy exercise.
Exercise 8.3.2. Show this. Also show that if A and B do not commute the
result is false.
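A numerical illustration of both halves of the exercise (my own sketch with
arbitrarily chosen matrices):

    import numpy as np
    from scipy.linalg import expm

    S = 2.0 * np.eye(2)                        # a multiple of the identity commutes with everything
    N = np.array([[0.0, 1.0], [0.0, 0.0]])     # nilpotent
    print(np.allclose(expm(S + N), expm(S) @ expm(N)))   # True: S and N commute

    A = np.array([[0.0, 1.0], [0.0, 0.0]])
    B = np.array([[0.0, 0.0], [1.0, 0.0]])     # A and B do not commute
    print(np.allclose(expm(A + B), expm(A) @ expm(B)))   # False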
The next section is a worked example stolen from Hirsch and Smale which
puts all this to work. Study it carefully because asking you to do a similar
problem is an obvious examination question.
8.4 A (stolen) Worked Example
I shall give an example of a problem in three dimensions requiring us to solve
a linear system to show the general thrust of what we have done. This has
been stolen from Hirsch and Smale, pp112-115.
8.4.1 A little calculation
The problem is to solve the system
    ẋ = -x + y - 2z
    ẏ = -y + 4z
    ż = z
This can be written as the problem of finding the flow of the vector field given
by the linear operator T with matrix with respect to the standard basis

    T_0 = [ -1   1  -2 ]
          [  0  -1   4 ]
          [  0   0   1 ]
We know (or at least believe) that the solution is given by
    e^{tT_0}
applied to any initial point in R^3. So the problem is to exponentiate the
matrix T_0. Trying to do this by using the infinite series is not a very nice
problem which you are encouraged to try and give up on. Others are even
worse, frequently much worse. Here is a quick way.
We can read off the characteristic polynomial of T from the matrix since T_0
has everything below the diagonal zero, so we get p(t) = (t + 1)^2(t - 1). Thus
we have an eigenvalue of -1 with multiplicity 2 and an eigenvalue of 1 with
multiplicity 1. We can find an eigenvector for the eigenvalue -1 consisting of
    (1, 0, 0)^T
and an eigenvector for the eigenvalue 1 consisting of
    (0, 2, 1)^T
It is not hard to check that there is no other independent eigenvector, but the
vector
    (0, 1, 0)^T
is obviously linearly independent of the two eigenvectors and can therefore
be adjoined to the first to give
    (1, 0, 0)^T, (0, 1, 0)^T
as a basis for the generalised eigenspace for the eigenvalue -1.
Exercise 8.4.1. Calculate (A + I)^2 and find its kernel. Verify that the above
two vectors are a basis for it. Confirm that (A + I)^2(A - I) = 0.
This gives the decomposition into the two generalised eigenspaces, and a
generalised eigenbasis
    (1, 0, 0)^T, (0, 1, 0)^T, (0, 2, 1)^T
I call this basis B.
Now in the generalised eigenbasis, the diagonalisable part is diagonalised by
the eigenvalues, so we have that in the basis B we can represent the operator
S as the matrix

    S_1 = [ -1   0   0 ]
          [  0  -1   0 ]
          [  0   0   1 ]
Exercise 8.4.2. Confirm this claim.
The matrix representing the new basis in terms of the old is

    P^{-1} = [ 1  0  0 ]
             [ 0  1  2 ]
             [ 0  0  1 ]

the inverse of this is

    P = [ 1  0   0 ]
        [ 0  1  -2 ]
        [ 0  0   1 ]
Exercise 8.4.3. Verify these claims.
The matrix representing S in the standard basis is therefore S_0 = P^{-1}S_1P,
which is easily seen to be:

    S_0 = [ -1   0   0 ]
          [  0  -1   4 ]
          [  0   0   1 ]
Exercise 8.4.4. Verify this claim.
We can now obtain the nilpotent part of the decomposition represented in
the standard basis as N_0 by subtracting S_0 from T_0 to get:

    N_0 = [ 0  1  -2 ]
          [ 0  0   0 ]
          [ 0  0   0 ]
Exercise 8.4.5. Verify that this is nilpotent and find the smallest m such
that N_0^m = 0. Confirm that S_0 N_0 = N_0 S_0.
Now we calculate e^{T_0} by using the decomposition.
First we get e^{S_0}.
We have
    e^{S_0} = e^{P^{-1}S_1P} = P^{-1}e^{S_1}P
Exercise 8.4.6. Verify this claim.
Now e^{S_1} is trivially obtained by exponentiating the diagonal terms.
Exercise 8.4.7. Verify this claim.
So we have

    e^{S_0} = P^{-1} diag(e^{-1}, e^{-1}, e) P

            = [ e^{-1}    0           0        ]
              [   0     e^{-1}  -2e^{-1} + 2e  ]
              [   0       0           e        ]
Now it is easy to see that N_0^2 = 0, so we need only the first two terms of the
exponential series, and

    e^{N_0} = I + N_0 = [ 1  1  -2 ]
                        [ 0  1   0 ]
                        [ 0  0   1 ]
Exercise 8.4.8. Verify the above claim.
Finally (almost), e^{T_0} = e^{S_0}e^{N_0} and we have

    e^{T_0} = [ e^{-1}   e^{-1}     -2e^{-1}      ]
              [   0      e^{-1}   -2e^{-1} + 2e   ]
              [   0        0            e         ]
What we want is e^{tT_0}, which is e^{tS_0 + tN_0} = e^{tS_0}e^{tN_0}, which is

    e^{tT_0} = [ e^{-t}   te^{-t}        -2te^{-t}       ]
               [   0       e^{-t}   -2e^{-t} + 2e^{t}    ]
               [   0         0             e^{t}         ]
Exercise 8.4.9. Verify the calculation.
This gives the solution to the original system of ODEs.
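If you want to check the whole calculation in one go, here is a sketch using
SciPy's matrix exponential (my own verification, not part of Hirsch and
Smale's example; t = 0.7 is an arbitrary test value):

    import numpy as np
    from scipy.linalg import expm

    T0 = np.array([[-1.0,  1.0, -2.0],
                   [ 0.0, -1.0,  4.0],
                   [ 0.0,  0.0,  1.0]])
    S0 = np.array([[-1.0,  0.0,  0.0],
                   [ 0.0, -1.0,  4.0],
                   [ 0.0,  0.0,  1.0]])
    N0 = T0 - S0
    print(np.allclose(N0 @ N0, 0.0))           # N0 is nilpotent of order 2
    print(np.allclose(S0 @ N0, N0 @ S0))       # S0 and N0 commute

    t = 0.7
    e = np.exp
    closed_form = np.array([[e(-t), t * e(-t), -2 * t * e(-t)],
                            [0.0,   e(-t),     -2 * e(-t) + 2 * e(t)],
                            [0.0,   0.0,        e(t)]])
    print(np.allclose(expm(t * T0), closed_form))   # True: the matrix displayed above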
Remark 8.4.1. You should be able to repeat the above calculation with
more or less any matrix that has real eigenvalues, although this could be a
longish job if it is a big matrix. It might be prudent to expect a 3 × 3 case
in the examination.
8.5 Complex Eigenvalues
In practice we cannot and should not expect the eigenvalues to be all real.
We know from the Fundamental Theorem of Algebra that eigenvalues will
generally occur in complex conjugate pairs when the entries of the matrix
are real numbers (and hence the coefficients in the characteristic polynomial
are real).
Now everything we have shown here for real matrices representing linear
operators on R^n goes over to complex matrices representing C-linear operators
over C^n. This is simply a matter of noting what properties of R^n were used
in the arguments and observing that it all made sense for any vector space
over any field.
Note that a basis for R^n is, without making any changes, a basis for C^n.
We just decide to use complex numbers as the allowable scalars. And if
A : R^n → R^n is a linear operator, then A determines a unique extension
to a linear operator A_C : C^n → C^n by observing that any element of C^n
can be expressed as a (complex) linear combination of the standard (or any
other) basis for R^n, and insisting that A_C is linear tells us what to do with
the element. The extension A_C is called the complexification of A.
Definition 8.5.1. A linear operator on R^n is called semisimple iff its
complexification is diagonalisable on C^n.
Note that if all the eigenvalues (real and complex) of A are distinct, then
the argument that shows we can diagonalise A in the real case goes over to
the complex case without any more work, and we have that in this case the
matrix A is semisimple. Of course, A may be semisimple even if some of the
real or complex eigenvalues repeat; the question is whether they have linearly
independent eigenvectors.
The S-N decomposition of a matrix with only real eigenvalues goes over to
the case of some complex eigenvalues: there is a decomposition of any real
matrix into a nilpotent part and a semisimple part.
The case of complex eigenvalues is very important in the study of whether
the system of ODEs is stable. You will do more of this if you study dynamical
systems further in later years.
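The simplest example worth keeping in mind is a matrix with a purely
imaginary conjugate pair of eigenvalues; its exponential is a rotation, which is
exactly the stable circular flow mentioned earlier. A small sketch (my own
illustration, not from the notes):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, -1.0], [1.0, 0.0]])    # eigenvalues +i and -i
    print(np.linalg.eigvals(A))
    print(expm((np.pi / 2) * A))               # rotation by pi/2: [[0, -1], [1, 0]] up to rounding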
8.6 The Jordan Canonical Form of a Matrix
I shall not prove any of the results in this last section; the proofs may be
found in Hirsch and Smale, mostly in Chapter Six of that estimable book.
Definition 8.6.1. A matrix of the form

    [ 0               ]
    [ 1  0             ]
    [    1  0          ]
    [       .   .      ]
    [          1    0  ]

with 1's immediately below the diagonal and zeros everywhere else, is called
an Elementary nilpotent matrix or nilpotent block.
Theorem 8.6.1. For any nilpotent operator A on R^n there is a basis for
which the matrix [A] representing A can be written in the diagonal block
form
    [ [A_1]                    ]
    [       [A_2]              ]
    [             . . .        ]
    [                   [A_k]  ]

where each A_j is an elementary nilpotent block.
Remark 8.6.1. The blocks may be arranged from the top down in order
of decreasing size. This will not generally be unique. We include blocks of
size 1 which consist of the number zero. It follows that any nilpotent matrix
is similar to one which has zeros everywhere except along the subdiagonal,
where some of the entries may be 1.
Some books use the transpose with the 1's just above the diagonal. A matrix
representing a nilpotent operator in either form is said to be canonical or in
canonical form.
It follows that since we can take any of the generalised eigenspaces and find
that on each of them the operator A with real eigenvalues has a decomposition
as s_j I_{r_j} added to a nilpotent matrix, any operator A with real eigenvalues has
a basis for which it can be written in block diagonal form where each block
is of the form:

    [ s_j                    ]
    [  1   s_j               ]
    [       1   s_j          ]
    [            .    .      ]
    [                1   s_j ]

for an eigenvalue s_j.
A block such as this is called an Elementary Jordan Block and when A is
represented in the form of diagonal blocks, each block being an Elementary
Jordan Block, we say the Matrix is in Jordan Canonical Form (or sometimes
just in Jordan Form for those who aren't sure how many n's there are in
canonical). It can be shown that this is unique up to the order of the blocks.
I am told this has some theoretical value but have to admit I have never
found out what it actually is. We don't actually need it for solving linear
systems of ODEs, but we do need to know about the S-N decomposition
and be able to extract S and N.
Exercise 8.6.1. Write down some non-canonical nilpotent n × n real matrices
for n = 3, 4 and canonicalise them. I suggest you start with a matrix with
non-zero entries below the diagonal and zero above and on it, and then try
doing EROs to reduce it. The EROs can be regarded as a matrix operation,
and you can try this as a candidate for the change of basis matrix.
I made up the word canonicalise, as I hope you realise. It appeals to me
because of its sheer awfulness. Find out what canons are. Distinguish clearly
between cannons, canons and cañons.
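If you want to see canonical forms produced by machine, SymPy (which these
notes do not otherwise use) will compute a Jordan form. A sketch, applied to
the matrix T_0 from the worked example; note that SymPy puts the 1's just
above the diagonal, the transposed convention mentioned above:

    from sympy import Matrix

    A = Matrix([[-1, 1, -2],
                [ 0, -1, 4],
                [ 0,  0, 1]])
    P, J = A.jordan_form()          # A == P * J * P**(-1)
    print(J)                        # a 2 x 2 Jordan block for -1 and a 1 x 1 block for 1
    print(P * J * P.inv() == A)     # True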