
ECON 381 SC Foundations Of Economic Analysis

2009

John Hillas and Dmitriy Kvasov


University of Auckland
Contents

Chapter 1. Logic, Sets, Functions, and Spaces 1


1. Logic 1
2. Sets 3
3. Binary Relations 4
4. Functions 5
5. Spaces 7
6. Metric Spaces and Continuous Functions 8
7. Open sets, Compact Sets, and the Weierstrass Theorem 10
8. Sequences and Subsequences 11
9. Linear Spaces 14
Chapter 2. Linear Algebra 17
1. The Space Rn 17
2. Linear Functions from Rn to Rm 19
3. Matrices and Matrix Algebra 20
4. Matrices as Representations of Linear Functions 22
5. Linear Functions from Rn to Rn and Square Matrices 24
6. Inverse Functions and Inverse Matrices 25
7. Changes of Basis 25
8. The Trace and the Determinant 28
9. Calculating and Using Determinants 30
10. Eigenvalues and Eigenvectors 34
Chapter 3. Consumer Behaviour: Optimisation Subject to the Budget
Constraint 37
1. Constrained Maximisation 37
2. The Implicit Function Theorem 41
3. The Theorem of the Maximum 43
4. The Envelope Theorem 45
5. Applications to Microeconomic Theory 49

Chapter 4. Topics in Convex Analysis 57


1. Convexity 57
2. Support and Separation 61

CHAPTER 1

Logic, Sets, Functions, and Spaces

1. Logic
All the aspects of logic that we describe in this section are part of what is called
propositional (or sentential) logic.
We start by supposing that we have a number of atomic statements, which we
denote by lower case letters, p, q, r. Examples of such statements might be
Consumer 1 is a utility maximiser
the apple is green
the price of good 3 is 17.
We assume that each atomic statement is either true or false.
Given these atomic statements we can form other statements using logical connectives.
If p is a statement then ¬p, read not p, is the statement that is true precisely
when p is false. If both p and q are statements then p ∧ q, read p and q, is the
statement that is true when both p and q are true and false otherwise. If both p
and q are statements then p ∨ q, read p or q, is the statement that is true when
at least one of p and q is true, that is, the statement that is false only if both p and q are
false.
We could make do with these three symbols together with brackets to group
symbols and tell us what to do first. For example we could have the complicated
statement ((p ∧ q) ∨ (p ∧ r)) ∨ ¬s. This means that at least one of two statements
is true. The first is that either both p and q are true or both p and r are true. The
second is that s is not true.

Exercise 1. Think about the meaning of the statement we have just consid-
ered. Can you see a more straightforward statement that would mean the same
thing?

While we don’t strictly need any more symbols it is certainly convenient to


have at least a couple more. If both p and q are statements then p ⇒ q, read if p
then q or p implies q or p is sufficient for q or q is necessary for p, is the statement
that is false when p is true and q is false and is true otherwise. Many people find
this a bit nonintuitive. In particular, one might wonder about the truth of this
statement when p is false and q is true. A simple (and correct) answer is that this
is a definition. It is simply what we mean by the symbol and there isn’t any point
in arguing about definitions. However there is a sense in which the definition is
what is implied by the informal statements. When we say “if p then q” we are
saying that in any situation or state in which p is true then q is also true. We are
not making any claim about what might or might not be the case when p is not
true. So, in states in which p is not true we make no claim about q and so our
statement is true whether q is true or false. Instead of p ⇒ q we can write q ⇐ p.
In this case we are most likely to read the statement as q if p or q is necessary
for p.

If p ⇒ q and p ⇐ q (that is q ⇒ p) then we say that p if and only if q or p is


necessary and sufficient for q and write p ⇔ q.
One powerful method of analysing logical relationships is by means of truth
tables. A truth table lists all possible combinations of the truth values of the
atomic statements and the associated truth values of the compound statements.
If we have two atomic statements then the following table gives the four possible
combinations of truth values.
p q
T T
F T
T F
F F
Now, we can add a column that would, for each combination of truth values of
p and q, give the truth value of p ⇒ q, just as described above.
p q p⇒q
T T T
F T T
T F F
F F T
Such truth tables allow us to see the logical relationship between various statements.
Suppose we have two compound statements A and B and we form a truth
table showing the truth values of A and B for each possible profile of truth values
of the atomic statements that constitute A and B. If in each row in which A is true
B is also true then statement A implies statement B. If statements A and B have
the same truth value in each row then statements A and B are logically equivalent.
For example I claim that the statement p ⇒ q we have just considered is logically
equivalent to ¬p ∨ q. We can see this by adding columns to the truth table we have
just considered. Let me add a column for ¬p and then one for ¬p ∨ q. (We add
the intermediate column for ¬p only to make the table easier to fill in.)

p q p⇒q ¬p ¬p ∨ q
T T T F T
F T T T T
T F F F F
F F T T T
Since the third column and the fifth column contain exactly the same truth values
we see that the two statements, p ⇒ q and ¬p ∨ q are indeed logically equivalent.
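Such an equivalence can also be checked mechanically. The following short Python sketch (an illustration we add here, not part of the original notes) enumerates the four rows of the truth table and confirms that p ⇒ q and ¬p ∨ q agree in every row:

```python
from itertools import product

def implies(p, q):
    # p => q is, by definition, false exactly when p is true and q is false.
    return not (p and not q)

# Enumerate all four combinations of truth values, as in the truth table above.
table = [(p, q, implies(p, q), (not p) or q)
         for p, q in product([True, False], repeat=2)]

# The two final columns agree in every row, so the statements are
# logically equivalent.
equivalent = all(row[2] == row[3] for row in table)
print(equivalent)  # True
```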
Exercise 2. Construct the truth table for the statement ¬(¬p ∨ ¬q). Is it
possible to write this statement using fewer logical connectives? Hint: why not
start with just one?
Exercise 3. Prove that the following statements are equivalent:
(i) (p ∨ ¬q) ⇒ ((¬p) ∧ q) and ¬(q ⇒ p),
(ii) p ⇒ q and ¬q ⇒ ¬p.
In part (ii) the second statement is called the contrapositive of the first statement.
Often if you are asked to prove that p implies q it will be easier to show the
contrapositive, that is, that not q implies not p.
Exercise 4. Prove that the following statements are equivalent:
(i) ¬(p ∧ q) and ¬p ∨ ¬q,
(ii) ¬(p ∨ q) and ¬p ∧ ¬q.

These two equivalences are known as De Morgan’s Laws.
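Both laws can likewise be verified row by row. A small Python check (our addition, not from the notes) runs over every combination of truth values:

```python
from itertools import product

# De Morgan's Laws, checked over every combination of truth values of p and q.
for p, q in product([True, False], repeat=2):
    assert (not (p and q)) == ((not p) or (not q))   # law (i)
    assert (not (p or q)) == ((not p) and (not q))   # law (ii)
print("both laws hold in all four rows")
```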


A tautology is a statement that is necessarily true. For example if the state-
ments A and B are logically equivalent then the statement A ⇔ B is a tautology.
If A logically implies B then A ⇒ B is a tautology. We can check whether a com-
pound statement is a tautology by writing a truth table for this statement. If the
statement is a tautology then its truth value should be T in each row of its truth
table.
A contradiction is a statement that is necessarily false, that is, a statement A
such that ¬A is a tautology. Again, we can see whether a statement is a contradic-
tion by writing a truth table for the statement.

2. Sets
Set theory was developed in the second half of the 19th century and is at the
very foundation of modern mathematics. But we shall not be concerned here with
the development of the theory. Rather we shall only give the basic language of set
theory and outline some of the very basic operations on sets.
We start by defining a set to be a collection of objects or elements. We will
usually denote sets by capital letters and their elements by lower case letters. If
the element a is in the set A we write a ∈ A. If every element of the set B is also
in the set A we call B a subset of the set A and write B ⊂ A. We shall also say
that A contains B. If A and B have exactly the same elements then we say they
are equal or identical. Alternatively we could say A = B if and only if A ⊂ B and
B ⊂ A. If B ⊂ A and B ≠ A then we say that B is a proper subset of A or that A
strictly contains B.
Exercise 5. How many subsets does a set with N elements have?
In order to avoid the paradoxes of naive set theory, such as Russell’s paradox,
we shall always assume that in whatever situation we are discussing there is some
given set U called the universal set which contains all of the sets with which we
shall deal.
We customarily enclose our specification of a set by braces. In order to specify
a set one may simply list the elements. For example to specify the set D which
contains the numbers 1,2, and 3 we may write D = {1, 2, 3}. Alternatively we may
define the set by specifying a property that identifies the elements. For example
we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice
that this second method is more powerful. We could not, for example, list all
the integers. (Since there are an infinite number of them we would die before we
finished.)
For any two sets A and B we define the union of A and B to be the set which
contains exactly all of the elements of A and all the elements of B. We denote the
union of A and B by A ∪ B. Similarly we define the intersection of A and B to
be that set which contains exactly those elements which are in both A and B. We
denote the intersection of A and B by A ∩ B. Thus we have

A∪B = {x | x ∈ A or x ∈ B}
A∩B = {x | x ∈ A and x ∈ B}.
Exercise 6. Is the oldest mathematician among chess players the same person
as the oldest chess player among mathematicians, or are they (possibly) different
people?
Exercise 7. Is the best mathematician among chess players the same person
as the best chess player among mathematicians, or are they (possibly) different people?

Exercise 8. Every tenth mathematician is a chess player and every fourth
chess player is a mathematician. Are there more mathematicians or chess players,
and by what factor?
Exercise 9. Prove the distributive laws for operations of union and intersec-
tion.
(i) (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
(ii) (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C)
Just as the number zero is extremely useful so the concept of a set that has
no elements is extremely useful also. This set we call the empty set or the null set
and denote by ∅. To see one use of the empty set notice that having such a concept
allows the intersection of two sets to be well defined whether or not the sets have any
elements in common.
We also introduce the concept of a Cartesian product. If we have two sets, say
A and B, the Cartesian product, A × B, is the set of all ordered pairs, (a, b) such
that a is an element of A and b is an element of B. Symbolically we write
A × B = {(a, b) | a ∈ A and b ∈ B}.
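Python’s built-in sets mirror these operations directly; here is a quick illustrative sketch (our addition, with the set D = {1, 2, 3} from above and a second small set chosen for the example):

```python
A = {1, 2, 3}
B = {3, 4}

union = A | B            # {x | x in A or x in B}
intersection = A & B     # {x | x in A and x in B}

# The Cartesian product A x B: the set of all ordered pairs (a, b).
cartesian = {(a, b) for a in A for b in B}

print(union, intersection, len(cartesian))
```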

3. Binary Relations
There are a number of ways of formulating the notion of a binary relation. We
shall pursue one, defining a binary relation on a set X simply as a subset of X × X,
the Cartesian product of X with itself.
Definition 1. A binary relation R on the set X is a subset of X × X. If the
point (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.
Since we have already defined the notions of Cartesian product and subset,
there is really nothing new here. However the structure and properties of binary
relations that we shall now study is motivated by the informal notion of a “relation”
between the elements of X.
Example 1. Suppose that X is a set of boys and girls and the relation xSy is
“x is a sister of y.”
Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.
There are binary relations >, ≥, and =.
Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.
The relations R, P , and I are defined by
xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.
Definition 2. The following properties of binary relations have been defined
and found to be useful.
(BR1) Reflexivity: For all x in X, xRx.
(BR2) Irreflexivity: For all x in X, not xRx.
(BR3) Completeness: For all x and y in X, either xRy or yRx (or both).1
(BR4) Transitivity: For all x, y, and z in X, if xRy and yRz then xRz.
(BR5) Negative Transitivity: For all x, y, and z in X, if xRy then either
xRz or zRy (or both).
(BR6) Symmetry: For all x and y in X, if xRy then yRx.
(BR7) Anti-Symmetry: For all x and y in X, if xRy and yRx then x = y.
(BR8) Asymmetry: For all x and y in X, if xRy then not yRx.

1We shall always implicitly include “or both” when we say “either. . . or.”
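Since a binary relation on a finite set X is just a set of ordered pairs, these properties can be tested by brute force. A Python sketch (our addition; the relation ≥ on {1, 2, 3} is chosen purely for illustration):

```python
X = {1, 2, 3}
# The relation >= on X, written explicitly as a subset of X x X.
R = {(x, y) for x in X for y in X if x >= y}

def complete(R, X):
    return all((x, y) in R or (y, x) in R for x in X for y in X)

def transitive(R, X):
    # If xRy and yRz then xRz.
    return all((x, z) in R for (x, y) in R for (w, z) in R if y == w)

def antisymmetric(R, X):
    return all(x == y for (x, y) in R if (y, x) in R)

def symmetric(R, X):
    return all((y, x) in R for (x, y) in R)

print(complete(R, X), transitive(R, X), antisymmetric(R, X), symmetric(R, X))
# True True True False
```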

Exercise 10. Show that completeness implies reflexivity, that asymmetry implies
anti-symmetry, and that asymmetry implies irreflexivity.
Exercise 11. Which properties does the relation described in Example 1 satisfy?
Exercise 12. Which properties do the relations described in Example 2 satisfy?
Exercise 13. Which properties do the relations described in Example 3 satisfy?
We now define a few particularly important classes of binary relations.
Definition 3. A weak order is a binary relation that satisfies transitivity and
completeness.
Definition 4. A strict partial order is a binary relation that satisfies transitivity
and asymmetry.
Definition 5. An equivalence is a binary relation that satisfies reflexivity,
transitivity, and symmetry.
You have almost certainly already met examples of such binary relations in
your study of Economics. We normally assume that the weak preference, strict
preference, and indifference relations of a consumer are, respectively, weak orders,
strict partial orders, and equivalences, though we actually typically assume a little
more about the strict preference.
The following construction is also motivated by the idea of preference. Let
us consider some binary relation R which we shall informally think of as a weak
preference relation, though we shall not, for the moment, make any assumptions
about the properties of R. Consider the relations P defined by xPy if and only if
xRy and not yRx, and I defined by xIy if and only if xRy and yRx.
Exercise 14. Show that if R is a weak order then P is a strict partial order
and I is an equivalence.
We could also think of starting with a strict preference P and defining the weak
preference R in terms of P . We could do so either by defining R as xRy if and only
if not yPx or by defining R as xRy if and only if either xPy or not yPx.
Exercise 15. Show that these two definitions of R coincide if P is asymmetric.
Exercise 16. Show by example that P may be a strict partial order (so, by
the previous result, the two definitions of R coincide) but R not a weak order.
[Hint: If you cannot think of another example consider the binary relations defined
in Example 3.]
Exercise 17. Show that if P is asymmetric and negatively transitive then
(i) P is transitive (and hence a strict partial order), and
(ii) R is a weak order.

4. Functions
Let X and Y be two sets. A function (or a mapping) f from the set X to the
set Y is a rule that assigns to each x in X a unique element in Y , denoted by f (x).
The notation
f : X → Y.

is standard. The set X is called the domain of f and the set Y is called the
codomain of f . The set of all values taken by f , i.e. the set
{y ∈ Y | there exists x in X such that y = f (x)}
is called the range of f . The range of a function need not coincide with its codomain
Y.
There are several useful ways of visualising functions. A function can be thought
of as a machine that operates on elements of the set X and transforms an input
x into a unique output f (x). Note that the machine is not required to produce
different outputs from different inputs. This analogy helps to distinguish between
the function itself, f , and its particular value, f (x). The former is the machine,
the latter is the output!2 One of the reasons for this confusion is that in practice,
to avoid being verbose, people often say things like ‘consider a function U (x, y) =
xα y β ’ instead of saying ‘consider a function defined for every pair (x, y) in R2 by
the equation U (x, y) = xα y β ’.
A function can also be thought of as a transformation, or a mapping, of the set
X into the set Y . In line with this interpretation is the common terminology: it is
said that f (x) is the image of x under the function f . Again, it is important to
remember that there may be points of Y which are the images of no point of X and
that there may be different points of X which have the same images in Y . What is
absolutely prohibited, however, is for a point from X to have several images in Y !
Part of the definition of a function is the specification of its domain. However,
in applications, functions are quite often defined by an algebraic formula, without
explicit specification of the domain. For example, a function may be defined as
f (x) = sin x + 145x2 .
The function f is then the rule that assigns the value sin x + 145x2 to each value of
x. The convention in such cases is that the domain of f is the set of all values of x
for which the formula gives a unique value. Thus, if you come, for instance, across
the function f (x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞),
unless specified otherwise.
For any subset A of X, the subset f (A) of Y consisting of those y with y = f (x)
for some x in A is called the image of A under f , that is,
f (A) = {y ∈ Y | there exists x in A such that y = f (x)}.
Thus, the range of f can be written as f (X). Similarly, one can define the
inverse image. For any subset B of Y , the inverse image f −1 (B) of B is the set of
x in X such that f (x) is in B, that is,
f −1 (B) = {x ∈ X | f (x) ∈ B}.
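For a finite example these definitions can be computed directly. A Python sketch (our illustration, with f(x) = x² on a small hypothetical domain):

```python
X = {-2, -1, 0, 1, 2}
f = {x: x * x for x in X}  # f(x) = x^2, stored as an explicit table

def image(A):
    """f(A): the set of values f(x) for x in A."""
    return {f[x] for x in A}

def inverse_image(B):
    """f^{-1}(B): the set of x in X with f(x) in B."""
    return {x for x in X if f[x] in B}

print(image(X))            # the range f(X): {0, 1, 4}
print(inverse_image({1}))  # two different points share the image 1
print(inverse_image({3}))  # empty: 3 is in the codomain but not the range
```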
A function f is called a function onto Y (or surjection) if the range of f is Y ,
i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f (x). In other
words, each element of Y is the image of (at least) one element of X. A function f is
called one-to-one (or injection) if f (x1 ) = f (x2 ) implies x1 = x2 , that is, for every
element y of f (X) there is a unique element x of X such that y = f (x). In other
words, a one-to-one function maps different elements of X into different elements of
Y . When a function f : X → Y is both onto and one-to-one it is called a bijection.
Exercise 18. Suppose that a set X has m elements and a set Y has n ≥ m
elements. How many different functions are there from X to Y ? from Y to X?
How many of them are surjective? How many of them are injective? How many of
them are bijective?
2Mathematician Robert Bartle put it as follows: “Only a fool would confuse a sausage-grinder
with a sausage; however, enough people have confused functions with their values. . . ”

Exercise 19. Find a function f : N → N which is


(i) surjective but not injective,
(ii) injective but not surjective,
(iii) neither surjective nor injective,
(iv) bijective.

If a function f is a bijection then it is possible to define a function g : Y → X
such that g(y) = x where y = f (x). Thus, to each element y of Y is assigned the
element x in X whose image under f is y. Since f is onto, g is defined for every y
of Y and since f is one-to-one g(y) is unique. The function g is called the inverse of
f and is usually written as f −1 . In that case, however, it’s not immediately clear
what f −1 (x) means. Is it the inverse image of x under f or the image of x under
f −1 ? Happily enough they are the same if f −1 exists!

Exercise 20. Prove that when a function f −1 exists it is both onto and one-
to-one and that the inverse of f −1 is the function f itself.

If f : X → Y and g : Y → Z, then the function h : X → Z, defined as


h(x) = g(f (x)), is called the composition of g with f and denoted by g ◦ f . Note
that even if f ◦ g is well-defined it is usually different from g ◦ f .
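The asymmetry of composition is easy to see with a concrete pair of functions. A brief Python illustration (ours; the particular f and g are hypothetical choices):

```python
def f(x):
    return x + 1        # f: R -> R

def g(x):
    return 2 * x        # g: R -> R

def compose(outer, inner):
    """Return the composition outer o inner."""
    return lambda x: outer(inner(x))

h = compose(g, f)       # h = g o f, so h(x) = 2(x + 1)
k = compose(f, g)       # k = f o g, so k(x) = 2x + 1

print(h(3), k(3))       # 8 7 -- so g o f and f o g differ
```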

Exercise 21. Let f : X → Y . Prove that there exist a surjection g : X → A,
where A ⊆ X, and an injection h : A → Y such that f = h ◦ g. In other words, prove
that any function can be written as a composition of a surjection and an injection.

The set G ⊂ X × Y of ordered pairs (x, f (x)) is called the graph of the function
f .3 Of course, the fact that something is called a graph does not necessarily mean
that it can be drawn!

5. Spaces
Sets are reasonably interesting mathematical objects to study. But to make
them even more interesting (and useful for applications) sets are usually endowed
with some additional properties, or structures. These new objects are called spaces.
The structures are often modeled after the familiar properties of space we live in and
reflect (in axiomatic form) such notions as order, distance, addition, multiplication,
etc.
Probably one of the most intuitive spaces is the space of the real numbers, R.
We will briefly look at the axiomatic way of describing some of its properties.
Given the set of real numbers R, the operation of addition is the function
+ : R × R → R that maps any two elements x and y in R to an element denoted
by x + y and called the sum of x and y. The addition satisfies the following axioms
for all real numbers x, y, and z.
A1: x + y = y + x.
A2: (x + y) + z = x + (y + z).
A3: There exists an element, denoted by 0, such that x + 0 = x.
A4: For each x there exists an element, denoted by −x, such that x + (−x) = 0.
All the remaining properties of the addition can be proven using these axioms.
Note also that we can define another operation x − y as x + (−y) and call it
subtraction.

3Some people like the idea of the graph of a function so much that they define a function to
be its graph.

Exercise 22. Prove that the axioms for addition imply the following state-
ments.
(i) The element 0 is unique.
(ii) If x + y = x + z then y = z (a cancellation law).
(iii) −(−x) = x.
The operation of multiplication can be axiomatised in a similar way. Given the
set of real numbers, R, the operation of multiplication is the function · : R × R → R
that maps any two elements x and y in R to an element denoted by x · y and called
the product of x and y. The multiplication satisfies the following axioms for all real
numbers x, y, and z.
A5: x · y = y · x.
A6: (x · y) · z = x · (y · z).
A7: There exists an element, denoted by 1, such that x · 1 = x.
A8: For each x ≠ 0 there exists an element, denoted by x−1 , such that
x · x−1 = 1.
One more axiom (a distributive law) brings these two operations, addition and
multiplication,4 together.
A9: x(y + z) = xy + xz for all x, y, and z in R.
Another structure possessed by the real numbers has to do with the fact that
the real numbers are ordered. The notion of x less than y can be axiomatised as
follows. For any two distinct elements x and y either x < y or y < x and, in
addition, if x < y and y < z then x < z.
Another example of a space (a very important and useful one) is n-dimensional
real space.5 Given the natural number n, define Rn to be the set of all possible
ordered n-tuples of real numbers, with generic element denoted by x =
(x1 , . . . , xn ). Thus, the space Rn is the n-fold Cartesian product of the set R with
itself. The real numbers x1 , . . . , xn are called the coordinates of the vector x. Two vectors
x and y are equal if and only if x1 = y1 , . . . , xn = yn . The operation of addition of
two vectors is defined as
x + y = (x1 + y1 , . . . , xn + yn ).
Exercise 23. Prove that the addition of vectors in Rn satisfies the axioms of
addition.
The role of multiplication in this space is played by the operation of multiplication
by a real number, defined for all x in Rn and all α in R by
αx = (αx1 , . . . , αxn ).
Exercise 24. Prove that multiplication by a real number satisfies a distributive
law.
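Exercises 23 and 24 can be sanity-checked numerically. A Python sketch (our addition) representing vectors in Rn as tuples:

```python
def add(x, y):
    """Coordinate-wise addition of two vectors in R^n."""
    return tuple(xi + yi for xi, yi in zip(x, y))

def scale(alpha, x):
    """Multiplication of the vector x by the real number alpha."""
    return tuple(alpha * xi for xi in x)

x, y = (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)

print(add(x, y))                      # (5.0, 7.0, 9.0)
# Commutativity (axiom A1) and a distributive law, on this example:
print(add(x, y) == add(y, x))                                      # True
print(scale(2.0, add(x, y)) == add(scale(2.0, x), scale(2.0, y)))  # True
```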

6. Metric Spaces and Continuous Functions


The notion of a metric is a generalisation of the notion of the distance between two
real numbers.
Let X be a set and d : X × X → R a function. The function d is called a metric
if it satisfies the following properties for all x, y, and z in X.
1. d(x, y) ≥ 0 and d(x, y) = 0 if and only if x = y,
2. d(x, y) = d(y, x),
3. d(x, y) ≤ d(x, z) + d(z, y).

4From now on, to go easy on notation we will follow the standard convention not to write
the symbol for multiplication, that is to write xy instead of x · y, etc.
5We haven’t defined what the word dimension means yet, so just treat it as a (fancy) name.

The set X together with the function d is called a metric space, the elements of X
are usually called points, and the number d(x, y) is called the distance between x
and y. The last property of a metric is called the triangle inequality.
Exercise 25. Let X be a non-empty set and d : X × X → R be a function
that satisfies the following two properties for all x, y, and z in X.
(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) ≤ d(x, z) + d(y, z).
Prove that d is a metric.
Exercise 26. Prove that d(x, y) + d(w, z) ≤ d(x, w) + d(x, z) + d(y, w) + d(y, z)
for all x, y, w, and z in X, where d is some metric on X.
An obvious example of a metric space is the set of real numbers, R, together
with the ‘usual’ distance, d(x, y) = |x − y|. Another example is the n-dimensional
Euclidean space Rn with metric
d(x, y) = √((x1 − y1 )2 + · · · + (xn − yn )2 ).
Note that the same set can be endowed with different metrics, resulting in
different metric spaces! For example, the set of all n-tuples of real numbers
can be made into a metric space by use of the (non-Euclidean) metric
dT (x, y) = |x1 − y1 | + · · · + |xn − yn |,
which gives a metric space different from Euclidean Rn . This metric is sometimes called the
Manhattan (or taxicab) metric. Another curious metric is the so-called French railroad
metric, defined by

dF (x, y) = 0 if x = y, and dF (x, y) = d(x, P ) + d(y, P ) if x ≠ y,
where P is a particular point of Rn (called Paris) and d is the Euclidean
distance.
Exercise 27. Prove that the French railroad metric dF is a metric.
Exercise 28. Let X be a non-empty set and d : X × X → R be the function
defined by d(x, y) = 1 if x ≠ y and d(x, y) = 0 if x = y.
Prove that d is a metric. (This metric is called the discrete metric.)
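The three metric axioms are easy to test numerically on sample points. Here is a Python sketch (our addition, not a proof) checking them for the Euclidean and taxicab metrics on a few points of R2:

```python
import itertools
import math

def d_euclid(x, y):
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_taxi(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

points = [(0.0, 0.0), (1.0, 2.0), (-3.0, 1.5), (2.0, -1.0)]
for d in (d_euclid, d_taxi):
    for x, y, z in itertools.product(points, repeat=3):
        assert d(x, y) >= 0 and (d(x, y) == 0) == (x == y)
        assert d(x, y) == d(y, x)                    # symmetry
        assert d(x, y) <= d(x, z) + d(z, y) + 1e-12  # triangle inequality
print("both metrics pass on the sample points")
```

Of course a finite check is only a sanity test; the exercises above ask for proofs.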
Using the notion of a metric it is possible to generalise the idea of a continuous
function.
Suppose (X, dX ) and (Y, dY ) are metric spaces, x0 ∈ X, and f : X → Y is a
function. Then f is continuous at x0 if for every ε > 0 there exists a δ > 0 such
that
dY (f (x0 ), f (x)) < ε
for all points x ∈ X for which dX (x0 , x) < δ.
The function f is continuous on X if f is continuous at every point of X.
Let’s prove that the function f (x) = x is continuous on R using the above definition.
For all x0 ∈ R, we have |f (x0 ) − f (x)| = |x0 − x| < ε as long as |x0 − x| < δ = ε.
That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that
all points which are closer to x0 than δ will have images which are closer to f (x0 )
than ε.

Exercise 29. Let f : R → R be the function defined by f (x) = 1/x if x ≠ 0
and f (0) = 0.
Prove that f is continuous at every point of R, with the exception of 0.

7. Open sets, Compact Sets, and the Weierstrass Theorem


Let x be a point in a metric space and r > 0. The open ball B(x, r) of radius
r centred at x is the set of all y ∈ X such that d(x, y) < r. Thus, the open ball is
the set of all points whose distance from the centre is strictly less than r. The ball
is closed if the inequality is weak, d(x, y) ≤ r.
A set S in a metric space is open if for all x ∈ S there exists r ∈ R, r > 0 such
that B(x, r) ⊂ S. A set S is closed if its complement
S C = {x ∈ X | x ∉ S}
is open.
Exercise 30. Prove that an open ball is an open set.
Exercise 31. Prove that the intersection of any finite number of open sets is
an open set.
A set S is bounded if there exists a closed ball of finite radius that contains it.
Formally, S is bounded if there exists a closed ball B(x, r) such that S ⊂ B(x, r).
Exercise 32. Prove that the set S is bounded if and only if there exists a
real number p > 0 such that d(x, x0 ) ≤ p for all x and x0 in S.
Exercise 33. Prove that the union of two bounded sets is a bounded set.
A collection (possibly infinite) of open sets U1 , U2 , . . . in a metric space is an
open cover of the set S if S is contained in its union.
A set S is compact if every open cover of S has a finite subcover. That is, from
any open cover we can select a finite number of the sets Ui that still cover S.
Note that the definition does not say that a set is compact if there is a finite
open cover! That wouldn’t be a good definition as you can cover any set with the
whole space, which is just one open set.
Let’s see how to use this definition to show that something is not compact.
Consider the set (0, 1) ⊂ R. To prove that it is not compact we need to find an
open cover of (0, 1) from which we cannot select a finite subcover. The collection of
open intervals (1/n, 1) for all integers n ≥ 2 is an open cover of (0, 1), because for
any point x ∈ (0, 1) it is always possible to find an integer n such that n > 1/x, and
thus x ∈ (1/n, 1). But no finite subcover will do! Let (1/N, 1) be the largest interval
in a candidate finite subcover; then it is always possible to find a point x ∈ (0, 1)
with x < 1/N , and such a point is not covered.
While this definition of compactness is quite useful for showing that the set
in question is not compact, it is less useful for verifying that a set is indeed compact.
A much more convenient characterisation of compact sets in finite-dimensional
Euclidean space, Rn , is given by the following theorem.
Theorem 1. Any closed and bounded subset of Rn is compact.
But why are we interested in compactness at all? Because of the following extremely
important theorem, the first version of which was proved by Karl Weierstrass
around 1860.
Theorem 2. Let S be a compact set in a metric space and f : S → R be a
continuous function. Then f attains its maximum and minimum on S.

And why is this theorem important for us? Because many economic problems
are concerned with finding the maximal (or minimal) value of a function on some set.
The Weierstrass theorem provides conditions under which such a search is meaningful!
This theorem and its implications will be much dwelt upon later in the notes, so
we just give here one example. The consumer utility maximisation problem is the
problem of finding the maximum of a utility function subject to the budget constraint.
According to the Weierstrass theorem, this problem has a solution if the utility function is
continuous and the budget set is compact.

8. Sequences and Subsequences


Let us consider again some metric space (X, d). An infinite sequence of points
in (X, d) is simply a list
x1 , x2 , x3 , . . . ,
where . . . indicates that the list continues “forever.”
We can be a bit more formal about this. We first consider the set of natural
numbers (or counting numbers) 1, 2, 3, . . . , which we denote N. We can now define
an infinite sequence in the following way.
Definition 6. An infinite sequence of elements of X is a function from N to
X.
Notation. If we look at the previous definition we see that we might have
a sequence s : N → X which would define s(1), s(2), s(3), . . . or in other words
would define s(n) for any natural number n. Typically when we are referring to
sequences we use subscripts (or sometimes superscripts) instead of parentheses and
write s1 , s2 , s3 , . . . and sn instead of s(1), s(2), s(3), . . . and s(n). Also, rather than
saying that s : N → X is a sequence we say that {sn } is a sequence, or even that
{sn }∞n=1 is a sequence.

Let’s now examine a few examples.


Example 4. Suppose that (X, d) is R, the real numbers with the usual metric
d(x, y) = |x − y|. Then {n}, {√n}, and {1/n} are sequences.
Example 5. Again, suppose that (X, d) is R, the real numbers with the usual
metric d(x, y) = |x − y|. Consider the sequence {xn } where xn = 1 if n is odd and
xn = 0 if n is even.
We see that {n} and {√n} get arbitrarily large as n gets larger, while in the last
example xn “bounces” back and forth between 0 and 1 as n gets larger. However for
{1/n} the elements of the sequence get closer and closer to 0 (and indeed arbitrarily
close to 0). We say, in this case, that the sequence converges to zero or that the
sequence has limit 0. This is a particularly important concept and so we shall give
a formal definition.
Definition 7. Let {xn } be a sequence of points in (X, d). We say that the
sequence converges to x0 ∈ X if for any ε > 0 there is N ∈ N such that if n > N
then d(xn , x0 ) < ε.
Informally we can describe this by saying that if n is large then the distance
from xn to x0 is small.
If the sequence {xn } converges to x0 , then we often write xn → x0 as n → ∞
or limn→∞ xn = x0 .
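The notes contain no code, but the ε–N definition above can be checked numerically. The sketch below (purely illustrative; the function name N_for is our own) produces, for a given ε > 0, a witness N such that every term of {1/n} beyond N lies within ε of the limit 0:

```python
# Illustrating Definition 7 for x_n = 1/n with limit x_0 = 0:
# given eps > 0, choose N = floor(1/eps) + 1; then n > N implies
# |1/n - 0| < eps, exactly as the definition requires.

def N_for(eps):
    # a witness N for the sequence {1/n}; any larger N also works
    return int(1 / eps) + 1

for eps in [0.5, 0.1, 0.01]:
    N = N_for(eps)
    # check the defining condition for a range of n beyond N
    assert all(abs(1 / n - 0) < eps for n in range(N + 1, N + 1000))
```

The choice N = ⌊1/ε⌋ + 1 is one convenient witness; the definition only requires that some N exist for each ε.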
12 1. LOGIC, SETS, FUNCTIONS, AND SPACES

Exercise 34. Show that if the sequence {xn } converges to x0 then it does not
converge to any other value unequal to x0 . Another way of saying this is that if
the sequence converges then its limit is unique.
We have now seen a number of examples of sequences. In some the sequence
“runs off to infinity;” in others it “bounces around;” while in others it converges to
a limit. Could a sequence do anything else? Could a sequence, for example, settle
down, each element getting closer and closer to all future elements in the sequence
but not converging to any particular limit? In fact, depending on what the space
X is this is indeed possible.
First let us recall the notion of a rational number. A rational number is a
number that can be expressed as the ratio of two integers, that is r is rational if
r = a/b with a and b integers and b ≠ 0. We usually denote the set of all rational
numbers Q (since we have already used R for the real numbers). We now consider
an example in which the underlying space X is Q. Consider the sequence of
rational numbers defined in the following way
x1 = 1,
xn+1 = (xn + 2)/(xn + 1).
This kind of definition is called a recursive definition. Rather than writing, as a
function of n, what xn is we write what x1 is and then what xn+1 is as a function
of what xn is. We can obviously find any element of the sequence that we need, as
long as we sequentially calculate each previous element. In our case we’d have
x1 = 1
x2 = (1 + 2)/(1 + 1) = 3/2 = 1.5
x3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4
x4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667
x5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793
x6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286
...
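The iteration is easy to carry out by machine. The following sketch (not part of the original notes) uses exact rational arithmetic, so every iterate is an element of Q, yet the values crowd in on √2:

```python
# Iterating x_{n+1} = (x_n + 2)/(x_n + 1) from x_1 = 1 using exact
# rationals: each iterate stays in Q, but the values approach sqrt(2).
from fractions import Fraction

x = Fraction(1)
for _ in range(20):
    x = (x + 2) / (x + 1)

# x is still a ratio of integers, yet numerically very close to sqrt(2)
print(x, float(x))
```

The error shrinks by a factor of roughly (√2 − 1)/(xn + 1) ≈ 0.17 per step, so twenty iterations already agree with √2 to machine precision.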
We see that the sequence goes up and down but that it seems to be “converg-
ing.” What is it converging to? Let's suppose that it's converging to some value x0 .
Recall that
xn+1 = (xn + 2)/(xn + 1).
We'll see later that if f is a continuous function then limn→∞ f (xn ) = f (limn→∞ xn ).
In this case that means that
x0 = limn→∞ xn+1 = limn→∞ (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).
Thus we have
x0 = (x0 + 2)/(x0 + 1)


and if we solve this we obtain x0 = ±√2. Clearly if xn > 0 then xn+1 > 0, so
our sequence can't be converging to −√2, so we must have x0 = √2. But √2 is
not in Q. Thus we have a sequence of elements in Q that are getting very close to
each other but are not converging to any element of Q. (Of course the sequence is
converging to a point in R. In fact one construction of the real number system is
in terms of such sequences in Q.)
Definition 8. Let {xn } be a sequence of points in (X, d). We say that the
sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N
then d(xn , xm ) < ε.
Exercise 35. Show that if {xn } converges then {xn } is a Cauchy sequence.
A metric space (X, d) in which every Cauchy sequence converges to a limit in
X is called a complete metric space. The space of real numbers R is a complete
metric space, while the space of rationals Q is not.
Exercise 36. Is N, the space of natural or counting numbers with metric d
given by d(x, y) = |x − y|, a complete metric space?
In Section 6 we defined the notion of a function being continuous at a point.
It is possible to give that definition in terms of sequences.
Definition 9. Suppose (X, dX ) and (Y, dY ) are metric spaces, x0 ∈ X, and
f : X → Y is a function. Then f is continuous at x0 if for every sequence {xn } that
converges to x0 in (X, dX ) the sequence {f (xn )} converges to f (x0 ) in (Y, dY ).
Exercise 37. Show that the function f (x) = (x + 2)/(x + 1) is continuous at
any point x ≠ −1. Show that this means that if xn → x0 as n → ∞ then
limn→∞ (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).
We can also define the concept of a closed set (and hence the concepts of open
sets and compact sets) in terms of sequences.
Definition 10. Let (X, d) be a metric space. A set S ⊂ X is closed if for any
convergent sequence {xn } such that xn ∈ S for all n we have limn→∞ xn ∈ S. A set is
open if its complement is closed.
Given a sequence {xn } we can define a new sequence by taking only some of
the elements of the original sequence. In the example we considered earlier in which
xn was 1 if n was odd and 0 if n was even we could take only the odd n and thus
obtain a sequence that did converge. The new sequence is called a subsequence of
the old sequence.
Definition 11. Let {xn } be some sequence in (X, d). Let {nj }∞j=1 be a
sequence of natural numbers such that for each j we have nj < nj+1 , that is,
n1 < n2 < n3 < . . . . The sequence {xnj }∞j=1 is called a subsequence of the original
sequence.
The notion of a subsequence is often useful. We often use it in the way that
we briefly referred to above. We initially have a sequence that may not converge,
but we are able to take a subsequence that does converge. Such a subsequence is
called a convergent subsequence.
Definition 12. A subset of a metric space with the property that every se-
quence in the subset has a convergent subsequence is called sequentially compact.
Theorem 3. In any metric space any compact set is sequentially compact.

If we restrict attention to finite dimensional Euclidean spaces the situation is
even better behaved.
Theorem 4. Any subset of Rn is sequentially compact if and only if it is
compact.
Exercise 38. Verify the following limits (in (iv), assume a, b > 0).
(i) limn→∞ n/(n + 1) = 1
(ii) limn→∞ (n + 3)/(n^2 + 1) = 0
(iii) limn→∞ (√(n + 1) − √n) = 0
(iv) limn→∞ (a^n + b^n )^(1/n) = max{a, b}
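Sampling each expression at a large n gives a quick sanity check of these limits. The sketch below is a heuristic numerical probe, not a proof, and is not part of the original notes:

```python
# Sampling the limits in Exercise 38 at large n (heuristic check only).
n = 10 ** 6
assert abs(n / (n + 1) - 1) < 1e-5            # (i)
assert abs((n + 3) / (n ** 2 + 1)) < 1e-5     # (ii)
assert abs((n + 1) ** 0.5 - n ** 0.5) < 1e-2  # (iii)

# (iv) with a = 2, b = 3: the nth root of a^n + b^n approaches max{a, b}
a, b = 2.0, 3.0
n = 200
assert abs((a ** n + b ** n) ** (1 / n) - max(a, b)) < 1e-2
```

For (iii), the identity √(n+1) − √n = 1/(√(n+1) + √n) explains why the difference is of order 1/(2√n).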

Exercise 39. Consider a sequence {xn } in R. What can you say about the
sequence if it converges and, for each n, xn is an integer?
Exercise 40. Consider the sequence
1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .
For which values z ∈ R is there a subsequence converging to z?
Exercise 41. Prove that if a subsequence of a Cauchy sequence converges to
a limit z then so does the original Cauchy sequence.
Exercise 42. Prove that any subsequence of a convergent sequence converges.
Finally one somewhat less trivial exercise.
Exercise 43. Prove that if limn→∞ xn = z then
limn→∞ (x1 + · · · + xn )/n = z.
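Before attempting the proof it can help to see the claim numerically. This sketch (ours, not the authors') averages the terms of a sequence converging to 1 and checks that the running average also approaches 1:

```python
# Exercise 43 numerically: if x_n -> z then (x_1 + ... + x_n)/n -> z.
# Here x_n = 1 + 1/n, so z = 1; the average converges, just more slowly.
xs = [1 + 1 / n for n in range(1, 100001)]
avg = sum(xs) / len(xs)
assert abs(avg - 1) < 1e-3
```

The average exceeds 1 by Hn /n, where Hn is the nth harmonic number, which explains the slow but certain convergence.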
9. Linear Spaces
The notion of linear space is the axiomatic way of looking at the familiar linear
operations: addition and multiplication. A trivial example of a linear space is the
set of real numbers, R.
What is the operation of addition? One way of answering the question is
to say that the operation of addition is just the list of its properties. So, we will
define the addition of elements from some set X as the operation that satisfies the
following four axioms.
A1: x + y = y + x for all x and y in X.
A2: x + (y + z) = (x + y) + z, for all x, y, and z in X.
A3: There exists an element, denoted by 0, such that x + 0 = x for all x in
X.
A4: For every x in X there exists an element y in X, called the inverse of x, such
that x + y = 0.
And, to make things more interesting we will also introduce the operation of
‘multiplication by number’ by adding two more axioms.
A5: 1x = x for all x in X.
A6: α(βx) = (αβ)x for all x in X and for all α and β in R.

Finally, two more axioms relating addition and multiplication.


A7: α(x + y) = αx + αy for all x and y in X and for all α in R.
A8: (α + β)x = αx + βx for all x in X and for all α and β in R.

Elements x, y, . . . , w are linearly dependent if there exist real numbers α, β, . . . , λ,
not all of them equal to zero, such that
αx + βy + · · · + λw = 0.
Otherwise, the elements x, y, . . . , w are linearly independent.
If in a space L it is possible to find n linearly independent elements, but any
n + 1 elements are linearly dependent then we say that the space L has dimension
n.
A nonempty subset L0 of a linear space L is called a linear subspace if L0 forms
a linear space in itself. In other words, L0 is a linear subspace of L if for any x and
y in L0 and all α and β in R
αx + βy ∈ L0 .
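A concrete instance of the closure condition can be checked directly. The sketch below (illustrative only; on_line is our own helper) takes the line L0 = {(t, 2t) : t ∈ R} inside R2 and verifies that αx + βy stays on the line for sample points and scalars:

```python
# The line L0 = {(t, 2t)} in R^2 is a linear subspace: it is closed
# under the operation alpha*x + beta*y required by the definition.

def on_line(v):
    # membership test for L0: second coordinate is twice the first
    return abs(v[1] - 2 * v[0]) < 1e-9

x, y = (1.0, 2.0), (-3.0, -6.0)          # two points of L0
for alpha, beta in [(2.0, 5.0), (-1.5, 0.25)]:
    v = (alpha * x[0] + beta * y[0], alpha * x[1] + beta * y[1])
    assert on_line(v)
```

By contrast, a line not through 0, say {(t, 2t + 1)}, fails axiom A3 and is not a subspace.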
CHAPTER 2

Linear Algebra

1. The Space Rn
In the previous chapter we introduced the concept of a linear space or a vector
space. We shall now examine in some detail one example of such a space. This is
the space of all ordered n-tuples (x1 , x2 , . . . , xn ) where each xi is a real number.
We call this space n-dimensional real space and denote it Rn .
Remember from the previous chapter that to define a vector space we not only
need to define the points in that space but also to define how we add such points
and how we multiply such points by scalars. In the case of Rn we do this element
by element in the n-tuple or vector. That is,
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )
and
α(x1 , x2 , . . . , xn ) = (αx1 , αx2 , . . . , αxn ).
Let us consider the case that n = 2, that is, the case of R2 . In this case we can
visualise the space as in the following diagram. The vector (x1 , x2 ) is represented
by the point that is x1 units along from the point (0, 0) in the horizontal direction
and x2 units up from (0, 0) in the vertical direction.
Figure 1. The vector (1, 2) plotted in the (x1 , x2 ) plane.

Let us for the moment continue our discussion in R2 . Notice that we are
implicitly writing a vector (x1 , x2 ) as a sum x1 × v 1 + x2 × v 2 where v 1 is the
unit vector in the first direction and v 2 is the unit vector in the second direction.
Suppose that instead we considered the vectors u1 = (2, 1) = 2 × v 1 + 1 × v 2 and

u2 = (1, 2) = 1 × v 1 + 2 × v 2 . We could have written any vector (x1 , x2 ) instead


as z1 × u1 + z2 × u2 where z1 = (2x1 − x2 )/3 and z2 = (2x2 − x1 )/3. That is, for
any vector in R2 we can uniquely write that vector in terms of u1 and u2 . Is there
anything that is special about u1 and u2 that allows us to make this claim? There
must be since we can easily find other vectors for which this would not have been
true. (For example, (1, 2) and (2, 4).)
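The coordinate formulas quoted above are easy to verify by direct computation. The following sketch (ours, not the authors') checks that z1 u1 + z2 u2 really does recover (x1 , x2 ) for several test vectors:

```python
# Verifying the change of coordinates for u1 = (2, 1), u2 = (1, 2):
# with z1 = (2*x1 - x2)/3 and z2 = (2*x2 - x1)/3 we have
# z1*u1 + z2*u2 = (x1, x2).

def coords(x1, x2):
    return (2 * x1 - x2) / 3, (2 * x2 - x1) / 3

for (x1, x2) in [(1.0, 0.0), (0.0, 1.0), (5.0, -7.0)]:
    z1, z2 = coords(x1, x2)
    assert abs(2 * z1 + 1 * z2 - x1) < 1e-9   # first coordinate
    assert abs(1 * z1 + 2 * z2 - x2) < 1e-9   # second coordinate
```

Trying the same with the dependent pair (1, 2) and (2, 4) fails: the system 2z1 + z2 = x1 , 4z1 + 2z2 = x2 has no solution unless x2 = 2x1 .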
The property of the pair of vectors u1 and u2 is that they are independent. That
is, we cannot write either as a multiple of the other. More generally in n dimensions
we would say that we cannot write any of the vectors as a linear combination of
the others, or equivalently as the following definition.
Definition 13. The vectors x1 , . . . , xk all in Rn are linearly independent if it
is not possible to find scalars α1 , . . . , αk not all zero such that
α1 x1 + · · · + αk xk = 0.
Notice that we do not as a matter of definition require that k = n or even that
k ≤ n. We state as a result that if k > n then the collection x1 , . . . , xk cannot
be linearly independent. (In a real maths course we would, of course, have proved
this.)
Comment 1. If you examine the definition above you will notice that there
is nowhere that we actually need to assume that our vectors are in Rn . We can
in fact apply the same definition of linear independence to any vector space. This
allows us to define the concept of the dimension of an arbitrary vector space as the
maximal number of linearly independent vectors in that space. In the case of Rn
we obtain that the dimension is in fact n.
Exercise 44. Suppose that x1 , . . . , xk all in Rn are linearly independent and
that the vector y in Rn is equal to β1 x1 + · · · + βk xk . Show that this is the only
way that y can be expressed as a linear combination of the xi ’s. (That is show that
if y = γ1 x1 + · · · + γk xk then β1 = γ1 , . . . , βk = γk .)
The set of all vectors that can be written as a linear combination of the vectors
x1 , . . . , xk is called the span of those vectors. If x1 , . . . , xk are linearly independent
and if the span of x1 , . . . , xk is all of Rn then the collection { x1 , . . . , xk } is called
a basis for Rn . (Of course, in this case we must have k = n.) Any vector in Rn
can be uniquely represented as a linear combination of the vectors x1 , . . . , xk . We
shall later see that it can sometimes be useful to choose a particular basis in which
to represent the vectors with which we deal.
It may be that we have a collection of vectors { x1 , . . . , xk } whose span is not
all of Rn . In this case we call the span of { x1 , . . . , xk } a linear subspace of Rn .
Alternatively we say that X ⊂ Rn is a linear subspace of Rn if X is closed under
vector addition and scalar multiplication. That is, if for all x, y ∈ X the vector
x + y is also in X and for all x ∈ X and α ∈ R the vector αx is in X. If the span
of x1 , . . . , xk is X and if x1 , . . . , xk are linearly independent then we say that these
vectors are a basis for the linear subspace X. In this case the dimension of the
linear subspace X is k. In general the dimension of the span of x1 , . . . , xk is equal
to the maximum number of linearly independent vectors in x1 , . . . , xk .
Finally, we comment that Rn is a metric space with metric d : Rn × Rn → R+
defined by
d((x1 , . . . , xn ), (y1 , . . . , yn )) = √((x1 − y1 )2 + · · · + (xn − yn )2 ).
There are many other metrics we could define on this space but this is the standard
one.

2. Linear Functions from Rn to Rm


In the previous section we introduced the space Rn . Here we shall discuss
functions from one such space to another (possibly of different dimension). The
concept of continuity that we introduced for metric spaces is immediately applicable
here. We shall be mainly concerned here with an even narrower class of functions,
namely, the linear functions.
Definition 14. A function f : Rn → Rm is said to be a linear function if it
satisfies the following two properties.
(1) f (x + y) = f (x) + f (y) for all x, y ∈ Rn , and
(2) f (αx) = αf (x) for all x ∈ Rn and α ∈ R.
Comment 2. When considering functions of a single real variable, that is,
functions from R to R, functions of the form f (x) = ax + b where a and b are
fixed constants are sometimes called linear functions. It is easy to see that if b ≠ 0
then such functions do not satisfy the conditions given above. We shall call such
functions affine functions. More generally we shall call a function g : Rn → Rm an
affine function if it is the sum of a linear function f : Rn → Rm and a constant
b ∈ Rm . That is, if for any x ∈ Rn , g(x) = f (x) + b.
Let us now suppose that we have two linear functions f : Rn → Rm and
g : Rn → Rm . It is straightforward to show that the function (f + g) : Rn → Rm
defined by (f + g)(x) = f (x) + g(x) is also a linear function. Similarly if we have a
linear function f : Rn → Rm and a constant α ∈ R the function (αf ) : Rn → Rm
defined by (αf )(x) = αf (x) is a linear function. If f : Rn → Rm and g : Rm →
Rk are linear functions then the composite function g ◦ f : Rn → Rk defined by
g ◦ f (x) = g(f (x)) is again a linear function. Finally, if f : Rn → Rn is not only
linear, but also one-to-one and onto so that it has an inverse f −1 : Rn → Rn then
the inverse function is also a linear function.
Exercise 45. Prove the facts stated in the previous paragraph.
Recall in the previous section we defined the notion of a linear subspace. A
linear function f : Rn → Rm defines two important subspaces, the image of f ,
denoted Im(f ) ⊂ Rm , and the kernel of f , denoted Ker(f ) ⊂ Rn . The image of f
is the set of all vectors in Rm such that f maps some vector in Rn to that vector,
that is,
Im(f ) = { y ∈ Rm | ∃x ∈ Rn such that y = f (x) }.
The kernel of f is the set of all vectors in Rn that are mapped by the function f
to the zero vector in Rm , that is,
Ker(f ) = { x ∈ Rn | f (x) = 0 }.
The kernel of f is sometimes called the null space of f .
It is intuitively clear that the dimension of Im(f ) is no more than n. (It is of
course no more than m since it is contained in Rm .) Of course, in general it may be
less than n, for example if m < n or if f mapped all points in Rn to the zero vector
in Rm . (You should satisfy yourself that this function is indeed a linear function.)
However if the dimension of Im(f ) is indeed less than n it means that the function
has mapped the n-dimensional space Rn into a linear space of lower dimension and
that in the process some dimensions have been lost. The linearity of f means that
a linear subspace of dimension equal to the number of dimensions that have been
lost must have been collapsed to the zero vector (and that translates of this linear
subspace have been collapsed to single points). Thus we can say that
dim(Im(f )) + dim(Ker(f )) = n.
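This dimension identity (the rank–nullity theorem) can be seen numerically once linear maps are written as matrices, as the next sections do. The sketch below is our own illustration using numpy, with a map from R3 to R2 whose kernel we know explicitly:

```python
# A numerical sketch of dim(Im(f)) + dim(Ker(f)) = n for the linear map
# f: R^3 -> R^2 represented (see the next sections) by the matrix A.
import numpy as np

A = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0]])

rank = np.linalg.matrix_rank(A)          # dim(Im(f)) = 1 here
# (1, -1, 0) and (0, 0, 1) span the kernel: both are mapped to zero
for v in [np.array([1.0, -1.0, 0.0]), np.array([0.0, 0.0, 1.0])]:
    assert np.allclose(A @ v, 0)

assert rank + 2 == 3                     # dim(Im) + dim(Ker) = n
```

Here a two-dimensional subspace of R3 collapses to the zero vector, exactly the "lost dimensions" described above.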

In the following section we shall introduce the notion of a matrix and define
various operations on matrices. If you are like me when I first came across matrices,
these definitions may seem somewhat arbitrary and mysterious. However, we shall
see that matrices may be viewed as representations of linear functions and that when
viewed in this way the operations we define on matrices are completely natural.

3. Matrices and Matrix Algebra


A matrix is defined as a rectangular array of numbers. If the matrix contains
m rows and n columns it is called an m × n matrix (read “m by n” matrix). The
element in the ith row and the jth column is called the ijth element. We typically
enclose a matrix in square brackets [ ] and write it as
 
[ a11 . . . a1n ]
[  :         :  ]
[ am1 . . . amn ]
In the case that m = n we call the matrix a square matrix. If m = 1 the matrix
contains a single row and we call it a row vector. If n = 1 the matrix contains
a single column and we call it a column vector. For most purposes we do not
distinguish between a 1 × 1 matrix [a] and the scalar a.
Just as we defined the operation of vector addition and the multiplication of
a vector by a scalar we define similar operations for matrices. In order to be able
to add two matrices we require that the matrices be of the same dimension. That
is, if matrix A is of dimension m × n we shall be able to add the matrix B to it
if and only if B is also of dimension m × n. If this condition is met then we add
matrices simply by adding the corresponding elements of each matrix to obtain the
new m × n matrix A + B. That is,
     
[ a11 . . . a1n ]   [ b11 . . . b1n ]   [ a11 + b11 . . . a1n + b1n ]
[  :         :  ] + [  :         :  ] = [     :               :     ]
[ am1 . . . amn ]   [ bm1 . . . bmn ]   [ am1 + bm1 . . . amn + bmn ]
We can see that this definition of matrix addition satisfies many of the same
properties of the addition of scalars. If A, B, and C are all m × n matrices then
(1) A + B = B + A,
(2) (A + B) + C = A + (B + C),
(3) there is a zero matrix 0 such that for any m × n matrix A we have A + 0 =
0 + A = A, and
(4) there is a matrix −A such that A + (−A) = (−A) + A = 0.
Of course, the zero matrix referred to in 3 is simply the m × n matrix consisting
of all zeros (this is called a null matrix ) and the matrix −A referred to in 4 is the
matrix obtained from A by replacing each element of A by its negative, that is,
   
  [ a11 . . . a1n ]   [ −a11 . . . −a1n ]
− [  :         :  ] = [   :           :  ]
  [ am1 . . . amn ]   [ −am1 . . . −amn ]
Now, given a scalar α in R and an m × n matrix A we define the product of α
and A which we write αA to be the matrix in which each element is replaced by α
times that element, that is,
   
  [ a11 . . . a1n ]   [ αa11 . . . αa1n ]
α [  :         :  ] = [   :           :  ]
  [ am1 . . . amn ]   [ αam1 . . . αamn ]

So far the definitions of matrix operations have all seemed the most natural
ones. We now come to defining matrix multiplication. Perhaps here the definition
seems somewhat less natural. However in the next section we shall see that the defi-
nition we shall give is in fact very natural when we view matrices as representations
of linear functions.
We define matrix multiplication of A times B written as AB where A is an
m × n matrix and B is a p × q matrix only when n = p. In this case the product
AB is defined to be an m × q matrix in which the element in the ith row and jth
column is ai1 b1j + ai2 b2j + · · · + ain bnj . That is, to find the term to go in the ith row and the jth
column of the product matrix AB we take the ith row of the matrix A which will
be a row vector with n elements and the jth column of the matrix B which will be
a column vector with n elements. We then multiply each element of the first vector
by the corresponding element of the second and add all these products. Thus

[ a11 . . . a1n ] [ b11 . . . b1q ]   [ Σk a1k bk1 . . . Σk a1k bkq ]
[  :         :  ] [  :         :  ] = [      :                :     ]
[ am1 . . . amn ] [ bn1 . . . bnq ]   [ Σk amk bk1 . . . Σk amk bkq ]
where each sum Σk runs over k = 1, . . . , n.

For example,
[ a b c ] [ p q ]   [ ap + br + ct   aq + bs + cv ]
[ d e f ] [ r s ] = [ dp + er + ft   dq + es + fv ]
          [ t v ]
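Substituting concrete numbers makes the row-by-column rule easy to check. The sketch below (ours; numpy's `@` operator happens to implement exactly this rule) verifies one 2 × 3 times 3 × 2 product by hand-computable entries:

```python
# Checking the 2x3 times 3x2 product rule with concrete numbers.
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])        # plays the role of [[a,b,c],[d,e,f]]
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])         # plays the role of [[p,q],[r,s],[t,v]]

C = A @ B
# first entry: 1*7 + 2*9 + 3*11 = 58, and so on for each row/column pair
assert np.array_equal(C, np.array([[58, 64], [139, 154]]))
```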

We define the identity matrix of order n to be the n × n matrix that has 1’s on
its main diagonal and zeros elsewhere, that is, whose ijth element is 1 if i = j and
zero if i ≠ j. We denote this matrix by In or, if the order is clear from the context,
simply I. That is,

 
    [ 1 0 . . . 0 ]
I = [ 0 1 . . . 0 ]
    [ :  :      : ]
    [ 0 0 . . . 1 ]

It is easy to see that if A is an m × n matrix then AIn = A and Im A = A. In fact,


we could equally well define the identity matrix to be that matrix that satisfies
these properties for all such matrices A in which case it would be easy to show that
there was a unique matrix satisfying this property, namely, the matrix we defined
above.
Consider an m × n matrix A. The columns of A are m-dimensional vectors,
that is, elements of Rm and the rows of A are elements of Rn . Thus we can ask
if the n columns are linearly independent and similarly if the m rows are linearly
independent. In fact we ask: What is the maximum number of linearly independent
columns of A? It turns out that this is the same as the maximum number of linearly
independent rows of A. We call the number the rank of the matrix A.
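The equality of row rank and column rank stated here can be probed numerically. The sketch below (our illustration) builds a matrix with one dependent row and confirms that A and its transpose have the same rank:

```python
# Row rank equals column rank: the rank of A and of A^T coincide.
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # twice the first row, so dependent
              [0.0, 1.0, 1.0]])

r_cols = np.linalg.matrix_rank(A)     # max. independent columns
r_rows = np.linalg.matrix_rank(A.T)   # max. independent rows
assert r_cols == r_rows == 2
```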

4. Matrices as Representations of Linear Functions


Let us suppose that we have a particular linear function f : Rn → Rm . We have
suggested in the previous section that such a function can necessarily be represented
as multiplication by some matrix. We shall now show that this is true. Moreover
we shall do so by explicitly constructing the appropriate matrix.
Let us write the n-dimensional vector x as a column vector

    [ x1 ]
x = [ x2 ]
    [  : ]
    [ xn ]

Now, notice that we can write the vector x as a sum x1 e1 + x2 e2 + · · · + xn en ,
where ei is the ith unit vector, that is, the vector with 1 in the ith place and zeros
elsewhere. That is,

[ x1 ]      [ 1 ]      [ 0 ]              [ 0 ]
[ x2 ] = x1 [ 0 ] + x2 [ 1 ] + · · · + xn [ 0 ]
[  : ]      [ : ]      [ : ]              [ : ]
[ xn ]      [ 0 ]      [ 0 ]              [ 1 ]

Now from the linearity of the function f we can write


f (x) = f (x1 e1 + x2 e2 + · · · + xn en )
      = f (x1 e1 ) + f (x2 e2 ) + · · · + f (xn en )
      = x1 f (e1 ) + x2 f (e2 ) + · · · + xn f (en ).

But, what is f (ei )? Remember that ei is a unit vector in Rn and that f maps
vectors in Rn to vectors in Rm . Thus f (ei ) is the image in Rm of the vector ei . Let
us write f (ei ) as
 
[ a1i ]
[ a2i ]
[  :  ]
[ ami ]
Thus
f (x) = x1 f (e1 ) + x2 f (e2 ) + · · · + xn f (en )

       [ a11 ]      [ a12 ]              [ a1n ]
  = x1 [ a21 ] + x2 [ a22 ] + · · · + xn [ a2n ]
       [  :  ]      [  :  ]              [  :  ]
       [ am1 ]      [ am2 ]              [ amn ]

    [ a11 x1 + a12 x2 + · · · + a1n xn ]
  = [ a21 x1 + a22 x2 + · · · + a2n xn ]
    [                :                ]
    [ am1 x1 + am2 x2 + · · · + amn xn ]

and this is exactly what we would have obtained had we multiplied the matrices

  
[ a11 a12 . . . a1n ] [ x1 ]
[ a21 a22 . . . a2n ] [ x2 ]
[  :   :         :  ] [  : ]
[ am1 am2 . . . amn ] [ xn ]

Thus we have not only shown that a linear function is necessarily represented by
multiplication by a matrix we have also shown how to find the appropriate matrix.
It is precisely the matrix whose n columns are the images under the function of the
n unit vectors in Rn .

Exercise 46. Find the matrices that represent the following linear functions
from R2 to R2 .

(1) a clockwise rotation of π/2 (90◦ ),


(2) a reflection in the x1 axis,
(3) a reflection in the line x2 = x1 (that is, the 45◦ line),
(4) a counter clockwise rotation of π/4 (45◦ ), and
(5) a reflection in the line x2 = x1 followed by a counter clockwise rotation of
π/4.
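As a hedged hint for part (4), the matrix of a rotation can be built column by column from the images of the unit vectors, exactly as the construction above prescribes. This sketch is our own illustration, not the official solution:

```python
# Exercise 46(4) as a sketch: the matrix of a counterclockwise rotation
# by pi/4 has columns f(e1) and f(e2).
import numpy as np

t = np.pi / 4
R = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

# the first column of R is, by construction, the image of e1 = (1, 0)
assert np.allclose(R @ np.array([1.0, 0.0]), R[:, 0])
# rotating (1, 0) by 45 degrees gives (1/sqrt(2), 1/sqrt(2))
assert np.allclose(R @ np.array([1.0, 0.0]),
                   [1 / np.sqrt(2), 1 / np.sqrt(2)])
```

The other parts follow the same recipe: compute where each unit vector goes and use those images as the columns.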

Recall that in Section 2 we defined, for any f, g : Rn → Rm and α ∈ R, the


functions (f + g) and (αf ). In Section 3 we defined the sum of two m × n matrices
A and B, and the product of a scalar α with the matrix A. Let us instead define
the sum of A and B as follows.
Let f : Rn → Rm be the linear function represented by the matrix A and
g : Rn → Rm be the linear function represented by the matrix B. Now define
the matrix (A + B) to be the matrix that represents the linear function (f + g).
Similarly let the matrix αA be the matrix that represents the linear function (αf ).

Exercise 47. Prove that the matrices (A + B) and αA defined in the previous
paragraph coincide with the matrices defined in Section 3.

We can also see that the definition we gave of matrix multiplication is precisely
the right definition if we mean multiplication of matrices to mean the composition of
the linear functions that the matrices represent. To be more precise let f : Rn → Rm
and g : Rm → Rk be linear functions and let A and B be the m × n and k × m
matrices that represent them. Let (g ◦ f ) : Rn → Rk be the composite function
defined in Section 2. Now let us define the product BA to be that matrix that
represents the linear function (g ◦ f ).

Now since the matrix A represents the function f and B represents g we have
(g ◦ f )(x) = g(f (x)) = g(Ax) = B(Ax).
The vector Ax is the column vector whose ith entry is ai1 x1 + ai2 x2 + · · · + ain xn .
Applying B, the jth entry of B(Ax) is
bj1 (a11 x1 + · · · + a1n xn ) + bj2 (a21 x1 + · · · + a2n xn ) + · · · + bjm (am1 x1 + · · · + amn xn )
and collecting the coefficient of each xi this equals
(bj1 a11 + bj2 a21 + · · · + bjm am1 )x1 + · · · + (bj1 a1n + bj2 a2n + · · · + bjm amn )xn .
That is, B(Ax) = Cx, where C is the k × n matrix whose jith element is
bj1 a1i + bj2 a2i + · · · + bjm ami .
And this last is the product of the matrix we defined in Section 3 to be BA with
the column vector x. As we have claimed the definition of matrix multiplication
we gave in Section 3 was not arbitrary but rather was forced on us by our decision
to regard the multiplication of two matrices as corresponding to the composition
of the linear functions the matrices represented.
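The claim that composition corresponds to matrix multiplication is easy to spot-check numerically. The sketch below (our illustration) compares applying the two maps in sequence with applying the product matrix once:

```python
# Composition corresponds to matrix multiplication:
# g(f(x)) = B(Ax) = (BA)x for sample A, B, x.
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])        # represents f: R^2 -> R^3
B = np.array([[1.0, 0.0, -1.0],
              [2.0, 1.0,  0.0]])  # represents g: R^3 -> R^2

x = np.array([7.0, -2.0])
assert np.allclose(B @ (A @ x), (B @ A) @ x)   # g(f(x)) = (BA)x
```

Note the order: BA, not AB, represents g ◦ f, and indeed AB is not even defined here since the dimensions do not match.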
Recall that the columns of the matrix A that represented the linear function
f : Rn → Rm were precisely the images of the unit vectors in Rn under f . The
linearity of f means that the image of any point in Rn is in the span of the images
of these unit vectors and similarly that any point in the span of the images is the
image of some point in Rn . Thus Im(f ) is equal to the span of the columns of
A. Now, the dimension of the span of the columns of A is equal to the maximum
number of linearly independent columns in A, that is, to the rank of A.

5. Linear Functions from Rn to Rn and Square Matrices


In the remainder of this chapter we look more closely at an important subclass
of linear functions and the matrices that represent them, viz. the functions that
map Rn to itself. From what we have already said we see immediately that the
matrix representing such a linear function will have the same number of rows as it
has columns. We call such a matrix a square matrix.

If the linear function f : Rn → Rn is one-to-one and onto then the function f


has an inverse f −1 . In Exercise 45 you showed that this function too was linear.
A matrix that represents a linear function that is one-to-one and onto is called a
nonsingular matrix. Alternatively we can say that an n × n matrix is nonsingular
if the rank of the matrix is n. To see these two statements are equivalent note
first that if f is one-to-one then Ker(f ) = {0}. (This is the trivial direction of
Exercise 48.) But this means that dim(Ker(f )) = 0 and so dim(Im(f )) = n. And,
as we argued at the end of the previous section this is the same as the rank of
matrix that represents f .
Exercise 48. Show that the linear function f : Rn → Rm is one-to-one if and
only if Ker(f ) = {0}.
Exercise 49. Show that the linear function f : Rn → Rn is one-to-one if and
only if it is onto.

6. Inverse Functions and Inverse Matrices


In the previous section we discussed briefly the idea of the inverse of a linear
function f : Rn → Rn . This allows us a very easy definition of the inverse of a
square matrix A. The inverse of A is the matrix that represents the linear function
that is the inverse function of the linear function that A represents. We write the
inverse of the matrix A as A−1 . Thus a matrix will have an inverse if and only if
the linear function that the matrix represents has an inverse, that is, if and only
if the linear function is one-to-one and onto. We saw in the previous section that
this will occur if and only if the kernel of the function is {0} which in turn occurs
if and only if the image of f is of full dimension, that is, is all of Rn . This is the
same as the matrix being of full rank, that is, of rank n.
As with the ideas we have discussed earlier we can express the idea of a matrix
inverse purely in terms of matrices without reference to the linear function that
they represent. Given an n × n matrix A we define the inverse of A to be a matrix
B such that BA = In where In is the n × n identity matrix discussed in Section 3.
Such a matrix B will exist if and only if the matrix A is nonsingular. Moreover, if
such a matrix B exists then it is also true that AB = In , that is, (A−1 )−1 = A.
In Section 9 we shall see one method for calculating inverses of general n × n
matrices. Here we shall simply describe how to calculate the inverse of a 2 × 2
matrix. Suppose that we have the matrix
 
A = [ a b ]
    [ c d ] .
The inverse of this matrix is
A−1 = 1/(ad − bc) · [  d  −b ]
                    [ −c   a ] .
Exercise 50. Show that the matrix A is of full rank if and only if ad − bc ≠ 0.
Exercise 51. Check that the matrix given is, in fact, the inverse of A.
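Exercise 51 can be checked numerically for a particular matrix. The sketch below (ours, with an arbitrarily chosen nonsingular A) confirms that the stated formula produces a two-sided inverse:

```python
# Checking the 2x2 inverse formula on a sample matrix with ad - bc != 0.
import numpy as np

a, b, c, d = 3.0, 1.0, 4.0, 2.0           # ad - bc = 2, so A is nonsingular
A = np.array([[a, b], [c, d]])
A_inv = np.array([[d, -b], [-c, a]]) / (a * d - b * c)

assert np.allclose(A_inv @ A, np.eye(2))  # BA = I
assert np.allclose(A @ A_inv, np.eye(2))  # and AB = I as claimed
```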

7. Changes of Basis
We have until now implicitly assumed that there is no ambiguity when we
speak of the vector (x1 , x2 , . . . , xn ). Sometimes there may indeed be an obvious
meaning to such a vector. However when we define a linear space all that are really
specified are “what straight lines are” and “where zero is.” In particular, we do
not necessarily have defined in an unambiguous way “where the axes are” or “what

a unit length along each axis is.” In other words we may not have a set of basis
vectors specified.
Even when we do have, or have decided on, a set of basis vectors we may wish
to redefine our description of the linear space with which we are dealing so as to
use a different set of basis vectors. Let us suppose that we have an n-dimensional
space, even Rn say, with a given set of basis vectors v 1 , v 2 , . . . , v n and that we
wish instead to describe the space in terms of the linearly independent vectors
b1 , b2 , . . . , bn where
bi = b1i v 1 + b2i v 2 + · · · + bni v n .
Now, if we had the description of a point in terms of the new coordinate vectors,
e.g., as
z1 b1 + z2 b2 + · · · + zn bn
then we can easily convert this to a description in terms of the original basis vectors.
We would simply substitute the formula for bi in terms of the v j ’s into the previous
formula giving
(Σi b1i zi ) v 1 + (Σi b2i zi ) v 2 + · · · + (Σi bni zi ) v n ,
where each sum Σi runs over i = 1, . . . , n,

or, in our previous notation


[ Σi b1i zi ]
[ Σi b2i zi ]
[     :     ]
[ Σi bni zi ]

But this is simply the product


  
[ b11 b12 . . . b1n ] [ z1 ]
[ b21 b22 . . . b2n ] [ z2 ]
[  :   :         :  ] [  : ]
[ bn1 bn2 . . . bnn ] [ zn ]

That is, if we are given an n-tuple of real numbers that describe a vector in terms
of the new basis vectors b1 , b2 , . . . , bn and we wish to find the n-tuple that describes
the vector in terms of the original basis vectors we simply multiply the n-tuple we
are given, written as a column vector, by the matrix whose columns are the new
basis vectors b1 , b2 , . . . , bn . We shall call this matrix B. We see among other things
that changing the basis is a linear operation.
Now, if we were given the information in terms of the original basis vectors
and wanted to write it in terms of the new basis vectors what should we do? Since
we don’t have the original basis vectors written in terms of the new basis vectors
this is not immediately obvious. However we do know that if we were to do it and
then were to carry out the operation described in the previous paragraph we would
be back where we started. Further, we know that the operation is a linear
operation that maps n-tuples to n-tuples and so is represented by multiplication
by an n × n matrix. That is we multiply the n-tuple written as a column vector by
the matrix that when multiplied by B gives the identity matrix, that is, the matrix
B −1 . If we are given a vector of the form

x1 v 1 + x2 v 2 + · · · + xn v n

and we wish to express it in terms of the vectors b1 , b2 , . . . , bn we calculate

    ( b11 b12 . . . b1n )⁻¹ ( x1 )
    ( b21 b22 . . . b2n )   ( x2 )
    (  ⋮   ⋮        ⋮   )   (  ⋮ ) .
    ( bn1 bn2 . . . bnn )   ( xn )
Suppose now that we consider a linear function f : Rn → Rn and that we have
originally described Rn in terms of the standard basis vectors e1 , e2 , . . . , en where ei is
the vector with 1 in the ith place and zeros elsewhere. Suppose that with these basis
vectors f is represented by the matrix

        ( a11 a12 . . . a1n )
    A = ( a21 a22 . . . a2n )
        (  ⋮   ⋮        ⋮   ) .
        ( an1 an2 . . . ann )
If we now describe Rn in terms of the vectors b1 , b2 , . . . , bn how will the linear
function f be represented? Let us think about what we want. We shall be given
a vector described in terms of the basis vectors b1 , b2 , . . . , bn and we shall want
to know what the image of this vector under the linear function f is, where we
shall again want our answer in terms of the basis vectors b1 , b2 , . . . , bn . We shall
know how to do this when we are given the description in terms of the vectors
e1 , e2 , . . . , en . Thus the first thing we shall do with our vector is to convert it from
a description in terms of b1 , b2 , . . . , bn to a description in terms of e1 , e2 , . . . , en . We
do this by multiplying the n-tuple by the matrix B. Thus if we call our original
n-tuple z we shall now have a description of the vector in terms of e1 , e2 , . . . , en ,
viz Bz. Given this description we can find the image of the vector in question
under f by multiplying by the matrix A. Thus we shall have A(Bz) = (AB)z.
Remember however this will have given us the image vector in terms of the basis
vectors e1 , e2 , . . . , en . In order to convert this to a description in terms of the vectors
b1 , b2 , . . . , bn we must multiply by the matrix B −1 . Thus our final n-tuple will be
(B −1 AB)z.
Recapitulating, suppose that we know that the linear function f : Rn → Rn is
represented by the matrix A when we describe Rn in terms of the standard basis
vectors e1 , e2 , . . . , en and that we have a new set of basis vectors b1 , b2 , . . . , bn . Then
when Rn is described in terms of these new basis vectors the linear function f will
be represented by the matrix B −1 AB.
Exercise 52. Let f : Rn → Rm be a linear function. Suppose that with the
standard bases for Rn and Rm the function f is represented by the matrix A. Let
b1 , b2 , . . . , bn be a new set of basis vectors for Rn and c1 , c2 , . . . , cm be a new set of
basis vectors for Rm . What is the matrix that represents f when the linear spaces
are described in terms of the new basis vectors?
Exercise 53. Let f : R2 → R2 be a linear function. Suppose that with the
standard basis for R2 the function f is represented by the matrix

    ( 3  1 )
    ( 1  2 ).

Let

    ( 3 )        ( 1 )
    ( 2 )   and  ( 1 )
be a new set of basis vectors for R2 . What is the matrix that represents f when
R2 is described in terms of the new basis vectors?

Properties of a square matrix that depend only on the linear function that the
matrix represents and not on the particular choice of basis vectors for the linear
space are called invariant properties. We have already seen one example of an
invariant property, the rank of a matrix. The rank of a matrix is equal to the
dimension of the image space of the function that the matrix represents which
clearly depends only on the function and not on the choice of basis vectors for the
linear space.
The idea of a property being invariant can be expressed also in terms only of
matrices without reference to the idea of linear functions. A property is invariant
if whenever an n × n matrix A has the property then for any nonsingular n × n
matrix B the matrix B −1 AB also has the property. We might think of rank as a
function that associates to any square matrix a nonnegative integer. We shall say
that such a function is an invariant if the property of having the function take a
particular value is invariant for all particular values we may choose.
Two particularly important invariants are the trace of a square matrix and the
determinant of a square matrix. We examine these in more detail in the following
section.

8. The Trace and the Determinant


In this section we define two important real valued functions on the space
of n × n matrices, the trace and the determinant. Both of these concepts have
geometric interpretations. However, while the trace is easy to calculate (much easier
than the determinant) its geometric interpretation is rather hard to see. Thus we
shall not go into it. On the other hand the determinant while being somewhat
harder to calculate has a very clear geometric interpretation. In Section 9 we shall
examine in some detail how to calculate determinants. In this section we shall be
content to discuss one definition and the geometric intuition of the determinant.
Given an n×n matrix A the trace of A, written tr(A) is the sum of the elements
on the main diagonal, that is,

       ( a11 a12 . . . a1n )
    tr ( a21 a22 . . . a2n ) = Σⁿᵢ₌₁ aii .
       (  ⋮   ⋮        ⋮   )
       ( an1 an2 . . . ann )
Exercise 54. For the matrices given in Exercise 53 confirm that tr(A) =
tr(B −1 AB).
It is easy to see that the trace is a linear function on the space of all n × n
matrices, that is, for all n × n matrices A and B and for all α ∈ R
(1) tr(A + B) = tr(A) + tr(B),
and
(2) tr(αA) = αtr(A).
We can also see that if A and B are both n×n matrices then tr(AB) = tr(BA).
In fact, if A is an m × n matrix and B is an n × m matrix this is still true. This
will often be extremely useful in calculating the trace of a product.
Exercise 55. From the definition of matrix multiplication show that if A is an
m × n matrix and B is an n × m matrix that tr(AB) = tr(BA). [Hint: Look at the
definition of matrix multiplication in Section 2. Then write the trace of the product
matrix using summation notation. Finally change the order of summation.]

The determinant, unlike the trace, is not a linear function of the matrix. It does
however have some linear structure. If we fix all columns of the matrix except one
and look at the determinant as a function of only this column then the determinant
is linear in this single column. Moreover this is true whatever the column we choose.
Let us write the determinant of the n × n matrix A as det(A). Let us also write
the matrix A as [a1 , a2 , . . . , an ] where ai is the ith column of the matrix A. Thus
our claim is that for all n × n matrices A, for all i = 1, 2, . . . , n, for all n-vectors b,
and for all α ∈ R
(3)    det([a1 , . . . , ai−1 , ai + b, ai+1 , . . . , an ]) = det([a1 , . . . , ai−1 , ai , ai+1 , . . . , an ])
                                                      + det([a1 , . . . , ai−1 , b, ai+1 , . . . , an ])

and

(4) det([a1 , . . . , ai−1 , αai , ai+1 , . . . , an ]) = α det([a1 , . . . , ai−1 , ai , ai+1 , . . . , an ]).

We express this by saying that the determinant is a multilinear function.


Also the determinant is such that any n × n matrix that is not of full rank,
that is, not of rank n, has a zero determinant. In fact, given that the determinant
is a multilinear function, if we simply say that any matrix in which one column is
the same as one of its neighbours has a zero determinant, this implies the stronger
statement that we made. We already see one use of calculating determinants. A
matrix is nonsingular if and only if its determinant is nonzero.
The two properties of being multilinear and zero whenever two neighbouring
columns are the same already almost uniquely identify the determinant. Notice
however that if the determinant satisfies these two properties then so does any
constant times the determinant. To uniquely define the determinant we “tie down”
this constant by assuming that det(I) = 1.
Though we haven’t proved that it is so, these three properties uniquely de-
fine the determinant. That is, there is one and only one function with these three
properties. We call this function the determinant. In Section 9 we shall discuss a
number of other useful properties of the determinant. Remember that these additional
properties are not really additional facts about the determinant. They can
all be derived from the three properties we have given here.
Let us now look to the geometric interpretation of the determinant. Let us
first think about what linear transformations can do to the space Rn . Since we
have already said that a linear transformation that is not onto is represented by a
matrix with a zero determinant let us think about linear transformations that are
onto, that is, that do not map Rn into a linear space of lower dimension. Such
transformations can rotate the space around zero. They can “stretch” the space in
different directions. And they can “flip” the space over. In the latter case all objects
will become “mirror images” of themselves. We call linear transformations that
make such a mirror image orientation reversing and those that don’t orientation
preserving. A matrix that represents an orientation preserving linear function has a
positive determinant while a matrix that represents an orientation reversing linear
function has a negative determinant. Thus we have a geometric interpretation of
the sign of the determinant.
The absolute size of the determinant represents how much bigger or smaller the
linear function makes objects. More precisely it gives the “volume” of the image
of the unit hypercube under the transformation. The word volume is in quotes
because it is the volume with which we are familiar only when n = 3. If n = 2 then
it is area, while if n > 3 then it is the full dimensional analog in Rn of volume in
R3 .

Exercise 56. Consider the matrix

    ( 3  1 )
    ( 1  2 ).
In a diagram show the image under the linear function that this matrix represents
of the unit square, that is, the square whose corners are the points (0,0), (1,0),
(0,1), and (1,1). Calculate the area of that image. Do the same for the matrix

    ( 4   1 )
    ( −1  1 ).
In the light of Exercise 53, comment on the answers you calculated.

9. Calculating and Using Determinants


We have already used the concepts of the inverse of a matrix and the determi-
nant of a matrix. The purpose of this section is to cover some of the “cookbook”
aspects of calculating inverses and determinants.
Suppose that we have an n × n matrix

        ( a11 . . . a1n )
    A = (  ⋮         ⋮  )
        ( an1 . . . ann )
then we shall use |A| or

    | a11 . . . a1n |
    |  ⋮         ⋮  |
    | an1 . . . ann |
as an alternative notation for det(A). Always remember that

    | a11 . . . a1n |
    |  ⋮         ⋮  |
    | an1 . . . ann |
is not a matrix but rather a real number. For the case n = 2 we define

             | a11  a12 |
    det(A) = | a21  a22 |
as a11 a22 − a21 a12 . It is possible to also give a convenient formula for the deter-
minant of a 3 × 3 matrix. However, rather than doing this, we shall immediately
consider the case of an n × n matrix.
By the minor of an element of the matrix A we mean the determinant (re-
member a real number) of the matrix obtained from the matrix A by deleting the
row and column containing the element in question. We denote the minor of the
element aij by the symbol |Mij |. Thus, for example,

             | a22 . . . a2n |
    |M11 | = |  ⋮         ⋮  | .
             | an2 . . . ann |

Exercise 57. Write out the minors of a general 3 × 3 matrix.


We now define the cofactor of an element to be either plus or minus the minor
of the element, being plus if the sum of indices of the element is even and minus
if it is odd. We denote the cofactor of the element aij by the symbol |Cij |. Thus
|Cij | = |Mij | if i + j is even and |Cij | = −|Mij | if i + j is odd. Or,
|Cij | = (−1)i+j |Mij |.

We now define the determinant of an n × n matrix A,

                   | a11 . . . a1n |
    det(A) = |A| = |  ⋮         ⋮  |
                   | an1 . . . ann |

to be Σⁿⱼ₌₁ a1j |C1j |. This is the sum of n terms, each one of which is the product
of an element of the first row of the matrix and the cofactor of that element.
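This definition translates directly into a short recursive procedure. The sketch below (plain Python; the example matrices are our own, not those of the exercises) expands along the first row exactly as above:

```python
def minor(A, i, j):
    """The matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion of det(A) along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(n))

print(det([[2, 0, 1], [1, 3, 0], [0, 1, 1]]))   # 7
```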
Exercise 58. Define the determinant of the 1 × 1 matrix [a] to be a. (What
else could we define it to be?) Show that the definition given above corresponds
with the definition we gave earlier for 2 × 2 matrices.
Exercise 59. Calculate the determinants of the following 3 × 3 matrices.

         ( 1 2 3 )        ( 1 5 2 )        ( 1 1 0 )
    (a)  ( 3 6 9 )   (b)  ( 1 4 3 )   (c)  ( 5 4 1 )
         ( 4 5 7 )        ( 0 1 2 )        ( 2 3 2 )

         ( 1 0 0 )        ( 2 5 2 )
    (d)  ( 0 1 0 )   (e)  ( 1 5 3 )
         ( 0 0 1 )        ( 0 1 3 )
Exercise 60. Show that the determinant of the identity matrix, det(In ) is 1
for all values of n. [Hint: Show that it is true for I2 . Then show that if it is true
for In−1 then it is true for In .]
One might ask what was special about the first row that we took elements of
that row multiplied them by their cofactors and added them up. Why not the
second row, or the first column? It will follow from a number of properties of
determinants we list below that in fact we could have used any row or column and
we would have arrived at the same answer.
Exercise 61. Expand the determinant of the matrix given in Exercise 59(b) in
terms of the 2nd and 3rd rows and in terms of each column and check that the
resulting answer agrees with the answer you obtained originally.
We now have a way of calculating the determinant of any matrix. To find
the determinant of an n × n matrix we have to calculate n determinants of size
(n − 1) × (n − 1). This is clearly a fairly computationally costly procedure. However
there are often ways to economise on the computation.
Exercise 62. Evaluate the determinants of the following matrices

         ( 1  8  0  7 )        ( 4  7  0  4 )
    (a)  ( 2  3  4  6 )   (b)  ( 5  6  1  8 )
         ( 1  6  0 −1 )        ( 0  0  9  0 )
         ( 0 −5  0  8 )        ( 1 −3  1  4 )
[Hint: Think carefully about which column or row to use in the expansion.]
We shall now list a number of properties of determinants. These properties
imply that, as we stated above, it does not matter which row or column we use to
expand the determinant. Further these properties will give us a series of transfor-
mations we may perform on a matrix without altering its determinant. This will
allow us to calculate a determinant by first transforming the matrix to one whose
determinant is easier to calculate and then calculating the determinant of the easier
matrix.

Property 1. The determinant of a matrix equals the determinant of its transpose:

    |A| = |A′ |.
Property 2. Interchanging two rows (or two columns) of a matrix changes
its sign but not its absolute value. For example,

    | c  d | = cb − ad = −(ad − cb) = − | a  b |
    | a  b |                            | c  d | .

Property 3. Multiplying one row (or column) of a matrix by a constant λ


will change the value of the determinant λ-fold. For example,

    | λa11 . . . λa1n |       | a11 . . . a1n |
    |   ⋮          ⋮  |  = λ  |  ⋮         ⋮  | .
    |  an1 . . .  ann |       | an1 . . . ann |
Exercise 63. Check Property 3 for the cases n = 2 and n = 3.
Corollary 1. |λA| = λn |A| (where A is an n × n matrix).
Corollary 2. | − A| = |A| if n is even. | − A| = −|A| if n is odd.
Property 4. Adding a multiple of any row (column) to any other row (column)
does not alter the value of the determinant.
Exercise 64. Check that

    | 1  5  2 |   | 1  5 + 3 × 2  2 |
    | 1  4  3 | = | 1  4 + 3 × 3  3 |
    | 0  1  2 |   | 0  1 + 3 × 2  2 |

                  | 1 + (−2) × 1   5 + (−2) × 4   2 + (−2) × 3 |
                = | 1              4              3            | .
                  | 0              1              2            |
Property 5. If one row (or column) is a constant times another row (or
column) then the determinant of the matrix is zero.
Exercise 65. Show that Property 5 follows from Properties 3 and 4.
We can strengthen Property 5 to obtain the following.
Property 5′. The determinant of a matrix is zero if and only if the matrix is
not of full rank.
Exercise 66. Explain why Property 5′ is a strengthening of Property 5, that
is, why 5′ implies 5.
These properties allow us to calculate determinants more easily. Given an n×n
matrix A the basic strategy one follows is to use the above properties, particularly
Property 4 to find a matrix with the same determinant as A in which one row (or
column) has only one non-zero element. Then, rather than calculating n determi-
nants of size (n − 1) × (n − 1) one only needs to calculate one. One then does the
same thing for the (n − 1) × (n − 1) determinant that needs to be calculated, and
so on.
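This strategy can itself be mechanised. The sketch below (our own plain-Python code, not part of the notes' exercises) uses Property 4 to clear out each column, Property 2 to track row swaps, and then multiplies down the diagonal of the resulting triangular matrix (cf. Exercise 72):

```python
def det_by_elimination(A):
    """Determinant via reduction to triangular form."""
    A = [row[:] for row in A]   # work on a copy
    n, sign, prod = len(A), 1, 1.0
    for col in range(n):
        # find a row at or below `col` with a nonzero entry in this column
        pivot = next((r for r in range(col, n) if A[r][col] != 0), None)
        if pivot is None:
            return 0.0          # a zero column: not of full rank (Property 5')
        if pivot != col:
            A[col], A[pivot] = A[pivot], A[col]
            sign = -sign        # Property 2: a swap flips the sign
        for r in range(col + 1, n):
            m = A[r][col] / A[col][col]
            # Property 4: subtracting a multiple of a row changes nothing
            A[r] = [x - m * y for x, y in zip(A[r], A[col])]
        prod *= A[col][col]
    return sign * prod

print(det_by_elimination([[1.0, 2.0, 0.0], [3.0, 1.0, 4.0], [0.0, 2.0, 1.0]]))
```

The example matrix has determinant −13, which the elimination recovers up to rounding.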
There are a number of reasons we are interested in determinants. One is that
they give us one method of calculating the inverse of a nonsingular matrix. (Recall
that there is no inverse of a singular matrix.) They also give us a method, known
as Cramer’s Rule, for solving systems of linear equations. Before proceeding with
this it is useful to state one further property of determinants.

Property 6. If one expands a matrix in terms of one row (or column) and
the cofactors of a different row (or column) then the answer is always zero. That is

    Σⁿⱼ₌₁ aij |Ckj | = 0

whenever i ≠ k. Also

    Σⁿᵢ₌₁ aij |Cik | = 0

whenever j ≠ k.
Exercise 67. Verify Property 6 for the matrix

    ( 4  1  2 )
    ( 5  2  1 ) .
    ( 1  0  3 )
Let us define the matrix of cofactors C to be the matrix [|Cij |] whose ijth
element is the cofactor of the ijth element of A. Now we define the adjoint matrix
of A to be the transpose of the matrix of cofactors of A. That is
adj(A) = C 0 .
It is straightforward to see (using Property 6) that A adj(A) = |A|In = adj(A)A.
That is, A−1 = (1/|A|) adj(A). Notice that this is well defined if and only if
|A| ≠ 0.
We now have a method of finding the inverse of any nonsingular square matrix.
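The adjoint method is also easy to transcribe. In the sketch below (plain Python; the helper names and the example matrix are our own), the (i, j) entry of the inverse is the cofactor |Cji| divided by |A|:

```python
def minor(A, i, j):
    """The matrix A with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Cofactor expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(minor(A, 0, j)) for j in range(len(A)))

def inverse(A):
    """Inverse via the adjoint: entry (i, j) is |C_ji| / |A|."""
    d = det(A)
    if d == 0:
        raise ValueError("singular matrix has no inverse")
    n = len(A)
    # adj(A) is the transpose of the matrix of cofactors, hence minor(A, j, i)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) / d for j in range(n)]
            for i in range(n)]

print(inverse([[4.0, 7.0], [2.0, 6.0]]))   # [[0.6, -0.7], [-0.2, 0.4]]
```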
Exercise 68. Use this method to find the inverses of the following matrices

         ( 3 −1  2 )        ( 4 −2  1 )        ( 1  5  2 )
    (a)  ( 1  0  3 )   (b)  ( 7  3  3 )   (c)  ( 1  4  3 ) .
         ( 4  0  2 )        ( 2  0  1 )        ( 0  1  2 )
Knowing how to invert matrices we thus know how to solve a system of n linear
equations in n unknowns. For we can express the n equations in matrix notation as
Ax = b where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns,
and b is an n × 1 vector of constants. Thus we can solve the system of equations
as x = A−1 Ax = A−1 b.
Sometimes, particularly if we are not interested in all of the x’s it is convenient
to use another method of solving the equations. This method is known as Cramer’s
Rule. Let us suppose that we wish to solve the above system of equations, that is,
Ax = b. Let us define the matrix Ai to be the matrix obtained from A by replacing
the ith column of A by the vector b. Then the solution is given by

    xi = |Ai | / |A| .
Exercise 69. Derive Cramer’s Rule. [Hint: We know that the solution to the
system of equations is solved by x = (1/|A|)adj(A)b. This gives a formula for xi .
Show that this formula is the same as that given by xi = |Ai |/|A|.]
Exercise 70. Solve the following system of equations (i) by matrix inversion
and (ii) by Cramer’s Rule
         2x1 −  x2        =  2              −x1 + x2 + x3 = 1
    (a)         3x2 + 2x3 = 16         (b)   x1 − x2 + x3 = 1 .
         5x1        + 3x3 = 21               x1 + x2 + x3 = 1

Exercise 71. Recall that we claimed that the determinant was an invariant.
Confirm this by calculating (directly) det(A) and det(B −1 AB) where

        ( 1  0  1 )              ( 1  0  0 )
    B = ( 1 −1  2 )    and   A = ( 0  2  0 ) .
        ( 2  1 −1 )              ( 0  0  3 )
Exercise 72. An nth order determinant of the form

    | a11   0    0   . . .   0  |
    | a21  a22   0   . . .   0  |
    | a31  a32  a33  . . .   0  |
    |  ⋮    ⋮    ⋮           ⋮  |
    | an1  an2  an3  . . .  ann |
is called triangular. Evaluate this determinant. [Hint: Expand the determinant in
terms of its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms
of its first row, and so on.]

10. Eigenvalues and Eigenvectors


Suppose that we have a linear function f : Rn → Rn . When we look at
how f deforms Rn one natural question to ask is: Where does f send some
linear subspace? In particular we might ask if there are any linear subspaces that
f maps to themselves. We call such linear subspaces invariant linear subspaces.
Of course the space Rn itself and the zero dimensional space {0} are invariant
linear subspaces. The real question is whether there are any others. Clearly, for
some linear transformations there are no other invariant subspaces. For example,
a clockwise rotation of π/4 in R2 has no invariant subspaces other than R2 itself
and {0}.
A particularly important class of invariant linear subspaces are the one dimen-
sional ones. A one dimensional linear subspace is specified by one nonzero vector,
say x̄. Then the subspace is {λx̄ | λ ∈ R}. Let us call this subspace L(x̄). If L(x̄)
is an invariant linear subspace of f and if x ∈ L(x̄) then there is some value λ such
that f (x) = λx. Moreover the value of λ for which this is true will be the same
whatever value of x we choose in L(x̄).
Now if we fix the set of basis vectors and thus the matrix A that represents f
we have that if x is in a one dimensional invariant linear subspace of f then there
is some λ ∈ R such that
Ax = λx.
Again we can define this notion without reference to linear functions. Given a
matrix A, if we can find a pair x, λ with x ≠ 0 that satisfies the above equation, we
call x an eigenvector of the matrix A and λ the associated eigenvalue. (Sometimes
these are called characteristic vectors and values.)
Exercise 73. Show that the eigenvalues of a matrix are an invariant, that
is, that they depend only on the linear function the matrix represents and not on
the choice of basis vectors. Show also that the eigenvectors of a matrix are not
an invariant. Explain why the dependence of the eigenvectors on the particular
basis is exactly what we would expect and argue that in some sense they are indeed
invariant.
Now we can rewrite the equation Ax = λx as
(A − λIn )x = 0.

If x, λ solve this equation and x ≠ 0 then we have a nonzero linear combination of


the columns of A − λIn equal to zero. This means that the columns of A − λIn are
not linearly independent and so det(A − λIn ) = 0, that is,

        ( a11 − λ    a12     . . .     a1n   )
    det (   a21    a22 − λ   . . .     a2n   ) = 0.
        (     ⋮        ⋮                ⋮    )
        (   an1      an2     . . .  ann − λ  )
Now, the left hand side of this last equation is a polynomial of degree n in
λ, that is, a polynomial in λ in which n is the highest power of λ that appears
with nonzero coefficient. It is called the characteristic polynomial and the equation
is called the characteristic equation. Now this equation may, or may not, have a
solution in real numbers. In general, by the fundamental theorem of algebra the
equation has n solutions, perhaps not all distinct, in the complex numbers. If the
matrix A happens to be symmetric (that is, if aij = aji for all i and j) then all of
its eigenvalues are real. If the eigenvalues are all distinct (that is, different from
each other) then we are in a particularly well behaved situation. As a prelude we
state the following result.
Theorem 5. Given an n×n matrix A suppose that we have m eigenvectors of A
x1 , x2 , . . . , xm with corresponding eigenvalues λ1 , λ2 , . . . , λm . If λi ≠ λj whenever
i ≠ j then x1 , x2 , . . . , xm are linearly independent.
An implication of this theorem is that an n × n matrix cannot have more than
n eigenvectors with distinct eigenvalues. Further this theorem allows us to see that
if an n × n matrix has n distinct eigenvalues then it is possible to find a basis
for Rn in which the linear function that the matrix represents is represented by
a diagonal matrix. Equivalently we can find a matrix B such that B −1 AB is a
diagonal matrix.
To see this let b1 , b2 , . . . , bn be n linearly independent eigenvectors with associ-
ated eigenvalues λ1 , λ2 , . . . , λn . Let B be the matrix whose columns are the vectors
b1 , b2 , . . . , bn . Since these vectors are linearly independent the matrix B has an
inverse. Now
B −1 AB = B −1 [Ab1 Ab2 . . . Abn ]
= B −1 [λ1 b1 λ2 b2 . . . λn bn ]
          = [λ1 B −1 b1 λ2 B −1 b2 . . . λn B −1 bn ]

            ( λ1   0  . . .   0 )
          = (  0  λ2  . . .   0 ) .
            (  ⋮   ⋮          ⋮ )
            (  0   0  . . .  λn )
CHAPTER 3

Consumer Behaviour: Optimisation Subject to the Budget Constraint

1. Constrained Maximisation
1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks
to distribute his income across the purchase of the two goods that he consumes,
subject to the constraint that he spends no more than his total income. Let us
denote the amount of the first good that he buys x1 and the amount of the second
good x2 , the prices of the two goods p1 and p2 , and the consumer’s income y.
The utility that the consumer obtains from consuming x1 units of good 1 and x2
units of good 2 is denoted u(x1 , x2 ). Thus the consumer’s problem is to maximise
u(x1 , x2 ) subject to the constraint that p1 x1 + p2 x2 ≤ y. (We shall soon write
p1 x1 + p2 x2 = y, i.e., we shall assume that the consumer must spend all of his
income.) Before discussing the solution of this problem let us write it in a more
“mathematical” way.
(5)    max_{x1 ,x2}  u(x1 , x2 )
       subject to  p1 x1 + p2 x2 = y
We read this “Choose x1 and x2 to maximise u(x1 , x2 ) subject to the constraint
that p1 x1 + p2 x2 = y.”
Let us assume, as usual, that the indifference curves (i.e., the sets of points
(x1 , x2 ) for which u(x1 , x2 ) is a constant) are convex to the origin. Let us also
assume that the indifference curves are nice and smooth. Then the point (x∗1 , x∗2 )
that solves the maximisation problem (5) is the point at which the indifference
curve is tangent to the budget line as given in Figure 1.
One thing we can say about the solution is that at the point (x∗1 , x∗2 ) it must be
true that the marginal utility with respect to good 1 divided by the price of good 1
must equal the marginal utility with respect to good 2 divided by the price of good
2. For if this were not true then the consumer could, by decreasing the consumption
of the good for which this ratio was lower and increasing the consumption of the
other good, increase his utility. Marginal utilities are, of course, just the partial
derivatives of the utility function. Thus we have

(6)    (∂u/∂x1 )(x∗1 , x∗2 ) / p1 = (∂u/∂x2 )(x∗1 , x∗2 ) / p2 .
The argument we have just made seems very “economic.” It is easy to give an
alternate argument that does not explicitly refer to the economic intuition. Let xu2
be the function that defines the indifference curve through the point (x∗1 , x∗2 ), i.e.,
u(x1 , xu2 (x1 )) ≡ ū ≡ u(x∗1 , x∗2 ).
Now, totally differentiating this identity gives

    (∂u/∂x1 )(x1 , xu2 (x1 )) + (∂u/∂x2 )(x1 , xu2 (x1 )) · (dxu2 /dx1 )(x1 ) = 0.

Figure 1. The budget line p1 x1 + p2 x2 = y and the indifference curve
u(x1 , x2 ) = ū, tangent at the point (x∗1 , x∗2 ). [Diagram omitted.]

That is,

    (dxu2 /dx1 )(x1 ) = − (∂u/∂x1 )(x1 , xu2 (x1 )) / (∂u/∂x2 )(x1 , xu2 (x1 )) .

Now xu2 (x∗1 ) = x∗2 . Thus the slope of the indifference curve at the point (x∗1 , x∗2 ) is

    (dxu2 /dx1 )(x∗1 ) = − (∂u/∂x1 )(x∗1 , x∗2 ) / (∂u/∂x2 )(x∗1 , x∗2 ) .

Also, the slope of the budget line is −p1 /p2 . Combining these two results again gives
result (6).
Since we also have another equation that (x∗1 , x∗2 ) must satisfy, viz
(7) p1 x∗1 + p2 x∗2 = y
we have two equations in two unknowns and we can (if we know what the utility
function is and what p1 , p2 , and y are) go happily away and solve the problem.
(This isn’t quite true but we shall not go into that at this point.) What we shall
develop is a systemic and useful way to obtain the conditions (6) and (7). Let us
first denote the common value of the ratios in (6) by λ. That is,

    (∂u/∂x1 )(x∗1 , x∗2 ) / p1 = λ = (∂u/∂x2 )(x∗1 , x∗2 ) / p2
and we can rewrite this and (7) as

        (∂u/∂x1 )(x∗1 , x∗2 ) − λp1 = 0
(8)     (∂u/∂x2 )(x∗1 , x∗2 ) − λp2 = 0
        y − p1 x∗1 − p2 x∗2 = 0.

Now we have three equations in x∗1 , x∗2 , and the new artificial or auxiliary variable
λ. Again we can, perhaps, solve these equations for x∗1 , x∗2 , and λ. Consider the
following function
(9) L(x1 , x2 , λ) = u(x1 , x2 ) + λ(y − p1 x1 − p2 x2 )
This function is known as the Lagrangian. Now, if we calculate ∂L/∂x1 , ∂L/∂x2 , and
∂L/∂λ, and set the results equal to zero we obtain exactly the equations given in (8). We
now describe this technique in a somewhat more general way.
Suppose that we have the following maximisation problem

(10)    max_{x1 ,...,xn}  f (x1 , . . . , xn )
        subject to  g(x1 , . . . , xn ) = c

and we let
(11) L(x1 , . . . , xn , λ) = f (x1 , . . . , xn ) + λ(c − g(x1 , . . . , xn ))
then if (x∗1 , . . . , x∗n ) solves (10) there is a value of λ, say λ∗ , such that

(12)    (∂L/∂xi )(x∗1 , . . . , x∗n , λ∗ ) = 0        i = 1, . . . , n

(13)    (∂L/∂λ)(x∗1 , . . . , x∗n , λ∗ ) = 0.
Notice that the conditions (12) are precisely the first order conditions for
choosing x1 , . . . , xn to maximise L, once λ∗ has been chosen. This provides an
intuition into this method of solving the constrained maximisation problem. In
the constrained problem we have told the decision maker that he must satisfy
g(x1 , . . . , xn ) = c and that he should choose among all points that satisfy this con-
straint the point at which f (x1 , . . . , xn ) is greatest. We arrive at the same answer
if we tell the decision maker to choose any point he wishes but that for each unit by
which he violates the constraint g(x1 , . . . , xn ) = c we shall take away λ units from
his payoff. Of course we must be careful to choose λ to be the correct value. If we
choose λ too small the decision maker may choose to violate his constraint—e.g.,
if we made the penalty for spending more than the consumer’s income very small
the consumer would choose to consume more goods than he could afford and to
pay the penalty in utility terms. On the other hand if we choose λ too large the
decision maker may violate his constraint in the other direction, e.g., the consumer
would choose not to spend any of his income and just receive λ units of utility for
each unit of his income.
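As a concrete check of conditions (12) and (13), take the (entirely illustrative, our own) example u(x1, x2) = x1 x2 with p1 = 2, p2 = 1, y = 8. The first-order conditions x2 = 2λ, x1 = λ, and 8 = 2x1 + x2 give x∗1 = 2, x∗2 = 4, λ∗ = 2, and we can confirm numerically that all partial derivatives of L vanish there (a plain-Python sketch using finite differences):

```python
def L(x1, x2, lam, p1=2.0, p2=1.0, y=8.0):
    """Lagrangian for u(x1, x2) = x1 * x2 with a linear budget constraint."""
    return x1 * x2 + lam * (y - p1 * x1 - p2 * x2)

def grad(f, point, h=1e-6):
    """Central-difference approximation to the gradient of f at point."""
    g = []
    for i in range(len(point)):
        up = list(point); up[i] += h
        dn = list(point); dn[i] -= h
        g.append((f(*up) - f(*dn)) / (2 * h))
    return g

print(grad(L, [2.0, 4.0, 2.0]))   # all partials ≈ 0 at the solution
```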
It is possible to give a more general statement of this technique, allowing for
multiple constraints. (Of course, we should always have fewer constraints than we
have variables.) Suppose we have more than one constraint. Consider the problem

    max_{x1 ,...,xn}  f (x1 , . . . , xn )
    subject to  g1 (x1 , . . . , xn ) = c1
                        ⋮
                gm (x1 , . . . , xn ) = cm .

Again we construct the Lagrangian

(14)    L(x1 , . . . , xn , λ1 , . . . , λm ) = f (x1 , . . . , xn ) + λ1 (c1 − g1 (x1 , . . . , xn ))
                                              + · · · + λm (cm − gm (x1 , . . . , xn ))

and again if (x∗1 , . . . , x∗n ) solves this problem there are values of the multipliers, say
λ∗1 , . . . , λ∗m , such that

(15)    (∂L/∂xi )(x∗1 , . . . , x∗n , λ∗1 , . . . , λ∗m ) = 0        i = 1, . . . , n

        (∂L/∂λj )(x∗1 , . . . , x∗n , λ∗1 , . . . , λ∗m ) = 0        j = 1, . . . , m.
1.2. Caveats and Extensions. Notice that we have been referring to the set
of conditions which a solution to the maximisation problem must satisfy. (We call
such conditions necessary conditions.) So far we have not even claimed that there
necessarily is a solution to the maximisation problem. There are many examples of
maximisation problems which have no solution. One example of an unconstrained
problem with no solution is
(16)  max_x 2x,
that is, maximise the function 2x over the choice of x. Clearly the greater we make x the
greater is 2x, and so, since there is no upper bound on x there is no maximum.
Thus we might want to restrict maximisation problems to those in which we choose
x from some bounded set. Again, this is not enough. Consider the problem
(17)  max_{0≤x≤1} 1/x.
The smaller we make x the greater is 1/x and yet at zero 1/x is not even defined.
We could define the function to take on some value at zero, say 7. But then the
function would not be continuous. Or we could leave zero out of the feasible set
for x, say 0 < x ≤ 1. Then the set of feasible x is not closed. Since there would
obviously still be no solution to the maximisation problem in these cases we shall
want to restrict maximisation problems to those in which we choose x to maximise
some continuous function over a set that is closed and (because of the previous example)
bounded. (We call a set of numbers, or more generally a set of vectors, that
is both closed and bounded a compact set.) Is there anything else that could go
wrong? No! The following result says that if the function to be maximised is
continuous and the set over which we are choosing is both closed and bounded, i.e.,
is compact, then there is a solution to the maximisation problem.
Theorem 6 (The Weierstrass Theorem). Let S be a compact set. Let f be a
continuous function that takes each point in S to a real number. (We usually write:
let f : S → R be continuous.) Then there is some x∗ in S at which the function is
maximised. More precisely, there is some x∗ in S such that f (x∗ ) ≥ f (x) for any
x in S.
Notice that in defining such compact sets we typically use inequalities, such
as x ≥ 0. However in Section 1 we did not consider such constraints, but rather
considered only equality constraints. However, even in the example of utility max-
imisation at the beginning of Section 5.6, there were implicitly constraints on x1
and x2 of the form
x1 ≥ 0, x2 ≥ 0.
A truly satisfactory treatment would make such constraints explicit. It is possible
to explicitly treat the maximisation problem with inequality constraints, at the
price of a little additional complexity. We shall return to this question later in the
book.
Also, notice that had we wished to solve a minimisation problem we could
have transformed the problem into a maximisation problem by simply multiplying
the objective function by −1. That is, if we wish to minimise f (x) we could do
so by maximising −f (x). As an exercise write out the conditions analogous to
the conditions (8) for the case that we wanted to minimise u(x). Notice that if
x∗1 , x∗2 , and λ satisfy the original equations then x∗1 , x∗2 , and −λ satisfy the new
equations. Thus we cannot tell whether there is a maximum at (x∗1 , x∗2 ) or a
minimum. This corresponds to the fact that in the case of a function of a single
variable over an unconstrained domain at a maximum we require the first derivative
to be zero, but that to know for sure that we have a maximum we must look at the
second derivative. We shall not develop the analogous conditions for the constrained
problem with many variables here. However, again, we shall return to it later in
the book.

2. The Implicit Function Theorem


In the previous section we said things like: “Now we have three equations
in x∗1 , x∗2 , and the new artificial or auxiliary variable λ. Again we can, perhaps,
solve these equations for x∗1 , x∗2 , and λ.” In this section we examine the question
of when we can solve a system of n equations to give n of the variables in terms
of the others. Let us suppose that we have n endogenous variables x1 , . . . , xn ,
m exogenous variables or parameters, b1 , . . . , bm , and n equations or equilibrium
conditions
f1 (x1 , . . . , xn , b1 , . . . , bm ) = 0
f2 (x1 , . . . , xn , b1 , . . . , bm ) = 0
(18) ..
.
fn (x1 , . . . , xn , b1 , . . . , bm ) = 0,
or, using vector notation,

f (x, b) = 0,

where f : Rn+m → Rn , x ∈ Rn (that is, x is an n-vector), b ∈ Rm , and 0 ∈ Rn .

When can we solve this system to obtain functions giving each xi as a function
of b1 , . . . , bm ? As we’ll see below we only give an incomplete answer to this question,
but first let’s look at the case that the function f is a linear function.
Suppose that our equations are
a11 x1 + · · · + a1n xn + c11 b1 + · · · + c1m bm = 0
a21 x1 + · · · + a2n xn + c21 b1 + · · · + c2m bm = 0
...
an1 x1 + · · · + ann xn + cn1 b1 + · · · + cnm bm = 0.
We can write this, in matrix notation, as

[A | C] (x; b) = 0,

with (x; b) denoting the (n + m) × 1 vector formed by stacking x above b,
where A is an n × n matrix, C is an n × m matrix, x is an n × 1 (column) vector,
and b is an m × 1 vector.
This we can rewrite as
Ax + Cb = 0,
and solve this to give
x = −A−1 Cb.
And we can do this as long as the matrix A can be inverted, that is, as long as the
matrix A is of full rank.
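As a quick sanity check, the solution x = −A⁻¹Cb can be computed and verified in a few lines of Python. The 2 × 2 matrix A and 2 × 1 matrix C below are hypothetical numbers of my own choosing (n = 2, m = 1), and the inverse is computed with the explicit 2 × 2 formula.

```python
# Hypothetical 2x2 example (n = 2, m = 1) of x = -A^{-1} C b,
# with the inverse computed by the explicit 2x2 formula.
A = [[2.0, 1.0],
     [1.0, 3.0]]
C = [[1.0],
     [-2.0]]
b = [4.0]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]          # det A = 5, so A is of full rank
Ainv = [[ A[1][1] / det, -A[0][1] / det],
        [-A[1][0] / det,  A[0][0] / det]]

Cb = [C[0][0] * b[0], C[1][0] * b[0]]                # the vector Cb
x = [-(Ainv[0][0] * Cb[0] + Ainv[0][1] * Cb[1]),     # x = -A^{-1} C b
     -(Ainv[1][0] * Cb[0] + Ainv[1][1] * Cb[1])]

# verify that A x + C b = 0
residual = [A[i][0] * x[0] + A[i][1] * x[1] + Cb[i] for i in range(2)]
```

If det A were zero the inverse would not exist, which is exactly the full rank condition in the text.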
Our answer to the general question in which the function f may not be linear
is that if there are some values (x̄, b̄) for which f (x̄, b̄) = 0 then if, when we take
a linear approximation to f we can solve the approximate linear system as we did
above, then we can solve the true nonlinear system, at least in a neighbourhood of
(x̄, b̄). By this last phrase we mean that if b is not close to b̄ we may not be able to
solve the system, and that for a particular value of b there may be many values of
x that solve the system, but there is only one close to x̄.
To see why we can’t, in general, do better than this consider the function
f : R2 → R given by f (x, b) = g(x) − b, where g is the function graphed in Figure 2.
Notice that the values (x̄, b̄) satisfy the equation f (x, b) = 0. For all values of b
close to b̄ we can find a unique value of x close to x̄ such that f (x, b) = 0. However,
(1) for each value of b there are other values of x far away from x̄ that also satisfy
f (x, b) = 0, and (2) there are values of b, such as b̃ for which there are no values of
x that satisfy f (x, b) = 0.

[Graph of g(x) against x, showing the level b̄ = g(x̄) at the point x̄, and a level b̃ that g does not attain near x̄.]

Figure 2

Let us consider again the system of equations (18). We say that the function f
is C 1 on some open set A ⊂ Rn+m if f has partial derivatives everywhere in A and
these partial derivatives are continuous on A.
Theorem 7. Suppose that f : Rn+m → Rn is a C 1 function on an open set
A ⊂ Rn+m and that (x̄, b̄) in A is such that f (x̄, b̄) = 0. Suppose also that
the n × n matrix ∂f (x̄, b̄)/∂x, whose (i, j) entry is ∂fi (x̄, b̄)/∂xj ,
is of full rank. Then there are open sets A1 ⊂ Rn and A2 ⊂ Rm with x̄ in A1 and
b̄ in A2 and A1 × A2 ⊂ A such that for each b in A2 there is exactly one g(b) in A1
such that f (g(b), b) = 0. Moreover, g : A2 → A1 is a C 1 function and
∂g(b)/∂b = − [∂f (g(b), b)/∂x]^{−1} [∂f (g(b), b)/∂b].
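The content of the theorem can be illustrated with a hypothetical one-equation example of my own (n = m = 1): f(x, b) = x³ + x − b, where ∂f/∂x = 3x² + 1 > 0 everywhere, so the full rank condition always holds. The sketch below solves f(x, b) = 0 by bisection and checks the derivative formula g′(b) = −(∂f/∂b)/(∂f/∂x) by finite differences.

```python
# Hypothetical one-equation example (n = m = 1): f(x, b) = x**3 + x - b.
# Here df/dx = 3x**2 + 1 > 0 everywhere, so the Jacobian is always of
# full rank.  At (xbar, bbar) = (1, 2) we have f(1, 2) = 0 and the
# theorem's formula gives g'(b) = -(df/db)/(df/dx) = 1/(3x**2 + 1) = 1/4.
def f(x, b):
    return x ** 3 + x - b

def g(b, lo=-10.0, hi=10.0, tol=1e-12):
    # solve f(x, b) = 0 for x by bisection (f is increasing in x)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, b) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h = 1e-6
slope = (g(2 + h) - g(2 - h)) / (2 * h)   # numerical dg/db at b = 2
```

Because f is strictly increasing in x here, g(b) is in fact globally unique; in general the theorem only guarantees uniqueness near (x̄, b̄).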
Exercise 74. Consider the general utility maximisation problem


(19)  max_{x1 ,x2 ,...,xn} u(x1 , x2 , . . . , xn )  subject to  p1 x1 + p2 x2 + · · · + pn xn = w.
Suppose that for some price vector p̄ the maximisation problem has a utility max-
imising bundle x̄. Find conditions on the utility function such that in a neighbour-
hood of (x̄, p̄) we can solve for the demand functions x(p). Find the derivatives of
the demand functions, ∂x/∂p.
Exercise 75. Now suppose that there are only two goods and the utility
function is given by
u(x1 , x2 ) = x1^{1/3} x2^{2/3} .
Solve this utility maximisation problem, as you learned to do in Section 1 of this
Chapter, and then differentiate the demand functions that you find to find the
partial derivative with respect to p1 , p2 , and w of each demand function.
Also find the same derivatives using the method of the previous exercise.

3. The Theorem of the Maximum


Often in economics we are not so much interested in what the solution to a
particular maximisation problem is but rather wish to know how the solution to a
parameterised problem depends on the parameters. Thus in our first example of
utility maximisation we might be interested not so much in what the solution to the
maximisation problem is when p1 = 2, p2 = 7, and y = 25, but rather in how the
solution depends on p1 , p2 , and y. (That is, we might be interested in the demand
function.) Sometimes we shall also be interested in how the maximised function
depends on the parameters—in the example how the maximised utility depends on
p1 , p2 , and y.
This raises a number of questions. In order for us to speak meaningfully of a
demand function it should be the case that the maximisation problem has a unique
solution. Further, we would like to know if the “demand” function is continuous—
or even if it is differentiable. Consider again the constrained maximisation problem of Section 1, but this time let us
explicitly add some parameters.
(20)  max_{x1 ,...,xn} f (x1 , . . . , xn , a1 , . . . , ak )
      subject to gj (x1 , . . . , xn , a1 , . . . , ak ) = cj ,  j = 1, . . . , m.
In order to be able to say whether or not the problem has a unique solution
it is useful to know something about the shape or curvature of the functions f
and g. We say a function is concave if for any two points in the domain of the
function the value of function at a weighted average of the two points is greater
than the weighted average of the value of the function at the two points. We say
the function is convex if the value of the function at the average is less than the
average of the values. The following definition makes this a little more explicit. (In
both definitions x = (x1 , . . . , xn ) is a vector.)
Definition 15. A function f is concave if for any x and x′ with x ≠ x′ and
for any t such that 0 < t < 1 we have f (tx + (1 − t)x′) ≥ tf (x) + (1 − t)f (x′). The
function is strictly concave if f (tx + (1 − t)x′) > tf (x) + (1 − t)f (x′).
A function f is convex if for any x and x′ with x ≠ x′ and for any t such that
0 < t < 1 we have f (tx + (1 − t)x′) ≤ tf (x) + (1 − t)f (x′). The function is strictly
convex if f (tx + (1 − t)x′) < tf (x) + (1 − t)f (x′).
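The defining inequality is easy to check numerically. The following Python sketch verifies strict concavity for f(x) = √x (a standard strictly concave function on (0, ∞)); the grid of points and weights is an illustrative choice of mine.

```python
from math import sqrt

# Check the strict concavity inequality of Definition 15 numerically for
# f(x) = sqrt(x), over an illustrative grid of points x, y and weights t.
pts = [0.5, 1.0, 2.0, 5.0]
weights = [0.1 * k for k in range(1, 10)]
strictly_concave = all(
    sqrt(t * x + (1 - t) * y) > t * sqrt(x) + (1 - t) * sqrt(y)
    for x in pts for y in pts if x != y
    for t in weights
)
```

A grid check of course only tests finitely many points; it illustrates the definition rather than proving concavity.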
The result we are about to give is most conveniently stated when our statement
of the problem is in terms of inequality constraints rather than equality constraints.
As mentioned earlier we shall examine this kind of problem later in this course.
However for the moment in order to proceed with our discussion of the problem
involving equality constraints we shall assume that all of the functions with which
we are dealing are increasing in the x variables. (See Exercise 76 for a formal
definition of what it means for a function to be increasing.) In this case if f is
strictly concave and gj is convex for each j then the problem has a unique solution.
In fact the concepts of concavity and convexity are somewhat stronger than is
required. We shall see later in the course that they can be replaced by the concepts
of quasi-concavity and quasi-convexity. In some sense these latter concepts are the
“right” concepts for this result.
Theorem 8. Suppose that f and gj are increasing in (x1 , . . . , xn ). If f is
strictly concave in (x1 , . . . , xn ) and gj is convex in (x1 , . . . , xn ) for j = 1, . . . , m
then for each value of the parameters (a1 , . . . , ak ) if problem (20) has a solution
(x∗1 , . . . , x∗n ) that solution is unique.
Now let v(a1 , . . . , ak ) be the maximised value of f when the parameters are
(a1 , . . . , ak ). Let us suppose that the problem is such that the solution is unique and
that (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak )) are the values that maximise the function
f when the parameters are (a1 , . . . , ak ) then
(21) v(a1 , . . . , ak ) = f (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak ).
(Notice however that the function v is uniquely defined even if there is not a unique
maximiser.)
The Theorem of the Maximum gives conditions on the problem under which
the function v and the functions x∗1 , . . . , x∗n are continuous. The constraints in the
problem (20) define a set of feasible vectors x over which the function f is to be
maximised. Let us call this set G(a1 , . . . , ak ), i.e.,
(22) G(a1 , . . . , ak ) = {(x1 , . . . , xn ) | gj (x1 , . . . , xn , a1 , . . . , ak ) = cj ∀j}
Now we can restate the problem as
(23)  max_{x1 ,...,xn} f (x1 , . . . , xn , a1 , . . . , ak )  subject to  (x1 , . . . , xn ) ∈ G(a1 , . . . , ak ).
Notice that both the function f and the feasible set G depend on the parameters
a, i.e., both may change as a changes. The Theorem of the Maximum requires
both that the function f be continuous as a function of x and a and that the
feasible set G(a1 , . . . , ak ) change continuously as a changes. We already know—
or should know—what it means for f to be continuous but the notion of what it
means for a set to change continuously is less elementary. We call G a set valued
function or a correspondence. G associates with any vector (a1 , . . . , ak ) a subset of
the vectors (x1 , . . . , xn ). The following two definitions define what we mean by a
correspondence being continuous. First we define what it means for two sets to be
close.
Definition 16. Two sets of vectors A and B are within ε of each other if for
any vector x in one set there is a vector x′ in the other set such that x′ is within ε
of x.
We can now define the continuity of the correspondence G in essentially the
same way that we define the continuity of a single valued function.
Definition 17. The correspondence G is continuous at (a1 , . . . , ak ) if for any
ε > 0 there is δ > 0 such that if (a′1 , . . . , a′k ) is within δ of (a1 , . . . , ak ) then
G(a′1 , . . . , a′k ) is within ε of G(a1 , . . . , ak ).
It is, unfortunately, not the case that the continuity of the functions gj neces-
sarily implies the continuity of the feasible set. (Exercise 77 asks you to construct a
counterexample.)
Remark 1. It is possible to define two weaker notions of continuity, which we
call upper hemicontinuity and lower hemicontinuity. A correspondence is in fact
continuous in the way we have defined it if it is both upper hemicontinuous and
lower hemicontinuous.
We are now in a position to state the Theorem of the Maximum. We assume
that f is a continuous function, that G is a continuous correspondence, and that
for any (a1 , . . . , ak ) the set G(a1 , . . . , ak ) is compact. The Weierstrass Theorem
thus guarantees that there is a solution to the maximisation problem (23) for any
(a1 , . . . , ak ).
Theorem 9 (Theorem of the Maximum). Suppose that f (x1 , . . . , xn , a1 , . . . , ak )
is continuous (in (x1 , . . . , xn , a1 , . . . , ak )), that G(a1 , . . . , ak ) is a continuous corre-
spondence, and that for any (a1 , . . . , ak ) the set G(a1 , . . . , ak ) is compact. Then
(1) v(a1 , . . . , ak ) is continuous, and
(2) if (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak )) are (single valued) functions then
they are also continuous.
Later in the course we shall see how the Implicit Function Theorem allows us
to identify conditions under which the functions v and x∗ are differentiable.
Exercises.
Exercise 76. We say that the function f (x1 , . . . , xn ) is nondecreasing if x′i ≥
xi for each i implies that f (x′1 , . . . , x′n ) ≥ f (x1 , . . . , xn ), is increasing if x′i > xi
for each i implies that f (x′1 , . . . , x′n ) > f (x1 , . . . , xn ), and is strictly increasing if
x′i ≥ xi for each i and x′j > xj for at least one j implies that f (x′1 , . . . , x′n ) >
f (x1 , . . . , xn ). Show that if f is nondecreasing and strictly concave then it must be
strictly increasing. [Hint: This is very easy.]
Exercise 77. Show by example that even if the functions gj are continuous
the correspondence G may not be continuous. [Hint: Use the case n = m = k = 1.]

4. The Envelope Theorem


In this section we examine a theorem that is particularly useful in the study
of consumer and producer theory. There is in fact nothing mysterious about this
theorem. You will see that the proof of this theorem is simply calculation and a
number of substitutions. Moreover the theorem has a very clear intuition. It is this:
Suppose we are at a maximum (in an unconstrained problem) and we change the
data of the problem by a very small amount. Now both the solution of the problem
and the value at the maximum will change. However at a maximum the function
is flat (the first derivative is zero). Thus when we want to know by how much the
maximised value has changed it does not matter (very much) whether or not we
take account of how the maximiser changes or not. See Figure 2. The intuition for
a constrained problem is similar and only a little more complicated.
To motivate our discussion of the Envelope Theorem we will first consider a
particular case, viz., the relation between short run and long run average cost curves.
Recall that, in general we assume that the average cost of producing some good is
[Graph of f (·, a) and f (·, a′) against x, showing the maximisers x∗(a) and x∗(a′) and the values f (x∗(a), a), f (x∗(a), a′), and f (x∗(a′), a′); because f (·, a′) is flat at its maximum, f (x∗(a), a′) is close to f (x∗(a′), a′).]

Figure 2

a function of the amount of the good to be produced. The short run average cost
function is defined to be the function which for any quantity, Q, gives the average
cost of producing that quantity, taking as given the scale of operation, i.e., the size
and number of plants and other fixed capital which we assume cannot be changed
in the short run (whatever that is). The long run average cost function on the
other hand gives, as a function of Q, the average cost of producing Q units of the
good, with the scale of operation selected to be the optimal scale for that level of
production.
That is, if we let the scale of operation be measured by a single variable k,
say, and we let the short run average cost of producing Q units when the scale is
k be given by SRAC(Q, k) and the long run average cost of producing Q units by
LRAC(Q) then we have

LRAC(Q) = min_k SRAC(Q, k).

Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is
the value of k that minimises the right hand side of the above equation.
Graphically, for any fixed level of k the short run average cost function can be
represented by a curve (normally assumed to be U-shaped) drawn in two dimensions
with quantity on the horizontal axis and cost on the vertical axis. Now think about
drawing one short run average cost curve for each of the (infinite) possible values of
k. One way of thinking about the long run average cost curve is as the “bottom” or
envelope of these short run average cost curves. Suppose that we consider a point
on this long run or envelope curve. What can be said about the slope of the long
run average cost curve at this point. A little thought should convince you that it
should be the same as the slope of the short run curve through the same point.
(If it were not then that short run curve would come below the long run curve, a
contradiction.) That is,


d LRAC(Q)/dQ = ∂ SRAC(Q, k(Q))/∂Q.
See Figure 3.

[Graph of cost against quantity Q: the U-shaped SRAC curve for k = k(Q̄) touches the LRAC envelope at Q̄, where LRAC(Q̄) = SRAC(Q̄, k(Q̄)).]

Figure 3
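The tangency in Figure 3 can be checked numerically for a hypothetical family of cost curves; the functional form SRAC(Q, k) = Q²/k + k below is my own choice, picked so that the minimisation over k is explicit (k(Q) = Q and LRAC(Q) = 2Q).

```python
# Hypothetical family of U-shaped short run average cost curves:
# SRAC(Q, k) = Q**2/k + k.  Minimising over k gives k(Q) = Q and hence
# LRAC(Q) = SRAC(Q, Q) = 2Q.  At any Qbar, the slope of the envelope
# should equal the slope of the short run curve for k = k(Qbar).
def SRAC(Q, k):
    return Q ** 2 / k + k

def LRAC(Q):
    return 2 * Q          # the lower envelope min_k SRAC(Q, k)

Qbar, h = 3.0, 1e-6
long_slope = (LRAC(Qbar + h) - LRAC(Qbar - h)) / (2 * h)
short_slope = (SRAC(Qbar + h, Qbar) - SRAC(Qbar - h, Qbar)) / (2 * h)
```

The two finite-difference slopes agree, as the envelope argument predicts: at Q̄ the derivative of SRAC with respect to k is zero, so moving Q slightly while holding k fixed costs nothing to first order.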

The envelope theorem is a general statement of the result of which this is a


special case. We will consider not only cases in which Q and k are vectors, but also
cases in which the maximisation or minimisation problem includes some constraints.
Let us consider again the maximisation problem (20). Recall:
max_{x1 ,...,xn} f (x1 , . . . , xn , a1 , . . . , ak )
subject to gj (x1 , . . . , xn , a1 , . . . , ak ) = cj ,  j = 1, . . . , m.

Again let L(x1 , . . . , xn , λ1 , . . . , λm ; a1 , . . . , ak ) be the Lagrangian function:

(24)  L(x1 , . . . , xn , λ1 , . . . , λm ; a1 , . . . , ak ) = f (x1 , . . . , xn , a1 , . . . , ak ) + Σ_{j=1}^{m} λj (cj − gj (x1 , . . . , xn , a1 , . . . , ak )).

Let (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak )) and (λ1 (a1 , . . . , ak ), . . . , λm (a1 , . . . , ak )) be


the values of x and λ that solve this problem. Now let
(25) v(a1 , . . . , ak ) = f (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak )
That is, v(a1 , . . . , ak ) is the maximised value of the function f when the parameters
are (a1 , . . . , ak ). The envelope theorem says that the derivative of v is equal to the
derivative of L at the maximising values of x and λ. Or, more precisely
Theorem 10 (The Envelope Theorem). If all functions are defined as above
and the problem is such that the functions x∗ and λ are well defined then

∂v/∂ah (a1 , . . . , ak ) = ∂L/∂ah (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), λ1 (a1 , . . . , ak ), . . . , λm (a1 , . . . , ak ), a1 , . . . , ak )
= ∂f/∂ah (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak ) − Σ_{j=1}^{m} λj (a1 , . . . , ak ) · ∂gj/∂ah (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak )

for all h.
In order to show the advantages of using matrix and vector notation we shall
restate the theorem in that notation before returning to give a proof of the theorem.
(In proving the theorem we shall return to using mainly scalar notation.)
Theorem 10 (The Envelope Theorem). Under the same conditions as above

∂v/∂a (a) = ∂L/∂a (x∗ (a), λ(a), a) = ∂f/∂a (x∗ (a), a) − λ(a) · ∂g/∂a (x∗ (a), a).
Proof. From the definition of the function v we have
(26) v(a1 , . . . , ak ) = f (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak )
Thus
(27)  ∂v/∂ah (a) = ∂f/∂ah (x∗ (a), a) + Σ_{i=1}^{n} ∂f/∂xi (x∗ (a), a) · ∂x∗i /∂ah (a).
Now, from the first order conditions (12) we have
∂f/∂xi (x∗ (a), a) − Σ_{j=1}^{m} λj (a) ∂gj/∂xi (x∗ (a), a) = 0,

or

(28)  ∂f/∂xi (x∗ (a), a) = Σ_{j=1}^{m} λj (a) ∂gj/∂xi (x∗ (a), a).

Also, since x∗ (a) satisfies the constraints we have, for each j,

gj (x∗1 (a), . . . , x∗n (a), a1 , . . . , ak ) ≡ cj .

And, since this holds as an identity, we may differentiate both sides with respect
to ah , giving

Σ_{i=1}^{n} ∂gj/∂xi (x∗ (a), a) · ∂x∗i /∂ah (a) + ∂gj/∂ah (x∗ (a), a) = 0,

or

(29)  Σ_{i=1}^{n} ∂gj/∂xi (x∗ (a), a) · ∂x∗i /∂ah (a) = − ∂gj/∂ah (x∗ (a), a).
Substituting (28) into (27) gives

∂v/∂ah (a) = ∂f/∂ah (x∗ (a), a) + Σ_{i=1}^{n} [ Σ_{j=1}^{m} λj (a) ∂gj/∂xi (x∗ (a), a) ] · ∂x∗i /∂ah (a).
Changing the order of summation gives


(30)  ∂v/∂ah (a) = ∂f/∂ah (x∗ (a), a) + Σ_{j=1}^{m} λj (a) [ Σ_{i=1}^{n} ∂gj/∂xi (x∗ (a), a) · ∂x∗i /∂ah (a) ].

And now substituting (29) into (30) gives


∂v/∂ah (a) = ∂f/∂ah (x∗ (a), a) − Σ_{j=1}^{m} λj (a) ∂gj/∂ah (x∗ (a), a),

which is the required result. 
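The mechanics of the proof can be seen at work in a tiny numeric check. The problem below is an illustrative example of my own (max x1 x2 subject to x1 + x2 = a, so x1*(a) = x2*(a) = a/2, λ(a) = a/2, and v(a) = a²/4): the theorem predicts dv/da = ∂L/∂a = λ(a).

```python
# Numeric check of the Envelope Theorem on a small constrained problem
# (an illustrative example, not one from the text):
#     max x1*x2  subject to  x1 + x2 = a.
# The solution is x1*(a) = x2*(a) = a/2 with lambda(a) = a/2, so
# v(a) = a**2/4, and the theorem predicts dv/da = dL/da = lambda(a).
def v(a):
    return (a / 2) * (a / 2)     # maximised value as a function of a

def lam(a):
    return a / 2                 # the multiplier at the solution

a, h = 6.0, 1e-6
dv_da = (v(a + h) - v(a - h)) / (2 * h)
```

Note that the derivative of v takes no account of how x∗(a) itself moves with a; that is exactly the simplification the theorem delivers.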

Exercises.

Exercise 78. Rewrite this proof using matrix notation. Go through your proof
and identify the dimension of each of the vectors or matrices you use. For example
fx is a 1 × n vector, gx is an m × n matrix.

5. Applications to Microeconomic Theory


5.1. Utility Maximisation. Let us again consider the utility maximisation problem

max_{x1 ,x2} u(x1 , x2 )
subject to p1 x1 + p2 x2 − y = 0.

Let v(p1 , p2 , y) be the maximised value of u when prices and income are p1 , p2 , and
y. Let us consider the effect of a change in y with p1 and p2 remaining constant.
By the Envelope Theorem
∂v/∂y = ∂/∂y {u(x1 , x2 ) + λ(y − p1 x1 − p2 x2 )} = 0 + λ · 1 = λ.
This is the familiar result that λ is the marginal utility of income.

5.2. Expenditure Minimisation. Let us consider the problem of minimising


expenditure subject to attaining a given level of utility, i.e.,
min_{x1 ,...,xn} Σ_{i=1}^{n} pi xi  subject to  u(x1 , . . . , xn ) − u0 = 0.

Let the minimised value of expenditure be denoted by
e(p1 , . . . , pn , u0 ). Then by the Envelope Theorem we obtain
∂e/∂pi = ∂/∂pi { Σ_{k=1}^{n} pk xk + λ(u0 − u(x1 , . . . , xn )) } = xi − λ · 0 = xi

when evaluated at the point which solves the minimisation problem which we write
as hi (p1 , . . . , pn , u0 ) to distinguish this (compensated) value of the demand for good
i as a function of prices and utility from the (uncompensated) value of the demand
for good i as a function of prices and income. This result is known as Hotelling’s
Theorem.
5.3. The Hicks-Slutsky Equations. It can be shown that the compensated


demand at utility u0 , i.e., hi (p1 , . . . , pn , u0 ) is equal to the uncompensated de-
mand at income e(p1 , . . . , pn , u0 ), i.e., xi (p1 , . . . , pn , e(p1 , . . . , pn , u0 )). (This result
is known as the duality theorem.) Thus totally differentiating the identity
xi (p1 , . . . , pn , e(p1 , . . . , pn , u0 )) ≡ hi (p1 , . . . , pn , u0 )
with respect to pk we obtain
∂xi /∂pk + (∂xi /∂y)(∂e/∂pk ) = ∂hi /∂pk ,

which by Hotelling’s Theorem gives

∂xi /∂pk + (∂xi /∂y) hk = ∂hi /∂pk .

So

∂xi /∂pk = ∂hi /∂pk − hk (∂xi /∂y)
for all i, k = 1, . . . , n. These are the Hicks-Slutsky equations.

5.4. The Indirect Utility Function. Again let v(p1 , . . . , pn , y) be the indirect
utility function, that is, the maximised value of utility as described in Section 5.1.
Then by the Envelope Theorem

∂v/∂pi = ∂u/∂pi − λxi (p1 , . . . , pn , y) = −λxi (p1 , . . . , pn , y)

since ∂u/∂pi = 0. Now, since we have already shown that λ = ∂v/∂y (in Section 5.1) we
have

xi (p1 , . . . , pn , y) = − (∂v/∂pi )/(∂v/∂y).
This is known as Roy’s Theorem.

5.5. Profit functions. Now consider the problem of a firm that maximises
profits subject to technology constraints. Let x = (x1 , . . . , xn ) be a vector of
netputs, i.e., xi is positive if the firm is a net supplier of good i, negative if the firm
is a net user of that good. Let us assume that we can write the technology constraint
as F (x) = 0. Thus the firm’s problem is

max_{x1 ,...,xn} Σ_{i=1}^{n} pi xi  subject to  F (x1 , . . . , xn ) = 0.
Let ϕi (p) be the value of xi that solves this problem, i.e., the net supply of
commodity i when prices are p. (Here p is a vector.) We call the maximised value
the profit function, which is given by

Π(p) = Σ_{i=1}^{n} pi ϕi (p).

And so by the Envelope Theorem


∂Π/∂pi = ϕi (p).
This result is known as Hotelling’s lemma.
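A minimal numeric sketch of Hotelling’s lemma, for a hypothetical one-input, one-output firm of my own devising: output y = √L, netputs x = (y, −L), and prices (p1, p2). Maximising p1√L − p2L over L gives L* = (p1/(2p2))² and Π(p1, p2) = p1²/(4p2), and the lemma predicts ∂Π/∂p1 = y* and ∂Π/∂p2 = −L*.

```python
from math import sqrt

# Hypothetical one-output, one-input firm: output y = sqrt(L), netputs
# x = (y, -L), prices (p1, p2).  Maximising p1*sqrt(L) - p2*L over L gives
# L* = (p1/(2*p2))**2, so Pi(p1, p2) = p1**2/(4*p2).  Hotelling's lemma
# predicts dPi/dp1 = y* (net supply) and dPi/dp2 = -L* (net demand).
def profit(p1, p2):
    Lstar = (p1 / (2 * p2)) ** 2          # re-optimised at every price
    return p1 * sqrt(Lstar) - p2 * Lstar

p1, p2, h = 6.0, 1.5, 1e-6
Lstar = (p1 / (2 * p2)) ** 2              # = 4, so y* = sqrt(Lstar) = 2
dPi_dp1 = (profit(p1 + h, p2) - profit(p1 - h, p2)) / (2 * h)
dPi_dp2 = (profit(p1, p2 + h) - profit(p1, p2 - h)) / (2 * h)
```

The key point is that `profit` re-optimises L at every price vector, so its finite-difference derivatives are envelope derivatives, and they reproduce the netput functions with the correct signs.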
5. APPLICATIONS TO MICROECONOMIC THEORY 51

5.6. Cobb-Douglas Example. We consider a particular Cobb-Douglas example
of the utility maximisation problem

(31)  max_{x1 ,x2} √x1 √x2  subject to  p1 x1 + p2 x2 = w.

The Lagrangean is

(32)  L(x1 , x2 , λ) = √x1 √x2 + λ(w − p1 x1 − p2 x2 )

and the first order conditions are

(33)  ∂L/∂x1 = (1/2) x1^{−1/2} x2^{1/2} − p1 λ = 0
(34)  ∂L/∂x2 = (1/2) x1^{1/2} x2^{−1/2} − p2 λ = 0
(35)  ∂L/∂λ = w − p1 x1 − p2 x2 = 0.
If we divide equation (33) by equation (34) we obtain

x2 /x1 = p1 /p2

or

p1 x1 = p2 x2
and if we substitute this into equation (35) we obtain
w − p1 x1 − p1 x1 = 0
or
(36)  x1 = w/(2p1 ).
Similarly,
(37)  x2 = w/(2p2 ).
Substituting equations (36) and (37) into the utility function gives
(38)  v(p1 , p2 , w) = √( w² /(4p1 p2 ) ) = w/(2√(p1 p2 )).
As a check here we can check some known properties of the indirect utility
function. For example it is homogeneous of degree zero; that is, if we multiply p1 ,
p2 , and w by the same positive constant, say α, we do not change the value of v.
You should confirm that this is the case.
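The confirmation can be done in a couple of lines of Python; the price, income, and scaling values below are arbitrary illustrative numbers.

```python
from math import sqrt

# Check that v(p1, p2, w) = w/(2*sqrt(p1*p2)) from equation (38) is
# homogeneous of degree zero: scaling all prices and income by alpha
# leaves v unchanged.  The numbers are illustrative.
def v(p1, p2, w):
    return w / (2 * sqrt(p1 * p2))

p1, p2, w, alpha = 2.0, 8.0, 100.0, 3.5
base = v(p1, p2, w)                           # = 100/(2*4) = 12.5
scaled = v(alpha * p1, alpha * p2, alpha * w)
```

The α in the numerator cancels the √(α·α) = α in the denominator, which is the algebra behind the numeric equality.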
We now calculate the optimal value of λ from the first order conditions by
substituting equations (36) and (37) into (33), giving

(1/2) (w/(2p1 ))^{−1/2} (w/(2p2 ))^{1/2} − p1 λ = 0

or

(1/2) √( (2p1 /w)(w/(2p2 )) ) = p1 λ

or

(1/2) (√p1 /√p2 )(1/p1 ) = λ

or

λ = 1/(2√(p1 p2 )).
Our first application of the Envelope Theorem told us that this value of λ could
be found as the derivative of the indirect utility function with respect to w. We
confirm this by differentiating the function we found above with respect to w.
∂v/∂w = ∂/∂w [ w/(2√(p1 p2 )) ] = 1/(2√(p1 p2 )),
as we had found directly above.
Now let us, for the same utility function consider the expenditure minimisation
problem
min_{x1 ,x2} p1 x1 + p2 x2  subject to  √x1 √x2 = u.
The Lagrangian is
(39)  L(x1 , x2 , λ) = p1 x1 + p2 x2 + λ(u − √x1 √x2 )

and the first order conditions are

(40)  ∂L/∂x1 = p1 − (λ/2) x1^{−1/2} x2^{1/2} = 0
(41)  ∂L/∂x2 = p2 − (λ/2) x1^{1/2} x2^{−1/2} = 0
(42)  ∂L/∂λ = u − √x1 √x2 = 0.
Dividing equation (40) by equation (41) gives

p1 /p2 = x2 /x1

or

(43)  x2 = p1 x1 /p2 .

And, if we substitute equation (43) into equation (42) we obtain

u − x1 √(p1 /p2 ) = 0

or

x1 = u √(p2 /p1 ).

Similarly,

x2 = u √(p1 /p2 ),
and if we substitute these values back into the objective function we obtain the
expenditure function

e(p1 , p2 , u) = p1 u √(p2 /p1 ) + p2 u √(p1 /p2 ) = 2u √(p1 p2 ).
Hotelling’s Theorem tells us that if we differentiate this expenditure function
with respect to pi we should obtain the Hicksian demand function hi :

∂e(p1 , p2 , u)/∂p1 = ∂/∂p1 [ 2u √(p1 p2 ) ] = 2u · (1/2) · √(p2 /p1 ) = u √(p2 /p1 ),

as we had already found. And similarly for h2 .
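This differentiation is easy to confirm numerically; the particular values of p1, p2, and u below are illustrative.

```python
from math import sqrt

# Check Hotelling's Theorem numerically: differentiating
# e(p1, p2, u) = 2*u*sqrt(p1*p2) with respect to p1 should give the
# Hicksian demand h1 = u*sqrt(p2/p1).  The numbers are illustrative.
def e(p1, p2, u):
    return 2 * u * sqrt(p1 * p2)

def h1(p1, p2, u):
    return u * sqrt(p2 / p1)

p1, p2, u, h = 4.0, 9.0, 5.0, 1e-6
de_dp1 = (e(p1 + h, p2, u) - e(p1 - h, p2, u)) / (2 * h)
```

At these values h1 = 5·√(9/4) = 7.5, and the finite-difference derivative of e matches it.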
Let us summarise what we have found so far. The Marshallian demand functions
are

x1 (p1 , p2 , w) = w/(2p1 )
x2 (p1 , p2 , w) = w/(2p2 ).

The indirect utility function is

v(p1 , p2 , w) = w/(2√(p1 p2 )).

The Hicksian demand functions are

h1 (p1 , p2 , u) = u √(p2 /p1 )
h2 (p1 , p2 , u) = u √(p1 /p2 ),

and the expenditure function is

e(p1 , p2 , u) = 2u √(p1 p2 ).
We now look at the third application, concerning the Hicks-Slutsky decomposition.
First let us confirm that if we substitute the expenditure function for w in
the Marshallian demand function we do obtain the Hicksian demand function:

x1 (p1 , p2 , e(p1 , p2 , u)) = e(p1 , p2 , u)/(2p1 ) = 2u √(p1 p2 )/(2p1 ) = u √(p2 /p1 ),
as required.
Similarly, if we plug the indirect utility function v into the Hicksian demand
function hi we obtain the Marshallian demand function xi . Confirmation of this is
left as an exercise. [You should do this exercise. If you understand properly it is
very easy. If you understand a bit then doing the exercise will solidify your under-
standing. If you can’t do it then it is a message to get some further explanation.]
Let us now check the Hicks-Slutsky decomposition for the effect of a change in
the price of good 2 on the demand for good 1. The Hicks-Slutsky decomposition
tells us that

∂x1 /∂p2 = ∂h1 /∂p2 − h2 (∂x1 /∂w).
Calculating these partial derivatives we have

∂x1 /∂p2 = 0
∂x1 /∂w = 1/(2p1 )
∂h1 /∂p2 = (u/√p1 ) × (1/2) × (1/√p2 ) = u/(2√(p1 p2 ))
and

h2 = u √(p1 /p2 ).

Substituting into the right hand side of the Hicks-Slutsky equation above gives

RHS = u/(2√(p1 p2 )) − u √(p1 /p2 ) · (1/(2p1 )) = 0,
which is exactly what we had found for the left hand side of the Hicks-Slutsky
equation.
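The whole decomposition can also be checked numerically with finite differences; the evaluation point (p1, p2, w) = (1, 4, 16) is an illustrative choice.

```python
from math import sqrt

# Numeric check of the Hicks-Slutsky decomposition
#     dx1/dp2 = dh1/dp2 - h2*(dx1/dw)
# at the illustrative point (p1, p2, w) = (1, 4, 16), with u = v(p1, p2, w).
x1 = lambda p1, p2, w: w / (2 * p1)             # Marshallian demand
h1 = lambda p1, p2, u: u * sqrt(p2 / p1)        # Hicksian demands
h2 = lambda p1, p2, u: u * sqrt(p1 / p2)
v = lambda p1, p2, w: w / (2 * sqrt(p1 * p2))   # indirect utility

p1, p2, w, h = 1.0, 4.0, 16.0, 1e-6
u = v(p1, p2, w)                                # = 4
lhs = (x1(p1, p2 + h, w) - x1(p1, p2 - h, w)) / (2 * h)
dh1_dp2 = (h1(p1, p2 + h, u) - h1(p1, p2 - h, u)) / (2 * h)
dx1_dw = (x1(p1, p2, w + h) - x1(p1, p2, w - h)) / (2 * h)
rhs = dh1_dp2 - h2(p1, p2, u) * dx1_dw
```

Both sides come out (numerically) zero: for these Cobb-Douglas preferences the substitution and income effects of a change in p2 on the demand for good 1 cancel exactly.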
Finally we check Roy’s Theorem, which tells us that the Marshallian demand
for good 1 can be found as
x1 (p1 , p2 , w) = − (∂v/∂p1 )/(∂v/∂w).
In this case we obtain

x1 (p1 , p2 , w) = − [ −(w/2) · (1/2) · p1^{−3/2} p2^{−1/2} ] / [ 1/(2√(p1 p2 )) ] = w/(2p1 ),
as required.
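Roy’s Theorem, too, is quick to verify with finite differences; the point (p1, p2, w) = (2, 3, 12) below is illustrative.

```python
from math import sqrt

# Numeric check of Roy's Theorem: -(dv/dp1)/(dv/dw) should reproduce the
# Marshallian demand w/(2*p1).  The evaluation point is illustrative.
v = lambda p1, p2, w: w / (2 * sqrt(p1 * p2))

p1, p2, w, h = 2.0, 3.0, 12.0, 1e-6
dv_dp1 = (v(p1 + h, p2, w) - v(p1 - h, p2, w)) / (2 * h)
dv_dw = (v(p1, p2, w + h) - v(p1, p2, w - h)) / (2 * h)
roy = -dv_dp1 / dv_dw                 # should equal w/(2*p1) = 3
```

The p2 dependence cancels in the ratio, just as in the algebraic derivation above.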
Exercises.
Exercise 79. Consider the direct utility function
u(x) = Σ_{i=1}^{n} βi log(xi − γi ),
where βi and γi , i = 1, . . . , n are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its
arguments.
(2) Verify Roy’s Theorem.
(3) Derive the expenditure function and show that it is homogeneous of degree
one and nondecreasing in prices.
(4) Verify Hotelling’s Theorem.

Exercise 80. For the utility function defined in Exercise 79,


(1) Derive the Slutsky equation.
(2) Let di (p, y) be the demand for good i derived from the above utility func-
tion. Goods i and j are said to be gross substitutes if ∂di (p, y)/∂pj > 0
and gross complements if ∂di (p, y)/∂pj < 0. For this utility function are
the various goods gross substitutes, gross complements, or can we not say?
(The two previous exercises are taken from R. Robert Russell and Maurice
Wilkinson, Microeconomics: A Synthesis of Modern and Neoclassical Theory, New
York, John Wiley & Sons, 1979.)
Exercise 81. An electric utility has two generating plants in which total costs
per hour are c1 and c2 respectively where
c1 = 80 + 2x1 + 0.001b x1², with b > 0, and
c2 = 90 + 1.5x2 + 0.002x2²
where xi is the quantity generated in the i-th plant. If the utility is required to pro-
duce 2000 megawatts in a particular hour, how should it allocate this load between
the plants so as to minimise costs? Use the Lagrangian method and interpret the
multiplier. How do total costs vary as b changes? (That is, what is the derivative
of the minimised cost with respect to b.)
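If you want to check your own solution, the SymPy sketch below sets up the Lagrangian for the stated cost functions; the printed values are what the first-order conditions imply, and are worth deriving by hand first. The final check is the envelope theorem: the derivative of minimised cost with respect to b equals ∂L/∂b at the optimum.

```python
import sympy as sp

x1, x2, lam, b = sp.symbols('x1 x2 lam b', positive=True)

c1 = 80 + 2*x1 + sp.Rational(1, 1000) * b * x1**2
c2 = 90 + sp.Rational(3, 2)*x2 + sp.Rational(1, 500) * x2**2
L = c1 + c2 - lam * (x1 + x2 - 2000)       # Lagrangian for the load constraint

sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)],
               [x1, x2, lam], dict=True)[0]
print(sol[x1])                             # 3750/(b + 2)

# The multiplier is the marginal cost of one more megawatt; at b = 1 the
# allocation is x1 = 1250, x2 = 750 with lam = 4.5.
print({s: sol[s].subs(b, 1) for s in (x1, x2, lam)})

# Envelope theorem: d(min cost)/db = dL/db = 0.001*x1^2 at the optimum.
min_cost = (c1 + c2).subs({x1: sol[x1], x2: sol[x2]})
print(sp.simplify(sp.diff(min_cost, b) - sol[x1]**2 / 1000))   # 0
```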
CHAPTER 4

Topics in Convex Analysis
1. Convexity
Convexity is one of the most important mathematical properties in economics. For example, without convexity of preferences, demand and supply functions need not be continuous, and so competitive markets may fail to have equilibrium points. The economic interpretation of convex preference sets in consumer theory is
diminishing marginal rates of substitution; the interpretation of convex production
sets is constant or decreasing returns to scale. Considerably less is known about
general equilibrium models that allow non-convex production sets (e.g., economies
of scale) or non-convex preferences (e.g., the consumer prefers a pint of beer or a
shot of vodka alone to any mixture of the two).
Another set of mathematical results closely connected to the notion of convexity consists of the so-called separation and support theorems. These theorems are frequently used in economics to obtain a price system that leads consumers and producers to choose a Pareto-efficient allocation. That is, given the prices, producers are maximizing
profits, and given those profits as income, consumers are maximizing utility subject
to their budget constraints.
1.1. Convex Sets. Given two points x, y ∈ Rn , a point z = ax + (1 − a) y,
where 0 ≤ a ≤ 1, is called a convex combination of x and y.
The set of all possible convex combinations of x and y, denoted by [x, y], is
called the interval with endpoints x and y (or, the line segment connecting x and
y):
[x, y] = {ax + (1 − a) y : 0 ≤ a ≤ 1} .
Definition 18. A set S ⊆ Rn is convex iff for any x, y ∈ S the interval
[x, y] ⊆ S.
In words: a set is convex if it contains the line segment connecting any two of
its points; or, more loosely speaking, a set is convex if along with any two points it
contains all points between them.
Convex sets in R2 include the interiors of triangles, squares, circles, ellipses, and hosts of other sets. Note also that, for example in R3 , while the interior of a cube is a convex set, its boundary is not. The quintessential convex set in Euclidean space Rn for any n > 1 is the n−dimensional open ball BR (a) of radius R > 0 about the point a ∈ Rn , given by
BR (a) = {x : x ∈ Rn , |x − a| < R}.
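A quick numerical illustration (a NumPy sketch; random sampling is evidence, not a proof) that the set {x : |x − a| < R} satisfies Definition 18:

```python
import numpy as np

def in_ball(x, a, R):
    """Membership in the open ball {x : |x - a| < R}."""
    return np.linalg.norm(x - a) < R

rng = np.random.default_rng(0)
a, R = np.zeros(3), 1.0

for _ in range(1000):
    x, y = rng.uniform(-1, 1, 3), rng.uniform(-1, 1, 3)
    if in_ball(x, a, R) and in_ball(y, a, R):
        t = rng.uniform()
        # The triangle inequality guarantees this never fails:
        # |t x + (1-t) y - a| <= t|x - a| + (1-t)|y - a| < R.
        assert in_ball(t * x + (1 - t) * y, a, R)
```

The inline comment is the actual proof; the loop merely exercises it on a thousand random pairs.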
More examples of convex sets:
1. Is the empty set convex? Is a singleton convex? Is Rn convex?
There are also several standard ways of forming convex sets from convex sets:
2. Let A, B ⊆ Rn be sets. The Minkowski sum A + B ⊆ Rn is defined as
A + B = {x + y : x ∈ A, y ∈ B} .
When B = {b} is a singleton, the set A + b is called a translation of A. Prove that
A + B is convex if A and B are convex.
3. Let A ⊆ Rn be a set and α ∈ R be a number. The scaling αA ⊆ Rn is defined as
αA = {αx : x ∈ A} .
When α > 0, the set αA is called a dilation of A. Prove that αA is convex if A is
convex.
4. Prove that the intersection ∩i∈I Si of any number of convex sets is convex.
5. Show by example that the union of convex sets need not be convex.
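Item 5 can be seen with a one-line counterexample in R (a minimal sketch): the intervals A = [0, 1] and B = [2, 3] are each convex, but their union fails Definition 18.

```python
def in_union(z):
    # A = [0, 1] and B = [2, 3] are both convex intervals.
    return 0 <= z <= 1 or 2 <= z <= 3

# The point 1.5 is a convex combination (a = 1/2) of 1 in A and 2 in B,
# yet it belongs to neither interval, so the union is not convex.
z = 0.5 * 1 + 0.5 * 2
print(in_union(z))   # False
```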
It is also possible to define a convex combination of an arbitrary (but finite) number of points.
Definition 19. Let x1 , ..., xk be a finite set of points from Rn . A point
x = ∑_{i=1}^{k} αi xi ,
where αi ≥ 0 for i = 1, ..., k and ∑_{i=1}^{k} αi = 1, is called a convex combination of x1 , ..., xk .
Note that the definition of a convex combination of two points is a special case
of this definition. (Prove it)
Can we generate ‘superconvex’ sets using definition 19? No, as the following lemma shows.
Lemma 1. A set S ⊆ Rn is convex iff every convex combination of points of S
is in S.
Proof. If a set contains all convex combinations of its points it is obviously convex, because it contains, in particular, the convex combinations of all pairs of its points. Thus we need to show that a convex set contains every convex combination of its points.
The proof is by induction on the number of points of S in a convex combination. By definition, a convex set contains all convex combinations of any two of its points. Suppose that S contains every convex combination of n or fewer points and consider a combination of n + 1 points, x = ∑_{i=1}^{n+1} αi xi . If αn+1 = 1 then x = xn+1 ∈ S, so, relabelling if necessary, we may assume αn+1 < 1. Then
x = (1 − αn+1 ) ∑_{i=1}^{n} (αi /(1 − αn+1 )) xi + αn+1 xn+1
  = (1 − αn+1 ) y + αn+1 xn+1 .
Note that y ∈ S by the induction hypothesis (as a convex combination of n points of S) and, as a result, so is x, being a convex combination of two points in S. 
But, using definition 19, we can generate convex sets from non-convex sets!
This operation is very useful, so the resulting set deserves a special name.
Definition 20. Given a set S ⊆ Rn the set of all convex combinations of
points from S, denoted convS, is called the convex hull of S.
Note: convince yourself that the adjective ‘convex’ in the term ‘convex hull’ is well-deserved by proving that the convex hull is indeed convex! Now lemma 1 can be written more succinctly: S = convS iff S is convex.
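For finite sets the convex hull can be computed directly. The sketch below assumes SciPy's `scipy.spatial.ConvexHull` is available, and shows lemma 1 at work: a point that is a convex combination of the others contributes nothing to the hull.

```python
import numpy as np
from scipy.spatial import ConvexHull

# Four corners of a square plus its centre.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0], [0.5, 0.5]])
hull = ConvexHull(pts)

# The centre (index 4) is a convex combination of the corners, so only
# the corners appear as vertices of the hull.
print(sorted(int(i) for i in hull.vertices))   # [0, 1, 2, 3]
```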
1.2. Convex Hulls. The next theorem deals with the following interesting property of convex hulls: the convex hull of a set S is the intersection of all convex sets containing S. Thus, in a natural sense, the convex hull of a set S is the ‘smallest’ convex set containing S. In fact, many authors define convex hulls in that way and then prove our Definition 20 as a theorem.
Theorem 11. Let S ⊆ Rn be a set. Then any convex set containing S also contains convS.
Proof. Let A be a convex set such that S ⊆ A. By lemma 1 A contains all
convex combinations of its points and, in particular, all convex combinations of
points of its subset S, which is convS. 
The next property is quite obvious and, again, frustrates attempts to generate
‘superconvex’ sets, this time by trying to take convex hulls of convex hulls.
1. Prove that convconvS = convS for any S.
2. Prove that if A ⊂ B then convA ⊂ convB.
The next property relates the operation of taking convex hulls to that of taking Minkowski sums. It does not matter in which order you apply these operations.
3. Prove that conv (A + B) = (convA) + (convB).
4. Prove that conv (A ∩ B) ⊆ (convA) ∩ (convB).
5. Prove that (convA) ∪ (convB) ⊆ conv (A ∪ B).
1.3. Caratheodory’s Theorem. Definition 20 implies that any point x in the convex hull of S is representable as a convex combination of (finitely) many points of S, but it places no restrictions on the number of points of S required to make the combination. Caratheodory’s Theorem puts an upper bound on the number of points required: in Rn the number of points never has to be more than n + 1.
Theorem 12 (Caratheodory, 1907). Let S ⊆ Rn be a non-empty set. Then every x ∈ convS can be represented as a convex combination of (at most) n + 1 points from S.
Note that the theorem does not ‘identify’ the points used in the representation; their choice depends on x.
Show by example that the constant n + 1 in Caratheodory’s theorem cannot
be improved. That is, exhibit a set S ⊆ Rn and a point x ∈ convS that cannot be
represented as a convex combination of fewer than n + 1 points from S.
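Such a representation can be found with linear programming. The sketch below assumes SciPy's `linprog`; the dual-simplex method (`method="highs-ds"`) is requested so that the solver returns a basic feasible solution, which, having at most n + 1 equality constraints, automatically uses at most n + 1 points — exactly Caratheodory's bound.

```python
import numpy as np
from scipy.optimize import linprog

def caratheodory_weights(S, x):
    """Convex-combination weights a with a @ S = x, sum(a) = 1, a >= 0.

    The LP has n + 1 equality constraints, so a basic feasible solution
    has at most n + 1 positive weights.
    """
    m, n = S.shape
    A_eq = np.vstack([S.T, np.ones(m)])   # sum a_i x_i = x and sum a_i = 1
    b_eq = np.append(x, 1.0)
    res = linprog(np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * m, method="highs-ds")
    return res.x if res.success else None

# Five points in R^2 whose hull contains the point (1, 1).
S = np.array([[0, 0], [2, 0], [2, 2], [0, 2], [1, 0]], dtype=float)
w = caratheodory_weights(S, np.array([1.0, 1.0]))
print(np.count_nonzero(w > 1e-9))   # at most 3 of the 5 points are used
```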
1.4. Polytopes. The simplest convex sets are those which are convex hulls of
a finite set of points, that is, sets of the form S = conv{x1 , x2 , ..., xm }. The convex
hull of a finite set of points in Rn is called a polytope.
1. Prove that the set
∆ = {x ∈ Rn+1 : ∑_{i=1}^{n+1} xi = 1 and xi ≥ 0 for any i}
is a polytope. This polytope is called the standard n−dimensional simplex.
2. Prove that the set
C = {x ∈ Rn+1 : 0 ≤ xi ≤ 1 for any i}
is a polytope. This polytope is called an n−dimensional cube.
3. Prove that the set
O = {x ∈ Rn+1 : ∑_{i=1}^{n+1} |xi | ≤ 1}
is a polytope. This polytope is called a (hyper)octahedron.
1.5. Topology of Convex Sets.
(1) The closure of a convex set is a convex set.
(2) The interior of a convex set (possibly empty) is convex.
1.6. Aside: Helly’s Theorem. While there are not many applications of Helly’s theorem in economics (in fact, I am aware of only one paper that uses Helly’s theorem in an economic context), it is definitely one of the most famous results in convexity.
Theorem 13 (Helly, 1913). Let A1 , A2 , ..., Am ⊆ Rn be a finite family of convex
sets with m ≥ n + 1. Suppose that every n + 1 of the sets have a nonempty intersection. Then all the sets have a nonempty intersection.
To prove Helly’s theorem with elegance we first need to formulate a very useful result obtained by J. Radon.
Theorem 14 (Radon, 1921). Let S ⊆ Rn be a set of at least n + 2 points.
Then there are two non-intersecting subsets R ⊂ S (‘red points’) and B ⊂ S (‘blue
points’) such that
convR ∩ convB ≠ ∅.
Proof. Let x1 , ..., xm be m ≥ n + 2 distinct points from S. Consider the
system of n + 1 homogeneous linear equations in variables γ1 , ..., γm
γ1 x1 + ... + γm xm = 0 and γ1 + ... + γm = 0.
Since m ≥ n + 2, there is a nontrivial solution to this system. Let
R = {xi : γi > 0} and B = {xi : γi < 0}.
Then R ∩ B = ∅. Let β = ∑_{i:γi>0} γi ; then β > 0 and ∑_{i:γi<0} γi = −β, since the γ’s sum to zero. Moreover,
∑_{i:γi>0} γi xi = ∑_{i:γi<0} (−γi ) xi
since ∑_i γi xi = 0. Let
x = ∑_{i:γi>0} (γi /β) xi = ∑_{i:γi<0} (−γi /β) xi .
Thus x is a convex combination of points from R and a convex combination of points from B. 
Example: For any set of four points in the plane (R2 ) either one of the points lies within the convex hull of the other three or the points can be split into two pairs whose convex hulls intersect. In both cases, Radon’s theorem holds.
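The proof of Radon's theorem is entirely constructive, so it can be run. The NumPy sketch below implements it for the second case of the example: the null-space vector γ of the homogeneous system is taken from the SVD, and the signs of its entries colour the points.

```python
import numpy as np

def radon_partition(pts):
    """Radon partition of m >= n + 2 points in R^n, following the proof."""
    pts = np.asarray(pts, dtype=float)
    m = len(pts)
    # The homogeneous system: sum_i g_i x_i = 0 and sum_i g_i = 0.
    A = np.vstack([pts.T, np.ones(m)])
    g = np.linalg.svd(A)[2][-1]          # a nontrivial null-space vector
    red = np.where(g > 1e-12)[0]
    blue = np.where(g < -1e-12)[0]
    beta = g[red].sum()
    # The common point of conv R and conv B constructed in the proof.
    common = (g[red, None] * pts[red]).sum(axis=0) / beta
    return red, blue, common

# Two pairs of points whose hulls (the two diagonals) cross at (1, 1).
R, B, p = radon_partition([[0, 0], [2, 2], [2, 0], [0, 2]])
print(p)
```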
Proof of Helly’s theorem. (Due to Radon, 1921). The proof is by induction on the number of sets, m, starting with m = n + 1, where there is nothing to prove. Suppose that m > n + 1. Then, by the induction hypothesis, for every i = 1, ..., m there is a point pi in the intersection A1 ∩ ... ∩ Ai−1 ∩ Ai+1 ∩ ... ∩ Am (Ai is omitted). Altogether there are m > n + 1 points, and each point pi belongs to all the sets except perhaps Ai .
If two of the points are the same then that point belongs to all the Ai ’s. Otherwise, by Radon’s Theorem, there are non-intersecting subsets R = {pi : i ∈ I} and B = {pj : j ∈ J} such that there is a point p ∈ convR ∩ convB. We claim that p is a common point of all the Ai ’s. All the points of R belong to the sets Ai with i ∉ I, and all the points of B belong to the sets Aj with j ∉ J. Since the sets are convex, every point of convR belongs to Ai for i ∉ I and every point of convB belongs to Aj for j ∉ J. Therefore
p ∈ ∩_{i∉I} Ai and p ∈ ∩_{j∉J} Aj .
Since I ∩ J = ∅, we have p ∈ ∩_{i=1}^{m} Ai . 
2. Support and Separation
2.1. Hyperplanes. The concept of a hyperplane in Rn is a straightforward generalisation of the notion of a line in R2 and of a plane in R3 . A line in R2 can be
described by an equation
p1 x1 + p2 x2 = α
where p = (p1 , p2 ) is some non-zero vector and α is some scalar. A plane in R3 can
be described by an equation
p1 x1 + p2 x2 + p3 x3 = α
where p = (p1 , p2 , p3 ) is some non-zero vector and α is some scalar. Similarly, a
hyperplane in Rn can be described by an equation
∑_{i=1}^{n} pi xi = α
where p = (p1 , p2 , ..., pn ) is some non-zero vector in Rn and α is some scalar. It can be written in a more concise way using scalar (also known as inner, or dot) product notation.
Definition 21. A hyperplane is the set
H(p, α) = {x ∈ Rn : p · x = α}
where p ∈ Rn is a non-zero vector and α is a scalar. The vector p is called the
normal to the hyperplane H.
Suppose that there are two points x∗ , y ∗ ∈ H(p, α). Then by definition p·x∗ = α and p · y ∗ = α. Hence p · (x∗ − y ∗ ) = 0. In other words, the vector p is orthogonal to the vector x∗ − y ∗ , and hence to H(p, α).
Given a hyperplane H ⊂ Rn , points in Rn can be classified according to their positions relative to the hyperplane. The (closed) half-space determined by the hyperplane H(p, α) is either the set of points ‘below’ H or the set of points ‘above’ H, i.e., either the set {x ∈ Rn : p · x ≤ α} or the set {x ∈ Rn : p · x ≥ α}. Open half-spaces are defined by strict inequalities. Prove that a closed half-space is closed and an open half-space is open.
A straightforward economic example of a half-space is the budget set {x ∈ Rn : p · x ≤ α} of a consumer with income α facing the vector of prices p. (It was rather neat to call the normal vector p, wasn’t it?) By the way, hyperplanes and half-spaces are convex sets (Prove it).
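A minimal sketch of the budget-set example (the prices and income are illustrative numbers; any p and α work the same way):

```python
import numpy as np

def affordable(x, p, alpha):
    """Membership in the closed half-space {x : p . x <= alpha}."""
    return p @ x <= alpha

p = np.array([2.0, 5.0])   # prices of two goods (hypothetical values)
income = 20.0

print(affordable(np.array([5.0, 2.0]), p, income))   # True: costs exactly 20
print(affordable(np.array([6.0, 2.0]), p, income))   # False: costs 22
```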
2.2. Support Functions. In this section we give a description of what is called a dual structure. Consider the set of all closed convex subsets of Rn . We will show that to each such set S we can associate an extended-real valued function µS : Rn → R ∪ {−∞}, that is, a function that maps each vector in Rn to either a real number or to −∞. Not all such functions can be arrived at in this way. In fact we shall show that any such function must be concave and homogeneous of degree 1. But once we restrict attention to functions that can be arrived at as a “support function” for some such closed convex set we have another set of objects that we can analyse and perhaps use to make useful arguments about the original sets in which we were interested.
In fact, we shall define the function µS for any subset of Rn , not just the closed
and convex ones. However, if the original set S is not a closed convex one we shall
lose some information about S in going to µS . In particular, µS only depends on
the closed convex hull of S, that is, if two sets have the same closed convex hull
they will lead to the same function µS .
We define µS : Rn → R ∪ {−∞} as
µS (p) = inf{p · x | x ∈ S},
where inf denotes the infimum or greatest lower bound. It is a property of the extended real numbers that any nonempty set of real numbers has an infimum (possibly −∞). Thus µS (p) is well defined for any set S. If the minimum exists, for example if the set S is compact, then the infimum is the minimum. In other cases the minimum may not exist. To take a simple one dimensional example, suppose that the set S is the subset of R consisting of the numbers 1/n for n = 1, 2, . . . and that p = 2. Then clearly p · x = px does not have a minimum on the set S. However, 0 is less than px = 2x for any value of x in S, but for any number a greater than 0 there is a value of x in S such that px < a. Thus 0 is in this case the infimum of the set {p · x | x ∈ S}.
Recall that we have not assumed that S is convex. However, if we do assume
that S is both convex and closed then the function µS contains all the information
needed to reconstruct S.
Given any extended-real valued function µ : Rn → R ∪ {−∞} let us define the set Sµ as
Sµ = {x ∈ Rn | p · x ≥ µ(p) for every p ∈ Rn }.
That is, for each p with µ(p) > −∞ we define the closed half space
{x ∈ Rn | p · x ≥ µ(p)}.
Notice that if µ(p) = −∞ then p · x ≥ µ(p) for any x and so the above set will be
Rn rather than a half space. The set Sµ is the intersection of all these closed half
spaces. Since the intersection of convex sets is convex and the intersection of closed
sets is closed, the set Sµ is, for any function µ, a closed convex set.
Suppose that we start with a set S, define µS as above and then use µS to
define the set SµS . If the set S was a closed convex set then SµS will be exactly
equal to S. Since we have seen that SµS is a closed convex set, it must be that if
S is not a closed convex set it will not be equal to SµS . However S will always be
a subset of SµS , and indeed SµS will be the smallest closed convex set such that S
is a subset, that is SµS is the closed convex hull of S.
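For a finite set the infimum is a minimum and µS is trivial to compute. The NumPy sketch below (the triangle and the three test directions are illustrative choices) also shows the reconstruction at work: the half-spaces {x : p · x ≥ µS(p)} cut out exactly the closed convex hull of S.

```python
import numpy as np

S = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # vertices of a triangle

def mu(p):
    """Support function mu_S(p) = min{p . x : x in S} for finite S."""
    return (S @ p).min()

def in_S_mu(x, directions):
    # Intersect the half-spaces {x : p . x >= mu_S(p)} over the given p's;
    # the three facet normals below already pin down conv S exactly.
    return all(p @ x >= mu(p) - 1e-12 for p in directions)

normals = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([-1.0, -1.0])]
print(in_S_mu(np.array([0.2, 0.2]), normals))   # True: inside conv S
print(in_S_mu(np.array([0.8, 0.8]), normals))   # False: outside conv S
```

With only finitely many directions this is an outer approximation in general; for a polytope, the facet normals suffice.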
2.3. Separation. We now consider the notion of ‘separating’ two sets by a hyperplane.
Definition 22. A hyperplane H separates sets A and B if A is contained in
one closed half-space and B is contained in the other. A hyperplane H strictly
separates sets A and B if A is contained in one open half-space and B is contained
in the other.
It is clear that strict separation requires the two sets to be disjoint. For example, consider two (externally) tangent circles in a plane. Their common tangent line separates them but does not separate them strictly. On the other hand, although it is necessary for two sets to be disjoint in order to strictly separate them, this condition is not sufficient, even for closed convex sets. Let A = {x ∈ R2 : x1 > 0 and x1 x2 ≥ 1} and B = {x ∈ R2 : x1 ≥ 0 and x2 = 0}; then A and B are disjoint closed convex sets but they cannot be strictly separated by a hyperplane (a line in R2 ). Thus the problem of the existence of a separating hyperplane is more involved than it may appear to be at first.
We start with separation of a set and a point.
Theorem 15. Let S ⊆ Rn be a convex set and x0 ∉ S be a point. Then S and x0 can be separated. If S is closed then S and x0 can be strongly separated.
Idea of proof. The proof proceeds in two steps. The first step establishes the existence of a point a in the closure of S which is closest to x0 . The second step constructs the separating hyperplane using the point a.
STEP 1. There exists a point a ∈ S̄ (the closure of S) such that d(x0 , a) ≤ d(x0 , x) for all x ∈ S̄, and d(x0 , a) > 0.
Let B̄(x0 ) be a closed ball with centre at x0 that intersects the closure of S, and let A = B̄(x0 ) ∩ S̄ ≠ ∅. The set A is nonempty, closed and bounded (hence compact). According to Weierstrass’s theorem, the continuous distance function d(x0 , x) achieves its minimum on A. That is, there exists a ∈ A such that d(x0 , a) ≤ d(x0 , x) for all x ∈ S̄. Note that d(x0 , a) > 0.
STEP 2. There exists a hyperplane H(p, α) = {x ∈ Rn : p · x = α} such that p · x ≥ α for all x ∈ S̄ and p · x0 < α.
Construct the hyperplane which goes through the point a ∈ S̄ and has normal p = a − x0 . The proof that this hyperplane is the separating one is by contradiction. Suppose there exists a point y ∈ S̄ which is strictly on the same side of H as x0 . Consider the point y 0 ∈ [a, y] such that the vector y 0 − x0 is orthogonal to y − a. Since d(x0 , y) ≥ d(x0 , a), the point y 0 lies strictly between a and y. Thus y 0 ∈ S̄ and d(x0 , y 0 ) < d(x0 , a), which contradicts the choice of a. When S = S̄, that is, when S is closed, the separation can be made strict by choosing a point strictly between a and x0 instead of a. This is always possible because d(x0 , a) > 0. 
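Both steps can be carried out numerically for a polytope. In the sketch below (assuming SciPy's `minimize`; the triangle and the outside point are illustrative), the nearest point a is found by minimising the squared distance over convex-combination weights, and p = a − x0 then gives the hyperplane.

```python
import numpy as np
from scipy.optimize import minimize

def separate(S, x0):
    """Separating hyperplane H(p, alpha) between conv S (finite S) and x0."""
    m = len(S)
    res = minimize(lambda w: ((w @ S - x0) ** 2).sum(),
                   np.full(m, 1.0 / m), method="SLSQP",
                   bounds=[(0, 1)] * m,
                   constraints={"type": "eq", "fun": lambda w: w.sum() - 1})
    a = res.x @ S                     # STEP 1: nearest point of conv S to x0
    p = a - x0                        # STEP 2: normal of the hyperplane
    alpha = 0.5 * (p @ a + p @ x0)    # midpoint gives strict separation
    return p, alpha

S = np.array([[1.0, 1.0], [2.0, 1.0], [1.0, 2.0]])
x0 = np.array([0.0, 0.0])
p, alpha = separate(S, x0)
print(all(p @ s > alpha for s in S), p @ x0 < alpha)   # True True
```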
Theorem 15 is very useful because separation of a pair of sets can always be reduced to separation of a set and a point.
Lemma 2. Let A and B be non-empty sets. A and B can be separated (strongly separated) iff A − B and 0 can be separated (strongly separated).
Proof. If A and B are convex then A − B is convex. If A is compact and B is closed then A − B is closed. And 0 ∉ A − B iff A ∩ B = ∅. 
Theorem 16 (Minkowski, 1911). Let A and B be non-empty convex sets
with A ∩ B = ∅. Then A and B can be separated. If A is compact and B is closed
then A and B can be strongly separated.
2.4. Support. Closely related (though not in a topological sense) to the notion of a separating hyperplane is the notion of a supporting hyperplane.
Definition 23. The hyperplane H supports the set S at the point x0 ∈ S if
x0 ∈ H and S is a subset of one of the half-spaces determined by H.
A convex set can be supported at any of its boundary points; this is an immediate consequence of Theorem 16. To prove it, apply the theorem to the sets int A and B = {x0 }, where x0 is a boundary point of A.
Theorem 17. Let S ⊆ Rn be a convex set with nonempty interior and let x0 ∈ S be a boundary point of S. Then there exists a supporting hyperplane for S at x0 .
Note that if the boundary of a convex set is smooth (‘differentiable’) at the given point x0 then the supporting hyperplane is unique and is just the tangent hyperplane. If, however, the boundary is not smooth then there can be many supporting hyperplanes passing through the given point. It is important to note that conceptually the supporting theorems are connected to calculus, but the supporting theorems are more powerful (they do not require smoothness), more direct, and more set-theoretic.
Certain points on the boundary of a convex set carry a lot of information about
the set.
Definition 24. A point x of a convex set S is an extreme point of S if x is
not an interior point of any line segment in S.
The extreme points of a closed ball and of a closed cube in R3 are its boundary
points and its eight vertices, respectively. A half-space has no extreme points even
if it is closed.
An interesting property of extreme points is that an extreme point can be
deleted from the set without destroying convexity of the set. That is, a point x in
a convex set S is an extreme point iff the set S\{x} is convex.
The next Theorem is a finite-dimensional version of a quite general and powerful
result by M.G. Krein and D.P. Milman.
Theorem 18 (Krein & Milman, 1940). Let S ⊆ Rn be convex and compact.
Then S is the convex hull of its extreme points.
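For a polytope the extreme points are its vertices, and the theorem can be illustrated computationally. The sketch below assumes SciPy's `ConvexHull`; the rectangle with two interior points is an illustrative choice.

```python
import numpy as np
from scipy.spatial import ConvexHull

# A rectangle's four corners together with two non-extreme interior points.
pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0],
                [2.0, 1.5], [1.0, 1.0]])
hull = ConvexHull(pts)
extreme = pts[hull.vertices]

# Krein-Milman in miniature: the convex hull of the extreme points alone
# (the four corners) recovers the whole rectangle.
print(len(extreme))   # 4
```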