CHAPTER 1

Logic, Sets, Functions, and Spaces

1. Logic
All the aspects of logic that we describe in this section are part of what is called
propositional logic.
We start by supposing that we have a number of atomic statements, which we
denote by lower case letters, p, q, r. Examples of such statements might be
Consumer 1 is a utility maximiser
the apple is green
the price of good 3 is 17.
We assume that each atomic statement is either true or false.
Given these atomic statements we can form other statements using logical con-
nectives.
If p is a statement then ¬p, read not p, is the statement that is true precisely
when p is false. If both p and q are statements then p ∧ q, read p and q, is the
statement that is true when both p and q are true and false otherwise. If both p
and q are statements then p ∨ q, read p or q, is the statement that is true when
either p or q is true, that is, the statement that is false only if both p and q are
false.
We could make do with these three symbols together with brackets to group
symbols and tell us what to do first. For example we could have the complicated
statement ((p ∧ q) ∨ (p ∧ r)) ∨ ¬s. This means that at least one of two statements
is true. The first is that either both p and q are true or both p and r are true. The
second is that s is not true.
Exercise 1. Think about the meaning of the statement we have just consid-
ered. Can you see a more straightforward statement that would mean the same
thing?
p   q   p ⇒ q   ¬p   ¬p ∨ q
T   T     T      F      T
F   T     T      T      T
T   F     F      F      F
F   F     T      T      T
Since the third column and the fifth column contain exactly the same truth values
we see that the two statements, p ⇒ q and ¬p ∨ q are indeed logically equivalent.
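This kind of equivalence can also be checked mechanically by enumerating all truth assignments. The following Python sketch is ours, not part of the notes; it simply rebuilds the truth table above row by row.

```python
from itertools import product

# p => q is false exactly when p is true and q is false; any other
# row of the truth table makes it true.
def implies(p, q):
    return not (p and not q)

rows = []
for p, q in product([True, False], repeat=2):
    rows.append((p, q, implies(p, q), (not p) or q))

# The third and fourth entries agree on every row, so p => q and
# (not p) or q are logically equivalent.
assert all(row[2] == row[3] for row in rows)
```

Enumerating all assignments works for any statement built from finitely many atomic statements, since a statement with n atoms has only 2^n rows to check.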
Exercise 2. Construct the truth table for the statement ¬(¬p ∨ ¬q). Is it
possible to write this statement using fewer logical connectives? Hint: why not
start with just one?
Exercise 3. Prove that the following statements are equivalent:
(i) (p ∨ ¬q) ⇒ ((¬p) ∧ q) and ¬(q ⇒ p),
(ii) p ⇒ q and ¬q ⇒ ¬p.
In part (ii) the second statement is called the contrapositive of the first statement.
Often if you are asked to prove that p implies q it will be easier to show the
contrapositive, that is, that not q implies not p.
Exercise 4. Prove that the following statements are equivalent:
(i) ¬(p ∧ q) and ¬p ∨ ¬q,
(ii) ¬(p ∨ q) and ¬p ∧ ¬q.
2. Sets
Set theory was developed in the second half of the 19th century and is at the
very foundation of modern mathematics. But we shall not be concerned here with
the development of the theory. Rather we shall only give the basic language of set
theory and outline some of the very basic operations on sets.
We start by defining a set to be a collection of objects or elements. We will
usually denote sets by capital letters and their elements by lower case letters. If
the element a is in the set A we write a ∈ A. If every element of the set B is also
in the set A we call B a subset of the set A and write B ⊂ A. We shall also say
that A contains B. If A and B have exactly the same elements then we say they
are equal or identical. Alternatively we could say A = B if and only if A ⊂ B and
B ⊂ A. If B ⊂ A and B ≠ A then we say that B is a proper subset of A or that A
strictly contains B.
Exercise 5. How many subsets does a set with N elements have?
In order to avoid the paradoxes such as the one referred to in the first paragraph
we shall always assume that in whatever situation we are discussing there is some
given set U called the universal set which contains all of the sets with which we
shall deal.
We customarily enclose our specification of a set by braces. In order to specify
a set one may simply list the elements. For example to specify the set D which
contains the numbers 1,2, and 3 we may write D = {1, 2, 3}. Alternatively we may
define the set by specifying a property that identifies the elements. For example
we may specify the same set D by D = {x | x is an integer and 0 < x < 4}. Notice
that this second method is more powerful. We could not, for example, list all
the integers. (Since there are an infinite number of them we would die before we
finished.)
For any two sets A and B we define the union of A and B to be the set which
contains exactly all of the elements of A and all the elements of B. We denote the
union of A and B by A ∪ B. Similarly we define the intersection of A and B to
be that set which contains exactly those elements which are in both A and B. We
denote the intersection of A and B by A ∩ B. Thus we have
A∪B = {x | x ∈ A or x ∈ B}
A∩B = {x | x ∈ A and x ∈ B}.
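The two definitions translate directly into set-builder form for finite sets. A minimal Python sketch (ours, with arbitrary example sets):

```python
# Union and intersection of finite sets, following the definitions above.
A = {1, 2, 3}
B = {3, 4}
U = {1, 2, 3, 4, 5}  # a universal set large enough for this example

union = {x for x in U if x in A or x in B}
intersection = {x for x in U if x in A and x in B}

assert union == {1, 2, 3, 4}
assert intersection == {3}
```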
Exercise 6. Is the oldest mathematician among chess players the same person
as the oldest chess player among mathematicians, or (possibly) are they different
people?
Exercise 7. Is the best mathematician among chess players the same person
as the best chess player among mathematicians, or (possibly) are they different
people?
3. Binary Relations
There are a number of ways of formulating the notion of a binary relation. We
shall pursue one, defining a binary relation on a set X simply as a subset of X × X,
the Cartesian product of X with itself.
Definition 1. A binary relation R on the set X is a subset of X × X. If the
point (x, y) ∈ R we shall often write xRy instead of (x, y) ∈ R.
Since we have already defined the notions of Cartesian product and subset,
there is really nothing new here. However the structure and properties of binary
relations that we shall now study are motivated by the informal notion of a “relation”
between the elements of X.
Example 1. Suppose that X is a set of boys and girls and the relation xSy is
“x is a sister of y.”
Example 2. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.
There are binary relations >, ≥, and =.
Example 3. Suppose that X is the set of natural numbers X = {1, 2, 3, . . . }.
The relations R, P , and I are defined by
xRy if and only if x + 1 ≥ y,
xPy if and only if x > y + 1, and
xIy if and only if −1 ≤ x − y ≤ 1.
Definition 2. The following properties of binary relations have been defined
and found to be useful.
(BR1) Reflexivity: For all x in X, xRx.
(BR2) Irreflexivity: For all x in X, not xRx.
(BR3) Completeness: For all x and y in X, either xRy or yRx (or both).1
(BR4) Transitivity: For all x, y, and z in X, if xRy and yRz then xRz.
(BR5) Negative Transitivity: For all x, y, and z in X, if xRy then either
xRz or zRy (or both).
(BR6) Symmetry: For all x and y in X, if xRy then yRx.
(BR7) Anti-Symmetry: For all x and y in X, if xRy and yRx then x = y.
(BR8) Asymmetry: For all x and y in X, if xRy then not yRx.
1We shall always implicitly include “or both” when we say “either. . . or.”
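On a finite set each of these properties can be checked by brute force, since a relation is just a set of ordered pairs. The following Python sketch is our own illustration (the relation ≥ on a three-element set is our example, not from the text):

```python
from itertools import product

# A binary relation on a finite set X, represented as a set of pairs.
def is_reflexive(X, R):
    return all((x, x) in R for x in X)

def is_complete(X, R):
    return all((x, y) in R or (y, x) in R for x, y in product(X, repeat=2))

def is_transitive(X, R):
    return all((x, z) in R
               for x, y, z in product(X, repeat=3)
               if (x, y) in R and (y, z) in R)

def is_symmetric(X, R):
    return all((y, x) in R for (x, y) in R)

def is_asymmetric(X, R):
    return all((y, x) not in R for (x, y) in R)

# The relation >= on {1, 2, 3}:
X = {1, 2, 3}
geq = {(x, y) for x, y in product(X, repeat=2) if x >= y}

assert is_reflexive(X, geq)
assert is_complete(X, geq)
assert is_transitive(X, geq)
assert not is_symmetric(X, geq)
```

Checks of this kind are a useful sanity test before attempting the proofs asked for in the exercises below.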
Exercise 10. Show that completeness implies reflexivity, that asymmetry im-
plies anti-symmetry, and that asymmetry implies irreflexivity.
Exercise 11. Which properties does the relation described in Example 1 sat-
isfy?
Exercise 12. Which properties do the relations described in Example 2 sat-
isfy?
Exercise 13. Which properties do the relations described in Example 3 sat-
isfy?
We now define a few particularly important classes of binary relations.
Definition 3. A weak order is a binary relation that satisfies transitivity and
completeness.
Definition 4. A strict partial order is a binary relation that satisfies transi-
tivity and asymmetry.
Definition 5. An equivalence is a binary relation that satisfies transitivity
and symmetry.
You have almost certainly already met examples of such binary relations in
your study of Economics. We normally assume that the weak preference, strict
preference, and indifference relations of a consumer are, respectively, a weak order,
a strict partial order, and an equivalence, though we actually typically assume a
little more about the strict preference.
The following construction is also motivated by the idea of preference. Let
us consider some binary relation R which we shall informally think of as a weak
preference relation, though we shall not, for the moment, make any assumptions
about the properties of R. Consider the relation P defined by xPy if and only if
xRy and not yRx, and the relation I defined by xIy if and only if xRy and yRx.
Exercise 14. Show that if R is a weak order then P is a strict partial order
and I is an equivalence.
We could also think of starting with a strict preference P and defining the weak
preference R in terms of P . We could do so either by defining R as xRy if and only
if not yPx or by defining R as xRy if and only if either xPy or not yPx.
Exercise 15. Show that these two definitions of R coincide if P is asymmetric.
Exercise 16. Show by example that P may be a strict partial order (so, by
the previous result, the two definitions of R coincide) but R not a weak order.
[Hint: If you cannot think of another example consider the binary relations defined
in Example 3.]
Exercise 17. Show that if P is asymmetric and negatively transitive then
(i) P is transitive (and hence a strict partial order), and
(ii) R is a weak order.
4. Functions
Let X and Y be two sets. A function (or a mapping) f from the set X to the
set Y is a rule that assigns to each x in X a unique element in Y , denoted by f (x).
The notation
f : X → Y
is standard. The set X is called the domain of f and the set Y is called the
codomain of f . The set of all values taken by f , i.e. the set
{y ∈ Y | there exists x in X such that y = f (x)}
is called the range of f . The range of a function need not coincide with its codomain
Y.
There are several useful ways of visualising functions. A function can be thought
of as a machine that operates on elements of the set X and transforms an input
x into a unique output f (x). Note that the machine is not required to produce
different outputs from different inputs. This analogy helps to distinguish between
the function itself, f , and its particular value, f (x). The former is the machine,
the latter is the output2! One of the reasons for this confusion is that in practice,
to avoid being verbose, people often say things like ‘consider a function U (x, y) =
xα y β ’ instead of saying ‘consider a function defined for every pair (x, y) in R2 by
the equation U (x, y) = xα y β ’.
A function can also be thought of as a transformation, or a mapping, of the set
X into the set Y . In line with this interpretation is the common terminology: it is
said that f (x) is the image of x under the function f . Again, it is important to
remember that there may be points of Y which are the images of no point of X and
that there may be different points of X which have the same images in Y . What is
absolutely prohibited, however, is for a point from X to have several images in Y !
Part of the definition of a function is the specification of its domain. However,
in applications, functions are quite often defined by an algebraic formula, without
explicit specification of the domain. For example, a function may be defined as
f (x) = sin x + 145x2 .
The function f is then the rule that assigns the value sin x + 145x2 to each value of
x. The convention in such cases is that the domain of f is the set of all values of x
for which the formula gives a unique value. Thus, if you come, for instance, across
the function f (x) = 1/x you should assume that its domain is (−∞, 0) ∪ (0, ∞),
unless specified otherwise.
For any subset A of X, the subset f (A) of Y consisting of those y such that
y = f (x) for some x in A is called the image of A under f , that is,
f (A) = {y ∈ Y | there exists x in A such that y = f (x)}.
Thus, the range of f can be written as f (X). Similarly, one can define the
inverse image. For any subset B of Y , the inverse image f −1 (B) of B is the set of
x in X such that f (x) is in B, that is,
f −1 (B) = {x ∈ X | f (x) ∈ B}.
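For finite sets both definitions can be computed directly. A short Python sketch (the function x ↦ x² on a small domain is our own example):

```python
# Image and inverse image of a function on a finite domain.
X = {-2, -1, 0, 1, 2}
f = lambda x: x * x

def image(A):
    return {f(x) for x in A}

def inverse_image(B):
    return {x for x in X if f(x) in B}

assert image({-1, 1, 2}) == {1, 4}
assert inverse_image({1}) == {-1, 1}   # two points share one image
assert inverse_image({3}) == set()     # 3 is not in the range
```

Note that the inverse image is defined for every function, whether or not an inverse function exists.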
A function f is called a function onto Y (or surjection) if the range of f is Y ,
i.e., if for every y ∈ Y there is (at least) one x ∈ X such that y = f (x). In other
words, each element of Y is the image of (at least) one element of X. A function f is
called one-to-one (or injection) if f (x1 ) = f (x2 ) implies x1 = x2 , that is, for every
element y of f (X) there is a unique element x of X such that y = f (x). In other
words, a one-to-one function maps different elements of X into different elements of
Y . When a function f : X → Y is both onto and one-to-one it is called a bijection.
Exercise 18. Suppose that a set X has m elements and a set Y has n ≥ m
elements. How many different functions are there from X to Y ? from Y to X?
How many of them surjective? How many of them injective? How many of them
bijective?
2Mathematician Robert Bartle put it as follows: “Only a fool would confuse a sausage-grinder
with a sausage; however, enough people have confused functions with their values...”
Exercise 20. Prove that when a function f −1 exists it is both onto and
one-to-one and that the inverse of f −1 is the function f itself.
The set G ⊂ X × Y of ordered pairs (x, f (x)) is called the graph of the function
f 3. Of course, the fact that something is called a graph does not necessarily mean
that it can be drawn!
5. Spaces
Sets are reasonably interesting mathematical objects to study. But to make
them even more interesting (and useful for applications) sets are usually endowed
with some additional properties, or structures. These new objects are called spaces.
The structures are often modeled after the familiar properties of space we live in and
reflect (in axiomatic form) such notions as order, distance, addition, multiplication,
etc.
Probably one of the most intuitive spaces is the space of the real numbers, R.
We will briefly look at the axiomatic way of describing some of its properties.
Given the set of real numbers R, the operation of addition is the function
+ : R × R → R that maps any two elements x and y in R to an element denoted
by x + y and called the sum of x and y. The addition satisfies the following axioms
for all real numbers x, y, and z.
A1: x + y = y + x.
A2: (x + y) + z = x + (y + z).
A3: There exists an element, denoted by 0, such that x + 0 = x.
A4: For each x there exists an element, denoted by −x, such that x + (−x) = 0.
All the remaining properties of the addition can be proven using these axioms.
Note also that we can define another operation x − y as x + (−y) and call it
subtraction.
3Some people like the idea of the graph of a function so much that they define a function to
be its graph.
Exercise 22. Prove that the axioms for addition imply the following state-
ments.
(i) The element 0 is unique.
(ii) If x + y = x + z then y = z (a cancellation law).
(iii) −(−x) = x.
The operation of multiplication can be axiomatised in a similar way. Given the
set of real numbers, R, the operation of multiplication is the function · : R × R → R
that maps any two elements x and y in R to an element denoted by x · y and called
the product of x and y. The multiplication satisfies the following axioms for all real
numbers x, y, and z.
A5: x · y = y · x.
A6: (x · y) · z = x · (y · z).
A7: There exists an element, denoted by 1, such that x · 1 = x.
A8: For each x ≠ 0 there exists an element, denoted by x−1 , such that
x · x−1 = 1.
One more axiom (a distributive law) brings these two operations, addition and
multiplication4, together.
A9: x(y + z) = xy + xz for all x, y, and z in R.
Another structure possessed by the real numbers has to do with the fact that
the real numbers are ordered. The notion of x less than y can be axiomatised as
follows. For any two distinct elements x and y either x < y or y < x and, in
addition, if x < y and y < z then x < z.
Another example of a space (a very important and useful one) is n-dimensional
real space5. Given a natural number n, define Rn to be the set of all possible
ordered n-tuples of real numbers, with generic element denoted by x =
(x1 , . . . , xn ). Thus, the space Rn is the n-fold Cartesian product of the set R with
itself. The real numbers x1 , . . . , xn are called the coordinates of the vector x. Two
vectors x and y are equal if and only if x1 = y1 , . . . , xn = yn . The operation of
addition of two vectors is defined as
x + y = (x1 + y1 , . . . , xn + yn ).
Exercise 23. Prove that the addition of vectors in Rn satisfies the axioms of
addition.
The role of multiplication in this space is played by the operation of multiplication
by a real number, defined for all x in Rn and all α in R by
αx = (αx1 , . . . , αxn ).
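Both operations act coordinate by coordinate, so several of the axioms can be verified numerically on examples. A minimal Python sketch (vectors as tuples; the particular numbers are ours):

```python
# Coordinatewise addition and scalar multiplication in R^n,
# with vectors represented as tuples of floats.
def vadd(x, y):
    return tuple(xi + yi for xi, yi in zip(x, y))

def smul(alpha, x):
    return tuple(alpha * xi for xi in x)

x = (1.0, 2.0, 3.0)
y = (4.0, 5.0, 6.0)
zero = (0.0, 0.0, 0.0)

assert vadd(x, y) == (5.0, 7.0, 9.0)
assert vadd(x, y) == vadd(y, x)   # axiom A1 at one instance
assert vadd(x, zero) == x         # axiom A3 at one instance
# a distributive law at one instance:
assert smul(2.0, vadd(x, y)) == vadd(smul(2.0, x), smul(2.0, y))
```

Of course such checks only test the axioms at particular points; the exercises below ask for proofs valid for all vectors.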
Exercise 24. Prove that multiplication by a real number satisfies a distributive
law.
4From now on, to go easy on notation, we will follow the standard convention of not writing
the symbol for multiplication, that is, writing xy instead of x · y, etc.
5We haven’t defined what the word dimension means yet, so just treat it as a (fancy) name.
6. Metric Spaces and Continuous Functions
The set X together with the function d is called a metric space, elements of X
are usually called points, and the number d(x, y) is called the distance between x
and y. The last property of a metric is called the triangle inequality.
Exercise 25. Let X be a non-empty set and d : X × X → R be a function
that satisfies the following two properties for all x, y, and z in X.
(i) d(x, y) = 0 if and only if x = y,
(ii) d(x, y) ≤ d(x, z) + d(z, y).
Prove that d is a metric.
Exercise 26. Prove that d(x, y) + d(w, z) ≤ d(x, w) + d(x, z) + d(y, w) + d(y, z)
for all x, y, w, and z in X, where d is some metric on X.
An obvious example of a metric space is the set of real numbers, R, together
with the ‘usual’ distance, d(x, y) = |x − y|. Another example is the n-dimensional
Euclidean space Rn with metric
d(x, y) = √((x1 − y1 )² + · · · + (xn − yn )²).
Note that the same set can be endowed with different metrics, thus resulting
in different metric spaces! For example, the set of all n-tuples of real numbers
can be made into a metric space by use of the (non-Euclidean) metric
dT (x, y) = |x1 − y1 | + · · · + |xn − yn |,
which gives a metric space different from Euclidean Rn . This metric is sometimes
called the Manhattan (or taxicab) metric. Another curious metric is the so-called
French railroad metric, defined by
dF (x, y) = 0 if x = y, and dF (x, y) = d(x, P ) + d(y, P ) if x ≠ y,
where P is a particular point of Rn (called Paris) and the function d is the Euclidean
distance.
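The three metrics genuinely assign different distances to the same pair of points. A small Python sketch (ours; placing "Paris" at the origin is our arbitrary choice of P):

```python
from math import sqrt, isclose

# Three metrics on R^2: Euclidean, taxicab, and the French railroad
# metric with "Paris" placed at the origin (our choice of P).
P = (0.0, 0.0)

def d_euclid(x, y):
    return sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def d_taxi(x, y):
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def d_french(x, y):
    return 0.0 if x == y else d_euclid(x, P) + d_euclid(y, P)

a, b = (3.0, 0.0), (0.0, 4.0)
assert isclose(d_euclid(a, b), 5.0)
assert isclose(d_taxi(a, b), 7.0)
assert isclose(d_french(a, b), 7.0)  # via Paris: 3 from a, then 4 to b
# The triangle inequality holds for each (checked here at one triple):
c = (1.0, 1.0)
for d in (d_euclid, d_taxi, d_french):
    assert d(a, b) <= d(a, c) + d(c, b)
```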
Exercise 27. Prove that the French railroad metric dF is a metric.
Exercise 28. Let X be a non-empty set and d : X × X → R be the function
defined by
1 if x 6= y
d(x, y) =
0 if x = y
Prove that d is a metric. (This metric is called the discrete metric.)
Using the notion of a metric it is possible to generalise the idea of a continuous
function.
Suppose (X, dX ) and (Y, dY ) are metric spaces, x0 ∈ X, and f : X → Y is a
function. Then f is continuous at x0 if for every ε > 0 there exists a δ > 0 such
that
dY (f (x0 ), f (x)) < ε
for all points x ∈ X for which dX (x0 , x) < δ.
The function f is continuous on X if f is continuous at every point of X.
Let’s prove that the function f (x) = x is continuous on R using the above definition.
For all x0 ∈ R, we have |f (x0 ) − f (x)| = |x0 − x| < ε as long as |x0 − x| < δ = ε.
That is, given any ε > 0 we are always able to find a δ, namely δ = ε, such that
all points which are closer to x0 than δ will have images which are closer to f (x0 )
than ε.
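The δ = ε argument can be mimicked numerically. The following Python sketch is our own illustration (the grid of sample points is arbitrary):

```python
# For f(x) = x, the choice delta = epsilon works at every point x0:
# |f(x0) - f(x)| = |x0 - x| < epsilon whenever |x0 - x| < delta.
f = lambda x: x

def check_point(x0, eps, samples):
    delta = eps  # the delta suggested by the proof
    return all(abs(f(x0) - f(x)) < eps
               for x in samples
               if abs(x0 - x) < delta)

samples = [k / 100.0 for k in range(-500, 500)]
assert check_point(0.3, 0.01, samples)
assert check_point(-2.0, 0.5, samples)
```

Such a check can never replace the proof, since it inspects only finitely many points, but it is a useful way to test a conjectured δ.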
Why is this theorem important for us? Because many economic problems
are concerned with finding a maximal (or a minimal) value of a function on some set.
The Weierstrass theorem provides conditions under which such a search is meaningful!
This theorem and its implications will be much dwelt upon later in the notes, so
we just give here one example. The consumer utility maximisation problem is the
problem of finding the maximum of a utility function subject to the budget constraint.
According to the Weierstrass theorem, this problem has a solution if the utility function
is continuous and the budget set is compact.
Exercise 34. Show that if the sequence {xn } converges to x0 then it does not
converge to any other value different from x0 . Another way of saying this is that if
the sequence converges then its limit is unique.
We have now seen a number of examples of sequences. In some the sequence
“runs off to infinity;” in others it “bounces around;” while in others it converges to
a limit. Could a sequence do anything else? Could a sequence, for example, settle
down, each element getting closer and closer to all future elements in the sequence,
but not converge to any particular limit? In fact, depending on what the space
X is, this is indeed possible.
First let us recall the notion of a rational number. A rational number is a
number that can be expressed as the ratio of two integers, that is r is rational if
r = a/b with a and b integers and b 6= 0. We usually denote the set of all rational
numbers Q (since we have already used R for the real numbers). We now consider
an example in which the underlying space X is Q. Consider the sequence of
rational numbers defined in the following way:
x1 = 1,
xn+1 = (xn + 2)/(xn + 1).
This kind of definition is called a recursive definition. Rather than writing xn
directly as a function of n, we specify x1 and then give xn+1 as a function of xn .
We can obviously find any element of the sequence that we need, as long as we
sequentially calculate each element before it. In our case we’d have
x1 = 1
x2 = (1 + 2)/(1 + 1) = 3/2 = 1.5
x3 = (3/2 + 2)/(3/2 + 1) = 7/5 = 1.4
x4 = (7/5 + 2)/(7/5 + 1) = 17/12 ≈ 1.416667
x5 = (17/12 + 2)/(17/12 + 1) = 41/29 ≈ 1.413793
x6 = (41/29 + 2)/(41/29 + 1) = 99/70 ≈ 1.414286
...
We see that the sequence goes up and down but that it seems to be “converging.”
What is it converging to? Let’s suppose that it’s converging to some value x0 .
Recall that
xn+1 = (xn + 2)/(xn + 1).
We’ll see later that if f is a continuous function then limn→∞ f (xn ) = f (limn→∞ xn ).
In this case that means that
x0 = limn→∞ xn+1 = limn→∞ (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).
Thus we have
x0 = (x0 + 2)/(x0 + 1)
and if we solve this we obtain x0 = ±√2. Clearly if xn > 0 then xn+1 > 0, so
our sequence can’t be converging to −√2, so we must have x0 = √2. But √2 is
not in Q. Thus we have a sequence of elements in Q that are getting very close to
each other but are not converging to any element of Q. (Of course the sequence is
converging to a point in R. In fact one construction of the real number system is
in terms of such sequences in Q.)
Definition 8. Let {xn } be a sequence of points in (X, d). We say that the
sequence is a Cauchy sequence if for any ε > 0 there is N ∈ N such that if n, m > N
then d(xn , xm ) < ε.
Exercise 35. Show that if {xn } converges then {xn } is a Cauchy sequence.
A metric space (X, d) in which every Cauchy sequence converges to a limit in
X is called a complete metric space. The space of real numbers R is a complete
metric space, while the space of rationals Q is not.
Exercise 36. Is N, the space of natural or counting numbers with metric d
given by d(x, y) = |x − y|, a complete metric space?
In Section 6 we defined the notion of a function being continuous at a point.
It is possible to give that definition in terms of sequences.
Definition 9. Suppose (X, dX ) and (Y, dY ) are metric spaces, x0 ∈ X, and
f : X → Y is a function. Then f is continuous at x0 if for every sequence {xn } that
converges to x0 in (X, dX ) the sequence {f (xn )} converges to f (x0 ) in (Y, dY ).
Exercise 37. Show that the function f (x) = (x + 2)/(x + 1) is continuous at
any point x ≠ −1. Show that this means that if xn → x0 as n → ∞ then
limn→∞ (xn + 2)/(xn + 1) = (x0 + 2)/(x0 + 1).
We can also define the concept of a closed set (and hence the concepts of open
sets and compact sets) in terms of sequences.
Definition 10. Let (X, d) be a metric space. A set S ⊂ X is closed if for any
convergent sequence {xn } with xn ∈ S for all n we have limn→∞ xn ∈ S. A set is
open if its complement is closed.
Given a sequence {xn } we can define a new sequence by taking only some of
the elements of the original sequence. In the example we considered earlier in which
xn was 1 if n was odd and 0 if n was even we could take only the odd n and thus
obtain a sequence that did converge. The new sequence is called a subsequence of
the old sequence.
Definition 11. Let {xn } be some sequence in (X, d). Let {nj }∞j=1 be a
sequence of natural numbers such that for each j we have nj < nj+1 , that is
n1 < n2 < n3 < . . . . The sequence {xnj }∞j=1 is called a subsequence of the original
sequence.
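The alternating example mentioned above can be written out concretely. A short Python sketch (ours) picks out the subsequence along the odd indices:

```python
# The sequence 1, 0, 1, 0, ... does not converge, but the subsequence
# along the odd indices n_j = 2j - 1 is constant and so converges to 1.
def x(n):
    return 1 if n % 2 == 1 else 0   # x_n for n = 1, 2, 3, ...

nj = [2 * j - 1 for j in range(1, 11)]   # n_1 < n_2 < ... < n_10
subseq = [x(n) for n in nj]

assert subseq == [1] * 10
```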
The notion of a subsequence is often useful. We often use it in the way that
we briefly referred to above. We initially have a sequence that may not converge,
but we are able to take a subsequence that does converge. Such a subsequence is
called a convergent subsequence.
Definition 12. A subset of a metric space with the property that every se-
quence in the subset has a convergent subsequence is called sequentially compact.
Theorem 3. In any metric space any compact set is sequentially compact.
Exercise 39. Consider a sequence {xn } in R. What can you say about the
sequence if it converges and, for each n, xn is an integer?
Exercise 40. Consider the sequence
1/2, 1/3, 2/3, 1/4, 2/4, 3/4, 1/5, 2/5, 3/5, 4/5, 1/6, . . . .
For which values z ∈ R is there a subsequence converging to z?
Exercise 41. Prove that if a subsequence of a Cauchy sequence converges to
a limit z then so does the original Cauchy sequence.
Exercise 42. Prove that any subsequence of a convergent sequence converges.
Finally one somewhat less trivial exercise.
Exercise 43. Prove that if limn→∞ xn = z then
limn→∞ (x1 + · · · + xn )/n = z.
9. Linear Spaces
The notion of a linear space is an axiomatic way of looking at the familiar linear
operations: addition and multiplication. A trivial example of a linear space is the
set of real numbers, R.
What is the operation of addition? One way of answering the question is to
say that the operation of addition is just the list of its properties. So, we will
define the addition of elements from some set X as the operation that satisfies the
following four axioms.
A1: x + y = y + x for all x and y in X.
A2: x + (y + z) = (x + y) + z, for all x, y, and z in X.
A3: There exists an element, denoted by 0, such that x + 0 = x for all x in
X.
A4: For every x in X there exists an element y in X, called the inverse of x,
such that x + y = 0.
And, to make things more interesting, we will also introduce the operation of
‘multiplication by a number’ by adding two more axioms.
A5: 1x = x for all x in X.
A6: α(βx) = (αβ)x for all x in X and for all α and β in R.
CHAPTER 2

Linear Algebra
1. The Space Rn
In the previous chapter we introduced the concept of a linear space or a vector
space. We shall now examine in some detail one example of such a space. This is
the space of all ordered n-tuples (x1 , x2 , . . . , xn ) where each xi is a real number.
We call this space n-dimensional real space and denote it Rn .
Remember from the previous chapter that to define a vector space we not only
need to define the points in that space but also to define how we add such points
and how we multiply such points by scalars. In the case of Rn we do this element
by element in the n-tuple or vector. That is,
(x1 , x2 , . . . , xn ) + (y1 , y2 , . . . , yn ) = (x1 + y1 , x2 + y2 , . . . , xn + yn )
and
α(x1 , x2 , . . . , xn ) = (αx1 , αx2 , . . . , αxn ).
Let us consider the case that n = 2, that is, the case of R2 . In this case we can
visualise the space as in the following diagram. The vector (x1 , x2 ) is represented
by the point that is x1 units along from the point (0, 0) in the horizontal direction
and x2 units up from (0, 0) in the vertical direction.
[Figure 1: the vector (1, 2) plotted in the (x1 , x2 ) plane.]
Let us for the moment continue our discussion in R2 . Notice that we are
implicitly writing a vector (x1 , x2 ) as a sum x1 × v 1 + x2 × v 2 where v 1 is the
unit vector in the first direction and v 2 is the unit vector in the second direction.
Suppose that instead we considered the vectors u1 = (2, 1) = 2 × v 1 + 1 × v 2 and
In the following section we shall introduce the notion of a matrix and define
various operations on matrices. If you are like me when I first came across matrices,
these definitions may seem somewhat arbitrary and mysterious. However, we shall
see that matrices may be viewed as representations of linear functions and that when
viewed in this way the operations we define on matrices are completely natural.
So far the definitions of matrix operations have all seemed the most natural
ones. We now come to defining matrix multiplication. Perhaps here the definition
seems somewhat less natural. However in the next section we shall see that the defi-
nition we shall give is in fact very natural when we view matrices as representations
of linear functions.
We define matrix multiplication of A times B written as AB where A is an
m × n matrix and B is a p × q matrix only when n = p. In this case the product
AB is defined to be an m × q matrix in which the element in the ith row and jth
column is ai1 b1j + ai2 b2j + · · · + ain bnj . That is, to find the term to go in the ith
row and the jth
column of the product matrix AB we take the ith row of the matrix A which will
be a row vector with n elements and the jth column of the matrix B which will be
a column vector with n elements. We then multiply each element of the first vector
by the corresponding element of the second and add all these products. Thus
| a11 . . . a1n |   | b11 . . . b1q |   | ∑k a1k bk1 . . . ∑k a1k bkq |
|  .   . .   .  | × |  .   . .   .  | = |      .      . . .     .     |
| am1 . . . amn |   | bn1 . . . bnq |   | ∑k amk bk1 . . . ∑k amk bkq |
where each sum runs over k = 1, . . . , n.
For example
              | p q |
| a b c |  ×  | r s |  =  | ap + br + ct   aq + bs + cv |
| d e f |     | t v |     | dp + er + ft   dq + es + fv |.
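The defining formula translates directly into code. A minimal Python sketch (ours; the numerical matrices are our own example, not from the text):

```python
# Matrix multiplication from the definition: the (i, j) entry of AB is
# a_i1*b_1j + ... + a_in*b_nj, defined only when A is m x n and B is n x q.
def matmul(A, B):
    n = len(B)  # rows of B must equal columns of A
    assert all(len(row) == n for row in A)
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# A numerical instance of a 2x3 times 3x2 product:
A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]
assert matmul(A, B) == [[58, 64], [139, 154]]
```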
We define the identity matrix of order n to be the n × n matrix that has 1’s on
its main diagonal and zeros elsewhere that is, whose ijth element is 1 if i = j and
zero if i 6= j. We denote this matrix by In or, if the order is clear from the context,
simply I. That is,
    | 1 0 . . . 0 |
I = | 0 1 . . . 0 |
    | . .  . .  . |
    | 0 0 . . . 1 |
But, what is f (ei )? Remember that ei is a unit vector in Rn and that f maps
vectors in Rn to vectors in Rm . Thus f (ei ) is the image in Rm of the vector ei . Let
us write f (ei ) as the column vector
(a1i , a2i , . . . , ami ).
Thus
f (x) = x1 f (e1 ) + x2 f (e2 ) + · · · + xn f (en )
     = (a11 x1 + · · · + a1n xn , a21 x1 + · · · + a2n xn , . . . , am1 x1 + · · · + amn xn )
and this is exactly what we would have obtained had we multiplied the matrix
| a11 a12 . . . a1n |
| a21 a22 . . . a2n |
|  .   .   . .   .  |
| am1 am2 . . . amn |
by the column vector (x1 , x2 , . . . , xn ).
Thus we have not only shown that a linear function is necessarily represented by
multiplication by a matrix we have also shown how to find the appropriate matrix.
It is precisely the matrix whose n columns are the images under the function of the
n unit vectors in Rn .
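This recipe for finding the matrix can be carried out mechanically. A Python sketch (ours; the particular linear map is a made-up example, not from the text):

```python
# Build the matrix of a linear function f : R^n -> R^m by taking
# column i of the matrix to be the image f(e_i) of the ith unit vector.
def matrix_of(f, n):
    cols = [f(tuple(1 if k == i else 0 for k in range(n))) for i in range(n)]
    m = len(cols[0])
    return [[cols[i][r] for i in range(n)] for r in range(m)]

# A hypothetical linear map R^2 -> R^2 (our example):
f = lambda x: (x[0] + 2 * x[1], 3 * x[0])

A = matrix_of(f, 2)
assert A == [[1, 2], [3, 0]]   # columns are f(e1) = (1, 3), f(e2) = (2, 0)

# Multiplying the matrix by a vector reproduces f:
x = (5, 7)
Ax = tuple(sum(A[r][i] * x[i] for i in range(2)) for r in range(2))
assert Ax == f(x)
```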
Exercise 46. Find the matrices that represent the following linear functions
from R2 to R2 .
Exercise 47. Prove that the matrices (A + B) and αA defined in the previous
paragraph coincide with the matrices defined in Section 3.
We can also see that the definition we gave of matrix multiplication is precisely
the right definition if we take the multiplication of matrices to correspond to the
composition of the linear functions that the matrices represent. To be more precise
let f : Rn → Rm
and g : Rm → Rk be linear functions and let A and B be the m × n and k × m
matrices that represent them. Let (g ◦ f ) : Rn → Rk be the composite function
defined in Section 2. Now let us define the product BA to be that matrix that
represents the linear function (g ◦ f ).
Now since the matrix A represents the function f and B represents g we have
(g ◦ f )(x) = g(f (x)) = g(y),
where y = f (x) is the vector in Rm with coordinates
yj = aj1 x1 + · · · + ajn xn ,    j = 1, . . . , m.
The lth coordinate of g(y) is then
bl1 y1 + · · · + blm ym = bl1 (a11 x1 + · · · + a1n xn ) + · · · + blm (am1 x1 + · · · + amn xn ),
and collecting the terms in each xi shows that the coefficient of xi in the lth
coordinate of (g ◦ f )(x) is
bl1 a1i + bl2 a2i + · · · + blm ami .
And this last is the product of the matrix we defined in Section 3 to be BA with
the column vector x. As we have claimed the definition of matrix multiplication
we gave in Section 3 was not arbitrary but rather was forced on us by our decision
to regard the multiplication of two matrices as corresponding to the composition
of the linear functions the matrices represented.
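The derivation above lends itself to a numerical check. The sketch below uses made-up matrices A (2 × 3, representing f : R3 → R2) and B (2 × 2, representing g : R2 → R2) and confirms that multiplying by the product BA gives the same vector as applying g after f.

```python
def matvec(M, x):
    """Multiply matrix M by column vector x."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def matmul(B, A):
    # (BA)_li = sum_j b_lj a_ji, exactly the formula derived above
    return [[sum(B[l][j] * A[j][i] for j in range(len(A)))
             for i in range(len(A[0]))] for l in range(len(B))]

A = [[1, 0, 2],
     [3, -1, 1]]      # represents f : R^3 -> R^2
B = [[2, 1],
     [0, 4]]          # represents g : R^2 -> R^2

x = [1, 2, 3]
composed = matvec(B, matvec(A, x))   # g(f(x))
direct = matvec(matmul(B, A), x)     # (BA)x
assert composed == direct
```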
Recall that the columns of the matrix A that represented the linear function
f : Rn → Rm were precisely the images of the unit vectors in Rn under f . The
linearity of f means that the image of any point in Rn is in the span of the images
of these unit vectors and similarly that any point in the span of the images is the
image of some point in Rn . Thus Im(f ) is equal to the span of the columns of
A. Now, the dimension of the span of the columns of A is equal to the maximum
number of linearly independent columns in A, that is, to the rank of A.
7. Changes of Basis
We have until now implicitly assumed that there is no ambiguity when we
speak of the vector (x1 , x2 , . . . , xn ). Sometimes there may indeed be an obvious
meaning to such a vector. However when we define a linear space all that are really
specified are “what straight lines are” and “where zero is.” In particular, we do
not necessarily have defined in an unambiguous way “where the axes are” or “what
a unit length along each axis is.” In other words we may not have a set of basis
vectors specified.
Even when we do have, or have decided on, a set of basis vectors we may wish
to redefine our description of the linear space with which we are dealing so as to
use a different set of basis vectors. Let us suppose that we have an n-dimensional
space, even Rn say, with a given set of basis vectors v 1 , v 2 , . . . , v n and that we
wish instead to describe the space in terms of the linearly independent vectors
b1 , b2 , . . . , bn where
bi = b1i v 1 + b2i v 2 + · · · + bni v n .
Now, if we had the description of a point in terms of the new coordinate vectors,
e.g., as
z1 b1 + z2 b2 + · · · + zn bn
then we can easily convert this to a description in terms of the original basis vectors.
We would simply substitute the formula for bi in terms of the v j ’s into the previous
formula giving
\[ \left( \sum_{i=1}^{n} b_{1i} z_i \right) v_1 + \left( \sum_{i=1}^{n} b_{2i} z_i \right) v_2 + \cdots + \left( \sum_{i=1}^{n} b_{ni} z_i \right) v_n \]
That is, if we are given an n-tuple of real numbers that describe a vector in terms
of the new basis vectors b1 , b2 , . . . , bn and we wish to find the n-tuple that describes
the vector in terms of the original basis vectors we simply multiply the n-tuple we
are given, written as a column vector, by the matrix whose columns are the new
basis vectors b1 , b2 , . . . , bn . We shall call this matrix B. We see among other things
that changing the basis is a linear operation.
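A small numerical sketch of this conversion, using a made-up pair of basis vectors in R2: multiplying by B converts new coordinates to original ones, and multiplying by B−1 converts back.

```python
# Columns of B are the new basis vectors (a made-up example).
b1, b2 = [1, 1], [1, -1]
B = [[b1[0], b2[0]],
     [b1[1], b2[1]]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def inv2(M):
    """Inverse of a 2x2 matrix via its determinant."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

z = [3, 2]                      # coordinates relative to b1, b2
x = matvec(B, z)                # original coordinates: 3*b1 + 2*b2
assert x == [5, 1]
# Going back the other way multiplies by the inverse of B:
assert matvec(inv2(B), x) == [3.0, 2.0]
```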
Now, if we were given the information in terms of the original basis vectors
and wanted to write it in terms of the new basis vectors what should we do? Since
we don’t have the original basis vectors written in terms of the new basis vectors
this is not immediately obvious. However we do know that if we were to do it and
then were to carry out the operation described in the previous paragraph we would
be back with what we started. Further we know that the operation is a linear
operation that maps n-tuples to n-tuples and so is represented by multiplication
by an n × n matrix. That is we multiply the n-tuple written as a column vector by
the matrix that when multiplied by B gives the identity matrix, that is, the matrix
B −1 . If we are given a vector of the form
x1 v 1 + x2 v 2 + · · · + xn v n
Properties of a square matrix that depend only on the linear function that the
matrix represents and not on the particular choice of basis vectors for the linear
space are called invariant properties. We have already seen one example of an
invariant property, the rank of a matrix. The rank of a matrix is equal to the
dimension of the image space of the function that the matrix represents which
clearly depends only on the function and not on the choice of basis vectors for the
linear space.
The idea of a property being invariant can be expressed also in terms only of
matrices without reference to the idea of linear functions. A property is invariant
if whenever an n × n matrix A has the property then for any nonsingular n × n
matrix B the matrix B −1 AB also has the property. We might think of rank as a
function that associates to any square matrix a nonnegative integer. We shall say
that such a function is an invariant if the property of having the function take a
particular value is invariant for all particular values we may choose.
Two particularly important invariants are the trace of a square matrix and the
determinant of a square matrix. We examine these in more detail in the following
section.
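Before examining them formally, a quick numerical check of invariance on made-up 2 × 2 matrices: the trace and determinant of B−1AB agree with those of A for a nonsingular B.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace(M):
    return M[0][0] + M[1][1]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[1, 2], [0, 3]]
B = [[2, 1], [1, 1]]            # det B = 1, so B is nonsingular
C = matmul(inv2(B), matmul(A, B))   # the matrix B^{-1} A B

assert abs(trace(C) - trace(A)) < 1e-12   # both equal 4
assert abs(det2(C) - det2(A)) < 1e-12     # both equal 3
```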
The determinant, unlike the trace, is not a linear function of the matrix. It does
however have some linear structure. If we fix all columns of the matrix except one
and look at the determinant as a function of only this column then the determinant
is linear in this single column. Moreover this is true whatever the column we choose.
Let us write the determinant of the n × n matrix A as det(A). Let us also write
the matrix A as [a1 , a2 , . . . , an ] where ai is the ith column of the matrix A. Thus
our claim is that for all n × n matrices A, for all i = 1, 2, . . . , n, for all n-vectors b,
and for all α ∈ R
\[ \det([a_1, \dots, a_{i-1}, a_i + b, a_{i+1}, \dots, a_n]) = \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]) + \det([a_1, \dots, a_{i-1}, b, a_{i+1}, \dots, a_n]) \tag{3} \]
and
\[ \det([a_1, \dots, a_{i-1}, \alpha a_i, a_{i+1}, \dots, a_n]) = \alpha \det([a_1, \dots, a_{i-1}, a_i, a_{i+1}, \dots, a_n]). \]
Property 6. If one expands a matrix in terms of one row (or column) and
the cofactors of a different row (or column) then the answer is always zero. That is
\[ \sum_{j=1}^{n} a_{ij} |C_{kj}| = 0 \]
whenever i ≠ k. Also
\[ \sum_{i=1}^{n} a_{ij} |C_{ik}| = 0 \]
whenever j ≠ k.
Exercise 67. Verify Property 6 for the matrix
\[ \begin{pmatrix} 4 & 1 & 2 \\ 5 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix}. \]
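Property 6 can be verified mechanically for the matrix of Exercise 67. The sketch below expands along the first row using the cofactors of the third row (an "alien" expansion, which must give zero) and then along the first row with its own cofactors (which gives the determinant).

```python
# The matrix of Exercise 67.
A = [[4, 1, 2],
     [5, 2, 1],
     [1, 0, 3]]

def cofactor(M, i, j):
    """Cofactor of the (i, j) entry of a 3x3 matrix (0-based indices)."""
    minor = [[M[r][c] for c in range(3) if c != j] for r in range(3) if r != i]
    return (-1) ** (i + j) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])

# Entries of row 1 paired with the cofactors of row 3: always zero.
alien = sum(A[0][j] * cofactor(A, 2, j) for j in range(3))
assert alien == 0

# The proper expansion along row 1 gives the determinant.
det = sum(A[0][j] * cofactor(A, 0, j) for j in range(3))
assert det == 6
```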
Let us define the matrix of cofactors C to be the matrix [|Cij |] whose ijth
element is the cofactor of the ijth element of A. Now we define the adjoint matrix
of A to be the transpose of the matrix of cofactors of A. That is
adj(A) = C ′.
It is straightforward to see (using Property 6) that A adj(A) = |A|In = adj(A)A.
That is, A−1 = (1/|A|) adj(A). Notice that this is well defined if and only if
|A| ≠ 0.
We now have a method of finding the inverse of any nonsingular square matrix.
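The adjoint method can be sketched in code. The example below inverts matrix (a) of Exercise 68 and checks that A times adj(A)/|A| is the identity, using exact rational arithmetic.

```python
from fractions import Fraction

# Matrix (a) of Exercise 68.
A = [[3, -1, 2],
     [1, 0, 3],
     [4, 0, 2]]

def cofactor(M, i, j):
    minor = [[M[r][c] for c in range(3) if c != j] for r in range(3) if r != i]
    return (-1) ** (i + j) * (minor[0][0] * minor[1][1] - minor[0][1] * minor[1][0])

detA = sum(A[0][j] * cofactor(A, 0, j) for j in range(3))
# The adjoint is the transpose of the matrix of cofactors.
adj = [[cofactor(A, j, i) for j in range(3)] for i in range(3)]
Ainv = [[Fraction(adj[i][j], detA) for j in range(3)] for i in range(3)]

# Check that A * Ainv is the identity matrix.
prod = [[sum(A[i][k] * Ainv[k][j] for k in range(3)) for j in range(3)]
        for i in range(3)]
assert prod == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```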
Exercise 68. Use this method to find the inverses of the following matrices
\[ \text{(a)} \begin{pmatrix} 3 & -1 & 2 \\ 1 & 0 & 3 \\ 4 & 0 & 2 \end{pmatrix} \qquad \text{(b)} \begin{pmatrix} 4 & -2 & 1 \\ 7 & 3 & 3 \\ 2 & 0 & 1 \end{pmatrix} \qquad \text{(c)} \begin{pmatrix} 1 & 5 & 2 \\ 1 & 4 & 3 \\ 0 & 1 & 2 \end{pmatrix}. \]
Knowing how to invert matrices we thus know how to solve a system of n linear
equations in n unknowns. For we can express the n equations in matrix notation as
Ax = b where A is an n × n matrix of coefficients, x is an n × 1 vector of unknowns,
and b is an n × 1 vector of constants. Thus we can solve the system of equations
as x = A−1 Ax = A−1 b.
Sometimes, particularly if we are not interested in all of the x’s, it is convenient
to use another method of solving the equations. This method is known as Cramer’s
Rule. Let us suppose that we wish to solve the above system of equations, that is,
Ax = b. Let us define the matrix Ai to be the matrix obtained from A by replacing
the ith column of A by the vector b. Then the solution is given by
\[ x_i = \frac{|A_i|}{|A|}. \]
Exercise 69. Derive Cramer’s Rule. [Hint: We know that the solution to the
system of equations is solved by x = (1/|A|)adj(A)b. This gives a formula for xi .
Show that this formula is the same as that given by xi = |Ai |/|A|.]
Exercise 70. Solve the following system of equations (i) by matrix inversion
and (ii) by Cramer’s Rule
(a)
\begin{align*} 2x_1 - x_2 &= 2 \\ 3x_2 + 2x_3 &= 16 \\ 5x_1 + 3x_3 &= 21 \end{align*}
(b)
\begin{align*} -x_1 + x_2 + x_3 &= 1 \\ x_1 - x_2 + x_3 &= 1 \\ x_1 + x_2 + x_3 &= 1. \end{align*}
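A sketch of Cramer's Rule applied to system (a): each xi is the determinant of A with its ith column replaced by b, divided by the determinant of A.

```python
# System (a): 2x1 - x2 = 2, 3x2 + 2x3 = 16, 5x1 + 3x3 = 21.
A = [[2, -1, 0],
     [0, 3, 2],
     [5, 0, 3]]
b = [2, 16, 21]

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def cramer(A, b):
    d = det3(A)
    xs = []
    for i in range(3):
        Ai = [row[:] for row in A]
        for r in range(3):
            Ai[r][i] = b[r]          # replace column i of A by b
        xs.append(det3(Ai) / d)
    return xs

assert cramer(A, b) == [3.0, 4.0, 2.0]
```

Substituting back confirms the solution: 2·3 − 4 = 2, 3·4 + 2·2 = 16, and 5·3 + 3·2 = 21.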
Exercise 71. Recall that we claimed that the determinant was an invariant.
Confirm this by calculating (directly) det(A) and det(B −1 AB) where
\[ B = \begin{pmatrix} 1 & 0 & 1 \\ 1 & -1 & 2 \\ 2 & 1 & -1 \end{pmatrix} \quad\text{and}\quad A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}. \]
Exercise 72. An nth order determinant of the form
\[ \begin{vmatrix} a_{11} & 0 & 0 & \dots & 0 \\ a_{21} & a_{22} & 0 & \dots & 0 \\ a_{31} & a_{32} & a_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \dots & a_{nn} \end{vmatrix} \]
is called triangular. Evaluate this determinant. [Hint: Expand the determinant in
terms of its first row. Expand the resulting (n − 1) × (n − 1) determinant in terms
of its first row, and so on.]
1. Constrained Maximisation
1.1. Lagrange Multipliers. Consider the problem of a consumer who seeks
to distribute his income across the purchase of the two goods that he consumes,
subject to the constraint that he spends no more than his total income. Let us
denote the amount of the first good that he buys x1 and the amount of the second
good x2 , the prices of the two goods p1 and p2 , and the consumer’s income y.
The utility that the consumer obtains from consuming x1 units of good 1 and x2
units of good 2 is denoted u(x1 , x2 ). Thus the consumer’s problem is to maximise
u(x1 , x2 ) subject to the constraint that p1 x1 + p2 x2 ≤ y. (We shall soon write
p1 x1 + p2 x2 = y, i.e., we shall assume that the consumer must spend all of his
income.) Before discussing the solution of this problem let us write it in a more
“mathematical” way.
\[ \max_{x_1, x_2} u(x_1, x_2) \quad \text{subject to} \quad p_1 x_1 + p_2 x_2 = y \tag{5} \]
We read this “Choose x1 and x2 to maximise u(x1 , x2 ) subject to the constraint
that p1 x1 + p2 x2 = y.”
Let us assume, as usual, that the indifference curves (i.e., the sets of points
(x1 , x2 ) for which u(x1 , x2 ) is a constant) are convex to the origin. Let us also
assume that the indifference curves are nice and smooth. Then the point (x∗1 , x∗2 )
that solves the maximisation problem (5) is the point at which the indifference
curve is tangent to the budget line as shown in Figure 1.
One thing we can say about the solution is that at the point (x∗1 , x∗2 ) it must be
true that the marginal utility with respect to good 1 divided by the price of good 1
must equal the marginal utility with respect to good 2 divided by the price of good
2. For if this were not true then the consumer could, by decreasing the consumption
of the good for which this ratio was lower and increasing the consumption of the
other good, increase his utility. Marginal utilities are, of course, just the partial
derivatives of the utility function. Thus we have
\[ \frac{\frac{\partial u}{\partial x_1}(x_1^*, x_2^*)}{p_1} = \frac{\frac{\partial u}{\partial x_2}(x_1^*, x_2^*)}{p_2}. \tag{6} \]
The argument we have just made seems very “economic.” It is easy to give an
alternate argument that does not explicitly refer to the economic intuition. Let xu2
be the function that defines the indifference curve through the point (x∗1 , x∗2 ), i.e.,
u(x1 , xu2 (x1 )) ≡ ū ≡ u(x∗1 , x∗2 ).
Now, totally differentiating this identity gives
\[ \frac{\partial u}{\partial x_1}(x_1, x_2^u(x_1)) + \frac{\partial u}{\partial x_2}(x_1, x_2^u(x_1)) \, \frac{d x_2^u}{d x_1}(x_1) = 0. \]
[Figure 1 here: the budget line p1 x1 + p2 x2 = y and the indifference curve u(x1 , x2 ) = ū, tangent at the point (x∗1 , x∗2 ).]

Figure 1
That is,
\[ \frac{d x_2^u}{d x_1}(x_1) = - \frac{\frac{\partial u}{\partial x_1}(x_1, x_2^u(x_1))}{\frac{\partial u}{\partial x_2}(x_1, x_2^u(x_1))}. \]
Now xu2 (x∗1 ) = x∗2 . Thus the slope of the indifference curve at the point (x∗1 , x∗2 ) is
\[ \frac{d x_2^u}{d x_1}(x_1^*) = - \frac{\frac{\partial u}{\partial x_1}(x_1^*, x_2^*)}{\frac{\partial u}{\partial x_2}(x_1^*, x_2^*)}. \]
Also, the slope of the budget line is −p1 /p2 . Combining these two results again gives
result (6).
Since we also have another equation that (x∗1 , x∗2 ) must satisfy, viz
(7) p1 x∗1 + p2 x∗2 = y
we have two equations in two unknowns and we can (if we know what the utility
function is and what p1 , p2 , and y are) go happily away and solve the problem.
(This isn’t quite true but we shall not go into that at this point.) What we shall
develop is a systematic and useful way to obtain the conditions (6) and (7). Let us
first denote the common value of the ratios in (6) by λ. That is,
\[ \frac{\frac{\partial u}{\partial x_1}(x_1^*, x_2^*)}{p_1} = \lambda = \frac{\frac{\partial u}{\partial x_2}(x_1^*, x_2^*)}{p_2} \]
and we can rewrite this and (7) as
\begin{align} \frac{\partial u}{\partial x_1}(x_1^*, x_2^*) - \lambda p_1 &= 0 \nonumber \\ \frac{\partial u}{\partial x_2}(x_1^*, x_2^*) - \lambda p_2 &= 0 \tag{8} \\ y - p_1 x_1^* - p_2 x_2^* &= 0. \nonumber \end{align}
Now we have three equations in x∗1 , x∗2 , and the new artificial or auxiliary variable
λ. Again we can, perhaps, solve these equations for x∗1 , x∗2 , and λ. Consider the
following function
(9) L(x1 , x2 , λ) = u(x1 , x2 ) + λ(y − p1 x1 − p2 x2 )
This function is known as the Lagrangian. Now, if we calculate ∂L/∂x1 , ∂L/∂x2 ,
and ∂L/∂λ, and set the results equal to zero we obtain exactly the equations given
in (8). We
now describe this technique in a somewhat more general way.
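As an illustration, for the assumed, purely illustrative utility u(x1, x2) = √(x1 x2) the solution is known to be x1 = y/(2p1) and x2 = y/(2p2); the sketch below checks that it satisfies the three equations (8), with λ the common ratio from (6).

```python
from math import sqrt

# Assumed example: u(x1, x2) = sqrt(x1 * x2), with made-up prices and income.
p1, p2, y = 2.0, 3.0, 12.0
x1, x2 = y / (2 * p1), y / (2 * p2)   # the known Cobb-Douglas demands

u1 = 0.5 * sqrt(x2 / x1)     # marginal utility du/dx1
u2 = 0.5 * sqrt(x1 / x2)     # marginal utility du/dx2
lam = u1 / p1                # the multiplier, from the first ratio in (6)

# The three equations (8) all hold at the solution:
assert abs(u1 - lam * p1) < 1e-12
assert abs(u2 - lam * p2) < 1e-12
assert abs(y - p1 * x1 - p2 * x2) < 1e-12
```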
Suppose that we have the following maximisation problem
\[ \max_{x_1, \dots, x_n} f(x_1, \dots, x_n) \quad \text{subject to} \quad g(x_1, \dots, x_n) = c \tag{10} \]
and we let
\[ L(x_1, \dots, x_n, \lambda) = f(x_1, \dots, x_n) + \lambda (c - g(x_1, \dots, x_n)) \tag{11} \]
then if (x∗1 , . . . , x∗n ) solves (10) there is a value of λ, say λ∗ , such that
\[ \frac{\partial L}{\partial x_i}(x_1^*, \dots, x_n^*, \lambda^*) = 0 \qquad i = 1, \dots, n \tag{12} \]
\[ \frac{\partial L}{\partial \lambda}(x_1^*, \dots, x_n^*, \lambda^*) = 0. \tag{13} \]
Notice that the conditions (12) are precisely the first order conditions for
choosing x1 , . . . , xn to maximise L, once λ∗ has been chosen. This provides an
intuition into this method of solving the constrained maximisation problem. In
the constrained problem we have told the decision maker that he must satisfy
g(x1 , . . . , xn ) = c and that he should choose among all points that satisfy this con-
straint the point at which f (x1 , . . . , xn ) is greatest. We arrive at the same answer
if we tell the decision maker to choose any point he wishes but that for each unit by
which he violates the constraint g(x1 , . . . , xn ) = c we shall take away λ units from
his payoff. Of course we must be careful to choose λ to be the correct value. If we
choose λ too small the decision maker may choose to violate his constraint—e.g.,
if we made the penalty for spending more than the consumer’s income very small
the consumer would choose to consume more goods than he could afford and to
pay the penalty in utility terms. On the other hand if we choose λ too large the
decision maker may violate his constraint in the other direction, e.g., the consumer
would choose not to spend any of his income and just receive λ units of utility for
each unit of his income.
It is possible to give a more general statement of this technique, allowing for
multiple constraints. (Of course, we should always have fewer constraints than we
have variables.) Suppose we have more than one constraint. Consider the problem
\[ \max_{x_1, \dots, x_n} f(x_1, \dots, x_n) \quad \text{subject to} \quad g_j(x_1, \dots, x_n) = c_j, \qquad j = 1, \dots, m, \tag{14} \]
let
\[ L(x_1, \dots, x_n, \lambda_1, \dots, \lambda_m) = f(x_1, \dots, x_n) + \sum_{j=1}^{m} \lambda_j (c_j - g_j(x_1, \dots, x_n)), \]
and again if (x∗1 , . . . , x∗n ) solves (14) there are values of λ, say λ∗1 , . . . , λ∗m , such that
\[ \frac{\partial L}{\partial x_i}(x_1^*, \dots, x_n^*, \lambda_1^*, \dots, \lambda_m^*) = 0 \qquad i = 1, \dots, n \]
\[ \frac{\partial L}{\partial \lambda_j}(x_1^*, \dots, x_n^*, \lambda_1^*, \dots, \lambda_m^*) = 0 \qquad j = 1, \dots, m. \tag{15} \]
1.2. Caveats and Extensions. Notice that we have been referring to the set
of conditions which a solution to the maximisation problem must satisfy. (We call
such conditions necessary conditions.) So far we have not even claimed that there
necessarily is a solution to the maximisation problem. There are many examples of
maximisation problems which have no solution. One example of an unconstrained
problem with no solution is
\[ \max_{x} \; 2x, \tag{16} \]
that is, maximise over the choice of x the function 2x. Clearly the greater we make x the
greater is 2x, and so, since there is no upper bound on x there is no maximum.
Thus we might want to restrict maximisation problems to those in which we choose
x from some bounded set. Again, this is not enough. Consider the problem
\[ \max_{0 \le x \le 1} \; 1/x. \tag{17} \]
The smaller we make x the greater is 1/x and yet at zero 1/x is not even defined.
We could define the function to take on some value at zero, say 7. But then the
function would not be continuous. Or we could leave zero out of the feasible set
for x, say 0 < x ≤ 1. Then the set of feasible x is not closed. Since there would
obviously still be no solution to the maximisation problem in these cases we shall
want to restrict maximisation problems to those in which we choose x to maximise
some continuous function from some closed (and because of the previous example)
bounded set. (We call a set of numbers, or more generally a set of vectors, that
is both closed and bounded a compact set.) Is there anything else that could go
wrong? No! The following result says that if the function to be maximised is
continuous and the set over which we are choosing is both closed and bounded, i.e.,
is compact, then there is a solution to the maximisation problem.
Theorem 6 (The Weierstrass Theorem). Let S be a compact set. Let f be a
continuous function that takes each point in S to a real number. (We usually write:
let f : S → R be continuous.) Then there is some x∗ in S at which the function is
maximised. More precisely, there is some x∗ in S such that f (x∗ ) ≥ f (x) for any
x in S.
Notice that in defining such compact sets we typically use inequalities, such
as x ≥ 0. However in Section 1 we did not consider such constraints, but rather
considered only equality constraints. However, even in the example of utility max-
imisation at the beginning of Section 5.6, there were implicitly constraints on x1
and x2 of the form
x1 ≥ 0, x2 ≥ 0.
A truly satisfactory treatment would make such constraints explicit. It is possible
to explicitly treat the maximisation problem with inequality constraints, at the
price of a little additional complexity. We shall return to this question later in the
book.
Also, notice that had we wished to solve a minimisation problem we could
have transformed the problem into a maximisation problem by simply multiplying
the objective function by −1. That is, if we wish to minimise f (x) we could do
so by maximising −f (x). As an exercise write out the conditions analogous to
2. THE IMPLICIT FUNCTION THEOREM 41
the conditions (8) for the case that we wanted to minimise u(x). Notice that if
x∗1 , x∗2 , and λ satisfy the original equations then x∗1 , x∗2 , and −λ satisfy the new
equations. Thus we cannot tell whether there is a maximum at (x∗1 , x∗2 ) or a
minimum. This corresponds to the fact that in the case of a function of a single
variable over an unconstrained domain at a maximum we require the first derivative
to be zero, but that to know for sure that we have a maximum we must look at the
second derivative. We shall not develop the analogous conditions for the constrained
problem with many variables here. However, again, we shall return to it later in
the book.
When can we solve this system to obtain functions giving each xi as a function
of b1 , . . . , bm ? As we’ll see below we only give an incomplete answer to this question,
but first let’s look at the case that the function f is a linear function.
Suppose that our equations are
\begin{align*} a_{11} x_1 + \cdots + a_{1n} x_n + c_{11} b_1 + \cdots + c_{1m} b_m &= 0 \\ a_{21} x_1 + \cdots + a_{2n} x_n + c_{21} b_1 + \cdots + c_{2m} b_m &= 0 \\ &\;\;\vdots \\ a_{n1} x_1 + \cdots + a_{nn} x_n + c_{n1} b_1 + \cdots + c_{nm} b_m &= 0. \end{align*}
We can write this, in matrix notation, as
\[ [A \mid C] \begin{pmatrix} x \\ b \end{pmatrix} = 0, \]
where A is an n × n matrix, C is an n × m matrix, x is an n × 1 (column) vector,
and b is an m × 1 vector.
This we can rewrite as
Ax + Cb = 0,
and solve this to give
x = −A−1 Cb.
And we can do this as long as the matrix A can be inverted, that is, as long as the
matrix A is of full rank.
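For a concrete (made-up) instance, the sketch below solves Ax + Cb = 0 as x = −A−1Cb for a 2 × 2 system with a single parameter b, and verifies the residual is zero.

```python
def inv2(M):
    """Inverse of a 2x2 matrix; requires a nonzero determinant (full rank)."""
    (a, b_), (c, d) = M
    det = a * d - b_ * c
    assert det != 0
    return [[d / det, -b_ / det], [-c / det, a / det]]

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

A = [[2, 1], [1, 1]]     # 2x2, nonsingular (made-up coefficients)
C = [[1], [0]]           # 2x1, so b is a single parameter
b = [3]

Cb = matvec(C, b)
x = [-v for v in matvec(inv2(A), Cb)]   # x = -A^{-1} C b

# Check that A x + C b = 0.
residual = [matvec(A, x)[i] + Cb[i] for i in range(2)]
assert residual == [0.0, 0.0]
```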
Our answer to the general question in which the function f may not be linear
is that if there are some values (x̄, b̄) for which f (x̄, b̄) = 0 then if, when we take
a linear approximation to f we can solve the approximate linear system as we did
above, then we can solve the true nonlinear system, at least in a neighbourhood of
(x̄, b̄). By this last phrase we mean that if b is not close to b̄ we may not be able to
solve the system, and that for a particular value of b there may be many values of
x that solve the system, but there is only one close to x̄.
To see why we can’t, in general, do better than this consider the equation
f : R2 → R given by f (x, b) = g(x) − b, where the function g is graphed in Figure 2.
Notice that the values (x̄, b̄) satisfy the equation f (x, b) = 0. For all values of b
close to b̄ we can find a unique value of x close to x̄ such that f (x, b) = 0. However,
(1) for each value of b there are other values of x far away from x̄ that also satisfy
f (x, b) = 0, and (2) there are values of b, such as b̃ for which there are no values of
x that satisfy f (x, b) = 0.
[Figure 2 here: the graph of g(x), showing the solution x̄ at which g(x̄) = b̄, and a value b̃ that g never attains.]

Figure 2
Let us consider again the system of equations (18). We say that the function f
is C 1 on some open set A ⊂ Rn+m if f has partial derivatives everywhere in A and
these partial derivatives are continuous on A.
Theorem 7. Suppose that f : Rn+m → Rn is a C 1 function on an open set
A ⊂ Rn+m and that (x̄, b̄) in A is such that f (x̄, b̄) = 0. Suppose also that
\[ \frac{\partial f(\bar{x}, \bar{b})}{\partial x} = \begin{pmatrix} \frac{\partial f_1(\bar{x}, \bar{b})}{\partial x_1} & \cdots & \frac{\partial f_1(\bar{x}, \bar{b})}{\partial x_n} \\ \vdots & & \vdots \\ \frac{\partial f_n(\bar{x}, \bar{b})}{\partial x_1} & \cdots & \frac{\partial f_n(\bar{x}, \bar{b})}{\partial x_n} \end{pmatrix} \]
is of full rank. Then there are open sets A1 ⊂ Rn and A2 ⊂ Rm with x̄ in A1 and
b̄ in A2 and A1 × A2 ⊂ A such that for each b in A2 there is exactly one g(b) in A1
such that f (g(b), b) = 0. Moreover, g : A2 → A1 is a C 1 function and
\[ \frac{\partial g(b)}{\partial b} = - \left( \frac{\partial f(g(b), b)}{\partial x} \right)^{-1} \frac{\partial f(g(b), b)}{\partial b}. \]
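A one-dimensional numerical illustration of the theorem, with the made-up function f(x, b) = x³ + x − b: here ∂f/∂x = 3x² + 1 > 0 is always of full rank, so x = g(b) is well defined, and the theorem's formula gives g′(b) = 1/(3x² + 1), which we compare against a finite-difference derivative.

```python
def f(x, b):
    return x ** 3 + x - b

def g(b, tol=1e-12):
    """Solve f(x, b) = 0 for x by bisection (f is increasing in x)."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid, b) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

b0 = 2.0
x0 = g(b0)                          # f(1, 2) = 0, so x0 is close to 1
theorem = 1 / (3 * x0 ** 2 + 1)     # the formula from Theorem 7
h = 1e-6
numeric = (g(b0 + h) - g(b0 - h)) / (2 * h)
assert abs(x0 - 1) < 1e-9
assert abs(theorem - numeric) < 1e-5
```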
The result we are about to give is most conveniently stated when our statement
of the problem is in terms of inequality constraints rather than equality constraints.
As mentioned earlier we shall examine this kind of problem later in this course.
However for the moment in order to proceed with our discussion of the problem
involving equality constraints we shall assume that all of the functions with which
we are dealing are increasing in the x variables. (See Exercise 1 for a formal
definition of what it means for a function to be increasing.) In this case if f is
strictly concave and gj is convex for each j then the problem has a unique solution.
In fact the concepts of concavity and convexity are somewhat stronger than is
required. We shall see later in the course that they can be replaced by the concepts
of quasi-concavity and quasi-convexity. In some sense these latter concepts are the
“right” concepts for this result.
Theorem 8. Suppose that f and gj are increasing in (x1 , . . . , xn ). If f is
strictly concave in (x1 , . . . , xn ) and gj is convex in (x1 , . . . , xn ) for j = 1, . . . , m
then for each value of the parameters (a1 , . . . , ak ) if problem (20) has a solution
(x∗1 , . . . , x∗n ) that solution is unique.
Now let v(a1 , . . . , ak ) be the maximised value of f when the parameters are
(a1 , . . . , ak ). Let us suppose that the problem is such that the solution is unique and
that (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak )) are the values that maximise the function
f when the parameters are (a1 , . . . , ak ) then
(21) v(a1 , . . . , ak ) = f (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak ).
(Notice however that the function v is uniquely defined even if there is not a unique
maximiser.)
The Theorem of the Maximum gives conditions on the problem under which
the function v and the functions x∗1 , . . . , x∗n are continuous. The constraints in the
problem (20) define a set of feasible vectors x over which the function f is to be
maximised. Let us call this set G(a1 , . . . , ak ), i.e.,
(22) G(a1 , . . . , ak ) = {(x1 , . . . , xn ) | gj (x1 , . . . , xn , a1 , . . . , ak ) = cj ∀j}
Now we can restate the problem as
max f (x1 , . . . , xn , a1 , . . . , ak )
x1 ,...,xn
(23)
subject to (x1 , . . . , xn ) ∈ G(a1 , . . . , ak ).
Notice that both the function f and the feasible set G depend on the parameters
a, i.e., both may change as a changes. The Theorem of the Maximum requires
both that the function f be continuous as a function of x and a and that the
feasible set G(a1 , . . . , ak ) change continuously as a changes. We already know—
or should know—what it means for f to be continuous but the notion of what it
means for a set to change continuously is less elementary. We call G a set valued
function or a correspondence. G associates with any vector (a1 , . . . , ak ) a subset of
the vectors (x1 , . . . , xn ). The following two definitions define what we mean by a
correspondence being continuous. First we define what it means for two sets to be
close.
Definition 16. Two sets of vectors A and B are within ε of each other if for
any vector x in one set there is a vector x′ in the other set such that x′ is within ε
of x.
We can now define the continuity of the correspondence G in essentially the
same way that we define the continuity of a single valued function.
[Figure 2 here: the graphs of f (·, a) and f (·, a0 ) against x, showing the maximised values f (x∗ (a), a) and f (x∗ (a0 ), a0 ) at the maximisers x∗ (a) and x∗ (a0 ).]

Figure 2
a function of the amount of the good to be produced. The short run average cost
function is defined to be the function which for any quantity, Q, gives the average
cost of producing that quantity, taking as given the scale of operation, i.e., the size
and number of plants and other fixed capital which we assume cannot be changed
in the short run (whatever that is). The long run average cost function on the
other hand gives, as a function of Q, the average cost of producing Q units of the
good, with the scale of operation selected to be the optimal scale for that level of
production.
That is, if we let the scale of operation be measured by a single variable k,
say, and we let the short run average cost of producing Q units when the scale is
k be given by SRAC(Q, k) and the long run average cost of producing Q units by
LRAC(Q) then we have
\[ LRAC(Q) = \min_{k} SRAC(Q, k). \]
Let us denote, for a given value Q, the optimal level of k by k(Q). That is, k(Q) is
the value of k that minimises the right hand side of the above equation.
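The envelope relation can be illustrated with a made-up short run cost function SRAC(Q, k) = Q²/k + k, whose minimising scale is k(Q) = Q so that LRAC(Q) = 2Q; the sketch also checks the tangency property discussed below, namely that the short run curve through a point of the envelope has the same slope there.

```python
def SRAC(Q, k):
    """A made-up, U-shaped (in k) short run average cost function."""
    return Q ** 2 / k + k

def LRAC(Q):
    # minimise SRAC over a fine grid of scales k
    return min(SRAC(Q, k / 1000) for k in range(1, 20001))

Qbar = 3.0
kbar = 3.0                                  # the cost-minimising scale at Qbar
assert abs(LRAC(Qbar) - 2 * Qbar) < 1e-3
assert abs(SRAC(Qbar, kbar) - LRAC(Qbar)) < 1e-3

# Equal slopes at Qbar: the short run curve is tangent to the envelope.
h = 1e-4
slope_srac = (SRAC(Qbar + h, kbar) - SRAC(Qbar - h, kbar)) / (2 * h)
slope_lrac = 2.0                            # d/dQ of LRAC(Q) = 2Q
assert abs(slope_srac - slope_lrac) < 1e-3
```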
Graphically, for any fixed level of k the short run average cost function can be
represented by a curve (normally assumed to be U-shaped) drawn in two dimensions
with quantity on the horizontal axis and cost on the vertical axis. Now think about
drawing one short run average cost curve for each of the (infinite) possible values of
k. One way of thinking about the long run average cost curve is as the “bottom” or
envelope of these short run average cost curves. Suppose that we consider a point
on this long run or envelope curve. What can be said about the slope of the long
run average cost curve at this point? A little thought should convince you that it
should be the same as the slope of the short run curve through the same point.
(If it were not then that short run curve would come below the long run curve, a
contradiction.)
[Figure 3 here: the long run average cost curve LRAC as the lower envelope of the short run average cost curves; at Q̄ the curve SRAC(·, k(Q̄)) is tangent to LRAC, with LRAC(Q̄) = SRAC(Q̄, k(Q̄)).]

Figure 3
for all h.
In order to show the advantages of using matrix and vector notation we shall
restate the theorem in that notation before returning to give a proof of the theorem.
(In proving the theorem we shall return to using mainly scalar notation.)
Theorem 10 (The Envelope Theorem). Under the same conditions as above
\[ \frac{\partial v}{\partial a}(a) = \frac{\partial L}{\partial a}(x^*(a), \lambda(a), a) = \frac{\partial f}{\partial a}(x^*(a), a) - \lambda(a) \frac{\partial g}{\partial a}(x^*(a), a). \]
Proof. From the definition of the function v we have
(26) v(a1 , . . . , ak ) = f (x∗1 (a1 , . . . , ak ), . . . , x∗n (a1 , . . . , ak ), a1 , . . . , ak )
Thus
\[ \frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(x^*(a), a) \, \frac{\partial x_i^*}{\partial a_h}(a). \tag{27} \]
Now, from the first order conditions (12) we have
\[ \frac{\partial f}{\partial x_i}(x^*(a), a) - \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a) = 0. \]
Or
\[ \frac{\partial f}{\partial x_i}(x^*(a), a) = \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a). \tag{28} \]
Also, differentiating the constraints gj (x∗ (a), a) = cj with respect to ah gives
\[ \sum_{i=1}^{n} \frac{\partial g_j}{\partial x_i}(x^*(a), a) \, \frac{\partial x_i^*}{\partial a_h}(a) = - \frac{\partial g_j}{\partial a_h}(x^*(a), a). \tag{29} \]
Substituting (28) into (27) gives
\[ \frac{\partial v}{\partial a_h}(a) = \frac{\partial f}{\partial a_h}(x^*(a), a) + \sum_{i=1}^{n} \left[ \sum_{j=1}^{m} \lambda_j(a) \frac{\partial g_j}{\partial x_i}(x^*(a), a) \right] \frac{\partial x_i^*}{\partial a_h}(a). \]
Exercises.
Exercise 78. Rewrite this proof using matrix notation. Go through your proof
and identify the dimension of each of the vectors or matrices you use. For example
fx is a 1 × n vector, gx is an m × n matrix.
\[ \max_{x_1, x_2} u(x_1, x_2) \quad \text{subject to} \quad p_1 x_1 + p_2 x_2 - y = 0. \]
Let v(p1 , p2 , y) be the maximised value of u when prices and income are p1 , p2 , and
y. Let us consider the effect of a change in y with p1 and p2 remaining constant.
By the Envelope Theorem
\[ \frac{\partial v}{\partial y} = \frac{\partial}{\partial y} \{ u(x_1, x_2) + \lambda (y - p_1 x_1 - p_2 x_2) \} = 0 + \lambda \cdot 1 = \lambda. \]
This is the familiar result that λ is the marginal utility of income.
when evaluated at the point which solves the minimisation problem which we write
as hi (p1 , . . . , pn , u0 ) to distinguish this (compensated) value of the demand for good
i as a function of prices and utility from the (uncompensated) value of the demand
for good i as a function of prices and income. This result is known as Hotelling’s
Theorem.
5.4. The Indirect Utility Function. Again let v(p1 , . . . , pn , y) be the indi-
rect utility function, that is, the maximised value of utility as described in Appli-
cation (1). Then by the Envelope Theorem
\[ \frac{\partial v}{\partial p_i} = \frac{\partial u}{\partial p_i} - \lambda x_i(p_1, \dots, p_n, y) = -\lambda x_i(p_1, \dots, p_n, y) \]
since ∂u/∂pi = 0. Now, since we have already shown that λ = ∂v/∂y (in Section 4.1) we
have
\[ x_i(p_1, \dots, p_n, y) = - \frac{\partial v / \partial p_i}{\partial v / \partial y}. \]
This is known as Roy’s Theorem.
5.5. Profit functions. Now consider the problem of a firm that maximises
profits subject to technology constraints. Let x = (x1 , . . . , xn ) be a vector of
netputs, i.e., xi is positive if the firm is a net supplier of good i, negative if the firm
is a net user of that good. Let us assume that we can write the technology constraints
as F (x) = 0. Thus the firm’s problem is
\[ \max_{x_1, \dots, x_n} \sum_{i=1}^{n} p_i x_i \quad \text{subject to} \quad F(x_1, \dots, x_n) = 0. \]
Let ϕi (p) be the value of xi that solves this problem, i.e., the net supply of
commodity i when prices are p. (Here p is a vector.) We call the maximised value
the profit function which is given by
\[ \Pi(p) = \sum_{i=1}^{n} p_i \varphi_i(p). \]
Our first application of the Envelope Theorem told us that this value of λ could
be found as the derivative of the indirect utility function with respect to w. We
confirm this by differentiating the function we found above with respect to w.
\[ \frac{\partial v}{\partial w} = \frac{\partial}{\partial w} \frac{w}{2\sqrt{p_1 p_2}} = \frac{1}{2\sqrt{p_1 p_2}} \]
as we had found directly above.
Now let us, for the same utility function consider the expenditure minimisation
problem
\[ \min_{x_1, x_2} p_1 x_1 + p_2 x_2 \quad \text{subject to} \quad \sqrt{x_1} \sqrt{x_2} = u. \]
The Lagrangian is
\[ L(x_1, x_2, \lambda) = p_1 x_1 + p_2 x_2 + \lambda (u - \sqrt{x_1} \sqrt{x_2}) \tag{39} \]
and the first order conditions are
\begin{align} \frac{\partial L}{\partial x_1} &= p_1 - \frac{1}{2} \lambda x_1^{-1/2} x_2^{1/2} = 0 \tag{40} \\ \frac{\partial L}{\partial x_2} &= p_2 - \frac{1}{2} \lambda x_1^{1/2} x_2^{-1/2} = 0 \tag{41} \\ \frac{\partial L}{\partial \lambda} &= u - \sqrt{x_1} \sqrt{x_2} = 0. \tag{42} \end{align}
Dividing equation (40) by equation (41) gives
\[ \frac{p_1}{p_2} = \frac{x_2}{x_1} \]
or
\[ x_2 = \frac{p_1 x_1}{p_2}. \tag{43} \]
And, if we substitute equation (43) into equation (42) we obtain
\[ u - x_1 \sqrt{\frac{p_1}{p_2}} = 0 \]
or
\[ x_1 = u \sqrt{\frac{p_2}{p_1}}. \]
Similarly,
\[ x_2 = u \sqrt{\frac{p_1}{p_2}}, \]
and if we substitute these values back into the objective function we obtain the
expenditure function
\[ e(p_1, p_2, u) = p_1 u \sqrt{\frac{p_2}{p_1}} + p_2 u \sqrt{\frac{p_1}{p_2}} = 2u \sqrt{p_1 p_2}. \]
Hotelling’s Theorem tells us that if we differentiate this expenditure function
with respect to pi we should obtain the Hicksian demand function hi .
\[ \frac{\partial e(p_1, p_2, u)}{\partial p_1} = \frac{\partial}{\partial p_1} 2u \sqrt{p_1 p_2} = 2u \cdot \frac{1}{2} \sqrt{\frac{p_2}{p_1}} = u \sqrt{\frac{p_2}{p_1}} \]
as we had already found. And similarly for h2 .
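This worked example also lends itself to a numerical confirmation of Hotelling's Theorem by finite differences: the derivative of e(p1, p2, u) = 2u√(p1 p2) with respect to p1 should match h1 = u√(p2/p1) at any (made-up) parameter values.

```python
from math import sqrt

def e(p1, p2, u):
    """The expenditure function derived above."""
    return 2 * u * sqrt(p1 * p2)

def h1(p1, p2, u):
    """The Hicksian demand for good 1 derived above."""
    return u * sqrt(p2 / p1)

p1, p2, u = 2.0, 3.0, 5.0    # made-up prices and utility level
d = 1e-6
numeric = (e(p1 + d, p2, u) - e(p1 - d, p2, u)) / (2 * d)
assert abs(numeric - h1(p1, p2, u)) < 1e-6
```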
Let us summarise what we have found so far. The Marshallian demand functions are
x1(p1, p2, w) = w/(2p1)
x2(p1, p2, w) = w/(2p2).
The indirect utility function is
v(p1, p2, w) = w/(2√(p1 p2)).
The Hicksian demand functions are
h1(p1, p2, u) = u √(p2/p1)
h2(p1, p2, u) = u √(p1/p2).
Substituting into the right hand side of the Hicks-Slutsky equation above gives
RHS = u/(2√(p1 p2)) − u √(p2/p1) · 1/(2p2) = 0,
which is exactly what we had found for the left hand side of the Hicks-Slutsky
equation.
Finally we check Roy’s Theorem, which tells us that the Marshallian demand
for good 1 can be found as
x1(p1, p2, w) = −(∂v/∂p1) / (∂v/∂w).
In this case we obtain
x1(p1, p2, w) = [(w/4) p1^{−3/2} p2^{−1/2}] / [1/(2√(p1 p2))] = w/(2p1),
as required.
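All of the formulas just summarised can be verified numerically. The Python sketch below (an illustration only; the prices and wealth are arbitrary) recomputes each function for u(x1, x2) = √x1 √x2, checks the duality e(p, v(p, w)) = w, and verifies Roy's Theorem by central finite differences.

```python
# Numeric check of the summarised formulas for u(x1, x2) = sqrt(x1)*sqrt(x2).
import math

p1, p2, w = 2.0, 8.0, 40.0

# Marshallian demands and indirect utility.
x1 = w / (2 * p1)                      # 10.0
x2 = w / (2 * p2)                      # 2.5
v = w / (2 * math.sqrt(p1 * p2))       # 5.0
assert abs(math.sqrt(x1) * math.sqrt(x2) - v) < 1e-9   # x attains utility v

# Hicksian demands and expenditure at the utility level u = v(p, w).
u = v
h1 = u * math.sqrt(p2 / p1)            # 10.0, equal to x1 by duality
h2 = u * math.sqrt(p1 / p2)            # 2.5,  equal to x2 by duality
e = 2 * u * math.sqrt(p1 * p2)         # 40.0: e(p, v(p, w)) = w
print(h1, h2, e)

# Roy's Theorem via central finite differences on v(p1, p2, w).
def v_fn(a, b, c):
    return c / (2 * math.sqrt(a * b))

eps = 1e-6
dv_dp1 = (v_fn(p1 + eps, p2, w) - v_fn(p1 - eps, p2, w)) / (2 * eps)
dv_dw = (v_fn(p1, p2, w + eps) - v_fn(p1, p2, w - eps)) / (2 * eps)
print(round(-dv_dp1 / dv_dw, 6))       # should equal x1 = w / (2 p1) = 10
```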
Exercises.
Exercise 79. Consider the direct utility function
u(x) = Σ_{i=1}^n βi log(xi − γi),
where βi and γi , i = 1, . . . , n are, respectively, positive and nonpositive parameters.
(1) Derive the indirect utility function and show that it is decreasing in its
arguments.
(2) Verify Roy’s Theorem.
(3) Derive the expenditure function and show that it is homogeneous of degree
one and nondecreasing in prices.
(4) Verify Hotelling’s Theorem.
where xi is the quantity generated in the i-th plant. If the utility is required to pro-
duce 2000 megawatts in a particular hour, how should it allocate this load between
the plants so as to minimise costs? Use the Lagrangian method and interpret the
multiplier. How do total costs vary as b changes? (That is, what is the derivative of the minimised cost with respect to b?)
CHAPTER 4
1. Convexity
Convexity is one of the most important mathematical properties in economics. For example, without convexity of preferences, demand and supply functions need not be continuous, and so competitive markets generally do not have equilibrium
points. The economic interpretation of convex preference sets in consumer theory is
diminishing marginal rates of substitution; the interpretation of convex production
sets is constant or decreasing returns to scale. Considerably less is known about
general equilibrium models that allow non-convex production sets (e.g., economies
of scale) or non-convex preferences (e.g., the consumer prefers a pint of beer or a
shot of vodka alone to any mixture of the two).
Another set of mathematical results closely connected to the notion of convexity is the so-called separation and support theorems. These theorems are frequently used in economics to obtain a price system that leads consumers and producers to choose a Pareto-efficient allocation. That is, given the prices, producers are maximizing
profits, and given those profits as income, consumers are maximizing utility subject
to their budget constraints.
1.1. Convex Sets. Given two points x, y ∈ Rn , a point z = ax + (1 − a) y,
where 0 ≤ a ≤ 1, is called a convex combination of x and y.
The set of all possible convex combinations of x and y, denoted by [x, y], is
called the interval with endpoints x and y (or, the line segment connecting x and
y):
[x, y] = {ax + (1 − a) y : 0 ≤ a ≤ 1} .
Definition 18. A set S ⊆ Rn is convex iff for any x, y ∈ S the interval
[x, y] ⊆ S.
In words: a set is convex if it contains the line segment connecting any two of
its points; or, more loosely speaking, a set is convex if along with any two points it
contains all points between them.
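Definition 18 can be probed numerically. The sketch below is an illustration only (the helper names `looks_convex`, `disc_sampler`, and `circle_sampler` are invented for the example): it samples random convex combinations of pairs of points and checks membership. The closed unit disc passes the test, while its boundary, the unit circle, fails.

```python
# Heuristic numerical check of Definition 18 for subsets of R^2.
import math
import random

def looks_convex(contains, sampler, trials=2000):
    """For sampled x, y in the set and a in [0, 1], test whether the
    convex combination a*x + (1-a)*y is still in the set."""
    for _ in range(trials):
        x, y = sampler(), sampler()
        a = random.random()
        z = (a * x[0] + (1 - a) * y[0], a * x[1] + (1 - a) * y[1])
        if not contains(z):
            return False
    return True

random.seed(0)

# The closed unit disc is convex ...
def disc(p):
    return p[0] ** 2 + p[1] ** 2 <= 1.0

def disc_sampler():
    while True:
        p = (random.uniform(-1, 1), random.uniform(-1, 1))
        if disc(p):
            return p

# ... but its boundary, the unit circle, is not.
def circle(p):
    return abs(p[0] ** 2 + p[1] ** 2 - 1.0) < 1e-9

def circle_sampler():
    t = random.uniform(0.0, 2.0 * math.pi)
    return (math.cos(t), math.sin(t))

print(looks_convex(disc, disc_sampler))      # True
print(looks_convex(circle, circle_sampler))  # False, with near certainty
```

A passing test does not prove convexity, of course; a single failing combination disproves it.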
Convex sets in R2 include the interiors of triangles, squares, circles, ellipses, and hosts of other sets. Note also that, for example in R3, while the interior of a cube is a convex set, its boundary is not. The quintessential convex set in Euclidean space Rn for any n > 1 is the n-dimensional open ball SR(a) of radius R > 0 about a point a ∈ Rn, given by
SR(a) = {x ∈ Rn : |x − a| < R}.
More examples of convex sets:
1. Is the empty set convex? Is a singleton convex? Is Rn convex?
There are also several standard ways of forming convex sets from convex sets:
2. Let A, B ⊆ Rn be sets. The Minkowski sum A + B ⊆ Rn is defined as
A + B = {x + y : x ∈ A, y ∈ B} .
When B = {b} is a singleton, the set A + b is called a translation of A. Prove that
A + B is convex if A and B are convex.
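The key identity behind this exercise is that a convex combination of two points of A + B decomposes as a sum of a convex combination in A and one in B: a(x1 + y1) + (1 − a)(x2 + y2) = [a x1 + (1 − a) x2] + [a y1 + (1 − a) y2]. The Python sketch below (illustrative; the two intervals are arbitrary convex sets in R) checks this on random samples.

```python
# Numeric illustration of the convexity of the Minkowski sum A + B.
import random

random.seed(1)
A = (0.0, 1.0)   # the interval [0, 1] in R, a convex set
B = (2.0, 3.0)   # the interval [2, 3] in R, a convex set; A + B = [2, 4]

def in_interval(t, iv):
    return iv[0] - 1e-12 <= t <= iv[1] + 1e-12

for _ in range(1000):
    x1, x2 = random.uniform(*A), random.uniform(*A)   # two points of A
    y1, y2 = random.uniform(*B), random.uniform(*B)   # two points of B
    a = random.random()
    # A convex combination of the points x1 + y1 and x2 + y2 of A + B ...
    z = a * (x1 + y1) + (1 - a) * (x2 + y2)
    # ... equals (point of A) + (point of B), by rearranging terms:
    xa = a * x1 + (1 - a) * x2
    yb = a * y1 + (1 - a) * y2
    assert in_interval(xa, A) and in_interval(yb, B)
    assert abs(z - (xa + yb)) < 1e-9
print("all sampled convex combinations stay in A + B")
```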
58 4. TOPICS IN CONVEX ANALYSIS
Theorem 11. Let S ⊆ Rn be a set. Then any convex set containing S also contains convS.
Proof. Let A be a convex set such that S ⊆ A. By Lemma 1, A contains all convex combinations of its points and, in particular, all convex combinations of points of its subset S; these combinations make up convS.
The next property is quite obvious and, again, frustrates attempts to generate
‘superconvex’ sets, this time by trying to take convex hulls of convex hulls.
1. Prove that convconvS = convS for any S.
2. Prove that if A ⊂ B then convA ⊂ convB.
The next property relates the operation of taking convex hulls to that of taking Minkowski sums. It does not matter in which order you perform these operations.
3. Prove that conv (A + B) = (convA) + (convB).
4. Prove that conv (A ∩ B) ⊆ (convA) ∩ (convB).
5. Prove that (convA) ∪ (convB) ⊆ conv (A ∪ B).
1.3. Caratheodory’s Theorem. Definition 20 implies that any point x in the convex hull of S is representable as a convex combination of finitely many points of S, but it places no restrictions on the number of points of S required to make the combination. Caratheodory’s Theorem puts an upper bound on the number of points required: in Rn the number of points never needs to be more than n + 1.
Theorem 12 (Caratheodory, 1907). Let S ⊆ Rn be a non-empty set. Then every x ∈ convS can be represented as a convex combination of at most n + 1 points from S.
Note that the theorem does not ‘identify’ the points used in the representation; their choice depends on x.
Show by example that the constant n + 1 in Caratheodory’s theorem cannot
be improved. That is, exhibit a set S ⊆ Rn and a point x ∈ convS that cannot be
represented as a convex combination of fewer than n + 1 points from S.
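For the sharpness question, a standard example is a triangle in R2 with x its centroid: the centroid lies in the convex hull of the three vertices but on none of the edges, so no two points of S suffice and the bound n + 1 = 3 is tight. The sketch below (illustrative; `on_segment` is a helper invented for the example) verifies this for one concrete triangle.

```python
# The centroid of a triangle needs all n + 1 = 3 vertices in R^2.
S = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
centroid = (sum(p[0] for p in S) / 3, sum(p[1] for p in S) / 3)

def on_segment(x, p, q, tol=1e-9):
    """Is x a convex combination of p and q, i.e. on the segment [p, q]?"""
    dx, dy = q[0] - p[0], q[1] - p[1]
    rx, ry = x[0] - p[0], x[1] - p[1]
    cross = dx * ry - dy * rx            # zero iff x is on the line p-q
    dot = rx * dx + ry * dy              # position of x along that line
    return abs(cross) < tol and -tol <= dot <= dx * dx + dy * dy + tol

# The centroid IS the convex combination (1/3, 1/3, 1/3) of the 3 vertices,
# but it lies on no edge, so no 2 of the points suffice:
edges = [(S[0], S[1]), (S[0], S[2]), (S[1], S[2])]
print([on_segment(centroid, p, q) for p, q in edges])   # [False, False, False]
```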
1.4. Polytopes. The simplest convex sets are those which are convex hulls of
a finite set of points, that is, sets of the form S = conv{x1 , x2 , ..., xm }. The convex
hull of a finite set of points in Rn is called a polytope.
1. Prove that the set
∆ = {x ∈ R^{n+1} : Σ_{i=1}^{n+1} xi = 1 and xi ≥ 0 for every i}
is a polytope.
1.6. Aside: Helly’s Theorem. While there are not so many applications of
Helly’s theorem to economics (in fact, I am aware of only one paper that uses Helly’s theorem in an economic context), it is definitely one of the most famous results
in convexity.
Theorem 13 (Helly, 1913). Let A1 , A2 , ..., Am ⊆ Rn be a finite family of convex
sets with m ≥ n + 1. Suppose that every n + 1 sets have a nonempty intersection.
Then all sets have a nonempty intersection.
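In the simplest case n = 1 the theorem can be checked directly: the convex sets are intervals, every n + 1 = 2 of them intersecting means pairwise intersection, and a common point is then any point between the largest left endpoint and the smallest right endpoint. A small Python illustration (the intervals are arbitrary):

```python
# Helly's theorem in R^1: pairwise-intersecting intervals share a point.
intervals = [(0.0, 5.0), (1.0, 6.0), (2.0, 7.0), (3.0, 8.0)]

# Hypothesis: every pair of intervals intersects.
pairwise = all(max(a[0], b[0]) <= min(a[1], b[1])
               for i, a in enumerate(intervals)
               for b in intervals[i + 1:])
assert pairwise

# Conclusion: there is a point common to ALL of them.
lo = max(a for a, _ in intervals)   # largest left endpoint: 3.0
hi = min(b for _, b in intervals)   # smallest right endpoint: 5.0
assert lo <= hi                     # so every point of [lo, hi] works
print(lo, hi)                       # 3.0 5.0
```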
To prove Helly’s theorem with elegance we first need to formulate a very useful result obtained by J. Radon.
Theorem 14 (Radon, 1921). Let S ⊆ Rn be a set of at least n + 2 points.
Then there are two non-intersecting subsets R ⊂ S (‘red points’) and B ⊂ S (‘blue
points’) such that
convR ∩ convB ≠ ∅.
Proof. Let x1, ..., xm be m ≥ n + 2 distinct points from S. Consider the system of n + 1 homogeneous linear equations in the variables γ1, ..., γm
γ1 x1 + ... + γm xm = 0 and γ1 + ... + γm = 0.
Since m ≥ n + 2, there is a nontrivial solution to this system. Let
R = {xi : γi > 0} and B = {xi : γi < 0}.
Then R ∩ B = ∅. Let β = Σ_{i:γi>0} γi ; then β > 0 and Σ_{i:γi<0} γi = −β, since the γ’s sum up to zero. Moreover,
Σ_{i:γi>0} γi xi = Σ_{i:γi<0} (−γi) xi
since Σi γi xi = 0. Let
x = Σ_{i:γi>0} (γi/β) xi = Σ_{i:γi<0} (−γi/β) xi.
Both sums exhibit x as a convex combination, the first of points of R and the second of points of B, so x ∈ convR ∩ convB.
Since I ∩ J = ∅ we have p ∈ ∩_{i=1}^m Ai.
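The construction in Radon's proof can be checked numerically. In the sketch below (an illustration; the points and the solution are chosen by hand), the four corners of the unit square in R2 give m = 4 = n + 2, the nontrivial solution γ = (1, −1, −1, 1) satisfies both equations of the system, and the two convex combinations produce the same point (1/2, 1/2) of convR ∩ convB.

```python
# Verifying the construction in Radon's proof for 4 points in R^2.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
gamma = [1.0, -1.0, -1.0, 1.0]          # a nontrivial solution, by inspection

# The system from the proof: sum gamma_i x^i = 0 and sum gamma_i = 0.
assert abs(sum(g * p[0] for g, p in zip(gamma, pts))) < 1e-12
assert abs(sum(g * p[1] for g, p in zip(gamma, pts))) < 1e-12
assert abs(sum(gamma)) < 1e-12

beta = sum(g for g in gamma if g > 0)                       # beta = 2
red = [(g / beta, p) for g, p in zip(gamma, pts) if g > 0]  # weights on R
blue = [(-g / beta, p) for g, p in zip(gamma, pts) if g < 0]  # weights on B

# Both convex combinations give the same point x in convR and in convB.
xr = tuple(sum(w * p[i] for w, p in red) for i in range(2))
xb = tuple(sum(w * p[i] for w, p in blue) for i in range(2))
print(xr, xb)    # (0.5, 0.5) (0.5, 0.5)
```

Here R = {(0,0), (1,1)} and B = {(1,0), (0,1)}: the two diagonals of the square, which indeed cross at the centre.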
2. SUPPORT AND SEPARATION 61
where inf denotes the infimum or greatest lower bound. It is a property of the real numbers that any nonempty set of real numbers that is bounded below has an infimum; allowing the value −∞ for sets that are unbounded below, µS (p) is well defined for any nonempty set S. If the minimum exists, for example if the set S is compact, then the infimum is the minimum. In other cases the minimum may not exist. To take a simple one-dimensional example, suppose that the set S is the subset of R consisting of the numbers 1/n for n = 1, 2, . . . and that p = 2. Then clearly p · x = px does not have a minimum on the set S. However, 0 is less than px = 2x for any value of x in S, but for any number a greater than 0 there is a value of x in S such that px < a. Thus 0 is in this case the infimum of the set {p · x | x ∈ S}.
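This example can be checked numerically (an illustration only; the cut-off of 100000 terms is arbitrary):

```python
# S = {1/n : n = 1, 2, ...} and p = 2, so p*x = 2/n: infimum 0, no minimum.
values = [2 * (1.0 / n) for n in range(1, 100001)]
print(min(values))                  # smallest value on this finite sample
print(all(v > 0 for v in values))   # every value stays strictly above 0
# Taking n large enough gets below any a > 0, so inf = 0 but no minimum.
```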
Recall that we have not assumed that S is convex. However, if we do assume
that S is both convex and closed then the function µS contains all the information
needed to reconstruct S.
Given any extended-real valued function µ : Rn → R ∪ {−∞} let us define the set Sµ as
Sµ = {x ∈ Rn | p · x ≥ µ(p) for every p ∈ Rn }.
That is, for each p with µ(p) > −∞ we define the closed half space
{x ∈ Rn | p · x ≥ µ(p)}.
Notice that if µ(p) = −∞ then p · x ≥ µ(p) for any x and so the above set will be Rn rather than a half space. The set Sµ is the intersection of all these closed half spaces. Since the intersection of convex sets is convex and the intersection of closed sets is closed, the set Sµ is, for any function µ, a closed convex set.
Suppose that we start with a set S, define µS as above and then use µS to
define the set SµS . If the set S was a closed convex set then SµS will be exactly
equal to S. Since we have seen that SµS is a closed convex set, it must be that if
S is not a closed convex set it will not be equal to SµS . However S will always be
a subset of SµS , and indeed SµS will be the smallest closed convex set containing S; that is, SµS is the closed convex hull of S.
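In R^1 the reconstruction SµS can be computed directly. The sketch below is an illustration (the set S and the helper names are invented for the example): it takes the non-convex set S = {0, 1, 3} and confirms that SµS is its closed convex hull, the interval [0, 3].

```python
# Reconstructing the closed convex hull of S = {0, 1, 3} from mu_S in R^1.
S = [0.0, 1.0, 3.0]

def mu(p):
    """mu_S(p) = inf {p * x : x in S} (a minimum here, since S is finite)."""
    return min(p * x for x in S)

def in_S_mu(x, directions=(-1.0, -0.5, 0.5, 1.0)):
    """x is in S_mu iff p * x >= mu(p) for every p; since mu is positively
    homogeneous, a few sample directions suffice in R^1."""
    return all(p * x >= mu(p) - 1e-12 for p in directions)

# Every point of [0, 3] passes -- including 2.5, which is NOT in S --
# and everything outside [0, 3] fails:
print([in_S_mu(x) for x in (0.0, 1.0, 2.5, 3.0)])   # [True, True, True, True]
print([in_S_mu(x) for x in (-0.1, 3.1)])            # [False, False]
```

The point 2.5 illustrates the claim in the text: SµS is the closed convex hull of S, not S itself, when S fails to be closed and convex.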
The extreme points of a closed ball in R3 are its boundary points; the extreme points of a closed cube in R3 are its eight vertices. A half-space has no extreme points even if it is closed.
An interesting property of extreme points is that an extreme point can be
deleted from the set without destroying convexity of the set. That is, a point x in
a convex set S is an extreme point iff the set S\{x} is convex.
The next theorem is a finite-dimensional version of a quite general and powerful
result by M.G. Krein and D.P. Milman.
Theorem 18 (Krein & Milman, 1940). Let S ⊆ Rn be convex and compact.
Then S is the convex hull of its extreme points.
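For a polytope in R2 the extreme points are exactly the vertices of its convex hull, which can be computed. The Python sketch below is an illustration (the `hull` helper implements Andrew's monotone chain algorithm, an assumption of the example, not part of the text): for the square with a few interior points thrown in, the extreme points recovered are precisely the four corners, and by the theorem their convex hull is the whole square.

```python
# Extreme points of a polytope in R^2 via a convex hull computation.
def hull(points):
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def half(seq):
        out = []
        for p in seq:
            while len(out) >= 2:
                (ox, oy), (px, py) = out[-2], out[-1]
                # pop while the last turn is clockwise or collinear
                if (px - ox) * (p[1] - oy) - (py - oy) * (p[0] - ox) <= 0:
                    out.pop()
                else:
                    break
            out.append(p)
        return out[:-1]
    return half(pts) + half(pts[::-1])   # lower hull + upper hull

corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
interior = [(0.5, 0.5), (0.2, 0.7), (0.9, 0.1)]
extreme = hull(corners + interior)
print(sorted(extreme))   # the four corners only
```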