
Contents

Introduction 7

I Elements of linear algebra 11

1 Linear spaces 13
1.1 The Euclidean space . . . 13
1.2 Euclidean n-space . . . 16
1.3 Quadratic forms . . . 21

2 Linear Algebra 27
2.1 Systems of linear equations. The Gauss-Jordan elimination method . . . 27
2.2 Linear programming problems (LPP) . . . 43

II Calculus 111

3 One variable calculus 113
3.1 Differential calculus of one variable . . . 113
3.1.1 Limits and continuity . . . 114
3.1.2 Rates of change and derivatives . . . 127
3.1.3 Linear approximation and differentials . . . 139
3.1.4 Extreme values of a real valued function . . . 141
3.1.5 Applications to economics . . . 150
3.2 Integral calculus of one variable . . . 157
3.2.1 Antiderivatives and techniques of integration . . . 157
3.2.2 The definite integral . . . 164
3.3 Improper integrals . . . 167
3.3.1 Improper integrals . . . 167
3.3.2 Euler's integrals . . . 176

4 Differential calculus of several variables 187
4.1 Real functions of several variables. Limits and continuity . . . 187
4.1.1 Real functions of several variables . . . 187
4.1.2 Limits. Continuity . . . 192
4.2 Partial derivatives . . . 201
4.3 Higher order partial derivatives . . . 212
4.4 Differentiability . . . 218
4.4.1 Differentiability. The total differential . . . 218
4.4.2 Higher order differentials . . . 228
4.4.3 Taylor's formula in R^n . . . 231
4.5 Extrema of functions of several variables . . . 235
4.6 Constrained extrema . . . 247
4.7 Applications to economics . . . 261
4.7.1 The method of least squares . . . 261
4.7.2 Inventory control. The economic order quantity model . . . 265

III Probabilities 269
A short history of probabilities . . . 271

5 Counting techniques. Tree diagrams 273
5.1 The addition rule . . . 273
5.2 Tree diagrams and the multiplication principle . . . 277
5.3 Permutations and combinations . . . 281

6 Basic probability concepts 289
6.1 Sample space. Events . . . 289
6.2 Conditional probability . . . 302
6.3 The total probability formula. Bayes' formula . . . 306
6.4 Independence . . . 311
6.5 Classical probabilistic models. Urn models . . . 313

7 Random variables 327
7.1 Discrete random variables . . . 328
7.2 The distribution function of a random variable . . . 332
7.3 Continuous random variables . . . 335
7.4 Numerical characteristics of random variables . . . 336
7.5 Special random variables . . . 345

Appendix A 380
Appendix B 384
Bibliography 389
Introduction
Why Maths?
Because Mathematics is the universal language of the sciences. When we speak mathematics, all barriers - linguistic or cultural - are pushed away.
Why Maths in Economics?
Mathematics plays an important role in Economics. This role has been significant over the last century and has received a real impulse during the last decades. Immanuel Kant (1724-1804) said: "A science contains as much science as it contains Mathematics."
One of the first economists who wanted to make economics more scientific by applying mathematical rigour to it was Alfred Marshall (1842-1924, English economist). He did not want an excess of mathematics to make economic texts harder to understand. Accordingly, Marshall put the mathematical content in the footnotes and appendices of his economics books. In 1906 he wrote:
"I had a growing feeling in the later years of my work at the subject that a good mathematical theorem dealing with economic hypotheses was very unlikely to be good economics: and I went more and more on the rules -
(1) Use mathematics as a shorthand language, rather than an engine of inquiry;
(2) Keep to them till you have done;
(3) Translate into English;
(4) Then illustrate by examples that are important in real life;
(5) Burn the mathematics;
(6) If you can't succeed in (4), burn (3).
This last I did often."
The use of mathematics in economics provides us with several advantages:
- the language that is used is more precise and concise;
- it allows us to treat the general case;
- we have at our disposal a great number of mathematical results.
On his blog, Greg Mankiw (Professor of Economics at Harvard University) wrote the following answer to the question:
Why do aspiring economists need math?
A student who wants to pursue a career in policy-related economics is advised to
go to the best graduate school he or she can get into. The best graduate schools will expect to see a lot of math on your undergraduate transcript, so you need to take it. But will you use a lot of differential equations and real analysis once you land that dream job in a policy organization? No, you won't.
That raises the question: Why do we academics want students who have taken a lot of math? There are several reasons:
1. Every economist needs to have a solid foundation in the basics of economic theory and econometrics, even if you are not going to be either a theorist or an econometrician. You cannot get this solid foundation without understanding the language of mathematics that these fields use.
2. Occasionally, you will need math in your job. In particular, even as a policy economist, you need to be able to read the academic literature to figure out what research ideas have policy relevance. That literature uses a lot of math, so you will need to be equipped with mathematical tools to read it intelligently.
3. Math is good training for the mind. It makes you a more rigorous thinker.
4. Your math courses are one long IQ test. We use math courses to figure out who is really smart.
5. Economics graduate programs are more oriented to training students for academic research than for policy jobs. Although many econ PhDs go on to policy work, all of us teaching in graduate programs are, by definition, academics. Some academics take a few years off to experience the policy world, as I did not long ago, but many academics have no idea what that world is like. When we enter the classroom, we teach what we know. (I am not claiming this is optimal, just reality.) So math plays a larger role in graduate classes than it does in many jobs that PhD economists hold.
Is it possible that admissions committees for econ PhD programs are excessively fond of mathematics on student transcripts? Perhaps. That is something I might argue about with my colleagues if I were ever put on the admissions committee. But a student cannot change that. The fact is, if you are thinking about a PhD program in economics, you are advised to take math courses until it hurts.
At present, the maths teacher's mission is to be the best advertisement for maths, to make students aware of the importance of maths, and to dress up maths classes in vivid colours.
The purpose of this book (covering three parts - Linear Algebra, Calculus and Probabilities - divided into seven chapters) is to give students of economics the opportunity to acquire the basic knowledge of maths which they will have to work with in the future, in order to be able to use it with complex economic models belonging to the real world. Our book attempts to develop the students' intuition concerning the ways of working with mathematical techniques.
This book wants to be a device for using maths in order to understand the structure of economics.
The text offers an introduction to the most intimate relationship between Maths and Economics.
Taking into account the applicative content of this book, we will not always present the complete proofs of all theoretical statements, but we attach importance to examples and economic applications. As Mathematics is a very old science, we can't possibly be
entirely original, but the structure and concepts have been thought out after several years of work together with students in economics. So that the content could come closer to the needs of economists, we introduced several examples from the economic field.
This book is especially meant for first-year students in Economic Sciences and Business Administration. We also address all those who need to refresh their knowledge of the maths used in economics, or who simply wish to keep their professional knowledge up to date.
Part I
Elements of linear algebra
Chapter 1
Linear spaces
1.1 The Euclidean space
One of the main uses of mathematics in economic theory is to construct the appropriate geometric and analytic generalizations of the two- or three-dimensional geometric models which are the mainstay of undergraduate economics courses. In this section we will study how to generalize the notions of points, lines, planes, distances and angles to n-dimensional Euclidean spaces.
The set of real numbers, denoted by R, plays a dominant role in mathematics. The
geometric representation of R is a straight line. A point, called the origin, is chosen
to represent 0 and another point, usually to the right of 0, to represent 1. There is
a natural correspondence between the points on the line and real numbers, i.e. each
point will represent a unique real number and each real number will be represented
by a unique point. For this reason we refer to R as the real line and use the words
point and number interchangeably.
[Figure: the real line R, with the origin O representing 0 and a point P representing 1]
We assume that the reader is familiar with the Cartesian plane. Each point P represents an ordered pair of real numbers (a, b) ∈ R^2, and each element of R^2 can be represented by a point in the Cartesian plane. The vertical line through the point P meets the horizontal axis (x-axis) at a, which is called the abscissa of P. The horizontal line through the point P meets the vertical axis (y-axis) at b, which is called the ordinate of P. For this reason we refer to
R^2 = {(a, b) | a ∈ R, b ∈ R}
as the plane and use the words plane and ordered pair of real numbers interchangeably.
[Figure: the Cartesian plane R^2, showing a point P with abscissa a on the x-axis and ordinate b on the y-axis]
If P(x_1, y_1) and Q(x_2, y_2) are two points in R^2 then the distance between them can be determined by using the Pythagorean theorem in a right triangle. We shall denote the distance between P and Q by d(P, Q).

[Figure: the segment PQ as the hypotenuse of a right triangle with legs of lengths x_2 - x_1 and y_2 - y_1]

d(P, Q) = √((x_2 - x_1)^2 + (y_2 - y_1)^2).
It is well known that two different points determine exactly one line.
Next, we will present different forms of the equation of a line.
The inclination of a line l is the angle θ that l makes with the horizontal axis: θ is the smallest positive angle measured counterclockwise from the positive end of the x-axis to the line l. The range of θ is given by 0° ≤ θ < 180°.
The slope m of the line is defined as the tangent of its angle of inclination:
m = tan θ.
In particular, the slope of the line passing through two points P(x_1, y_1) and Q(x_2, y_2) is given by:
m = tan θ = (y_2 - y_1)/(x_2 - x_1).
Remark. a) The slope of a horizontal line, i.e. when y_2 = y_1, is zero.
b) The slope of a vertical line, i.e. when x_2 = x_1, is not defined.
c) Two distinct lines l_1 and l_2 are parallel if and only if their slopes m_1 and m_2 are equal, i.e. m_1 = m_2.
d) Two distinct lines l_1 and l_2 are perpendicular if and only if the slope of one is the negative reciprocal of the other:
m_1 = -1/m_2, or m_1 · m_2 = -1.
Linear equations
Every line l in the Cartesian plane R^2 can be represented by a linear equation of the form
(l) ax + by + c = 0,
where a and b are not both zero, i.e. each point on l is a solution of (l) and each solution of (l) is a point on l.
Horizontal and vertical lines
The equation of a horizontal line, i.e. a line parallel to the x-axis, is of the form y = k, where k is the ordinate of the point at which the line intersects the y-axis. In particular, the equation of the x-axis is y = 0.
The equation of a vertical line, i.e. a line parallel to the y-axis, is of the form x = k, where k is the abscissa of the point at which the line intersects the x-axis. In particular, the equation of the y-axis is x = 0.
Point-slope form
A line is completely determined if we know its direction (its slope) and a point on the line.
The equation of the line having slope m and passing through the point (x_1, y_1) is
(l) y - y_1 = m(x - x_1).
Two-points form
Let P(x_1, y_1) and Q(x_2, y_2) be two different points in the Cartesian plane. The equation of the line which passes through these two points is:
(l) (y - y_1)/(y_2 - y_1) = (x - x_1)/(x_2 - x_1).
The previous equation was obtained by replacing m = (y_2 - y_1)/(x_2 - x_1) in the point-slope form of the equation.
1.2 Euclidean n-space
We can interpret the ordered pairs of R^2 not only as locations but also as displacements. We represent these displacements as vectors in R^2. The displacement (a, b) means: move a units to the right and b units up from the current location. The tail of the arrow marks the initial location; the head marks the location after the displacement is made.
To develop a geometric intuition for vector addition we can think in the following way. If u = (a, b) and v = (c, d) are two vectors in R^2, then u + v will represent a displacement of a + c units to the right and b + d units up.
[Figure: addition of two vectors u and v by the parallelogram rule]
We can use the parallelogram rule, as in the above figure, to draw u + v, keeping the tails of u and v at the same point.
It is generally not possible to multiply two vectors in a nice way that generalizes the multiplication of real numbers. For instance, coordinatewise multiplication does not satisfy the basic properties of the multiplication of real numbers. On the other hand, geometrically, scalar multiplication of a vector v by a nonnegative (negative) scalar corresponds to stretching or shrinking v without (with) changing its direction.
[Figure: multiplication of the vector u = (a, b) by the scalar 2, giving 2u = (2a, 2b)]
We will now generalize the previous discussion.
Let n ≥ 1 be an integer. By definition, R^n is the set of ordered n-tuples x = (x_1, x_2, ..., x_n) of real numbers, i.e.
R^n = {x = (x_1, x_2, ..., x_n) | x_1, x_2, ..., x_n ∈ R}.
The elements of R^n are called vectors and the numbers x_i, i = 1, n, are called the coordinates of x (x_i is the i-th coordinate of x).
The two fundamental operations (which generalize the addition of two vectors and the multiplication of a vector by a scalar) are:
1. addition of two vectors: if x, y ∈ R^n then
x + y := (x_1 + y_1, x_2 + y_2, ..., x_n + y_n) ∈ R^n;
2. scalar multiplication: if x ∈ R^n and λ ∈ R then
λx := (λx_1, λx_2, ..., λx_n) ∈ R^n.
Next, we will present an axiomatic concept based on the simplest properties of the previous operations.
Definition. A real vector space is a set V ≠ ∅ with an operation + : V × V → V called vector addition and an operation · : R × V → V called scalar multiplication with the following properties:
(i) (x + y) + z = x + (y + z), ∀x, y, z ∈ V;
(ii) there is a vector θ ∈ V (called the null vector) that is an identity element for addition:
x + θ = θ + x = x, ∀x ∈ V;
(iii) for any x ∈ V there is -x ∈ V such that
x + (-x) = (-x) + x = θ;
(iv) x + y = y + x, ∀x, y ∈ V;
(v) 1 · x = x, ∀x ∈ V;
(vi) for all x, y ∈ V and α, β ∈ R:
(a) (αβ)x = α(βx);
(b) (α + β)x = αx + βx;
(c) α(x + y) = αx + αy.
Remark. (V, +) is a commutative group.
Example. R^n is a vector space.
The proof of the previous example follows immediately by using the definitions of the operations in R^n.
We next show that any finite dimensional vector space is like R^n.
Let V be a vector space.
A linear subspace of V is a subset W ⊆ V that is itself a vector space, with vector addition and scalar multiplication defined by restriction of the given operations on V.
Remark. If W ⊆ V then W is a linear subspace if and only if the following two conditions are fulfilled:
a) W ≠ ∅;
b) ∀α, β ∈ R, ∀u, v ∈ W: αu + βv ∈ W.
If α_1, α_2, ..., α_n ∈ R and v_1, v_2, ..., v_n ∈ V then the sum
α_1 v_1 + ... + α_n v_n ∈ V
is called a linear combination of the vectors v_1, v_2, ..., v_n.
The span of a set S ⊆ V is the set of all linear combinations α_1 v_1 + ... + α_n v_n, where v_1, ..., v_n ∈ S and α_1, α_2, ..., α_n ∈ R:
span S = {α_1 v_1 + ... + α_n v_n | n ∈ N*, α_1, ..., α_n ∈ R, v_1, ..., v_n ∈ S}.
A set of vectors {v_1, ..., v_n} is called linearly independent if the vector equation
α_1 v_1 + α_2 v_2 + ... + α_n v_n = θ
has only the trivial solution α_1 = α_2 = ... = α_n = 0.
A set of vectors {v_1, ..., v_n} is called linearly dependent if the vector equation α_1 v_1 + ... + α_n v_n = θ has a nontrivial solution, that is, there are α_1, α_2, ..., α_n ∈ R, not all zero, such that α_1 v_1 + ... + α_n v_n = θ.
In other words, a set of vectors in a vector space is linearly dependent if and only if one vector can be written as a linear combination of the others.
A basis for V is a subset B ⊆ V that spans V (span B = V) and is minimal for this property, in the sense that there is no proper subset of B that spans V.
A basis is linearly independent. Conversely, a linearly independent set that spans V is a basis for V.
Remark. If V has a finite basis, then all bases have the same number of elements. In this case we say that V is finite dimensional, and the common number of elements of the bases of V is called the dimension of V.
If B = {v_1, ..., v_n} is a basis of V then each vector v ∈ V can be written as a linear combination
v = α_1 v_1 + α_2 v_2 + ... + α_n v_n
in exactly one way.
Example. a) The set B = {e_1, ..., e_n}, where
e_i = (0, ..., 0, 1, 0, ..., 0)
(1 in the i-th coordinate), i = 1, n, is a basis of the vector space R^n.
b) dim R^n = n.
Normed spaces
The concept of a norm is an abstract generalization of the length of a vector.
Definition. Let V be a vector space. A function ‖·‖ : V → R, x ↦ ‖x‖, is called a norm if it satisfies the following conditions:
N1) ‖x‖ ≥ 0, ∀x ∈ V; ‖x‖ = 0 ⟺ x = θ;
N2) ‖λx‖ = |λ| ‖x‖, ∀x ∈ V, ∀λ ∈ R;
N3) ‖x + y‖ ≤ ‖x‖ + ‖y‖, ∀x, y ∈ V.
A vector space V with a norm ‖·‖ is called a normed space and is denoted by (V, ‖·‖).
Inner product spaces
Definition. Let V be a vector space. A mapping ⟨·, ·⟩ : V × V → R is called an inner product on V if the following conditions are satisfied:
IP1) ⟨x, x⟩ ≥ 0, ∀x ∈ V; ⟨x, x⟩ = 0 ⟺ x = θ (positive definiteness);
IP2) ⟨x, y⟩ = ⟨y, x⟩, ∀x, y ∈ V (symmetry);
IP3) ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩, ∀x, y, z ∈ V, ∀α, β ∈ R (bilinearity).
A vector space with an inner product is called an inner product space.
Example. The canonical inner product on R^n is defined in the following way:
⟨·, ·⟩ : R^n × R^n → R,
⟨x, y⟩ = x_1 y_1 + x_2 y_2 + ... + x_n y_n, ∀x, y ∈ R^n.
The proof that the previous function satisfies all the properties of the previous definition is left to the reader (easy computations based on the properties of real numbers).
Example. The function ‖·‖ : R^n → R,
x ↦ ‖x‖ = √⟨x, x⟩,
is a norm on R^n, which is called the Euclidean norm.
Proof. The properties N1) and N2) are easy consequences of the properties of the inner product.
The property N3) follows from the Cauchy-Buniakovski-Schwarz (CBS) inequality:
|⟨x, y⟩| ≤ ‖x‖ · ‖y‖, ∀x, y ∈ R^n.
We consider first the following obvious inequalities:
0 ≤ ‖x + y‖^2 = ⟨x + y, x + y⟩ = ‖x‖^2 + ‖y‖^2 + 2⟨x, y⟩,
0 ≤ ‖x - y‖^2 = ⟨x - y, x - y⟩ = ‖x‖^2 + ‖y‖^2 - 2⟨x, y⟩,
wherefrom we get
-(‖x‖^2 + ‖y‖^2) ≤ 2⟨x, y⟩ ≤ ‖x‖^2 + ‖y‖^2,
hence
2|⟨x, y⟩| ≤ ‖x‖^2 + ‖y‖^2.
We can assume that x ≠ θ and y ≠ θ (if x = θ or y = θ the Cauchy-Buniakovski-Schwarz inequality is trivially true).
If we replace x by x/‖x‖ and y by y/‖y‖ in the previous inequality we obtain
2|⟨x/‖x‖, y/‖y‖⟩| ≤ ‖x/‖x‖‖^2 + ‖y/‖y‖‖^2 = 2,
wherefrom we have:
|⟨x, y⟩| ≤ ‖x‖ · ‖y‖,
as desired.
We are now able to prove N3):
‖x + y‖^2 = ‖x‖^2 + 2⟨x, y⟩ + ‖y‖^2 ≤ ‖x‖^2 + 2‖x‖·‖y‖ + ‖y‖^2 = (‖x‖ + ‖y‖)^2,
where the middle inequality follows from CBS.
Example (other norms on R^n).
1) ‖·‖_1 : R^n → R, ‖x‖_1 = |x_1| + ... + |x_n|, ∀x ∈ R^n;
2) ‖·‖_∞ : R^n → R, ‖x‖_∞ = max{|x_1|, ..., |x_n|}.
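To make the three norms concrete, here is a minimal Python sketch (an added illustration, assuming numpy; the vector is chosen arbitrarily):

    import numpy as np

    x = np.array([3.0, -4.0, 12.0])

    euclidean = np.sqrt(np.dot(x, x))   # Euclidean norm sqrt(<x, x>) = 13.0
    one_norm  = np.sum(np.abs(x))       # |x_1| + ... + |x_n| = 19.0
    max_norm  = np.max(np.abs(x))       # max |x_i| = 12.0

    print(euclidean, one_norm, max_norm)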
Metric spaces
Definition. If X ≠ ∅, a function d : X × X → R is called a distance on X if the following conditions are satisfied:
D1) d(x, y) ≥ 0, ∀x, y ∈ X; d(x, y) = 0 ⟺ x = y;
D2) d(x, y) = d(y, x), ∀x, y ∈ X (symmetry);
D3) d(x, z) ≤ d(x, y) + d(y, z), ∀x, y, z ∈ X (triangle inequality).
A metric space is a pair (X, d) in which X is a nonempty set and d is a distance on X.
Remark. Each normed space (V, ‖·‖) is a metric space, since the function
d : V × V → R, d(x, y) = ‖x - y‖, ∀x, y ∈ V,
satisfies the properties D1), D2), D3).
Example. In R^n the Euclidean distance is:
d(x, y) = √((x_1 - y_1)^2 + ... + (x_n - y_n)^2), ∀x, y ∈ R^n.
1.3 Quadratic forms
In this section we present the natural generalizations of linear and quadratic functions to several variables.
Linear operators
Definition. Let V and W be two real vector spaces, such that dim V = n and dim W = m.
A linear operator from V to W is a function T : V → W that preserves the vector space structure, i.e.
T(αx + βy) = αT(x) + βT(y), ∀x, y ∈ V, ∀α, β ∈ R.
A linear operator T : V → R is called a linear form.
Remark 1. Let T : R^n → R be a linear form. Then there exists a vector a = (a_1, ..., a_n)^t ∈ R^n such that T(x) = a^t x for all x ∈ R^n.
Proof. Let B = {e_1, ..., e_n} be the canonical basis of R^n.
Let a_i = T(e_i) ∈ R, i = 1, n. Then, for any vector x ∈ R^n,
x = x_1 e_1 + x_2 e_2 + ... + x_n e_n,
so
T(x) = T(x_1 e_1 + ... + x_n e_n) = x_1 T(e_1) + ... + x_n T(e_n) = x_1 a_1 + ... + x_n a_n = a^t x.
The previous remark implies that every linear form on R^n can be associated with a unique vector a ∈ R^n (or with a unique 1 × n matrix) so that T(x) = a^t x.
The same correspondence between linear operators and matrices is valid for linear operators from R^n to R^m.
Remark 2. Let T : R^n → R^m be a linear operator. Then there exists an m × n matrix A such that
T(x) = Ax, ∀x ∈ R^n.
Proof. The idea is the same as that of the previous remark. Let B = {e_1, ..., e_n} be the canonical basis of R^n.
For each j = 1, n, T(e_j) ∈ R^m, hence
T(e_j) = (a_1j, a_2j, ..., a_mj)^t.
Let A be the m × n matrix whose j-th column is the column vector T(e_j). For any x = x_1 e_1 + ... + x_n e_n ∈ R^n we have
T(x) = T(x_1 e_1 + ... + x_n e_n) = x_1 T(e_1) + ... + x_n T(e_n)
= x_1 (a_11, a_21, ..., a_m1)^t + x_2 (a_12, a_22, ..., a_m2)^t + ... + x_n (a_1n, a_2n, ..., a_mn)^t
= Ax.
So, we can say that matrices are representations of linear operators.
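The proof of Remark 2 is constructive, and it can be mirrored directly in code: build A column by column as T(e_j) and check that Ax reproduces T(x). In the sketch below (an added illustration assuming numpy) the operator T is an arbitrary example chosen only to demonstrate the construction.

    import numpy as np

    def T(x):
        # An arbitrary linear operator T : R^3 -> R^2 (illustrative choice).
        x1, x2, x3 = x
        return np.array([x1 + 2 * x2, 3 * x3 - x1])

    n = 3
    # The j-th column of A is T(e_j), exactly as in the proof of Remark 2.
    A = np.column_stack([T(np.eye(n)[j]) for j in range(n)])

    x = np.array([1.0, -1.0, 2.0])
    print(np.allclose(A @ x, T(x)))  # True: T(x) = Ax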
Quadratic forms
In mathematics, a quadratic form is a homogeneous polynomial of degree two in a number of variables.
Examples.
Q(x) = ax^2,
Q(x, y) = ax^2 + bxy + cy^2,
Q(x, y, z) = ax^2 + by^2 + cz^2 + dxy + exz + fyz
are quadratic forms in one, two and three variables, respectively.
Quadratic forms are associated to bilinear forms.
Definition. a) Let V and W be two real vector spaces such that dim V = n and dim W = m. The application (·|·) : V × W → R is called bilinear if it is linear with respect to each of its two variables, i.e.:
(αx + βy | z) = α(x|z) + β(y|z), ∀α, β ∈ R, ∀x, y ∈ V, ∀z ∈ W;
(x | αy + βz) = α(x|y) + β(x|z), ∀α, β ∈ R, ∀x ∈ V, ∀y, z ∈ W.
b) The bilinear form (·|·) : V × V → R is called symmetric if
(x|y) = (y|x), ∀x, y ∈ V.
c) Let V be a real vector space whose dimension is n and let (·|·) : V × V → R be a symmetric bilinear form. The application
Q : V → R, x ↦ Q(x) = (x|x), ∀x ∈ V,
is called a quadratic form on V.
Next, we determine the analytical expression of a quadratic form.
If B = {v_1, ..., v_n} is a basis of V then x can be uniquely expressed as
x = x_1 v_1 + x_2 v_2 + ... + x_n v_n.
Hence:
Q(x) = (x|x) = (x_1 v_1 + x_2 v_2 + ... + x_n v_n | x)
= x_1 (v_1|x) + x_2 (v_2|x) + ... + x_n (v_n|x)
= x_1 (v_1 | x_1 v_1 + ... + x_n v_n) + ... + x_n (v_n | x_1 v_1 + ... + x_n v_n)
= x_1 [x_1 (v_1|v_1) + ... + x_n (v_1|v_n)] + ... + x_n [x_1 (v_n|v_1) + ... + x_n (v_n|v_n)]
= x_1 Σ_{j=1}^n x_j (v_1|v_j) + ... + x_n Σ_{j=1}^n x_j (v_n|v_j)
= Σ_{i=1}^n (x_i Σ_{j=1}^n x_j (v_i|v_j)) = Σ_{i=1}^n Σ_{j=1}^n (v_i|v_j) x_i x_j.
In conclusion:
Q(x) = (x|x) = Σ_{i=1}^n Σ_{j=1}^n (v_i|v_j) x_i x_j.
If for each i, j = 1, n we denote a_ij = (v_i|v_j), then a_ij = a_ji (since (·|·) is symmetric) and
Q(x) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j
= a_11 x_1^2 + 2a_12 x_1 x_2 + ... + 2a_1n x_1 x_n + a_22 x_2^2 + ... + 2a_2n x_2 x_n + ... + a_nn x_n^2,
which is the analytical expression of the quadratic form Q.
Just as a linear form has a matrix representation, a quadratic form has a matrix representation, too.
Remark. The quadratic form Q : R^n → R,
Q(x) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j (a_ij = a_ji),
can be written as
Q(x) = (x_1, ..., x_n) A (x_1, ..., x_n)^t = x^t A x,
where A = (a_ij) is the (symmetric) matrix of the coefficients of the quadratic form Q.
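For instance, Q(x_1, x_2) = x_1^2 + 4x_1x_2 + 3x_2^2 has a_11 = 1, a_12 = a_21 = 2, a_22 = 3, and Q(x) = x^t A x can be evaluated directly. The snippet below is a small added sketch assuming numpy; the coefficients are this example's, not a general recipe.

    import numpy as np

    # Symmetric coefficient matrix of Q(x1, x2) = x1^2 + 4*x1*x2 + 3*x2^2;
    # the cross term contributes 2*a_12*x1*x2 = 4*x1*x2.
    A = np.array([[1.0, 2.0],
                  [2.0, 3.0]])

    def Q(x):
        return x @ A @ x  # x^t A x

    print(Q(np.array([1.0, -1.0])))  # 1 - 4 + 3 = 0.0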
Definiteness of quadratic forms
Definition. Let Q : R^n → R be a quadratic form. Then Q is
(a) positive definite if Q(x) > 0 for all x ∈ R^n, x ≠ θ;
(b) positive semidefinite if Q(x) ≥ 0 for all x ∈ R^n;
(c) negative definite if Q(x) < 0 for all x ∈ R^n, x ≠ θ;
(d) negative semidefinite if Q(x) ≤ 0 for all x ∈ R^n;
(e) indefinite if there are x, x' ∈ R^n such that Q(x) < 0 and Q(x') > 0.
Next, we will describe a simple test for the definiteness of a quadratic form. To present the test we need some definitions related to the coefficient matrix of Q.
Definition. a) Let A be an n × n matrix. An m × m submatrix of A formed by deleting n - m columns and the same n - m rows from A is called an m-th order principal submatrix of A. The determinant of an m × m principal submatrix is called an m-th order principal minor of A.
For an n × n matrix there are C_n^m m-th order principal minors of A.
b) Let A be an n × n matrix. The m-th order principal submatrix of A obtained by deleting the last n - m rows and the last n - m columns from A is called the m-th order leading principal submatrix of A.
We will denote the m-th order leading principal submatrix by A_m and the corresponding leading principal minor by det A_m.
The following remark provides an algorithm which uses the leading principal minors to determine the definiteness of a quadratic form Q whose coefficient matrix is A.
Remark 3. Let Q : R^n → R be a quadratic form whose coefficient matrix is A. Then
(a) Q is positive definite if and only if all its n leading principal minors are strictly positive, i.e.
det A_1 > 0, det A_2 > 0, ..., det A_n = det A > 0;
(b) Q is negative definite if and only if its n leading principal minors alternate in sign as follows:
det A_1 < 0, det A_2 > 0, ..., (-1)^n det A_n = (-1)^n det A > 0;
(c) Q is positive semidefinite if and only if every principal minor of A is nonnegative;
(d) Q is negative semidefinite if and only if every principal minor of odd order is nonpositive and every principal minor of even order is nonnegative;
(e) if there is an even number m ∈ {1, ..., n} such that det A_m < 0, or if there are two odd numbers m_1 and m_2 such that det A_{m_1} < 0 and det A_{m_2} > 0, then Q is indefinite.
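Before the proof, here is a small added sketch (assuming numpy; the matrix is an arbitrary example) that applies part (a) by computing the leading principal minors det A_1, ..., det A_n:

    import numpy as np

    def leading_principal_minors(A):
        # det A_1, det A_2, ..., det A_n for a square matrix A.
        n = A.shape[0]
        return [np.linalg.det(A[:m, :m]) for m in range(1, n + 1)]

    # Coefficient matrix of Q(x) = 2*x1^2 + 2*x1*x2 + 2*x2^2 + x3^2.
    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])

    minors = leading_principal_minors(A)
    print(minors)                       # [2.0, 3.0, 3.0]
    print(all(d > 0 for d in minors))   # True, so Q is positive definite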
Proof. We will prove part (a) (the proofs are similar for parts (b), (c) and (d)) by using induction on the size of A (the coefficient matrix of Q).
Suppose that all the leading principal minors are strictly positive. We have to show that Q is positive definite.
If n = 1 then the result is trivial.
If n = 2 then det A_1 = a_11 > 0, det A_2 = det A = a_11 a_22 - a_12^2 > 0, and hence:
Q(x) = (x_1, x_2) A (x_1, x_2)^t = a_11 x_1^2 + 2a_12 x_1 x_2 + a_22 x_2^2
= a_11 (x_1 + (a_12/a_11) x_2)^2 + ((a_11 a_22 - a_12^2)/a_11) x_2^2
= det A_1 (x_1 + (a_12/a_11) x_2)^2 + (det A_2/det A_1) x_2^2 > 0, ∀(x_1, x_2) ≠ (0, 0).
We suppose that the theorem is true for symmetric matrices of order k and prove it for symmetric matrices of order k + 1.
Let A be a symmetric matrix of order k + 1. We have to prove that if det A_j > 0, j = 1, k+1, then x^t A x > 0, ∀x ≠ θ. The matrix A can be written in block form as
A = [ A_k, a ; a^t, a_{k+1,k+1} ], where a = (a_{1,k+1}, a_{2,k+1}, ..., a_{k,k+1})^t.
If d = a_{k+1,k+1} - a^t A_k^{-1} a then we have
[ I_k, 0 ; (A_k^{-1} a)^t, 1 ] · [ A_k, 0 ; 0, d ] · [ I_k, A_k^{-1} a ; 0, 1 ]
= [ I_k, 0 ; (A_k^{-1} a)^t, 1 ] · [ A_k, a ; 0, d ]
= [ A_k, a ; a^t, (A_k^{-1} a)^t a + d ] = [ A_k, a ; a^t, a_{k+1,k+1} ] = A.
The previous equality can be written as A = C^t B C, where
C = [ I_k, A_k^{-1} a ; 0, 1 ] and B = [ A_k, 0 ; 0, d ].
Since det C = det C^t = 1 and det B = d · det A_k, then
det A = det B = d · det A_k.
Since det A > 0 and det A_k > 0, then d > 0.
Every x ∈ R^{k+1} can be written as x = (x', x_{k+1})^t, where x' ∈ R^k. Then
x^t A x = x^t C^t B C x = (Cx)^t B (Cx) = y^t B y
= ((y')^t, y_{k+1}) [ A_k, 0 ; 0, d ] (y', y_{k+1})^t = (y')^t A_k y' + d y_{k+1}^2.
In the previous equality we denoted the vector Cx by y = (y', y_{k+1})^t, which is not the null vector since C is invertible and x ≠ θ.
By using the inductive hypothesis and the fact that d > 0 we get that x^t A x > 0, hence Q is positive definite.
To prove the converse (Q positive definite implies that det A_j > 0, j = 1, n) we will use induction once more.
If n = 1 then the result is trivial.
If n = 2, then
Q(x) = det A_1 (x_1 + (a_12/a_11) x_2)^2 + (det A_2/det A_1) x_2^2.
In the previous equality we used the fact that a_11 ≠ 0: indeed, if a_11 = 0 then Q(1, 0) = 0 and Q cannot be positive definite.
It is obvious that if Q(x) > 0, ∀x ≠ θ, then det A_1 > 0 and det A_2 > 0.
Assume that the result is true for any quadratic form whose coefficient matrix has order k, and let A be the (k + 1) × (k + 1) coefficient matrix of a positive definite quadratic form.
Let x' ∈ R^k. If x = (x', 0)^t ∈ R^{k+1} then
0 < x^t A x = ((x')^t, 0) A (x', 0)^t = (x')^t A_k x'.
By the inductive hypothesis we obtain that det A_1 > 0, det A_2 > 0, ..., det A_k > 0.
It remains only to prove that det A = det A_{k+1} is positive.
We write the matrix A as in the first part of the proof. Hence
A = C^t B C and det A = d · det A_k.
We have to prove now that d > 0. Indeed, since Q is positive definite we have, with x = (0, ..., 0, 1)^t, that
d = (0, 0, ..., 1) [ A_k, 0 ; 0, d ] (0, ..., 0, 1)^t = x^t B x = x^t (C^{-1})^t A C^{-1} x = (C^{-1} x)^t A (C^{-1} x) > 0.
Since det A_k > 0 and d > 0, then det A = det A_{k+1} > 0, as desired.
Chapter 2
Linear Algebra
2.1 Systems of linear equations. The Gauss-Jordan elimination method
A finite set of linear equations in the variables x_1, x_2, ..., x_n ∈ R is called a system of linear equations, or a linear system. The general form of a linear system of m equations and n unknowns is the following:
a_11 x_1 + ... + a_1n x_n = b_1
...
a_m1 x_1 + ... + a_mn x_n = b_m
where a_ij ∈ R, b_i ∈ R, i = 1, m, j = 1, n.
A solution of the system is a list (s_1, ..., s_n) of numbers which makes each equation a true statement when the values s_1, ..., s_n are substituted for x_1, ..., x_n, respectively. The set of all possible solutions is called the solution set or the general solution of the linear system. Two linear systems are called equivalent if they have the same solution set; that is, each solution of the first system is a solution of the second system, and each solution of the second system is a solution of the first system.
A system of linear equations has either
1. no solution, or
2. exactly one solution, or
3. infinitely many solutions.
We say that a linear system is consistent if it has either one solution or infinitely many solutions; a system is inconsistent if it has no solution.
For a linear system we consider:
- the matrix of the system (the matrix of the coefficients of the unknowns):
A = (a_ij), i = 1, m, j = 1, n;
we also write A = (R_1; R_2; ...; R_m), where R_i = (a_i1, a_i2, ..., a_in) is the i-th row;
- the augmented matrix (the coefficient matrix with an added column containing the constants from the right sides of the equations):
Ā = [A | b];
- the column of the constants:
b = (b_1, ..., b_m)^t;
- the column of the unknowns:
x = (x_1, ..., x_n)^t.
By using the above matrix notations the system can be written in the following form:
Ax = b.
Concerning the solution set of a linear system we have the following result:
Remark. 1) If rank A < rank Ā then the linear system has no solution. The system is inconsistent.
2) If rank A = rank Ā = n (where n is the number of unknowns) then the system has exactly one solution. The system is consistent.
3) If rank A = rank Ā < n then the system has infinitely many solutions.
This chapter describes an algorithm, i.e. a systematic procedure, for solving linear systems. This algorithm is called the Gauss-Jordan elimination method, and its basic strategy is to replace one system with an equivalent one that is easier to solve.
If the equivalent system contains a degenerate linear equation of the form
0·x_1 + 0·x_2 + ... + 0·x_n = b_i,
then:
i) if b_i ≠ 0, the system is inconsistent;
ii) if b_i = 0, the degenerate equation may be deleted from the system without changing the solution set.
The method is named after the German mathematicians Carl Friedrich Gauss (1777-1855) and Wilhelm Jordan (1842-1899), but it appears in an important Chinese mathematical text written around 150 BCE.
The rectangle rule for row operations
The purpose of this paragraph is to transform a matrix which has a nonzero column into an equivalent one that contains one element equal to 1 and all the other elements equal to 0 (we say that such a column is in proper form).
This can be done by using the elementary row operations, which are:
1) Scaling: multiply all entries in a row by a nonzero constant; R_i → αR_i, α ≠ 0.
2) Replacement: replace one row by the sum of itself and a multiple of another row; R_i → R_i + αR_k.
3) Interchange: interchange two rows; R_i ↔ R_j.
Remark. If we apply the elementary row operations to the augmented matrix of a linear system we obtain a new matrix which is the augmented matrix of a linear system equivalent to the given one. This remark is true since it is well known that the solution of a system remains unchanged if we multiply one equation by a nonzero constant, or if we add a multiple of one equation to another, or if we interchange two equations of the system (the rows of an augmented matrix correspond to the equations of the associated system).
Let A be a matrix containing, among others, the entries a_ij, a_il, a_kj and a_kl:
A = [ ... ; ..., a_ij, ..., a_il, ... ; ..., a_kj, ..., a_kl, ... ; ... ].
Suppose that a_ij ≠ 0.
We want to determine the elementary row operations which transform the element a_ij into 1 (a_ij → 1) and all the other elements of the j-th column into 0 (a_kj → 0, k ≠ i).
We consider the following row operations:
R_i → (1/a_ij) R_i, so that a_ij → a_ij/a_ij = 1,
and
R_k → R_k - (a_kj/a_ij) R_i, so that a_kj → a_kj - a_ij · (a_kj/a_ij) = 0, k = 1, m, k ≠ i.
The effects of the previous elementary row operations on the other elements of the matrix are:
a_il → a_il/a_ij, l = 1, n, l ≠ j;
a_kl → a_kl - a_il · (a_kj/a_ij) = (a_kl · a_ij - a_il · a_kj)/a_ij, k ≠ i, l ≠ j.
The element a_ij ≠ 0 is called the pivot.
So, in order to transform the element a_kl (by using a_ij as a pivot), we locate the rectangle which contains the element a_kl and the pivot a_ij as opposite corners. Then, from the product of the elements situated in the corners of the diagonal which contains the pivot we subtract the product of the elements situated in the corners of the other diagonal, and the result is divided by the pivot (the rectangle's rule).
Remark. 1) The rows which contain 0 in the pivot column remain unchanged. Indeed, if a_kj = 0 then
a_kl → (a_kl · a_ij - a_il · 0)/a_ij = a_kl.
2) The columns which contain 0 in the pivot row remain unchanged. Indeed, if a_il = 0 then
a_kl → (a_kl · a_ij - 0 · a_kj)/a_ij = a_kl.
So, in order to transform a matrix which has a nonzero column into an equivalent one that contains one element equal to 1 and all the other elements equal to 0, we have to follow the next steps:
The rectangle's algorithm
Step 1. Choose and circle (from the considered column) a nonzero element, which is called the pivot.
Step 2. Divide the pivot row by the pivot.
Step 3. Set the elements of the pivot column (except the pivot) equal to 0.
Step 4. The rows which contain a 0 in the pivot column remain unchanged. The columns which contain a 0 in the pivot row remain unchanged.
Step 5. Compute all the other elements of the matrix by using the rectangle's rule.
Example (pivot: a_22 = 3).
A = [ 2, 0, 1, 1 ; 1, 3, 2, 0 ; -1, 1, 0, 2 ] → [ 2, 0, 1, 1 ; 1/3, 1, 2/3, 0 ; -4/3, 0, -2/3, 2 ]
Remark. The rectangle rule can be used to determine the inverse of a given invertible matrix A. This can be done by writing at the right side of the given matrix the identity matrix I which has the same number of rows and columns as the matrix A, and then applying the rectangle rule to the obtained matrix. By choosing successively the elements situated on the main diagonal of the matrix A as pivots, we will finally obtain the identity matrix I in the place of the given matrix A. The matrix situated at the right side of the identity matrix (in the final table) is the inverse of the matrix A.
We will illustrate the previous procedure by an example.
Example. Determine the inverse of the matrix A given by
A = [ 1, 2, 3 ; 2, 3, 1 ; 3, 1, 2 ].
We observe that the matrix A is invertible since its determinant is -18 ≠ 0.

A | I_3
1 2 3 | 1 0 0
2 3 1 | 0 1 0
3 1 2 | 0 0 1
1 2 3 | 1 0 0
0 -1 -5 | -2 1 0
0 -5 -7 | -3 0 1
1 0 -7 | -3 2 0
0 1 5 | 2 -1 0
0 0 18 | 7 -5 1
1 0 0 | -5/18 1/18 7/18
0 1 0 | 1/18 7/18 -5/18
0 0 1 | 7/18 -5/18 1/18

Hence
A^{-1} = [ -5/18, 1/18, 7/18 ; 1/18, 7/18, -5/18 ; 7/18, -5/18, 1/18 ].
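The result is easy to cross-check numerically; the following is a small added verification assuming numpy:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 3.0, 1.0],
                  [3.0, 1.0, 2.0]])

    print(round(np.linalg.det(A)))              # -18, so A is invertible
    A_inv = np.linalg.inv(A)
    print(np.round(A_inv * 18))                 # [[-5 1 7] [1 7 -5] [7 -5 1]]
    print(np.allclose(A @ A_inv, np.eye(3)))    # True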
The Gauss-Jordan elimination method
This method is an elimination procedure which transforms the initial system into an equivalent one whose solution can be obtained directly.
Gauss-Jordan elimination algorithm
Step 1. Associate to the given system the following table. The table contains the augmented matrix, with the constant column written at the left side of the matrix A.

b | x_1 x_2 ... x_n
b_1 | a_11 a_12 ... a_1n
b_2 | a_21 a_22 ... a_2n
... | ...
b_m | a_m1 a_m2 ... a_mn
Step 2. Choose and circle a pivot a_ij ≠ 0. The pivot has to be chosen from the coefficient matrix A, not from the constant column.
Use a_ij as a pivot to eliminate the unknown x_j from all the equations except the i-th equation (by applying the rectangle's algorithm).
Step 3. Examine each new row R obtained (or, equivalently, each new equation).
a) If R corresponds to the equation
0·x_1 + ... + 0·x_n = 0,
then delete R from the table.
b) If R corresponds to the equation
0·x_1 + 0·x_2 + ... + 0·x_n = b_i,
with b_i ≠ 0, then exit the algorithm. The system is inconsistent.
Step 4. Repeat steps 2 and 3 with the subsystem formed by all the equations from which a pivot hasn't been chosen yet.
Step 5. Continue the above process until we have chosen a pivot from each row or a degenerate equation is obtained at step 3b.
In the case of consistency (the system is consistent if we choose a pivot from each row) write the general solution. The solution set can be specified as follows:
- the variables whose columns are in proper form are called leading variables. If all the variables are leading variables then the system has a unique solution, which can be obtained directly from the column b;
- the variables whose columns are not in proper form may assume any values and are called secondary variables. If there is at least one secondary variable then the system has infinitely many solutions. In this case express the leading variables in terms of the secondary variables.
Example. Solve the following linear systems.
a)
x + 2y - 3z + 4t = 2
2x + 5y - 2z + t = 1
5x + 12y - 7z + 6t = 7
Solution

b | x y z t
2 | 1 2 -3 4
1 | 2 5 -2 1
7 | 5 12 -7 6
2 | 1 2 -3 4
-3 | 0 1 4 -7
-3 | 0 2 8 -14
8 | 1 0 -11 18
-3 | 0 1 4 -7
3 | 0 0 0 0

The system is inconsistent since we obtain the equation
3 = 0·x + 0·y + 0·z + 0·t.
b)
x - 2y + z = 7
2x - y + 4z = 17
3x - 2y + 2z = 14
Solution

b | x y z
7 | 1 -2 1
17 | 2 -1 4
14 | 3 -2 2
7 | 1 -2 1
3 | 0 3 2
-7 | 0 4 -1
0 | 1 2 0
-11 | 0 11 0
7 | 0 -4 1
2 | 1 0 0
-1 | 0 1 0
3 | 0 0 1

The system is consistent since we have chosen a pivot from each row. The leading variables are x, y, z and the system has the unique solution
(x, y, z)^t = (2, -1, 3)^t.
c)
x + 2y - 3z - 2s + 4t = 1
2x + 5y - 8z - s + 6t = 4
x + 4y - 4z + 5s + 2t = 8
Solution

b | x y z s t
1 | 1 2 -3 -2 4
4 | 2 5 -8 -1 6
8 | 1 4 -4 5 2
1 | 1 2 -3 -2 4
2 | 0 1 -2 3 -2
7 | 0 2 -1 7 -2
-3 | 1 0 1 -8 8
2 | 0 1 -2 3 -2
3 | 0 0 3 1 2
21 | 1 0 25 0 24
-7 | 0 1 -11 0 -8
3 | 0 0 3 1 2

The system is consistent since we have chosen a pivot from each row. The leading variables are x, y, s; the secondary variables are z and t, and in consequence the system has infinitely many solutions.
The general solution can be expressed as follows. From the final table we write down the following system, equivalent to the given one:
x + 25z + 24t = 21
y - 11z - 8t = -7
3z + s + 2t = 3
wherefrom we can easily express the leading variables in terms of the secondary variables:
x = 21 - 25z - 24t
y = -7 + 11z + 8t
s = 3 - 3z - 2t
with z, t ∈ R.
The general solution is:
(x, y, z, s, t)^t = (21 - 25z - 24t, -7 + 11z + 8t, z, 3 - 3z - 2t, t)^t, where z, t ∈ R.
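These hand computations can be cross-checked numerically. The sketch below (added for illustration, assuming numpy) confirms the inconsistency of system a) via ranks and the unique solution of system b):

    import numpy as np

    # System a): rank A < rank [A | b], so it is inconsistent.
    A = np.array([[1.0, 2.0, -3.0, 4.0],
                  [2.0, 5.0, -2.0, 1.0],
                  [5.0, 12.0, -7.0, 6.0]])
    b = np.array([2.0, 1.0, 7.0])
    print(np.linalg.matrix_rank(A),
          np.linalg.matrix_rank(np.column_stack([A, b])))   # 2 3

    # System b): square and nonsingular, so it has exactly one solution.
    A2 = np.array([[1.0, -2.0, 1.0],
                   [2.0, -1.0, 4.0],
                   [3.0, -2.0, 2.0]])
    b2 = np.array([7.0, 17.0, 14.0])
    print(np.linalg.solve(A2, b2))                           # [ 2. -1.  3.]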
Leontief Production Model
The Leontief production model is a model for the economy of a whole country or region. In this model there are n industries producing n different products, such that consumption equals production. We remark that a part of the production is consumed internally by the industries and the rest satisfies the outside demand.
The problem is to determine the levels of the outputs of the industries if the external demand is given and the prices are fixed. We will measure the levels of the outputs in terms of their economic values. Over some fixed period of time, let
x_i = the monetary value of the total output of the i-th industry;
d_i = the monetary value of the output of the i-th industry needed to satisfy the external demand;
c_ij = the monetary value of the output of the i-th industry needed by the j-th industry to produce one monetary unit of its own output.
We define the production vector
x = (x_1, x_2, ..., x_n)^t,
the demand vector
d = (d_1, d_2, ..., d_n)^t
and the consumption matrix
C = [ c_11, c_12, ..., c_1n ; c_21, c_22, ..., c_2n ; ... ; c_n1, c_n2, ..., c_nn ].
It is obvious that x_j, d_j, c_ij ≥ 0 for all i, j = 1, n.
The quantity c_i1 x_1 + c_i2 x_2 + ... + c_in x_n is the value of the output of the i-th industry needed by all n industries. We are led to the following equation:
x = Cx + d,
which is called the Leontief input-output model, or production model.
Writing x as I_n x and using matrix algebra, we can rewrite the previous equation as
I_n x - Cx = d
(I_n - C)x = d.
The above system can be solved by using the Gauss-Jordan elimination method. If the matrix I_n - C is invertible, then we obtain
x = (I_n - C)^{-1} d.
Example. As a simple example, suppose the economy consists of three sectors - manufacturing, agriculture and services - whose consumption matrix is given by
C = [ 0.5, 0.2, 0.1 ; 0.4, 0.3, 0.1 ; 0.2, 0.1, 0.3 ].
Suppose the external demand is 50 units for manufacturing, 30 units for agriculture and 20 units for services. Find the production level that will satisfy this demand.
Solution 1 (by using the Gauss-Jordan elimination method)
The production equation is
(I_3 - C)x = d,
which gives us the following system to be solved:
0.5x_1 - 0.2x_2 - 0.1x_3 = 50
-0.4x_1 + 0.7x_2 - 0.1x_3 = 30
-0.2x_1 - 0.1x_2 + 0.7x_3 = 20

b | x_1 x_2 x_3
50 | 0.5 -0.2 -0.1
30 | -0.4 0.7 -0.1
20 | -0.2 -0.1 0.7
500 | 5 -2 -1
300 | -4 7 -1
200 | -2 -1 7
500 | 5 -2 -1
-200 | -9 9 0
3700 | 33 -15 0
4100/9 | 3 0 -1
-200/9 | -1 1 0
10100/3 | 18 0 0
950/9 | 0 0 1
4450/27 | 0 1 0
5050/27 | 1 0 0

(In the final table the first row was multiplied by -1 to normalize the x_3 column.)

x_1 = 5050/27 ≈ 187, x_2 = 4450/27 ≈ 165, x_3 = 950/9 ≈ 106.
Solution 2 (by determining the inverse of the matrix I_3 - C)
We know that the production level is determined by
x = (I_3 - C)^{-1} d.
We first determine the matrix (I_3 - C)^{-1}.

I_3 - C | I_3
0.5 -0.2 -0.1 | 1 0 0
-0.4 0.7 -0.1 | 0 1 0
-0.2 -0.1 0.7 | 0 0 1
1 -2/5 -1/5 | 2 0 0
0 27/50 -9/50 | 4/5 1 0
0 -9/50 33/50 | 2/5 0 1
1 0 -1/3 | 70/27 20/27 0
0 1 -1/3 | 40/27 50/27 0
0 0 3/5 | 2/3 1/3 1
1 0 0 | 80/27 25/27 5/9
0 1 0 | 50/27 55/27 5/9
0 0 1 | 10/9 5/9 5/3

Hence,
(I_3 - C)^{-1} = [ 80/27, 25/27, 5/9 ; 50/27, 55/27, 5/9 ; 10/9, 5/9, 5/3 ]
and in consequence
x = (I_3 - C)^{-1} d = (5050/27, 4450/27, 950/9)^t,
as we expected.
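The whole computation amounts to solving (I - C)x = d, which takes a few lines; the following is an added numerical check assuming numpy:

    import numpy as np

    C = np.array([[0.5, 0.2, 0.1],
                  [0.4, 0.3, 0.1],
                  [0.2, 0.1, 0.3]])
    d = np.array([50.0, 30.0, 20.0])

    x = np.linalg.solve(np.eye(3) - C, d)   # production level solving (I - C)x = d
    print(x)                                 # approx [187.04 164.81 105.56]
    print(5050 / 27, 4450 / 27, 950 / 9)     # the exact values found above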
The theorem below shows that in most practical cases I - C is invertible and the production vector x is economically feasible, in the sense that the entries of x are nonnegative.
Theorem. Let C be the consumption matrix for an economy and let d be the vector of external demand. If C and d have nonnegative entries and if each row sum or each column sum of C is less than 1, then (I - C)^{-1} exists, and the production vector
x = (I - C)^{-1} d
has nonnegative entries and is the unique solution of the production equation
x = Cx + d.
Remark (the economic interpretation of the entries of (I - C)^{-1}). The (i, j)-th entry of the matrix (I - C)^{-1} is the increased amount the i-th sector has to produce in order to satisfy an increase of 1 unit in the external demand for sector j.
Proof. Let d be the vector in R^n with 1 in the j-th entry and zeros elsewhere. The corresponding production vector x is the j-th column of (I - C)^{-1}. This shows that the (i, j)-th entry of (I - C)^{-1} gives the production of the i-th sector needed to satisfy 1 unit of external demand for sector j. Now, the conclusion holds since, if x_1 and x_2 are production vectors which satisfy the external demands d_1 and d_2 respectively, then x_1 - x_2 is the production vector which satisfies the external demand d_1 - d_2.
Basic feasible solutions
We consider a linear system in general form:
a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
...
a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m.
We suppose that the above system is consistent, with an infinite number of solutions (that means rank A = rank Ā < n). Also, we suppose that rank A = m (if rank A < m then some equations of the system are linear combinations of the others, and by eliminating these equations we don't change the general solution).
Since rank A = rank Ā = m < n, the system will have m leading variables and n - m secondary variables. A leading variable is also called a basic variable and a secondary variable is called a nonbasic variable.
Definitions
A feasible solution (FS) of a linear system is a solution for which all the components are nonnegative.
A basic solution (BS) of a linear system is a solution for which all the nonbasic variables are zero.
If one or more basic variables in a BS are zero, then the solution is a degenerate BS.
A basic feasible solution (BFS) is a feasible solution which is also a basic one. If a BFS is degenerate, it is called a degenerate BFS.
Example. Determine all the basic solutions and all the basic feasible solutions of the following system:
2x_1 + 3x_2 - x_3 = 9
-x_1 + x_2 - x_3 = -2
Solution. Since
Ā = [ 2, 3, -1 | 9 ; -1, 1, -1 | -2 ],
we have rank A = rank Ā = 2, so the system is consistent with an infinite number of solutions. Actually, we have 2 basic variables and one nonbasic variable.
The 2 basic variables can be:
a) x_1, x_2 (x_3 is a nonbasic variable)
Since x_3 is nonbasic, x_3 = 0 and the system becomes
2x_1 + 3x_2 = 9
-x_1 + x_2 = -2
The solution of the previous system is x_1 = 3 and x_2 = 1.
In this case we obtain the BS (3, 1, 0)^t, which is also a BFS.
b) x_1, x_3 (x_2 is a nonbasic variable)
In this case we obtain the BS (11/3, 0, -5/3)^t, which is not a BFS.
c) x_2, x_3 (x_1 is a nonbasic variable)
In this case we obtain the BS (0, 11/2, 15/2)^t, which is a BFS.
Remark. For a consistent system having an infinite number of solutions, whose rank is m < n (n is the number of unknowns), there are at most C_n^m basic solutions.
Our purpose is to determine the basic feasible solutions of a linear system. We will use the Gauss-Jordan elimination method.
Since rank A = m, we have m basic variables and n - m nonbasic variables. A basic variable is a variable from whose column we have chosen a pivot; that means that we have chosen m pivots from m different columns and m different rows. In consequence, we choose a pivot from each row.
Eventually, by renumbering the unknowns, we can suppose that we have chosen pivots from the first m columns. So, we can suppose that the basic variables are x_1, ..., x_m and the nonbasic variables are x_{m+1}, ..., x_n.
The computations can be arranged in the following table.

b | x_1 ... x_m x_{m+1} ... x_n
b_1 | a_11 ... a_1m a_{1,m+1} ... a_1n
... | ...
b_m | a_m1 ... a_mm a_{m,m+1} ... a_mn
... | ...
β_1 | 1 ... 0 α_{1,m+1} ... α_{1,n}
... | ...
β_m | 0 ... 1 α_{m,m+1} ... α_{m,n}

The general solution is:
x_1 = β_1 - (α_{1,m+1} x_{m+1} + ... + α_{1,n} x_n)
x_2 = β_2 - (α_{2,m+1} x_{m+1} + ... + α_{2,n} x_n)
...
x_m = β_m - (α_{m,m+1} x_{m+1} + ... + α_{m,n} x_n)
x_{m+1}, ..., x_n ∈ R.
In order to get a basic solution we let
x_{m+1} = ... = x_n = 0, so x_1 = β_1, x_2 = β_2, ..., x_m = β_m.
The basic solution
x = (β_1, β_2, ..., β_m, 0, ..., 0)^t
is in the final table. This basic solution is also a basic feasible solution if, in the final table, the column of the constants contains only nonnegative elements.
Next, we will determine rules for choosing the pivot such that, if in the initial table the column of the constants is nonnegative, then so it will be in the final table. Actually, we are interested in preserving the property of the constant column to contain only nonnegative elements in each intermediate table which occurs when we solve the system.
We may assume that in the initial table the constant column is nonnegative (if there is an equation whose right-hand side constant is negative, then we can multiply it by -1).
We are interested in choosing a pivot from the j-th column such that the constant column in the next table remains nonnegative.
If we choose a_ij ≠ 0 as a pivot then b_i will transform into b_i/a_ij, which has to be nonnegative, too.
Since b_i ≥ 0 and b_i/a_ij ≥ 0, a_ij (the pivot) has to be positive.
If k ≠ i, then the element b_k will transform, by using the rectangle's rule, into
(b_k · a_ij - b_i · a_kj)/a_ij ≥ 0.
Since a_ij > 0, this requires b_k · a_ij - b_i · a_kj ≥ 0, k = 1, m, k ≠ i.
For k = i the previous inequality becomes
b_i · a_ij - b_i · a_ij = 0 ≥ 0,
so the pivot has to satisfy the following condition:
b_i · a_kj ≤ b_k · a_ij, k = 1, m. (*)
Let J_1 = {k = 1, m | a_kj > 0} and J_2 = {k = 1, m | a_kj ≤ 0}.
If k ∈ J_2 then (*) is satisfied, since b_i · a_kj ≤ 0 ≤ b_k · a_ij.
If k ∈ J_1 then (*) is equivalent to the following condition:
b_i/a_ij ≤ b_k/a_kj, k ∈ J_1.
So, (*) is satisfied if
b_i/a_ij = min{b_k/a_kj | k ∈ J_1},
where J_1 = {k = 1, m | a_kj > 0}.
The previous condition is called the ratio test.
Conclusion. In order to keep the nonnegativity property of the constant column we obtain the following rule for choosing a pivot in the j-th column.
1) The pivot has to be positive: a_ij > 0.
2) If J_1 = ∅ (in the j-th column there is no positive element) then none of the elements of the j-th column can become a pivot. In this case x_j can't be a basic variable.
If J_1 ≠ ∅ then the pivot will be the positive element situated in the j-th column for which the ratio test is satisfied.
The computation table contains an extra column, situated at the right-hand side of the usual table, for the ratio test.
Remark. If the ratio test is satisfied for more than one element, then the pivot will be the element which provides the minimum row with respect to the lexicographical order.
Let a = (a_1, ..., a_n) ∈ R^n and b = (b_1, ..., b_n) ∈ R^n. We say that a < b (in lexicographical order) if
a_1 < b_1, or
a_1 = b_1 and a_2 < b_2, or
a_1 = b_1, a_2 = b_2 and a_3 < b_3, or
...
a_1 = b_1, ..., a_{n-1} = b_{n-1} and a_n < b_n.
Examples. Determine a basic feasible solution for the following systems:
a)
2x_1 + 3x_2 - x_3 = 9
-x_1 + x_2 - x_3 = -2 | ·(-1)
First, we multiply the second equation by -1, in order to obtain a positive constant on the right-hand side:
2x_1 + 3x_2 - x_3 = 9
x_1 - x_2 + x_3 = 2

b | x_1 x_2 x_3 | ratio test
9 | 2 3 -1 |
2 | 1 -1 1 | min{9/2, 2/1} = 2
5 | 0 5 -3 |
x_1: 2 | 1 -1 1 |
x_2: 1 | 0 1 -3/5 |
x_1: 3 | 1 0 2/5 | BFS: x = (3, 1, 0)^t
x_2: 11/2 | 3/2 1 0 |
x_3: 15/2 | 5/2 0 1 | BFS: x = (0, 11/2, 15/2)^t

(Pivoting further on the x_3 column, whose only positive entry 2/5 lies in the x_1 row, replaces x_1 by x_3 in the basis and gives the second basic feasible solution.)
b)
2x_1 - x_2 - 3x_3 + x_4 = 5
x_1 - 2x_2 + x_3 + 2x_4 = 10

b | x_1 x_2 x_3 x_4 | ratio test
5 | 2 -1 -3 1 | min{5/1, 10/2} = 5 (tie, broken lexicographically)
10 | 1 -2 1 2 |
0 | 3/2 0 -7/2 0 | min{0/(3/2), 5/(1/2)} = 0
x_4: 5 | 1/2 -1 1/2 1 |
x_1: 0 | 1 0 -7/3 0 |
x_4: 5 | 0 -1 5/3 1 | BFS: x = (0, 0, 0, 5)^t, a degenerate BFS
c)
x + 2y - 3z - 2s + 4t = 1
2x + 5y - 8z - s + 6t = 4
x + 4y - 7z + 5s + 2t = 8

b | x y z s t | ratio test
1 | 1 2 -3 -2 4 | min{1/1, 4/2, 8/1} = 1
4 | 2 5 -8 -1 6 |
8 | 1 4 -7 5 2 |
x: 1 | 1 2 -3 -2 4 |
2 | 0 1 -2 3 -2 | min{2/3, 7/7} = 2/3
7 | 0 2 -4 7 -2 |
x: 7/3 | 1 8/3 -13/3 0 8/3 |
s: 2/3 | 0 1/3 -2/3 1 -2/3 |
7/3 | 0 -1/3 2/3 0 8/3 | min{(7/3)/(2/3)} = 7/2
x: 35/2 | 1 1/2 0 0 20 |
s: 3 | 0 0 0 1 2 |
z: 7/2 | 0 -1/2 1 0 4 | BFS: x = (35/2, 0, 7/2, 3, 0)^t
2.2 Linear programming problems (LPP)
Example. The diet problem
We want to determine the most economical diet which satisfies the basic minimum nutritional requirements for good health.
We know:
- there are n different kinds of food available: F_1, ..., F_n;
- food F_j sells at a price c_j per unit, j = 1, n;
- there are m basic nutritional ingredients N_1, ..., N_m;
- for a balanced diet each individual must receive at least b_i units of the i-th ingredient N_i per day, i = 1, m;
- each unit of food F_j contains a_ij units of the i-th ingredient.
We want:
- the amount x_j of food F_j (j = 1, n) such that the total cost of the diet is as small as possible.
The mathematical model is the following:
- the total cost:
f(x_1, x_2, ..., x_n) = c_1 x_1 + c_2 x_2 + ... + c_n x_n → minimize;
- the quantity of the i-th ingredient received by a person is:
a_i1 x_1 (from F_1) + a_i2 x_2 (from F_2) + ... + a_in x_n (from F_n) ≥ b_i, i = 1, m.
We have to solve the following problem (an example of a linear programming problem):
f = Σ_{j=1}^n c_j x_j → minimize (the objective function)
subject to the constraints:
a_11 x_1 + a_12 x_2 + ... + a_1n x_n ≥ b_1
...
a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n ≥ b_m
x_j ≥ 0, j = 1, n.
The main characteristic of an LPP is that all the involved functions - the objective function and those which express the constraints - must be linear.
Definition (general form of a linear programming problem)
An LPP is an optimization (minimization or maximization) problem of the following form:
Find the optimum (minimum or maximum) of the function
f = Σ_{j=1}^n c_j x_j
subject to the constraints:
Σ_{j=1}^n a_ij x_j ≤ b_i, i = 1, p,
Σ_{j=1}^n a_ij x_j ≥ b_i, i = p+1, q,
Σ_{j=1}^n a_ij x_j = b_i, i = q+1, m,
x_j ≥ 0, j = 1, n,
where c_j, b_i, a_ij, j = 1, n, i = 1, m, are known real numbers, and x_j, j = 1, n, are real numbers to be determined.
Depending on the particular values of p and q we may have inequality constraints of one type or the other, and equality constraints as well.
Definition (standard form of an LPP)
Optimize
f = Σ_{j=1}^n c_j x_j
subject to the constraints:
a_11 x_1 + ... + a_1n x_n = b_1
...
a_m1 x_1 + ... + a_mn x_n = b_m
x_j ≥ 0, j = 1, n,
where c_j, b_i, a_ij, j = 1, n, i = 1, m, are known real numbers and x_j, j = 1, n, are real numbers to be determined.
We can assume that b_i ≥ 0, i = 1, m (otherwise we multiply the corresponding equality by -1).
Remark. Any LPP can be converted to the standard form by using slack or surplus variables.
a) If a_i1 x_1 + a_i2 x_2 + ... + a_in x_n ≤ b_i, then we add to the left side a new variable y_i ≥ 0 in order to transform the inequality into an equality. We obtain:
a_i1 x_1 + a_i2 x_2 + ... + a_in x_n + y_i = b_i.
In this case y_i is called a slack variable.
b) If a_i1 x_1 + a_i2 x_2 + ... + a_in x_n ≥ b_i, then we subtract from the left side a new variable y_i ≥ 0 in order to transform the inequality into an equality. We obtain:
a_i1 x_1 + a_i2 x_2 + ... + a_in x_n - y_i = b_i.
In this case y_i is called a surplus variable.
Definition. Any solution of the constraints for which the optimum of the objective function is attained is called an optimal solution.
The graphical method for solving an LPP
When an LPP involves only 2 variables it can be solved by graphical procedures. The graphical approach is extremely helpful in understanding the kinds of phenomena which can occur in solving linear programming problems. We consider the case n = 2.
The feasible region is the set of points with coordinates (x, y) that satisfy all the constraints. Each constraint (inequality) represents a half-plane at one side of the line whose equation is the corresponding equality. So, the set of feasible solutions is the intersection of these half-planes.
Example 1. Determine the maximum of the function
f(x, y) = 3x + 2y subject to
−x + 2y ≤ 4
3x − y ≤ 3
x ≥ 0, y ≥ 0
[Figure: the feasible region with vertices (0, 0), (1, 0), (2, 3) and (0, 2), bounded by the lines 3x − y = 3 and −x + 2y = 4, together with the level lines 3x + 2y = 0 and 3x + 2y = 12.]
To solve this problem graphically we first shade the region in the graph in which all the feasible solutions must lie and then shift the position of the objective function line
f = 3x + 2y.
The objective function is linear, so its level curves are straight lines of equation
3x + 2y = c,  c constant.
The question is how big c can become so that the line of equation 3x + 2y = c still meets the above polygon (for a maximum problem).
The objective is to maximize over the level curves 3x + 2y = c. If we fix the value of c to be 0 at the beginning, we see that the level curve can be represented as a line of slope −3/2 that passes through the origin. Translating this objective line (i.e. moving it without changing its slope) is equivalent to choosing a different value for c. When the value of c increases, the corresponding line moves to the right, and hence we are interested in determining the greatest value of c such that the corresponding level curve still touches the set of feasible solutions.
Graphically it is not hard to see that the optimum value is attained at the vertex (2, 3), and the value of the maximum is 3·2 + 2·3 = 12.
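A two-variable problem like this one is also easy to check numerically. The sketch below (assuming SciPy is available) feeds Example 1 to scipy.optimize.linprog, which minimizes by convention, so the objective is negated:

    from scipy.optimize import linprog

    # maximize 3x + 2y  <=>  minimize -(3x + 2y)
    c = [-3, -2]
    A_ub = [[-1, 2],   # -x + 2y <= 4
            [ 3, -1]]  #  3x -  y <= 3
    b_ub = [4, 3]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)   # should print x = [2. 3.] and the value 12.0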
Remark (The corner point method for solving an LPP)
The following cases can arise for a maximization problem.
1) If the constraints are such that there is no feasible region, then there is no solution for the LPP.
2) If the objective function line can be moved indefinitely in a direction that increases f and still intersects the feasible region, then f approaches +∞.
3) If the objective function line can be moved only a finite amount by increasing the value of f (while still intersecting the feasible region), then the last point touched by the objective function line, if it is unique, gives the unique optimal solution. If it is not unique, then any point on the segment of the boundary last touched gives an optimal solution. In this case, if x′ and x″ are the endpoints of the segment, then the general solution is
(1 − t)x′ + t x″,  t ∈ [0, 1].
Example 2.
f(x, y) = 6x − 2y → maximize
subject to
−x + 2y ≤ 4
3x − y ≤ 3
x ≥ 0, y ≥ 0
Solution. In this case the level curve 6x − 2y = c is parallel to the line 3x − y = 3, hence the optimal solutions are situated on the segment whose endpoints are (1, 0) and (2, 3).
[Figure: the same feasible region with vertices (0, 0), (1, 0), (2, 3) and (0, 2), together with the parallel level lines 6x − 2y = 0 and 6x − 2y = 6, the latter containing the optimal segment from (1, 0) to (2, 3).]
Hence, the general solution is
(1 − t)(1, 0) + t(2, 3) = (1 + t, 3t),  t ∈ [0, 1].
Example 3.
f(x, y) = 5x + 4y → maximize
subject to
x + y ≤ 2
2x + 2y ≥ 9, i.e. x + y ≥ 9/2
x ≥ 0, y ≥ 0
Solution. In this case the set of feasible solutions is empty and hence the LPP has no solution.
[Figure: the half-plane x + y ≤ 2, below the line through (0, 2) and (2, 0), does not intersect the half-plane x + y ≥ 9/2, above the line through (0, 9/2) and (9/2, 0).]
Example 4.
f(x, y) = x − 4y → maximize
subject to
2x − y ≥ 1
x + 2y ≥ 2
x ≥ 0, y ≥ 0
Solution. In this case we have an unbounded feasible set. The objective function can become as large as we want, so the LPP is unbounded.
[Figure: the unbounded feasible region determined by the lines 2x − y = 1 (through (1/2, 0) and (0, −1)) and x + 2y = 2 (through (0, 1) and (2, 0)).]
Remark. A similar discussion can be made regarding a minimum LPP.
The Simplex algorithm
The Simplex algorithm was developed by George B. Dantzig and was used first for military purposes and, after the Second World War, in the business world. In the 1970s the Simplex algorithm was used to optimize production, profits and costs, and in game theory. George B. Dantzig is considered to be one of the three founders of linear programming, alongside John von Neumann and Leonid Kantorovich.
We will analyze only minimum problems; the results concerning maximum problems will only be stated.
Consider an LPP in standard form:
f = c_1 x_1 + ... + c_n x_n → minimize
subject to
a_11 x_1 + ... + a_1n x_n = b_1
...
a_m1 x_1 + ... + a_mn x_n = b_m
x_1 ≥ 0, ..., x_n ≥ 0
As we have already discussed, we can assume that
rank A = m and b_j ≥ 0, j = 1, ..., m.
Theorem (The fundamental theorem of linear programming)
Consider an LPP in standard form.
a) If there is no optimal solution, the problem is either infeasible or unbounded.
b) If there is an optimal solution, then there is an optimal basic feasible solution.
Remark. The previous theorem assures us that it is sufficient to consider only BFSs in our search for optimal solutions.
The idea of the simplex method is to start from one basic feasible solution of the constraint set and transform it into another one in order to decrease the value of the objective function until a minimum is reached.
We need a criterion to decide when the objective function cannot be decreased anymore, in which case we have found an optimal solution and no more iterative steps are needed.
Let x be an arbitrary BFS associated with the first part of the next table (the table looks this way, possibly after renumbering the equations and the unknowns).
           b           x_1  x_2  ...  x_m   x_{m+1}       ...  x_n
x_1        β_1          1    0   ...   0    α_{1,m+1}     ...  α_{1n}
x_2        β_2          0    1   ...   0    α_{2,m+1}     ...  α_{2n}
...
x_m        β_m          0    0   ...   1    α_{m,m+1}     ...  α_{mn}
x_{m+1}    β_1/α_{1,m+1}   1/α_{1,m+1}  0 ... 0    1      ...  α_{1n}/α_{1,m+1}
x_2        β̄_2          ...  1   ...   0    0             ...  ...
...
x_m        β̄_m          ...  0   ...   1    0             ...  ...

The basic variables are x_1, x_2, ..., x_m.
The nonbasic variables are x_{m+1}, ..., x_n.
The first part of the previous table gives us the following BFS:
x = (β_1, β_2, ..., β_m, 0, ..., 0)^t.
For each j = 1, ..., n, we define
f_j = Σ_{i=1}^m c_i α_ij,
where c_i, i = 1, ..., m, are the coefficients of the (basic variables in the) objective function.
For instance, we have
f_1 = c_1·1 + c_2·0 + ... + c_m·0 = c_1
...
f_m = c_1·0 + c_2·0 + ... + c_m·1 = c_m
f_{m+1} = c_1 α_{1,m+1} + c_2 α_{2,m+1} + ... + c_m α_{m,m+1}
...
f_n = c_1 α_{1n} + c_2 α_{2n} + ... + c_m α_{mn}
We are now able to present the main result.
Theorem (The optimality criterion for a minimum LPP). If for a basic feasible solution we have c_j − f_j ≥ 0, j = 1, ..., n, then the solution is optimal.
Proof. Let x be a BFS for which c_j − f_j ≥ 0, j = 1, ..., n.
We want to prove that any new BFS x̄ obtained by choosing a new pivot from the remaining columns (from m + 1 to n) is not better than x, that is, f(x̄) ≥ f(x).
Suppose we choose α_{1,m+1} as a pivot from the (m+1)-th column, so α_{1,m+1} satisfies the following conditions:
α_{1,m+1} > 0
β_1/α_{1,m+1} = min_k { β_k/α_{k,m+1} : α_{k,m+1} > 0 }
(the lexicographical order is respected).
By choosing the new pivot, x_1 leaves the basis and becomes a nonbasic variable, and x_{m+1} enters the basis and becomes a basic variable.
We now determine the new BFS:
β̄_1 = β_1/α_{1,m+1}
β̄_2 = β_2 − (β_1/α_{1,m+1}) α_{2,m+1}
β̄_3 = β_3 − (β_1/α_{1,m+1}) α_{3,m+1}
...
β̄_m = β_m − (β_1/α_{1,m+1}) α_{m,m+1}
The basic variables for x̄ are x_2 = β̄_2, ..., x_m = β̄_m and x_{m+1} = β̄_1.
The nonbasic variables for x̄ are x_1 = x_{m+2} = ... = x_n = 0.
Hence,
x̄ = (0, β̄_2, ..., β̄_m, β̄_1, 0, ..., 0)^t.
It remains to compute and compare f(x) and f(x̄).
f(x) = c_1 β_1 + ... + c_m β_m + c_{m+1}·0 + ... + c_n·0 = c_1 β_1 + ... + c_m β_m
f(x̄) = c_1·0 + c_2 (β_2 − (β_1/α_{1,m+1}) α_{2,m+1}) + ... + c_m (β_m − (β_1/α_{1,m+1}) α_{m,m+1}) + c_{m+1} β_1/α_{1,m+1} + c_{m+2}·0 + ... + c_n·0
     = c_2 β_2 + ... + c_m β_m + (β_1/α_{1,m+1}) [c_{m+1} − (c_2 α_{2,m+1} + ... + c_m α_{m,m+1})],
where c_2 β_2 + ... + c_m β_m = f(x) − c_1 β_1 and c_2 α_{2,m+1} + ... + c_m α_{m,m+1} = f_{m+1} − c_1 α_{1,m+1}. Hence
f(x̄) = f(x) − c_1 β_1 + (β_1/α_{1,m+1}) (c_{m+1} − f_{m+1} + c_1 α_{1,m+1})
     = f(x) − c_1 β_1 + (β_1/α_{1,m+1}) (c_{m+1} − f_{m+1}) + c_1 β_1.
So,
f(x̄) = f(x) + (β_1/α_{1,m+1}) (c_{m+1} − f_{m+1}) ≥ f(x),
since β_1/α_{1,m+1} ≥ 0 and c_{m+1} − f_{m+1} ≥ 0.
In conclusion, the basic feasible solution x cannot be improved (so it is optimal). This completes the proof.
From the previous theorem we get the following obvious corollary.
Corollary. If there exists l ∈ {1, ..., n} with c_l − f_l < 0 for a basic feasible solution, the value of the objective function can be decreased by choosing x_l as a basic variable.
The following two theorems characterize situations in which either an optimal solution does not exist or an existing optimal solution is not uniquely determined.
Theorem. If the inequality c_l − f_l < 0 holds for a nonbasic variable x_l and x_l cannot become a basic variable (the entire column of x_l is nonpositive, so we cannot choose a pivot from its column), then the LPP does not have an optimal solution.
In the latter case the objective function value is unbounded from below, and we can stop our computation.
Theorem. If there is l ∈ {1, ..., n} such that c_l − f_l = 0 for an optimal solution and x_l is a nonbasic variable which can become a basic variable (there is at least one positive element in its column), then there exists another optimal basic feasible solution (obtained by choosing x_l as a basic variable).
Indeed, if the assumptions of the previous theorem are satisfied, we can perform a further pivoting step with x_l as entering variable, and there is at least one basic variable which can be chosen as leaving variable. However, due to c_l − f_l = 0, the objective function value does not change.
Concerning a minimum LPP we have the following conclusions (regarding a BFS denoted by x):
1) If c_j − f_j ≥ 0 for each j ∈ {1, ..., n}, then x is an optimal solution and f_min = f(x).
2) If there is j ∈ {1, ..., n} such that c_j − f_j < 0 and J_1 = {k = 1, ..., m : α_kj > 0} = ∅, then the LPP is unbounded from below; f_min = −∞.
3) If there is j ∈ {1, ..., n} such that c_j − f_j < 0 (and J_1 ≠ ∅), then x is not an optimal solution. In this case we obtain a better solution x̄ (f(x̄) < f(x)) by choosing a pivot from the column of x_j.
4) If c_j − f_j ≥ 0 for each j ∈ {1, ..., n} and there is l ∈ {1, ..., n} such that c_l − f_l = 0 with x_l a nonbasic variable, then the solution x̄, obtained by choosing a pivot from the column of x_l, is optimal too.
Concerning a maximum LPP we have the following conclusions (regarding a BFS denoted by x):
1) If c_j − f_j ≤ 0 for each j ∈ {1, ..., n}, then x is an optimal solution and f_max = f(x).
2) If there is j ∈ {1, ..., n} such that c_j − f_j > 0 and J_1 = {k = 1, ..., m : α_kj > 0} = ∅, then the LPP is unbounded from above; f_max = +∞.
3) If there is j ∈ {1, ..., n} such that c_j − f_j > 0 (and J_1 ≠ ∅), then x is not an optimal solution. In this case we obtain a better solution x̄ (f(x̄) > f(x)) by choosing a pivot from the column of x_j.
4) If c_j − f_j ≤ 0 for each j ∈ {1, ..., n} and there is l ∈ {1, ..., n} such that c_l − f_l = 0 with x_l a nonbasic variable, then the solution x̄, obtained by choosing a pivot from the column of x_l, is optimal too.
Based on the results above, we can summarize the simplex algorithm as follows. Assume that we have some current basic feasible solution. The corresponding simplex tableau is given below.
           c         c_1    ...   c_j    ...   c_l    ...   c_n     ratio
c_B   B    b         x_1    ...   x_j    ...   x_l    ...   x_n     test
...
c_i   x_i  b_i       α_i1   ...   α_ij   ...   α_il   ...   α_in
...
c_k   x_k  b_k       α_k1   ...   α_kj   ...   α_kl   ...   α_kn
...
      f_j  f(x)      f_1    ...   f_j    ...   f_l    ...   f_n
   c_j − f_j         c_1−f_1 ...  c_j−f_j ...  c_l−f_l ...  c_n−f_n

In the first row and the first column of the previous table we write the coefficients of the corresponding variables in the objective function. The quantities
f_j = Σ_{k=1}^m c_k α_kj,  j = 1, ..., n,
are obtained by adding the corresponding products between the elements of column c_B and column x_j.
The simplex algorithm
1st step. Determine a BFS.
2nd step. Check the optimality of the current BFS.
3rd step. If the LPP is unbounded, exit the algorithm.
If the current BFS is optimal and unique, exit the algorithm.
If the current BFS is optimal and not unique, determine another optimal solution.
If the current BFS is not optimal, improve it.
4th step. Repeat steps 2 and 3 until all the optimal solutions are obtained.
Examples
1) A firm intends to manufacture three types of products P_1, P_2 and P_3 so that the total production cost does not exceed 32000 EUR. There are 400 working hours available and 30 units of raw materials may be used. Additionally, the data presented in the table below are given.

Product                            P_1    P_2    P_3
Selling price (EUR/piece)          1600   3000   5200
Production cost (EUR/piece)        1000   2000   4000
Required raw material (per piece)  3      2      2
Working time (hours per piece)     20     10     30
The objective is to determine the quantities of each product so that the profit is maximized. Let x_i be the number of produced pieces of P_i, i ∈ {1, 2, 3}. We can formulate the above problem as an LPP as follows:
The objective function is obtained by subtracting the production cost from the selling price and dividing the resulting profit by 100 for each product:
f(x_1, x_2, x_3) = (1/100)(1600x_1 + 3000x_2 + 5200x_3 − 1000x_1 − 2000x_2 − 4000x_3)
= 6x_1 + 10x_2 + 12x_3 → maximize
The constraint on the production cost can be divided by 1000:
1000x_1 + 2000x_2 + 4000x_3 ≤ 32000  | : 1000
and we obtain
x_1 + 2x_2 + 4x_3 ≤ 32.
The constraint on the working time can be divided by 10:
20x_1 + 10x_2 + 30x_3 ≤ 400  | : 10
and we obtain
2x_1 + x_2 + 3x_3 ≤ 40.
The constraint on raw materials is the following:
3x_1 + 2x_2 + 2x_3 ≤ 30.
So, we get the following LPP, written in general form:
f = 6x_1 + 10x_2 + 12x_3 → max
subject to
x_1 + 2x_2 + 4x_3 ≤ 32
3x_1 + 2x_2 + 2x_3 ≤ 30
2x_1 + x_2 + 3x_3 ≤ 40
x_1, x_2, x_3 ≥ 0
Introducing now in the i-th constraint, i = 1, 2, 3, the slack variable x_{3+i} ≥ 0, we obtain the standard form and the following table.
f = 6x_1 + 10x_2 + 12x_3 → max
subject to
x_1 + 2x_2 + 4x_3 + x_4 = 32
3x_1 + 2x_2 + 2x_3 + x_5 = 30
2x_1 + x_2 + 3x_3 + x_6 = 40
x_1, ..., x_6 ≥ 0
           c         6     10    12     0     0     0
c_B   B    b        x_1   x_2   x_3   x_4   x_5   x_6    ratio test
 0    x_4  32        1     2     4     1     0     0     min{32/4, 30/2, 40/3} = 8
 0    x_5  30        3     2     2     0     1     0
 0    x_6  40        2     1     3     0     0     1
      f_j  0         0     0     0     0     0     0
   c_j − f_j         6     10    12     0     0     0
12    x_3  8        1/4   1/2    1    1/4    0     0     min{8/(1/2), 14/1} = 14
 0    x_5  14       5/2    1     0   −1/2    1     0
 0    x_6  16       5/4  −1/2    0   −3/4    0     1
      f_j  96        3     6     12    3     0     0
   c_j − f_j         3     4     0    −3     0     0
12    x_3  1        −1     0     1    1/2  −1/2    0
10    x_2  14       5/2    1     0   −1/2    1     0
 0    x_6  23       5/2    0     0    −1    1/2    1
      f_j  152       13    10    12    1     4     0
   c_j − f_j        −7     0     0    −1    −4     0
Since now all coefficients c_j − f_j are nonpositive, we get the following optimal solution from the last table:
x_1 = 0, x_2 = 14, x_3 = 1, x_4 = 0, x_5 = 0, x_6 = 23.
Since there is no nonbasic variable for which c_j − f_j = 0, the solution is unique. This means that the optimal solution is to produce no pieces of product P_1, 14 pieces of product P_2 and one piece of product P_3. Taking into account that the coefficients of the objective function were divided by 100, we get a total profit of 15200 EUR.
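The tableau computation can be cross-checked with an off-the-shelf solver; a short sketch assuming SciPy is available:

    from scipy.optimize import linprog

    # maximize 6x1 + 10x2 + 12x3  <=>  minimize the negated objective
    c = [-6, -10, -12]
    A_ub = [[1, 2, 4],   # production cost (scaled): <= 32
            [3, 2, 2],   # raw material:             <= 30
            [2, 1, 3]]   # working time (scaled):    <= 40
    b_ub = [32, 30, 40]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
    print(res.x, -res.fun)   # expected: [0. 14. 1.] and 152.0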
2) We consider the following LPP:
f = −2x_1 − 2x_2 → min
subject to
x_1 − x_2 ≥ −1
−x_1 + 2x_2 ≤ 4
x_1, x_2 ≥ 0.
Solution. First, we transform the given problem into the standard form, i.e. we multiply the first constraint by −1 and introduce the slack variables x_3 and x_4. We obtain:
f = −2x_1 − 2x_2 → min
subject to
−x_1 + x_2 + x_3 = 1
−x_1 + 2x_2 + x_4 = 4
x_1, x_2, x_3, x_4 ≥ 0
           c        −2    −2     0     0
c_B   B    b        x_1   x_2   x_3   x_4    ratio test
 0    x_3  1        −1     1     1     0     min{1/1, 4/2} = 1
 0    x_4  4        −1     2     0     1
      f_j  0         0     0     0     0
   c_j − f_j        −2    −2     0     0
−2    x_2  1        −1     1     1     0
 0    x_4  2         1     0    −2     1     min{2/1} = 2
      f_j  −2        2    −2    −2     0
   c_j − f_j        −4     0     2     0
−2    x_2  3         0     1    −1     1
−2    x_1  2         1     0    −2     1
      f_j  −10      −2    −2     6    −4
   c_j − f_j         0     0    −6     4
Since there is only one negative coefficient of a nonbasic variable in the objective row, variable x_3 should be chosen as entering variable. However, there are only negative elements in the column belonging to x_3. This means that we cannot perform a further pivoting step, so there does not exist an optimal solution of the minimization problem considered, i.e. the objective function value is unbounded from below (f_min = −∞).
3) We consider the following LPP:
f = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 → min
subject to
2x_1 + x_2 + x_3 ≥ 4000
x_2 + 2x_4 + x_5 ≥ 5000
x_3 + 2x_5 + 3x_6 ≥ 3000
x_1, x_2, x_3, x_4, x_5, x_6 ≥ 0
Solution. To get the standard form, we notice that in each constraint there is one variable that occurs only in that constraint. Therefore, we divide the first constraint by the coefficient 2 of variable x_1, the second constraint by 2 and the third constraint by 3. Then we introduce a surplus variable in each of the constraints and obtain the standard form:
f = x_1 + x_2 + x_3 + x_4 + x_5 + x_6 → min
x_1 + (1/2)x_2 + (1/2)x_3 − x_7 = 2000
(1/2)x_2 + x_4 + (1/2)x_5 − x_8 = 2500
(1/3)x_3 + (2/3)x_5 + x_6 − x_9 = 1000
x_1, x_2, x_3, x_4, x_5, x_6, x_7, x_8, x_9 ≥ 0
           c         1     1     1     1     1     1     0     0     0
c_B   B    b        x_1   x_2   x_3   x_4   x_5   x_6   x_7   x_8   x_9    ratio test
 1    x_1  2000      1    1/2   1/2    0     0     0    −1     0     0
 1    x_4  2500      0    1/2    0     1    1/2    0     0    −1     0     min{2500/(1/2), 1000/(2/3)}
 1    x_6  1000      0     0    1/3    0    2/3    1     0     0    −1     = min{5000, 1500} = 1500
      f_j  5500      1     1    5/6    1    7/6    1    −1    −1    −1
   c_j − f_j         0     0    1/6    0   −1/6    0     1     1     1
 1    x_1  2000      1    1/2   1/2    0     0     0    −1     0     0     min{2000/(1/2), 1750/(1/2)}
 1    x_4  1750      0    1/2  −1/4    1     0   −3/4    0    −1    3/4    = 3500
 1    x_5  1500      0     0    1/2    0     1    3/2    0     0   −3/2
      f_j  5250      1     1    3/4    1     1    3/4   −1    −1   −3/4
   c_j − f_j         0     0    1/4    0     0    1/4    1     1    3/4
Now all the coefficients in the objective row are nonnegative, and from the last tableau we obtain the following optimal solution:
x_1 = 2000, x_2 = x_3 = 0, x_4 = 1750, x_5 = 1500, x_6 = 0
with the optimal objective function value f_min = 5250.
Notice that the optimal solution is not uniquely determined. In the last tableau there is one coefficient in the objective row equal to zero that corresponds to the nonbasic variable x_2. Taking x_2 as entering variable, the ratio test determines x_4 as the leaving variable, and we get:

 1    x_1  250       1     0    3/4   −1     0    3/4   −1     1   −3/4
 1    x_2  3500      0     1   −1/2    2     0   −3/2    0    −2    3/2
 1    x_5  1500      0     0    1/2    0     1    3/2    0     0   −3/2
      f_j  5250      1     1    3/4    1     1    3/4   −1    −1   −3/4
   c_j − f_j         0     0    1/4    0     0    1/4    1     1    3/4
So, we obtain the following basic feasible solution:
x_1 = 250, x_2 = 3500, x_3 = x_4 = 0, x_5 = 1500, x_6 = 0
with the same objective function value f_min = 5250.
The general solution is:
x(t) = (1 − t)(2000, 0, 0, 1750, 1500, 0)^t + t(250, 3500, 0, 0, 1500, 0)^t
     = (2000 − 1750t, 3500t, 0, 1750 − 1750t, 1500, 0)^t,  t ∈ [0, 1].
Matrix form of the simplex method
We now derive the formulas of the simplex method in matrix-vector form. In vector notation the standard problem becomes:
f(x) = c^t x → minimize
subject to
Ax = b,  x ≥ 0.
Here x is an n-dimensional column vector, c^t is an n-dimensional row vector, called the cost vector (the symbol c^t means the transpose of the vector c), A is an m×n matrix and b is an m-dimensional column vector. The vector inequality x ≥ 0 means that each component of x is nonnegative.
Let x be a basic feasible solution with the variables ordered so that
x = (x_B, 0)^t,  x_B ∈ R^m,  0 ∈ R^{n−m},  x_B ≥ 0,
where x_B is the vector of basic variables and 0 is the vector of nonbasic variables. In the same way, the matrix A, after the same permutations of columns, can be decomposed as
A = [B N].
B is the submatrix consisting of the m columns of A corresponding to the basic variables. These columns are linearly independent, hence the columns of B form a basis of R^m. The matrix B is invertible.
The equation Ax = b is equivalent to
[B N](x_B, 0)^t = b,  i.e.  B x_B = b,  x_B = B^{−1} b.
The cost of such a vector is
f(x) = (c_B^t, c_N^t)(x_B, 0)^t = c_B^t x_B = c_B^t B^{−1} b.
The basic step of the simplex method consists in moving to another basic feasible solution such that the cost is lowered.
The change from x = (x_B, 0)^t to x̄ = (x̄_B, x̄_N)^t must satisfy the following conditions:
1) A x̄ = b,
2) f(x̄) < f(x),
3) x̄ ≥ 0.
The first condition is equivalent to
x̄_B = x_B − B^{−1} N x̄_N.
Indeed, the equality
[B N](x̄_B, x̄_N)^t = b
implies
B x̄_B + N x̄_N = b
and hence
x̄_B = B^{−1}(b − N x̄_N) = x_B − B^{−1} N x̄_N.
The new cost f(x̄) will be
f(x̄) = c^t x̄ = (c_B^t, c_N^t)(x̄_B, x̄_N)^t = (c_B^t, c_N^t)(x_B − B^{−1} N x̄_N, x̄_N)^t
     = c_B^t (x_B − B^{−1} N x̄_N) + c_N^t x̄_N
     = (c_N^t − c_B^t B^{−1} N) x̄_N + c_B^t x_B
     = (c_N^t − c_B^t B^{−1} N) x̄_N + f(x).
The sign of (c_N^t − c_B^t B^{−1} N) x̄_N shows whether it is possible to decrease the cost by moving to the new vector x̄.
The vector c_N^t − c_B^t B^{−1} N is called the vector of reduced costs or the relative cost vector (for the nonbasic variables). If all the components of the vector of reduced costs are nonnegative, we cannot lower the cost anymore and x is optimal.
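These matrix formulas translate directly into a few lines of linear algebra. The sketch below uses made-up data A, b, c and a made-up basis, and computes x_B, the cost and the reduced-cost vector:

    import numpy as np

    A = np.array([[1.0, 2.0, 1.0, 0.0],
                  [3.0, 1.0, 0.0, 1.0]])
    b = np.array([8.0, 9.0])
    c = np.array([-2.0, -3.0, 0.0, 0.0])
    basic, nonbasic = [2, 3], [0, 1]

    B, N = A[:, basic], A[:, nonbasic]
    x_B = np.linalg.solve(B, b)            # x_B = B^{-1} b
    cost = c[basic] @ x_B                  # f(x) = c_B^t B^{-1} b
    reduced = c[nonbasic] - c[basic] @ np.linalg.solve(B, N)  # c_N^t - c_B^t B^{-1} N
    print(x_B, cost, reduced)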
Duality theory
Linear programming is based on the theory of duality. To each primal linear programming problem we can assign a dual linear programming problem. This section formulates and discusses the relationships between the primal and dual problems, which are important for optimality conditions, and offers a meaningful economic interpretation of the optimization model.
Motivation
We begin with an example.
Example 1.
f = x_1 + 3x_2 → minimize
subject to
x_1 + 3x_2 ≥ 4
4x_1 − x_2 ≥ 1
x_2 ≥ 3
x_1, x_2 ≥ 0
First we observe that every feasible solution provides an upper bound for the optimal objective function value f_min. For example, the solution (x_1, x_2) = (1, 3) tells us that f_min ≤ 1 + 3·3 = 10. But how close is this bound to the optimal value? To answer, we need to find lower bounds.
By multiplying the third constraint by 13 and adding it to the sum of the first two constraints we get
(x_1 + 3x_2) + (4x_1 − x_2) + 13x_2 ≥ 4 + 1 + 39,
which is equivalent to
5x_1 + 15x_2 ≥ 44.
Hence
44/5 ≤ f_min ≤ 10.
To get a better lower bound, we apply the same bounding technique again, but we replace the numbers used before with variables. So, we multiply the three constraints by nonnegative numbers y_1, y_2 and y_3.
Hence
y_1(x_1 + 3x_2) + y_2(4x_1 − x_2) + y_3 x_2 ≥ 4y_1 + y_2 + 3y_3
and
x_1(y_1 + 4y_2) + x_2(3y_1 − y_2 + y_3) ≥ 4y_1 + y_2 + 3y_3.
If we stipulate that each of the coefficients of the x_i is at most as large as the corresponding coefficient in the objective function,
y_1 + 4y_2 ≤ 1
3y_1 − y_2 + y_3 ≤ 3,
then
f = x_1 + 3x_2 ≥ 4y_1 + y_2 + 3y_3.
We now have a lower bound, 4y_1 + y_2 + 3y_3, which we should maximize in our effort to obtain the best possible lower bound.
Therefore, we are led to the following optimization problem:
g = 4y_1 + y_2 + 3y_3 → maximize
subject to
y_1 + 4y_2 ≤ 1
3y_1 − y_2 + y_3 ≤ 3
y_1, y_2, y_3 ≥ 0
This problem is called the dual linear programming problem associated with the given linear programming problem. Next we define the dual linear programming problem in general.
The dual problem. Symmetric form
Given an LPP in the form
(P)  f = Σ_{j=1}^n c_j x_j → minimize
subject to
Σ_{j=1}^n a_ij x_j ≥ b_i,  i = 1, 2, ..., m
x_j ≥ 0,  j = 1, 2, ..., n,
the associated dual linear programming problem is given by
(D)  g = Σ_{i=1}^m b_i y_i → maximize
subject to
Σ_{i=1}^m a_ij y_i ≤ c_j,  j = 1, 2, ..., n
y_i ≥ 0,  i = 1, 2, ..., m.
Since we started with the LPP (P), it is called the primal problem.
If we use the compact matrix notation
[ A | b ]
[ c^t |  ]
we have:
i) the minimization problem (P):
[ a_11 ... a_1n | b_1 ]
[ ...           | ... ]
[ a_m1 ... a_mn | b_m ]
[ c_1  ... c_n  |  ≥  ]
ii) the maximization problem (D):
[ a_11 ... a_m1 | c_1 ]
[ ...           | ... ]
[ a_1n ... a_mn | c_n ]
[ b_1  ... b_m  |  ≤  ]
Example 2. a) Find the dual of the given linear programming problem:
f = 6x_1 + 5x_2 + 7x_3 → minimize
subject to
3x_1 + x_2 + 2x_3 ≥ 3
2x_1 + 2x_2 − x_3 ≥ 5
x_1 + 2x_2 + x_3 ≥ 2
x_1, x_2, x_3 ≥ 0
Solution
(D)  g = 3y_1 + 5y_2 + 2y_3 → maximize
subject to
3y_1 + 2y_2 + y_3 ≤ 6
y_1 + 2y_2 + 2y_3 ≤ 5
2y_1 − y_2 + y_3 ≤ 7
y_1, y_2, y_3 ≥ 0
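As a numerical check of this primal-dual pair (a sketch assuming SciPy), one can solve both problems and compare the optimal values; by the duality theorems proved later in this section, they coincide:

    from scipy.optimize import linprog
    import numpy as np

    A = np.array([[3, 1, 2],
                  [2, 2, -1],
                  [1, 2, 1]])
    b = np.array([3, 5, 2])
    c = np.array([6, 5, 7])

    # Primal: min c@x, A@x >= b (i.e. -A@x <= -b), x >= 0
    p = linprog(c, A_ub=-A, b_ub=-b)
    # Dual: max b@y, A.T@y <= c, y >= 0 (minimize -b@y)
    d = linprog(-b, A_ub=A.T, b_ub=c)
    print(p.fun, -d.fun)   # the two optimal values should coincide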
b) The dual of the diet problem
The diet problem was the problem faced by a dietician: select a combination of foods that meets certain nutritional requirements at minimum cost. This problem has the form (see the first example of section 2.2):
f = Σ_{j=1}^n c_j x_j → minimize
subject to
Σ_{j=1}^n a_ij x_j ≥ b_i,  i = 1, ..., m
x_j ≥ 0,  j = 1, ..., n.
The dual problem is
g = Σ_{i=1}^m b_i y_i → maximize
subject to
Σ_{i=1}^m a_ij y_i ≤ c_j,  j = 1, ..., n
y_i ≥ 0,  i = 1, ..., m.
We describe an interpretation of the dual problem. Imagine a pharmaceutical company that produces the nutrients considered important by the dietician. The problem is to determine positive unit prices y_1, y_2, ..., y_m for the nutrients in order to maximize the revenue Σ_{i=1}^m b_i y_i while at the same time being competitive with real food. To be competitive with real food, the cost of a unit of food j synthesized by the pharmaceutical company must be at most c_j, that is, Σ_{i=1}^m a_ij y_i ≤ c_j.
Remark 1. The dual of the dual symmetric problem is the primal problem.
Proof. We must first write the dual problem in the form (P). To change a maximization into a minimization, we note that
max Σ_{i=1}^m b_i y_i = −min Σ_{i=1}^m (−b_i) y_i.
To change the direction of the inequalities, we simply multiply by −1.
The resulting equivalent representation of the dual problem in the form (P) is then:
minimize Σ_{i=1}^m (−b_i) y_i
subject to
Σ_{i=1}^m (−a_ij) y_i ≥ −c_j,  j = 1, ..., n
y_i ≥ 0,  i = 1, 2, ..., m.
Now we take its dual:
maximize Σ_{j=1}^n (−c_j) x_j,  i.e.  minimize Σ_{j=1}^n c_j x_j
subject to
Σ_{j=1}^n (−a_ij) x_j ≤ −b_i,  i = 1, 2, ..., m
x_j ≥ 0,  j = 1, 2, ..., n,
which is clearly equivalent to the primal problem (P).
It is always possible to obtain the dual of an LPP consisting of a mixture of equations, inequalities (in either direction), nonnegative variables or variables unrestricted in sign, by changing the system to an equivalent system of the form (P). However, an easier way is to apply certain rules, presented below.
Primal                                   Dual
Minimize primal objective                Maximize dual objective
Objective coefficients                   Right-hand side (RHS) of dual
RHS of primal                            Objective coefficients
Coefficient matrix                       Transposed coefficient matrix
Primal relation                          Dual variable
  i-th inequality: ≥                       y_i ≥ 0
  i-th inequality: ≤                       y_i ≤ 0
  i-th equation: =                         y_i unrestricted in sign
Primal variable                          Dual relation
  x_j ≥ 0                                  j-th inequality: ≤
  x_j ≤ 0                                  j-th inequality: ≥
  x_j unrestricted in sign                 j-th equation: =
The dual of the standard form
Applying the correspondence rules of the previous table, the dual of the standard form can be easily obtained.
Thus, the primal problem for the standard LPP is
(P′)  f = c_1 x_1 + ... + c_n x_n → minimize
subject to
a_11 x_1 + a_12 x_2 + ... + a_1n x_n = b_1
...
a_m1 x_1 + a_m2 x_2 + ... + a_mn x_n = b_m
x_1, x_2, ..., x_n ≥ 0
and the dual problem for the standard LPP is
(D′)  g = b_1 y_1 + b_2 y_2 + ... + b_m y_m → maximize
subject to
a_11 y_1 + a_21 y_2 + ... + a_m1 y_m ≤ c_1
...
a_1n y_1 + a_2n y_2 + ... + a_mn y_m ≤ c_n
y_1, ..., y_m unrestricted in sign.
The matrix form of the previous problem is the following:
If the primal problem is
f(x) = c^t x → minimize
subject to
Ax = b,  x ≥ 0,
then its dual is
g(y) = b^t y → maximize
subject to
A^t y ≤ c.
The dual variables y are unrestricted.
Remark 2. The dual of the dual of a primal LPP in standard form is itself the primal LPP in standard form.
Duality theorems
A duality theorem is a statement about the range of possible values of the primal problem versus the range of possible values of the dual problem. There are two major results relating the primal and dual problems. The first, called weak duality, states that primal objective values provide bounds for dual objective values, and vice versa. The second, called strong duality, states that the optimal values of the primal and dual problems are equal, provided that they exist. Since every linear programming problem can be converted to standard form, in the theoretical results below we work with primal linear programs in standard form.
Theorem (The weak duality theorem). Let x be a feasible solution for the primal problem in standard form, and let y be a feasible solution for the dual problem. Then f(x) ≥ g(y).
Proof. The constraints of the dual show that A^t y ≤ c. By transposing this inequality we get y^t A ≤ c^t. Since x ≥ 0,
f(x) = c^t x ≥ y^t A x = y^t b = g(y).
There are several simple consequences of the weak duality theorem.
Corollary 1. If the primal is unbounded, then the dual is infeasible. If the dual is unbounded, then the primal is infeasible.
Corollary 2. If x is a feasible solution to the primal problem, y is a feasible solution to the dual, and f(x) = g(y), then x and y are optimal for their respective problems.
The previous result shows that it is possible to check whether the vectors x and y are optimal without solving the primal and dual problems.
The previous theorem is called the weak duality theorem because it expresses only the bounding of the primal problem by the dual problem; it does not say that the bound is tight. The latter is expressed by the strong duality theorem.
Theorem (The strong duality theorem). Consider a pair of primal and dual linear programming problems. If one of the problems has an optimal solution, then so does the other, and the optimal values are equal.
Proof. We assume that:
- the primal problem is in standard form;
- the primal problem has an optimal basic feasible solution x.
By reordering the variables we can write x in terms of basic and nonbasic variables,
x = (x_B, 0)^t,
and correspondingly we have
A = [B N],  c = (c_B, c_N)^t  and  x_B = B^{−1} b.
Since x is optimal, c_N^t − c_B^t B^{−1} N ≥ 0.
Let y = (B^{−1})^t c_B.
We will show that y is a feasible solution and f(x) = g(y). Then Corollary 2 shows that y is optimal for the dual.
First we check feasibility:
y^t A = ((B^{−1})^t c_B)^t A = c_B^t B^{−1} A = c_B^t B^{−1} [B N] = (c_B^t, c_B^t B^{−1} N) ≤ (c_B^t, c_N^t) = c^t.
Taking the transpose of the previous inequality we get A^t y ≤ c, hence y satisfies the dual constraints.
f(x) = c^t x = c_B^t x_B = c_B^t B^{−1} b
g(y) = b^t y = (b^t y)^t = y^t b = c_B^t B^{−1} b.
So y is feasible for the dual and f(x) = g(y). Hence, by Corollary 2, y is optimal for the dual.
The previous proof provides the optimal dual solution.
If
x = (x_B, x_N)^t,  A = [B N]  and  c = (c_B, c_N)^t,
then the optimal values of the dual variables are given by
y = (B^{−1})^t c_B.
Remark 3. If the given linear programming problem has a complete set of slack variables, then the reduced costs of the slack variables are given by
c_N^t − c_B^t B^{−1} N = 0^t − c_B^t B^{−1} I = −((B^{−1})^t c_B)^t = −y^t,
because the objective coefficients (c_N^t) of the slack variables are zero, and their constraint coefficients (N) are given by the identity matrix I. In this case the values of the optimal dual variables are the opposites of the reduced costs of the slack variables.
More precisely:
To obtain the optimal values of the nonsurplus variables of the dual problem, negate the entries in the c_j − f_j row under the slack columns (of the primal problem). The slack column corresponding to the first constraint of the primal problem yields the first variable of the dual, and so on. To obtain the optimal surplus values for the dual, negate the entries in the c_j − f_j row under the nonslack columns. The column corresponding to the first nonslack variable yields the surplus variable associated with the first constraint of the dual problem, and so on.
Example 3.
f = −x_1 − 2x_2 → minimize
subject to
−2x_1 + x_2 ≤ 2
−x_1 + 2x_2 ≤ 7
x_1 ≤ 3
x_1, x_2 ≥ 0
The standard form of the previous problem is
f = −x_1 − 2x_2 → minimize
−2x_1 + x_2 + x_3 = 2
−x_1 + 2x_2 + x_4 = 7
x_1 + x_5 = 3
x_1, x_2, x_3, x_4, x_5 ≥ 0
           c        −1    −2     0     0     0
c_B   B    b        x_1   x_2   x_3   x_4   x_5    ratio test
 0    x_3  2        −2     1     1     0     0     min{2/1, 7/2} = 2
 0    x_4  7        −1     2     0     1     0
 0    x_5  3         1     0     0     0     1
      f_j  0         0     0     0     0     0
   c_j − f_j        −1    −2     0     0     0
−2    x_2  2        −2     1     1     0     0
 0    x_4  3         3     0    −2     1     0     min{3/3, 3/1} = 1
 0    x_5  3         1     0     0     0     1
      f_j  −4        4    −2    −2     0     0
   c_j − f_j        −5     0     2     0     0
−2    x_2  4         0     1   −1/3   2/3    0
−1    x_1  1         1     0   −2/3   1/3    0
 0    x_5  2         0     0    2/3  −1/3    1     min{2/(2/3)} = 3
      f_j  −9       −1    −2    4/3  −5/3    0
   c_j − f_j         0     0   −4/3   5/3    0
−2    x_2  5         0     1     0    1/2   1/2
−1    x_1  3         1     0     0     0     1
 0    x_3  3         0     0     1   −1/2   3/2
      f_j  −13      −1    −2     0    −1    −2
   c_j − f_j         0     0     0     1     2
Hence f_min = −13 and x^t_min = (3, 5, 3, 0, 0).
The dual problem is
g = 2y_1 + 7y_2 + 3y_3 → maximize
subject to
−2y_1 − y_2 + y_3 ≤ −1
y_1 + 2y_2 ≤ −2
y_1, y_2, y_3 ≤ 0
g_max = f_min = −13,  y^t_max = (0, −1, −2).
Remark 4. If the given linear programming problem has a complete set of surplus variables, then the reduced costs of the surplus variables are given by
c_N^t − c_B^t B^{−1} N = 0^t − c_B^t B^{−1} (−I) = ((B^{−1})^t c_B)^t = y^t,
because the objective coefficients (c_N^t) of the surplus variables are zero, and their constraint coefficients (N) are given by −I. In this case the values of the optimal dual variables are the same as the reduced costs of the surplus variables.
More precisely:
The optimal values of the nonslack variables of the dual problem are the entries in the c_j − f_j row under the surplus columns of the primal problem. The surplus column corresponding to the first constraint of the primal problem yields the first variable of the dual, and so on.
The optimal values of the slack variables of the dual problem are the entries in the c_j − f_j row under the nonsurplus columns of the primal problem. The column corresponding to the first nonsurplus variable yields the slack variable associated with the first constraint of the dual problem, and so on.
Example 4. We consider a very simple diet problem in which the nutrients are starch, protein and vitamins. The foods are two types of grain, with the data given below.

               Nutrient units/kg    Minimum daily requirement
               of grain type        of nutrient, in units
Nutrient       1        2
Starch         2        1           2
Protein        1        2           2
Vitamins       2        2           3
Cost (RON/kg)  5        4

Determine the most economical diet which satisfies the basic minimum nutritional requirements.
Solution. Let x_j be the amount in kg of grain j included in the daily diet, j = 1, 2; the vector x = (x_1, x_2)^t is the diet. Each nutrient leads to a constraint. For example, the amount of vitamins contained in the diet is 2x_1 + 2x_2, which must be ≥ 3.
The problem to be solved is the following:
f(x) = 5x_1 + 4x_2 → minimize
subject to
2x_1 + x_2 ≥ 2
x_1 + 2x_2 ≥ 2
2x_1 + 2x_2 ≥ 3
x_1 ≥ 0, x_2 ≥ 0
The simplex tableau associated with the standard LPP corresponding to (P) is:

           c         5     4     0     0     0
c_B   B    b        x_1   x_2   x_3   x_4   x_5    ratio test
           2         2     1    −1     0     0     min{2/2, 2/1, 3/2} = 1
           2         1     2     0    −1     0
           3         2     2     0     0    −1
 5    x_1  1         1    1/2  −1/2    0     0     min{1/(1/2), 1/(3/2), 1/1} = 2/3
           1         0    3/2   1/2   −1     0
           1         0     1     1     0    −1
 5    x_1  2/3       1     0   −2/3   1/3    0     min{(2/3)/(1/3), (1/3)/(2/3)} = 1/2
 4    x_2  2/3       0     1    1/3  −2/3    0
           1/3       0     0    2/3   2/3   −1
 5    x_1  1         1     0     0     1    −1     min{1/1, (1/2)/1} = 1/2
 4    x_2  1/2       0     1     0    −1    1/2
 0    x_3  1/2       0     0     1     1   −3/2
      f_j  7         5     4     0     1    −3
   c_j − f_j         0     0     0    −1     3
 5    x_1  1/2       1     0    −1     0    1/2
 4    x_2  1         0     1     1     0    −1
 0    x_4  1/2       0     0     1     1   −3/2
      f_j  13/2      5     4    −1     0   −3/2
   c_j − f_j         0     0     1     0    3/2

f_min = 13/2,  x^t = (1/2, 1, 0, 1/2, 0)
The dual problem is
g = 2y_1 + 2y_2 + 3y_3 → maximize
subject to
2y_1 + y_2 + 2y_3 ≤ 5
y_1 + 2y_2 + 2y_3 ≤ 4
y_1, y_2, y_3 ≥ 0
By using Remark 4 we obtain g_max = 13/2 and y^t = (1, 0, 3/2).
Complementary slackness
Theorem (Complementary slackness). Consider a pair of primal and dual linear programming problems with the primal problem in standard form.
a) If x is optimal for the primal and y is optimal for the dual, then
x^t (c − A^t y) = 0,  i.e.
Σ_{j=1}^n x_j (c_j − (A^t y)_j) = Σ_{j=1}^n x_j (c_j − Σ_{i=1}^m a_ij y_i) = 0.
b) If x is feasible for the primal, y is feasible for the dual, and
x^t (c − A^t y) = 0,
then x and y are optimal for their respective problems.
Proof. If x and y are feasible, then
f(x) = c^t x ≥ (A^t y)^t x = y^t A x = y^t b = g(y).    (*)
If x and y are optimal, then f(x) = g(y), so that
c^t x = y^t A x,
from which we easily get
x^t c = x^t A^t y  and  x^t (c − A^t y) = 0.
Conversely, if x and y are feasible and x^t (c − A^t y) = 0, then equality holds in (*), so f(x) = g(y), and Corollary 2 shows that x and y are optimal.
Example. We look again at the pair of LPPs given by Example 3:
f = −x_1 − 2x_2 → minimize
subject to
−2x_1 + x_2 + x_3 = 2
−x_1 + 2x_2 + x_4 = 7
x_1 + x_5 = 3
x_1, ..., x_5 ≥ 0
The optimal solutions are
x = (3, 5, 3, 0, 0)^t and y = (0, −1, −2)^t.
The dual constraints are
−2y_1 − y_2 + y_3 ≤ −1
y_1 + 2y_2 ≤ −2
y_1, y_2, y_3 ≤ 0
The complementary slackness theorem says
Σ_{j=1}^5 x_j (c_j − Σ_{i=1}^3 a_ij y_i) = 3[−1 − (−2·0 − 1·(−1) + 1·(−2))]
+ 5[−2 − (1·0 + 2·(−1))] + 3[0 − (1·0 + 0·(−1) + 0·(−2))]
+ 0[0 − (0·0 + 1·(−1) + 0·(−2))] + 0[0 − (0·0 + 0·(−1) + 1·(−2))]
= 3·0 + 5·0 + 3·0 + 0·1 + 0·2 = 0.
The complementary slackness theorem leads us to the following results.
Remark 5. Consider a pair of primal and dual linear programming problems. Let x be optimal for the primal and y optimal for the dual.
1) If x_j > 0, then
c_j − Σ_{i=1}^m a_ij y_i = 0.
In other words, if x_j is a basic variable then its reduced cost (or dual slack variable) is zero. Conversely, if a dual slack variable (reduced cost) is nonzero, then the associated primal variable is nonbasic and hence zero.
2) It is possible to have both x_j = 0 and c_j − Σ_{i=1}^m a_ij y_i = 0, for example when the problem is degenerate and one of the basic variables is zero.
3) For a symmetric pair of primal and dual linear programming problems
(P)  f = c^t x → minimize, subject to Ax ≥ b, x ≥ 0
(D)  g = b^t y → maximize, subject to A^t y ≤ c, y ≥ 0,
the complementary slackness conditions are
x^t (c − A^t y) = 0 and y^t (Ax − b) = 0.
The complementary slackness conditions have an economic interpretation:
Thinking in terms of the diet problem (see the first example of section 2.2), which is the primal part of a symmetric pair of dual problems, suppose that the optimal diet supplies more than b_j units of the j-th nutrient. This means that the dietician will not pay anything for small additional quantities of that nutrient, since more of it would not reduce the cost of the optimal diet. This implies y_j = 0, which is part 3) of Remark 5.
Marginal values. Shadow prices
Consider the LPP in standard form:
f(x) = c^t x → minimize
subject to
Ax = b,  x ≥ 0,
where A is an m×n matrix and rank A = m.
The marginal value (or shadow price) of a constraint i is defined to be the rate of change of the objective function as a result of a change in the value of b_i, the right-hand side of constraint i.
Suppose we keep all the other data of the problem fixed at their current values, except b_i. Then as b_i varies, the optimal objective value of the problem is a function of b_i, which we denote by F(b_i) (= c^t x). The marginal value of b_i in the problem is then F′(b_i).
By using the limit definition of the derivative,
F′(b_i) = lim_{h→0} (F(b_i + h) − F(b_i))/h ≈ F(b_i + 1) − F(b_i).
Using this approximation we can say that a shadow price is the amount by which the optimal value of the objective function would change if the right-hand side of a constraint were increased by one unit.
If the given LPP has a nondegenerate optimal basic feasible solution, then the dual problem has a unique optimal solution y and c^t x = b^t y. This equality can be used to show that the marginal value associated with b_i is y_i. Since the solution x is nondegenerate, small changes in any b_i will not change the optimal dual solution.
Under nondegeneracy, the change in the value of f for small changes in b_i is obtained by partially differentiating F(b_i) = b^t y with respect to b_i, as follows:
F′(b_i) = ∂(Σ_{i=1}^m b_i y_i)/∂b_i = y_i.
Example. [14] A mining company owns two different mines that produce a given kind of ore. The mines are located in different parts of the country and have different production capacities. After crushing, the ore is graded into three classes: high-grade, medium-grade and low-grade ore. There is some demand for each grade of ore. The mining company has contracted to provide a smelting plant with 12 tons of high-grade, 8 tons of medium-grade and 24 tons of low-grade ore. It costs the company $200 per day to run the first mine and $160 per day to run the second. However, in a day's operation the first mine produces 6 tons of high-grade, 2 tons of medium-grade and 4 tons of low-grade ore, while the second mine produces daily 2 tons of high-grade, 2 tons of medium-grade and 12 tons of low-grade ore. How many days should each mine be operated in order to fulfill the company's orders most economically?
Solution. First we summarize the problem in the following table:

              High-grade ore   Medium-grade ore   Low-grade ore   Cost ($/day)
Mine 1        6                2                  4               200
Mine 2        2                2                  12              160
Requirements  12               8                  24
Let x_1 be the number of days that mine 1 operates and x_2 the number of days that mine 2 operates.
f(x) = 200x_1 + 160x_2 → minimize
subject to
6x_1 + 2x_2 ≥ 12
2x_1 + 2x_2 ≥ 8
4x_1 + 12x_2 ≥ 24
x_1 ≥ 0, x_2 ≥ 0
The standard form of the previous problem is
f(x) = 200x_1 + 160x_2 → minimize
6x_1 + 2x_2 − x_3 = 12
2x_1 + 2x_2 − x_4 = 8
4x_1 + 12x_2 − x_5 = 24
x_1, x_2, x_3, x_4, x_5 ≥ 0
           c        200   160    0     0     0
c_B   B    b        x_1   x_2   x_3   x_4   x_5    ratio test
           12        6     2    −1     0     0     min{12/6, 8/2, 24/4} = 2
           8         2     2     0    −1     0
           24        4    12     0     0    −1
200   x_1  2         1    1/3  −1/6    0     0     min{2/(1/3), 4/(4/3), 16/(32/3)}
           4         0    4/3   1/3   −1     0     = min{6, 3, 3/2} = 3/2
           16        0   32/3   2/3    0    −1
200   x_1  3/2       1     0   −3/16   0    1/32   min{(3/2)/(1/32), 2/(1/8)}
           2         0     0    1/4   −1    1/8    = min{48, 16} = 16
160   x_2  3/2       0     1    1/16   0   −3/32
200   x_1  1         1     0   −1/4   1/4    0
 0    x_5  16        0     0     2    −8     1
160   x_2  3         0     1    1/4  −3/4    0
      f_j  680      200   160  −10   −70     0
   c_j − f_j         0     0    10    70     0
In conclusion, f_min = 680 and x_min = (1, 3, 0, 0, 16).
The minimum operating cost is $680, achieved by operating the first mine one day and the second mine three days.
If the mines are operated as indicated, then the combined production will be 6 + 2·3 = 12 tons of high-grade ore, 2 + 2·3 = 8 tons of medium-grade ore and 4 + 12·3 = 40 tons of low-grade ore.
We can observe that the low-grade ore is overproduced (by 16 tons).
The dual problem is
g = 12y_1 + 8y_2 + 24y_3 → maximize
subject to
6y_1 + 2y_2 + 4y_3 ≤ 200
2y_1 + 2y_2 + 12y_3 ≤ 160
From the simplex table and Remark 4 we get
y_1 = 10, y_2 = 70, y_3 = 0, y_4 = 0, y_5 = 0, g_max = 680.
The first step in interpreting the solution of the dual problem is determining the dimensions of the variables involved. We determine the dimensions of the variables of both the primal and the dual problems by the following two rules:
a) the dimension of x_j is the dimension of b_i divided by the dimension of a_ij, for any i;
b) the dimension of y_i is the dimension of c_j divided by the dimension of a_ij, for any j.
In our example we already know that the dimensions of x_1 and x_2 are days.
dimension of y_1 = dimension of c_1 / dimension of a_11 = ($/day) / (tons-Hg/day) = $/tons-Hg.
In the same way, the dimension of y_2 is $/tons-Mg and the dimension of y_3 is $/tons-Lg.
The next step is to look at the optimal dual solution and give its interpretation. We know that y_1 = 10 has dimension $/ton of high-grade ore, which sounds like the imputed cost of producing an additional ton of high-grade ore, and we shall show that this is the case. Suppose we increase the requirement for high-grade ore production from 12 to 16 tons.
The new problem is
f = 200x_1 + 160x_2 → minimize
subject to
6x_1 + 2x_2 ≥ 16
2x_1 + 2x_2 ≥ 8
4x_1 + 12x_2 ≥ 24
x_1 ≥ 0, x_2 ≥ 0
           c        200   160    0     0     0
c_B   B    b        x_1   x_2   x_3   x_4   x_5    ratio test
           16        6     2    −1     0     0     min{16/2, 8/2, 24/12} = 2
           8         2     2     0    −1     0
           24        4    12     0     0    −1
           12       16/3   0    −1     0    1/6    min{12/(1/6), 4/(1/6)} = 24
           4        4/3    0     0    −1    1/6
160   x_2  2        1/3    1     0     0   −1/12
           8         4     0    −1     1     0     min{8/4, 24/8, 4/1} = 2
 0    x_5  24        8     0     0    −6     1
160   x_2  4         1     1     0   −1/2    0
200   x_1  2         1     0   −1/4   1/4    0
 0    x_5  8         0     0     2    −8     1
160   x_2  2         0     1    1/4  −3/4    0
      f_j  720      200   160  −10   −70     0
   c_j − f_j         0     0    10    70     0
The optimal solution of the new problem is x_1 = 2, x_2 = 2.
Notice that the cost of production has increased from 680 to 720, an increase of 4y_1 = 4·10 = 40. Hence y_1 = 10 is the cost per ton of each additional ton of high-grade ore. The fact that y_3 = 0 (which has dimension $/ton of low-grade ore) says that the low-grade ore is free, in the sense that producing an additional ton has zero cost (because there is already an overproduction of 16 tons, so the additional ton costs nothing to produce since it already exists).
Remark 1 (The interpretation of a pair of symmetric dual linear programming problems).
a) For either problem the matrix A will be called the matrix of technological coefficients.
b) If the original problem is a minimization, we interpret x as the activity vector; b is interpreted as the requirements vector, whose components give the minimum amounts of each good that must be produced. The vector c is the cost vector, whose entries give the unit costs of each of the activities. The vector y (the solution of the dual problem) is the imputed-cost vector, whose components give the imputed costs of producing additional amounts of each of the required goods (provided the changes in requirements are sufficiently small that the dual solution remains optimal).
c) If the original problem is a maximization, we interpret x as the activity vector. Then the vector b is interpreted as the capacity-constraint vector, whose components give the amounts of resources that can be demanded by a given activity vector. The vector c is the profit vector, whose entries give the unit profits for each component of the activity vector x. The vector y is the imputed-value vector, whose entries give the imputed values of each of the resources that enter into the productive process (provided the changes in resources are sufficiently small that the dual solution remains optimal).
Remark 2. Consider the linear programming problem
f = c^t x → minimize
subject to
Ax = b,  x ≥ 0.
Assume that the optimal basis is B, with corresponding solution (x_B, 0), where x_B = B^{−1} b. A solution of the corresponding dual problem is y = (B^{−1})^t c_B.
Assuming nondegeneracy, small changes Δb in the vector b will not cause the optimal basis to change. Thus for b + Δb the optimal solution is x̄ = (x_B + Δx_B, 0), where Δx_B = B^{−1} Δb. The corresponding increment in the cost function is
Δf = Δg = c_B^t Δx_B = y^t Δb.
This equation shows that y gives the sensitivity of the optimal cost with respect to small changes in the vector b. If a new problem is solved with b changed to b + Δb, the change in the optimal value of the objective function will be y^t Δb, where y_i is the marginal price of the component b_i: if b_i is changed to b_i + Δb_i, the value of the optimal solution changes by y_i Δb_i.
Game theory
Game theory is a mathematical approach to problems of strategy such as one finds in operational research or economics. This theory is frequently and naturally used in everyday life.
Game theory is used to analyse situations where, for two or more individuals (or institutions), the outcome of an action by one of them depends not only on the action taken by that individual but also on the actions taken by the others. The strategies of individuals will depend on their expectations about what the others are doing. Such games are called games of strategy and the participants are called players. The players of such a game need to take into account the possible actions of the others when they make decisions.
Strategic thinking characterizes many human interactions. Here are some examples:
a) Two firms with large market shares in a particular industry making decisions with respect to price and output.
b) The decision of a firm to enter a new market where there is a risk that the existing firms will try to fight entry.
c) A criminal deciding whether or not to confess to a crime that he has committed with an accomplice who is also questioned by the police.
d) Family members arguing over the division of work within the household.
We shall focus on the simplest type of game, called the finite two-person zero-sum game, or matrix game for short.
Matrix games
A matrix game is a two-person game defined as follows. Each of two persons selects (independently) an action from a finite set of choices and both reveal to each other their choice. If we denote the first player's choice by i (i = 1, ..., m) and the second player's choice by j (j = 1, ..., l), then the rules of the game stipulate that the first player's payoff is a_ij. We shall refer to the first player as the row player (R) and to the second player as the column player (C).
The matrix of possible payments A = [a_ij], 1 ≤ i ≤ m, 1 ≤ j ≤ l, is known to both players before the game begins.
More explicitly: if a_ij > 0, C pays R an amount of a_ij; if a_ij < 0, R pays C an amount of |a_ij|; if a_ij = 0, no money is won or lost.
The main properties of a matrix game are:
- there are two players (two-person game);
- each player has finitely many choices of play, each makes one choice, and the combination of the two choices determines a payoff (finite game);
- what one player wins, the other loses (zero-sum game).
We now present some examples.
We present now some examples:
Example 1. Paper-Scissors-Rock game
This is a two-person game in which each player declares either Paper, Scissors or
Rock. If both players declare the same object, then the payo is 0. Paper loses to
Scissors since scissors can cut a piece of paper. Scissors loses to Rock since a rock
can dull scissors and nally Rock loses to Paper since a piece of paper can cover up
a rock. The payo is 1 in these cases.
The payo matrix is:
C player
Paper Scissors Rock
Paper
R player Scissors
Rock
_
_
0 1 1
1 0 1
1 1 0
_
_
Example 2. Morra game
The two players simultaneously show either one or two fingers and, at the same time, each player announces a number.
If the number announced by one of the players is the same as the total number of fingers shown by both players, then he wins that number from the opponent (if both players guess right, then the payment is zero).
Each player has four possible strategies. If he shows one finger, then he may guess two or three (he will never guess four in this case, since that guess could never win, so he eliminates it). If he shows two fingers, then he may guess three or four.
If we denote by R_ij and C_ij the strategy of showing i fingers and guessing the number j, then the following payoff matrix is associated with the Morra game:

        C_12  C_13  C_23  C_24
R_12  [   0     2    −3     0 ]
R_13  [  −2     0     0     3 ]
R_23  [   3     0     0    −4 ]
R_24  [   0    −3     4     0 ]
Example 3. Two stores, R and C, are planning to locate in one of two towns. Town 1 has 70 percent of the population, while town 2 has 30 percent. If both stores locate in the same town, they will split the total business of both towns equally, but if they locate in different towns, each will get the business of that town. Where should each store locate?
The payoff matrix is:

                   Store C locates in
                      1     2
Store R    1       [ 50    70 ]
locates in 2       [ 30    50 ]

The entries of the payoff matrix represent the percentages of business of store R (or the percentage losses of business by C).
It is easy to see that store R should prefer to locate in town 1, because by this choice R can assure himself of 20 percent more business than in town 2. Similarly, store C also prefers to locate in town 1, because he will lose 20 percent less business in town 1 than in town 2.
Hence the best strategies are for each store to locate in town 1.
By a strategy for R in a matrix game A we mean a decision by R to play the various rows with a given probability distribution, i.e. to play the first row with probability p_1, to play the second row with probability p_2, and so on.
This strategy for R is represented by the probability vector
p = (p_1, ..., p_m),  Σ_{k=1}^m p_k = 1,
where p_i, i = 1, ..., m, represents the probability of R choosing row i.
In the same way, by a strategy for C we mean a decision by C to play the various columns with a given probability distribution, i.e. to play the first column with probability q_1, to play the second column with probability q_2, and so on.
This strategy for C is represented by the probability vector
q = (q_1, ..., q_l),  Σ_{j=1}^l q_j = 1,
where q_j, j = 1, ..., l, represents the probability of C choosing column j.
A strategy which contains a 1 as a component (and consequently 0 everywhere else) is called a pure strategy; otherwise it is called a mixed strategy.
In the case of a pure strategy the player R (respectively the player C) decides to always play a given row (respectively a given column).
When R plays row i with probability p_i (i = 1, ..., m) and C plays column j with probability q_j (j = 1, ..., l), then the payoff a_ij is realized with probability p_i q_j. Hence the expected winnings of R are:
E(p, q) = p_1 q_1 a_11 + p_1 q_2 a_12 + ... + p_1 q_l a_1l + ... + p_m q_1 a_m1 + p_m q_2 a_m2 + ... + p_m q_l a_ml
        = Σ_{i=1}^m p_i Σ_{j=1}^l q_j a_ij = p A q^t
        = Σ_{j=1}^l q_j Σ_{i=1}^m p_i a_ij = q A^t p^t.
In conclusion:
E(p, q) = p A q^t = q A^t p^t.
The player R tries to choose a row i (i = 1, ..., m) such that the expected value of his winnings is maximal, no matter what column the player C chooses.
The player C tries to choose a column j (j = 1, ..., l) such that the expected value of his losses is minimal, no matter what row the player R chooses.
We say that the game with payoff matrix A has the value v, and we call p_0 and q_0 optimal strategies, if
E(p_0, q) ≥ v, for every strategy q for C   (*)
E(p, q_0) ≤ v, for every strategy p for R   (**)
Remark 1.
a) If p_0 is a given strategy for player R in a matrix game A, then the following two conditions are equivalent:
(i) E(p_0, q) ≥ v for every strategy q for C;
(ii) p_0 A ≥ (v, v, ..., v).
b) If q_0 is a given strategy for player C in a matrix game A, then the following two conditions are equivalent:
(i) E(p, q_0) ≤ v for every strategy p for R;
(ii) q_0 A^t ≤ (v, v, ..., v).
Proof. a) Assume that (i) holds and that p_0 A = (a_1, a_2, ..., a_l). Choosing the pure strategy q = (1, 0, ..., 0) we have
E(p_0, q) = p_0 A q^t = (a_1, a_2, ..., a_l)(1, 0, ..., 0)^t = a_1 ≥ v.
Similarly a_2 ≥ v, ..., a_l ≥ v. In other words,
p_0 A ≥ (v, v, ..., v).
On the other hand, assume that (ii) holds. Then, for any strategy q for C,
E(p_0, q) = p_0 A q^t ≥ (v, v, ..., v)(q_1, q_2, ..., q_l)^t = Σ_{j=1}^l v q_j = v·1 = v.
b) Assume that (i) holds and that q_0 A^t = (b_1, b_2, ..., b_m). Choosing the pure strategy p = (1, 0, ..., 0) we have
E(p, q_0) = q_0 A^t p^t = (b_1, b_2, ..., b_m)(1, 0, ..., 0)^t = b_1 ≤ v.
Similarly b_2 ≤ v, ..., b_m ≤ v. In other words,
q_0 A^t ≤ (v, v, ..., v).
On the other hand, assume that (ii) holds. Then, for any strategy p for R,
E(p, q_0) = q_0 A^t p^t ≤ (v, v, ..., v)(p_1, p_2, ..., p_m)^t = Σ_{i=1}^m v p_i = v·1 = v.
As we can observe from the previous proof:
- the inequality p_0 A ≥ (v, v, ..., v) can be written as E(p_0, q) ≥ v for every pure strategy q for C;
- the inequality q_0 A^t ≤ (v, v, ..., v) can be written as E(p, q_0) ≤ v for every pure strategy p for R.
In view of the previous remark, we say that the game with payoff matrix A has the value v, and p_0, q_0 are optimal strategies, if
p_0 A ≥ (v, v, ..., v)   (*′)
q_0 A^t ≤ (v, v, ..., v)   (**′)
We conclude this subsection by proving three results that characterize the value and the optimal strategies of a game.
Remark 2. If A is a matrix game that has a value and optimal strategies, then the value of the game is unique.
Proof. Suppose that v and w are two values for the matrix game A. If p_0 and q_0 are optimal strategy vectors associated with the value v, then
(i) p_0 A ≥ (v, v, ..., v)
(ii) q_0 A^t ≤ (v, v, ..., v).
If p_1 and q_1 are optimal strategy vectors associated with the value w, then
(iii) p_1 A ≥ (w, w, ..., w)
(iv) q_1 A^t ≤ (w, w, ..., w).
If we multiply (i) on the right by (q_1)^t, we get
p_0 A (q_1)^t ≥ Σ_j v q_1j = v.
In the same way, multiplying (iv) on the right by (p_0)^t gives
p_0 A (q_1)^t = (p_0 A (q_1)^t)^t = q_1 A^t (p_0)^t ≤ Σ_i w p_0i = w.
The two inequalities obtained before show that w ≥ v.
Similarly, if we multiply (iii) on the right by (q_0)^t and (ii) on the right by (p_1)^t, we obtain p_1 A (q_0)^t ≥ w and p_1 A (q_0)^t ≤ v, which together imply v ≥ w.
In consequence w = v, which completes the proof.
Remark 3. If A is a matrix game with value v and optimal strategies p_0 and q_0, then v = p_0 A (q_0)^t.
Proof. The following inequalities hold:
p_0 A ≥ (v, v, ..., v) and q_0 A^t ≤ (v, v, ..., v).
Multiplying the first of these inequalities on the right by (q_0)^t, we get
p_0 A (q_0)^t ≥ v.
Similarly, multiplying the second inequality on the right by (p_0)^t, we obtain
p_0 A (q_0)^t = (p_0 A (q_0)^t)^t = q_0 A^t (p_0)^t ≤ v.
These two inequalities together imply that
v = p_0 A (q_0)^t.
The previous two remarks allow us to interpret the value of a game as an expected value in the following way: if the matrix game is played repeatedly and if each time the player R chooses the p_0 strategy and the player C chooses the q_0 strategy, then the value of the matrix game A is the expected value of the game for R.
Remark 4. If A is a matrix game with value v and optimal strategies p_0 and q_0, then v is the largest expectation that R can assure for himself and the smallest loss that C can assure for himself.
Proof. Let p be any strategy vector of R; then, multiplying the inequality q_0 A^t ≤ (v, ..., v) on the right by p^t, we get
p A (q_0)^t = (q_0 A^t p^t)^t = q_0 A^t p^t ≤ v.
So, if C plays optimally, the most that R can obtain for himself is v.
On the other hand, since v = p_0 A (q_0)^t, R can obtain for himself an expectation of v.
The proof of the other statement of the remark is similar.
The previous remark tells us that the value of a game is the best that a player can obtain for himself (by using the optimal strategies).
Strictly determined games. Saddle points
A matrix game is strictly determined if the matrix has an entry which is a minimum in its row and a maximum in its column; such an entry is called a saddle point.
Remark 5. Let v be a saddle point of a strictly determined game. Then an optimal strategy for R is to play the row containing v, an optimal strategy for C is to play the column containing v, and v is the value of the game.
Proof. Suppose v = a_ij, so p_0 = (0, 0, ..., 1, ..., 0) (1 is the i-th coordinate of p_0) and q_0 = (0, ..., 1, ..., 0) (1 is the j-th coordinate of q_0).
We now show that p_0, q_0 and v satisfy the required properties for optimal strategies and the value of the game. Indeed,
p_0 A = (0, ..., 1, ..., 0) A = (a_i1, ..., a_ij, ..., a_il) ≥ (a_ij, ..., a_ij) = (v, ..., v)
q_0 A^t = (0, ..., 1, ..., 0) A^t = (a_1j, ..., a_ij, ..., a_mj) ≤ (a_ij, ..., a_ij) = (v, ..., v).
Thus in a strictly determined game a pure strategy for each player is an optimal strategy:
- for player R: choose a row that contains a saddle value;
- for player C: choose a column that contains a saddle value.
Example 4. We consider a generalization of Example 3 in which the stores R and C are trying to locate in one of the three towns in the figure below.

[Figure: Town 1, Town 2 and Town 3, with business volumes 50, 30 and 20 respectively, joined by roads of 30 km, 18 km and 24 km.]

If both stores locate in the same town they split all business equally, but if they locate in different towns then all the business in the town that doesn't have a store will go to the closer of the two stores.

The payoff matrix for this game is the following:

                    Store C locates in
                      1     2     3
  Store R     1    [ 50    50    80 ]
  locates in  2    [ 50    50    80 ]
              3    [ 20    20    50 ]

If we circle the minimum entry in each row and put a square around the maximum entry in each column, each of the four 50 entries in the $2 \times 2$ submatrix in the upper left-hand corner is both circled and boxed, and so is a saddle value of the matrix. Hence the game is strictly determined, and optimal strategies are:

for store R: locate in town 1 or locate in town 2, represented by the vectors $(1, 0, 0)$ and $(0, 1, 0)$ respectively; combining the previous two strategies we get the following mixed strategy: locate in town 1 with probability $p$ and locate in town 2 with probability $1-p$, represented by the vector
$$p(1, 0, 0) + (1-p)(0, 1, 0) = (p, 1-p, 0), \quad 0 < p < 1;$$

for store C: locate in town 1, locate in town 2, or locate in town 1 with probability $q$ and in town 2 with probability $1-q$, represented by the vectors $(1, 0, 0)$, $(0, 1, 0)$ and $(q, 1-q, 0)$.
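The saddle-point test used above is easy to mechanize. The following is a minimal Python sketch, assuming NumPy is available (the helper name saddle_points is ours, not part of any library); it is checked here on the store-location matrix.

import numpy as np

def saddle_points(A):
    """Return all (i, j) such that A[i, j] is a minimum of row i
    and a maximum of column j, i.e. a saddle point of the game."""
    A = np.asarray(A, dtype=float)
    points = []
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            if A[i, j] == A[i, :].min() and A[i, j] == A[:, j].max():
                points.append((i, j))
    return points

A = [[50, 50, 80],
     [50, 50, 80],
     [20, 20, 50]]
print(saddle_points(A))   # [(0, 0), (0, 1), (1, 0), (1, 1)]: the four 50-entries

A nonempty result means the game is strictly determined and its value is the common saddle entry.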
2 × 2 matrix games

Consider the matrix game
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}.$$
If $A$ is strictly determined, then the solution is presented above. Thus we need only to consider the case in which $A$ is non-strictly determined.

Criterion. The $2 \times 2$ matrix game is non-strictly determined if and only if each of the entries on one of the diagonals is greater than each of the entries on the other diagonal, i.e. one of the following situations is fulfilled:

(i) $a_{11}, a_{22} > a_{12}$ and $a_{11}, a_{22} > a_{21}$, or

(ii) $a_{12}, a_{21} > a_{11}$ and $a_{12}, a_{21} > a_{22}$.
Proof. If either of the conditions (i) or (ii) holds, it is easy to check that no entry of the matrix is simultaneously the minimum of the row and the maximum of the column in which it occurs, hence the game is not strictly determined.

In order to prove the other part of the criterion we observe first that if two of the entries in the same row or the same column of $A$ are equal then the game is strictly determined; hence the entries in the same row (or column) are different.

Suppose now that $a_{11} > a_{12}$; then $a_{22} > a_{12}$, or else $a_{12}$ is a row minimum and a column maximum; then also $a_{22} > a_{21}$, or else $a_{22}$ is a row minimum and a column maximum; then also $a_{11} > a_{21}$, or else $a_{21}$ is a row minimum and a column maximum. These inequalities give case (i).

In a similar manner the assumption $a_{11} < a_{12}$ leads to case (ii). This completes the proof of the criterion.
In order to determine the optimal strategies for a $2 \times 2$ non-strictly determined game we have the following result:

Theorem 1. If the $2 \times 2$ matrix game $A$ is non-strictly determined then $p^0 = (p^0_1, p^0_2)$ is an optimal strategy for player R, $q^0 = (q^0_1, q^0_2)$ is an optimal strategy for player C and $v$ is the value of the game, where
$$p^0_1 = \frac{a_{22} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}, \quad p^0_2 = \frac{a_{11} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}},$$
$$q^0_1 = \frac{a_{22} - a_{12}}{a_{11} + a_{22} - a_{12} - a_{21}}, \quad q^0_2 = \frac{a_{11} - a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}$$
and
$$v = \frac{a_{11} a_{22} - a_{12} a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}} = \frac{\det A}{a_{11} + a_{22} - a_{12} - a_{21}}.$$
Proof. We have to see that the values above satisfy the following conditions:
$$p^0 A \ge (v, v), \qquad q^0 A^t \le (v, v),$$
which are equivalent to:
$$p^0_1 a_{11} + p^0_2 a_{21} \ge v$$
$$p^0_1 a_{12} + p^0_2 a_{22} \ge v$$
$$q^0_1 a_{11} + q^0_2 a_{12} \le v$$
$$q^0_1 a_{21} + q^0_2 a_{22} \le v.$$
It is easy to verify that the previous formulas are true, and in consequence the proof is complete.
Example 5. (a simplified version of the Morra game)

Each of the two players R and C simultaneously shows one or two fingers. If the sum of the fingers shown is even, R wins the sum from C; if the sum is odd, R loses the sum to C. The matrix is the following:

              C shows
               1     2
  R      1  [  2    -3 ]
  shows  2  [ -3     4 ]

It is easy to see that the game is non-strictly determined, so we apply the formulas presented in Theorem 1:
$$v = \frac{2 \cdot 4 - (-3)(-3)}{2 + 4 + 3 + 3} = -\frac{1}{12}.$$
Thus the game is in favor of player C. Optimum strategies $p^0$ for R and $q^0$ for C are as follows:
$$p^0 = \left( \frac{7}{12}, \frac{5}{12} \right), \quad q^0 = \left( \frac{7}{12}, \frac{5}{12} \right).$$
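The formulas of Theorem 1 translate directly into a few lines of Python. The sketch below (the helper name solve_2x2 is ours) uses exact fractions and reproduces the Morra solution just computed.

from fractions import Fraction

def solve_2x2(a11, a12, a21, a22):
    """Optimal strategies and value of a non-strictly determined 2x2
    matrix game, straight from the formulas of Theorem 1."""
    d = a11 + a22 - a12 - a21                   # common denominator
    p0 = (Fraction(a22 - a21, d), Fraction(a11 - a12, d))
    q0 = (Fraction(a22 - a12, d), Fraction(a11 - a21, d))
    v = Fraction(a11 * a22 - a12 * a21, d)      # det(A) / d
    return p0, q0, v

print(solve_2x2(2, -3, -3, 4))
# ((7/12, 5/12), (7/12, 5/12), -1/12), matching Example 5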
Remark 6. If a matrix game contains a row (column) whose elements are smaller or equal (greater or equal) than the elements of another row (column), then the smaller (greater) row (column) is called a recessive row (column). Clearly, player R (player C) would never play the recessive row (column); that is why the recessive row (column) can be omitted from the game.
Example 6. Consider the matrix game
$$A = \begin{pmatrix} -4 & -3 & 1 \\ 2 & -1 & 2 \\ -2 & 3 & 4 \end{pmatrix}.$$
Note that $(-4, -3, 1) \le (2, -1, 2)$, i.e. the first row is recessive and can be omitted from the game, so the game may be reduced to
$$\begin{pmatrix} 2 & -1 & 2 \\ -2 & 3 & 4 \end{pmatrix}.$$
Now observe that the third column is recessive, since each entry is greater or equal to the corresponding entry in the second column. Thus the game may be reduced to the $2 \times 2$ game
$$A' = \begin{pmatrix} 2 & -1 \\ -2 & 3 \end{pmatrix}.$$
The solution to the game $A'$ can be found by using the formulas in Theorem 1 and is
$$v = \frac{4}{8} = \frac{1}{2}; \quad p'^0 = \left( \frac{5}{8}, \frac{3}{8} \right); \quad q'^0 = \left( \frac{4}{8}, \frac{4}{8} \right).$$
Thus the solution to the original game $A$ is
$$v = \frac{1}{2}, \quad p^0 = \left( 0, \frac{5}{8}, \frac{3}{8} \right) \quad \text{and} \quad q^0 = \left( \frac{1}{2}, \frac{1}{2}, 0 \right).$$
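The elimination of recessive rows and columns can also be carried out mechanically. Below is a minimal sketch, assuming NumPy is available (the helper name drop_recessive is ours); ties between identical rows or columns are broken by deleting the first one found.

import numpy as np

def drop_recessive(A):
    """Repeatedly delete recessive rows (componentwise <= another row)
    and recessive columns (componentwise >= another column)."""
    A = np.asarray(A, dtype=float)
    changed = True
    while changed:
        changed = False
        m, l = A.shape
        for i in range(m):
            if m > 1 and any(k != i and np.all(A[i] <= A[k]) for k in range(m)):
                A = np.delete(A, i, axis=0); changed = True; break
        if changed:
            continue
        for j in range(l):
            if l > 1 and any(k != j and np.all(A[:, j] >= A[:, k]) for k in range(l)):
                A = np.delete(A, j, axis=1); changed = True; break
    return A

A = [[-4, -3, 1], [2, -1, 2], [-2, 3, 4]]
print(drop_recessive(A))   # [[ 2. -1.]
                           #  [-2.  3.]], the reduced game A' of Example 6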
2 × l and m × 2 matrix games

In the case in which one of the players has just 2 strategies we can solve the game geometrically.

Example 6. Consider the game whose matrix is:
$$A = \begin{pmatrix} 1 & 0 & -1 & 0 \\ -3 & -2 & 1 & 2 \end{pmatrix}.$$
Since the fourth column is recessive, we can omit it from the game, which can be reduced to
$$A' = \begin{pmatrix} 1 & 0 & -1 \\ -3 & -2 & 1 \end{pmatrix}.$$
The player R plays an arbitrary strategy
$$p = (p_1, p_2) = (1 - p_2, p_2).$$
If the player C chooses column 1, then the expected payment $y$ is:
$$y = 1 \cdot p_1 - 3 p_2 = 1 - p_2 - 3 p_2 = 1 - 4 p_2.$$
If the player C chooses column 2 then
$$y = 0 \cdot p_1 - 2 p_2 = -2 p_2.$$
If the player C chooses column 3 then
$$y = -p_1 + p_2 = -(1 - p_2) + p_2 = -1 + 2 p_2.$$
Notice that each of these expectations expresses $y$ as a linear function of $p_2$. Hence the graph of these expectations will be a straight line in each case. Since we have the restriction $0 \le p_2 \le 1$, we are interested only in the segment for which $p_2$ satisfies the restrictions.
restrictions.
87

Maximum
1
0
1
1/2
2
3
Column 3
y = 1 + 2p
2
1
p
2
axis
Column 2
y = 2p
2
Column 1
y = 1 4p
2
`
y axis
The player C will minimize his own expectation (his losses) by choosing the lowest of the three lines presented in the above figure. Now R is the maximizing player, so he will try to get the maximum of this function. This maximum occurs at the intersection of the lines corresponding to the columns 2 and 3 ($-1 + 2p_2 = -2p_2$), when $p_2 = \frac{1}{4}$, $p^0 = \left( \frac{3}{4}, \frac{1}{4} \right)$ and the value of the game is $v = -2 \cdot \frac{1}{4} = -\frac{1}{2}$.

We can find an optimal strategy for player C by considering the $2 \times 2$ subgame of $A$ consisting of the second and third columns:
$$\begin{pmatrix} 0 & -1 \\ -2 & 1 \end{pmatrix}.$$
Applying the formulas from Theorem 1 we obtain the strategy $q^0 = \left( \frac{1}{2}, \frac{1}{2} \right)$. We can extend $q^0$ to an optimal strategy for player C in $A$ by adding two zero entries thus:
$$q^0 = \left( 0, \frac{1}{2}, \frac{1}{2}, 0 \right).$$
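The graphical argument above can also be checked numerically: evaluate C's best (lowest) line on a grid of $p_2$ values and pick the $p_2$ where this lower envelope is largest. A minimal sketch, assuming NumPy is available:

import numpy as np

# R's expectation against each column of A' as a function of p2
p2 = np.linspace(0.0, 1.0, 100001)
lines = [1 - 4 * p2, -2 * p2, -1 + 2 * p2]   # columns 1, 2, 3
envelope = np.minimum.reduce(lines)          # C picks the lowest line
i = int(np.argmax(envelope))                 # R maximizes the envelope
print(p2[i], envelope[i])                    # approx. 0.25 and -0.5

The grid search approximates the exact intersection $p_2 = 1/4$, $v = -1/2$ found above.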
A similar method to that presented in the previous example works to solve games
in which the column player has just two strategies and the row player has more than
2.
Example 7. Consider the game whose matrix is
$$A = \begin{pmatrix} 6 & -1 \\ 0 & 4 \\ 4 & 3 \end{pmatrix}.$$
The player C plays an arbitrary strategy
$$q = (q_1, q_2) = (1 - q_2, q_2).$$
If the player R chooses row $i$ ($i = 1, 2, 3$) then the expectation that player R has is:

If R chooses row 1: $y = 6 q_1 + (-1) q_2 = 6(1 - q_2) - q_2 = 6 - 7 q_2$.

If R chooses row 2: $y = 0 \cdot q_1 + 4 q_2 = 4 q_2$.

If R chooses row 3: $y = 4 q_1 + 3 q_2 = 4(1 - q_2) + 3 q_2 = 4 - q_2$.
[Figure: the three lines $y = 6 - 7q_2$ (row 1), $y = 4q_2$ (row 2) and $y = 4 - q_2$ (row 3) drawn over $0 \le q_2 \le 1$; the minimum of their upper envelope occurs at the intersection of the lines for rows 2 and 3.]
The player R will maximize his own expectation (his winnings) by choosing the greatest of the three lines presented in the above figure. C is the minimizing player, so he will try to get the minimum of this function. This minimum occurs at the intersection of the lines corresponding to the rows 2 and 3 ($4q_2 = 4 - q_2$), when $q_2 = \frac{4}{5}$, $q^0 = \left( \frac{1}{5}, \frac{4}{5} \right)$ and the value of the game is $v = 4 \cdot \frac{4}{5} = \frac{16}{5}$.

We can find an optimal strategy for player R by considering the $2 \times 2$ subgame of $A$ consisting of the second and third rows:
$$\begin{pmatrix} 0 & 4 \\ 4 & 3 \end{pmatrix}.$$
Applying the formulas from Theorem 1 we obtain the strategy $p^0 = \left( \frac{1}{5}, \frac{4}{5} \right)$. We can extend $p^0$ to an optimal strategy for player R in $A$ by adding one zero entry thus:
$$p^0 = \left( 0, \frac{1}{5}, \frac{4}{5} \right).$$
Next, we will prove von Neumann's theorem (which says that every zero-sum, two-person game has a value in mixed strategies). The original proof of von Neumann uses the Brouwer fixed-point theorem. The proof that will be presented below uses an idea of George Dantzig and linear programming theory. Dantzig's proof is better than von Neumann's because it is elementary and because it shows how to construct a best strategy.
Theorem 2. (John von Neumann's theorem)

Let $A$ be any real matrix. Then the zero-sum, two-person game with payoff matrix $A$ has a value $v$ which satisfies $p^0 A \ge (v, v, \ldots, v)$ and $q^0 A^t \le (v, v, \ldots, v)$ for some optimal strategies $p^0$ and $q^0$.

Proof. With no loss of generality, assume that all $a_{ij}$ are positive. Otherwise, if $\alpha$ is chosen so that $a_{ij} + \alpha$ is positive for all $i$ and $j$, then the optimality conditions $p^0 A \ge (v, v, \ldots, v)$ and $q^0 A^t \le (v, v, \ldots, v)$ may be replaced by
$$\sum_{i=1}^{m} p_i (a_{ij} + \alpha) = \sum_{i=1}^{m} p_i a_{ij} + \alpha \ge v + \alpha, \quad j = \overline{1, l}$$
and
$$\sum_{j=1}^{l} (a_{ij} + \alpha) q_j = \sum_{j=1}^{l} a_{ij} q_j + \alpha \le v + \alpha, \quad i = \overline{1, m}.$$
Assuming all $a_{ij} > 0$, we will construct a number $v > 0$ satisfying the optimality conditions. First we observe that the optimality conditions can be written as
$$\sum_{i=1}^{m} \frac{p_i}{v} a_{ij} \ge 1, \quad j = \overline{1, l}$$
$$\sum_{j=1}^{l} \frac{q_j}{v} a_{ij} \le 1, \quad i = \overline{1, m}.$$
If we define the unknowns $x_j = \frac{q_j}{v}$ ($j = \overline{1, l}$), $y_i = \frac{p_i}{v}$ ($i = \overline{1, m}$) then the previous inequalities become:
$$\sum_{j=1}^{l} a_{ij} x_j \le 1, \quad i = \overline{1, m}$$
$$\sum_{i=1}^{m} a_{ij} y_i \ge 1, \quad j = \overline{1, l}$$
with $x \ge 0$, $y \ge 0$ and
$$\sum_{j=1}^{l} x_j = \sum_{j=1}^{l} \frac{q_j}{v} = \frac{1}{v}, \qquad \sum_{i=1}^{m} y_i = \sum_{i=1}^{m} \frac{p_i}{v} = \frac{1}{v}.$$
The required vectors $x \ge 0$ and $y \ge 0$ must solve the following linear programming problems:

(P) $f = \sum_{j=1}^{l} x_j \to$ maximize (since $\sum_{j=1}^{l} x_j = \frac{1}{v}$ and $v$ is minimized, according to the fact that the column player is a minimizing player)

subject to
$$\sum_{j=1}^{l} a_{ij} x_j \le 1, \quad i = \overline{1, m}; \qquad x_1, \ldots, x_l \ge 0$$

(D) $g = \sum_{i=1}^{m} y_i \to$ minimize (since $\sum_{i=1}^{m} y_i = \frac{1}{v}$ and $v$ is maximized, according to the fact that the row player is a maximizing player)

subject to
$$\sum_{i=1}^{m} a_{ij} y_i \ge 1, \quad j = \overline{1, l}; \qquad y_1, \ldots, y_m \ge 0$$

The previous two linear programming problems are a symmetric pair of dual problems. These problems have optimal solutions because they both have feasible solutions (since all $a_{ij}$ are positive, a vector $y$ is feasible if all its components are large enough; the vector $x = 0$ is feasible for the primal). By the duality theorem, these linear programming problems have optimal solutions $x$, $y$ and the same optimal value, which is denoted by $\frac{1}{v}$.

We easily can see that $v$, $p$ and $q$ satisfy von Neumann's theorem if $p = v y$ and $q = v x$.
The simplex method for solving matrix games

Suppose we are given a matrix game $A$. We assume that $A$ is non-strictly determined (strictly determined games can be solved as we discussed before, without using the simplex method) and does not contain any recessive rows or columns. According to the previous proof we can obtain the solution to $A$ as follows:

1st step. Add a sufficiently large number $k$ to every entry of $A$ to form the matrix game
$$A' = \begin{pmatrix} a'_{11} & a'_{12} & \ldots & a'_{1l} \\ a'_{21} & a'_{22} & \ldots & a'_{2l} \\ \ldots & \ldots & \ldots & \ldots \\ a'_{m1} & a'_{m2} & \ldots & a'_{ml} \end{pmatrix}, \quad a'_{ij} = a_{ij} + k,$$
which has only positive entries. The purpose is to guarantee that the value of the new matrix game is positive.

2nd step. Solve the following LPP by the simplex method:
$$f(x) = x_1 + x_2 + \cdots + x_l \to \text{maximize}$$
subject to
$$\begin{cases} a'_{11} x_1 + \cdots + a'_{1l} x_l \le 1 \\ a'_{21} x_1 + \cdots + a'_{2l} x_l \le 1 \\ \ldots \\ a'_{m1} x_1 + \cdots + a'_{ml} x_l \le 1 \\ x_1, x_2, \ldots, x_l \ge 0 \end{cases}$$
Let $x^0$ be the optimum solution to the maximum problem, $y^0$ the optimum solution to the dual minimum problem (which can be found in the row of reduced costs of the terminal table) and $v' = f(x^0)$ (the optimal value of the LPP).

Let $p^0 = \frac{1}{v'} y^0$, $q^0 = \frac{1}{v'} x^0$ and $v = \frac{1}{v'} - k$.

Then $p^0$ is an optimum strategy for player R in game $A$, $q^0$ is an optimum strategy for player C and $v$ is the value of the game.

We can remark that the games $A$ and $A'$ have the same optimum strategies for their respective players and that their values differ by the added constant $k$.
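The recipe above can also be run through an off-the-shelf LP solver instead of hand simplex. The following is a minimal sketch, assuming SciPy is available; it uses only linprog's standard interface (the helper name solve_game is ours), and it solves the dual pair (P), (D) explicitly, exactly as in the proof.

import numpy as np
from scipy.optimize import linprog

def solve_game(A, k):
    """Value and optimal mixed strategies of the matrix game A,
    by the LP construction above; k must make A + k positive."""
    Ak = np.asarray(A, dtype=float) + k
    m, l = Ak.shape
    # (P): maximize x1+...+xl subject to Ak x <= 1, x >= 0
    # (linprog minimizes, so we pass the negated objective)
    P = linprog(-np.ones(l), A_ub=Ak, b_ub=np.ones(m), bounds=(0, None))
    # (D): minimize y1+...+ym subject to Ak^t y >= 1, y >= 0
    D = linprog(np.ones(m), A_ub=-Ak.T, b_ub=-np.ones(l), bounds=(0, None))
    v_prime = -P.fun                  # common optimal value v' = 1/v
    q0 = P.x / v_prime                # optimal strategy for C
    p0 = D.x / v_prime                # optimal strategy for R
    return p0, q0, 1.0 / v_prime - k  # value of the original game

A = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # Paper-Scissors-Rock
print(solve_game(A, k=2))
# approx. (1/3, 1/3, 1/3), (1/3, 1/3, 1/3) and value 0

The same game is solved by hand in the next example, with the same result.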
Example 8. Solve the Paper-Scissors-Rock game (Example 1) by using the simplex method.

Solution. Add $k = 2$ to each entry to form the matrix game
$$\begin{pmatrix} 2 & 1 & 3 \\ 3 & 2 & 1 \\ 1 & 3 & 2 \end{pmatrix}.$$
We have to solve the following LPP:
$$f(x) = x_1 + x_2 + x_3 \to \text{maximize}$$
subject to
$$\begin{cases} 2x_1 + x_2 + 3x_3 \le 1 \\ 3x_1 + 2x_2 + x_3 \le 1 \\ x_1 + 3x_2 + 2x_3 \le 1 \\ x_1, x_2, x_3 \ge 0 \end{cases}$$
The successive simplex tables are shown below ($f_j$ is the row of objective coefficients of the current basis, $c_j - f_j$ the row of reduced costs).

        c          1     1     1     0     0     0
 C_B   B     b    x1    x2    x3    x4    x5    x6    ratio test
  0    x4    1     2     1     3     1     0     0    min{1/2, 1/3, 1}
  0    x5    1     3     2     1     0     1     0      = 1/3   (x1 enters, x5 leaves)
  0    x6    1     1     3     2     0     0     1
      f_j    0     0     0     0     0     0     0
   c_j-f_j         1     1     1     0     0     0

  0    x4   1/3    0   -1/3   7/3    1   -2/3    0    min{(1/3)/(7/3), (1/3)/(1/3), (2/3)/(5/3)}
  1    x1   1/3    1    2/3   1/3    0    1/3    0      = 1/7   (x3 enters, x4 leaves)
  0    x6   2/3    0    7/3   5/3    0   -1/3    1
      f_j   1/3    1    2/3   1/3    0    1/3    0
   c_j-f_j         0    1/3   2/3    0   -1/3    0

  1    x3   1/7    0   -1/7    1    3/7  -2/7    0    min{(2/7)/(5/7), (3/7)/(18/7)}
  1    x1   2/7    1    5/7    0   -1/7   3/7    0      = 1/6   (x2 enters, x6 leaves)
  0    x6   3/7    0   18/7    0   -5/7   1/7    1
      f_j   3/7    1    4/7    1    2/7   1/7    0
   c_j-f_j         0    3/7    0   -2/7  -1/7    0

  1    x3   1/6    0     0     1    7/18 -5/18  1/18
  1    x1   1/6    1     0     0    1/18  7/18 -5/18
  1    x2   1/6    0     1     0   -5/18  1/18  7/18
      f_j   1/2    1     1     1    1/6   1/6   1/6
   c_j-f_j         0     0     0   -1/6  -1/6  -1/6

All reduced costs are now nonpositive, so the last table is terminal.
The optimal solutions of the primal and dual problems are
$$x^0 = \left( \frac{1}{6}, \frac{1}{6}, \frac{1}{6} \right), \quad y^0 = \left( \frac{1}{6}, \frac{1}{6}, \frac{1}{6} \right)$$
and the optimal value is $v' = \frac{1}{2}$.

Then for the original game
$$p^0 = \frac{1}{v'} y^0 = 2 \left( \frac{1}{6}, \frac{1}{6}, \frac{1}{6} \right) = \left( \frac{1}{3}, \frac{1}{3}, \frac{1}{3} \right)$$
$$q^0 = \frac{1}{v'} x^0 = 2 \left( \frac{1}{6}, \frac{1}{6}, \frac{1}{6} \right) = \left( \frac{1}{3}, \frac{1}{3}, \frac{1}{3} \right)$$
$$v = \frac{1}{v'} - 2 = 0.$$
Observe that the game is fair (since its value is 0).
The transportation problem

The balanced transportation problem

An important component of economic life is the shipping of goods from where they are produced to markets. The aim is to ship these goods at minimum cost. This problem was one of the first problems that was modeled and solved by using linear programming.

We analyse the single commodity transportation problem. The data of this problem consists of the amount available at each source, the requirement at each demand center and the cost of transporting the commodity per unit from each source to each market.

We consider the transportation problem with the following data:

$m$ = number of sources where material is available

$n$ = number of demand centers where material is required

$a_i$ = units of material available at source $i$, $a_i > 0$, $i = \overline{1, m}$

$b_j$ = units of material required at demand center $j$, $b_j > 0$, $j = \overline{1, n}$

$c_{ij}$ = unit shipping cost (m.u./unit) from source $i$ to demand center $j$, $i = \overline{1, m}$, $j = \overline{1, n}$.

The transportation problem with this data is said to satisfy the balance condition if
$$\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j.$$
A transportation problem which satisfies the previous condition is called a balanced transportation problem.

We want to determine the quantities to be shipped such that all the requirements are satisfied (all the supplies from the sources are shipped and all the demands are satisfied) and the total cost of transportation is minimum.
If we denote by $x_{ij}$ ($i = \overline{1, m}$, $j = \overline{1, n}$) the quantity to be shipped from source $i$ to demand center $j$, we get the following mathematical model (as a linear programming problem):
$$f(x) = \sum_{i=1}^{m} \sum_{j=1}^{n} c_{ij} x_{ij} \to \text{minimize}$$
subject to
$$\begin{cases} \sum_{j=1}^{n} x_{ij} = a_i, \quad i = \overline{1, m} \\ \sum_{i=1}^{m} x_{ij} = b_j, \quad j = \overline{1, n} \\ x_{ij} \ge 0, \quad i = \overline{1, m}, \; j = \overline{1, n} \end{cases} \quad (1)$$
If we add the set of first $m$ constraints (those corresponding to the sources) and, separately, the set of last $n$ constraints (those corresponding to the demand centers), we see that
$$\sum_{i=1}^{m} a_i = \sum_{i=1}^{m} \sum_{j=1}^{n} x_{ij} = \sum_{j=1}^{n} \sum_{i=1}^{m} x_{ij} = \sum_{j=1}^{n} b_j.$$
The previous equality shows us that the balance condition is a necessary condition for the feasibility of the transportation problem. That is why we assumed that the data of the problem satisfy the balance condition.

The previous equality also shows that there is a redundant constraint among the constraints (1). One of the equality constraints in (1) can be deleted from the system without affecting the set of feasible solutions.

After this deletion, the matrix of the system of equations has the order $(m + n - 1) \times mn$ and its rank is $m + n - 1$. So, every basic vector for the balanced transportation problem of order $m \times n$ consists of $(m + n - 1)$ basic variables.

The transportation problem can be represented in a two dimensional array in which row $i$ corresponds to source $i$; column $j$ corresponds to demand center $j$; and $(i, j)$ is the cell in row $i$ and column $j$.

In the cell $(i, j)$, we record the value $x_{ij}$ (the amount to be shipped) in the lower right-hand corner of the cell and the unit shipping cost $c_{ij}$ in the upper left-hand corner of the cell.

On the right-hand side of the array we record the availabilities at the sources and at the bottom of the array we record the requirements at the demand centers.

The objective function is the sum of the variables in the array multiplied by the unit cost in the corresponding cell.
Array representation of the transportation problem

                            Demand center
               1            ...   j            ...   n            Supply
        1   c_11 | x_11     ...   c_1j | x_1j  ...   c_1n | x_1n    a_1
 Source i   c_i1 | x_i1     ...   c_ij | x_ij  ...   c_in | x_in    a_i
        m   c_m1 | x_m1     ...   c_mj | x_mj  ...   c_mn | x_mn    a_m
 Demand      b_1            ...    b_j         ...    b_n
We end this subsection with an example.

Example 1. We consider a small transportation problem where the commodity is iron ore, the sources are mine 1 and mine 2 that produce the ore, and the markets are three steel plants. Let $c_{ij}$ = cost (RON/ton) to ship ore from mine $i$ to plant $j$, $i = 1, 2$, $j = 1, 2, 3$. The data is given below.

                       Steel plant               Available at
                  1          2          3        mine (tons) daily
 Mine 1       11 | x_11   8 | x_12   2 | x_13        800
      2        7 | x_21   5 | x_22   4 | x_23        300
 Demand at      400        500        200
 plant (tons) daily

Determine the amounts to be shipped such that the transportation cost is minimum.
Let $x_{ij}$ be the amount of ore (in tons) shipped from mine $i$ to plant $j$.

At mine 1 there are 800 tons of ore available. The amount of ore shipped out of this mine, $x_{11} + x_{12} + x_{13}$, has to be smaller than the amount available, leading to the constraint $x_{11} + x_{12} + x_{13} \le 800$. Similarly, considering ore at steel plant 1, at least 400 tons of it is required there, leading to the constraint $x_{11} + x_{21} \ge 400$.

The total amount of ore available is $800 + 300 = 1100$, and the total amount required is $400 + 500 + 200 = 1100$. This implies that all the ore at each mine will be shipped out, and the requirement at each plant will be met exactly. In consequence all constraints will be equalities.
The dual problem. The optimality criterion

We associate the dual variable $u_i$ to the constraint corresponding to source $i$, $i = \overline{1, m}$, and the dual variable $v_j$ to the constraint corresponding to demand center $j$, $j = \overline{1, n}$.

The dual problem is:
$$g(u, v) = \sum_{i=1}^{m} a_i u_i + \sum_{j=1}^{n} b_j v_j \to \text{maximize}$$
subject to
$$\begin{cases} u_i + v_j \le c_{ij}, \quad i = \overline{1, m}, \; j = \overline{1, n} \\ u_i, v_j \text{ unrestricted in sign}, \quad i = \overline{1, m}, \; j = \overline{1, n} \end{cases} \quad (2)$$
From the complementary slackness theorem (from the previous section) we know that if $x = (x_{ij})_{i = \overline{1,m},\, j = \overline{1,n}}$ is a basic feasible solution for the primal problem, $(u, v) = ((u_i)_{i = \overline{1,m}}, (v_j)_{j = \overline{1,n}})$ is feasible for the dual, and
$$x_{ij} (c_{ij} - u_i - v_j) = 0 \quad \text{for all } i \text{ and } j,$$
then $x$ and $(u, v)$ are optimal for their problems.
In conclusion, if $x = (x_{ij})$ is a basic feasible solution for the transportation problem, then the dual basic solution associated with it can be computed by solving the following system of equations:
$$u_i + v_j = c_{ij} \quad \text{for each } (i, j) \text{ corresponding to a basic variable } x_{ij}.$$
The previous system has $m + n$ unknowns and $m + n - 1$ equations (since each BFS $x$ has $m + n - 1$ basic variables).

Deleting one constraint in (1) has the effect of setting to 0 the corresponding dual variable.

The system which gives us the dual basic solution is
$$\begin{cases} u_i + v_j = c_{ij} \quad \text{for each } (i, j) \text{ corresponding to a basic variable } x_{ij} \\ u_1 = 0 \quad \text{(we can choose any dual variable to be 0)} \end{cases}$$
The optimality criterion is:

Optimality criterion. $c_{ij} \ge u_i + v_j$ for all nonbasic $(i, j)$.

Indeed, if the optimality criterion is satisfied, then $(u, v)$ is feasible for the dual problem. Since $x$ is feasible for the primal problem, according to the complementary slackness theorem, $x$ and $(u, v)$ are optimal for their respective problems.
The transportation algorithm

By using the special structure of the transportation problem we will present a version of the simplex algorithm that can be carried out without canonical tables.

Step 1. Determine an initial basic feasible solution.

We present two methods of obtaining an initial basic feasible solution: the northwest corner rule and the minimal cost rule.

The northwest corner rule

Begin in the upper left-hand corner (or northwest corner) of the transportation array and set $x_{11}$ as large as possible (there are limitations for setting $x_{11}$: $b_1$, which is the demand at market 1, and $a_1$, which is the supply at source 1). So, $x_{11} = \min\{a_1, b_1\}$. Setting $x_{11} = \min\{a_1, b_1\}$, the supply at source 1 becomes $a_1 - x_{11}$ and the demand at market 1 becomes $b_1 - x_{11}$.

If $x_{11} = a_1$ then $x_{12} = \cdots = x_{1n} = 0$, and hence $x_{12}, \ldots, x_{1n}$ will be nonbasic variables (instead of 0 in their cells we will put a point).

If $x_{11} = b_1$ then $x_{21} = \cdots = x_{m1} = 0$, and hence $x_{21}, \ldots, x_{m1}$ will be nonbasic variables (instead of 0 in their cells we will put a point).

Continue this procedure from the upper left-hand corner of the remaining table.

The northwest corner rule does not utilize shipping costs. It provides us easily with an initial BFS, but the total shipping cost may be very high.

Example. To make this description more concrete, we now illustrate the general procedure on the iron ore shipping problem (Example 1).
          Plant 1       Plant 2          Plant 3      Supply
 Mine 1   11 | 400       8 | 400         2 | .       800 / 400 / 0
 Mine 2    7 | .         5 | 100         4 | 200     300 / 200 / 0
 Demand   400 / 0       500 / 100 / 0   200 / 0

In the first step, at the northwest corner, a maximum of 400 tons can be allocated, since that is all that is required by plant 1. This leaves $800 - 400 = 400$ tons available at the first mine and 0 tons required at plant 1. The demand in column 1 is fully satisfied and we cannot ship any more to plant 1, so $x_{21} = 0$ and we put a point in cell (2,1).

Next, we move to the second cell in the top row (the northwest corner of the remaining table), where a maximum of 400 tons can be allocated, since that is all that is available at that moment at mine 1. This leaves $400 - 400 = 0$ tons available at the first mine and $500 - 400 = 100$ tons required by plant 2. At this moment the first row's availability is met ($x_{13} = 0$) and we move down to the second row, where $x_{22} = 100$ and $x_{23} = 200$. We have $3 + 2 - 1 = 4$ basic variables, which are $x_{11}$, $x_{12}$, $x_{22}$ and $x_{23}$. The transportation cost of this BFS is
$$11 \cdot 400 + 8 \cdot 400 + 2 \cdot 0 + 7 \cdot 0 + 5 \cdot 100 + 4 \cdot 200 = 8900 \text{ RON}.$$
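The northwest corner rule is mechanical enough to code directly. Below is a minimal Python sketch (the helper name northwest_corner is ours); it only builds the allocation and leaves the points/degeneracy bookkeeping of the worked examples aside.

def northwest_corner(supply, demand):
    """Initial basic feasible solution by the northwest corner rule."""
    a, b = list(supply), list(demand)
    x = [[0] * len(b) for _ in a]
    i = j = 0
    while i < len(a) and j < len(b):
        t = min(a[i], b[j])
        x[i][j] = t
        a[i] -= t
        b[j] -= t
        if a[i] == 0:
            i += 1          # row i is exhausted
        else:
            j += 1          # column j is saturated
    return x

print(northwest_corner([800, 300], [400, 500, 200]))
# [[400, 400, 0], [0, 100, 200]], the BFS of cost 8900 found above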
The minimal cost rule

This method uses the shipping costs in order to provide an initial BFS that has a lower cost.

First we determine the cell $(i, j)$ with the smallest shipping cost, then assign $x_{ij}$ the largest possible value, which is $x_{ij} = \min\{a_i, b_j\}$.

The supply at source $i$ will be reduced to $a_i - x_{ij}$ and the demand at demand center $j$ will be reduced to $b_j - x_{ij}$.

If $x_{ij} = a_i$ then $x_{ik} = 0$ for each $k = \overline{1, n}$, $k \ne j$.

If $x_{ij} = b_j$ then $x_{lj} = 0$ for each $l = \overline{1, m}$, $l \ne i$.

After that, we will choose the cell with the minimum shipping cost from the remaining array and we will repeat the procedure.

If the minimum cost is realized for more than one cell then we will choose to begin with that cell for which the corresponding variable is maximum.
Example. We illustrate the procedure described above on the iron ore shipping problem (Example 1).

          Plant 1       Plant 2          Plant 3      Supply
 Mine 1   11 | 400       8 | 200         2 | 200     800 / 600 / 400 / 0
 Mine 2    7 | .         5 | 300         4 | .       300 / 0
 Demand   400 / 0       500 / 200 / 0   200 / 0

The smallest cost coefficient is $c_{13} = 2$, so $x_{13} = \min\{200, 800\} = 200$. The demand in column 3 is fully satisfied, and we cannot ship any more to plant 3. So $x_{23} = 0$ (and we put a point in the cell (2,3)). We change the amount still to be shipped from mine 1 to $800 - 200 = 600$ and the amount still to be shipped to plant 3 to $200 - 200 = 0$.

The smallest cost among the remaining cells is $c_{22} = 5$ and
$$x_{22} = \min\{500, 300\} = 300.$$
We change the remaining requirement at plant 2 to $500 - 300 = 200$, the remaining availability at mine 2 to $300 - 300 = 0$, and put a point in cell (2,1) since $x_{21} = 0$.

The remaining cells are (1,1) and (1,2) and
$$x_{12} = \min\{200, 600\} = 200; \quad x_{11} = \min\{400, 400\} = 400.$$
We have $4 = 3 + 2 - 1$ basic variables, which are: $x_{11}$, $x_{12}$, $x_{13}$ and $x_{22}$.

The transportation cost of this BFS is
$$11 \cdot 400 + 8 \cdot 200 + 2 \cdot 200 + 7 \cdot 0 + 5 \cdot 300 + 4 \cdot 0 = 7900 \text{ RON}.$$
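The minimal cost rule codes just as easily. A minimal sketch (the helper name minimal_cost_rule is ours; ties between equal costs are broken arbitrarily rather than by the largest shippable amount, as the rule above prefers):

def minimal_cost_rule(cost, supply, demand):
    """Initial basic feasible solution by the minimal cost rule."""
    a, b = list(supply), list(demand)
    m, n = len(a), len(b)
    x = [[0] * n for _ in range(m)]
    live = {(i, j) for i in range(m) for j in range(n)}
    while live:
        i, j = min(live, key=lambda c: cost[c[0]][c[1]])
        t = min(a[i], b[j])
        x[i][j] = t
        a[i] -= t
        b[j] -= t
        if a[i] == 0:
            live -= {(i, k) for k in range(n)}   # row i exhausted
        if b[j] == 0:
            live -= {(k, j) for k in range(m)}   # column j saturated
    return x

cost = [[11, 8, 2], [7, 5, 4]]
print(minimal_cost_rule(cost, [800, 300], [400, 500, 200]))
# [[400, 200, 200], [0, 300, 0]], the BFS of cost 7900 found above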
Step 2. Check the optimality of the current BFS.

Denote by $u_i$ ($i = \overline{1, m}$) and $v_j$ ($j = \overline{1, n}$) the dual variables corresponding to the sources and the demand centers, respectively.

Solve the system
$$\begin{cases} u_i + v_j = c_{ij} \quad \text{for each } (i, j) \text{ corresponding to a basic variable } x_{ij} \\ u_1 = 0 \end{cases}$$
For each $(i, j)$ compute $u_i + v_j$ and write down the obtained value in the upper right-hand corner of the corresponding cell.

Check the optimality criterion: $c_{ij} \ge u_i + v_j$ for all nonbasic $x_{ij}$.

If the current BFS is optimal then stop. Write down the optimal solution and compute the minimal value of $f$.

Remark. The above system can be solved very easily. Since we know $u_1 = 0$, we can get the values of $v_j$ for the columns $j$ of the basic cells in row 1. Knowing these values of $v_j$, from the equations corresponding to basic cells in those columns we can get the values of $u_i$ for the rows $i$ of those basic cells. We continue the method with the rows of the newly found $u_i$ in the same way, until all the $u_i$ and $v_j$ are computed.

Since any dual variable can be chosen to be 0, we will set to zero that dual variable on whose row (or column) we have the maximum number of basic cells.
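The back-substitution just described is a short loop in code. A minimal sketch (the helper name dual_variables is ours; it terminates because the basic cells of a BFS form a spanning tree of rows and columns):

def dual_variables(cost, basic, m, n):
    """Solve u_i + v_j = c_ij over the basic cells, with u_1 = 0."""
    u, v = [None] * m, [None] * n
    u[0] = 0
    todo = set(basic)
    while todo:
        for (i, j) in list(todo):
            if u[i] is not None:
                v[j] = cost[i][j] - u[i]; todo.remove((i, j))
            elif v[j] is not None:
                u[i] = cost[i][j] - v[j]; todo.remove((i, j))
    return u, v

cost = [[11, 8, 2], [7, 5, 4]]
basic = [(0, 0), (0, 1), (0, 2), (1, 1)]    # BFS from the minimal cost rule
print(dual_variables(cost, basic, 2, 3))    # ([0, -3], [11, 8, 2])

These are exactly the dual values computed by hand in the example that follows.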
Example. Consider the BFS obtained by the minimal cost rule in the iron ore shipping problem. (In each cell below we record the unit cost, the value $u_i + v_j$, and after the bar the shipped amount; a point marks a nonbasic cell.)

            v1 = 11          v2 = 8           v3 = 2
 u1 = 0    11  11 | 400     8   8 | 200     2   2 | 200
 u2 = -3    7   8 | .       5   5 | 300     4  -1 | .

To compute the dual basic solution, we start with $u_1 = 0$ (on the first row we have 3 basic cells). Since (1,1) is a basic cell, we have $u_1 + v_1 = c_{11} = 11$, so $v_1 = 11$.

Since (1,2) is a basic cell, we have $u_1 + v_2 = c_{12} = 8$, so $v_2 = 8$.

Since (1,3) is a basic cell, we have $u_1 + v_3 = c_{13} = 2$, so $v_3 = 2$; and the processing of row 1 is done.

Since (2,2) is a basic cell, we have $u_2 + v_2 = c_{22} = 5$, and since $v_2 = 8$ we get that $u_2 = 5 - 8 = -3$.

The dual solution is written in the above array.

We check the optimality conditions and see that $c_{21} = 7 < 8 = u_2 + v_1$.

Since $c_{21} < u_2 + v_1$, the optimality criterion is not satisfied.
Step 3. Improve the current basic feasible solution by replacing exactly one basic variable with a nonbasic variable for which the optimality criterion is violated.

Choose as entering cell the nonbasic variable $x_{ij}$ with the most negative reduced cost $c_{ij} - u_i - v_j$.

Since every BFS for the $m \times n$ transportation problem has exactly $m + n - 1$ basic variables, when an entering variable is brought into the BFS, some present basic variable should be dropped from the BFS: this variable is called the dropping basic variable and the corresponding cell is called the dropping basic cell.

We determine now the dropping basic variable and the new BFS.

The value of the entering variable is changed from 0 (its present value as a nonbasic variable) to a value denoted by $\theta$ (which will be determined later), and all the other nonbasic variables remain unchanged. If we denote by $(p, q)$ the entering cell then the value of $x_{pq}$ changes from 0 to $\theta$. We have to subtract $\theta$ from one of the basic variables in row $p$ (so that $a_p$ units are still shipped out of source $p$) and to subtract $\theta$ from one of the basic variables in column $q$ (so that $b_q$ units are still shipped to demand center $q$). We continue these adjustments, adding $\theta$ to another basic variable, then subtracting $\theta$ from a basic variable, until all the adjustments cancel each other in every row and column. All the cells whose values are modified by $+\theta$ or by $-\theta$ belong to a loop which has the following properties:

i) all the cells in the loop other than the entering cell are present basic cells;

ii) every row and column of the array either has no cells in the loop, or has exactly two cells, one with a $+\theta$ adjustment and the other with a $-\theta$ adjustment.

The cells in the loop with $+\theta$ adjustment are called recipient cells. The cells with $-\theta$ adjustment are called donor cells.

The new solution is $x(\theta) = (x_{ij}(\theta))_{i,j}$,
$$x_{ij}(\theta) = \begin{cases} x_{ij} & \text{if the cell } (i, j) \text{ is not in the loop} \\ x_{ij} + \theta & \text{if } (i, j) \text{ is a recipient cell in the loop} \\ x_{ij} - \theta & \text{if } (i, j) \text{ is a donor cell in the loop.} \end{cases}$$
We have that
$$f(x(\theta)) = f(x) + \theta (c_{pq} - u_p - v_q).$$
(The proof of the previous equality is beyond the scope of this text.)

Since $f(x(\theta)) = f(x) + \theta (c_{pq} - u_p - v_q)$ and $c_{pq} - u_p - v_q < 0$, in order to decrease the objective value as much as possible we should give $\theta$ the maximum value it can have, which is
$$\theta = \min\{x_{ij} : (i, j) \text{ a donor cell in the loop}\}.$$
A donor cell for which $\theta = x_{ij}$ will become the dropping cell (and become a nonbasic cell). If there is more than one donor cell for which $\theta = x_{ij}$, then the dropping cell will be that cell for which there are no loops with basic cells.

We can summarize the previous discussion as follows:

- choose as entering variable the $x_{ij}$ with the most negative reduced cost (the reduced cost is $c_{ij} - u_i - v_j$);

- starting from this cell consider a loop all of whose other corners, except the starting cell, are situated in basic cells. On each row or column there are either 2 cells of the loop or none;

- mark (in the lower left-hand corner of the corresponding cell) by a $+$ the odd cells of this loop (the first, the third, ...). These are the recipient cells of the loop;

- mark by a $-$ the even cells of this loop (the second, the fourth, ...). These are the donor cells of the loop;

- determine the minimal value $\theta$ of the variables situated in the donor cells;

- add $\theta$ to the recipient variables (situated in the cells marked by $+$);

- subtract $\theta$ from the donor variables (situated in the cells marked by $-$);

- one of the donor variables equal to $\theta$ will leave the basis. Go to step 2.
Example. Improve the BFS obtained by the minimal cost rule in the iron ore shipping problem.

            v1 = 11            v2 = 8             v3 = 2
 u1 = 0    11  11 | - 400     8   8 | + 200     2   2 | 200
 u2 = -3    7   8 | + .       5   5 | - 300     4  -1 | .

The nonbasic cell (2,1), with $c_{21} = 7 < 8 = u_2 + v_1$, is the only eligible cell to enter the basis. The loop consists of the following cells: (2,1), (2,2), (1,2), (1,1).

The odd cells (recipient cells) are (2,1), (1,2) and they are marked by $+$.

The even cells (donor cells) are (2,2), (1,1) and they are marked by $-$.

Since $\theta = \min\{400, 300\} = 300 = x_{22}$, the dropping variable is $x_{22}$.
The new BFS is given in the array below.

            v1 = 11            v2 = 8            v3 = 2
 u1 = 0    11  11 | 100       8   8 | 500      2   2 | 200
 u2 = -4    7   7 | 300       5   4 | .        4  -2 | .

We compute the dual basic variables and since
$$c_{22} = 5 > 4 = u_2 + v_2; \quad c_{23} = 4 > -2 = u_2 + v_3,$$
the optimality criterion is satisfied.

Hence, we get an optimal solution
$$x = \begin{pmatrix} 100 & 500 & 200 \\ 300 & 0 & 0 \end{pmatrix}$$
with a minimum cost
$$f(x) = 11 \cdot 100 + 8 \cdot 500 + 2 \cdot 200 + 7 \cdot 300 + 5 \cdot 0 + 4 \cdot 0 = 7600 \text{ RON}.$$
Remark. a) Degenerate solution

Each BFS which occurs in this iterative process must have $m + n - 1$ basic variables. If we have fewer than $m + n - 1$ positive variables we must put 0s instead of points (in the cells of nonbasic variables) in order to obtain $m + n - 1$ basic variables. We will transform the points into 0s in those cells with minimal costs for which there are no loops with the basic cells.

b) Multiple solutions

If all the optimality conditions are fulfilled and if there is a nonbasic cell for which $c_{ij} = u_i + v_j$, then by choosing a loop starting with this cell we will obtain another optimal solution.

If $x^1$ and $x^2$ are optimal solutions then for each $t \in [0, 1]$, $(1-t) x^1 + t x^2$ is optimal, too.

Next, we present an example of finding the loop which improves a given basic feasible solution for a balanced transportation problem. The loop will be a little more complicated in this case.
            v1 = 5          v2 = 3          v3 = 4           v4 = -9          v5 = 4
 u1 = 0    6   5 | .      3   3 | - 18    4   4 | + 20     5  -9 | .       5   4 | .
 u2 = 5   10  10 | - 27   5   8 | .      7   9 | .        11  -4 | .       9   9 | + 35
 u3 = 3    8   8 | + 0    5   6 | .      7   7 | - 27     11  -6 | .       9   7 | .
 u4 = 21  13  26 | .      4  24 | +     16  25 | .        12  12 | 25     25  25 | - 65

The previous BFS is degenerate since among its 8 basic components just 7 are positive.

Computing the dual variables, we observe that the optimality conditions are violated in the following cells: (2,2), (2,3), (3,2), (4,1), (4,2) and (4,3).

The entering variable is (4,2) since it provides the most negative reduced cost. We put a $+$ in the lower left-hand part of this cell. To satisfy the equality constraints of the problem we need to subtract the same value as was added in cell (4,2) from one of the basic cells in row 4, that is in cells (4,4) or (4,5).

If we choose the cell (4,4), since this is the only basic cell in column 4, we cannot make the next correction in another basic cell in this column. In conclusion the adjustment must be made in the basic cell (4,5). Continuing in the same way we obtain the entire loop (presented in the above array): (4,2), (4,5), (2,5), (2,1), (3,1), (3,3), (1,3), (1,2).
Example 2. Solve the following transportation problem.

                Demand center
                1    2    3     Supply
         1      1    2    3       15
 Source  2      4    4   10       19
         3      6    5   15       11
 Demand         7    8   30

We check first the balance condition: $7 + 8 + 30 = 15 + 19 + 11$.

Step 1. We determine an initial BFS by using the minimal cost rule:

          1 | 7      2 | 8      3 | .     15 / 8 / 0
          4 | .      4 | .     10 | 19    19 / 0
          6 | .      5 | 0     15 | 11    11 / 0
 Demand   7 / 0      8 / 0     30 / 11 / 0

Since $c_{11} = 1 = \min\{c_{ij} : i = \overline{1,3}, j = \overline{1,3}\}$ we have
$$x_{11} = \min\{7, 15\} = 7.$$
The minimum cost of the remaining array is now $c_{12} = 2$, so $x_{12} = \min\{8, 8\} = 8$. Continuing with the cheapest remaining cells (in column 3), $x_{23} = \min\{30, 19\} = 19$ and then $x_{33} = 11$.

Since we obtain just 4 positive variables, and since we must have $3 + 3 - 1 = 5$ basic variables, we must transform a point into a basic variable with value 0. We can transform any point into a 0, since none of the present nonbasic variables has a loop with the basic variables. We consider $x_{32} = 0$ to be a basic variable.
Step 2. We check the optimality of the previous BFS.

            v1 = 1          v2 = 2          v3 = 12
 u1 = 0    1   1 | 7      2   2 | - 8     3  12 | +
 u2 = -2   4  -1 | .      4   0 | .      10  10 | 19
 u3 = 3    6   4 | .      5   5 | + 0    15  15 | - 11

$u_1 = 0$, $u_1 + v_1 = 1 \Rightarrow v_1 = 1$

$u_1 + v_2 = 2 \Rightarrow v_2 = 2$

$v_2 = 2$, $u_3 + v_2 = 5 \Rightarrow u_3 = 3$

$u_3 = 3$, $u_3 + v_3 = 15 \Rightarrow v_3 = 12$

$v_3 = 12$, $u_2 + v_3 = 10 \Rightarrow u_2 = -2$

Since $c_{13} = 3 < 12 = u_1 + v_3$, the BFS is not optimal.
Step 3. We improve the BFS, starting from the (1,3) cell.

The loop consists of the following cells: (1,3), (3,3), (3,2) and (1,2).

The odd (recipient) cells are: (1,3) and (3,2).

The even (donor) cells are: (1,2) and (3,3).

Since $\theta = \min\{8, 11\} = 8 = x_{12}$, the dropping variable is $x_{12}$.

The new BFS is given in the array below.

            v1 = 1          v2 = -7         v3 = 3
 u1 = 0    1   1 | - 7    2  -7 | .       3   3 | + 8
 u2 = 7    4   8 | .      4   0 | .      10  10 | 19
 u3 = 12   6  13 | +      5   5 | 8      15  15 | - 3

We compute the dual variables, and since $c_{21} = 4 < 8 = u_2 + v_1$ and $c_{31} = 6 < 13 = u_3 + v_1$, the BFS is not optimal. The entering variable is $x_{31}$ (since it provides the most negative reduced cost). The loop is (3,1), (3,3), (1,3), (1,1), with $\theta = \min\{7, 3\} = 3 = x_{33}$, which will be the dropping variable.
The new array is:

            v1 = 1          v2 = 0          v3 = 3
 u1 = 0    1   1 | - 4    2   0 | .       3   3 | + 11
 u2 = 7    4   8 | +      4   7 | .      10  10 | - 19
 u3 = 5    6   6 | 3      5   5 | 8      15   8 | .

The new BFS obtained before is not optimal, since the optimality condition is violated in the cells (2,1) and (2,2). The entering variable is $x_{21}$. Repeating the procedure presented before, we obtain the following new BFS.
            v1 = -3         v2 = -4         v3 = 3
 u1 = 0    1  -3 | .      2  -4 | .       3   3 | 15
 u2 = 7    4   4 | 4      4   3 | .      10  10 | 15
 u3 = 9    6   6 | 3      5   5 | 8      15  12 | .

We compute the dual variables and since
$$c_{11} > u_1 + v_1, \quad c_{12} > u_1 + v_2, \quad c_{22} > u_2 + v_2, \quad c_{33} > u_3 + v_3,$$
the optimality criterion is satisfied.

Hence, we get an optimal solution
$$x = \begin{pmatrix} 0 & 0 & 15 \\ 4 & 0 & 15 \\ 3 & 8 & 0 \end{pmatrix}$$
with a minimum cost
$$f(x) = 3 \cdot 15 + 4 \cdot 4 + 10 \cdot 15 + 6 \cdot 3 + 5 \cdot 8 = 269.$$
According to the previous remark, part b), the previous solution is unique, since there is no nonbasic variable for which the reduced cost is zero.
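Since the transportation problem is just the LP (1), the answer can be double-checked with a general LP solver. A minimal sketch, assuming SciPy is available (the helper name transport is ours):

import numpy as np
from scipy.optimize import linprog

def transport(cost, supply, demand):
    """Solve a balanced transportation problem as the LP (1)."""
    cost = np.asarray(cost, dtype=float)
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1      # sum_j x_ij = a_i
    for j in range(n):
        A_eq[m + j, j::n] = 1               # sum_i x_ij = b_j
    res = linprog(cost.ravel(), A_eq=A_eq,
                  b_eq=list(supply) + list(demand), bounds=(0, None))
    return res.x.reshape(m, n), res.fun

plan, total = transport([[1, 2, 3], [4, 4, 10], [6, 5, 15]],
                        [15, 19, 11], [7, 8, 30])
print(total)   # 269.0, matching the cost found by the algorithm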
Example 3. Solve the following balanced transportation problem.

                Demand center
                1    2    3     Supply
         1      9    6    8       6
 Source  2     10    5   12      11
         3     11   13   20       4
 Demand         3    4   14

Solution. We determine an initial basic feasible solution by using the least cost rule. We show this BFS in the following array. The dual variables are also entered in the array.

            v1 = 10         v2 = 5          v3 = 12
 u1 = -4    9   6 | .      6   1 | .       8   8 | 6
 u2 = 0    10  10 | - 3    5   5 | 4      12  12 | + 4
 u3 = 8    11  18 | +     13  13 | .      20  20 | - 4

The optimality criterion is violated, since $c_{31} = 11 < 18 = u_3 + v_1$. Hence (3,1) is the entering cell, and the corresponding loop is already entered on the array. $\theta = \min\{3, 4\} = 3$ and (2,1) is the dropping cell. The next BFS is presented in the following array.
            v1 = 3          v2 = 5          v3 = 12
 u1 = -4    9  -1 | .      6   1 | .       8   8 | 6
 u2 = 0    10   3 | .      5   5 | 4      12  12 | 7
 u3 = 8    11  11 | 3     13  13 | .      20  20 | 1

Now the optimality criterion is satisfied, so the present BFS is an optimal solution:
$$x = \begin{pmatrix} 0 & 0 & 6 \\ 0 & 4 & 7 \\ 3 & 0 & 1 \end{pmatrix}.$$
The minimum cost is
$$f_{\min} = 8 \cdot 6 + 5 \cdot 4 + 12 \cdot 7 + 11 \cdot 3 + 20 \cdot 1 = 205.$$
Since in the last array there is a nonbasic cell, namely (3,2), for which $c_{32} = u_3 + v_2$, by choosing a loop starting with this cell we will obtain another optimal solution, as below:

            v1 = 3          v2 = 5          v3 = 12
 u1 = -4    9  -1 | .      6   1 | .       8   8 | 6
 u2 = 0    10   3 | .      5   5 | 3      12  12 | 8
 u3 = 8    11  11 | 3     13  13 | 1      20  20 | .

The new optimal solution is:
$$x^* = \begin{pmatrix} 0 & 0 & 6 \\ 0 & 3 & 8 \\ 3 & 1 & 0 \end{pmatrix}$$
with the same minimal cost $f_{\min} = f(x^*) = 205$.

The general solution (see the previous remark, part b) is:
$$x(t) = (1-t) x + t x^* = \begin{pmatrix} 0 & 0 & 6 \\ 0 & 4-t & 7+t \\ 3 & t & 1-t \end{pmatrix}, \quad t \in [0, 1].$$
Marginal values in the balanced transportation problem

In this subsection we analyse how changes in the availabilities and requirements $a_i$ and $b_j$ affect the transportation cost in a balanced transportation problem.

The marginal value is the rate of change in the optimum objective value per unit change in the availabilities and requirements $a_i$ and $b_j$.

Since the balance condition is necessary for feasibility, we cannot change only one quantity among $a_1, \ldots, a_m; b_1, \ldots, b_n$.

We will consider the following types of changes:

i) increased demand at demand center $j$ and the same balancing increase in availability at source $i$;

ii) increased availability at source $p$ and decreased availability at source $i$ by the same amount (this moves the supply from source $i$ to source $p$);

iii) increased demand at demand center $q$ and decreased demand at demand center $j$ by the same amount.

In all the cases presented above, all the other data in the balanced transportation problem remain the same.

Let $x$ and $(u, v)$ be optimal solutions for the primal and dual problems. Assume that $x$ is nondegenerate. According to Remark 6 of the previous section, the marginal values in the three cases presented before are:

i) $u_i + v_j$; ii) $u_p - u_i$; iii) $v_q - v_j$.

Example. Consider the balanced transportation problem presented in the previous example. In the next array we consider the first optimal solution $x$.
            v1 = 3          v2 = 5          v3 = 12
 u1 = -4    9  -1 | .      6   1 | .       8   8 | 6
 u2 = 0    10   3 | .      5   5 | 4      12  12 | 7
 u3 = 8    11  11 | 3     13  13 | .      20  20 | 1

The optimum transportation cost in this problem is 205.

According to the above discussion, if $b_2$ increases from its current value by 2 units and $a_2$ changes by the same amount (to keep the problem balanced), then the optimum objective value will change by $2(u_2 + v_2) = 2 \cdot 5 = 10$, taking the value 215.

We remark the fact that if the demand $b_2$ increases, the best place to create additional supplies to satisfy that additional demand is source 1 (it is the source with the smallest $u_i$). By shifting supply (2 units in our case) from source 2 to source 1, the company can save 8 monetary units, since the rate of change in the optimum transportation cost per unit shift is $u_1 - u_2 = -4 - 0 = -4$.
Remark. The previous discussion is true for sufficiently small changes in $b_j$ and $a_i$, which don't affect the basis of the transportation problem. In this case both the initial and the modified transportation problems have the same basic variables, which determine the same dual variables in both problems, as needed in Remark 6 from the previous section.
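The marginal value prediction can be checked by simply re-solving the perturbed LP. A minimal sketch, assuming SciPy is available (the helper name transport_cost is ours; it repeats the LP set-up from the earlier check):

import numpy as np
from scipy.optimize import linprog

def transport_cost(cost, supply, demand):
    """Optimal cost of a balanced transportation problem via the LP (1)."""
    cost = np.asarray(cost, dtype=float)
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1
    for j in range(n):
        A_eq[m + j, j::n] = 1
    return linprog(cost.ravel(), A_eq=A_eq,
                   b_eq=list(supply) + list(demand), bounds=(0, None)).fun

cost = [[9, 6, 8], [10, 5, 12], [11, 13, 20]]
print(transport_cost(cost, [6, 11, 4], [3, 4, 14]))   # 205.0
print(transport_cost(cost, [6, 13, 4], [3, 6, 14]))   # 215.0 = 205 + 2*(u2 + v2)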
Unbalanced transportation problem

So far we assumed that the total supply at all the sources is equal to the total demand at all the demand centers. This means that
$$\sum_{i=1}^{m} a_i = \sum_{j=1}^{n} b_j,$$
so the system is in balance.

In many applications it may be impossible (or unprofitable) to ship all that is required, or the total supply either exceeds or is less than the total demand. Such problems are called unbalanced and can be solved by the transportation algorithm as below.

(a) Supply exceeds demand (overproduction)

In this case $\left( \sum_{i} a_i > \sum_{j} b_j \right)$, after all the demand is met, an amount of $\sum_{i} a_i - \sum_{j} b_j$ will be left unused at the sources.

To solve this problem, we introduce a new $(n+1)$-th column in the array. For each $i = \overline{1, m}$, the cell $(i, n+1)$ corresponds to the material left unused at source $i$. The cost coefficients for all the cells in this new column are equal to zero and
$$b_{n+1} = \sum_{i=1}^{m} a_i - \sum_{j=1}^{n} b_j.$$
In the optimum solution of this modified problem, the basic values in the cells of the $(n+1)$-th column represent unused material at the sources.
(b) Demand exceeds supply (underproduction)

In this case $\left( \sum_{i} a_i < \sum_{j} b_j \right)$ there is a shortage of $\sum_{j} b_j - \sum_{i} a_i$ and we cannot meet all the demand with the existing supply.

To solve this problem, we introduce a new $(m+1)$-th row in the array. Since we have to find how to distribute the existing supply to meet as much of the demand as possible, we introduce a dummy source with availability $a_{m+1} = \sum_{j} b_j - \sum_{i} a_i$. The cost coefficients for all the cells in this new row are equal to zero.

In the optimum solution of this modified problem, the basic values in the cells of the $(m+1)$-th row represent unfulfilled demand at the demand centers.
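Both padding constructions are one small helper in code. A minimal sketch, assuming NumPy is available (the helper name balance is ours):

import numpy as np

def balance(cost, supply, demand):
    """Pad an unbalanced problem to a balanced one with a zero-cost
    dummy demand center (case a) or dummy source (case b)."""
    cost = np.asarray(cost, dtype=float)
    gap = sum(supply) - sum(demand)
    if gap > 0:        # (a) supply exceeds demand: add column n+1
        cost = np.hstack([cost, np.zeros((cost.shape[0], 1))])
        demand = list(demand) + [gap]
    elif gap < 0:      # (b) demand exceeds supply: add row m+1
        cost = np.vstack([cost, np.zeros((1, cost.shape[1]))])
        supply = list(supply) + [-gap]
    return cost, list(supply), list(demand)

# e.g. 10 units of overproduction become a dummy demand of 10:
print(balance([[11, 8, 2], [7, 5, 4]], [800, 310], [400, 500, 200]))

The balanced output can then be fed to the transportation algorithm (or to the LP check shown earlier) unchanged.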
Part II

Calculus

Chapter 3

One variable calculus

This part of the text contains an overview of one-variable calculus. One can cover this material either by taking the facultative mathematics course or by reading it on one's own as a review of the calculus already taken in high school. The examples contained in this part should make the process relatively simple.
A central goal of economic theory is to express and analyse relationships between economic variables, which are described mathematically by functions.

A numerical function $f : A \to B$, $A, B \subseteq \mathbb{R}$,
$$A \ni x \mapsto y = f(x) \in B$$
is a rule which assigns one and only one value $y = f(x)$ in $B$ to each element $x$ in $A$.

The variable $x$ is called the independent variable or, in economic applications, the exogenous variable; $y$ is called the dependent variable or, in economic applications, the endogenous variable.

The development of calculus by Isaac Newton (1642-1727) and Gottfried Wilhelm von Leibniz (1646-1716) resulted from studying certain mathematical problems such as:

- finding the tangent line to a curve at a given point;

- finding the extreme values of certain functions;

- finding the areas of planar regions bounded by arbitrary curves.

The study of the first two classes of problems mentioned before led to the creation of differential calculus. The study of the third class of problems led to the creation of integral calculus.
3.1 Differential calculus of one variable

The most important information in which we are interested concerns how a change in one variable affects the other. In the case when the relationships are expressed as linear functions, the effect of a change of one variable on the other is expressed by the slope of the function, but in the general case the effect of this change can be expressed by the derivative. In this section we will review some facts related to the derivative of a one-variable function, focusing on its role in quantifying relationships between variables. The derivative of a function is defined by using the notion of limit.
3.1.1 Limits and continuity

We will begin with the intuitive approach to the notion of a limit.

Let $f : \mathbb{R} \setminus \{3\} \to \mathbb{R}$, $f(x) = \dfrac{9 - x^2}{3 - x}$.

Even if $f(3)$ is not defined, $f(x)$ can be calculated for any value of $x$ near 3. Simple computations show that if $x$ approaches 3 from either the left or the right, then the values $f(x)$ approach 6. We say 6 is the limit of $f$ as $x$ approaches 3 and write
$$\lim_{x \to 3} \frac{9 - x^2}{3 - x} = 6 \quad \text{or} \quad f(x) \to 6 \text{ as } x \to 3.$$
Intuitively, the notion of $f(x)$ approaching a number $l$ as $x$ approaches a number $a$ is defined in the following way.

Let $f : D \to \mathbb{R}$, $D \subseteq \mathbb{R}$, $a \in D'$ (see appendix). If $f(x)$ can be made arbitrarily close to a number $l$ by taking $x$ sufficiently close to a number $a$ (but different from $a$), from both the left and the right side of $a$, then $\lim_{x \to a} f(x) = l$.

We shall use the notation $x \nearrow a$ (or $x \to a^-$) to denote that $x$ approaches $a$ from the left, and $x \searrow a$ (or $x \to a^+$) to denote that $x$ approaches $a$ from the right.

If the limits $\lim_{x \to a^-} f(x)$ and $\lim_{x \to a^+} f(x)$ have a common value $l$, we say that $\lim_{x \to a} f(x)$ exists and write $\lim_{x \to a} f(x) = l$.

Intuitively, the notion of an infinite limit, $\lim_{x \to a} f(x) = \infty$, is as follows: if $f$ can be made arbitrarily large by taking $x$ sufficiently close to a number $a$ (but different from $a$), from both the left and the right side of $a$, then $\lim_{x \to a} f(x) = \infty$.

The infinite limit $\lim_{x \to a} f(x) = -\infty$ can be described in a similar manner.

The intuitive definitions are too vague to be of any use in proving theorems. A proof of the existence of a limit can never be based on the previous intuitive approach. To give a rigorous demonstration of the existence of a limit, or to prove results concerning limits, we must present now the precise definition of a limit. This rigorous definition (the $\varepsilon$-$\delta$ definition) is due to Augustin-Louis Cauchy. He used $\varepsilon$ because of the correspondence between epsilon and the French word erreur, and $\delta$ because delta corresponds to différence.
Definition 1. ($\varepsilon$-$\delta$ definition of a limit)

Let $f : A \to \mathbb{R}$, $a \in A' \subseteq \mathbb{R}$.

$\lim_{x \to a} f(x) = l \in \mathbb{R}$ means: for every $\varepsilon > 0$ there exists a $\delta > 0$ such that $|f(x) - l| < \varepsilon$ whenever $0 < |x - a| < \delta$ and $x \in A$.

To try to understand the meaning behind this abstract definition, see the diagram below.
[Figure: the graph of $f$, with the band $(l - \varepsilon, l + \varepsilon)$ marked on the $y$-axis and the interval $(a - \delta, a + \delta)$ marked on the $x$-axis.]

We first take an $\varepsilon > 0$ and represent on the $y$-axis the interval $(l - \varepsilon, l + \varepsilon)$ around $l$. We then determine an interval $(a - \delta, a + \delta)$ around $a$ so that for all $x$-values (excluding $a$) inside the determined interval the corresponding values $f(x)$ lie inside $(l - \varepsilon, l + \varepsilon)$.

In general, the value of $\delta$ will depend on the value of $\varepsilon$. That is, we will always begin with $\varepsilon > 0$ and then determine an appropriate corresponding value for $\delta > 0$. There are many values of $\delta$ which work. Once a value for $\delta$ is found, all smaller values of $\delta$ also work.
In the next examples we will use the precise definition.

We will begin the proofs by letting $\varepsilon > 0$ be given. Then we take the expression $|f(x) - l| < \varepsilon$ and from this inequality we try to determine an appropriate value for $\delta$ (which depends on $\varepsilon$) such that $|x - a| < \delta$ will guarantee $|f(x) - l| < \varepsilon$.

Example 1. Prove that $\lim_{x \to 3} (4x - 2) = 10$.

Solution. Given $\varepsilon > 0$, we want to determine a $\delta > 0$ so that $|x - 3| < \delta$ will guarantee $|f(x) - 10| = |(4x - 2) - 10| < \varepsilon$. It is natural to try to determine a connection between $|(4x - 2) - 10|$ and $|x - 3|$. We have
$$|f(x) - 10| = |(4x - 2) - 10| = |4x - 12| = 4|x - 3|.$$
So $|f(x) - 10| < \varepsilon \Leftrightarrow 4|x - 3| < \varepsilon \Leftrightarrow |x - 3| < \frac{\varepsilon}{4}$.

The choice of $\delta$ is now clear. If we let $\delta = \frac{\varepsilon}{4}$ we have
$$|(4x - 2) - 10| < \varepsilon \Leftrightarrow |x - 3| < \delta.$$
Putting all these together, we can write down the following proof.

Given any $\varepsilon > 0$, let $\delta = \frac{\varepsilon}{4} > 0$. Then for all $x$ in the domain of the function $f$, if $0 < |x - 3| < \delta$, we have
$$|f(x) - 10| = |(4x - 2) - 10| = |4x - 12| = |4(x - 3)| = 4|x - 3| < 4\delta = \varepsilon.$$
This completes the proof of $\lim_{x \to 3} (4x - 2) = 10$.
Example 2. Prove that $\lim_{x \to 4} \dfrac{3}{x + 5} = \dfrac{1}{3}$.

Solution. We will start with the analysis of the problem. Observe that
$$\left| f(x) - \frac{1}{3} \right| = \left| \frac{3}{x + 5} - \frac{1}{3} \right| = \left| \frac{4 - x}{3(x + 5)} \right| = \frac{|x - 4|}{3|x + 5|}.$$
So,
$$\left| \frac{3}{x + 5} - \frac{1}{3} \right| < \varepsilon \Leftrightarrow \frac{|x - 4|}{3|x + 5|} < \varepsilon \Leftrightarrow |x - 4| < 3\varepsilon |x + 5|.$$
We cannot just take $\delta = 3\varepsilon |x + 5|$, since $\delta$ should not depend on $x$. We see that what we need here is a constant $k$ so that for $x$ close enough to 4, $\frac{1}{3|x + 5|} \le k$, which is equivalent to $|x + 5| \ge$ a constant.

If $|x - 4| < \delta$, then $|x + 5| = |x - 4 + 9| \ge 9 - |x - 4| > 9 - \delta$. If we take $\delta \le 1$ then the previous inequality becomes $|x + 5| > 9 - \delta \ge 8$.

In consequence we will have
$$\frac{1}{3|x + 5|} < \frac{1}{24}$$
and
$$\left| \frac{3}{x + 5} - \frac{1}{3} \right| = \frac{|x - 4|}{3|x + 5|} < \frac{|x - 4|}{24} < \varepsilon.$$
We observe that $\delta$ must satisfy two conditions: $\delta \le 1$ and $\delta \le 24\varepsilon$.

We can now write down the following proof:

Let $\varepsilon > 0$ and $\delta = \min\{1, 24\varepsilon\}$. Then $\delta > 0$, and if $x$ is in the domain of the function $f$ and if $0 < |x - 4| < \delta$, we have
$$|x + 5| = |x - 4 + 9| \ge 9 - |x - 4| > 9 - \delta \ge 9 - 1 = 8$$
and in consequence we get
$$\left| \frac{3}{x + 5} - \frac{1}{3} \right| = \left| \frac{4 - x}{3(x + 5)} \right| = \frac{|x - 4|}{3|x + 5|} < \frac{|x - 4|}{24} < \frac{\delta}{24} \le \frac{24\varepsilon}{24} = \varepsilon.$$
This completes the proof of
$$\lim_{x \to 4} \frac{3}{x + 5} = \frac{1}{3}.$$
We will present now the definitions for the limits that involve infinity.

Definition 2. a) Let $f : A \to \mathbb{R}$ and let $a \in A' \subseteq \mathbb{R}$.

$\lim_{x \to a} f(x) = \infty$ means: for each $\varepsilon > 0$ there exists a $\delta > 0$ such that $f(x) > \varepsilon$ whenever $0 < |x - a| < \delta$ and $x \in A$.

$\lim_{x \to a} f(x) = -\infty$ means: for each $\varepsilon > 0$ there exists a $\delta > 0$ such that $f(x) < -\varepsilon$ whenever $0 < |x - a| < \delta$ and $x \in A$.

b) Let $f : A \to \mathbb{R}$ such that there is an $a \in \mathbb{R}$ with $(a, \infty) \subseteq A$.

$\lim_{x \to \infty} f(x) = l \in \mathbb{R}$ if for each $\varepsilon > 0$ there is a $\delta > 0$ such that $|f(x) - l| < \varepsilon$ whenever $x > \delta$ and $x \in A$.

Similarly we can define $\lim_{x \to -\infty} f(x) = l \in \mathbb{R}$.

$\lim_{x \to \infty} f(x) = \infty$ if for each $\varepsilon > 0$ there is a $\delta > 0$ such that $f(x) > \varepsilon$ whenever $x > \delta$ and $x \in A$.

Similarly we can define $\lim_{x \to \infty} f(x) = -\infty$ and $\lim_{x \to -\infty} f(x) = \pm\infty$.
The following properties of limits, which we list without proof, enable us to evaluate limits of functions algebraically.

Property 1. If $f(x) = c$, where $c$ is a constant, then $\lim_{x \to a} f(x) = c$.

Property 1 states that the limit of the constant function $f(x) = c$ at any point $x = a$ is equal to the value of the constant function, which is $c$.

Property 2. If $f(x) = x$, then $\lim_{x \to a} f(x) = \lim_{x \to a} x = a$.

Property 3. If $\lim_{x \to a} f(x) = l$ and $n \in \mathbb{R}$ is such that $(f(x))^n$ is well defined, then
$$\lim_{x \to a} [f(x)]^n = \left( \lim_{x \to a} f(x) \right)^n = l^n.$$
Property 3 states that the limit of the $n$-th power of a function is equal to the $n$-th power of the limit of the function. For this result to be true we must assume that $x$ is chosen so that the $n$-th power of $f$ is well defined for $x$ close to $a$. For instance, if $n = \frac{1}{2}$ then $f(x)$ cannot be negative.

Property 4. If $\lim_{x \to a} f(x) = l \in \mathbb{R}$ and $k$ is a constant, then
$$\lim_{x \to a} (kf)(x) = k \left( \lim_{x \to a} f(x) \right) = kl.$$
Property 4 states that the limit of a constant times a function is equal to the constant times the limit of the function.

Property 5. If $\lim_{x \to a} f(x) = l \in \mathbb{R}$ and $\lim_{x \to a} g(x) = m \in \mathbb{R}$, then
$$\lim_{x \to a} (f \pm g)(x) = \lim_{x \to a} f(x) \pm \lim_{x \to a} g(x).$$
So, the limit of the sum or difference of two functions is equal to the sum or difference of their limits. This result is easily extended to the case involving the sum and (or) difference of any finite number of functions.

Property 6. If $\lim_{x \to a} f(x) = l \in \mathbb{R}$ and $\lim_{x \to a} g(x) = m \in \mathbb{R}$, then
$$\lim_{x \to a} (fg)(x) = \lim_{x \to a} f(x) \cdot \lim_{x \to a} g(x).$$
The previous property can also be easily extended to the case involving the product of any finite number of functions.

Property 7. If $\lim_{x \to a} f(x) = l$, $\lim_{x \to a} g(x) = m$ and $m \ne 0$, then
$$\lim_{x \to a} \left( \frac{f}{g} \right)(x) = \frac{\lim_{x \to a} f(x)}{\lim_{x \to a} g(x)} = \frac{l}{m}.$$
The previous rules tell us that the limit operations interact with all the basic
algebraic operations in a natural way.
Example 3. Compute the following limit:
$$\lim_{x \to 2} \frac{(2x^2 + 1)(3x - 1)}{x + 4}.$$
Solution.
$$\lim_{x \to 2} \frac{(2x^2 + 1)(3x - 1)}{x + 4} \stackrel{P7}{=} \frac{\lim_{x \to 2} (2x^2 + 1)(3x - 1)}{\lim_{x \to 2} (x + 4)} \stackrel{P6}{=} \frac{\lim_{x \to 2} (2x^2 + 1) \cdot \lim_{x \to 2} (3x - 1)}{\lim_{x \to 2} (x + 4)}$$
$$\stackrel{P5}{=} \frac{\left( \lim_{x \to 2} 2x^2 + \lim_{x \to 2} 1 \right) \left( \lim_{x \to 2} 3x - \lim_{x \to 2} 1 \right)}{\lim_{x \to 2} x + \lim_{x \to 2} 4} \stackrel{P1, P4}{=} \frac{\left( 2 \lim_{x \to 2} x^2 + 1 \right) \left( 3 \lim_{x \to 2} x - 1 \right)}{\lim_{x \to 2} x + 4}$$
$$\stackrel{P3, P2}{=} \frac{\left( 2 \left( \lim_{x \to 2} x \right)^2 + 1 \right) (3 \cdot 2 - 1)}{2 + 4} = \frac{(2 \cdot 4 + 1) \cdot 5}{6} = \frac{9 \cdot 5}{6} = \frac{15}{2}.$$
In certain situations, the attempt to apply Property 7 leads to the expression $\frac{0}{0}$ (that is, both the numerator and the denominator have limit 0 at $x = a$). In this case we say that the quotient $\frac{f(x)}{g(x)}$ has the indeterminate form $\frac{0}{0}$ at $x = a$.

To solve the problem, we have to replace the given function with another one that takes the same values as the original function except at $x = a$, and to evaluate the limit of the latter. The next examples illustrate this process.
Example 4. Find $\lim_{x \to 2} \dfrac{x^2 + 6x - 16}{x^2 - 5x + 6}$.

Solution. The limit of the denominator is
$$\lim_{x \to 2} (x^2 - 5x + 6) = 2^2 - 5 \cdot 2 + 6 = 0.$$
So we cannot use Property 7. We see if it is possible to simplify the given function. We can try to factorize the denominator and the numerator too. The fact that the denominator has limit zero suggests that 2 is a root of the denominator, and so $x - 2$ is a factor of the denominator. In the same way we can conclude that $x - 2$ is also a factor of the numerator. Thus,
$$\lim_{x \to 2} \frac{x^2 + 6x - 16}{x^2 - 5x + 6} = \lim_{x \to 2} \frac{(x - 2)(x + 8)}{(x - 2)(x - 3)} = \lim_{x \to 2} \frac{x + 8}{x - 3} = \frac{2 + 8}{2 - 3} = -10.$$
Example 5. Find $\lim_{x \to 4} \dfrac{\sqrt{3x + 4} - 4}{x - 4}$.

Solution. The limit is again of the form $\frac{0}{0}$. The trouble this time is that it might not be very clear how we can find the hidden factor $x - 4$ in the numerator. The technique to be applied in this case is to rationalize by using the idea of the conjugate expression.

In our example, when $\sqrt{3x + 4} - 4$ is multiplied by $\sqrt{3x + 4} + 4$ we will get
$$(\sqrt{3x + 4})^2 - 4^2 = 3x + 4 - 16 = 3(x - 4).$$
The square root disappears and a polynomial is obtained. Here are the details:
$$\lim_{x \to 4} \frac{\sqrt{3x + 4} - 4}{x - 4} = \lim_{x \to 4} \frac{(\sqrt{3x + 4} - 4)(\sqrt{3x + 4} + 4)}{(x - 4)(\sqrt{3x + 4} + 4)} = \lim_{x \to 4} \frac{3(x - 4)}{(x - 4)(\sqrt{3x + 4} + 4)}$$
$$= \lim_{x \to 4} \frac{3}{\sqrt{3x + 4} + 4} = \frac{3}{\sqrt{3 \cdot 4 + 4} + 4} = \frac{3}{8}.$$
Example 6. Compute $\lim_{x \to \infty} \dfrac{3x^2 + 8x - 4}{2x^2 + 4x - 5}$ if it exists.

Solution. Since the limits of both the numerator and the denominator are $\infty$, Property 7 is not applicable. We will try to put the function into a form in which we can find the required limit. By taking out as common factor $x^2$ (the highest power of $x$ appearing in the denominator) from both the numerator and the denominator, we obtain
$$\lim_{x \to \infty} \frac{3x^2 + 8x - 4}{2x^2 + 4x - 5} = \lim_{x \to \infty} \frac{x^2 \left( 3 + \frac{8}{x} - \frac{4}{x^2} \right)}{x^2 \left( 2 + \frac{4}{x} - \frac{5}{x^2} \right)} = \lim_{x \to \infty} \frac{3 + \frac{8}{x} - \frac{4}{x^2}}{2 + \frac{4}{x} - \frac{5}{x^2}}$$
$$= \frac{\lim_{x \to \infty} 3 + 8 \lim_{x \to \infty} \frac{1}{x} - 4 \lim_{x \to \infty} \frac{1}{x^2}}{\lim_{x \to \infty} 2 + 4 \lim_{x \to \infty} \frac{1}{x} - 5 \lim_{x \to \infty} \frac{1}{x^2}} = \frac{3 + 8 \cdot 0 - 4 \cdot 0}{2 + 4 \cdot 0 - 5 \cdot 0} = \frac{3}{2}.$$
Observe that we have used $\lim_{x \to \infty} \frac{1}{x} = \lim_{x \to \infty} \frac{1}{x^2} = 0$ in evaluating the second and third terms of both the numerator and the denominator.
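If SymPy is available, the limits computed by hand in Examples 4-6 can be double-checked mechanically; sp.limit is SymPy's standard limit routine.

import sympy as sp

x = sp.symbols('x')
print(sp.limit((x**2 + 6*x - 16) / (x**2 - 5*x + 6), x, 2))         # -10
print(sp.limit((sp.sqrt(3*x + 4) - 4) / (x - 4), x, 4))             # 3/8
print(sp.limit((3*x**2 + 8*x - 4) / (2*x**2 + 4*x - 5), x, sp.oo))  # 3/2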
The previous remark can be generalized as follows.

Property 8. $\lim_{x \to \infty} \dfrac{1}{x^n} = 0$ for all $n > 0$ and $\lim_{x \to -\infty} \dfrac{1}{x^n} = 0$ for all $n > 0$, provided that $\dfrac{1}{x^n}$ is defined.
Some limits are best calculated by first finding the left- and right-hand limits.

Definition 3. (left-hand limit). Let $f : A \to \mathbb{R}$ and let $a \in A' \subseteq \mathbb{R}$.

$\lim_{x \to a^-} f(x) = l \in \mathbb{R}$ if for every number $\varepsilon > 0$ there is a number $\delta > 0$ such that if $a - \delta < x < a$ and $x \in A$ then $|f(x) - l| < \varepsilon$.

$\lim_{x \to a^-} f(x) = \infty$ if for every number $\varepsilon > 0$ there is a number $\delta > 0$ such that if $a - \delta < x < a$ and $x \in A$ then $f(x) > \varepsilon$.

Similarly, we can define $\lim_{x \to a^-} f(x) = -\infty$.

Definition 4. (right-hand limit). Let $f : A \to \mathbb{R}$ and let $a \in A' \subseteq \mathbb{R}$.

$\lim_{x \to a^+} f(x) = l \in \mathbb{R}$ if for every number $\varepsilon > 0$ there is a number $\delta > 0$ such that if $a < x < a + \delta$ and $x \in A$ then $|f(x) - l| < \varepsilon$.

$\lim_{x \to a^+} f(x) = \infty$ if for every number $\varepsilon > 0$ there is a number $\delta > 0$ such that if $a < x < a + \delta$ and $x \in A$ then $f(x) > \varepsilon$.

Similarly, we can define $\lim_{x \to a^+} f(x) = -\infty$.

Notice that Definition 3 is the same as Definition 1 except that $x$ is restricted to lie in the left half $(a - \delta, a)$ of the interval $(a - \delta, a + \delta)$. In Definition 4, $x$ is restricted to lie in the right half $(a, a + \delta)$ of the interval $(a - \delta, a + \delta)$.

The following theorem says that a limit exists if and only if both of the one-sided limits exist and are equal.

Theorem 1. Let $f : A \to \mathbb{R}$ and let $a \in A' \subseteq \mathbb{R}$. Then $\lim_{x \to a} f(x) = l$ if and only if $\lim_{x \to a^-} f(x) = \lim_{x \to a^+} f(x) = l$.
Example 7. If
$$f(x) = \begin{cases} \sqrt{x - 6}, & \text{if } x > 6 \\ 2x - 12, & \text{if } x < 6, \end{cases}$$
determine whether $\lim_{x \to 6} f(x)$ exists.

Solution. Since $f(x) = \sqrt{x - 6}$ for $x > 6$, we have
$$\lim_{x \to 6^+} f(x) = \lim_{x \to 6^+} \sqrt{x - 6} = \sqrt{6 - 6} = 0.$$
Since $f(x) = 2x - 12$ for $x < 6$, we have
$$\lim_{x \to 6^-} f(x) = \lim_{x \to 6^-} (2x - 12) = 2 \cdot 6 - 12 = 0.$$
The right- and left-hand limits are equal. Thus the limit exists and $\lim_{x \to 6} f(x) = 0$.
By using limits we can define the notion of asymptote of a real valued function. A linear asymptote is essentially a straight line to which the graph of the function becomes closer and closer but does not become identical.

A function may have multiple asymptotes, of different or of the same kind. One such function, with a horizontal, a vertical and an oblique asymptote, is graphed below.

[Figure: the graph of $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$,
$$f(x) = \begin{cases} \frac{1}{x} + x, & x > 0 \\ \frac{1}{x}, & x < 0, \end{cases}$$
with vertical asymptote $x = 0$, horizontal asymptote $y = 0$ (as $x \to -\infty$) and oblique asymptote $y = x$ (as $x \to \infty$).]
Definition 5. (asymptote)

a) horizontal asymptote

The line $y = l \in \mathbb{R}$ is called a horizontal asymptote of the curve $y = f(x)$ if
$$\lim_{x \to \infty} f(x) = l \quad \text{or} \quad \lim_{x \to -\infty} f(x) = l.$$

b) vertical asymptote

The line $x = a \in \mathbb{R}$ is called a vertical asymptote of the curve $y = f(x)$ if at least one of the following statements is true:
$$\lim_{x \to a} f(x) = \infty \; (\text{or } -\infty); \quad \lim_{x \to a^-} f(x) = \infty \; (\text{or } -\infty); \quad \lim_{x \to a^+} f(x) = \infty \; (\text{or } -\infty).$$

c) oblique (or slant) asymptote

The line $y = mx + n$, $m \ne 0$, is called an oblique (or slant) asymptote if
$$\lim_{x \to \infty} (f(x) - (mx + n)) = 0 \quad \text{or} \quad \lim_{x \to -\infty} (f(x) - (mx + n)) = 0.$$
In this case
$$m = \lim_{x \to \infty} \frac{f(x)}{x} \quad \text{and} \quad n = \lim_{x \to \infty} (f(x) - mx)$$
or
$$m = \lim_{x \to -\infty} \frac{f(x)}{x} \quad \text{and} \quad n = \lim_{x \to -\infty} (f(x) - mx).$$
In particular a function $y = f(x)$ can have at most 2 horizontal or 2 oblique asymptotes (or one of each).
Example 8. Find the asymptotes of the graph of the function defined by
$$f(x) = \frac{\sqrt{2x^2 + 1}}{3x - 5}.$$
Solution. First we determine the domain of $f$:
$$A = \{x \in \mathbb{R} \mid 2x^2 + 1 \ge 0, \; 3x - 5 \ne 0\} = \mathbb{R} \setminus \left\{ \frac{5}{3} \right\}.$$

$$\lim_{x \to \infty} \frac{\sqrt{2x^2 + 1}}{3x - 5} = \lim_{x \to \infty} \frac{\sqrt{x^2 \left( 2 + \frac{1}{x^2} \right)}}{x \left( 3 - \frac{5}{x} \right)} = \lim_{x \to \infty} \frac{x \sqrt{2 + \frac{1}{x^2}}}{x \left( 3 - \frac{5}{x} \right)} = \lim_{x \to \infty} \frac{\sqrt{2 + \frac{1}{x^2}}}{3 - \frac{5}{x}} = \frac{\sqrt{2}}{3}.$$
Therefore the line $y = \frac{\sqrt{2}}{3}$ is a horizontal asymptote of the graph of $f$.

In computing the limit as $x \to -\infty$, we must remember that for $x < 0$ we have $\sqrt{x^2} = |x| = -x$. So, when we take out as common factor $x^2$, we have
$$\sqrt{2x^2 + 1} = \sqrt{x^2 \left( 2 + \frac{1}{x^2} \right)} = |x| \sqrt{2 + \frac{1}{x^2}} = -x \sqrt{2 + \frac{1}{x^2}}.$$
Therefore
$$\lim_{x \to -\infty} \frac{\sqrt{2x^2 + 1}}{3x - 5} = \lim_{x \to -\infty} \frac{-x \sqrt{2 + \frac{1}{x^2}}}{x \left( 3 - \frac{5}{x} \right)} = \lim_{x \to -\infty} \frac{-\sqrt{2 + \frac{1}{x^2}}}{3 - \frac{5}{x}} = -\frac{\sqrt{2}}{3}.$$
Thus the line $y = -\frac{\sqrt{2}}{3}$ is also a horizontal asymptote. A vertical asymptote is likely to occur when the denominator $3x - 5$ is 0, that is when $x = \frac{5}{3}$:
$$\lim_{x \to \frac{5}{3}^+} \frac{\sqrt{2x^2 + 1}}{3x - 5} = \frac{\sqrt{2 \left( \frac{5}{3} \right)^2 + 1}}{+0} = \infty.$$
If $x$ is close to $\frac{5}{3}$ but $x < \frac{5}{3}$, then $3x - 5 < 0$ and so $f(x)$ is large negative. Thus
$$\lim_{x \to \frac{5}{3}^-} \frac{\sqrt{2x^2 + 1}}{3x - 5} = -\infty,$$
so $x = \frac{5}{3}$ is a vertical asymptote. Since we already have two horizontal asymptotes there are no oblique asymptotes of $f$.
Example 9. Determine the horizontal and oblique asymptotes of the function defined by
$$f(x)=\sqrt{x^2+x}-x.$$
Solution.
$$D=\{x\in\mathbb{R}\mid x^2+x\geq 0\}=\{x\in\mathbb{R}\mid x(x+1)\geq 0\}=(-\infty,-1]\cup[0,\infty)$$
We compute first
$$\lim_{x\to\infty}f(x)=\lim_{x\to\infty}(\sqrt{x^2+x}-x).$$
Both $\sqrt{x^2+x}$ and $x$ are large when $x$ is large, so it is very difficult to see what happens to their difference. We will use algebra to rewrite the function: we first multiply both the numerator and the denominator by the conjugate radical.
$$\lim_{x\to\infty}(\sqrt{x^2+x}-x)=\lim_{x\to\infty}(\sqrt{x^2+x}-x)\cdot\frac{\sqrt{x^2+x}+x}{\sqrt{x^2+x}+x}=\lim_{x\to\infty}\frac{(x^2+x)-x^2}{\sqrt{x^2+x}+x}$$
$$=\lim_{x\to\infty}\frac{x}{\sqrt{x^2+x}+x}=\lim_{x\to\infty}\frac{x}{\sqrt{x^2\left(1+\frac{1}{x}\right)}+x}=\lim_{x\to\infty}\frac{x}{x\left(\sqrt{1+\frac{1}{x}}+1\right)}=\lim_{x\to\infty}\frac{1}{\sqrt{1+\frac{1}{x}}+1}=\frac{1}{2}$$
So, $y=\frac{1}{2}$ is a horizontal asymptote of $f$.
Since
$$\lim_{x\to-\infty}(\sqrt{x^2+x}-x)=\infty+\infty=\infty,$$
there is no horizontal asymptote at $-\infty$.
It remains to look for an oblique asymptote at $-\infty$:
$$m=\lim_{x\to-\infty}\frac{f(x)}{x}=\lim_{x\to-\infty}\frac{\sqrt{x^2+x}-x}{x}$$
If in the previous limit we make the substitution $y=-x$ then $y\to\infty$ and the limit becomes
$$m=\lim_{y\to\infty}\frac{\sqrt{(-y)^2+(-y)}-(-y)}{-y}=\lim_{y\to\infty}\frac{\sqrt{y^2-y}+y}{-y}=\lim_{y\to\infty}\frac{y\left(\sqrt{1-\frac{1}{y}}+1\right)}{-y}=-\lim_{y\to\infty}\left(\sqrt{1-\frac{1}{y}}+1\right)=-2$$
$$n=\lim_{x\to-\infty}(f(x)-mx)=\lim_{x\to-\infty}(f(x)+2x)=\lim_{x\to-\infty}(\sqrt{x^2+x}+x)$$
$$=\lim_{y\to\infty}(\sqrt{y^2-y}-y)=\lim_{y\to\infty}\frac{y^2-y-y^2}{\sqrt{y^2-y}+y}=\lim_{y\to\infty}\frac{-y}{\sqrt{y^2-y}+y}=\lim_{y\to\infty}\frac{-y}{y\left(\sqrt{1-\frac{1}{y}}+1\right)}=\lim_{y\to\infty}\frac{-1}{\sqrt{1-\frac{1}{y}}+1}=-\frac{1}{2}$$
In conclusion, the line $y=-2x-\frac{1}{2}$ is a slant asymptote at $-\infty$.
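The asymptotes found in Example 9 can be double-checked symbolically. A minimal sketch, assuming the sympy library is available:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sqrt(x**2 + x) - x

print(sp.limit(f, x, sp.oo))        # 1/2: horizontal asymptote y = 1/2
m = sp.limit(f / x, x, -sp.oo)      # -2
n = sp.limit(f - m*x, x, -sp.oo)    # -1/2: slant asymptote y = -2x - 1/2
print(m, n)
```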
Next, we want to look at another useful technique for finding limits. We start with an example.
Example 10. Find $\lim_{x\to 0}x^2\cos\frac{1}{x}$.
Solution. We know from Property 6 that the limit of a product is the product of the limits, provided that the limits of the factors exist. If we tried to apply this result, we would say that the limit of $x^2\cos\frac{1}{x}$ is the limit of $x^2$ times the limit of $\cos\frac{1}{x}$. The problem is that the limit of $\cos\frac{1}{x}$ does not exist, and so the limit property for products cannot be applied. Indeed, at the points $x=\frac{1}{2n\pi}$ we have
$$\cos\frac{1}{\;\frac{1}{2n\pi}\;}=\cos 2n\pi=1$$
for any nonzero natural number $n$, while at the points $x=\frac{1}{\frac{\pi}{2}+n\pi}$ we have
$$\cos\frac{1}{\;\frac{1}{\frac{\pi}{2}+n\pi}\;}=\cos\left(\frac{\pi}{2}+n\pi\right)=0$$
for any natural number $n$. Since the values of $\cos\frac{1}{x}$ do not approach a fixed number as $x$ approaches 0, $\lim_{x\to 0}\cos\frac{1}{x}$ does not exist. Let us notice that even though $\cos\frac{1}{x}$ oscillates, it oscillates between fixed bounds, namely $-1$ and $1$. So, as long as $x\neq 0$, we have
$$-1\leq\cos\frac{1}{x}\leq 1.$$
We multiply the previous inequality by $x^2$ and we get
$$-x^2\leq x^2\cos\frac{1}{x}\leq x^2,\quad\text{for all } x\neq 0.$$
Notice that $x^2$ is always positive, so when we multiply the inequality by it we do not need to reverse the inequality signs.
As $x\to 0$, both $-x^2\to 0$ and $x^2\to 0$. Being squeezed between two functions that approach 0, the function $x^2\cos\frac{1}{x}$ is forced to go to 0 too. So we can conclude that it also has limit 0, that is,
$$\lim_{x\to 0}x^2\cos\frac{1}{x}=0.$$
The way we solved the above example suggests that we can write down a general result.
Theorem 2. (Squeeze theorem). Suppose that $f(x)\leq g(x)\leq h(x)$ for all $x$ close to $a$, except possibly for $x=a$. If $\lim_{x\to a}f(x)=\lim_{x\to a}h(x)=l$ then $\lim_{x\to a}g(x)=l$.
If we compute the limit of a polynomial $f$ at a given point $a$, then the limit will be $f(a)$. For instance,
$$\lim_{x\to 2}(2x^2+4x+1)=2\cdot 2^2+4\cdot 2+1.$$
Functions with this property are called continuous at $a$. We will see that the mathematical definition of continuity corresponds closely to the meaning of the word continuity in everyday language.
Definition 6. Let $f:A\to\mathbb{R}$ and $a\in A$.
a) If $a\in A'$ we say that $f$ is continuous at $a$ if and only if
$$\lim_{x\to a}f(x)=f(a).$$
If $a\in A\setminus A'$ ($a$ is an isolated point of $A$) then $f$ is continuous at $a$.
b) If $a\in A'$ we say that $f$ is continuous from the right at $a$ if
$$\lim_{x\to a^+}f(x)=f(a).$$
c) If $a\in A'$ we say that $f$ is continuous from the left at $a$ if
$$\lim_{x\to a^-}f(x)=f(a).$$
The previous definition says that $f$ is continuous at an accumulation point $a$ if $f(x)$ approaches $f(a)$ as $x$ approaches $a$. A continuous function $f$ has the property that a small change in $x$ produces only a small change in $f(x)$.
Geometrically, the graph of a function which is continuous at each point of a given interval can be drawn without lifting the pen from the paper.
We say that $f$ is discontinuous at $a$, or $f$ has a discontinuity at $a$, if $f$ is not continuous at $a$.
Let $f:I\to\mathbb{R}$ where $I$ is an interval on the real axis. The function $f$ is continuous on $I$ if it is continuous at each point in the interval. If $f$ is defined only on one side of an endpoint of the interval, we understand continuous at the endpoint to mean continuous from the right or continuous from the left.
Instead of using Definition 6 to verify the continuity of a function, it is often convenient to use the next theorem, which shows how to build up complicated continuous functions from simple ones.
Theorem 3. a) If $f$ and $g$ are continuous at $a$ and $c$ is a constant, then the following functions are also continuous at $a$:
$$f\pm g,\quad cf,\quad fg\quad\text{and}\quad\frac{f}{g}\ \text{ if } g(a)\neq 0.$$
b) If $g$ is continuous at $a$ and $f$ is continuous at $g(a)$ then the composite function $f\circ g$ (given by $(f\circ g)(x)=f(g(x))$) is continuous at $a$.
c) The following types of functions are continuous at every point in their domains: polynomials, rational functions, root functions, trigonometric functions, inverse trigonometric functions, exponential functions and logarithmic functions.
Intuitively, part b) of the previous theorem is reasonable because if $x$ is close to $a$, then $g(x)$ is close to $g(a)$, and since $f$ is continuous at $g(a)$, $f(g(x))$ is close to $f(g(a))$.
Example 11. Where is the following function continuous?
$$f:A\to\mathbb{R},\quad f(x)=\frac{1}{\sqrt{x^2+16}-5}$$
Solution. The function $f$ is the composition of four continuous functions,
$$f=f_1\circ f_2\circ f_3\circ f_4$$
where $f_1(x)=\frac{1}{x}$, $f_2(x)=x-5$, $f_3(x)=\sqrt{x}$ and $f_4(x)=x^2+16$.
We know that each of these functions is continuous on its domain (by Theorem 3, part c)) and so, by Theorem 3 (part a)), $f$ is continuous on its domain.
The domain $A$ of $f$ is:
$$A=\{x\in\mathbb{R}\mid x^2+16\geq 0,\ \sqrt{x^2+16}\neq 5\}=\{x\mid x\neq\pm 3\}=(-\infty,-3)\cup(-3,3)\cup(3,\infty).$$
An important property of continuous functions is expressed by the following theorem.
Theorem 4. (The intermediate value theorem). Let $f:[a,b]\to\mathbb{R}$ be a continuous function on the closed interval $[a,b]$ and let $m$ be any number between $f(a)$ and $f(b)$. Then there exists a number $c$ in $(a,b)$ such that $f(c)=m$.
The intermediate value theorem states that a continuous function takes on every intermediate value between the function values $f(a)$ and $f(b)$.
Example 12. Show that there is a root of the equation
$$4x^3-6x^2+3x-2=0$$
between 1 and 2.
Solution. Let $f:[1,2]\to\mathbb{R}$, $f(x)=4x^3-6x^2+3x-2$. We are looking for a number $c$ between 1 and 2 such that $f(c)=0$. Therefore we take $a=1$, $b=2$ and $m=0$ in the previous theorem.
We have
$$f(1)=4\cdot 1^3-6\cdot 1^2+3\cdot 1-2=-1<0$$
$$f(2)=4\cdot 2^3-6\cdot 2^2+3\cdot 2-2=12>0$$
So, $m=0$ is a number between $f(1)$ and $f(2)$. Since $f$ is continuous (as a polynomial function), the intermediate value theorem says there is a number $c$ between 1 and 2 such that $f(c)=0$.
The intermediate value theorem plays an important role in the way computers draw the graphs of continuous functions. A computer calculates a finite number of points on the graph and turns on the pixels that contain these calculated points.
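The intermediate value theorem is also the basis of the bisection method for locating roots. A minimal sketch in Python, applied to the equation of Example 12 (the tolerance is an arbitrary choice):

```python
def bisect(f, a, b, tol=1e-10):
    # f(a) and f(b) must have opposite signs, so the IVT guarantees a root
    assert f(a) * f(b) < 0
    while b - a > tol:
        c = (a + b) / 2
        if f(a) * f(c) <= 0:   # the sign change is in the left half
            b = c
        else:                  # the sign change is in the right half
            a = c
    return (a + b) / 2

print(bisect(lambda x: 4*x**3 - 6*x**2 + 3*x - 2, 1, 2))   # about 1.22
```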
We end this subsection by presenting an important property of continuous functions which will be used in the next sections.
Theorem 5. (The extreme value theorem). Let $f:[a,b]\to\mathbb{R}$ be a continuous function on $[a,b]$. Then there exist $c,d\in[a,b]$ such that $f(c)\geq f(x)$ and $f(d)\leq f(x)$ for all $x\in[a,b]$.
The extreme value theorem says that a continuous function on a closed interval has a maximum value and a minimum value, but it doesn't tell us how to find these extreme values.
3.1.2 Rates of change and derivatives
Starting from the slope of a straight line, we first try to introduce the notion of the slope of an arbitrary curve.
If a line passes through the points $(x_0,y_0)$ and $(x_1,y_1)$ then its slope is defined by (see [??])
$$m=\frac{y_1-y_0}{x_1-x_0}.$$
The numerator, $y_1-y_0$, is the change in $y$ which occurs when $x$ changes from $x_0$ to $x_1$. Mathematicians often use the symbol $\Delta$ to denote a change. Thus we write $y_1-y_0=\Delta y$ and $x_1-x_0=\Delta x$. Using this notation we have
$$m=\frac{y_1-y_0}{x_1-x_0}=\frac{\Delta y}{\Delta x}.$$
The quantity $\frac{\Delta y}{\Delta x}$ tells us how fast $y$ is changing with respect to $x$. It represents the rate of change of $y$ with respect to $x$. For example, if $\frac{\Delta y}{\Delta x}=2$, then $y$ is increasing twice as fast as $x$, while if $\frac{\Delta y}{\Delta x}=-2$, then $y$ is decreasing by two units as $x$ increases by one unit.
For example, if the straight line is the graph of the profits of a company, then the slope represents the change in profits, which may be increasing or decreasing, rapidly or slowly, depending on the sign and the size of the slope. The notion of rate of change is fundamental in economics and includes topics such as changes in profit, inflation rate or elasticity of demand.
However, practical situations in economics rarely generate straight line graphs. In consequence we have to extend the notion of slope to general curves. Even though the slope of a straight line is a single number, we cannot expect a single number to represent the steepness of a curve, which changes from point to point.
Suppose that the curve can be represented by the equation $y=f(x)$ and the point $P$ by the coordinates $(x_0,y_0)$. We will define the slope of the curve at $P$ to be equal to the slope of the tangent line to the curve at $P$. The word tangent is derived from the Latin word tangens, which means touching. Thus a tangent to a curve is a line that touches the curve. In other words, a tangent line should have the same direction as the curve at the point of contact. We will assume initially that for our curve the tangent line exists (it doesn't always exist).
Choose a second point on the curve reasonably close to $P$, say $Q(x,y)$. The line through the two points $P$ and $Q$ is called the secant line, whose slope is easily found:
$$m_{\sec}=\frac{y-y_0}{x-x_0}.$$
We can define the slope $m$ of the tangent line as the limit of the slope of the secant line as $x$ approaches $x_0$:
$$m=\lim_{x\to x_0}\frac{y-y_0}{x-x_0}=\lim_{x\to x_0}\frac{f(x)-f(x_0)}{x-x_0},$$
provided this limit exists.
The expression $\dfrac{f(x)-f(x_0)}{x-x_0}$ measures the average rate of change of $y=f(x)$ with respect to $x$ over the interval $[x_0,x]$ and provides us with an approximation to the rate of change of the function $f$ at $x_0$. The approximation becomes better and better as the intervals become shorter and shorter. This leads to the following definition of the rate of change of $f$ at $x_0$:
$$\lim_{x\to x_0}\frac{f(x)-f(x_0)}{x-x_0}\quad\text{(provided that the limit exists)}\qquad (1)$$
[Figure: the secant line through $P(x_0,y_0)$ and $Q(x,y)$, and the tangent line to the curve at $P$.]
The rate of change of a function $f$ at $x_0$ is often called the instantaneous rate of change of $f$ at $x_0$. Thus, the previous limit measures both the slope of the tangent line to the graph of $f$ at the point $P(x_0,y_0)$ and the (instantaneous) rate of change of the function $f$ at $x_0$.
Example 1. What is the slope of the parabola $y=x^2$ at the point $P(2,4)$?
Solution.
$$m_{\tan}=\lim_{x\to 2}\frac{f(x)-f(2)}{x-2}=\lim_{x\to 2}\frac{x^2-2^2}{x-2}=\lim_{x\to 2}\frac{(x-2)(x+2)}{x-2}=\lim_{x\to 2}(x+2)=4$$
[Figure: the parabola $y=x^2$ with the tangent line at $P(2,4)$.]
The equation of the tangent line can be obtained by using the point-slope formula
$$y-y_0=m(x-x_0)$$
(see [??]) with $x_0=2$, $y_0=f(2)=4$ and $m=4$.
In consequence, the equation will be $y-4=4(x-2)$, or $y=4x-4$.
Example 2. Find the slope at the origin of the curve with equation $y=|x|$.
Solution. We can rewrite the expression of the given curve with a split formula:
$$y=\begin{cases}x, & \text{if } x\geq 0\\ -x, & \text{if } x<0\end{cases}$$
[Figure: the graph of $y=|x|$, with $P(0,0)$, a point $Q(x,x)$ on the right branch $y=x$ and a point $Q(x,-x)$ on the left branch $y=-x$.]
First let $Q$ approach $P$ from the right, so that the coordinates of $Q$ are $(x,x)$. The slope of any such secant line is
$$\frac{y-y_0}{x-x_0}=\frac{x-0}{x-0}=1$$
so the right-hand limit of these slopes is also 1. However, when $Q$ approaches $P$ from the left, its coordinates are $(x,-x)$, which produces a left-hand limit of $-1$. Since the right- and the left-hand limits are not equal, the slope at the origin is not defined. Actually, any function which has a sharp corner fails to admit a tangent line at that corner.
Since a limit of the form (1) occurs whenever we calculate a rate of change in science and engineering, it is given a special name and notation.
Definition 1. Let $f:A\to\mathbb{R}$, $a\in A'$.
The derivative of the function $f$ at the point $a$, denoted by $f'(a)$, is
$$f'(a)=\lim_{x\to a}\frac{f(x)-f(a)}{x-a}\qquad (2)$$
if the limit exists and takes a finite value.
If we write $x=a+h$, then $h=x-a$ and $h$ approaches 0 if and only if $x$ approaches $a$. Therefore, an equivalent way of stating the definition of the derivative is
$$f'(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}\qquad (3)$$
We say that the function $f$ is differentiable at $a$ if $f$ admits a finite derivative at $a$.
We say that the function $f$ is differentiable on the set $A$ if $f$ is differentiable at each point of $A$.
In this case we can define the derivative function
$$f':A\to\mathbb{R}\quad\text{by}\quad x\mapsto f'(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}.$$
Mathematicians from the seventeenth to the nineteenth century believed that a continuous function usually possessed a derivative. In 1872 the German mathematician Karl Weierstrass destroyed this tenet by publishing an example of a function that was continuous at every real number but nowhere differentiable.
Actually, the opposite implication is true.
Theorem 1. Let $f:A\to\mathbb{R}$ and $a\in A'$. If $f$ is differentiable at $a$ then $f$ is continuous at $a$.
Example 3. Find the derivative of the function $f:\mathbb{R}\to\mathbb{R}$, $f(x)=x^2-4x+2$ at the point $a\in\mathbb{R}$.
Solution. From (3) we have
$$f'(a)=\lim_{h\to 0}\frac{f(a+h)-f(a)}{h}=\lim_{h\to 0}\frac{[(a+h)^2-4(a+h)+2]-[a^2-4a+2]}{h}$$
$$=\lim_{h\to 0}\frac{a^2+2ah+h^2-4a-4h+2-a^2+4a-2}{h}=\lim_{h\to 0}\frac{2ah+h^2-4h}{h}=\lim_{h\to 0}(2a+h-4)=2a-4.$$
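A small numerical sketch of definition (3) for this example: the difference quotient of $f$ at $a$ approaches $2a-4$ as $h$ shrinks (the values of $a$ and $h$ below are arbitrary choices).

```python
f = lambda x: x**2 - 4*x + 2
a = 3.0                                # f'(a) should be 2*3 - 4 = 2
for h in [0.1, 0.01, 0.001]:
    print(h, (f(a + h) - f(a)) / h)    # 2.1, 2.01, 2.001: tends to 2
```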
Example 4. Suppose $C(x)=8000+200x-0.2x^2$ $(0\leq x\leq 400)$ is the total cost that a company incurs in producing $x$ units of a certain commodity.
a) What is the actual cost incurred in manufacturing the 251st unit?
b) Find the rate of change of the total cost with respect to $x$ when $x=250$.
c) Compare the results obtained in parts a) and b).
Solution. a) The actual cost of producing the 251st unit is the difference between the total cost incurred in producing the first 251 units and the total cost of producing the first 250 units. Thus, the actual cost is given by
$$C(251)-C(250)=99.80.$$
b) The rate of change of the total cost function $C$ with respect to $x$ is given by the derivative of $C$ at the point 250:
$$C'(250)=\lim_{h\to 0}\frac{C(250+h)-C(250)}{h}$$
$$=\lim_{h\to 0}\frac{[8000+200(250+h)-0.2(250+h)^2]-[8000+200\cdot 250-0.2\cdot 250^2]}{h}$$
$$=\lim_{h\to 0}\frac{200h-0.2h^2-0.2\cdot 500h}{h}=\lim_{h\to 0}(200-0.2h-0.2\cdot 500)=200-100=100$$
c) From the solution of part a) we know that the actual cost of producing the 251st unit of the commodity is 99.80. This is closely approximated by the answer to part b), which is 100.
To explain this, we observe that
$$C'(250)=\lim_{h\to 0}\frac{C(250+h)-C(250)}{h}\approx\frac{C(250+h)-C(250)}{h}$$
for $h$ sufficiently small.
Taking $h=1$ (which is small enough compared to 250) we have
$$C'(250)\approx C(251)-C(250).$$
The cost of producing an additional unit of a certain commodity is called the marginal cost. If $C(x)$ is the total cost function for producing $x$ units of a certain commodity, then the marginal cost of producing one additional unit is $C(x+1)-C(x)$. This quantity can be approximated, as in the previous example, by the rate of change
$$C'(x)\approx C(x+1)-C(x).$$
For this reason, economists have defined the marginal cost function to be the derivative of the corresponding total cost function. Thus the word marginal is synonymous with derivative of.
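The approximation $C'(x)\approx C(x+1)-C(x)$ from Example 4 is immediate to check numerically; a minimal sketch ($C'$ is differentiated by hand here):

```python
C = lambda x: 8000 + 200*x - 0.2*x**2
Cprime = lambda x: 200 - 0.4*x        # derivative computed by hand

print(C(251) - C(250))   # 99.8: actual cost of the 251st unit
print(Cprime(250))       # 100.0: marginal cost at x = 250
```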
When we try to apply the definition of the derivative we encounter difficulties of an algebraic nature. In the next example it can be seen how much work is needed to compute the derivative of a relatively simple function.
Example 5. Find $f'(x)$ for $f(x)=\dfrac{2x+1}{x-2}$, $f:\mathbb{R}\setminus\{2\}\to\mathbb{R}$.
Solution.
$$f'(x)=\lim_{h\to 0}\frac{f(x+h)-f(x)}{h}=\lim_{h\to 0}\frac{\dfrac{2x+2h+1}{x+h-2}-\dfrac{2x+1}{x-2}}{h}$$
$$=\lim_{h\to 0}\frac{(2x+2h+1)(x-2)-(x+h-2)(2x+1)}{(x+h-2)(x-2)h}=\lim_{h\to 0}\frac{-5h}{(x+h-2)(x-2)h}$$
$$=\lim_{h\to 0}\frac{-5}{(x+h-2)(x-2)}=\frac{-5}{(x-2)^2}$$
So, the technical aspects of differentiation can be complex. We need other techniques for computing derivatives, which avoid using the formal definition. For this, we will consider the various ways in which two functions may be combined to form a new function. The techniques for handling such combinations are generally known as rules of differentiation. The most important of them are the following.
Rule 1. Constant multiple rule
The derivative of $cf$ (where $c$ is a constant) is $cf'$: $(cf)'=cf'$.
Rule 2. Sum rule
The derivative of $f+g$ is $f'+g'$: $(f+g)'=f'+g'$.
Rule 3. Difference rule
The derivative of $f-g$ is $f'-g'$: $(f-g)'=f'-g'$.
Rule 4. Product rule
The derivative of $fg$ is $f'g+fg'$: $(fg)'=f'g+fg'$.
Rule 5. Quotient rule
The derivative of $\dfrac{f}{g}$ is $\dfrac{f'g-fg'}{g^2}$: $\left(\dfrac{f}{g}\right)'=\dfrac{f'g-fg'}{g^2}$.
Rule 6. Chain rule
The derivative of the composite function $f\circ g$ is $(f'\circ g)\cdot g'$:
$$(f\circ g)'=(f'\circ g)\cdot g',\quad\text{so that}\quad (f\circ g)'(x)=f'(g(x))\cdot g'(x).$$
The general rules presented above allow us to compute derivatives of complicated functions which are built from the basic ones.
Example 6. Differentiate $f:\mathbb{R}\to\mathbb{R}$, $f(x)=(x^3-2x^2+4)(8x^2+5x)$.
Solution. From the product rule we have
$$f'(x)=(x^3-2x^2+4)'(8x^2+5x)+(x^3-2x^2+4)(8x^2+5x)'$$
$$=(3x^2-4x)(8x^2+5x)+(x^3-2x^2+4)(16x+5)$$
Example 7. Differentiate $f:D\to\mathbb{R}$, $f(x)=\dfrac{3x^2-1}{2x^3+5x^2+7}$.
Solution. From the quotient rule we have
$$f'(x)=\frac{(3x^2-1)'(2x^3+5x^2+7)-(3x^2-1)(2x^3+5x^2+7)'}{(2x^3+5x^2+7)^2}$$
$$=\frac{6x(2x^3+5x^2+7)-(3x^2-1)(6x^2+10x)}{(2x^3+5x^2+7)^2}=\frac{-6x^4+6x^2+52x}{(2x^3+5x^2+7)^2}$$
Example 8. Find the derivative of $h$,
$$h:\mathbb{R}\to\mathbb{R},\quad h(x)=(x^3+6x^2-5x+2)^7.$$
Solution. We will split $h$ into its constituent parts:
$$h(x)=[g(x)]^7,\ \text{where } g(x)=x^3+6x^2-5x+2$$
$$=(f\circ g)(x),\ \text{where } f(x)=x^7.$$
By the chain rule, we get
$$h'(x)=f'(g(x))\cdot g'(x)=7g^6(x)\cdot g'(x)=7(x^3+6x^2-5x+2)^6(3x^2+12x-5)$$
Example 9. Find the derivative of $h$,
$$h:\mathbb{R}\to\mathbb{R},\quad h(x)=\sqrt{6x^6+x^2+4}.$$
Solution. We first split $h$ into its constituent functions:
$$h(x)=\sqrt{g(x)},\ \text{where } g(x)=6x^6+x^2+4$$
$$=(f\circ g)(x),\ \text{where } f(x)=\sqrt{x}.$$
By the chain rule,
$$h'(x)=f'(g(x))\cdot g'(x)=\frac{1}{2\sqrt{g(x)}}\cdot g'(x)=\frac{36x^5+2x}{2\sqrt{6x^6+x^2+4}}=\frac{18x^5+x}{\sqrt{6x^6+x^2+4}}$$
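Derivatives like those in Examples 8 and 9 can be verified symbolically; a sketch assuming sympy is installed:

```python
import sympy as sp

x = sp.symbols('x')
h1 = (x**3 + 6*x**2 - 5*x + 2)**7
h2 = sp.sqrt(6*x**6 + x**2 + 4)

print(sp.diff(h1, x))               # 7*(3*x**2 + 12*x - 5)*(x**3 + 6*x**2 - 5*x + 2)**6
print(sp.simplify(sp.diff(h2, x)))  # equal to (18*x**5 + x)/sqrt(6*x**6 + x**2 + 4)
```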
Example 10. Find the derivative of
$$h(x)=\sqrt{x^2+\sin^2 x}.$$
Solution.
$$h'(x)=\frac{1}{2\sqrt{x^2+\sin^2 x}}\cdot(x^2+\sin^2 x)'=\frac{1}{2\sqrt{x^2+\sin^2 x}}\cdot(2x+2\sin x(\sin x)')=\frac{x+\sin x\cos x}{\sqrt{x^2+\sin^2 x}}$$
Example 11. Differentiate $f$,
$$f:\mathbb{R}^*\to\mathbb{R},\quad f(x)=e^{\frac{1}{x^3}}.$$
Solution.
$$f'(x)=e^{\frac{1}{x^3}}\cdot\left(\frac{1}{x^3}\right)'=e^{\frac{1}{x^3}}\cdot(x^{-3})'=e^{\frac{1}{x^3}}\cdot(-3x^{-4})=-3e^{\frac{1}{x^3}}\cdot\frac{1}{x^4}$$
Example 12. Differentiate $f$,
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=\ln(e^{4x}+e^{-4x}).$$
Solution.
$$f'(x)=(\ln(e^{4x}+e^{-4x}))'=\frac{1}{e^{4x}+e^{-4x}}\cdot(e^{4x}+e^{-4x})'$$
$$=\frac{1}{e^{4x}+e^{-4x}}\cdot[e^{4x}(4x)'+e^{-4x}(-4x)']=\frac{4e^{4x}-4e^{-4x}}{e^{4x}+e^{-4x}}$$
Example 13. Let $f:\mathbb{R}\to\mathbb{R}$, $f(x)=(x^2-4)^3$.
Compute $f'(2)$ in two different ways.
Solution. One way of computing $f'(2)$ is to use the definition of the derivative at a given point:
$$f'(2)=\lim_{x\to 2}\frac{f(x)-f(2)}{x-2}=\lim_{x\to 2}\frac{(x^2-4)^3-0}{x-2}=\lim_{x\to 2}\frac{(x-2)^3(x+2)^3}{x-2}$$
$$=\lim_{x\to 2}(x-2)^2(x+2)^3=0\cdot 4^3=0$$
The second way is to determine first the derivative function $f'$ and then evaluate it at $x=2$:
$$f'(x)=[(x^2-4)^3]'=3(x^2-4)^2\cdot(x^2-4)'=3(x^2-4)^2\cdot 2x=6x(x^2-4)^2$$
So,
$$f'(2)=6\cdot 2\cdot(2^2-4)^2=0,$$
as we expected.
If $f$ is a differentiable function, then its derivative $f'$ is also a function, so $f'$ may have a derivative of its own, denoted by $(f')'=f''$. This new function $f''$ (if it exists) is called the second derivative of $f$, because it is the derivative of the derivative of $f$.
Example 14. If $f:\mathbb{R}\to\mathbb{R}$, $f(x)=x\sin x$, find $f''$.
Solution. Using the product rule, we have
$$f'(x)=x'\sin x+x(\sin x)'=\sin x+x\cos x$$
To find $f''$ we differentiate $f'$:
$$f''(x)=(\sin x+x\cos x)'=(\sin x)'+(x\cos x)'=\cos x+x'\cos x+x(\cos x)'$$
$$=\cos x+\cos x-x\sin x=2\cos x-x\sin x.$$
The third derivative $f'''$ is the derivative of the second derivative: $f'''=(f'')'$. The process can be continued.
The fourth derivative is usually denoted by $f^{(4)}$.
In general, the $n$th derivative of $f$ is denoted by $f^{(n)}$ and is obtained from $f$ by differentiating $n$ times.
We will end this subsection by presenting l'Hospital's rule for computing limits. L'Hospital's rule describes a method for calculating limits of fractions where the numerator and the denominator both go to zero or both go to infinity.
These forms of a limit are said to be indeterminate, since we cannot say in advance what the limit will be.
In an indeterminate form $\frac{0}{0}$ we do not know how fast each part is going to 0. If the numerator goes to zero faster than the denominator, we can expect that the limit is zero. But if the denominator goes to zero faster than the numerator, the fraction will be a large number. Finally, if the numerator and the denominator go to zero equally fast, then the limit will be a nonzero real number. In any case, the limit cannot be determined just by looking at the form $\frac{0}{0}$.
Theorem 1. (l'Hospital's rule for $\frac{0}{0}$)
Let $f$ and $g$ be functions and $a\in\mathbb{R}$. If
(a) $f$ and $g$ are differentiable in some interval $(a-h,a+h)$ with $h>0$,
(b) $\lim_{x\to a}f(x)=0=\lim_{x\to a}g(x)$ and
(c) $\lim_{x\to a}\dfrac{f'(x)}{g'(x)}$ exists (allowing the limits $+\infty$ and $-\infty$),
then the limit $\lim_{x\to a}\dfrac{f(x)}{g(x)}$ exists and
$$\lim_{x\to a}\frac{f(x)}{g(x)}=\lim_{x\to a}\frac{f'(x)}{g'(x)}.$$
The rule also works for limits of the indeterminate form $\left[\frac{\infty}{\infty}\right]$.
Theorem 2. (l'Hospital's rule for $\left[\frac{\infty}{\infty}\right]$).
If $f$ and $g$ are functions that satisfy (a) and (c) above, together with
(b') $\lim_{x\to a}|f(x)|=\infty=\lim_{x\to a}|g(x)|$,
then
$$\lim_{x\to a}\frac{f(x)}{g(x)}=\lim_{x\to a}\frac{f'(x)}{g'(x)}.$$
Either form of l'Hospital's rule works for limits at infinity, $\lim_{x\to\infty}\dfrac{f(x)}{g(x)}$ or $\lim_{x\to-\infty}\dfrac{f(x)}{g(x)}$, as well as for one-sided limits. In each case, condition (a) has to be adapted correspondingly.
For example,
$$\lim_{x\to\infty}\frac{f(x)}{g(x)}=\lim_{x\to\infty}\frac{f'(x)}{g'(x)}$$
if
(a) $f$ and $g$ are differentiable on some interval $(b,\infty)$,
(b) $\lim_{x\to\infty}f(x)=0=\lim_{x\to\infty}g(x)$ or $\lim_{x\to\infty}|f(x)|=\infty=\lim_{x\to\infty}|g(x)|$ and
(c) $\lim_{x\to\infty}\dfrac{f'(x)}{g'(x)}$ exists.
The following limits are all of the form $\left[\frac{0}{0}\right]$ or $\left[\frac{\infty}{\infty}\right]$, but their answers are all different.
Example 15. Find $\lim_{x\to 0}\dfrac{x-\sin x}{x^3}$.
Solution.
$$\lim_{x\to 0}\frac{x-\sin x}{x^3}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to 0}\frac{(x-\sin x)'}{(x^3)'}=\lim_{x\to 0}\frac{1-\cos x}{3x^2}$$
if the last limit exists. But the last limit is also of the indeterminate form $\left[\frac{0}{0}\right]$, so we can try l'Hospital's rule one more time:
$$\lim_{x\to 0}\frac{1-\cos x}{3x^2}=\lim_{x\to 0}\frac{(1-\cos x)'}{(3x^2)'}=\lim_{x\to 0}\frac{\sin x}{6x}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to 0}\frac{\cos x}{6}=\frac{1}{6}$$
Example 16.
$$\lim_{x\to 0}\frac{\sin x}{x^3}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to 0}\frac{(\sin x)'}{(x^3)'}=\lim_{x\to 0}\frac{\cos x}{3x^2}=\frac{1}{+0}=+\infty$$
Example 17.
$$\lim_{x\to\infty}\frac{e^x}{x^2+4x+2}\ \left[\tfrac{\infty}{\infty}\right]\ =\lim_{x\to\infty}\frac{(e^x)'}{(x^2+4x+2)'}=\lim_{x\to\infty}\frac{e^x}{2x+4}\ \left[\tfrac{\infty}{\infty}\right]\ =\lim_{x\to\infty}\frac{(e^x)'}{(2x+4)'}=\lim_{x\to\infty}\frac{e^x}{2}=\infty.$$
There are other indeterminate forms, as we can see in the following examples.
a) $\lim_{x\to 0^+}\sqrt{x}\ln x$ is of the form $[0\cdot\infty]$
b) $\lim_{x\to 0^+}x^{\sin x}$ is of the form $[0^0]$
c) $\lim_{x\to\infty}(e^x-x)^{\frac{1}{x^2}}$ is of the form $[\infty^0]$
d) $\lim_{x\to\infty}\left(\dfrac{x^2+1}{x^2}\right)^{2x}$ is of the form $[1^{\infty}]$
e) $\lim_{x\to 0}\left(\dfrac{1}{x}-\dfrac{\sin x}{x^2}\right)$ is of the form $[\infty-\infty]$.
To find the limits of these indeterminate forms, we can rewrite the functions as quotients, as we can see below.
Example 18. Find $\lim_{x\to 0^+}\sqrt{x}\ln x$.
Solution. The limit is of the indeterminate form $0\cdot\infty$. We rewrite it as a quotient as follows:
$$\lim_{x\to 0^+}\sqrt{x}\ln x=\lim_{x\to 0^+}\frac{\ln x}{x^{-\frac{1}{2}}}.$$
Now the limit is of the indeterminate form $\left[\frac{\infty}{\infty}\right]$. We use l'Hospital's rule:
$$\lim_{x\to 0^+}\sqrt{x}\ln x=\lim_{x\to 0^+}\frac{\ln x}{x^{-\frac{1}{2}}}=\lim_{x\to 0^+}\frac{(\ln x)'}{(x^{-\frac{1}{2}})'}=\lim_{x\to 0^+}\frac{\frac{1}{x}}{-\frac{1}{2}x^{-\frac{3}{2}}}=-2\lim_{x\to 0^+}x^{\frac{1}{2}}=0$$
Example 19. Find $\lim_{x\to 0}\left(\dfrac{1}{x}-\dfrac{\sin x}{x^2}\right)$.
Solution. The limit is of the indeterminate form $\infty-\infty$. We first rewrite $\dfrac{1}{x}-\dfrac{\sin x}{x^2}$ in the form of a quotient:
$$\frac{1}{x}-\frac{\sin x}{x^2}=\frac{x-\sin x}{x^2}.$$
Now, the limit:
$$\lim_{x\to 0}\left(\frac{1}{x}-\frac{\sin x}{x^2}\right)=\lim_{x\to 0}\frac{x-\sin x}{x^2}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to 0}\frac{(x-\sin x)'}{(x^2)'}=\lim_{x\to 0}\frac{1-\cos x}{2x}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to 0}\frac{(1-\cos x)'}{(2x)'}=\lim_{x\to 0}\frac{\sin x}{2}=0.$$
The forms $0^0$, $1^{\infty}$ and $\infty^0$ are indeterminate powers.
We will use the exponential and logarithmic functions to convert them into an indeterminate product.
Example 20. Find $\lim_{x\to 0^+}x^{\sin x}$.
Solution. By using the equality $a=e^{\ln a}$ we can write
$$x^{\sin x}=e^{\ln x^{\sin x}}=e^{\sin x\ln x}$$
Hence,
$$\lim_{x\to 0^+}x^{\sin x}=\lim_{x\to 0^+}e^{\sin x\ln x}=e^{\lim_{x\to 0^+}\sin x\ln x}.$$
The last limit is of the indeterminate form $0\cdot\infty$. So, we rewrite it as an indeterminate quotient and use l'Hospital's rule:
$$\lim_{x\to 0^+}\sin x\ln x\ [0\cdot\infty]\ =\lim_{x\to 0^+}\frac{\ln x}{\frac{1}{\sin x}}\ \left[\tfrac{\infty}{\infty}\right]\ =\lim_{x\to 0^+}\frac{\frac{1}{x}}{-\frac{\cos x}{\sin^2 x}}=-\lim_{x\to 0^+}\frac{\sin x}{x}\,\mathrm{tg}\,x=-\lim_{x\to 0^+}\frac{\sin x}{x}\cdot\lim_{x\to 0^+}\mathrm{tg}\,x=-1\cdot 0=0.$$
In the previous equality we used $\lim_{x\to 0}\dfrac{\sin x}{x}=1$. Indeed,
$$\lim_{x\to 0}\frac{\sin x}{x}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to 0}\frac{\cos x}{1}=1.$$
Finally, we obtain that $\lim_{x\to 0^+}x^{\sin x}=e^0=1$.
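The limits computed with l'Hospital's rule above can be cross-checked symbolically. A minimal sketch assuming sympy (whose limit is one-sided from the right by default, which matches the $x\to 0^+$ examples):

```python
import sympy as sp

x = sp.symbols('x')
print(sp.limit((x - sp.sin(x)) / x**3, x, 0))   # 1/6 (Example 15)
print(sp.limit(sp.sqrt(x) * sp.log(x), x, 0))   # 0   (Example 18)
print(sp.limit(x**sp.sin(x), x, 0))             # 1   (Example 20)
```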
Example 21. Find $\lim_{x\to\infty}\left(\dfrac{x^2+1}{x^2}\right)^{2x}$.
Solution.
$$\lim_{x\to\infty}\left(\frac{x^2+1}{x^2}\right)^{2x}=\lim_{x\to\infty}e^{\ln\left(\frac{x^2+1}{x^2}\right)^{2x}}=\lim_{x\to\infty}e^{2x\ln\frac{x^2+1}{x^2}}.$$
The limit of the exponent is of the indeterminate form $[0\cdot\infty]$:
$$\lim_{x\to\infty}2x\ln\frac{x^2+1}{x^2}\ [0\cdot\infty]\ =\lim_{x\to\infty}\frac{\ln\frac{x^2+1}{x^2}}{\frac{1}{2x}}\ \left[\tfrac{0}{0}\right]\ =\lim_{x\to\infty}\frac{\frac{2x}{x^2+1}-\frac{2x}{x^2}}{-\frac{1}{2x^2}}=\lim_{x\to\infty}\frac{\frac{2x^3-2x^3-2x}{(x^2+1)x^2}}{-\frac{1}{2x^2}}=\lim_{x\to\infty}\frac{4x}{x^2+1}\ \left[\tfrac{\infty}{\infty}\right]\ =\lim_{x\to\infty}\frac{4}{2x}=0.$$
In consequence we have
$$\lim_{x\to\infty}\left(\frac{x^2+1}{x^2}\right)^{2x}=e^0=1.$$
We end this section by mentioning that l'Hospital's rule was first published in 1696 in the Marquis de l'Hospital's calculus textbook Analyse des infiniment petits. The rule actually was discovered in 1694 by the Swiss mathematician Johann Bernoulli. This was possible because the Marquis de l'Hospital had bought the rights to Bernoulli's mathematical discoveries.
3.1.3 Linear approximation and differentials
A curve lies very close to its tangent line near the point of tangency. This observation is the basis for a method of finding approximate values of functions. We use the tangent line at $(a,f(a))$ as an approximation to the curve $y=f(x)$ when $x$ is near $a$.
The equation of the tangent line at the point $(a,f(a))$ is
$$y=f(a)+f'(a)(x-a)$$
and the corresponding approximation will be
$$f(x)\approx f(a)+f'(a)(x-a)\qquad (1)$$
The relation (1) is called the linear approximation of $f$ at $a$.
Example 1. Find the linearization of the function $f:[-3,\infty)\to\mathbb{R}$, $f(x)=\sqrt{x+3}$ at $a=1$ and use it to approximate the numbers $\sqrt{3.98}$ and $\sqrt{4.02}$.
Solution. The derivative of $f$ is $f'$,
$$f':(-3,\infty)\to\mathbb{R},\quad f'(x)=\frac{1}{2\sqrt{x+3}}$$
and so we have $f(1)=2$ and $f'(1)=\frac{1}{4}$.
The approximation formula will be
$$f(x)\approx f(1)+f'(1)(x-1)\quad\text{(when $x$ is near 1)}$$
$$\sqrt{x+3}\approx 2+\frac{1}{4}(x-1)$$
$$\sqrt{x+3}\approx\frac{7}{4}+\frac{x}{4}.$$
In particular we have
$$\sqrt{3.98}=\sqrt{0.98+3}\approx\frac{7}{4}+\frac{0.98}{4}=1.995$$
$$\sqrt{4.02}=\sqrt{1.02+3}\approx\frac{7}{4}+\frac{1.02}{4}=\frac{8.02}{4}=2.005$$
The linear approximation is illustrated in the figure below.
[Figure: the curve $y=\sqrt{x+3}$ and its tangent line $y=\frac{7}{4}+\frac{x}{4}$ at the point $(1,2)$.]
We can see that the tangent line approximation is a good approximation to the given function when $x$ is near 1.
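A quick numerical confirmation of Example 1; the linearization $L(x)=\frac{7}{4}+\frac{x}{4}$ is transcribed directly from the formulas above.

```python
import math

L = lambda x: 7/4 + x/4               # tangent line at a = 1
for x in [0.98, 1.02]:
    print(L(x), math.sqrt(x + 3))     # 1.995 vs 1.99499..., 2.005 vs 2.00499...
```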
These approximation ideas can be formulated in the terminology of differentials. If $y=f(x)$, where $f$ is a differentiable function at $a$, then the differential of $f$ at the point $a$ is the following function:
$$df_{(a)}:\mathbb{R}\to\mathbb{R},\ \text{defined by}\ df_{(a)}(h)=f'(a)h\qquad (2)$$
Sometimes in the previous relation we use $dx$ instead of $h$ and $dy$ instead of $df_{(a)}(h)$, so we have
$$dy=f'(a)dx.$$
The geometric meaning of differentials is shown in the figure below. Let $P(a,f(a))$ and $Q(a+\Delta x,f(a+\Delta x))$ be points on the graph of $f$ and let $dx=\Delta x$. The corresponding change in $y$ is $\Delta y=f(a+\Delta x)-f(a)$. The slope of the tangent line $PR$ is the derivative $f'(a)$.
[Figure: the graph of $f$ near $P(a,f(a))$, with $Q(a+\Delta x,f(a+\Delta x))$, the tangent line $PR$, and the quantities $dx=\Delta x$, $dy$ and $\Delta y$ marked.]
$df_{(a)}(dx)=dy$ represents the change in the linearization, whereas $\Delta y$ represents the change in the function. The approximation $\Delta y\approx dy$ becomes better as $\Delta x=dx$ becomes smaller. For complicated functions it may be impossible to compute $\Delta y$ exactly. In such cases the approximation by differentials is useful:
$$f(a+dx)\approx f(a)+f'(a)dx.\qquad (3)$$
3.1.4 Extreme values of a real valued function
Some of the most important applications of differential calculus are optimization problems. These problems can be reduced to finding the maximum or minimum values of a function.
Definition 1. Let $f:A\to\mathbb{R}$ and $a\in A$.
The function $f$ has a global maximum at $a$ if $f(a)\geq f(x)$ for all $x$ in $A$. The number $f(a)$ is called the maximum value of $f$ on $A$.
The function $f$ has a global minimum at $a$ if $f(a)\leq f(x)$ for all $x$ in $A$. The number $f(a)$ is called the minimum value of $f$ on $A$.
The maximum and minimum values of $f$ are called the extreme values of $f$.
Definition 2. Let $f:A\to\mathbb{R}$ and $a\in A$.
The function $f$ has a local maximum at $a$ if $f(a)\geq f(x)$ where $x$ is near $a$. (This means that $f(a)\geq f(x)$ for all $x$ in some open interval containing $a$.)
The function $f$ has a local minimum at $a$ if $f(a)\leq f(x)$ where $x$ is near $a$.
Example 1. Determine the extreme values of the function
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=\sin x.$$
Solution. Since $-1\leq\sin x\leq 1$ for all $x\in\mathbb{R}$ and
$$\sin\left(\frac{\pi}{2}+2n\pi\right)=1$$
for any integer $n$, the function $f$ takes its (local and global) maximum value of 1 infinitely many times.
In the same way, $-1$ is its minimum value (local and global). This value is taken infinitely many times too, since
$$\sin\left(\frac{3\pi}{2}+2n\pi\right)=-1\quad\text{for all } n\in\mathbb{Z}.$$
Example 2. Determine the extreme values of the function
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=x^3.$$
Solution.
[Figure: the graph of $y=x^3$.]
From the graph of the function $f$ we see that this function has neither an absolute maximum value nor an absolute minimum value.
We have seen that some functions have extreme values, whereas others do not. The extreme value theorem (Theorem 5, subsection 3.1.1) says that a continuous function on a closed interval has a maximum value and a minimum value, but it doesn't tell us how to find these extreme values. In the next figure we sketch the graph of a function $f$ with a local maximum at $c$ and a local minimum at $d$.
[Figure: a curve with a local maximum at $(c,f(c))$ and a local minimum at $(d,f(d))$, with horizontal tangent lines at both points.]
It seems that at the maximum and minimum points the tangent lines are parallel to the $x$-axis and in consequence each has slope 0. Since the slope of the tangent line is the derivative, we may believe that $f'(c)=f'(d)=0$.
The following theorem shows us that this remark is always true for differentiable functions.
Theorem 1. (Fermat's theorem). Let $f:I\to\mathbb{R}$, $I\subseteq\mathbb{R}$, $I$ an open interval and $a\in I$. If $f$ is differentiable at $a$ and $f$ has a local maximum or minimum at $a$, then $f'(a)=0$.
Example 2 shows us that we can't expect to locate extreme values simply by setting $f'(x)=0$ and solving for $x$. Indeed, if $f(x)=x^3$, then $f'(x)=3x^2$ and $f'(0)=0$, but $f$ has no maximum or minimum at 0, as we already mentioned in discussing Example 2.
Example 3. Let $f:\mathbb{R}\to\mathbb{R}$, $f(x)=|x|$. The graph of $f$ is shown below. The function $f$ has a minimum at 0, but this value can't be found by solving the equation $f'(x)=0$, since $f$ is not differentiable at $x=0$. Indeed,
$$\lim_{x\to 0}\frac{f(x)-f(0)}{x}=\lim_{x\to 0}\frac{|x|}{x}$$
does not exist, since
$$\lim_{x\to 0^+}\frac{|x|}{x}=\lim_{x\to 0^+}\frac{x}{x}=1\quad\text{and}\quad\lim_{x\to 0^-}\frac{|x|}{x}=\lim_{x\to 0^-}\frac{-x}{x}=-1.$$
[Figure: the graph of $f(x)=|x|$, with its corner at the origin.]
In conclusion, we can observe that the converse of Fermat's theorem is false.
In fact, Fermat's theorem says that we have to seek the local extreme points among the solutions of the equation $f'(x)=0$ or among the points at which $f$ is not differentiable.
These points (solutions of $f'(x)=0$ or points at which $f$ is not differentiable) are called critical points.
Example 4. Find the critical points of the function
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=\sqrt[3]{x}\,(3-x).$$
Solution. We first rewrite the function $f$ as
$$f(x)=3x^{\frac{1}{3}}-x^{\frac{4}{3}}$$
and so
$$f'(x)=x^{-\frac{2}{3}}-\frac{4}{3}x^{\frac{1}{3}}=x^{-\frac{2}{3}}\left(1-\frac{4x}{3}\right)=\frac{1-\frac{4x}{3}}{\sqrt[3]{x^2}},\quad\text{for all } x\neq 0.$$
Therefore, $f'(x)=0$ if $1-\frac{4x}{3}=0$, that is $x=\frac{3}{4}$. Also, $f$ is not differentiable at $x=0$.
Thus the critical points are $x=0$ and $x=\frac{3}{4}$.
Remark 1. To find the absolute maximum or minimum of a continuous function $f$ on a closed interval $[a,b]$, we find the critical points of $f$ in $(a,b)$ and compute the values of the function $f$ at the critical points and at the endpoints of the interval. The largest of these values is the absolute maximum value and the smallest of these values is the absolute minimum value.
Example 5. Find the absolute maximum and minimum values of the function
$$f:[-2,1]\to\mathbb{R},\quad f(x)=x^3+2x^2-1.$$
Solution. $f'(x)=3x^2+4x=(3x+4)x$.
The function $f$ is differentiable on $(-2,1)$, so the critical points are the solutions of $f'(x)=0$:
$$f'(x)=0\iff x(3x+4)=0\iff x=0\ \text{and}\ x=-\frac{4}{3}.$$
The values of $f$ at the critical points are
$$f(0)=-1\quad\text{and}\quad f\left(-\frac{4}{3}\right)=\frac{5}{27}.$$
The values of $f$ at the endpoints of the interval are
$$f(-2)=-1\quad\text{and}\quad f(1)=2.$$
Comparing these values, we see that the absolute maximum value is $f(1)=2$ and the absolute minimum value is $f(0)=f(-2)=-1$.
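Remark 1 translates directly into a small computation. A sketch for Example 5 in Python (the critical points are taken from the solution above):

```python
f = lambda x: x**3 + 2*x**2 - 1
points = [-4/3, 0, -2, 1]          # critical points, then the endpoints of [-2, 1]
values = [f(x) for x in points]

print(max(zip(values, points)))    # (2, 1): absolute maximum f(1) = 2
print(min(zip(values, points)))    # (-1, -2): absolute minimum -1 (also attained at x = 0)
```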
As we have already shown, the derivative function $f'$ is very useful in studying the properties of the given function $f$.
Next, we will present two important facts which summarize this connection.
Theorem 2. (Rolle's theorem). Let $f:[a,b]\to\mathbb{R}$ be continuous on $[a,b]$ and differentiable on $(a,b)$ such that $f(a)=f(b)$. Then there exists a number $c\in(a,b)$ such that $f'(c)=0$.
As a first application of Rolle's theorem we present the second order Taylor formula for a function whose second derivative is continuous on an interval.
Remark 2. (Taylor's formula with the remainder in Lagrange's form)
Let $f:I\to\mathbb{R}$ where $I$ is an open interval and let $a\in I$. Suppose that the second derivative $f''$ is continuous on $I$.
For each $x\in I$ there exists a number $c$ between $a$ and $x$ such that
$$f(x)=f(a)+f'(a)(x-a)+\frac{1}{2}f''(c)(x-a)^2.$$
Proof. The previous equality is true for $x=a$. So, let $x\in I$, $x\neq a$. We define the function $g:I\to\mathbb{R}$ given by
$$g(t)=f(t)-f(a)-f'(a)(t-a)-\alpha(t-a)^2$$
where $\alpha$ is chosen so that $g(x)=0$.
We easily obtain that
$$\alpha=\frac{1}{(x-a)^2}[f(x)-f(a)-f'(a)(x-a)].$$
We also have that $g(a)=g'(a)=0$.
We apply Rolle's theorem to the function $g$ defined on the interval $[\min(a,x),\max(a,x)]$ to find $c_1$ between $a$ and $x$ so that $g'(c_1)=0$.
We apply Rolle's theorem to the function $g'$ defined on the interval $[\min(a,c_1),\max(a,c_1)]$ to find $c$ between $a$ and $c_1$ (hence $c$ lies between $a$ and $x$) so that $g''(c)=0$.
On the other hand, the second derivative of $g$ is
$$g''(t)=f''(t)-2\alpha,$$
from which we easily get that $\alpha=\frac{1}{2}f''(c)$.
By putting $t=x$ and $\alpha=\frac{1}{2}f''(c)$ in the expression of $g$ we get
$$0=f(x)-f(a)-f'(a)(x-a)-\frac{1}{2}f''(c)(x-a)^2,$$
which completes the proof.
The main use of Rolle's theorem is in proving the following important theorem, which was first stated by the French mathematician Joseph-Louis Lagrange.
Theorem 3. (The mean value theorem, Lagrange's theorem). Let $f:[a,b]\to\mathbb{R}$ be continuous on $[a,b]$ and differentiable on $(a,b)$. Then there is a number $c\in(a,b)$ such that
$$f'(c)=\frac{f(b)-f(a)}{b-a}\qquad (1)$$
or, equivalently,
$$f(b)-f(a)=f'(c)(b-a).$$
By interpreting the mean value theorem geometrically, we can see that it is reasonable. Indeed, if $A(a,f(a))$ and $B(b,f(b))$ are points on the graph of $f$ (see the figures below) then the slope of the secant line $AB$ is
$$m_{AB}=\frac{f(b)-f(a)}{b-a}$$
which is the same as the right side of equality (1). Since $f'(c)$ is the slope of the tangent line at the point $(c,f(c))$, the mean value theorem says that there is at least one point $P(c,f(c))$ on the graph where the tangent line is parallel to the secant line $AB$.
[Figures: two graphs of $f$ between $A(a,f(a))$ and $B(b,f(b))$; in the first, the tangent at one point $P(c,f(c))$ is parallel to the secant line $AB$; in the second, there are two such points $P_1$ and $P_2$.]
The mean value theorem helps us to obtain information about a function from information about its derivative.
Example 6. Let $f:\mathbb{R}\to\mathbb{R}$ be a differentiable function. Suppose that $f(2)=2$ and $f'(x)\leq 2$ for all values of $x$. How large can $f(4)$ be?
Solution. We can apply the mean value theorem on the interval $[2,4]$. There exists a number $c$ such that
$$f(4)-f(2)=f'(c)(4-2)$$
so
$$f(4)=f(2)+f'(c)\cdot 2=2+f'(c)\cdot 2.$$
We are given that $f'(x)\leq 2$ for all $x$, so in particular we know that $f'(c)\leq 2$. So,
$$f(4)=2+f'(c)\cdot 2\leq 2+2\cdot 2=6.$$
The largest possible value for $f(4)$ is 6.
The mean value theorem is useful in establishing the following basic properties of differentiable functions.
Theorem 4. Let $f:(a,b)\to\mathbb{R}$ be a differentiable function. If $f'(x)=0$ for all $x\in(a,b)$ then $f$ is constant on $(a,b)$.
Theorem 5. Let $f,g:(a,b)\to\mathbb{R}$ be two differentiable functions. If $f'(x)=g'(x)$ for all $x\in(a,b)$, then $f-g$ is constant on $(a,b)$; that is, there is a constant $c$ such that $f=g+c$.
Definition 3. Let $f:A\to\mathbb{R}$, $A\subseteq\mathbb{R}$.
a) We say that $f$ is increasing on $A$ if for all $x_1,x_2\in A$ with $x_1<x_2$ we have $f(x_1)<f(x_2)$.
b) We say that $f$ is decreasing on $A$ if for all $x_1,x_2\in A$ with $x_1<x_2$ we have $f(x_1)>f(x_2)$.
Theorem 6. Let $f:(a,b)\to\mathbb{R}$ be a differentiable function.
a) If $f'(x)>0$ for all $x\in(a,b)$ then $f$ is increasing on $(a,b)$.
b) If $f'(x)<0$ for all $x\in(a,b)$ then $f$ is decreasing on $(a,b)$.
Example 7. Find the intervals where the function
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=3x^4-24x^2+2$$
is increasing and where it is decreasing.
Solution. $f'(x)=12x^3-48x=12x(x^2-4)=12x(x-2)(x+2)$
We have to solve the inequalities
$$f'(x)>0\quad\text{and}\quad f'(x)<0.$$
This depends on the sign of the three factors of $f'(x)$, namely $12x$, $x-2$ and $x+2$.
The critical points of $f$ are $x=0$, $x=2$ and $x=-2$.
We can arrange the signs of $f'(x)$ in the following table.

  x      | -inf        -2          0           2         +inf
  12x    |      -       -      -   0    +      +     +
  x - 2  |      -       -      -   -    -      0     +
  x + 2  |      -       0      +   +    +      +     +
  f'(x)  |      -       0      +   0    -      0     +
  f(x)   |   decr.   f(-2)  incr. f(0) decr.  f(2) incr.

$f$ is decreasing on $(-\infty,-2)$ and on $(0,2)$.
$f$ is increasing on $(-2,0)$ and on $(2,\infty)$.
Recall that when a function has a relative extremum, it must occur at a critical value. We will now combine the ideas mentioned before to obtain two tests for determining when a critical value is a relative extremum point of a given function.
Theorem 7. (First derivative test for relative extrema) Let $f:D\to\mathbb{R}$ and let $[a,b]\subseteq D$.
Suppose that $f$ is continuous on $[a,b]$ and differentiable on $(a,b)$, except possibly at the critical value $c$.
(a) If $f'(x)>0$ for $a<x<c$ and $f'(x)<0$ for $c<x<b$, then $c$ is a relative maximum point of $f$.
(b) If $f'(x)<0$ for $a<x<c$ and $f'(x)>0$ for $c<x<b$, then $c$ is a relative minimum point of $f$.
(c) If $f'(x)$ has the same algebraic sign on $a<x<c$ and $c<x<b$, then $c$ is not an extremum point of $f$.
Example 8. Determine the extreme points of the function
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=3x^4-24x^2+2.$$
Solution. By using the previous theorem and the results obtained in Example 7, we obtain that $-2$ and $2$ are relative minimum points and $0$ is a relative maximum point.
Another geometric property of the graph of a given function is its concavity.
Visually, concavity is easy to recognize. If a graph is smiling at you, it is concave up (or convex); if it is frowning at you, it is concave down (or simply concave).
[Figures: a concave up ("smiling") curve and a concave down ("frowning") curve.]
A mathematical characterization of concavity involves the second derivative of the given function.
Theorem 8. (Test for concavity) Let $f:I\to\mathbb{R}$, $I\subseteq\mathbb{R}$, where $I$ is an open interval.
(a) If $f''(x)>0$ for all $x\in I$ then $f$ is concave up on $I$.
(b) If $f''(x)<0$ for all $x\in I$ then $f$ is concave down on $I$.
Example 9. Find the intervals where the graph of
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=2x^3-6x^2$$
is concave up and where it is concave down.
Solution. We have
$$f'(x)=6x^2-12x$$
and
$$f''(x)=12x-12=12(x-1).$$
The sign of the second derivative is given in the following table.

  x       | -inf                1              +inf
  f''(x)  |          -          0        +
  f(x)    |   concave down    f(1)   concave up

Thus the graph is concave up on $(1,\infty)$ and concave down on $(-\infty,1)$.
Another application of the second derivative is the following test for maximum and minimum values. It is a consequence of the concavity test.
Theorem 9. (The second derivative test) Let $f:A\to\mathbb{R}$ and $c\in A$. Suppose $f''$ is continuous near $c$ (that is, $f''$ is continuous on an interval $(c-h,c+h)$).
(a) If $f'(c)=0$ and $f''(c)>0$ then $f$ has a local minimum at $c$.
(b) If $f'(c)=0$ and $f''(c)<0$ then $f$ has a local maximum at $c$.
(c) If $f'(c)=0$ and $f''(c)=0$ then the test is inconclusive.
Part (a) is true because $f''(x)>0$ near $c$, and so $f$ is concave up near $c$. This means that the graph of $f$ lies above its horizontal tangent at $c$ (since $f'(c)=0$) and so $f$ has a local minimum at $c$.
Part (b) is true because $f''(x)<0$ near $c$, and so $f$ is concave down near $c$. This means that the graph of $f$ lies below its horizontal tangent at $c$ (since $f'(c)=0$) and so $f$ has a local maximum at $c$.
Example 10. Use the second derivative test to find the extrema of the following function:
$$f:\mathbb{R}\to\mathbb{R},\quad f(x)=3x^4-8x^3+6x^2.$$
Solution. We have to evaluate the second derivative at the critical points. First, we determine the critical points of $f$:
$$f'(x)=12x^3-24x^2+12x=12x(x^2-2x+1)=12x(x-1)^2.$$
In consequence, $f'(x)=0$ for $x=0$ and $x=1$.
Now we find the second derivative and test its sign at $x=0$ and $x=1$:
$$f''(x)=36x^2-48x+12=12(3x^2-4x+1).$$
Since $f''(0)=12>0$, the function $f$ has a relative minimum point at $x=0$.
Since $f''(1)=0$, the test fails, so we have to use the first derivative test. The sign of the first derivative is given in the table below.

  x          | -inf         0             1          +inf
  12x        |      -       0      +      +      +
  (x - 1)^2  |      +       +      +      0      +
  f'(x)      |      -       0      +      0      +
  f(x)       |   decr.    f(0)   incr.  f(1)   incr.

Part (c) of the first derivative test shows that 1 is not a relative extreme point.
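The two tests can be replayed symbolically for Example 10; a sketch assuming sympy:

```python
import sympy as sp

x = sp.symbols('x')
f = 3*x**4 - 8*x**3 + 6*x**2
fp = sp.diff(f, x)            # factors as 12*x*(x - 1)**2
fpp = sp.diff(f, x, 2)

for c in sp.solve(fp, x):
    print(c, fpp.subs(x, c))  # x = 0: f'' = 12 > 0 (local minimum)
                              # x = 1: f'' = 0 (second derivative test inconclusive)
```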
3.1.5 Applications to economics
In subsection 3.1.2 we introduced the idea of marginal cost. Recall that if $C$ is the cost function and $C(x)$ is the cost of producing $x$ units of a certain product, then the marginal cost is the rate of change of $C$ with respect to $x$. In fact, the marginal cost function is the derivative $C'$ of the cost function. We also consider the average cost function
$$c(x)=\frac{C(x)}{x}$$
representing the cost per unit when $x$ units are produced. We want to find what happens at a minimum point of the average cost function.
Theorem 1. a) If $a$ is a minimum point for $c$ then $C'(a)=c(a)$.
b) If the marginal cost is less than the average cost, then the average cost function decreases.
c) If the marginal cost is greater than the average cost, then the average cost increases.
Proof. a) If $a$ is a minimum point of the function $c$ then $c'(a)=0$ (as a consequence of Fermat's theorem).
By applying the quotient rule we have
$$c'(x)=\left(\frac{C(x)}{x}\right)'=\frac{C'(x)\cdot x-C(x)}{x^2}=\frac{x\left(C'(x)-\frac{C(x)}{x}\right)}{x^2}=\frac{C'(x)-c(x)}{x}.$$
Since $c'(a)=0$, we get $C'(a)-c(a)=0$ and so $C'(a)=c(a)$.
b) If the marginal cost is less than the average cost then
$$c'(x)=\frac{C'(x)-c(x)}{x}<0$$
and $c$ is a decreasing function (by Theorem 6, subsection 3.1.4).
c) If the marginal cost is greater than the average cost then
$$c'(x)=\frac{C'(x)-c(x)}{x}>0$$
and $c$ is an increasing function (by Theorem 6, subsection 3.1.4). This completes the proof.
Part a) of the previous theorem says that if the average cost is minimal, then the marginal cost equals the average cost.
We have the following explanation for parts b) and c) of the previous theorem. The marginal cost is (approximately) the cost of producing one additional unit of the considered product (see Example 4, subsection 3.1.2).
If the additional unit costs less than the average cost, this less expensive unit will cause the average cost per unit to decrease.
If the additional unit costs more than the average cost, this more expensive unit will cause the average cost per unit to increase.
This principle is plausible because if the marginal cost is smaller than the average cost, then more should be produced in order to lower the average cost. Similarly, if the marginal cost is greater than the average cost, then less should be produced in order to lower the average cost.
We also consider the revenue function $R$, where $R(x)$ represents the income from the sale of $x$ units of the product. The derivative $R'$ is called the marginal revenue function.
If $x$ units are sold, the price function $p$ is defined by
$$p(x)=\frac{R(x)}{x}.$$
The function $P=R-C$ is naturally called the profit function, and the derivative $P'$ is called the marginal profit function. Note that
$$P'(x)=R'(x)-C'(x)=0\quad\text{if}\quad R'(x)=C'(x).$$
We therefore conclude the following.
Theorem 2. If the profit is maximal, then the marginal revenue is equal to the marginal cost.
Remark 1. It is often appropriate to represent a total cost function by a polynomial (usually of degree three)
$$C(x)=a+bx+cx^2+dx^3$$
where $a$ represents the overhead cost (rent, heat, maintenance) and the other terms represent the cost of raw materials, labor and so on. The cost of raw materials may be proportional to $x$, but labor costs might depend partially on higher powers of $x$.
Example 1. A publisher of a calculus textbook works with a cost function
$$C(x)=50000+20x-\frac{1}{10^4}x^2+\frac{1}{3\cdot 10^8}x^3$$
and a price function
$$p(x)=120-\frac{1}{10^4}x,$$
both in dollars. Determine the maximum of the profit function.
Solution. Clearly we have
$$C'(x)=20-\frac{1}{5\cdot 10^3}x+\frac{1}{10^8}x^2$$
and
$$C''(x)=-\frac{1}{5\cdot 10^3}+\frac{1}{5\cdot 10^7}x,$$
so that
$$C''(x)=0\quad\text{for}\quad x=10^4.$$
The marginal cost increases after 10,000 copies. On the other hand, we have
$$R(x)=xp(x)=120x-\frac{1}{10^4}x^2$$
and
$$R'(x)=120-\frac{1}{5\cdot 10^3}x.$$
Maximum profit occurs when $P'(x)=0$ and $P''(x)<0$, so that $R'(x)=C'(x)$:
$$120-\frac{1}{5\cdot 10^3}x=20-\frac{1}{5\cdot 10^3}x+\frac{1}{10^8}x^2$$
with the solution $x=10^5$. If we want to use the second derivative test to establish the nature of the critical point $x=10^5$ we have to evaluate $P''(10^5)$:
$$P''(10^5)=R''(10^5)-C''(10^5)=-\frac{1}{5\cdot 10^3}+\frac{1}{5\cdot 10^3}-\frac{1}{5\cdot 10^7}\cdot 10^5<0.$$
This means that maximum profit occurs when exactly 100,000 copies are produced and sold. The income is then
$$R(10^5)=11\cdot 10^6$$
at $p(10^5)=110$ dollars per copy. The cost is
$$C(10^5)=\frac{1315}{3}\cdot 10^4\ \text{dollars}.$$
The maximum of the profit function will be:
$$P(10^5)=R(10^5)-C(10^5)=11\cdot 10^6-\frac{1315}{3}\cdot 10^4=\frac{1985}{3}\cdot 10^4\ \text{dollars}.$$
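Example 1 can be reproduced with a few lines of symbolic computation; a minimal sketch, assuming sympy is available:

```python
import sympy as sp

x = sp.symbols('x', positive=True)
C = 50000 + 20*x - x**2 / 10**4 + x**3 / (3 * 10**8)
R = x * (120 - x / 10**4)
P = R - C                             # profit function

print(sp.solve(sp.diff(P, x), x))     # [100000]: the critical point x = 10**5
print(P.subs(x, 10**5))               # 19850000/3, i.e. (1985/3)*10**4 dollars
```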
Finally, we will use the marginal concepts introduced before to derive an important criterion used by economists to analyze the demand function. This concept is the price elasticity of demand.
In mathematics, the elasticity of a differentiable function $f$ at a point $x$ is defined as
$$E(x)=\frac{xf'(x)}{f(x)}$$
which can be rewritten in the following two different forms:
$$E(x)=\frac{\;\dfrac{f'(x)}{f(x)}\;}{\dfrac{1}{x}}=\frac{(\ln f(x))'}{(\ln x)'}\qquad (1)$$
or
$$E(x)=\frac{x}{f(x)}f'(x)=\frac{x}{f(x)}\lim_{\Delta x\to 0}\frac{f(x+\Delta x)-f(x)}{\Delta x}=\lim_{\Delta x\to 0}\frac{\;\dfrac{f(x+\Delta x)-f(x)}{f(x)}\;}{\dfrac{\Delta x}{x}}$$
$$\approx\frac{\;\dfrac{f(x+\Delta x)-f(x)}{f(x)}\cdot 100\;}{\dfrac{\Delta x}{x}\cdot 100}=\frac{\text{percentage change in }f}{\text{percentage change in }x}\qquad (2)$$
So
$$E(x)\approx\frac{\text{percentage change in }f}{\text{percentage change in }x}\qquad (3)$$
If we use the notation $y=f(x)$ or $y=y(x)$, then the $x$ point elasticity of $y$ is denoted by
$$E_x y=\frac{xf'(x)}{f(x)}.$$
Remark 2. If $y=y(x)$, then the $y$ point elasticity of $x$ is
$$E_y x=\frac{1}{E_x y}.$$
Proof. If $y=f(x)$, then $x=f^{-1}(y)$ and, by using the definition of elasticity,
$$E_y x=\frac{y(f^{-1})'(y)}{f^{-1}(y)}=\frac{y\cdot\dfrac{1}{f'(f^{-1}(y))}}{x}=\frac{f(x)\cdot\dfrac{1}{f'(x)}}{x}=\frac{f(x)}{xf'(x)}=\frac{1}{E_x y}$$
Next, we shall present the following economic example.
The demand for a product is usually related to its price. In most cases, the demand decreases when the price increases. The sensitivity of demand to changes in price varies from one product to another.
For some products, small percentage changes in price have little effect on demand. For other products, small percentage changes in price have a considerable effect on demand. We want to measure the sensitivity of demand to changes in price.
Definition 1. If $p$ represents the price per unit of a certain product and $Q$ represents the demand function (in fact, $Q(p)$ is the number of units of the considered product demanded at price $p$), then the price elasticity of demand is (see (2))
$$E_p Q=\frac{pQ'(p)}{Q(p)}=\lim_{\Delta p\to 0}\frac{\;\dfrac{Q(p+\Delta p)-Q(p)}{Q(p)}\cdot 100\;}{\dfrac{\Delta p}{p}\cdot 100}\approx\frac{\text{percentage change in quantity demanded}}{\text{percentage change in price}}\qquad (4)$$
We observe that if the percentage change in price is one, then
$$E_p Q\approx\text{percentage change in demand due to a 1\% increase in price.}\qquad (5)$$
Remark 3. a) The price elasticity of demand is usually negative, because the demand decreases when the price increases.
b) If $E_p Q<-1$, the demand is said to be elastic with respect to price.
In this case the percentage decrease in demand is greater than the percentage increase in price that caused it.
c) If $-1<E_p Q$, the demand is said to be inelastic with respect to price.
In this case the percentage decrease in demand is less than the percentage increase in price that caused it.
d) If $E_p Q=-1$, the demand is said to be of unit elasticity with respect to price.
Theorem 3. (Elasticity and the total revenue)
Let $R$, $R(p)=pQ(p)$, be the total revenue function.
a) If $E_p Q<-1$ then $R$ is a decreasing function.
In this case, when the price is raised the total revenue decreases.
b) If $E_p Q>-1$ then $R$ is an increasing function.
In this case, when the price is raised the total revenue increases.
Proof. By the product rule of differentiation we have
$$R'(p)=(pQ(p))'=pQ'(p)+Q(p)=Q(p)\left(\frac{pQ'(p)}{Q(p)}+1\right)=Q(p)(E_p Q+1)$$
For part a) we have $E_p Q<-1$ and so $E_p Q+1<0$. Since $R'(p)=Q(p)(E_p Q+1)$ and $E_p Q+1<0$, we obtain that $R'(p)<0$. So, $R$ is a decreasing function.
For part b) we have $E_p Q>-1$ and so $E_p Q+1>0$. Since $R'(p)=Q(p)(E_p Q+1)$ and $E_p Q+1>0$, we obtain that $R'(p)>0$. So, $R$ is an increasing function in this case.
Example 2. Suppose the relationship between the unit price $p$ in dollars and the quantity demanded, $x$, is given by the equation
$$p=-0.02x+400\quad (0\leq x\leq 20000).$$
Compute the price elasticity of demand and interpret the results.
Solution. Solving the given demand equation for $x$ in terms of $p$ we find
$$x=Q(p)=-50p+20000$$
from which we see that $Q'(p)=-50$. Therefore
$$E_p Q=\frac{pQ'(p)}{Q(p)}=\frac{-50p}{-50(p-400)}=\frac{p}{p-400}\quad (0\leq p<400).$$
Next, we solve the equation
$$E_p Q=-1,$$
that is,
$$\frac{p}{p-400}=-1,$$
giving $p=200$.
We also see that $E_p Q<-1$ when $p>200$ (elastic demand) and $E_p Q>-1$ when $p<200$ (inelastic demand).
So, when the unit price is between 0 and 200, an increase in the unit price will increase the revenue; when the unit price is between 200 and 400, an increase in the unit price will cause a decrease in revenue.
In consequence, the revenue is maximized when the unit price is set at 200.
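A plain-Python sketch of Example 2, using the closed form $E(p)=\frac{p}{p-400}$ derived above:

```python
def E(p):
    return p / (p - 400)   # price elasticity of demand for Q(p) = -50p + 20000

for p in [100, 200, 300]:
    print(p, E(p))
# E(100) = -1/3 (inelastic), E(200) = -1 (unit elasticity), E(300) = -3 (elastic)
```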
Example 3. Let $Q$ be the demand function defined by
$$Q(p)=10\sqrt{\frac{50-p}{p}};\quad 0<p\leq 50.$$
a) Determine the elasticity of demand when the price is $p=10$. If the price increases by 6%, determine the approximate change in demand.
b) Determine where the demand is elastic, inelastic and of unitary elasticity with respect to price.
c) Determine the price function as a function of demand.
d) Find the maximum of the total revenue function.
Solution. a)
$$E_p Q=\frac{pQ'(p)}{Q(p)}=\frac{p\cdot 10\left(\sqrt{\frac{50-p}{p}}\right)'}{10\sqrt{\frac{50-p}{p}}}=\frac{p\cdot\dfrac{-\frac{50}{p^2}}{2\sqrt{\frac{50-p}{p}}}}{\sqrt{\frac{50-p}{p}}}=\frac{-\dfrac{50}{p}}{2\cdot\dfrac{50-p}{p}}=\frac{25}{p-50},$$
so
$$E_{10}Q=-\frac{5}{8}.$$
On the other hand, from (4) we know that
$$E_p Q\approx\frac{\text{percentage change in demand}}{\text{percentage change in price}}.$$
If we take the percentage change in price to be 6, then
$$E_{10}Q=-\frac{5}{8}\approx\frac{\text{percentage change in demand}}{6},$$
from which the percentage change in demand is approximately $-\frac{15}{4}$. This means that the demand decreases by $\frac{15}{4}\%$.
b) First we solve the equation $\dfrac{25}{p-50}=-1$. This gives us the solution $p=25$.
For determining the elasticity intervals we have to solve the inequalities
$$E_p Q<-1\quad\text{and}\quad E_p Q>-1.$$
We easily obtain that $E_p Q<-1$ when $p\in(25,50)$ (elastic demand) and $E_p Q>-1$ when $p\in(0,25)$ (inelastic demand).
c) In order to determine the price function as a function of demand we solve the equation
$$Q(p)=10\sqrt{\frac{50-p}{p}}$$
for $p$:
$$Q^2=100\cdot\frac{50-p}{p}.$$
Thus $Q^2 p=5000-100p$ and
$$p=p(Q)=\frac{5000}{Q^2+100}.$$
d) The critical points of the total revenue function are given by the equation
$$R'(Q)=0,$$
where
$$R(Q)=Qp(Q)=\frac{5000Q}{Q^2+100}.$$
$$R'(Q)=5000\cdot\frac{100-Q^2}{(Q^2+100)^2}=0\quad\text{implies that}\quad Q=10.$$
By using the first derivative test for determining the extreme values, we get that $Q=10$ is a maximum point for the total revenue function.
3.2 Integral calculus of one variable
3.2.1 Antiderivatives and techniques of integration
In the previous chapter we were concerned only with the basic problem: given a function $f$, find its derivative $f'$. In this chapter we are interested in precisely the opposite process, that is, given a function $f$, find a function whose derivative is $f$. This process is called antidifferentiation. Antidifferentiation and differentiation are inverse operations in the sense that one undoes what the other does.
Definition 1. A function $F$ is an antiderivative of the function $f$ if $F'=f$.
Example 1. An antiderivative of $f:\mathbb{R}\to\mathbb{R}$, $f(x)=2x$ is $F:\mathbb{R}\to\mathbb{R}$, $F(x)=x^2$, since $F'(x)=2x=f(x)$.
There is always more than one antiderivative of a function. For instance, in the previous example, $F_1:\mathbb{R}\to\mathbb{R}$, $F_1(x)=x^2-1$ and $F_2:\mathbb{R}\to\mathbb{R}$, $F_2(x)=x^2+10$ are also antiderivatives of $f$. If $F$ is an antiderivative of a function $f$ then so is $G$, $G(x)=F(x)+c$, for any constant $c$.
Theorem 1. Let $f:I\to\mathbb{R}$, $I\subseteq\mathbb{R}$ an interval, and let $F:I\to\mathbb{R}$ be an antiderivative of $f$. Then any other antiderivative $G$ of $f$ must be of the form $G(x)=F(x)+c$, where $c$ is a constant.
The proof of the previous result is based on Theorem 5, subsection 3.1.4.
The indefinite integral of a function $f$ represents the entire family of antiderivatives of the given function. We will use the following notation for the indefinite integral:
$$\int f(x)\,dx.$$
The indefinite integral is a family of functions. The function $f$ is called the integrand.
If $F$ is an antiderivative of a given function $f$ (defined on an open interval) then the indefinite integral of $f$ will be
$$\int f(x)\,dx=F(x)+C.$$
Extensive techniques for the calculation of antiderivatives have been developed. We will now discuss some basic techniques of integration.
Integration by substitution
This technique is based on the chain rule of differentiation. We have to mention first that integration by substitution does not always work, and there is no simple routine that could help us find a suitable substitution even in the cases where the method works.
Theorem 2. If $F$ is an antiderivative of $f$, then
$$\int f(g(x))g'(x)\,dx=F(g(x))+C\qquad (1)$$
Proof. By the chain rule,
$$(F(g(x))+c)'=F'(g(x))g'(x)=f(g(x))g'(x).$$
Hence, from the definition of an antiderivative we have that
$$\int f(g(x))g'(x)\,dx=F(g(x))+C.$$
Example 2. Evaluate $\displaystyle\int(x^2+3)^4\,2x\,dx$.
Solution. Let $g(x)=x^2+3$. Then $g'(x)=2x$. If we define the function $f$ by $f(u)=u^4$, then the integrand of the indefinite integral we are considering has the form
$$(x^2+3)^4\,2x=[g(x)]^4 g'(x)=f(g(x))g'(x).$$
From the equality
$$\int f(g(x))g'(x)\,dx=F(g(x))+C,$$
we conclude that the required antiderivative can be found if we know an antiderivative of the function $f$. But if $f(u)=u^4$, then $F(u)=\frac{1}{5}u^5$. Thus
$$\int(x^2+3)^4\,2x\,dx=\int f(g(x))g'(x)\,dx=F(g(x))+C=\frac{1}{5}(x^2+3)^5+C.$$
On a practical level, it is helpful to rewrite the integral in a more recognizable form by using the substitutions $u=g(x)$ and $du=g'(x)\,dx$. Then the rules of integration are used to complete the solution of the problem. This formal procedure is justified since it leads to the correct solution of the problem.
If we write $u=g(x)$ and $du=g'(x)\,dx$, then the integral
$$\int f(g(x))g'(x)\,dx$$
which is to be evaluated becomes $\displaystyle\int f(u)\,du$, which is equal to $F(u)+C$ since $F$ is an antiderivative of $f$.
So we have $\displaystyle\int f(u)\,du=F(u)+C$, which is the same as (1), as mentioned before.
Example 3. Rework the previous example using the relationships
$$u=g(x)\quad\text{and}\quad du=g'(x)\,dx.$$
We want to evaluate the indefinite integral
$$I=\int(x^2+3)^4(2x)\,dx.$$
Let $u=x^2+3$, so that $du=2x\,dx$.
Making this substitution into the expression for $I$ we get
$$I=\int u^4\,du=\frac{1}{5}u^5+C=\frac{1}{5}(x^2+3)^5+C$$
which agrees with the result of the previous example.
Example 4. Evaluate $\displaystyle\int\frac{1}{x\ln x}\,dx$.
Solution. Note first that the derivative of the function $\ln x$ is equal to $\frac{1}{x}$, so it is convenient to make the substitution $u=\ln x$. Then $du=\frac{1}{x}\,dx$ and
$$\int\frac{1}{x\ln x}\,dx=\int\frac{1}{u}\,du=\ln|u|+C=\ln|\ln x|+C.$$
Example 5. Evaluate $\displaystyle\int\sin^3 x\cos^3 x\,dx$.
Solution. Since the derivative of the function $\sin x$ is equal to $\cos x$, it is convenient to make the substitution $u=\sin x$. Then $du=\cos x\,dx$, and
$$\int\sin^3 x\cos^3 x\,dx=\int\sin^3 x\cos^2 x\cos x\,dx=\int u^3(1-u^2)\,du$$
$$=\int(u^3-u^5)\,du=\frac{u^4}{4}-\frac{u^6}{6}+C=\frac{\sin^4 x}{4}-\frac{\sin^6 x}{6}+C.$$
Alternatively, note that the derivative of the function $\cos x$ is equal to $-\sin x$, so it is convenient to make the substitution $v=\cos x$. Then $dv=-\sin x\,dx$, and
$$\int\sin^3 x\cos^3 x\,dx=\int(\sin^2 x)\cos^3 x(\sin x)\,dx=\int[-(1-v^2)]v^3\,dv$$
$$=\int(v^5-v^3)\,dv=\frac{v^6}{6}-\frac{v^4}{4}+C=\frac{\cos^6 x}{6}-\frac{\cos^4 x}{4}+C$$
It can be checked that
$$\frac{\sin^4 x}{4}-\frac{\sin^6 x}{6}=\frac{\cos^6 x}{6}-\frac{\cos^4 x}{4}+\frac{1}{12}$$
so both of the previous results are true.
Example 6. Evaluate $\displaystyle\int x\sqrt{x+1}\,dx$.
Solution. If we make the substitution $u=\sqrt{x+1}$, then $x=u^2-1$ and $dx=2u\,du$.
$$\int x\sqrt{x+1}\,dx=\int(u^2-1)u\cdot 2u\,du=2\int u^4\,du-2\int u^2\,du$$
$$=\frac{2}{5}u^5-\frac{2}{3}u^3+C=\frac{2}{5}(x+1)^{\frac{5}{2}}-\frac{2}{3}(x+1)^{\frac{3}{2}}+C.$$
Note that in this example the variable $x$ is written as a function of the new variable $u$. The substitution $x=g(u)$ has to be invertible ($u=g^{-1}(x)$) to enable us to return from the new variable $u$ to the original variable $x$ at the end of the process.
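The substitution examples are easy to verify symbolically; a sketch assuming sympy (sympy returns one antiderivative, so results may differ from ours by a constant):

```python
import sympy as sp

x = sp.symbols('x', positive=True)
print(sp.integrate((x**2 + 3)**4 * 2*x, x))   # expands to (x**2 + 3)**5/5 up to a constant
print(sp.integrate(x * sp.sqrt(x + 1), x))    # matches Example 6 up to a constant
```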
Integration by parts
Recall the product rule for differentiation, that is,
$$(fg)'(x)=f'(x)g(x)+f(x)g'(x).$$
Integrating with respect to the variable $x$, we obtain
$$\int(fg)'(x)\,dx=\int f'(x)g(x)\,dx+\int f(x)g'(x)\,dx.$$
Since $fg$ is an antiderivative of $(fg)'$, the previous equality can be rewritten as
$$\int f(x)g'(x)\,dx=f(x)g(x)-\int f'(x)g(x)\,dx\qquad (2)$$
The relationship (2) is called the formula for integration by parts for indefinite integrals. It is very useful when the indefinite integral $\int f'(x)g(x)\,dx$ is much easier to calculate than the indefinite integral $\int f(x)g'(x)\,dx$.
Example 7. Compute $\int xe^x\,dx$.

Solution. Writing $f(x) = x$ and $g'(x) = e^x$, we have $f'(x) = 1$ and $g(x) = e^x$. It follows that
$$\int xe^x\,dx = xe^x - \int e^x\,dx = xe^x - e^x + C.$$
Example 8. Evaluate $\int \ln x\,dx$.

Solution. Writing $f(x) = \ln x$ and $g'(x) = 1$, we have $f'(x) = \frac{1}{x}$ and $g(x) = x$, so
$$\int \ln x\,dx = x\ln x - \int x\cdot\frac{1}{x}\,dx = x\ln x - x + C.$$
Example 9. Evaluate $\int e^x\sin x\,dx$.

Solution.
$$\int e^x\sin x\,dx = \int (e^x)'\sin x\,dx = e^x\sin x - \int e^x\cos x\,dx \tag{3}$$
We now need to study the indefinite integral
$$\int e^x\cos x\,dx = \int (e^x)'\cos x\,dx = e^x\cos x - \int e^x(-\sin x)\,dx = e^x\cos x + \int e^x\sin x\,dx \tag{4}$$
It looks like we are back to the same problem. However, if we combine (3) and (4) then we obtain
$$\int e^x\sin x\,dx = e^x\sin x - e^x\cos x - \int e^x\sin x\,dx,$$
so that
$$\int e^x\sin x\,dx = \frac{1}{2}e^x(\sin x - \cos x) + C.$$
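The circular integration-by-parts argument above is easy to confirm by differentiating the claimed antiderivative; a quick sketch, again assuming SymPy:

```python
import sympy as sp

x = sp.symbols('x')
F = sp.exp(x) * (sp.sin(x) - sp.cos(x)) / 2           # result of Example 9
print(sp.simplify(sp.diff(F, x) - sp.exp(x)*sp.sin(x)))  # 0
```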
Completing squares

In this section, we shall consider techniques to solve integrals involving square roots of the form $\sqrt{ax^2+bx+c}$, where $a \neq 0$. Our task is to show that such integrals can be reduced to integrals discussed before.

Note that
$$ax^2+bx+c = a\left(x^2+\frac{b}{a}x+\frac{c}{a}\right) = a\left(x^2+\frac{b}{a}x+\left(\frac{b}{2a}\right)^2\right)+c-\frac{b^2}{4a} = a\left(x+\frac{b}{2a}\right)^2-\frac{b^2-4ac}{4a}.$$
We will now use the following substitution:
$$u = x+\frac{b}{2a} \quad\text{and}\quad du = dx.$$
Example 10. Evaluate the integral $\int \frac{1}{\sqrt{3-2x-x^2}}\,dx$.

Solution. We have
$$3-2x-x^2 = -(x^2+2x-3) = -(x^2+2x+1)+4 = 4-(x+1)^2.$$
We use the substitutions $u = x+1$ and $du = dx$:
$$\int \frac{1}{\sqrt{3-2x-x^2}}\,dx = \int \frac{1}{\sqrt{4-u^2}}\,du = \arcsin\frac{u}{2}+C = \arcsin\frac{x+1}{2}+C.$$
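As with the earlier examples, the completed-square antiderivative can be checked by differentiation; a minimal sketch assuming SymPy (the simplification to 0 holds on the domain $|x+1| < 2$):

```python
import sympy as sp

x = sp.symbols('x')
F = sp.asin((x + 1) / 2)                 # antiderivative found above
g = 1 / sp.sqrt(3 - 2*x - x**2)          # original integrand
print(sp.simplify(sp.diff(F, x) - g))    # expected: 0
```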
Partial fractions

In this section we shall consider indefinite integrals of the form $\int \frac{p(x)}{q(x)}\,dx$, where $p$ and $q$ are polynomials in $x$.

If the degree of $p$ is not smaller than the degree of $q$, then we can always find polynomials $c$ and $r$ such that
$$\frac{p(x)}{q(x)} = c(x)+\frac{r(x)}{q(x)},$$
where $r \equiv 0$ or $r$ has a smaller degree than the degree of $q$. We can therefore restrict our attention to the case when the polynomial $p$ is of lower degree than $q$.

The first step is to factorize the polynomial $q$ into a product of irreducible factors. It is a fundamental result in algebra that a polynomial with real coefficients can be factorized into a product of irreducible linear factors and quadratic factors with real coefficients.

Suppose that a linear factor $(ax+b)$ occurs $n$ times in the factorization of $q$. Then we write down a decomposition:
$$\frac{A_1}{ax+b}+\frac{A_2}{(ax+b)^2}+\dots+\frac{A_n}{(ax+b)^n},$$
where the constants $A_1, A_2, \dots, A_n$ will be determined later.

Suppose that a quadratic factor $(ax^2+bx+c)$ occurs $n$ times in the factorization of $q$. Then we write down a decomposition
$$\frac{A_1x+B_1}{ax^2+bx+c}+\frac{A_2x+B_2}{(ax^2+bx+c)^2}+\dots+\frac{A_nx+B_n}{(ax^2+bx+c)^n},$$
where the constants $A_1, \dots, A_n$ and $B_1, \dots, B_n$ will be determined later.

We proceed to add all the decompositions, equate their sum to $\frac{p(x)}{q(x)}$, and then calculate all the constants by equating the coefficients.
Example 11. Consider the indefinite integral $\int \frac{x^2+x-3}{x^3-2x^2-x+2}\,dx$.

Solution. We factorize first the denominator of the integrand:
$$x^3-2x^2-x+2 = x^3-x-2(x^2-1) = x(x^2-1)-2(x^2-1) = (x-2)(x^2-1) = (x-2)(x-1)(x+1).$$
So we consider partial fractions of the form
$$\frac{x^2+x-3}{x^3-2x^2-x+2} = \frac{a}{x-2}+\frac{b}{x-1}+\frac{c}{x+1} = \frac{a(x-1)(x+1)+b(x-2)(x+1)+c(x-2)(x-1)}{(x-2)(x-1)(x+1)}.$$
It follows that
$$x^2+x-3 = a(x^2-1)+b(x^2-x-2)+c(x^2-3x+2) = x^2(a+b+c)+x(-b-3c)-a-2b+2c.$$
We equate coefficients and solve for $a$, $b$, $c$:
$$\begin{cases} a+b+c = 1\\ -b-3c = 1\\ -a-2b+2c = -3 \end{cases}$$
from which we get $a = 1$, $b = \frac{1}{2}$, $c = -\frac{1}{2}$. Hence
$$\int \frac{x^2+x-3}{x^3-2x^2-x+2}\,dx = \int \frac{1}{x-2}\,dx - \frac{1}{2}\int \frac{1}{x+1}\,dx + \frac{1}{2}\int \frac{1}{x-1}\,dx = \ln|x-2|-\frac{1}{2}\ln|x+1|+\frac{1}{2}\ln|x-1|+C.$$
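Partial fraction decompositions of this kind can also be produced mechanically; a sketch assuming SymPy (an outside tool, not part of the text's method):

```python
import sympy as sp

x = sp.symbols('x')
f = (x**2 + x - 3) / (x**3 - 2*x**2 - x + 2)
print(sp.apart(f))
# 1/(x - 2) + 1/(2*(x - 1)) - 1/(2*(x + 1)),
# matching a = 1, b = 1/2, c = -1/2 above
```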
3.2.2 The definite integral

In order to define the concept of a definite integral we will define first the Riemann sums (which are named after the famous German mathematician Georg Friedrich Bernhard Riemann (1826-1866)).

This is a 5-step process.

1) Let $f$ be defined on a closed interval $[a, b]$.

2) Partition the interval $[a, b]$ into $n$ subintervals $[x_{k-1}, x_k]$ of length $x_k-x_{k-1}$. Let $P$ denote the partition
$$a = x_0 < x_1 < \dots < x_{n-1} < x_n = b.$$

3) Let $\|P\|$ be the length of the longest subinterval. The number $\|P\|$ is called the norm of the partition $P$.

4) Choose a number $x_k^* \in (x_{k-1}, x_k)$ in each subinterval, $k = \overline{1,n}$.

5) Form the sum
$$\sum_{k=1}^n f(x_k^*)(x_k-x_{k-1}). \tag{1}$$
Sums such as (1) for the various partitions of $[a, b]$ are known as Riemann sums.

Definition 1. Let $f$ be a function defined on the closed interval $[a, b]$. Then the definite integral of $f$ from $a$ to $b$, denoted $\int_a^b f(x)\,dx$, is defined to be
$$\int_a^b f(x)\,dx = \lim_{\|P\|\to 0}\sum_{k=1}^n f(x_k^*)(x_k-x_{k-1}) \tag{2}$$
provided that the previous limit exists and has a finite value.

If the limit in (2) exists and is finite, the function $f$ is said to be integrable on $[a, b]$.

The numbers $a$ and $b$ in the previous definition are called the lower and upper limits of integration, respectively. The integral symbol $\int$, first used by Leibniz, is an elongated S for the word sum.

We have the following important result, which gives us an important class of integrable functions.

Theorem 1. If $f$ is continuous on $[a, b]$, then $f$ is integrable on that interval.

The precise characterization of the integrable functions is given by the following theorem.

Theorem 2. Let $f : [a, b] \to \mathbb{R}$. The function $f$ is integrable on $[a, b]$ if and only if $f$ is bounded on $[a, b]$ and $f$ is continuous almost everywhere on $[a, b]$.

In consequence any integrable function is a bounded one.

The next theorem gives some of the basic properties of the definite integral.

Theorem 3. Let $f$ and $g$ be integrable functions on $[a, b]$. Then we have:
a) $\int_a^b kf(x)\,dx = k\int_a^b f(x)\,dx$, where $k$ is any constant;

b) $\int_a^b [f(x)\pm g(x)]\,dx = \int_a^b f(x)\,dx \pm \int_a^b g(x)\,dx$;

c) $\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx$, where $c$ is any number in $[a, b]$;

d) $\int_a^b f(x)\,dx = -\int_b^a f(x)\,dx$;

e) $\int_a^a f(x)\,dx = 0$.
The most helpful result in computing definite integrals is the following:

Theorem 4. (Leibniz-Newton's theorem) Let $f : [a, b] \to \mathbb{R}$. Suppose that $f$ is integrable on $[a, b]$ and that there exists an antiderivative $F$ of $f$. Then
$$\int_a^b f(x)\,dx = F(b)-F(a). \tag{3}$$
The difference $F(b)-F(a)$ is usually written $F(x)\big|_a^b$.

Example 1. Evaluate $\int_{-2}^2 (3x^2-x+1)\,dx$.

Solution.
$$\int_{-2}^2 (3x^2-x+1)\,dx = 3\int_{-2}^2 x^2\,dx - \int_{-2}^2 x\,dx + \int_{-2}^2 dx = 3\cdot\frac{x^3}{3}\Big|_{-2}^2 - \frac{x^2}{2}\Big|_{-2}^2 + x\Big|_{-2}^2 = 2^3-(-2)^3-\frac{1}{2}[2^2-(-2)^2]+[2-(-2)] = 20.$$
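Definition 1 can also be illustrated numerically: Riemann sums of this integrand approach the exact value 20 as the partition is refined. A minimal sketch in Python with NumPy (midpoints stand in for the sample points $x_k^*$):

```python
import numpy as np

def riemann_sum(f, a, b, n):
    edges = np.linspace(a, b, n + 1)      # uniform partition a = x_0 < ... < x_n = b
    mids = (edges[:-1] + edges[1:]) / 2   # sample point x_k* in each subinterval
    widths = np.diff(edges)               # lengths x_k - x_{k-1}
    return np.sum(f(mids) * widths)

f = lambda x: 3*x**2 - x + 1
for n in (10, 100, 1000):
    print(n, riemann_sum(f, -2, 2, n))    # -> 20 as n grows
```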
The Leibniz-Newton theorem allows us to use all the techniques of integration presented in subsection 3.2.1.

Example 2. Evaluate $\int_0^3 x\sqrt{x+1}\,dx$.

We will use the technique of integration by substitution.

Remark 1. If we don't use Theorem 4 in evaluating such an integral and we make the substitution in the definite integral, we have to change the limits of integration to correspond to the values of $u$ for $x = a$ and $x = b$.

Solution. To calculate the previous integral we can use the substitution $u = \sqrt{x+1}$, from which we have $x = u^2-1$ and $dx = 2u\,du$. Note that if $x = 0$, then $u = 1$ and if $x = 3$, then $u = 2$. It follows that
$$\int_0^3 x\sqrt{x+1}\,dx = \int_1^2 (u^2-1)u\cdot 2u\,du = \int_1^2 (2u^4-2u^2)\,du = \frac{2}{5}u^5\Big|_1^2 - \frac{2}{3}u^3\Big|_1^2 = \frac{2}{5}(32-1)-\frac{2}{3}(8-1) = \frac{62}{5}-\frac{14}{3} = \frac{116}{15}.$$
Example 3. Evaluate the integral $\int_0^{\pi/2} x\cos x\,dx$.

We will use the method of integration by parts in order to compute the previous integral.

Remark 2. For definite integrals over an interval $[a, b]$ we have the following formula for integrating by parts:
$$\int_a^b f'(x)g(x)\,dx = f(x)g(x)\Big|_a^b - \int_a^b f(x)g'(x)\,dx \tag{4}$$

Solution.
$$\int_0^{\pi/2} x\cos x\,dx = \int_0^{\pi/2} x(\sin x)'\,dx = x\sin x\Big|_0^{\pi/2} - \int_0^{\pi/2}\sin x\,dx = \frac{\pi}{2}+\cos x\Big|_0^{\pi/2} = \frac{\pi}{2}-1.$$
One of the most important applications of the definite integral is the calculation of areas bounded by arbitrary curves.

Theorem 5. Let $f$ be a continuous nonnegative function with the domain containing the interval $[a, b]$. Then the area of the region bounded above by the graph of $f$, below by the x-axis, and on the left and right by the vertical lines $x = a$ and $x = b$, respectively, is given by the definite integral $\int_a^b f(x)\,dx$.

Example 4. Find the area of the region situated under the curve $y = x^2+1$ from $x = -1$ to $x = 2$.

[Figure: the region under the parabola $y = f(x) = x^2+1$ between the vertical lines $x = -1$ and $x = 2$.]

Solution. The region under consideration is shown in this figure. Using Theorem 5 we have
$$\int_{-1}^2 (x^2+1)\,dx = \frac{x^3}{3}\Big|_{-1}^2 + x\Big|_{-1}^2 = \frac{1}{3}[2^3-(-1)^3]+2-(-1) = 6.$$
3.3 Improper integrals

3.3.1 Improper integrals

In defining a definite integral (or a Riemann integral) $\int_a^b f(x)\,dx$ it was understood that:

1° the limits of integration were finite numbers;

2° the function $f$ was bounded on the interval $[a, b]$.

Now we will extend the concept of a definite (proper) integral to the case where the length of the interval is infinite and also to the case when $f$ is unbounded. The resulting integral is said to be an improper integral.

So, in conclusion, improper means that some part of $\int_a^b f(x)\,dx$ becomes infinite. It might be $a$ or $b$ or the function $f$.

First we will consider integrals of functions that are defined on unbounded intervals. To motivate the definition of an improper integral of a function $f$ over an infinite interval, consider the problem of finding the area of the region under the curve $y = f(x) = \frac{1}{x^2}$, above the x-axis, and to the right of the line $x = 1$ (as shown in the figure below).

[Figure: the region under $y = \frac{1}{x^2}$ to the right of $x = 1$, with the part to the left of the line $x = t$ shaded.]

The area that lies to the left of the line $x = t$ (shaded in the figure) is
$$A(t) = \int_1^t \frac{1}{x^2}\,dx = -\frac{1}{x}\Big|_1^t = 1-\frac{1}{t}.$$
Note that $A(t) < 1$ no matter how large $t$ is chosen. We also observe that
$$\lim_{t\to\infty} A(t) = \lim_{t\to\infty}\left(1-\frac{1}{t}\right) = 1.$$
The area of the shaded region approaches 1 as $t\to\infty$, so we can say that the area of the infinite region is equal to 1 and we write
$$\int_1^\infty \frac{1}{x^2}\,dx = \lim_{t\to\infty}\int_1^t \frac{1}{x^2}\,dx = \lim_{t\to\infty}\left(1-\frac{1}{t}\right) = 1.$$
Using this example we define the integral of $f$ over an infinite interval as the limit of integrals over finite intervals.

Definition 1. (Improper integrals on unbounded intervals)

1) Let $f : [a, \infty) \to \mathbb{R}$. If $\int_a^t f(x)\,dx$ exists for each $t \geq a$, then
$$\int_a^\infty f(x)\,dx = \lim_{t\to\infty}\int_a^t f(x)\,dx.$$
The integral $\int_a^\infty f(x)\,dx$ is called an improper integral on an unbounded interval on the right. This integral is said to be convergent if the limit exists and has a finite value, and it is said to be divergent if the limit does not exist or has an infinite value.

2) Let $f : (-\infty, b] \to \mathbb{R}$. If $\int_t^b f(x)\,dx$ exists for each $t \leq b$, then
$$\int_{-\infty}^b f(x)\,dx = \lim_{t\to-\infty}\int_t^b f(x)\,dx.$$
The previous integral $\int_{-\infty}^b f(x)\,dx$ is called an improper integral on an unbounded interval on the left. The definition of convergence or divergence is similar to the previous case.

3) Let $f : \mathbb{R} \to \mathbb{R}$ and $a \in \mathbb{R}$. Then
$$\int_{-\infty}^{+\infty} f(x)\,dx = \int_{-\infty}^a f(x)\,dx + \int_a^{+\infty} f(x)\,dx.$$
The improper integral $\int_{-\infty}^{+\infty} f(x)\,dx$ is said to be convergent if both $\int_{-\infty}^a f(x)\,dx$ and $\int_a^{+\infty} f(x)\,dx$ are convergent. The previous improper integral is divergent if at least one of the improper integrals $\int_{-\infty}^a f(x)\,dx$, $\int_a^{+\infty} f(x)\,dx$ is divergent.

This type of improper integral is easy to identify. It is sufficient to look at the limits of integration. If either the lower limit of integration, the upper limit of integration, or both of them are not finite, it will be an improper integral on an unbounded interval.
Example 1. Evaluate $\int_0^\infty e^{-x}\,dx$.

Solution.
$$\int_0^\infty e^{-x}\,dx = \lim_{t\to\infty}\int_0^t e^{-x}\,dx = \lim_{t\to\infty}\left(-e^{-x}\Big|_0^t\right) = \lim_{t\to\infty}(-e^{-t}+1) = 1.$$
We can abbreviate this calculation by writing (instead of writing the limit):
$$\int_0^\infty e^{-x}\,dx = -e^{-x}\Big|_0^\infty = 0+1 = 1.$$
Example 2. Evaluate $\int_{-\infty}^0 xe^x\,dx$.

Solution. By using the definition of an improper integral we have
$$\int_{-\infty}^0 xe^x\,dx = \lim_{t\to-\infty}\int_t^0 xe^x\,dx.$$
We integrate by parts with $f(x) = x$ and $g'(x) = e^x$, so that $f'(x) = 1$ and $g(x) = e^x$:
$$\int_t^0 xe^x\,dx = xe^x\Big|_t^0 - \int_t^0 e^x\,dx = -te^t - e^x\Big|_t^0 = -te^t-1+e^t.$$
We know that $\lim_{t\to-\infty} e^t = 0$ and by using l'Hospital's rule (Theorem 2, subsection 3.1.2) we get
$$\lim_{t\to-\infty} te^t = \lim_{t\to-\infty}\frac{t}{e^{-t}} = \lim_{y\to\infty}\frac{-y}{e^y}\ \left[\frac{\infty}{\infty}\right] = \lim_{y\to\infty}\frac{(-y)'}{(e^y)'} = \lim_{y\to\infty}\frac{-1}{e^y} = 0.$$
Another way of determining the previous limit is by using the fact that the exponential function goes to infinity faster than any polynomial. So,
$$\lim_{y\to\infty}\frac{P(y)}{e^y} = 0,$$
and in particular
$$\lim_{y\to\infty}\frac{y}{e^y} = 0.$$
Therefore
$$\int_{-\infty}^0 xe^x\,dx = \lim_{t\to-\infty}(-te^t-1+e^t) = 0-1+0 = -1.$$
Example 3. For what values of $\alpha$ is the integral $\int_1^\infty \frac{1}{x^\alpha}\,dx$ convergent?

Solution. For $\alpha = 1$ we have
$$\int_1^\infty \frac{1}{x}\,dx = \lim_{t\to\infty}\int_1^t \frac{1}{x}\,dx = \lim_{t\to\infty}\ln|x|\Big|_1^t = \lim_{t\to\infty}(\ln t-\ln 1) = \infty.$$
The limit is not finite and so the improper integral $\int_1^\infty \frac{1}{x}\,dx$ is divergent.

For $\alpha \neq 1$ we have
$$\int_1^\infty \frac{1}{x^\alpha}\,dx = \lim_{t\to\infty}\int_1^t x^{-\alpha}\,dx = \lim_{t\to\infty}\frac{x^{-\alpha+1}}{-\alpha+1}\Big|_1^t = \frac{1}{1-\alpha}\lim_{t\to\infty}\left(\frac{1}{t^{\alpha-1}}-1\right).$$
If $\alpha > 1$ then $\alpha-1 > 0$ and
$$\lim_{t\to\infty}\frac{1}{t^{\alpha-1}} = \frac{1}{\infty} = 0.$$
Therefore
$$\int_1^\infty \frac{1}{x^\alpha}\,dx = \frac{1}{\alpha-1}$$
and the integral is convergent.

If $\alpha < 1$ then $1-\alpha > 0$,
$$\lim_{t\to\infty}\frac{1}{t^{\alpha-1}} = \lim_{t\to\infty} t^{1-\alpha} = \infty,$$
and the integral is divergent.

We can summarize the previous results in the following remark (for future reference).

Remark 1. The improper integral $\int_1^\infty \frac{1}{x^\alpha}\,dx$ is convergent if $\alpha > 1$ and divergent if $\alpha \leq 1$.
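Remark 1 is easy to probe numerically. A hedged sketch assuming SciPy is available: `quad` handles the infinite upper limit directly and agrees with $\frac{1}{\alpha-1}$ when $\alpha > 1$, while for $\alpha = 1$ the truncated integrals keep growing like $\ln t$.

```python
import numpy as np
from scipy.integrate import quad

for alpha in (2.0, 1.5):
    val, err = quad(lambda x: x**(-alpha), 1, np.inf)
    print(alpha, val, 1/(alpha - 1))      # the two values agree

for t in (1e2, 1e4, 1e6):                 # divergence for alpha = 1
    val, _ = quad(lambda x: 1/x, 1, t)
    print(t, val)                         # grows like ln t
```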
Example 4. Evaluate $\int_{-\infty}^0 \cos x\,dx$ if possible.

Solution.
$$\int_{-\infty}^0 \cos x\,dx = \lim_{t\to-\infty}\int_t^0 \cos x\,dx = \lim_{t\to-\infty}\left(\sin x\Big|_t^0\right) = \lim_{t\to-\infty}(-\sin t).$$
Since $\lim_{t\to-\infty}\sin t$ does not exist (as in Example 10, subsection 3.1.1), the improper integral is divergent.
We will analyse now the integrals of unbounded functions.

Definition 2. (improper integrals of unbounded functions)

1) Let $f : [a, b) \to \mathbb{R}$ be a continuous function on $[a, b)$ with $\lim_{x\to b} f(x) = \infty$ (or $-\infty$). We define the improper integral of the unbounded function $f$ as
$$\int_a^b f(x)\,dx = \lim_{t\to b}\int_a^t f(x)\,dx.$$
This integral is said to be convergent if the limit exists and has a finite value, and it is said to be divergent if the limit does not exist or has an infinite value. The point $b$ is called a critical point or a bad point.

2) Let $f : (a, b] \to \mathbb{R}$ be a continuous function on $(a, b]$ with $\lim_{x\to a} f(x) = \infty$ (or $-\infty$). Then
$$\int_a^b f(x)\,dx = \lim_{t\to a}\int_t^b f(x)\,dx.$$
The definition of convergence or divergence is similar to the previous case.

3) Let $f : [a, c)\cup(c, b] \to \mathbb{R}$ be a continuous function on $[a, c)\cup(c, b]$ with $\lim_{x\to c^-} f(x) = \pm\infty$ or $\lim_{x\to c^+} f(x) = \pm\infty$. We define
$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx.$$
The improper integral $\int_a^b f(x)\,dx$ is said to be convergent if both $\int_a^c f(x)\,dx$ and $\int_c^b f(x)\,dx$ are convergent. The previous improper integral is divergent if at least one of the improper integrals $\int_a^c f(x)\,dx$, $\int_c^b f(x)\,dx$ is divergent.
The integrals of unbounded functions are more difficult to identify. It is necessary to look at the interval of integration and determine if the integrand is continuous or not in that interval. Things to look for are fractions for which the denominator becomes zero in the interval of integration.

Example 5. Evaluate $\int_0^3 \frac{1}{x-2}\,dx$ if possible.

Solution. Observe that the line $x = 2$ is a vertical asymptote of the integrand. We have to use part 3) of Definition 2 with $c = 2$:
$$\int_0^3 \frac{1}{x-2}\,dx = \int_0^2 \frac{1}{x-2}\,dx + \int_2^3 \frac{1}{x-2}\,dx,$$
where
$$\int_0^2 \frac{1}{x-2}\,dx = \lim_{t\to 2}\int_0^t \frac{1}{x-2}\,dx = \lim_{t\to 2}\ln|x-2|\Big|_0^t = \lim_{t\to 2}\ln(2-x)\Big|_0^t = \lim_{t\to 2}[\ln(2-t)-\ln 2] = -\infty.$$
Thus $\int_0^2 \frac{1}{x-2}\,dx$ is divergent. This implies that $\int_0^3 \frac{1}{x-2}\,dx$ is divergent. We do not need to evaluate $\int_2^3 \frac{1}{x-2}\,dx$.

If we had not observed the asymptote $x = 2$ in the previous example and had confused the integral with a proper integral, then we might have made the following erroneous calculation:
$$\int_0^3 \frac{1}{x-2}\,dx = \ln|x-2|\Big|_0^3 = \ln 1-\ln 2 = -\ln 2.$$
This is wrong because the integral is improper and must be calculated in terms of limits.
Example 6. Evaluate $\int_0^4 \frac{dx}{\sqrt{x}}$ if possible.

Solution. Observe that $\lim_{x\to 0}\frac{1}{\sqrt{x}} = +\infty$. We must use part 2) of Definition 2 with $a = 0$:
$$\int_0^4 \frac{dx}{\sqrt{x}} = \lim_{t\to 0}\int_t^4 x^{-\frac{1}{2}}\,dx = \lim_{t\to 0}\frac{x^{-\frac{1}{2}+1}}{-\frac{1}{2}+1}\Big|_t^4 = \lim_{t\to 0}2\sqrt{x}\Big|_t^4 = \lim_{t\to 0}(4-2\sqrt{t}) = 4.$$
Hence, the integral converges and $\int_0^4 \frac{dx}{\sqrt{x}} = 4$.
Example 7. Evaluate $\int_0^e \ln x\,dx$ if possible.

Solution. Since $\lim_{x\to 0}\ln x = -\infty$, the critical point is $a = 0$. Using integration by parts we get
$$\int_0^e \ln x\,dx = \lim_{t\to 0}\int_t^e \ln x\,dx = \lim_{t\to 0}\left(x\ln x\Big|_t^e - \int_t^e x\cdot\frac{1}{x}\,dx\right) = \lim_{t\to 0}(x\ln x-x)\Big|_t^e = e\ln e - e - \lim_{t\to 0}(t\ln t-t) = -\lim_{t\to 0} t\ln t,$$
where
$$\lim_{t\to 0} t\ln t = \lim_{t\to 0}\frac{\ln t}{\frac{1}{t}}\ \left[\frac{\infty}{\infty}\right] = \lim_{t\to 0}\frac{\frac{1}{t}}{-\frac{1}{t^2}} = \lim_{t\to 0}(-t) = 0.$$
In conclusion, the integral is convergent and $\int_0^e \ln x\,dx = 0$.
Example 8. For what values of $\alpha$ is the integral $\int_a^b \frac{1}{(x-a)^\alpha}\,dx$ convergent?

Solution. For $\alpha = 1$ we have
$$\int_a^b \frac{1}{x-a}\,dx = \lim_{t\to a}\int_t^b \frac{1}{x-a}\,dx = \lim_{t\to a}\ln|x-a|\Big|_t^b = \lim_{t\to a}(\ln(b-a)-\ln(t-a)) = \infty.$$
For $\alpha \neq 1$ we have
$$\int_a^b \frac{1}{(x-a)^\alpha}\,dx = \lim_{t\to a}\int_t^b (x-a)^{-\alpha}\,dx = \lim_{t\to a}\frac{(x-a)^{-\alpha+1}}{-\alpha+1}\Big|_t^b = \lim_{t\to a}\frac{1}{1-\alpha}\left[(b-a)^{1-\alpha}-(t-a)^{1-\alpha}\right].$$
If $\alpha > 1$ then $\alpha-1 > 0$ and
$$\lim_{t\to a}(t-a)^{1-\alpha} = \lim_{t\to a}\frac{1}{(t-a)^{\alpha-1}} = \infty.$$
If $\alpha < 1$ then $1-\alpha > 0$ and
$$\lim_{t\to a}(t-a)^{1-\alpha} = 0.$$
We can summarize the previous results in the following remark (for future reference).

Remark 2. a) $\int_a^b \frac{1}{(x-a)^\alpha}\,dx$ is convergent if $\alpha < 1$ and divergent for $\alpha \geq 1$.

b) $\int_a^b \frac{1}{(b-x)^\alpha}\,dx$ is convergent if $\alpha < 1$ and divergent for $\alpha \geq 1$.

The proof of part b) in Remark 2 is similar to that presented in solving part a), so it will be omitted.
Sometimes an improper integral is too difficult to be evaluated. In these cases we can compare the integral with known integrals. The theorem below shows us how to do this.

Theorem 1. (Comparison theorem) Let $f, g : [a, \infty) \to \mathbb{R}$ be two continuous functions with $f(x) \geq g(x) \geq 0$ for $x \geq a$.

a) If $\int_a^\infty f(x)\,dx$ is convergent then $\int_a^\infty g(x)\,dx$ is convergent.

b) If $\int_a^\infty g(x)\,dx$ is divergent then $\int_a^\infty f(x)\,dx$ is divergent.

If we use the previous theorem and Remark 1 we obtain the following criterion for convergence-divergence.

Theorem 2. (Criterion for convergence-divergence) Let $f : [a, \infty) \to \mathbb{R}$ be a continuous function.

a) If there is $\alpha > 1$ such that $\lim_{x\to\infty} x^\alpha|f(x)| = c < \infty$, then the improper integral $\int_a^\infty f(x)\,dx$ is a convergent one.

b) If there is $0 < \alpha \leq 1$ such that $\lim_{x\to\infty} x^\alpha|f(x)| = c > 0$, then the improper integral $\int_a^\infty f(x)\,dx$ is a divergent one.

Similar results are valid for the improper integral $\int_{-\infty}^b f(x)\,dx$.
In what concerns the improper integrals of unbounded functions, we have the following results.

Theorem 3. (Comparison theorem) Let $f, g : [a, b) \to \mathbb{R}$ be two continuous functions such that
$$\lim_{x\to b} f(x) = \lim_{x\to b} g(x) = \infty$$
and
$$f(x) \geq g(x) \geq 0 \quad\text{for } x \in [a, b).$$
a) If $\int_a^b f(x)\,dx$ is convergent then $\int_a^b g(x)\,dx$ is convergent.

b) If $\int_a^b g(x)\,dx$ is divergent then $\int_a^b f(x)\,dx$ is divergent.

Similar results are valid for the improper integral $\int_a^b f(x)\,dx$ where $a$ is a critical point.

If we use Theorem 3 and Remark 2 we obtain the following criteria for convergence-divergence.

Theorem 4. (Criterion for convergence-divergence) Let $f : [a, b) \to \mathbb{R}$ be a continuous function such that $\lim_{x\to b}|f(x)| = \infty$.

a) If there is $\alpha \in (0, 1)$ such that
$$\lim_{x\to b}(b-x)^\alpha|f(x)| = c < \infty,$$
then the improper integral $\int_a^b f(x)\,dx$ is a convergent one.

b) If there is $\alpha \geq 1$ such that
$$\lim_{x\to b}(b-x)^\alpha|f(x)| = c > 0,$$
then the improper integral $\int_a^b f(x)\,dx$ is a divergent one.

Theorem 5. (Criterion for convergence-divergence) Let $f : (a, b] \to \mathbb{R}$ be a continuous function such that $\lim_{x\to a}|f(x)| = \infty$.

a) If there is $\alpha \in (0, 1)$ such that
$$\lim_{x\to a}(x-a)^\alpha|f(x)| = c < \infty,$$
then the improper integral $\int_a^b f(x)\,dx$ is a convergent one.

b) If there is $\alpha \geq 1$ such that
$$\lim_{x\to a}(x-a)^\alpha|f(x)| = c > 0,$$
then the improper integral is a divergent one.
Example 9. Show that $\int_0^\infty e^{-x^2}\,dx$ is convergent.

Solution. We can't evaluate the integral directly because we are not able to compute the antiderivative of $e^{-x^2}$. We write
$$\int_0^\infty e^{-x^2}\,dx = \int_0^1 e^{-x^2}\,dx + \int_1^\infty e^{-x^2}\,dx.$$
The first integral on the right-hand side is just a proper integral. In the second integral we use the fact that for $x \geq 1$ we have $x^2 \geq x$, so $e^{-x^2} \leq e^{-x}$. The integral of $e^{-x}$ is easy to evaluate:
$$\int_1^\infty e^{-x}\,dx = \lim_{t\to\infty}\int_1^t e^{-x}\,dx = \lim_{t\to\infty}(e^{-1}-e^{-t}) = \frac{1}{e}.$$
Thus, taking $f(x) = e^{-x}$ and $g(x) = e^{-x^2}$ in the Comparison theorem (Theorem 1), we see that $\int_1^\infty e^{-x^2}\,dx$ is convergent and so is $\int_0^\infty e^{-x^2}\,dx$.
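Although the antiderivative of $e^{-x^2}$ is not elementary, the convergent value can be approximated numerically (a sketch assuming SciPy); the exact value $\frac{\sqrt{\pi}}{2}$ is derived in the Euler-Poisson paragraph later in this section.

```python
from math import pi, sqrt
import numpy as np
from scipy.integrate import quad

val, err = quad(lambda x: np.exp(-x**2), 0, np.inf)
print(val, sqrt(pi)/2)   # both about 0.8862269...
```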
3.3.2 Euler's integrals

Euler's integrals are special functions (defined by using improper integrals) that are used in probabilities and in the computation of certain integrals.

Beta function

The integral $\int_0^1 x^{p-1}(1-x)^{q-1}\,dx$ is called Euler's first integral.

This integral can be an improper integral of an unbounded function, where the potential critical points are 0 and 1.

If $p < 1$ then 0 is a critical point since
$$\lim_{x\to 0} x^{p-1}(1-x)^{q-1} = \infty.$$
If $q < 1$ then 1 is a critical point since
$$\lim_{x\to 1} x^{p-1}(1-x)^{q-1} = \infty.$$
If $p \geq 1$ and $q \geq 1$ then Euler's first integral is a definite (proper) integral.

In what concerns the convergence of Euler's first integral we have the following result.

Theorem 1.

a) If $p > 0$ and $q > 0$ then Euler's first integral is convergent.

b) If $p \leq 0$ or $q \leq 0$ then Euler's first integral is divergent.
Proof. We split first the integral as
$$\int_0^1 x^{p-1}(1-x)^{q-1}\,dx = \int_0^{1/2} x^{p-1}(1-x)^{q-1}\,dx + \int_{1/2}^1 x^{p-1}(1-x)^{q-1}\,dx$$
and we study the convergence of both improper integrals in the right-hand side of the previous equality.

We use Theorem 5, section 3.3.1, to study the convergence of the first improper integral mentioned before:
$$\lim_{x\to 0} x^\alpha\cdot x^{p-1}(1-x)^{q-1} = \lim_{x\to 0} x^{\alpha+p-1} = \begin{cases}\infty, & \text{if } \alpha+p-1 < 0\\ 1, & \text{if } \alpha+p-1 = 0\\ 0, & \text{if } \alpha+p-1 > 0\end{cases}$$
The previous limit is finite if $\alpha+p-1 \geq 0$ and is positive if $\alpha+p-1 \leq 0$.

The improper integral $\int_0^{1/2} x^{p-1}(1-x)^{q-1}\,dx$ is convergent if there is $\alpha \in (0, 1)$ such that the previous limit is finite. We are looking for $\alpha \in (0, 1)$ such that $\alpha+p-1 \geq 0$. So, we need to have $1-p \leq \alpha < 1$, which is possible if $p > 0$. Therefore for $p > 0$ the improper integral $\int_0^{1/2} x^{p-1}(1-x)^{q-1}\,dx$ is convergent.

The improper integral $\int_0^{1/2} x^{p-1}(1-x)^{q-1}\,dx$ is divergent if there is $\alpha \geq 1$ such that the previous limit is positive. We are looking for $\alpha \geq 1$ such that $\alpha+p-1 \leq 0$. So, we need to have $1 \leq \alpha \leq 1-p$, which is possible if $p \leq 0$. Therefore for $p \leq 0$ the improper integral $\int_0^{1/2} x^{p-1}(1-x)^{q-1}\,dx$ is divergent.

Similar arguments, based on Theorem 4, give us the following results: for $q > 0$ the improper integral $\int_{1/2}^1 x^{p-1}(1-x)^{q-1}\,dx$ is convergent and for $q \leq 0$ the improper integral $\int_{1/2}^1 x^{p-1}(1-x)^{q-1}\,dx$ is divergent, as desired.

Since for $p > 0$ and $q > 0$ Euler's first integral is convergent, we can define the following function, which is called the Beta function:
$$B : (0, \infty)\times(0, \infty) \to \mathbb{R},\qquad B(p, q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx \tag{1}$$
Theorem 2. (Properties of the Beta function)

B1) $B(p, 1) = \frac{1}{p}$, $B(1, 1) = 1$, for each $p > 0$.

B2) $B\left(\frac{1}{2}, \frac{1}{2}\right) = \pi$.

B3) $B(p, q) = B(q, p)$, for each $p > 0$ and $q > 0$.

B4) $B(p, q) = \frac{p-1}{p+q-1}B(p-1, q)$, for each $p > 1$ and $q > 0$.

B5) $B(p, q) = \frac{q-1}{p+q-1}B(p, q-1)$, for each $p > 0$ and $q > 1$.

B6) $B(m, n) = \frac{(m-1)!(n-1)!}{(m+n-1)!}$, for each $m, n \in \mathbb{N}^*$.

B7) $B(p, 1-p) = \frac{\pi}{\sin p\pi}$, for each $0 < p < 1$.
Proofs. (for statements from 1 to 6)

B1) $B(p, 1) = \int_0^1 x^{p-1}(1-x)^{1-1}\,dx = \int_0^1 x^{p-1}\,dx = \frac{x^p}{p}\Big|_0^1 = \frac{1}{p}$.

If we let $p = 1$ in the previous equality we get $B(1, 1) = 1$.

B2)
$$B\left(\frac{1}{2}, \frac{1}{2}\right) = \int_0^1 x^{-\frac{1}{2}}(1-x)^{-\frac{1}{2}}\,dx = \int_0^1 \frac{1}{\sqrt{x-x^2}}\,dx = \int_0^1 \frac{1}{\sqrt{\frac{1}{4}-\left(x-\frac{1}{2}\right)^2}}\,dx = \arcsin\frac{x-\frac{1}{2}}{\frac{1}{2}}\Big|_0^1 = \arcsin 1 - \arcsin(-1) = 2\arcsin 1 = 2\cdot\frac{\pi}{2} = \pi.$$

B3) $B(p, q) = \int_0^1 x^{p-1}(1-x)^{q-1}\,dx$.

Let $t = 1-x$, so that $x = 1-t$ and $dx = -dt$. When $x = 1$, $t = 0$ and when $x = 0$, $t = 1$. Making the indicated substitution we find:
$$B(p, q) = \int_1^0 (1-t)^{p-1}t^{q-1}(-dt) = \int_0^1 (1-t)^{p-1}t^{q-1}\,dt = B(q, p).$$

B4) Let $p > 1$ and $q > 0$. By using integration by parts we obtain:
$$B(p, q) = \int_0^1 x^{p-1}\left(-\frac{(1-x)^q}{q}\right)'\,dx = -\frac{1}{q}x^{p-1}(1-x)^q\Big|_0^1 + \frac{1}{q}\int_0^1 (p-1)x^{p-2}(1-x)^q\,dx$$
$$= \frac{p-1}{q}\int_0^1 x^{p-2}(1-x)^{q-1}(1-x)\,dx = \frac{p-1}{q}\int_0^1 x^{p-2}(1-x)^{q-1}\,dx - \frac{p-1}{q}\int_0^1 x^{p-1}(1-x)^{q-1}\,dx$$
$$= \frac{p-1}{q}B(p-1, q) - \frac{p-1}{q}B(p, q).$$
From the previous equality we have:
$$B(p, q)\left(1+\frac{p-1}{q}\right) = \frac{p-1}{q}B(p-1, q),$$
from which we finally obtain
$$B(p, q) = \frac{p-1}{p+q-1}B(p-1, q),$$
as desired.

B5) Let $q > 1$ and $p > 0$. By using successively properties B3 and B4 we obtain:
$$B(p, q) \overset{B3)}{=} B(q, p) \overset{B4)}{=} \frac{q-1}{p+q-1}B(q-1, p) \overset{B3)}{=} \frac{q-1}{p+q-1}B(p, q-1).$$

B6) The desired equality can be obtained by applying successively properties B4, B5 and B1 as follows:
$$B(m, n) \overset{B4)}{=} \frac{m-1}{m+n-1}B(m-1, n) \overset{B4)}{=} \frac{m-1}{m+n-1}\cdot\frac{m-2}{m+n-2}B(m-2, n) \overset{B4)}{=} \dots \overset{B4)}{=} \frac{m-1}{m+n-1}\cdot\frac{m-2}{m+n-2}\cdots\frac{1}{n+1}B(1, n)$$
$$\overset{B5)}{=} \frac{(m-1)!}{(m+n-1)\cdots(n+1)}\cdot\frac{n-1}{n}B(1, n-1) \overset{B5)}{=} \frac{(m-1)!}{(m+n-1)\cdots(n+1)}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n-1}B(1, n-2) \overset{B5)}{=} \dots$$
$$\overset{B5)}{=} \frac{(m-1)!}{(m+n-1)\cdots(n+1)}\cdot\frac{n-1}{n}\cdot\frac{n-2}{n-1}\cdots\frac{1}{2}B(1, 1) \overset{B1)}{=} \frac{(m-1)!(n-1)!}{(m+n-1)!}.$$

B7) The proof of this statement is beyond the scope of this text.
Example 1. By using the properties of the Beta function compute the following values:

a) $B(11, 9)$; b) $B\left(\frac{5}{2}, \frac{1}{2}\right)$; c) $B\left(\frac{7}{4}, \frac{1}{4}\right)$.

Solution. a) $B(11, 9) \overset{B6)}{=} \dfrac{(11-1)!(9-1)!}{(11+9-1)!} = \dfrac{10!\,8!}{19!}$

b)
$$B\left(\frac{5}{2}, \frac{1}{2}\right) \overset{B4)}{=} \frac{\frac{5}{2}-1}{\frac{5}{2}+\frac{1}{2}-1}B\left(\frac{3}{2}, \frac{1}{2}\right) = \frac{3}{4}B\left(\frac{3}{2}, \frac{1}{2}\right) \overset{B4)}{=} \frac{3}{4}\cdot\frac{\frac{3}{2}-1}{\frac{3}{2}+\frac{1}{2}-1}B\left(\frac{1}{2}, \frac{1}{2}\right) = \frac{3}{8}B\left(\frac{1}{2}, \frac{1}{2}\right) \overset{B2)}{=} \frac{3\pi}{8}$$

c)
$$B\left(\frac{7}{4}, \frac{1}{4}\right) \overset{B4)}{=} \frac{\frac{7}{4}-1}{\frac{7}{4}+\frac{1}{4}-1}B\left(\frac{3}{4}, \frac{1}{4}\right) = \frac{3}{4}B\left(\frac{3}{4}, \frac{1}{4}\right) \overset{B3)}{=} \frac{3}{4}B\left(\frac{1}{4}, \frac{3}{4}\right) \overset{B7)}{=} \frac{3}{4}\cdot\frac{\pi}{\sin\frac{\pi}{4}} = \frac{3}{4}\cdot\frac{\pi}{\frac{\sqrt{2}}{2}} = \frac{3\pi}{2\sqrt{2}}$$
Example 2. By using the properties of the Beta function compute the following integrals:

a) $\int_0^1 x^{10}(1-x)^8\,dx$; b) $\int_0^1 x\sqrt{\dfrac{x}{1-x}}\,dx$; c) $\int_0^1 \sqrt[4]{\left(\dfrac{1-x}{x}\right)^3}\,dx$.

Solution. a)
$$\int_0^1 x^{10}(1-x)^8\,dx = \int_0^1 x^{11-1}(1-x)^{9-1}\,dx = B(11, 9) = \frac{10!\,8!}{19!}$$
(see Example 1, part a)).

b)
$$\int_0^1 x\sqrt{\frac{x}{1-x}}\,dx = \int_0^1 x\cdot x^{\frac{1}{2}}(1-x)^{-\frac{1}{2}}\,dx = \int_0^1 x^{\frac{3}{2}}(1-x)^{-\frac{1}{2}}\,dx = \int_0^1 x^{\frac{5}{2}-1}(1-x)^{\frac{1}{2}-1}\,dx = B\left(\frac{5}{2}, \frac{1}{2}\right) = \frac{3\pi}{8}$$
(see Example 1, part b)).

c)
$$\int_0^1 \sqrt[4]{\left(\frac{1-x}{x}\right)^3}\,dx = \int_0^1 x^{-\frac{3}{4}}(1-x)^{\frac{3}{4}}\,dx = \int_0^1 x^{\frac{1}{4}-1}(1-x)^{\frac{7}{4}-1}\,dx = B\left(\frac{1}{4}, \frac{7}{4}\right) \overset{B3)}{=} B\left(\frac{7}{4}, \frac{1}{4}\right) = \frac{3\pi}{2\sqrt{2}}$$
(see Example 1, part c)).
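The Beta values computed in Examples 1 and 2 can be cross-checked against library implementations; a hedged sketch assuming SciPy's `scipy.special.beta`:

```python
from math import factorial, pi, sqrt
from scipy.special import beta

print(beta(11, 9), factorial(10)*factorial(8)/factorial(19))  # B6), part a)
print(beta(2.5, 0.5), 3*pi/8)                                 # part b)
print(beta(1.75, 0.25), 3*pi/(2*sqrt(2)))                     # part c)
```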
Example 3. By using the properties of the Beta function compute the following integrals:

a) $\int_0^{\pi/2}\sin^4 x\cos^2 x\,dx$; b) $\int_0^\infty \dfrac{\sqrt[3]{x}}{1+x^2}\,dx$.

Solution. a) We make the following change of variable: $\sin^2 x = t$. Since $\sin x = \sqrt{t}$ we get $x = \arcsin\sqrt{t}$ and hence
$$dx = \frac{1}{\sqrt{1-t}}\cdot\frac{1}{2\sqrt{t}}\,dt.$$
We have to change the limits of integration. If $x = 0$ then $t = 0$ and if $x = \frac{\pi}{2}$ then $t = 1$.
$$\int_0^{\pi/2}\sin^4 x\cos^2 x\,dx = \int_0^1 t^2(1-t)\frac{1}{\sqrt{1-t}\cdot 2\sqrt{t}}\,dt = \frac{1}{2}\int_0^1 t^{\frac{3}{2}}(1-t)^{\frac{1}{2}}\,dt = \frac{1}{2}\int_0^1 t^{\frac{5}{2}-1}(1-t)^{\frac{3}{2}-1}\,dt$$
$$= \frac{1}{2}B\left(\frac{5}{2}, \frac{3}{2}\right) \overset{B5)}{=} \frac{1}{2}\cdot\frac{\frac{3}{2}-1}{\frac{5}{2}+\frac{3}{2}-1}B\left(\frac{5}{2}, \frac{1}{2}\right) = \frac{1}{12}B\left(\frac{5}{2}, \frac{1}{2}\right) = \frac{1}{12}\cdot\frac{3\pi}{8} = \frac{\pi}{32}$$
(see Example 1, part b)).

b) We make the following change of variable:
$$x^2 = \frac{t}{1-t}.$$
Since $x = \sqrt{\dfrac{t}{1-t}}$ we get
$$dx = \frac{1}{2\sqrt{\frac{t}{1-t}}}\left(\frac{t}{1-t}\right)'\,dt,$$
that is,
$$dx = \frac{1}{2}\sqrt{\frac{1-t}{t}}\cdot\frac{1}{(1-t)^2}\,dt.$$
We have to change the limits of integration. If $x = 0$ then $t = 0$ and if $x = \infty$ then $t = 1$.
$$\int_0^\infty \frac{\sqrt[3]{x}}{1+x^2}\,dx = \int_0^1 \sqrt[6]{\frac{t}{1-t}}\,(1-t)\cdot\frac{1}{2}\sqrt{\frac{1-t}{t}}\cdot\frac{1}{(1-t)^2}\,dt = \frac{1}{2}\int_0^1 \sqrt[6]{\frac{1}{t^2(1-t)^4}}\,dt$$
$$= \frac{1}{2}\int_0^1 t^{\frac{2}{3}-1}(1-t)^{\frac{1}{3}-1}\,dt = \frac{1}{2}B\left(\frac{2}{3}, \frac{1}{3}\right) \overset{B3)}{=} \frac{1}{2}B\left(\frac{1}{3}, \frac{2}{3}\right) \overset{B7)}{=} \frac{1}{2}\cdot\frac{\pi}{\sin\frac{\pi}{3}} = \frac{1}{2}\cdot\frac{\pi}{\frac{\sqrt{3}}{2}} = \frac{\pi}{\sqrt{3}}$$
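Both closed forms agree with direct numerical quadrature; a sketch assuming SciPy and NumPy:

```python
from math import pi, sqrt
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: np.sin(x)**4 * np.cos(x)**2, 0, np.pi/2)
print(val, pi/32)                                  # part a)

val, _ = quad(lambda x: np.cbrt(x)/(1 + x**2), 0, np.inf)
print(val, pi/sqrt(3))                             # part b)
```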
Gamma function

The integral $\int_0^\infty x^{p-1}e^{-x}\,dx$ is called Euler's second integral.

This integral is an improper one on an unbounded interval. If $p < 1$ then 0 is also a critical point for this integral.

In what concerns the convergence of Euler's second integral we have the following result:

Theorem 3. a) If $p > 0$ then Euler's second integral is convergent.

b) If $p \leq 0$ then Euler's second integral is divergent.

Proof. We split first the integral as
$$\int_0^\infty x^{p-1}e^{-x}\,dx = \int_0^1 x^{p-1}e^{-x}\,dx + \int_1^\infty x^{p-1}e^{-x}\,dx$$
and we study the convergence of both improper integrals in the right-hand side of the previous equality.

Similar arguments (based on Theorem 5, section 3.3.1) to those used in the proof of Theorem 1 give us the following results: for $p > 0$ the improper integral $\int_0^1 x^{p-1}e^{-x}\,dx$ is convergent and for $p \leq 0$ the improper integral $\int_0^1 x^{p-1}e^{-x}\,dx$ is divergent.

We use Theorem 2, section 3.3.1, to study the convergence of the improper integral $\int_1^\infty x^{p-1}e^{-x}\,dx$. We have
$$\lim_{x\to\infty} x^\alpha|x^{p-1}e^{-x}| = \lim_{x\to\infty}\frac{x^{\alpha+p-1}}{e^x} = 0 < \infty$$
for each $\alpha$, in particular for $\alpha > 1$ (the previous limit is 0 since the exponential function goes to infinity faster than any power function). Hence the considered integral is a convergent one, as desired.

Since for $p > 0$ Euler's second integral is convergent, we can define the following function, which is called the Gamma function:
$$\Gamma : (0, \infty) \to \mathbb{R},\qquad \Gamma(p) = \int_0^\infty x^{p-1}e^{-x}\,dx \tag{2}$$
The Gamma function is also known as the generalized factorial function. We will present next the basic properties of the Gamma function and its relation with $n!$.

Theorem 4. (Properties of the Gamma function)

1) $\Gamma(1) = 1$.

2) $\Gamma(p) = (p-1)\Gamma(p-1)$, for each $p > 1$.

3) $\Gamma(n) = (n-1)!$, for each $n \in \mathbb{N}^*$.

4) $B(p, q) = \dfrac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$, for each $p > 0$ and $q > 0$.

5) $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$.

Proof. 1)
$$\Gamma(1) = \int_0^\infty x^{1-1}e^{-x}\,dx = \int_0^\infty e^{-x}\,dx = -e^{-x}\Big|_0^\infty = 0+1 = 1.$$

2) Integration by parts gives us:
$$\Gamma(p) = \int_0^\infty x^{p-1}e^{-x}\,dx = \lim_{t\to\infty}\int_0^t x^{p-1}(-e^{-x})'\,dx = \lim_{t\to\infty}\left(-e^{-x}x^{p-1}\Big|_0^t + (p-1)\int_0^t x^{p-2}e^{-x}\,dx\right)$$
$$= -\lim_{t\to\infty}t^{p-1}e^{-t} + (p-1)\lim_{t\to\infty}\int_0^t x^{p-2}e^{-x}\,dx = 0+(p-1)\Gamma(p-1) = (p-1)\Gamma(p-1).$$

3) The desired equality can be obtained by applying successively properties 2) and 1) as follows:
$$\Gamma(n) \overset{2)}{=} (n-1)\Gamma(n-1) \overset{2)}{=} (n-1)(n-2)\Gamma(n-2) = \dots \overset{2)}{=} (n-1)(n-2)\cdots 1\cdot\Gamma(1) \overset{1)}{=} (n-1)\cdots 1 = (n-1)!$$

4) There will be no proof of this item.

5) We take $p = q = \frac{1}{2}$ in Euler's relation 4) to obtain
$$B\left(\frac{1}{2}, \frac{1}{2}\right) = \frac{\left[\Gamma\left(\frac{1}{2}\right)\right]^2}{\Gamma(1)},$$
from which
$$\left[\Gamma\left(\frac{1}{2}\right)\right]^2 = \pi.$$
Since $\Gamma\left(\frac{1}{2}\right) > 0$, from the last equality we get $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$. This completes the proof.
Example 4. Compute the following integrals:

a) $\int_0^1 \sqrt{x^5-x^6}\,dx$; b) $\int_0^\infty x^2e^{-\frac{x}{5}}\,dx$; c) $\int_2^\infty xe^{2-x}\,dx$; d) $\int_0^\infty xe^{2-x}\,dx$.

Solution. a)
$$\int_0^1 \sqrt{x^5-x^6}\,dx = \int_0^1 \sqrt{x^5(1-x)}\,dx = \int_0^1 x^{\frac{5}{2}}(1-x)^{\frac{1}{2}}\,dx = \int_0^1 x^{\frac{7}{2}-1}(1-x)^{\frac{3}{2}-1}\,dx = B\left(\frac{7}{2}, \frac{3}{2}\right) = \frac{\Gamma\left(\frac{7}{2}\right)\Gamma\left(\frac{3}{2}\right)}{\Gamma(5)}$$
It remains for us to compute $\Gamma(5)$, $\Gamma\left(\frac{3}{2}\right)$ and $\Gamma\left(\frac{7}{2}\right)$:
$$\Gamma(5) \overset{3)}{=} (5-1)! = 4! = 24$$
$$\Gamma\left(\frac{3}{2}\right) \overset{2)}{=} \left(\frac{3}{2}-1\right)\Gamma\left(\frac{3}{2}-1\right) = \frac{1}{2}\Gamma\left(\frac{1}{2}\right) \overset{5)}{=} \frac{\sqrt{\pi}}{2}$$
$$\Gamma\left(\frac{7}{2}\right) \overset{2)}{=} \left(\frac{7}{2}-1\right)\Gamma\left(\frac{7}{2}-1\right) = \frac{5}{2}\Gamma\left(\frac{5}{2}\right) \overset{2)}{=} \frac{5}{2}\left(\frac{5}{2}-1\right)\Gamma\left(\frac{5}{2}-1\right) = \frac{5}{2}\cdot\frac{3}{2}\Gamma\left(\frac{3}{2}\right) = \frac{15}{4}\cdot\frac{\sqrt{\pi}}{2} = \frac{15\sqrt{\pi}}{8}$$
In consequence
$$\int_0^1 \sqrt{x^5-x^6}\,dx = \frac{\Gamma\left(\frac{7}{2}\right)\Gamma\left(\frac{3}{2}\right)}{\Gamma(5)} = \frac{15\pi}{16}\cdot\frac{1}{24} = \frac{5\pi}{128}$$

b) In order to use the Gamma function we make the following substitutions:
$$\frac{x}{5} = t;\quad x = 5t;\quad dx = 5\,dt.$$
If $x = 0$ then $t = 0$ and if $x = \infty$ then $t = \infty$.
$$\int_0^\infty x^2e^{-\frac{x}{5}}\,dx = \int_0^\infty (5t)^2e^{-t}\cdot 5\,dt = 125\int_0^\infty t^2e^{-t}\,dt = 125\int_0^\infty t^{3-1}e^{-t}\,dt = 125\,\Gamma(3) = 125\cdot 2! = 250$$

c) In order to use the Gamma function we make the following substitutions:
$$t = x-2;\quad x = t+2;\quad dx = dt.$$
If $x = 2$ then $t = 0$ and if $x = \infty$ then $t = \infty$.
$$\int_2^\infty xe^{2-x}\,dx = \int_0^\infty (t+2)e^{-t}\,dt = \int_0^\infty te^{-t}\,dt + 2\int_0^\infty e^{-t}\,dt = \int_0^\infty t^{2-1}e^{-t}\,dt + 2\int_0^\infty t^{1-1}e^{-t}\,dt = \Gamma(2)+2\Gamma(1) = (2-1)!+2\cdot 1 = 3$$

d)
$$\int_0^\infty xe^{2-x}\,dx = \int_0^\infty e^2\,xe^{-x}\,dx = e^2\int_0^\infty xe^{-x}\,dx = e^2\int_0^\infty x^{2-1}e^{-x}\,dx = e^2\,\Gamma(2) = e^2(2-1)! = e^2.$$
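Library implementations of $\Gamma$ and $B$ make Theorem 4 and Example 4 a) easy to sanity-check; a hedged sketch assuming SciPy:

```python
from math import factorial, pi, sqrt
from scipy.special import beta, gamma

print(gamma(5), factorial(4))                           # property 3)
print(gamma(0.5), sqrt(pi))                             # property 5)
print(beta(3.5, 1.5), gamma(3.5)*gamma(1.5)/gamma(5))   # property 4)
print(gamma(3.5)*gamma(1.5)/gamma(5), 5*pi/128)         # Example 4 a)
```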
Euler-Poisson integral

The integral $\int_0^\infty e^{-x^2}\,dx$ is called the Euler-Poisson integral.

As we saw in Example 9, subsection 3.3.1, the previous integral is convergent. Next, by using the substitution
$$t = x^2,\quad x = \sqrt{t},\quad dx = \frac{1}{2\sqrt{t}}\,dt,$$
we will evaluate the Euler-Poisson integral. We observe that by the previous change of variable the limits of integration remain the same.
$$\int_0^\infty e^{-x^2}\,dx = \frac{1}{2}\int_0^\infty \frac{1}{\sqrt{t}}e^{-t}\,dt = \frac{1}{2}\int_0^\infty t^{-\frac{1}{2}}e^{-t}\,dt = \frac{1}{2}\int_0^\infty t^{\frac{1}{2}-1}e^{-t}\,dt = \frac{1}{2}\Gamma\left(\frac{1}{2}\right) = \frac{1}{2}\sqrt{\pi}$$
In conclusion
$$\int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$

Theorem 5. (Properties of the Euler-Poisson integral)

a) $\int_0^\infty e^{-x^2}\,dx = \dfrac{\sqrt{\pi}}{2}$;

b) $\int_{-\infty}^\infty e^{-x^2}\,dx = \sqrt{\pi}$;

c) $\int_{-\infty}^\infty e^{-\frac{x^2}{2}}\,dx = \sqrt{2\pi}$.

Proof. a) The equality was already proved.

b)
$$\int_{-\infty}^\infty e^{-x^2}\,dx = \int_{-\infty}^0 e^{-x^2}\,dx + \int_0^\infty e^{-x^2}\,dx.$$
We compute separately the first integral of the right side of the equality by making the following substitutions:
$$t = -x,\quad x = -t\quad\text{and}\quad dx = -dt.$$
If $x = -\infty$ then $t = \infty$ and if $x = 0$ then $t = 0$. Hence
$$\int_{-\infty}^0 e^{-x^2}\,dx = \int_\infty^0 e^{-t^2}(-dt) = \int_0^\infty e^{-t^2}\,dt = \int_0^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}$$
Finally, we get
$$\int_{-\infty}^\infty e^{-x^2}\,dx = \frac{\sqrt{\pi}}{2}+\frac{\sqrt{\pi}}{2} = \sqrt{\pi}.$$

c) By making the following change of variable
$$\frac{x}{\sqrt{2}} = t;\quad x = t\sqrt{2};\quad dx = \sqrt{2}\,dt,$$
we obtain
$$\int_{-\infty}^\infty e^{-\frac{x^2}{2}}\,dx = \int_{-\infty}^\infty e^{-(\sqrt{2}\,t)^2/2}\sqrt{2}\,dt = \sqrt{2}\int_{-\infty}^\infty e^{-t^2}\,dt = \sqrt{2}\cdot\sqrt{\pi} = \sqrt{2\pi}.$$
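Part c) of Theorem 5 can also be confirmed numerically; a brief sketch assuming SciPy:

```python
from math import pi, sqrt
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: np.exp(-x**2 / 2), -np.inf, np.inf)
print(val, sqrt(2*pi))   # both about 2.5066282...
```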
Chapter 4

Differential calculus of several variables

4.1 Real functions of several variables. Limits and continuity

4.1.1 Real functions of several variables

In many practical situations, the value of one quantity may depend on the values of two or more others. For example, the output of a factory depends on the amount of capital invested in the plant and on the size of the labor force. The demand for butter may depend on the price of butter and on the price of margarine. Relationships of this type can be represented mathematically by functions having more than one variable.

We shall restrict first our attention to functions of two variables.

Definition 1. A real function $f$ of two variables is a rule that assigns to each ordered pair of real numbers $(x, y)$ in a set $A$ a unique real number denoted by $f(x, y)$:
$$f : A \to \mathbb{R},\quad A \ni (x, y) \mapsto f(x, y) \in \mathbb{R}.$$
The set $A$ is the domain of $f$ (usually the largest set for which the rule of $f$ makes sense), the set $\mathbb{R}$ in which $f$ takes its values is called the target space, and its range is the set of values that $f$ takes on, that is, $\{f(x, y) \mid (x, y) \in A\}$.

We often write $z = f(x, y)$ to make explicit the value taken on by $f$ at the general point $(x, y)$. The variables $x$ and $y$ are independent variables and $z$ is the dependent variable.

A function of two variables is a function whose domain is a subset of $\mathbb{R}^2$ and whose range is a subset of $\mathbb{R}$. One way of visualizing such a function is by using an arrow diagram, where the domain is a subset of $\mathbb{R}^2$.
[Figure: arrow diagram mapping the points $(x, y)$ and $(a, b)$ of the domain in the $xy$-plane to the values $f(x, y)$ and $f(a, b)$ on the real line.]

If a function $f$ is given by a formula and no domain is specified, then the domain of $f$ is the largest set for which the rule of $f$ makes sense.

Example 1. For the following function find the domain and evaluate $f(3, 2)$:
$$f(x, y) = \frac{\sqrt{x+y+1}}{x-1}.$$
Solution. The given expression is well-defined if the denominator is not 0 and the quantity under the square root sign is nonnegative. In conclusion the domain of $f$ is
$$A = \{(x, y) \mid x+y+1 \geq 0,\ x \neq 1\}.$$
The inequality $x+y+1 \geq 0$, or $y \geq -x-1$, represents the points that lie on or above the line $y = -x-1$, while the condition $x \neq 1$ means that the points on the line $x = 1$ must be excluded from the domain.

[Figure: the half-plane on or above the line $y = -x-1$, with the vertical line $x = 1$ removed.]

$$f(3, 2) = \frac{\sqrt{3+2+1}}{3-1} = \frac{\sqrt{6}}{2}$$
Example 2. In 1928 Charles Cobb and Paul Douglas published a study in which they modeled the growth of the American economy during the period 1899-1922. They considered a simplified view of the economy in which production output is determined by the amount of labor involved and the amount of capital invested.

The function used to model production was
$$Q(L, K) = bL^\alpha K^{1-\alpha} \tag{1}$$
where $Q$ is the total production (the monetary value of all goods produced in a year), $L$ is the amount of labor (the total number of hours worked in a year) and $K$ is the amount of capital invested.

Cobb and Douglas used the method of least squares (see 4.7.1) to fit the data published by the government to the function
$$Q(L, K) = 1.01\,L^{0.75}K^{0.25}.$$
The production function (1) has been used in many settings, from individual firms to global economic questions. Its domain is $\{(L, K) \mid L \geq 0,\ K \geq 0\}$ because $L$ and $K$ represent labor and capital and they are never negative.

Example 3. Suppose that at a certain factory, output is given by the Cobb-Douglas production function
$$Q(K, L) = 60\,K^{1/3}L^{2/3},$$
where $K$ is the capital investment measured in units of 1000 Euros and $L$ is the size of the labor force measured in worker-hours.

a) Compute the output if the capital investment is 512,000 Euros and 1000 worker-hours are used.

b) Show that the output in part (a) will double if both the capital investment and the size of the labor force are doubled.

Solution. a) Evaluate $Q(K, L)$ with $K = 512$ and $L = 1000$:
$$Q(512, 1000) = 60\cdot(512)^{1/3}(1000)^{2/3} = 60\cdot 8\cdot 100 = 48000 \text{ (units)}$$
b)
$$Q(2\cdot 512, 2\cdot 1000) = 60\cdot(2\cdot 512)^{1/3}(2\cdot 1000)^{2/3} = 60\cdot 2^{1/3}\cdot 512^{1/3}\cdot 2^{2/3}\cdot 1000^{2/3} = 2^{1/3+2/3}\cdot 60\cdot 512^{1/3}\cdot 1000^{2/3} = 2\,Q(512, 1000) = 2\cdot 48000 = 96000.$$
Using a calculation similar to the one in part (b) of the previous example, it can be shown that if both capital and labor are multiplied by some positive number $m$, then output will also be multiplied by $m$. In economics, production functions with this property are said to have constant returns to scale. Indeed,
$$Q(mK, mL) = b(mK)^\alpha(mL)^{1-\alpha} = m^\alpha m^{1-\alpha}\,bK^\alpha L^{1-\alpha} = m\,Q(K, L).$$
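A small Python sketch of Example 3 and the constant-returns property (the function name `cobb_douglas` and the default parameters are illustrative, not part of the text):

```python
def cobb_douglas(K, L, b=60.0, alpha=1/3):
    # Q(K, L) = b * K**alpha * L**(1 - alpha), as in Example 3
    return b * K**alpha * L**(1 - alpha)

print(cobb_douglas(512, 1000))           # ~48000.0
print(cobb_douglas(2*512, 2*1000))       # ~96000.0: doubled inputs, doubled output
print(cobb_douglas(3*512, 3*1000) / cobb_douglas(512, 1000))  # ~3.0 (factor m = 3)
```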
Another way of visualizing the behaviour of a function of two variables is to consider its graph.

Definition 2. If $f$ is a real function of two variables with domain $A$, then the graph of $f$ is the set
$$G_f = \{(x, y, z) \in \mathbb{R}^3 \mid (x, y) \in A,\ z = f(x, y)\} \tag{2}$$
The graph of a function $f$ of two variables is a surface $S \subset \mathbb{R}^3$ with equation $z = f(x, y)$.

Example 4. Sketch the graph of the function
$$f(x, y) = 6-3x-2y,\quad A = \mathbb{R}^2.$$
Solution. The graph has the equation
$$z = 6-3x-2y \quad\text{or}\quad 3x+2y+z = 6,$$
which represents a plane. We determine first the intercepts. Putting $y = z = 0$ in the equation, we get $x = 2$ as the x-intercept. Similarly, the y-intercept is 3 and the z-intercept is 6.

[Figure: the plane $3x+2y+z = 6$ with intercepts (2,0,0), (0,3,0) and (0,0,6).]
The graph of a function of two variables is, in general, difficult to sketch, and we shall not develop a systematic procedure for sketching the graphs of such functions. However, computer programs are available for graphing functions of two variables.

Fortunately, there is another way to visualize a function from $\mathbb{R}^2$ to $\mathbb{R}$: the study of level curves of $f$.

Suppose $f$ is a function of two variables $x$ and $y$. If $c$ is some value in the range of the function $f$, then the equation $f(x, y) = c$ describes a curve lying on the plane $z = c$, called the trace of the graph of $f$ in the plane $z = c$. If this trace is projected onto the xy-plane, the resulting curve in the xy-plane is called a level curve. Actually, for any constant $c$, the points $(x, y)$ for which $f(x, y) = c$ form a curve in the xy-plane that is called a level curve of $f$.

Example 5. If $f(x, y) = x^2-y$, $f : \mathbb{R}^2 \to \mathbb{R}$, sketch the level curves $f(x, y) = 4$ and $f(x, y) = 9$.

Solution. The level curve $f(x, y) = 9$ consists of all points $(x, y)$ in the xy-plane for which
$$x^2-y = 9 \quad\text{or}\quad y = x^2-9.$$
The latter equality represents a quadratic function whose graph is sketched below.

[Figure: the parabolas $y = x^2-4$ and $y = x^2-9$, i.e. the level curves $f = 4$ and $f = 9$.]
Definition 3. Let $A \subset \mathbb{R}^n$. A function $f : A \to \mathbb{R}$ is called a real function of $n$ variables if and only if to each $x \in A$ there corresponds one and only one element $f(x) \in \mathbb{R}$:
$$A \ni x = (x_1, x_2, \dots, x_n) \mapsto f(x) \in \mathbb{R}.$$
The set $A$ is called the domain of $f$, $\mathbb{R}$ is called the target space, and the set $\{f(x) \mid x \in A\}$ is called the range of $f$ or the image of $f$.
4.1.2 Limits. Continuity

Global limit

In order to make things clear from the beginning we shall discuss first the case $n = 2$.

Intuitively, $f$ has the limit $l$ at a given point $(a, b)$ if the values $f(x, y)$ approach $l$ when $(x, y)$ approaches $(a, b)$. This limit can be written as
$$\lim_{(x,y)\to(a,b)} f(x, y) = l \quad\text{or}\quad \lim_{\substack{x\to a\\ y\to b}} f(x, y) = l.$$
In the last limit $x \to a$ and $y \to b$ at the same time and independently.

In the previous discussion we mentioned the word "approach" and we want to measure somehow the notion of approaching a given point. In one variable, $x \to a$ means that we can approach $a$ only from the left side and from the right side. In two or more variables the situation is more complicated. There is an infinite number of ways of approaching a point $(a, b)$. For instance, we can approach along vertical or horizontal lines, along every straight line which passes through $(a, b)$, or along every curve (as we can see in the figure below).

[Figure: several paths in the $xy$-plane approaching the point $(a, b)$.]

We will use the notion of distance between two points to measure how close one point is to another. We can say that $\lim_{(x,y)\to(a,b)} f(x, y) = l$ if the distance between $f(x, y)$ and $l$ can be made arbitrarily small by making the distance from $(x, y)$ to $(a, b)$ sufficiently small.

We are prepared now to present the rigorous definition of a limit of a function at a given point.

Definition 1. (the global limit) Let $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^2$ and $(a, b) \in A'$. We say that the limit of $f$ as $(x, y)$ approaches $(a, b)$ is $l$ and we write
$$\lim_{(x,y)\to(a,b)} f(x, y) = l$$
if and only if $\forall\varepsilon > 0$, $\exists\delta > 0$ such that if $(x, y) \in A$ and
$$0 < \sqrt{(x-a)^2+(y-b)^2} < \delta,$$
then $|f(x, y)-l| < \varepsilon$.

Other notations for the limit in Definition 1 are
$$f(x, y) \to l \text{ as } (x, y) \to (a, b) \quad\text{and}\quad \lim_{\substack{x\to a\\ y\to b}} f(x, y) = l$$
(here $x \to a$, $y \to b$ independently and at the same time).

[Figure: points $(x, y)$ near $(a, b)$ are mapped by $f$ into the interval $(l-\varepsilon, l+\varepsilon)$.]

From the previous discussion we obtain the following remark. If we can find two different paths of approach along which $f$ has different limits, then the global limit does not exist.
Remark 1. If $f(x, y) \to l_1$ as $(x, y) \to (a, b)$ along a path $C_1$ and $f(x, y) \to l_2$ as $(x, y) \to (a, b)$ along a path $C_2$, where $l_1 \neq l_2$, then $\lim_{(x,y)\to(a,b)} f(x, y)$ does not exist.

Example 1. Show that $\lim_{(x,y)\to(0,0)}\dfrac{x^2-3y^2}{x^2+y^2}$ does not exist.

Solution. First we will approach (0,0) along the x-axis. Then $y = 0$ and $f(x, 0) = \frac{x^2}{x^2} = 1$ for all $x \neq 0$, so $f(x, y) \to 1$ as $(x, y) \to (0, 0)$ along the x-axis.

We now approach along the y-axis by putting $x = 0$. Then
$$f(0, y) = \frac{-3y^2}{y^2} = -3 \text{ for all } y \neq 0,$$
so $f(x, y) \to -3$ as $(x, y) \to (0, 0)$ along the y-axis.

Since $f$ has two different limits along two different paths, the global limit does not exist.

Example 2. Study the existence of the following limit:
$$\lim_{(x,y)\to(0,0)}\frac{xy}{x^2+y^2}.$$
Solution. It is obvious that $A = \mathbb{R}^2\setminus\{(0, 0)\}$, so $(0, 0) \in A'$.

If $(x, y) \to (0, 0)$ along the x-axis, then $y = 0$ and
$$\lim_{\substack{x\to 0\\ y=0}} f(x, y) = \lim_{x\to 0} f(x, 0) = \lim_{x\to 0}\frac{x\cdot 0}{x^2+0} = 0.$$
In the same way
$$\lim_{\substack{x=0\\ y\to 0}} f(x, y) = 0.$$
Even if we have obtained identical limits along the axes, that does not assure us that the given limit is 0. We can go to (0,0) along another line, let's say $y = mx$ (that is the equation of a nonvertical line which passes through (0,0)):
$$\lim_{\substack{x\to 0\\ y=mx}} f(x, y) = \lim_{\substack{x\to 0\\ y=mx}}\frac{xy}{x^2+y^2} = \lim_{x\to 0}\frac{x\cdot mx}{x^2+m^2x^2} = \lim_{x\to 0}\frac{x^2m}{x^2(1+m^2)} = \frac{m}{1+m^2}.$$
Since we have obtained different limits along different paths, the global limit does not exist.
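The path dependence in Example 2 is easy to see numerically: along $y = mx$ the quotient stabilizes at $\frac{m}{1+m^2}$, so different slopes give different limiting values. A brief Python sketch:

```python
def f(x, y):
    return x*y / (x**2 + y**2)

# approach (0,0) along y = m*x for several slopes m:
for m in (0.0, 1.0, 2.0):
    x = 1e-8
    print(m, f(x, m*x), m/(1 + m**2))   # the last two columns agree
```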
Example 3. Does the limit $\lim_{(x,y)\to(0,0)}\dfrac{x^2y}{x^4+y^2}$ exist?

Solution. If $(x, y) \to (0, 0)$ along any nonvertical line which passes through the origin, $y = mx$, then
$$\lim_{\substack{x\to 0\\ y=mx}} f(x, y) = \lim_{x\to 0}\frac{x^2\cdot mx}{x^4+m^2x^2} = \lim_{x\to 0}\frac{xm}{x^2+m^2} = 0.$$
So $f(x, y) \to 0$ as $(x, y) \to (0, 0)$ along $y = mx$. Even if $f$ has the same limiting value along nonvertical lines, that does not show that the given limit is 0.

Indeed, if we now let $(x, y) \to (0, 0)$ along the parabola $y = mx^2$, we have
$$\lim_{\substack{x\to 0\\ y=mx^2}}\frac{x^2y}{x^4+y^2} = \lim_{x\to 0}\frac{x^2\cdot mx^2}{x^4+m^2x^4} = \lim_{x\to 0}\frac{mx^4}{x^4(1+m^2)} = \frac{m}{1+m^2}.$$
Since different paths lead to different limiting values, the given limit does not exist.

Example 4. Show that $\lim_{(x,y)\to(0,0)}\dfrac{x^3y}{x^6+y^2}$ does not exist.

Solution. If we let $y = mx$ or $y = mx^2$ we obtain that
$$\lim_{(x,y)\to(0,0)} f(x, y) = 0.$$
The limit does not exist since on $y = x^3$ we have
$$\lim_{\substack{x\to 0\\ y=x^3}}\frac{x^3y}{x^6+y^2} = \lim_{x\to 0}\frac{x^6}{x^6+x^6} = \frac{1}{2}.$$
In what concerns the limits that do exist, their computation can be greatly simplified by the use of properties of limits. As in the case of one variable we have that: the limit of a sum is the sum of the limits, the limit of a product is the product of the limits, etc. (see subsection 3.1.1). The squeeze theorem also holds (Theorem 2, subsection 3.1.1).

Example 5. Compute
$$\lim_{(x,y)\to(0,0)}\frac{xy}{\sqrt{xy+4}-2}\ \left[\frac{0}{0}\right] = \lim_{\substack{x\to 0\\ y\to 0}}\frac{xy(\sqrt{xy+4}+2)}{xy+4-4} = \lim_{\substack{x\to 0\\ y\to 0}}\frac{xy(\sqrt{xy+4}+2)}{xy} = \lim_{\substack{x\to 0\\ y\to 0}}(\sqrt{xy+4}+2) = 4.$$

Example 6. Compute $\lim_{\substack{x\to 0\\ y\to 0}}\dfrac{3x^2y}{x^2+y^2}$ if it exists.

Solution. Since $x^2 \leq x^2+y^2$ and $-3|y| \leq 3y \leq 3|y|$, then
$$-3|y| \leq \frac{3x^2y}{x^2+y^2} \leq 3|y|.$$
We know that $\pm 3|y| \to 0$ as $(x, y) \to (0, 0)$ and by using the Squeeze Theorem we obtain that
$$\lim_{\substack{x\to 0\\ y\to 0}}\frac{3x^2y}{x^2+y^2} = 0.$$
Another way of computing the considered limit is to use the following result.

Remark 2. If $f$ is a bounded function and
$$\lim_{(x,y)\to(a,b)} g(x, y) = 0,$$
then
$$\lim_{(x,y)\to(a,b)}[f(x, y)g(x, y)] = 0.$$
In our example
$$f(x, y) = \frac{x^2}{x^2+y^2} \quad\text{and}\quad g(x, y) = 3y.$$
It is obvious that
$$f(x, y) = \frac{x^2}{x^2+y^2} \leq 1$$
and $\lim_{\substack{x\to 0\\ y\to 0}} 3y = 0$, so the limit of the product will be 0.
We can now present the definition of the limit in the general case (the $n$-variable case).

Definition 2. Let $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^n$ and $a \in A'$. We say that the limit of $f$ as $x$ approaches $a$ is $l$ and we write
$$\lim_{x\to a} f(x) = l$$
iff $\forall\varepsilon > 0$, $\exists\delta > 0$ such that if $x \in A$ and $0 < \|x-a\| < \delta$ then $|f(x)-l| < \varepsilon$ (iff is an abbreviation for if and only if).

Other notations for the limit in the previous definition are $f(x) \to l$ as $x \to a$ and $\lim_{\substack{x_1\to a_1\\ \dots\\ x_n\to a_n}} f(x) = l$ (here $x_1 \to a_1, \dots, x_n \to a_n$ independently and at the same time).

The following theorem will permit us to compute limits.

Theorem 1. (basic limit theorem) Let $f, g, h : A \subset \mathbb{R}^n \to \mathbb{R}$ and $a \in A'$. Suppose $\lim_{x\to a} f(x) = l$ and $\lim_{x\to a} g(x) = m$. Then:

a) $\lim_{x\to a}[f(x)+g(x)] = l+m$;

b) $\lim_{x\to a}[cf(x)] = cl$, where $c$ is a real constant;

c) $\lim_{x\to a} f(x)g(x) = lm$;

d) $\lim_{x\to a}\dfrac{f(x)}{g(x)} = \dfrac{l}{m}$, provided $m \neq 0$;

e) Moreover, if $l = m$, $\lim_{x\to a} f(x) = \lim_{x\to a} g(x)$, and if $f(x) \leq h(x) \leq g(x)$ for $x$ near $a$, then $\lim_{x\to a} h(x) = l$ (the Squeeze theorem for functions of several variables).

In the previous theorem, part e), "$x$ near $a$" means $x$ in a ball centered at $a$.
Iterated limits for functions of two variables

Definition 3. Let $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^2$, $(a, b) \in A'$. The iterated limits $l_{12}$ and $l_{21}$ are defined as
$$l_{12} = \lim_{x\to a}\left(\lim_{y\to b} f(x, y)\right)$$
and
$$l_{21} = \lim_{y\to b}\left(\lim_{x\to a} f(x, y)\right),$$
provided that the previous limits exist.

In the case of iterated limits, $x \to a$ and $y \to b$ independently but not at the same time, as you can see in the figure below.

[Figure: the two iterated paths from $(x, y)$ to $(a, b)$: through $(a, y)$ for $l_{12}$ and through $(x, b)$ for $l_{21}$.]

Remark 3. (The connection between the iterated limits and the global limit)

a) If the iterated limits $l_{12}$, $l_{21}$ exist and $l_{12} \neq l_{21}$, then the global limit does not exist.

b) The existence and equality of the iterated limits does not assure that the global limit exists.

Next, we will illustrate by 2 examples the statements of the previous results.
Example 7. Study the existence of the iterated limits and of the global limit in the next cases:

a) $f : \mathbb{R}^2 \to \mathbb{R}$
$$f(x, y) = \begin{cases}\dfrac{x^2-3y^2}{x^2+y^2}, & (x, y) \neq (0, 0)\\ 0, & (x, y) = (0, 0)\end{cases}$$

b) $f : \mathbb{R}^2 \to \mathbb{R}$
$$f(x, y) = \begin{cases}\dfrac{xy}{x^2+y^2}, & (x, y) \neq (0, 0)\\ 0, & (x, y) = (0, 0)\end{cases}$$

Solution. a)
$$l_{12} = \lim_{x\to 0}\left(\lim_{y\to 0}\frac{x^2-3y^2}{x^2+y^2}\right) = \lim_{x\to 0}\frac{x^2}{x^2} = \lim_{x\to 0} 1 = 1$$
$$l_{21} = \lim_{y\to 0}\left(\lim_{x\to 0}\frac{x^2-3y^2}{x^2+y^2}\right) = \lim_{y\to 0}\frac{-3y^2}{y^2} = \lim_{y\to 0}(-3) = -3$$
Hence $l_{12} \neq l_{21}$. The global limit does not exist (see Example 1).

b) As we have already observed, the global limit does not exist (see Example 2).
$$l_{12} = \lim_{x\to 0}\left(\lim_{y\to 0}\frac{xy}{x^2+y^2}\right) = \lim_{x\to 0}\frac{0}{x^2} = \lim_{x\to 0} 0 = 0$$
$$l_{21} = \lim_{y\to 0}\left(\lim_{x\to 0}\frac{xy}{x^2+y^2}\right) = \lim_{y\to 0}\frac{0}{y^2} = \lim_{y\to 0} 0 = 0$$
So, in conclusion, even if $l_{12} = l_{21}$ the global limit does not exist.
Directional limits

Definition 4. (the limit in the direction of a given vector) Let $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^n$, $a \in A'$ and let $h \in \mathbb{R}^n$. The limit in the direction of the vector $h$ is defined by
$$l_h = \lim_{t\to 0} f(a+th).$$
In the case $n = 2$ we can geometrically present the way of approaching $a$ in this type of limit.

[Figure: the point $a+th$ approaching $a$ along the straight line through $a$ with direction $h$.]

As you can see, $a+th$ goes to $a$ on the straight line which passes through $a$ and has the same direction as the vector $h$.

Example 8. Compute $l_h$ in the case when
$$f : A \to \mathbb{R},\quad f(x, y) = \frac{x^2+y}{2x+y^2},\quad a = (1, 0),\quad h = (h_1, h_2) \neq \theta.$$
Solution. $A = \{(x, y) \in \mathbb{R}^2 \mid y^2 \neq -2x\}$.

The domain $A$ of $f$ is the set of points in $\mathbb{R}^2$ which do not lie on the parabola whose equation is $y^2 = -2x$. Hence $a = (1, 0) \in A'$.

$$a+th = (1, 0)+t(h_1, h_2) = (1, 0)+(th_1, th_2) = (1+th_1, th_2)$$
$$l_h = \lim_{t\to 0} f(a+th) = \lim_{t\to 0} f(1+th_1, th_2) = \lim_{t\to 0}\frac{(1+th_1)^2+th_2}{2(1+th_1)+t^2h_2^2} = \frac{1}{2}$$

Remark 4. (The connection between the global limit and the directional limits)

1) If the global limit exists and is equal to $l$, then any directional limit exists and is equal to $l$.

2) If there are two different vectors $h_1 \neq h_2$ ($h_1, h_2 \neq \theta$) such that $l_{h_1} \neq l_{h_2}$, then the global limit does not exist.

3) If every directional limit exists and is equal to the same real number, it is still not sure that the global limit exists.
Continuity

Definition 5. Let $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^n$ and $a \in A$. If $a \in A'$ we say that $f$ is continuous at $a$ if
$$\lim_{x\to a} f(x) = f(a).$$
If $a$ is an isolated point, then $f$ is continuous at $a$.

A function $f$ is said to be continuous on the set $A$ if $f$ is continuous at any point which belongs to $A$.

The intuitive meaning of continuity is that if the point $x$ (in $\mathbb{R}^n$) changes by a small amount, then the value of $f(x)$ changes by a small amount. This means that a surface that is the graph of a continuous function (of two variables) has no holes or breaks.

Using the properties of limits, it can be easily seen that sums, differences, products, quotients and compositions of continuous functions are continuous on their domain.

Theorem 2. (Basic continuity theorem) Let $f, g : A \subset \mathbb{R}^n \to \mathbb{R}$ and let $a \in A$. Suppose $f$ and $g$ are continuous at $a$. Then $f+g$, $fg$ and $cf$ ($c \in \mathbb{R}$, constant) are continuous at $a$, as is $\frac{f}{g}$ provided $g(a) \neq 0$.

Theorem 3. (Continuity composition theorem) Let $f : A \to \mathbb{R}$, $A \subset \mathbb{R}^n$, $h : B \to \mathbb{R}$ such that $f(A) \subset B \subset \mathbb{R}$, and let $a \in A$. Suppose $f$ is continuous at $a$ and that $h$ is continuous at $f(a)$. Then the function $h\circ f$,
$$h\circ f : A \to \mathbb{R},\quad (h\circ f)(x) = h(f(x)),$$
is continuous at $a$.

Example 9. Let $f : \mathbb{R}^2 \to \mathbb{R}$ be defined by
$$f(x, y) = \begin{cases}\dfrac{3x^2y}{x^2+y^2}, & (x, y) \neq (0, 0)\\ 0, & (x, y) = (0, 0)\end{cases}$$
Study the continuity of $f$.

Solution. We know that $f$ is continuous for $(x, y) \neq (0, 0)$ since it is equal to a rational function there. Also, from Example 6, we have
$$\lim_{(x,y)\to(0,0)} f(x, y) = 0 = f(0, 0).$$
Therefore $f$ is continuous at (0,0) and so it is continuous on $\mathbb{R}^2$.
Example 10. Study the continuity of the function $f : \mathbb{R}^2 \to \mathbb{R}$,
$$f(x, y) = \begin{cases}\dfrac{1-\cos(x^3+y^3)}{x^2+y^2}, & \text{if } (x, y) \neq (0, 0)\\ \alpha, & \text{if } (x, y) = (0, 0)\end{cases}$$
where $\alpha$ is a real parameter.

Solution. We know that $f$ is continuous for $(x, y) \neq (0, 0)$ since it is equal to a composition of elementary functions there.

By using the formula
$$1-\cos 2z = 2\sin^2 z$$
and the fact that
$$\lim_{z\to 0}\frac{\sin z}{z}\ \left[\frac{0}{0}\right] = \lim_{z\to 0}\frac{(\sin z)'}{z'} = \lim_{z\to 0}\cos z = 1,$$
we have
$$\lim_{(x,y)\to(0,0)} f(x, y) = \lim_{(x,y)\to(0,0)}\frac{1-\cos(x^3+y^3)}{x^2+y^2} = \lim_{(x,y)\to(0,0)}\frac{2\sin^2\left(\frac{x^3+y^3}{2}\right)}{x^2+y^2}$$
$$= 2\lim_{(x,y)\to(0,0)}\frac{\sin^2\left(\frac{x^3+y^3}{2}\right)}{\left(\frac{x^3+y^3}{2}\right)^2}\cdot\frac{\left(\frac{x^3+y^3}{2}\right)^2}{x^2+y^2} = \frac{1}{2}\left(\lim_{(x,y)\to(0,0)}\frac{\sin\frac{x^3+y^3}{2}}{\frac{x^3+y^3}{2}}\right)^2\lim_{(x,y)\to(0,0)}\frac{x^6+y^6+2x^3y^3}{x^2+y^2}$$
$$= \frac{1}{2}\lim_{(x,y)\to(0,0)}\left(\frac{(x^2+y^2)(x^4-x^2y^2+y^4)}{x^2+y^2}+\frac{2x^3y^3}{x^2+y^2}\right) = \frac{1}{2}\left(\lim_{(x,y)\to(0,0)}(x^4-x^2y^2+y^4)+2\lim_{(x,y)\to(0,0)}\frac{x^3y^3}{x^2+y^2}\right)$$
$$= \lim_{(x,y)\to(0,0)}\frac{x^3y^3}{x^2+y^2} = \lim_{(x,y)\to(0,0)}\frac{x^2}{x^2+y^2}\,xy^3 = 0.$$
The last limit is 0 by using Remark 2, since
$$0 \leq \frac{x^2}{x^2+y^2} \leq 1 \quad\text{and}\quad \lim_{(x,y)\to(0,0)} xy^3 = 0.$$
In conclusion, we get that if $\alpha = 0$ then $f$ is continuous on $\mathbb{R}^2$, and if $\alpha \neq 0$ then $f$ is continuous on $\mathbb{R}^2\setminus\{(0, 0)\}$.

We end this section by presenting (without proof) a result concerning continuous functions defined on compact sets. We will need this result later.

Theorem 4. (Weierstrass theorem. Extreme value theorem) Suppose $f : K \to \mathbb{R}$ is continuous and $K$ is a closed and bounded subset of $\mathbb{R}^n$ (hence a compact subset). Then $f$ has a maximum value and a minimum value on $K$.
4.2 Partial derivatives

An important goal in economic analysis is to understand how a change in one economic variable affects another. We have seen (subsection 3.1.2) that one-variable calculus helps us to understand such kinds of changes in the case of functions of one variable.

Since we are interested in the variation brought by the change in one variable, we will change one variable at a time, keeping all the other variables constant, and the corresponding derivative is called the partial derivative of $f$ with respect to the considered variable.

Remark 1. From now on all the domains of definition of the considered functions will be open and connected sets and will be denoted by $D$. Recall that an open connected set is a domain (see Appendix A). Hence any point in $D$ will be an accumulation point for $D$.

For a better understanding of these notions we will start with the case of functions of two variables.

Definition 1. (Partial differentiability at a point; two-variable case) Let $f : D \to \mathbb{R}$, $D \subset \mathbb{R}^2$ and let $(a, b) \in D$. We say that $f$ is partially differentiable with respect to the variable $x$ at the point $(a, b)$ if the following limit exists and has a finite value:
$$\frac{\partial f}{\partial x}(a, b) = f'_x(a, b) = \lim_{x\to a}\frac{f(x, b)-f(a, b)}{x-a} = \lim_{h\to 0}\frac{f(a+h, b)-f(a, b)}{h} \tag{1}$$
In the same way we can define
$$\frac{\partial f}{\partial y}(a, b) = f'_y(a, b) = \lim_{y\to b}\frac{f(a, y)-f(a, b)}{y-b} = \lim_{h\to 0}\frac{f(a, b+h)-f(a, b)}{h} \tag{2}$$
Example 1. Compute the partial derivatives of $f$ at the point (2,1), where $f : \mathbb{R}^2 \to \mathbb{R}$, $f(x, y) = x^2+y^2+xy$.

Solution.
$$\frac{\partial f}{\partial x}(2, 1) = \lim_{x\to 2}\frac{f(x, 1)-f(2, 1)}{x-2} = \lim_{x\to 2}\frac{x^2+1+x-(2^2+1^2+2\cdot 1)}{x-2} = \lim_{x\to 2}\frac{x^2+x-6}{x-2} = \lim_{x\to 2}\frac{(x-2)(x+3)}{x-2} = \lim_{x\to 2}(x+3) = 5$$
$$\frac{\partial f}{\partial y}(2, 1) = \lim_{y\to 1}\frac{f(2, y)-f(2, 1)}{y-1} = \lim_{y\to 1}\frac{4+y^2+2y-7}{y-1} = \lim_{y\to 1}\frac{y^2+2y-3}{y-1} = \lim_{y\to 1}\frac{(y-1)(y+3)}{y-1} = \lim_{y\to 1}(y+3) = 4$$
Definition 2. (Partial differentiability on a set; two-variable case) Let $f : D \to \mathbb{R}$, $D \subset \mathbb{R}^2$. We say that the function $f$ is partially differentiable with respect to $x$ on the domain $D$ if $f$ is partially differentiable with respect to $x$ at any point $(x, y)$ in $D$. In this case the following limit exists and has a finite value at each point $(x, y)$ in $D$:
$$\frac{\partial f}{\partial x}(x, y) = f'_x(x, y) = \lim_{h\to 0}\frac{f(x+h, y)-f(x, y)}{h}, \quad\text{for each } (x, y) \in D.$$
Similarly, we can define partial differentiability of $f$ with respect to $y$ on a set:
$$\frac{\partial f}{\partial y}(x, y) = f'_y(x, y) = \lim_{h\to 0}\frac{f(x, y+h)-f(x, y)}{h}, \quad\text{for each } (x, y) \in D.$$
In this case we can define the partial derivative functions of $f$ with respect to $x$, respectively to $y$:
$$\frac{\partial f}{\partial x} = f'_x : D \to \mathbb{R},\quad (x, y) \mapsto \frac{\partial f}{\partial x}(x, y) \tag{3}$$
$$\frac{\partial f}{\partial y} = f'_y : D \to \mathbb{R},\quad (x, y) \mapsto \frac{\partial f}{\partial y}(x, y) \tag{4}$$
Actually, to find the partial derivative function with respect to $x$ we think of $y$ as a constant and we differentiate in the usual way with respect to $x$. This gives another method of computing the partial derivatives at a given point: first we compute the partial derivative functions and then we evaluate them at the considered point.
Example 2. Another way of computing the partial derivatives in Example 1 is the following:
$$\frac{\partial f}{\partial x} : \mathbb{R}^2 \to \mathbb{R},\quad \frac{\partial f}{\partial x}(x, y) = (x^2+y^2+xy)'_x = (x^2)'_x+(y^2)'_x+(xy)'_x = 2x+0+y = 2x+y.$$
So,
$$\frac{\partial f}{\partial x}(2, 1) = 2\cdot 2+1 = 5.$$
$$\frac{\partial f}{\partial y}(x, y) = (x^2+y^2+xy)'_y = (x^2)'_y+(y^2)'_y+(xy)'_y = 2y+x,$$
and
$$\frac{\partial f}{\partial y}(2, 1) = 2\cdot 1+2 = 4,$$
as we expected.

As we can see in the above computations, in order not to make mistakes we have to state with respect to which variable we compute the partial derivative functions.
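Symbolic differentiation reproduces the hand computation of Examples 1 and 2; a brief sketch assuming SymPy:

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 + x*y
fx, fy = sp.diff(f, x), sp.diff(f, y)          # partial derivative functions
print(fx, fy)                                   # 2*x + y, x + 2*y
print(fx.subs({x: 2, y: 1}), fy.subs({x: 2, y: 1}))  # 5, 4
```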
Example 3. Compute the partial derivatives of each of the following functions:

a) $f : D \to \mathbb{R}$, $f(x, y) = \dfrac{xy}{x^2+y^2}$; b) $g : D \to \mathbb{R}$, $g(s, t) = (s^2-st+t^2)^5$; c) $f : D \to \mathbb{R}$, $f(x, y) = \sqrt[5]{xy}$.

Solution. a) $D = \{(x, y) \in \mathbb{R}^2 \mid x^2+y^2 \neq 0\} = \mathbb{R}^2\setminus\{(0, 0)\}$.

To compute $\frac{\partial f}{\partial x}$, think of the variable $y$ as a constant and apply the quotient rule (see subsection 3.1.2):
$$\frac{\partial f}{\partial x}(x, y) = \left(\frac{xy}{x^2+y^2}\right)'_x = \frac{(xy)'_x(x^2+y^2)-xy(x^2+y^2)'_x}{(x^2+y^2)^2} = \frac{y(x^2+y^2)-xy\cdot 2x}{(x^2+y^2)^2} = \frac{y(y^2-x^2)}{(x^2+y^2)^2}$$
To compute $\frac{\partial f}{\partial y}$, think of the variable $x$ as a constant and apply the quotient rule:
$$\frac{\partial f}{\partial y}(x, y) = \left(\frac{xy}{x^2+y^2}\right)'_y = \frac{(xy)'_y(x^2+y^2)-xy(x^2+y^2)'_y}{(x^2+y^2)^2} = \frac{x(x^2+y^2)-xy\cdot 2y}{(x^2+y^2)^2} = \frac{x(x^2-y^2)}{(x^2+y^2)^2}$$

b) $D = \mathbb{R}^2$.

To compute $\frac{\partial g}{\partial s}$, we treat the variable $t$ as if it were a constant:
$$\frac{\partial g}{\partial s}(s, t) = [(s^2-st+t^2)^5]'_s = 5(s^2-st+t^2)^4(s^2-st+t^2)'_s = 5(s^2-st+t^2)^4(2s-t)$$
In the same way we can obtain
$$\frac{\partial g}{\partial t}(s, t) = [(s^2-st+t^2)^5]'_t = 5(s^2-st+t^2)^4(s^2-st+t^2)'_t = 5(s^2-st+t^2)^4(2t-s)$$

c) $D = \mathbb{R}^2$.
$$\frac{\partial f}{\partial x}(x, y) = (\sqrt[5]{xy})'_x = \sqrt[5]{y}\,(\sqrt[5]{x})'_x = \sqrt[5]{y}\,(x^{\frac{1}{5}})' = \sqrt[5]{y}\cdot\frac{1}{5}x^{\frac{1}{5}-1} = \frac{1}{5}\sqrt[5]{\frac{y}{x^4}}$$
The previous equality is true for $x \neq 0$. So, if $x = 0$, in order to obtain the partial derivative with respect to $x$ at the points of the form $(0, y)$, we cannot simply differentiate and substitute $(x, y) = (0, y)$. In fact, we have to use Definition 1:
$$\frac{\partial f}{\partial x}(0, y) = \lim_{x\to 0}\frac{f(x, y)-f(0, y)}{x-0} = \lim_{x\to 0}\frac{\sqrt[5]{xy}}{x} = \lim_{x\to 0}\sqrt[5]{\frac{y}{x^4}} = \begin{cases}+\infty, & \text{if } y > 0\\ -\infty, & \text{if } y < 0\end{cases}$$
for $x = 0$ and $y \neq 0$.

If $x = 0$, $y = 0$ we have
$$\frac{\partial f}{\partial x}(0, 0) = \lim_{x\to 0}\frac{f(x, 0)-f(0, 0)}{x-0} = \lim_{x\to 0}\frac{0-0}{x} = \lim_{x\to 0} 0 = 0.$$
$$\frac{\partial f}{\partial y}(x, y) = (\sqrt[5]{xy})'_y = \sqrt[5]{x}\,(\sqrt[5]{y})'_y = \sqrt[5]{x}\cdot\frac{1}{5}\frac{1}{\sqrt[5]{y^4}} = \frac{1}{5}\sqrt[5]{\frac{x}{y^4}},$$
which is valid for $y \neq 0$ and $x \neq 0$.

For $y = 0$ and $x \neq 0$:
$$\frac{\partial f}{\partial y}(x, 0) = \lim_{y\to 0}\frac{f(x, y)-f(x, 0)}{y-0} = \lim_{y\to 0}\frac{\sqrt[5]{xy}}{y} = \lim_{y\to 0}\sqrt[5]{\frac{x}{y^4}} = \begin{cases}+\infty, & \text{if } x > 0\\ -\infty, & \text{if } x < 0\end{cases}$$
For $y = 0$ and $x = 0$:
$$\frac{\partial f}{\partial y}(0, 0) = \lim_{y\to 0}\frac{f(0, y)-f(0, 0)}{y-0} = \lim_{y\to 0}\frac{0-0}{y} = 0.$$
In conclusion,
$$\frac{\partial f}{\partial x}(x, y) = \begin{cases}0, & (x, y) = (0, 0)\\ \frac{1}{5}\sqrt[5]{\frac{y}{x^4}}, & x \neq 0\\ \text{does not exist}, & (x, y) = (0, y) \text{ with } y \neq 0\end{cases}$$
and
$$\frac{\partial f}{\partial y}(x, y) = \begin{cases}0, & (x, y) = (0, 0)\\ \frac{1}{5}\sqrt[5]{\frac{x}{y^4}}, & y \neq 0\\ \text{does not exist}, & (x, y) = (x, 0) \text{ with } x \neq 0\end{cases}$$
Example 4. Economic interpretation of the partial derivatives
Let Q = f(K, L) be a production function where Q represents the output, K
represents the capital input and L represents the labor input.
If the rm is using K
0
units of capital and L
0
units of labor to produce Q
0
units
of output then the partial derivative:
f
K
(K
0
, L
0
) = lim
KK
0
f(K, L
0
) f(K
0
, L
0
)
K K
0
= lim
K0
f(K
0
+ K, L
0
) f(K
0
, L
0
)
K
= lim
K0
Q
K

Q
K
is the rate at which output changes with respect to capital, keeping L xed at L
0
.
If capital increases by K, then the output will increase by
Q
f
K
(K
0
, L
0
) K.
205
If K = 1 then Q
f
K
(K
0
, L
0
) so
f
K
(K
0
, L
0
) represents approximately the
change in output due to a one unit increase in capital (keeping L xed).
f
K
(K
0
, L
0
) is called the marginal product of capital.
In the same way we can deduce that
f
L
(K
0
, L
0
) is the rate at which output
changes with respect to labor in the case in which capital is held xed at K
0
.
f
L
(K
0
, L
0
) represents approximately the change in output due to a one unit
increase in labor (keeping the capital xed at K
0
).
f
L
(K
0
, L
0
) is called the marginal product of labor.
As a particular case of the previous analysis we will consider the Cobb-Douglas production function
$$Q:[0,\infty)\times[0,\infty)\to\mathbb{R},\quad Q(K,L)=4K^{3/4}L^{1/4}.$$
If $K=10000$ and $L=625$ the output is
$$Q=4\cdot(10^4)^{3/4}\cdot(5^4)^{1/4}=20000.$$
The partial derivatives at $(10000,625)$ are
$$Q'_K=(4K^{\frac34}L^{\frac14})'_K=4L^{\frac14}\cdot\frac34 K^{\frac34-1}=3\left(\frac{L}{K}\right)^{\frac14}=3\sqrt[4]{\frac{L}{K}},$$
$$Q'_K(10000,625)=3\sqrt[4]{\frac{625}{10000}}=3\cdot\frac{5}{10}=\frac32,$$
$$Q'_L=4\cdot\frac14 K^{\frac34}L^{-\frac34}=\left(\frac{K}{L}\right)^{\frac34},$$
$$Q'_L(10000,625)=\left(\frac{10000}{625}\right)^{\frac34}=\left(\frac{10}{5}\right)^3=2^3=8.$$
By the previous discussion, if $L_0=625$ and $K$ is increased by $\Delta K$, then $Q$ will increase by approximately $\frac32\Delta K$.
If $\Delta K=10$ then
$$Q(10010,625)\approx 20000+\frac32\cdot 10=20015.$$
Evaluating,
$$Q(10010,625)=4\cdot(10010)^{\frac34}\cdot 625^{\frac14}=20014.99\ldots,$$
so the previous estimation is a very good one.
In the same way, if $K_0=10000$ and $L$ is decreased by $\Delta L$, then $Q$ will decrease by approximately $8\Delta L$.
That means that a 10 unit decrease in labor should induce an $8\cdot 10=80$ unit decrease in output:
$$Q(10000,615)\approx 20000-80=19920.$$
Evaluating,
$$Q(10000,615)=4\cdot(10000)^{\frac34}\cdot 615^{\frac14}\approx 19919.5,$$
so the previous estimation is a good one.
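Both marginal-product approximations can be reproduced numerically; the following Python lines are a sketch using only the numbers of this example:

    # Cobb-Douglas production function Q(K, L) = 4 K^(3/4) L^(1/4)
    Q = lambda K, L: 4 * K**0.75 * L**0.25

    Q0 = Q(10000, 625)   # 20000.0
    mpk = 1.5            # marginal product of capital at (10000, 625)
    mpl = 8.0            # marginal product of labor at (10000, 625)

    print(Q0 + mpk * 10, Q(10010, 625))   # 20015.0 vs. approx. 20014.99
    print(Q0 - mpl * 10, Q(10000, 615))   # 19920.0 vs. approx. 19919.5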
Example 5. In Example 1, subsection 4.1.1, we described the work of Cobb and Douglas in modelling the total production of an economic system. Here we use partial derivatives to show how their model can be obtained from certain assumptions about the economy.
If $Q=Q(K,L)$ then the partial derivatives $Q'_K$, $Q'_L$ are the rates at which production changes with respect to the amount of capital and labor.
The assumptions made by Cobb and Douglas are the following:
(a) If either capital or labor vanishes, then so will production.
(b) The marginal product of capital is proportional to the amount of production per unit of capital.
(c) The marginal product of labor is proportional to the amount of production per unit of labor.
The second assumption says that
$$\frac{\partial Q}{\partial K}=\alpha\,\frac{Q}{K}$$
for some constant $\alpha$, which is equivalent to
$$\frac{Q'_K}{Q}=\alpha\cdot\frac{1}{K}.$$
If we keep $L$ constant, by integrating this equality with respect to $K$ we get
$$\ln Q=\alpha\ln K+\ln C_1,$$
where $C_1$ is a function which depends on $L$. From the previous relation we get
$$Q(K,L)=C_1(L)\,K^{\alpha}\qquad(5)$$
Similarly, from the third assumption we get
$$Q(K,L)=C_2(K)\,L^{\beta}\qquad(6)$$
Combining (5) and (6) we have
$$Q(K,L)=bK^{\alpha}L^{\beta},$$
where $b$ is a constant that is independent of both $K$ and $L$.
If capital and labor are both increased by a factor $m$, then
$$Q(mK,mL)=b(mK)^{\alpha}(mL)^{\beta}=bm^{\alpha+\beta}K^{\alpha}L^{\beta}=m^{\alpha+\beta}Q(K,L).$$
If $\alpha+\beta=1$, then $Q(mK,mL)=mQ(K,L)$, which means that production is also increased by a factor $m$. That is why Cobb and Douglas assumed that $\alpha+\beta=1$ and therefore
$$Q(K,L)=bK^{\alpha}L^{1-\alpha}.$$
Remark 2. The existence of partial derivatives of a function $f$ at a given point does not imply the continuity of $f$ at that point, as we can see in the next example.
Example 6. Let $f:\mathbb{R}^2\to\mathbb{R}$,
$$f(x,y)=\begin{cases}\dfrac{xy}{x^2+y^2},&(x,y)\neq(0,0)\\ 0,&(x,y)=(0,0)\end{cases}$$
Show that $f$ is not continuous at $(0,0)$ even though the function $f$ admits partial derivatives at $(0,0)$.
Solution. Example 2 from subsection 4.1.2 shows that $\lim_{(x,y)\to(0,0)}f(x,y)$ does not exist, and in consequence $f$ is not continuous at the point $(0,0)$.
We will prove next that the partial derivatives of $f$ exist at $(0,0)$:
$$f'_x(0,0)=\lim_{x\to 0}\frac{f(x,0)-f(0,0)}{x}=\lim_{x\to 0}\frac{0-0}{x}=\lim_{x\to 0}0=0$$
$$f'_y(0,0)=\lim_{y\to 0}\frac{f(0,y)-f(0,0)}{y}=\lim_{y\to 0}\frac{0}{y}=\lim_{y\to 0}0=0.$$
Next, we present the definition of partial derivatives in the general case of a function of $n$ variables.
Definition 3. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, $a=(a_1,\ldots,a_n)\in D$ and let $i=\overline{1,n}$. The partial derivative of the function $f$ with respect to $x_i$ at $a$ is the following limit (if the limit exists and has a finite value):
$$\frac{\partial f}{\partial x_i}(a)=f'_{x_i}(a)=\lim_{x_i\to a_i}\frac{f(a_1,\ldots,a_{i-1},x_i,a_{i+1},\ldots,a_n)-f(a_1,\ldots,a_n)}{x_i-a_i}\qquad(7)$$
If the partial derivative of the function $f$ with respect to $x_i$ at $a$ exists for all points $a$ in $D$, we can define the partial derivative function with respect to $x_i$ in the following way:
$$\frac{\partial f}{\partial x_i}:D\to\mathbb{R},\quad x\mapsto\frac{\partial f}{\partial x_i}(x)\qquad(8)$$
Definition 4. A function $f:D\to\mathbb{R}$ is continuously differentiable (or $C^1$) on $D$ if the partial derivative functions $\frac{\partial f}{\partial x_i}$ exist for all $i=\overline{1,n}$ and are continuous on $D$.
Example 7. Compute the partial derivative functions of each of the following functions:
a) $f:\mathbb{R}^3\to\mathbb{R}$, $f(x,y,z)=x^2+\sin yz$
b) $f:D\to\mathbb{R}$, $D=\{x\in\mathbb{R}^n\mid x_1>0,\ x_2>0,\ \ldots,\ x_n>0\}$,
$$f(x_1,\ldots,x_n)=x_1^2+x_2^2+\cdots+x_{n-1}^2+\frac{x_1}{x_n}.$$
Solution. a)
$$\frac{\partial f}{\partial x}:\mathbb{R}^3\to\mathbb{R},\quad\frac{\partial f}{\partial x}(x,y,z)=(x^2+\sin yz)'_x=2x+0=2x$$
$$\frac{\partial f}{\partial y}:\mathbb{R}^3\to\mathbb{R},\quad\frac{\partial f}{\partial y}(x,y,z)=(x^2+\sin yz)'_y=0+z\cos yz=z\cos yz$$
$$\frac{\partial f}{\partial z}:\mathbb{R}^3\to\mathbb{R},\quad\frac{\partial f}{\partial z}(x,y,z)=(x^2+\sin yz)'_z=y\cos yz$$
b)
$$\frac{\partial f}{\partial x_1}:D\to\mathbb{R},\quad\frac{\partial f}{\partial x_1}(x)=\left(x_1^2+\cdots+x_{n-1}^2+\frac{x_1}{x_n}\right)'_{x_1}=2x_1+\frac{1}{x_n}$$
$$\frac{\partial f}{\partial x_2}:D\to\mathbb{R},\quad\frac{\partial f}{\partial x_2}(x)=\left(x_1^2+\cdots+x_{n-1}^2+\frac{x_1}{x_n}\right)'_{x_2}=2x_2$$
$$\ldots$$
$$\frac{\partial f}{\partial x_{n-1}}:D\to\mathbb{R},\quad\frac{\partial f}{\partial x_{n-1}}(x)=\left(x_1^2+\cdots+x_{n-1}^2+\frac{x_1}{x_n}\right)'_{x_{n-1}}=2x_{n-1}$$
$$\frac{\partial f}{\partial x_n}:D\to\mathbb{R},\quad\frac{\partial f}{\partial x_n}(x)=\left(x_1^2+\cdots+x_{n-1}^2+\frac{x_1}{x_n}\right)'_{x_n}=-\frac{x_1}{x_n^2}$$
We end this section by presenting a geometric interpretation of partial derivatives.
Suppose $f$ is a function of two variables $x$ and $y$. If $c$ is some value in the range of the function $f$, then the equation $f(x,y)=c$ describes a curve lying on the plane $z=c$, called the trace of the graph of $f$ in the plane $z=c$.
If this trace is projected onto the $xy$-plane, the resulting curve in the $xy$-plane is called a level curve.
Actually, for any constant $c$, the points $(x,y)$ for which $f(x,y)=c$ form a curve in the $xy$-plane that is called a level curve of $f$.
The slope of the line that is tangent to the level curve $f(x,y)=c$ at a particular point is given by the derivative $y'(x)$. This derivative is the rate of change of $y$ with respect to $x$ on the level curve and hence is approximately the amount by which the $y$ coordinate of a point on the level curve changes when the $x$ coordinate is increased by 1.
For example, if $f$ represents output and $x$ and $y$ represent the levels of skilled and unskilled labor, respectively, the slope $y'(x)$ of the tangent to the level curve $f(x,y)=c$ is an approximation to the amount by which the manufacturer should change the level of unskilled labor $y$ to compensate for a 1-unit increase in the level of skilled labor $x$ so that output will remain unchanged.
[Figure: a level curve $f(x,y)=c$ with its tangent line of slope $y'(x)$ and the actual change in $y$ on the level curve as $x$ increases to $x+1$.]
One way to compute $y'(x)$ is to solve the equation $f(x,y)=c$ for $y$ in terms of $x$, and then differentiate the resulting expression with respect to $x$. Sometimes it is difficult or even impossible to solve the equation $f(x,y)=c$ explicitly for $y$. In such cases, we can differentiate the equality $f(x,y)=c$ with respect to $x$, by considering $y$ to depend on $x$:
$$f'_x\cdot 1+f'_y\cdot y'(x)=0,$$
wherefrom we get the formula
$$y'(x)=-\frac{f'_x(x,y)}{f'_y(x,y)}\qquad(9)$$
Since
$$y'(x)=\lim_{\Delta x\to 0}\frac{\Delta y}{\Delta x}\approx\frac{\Delta y}{\Delta x},$$
then
$$\Delta y\approx y'(x)\,\Delta x=-\frac{f'_x}{f'_y}\,\Delta x.$$
In conclusion, the change in $y$ needed to compensate a small change $\Delta x$ in $x$ so that the value of the function $f(x,y)$ remains unchanged is
$$\Delta y\approx-\frac{f'_x}{f'_y}\,\Delta x.$$
Example 8. Indifference curves
Let $U(x,y)$ be a utility function which measures the total satisfaction (or utility) the consumer obtains from having $x$ units of the first commodity and $y$ units of the second. An indifference curve is a level curve $U(x,y)=C$ of the utility function.
An indifference curve gives all the combinations of $x$ and $y$ that lead to the same level of consumer satisfaction.
We next present a typical example involving the slope of an indifference curve.
Suppose $U(x,y)=x^{\frac32}y$. The consumer currently owns $x=16$ units of the first commodity and $y=20$ units of the second.
Use calculus to estimate how many units of the second commodity could substitute for 1 unit of the first commodity without changing the total utility.
Solution. The level of utility is
$$U(16,20)=16^{\frac32}\cdot 20=1280.$$
The corresponding indifference curve is sketched in the figure below. Since $1280=x^{\frac32}y$, then $y=1280\,x^{-\frac32}$.
[Figure: the indifference curve $y=1280x^{-3/2}$.]
We try to estimate the change $\Delta y$ required to compensate a change of $\Delta x=1$ so that the utility will remain at its current level, which is 1280. The approximation formula
$$\Delta y\approx-\frac{U'_x}{U'_y}\,\Delta x=-\frac{U'_x}{U'_y}=-\frac{\frac32 x^{\frac12}y}{x^{\frac32}}=-\frac32\cdot\frac{y}{x}$$
with $x=16$ and $y=20$ gives
$$\Delta y=-\frac32\cdot\frac{20}{16}=-\frac{15}{8}=-1.875\text{ units.}$$
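The estimate can also be checked directly, by moving along the indifference curve from x = 16 to x = 17 and recomputing y; the following Python lines are a sketch using the numbers of this example:

    # Indifference curve x^(3/2) * y = 1280, i.e. y = 1280 * x^(-3/2)
    y = lambda x: 1280 * x**-1.5

    dy_estimate = -1.5 * (20 / 16) * 1   # linear estimate: -1.875
    dy_actual = y(17) - y(16)            # approx. -1.74
    print(dy_estimate, dy_actual)

The two values differ slightly because the linear estimate is exact only for infinitesimal changes in x.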
4.3 Higher order partial derivatives
The partial derivative $\frac{\partial f}{\partial x_i}$ of a function $f$, $i\in\{1,\ldots,n\}$, is itself a function of $n$ variables, so we can continue analyzing the existence of partial derivatives of these partial derivatives, obtaining the second order partial derivatives.
Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, $a\in D$ and $i=\overline{1,n}$. We assume that the partial derivative function $\frac{\partial f}{\partial x_i}$ exists.
Definition 1. If the function $\frac{\partial f}{\partial x_i}$ is partially differentiable at the point $a$ with respect to $x_j$ ($j=\overline{1,n}$), we say that $f$ admits a second-order partial derivative at the point $a$ with respect to $x_i$ and $x_j$.
We have the following notations:
$$\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)(a)=\frac{\partial^2 f}{\partial x_i\partial x_j}(a)$$
or
$$(f'_{x_i})'_{x_j}(a)=f''_{x_ix_j}(a).$$
If $i=j$, then the second order partial derivative is written in the following way:
$$\frac{\partial}{\partial x_i}\left(\frac{\partial f}{\partial x_i}\right)(a)=\frac{\partial^2 f}{\partial x_i^2}(a)=f''_{x_i^2}(a)$$
If $i\neq j$, then
$$\frac{\partial^2 f}{\partial x_i\partial x_j}(a)=f''_{x_ix_j}(a)$$
is called the mixed partial derivative or cross partial derivative.
Continuing in the same way we can obtain higher order partial derivatives.
Example 1. Let $f:\mathbb{R}^2\to\mathbb{R}$, $f(x,y)=e^xy$. Compute $f'_x$, $f''_{x^2}$, $f'''_{x^2y}$.
Solution.
$$f'_x=(e^xy)'_x=y(e^x)'_x=ye^x$$
$$f''_{x^2}=(f'_x)'_x=(ye^x)'_x=ye^x$$
$$f'''_{x^2y}=(f''_{x^2})'_y=((f'_x)'_x)'_y=e^x$$
Remark 1. A function of $n$ variables can admit:
$n$ first order partial derivatives, $\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\ldots,\frac{\partial f}{\partial x_n}$;
$n^2$ second order partial derivatives (since for each first order partial derivative we have $n$ second order partial derivatives);
$n^3$ third order partial derivatives;
\ldots
$n^k$ $k$-th order partial derivatives.
It is natural to arrange the $n^2$ second order partial derivatives of $f$ at a given point $a$ into an $n\times n$ matrix whose $(i,j)$-th entry is $\frac{\partial^2 f}{\partial x_i\partial x_j}(a)$. This matrix is called the Hessian or the Hessian matrix of $f$ at $a$:
$$H(a)=\begin{pmatrix}\dfrac{\partial^2 f}{\partial x_1^2}(a)&\dfrac{\partial^2 f}{\partial x_1\partial x_2}(a)&\ldots&\dfrac{\partial^2 f}{\partial x_1\partial x_n}(a)\\[2mm] \dfrac{\partial^2 f}{\partial x_2\partial x_1}(a)&\dfrac{\partial^2 f}{\partial x_2^2}(a)&\ldots&\dfrac{\partial^2 f}{\partial x_2\partial x_n}(a)\\ \ldots&\ldots&\ldots&\ldots\\ \dfrac{\partial^2 f}{\partial x_n\partial x_1}(a)&\dfrac{\partial^2 f}{\partial x_n\partial x_2}(a)&\ldots&\dfrac{\partial^2 f}{\partial x_n^2}(a)\end{pmatrix}\qquad(1)$$
If we denote
$$a_{ij}=\frac{\partial^2 f}{\partial x_i\partial x_j}(a),$$
then a shorter written form of $H(a)$ is the following:
$$H(a)=\begin{pmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\ a_{21}&a_{22}&\ldots&a_{2n}\\ \ldots&\ldots&\ldots&\ldots\\ a_{n1}&a_{n2}&\ldots&a_{nn}\end{pmatrix}$$
Definition 2. a) If all the second partial derivatives of $f$ exist and are themselves continuous functions, we say that $f$ is twice continuously differentiable, or $C^2$.
b) A function $f$ is $C^3$ (or 3 times continuously differentiable) if all of its $n^3$ third order partial derivatives exist and are continuous.
In the same way we can define $C^k$ functions (for which all $n^k$ $k$-th order partial derivatives exist and are continuous).
Example 2. Let $f:\mathbb{R}^2\to\mathbb{R}$ be the function defined by
$$f(x,y)=x^3y^2-4y^2x.$$
Compute all the third partial derivative functions of $f$.
Solution. It is easier to arrange the calculation in the following tree, starting from $f$ and branching on the variable of differentiation:
$$f'_x=3x^2y^2-4y^2:\quad\begin{cases}f''_{x^2}=6xy^2\ \Rightarrow\ f'''_{x^3}=6y^2,\quad f'''_{x^2y}=12xy\\ f''_{xy}=6x^2y-8y\ \Rightarrow\ f'''_{xyx}=12xy,\quad f'''_{xy^2}=6x^2-8\end{cases}$$
$$f'_y=2x^3y-8yx:\quad\begin{cases}f''_{yx}=6x^2y-8y\ \Rightarrow\ f'''_{yx^2}=12xy,\quad f'''_{yxy}=6x^2-8\\ f''_{y^2}=2x^3-8x\ \Rightarrow\ f'''_{y^2x}=6x^2-8,\quad f'''_{y^3}=0\end{cases}$$
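The whole tree can be generated mechanically; the following sketch, assuming the SymPy library is available, lists every third order partial derivative of f and confirms the values above:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3*y**2 - 4*y**2*x

    # All third order partial derivatives, in every order of differentiation.
    for order in [(x, x, x), (x, x, y), (x, y, x), (x, y, y),
                  (y, x, x), (y, x, y), (y, y, x), (y, y, y)]:
        print(order, sp.diff(f, *order))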
From the previous example we observe that some of the mixed partial derivatives are equal, and the order of differentiation seems to be of no importance (at least in this case). This is not an accident; the equality is true for almost all functions which arise in practical applications. More precisely, we have the following result, which gives us information about the equality of the mixed second order partial derivatives.
Theorem 1. (Schwarz theorem, general case) Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, $a\in D$, $i,j=\overline{1,n}$, with $i\neq j$. If both $f''_{x_ix_j}$ and $f''_{x_jx_i}$ exist for all points in a ball centered at $a$ and they are continuous at $a$, then
$$f''_{x_ix_j}(a)=f''_{x_jx_i}(a).$$
For the sake of simplicity of the notations we will present the proof just for the case of two variables. The reasoning for the general case remains the same. In the case of two variables we have the following statement of Schwarz theorem.
Theorem 2. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^2$, $(a,b)\in D$. If both $f''_{xy}$ and $f''_{yx}$ exist for all points in a disc centered at $(a,b)$ and they are continuous at $(a,b)$, then
$$f''_{xy}(a,b)=f''_{yx}(a,b).$$
Proof. Let $r$ be the radius of the disc centered at $(a,b)$ and let $u,v$ be real numbers such that $|u|,|v|<\frac{r}{\sqrt2}$.
[Figure: the rectangle with corners $(a,b)$, $(a+u,b)$, $(a+u,b+v)$, $(a,b+v)$ inside the disc.]
In this case the rectangle whose corners are the points $(a,b)$, $(a+u,b)$, $(a+u,b+v)$ and $(a,b+v)$ is situated inside the disc centered at $(a,b)$ with radius $r$.
Applying the mean value theorem (Theorem 3, subsection 3.1.4) to the function $g:[a,a+u]\to\mathbb{R}$,
$$g(x)=f(x,b+v)-f(x,b),$$
we find there exists $a_0$ (which depends on $u$ and $v$) between $a$ and $a+u$ such that
$$g(a+u)-g(a)=g'(a_0)u,$$
which translates to
$$f(a+u,b+v)-f(a+u,b)-f(a,b+v)+f(a,b)=\left[\frac{\partial f}{\partial x}(a_0,b+v)-\frac{\partial f}{\partial x}(a_0,b)\right]u$$
Applying Lagrange's theorem to the function
$$h:[b,b+v]\to\mathbb{R},\quad h(y)=\frac{\partial f}{\partial x}(a_0,y),$$
there is some $b_0$ (which depends on $u$ and $v$) between $b$ and $b+v$ such that
$$h(b+v)-h(b)=h'(b_0)v.$$
Hence,
$$f(a+u,b+v)-f(a+u,b)-f(a,b+v)+f(a,b)=\frac{\partial^2 f}{\partial x\partial y}(a_0,b_0)uv$$
Interchanging $x$ and $y$ in the argument above, we can find a point $(a_1,b_1)$ such that
$$f(a+u,b+v)-f(a+u,b)-f(a,b+v)+f(a,b)=\frac{\partial^2 f}{\partial y\partial x}(a_1,b_1)uv$$
Thus
$$\frac{\partial^2 f}{\partial x\partial y}(a_0,b_0)=\frac{\partial^2 f}{\partial y\partial x}(a_1,b_1)$$
Letting $u,v\to 0$ we obtain that $(a_0,b_0)\to(a,b)$ and $(a_1,b_1)\to(a,b)$, so the claim follows from the continuity of the partial derivatives at $(a,b)$.
Fortunately, just about every function we will meet in applications will be $C^2$, and therefore its mixed partial derivatives will be equal.
It is important to note that Schwarz theorem implies that the Hessian matrix (1) is symmetric.
Remark 2. If $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, is a $C^2$ function on $D$ and $a\in D$, then the Hessian matrix $H(a)$ is a symmetric one.
Remark 3. The previous result (Schwarz theorem) is true for mixed higher order derivatives too (with the assumption of continuity of the partial derivatives). So, under these conditions, we can reverse the order of any two successive differentiations.
For example, if we take an $x_1$, $x_2$, $x_4$ derivative of order 3, then the order of differentiation does not matter for a $C^3$ function and we have
$$f'''_{x_1x_2x_4}=f'''_{x_1x_4x_2}=f'''_{x_2x_1x_4}=f'''_{x_2x_4x_1}=f'''_{x_4x_1x_2}=f'''_{x_4x_2x_1}$$
Example 3. Let $f:\mathbb{R}^2\to\mathbb{R}$ be defined by
$$f(x,y)=\begin{cases}0,&\text{if }(x,y)=(0,0)\\ \dfrac{x^3y-xy^3}{x^2+y^2},&\text{otherwise.}\end{cases}$$
Compute the mixed partial derivatives at $(0,0)$. Does this result contradict the conclusion of Schwarz theorem?
Solution. For each $(x,y)\neq(0,0)$, by applying the quotient rule of differentiation we easily get that
$$f'_x(x,y)=\frac{x^4y-y^5+4x^2y^3}{(x^2+y^2)^2}$$
and
$$f'_y(x,y)=\frac{x^5-4x^3y^2-xy^4}{(x^2+y^2)^2}$$
We will compute the mixed partial derivatives at $(0,0)$ by using the limit definition:
$$\frac{\partial^2 f}{\partial y\partial x}(0,0)=\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)(0,0)=\lim_{y\to 0}\frac{f'_x(0,y)-f'_x(0,0)}{y-0}\qquad(2)$$
By the previous computation we have that
$$f'_x(0,y)=\frac{-y^5}{y^4}=-y.$$
On the other hand, by using once more the limit definition we obtain
$$f'_x(0,0)=\lim_{x\to 0}\frac{f(x,0)-f(0,0)}{x-0}=\lim_{x\to 0}\frac{0-0}{x}=\lim_{x\to 0}\frac{0}{x}=0.$$
Substituting these two partial derivatives in (2) we get
$$\frac{\partial^2 f}{\partial y\partial x}(0,0)=\lim_{y\to 0}\frac{-y-0}{y}=-1,$$
so the final result will be
$$\frac{\partial^2 f}{\partial y\partial x}(0,0)=-1.$$
In the same way we obtain
$$\frac{\partial^2 f}{\partial x\partial y}(0,0)=\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)(0,0)=\lim_{x\to 0}\frac{f'_y(x,0)-f'_y(0,0)}{x}=\lim_{x\to 0}\frac{x-0}{x}=1,$$
so
$$\frac{\partial^2 f}{\partial x\partial y}(0,0)=1.$$
In conclusion we have
$$\frac{\partial^2 f}{\partial x\partial y}(0,0)\neq\frac{\partial^2 f}{\partial y\partial x}(0,0)$$
This result does not contradict Schwarz's theorem, since, as we will see below, the mixed partial derivative function is not continuous at $(0,0)$, so the hypotheses of Schwarz's theorem do not hold.
For each $(x,y)\neq(0,0)$ we have
$$\frac{\partial^2 f}{\partial y\partial x}(x,y)=\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)(x,y)=\left(\frac{x^4y-y^5+4x^2y^3}{(x^2+y^2)^2}\right)'_y=\frac{x^6-y^6-9x^2y^4+9x^4y^2}{(x^2+y^2)^3}$$
It remains to evaluate the limit of the previous function at $(0,0)$. We compute first the limit along the line $y=mx$, $m\in\mathbb{R}$:
$$\lim_{\substack{x\to 0\\ y=mx}}\frac{\partial^2 f}{\partial y\partial x}(x,y)=\lim_{x\to 0}\frac{x^6(1-m^6-9m^4+9m^2)}{x^6(1+m^2)^3}=\frac{1-m^6-9m^4+9m^2}{(1+m^2)^3}$$
Since the limit depends on $m$, we conclude that $\lim_{(x,y)\to(0,0)}\frac{\partial^2 f}{\partial y\partial x}(x,y)$ does not exist.
In consequence the function $\frac{\partial^2 f}{\partial y\partial x}$ is not continuous at $(0,0)$.
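The inequality of the two mixed partial derivatives at the origin can also be observed numerically, by nesting difference quotients; the following Python lines are a sketch, where h and k are two step sizes of different magnitudes:

    def f(x, y):
        return 0.0 if x == 0.0 and y == 0.0 else (x**3*y - x*y**3) / (x**2 + y**2)

    h, k = 1e-6, 1e-3   # inner and outer step sizes (k much larger than h)

    def fx(x, y):       # central difference quotient for f'_x
        return (f(x + h, y) - f(x - h, y)) / (2*h)

    def fy(x, y):       # central difference quotient for f'_y
        return (f(x, y + h) - f(x, y - h)) / (2*h)

    print((fx(0.0, k) - fx(0.0, 0.0)) / k)   # approx. -1: the y-then-x mixed derivative
    print((fy(k, 0.0) - fy(0.0, 0.0)) / k)   # approx. +1: the x-then-y mixed derivative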
4.4 Differentiability
4.4.1 Differentiability. The total differential
Because of the geometric representation, we will start the discussion in this section with the case of functions of two variables.
Suppose we are interested in the behaviour of a function $f$ in the neighborhood of a given point $(a,b)$.
We know from the calculus of one variable (see subsection 3.1.3) that:
if $y=b$ and $u$ is a real number sufficiently small, then
$$f(a+u,b)-f(a,b)\approx\frac{\partial f}{\partial x}(a,b)\cdot u;$$
if $x=a$ and $v$ is a real number sufficiently small, then
$$f(a,b+v)-f(a,b)\approx\frac{\partial f}{\partial y}(a,b)\cdot v.$$
It is natural to ask ourselves what happens if both coordinates change, $a$ into $a+u$ and $b$ into $b+v$. The expected effect is the sum of the effects of the one variable changes:
$$f(a+u,b+v)-f(a,b)\approx\frac{\partial f}{\partial x}(a,b)u+\frac{\partial f}{\partial y}(a,b)v\qquad(1)$$
For a geometric interpretation it is more convenient to write the previous formula in the form
$$f(a+u,b+v)\approx f(a,b)+\frac{\partial f}{\partial x}(a,b)u+\frac{\partial f}{\partial y}(a,b)v\qquad(2)$$
The right-hand side of (2) is exactly the equation of the tangent plane to the graph of $f$ at the point $((a,b),f(a,b))$, and therefore (2) is an analytical expression of the fact that the tangent plane is a good approximation to the graph (as we can see in the figure below).
We now have to justify rigorously the discussion above.
Definition 1.
a) Differentiability at a given point (two variables case)
Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^2$, $(a,b)\in D$.
We say that $f$ is differentiable at $(a,b)$ if there exist $\lambda_1,\lambda_2\in\mathbb{R}$ and $\omega:D\to\mathbb{R}$ with $\lim_{(x,y)\to(a,b)}\omega(x,y)=\omega(a,b)=0$, such that
$$f(x,y)-f(a,b)=\lambda_1(x-a)+\lambda_2(y-b)+\omega(x,y)\sqrt{(x-a)^2+(y-b)^2},\quad\text{for all }(x,y)\in D\qquad(3)$$
b) Differentiability on a set
We say that $f$ is differentiable on $D$ if $f$ is differentiable at each point in $D$.
Remark 1. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^2$, $(a,b)\in D$.
a) If $f$ is differentiable at $(a,b)$ then $f$ is continuous at $(a,b)$.
b) If $f$ is differentiable at $(a,b)$ then $f$ admits partial derivatives at $(a,b)$ with respect to $x$ and $y$, and
$$\frac{\partial f}{\partial x}(a,b)=\lambda_1\quad\text{and}\quad\frac{\partial f}{\partial y}(a,b)=\lambda_2.$$
c) The converse statements of part a) and part b) are not true.
Proof. a) If we let $(x,y)\to(a,b)$ in (3) we get that
$$\lim_{\substack{x\to a\\ y\to b}}[f(x,y)-f(a,b)]=\lim_{\substack{x\to a\\ y\to b}}\left[\lambda_1(x-a)+\lambda_2(y-b)+\omega(x,y)\sqrt{(x-a)^2+(y-b)^2}\right]=0.$$
In consequence
$$\lim_{(x,y)\to(a,b)}f(x,y)=f(a,b),$$
so $f$ is continuous at $(a,b)$.
b) We have to evaluate the limits
$$\lim_{x\to a}\frac{f(x,b)-f(a,b)}{x-a}\quad\text{and}\quad\lim_{y\to b}\frac{f(a,y)-f(a,b)}{y-b}.$$
If we take $y=b$ in (3), divide the obtained relation by $x-a$ and then take the limit in both parts of the equality, we obtain
$$\frac{\partial f}{\partial x}(a,b)=\lim_{x\to a}\frac{f(x,b)-f(a,b)}{x-a}=\lim_{x\to a}\left[\lambda_1+\omega(x,b)\frac{|x-a|}{x-a}\right]=\lambda_1+\lim_{x\to a}\omega(x,b)\frac{|x-a|}{x-a}=\lambda_1+0=\lambda_1.$$
The last limit is 0, being the limit of a product between a function with limit equal to 0 and a bounded function.
The fact that $\frac{\partial f}{\partial y}(a,b)=\lambda_2$ can be proved in a similar manner.
c) We will present here a function which is continuous at $(0,0)$ and admits partial derivatives at $(0,0)$, but still is not differentiable at $(0,0)$.
Let
$$f:\mathbb{R}^2\to\mathbb{R},\quad f(x,y)=\begin{cases}\dfrac{xy}{\sqrt{x^2+y^2}},&(x,y)\neq(0,0)\\ 0,&(x,y)=(0,0)\end{cases}$$
We study first the continuity of $f$ at $(0,0)$ by taking the following limit:
$$\lim_{\substack{x\to 0\\ y\to 0}}\frac{xy}{\sqrt{x^2+y^2}}=\lim_{\substack{x\to 0\\ y\to 0}}\frac{y}{\sqrt{x^2+y^2}}\,x=0=f(0,0)$$
The previous limit is 0 since
$$-1\le\frac{y}{\sqrt{x^2+y^2}}\le 1\quad\text{and}\quad\lim_{\substack{x\to 0\\ y\to 0}}x=0.$$
By using the limit definition we study the existence of the partial derivatives at $(0,0)$:
$$f'_x(0,0)=\lim_{x\to 0}\frac{f(x,0)-f(0,0)}{x}=\lim_{x\to 0}\frac{0}{x}=\lim_{x\to 0}0=0$$
$$f'_y(0,0)=\lim_{y\to 0}\frac{f(0,y)-f(0,0)}{y}=\lim_{y\to 0}\frac{0}{y}=\lim_{y\to 0}0=0$$
Assume, by contradiction, that $f$ is differentiable at $(0,0)$. Then there exists
$$\omega:\mathbb{R}^2\to\mathbb{R},\quad\lim_{\substack{x\to 0\\ y\to 0}}\omega(x,y)=\omega(0,0)=0,$$
such that
$$f(x,y)-f(0,0)=\frac{\partial f}{\partial x}(0,0)(x-0)+\frac{\partial f}{\partial y}(0,0)(y-0)+\omega(x,y)\sqrt{x^2+y^2},\quad(x,y)\in\mathbb{R}^2.$$
Since
$$\frac{\partial f}{\partial x}(0,0)=\frac{\partial f}{\partial y}(0,0)=0,$$
the previous equality reduces to
$$f(x,y)=\omega(x,y)\sqrt{x^2+y^2},$$
wherefrom we obtain:
$$\omega(x,y)=\begin{cases}\dfrac{xy}{x^2+y^2},&(x,y)\neq(0,0)\\ 0,&(x,y)=(0,0)\end{cases}$$
But $\lim_{(x,y)\to(0,0)}\omega(x,y)$ does not exist (see Example 2, subsection 4.1.2), which contradicts the assumption, so $f$ is not differentiable, as desired.
Remark 2. Let $f:D\to\mathbb{R}$, $(a,b)\in D$. If $f$ is differentiable at $(a,b)$ then there exists a function $\omega:D\to\mathbb{R}$, with
$$\lim_{\substack{x\to a\\ y\to b}}\omega(x,y)=\omega(a,b)=0,$$
such that
$$f(x,y)=f(a,b)+\frac{\partial f}{\partial x}(a,b)(x-a)+\frac{\partial f}{\partial y}(a,b)(y-b)+\omega(x,y)\sqrt{(x-a)^2+(y-b)^2}\qquad(4)$$
for all $(x,y)\in D$.
If we let $x-a=u$, $y-b=v$, with $u$ and $v$ sufficiently small, then the quantity $\omega(x,y)\sqrt{(x-a)^2+(y-b)^2}$ is small enough to justify the geometric approximation
$$f(x,y)\approx f(a,b)+\frac{\partial f}{\partial x}(a,b)(x-a)+\frac{\partial f}{\partial y}(a,b)(y-b),$$
which was discussed at the beginning of this section.
Since the definition of differentiability is quite difficult to check, we will present, without proof (which is beyond the scope of this text), sufficient conditions for differentiability.
Theorem 1. If $f$ has continuous partial derivatives on $D$ ($f$ is $C^1$), then $f$ is differentiable on $D$.
Definition 2. (the differential or the total differential, two variables case)
Let $f:D\to\mathbb{R}$, $(a,b)\in D$. If $f$ is differentiable at $(a,b)$, the differential (the total differential) of $f$ at $(a,b)$ is defined by
$$df_{(a,b)}:\mathbb{R}^2\to\mathbb{R},$$
$$df_{(a,b)}(h_1,h_2)=\frac{\partial f}{\partial x}(a,b)h_1+\frac{\partial f}{\partial y}(a,b)h_2\qquad(5)$$
For $h=(h_1,h_2)\in\mathbb{R}^2$ the number
$$df_{(a,b)}(h)=df_{(a,b)}(h_1,h_2)$$
is called the differential of $f$ at $(a,b)$ of increment $h=(h_1,h_2)\in\mathbb{R}^2$.
By using this new notion, relation (4) becomes:
$$f(x,y)=f(a,b)+df_{(a,b)}(x-a,y-b)+\omega(x,y)\sqrt{(x-a)^2+(y-b)^2}\qquad(6)$$
with the corresponding approximation
$$f(x,y)\approx f(a,b)+df_{(a,b)}(x-a,y-b)\qquad(7)$$
Example 1. Compute the total differential of the function
$$f:\mathbb{R}^2\to\mathbb{R},\quad f(x,y)=x^2+y^2+xy$$
at the point $(2,1)$.
Solution. The function $f$ is a $C^1$ function on $\mathbb{R}^2$, so, according to Theorem 1, $f$ is differentiable on $\mathbb{R}^2$, and in consequence the total differential of $f$ exists at any point in $\mathbb{R}^2$:
$$df_{(2,1)}:\mathbb{R}^2\to\mathbb{R},$$
$$df_{(2,1)}(h_1,h_2)=\frac{\partial f}{\partial x}(2,1)h_1+\frac{\partial f}{\partial y}(2,1)h_2=(2\cdot 2+1)h_1+(2\cdot 1+2)h_2=5h_1+4h_2$$
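The quality of the corresponding linear approximation (7) is easy to observe numerically; the following Python lines are a sketch based on this example:

    f = lambda x, y: x**2 + y**2 + x*y

    df = lambda h1, h2: 5*h1 + 4*h2   # the total differential of f at (2, 1)

    # Approximation (7): f(x, y) is close to f(2, 1) + df(x - 2, y - 1)
    print(f(2, 1) + df(0.1, 0.05))    # 7.7
    print(f(2.1, 1.05))               # 7.7175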
We will now present the definitions in the general case.
Definition 3. (differentiability, general case)
Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, $D$ a domain, and let $a\in D$. We say that $f$ is differentiable at $a$ if there exist $\lambda_1,\lambda_2,\ldots,\lambda_n\in\mathbb{R}$ and $\omega:D\to\mathbb{R}$,
$$\lim_{x\to a}\omega(x)=\omega(a)=0,$$
such that
$$f(x)-f(a)=\lambda_1(x_1-a_1)+\cdots+\lambda_n(x_n-a_n)+\omega(x)\|x-a\|,\quad x\in D.\qquad(8)$$
As in the two dimensional case, if $f$ is differentiable at $a$ then $f$ is continuous at $a$ and $f$ admits partial derivatives at $a$ with
$$\frac{\partial f}{\partial x_i}(a)=\lambda_i,\quad i=\overline{1,n}.$$
If $f$ is $C^1$ on $D$ then $f$ is differentiable at each point in $D$.
Definition 4. (the differential or the total differential, general case)
Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$ and $a\in D$. If $f$ is differentiable at $a$ then the differential (the total differential) of $f$ at $a$ is defined by
$$df_{(a)}:\mathbb{R}^n\to\mathbb{R},$$
$$df_{(a)}(h)=\frac{\partial f}{\partial x_1}(a)h_1+\frac{\partial f}{\partial x_2}(a)h_2+\cdots+\frac{\partial f}{\partial x_n}(a)h_n\qquad(9)$$
Sometimes, instead of $h_i$, $i=\overline{1,n}$, we use the notation $dx_i$, which is considered to be a small increment along the $x_i$ axis.
If $dx=(dx_1,dx_2,\ldots,dx_n)$ then
$$df_{(a)}(dx)=\frac{\partial f}{\partial x_1}(a)dx_1+\frac{\partial f}{\partial x_2}(a)dx_2+\cdots+\frac{\partial f}{\partial x_n}(a)dx_n.$$
By using (9), relation (8) can be written in the following form:
$$f(x)=f(a)+df_{(a)}(x-a)+\omega(x)\|x-a\|,\qquad(10)$$
for all $x\in D$.
Since $\omega(x)\|x-a\|\to 0$ as $x\to a$, for $\|x-a\|$ small enough we have the following approximation of (10):
$$f(x)\approx f(a)+df_{(a)}(x-a)\qquad(11)$$
The chain rule
Recall that the chain rule for functions of a single variable gives the rule for differentiating a composite function: if $f$ and $g$ are differentiable functions such that $f\circ g$ makes sense, then
$$(f\circ g)'(t)=f'(g(t))\cdot g'(t).$$
For functions of more than one variable, the chain rule has several versions, each of them giving a rule for differentiating a composite function.
The first version (Theorem 2) analyses the case where $z=f(x,y)$ and each of the variables $x$ and $y$ is a function of a variable $t$. This means that $z$ is indirectly a function of $t$,
$$z(t)=f(x(t),y(t)),$$
and the chain rule gives a formula for differentiating $z$ as a function of $t$. Assume $f$, $x$ and $y$ are differentiable functions.
Theorem 2. Suppose that $z=f(x,y)$ is a differentiable function of $x$ and $y$, where $x=x(t)$ and $y=y(t)$ are both differentiable functions of $t$. Then $z$ is a differentiable function of $t$ and
$$z'(t)=\frac{\partial f}{\partial x}(x(t),y(t))\,x'(t)+\frac{\partial f}{\partial y}(x(t),y(t))\,y'(t)\qquad(12)$$
For short, we can write down the previous formula in the following way:
$$z'(t)=\frac{\partial f}{\partial x}x'(t)+\frac{\partial f}{\partial y}y'(t)\qquad(13)$$
Proof. From Definition 1 and Remark 2 we have:
$$\Delta z=\frac{\partial f}{\partial x}\Delta x+\frac{\partial f}{\partial y}\Delta y+\omega\sqrt{(\Delta x)^2+(\Delta y)^2},$$
where $\omega\to 0$ as $(\Delta x,\Delta y)\to(0,0)$.
Dividing both sides of this equation by $\Delta t$, we have
$$\frac{\Delta z}{\Delta t}=\frac{\partial f}{\partial x}\cdot\frac{\Delta x}{\Delta t}+\frac{\partial f}{\partial y}\cdot\frac{\Delta y}{\Delta t}+\omega\sqrt{\left(\frac{\Delta x}{\Delta t}\right)^2+\left(\frac{\Delta y}{\Delta t}\right)^2}$$
If we now let $\Delta t\to 0$, then
$$\Delta x=x(t+\Delta t)-x(t)\to 0,$$
because $x$ is differentiable and therefore continuous. In the same way $\Delta y\to 0$. This implies that $\omega\to 0$, so
$$z'(t)=\lim_{\Delta t\to 0}\frac{\Delta z}{\Delta t}=\frac{\partial f}{\partial x}\lim_{\Delta t\to 0}\frac{\Delta x}{\Delta t}+\frac{\partial f}{\partial y}\lim_{\Delta t\to 0}\frac{\Delta y}{\Delta t}+\lim_{\Delta t\to 0}\omega\cdot\sqrt{\left(\lim_{\Delta t\to 0}\frac{\Delta x}{\Delta t}\right)^2+\left(\lim_{\Delta t\to 0}\frac{\Delta y}{\Delta t}\right)^2}$$
$$=\frac{\partial f}{\partial x}x'(t)+\frac{\partial f}{\partial y}y'(t)+0\cdot\sqrt{(x'(t))^2+(y'(t))^2}=\frac{\partial f}{\partial x}x'(t)+\frac{\partial f}{\partial y}y'(t)$$
Since we often write $\frac{\partial z}{\partial x}$ in place of $\frac{\partial f}{\partial x}$, we can rewrite the chain rule in the form
$$z'(t)=\frac{\partial z}{\partial x}x'(t)+\frac{\partial z}{\partial y}y'(t).\qquad(14)$$
If $z$ depends on more than two variables then
$$z'(t)=\frac{\partial z}{\partial x_1}x_1'(t)+\frac{\partial z}{\partial x_2}x_2'(t)+\cdots+\frac{\partial z}{\partial x_n}x_n'(t)=\sum_{i=1}^n\frac{\partial z}{\partial x_i}x_i'(t)\qquad(15)$$
As you may have observed, in order not to make the notations too complicated, we worked in a more formal way in formulating and proving the previous theorem.
Example 2. Use the chain rule to find $w'(t)$ if $w=\ln\sqrt{x^2+y^2+z^2}$, $x=\sin t$, $y=\cos t$, $z=\operatorname{tg}t$, at $t=\dfrac{\pi}{4}$.
Solution.
$$w'(t)=\frac{\partial w}{\partial x}x'(t)+\frac{\partial w}{\partial y}y'(t)+\frac{\partial w}{\partial z}z'(t)$$
$$=\left(\ln\sqrt{x^2+y^2+z^2}\right)'_x(\sin t)'+\left(\ln\sqrt{x^2+y^2+z^2}\right)'_y(\cos t)'+\left(\ln\sqrt{x^2+y^2+z^2}\right)'_z(\operatorname{tg}t)'$$
$$=\frac{(\sqrt{x^2+y^2+z^2})'_x}{\sqrt{x^2+y^2+z^2}}\cos t-\frac{(\sqrt{x^2+y^2+z^2})'_y}{\sqrt{x^2+y^2+z^2}}\sin t+\frac{(\sqrt{x^2+y^2+z^2})'_z}{\sqrt{x^2+y^2+z^2}}\cdot\frac{1}{\cos^2 t},$$
so
$$w'(t)=\frac{x}{x^2+y^2+z^2}\cos t-\frac{y}{x^2+y^2+z^2}\sin t+\frac{z}{x^2+y^2+z^2}\cdot\frac{1}{\cos^2 t}$$
It is not necessary to substitute the expressions for $x$, $y$ and $z$ in terms of $t$. We observe that when $t=\frac{\pi}{4}$ we have
$$x=\sin\frac{\pi}{4}=\frac{\sqrt2}{2},\quad y=\cos\frac{\pi}{4}=\frac{\sqrt2}{2},\quad z=\operatorname{tg}\frac{\pi}{4}=1,$$
so that $x^2+y^2+z^2=2$. Therefore
$$w'\left(\frac{\pi}{4}\right)=\frac{\frac{\sqrt2}{2}}{2}\cdot\frac{\sqrt2}{2}-\frac{\frac{\sqrt2}{2}}{2}\cdot\frac{\sqrt2}{2}+\frac{1}{2}\cdot\frac{1}{\left(\frac{\sqrt2}{2}\right)^2}=\frac14-\frac14+1=1.$$
Hence
$$w'\left(\frac{\pi}{4}\right)=1.$$
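The result can be verified by composing the functions and differentiating numerically; the following Python lines are a sketch using a symmetric difference quotient:

    import math

    def w(t):
        x, y, z = math.sin(t), math.cos(t), math.tan(t)
        return math.log(math.sqrt(x**2 + y**2 + z**2))

    t0, h = math.pi / 4, 1e-6
    print((w(t0 + h) - w(t0 - h)) / (2*h))   # approx. 1.0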
We now consider the situation where $z=f(x,y)$ but each of $x$ and $y$ is a function of two variables $s$ and $t$: $x=x(s,t)$, $y=y(s,t)$. Then $z$ is indirectly a function of $s$ and $t$, and we try to find $\frac{\partial z}{\partial s}$ and $\frac{\partial z}{\partial t}$.
Theorem 3. Suppose that $z=f(x,y)$ is a differentiable function of $x$ and $y$, where $x=x(s,t)$ and $y=y(s,t)$ are differentiable functions of $s$ and $t$. Then
$$\frac{\partial z}{\partial s}=\frac{\partial z}{\partial x}\cdot\frac{\partial x}{\partial s}+\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial s}\quad\text{and}\quad\frac{\partial z}{\partial t}=\frac{\partial z}{\partial x}\cdot\frac{\partial x}{\partial t}+\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial t}\qquad(16)$$
Proof. Recall that in computing $\frac{\partial z}{\partial s}$ we hold $t$ fixed and compute the derivative of $z$ with respect to $s$. Therefore we can apply Theorem 2 to obtain
$$\frac{\partial z}{\partial s}=\frac{\partial z}{\partial x}\cdot\frac{\partial x}{\partial s}+\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial s}.$$
A similar argument holds for $\frac{\partial z}{\partial t}$, and the proof is complete.
Example 3. Use the chain rule to find $\frac{\partial z}{\partial s}$ and $\frac{\partial z}{\partial t}$ if $z=e^{x+2y}$, $x=\dfrac{s}{t}$ and $y=\dfrac{t}{s}$.
Solution.
$$\frac{\partial z}{\partial s}=\frac{\partial z}{\partial x}\cdot\frac{\partial x}{\partial s}+\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial s}=(e^{x+2y})'_x\left(\frac{s}{t}\right)'_s+(e^{x+2y})'_y\left(\frac{t}{s}\right)'_s$$
$$=e^{x+2y}\cdot\frac1t+e^{x+2y}\cdot 2\cdot\left(-\frac{t}{s^2}\right)=e^{\frac{s}{t}+\frac{2t}{s}}\left(\frac1t-\frac{2t}{s^2}\right)=\frac{s^2-2t^2}{ts^2}\,e^{\frac{s^2+2t^2}{st}}$$
This case of the chain rule contains three types of variables: $s$ and $t$ are independent variables, $x$ and $y$ are called intermediate variables, and $z$ is the dependent variable.
A tree diagram (see the figure below) can help us remember the previous form of the chain rule. We draw branches from the dependent variable $z$ to the intermediate variables $x$ and $y$. Then we draw branches from $x$ and $y$ to the independent variables $s$ and $t$. On each branch we write the corresponding partial derivative.
[Figure: tree diagram with $z$ at the top, branches labelled $\frac{\partial z}{\partial x}$ and $\frac{\partial z}{\partial y}$ to $x$ and $y$, and branches labelled $\frac{\partial x}{\partial s}$, $\frac{\partial x}{\partial t}$, $\frac{\partial y}{\partial s}$, $\frac{\partial y}{\partial t}$ down to $s$ and $t$.]
To find $\frac{\partial z}{\partial s}$ we take the product of the partial derivatives along each path from $z$ to $s$ and then add these products:
$$\frac{\partial z}{\partial s}=\frac{\partial z}{\partial x}\cdot\frac{\partial x}{\partial s}+\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial s}$$
We consider now the general case in which the dependent variable $z$ is a function of $n$ intermediate variables $x_1,\ldots,x_n$, each of which is a function of $m$ independent variables $s_1,\ldots,s_m$.
Theorem 4. (general case) Suppose that $z$ is a differentiable function of the $n$ variables $x_1,\ldots,x_n$ and each $x_j$ is a differentiable function of the $m$ independent variables $s_1,\ldots,s_m$. Then $z$ is a function of $s_1,\ldots,s_m$ and
$$\frac{\partial z}{\partial s_j}=\frac{\partial z}{\partial x_1}\cdot\frac{\partial x_1}{\partial s_j}+\cdots+\frac{\partial z}{\partial x_n}\cdot\frac{\partial x_n}{\partial s_j}=\sum_{i=1}^n\frac{\partial z}{\partial x_i}\cdot\frac{\partial x_i}{\partial s_j}\qquad(17)$$
for each $j=\overline{1,m}$.
The proof is similar to the previous case.
Example 4. Use the chain rule to find the indicated partial derivatives: $\frac{\partial z}{\partial u}$, $\frac{\partial z}{\partial v}$, $\frac{\partial z}{\partial w}$ when $u=2$, $v=1$, $w=0$, for $z=x^2+xy^3$, $x=uv^2+w^3$, $y=u+ve^w$.
Solution.
[Figure: tree diagram from $z$ to the intermediate variables $x$, $y$, and from these to the independent variables $u$, $v$, $w$.]
$$\frac{\partial z}{\partial u}=\frac{\partial z}{\partial x}\cdot\frac{\partial x}{\partial u}+\frac{\partial z}{\partial y}\cdot\frac{\partial y}{\partial u}=(x^2+xy^3)'_x(uv^2+w^3)'_u+(x^2+xy^3)'_y(u+ve^w)'_u=(2x+y^3)v^2+3xy^2\cdot 1$$
When $u=2$, $v=1$ and $w=0$, then
$$x=2\cdot 1^2+0=2\quad\text{and}\quad y=2+1\cdot e^0=3.$$
Therefore
$$\frac{\partial z}{\partial u}(2,1,0)=(2\cdot 2+3^3)\cdot 1+3\cdot 2\cdot 3^2\cdot 1=31+54=85.$$
The other two partial derivatives can be calculated in a similar way.
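A quick numerical check of this value, again with a symmetric difference quotient in u (a sketch in Python):

    import math

    def z(u, v, w):
        x = u*v**2 + w**3
        y = u + v*math.exp(w)
        return x**2 + x*y**3

    h = 1e-6
    print((z(2 + h, 1, 0) - z(2 - h, 1, 0)) / (2*h))   # approx. 85.0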
4.4.2 Higher order differentials
Definition 1. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, $a\in D$. We say that $f$ is differentiable of $k$-th order at the point $a$ if all the partial derivatives of order $k-1$ exist at all points near $a$ and they are differentiable at $a$.
Remark 1. If $f$ has continuous partial derivatives of order $k$ on $D$, then $f$ is differentiable of order $k$ on $D$.
Definition 2. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, be a $C^k$ function on $D$. The $k$-th order differential of the function $f$ at $a$ of increment $h=(h_1,\ldots,h_n)\in\mathbb{R}^n$ is defined in the following formal way:
$$d^kf_{(a)}(h)=\left(\frac{\partial}{\partial x_1}h_1+\cdots+\frac{\partial}{\partial x_n}h_n\right)^{(k)}f(a)\qquad(1)$$
In the previous definition, $\frac{\partial}{\partial x_i}$, $i=\overline{1,n}$, is the partial differentiation operator with respect to $x_i$ and $(k)$ is a formal power. When we raise to the formal power $(k)$ we apply the binomial theorem, where the formal $k$-th power of $\frac{\partial}{\partial x_i}$ actually means the $k$-th order partial derivative $\frac{\partial^k f}{\partial x_i^k}(a)$.
In order to make things clear we will present some particular cases of the previous formula.
Particular case 1: $n=2$.
In this case the $k$-th order differential of the function $f$ at $(a,b)$ of increment $h=(h_1,h_2)\in\mathbb{R}^2$ is defined by
$$d^kf_{(a,b)}(h_1,h_2)=\left(\frac{\partial}{\partial x}h_1+\frac{\partial}{\partial y}h_2\right)^{(k)}f(a,b).$$
a) $k=2$:
$$d^2f_{(a,b)}(h_1,h_2)=\left(\frac{\partial}{\partial x}h_1+\frac{\partial}{\partial y}h_2\right)^{(2)}f(a,b)=\left(\frac{\partial^2}{\partial x^2}h_1^2+2\frac{\partial^2}{\partial x\partial y}h_1h_2+\frac{\partial^2}{\partial y^2}h_2^2\right)f(a,b)$$
$$=\frac{\partial^2 f}{\partial x^2}(a,b)h_1^2+2\frac{\partial^2 f}{\partial x\partial y}(a,b)h_1h_2+\frac{\partial^2 f}{\partial y^2}(a,b)h_2^2.$$
b) $k=3$:
$$d^3f_{(a,b)}(h_1,h_2)=\left(\frac{\partial}{\partial x}h_1+\frac{\partial}{\partial y}h_2\right)^{(3)}f(a,b)$$
$$=\frac{\partial^3 f}{\partial x^3}(a,b)h_1^3+3\frac{\partial^3 f}{\partial x^2\partial y}(a,b)h_1^2h_2+3\frac{\partial^3 f}{\partial x\partial y^2}(a,b)h_1h_2^2+\frac{\partial^3 f}{\partial y^3}(a,b)h_2^3$$
Particular case 2: $n=3$.
In this case the $k$-th order differential of the function $f$ at $(a,b,c)$ of increment $h=(h_1,h_2,h_3)\in\mathbb{R}^3$ is defined by
$$d^kf_{(a,b,c)}(h_1,h_2,h_3)=\left(\frac{\partial}{\partial x}h_1+\frac{\partial}{\partial y}h_2+\frac{\partial}{\partial z}h_3\right)^{(k)}f(a,b,c).$$
If $k=2$ we have
$$d^2f_{(a,b,c)}(h_1,h_2,h_3)=\frac{\partial^2 f}{\partial x^2}(a,b,c)h_1^2+\frac{\partial^2 f}{\partial y^2}(a,b,c)h_2^2+\frac{\partial^2 f}{\partial z^2}(a,b,c)h_3^2$$
$$+2\frac{\partial^2 f}{\partial x\partial y}(a,b,c)h_1h_2+2\frac{\partial^2 f}{\partial y\partial z}(a,b,c)h_2h_3+2\frac{\partial^2 f}{\partial z\partial x}(a,b,c)h_3h_1$$
Remark 2. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, be a $C^2$ function and let $a\in D$. Then we have
$$d^2f_{(a)}(h)=h\cdot H(a)\cdot h^t,\quad\text{for each }h\in\mathbb{R}^n,$$
where by $h^t$ we understand the column vector obtained as the transpose of the row vector $h=(h_1,\ldots,h_n)$.
Proof. Since $f$ is a $C^2$ function, the Hessian matrix $H(a)$ is a symmetric matrix (see Remark 2, Section 4.3). Writing out $H(a)$ as in (1) and multiplying,
$$h\cdot H(a)\cdot h^t=\left(\sum_{j=1}^n\frac{\partial^2 f}{\partial x_1\partial x_j}(a)h_j,\ \sum_{j=1}^n\frac{\partial^2 f}{\partial x_2\partial x_j}(a)h_j,\ \ldots,\ \sum_{j=1}^n\frac{\partial^2 f}{\partial x_n\partial x_j}(a)h_j\right)\cdot h^t$$
$$=\sum_{i,j=1}^n\frac{\partial^2 f}{\partial x_i\partial x_j}(a)h_ih_j=\sum_{i=1}^n\frac{\partial^2 f}{\partial x_i^2}(a)h_i^2+2\sum_{1\le i<j\le n}\frac{\partial^2 f}{\partial x_i\partial x_j}(a)h_ih_j=d^2f_{(a)}(h).$$
Remark 3. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, be a $C^2$ function and let $a\in D$. Then $Q:\mathbb{R}^n\to\mathbb{R}$ defined by
$$Q(h)=d^2f_{(a)}(h)$$
is a quadratic form.
Proof. The proof is an easy consequence of the previous remark and of the definition of a quadratic form (see 1.3).
4.4.3 Taylor formula in $\mathbb{R}^n$
We learned in subsection 4.4.1 that the approximation by the differential for a $C^1$ function $f:D\to\mathbb{R}$, $a\in D$, is
$$f(x)\approx f(a)+df_{(a)}(x-a)\qquad(1)$$
The previous estimation is valid for $\|x-a\|$ sufficiently small.
The difference between the left-hand side of (1) and the right-hand side is $\omega(x)\|x-a\|$ (see (10), subsection 4.4.1), where $\lim_{x\to a}\omega(x)=0$.
If we denote the difference mentioned above by $R_1(x)$ ($R$ from the remainder), then equality (10) of subsection 4.4.1 becomes
$$f(x)=f(a)+df_{(a)}(x-a)+R_1(x),\quad x\in D\qquad(2)$$
where
$$\lim_{x\to a}\frac{R_1(x)}{\|x-a\|}=0.$$
(2) is called the Taylor approximation of order one for a $C^1$ function of several variables.
The next theorem shows that if $f$ has continuous second-order partial derivatives, the error term is equal to a quadratic form (see 1.3) plus a term of smaller order than $\|x-a\|^2$.
Theorem 1. (Second order Taylor formula)
Let $f:D\to\mathbb{R}$ be a $C^2$ function, $D\subseteq\mathbb{R}^n$, and let $a\in D$. Then there exists a function $R_2:D\to\mathbb{R}$ such that
$$f(x)=f(a)+df_{(a)}(x-a)+\frac{1}{2!}d^2f_{(a)}(x-a)+R_2(x),\quad x\in D\qquad(3)$$
where
$$\lim_{x\to a}\frac{R_2(x)}{\|x-a\|^2}=0.$$
Proof. Keep $x$ fixed and define $g:[0,1]\to\mathbb{R}$ by the relationship
$$g(u)=f(a+u(x-a)).$$
Then
$$f(x)-f(a)=g(1)-g(0).$$
We will prove the theorem by applying the second order Taylor formula for one variable to $g$, with $x=1$ and $a=0$. We obtain
$$g(1)=g(0)+g'(0)+\frac12 g''(c),\quad\text{where }0<c<1.$$
Here we have used Lagrange's form of the remainder (see Remark 2, subsection 3.1.4).
We compute the derivatives of $g$ by the chain rule:
$$g'(u)=\sum_{i=1}^n\frac{\partial f}{\partial x_i}(a+u(x-a))(x_i-a_i)=df_{(a+u(x-a))}(x-a)$$
In particular,
$$g'(0)=df_{(a)}(x-a).$$
Using the chain rule once more we find
$$g''(u)=\sum_{i,j=1}^n\frac{\partial^2 f}{\partial x_i\partial x_j}(a+u(x-a))(x_i-a_i)(x_j-a_j)=d^2f_{(a+u(x-a))}(x-a).$$
Hence
$$g''(c)=d^2f_{(a+c(x-a))}(x-a).$$
To prove (3) we define
$$R_2(x)=\frac12 d^2f_{(a+c(x-a))}(x-a)-\frac12 d^2f_{(a)}(x-a).$$
By using the previous equality we have
$$f(x)-f(a)=df_{(a)}(x-a)+\frac12 d^2f_{(a+c(x-a))}(x-a)=df_{(a)}(x-a)+\frac12 d^2f_{(a)}(x-a)+R_2(x)$$
To complete the proof we need to show that
$$\frac{R_2(x)}{\|x-a\|^2}\to 0\quad\text{as }x\to a.$$
We have:
$$\frac{|R_2(x)|}{\|x-a\|^2}=\frac{1}{\|x-a\|^2}\left|\frac12\sum_{i,j=1}^n\left[\frac{\partial^2 f}{\partial x_i\partial x_j}(a+c(x-a))-\frac{\partial^2 f}{\partial x_i\partial x_j}(a)\right](x_i-a_i)(x_j-a_j)\right|$$
$$\le\frac12\sum_{i,j=1}^n\left|\frac{\partial^2 f}{\partial x_i\partial x_j}(a+c(x-a))-\frac{\partial^2 f}{\partial x_i\partial x_j}(a)\right|,$$
since $|x_i-a_i|\,|x_j-a_j|\le\|x-a\|^2$.
Since each second-order partial derivative is continuous at $a$, we have
$$\frac{\partial^2 f}{\partial x_i\partial x_j}(a+c(x-a))\to\frac{\partial^2 f}{\partial x_i\partial x_j}(a)\quad\text{as }x\to a,$$
so
$$\frac{R_2(x)}{\|x-a\|^2}\to 0\quad\text{as }x\to a.$$
This completes the proof.
The corresponding second-order Taylor approximation is
$$f(x)\approx f(a)+df_{(a)}(x-a)+\frac12 d^2f_{(a)}(x-a).$$
The previous estimation is valid for $\|x-a\|$ sufficiently small.
We present, without proof, the next theorem, which summarizes the approximation of a $C^k$ function on $\mathbb{R}^n$ by a Taylor polynomial of order $k$.
Theorem 2. Let $k$ be a positive integer. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, be a $C^k$ function on $D$ and let $a\in D$. Then there exists a function $R_k:D\to\mathbb{R}$ such that for all $x\in D$
$$f(x)=f(a)+df_{(a)}(x-a)+\frac{1}{2!}d^2f_{(a)}(x-a)+\cdots+\frac{1}{k!}d^kf_{(a)}(x-a)+R_k(x)\qquad(4)$$
where
$$\frac{R_k(x)}{\|x-a\|^k}\to 0\quad\text{as }x\to a.$$
Relation (4) is called Taylor's formula.
In the previous theorem the polynomial $T_k:D\to\mathbb{R}$,
$$T_k(x)=f(a)+\frac{1}{1!}df_{(a)}(x-a)+\frac{1}{2!}d^2f_{(a)}(x-a)+\cdots+\frac{1}{k!}d^kf_{(a)}(x-a)\qquad(5)$$
is called the Taylor polynomial of degree $k$, and $R_k=f-T_k$ is called the remainder of order $k$.
The corresponding $k$-th order Taylor approximation is
$$f(x)\approx f(a)+\frac{1}{1!}df_{(a)}(x-a)+\frac{1}{2!}d^2f_{(a)}(x-a)+\cdots+\frac{1}{k!}d^kf_{(a)}(x-a),\qquad(6)$$
valid for $\|x-a\|$ sufficiently small.
Example 1. Compute the first and second order Taylor approximations of the Cobb-Douglas function
$$f:(0,\infty)\times(0,\infty)\to\mathbb{R},\quad f(x,y)=x^{\frac14}y^{\frac34}$$
at the point $(1,1)$.
Solution. We compute the first and second order partial derivative functions of $f$:
$$\frac{\partial f}{\partial x}=\frac14 x^{-\frac34}y^{\frac34},\qquad\frac{\partial f}{\partial y}=\frac34 x^{\frac14}y^{-\frac14},$$
$$\frac{\partial^2 f}{\partial x^2}=-\frac{3}{16}x^{-\frac74}y^{\frac34},\qquad\frac{\partial^2 f}{\partial x\partial y}=\frac{3}{16}x^{-\frac34}y^{-\frac14},\qquad\frac{\partial^2 f}{\partial y^2}=-\frac{3}{16}x^{\frac14}y^{-\frac54}.$$
Evaluating these partial derivatives at the point $(1,1)$ we obtain:
$$\frac{\partial f}{\partial x}(1,1)=\frac14,\quad\frac{\partial f}{\partial y}(1,1)=\frac34,\quad\frac{\partial^2 f}{\partial x^2}(1,1)=-\frac{3}{16},\quad\frac{\partial^2 f}{\partial x\partial y}(1,1)=\frac{3}{16},\quad\frac{\partial^2 f}{\partial y^2}(1,1)=-\frac{3}{16}.$$
Substituting in
$$df_{(1,1)}(x-1,y-1)=\frac{\partial f}{\partial x}(1,1)(x-1)+\frac{\partial f}{\partial y}(1,1)(y-1)$$
the values of the first partial derivatives obtained before, we get:
$$df_{(1,1)}(x-1,y-1)=\frac14(x-1)+\frac34(y-1).$$
Substituting in
$$d^2f_{(1,1)}(x-1,y-1)=\frac{\partial^2 f}{\partial x^2}(1,1)(x-1)^2+2\frac{\partial^2 f}{\partial x\partial y}(1,1)(x-1)(y-1)+\frac{\partial^2 f}{\partial y^2}(1,1)(y-1)^2$$
the values of the second partial derivatives, we get:
$$d^2f_{(1,1)}(x-1,y-1)=-\frac{3}{16}(x-1)^2+\frac38(x-1)(y-1)-\frac{3}{16}(y-1)^2.$$
Finally we obtain:
$$f(x,y)\approx 1+\frac14(x-1)+\frac34(y-1)\quad\text{(the first order approximation)}$$
$$f(x,y)\approx 1+\frac14(x-1)+\frac34(y-1)-\frac{3}{32}(x-1)^2+\frac{3}{16}(x-1)(y-1)-\frac{3}{32}(y-1)^2$$
(the second order approximation).
If we use the Taylor approximation of order one to approximate
$$f(1.1,\,0.9)=(1.1)^{\frac14}(0.9)^{\frac34},$$
we obtain that
$$(1.1)^{\frac14}(0.9)^{\frac34}\approx 1+\frac14(1.1-1)+\frac34(0.9-1)=1+\frac14\cdot 0.1-\frac34\cdot 0.1=0.95.$$
If we use the Taylor approximation of order two we get that
$$(1.1)^{\frac14}(0.9)^{\frac34}\approx 1+\frac14\cdot 0.1-\frac34\cdot 0.1-\frac{3}{32}(0.1)^2+\frac{3}{16}\cdot 0.1\cdot(-0.1)-\frac{3}{32}(0.1)^2=0.94625.$$
It can be easily seen that the result obtained by using the second order Taylor approximation is a better approximation of
$$(1.1)^{\frac14}(0.9)^{\frac34}=0.946302\ldots$$
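The three values can be compared directly; the following Python lines are a sketch of this comparison:

    f = lambda x, y: x**0.25 * y**0.75

    dx, dy = 0.1, -0.1
    t1 = 1 + 0.25*dx + 0.75*dy                              # first order:  0.95
    t2 = t1 - (3/32)*dx**2 + (3/16)*dx*dy - (3/32)*dy**2    # second order: 0.94625
    print(t1, t2, f(1.1, 0.9))   # 0.95  0.94625  approx. 0.946302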
4.5 Extrema of functions of several variables
Since optimization plays a major role in economic theory, this section can be considered the core of this part of the book.
At this moment we have a good understanding of the conditions under which $a$ is a local extreme point of a $C^2$ function $f:I\to\mathbb{R}$, where $I\subseteq\mathbb{R}$ is an open interval. These conditions can be stated as follows.
1°. Necessary conditions
If $a$ is a local minimum point of $f$ then $f'(a)=0$ and $f''(a)\ge 0$.
If $a$ is a local maximum point of $f$ then $f'(a)=0$ and $f''(a)\le 0$.
2°. Sufficient conditions
If $f'(a)=0$ and $f''(a)<0$, then $a$ is a local maximum point of $f$.
If $f'(a)=0$ and $f''(a)>0$, then $a$ is a local minimum point of $f$.
Our purpose is to develop the generalizations of the previous results to the case of functions of more than one variable.
We will see that the main results for functions of several variables are analogous to the one-dimensional results.
Definition 1. Let $f:D\to\mathbb{R}$ be a real-valued function of $n$ variables, $D$ a subset of $\mathbb{R}^n$, and let $a\in D$.
a) The point $a$ is called a global or absolute maximum point of $f$ if $f(a)\ge f(x)$ for all $x\in D$. In this case, $f(a)$ is the global maximum value of $f$.
b) The point $a$ is called a local (or relative) maximum point of $f$ if there is a ball $B(a,r)$ such that $f(a)\ge f(x)$ for all $x\in B(a,r)\cap D$. In this case, $f(a)$ is the local maximum value of $f$.
Reversing the inequalities in the above two definitions, we obtain the definitions of a global minimum point and of a local minimum point.
First order conditions
The results presented here are obtained by using only the first order partial derivatives of a given function.
In the case of one variable, the first order condition for a point $a$ to be a local maximum or minimum point of a $C^1$ function $f$ is that $f'(a)=0$; in this case $a$ has to be a critical point of $f$.
The generalization to several variables of the critical point notion is the following.
Definition 2. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^n$, and let $a\in D$. The point $a$ is called a stationary (or critical) point of $f$ if $f$ admits partial derivatives at $a$ and
$$\frac{\partial f}{\partial x_1}(a)=\cdots=\frac{\partial f}{\partial x_n}(a)=0.$$
The next theorem is very useful in locating the local extrema of $f$.
Theorem 1. Let $f:D\to\mathbb{R}$ be a differentiable function on $D\subseteq\mathbb{R}^n$. If $a$ is a local maximum (or minimum) point of $f$ then
$$\frac{\partial f}{\partial x_1}(a)=\frac{\partial f}{\partial x_2}(a)=\cdots=\frac{\partial f}{\partial x_n}(a)=0.$$
Hence, $a$ is a stationary point of $f$.
Proof. We will work in the case when $a$ is a local minimum point (the same proof works for the maximum case).
Let $B=B(a,r)$ be a ball centered at $a$, $B\subseteq D$, with the property that $f(a)\le f(x)$ for all $x\in B$.
Since $a$ is a minimum point of $f$ on $B$, along each segment which passes through $a$ (and lies in $B$) $f$ takes its minimum value at $a$.
In consequence, for each $i=\overline{1,n}$, $a_i$ is a minimum point of the following function of one variable:
$$g_i:(a_i-r,a_i+r)\to\mathbb{R},\quad g_i(x_i)=f(a_1,\ldots,a_{i-1},x_i,a_{i+1},\ldots,a_n).$$
If we now apply Fermat's theorem (Theorem 1, subsection 3.1.4) to the above function, we conclude that
$$g_i'(a_i)=\frac{\partial f}{\partial x_i}(a)=0,\quad i=\overline{1,n}.$$
The previous theorem says that in order to determine the local extreme points of a differentiable function we must seek among the stationary points.
Example 1. Determine the stationary points of the function defined by
$$f:\mathbb{R}^2\to\mathbb{R},\quad f(x,y)=x^3-y^3+9xy.$$
Solution. To find the stationary points of $f$, we compute the first order partial derivatives and equate them to zero:
$$\frac{\partial f}{\partial x}(x,y)=3x^2+9y=0,$$
$$\frac{\partial f}{\partial y}(x,y)=-3y^2+9x=0.$$
From the first equation we get
$$y=-\frac13 x^2,$$
which can be substituted into the second one to get
$$-\frac13 x^4+9x=0.$$
The solutions of the previous equation are $x=0$ and $x=3$.
Substituting these values into $y=-\frac13 x^2$, we obtain that the stationary points of the function $f$ are $(0,0)$ and $(3,-3)$. At this moment we are not able to decide the nature of each of the previous two stationary points. To determine the nature of the stationary points we need to use a condition on the second order differential of $f$, as we did for functions of one variable.
Second order conditions
Sufficient conditions
Theorem 2. Let $f:D\to\mathbb{R}$ be a $C^2$ function, $D\subseteq\mathbb{R}^n$, and suppose that $a\in D$ is a stationary point of $f$.
a) If $d^2f_{(a)}(h)>0$ for each $h\in\mathbb{R}^n$, $h\neq 0$, then $a$ is a local minimum point of $f$.
b) If $d^2f_{(a)}(h)<0$ for each $h\in\mathbb{R}^n$, $h\neq 0$, then $a$ is a local maximum point of $f$.
c) If there are $v,w\in\mathbb{R}^n$ such that $d^2f_{(a)}(v)>0$ and $d^2f_{(a)}(w)<0$, then $a$ is neither a local maximum point nor a local minimum point of $f$.
Definition 3. A stationary point of $f$ for which the assumptions of part c) hold is called a saddle point.
Proof. Since the proofs for parts a) and b) are quite similar, we will prove part a) and leave the proof of part b) as an exercise.
a) We assume that $a$ is a stationary point of the $C^2$ function $f$ and that $d^2f_{(a)}(h)>0$ for each $h\neq 0$. Write the second order Taylor formula at the critical point $a$:
$$f(x)=f(a)+df_{(a)}(x-a)+\frac12 d^2f_{(a)}(x-a)+R_2(x)\qquad(1)$$
where
$$\frac{R_2(x)}{\|x-a\|^2}\to 0\quad\text{as }x\to a.$$
Since $a$ is a stationary point of $f$, (1) becomes
$$f(x)-f(a)=\frac12 d^2f_{(a)}(x-a)+R_2(x).$$
We divide the previous equality by $\|x-a\|^2$ and we get
$$\frac{f(x)-f(a)}{\|x-a\|^2}=\frac12 d^2f_{(a)}\left(\frac{x-a}{\|x-a\|}\right)+\frac{R_2(x)}{\|x-a\|^2}\qquad(2)$$
As a polynomial of degree two, the quadratic form $Q(h)=d^2f_{(a)}(h)$ is a continuous function on $\mathbb{R}^n$.
Let $\mu=\min\{Q(v)\mid\|v\|=1\}$.
Since the unit sphere $\{v\mid\|v\|=1\}$ is compact, by applying the Weierstrass theorem (Theorem 4, subsection 4.1) to the restriction of the function $Q$ to the unit sphere, we conclude that there exists a point $w$ on the unit sphere such that $Q(w)=\mu$. Since $Q$ is positive definite and $w\neq 0$ ($\|w\|=1$), then $\mu=Q(w)>0$ and it follows that
$$\frac12\mu\le\frac12 Q\left(\frac{x-a}{\|x-a\|}\right)=\frac12 d^2f_{(a)}\left(\frac{x-a}{\|x-a\|}\right),\quad\text{for all }x\neq a\qquad(3)$$
Since $\frac{R_2(x)}{\|x-a\|^2}\to 0$ as $x\to a$, there exists an $r>0$ such that
$$-\frac{\mu}{4}<\frac{R_2(x)}{\|x-a\|^2}<\frac{\mu}{4}\quad\text{for all }x,\ 0<\|x-a\|<r\qquad(4)$$
Combining (3) and (4) we find that for all $x$ with $0<\|x-a\|<r$ we have
$$\frac12 d^2f_{(a)}\left(\frac{x-a}{\|x-a\|}\right)+\frac{R_2(x)}{\|x-a\|^2}>\frac12\mu-\frac14\mu=\frac14\mu>0.$$
The right-hand side of equality (2) is positive for $0<\|x-a\|<r$; therefore, so is the left-hand side:
$$\frac{f(x)-f(a)}{\|x-a\|^2}>0\quad\text{for }0<\|x-a\|<r,\ \text{i.e. for }x\in B(a,r),\ x\neq a,$$
so $a$ is a local minimum point of $f$.
c) We will show that the conditions
$$df_{(a)}\equiv 0\quad\text{and}\quad d^2f_{(a)}(v)>0$$
imply that $a$ cannot be a local maximum point of $f$. In the same way, the conditions
$$df_{(a)}\equiv 0\quad\text{and}\quad d^2f_{(a)}(w)<0$$
imply that $a$ cannot be a local minimum point of $f$.
We consider the function
$$t\mapsto g(t)=f(a+tv)$$
and use the chain rule to compute the first and second derivatives of the function $g$ defined above:
$$g'(t)=\sum_{i=1}^n\frac{\partial f}{\partial x_i}(a+tv)v_i=df_{(a+tv)}(v),$$
so
$$g'(0)=df_{(a)}(v)=0.$$
Taking the second derivative we get
$$g''(t)=\sum_{i,j=1}^n\frac{\partial^2 f}{\partial x_i\partial x_j}(a+tv)v_iv_j=d^2f_{(a+tv)}(v).$$
Since $g''(0)=d^2f_{(a)}(v)>0$ and $g''$ is a continuous function, there is $\delta>0$ such that $g''(t)>0$ for all $t\in(-\delta,\delta)$, and in conclusion $g'$ is an increasing function on $(-\delta,\delta)$. Taking into account the fact that $g'(0)=0$, we obtain $g'(t)>0$ for each $t\in(0,\delta)$. This implies that $g$ is an increasing function on $(0,\delta)$.
In particular, $a$ cannot be a local maximum point of $f$.
A similar argument shows that $df_{(a)}\equiv 0$ and $d^2f_{(a)}(w)<0$ imply that $a$ cannot be a local minimum point either.
This completes the proof of Theorem 2.
By using the characterization of positive definite and negative definite quadratic forms, Theorem 2 can be restated as the following theorem.
Theorem 3. Let $f:D\to\mathbb{R}$ be a $C^2$ function, $D\subseteq\mathbb{R}^n$. Suppose that $a$ is a stationary point of $f$.
a) If the $n$ leading principal minors of $H(a)$ are all positive,
$$|a_{11}|>0,\quad\begin{vmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{vmatrix}>0,\quad\ldots,\quad\begin{vmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\ a_{21}&a_{22}&\ldots&a_{2n}\\ \ldots&\ldots&\ldots&\ldots\\ a_{n1}&a_{n2}&\ldots&a_{nn}\end{vmatrix}>0,$$
then $a$ is a local minimum point of $f$.
b) If the $n$ leading principal minors of $H(a)$ alternate in sign,
$$|a_{11}|<0,\quad\begin{vmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{vmatrix}>0,\quad\ldots,\quad(-1)^n\begin{vmatrix}a_{11}&a_{12}&\ldots&a_{1n}\\ a_{21}&a_{22}&\ldots&a_{2n}\\ \ldots&\ldots&\ldots&\ldots\\ a_{n1}&a_{n2}&\ldots&a_{nn}\end{vmatrix}>0,$$
then $a$ is a local maximum point of $f$.
c) If some nonzero leading principal minors of $H(a)$ do not satisfy the sign conditions in the hypotheses of parts a) and b), then $a$ is a saddle point of $f$ (it is neither a local maximum nor a local minimum point of $f$).
Next, we present a particular case of the previous results which concerns functions of two variables.
Theorem 4. Let $f:D\to\mathbb{R}$, $D\subseteq\mathbb{R}^2$, be a $C^2$ function. Suppose that $(a,b)$ is a stationary point of $f$. Let
$$A=\frac{\partial^2 f}{\partial x^2}(a,b),\quad B=\frac{\partial^2 f}{\partial x\partial y}(a,b),\quad C=\frac{\partial^2 f}{\partial y^2}(a,b)\quad\text{and}\quad D=B^2-AC.$$
a) If $D<0$ and $A>0$, then $(a,b)$ is a local minimum point of $f$.
b) If $D<0$ and $A<0$, then $(a,b)$ is a local maximum point of $f$.
c) If $D>0$, then $(a,b)$ is a saddle point.
d) If $D=0$, no conclusion can be drawn concerning a relative extremum; the test is inconclusive and some other technique must be used to solve the problem.
Proof. Since
$$a_{11}=A\quad\text{and}\quad\begin{vmatrix}a_{11}&a_{12}\\ a_{21}&a_{22}\end{vmatrix}=AC-B^2=-D,$$
Theorem 4 is an immediate consequence of Theorem 3.
Example 2. In Example 1 we computed that the stationary points of
$$f:\mathbb{R}^2\to\mathbb{R},\quad f(x,y)=x^3-y^3+9xy$$
are $(0,0)$ and $(3,-3)$. By differentiating the first partial derivatives, we obtain that the Hessian of $f$ at $(x,y)$ is
$$H(x,y)=\begin{pmatrix}f''_{x^2}(x,y)&f''_{xy}(x,y)\\ f''_{yx}(x,y)&f''_{y^2}(x,y)\end{pmatrix}=\begin{pmatrix}6x&9\\ 9&-6y\end{pmatrix}$$
The first order leading principal minor is
$$\Delta_1(x,y)=6x$$
and the second order leading principal minor is
$$\Delta_2(x,y)=-36xy-81.$$
At $(0,0)$, these two minors are $0$ and $-81$, respectively.
Since the second order leading principal minor is negative, $(0,0)$ is a saddle point of $f$, neither a maximum point nor a minimum point (see Theorem 3).
At $(3,-3)$ these two minors are $18$ and $243$, which are positive numbers, and in consequence $(3,-3)$ is a local minimum point of $f$ (see Theorem 3).
Another way of solving the problem is by using Theorem 4. If we analyze the nature of the stationary point $(0,0)$ we observe that
$$A=\frac{\partial^2 f}{\partial x^2}(0,0)=0,\quad B=\frac{\partial^2 f}{\partial x\partial y}(0,0)=9,\quad C=\frac{\partial^2 f}{\partial y^2}(0,0)=0,\quad D=81>0,$$
hence $(0,0)$ is a saddle point (see part c) of Theorem 4).
At $(3,-3)$ we have
$$A=\frac{\partial^2 f}{\partial x^2}(3,-3)=18,\quad B=\frac{\partial^2 f}{\partial x\partial y}(3,-3)=9,\quad C=\frac{\partial^2 f}{\partial y^2}(3,-3)=18,\quad D=9^2-18\cdot 18=-243<0,$$
hence $(3,-3)$ is a local minimum point of $f$.
We have to mention that $(3,-3)$ is not a global minimum point, because $f(0,n)=-n^3$, which goes to $-\infty$ as $n\to\infty$.
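The classification of the two stationary points can be automated; the following sketch, assuming the SymPy library is available, solves the first order conditions and prints the leading principal minors of the Hessian at each real stationary point:

    import sympy as sp

    x, y = sp.symbols('x y')
    f = x**3 - y**3 + 9*x*y

    grad = [sp.diff(f, v) for v in (x, y)]
    H = sp.hessian(f, (x, y))

    for point in sp.solve(grad, (x, y), dict=True):
        if not all(v.is_real for v in point.values()):
            continue   # skip the complex solutions of the cubic
        minors = [H.subs(point)[:k, :k].det() for k in (1, 2)]
        print(point, minors)   # (0,0): [0, -81] saddle; (3,-3): [18, 243] local min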
Example 3. A monopolist producing a single output has two types of customers. If it produces $a$ units for customers of type 1, then these customers are willing to pay $100-10a$ euros per unit. If it produces $b$ units for customers of type 2, then these customers are willing to pay a price of $200-20b$ euros per unit. The monopolist's cost of producing $c$ units of output is $180+40c$ euros. In order to maximize profits, how much should the monopolist produce for each market?
Solution. The profit function is the following:
$$f(a,b)=a(100-10a)+b(200-20b)-[180+40(a+b)].$$
The stationary points are the solutions of the following system:
$$\begin{cases}\dfrac{\partial f}{\partial a}=100-20a-40=60-20a=0\\[2mm] \dfrac{\partial f}{\partial b}=200-40b-40=160-40b=0\end{cases}\ \Rightarrow\ (a,b)=(3,4).$$
It remains to check the second order conditions. Since
$$f''_{a^2}(a,b)=-20,\quad f''_{b^2}(a,b)=-40\quad\text{and}\quad f''_{ab}(a,b)=f''_{ba}(a,b)=0,$$
then $A=-20<0$ and $D=B^2-AC=-800<0$. Therefore the point $(3,4)$ is a local maximum point of $f$.
Example 4. A firm uses two inputs to produce a single product. If its production function is
$$Q(x,y)=x^{\frac14}y^{\frac14},$$
and if it sells its output for one euro a unit and buys each input for $\frac{1}{16}$ euros a unit, find its maximum profit.
Solution. The profit function is the following:
$$f(x,y)=x^{\frac14}y^{\frac14}-\frac{1}{16}(x+y),\quad x>0,\ y>0.$$
The stationary points are the solutions of the following system:
$$\begin{cases}\dfrac{\partial f}{\partial x}(x,y)=0\\[2mm] \dfrac{\partial f}{\partial y}(x,y)=0\end{cases}\Leftrightarrow\begin{cases}\dfrac14 x^{-\frac34}y^{\frac14}-\dfrac{1}{16}=0\\[2mm] \dfrac14 x^{\frac14}y^{-\frac34}-\dfrac{1}{16}=0\end{cases}\Leftrightarrow\begin{cases}\dfrac{y}{x^3}=\left(\dfrac14\right)^4\\[2mm] \dfrac{x}{y^3}=\left(\dfrac14\right)^4\end{cases}$$
Multiplying the two equations we get $\dfrac{1}{x^2y^2}=\left(\dfrac14\right)^8$, that is $\dfrac{1}{xy}=\left(\dfrac14\right)^4$; combining this with the first equation gives $\left(\dfrac1x\right)^4=\left(\dfrac14\right)^8$, hence $x=16$ and then $y=16$.
It remains to check the second order conditions:
$$\frac{\partial^2 f}{\partial x^2}(x,y)=-\frac{3}{16}x^{-\frac74}y^{\frac14};\quad A=-\frac{3}{16\cdot 2^6}<0,$$
$$\frac{\partial^2 f}{\partial x\partial y}(x,y)=\frac{1}{16}x^{-\frac34}y^{-\frac34};\quad B=\frac{1}{16}\cdot 2^{-3}\cdot 2^{-3}=\frac{1}{16\cdot 2^6},$$
$$\frac{\partial^2 f}{\partial y^2}(x,y)=-\frac{3}{16}x^{\frac14}y^{-\frac74};\quad C=-\frac{3}{16\cdot 2^6}.$$
In consequence
$$D=B^2-AC=\frac{1}{16^2\cdot 2^{12}}-\frac{9}{16^2\cdot 2^{12}}<0\quad\text{and}\quad A<0,$$
hence $(16,16)$ is a local maximum point of $f$.
Example 5. A farmer wishes to build a rectangular storage bin, without a top, with a volume of 500 cubic meters, using the least amount of material possible. Determine the dimensions of such a storage bin.
Solution. If we let $x$ and $y$ be the dimensions of the base of the bin and $z$ be the height, all measured in meters, then the farmer wishes to minimize the surface area of the bin, given by
$$S=xy+2xz+2yz\qquad(3)$$
subject to the constraint on the volume, namely,
$$500=xyz.$$
Solving for $z$ in the latter expression and substituting into (3), we have
$$S=S(x,y)=xy+2x\,\frac{500}{xy}+2y\,\frac{500}{xy}=xy+\frac{1000}{y}+\frac{1000}{x}.$$
This is the function we need to minimize on the unbounded set
$$D=\{(x,y)\mid x>0,\ y>0\}.$$
Now
$$\frac{\partial S}{\partial x}=y-\frac{1000}{x^2}\quad\text{and}\quad\frac{\partial S}{\partial y}=x-\frac{1000}{y^2},$$
so to find the stationary points of $S$ we need to solve
$$\begin{cases}y-\dfrac{1000}{x^2}=0\\[2mm] x-\dfrac{1000}{y^2}=0\end{cases}$$
Solving for $y$ in the first equation and then substituting into the second equation, we get
$$x-\frac{x^4}{1000}=0\ \Leftrightarrow\ x\left(1-\frac{x^3}{1000}\right)=0.$$
The solutions of the latter are $x=0$ and $x=10$. Since the first of these does not give us a point in $D$, we have $x=10$ and
$$y=\frac{1000}{10^2}=10.$$
Thus the only stationary point is $(10,10)$.
Now
$$\frac{\partial^2 S}{\partial x^2}=\frac{2000}{x^3};\quad A=\frac{\partial^2 S}{\partial x^2}(10,10)=\frac{2000}{1000}=2>0,$$
$$\frac{\partial^2 S}{\partial x\partial y}=1;\quad B=\frac{\partial^2 S}{\partial x\partial y}(10,10)=1,$$
$$\frac{\partial^2 S}{\partial y^2}=\frac{2000}{y^3};\quad C=\frac{\partial^2 S}{\partial y^2}(10,10)=\frac{2000}{1000}=2>0.$$
Hence $D_2=B^2-AC=1-4=-3<0$ and $A>0$.
This shows that $S$ has a local minimum of
$$S(10,10)=10\cdot 10+\frac{1000}{10}+\frac{1000}{10}=300\quad\text{at }(x,y)=(10,10).$$
Finally, when $x=10$ and $y=10$, we have
$$z=\frac{500}{10\cdot 10}=5,$$
so the farmer should build the bin with a base of 10 meters by 10 meters and a height of 5 meters.
The problem of showing that the point $(10,10)$ actually gives the global minimum value of $S$ will be discussed later.
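The same minimum can be found numerically; the lines below are a sketch, assuming the SciPy library is available (the starting point and bounds are chosen only for illustration):

    from scipy.optimize import minimize

    S = lambda p: p[0]*p[1] + 1000/p[1] + 1000/p[0]

    res = minimize(S, x0=[1.0, 1.0], bounds=[(0.1, None), (0.1, None)])
    print(res.x, res.fun)   # approx. [10, 10] and 300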
Example 6. Determine the local extreme values of the following function:
$$f:\mathbb{R}^3\to\mathbb{R},\quad f(x,y,z)=2x^2+y^2+2xy+3z^2+2xz+4z+3.$$
Solution. The stationary points are the solutions of the following system of linear equations:
$$\begin{cases}\dfrac{\partial f}{\partial x}(x,y,z)=0\\[1mm] \dfrac{\partial f}{\partial y}(x,y,z)=0\\[1mm] \dfrac{\partial f}{\partial z}(x,y,z)=0\end{cases}\Leftrightarrow\begin{cases}4x+2y+2z=0\\ 2x+2y=0\\ 2x+6z+4=0\end{cases}$$
The solution of the previous system is $(x,y,z)=(1,-1,-1)$. In order to establish the nature of the stationary point $(1,-1,-1)$ we have to apply the second order conditions. We evaluate first the Hessian matrix at $(1,-1,-1)$:
$$H(x,y,z)=\begin{pmatrix}4&2&2\\ 2&2&0\\ 2&0&6\end{pmatrix}\quad\text{and hence}\quad H(1,-1,-1)=\begin{pmatrix}4&2&2\\ 2&2&0\\ 2&0&6\end{pmatrix}.$$
The leading principal minors are
$$\Delta_1=4,\quad\Delta_2=\begin{vmatrix}4&2\\ 2&2\end{vmatrix}=4\quad\text{and}\quad\Delta_3=\begin{vmatrix}4&2&2\\ 2&2&0\\ 2&0&6\end{vmatrix}=48+0+0-8-0-24=16.$$
By applying part a) of Theorem 3, we get that $(1,-1,-1)$ is a local minimum point of $f$.
Necessary conditions
We will now prove the second order necessary conditions for optimization.
Theorem 5. Let $f:D\to\mathbb{R}$ be a $C^2$ function, $D\subseteq\mathbb{R}^n$. Suppose that $a$ is a local minimum point (respectively a local maximum point) of $f$. Then $df_{(a)}\equiv 0$ and $d^2f_{(a)}(h)\ge 0$ for each $h\in\mathbb{R}^n$ (respectively $df_{(a)}\equiv 0$ and $d^2f_{(a)}(h)\le 0$ for each $h\in\mathbb{R}^n$).
Proof. From Theorem 1 we know that $a$ is a stationary point of $f$ and hence $df_{(a)}\equiv 0$.
By a similar argument as in the proof of Theorem 2, if $a$ is a stationary point and $d^2f_{(a)}(v)>0$ for some vector $v$, then $a$ cannot be a local maximum point of $f$. So, if a stationary point is a local maximum point of $f$, there is no vector $v$ such that $d^2f_{(a)}(v)>0$; in consequence we have $d^2f_{(a)}(h)\le 0$ for all $h\in\mathbb{R}^n$.
In the same way, if $a$ is a local minimum point of $f$, then $df_{(a)}\equiv 0$ and $d^2f_{(a)}(h)\ge 0$ for each $h\in\mathbb{R}^n$.
Global maxima and minima
Definition 4. Let $D\subseteq\mathbb{R}^n$. $D$ is said to be convex if, for all $x$ and $y$ in $D$ and every $t$ in the interval $[0,1]$, the point $(1-t)x+ty$ is in $D$. In other words, every point on the line segment connecting $x$ and $y$ is in $D$.
Theorem 6. a) Let $f$ be a $C^2$ function on an open convex subset $D$ of $\mathbb{R}^n$ for which $d^2f_{(x)}(h)\ge 0$ for all $x\in D$ and $h\in\mathbb{R}^n$. If $a$ is a stationary point of $f$, then $a$ is a global minimum point of $f$.
b) Let $f$ be a $C^2$ function on an open convex subset $D$ of $\mathbb{R}^n$ for which $d^2f_{(x)}(h)\le 0$ for all $x\in D$ and $h\in\mathbb{R}^n$. If $a$ is a stationary point of $f$, that is $df_{(a)}\equiv 0$, then $a$ is a global maximum point of $f$.
The proof of the previous theorem involves ideas that are beyond the scope of this text and will be omitted.
Example 7. In Example 3 we found that the point $(3,4)$ is a local maximum point of $f$. By applying the previous theorem we obtain that $(3,4)$ is a global maximum point. Indeed,
$$D=\{(a,b)\in\mathbb{R}^2\mid a>0,\ b>0\}$$
is a convex set (see Definition 4). On the other hand, the Hessian matrix is
$$H(a,b)=\begin{pmatrix}-20&0\\ 0&-40\end{pmatrix}$$
at each point in $D$, hence
$$d^2f_{(a,b)}(h_1,h_2)=-20h_1^2-40h_2^2\le 0$$
for each $(h_1,h_2)\in\mathbb{R}^2$.
Since all the hypotheses of Theorem 6, part b) are fulfilled, it follows that the stationary point $(3,4)$ is a global maximum point.
Example 8. Prove that the stationary point in Example 6 is a global (absolute) minimum point of $f$.
Solution. $D=\mathbb{R}^3$ is a convex set.
The Hessian matrix at an arbitrary point $(x,y,z)$ in $\mathbb{R}^3$ is
$$H(x,y,z)=\begin{pmatrix}4&2&2\\ 2&2&0\\ 2&0&6\end{pmatrix},$$
hence
$$d^2f_{(x,y,z)}(h_1,h_2,h_3)=4h_1^2+4h_1h_2+4h_1h_3+2h_2^2+6h_3^2$$
$$=(2h_1^2+4h_1h_2+2h_2^2)+(2h_1^2+4h_1h_3+2h_3^2)+4h_3^2=2(h_1+h_2)^2+2(h_1+h_3)^2+4h_3^2\ge 0,$$
for all $(x,y,z)\in\mathbb{R}^3$ and $(h_1,h_2,h_3)\in\mathbb{R}^3$.
Since all the hypotheses of Theorem 6, part a) are fulfilled, it follows that the stationary point $(1,-1,-1)$ is a global minimum point.
Remark 1. The global minimum and maximum values of a continuous function on a closed and bounded (hence compact) set $D$ can be obtained in the following way:
1. find the values of $f$ at the stationary points of $f$ in the interior of $D$;
2. find the extreme values of $f$ on the boundary of $D$;
3. the largest of the values of $f$ from steps 1 and 2 is the global maximum value, and the smallest of these values is the global minimum value.
Example 9. Prove that the stationary point $(10,10)$ in Example 5 is an absolute minimum point of $S$.
Solution. Let $D_1$ be the closed rectangle
$$D_1=\{(x,y)\mid 1\le x\le 400,\ 1\le y\le 400\}.$$
Now, if $0<x\le 1$, then $\frac{1000}{x}\ge 1000$ and so
$$S=xy+\frac{1000}{y}+\frac{1000}{x}\ge 1000>300.$$
Similarly, if $0<y\le 1$, then $S>300$. Moreover, if $x\ge 400$ and $y\ge 1$, then $xy\ge 400$, and so $S>300$. Similarly, if $y\ge 400$ and $x\ge 1$, then $S>300$. Hence $S>300$ for all $(x,y)$ outside of $D_1$ and for all $(x,y)$ on the boundary of $D_1$.
From the previous observations, the global minimum of $S$ on $D$ must be attained in $D_1$; by Remark 1 it is attained either at the stationary point $(10,10)$ or on the boundary of $D_1$, where $S>300$. Hence the global minimum of $S$ on $D$ is $S(10,10)=300$.
4.6 Constrained extrema
In this section we discuss a powerful method for determining the relative extrema of a function whose independent variables satisfy one or more constraints. This method is called the Lagrange multipliers method. Consider the problem:

    optimize f(x) = f(x₁, x₂, . . . , xₙ)
    subject to gⱼ(x₁, x₂, . . . , xₙ) = cⱼ, j = 1, . . . , m < n     (1)

f is called the objective function, g₁, . . . , gₘ are the constraint functions, and c₁, . . . , cₘ are the constraint constants.
If it is possible to express m independent variables as functions of the other n - m independent variables, we can eliminate m variables in the objective function (as in Example 5, Section 4.5); thus the initial problem is reduced to an unconstrained optimization problem in n - m variables. However, in many cases it is not technically possible to express one variable as a function of the others.
In this case, instead of the substitution and elimination method, we will use the method of Lagrange multipliers.
In comparison with using the constraints to express m independent variables in terms of the others, the Lagrangean technique involves more variables and more equations. The advantage of the Lagrangean method is its universality.
Constrained optimization has a prominent place in economic theory due to the importance of maximizing utility subject to a budget constraint.
In economic theory it is also important that the Lagrange multipliers express how the extreme value of the problem changes as the constraints are modified.
We begin with the simplest constrained maximization problem, that of maximizing a function f(x, y) of two variables subject to a single equality constraint g(x, y) = c.
[Figure: several level curves f(x, y) = k of f together with the constraint curve g(x, y) = c in the xy-plane; the largest attainable k is reached where a level curve touches the constraint curve.]
The previous figure shows this curve together with several level curves of f. To maximize f(x, y) subject to g(x, y) = c is to find the largest value of k such that the level curve f(x, y) = k intersects g(x, y) = c. It appears from the figure above that this happens when these curves just touch each other, that is, when they have a common tangent line (otherwise, the value of k could be increased further). In this case the slope of the constraint curve g(x, y) = c is equal to the slope of a level curve f(x, y) = k. According to formula (9), Section 4.2, the slope of the constraint curve is -g'ₓ/g'_y and the slope of the level curve is -f'ₓ/f'_y.
Hence the condition that the slopes be equal can be expressed by the equation

    f'ₓ/f'_y = g'ₓ/g'_y   or, equivalently,   f'ₓ/g'ₓ = f'_y/g'_y.

If we let λ denote this common ratio, we have

    λ = f'ₓ/g'ₓ   and   λ = f'_y/g'_y,

from which we get the following equations (the Lagrange equations)

    f'ₓ = λg'ₓ   and   f'_y = λg'_y.

The third equation, g(x, y) = c, is simply a statement of the fact that the point in question actually lies on the constraint set.
There is a formal way of obtaining the previous equations:
Form the Lagrange function

    L(x, y, λ) = f(x, y) - λ[g(x, y) - c].

Find the critical points of the Lagrangean. The result of this process is the following system:

    f'ₓ(x, y) = λg'ₓ(x, y)
    f'_y(x, y) = λg'_y(x, y)
    g(x, y) = c

Among the solutions of this system we can find the extreme points of f.
A minimization problem can be analyzed by using the same arguments.
The statement of the necessary conditions for optimizing a function of n variables subject to m equality constraints is the following.
Theorem 1. (Necessary conditions)
Let f, g₁, . . . , gₘ : D → R be C¹ functions of n variables (m < n). Consider the problem of maximizing (or minimizing) f on the constraint set

    C_g = {x | g₁(x) = c₁, . . . , gₘ(x) = cₘ}.

Suppose that a is a local maximum or minimum point of f on C_g (so that a ∈ C_g). Suppose further that the rank of the Jacobian matrix

    Dg(a) = [ ∂g₁/∂x₁(a)  . . .  ∂g₁/∂xₙ(a)
               . . .       . . .   . . .
              ∂gₘ/∂x₁(a)  . . .  ∂gₘ/∂xₙ(a) ]   is m.     (2)

Then there exist λ₁*, λ₂*, . . . , λₘ* such that

    (a₁, . . . , aₙ, λ₁*, . . . , λₘ*) = (a, λ*)

is a stationary point of the Lagrangean function:

    L(x, λ) = f(x) - λ₁[g₁(x) - c₁] - λ₂[g₂(x) - c₂] - · · · - λₘ[gₘ(x) - cₘ]     (3)

In other words,

    ∂L/∂x₁(a, λ*) = 0, . . . , ∂L/∂xₙ(a, λ*) = 0,
    ∂L/∂λ₁(a, λ*) = -[g₁(a) - c₁] = 0, . . . , ∂L/∂λₘ(a, λ*) = -[gₘ(a) - cₘ] = 0.
The proof of this theorem involves ideas that are beyond the scope of this text and will be omitted.
Example 1. A consumer has 1200 m.u. (monetary units) to spend on two commodities, the first of which costs 40 m.u. per unit and the second 60 m.u. per unit. Suppose that the utility derived by the consumer from x units of the first commodity and y units of the second commodity is given by the Cobb-Douglas utility function

    U(x, y) = 20x^0.6 y^0.4.

How many units of each commodity should the consumer buy to maximize utility?
Solution. The total cost of buying x units of the first commodity and y units of the second is 40x + 60y. Since the consumer has only 1200 m.u. to spend, the goal is to maximize the utility U(x, y) subject to the budgetary constraint 40x + 60y = 1200.
The Lagrangean function is

    L(x, y, λ) = 20x^0.6 y^0.4 - λ(40x + 60y - 1200).

The three Lagrange equations are:

    ∂L/∂x = 12(y/x)^0.4 - 40λ = 0
    ∂L/∂y = 8(x/y)^0.6 - 60λ = 0
    ∂L/∂λ = -(40x + 60y - 1200) = 0

wherefrom we easily get that

    (y/x)^0.4 = (10/3)λ,   (x/y)^0.6 = (15/2)λ   and   2x + 3y = 60.

From the first two equalities we get

    (y/x)^0.4 · (y/x)^0.6 = (10λ/3) · (2/(15λ))   ⟹   y/x = 4/9   ⟹   y = (4/9)x.

Substituting this into the third equation we get

    2x + 3 · (4/9)x = 60,

from which it follows that x = 18 and y = 8.
So the only candidate for a solution to our problem is x = 18 and y = 8;

    λ = (3/10)(y/x)^0.4 = (3/10)(8/18)^0.4 = (3/10)(2/3)^0.8.
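For readers who want to check such computations numerically, the following sketch (not part of the original text; it assumes scipy is available) maximizes U subject to the budget constraint and recovers the candidate found above.

```python
from scipy.optimize import minimize

# Maximize U(x, y) = 20 x^0.6 y^0.4 subject to 40x + 60y = 1200
# by minimizing -U with an equality constraint.
U = lambda v: 20 * v[0]**0.6 * v[1]**0.4
budget = {'type': 'eq', 'fun': lambda v: 40*v[0] + 60*v[1] - 1200}

res = minimize(lambda v: -U(v), x0=[10.0, 10.0],
               constraints=[budget], bounds=[(1e-6, None)] * 2)

print(res.x)     # approximately [18, 8]
print(-res.fun)  # the maximum utility U(18, 8)
```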
Next we will describe a second order condition that distinguishes maximum points from minimum points.
Intuitively, the second order condition for a constrained maximization (or minimization) problem should involve the negative definiteness of some Hessian matrix, but should only be concerned with directions along the constraint set.
Theorem 2. (Sufficient conditions)
Let f, g₁, . . . , gₘ : D → R, D ⊆ Rⁿ, be C² functions of n variables (m < n). Consider the problem of maximizing or minimizing f on the constraint set

    C_g = {x = (x₁, . . . , xₙ) | g₁(x) = c₁, . . . , gₘ(x) = cₘ}.

Form the Lagrangean

    L(x, λ) = f(x) - λ₁[g₁(x) - c₁] - · · · - λₘ[gₘ(x) - cₘ]

and suppose that:
a) a ∈ C_g;
b) there exists λ* = (λ₁*, . . . , λₘ*) ∈ Rᵐ such that

    ∂L/∂x₁(a, λ*) = · · · = ∂L/∂xₙ(a, λ*) = 0;

c) the Hessian of L with respect to x at (a, λ*) is negative definite (positive definite) on the set {v | Dg(a)v = 0}, that is, for each v ≠ 0 with Dg(a)v = 0 we have

    d²L(a,λ*)(v) < 0   (respectively d²L(a,λ*)(v) > 0).

Then a is a strict local constrained maximum (minimum) point on C_g.
Proof. We want to show that a is a local maximum point of f on the constraint set C_g.
We assume the opposite; that means there exists a sequence (xⱼ)ⱼ≥₁ ⊂ Rⁿ such that xⱼ → a with xⱼ ≠ a for all j, xⱼ ∈ C_g and f(xⱼ) > f(a) for all j. Construct a new sequence by using these xⱼ's:

    vⱼ = (xⱼ - a)/‖xⱼ - a‖.

It is obvious that ‖vⱼ‖ = 1, so (vⱼ)ⱼ≥₁ is a sequence contained in the unit sphere (in Rⁿ).
Since the unit sphere in Rⁿ is a compact set, the sequence (vⱼ)ⱼ≥₁ has a convergent subsequence, which will be denoted by (vₖ)ₖ≥₁, and its limit by v.
Since gᵢ is C¹, for each i = 1, . . . , m, we write down its Taylor polynomial of order one about a, evaluating it at each xₖ:

    gᵢ(xₖ) - gᵢ(a) = dgᵢ(a)(xₖ - a) + Rᵢ₁(xₖ),

where Rᵢ₁(xₖ)/‖xₖ - a‖ → 0 as xₖ → a. Since gᵢ(xₖ) = cᵢ = gᵢ(a), dividing by ‖xₖ - a‖ gives

    0 = (cᵢ - cᵢ)/‖xₖ - a‖ = dgᵢ(a)((xₖ - a)/‖xₖ - a‖) + Rᵢ₁(xₖ)/‖xₖ - a‖.

If we let k → ∞ in the previous equality we get that

    0 = dgᵢ(a)(v),   i = 1, . . . , m.

Now write down the second order Taylor polynomial of the Lagrangean as a function of x about a:

    L(xₖ) = L(a) + dL_a(xₖ - a) + (1/2)d²L_a(xₖ - a) + R₂(xₖ)     (4)

where R₂(xₖ)/‖xₖ - a‖² → 0 as xₖ → a.
By hypothesis, dL_a ≡ 0; also

    L(xₖ) = f(xₖ) - Σᵢ λᵢ*(gᵢ(xₖ) - cᵢ) = f(xₖ).

In the same way L(a) = f(a).
Using these results, rewrite (4) as

    0 < (f(xₖ) - f(a))/‖xₖ - a‖² = (1/2)d²L_a(vₖ) + R₂(xₖ)/‖xₖ - a‖².

Let xₖ → a in the previous relation. Hence d²L_a(v) ≥ 0, which contradicts the hypotheses (v has norm 1, so v ≠ 0, and dgᵢ(a)(v) = 0 for all i, so d²L_a(v) < 0 should hold). This completes the proof.
By combining the previous two theorems we obtain the following five-step algorithm for determining the constrained extreme points of a given function.
Lagrange's method of multipliers
Let f, g₁, . . . , gₘ : D → R, D ⊆ Rⁿ (m < n), be C² functions and let c₁, . . . , cₘ ∈ R. Consider the problem of maximizing (or minimizing) f on the constraint set

    C_g = {x | g₁(x) = c₁, . . . , gₘ(x) = cₘ}.

Suppose that the rank of the matrix

    Dg(x) = [ ∂g₁/∂x₁(x)  . . .  ∂g₁/∂xₙ(x)
               . . .       . . .   . . .
              ∂gₘ/∂x₁(x)  . . .  ∂gₘ/∂xₙ(x) ]

is m for each x ∈ C_g.
Step 1. Assign to each constraint gⱼ(x) = cⱼ one Lagrange multiplier λⱼ ∈ R, j = 1, . . . , m. Write down the Lagrange function (Lagrangean)

    L(x, λ) = f(x) - λ₁[g₁(x) - c₁] - · · · - λₘ[gₘ(x) - cₘ] = f(x) - Σⱼ₌₁..ₘ λⱼ[gⱼ(x) - cⱼ]     (5)

Step 2. Find the stationary points (a, λ*) of the function L with respect to the variables x and λ. These are the solutions of the following system:

    ∂L/∂xᵢ(x, λ) = 0, i = 1, . . . , n
    ∂L/∂λⱼ(x, λ) = -[gⱼ(x) - cⱼ] = 0, j = 1, . . . , m

Step 3. For each stationary point (a, λ*) consider the function L of n variables:

    L(x) = f(x) - λ₁*[g₁(x) - c₁] - · · · - λₘ*[gₘ(x) - cₘ] = f(x) - Σⱼ₌₁..ₘ λⱼ*[gⱼ(x) - cⱼ]

Step 4. Consider and solve the following system, whose rank is m (see (2)):

    dg₁(a)(v) = 0
    . . .
    dgₘ(a)(v) = 0

Step 5. Evaluate d²L(a) at each solution v of the previous system.
a) If d²L(a)(v) > 0 for each v ≠ 0 obtained at Step 4, then a is a constrained minimum point.
b) If d²L(a)(v) < 0 for each v ≠ 0 obtained at Step 4, then a is a constrained maximum point.
c) If d²L(a)(v) takes both positive and negative values on the solution set obtained at Step 4, then a is not a constrained extreme point.
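As a quick illustration of Steps 1 and 2, the following sketch (our own toy problem, not from the text; sympy is assumed to be available) finds the stationary points of a Lagrangean symbolically.

```python
import sympy as sp

# Toy problem: optimize f(x, y) = x*y subject to g(x, y) = x + y = 10.
x, y, lam = sp.symbols('x y lam', real=True)

L = x*y - lam*(x + y - 10)                     # Step 1: the Lagrangean
eqs = [sp.diff(L, v) for v in (x, y, lam)]     # Step 2: stationarity system
print(sp.solve(eqs, (x, y, lam), dict=True))   # [{x: 5, y: 5, lam: 5}]
```

For this toy problem, Steps 3 to 5 are immediate: d²L(v) = 2v₁v₂ and the constraint direction v₁ + v₂ = 0 give d²L = -2v₁² < 0 for v₁ ≠ 0, so (5, 5) is a constrained maximum point.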
Example 2. Determine the nature of the stationary point in Example 1.
Solution. The rank of the matrix

    (g'ₓ(x, y), g'_y(x, y)) = (40, 60)

is always 1. The Lagrangean function in this example is

    L(x, y, λ) = 20x^0.6 y^0.4 - λ(40x + 60y - 1200)

and the critical point is (18, 8, (3/10)(2/3)^0.8). It remains for us to check Steps 3, 4 and 5 of the previous algorithm.
Step 3. L(x, y) = 20x^0.6 y^0.4 - (3/10)(2/3)^0.8 (40x + 60y - 1200)
Step 4. We have to solve the following equation (because we have just one constraint):

    g'ₓ(18, 8)v₁ + g'_y(18, 8)v₂ = 0.

Since g'ₓ(18, 8) = 40 and g'_y(18, 8) = 60, the equation to be solved is

    40v₁ + 60v₂ = 0,

wherefrom we have v₂ = -(2/3)v₁.
Step 5.

    d²L(18,8)(v₁, v₂) = d²L(18,8)(v₁, -(2/3)v₁)
      = ∂²L/∂x²(18, 8)v₁² + 2 ∂²L/∂x∂y(18, 8)v₁(-(2/3)v₁) + ∂²L/∂y²(18, 8)(-(2/3)v₁)²
      = [∂²L/∂x²(18, 8) - (4/3) ∂²L/∂x∂y(18, 8) + (4/9) ∂²L/∂y²(18, 8)] v₁².

We have

    ∂²L/∂x²(x, y) = ∂/∂x [12(y/x)^0.4 - 40λ] = -4.8 y^0.4 x^(-1.4) = -(4.8/x)(y/x)^0.4,

so

    ∂²L/∂x²(18, 8) = -(4.8/18)(2/3)^0.8;

    ∂²L/∂x∂y(x, y) = ∂/∂y [12(y/x)^0.4 - 40λ] = 4.8 x^(-0.4) y^(-0.6) = (4.8/y)(y/x)^0.4,

so

    ∂²L/∂x∂y(18, 8) = (4.8/8)(8/18)^0.4 = (4.8/8)(2/3)^0.8;

    ∂²L/∂y²(x, y) = ∂/∂y [8(x/y)^0.6 - 60λ] = -4.8 x^0.6 y^(-1.6) = -(4.8/y)(x/y)^0.6,

so

    ∂²L/∂y²(18, 8) = -(4.8/8)(18/8)^0.6 = -(4.8/8)(3/2)^1.2.
We finally obtain that

    d²L(18,8)(v₁, -(2/3)v₁) = [-(4.8/18)(2/3)^0.8 - (4.8/6)(2/3)^0.8 - (4.8/18)(3/2)^1.2] v₁² < 0

for all v₁ ≠ 0, and in consequence (18, 8) is a local constrained maximum point of f.
Recall from Section 4.2 (Example 8) that the level curves of a utility function are indifference curves. The optimal indifference curve U(x, y) = C, where C = U(18, 8), together with the budget line 40x + 60y = 1200, is sketched in the figure below.
[Figure: the budget line 40x + 60y = 1200, with intercepts x = 30 and y = 20, touching the optimal indifference curve at the point (18, 8).]
Example 3. Rework Example 5 from Section 4.5 by using Lagrange's method of multipliers.
Solution. Let x, y and z be the length, width and height, respectively, of the bin in meters. We wish to minimize

    S : (0, ∞) × (0, ∞) × (0, ∞) → R,   S(x, y, z) = xy + 2yz + 2zx

subject to the constraint on the volume, namely

    V : (0, ∞) × (0, ∞) × (0, ∞) → R,   V(x, y, z) = xyz = 500.

Using the method of Lagrange multipliers, we have to follow the five steps of Lagrange's algorithm.
We have first to check that the rank of the matrix

    (V'ₓ(x, y, z), V'_y(x, y, z), V'_z(x, y, z)) = (yz, xz, xy)

is one at each point (x, y, z) which satisfies xyz = 500. The rank would be less than 1 only if yz = xz = xy = 0, which cannot happen on the constraint set, since xyz = 500 forces x, y, z > 0. In conclusion the rank of the matrix is one at all points of the constraint set.
Steps 1 and 2.

    L(x, y, z, λ) = S(x, y, z) - λ[V(x, y, z) - 500] = xy + 2yz + 2zx - λ(xyz - 500)

We have to solve the following system:

    ∂L/∂x = y + 2z - λyz = 0
    ∂L/∂y = x + 2z - λxz = 0
    ∂L/∂z = 2y + 2x - λxy = 0
    ∂L/∂λ = 500 - xyz = 0

There are no general rules for solving nonlinear systems of equations; sometimes some ingenuity is required. Usually we eliminate λ from the equations and try to solve the remaining system:

    λ = (y + 2z)/(yz) = (x + 2z)/(xz) = (2y + 2x)/(xy).

From the previous equalities we obtain that

    1/z + 2/y = 1/z + 2/x = 2/x + 2/y.

The first equality shows us that x = y, and the last equality assures us that y = 2z, so x = y = 2z.
If we substitute these values in the constraint equality we get 4z³ = 500, wherefrom we obtain z = 5, x = y = 10 and λ = (10 + 10)/50 = 2/5. In consequence (10, 10, 5, 2/5) is the unique critical point of L.
Step 3. L(x, y, z) = xy + 2xz + 2yz - (2/5)(xyz - 500)
Step 4. In order to solve the equation

    V'ₓ(10, 10, 5)v₁ + V'_y(10, 10, 5)v₂ + V'_z(10, 10, 5)v₃ = 0

we compute first V'ₓ = yz, V'_y = xz and V'_z = xy, hence

    V'ₓ(10, 10, 5) = 50,   V'_y(10, 10, 5) = 50   and   V'_z(10, 10, 5) = 100.

The equation to be solved is

    50v₁ + 50v₂ + 100v₃ = 0,

wherefrom we have v₃ = -(1/2)(v₁ + v₂).
Step 5.

    d²L(10,10,5)(v₁, v₂, v₃) = d²L(10,10,5)(v₁, v₂, -(1/2)(v₁ + v₂))
      = 2 ∂²L/∂x∂y(10, 10, 5)v₁v₂ + 2 ∂²L/∂x∂z(10, 10, 5)v₁(-(1/2)(v₁ + v₂))
        + 2 ∂²L/∂y∂z(10, 10, 5)v₂(-(1/2)(v₁ + v₂))
      = 2(1 - (2/5) · 5)v₁v₂ - (2 - (2/5) · 10)v₁(v₁ + v₂) - (2 - (2/5) · 10)v₂(v₁ + v₂)
      = -2v₁v₂ + 2v₁(v₁ + v₂) + 2v₂(v₁ + v₂)
      = 2v₁² + 2v₁v₂ + 2v₂² = v₁² + v₂² + (v₁ + v₂)² > 0

for (v₁, v₂) ≠ (0, 0), which implies that (10, 10, 5) is a local constrained minimum point for S.
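The same answer can be cross-checked numerically; the sketch below (not part of the text, assuming scipy) minimizes S under the volume constraint.

```python
from scipy.optimize import minimize

# Minimize S = xy + 2yz + 2zx subject to xyz = 500.
S = lambda v: v[0]*v[1] + 2*v[1]*v[2] + 2*v[2]*v[0]
volume = {'type': 'eq', 'fun': lambda v: v[0]*v[1]*v[2] - 500}

res = minimize(S, x0=[5.0, 5.0, 5.0],
               constraints=[volume], bounds=[(1e-6, None)] * 3)
print(res.x, res.fun)   # approximately [10, 10, 5] and 300
```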
Example 4. Consider the problem of optimizing the function

    f : R³ → R,   f(x, y, z) = x + 2y + 3z

on the constraint set defined by

    g(x, y, z) = x² + y² = 1   and   h(x, y, z) = x + z = 1.

Solution. First, compute the Jacobian matrix of the constraint functions:

    D(g, h)(x, y, z) = [ 2x  2y  0
                          1   0  1 ]

Its rank is less than 2 if and only if x = y = 0. Since no point of the form (0, 0, z) satisfies the first constraint, the rank of the Jacobian matrix is two at all points of the constraint set.
Next, form the Lagrangean

    L(x, y, z, λ, μ) = x + 2y + 3z - λ(x² + y² - 1) - μ(x + z - 1)

and set its first partial derivatives equal to 0:

    ∂L/∂x = 1 - 2λx - μ = 0
    ∂L/∂y = 2 - 2λy = 0
    ∂L/∂z = 3 - μ = 0
    ∂L/∂λ = -(x² + y² - 1) = 0
    ∂L/∂μ = -(x + z - 1) = 0

Solve the second and third equations for λ and μ and plug these into the first equation to obtain: μ = 3, λ = 1/y and 1 - 2x/y - 3 = 0, which implies that x = -y.
Then, solve the fourth equation for x and the last equation for z:

    2x² = 1 ⟹ x = 1/√2, y = -1/√2, z = 1 - 1/√2, λ = -√2, μ = 3
            or x = -1/√2, y = 1/√2, z = 1 + 1/√2, λ = √2, μ = 3.
For each of the two previous critical points it remains for us to follow Steps 3 to 5. We will analyse just the first case; the second is similar and is left to the reader.
Step 3. L(x, y, z) = x + 2y + 3z + √2(x² + y² - 1) - 3(x + z - 1)
Step 4. We have to solve the following system (we denote a = (1/√2, -1/√2, 1 - 1/√2)):

    g'ₓ(a)v₁ + g'_y(a)v₂ + g'_z(a)v₃ = 0
    h'ₓ(a)v₁ + h'_y(a)v₂ + h'_z(a)v₃ = 0

Easy computations lead us to

    √2 v₁ - √2 v₂ = 0
    v₁ + v₃ = 0

hence v₂ = v₁ and v₃ = -v₁.
Step 5.

    d²L(a)(v₁, v₂, v₃) = d²L(a)(v₁, v₁, -v₁) = 2√2 v₁² + 2√2 v₁² = 4√2 v₁² > 0,

so a = (1/√2, -1/√2, 1 - 1/√2) is a local constrained minimum point for f.
The significance of the Lagrange multipliers
It is possible to solve a constrained optimization problem by the method of Lagrange multipliers without obtaining numerical values for the Lagrange multipliers. However, the multipliers play an important role in economic analysis, since they measure the sensitivity of the optimal value of the objective function to changes in the right-hand sides of the constraints.
We analyse first the simplest problem: two variables and one equality constraint. Let f, g : D → R, D ⊆ R²:

    optimize f(x, y)
    subject to g(x, y) = c     (6)
Consider c as a parameter, c ∈ R, which may vary.
For any fixed value of c we denote by (a(c), b(c)) the solution of the previous problem, by λ(c) the multiplier which corresponds to this solution, and by f(a(c), b(c)) the corresponding optimal value of the objective function.
We will show that λ(c) measures the rate of change of the optimal value of f with respect to the parameter c.
Theorem 3. Let f, g : D → R be C¹ functions of two variables. Let (a(c), b(c)) be the solution of problem (6), with corresponding multiplier λ(c). Suppose that a, b and λ are C¹ functions of c and that

    g'ₓ(a(c), b(c)) ≠ 0   or   g'_y(a(c), b(c)) ≠ 0.

Then

    λ(c) = (d/dc) f(a(c), b(c)).

The derivative in the previous equality is taken with respect to c, since f(a(c), b(c)) can be seen as a function of the single variable c.
Proof. From Theorem 1 we know that at an extreme point (a(c), b(c)) we have

    f'ₓ(a(c), b(c)) = λ(c) g'ₓ(a(c), b(c)),
    f'_y(a(c), b(c)) = λ(c) g'_y(a(c), b(c))   and
    g(a(c), b(c)) = c.

If we differentiate the latter equality with respect to c we get

    g'ₓ(a(c), b(c)) a'(c) + g'_y(a(c), b(c)) b'(c) = 1.

By the chain rule for partial derivatives:

    (d/dc) f(a(c), b(c)) = f'ₓ(a(c), b(c)) a'(c) + f'_y(a(c), b(c)) b'(c)
                         = λ(c) g'ₓ(a(c), b(c)) a'(c) + λ(c) g'_y(a(c), b(c)) b'(c)
                         = λ(c) [g'ₓ(a(c), b(c)) a'(c) + g'_y(a(c), b(c)) b'(c)]
                         = λ(c) · 1 = λ(c).

This completes the proof.
Remark 1. Under the assumptions of Theorem 3 we have

    λ ≈ the change in the optimal value of f due to a 1-unit change in c.     (7)

Proof. We know, from Theorem 3, that

    λ(c) = (d/dc) f(a(c), b(c)) = lim (h → 0) [f(a(c + h), b(c + h)) - f(a(c), b(c))]/h ≈ Δf/Δc.

In consequence

    Δf ≈ λ Δc.     (8)

If we take Δc = 1 in (8), we get Δf ≈ λ, as desired.
Example 5. Suppose the consumer in Example 1 has 1201 m.u. instead of 1200 m.u. to spend on the two commodities. Estimate how the additional 1 m.u. will affect the maximum utility.
Solution. From Example 1 we know that

    λ = (3/10)(y/x)^0.4.

Since the maximum value M of the utility when 1200 m.u. were available occurred at x = 18 and y = 8, substituting these values into the formula for λ we get

    λ = 0.3 (8/18)^0.4 ≈ 0.22,

which is (see Remark 1) approximately the increase ΔM in maximum utility resulting from the 1 m.u. increase in available funds.
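This interpretation of the multiplier is easy to test numerically; the following sketch (our own check, assuming scipy) recomputes the maximum utility for budgets 1200 and 1201 and compares the difference with λ.

```python
from scipy.optimize import minimize

def max_utility(budget):
    # Maximum of 20 x^0.6 y^0.4 subject to 40x + 60y = budget.
    res = minimize(lambda v: -20 * v[0]**0.6 * v[1]**0.4,
                   x0=[10.0, 10.0],
                   constraints=[{'type': 'eq',
                                 'fun': lambda v: 40*v[0] + 60*v[1] - budget}],
                   bounds=[(1e-6, None)] * 2)
    return -res.fun

print(max_utility(1201) - max_utility(1200))  # approximately 0.22
print(0.3 * (8 / 18)**0.4)                    # lambda = 0.2216...
```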
The statement of the natural generalization of Theorem 3 to several variables and several equality constraints is the following.
Theorem 4. Let f, g₁, . . . , gₘ : D → R, D ⊆ Rⁿ, be C¹ functions, m < n. Let (a₁(c), a₂(c), . . . , aₙ(c)), c = (c₁, . . . , cₘ), be the solution of problem (1). Suppose that a₁, . . . , aₙ, λ₁, . . . , λₘ are differentiable functions of the parameters (c₁, . . . , cₘ) and that condition (2) holds.
Then, for each j = 1, . . . , m, we have

    λⱼ*(c) = (∂f/∂cⱼ)(a₁(c₁, . . . , cₘ), . . . , aₙ(c₁, . . . , cₘ)).

In the previous equality λⱼ* describes (approximately) the influence of cⱼ (the j-th component of the constraint constants) on the change of the optimal value of the problem.
4.7 Applications to economics
4.7.1 The method of least squares
Scientists studying the data from observations or experiments are often interested in determining a function that fits the data reasonably well. Suppose that we are studying a relationship between two variables, so that each observation can be represented by a point (x, y) in the plane.
The method of least squares is used for determining a function which approximates a set of given points, in the sense that its graph is closest to those points.
Suppose we have n points P₁(x₁, y₁), . . . , Pₙ(xₙ, yₙ) which describe a relationship between the two variables x and y. Usually these data are presented in a table of the following form:

    x | x₁  x₂  . . .  xₙ
    y | y₁  y₂  . . .  yₙ

The first step is to determine what type of function to look for. This can be done by theoretical analysis of the practical situation or by inspection of the graph of the n points P₁, . . . , Pₙ. The second step is to determine the particular function whose graph is closest to the given set of points.
[Figure: the data points P₁(x₁, y₁), P₂(x₂, y₂), . . . , Pₙ(xₙ, yₙ) and the graph of a function f; at each xᵢ the vertical distance between yᵢ and f(xᵢ) is the error, illustrated at the point (x₂, f(x₂)).]
It is obvious that the error at xᵢ is yᵢ - f(xᵢ), i = 1, . . . , n. The question is how we can combine these errors in order to define the total error, which has to reflect how close the graph is to the given points.
The choice Σᵢ₌₁ⁿ (yᵢ - f(xᵢ)) is not convenient, because this sum can be 0 while its terms have large values and opposite signs.
The sum Σᵢ₌₁ⁿ |yᵢ - f(xᵢ)| reflects better how close the graph of f is to the points, but this choice is not convenient either, since the modulus function is not everywhere differentiable and we cannot use the sufficient conditions for local extrema. The sum of the squares of the vertical distances from the given points to the graph of f,

    Σᵢ₌₁ⁿ (yᵢ - f(xᵢ))²,

is the convenient choice for the total error.
We have to solve the following problem: determine the function f such that the sum Σᵢ₌₁ⁿ (yᵢ - f(xᵢ))² takes its minimum value.
From now on we restrict the discussion to the case when f is a polynomial.
Problem. Determine a polynomial f of degree at most m,

    f(x) = a₀ + a₁x + · · · + aₘxᵐ   (a₀, a₁, . . . , aₘ = ?)

such that the function

    F(a₀, a₁, . . . , aₘ) = Σᵢ₌₁ⁿ [yᵢ - (a₀ + a₁xᵢ + · · · + aₘxᵢᵐ)]²

takes its minimum value.
The unknowns are the coefficients of the polynomial. We have m + 1 unknowns: a₀, a₁, . . . , aₘ.
We will apply Theorem 2, Section 4.5.
First we determine the stationary points of F by solving the following system:

    F'(a₀) = 2 Σᵢ₌₁ⁿ [yᵢ - (a₀ + a₁xᵢ + · · · + aₘxᵢᵐ)] · (-1) = 0
    F'(a₁) = 2 Σᵢ₌₁ⁿ [yᵢ - (a₀ + a₁xᵢ + · · · + aₘxᵢᵐ)] · (-xᵢ) = 0
    . . .
    F'(aₘ) = 2 Σᵢ₌₁ⁿ [yᵢ - (a₀ + a₁xᵢ + · · · + aₘxᵢᵐ)] · (-xᵢᵐ) = 0
The previous system can be written in the following way:

    Σᵢ₌₁ⁿ yᵢ = a₀ Σᵢ₌₁ⁿ 1 + a₁ Σᵢ₌₁ⁿ xᵢ + · · · + aₘ Σᵢ₌₁ⁿ xᵢᵐ
    Σᵢ₌₁ⁿ yᵢxᵢ = a₀ Σᵢ₌₁ⁿ xᵢ + a₁ Σᵢ₌₁ⁿ xᵢ² + · · · + aₘ Σᵢ₌₁ⁿ xᵢ^(m+1)
    . . .
    Σᵢ₌₁ⁿ yᵢxᵢᵐ = a₀ Σᵢ₌₁ⁿ xᵢᵐ + a₁ Σᵢ₌₁ⁿ xᵢ^(m+1) + · · · + aₘ Σᵢ₌₁ⁿ xᵢ^(2m)

By denoting Σᵢ₌₁ⁿ xᵢᵏ = sₖ, k = 0, . . . , 2m, and Σᵢ₌₁ⁿ yᵢxᵢˡ = tₗ, l = 0, . . . , m (here we make the convention xᵢ⁰ = 1, i = 1, . . . , n), the system to be solved becomes:
    s₀a₀ + s₁a₁ + · · · + sₘaₘ = t₀
    s₁a₀ + s₂a₁ + · · · + sₘ₊₁aₘ = t₁
    . . .
    sₘa₀ + sₘ₊₁a₁ + · · · + s₂ₘaₘ = tₘ     (1)

The previous system is called the normal system.
It can be shown that the normal system has a unique solution, which is a minimum point for F (we will not prove these statements).
In conclusion, the solution of the normal system gives us the coefficients of the desired polynomial.
The coefficients s₀, s₁, . . . , s₂ₘ; t₀, t₁, . . . , tₘ can be determined by arranging the calculation in the following table:
    x⁰ᵢ   xᵢ    xᵢ²   . . .  xᵢ^(2m) |  yᵢ     yᵢxᵢ    . . .  yᵢxᵢᵐ
    1     x₁    x₁²   . . .  x₁^(2m) |  y₁     y₁x₁    . . .  y₁x₁ᵐ
    . . . . . . . . . . . . . . . . .|. . . . . . . . . . . . . . .
    1     xₙ    xₙ²   . . .  xₙ^(2m) |  yₙ     yₙxₙ    . . .  yₙxₙᵐ
    -----------------------------------------------------------------
    Σ:  s₀   s₁   s₂   . . .  s₂ₘ    |  t₀     t₁      . . .  tₘ     (2)

In the case m = 1 the desired function is a polynomial of degree one, f(x) = a₀ + a₁x, whose graph is called the least squares line.
Example 1. On election day, the polls open at 8:00 A.M. Every 2 hours after that, an election official determines what percentage of the registered voters have already cast their ballots. The data through 6:00 P.M. are shown below.

    time       | 10:00  12:00  2:00  4:00  6:00
    percentage |  12     19     24    30    37

Find the equation of the least-squares line (let x denote the number of hours after 8:00 A.M.). Use the least-squares line to predict what percentage of the registered voters will have cast their ballots by the time the polls close at 8:00 P.M.
Solution. Let x denote the number of hours after 8:00 A.M. and y the percentage.

    x |  2   4   6   8  10
    y | 12  19  24  30  37

Arrange the calculations as follows:

    x⁰ᵢ    xᵢ   xᵢ²  |  yᵢ   yᵢxᵢ
    1       2    4   |  12    24
    1       4   16   |  19    76
    1       6   36   |  24   144
    1       8   64   |  30   240
    1      10  100   |  37   370
    ----------------------------------
    Σ:  s₀ = 5   s₁ = 30   s₂ = 220  |  t₀ = 122   t₁ = 854
The normal system is

    5a₀ + 30a₁ = 122
    30a₀ + 220a₁ = 854

The solution of the previous system is given by

    a₀ = det[122  30; 854  220] / det[5  30; 30  220] = (122 · 220 - 30 · 854)/(5 · 220 - 30 · 30) = 1220/200 = 6.1

    a₁ = det[5  122; 30  854] / det[5  30; 30  220] = (5 · 854 - 30 · 122)/200 = 610/200 = 3.05

So the least squares line is

    f(x) = 6.1 + 3.05x.

To predict the percentage at 8:00 P.M., substitute 12 (the number of hours after 8:00 A.M.) into the equation of the least-squares line. This gives

    y = 6.1 + 3.05 · 12 = 42.7,

which suggests that the percentage at 8:00 P.M. might be about 42.7.
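The normal-system computation above can be verified with any standard least-squares routine; a sketch (not from the text, assuming numpy is available):

```python
import numpy as np

x = np.array([2, 4, 6, 8, 10])
y = np.array([12, 19, 24, 30, 37])

a1, a0 = np.polyfit(x, y, 1)   # polyfit returns the slope first, then the intercept
print(a0, a1)                   # 6.1 and 3.05
print(a0 + a1 * 12)             # prediction at 8:00 P.M.: 42.7
```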
4.7.2 Inventory control. The economic order quantity model
Inventory is the set of items (goods or materials) that are held by an organization for later use.
For every type of item held in inventory we want to establish how much should be ordered each time and when the reordering should occur.
The objective is to minimize the variable inventory costs, which are: ordering costs and holding costs.
Ordering costs are the expenses of processing an order (these costs are independent of the order quantity).
Holding costs are rent, heat, salaries etc.
The economic order quantity model is the simplest and the oldest of the inventory models. It uses unrealistic assumptions, but it gives a reasonable first approximation of the given situation.
The assumptions in this model are the following:
- we study a single product with a constant demand
- no shortages (stockouts) are allowed
- the order is constant
- the time between the orders is constant
- the lead time is 0 (the lead time is the time between the ordering moment and the receipt of the goods; so goods arrive on the same day they are ordered).
The elements of this model are:
a) θ - the entire period of time (known)
b) D - the demand over the entire period (known)
c) Q - the order quantity (unknown)
d) T - the time interval between the orders (unknown)
e) C_h - the holding cost per item per day (known)
f) n - the number of orders (unknown)
g) C₀ - the ordering cost, which is fixed for each order (known).
[Figure: the sawtooth inventory level over time: the stock starts at Q, is depleted linearly during each cycle of length T, and jumps back to Q at the reorder times T, 2T, . . . , nT.]
We have to determine the costs per cycle and then to multiply them by n in order to obtain the total variable costs. The cost per cycle consists of the ordering cost C₀ and the holding cost C_h · (Q/2) · T (the average stock during a cycle is Q/2), so the total cost function will be:

    C_Σ = n(C₀ + C_h · (Q/2) · T).
We have to solve the following constrained problem:

    C_Σ(n, Q, T) = nC₀ + nC_h · (Q/2) · T → min
    subject to the constraints: θ = nT, D = nQ.

We will solve the previous constrained optimization problem by using the elimination method. Since n = D/Q and θ = nT, the total cost function depends on Q alone, and it remains for us to find the minimum value of the following one-variable function:

    C(Q) = (D/Q)C₀ + C_h · (Q/2) · θ.
The critical points are obtained by solving the equation

    C'(Q) = -DC₀/Q² + C_h θ/2 = 0,

wherefrom we get

    Q² = 2DC₀/(C_h θ)   and   Q = √(2DC₀/(C_h θ)).

Since C''(Q) = 2DC₀/Q³ > 0, we have that the optimal order quantity is

    Q* = √(2DC₀/(C_h θ)).

Easy computations will give us the values of all the unknowns mentioned before.
The minimum cost is

    C* = (D/Q*)C₀ + C_h · (Q*/2) · θ = √(2DC₀C_h θ).

The optimal number of orders is

    n* = D/Q* = √(DC_h θ/(2C₀))

and the optimal time between two orders is

    T* = θ/n* = √(2C₀θ/(DC_h)).
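The closed-form formulas translate directly into code; the sketch below (our own illustration with made-up input values, not from the text) computes Q*, n*, T* and C* for a product whose data are known.

```python
from math import sqrt

def eoq(theta, D, C0, Ch):
    """Economic order quantity model: theta = period length in days,
    D = demand over the period, C0 = ordering cost, Ch = holding cost/item/day."""
    Q = sqrt(2 * D * C0 / (Ch * theta))          # optimal order quantity Q*
    return {'Q*': Q,
            'n*': D / Q,                          # optimal number of orders
            'T*': theta * Q / D,                  # optimal time between orders
            'C*': sqrt(2 * D * C0 * Ch * theta)}  # minimum total cost

print(eoq(theta=360, D=7200, C0=50.0, Ch=0.01))
```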
Part III
Probabilities
A short history of probabilities
It is said that the theory of probabilities as a branch of mathematics appeared in the middle of the 17th century in France. Antoine Gombaud, Chevalier de Méré (a French nobleman), proposed his gambling problems to Blaise Pascal (1623-1662), who started a mathematical correspondence with Pierre de Fermat (1601-1665). The gambling problems were:
- how many throws of two dice are required so that the chance of obtaining at least one double six is greater than one half;
- how to share the wagered money between two gamblers if the game is interrupted before it ends.
The legend says that de Méré's gambling problems made the beginning of the theory of probabilities. Actually, the legend is not entirely true, since years before Pascal and Fermat, problems of a probabilistic nature had been analysed by some mathematicians. It would be more realistic to say that Pascal and Fermat initiated the fundamental principles of probability theory as we know them now (the theory started as an empirical science).
There are at least two distinct roots of probability theory. The first one is the processing of statistical data for determining mortality tables and insurance rates (the Babylonians had forms of maritime insurance, the Romans had annuities, and elements of empirical probability were applied to the census of the population in China). The second one is gambling, which appeared in the early stages of human history in many places of the world. The predecessor of the dice was the astragalus (a heel bone of an animal; the bones were used both for religious ceremonies and for gambling).
It took more than 2000 years of dice games, card games, etc. before someone developed the basic probabilistic rules. There are at least two reasons for this late appearance of probabilistic abstractions: Greek philosophy (its antiempiricism was against the quantification of random events) and early Christian theology (every event was supposed to be a direct manifestation of God's intervention, and in consequence every probabilist could be considered a heretic).
The first reasoned considerations which put rudimentary probabilistic bases under the games of chance were presented in the manuscript "The book on games of chance", written around 1550 by Gerolamo Cardano and found after his death in 1576 (the manuscript was printed only in 1663). G. Cardano was a physician addicted to gambling. It is said that he sold all his wife's possessions just to get table stakes. It can be said that the classical definition of probability came out of his obsession for gambling.
The next paper on probability, "On a discovery concerning dice", is due to Galileo Galilei (presumably written between 1613 and 1623).
The Pascal-Fermat exchange of letters (1654) remained unpublished until 1657. Even though the correspondence solved a set of isolated problems in probability, we cannot say that the obtained results put the basis of a new theory.
But the strong influence on many mathematicians, focused initially on gambling and then on other branches of mathematics and science, led to the idea that the history of probability begins with the correspondence between Pascal and Fermat.
One of those who heard about the correspondence was a Dutch mathematician, Christiaan Huygens (1629-1695). In 1657, after a visit to Paris where he met neither Pascal (at that time Pascal had abandoned mathematics for religion) nor Fermat, he published the first book on probability, "On reasoning in games of dice", in which he solved the same problems that had already been solved by Fermat and Pascal; he also proposed and solved some new problems, and he introduced the concept of mathematical expectation as "the value of the chance". Huygens' book remained a standard introduction to the subject for about half a century.
During the same period, important advances were made in the collection of demographic data and the development of the science known today as statistics. John Graunt (1620-1674) made a semimathematical study of mortality and insurances. His work was extended by Sir William Petty (1623-1687) and by Edmund Halley (1656-1742), who developed mortality tables and is considered to have initiated the science of life statistics.
Because of the games of chance, probability theory became popular, and the subject developed rapidly during the 18th century. The major contributors during this period were:
Jacob Bernoulli (1654-1705), whose most important result was the Law of large numbers.
Abraham de Moivre (1667-1754), who derived the theory of permutations and combinations from the principles of probability and founded the theory of annuities. In 1733 he discovered the equation of the normal curve. The normal curve is known as the Gaussian curve or the Gauss-Laplace curve in honor of the Marquis de Laplace (1749-1827) and Karl Friedrich Gauss (1777-1855), who independently rediscovered the equation. Gauss obtained it from a study of errors in repeated measurements of the same quantity. Laplace made great contributions to the application of probability to astronomy and introduced the use of partial differential equations into the study of probability.
Between 1835 and 1870 the Belgian scientist Lambert A.J. Quetelet (1796-1874) showed that biological and anthropological measurements follow the normal curve, and he applied statistical methods in biology, education and sociology.
Simeon Denis Poisson (1781-1840) published in 1837 "Research on the probability of judgments in criminal and civil matters", where the Poisson distribution first appears.
Probability theory has been developed since the 17th century and now has applications in many fields, such as: actuarial mathematics, statistical mechanics, genetics, law, medicine, meteorology, etc.
The major difficulty in developing a rigorous theory of probabilities was to give a definition of probability that is rigorous enough to be used in mathematics but at the same time applicable to the real world. It took three centuries until an acceptable definition was obtained.
Andrey Nikolaevich Kolmogorov (1903-1987) presented an axiomatic definition of probability; this work is the basis for the modern theory of probabilities.
Chapter 5
Counting techniques.
Tree diagrams
In this section we present some techniques for determining, without direct enumeration, the number of possible outcomes of a particular experiment or the number of elements in a particular set. Such counting problems are called combinatorial problems, because we count the number of ways in which different possible outcomes can be combined.
As the fundamental rules of all combinatorics we consider the addition rule and the multiplication rule. While these rules are very easy to state, they are useful in many varied and complicated situations.
5.1 The addition rule
The number of elements in a given set A is called the cardinality of the set A, and is denoted by card A or |A|.
If A is the empty set, then its cardinality is 0.
If A is an infinite set, then its cardinality is ∞.
We will analyze only finite sets in this section.
Cardinality of unions
Let A and B be two finite sets. Then

    card(A ∪ B) = card A + card B - card(A ∩ B).

The previous equality is obvious: if we add the cardinality of A to the cardinality of B, we have added the cardinality of the intersection A ∩ B twice. Hence we have to subtract the cardinality of A ∩ B once from card A + card B.
As a corollary, if the sets A and B are mutually exclusive, then the cardinality of the union is the sum of the cardinalities:

    card(A ∪ B) = card A + card B,   if A ∩ B = ∅.
The generalization of the previous result to an arbitrary union of n finite sets is called the inclusion-exclusion principle.
The inclusion-exclusion principle. Let A₁, A₂, . . . , Aₙ be n finite sets. Then:

    card(A₁ ∪ · · · ∪ Aₙ) = Σᵢ₌₁ⁿ card Aᵢ - Σ (1≤i<j≤n) card(Aᵢ ∩ Aⱼ)
        + Σ (1≤i<j<k≤n) card(Aᵢ ∩ Aⱼ ∩ Aₖ) - · · · + (-1)ⁿ⁺¹ card(A₁ ∩ · · · ∩ Aₙ).

If the sets (Aᵢ), i = 1, . . . , n, are mutually exclusive, then:

    card(A₁ ∪ · · · ∪ Aₙ) = Σᵢ₌₁ⁿ card Aᵢ,   if Aᵢ ∩ Aⱼ = ∅ for i ≠ j.
Example. In an association gathering 95 people, 72 play backgammon, 44 play chess and 30 do not play either of these two games. How many of them play both backgammon and chess?
Solution. We will use the following notations:
A - the set of all members of the association
B - the set of backgammon players
C - the set of chess players.
We are given the following data:

    card A = 95,   card B = 72,   card C = 44

and

    card(A \ (B ∪ C)) = 30,

wherefrom we can easily get:

    card(B ∪ C) = 95 - 30 = 65.

By applying the previous formula we obtain:

    card(B ∩ C) = card B + card C - card(B ∪ C) = 72 + 44 - 65 = 51.

Hence, the number of people that play both backgammon and chess is 51.
Example. 120 people take part in a conference. Each participant can speak at least one of the languages French, Spanish and German. We also know that:
10 people speak all three languages
4 people speak French and Spanish, but not German
8 people speak only Spanish
100 people speak French
32 people speak Spanish
53 people speak German.
Determine:
- the number of people who speak Spanish and German but not French;
- the number of people who speak French and German but not Spanish;
- the number of people who speak only German;
- the number of people who speak only French.
Solution. We consider the following Venn-Euler diagram.
[Venn diagram: three overlapping discs for French, German and Spanish. F, G and S denote the regions of those who speak only French, only German and only Spanish; B = French and German only, C = French and Spanish only, D = Spanish and German only, A = all three languages.]
Each disc among the 3 drawn corresponds to a group who speak a language (French, German or Spanish).
In order to simplify the notations we will denote by small corresponding letters the cardinalities of the involved sets (for instance a = card A).
By applying the addition rule with mutually exclusive sets, our problem reduces to solving the following system of linear equations:

    a + b + c + d + f + g + s = 120
    a = 10
    c = 4
    s = 8
    f + c + a + b = 100
    s + c + a + d = 32
    g + a + b + d = 53

⟹  a = 10, c = 4, s = 8
    b + d + f + g = 120 - 10 - 4 - 8 = 98
    f + b = 100 - 14 = 86
    d = 32 - 22 = 10
    g + b = 53 - 10 - 10 = 33

⟹  a = 10, c = 4, s = 8, d = 10
    f + b = 86
    g + b = 33
    b + f + g = 88

⟹  a = 10, c = 4, s = 8, d = 10
    g = 88 - 86 = 2
    b = 33 - 2 = 31
    f = 88 - 33 = 55

In conclusion: a = 10, b = 31, c = 4, d = 10, f = 55, g = 2, s = 8.
Hence:
- there are d = 10 people who speak Spanish and German, but not French;
- there are b = 31 people who speak French and German, but not Spanish;
- there are g = 2 people who speak only German;
- there are f = 55 people who speak only French.
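The solution can be double-checked by adding the seven disjoint regions back together; a small sketch (ours, not from the text):

```python
# The seven disjoint regions of the Venn diagram, with the values found above.
a, b, c, d, f, g, s = 10, 31, 4, 10, 55, 2, 8

print(f + c + a + b)              # French speakers: 100
print(s + c + a + d)              # Spanish speakers: 32
print(g + a + b + d)              # German speakers: 53
print(a + b + c + d + f + g + s)  # all participants: 120
```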
The addition rule can be formulated in general terms involving objects, operations or symbols, but the main idea is the same.
The addition rule
If there are m possible outcomes for an event (or ways to do something) and n possible outcomes for another event (or ways to do another thing), and the two events cannot both occur (or the two things cannot both be done), then there are m + n ways for one of the two events to occur (or m + n total possible ways to do one of the things).
Formally, the sum of the sizes of two disjoint sets is equal to the size of their union.
Example. A square with side length 3 is divided by parallel lines into 9 equal squares. What is the total number of squares obtained by this procedure?
Solution. We divide the squares into three sets S₁, S₂, S₃ such that the set Sᵢ contains all squares of side length i (i = 1, 2, 3).
It is obvious that card S₁ = 9, card S₂ = 4 and card S₃ = 1, hence the total number of squares is

    card(S₁ ∪ S₂ ∪ S₃) = card S₁ + card S₂ + card S₃ = 9 + 4 + 1 = 14.
5.2 Tree diagrams and the multiplication principle
Consider an experiment that takes place in several steps, such that the number of outcomes at the n-th step is independent of the outcomes of the previous steps. The number of outcomes at each step may be different for different steps. We have to count the number of ways in which the entire experiment can occur.
The best way to analyze such multistep problems is by drawing a tree diagram. We list the possible outcomes of the first step, and then draw lines to represent the possible outcomes that can occur in the second step, and so on.
To clarify the above description we present the following example:
Example. A cafe has the following menu:
a) two choices for appetizers: soup or juice;
b) three for the main course: lamb chops, fish or vegetable dish;
c) two for dessert: ice cream or cake.
How many possible choices do you have for your complete menu?
Solution. The complete menu is chosen in three independent steps: two choices at the first course, three at the second and two at the third.
From the following tree diagram we see that the total number of choices is the product of the number of choices at each stage.
[Tree diagram: the root branches into soup and juice; each of these branches into lamb, fish and vegetable; each of those branches into ice cream and cake, giving 12 leaves.]
We have 2 · 3 · 2 = 12 possible menus.
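The tree and the product rule can both be mimicked in a few lines; the sketch below (not from the text) enumerates the menus with itertools.product, which realizes exactly the multiplication principle.

```python
from itertools import product

appetizers = ['soup', 'juice']
mains = ['lamb chops', 'fish', 'vegetable dish']
desserts = ['ice cream', 'cake']

menus = list(product(appetizers, mains, desserts))
print(len(menus))   # 2 * 3 * 2 = 12
```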
A tree diagram is a device used to enumerate all the possible outcomes of a multistep experiment where the outcomes at each step are independent of those at the previous steps and the outcomes at each step can occur in a finite number of ways.
From the previous example we can observe that the tree is constructed from left to right, and the number of branches at each point corresponds to the number of possible outcomes of the next step of the experiment.
More rigorously, we can introduce a tree as follows.
A directed graph is a set of points, called vertices, together with a set of directed line segments, called edges, between some pairs of distinct vertices. A path from a vertex u to a vertex v in a directed graph G is a finite sequence (v₀, v₁, . . . , vₙ) of vertices of G (n ≥ 1), with v₀ = u, vₙ = v, and (vᵢ₋₁, vᵢ) an edge in G for i = 1, 2, . . . , n.
A directed graph T is a tree if it has a distinguished vertex r, called the root, such that r has no edges going into it and such that for every other vertex v of T there is a unique path from r to v.
We can easily generalize the result obtained in the previous example to multistep experiments.
The multiplication rule
If an experiment is performed in m steps, and there are n₁ choices in the first step, and for each of those choices there are n₂ choices in the second step, and so on, with nₘ choices in the last step for each of the previous choices, then the number of all possible outcomes is given by the product n₁ · n₂ · n₃ · . . . · nₘ.
Example. On a grid of sporting lotto, we have to choose one of the three boxes 1, x or 2 for each of the 9 matches (1 is chosen when the host team wins the match, x when the match finishes in a draw and 2 when the host team loses the match).
How many different choices do we have?
Solution. There are 9 steps in our experiment (since there are 9 matches on the grid), hence m = 9.
Since n₁ = n₂ = · · · = n₉ = 3 (at each step we have to choose one of the three possible boxes: 1, x or 2), the number of all choices is

    3 · 3 · . . . · 3 (9 times) = 3⁹ = 19683.
Example. How many functions can we define on a set A with card A = m, given that the target space is B with card B = n?
Solution. There are m steps in our experiment (since there are m points in the domain of the function).
Since n₁ = n₂ = · · · = nₘ = n (at each step we have to choose one of the n elements of the set B), the number of all functions is

    n · n · . . . · n (m times) = nᵐ.
Example. In a race with 20 horses, in how many different ways can the first three places be filled?
Solution. There are 3 steps in our experiment (first, second and third place), hence m = 3. There are 20 horses that can come first (n₁ = 20). Whichever horse comes first, there are 19 horses left that can come second (n₂ = 19). Whichever horses come first and second, there are 18 horses left that can come third (n₃ = 18). So there are n₁n₂n₃ = 20 · 19 · 18 = 6840 ways in which the first three positions can be filled.
Example. Ruxi and Ana are to play a tennis match. The first person who wins three sets wins the match. Draw a tree diagram which shows the possible outcomes of the match.
Solution. [Tree diagram: starting from the root, each set is won by R (Ruxi) or A (Ana); a branch stops as soon as one player has won three sets.] The possible outcomes are:

    (R, R, R), (R, R, A, R), (R, R, A, A, R), (R, R, A, A, A),
    (R, A, R, R), (R, A, R, A, R), (R, A, R, A, A),
    (R, A, A, R, R), (R, A, A, R, A), (R, A, A, A),
    (A, R, R, R), (A, R, R, A, R), (A, R, R, A, A),
    (A, R, A, R, R), (A, R, A, R, A), (A, R, A, A),
    (A, A, R, R, R), (A, A, R, R, A), (A, A, R, A), (A, A, A).

Notice that the total number of possible outcomes is 20. In this example the branches have different lengths, and this makes the counting more difficult than in the previous examples.
Example. How many natural numbers under 1000 have only even digits?
Solution. There are 4 single-digit even numbers (2, 4, 6, 8).
There are 4 · 5 numbers with two even digits, since the first digit can take 4 values (2, 4, 6, 8) and the second digit can take 5 values (0, 2, 4, 6, 8).
There are 4 · 5² numbers with three even digits.
By applying the addition rule we get the solution, which is

    4 + 4 · 5 + 4 · 5² = 4 · 31 = 124.
5.3 Permutations and combinations
Some counting problems appear so frequently in applications that they have special names and symbols. In this subsection we discuss such problems.
Permutations
Example. How many different ordered arrangements of the letters a, b, c are possible?
Solution. We can enumerate all the possibilities, which are: abc, acb, bac, bca, cab, cba. Hence there are 6 ordered arrangements.
This result can also be obtained from the multiplication rule, since the first letter can be any of the three, the second letter can be any of the two remaining letters and the third letter is the remaining one. Thus there are 3 · 2 · 1 = 6 ordered arrangements of the given letters.
Factorial notation
The product of the positive integers from 1 to n inclusive occurs frequently in mathematics and is denoted by the special symbol n! (read "n factorial"):

    n! = 1 · 2 · . . . · n.

The expression 0! is defined to be 1, to make certain formulas simpler.
Any arrangement of a set of n objects in a given order is called a permutation of the objects (taken all at a time).
Any arrangement of any k ≤ n of these objects in a given order is called a k-permutation, a permutation of the n objects taken k at a time, or an arrangement of the n objects taken k at a time.
Notations. The number of permutations of n objects taken k at a time is denoted by P(n, k) or Aᵏₙ. The number of permutations of n objects taken n at a time is denoted by P(n, n) or Pₙ.
We usually are interested in the number of such permutations without listing them.
Theorem. Given n distinct objects, the number of distinct permutations of the n objects taken k (k ≤ n) at a time is

    P(n, k) = Aᵏₙ = n(n - 1) . . . (n - k + 1) = n!/(n - k)!,   Pₙ = n!.

Proof. In the first place we can put any of the n objects (n = n - 1 + 1); in the second place we can put any of the remaining n - 1 (= n - 2 + 1) objects; and so on. In the k-th place (the last one) we can put any of the remaining n - k + 1 objects.
By applying the multiplication rule,

    Aᵏₙ = P(n, k) = n(n - 1)(n - 2) . . . (n - k + 1)
        = [n(n - 1)(n - 2) . . . (n - k + 1)(n - k) . . . 2 · 1] / [(n - k) . . . 1]
        = n!/(n - k)!.

If we take k = n in the previous formula we get Pₙ = P(n, n) = n!/0! = n!, as desired.
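In Python these quantities are available directly; a quick sketch (ours) checking the formula on the horse-race numbers from the previous section:

```python
from math import factorial, perm

n, k = 20, 3
print(perm(n, k))                         # 20 * 19 * 18 = 6840
print(factorial(n) // factorial(n - k))   # the same value, n!/(n - k)!
```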
Example. Ioana has 11 books that she is going to put on her bookshelf. Of these, 5 are law books, 3 are literature books, 2 are history books and 1 is a language book. Ioana wants to arrange her books so that all the books on the same subject are together on the shelf. How many different arrangements are possible?
Solution. For each possible ordering of the subjects, there are 5! · 3! · 2! · 1! possible arrangements. Since there are 4! possible orderings of the subjects, the desired result is 4! · 5! · 3! · 2! · 1!.
Example. In how many ways can 8 persons arrange themselves
a) in a row of 8 chairs?
b) around a circular table?
Solution. a) The eight persons can arrange themselves in P₈ = 8! ways.
b) One person can sit at any place at the circular table. The other 7 persons can then arrange themselves in P₇ = 7! ways around the table.
This is an example of a circular permutation: n objects can be arranged in a circle in (n - 1)! ways.
Example. a) In how many ways can 3 women aged 20, 4 women aged 45 and 6 women aged 75 be seated in a row so that those of the same age sit together?
b) Solve the same problem if they sit at a round table.
Solution. a) The 3 groups of women can be arranged in a row in 3! ways. In each case, the 3 women aged 20 can be seated in 3! ways, the 4 women aged 45 in 4! ways and the 6 women aged 75 in 6! ways. Thus, there are 3! · 3! · 4! · 6! arrangements.
b) The 3 groups of women can be arranged in a circle in 2! ways (see the previous example with circular permutations). Thus, in this case there are 2! · 3! · 4! · 6! arrangements.
Example. Find the number of four-letter words using only the letters a, b, c, d, e, f. Don't use a letter twice!
Solution.

    A⁴₆ = 6!/(6 - 4)! = 6!/2! = 3 · 4 · 5 · 6 = 360.
Permutations with indistinguishable objects
We will now determine the number of permutations of a set of n objects when certain objects are indistinguishable from each other.
First of all we present an example.
Example. How many different letter arrangements can be formed using the letters MUMMY?
Solution. First, note that there are not 5! permutations, since the M's are not distinguishable from each other.
If the three M's are distinguished, there are 5! permutations of the letters M₁UM₂M₃Y. Observe that the following 3! = 6 permutations

    M₁M₂M₃UY, M₁M₃M₂UY, M₂M₁M₃UY, M₂M₃M₁UY, M₃M₁M₂UY, M₃M₂M₁UY

produce the same word when the subscripts are removed. This is true for each of the other possible positions in which the M's appear.
In conclusion there are

    5!/3! = 4 · 5 = 20

different letter words that can be obtained using the letters of the word MUMMY.
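A brute-force check (our sketch): collecting the permutations of MUMMY in a set collapses the words that differ only by interchanged M's.

```python
from itertools import permutations
from math import factorial

words = set(permutations('MUMMY'))   # duplicates collapse in the set
print(len(words))                     # 20
print(factorial(5) // factorial(3))   # 5!/3! = 20
```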
Theorem. If there are n objects, with n₁ indistinguishable objects of a first type, n₂ indistinguishable objects of a second type, . . . , and nₖ indistinguishable objects of a k-th type, where n₁ + n₂ + · · · + nₖ = n, then there are

    n!/(n₁!n₂! . . . nₖ!)

linear arrangements of the given n objects.
Proof. We begin with the assumption that the objects of the same type are distinct and list all n! arrangements of these n objects. We split these arrangements into groups such that two elements of the same group differ only by the fact that objects of the same type are interchanged. Each group can be represented by one fixed arrangement with repetitions. Since the objects of the first type can be interchanged in n₁! ways, the objects of the second type in n₂! ways, . . . , and the objects of the k-th type in nₖ! ways, each group contains exactly n₁!n₂! . . . nₖ! arrangements. In conclusion, by applying the addition rule, the desired number of arrangements is

    n!/(n₁!n₂! . . . nₖ!).
Example. A university applicant has to pass four entrance exams, each graded with 2, 3 or 4 points. In order to be accepted, the applicant must get a total of at least 13 points. How many possible exam results are there such that the applicant is accepted?
Solution. In order to be accepted, the applicant can obtain a total of 13, 14, 15 or 16 points on the 4 exams.
16 points can be achieved in 1 way (four points on each exam).
15 points can be achieved in 4!/(3! · 1!) = 4 ways (four points on any three exams out of four and 3 points on the other exam).
14 points can be achieved in 4!/(3! · 1!) + 4!/(2! · 2!) = 4 + 6 = 10 ways (four points on any 3 exams and 2 points on the other exam, or four points on any two exams and 3 points on the other two exams).
13 points can be achieved in 4!/(2! · 1! · 1!) + 4!/(3! · 1!) = 12 + 4 = 16 ways (four points on 2 exams, 3 points on one exam and 2 points on the other exam, or 3 points on any 3 exams and four points on the other exam).
By applying the addition rule we get 1 + 4 + 10 + 16 = 31 possible exam results for which the applicant is accepted.
Permutations with repetitions
Let A = {a₁, . . . , aₙ} be a set with n elements.
A k-permutation with repetitions of elements of n types is a k-tuple whose components are in the set A.
Theorem. The number of all k-permutations with repetitions of elements of n types is nᵏ.
Proof. Each component of the k-tuple can take n values, and by applying the multiplication rule the desired number is

    n · n · . . . · n (k times) = nᵏ.
Combinations
Example. Ten points lie in a plane in such a way that no three of them lie on the same straight line. How many lines do these points determine?
Solution. Since each line is uniquely determined by a pair of points through which it passes, the number of all lines is equal to the number of all unordered pairs of points that can be chosen from the given set of 10. There are A²₁₀ pairs of 2 points when the order in which the points are selected is relevant. However, since every pair is counted twice, the total number of lines is equal to

    A²₁₀/2 = 10!/(8! · 2) = (9 · 10)/2 = 45.
As the previous example shows, sometimes we are interested in determining the number of different groups of k objects that could be selected from a total of n objects.
Definition. Let us consider a set with n elements.
A combination of these n elements taken k at a time is any selection of k of the n elements where the order does not count. Such a selection is called a k-combination. The number of all possible unordered selections of k different elements out of n different ones is denoted by (n over k) (read "n choose k") or by Cᵏₙ (read "combinations of n taken k at a time").
Theorem. If 0 ≤ k ≤ n then

    Cᵏₙ = Aᵏₙ/k! = n!/(k!(n - k)!).
Proof. We have Cᵏₙ ways of choosing k elements out of n without regard to order. In each case we have k elements, which can be ordered in k! ways. By applying the multiplication rule, the number of k-permutations is Cᵏₙ · k!. On the other hand, this number is Aᵏₙ. Hence

    Cᵏₙ · k! = Aᵏₙ,

wherefrom we get the desired formula.
Remark. The quantity Cᵏₙ is also called a binomial coefficient, since it occurs as a coefficient in the binomial expansion

    (a + b)ⁿ = C⁰ₙaⁿ + C¹ₙaⁿ⁻¹b + · · · + Cᵏₙaⁿ⁻ᵏbᵏ + · · · + Cⁿₙbⁿ.
Properties of the binomial coefficients
1°. Cᵏₙ = Cⁿ⁻ᵏₙ
2°. Pascal's identity: Cᵏₙ₊₁ = Cᵏₙ + Cᵏ⁻¹ₙ
3°. Sum of the binomial coefficients: C⁰ₙ + C¹ₙ + C²ₙ + · · · + Cⁿₙ = 2ⁿ
4°. Vandermonde's identity: Cᵏₙ₊ₘ = C⁰ₙCᵏₘ + C¹ₙCᵏ⁻¹ₘ + C²ₙCᵏ⁻²ₘ + · · · + CᵏₙC⁰ₘ
Proofs. 1°. Cⁿ⁻ᵏₙ = n!/((n - k)!(n - (n - k))!) = n!/((n - k)!k!) = Cᵏₙ.
2°. Expanding the right-hand side of the equality we obtain

    Cᵏₙ + Cᵏ⁻¹ₙ = n!/(k!(n - k)!) + n!/((k - 1)!(n - k + 1)!)
                = n!/((k - 1)!(n - k)!) · [1/k + 1/(n - k + 1)]
                = n!/((k - 1)!(n - k)!) · (n + 1)/(k(n - k + 1))
                = (n + 1)!/(k!(n - k + 1)!) = Cᵏₙ₊₁.

3°. Substituting a = b = 1 in the binomial expansion

    (a + b)ⁿ = C⁰ₙaⁿ + C¹ₙaⁿ⁻¹b + · · · + Cⁿₙbⁿ,

we obtain 2ⁿ = C⁰ₙ + C¹ₙ + C²ₙ + · · · + Cⁿₙ, as desired.
4°. By identifying the coefficients of xᵏ on both sides of the identity (1 + x)ᵐ⁺ⁿ = (1 + x)ᵐ(1 + x)ⁿ, that is, of

    C⁰ₘ₊ₙ + C¹ₘ₊ₙx + · · · + Cᵏₘ₊ₙxᵏ + · · · + Cᵐ⁺ⁿₘ₊ₙxᵐ⁺ⁿ
      = (C⁰ₙ + C¹ₙx + C²ₙx² + · · · + Cⁿₙxⁿ)(C⁰ₘ + C¹ₘx + C²ₘx² + · · · + Cᵐₘxᵐ),

we get that

    Cᵏₘ₊ₙ = C⁰ₙCᵏₘ + C¹ₙCᵏ⁻¹ₘ + · · · + CᵏₙC⁰ₘ.
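All four properties are easy to spot-check numerically; a sketch (ours, using math.comb) for particular values of n, m and k:

```python
from math import comb

n, m, k = 7, 5, 4

print(comb(n, k) == comb(n, n - k))                            # property 1
print(comb(n + 1, k) == comb(n, k) + comb(n, k - 1))           # Pascal's identity
print(sum(comb(n, j) for j in range(n + 1)) == 2**n)           # property 3
print(comb(n + m, k) ==
      sum(comb(n, j) * comb(m, k - j) for j in range(k + 1)))  # Vandermonde
```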
Example. A farmer buys 3 cows, 2 pigs and 4 hens from a man who has 6 cows, 5 pigs and 8 hens. Find the number of choices that the farmer has.
Solution. The farmer can choose the cows in C³₆ ways, the pigs in C²₅ ways and the hens in C⁴₈ ways. By the multiplication rule the number of choices is C³₆ · C²₅ · C⁴₈.
Example. From a group which consists of 7 boys and 4 girls we want to choose a six-member volleyball team that has at least 2 girls. In how many ways can the volleyball team be selected?
Solution. We divide all possible choices into three groups V₂, V₃, V₄ such that each team in Vᵢ, i = 2, 3, 4, contains exactly i girls. These i girls can be chosen in Cⁱ₄ ways and the remaining 6 - i team members are chosen in C⁶⁻ⁱ₇ ways. Hence,

    card Vᵢ = Cⁱ₄ · C⁶⁻ⁱ₇

and the number of all choices is

    card V₂ + card V₃ + card V₄ = C²₄C⁴₇ + C³₄C³₇ + C⁴₄C²₇ = 6 · 35 + 4 · 35 + 1 · 21 = 371.
Combinations with repetitions
Let A = {a₁, . . . , aₙ} be a set with n elements.
A k-combination with repetitions of elements of n types is an unordered group of k elements which consists of kᵢ copies of aᵢ, i = 1, . . . , n, with k₁ + k₂ + · · · + kₙ = k.
For example, {a, a, b, b, b} is a 5-combination with repetitions of the elements of the set {a, b, c, d}.
Theorem. The number of all k-combinations with repetitions of elements of n types is equal to Cᵏₖ₊ₙ₋₁.
Proof. To each k-combination with repetitions of elements of n types we associate a sequence of zeros and ones as follows.
First we write k₁ ones, then one zero, then k₂ ones, then one zero, etc., up to kₙ ones. If kᵢ = 0 for some i, there will be no ones in the corresponding position. Thus we obtain an ordered m-tuple of 0's and 1's, where

    m = k₁ + 1 + k₂ + 1 + · · · + kₙ₋₁ + 1 + kₙ = k₁ + k₂ + · · · + kₙ + n - 1 = k + n - 1.

There is a one-to-one correspondence between the k-combinations with repetitions of elements of n types and the (k + n - 1)-tuples which contain k ones and n - 1 zeros. The number of these is Cᵏₖ₊ₙ₋₁, as we needed.
Example. A domino is a rectangle divided into two squares, with each square numbered one of 0, 1, . . . , 6, repetitions allowed. How many dominoes are there?
Solution. In this case n = 7 (there are seven possibilities, 0, 1, 2, . . . , 6, for each square) and k = 2 (each of the two squares of a domino is to be numbered).
Hence, there are C²₂₊₇₋₁ = C²₈ = (8 · 7)/2 = 28 dominoes.
Example. On their way home, seven students stop at a restaurant, where each of them has one of the following: a cheeseburger, a hot dog, a taco or a fish sandwich. How many different orders are possible (from the point of view of the restaurant)?
Solution. In this case n = 4 (there are four types of food available) and k = 7 (each of the 7 students chooses one food). Hence, the number of possible orders is

    C⁷₇₊₄₋₁ = C⁷₁₀ = 10!/(7! · 3!) = (8 · 9 · 10)/6 = 120.
Remark. The problem of combinations with repetitions allowed, with given n and k, is equivalent to the following problem: how many solutions are there to the equation x₁ + x₂ + · · · + xₙ = k such that each xᵢ is a nonnegative integer?
Indeed, xᵢ represents the number of elements of type i selected, i = 1, . . . , n.
Example. How many solutions does the equation

    x₁ + x₂ + x₃ + x₄ = 7

have such that xᵢ, i = 1, . . . , 4, are nonnegative integers?
Solution. Here n = 4 and k = 7, so the answer is

    C⁷₇₊₄₋₁ = C⁷₁₀ = 10!/(7! · 3!) = 120.
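The count can be confirmed by brute force; the following sketch (ours, not from the text) enumerates all nonnegative solutions and compares with the formula.

```python
from math import comb
from itertools import product

n, k = 4, 7
brute = sum(1 for xs in product(range(k + 1), repeat=n) if sum(xs) == k)
print(brute)               # 120
print(comb(k + n - 1, k))  # C(10, 7) = 120
```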
Chapter 6
Basic probability concepts
6.1 Sample space. Events
A random experiment is a process or an action whose outcomes are not known
in advance with certainty. Classic random experiments include ipping a coin, rolling
a dice, selecting a ball from an urn and drawing a card from a deck.
Each repetition of an experiment is a trial which has an observable outcome. If we
assume in the coin experiment that the coin cannot rest on its edge, the two possible
outcomes for a trial are the occurrence of a head or the occurrence of a tail.
The set which contains all the possible outcomes for an experiment is called the sample space (denoted by $S$).
For the coin example the sample space $S$ is defined as
$$S = \{\text{head}, \text{tail}\} = \{H, T\}.$$
Some other examples:
Example 1. If the experiment consists of flipping two coins, then the sample space consists of the following 4 elements:
$$S = \{(H, H), (H, T), (T, H), (T, T)\}.$$
The outcome will be $(H, H)$ if both coins come up heads; it will be $(H, T)$ if the first coin comes up heads and the second comes up tails, etc.
Example 2. If the experiment consists of rolling a dice, then the sample space is $S = \{1, 2, 3, 4, 5, 6\}$, where the outcome $i$ means that $i$ points appeared on the dice, $i = \overline{1, 6}$.
Example 3. If the experiment consists of rolling a dice until a six is obtained, then we obtain an infinite sample space
$$S = \{6, 16, 26, 36, 46, 56, 116, 126, \ldots\}.$$
Example 4. If the experiment consists of rolling two dice, then the sample space consists of the following 36 elements:
$$S = \left\{ \begin{array}{cccccc} (1,1), & (1,2), & (1,3), & (1,4), & (1,5), & (1,6) \\ (2,1), & (2,2), & (2,3), & (2,4), & (2,5), & (2,6) \\ (3,1), & (3,2), & (3,3), & (3,4), & (3,5), & (3,6) \\ (4,1), & (4,2), & (4,3), & (4,4), & (4,5), & (4,6) \\ (5,1), & (5,2), & (5,3), & (5,4), & (5,5), & (5,6) \\ (6,1), & (6,2), & (6,3), & (6,4), & (6,5), & (6,6) \end{array} \right\},$$
where the outcome $(i, j)$ is said to occur if $i$ appears on the first dice and $j$ on the second dice.
Each element of $S$ is called an elementary event. Any subset $E$ of the sample space (or any collection of elementary events) is known as an event. Events are denoted by capital letters. Some examples of events are the following.
Example 5. In Example 2, the event that an odd number appears on the dice is $A = \{1, 3, 5\}$ and the event that the outcome is at most 4 is $B = \{1, 2, 3, 4\}$.
Example 6. In Example 4, if
$$E = \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\},$$
then $E$ is the event that the sum of the dice equals 7.
We say that an event $E \subseteq S$ is realized if the outcome of the experiment is an element of the set $E$.
The impossible event is the event that never happens. This is the event containing no outcomes, and it is denoted by $\emptyset$.
The certain event is the event that happens in each trial. This is the event that contains all possible outcomes, which is $S$.
Remark. Since events are sets of elementary events (or subsets of the sample space), we may combine them according to the usual set operations.
Operations with events
Consider an experiment whose sample space is $S$, and let $A$ and $B$ be two events of the sample space $S$.
The union of the events $A$ and $B$ is the event denoted by $A \cup B$ which consists of all elementary events that are either in $A$, or in $B$, or in both of them. That is, the event $A \cup B$ will occur if either $A$ or $B$ occurs.
For instance, in Example 5, if $A = \{1, 3, 5\}$ and $B = \{1, 2, 3, 4\}$, then $A \cup B = \{1, 2, 3, 4, 5\}$.
The intersection of the events $A$ and $B$ is the event denoted by $A \cap B$ which consists of all elementary events that are both in $A$ and in $B$. That is, the event $A \cap B$ will occur only if both $A$ and $B$ occur.
For instance, in Example 5, if $A = \{1, 3, 5\}$ and $B = \{1, 2, 3, 4\}$, then $A \cap B = \{1, 3\}$.
Two events $A$ and $B$ are said to be mutually exclusive events or disjoint events if they cannot be realized at the same time, that is, $A \cap B = \emptyset$.
We can also define unions and intersections of more than two events in a similar manner.
The contrary event of a given event $A$ is the event denoted by $\overline{A}$ which consists of all elementary events in the sample space $S$ that are not in $A$. The contrary event $\overline{A}$ will occur if and only if the event $A$ does not occur.
For instance, in Example 5, if $A = \{1, 3, 5\}$ then $\overline{A} = \{2, 4, 6\}$.
The difference of two events $A$ and $B$ is the event denoted by $A - B$ ($= A \cap \overline{B}$) which consists of all the elementary events that are in $A$ but not in $B$. The event $A - B$ will occur if and only if the event $A$ occurs and the event $B$ does not occur.
The inclusion $A \subseteq B$ holds if every elementary event in $A$ is also in $B$; in this case, whenever $A$ is realized, $B$ is realized as well.
Properties of events operations
Operations with events satisfy various identities, which are listed in the table below.

Idempotent laws:    $A \cup A = A$;  $A \cap A = A$
Associative laws:   $(A \cup B) \cup C = A \cup (B \cup C)$;  $(A \cap B) \cap C = A \cap (B \cap C)$
Commutative laws:   $A \cup B = B \cup A$;  $A \cap B = B \cap A$
Distributive laws:  $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$;  $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$
Identity laws:      $A \cup \emptyset = A$;  $A \cap S = A$;  $A \cup S = S$;  $A \cap \emptyset = \emptyset$
Involution law:     $\overline{(\overline{A})} = A$
Complement laws:    $A \cup \overline{A} = S$;  $A \cap \overline{A} = \emptyset$;  $\overline{S} = \emptyset$;  $\overline{\emptyset} = S$
De Morgan's laws:   $\overline{A \cup B} = \overline{A} \cap \overline{B}$;  $\overline{A \cap B} = \overline{A} \cup \overline{B}$

Event spaces (σ-fields)
This section contains a technical approach which can be omitted at a first reading.
We have already observed that events are subsets of $S$. The question which arises as a consequence of the previous remark is the following: which subsets of $S$ can be considered to be events?
It is obvious that if $A$ and $B$ are events, then $A \cup B$, $\overline{A}$, $A \cap B$ are also events. This is too vague; to be rigorous, we say that a subset $A$ of $S$ can be an event if it belongs to a set $\mathcal{T} \subseteq \mathcal{P}(S)$ which satisfies the following properties.
Definition. (Event space or σ-field)
Let $S$ be the sample space of a given experiment.
The set $\mathcal{T} \subseteq \mathcal{P}(S)$ is called an event space or a σ-field if the following three conditions are fulfilled:
(i) $S \in \mathcal{T}$;
(ii) if $A \in \mathcal{T}$ then $\overline{A} \in \mathcal{T}$;
(iii) if $A_j \in \mathcal{T}$, $j \in \mathbb{N}$, then $\bigcup_{j=1}^{\infty} A_j \in \mathcal{T}$.
Remark. Let $\mathcal{T}$ be an event space (on the sample space $S$). If $A, B \in \mathcal{T}$, then $A \cup B$, $A \cap B$, $A - B$ and $A \triangle B \in \mathcal{T}$.
Proof. First, we observe that $\emptyset \in \mathcal{T}$. This is true since $\emptyset = \overline{S} \in \mathcal{T}$ (according to the second rule of the previous definition).
If we now take, in the third rule of the definition of an event space, $A_1 = A$, $A_2 = B$ and $A_j = \emptyset$ for $j \geq 3$, then
$$A \cup B = \bigcup_{j=1}^{\infty} A_j \in \mathcal{T}.$$
As a consequence of De Morgan's laws we have that $\overline{A \cap B} = \overline{A} \cup \overline{B}$, which leads us to the following identity:
$$A \cap B = \overline{(\overline{A} \cup \overline{B})} \in \mathcal{T}$$
(according to the second and third rules of the previous definition).
Now, it is obvious that $A - B = A \cap \overline{B} \in \mathcal{T}$.
Similarly, $A \triangle B = (A - B) \cup (B - A) \in \mathcal{T}$.
Example. In the experiment of rolling a dice we define the following events:
A: the event that the outcome is even ($A = \{2, 4, 6\}$);
B: the event that the outcome is odd ($B = \{1, 3, 5\}$).
In this case $\mathcal{T}_1 = \{\emptyset, A, B, S\}$ is an event space, while $\mathcal{T}_2 = \{\emptyset, A, S\}$ is not an event space, since $\overline{A} = B \notin \mathcal{T}_2$.
The smallest event space which can be defined on a sample space $S$ is $\mathcal{T} = \{\emptyset, S\}$, and the largest event space is $\mathcal{P}(S)$.
Usually, if $S$ is finite then we take $\mathcal{T} = \mathcal{P}(S)$. If $S$ is infinite, then $\mathcal{P}(S)$ is too big to be useful, and a smaller collection of subsets is required.
The classical definition of probability
The classical definition was given by the French mathematician Pierre Simon Laplace in his book Théorie analytique des probabilités, in the following form:
"The probability of an event is the ratio of the number of cases favorable to it, to the number of all cases possible, when nothing leads us to expect that any of these cases should occur more than any other."
Definition. (The classical definition of probability)
We consider an experiment which has a finite number of equally likely outcomes, $S = \{s_1, \ldots, s_n\}$.
The function $P : \mathcal{P}(S) \to [0, 1]$, defined by
$$P(A) = \frac{\text{number of favorable cases for the occurrence of } A}{\text{number of all possible outcomes } (n)},$$
is a probability on $\mathcal{P}(S)$.
Example. In the experiment of rolling a pair of unbiased dice, compute the probability of the event that:
a) the sum of the dice is 11;
b) the sum of the dice is at least 11.
Solution. The number of all possible cases is $36 = 6 \cdot 6$ (see Example 4).
Let A be the event that the sum of the dice is 11 and B the event that the sum of the dice is at least 11. Then
$$A = \{(5,6), (6,5)\}, \quad B = \{(5,6), (6,5), (6,6)\},$$
$$P(A) = \frac{2}{36} = \frac{1}{18} \quad \text{and} \quad P(B) = \frac{3}{36} = \frac{1}{12}.$$
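Since the sample space here is small, the classical definition can be checked by enumerating all 36 outcomes (a minimal sketch using exact fractions):

from fractions import Fraction

outcomes = [(i, j) for i in range(1, 7) for j in range(1, 7)]
pA = Fraction(sum(1 for o in outcomes if sum(o) == 11), len(outcomes))
pB = Fraction(sum(1 for o in outcomes if sum(o) >= 11), len(outcomes))
print(pA, pB)  # 1/18 1/12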
Example. In the experiment of drawing a card from a deck of 52 cards, compute the probability of drawing a king or a spade.
Solution. Out of the 52 cards, there are 13 spades and 4 kings. The number of favorable cases is $13 + 4 - 1 = 16$ (in order not to count the king of spades twice). The desired probability is therefore
$$\frac{16}{52} = \frac{4}{13}.$$
Classical probability suffers from a serious limitation, since its definition considers all outcomes to be equiprobable. This can be useful for drawing cards, rolling dice or extracting balls from an urn, but it cannot help us in experiments whose outcomes have unequal probabilities.
The axiomatic definition of probability
The axiomatic approach builds up probability theory from a small number of axioms.
Definition. Let $S$ be the sample space of a given experiment and let $\mathcal{T} \subseteq \mathcal{P}(S)$ be an event space on $S$.
A function $P : \mathcal{T} \to \mathbb{R}$ which satisfies the following properties:
i) $P(A) \geq 0$, $\forall A \in \mathcal{T}$;
ii) $P(S) = 1$;
iii) $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$, for each sequence $(A_i)_{i \geq 1} \subseteq \mathcal{T}$ of mutually exclusive events ($A_i \cap A_j = \emptyset$, $i \neq j$),
is called a probability function on $\mathcal{T}$.
The previous definition was introduced by the Russian mathematician Kolmogorov in 1933.
Definition. (Probability space)
Let $S$ be a sample space of a given experiment, let $\mathcal{T} \subseteq \mathcal{P}(S)$ be an event space on $S$ and let $P$ be a probability function on $\mathcal{T}$. Then the triple $(S, \mathcal{T}, P)$ is called a probability space.
Elementary properties of a probability function
From the previous definition, with its three axioms, we can deduce many of the properties that one would expect a probability function to have.
Let $(S, \mathcal{T}, P)$ be a probability space.
P1) $P(\emptyset) = 0$.
Proof. Since $S = S \cup \emptyset \cup \emptyset \cup \ldots$ and the sequence $S, \emptyset, \emptyset, \ldots$ consists of mutually exclusive events, by applying the third axiom of the probability function we get
$$P(S) = P(S) + \sum_{i=1}^{\infty} P(\emptyset),$$
which is equivalent to $\sum_{i=1}^{\infty} P(\emptyset) = 0$. The last equality cannot be true unless $P(\emptyset) = 0$.
Of course, the fact that the impossible event has probability 0 is natural.
Of course, the fact that the impossible event has probability 0 is natural.
P2) For each pair A, B T of mutually exclusive events (A B = ) we have
P(A B) = P(A) +P(B).
Proof. Since A B = A B . . . and P() = 0 we get
P(A B) = P(A) +P(B) +

i=1
P() = P(A) +P(B),
as desired.
P3) A T, P(A) = 1 P(A).
Proof. For each A T we have A A = S and A A = . By applying the
previous property and the second axiom we get
1 = P(S) = P(A A) = P(A) +P(A)
which implies that
P(A) = 1 P(A).
P4) For each A, B T with A B we have:
a) P(B A) = P(B) P(A);
b) P(A) P(B).
Proof. Since A B we get that B = A(B A) and A(B A) = . From (P2)
we have that
P(B) = P(A) +P(B A)
and since P(B A) 0 we obtain the inequality P(B) P(A).
P5) A T, 0 P(A) 1.
Proof. Clearly, A S for every event A. Then the rst axiom and (P4) give
0 = P() P(A) P(S) = 1.
P6) The addition rules of probabilities
i) The case of two events:
$$\forall A, B \in \mathcal{T}: \quad P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
ii) The case of three events:
$$\forall A, B, C \in \mathcal{T}: \quad P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$
Proof. i) It is clear (by means of a Venn diagram, for example) that
$$A \cup B = A \cup [B - (A \cap B)].$$
Then, by using (P2) and (P4), we get
$$P(A \cup B) \overset{(P2)}{=} P(A) + P(B - (A \cap B)) \overset{(P4)}{=} P(A) + P(B) - P(A \cap B).$$
ii) We apply part (i) repeatedly:
$$P(A \cup B \cup C) = P((A \cup B) \cup C) \overset{(i)}{=} P(A \cup B) + P(C) - P((A \cup B) \cap C)$$
$$\overset{(i)}{=} P(A) + P(B) - P(A \cap B) + P(C) - P((A \cap C) \cup (B \cap C))$$
$$\overset{(i)}{=} P(A) + P(B) - P(A \cap B) + P(C) - [P(A \cap C) + P(B \cap C) - P((A \cap C) \cap (B \cap C))]$$
$$= P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$
We can generalize the addition rules to the case of more than three events.
Theorem. (Poincaré's formula) The probability of the union of any $n$ events $A_1, A_2, \ldots, A_n$ is given by:
$$P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \leq i < j \leq n} P(A_i \cap A_j) + \sum_{1 \leq i < j < k \leq n} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).$$
Even if the proof of the theorem (which is by induction) will not be presented, the form of the right-hand side above is clear. First, we sum the probabilities of the individual events, then subtract the probabilities of the intersections of the events taken two at a time (in the ascending order of indices), then add the probabilities of the intersections of the events taken three at a time as before, and continue like this until we add or subtract (depending on $n$) the probability of the intersection of all $n$ events.
P7) Boole's inequalities
i) $\forall A, B \in \mathcal{T}: \quad P(A \cup B) \leq P(A) + P(B)$.
ii) $\forall (A_i)_{i \geq 1} \subseteq \mathcal{T}: \quad P\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} P(A_i)$.
Proof. i) From the previous property we have:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \leq P(A) + P(B).$$
ii) First, we observe that the union $\bigcup_{i=1}^{\infty} A_i$ can be written as a union of mutually exclusive events in the following way:
$$\bigcup_{i=1}^{\infty} A_i = A_1 \cup (A_2 - A_1) \cup (A_3 - (A_1 \cup A_2)) \cup \ldots$$
So,
$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = P(A_1) + P(A_2 - A_1) + P(A_3 - (A_1 \cup A_2)) + \ldots \overset{(P4)}{\leq} P(A_1) + P(A_2) + P(A_3) + \ldots = \sum_{i=1}^{\infty} P(A_i).$$
P8) Bonferroni's inequality
$$\forall A_1, \ldots, A_n \in \mathcal{T}: \quad P\left(\bigcap_{i=1}^{n} A_i\right) \geq \sum_{i=1}^{n} P(A_i) - n + 1.$$
Proof. The proof is by induction.
The first case is $n = 1$ and reads $P(A_1) \geq P(A_1)$.
The case $n = 2$: $P(A_1 \cap A_2) \geq P(A_1) + P(A_2) - 1$. To prove this we use the addition rule and the fact that $P(A_1 \cup A_2) \leq 1$:
$$P(A_1 \cap A_2) = P(A_1) + P(A_2) - P(A_1 \cup A_2) \geq P(A_1) + P(A_2) - 1.$$
The inductive step remains. We assume that the proposition is true for $k$ and we show that it necessarily follows for $k + 1$ (we use the case $n = 2$):
$$P(A_1 \cap A_2 \cap \cdots \cap A_{k+1}) = P((A_1 \cap \cdots \cap A_k) \cap A_{k+1})$$
$$\geq P(A_1 \cap A_2 \cap \cdots \cap A_k) + P(A_{k+1}) - 1$$
$$\geq P(A_1) + \cdots + P(A_k) - k + 1 + P(A_{k+1}) - 1 = \sum_{i=1}^{k+1} P(A_i) - k,$$
which is what we had to prove.
Next, some examples are presented to illustrate some of the above properties.
Example. (i) Let $A, B \in \mathcal{T}$ such that $P(A) = 0.5$, $P(B) = 0.4$ and $P(A \cup B) = 0.6$. Calculate $P(A \cap B)$.
(ii) If $P(A) = 0.5$, $P(B) = 0.4$, $P(A - B) = 0.4$ and $B \subseteq C$, calculate $P(A \cup \overline{B} \cup \overline{C})$.
Solution. (i) From $P(A \cup B) = P(A) + P(B) - P(A \cap B)$ we obtain
$$P(A \cap B) = P(A) + P(B) - P(A \cup B) = 0.5 + 0.4 - 0.6 = 0.3.$$
(ii) The inclusion $B \subseteq C$ implies $\overline{C} \subseteq \overline{B}$ and hence
$$A \cup \overline{B} \cup \overline{C} = A \cup \overline{B}.$$
In consequence,
$$P(A \cup \overline{B} \cup \overline{C}) = P(A \cup \overline{B}) = P(A) + P(\overline{B}) - P(A \cap \overline{B}) = P(A) + 1 - P(B) - P(A - B) = 0.5 + 1 - 0.4 - 0.4 = 0.7.$$
Example. Consider a biased dice such that the probability of occurrence of a face is directly proportional to the number of points on the considered face.
Consider the following events:
A = the occurrence of an even number;
B = the occurrence of an odd number;
C = the occurrence of a prime number.
a) Compute the probabilities of occurrence of each face of the dice.
b) Compute $P(A)$, $P(B)$ and $P(C)$.
c) Compute $P(A \cup C)$, $P(B \cap C)$ and $P(A \cup \overline{B})$.
Solution. a) If we denote $P(1) = p$, then $P(i) = i \cdot p$, $i \in \{2, 3, 4, 5, 6\}$. On the other hand, since $S = \{1, \ldots, 6\}$, we have that
$$1 = P(S) = \sum_{i=1}^{6} P(i) = \sum_{i=1}^{6} i \cdot p = p(1 + 2 + \cdots + 6) = 21p.$$
In conclusion,
$$P(i) = i \cdot p = \frac{i}{21}, \quad i = \overline{1, 6}.$$
b) $P(A) = P(\{2, 4, 6\}) = P(2) + P(4) + P(6) = \frac{2}{21} + \frac{4}{21} + \frac{6}{21} = \frac{12}{21} = \frac{4}{7}$;
$P(B) = P(\{1, 3, 5\}) = P(1) + P(3) + P(5) = \frac{1 + 3 + 5}{21} = \frac{9}{21} = \frac{3}{7}$;
$P(C) = P(\{2, 3, 5\}) = P(2) + P(3) + P(5) = \frac{10}{21}$.
c) Since $A \cup C = \{2, 3, 4, 5, 6\} = \overline{\{1\}}$, we have
$$P(A \cup C) = 1 - P(1) = 1 - \frac{1}{21} = \frac{20}{21};$$
$$P(B \cap C) = P(\{3, 5\}) = P(3) + P(5) = \frac{3}{21} + \frac{5}{21} = \frac{8}{21};$$
$$P(A \cup \overline{B}) = P(A \cup A) = P(A) = \frac{4}{7}.$$
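The probabilities of the biased dice can be verified directly with exact fractions (a minimal sketch):

from fractions import Fraction

P = {i: Fraction(i, 21) for i in range(1, 7)}  # P(i) = i/21
A, B, C = {2, 4, 6}, {1, 3, 5}, {2, 3, 5}
print(sum(P[i] for i in A))      # 4/7
print(sum(P[i] for i in B))      # 3/7
print(sum(P[i] for i in C))      # 10/21
print(sum(P[i] for i in A | C))  # 20/21
print(sum(P[i] for i in B & C))  # 8/21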
Example. Consider the experiment of throwing a dice and a coin at the same time.
Determine the following events and compute their probabilities:
a) A: the occurrence of an even number and a head;
b) B: the occurrence of a prime number;
c) C: the occurrence of an odd number and a tail;
d) D: A or B is realized;
e) E: B and C are realized;
f) Which events among A, B and C are mutually exclusive?
Solution. The sample space is:
$$S = \{1H, 2H, 3H, 4H, 5H, 6H, 1T, 2T, 3T, 4T, 5T, 6T\}.$$
a) $A = \{2H, 4H, 6H\}$, so $P(A) = \frac{\text{card } A}{\text{card } S} = \frac{3}{12} = \frac{1}{4}$.
b) $B = \{2H, 2T, 3H, 3T, 5H, 5T\}$, so $P(B) = \frac{\text{card } B}{\text{card } S} = \frac{6}{12} = \frac{1}{2}$.
c) $C = \{1T, 3T, 5T\}$, so $P(C) = \frac{\text{card } C}{\text{card } S} = \frac{3}{12} = \frac{1}{4}$.
d) $D = A \cup B = \{2H, 4H, 6H, 2T, 3H, 3T, 5H, 5T\}$, so $P(A \cup B) = \frac{8}{12} = \frac{2}{3}$. Alternatively, since $A \cap B = \{2H\}$ and $P(A \cap B) = \frac{1}{12}$,
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{4} + \frac{1}{2} - \frac{1}{12} = \frac{3 + 6 - 1}{12} = \frac{8}{12} = \frac{2}{3}.$$
e) $E = B \cap C = \{3T, 5T\}$, so $P(E) = \frac{\text{card } E}{\text{card } S} = \frac{2}{12} = \frac{1}{6}$.
f) $A \cap B = \{2H\} \neq \emptyset$, $A \cap C = \emptyset$, $B \cap C = \{3T, 5T\} \neq \emptyset$. Hence only the events A and C are mutually exclusive.
Example. Let $(S, \mathcal{T}, P)$ be a probability space and $A, B \in \mathcal{T}$ such that
$$P(A) = \frac{3}{8}, \quad P(B) = \frac{1}{2}, \quad P(A \cap B) = \frac{1}{4}.$$
Compute the following probabilities:
a) $P(A \cup B)$; b) $P(\overline{A})$ and $P(\overline{B})$; c) $P(\overline{A} \cap \overline{B})$; d) $P(\overline{A} \cup \overline{B})$; e) $P(A - B)$; f) $P(B - A)$.
Solution. a) $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{3}{8} + \frac{1}{2} - \frac{1}{4} = \frac{3 + 4 - 2}{8} = \frac{5}{8}$.
b) $P(\overline{A}) = 1 - P(A) = 1 - \frac{3}{8} = \frac{5}{8}$; $P(\overline{B}) = 1 - P(B) = 1 - \frac{1}{2} = \frac{1}{2}$.
c) $P(\overline{A} \cap \overline{B}) = P(\overline{A \cup B}) = 1 - P(A \cup B) = 1 - \frac{5}{8} = \frac{3}{8}$.
d) $P(\overline{A} \cup \overline{B}) = P(\overline{A \cap B}) = 1 - P(A \cap B) = 1 - \frac{1}{4} = \frac{3}{4}$.
e) $P(A - B) = P(A \cap \overline{B}) = P(A - (A \cap B)) = P(A) - P(A \cap B) = \frac{3}{8} - \frac{1}{4} = \frac{1}{8}$.
f) $P(B - A) = P(B \cap \overline{A}) = P(B - (A \cap B)) = P(B) - P(A \cap B) = \frac{1}{2} - \frac{1}{4} = \frac{1}{4}$.
Example. Chevalier de Mere was a mid-seventeenth-century nobleman and gambler who tried to make money gambling with dice. De Mere made money by betting that he could obtain at least one 6 in four rolls of one dice. When people would no longer bet on this game with de Mere, he created a new game: he began to bet he would get a double 6 in twenty-four rolls of two dice, but he began losing money on it. He asked his friend Blaise Pascal to analyze this game; Pascal did so and asked Pierre de Fermat to work with him. It can be said that the formal study of probability was launched by two mathematicians and a gambler.
We will calculate and compare the probabilities of the following events:
A: we obtain at least one six in 4 rolls of a dice;
B: we obtain at least one double 6 in 24 rolls of two dice;
C: we obtain at least one double 6 in 25 rolls of two dice.
For this problem it is easier to determine the probabilities of the contrary events $\overline{A}$, $\overline{B}$, $\overline{C}$.
The event $\overline{A}$ means that no six is obtained in 4 rolls of a dice. The experiment has $6 \cdot 6 \cdot 6 \cdot 6 = 6^4$ possible outcomes. Since in each roll of one dice there are 5 ways of obtaining no six, in 4 rolls of the dice there are $5^4$ ways of getting no six. Therefore $P(\overline{A}) = \frac{5^4}{6^4}$, which implies
$$P(A) = 1 - P(\overline{A}) = 1 - \frac{5^4}{6^4} \approx 0.52.$$
The event $\overline{B}$ means that no double 6 is obtained in 24 rolls of two dice. The number of possible outcomes is $36^{24}$ and the number of favorable outcomes is $35^{24}$. Therefore,
$$P(B) = 1 - P(\overline{B}) = 1 - \frac{35^{24}}{36^{24}} \approx 0.49.$$
The event C has the probability
$$P(C) = 1 - \left(\frac{35}{36}\right)^{25} \approx 0.505.$$
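These three probabilities are easy to evaluate numerically (a minimal sketch):

pA = 1 - (5 / 6) ** 4     # at least one 6 in 4 rolls of one dice
pB = 1 - (35 / 36) ** 24  # at least one double 6 in 24 rolls of two dice
pC = 1 - (35 / 36) ** 25  # at least one double 6 in 25 rolls of two dice
print(round(pA, 4), round(pB, 4), round(pC, 4))  # 0.5177 0.4914 0.5055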
Example. The birthday problem. If n people are present in a room, what is the probability that no two of them celebrate their birthday on the same day of the year? How large need n be so that this probability is less than $\frac{1}{2}$?
Solution. As each person can celebrate his or her birthday on any of 365 days, there is a total of $365^n$ possible outcomes. (We are ignoring the possibility that someone was born on February 29.) Assuming that each outcome is equally likely, we see that the number of favorable cases is
$$365 \cdot (365 - 1) \cdot (365 - 2) \cdots (365 - (n - 1)),$$
since the first person can celebrate his or her birthday on any day, the second person can celebrate on any day except the first person's birthday, the third person can celebrate on any day except the first and second persons' birthdays, and so on.
The desired probability is
$$p_n = \frac{365 \cdot 364 \cdots (365 - n + 1)}{365^n}.$$
The values of this probability for different values of n can be found in the next table.

n      1     2     5     10    20    23    30    50
p_n    1    0.99  0.97  0.88  0.58  0.49  0.29  0.03

When $n \geq 23$, the desired probability is less than $\frac{1}{2}$. That is, if there are 23 or more people in a room, then the probability that at least two of them have the same birthday is greater than $\frac{1}{2}$. When there are 50 persons in the room, the probability that at least two have the same birthday is approximately 0.97.
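The table of values $p_n$ can be regenerated with a short loop (a minimal sketch):

def p(n):
    # probability that n people all have distinct birthdays (365-day year)
    prob = 1.0
    for i in range(n):
        prob *= (365 - i) / 365
    return prob

for n in (1, 2, 5, 10, 20, 23, 30, 50):
    print(n, round(p(n), 2))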
6.2 Conditional probability
In this section we introduce the concept of conditional probability. Conditional probabilities are used to compute probabilities when some partial information concerning the result of the experiment is available.
Conditional probabilities
In the experiment of tossing a pair of unbiased dice, suppose we observe that the first dice is a 2. We want to determine the probability that the sum of the two dice equals 7, given that we already know that the first dice is a 2.
Since the possible outcomes for our experiment are (2,1), (2,2), (2,3), (2,4), (2,5) and (2,6), and the favorable outcome is (2,5), the desired probability is $\frac{1}{6}$.
If we consider the events
A: the first dice is 2, and
B: the sum of the dice is 7, then
$$A = \{(2,1), (2,2), (2,3), (2,4), (2,5), (2,6)\},$$
$$B = \{(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}$$
and $A \cap B = \{(2,5)\}$.
In the above example we are told that the event A has occurred, and we are asked to evaluate the probability of B on the basis of this fact. What is important here is the event A, and given that A has occurred, the event B occurs only if the event $\{(2,5)\} = A \cap B$ occurs. The required probability is then
$$\frac{1}{6} = \frac{\frac{1}{36}}{\frac{6}{36}} = \frac{P(A \cap B)}{P(A)}.$$
If we denote by $P(B \mid A)$ the probability of B given that A has occurred (or just: the probability of B, given A), then
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}.$$
This example justifies the following definition of conditional probability.
Definition. (Conditional probability) Let $(S, \mathcal{T}, P)$ be a probability space and let $A, B \in \mathcal{T}$ such that $P(A) > 0$.
The conditional probability of the event B, given that A has occurred, is denoted by $P(B \mid A)$ (or $P_A(B)$) and is defined by
$$P(B \mid A) = P_A(B) = \frac{P(A \cap B)}{P(A)}.$$
Replacing A by the sample space S, we obtain the probability of B:
$$P(B \mid S) = \frac{P(B \cap S)}{P(S)} = \frac{P(B)}{1} = P(B).$$
Hence, conditional probability is a generalization of the concept of probability, where S is restricted to an event A.
We will now see that the conditional probability is, indeed, a probability.
Proposition. If $(S, \mathcal{T}, P)$ is a probability space and $A \in \mathcal{T}$ is such that $P(A) > 0$, then $P_A : \mathcal{T} \to \mathbb{R}$, $P_A(B) = P(B \mid A)$, is a probability function too.
Proof. We verify the three axioms of a probability function.
i) $P_A(B) = P(B \mid A) = \frac{P(B \cap A)}{P(A)} \geq 0$, $\forall B \in \mathcal{T}$.
ii) $P_A(S) = P(S \mid A) = \frac{P(S \cap A)}{P(A)} = \frac{P(A)}{P(A)} = 1$.
iii) Let $(B_i)_{i \geq 1} \subseteq \mathcal{T}$ be a sequence of mutually exclusive events. Then
$$P_A\left(\bigcup_{i=1}^{\infty} B_i\right) = P\left(\bigcup_{i=1}^{\infty} B_i \,\Big|\, A\right) = \frac{P\left(\left(\bigcup_{i=1}^{\infty} B_i\right) \cap A\right)}{P(A)} = \frac{P\left(\bigcup_{i=1}^{\infty} (B_i \cap A)\right)}{P(A)} = \frac{\sum_{i=1}^{\infty} P(B_i \cap A)}{P(A)} = \sum_{i=1}^{\infty} \frac{P(B_i \cap A)}{P(A)} = \sum_{i=1}^{\infty} P(B_i \mid A) = \sum_{i=1}^{\infty} P_A(B_i),$$
which completes the proof.
From the definition of conditional probability we can derive a simple but very useful result: the so-called multiplicative theorem.
Theorem. (The multiplication rules) Let $(S, \mathcal{T}, P)$ be a probability space.
i) If $A, B \in \mathcal{T}$ are such that $P(A) > 0$, then
$$P(A \cap B) = P(A) \cdot P(B \mid A).$$
ii) If $A, B, C \in \mathcal{T}$ are such that $P(A \cap B) > 0$, then
$$P(A \cap B \cap C) = P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B).$$
iii) If $A_1, \ldots, A_n \in \mathcal{T}$ are such that $P(A_1 \cap A_2 \cap \cdots \cap A_{n-1}) > 0$, then
$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1) \cdot P(A_2 \mid A_1) \cdot P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$
Proof. i) $P(A) \cdot P(B \mid A) = P(A) \cdot \frac{P(B \cap A)}{P(A)} = P(B \cap A) = P(A \cap B)$, as desired.
ii) $P(A) \cdot P(B \mid A) \cdot P(C \mid A \cap B) = P(A) \cdot \frac{P(A \cap B)}{P(A)} \cdot \frac{P(A \cap B \cap C)}{P(A \cap B)} = P(A \cap B \cap C)$.
Observe that if $P(A \cap B) > 0$ then $P(A) \geq P(A \cap B) > 0$, so the previous fractions are correctly defined.
iii) The proof is by induction and is left as an exercise.
The importance of the previous theorem lies in the fact that we can calculate the probability of the intersection of n events step by step, by means of conditional probabilities, which is easier.
Next, we present a simple example which illustrates the point.
Example. An urn contains 12 identical balls, of which 6 are red, 4 are green and 2 are yellow. Four balls are extracted from the urn without replacement. Determine the probability that the first ball is green, the second is red, the third is yellow and the last is red.
Solution. If we denote by $G_1$, $R_2$, $Y_3$ and $R_4$ the events that the first ball is green, the second is red, the third is yellow and the fourth is red, then the desired probability is
$$P(G_1 \cap R_2 \cap Y_3 \cap R_4) = P(G_1) \cdot P(R_2 \mid G_1) \cdot P(Y_3 \mid G_1 \cap R_2) \cdot P(R_4 \mid G_1 \cap R_2 \cap Y_3) = \frac{4}{12} \cdot \frac{6}{11} \cdot \frac{2}{10} \cdot \frac{5}{9} = \frac{2}{99}.$$
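The chain of conditional probabilities can be multiplied out exactly (a minimal sketch with fractions):

from fractions import Fraction
from functools import reduce

# green, red, yellow, red drawn from 12 balls without replacement
steps = [Fraction(4, 12), Fraction(6, 11), Fraction(2, 10), Fraction(5, 9)]
print(reduce(lambda x, y: x * y, steps))  # 2/99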
Probability trees
An effective and simple method of applying the probability rules is the probability tree, in which:
- the events are represented by lines (branches);
- the probability along a path is the product of the probabilities on the branches which form the path;
- the sum of the probabilities at the end of the branches which start from the same point is 1, because all possible events are listed.
We will first present a theoretical example.
Example. Let $(S, \mathcal{T}, P)$ be a probability space and let $A_1, A_2, A_3, E \in \mathcal{T}$ be such that
$$A_1 \cup A_2 \cup A_3 = S, \quad A_i \cap A_j = \emptyset, \; i \neq j.$$
We can draw the following probability tree. From the root, three branches lead to $A_1$, $A_2$ and $A_3$, labelled with the probabilities $P(A_1)$, $P(A_2)$ and $P(A_3)$; from each $A_i$, two branches lead to $E$ and $\overline{E}$, labelled with the conditional probabilities $P(E \mid A_i)$ and $P(\overline{E} \mid A_i)$. Multiplying along the paths gives the probabilities of the six outcomes:
$$P(E \cap A_i) = P(A_i) \cdot P(E \mid A_i), \qquad P(\overline{E} \cap A_i) = P(A_i) \cdot P(\overline{E} \mid A_i), \qquad i = \overline{1, 3}.$$
Example. A graduate statistics course has 7 male and 3 female students. The professor wants to select two students at random for a research project. By using a probability tree, determine the probabilities of all the possible outcomes.
The first two branches represent the two possibilities, female (F) and male (M), for the first choice, with probabilities $P(F) = \frac{3}{10}$ and $P(M) = \frac{7}{10}$. The second set of branches represents the two possibilities for the second choice; their probabilities are conditional probabilities based on the first student selected: $P(F \mid F) = \frac{2}{9}$, $P(M \mid F) = \frac{7}{9}$, $P(F \mid M) = \frac{3}{9}$, $P(M \mid M) = \frac{6}{9}$. Multiplying along the four paths gives:
$$P(F, F) = \frac{3}{10} \cdot \frac{2}{9} = \frac{2}{30}, \qquad P(F, M) = \frac{3}{10} \cdot \frac{7}{9} = \frac{7}{30},$$
$$P(M, F) = \frac{7}{10} \cdot \frac{3}{9} = \frac{7}{30}, \qquad P(M, M) = \frac{7}{10} \cdot \frac{6}{9} = \frac{14}{30}.$$
6.3 The total probability formula.
Bayes formula
We will first analyze the following example.
Example. Suppose a disease is present in 0.1% of a population. A diagnostic test is available, but it is imperfect: the test shows 5% false positives and 1% false negatives. That is, for a patient not having the disease, the test shows positive with probability 0.05 and negative with probability 0.95; for a patient having the disease, the test shows negative with probability 0.01 and positive with probability 0.99.
A person is randomly chosen.
(i) Determine the probabilities of the following configurations: diseased and positive test, diseased and negative test, not diseased and positive test, not diseased and negative test.
(ii) Determine the probability that a person will test positive and the probability that a person will test negative.
(iii) If the chosen person tests positive, what is the probability that he/she is diseased? If the chosen person tests negative, what is the probability that he/she is diseased?
Solution. Let
D: the event that the person is diseased;
T: the event that the test is positive.
We are given the following data:
$$P(D) = 0.001, \; P(\overline{D}) = 0.999, \; P(T \mid \overline{D}) = 0.05, \; P(\overline{T} \mid \overline{D}) = 0.95, \; P(T \mid D) = 0.99, \; P(\overline{T} \mid D) = 0.01.$$
We can represent this information by using a probability tree.
(i) Multiplying along the branches of the tree (first D versus $\overline{D}$, then T versus $\overline{T}$):
$$P(D \cap T) = 0.99 \cdot 0.001 = 0.00099, \qquad P(D \cap \overline{T}) = 0.01 \cdot 0.001 = 0.00001,$$
$$P(\overline{D} \cap T) = 0.05 \cdot 0.999 = 0.04995, \qquad P(\overline{D} \cap \overline{T}) = 0.95 \cdot 0.999 = 0.94905.$$
(ii) $P(T) = P(D \cap T) + P(\overline{D} \cap T) = 0.00099 + 0.04995 = 0.05094$;
$P(\overline{T}) = P(D \cap \overline{T}) + P(\overline{D} \cap \overline{T}) = 0.00001 + 0.94905 = 0.94906$.
(iii) $P(D \mid T) = \frac{P(D \cap T)}{P(T)} = \frac{0.00099}{0.05094} = \frac{99}{5094} \approx 0.0194$;
$P(D \mid \overline{T}) = \frac{P(D \cap \overline{T})}{P(\overline{T})} = \frac{0.00001}{0.94906} \approx 0.00001$.
Thus only about 1.94% of those persons whose test results are positive actually have the disease. The result is surprising, since the proportion is so low given that the test is quite good. We will present a second argument, which is less rigorous but more intuitive.
Since 0.1% of the population actually has the disease, it follows that 1 person out of every 1000 tested will have it (on average). The test confirms that a diseased person has the disease with probability 0.99; thus, out of every 1000 persons tested, the test will confirm that 0.99 persons have the disease. On the other hand, out of the 999 healthy people, the test will state (incorrectly) that $999 \cdot 0.05 = 49.95$ have the disease. Hence, for every 0.99 diseased persons that the test correctly states are ill, there are 49.95 healthy persons that the test states (incorrectly) are ill.
Thus, the proportion of correct positives is equal to:
$$\frac{\text{correct positives}}{\text{correct positives} + \text{incorrect positives}} = \frac{0.99}{0.99 + 49.95} = \frac{99}{5094} \approx 0.019.$$
The fact that the probability $P(D \mid T)$ is much less than 1 reflects the fact that the test is imperfect. If the test were perfect, i.e.
$$P(T \mid D) = P(\overline{T} \mid \overline{D}) = 1,$$
then
$$P(D \mid T) = \frac{P(D \cap T)}{P(D \cap T) + P(\overline{D} \cap T)} = \frac{P(D) \cdot P(T \mid D)}{P(D) \cdot P(T \mid D) + P(\overline{D}) \cdot P(T \mid \overline{D})} = \frac{P(D) \cdot 1}{P(D) \cdot 1 + P(\overline{D}) \cdot 0} = 1.$$
By the same reasoning we can observe that the probability that a person testing positive has the disease depends on the proportion of people in the population who are actually ill. Let us suppose that the incidence rate of the disease is r. Replacing the proportion 0.001 by r and 0.999 by $1 - r$ in the above calculation, we obtain
$$P(D \mid T) = \frac{0.99r}{0.99r + 0.05(1 - r)} = \frac{99r}{99r + 5 - 5r} = \frac{99r}{5 + 94r}.$$
[Figure: the graph of $P(D \mid T) = \frac{99r}{5 + 94r}$ as a function of the incidence rate $r \in [0, 1]$.]
We see that the test is relevant when the incidence rate of the disease is large. Since most diseases have a small incidence rate, the false positive rate and the false negative rate of such tests are very important numbers.
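The whole computation, and its dependence on the incidence rate r, fits in a few lines (a minimal sketch; the keyword names sens and false_pos are illustrative, not standard terminology):

def posterior(r, sens=0.99, false_pos=0.05):
    # P(D|T) by Bayes' formula, for incidence rate r
    return r * sens / (r * sens + (1 - r) * false_pos)

print(round(posterior(0.001), 4))  # ~0.0194
for r in (0.001, 0.01, 0.1, 0.5):
    print(r, round(posterior(r), 3))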
Definition. (Partition of the sample space)
Let $(S, \mathcal{T}, P)$ be a probability space. The events $A_1, A_2, \ldots, A_n \in \mathcal{T}$ form a partition of S if:
i) $A_1 \cup A_2 \cup \cdots \cup A_n = S$;
ii) $A_i \cap A_j = \emptyset$, $i \neq j$;
iii) $P(A_i) > 0$, $i = \overline{1, n}$.
It is obvious that any event $B \in \mathcal{T}$ can be expressed in terms of a partition of S:
$$B = B \cap S = B \cap \left(\bigcup_{i=1}^{n} A_i\right) = \bigcup_{i=1}^{n} (B \cap A_i) = \bigcup_{i=1}^{n} (A_i \cap B).$$
Furthermore,
$$P(B) = P\left(\bigcup_{i=1}^{n} (A_i \cap B)\right) = \sum_{i=1}^{n} P(A_i \cap B) = \sum_{i=1}^{n} P(A_i) \cdot P(B \mid A_i).$$
Thus, we have the following result.
Theorem. (The total probability formula) Let $(S, \mathcal{T}, P)$ be a probability space and let $A_1, A_2, \ldots, A_n$ be a partition of S. Then, for any event $B \in \mathcal{T}$, we have
$$P(B) = \sum_{i=1}^{n} P(A_i) \cdot P(B \mid A_i).$$
So, if we know the probabilities of the partitioning events $P(A_i)$, $i = \overline{1, n}$, and the conditional probabilities of B given each $A_i$, then by using the previous formula we can obtain the probability $P(B)$.
The computation in the last example is a particular case of the following general result.
Theorem. (Bayes' formula) Let $(S, \mathcal{T}, P)$ be a probability space, let $A_1, A_2, \ldots, A_n$ be a partition of S and let $B \in \mathcal{T}$ with $P(B) > 0$. Then, for all $j = \overline{1, n}$, we have
$$P(A_j \mid B) = \frac{P(B \mid A_j) \cdot P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i) \cdot P(A_i)}.$$
Proof. We write
$$P(A_j \mid B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(B \mid A_j) \cdot P(A_j)}{\sum_{i=1}^{n} P(B \mid A_i) \cdot P(A_i)},$$
according to the definition of conditional probability and to the total probability formula.
The previous formula was first stated by the English clergyman Thomas Bayes, who died in 1761 but whose now famous formula was not published until 1763.
The probabilities $P(A_1), \ldots, P(A_n)$ are called the prior probabilities, in the sense that they do not take into account any information about B. The probabilities $P(A_j \mid B)$, $j = \overline{1, n}$, are called the posterior probabilities, in the sense that they are reevaluations of the respective priors $P(A_j)$ after the event B has occurred.
Example. Two identical urns have the following compositions:
- the first urn contains 10 black and 30 white balls;
- the second urn contains 20 black and 20 white balls.
An urn is selected at random and a ball is taken from it. If the ball is white, what is the probability that it comes from the first urn?
Solution. We consider the following events:
$U_1$: the event of choosing the first urn;
$U_2$: the event of choosing the second urn;
W: the event that the extracted ball is white.
We compute the probabilities:
$$P(U_1) = P(U_2) = \frac{1}{2}$$
(since the urns are identical, we have the same chance of selecting either urn),
$$P(W \mid U_1) = \frac{30}{40} = \frac{3}{4}, \qquad P(W \mid U_2) = \frac{20}{40} = \frac{1}{2}.$$
In order to compute the desired probability $P(U_1 \mid W)$, we apply Bayes' law:
$$P(U_1 \mid W) = \frac{P(W \mid U_1) \cdot P(U_1)}{P(U_1) \cdot P(W \mid U_1) + P(U_2) \cdot P(W \mid U_2)} = \frac{\frac{3}{4} \cdot \frac{1}{2}}{\frac{1}{2} \cdot \frac{3}{4} + \frac{1}{2} \cdot \frac{1}{2}} = \frac{\frac{3}{4}}{\frac{3}{4} + \frac{1}{2}} = \frac{3}{5} = 0.6.$$
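The same Bayes computation, with exact fractions (a minimal sketch):

from fractions import Fraction

pU = [Fraction(1, 2), Fraction(1, 2)]      # prior: urn 1, urn 2
pW = [Fraction(30, 40), Fraction(20, 40)]  # P(W | U_i)
total = sum(p * w for p, w in zip(pU, pW)) # total probability of white
print(pU[0] * pW[0] / total)               # 3/5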
Example. (Bac Polynesie 2007) In a vacation village, three training courses are proposed to both children and adults. They take place at the same time and their topics are magic, drama and digital photography. 150 persons (90 adults and 60 children) are registered in one of these training courses:
- the course of magic was chosen by one half of the children and by 20% of the adults;
- the course of digital photography was chosen by 27 adults and by 10% of the children.
1. Fill in the following table.

           Magic   Drama   Digital photography   Total
Adults                                             90
Children                                           60
Total                                             150

We choose at random a person registered in one of the training courses. We consider the following events:
A: the chosen person is an adult;
M: the chosen person is registered in the course of magic;
D: the chosen person is registered in the drama course;
P: the chosen person is registered in the digital photography course.
2. a) What is the probability that the chosen person is a child?
b) What is the probability that the chosen person takes the course of digital photography, given that he/she is an adult?
c) What is the probability that the chosen person is an adult who is registered in the drama course?
3. Compute the probability that the chosen person is registered in the course of magic.
4. Compute the probability that the chosen person is a child, given that he/she is registered in the course of magic.
5. We choose at random three persons out of the group of 150 persons. What is the probability that exactly one of them takes the course of magic?
Solution. 1)

           Magic   Drama   Digital photography   Total
Adults       18      45            27              90
Children     30      24             6              60
Total        48      69            33             150

2) a) $P(\overline{A}) = \frac{\text{card } \overline{A}}{\text{card } S} = \frac{60}{150} = \frac{2}{5}$.
b) $P(P \mid A) = \frac{P(P \cap A)}{P(A)} = \frac{27/150}{90/150} = \frac{27}{90} = \frac{3}{10}$.
c) $P(A \cap D) = \frac{\text{card}(A \cap D)}{\text{card } S} = \frac{45}{150} = \frac{3}{10}$.
3) $P(M) = \frac{\text{card } M}{\text{card } S} = \frac{48}{150} = \frac{8}{25}$.
4) $P(\overline{A} \mid M) = \frac{P(\overline{A} \cap M)}{P(M)} = \frac{30/150}{48/150} = \frac{30}{48} = \frac{5}{8}$.
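The answers 2)-4) follow directly from the filled table. Question 5, for which the text gives no worked solution, is a hypergeometric count; the sketch below assumes "exactly one takes magic" means one of the 48 magic registrants and two of the remaining 102 persons:

from fractions import Fraction
from math import comb

print(Fraction(60, 150))  # 2a) 2/5
print(Fraction(27, 90))   # 2b) 3/10
print(Fraction(45, 150))  # 2c) 3/10
print(Fraction(48, 150))  # 3)  8/25
print(Fraction(30, 48))   # 4)  5/8
# 5) exactly one of the three chosen persons is registered at magic
print(Fraction(comb(48, 1) * comb(102, 2), comb(150, 3)))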
6.4 Independence
Since we already know what conditional probability means, we can define the notion of independent events. Intuitively, in order for two events A and B to be independent, the probability of A should not change when the event B occurs, and the probability of B should not change when the event A occurs. A first approach to the definition of independent events is: $P(A) = P(A \mid B)$. There are two reasons for which this definition is not satisfactory: it is not symmetric in A and B, and $P(A \mid B)$ cannot be defined when $P(B) = 0$. We therefore define independence as follows.
Definition. Let $(S, \mathcal{T}, P)$ be a probability space. Two events $A, B \in \mathcal{T}$ are said to be independent if
$$P(A \cap B) = P(A) \cdot P(B).$$
The events $A_1, A_2, \ldots, A_n \in \mathcal{T}$ are said to be independent if
$$P\left(\bigcap_{j \in J} A_j\right) = \prod_{j \in J} P(A_j)$$
for all index sets $J \subseteq \{1, 2, \ldots, n\}$.
Example. We choose a random card from a deck of 52 cards. Let A be the event that the card is a queen, and B the event that it is a spade. Then
$$P(A) = \frac{4}{52} = \frac{1}{13} \quad \text{and} \quad P(B) = \frac{13}{52} = \frac{1}{4}.$$
The event $A \cap B$ is the event that we draw the queen of spades, the probability of which is just $\frac{1}{52}$. We see that $P(A) \cdot P(B) = P(A \cap B)$, and hence the events A and B are independent.
The next example explains why, in the definition of independence for more than two events, we need to require
$$P\left(\bigcap_{j \in J} A_j\right) = \prod_{j \in J} P(A_j)$$
for all index sets J, and not only for sets of size 2.
Example. In the experiment of rolling 2 fair dice, let A be the event that the first dice is even, B the event that the second dice is even and C the event that the sum of the dice is even. Show that the events A, B and C are pairwise independent, but A, B, C are not independent.
Solution.
$$P(A) = \frac{\text{card } A}{\text{card } S} = \frac{18}{36} = \frac{1}{2}, \qquad P(B) = \frac{\text{card } B}{\text{card } S} = \frac{18}{36} = \frac{1}{2},$$
$$A \cap B = \{(2,2), (2,4), (2,6), (4,2), (4,4), (4,6), (6,2), (6,4), (6,6)\},$$
$$P(A \cap B) = \frac{\text{card}(A \cap B)}{\text{card } S} = \frac{9}{36} = \frac{1}{4}.$$
Hence A and B are independent, since
$$P(A \cap B) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(A) \cdot P(B).$$
Moreover, $P(C) = \frac{1}{2}$ and $A \cap C = A \cap B$ (if the first dice is even and the sum is even, then the second dice must be even), so $P(A \cap C) = \frac{1}{4}$; hence A and C are independent. In the same way we can show that B and C are independent.
Since $A \cap B \cap C = A \cap B = A \cap C = B \cap C$, then
$$P(A \cap B \cap C) = \frac{1}{4} \neq \frac{1}{8} = P(A) \cdot P(B) \cdot P(C),$$
wherefrom we deduce that the events A, B and C are not independent.
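Pairwise versus full independence can be checked by enumeration (a minimal sketch):

from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]
A = {s for s in S if s[0] % 2 == 0}
B = {s for s in S if s[1] % 2 == 0}
C = {s for s in S if sum(s) % 2 == 0}
P = lambda E: Fraction(len(E), len(S))
print(P(A & B) == P(A) * P(B))             # True
print(P(A & C) == P(A) * P(C))             # True
print(P(B & C) == P(B) * P(C))             # True
print(P(A & B & C) == P(A) * P(B) * P(C))  # False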
Remark. 1) If $P(B) \neq 0$, then A and B are independent if and only if $P(A \mid B) = P(A)$.
2) The following statements are equivalent:
A, B are independent events $\Leftrightarrow$ $\overline{A}$, B are independent events $\Leftrightarrow$ A, $\overline{B}$ are independent events $\Leftrightarrow$ $\overline{A}$, $\overline{B}$ are independent events.
Proof. 1) Let $A, B \in \mathcal{T}$ such that $P(B) \neq 0$. Then: A, B are independent events $\Leftrightarrow$ $P(A \cap B) = P(A) \cdot P(B)$ $\Leftrightarrow$ $P(A) = \frac{P(A \cap B)}{P(B)}$ $\Leftrightarrow$ $P(A) = P(A \mid B)$.
2) We shall prove just the first equivalence, since the other equivalences can be justified similarly.
Since $B = B \cap S = B \cap (A \cup \overline{A}) = (B \cap A) \cup (B \cap \overline{A})$, and the events $B \cap A$ and $B \cap \overline{A}$ are mutually exclusive, we obtain
$$P(B) = P(B \cap A) + P(B \cap \overline{A}).$$
Suppose that A and B are independent events, that is, $P(A \cap B) = P(A) \cdot P(B)$. Then
$$P(\overline{A} \cap B) = P(B) - P(B \cap A) = P(B) - P(B) \cdot P(A) = P(B)(1 - P(A)) = P(B) \cdot P(\overline{A}).$$
We see that $P(\overline{A} \cap B) = P(\overline{A}) \cdot P(B)$, and hence the events $\overline{A}$ and B are independent.
Suppose now that $\overline{A}$ and B are independent events, that is, $P(\overline{A} \cap B) = P(\overline{A}) \cdot P(B)$. Then
$$P(A \cap B) = P(B) - P(\overline{A} \cap B) = P(B) - P(\overline{A}) \cdot P(B) = P(B)(1 - P(\overline{A})) = P(B) \cdot P(A).$$
We see that $P(A \cap B) = P(A) \cdot P(B)$, and hence the events A and B are independent.
6.5 Classical probabilistic models.
Urn models
In this section we will consider random experiments which frequently appear in practical applications, and we will calculate the probabilities of their outcomes. The mathematical models that will be used to describe the considered experiments are the urn models (the urns contain colored balls of the same weight).
Urn models with replacement
Urn model with two states, with replacement
Consider an urn U which contains white and black balls. Let p be the probability of getting a white ball and q the probability of getting a black ball. Since extracting a white ball and extracting a black ball are contrary events, we have $p + q = 1$.
A trial consists in taking a ball, recording its colour and putting it back into the urn. In consequence, the probability of taking a ball of a specified colour at the first trial is the same as the probability of taking a ball of the same colour at the second trial, and so on: the trials in this experiment are independent.
We want to determine the probability of getting k white balls in n repeated trials ($k \leq n$). We denote by $X_n^k$ the desired event; we have to compute $P(X_n^k)$.
The desired white balls can be obtained at any k trials out of the n considered trials. Denote by $W_j$ the event of getting a white ball at the j-th trial, $j = \overline{1, n}$. The desired event can be written as
$$X_n^k = \bigcup \left( W_{i_1} \cap \cdots \cap W_{i_k} \cap \overline{W}_{i_{k+1}} \cap \cdots \cap \overline{W}_{i_n} \right),$$
where $\{i_1, i_2, \ldots, i_n\} = \{1, 2, \ldots, n\}$. The previous union contains $C_n^k$ terms, since we can obtain the k white balls at any k trials out of the n trials. Therefore
$$P(X_n^k) = \sum P(W_{i_1}) \cdots P(W_{i_k}) \cdot P(\overline{W}_{i_{k+1}}) \cdots P(\overline{W}_{i_n}) = \sum \underbrace{p \cdots p}_{k \text{ times}} \cdot \underbrace{q \cdots q}_{n-k \text{ times}} = \sum p^k q^{n-k}.$$
Hence,
$$P(X_n^k) = C_n^k \, p^k q^{n-k}.$$
Remark 1. The term $C_n^k p^k q^{n-k}$ can be obtained as the general term in the binomial theorem
$$(p + q)^n = \sum_{k=0}^{n} C_n^k \, p^k q^{n-k}.$$
This is why the previous model is also called the binomial model.
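A minimal sketch of the binomial probability $P(X_n^k) = C_n^k p^k q^{n-k}$, with the sanity check that the probabilities sum to 1:

from math import comb

def binomial_pmf(n, k, p):
    # P(X_n^k) = C_n^k * p^k * q^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binomial_pmf(4, 2, 0.5))  # 0.375
print(round(sum(binomial_pmf(10, k, 0.3) for k in range(11)), 10))  # 1.0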
Remark 2. Pascal urn model
Consider an urn U which contains white and black balls. Let p be the probability of getting a white ball and q the probability of getting a black ball ($p + q = 1$). A trial consists in taking a ball from the urn, recording its colour and putting it back into the urn.
We want to determine the probability that in n successive trials we get k ($k \leq n$) white balls, with the n-th (last) trial producing a white ball. This event can also be described as: get $k - 1$ white balls in the first $n - 1$ trials, and a white ball at the n-th trial.
Denote by $Y_n^k$ the desired event. It can be written as
$$Y_n^k = X_{n-1}^{k-1} \cap W_n,$$
where $X_{n-1}^{k-1}$ represents the event of obtaining $k - 1$ white balls in $n - 1$ successive trials and $W_n$ is the event of getting a white ball at the n-th trial. Hence,
$$P(Y_n^k) = P(X_{n-1}^{k-1} \cap W_n) = P(X_{n-1}^{k-1}) \cdot P(W_n) = C_{n-1}^{k-1} p^{k-1} q^{(n-1)-(k-1)} \cdot p = C_{n-1}^{k-1} p^k q^{n-k}.$$
Remark 3. Geometric model
The geometric model is the particular case of the Pascal model in which we take $k = 1$: in n successive trials we get exactly one white ball, obtained at the n-th trial. Hence,
$$P(Y_n^1) = C_{n-1}^{0} \, p \, q^{n-1} = p q^{n-1}.$$
The name "geometric model" comes from the fact that the term $p q^{n-1}$ is the n-th term of a geometric progression whose first term is p and whose ratio is q.
Another way of obtaining $P(Y_n^1)$ is the following. Since $Y_n^1 = \overline{W}_1 \cap \overline{W}_2 \cap \cdots \cap \overline{W}_{n-1} \cap W_n$, we obtain
$$P(Y_n^1) = P(\overline{W}_1) \cdot P(\overline{W}_2) \cdots P(\overline{W}_{n-1}) \cdot P(W_n) = \underbrace{q \cdot q \cdots q}_{n-1 \text{ times}} \cdot \, p = q^{n-1} p = p q^{n-1},$$
as we expected.
Urn model with more than two states, with replacement
Consider an urn U which contains balls of s colours: $c_1, c_2, \ldots, c_s$. Let $p_i$ be the probability of getting a ball of colour $c_i$, $i = \overline{1, s}$. Since the events of extracting a ball of colour $c_i$, $i = \overline{1, s}$, form a partition of the sample space, we have $p_1 + p_2 + \cdots + p_s = 1$.
A trial consists in taking a ball, recording its colour and putting it back into the urn.
We want to determine the probability that in n repeated trials we get $k_i$ balls of colour $c_i$, $i = \overline{1, s}$, where $k_1 + k_2 + \cdots + k_s = n$.
First, we count in how many different ways the desired event can be obtained. There are $C_n^{k_1}$ possible choices for the balls of colour $c_1$; for each choice of the balls of the first colour there are $C_{n-k_1}^{k_2}$ possible choices for the balls of colour $c_2$; for each choice of the balls of the first two colours there are $C_{n-k_1-k_2}^{k_3}$ possible choices for the third group; and so on. In consequence, there are
$$C_n^{k_1} \cdot C_{n-k_1}^{k_2} \cdot C_{n-k_1-k_2}^{k_3} \cdots C_{n-k_1-\cdots-k_{s-1}}^{k_s} = \frac{n!}{k_1!(n-k_1)!} \cdot \frac{(n-k_1)!}{k_2!(n-k_1-k_2)!} \cdot \frac{(n-k_1-k_2)!}{k_3!(n-k_1-k_2-k_3)!} \cdots \frac{(n-k_1-\cdots-k_{s-1})!}{k_s!(n-k_1-\cdots-k_s)!} = \frac{n!}{k_1! \, k_2! \cdots k_s!}$$
possible ways in which the desired event can be obtained.
We denote by $X_n^{k_1, k_2, \ldots, k_s}$ the desired event; we have to compute $P(X_n^{k_1, k_2, \ldots, k_s})$. Denote by $X_{c_i}^{k_i}$ the event of extracting $k_i$ balls of colour $c_i$ in the n trials, $i = \overline{1, s}$. Then
$$X_n^{k_1, \ldots, k_s} = \bigcup \left( X_{c_1}^{k_1} \cap X_{c_2}^{k_2} \cap \cdots \cap X_{c_s}^{k_s} \right).$$
The previous union contains $\frac{n!}{k_1! \, k_2! \cdots k_s!}$ terms, so
$$P(X_n^{k_1, \ldots, k_s}) = \sum P\left( X_{c_1}^{k_1} \cap X_{c_2}^{k_2} \cap \cdots \cap X_{c_s}^{k_s} \right) = \sum p_1^{k_1} p_2^{k_2} \cdots p_s^{k_s} = \frac{n!}{k_1! \, k_2! \cdots k_s!} \, p_1^{k_1} \cdots p_s^{k_s}.$$
Remark 4. The term $\frac{n!}{k_1! \, k_2! \cdots k_s!} p_1^{k_1} \cdots p_s^{k_s}$ can be obtained as the general term in the multinomial theorem
$$(p_1 + \cdots + p_s)^n = \sum_{\substack{(k_1, \ldots, k_s) \\ k_1 + k_2 + \cdots + k_s = n}} \frac{n!}{k_1! \, k_2! \cdots k_s!} \, p_1^{k_1} p_2^{k_2} \cdots p_s^{k_s}$$
(the above sum is over all nonnegative integer-valued vectors $(k_1, \ldots, k_s)$ such that $k_1 + k_2 + \cdots + k_s = n$).
This is why the previous model is also called the multinomial model.
Poisson urn model
Suppose we have n urns, $U_1, \ldots, U_n$, each of them containing white and black balls in different proportions. Let $p_i$ be the probability of getting a white ball from the i-th urn and $q_i$ the probability of getting a black ball from the same urn, $i = \overline{1, n}$. Since these are contrary events, we have $p_i + q_i = 1$, $i = \overline{1, n}$.
Our experiment consists in taking one ball from each urn (so we will get exactly n balls). We want to find the probability of getting k white balls among the selected n balls, $k \leq n$. We denote by $X_k$ the desired event.
Since the desired k white balls can be obtained from any k urns, $X_k$ can be written as
$$X_k = \bigcup \left( W_{i_1} \cap W_{i_2} \cap \cdots \cap W_{i_k} \cap \overline{W}_{i_{k+1}} \cap \cdots \cap \overline{W}_{i_n} \right),$$
where $W_i$ denotes the event of getting a white ball from the i-th urn and $(i_1, \ldots, i_n)$ is a permutation of the set $\{1, 2, \ldots, n\}$. Hence
$$P(X_k) = \sum p_{i_1} p_{i_2} \cdots p_{i_k} \, q_{i_{k+1}} \cdots q_{i_n},$$
where the sum is taken over the permutations $(i_1, \ldots, i_n)$ of the set $(1, 2, \ldots, n)$.
Remark 5. The previous value can also be obtained as the coefficient of $t^k$ in the following polynomial:
$$(p_1 t + q_1)(p_2 t + q_2) \cdots (p_n t + q_n).$$
Actually, we have the following more general identity:
$$\sum_{k=0}^{n} P(X_k) \, t^k = \prod_{i=1}^{n} (p_i t + q_i).$$
Remark 6. The Poisson urn model is a generalization of the binomial model. Indeed, if in the Poisson urn model we consider n urns with the same composition, then extracting one ball from each urn is the same as extracting n balls, with replacement, from a single urn.
In this case we have
$$\sum_{k=0}^{n} P(X_k) \, t^k = \prod_{i=1}^{n} (pt + q) = (pt + q)^n,$$
wherefrom we obtain
$$P(X_k) = C_n^k \, p^k q^{n-k},$$
as we expected.
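The generating polynomial of Remark 5 gives a direct way of computing all the $P(X_k)$ at once: multiply the factors $(p_i t + q_i)$ as lists of coefficients (a minimal sketch):

def poisson_urn(ps):
    # coeff[k] = P(X_k) = coefficient of t^k in prod_i (p_i*t + q_i)
    coeff = [1.0]
    for p in ps:
        q = 1.0 - p
        new = [0.0] * (len(coeff) + 1)
        for k, c in enumerate(coeff):
            new[k] += q * c      # take the q_i term (black ball)
            new[k + 1] += p * c  # take the p_i*t term (white ball)
        coeff = new
    return coeff

# for p_i = 1/6, 1/4, 1/3 the coefficient of t is 31/72 (see Example 3 below)
print(poisson_urn([1/6, 1/4, 1/3]))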
Urn models without replacement
Urn model with two states, without replacement
Let U be an urn which contains a white balls and b black balls. A trial consists in taking a ball from the urn, recording its colour and not replacing the ball into the urn. We want to find the probability that in n successive trials we get k white balls and $l = n - k$ black balls. The numbers k and l must satisfy the conditions $n = k + l$, $k \leq a$, $l \leq b$.
We observe that, since the ball extracted at each trial is not replaced, our experiment (which consists of n successive trials) can equivalently be performed by taking n balls at a time. By using this remark and the classical definition of probability, we get that the probability of the desired event $X_{a,b}^{k,l}$ is
$$P(X_{a,b}^{k,l}) = \frac{\text{no. of favorable outcomes}}{\text{no. of possible outcomes}} = \frac{C_a^k \cdot C_b^l}{C_{a+b}^{n}}.$$
Indeed, the number of possibilities of taking k white balls is $C_a^k$ and, for each choice of k white balls, there are $C_b^l$ possibilities of taking l black balls; hence the number of favorable cases is $C_a^k \cdot C_b^l$, as mentioned before.
In conclusion:
$$P(X_{a,b}^{k,l}) = \frac{C_a^k \cdot C_b^l}{C_{a+b}^{n}}, \qquad n = k + l, \; k \leq a, \; l \leq b.$$
The previous model can easily be generalized to more than two states, as follows.
Urn model with more than two states, without replacement
Let U be an urn which contains $a_i$ balls of colour $c_i$, $i = \overline{1, s}$. The experiment is similar to the previous one: a trial consists in taking a ball from the urn, recording its colour and not replacing it into the urn. We want to find the probability that in n successive trials we get $k_i$ balls of colour $c_i$, $i = \overline{1, s}$. The numbers $k_i$, $i = \overline{1, s}$, must satisfy the conditions $k_1 + k_2 + \cdots + k_s = n$ and $k_i \leq a_i$ for each $i = \overline{1, s}$.
The probability of the desired event $X_{a_1, a_2, \ldots, a_s}^{k_1, k_2, \ldots, k_s}$ can be obtained by the same reasoning as in the previous case:
$$P(X_{a_1, a_2, \ldots, a_s}^{k_1, k_2, \ldots, k_s}) = \frac{C_{a_1}^{k_1} \cdot C_{a_2}^{k_2} \cdots C_{a_s}^{k_s}}{C_{a_1 + a_2 + \cdots + a_s}^{n}}, \qquad k_1 + k_2 + \cdots + k_s = n \text{ and } k_i \leq a_i, \; i = \overline{1, s}.$$
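Both formulas are one-liners in Python (a minimal sketch; the second call anticipates Example 4 below, where the probability of no winning ticket among 4 drawn from 25 winning and 75 losing tickets is computed):

from math import comb
from fractions import Fraction

def hyper2(a, b, k, l):
    # k white from a, l black from b, drawn without replacement
    return Fraction(comb(a, k) * comb(b, l), comb(a + b, k + l))

def hyper(counts, draws):
    # counts[i] balls of colour i, draws[i] of them drawn
    num = 1
    for a, k in zip(counts, draws):
        num *= comb(a, k)
    return Fraction(num, comb(sum(counts), sum(draws)))

print(hyper2(10, 30, 2, 1))       # two-state example
print(1 - hyper2(25, 75, 0, 4))   # at least one winning ticket (Example 4)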
Examples
Example 1. Suppose that the probability that an item produced by a certain production line is defective is 0.1. Find the probability that a sample of 10 items will contain at most 1 defective item.
Solution. We shall use the binomial model (the urn model with 2 states, with replacement), since the probability of obtaining a defective item is the same at each trial.
Since we are interested in at most 1 defective item, we have to compute the probability of the event $X_{10}^0 \cup X_{10}^1$. The desired probability is
$$P(X_{10}^0 \cup X_{10}^1) = P(X_{10}^0) + P(X_{10}^1) = C_{10}^0 (0.1)^0 (0.9)^{10} + C_{10}^1 (0.1)(0.9)^9 = 0.9^{10} + 10 \cdot 0.1 \cdot 0.9^9 = 1.9 \cdot 0.9^9 \approx 0.736.$$
Example 2. An urn contains 10 white and 5 black balls. Balls are randomly selected, one at a time, until a white ball is obtained. If we assume that each selected ball is replaced before the next ball is drawn, what is the probability that:
(a) exactly 3 extractions are needed;
(b) at least 3 extractions are needed?
Solution. We shall use the geometric model, with
$$p = \frac{10}{15} = \frac{2}{3} \quad \text{and} \quad q = \frac{5}{15} = \frac{1}{3}.$$
a) $P(Y_3^1) = p \cdot q^2 = \frac{2}{3} \cdot \left(\frac{1}{3}\right)^2 = \frac{2}{27}$.
b) The desired event is
$$Y = Y_3^1 \cup Y_4^1 \cup \ldots = \bigcup_{n=3}^{\infty} Y_n^1.$$
It is easier to compute the probability of the contrary event, which is: at most two extractions are needed, $\overline{Y} = Y_1^1 \cup Y_2^1$. Then
$$P(Y) = 1 - P(\overline{Y}) = 1 - P(Y_1^1 \cup Y_2^1) = 1 - (P(Y_1^1) + P(Y_2^1)) = 1 - (p + pq) = q - pq = q(1 - p) = q^2 = \frac{1}{9}.$$
Example 3. The probabilities that three men hit a target are, respectively, $\frac{1}{6}$, $\frac{1}{4}$ and $\frac{1}{3}$. Each shoots once at the target.
a) Find the probability that exactly one of them hits the target.
b) Find the probability that at most two of them hit the target.
c) If only one hit the target, what is the probability that it was the first man?
Solution. We will use the Poisson model. With the notations introduced before we have:
$$p_1 = \frac{1}{6}, \; q_1 = \frac{5}{6}, \qquad p_2 = \frac{1}{4}, \; q_2 = \frac{3}{4}, \qquad p_3 = \frac{1}{3}, \; q_3 = \frac{2}{3}.$$
a) $P(X_1)$ is the coefficient of $t^1$ in the polynomial
$$(p_1 t + q_1)(p_2 t + q_2)(p_3 t + q_3) = \left(\frac{1}{6}t + \frac{5}{6}\right)\left(\frac{1}{4}t + \frac{3}{4}\right)\left(\frac{1}{3}t + \frac{2}{3}\right),$$
hence
$$P(X_1) = \frac{1}{6} \cdot \frac{3}{4} \cdot \frac{2}{3} + \frac{5}{6} \cdot \frac{1}{4} \cdot \frac{2}{3} + \frac{5}{6} \cdot \frac{3}{4} \cdot \frac{1}{3} = \frac{31}{72}.$$
We denote by $M_i$ the event that the target was hit by the i-th man, $i = \overline{1, 3}$. The previous probability can also be computed directly, as follows:
$$X_1 = (M_1 \cap \overline{M}_2 \cap \overline{M}_3) \cup (\overline{M}_1 \cap M_2 \cap \overline{M}_3) \cup (\overline{M}_1 \cap \overline{M}_2 \cap M_3),$$
$$P(X_1) = P(M_1) P(\overline{M}_2) P(\overline{M}_3) + P(\overline{M}_1) P(M_2) P(\overline{M}_3) + P(\overline{M}_1) P(\overline{M}_2) P(M_3)$$
$$= \frac{1}{6} \cdot \frac{3}{4} \cdot \frac{2}{3} + \frac{5}{6} \cdot \frac{1}{4} \cdot \frac{2}{3} + \frac{5}{6} \cdot \frac{3}{4} \cdot \frac{1}{3} = \frac{31}{72}.$$
b) $B = X_0 \cup X_1 \cup X_2$. We compute the probability of the contrary event $\overline{B} = X_3$:
$$P(B) = 1 - P(\overline{B}) = 1 - P(X_3) = 1 - \frac{1}{6} \cdot \frac{1}{4} \cdot \frac{1}{3} = \frac{71}{72}.$$
c)
$$P(M_1 \mid X_1) = \frac{P(M_1 \cap X_1)}{P(X_1)} = \frac{P(M_1 \cap \overline{M}_2 \cap \overline{M}_3)}{P(X_1)} = \frac{P(M_1) P(\overline{M}_2) P(\overline{M}_3)}{P(X_1)} = \frac{\frac{1}{6} \cdot \frac{3}{4} \cdot \frac{2}{3}}{\frac{31}{72}} = \frac{6}{31}.$$
Example 4. At a lottery, among 100 tickets 25 are winning tickets. A person buys 4 tickets from this lottery. Find the probability that at least one ticket is a winning one.
Solution. We use the urn model with two states, without replacement. The desired event is
$$A = X_4^1 \cup X_4^2 \cup X_4^3 \cup X_4^4.$$
We compute the probability of the contrary event $\overline{A} = X_4^0$:
$$P(A) = 1 - P(\overline{A}) = 1 - P(X_4^0) = 1 - \frac{C_{25}^0 \cdot C_{75}^4}{C_{100}^4} = 1 - \frac{C_{75}^4}{C_{100}^4}.$$
Example 5. In the last 30 years, the probability that a newborn is a girl has been 0.52. A family has 5 babies born in the last 30 years. What is the probability that:
a) the fifth-born child is the second boy of the family;
b) the first boy is the 4th newborn;
c) the last baby born in the family is a boy?
Solution. We use the Pascal model with
$$p = 1 - 0.52 = 0.48, \qquad q = 0.52.$$
a) $P(Y_5^2) = C_4^1 \, p^2 q^3 = 4 \cdot 0.48^2 \cdot 0.52^3$.
b) $P(Y_4^1) = C_3^0 \, p \, q^3 = 0.48 \cdot 0.52^3$.
c) $p = 0.48$.
Example 6. A dice is rolled fourteen times.
a) What is the probability that we obtain exactly one 6?
b) What is the probability that we obtain face 4 four times, face 6 twice and face 3 six times?
Solution. a) We use the urn model with 2 states (face 6 and the other faces), with replacement. In this case we have $n = 14$, $k = 1$, $p = \frac{1}{6}$ and $q = \frac{5}{6}$:
$$P(X_{14}^{1,13}) = C_{14}^1 \cdot \left(\frac{1}{6}\right)^1 \cdot \left(\frac{5}{6}\right)^{13} = 14 \cdot \frac{1}{6} \cdot \left(\frac{5}{6}\right)^{13} = \frac{14 \cdot 5^{13}}{6^{14}}.$$
b) We use the urn model with 4 states (face 4, face 6, face 3 and the other faces), with replacement. In this case we have
$$n = 14, \; k_1 = 4, \; k_2 = 2, \; k_3 = 6, \; k_4 = n - k_1 - k_2 - k_3 = 14 - 4 - 2 - 6 = 2, \qquad p_1 = p_2 = p_3 = \frac{1}{6}, \; p_4 = \frac{1}{2},$$
$$P(X_{14}^{4,2,6,2}) = \frac{14!}{4! \, 2! \, 6! \, 2!} \left(\frac{1}{6}\right)^4 \left(\frac{1}{6}\right)^2 \left(\frac{1}{6}\right)^6 \left(\frac{1}{2}\right)^2.$$
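Both answers are easy to evaluate numerically (a minimal sketch; the factorial ratio divides exactly, so integer division is safe):

from math import comb, factorial

pa = comb(14, 1) * (1 / 6) * (5 / 6) ** 13
multinom = factorial(14) // (factorial(4) * factorial(2) * factorial(6) * factorial(2))
pb = multinom * (1 / 6) ** 12 * (1 / 2) ** 2
print(pa, pb)  # ~0.218 and ~1.45e-4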
Miscellaneous examples
Example 1. Among the 20 students of a group, 6 speak English, 5 speak French and 2 speak German. If we choose one student at random, what is the probability that he/she knows a foreign language (English, French or German)?
Solution. Let E, F, G be the considered events. Then
$$P(E) = \frac{6}{20}, \qquad P(F) = \frac{5}{20}, \qquad P(G) = \frac{2}{20}.$$
If the desired event is denoted by X, then $X = E \cup F \cup G$. Since the pairs of events E and F, E and G, F and G are not mutually exclusive, we use the addition rule for computing the probability of the union $E \cup F \cup G$:
$$P(X) = P(E \cup F \cup G) = P(E) + P(F) + P(G) - P(E \cap G) - P(F \cap G) - P(E \cap F) + P(E \cap F \cap G).$$
The events E, F and G are independent, hence:
$$P(X) = P(E) + P(F) + P(G) - P(E)P(G) - P(F)P(G) - P(E)P(F) + P(E)P(F)P(G)$$
$$= \frac{6 + 5 + 2}{20} - \frac{6 \cdot 5 + 6 \cdot 2 + 5 \cdot 2}{20 \cdot 20} + \frac{6 \cdot 5 \cdot 2}{20 \cdot 20 \cdot 20} = \frac{13}{20} - \frac{52}{400} + \frac{3}{400} = \frac{211}{400}.$$
Example 2. We are given 3 urns which contain white and black balls as follows: $U_1(a, b)$, $U_2(c, d)$ and $U_3(e, f)$, the white balls being listed first. A ball is drawn from the third urn. If the selected ball is white, it is placed in the first urn and the second ball is drawn from the urn $U_1$. If the first ball is black, it is placed in the second urn, wherefrom the second ball is drawn.
Determine the probability of the following events:
a) the second ball is white;
b) the first ball is white, given that the second ball is black.
Solution. We use the following notations:
$W_i$: the event that the i-th ball is white, $i = 1, 2$;
$B_i$: the event that the i-th ball is black, $i = 1, 2$.
a) We apply the total probability formula, where the partition is $\{W_1, B_1\}$:
$$P(W_2) = P(W_1) \cdot P(W_2 \mid W_1) + P(B_1) \cdot P(W_2 \mid B_1) = \frac{e}{e+f} \cdot \frac{a+1}{a+b+1} + \frac{f}{e+f} \cdot \frac{c}{c+d+1}.$$
In the same way we get:
$$P(B_2) = P(W_1) \cdot P(B_2 \mid W_1) + P(B_1) \cdot P(B_2 \mid B_1) = \frac{e}{e+f} \cdot \frac{b}{a+b+1} + \frac{f}{e+f} \cdot \frac{d+1}{c+d+1}.$$
b) According to Bayes' rule we have
$$P(W_1 \mid B_2) = \frac{P(W_1) \cdot P(B_2 \mid W_1)}{P(B_2)} = \frac{\dfrac{e}{e+f} \cdot \dfrac{b}{a+b+1}}{\dfrac{e}{e+f} \cdot \dfrac{b}{a+b+1} + \dfrac{f}{e+f} \cdot \dfrac{d+1}{c+d+1}}.$$
Example 3. An urn contains a white balls ($a \geq 3$) and b black balls. If we extract 3 balls without replacement, what is the probability that all three balls are white?
Solution. If we denote by $W_i$ the event that the i-th ball is white, $i = \overline{1, 3}$, and by X the desired event, then
$$X = W_1 \cap W_2 \cap W_3.$$
By applying the multiplication rule we obtain:
$$P(X) = P(W_1 \cap W_2 \cap W_3) = P(W_1) \cdot P(W_2 \mid W_1) \cdot P(W_3 \mid W_1 \cap W_2) = \frac{a}{a+b} \cdot \frac{a-1}{a+b-1} \cdot \frac{a-2}{a+b-2}.$$
Example 4. Three machines A, B and C produce respectively 40%, 30% and 30% of the total number of items of a factory. The percentages of defective output of these machines are 2%, 4% and 5%, respectively. Suppose an item is selected at random and is found to be defective. Find the probability that the item was produced by machine A.
Solution. We are given
$$P(A) = 0.4, \; P(B) = 0.3, \; P(C) = 0.3, \qquad P(D \mid A) = 0.02, \; P(D \mid B) = 0.04, \; P(D \mid C) = 0.05.$$
(The corresponding probability tree has first-level branches A, B, C with probabilities 0.4, 0.3, 0.3 and, from each machine, branches D and $\overline{D}$ carrying the conditional probabilities above.)
By using Bayes' formula and the total probability formula we get:
$$P(A \mid D) = \frac{P(A) \cdot P(D \mid A)}{P(D)} = \frac{P(A) \cdot P(D \mid A)}{P(A) P(D \mid A) + P(B) P(D \mid B) + P(C) P(D \mid C)} = \frac{0.4 \cdot 0.02}{0.4 \cdot 0.02 + 0.3 \cdot 0.04 + 0.3 \cdot 0.05} \approx 0.229.$$
Example 5. A student takes a multiple-choice exam. Suppose that for each question he either knows the answer or gambles and chooses an option at random (each question has exactly 4 choices, one of which is the correct answer). To pass, students need to answer at least 60% of the questions correctly. The student has studied for a minimal pass, that is, with probability 0.6 he knows the answer to a question. Given that he answers a question correctly, what is the probability that he actually knows the answer?
Solution. Let C and K denote, respectively, the events that the student answers the question correctly and that he knows the answer. Now
$$P(K \mid C) = \frac{P(K) \cdot P(C \mid K)}{P(C)} = \frac{P(K) \cdot P(C \mid K)}{P(K) \cdot P(C \mid K) + P(\overline{K}) \cdot P(C \mid \overline{K})} = \frac{0.6 \cdot 1}{0.6 \cdot 1 + 0.4 \cdot 0.25} = \frac{0.6}{0.6 + 0.1} = \frac{6}{7} \approx 0.857.$$
Example 6. Suppose A and B are events with $0 < P(A) < 1$ and $0 < P(B) < 1$.
a) If A and B are independent, can they be mutually exclusive?
b) If A and B are mutually exclusive, can they be independent?
c) If $A \subseteq B$, can A and B be independent?
Solution. a) No. Since A and B are independent, then $P(A \cap B) = P(A) \cdot P(B) \neq 0$, wherefrom $A \cap B \neq \emptyset$.
b) No. Since A and B are mutually exclusive, then $P(A \cap B) = 0 \neq P(A) \cdot P(B)$.
c) No. If $A \subseteq B$, then $P(A \cap B) = P(A) \neq P(A) \cdot P(B)$, since $P(B) < 1$.
Example 7. Suppose that each of three men at a party throws his hat into the center of the room. The hats are first mixed up and then each man randomly selects a hat. What is the probability that none of the three men selects his own hat?
Solution. Let $H_i$, $i = \overline{1, 3}$, be the event that the i-th man selects his own hat. We have to compute the probability of the event $\overline{H}_1 \cap \overline{H}_2 \cap \overline{H}_3$. In order to do that, we will compute the probability of the contrary event:
$$\overline{\overline{H}_1 \cap \overline{H}_2 \cap \overline{H}_3} = \overline{\overline{H}_1} \cup \overline{\overline{H}_2} \cup \overline{\overline{H}_3} = H_1 \cup H_2 \cup H_3.$$
Hence
$$P(\overline{H}_1 \cap \overline{H}_2 \cap \overline{H}_3) = 1 - P(H_1 \cup H_2 \cup H_3).$$
To calculate $P(H_1 \cup H_2 \cup H_3)$ we apply the addition rule:
$$P(H_1 \cup H_2 \cup H_3) = P(H_1) + P(H_2) + P(H_3) - P(H_1 \cap H_2) - P(H_1 \cap H_3) - P(H_2 \cap H_3) + P(H_1 \cap H_2 \cap H_3).$$
It remains to compute $P(H_i \cap H_j)$, $i \neq j$, and $P(H_1 \cap H_2 \cap H_3)$.
For each $i, j \in \{1, 2, 3\}$, $i \neq j$, we have:
$$P(H_i \cap H_j) = P(H_i) \cdot P(H_j \mid H_i) = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6},$$
$$P(H_1 \cap H_2 \cap H_3) = P(H_1) \cdot P(H_2 \mid H_1) \cdot P(H_3 \mid H_1 \cap H_2) = \frac{1}{3} \cdot \frac{1}{2} \cdot 1 = \frac{1}{6}.$$
Now we have that:
$$P(H_1 \cup H_2 \cup H_3) = \frac{1}{3} + \frac{1}{3} + \frac{1}{3} - \frac{1}{6} - \frac{1}{6} - \frac{1}{6} + \frac{1}{6} = \frac{2}{3}.$$
Hence, the probability that none of the men selects his own hat is
$$1 - \frac{2}{3} = \frac{1}{3}.$$
Example 8. In the poker game (dealing 5 cards from a well-shuffled deck of 52 cards), find the following probabilities:
a) the hand is all spades;
b) the hand is a flush;
c) the hand is a full house.
Solution. To calculate the probability associated with any particular hand, we first need to calculate how many hands can be dealt. Since the order of the cards is irrelevant, we use combinations:
$$C_{52}^5 = \frac{52!}{5! \, 47!} = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2{,}598{,}960.$$
a) We first determine in how many ways we can select 5 spades: $C_{13}^5 = 1287$. Thus, the probability of a hand of spades is
$$\frac{C_{13}^5}{C_{52}^5} = \frac{1287}{2598960} \approx 0.0005.$$
b) A flush is a hand of five cards all in the same suit. The probability of a flush is
$$\frac{4 \cdot C_{13}^5}{C_{52}^5} = \frac{4 \cdot 1287}{2598960} \approx 0.002.$$
c) A full house consists of three of a kind and one pair. There are thirteen values that the three of a kind may have, and then twelve possible values for the pair. This gives $13 \cdot C_4^3 \cdot 12 \cdot C_4^2$ favorable hands, and in consequence the probability of a full house is equal to:
$$\frac{13 \cdot C_4^3 \cdot 12 \cdot C_4^2}{C_{52}^5} = \frac{13 \cdot 4 \cdot 12 \cdot 6}{2598960} \approx 0.0014.$$
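The three poker probabilities, recomputed exactly (a minimal sketch):

from math import comb

hands = comb(52, 5)                                # 2,598,960
print(comb(13, 5) / hands)                         # all spades, ~0.000495
print(4 * comb(13, 5) / hands)                     # flush, ~0.00198
print(13 * comb(4, 3) * 12 * comb(4, 2) / hands)   # full house, ~0.00144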
Example 9. (Probability as a continuous set function)
A sequence of events $\{A_n\}_{n \geq 1}$ is said to be an increasing sequence if $A_1 \subseteq A_2 \subseteq \cdots \subseteq A_n \subseteq A_{n+1} \subseteq \ldots$, and it is said to be a decreasing sequence if $A_1 \supseteq A_2 \supseteq \cdots \supseteq A_n \supseteq A_{n+1} \supseteq \ldots$. If $\{A_n\}_{n \geq 1}$ is an increasing sequence of events, then we define the new event $\lim_{n \to \infty} A_n$ by $\lim_{n \to \infty} A_n = \bigcup_{n \geq 1} A_n$. Similarly, if $\{A_n\}_{n \geq 1}$ is a decreasing sequence of events, then $\lim_{n \to \infty} A_n$ is defined by $\lim_{n \to \infty} A_n = \bigcap_{n \geq 1} A_n$.
Prove that if $\{A_n\}_{n \geq 1}$ is either an increasing or a decreasing sequence of events, then
$$\lim_{n \to \infty} P(A_n) = P\left(\lim_{n \to \infty} A_n\right).$$
Solution. Suppose first that $\{A_n\}_{n \geq 1}$ is an increasing sequence, and define the events $B_n$, $n \geq 1$, by
$$B_1 = A_1, \qquad B_n = A_n - A_{n-1}, \; n > 1.$$
It is easy to verify that the events $\{B_n\}_{n \geq 1}$ are mutually exclusive events such that
$$\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i \quad \text{and} \quad \bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i, \; n \geq 1.$$
Hence
$$P\left(\lim_{n \to \infty} A_n\right) = P\left(\bigcup_{i=1}^{\infty} A_i\right) = P\left(\bigcup_{i=1}^{\infty} B_i\right) = \sum_{i=1}^{\infty} P(B_i) = \lim_{n \to \infty} \sum_{i=1}^{n} P(B_i) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} B_i\right) = \lim_{n \to \infty} P\left(\bigcup_{i=1}^{n} A_i\right) = \lim_{n \to \infty} P(A_n),$$
which proves the result when $\{A_n\}_{n \geq 1}$ is increasing.
If $\{A_n\}_{n \geq 1}$ is a decreasing sequence, then $\{\overline{A}_n\}_{n \geq 1}$ is an increasing sequence, hence
$$P\left(\bigcup_{i=1}^{\infty} \overline{A}_i\right) = \lim_{n \to \infty} P(\overline{A}_n).$$
Since $\bigcup_{n=1}^{\infty} \overline{A}_n = \overline{\bigcap_{n=1}^{\infty} A_n}$, the previous equality becomes
$$1 - P\left(\bigcap_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} (1 - P(A_n)) = 1 - \lim_{n \to \infty} P(A_n),$$
which proves the result.
Chapter 7
Random variables
In general, in performing an experiment we are not interested in its outcomes
as such, but rather in some function of them. For example, suppose one plays a game
where the payoff is a function of the number of dots on two dice: one receives 2
euros if the total number of dots equals 2 or 3, one receives 4 euros if the total
number of dots equals 4, 5, 6 or 7, and one has to pay 8 euros otherwise. Our
payoff is a function of the total number of dots on the dice. In order to compute the
probability that the payoff equals some number we compute the probability that the
total number of dots corresponds to the number selected. This leads to the notion of
random variables.
Definition. Let $(S, \mathcal{T}, P)$ be a probability space.
A random variable is a (measurable) function from the probability space to the
real numbers:
$$X : S \to \mathbb{R} \ \text{ such that for each } x \in \mathbb{R}, \ \{s : X(s) < x\} \in \mathcal{T}.$$
Random variables are denoted by capital letters, such as X, Y, Z, U, V and W.
Example. In the particular example described before we obtain the following
random variable:
$$X : S \to \{2, 4, -8\}$$
$$X(s) = 2, \ \text{for each } s \in A = \{(1,1), (1,2), (2,1)\}$$
$$X(s) = 4, \ \text{for each } s \in B = \{(2,2), (1,3), (3,1), (1,4), (2,3), (3,2), (4,1), (1,5), (2,4), (3,3), (4,2), (5,1), (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)\}$$
$$X(s) = -8 \ \text{otherwise, i.e., for } s \in C = S \setminus (A \cup B).$$
Since in this experiment we have $\mathcal{T} = \mathcal{P}(S)$, the condition $\forall x \in \mathbb{R}, \ \{s \in S \mid X(s) < x\} \in \mathcal{T}$ is fulfilled.
For short, we shall use the notation $\{X < x\}$ for the event $\{s \in S \mid X(s) < x\}$.
Remark. (Events defined by a random variable)
Let $(S, \mathcal{T}, P)$ be a probability space and let $X : S \to \mathbb{R}$ be a random variable. If
$x \in \mathbb{R}$, then:
a) $\{X \leq x\} = \{s \in S \mid X(s) \leq x\} \in \mathcal{T}$
b) $\{X = x\} = \{s \in S \mid X(s) = x\} \in \mathcal{T}$
c) $\{X \geq x\}, \{X > x\} \in \mathcal{T}$
d) $\{X \leq x\} \cup \{X > x\} = S$; $\{X \leq x\} \cap \{X > x\} = \emptyset$;
$\{X \geq x\} \cup \{X < x\} = S$; $\{X \geq x\} \cap \{X < x\} = \emptyset$.
Proof. We shall use the definition of the $\sigma$-field $\mathcal{T}$ and the properties of event
operations.
a) $\{X \leq x\} = \bigcap_{n > 0} \big\{X < x + \frac{1}{n}\big\} \in \mathcal{T}$
b) $\{X = x\} = \{X \leq x\} \setminus \{X < x\} \in \mathcal{T}$
c) $\{X \geq x\} = \overline{\{X < x\}} \in \mathcal{T}$; $\{X > x\} = \overline{\{X \leq x\}} \in \mathcal{T}$
d) Obvious.
7.1 Discrete random variables
Definition. Let $(S, \mathcal{T}, P)$ be a probability space and let $X : S \to \mathbb{R}$ be a random
variable.
X is said to be a discrete random variable (d.r.v.) if $X(S) = M \subset \mathbb{R}$ is finite or
countable.
A set M is countable if there is a one-to-one correspondence between M and $\mathbb{N}$
(the set of natural numbers).
In consequence, a random variable $X : S \to M \subset \mathbb{R}$ is a discrete random variable
if
$$M = \{x_i \mid i \in I \subset \mathbb{N}\}.$$
Definition. (Probability mass function (p.m.f.))
Let $X : S \to M$, $M = \{x_i \mid i \in I \subset \mathbb{N}\}$, be a d.r.v.
The function $f = f_X : M \to \mathbb{R}$ defined by
$$f(x_i) = P(X = x_i), \quad i \in I,$$
is called the probability mass function of the d.r.v. X.
Example 1. The random variable described before is a d.r.v. since $M = \{2, 4, -8\}$.
Its p.m.f. is $f : \{2, 4, -8\} \to \mathbb{R}$ defined by
$$f(2) = P(A) = \frac{3}{36} = \frac{1}{12}, \qquad f(4) = P(B) = \frac{18}{36} = \frac{1}{2}, \qquad f(-8) = P(C) = \frac{15}{36} = \frac{5}{12}.$$
Theorem. (Properties of the p.m.f.)
Let $X : S \to M = \{x_i \mid i \in I \subset \mathbb{N}\}$ be a d.r.v. and let $f_X = f$ be its probability
mass function. Then
1) $f(x_i) \geq 0$, $\forall i \in I$;
2) $\sum_{i \in I} f(x_i) = 1$.
Proof. We will use the properties of the probability function.
1) $f(x_i) = P(X = x_i) \geq 0$.
2) We remark first that the events $\{X = x_i\}_{i \in I}$ form a partition of the sample
space S. By using this fact, we get that
$$\sum_{i \in I} f(x_i) = \sum_{i \in I} P(X = x_i) = P\Big(\bigcup_{i \in I} \{X = x_i\}\Big) = P(S) = 1,$$
as desired.
Notation. If $X : S \to M = \{x_i \mid i \in I \subset \mathbb{N}\}$ is a d.r.v. with the p.m.f. $f_X$ we will
use the following notation:
$$p_i = f(x_i), \quad i \in I.$$
By using this notation the properties of the p.m.f. can be written as:
1) $p_i \geq 0$, $\forall i \in I$;
2) $\sum_{i \in I} p_i = 1$.
Definition. (The distribution of a d.r.v.)
The distribution of a d.r.v. X is a table of the following form:
$$X : \begin{pmatrix} x_i \\ f_X(x_i) \end{pmatrix}_{i \in I} \quad\text{or}\quad X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I} \quad\text{with}\quad \begin{cases} p_i \geq 0 \\ \sum_{i \in I} p_i = 1. \end{cases}$$
The values taken by X are written on the first row of the previous table and the
probabilities
$$p_i = P(X = x_i), \quad i \in I,$$
are written on the second row of the table.
Operations with discrete random variables
Let $(S, \mathcal{T}, P)$ be a probability space and let X, Y be two discrete random variables
defined on S:
$$X : S \to M_1 = \{x_i \mid i \in I \subset \mathbb{N}\} \ \text{d.r.v.}, \qquad X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I}, \quad p_i \geq 0, \ \sum_{i \in I} p_i = 1,$$
$$Y : S \to M_2 = \{y_j \mid j \in J \subset \mathbb{N}\} \ \text{d.r.v.}, \qquad Y : \begin{pmatrix} y_j \\ q_j \end{pmatrix}_{j \in J}, \quad q_j \geq 0, \ \sum_{j \in J} q_j = 1.$$
Definition. The discrete random variables X and Y are called independent if
for each $i \in I$ and $j \in J$ the events $\{X = x_i\}$, $\{Y = y_j\}$ are independent
events. In this case
$$P(X = x_i \cap Y = y_j) = P(X = x_i) \cdot P(Y = y_j) = p_i q_j.$$
The sum of two discrete random variables
$$X + Y : S \to M = \{x_i + y_j \mid i \in I, j \in J\}$$
$$f_{X+Y}(x_i + y_j) = P(X = x_i \cap Y = y_j) \overset{X, Y \text{ indep.}}{=} P(X = x_i) \cdot P(Y = y_j) = p_i q_j.$$
The distribution of the sum X + Y is
$$X + Y : \begin{pmatrix} x_i + y_j \\ f_{X+Y}(x_i + y_j) \end{pmatrix}_{i \in I, j \in J}$$
If X and Y are independent then
$$X + Y : \begin{pmatrix} x_i + y_j \\ p_i q_j \end{pmatrix}_{i \in I, j \in J}$$
Example 2. Let X, Y be two independent discrete random variables defined by:
$$X : \begin{pmatrix} -1 & 1 \\ 0.3 & 0.7 \end{pmatrix} \quad\text{and}\quad Y : \begin{pmatrix} 0 & 2 \\ 0.6 & 0.4 \end{pmatrix}$$
Compute X + Y.
Solution. Since X and Y are independent, we have:
$$X + Y : \begin{pmatrix} -1+0 & -1+2 & 1+0 & 1+2 \\ 0.3 \cdot 0.6 & 0.3 \cdot 0.4 & 0.7 \cdot 0.6 & 0.7 \cdot 0.4 \end{pmatrix} = \begin{pmatrix} -1 & 1 & 1 & 3 \\ 0.18 & 0.12 & 0.42 & 0.28 \end{pmatrix}.$$
As we can see, the value 1 is taken twice, with probabilities 0.12 and 0.42. Then
$$P(X + Y = 1) = 0.12 + 0.42 = 0.54.$$
Finally,
$$X + Y : \begin{pmatrix} -1 & 1 & 3 \\ 0.18 & 0.54 & 0.28 \end{pmatrix}.$$
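This bookkeeping (multiply the probabilities, then merge equal values) is mechanical, so a short sketch may help; the helper name `add_independent` below is ours. It reproduces the distribution just computed.

```python
from collections import defaultdict

def add_independent(px: dict, py: dict) -> dict:
    """Distribution of X + Y for independent discrete X, Y,
    given as {value: probability} dictionaries."""
    out = defaultdict(float)
    for x, p in px.items():
        for y, q in py.items():
            out[x + y] += p * q  # equal sums are merged automatically
    return dict(out)

X = {-1: 0.3, 1: 0.7}
Y = {0: 0.6, 2: 0.4}
print(add_independent(X, Y))  # {-1: 0.18, 1: 0.54, 3: 0.28}
```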
The product of two discrete random variables
$$X \cdot Y : S \to M = \{x_i y_j \mid i \in I, j \in J\}$$
$$f_{XY}(x_i y_j) = P(X = x_i \cap Y = y_j) \overset{X, Y \text{ indep.}}{=} P(X = x_i) \cdot P(Y = y_j) = p_i q_j.$$
The distribution of the d.r.v. XY is:
$$XY : \begin{pmatrix} x_i y_j \\ f_{XY}(x_i y_j) \end{pmatrix}_{i \in I, j \in J}$$
If X and Y are independent, then
$$XY : \begin{pmatrix} x_i y_j \\ p_i q_j \end{pmatrix}_{i \in I, j \in J}$$
The sum (product) of a d.r.v. with a constant
If $c \in \mathbb{R}$, then
$$c + X : \begin{pmatrix} c + x_i \\ p_i \end{pmatrix}_{i \in I} \qquad c \cdot X : \begin{pmatrix} c x_i \\ p_i \end{pmatrix}_{i \in I}$$
The inverse of a discrete random variable
If $x_i \neq 0$, $\forall i \in I$, then
$$X^{-1} : \begin{pmatrix} \dfrac{1}{x_i} \\ p_i \end{pmatrix}_{i \in I}$$
Example. If X and Y are the discrete random variables defined in Example 2,
then
$$X^{-1} : \begin{pmatrix} -1 & 1 \\ 0.3 & 0.7 \end{pmatrix},$$
while $Y^{-1}$ is not defined, since Y takes the value 0.
The power of a discrete random variable
Let $k \in \mathbb{R}$ and $X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I}$.
If $x_i^k$ is well defined for each $i \in I$, then
$$X^k : \begin{pmatrix} x_i^k \\ p_i \end{pmatrix}_{i \in I}$$
Example. Let X, Y be the independent d.r.v. defined by
$$X : \begin{pmatrix} -1 & 0 & 1 & 2 \\ 0.3 & 0.4 & 0.2 & p \end{pmatrix} \quad\text{and}\quad Y : \begin{pmatrix} 1 & 2 & 3 \\ 0.2 & 0.4 & q \end{pmatrix}$$
Compute: $X^2$; $2X - Y$; $X^2 - X$.
Solution. First, observe that
$$p = 1 - 0.3 - 0.4 - 0.2 = 0.1 \quad\text{and}\quad q = 1 - 0.2 - 0.4 = 0.4.$$
$$X^2 : \begin{pmatrix} (-1)^2 & 0^2 & 1^2 & 2^2 \\ 0.3 & 0.4 & 0.2 & 0.1 \end{pmatrix}, \quad\text{hence}\quad X^2 : \begin{pmatrix} 0 & 1 & 4 \\ 0.4 & 0.3 + 0.2 & 0.1 \end{pmatrix} = \begin{pmatrix} 0 & 1 & 4 \\ 0.4 & 0.5 & 0.1 \end{pmatrix}$$
$$2X - Y : 2\begin{pmatrix} -1 & 0 & 1 & 2 \\ 0.3 & 0.4 & 0.2 & 0.1 \end{pmatrix} - \begin{pmatrix} 1 & 2 & 3 \\ 0.2 & 0.4 & 0.4 \end{pmatrix}$$
$$2X - Y : \begin{pmatrix} -2-1 & -2-2 & -2-3 & 0-1 & 0-2 & 0-3 & 2-1 & 2-2 & 2-3 & 4-1 & 4-2 & 4-3 \\ 0.3 \cdot 0.2 & 0.3 \cdot 0.4 & 0.3 \cdot 0.4 & 0.4 \cdot 0.2 & 0.4 \cdot 0.4 & 0.4 \cdot 0.4 & 0.2 \cdot 0.2 & 0.2 \cdot 0.4 & 0.2 \cdot 0.4 & 0.1 \cdot 0.2 & 0.1 \cdot 0.4 & 0.1 \cdot 0.4 \end{pmatrix}$$
Collecting equal values:
$$2X - Y : \begin{pmatrix} -5 & -4 & -3 & -2 & -1 & 0 & 1 & 2 & 3 \\ 0.12 & 0.12 & 0.22 & 0.16 & 0.16 & 0.08 & 0.08 & 0.04 & 0.02 \end{pmatrix}$$
The computation rules introduced before cannot be used in order to determine
$X^2 - X$, since the variables $X^2$ and X are not independent. Working directly on the values of X:
$$X^2 - X : \begin{pmatrix} (-1)^2 + 1 & 0^2 - 0 & 1^2 - 1 & 2^2 - 2 \\ 0.3 & 0.4 & 0.2 & 0.1 \end{pmatrix}, \quad\text{hence}\quad X^2 - X : \begin{pmatrix} 0 & 2 \\ 0.4 + 0.2 & 0.3 + 0.1 \end{pmatrix} = \begin{pmatrix} 0 & 2 \\ 0.6 & 0.4 \end{pmatrix}.$$
7.2 The distribution function of a random variable
Definition. The distribution function F (or the cumulative distribution
function) of the random variable X is the function $F : \mathbb{R} \to [0, 1]$ defined by
$$F(x) = P(X < x) = P(\{s \in S \mid X(s) < x\}).$$
If we want to mention the role of X we denote F by $F_X$.
Example. Compute the cumulative distribution function of the following discrete
random variable
$$X : \begin{pmatrix} 0 & 1 & 2 \\ \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{pmatrix}.$$
Solution. If $x \leq 0$, $F(x) = P(X < x) = P(\emptyset) = 0$.
If $x \in (0, 1]$, $F(x) = P(X < x) = P(X = 0) = \frac{1}{4}$.
If $x \in (1, 2]$, $F(x) = P(X < x) = P(X = 0 \cup X = 1) = \frac{1}{4} + \frac{1}{2} = \frac{3}{4}$.
If $x > 2$, $F(x) = P(X < x) = P(X = 0 \cup X = 1 \cup X = 2) = P(S) = 1$.
Hence,
$$F(x) = \begin{cases} 0, & x \leq 0 \\ \frac{1}{4}, & 0 < x \leq 1 \\ \frac{3}{4}, & 1 < x \leq 2 \\ 1, & x > 2. \end{cases}$$
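Since F only changes at the values taken by X, it can be evaluated by accumulating the probabilities of the values strictly below x. A minimal sketch (the function name `cdf` is ours) for the distribution above:

```python
def cdf(dist, x):
    """F(x) = P(X < x), with the left-continuous convention used in this text."""
    return sum(p for value, p in dist.items() if value < x)

X = {0: 0.25, 1: 0.5, 2: 0.25}
for x in (-1, 0, 0.5, 1, 1.5, 2, 3):
    print(x, cdf(X, x))  # 0, 0, 0.25, 0.25, 0.75, 0.75, 1.0
```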
Theorem. (Properties of the cumulative distribution function) The cumulative
distribution function $F = F_X$ of a random variable X has the following
properties.
1) If $x_1, x_2 \in \mathbb{R}$, $x_1 < x_2$, then
$$P(x_1 \leq X < x_2) = F(x_2) - F(x_1)$$
$$P(x_1 < X < x_2) = F(x_2) - F(x_1) - P(X = x_1)$$
$$P(x_1 \leq X \leq x_2) = F(x_2) - F(x_1) + P(X = x_2)$$
$$P(x_1 < X \leq x_2) = F(x_2) - F(x_1) + P(X = x_2) - P(X = x_1)$$
2) F is a monotone increasing function.
3) F is continuous from the left, i.e.
$$\lim_{y \nearrow x} F(y) = F(x - 0) = F(x), \quad \forall x \in \mathbb{R}.$$
4) $\lim_{x \to -\infty} F(x) = 0$, $\lim_{x \to \infty} F(x) = 1$.
5) If $x \in \mathbb{R}$, then
$$P(X \leq x) = F(x + 0) = \lim_{y \searrow x} F(y).$$
6) If $x \in \mathbb{R}$, then
$$P(X = x) = \lim_{y \searrow x} F(y) - F(x) = F(x + 0) - F(x).$$
7) The set of all points of discontinuity of F is at most countable.
Proof. 1) Let $x_1, x_2 \in \mathbb{R}$, $x_1 < x_2$.
The events $\{X < x_1\}$ and $\{x_1 \leq X < x_2\}$ are mutually exclusive and satisfy the
equality:
$$\{X < x_1\} \cup \{x_1 \leq X < x_2\} = \{X < x_2\},$$
$$F(x_2) = P(X < x_2) = P(\{X < x_1\} \cup \{x_1 \leq X < x_2\}) = P(X < x_1) + P(x_1 \leq X < x_2) = F(x_1) + P(x_1 \leq X < x_2),$$
hence
$$P(x_1 \leq X < x_2) = F(x_2) - F(x_1).$$
Similarly, starting from the equality
$$\{X < x_1\} \cup \{x_1 < X < x_2\} \cup \{X = x_1\} = \{X < x_2\},$$
we obtain
$$P(X < x_1) + P(x_1 < X < x_2) + P(X = x_1) = P(X < x_2),$$
hence
$$P(x_1 < X < x_2) = F(x_2) - F(x_1) - P(X = x_1).$$
The remaining two equalities can be obtained in the same way.
2) Property 2 follows because for $x_1 < x_2$ the event $\{X < x_1\}$ is contained in the
event $\{X < x_2\}$ and cannot have a larger probability, hence $F(x_1) \leq F(x_2)$.
3) Let $x \in \mathbb{R}$ and let $(x_n)_{n \in \mathbb{N}}$ be an arbitrary increasing sequence such that $\lim_{n\to\infty} x_n = x$. If $x_n$ increases to x, then the events $\{X < x_n\}$ are increasing events whose union is
the event $\{X < x\}$:
$$\bigcup_{n \geq 1} \{X < x_n\} = \{X < x\}.$$
Hence, by the continuity property of probabilities (see Example 9 in the previous chapter):
$$\lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} P(X < x_n) = P\Big(\bigcup_{n \geq 1} \{X < x_n\}\Big) = P(X < x) = F(x).$$
Hence, $F(x - 0) = F(x)$.
4) If $(x_n)_{n \in \mathbb{N}}$ increases to $\infty$, then the events $\{X < x_n\}$, $n \geq 1$, are increasing
events whose union is $\{X < \infty\} = S$.
Hence, $\lim_{n\to\infty} F(x_n) = \lim_{n\to\infty} P(X < x_n) = P(X < \infty) = 1$, which proves the second
part of the fourth property.
The proof of the first part of this property is similar and is left as an exercise.
5), 6)
$$F(x + 0) = \lim_{y \searrow x} F(y) = \lim_{n\to\infty} F\Big(x + \frac{1}{n}\Big) = \lim_{n\to\infty} P\Big(X < x + \frac{1}{n}\Big) = P\Big(\bigcap_{n \geq 1} \Big\{X < x + \frac{1}{n}\Big\}\Big) = P(X \leq x) = P(X < x) + P(X = x) = F(x) + P(X = x).$$
7) The proof of this property is beyond the scope of this text and it will be omitted.
Remark. If $x \in \mathbb{R}$ and F is continuous at x, then $P(X = x) = 0$.
Proof. Since F is continuous at x, then $F(x + 0) = F(x)$. By applying property
6 of the previous theorem we get
$$P(X = x) = F(x + 0) - F(x) = F(x) - F(x) = 0.$$
7.3 Continuous random variables
In the previous sections we considered discrete random variables, that is, random
variables whose set of possible values is at most countable. However, there also exist
random variables whose set of possible values is uncountable.
Definition. Let $(S, \mathcal{T}, P)$ be a probability space and let $X : S \to \mathbb{R}$ be a random
variable whose cumulative distribution function is F.
If there exists a function $f : \mathbb{R} \to \mathbb{R}$ such that
$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$
for each $x \in \mathbb{R}$, then
a) X is said to be a continuous random variable (c.r.v.);
b) f is called the probability density function of X.
The distribution of a c.r.v. X whose density is f is denoted as:
$$X : \begin{pmatrix} x \\ f(x) \end{pmatrix}_{x \in \mathbb{R}}$$
Theorem. (Properties of the density function of a continuous random
variable) Let X be a continuous random variable and let $f : \mathbb{R} \to \mathbb{R}$ be its density
function. Then the following properties hold:
1) $f(x) \geq 0$, $\forall x \in \mathbb{R}$;
2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Remark. If X is a continuous random variable having the distribution function
F, then all the properties of a distribution function hold. Also we have:
a) F is continuous on $\mathbb{R}$, hence for each $x \in \mathbb{R}$ we have
$$P(X = x) = 0.$$
b) If $x_1, x_2 \in \mathbb{R}$, $x_1 < x_2$, then
$$P(x_1 < X < x_2) = P(x_1 \leq X < x_2) = P(x_1 \leq X \leq x_2) = P(x_1 < X \leq x_2) = \int_{x_1}^{x_2} f(t)\,dt.$$
c) F is differentiable on $\mathbb{R} \setminus I$, where I is a set which is at most countable, and
$$F'(x) = f(x), \quad \forall x \in \mathbb{R} \setminus I.$$
7.4 Numerical characteristics of random variables
Expected value
One of the most important concepts in probability theory is that of the expectation
of a random variable.
If X is a discrete random variable defined as
$$X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I \subset \mathbb{N}}, \quad p_i \geq 0, \ i \in I, \ \sum_{i \in I} p_i = 1,$$
such that $\sum_{i \in I} |x_i| p_i < \infty$, then the expectation or the expected value of X, denoted
by E(X), is:
$$E(X) = \sum_{i \in I} x_i p_i.$$
This is also known as the mean, the average or the first moment of X.
Hence, the expected value of X is a weighted average of the possible values that
X can take on, each value being weighted by the probability that X assumes it.
The expected value can be seen as a guide to the location of X and is often called
a location parameter.
If X is a continuous random variable having the density f such that $\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty$,
then X has an expected value, which is given by
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Remark. The condition $\sum_{i \in I} |x_i| p_i < \infty$ is needed because, if it is violated, it is
known that $\sum_{i \in I} x_i p_i$ may take different values, depending on the order of summation.
Example. Suppose an insurance company pays the amount of 500 Euro for lost
luggage on an airplane trip. It is known that the company pays this amount in 1 out
of 100 policies it sells. What premium should the company charge?
Solution. Let X be the r.v. defined as X = 0 if no loss occurs, and X = 500 for
lost luggage. Then the distribution of X is:
$$X : \begin{pmatrix} 0 & 500 \\ 0.99 & 0.01 \end{pmatrix}.$$
Then the expected loss to the insurance company is
$$E(X) = 0 \cdot 0.99 + 500 \cdot 0.01 = 5.$$
Thus, the company must charge 5 Euro (it will also add an amount for administrative
expenses and a profit).
From the definition of the expectation and familiar properties of summations or
integrals, it follows that:
Theorem. (Properties of the expected value)
1) If X is a constant random variable, i.e. $X : \begin{pmatrix} a \\ 1 \end{pmatrix}$, then $E(X) = a$.
2) Let X be a r.v. and let $a \in \mathbb{R}$. Then
$$E(aX) = aE(X).$$
3) Let X and Y be two r.v. and let $a, b \in \mathbb{R}$. Then
$$E(X + Y) = E(X) + E(Y), \qquad E(a + X) = a + E(X), \qquad E(aX + b) = aE(X) + b.$$
4) If X and Y are two independent r.v. then:
$$E(XY) = E(X)E(Y).$$
Variance
The following example illustrates that the expected value, as a measure of location
of the distribution, may show us very little about the entire distribution.
Let X and Y be two d.r.v. whose distributions are defined as follows:
$$X : \begin{pmatrix} -1 & 1 & 3 \\ \frac{5}{8} & \frac{2}{8} & \frac{1}{8} \end{pmatrix}; \qquad Y : \begin{pmatrix} -100 & 100 & 300 \\ \frac{5}{8} & \frac{2}{8} & \frac{1}{8} \end{pmatrix}.$$
It is easy to see that $E(X) = E(Y) = 0$.
The distribution of X is over an interval of length 4, the distribution of Y is over
an interval 100 times longer, and they have the same center of location.
Hence, an additional measure is needed, associated with the spread of the distribution.
This new measure is the variance of a r.v.
Definition. If X is a random variable with mean $E(X) = m$, then the variance
of X, denoted by V(X), is defined by
$$V(X) = E\big((X - m)^2\big).$$
If X is a d.r.v., i.e. $X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I}$, then
$$V(X) = \sum_{i \in I} (x_i - m)^2 p_i.$$
If X is a c.r.v., $X : \begin{pmatrix} x \\ f(x) \end{pmatrix}_{x \in \mathbb{R}}$, then
$$V(X) = \int_{-\infty}^{\infty} (x - m)^2 f(x)\,dx.$$
For the random variables X and Y mentioned before we have
$$V(X) = (-1)^2 \cdot \frac{5}{8} + 1^2 \cdot \frac{2}{8} + 3^2 \cdot \frac{1}{8} = 2$$
and
$$V(Y) = (-100)^2 \cdot \frac{5}{8} + 100^2 \cdot \frac{2}{8} + 300^2 \cdot \frac{1}{8} = 20000.$$
Thus, the variance shows the difference in size of the range of the distributions of
the r.v.'s X and Y.
Theorem. (Properties of the variance)
1) If X is a r.v. then $V(X) \geq 0$.
2) If X is a r.v. then
$$V(X) = E(X^2) - (E(X))^2.$$
3) If X is a constant random variable, i.e. $X : \begin{pmatrix} a \\ 1 \end{pmatrix}$, then $V(X) = 0$.
4) If X is a r.v. and $a, b \in \mathbb{R}$ then
$$V(aX + b) = a^2 V(X).$$
5) If X and Y are two independent random variables and $a, b \in \mathbb{R}$ then
$$V(aX + bY) = a^2 V(X) + b^2 V(Y).$$
For $a = 1$ and $b = -1$ we have
$$V(X - Y) = V(X) + V(Y).$$
Proof. We will prove the second property, since the rest of them can be easily
obtained from the definition of the variance and familiar properties of summations and
integrals.
2) $V(X) = E\big((X - E(X))^2\big) = E\big((X - m)^2\big) = E(X^2 - 2mX + m^2) = E(X^2) - 2mE(X) + m^2 = E(X^2) - m^2 = E(X^2) - (E(X))^2$.
In words, the variance of X is equal to the expected value of $X^2$ minus the square
of its expected value. This is, in practice, the easier way to compute V(X).
Standard deviation
The square root of the variance V(X) is called the standard deviation of X:
$$\sigma(X) = \sqrt{V(X)}.$$
Unlike the variance, the standard deviation is measured in the same units as X
and E(X) and serves as a measure of deviation of X from E(X).
Moments and central moments
Definition. (Moments)
Let X be a r.v. and let $k \in \mathbb{N}$.
The moment of order k of X is the number
$$\nu_k = E(X^k).$$
If X is a d.r.v., i.e., $X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I}$, then
$$\nu_k = E(X^k) = \sum_{i \in I} x_i^k p_i$$
(if the previous sum exists).
If X is a c.r.v., i.e., $X : \begin{pmatrix} x \\ f(x) \end{pmatrix}_{x \in \mathbb{R}}$, then
$$\nu_k = E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx$$
(if the previous integral converges).
The moments of order k generalize the expected value because $\nu_1 = E(X)$.
Definition. (Central moments)
Let X be a random variable with mean $E(X) = m$ and let $k \in \mathbb{N}$.
The central moment of order k of X is the number
$$\mu_k = E\big((X - m)^k\big).$$
If X is a d.r.v., i.e., $X : \begin{pmatrix} x_i \\ p_i \end{pmatrix}_{i \in I}$, then
$$\mu_k = E\big((X - m)^k\big) = \sum_{i \in I} (x_i - m)^k p_i$$
(if the previous sum exists).
If X is a c.r.v., i.e., $X : \begin{pmatrix} x \\ f(x) \end{pmatrix}_{x \in \mathbb{R}}$, then
$$\mu_k = E\big((X - m)^k\big) = \int_{-\infty}^{\infty} (x - m)^k f(x)\,dx$$
(if the previous integral converges).
The central moments of order k generalize the variance because
$$\mu_2 = E\big((X - m)^2\big) = V(X).$$
We have the following relationship between moments and central moments.
Theorem. Let X be a random variable and let $k \in \mathbb{N}$. Then
$$\mu_k = \sum_{i=0}^{k} (-1)^i C_k^i \nu_{k-i} (\nu_1)^i, \quad\text{where } \nu_0 = 1.$$
Proof. By using the binomial theorem, the properties of the expected value and
the fact that $m = E(X) = \nu_1$, we have:
$$\mu_k = E\big((X - m)^k\big) = E\Big(\sum_{i=0}^{k} C_k^i X^{k-i} (-m)^i\Big) = E\Big(\sum_{i=0}^{k} (-1)^i C_k^i X^{k-i} \nu_1^i\Big) = \sum_{i=0}^{k} (-1)^i C_k^i E(X^{k-i})\,\nu_1^i = \sum_{i=0}^{k} (-1)^i C_k^i \nu_{k-i}\,\nu_1^i,$$
as desired.
Particular cases:
$$\mu_1 = 0$$
$$\mu_2 = \nu_2 - \nu_1^2$$
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3$$
$$\mu_4 = \nu_4 - 4\nu_3\nu_1 + 6\nu_2\nu_1^2 - 3\nu_1^4.$$
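For a discrete distribution these identities are easy to check numerically. The sketch below (plain Python; the helper names are ours) computes the central moments both directly and via the conversion formula.

```python
from math import comb

def raw_moment(dist, k):
    """nu_k = E(X^k) for a {value: probability} distribution."""
    return sum((x ** k) * p for x, p in dist.items())

def central_moment(dist, k):
    m = raw_moment(dist, 1)
    return sum(((x - m) ** k) * p for x, p in dist.items())

def central_from_raw(dist, k):
    """mu_k via the binomial conversion formula above."""
    nu1 = raw_moment(dist, 1)
    return sum((-1) ** i * comb(k, i) * raw_moment(dist, k - i) * nu1 ** i
               for i in range(k + 1))

X = {-1: 0.2, 0: 0.3, 2: 0.5}
for k in (1, 2, 3, 4):
    print(k, round(central_moment(X, k), 6), round(central_from_raw(X, k), 6))
```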
Examples
Example 1. We consider the following gambling game. A player bets on one of
the numbers 1 through 6. Three dice are rolled, and if the number bet by the player
appears i times, $i = \overline{1,3}$, then the player wins i units; if the number bet by the player
does not appear on any of the dice, then the player loses 1 unit. Is the game fair to
the player?
Solution. By assuming that the dice are fair and independent of each other we
can use the urn model with replacement and 2 states ($p = \frac16$, $q = \frac56$) and 3 repeated
trials.
Let X be the random variable which represents the player's winnings in the game.
$$P(X = -1) = C_3^0 \Big(\frac16\Big)^0 \Big(\frac56\Big)^3 = \frac{125}{216}, \qquad P(X = 1) = C_3^1 \Big(\frac16\Big) \Big(\frac56\Big)^2 = \frac{75}{216},$$
$$P(X = 2) = C_3^2 \Big(\frac16\Big)^2 \Big(\frac56\Big) = \frac{15}{216}, \qquad P(X = 3) = C_3^3 \Big(\frac16\Big)^3 \Big(\frac56\Big)^0 = \frac{1}{216}.$$
Hence the distribution of X is
$$X : \begin{pmatrix} -1 & 1 & 2 & 3 \\ \frac{125}{216} & \frac{75}{216} & \frac{15}{216} & \frac{1}{216} \end{pmatrix}$$
In order to determine whether or not this is a fair game for the player we compute
E(X):
$$E(X) = \frac{-125 + 75 + 2 \cdot 15 + 3 \cdot 1}{216} = -\frac{17}{216}.$$
The game is not fair since, in the long run, the player will lose 17 monetary units
for every 216 games he plays.
Example 2. Let X be the discrete random variable whose distribution is
$$X : \begin{pmatrix} 1 & 3 & 5 & 7 & 9 & 11 \\ 0.05 & 0.1 & 0.15 & 0.2 & 0.3 & 0.2 \end{pmatrix}$$
Compute its distribution function and draw its graphical representation.
Solution. The distribution function $F : \mathbb{R} \to [0, 1]$ is defined as $F(x) = P(X < x)$:
$$F(x) = \begin{cases} 0, & x \leq 1 \\ 0.05, & 1 < x \leq 3 \\ 0.05 + 0.1 = 0.15, & 3 < x \leq 5 \\ 0.15 + 0.15 = 0.3, & 5 < x \leq 7 \\ 0.3 + 0.2 = 0.5, & 7 < x \leq 9 \\ 0.5 + 0.3 = 0.8, & 9 < x \leq 11 \\ 0.8 + 0.2 = 1, & x > 11 \end{cases}$$
[Figure: the graph of F is a left-continuous staircase, with jumps at x = 1, 3, 5, 7, 9, 11 up to the levels 0.05, 0.15, 0.3, 0.5, 0.8 and 1.]
Example 3. Let $X : \begin{pmatrix} -1 & 0 & 2 \\ 0.2 & a & b \end{pmatrix}$.
1) Determine $a, b \in \mathbb{R}$ such that $E(X) = 0.8$.
2) Compute V(X).
3) Compute the moment and the central moment of order 3 of X.
Solution. 1) From the properties of the probability mass function and the definition
of the expected value we have to determine $a, b \geq 0$ such that:
$$\begin{cases} 0.2 + a + b = 1 \\ -1 \cdot 0.2 + 0 \cdot a + 2 \cdot b = 0.8 \end{cases} \iff \begin{cases} a = 0.8 - b \\ 2b = 1 \end{cases} \iff \begin{cases} b = 0.5 \\ a = 0.3 \end{cases}$$
Hence
$$X : \begin{pmatrix} -1 & 0 & 2 \\ 0.2 & 0.3 & 0.5 \end{pmatrix}$$
2) $V(X) = E(X^2) - (E(X))^2 = (-1)^2 \cdot 0.2 + 0^2 \cdot 0.3 + 2^2 \cdot 0.5 - 0.8^2 = 2.2 - 0.64 = 1.56$
3) $\nu_3 = E(X^3) = (-1)^3 \cdot 0.2 + 0^3 \cdot 0.3 + 2^3 \cdot 0.5 = -0.2 + 4 = 3.8$
$$\mu_3 = E\big((X - 0.8)^3\big) = (-1 - 0.8)^3 \cdot 0.2 + (0 - 0.8)^3 \cdot 0.3 + (2 - 0.8)^3 \cdot 0.5 = -1.1664 - 0.1536 + 0.864 = -0.456$$
By using the relationship between central moments and moments we have a second
method for computing $\mu_3$:
$$\mu_3 = \nu_3 - 3\nu_2\nu_1 + 2\nu_1^3 = 3.8 - 3 \cdot 2.2 \cdot 0.8 + 2 \cdot 0.8^3 = 3.8 - 5.28 + 1.024 = -0.456,$$
as we expected.
Example 4. Let $f : \mathbb{R} \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} -\dfrac{1}{18}x + k, & \text{if } -1 < x \leq 2 \\ 0, & \text{otherwise} \end{cases}$$
a) Determine $k \in \mathbb{R}$ such that f is a probability density function of a continuous
random variable X.
b) Determine the distribution function F.
c) Determine E(X).
Solution. a) f is a probability density function if the following two conditions are
satisfied:
$$f(x) \geq 0, \ \forall x \in \mathbb{R} \qquad\text{and}\qquad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
The first condition implies that $k \geq \dfrac{x}{18}$ for each $x \in (-1, 2]$, wherefrom we easily
obtain $k \geq \dfrac{2}{18} = \dfrac{1}{9}$.
From the second condition we obtain:
$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{-1} f(x)\,dx + \int_{-1}^{2} f(x)\,dx + \int_{2}^{\infty} f(x)\,dx = \int_{-1}^{2} \Big(-\frac{1}{18}x + k\Big)dx = -\frac{1}{18} \cdot \frac{x^2}{2}\bigg|_{-1}^{2} + kx\bigg|_{-1}^{2} = -\frac{4}{36} + \frac{1}{36} + 3k.$$
It remains for us to solve the equation
$$3k = 1 + \frac{1}{12}, \qquad k = \frac{13}{36} \quad \Big(\text{observe that } k \geq \frac19\Big).$$
b) The distribution function is $F : \mathbb{R} \to [0, 1]$,
$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$
- For $x \leq -1$, since $f(x) = 0$, then $F(x) = 0$.
- For $-1 < x \leq 2$,
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \int_{-\infty}^{-1} f(t)\,dt + \int_{-1}^{x} f(t)\,dt = \int_{-1}^{x} \Big(-\frac{t}{18} + \frac{13}{36}\Big)dt = \Big(-\frac{t^2}{36} + \frac{13t}{36}\Big)\bigg|_{-1}^{x} = -\frac{x^2}{36} + \frac{13x}{36} + \frac{1}{36} + \frac{13}{36} = -\frac{x^2}{36} + \frac{13x}{36} + \frac{14}{36}.$$
- For $x > 2$,
$$F(x) = \int_{-\infty}^{2} f(t)\,dt + \int_{2}^{x} f(t)\,dt = F(2) + \int_{2}^{x} 0\,dt = F(2) = 1.$$
Hence
$$F(x) = \begin{cases} 0, & x \leq -1 \\ -\dfrac{x^2}{36} + \dfrac{13x}{36} + \dfrac{14}{36}, & -1 < x \leq 2 \\ 1, & x > 2 \end{cases}$$
c)
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{-1}^{2} x\Big(-\frac{x}{18} + \frac{13}{36}\Big)dx = \Big(-\frac{x^3}{54} + \frac{13x^2}{72}\Big)\bigg|_{-1}^{2} = -\frac{8}{54} - \frac{1}{54} + \frac{52}{72} - \frac{13}{72} = -\frac16 + \frac{13}{24} = \frac{9}{24} = \frac38.$$
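A quick numerical cross-check of k and E(X) can be done with a simple quadrature rule; this is a sketch using only the standard library (the midpoint integrator `integrate` is ours).

```python
def integrate(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g on [a, b]."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

k = 13 / 36
f = lambda x: -x / 18 + k            # density on (-1, 2], zero elsewhere

print(integrate(f, -1, 2))                    # ~1.0 (total probability)
print(integrate(lambda x: x * f(x), -1, 2))   # ~0.375 = 3/8 (E(X))
```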
Example 5. Let $f : \mathbb{R} \to \mathbb{R}$ be a function defined as
$$f(x) = \begin{cases} \lambda e^{-\frac{x}{5}}, & x \geq 0 \\ 0, & x < 0 \end{cases}$$
a) Determine $\lambda \in \mathbb{R}$ such that f is a probability density function of a continuous
random variable X.
b) Compute E(X), V(X) and $\nu_{15}$.
Solution. a) Since f is a probability density function, then
$$f(x) \geq 0, \ \forall x \in \mathbb{R} \qquad\text{and}\qquad \int_{-\infty}^{\infty} f(x)\,dx = 1.$$
From the inequality $f(x) \geq 0$, $\forall x \in \mathbb{R}$, we obtain $\lambda \geq 0$.
From the second condition we obtain (with the change of variable $\frac{x}{5} = y$):
$$1 = \int_{-\infty}^{\infty} f(x)\,dx = \int_{0}^{\infty} \lambda e^{-\frac{x}{5}}\,dx = \int_{0}^{\infty} \lambda e^{-y} \cdot 5\,dy = 5\lambda\,\Gamma(1) = 5\lambda.$$
Hence $\lambda = \frac15 \geq 0$.
b)
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{\infty} x \cdot \frac15 e^{-\frac{x}{5}}\,dx.$$
By using the following change of variable:
$$\frac{x}{5} = y, \quad x = 5y, \quad dx = 5\,dy,$$
we have:
$$E(X) = \int_{0}^{\infty} 5y \cdot \frac15 e^{-y} \cdot 5\,dy = 5\int_{0}^{\infty} y e^{-y}\,dy = 5\,\Gamma(2) = 5,$$
$$V(X) = E(X^2) - (E(X))^2.$$
By using the same change of variable we obtain
$$E(X^2) = \int_{0}^{\infty} x^2 \cdot \frac15 e^{-\frac{x}{5}}\,dx = \int_{0}^{\infty} 25y^2 \cdot \frac15 e^{-y} \cdot 5\,dy = 25\,\Gamma(3) = 25 \cdot 2! = 50,$$
hence $V(X) = 50 - 25 = 25$.
Using once more the Gamma function we obtain
$$\nu_{15} = \int_{0}^{\infty} x^{15} \cdot \frac15 e^{-\frac{x}{5}}\,dx = \int_{0}^{\infty} 5^{15} y^{15} \cdot \frac15 e^{-y} \cdot 5\,dy = 5^{15}\,\Gamma(16) = 5^{15} \cdot 15!$$
7.5 Special random variables
Certain types of random variables occur over and over again in applications. In
this section we will study a few of them.
Discrete random variables
The Bernoulli and binomial random variables
Suppose that we perform an experiment whose outcome can be classified as either
a success (with probability p, 0 < p < 1) or a failure (with probability 1 - p).
If we let X = 1 when the outcome is a success and X = 0 when it is a failure, then
the distribution of X is
$$X : \begin{pmatrix} 1 & 0 \\ p & 1-p \end{pmatrix}.$$
A random variable X is said to be a Bernoulli random variable (after the Swiss
mathematician James Bernoulli) if its distribution is:
$$X : \begin{pmatrix} 1 & 0 \\ p & 1-p \end{pmatrix}, \quad 0 < p < 1.$$
The expected value is $E(X) = 1 \cdot p + 0 \cdot (1 - p) = p$.
The variance is
$$V(X) = E(X^2) - (E(X))^2 = 1^2 \cdot p + 0^2 \cdot (1 - p) - p^2 = p(1 - p) = pq,$$
where by q we denoted the probability of a failure: $q = 1 - p$.
Suppose that n independent trials, each of which is a Bernoulli experiment, are
performed. If X represents the number of successes that occur in the n trials, then X
is said to be a binomial random variable with parameters (n, p).
Notation: $X \sim B(n, p)$.
The probability mass function of a binomial random variable with parameters n
and p is given by
$$P(X = k) = C_n^k p^k (1-p)^{n-k} = C_n^k p^k q^{n-k}, \quad k = \overline{0,n}$$
(the reasoning is similar to that used in the urn model with two states and with
replacement).
Definition. The binomial random variable with parameters n and p is the
random variable whose distribution is
$$X : \begin{pmatrix} k \\ C_n^k p^k q^{n-k} \end{pmatrix}_{k=\overline{0,n}}, \quad p + q = 1, \ p \in (0, 1).$$
To check that $P(X = k) = C_n^k p^k q^{n-k}$, $k = \overline{0,n}$, is a probability mass function, we
note that:
$$P(X = k) = C_n^k p^k q^{n-k} > 0$$
and
$$\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} C_n^k p^k q^{n-k} = (p + q)^n = 1.$$
Remark. If X is a binomial random variable with parameters n and p, $X \sim B(n, p)$, then
$$E(X) = np \quad\text{and}\quad V(X) = npq.$$
Proof. Since a binomial random variable X with parameters n and p represents
the number of successes in n independent trials (with the success probability p), then
X can be represented as
$$X = \sum_{i=1}^{n} X_i, \quad\text{where}\quad X_i = \begin{cases} 1, & \text{if the } i\text{-th trial is a success} \\ 0, & \text{otherwise} \end{cases}$$
Because $X_i$, $i = \overline{1,n}$, are n independent Bernoulli r.v. we have that
$$E(X) = E\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} E(X_i) = np,$$
$$V(X) = V\Big(\sum_{i=1}^{n} X_i\Big) = \sum_{i=1}^{n} V(X_i) = npq.$$
Remark. If $X_1 \sim B(n_1, p)$ and $X_2 \sim B(n_2, p)$ are independent, then $X_1 + X_2$ is a
binomial random variable with parameters $(n_1 + n_2, p)$; i.e. $X_1 + X_2 \sim B(n_1 + n_2, p)$.
Example. Suppose that an airplane engine will fail with probability 1 - p, independently
from engine to engine; suppose that the airplane will make a successful
flight if at least half of its engines are operative. For what values of p is a four-engine
airplane preferable to a two-engine airplane?
Solution. As the number of functioning engines is a binomial random variable
with parameters (n, p), it follows that the probability for a four-engine airplane to
make a successful flight is
$$P(X_1 = 2) + P(X_1 = 3) + P(X_1 = 4) = C_4^2 p^2 (1-p)^2 + C_4^3 p^3 (1-p) + C_4^4 p^4 = 6p^2(1-p)^2 + 4p^3(1-p) + p^4,$$
whereas the corresponding probability for a two-engine airplane is
$$P(X_2 = 1) + P(X_2 = 2) = C_2^1 p(1-p) + C_2^2 p^2 = 2p(1-p) + p^2.$$
Hence, the four-engine plane is better if
$$6p^2(1-p)^2 + 4p^3(1-p) + p^4 \geq 2p(1-p) + p^2,$$
which is equivalent to (by dividing the inequality by p)
$$6p(1-p)^2 + 4p^2(1-p) + p^3 \geq 2 - p \iff 3p^3 - 8p^2 + 7p - 2 \geq 0 \iff (p-1)^2(3p-2) \geq 0,$$
which is equivalent to
$$p \geq \frac23.$$
In conclusion, the four-engine plane is better when the success probability is
greater than $\frac23$, whereas the two-engine plane is better if the success probability
is smaller than $\frac23$.
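The crossover at $p = \frac23$ can also be seen by tabulating both reliabilities; a small sketch (the helper `reliability` is ours):

```python
from math import comb

def reliability(n: int, p: float) -> float:
    """P(at least half of the n engines work), engines failing independently."""
    need = (n + 1) // 2  # at least half, rounded up
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(need, n + 1))

for p in (0.5, 0.6, 2/3, 0.7, 0.8):
    print(round(p, 3), round(reliability(2, p), 4), round(reliability(4, p), 4))
# the four-engine column overtakes the two-engine one exactly at p = 2/3
```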
The geometric random variable
The geometric random variable is closely related to the binomial random variable.
Consider a sequence of independent Bernoulli experiments where p is the probability
of success and q = 1 - p the probability of failure.
We saw that the random variable which represents the number of successes (in n
successive trials) is binomial.
The geometric random variable represents the waiting time until the first
success occurs.
If the first success occurs at the k-th trial, then we must have k - 1 failures before
the first success. The Bernoulli trials are independent, hence the probability of the
desired event is $(1-p)^{k-1} p = q^{k-1} p$, $k \geq 1$ (see also the geometric urn model).
Definition. The geometric random variable with parameter p is the random variable
whose distribution is
$$X : \begin{pmatrix} k \\ q^{k-1} p \end{pmatrix}_{k \geq 1}, \quad p \in (0, 1), \ p + q = 1.$$
To check that $P(X = k) = q^{k-1} p$, $k \geq 1$, is a probability mass function, we note
that:
$$P(X = k) = q^{k-1} p > 0$$
and
$$\sum_{k=1}^{\infty} P(X = k) = \sum_{k=1}^{\infty} q^{k-1} p = p \sum_{k=1}^{\infty} q^{k-1} = p \lim_{n\to\infty} (1 + q + \dots + q^n) = p \lim_{n\to\infty} \frac{1 - q^{n+1}}{1 - q} = p \cdot \frac{1}{1-q} = \frac{p}{p} = 1.$$
Remark. If X is a geometric random variable with parameter $p \in (0, 1)$, then
$$E(X) = \frac{1}{p} \quad\text{and}\quad V(X) = \frac{q}{p^2}.$$
Proof.
$$E(X) = \sum_{k=1}^{\infty} k q^{k-1} p = p \sum_{k=1}^{\infty} k q^{k-1} = p \Big(\sum_{k=0}^{\infty} q^k\Big)' = p \Big(\frac{1}{1-q}\Big)' = p\,\frac{1}{(1-q)^2} = \frac{p}{p^2} = \frac{1}{p},$$
as desired.
To determine V(X) we compute first $E(X^2)$:
$$E(X^2) = \sum_{k=1}^{\infty} k^2 q^{k-1} p = p \sum_{k=1}^{\infty} (k^2 - k + k) q^{k-1} = p \sum_{k=2}^{\infty} k(k-1) q^{k-1} + p \sum_{k=1}^{\infty} k q^{k-1}$$
$$= pq \sum_{k=2}^{\infty} k(k-1) q^{k-2} + \frac{1}{p} = pq \Big(\sum_{k=0}^{\infty} q^k\Big)'' + \frac{1}{p} = pq\,\frac{2}{(1-q)^3} + \frac{1}{p} = \frac{2q}{p^2} + \frac{1}{p} = \frac{q + q + p}{p^2} = \frac{1+q}{p^2}.$$
Hence
$$V(X) = \frac{1+q}{p^2} - \frac{1}{p^2} = \frac{q}{p^2}.$$
We can also remark that the standard deviation is:
$$\sigma(X) = \frac{\sqrt{q}}{p}.$$
The occurrence of a geometric series explains the use of the word "geometric" in
describing the probability distribution.
As an application of the previous remark we present the following:
- if we toss a fair coin, then the expected waiting time for the first head to occur
is $\frac1p = \frac{1}{1/2} = 2$ tosses;
- if we roll a fair die, then the expected waiting time for the six to occur is
$\frac1p = \frac{1}{1/6} = 6$ rolls.
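A simulation of the waiting time for the first six (standard library only; `first_six` is our helper) should average close to 6:

```python
import random

def first_six() -> int:
    """Number of rolls of a fair die until the first six appears."""
    rolls = 1
    while random.randint(1, 6) != 6:
        rolls += 1
    return rolls

trials = 100_000
print(sum(first_six() for _ in range(trials)) / trials)  # close to E(X) = 6
```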
The negative binomial (Pascal) random variable
The binomial distribution finds the probability of exactly k successes in n independent
trials.
The geometric distribution finds the number of independent trials until the first
success occurs.
We can generalize these two results and find the number of independent trials
required for k successes.
Suppose that independent trials, each having the probability of success p, $0 < p < 1$,
are performed until a total of k successes is obtained.
Let X be the random variable which represents the number of trials required; then
$$P(X = n) = C_{n-1}^{k-1} p^k (1-p)^{n-k}, \quad n = k, k+1, \dots$$
The previous equality holds because, in order that k successes occur in the first
n trials, there must be k - 1 successes in the first n - 1 trials and the n-th trial must
be a success. The mentioned probability was computed in the Pascal urn model.
Definition. The negative binomial (or Pascal) random variable with parameters
k and p is the random variable whose distribution is:
$$X : \begin{pmatrix} n \\ C_{n-1}^{k-1} p^k q^{n-k} \end{pmatrix}_{n \geq k}$$
To check that $P(X = n) = C_{n-1}^{k-1} p^k q^{n-k}$, $n \geq k$, is a probability mass function, we
note that
$$P(X = n) > 0$$
and
$$\sum_{n=k}^{\infty} C_{n-1}^{k-1} p^k q^{n-k} = p^k \big(C_{k-1}^{k-1} + C_k^{k-1} q + C_{k+1}^{k-1} q^2 + \dots\big) = p^k \frac{1}{(1-q)^k} = \frac{p^k}{p^k} = 1.$$
In establishing the previous equality we used the following Taylor expansion:
$$\frac{1}{(1-q)^k} = 1 + kq + \frac{k(k+1)}{2} q^2 + \frac{k(k+1)(k+2)}{3!} q^3 + \dots, \quad |q| < 1.$$
The geometric random variable is a negative binomial random variable with k = 1.
Remark. If X is a negative binomial random variable with parameters k and $p \in (0, 1)$, then
$$E(X) = \frac{k}{p} \quad\text{and}\quad V(X) = \frac{kq}{p^2}.$$
Proof.
$$E(X) = \sum_{n=k}^{\infty} n C_{n-1}^{k-1} p^k q^{n-k} = \frac{k}{p} \sum_{n=k}^{\infty} C_n^k p^{k+1} q^{n-k}, \quad\text{since } n C_{n-1}^{k-1} = k C_n^k$$
$$= \frac{k}{p} \sum_{m=k+1}^{\infty} C_{m-1}^{(k+1)-1} p^{k+1} q^{m-(k+1)}, \quad\text{by setting } m = n + 1$$
$$= \frac{k}{p} \cdot 1 = \frac{k}{p},$$
since the numbers $C_{m-1}^{(k+1)-1} p^{k+1} q^{m-(k+1)}$, $m \geq k + 1$,
represent the probability mass function of a negative binomial random variable with
parameters (k + 1, p).
To determine V(X) we compute first $E(X^2)$:
$$E(X^2) = \sum_{n=k}^{\infty} n^2 C_{n-1}^{k-1} p^k q^{n-k} = \frac{k}{p} \sum_{n=k}^{\infty} n C_n^k p^{k+1} q^{n-k}, \quad\text{since } n C_{n-1}^{k-1} = k C_n^k$$
$$= \frac{k}{p} \sum_{m=k+1}^{\infty} (m-1) C_{m-1}^{k} p^{k+1} q^{m-(k+1)}, \quad\text{by setting } m = n + 1$$
$$= \frac{k}{p} \sum_{m=k+1}^{\infty} m C_{m-1}^{(k+1)-1} p^{k+1} q^{m-(k+1)} - \frac{k}{p} \sum_{m=k+1}^{\infty} C_{m-1}^{(k+1)-1} p^{k+1} q^{m-(k+1)} = \frac{k}{p} \cdot \frac{k+1}{p} - \frac{k}{p} = \frac{k}{p}\Big(\frac{k+1}{p} - 1\Big).$$
Therefore,
$$V(X) = \frac{k}{p}\Big(\frac{k+1}{p} - 1\Big) - \Big(\frac{k}{p}\Big)^2 = \frac{k^2 + k - kp - k^2}{p^2} = \frac{k(1-p)}{p^2} = \frac{kq}{p^2},$$
as desired.
Example. Find the expected value and the variance of the number of times one
must roll a die until the face 6 occurs 4 times.
Solution. The experiment can be described by a negative binomial random variable
with parameters k = 4 and $p = \frac16$.
Hence
$$E(X) = \frac{k}{p} = \frac{4}{1/6} = 24 \quad\text{and}\quad V(X) = \frac{4 \cdot \frac56}{(1/6)^2} = 120.$$
The hypergeometric random variable
The hypergeometric random variable is obtained while sampling without replacement.
Suppose that a sample of size n is to be chosen randomly (without replacement)
from an urn containing a white balls and b black balls. If we let X denote the number
of white balls selected, then
$$P(X = k) = \frac{C_a^k C_b^{n-k}}{C_{a+b}^n}, \quad k = 0, 1, \dots, n$$
(see also the urn model with two states without replacement).
Definition. The hypergeometric random variable with parameters n, a, b ($n \leq a + b$;
the values k with nonzero probability satisfy $\max(0, n-b) \leq k \leq \min(n, a)$) is the random variable whose distribution is
$$X : \begin{pmatrix} k \\ \dfrac{C_a^k C_b^{n-k}}{C_{a+b}^n} \end{pmatrix}_{k=\overline{0,n}}$$
To check that $P(X = k) = \dfrac{C_a^k C_b^{n-k}}{C_{a+b}^n}$, $k = \overline{0,n}$, is a probability mass function we
note that
$$P(X = k) \geq 0$$
and
$$\sum_{k=0}^{n} P(X = k) = \sum_{k=0}^{n} \frac{C_a^k C_b^{n-k}}{C_{a+b}^n} = \frac{1}{C_{a+b}^n} \sum_{k=0}^{n} C_a^k C_b^{n-k} = \frac{C_{a+b}^n}{C_{a+b}^n} = 1.$$
In establishing the previous equality we used Vandermonde's identity:
$$\sum_{k=0}^{n} C_a^k C_b^{n-k} = C_{a+b}^n.$$
Remark 1. If X is a hypergeometric random variable with parameters n, a, b, then
$$E(X) = n \frac{a}{a+b}, \qquad V(X) = n \cdot \frac{a}{a+b} \cdot \frac{b}{a+b} \cdot \frac{a+b-n}{a+b-1}.$$
Proof.
$$E(X) = \sum_{k=1}^{n} k \frac{C_a^k C_b^{n-k}}{C_{a+b}^n} = \frac{1}{C_{a+b}^n} \sum_{k=1}^{n} (k C_a^k) C_b^{n-k} = \frac{1}{C_{a+b}^n} \sum_{k=1}^{n} (a C_{a-1}^{k-1}) C_b^{n-k}, \quad\text{since } k C_a^k = a C_{a-1}^{k-1}$$
$$= \frac{a}{C_{a+b}^n} \sum_{l=0}^{n-1} C_{a-1}^{l} C_b^{n-1-l}, \quad\text{by setting } k = l + 1$$
$$= \frac{a}{C_{a+b}^n}\,C_{a+b-1}^{n-1}, \quad\text{by using Vandermonde's identity}$$
$$= \frac{a}{C_{a+b}^n} \cdot \frac{n}{a+b}\,C_{a+b}^n = n \frac{a}{a+b}, \quad\text{since } C_{a+b-1}^{n-1} = \frac{n}{a+b}\,C_{a+b}^n.$$
Hence $E(X) = n \dfrac{a}{a+b}$.
To determine V(X) we compute first $E(X^2)$:
$$E(X^2) = \sum_{k=0}^{n} k^2 \frac{C_a^k C_b^{n-k}}{C_{a+b}^n} = \frac{1}{C_{a+b}^n} \sum_{k=1}^{n} [k(k-1) + k]\,C_a^k C_b^{n-k} = \frac{1}{C_{a+b}^n} \sum_{k=2}^{n} \big(k(k-1) C_a^k\big) C_b^{n-k} + E(X)$$
$$= \frac{1}{C_{a+b}^n} \sum_{k=2}^{n} a(a-1) C_{a-2}^{k-2} C_b^{n-k} + E(X), \quad\text{since } k(k-1) C_a^k = (k-1)\,a\,C_{a-1}^{k-1} = a(a-1) C_{a-2}^{k-2}$$
$$= \frac{a(a-1)}{C_{a+b}^n} \sum_{l=0}^{n-2} C_{a-2}^{l} C_b^{n-2-l} + E(X), \quad\text{by setting } k = l + 2$$
$$= \frac{a(a-1)}{C_{a+b}^n}\,C_{a+b-2}^{n-2} + E(X), \quad\text{by using Vandermonde's identity}$$
$$= \frac{a(a-1)n(n-1)}{(a+b)(a+b-1)} + n \frac{a}{a+b}.$$
$$V(X) = E(X^2) - (E(X))^2 = n(n-1)\frac{a(a-1)}{(a+b)(a+b-1)} + n \frac{a}{a+b} - n^2 \frac{a^2}{(a+b)^2} = \dots = n \cdot \frac{a}{a+b} \cdot \frac{b}{a+b} \cdot \frac{a+b-n}{a+b-1},$$
as desired.
Remark 2. Let X be a hypergeometric random variable with parameters n, a
and b. If we denote by p (respectively q) the probability of extracting a white ball
(respectively a black ball) at the beginning of the experiment, then the expected value
and the variance of the r.v. X can be written as
$$E(X) = np, \qquad V(X) = npq\,\frac{a+b-n}{a+b-1},$$
where $p = \dfrac{a}{a+b}$ and $q = \dfrac{b}{a+b}$.
Remark 3. (Approximation to the binomial distribution)
If n balls are randomly chosen without replacement from a set of a + b balls, of
which a are white, then the r.v. which represents the number of white balls
extracted is hypergeometric. If a and b are large in relation to n, then it seems that
there is no difference whether the selection is made with or without replacement.
In this case, when a and b are large, the probability of taking a white ball at each
additional selection will be approximately equal to $p = \dfrac{a}{a+b}$.
We may expect that the probability mass function of X can be approximated by
the p.m.f. of a binomial r.v. with parameters n and p.
We will verify now the previous statement.
$$P(X = k) = \frac{C_a^k C_b^{n-k}}{C_{a+b}^n} = \frac{a!}{k!(a-k)!} \cdot \frac{b!}{(n-k)!(b-n+k)!} \cdot \frac{n!(a+b-n)!}{(a+b)!}$$
$$= C_n^k \cdot \frac{a}{a+b} \cdot \frac{a-1}{a+b-1} \cdots \frac{a-k+1}{a+b-k+1} \cdot \frac{b}{a+b-k} \cdots \frac{b-n+k+1}{a+b-n+1} \approx C_n^k p^k q^{n-k}.$$
In practice, the hypergeometric law can be replaced by the binomial distribution
if the following inequality holds: $10n < a + b$.
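A numeric comparison of the two probability mass functions under this rule of thumb (standard library only; the helper names are ours):

```python
from math import comb

def hyper_pmf(k, n, a, b):
    return comb(a, k) * comb(b, n - k) / comb(a + b, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

a, b, n = 300, 700, 10          # 10n = 100 < a + b = 1000
p = a / (a + b)
for k in range(6):
    print(k, round(hyper_pmf(k, n, a, b), 4), round(binom_pmf(k, n, p), 4))
```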
The Poisson random variable
Definition. The Poisson random variable with parameter $\lambda$, $\lambda > 0$, is a random
variable whose distribution is:
$$X : \begin{pmatrix} k \\ e^{-\lambda} \dfrac{\lambda^k}{k!} \end{pmatrix}_{k=0,1,\dots}$$
To check that $P(X = k) = e^{-\lambda} \dfrac{\lambda^k}{k!}$, $k \geq 0$, is a probability mass function we note
that $P(X = k) > 0$ and
$$\sum_{k=0}^{\infty} P(X = k) = \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.$$
In establishing the previous equality we used the following Taylor expansion:
$$e^{\lambda} = 1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \dots, \quad \lambda \in \mathbb{R}.$$
The Poisson probability distribution was introduced by S.D. Poisson in the book
entitled "Recherches sur la probabilité des jugements en matière criminelle et en
matière civile".
Remark 1. If X is a Poisson random variable with parameter $\lambda$, $\lambda > 0$, then
$$E(X) = \lambda \quad\text{and}\quad V(X) = \lambda.$$
Proof.
$$E(X) = \sum_{k=0}^{\infty} k e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^k}{k!} = \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
$$V(X) = E(X^2) - (E(X))^2 = \sum_{k=0}^{\infty} k^2 e^{-\lambda} \frac{\lambda^k}{k!} - \lambda^2 = e^{-\lambda} \sum_{k=1}^{\infty} [k(k-1) + k] \frac{\lambda^k}{k!} - \lambda^2$$
$$= \lambda^2 e^{-\lambda} \sum_{k=2}^{\infty} \frac{\lambda^{k-2}}{(k-2)!} + \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} - \lambda^2 = \lambda^2 e^{-\lambda} e^{\lambda} + \lambda e^{-\lambda} e^{\lambda} - \lambda^2 = \lambda.$$
Remark 2. (The Poisson distribution as the limit of the binomial)
The Poisson random variable may be used as an approximation for a binomial
random variable with parameters (n, p) when n is large (compared to k) and p is
small enough such that np is of moderate size.
Suppose that X is a binomial random variable with parameters (n, p) and let
$\lambda = np$. Then
$$C_n^k p^k (1-p)^{n-k} = \frac{n!}{k!(n-k)!} \Big(\frac{\lambda}{n}\Big)^k \Big(1 - \frac{\lambda}{n}\Big)^{n-k} = \frac{n(n-1)\dots(n-k+1)}{n^k} \cdot \frac{\lambda^k}{k!} \cdot \frac{\big(1 - \frac{\lambda}{n}\big)^n}{\big(1 - \frac{\lambda}{n}\big)^k}.$$
If n is large and p is small, then
$$\Big(1 - \frac{\lambda}{n}\Big)^n = \Big[\Big(1 - \frac{\lambda}{n}\Big)^{-\frac{n}{\lambda}}\Big]^{-\lambda} \approx e^{-\lambda}, \qquad \frac{n(n-1)\dots(n-k+1)}{n^k} \approx 1 \ \text{(since k is much smaller than n)}, \qquad \Big(1 - \frac{\lambda}{n}\Big)^k \approx 1.$$
Hence, for n large (compared to k) and p small,
$$C_n^k p^k (1-p)^{n-k} \approx e^{-\lambda} \frac{\lambda^k}{k!}.$$
In consequence, if n independent trials (whose outcomes are success with probability
p and failure with probability 1 - p) are performed then, for n large and p small such
that np is moderate in size, the number of successes occurring is approximately a Poisson
random variable with parameter $\lambda = np$.
Some examples of random variables that usually follow the Poisson probability
law are:
1. The number of misprints on a page (or on a group of pages) of a book.
2. The number of customers entering a bank on a given day.
3. The number of people in a community living to 90 years of age.
4. The number of particles emitted by a radioactive source within a certain period
of time.
5. The number of accidents in 1 day on a particular stretch of a highway.
Example. (Misprints on a page)
Suppose a page of a book contains n = 1000 characters, each of which is misprinted
with probability $p = 10^{-4}$. Compute the probabilities of having:
a) no misprint on the page;
b) at least one misprint on the page,
both by the binomial formula and by the Poisson formula.
Solution. Let X be the r.v. which represents the number of misprints on the page.
a) By the binomial formula:
$$C_{1000}^0 (10^{-4})^0 (1 - 10^{-4})^{1000} \approx 0.904833;$$
by the Poisson formula with $\lambda = np = 1000 \cdot 10^{-4} = 0.1$:
$$\frac{0.1^0\,e^{-0.1}}{0!} \approx 0.904837.$$
b) By the binomial formula:
$$P(X \geq 1) = 1 - P(X = 0) \approx 1 - 0.904833 = 0.095167;$$
by the Poisson formula:
$$P(X \geq 1) = 1 - P(X = 0) \approx 1 - 0.904837 = 0.095163.$$
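The closeness of the two models is easy to reproduce; a short sketch with the standard library:

```python
from math import comb, exp

n, p = 1000, 1e-4
lam = n * p

binom0 = comb(n, 0) * p**0 * (1 - p)**n   # binomial P(X = 0)
poisson0 = exp(-lam) * lam**0             # Poisson P(X = 0) with lambda = np

print(binom0, poisson0)                   # 0.904833..., 0.904837...
print(1 - binom0, 1 - poisson0)           # P(X >= 1) under each model
```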
Remark 3. (The sum of independent Poisson variables is Poisson)
If $X_1$ and $X_2$ are independent Poisson random variables with parameters $\lambda_1$ and
$\lambda_2$, respectively, then $X_1 + X_2$ is Poisson with parameter $\lambda_1 + \lambda_2$.
Proof.
$$P(X_1 + X_2 = n) = \sum_{k=0}^{n} P(X_1 = k, X_2 = n - k) = \sum_{k=0}^{n} P(X_1 = k) \cdot P(X_2 = n - k) = \sum_{k=0}^{n} e^{-\lambda_1} \frac{\lambda_1^k}{k!}\,e^{-\lambda_2} \frac{\lambda_2^{n-k}}{(n-k)!}$$
$$= e^{-(\lambda_1 + \lambda_2)} \frac{1}{n!} \sum_{k=0}^{n} \frac{n!}{k!(n-k)!} \lambda_1^k \lambda_2^{n-k} = e^{-(\lambda_1 + \lambda_2)} \frac{1}{n!} \sum_{k=0}^{n} C_n^k \lambda_1^k \lambda_2^{n-k} = e^{-(\lambda_1 + \lambda_2)} \frac{(\lambda_1 + \lambda_2)^n}{n!}, \quad n = 0, 1, \dots$$
Sometimes we are interested not just in one Poisson random variable, but in a
family of random variables.
For example, in the previous example we may be interested in the probabilities
of the number of misprints on several pages.
A family of random variables X(t) depending on a parameter t is called a stochastic
or random process.
The parameter t is time in most applications.
Next we will present a particular stochastic process called the Poisson process.
Definition. (Poisson process)
A family of random variables $(X(t))_{t > 0}$ is called a Poisson process with rate $\lambda$,
$\lambda > 0$, if the r.v. X(t) (the number of occurrences of some type in any interval of
length t) has a Poisson distribution with parameter $\lambda t$ for any $t > 0$:
$$P(X(t) = k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}, \quad k = 0, 1, \dots$$
and for each $0 < t_1 < \dots < t_n$ the random variables $X(t_1), X(t_2) - X(t_1), \dots, X(t_n) - X(t_{n-1})$
are independent (i.e. the numbers of occurrences in non-overlapping time
intervals are independent of each other).
Example. (Misprints on several pages)
Suppose the pages of a book contain misprinted characters, independent of each
other, with a rate of $\lambda = 0.1$ misprints per page. Suppose that the number X(t) of
misprints on any t pages forms a Poisson process. Find the probabilities of having:
a) no misprints on the first 3 pages;
b) at least two misprints on the first two pages.
Solution. a) Since in this case t = 3 and $\lambda t = 0.3$, then
$$P(X(3) = 0) = \frac{0.3^0 e^{-0.3}}{0!} = e^{-0.3} \approx 0.74.$$
b) In this case t = 2, $\lambda t = 0.2$. Hence
$$P(X(2) \geq 2) = 1 - [P(X(2) = 0) + P(X(2) = 1)] = 1 - \frac{0.2^0 e^{-0.2}}{0!} - \frac{0.2^1 e^{-0.2}}{1!} = 1 - 1.2\,e^{-0.2} \approx 0.017.$$
The next remark, which is an immediate consequence of the definition of the Poisson
process and Remark 3, says that the number of occurrences in an interval depends
only on the length of the interval. This property is called stationarity.
Remark 4. For any $s, t > 0$, $X(s + t) - X(s)$ has the same distribution as $X(t)$.
Continuous random variables
The uniform random variable
Definition. A random variable X is said to be uniformly distributed over the
interval [a, b] if its distribution is
$$X : \begin{pmatrix} x \\ f(x) \end{pmatrix}_{x \in \mathbb{R}},$$
where the probability density function is
$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a \leq x \leq b \\ 0, & \text{otherwise} \end{cases}$$
Notation: $X \sim U(a, b)$.
Remark 1. The function f verifies the two conditions of a probability density
function.
Proof. 1) $f(x) \geq 0$, $\forall x \in \mathbb{R}$ (since $a < b$).
2) $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = \int_{a}^{b} \frac{1}{b-a}\,dx = \frac{x}{b-a}\bigg|_{a}^{b} = \frac{b-a}{b-a} = 1.$
Remark 2. (The distribution function)
If $X \sim U(a, b)$, then
$$F(x) = \begin{cases} 0, & x \leq a \\ \dfrac{x-a}{b-a}, & a < x \leq b \\ 1, & x > b. \end{cases}$$
Proof. $F : \mathbb{R} \to [0, 1]$, $F(x) = P(X < x) = \displaystyle\int_{-\infty}^{x} f(t)\,dt$.
If $x \leq a$, then
$$F(x) = \int_{-\infty}^{x} 0\,dt = 0;$$
if $a < x \leq b$, then
$$F(x) = \int_{-\infty}^{a} 0\,dt + \int_{a}^{x} \frac{1}{b-a}\,dt = 0 + \frac{t}{b-a}\bigg|_{a}^{x} = \frac{x-a}{b-a};$$
if $x > b$, then
$$F(x) = \int_{-\infty}^{a} 0\,dt + \int_{a}^{b} \frac{1}{b-a}\,dt + \int_{b}^{x} 0\,dt = \frac{b-a}{b-a} = 1,$$
as desired.
Remark 3. If $X \sim U(a, b)$, then
$$E(X) = \frac{a+b}{2}, \qquad V(X) = \frac{(b-a)^2}{12}.$$
Proof.
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{a}^{b} x \frac{1}{b-a}\,dx = \frac{1}{b-a} \cdot \frac{x^2}{2}\bigg|_{a}^{b} = \frac{b^2 - a^2}{2(b-a)} = \frac{b+a}{2}.$$
To determine V(X) we compute first $E(X^2)$:
$$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_{a}^{b} x^2 \frac{1}{b-a}\,dx = \frac{1}{b-a} \cdot \frac{x^3}{3}\bigg|_{a}^{b} = \frac{b^3 - a^3}{3(b-a)} = \frac{a^2 + ab + b^2}{3}.$$
Hence, the variance is
$$V(X) = \frac{a^2 + ab + b^2}{3} - \Big(\frac{a+b}{2}\Big)^2 = \frac{a^2 - 2ab + b^2}{12} = \frac{(b-a)^2}{12}.$$
Notation. If a = 0 and b = 1 we obtain the standard uniform random variable.
Example. Buses arrive at a specified station at 10 minute intervals starting at
6 A.M. That is, they arrive at 6, 6:10, 6:20 and so on. If a passenger arrives at the
station at a time that is uniformly distributed between 6 and 6:20, find the probability
that he waits:
a) less than 5 minutes for a bus;
b) more than 7 minutes for a bus.
Solution. Let X denote the number of minutes past 6 that the passenger arrives
at the station; then $X \sim U(0, 20)$.
a) The passenger will have to wait less than 5 minutes if (and only if) he arrives
between 6:05 and 6:10 or between 6:15 and 6:20.
Hence, the desired probability is
$$P(5 < X < 10) + P(15 < X < 20) = \int_{5}^{10} \frac{1}{20}\,dx + \int_{15}^{20} \frac{1}{20}\,dx = \frac{10}{20} = \frac12.$$
b) The passenger will wait more than 7 minutes if he arrives between 6 and 6:03
or between 6:10 and 6:13, so the desired probability is
$$P(0 < X < 3) + P(10 < X < 13) = \frac{3}{20} + \frac{3}{20} = \frac{6}{20} = \frac{3}{10}.$$
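A direct simulation of the two events (standard library; the waiting-time rule below is hard-coded for the 10-minute schedule):

```python
import random

def wait_minutes(arrival: float) -> float:
    """Waiting time for a passenger arriving `arrival` minutes past 6:00,
    with buses every 10 minutes (at 0, 10, 20)."""
    return -arrival % 10

trials = 100_000
waits = [wait_minutes(random.uniform(0, 20)) for _ in range(trials)]
print(sum(w < 5 for w in waits) / trials)  # ~0.5
print(sum(w > 7 for w in waits) / trials)  # ~0.3
```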
The exponential random variable
Definition. A random variable X is said to be an exponential random variable (or
exponentially distributed) with parameter $\lambda$, $\lambda > 0$, if its probability density function
is
$$f(x) = \begin{cases} \lambda e^{-\lambda x}, & x > 0 \\ 0, & x \leq 0 \end{cases}$$
Remark 1. The function f verifies the two conditions of a probability density
function.
Proof. 1) $f(x) \geq 0$, $\forall x \in \mathbb{R}$ (since $\lambda > 0$).
2) $\displaystyle\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_{0}^{\infty} \lambda e^{-\lambda x}\,dx = -e^{-\lambda x}\Big|_{0}^{\infty} = -(0 - 1) = 1.$
Remark 2. (The distribution function)
If X is exponentially distributed, then
$$F(x) = \begin{cases} 0, & x \leq 0 \\ 1 - e^{-\lambda x}, & x > 0 \end{cases}$$
Proof. $F : \mathbb{R} \to [0, 1]$,
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(t)\,dt.$$
- If $x \leq 0$,
$$F(x) = \int_{-\infty}^{x} 0\,dt = 0.$$
- If $x > 0$,
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_{0}^{x} \lambda e^{-\lambda t}\,dt = \int_{0}^{x} (-e^{-\lambda t})'\,dt = -e^{-\lambda x} + 1 = 1 - e^{-\lambda x},$$
as desired.
Remark 3. If X is exponentially distributed, then
$$E(X) = \frac{1}{\lambda} \quad\text{and}\quad V(X) = \frac{1}{\lambda^2}.$$
Proof.
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{\infty} x \lambda e^{-\lambda x}\,dx.$$
By using the change of variable $y = \lambda x$ we obtain
$$E(X) = \int_{0}^{\infty} \frac{y}{\lambda} e^{-y}\,dy = \frac{1}{\lambda} \int_{0}^{\infty} y^{2-1} e^{-y}\,dy = \frac{1}{\lambda}\,\Gamma(2) = \frac{1}{\lambda}.$$
To determine V(X) we compute first $E(X^2)$ (in computing the corresponding
integral, we use the same change of variable as before):
$$E(X^2) = \int_{0}^{\infty} x^2 \lambda e^{-\lambda x}\,dx = \int_{0}^{\infty} \frac{y^2}{\lambda^2} e^{-y}\,dy = \frac{1}{\lambda^2} \int_{0}^{\infty} y^{3-1} e^{-y}\,dy = \frac{1}{\lambda^2}\,\Gamma(3) = \frac{2}{\lambda^2}.$$
Hence, the variance is
$$V(X) = E(X^2) - (E(X))^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
Remark 4. (The memoryless property)
If X is an exponential random variable, then:
$$P(X > s + t \mid X > t) = P(X > s), \quad \text{for all } s, t \geq 0.$$
Proof.
$$P(X > s + t \mid X > t) = \frac{P(X > s + t, X > t)}{P(X > t)} = \frac{P(X > s + t)}{P(X > t)} = \frac{1 - P(X \leq s + t)}{1 - P(X \leq t)} = \frac{1 - F(s + t)}{1 - F(t)} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda t}} = e^{-\lambda s} = 1 - F(s) = P(X > s).$$
To understand why the previous equality is called the memoryless property, consider
that X represents the length of time that an item functions before failing. The
previous equality says that the probability that an item functioning at age t will
continue to function for at least an additional time s is the same as the probability that a new
item functions for at least a period of time equal to s.
We can say the mentioned equality shows that a functional item is "as good as
new". It can be shown that the exponential random variables are the only continuous
random variables that are memoryless.
There is a very important relationship between the exponential random variables
and the Poisson process.
The next remark shows that in a Poisson process, the waiting time for an occurrence
and the time between any two consecutive occurrences (the interarrival time)
have the same exponential distribution with parameter $\lambda$.
Remark 5. Let $(X(t))_{t \geq 0}$ be a Poisson process with rate $\lambda$, $\lambda > 0$.
a) If $s \geq 0$ and if $T_1$ is the random variable which represents the length of time till
the first occurrence after s, then $T_1$ is an exponential random variable with parameter
$\lambda$.
b) Suppose we have an occurrence at time $s \geq 0$. Let $T_2$ be the random variable
which represents the time between this occurrence and the next one. Then $T_2$ is an
exponential random variable with parameter $\lambda$.
Proof. a) Let $t > 0$. We have to compute $P(T_1 < t)$:
$$P(T_1 < t) = P(T_1 \leq t) = 1 - P(T_1 > t).$$
Clearly, for any $t > 0$, the waiting time $T_1$ is greater than t if and only if there is
no occurrence in the time interval $(s, s + t]$. Thus:
$$P(T_1 < t) = 1 - P(T_1 > t) = 1 - P(X(s + t) - X(s) = 0) = 1 - P(X(t) = 0) = 1 - \frac{(\lambda t)^0}{0!} e^{-\lambda t} = 1 - e^{-\lambda t}.$$
The previous equality, together with $P(T_1 < t) = 0$ for $t \leq 0$, shows that $T_1$ has
the distribution function of an exponential random variable with parameter $\lambda$.
b) We have that $P(T_2 < t) = 0$ for $t \leq 0$.
Let $t > 0$. We have to compute $P(T_2 < t)$.
Instead of assuming that we have an occurrence at time s, we assume that we have
an occurrence in the time interval $[s - \Delta s, s]$ and let $\Delta s \to 0$. Then
$$P(T_2 < t) = P(T_2 \leq t) = 1 - P(T_2 > t) = 1 - \lim_{\Delta s \to 0} P\big(X(s + t) - X(s) = 0 \mid X(s) - X(s - \Delta s) = 1\big) = 1 - P(X(s + t) - X(s) = 0) = 1 - P(X(t) = 0) = 1 - e^{-\lambda t}.$$
Thus $T_2$ has the distribution function of an exponential random variable.
Remark 6. If in a Poisson process the rate is $\lambda$, that is, the mean number of
occurrences per unit time is $\lambda$, then the mean interoccurrence time is $\dfrac{1}{\lambda}$.
Example. A checkout counter at a supermarket completes the service process according
to an exponential random variable with a service rate of 15/hour. A customer arrives
at the checkout counter. Find the following probabilities:
a) the service is completed in more than 5 minutes;
b) the customer has to wait more than 8 minutes, knowing that he already waited
3 minutes;
c) the service is completed in a time between 5 and 8 minutes.
Solution. We have first to convert the service rate so that the time period is
1 minute: the service rate is $\lambda = 0.25$/minute.
a) $P(X > 5) = 1 - P(X \leq 5) = 1 - P(X < 5) = 1 - (1 - e^{-0.25 \cdot 5}) = e^{-1.25}$.
b)
$$P(X > 8 \mid X > 3) = \frac{P(X > 8, X > 3)}{P(X > 3)} = \frac{P(X > 8)}{P(X > 3)} = \frac{e^{-0.25 \cdot 8}}{e^{-0.25 \cdot 3}} = e^{-0.25 \cdot 5} = e^{-1.25},$$
as we expected according to the memoryless property of the exponential random
variable.
c)
$$P(5 < X < 8) = \int_{5}^{8} f(x)\,dx = \int_{5}^{8} \lambda e^{-\lambda x}\,dx = -e^{-\lambda x}\Big|_{5}^{8} = e^{-0.25 \cdot 5} - e^{-0.25 \cdot 8} = e^{-1.25} - e^{-2}.$$
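Evaluating these expressions (and the same probabilities via the distribution function) takes only a few lines:

```python
from math import exp

lam = 15 / 60                      # 15 per hour = 0.25 per minute
cdf = lambda x: 1 - exp(-lam * x)  # F(x) = P(X < x) for x > 0

print(1 - cdf(5))                   # a) e^{-1.25} ~ 0.2865
print((1 - cdf(8)) / (1 - cdf(3)))  # b) same value, by memorylessness
print(cdf(8) - cdf(5))              # c) e^{-1.25} - e^{-2} ~ 0.1512
```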
The Erlang random variable
The Erlang distribution is a generalization of the exponential distribution. While
the exponential random variable describes the time between two consecutive events,
the Erlang random variable describes the time interval between an event and the
n-th following event.
Definition. A random variable X is said to be an Erlang random variable with
parameters $\lambda$ and n ($\lambda > 0$, $n \in \mathbb{N}^*$) if it has the following distribution:
$$X : \begin{pmatrix} x \\ f(x; n, \lambda) \end{pmatrix}_{x \in \mathbb{R}}$$
where
$$f(x; n, \lambda) = \begin{cases} \dfrac{\lambda^n x^{n-1} e^{-\lambda x}}{(n-1)!}, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad n = 1, 2, 3, \dots$$
Remark 1. The function $f(\cdot; n, \lambda)$ verifies the two conditions of a probability
density function.
Proof. 1) $f(x; n, \lambda) \geq 0$, $\forall x \in \mathbb{R}$ (since $\lambda > 0$).
2) In order to compute the integral $\int_{-\infty}^{\infty} f(x; n, \lambda)\,dx$ we will use the Gamma
function. If we let $t = \lambda x$, then
$$\int_{-\infty}^{\infty} f(x; n, \lambda)\,dx = \frac{\lambda^n}{(n-1)!} \int_{0}^{\infty} x^{n-1} e^{-\lambda x}\,dx = \frac{\lambda^n}{(n-1)!} \int_{0}^{\infty} \Big(\frac{t}{\lambda}\Big)^{n-1} e^{-t}\,\frac{1}{\lambda}\,dt = \frac{\lambda^n}{(n-1)!} \cdot \frac{1}{\lambda^n} \int_{0}^{\infty} t^{n-1} e^{-t}\,dt = \frac{1}{(n-1)!}\,\Gamma(n) = \frac{(n-1)!}{(n-1)!} = 1,$$
as we needed.
Remark 2. If X is an Erlang random variable, then
$$E(X) = \frac{n}{\lambda} \quad\text{and}\quad V(X) = \frac{n}{\lambda^2}.$$
Proof. In the next computations we will make the same change of variable as in
the proof of the previous remark.
$$E(X) = \int_{-\infty}^{\infty} x f(x; n, \lambda)\,dx = \frac{\lambda^n}{(n-1)!} \int_{0}^{\infty} x \cdot x^{n-1} e^{-\lambda x}\,dx = \frac{\lambda^n}{(n-1)!} \int_{0}^{\infty} \frac{t^n}{\lambda^n} e^{-t}\,\frac{dt}{\lambda} = \frac{1}{(n-1)!} \cdot \frac{1}{\lambda}\,\Gamma(n+1) = \frac{1}{(n-1)!} \cdot \frac{1}{\lambda} \cdot n! = \frac{n}{\lambda}.$$
Note that this is n times the expected value of the exponential distribution (with
parameter $\lambda$).
Similarly,
$$E(X^2) = \frac{1}{(n-1)!} \cdot \frac{1}{\lambda^2}\,\Gamma(n+2) = \frac{n(n+1)}{\lambda^2}.$$
Therefore, the variance of X is:
$$V(X) = \frac{n(n+1)}{\lambda^2} - \frac{n^2}{\lambda^2} = \frac{n}{\lambda^2}.$$
Remark 3. (The distribution function of an Erlang random variable)
If X is an Erlang random variable with parameters n and $\lambda$, then
$$F(x) = \begin{cases} 1 - \displaystyle\sum_{k=0}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!}, & x \geq 0 \\ 0, & x < 0 \end{cases}$$
Proof. Let $x > 0$.
$$F(x) = \int_{-\infty}^{x} f(t; n, \lambda)\,dt = \frac{\lambda^n}{(n-1)!} \int_{0}^{x} t^{n-1} e^{-\lambda t}\,dt = \frac{1}{(n-1)!} \int_{0}^{\lambda x} u^{n-1} e^{-u}\,du \quad (u = \lambda t)$$
$$= \frac{1}{(n-1)!} \int_{0}^{\lambda x} u^{n-1} (-e^{-u})'\,du = \frac{1}{(n-1)!} \Big[-(\lambda x)^{n-1} e^{-\lambda x} + (n-1) \int_{0}^{\lambda x} u^{n-2} e^{-u}\,du\Big]$$
$$= \frac{1}{(n-2)!} \int_{0}^{\lambda x} u^{n-2} e^{-u}\,du - \frac{(\lambda x)^{n-1} e^{-\lambda x}}{(n-1)!}$$
$$= \frac{1}{(n-3)!} \int_{0}^{\lambda x} u^{n-3} e^{-u}\,du - \frac{(\lambda x)^{n-2} e^{-\lambda x}}{(n-2)!} - \frac{(\lambda x)^{n-1} e^{-\lambda x}}{(n-1)!} = \dots$$
$$= \int_{0}^{\lambda x} u e^{-u}\,du - \sum_{k=2}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!} = -u e^{-u}\Big|_{0}^{\lambda x} + \int_{0}^{\lambda x} e^{-u}\,du - \sum_{k=2}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!} = 1 - \sum_{k=0}^{n-1} \frac{(\lambda x)^k e^{-\lambda x}}{k!},$$
as desired.
Example. The lengths of phone calls at a certain phone booth are exponentially
distributed with a mean of 4 minutes. I arrived at the booth while Ana was using the
phone, and I was told that she had already spent 2 minutes on the call before I arrived.
a) What is the average time I will wait until she ends her call?
b) What is the probability that Ana's call will last between 3 and 6 minutes after
my arrival?
c) Assume that I am the first in line at the booth to use the phone after Ana, and
that by the time she finished her call more than 4 people were waiting to use the phone.
What is the probability that the time between the moment I start using the phone and
the moment the fourth person behind me starts his/her call is greater than 15 minutes?
Solution. Let X be the random variable which represents the lengths of calls at
the phone booth. Then
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \geq 0 \\ 0, & x < 0 \end{cases} \qquad \lambda = \frac14.$$
a) Due to the memoryless property of the exponential random variable, the average
time I wait until Ana's call ends is 4 minutes.
b) Due to the memoryless property of the exponential random variable, the probability
that Ana's call lasts between 3 and 6 minutes after my arrival is the probability
that an arbitrary call lasts between 3 and 6 minutes, which is
$$P(3 < X < 6) = \int_{3}^{6} \lambda e^{-\lambda x}\,dx = (-e^{-\lambda x})\Big|_{3}^{6} = e^{-3\lambda} - e^{-6\lambda} = e^{-\frac34} - e^{-\frac64} \approx 0.2492.$$
c) Let Y be the random variable that represents the time between the start of my phone
call and the start of the fourth person's call. Then Y is an Erlang random variable
with parameters n = 4 and $\lambda = \frac14$. Then,
$$P(Y > 15) = 1 - P(Y \leq 15) = 1 - F_Y(15) = 1 - 1 + \sum_{k=0}^{3} \frac{(15\lambda)^k}{k!} e^{-15\lambda} = e^{-\frac{15}{4}} \Big(1 + \frac{15}{4} + \frac{1}{2!}\Big(\frac{15}{4}\Big)^2 + \frac{1}{3!}\Big(\frac{15}{4}\Big)^3\Big) \approx 0.4838.$$
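The same Erlang tail probability can be evaluated directly from the formula in Remark 3 (the helper `erlang_tail` is ours):

```python
from math import exp, factorial

def erlang_tail(t: float, n: int, lam: float) -> float:
    """P(Y > t) for an Erlang(n, lam) random variable, via 1 - F(t)."""
    return exp(-lam * t) * sum((lam * t) ** k / factorial(k) for k in range(n))

print(erlang_tail(15, 4, 0.25))  # ~0.4838
```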
The normal random variable
Definition. A random variable X is said to be a normal random variable with
parameters m and $\sigma$ ($m \in \mathbb{R}$, $\sigma > 0$) if it has the following distribution:
$$X : \begin{pmatrix} x \\ f(x; m, \sigma) \end{pmatrix}_{x \in \mathbb{R}}$$
where
$$f(x; m, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-m)^2}{2\sigma^2}}.$$
Notation: $X \sim N(m, \sigma^2)$.
The normal r.v. is also called the Laplace-Gauss random variable.
Remark 1. If $X \sim N(m, \sigma^2)$ then its density function is a bell-shaped curve (or
Gauss curve) that is symmetric with respect to the line x = m.
[Figure: the Gauss curve, with maximum $\frac{1}{\sigma\sqrt{2\pi}}$ at x = m; the x-axis is marked at $m - 2\sigma$, $m - \sigma$, m, $m + \sigma$, $m + 2\sigma$.]
The normal distribution was introduced by the French mathematician Abraham
de Moivre in 1733 and was used by him to approximate probabilities associated with
binomial random variables when n (the binomial parameter) is large. This result was
later extended by Gauss and Laplace.
Remark 2. The function $f(\cdot; m, \sigma)$ verifies the two conditions of a probability
density function.
Proof. 1) $f(x; m, \sigma) > 0$, $\forall x \in \mathbb{R}$ (since $\sigma > 0$).
2) In order to compute the integral
$$\int_{-\infty}^{\infty} f(x; m, \sigma)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx$$
we make the following change of variable:
$$\frac{x - m}{\sigma\sqrt{2}} = u, \quad x = m + \sigma\sqrt{2}\,u, \quad dx = \sigma\sqrt{2}\,du.$$
Hence:
$$\int_{-\infty}^{\infty} f(x; m, \sigma)\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2}\,\sigma\sqrt{2}\,du = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} e^{-u^2}\,du = \frac{2}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2} = 1,$$
as desired.
In establishing the previous equality we used the value of the Euler-Poisson integral:
$$\int_{0}^{\infty} e^{-u^2}\,du = \frac{\sqrt{\pi}}{2}.$$
Remark 3. If $X \sim N(m, \sigma^2)$, $m \in \mathbb{R}$, $\sigma > 0$, then
$$E(X) = m \quad\text{and}\quad V(X) = \sigma^2.$$
Proof.
$$E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} x e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - m) e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx + \frac{m}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx.$$
Letting $y = x - m$ in the first integral yields
$$E(X) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} y e^{-\frac{y^2}{2\sigma^2}}\,dy + m \int_{-\infty}^{\infty} f(x; m, \sigma)\,dx,$$
where $f(\cdot; m, \sigma)$ is the normal density. By symmetry, the first integral must be zero,
so
$$E(X) = m \int_{-\infty}^{\infty} f(x; m, \sigma)\,dx = m \cdot 1 = m.$$
Since $E(X) = m$, we have (with the substitution $u = \frac{x-m}{\sigma}$, followed by an integration by parts)
$$V(X) = E\big((X - m)^2\big) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} (x - m)^2 e^{-\frac{(x-m)^2}{2\sigma^2}}\,dx = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} u^2 e^{-\frac{u^2}{2}}\,du = \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} u \Big(-e^{-\frac{u^2}{2}}\Big)'\,du$$
$$= \frac{\sigma^2}{\sqrt{2\pi}} \Big(-u e^{-\frac{u^2}{2}}\Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-\frac{u^2}{2}}\,du\Big) = \frac{2\sigma^2}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{u^2}{2}}\,du = \frac{2\sigma^2}{\sqrt{2\pi}} \cdot \sqrt{\frac{\pi}{2}} = \sigma^2.$$
Remark 4. a) If $X \sim N(m, \sigma^2)$, $m \in \mathbb{R}$, $\sigma > 0$, then
$$Y = aX + b \sim N(am + b, a^2\sigma^2) \quad (a \neq 0, b \in \mathbb{R}).$$
b) If $X \sim N(m, \sigma^2)$, $m \in \mathbb{R}$, $\sigma > 0$, then
$$Z = \frac{X - m}{\sigma} \sim N(0, 1).$$
The random variable $Z \sim N(0, 1)$ is called a standard normal random variable.
Proof. a) We can suppose that $a > 0$ (the proof for $a < 0$ is quite similar).
Let $F_Y$ be the cumulative distribution function of the r.v. Y:
$$F_Y(x) = P(aX + b < x) = P\Big(X < \frac{x - b}{a}\Big) = F_X\Big(\frac{x - b}{a}\Big).$$
The probability density function of Y is obtained by differentiating the previous
equality:
$$f_Y(x) = \frac{1}{a} f_X\Big(\frac{x - b}{a}\Big) = \frac{1}{\sigma\sqrt{2\pi}\,a}\,e^{-\frac{\left(\frac{x-b}{a} - m\right)^2}{2\sigma^2}} = \frac{1}{\sigma\sqrt{2\pi}\,a}\,e^{-\frac{(x - (am + b))^2}{2\sigma^2 a^2}}.$$
Hence, Y is normal with mean am + b and variance $a^2\sigma^2$.
b) Take $a = \frac{1}{\sigma}$ and $b = -\frac{m}{\sigma}$ in part a).
Remark 5. (The distribution function of a normal random variable)
Let $\Phi : \mathbb{R} \to \mathbb{R}$,
$$\Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\frac{y^2}{2}}\,dy,$$
be the Laplace function (whose values can be found in tables).
If $X \sim N(m, \sigma^2)$, $m \in \mathbb{R}$, $\sigma > 0$, then the distribution function of the r.v. X is
given by
$$F(x) = \frac12 + \Phi\Big(\frac{x - m}{\sigma}\Big).$$
Also, we have
i) $P(a < X < b) = \Phi\Big(\dfrac{b - m}{\sigma}\Big) - \Phi\Big(\dfrac{a - m}{\sigma}\Big)$
ii) $P(|X - m| < r) = 2\Phi\Big(\dfrac{r}{\sigma}\Big)$,
with the following particular cases:
$$P(|X - m| < \sigma) = 2\Phi(1) = 0.6826$$
$$P(|X - m| < 2\sigma) = 2\Phi(2) = 0.9544$$
$$P(|X - m| < 3\sigma) = 2\Phi(3) = 0.9972$$
Proof. We shall list first some properties of the Laplace function:
a) $\Phi(0) = 0$
b) $\lim_{z \to \infty} \Phi(z) = \frac12$
c) $\lim_{z \to -\infty} \Phi(z) = -\frac12$
d) $\Phi(-z) = -\Phi(z)$, $\forall z \in \mathbb{R}$.
Let $F : \mathbb{R} \to [0, 1]$ be the distribution function of the r.v. $X \sim N(m, \sigma^2)$. Then
$$F(x) = P(X < x) = \int_{-\infty}^{x} f(t; m, \sigma)\,dt = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-m)^2}{2\sigma^2}}\,dt.$$
By making the following change of variable:
$$\frac{t - m}{\sigma} = y, \quad dt = \sigma\,dy,$$
then
$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\frac{x-m}{\sigma}} e^{-\frac{y^2}{2}}\,dy = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{y^2}{2}}\,dy + \frac{1}{\sqrt{2\pi}} \int_{0}^{\frac{x-m}{\sigma}} e^{-\frac{y^2}{2}}\,dy = \frac12 + \Phi\Big(\frac{x - m}{\sigma}\Big).$$
i) $P(a < X < b) = F(b) - F(a) = \Phi\Big(\dfrac{b - m}{\sigma}\Big) - \Phi\Big(\dfrac{a - m}{\sigma}\Big)$.
ii)
$$P(|X - m| < r) = P(-r < X - m < r) = P(m - r < X < m + r) = \Phi\Big(\frac{m + r - m}{\sigma}\Big) - \Phi\Big(\frac{m - r - m}{\sigma}\Big) = \Phi\Big(\frac{r}{\sigma}\Big) - \Phi\Big(-\frac{r}{\sigma}\Big) = \Phi\Big(\frac{r}{\sigma}\Big) + \Phi\Big(\frac{r}{\sigma}\Big) = 2\Phi\Big(\frac{r}{\sigma}\Big).$$
The particular cases are obtained from the previous equality by taking $r = \sigma$,
$r = 2\sigma$ and $r = 3\sigma$.
Example. An expert in a paternity suit testifies that the length of pregnancy
is approximately normally distributed with parameters m = 270 and $\sigma$ = 10. The
defendant is able to prove that he wasn't in the country for a period that began 290
days before the birth of the child and ended 240 days before the birth. What is the
probability that the mother could have had a very long or a very short pregnancy, as
mentioned before?
Solution. Let X denote the length of pregnancy in days. If he is the father, the
probability that the birth could occur within the indicated period is
$$P(X < 240 \text{ or } X > 290) = P(X < 240) + P(X > 290) = F(240) + 1 - F(290)$$
$$= \frac12 + \Phi\Big(\frac{240 - 270}{10}\Big) + 1 - \frac12 - \Phi\Big(\frac{290 - 270}{10}\Big) = 1 + \Phi(-3) - \Phi(2) = 1 - \Phi(3) - \Phi(2) \approx 0.0241.$$
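The Laplace function relates to the error function by $\Phi(z) = \frac12\,\mathrm{erf}(z/\sqrt{2})$, so the computation can be checked with the standard library:

```python
from math import erf, sqrt

phi = lambda z: 0.5 * erf(z / sqrt(2))   # the Laplace function (0-centered)

m, sigma = 270, 10
p = 1 + phi((240 - m) / sigma) - phi((290 - m) / sigma)
print(p)  # ~0.0241
```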
Next, we will present the De Moivre-Laplace limit theorem, which states that when
n is large a binomial random variable with parameters n and p will have approximately
the same distribution as a normal random variable with the same mean and variance
(as the binomial).
De Moivre (1733) proved this result for the particular case $p = \frac12$ and Laplace
(1812) generalized it to arbitrary p.
Theorem. (De Moivre-Laplace limit theorem)
If $X_n \sim B(n, p)$ then, for any $a, b \in \mathbb{R}$, $a < b$, we have
$$\lim_{n\to\infty} P\Bigg(a \leq \frac{X_n - np}{\sqrt{np(1-p)}} < b\Bigg) = \Phi(b) - \Phi(a).$$
Remark 6. (Normal approximation with continuity correction)
If $X_n \sim B(n, p)$ then for any integers i, j, $0 \leq i \leq j \leq n$,
$$P(i \leq X_n \leq j) \approx \Phi\Bigg(\frac{j + \frac12 - np}{\sqrt{np(1-p)}}\Bigg) - \Phi\Bigg(\frac{i - \frac12 - np}{\sqrt{np(1-p)}}\Bigg).$$
In general, we make the following adjustments:
$$P(X_n \leq j) \approx \frac12 + \Phi\Bigg(\frac{j + \frac12 - np}{\sqrt{np(1-p)}}\Bigg), \qquad P(X_n < j) \approx \frac12 + \Phi\Bigg(\frac{j - \frac12 - np}{\sqrt{np(1-p)}}\Bigg).$$
Hence, we have two possible approximations to binomial probabilities:
- if n is large and p small such that np is moderate in size, we can use the Poisson
approximation;
- if np(1 - p) is large (usually for np and $n(1-p) \geq 5$), we can use the normal
approximation.
Example. Each item produced by a manufacturer is, independently, of good quality with probability 0.95. Approximate the probability of the event that at most 40 of the next 1000 items are of bad quality.
Solution. Let X be the random variable which represents the number of items of bad quality among the next 1000 produced. Clearly X ∈ B(1000; 0.05). The expected value of the binomial is np = 1000 · 0.05 = 50 and the variance is np(1 − p) = 1000 · 0.05 · 0.95 = 47.5, so

P(X \le 40) = P\left(X \le 40 + \frac{1}{2}\right) \approx P\left(\frac{X - 50}{\sqrt{47.5}} \le \frac{40 + \frac{1}{2} - 50}{\sqrt{47.5}}\right)
= \frac{1}{2} + \Phi(-1.38) = \frac{1}{2} - \Phi(1.38) = 0.5 - 0.4162 = 0.0838.
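The quality of the normal approximation can be judged against the exact binomial value; a sketch assuming SciPy is available (scipy.stats.binom gives the exact c.d.f.):

```python
import math
from scipy.stats import binom

def laplace_phi(z):
    return 0.5 * math.erf(z / math.sqrt(2.0))

n, p = 1000, 0.05
mu, var = n * p, n * p * (1 - p)            # 50 and 47.5
z = (40 + 0.5 - mu) / math.sqrt(var)        # continuity correction
approx = 0.5 + laplace_phi(z)               # ~0.0838
exact = binom.cdf(40, n, p)                 # exact binomial probability, close to the approximation
print(f"normal approximation: {approx:.4f}, exact: {exact:.4f}")
```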
The Gamma random variable
The gamma distribution generalizes the Erlang distribution by allowing the parameter a (the analogue of n in the Erlang distribution) to be any positive real number, not necessarily an integer. This distribution has many applications in statistics.
Definition. A random variable X is said to be a gamma random variable with two parameters a > 0 (the shape parameter) and b > 0 (called the scale parameter) if it has the following distribution:

X : \begin{pmatrix} x \\ f(x; a, b) \end{pmatrix}_{x \in \mathbb{R}}

where

f(x; a, b) = \begin{cases} \dfrac{1}{\Gamma(a)\, b^a}\, x^{a-1} e^{-x/b}, & x > 0 \\ 0, & x \le 0. \end{cases}

Notation: X ∈ Γ(a, b).
Remark 1. The function f(·; a, b) verifies the two conditions of a probability density function.
Proof. 1) f(x; a, b) ≥ 0, ∀ x ∈ R (since a, b > 0).
2) In order to compute the integral ∫_{−∞}^{∞} f(x; a, b) dx we make the change of variable x/b = y, with x = by and dx = b dy. Hence

\int_{-\infty}^{\infty} f(x; a, b)\, dx = \frac{1}{\Gamma(a)\, b^a} \int_0^{\infty} b^{a-1} y^{a-1} e^{-y}\, b\, dy = \frac{1}{\Gamma(a)} \int_0^{\infty} y^{a-1} e^{-y}\, dy = \frac{\Gamma(a)}{\Gamma(a)} = 1,

as desired.
Remark 2. If X ∈ Γ(a, b) then E(X) = ab and V(X) = ab².
Proof. In the next computations we will make the same change of variable as in the proof of the previous remark.

E(X) = \int_{-\infty}^{\infty} x f(x; a, b)\, dx = \frac{1}{\Gamma(a)\, b^a} \int_0^{\infty} x \cdot x^{a-1} e^{-x/b}\, dx
= \frac{1}{\Gamma(a)\, b^a} \int_0^{\infty} b^a y^a e^{-y}\, b\, dy = \frac{b^{a+1}}{\Gamma(a)\, b^a}\, \Gamma(a+1) = \frac{b\, a\, \Gamma(a)}{\Gamma(a)} = ab.

Similarly,

E(X^2) = \int_{-\infty}^{\infty} x^2 f(x; a, b)\, dx = \frac{1}{\Gamma(a)\, b^a} \int_0^{\infty} x^{a+1} e^{-x/b}\, dx
= \frac{1}{\Gamma(a)\, b^a} \int_0^{\infty} b^{a+1} y^{a+1} e^{-y}\, b\, dy = \frac{b^2}{\Gamma(a)}\, \Gamma(a+2) = \frac{b^2 (a+1)\, a\, \Gamma(a)}{\Gamma(a)} = a(a+1) b^2.

Therefore, the variance of X is:

V(X) = E(X^2) - (E(X))^2 = a(a+1) b^2 - a^2 b^2 = a b^2.
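The same shape/scale parametrization is used by scipy.stats.gamma, so Remark 2 can be checked directly; the parameter values below are ours, chosen for illustration:

```python
from scipy.stats import gamma

a, b = 2.5, 3.0                  # illustrative shape and scale
X = gamma(a, scale=b)            # scipy's first argument is the shape a, 'scale' is b
print(X.mean(), a * b)           # both 7.5
print(X.var(), a * b**2)         # both 22.5
```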
Remark 3. (The distribution function of a Gamma random variable)
If X is a gamma random variable with parameters a and b, then

F(x) = \begin{cases} 0, & x \le 0 \\ \dfrac{1}{\Gamma(a)} \displaystyle\int_0^{x/b} u^{a-1} e^{-u}\, du, & x > 0. \end{cases}

Proof. For each x > 0 we have

F(x) = P(X < x) = \int_{-\infty}^{x} f(t; a, b)\, dt = \frac{1}{\Gamma(a)\, b^a} \int_0^{x} t^{a-1} e^{-t/b}\, dt
= \frac{1}{\Gamma(a)\, b^a} \int_0^{x/b} b^{a-1} u^{a-1} e^{-u}\, b\, du = \frac{1}{\Gamma(a)} \int_0^{x/b} u^{a-1} e^{-u}\, du.
Gamma random variables, with values of the parameter a that are not just integers or half-integers, are often used to model continuous random variables with an approximately known distribution on (0, ∞).
An important case in which we obtain a gamma random variable is described in the following remark:
Remark 4. (Square of a normal random variable)
If X ∈ N(0, σ²) then Y = X² ∈ Γ(1/2, 2σ²).
Proof. The distribution function F_Y of Y is given by

F_Y(y) = P(X^2 < y) = \begin{cases} P(-\sqrt{y} < X < \sqrt{y}), & \text{if } y > 0 \\ 0, & \text{if } y \le 0 \end{cases}
= \begin{cases} F_X(\sqrt{y}) - F_X(-\sqrt{y}), & \text{if } y > 0 \\ 0, & \text{if } y \le 0. \end{cases}

Hence, for y > 0 we have

f_Y(y) = F_Y'(y) = \frac{1}{2\sqrt{y}} \left( f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right)
= \frac{1}{2\sqrt{y}} \cdot \frac{2}{\sigma\sqrt{2\pi}}\, e^{-\frac{y}{2\sigma^2}} = \frac{1}{\sigma\sqrt{2\pi}}\, y^{-\frac{1}{2}}\, e^{-\frac{y}{2\sigma^2}}
= \frac{1}{\Gamma\left(\frac{1}{2}\right) (2\sigma^2)^{\frac{1}{2}}}\, y^{\frac{1}{2}-1}\, e^{-\frac{y}{2\sigma^2}},

since Γ(1/2) = √π. In conclusion, this density is gamma with a = 1/2 and b = 2σ².
Remark 5. (Sum of independent gamma variables)
If X₁ ∈ Γ(a₁, b) and X₂ ∈ Γ(a₂, b) are independent then

X_1 + X_2 \in \Gamma(a_1 + a_2, b).

Proof. See Appendix B.
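A Monte Carlo illustration of Remark 5, with illustrative parameter values: the empirical distribution of X₁ + X₂ is compared with Γ(a₁ + a₂, b) through its first two moments and a Kolmogorov-Smirnov test (assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a1, a2, b = 1.5, 2.5, 2.0                        # illustrative parameters
x1 = rng.gamma(shape=a1, scale=b, size=100_000)
x2 = rng.gamma(shape=a2, scale=b, size=100_000)
z = x1 + x2

print(z.mean(), (a1 + a2) * b)                   # ~8.0 vs 8.0
print(z.var(), (a1 + a2) * b**2)                 # ~16.0 vs 16.0
# KS test against Gamma(a1 + a2, b); a non-small p-value means no evidence against Remark 5
print(stats.kstest(z, "gamma", args=(a1 + a2, 0, b)).pvalue)
```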
The Beta random variable
Similarly, continuous random variables with unknown distribution on [0, 1] are
often modeled by Beta random variables.
Definition. A random variable X is said to be a Beta random variable with parameters a and b (a > 0, b > 0) if it has the following distribution:

X : \begin{pmatrix} x \\ f(x; a, b) \end{pmatrix}_{x \in \mathbb{R}}

where

f(x; a, b) = \begin{cases} \dfrac{1}{B(a, b)}\, x^{a-1} (1-x)^{b-1}, & x \in [0, 1] \\ 0, & \text{otherwise.} \end{cases}
Remark 1. The function f(·; a, b) verifies the two conditions of a probability density function.
Proof. 1) f(x; a, b) ≥ 0, ∀ x ∈ R (since a, b > 0).
2)

\int_{-\infty}^{\infty} f(x; a, b)\, dx = \frac{1}{B(a, b)} \int_0^1 x^{a-1} (1-x)^{b-1}\, dx = \frac{B(a, b)}{B(a, b)} = 1.
Remark 2. If X is a Beta r.v. then

E(X) = \frac{a}{a+b}, \qquad V(X) = \frac{ab}{(a+b)^2 (a+b+1)}.
Proof.

E(X) = \int_{\mathbb{R}} x f(x; a, b)\, dx = \frac{1}{B(a, b)} \int_0^1 x^{a} (1-x)^{b-1}\, dx
= \frac{1}{B(a, b)}\, B(a+1, b) = \frac{1}{B(a, b)} \cdot \frac{a+1-1}{a+1+b-1}\, B(a, b) = \frac{a}{a+b}

E(X^2) = \frac{1}{B(a, b)} \int_0^1 x^{a+1} (1-x)^{b-1}\, dx = \frac{1}{B(a, b)}\, B(a+2, b)
= \frac{1}{B(a, b)} \cdot \frac{a+1}{a+b+1}\, B(a+1, b) = \frac{1}{B(a, b)} \cdot \frac{a+1}{a+b+1} \cdot \frac{a}{a+b}\, B(a, b) = \frac{a+1}{a+b+1} \cdot \frac{a}{a+b}.

Hence,

V(X) = E(X^2) - (E(X))^2 = \frac{a+1}{a+b+1} \cdot \frac{a}{a+b} - \frac{a^2}{(a+b)^2}
= \frac{a}{a+b} \left( \frac{a+1}{a+b+1} - \frac{a}{a+b} \right) = \frac{ab}{(a+b)^2 (a+b+1)}.
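Again, scipy.stats.beta uses the same (a, b) parametrization, so Remark 2 can be verified directly (illustrative parameter values):

```python
from scipy.stats import beta

a, b = 2.0, 5.0                                       # illustrative parameters
X = beta(a, b)
print(X.mean(), a / (a + b))                          # both ~0.2857
print(X.var(), a * b / ((a + b)**2 * (a + b + 1)))    # both ~0.0255
```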
Remark 3. (The distribution function of a Beta random variable)
If X is a Beta r.v. then

F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{1}{B(a, b)} \displaystyle\int_0^{x} t^{a-1} (1-t)^{b-1}\, dt, & x \in [0, 1] \\ 1, & x > 1. \end{cases}
The Chi-square random variable
The Chi-square distribution straddles the exponential and the normal distributions. If in the Gamma distribution we take a = n/2 and b = 2σ² we obtain the Chi-square distribution.
Definition. A random variable X is said to be a Chi-square r.v. or χ² random variable with n degrees of freedom (n ∈ N*) and parameter σ (σ > 0) if it has the following distribution:

X : \begin{pmatrix} x \\ f(x; n, \sigma) \end{pmatrix}_{x \in \mathbb{R}}

where

f(x; n, \sigma) = \begin{cases} \dfrac{1}{(2\sigma^2)^{n/2}\, \Gamma\left(\frac{n}{2}\right)}\, x^{\frac{n}{2}-1} e^{-\frac{x}{2\sigma^2}}, & x > 0 \\ 0, & x \le 0. \end{cases}
By using the properties of the Gamma distribution we easily get the following:
Remark 1. The function f(·; n, σ) verifies the two conditions of a probability density function.
Remark 2. If X is a χ² random variable then

E(X) = n\sigma^2 \quad \text{and} \quad V(X) = 2n\sigma^4.
Remark 3. (The distribution function of a χ² random variable)
If X is a χ² random variable then

F(x) = \begin{cases} 0, & x \le 0 \\ \dfrac{1}{\Gamma\left(\frac{n}{2}\right)} \displaystyle\int_0^{\frac{x}{2\sigma^2}} u^{\frac{n}{2}-1} e^{-u}\, du, & x > 0. \end{cases}
Remark 4. If X₁, X₂, ..., Xₙ are independent normal variables, X_i ∈ N(0, σ²), i = 1, ..., n, then the random variable X defined by

X = X_1^2 + \dots + X_n^2

is a Chi-square r.v. with n degrees of freedom and parameter σ.
Proof. Since each X_i is normal, X_i² ∈ Γ(1/2, 2σ²) (see Remark 4 from the subsection The Gamma random variable) and

X = X_1^2 + X_2^2 + \dots + X_n^2 \in \Gamma\left(\frac{n}{2},\, 2\sigma^2\right)

(see Remark 5 from the subsection The Gamma random variable), as we needed.
The Chi-Square distribution is used in certain statistical inference problems.
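Remarks 2 and 4 can be illustrated by simulation. Note that scipy.stats.chi2 implements only the standard case σ = 1; for a general σ one can use the equivalent Γ(n/2, 2σ²) form, as in this sketch (parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma = 5, 2.0                                 # illustrative values
x = rng.normal(0.0, sigma, size=(200_000, n))
s = (x**2).sum(axis=1)                            # X_1^2 + ... + X_n^2

print(s.mean(), n * sigma**2)                     # ~20 vs 20   (Remark 2)
print(s.var(), 2 * n * sigma**4)                  # ~160 vs 160 (Remark 2)
# Remark 4: the density of s is Gamma(n/2, 2*sigma^2)
print(stats.kstest(s, "gamma", args=(n / 2, 0, 2 * sigma**2)).pvalue)
```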
The lognormal random variable
Definition. A random variable X is said to be a lognormal random variable with parameters m and σ (m > 0, σ > 0) if it has the following distribution:

X : \begin{pmatrix} x \\ f(x; m, \sigma) \end{pmatrix}_{x \in \mathbb{R}}

where

f(x; m, \sigma) = \begin{cases} \dfrac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \ln^2 \frac{x}{m}}, & x > 0 \\ 0, & x \le 0. \end{cases}
Remark 1. The function f(·; m, σ) verifies the two conditions of a probability density function.
Proof. By using the change of variable y = (1/σ) ln(x/m), with x = m e^{σy} and dx = σ m e^{σy} dy, we obtain

\int_{-\infty}^{\infty} f(x; m, \sigma)\, dx = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{\infty} \frac{1}{x}\, e^{-\frac{1}{2\sigma^2} \ln^2 \frac{x}{m}}\, dx
= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{1}{m e^{\sigma y}}\, e^{-\frac{y^2}{2}}\, \sigma m e^{\sigma y}\, dy
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy = \frac{\sqrt{2\pi}}{\sqrt{2\pi}} = 1.
Remark 2. If X is a lognormal random variable then

E(X) = m e^{\sigma^2/2} \quad \text{and} \quad V(X) = m^2 e^{\sigma^2} (e^{\sigma^2} - 1).

Proof. By using the same change of variable as in the previous remark we obtain:

E(X) = \int_{-\infty}^{\infty} x f(x; m, \sigma)\, dx = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{\infty} e^{-\frac{1}{2\sigma^2} \ln^2 \frac{x}{m}}\, dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} m e^{\sigma y}\, e^{-\frac{y^2}{2}}\, dy = \frac{m}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2} + \sigma y - \frac{\sigma^2}{2} + \frac{\sigma^2}{2}}\, dy
= \frac{m e^{\sigma^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y-\sigma)^2}{2}}\, dy = \frac{m e^{\sigma^2/2}}{\sqrt{2\pi}}\, \sqrt{2\pi} = m e^{\sigma^2/2}.

Similarly,

E(X^2) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} m e^{\sigma y}\, e^{-\frac{y^2}{2}}\, m e^{\sigma y}\, dy
= \frac{m^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2 - 4\sigma y + 4\sigma^2 - 4\sigma^2}{2}}\, dy
= \frac{m^2 e^{2\sigma^2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{(y - 2\sigma)^2}{2}}\, dy = \frac{m^2 e^{2\sigma^2}}{\sqrt{2\pi}}\, \sqrt{2\pi} = m^2 e^{2\sigma^2}.

In conclusion:

V(X) = E(X^2) - (E(X))^2 = m^2 e^{2\sigma^2} - \left( m e^{\sigma^2/2} \right)^2 = m^2 e^{2\sigma^2} - m^2 e^{\sigma^2} = m^2 e^{\sigma^2} (e^{\sigma^2} - 1).
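A simulation check of Remark 2: anticipating Remark 4 below, lognormal samples can be generated as X = m e^{σY} with Y standard normal (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma = 3.0, 0.5                                       # illustrative parameters
x = m * np.exp(sigma * rng.standard_normal(1_000_000))    # X = m * exp(sigma * Y)

print(x.mean(), m * np.exp(sigma**2 / 2))                            # both ~3.399
print(x.var(), m**2 * np.exp(sigma**2) * (np.exp(sigma**2) - 1))     # both ~3.28
```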
Remark 3. If Y is the random variable defined by Y = ln X, where X is a lognormal random variable with parameters m > 0 and σ > 0, then

E(Y) = E(\ln X) = \ln m \quad \text{and} \quad V(Y) = V(\ln X) = \sigma^2.
Proof. We will use the same change of variable as in the previous remarks; note that ln x = σy + ln m.

E(Y) = E(\ln X) = \int_{-\infty}^{\infty} \ln x \cdot f(x; m, \sigma)\, dx = \frac{1}{\sigma\sqrt{2\pi}} \int_0^{\infty} \ln x \cdot \frac{1}{x}\, e^{-\frac{1}{2\sigma^2} \ln^2 \frac{x}{m}}\, dx
= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma y + \ln m)\, e^{-\frac{y^2}{2}}\, dy
= \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y\, e^{-\frac{y^2}{2}}\, dy + \frac{\ln m}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy
= 0 + \frac{\ln m}{\sqrt{2\pi}}\, \sqrt{2\pi} = \ln m.

Similarly,

E(Y^2) = E(\ln^2 X) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma y + \ln m)^2\, e^{-\frac{y^2}{2}}\, dy
= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-\frac{y^2}{2}}\, dy + \frac{2\sigma \ln m}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y\, e^{-\frac{y^2}{2}}\, dy + \frac{\ln^2 m}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy,

where the middle integral vanishes by symmetry. Integrating by parts,

\frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} y^2 e^{-\frac{y^2}{2}}\, dy = \frac{\sigma^2}{\sqrt{2\pi}} \left( \left[ -y\, e^{-\frac{y^2}{2}} \right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}}\, dy \right) = \sigma^2,

so E(Y²) = σ² + ln²m.
In conclusion:

V(Y) = V(\ln X) = \sigma^2 + \ln^2 m - \ln^2 m = \sigma^2,

as desired.
Remark 4. If Y ∈ N(0, 1), m > 0 and σ > 0, then the random variable X defined by X = m e^{σY} is a lognormal random variable with parameters m and σ.
Proof. Let Y ∈ N(0, 1), m > 0 and σ > 0.
We have to determine the probability density function of the random variable X. First, we determine the distribution function of X.
If x ≤ 0 then

F_X(x) = P(X < x) = P(m e^{\sigma Y} < x) = 0.

For x > 0 we get

F_X(x) = P(X < x) = P(m e^{\sigma Y} < x) = P\left(e^{\sigma Y} < \frac{x}{m}\right) = P\left(\sigma Y < \ln \frac{x}{m}\right)
= P\left(Y < \frac{1}{\sigma} \ln \frac{x}{m}\right) = F_Y\left(\frac{1}{\sigma} \ln \frac{x}{m}\right).
By differentiating the distribution function of X we get the probability density function f_X.
Hence, for x ≤ 0 we have f_X(x) = 0, and for x > 0

f_X(x) = F_X'(x) = f_Y\left(\frac{1}{\sigma} \ln \frac{x}{m}\right) \cdot \frac{1}{\sigma x} = \frac{1}{\sigma x \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \ln^2 \frac{x}{m}},

which is the probability density function of a lognormal random variable with parameters m and σ.
Remark 5. If X is a lognormal random variable with parameters m > 0 and σ > 0, then the random variable Y defined by

Y = \frac{1}{\sigma} \ln \frac{X}{m}

is a standard normal random variable (Y ∈ N(0, 1)).
Proof. Let m > 0, σ > 0 and X a lognormal r.v. with parameters m and σ.
We have to determine the probability density function of the random variable Y. We determine first the distribution function of Y:

F_Y(y) = P(Y < y) = P\left(\frac{1}{\sigma} \ln \frac{X}{m} < y\right) = P\left(\ln \frac{X}{m} < \sigma y\right)
= P\left(\frac{X}{m} < e^{\sigma y}\right) = P(X < m e^{\sigma y}) = F_X(m e^{\sigma y}).

By differentiating the distribution function of Y we get the probability density function f_Y:

f_Y(y) = F_Y'(y) = F_X'(m e^{\sigma y})\, \sigma m e^{\sigma y} = f_X(m e^{\sigma y})\, \sigma m e^{\sigma y}
= \sigma m e^{\sigma y} \cdot \frac{1}{\sigma m e^{\sigma y} \sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2} \ln^2 e^{\sigma y}} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{\sigma^2 y^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{y^2}{2}},

which is the probability density function of a standard normal random variable.
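Remarks 4 and 5 are inverse transformations of each other, which is easy to confirm numerically (a sketch with illustrative parameters, assuming SciPy is available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
m, sigma = 3.0, 0.5                                   # illustrative parameters
x = m * np.exp(sigma * rng.standard_normal(100_000))  # lognormal(m, sigma) via Remark 4
y = np.log(x / m) / sigma                             # Remark 5: back to N(0, 1)
print(stats.kstest(y, "norm").pvalue)                 # no evidence against standard normality
```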
Appendix A
Notions of topology in R^n
A metric space is an ordered pair (X, d) where X is a non-empty set and d is a metric, that is, a function

d : X \times X \to \mathbb{R}

such that
D1) d(x, y) ≥ 0, ∀ x, y ∈ X (non-negativity), and d(x, y) = 0 if and only if x = y;
D2) d(x, y) = d(y, x), ∀ x, y ∈ X (symmetry);
D3) d(x, z) ≤ d(x, y) + d(y, z), ∀ x, y, z ∈ X (triangle inequality).
The notion of a metric is a generalization of the Euclidean distance in R^n, defined as

d(x, y) = \sqrt{(x_1 - y_1)^2 + \dots + (x_n - y_n)^2}, \quad x = (x_1, \dots, x_n) \in \mathbb{R}^n,\ y = (y_1, \dots, y_n) \in \mathbb{R}^n.
Remark. If d is the Euclidean distance defined before, then (R^n, d) is a metric space called the Euclidean metric space R^n.
A metric space also induces topological notions, like open and closed sets, which lead to the study of more abstract topological spaces.
Definition. (Topological space)
A topological space is a pair (X, T) where X is a nonempty set and T ⊆ P(X) (the power set of X), satisfying the following axioms:
T1) ∅, X ∈ T;
T2) the union of any collection of sets in T is also in T;
T3) the intersection of any finite collection of sets in T is also in T.
The collection T is called a topology on X. The sets in T are called open sets and their complements in X are called closed sets.
Every metric space is a topological space.
Below, we will present the manner in which the Euclidean metric on R^n induces a topological structure on R^n.
Definitions
1. Open ball in R^n
Let a = (a₁, ..., aₙ) ∈ R^n and let r > 0.
The open ball of radius r and center a is the set

B(a, r) = \{ x \in \mathbb{R}^n \mid d(a, x) < r \}.

Some particular cases:
R: B(a, r) = (a − r, a + r). Indeed, if n = 1 then x ∈ B(a, r) iff |x − a| < r, which gives −r < x − a < r, i.e. a − r < x < a + r; so the open ball of radius r and center a is the interval of center a and radius r, as desired.
R²: B(a, r) is the disc centered at a with radius r.
R³: B(a, r) is the interior of a sphere centered at a with radius r.
2. Vicinity or neighbourhood of a ∈ R^n
Let a ∈ R^n. A set V ⊆ R^n is called a vicinity (or neighbourhood) of the point a iff there is r > 0 such that B(a, r) ⊆ V.
By V(a) we denote the set of all vicinities of the point a, i.e.

\mathcal{V}(a) = \{ V \subseteq \mathbb{R}^n \mid \exists\, r > 0 : B(a, r) \subseteq V \}.
3. Open set in R^n
A set G ⊆ R^n is called an open set iff G is a vicinity of each of its points, i.e.

\forall x \in G,\ \exists\, r > 0 : B(x, r) \subseteq G.

Remark. If we denote by T = {G ⊆ R^n | G open} then (R^n, T) is a topological space. A topological space which can arise in this way from a metric space is called a metrizable space.
4. Closed set in R^n
A set F ⊆ R^n is called a closed set in R^n iff R^n \ F is an open set.
5. Domain in R^n
A set D ⊆ R^n is called a domain if it is an open set and a connected one (in one piece).
6. Accumulation point (limit point)
The point a is an accumulation point (or limit point) of the set A ⊆ R^n iff every open ball centered at a contains at least one element of A other than a.
In the previous definition there is no condition on the membership of a in A. Hence, if A ⊆ R^n then a ∈ R^n is an accumulation point of the set A iff ∀ r > 0, B(a, r) ∩ A \ {a} ≠ ∅.
We will denote by A' the set of accumulation points of the given set A. Hence, if A ⊆ R^n,

A' = \{ a \in \mathbb{R}^n \mid \forall r > 0 : B(a, r) \cap A \setminus \{a\} \ne \emptyset \}.
7. Isolated point
Let A ⊆ R^n and let a ∈ A. A point a ∈ A is an isolated point of A iff there is an open ball centered at a which contains no point of A other than a itself. In other words, a point a of A is an isolated point of A if it is not an accumulation point of the given set.
Hence, if A ⊆ R^n and a ∈ A, then a is an isolated point of the set A iff ∃ r > 0 : B(a, r) ∩ A = {a}.
8. Boundary
Let A ⊆ R^n. The boundary of A consists of all points a ∈ R^n for which every open ball of positive radius centered at a intersects both A and R^n \ A.
We denote the boundary of A by fr A.
Hence, if A ⊆ R^n then

\operatorname{fr} A = \{ a \in \mathbb{R}^n \mid \forall r > 0 : B(a, r) \cap A \ne \emptyset \ \text{and}\ B(a, r) \cap (\mathbb{R}^n \setminus A) \ne \emptyset \}.
9. Bounded set
The set A ⊆ R^n is a bounded set if there is M > 0 such that A ⊆ B(0, M).
Hence A ⊆ R^n is bounded iff ∃ M > 0 such that ∀ a ∈ A, d(0, a) < M.
Example. Let A ⊆ R²,

A = \{ (x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le 9,\ x > 0,\ y > 0 \} \cup \{(3, 3)\}.

[Figure: the quarter-disc A in the first quadrant, with the points (3, 3), (1, 1) and (0, 1) marked.]

(3, 3) is an isolated point;
(1, 1) is an accumulation point which belongs to A;
(0, 1) is a boundary point and an accumulation point which does not belong to A.

A' = \{ (x, y) \in \mathbb{R}^2 \mid x^2 + y^2 \le 9,\ x \ge 0,\ y \ge 0 \}

\operatorname{fr} A = \{ (x, y) \in \mathbb{R}^2 \mid x^2 + y^2 = 9,\ x \ge 0,\ y \ge 0 \} \cup \{ (x, 0) \mid 0 \le x \le 3 \} \cup \{ (0, y) \mid 0 \le y \le 3 \} \cup \{(3, 3)\}

(the isolated point (3, 3) also belongs to the boundary, since every ball around it meets both A and its complement).
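The classifications above can be probed numerically: sampling a small open ball around a point and checking whether it meets A \ {a} and the complement of A distinguishes isolated, interior and boundary points. A rough illustration, not a proof (the helper names are ours):

```python
import numpy as np

def in_A(p):
    """Membership in A = {x^2 + y^2 <= 9, x > 0, y > 0} united with {(3, 3)}."""
    x, y = p
    return (x**2 + y**2 <= 9 and x > 0 and y > 0) or (x, y) == (3.0, 3.0)

def ball_hits(a, r, n=20_000, seed=0):
    """Sample B(a, r) and report (meets A \\ {a}, meets the complement of A)."""
    rng = np.random.default_rng(seed)
    pts = a + rng.uniform(-r, r, size=(n, 2))
    pts = pts[np.linalg.norm(pts - a, axis=1) < r]   # keep points inside the open ball
    hits_A = any(in_A(p) for p in pts)               # sampled points differ from a almost surely
    hits_comp = any(not in_A(p) for p in pts)
    return hits_A, hits_comp

print(ball_hits(np.array([3.0, 3.0]), 0.1))  # (False, True): isolated point
print(ball_hits(np.array([1.0, 1.0]), 0.1))  # (True, False): interior accumulation point
print(ball_hits(np.array([0.0, 1.0]), 0.1))  # (True, True): boundary and accumulation point
```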
Appendix B
Functions of random variables
In many cases we are given the probability distribution of a random variable and we are asked for the distribution of some function of it. For example, suppose that we know the distribution of X and try to find the distribution of g(X). In order to do that we have to express the event that g(X) < y in terms of X being in some set (which depends on the function g). We start with some examples.
Example. Linear functions of random variables
Let X be a random variable and consider a new random variable Y = aX + b, with a ≠ 0 and b ∈ R.
Case 1. If X is a discrete random variable, then

f_Y(y) = P(Y = y) = P(aX + b = y) = P\left(X = \frac{y - b}{a}\right) = f_X\left(\frac{y - b}{a}\right).
Case 2. If X is continuous, we determine first the distribution function of Y:

F_Y(y) = P(Y < y) = P(aX + b < y) = P(aX < y - b).

- for a > 0 we have

F_Y(y) = P\left(X < \frac{y - b}{a}\right) = F_X\left(\frac{y - b}{a}\right).

Since f_X = F_X', we obtain that F_Y is differentiable and

f_Y(y) = F_Y'(y) = \frac{1}{a}\, f_X\left(\frac{y - b}{a}\right) = \frac{1}{|a|}\, f_X\left(\frac{y - b}{a}\right);
- for a < 0 we have

F_Y(y) = P\left(X > \frac{y - b}{a}\right) = 1 - P\left(X \le \frac{y - b}{a}\right) = 1 - P\left(X < \frac{y - b}{a}\right) = 1 - F_X\left(\frac{y - b}{a}\right)

(the last equality holds because X is continuous) and

f_Y(y) = F_Y'(y) = -\frac{1}{a}\, f_X\left(\frac{y - b}{a}\right) = \frac{1}{|a|}\, f_X\left(\frac{y - b}{a}\right).

In conclusion, we obtain that

f_Y(y) = \frac{1}{|a|}\, f_X\left(\frac{y - b}{a}\right).
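A quick empirical check of this conclusion for a < 0 (illustrative values; X is taken standard normal so that f_X is known in closed form):

```python
import numpy as np

rng = np.random.default_rng(4)
a, b = -2.0, 1.0                                # illustrative, a < 0
x = rng.standard_normal(500_000)                # X ~ N(0, 1)
y = a * x + b

def f_X(t):                                     # standard normal density
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

hist, edges = np.histogram(y, bins=200, range=(-8, 10), density=True)
centers = (edges[:-1] + edges[1:]) / 2
formula = f_X((centers - b) / a) / abs(a)       # f_Y(y) = f_X((y - b)/a) / |a|
print(np.max(np.abs(hist - formula)))           # small: only sampling noise remains
```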
Example. Let X be a continuous random variable whose probability density function is f_X. Determine the p.d.f. of Y = X².
Solution. If y ≤ 0 then

F_Y(y) = P(Y < y) = P(X^2 < y) = 0.

If y > 0 then

F_Y(y) = P(Y < y) = P(X^2 < y) = P(-\sqrt{y} < X < \sqrt{y}) = F_X(\sqrt{y}) - F_X(-\sqrt{y}).

By differentiating the previous equality we get:

f_Y(y) = \frac{1}{2\sqrt{y}}\, f_X(\sqrt{y}) + \frac{1}{2\sqrt{y}}\, f_X(-\sqrt{y}).

In conclusion,

f_Y(y) = \begin{cases} 0, & y \le 0 \\ \dfrac{1}{2\sqrt{y}} \left[ f_X(\sqrt{y}) + f_X(-\sqrt{y}) \right], & y > 0. \end{cases}
The previous examples illustrate two different situations: the case in which Y is an invertible function of X and the case in which it is not. We will generalize the previous situations only in the case in which Y is an invertible function of X; in the other case the general result is more complicated and is beyond the scope of this text.
Theorem. (Discrete case) Let X be a discrete random variable and let Y = g(X), where g is a one-to-one function on R.
Then Y is a discrete random variable whose p.m.f. is

f_Y(y) = \begin{cases} f_X(g^{-1}(y)), & \text{if there is } x \in \operatorname{Range} X \text{ with } y = g(x) \\ 0, & \text{otherwise.} \end{cases}
Theorem. (Continuous case) Let X be a continuous random variable and let Y = g(X), where g is a one-to-one differentiable function on R.
Then Y is a continuous random variable whose p.d.f. is

f_Y(y) = \begin{cases} f_X(g^{-1}(y))\, |(g^{-1})'(y)| = \dfrac{f_X(x)}{|g'(x)|} \text{ with } x = g^{-1}(y), & \text{if there is } x \in \operatorname{Range} X \text{ with } y = g(x) \\ 0, & \text{otherwise.} \end{cases}

The proofs of these theorems are similar to the ideas presented in the first example of this subsection.
Expectation of a function of one random variable
The expectation of Y = g(X) is given by:

E(Y) = E[g(X)] = \begin{cases} \displaystyle\sum_i g(x_i)\, p_i & \text{(discrete case)} \\ \displaystyle\int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx & \text{(continuous case).} \end{cases}
Sums of independent continuous random variables
Let X and Y be independent continuous random variables whose probability density functions are f_X and f_Y. We wish to know the p.d.f. of X + Y.
Definition. (Convolution)
Let X and Y be two continuous random variables with density functions f_X and f_Y. Then the convolution f_X * f_Y of f_X and f_Y is the function defined by

(f_X * f_Y)(z) = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\, dy = \int_{-\infty}^{\infty} f_Y(z - x)\, f_X(x)\, dx.
Theorem. Let X and Y be two independent random variables with density functions f_X and f_Y. Then the sum Z = X + Y is a random variable with density function f_Z = f_X * f_Y.
The proof of this theorem is beyond the scope of this book and will be omitted.
To get a better understanding of this important result, we will present some examples.
Example. (Sum of two independent Gamma random variables)
If X ∈ Γ(a₁, b) and Y ∈ Γ(a₂, b) are independent, then

Z = X + Y \in \Gamma(a_1 + a_2, b).

Solution. We know that:

f_X(x) = \begin{cases} \dfrac{1}{\Gamma(a_1)\, b^{a_1}}\, x^{a_1 - 1} e^{-x/b}, & x > 0 \\ 0, & x \le 0 \end{cases}
\quad \text{and} \quad
f_Y(y) = \begin{cases} \dfrac{1}{\Gamma(a_2)\, b^{a_2}}\, y^{a_2 - 1} e^{-y/b}, & y > 0 \\ 0, & y \le 0 \end{cases}

and so, if z > 0,

f_Z(z) = \int_{-\infty}^{\infty} f_X(z - y)\, f_Y(y)\, dy
= \frac{1}{\Gamma(a_1)\Gamma(a_2)\, b^{a_1 + a_2}} \int_0^z (z - y)^{a_1 - 1} e^{-\frac{z - y}{b}}\, y^{a_2 - 1} e^{-\frac{y}{b}}\, dy
= \frac{e^{-z/b}\, z^{a_1 - 1} z^{a_2 - 1}}{\Gamma(a_1)\Gamma(a_2)\, b^{a_1 + a_2}} \int_0^z \left(1 - \frac{y}{z}\right)^{a_1 - 1} \left(\frac{y}{z}\right)^{a_2 - 1} dy.

If we make the change of variable y/z = x, with dy = z dx, we obtain for z > 0:

f_Z(z) = \frac{e^{-z/b}\, z^{a_1 + a_2 - 1}}{\Gamma(a_1)\Gamma(a_2)\, b^{a_1 + a_2}} \int_0^1 (1 - x)^{a_1 - 1} x^{a_2 - 1}\, dx
= \frac{e^{-z/b}\, z^{a_1 + a_2 - 1}}{\Gamma(a_1)\Gamma(a_2)\, b^{a_1 + a_2}}\, B(a_1, a_2)
= \frac{e^{-z/b}\, z^{a_1 + a_2 - 1}}{\Gamma(a_1)\Gamma(a_2)\, b^{a_1 + a_2}} \cdot \frac{\Gamma(a_1)\Gamma(a_2)}{\Gamma(a_1 + a_2)}
= \frac{1}{\Gamma(a_1 + a_2)\, b^{a_1 + a_2}}\, e^{-z/b}\, z^{a_1 + a_2 - 1}.

If z ≤ 0 then f_X(z − y) = 0 for y > 0 and f_Y(y) = 0 for y ≤ 0, hence f_Z(z) = 0. This completes the proof.
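The convolution can also be evaluated numerically; discretizing the integral with a Riemann sum (via np.convolve) reproduces the Γ(a₁ + a₂, b) density up to discretization error (illustrative parameters, SciPy assumed available):

```python
import numpy as np
from scipy import stats

a1, a2, b = 1.5, 2.5, 2.0                       # illustrative parameters
dz = 0.01
z = np.arange(0.0, 60.0, dz)

f1 = stats.gamma.pdf(z, a1, scale=b)
f2 = stats.gamma.pdf(z, a2, scale=b)
conv = np.convolve(f1, f2)[: len(z)] * dz       # Riemann sum for the convolution integral
target = stats.gamma.pdf(z, a1 + a2, scale=b)
print(np.max(np.abs(conv - target)))            # close to 0 (discretization error only)
```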
Example. (Sum of two independent normal random variables)
If X ∈ N(m₁, σ₁²) and Y ∈ N(m₂, σ₂²) are independent then

Z = X + Y \in N(m_1 + m_2,\ \sigma_1^2 + \sigma_2^2).

Solution. If X ∈ N(m₁, σ₁²) then U = (X − m₁)/σ₁ ∈ N(0, 1) (see Remark 4 from the subsection The normal random variable). Similarly, V = (Y − m₂)/σ₂ ∈ N(0, 1).
First, we will prove that if U, V are two independent standard normal variables and α, β > 0 are such that α² + β² = 1, then

W = \alpha U + \beta V \in N(0, 1).
If U ∈ N(0, 1) then, by applying once more the remark mentioned before, we have αU ∈ N(0, α²). Similarly, βV ∈ N(0, β²), so

f_W(z) = \int_{-\infty}^{\infty} f_{\alpha U}(z - y)\, f_{\beta V}(y)\, dy
= \frac{1}{2\pi\alpha\beta} \int_{-\infty}^{\infty} e^{-\frac{(z - y)^2}{2\alpha^2}}\, e^{-\frac{y^2}{2\beta^2}}\, dy
= \frac{1}{2\pi\alpha\beta} \int_{-\infty}^{\infty} e^{-\frac{\beta^2 (z - y)^2 + \alpha^2 y^2}{2\alpha^2\beta^2}}\, dy.

Using α² + β² = 1, the numerator of the exponent becomes

\beta^2 (z - y)^2 + \alpha^2 y^2 = y^2 (\alpha^2 + \beta^2) - 2\beta^2 z y + \beta^2 z^2
= y^2 - 2\beta^2 z y + \beta^4 z^2 - \beta^4 z^2 + \beta^2 z^2 = (y - \beta^2 z)^2 + \alpha^2 \beta^2 z^2,

hence

f_W(z) = \frac{1}{2\pi\alpha\beta} \int_{-\infty}^{\infty} e^{-\frac{(y - \beta^2 z)^2}{2\alpha^2\beta^2}}\, e^{-\frac{\alpha^2\beta^2 z^2}{2\alpha^2\beta^2}}\, dy
= \frac{1}{2\pi\alpha\beta}\, e^{-\frac{z^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{(y - \beta^2 z)^2}{2\alpha^2\beta^2}}\, dy.

If we make the change of variable x = (y − β²z)/(αβ), with dy = αβ dx, then

f_W(z) = \frac{1}{2\pi}\, e^{-\frac{z^2}{2}} \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}}\, dx = \frac{1}{2\pi}\, e^{-\frac{z^2}{2}}\, \sqrt{2\pi} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}.

In conclusion: W ∈ N(0, 1).
If we take

\alpha = \frac{\sigma_1}{\sqrt{\sigma_1^2 + \sigma_2^2}} \quad \text{and} \quad \beta = \frac{\sigma_2}{\sqrt{\sigma_1^2 + \sigma_2^2}},

then

W = \alpha U + \beta V = \frac{X - m_1}{\sqrt{\sigma_1^2 + \sigma_2^2}} + \frac{Y - m_2}{\sqrt{\sigma_1^2 + \sigma_2^2}} = \frac{X + Y - (m_1 + m_2)}{\sqrt{\sigma_1^2 + \sigma_2^2}} \in N(0, 1).

Applying again Remark 4 (mentioned before), part a), with

a = \sqrt{\sigma_1^2 + \sigma_2^2} \quad \text{and} \quad b = m_1 + m_2,

we get that

aW + b = X + Y \in N(m_1 + m_2,\ \sigma_1^2 + \sigma_2^2),

as desired.
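A final simulation check of this example (illustrative parameters; SciPy assumed available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
m1, s1, m2, s2 = 1.0, 2.0, -3.0, 1.5            # illustrative parameters
z = rng.normal(m1, s1, 300_000) + rng.normal(m2, s2, 300_000)

# Z should be N(m1 + m2, s1^2 + s2^2)
print(z.mean(), m1 + m2)                        # ~-2.0
print(z.var(), s1**2 + s2**2)                   # ~6.25
print(stats.kstest(z, "norm", args=(m1 + m2, np.hypot(s1, s2))).pvalue)
```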