
Introduction to probabilities with Scilab

Michael Baudin
June 2011
Abstract
In this article, we present an introduction to probabilities with Scilab.
Numerical experiments are based on Scilab. The first section presents discrete
random variables and conditional probabilities. In the second section, we
present combination problems, tree diagrams and Bernoulli trials. In the
third section, we present the simulation of random processes with Scilab. Coin
simulations are presented, as well as the Galton board.

Contents

1 Discrete random variables
  1.1 Sets
  1.2 Distribution function and probability
  1.3 Properties of discrete probabilities
  1.4 Uniform distribution
  1.5 Conditional probability
  1.6 Life table
  1.7 Bayes' formula
  1.8 Independent events
  1.9 Notes and references
  1.10 Exercises

2 Combinatorics
  2.1 Tree diagrams
  2.2 Permutations
  2.3 The gamma function
  2.4 Overview of functions in Scilab
  2.5 The gamma function in Scilab
  2.6 The factorial and log-factorial functions
  2.7 Computing factorial and log-factorial with Scilab
  2.8 Stirling's formula
  2.9 Computing permutations and log-permutations with Scilab
  2.10 The birthday problem
  2.11 A modified birthday problem
  2.12 Combinations
  2.13 Computing combinations and log-combinations with Scilab
  2.14 The poker game
  2.15 Bernoulli trials
  2.16 Computing the binomial distribution
  2.17 The hypergeometric distribution
  2.18 Computing the hypergeometric distribution with Scilab
  2.19 Notes and references
  2.20 Exercises

3 Simulation of random processes with Scilab
  3.1 Overview
  3.2 Generating uniform random numbers
  3.3 Simulating random discrete events
  3.4 Simulation of a coin
  3.5 Simulation of a Galton board
  3.6 Generate random permutations
  3.7 References and notes

4 Acknowledgments

5 Answers to exercises
  5.1 Answers for section 1
  5.2 Answers for section 2

Bibliography

Index

© 2008-2010 - Consortium Scilab - Digiteo - Michael Baudin

Copyright
This file must be used under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License:
http://creativecommons.org/licenses/by-sa/3.0

1 Discrete random variables

In this section, we present discrete random variables. The first subsection presents general definitions for sets, including unions and intersections. Then we present the definition of the discrete distribution function and the probability of an event. In the third subsection, we give properties of probabilities, such as, for example, the probability of the union of two disjoint events. The fourth subsection is devoted to the very common discrete uniform distribution function. Then we present the definition of conditional probability. This leads to Bayes' formula, which allows us to compute the posterior conditional probability, given a set of hypothesis probabilities. This section finishes with the definition of independent events.

1.1 Sets

A set is a collection of elements. In this document, we consider sets of elements in a fixed nonempty set Ω, called a space.
Assume that A is a set of elements. If x is a point in that set, we write x ∈ A. If there is no point in A, we write A = ∅. If the number of elements in A is finite, let us denote by #(A) the number of elements in the set A. If the number of elements in A is infinite, the cardinality cannot be computed (for example A = ℕ).
The set A^c is the set of all points in Ω which are not in A:

A^c = {x ∈ Ω / x ∉ A}.    (1)

The set A^c is called the complementary set of A.


The set B is a subset of A if any point in B is also in A, and we write B ⊂ A. The two sets A and B are equal if A ⊂ B and B ⊂ A. The difference set A − B is the set of all points of A which are not in B:

A − B = {x ∈ A / x ∉ B}.    (2)

The intersection A ∩ B of two sets A and B is the set of points common to A and B:

A ∩ B = {x / x ∈ A and x ∈ B}.    (3)

The union A ∪ B of two sets A and B is the set of points which belong to at least one of the sets A or B:

A ∪ B = {x / x ∈ A or x ∈ B}.    (4)

The operations that we defined are presented in figure 1. These figures are often referred to as Venn diagrams.
Two sets A and B are disjoint, or mutually exclusive, if their intersection is empty, i.e. A ∩ B = ∅.
In the following, we will use the fact that we can always decompose the union of two sets as the union of three disjoint subsets. Indeed, assume that A, B ⊂ Ω. We have

A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A),    (5)

where the sets A − B, A ∩ B and B − A are disjoint. This decomposition will be used several times in this chapter.

Figure 1: Operations on sets. Upper left, union: A ∪ B; upper right, intersection: A ∩ B; lower left, complement: A^c; lower right, difference: A − B.
The cross product of two sets is the set

A × B = {(x, y) / x ∈ A, y ∈ B}.    (6)

Assume that n is a positive integer. The power set A^n is the set

A^n = {(x1, ..., xn) / x1, ..., xn ∈ A}.    (7)

Example 1.1 (Die with 6 faces) Consider a die with 6 faces. The space for this experiment is

Ω = {1, 2, 3, 4, 5, 6}.    (8)

The set of even numbers is A = {2, 4, 6} and the set of odd numbers is B = {1, 3, 5}. Their intersection is empty, i.e. A ∩ B = ∅, which proves that A and B are disjoint. Since their union is the whole sample space, i.e. A ∪ B = Ω, these two sets are mutually complementary, i.e. A^c = B and B^c = A.

1.2 Distribution function and probability

A random event is an event which has a chance of happening, and the probability is a numerical measure of that chance. What exactly is random is quite difficult to define. In this section, we define the probability associated with a distribution function, a concept that can be defined very precisely.
Assume that Ω is a set, called the sample space. In this document, we will consider the case where the sample space is finite, i.e. the number of elements in Ω is finite. Assume that we are performing random trials, so that each trial is associated with one outcome x ∈ Ω. Each subset A of the sample space is called an event. We say that the event A ∩ B occurs if both the events A and B occur. We say that the event A ∪ B occurs if the event A or the event B occurs.

Example 1.2 (Die with 6 faces) Consider a die with 6 faces which is rolled once. The sample space is:

Ω = {1, 2, 3, 4, 5, 6}.    (9)

Let us denote by x the outcome of this experiment. When we focus on the experiments where x is odd, we consider the event A = {1, 3, 5}. When the outcome is at most 3, we consider the event B = {1, 2, 3}. The intersection of these two events is C = A ∩ B = {1, 3}, which occurs when the outcome x is both odd and at most 3.
We will first define a distribution function and then derive the probability from there. The example of a die with 6 faces will serve as an illustration for these definitions.

Definition 1.1. (Distribution) A distribution function is a function f : Ω → [0, 1] which satisfies

0 ≤ f(x) ≤ 1,    (10)

for all x ∈ Ω, and

Σ_{x∈Ω} f(x) = 1.    (11)

Example 1.3 (Die with 6 faces) Assume that a die with 6 faces is rolled once. The sample space for this experiment is

Ω = {1, 2, 3, 4, 5, 6}.    (12)

Assume that the die is fair. This means that the probability of each of the six outcomes is the same, i.e. the distribution function is f(x) = 1/6 for x ∈ Ω, which satisfies the conditions of the definition 1.1.

Definition 1.2. (Probability) Assume that f is a distribution function on the sample space Ω. For any event A ⊂ Ω, the probability P of A is

P(A) = Σ_{x∈A} f(x).    (13)

Example 1.4 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event

A = {2, 4, 6}    (14)

corresponds to the statement that the result of the roll is an even number. From the definition 1.2, the probability of the event A is

P(A) = f(2) + f(4) + f(6)    (15)
     = 1/6 + 1/6 + 1/6    (16)
     = 1/2.    (17)
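The computation of example 1.4 is straightforward to reproduce in Scilab. The following minimal sketch is our own illustration, where the distribution is stored as a vector f indexed by the outcomes.

f = ones(1,6) / 6;    // uniform distribution on {1,2,...,6}
A = [2 4 6];          // the event : the outcome is even
PA = sum ( f(A) )     // P(A) = f(2) + f(4) + f(6) = 0.5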

1.3 Properties of discrete probabilities

In this section, we present the properties that the probability P(A) satisfies. We also derive some results for the probabilities of other events, such as unions of disjoint events.
The following proposition gives some elementary properties satisfied by a probability P.

Proposition 1.3. (Probability) Assume that Ω is a sample space and that f is a distribution function on Ω. The probability of the event Ω is one, i.e.

P(Ω) = 1.    (18)

The probability of the empty set is zero, i.e.

P(∅) = 0.    (19)

Assume that A and B are two subsets of Ω. If A ⊂ B, then

P(A) ≤ P(B).    (20)

For any event A ⊂ Ω, we have

0 ≤ P(A) ≤ 1.    (21)

Proof. The equality (18) derives directly from the definition (11) of a distribution function, i.e. P(Ω) = Σ_{x∈Ω} f(x) = 1. The equality (19) derives directly from the definition (13), since the sum over the empty set is zero.
Assume that A and B are two subsets of Ω so that A ⊂ B. Since a probability is a sum of nonnegative terms, we have

P(A) = Σ_{x∈A} f(x) ≤ Σ_{x∈B} f(x) = P(B),    (22)

which proves the inequality (20).
The inequalities (21) derive directly from the definition (13) of a probability. First, the probability P(A) is nonnegative, since (10) states that f is nonnegative. Second, the probability of an event A is lower than 1, since P(A) = Σ_{x∈A} f(x) ≤ Σ_{x∈Ω} f(x) = P(Ω) = 1, which concludes the proof.
Proposition 1.4. (Probability of two disjoint subsets) Assume that Ω is a sample space and that f is a distribution function on Ω. Let A and B be two disjoint subsets of Ω. Then

P(A ∪ B) = P(A) + P(B).    (23)

Figure 2: Two disjoint sets.

The figure 2 presents the situation of two disjoint sets A and B. Since the two sets have no intersection, it suffices to add the probabilities associated with each event.
Proof. Assume that A and B are two disjoint subsets of Ω. We can decompose A ∪ B as A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A), so that

P(A ∪ B) = Σ_{x∈A∪B} f(x)    (24)
         = Σ_{x∈A−B} f(x) + Σ_{x∈A∩B} f(x) + Σ_{x∈B−A} f(x).    (25)

But A and B are disjoint, so that A − B = A, A ∩ B = ∅ and B − A = B. Therefore,

P(A ∪ B) = Σ_{x∈A} f(x) + Σ_{x∈B} f(x)    (26)
         = P(A) + P(B),    (27)

which concludes the proof.


Notice that the equality (23) can be generalized immediately to a sequence of disjoint events.

Proposition 1.5. (Probability of disjoint subsets) Assume that Ω is a sample space and that f is a distribution function on Ω. For any pairwise disjoint events A1, A2, ..., Ak with k > 0, we have

P(A1 ∪ A2 ∪ ... ∪ Ak) = P(A1) + P(A2) + ... + P(Ak).    (28)

Proof. For example, we can use the proposition 1.4 to state the proof by induction on the number of events.

Example 1.5 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event A = {1, 2, 3} corresponds to the numbers lower than or equal to 3. The probability of this event is P(A) = 1/2. The event B = {5, 6} corresponds to the numbers greater than 4. The probability of this event is P(B) = 1/3. The two events are disjoint, so that the proposition 1.5 can be applied, which implies that P(A ∪ B) = 5/6.

Figure 3: Two sets with a nonempty intersection.


Proposition 1.6. (Probability of the complementary event) Assume that Ω is a sample space and that f is a distribution function on Ω. For any subset A of Ω,

P(A) + P(A^c) = 1.    (29)

Proof. We have Ω = A ∪ A^c, where the sets A and A^c are disjoint. Therefore, from proposition 1.4, we have

P(Ω) = P(A) + P(A^c),    (30)

where P(Ω) = 1, which concludes the proof.

Example 1.6 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. Assume that the distribution function is f(x) = 1/6 for x ∈ Ω. The event A = {2, 4, 6} corresponds to the statement that the result of the roll is an even number. The probability of this event is P(A) = 1/2. The complementary event is the event of an odd number, i.e. A^c = {1, 3, 5}. By proposition 1.6, the probability of an odd number is P(A^c) = 1 − P(A) = 1 − 1/2 = 1/2.
The following equality gives the relationship between the probability of the union of two events in terms of the individual probabilities and the probability of the intersection.

Proposition 1.7. (Probability of the union) Assume that Ω is a sample space and that f is a distribution function on Ω. Assume that A and B are two subsets of Ω, not necessarily disjoint. We have:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B).    (31)

The figure 3 presents the situation where two sets A and B have a nonempty intersection. When we add the probabilities of the two events A and B, the intersection is added twice. This is why it must be removed by subtraction.
Proof. Assume that A and B are two subsets of Ω. The proof is based on the analysis of the Venn diagram presented in figure 3. The idea of the proof is to compute the probability P(A ∪ B) by making disjoint sets on which the equality (23) can be applied. We can decompose the union of the two sets A and B as the union of disjoint sets:

A ∪ B = (A − B) ∪ (A ∩ B) ∪ (B − A).    (32)

The equality (23) leads to

P(A ∪ B) = P(A − B) + P(A ∩ B) + P(B − A).    (33)

The next part of the proof is based on the computation of P(A − B) and P(B − A). We can decompose the set A as the union of disjoint sets

A = (A − B) ∪ (A ∩ B),    (34)

which leads to P(A) = P(A − B) + P(A ∩ B), which implies

P(A − B) = P(A) − P(A ∩ B).    (35)

Similarly, we can prove that

P(B − A) = P(B) − P(A ∩ B).    (36)

We plug the two equalities (35) and (36) into (33), and find

P(A ∪ B) = P(A) − P(A ∩ B) + P(A ∩ B) + P(B) − P(A ∩ B),    (37)

which implies (31) and concludes the proof.


Example 1.7 (Disease) Assume that infections can be bacterial (B), viral (V) or both (B ∩ V). This implies that B ∪ V = Ω, but the two events are not disjoint, i.e. B ∩ V ≠ ∅. Assume that P(B) = 0.7 and P(V) = 0.4. What is the probability of having both types of infections?
The probability of having both infections is P(B ∩ V). From proposition 1.7, we have P(B ∪ V) = P(B) + P(V) − P(B ∩ V), which leads to P(B ∩ V) = P(B) + P(V) − P(B ∪ V). We finally get P(B ∩ V) = 0.7 + 0.4 − 1 = 0.1.
This example is based on an example presented in [15].
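This computation is immediate to transcribe in Scilab. The following two lines are our own sketch, where pBuV = 1 encodes the hypothesis that every infection is bacterial, viral or both.

pB = 0.7; pV = 0.4; pBuV = 1;
pBnV = pB + pV - pBuV    // probability of both types of infections : 0.1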

1.4 Uniform distribution

In this section, we describe the particular situation where the distribution function is uniform.

Definition 1.8. (Uniform distribution) Assume that Ω is a finite, nonempty sample space. The uniform distribution function is

f(x) = 1/#(Ω),    (38)

for all x ∈ Ω.

Proposition 1.9. (Probability with uniform distribution) Assume that Ω is a finite, nonempty sample space and that f is a uniform distribution function. Then the probability of the event A ⊂ Ω is

P(A) = #(A)/#(Ω).    (39)

Proof. The definition 1.2 implies

P(A) = Σ_{x∈A} f(x)    (40)
     = Σ_{x∈A} 1/#(Ω)    (41)
     = #(A)/#(Ω),    (42)

which concludes the proof.


Example 1.8 (Die with 6 faces) Assume that a die with 6 faces is rolled once, so that the sample space for this experiment is Ω = {1, 2, 3, 4, 5, 6}. In the previous analysis of this example, we have assumed that the distribution function is f(x) = 1/6 for x ∈ Ω. This is consistent with definition 1.8, since #(Ω) = 6. Such a die is a fair die, meaning that all faces have the same probability. The event A = {2, 4, 6} corresponds to the statement that the result of the roll is an even number. The number of outcomes in this event is #(A) = 3. From proposition 1.9, the probability of this event is P(A) = 3/6 = 1/2.
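Under the uniform distribution, computing a probability reduces to counting, which can be checked with a minimal Scilab sketch such as the following (the variable names are ours).

omega = 1:6;                          // the sample space
A = [2 4 6];                          // the event : even outcome
PA = size(A,"*") / size(omega,"*")    // #(A)/#(Omega) = 0.5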

1.5 Conditional probability

In this section, we define the conditional distribution function and the conditional probability. We analyze this definition in the particular situation of the uniform distribution.
In some situations, we want to consider the probability of an event A given that an event B has occurred. In this case, we consider the set B as a new sample space, and update the definition of the distribution function accordingly.

Definition 1.10. (Conditional distribution function) Assume that Ω is a sample space and that f is a distribution function on Ω. Assume that A is a nonempty subset of Ω. The function f(x|A) defined by

f(x|A) = f(x) / Σ_{x∈A} f(x), if x ∈ A,
       = 0, if x ∉ A,    (43)

is the conditional distribution function of x given A.
The hypothesis that A is nonempty implies P(A) = Σ_{x∈A} f(x) > 0, so that the equality (43) is well defined.
The figure 4 presents the situation where an event A is considered for a conditional distribution. The distribution function f(x) is with respect to the sample space Ω, while the conditional distribution function f(x|A) is with respect to the set A.

Figure 4: A set A, subset of the sample space Ω.

Proof. We must prove that the function f(x|A) is a distribution function. Let us prove that the function f(x|A) satisfies the equality

Σ_{x∈Ω} f(x|A) = 1.    (44)

Indeed, we have

Σ_{x∈Ω} f(x|A) = Σ_{x∈A} f(x|A) + Σ_{x∉A} f(x|A)    (45)
              = Σ_{x∈A} f(x|A)    (46)
              = Σ_{x∈A} f(x) / Σ_{x∈A} f(x)    (47)
              = 1,    (48)

which concludes the proof.


This leads us to the following definition of the conditional probability of an event A given an event B.

Proposition 1.11. (Conditional probability) Assume that Ω is a finite sample space and A and B are two subsets of Ω. Assume that P(B) > 0. The conditional probability of the event A given the event B is

P(A|B) = P(A ∩ B) / P(B).    (49)

The figure 5 presents the situation where we consider the event A|B. The probability P(A) is with respect to Ω, while the probability P(A|B) is with respect to B.
Proof. Assume that A and B are subsets of the sample space Ω. The conditional distribution function f(x|B) can be used to compute the probability of the event A given the event B. Indeed, we have

P(A|B) = Σ_{x∈A} f(x|B)    (50)
       = Σ_{x∈A∩B} f(x|B),    (51)

since f(x|B) = 0 if x ∉ B.

Figure 5: The conditional probability P(A|B) measures the probability of the set A ∩ B with respect to the set B.

Hence,

P(A|B) = Σ_{x∈A∩B} f(x)/(Σ_{x∈B} f(x))    (52)
       = (Σ_{x∈A∩B} f(x)) / (Σ_{x∈B} f(x))    (53)
       = P(A ∩ B) / P(B).    (54)

The previous equality is well defined since P(B) > 0.


This definition can be analyzed in the particular case where the distribution function is uniform. Assume that #(Ω) is the size of the sample space and #(A) (resp. #(B) and #(A ∩ B)) is the number of elements of A (resp. of B and A ∩ B). Therefore, the equation (49) implies

P(A|B) = (#(A ∩ B)/#(Ω)) / (#(B)/#(Ω))    (55)
       = #(A ∩ B) / #(B).    (56)

We notice that

(#(B)/#(Ω)) × (#(A ∩ B)/#(B)) = #(A ∩ B)/#(Ω),    (57)

for all A, B ⊂ Ω. This leads to the equality

P(B)P(A|B) = P(A ∩ B),    (58)

for all A, B ⊂ Ω. The previous equation could have been directly found from the equation (49).
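The counting formula (56) can be checked on the events of example 1.2. The following sketch is our own illustration; it uses the intersect function to compute A ∩ B.

omega = 1:6;
A = [1 3 5];                          // odd outcomes
B = [1 2 3];                          // outcomes at most 3
AB = intersect ( A , B );             // A inter B = {1,3}
PAgB = size(AB,"*") / size(B,"*")     // #(A inter B)/#(B) = 2/3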


1.6 Life table

In this section, we analyze a life table and compute conditional probabilities with Scilab.
The World Health Organization publishes life tables for many countries in the world [14]. The table in figure 6 gathers data compiled in the USA in 2009. The first line counts 100,000 males and females born alive, with decreasing values as the age increases.

Age Group   Male     Female
<1          100000   100000
1-4         99276    99391
5-9         99156    99292
10-14       99085    99232
15-19       98989    99164
20-24       98573    98991
25-29       97887    98758
30-34       97223    98484
35-39       96526    98133
40-44       95665    97621
45-49       94396    96823
50-54       92487    95603
55-59       89643    93850
60-64       85726    91384
65-69       80364    87726
70-74       72889    82275
75-79       62860    74398
80-84       49846    63218
85-89       34096    48086
90-94       18315    30289
95-99       7198     14523
100+        1940     4804

Figure 6: Life table, United States of America, 2009 [14].

From this table, we can easily deduce that 91384/100000 = 91.384 % of the females live to age 60 and that 63218/100000 = 63.218 % of the females live to age 80. We consider a woman who is 60. What is the probability that she lives to age 80?
Let us denote by A = {a ≥ 60} the event that a woman lives to age 60, and let us denote by B = {a ≥ 80} the event that a woman lives to age 80. We want to compute the conditional probability P({a ≥ 80}|{a ≥ 60}). By the proposition 1.11, we have

P({a ≥ 80}|{a ≥ 60}) = P({a ≥ 60} ∩ {a ≥ 80}) / P({a ≥ 60})    (59)
                     = P({a ≥ 80}) / P({a ≥ 60})    (60)
                     = 0.63218 / 0.91384    (61)
                     ≈ 0.6918,    (62)

with 4 significant digits. In other words, a woman who is already 60 has a 69.18 % chance of living to 80.
It is easy to gather the data into Scilab variables, as in the following Scilab script. The ages variable contains the age classes. It is made of 22 entries, where the class #k goes from age ages(k-1)+1 to ages(k). The number of male survivors in the class #k is males(k), while the number of female survivors is females(k).

ages = [0;4;9;14;19;24;29;34;39;44;49;54;..
        59;64;69;74;79;84;89;94;99;100];
males = [100000;99276;99156;99085;98989;98573;..
         97887;97223;96526;95665;94396;92487;89643;85726;..
         80364;72889;62860;49846;34096;18315;7198;1940];
females = [100000;99391;99292;99232;99164;98991;98758;..
           98484;98133;97621;96823;95603;93850;91384;..
           87726;82275;74398;63218;48086;30289;14523;4804];

The following lifeprint function prints the data and displays a table similar to the figure 6.

function lifeprint ( ages , males , females )
    nc = size ( ages , "*" );
    mprintf ( "  <1      %6d %6d\n" , males(1) , females(1) );
    for k = 2 : nc
        amin = ages(k-1) + 1;
        amax = ages(k);
        mprintf ( "%3d - %3d %6d %6d\n" ,..
            amin , amax , males(k) , females(k) );
    end
endfunction

We are now interested in computing the required probabilities with Scilab. The following lifeproba function returns the probability that a person lives to age a, given the data in the tables ages and survivors. In practice, the survivors variable will be equal either to males or to females. The algorithm first searches the ages table for the class k which contains the age a, that is, the index k such that a is contained in the age class from ages(k-1)+1 to ages(k). Then the probability of living to age a is computed by using the number of survivors associated with the class k.

function p = lifeproba ( a , ages , survivors )
    nc = size ( ages , "*" );
    found = %f;
    for k = 2 : nc
        if ( a >= ages(k-1) + 1 & a <= ages(k) ) then
            found = %t;
            break
        end
    end
    if ( ~found ) then
        error ( "Age not found in table" )
    end
    p = survivors(k) / survivors(1)
endfunction

Although the previous algorithm is rather naive (it does not make use of vectorized statements), it is sufficient for small life tables such as ours.
The following session shows how the lifeproba function computes the probability of living to age 60.

-->pa = lifeproba ( 60 , ages , females )
 pa =
    0.91384

The following lifecondproba function returns the probability that a person of age a lives to age b, given the data in the tables ages and survivors. We assume that a < b. The algorithm is a straightforward application of the proposition 1.11.

function p = lifecondproba ( a , b , ages , survivors )
    if ( a >= b ) then
        p = 1
        return
    end
    pa = lifeproba ( a , ages , survivors )
    pb = lifeproba ( b , ages , survivors )
    p = pb / pa
endfunction

In the following session, we compute the probability that a woman lives to age 80, given that she is 60.

-->pab = lifecondproba ( 60 , 80 , ages , females )
 pab =
    0.6917841

It is now easy to compute the probability that a woman lives to various ages, given that she is 40. This is done in the following script, which produces the figure 7.

bages = floor ( linspace ( 41 , 99 , 20 ) );
for k = 1 : 20
    pab(k) = lifecondproba ( 40 , bages(k) , ages , females );
end
plot ( bages , pab , "bo-" );
xtitle ( "Probability of living to age B, for a woman of age 40." ,..
    "B" , "Probability" );

Figure 7: Probability that a woman lives to various ages, given that she is 40.

1.7 Bayes' formula

In this section, we present Bayes' formula, which allows us to compute the posterior conditional probability, given a set of hypothesis probabilities.

Proposition 1.12. (Bayes' formula) Assume that the sample space Ω can be decomposed into a sequence of events, which are called hypotheses. Let us denote by Hi, with i = 1, ..., m, a sequence of pairwise disjoint sets such that

Ω = H1 ∪ H2 ∪ ... ∪ Hm,    (63)

where m is a positive integer. Then,

P(Hi|E) = P(E|Hi)P(Hi) / Σ_{j=1,m} P(E|Hj)P(Hj).    (64)

In order to use Bayes' formula, assume that the probability of each hypothesis is known, that is, assume that the probabilities P(Hi) are given for 1 ≤ i ≤ m. Assume that the probabilities P(E|Hi) are known for 1 ≤ i ≤ m. Therefore, we are able to compute P(Hi|E), that is, if the event E has occurred, we are able to compute the probability of each hypothesis Hi. In practice, we consider the most likely hypothesis Hi, for which the probability P(Hi|E) is maximum over i = 1, ..., m.
Proof. By definition of the conditional probability, we have

P(Hi|E) = P(Hi ∩ E) / P(E).    (65)

The numerator can be computed by using the conditional probability P(E|Hi) which, by hypothesis, is known. Hence we have

P(Hi ∩ E) = P(E|Hi)P(Hi),    (66)

for 1 ≤ i ≤ m. On the other side, we must compute the denominator P(E). This can be done by using the fact that the sequence of hypotheses Hi is a decomposition of the whole sample space Ω. Hence, we can decompose the event E as

E = (E ∩ H1) ∪ (E ∩ H2) ∪ ... ∪ (E ∩ Hm).    (67)

By hypothesis, these events are pairwise disjoint. Therefore, by proposition 1.5, we can compute the probability of the event E as the sum of the probabilities of the disjoint subsets E ∩ Hi. We have

P(E) = Σ_{i=1,m} P(E ∩ Hi).    (68)

By definition of the conditional probability, we have P(E ∩ Hi) = P(E|Hi)P(Hi), which leads to

P(E) = Σ_{i=1,m} P(E|Hi)P(Hi).    (69)

We can now plug (66) and (69) into (65), which concludes the proof.
The following example is presented in [17], in chapter 1, "Elements of probability".

Example 1.9 Consider the situation where an insurance company tries to compute the probability of having an accident. This company makes the assumption that people either are or are not accident prone, and considers the probability of having an accident during the 1 year period following the insurance policy purchase. They assume that an accident-prone person will have an accident with probability 0.4. For a non-accident-prone person, the probability of having an accident is 0.2. Assume that 30 % of the population is accident prone. Assume that a person has an accident. What is the probability that he is accident prone?
Let us denote by E the event that the person will have an accident within the year of purchase, and denote by H1 the event that the person is accident prone. By hypothesis, the sample space Ω of all the persons can be decomposed into the pairwise disjoint sets H1 and H2 = H1^c, which are, in this particular situation, the hypotheses. By hypothesis, we know that P(E|H1) = 0.4 and P(E|H2) = 0.2. We also know that P(H1) = 0.3, which implies that P(H2) = 1 − P(H1) = 0.7. We want to compute P(H1|E). By Bayes' formula (64), we have

P(H1|E) = P(E|H1)P(H1) / (P(E|H1)P(H1) + P(E|H2)P(H2))    (70)
        = (0.4 × 0.3) / (0.4 × 0.3 + 0.2 × 0.7)    (71)
        ≈ 0.4615,    (72)

with 4 significant digits.
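The computation of example 1.9 can be vectorized in Scilab. The following sketch is ours, not part of the original text; it evaluates the Bayes formula (64) for all hypotheses at once, and the entry with the largest posterior probability identifies the most likely hypothesis.

pH  = [0.3 0.7];                       // P(H1), P(H2)
pEH = [0.4 0.2];                       // P(E|H1), P(E|H2)
pHE = pEH .* pH / sum ( pEH .* pH )    // posteriors : 0.4615385 0.5384615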

1.8 Independent events

In this section, we define independent events and give examples of such events.

Definition 1.13. (Independent events) Assume that Ω is a finite sample space. Two events A, B ⊂ Ω are independent if P(A) > 0, P(B) > 0 and

P(A|B) = P(A).    (73)

In the previous definition, the roles of A and B can be reversed, which leads to the following result. If two events A, B are independent, then

P(B|A) = P(B).    (74)

This is the subject of exercise 1.4.


This definition, associated with the definition of the conditional probability (49), leads immediately to the following proposition.

Proposition 1.14. (Independent events) Assume that Ω is a finite sample space. Assume that the two events A, B ⊂ Ω satisfy P(A) > 0 and P(B) > 0. Then the events A and B are independent if and only if

P(A ∩ B) = P(A)P(B).    (75)

Proof. Assume that the two events A, B ⊂ Ω satisfy P(A) > 0 and P(B) > 0.
In the first part of this proof, assume that A and B are independent, and let us prove that (75) is satisfied. By definition of the conditional probability 1.11, we have P(A ∩ B) = P(A|B)P(B). Since A and B are independent, we have P(A|B) = P(A), which leads to P(A ∩ B) = P(A)P(B) and concludes the first part.
In the second part, let us assume that (75) is satisfied, and let us prove that the events A and B are independent. By definition of the conditional probability 1.11, we have

P(A|B) = P(A ∩ B) / P(B).    (76)

By hypothesis, we have P(A ∩ B) = P(A)P(B), so that

P(A|B) = P(A)P(B) / P(B) = P(A),    (77)

since P(B) > 0. Similarly, switching the roles of A and B, by definition we have P(B|A) = P(B ∩ A)/P(A). By hypothesis, we have P(B ∩ A) = P(B)P(A), so that P(B|A) = P(B)P(A)/P(A) = P(B), since P(A) > 0, which concludes the proof.

The proposition 1.14 has an important and rather subtle consequence, which is presented in the following example. This example is based on the historical remarks of section 1.2 in [7], which present the results collected by Gerolamo Cardano (1501-1576).

Example 1.10 (Die with 6 faces) Assume that a fair die with 6 faces is rolled twice (instead of once), and consider the problem of choosing the correct sample space. The correct sample space takes into account the order of the rolls and is

Ω = {(i, j) / i, j = 1, ..., 6}.    (78)

The set Ω has 6 × 6 = 36 elements. The wrong (for our purpose), unordered, sample space is

Ω2 = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6),
      (2,2), (2,3), (2,4), (2,5), (2,6),
      (3,3), (3,4), (3,5), (3,6), (4,4), (4,5), (4,6),
      (5,5), (5,6), (6,6)}.    (79)

The set Ω2 has 21 elements. The fact that the set Ω2 is the wrong sample space for this experiment is linked to the fact that the two rolls are independent. Therefore, the equality (75) can be applied, stating that the probability of two independent events is the product of the probabilities. Therefore, the probability of any outcome (i, j) is

P((i, j)) = P(i)P(j),    (80)

for all i, j = 1, ..., 6. Since P(i) = P(j) = 1/6, we have

P((i, j)) = 1/36,    (81)

for all i, j = 1, ..., 6. The only sample space which leaves this probability consistent with the uniform distribution of a finite space (38) is Ω. If the sample space Ω2 was chosen, the equality (75) would be violated, so that the two events would become dependent.
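The equality (75) can be checked on the ordered sample space by brute-force enumeration. The following script is our own illustration: it counts the ordered pairs (i, j) where the first roll is even and the second roll is odd, and compares the frequency with the product of the individual probabilities.

c = 0;
for i = 1 : 6
    for j = 1 : 6
        if ( modulo(i,2) == 0 & modulo(j,2) == 1 ) then
            c = c + 1;    // first roll even, second roll odd
        end
    end
end
p = c / 36    // 9/36 = 0.25 = P(even) * P(odd) = 0.5 * 0.5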

1.9 Notes and references

The material for section 1.1 is presented in [11], chapter 1, "Sets, spaces and measures". The same presentation is given in [7], section 1.2, "Discrete probability distributions". The example 1.3 is given in [7].
The equations (21), (18) and (23) are at the basis of probability theory, so that in [17] these properties are stated as axioms.
In some statistics books, such as [8] for example, the union of the sets A and B is denoted by the sum A + B, and the intersection of sets is denoted by the product AB. We did not use these notations in this document.
The section 1.6 on life tables is inspired by an example presented in [7], in section 4.1, "Discrete conditional probability". We have updated the data from 1990 to 2009 and introduced the associated Scilab functions.

1.10 Exercises

Exercise 1.1 (Head and tail) Assume that we have a coin which is tossed twice. We record the outcomes so that the order matters, i.e. the sample space is {HH, HT, TH, TT}. Assume that the distribution function is uniform, i.e. heads and tails have equal probability.
1. What is the probability of the event A = {HH, HT, TH}?
2. What is the probability of the event A = {HH, HT}?

Exercise 1.2 (Two dice) Assume that we are rolling a pair of dice. Assume that each face has an equal probability.
1. What is the probability of getting a sum of 7?
2. What is the probability of getting a sum of 11?
3. What is the probability of getting a double one, i.e. snake eyes?

Exercise 1.3 (de Méré's experiments) This exercise is presented in [7], in the historical remarks of section 1.2, "Discrete Probability Distributions". Famous letters between Pascal and Fermat were instigated by a request for help from a French nobleman and gambler, the Chevalier de Méré. It is said that de Méré had been betting that, in four rolls of a die, at least one six would turn up (event A). He was winning consistently and, to get more people to play, he changed the game to bet that, in 24 rolls of two dice, a pair of 6 would turn up (event B). It is claimed that de Méré lost with 24 rolls and felt that 25 rolls were necessary to make the game favorable (event C). What is the probability of the three events A, B and C? Can you compute with Scilab the probability for event A and a number of rolls equal to 1, 2, 3 or 4? Can you compute with Scilab the probability for the event B or C for a number of rolls equal to 10, 20, 24, 25, 30?

Exercise 1.4 (Independent events) Assume that Ω is a finite sample space. Assume that the two events A, B ⊂ Ω are independent. Prove that P(B|A) = P(B).

Exercise 1.5 (Boole's inequality) Assume that Ω is a finite sample space. Let (Ei)_{i=1,n} be a sequence of finite sets included in Ω, with n > 0. Prove Boole's inequality:

P( ∪_{i=1,n} Ei ) ≤ Σ_{i=1,n} P(Ei).    (82)

Exercise 1.6 (Discrete conditional probability) In this exercise, we consider a laboratory blood test which is tried on healthy persons and on persons who have the disease. Assume that, when the person has the disease, the test is positive with probability 99 %. However, this test also generates false positives with probability 1 %. This means that, when a person does not have the disease, the test is positive with probability 1 %. Assume that 0.5 % of the population has the disease. Given that the test is positive for one person, what is the probability that this person has the disease? Consider the case where 5 % of the population has the disease. Consider the case where the probability of a false positive is 0.1 % (but keep the probability of a true positive equal to 99 %).
This exercise is presented in [17], chapter 1, "Elements of probability".

2 Combinatorics

In this section, we present several tools which allow us to compute probabilities of discrete events. One powerful analysis tool is the tree diagram, which is presented in the first part of this section. Then we detail permutation and combination numbers, which allow us to solve many probability problems.

2.1 Tree diagrams

In this section, we present the general method which allows us to count the total number of ways that a task can be performed. We illustrate that method with tree diagrams.
Assume that a task is carried out in a sequence of n steps. The first step can be performed by making one choice among m1 possible choices. Similarly, there are m2 possible ways to perform the second step, and so forth. The complete sequence of n steps can therefore be performed in N = m1 × m2 × ... × mn different ways.
To illustrate the sequence of steps, the associated tree can be drawn. An example of such a tree diagram is given in the figure 8. Each node in the tree corresponds to one step in the sequence. The number of children of a parent node is equal to the number of possible choices for the step. At the bottom of the tree, there are N leaves, where each path, i.e. each sequence of nodes from the root to a leaf, corresponds to a particular sequence of choices.

Figure 8: Tree diagram - The task is made of 3 steps. There are 2 choices for step #1, 3 choices for step #2 and 2 choices for step #3. The total number of ways to perform the full sequence of steps is N = 2 × 3 × 2 = 12.

We can think of the tree as representing a random experiment, where the final state is the outcome of the experiment. In this context, each choice is performed at random, depending on the probability associated with each branch. We will review tree diagrams throughout this section, and especially in the section devoted to Bernoulli trials.
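The multiplication principle is immediate to apply in Scilab. The following one-liner is our own illustration; it counts the leaves of the tree in figure 8.

m = [2 3 2];      // number of choices at steps #1, #2 and #3
N = prod ( m )    // total number of paths from the root to a leaf : 12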

2.2 Permutations

In this section, we present permutations, which are ordered subsets of a given set.

Definition 2.1. (Permutation) Assume that A is a finite set. A permutation of A is a one-to-one mapping of A onto itself.

Without loss of generality, we can assume that the finite set A can be ordered and numbered from 1 to n = #(A), so that we can write A = {1, 2, ..., n}. To define a particular permutation, one can write a matrix with 2 rows and n columns which represents the mapping. One example of a permutation on the set A = {a1, a2, a3, a4} is

σ = ( 1 2 3 4
      2 1 4 3 ),    (83)

which signifies that the mapping is: a1 → a2, a2 → a1, a3 → a4, a4 → a3.
Since the first row is always the same, there is no additional information provided by this row. This is why the permutation can be written by uniquely defining the second row. This way, the previous mapping can be written as

σ = (2 1 4 3).    (84)
We can try to count the number of possible permutations of a given set A with n elements.

Figure 9: Tree diagram for the computation of permutations of the set A = {1, 2, 3}.

The tree diagram associated with the computation of the number of permutations for n = 3 is presented in figure 9. In the first step, we decide which number to place at index 1. For this index, we have 3 possibilities, that is, the numbers 1, 2 and 3. In the second step, we decide which number to place at index 2. At this index, we have 2 possibilities left, where the exact numbers depend on the branch. In the third step, we decide which number to place at index 3. At this last index, we only have one number left.
This leads to the following proposition, which defines the factorial function.

Proposition 2.2. (Factorial) The number of permutations of a set A of n elements is the factorial of n, defined by

n! = n × (n − 1) × ... × 2 × 1.    (85)

Proof. #1 Let us pick an element to place at index 1. There are n elements in the set, leading to n possible choices. For the element at index 2, there are n − 1 elements left in the set. For the element at index n, there is only 1 element left. The total number of permutations is therefore n × (n − 1) × ... × 2 × 1, which concludes the proof.

Proof. #2 The element #1 can be located at indexes 1, 2, ..., n, so that there are n ways to place the element #1. Once the element #1 is placed, there are n − 1 ways to place the element #2. The last element #n can only be placed at the remaining index. The total number of permutations is therefore n × (n − 1) × ... × 2 × 1, which concludes the proof.

When n = 0, it seems that we cannot define the number 0!. For reasons which will become clearer when we introduce the gamma function, it is convenient to define 0! as equal to one:

0! = 1.    (86)

Example 2.1 Let us compute the number of permutations of the set A = {1, 2, 3}. By the equation (85), we have 3! = 3 × 2 × 1 = 6 permutations of the set A. These permutations are:

(1 2 3)  (1 3 2)  (2 1 3)  (2 3 1)  (3 1 2)  (3 2 1)    (87)

The previous permutations can also be directly read from the tree diagram 9, from the root of the tree to each of the 6 leaves.
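Scilab provides the perms function, which returns all the permutations of a vector. The following minimal sketch (ours) checks the count of example 2.1; each row of the output matrix is one permutation of {1, 2, 3}.

P = perms ( 1:3 );    // a 6-by-3 matrix : one permutation per row
size ( P , "r" )      // number of permutations : 3! = 6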
In some situations, not all the elements in the set A are involved in the permutation. Assume that j is a positive integer, so that 0 ≤ j ≤ n. A j-permutation is a permutation of a subset of j elements in A. The general counting method used for the previous proposition allows us to count the total number of j-permutations of a given set A.

Proposition 2.3. (j-permutations) Assume that j is a positive integer. The number of j-permutations of a set A of n elements is

(n)_j = n × (n − 1) × ... × (n − j + 1).    (88)

Proof. There are n ways to choose the element at index 1. Once the element at index 1 is chosen, there are n − 1 ways to choose the element at index 2. Finally, there are n − j + 1 ways to choose the element at index j. The total number of j-permutations is therefore n × (n − 1) × ... × (n − j + 1), which concludes the proof.

Notice that the number of j-permutations of n elements and the factorial of n are equal when j = n. Indeed, we have

(n)_n = n × (n − 1) × ... × (n − n + 1) = n!.    (89)

On the other hand, the number of 0-permutations of n elements can be defined to be equal to 1:

(n)_0 = 1.    (90)

Example 2.2 Let us compute the number of 2-permutations of the set A = {1, 2, 3, 4}. By the equation (88), we have (4)_2 = 4 × 3 = 12 2-permutations of the set A. These permutations are:

(1 2)  (1 3)  (1 4)  (2 1)  (2 3)  (2 4)
(3 1)  (3 2)  (3 4)  (4 1)  (4 2)  (4 3)    (91)

We can check that the number of 2-permutations of a set of 4 elements is (4)_2 = 12, which is strictly lower than the number of permutations 4! = 24.
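A more careful Scilab implementation of (n)_j is presented in the section 2.9. Meanwhile, the following minimal sketch, where the function name permutationsnj is ours, is sufficient for small values of n. Notice that, when j = 0, the range n+1:n is empty and prod returns 1, which is consistent with the convention (90).

function p = permutationsnj ( n , j )
    // number of j-permutations : n*(n-1)*...*(n-j+1)
    p = prod ( n-j+1 : n )
endfunction

permutationsnj ( 4 , 2 )    // (4)_2 = 12, as in example 2.2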

2.3 The gamma function

In this section, we present the gamma function, which is closely related to the factorial function. The gamma function was first introduced by the Swiss mathematician Leonhard Euler in his goal to generalize the factorial to non-integer values [18]. Efficient implementations of the factorial function are based on the gamma function, and this is why this function will be analyzed in detail. The practical computation of the factorial function will be analyzed in the next section.

Definition 2.4. (Gamma function) Let x be a real number with x > 0. The gamma function is defined by

Γ(x) = ∫_0^1 (−log(t))^(x−1) dt.    (92)

The previous definition is not the usual form of the gamma function, but the following proposition allows us to recover it.

Proposition 2.5. (Gamma function) Let x be a real number with x > 0. The gamma function satisfies

Γ(x) = ∫_0^∞ t^(x−1) e^(−t) dt.    (93)

Proof. Let us consider the change of variable u = −log(t). Therefore, t = e^(−u), which leads, by differentiation, to dt = −e^(−u) du. We get (−log(t))^(x−1) dt = −u^(x−1) e^(−u) du. Moreover, if t = 0, then u = ∞, and if t = 1, then u = 0. This leads to

Γ(x) = −∫_∞^0 u^(x−1) e^(−u) du.    (94)

For any continuously differentiable function f and any real numbers a and b,

∫_b^a f(x) dx = −∫_a^b f(x) dx.    (95)

We reverse the bounds of the integral in the equality (94) and get the result.
The gamma function satisfies

Γ(1) = ∫_0^∞ e^(−t) dt = [−e^(−t)]_0^∞ = −(0 − e^0) = 1.    (96)

The following proposition makes the link between the gamma and the factorial functions.

Proposition 2.6. (Gamma and factorial) Let x be a real number with x > 0. The gamma function satisfies

Γ(x + 1) = xΓ(x)    (97)

and

Γ(n + 1) = n!    (98)

for any integer n ≥ 0.


Proof. Let us prove the equality (97). We want to compute

Γ(x + 1) = ∫_0^∞ t^x e^(−t) dt.    (99)

The proof is based on the integration by parts formula. For any continuously differentiable functions f and g and any real numbers a and b, we have

∫_a^b f(t)g'(t) dt = [f(t)g(t)]_a^b − ∫_a^b f'(t)g(t) dt.    (100)

Let us define f(t) = t^x and g'(t) = e^(−t). We have f'(t) = xt^(x−1) and g(t) = −e^(−t). By the integration by parts formula (100), the equation (99) becomes

Γ(x + 1) = [−t^x e^(−t)]_0^∞ + ∫_0^∞ xt^(x−1) e^(−t) dt.    (101)

Let us introduce the function h(t) = t^x e^(−t). We have h(0) = 0 and lim_{t→∞} h(t) = 0, for any x > 0. Hence,

Γ(x + 1) = ∫_0^∞ xt^(x−1) e^(−t) dt = xΓ(x),    (102)

which proves the equality (97).
The equality (98) can be proved by induction on n. First, we already noticed that Γ(1) = 1. If we define 0! = 1, we have Γ(1) = 0!, which proves the equality (98) for n = 0. Then, assume that the equality holds for n, and let us prove that Γ(n + 2) = (n + 1)!. By the equality (97), we have Γ(n + 2) = (n + 1)Γ(n + 1) = (n + 1)n! = (n + 1)!, which concludes the proof.
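The equality (98) is easy to check numerically. The following sketch (ours) compares factorial(n) and gamma(n+1) for the first few integers; the two columns coincide for these small values.

n = (0:7)';
[ n factorial(n) gamma(n+1) ]    // the last two columns are identical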

The gamma function is not the only function f which satisfies f(n) = n!. But the Bohr-Mollerup theorem proves that the gamma function is the unique function f which satisfies the equalities f(1) = 1 and f(x + 1) = xf(x), and such that log(f(x)) is convex [2].
It is possible to extend this function to negative values by inverting the equation (97), which implies

Γ(x) = Γ(x + 1)/x,    (103)

for x ∈ ]−1, 0[. This allows us to compute, for example, Γ(−1/2) = −2Γ(1/2). By induction, we can also compute the value of the gamma function for x ∈ ]−2, −1[. Indeed, the equation (103) implies

Γ(x + 1) = Γ(x + 2)/(x + 1),    (104)

which leads to

Γ(x) = Γ(x + 2)/(x(x + 1)).    (105)

By induction on the intervals ]−n − 1, −n[, with n a positive integer, this formula allows us to compute values of the gamma function for all x ≤ 0, except the negative integers 0, −1, −2, .... This leads to the following proposition.
Proposition 2.7. (Gamma function for negative arguments) For any nonzero integer n and any real x such that x + n > 0,

Γ(x) = Γ(x + n)/(x(x + 1)...(x + n − 1)).    (106)

Proof. The proof is by induction on n. The equation (103) proves that the equality is true for n = 1. Assume that the equality (106) is true for n, and let us prove that it also holds for n + 1. By the equation (103) applied to x + n, we have

Γ(x + n) = Γ(x + n + 1)/(x + n).    (107)

Therefore, we have

Γ(x) = Γ(x + n + 1)/(x(x + 1)...(x + n − 1)(x + n)),    (108)

which proves that the statement holds for n + 1 and concludes the proof.
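The formula (106) can be checked numerically. The following sketch (ours) compares gamma(x) with the right-hand side of (106) for x = -1.5 and n = 2, so that x + n = 0.5 > 0.

x = -1.5; n = 2;
gamma ( x )                      // direct evaluation : 2.3632718
gamma ( x+n ) / ( x * (x+1) )    // formula (106) : the same value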
The gamma function is singular for negative integer values of its argument, as stated in the following proposition.

Proposition 2.8. (Gamma function for negative integer arguments) For any nonnegative integer n,

Γ(−n + h) ≈ (−1)^n/(n! h),    (109)

when h is small.

factorial    returns n!
gamma        returns Γ(x)
gammaln      returns log(Γ(x))

Figure 10: Scilab commands for permutations.
Proof. Consider the equation (106) with x = −n + h. We have

Γ(−n + h) = Γ(h)/((h − n)(h − n + 1)...(h − 1)).    (110)

But Γ(h) = Γ(h + 1)/h, which leads to

Γ(−n + h) = Γ(h + 1)/((h − n)(h − n + 1)...(h − 1)h).    (111)

When h is small, the expression Γ(h + 1) converges to Γ(1) = 1. On the other hand, the expression (h − n)(h − n + 1)...(h − 1)h converges to (−n)(−n + 1)...(−1)h = (−1)^n n! h, which leads to the term (−1)^n/(n! h) and concludes the proof.
We have reviewed the main properties of the gamma function. In practical situations, we use the gamma function in order to compute factorial numbers, as we are going to see in the next sections. The main advantage of the gamma function over the factorial is that it avoids forming the product n! = n × (n − 1) × ... × 1, which saves a significant amount of CPU time and computer memory.

2.4 Overview of functions in Scilab

The figure 10 presents the functions provided by Scilab to compute permutations. Notice that there is no function to compute the number of j-permutations (n)_j = n × (n − 1) × ... × (n − j + 1). This is why, in the next sections, we provide a Scilab function to compute (n)_j.
In the next sections, we analyze each function in Scilab. We especially consider their numerical behavior and provide accurate and efficient Scilab functions to manage permutations. We emphasize the need for accuracy and robustness. For this purpose, we use the logarithmic scale to provide intermediate results which stay within the limited bounds of double precision floating point arithmetic.

2.5 The gamma function in Scilab

The gamma function allows us to compute Γ(x) for a real input argument. The mathematical function Γ(x) can be extended to complex arguments, but this has not been implemented in Scilab.
The following script plots the gamma function for x ∈ [−4, 4].

x = linspace ( -4 , 4 , 1001 );
y = gamma ( x );
plot ( x , y );
h = gcf ();
h.children.data_bounds = [
-4. -6
 4.  6
];

The previous script produces the figure 11.

Figure 11: The gamma function.


The following session presents various values of the gamma function.

-->x = [-2 -1 -0 +0 1 2 3 4 5 6]';
-->[x gamma(x)]
 ans =
  - 2.    Nan
  - 1.    Nan
    0.  - Inf
    0.    Inf
    1.    1.
    2.    1.
    3.    2.
    4.    6.
    5.    24.
    6.    120.

Notice that the two floating point signed zeros -0 and +0 are associated with the function values −∞ and +∞. This is consistent with the value of the limit of the function from either side of the singular point. This contrasts with the value of the gamma function at negative integer points, where the function value is %nan. This is consistent with the fact that, at these singular points, the function is equal to −∞ on one side and +∞ on the other side. Therefore, since the argument x has one single floating point representation when it is a negative nonzero integer, the only solution consistent with the IEEE 754 standard is to set the result to %nan.
Notice that we used 1001 points to plot the gamma function. This allows us to get points exactly located at the singular points. These values are ignored by the plot function, which makes a nice plot. Indeed, if 1000 points were used instead, vertical lines corresponding to the y-values immediately at the left and the right of each singularity would be displayed.

2.6 The factorial and log-factorial functions

In the following script, we plot the factorial function for values of n from 1 to 10.

f = factorial ( 1:10 );
plot ( 1:10 , f , "b-o" );

The result is presented in figure 12. We see that the growth rate of the factorial function is large.

Figure 12: The factorial function.

The largest value of n such that n! is representable as a double precision floating point number is n = 170. In the following session, we check that 171! is not representable as a Scilab double.

-->factorial ( 170 )
 ans =
    7.257+306
-->factorial ( 171 )
 ans =
    Inf

The factorial function is implied in many probability computations, sometimes as an intermediate result. Since it grows so fast, we might be interested in computing its order of magnitude instead of its value. Let us introduce the function fln as the logarithm of the factorial number n!:

fln(n) = log(n!).    (112)

Notice that we used the base-e logarithm function log, that is, the reciprocal of the exponential function.
The factorial number n! grows exponentially, but its logarithm grows much more slowly. In the figure 13, we plot the logarithm of n! on the interval [0, 170]. We see that the y coordinate varies only from 0 up to 800. Hence, there is a large number of integers n for which n! is not representable as a double but fln(n) is still representable as a double.

Figure 13: Logarithm of the factorial number.

2.7 Computing factorial and log-factorial with Scilab

In this section, we present how to compute the factorial function in Scilab. We focus in this section on accuracy and efficiency.
The factorial function returns the factorial number associated with the given n. It has the following syntax:

f = factorial ( n )

In the following session, we compute n! for several values of n, from 0 to 7.

-->n = (0:7)';
-->[n factorial(n)]
 ans =
    0.    1.
    1.    1.
    2.    2.
    3.    6.
    4.    24.
    5.    120.
    6.    720.
    7.    5040.

The implementation of the factorial function in Scilab accepts both matrix and hypermatrix input arguments. In order to be fast, it uses vectorization. The following factorialScilab function represents the computational core of the actual implementation of the factorial function in Scilab.

function f = factorialScilab ( n )
    n ( n == 0 ) = 1
    t = cumprod ( 1 : max(n) )
    v = t ( n(:) )
    f = matrix ( v , size(n) )
endfunction

The statement n(n==0)=1 sets all zeros of the matrix n to one, so that the next statements do not have to manage the special case 0! = 1. Then, we use the cumprod function in order to compute a column vector containing cumulated products, up to the maximum entry of n. The use of cumprod allows us to get all the results in one call, but it also produces unnecessary values of the factorial function. In order to get just what is needed, the statement v = t(n(:)) extracts the required values. Finally, the statement f = matrix(v,size(n)) reshapes the matrix of values, so that the shape of the output argument is the same as the shape of the input argument.
The following function computes n! based on the prod function, which computes the product of the entries of its input argument.

function f = factorial_naive ( n )
    f = prod ( 1 : n )
endfunction

The factorial_naive function has two drawbacks. The first one is that it cannot manage matrix input arguments. Furthermore, it requires more memory than necessary.
In practice, the factorial function can be computed based on the gamma function. The following implementation of the factorial function is based on the equality (98).

function f = myfactorial ( n )
    if ( ( or ( n(:) < 0 ) ) | ( n(:) <> round ( n(:) ) ) ) then
        error ( "myfactorial: n must all be nonnegative integers" );
    end
    f = gamma ( n + 1 )
endfunction

The myfactorial function also checks that the input argument n is nonnegative. It also checks that n is an integer by using the condition n(:) <> round(n(:)). Indeed, if the value of n is different from the value of round(n), this means that the input argument n is not an integer.
The main drawback of the factorialScilab function is that it uses more memory than necessary. It may fail to produce a result when it is given a large input argument. In the following session, we use the factorial function with a very large input integer. In this particular case, it is obvious that the correct result is Inf. This should have been the result of the function, which should not have generated an error. On the other side, the myfactorial function works perfectly.

-->factorial ( 1.e10 )
      ! - - error 17
stack size exceeded!

-->myfactorial ( 1.e10 )
 ans =
    Inf

We now consider the computation of the log-factorial function fln. We can use the gammaln function, which directly provides the correct result.

function flog = factoriallog ( n )
    flog = gammaln ( n + 1 )
endfunction

The advantage of this method is that matrix input arguments can be managed by the factoriallog function.
There is another possible implementation of the log-factorial function, based on the logarithm function. We have

log(n!) = log(1 × 2 × 3 × ... × n)    (113)
        = log(1) + log(2) + log(3) + ... + log(n).    (114)

The previous equation can be simplified, since log(1) = 0. This leads to the following implementation.

function flog = factoriallog_lessnaive ( n )
    flog = sum ( log ( 2 : n ) )
endfunction

The previous function has several drawbacks. The first problem is that this function may require a large array if n is large. Hence, using this function is limited to relatively small values of n. Moreover, it requires evaluating the log function at least n − 1 times, which leads to a performance issue. Finally, it is not possible to directly let the variable n be a matrix of doubles. All these issues make factoriallog_lessnaive a less than perfect implementation of the log-factorial function.

2.8 Stirling's formula

In this section, we present Stirling's formula [2, 19], which gives an asymptotic equivalent of the gamma function.
Let us recall the definition of the asymptotic symbol ∼.

Definition 2.9. (Asymptotic) Let {a_n}_{n≥0} and {b_n}_{n≥0} be two real sequences. We say that a_n is asymptotically equal to b_n if

lim_{n→∞} a_n/b_n = 1.    (115)

In this case, we write a_n ∼ b_n.

Stirling's formula gives the asymptotic rate of the gamma function.

Proposition 2.10. (Stirling's formula) For any real number x → ∞,

Γ(x) ∼ e^(−x) x^(x−1/2) √(2π).    (116)

We shall not give the proof of this formula here (see [2, 19] for a complete
derivation).
The previous proposition allows to directly derive an asymptotic behavior for
the factorial function.
Proposition 2.11. (Stirling's formula) For any positive integer n,

    n! ∼ (n/e)^n √(2πn).                                          (117)

Proof. By the proposition 2.10, we have

    n! = Γ(n + 1) ∼ e^(−(n+1)) (n + 1)^((n+1)−1/2) √(2π)          (118)
       = e^(−(n+1)) (n + 1)^(n+1/2) √(2π)                         (119)
       ∼ e^(−(n+1)) (n + 1)^n √(2πn).                             (120)

We can simplify the previous expression for large values of n. Indeed, since
(1 + 1/n)^n → e, we have e^(−(n+1)) (n + 1)^n = e^(−(n+1)) (1 + 1/n)^n n^n ∼ e^(−n) n^n,
which concludes the proof.
In the following script, we compare Stirling's formula and the factorial function
for various values of n. Moreover, we compute the number of significant digits
produced by Stirling's formula, by using the equation d = −log10(|n! − Sn|/n!), where Sn
is Stirling's approximation given by the equation 117.
-->n = (1:20:185)';
-->f = factorial ( n );
-->s = sqrt (2* %pi .* n ).*( n ./ %e ).^ n;
-->d = - log10 ( abs (f - s )./ f );
-->[ n f s d ]
 ans  =
    1.      1.           0.9221370    1.1086689
    21.     5.109D+19    5.089D+19    2.4022947
    41.     3.345D+49    3.338D+49    2.692415
    61.     5.076D+83    5.069D+83    2.8648116
    81.     5.79D+120    5.79D+120    2.9878919
    101.    9.42D+159    9.41D+159    3.0836832
    121.    8.09D+200    8.08D+200    3.1621171
    141.    1.89D+243    1.89D+243    3.2285294
    161.    7.59D+286    7.58D+286    3.2861201
    181.    Inf          Inf          Nan

We see that the number of significant digits increases with n, reaching more than 3
when n is close to its upper limit.
It is not straightforward to prove Stirling's formula. Still, we can easily prove
the following proposition, which focuses on the log-factorial function.
Proposition 2.12. (Log-factorial for large n) For any positive integer n,

    log(n!) ∼ n log(n) − n.                                       (121)

Proof. By the property of the logarithm function, we have

    log(n!) = log(1 · 2 · 3 · · · n)                              (122)
            = log(1) + log(2) + log(3) + . . . + log(n).          (123)

The previous equation can be simplified, since log(1) = 0. The log function is a
nondecreasing function of x for x > 0. Therefore,

    ∫_{k−1}^{k} log(x)dx < log(k) < ∫_{k}^{k+1} log(x)dx,         (124)

for k ≥ 1. We sum the previous inequalities over k = 1, 2, . . . , n and get

    ∫_{0}^{n} log(x)dx < log(n!) < ∫_{1}^{n+1} log(x)dx.          (125)

We must now compute the two integrals which appear in the previous inequalities. We recall that the anti-derivative of the log(x) function is x log(x) − x, since
(x log(x) − x)' = log(x) + x · (1/x) − 1 = log(x). Furthermore, we recall that the limit
of the function x log(x) is zero when x converges to zero. Therefore,

    ∫_{0}^{n} log(x)dx = [x log(x) − x]_{0}^{n}                   (126)
                       = n log(n) − n                             (127)

and

    ∫_{1}^{n+1} log(x)dx = [x log(x) − x]_{1}^{n+1}               (128)
                         = (n + 1) log(n + 1) − (n + 1) + 1       (129)
                         = (n + 1) log(n + 1) − n.                (130)

We plug the two previous results into the equation 125 and get

    n log(n) − n < log(n!) < (n + 1) log(n + 1) − n,              (131)

which concludes the proof.
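As a quick numerical check of the inequalities 131 (a sketch based on the gammaln function), we can compare the two bounds with log(n!) for n = 10.
-->n = 10;
-->[ n*log(n)-n gammaln(n+1) (n+1)*log(n+1)-n ]
 ans  =
    13.025851    15.104413    16.376848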


Stirlings formula 116 is consistent with the asymptotic equation 117, which
allows to directly derive the equation
 n n

log(n!) = log
2n
(132)
e 
 n
n

(133)
= log
+ log( 2n)
en 

= n log
+ log( 2) + log( n)
(134)
e
1
1
= n log(n) n log(e) + log(2) + log(n)
(135)
2
2
1
1
= n log(n) n + log(2) + log(n),
(136)
2
2
since log(e) = 1. This immediately implies the equation 121.
In the following session, we compare the asymptotic equation 121 with the gammaln function. The session displays the column vectors [n f s d], where f is
computed from the gammaln function, s is computed from the asymptotic equation
121 and d is the number of significant digits in the asymptotic equation. We see that,
for n as large as 10^10, we have more than 10 significant digits in the asymptotic
equation.
-->n = logspace (1 ,10 ,10)';
-->f = gammaln ( n +1);
-->s = n .* log ( n ) - n;
-->d = - log10 ( abs (f - s )./ f );
-->[ n f s d ]
 ans  =
    10.          15.104413    13.025851    0.8613409
    100.         363.73938    360.51702    2.0526167
    1000.        5912.1282    5907.7553    3.1309743
    10000.       82108.928    82103.404    4.1721275
    100000.      1051299.2    1051292.5    5.1972489
    1000000.     12815518.    12815511.    6.2141578
    10000000.    1.512D+08    1.512D+08    7.2263182
    1.000D+08    1.742D+09    1.742D+09    8.2354866
    1.000D+09    1.972D+10    1.972D+10    9.2426477
    1.000D+10    2.203D+11    2.203D+11    10.248397

2.9 Computing permutations and log-permutations with Scilab

There is no Scilab function to compute the number of permutations
(n)j = n(n − 1) · · · (n − j + 1).
We might be interested in simplifying the expression for the permutation number.
We have

    (n)j = n(n − 1) · · · (n − j + 1)                             (137)
         = n(n − 1) · · · 1 / ( (n − j)(n − j − 1) · · · 1 )      (138)
         = n! / (n − j)!.                                         (139)

This leads to the following function permutations_verynaive.
function p = permutations_verynaive ( n , j )
    p = factorial ( n )./ factorial ( n - j )
endfunction

In the following session, we see that the previous function works for small values of
n and j.
-->n = [5 5 5 5 5 5]';
-->j = [0 1 2 3 4 5]';
-->[ n j permutations_verynaive (n , j )]
 ans  =
    5.    0.    1.
    5.    1.    5.
    5.    2.    20.
    5.    3.    60.
    5.    4.    120.
    5.    5.    120.

In the following session, we compute the permutation number (171)171 = 171!.
-->permutations_verynaive ( 171 , 171 )
 ans  =
    Inf


This is caused by an overflow during the computation of the factorial function.
There is unfortunately no way to fix this problem, since the result is, indeed, not
representable as a double precision floating point number.
On the other hand, the permutations_verynaive function performs poorly in
cases where n is large, whatever the value of j, as presented in the following session.
-->permutations_verynaive ( 171 , 0 )
 ans  =
    Nan

There is certainly something to do about this problem, since, when j = 0, we have
(n)0 = 1, for any n ≥ 1.
The following permutations_naive function computes (n)j for positive
integer values of n and j. It is based on the prod function, which computes the product
of the given vector.
function p = permutations_naive ( n , j )
p = prod ( n - j +1 : n )
endfunction

In the following session, we check the values of the function (n)j for n = 5 and
j = 0, 1, . . . , 5.
-->n = 5;
-->for j = 0 : 5
-->    p = permutations_naive ( n , j );
-->    disp ([ n j p ]);
-->end
    5.    0.    1.
    5.    1.    5.
    5.    2.    20.
    5.    3.    60.
    5.    4.    120.
    5.    5.    120.

The following session shows that permutations_naive behaves better
than the previous function for small values of j.
--> permutations_naive ( 171 , 0 )
ans =
1.

The permutations_naive function still has several drawbacks. First, it requires
more memory than necessary. For example, it may fail to compute (n)n for
values of n larger than about 10^5.
-->permutations_naive ( 1.e7 , 1.e7 )
 !--error 17
stack size exceeded!

Furthermore, the permutations_naive function does not manage matrix input
arguments.
In order to accurately compute the permutation number, we may compute its
logarithm first. By the equation 139, we have

    log((n)j) = log( n! / (n − j)! )                              (140)
              = log(n!) − log((n − j)!)                           (141)
              = log(Γ(n + 1)) − log(Γ(n − j + 1)).                (142)

The previous equation leads to the definition of the log-permutation function, as
defined in the following function.
function plog = permutationslog ( n , j )
plog = gammaln ( n +1) - gammaln (n - j +1);
endfunction

In order to compute the permutation number, we compute the exponential of this
expression. This leads to the following function permutations, where we round the
result in order to get integer results.
function p = permutations ( n , j )
p = exp ( gammaln ( n +1) - gammaln (n - j +1));
if ( and ( round ( n )== n ) & and ( round ( j )== j ) ) then
p = round ( p )
end
endfunction

The permutations function takes matrix input arguments, as presented in the
following session.
-->n = [5 5 5 5 5 5]';
-->j = [0 1 2 3 4 5]';
-->[ n j permutations (n , j )]
 ans  =
    5.    0.    1.
    5.    1.    5.
    5.    2.    20.
    5.    3.    60.
    5.    4.    120.
    5.    5.    120.

Finally, the permutations function requires the minimum amount of memory
and performs correctly, even for large values of n.
--> permutations ( 1. e7 , 1. e7 )
ans =
Inf
--> permutations ( 1. e7 , 0 )
ans =
1.

2.10 The birthday problem

In this section, we consider a practical computation of a probability, based on permutations. We present in this example the Scilab functions which allow us to perform
the computations.


Assume that n > 0 persons are gathered in a room. Can we compute the
probability that two persons in the room have the same birthday?
To perform this computation, we assume that the year is made of 365 days and
that each day has the same probability of being a birthday.
Let us denote by Ω the sample space, which is the space of all possible
combinations of birthdays for n persons. It is defined by

    Ω = {(i1, . . . , in) / i1, . . . , in = 1, 2, . . . , 365}.  (143)

The searched event E is the event that two persons have the same birthday. To
compute this event more easily, we can compute the probability of the complementary event E^c, i.e., the event that all persons have a distinct birthday. Indeed, once
we have computed P(E^c), we can deduce the searched probability with
P(E) = 1 − P(E^c).
By hypothesis, all days in the year have the same probability, so that we can
apply the proposition 1.9 for uniform discrete distributions. Therefore,
P(E^c) = #(E^c) / #(Ω).
The birthday of each person can be chosen from 365 days. Since the birthdays
of several persons are independent events, this leads to

    #(Ω) = 365^n.                                                 (144)

Let us now compute the size of the complementary event E^c. The birthday of
the first person can be chosen among 365 possible days. Once chosen, the birthday
for the second person can be chosen among 365 − 1 = 364 days, so that the birthdays
are different. By repeating this process for the n persons, we get the following size
for the event E^c:

    #(E^c) = 365 · 364 · · · (365 − n + 1) = (365)n.              (145)

Let us define the probability Q(E) as the probability of the complementary event:

    Q(E) = P(E^c).                                                (146)

We can combine 144 and 145 to compute the complementary probability

    Q(E) = (365)n / 365^n,                                        (147)

which leads to the required probability

    P(E) = 1 − Q(E) = 1 − (365)n / 365^n.                         (148)

The following twobirthday_verynaive function returns the probability that two
persons have the same birthday in a group of n persons, in a period of d days. In
order to compute (365)n, we choose the formula (365)n = 365!/(365 − n)!.
function p = twobirthday_verynaive ( n , d )
    p = 1 - factorial ( d )./ factorial ( d - n ) ./ d ^ n
endfunction


We are going to use the previous function with d = 365. In the following session,
we see that the twobirthday_verynaive function fails to compute any result,
whatever the value of n.
-->n = (1:5)';
-->[ n twobirthday_verynaive (n ,365)]
 ans  =
    1.    Nan
    2.    Nan
    3.    Nan
    4.    Nan
    5.    Nan

Indeed, the formula involves 365!, which cannot be represented as a double.
The following twobirthday_naive function uses the permutations function previously defined.
function p = twobirthday_naive ( n , d )
p = 1 - permutations ( d , n ) ./ d ^ n
endfunction

In the following session, we compute the probability that two persons have
the same birthday for n going from 1 to 5.
-->n = (1:5)';
-->[ n twobirthday_naive (n ,365)]
 ans  =
    1.    0.
    2.    0.0027397260273972490197
    3.    0.0082041658847813447863
    4.    0.0163559124665503263785
    5.    0.0271355736996392593596

We can explore larger values of n and see when the probability p breaks the p =
0.5 threshold. The following Scilab session shows how to compute the required
probability for n = 20, 21, . . . , 25.
-->n = (20:25)';
-->[ n twobirthday_naive (n ,365)]
 ans  =
    20.    0.4114383835806317835093
    21.    0.4436883351652584073221
    22.    0.4756953076625709542213
    23.    0.5072972343239776638057
    24.    0.5383442579144532835755
    25.    0.5686997039695230737877

This shows that if 23 or more persons are gathered, there is a favorable probability
that two persons have the same birthday.
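The threshold can also be located programmatically. The following session is a sketch which searches for the smallest group size n such that the probability exceeds 0.5.
-->n = (1:60)';
-->p = twobirthday_naive ( n , 365 );
-->ind = find ( p > 0.5 );
-->n ( ind (1) )
 ans  =
    23.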

2.11 A modified birthday problem

We now consider a modified birthday problem. We have computed that 23 persons are
sufficient to make a favorable bet that two persons are born the same day. We are now
interested in the event that two persons are born on the same day, at the same hour.
More precisely, we would like to compute the number of persons in the group so that
the probability of having two persons with the same birth day and hour is greater
than 0.5.
We assume that a year is made of 365 days and that a day is made of 24 hours.
This computation should be easy to perform: it suffices to use the twobirthday_naive function with d = 365 · 24. In the following session, we explore the
values of P(E) for n = 20, 21, . . . , 25.
-->n = (20:25)';
-->[ n twobirthday_naive (n ,365*24)]
 ans  =
    20.    0.0214717378650828294440
    21.    0.0237058206494269452236
    22.    0.0260462518925431707473
    23.    0.0284922544755111806225
    24.    0.0310430168113396964813
    25.    0.0336976934779864567560

We see that the probability is much lower than previously. The problem is that,
for larger values of n, the previous function fails, as can be seen in the following
session.
-->twobirthday_naive (100 ,365*24)
 ans  =
    Nan

The problem is that the number of permutations required in the implementation,
(365 · 24)_100, is represented by Inf.
-->permutations ( 365*24 , 100 )
 ans  =
    Inf

The same issue occurs for (365 · 24)^100. This leads to the ratio Inf/Inf, which is
equal to the IEEE Nan. In order to solve this issue, we can use logarithms of the
intermediate results. We have

    log(Q(E)) = log( (365)n / 365^n )                             (149)
              = log((365)n) − log(365^n)                          (150)
              = log((365)n) − n log(365).                         (151)
This leads to the following twobirthday_lessnaive function, which uses the permutationslog function.
function p = twobirthday_lessnaive ( n , d )
    q = exp ( permutationslog ( d , n ) - n * log ( d ))
    p = 1 - q
endfunction

We check in the following session that our less naive function computes the
required probability for n = 100.
-->twobirthday_lessnaive (100 ,365*24)
 ans  =
    0.4329003041011145747063


In order to search for the number of persons which makes the probability greater
than p = 0.5, we perform a while loop. We quit this loop when the probability
breaks the threshold.
n = 1;
while ( %t )
    p = twobirthday_lessnaive ( n , 365*24 );
    if ( p > 0.5 ) then
        mprintf ( "n = %d, p = %e\n" , n , p )
        break
    end
    n = n + 1;
end

The previous script produces the following output.
n = 111, p = 5.033485e-001

We are now interested in computing the probability of getting two persons with
the same birth day and hour in a group of 500 persons. The following session shows
the result of calling the twobirthday_lessnaive function with n = 500.
-->p = twobirthday_lessnaive ( 500 , 365*24 )
 p  =
    0.9999995
This probability is very close to 1. In order to get more significant digits, we use
the format function in the following session.
--> format (25)
-->p
p =
0.9999995054082023715480

This displays a larger number of significant digits. But the result is accurate
to at most 17 significant digits. Since 6 of these digits are 9s, there are
only 17 − 6 = 11 digits available for the required probability p. The reason for this
inaccuracy is that q is very close to zero, which makes p = 1 − q very close to
1. This implies that, because of our way of computing the probability p, we can at
best expect 11 accurate digits for p. We emphasize that this is independent of
the actual accuracy of the intermediate computations and is only caused by the
way of representing the solution of the problem with double precision floating point
numbers.
One possible solution is to compute q instead of p. Indeed, the q value is close
to zero, where floating point numbers can be represented with limited but sufficient
accuracy. The following twobirthday function allows to compute both p and q.
function [ p , q ] = twobirthday ( n , d )
q = exp ( permutationslog ( d , n ) - n * log ( d ))
p = 1 - q
endfunction

In the following session, we display the result of the computation and use the
mprintf function with the %.17e format in order to display 17 digits after the
decimal point.
-->[p , q ] = twobirthday ( 500 , 365*24 );
-->mprintf ( "p = %.17e\n" , p )
p = 9.99999505408202370e-001
-->mprintf ( "q = %.17e\n" , q )
q = 4.94591797644640870e-007

We now have all the available digits for q and we know that the probability of having
two persons with the same birth day and hour in a group of n = 500 persons is close
to p = 1 − 4.94591797644640870 × 10^−7.
But this does not imply that all these digits are exact. Indeed, floating
point evaluations of elementary operators like +, -, *, / and of elementary and special
functions like exp, log and gamma are associated with rounding errors and various
approximations.
We have computed the probability for n = 500 with the symbolic computation
system Wolfram Alpha [16], using the expression:
(365*24)!/(365*24 - 500)!/((365*24)^500)
We found that the exact probability is in this case
z = 4.94591797640207720e-7

rounded to 17 digits. In the following session, we compute the relative error between
the computed and the exact probabilities.
-->z = 4.94591797640207720e-7
 z  =
    0.0000005
-->abs (q - z )/ z
 ans  =
    8.963D-12
-->-log10 ( abs (q - z )/ z )
 ans  =
    11.047534

We see that there are approximately 11 accurate digits in this case, which is sufficiently accurate for our purpose.

2.12 Combinations

In this section, we present combinations, which are unordered subsets of a given set.
The number of distinct subsets with j elements which can be chosen from a set
A with n elements is the binomial coefficient, which we denote here by C(n, j). The following
proposition gives an explicit formula for the binomial number.
Proposition 2.13. (Binomial) The number of distinct subsets with j elements
which can be chosen from a set A with n elements is the binomial coefficient and is
defined by

    C(n, j) = n(n − 1) · · · (n − j + 1) / (1 · 2 · · · j).       (152)

The following proof is based on the fact that subsets are unordered, while permutations are based on the order.
The following proof is based on the fact that subsets are unordered, while permutations are based on the order.

Proof. Assume that the set A has n elements and consider subsets with j > 0
elements. By the proposition 2.3, the number of j-permutations of the set A is
(n)j = n(n − 1) · · · (n − j + 1). Notice that the order does not matter in creating the subsets,
so that the number of subsets is lower than the number of permutations. This is
why each subset is associated with one or more permutations. By the proposition 2.2,
there are j! ways to order a set with j elements. Therefore, the number of subsets
with j elements is given by C(n, j) = n(n − 1) · · · (n − j + 1) / (1 · 2 · · · j),
which concludes the proof.
The expression for the binomial coefficient can be simplified if we use the number
of j-permutations and the factorial number, which leads to

    C(n, j) = (n)j / j!.                                          (153)

The equality (n)j = n!/(n − j)! leads to

    C(n, j) = n! / ((n − j)! j!).                                 (154)

This immediately leads to

    C(n, j) = C(n, n − j).                                        (155)

The following proposition shows a recurrence relation for binomial coefficients.
Proposition 2.14. For integers n > 0 and 0 < j < n, the binomial coefficients
satisfy

    C(n, j) = C(n − 1, j) + C(n − 1, j − 1).                      (156)

The proof of the recurrence relation of the proposition 2.14 is given as an exercise.
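The recurrence 156 can also be checked numerically. The following session is a sketch which uses the nchoosek function defined in the next section to verify one instance of the identity.
-->nchoosek ( 7 , 3 ) == nchoosek ( 6 , 3 ) + nchoosek ( 6 , 2 )
 ans  =
  T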

2.13 Computing combinations and log-combinations with Scilab

In this section, we show how to compute combinations with Scilab.
There is no Scilab function to compute the binomial number C(n, j). In order
to compute the required combinations, we will use the gamma function. By the
equation 154, we have

    log(C(n, j)) = log(n!) − log((n − j)!) − log(j!)              (157)
                 = log(Γ(n + 1)) − log(Γ(n − j + 1)) − log(Γ(j + 1)).  (158)

The following Scilab function performs the computation of the binomial number for
positive values of n and j.

function c = nchoosek ( n , j )
    c = exp ( gammaln ( n +1) - gammaln ( j +1) - gammaln (n - j +1))
    if ( and ( round ( n )== n ) & and ( round ( j )== j ) ) then
        c = round ( c )
    end
endfunction

In the following session, we compute the value of the binomial coefficients for
n = 0, 1, . . . , 5. The values in this table are known as Pascal's triangle.
-->for n = 0:5
-->    for j = 0: n
-->        c = nchoosek ( n , j );
-->        mprintf ( "%2d " , c );
-->    end
-->    mprintf ( "\n" );
-->end
 1
 1  1
 1  2  1
 1  3  3  1
 1  4  6  4  1
 1  5 10 10  5  1

We now explain why we chose to use the exp and gammaln functions to perform the
computation in the nchoosek function. Indeed, we could have used a more naive
method, based on the prod function, as in the following example:
function c = nchoosek_naive ( n , j )
c = prod ( n : -1 : n - j +1 )/ prod (1: j )
endfunction

For small integer values of n, the two previous functions produce the same result.
Unfortunately, even for moderate values of n, the naive method fails. In the following
session, we compute the value of C(n, j) with n = 10000 and j = 134.
-->nchoosek ( 10000 , 134 )
 ans  =
    2.050+307
-->nchoosek_naive ( 10000 , 134 )
 ans  =
    Inf

The naive computation fails because the products involved in the intermediate
variables of the naive method generate an overflow. This means that these values
are too large to be stored in a double precision floating point variable. This is a
pity, since the final result can be stored in a double precision floating point variable.
The nchoosek function, on the other hand, first computes the logarithm of the
factorial numbers. This logarithm cannot overflow, because if x is a double precision
floating point number, then log(x) can always be represented, since its exponent is
always smaller than the exponent of x. In the end, the combination of the exp and
gammaln functions allows us to compute the result accurately, in the sense that, if
the result is representable as a double precision floating point number, then nchoosek
will produce a result as accurate as possible.

Notice that we use the round function in our implementation of the nchoosek
function. This is because the nchoosek function in fact manages real double precision
floating point input arguments. Consider the example where n = 4 and j = 1
and let us compute the associated binomial number C(n, j). In the following Scilab
session, we use the format function so that we display at least 15 significant digits.
-->format (20);
-->n = 4;
-->j = 1;
-->c = exp ( gammaln ( n +1) - gammaln ( j +1) - gammaln (n - j +1))
 c  =
    3.9999999999999822

We see that there are 15 significant digits, which is the best that can be expected
from the exp and gammaln functions. But the result is not an integer anymore:
it is very close to the integer 4, but not exactly equal to it. This is why, in
the nchoosek function, if n and j are both integers, we round the number c to the
nearest integer with a call to the round function.
Finally, notice that our implementation of the nchoosek function uses the and
function. This allows us to use arrays of integers as input variables. In the following
session, we compute C(5, j), for j = 0, 1, . . . , 5, in one single call. This is a consequence
of the fact that the exp and gammaln functions both accept matrix input arguments.
-->n = 5 * ones (6 ,1);
-->j = (0:5)';
-->c = nchoosek ( n , j );
-->[ n j c ]
 ans  =
    5.    0.    1.
    5.    1.    5.
    5.    2.    10.
    5.    3.    10.
    5.    4.    5.
    5.    5.    1.

It appears, as we will see later in this document, that the number of combinations
appears in several probability computations. For example, this function is used as
an intermediate computation in the hypergeometric distribution function which will
be presented later in this document. We will see that the numerical issue associated
with the use of floating point numbers is solved by the use of the logarithm of the
number of combinations, and this is why we now focus on this computation.
Let us introduce the function clog as the logarithm of the binomial number C(n, j):

    clog(n, j) = log( C(n, j) ).                                  (159)

The functions fln and clog are related by the equation

    C(n, j) = n! / ((n − j)! j!),                                 (160)

Figure 14: Cards of a 52 cards deck - J stands for Jack, Q stands for Queen and
K stands for King.
Name               Description
no pair            none of the below combinations
pair               two cards of the same rank
double pair        two times two cards of the same rank
three of a kind    three cards of the same rank
straight           five cards in a sequence, not all of the same suit
flush              five cards in a single suit
full house         one pair and one triple, each of the same rank
four of a kind     four cards of the same rank
straight flush     five cards in a sequence in a single suit
royal flush        10, J, Q, K, 1 in a single suit

Figure 15: Winning combinations at Poker.


which implies

    clog(n, j) = log(n!) − log((n − j)!) − log(j!).               (161)

The following nchooseklog function computes clog(n, j) from the equation 161.
function c = nchooseklog ( n , k )
c = gammaln ( n + 1 ) - gammaln ( k + 1) - gammaln ( n - k + 1)
endfunction

2.14 The poker game

In the following example, we use Scilab to compute the probabilities of poker hands.
The poker game is based on a 52 cards deck, which is presented in figure 14.
Each card has one of the 13 available ranks, from 1 to K, and one of the 4
available suits: clubs, diamonds, hearts and spades. Each player receives 5 cards randomly chosen in
the deck. Each player tries to combine the cards to form one of the well-known combinations
presented in figure 15. Depending on the combination, the player
can beat, or be defeated by, another player. The winning combination is the rarest,
that is, the one which has the lowest probability. In figure 15, the combinations are
presented in decreasing order of probability.
Even if winning at this game requires some understanding of human psychology,
understanding probabilities can help. Why does the four of a kind beat the full
house?

To answer this question, we will compute the probability of each event. Since the
order of the cards can be changed by the player, we are interested in combinations
(and not in permutations). We make the assumption that the process of choosing
the cards is really random, so that all combinations of 5 cards have the same probability, i.e. the distribution function is uniform. Since the order of the cards does
not matter, the sample space Ω is the set of all combinations of 5 cards chosen from
52 cards. Therefore, the size of Ω is

    #(Ω) = C(52, 5) = 2598960.                                    (162)

The probability of a four of a kind is computed as follows. In a 52 cards deck,
there are 13 different four of a kind combinations. Since the 5-th card is chosen at
random from the 48 remaining cards, there are 13 · 48 different four of a kind hands. The
probability of a four of a kind is therefore

    P(four of a kind) = 13 · 48 / C(52, 5) = 624/2598960 ≈ 0.0002401.  (163)

The probability of a full house is computed as follows. There are 13 different
ranks in the deck and, once a rank is chosen, there are C(4, 2) different pairs for one
rank. Therefore, the total number of pairs is 13 · C(4, 2). Once the pair is set, there are
12 different ranks to choose for the triple and there are C(4, 3) different triples for one
rank. The total number of full houses is therefore 13 · C(4, 2) · 12 · C(4, 3). Notice that the
triple can be chosen first, and the pair second, but this would lead to exactly the
same result. Therefore, the probability of a full house is

    P(full house) = 13 · 12 · C(4, 2) · C(4, 3) / C(52, 5) = 3744/2598960 ≈ 0.0014406.  (164)

The computation of all the probabilities of the winning combinations is given as
an exercise.
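The previous computations are easy to reproduce with the nchoosek function defined in section 2.13. The following session is a sketch which evaluates the equations 163 and 164.
-->p4 = 13 * 48 / nchoosek ( 52 , 5 )
 p4  =
    0.0002401
-->pfh = 13 * 12 * nchoosek ( 4 , 2 ) * nchoosek ( 4 , 3 ) / nchoosek ( 52 , 5 )
 pfh  =
    0.0014406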

2.15 Bernoulli trials

In this section, we present Bernoulli trials and the binomial discrete distribution
function. We give the example of a coin tossed several times as an example of such
a process.
Definition 2.15. A Bernoulli trials process is a sequence of n > 0 experiments with
the following rules.

Figure 16: A Bernoulli process with 3 trials. The letter S indicates success and
the letter F indicates failure.
1. Each experiment has two possible outcomes, which we may call success and
failure.
2. The probability p ∈ [0, 1] of success of each experiment is the same for each
experiment.
In a Bernoulli process, the probability p of success is not changed by any knowledge of previous outcomes. For each experiment, the probability q of failure is
q = 1 − p.
It is possible to represent a Bernoulli process with a tree diagram, such as the one in
figure 16.
A complete experiment is a sequence of successes and failures, which can be represented by a sequence of S's and F's. Therefore, the size of the sample space is
#(Ω) = 2^n, which is equal to 2^3 = 8 in our particular case of 3 trials.
By definition, the result of each trial is independent from the results of the previous
trials. Therefore, the probability of an event is the product of the probabilities of
each outcome.
Consider the outcome x = SFS, for example. The value of the distribution
function f for this outcome is

    f(x = SFS) = p · q · p = p^2 q.                               (165)

The table in figure 17 presents the value of the distribution function for each outcome x ∈ Ω.
We can check that the sum of the probabilities of all events is equal to 1. Indeed,

    Σ_{i=1,8} f(xi) = p^3 + p^2 q + p^2 q + pq^2 + p^2 q + pq^2 + pq^2 + q^3   (166)
                    = p^3 + 3p^2 q + 3pq^2 + q^3                  (167)
                    = (p + q)^3                                   (168)
                    = 1.                                          (169)
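This identity can also be checked numerically for a particular value of p. The following session is a sketch with p = 0.3.
-->p = 0.3; q = 1 - p;
-->f = [p^3 p^2*q p^2*q p*q^2 p^2*q p*q^2 p*q^2 q^3];
-->sum ( f )
 ans  =
    1.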

We denote by b(n, p, j) the probability that, in n Bernoulli trials with success
probability p, there are exactly j successes.

x      f(x)
SSS    p^3
SSF    p^2 q
SFS    p^2 q
SFF    p q^2
FSS    p^2 q
FSF    p q^2
FFS    p q^2
FFF    q^3

Figure 17: Probabilities of a Bernoulli process with 3 trials.


In the particular case where there are n = 3 trials, the figures 16 and 17 give the
following results:

    b(3, p, 3) = p^3                                              (170)
    b(3, p, 2) = 3p^2 q                                           (171)
    b(3, p, 1) = 3pq^2                                            (172)
    b(3, p, 0) = q^3                                              (173)

The following proposition extends the previous analysis to the general case.
Proposition 2.16. (Binomial probability) In a Bernoulli process with n > 0 trials
with success probability p ∈ [0, 1], the probability of exactly j successes is

    b(n, p, j) = C(n, j) p^j q^(n−j),                             (174)

where 0 ≤ j ≤ n and q = 1 − p.
Proof. We denote by A the event that one process is associated with exactly j
successes. By definition, the probability of the event A is

    b(n, p, j) = P(A) = Σ_{x∈A} f(x).                             (175)

Assume that an outcome x is associated with exactly j successes. Since there
are n trials, the number of failures is n − j. That means that the value of the
distribution function of this outcome x ∈ A is f(x) = p^j q^(n−j). Since all the outcomes
x in the set A have the same distribution function value, we have

    b(n, p, j) = #(A) p^j q^(n−j).                                (176)

The size of the set A is the number of subsets with j elements in a set of size n. Indeed,
the order does not matter, since we only require that, during the whole process, the
total number of successes is exactly j, regardless of the order of the successes and
failures. The number of outcomes with exactly j successes is therefore #(A) = C(n, j),
which, combined with the equation 176, concludes the proof.

Example 2.3 A fair coin is tossed six times. What is the probability that exactly 3
heads turn up? This process is a Bernoulli process with n = 6 trials. Since the coin
is fair, the probability of success at each trial is p = 1/2. We can apply the proposition
2.16 with j = 3 and get

    b(6, 1/2, 3) = C(6, 3) (1/2)^3 (1/2)^3 = 0.3125,              (177)

so that the probability of having exactly 3 heads is 0.3125.

2.16 Computing the binomial distribution

In this section, we present the computation of the binomial distribution function.
As we are going to see, there are numerical issues, so that computing the binomial
distribution function is not as easy as it seems.
The following binopdf_naive function is a naive implementation of the equation
174. It takes as input arguments the number of trials n in the Bernoulli process, the
probability of success of one trial pb and the number of successes x. It returns the
probability of getting exactly x successes in the Bernoulli experiment.
function p = binopdf_naive ( x , n , pb )
qb = 1 - pb
p = nchoosek (n , x ) * pb ^ x * qb ^( n - x )
endfunction

We can check that our implementation is correct against two simple examples.
Example 2.4 A fair coin is tossed six times. What is the probability that exactly
3 heads turn up?
-->n = 6; pb = 0.5;
-->p = binopdf_naive ( 3 , n , pb )
 p  =
    0.3125

Example 2.5 Assume that we work in a factory producing n = 200 puffins each
day. The probability of producing a defective puffin is 2%. What is the probability
that exactly 0 defective puffins are produced?
-->n = 200; pb = 2/100;
-->p = binopdf_naive ( 0 , n , pb )
 p  =
    0.0175879

We can now consider an example where the success of each Bernoulli
trial is far more likely. Assume that the probability of success is p = 1 − 10^−20, i.e.
there is an extremely strong probability of success. Assume that we make n = 100
trials. What is the probability of having 90 successes?
-->n = 100; pb = 1 - 1. e -20;
--> binopdf_naive ( 90 , n , pb )
ans =
0.


In order to check our computation, we can use Wolfram Alpha [16], with the expression:
nchoosek(100,90) * (1-10^-20)^90 * (10^-20)^(100-90)
The exact result is

    p = 1.73103094564399999... × 10^−187.                         (178)

This number is small, but it is representable by the double precision floating point
numbers used in Scilab. The reason for the failure of the naive implementation is
that the probability of success is so close to 1 that it has been rounded to one
by Scilab, as shown in the following session.
-->format ( "e" , 25 )
-->pb = 1 - 1.e-20
 pb  =
    1.000000000000000000D+00

Hence the probability of failure qb is represented by the floating point number zero,
leading to an inaccurate computation. In fact, any probability of failure smaller than the
machine epsilon ε ≈ 10^−16 would lead to the same issue.
The following binopdf_lessnaive function is a more accurate implementation
of the binomial distribution function. Instead of computing the complementary
probability qb from pb, it takes it directly as an input argument.
function p = binopdf_lessnaive ( x , n , pb , qb )
p = nchoosek (n , x ) .* pb ^ x .* qb ^( n - x )
endfunction

The operator .* used in the binopdf_lessnaive function allows it to return a
result when x is a vector. In the following session, we compute the same probability
as before and show that the current implementation is accurate, with 15 significant
digits (i.e. the maximum possible precision).
-->n = 100; pb = 1 - 1. e -20; qb = 1. e -20;
--> binopdf_lessnaive ( 90 , n , pb , qb )
ans =
1.731030945643998948 -187

In the previous example, there is an obvious difference between the naive and the
accurate implementations. But there are cases where the difference between the two
functions is less obvious, leading to the false feeling that the naive implementation
is accurate. This happens in particular when the complementary probability q is
close to zero, but larger than the machine precision, that is, larger than ε ≈ 10^−16 in
the context of Scilab. In this case, the computed value of qb=1-pb is nonzero, but
does not have full accuracy: its digits are mainly driven by the rounding errors.
There is still something wrong with our less naive implementation. Indeed, consider the following case, where we use a probability pb close to one and a particularly
chosen value x.
-->n = 1.e9; pb = 1 - 1.e-14; qb = 1.e-14;
-->x = 1.e9 - 48
 x  =
    999999952.
-->binopdf_lessnaive ( x , n , pb , qb )
 ans  =
    Nan

We can compute the exact result with Wolfram Alpha and get
e = 8.05538642991600721e-302
The previous number is close to the lower limit of double precision normalized floating point numbers. We can analyze what happens here by computing the
intermediate terms which appear in the computation, as in the following session.
-->nchoosek (n , x )
 ans  =
    Inf
-->pb ^ x
 ans  =
    0.9999900080431765037048
-->qb ^( n - x )
 ans  =
    0.

Since the number of combinations is extremely large, it is represented by the floating
point number Inf. Moreover, the term q^(n−x) is extremely small, and this is why it
is represented by zero. The product Inf * 0 generates the IEEE value Nan, which
stands for Not A Number.
A solution to this issue is to use the logarithm of the number of combinations.
We consider the logarithm of the equation 174 and get

    log(b(n, p, j)) = log(C(n, j)) + j log(p) + (n − j) log(q).   (179)
This leads to the following implementation of the binopdf function.
function p = binopdf ( x , n , pb , qb )
plog = nchooseklog (n , x ) + x .* log ( pb ) + (n - x ) * log ( qb )
p = exp ( plog )
endfunction

In the following session, we check that our implementation gives an accurate result.
-->n = 1. e9 ; pb = 1 - 1. e -14; qb = 1. e -14;
-->x = 1. e9 - 48
x =
999999952.
--> binopdf ( x , n , pb , qb )
ans =
8.055383937519171497 -302

By looking more closely at the previous result, we see that the order of magnitude
is good, but that not all digits are exact. In the following session, we compute the
number of decimal significant digits from the formula d = −log10(|c − e|/e), where
e is the exact result and c is the computed result.
-->d = - log10 ( abs (p - e )/ e )
 d  =
    6.5094691877632762100347


We see that the accurate implementation has about 6 significant digits, which is
far less than the maximum achievable precision. Indeed, the maximum achievable
precision is 15 significant digits for Scilab doubles. In order to measure the sensitivity of the output arguments depending on the input arguments, we compute the
condition number of the binomial distribution function for this particular value of x.
In the following session, we compute the probability p1 of a slightly modified input
argument x + 2*%eps*x. Then we compute the relative difference of the input
rx and the relative difference of the output ry. The condition number is defined as
the ratio between these two relative differences.
-->p1 = binopdf ( x + 2 * %eps * x , n , pb , qb )
 p1  =
    8.055461212394521478-302
-->rx = 2* %eps
 rx  =
    4.440892098500626162D-16
-->ry = abs ( p1 - p )/ p
 ry  =
    9.817532328886864651D-07
-->c = ry / rx
 c  =
    2.210711746903634071D+09

We see that the condition number is close to 10^10. This means that a relatively
small variation of the input argument x generates a relatively large change in the
output probability p. In our particular case, a very small variation of x can make
the probability vary suddenly from values close to zero to values close to 1.
Hence, the computation is numerically difficult, and this explains why the accuracy of the binopdf function is sometimes less than maximal. We emphasize
that this is not a problem which is specific to the binopdf function: it is, indeed, a
problem which comes from the behaviour of the function itself, which varies greatly
for particular values of the input argument x.

2.17 The hypergeometric distribution

In this section, we present the hypergeometric distribution function.


Consider an urn containing m > 1 balls. Assume that this urn contains k red
balls and m − k blue balls. From this urn, draw n balls without replacement. What
is the probability of having x red balls? This distribution function is called the
hypergeometric distribution function. We assume here that 1 ≤ k ≤ m, 1 ≤ n ≤ m and
1 ≤ x ≤ n. We denote by P(X = x) = h(x, m, k, n) this distribution function.
The sample space Ω is made of all the possible choices of n balls from a set of m
balls. There are #(Ω) = C(m, n) ways to perform this choice.
There are C(k, x) ways to select x balls from the set of k red balls and there are
C(m − k, n − x) ways to select n − x balls from the set of m − k blue balls.

Therefore, the probability of selecting x red balls is given by the hypergeometric
distribution function, defined by

    P(X = x) = h(x, m, k, n) = C(k, x) C(m − k, n − x) / C(m, n).  (180)

The actual computation of this distribution function is not straightforward, as
we are going to see in the next section.
An example of application of this distribution function is given in the exercise 2.9,
which considers the computation of the probability of the prediction of earthquakes
by chance. Another example of this distribution function is given in the next
section.

2.18 Computing the hypergeometric distribution with Scilab

In this section, we consider the computation of the hypergeometric distribution
function in Scilab.
In the following function, we use the nchoosek function and implement a naive
formula for the hypergeometric distribution function.
function p = hygepdf_naive ( x , m , k , n )
p = nchoosek (k , x ) * nchoosek (m -k ,n - x ) / nchoosek (m , n )
endfunction

We can check that our implementation is correct against a simple
example.
Example 2.6 Assume that we have a collection of m = 100 puffins and that k = 30
of them are defective. Assume that we select n = 10 puffins at random. What is
the probability of having exactly x = 5 defective puffins? The following session
computes this probability by using the hygepdf_naive function.
computes this probability by using the hygepdf_naive function.
-->m = 100; k = 30; n = 10;
--> hygepdf_naive ( 5 , m , k , n )
ans =
0.0996373

We can see that the naive implementation fails to compute h(x = 200, m =
1030, k = 500, n = 515), whose exact value is 1.65570 × 10^−10. This example is
used by Yalta in [21] to prove that Excel 2007 is inaccurate with respect to the
hypergeometric function.
-->m =1030; k =500; n =515;
--> hygepdf_naive ( 200 , m , k , n )
ans =
0.

The reason for this failure is that some intermediate term involved in the
computation of h(x, m, k, n) is too large to be represented in a double precision
floating point number. Indeed, the computation of h(x = 200, m = 1030, k =
500, n = 515) involves the computation of the intermediate terms C(500, 200),
C(530, 315) and C(1030, 515). The following session shows that the last term is
represented as the Infinity number of the IEEE-754 standard.
-->nchoosek (1030 ,515)
 ans  =
    Inf

We can solve the problem by first computing the logarithm of the probability
and then exponentiating the result. Let us introduce the function hlog defined by

    hlog(x, m, k, n) = log( h(x, m, k, n) )                       (181)
                     = log(C(k, x)) + log(C(m − k, n − x)) − log(C(m, n)).  (182)

Hence, if we are able to compute accurately the logarithm of the hypergeometric
distribution, we can easily compute the required probability by the equation

    h(x, m, k, n) = exp( hlog(x, m, k, n) ).                      (183)

The computation of the log-combination function, i.e. the function
clog(n, k) = log(C(n, k)), has been presented in the section 2.13.
The following hygepdf function finally computes the hypergeometric function
from the equations 182 and 183.
function p = hygepdf ( x , m , k , n )
    c1 = nchooseklog ( k , x )
    c2 = nchooseklog ( m - k , n - x )
    c3 = nchooseklog ( m , n )
    p_log = c1 + c2 - c3
    p = exp ( p_log )
endfunction

The previous implementation is a Scilab port of a Matlab implementation due to
John Burkardt.
The following Scilab session shows that the previous implementation leads to an
accurate floating point result, as opposed to the naive implementation.
-->m =1030; k =500; n =515;
-->hygepdf ( 200 , m , k , n )
 ans  =
    1.656D-10
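As an additional check (a sketch, relying on the fact that the gammaln function accepts matrix input arguments), the probabilities h(x, m, k, n) should sum to one over all possible values of x.
-->m = 100; k = 30; n = 10;
-->sum ( hygepdf ( 0:10 , m , k , n ) )
 ans  =
    1.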

2.19 Notes and references

The combinatorics topics presented in section 2 can be found in [7], chapter 3, Combinatorics. The example of poker hands is presented in [7], while the probabilities of
all poker hands can be found in Wikipedia [20].
Parts of the section 2.8, dedicated to Stirling's formula, are based on [7], chapter
3, Combinatorics.
There is some confusion related to the hypergeometric distribution function,
since many texts use various symbols for the distribution function, its parameters

and the order of these parameters. The notation for the hypergeometric distribution
function h presented in section 2.17 is taken from Matlab. The letters chosen for
the parameters and the order of the arguments are the ones chosen in Matlab.
In an earlier version of this document, we considered [7], section 5.1, Important
Distributions. Indeed, Grinstead and Snell chose to denote the number of items
(or balls) in the urn by the capital letter N, which may lead to bugs in Scilab
implementations, because of the variable n denoting the number of samples (the number
of balls selected). Also, in Matlab, the total number of balls in the urn (i.e. m)
comes as the first parameter of the distribution function, while it is the last one in
Grinstead and Snell. For practical reasons, we chose to keep Matlab's convention.
The gamma function presented in section 2.3 is covered in many textbooks, as in
[1]. An in-depth presentation of the gamma function is done in [18].

2.20 Exercises

Exercise 2.1 (Recurrence relation of binomial) Prove the proposition 2.14, which is the
following. For integers n > 0 and 0 < j < n, the binomial coefficients satisfy

    C(n, j) = C(n − 1, j) + C(n − 1, j − 1).                      (184)
Exercise 2.2 (Number of subsets) Assume that Ω is a finite set with n ≥ 0 elements. Prove
that there are 2^n subsets of Ω. Consider that both ∅ and Ω are counted as subsets of Ω.
Exercise 2.3 (Probabilities of Poker hands) Why does computing the probability of a straight
flush force us to take into account the probability of the royal flush? Explain other possible conflicts
between Poker hands. Compute the probabilities of all Poker hands in figure 15.
This exercise is partly given in [7], in section 3.2, Combinations.
Exercise 2.4 (Bernoulli trial for a die experiment) A die is rolled n = 4 times. What is the
probability that we obtain exactly one 6 ? What is the probability for n = 1, 2, . . . , 12 ?
Exercise 2.5 (Probability of a flight crash) Assume that there are 20 000 airplane flights
each day in the world. Assume that there is one accident every 500 000 flights. What is the
probability of getting exactly 5 crashes in 22 days? What is the probability of getting at least 5
crashes in 22 days? What is the probability of getting exactly 3 crashes in 42 days? What is the
probability of getting at least 3 crashes in 42 days? Consider now that the year is a sequence of
16 periods of 22 days (ignoring the 13 days left in the year). What is the probability of having one
period in the year which contains at least 5 crashes?
This exercise is presented in La loi des séries noires, by Janvresse and de la Rue [9].
Exercise 2.6 (Binomial function maximum) Consider the discrete distribution function of a
Bernoulli process, as defined by the proposition 2.16. Show that

    b(n, p, j) = (p/q) ((n − j + 1)/j) b(n, p, j − 1),            (185)

for j ≥ 1. Compute jm ≥ 1 so that b(n, p, j) is maximum, i.e. so that

    b(n, p, jm) ≥ b(n, p, j),    1 ≤ j ≤ n.                       (186)

Consider the experiment presented in section 3.4, which consists in tossing a coin 10 times and
counting the number of heads. With a Scilab simulation, can you compute the number of
heads which is the most likely to occur?
This exercise is given in [7], in the exercise part of section 3.2, Combinations.


Exercise 2.7 (Binomial coefficients and Pascal's triangle) This exercise is given in [7], in
chapter 3, Combinatorics. Let a, b be two real numbers and let n be a positive integer. Prove
the binomial theorem, which states that

    (a + b)^n = Σ_{j=0,n} C(n, j) a^j b^(n−j).                    (187)

The binomial coefficients C(n, j) can be written in a triangle, where each line corresponds to n and
each row corresponds to j, as in the following array

    A = 1.
        1. 1.
        1. 2. 1.
        1. 3. 3. 1.
        1. 4. 6. 4. 1.                                            (188)

Use the binomial theorem in order to prove that the sum of the terms in the n-th row is 2^n. Prove
that if the terms are added with alternating signs, then the sum is zero.
Binomial coefficients can also be represented in a matrix called Pascal's matrix, where the
binomial coefficients are stored in the diagonals of the matrix.

    A = 1. 1.  1.  1.  1.
        1. 2.  3.  4.  5.
        1. 3.  6. 10. 15.
        1. 4. 10. 20. 35.
        1. 5. 15. 35. 70.                                         (189)

Design a Scilab script to compute Pascal's triangle and Pascal's matrix and check that you find
the same results as presented in 188 and 189.
Exercise 2.8 (Binomial identity) Prove the following binomial identity

    C(2n, n) = Σ_{j=0,n} C(n, j)^2.                               (190)

To help prove this result, consider a set with 2n elements, where n elements are red and n
elements are blue. Compute the number of ways to choose n elements in this set.
This exercise is given in [7], in chapter 3, Combinatorics.
Exercise 2.9 (Earthquakes and predictions) Assume that a person predicts the dates of
major earthquakes (with magnitude larger than 6.5 or with a large number of deaths, etc...) in
the world during 3 years, i.e. in a period of 1096 days. Assume that the specialist predicts 169
earthquakes. Assume that, during the same period, 196 major earthquakes really occur, so that 33
earthquakes were correctly predicted by the specialist. What is the probability that earthquakes
are predicted by chance ?
This exercise is presented by Charpak and Broch in [4].
Exercise 2.10 (Log-factorial function) There is another possible implementation of the log-factorial function. Indeed, we have, by definition,

    n! = 1 · 2 · · · n,                                           (191)

which implies

    fln(n) = log(n!) = log(1 · 2 · · · n)                         (192)
           = Σ_{i=1,n} log(i).                                    (193)

Propose an implementation of the log-factorial function based on the formula 193.

3 Simulation of random processes with Scilab

In this section, we present how to simulate random events with Scilab. The problem
of generating random numbers is more complex and will not be detailed in this
chapter. We begin with a brief overview of random number generation and detail
the random number generator used in the rand function. Then we analyze how to
generate random numbers in the interval [0, 1] with the rand function. We present
how to generate random integers in a given interval [0, m − 1] or [m1, m2]. In the
final part, we present a practical simulation of a game based on tossing a coin.

3.1 Overview

In this section, we present a special class of random number generators, so that we
can have a general representation of what this exactly means.
The goal of a uniform random number generator is to generate a sequence of real
values un ∈ [0, 1] for n ≥ 0. Most uniform random number generators are based on
the fraction

    un = xn / m,                                                  (194)

where m is a large integer and xn is a positive integer so that 0 < xn < m. In many
random number generators, the integer xn+1 is computed from the previous element
in the sequence, xn.
The linear congruential generators [10] are based on the sequence

    xn+1 = (a xn + c) (mod m),                                    (195)

where
m is the modulus, satisfying m > 0,
a is the multiplier, satisfying 0 ≤ a < m,
c is the increment, satisfying 0 ≤ c < m,
x0 is the starting value, satisfying 0 ≤ x0 < m.
The parameters m, a, c and x0 should satisfy several conditions so that the
sequence xn has good statistical properties. Indeed, naive approaches lead to poor
results in this matter. For example, consider the case where x0 = 0, a = c = 7
and m = 10. The following sequence of numbers is produced:

    0.6  0.9  0.  0.7  0.6  0.9  0.  0.7 . . .                    (196)
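The following lcg function is a minimal sketch of such a generator (it is not a Scilab built-in). With the parameters above, it reproduces the same periodic cycle of four values, starting from x0 = 0.
function u = lcg ( x0 , a , c , m , nbvals )
    // Linear congruential generator: u(i) = x(i)/m,
    // where x(i+1) = (a*x(i) + c) mod m.
    u = zeros ( 1 , nbvals )
    x = x0
    for i = 1 : nbvals
        x = modulo ( a * x + c , m )
        u ( i ) = x / m
    end
endfunction

-->lcg ( 0 , 7 , 7 , 10 , 8 )
 ans  =
    0.7    0.6    0.9    0.    0.7    0.6    0.9    0.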

Specific rules allow us to design the parameters of a uniform random number generator.
As a practical example, consider the Urand generator [12], which is used by Scilab
in the rand function. Its parameters are
m = 2^31,
a = 843314861,
c = 453816693,
x0 arbitrary.
The first 8 elements of the sequence (with x0 = 0) are

    0.2113249  0.7560439  0.0002211  0.3303271                    (197)
    0.6653811  0.6283918  0.8497452  0.6857310 . . .              (198)

3.2 Generating uniform random numbers

In this section, we present some Scilab features which allow us to simulate a discrete
random process.
We assume here that a good source of random numbers is provided.
Scilab provides two functions to generate uniform real numbers in the
interval [0, 1]: rand and grand. Globally, the grand function
provides many more features than rand, and the random number generators it
uses are of higher quality. For our purpose of presenting the simulation of
random discrete events, the rand function will be sufficient and has the additional advantage
of a simpler syntax.
The simplest syntax of the rand function is
rand ()

Each call to the rand function produces a new random number in the interval [0, 1],
as presented in the following session.
--> rand ()
ans =
0.2113249
--> rand ()
ans =
0.7560439
--> rand ()
ans =
0.0002211

A random number generator is based on a sequence of integers, where the first
element in the sequence is the seed. The seed of the generator used in rand is
hard-coded in the library as being equal to 0, and this is why the function always
returns the same sequence when Scilab starts. This makes it easier to reproduce the
behavior of a script which uses the rand function.
The seed can be queried with the function rand("seed"), while the function
rand("seed",s) sets the seed to the value s. The use of the seed input argument
is presented in the following session.
-->rand ( "seed" )
 ans  =
    0.
-->rand ( "seed" , 1 )
-->rand ()
 ans  =
    0.6040239
-->rand ()
 ans  =
    0.0079647

In most random processes, several random numbers are needed at the same time.
Fortunately, the rand function allows us to generate a matrix of random numbers,
instead of a single value. The user must then provide the number of rows and
columns of the matrix to generate, as in the following syntax.
rand ( nr , nc )
The use of this feature is presented in the following session, where a 2 × 3 matrix of
random numbers is generated.
-->rand (2 ,3)
 ans  =
    0.6643966    0.5321420    0.5036204
    0.9832111    0.4138784    0.6850569

3.3 Simulating random discrete events

In this section, we present how to use a uniform random number generator to generate integers in a given interval. Assume that, given a positive integer m, we want to
generate random integers in the interval [0, m − 1]. To do this, we can use the rand
function and multiply the generated numbers by m. We must additionally use the
floor function, which returns the largest integer not greater than the given number.
The following function returns a matrix with size nbrows × nbcols, where the entries are random
integers in the set {0, 1, . . . , m − 1}.
function ri = generateInRange0M1 ( m , nbrows , nbcols )
ri = floor ( rand ( nbrows , nbcols ) * m )
endfunction

In the following session, we generate random integers in the set {0, 1, . . . , 4}.
-->r = generateInRange0M1 ( 5 , 4 , 4 )
 r  =
    2.    0.    3.    0.
    1.    0.    2.    2.
    0.    1.    1.    4.
    4.    2.    4.    0.

To check that the generated integers are uniform in the interval, we compute the
distribution of 10000 integers in the set {0, 1, . . . , 4}. We use the bar function
to plot the result, which is presented in the figure 18. We check that the probability
of each integer is close to 1/5 = 0.2.
-->r = generateInRange0M1 ( 5 , 100 , 100 );
-->counter = zeros (1 ,5);
-->for i = 1:100
-->    for j = 1:100
-->        k = r (i , j );
-->        counter ( k +1) = counter ( k +1) + 1;
-->    end
-->end
-->counter = counter / 10000;
-->counter
 counter  =
    0.2023    0.2013    0.1983    0.1976    0.2005
-->bar ( counter )

Figure 18: Distribution of random integers from 0 to 4.

We emphasize that the previous verifications allow us to check that the empirical
distribution function is the expected one, but that does not guarantee that the
uniform random number generator is of good quality. Indeed, consider the sequence
xn = n (mod 5). This sequence produces uniform integers in the set {0, 1, . . . , 4},
but, obviously, is far from being truly random. Testing uniform random number
generators is a much more complicated problem and will not be presented here.
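The following one-line session illustrates this: the sequence is perfectly uniform over the set, yet completely predictable.
-->modulo ( 1:10 , 5 )
 ans  =
    1.    2.    3.    4.    0.    1.    2.    3.    4.    0.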
It is easy to adapt the previous function to various needs. For example, the following function returns a matrix with size nr × nc, where the entries are random integers
in the set {1, 2, . . . , m}.
function ri = generateInRange1M ( m , nr , nc )
ri = ceil ( rand ( nr , nc ) * m )
endfunction

The following function returns a matrix with size nr × nc, where the entries are random
integers in the set {m1, m1 + 1, . . . , m2}.
function ri = generateInRangeM12 ( m1 , m2 , nr , nc )
f = m2 - m1 + 1
ri = floor ( rand ( nr , nc ) * f ) + m1
endfunction
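The following session is a sketch which checks that all the generated entries indeed lie in the requested interval.
-->rand ( "seed" , 0 );
-->r = generateInRangeM12 ( -3 , 3 , 100 , 100 );
-->and ( r >= -3 & r <= 3 )
 ans  =
  T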


3.4 Simulation of a coin

Many practical experiments are very difficult to analyze by theory and, most of the
time, very easy to experiment with on a computer. In this section, we give the example
of a coin experiment which is simulated with Scilab. This experiment is simple, so
that we can check that our simulation matches the result predicted by theory. In
practice, when no theory is able to predict a probability, it is much more difficult to
assess the result of a simulation.
The following Scilab function generates a random number with the rand function
and uses the floor function in order to get a random integer: either 1, associated with Head,
or 0, associated with Tail. It prints out the result and returns the value.
// tossacoin --
//   Prints "Head" or "Tail" depending on the simulation.
//   Returns 1 for "Head", 0 for "Tail".
function face = tossacoin ( )
    face = floor ( 2 * rand () );
    if ( face == 1 ) then
        mprintf ( "Head\n" )
    else
        mprintf ( "Tail\n" )
    end
endfunction

With such a function, it is easy to simulate the toss of a coin. In the following
session, we toss a coin 4 times. The seed argument of the rand function is used so that the
seed of the uniform random number generator is initialized to 0. This allows us to get
consistent results across simulations.
rand ( "seed" , 0 )
face = tossacoin ();
face = tossacoin ();
face = tossacoin ();
face = tossacoin ();

The previous script produces the following output.


Tail
Head
Tail
Tail

Assume that we are tossing a fair coin 10 times. What is the probability that
we get exactly 5 heads ?
This is a Bernoulli process, where the number of trials is n = 10 and the probability of success is p = 1/2. The probability of getting exactly j = 5 heads is given by the
binomial distribution and is

P(exactly 5 heads in 10 tosses) = b(10, 1/2, 5)    (199)
                                = \binom{10}{5} p^5 q^{10−5},    (200)

where p = 1/2 and q = 1 − p. The expected probability is therefore

P(exactly 5 heads in 10 tosses) ≈ 0.2460938.    (201)

The following Scilab session shows how to perform the simulation. We
perform 10000 simulations of the process. The floor function is used in combination
with the rand function to generate integers in the set {0, 1}. The sum function allows us to count
the number of heads in the experiment. If the number of heads is equal to 5, the
number of successes is updated accordingly.

-->rand ( "seed" ,0);
-->nb = 10000;
-->success = 0;
-->for i = 1: nb
-->  faces = floor ( 2 * rand (1 ,10) );
-->  nbheads = sum ( faces );
-->  if ( nbheads == 5 ) then
-->    success = success + 1;
-->  end
-->end
-->pc = success / nb
 pc  =
    0.2507

The computed probability is P = 0.2507 while the theory predicts P = 0.2460938,
which means that fewer than two significant digits are in common. It can be proved that when
we simulate an experiment of this type n times, we can expect that the error is less
than or equal to 1/√n at least 95 % of the time. With n = 10000 simulations, this error
corresponds to 0.01, which is the accuracy of our experiment.
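As a complementary check (a minimal sketch, assuming the nchoosek function defined earlier in this document, in section 2.13, is available), we can compute the theoretical probability directly.

// Theoretical probability of exactly 5 heads in 10 tosses:
// b(10, 1/2, 5) = C(10,5) * (1/2)^5 * (1/2)^5
p = nchoosek ( 10 , 5 ) * (1/2)^5 * (1/2)^(10-5)
// The result should be approximately 0.2460938.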

3.5 Simulation of a Galton board

In this section, we define the binomial distribution function. We give the example
of the Galton board, which is based on a Bernoulli process, and simulate this random process. We present the Scilab functions which allow us to manage the binomial
distribution and present their use on the Galton board example.
Definition 3.1. (Binomial distribution function) Let n be a positive integer and let
p be a real in the interval [0, 1]. Assume that the random variable B is the number of
successes in a Bernoulli process with parameters n and p. The distribution function
b(n, p, j) of B is the binomial distribution.
A Galton board is a board in which a ball is dropped at the top and
deflected off a number of pins on its way down to the bottom of the board. The
location of the ball at the end of one trial is the result of random deflections either
to the right or to the left. Figure 19 presents a Galton board.
The following function allows us to simulate one fall of a ball on a Galton board.
It takes as its input argument the number n of steps in the Galton board, so that
the final number of cups (where the ball falls) is n + 1. At each stage, a random
number is generated with the rand function. If the random number is lower than 1/2, the
ball is deflected to the left. If not, the ball is deflected to the right. Each time
a ball is deflected to the right, the integer jmin is increased. After n deflections,
the variable jmin is the index of the cup into which the ball has fallen. The same
algorithm could be designed with a jmax index, with initial value n + 1. That index
would be decreased each time the ball is deflected to the left.

Figure 19: A Galton board.


// simulgalton --
//   Performs one simulation of the Galton board with n stages,
//   and returns the index j = 1, 2, ..., n+1 where the ball falls.
function j = simulgalton ( n , verbose )
    if exists ( "verbose" , "local" ) == 0 then
        verbose = 0
    end
    jmin = 1
    for k = 1 : n
        if verbose == 1 then
            mprintf ( "Step #%d (%d)\n" , k , jmin )
        end
        r = rand ()
        if r < 0.5 then
            if verbose == 1 then
                mprintf ( "To the left!\n" )
            end
        else
            if verbose == 1 then
                mprintf ( "To the right!\n" )
            end
            jmin = jmin + 1
        end
    end
    j = jmin
endfunction
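As a usage note (a minimal sketch, assuming the simulgalton function above is defined), the optional verbose argument lets us trace one fall step by step.

rand ( "seed" , 0 );
// Simulate one ball falling through a 5-stage Galton board,
// printing the deflection at each step.
j = simulgalton ( 5 , 1 )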

In the following Scilab script, we perform 10 000 experiments of the Galton board.
The cups variable stores the number of balls in each cup. For each experiment, we
update the number of balls in the cup which has been randomly selected by the
process. The bar function allows us to plot the figure.
rand ( " seed " , 0 )
n = 10
cups = zeros (1 , n +1)
nshots = 10000
for k = 1: nshots
j = simulgalton ( n );
cups ( j ) = cups ( j ) + 1;

65

Simulation of the Galton board with n=100


0.25
Galton board
Binomial distribution

0.20

0.15

0.10

0.05

0.00
1

10

11

Figure 20: Simulation of a Galton board with n = 10 stages and 100 simulations.
The line is the binomial distribution function.
end
bar (1: n +1 , cups )

Figures 20, 21 and 22 present the results of the simulation of a Galton board
with n = 10 stages for 100, 1000 and 10 000 simulations, as bar plots.
In figure 23, we present the Scilab functions which allow us to manage the binomial
distribution. The cdfbin cumulative distribution function will be presented in the more
general context of cumulative distribution functions.
In the following session, we use the binomial function to compute the probabilities of the binomial distribution function. The plot function generates the line plot
which is presented in the previous figures, where the binomial distribution function
is computed with n = 10 and p = 0.5.

pr = binomial (0.5 , n )
plot (1: n +1 , pr )

In figures 20, 21 and 22, we see that the bar plot representing the Galton
simulations converges toward the line plot representing the binomial distribution function.

3.6 Generate random permutations

In this section, we present Scilab features which allows to generate random permutations. This problem is similar to the problem of shuffling a deck of cards. This
can be done by using the perms and grand functions.
The figure 24 presents the Scilab functions which allow to generate random permutations.
The perms function computes all the permutations of the given vector of indexes.
If the size of the input vector is n, then the size of the output matrix is n!n. In the
Figure 21: Simulation of a Galton board with n = 10 stages and 1000 simulations.

Figure 22: Simulation of a Galton board with n = 10 stages and 10 000 simulations.

binomial    returns a matrix containing b(n, p, j) for j = 1, 2, . . . , n
cdfbin      computes the binomial cumulative distribution function

Figure 23: Scilab commands for the binomial distribution function.

perms                 returns all the permutations of the given vector
grand(i, "prm", v)    generates i random permutations of the column vector v

Figure 24: Scilab commands to generate random permutations.

In the following session, we use the perms function to compute all the permutations of the vector
(1, 2, 3).

-->perms (1:3)
 ans  =
    3.    2.    1.
    3.    1.    2.
    2.    3.    1.
    2.    1.    3.
    1.    3.    2.
    1.    2.    3.

When the number of elements in the array is small, lower than 5 for example,
we may generate all the possible permutations and randomly choose one of them. In
the following session, we use n = 4 and store in p all the possible permutations of the
column vector (1, 2, 3, 4)^T. Then we generate a random integer j in the interval
[1, n!] and use it to get the permutation stored in the j-th row of p.
-->n = 4
 n  =
    4.
-->p = perms ( (1:n)' )
 p  =
    4.    3.    2.    1.
    4.    3.    1.    2.
    4.    2.    3.    1.
[...]
    1.    3.    2.    4.
    1.    2.    4.    3.
    1.    2.    3.    4.
-->j = floor ( rand () * factorial ( n ) ) + 1
 j  =
    4.
-->v = p (j ,:)
 v  =
    4.    2.    1.    3.

The previous method is feasible, but only for very small values of n. Indeed, when
n grows, the required memory grows as fast as n!, which is impractical for even
moderate values of n.
The following randperm function returns a random permutation of the integers
in the interval [1, n]. It is based on the grand function, which is used
to generate numbers uniform in the interval [0, 1[. Then, we use the gsort
function in order to sort the array and compute the order of the integers. Combined,
these two functions allow us to generate a random permutation, at the cost of the sort
of a large number of values.

function p = randperm ( n )
    [ ignore , p ] = gsort ( grand (1 ,n , "def" ) , "c" , "i" );
endfunction

For moderate values of n, the randperm function performs well, but for large values
of n, sorting the array might be expensive.
The grand function provides an algorithm to compute a random permutation
of a given array of values. This function is presented in figure 24. In the following
session, we use the grand function three times and get three independent permutations of the vector (1:10). A side effect of the call to this function is that it
updates the state of the uniform random number generator used by grand.
-->s = grand ( 1 , "prm" , (1:10) )
 s  =
    5.    1.    4.    3.    8.    7.    2.    10.    6.    9.
-->s = grand ( 1 , "prm" , (1:10) )
 s  =
    4.    2.    3.    9.    6.    8.    10.    7.    5.    1.
-->s = grand ( 1 , "prm" , (1:10) )
 s  =
    7.    4.    8.    9.    5.    2.    10.    1.    6.    3.

The source code used by grand to produce random permutations has been implemented by Bruno Pinon. The algorithm is presented in [10], section 3.4.2, Random
sampling and shuffling. According to Knuth, this algorithm was first published by
Moses and Oakford [13] in 1963 and by Durstenfeld [5] in 1964.
The following genprm function is a simplified version of this algorithm. We assume that the size of the input matrix x is n. The algorithm proceeds in n steps. At step i, we compute a random integer k,
uniform in the interval [i, n]. Then we exchange the values at indices i and k.
function x = genprm ( x )
    n = size (x , "*" )
    for i = 1: n
        // Pick a random index k, uniform in the interval [i, n].
        t = grand (1 ,1 , "unf" ,0 ,1)
        k = floor ( t * ( n - i + 1)) + i
        // Exchange the values at indices i and k.
        elt = x ( k )
        x ( k ) = x ( i )
        x ( i ) = elt
    end
endfunction

In the following session, we use the genprm function in order to generate three
independent permutations of the vector (1:10).

-->genprm ( 1:10 )
 ans  =
    10.    2.    9.    8.    5.    3.    4.    1.    6.    7.
-->genprm ( 1:10 )
 ans  =
    7.    4.    2.    9.    3.    1.    6.    5.    10.    8.
-->genprm ( 1:10 )
 ans  =
    5.    2.    6.    3.    4.    10.    9.    7.    1.    8.

We emphasize that the previous function is not designed to be used in practice,
since grand(1,"prm",x) provides a faster implementation (based on compiled
source code) of the same algorithm.

3.7 References and notes

Parts of section 3, which introduces random number generators, are based on [10].

Acknowledgments

I would like to thank John Burkardt for his comments about the numerical computation of the permutation function. Thanks are also addressed to Samuel Gougeon,
who suggested to improve the performance of the computation of the Pascal matrix.

Answers to exercises

5.1 Answers for section 1

Answer of Exercise 1.1 (Head and tail) Assume that we have a coin which is tossed twice. We
record the outcomes so that the order matters, i.e. the sample space is Ω = {HH, HT, TH, TT}.
Assume that the distribution function is uniform, i.e. the head and the tail have an equal
probability. The size of the sample space is #(Ω) = 4. The distribution is uniform so that
P(x) = 1/4, for all x ∈ Ω.
1. What is the probability of the event A = {HH, HT, TH} ? The number of elements in the
event is #(A) = 3. By proposition 1.9, the probability of the event A is

P(A) = #(A) / #(Ω) = 3/4.    (202)

2. What is the probability of the event A = {HH, HT} ?

P(A) = #(A) / #(Ω) = 2/4 = 1/2.    (203)

Answer of Exercise 1.2 (Two dice) Assume that we are rolling a pair of dice. Assume that
each face has an equal probability. The sample space is

Ω = {(i, j) | i, j = 1, . . . , 6}.    (204)

The size of the sample space is #(Ω) = 36. The distribution is uniform so that P(x) = 1/36, for all
x ∈ Ω.
1. What is the probability of getting a sum of 7 ? A sum of 7 corresponds to the event

A = {(1, 6), (6, 1), (2, 5), (5, 2), (4, 3), (3, 4)}.    (205)

The number of elements in the event is #(A) = 6. The probability is therefore P(A) = 6/36 = 1/6.
2. What is the probability of getting a sum of 11 ? A sum of 11 corresponds to the event

A = {(5, 6), (6, 5)}.    (206)

The number of elements in the event is #(A) = 2. The probability is therefore P(A) = 2/36 = 1/18.
3. What is the probability of getting a double one, i.e. snakeeyes ? The event is made of the
set A = {(1, 1)}, with #(A) = 1. Its probability is P(A) = 1/36.
Answer of Exercise 1.3 (de Méré's experiments) The two proofs are based on the fact that
an event and its complementary event satisfy P(A) + P(A^c) = 1. Since P(A) is complex
to compute directly, we compute instead P(A^c) and then use P(A) = 1 − P(A^c).
The event A is that, with four rolls of a die, at least one six turns up. The fact that a die is
rolled four times corresponds to the sample space

Ω = {i | i = 1, . . . , 6}^4.    (207)

The size of the sample space is #(Ω) = 6^4. To make the computation easier, we consider the
complementary event A^c, which is

A^c = {i | i = 1, . . . , 5}^4.    (208)

The size of A^c is #(A^c) = 5^4. The probability of event A is therefore P(A) = 1 − (5/6)^4 ≈ 0.5177469.
Since P(A) > 1/2, de Méré wins consistently.
De Méré claims that, in 24 rolls of two dice, a pair of 6 would turn up (event B). The sample space
is now

Ω = {(i, j) | i, j = 1, . . . , 6}^24.    (209)

The size of the sample space is #(Ω) = 36^24. The complementary event is

B^c = {(i, j) | i, j = 1, . . . , 5, or (6, i), i = 1, . . . , 5, or (i, 6), i = 1, . . . , 5}^24.    (210)

The size of B^c is #(B^c) = (25 + 5 + 5)^24 = 35^24. The probability of event B is therefore
P(B) = 1 − (35/36)^24 ≈ 0.4914039 < 1/2, which explains why de Méré loses consistently.
De Méré claims that 25 rolls were necessary to make the game favorable (event C). The same
derivation leads to the probability P(C) = 1 − (35/36)^25 ≈ 0.5055315 > 1/2.
Can you compute with Scilab the probability for event A and a number of rolls equal to 1, 2,
3 or 4 ? The following Scilab session shows how to perform the computation.

-->i = 1:4
 i  =
    1.    2.    3.    4.
-->1 - (5/6)^i
 ans  =
    0.1666667    0.3055556    0.4212963    0.5177469

Can you compute with Scilab the probability for the event B or C for a number of rolls equal to
10, 20, 24, 25, 30 ?
The following Scilab session shows how to perform the computation.

-->i = [10 20 24 25 30]
 i  =
    10.    20.    24.    25.    30.
-->1 - (35/36)^i
 ans  =
    0.2455066    0.4307397    0.4914039    0.5055315    0.5704969

Answer of Exercise 1.4 (Independent events) Assume that Ω is a finite sample space.
Assume that the two events A, B are independent. Let us prove that

P(B|A) = P(B).    (211)

By definition of the conditional probability, we have

P(B|A) = P(B ∩ A) / P(A),    (212)

where, by hypothesis, P(A) > 0. We have B ∩ A = A ∩ B so that P(B ∩ A) = P(A ∩ B) =
P(A|B)P(B). Moreover, A and B are independent, which implies P(A|B) = P(A). This leads to
P(B ∩ A) = P(A)P(B). We plug this equality into the equation 212 and we get P(B|A) = P(B),
which concludes the proof.
Answer of Exercise 1.5 (Boole's inequality) Assume that Ω is a finite sample space. Let
(Ei)_{i=1,n} be a sequence of finite subsets of Ω and n > 0. Let us prove Boole's inequality:

P( ∪_{i=1,n} Ei ) ≤ Σ_{i=1,n} P(Ei).    (213)

To prove this result, we will use the following result.
Let E1, E2 be two sets, not necessarily disjoint. Then, we have

P(E1 ∪ E2) ≤ P(E1) + P(E2).    (214)

The inequality 214 is the result of proposition 1.7, which states that P(E1 ∪ E2) = P(E1) + P(E2)
− P(E1 ∩ E2). The inequality 214 can then be deduced from the fact that P(E1 ∩ E2) ≥ 0, by definition of
a probability.
The inequality 213 is therefore true for n = 2. To finish the proof, we will use induction. Let us
assume that the inequality is true for n, and let us prove that it is true for n + 1. Let us denote
by Fn the set defined by

Fn = ∪_{i=1,n} Ei.    (215)

We want to compute the probability P( ∪_{i=1,n+1} Ei ) = P(F_{n+1}). We see that F_{n+1} = E_{n+1} ∪ Fn.
We use the inequality 214, which leads to

P(F_{n+1}) = P(E_{n+1} ∪ Fn) ≤ P(E_{n+1}) + P(Fn).    (216)

By hypothesis, the result is true for n, which implies

P(Fn) = P( ∪_{i=1,n} Ei ) ≤ Σ_{i=1,n} P(Ei).    (217)

We now plug 217 into 216 and get

P(F_{n+1}) ≤ P(E_{n+1}) + Σ_{i=1,n} P(Ei) = Σ_{i=1,n+1} P(Ei),    (218)

which concludes the proof.


Answer of Exercise 1.6 (Discrete conditional probability) In this exercise, we consider a
laboratory blood test which is experimented on healthy persons and on persons who have the disease.
Assume that, when the person has the disease, the test is positive with probability 99 %. However,
this test also generates false positives with probability 1 %. This means that, when a person does
not have the disease, the test is positive with probability 1 %. Assume that 0.5 % of the population
has the disease. Given that the test is positive for one person, what is the probability that this
person has the disease ?
We see that this situation is based on conditional probabilities, where we would like to compute
posterior probabilities. This is why we would like to use Bayes formula 64. Let us denote by D
the event that the person has the disease and by E the event that the test is positive. The event D
is the hypothesis which allows us to decompose the sample space Ω of all persons into the subsets D
(the persons who have the disease) and D^c (the healthy persons). We know that P(E|D) = 0.99
and P(E|D^c) = 0.01. We know that P(D) = 0.005, which implies P(D^c) = 1 − P(D) = 0.995. We
now use Bayes formula and have

P(D|E) = P(E|D)P(D) / (P(E|D)P(D) + P(E|D^c)P(D^c))    (219)
       = 0.99 × 0.005 / (0.99 × 0.005 + 0.01 × 0.995)    (220)
       ≈ 0.3322    (221)

with 4 significant digits. This might be surprising, since the probability of a false positive is 1 %,
which is quite small.
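The same computation is easy to reproduce numerically (a minimal sketch; the variable names below are ours):

// Bayes formula for the blood test:
// P(D|E) = P(E|D)P(D) / (P(E|D)P(D) + P(E|Dc)P(Dc))
PED  = 0.99;    // P(E|D), probability of a true positive
PEDc = 0.01;    // P(E|Dc), probability of a false positive
PD   = 0.005;   // P(D), proportion of the population with the disease
PDE  = PED * PD / ( PED * PD + PEDc * (1 - PD) )
// The result should be approximately 0.3322.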
Consider the case where 5 % of the population has the disease.

P(D|E) ≈ 0.99 × 0.05 / (0.99 × 0.05 + 0.01 × 0.95)    (222)
       ≈ 0.8389    (223)

with 4 significant digits. This shows that when more people have the disease (which is certainly
not desirable), the probability is higher.

P(E|D^c)    P(D|E)
0.100000    0.047391
0.050000    0.090494
0.010000    0.332215
0.005000    0.498741
0.001000    0.832632
0.000500    0.908674
0.000100    0.980295

Figure 25: Probabilities of having the disease, given that the test is positive, with
P(E|D) = 0.99, P(D) = 0.005. The bold data is the data presented in the text of
the exercise, which corresponds to P(E|D^c) = 0.01. The lower the probability of a
false positive, the more accurate the result.

P(D)        P(D|E)
0.500000    0.990000
0.100000    0.916667
0.050000    0.838983
0.010000    0.500000
0.005000    0.332215
0.001000    0.090164
0.000500    0.047188

Figure 26: Probabilities of having the disease, given that the test is positive, with
P(E|D) = 0.99, P(E|D^c) = 0.01. The bold data is the data presented in the text
of the exercise, which corresponds to P(D) = 0.005. The rarer the disease, the
less accurate the result.

Consider the case where the probability of a false positive is 0.1 % (but keep the probability
of a true positive equal to 99 % and the proportion of the population with the disease equal to 5 %).

P(D|E) ≈ 0.99 × 0.05 / (0.99 × 0.05 + 0.001 × 0.95)    (224)
       ≈ 0.9811    (225)

with 4 significant digits. This shows that when the probability of a false positive is lower, the
probability of having the disease, given that the test is positive, is higher.
The previous results might surprise a little bit, but they are the consequence of the false positives. These
results are presented in figures 25 and 26, which present the probability P(D|E) computed
with varying parameters P(E|D^c) and P(D). The conclusion of these experiments is that false
positives can reduce the reliability of the test if the disease is rare, or if the probability of a false
positive is high.
To make the previous computations clearer, consider the example where the population counts
1000 persons. Since 0.5 % of the population has the disease, this makes 0.005 × 1000 = 5 persons
who have the disease and 0.995 × 1000 = 995 who do not have the disease. From the 5 persons
who have the disease, there will be 0.99 × 5 = 4.95 persons who will have a positive test. Similarly,
from the 995 persons who do not have the disease, there will be 0.01 × 995 = 9.95 persons who will
have a positive test. Therefore, given that the test is positive, the probability that the person has
the disease is

4.95 / (4.95 + 9.95) ≈ 0.3322.    (226)

5.2 Answers for section 2

Answer of Exercise 2.1 (Recurrence relation of binomial) Let us prove proposition 2.14, i.e.
that, for integers n > 0 and 0 < j < n, the binomial coefficients satisfy

\binom{n}{j} = \binom{n−1}{j} + \binom{n−1}{j−1}.    (227)

The proof is based on the expansion of the binomial formula. By definition of the binomial
number 152, we have

\binom{n−1}{j} = [(n−1) . . . ((n−1)−j+1)] / [j(j−1) . . . 1]    (228)
              = [(n−1) . . . (n−j)] / [j(j−1) . . . 1],    (229)

and

\binom{n−1}{j−1} = [(n−1) . . . ((n−1)−(j−1)+1)] / [(j−1)(j−2) . . . 1]    (230)
                = [(n−1) . . . (n−j+1)] / [(j−1)(j−2) . . . 1].    (231)

We can sum the two previous equalities and obtain

\binom{n−1}{j} + \binom{n−1}{j−1} = [(n−1) . . . (n−j+1)(n−j)] / [j(j−1) . . . 1] + [(n−1) . . . (n−j+1)] / [(j−1)(j−2) . . . 1]    (232)
                                  = [((n−j) + j)(n−1) . . . (n−j+1)] / [j(j−1) . . . 1]    (233)
                                  = [n(n−1) . . . (n−j+1)] / [j(j−1) . . . 1],    (234)

which proves 227 and concludes the proof.


Answer of Exercise 2.2 (Number of subsets) Assume that Ωn is a finite set with n ≥ 0
elements. We consider that ∅ and Ωn are subsets of Ωn. Let us prove that there are 2^n
subsets of Ωn.
In order to simplify the proof, for n ≥ 0, we can consider, without loss of generality, the set
Ωn = {1, 2, . . . , n}.
It is easy to see that the claim is true for n = 0, since the set A1 = ∅ = Ωn is a subset of Ωn,
so that there are 2^0 = 1 subsets. For n = 1, we can construct the sets A1 = ∅, A2 = {1} = Ωn, so
that there are 2^1 = 2 subsets. It is more interesting to construct the sequence of subsets for n = 2.
Indeed, the list of subsets is A1 = ∅, A2 = {1}, A3 = {2}, A4 = {1, 2} = Ωn, so that
the claim is true since there are 2^2 = 4 subsets.
We can make the proof by induction on n, since we have proved that the claim is true for
n = 0, 1, 2. Assume that the claim is true for n and let us prove that there are 2^{n+1} subsets of the set
Ω_{n+1}. Let us denote by (Aj)_{j=1,2^n} the sequence of subsets of the set Ωn. Since Ω_{n+1} = Ωn ∪ {n + 1},
we have

Aj ⊂ Ωn ⊂ Ω_{n+1},    (235)

for j = 1, . . . , 2^n. We can construct additional subsets by considering the sets

Bj = Aj ∪ {n + 1},    (236)

for j = 1, . . . , 2^n. For each j = 1, . . . , 2^n, we have Bj ⊂ Ω_{n+1}. We have found 2^n sets (Aj)_{j=1,2^n} and 2^n
sets (Bj)_{j=1,2^n} which are subsets of Ω_{n+1}. The total number of subsets is therefore 2^n + 2^n = 2^{n+1},
which concludes the proof.
Answer of Exercise 2.3 (Probabilities of Poker hands) Why does computing the probability
of a straight flush force us to take into account the probability of the royal flush ?
Consider the event where the 5 cards are in sequence with a single suit. This might be a straight
flush, if the last card in the sequence is not an ace. If the last card is an ace, then the hand is not
a straight flush anymore, since it is a royal flush. Therefore, the event which is associated with a
straight flush is a sequence of cards with a single suit, but not a royal flush. When computing the
probability of a straight flush, the simplest is to compute the probability of the event sequence of
cards with a single suit, and to remove the probability of the royal flush.
Explain other possible conflicts between Poker hands.
The following is a list of conflicts between Poker hands.
- A straight is a hand with 5 cards in sequence, but at least two cards have different suits. If
the cards are in sequence and have the same suit, this is not a straight anymore, this is a
straight flush.
- A double pair is when there are two pairs in the hand. But the two pairs must have different
ranks, since, if the four cards all have the same rank, this is not a double pair anymore, this is a four of
a kind.
- A flush is a hand where all the cards have the same suit. But the cards must not be in
sequence, since they would not form a flush anymore, but would form a straight flush or a
royal flush.
Compute the probabilities for all Poker hands in figure 15.
The probability of a pair is computed as follows. There are 13 different ranks in the deck and,
once the rank is chosen, there are \binom{4}{2} different pairs of this rank. Therefore, there are
13 · \binom{4}{2} different pairs in the deck. The remaining 3 cards in the hand are chosen so that they have a
rank different from the rank of the current pair. If not, there would be a three of a kind or even a four of a
kind. Therefore, their ranks must be chosen among the remaining 12 ranks. There are \binom{12}{3} different sets
of 3 ranks among these 12 ranks. For each card, there are 4 different suits, so that there are
\binom{12}{3} · 4^3 different combinations for the remaining 3 cards in the hand. The probability of a pair
is therefore

P(pair) = 13 · \binom{4}{2} · \binom{12}{3} · 4^3 / \binom{52}{5} = 1098240 / 2598960 ≈ 0.4225690.    (237)
To compute the probability of a double pair, we must take into account the fact that the two
pairs must have different ranks. If not, that would be a four of a kind, instead of a double pair.
This is why we begin by choosing two different ranks in the set of 13 ranks available. Once done,
each pair can have two of the four suits. Therefore, the total number of double pairs in the first 4
cards is \binom{13}{2} · \binom{4}{2}^2. The 5th card must be chosen from the 11 remaining ranks (if not, one of
the pairs would become a three of a kind). Once the rank of the 5th card is chosen, it can have one
of the 4 available suits. Therefore, there are 11 × 4 different choices for the 5th card. In the end,
the probability of a double pair is

P(double pair) = \binom{13}{2} · \binom{4}{2}^2 · 11 · 4 / \binom{52}{5} = 123552 / 2598960 ≈ 0.0475390.    (238)

The probability of a three of a kind is computed as follows. There are 13 different ranks and
there are \binom{4}{3} ways to choose 3 cards in a set of 4. Therefore, there are 13 · \binom{4}{3} different ways
to select the 3 first cards of the same rank. The last two cards must be chosen so that they do
not create a full house or a four of a kind. There are \binom{12}{2} different ways to select two ranks in the set of the
remaining 12 ranks. After the ranks are chosen, each of the 2 cards can have one of the 4
suits, so that there are 4^2 ways to select the suits. All in all, there are \binom{12}{2} · 4^2 different ways to
select the last 2 cards. In the end, the probability of a three of a kind is

P(three of a kind) = 13 · \binom{4}{3} · \binom{12}{2} · 4^2 / \binom{52}{5} = 54912 / 2598960 ≈ 0.0211285.    (239)
The probabilities of a four of a kind and of a full house have already been computed in the text.
The probability of a straight flush is computed as follows. The following is a list of the 10
possible sequences:

1 2 3 4 5, 2 3 4 5 6, 3 4 5 6 7, 4 5 6 7 8, 5 6 7 8 9, 6 7 8 9 10,
7 8 9 10 J, 8 9 10 J Q, 9 10 J Q K, 10 J Q K 1,

where all the cards take the same suit among the 4 possible suits ♣, ♦, ♥ and ♠. Therefore, the
total number of such hands is 4 × 10 = 40. In order for these hands to be straight flushes, and not
royal flushes, we must remove the 4 royal flushes. Finally, the total number of straight flushes is 4 × 10 − 4
and the probability of this hand is

P(straight flush) = (4 × 10 − 4) / \binom{52}{5} = 36 / 2598960 ≈ 0.0000139.    (240)

The probability of the royal flush is easy to compute since there are only 4 such hands in the
deck. The probability of the royal flush is therefore

P(royal flush) = 4 / \binom{52}{5} = 4 / 2598960 ≈ 0.0000015.    (241)

The probability of a straight is computed as follows. The following is a list of the 10 possible
sequences:

1 2 3 4 5, 2 3 4 5 6, 3 4 5 6 7, 4 5 6 7 8, 5 6 7 8 9, 6 7 8 9 10,
7 8 9 10 J, 8 9 10 J Q, 9 10 J Q K, 10 J Q K 1,

where each card can have one of the 4 suits ♣, ♦, ♥ and ♠. Therefore, the total number of such
hands is 4^5 × 10. But we require that a straight is neither a straight flush, nor a royal flush, so that
we have to remove these higher value hands. Using the previous computation for the straight flush,
we get 4^5 × 10 − 4 × 10 different straights. Therefore, the probability of the straight is

P(straight) = (4^5 × 10 − 4 × 10) / \binom{52}{5} = 10200 / 2598960 ≈ 0.0039262.    (242)

Name               Number     Probability
total              2598960    1.
no pair            1302540    0.5011774
pair               1098240    0.4225690
double pair        123552     0.0475390
three of a kind    54912      0.0211285
straight           10200      0.0039262
flush              5108       0.0019654
full house         3744       0.0014406
four of a kind     624        0.0002401
straight flush     36         0.0000139
royal flush        4          0.0000015

Figure 27: Probabilities of Poker hands.


The probability of a flush is computed as follows. There are 4 suits in the deck. Once the
suit is chosen, there are \binom{13}{5} different sets of 5 cards which have the same suit. The total
number of such hands is therefore 4 · \binom{13}{5}. But we have to remove the straight flushes and royal flushes
from this counting. This leads to 4 · \binom{13}{5} − 4 × 10 different flushes. Therefore, the probability of this
hand is

P(flush) = (4 · \binom{13}{5} − 4 × 10) / \binom{52}{5} = 5108 / 2598960 ≈ 0.0019654.    (243)

The probability of a no pair event is simple to compute when we already know the probabilities
of all other events.

P(no pair) = 1 − P(royal flush) − P(straight flush) − P(four of a kind) − P(full house)    (244)
             − P(flush) − P(straight) − P(three of a kind) − P(double pair) − P(pair)    (245)
           = 1302540 / 2598960 ≈ 0.5011774.    (246)

The results are summarized in figure 27.
Answer of Exercise 2.4 (Bernoulli trial for a die experiment) A fair die is rolled n = 4
times. What is the probability that we obtain exactly one 6 ? For each roll, the probability of
getting a six is p = 1/6. Since each roll is independent of the previous rolls, this is a Bernoulli
process with n = 4 trials. Therefore the probability of getting exactly one 6 is

b(4, 1/6, 1) = \binom{4}{1} (1/6)^1 (5/6)^3 ≈ 0.3858025.    (247)

What is the probability for n = 1, 2, . . . , 12 ? The following Scilab function computes the
probability of getting exactly one 6 in n tosses of a fair die.

// one6inNtoss --
//   Computes the probability of getting exactly one "6" in n tosses of a fair die.
function bnpj = one6inNtoss ( n )
    p = 1/6
    q = 1 - p
    j = 1
    bnpj = nchoosek (n , j ) * p^j * q^(n - j )
endfunction
In the following session, we use the function one6inNtoss to compute the probability for n =
1, 2, . . . , 12.
--> for n = 1:12
--> b = one6inNtoss ( n );
--> mprintf ( " In %d toss , p ( one six )= %f \ n " ,n , b );
--> end
In 1 toss , p ( one six )=0.166667
In 2 toss , p ( one six )=0.277778
In 3 toss , p ( one six )=0.347222
In 4 toss , p ( one six )=0.385802
In 5 toss , p ( one six )=0.401878
In 6 toss , p ( one six )=0.401878
In 7 toss , p ( one six )=0.390714
In 8 toss , p ( one six )=0.372109
In 9 toss , p ( one six )=0.348852
In 10 toss , p ( one six )=0.323011
In 11 toss , p ( one six )=0.296094
In 12 toss , p ( one six )=0.269176
Answer of Exercise 2.5 (Probability of a flight crash) Assume that there are 20 000 flights
of airplanes each day in the world. Assume that there is one accident every 500 000 flights. What
is the probability of getting exactly 5 crashes in 22 days ?
There are several answers to this question, depending on the accuracy required.
1. Flight by flight. The first approach is based on the analysis of a Bernoulli process, where
each flight has a probability of crash.
2. Time decomposition. The second approach is based on the analysis of a Bernoulli process,
where each day has a probability of crash.
3. Poisson approximation. The third approach is based on the Poisson approximation of the
binomial distribution function.
The first answer is based on the hypothesis that the flights are independent. Therefore, the process
can be considered as a Bernoulli process where each flight has a crash probability equal to
p = 1/500000. The number of steps in the Bernoulli process is equal to the number of flights n. In 22
days, the number of flights is n = 22 × 20000. The probability of getting exactly 5 crashes in 22 days
is

P(exactly 5 crashes in 22 days) = \binom{22 × 20000}{5} p^5 q^{22 × 20000 − 5} ≈ 0.0018241.    (248)

Figure 28 presents the results for various numbers of crashes.
What is the probability of at least 5 crashes in 22 days ? The probability of getting at least
5 crashes can be computed from the probability of getting 0, 1, 2, 3 or 4 crashes. Therefore,

P(at least 5 crashes in 22 days) = 1 − P(0, 1, 2, 3 or 4 crashes in 22 days)    (249)
                                 = 1 − Σ_{j=0,4} \binom{22 × 20000}{j} p^j q^{22 × 20000 − j}    (250)
                                 ≈ 0.0021294.    (251)

Event                    Probability
0 crashes in 22 days     0.41478255
1 crash in 22 days       0.36500937
2 crashes in 22 days     0.16060408
3 crashes in 22 days     0.04711041
4 crashes in 22 days     0.01036424
5 crashes in 22 days     0.00182409
6 crashes in 22 days     0.00026753
7 crashes in 22 days     0.00003363
8 crashes in 22 days     0.00000370
9 crashes in 22 days     0.00000036
10 crashes in 22 days    0.00000003

Figure 28: Crash probabilities for 22 days with one crash every 500 000 flights and
20 000 flights each day.
Event                    Probability
0 crashes in 42 days     0.18637366
1 crash in 42 days       0.31310838
2 crashes in 42 days     0.26301125
3 crashes in 42 days     0.14728625
4 crashes in 42 days     0.06186013
5 crashes in 42 days     0.02078494
6 crashes in 42 days     0.00581976
7 crashes in 42 days     0.00139674
8 crashes in 42 days     0.00029331
9 crashes in 42 days     0.00005475
10 crashes in 42 days    0.00000920

Figure 29: Crash probabilities for 42 days with one crash every 500 000 flights and
20 000 flights each day.
What is the probability of getting exactly 3 crashes in 42 days ? What is the probability of
getting at least 3 crashes in 42 days ?
The same computations can be performed for 42 days, which represent 6 weeks. Figure 29
presents the results.
The probability of having exactly 3 crashes in 42 days is

P(exactly 3 crashes in 42 days) = \binom{42 × 20000}{3} p^3 q^{42 × 20000 − 3}    (252)
                                ≈ 0.14728625.    (253)

The probability of having at least 3 crashes in 42 days is

P(at least 3 crashes in 42 days) = 1 − P(0, 1 or 2 crashes in 42 days)    (254)
                                 = 1 − Σ_{j=0,2} \binom{42 × 20000}{j} p^j q^{42 × 20000 − j}    (255)
                                 ≈ 0.2375067.    (256)

We now present another approach for the computation of the same problem. The method is
based on counting the number of crashes during a given time unit, for example one day. We consider
that the process is a Bernoulli process, where each day is associated with the probability that exactly
1 crash occurs, which is obviously an approximation, since it is possible that more than one crash
occurs during one day. Since there is one crash every 500 000 flights, the probability that one flight
has no crash is 499999/500000. By hypothesis, all flights are independent, therefore the probability of
getting no accident in one day is

P(no crash in 1 day) = (499999/500000)^20000 ≈ 0.9607894.    (257)

Hence, the probability of getting at least one crash is

P(at least 1 crash in 1 day) = 1 − (499999/500000)^20000 ≈ 0.0392106.    (258)

Therefore, the probability of having exactly 5 crashes in 22 days is

P(exactly 5 crashes in 22 days) ≈ \binom{22}{5} p^5 q^{22−5},    (259)

where p = 1 − (499999/500000)^20000 ≈ 0.0392106 and q = 1 − p. This leads to

P(exactly 5 crashes in 22 days) ≈ 0.0012366.    (260)

We see that the result is close to, but different from, the previous probability, which was equal to
0.00182409. In fact, the result depends on the time unit that we consider for our calculation.
Obviously, if we consider that the time unit is the 1/2 day, the formula is changed to

P(exactly 5 crashes in 22 days) ≈ \binom{22 × 2}{5} p^5 q^{22 × 2 − 5},    (261)

where

p = 1 − (499999/500000)^{20000/2} and q = 1 − p.    (262)

This gives

P(exactly 5 crashes in 22 days) ≈ 0.0015155.    (263)

If we consider the hour as the time unit, we get P(exactly 5 crashes in 22 days) ≈ 0.0017973.
The third approach is based on the fact that, when n is large, the binomial distribution function is
closely approximated by the Poisson distribution function, that is

b(n, p, j) ≈ (λ^j / j!) exp(−λ),    (264)

where λ = np. Here, the parameter λ is equal to λ = 22 × 20000/500000 = 0.88. The result is

P(exactly 5 crashes in 22 days) ≈ (λ^5 / 5!) exp(−λ)    (265)
                                ≈ 0.0018241.    (266)
The three approaches are presented in figure 30. We can see that the flight by flight approach
gives a result which is very close to the Poisson approximation, with 6 common digits. The various
approaches based on time decomposition give different results, with only 1 common digit. We can
check that smaller time units lead to results which are closer to the flight by flight approach.
Consider now that the year is a sequence of 16 periods of 22 days (ignoring the 13 days left
in the year). What is the probability of having one period in the year which contains at least 5
crashes ?

Approach               Probability
Flight by flight       0.00182409
Time Unit = day        0.0012366
Time Unit = 1/2 day    0.0015155
Time Unit = hour       0.0017973
Poisson                0.0018241

Figure 30: Probability of having exactly 5 crashes in 22 days with one crash every
500 000 flights and 20 000 flights each day - Different approaches.
This decomposition corresponds to 16 × 22 = 352 days (instead of the usual 365 days of a regular
year), where the remaining 13 days are ignored. We want to compute the probability that one of
the 16 periods contains at least 5 crashes. We have

P(one period contains at least 5 crashes) = 1 − P(all periods contain at most 4 crashes).    (267)

Since the periods are disjoint, the crashes in the 16 periods are independent, so that

P(all periods contain at most 4 crashes) = P(one period contains at most 4 crashes)^16.    (268)

Now the probability of having at most 4 crashes in one period is equal to

P(at most 4 crashes in 22 days) = 1 − P(at least 5 crashes in 22 days).    (269)

We now plug 269 into 268 and get

P(all periods contain at most 4 crashes) = (1 − P(at least 5 crashes in 22 days))^16.    (270)

We finally plug 270 into 267 and get

P(one period contains at least 5 crashes) = 1 − (1 − P(at least 5 crashes in 22 days))^16.    (271)

It has already been computed that the probability of having at least 5 crashes in one period of
22 days is

P(at least 5 crashes in 22 days) ≈ 0.0021294,    (272)

so that the required probability is

P(one period contains at least 5 crashes) ≈ 1 − (1 − 0.0021294)^16    (273)
                                          = 1 − 0.9978706^16    (274)
                                          ≈ 0.0335316,    (275)

which is approximately 3 %. This is much larger than the original probability 0.0021294 ≈ 0.2 % of
having at least 5 crashes in 22 days.
The approach presented here is still too simplified. Indeed, we should instead consider the
probability that one period in the year contains at least 5 crashes and consider all possible periods of
22 days in the year. In this problem, the periods are not disjoint anymore, so that the computations
performed earlier cannot be applied. This kind of problem involves a method called scan statistics
which will not be presented here (see [9, 6]).
Answer of Exercise 2.6 (Binomial function maximum) Consider the discrete distribution
function of a Bernoulli process, as defined by 2.16. Let us prove that

b(n, p, j) = (p/q) · ((n − j + 1)/j) · b(n, p, j − 1),    (276)

for j ≥ 1. By definition, we have

b(n, p, j − 1) = \binom{n}{j−1} p^{j−1} q^{n−j+1},    (277)

where q = 1 − p. By pre-multiplying the previous equality by p/q, we find

(p/q) · b(n, p, j − 1) = \binom{n}{j−1} p^j q^{n−j}.    (278)

By pre-multiplying the previous equality by (n − j + 1)/j, we find

(p/q) · ((n − j + 1)/j) · b(n, p, j − 1) = ((n − j + 1)/j) · \binom{n}{j−1} p^j q^{n−j}.    (279)

We can simply expand the term ((n − j + 1)/j) · \binom{n}{j−1} and have

((n − j + 1)/j) · \binom{n}{j−1} = ((n − j + 1)/j) · [n(n−1) . . . (n−j+2)] / [(j−1)(j−2) . . . 1]    (280)
                                = [n(n−1) . . . (n−j+2)(n−j+1)] / [j(j−1)(j−2) . . . 1]    (281)
                                = \binom{n}{j}.    (282)

We now plug the previous result into 279 and get 276, which concludes the proof.
Let us compute jm ≥ 1 so that b(n, p, j) is maximum, i.e. so that

b(n, p, jm) ≥ b(n, p, j),  1 ≤ j ≤ n.    (283)

To find jm, we consider the equality 276, which states that the growth of b(n, p, j) with j depends on
the factor (p/q) · ((n − j + 1)/j). If this term is greater than 1, then increasing values of j produce increasing
values of b(n, p, j). Similarly, if this term is lower than 1, then increasing values of j produce
decreasing values of b(n, p, j). Therefore, the index jm is the largest solution of the inequality

(p/q) · ((n − jm + 1)/jm) ≥ 1.    (284)

This is equivalent to

p(n − jm + 1) ≥ q jm,    (285)

since q > 0 and jm > 0. We expand the previous inequality and get

pn − p jm + p ≥ (1 − p) jm = jm − p jm,    (286)

where the term −p jm appears on both sides of the previous inequality. This can be simplified into

pn + p ≥ jm,    (287)

which can be written as

jm ≤ p(n + 1).    (288)

Therefore, the index j which makes the discrete distribution function b(n, p, j) maximum is

jm = [p(n + 1)],    (289)

where the [.] function is so that [x] is the largest integer lower than or equal to x.
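This formula is immediate to evaluate (a minimal sketch):

// Most likely number of successes for a Bernoulli process
// with n trials and success probability p: jm = floor(p*(n+1)).
n = 10; p = 1/2;
jm = floor ( p * ( n + 1 ) )
// For n = 10 and p = 1/2, the result is 5.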
Consider the experiment presented in section 3.4, which consists in tossing a coin 10 times and
counting the number of heads. With a Scilab simulation, can you compute the number of
heads which is the most likely to occur ?
The following function returns the probability of getting j heads in 10 tosses of a coin. It is
based on the same method presented in section 3.4, replacing the number of successes (which was
equal to 5) by the variable j.

function p = tossingcoin ( j )
    rand ( "seed" ,0)
    nb = 10000;
    success = 0;
    for i = 1: nb
        faces = floor ( 2 * rand (1 ,10) );
        nbheads = sum ( faces );
        if ( nbheads == j ) then
            success = success + 1;
        end
    end
    p = success / nb
endfunction
By using this function with different values of j, we can easily determine the value of j which
maximizes this probability. The following session performs a loop for j = 0, 1, . . . , 10 and prints out the
computed probability for each value of j.
--> for j = 0:10
--> p = tossingcoin ( j );
--> mprintf ( " P ( j = %d )= %f \ n " ,j , p );
--> end
P ( j =0)=0.000800
P ( j =1)=0.009700
P ( j =2)=0.043900
P ( j =3)=0.119600
P ( j =4)=0.206800
P ( j =5)=0.250700
P ( j =6)=0.200000
P ( j =7)=0.114400
P ( j =8)=0.043700
P ( j =9)=0.008800
P ( j =10)=0.001600
We see that j = 5 maximizes the probability, which corresponds to the fact that the most probable
number of heads in 10 tosses of a coin is 5. Indeed, this corresponds to the
value jm = [p(n + 1)] = [11/2] = 5 that we just found by theory.
Answer of Exercise 2.7 (Binomial coefficients and Pascal's triangle) Let a, b be two real
numbers and let n be a positive integer. Let us prove the binomial theorem, which states that

(a + b)^n = Σ_{j=0,n} \binom{n}{j} a^j b^{n−j}.    (290)

We expand the formula (a + b)^n and get

(a + b)^n = (a + b)(a + b) . . . (a + b).    (291)

The first term of this product is a^n, the second term is a^{n−1} b, and so forth, until the last term b^n. The
expansion can then be written as the sum of terms a^j b^{n−j}, with j = 0, . . . , n, and where each term is
associated with a coefficient that we have to compute. Consider the term a^j b^{n−j} and let us count
the number of times that this term will appear in the expansion. This is equivalent to choosing j
elements in a set of n elements. Indeed, the order of the elements does not count, since ab = ba.
Therefore, each term a^j b^{n−j} will appear \binom{n}{j} times, which concludes the proof.
The binomial coefficients \binom{n}{j} can be written in a triangle, where each line corresponds to n
and each row corresponds to j, as in the following array

        1.
        1.  1.
L =     1.  2.  1.    (292)
        1.  3.  3.  1.
        1.  4.  6.  4.  1.

Let us use the binomial theorem in order to prove that the sum of the terms in the n-th row is 2^n.
We apply the binomial theorem with a = b = 1 and get

(1 + 1)^n = Σ_{j=0,n} \binom{n}{j}    (293)
          = 2^n.    (294)
Let us prove that if the terms are added with alternating signs, then the sum is zero. We apply
the binomial theorem with a = 1 and b = −1 and we get

(1 − 1)^n = Σ_{j=0,n} \binom{n}{j} (1)^j (−1)^{n−j}    (295)
          = (1)^0 (−1)^n \binom{n}{0} + (1)^1 (−1)^{n−1} \binom{n}{1} + . . . + (1)^n (−1)^0 \binom{n}{n}    (296)
          = (−1)^n \binom{n}{0} + (−1)^{n−1} \binom{n}{1} + . . . + \binom{n}{n}.    (297)

This last equality proves that, if the sum begins with the last term, alternating the signs of
the terms leads to a zero sum. Additionally, we have \binom{n}{j} = \binom{n}{n−j}, so that Pascal's triangle has
a symmetry property. If we use this symmetry property in the binomial expansion, we have

(a + b)^n = Σ_{j=0,n} \binom{n}{n−j} a^{n−j} b^j.    (298)

We use this equation with a = 1 and b = −1 and we get

(1 − 1)^n = Σ_{j=0,n} \binom{n}{n−j} (1)^{n−j} (−1)^j    (299)
          = (−1)^0 \binom{n}{n} + (−1)^1 \binom{n}{n−1} + . . . + (−1)^n \binom{n}{0}    (300)
          = \binom{n}{0} − \binom{n}{1} + . . . + (−1)^n \binom{n}{n}.    (301)

This last equality proves that, if the sum begins with the first term, alternating the signs of
the terms leads to a zero sum.
Binomial coefficients can also be represented in a matrix called Pascal's matrix, where the
binomial coefficients are stored in the anti-diagonals of the matrix. The following matrix is Pascal's
matrix of order 5:

        1.  1.   1.   1.   1.
        1.  2.   3.   4.   5.
S =     1.  3.   6.  10.  15.    (302)
        1.  4.  10.  20.  35.
        1.  5.  15.  35.  70.

Let us design a Scilab script to compute Pascal's triangle and Pascal's matrix and check that we
find the same results as presented in 292 and 302. The following pascallow function allows us to
compute Pascal's lower triangular matrix of order n. It is directly based on the nchoosek function
already defined in section 2.13.

function c = pascallow ( n )
    c = zeros (n , n );
    for i = 1: n
        c (i ,1: i ) = nchoosek (i -1 ,(1: i ) -1);
    end
endfunction

In the following session, we use the pascallow function to check that we get the triangle presented
in 292.

-->pascallow (5)
 ans  =
    1.    0.    0.    0.    0.
    1.    1.    0.    0.    0.
    1.    2.    1.    0.    0.
    1.    3.    3.    1.    0.
    1.    4.    6.    4.    1.

In order to compute Pascal's symmetric matrix, some algebra is required so that the matrix
elements Sij are filled anti-diagonal by anti-diagonal, where an anti-diagonal is associated with a
constant sum i + j. The following function computes Pascal's symmetric matrix of order n.

function c = pascalsym ( n )
    c = zeros (n , n );
    for i = 1: n
        c (i ,1: n ) = nchoosek ( i +(1: n ) -2 ,i -1);
    end
endfunction

The following session shows a sample use of this function in order to check that we get the same
result as presented in equation 302.

-->pascalsym (5)
 ans  =
    1.    1.     1.     1.     1.
    1.    2.     3.     4.     5.
    1.    3.     6.    10.    15.
    1.    4.    10.    20.    35.
    1.    5.    15.    35.    70.

We can additionally define Pascal's upper triangular matrix as in the following function.

function c = pascalup ( n )
    c = zeros (n , n );
    for i = 1: n
        c (i , i : n ) = nchoosek ( ( i : n ) -1 , i -1 );
    end
endfunction

In the following session, we compute Pascal's upper triangular matrix of order 5.

-->pascalup ( 5 )
 ans  =
    1.    1.    1.    1.    1.
    0.    1.    2.    3.    4.
    0.    0.    1.    3.    6.
    0.    0.    0.    1.    4.
    0.    0.    0.    0.    1.

The lower, upper and symmetric Pascal matrices are related by the equality L × U = S, as shown
in the following session.

-->L = pascallow ( 5 );
-->U = pascalup ( 5 );
-->S = pascalsym ( 5 );
-->L * U - S
 ans  =
    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.
    0.    0.    0.    0.    0.

Answer of Exercise 2.8 (Binomial identity) Let us prove the following binomial identity:

\binom{2n}{n} = Σ_{j=0,n} \binom{n}{j}^2.    (303)

To help ourselves to prove this result, we consider a set A with 2n elements, where n elements are
red and n elements are blue. Let us compute the number of ways to choose n elements in this set.
For example, we consider the case n = 3 so that the set is

A = {R1, R2, R3, B1, B2, B3}.    (304)

We are searching for subsets Ai ⊂ A, where i = 1, . . . , imax, and where imax is the positive integer to be
computed. To organize our computation, we order the subsets depending on the number of red
elements in the subset. The following is the only possible subset of size 3 with no red element.

A1 = {B1, B2, B3}.    (305)

The following is the list of all possible subsets of size 3 with 1 red element (and, therefore, 2 blue
elements).

A2 = {R1, B1, B2},  A3 = {R2, B1, B2},  A4 = {R3, B1, B2},    (306)
A5 = {R1, B2, B3},  A6 = {R2, B2, B3},  A7 = {R3, B2, B3},    (307)
A8 = {R1, B1, B3},  A9 = {R2, B1, B3},  A10 = {R3, B1, B3}.    (308)

The other subsets can be computed with the same method, so that we finally find that there are,
indeed, \binom{6}{3} = 20 subsets. Therefore, we write the term \binom{2n}{n} as the sum of the numbers of subsets where
there are j red elements, where 0 ≤ j ≤ n. We have

\binom{2n}{n} = Σ_{j=0,n} Cj,    (309)

where Cj is the number of subsets of A with n elements where j elements are red. There are \binom{n}{j}
ways to choose the j red elements and \binom{n}{n−j} ways to choose the n − j blue elements, so that
Cj = \binom{n}{j} \binom{n}{n−j}. But the symmetry property of the binomial function states that \binom{n}{j} = \binom{n}{n−j},
which leads to

Cj = \binom{n}{j}^2.    (310)

Finally, the previous equality can be plugged into 309 so that the equality 303 holds true, which
concludes the proof.
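The identity is easy to check numerically for small n (a minimal sketch, assuming the nchoosek function accepts vector arguments, as it does in the Pascal matrix functions above):

// Check the identity for n = 3:
// sum of squared binomial coefficients versus C(2n, n).
s1 = sum ( nchoosek (3 , 0:3 ).^2 )
s2 = nchoosek (6 , 3)
// Both results should equal 20.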


Answer of Exercise 2.9 (Earthquakes and predictions) Assume that a person predicts the
dates of major earthquakes (with magnitude larger than 6.5, or with a large number of deaths,
etc.) in the world during 3 years, i.e. in a period of 1096 days. Assume that the specialist
predicts 169 earthquake days. Assume that, during the same period, 196 major earthquakes really
occur, so that 33 earthquakes were correctly predicted by the specialist. What is the probability
that the earthquakes are predicted by chance ?
We consider the set of m = 1096 days, where k = 196 days are earthquake days and m − k =
1096 − 196 are not earthquake days. In this set, we are picking n = 169 days, where x = 33 days
are earthquake days.
The probability of selecting x earthquake days is given by the hypergeometric distribution
function defined by

P(X = x) = h(x, m, k, n) = \binom{k}{x} \binom{m−k}{n−x} / \binom{m}{n}.    (311)

In our particular situation, we have

P(X = 33) = \binom{196}{33} \binom{1096−196}{169−33} / \binom{1096}{169}    (312)
          ≈ 0.0705625,    (313)

which is approximately 7 %.
To know if the prediction is based on chance, we perform the computation, with Scilab, of
all the probabilities with x = 0, 1, . . . , 169. The following script allows us to compute the required
probability and to draw the plot which is presented in figure 31.

// The number of days in three years
m = 1096
// The number of days selected
n = 169
// The number of earthquake days in three years
k = 196
// The number of earthquake days selected
x = 33
// The probability of picking 169 days, where 33 are earthquake days.
p = hygepdf ( x , m , k , n )
// Plot the distribution
xdata = zeros (1 , n +1);
pdata = zeros (1 , n +1);
for i = 0: n
    xdata ( i +1) = i ;
    pdata ( i +1) = hygepdf ( i , m , k , n );
end
plot ( xdata , pdata )
f = gcf ();
f . children . title . text = "Probability of random predictions" ;
f . children . x_label . text = "Number of earthquakes" ;
f . children . y_label . text = "Probability" ;
Obviously, the predictions of the specialist appear to be no better than a random choice, since the number
of earthquakes predicted corresponds to the maximum of the probability distribution.

Figure 31: Probability of having x earthquake days while choosing n = 169 days
from m = 1096 days, where k = 196 days are earthquake days.
Answer of Exercise 2.10 (Log-factorial function) The following Scilab implementation is
adapted from a Matlab source code written by John Burkardt [3]. The following factoriallog_sum
function uses the sum and log functions to perform a fast computation of fln(n) = log(n!) from
equation 193.

function value = factoriallog_sum ( n )
    value = sum ( log (2: n ));
endfunction

The previous implementation has the advantage of relying only on the logarithm function. But the
factoriallog_sum function does not take matrix input arguments. The main issue is that more
memory is required, making the computation intractable for values of n larger than 10^5. Finally,
the function might generate inaccurate results for large values of n, caused by the accumulation of
rounding errors in the sum.

References
[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover Publications Inc., 1972.
[2] George E. Andrews, Richard Askey, and Ranjan Roy. Special Functions. Cambridge University Press, Cambridge, 1999.
[3] John Burkardt. Probability density functions. http://people.sc.fsu.edu/~burkardt/m_src/prob/prob.html.
[4] Georges Charpak and Henri Broch. Devenez sorciers, devenez savants. Odile Jacob, 2002.
[5] Richard Durstenfeld. Algorithm 235: Random permutation. Commun. ACM, 7(7):420, 1964.
[6] J. Glaz and N. Balakrishnan. Scan Statistics and Applications. Birkhäuser Boston, 1999.
[7] Charles M. Grinstead and J. Laurie Snell. Introduction to Probability, Second Edition. American Mathematical Society, 1997.
[8] J. M. Hammersley and D. C. Handscomb. Monte Carlo Methods. Chapman and Hall, 1964.
[9] É. Janvresse and T. de la Rue. La loi des séries noires. La Recherche, 393:52-53, Janvier 2006. http://www.univ-rouen.fr/LMRS/Persopage/Janvresse/Publi/series_noires.pdf.
[10] D. E. Knuth. The Art of Computer Programming, Volume 2, Seminumerical Algorithms. Third Edition, Addison Wesley, Reading, MA, 1998.
[11] M. Loève. Probability Theory I, 4th Edition. Springer, 1963.
[12] Michael A. Malcolm and Cleve B. Moler. Urand: a universal random number generator. Technical report, Stanford University, Stanford, CA, USA, 1973.
[13] Lincoln E. Moses and Robert V. Oakford. Tables of random permutations. Stanford University Press, Stanford, Calif., 1963.
[14] World Health Organization. Life table, United States of America, 2009. http://www.who.int/.
[15] Dmitry Panchenko. Introduction to probability and statistics. Course 18.05, Lecture #2, Properties of Probability, Finite Sample Spaces, Some Combinatorics, lecture notes taken by Anna Vetter, 2005.
[16] Wolfram Research. Wolfram Alpha. http://www.wolframalpha.com.
[17] Sheldon M. Ross. Introduction to probability and statistics for engineers and scientists. John Wiley and Sons, 1987.
[18] Pascal Sebah and Xavier Gourdon. Introduction to the gamma function. http://www.profesores.frc.utn.edu.ar/electronica/analisisdeseniales/aplicaciones/Funcion_Gamma.pdf.
[19] Edmund Taylor Whittaker and George Neville Watson. A Course of Modern Analysis. Cambridge Mathematical Library, 1927.
[20] Wikipedia. Poker probability - Wikipedia, the free encyclopedia, 2009.
[21] A. Talha Yalta. The accuracy of statistical distributions in Microsoft Excel 2007. Comput. Stat. Data Anal., 52(10):4579-4586, 2008.


Index
Bernoulli, 30
binomial, 25
combination, 25
combinatorics, 11
complementary, 2
conditional
distribution function, 9
probability, 9
disjoint, 2
event, 3
factorial, 13, 21
fair die, 8
gamma, 15
grand, 34
intersection, 2
outcome, 3
permutation, 12
permutations, 23
poker, 28
rand, 34
random, 3
rank, 28
sample space, 3
seed, 35
subset, 2
suit, 28
tree diagram, 12
uniform, 8
union, 2
Venn, 2
