
PERFORMANCE ANALYSIS

OF COMMUNICATIONS
NETWORKS AND SYSTEMS

PIET VAN MIEGHEM


Delft University of Technology

cambridge university press


Cambridge, New York, Melbourne, Madrid, Cape Town, Singapore, São Paulo
Cambridge University Press
The Edinburgh Building, Cambridge CB2 2RU, UK
Published in the United States of America by Cambridge University Press, New York
www.cambridge.org
Information on this title: www.cambridge.org/9780521855150
Cambridge University Press 2006
This publication is in copyright. Subject to statutory exception and to the provision of
relevant collective licensing agreements, no reproduction of any part may take place
without the written permission of Cambridge University Press.
First published in print format 2006
ISBN-13  978-0-511-16917-5  eBook (NetLibrary)
ISBN-10  0-511-16917-5  eBook (NetLibrary)

ISBN-13  978-0-521-85515-0  hardback
ISBN-10  0-521-85515-2  hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs
for external or third-party internet websites referred to in this publication, and does not
guarantee that any content on such websites is, or will remain, accurate or appropriate.

Waar een wil is, is een weg.


to my father

to my wife Saskia
and my sons Vincent, Nathan and Laurens

Contents

Preface

1 Introduction

Part I Probability theory

2 Random variables
  2.1 Probability theory and set theory
  2.2 Discrete random variables
  2.3 Continuous random variables
  2.4 The conditional probability
  2.5 Several random variables and independence
  2.6 Conditional expectation

3 Basic distributions
  3.1 Discrete random variables
  3.2 Continuous random variables
  3.3 Derived distributions
  3.4 Functions of random variables
  3.5 Examples of other distributions
  3.6 Summary tables of probability distributions
  3.7 Problems

4 Correlation
  4.1 Generation of correlated Gaussian random variables
  4.2 Generation of correlated random variables
  4.3 The non-linear transformation method
  4.4 Examples of the non-linear transformation method
  4.5 Linear combination of independent auxiliary random variables
  4.6 Problem

5 Inequalities
  5.1 The minimum (maximum) and infimum (supremum)
  5.2 Continuous convex functions
  5.3 Inequalities deduced from the Mean Value Theorem
  5.4 The Markov and Chebyshev inequalities
  5.5 The Hölder, Minkowski and Young inequalities
  5.6 The Gauss inequality
  5.7 The dominant pole approximation and large deviations

6 Limit laws
  6.1 General theorems from analysis
  6.2 Law of Large Numbers
  6.3 Central Limit Theorem
  6.4 Extremal distributions

Part II Stochastic processes

7 The Poisson process
  7.1 A stochastic process
  7.2 The Poisson process
  7.3 Properties of the Poisson process
  7.4 The nonhomogeneous Poisson process
  7.5 The failure rate function
  7.6 Problems

8 Renewal theory
  8.1 Basic notions
  8.2 Limit theorems
  8.3 The residual waiting time
  8.4 The renewal reward process
  8.5 Problems

9 Discrete-time Markov chains
  9.1 Definition
  9.2 Discrete-time Markov chain
  9.3 The steady-state of a Markov chain
  9.4 Problems

10 Continuous-time Markov chains
  10.1 Definition
  10.2 Properties of continuous-time Markov processes
  10.3 Steady-state
  10.4 The embedded Markov chain
  10.5 The transitions in a continuous-time Markov chain
  10.6 Example: the two-state Markov chain in continuous-time
  10.7 Time reversibility
  10.8 Problems

11 Applications of Markov chains
  11.1 Discrete Markov chains and independent random variables
  11.2 The general random walk
  11.3 Birth and death process
  11.4 A random walk on a graph
  11.5 Slotted Aloha
  11.6 Ranking of webpages
  11.7 Problems

12 Branching processes
  12.1 The probability generating function
  12.2 The limit $W$ of the scaled random variables $W_k$
  12.3 The probability of extinction of a branching process
  12.4 Asymptotic behavior of $W$
  12.5 A geometric branching process

13 General queueing theory
  13.1 A queueing system
  13.2 The waiting process: Lindley's approach
  13.3 The Beneš approach to the unfinished work
  13.4 The counting process
  13.5 PASTA
  13.6 Little's Law

14 Queueing models
  14.1 The M/M/1 queue
  14.2 Variants of the M/M/1 queue
  14.3 The M/G/1 queue
  14.4 The GI/D/m queue
  14.5 The M/D/1/K queue
  14.6 The N*D/D/1 queue
  14.7 The AMS queue
  14.8 The cell loss ratio
  14.9 Problems

Part III Physics of networks

15 General characteristics of graphs
  15.1 Introduction
  15.2 The number of paths with $j$ hops
  15.3 The degree of a node in a graph
  15.4 Connectivity and robustness
  15.5 Graph metrics
  15.6 Random graphs
  15.7 The hopcount in a large, sparse graph with unit link weights
  15.8 Problems

16 The Shortest Path Problem
  16.1 The shortest path and the link weight structure
  16.2 The shortest path tree in $K_N$ with exponential link weights
  16.3 The hopcount $h_N$ in the URT
  16.4 The weight of the shortest path
  16.5 The flooding time $T_N$
  16.6 The degree of a node in the URT
  16.7 The minimum spanning tree
  16.8 The proof of the degree Theorem 16.6.1 of the URT
  16.9 Problems

17 The efficiency of multicast
  17.1 General results for $g_N(m)$
  17.2 The random graph $G_p(N)$
  17.3 The $k$-ary tree
  17.4 The Chuang-Sirbu law
  17.5 Stability of a multicast shortest path tree
  17.6 Proof of (17.16): $g_N(m)$ for random graphs
  17.7 Proof of Theorem 17.3.1: $g_N(m)$ for $k$-ary trees
  17.8 Problem

18 The hopcount to an anycast group
  18.1 Introduction
  18.2 General analysis
  18.3 The $k$-ary tree
  18.4 The uniform recursive tree (URT)
  18.5 Approximate analysis
  18.6 The performance measure in exponentially growing trees

Appendix A Stochastic matrices
Appendix B Algebraic graph theory
Appendix C Solutions of problems

Bibliography

Index

Preface
Performance analysis belongs to the domain of applied mathematics. The
major domain of application in this book concerns telecommunications systems
and networks. We will mainly use stochastic analysis and probability theory
to address problems in the performance evaluation of telecommunications
systems and networks. The first chapter will provide a motivation and a
statement of several problems.
This book aims to present methods rigorously, hence mathematically, with
minimal resorting to intuition. It is my belief that intuition is often gained
after the result is known and rarely before the problem is solved, unless the
problem is simple. Techniques and terminologies of axiomatic probability
(such as definitions of probability spaces, filtration, measures, etc.) have
been omitted and a more direct, less abstract approach has been adopted.
In addition, most of the important formulas are interpreted in the sense of
"What does this mathematical expression teach me?" This last step justifies
the word "applied", since most mathematical treatises do not interpret their
results, as interpretation carries the risk of being imprecise and incomplete.
The field of stochastic processes is much too large to be covered in a single
book and only a selected number of topics has been chosen. Most of the topics
are considered as classical. Perhaps the largest omission is a treatment
of Brownian processes and the many related applications. A weak excuse
for this omission (besides the considerable mathematical complexity) is that
Brownian theory applies more to physics (analogue fields) than to system
theory (discrete components). The list of omissions is rather long and only
the most noteworthy are summarized: recent concepts such as martingales
and the coupling theory of stochastic variables, queueing networks, scheduling
rules, and the theory of long-range dependent random variables that currently
prevails in the Internet. The confinement to stochastic analysis also
excludes the recent new framework, called "Network Calculus" by Le Boudec
and Thiran (2001). Network calculus is based on min-plus algebra and has
been applied to (Inter)network problems in a deterministic setting.
As prerequisites, familiarity with elementary probability and knowledge of
the theory of functions of a complex variable are assumed. Parts in
the text in small font refer to more advanced topics or to computations that
can be skipped at first reading. Part I (Chapters 2–6) reviews probability
theory and it is included to make the remainder self-contained. The book
essentially starts with Chapter 7 (Part II) on Poisson processes. The Poisson
process (independent increments and discontinuous sample paths) and
Brownian motion (independent increments but continuous sample paths)
are considered to be the most important basic stochastic processes. We
briefly touch upon renewal theory to move to Markov processes. The theory
of Markov processes is regarded as a foundation for many applications in
telecommunications systems, in particular queueing theory. A large part
of the book is devoted to Markov processes and their applications. The
last chapters of Part II dive into queueing theory. Inspired by intriguing
problems in telephony at the beginning of the twentieth century, Erlang
brought queueing theory onto the scientific scene. Since his investigations,
queueing theory has grown considerably. Especially during the last
decade with the advent of the Asynchronous Transfer Mode (ATM) and the
worldwide Internet, many early ideas have been refined (e.g. discrete-time
queueing theory, large deviation theory, scheduling control of prioritized
flows of packets) and new concepts (self-similar or fractal processes) have
been proposed. Part III covers current research on the physics of networks.
This Part III is undoubtedly the least mature and complete. In contrast to
most books, I have chosen to include the solutions to the problems in an
Appendix to support self-study.
I am grateful to colleagues and students whose input has greatly improved
this text. Fernando Kuipers and Stijn van Langen have corrected a large
number of misprints. Together with Fernando, Milena Janic and Almerima
Jamakovic have supplied me with exercises. Gerard Hooghiemstra has
made valuable comments and was always available for discussions about
my viewpoints. Bart Steyaert eagerly gave the finer details of the generating
function approach to the GI/D/m queue. Jan Van Mieghem has given
overall comments and suggestions besides his input with the computation of
correlations. Finally, I thank David Hemsley for his scrupulous corrections
in the original manuscript.
Although this book is intended to be of practical use, in the course of
writing it, I became more and more persuaded that mathematical rigor has
ample virtues of its own.
Per aspera ad astra

January 2006

Piet Van Mieghem

1
Introduction

The aim of this first chapter is to motivate why stochastic processes and
probability theory are useful to solve problems in the domain of
telecommunications systems and networks.
In any system, or for any transmission of information, there is always a
non-zero probability of failure or of error penetration. Many problems in
quantifying the failure rate, bit error rate or the computation of redundancy
to recover from hazards are successfully treated by probability theory. Often
we deal in communications with a large variety of signals, calls,
source-destination pairs, messages, the number of customers per region, and so on.
And, most often, precise information at any time is not available or, if it
is available, deterministic studies or simulations are simply not feasible due
to the large number of different parameters involved. For such problems, a
stochastic approach is often a powerful vehicle, as has been demonstrated
in the field of physics.
Perhaps the first impressive result of a stochastic approach was Boltzmann's
and Maxwell's statistical theory. They studied the behavior of particles in an
ideal gas and described how macroscopic quantities such as pressure and
temperature can be related to the microscopic motion of the huge number
of individual particles. Boltzmann also introduced the stochastic notion of
the thermodynamic concept of entropy $S$,
$$S = k \log W$$
where $W$ denotes the total number of ways in which the ensembles of particles
can be distributed in thermal equilibrium and where $k$ is a proportionality
factor, afterwards attributed to Boltzmann as the Boltzmann constant.
The pioneering work of these early physicists such as Boltzmann, Maxwell
and others was the germ of a large number of breakthroughs in science.
Shortly after their introduction of stochastic theory in classical physics, the
theory of quantum mechanics (see e.g. Cohen-Tannoudji et al., 1977) was
established. This theory proposes that the elementary building blocks of
nature, the atom and electrons, can only be described in a probabilistic
sense. The conceptually difficult notion of a wave function whose squared
modulus expresses the probability that a set of particles is in a certain state
and Heisenberg's uncertainty relation exclude in a dramatic way our
deterministic, macroscopic view on nature at the fine atomic scale.
At about the same time as the theory of quantum mechanics was being
created, Erlang applied probability theory to the field of telecommunications.
Erlang succeeded in determining the number of telephone input lines
$m$ of a switch in order to serve $N_S$ customers with a certain probability $p$.
Perhaps his most used formula is the Erlang B formula (14.17), derived in
Section 14.2.2,
$$\Pr[N_S = m] = \frac{\rho^m / m!}{\sum_{j=0}^{m} \rho^j / j!}$$
where the load or traffic intensity $\rho$ is the ratio of the arrival rate of calls to
the telephone local exchange or switch over the processing rate of the switch
per line. By equating the desired blocking probability $p = \Pr[N_S = m]$, say
$p = 10^{-4}$, the number of input lines $m$ can be computed for each load $\rho$.
Due to its importance, books with tables relating $p$, $\rho$ and $m$ were published.
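The dimensioning step described above can be sketched in a few lines of Python. This is only an illustration: the function names `erlang_b` and `lines_needed` are hypothetical, and the blocking probability is evaluated with the well-known recursion $B_j = \rho B_{j-1} / (j + \rho B_{j-1})$, $B_0 = 1$, which is algebraically equivalent to the Erlang B formula but avoids large factorials.

```python
def erlang_b(m: int, rho: float) -> float:
    """Blocking probability Pr[N_S = m] for m lines and offered load rho."""
    b = 1.0  # with zero lines every call is blocked
    for j in range(1, m + 1):
        # Recursion equivalent to (rho^m / m!) / sum_{j=0}^{m} rho^j / j!
        b = rho * b / (j + rho * b)
    return b


def lines_needed(rho: float, p: float) -> int:
    """Smallest number of lines m whose blocking probability is at most p."""
    if rho < 0 or not 0 < p <= 1:
        raise ValueError("need rho >= 0 and 0 < p <= 1")
    m, b = 0, 1.0
    while b > p:
        m += 1
        b = rho * b / (m + rho * b)
    return m


if __name__ == "__main__":
    # Lines required for a load of 10 Erlang and blocking probability 1e-4.
    print(lines_needed(10.0, 1e-4))
```

Such a routine plays the role of the printed tables mentioned above: for each load $\rho$ and target $p$ it returns the number of lines $m$.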
Another pioneer in the field of communications who deserves to be mentioned
is Shannon. Shannon explored the concept of entropy $S$. He introduced
(see e.g. Walrand, 1998) the notion of the Shannon capacity of a
channel, the maximum rate at which bits can be transmitted with arbitrarily
small (but non-zero) probability of errors, and the concept of the entropy
rate of a source, which is the minimum average number of bits per symbol
required to encode the output of a source. Many others have extended
his basic ideas and so it is fair to say that Shannon founded the field of
information theory.
A recent important driver in telecommunication is the concept of quality
of service (QoS). Customers can use the network to transmit different
types of information such as pictures, files, voice, etc. by requiring a specific
level of service depending on the type of transmitted information. For
example, a telephone conversation requires that the voice packets arrive at
the receiver $D$ ms later, while a file transfer is mostly not time critical but
requires an extremely low information loss probability. The value of the
mouth-to-ear delay $D$ is clearly related to the perceived quality of the voice
conversation. As long as $D < 150$ ms, the voice conversation has toll quality,
which is, roughly speaking, the quality that we are used to in classical
telephony. When $D$ exceeds 150 ms, rapid degradation is experienced and
when $D > 300$ ms, most of the test persons have great difficulty in
understanding the conversation. However, perceived quality may change from
person to person and is difficult to determine, even for telephony. For example,
if the test person knows a priori that the conversation is transmitted
over a mobile or wireless channel as in GSM, he or she is willing to tolerate
a lower quality. Therefore, quality of service is related both to the nature
of the information and to the individual desire and perception. In future
Internetworking, it is believed that customers may request a certain QoS
for each type of information. Depending on the level of stringency, the network
may either allow or refuse the customer. Since customers will also pay
an amount related to this QoS stringency, the network function that determines
whether to accept or refuse a call for service will be of crucial interest
to any network operator. Let us now state the connection admission control
(CAC) problem for a voice conversation to illustrate the relation to stochastic
analysis: "How many customers $m$ are allowed in order to guarantee that
the ensemble of all voice packets reaches the destination within $D$ ms with
probability $p$?" This problem is exceptionally difficult because it depends on
the voice codecs used, the specifics of the network topology, the capacity of
the individual network elements, the arrival process of calls from the customers,
the duration of the conversation and other details. Therefore, we
will simplify the question. Let us first assume that the delay is only caused
by the waiting time of a voice packet in the queue of a router (or switch).
As we will see in Chapter 13, this waiting time $T$ of voice packets in a single
queueing system depends on (a) the arrival process: the way voice packets
arrive, and (b) the service process: how they are processed. Let us assume
that the arrival process specified by the average arrival rate $\lambda$ and the service
process specified by the average service rate $\mu$ are known. Clearly, the
arrival rate $\lambda$ is connected to the number of customers $m$. A simplified
statement of the CAC problem is, "What is the maximum $\lambda$ allowed such
that $\Pr[T > D] < \epsilon$?" In essence, the CAC problem consists in computing
the tail probability of a quantity that depends on parameters of interest. We
have elaborated on the CAC problem because it is a basic design problem
that appears under several disguises. A related dimensioning problem is the
determination of the buffer size in a router in order not to lose more than a
certain number of packets with probability $p$, given the arrival and service
process. The above mentioned problem of Erlang is a third example. Another
example treated in Chapter 18 is the server placement problem: "How
many replicated servers $m$ are needed to guarantee that any user can access
the information within $k$ hops with probability $\Pr[h_N(m) > k] \le \epsilon$", where
$\epsilon$ is a certain level of stringency and $h_N(m)$ is the number of hops towards the
most nearby of the $m$ servers in a network with $N$ routers.
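To make the simplified CAC question concrete, the sketch below answers it under an extra assumption that the text above does not make: the router is modeled as an M/M/1 queue with first-come-first-served service, and $T$ is taken to be the total time a packet spends in the system. In that model $T$ is exponentially distributed with rate $\mu - \lambda$, so that $\Pr[T > D] = e^{-(\mu - \lambda) D}$. The function names and the numerical values are hypothetical.

```python
import math


def tail_probability(lam: float, mu: float, d: float) -> float:
    """Pr[T > d] for the system time T of a stable M/M/1 queue (assumed model)."""
    if not 0 <= lam < mu:
        raise ValueError("stability requires 0 <= lam < mu")
    return math.exp(-(mu - lam) * d)


def max_arrival_rate(mu: float, d: float, eps: float) -> float:
    """Boundary value of lam at which Pr[T > d] equals eps.

    Any smaller arrival rate satisfies Pr[T > d] < eps. The result is 0.0 when
    even an almost empty queue cannot meet the target.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    return max(0.0, mu - math.log(1.0 / eps) / d)


if __name__ == "__main__":
    mu = 1000.0   # packets per second (illustrative)
    d = 0.150     # delay bound of 150 ms, expressed in seconds
    eps = 1e-4    # tolerated probability of exceeding the bound
    lam = max_arrival_rate(mu, d, eps)
    print(lam, tail_probability(lam, mu, d))
```

The admissible arrival rate then translates into a number of customers $m$ once the traffic generated per customer is known.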
The popularity of the Internet results in a number of new challenges. The
traditional mathematical models such as the Erlang B formula assume "smooth"
traffic flows (small correlation and Markovian in nature). However, TCP/IP
traffic has been shown to be "bursty" (long-range dependent, self-similar and
even chaotic, non-Markovian (Veres and Boda, 2000)). As a consequence,
many traditional dimensioning and control problems ask for a new solution.
The self-similar and long-range dependent TCP/IP traffic is mainly
caused by new complex interactions between protocols and technologies (e.g.
TCP/IP/ATM/SDH) and by the transport of information other than voice. It
is observed that the size of content in the Internet varies considerably,
causing the "Noah effect": although immense floods are
extremely rare, their occurrence significantly impacts Internet behavior on
a global scale. Unfortunately, the mathematics to cope with self-similar
and long-range dependent processes turns out to be fairly complex and
beyond the scope of this book.
Finally, we mention the current interest in understanding and modeling
complex networks such as the Internet, biological networks, social networks
and utility infrastructures for water, gas, electricity and transport (cars,
goods, trains). Since these networks consist of a huge number of nodes $N$
and links $L$, classical and algebraic graph theory is often not suited to
produce even approximate results. The beginning of probabilistic graph theory
is commonly attributed to the appearance of papers by Erdős and Rényi in
the late 1940s. They investigated a particularly simple growing model for a
graph: start from $N$ nodes and connect in each step an arbitrary random,
not yet connected pair of nodes until all $L$ links are used. After about $N/2$
steps, as shown in Section 16.7.1, they observed the birth of a giant component
that, in subsequent steps, swallows the smaller ones at a high rate.
This phenomenon is called a phase transition and often occurs in nature.
In physics it is studied in, for example, percolation theory. To some extent,
the Internet's graph bears some resemblance to the Erdős-Rényi random
graph. The Internet is best regarded as a dynamic and growing network,
whose graph is continuously changing. Yet, in order to deploy services over
the Internet, an accurate graph model that captures the relevant structural
properties is desirable. As shown in Part III, a probabilistic approach based
on random graphs seems an efficient way to learn about the Internet's
intriguing behavior. Although the Internet's topology is not a simple
Erdős-Rényi random graph, results such as the hopcount of the shortest path and
the size of a multicast tree deduced from the simple random graphs provide
a first order estimate for the Internet. Moreover, analytic formulas based
on other classes of graphs than the simple random graph prove difficult to
obtain. This observation is similar to queueing theory, where, besides the
M/G/x class of queues, hardly any closed-form expressions exist.
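The growing random graph model described above is easy to simulate. The sketch below is an illustration only, not a method taken from the book: it adds random links between distinct, not yet directly linked node pairs and tracks the size of the largest component with a union-find structure. The function name, the seed and the parameter values are hypothetical.

```python
import random


def largest_component_growth(n: int, steps: int, seed: int = 0) -> list:
    """Add `steps` random links and return the largest component size after each."""
    if steps > n * (n - 1) // 2:
        raise ValueError("more links requested than node pairs available")
    rng = random.Random(seed)
    parent = list(range(n))   # union-find forest
    size = [1] * n            # component size, valid at the roots
    links = set()

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest, history = 1, []
    while len(history) < steps:
        u, v = rng.sample(range(n), 2)
        link = (min(u, v), max(u, v))
        if link in links:
            continue  # this pair already has a link, draw again
        links.add(link)
        ru, rv = find(u), find(v)
        if ru != rv:
            if size[ru] < size[rv]:
                ru, rv = rv, ru
            parent[rv] = ru          # merge the smaller component into the larger
            size[ru] += size[rv]
            largest = max(largest, size[ru])
        history.append(largest)
    return history


if __name__ == "__main__":
    n = 10000
    history = largest_component_growth(n, 2 * n, seed=1)
    for step in (n // 4, n // 2, n, 2 * n):
        print(step, history[step - 1] / n)
```

Printing the fraction of nodes in the largest component before and after roughly $N/2$ added links gives a feeling for how abruptly the giant component appears.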
We hope that this brief overview motivates sufficiently to surmount the
mathematical barriers. Skill with probability theory is deemed necessary
to understand complex phenomena in telecommunications. Once mastered,
the power and beauty of mathematics will be appreciated.

Part I
Probability theory

2
Random variables

This chapter reviews basic concepts from probability theory. A random
variable (rv) is a variable that takes certain values by chance. Throughout this
book, this imprecise and intuitive definition suffices. The precise definition
involves axiomatic probability theory (Billingsley, 1995).
Here, a distinction between discrete and continuous random variables is
made, although a unified approach including also mixed cases via the
Stieltjes integral (Hardy et al., 1999, pp. 152–157), $\int g(x)\,df(x)$, is possible. In
general, the definition of the distribution $F_X(x) = \Pr[X \le x]$ holds in both cases, and
$$\begin{aligned}
\int g(x)\,dF_X(x) &= \sum_{k} g(k) \Pr[X = k] && \text{where } X \text{ is a discrete rv} \\
&= \int g(x) \frac{dF_X(x)}{dx}\,dx && \text{where } X \text{ is a continuous rv}
\end{aligned}$$
In most practical situations, the Stieltjes integral reduces to the Riemann
integral; otherwise, Lebesgue's theory of integration and measure theory (Royden,
1988) is required.

2.1 Probability theory and set theory


Pascal (1623–1662) is commonly regarded as one of the founders of
probability theory. In his days, there was much interest in games of chance¹ and
the likelihood of winning a game. In most of these games, there was a finite
number $n$ of possible outcomes and each of them was equally likely. The
probability of the event $A$ of interest was defined as
$$\Pr[A] = \frac{n_A}{n}$$
where $n_A$ is the number of favorable outcomes (sample points of $A$). If the
number of outcomes of an experiment is not finite, this classical definition
of probability does not suffice anymore. In order to establish a coherent and
precise theory, probability theory employs concepts of group or set theory.

¹ "La règle des partis", a chapter in Pascal's mathematical work (Pascal, 1954), consists of a
series of letters to Fermat that discuss the following problem (together with a more complex
question that is essentially a variant of the probability of gambler's ruin treated in Section
11.2.1): Consider the game in which 2 dice are thrown $n$ times. How many times $n$ do we have
to throw the 2 dice to throw double six with probability $p = \frac{1}{2}$?
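The classical definition can be tried out on the dice problem from Pascal's letters. The sketch below is an illustration under two stated assumptions: the dice are fair and successive throws are independent, so that the probability of at least one double six in $n$ throws is $1 - (35/36)^n$. The names used are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space of one throw of two fair dice: n = 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
favorable = [o for o in outcomes if o == (6, 6)]

# Classical definition Pr[A] = n_A / n.
p_double_six = Fraction(len(favorable), len(outcomes))


def throws_needed(p_single: Fraction, target: Fraction) -> int:
    """Smallest number of independent throws with success probability >= target."""
    n, p_none = 0, Fraction(1)
    while 1 - p_none < target:
        n += 1
        p_none *= 1 - p_single  # probability of no success in n throws
    return n


if __name__ == "__main__":
    print(p_double_six)
    print(throws_needed(p_double_six, Fraction(1, 2)))
```

Exact rational arithmetic via `Fraction` avoids rounding questions near the threshold $\frac{1}{2}$.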
The set of all possible outcomes of an experiment is called the sample
space $\Omega$. A possible outcome of an experiment is called a sample point $\omega$
that is an element of the sample space $\Omega$. An event $A$ consists of a set of
sample points. An event $A$ is thus a subset of the sample space $\Omega$. The
complement $A^c$ of an event $A$ consists of all sample points of the sample
space $\Omega$ that are not in (the set) $A$, thus $A^c = \Omega \setminus A$. Clearly, $(A^c)^c = A$
and the complement of the sample space is the empty set, $\Omega^c = \emptyset$ or, vice
versa, $\emptyset^c = \Omega$. A family $\mathcal{F}$ of events is a set of events, hence a collection of
subsets of the sample space, that possesses particular events as elements. More precisely,
a family $\mathcal{F}$ of events satisfies the three conditions that define a $\sigma$-field²: (a)
$\emptyset \in \mathcal{F}$, (b) if $A_1, A_2, \ldots \in \mathcal{F}$, then $\bigcup_{j=1}^{\infty} A_j \in \mathcal{F}$ and (c) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
These conditions guarantee that $\mathcal{F}$ is closed under countable unions and
intersections of events.
Events and the probability of these events are connected by a probability
measure $\Pr[\cdot]$ that assigns to each event of the family $\mathcal{F}$ of events of a sample
space $\Omega$ a real number in the interval $[0, 1]$. As Axiom 1, we require that
$\Pr[\Omega] = 1$. If $\Pr[A] = 0$, the occurrence of the event $A$ is not possible, while
$\Pr[A] = 1$ means that the event $A$ is certain to occur. If $\Pr[A] = p$ with
$0 < p < 1$, the event $A$ has probability $p$ to occur.
If the events $A$ and $B$ have no sample points in common, $A \cap B = \emptyset$,
the events $A$ and $B$ are called mutually exclusive events. As an example,
the event and its complement are mutually exclusive because $A \cap A^c = \emptyset$.
Axiom 2 of a probability measure is that for mutually exclusive events $A$
and $B$ holds that $\Pr[A \cup B] = \Pr[A] + \Pr[B]$. The definition of a probability
measure and the two axioms are sufficient to build a consistent framework
on which probability theory is founded. Since $\Pr[\emptyset] = 0$ (which follows from
Axiom 2 because $A \cap \emptyset = \emptyset$ and $A = A \cup \emptyset$), for mutually exclusive events
$A$ and $B$ holds that $\Pr[A \cap B] = 0$.

² A field $\mathcal{F}$ possesses the properties:
(i) $\emptyset \in \mathcal{F}$;
(ii) if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$ and $A \cap B \in \mathcal{F}$;
(iii) if $A \in \mathcal{F}$, then $A^c \in \mathcal{F}$.
This definition is redundant. For, we have by (ii) and (iii) that $(A \cup B)^c \in \mathcal{F}$. Further, by De
Morgan's law $(A \cup B)^c = A^c \cap B^c$, which can be deduced from Figure 2.1 and again by (iii),
the argument shows that the reduced statement (ii), if $A, B \in \mathcal{F}$, then $A \cup B \in \mathcal{F}$, is sufficient
to also imply that $A \cap B \in \mathcal{F}$.
As a classical example that explains the formal definitions, let us consider
the experiment of throwing a fair die. The sample space consists of
all possible outcomes: $\Omega = \{1, 2, 3, 4, 5, 6\}$. A particular outcome of the
experiment, say $\omega = 3$, is a sample point $\omega \in \Omega$. One may be interested in
the event $A$ where the outcome is even, in which case $A = \{2, 4, 6\} \subset \Omega$ and
$A^c = \{1, 3, 5\}$.
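The die example maps directly onto Python sets. The following minimal sketch assumes the classical (equally likely) probability measure on the six faces; the names `OMEGA` and `pr` are hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))   # sample space of a fair die
A = frozenset({2, 4, 6})         # event: the outcome is even
A_c = OMEGA - A                  # complement of A


def pr(event: frozenset) -> Fraction:
    """Classical probability measure on OMEGA (all faces equally likely)."""
    if not event <= OMEGA:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(OMEGA))


if __name__ == "__main__":
    omega = 3                      # one particular sample point
    print(omega in OMEGA, omega in A)
    print(sorted(A_c), pr(A), pr(A_c))
    print(pr(OMEGA), pr(A & A_c))  # Axiom 1 and Pr of the empty set
```

The last line illustrates Axiom 1, $\Pr[\Omega] = 1$, and the fact that an event and its complement are mutually exclusive.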
If $A$ and $B$ are events, the union of these events $A \cup B$ can be written
using set theory as
$$A \cup B = (A \cap B) \cup (A^c \cap B) \cup (A \cap B^c)$$
because $A \cap B$, $A^c \cap B$ and $A \cap B^c$ are mutually exclusive events. The relation
is immediately understood by drawing a Venn diagram as in Fig. 2.1.

[Figure: two overlapping sets $A$ and $B$ inside the sample space $\Omega$, with the regions $A \cap B^c$, $A \cap B$ and $A^c \cap B$ marked.]
Fig. 2.1. A Venn diagram illustrating the union $A \cup B$.

Taking the probability measure of the union yields
$$\begin{aligned}
\Pr[A \cup B] &= \Pr[(A \cap B) \cup (A^c \cap B) \cup (A \cap B^c)] \\
&= \Pr[A \cap B] + \Pr[A^c \cap B] + \Pr[A \cap B^c]
\end{aligned} \tag{2.1}$$
where the last relation follows from Axiom 2. Figure 2.1 shows that $A =
(A \cap B) \cup (A \cap B^c)$ and $B = (A \cap B) \cup (A^c \cap B)$. Since the events are
mutually exclusive, Axiom 2 states that
$$\Pr[A] = \Pr[A \cap B] + \Pr[A \cap B^c]$$
$$\Pr[B] = \Pr[A \cap B] + \Pr[A^c \cap B]$$
Substitution into (2.1) yields the important relation
$$\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B] \tag{2.2}$$
Although derived for the measure $\Pr[\cdot]$, relation (2.2) also holds for other
measures, for example, the cardinality (the number of elements) of a set.
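Relations (2.1) and (2.2) can be checked numerically on the fair die. In the sketch below the event $B$ (an outcome of at least four) is an illustrative choice that does not come from the text, and the helper `pr` is hypothetical.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))   # fair die
A = frozenset({2, 4, 6})         # even outcome
B = frozenset({4, 5, 6})         # outcome of at least four (illustrative choice)


def pr(event: frozenset) -> Fraction:
    """Classical probability measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))


# The three mutually exclusive pieces of the union used in (2.1).
pieces = [A & B, (OMEGA - A) & B, A & (OMEGA - B)]

lhs = pr(A | B)
via_2_1 = sum(pr(piece) for piece in pieces)
via_2_2 = pr(A) + pr(B) - pr(A & B)

if __name__ == "__main__":
    print(lhs, via_2_1, via_2_2)
    # The same relation for the cardinality measure.
    print(len(A | B), len(A) + len(B) - len(A & B))
```

The final line shows that (2.2) holds equally for the cardinality of the sets, as remarked above.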

2.1.1 The inclusion-exclusion formula


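A generalization of the relation (2.2) is the inclusion-exclusion formula,
$$\begin{aligned}
\Pr\left[\bigcup_{k=1}^{n} A_k\right] ={}& \sum_{k_1=1}^{n} \Pr[A_{k_1}] - \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2}] \\
&+ \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \sum_{k_3=k_2+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] \\
&+ \cdots + (-1)^{n-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\bigcap_{j=1}^{n} A_{k_j}\right]
\end{aligned} \tag{2.3}$$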

The formula shows that the probability of the union consists of the sum of
probabilities of the individual events (first term). Since sample points can
belong to more than one event $A_k$, the first term counts some of them more than once.
The second term removes all probabilities of sample points that belong to
precisely two event sets. However, by doing so (draw a Venn diagram), we
also subtract the probabilities of sample points that belong to three event
sets more than needed. The third term adds these again, and so on. The
inclusion-exclusion formula can be written more compactly as,
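$$\Pr\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_j=k_{j-1}+1}^{n} \Pr\left[\bigcap_{m=1}^{j} A_{k_m}\right] \tag{2.4}$$
or with
$$S_j = \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \Pr\left[\bigcap_{m=1}^{j} A_{k_m}\right]$$
as
$$\Pr\left[\bigcup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} S_j \tag{2.5}$$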
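The compact form (2.5) translates almost literally into code: $S_j$ is a sum over all $j$-element subsets of the events. The sketch below assumes a finite sample space with equally likely outcomes; the function name and the three die events are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from itertools import combinations


def union_probability(events, omega) -> Fraction:
    """Pr[A_1 U ... U A_n] via the inclusion-exclusion formula (2.5)."""
    omega = frozenset(omega)
    events = [frozenset(e) for e in events]
    total = Fraction(0)
    for j in range(1, len(events) + 1):
        # S_j: sum of Pr[intersection] over all index sets k_1 < ... < k_j.
        s_j = sum(
            Fraction(len(frozenset.intersection(*subset)), len(omega))
            for subset in combinations(events, j)
        )
        total += (-1) ** (j - 1) * s_j
    return total


if __name__ == "__main__":
    omega = range(1, 7)                      # fair die
    events = [{2, 4, 6}, {3, 6}, {5, 6}]     # illustrative events
    direct = Fraction(len(set().union(*events)), 6)
    print(union_probability(events, omega), direct)
```

Because the number of subsets grows as $2^n$, this direct evaluation is only practical for a small number of events.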
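Proof of the inclusion-exclusion formula³: Let $A = \bigcup_{k=1}^{n-1} A_k$ and $B = A_n$ such that
$A \cup B = \bigcup_{k=1}^{n} A_k$ and $A \cap B = A_n \cap \bigcup_{k=1}^{n-1} A_k = \bigcup_{k=1}^{n-1} (A_k \cap A_n)$ by the distributive law in set
theory, then application of (2.2) yields the recursion in $n$
$$\Pr\left[\bigcup_{k=1}^{n} A_k\right] = \Pr\left[\bigcup_{k=1}^{n-1} A_k\right] + \Pr[A_n] - \Pr\left[\bigcup_{k=1}^{n-1} (A_k \cap A_n)\right] \tag{2.6}$$
By direct substitution of $n \to n - 1$, we have
$$\Pr\left[\bigcup_{k=1}^{n-1} A_k\right] = \Pr\left[\bigcup_{k=1}^{n-2} A_k\right] + \Pr[A_{n-1}] - \Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_{n-1})\right]$$
while substitution in this formula of $A_k \to A_k \cap A_n$ gives
$$\Pr\left[\bigcup_{k=1}^{n-1} (A_k \cap A_n)\right] = \Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_n)\right] + \Pr[A_{n-1} \cap A_n] - \Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right]$$
Substitution of the last two relations into (2.6) yields
$$\begin{aligned}
\Pr\left[\bigcup_{k=1}^{n} A_k\right] ={}& \Pr[A_{n-1}] + \Pr[A_n] - \Pr[A_{n-1} \cap A_n] + \Pr\left[\bigcup_{k=1}^{n-2} A_k\right] \\
&- \Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_{n-1})\right] - \Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_n)\right] + \Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right]
\end{aligned} \tag{2.7}$$
Similarly, in a next iteration we use (2.6) after suitable modification in the right-hand side of (2.7)
to lower the upper index in the union,
$$\begin{aligned}
\Pr\left[\bigcup_{k=1}^{n-2} A_k\right] &= \Pr\left[\bigcup_{k=1}^{n-3} A_k\right] + \Pr[A_{n-2}] - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_{n-2})\right] \\
\Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_{n-1})\right] &= \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_{n-1})\right] + \Pr[A_{n-2} \cap A_{n-1}] - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_{n-1} \cap A_{n-2})\right] \\
\Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_n)\right] &= \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n)\right] + \Pr[A_{n-2} \cap A_n] - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-2})\right] \\
\Pr\left[\bigcup_{k=1}^{n-2} (A_k \cap A_n \cap A_{n-1})\right] &= \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1})\right] + \Pr[A_{n-2} \cap A_n \cap A_{n-1}] \\
&\quad - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1} \cap A_{n-2})\right]
\end{aligned}$$
The result is
$$\begin{aligned}
\Pr\left[\bigcup_{k=1}^{n} A_k\right] ={}& \Pr[A_{n-2}] + \Pr[A_{n-1}] + \Pr[A_n] - \Pr[A_{n-2} \cap A_{n-1}] - \Pr[A_{n-2} \cap A_n] \\
&- \Pr[A_{n-1} \cap A_n] + \Pr[A_{n-2} \cap A_{n-1} \cap A_n] + \Pr\left[\bigcup_{k=1}^{n-3} A_k\right] \\
&- \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_{n-2})\right] - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_{n-1})\right] - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n)\right] \\
&+ \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_{n-1} \cap A_{n-2})\right] + \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-2})\right] \\
&+ \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1})\right] - \Pr\left[\bigcup_{k=1}^{n-3} (A_k \cap A_n \cap A_{n-1} \cap A_{n-2})\right]
\end{aligned}$$
which starts revealing the structure of (2.3). Rather than continuing the iterations, we prove the
validity of the inclusion-exclusion formula (2.3) via induction. In case $n = 2$, the basic expression
(2.2) is found. Assume that (2.3) holds for $n$, then the case for $n + 1$ must obey (2.6) where
$n \to n + 1$,
$$\Pr\left[\bigcup_{k=1}^{n+1} A_k\right] = \Pr\left[\bigcup_{k=1}^{n} A_k\right] + \Pr[A_{n+1}] - \Pr\left[\bigcup_{k=1}^{n} (A_k \cap A_{n+1})\right]$$
Substitution of (2.3) into the above expression yields, after suitable grouping of the terms,
$$\begin{aligned}
\Pr\left[\bigcup_{k=1}^{n+1} A_k\right] ={}& \Pr[A_{n+1}] + \sum_{k_1=1}^{n} \Pr[A_{k_1}] - \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2}] - \sum_{k_1=1}^{n} \Pr[A_{k_1} \cap A_{n+1}] \\
&+ \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \sum_{k_3=k_2+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] + \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \Pr[A_{k_1} \cap A_{k_2} \cap A_{n+1}] \\
&+ \cdots + (-1)^{n-1} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\bigcap_{j=1}^{n} A_{k_j}\right] \\
&+ \cdots + (-1)^{n} \sum_{k_1=1}^{n} \sum_{k_2=k_1+1}^{n} \cdots \sum_{k_n=k_{n-1}+1}^{n} \Pr\left[\bigcap_{j=1}^{n} A_{k_j} \cap A_{n+1}\right] \\
={}& \sum_{k_1=1}^{n+1} \Pr[A_{k_1}] - \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \Pr[A_{k_1} \cap A_{k_2}] \\
&+ \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \sum_{k_3=k_2+1}^{n+1} \Pr[A_{k_1} \cap A_{k_2} \cap A_{k_3}] \\
&+ \cdots + (-1)^{n} \sum_{k_1=1}^{n+1} \sum_{k_2=k_1+1}^{n+1} \cdots \sum_{k_{n+1}=k_n+1}^{n+1} \Pr\left[\bigcap_{j=1}^{n+1} A_{k_j}\right]
\end{aligned}$$
which proves (2.3).

³ Another proof (Grimmett and Stirzaker, 2001, p. 56) uses the indicator function defined in
Section 2.2.1. Useful indicator function relations are
$$\begin{aligned}
1_{A \cap B} &= 1_A 1_B \\
1_{A^c} &= 1 - 1_A \\
1_{A \cup B} &= 1 - 1_{(A \cup B)^c} = 1 - 1_{A^c \cap B^c} = 1 - 1_{A^c} 1_{B^c} \\
&= 1 - (1 - 1_A)(1 - 1_B) = 1_A + 1_B - 1_A 1_B = 1_A + 1_B - 1_{A \cap B}
\end{aligned}$$
Generalizing the last relation yields
$$1_{\bigcup_{k=1}^{n} A_k} = 1 - \prod_{k=1}^{n} (1 - 1_{A_k})$$
Multiplying out and taking the expectations using (2.13) leads to (2.3).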

q+1
[

l
k
Pr Kq
m=1 Dnm K Dq+1

nq+1 =nq +1

Although its appearance is imposing, the inclusion-exclusion formula is useful when dealing with dependent random variables because of its general nature. In particular, if $\Pr\left[\cap_{m=1}^{j} A_{k_m}\right] = a_j$ is not a function of the specific indices $k_m$, the inclusion-exclusion formula (2.4) becomes more attractive,
$$
\Pr\left[\cup_{k=1}^{n} A_k\right] = \sum_{j=1}^{n} (-1)^{j-1} a_j \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} 1 = \sum_{j=1}^{n} (-1)^{j-1} \binom{n}{j} a_j
$$
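As a hedged illustration of this symmetric case, consider the classical matching problem, which is not taken from the text above: in a random permutation of $n$ items, let $A_k$ be the event that item $k$ stays in place. Then $a_j = (n-j)!/n!$ depends only on $j$, so the simplified formula applies. The sketch below (all function names are illustrative) compares the formula with brute-force enumeration.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def prob_union_symmetric(n, a):
    """Pr[union of n events] when every j-fold intersection has probability a(j)."""
    return sum((-1) ** (j - 1) * comb(n, j) * a(j) for j in range(1, n + 1))

def prob_some_fixed_point_formula(n):
    # a_j = (n - j)! / n!: j given items are fixed, the other n - j are free.
    return prob_union_symmetric(n, lambda j: Fraction(factorial(n - j), factorial(n)))

def prob_some_fixed_point_bruteforce(n):
    # Enumerate all n! permutations and count those with at least one fixed point.
    hits = sum(any(p[i] == i for i in range(n)) for p in permutations(range(n)))
    return Fraction(hits, factorial(n))

for n in range(1, 7):
    assert prob_some_fixed_point_formula(n) == prob_some_fixed_point_bruteforce(n)
print(prob_some_fixed_point_formula(4))  # 5/8
```

Exact rational arithmetic is used so that the comparison with the enumeration is an equality rather than a floating-point approximation.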

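An application of the latter formula to multicast can be found in Chapter 17 and many others are in Feller (1970, Chapter IV). Sometimes it is useful to reason with the complement of the union $\left(\cup_{k=1}^{n} A_k\right)^c = \Omega \setminus \cup_{k=1}^{n} A_k = \cap_{k=1}^{n} A_k^c$. Applying Axiom 2 to $\left(\cup_{k=1}^{n} A_k\right)^c \cup \left(\cup_{k=1}^{n} A_k\right) = \Omega$,
$$
\Pr\left[\left(\cup_{k=1}^{n} A_k\right)^c\right] = \Pr[\Omega] - \Pr\left[\cup_{k=1}^{n} A_k\right]
$$
and using Axiom 1 and the inclusion-exclusion formula (2.5), we obtain
$$
\Pr\left[\left(\cup_{k=1}^{n} A_k\right)^c\right] = 1 - \sum_{j=1}^{n} (-1)^{j-1} S_j = \sum_{j=0}^{n} (-1)^{j} S_j \tag{2.8}
$$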


with the convention that $S_0 = 1$. Boole's inequalities
$$
\Pr\left[\cup_{k=1}^{n} A_k\right] \le \sum_{k=1}^{n} \Pr[A_k] \tag{2.9}
$$
$$
\Pr\left[\cap_{k=1}^{n} A_k\right] \ge 1 - \sum_{k=1}^{n} \Pr[A_k^c]
$$
are derived as consequences of the inclusion-exclusion formula (2.3). The equality sign in (2.9) holds only if all events are mutually exclusive, whilst the inequality sign follows from the fact that possible overlaps in events are, in contrast to the inclusion-exclusion formula (2.3), not subtracted.
The inclusion-exclusion formula is of a more general nature and also applies to other measures on sets than $\Pr[\cdot]$, for example to the cardinality as mentioned above. For the cardinality of a set $A$, which is usually denoted by $|A|$, the inclusion-exclusion variant of (2.8) is
$$
\left|\left(\cup_{k=1}^{n} A_k\right)^c\right| = \sum_{j=0}^{n} (-1)^{j} |S_j| \tag{2.10}
$$
where the total number of elements in the sample space is $|S_0| = N$ and
$$
|S_j| = \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \left|\cap_{m=1}^{j} A_{k_m}\right|
$$

A nice illustration of the above formula (2.10) applies to the sieve of Eratosthenes (Hardy and Wright, 1968, p. 4), a procedure to construct the table of prime numbers⁴ up to $N$. Consider the increasing sequence of integers
$$
\Omega = \{2, 3, 4, \ldots, N\}
$$
and remove successively all multiples of 2 (even numbers starting from 4, 6, ...), all multiples of 3 (starting from $3^2$ and not yet removed previously), all multiples of 5, all multiples of the next number larger than 5 and still in the list (which is the prime 7) and so on, up to all multiples of the largest possible prime divisor that is equal to or smaller than $\left[\sqrt{N}\right]$. Here $[x]$ is the largest integer smaller than or equal to $x$. The remaining numbers in the list are prime numbers. Let us now compute the number of primes $\pi(N)$ smaller than or equal to $N$ by using the inclusion-exclusion formula (2.10).

⁴ An integer number $p$ is prime if $p > 1$ and $p$ has no other integer divisors than 1 and itself $p$. The sequence of the first primes is 2, 3, 5, 7, 11, 13, etc. If $a$ and $b$ are divisors of $n$, then $n = ab$ from which it follows that $a$ and $b$ cannot both exceed $\sqrt{n}$. Hence, any composite number $n$ is divisible by a prime $p$ that does not exceed $\sqrt{n}$.


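The number of primes smaller than or equal to a real number $x$ is $\pi(x)$ and, evidently, if $p_n$ denotes the $n$-th prime, then $\pi(p_n) = n$. Let $A_k$ denote the set of the multiples of the $k$-th prime $p_k$ that belong to $\Omega$. The number of such sets $A_k$ in the sieve of Eratosthenes equals the index $n$ of the largest prime number $p_n$ smaller than or equal to $\left[\sqrt{N}\right]$, hence, $n = \pi\left(\sqrt{N}\right)$. If $q \in \left(\cup_{k=1}^{n} A_k\right)^c$, this means that $q$ is not divisible by any prime number smaller than or equal to $p_n$ and that $q$ is a prime number lying between $\sqrt{N} < q \le N$. The cardinality of the set $\left(\cup_{k=1}^{n} A_k\right)^c$, the number of primes between $\sqrt{N} < q \le N$, is
$$
\left|\left(\cup_{k=1}^{n} A_k\right)^c\right| = \pi(N) - \pi\left(\sqrt{N}\right)
$$
On the other hand, if $r \in \cap_{m=1}^{j} A_{k_m}$ for $1 \le k_1 < k_2 < \cdots < k_j \le n$, then $r$ is a multiple of $p_{k_1} p_{k_2} \cdots p_{k_j}$ and the number of multiples of the integer $p_{k_1} p_{k_2} \cdots p_{k_j}$ in $\Omega$ is
$$
\left[\frac{N}{p_{k_1} p_{k_2} \cdots p_{k_j}}\right] = \left|\cap_{m=1}^{j} A_{k_m}\right|
$$
Applying the inclusion-exclusion formula (2.10) with $|\Omega| = |S_0| = N - 1$ and $n = \pi\left(\sqrt{N}\right)$ gives
$$
\pi(N) - \pi\left(\sqrt{N}\right) = N - 1 + \sum_{j=1}^{n} (-1)^{j} \sum_{1 \le k_1 < k_2 < \cdots < k_j \le n} \left[\frac{N}{p_{k_1} p_{k_2} \cdots p_{k_j}}\right]
$$
The knowledge of the prime numbers smaller than or equal to $\left[\sqrt{N}\right]$, i.e. the first $n = \pi\left(\sqrt{N}\right)$ primes, suffices to compute the number of primes $\pi(N)$ smaller than or equal to $N$ without explicitly knowing the primes $q$ lying between $\sqrt{N} < q \le N$.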
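The counting formula above can be sketched in a few lines. The following is a minimal, unoptimized illustration rather than a practical prime-counting algorithm: the function names are illustrative, and the inner loop visits all $2^n - 1$ subsets of the small primes, which is only reasonable for modest $N$.

```python
from itertools import combinations
from math import isqrt, prod

def primes_up_to(m):
    """Sieve of Eratosthenes: list of all primes <= m."""
    if m < 2:
        return []
    is_prime = [True] * (m + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(m) + 1):
        if is_prime[p]:
            # Multiples below p*p were already removed by smaller primes.
            for multiple in range(p * p, m + 1, p):
                is_prime[multiple] = False
    return [p for p, flag in enumerate(is_prime) if flag]

def prime_count_inclusion_exclusion(N):
    """pi(N) computed from the primes <= sqrt(N) only, via inclusion-exclusion."""
    small_primes = primes_up_to(isqrt(N))
    n = len(small_primes)                      # n = pi(sqrt(N))
    total = N - 1                              # |Omega| = |{2, ..., N}|
    for j in range(1, n + 1):
        for subset in combinations(small_primes, j):
            total += (-1) ** j * (N // prod(subset))
    return total + n                           # add back pi(sqrt(N))

print(prime_count_inclusion_exclusion(100))    # 25
```

For $N = 100$ only the primes 2, 3, 5 and 7 are needed: $99 - 117 + 45 - 6 + 0 = 21$ primes lie in $(10, 100]$, and adding $\pi(10) = 4$ gives 25.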

2.2 Discrete random variables


Discrete random variables are real functions $X$ defined on a discrete probability space as $X : \Omega \to \mathbb{R}$ with the property that the event
$$
\{\omega \in \Omega : X(\omega) = x\} \in \mathcal{F}
$$
for each $x \in \mathbb{R}$. The event $\{\omega \in \Omega : X(\omega) = x\}$ is further abbreviated as $\{X = x\}$. A discrete probability density function (pdf) $\Pr[X = x]$ has the following properties:
(i) $0 \le \Pr[X = x] \le 1$ for real $x$ that are possible outcomes of an experiment. The set of values $x$ can be finite or countably infinite and constitutes the discrete probability space.
(ii) $\sum_{x} \Pr[X = x] = 1$.
In the classical example of throwing a die, the discrete probability space is $\Omega = \{1, 2, 3, 4, 5, 6\}$ and, since each of the six faces of the (fair) die is equally possible as outcome, $\Pr[X = x] = \frac{1}{6}$ for each $x \in \Omega$.
2.2.1 The expectation
An important operator acting on a discrete random variable $X$ is the expectation, defined as
$$
E[X] = \sum_{x} x \Pr[X = x] \tag{2.11}
$$
The expectation $E[X]$ is also called the mean or average or first moment of $X$. More generally, if $X$ is a discrete random variable and $g$ is a function, then $Y = g(X)$ is also a discrete random variable with expectation $E[Y]$ equal to
$$
E[g(X)] = \sum_{x} g(x) \Pr[X = x] \tag{2.12}
$$
A special and often used function in probability theory is the indicator function $1_{y}$, defined as 1 if the condition $y$ is true and zero otherwise. For example,
$$
E\left[1_{X > a}\right] = \sum_{x} 1_{x > a} \Pr[X = x] = \sum_{x > a} \Pr[X = x] = \Pr[X > a]
$$
$$
E\left[1_{X = a}\right] = \Pr[X = a] \tag{2.13}
$$

The higher moments of a random variable are defined as the case where $g(x) = x^n$,
$$
E[X^n] = \sum_{x} x^n \Pr[X = x] \tag{2.14}
$$
From the definition (2.11), it follows that the expectation is a linear operator,
$$
E\left[\sum_{k=1}^{n} a_k X_k\right] = \sum_{k=1}^{n} a_k E[X_k]
$$
The variance of $X$ is defined as
$$
\mathrm{Var}[X] = E\left[(X - E[X])^2\right] \tag{2.15}
$$


The variance is always non-negative. Using the linearity of the expectation operator and $\mu = E[X]$, we rewrite (2.15) as
$$
\mathrm{Var}[X] = E\left[X^2\right] - \mu^2 \tag{2.16}
$$
Since $\mathrm{Var}[X] \ge 0$, relation (2.16) indicates that $E\left[X^2\right] \ge (E[X])^2$. Often the standard deviation, defined as $\sigma = \sqrt{\mathrm{Var}[X]}$, is used. An interesting variational principle of the variance follows, for the variable $u$, from
$$
E\left[(X - u)^2\right] = E\left[(X - \mu)^2\right] + (u - \mu)^2
$$
which is minimized at $u = \mu = E[X]$ with value $\mathrm{Var}[X]$. Hence, the best least-squares approximation of the random variable $X$ is the number $E[X]$.
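The definitions (2.11) to (2.16) and the variational principle can be checked on the fair die mentioned earlier. The sketch below is only an illustration; the helper names `expectation` and `mean_square_error` are not from the text.

```python
from fractions import Fraction

# Fair die: Pr[X = x] = 1/6 for x in {1, ..., 6}.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expectation(g, pmf):
    """E[g(X)] = sum_x g(x) Pr[X = x], as in (2.12)."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expectation(lambda x: x, pmf)                      # (2.11)
var_def = expectation(lambda x: (x - mean) ** 2, pmf)     # (2.15)
var_alt = expectation(lambda x: x * x, pmf) - mean ** 2   # (2.16)

def mean_square_error(u):
    """E[(X - u)^2], which should equal Var[X] + (u - mean)^2."""
    return expectation(lambda x: (x - u) ** 2, pmf)

print(mean, var_def)  # 7/2 35/12
```

Evaluating `mean_square_error(u)` for any `u` other than `mean` gives a value exceeding `var_def` by exactly `(u - mean) ** 2`.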

2.2.2 The probability generating function


The probability generating function (pgf) of a discrete random variable $X$ is defined, for complex $z$, as
$$
\varphi_X(z) = E\left[z^X\right] = \sum_{x} z^x \Pr[X = x] \tag{2.17}
$$
where the last equality follows from (2.12). If $X$ is integer-valued and non-negative, then the pgf is the Taylor expansion of the complex function $\varphi_X(z)$. Commonly the latter restriction applies, otherwise the substitution $z = e^{it}$ is used such that (2.17) expresses the Fourier series of $\varphi_X\left(e^{it}\right)$. The importance of the pgf mainly lies in the fact that the theory of functions can be applied. Numerous examples of the power of analysis will be illustrated. Concentrating on non-negative integer random variables $X$,
$$
\varphi_X(z) = \sum_{k=0}^{\infty} \Pr[X = k]\, z^k \tag{2.18}
$$
and the Taylor coefficients obey
$$
\Pr[X = k] = \frac{1}{k!} \left.\frac{d^k \varphi_X(z)}{dz^k}\right|_{z=0} \tag{2.19}
$$
$$
\Pr[X = k] = \frac{1}{2\pi i} \int_{C(0)} \frac{\varphi_X(z)}{z^{k+1}}\, dz \tag{2.20}
$$
where $C(0)$ denotes a contour around $z = 0$. Both are inversion formulae⁵.

⁵ A similar inversion formula for Fourier series exists (see e.g. Titchmarsh (1948)).

Since the general form $E[g(X)]$ is completely defined when $\Pr[X = x]$ is known, the knowledge of the pgf results in a complete alternative description,
$$
E[g(X)] = \sum_{k=0}^{\infty} \frac{g(k)}{k!} \left.\frac{d^k \varphi_X(z)}{dz^k}\right|_{z=0} \tag{2.21}
$$
Sometimes it is more convenient to compute values of interest directly from (2.17) rather than from (2.21). For example, $n$-fold differentiation of $\varphi_X(z) = E\left[z^X\right]$ yields
$$
\frac{d^n \varphi_X(z)}{dz^n} = E\left[X(X-1)\cdots(X-n+1)\, z^{X-n}\right] = n!\, E\left[\binom{X}{n} z^{X-n}\right]
$$
such that
$$
E\left[\binom{X}{n}\right] = \frac{1}{n!} \left.\frac{d^n \varphi_X(z)}{dz^n}\right|_{z=1} \tag{2.22}
$$
Similarly, let $z = e^t$, then
$$
\frac{d^n \varphi_X\left(e^t\right)}{dt^n} = E\left[X^n e^{tX}\right]
$$
from which the moments follow as
$$
E[X^n] = \left.\frac{d^n \varphi_X\left(e^t\right)}{dt^n}\right|_{t=0} \tag{2.23}
$$
and, more generally,
$$
E[(X - a)^n] = \left.\frac{d^n \left(e^{-ta} \varphi_X\left(e^t\right)\right)}{dt^n}\right|_{t=0} \tag{2.24}
$$
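For a random variable with finitely many non-negative integer values, the pgf (2.18) is a polynomial, so the derivatives in (2.22) can be taken exactly on its coefficient list. The sketch below is an illustration under that assumption; the binomial distribution and the helper names are choices made here, not part of the text.

```python
from fractions import Fraction
from math import comb, factorial

def derivative(coeffs):
    """Coefficients of d/dz of the polynomial sum_k coeffs[k] z^k."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    return sum(c * z ** k for k, c in enumerate(coeffs))

def binomial_moment(pmf, n):
    """E[C(X, n)] = (1/n!) d^n phi_X / dz^n at z = 1, as in (2.22)."""
    coeffs = list(pmf)
    for _ in range(n):
        coeffs = derivative(coeffs)
    return evaluate(coeffs, 1) / factorial(n)

# Illustrative choice: X binomial with m = 5 trials and success probability 1/3.
m, p = 5, Fraction(1, 3)
pmf = [comb(m, k) * p ** k * (1 - p) ** (m - k) for k in range(m + 1)]

print(binomial_moment(pmf, 1))  # 5/3
```

For this binomial example the result can be compared with the closed form $\binom{m}{n} p^n$, which the helper reproduces for every $n \le m$.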

2.2.3 The logarithm of the probability generating function


The logarithm of the probability generating function is defined as
$$
L_X(z) = \log\left(\varphi_X(z)\right) = \log\left(E\left[z^X\right]\right) \tag{2.25}
$$
from which $L_X(1) = 0$ because $\varphi_X(1) = 1$. The derivative $L'_X(z) = \frac{\varphi'_X(z)}{\varphi_X(z)}$ shows that $L'_X(1) = \varphi'_X(1)$, while from $L''_X(z) = \frac{\varphi''_X(z)}{\varphi_X(z)} - \left(\frac{\varphi'_X(z)}{\varphi_X(z)}\right)^2$, it follows that $L''_X(1) = \varphi''_X(1) - \left(\varphi'_X(1)\right)^2$. These first few derivatives are interesting because they are related directly to probabilistic quantities. Indeed, from (2.23), we observe that
$$
E[X] = \varphi'_X(1) = L'_X(1) \tag{2.26}
$$
and from $E\left[X^2\right] = \varphi''_X(1) + \varphi'_X(1)$
$$
\mathrm{Var}[X] = \varphi''_X(1) + \varphi'_X(1) - \left(\varphi'_X(1)\right)^2 = L''_X(1) + L'_X(1) \tag{2.27}
$$
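As a short worked example of (2.26) and (2.27), assume the well-known pgf of a Poisson random variable with mean $\lambda$, $\varphi_X(z) = e^{\lambda(z-1)}$, which is not derived in this section. The logarithm is then linear in $z$, and both the mean and the variance follow at once:

```latex
\begin{align*}
L_X(z) &= \log \varphi_X(z) = \lambda (z - 1), \\
L_X'(1) &= \lambda, \qquad L_X''(1) = 0, \\
E[X] &= L_X'(1) = \lambda, \\
\mathrm{Var}[X] &= L_X''(1) + L_X'(1) = \lambda .
\end{align*}
```

The same result follows from the pgf itself, since $\varphi'_X(1) = \lambda$ and $\varphi''_X(1) = \lambda^2$ give $\lambda^2 + \lambda - \lambda^2 = \lambda$.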

2.3 Continuous random variables


Although most of the concepts defined above for discrete random variables are readily transferred to continuous random variables, the calculus is in general more difficult. Indeed, instead of reasoning on the pdf, it is more convenient to work with the probability distribution function defined for both discrete and continuous random variables as
$$
F_X(x) = \Pr[X \le x] \tag{2.28}
$$
Clearly, we have $\lim_{x \to -\infty} F_X(x) = 0$, while $\lim_{x \to +\infty} F_X(x) = 1$. Further, $F_X(x)$ is non-decreasing in $x$ and
$$
\Pr[a < X \le b] = F_X(b) - F_X(a) \tag{2.29}
$$
This relation follows from the observations $\{X \le a\} \cup \{a < X \le b\} = \{X \le b\}$ and $\{X \le a\} \cap \{a < X \le b\} = \emptyset$. For mutually exclusive events $A \cap B = \emptyset$, Axiom 2 in Section 2.1 states that $\Pr[A \cup B] = \Pr[A] + \Pr[B]$ which proves (2.29). As a corollary of (2.29), $F_X(x)$ is continuous at the right, which follows from (2.29) by denoting $a = b - \epsilon$ for any $\epsilon > 0$. Less precisely, it follows from the equality sign at the right, $X \le b$, and inequality at the left, $a < X$. Hence, $F_X(x)$ is not necessarily continuous at the left, which implies that $F_X(x)$ is not necessarily continuous and that $F_X(x)$ may possess jumps. But even if $F_X(x)$ is continuous, the pdf is not necessarily continuous⁶.
The pdf of a continuous random variable $X$ is defined as
$$
f_X(x) = \frac{dF_X(x)}{dx} \tag{2.30}
$$

⁶ Weierstrass was the first to present a continuous non-differentiable function,
$$
f(x) = \sum_{n=0}^{\infty} b^n \cos\left(a^n \pi x\right)
$$
where $0 < b < 1$ and $a$ is an odd positive integer. Since the series is uniformly convergent for any $x$, $f(x)$ is continuous everywhere. Titchmarsh (1964, Chapter IX) demonstrates for $ab > 1 + \frac{3\pi}{2}$ that $\frac{f(x+h) - f(x)}{h}$ takes arbitrarily large values such that $f'(x)$ does not exist. Another class of continuous non-differentiable functions are the sample paths of a Brownian motion. The Cantor function, which is discussed in (Berger, 1993, p. 21) and (Billingsley, 1995, p. 407), is another classical, noteworthy function with peculiar properties.


Assuming that $F_X(x)$ is differentiable at $x$, from (2.29), we have for small, positive $\Delta x$
$$
\Pr[x < X \le x + \Delta x] = F_X(x + \Delta x) - F_X(x) = \frac{dF_X(x)}{dx} \Delta x + O\left((\Delta x)^2\right)
$$
Using the definition (2.30) indicates that, if $F_X(x)$ is differentiable at $x$,
$$
f_X(x) = \lim_{\Delta x \to 0} \frac{\Pr[x < X \le x + \Delta x]}{\Delta x} \tag{2.31}
$$
If $f_X(x)$ is finite, then $\lim_{\Delta x \to 0} \Pr[x < X \le x + \Delta x] = \Pr[X = x] = 0$, which means that for well-behaved (i.e. $F_X(x)$ is differentiable for most $x$) continuous random variables $X$, the probability of the event that $X$ precisely equals $x$ is zero⁷. Hence, for well-behaved continuous random variables where $\Pr[X = x] = 0$ for all $x$, the inequality signs in the general formula (2.29) can be relaxed,
$$
\Pr[a < X \le b] = \Pr[a \le X \le b] = \Pr[a \le X < b] = \Pr[a < X < b]
$$
If $f_X(x)$ is not finite, then $F_X(x)$ is not differentiable at $x$ such that
$$
\lim_{\Delta x \to 0} F_X(x + \Delta x) - F_X(x) = \Delta F_X(x) \ne 0
$$
This means that $F_X(x)$ jumps upwards at $x$ by $\Delta F_X(x)$. In that case, there is a probability mass with magnitude $\Delta F_X(x)$ at the point $x$. Although the second definition (2.31) is strictly speaking not valid in that case, one sometimes denotes the pdf at $y = x$ by $f_X(y) = \Delta F_X(x)\, \delta(y - x)$ where $\delta(x)$ is the Dirac impulse or delta function with basic property that $\int_{-\infty}^{+\infty} \delta(y - x)\, dx = 1$. Even apart from the above-mentioned difficulties for certain classes of non-differentiable, but continuous functions, the fact that probabilities are always confined to the region $[0, 1]$ may suggest that $0 \le f_X(x) \le 1$. However, the second definition (2.31) shows that $f_X(x)$ can be much larger than 1. For example, if $X$ is a Gaussian random variable with mean $\mu$ and variance $\sigma^2$ (see Section 3.2.3) then $f_X(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$ can be made arbitrarily large. In fact,
$$
\lim_{\sigma \to 0} \frac{\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)}{\sigma\sqrt{2\pi}} = \delta(x - \mu)
$$

⁷ In Lebesgue measure theory (Titchmarsh, 1964; Billingsley, 1995), it is said that a countable, finite or enumerable (i.e. function evaluations at individual points) set is measurable, but its measure is zero.


2.3.1 Transformation of random variables


It frequently appears useful to know how to compute $F_Y(x)$ for $Y = g(X)$. Only if the inverse function $g^{-1}$ exists, the event $\{g(X) \le x\}$ is equivalent to $\left\{X \le g^{-1}(x)\right\}$ if $\frac{dg}{dx} > 0$ and to $\left\{X > g^{-1}(x)\right\}$ if $\frac{dg}{dx} < 0$. Hence,
$$
F_Y(x) = \Pr[g(X) \le x] =
\begin{cases}
F_X\left(g^{-1}(x)\right), & \frac{dg}{dx} > 0 \\
1 - F_X\left(g^{-1}(x)\right), & \frac{dg}{dx} < 0
\end{cases}
\tag{2.32}
$$
For well-behaved continuous random variables, we may rewrite (2.31) in terms of differentials,
$$
f_X(x)\, dx = \Pr[x \le X \le x + dx]
$$
and, similarly for $f_Y(y)$,
$$
f_Y(y)\, dy = \Pr[y \le Y = g(X) \le y + dy]
$$
If $g$ is increasing, then the event $\{y \le g(X) \le y + dy\}$ is equivalent to $\left\{g^{-1}(y) \le X \le g^{-1}(y + dy)\right\} = \{x \le X \le x + dx\}$ such that
$$
f_Y(y)\, dy = f_X(x)\, dx
$$
If $g$ is decreasing, we find that $f_Y(y)\, dy = -f_X(x)\, dx$. Thus, if $g^{-1}$ and $g'$ exist, then the relation between the pdf of a well-behaved continuous random variable $X$ and that of the transformed random variable $Y = g(X)$ is
$$
f_Y(y) = f_X(x) \left|\frac{dx}{dy}\right| = \frac{f_X(x)}{|g'(x)|}
$$
This expression also follows by straightforward differentiation of (2.32). The chi-square distribution introduced in Section 3.3.3 is a nice example of the transformation of random variables.
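The transformation rule can be checked numerically. The sketch below assumes, purely for illustration, an exponential random variable $X$ with rate 1 and the increasing map $g(x) = x^2$ on $x \ge 0$; it compares $f_X(x)/|g'(x)|$ with a numerical derivative of $F_Y$ obtained from (2.32). All names are illustrative.

```python
import math

def F_X(x):
    """Distribution function of an exponential random variable with rate 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def f_X(x):
    return math.exp(-x) if x > 0 else 0.0

def g_inverse(y):
    return math.sqrt(y)        # inverse of g(x) = x**2 on x >= 0

def g_prime(x):
    return 2.0 * x

def F_Y(y):
    """(2.32) for an increasing g: F_Y(y) = F_X(g^{-1}(y))."""
    return F_X(g_inverse(y))

def f_Y(y):
    """f_Y(y) = f_X(x) / |g'(x)| with x = g^{-1}(y)."""
    x = g_inverse(y)
    return f_X(x) / abs(g_prime(x))

def numerical_derivative(F, y, h=1e-6):
    # Central difference approximation of dF/dy.
    return (F(y + h) - F(y - h)) / (2.0 * h)

for y in (0.25, 1.0, 4.0):
    print(y, f_Y(y), numerical_derivative(F_Y, y))
```

The two printed columns agree to within the accuracy of the central difference, which is what differentiating (2.32) predicts.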
2.3.2 The expectation
Analogously to the discrete case, we define the expectation of a continuous random variable as
$$
E[X] = \int_{-\infty}^{\infty} x f_X(x)\, dx \tag{2.33}
$$
In addition, for the expectation to exist⁸, we require $\int_{-\infty}^{\infty} |x| f_X(x)\, dx < \infty$. If $X$ is a continuous random variable and $g$ is a continuous function, then $Y = g(X)$ is also a continuous random variable with expectation $E[Y]$ equal to
$$
E[g(X)] = \int_{-\infty}^{\infty} g(x) f_X(x)\, dx \tag{2.34}
$$
It is often useful to express the expectation $E[X]$ of a non-negative random variable $X$ in tail probabilities. Upon integration by parts,
$$
\begin{aligned}
E[X] &= \int_{0}^{\infty} x f_X(x)\, dx = \left[-x \int_{x}^{\infty} f_X(u)\, du\right]_{0}^{\infty} + \int_{0}^{\infty} dx \int_{x}^{\infty} f_X(u)\, du \\
&= \int_{0}^{\infty} \left(1 - F_X(x)\right) dx
\end{aligned}
\tag{2.35}
$$
The case for a non-positive random variable $X$ is derived analogously,
$$
\begin{aligned}
E[X] &= \int_{-\infty}^{0} x f_X(x)\, dx = \left[x \int_{-\infty}^{x} f_X(u)\, du\right]_{-\infty}^{0} - \int_{-\infty}^{0} dx \int_{-\infty}^{x} f_X(u)\, du \\
&= -\int_{-\infty}^{0} F_X(x)\, dx
\end{aligned}
$$
The general case follows by addition:
$$
E[X] = \int_{0}^{\infty} \left(1 - F_X(x)\right) dx - \int_{-\infty}^{0} F_X(x)\, dx
$$
A similar expression exists for discrete random variables. In general, for any discrete random variable $X$, we can write
$$
\begin{aligned}
E[X] &= \sum_{k=-\infty}^{\infty} k \Pr[X = k] = \sum_{k=-\infty}^{-1} k \Pr[X = k] + \sum_{k=0}^{\infty} k \Pr[X = k] \\
&= \sum_{k=-\infty}^{-1} k \left(\Pr[X \le k] - \Pr[X \le k - 1]\right) + \sum_{k=0}^{\infty} k \left(\Pr[X \ge k] - \Pr[X \ge k + 1]\right) \\
&= \sum_{k=-\infty}^{-1} k \Pr[X \le k] - \sum_{k=-\infty}^{-2} (k + 1) \Pr[X \le k] + \sum_{k=1}^{\infty} k \Pr[X \ge k] - \sum_{k=1}^{\infty} (k - 1) \Pr[X \ge k] \\
&= -\Pr[X \le -1] - \sum_{k=-\infty}^{-2} \Pr[X \le k] + \sum_{k=1}^{\infty} \Pr[X \ge k]
\end{aligned}
$$
or the mean of a discrete random variable $X$ expressed in tail probabilities is⁹
$$
E[X] = \sum_{k=1}^{\infty} \Pr[X \ge k] - \sum_{k=-\infty}^{-1} \Pr[X \le k] \tag{2.36}
$$

⁸ This requirement is borrowed from measure theory and Lebesgue integration (Titchmarsh, 1964, Chapter X) (Royden, 1988, Chapter 4), where a measurable function is said to be integrable (in the Lebesgue sense) over $A$ if $f^{+} = \max(f(x), 0)$ and $f^{-} = \max(-f(x), 0)$ are both integrable over $A$. Although this restriction seems only of theoretical interest, in some applications (see the Cauchy distribution defined in (3.38)) the Riemann integral may exist where the Lebesgue integral does not. For example, $\int_{0}^{\infty} \frac{\sin x}{x}\, dx$ equals, in the Riemann sense, $\frac{\pi}{2}$ (which is a standard exercise in contour integration), but this integral does not exist in the Lebesgue sense. Only for improper integrals (integration interval is infinite), Riemann integration may exist where Lebesgue does not. However, in most other cases (integration over a finite interval), Lebesgue integration is more general. For instance, if $f(x) = 1_{\{x \text{ is rational}\}}$, then $\int_{0}^{1} f(u)\, du$ does not exist in the Riemann sense (since upper and lower sums do not converge to each other). However, $\int_{0}^{1} f(u)\, du = 0$ in the Lebesgue sense (since there is only a set of measure zero different from 0, namely all rational numbers in $[0, 1]$). In probability theory and measure theory, Lebesgue integration is assumed.
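The tail-probability form (2.36) is easy to verify on a small example. The sketch below uses an arbitrary, illustrative pmf on the integers $-3, \ldots, 4$ (the weights are made up) and compares the direct mean with the difference of the two tail sums, which are finite here because the support is bounded.

```python
from fractions import Fraction

# Illustrative pmf on the integers -3, ..., 4 (weights chosen arbitrarily).
weights = {-3: 1, -2: 2, -1: 3, 0: 4, 1: 3, 2: 2, 3: 4, 4: 1}
total = sum(weights.values())
pmf = {k: Fraction(w, total) for k, w in weights.items()}

def prob_at_least(k):
    """Pr[X >= k]."""
    return sum(p for x, p in pmf.items() if x >= k)

def prob_at_most(k):
    """Pr[X <= k]."""
    return sum(p for x, p in pmf.items() if x <= k)

mean_direct = sum(k * p for k, p in pmf.items())

# (2.36): upper tail sum over k >= 1 minus lower tail sum over k <= -1.
k_min, k_max = min(pmf), max(pmf)
mean_tails = (sum(prob_at_least(k) for k in range(1, k_max + 1))
              - sum(prob_at_most(k) for k in range(k_min, 0)))

print(mean_direct, mean_tails)  # 13/20 13/20
```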

2.3.3 The probability generating function


The probability generating function (pgf) of a continuous random variable $X$ is defined, for complex $z$, as the Laplace transform
$$
\varphi_X(z) = E\left[e^{-zX}\right] = \int_{-\infty}^{\infty} e^{-zt} f_X(t)\, dt \tag{2.37}
$$
Again, in some cases, it may be more convenient to use $z = iu$ in which case the double-sided Laplace transform reduces to a Fourier transform. The strength of these transforms is based on the numerous properties, especially the inverse transform,
$$
f_X(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \varphi_X(z)\, e^{zt}\, dz \tag{2.38}
$$
where $c$ is the smallest real variable $\mathrm{Re}(z)$ for which the integral in (2.37) converges. Similarly as for discrete random variables, we have $e^{za} \varphi_X(z) = E\left[e^{-z(X - a)}\right]$ and
$$
E[(X - a)^n] = (-1)^n \left.\frac{d^n \left(e^{za} \varphi_X(z)\right)}{dz^n}\right|_{z=0} \tag{2.39}
$$
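⁹ We remark that
$$
\begin{aligned}
E[X] &= \sum_{k=-\infty}^{\infty} k \Pr[X = k] = \sum_{k=-\infty}^{\infty} k \left(\Pr[X \ge k] - \Pr[X \ge k + 1]\right) \\
&\ne \sum_{k=-\infty}^{\infty} k \Pr[X \ge k] - \sum_{k=-\infty}^{\infty} k \Pr[X \ge k + 1] = \sum_{k=-\infty}^{\infty} \Pr[X \ge k]
\end{aligned}
$$
because the series in the second line are diverging. In fact, for any arbitrarily small real $\epsilon > 0$, there exists a finite integer $k_\epsilon$ such that $\Pr[X \ge k_\epsilon] \ge 1 - \epsilon$ and $\Pr[X \ge k_\epsilon] \le \Pr[X \ge k]$ for all $k < k_\epsilon$. Hence,
$$
\sum_{k=-\infty}^{\infty} \Pr[X \ge k] = \sum_{k=-\infty}^{k_\epsilon - 1} \Pr[X \ge k] + \sum_{k=k_\epsilon}^{\infty} \Pr[X \ge k] \ge (1 - \epsilon) \sum_{k=-\infty}^{k_\epsilon - 1} 1 + c \to \infty
$$
where $\sum_{k=k_\epsilon}^{\infty} \Pr[X \ge k] = c$ is finite. Also, even for negative $X$, $\sum_{k=-\infty}^{\infty} \Pr[X \ge k]$ is always positive.

The main difference with the discrete case lies in the definition $E\left[e^{-zX}\right]$ (continuous) versus $E\left[z^X\right]$ (discrete). Since the exponential is an entire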

function¹⁰ with power series around $z = 0$, $e^{-zX} = \sum_{k=0}^{\infty} \frac{(-1)^k X^k}{k!} z^k$, the expectation and summation can be reversed leading to
$$
E\left[e^{-zX}\right] = \sum_{k=0}^{\infty} \frac{(-1)^k E\left[X^k\right]}{k!} z^k \tag{2.40}
$$
provided¹¹ $E\left[X^k\right] = O(k!)$, which is a necessary condition for the summation to converge for $z \ne 0$. Assuming convergence¹², the Taylor series of $E\left[e^{-zX}\right]$ around $z = 0$ is expressed as a function of the moments of $X$, whereas in the discrete case, the Taylor series of $E\left[z^X\right]$ around $z = 0$ given by (2.18) is expressed in terms of probabilities of $X$. This observation has led to $E\left[e^{-zX}\right]$ sometimes being called the moment generating function, while $E\left[z^X\right]$ is the probability generating function of the random variable $X$. On the other hand, series expansion of $E\left[z^X\right]$ around $z = 1$,
$$
\begin{aligned}
\varphi_X(z) &= \sum_{k=0}^{\infty} \Pr[X = k] (z + 1 - 1)^k = \sum_{k=0}^{\infty} \Pr[X = k] \sum_{j=0}^{k} \binom{k}{j} (z - 1)^j \\
&= \sum_{j=0}^{\infty} \left[\sum_{k=j}^{\infty} \binom{k}{j} \Pr[X = k]\right] (z - 1)^j
\end{aligned}
$$
shows with (2.22) that
$$
E\left[\binom{X}{j}\right] = \sum_{k=j}^{\infty} \binom{k}{j} \Pr[X = k]
$$
If moments are desired, the substitution $z \to e^{-u}$ in $E\left[z^X\right]$ is appropriate.

2.3.4 The logarithm of the probability generating function


The logarithm of the probability generating function is defined as
$$
L_X(z) = \log\left(\varphi_X(z)\right) = \log\left(E\left[e^{-zX}\right]\right) \tag{2.41}
$$
from which $L_X(0) = 0$ because $\varphi_X(0) = 1$. Further, analogous to the discrete case, we see that $L'_X(0) = \varphi'_X(0)$, $L''_X(0) = \varphi''_X(0) - \left(\varphi'_X(0)\right)^2$ and
$$
E[X] = -\varphi'_X(0) = -L'_X(0)
$$
However, the difference with the discrete case lies in the higher moments,
$$
E[X^n] = (-1)^n \left.\frac{d^n \varphi_X(z)}{dz^n}\right|_{z=0} \tag{2.42}
$$
because with $E\left[X^2\right] = \varphi''_X(0)$,
$$
\mathrm{Var}[X] = \varphi''_X(0) - \left(\varphi'_X(0)\right)^2 = L''_X(0) \tag{2.43}
$$
The latter expression makes $L_X(z)$ for a continuous random variable particularly useful. Since the variance is always positive, it demonstrates that $L_X(z)$ is convex (see Section 5.5) around $z = 0$. Finally, we mention that
$$
E\left[(X - E[X])^3\right] = -L'''_X(0)
$$

¹⁰ An entire (or integral) function is a complex function without singularities in the finite complex plane. Hence, a power series around any finite point has infinite radius of convergence. In other words, it exists for all finite complex values.

¹¹ The Landau big $O$-notation specifies the order of a function when the argument tends to some limit. Most often the limit is to infinity, but the $O$-notation can also be used to characterize the behavior of a function around some finite point. Formally, $f(x) = O(g(x))$ for $x \to \infty$ means that there exist positive numbers $c$ and $x_0$ for which $|f(x)| \le c |g(x)|$ for $x > x_0$.

¹² The lognormal distribution defined by (3.43) is an example where the summation (2.40) diverges for any $z \ne 0$.

2.4 The conditional probability


The conditional probability of the event $A$ given the event $B$ (or on the hypothesis $B$) is defined as
$$
\Pr[A|B] = \frac{\Pr[A \cap B]}{\Pr[B]} \tag{2.44}
$$
The definition implicitly assumes that the event $B$ has positive probability, otherwise the conditional probability remains undefined. We quote Feller (1970, p. 116):
Taking conditional probabilities of various events with respect to a particular hypothesis $B$ amounts to choosing $B$ as a new sample space with probabilities proportional to the original ones; the proportionality factor $\Pr[B]$ is necessary in order to reduce the total probability of the new sample space to unity. This formulation shows that all general theorems on probabilities are valid for conditional probabilities with respect to any particular hypothesis. For example, the law $\Pr[A \cup B] = \Pr[A] + \Pr[B] - \Pr[A \cap B]$ takes the form
$$
\Pr[A \cup B|C] = \Pr[A|C] + \Pr[B|C] - \Pr[A \cap B|C]
$$

The formula (2.44) is often rewritten in the form
$$
\Pr[A \cap B] = \Pr[A|B] \Pr[B] \tag{2.45}
$$


which easily generalizes to more events. For example, denote $A = A_1$ and $B = A_2 \cap A_3$, then
$$
\begin{aligned}
\Pr[A_1 \cap A_2 \cap A_3] &= \Pr[A_1|A_2 \cap A_3] \Pr[A_2 \cap A_3] \\
&= \Pr[A_1|A_2 \cap A_3] \Pr[A_2|A_3] \Pr[A_3]
\end{aligned}
$$
Another application of the conditional probability occurs when a partitioning of the sample space $\Omega$ is known: $\Omega = \cup_k B_k$ and all $B_k$ are mutually exclusive, which means that $B_k \cap B_j = \emptyset$ for any $k$ and $j \ne k$. Then, with (2.45),
$$
\sum_{k} \Pr[A \cap B_k] = \sum_{k} \Pr[A|B_k] \Pr[B_k]
$$
The event $A_k = \{A \cap B_k\}$ is a decomposition (or projection) of the event $A$ in the basis event $B_k$, analogous to the decomposition of a vector in terms of a set of orthogonal basis vectors that span the total state space. Indeed, using the associative property $A \cap \{B \cap C\} = A \cap B \cap C$ and $A \cap A = A$, the intersection $A_k \cap A_j = \{A \cap B_k\} \cap \{A \cap B_j\} = A \cap \{B_k \cap B_j\} = \emptyset$, which implies mutual exclusivity (or orthogonality). Using the distributive property $A \cap \{B_k \cup B_j\} = \{A \cap B_k\} \cup \{A \cap B_j\}$, we observe that
$$
A = A \cap \Omega = A \cap \{\cup_k B_k\} = \cup_k \{A \cap B_k\} = \cup_k A_k
$$
Finally, since all events $A_k$ are mutually exclusive, $\Pr[A] = \sum_k \Pr[A_k] = \sum_k \Pr[A \cap B_k]$. Thus, if $\Omega = \cup_k B_k$ and in addition, for any pair $j, k$ holds that $B_k \cap B_j = \emptyset$, we have proved the law of total probability or decomposability,
$$
\Pr[A] = \sum_{k} \Pr[A|B_k] \Pr[B_k] \tag{2.46}
$$
Conditioning on events is a powerful tool that will be used frequently. If the conditional probability $\Pr[A|B_k]$ is known as a function $g(B_k)$, the law of total probability can also be written in terms of the expectation operator defined in (2.12) as
$$
\Pr[A] = E[g(B_k)] \tag{2.47}
$$

Also the important memoryless property of the exponential distribution (see Section 3.2.2) is an example of the application of the conditional probability. Another classical example is Bayes' rule. Consider again the events $B_k$ defined above. Using the definition (2.44) followed by (2.45),
$$
\Pr[B_k|A] = \frac{\Pr[B_k \cap A]}{\Pr[A]} = \frac{\Pr[A \cap B_k]}{\Pr[A]} = \frac{\Pr[A|B_k] \Pr[B_k]}{\Pr[A]} \tag{2.48}
$$
Using (2.46), we arrive at Bayes' rule
$$
\Pr[B_k|A] = \frac{\Pr[A|B_k] \Pr[B_k]}{\sum_{j} \Pr[A|B_j] \Pr[B_j]} \tag{2.49}
$$
where $\Pr[B_k]$ are called the a-priori probabilities, while $\Pr[B_k|A]$ are the a-posteriori probabilities.
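A minimal numerical sketch of (2.46) and (2.49) follows. The scenario and all numbers are hypothetical and chosen only for illustration: three links carry 50%, 30% and 20% of the packets (the partition $B_k$), and each corrupts a packet (the event $A$) with a different probability.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule (2.49): returns (list of Pr[B_k | A], Pr[A]).

    prior[k] = Pr[B_k] and likelihood[k] = Pr[A | B_k]; the B_k must form
    a partition of the sample space, so the priors sum to one.
    """
    assert sum(prior) == 1
    # Law of total probability (2.46).
    pr_A = sum(l * p for l, p in zip(likelihood, prior))
    return [l * p / pr_A for l, p in zip(likelihood, prior)], pr_A

# Hypothetical numbers, not taken from the text.
prior = [Fraction(5, 10), Fraction(3, 10), Fraction(2, 10)]
likelihood = [Fraction(1, 100), Fraction(2, 100), Fraction(5, 100)]

post, pr_A = posterior(prior, likelihood)
print(pr_A)  # 21/1000
```

In this made-up example the least used link is the most likely origin of a corrupted packet: its a-posteriori probability is 10/21, although its a-priori probability is only 1/5.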
The conditional distribution function of the random variable $Y$ given $X$ is defined by
$$
F_{Y|X}(y|x) = \Pr[Y \le y|X = x] \tag{2.50}
$$
for any $x$ provided $\Pr[X = x] > 0$. This condition follows from the definition (2.44) of the conditional probability. The conditional probability density function of $Y$ given $X$ is defined by
$$
f_{Y|X}(y|x) = \Pr[Y = y|X = x] = \frac{\Pr[X = x, Y = y]}{\Pr[X = x]} = \frac{f_{XY}(x, y)}{f_X(x)} \tag{2.51}
$$
for any $x$ such that $\Pr[X = x] > 0$ (and similarly for continuous random variables $f_X(x) > 0$) and where $f_{XY}(x, y)$ is the joint probability density function defined below in (2.59).

2.5 Several random variables and independence


2.5.1 Discrete random variables
Two events $A$ and $B$ are independent if
$$
\Pr[A \cap B] = \Pr[A] \Pr[B] \tag{2.52}
$$
Similarly, we define two discrete random variables to be independent if
$$
\Pr[X = x, Y = y] = \Pr[X = x] \Pr[Y = y] \tag{2.53}
$$
If $Z = f(X, Y)$, then $Z$ is a discrete random variable with
$$
\Pr[Z = z] = \sum_{f(x, y) = z} \Pr[X = x, Y = y]
$$
Applying the expectation operator (2.11) to both sides yields
$$
E[f(X, Y)] = \sum_{x, y} f(x, y) \Pr[X = x, Y = y] \tag{2.54}
$$


If $X$ and $Y$ are independent and $f$ is separable, $f(x, y) = f_1(x) f_2(y)$, then the expectation (2.54) reduces to
$$
E[f(X, Y)] = \sum_{x} f_1(x) \Pr[X = x] \sum_{y} f_2(y) \Pr[Y = y] = E[f_1(X)]\, E[f_2(Y)] \tag{2.55}
$$
The simplest example of the general function is $Z = X + Y$. In that case, the sum is over all $x$ and $y$ that satisfy $x + y = z$. Thus,
$$
\Pr[X + Y = z] = \sum_{x} \Pr[X = x, Y = z - x] = \sum_{y} \Pr[X = z - y, Y = y]
$$
If $X$ and $Y$ are independent, we obtain the convolution,
$$
\Pr[X + Y = z] = \sum_{x} \Pr[X = x] \Pr[Y = z - x] = \sum_{y} \Pr[X = z - y] \Pr[Y = y]
$$
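The discrete convolution can be sketched directly from its definition. The example below, with the illustrative helper `convolve`, computes the pmf of the sum of two independent fair dice.

```python
from collections import defaultdict
from fractions import Fraction

def convolve(pmf_x, pmf_y):
    """pmf of X + Y for independent X and Y: sum_x Pr[X = x] Pr[Y = z - x]."""
    pmf_z = defaultdict(Fraction)      # missing keys start at Fraction(0)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            pmf_z[x + y] += px * py
    return dict(pmf_z)

die = {x: Fraction(1, 6) for x in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[7])  # 1/6
```

Looping over all pairs $(x, y)$ and accumulating on $z = x + y$ is equivalent to summing over $x$ for each fixed $z$, and avoids having to know the support of $Z$ in advance.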

2.5.2 The covariance


The covariance of $X$ and $Y$ is defined as
$$
\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y \tag{2.56}
$$
If $\mathrm{Cov}[X, Y] = 0$, then the variables $X$ and $Y$ are uncorrelated. If $X$ and $Y$ are independent, then $\mathrm{Cov}[X, Y] = 0$. Hence, independence implies uncorrelation, but the converse is not necessarily true. The classical example¹³ is $Y = X^2$ where $X$ has a normal distribution $N(0, 1)$ (Section 3.2.3) because $\mu_X = 0$ and $E[XY] = E\left[X^3\right] = 0$ as follows from (3.23). Although $X$ and $Y$ are perfectly dependent, they are uncorrelated. Thus, independence is a stronger property than uncorrelation. The covariance $\mathrm{Cov}[X, Y]$ measures the degree of dependence between two (or generally more) random variables. If $X$ and $Y$ are positively (negatively) correlated, then large values of $X$ tend to be associated with large (small) values of $Y$.

¹³ Another example: let $U$ be uniform on $[0, 1]$ and $X = \cos(2\pi U)$ and $Y = \sin(2\pi U)$. Using (2.34),
$$
E[XY] = \int_{0}^{1} \cos(2\pi u) \sin(2\pi u)\, du = 0
$$
as well as $E[X] = E[Y] = 0$. Thus, $\mathrm{Cov}[X, Y] = 0$, but $X$ and $Y$ are perfectly dependent because $X = \cos(\arcsin Y) = \pm\sqrt{1 - Y^2}$.

As an application of the covariance, consider the problem of computing the variance of a sum $S_n$ of random variables $X_1, X_2, \ldots, X_n$. Let $\mu_k = E[X_k]$, then $E[S_n] = \sum_{k=1}^{n} \mu_k$ and
$$
\begin{aligned}
\mathrm{Var}[S_n] &= E\left[(S_n - E[S_n])^2\right] = E\left[\left(\sum_{k=1}^{n} (X_k - \mu_k)\right)^2\right] \\
&= E\left[\sum_{k=1}^{n} \sum_{j=1}^{n} (X_k - \mu_k)(X_j - \mu_j)\right] \\
&= E\left[\sum_{k=1}^{n} (X_k - \mu_k)^2 + 2 \sum_{k=1}^{n} \sum_{j=k+1}^{n} (X_k - \mu_k)(X_j - \mu_j)\right]
\end{aligned}
$$
Using the linearity of the expectation operator and the definition of the covariance (2.56) yields
$$
\mathrm{Var}[S_n] = \sum_{k=1}^{n} \mathrm{Var}[X_k] + 2 \sum_{k=1}^{n} \sum_{j=k+1}^{n} \mathrm{Cov}[X_k, X_j] \tag{2.57}
$$
Observe that for a set of independent random variables $\{X_k\}$ the double sum with covariances vanishes.
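Both the variance formula (2.57) and the gap between uncorrelation and independence can be checked on small joint distributions. The sketch below is illustrative only: the joint probabilities are arbitrary, and the second example replaces the Gaussian $X$ of the text by a discrete $X$ that is uniform on $\{-1, 0, 1\}$, again with $Y = X^2$.

```python
from fractions import Fraction

def moments(joint):
    """Return (Var[X], Var[Y], Cov[X, Y], Var[X + Y]) for a pmf {(x, y): prob}."""
    def E(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())
    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    var_x = E(lambda x, y: (x - mx) ** 2)
    var_y = E(lambda x, y: (y - my) ** 2)
    cov = E(lambda x, y: x * y) - mx * my              # (2.56)
    var_sum = E(lambda x, y: (x + y - mx - my) ** 2)
    return var_x, var_y, cov, var_sum

# Dependent pair with arbitrary illustrative probabilities.
joint = {(0, 0): Fraction(4, 10), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(4, 10)}
var_x, var_y, cov, var_sum = moments(joint)
assert var_sum == var_x + var_y + 2 * cov              # (2.57) with n = 2

# Uncorrelated but dependent: X uniform on {-1, 0, 1} and Y = X**2.
joint_sq = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}
print(moments(joint_sq)[2])  # 0
```

In the second example the covariance is zero although $Y$ is a deterministic function of $X$, for instance $\Pr[X = 0, Y = 0] = 1/3$ differs from $\Pr[X = 0] \Pr[Y = 0] = 1/9$.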
The Cauchy-Schwarz inequality (5.17) derived in Chapter 5 indicates that
$$
\left(E[(X - \mu_X)(Y - \mu_Y)]\right)^2 \le E\left[(X - \mu_X)^2\right] E\left[(Y - \mu_Y)^2\right]
$$
such that the covariance is always bounded by
$$
|\mathrm{Cov}[X, Y]| \le \sigma_X \sigma_Y
$$
2.5.3 The linear correlation coefficient
Since the covariance is not dimensionless, the linear correlation coefficient defined as
$$
\rho(X, Y) = \frac{\mathrm{Cov}[X, Y]}{\sigma_X \sigma_Y} \tag{2.58}
$$
is often convenient to relate two (or more) different physical quantities expressed in different units. The linear correlation coefficient remains invariant (possibly apart from the sign) under a linear transformation because
$$
\rho(aX + b, cY + d) = \mathrm{sign}(ac)\, \rho(X, Y)
$$
This transform shows that the linear correlation coefficient $\rho(X, Y)$ is independent of the value of the mean $\mu_X$ and the variance $\sigma_X^2$ provided $\sigma_X^2 > 0$. Therefore, many computations simplify if we normalize the random variable properly. Let us introduce the concept of a normalized random variable $X^* = \frac{X - \mu_X}{\sigma_X}$. The normalized random variable has a zero mean and a variance equal to one. By the invariance under a linear transform, the correlation coefficient $\rho(X, Y) = \rho(X^*, Y^*)$ and also $\rho(X, Y) = \mathrm{Cov}[X^*, Y^*]$. The variance of $X^* \pm Y^*$ follows from (2.57) as
$$
\mathrm{Var}[X^* \pm Y^*] = \mathrm{Var}[X^*] + \mathrm{Var}[Y^*] \pm 2\, \mathrm{Cov}[X^*, Y^*] = 2\left(1 \pm \rho(X, Y)\right)
$$
Since the variance is always non-negative, it follows that $-1 \le \rho(X, Y) \le 1$. The extremes $\rho(X, Y) = \pm 1$ imply a linear relation between $X$ and $Y$. Indeed, $\rho(X, Y) = 1$ implies that $\mathrm{Var}[X^* - Y^*] = 0$, which is only possible if $X^* = Y^* + c$, where $c$ is a constant. Hence, $X = \frac{\sigma_X}{\sigma_Y} Y + c'$. A similar argument applies for the case $\rho(X, Y) = -1$. For example, in curve fitting, the goodness of the fit is often expressed in terms of the correlation coefficient. A perfect fit has correlation coefficient equal to 1. In particular, in linear regression where $Y = aX + b$, the regression coefficients $a_R$ and $b_R$ are the minimizers of the square distance $E\left[(Y - (aX + b))^2\right]$ and given by
$$
a_R = \frac{\mathrm{Cov}[X, Y]}{\sigma_X^2}
$$
$$
b_R = E[Y] - a_R E[X]
$$
Since a correlation coefficient $\rho(X, Y) = 1$ implies $\mathrm{Cov}[X, Y] = \sigma_X \sigma_Y$, we see that $a_R = \frac{\sigma_Y}{\sigma_X}$ as derived above with normalized random variables.
Although the linear correlation coefficient is a natural measure of the dependence between random variables, it has some disadvantages. First, the variances of $X$ and $Y$ must exist, which may cause problems with heavy-tailed distributions. Second, as illustrated above, dependent random variables can be uncorrelated, which is awkward. Third, linear correlation is not invariant under non-linear strictly increasing transformations $T$, such that $\rho(T(X), T(Y)) \ne \rho(X, Y)$. Common intuition expects that dependence measures should be invariant under these transforms $T$. This leads to the definition of rank correlation, which satisfies that invariance property. Here, we merely mention Spearman's rank correlation coefficient, which is defined as
$$
\rho_S(X, Y) = \rho\left(F_X(X), F_Y(Y)\right)
$$
where $\rho$ is the linear correlation coefficient and where the non-linear strictly increasing transform is the probability distribution. More details are found in Embrechts et al. (2001b) and in Chapter 4.
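The quantities of this subsection can be computed for a small joint pmf. The sketch below uses arbitrary illustrative probabilities; it evaluates the linear correlation coefficient (2.58), the regression coefficients $a_R$ and $b_R$, and checks the sign rule under a linear transformation with $a = -2$ and $c = 3$. The function name `summary` is an assumption of this example.

```python
from fractions import Fraction
from math import sqrt

def summary(joint):
    """Return (rho, a_R, b_R) for a joint pmf {(x, y): prob}."""
    def E(g):
        return sum(g(x, y) * p for (x, y), p in joint.items())
    mx, my = E(lambda x, y: x), E(lambda x, y: y)
    var_x = E(lambda x, y: (x - mx) ** 2)
    var_y = E(lambda x, y: (y - my) ** 2)
    cov = E(lambda x, y: (x - mx) * (y - my))
    rho = float(cov) / sqrt(var_x * var_y)     # (2.58), as a float
    a_R = cov / var_x                          # regression slope
    b_R = my - a_R * mx                        # regression intercept
    return rho, a_R, b_R

# Arbitrary illustrative joint pmf.
joint = {(0, 1): Fraction(3, 10), (1, 2): Fraction(2, 10),
         (2, 2): Fraction(2, 10), (3, 5): Fraction(3, 10)}
rho, a_R, b_R = summary(joint)

# rho(aX + b, cY + d) = sign(ac) rho(X, Y): here a = -2 and c = 3.
flipped = {(-2 * x + 7, 3 * y - 1): p for (x, y), p in joint.items()}
rho_flipped, _, _ = summary(flipped)
print(rho, rho_flipped)
```

The two printed correlation coefficients have equal magnitude and opposite sign, since $ac < 0$ in this transformation.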


2.5.4 Continuous random variables


We define the joint distribution function by $F_{XY}(x, y) = \Pr[X \le x, Y \le y]$ and the joint probability density function by
$$
f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} \tag{2.59}
$$
Hence,
$$
F_{XY}(x, y) = \Pr[X \le x, Y \le y] = \int_{-\infty}^{x} \int_{-\infty}^{y} f_{XY}(u, v)\, dv\, du \tag{2.60}
$$
The analogue of (2.54) is
$$
E[g(X, Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x, y) f_{XY}(x, y)\, dx\, dy \tag{2.61}
$$
Most of the difficulties occur in the evaluation of the multiple integrals. The change of variables in multiple dimensions involves the Jacobian. Consider the transformed random variables $U = g_1(X, Y)$ and $V = g_2(X, Y)$ and denote the inverse transform by $x = h_1(u, v)$ and $y = h_2(u, v)$, then
$$
f_{UV}(u, v) = f_{XY}\left(h_1(u, v), h_2(u, v)\right) |J(u, v)|
$$
where the Jacobian $J(u, v)$ is
$$
J(u, v) = \det \begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[2mm] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{bmatrix}
$$
If $X$ and $Y$ are independent and $Z = X + Y$, we obtain the convolution,
$$
f_Z(z) = \int_{-\infty}^{\infty} f_X(x) f_Y(z - x)\, dx = \int_{-\infty}^{\infty} f_X(z - y) f_Y(y)\, dy \tag{2.62}
$$
which is often denoted by $f_Z(z) = (f_X * f_Y)(z)$. If both $f_X(x) = 0$ and $f_Y(x) = 0$ for $x < 0$, then the definition (2.62) of the convolution reduces to
$$
(f_X * f_Y)(z) = \int_{0}^{z} f_X(x) f_Y(z - x)\, dx
$$

2.5.5 The sum of independent random variables

Let $S_N = \sum_{k=1}^{N} X_k$, where the random variables $X_k$ are all independent. We first concentrate on the case where $N = n$ is a (fixed) integer. Since $S_N = S_{N-1} + X_N$, direct application of (2.62) yields the recursion
$$f_{S_N}(z) = \int_{-\infty}^{\infty} f_{S_{N-1}}(z - y) f_{X_N}(y)\, dy \tag{2.63}$$
which, when written out explicitly, leads to the $(N-1)$-fold integral
$$f_{S_N}(z) = \int_{-\infty}^{\infty} f_{X_N}(y_N)\, dy_N \cdots \int_{-\infty}^{\infty} f_{X_2}(y_2)\, f_{X_1}(z - y_N - \cdots - y_2)\, dy_2 \tag{2.64}$$

In many cases, convolutions are more efficiently computed via generating functions. The generating function of $S_n$ equals
$$\varphi_{S_n}(z) = E\left[z^{S_n}\right] = E\left[z^{\sum_{k=1}^{n} X_k}\right] = E\left[\prod_{k=1}^{n} z^{X_k}\right]$$
Since all $X_k$ are independent, (2.55) can be applied,
$$\varphi_{S_n}(z) = \prod_{k=1}^{n} E\left[z^{X_k}\right]$$
or, in terms of generating functions,
$$\varphi_{S_n}(z) = \prod_{k=1}^{n} \varphi_{X_k}(z) \tag{2.65}$$
Hence, we arrive at the important result that the generating function of a sum of independent random variables equals the product of the generating functions of the individual random variables. We also note that the condition of independence is crucial in that it allows the product and expectation operator to be reversed, leading to the useful result (2.65). Often, the random variables $X_k$ all possess the same distribution. In this case of independent identically distributed (i.i.d.) random variables with generating function $\varphi_X(z)$, the relation (2.65) further simplifies to
$$\varphi_{S_n}(z) = (\varphi_X(z))^n \tag{2.66}$$

In the case where the number of terms $N$ in the sum $S_N$ is a random variable with generating function $\varphi_N(z)$, independent of the $X_k$, we use the general definition of expectation (2.54) for two random variables,
$$\varphi_{S_N}(z) = E\left[z^{S_N}\right] = \sum_{k=0}^{\infty} \sum_{x} z^x \Pr[S_N = x, N = k] = \sum_{k=0}^{\infty} \sum_{x} z^x \Pr[S_N = x | N = k] \Pr[N = k]$$
where the conditional probability (2.45) is used. Since the value of $S_N$ depends on the number of terms $N$ in the sum, we have $\Pr[S_N = x | N = k] = \Pr[S_k = x]$. Further, with
$$\sum_{x} z^x \Pr[S_N = x | N = k] = \varphi_{S_k}(z)$$
we have
$$\varphi_{S_N}(z) = \sum_{k=0}^{\infty} \varphi_{S_k}(z) \Pr[N = k] \tag{2.67}$$

The average $E[S_N]$ follows from (2.26) as
$$E[S_N] = \sum_{k=0}^{\infty} \varphi'_{S_k}(1) \Pr[N = k] = \sum_{k=0}^{\infty} E[S_k] \Pr[N = k] \tag{2.68}$$
Since $E[S_k] = E\left[\sum_{j=1}^{k} X_j\right] = \sum_{j=1}^{k} E[X_j]$ and assuming that all random variables $X_j$ have equal mean $E[X_j] = E[X]$, we have
$$E[S_N] = \sum_{k=0}^{\infty} k E[X] \Pr[N = k]$$
or
$$E[S_N] = E[X]\, E[N] \tag{2.69}$$
This relation (2.69) is commonly called Wald's identity. Wald's identity holds for any random sum of (possibly dependent) random variables $X_j$ provided the number $N$ of those random variables is independent of the $X_j$.
In the case of i.i.d. random variables, we apply (2.66) in (2.67) so that
$$\varphi_{S_N}(z) = \sum_{k=0}^{\infty} (\varphi_X(z))^k \Pr[N = k] = \varphi_N(\varphi_X(z)) \tag{2.70}$$
This expression is a generalization of (2.66).
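The composition rule (2.70) and Wald's identity (2.69) can be checked numerically. The sketch below is only an illustration under assumed choices that are not in the text: the number of terms is taken to be Poisson with mean `lam`, the terms are Bernoulli with success probability `p`, and the infinite sum in (2.70) is truncated at a hypothetical `k_max`.

```python
import math

def pgf_bernoulli(z, p):
    """Generating function q + p z of a Bernoulli random variable."""
    return (1.0 - p) + p * z

def pgf_poisson(z, lam):
    """Generating function exp(lam (z - 1)) of a Poisson random variable."""
    return math.exp(lam * (z - 1.0))

def pgf_random_sum(z, lam, p, k_max=100):
    """Evaluate the series in (2.70): sum_k (phi_X(z))^k Pr[N = k], truncated at k_max."""
    phi_x = pgf_bernoulli(z, p)
    prob_n = math.exp(-lam)          # Pr[N = 0] for the assumed Poisson N
    total = 0.0
    for k in range(k_max):
        total += phi_x**k * prob_n
        prob_n *= lam / (k + 1)      # Pr[N = k + 1] obtained from Pr[N = k]
    return total

def mean_from_pgf(pgf, h=1e-5):
    """Central-difference estimate of the derivative of the pgf at z = 1, i.e. the mean."""
    return (pgf(1.0 + h) - pgf(1.0 - h)) / (2.0 * h)
```

With these assumptions, `pgf_random_sum(z, lam, p)` should agree with `pgf_poisson(pgf_bernoulli(z, p), lam)`, and the numerically estimated mean should be close to `lam * p`, the product of the two means.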

2.6 Conditional expectation

The generating function (2.67) of a random sum of independent random variables can be derived using the conditional expectation $E[Y | X = x]$ of two random variables $X$ and $Y$. We will first define the conditional expectation and derive an interesting property.
Suppose that we know that $X = x$. The conditional density function $f_{Y|X}(y|x)$, defined by (2.51), of the random variable $Y_c = Y|X$ can then be regarded as a function of $y$ only. Using the definition of the expectation (2.33) for continuous random variables (the discrete case is analogous), we have
$$E[Y | X = x] = \int_{-\infty}^{\infty} y f_{Y|X}(y|x)\, dy \tag{2.71}$$
Since this expression holds for any value of $x$ that the random variable $X$ can take, we see that $E[Y | X = x] = g(x)$ is a function of $x$ and, in addition, since $X = x$, $E[Y | X = x] = g(X)$ can be regarded as a random variable that is a function of the random variable $X$. Having identified the conditional expectation $E[Y | X = x]$ as a random variable, let us compute its expectation or the expectation of the slightly more general random variable $h(X) g(X)$ with $g(X) = E[Y | X = x]$. From the general definition (2.34) of the expectation, it follows that
$$E[h(X) g(X)] = \int_{-\infty}^{\infty} h(x) g(x) f_X(x)\, dx = \int_{-\infty}^{\infty} h(x) E[Y | X = x] f_X(x)\, dx$$
Substituting (2.71) yields
$$E[h(X) g(X)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x)\, y f_{Y|X}(y|x) f_X(x)\, dy\, dx = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} h(x)\, y f_{XY}(x,y)\, dy\, dx = E[h(X) Y]$$
where we have used (2.51) and (2.61). Thus, we find the interesting relation
$$E[h(X) E[Y | X = x]] = E[h(X) Y] \tag{2.72}$$
As a special case where $h(x) = 1$, the expectation of the conditional expectation follows as
$$E[Y] = E_X[E_Y[Y | X = x]]$$
where the index in $E_Z$ clarifies that the expectation is over the random variable $Z$. Applying this relation to $Y = z^{S_N}$, where $S_N = \sum_{k=1}^{N} X_k$ and all $X_k$ are independent, yields
$$\varphi_{S_N}(z) = E\left[z^{S_N}\right] = E_N\left[E_S\left[z^{S_N} | N = n\right]\right]$$
Since $E_S\left[z^{S_N} | N = n\right] = \varphi_{S_n}(z)$, as specified in (2.65), we end up with
$$\varphi_{S_N}(z) = E_N[\varphi_{S_N}(z)] = \sum_{k=0}^{\infty} \varphi_{S_k}(z) \Pr[N = k]$$
which is (2.67).

3
Basic distributions

This chapter concentrates on the most basic probability distributions and their properties. From these basic distributions, other useful distributions are derived.

3.1 Discrete random variables

3.1.1 The Bernoulli distribution
A Bernoulli random variable $X$ can only take two values: either 1 with probability $p$ or 0 with probability $q = 1 - p$. The standard example of a Bernoulli random variable is the outcome of tossing a biased coin, and, more generally, the outcome of a trial with only two possibilities, either success or failure. The sample space is $\Omega = \{0, 1\}$ and $\Pr[X = 1] = p$, while $\Pr[X = 0] = q$. From this definition, the pgf follows from (2.17) as
$$\varphi_X(z) = E\left[z^X\right] = z^0 \Pr[X = 0] + z^1 \Pr[X = 1]$$
or
$$\varphi_X(z) = q + pz \tag{3.1}$$
From (2.23) or (2.14), the $n$-th moment is
$$E[X^n] = p$$
which shows that $\mu = E[X] = p$. From (2.24), we find $E[(X - a)^n] = p(1 - a)^n + q(-a)^n$ such that the moments centered around the mean $\mu$ are
$$E[(X - \mu)^n] = p q^n + (-1)^n q p^n = pq\left(q^{n-1} + (-1)^n p^{n-1}\right)$$
Explicitly, with $p + q = 1$, $\operatorname{Var}[X] = pq$ and $E\left[(X - \mu)^3\right] = pq(q - p)$.

3.1.2 The binomial distribution

A binomial random variable $X$ is the sum of $n$ independent Bernoulli random variables. The sample space is $\Omega = \{0, 1, \ldots, n\}$. For example, $X$ may represent the number of successes in $n$ independent Bernoulli trials such as the number of heads after tossing a (biased) coin $n$ times. Application of (2.66) with (3.1) gives
$$\varphi_X(z) = (q + pz)^n \tag{3.2}$$
Expanding the binomial pgf in powers of $z$, which justifies the name "binomial",
$$\varphi_X(z) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} z^k$$
and comparing to (2.18) yields
$$\Pr[X = k] = \binom{n}{k} p^k q^{n-k} \tag{3.3}$$
The alternative, probabilistic approach starts with (3.3). Indeed, the event that $X$ has $k$ successes out of $n$ trials consists of precisely $k$ successes (an event with probability $p^k$) and $n - k$ failures (with probability equal to $q^{n-k}$). The total number of ways in which $k$ successes out of $n$ trials can be obtained is precisely $\binom{n}{k}$.
The mean follows from (2.23) or from the definition $X = \sum_{j=1}^{n} X_{\text{Bernoulli}}$ and the linearity of the expectation as $E[X] = np$. Higher order moments around the mean can be derived from (2.24) as
$$E[(X - \mu)^m] = \left. \frac{d^m}{dt^m} \left[ e^{-tnp} \left(q + p e^t\right)^n \right] \right|_{t=0} = \left. \frac{d^m}{dt^m} \sum_{k=0}^{n} \binom{n}{k} q^k p^{n-k} e^{t(qn - k)} \right|_{t=0} = \sum_{k=0}^{n} \binom{n}{k} q^k p^{n-k} (qn - k)^m$$
In general, this form seems difficult to express more elegantly. It illustrates that, even for simple random variables, computations may rapidly become unattractive. For $m = 2$, the above differentiation leads to $\operatorname{Var}[X] = npq$. But this result is more economically obtained from (2.27), since $L_X(z) = n \log(q + pz)$, $L'_X(z) = \frac{np}{q + pz}$ and $L''_X(z) = -\frac{np^2}{(q + pz)^2}$. Thus,
$$\operatorname{Var}[X] = -np^2 + np = npq \tag{3.4}$$

3.1.3 The geometric distribution

The geometric random variable $X$ returns the number of independent Bernoulli trials needed to achieve the first success. Here the sample space $\Omega$ is the infinite set of positive integers. The probability density function is
$$\Pr[X = k] = p q^{k-1} \tag{3.5}$$
because a first success (with probability $p$) obtained in the $k$-th trial is preceded by $k - 1$ failures (each having probability $q = 1 - p$). Clearly, $\Pr[X = 0] = 0$. The series expansion of the probability generating function,
$$\varphi_X(z) = pz \sum_{k=0}^{\infty} q^k z^k = \frac{pz}{1 - qz} \tag{3.6}$$
justifies the name "geometric".
The mean $E[X] = \varphi'_X(1)$ equals $E[X] = \frac{1}{p}$. The higher-order moments can be deduced from (2.24) as
$$E[(X - \mu)^n] = p \left. \frac{d^n}{dt^n} \left( \frac{e^{-tq/p}}{1 - q e^t} \right) \right|_{t=0} = p \sum_{k=0}^{\infty} q^k (k - q/p)^n$$
Similarly as for the binomial random variable, the variance most easily follows from (2.27) with $L_X(z) = \log p + \log z - \log(1 - qz)$, $L'_X(z) = \frac{1}{z} + \frac{q}{1 - qz}$ and $L''_X(z) = -\frac{1}{z^2} + \frac{q^2}{(1 - qz)^2}$. Thus,
$$\operatorname{Var}[X] = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2} \tag{3.7}$$
The distribution function $F_X(k) = \Pr[X \le k] = \sum_{j=1}^{k} \Pr[X = j]$ is obtained as
$$\Pr[X \le k] = p \sum_{j=0}^{k-1} q^j = p\, \frac{1 - q^k}{1 - q} = 1 - q^k$$
The tail probability is
$$\Pr[X > k] = q^k \tag{3.8}$$

Hence, the probability that the number of trials until the first success is larger than $k$ decreases geometrically in $k$ with rate $q$. Let us now consider an important application of the conditional probability. The probability that, given the success is not found in the first $k$ trials, success does not occur within the next $m$ trials, is with (2.44)
$$\Pr[X > k + m | X > k] = \frac{\Pr[\{X > k + m\} \cap \{X > k\}]}{\Pr[X > k]} = \frac{\Pr[X > k + m]}{\Pr[X > k]}$$
and with (3.8)
$$\Pr[X > k + m | X > k] = q^m = \Pr[X > m]$$
This conditional probability turns out to be independent of the hypothesis, the event $\{X > k\}$, and reflects the famous memoryless property. Only because $\Pr[X > k]$ obeys the functional equation $f(x + y) = f(x) f(y)$, the hypothesis or initial knowledge does not matter. It is precisely as if past failures have never occurred or are forgotten and as if, after a failure, the number of trials is reset to 0. Furthermore, the only solution to the functional equation is an exponential function. Thus, the geometric distribution is the only discrete distribution that possesses the memoryless property.
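The memoryless property can be verified directly from the tail probability (3.8). The following minimal sketch uses hypothetical function names and computes the conditional tail from the definition of conditional probability, so that it can be compared with the unconditional tail.

```python
def geometric_tail(k, p):
    """Pr[X > k] = q^k for a geometric random variable, see (3.8)."""
    return (1.0 - p) ** k

def conditional_tail(k, m, p):
    """Pr[X > k + m | X > k], computed as Pr[X > k + m] / Pr[X > k]."""
    return geometric_tail(k + m, p) / geometric_tail(k, p)
```

For any choice of `k`, the value of `conditional_tail(k, m, p)` should coincide, up to rounding, with `geometric_tail(m, p)`: the number of past failures does not influence the result.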

3.1.4 The Poisson distribution

Often we are interested in counting the number of occurrences of an event in a certain time interval, such as, for example, the number of IP packets during a time slot or the number of telephony calls that arrive at a telephone exchange per unit time. The Poisson random variable $X$ with probability density function
$$\Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!} \tag{3.9}$$
turns out to model many of these counting phenomena well, as shown in Chapter 7. The corresponding generating function is
$$\varphi_X(z) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} z^k = e^{\lambda(z - 1)} \tag{3.10}$$
and the average number of occurrences in that time interval is
$$E[X] = \lambda \tag{3.11}$$
This average determines the complete distribution. In applications it is convenient to replace the unit interval by an interval of arbitrary length $t$ such that
$$\Pr[X = k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$
equals the probability that precisely $k$ events occur in the interval with duration $t$. The probability that no events occur during $t$ time units is $\Pr[X = 0] = e^{-\lambda t}$ and the probability that at least one event (i.e. one or more) occurs is $\Pr[X > 0] = 1 - e^{-\lambda t}$. The latter is equal to the exponential distribution. We will also see later in Theorem 7.3.2 that the Poisson counting process and the exponential distribution are intimately connected. The sum of $n$ independent Poisson random variables each with mean $\lambda_k$ is again a Poisson random variable with mean $\sum_{k=1}^{n} \lambda_k$, as follows from (2.65) and (3.10).
The higher-order moments can be deduced from (2.24) as
$$E[(X - \lambda)^n] = e^{-\lambda} \left. \frac{d^n}{dt^n} e^{-\lambda(t - e^t)} \right|_{t=0}$$
from which
$$E[X] = \operatorname{Var}[X] = E\left[(X - \lambda)^3\right] = \lambda$$
The Poisson tail distribution equals
$$\Pr[X > m] = 1 - \sum_{k=0}^{m} \frac{\lambda^k e^{-\lambda}}{k!}$$
which is closely related to the distribution of a sum of exponentially distributed random variables, as demonstrated below in Section 3.3.1.
The Poisson density approximates the binomial density (3.3) if $n \to \infty$ while the mean $np = \lambda$ stays fixed. This phenomenon is often referred to as the law of rare events: in an arbitrarily large number $n$ of independent trials, each with arbitrarily small success probability $p = \frac{\lambda}{n}$, the total number of successes will approximately be Poisson distributed.
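The classical argument is to consider the binomial density (3.3) with $p = \frac{\lambda}{n}$,
$$\Pr[X = k] = \frac{n!}{k!(n - k)!} \frac{\lambda^k}{n^k} \left(1 - \frac{\lambda}{n}\right)^{n-k} = \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n} \left(1 - \frac{\lambda}{n}\right)^{-k} \prod_{j=1}^{k-1} \left(1 - \frac{j}{n}\right)$$
or
$$\log(\Pr[X = k]) = \log\left(\frac{\lambda^k}{k!}\right) + n \log\left(1 - \frac{\lambda}{n}\right) - k \log\left(1 - \frac{\lambda}{n}\right) + \sum_{j=1}^{k-1} \log\left(1 - \frac{j}{n}\right)$$
For large $n$, we use the Taylor expansion $\log\left(1 - \frac{x}{n}\right) = -\frac{x}{n} - \frac{x^2}{2n^2} + O\left(n^{-3}\right)$ to obtain up to order $O\left(n^{-2}\right)$
$$\log(\Pr[X = k]) = \log\left(\frac{\lambda^k}{k!}\right) + \frac{k\lambda}{n} + O\left(n^{-2}\right) - \frac{k(k - 1)}{2n} + O\left(n^{-2}\right) - \lambda - \frac{\lambda^2}{2n} + O\left(n^{-2}\right)$$
$$= \log\left(\frac{\lambda^k}{k!}\right) - \lambda - \frac{1}{2n}\left[(k - \lambda)^2 - k\right] + O\left(n^{-2}\right)$$
With $e^x = 1 + x + O(x^2)$, we finally obtain the approximation for large $n$,
$$\Pr[X = k] = \frac{\lambda^k e^{-\lambda}}{k!} \left(1 - \frac{1}{2n}\left[(k - \lambda)^2 - k\right] + O\left(n^{-2}\right)\right)$$
The quantity $(k - \lambda)^2 - k$ that multiplies $\frac{1}{2n}$ is negative if $k \in \left[\lambda + \frac{1}{2} - \sqrt{\lambda + \frac{1}{4}},\ \lambda + \frac{1}{2} + \sqrt{\lambda + \frac{1}{4}}\right]$. In that $k$-interval, the Poisson density is a lower bound for the binomial density for large $n$ and $np = \lambda$. The reverse holds for values of $k$ outside that interval. Since for the Poisson density $\frac{\Pr[X = k]}{\Pr[X = k - 1]} = \frac{\lambda}{k}$, we see that $\Pr[X = k]$ increases as $\lambda > k$ and decreases as $\lambda < k$. Thus, the maximum of the Poisson density lies around $k = \lambda = E[X]$. In conclusion, we can say that the Poisson density approximates the binomial density for large $n$ and $np = \lambda$ from below in the region of about the standard deviation $\sqrt{\lambda}$ around the mean $E[X] = \lambda$ and from above outside this region (in the tails of the distribution).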
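The law of rare events and the first-order correction in $\frac{1}{n}$ can be illustrated numerically. The sketch below is an illustration only; the function names are hypothetical, and the correction factor $1 - \frac{1}{2n}\left[(k - \lambda)^2 - k\right]$ is the large-$n$ expansion of the binomial density with $p = \lambda/n$.

```python
import math

def binomial_pmf(k, n, p):
    """Binomial density (3.3)."""
    return math.comb(n, k) * p**k * (1.0 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson density (3.9)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def corrected_poisson_pmf(k, n, lam):
    """Poisson density times the first-order factor 1 - ((k - lam)^2 - k) / (2n)."""
    return poisson_pmf(k, lam) * (1.0 - ((k - lam)**2 - k) / (2.0 * n))
```

For instance, with `n = 1000` and `lam = 4.0`, the value `k = 3` lies inside the interval around the mean, where the Poisson density should be slightly below the binomial density, while `k = 10` lies in the tail, where the order should be reversed. The corrected density should be closer to the binomial density than the plain Poisson density in both cases.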

A much shorter derivation anticipates results of Chapter 6 and starts from the probability generating function (3.2) of the binomial distribution after substitution of $p = \frac{\lambda}{n}$,
$$\lim_{n \to \infty} \varphi_X(z) = \lim_{n \to \infty} \left(1 + \frac{\lambda(z - 1)}{n}\right)^n = e^{\lambda(z - 1)}$$
Invoking the Continuity Theorem 6.1.3, comparison with (3.10) shows that the limit probability generating function corresponds to a Poisson distribution. The Stein-Chen (1975) Theorem [footnote 1] generalizes the law of rare events: this law even holds when the Bernoulli trials are weakly dependent.
As a final remark, let $S_n$ be the sum of i.i.d. Bernoulli trials each with mean $p$, then $S_n$ is binomially distributed as shown in Section 3.1.2. If $p$ is a constant and independent of the number of trials $n$, the Central Limit Theorem 6.3.1 states that $\frac{S_n - np}{\sqrt{np(1 - p)}}$ tends to a Gaussian distribution. In summary, the limit distribution of a sum $S_n$ of Bernoulli trials depends on how the mean $p$ varies with the number of trials $n$ when $n \to \infty$:
$$\text{if } p = \frac{\lambda}{n}, \text{ then } S_n \xrightarrow{d} \frac{\lambda^k e^{-\lambda}}{k!}$$
$$\text{if } p \text{ is constant, then } \frac{S_n - np}{\sqrt{np(1 - p)}} \xrightarrow{d} \frac{e^{-\frac{x^2}{2}}}{\sqrt{2\pi}}$$

Footnote 1: The proof (see e.g. Grimmett and Stirzacker (2001, pp. 130-132)) involves coupling theory of stochastic random variables. The degree of dependence is expressed in terms of the total variation distance. The total variation distance between two discrete random variables $X$ and $Y$ is defined as
$$d_{TV}(X,Y) = \sum_{k} |\Pr[X = k] - \Pr[Y = k]|$$
and satisfies
$$d_{TV}(X,Y) = 2 \sup_{A \subset \mathbb{Z}} |\Pr[X \in A] - \Pr[Y \in A]|$$

3.2 Continuous random variables

3.2.1 The uniform distribution
A uniform random variable $X$ has equal probability to attain any value in the interval $[a, b]$ such that the probability density function is a constant. Since $\Pr[a \le X \le b] = \int_a^b f_X(x)\, dx = 1$, the constant value equals
$$f_X(x) = \frac{1}{b - a} 1_{x \in [a,b]} \tag{3.12}$$
where $1_y$ is the indicator function defined in Section 2.2.1. The distribution function then follows as
$$\Pr[a \le X \le x] = \frac{x - a}{b - a} 1_{x \in [a,b]} + 1_{x > b}$$
The Laplace transform (2.37) is [footnote 2]
$$\varphi_X(z) = \int_{-\infty}^{\infty} e^{-zt} f_X(t)\, dt = \frac{e^{-za} - e^{-zb}}{z(b - a)} \tag{3.13}$$
while the mean $\mu = E[X]$ most easily follows from
$$E[X] = \int_{-\infty}^{\infty} \frac{x\, dx}{b - a} 1_{x \in [a,b]} = \frac{a + b}{2}$$
The centered moments are obtained from (2.39) as
$$E[(X - \mu)^n] = \frac{(-1)^n}{b - a} \left. \frac{d^n}{dz^n} \left[ \frac{e^{\frac{z}{2}(b - a)} - e^{-\frac{z}{2}(b - a)}}{z} \right] \right|_{z=0} = \frac{2(-1)^n}{b - a} \left. \frac{d^n}{dz^n} \left[ \frac{\sinh\left(\frac{b - a}{2} z\right)}{z} \right] \right|_{z=0}$$
Using the power series
$$\frac{\sinh\left(\frac{b - a}{2} z\right)}{z} = \sum_{k=0}^{\infty} \frac{\left(\frac{b - a}{2}\right)^{2k+1}}{(2k + 1)!} z^{2k}$$
leads to
$$E\left[(X - \mu)^{2n}\right] = \frac{(b - a)^{2n}}{(2n + 1)\, 2^{2n}}, \qquad E\left[(X - \mu)^{2n+1}\right] = 0 \tag{3.14}$$

Footnote 2: Notice that $z \varphi_X(z)$ equals $\frac{f * g}{ab}$, where $f * g$ is the convolution of two exponential densities $f$ and $g$ with rates $a$ and $b$, respectively.

Let us define $U$ as the uniform random variable in the interval $[0, 1]$. If $W = 1 - U$, then $W$ is also a uniform random variable on $[0, 1]$, and $W$ and $U$ have the same distribution, denoted as $W \stackrel{d}{=} U$, because $\Pr[W \le x] = \Pr[1 - U \le x] = \Pr[U \ge 1 - x] = 1 - (1 - x) = x = \Pr[U \le x]$.
The probability distribution function $F_X(x) = \Pr[X \le x] = g(x)$ whose inverse exists can be written as a function of $F_U(x) = x 1_{x \in [0,1]}$. Let $X = g^{-1}(U)$. Since the distribution function is non-decreasing, this also holds for the inverse $g^{-1}(\cdot)$. Applying (2.32) yields with $X = g^{-1}(U)$
$$F_X(x) = \Pr\left[g^{-1}(U) \le x\right] = \Pr[U \le g(x)] = F_U(g(x)) = g(x)$$
For instance, $g^{-1}(U) = -\frac{\ln(1 - U)}{\alpha} \stackrel{d}{=} -\frac{\ln U}{\alpha}$ are exponential random variables (3.17) with parameter $\alpha$; $g^{-1}(U) = U^{1/\alpha}$ are polynomially distributed random variables with distribution $\Pr[X \le x] = x^{\alpha}$; $g^{-1}(U) = \cot(\pi U)$ is a Cauchy random variable defined in (3.38) below. In addition, we observe that $U = g(X) = F_X(X)$, which means that any random variable $X$ is transformed into a uniform random variable $U$ on $[0, 1]$ by its own distribution function.
The numbers $a_k$ that satisfy congruent recursions of the form $a_{k+1} = (\alpha a_k + \beta) \bmod M$, where $M$ is a large prime number (e.g. $M = 2^{31} - 1$) and $\alpha$ and $\beta$ are integers (e.g. $\alpha = 397\,204\,094$ and $\beta = 0$), are to a good approximation uniformly distributed. The scaled numbers $y_k = \frac{a_k}{M - 1}$ are nearly uniformly distributed on $[0, 1]$. Since these recursions with initial value or seed $a_0 \in [0, M - 1]$ are easy to generate with computers (Press et al., 1992), the above property is very useful to generate arbitrary random variables $X = g^{-1}(U)$ from the uniform random variable $U$.
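The recursion and the inverse transform described above can be sketched in a few lines. This is only an illustration: the constant and function names are hypothetical, the seed is assumed to be a non-zero integer below the modulus (with a zero increment a zero seed would stay zero), and the exponential case is used as the example inverse.

```python
import math

M = 2**31 - 1          # modulus: the large prime quoted in the text
ALPHA = 397204094      # multiplier quoted in the text
BETA = 0               # increment quoted in the text

def lcg_uniforms(seed, count):
    """Yield `count` scaled numbers y_k = a_k / (M - 1), nearly uniform on (0, 1]."""
    a = seed
    for _ in range(count):
        a = (ALPHA * a + BETA) % M   # congruent recursion a_{k+1} = (alpha a_k + beta) mod M
        yield a / (M - 1)

def exponential_samples(rate, seed, count):
    """Inverse transform: X = -ln(U) / rate is exponential with the given rate."""
    return [-math.log(u) / rate for u in lcg_uniforms(seed, count)]
```

The form $-\ln U$ rather than $-\ln(1 - U)$ is used here because the scaled numbers can equal 1 but never 0, so the logarithm is always defined. The sample mean of `exponential_samples(rate, seed, count)` should approach `1 / rate` for large `count`.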

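3.2.2 The exponential distribution

An exponential random variable $X$ satisfies the probability density function
$$f_X(x) = \alpha e^{-\alpha x}, \qquad x \ge 0 \tag{3.15}$$
where $\alpha$ is the rate at which events occur. The corresponding Laplace transform is
$$\varphi_X(z) = \alpha \int_0^{\infty} e^{-\alpha t} e^{-zt}\, dt = \frac{\alpha}{z + \alpha} \tag{3.16}$$
and the probability distribution is, for $x \ge 0$,
$$F_X(x) = 1 - e^{-\alpha x} \tag{3.17}$$
The mean or average follows from (2.33) or from $E[X] = -\varphi'_X(0)$ as $\mu = E[X] = \frac{1}{\alpha}$. The centered moments are obtained from (2.39) as
$$E\left[\left(X - \frac{1}{\alpha}\right)^n\right] = (-1)^n \alpha \left. \frac{d^n}{dz^n} \left( \frac{e^{z/\alpha}}{z + \alpha} \right) \right|_{z=0}$$
Since the Taylor expansion of $\frac{e^{z/\alpha}}{z + \alpha}$ around $z = 0$ is
$$\frac{e^{z/\alpha}}{z + \alpha} = \frac{1}{\alpha} \left( \sum_{k=0}^{\infty} \frac{1}{k!} \frac{z^k}{\alpha^k} \right) \left( \sum_{k=0}^{\infty} (-1)^k \frac{z^k}{\alpha^k} \right) = \frac{1}{\alpha} \sum_{n=0}^{\infty} (-1)^n \left( \sum_{k=0}^{n} \frac{(-1)^k}{k!} \right) \frac{z^n}{\alpha^n}$$
we find that
$$E\left[\left(X - \frac{1}{\alpha}\right)^n\right] = \frac{n!}{\alpha^n} \sum_{k=0}^{n} \frac{(-1)^k}{k!} \tag{3.18}$$
For large $n$, the centered moments are well approximated by
$$E\left[\left(X - \frac{1}{\alpha}\right)^n\right] \simeq \frac{n!}{e\, \alpha^n}$$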
The exponential random variable possesses, just as its discrete counterpart, the geometric random variable, the memoryless property. Indeed, analogous to Section 3.1.3, consider
$$\Pr[X \ge t + T | X > t] = \frac{\Pr[\{X \ge t + T\} \cap \{X > t\}]}{\Pr[X > t]} = \frac{\Pr[X \ge t + T]}{\Pr[X > t]}$$
and since $\Pr[X > t] = e^{-\alpha t}$, the memoryless property
$$\Pr[X \ge t + T | X > t] = \Pr[X > T]$$
is established. Since the only non-zero solution (proved in Feller (1970, p. 459)) to the functional equation $f(x + y) = f(x) f(y)$, which implies the memoryless property, is of the form $c^x$, it shows that the exponential distribution is the only continuous distribution that has the memoryless property. As we will see later, this memoryless property is a fundamental property in Markov processes.
It is instructive to show the close relation between the geometric and exponential random variable (see Feller (1971, p. 1)). Consider the waiting time $T$ (measured in integer units of $\Delta t$) for the first success in a sequence of Bernoulli trials where only one trial occurs in a timeslot $\Delta t$. Hence, $X = \frac{T}{\Delta t}$ is a (dimensionless) geometric random variable. From (3.8), $\Pr[T > k \Delta t] = (1 - p)^k$ and the average waiting time is $E[T] = \Delta t\, E[X] = \frac{\Delta t}{p}$. The transition from the discrete to continuous space involves the limit process $\Delta t \to 0$ subject to a fixed average waiting time $E[T]$. Let $t = k \Delta t$, then
$$\lim_{\Delta t \to 0} \Pr[T > t] = \lim_{\Delta t \to 0} \left(1 - \frac{\Delta t}{E[T]}\right)^{t / \Delta t} = e^{-t / E[T]}$$
For arbitrarily small time units, the waiting time for the first success with average $E[T]$ turns out to be an exponential random variable.

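3.2.3 The Gaussian or normal distribution

The Gaussian random variable $X$ is defined for all $x$ by the probability density function
$$f_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right] \tag{3.19}$$
which explicitly shows its dependence on the average $\mu$ and variance $\sigma^2$. The importance of the Gaussian random variables stems from the Central Limit Theorem 6.3.1. Often, a Gaussian (also called normal) random variable with average $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$. The distribution function is
$$F_X(x) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt = \Phi\left(\frac{x - \mu}{\sigma}\right) \tag{3.20}$$
where [footnote 3] $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt$ is the normalized Gaussian distribution corresponding to $\mu = 0$ and $\sigma = 1$. The double-sided Laplace transform is
$$\varphi_X(z) = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-zt} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt = e^{\frac{\sigma^2 z^2}{2} - \mu z} \tag{3.22}$$

Footnote 3: Abramowitz and Stegun (1968, Section 7.1.1) define the error function as
$$\operatorname{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\, dt$$
such that (Abramowitz and Stegun, 1968, Section 7.1.22)
$$\frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{x} \exp\left[-\frac{(t - \mu)^2}{2\sigma^2}\right] dt = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma \sqrt{2}}\right)\right] \tag{3.21}$$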
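The relation (3.21) gives a direct way to evaluate the Gaussian distribution function with a standard error function. The sketch below uses Python's `math.erf` and, as a hedged cross-check, a simple midpoint-rule integration of the density (3.19). The function names, the lower integration cut-off of ten standard deviations and the number of steps are assumptions made for this illustration.

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) = (1/2) [1 + erf((x - mu) / (sigma sqrt(2)))], relation (3.21)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def gaussian_cdf_numeric(x, mu=0.0, sigma=1.0, steps=20000):
    """Midpoint-rule integration of the density (3.19) from mu - 10 sigma up to x."""
    lower = mu - 10.0 * sigma
    h = (x - lower) / steps
    total = 0.0
    for i in range(steps):
        t = lower + (i + 0.5) * h
        total += math.exp(-(t - mu)**2 / (2.0 * sigma**2))
    return total * h / (sigma * math.sqrt(2.0 * math.pi))
```

Both functions should agree to several digits, for example at `x = 1.3` with `mu = 1.0` and `sigma = 2.0`.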

and the centered moments (2.39) are
$$E\left[(X - \mu)^{2n}\right] = \left. \frac{d^{2n}}{dz^{2n}} e^{\frac{\sigma^2 z^2}{2}} \right|_{z=0} = \frac{(2n)!}{n!} \left(\frac{\sigma^2}{2}\right)^n, \qquad E\left[(X - \mu)^{2n+1}\right] = 0 \tag{3.23}$$
We note from (2.65) that a sum of independent Gaussian random variables $N(\mu_k, \sigma_k^2)$ is again a Gaussian random variable $N\left(\sum_{k=1}^{n} \mu_k, \sum_{k=1}^{n} \sigma_k^2\right)$. If $X = N(\mu, \sigma^2)$, then the scaled random variable $Y = aX$ is a $N(a\mu, (a\sigma)^2)$ random variable, which is verified by computing $\Pr[Y \le y] = \Pr\left[X \le \frac{y}{a}\right]$. Similarly for translation, if $Y = X + b$, then $Y = N(\mu + b, \sigma^2)$. Hence, a linear combination of independent Gaussian random variables is again a Gaussian random variable,
$$\sum_{k=1}^{n} a_k N(\mu_k, \sigma_k^2) + b = N\left(\sum_{k=1}^{n} a_k \mu_k + b,\ \sum_{k=1}^{n} a_k^2 \sigma_k^2\right)$$

3.3 Derived distributions


From the basic distributions, a large number of other distributions can be
derived as illustrated here.
3.3.1 The sum of independent exponential random variables
By applying (2.65) and (2.38), a substantial number of practical problems can be solved. For example, the sum of $n$ independent exponential random variables, each with different rate $\alpha_k > 0$, has the generating function
$$\varphi_{S_n}(z) = \prod_{k=1}^{n} \frac{\alpha_k}{z + \alpha_k}$$
and probability density function
$$f_{S_n}(t) = \frac{\prod_{k=1}^{n} \alpha_k}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{\prod_{k=1}^{n} (z + \alpha_k)}\, dz$$
The contour can be closed over the negative half plane for $t > 0$, where the integrand has simple poles at $z = -\alpha_k$. From the Cauchy integral theorem, we obtain
$$f_{S_n}(t) = \left(\prod_{k=1}^{n} \alpha_k\right) \sum_{j=1}^{n} \frac{e^{-\alpha_j t}}{\prod_{k=1; k \ne j}^{n} (\alpha_k - \alpha_j)}$$

If all rates are equal, $\alpha_k = \alpha$, the case reduces to $\varphi_{S_n}(z) = \left(\frac{\alpha}{z + \alpha}\right)^n$ with $E[S_n] = \frac{n}{\alpha}$ and with probability density
$$f_{S_n}(t) = \frac{\alpha^n}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{(z + \alpha)^n}\, dz$$
Again, the contour can be closed over the negative half plane and the contribution of the $n$-th order pole is deduced from Cauchy's relation for the $k$-th derivative of a complex function,
$$\frac{1}{k!} \left. \frac{d^k f(z)}{dz^k} \right|_{z = z_0} = \frac{1}{2\pi i} \int_{C(z_0)} \frac{f(\omega)\, d\omega}{(\omega - z_0)^{k+1}}$$
as
$$f_{S_n}(t) = \frac{\alpha^n}{(n - 1)!} \left. \frac{d^{n-1} e^{zt}}{dz^{n-1}} \right|_{z = -\alpha} = \frac{\alpha (\alpha t)^{n-1}}{(n - 1)!} e^{-\alpha t} \tag{3.24}$$
For integer $n$, this density corresponds to the $n$-Erlang random variable. When extended to real values of $n = \beta$,
$$f_X(t; \alpha, \beta) = \frac{\alpha (\alpha t)^{\beta - 1}}{\Gamma(\beta)} e^{-\alpha t} \tag{3.25}$$
it is called the Gamma probability density function, with corresponding pgf
$$\varphi_X(z; \alpha, \beta) = \left(\frac{\alpha}{z + \alpha}\right)^{\beta} = \left(1 + \frac{z}{\alpha}\right)^{-\beta} \tag{3.26}$$
and distribution
$$F_X(x; \alpha, \beta) = \frac{\alpha^{\beta}}{\Gamma(\beta)} \int_0^x t^{\beta - 1} e^{-\alpha t}\, dt \tag{3.27}$$
This integral, the incomplete Gamma-function, can only be expressed in closed analytic form if $\beta$ is an integer. Hence, for the $n$-Erlang random variable $X$, the distribution follows after repeated partial integration as
$$F_X(x; \alpha, n) = \frac{\alpha^n}{(n - 1)!} \int_0^x t^{n-1} e^{-\alpha t}\, dt = 1 - \sum_{k=0}^{n-1} \frac{(\alpha x)^k}{k!} e^{-\alpha x} \tag{3.28}$$
We observe that $\Pr[X > x] = \sum_{k=0}^{n-1} \frac{(\alpha x)^k}{k!} e^{-\alpha x}$, which equals $\Pr[Y \le n - 1]$ where $Y$ is a Poisson random variable with mean $\lambda = \alpha x$. Further, $\Pr[X > x] = \Pr[X^* > \alpha x]$, where $E[X^*] = n$, or the distribution of the sum of i.i.d. exponential random variables each with rate $\alpha$ follows by scaling $x \to \alpha x$ from the distribution of the sum of i.i.d. exponential random variables each with unit rate (or mean 1). Moreover, (2.65) and (3.26) show that a sum of $n$ independent Gamma random variables specified by $\beta_k$ (but with the same $\alpha$) is again a Gamma random variable with $\beta = \sum_{k=1}^{n} \beta_k$.
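The closed form (3.28) of the Erlang distribution can be compared with a direct numerical integration of the density (3.24). The following sketch is an illustration under assumptions: the function names are hypothetical, and a midpoint rule with a fixed number of steps stands in for the repeated partial integration.

```python
import math

def erlang_pdf(t, rate, n):
    """Density (3.24) of the sum of n i.i.d. exponential random variables with the given rate."""
    return rate * (rate * t)**(n - 1) * math.exp(-rate * t) / math.factorial(n - 1)

def erlang_cdf(x, rate, n):
    """Closed form (3.28): 1 - sum_{k=0}^{n-1} (rate x)^k e^{-rate x} / k!."""
    return 1.0 - sum((rate * x)**k * math.exp(-rate * x) / math.factorial(k)
                     for k in range(n))

def erlang_cdf_numeric(x, rate, n, steps=20000):
    """Midpoint-rule integration of the density over [0, x]."""
    h = x / steps
    return h * sum(erlang_pdf((i + 0.5) * h, rate, n) for i in range(steps))
```

The sum inside `erlang_cdf` is exactly the Poisson probability of at most `n - 1` events with mean `rate * x`, which is the relation between the Erlang tail and the Poisson distribution noted in the text.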
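At last, all centered moments follow from (2.39) by series expansion around $z = 0$ as
$$E[(X - a)^n] = (-1)^n \left. \frac{d^n}{dz^n} \left[ e^{za} \left(1 + \frac{z}{\alpha}\right)^{-\beta} \right] \right|_{z=0} = (-1)^n n!\, a^n \sum_{m=0}^{n} \binom{-\beta}{m} \frac{(a\alpha)^{-m}}{(n - m)!}$$
In particular, since $E[X] = \mu = \frac{\beta}{\alpha}$, we find with $\binom{-z}{m} = (-1)^m \frac{\Gamma(z + m)}{m!\, \Gamma(z)}$
$$E[(X - \mu)^n] = (-1)^n n! \frac{\beta^n}{\alpha^n} \sum_{m=0}^{n} \binom{-\beta}{m} \frac{\beta^{-m}}{(n - m)!} = (-1)^n \frac{\beta^n}{\alpha^n} \sum_{m=0}^{n} \binom{n}{m} (-1)^m \frac{\Gamma(\beta + m)}{\Gamma(\beta)\, \beta^m} = (-1)^n \frac{\beta^n}{\alpha^n} (-\beta)^{\beta}\, U(\beta, \beta + 1 + n, -\beta)$$
where $U(a, b, z)$ is the confluent hypergeometric function (Abramowitz and Stegun, 1968, Chapter 13). For example, if $n = 2$, the variance equals $\sigma^2 = \frac{\beta}{\alpha^2}$ and further, $E\left[(X - \mu)^3\right] = \frac{2\beta}{\alpha^3}$, $E\left[(X - \mu)^4\right] = \frac{3\beta(\beta + 2)}{\alpha^4}$ and $E\left[(X - \mu)^5\right] = \frac{4\beta(5\beta + 6)}{\alpha^5}$.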
5

3.3.2 The sum of independent uniform random variables

The sum $S_k = \sum_{j=1}^{k} U_j$ of $k$ i.i.d. uniform random variables $U_j$ has as probability density function the $k$-fold convolution of the uniform density function $f_U(x) = 1_{0 \le x \le 1}$ on $[0, 1]$, denoted by $f_U^{(k*)}(x)$. The distribution function equals
$$\Pr[S_k \le x] = \sum_{j=0}^{[x]} \frac{(-1)^j}{j!\,(k - j)!} (x - j)^k \tag{3.29}$$
Indeed, from (2.66) and (3.13) the Laplace transform of $S_k$ is
$$\varphi_{S_k}(z) = \left(\frac{1 - e^{-z}}{z}\right)^k$$
The inverse Laplace transform determines, for $c > 0$,
$$f_U^{(k*)}(x) = \frac{d}{dx} \Pr[S_k \le x] = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \left(\frac{1 - e^{-z}}{z}\right)^k e^{zx}\, dz$$
Using $(1 - e^{-z})^k = \sum_{j=0}^{k} \binom{k}{j} (-1)^j e^{-jz}$ and the integral $\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{sa}}{s^{n+1}}\, ds = \frac{a^n}{n!} 1_{\operatorname{Re}(a) > 0}$ yields
$$f_U^{(k*)}(x) = \frac{1}{(k - 1)!} \sum_{j=0}^{k} \binom{k}{j} (-1)^j (x - j)^{k-1} 1_{(x - j) \ge 0} \tag{3.30}$$
from which (3.29) follows by integration.
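The distribution function (3.29) is a finite sum and is straightforward to evaluate. The sketch below is a minimal illustration with a hypothetical function name; values of `x` outside the range $(0, k)$ are handled separately because the sum in (3.29) is meant for $0 \le x \le k$.

```python
import math

def uniform_sum_cdf(x, k):
    """Pr[S_k <= x] for the sum of k i.i.d. uniform random variables on [0, 1], see (3.29)."""
    if x <= 0.0:
        return 0.0
    if x >= k:
        return 1.0
    upper = math.floor(x)   # [x], the integer part of x
    return sum((-1)**j * (x - j)**k / (math.factorial(j) * math.factorial(k - j))
               for j in range(upper + 1))
```

For `k = 1` the function reduces to the uniform distribution itself, and by symmetry of the density around `k / 2` the value at `x = k / 2` should equal one half.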

3.3.3 The chi-square distribution

Suppose that the total error of $n$ independent measurements $X_k$, each perturbed by Gaussian noise, has to be determined. In order to prevent errors from cancelling out, the sum of the squared errors $S = \sum_{k=1}^{n} e_k^2$ is preferred rather than $\sum_{k=1}^{n} |e_k|$. For simplicity, we assume that all errors $e_k = X_k - x_k$, where $x_k$ is the exact value of quantity $k$, have zero mean and unit variance. The corresponding distribution of $S$ is known as the chi-square distribution. From the $\chi^2$-distribution, the $\chi^2$-test in statistics is deduced, which determines the goodness of fit of a model of a distribution to a set of measurements. We refer for a discussion of the $\chi^2$-test to Leon-Garcia (1994, Section 3.8) or Allen (1978, Section 8.4).
We first deduce the distribution of the square $Y = X^2$ of a random variable $X$ and note that if $U$ and $V$ are independent, so are the random variables $g(U)$ and $h(V)$. The event $\{Y \le y\}$ or $\{X^2 \le y\}$ is equivalent to $\{-\sqrt{y} \le X \le \sqrt{y}\}$ and non-existent if $y < 0$. With (2.29) and $y \ge 0$,
$$\Pr[Y \le y] = \Pr[-\sqrt{y} \le X \le \sqrt{y}] = F_X(\sqrt{y}) - F_X(-\sqrt{y})$$
and, after differentiation,
$$f_{X^2}(x) = \frac{f_X(\sqrt{x}) + f_X(-\sqrt{x})}{2\sqrt{x}}$$
If $X$ is a Gaussian random variable $N(\mu, \sigma^2)$, then, for $x \ge 0$,
$$f_{X^2}(x) = \frac{\exp\left[-\frac{x + \mu^2}{2\sigma^2}\right]}{\sigma \sqrt{2\pi x}} \cosh\left(\frac{\mu \sqrt{x}}{\sigma^2}\right)$$
In particular, for $N(0, 1)$ random variables where $\mu = 0$ and $\sigma = 1$, $f_{X^2}(x) = \frac{e^{-\frac{x}{2}}}{\sqrt{2\pi x}}$ reduces to a Gamma distribution (3.25) with $\beta = \frac{1}{2}$ and $\alpha = \frac{1}{2}$. Since the sum of $n$ independent Gamma random variables with $(\alpha, \beta)$ is again a Gamma random variable $(\alpha, n\beta)$, we arrive at the chi-square $\chi^2$ probability density function,
$$f_{\chi^2}(x) = \frac{x^{\frac{n}{2} - 1}}{2^{\frac{n}{2}}\, \Gamma\left(\frac{n}{2}\right)} e^{-\frac{x}{2}} \tag{3.31}$$

3.4 Functions of random variables

3.4.1 The maximum and minimum of a set of independent random variables
The minimum of $m$ independent random variables $\{X_k\}_{1 \le k \le m}$ possesses the distribution [footnote 4]
$$\Pr\left[\min_{1 \le k \le m} X_k \le x\right] = \Pr[\text{at least one } X_k \le x] = \Pr[\text{not all } X_k > x]$$
or
$$\Pr\left[\min_{1 \le k \le m} X_k \le x\right] = 1 - \prod_{k=1}^{m} \Pr[X_k > x] \tag{3.32}$$
whereas for the maximum,
$$\Pr\left[\max_{1 \le k \le m} X_k > x\right] = \Pr[\text{not all } X_k \le x] = 1 - \prod_{k=1}^{m} \Pr[X_k \le x]$$
or
$$\Pr\left[\max_{1 \le k \le m} X_k \le x\right] = \prod_{k=1}^{m} \Pr[X_k \le x] \tag{3.33}$$
For example, the distribution function for the minimum of $m$ independent exponential random variables follows from (3.17) as
$$\Pr\left[\min_{1 \le k \le m} X_k \le x\right] = 1 - \prod_{k=1}^{m} e^{-\alpha_k x} = 1 - \exp\left(-x \sum_{k=1}^{m} \alpha_k\right)$$
or, the minimum of $m$ independent exponential random variables each with rate $\alpha_k$ is again an exponential random variable with rate $\sum_{k=1}^{m} \alpha_k$. In addition to the memoryless property, this property of the exponential distribution will determine the fundamentals of Markov chains.
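The statement about the minimum of independent exponential random variables can be illustrated by simulation. The sketch below is an assumption-laden illustration: it uses Python's standard `random.Random.expovariate` generator with a fixed seed, and the function name and default seed are hypothetical.

```python
import random

def simulate_min_mean(rates, trials, seed=1):
    """Estimate E[min_k X_k] for independent exponential X_k with the given rates."""
    rng = random.Random(seed)            # fixed seed so that the run is reproducible
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(rate) for rate in rates)
    return total / trials
```

If the minimum is exponential with rate equal to the sum of the individual rates, the estimate for rates 1, 2 and 3 should be close to 1/6.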
Footnote 4: An alternative argument for independent random variables is that the event $\{\min_{1 \le k \le m} X_k > x\}$ occurs if and only if $\{X_k > x\}$ for each $1 \le k \le m$. Similarly, the event $\{\max_{1 \le k \le m} X_k \le x\}$ occurs if and only if $\{X_k \le x\}$ for each $1 \le k \le m$.


3.4.2 Order statistics

The set $X_{(1)}, X_{(2)}, \ldots, X_{(m)}$ are called the order statistics of the set of random variables $\{X_k\}_{1 \le k \le m}$ if $X_{(k)}$ is the $k$-th smallest value of the set $\{X_k\}_{1 \le k \le m}$. Clearly, $X_{(1)} = \min_{1 \le k \le m} X_k$ while $X_{(m)} = \max_{1 \le k \le m} X_k$. If the set $\{X_k\}_{1 \le k \le m}$ consists of i.i.d. random variables with pdf $f_X$, the joint density function of the order statistics is, for only $x_1 < x_2 < \cdots < x_m$,
$$f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \frac{\partial^m}{\partial x_1 \cdots \partial x_m} \Pr\left[X_{(1)} \le x_1, \ldots, X_{(m)} \le x_m\right] = m! \prod_{j=1}^{m} f_X(x_j) \tag{3.34}$$
Indeed, confining ourselves to discrete random variables for simplicity, if $x_1 < x_2 < \cdots < x_m$, then
$$\Pr\left[X_{(1)} = x_1, \ldots, X_{(m)} = x_m\right] = m! \Pr[X_1 = x_1, \ldots, X_m = x_m]$$
else
$$\Pr\left[X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(m)} = x_m\right] = 0$$
because there are precisely $m!$ permutations of the set $\{X_k\}_{1 \le k \le m}$ onto the given ordered sequence $\{x_1, x_2, \ldots, x_m\}$. If the sequence is not ordered such that $x_k > x_l$ for at least one couple of indices $k < l$, then the probability is zero because the event $\{X_{(k)} > X_{(l)}\}$ is, by definition, impossible. Finally, the product in (3.34) follows by independence.
If the set $\{X_k\}_{1 \le k \le m}$ is uniformly distributed over $[0, t]$, then
$$f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \begin{cases} \frac{m!}{t^m} & 0 \le x_1 < x_2 < \cdots < x_m \le t \\ 0 & \text{elsewhere} \end{cases}$$
while for exponential random variables with $f_X(x) = \alpha e^{-\alpha x}$,
$$f_{\{X_{(j)}\}}(x_1, x_2, \ldots, x_m) = \begin{cases} m!\, \alpha^m e^{-\alpha \sum_{j=1}^{m} x_j} & 0 \le x_1 < x_2 < \cdots < x_m \\ 0 & \text{elsewhere} \end{cases}$$
The order relation between the set $X_{(1)} \le X_{(2)} \le \cdots \le X_{(m)}$ is preserved after a continuous, non-decreasing transform $g$, i.e. $g\left(X_{(1)}\right) \le g\left(X_{(2)}\right) \le \cdots \le g\left(X_{(m)}\right)$. If the distribution function $F_X$ is continuous (it is always non-decreasing), the argument shows that the order statistics of a general set of i.i.d. random variables $\{X_k\}_{1 \le k \le m}$ can be reduced to a study of the order statistics of the set of i.i.d. uniform random variables $\{U_k\}_{1 \le k \le m}$ on $[0, 1]$ because $U = F_X(X)$.

The event $\{X_{(k)} \le x\}$ means that at least $k$ among the $m$ random variables $\{X_j\}_{1 \le j \le m}$ are smaller than $x$. Since each of the $m$ random variables is chosen independently from a same distribution $F_X$, the number of the $m$ random variables that are smaller than $x$ is binomially distributed with parameter $p = \Pr[X \le x]$. Hence,
$$\Pr\left[X_{(k)} \le x\right] = \sum_{n=k}^{m} \binom{m}{n} (\Pr[X \le x])^n (1 - \Pr[X \le x])^{m-n} \tag{3.35}$$
The probability density function can be obtained in the usual, though cumbersome, way by differentiation. Writing $F_X(x) = \Pr[X \le x]$,
$$f_{X_{(k)}}(x) = \frac{d \Pr\left[X_{(k)} \le x\right]}{dx} = \sum_{n=k}^{m} \binom{m}{n} \frac{d}{dx} \left[ (F_X(x))^n (1 - F_X(x))^{m-n} \right]$$
$$= f_X(x) \sum_{n=k}^{m} n \binom{m}{n} (F_X(x))^{n-1} (1 - F_X(x))^{m-n} - f_X(x) \sum_{n=k}^{m} (m - n) \binom{m}{n} (F_X(x))^{n} (1 - F_X(x))^{m-n-1}$$
Using $n \binom{m}{n} = m \binom{m-1}{n-1}$, $(m - n) \binom{m}{n} = m \binom{m-1}{n}$ and lowering the upper index in the last summation, we have
$$f_{X_{(k)}}(x) = m f_X(x) \sum_{n=k}^{m} \binom{m-1}{n-1} (F_X(x))^{n-1} (1 - F_X(x))^{m-n} - m f_X(x) \sum_{n=k}^{m-1} \binom{m-1}{n} (F_X(x))^{n} (1 - F_X(x))^{m-n-1}$$
$$= m f_X(x) \sum_{n=k}^{m} \binom{m-1}{n-1} (F_X(x))^{n-1} (1 - F_X(x))^{m-n} - m f_X(x) \sum_{n=k+1}^{m} \binom{m-1}{n-1} (F_X(x))^{n-1} (1 - F_X(x))^{m-n}$$
or
$$f_{X_{(k)}}(x) = m f_X(x) \binom{m-1}{k-1} (F_X(x))^{k-1} (1 - F_X(x))^{m-k} \tag{3.36}$$
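The step from the binomial tail (3.35) to the density (3.36) can be checked numerically. The sketch below is an illustration under the assumption of uniform random variables on $[0, 1]$, for which $f_X(x) = 1$ and $F_X(x) = x$; the function names are hypothetical.

```python
import math

def order_stat_cdf(F, k, m):
    """(3.35): Pr[X_(k) <= x] as a binomial tail, where F = Pr[X <= x]."""
    return sum(math.comb(m, n) * F**n * (1.0 - F)**(m - n) for n in range(k, m + 1))

def order_stat_pdf_uniform(x, k, m):
    """(3.36) for uniform random variables on [0, 1], where f_X(x) = 1 and F_X(x) = x."""
    return m * math.comb(m - 1, k - 1) * x**(k - 1) * (1.0 - x)**(m - k)
```

A central difference of `order_stat_cdf` around a point `x` in $(0, 1)$ should reproduce `order_stat_pdf_uniform(x, k, m)`, and for `k = 1` the distribution should reduce to that of the minimum, $1 - (1 - F)^m$.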

The more elegant and faster argument is as follows: in order for $X_{(k)}$ to be equal to $x$, exactly $k-1$ of the $m$ random variables $\{X_j\}_{1\le j\le m}$ must be less than $x$, one equal to $x$ and the other $m-k$ must all be greater than $x$. Abusing the notation $f_{X_{(k)}}(x) = \Pr\left[X_{(k)} = x\right]$ and observing that $m\binom{m-1}{k-1} = \frac{m!}{1!\,(k-1)!\,(m-k)!}$ is an instance of the multinomial coefficient $\frac{m!}{n_1!\,n_2!\cdots n_k!}$, which gives the number of ways of putting $m = n_1 + n_2 + \cdots + n_k$ different objects into $k$ different boxes with $n_j$ in the $j$-th box, leads alternatively to (3.36).
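As a quick numerical sanity check of (3.35) and (3.36), the following minimal sketch evaluates both formulas for a uniform parent distribution on [0,1] and compares a finite-difference derivative of the distribution function with the closed-form density. The function names and the chosen values of $k$, $m$ and $x$ are illustrative assumptions, not part of the text.

```python
import math

def order_stat_cdf(k, m, F):
    """Pr[X_(k) <= x] from (3.35); F is the value F_X(x) of the parent distribution."""
    return sum(math.comb(m, n) * F**n * (1.0 - F)**(m - n) for n in range(k, m + 1))

def order_stat_pdf(k, m, F, f):
    """Density (3.36); F = F_X(x) and f = f_X(x) of the parent distribution."""
    return m * f * math.comb(m - 1, k - 1) * F**(k - 1) * (1.0 - F)**(m - k)

# Assumed example: uniform parent on [0,1], so F_X(x) = x and f_X(x) = 1.
k, m, x, h = 2, 5, 0.3, 1e-6
numeric = (order_stat_cdf(k, m, x + h) - order_stat_cdf(k, m, x - h)) / (2 * h)
closed_form = order_stat_pdf(k, m, x, 1.0)
print(numeric, closed_form)  # the two values should agree closely
```

For a non-uniform parent, the same functions apply after substituting the appropriate values of $F_X(x)$ and $f_X(x)$.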
3.5 Examples of other distributions

1. The Gumbel distribution appears in the theory of extremes (see Section 6.4) and is defined by the distribution function
\[ F_{\mathrm{Gumbel}}(x) = e^{-e^{-a(x-b)}} \tag{3.37} \]
The corresponding Laplace transform is
\[ \varphi_{\mathrm{Gumbel}}(z) = \int_{-\infty}^{\infty} e^{-zt}\, e^{-e^{-a(t-b)}}\, a e^{-a(t-b)}\,dt = e^{-bz}\,\Gamma\left(1+\frac{z}{a}\right) \]
from which the mean follows as $E[X] = -\frac{d}{dz}\left[e^{-bz}\,\Gamma\left(1+\frac{z}{a}\right)\right]_{z=0} = b + \frac{\gamma}{a}$, where $\gamma = 0.57721\ldots$ is the Euler constant. The variance is best computed with (2.43), resulting in $\mathrm{Var}[X] = \frac{\pi^2}{6a^2}$.
2. The Cauchy distribution has the probability density function
\[ f_{\mathrm{Cauchy}}(x) = \frac{1}{\pi\left(1+x^2\right)} \]
and corresponding distribution,
\[ F_{\mathrm{Cauchy}}(x) = \frac{1}{\pi}\left(\frac{\pi}{2} + \arctan x\right) \]
The Laplace transform
\[ \varphi_{\mathrm{Cauchy}}(z) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-zx}\,dx}{1+x^2} \tag{3.38} \]
only converges for purely imaginary $z = i\omega$, in which case it reduces to a Fourier transform,
\[ \varphi_{\mathrm{Cauchy}}(i\omega) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} \]
This integral is best evaluated by contour integration. If $\omega \ge 0$, we consider a contour $C$ consisting of the real axis and the semi-circle that encloses the negative $\mathrm{Im}(x)$-plane,
\[ \int_C \frac{e^{-i\omega x}\,dx}{1+x^2} = \int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} + \lim_{r\to\infty}\int_0^{\pi}\frac{e^{-i\omega r e^{-i\theta}}\,d\left(re^{-i\theta}\right)}{1+r^2e^{-2i\theta}} \]
Since $\left|e^{-i\omega r e^{-i\theta}}\right| = e^{-\omega r\sin\theta} = e^{-|\omega| r\sin\theta}$ and $\sin\theta \ge 0$ for $0\le\theta\le\pi$, the limit of the last integral vanishes. The contour encloses the simple pole (zero of $x^2+1 = (x-i)(x+i)$) at $x = -i$ and is traversed clockwise. Applying Cauchy's residue theorem, we obtain
\[ \int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} = -2\pi i\lim_{x\to -i}\frac{e^{-i\omega x}(x+i)}{1+x^2} = \pi e^{-\omega} \]
If $\omega < 0$, we close the contour over the positive $\mathrm{Im}(x)$-plane such that the contribution of the semi-circle to the contour $C$ again vanishes. The resulting contour then encloses the simple pole at $x = i$ and
\[ \int_{-\infty}^{\infty}\frac{e^{-i\omega x}\,dx}{1+x^2} = 2\pi i\lim_{x\to i}\frac{e^{-i\omega x}(x-i)}{1+x^2} = \pi e^{\omega} \]
Combining both expressions results in
\[ \varphi_{\mathrm{Cauchy}}(i\omega) = E\left[e^{-i\omega X}\right] = e^{-|\omega|} \]
Since $|\omega|$ is not analytic around $\omega = 0$, none of the moments of the Cauchy distribution exists! Hence, the Cauchy distribution is an example of a distribution without mean (see the requirement for the existence of the expectation in Section 2.3.2): the improper integral $\int_{-\infty}^{\infty}\frac{x\,dx}{1+x^2}$ equals 0 due to symmetry (in the Riemann sense), but both $\int_{-\infty}^{0}\frac{x\,dx}{1+x^2}$ and $\int_{0}^{\infty}\frac{x\,dx}{1+x^2}$ diverge.
In addition, if $S_n = \sum_{k=1}^{n}X_k$ is the sum of i.i.d. Cauchy random variables $X_k$, the sample mean $\frac{S_n}{n}$ has the Fourier transform,
\[ E\left[e^{-i\omega\frac{S_n}{n}}\right] = E\left[e^{-i\frac{\omega}{n}\sum_{k=1}^{n}X_k}\right] = \prod_{k=1}^{n}E\left[e^{-i\frac{\omega}{n}X_k}\right] = \left(E\left[e^{-i\frac{\omega}{n}X}\right]\right)^{n} = e^{-|\omega|} \]
Hence, the sample mean $\frac{S_n}{n}$ of i.i.d. Cauchy random variables is again a Cauchy random variable independent of $n$. This means that the law of large numbers (see Section 6.2) does not hold for the Cauchy random variable, as a consequence of the non-existence of the mean. Also, the sum $S_n$ has Fourier transform $e^{-|n\omega|}$ and the pdf equals $f_{S_n}(x) = \frac{1}{\pi n}\,\frac{1}{1+(x/n)^2}$.
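The statement that the sample mean of i.i.d. Cauchy random variables is again Cauchy can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the helper names, the seed and the sample sizes are arbitrary choices, and the variates are drawn by inverting $F_{\mathrm{Cauchy}}$. If the sample mean is standard Cauchy for every $n$, the estimated probability $\Pr\left[|S_n/n| \le 1\right]$ should stay near $F_{\mathrm{Cauchy}}(1) - F_{\mathrm{Cauchy}}(-1) = \frac12$ instead of tending to 1.

```python
import math
import random

def cauchy_sample(rng):
    """Draw one standard Cauchy variate by inverting F(x) = 1/2 + arctan(x)/pi."""
    return math.tan(math.pi * (rng.random() - 0.5))

def fraction_within_one(n, trials, rng):
    """Estimate Pr[|S_n / n| <= 1] for the mean of n i.i.d. Cauchy variates."""
    hits = 0
    for _ in range(trials):
        mean = sum(cauchy_sample(rng) for _ in range(n)) / n
        if abs(mean) <= 1.0:
            hits += 1
    return hits / trials

rng = random.Random(2006)  # arbitrary seed, fixed for reproducibility
for n in (1, 10, 100):
    # The estimate does not approach 1 as n grows: averaging does not concentrate.
    print(n, fraction_within_one(n, 4000, rng))
```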
3. The Weibull distribution with pdf defined for $x \ge 0$ and $a, b > 0$
\[ f_{\mathrm{Weibull}}(x) = \frac{\exp\left[-\left(\frac{x}{a}\right)^{b}\right]}{a\,\Gamma\left(1+\frac{1}{b}\right)} \tag{3.39} \]
generalizes the exponential distribution (3.17) corresponding to $b = 1$ and $a = \frac{1}{\alpha}$. It is related to the Gaussian distribution if $b = 2$. Let $X$ be a Weibull random variable. All higher moments can be computed from (2.34) as
\[ E\left[X^k\right] = \frac{1}{a\,\Gamma\left(1+\frac{1}{b}\right)}\int_0^{\infty}x^{k}\exp\left[-\left(\frac{x}{a}\right)^{b}\right]dx = \frac{a^{k}\,\Gamma\left(\frac{k+1}{b}\right)}{\Gamma\left(\frac{1}{b}\right)} \]
The generating function possesses the expansion
\[ \varphi_X(z) = E\left[e^{-zX}\right] = \sum_{k=0}^{\infty}\frac{(-z)^{k}}{k!}E\left[X^k\right] = \frac{1}{\Gamma\left(\frac{1}{b}\right)}\sum_{k=0}^{\infty}\Gamma\left(\frac{k+1}{b}\right)\frac{(-za)^{k}}{k!} \]
which cannot be summed in explicit form for general $b$.
Sometimes an alternative definition of the Weibull distribution appears
\[ f_{\mathrm{Weibull}}(x) = a b\,x^{b-1}e^{-ax^{b}} \tag{3.40} \]
with the advantage of a simpler expression for the distribution function
\[ F_{\mathrm{Weibull}}(x) = 1 - e^{-ax^{b}} \]
If $X$ possesses this probability density (3.40), the moments and variance are
\[ E\left[X^k\right] = \int_0^{\infty}x^{k}f_{\mathrm{Weibull}}(x)\,dx = \frac{\Gamma\left(\frac{k}{b}+1\right)}{a^{k/b}} \]
\[ \mathrm{Var}[X] = \int_0^{\infty}\left(x - E[X]\right)^{2}f_{\mathrm{Weibull}}(x)\,dx = \frac{\Gamma\left(1+\frac{2}{b}\right)-\Gamma^{2}\left(1+\frac{1}{b}\right)}{a^{2/b}} \]
The interest of the Weibull distribution in the Internet stems from the self-similar and long-range dependence of observables (i.e. quantities that can be measured such as the delay, the interarrival times of packets, etc.). Especially if the shape factor $b \in (0,1)$, the Weibull has a sub-exponential tail that decays more slowly than an exponential, but still faster than any power law.
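Because the alternative definition (3.40) has an explicit distribution function, Weibull variates are easy to draw by inversion. The sketch below is a hedged illustration: the function names, the parameter values $a = 2$, $b = 0.5$ and the seed are assumptions made for the example. It compares a sample mean with the moment formula that follows (3.40).

```python
import math
import random

def weibull_moment(k, a, b):
    """E[X^k] = Gamma(1 + k/b) / a^(k/b) for the density (3.40)."""
    return math.gamma(1.0 + k / b) / a ** (k / b)

def weibull_sample(a, b, rng):
    """Invert F(x) = 1 - exp(-a x^b) at a uniform variate."""
    u = 1.0 - rng.random()  # u lies in (0, 1], so log(u) is finite
    return (-math.log(u) / a) ** (1.0 / b)

a, b = 2.0, 0.5             # illustrative values; b < 1 gives the sub-exponential tail
rng = random.Random(7)      # arbitrary seed
draws = [weibull_sample(a, b, rng) for _ in range(200_000)]
print(sum(draws) / len(draws), weibull_moment(1, a, b))
```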
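4. Power law behavior is often described via the Pareto distribution with pdf for $x \ge 0$ and $\tau > 0$,
\[ f_{\mathrm{Pareto}}(x) = \frac{\alpha}{\tau}\left(1+\frac{x}{\tau}\right)^{-\alpha-1} \tag{3.41} \]
and with distribution function
\[ F_{\mathrm{Pareto}}(x) = \frac{\alpha}{\tau}\int_0^{x}\left(1+\frac{t}{\tau}\right)^{-\alpha-1}dt = 1 - \left(1+\frac{x}{\tau}\right)^{-\alpha} \tag{3.42} \]
Since $\lim_{x\to\infty}F(x) = 1$, the power $\alpha$ must exceed 0. The higher moments are Beta-functions (Abramowitz and Stegun, 1968, Section 6.2.1)
\[ E\left[X^k\right] = \frac{\alpha}{\tau}\int_0^{\infty}\frac{x^{k}\,dx}{\left(1+\frac{x}{\tau}\right)^{\alpha+1}} = k!\,\tau^{k}\,\frac{\Gamma(\alpha-k)}{\Gamma(\alpha)} \]
and show that $E\left[X^k\right]$ only exists if $\alpha > k$. Hence, the mean $E[X]$ only exists if $\alpha > 1$. The deep tail asymptotic for large $x$ is $f_{\mathrm{Pareto}}(x) = O\left(x^{-\alpha-1}\right)$ and $\Pr[X > x] = O\left(x^{-\alpha}\right)$. For example, the distribution of the nodal degree in the Internet has an exponent around $\alpha = 2.4$ (see Section 15.3).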
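The distribution function (3.42) can be inverted in closed form, which gives a simple way to draw Pareto variates. The following sketch assumes the parameter names `alpha` (power) and `tau` (scale) for the two parameters of (3.41); the helper names, parameter values and seed are illustrative choices. With power 4 and scale 3 the mean exists and the moment formula above gives scale/(power - 1) = 1.

```python
import random

def pareto_cdf(x, alpha, tau):
    """Distribution function (3.42)."""
    return 1.0 - (1.0 + x / tau) ** (-alpha)

def pareto_sample(alpha, tau, rng):
    """Solve F(x) = 1 - u for x, with u uniform on (0, 1]."""
    u = 1.0 - rng.random()
    return tau * (u ** (-1.0 / alpha) - 1.0)

alpha, tau = 4.0, 3.0        # illustrative values with alpha > 2 (finite variance)
rng = random.Random(11)      # arbitrary seed
draws = [pareto_sample(alpha, tau, rng) for _ in range(200_000)]
print(sum(draws) / len(draws), tau / (alpha - 1.0))
```

For powers at or below 1 the same sampler still works, but the sample mean no longer settles, in line with the non-existence of $E[X]$.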
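5. Another distribution with heavy tails is the lognormal distribution defined as the random variable $X = e^{Y}$ where $Y = N\left(\mu,\sigma^2\right)$ is a Gaussian or normal random variable. From (2.32), it follows that $\Pr\left[e^{Y}\le x\right] = \Pr[Y\le\log x]$ for $x \ge 0$, and with (3.20)
\[ F_{\mathrm{lognormal}}(x) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\log x}\exp\left[-\frac{(t-\mu)^2}{2\sigma^2}\right]dt \tag{3.43} \]
and, for $x > 0$,
\[ f_{\mathrm{lognormal}}(x) = \frac{\exp\left[-\frac{(\log x-\mu)^2}{2\sigma^2}\right]}{\sigma x\sqrt{2\pi}} \tag{3.44} \]
The moments are
\begin{align*}
E\left[X^k\right] &= \frac{1}{\sigma\sqrt{2\pi}}\int_0^{\infty}x^{k-1}\exp\left[-\frac{(\log x-\mu)^2}{2\sigma^2}\right]dx \\
&= \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ku}\exp\left[-\frac{(u-\mu)^2}{2\sigma^2}\right]du
\end{align*}
or, explicitly,
\[ E\left[X^k\right] = \exp(k\mu)\exp\left(\frac{k^2\sigma^2}{2}\right) \tag{3.45} \]
and
\[ \mathrm{Var}[X] = e^{2\mu}\left(e^{2\sigma^2}-e^{\sigma^2}\right) \tag{3.46} \]
The probability generating function is by definition (2.37)
\[ \varphi_X\left(z;\mu,\sigma^2\right) = \frac{1}{\sigma\sqrt{2\pi}}\int_0^{\infty}\frac{e^{-zt}}{t}\,e^{-\frac{(\log t-\mu)^2}{2\sigma^2}}dt = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-ze^{x}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}dx \tag{3.47} \]
The integral (3.47) indicates that $\varphi_X\left(z;\mu,\sigma^2\right)$ only exists for $\mathrm{Re}(z)\ge 0$. This means that $\varphi_X\left(z;\mu,\sigma^2\right)$ is not analytic at any point $z = it$ on the imaginary axis because the circle with arbitrarily small but non-zero radius around $z = it$ necessarily encircles points with $\mathrm{Re}(z) < 0$ where $\varphi_X\left(z;\mu,\sigma^2\right)$ does not exist. Hence, the Taylor expansion (2.40) of the generating function around $z = 0$ does not exist, although all moments or derivatives at $z = 0$ exist. Indeed, the series
\[ \sum_{k=0}^{\infty}\frac{(-1)^{k}E\left[X^k\right]}{k!}z^{k} = \sum_{k=0}^{\infty}\frac{\left(-ze^{\mu}\right)^{k}}{k!}\exp\left(\frac{k^2\sigma^2}{2}\right) \]
is a divergent series (except for $\sigma = 0$ or $z = 0$). The fact that the pgf (3.47) is not available in closed form complicates the computation of the sum of i.i.d. lognormal random variables via (2.66). This sum appears in radio communications with several transmitters and receivers.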
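The divergence of the moment series can be made concrete numerically. The sketch below (function names and parameter values are assumptions for illustration) evaluates the logarithm of the absolute value of the $k$-th series term, working in logarithms because the terms themselves overflow double precision quickly. The factor $\exp\left(k^2\sigma^2/2\right)$ eventually dominates $k!$, so the terms grow without bound.

```python
import math

def lognormal_moment(k, mu, sigma):
    """E[X^k] = exp(k mu) exp(k^2 sigma^2 / 2), cf. (3.45)."""
    return math.exp(k * mu + 0.5 * (k * sigma) ** 2)

def log_abs_series_term(k, z, mu, sigma):
    """log |(-z e^mu)^k / k! * exp(k^2 sigma^2 / 2)|, computed in logs to avoid overflow."""
    return k * (math.log(abs(z)) + mu) + 0.5 * (k * sigma) ** 2 - math.lgamma(k + 1)

mu, sigma, z = 0.0, 1.0, 1.0   # illustrative parameter values
for k in (1, 10, 20, 40):
    # The log-magnitude keeps increasing with k, so the series cannot converge.
    print(k, log_abs_series_term(k, z, mu, sigma))
```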
In radio communications, the received signal levels decrease with the distance between the transmitter and the receiver. This phenomenon is called pathloss. Attenuation of radio signals due to pathloss has been modeled by averaging the measured signal powers over long times and over various locations with the same distances to the transmitter. The mean value of the signal power found in this way is referred to as the area mean power $P_a$ (in Watts) and is well-modeled as $P_a(r) = c\,r^{-\eta}$ where $c$ is a constant and $\eta$ is the pathloss exponent^5. In reality the received power levels may vary significantly around the mean power $P_a(r)$ due to irregularities in the surroundings of the receiving and transmitting antennas. Measurements have revealed that the logarithm of the mean power $P(r)$ at different locations on a circle with radius $r$ around the transmitter is approximately normally distributed with mean equal to the logarithm of the area mean power $P_a(r)$. The lognormal shadowing model assumes that the logarithm of $P(r)$ is precisely normally distributed around the logarithmic value of the area mean power: $\log_{10}(P(r)) = \log_{10}(P_a(r)) + X$, where $X = N(0,\sigma)$ is a zero-mean normal distributed random variable (in dB) with standard deviation $\sigma$ (also in dB and for severe fluctuations up to 12 dB). Hence, the random variable $P(r) = P_a(r)\,10^{X}$ has a lognormal distribution (3.43) equal to
\[ \Pr[P(r)\le x] = \Pr\left[X\le\log_{10}\frac{x}{P_a(r)}\right] = \frac{1}{\sigma\sqrt{2\pi}\,\log 10}\int_0^{x}\exp\left[-\frac{\left(\log_{10}u-\log_{10}(P_a(r))\right)^2}{2\sigma^2}\right]\frac{du}{u} \]
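3.6 Summary tables of probability distributions

3.6.1 Discrete random variables

Name | $\Pr[X = k]$ | $E[X]$ | $\mathrm{Var}[X]$ | $\varphi_X(z) = E\left[z^{X}\right]$
Bernoulli | $\Pr[X = 1] = p$ | $p$ | $p(1-p)$ | $1-p+pz$
Binomial | $\binom{n}{k}p^{k}(1-p)^{n-k}$ | $np$ | $np(1-p)$ | $\left((1-p)+pz\right)^{n}$
Geometric | $p(1-p)^{k-1}$ | $\frac{1}{p}$ | $\frac{1-p}{p^2}$ | $\frac{pz}{1-(1-p)z}$
Poisson | $\frac{\lambda^{k}}{k!}e^{-\lambda}$ | $\lambda$ | $\lambda$ | $e^{\lambda(z-1)}$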
5 The constant $c$ depends on the transmitted power, the receiver and the transmitter antenna gains and the wavelength. The pathloss exponent $\eta$ depends on the environment and terrain structure and can vary between 2 in free space and 6 in urban areas.

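3.6.2 Continuous random variables

Name | $f_X(x)$ | $E[X]$ | $\mathrm{Var}[X]$ | $\varphi_X(z) = E\left[e^{-zX}\right]$
Uniform | $\frac{1_{a\le x\le b}}{b-a}$ | $\frac{a+b}{2}$ | $\frac{(b-a)^2}{12}$ | $\frac{e^{-za}-e^{-zb}}{z(b-a)}$
Exponential | $\alpha e^{-\alpha x}$ | $\frac{1}{\alpha}$ | $\frac{1}{\alpha^2}$ | $\frac{\alpha}{z+\alpha}$
Gaussian | $\frac{\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]}{\sigma\sqrt{2\pi}}$ | $\mu$ | $\sigma^2$ | $\exp\left[\frac{\sigma^2z^2}{2}-\mu z\right]$
Gamma | $\frac{\alpha(\alpha x)^{\beta-1}}{\Gamma(\beta)}e^{-\alpha x}$ | $\frac{\beta}{\alpha}$ | $\frac{\beta}{\alpha^2}$ | $\left(\frac{\alpha}{z+\alpha}\right)^{\beta}$
Gumbel | $e^{-x}e^{-e^{-x}}$ | $\gamma = 0.5772\ldots$ | $\frac{\pi^2}{6}$ | $\Gamma(z+1)$
Cauchy | $\frac{1}{\pi\left(1+x^2\right)}$ | does not exist | does not exist | $e^{-|\mathrm{Im}(z)|}$ ($\mathrm{Re}(z) = 0$)
Weibull | $ab\,x^{b-1}e^{-ax^{b}}$ | $\frac{\Gamma\left(1+\frac{1}{b}\right)}{a^{1/b}}$ | $\frac{\Gamma\left(1+\frac{2}{b}\right)-\Gamma^{2}\left(1+\frac{1}{b}\right)}{a^{2/b}}$ |
Pareto | $\frac{\alpha}{\tau}\left(1+\frac{x}{\tau}\right)^{-\alpha-1}$ | $\frac{\tau\,1_{\{\alpha>1\}}}{\alpha-1}$ | $\frac{\alpha\tau^2\,1_{\{\alpha>2\}}}{(\alpha-1)^2(\alpha-2)}$ |
Lognormal | $\frac{\exp\left[-\frac{(\log x-\mu)^2}{2\sigma^2}\right]}{\sigma x\sqrt{2\pi}}$ | $\exp(\mu)\exp\left(\frac{\sigma^2}{2}\right)$ | $e^{2\mu}\left(e^{2\sigma^2}-e^{\sigma^2}\right)$ |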

3.7 Problems
(i) If $\varphi_X(z)$ is the probability generating function of a non-zero discrete random variable $X$, find an expression of $E[\log X]$ in terms of $\varphi_X(z)$.
(ii) Compute the mean value of the $k$-th order statistic in an ensemble of (a) $m$ i.i.d. exponentially distributed random variables with mean $\frac{1}{\alpha}$ and (b) $m$ i.i.d. polynomially distributed random variables on [0,1].
(iii) Discuss how a probability density function of a continuous random variable $X$ can be approximated from a set $\{x_1, x_2, \ldots, x_n\}$ of $n$ measurements or simulations.
(iv) In a circle with radius $r$ around a sending mobile node, there are $N-1$ other mobile nodes uniformly distributed over that circle. The possible interference caused by these other mobile nodes depends on their distance to the sending node at the center. Derive, for large $N$ but constant density of mobile nodes, the pdf of the distance of the $m$-th nearest node to the center.
(v) Let $X$ and $Y$ be two independent random variables. What is the probability that the one is larger than the other?

4
Correlation

In this chapter methods to compute bi-variate correlated random variables are discussed. As a measure for the correlation, the linear correlation coefficient defined in (2.58) is used. First, the generation of $n$ correlated Gaussian random variables is explained. The sequel is devoted to the construction of two correlated random variables with arbitrary distribution.

4.1 Generation of correlated Gaussian random variables

Due to the importance of Gaussian correlated random variables as an underlying system for generating arbitrary correlated random variables, as will be demonstrated in Section 4.3, we discuss how they can be generated in multiple dimensions. With the notation of Section 3.2.3, a Gaussian (normal) random variable with average $\mu$ and variance $\sigma^2$ is denoted by $N\left(\mu,\sigma^2\right)$. By linearly combining Gaussian random variables, we can create a new Gaussian random variable with a desired mean $\mu$ and variance $\sigma^2$.

4.1.1 Generation of two independent Gaussian random variables

The fact that a linear combination of Gaussian random variables is again a Gaussian random variable allows us to concentrate on normalized Gaussian random variables $N(0,1)$. Let $X_1$ and $X_2$ be two independent normalized Gaussian random variables. Independent random variables are not correlated and the linear correlation coefficient $\rho = 0$. The resulting joint probability distribution is $f_{X_1X_2}(x,y;\rho) = f_{X_1}(x)f_{X_2}(y)$ and with (3.19),
\[ f_{X_1X_2}(x,y;0) = \frac{e^{-\frac{x^2+y^2}{2}}}{2\pi} \]
It is natural to consider a polar transformation and the transformed random variables $R = X_1^2 + X_2^2$ and $\Theta = \arctan\frac{X_2}{X_1}$. The inverse transform is $x = \sqrt{r}\cos\theta$ and $y = \sqrt{r}\sin\theta$, which differs slightly from the usual polar transformation in that we now define $r = x^2+y^2$ instead of $r^2 = x^2+y^2$. The reason is that the Jacobian is simpler for our purposes,
\[ J(r,\theta) = \det\begin{bmatrix}\frac{\partial x}{\partial r} & \frac{\partial x}{\partial\theta}\\ \frac{\partial y}{\partial r} & \frac{\partial y}{\partial\theta}\end{bmatrix} = \det\begin{bmatrix}\frac{\cos\theta}{2\sqrt{r}} & -\sqrt{r}\sin\theta\\ \frac{\sin\theta}{2\sqrt{r}} & \sqrt{r}\cos\theta\end{bmatrix} = \frac{1}{2} \]
whereas the usual polar transformation has the Jacobian equal to the variable $r$. Using the transformation rules in Section 2.5.4,
\[ f_{R\Theta}(r,\theta) = \frac{e^{-\frac{r}{2}}}{4\pi} \]
which shows that $f_{R\Theta}(r,\theta)$ does not depend on $\theta$. Hence, we can write $f_{R\Theta}(r,\theta) = f_R(r)f_\Theta(\theta)$ with $f_\Theta(\theta) = c$, where $c$ is a constant and $f_R(r) = \frac{e^{-r/2}}{4\pi c}$. This implies that $\Theta$ is a uniform random variable over an interval of length $1/c$. We also recognize from (3.15) that $f_R(r)$ is close to an exponential random variable with rate $\alpha = \frac12$. Therefore, it is instructive to choose the constant $c$ such that $R$ is precisely an exponential random variable with rate $\alpha = \frac12$. Thus, choosing $c = \frac{1}{2\pi}$, we end up with $f_R(r) = \frac12 e^{-\frac{r}{2}}$ and $f_\Theta(\theta) = \frac{1}{2\pi}$. These two independent random variables $R$ and $\Theta$ can each be generated separately from a uniform random variable $U$ on [0,1], as discussed in Section 3.2.1, leading to
\[ R = -2\ln(U_1), \qquad \Theta = 2\pi U_2 \]
and, finally, to the independent Gaussian random variables
\[ X_1 = \sqrt{-2\ln(U_1)}\,\cos 2\pi U_2, \qquad X_2 = \sqrt{-2\ln(U_1)}\,\sin 2\pi U_2 \]
The procedure can be used to generate a single Gaussian random variable, but also more independent Gaussians by repeating the generation procedure.
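The generation procedure of this section can be sketched in a few lines. The code below is a minimal illustration, not a reference implementation: the function name `box_muller`, the seed and the sample size are assumptions, and one uniform variate is mapped to (0,1] so that the logarithm stays finite. The empirical mean, variance and cross moment of the generated pairs should lie close to 0, 1 and 0.

```python
import math
import random

def box_muller(rng):
    """Return two independent N(0,1) variates from two independent uniforms."""
    u1 = 1.0 - rng.random()                    # in (0, 1] so that log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))    # sqrt(R) with R exponential, rate 1/2
    angle = 2.0 * math.pi * u2                 # Theta uniform on [0, 2 pi)
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(41)                        # arbitrary seed
pairs = [box_muller(rng) for _ in range(100_000)]
n = len(pairs)
mean1 = sum(x1 for x1, _ in pairs) / n
var1 = sum((x1 - mean1) ** 2 for x1, _ in pairs) / n
cross = sum(x1 * x2 for x1, x2 in pairs) / n   # estimates E[X1 X2]
print(mean1, var1, cross)
```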
4.1.2 The n-joint Gaussian probability distribution function

A collection of $n$ random variables $X_i$ is called a random vector $X = (X_1, X_2, \ldots, X_n)^T$, a matrix with dimension $n\times 1$. The average of a random vector is a vector with components $E[X_i]$ for $1\le i\le n$. The variance of a random vector
\[ \mathrm{Var}[X] = E\left[(X-E[X])(X-E[X])^T\right] = E\left[XX^T\right] - E[X]\left(E[X]\right)^T \]
is a matrix $\Sigma_X$ with elements $(\Sigma_X)_{i,j} = \mathrm{Cov}[X_i,X_j]$. Since $\mathrm{Cov}[X_i,X_j] = \mathrm{Cov}[X_j,X_i]$, the covariance matrix $\Sigma_X$ is real and symmetric, $\Sigma_X = \Sigma_X^T$. The importance of real, symmetric matrices is that they have real eigenvectors (see Appendix A.2). Moreover, $\Sigma_X$ is non-negative definite because, using vector norms defined in Section A.3,
\begin{align*}
x^T\Sigma_X x &= E\left[x^T(X-E[X])(X-E[X])^T x\right] \\
&= E\left[\left((X-E[X])^T x\right)^T (X-E[X])^T x\right] \\
&= E\left[\left\|(X-E[X])^T x\right\|_2^2\right] \ge 0
\end{align*}
which implies that all real eigenvalues $\lambda_i$ are non-negative. Hence, there exists an orthogonal matrix $U$ such that
\[ \Sigma_X = U\,\mathrm{diag}(\lambda_i)\,U^T \tag{4.1} \]
If all random variables $X_i$ are independent, $\mathrm{Cov}[X_i,X_j] = 0$ for $i\ne j$ and $\mathrm{Cov}[X_i,X_i] = \mathrm{Var}[X_i]\ge 0$, then $\Sigma_X = \mathrm{diag}(\mathrm{Var}[X_i])$.
Gaussian random variables are completely determined by the mean and the variance, i.e. by the first two moments. We will now show that the existence of an orthogonal transformation for any probability distribution such that $U^T\Sigma_XU = \mathrm{diag}(\lambda_i)$ implies that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Also the reverse holds, which will be used below to generate $n$ joint correlated Gaussian random variables. The multi-dimensional generating function of an $n$-joint Gaussian or $n$-joint normal random vector $X$ is defined for the vector $z = (z_1, z_2, \ldots, z_n)^T$ as
\[ \varphi_X(z) = E\left[e^{-z^TX}\right] = \exp\left[\frac12 z^T\Sigma_X z - \left(E[X]\right)^T z\right] \tag{4.2} \]
Using (4.1), and the fact that $U$ is an orthogonal matrix such that $U^{-1} = U^T$ and $UU^T = I$,
\[ \varphi_X(z) = \exp\left[\frac12\left(U^Tz\right)^T\mathrm{diag}(\lambda_i)\,U^Tz - \left(U^TE[X]\right)^TU^Tz\right] \]
Denote the vectors $w = U^Tz$ and $m = U^TE[X]$. Then we have
\begin{align*}
\varphi_X(z) &= \exp\left[\frac12 w^T\mathrm{diag}(\lambda_i)\,w - m^Tw\right] \\
&= \exp\left[\sum_{j=1}^{n}\left(\frac{\lambda_jw_j^2}{2} - m_jw_j\right)\right] = \prod_{j=1}^{n}e^{\frac{\lambda_jw_j^2}{2}-m_jw_j}
\end{align*}
and $e^{\frac{\lambda_jw_j^2}{2}-m_jw_j} = \varphi_{X_j}(w_j)$ is the Laplace transform (3.22) of a Gaussian random variable $X_j$ because all $\lambda_j$ are real and non-negative. With (2.65), this shows that a vector of joint Gaussian random variables can be transformed into a vector of independent Gaussian random variables. Reversing the order of the manipulations also justifies that (4.2) indeed defines a general $n$-joint Gaussian probability generating function. If $X_1, X_2, \ldots, X_n$ are joint normal and not correlated, then $\Sigma_X$ is a diagonal matrix, which implies that $X_1, X_2, \ldots, X_n$ are independent. As discussed in Section 2.5.2, independence implies non-correlation, but the converse is generally not true. These properties make Gaussian random variables particularly suited to deal with correlations.

Fig. 4.1. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$ and $\rho = 0$.

The corresponding $n$-joint Gaussian probability density function of the vector $X$ can be derived after inverse Laplace transform for the vector $x = (x_1, x_2, \ldots, x_n)^T$ as
\[ f_X(x) = \frac{1}{\left(\sqrt{2\pi}\right)^{n}\sqrt{\det\Sigma_X}}\exp\left[-\frac12\left(x-E[X]\right)^T\Sigma_X^{-1}\left(x-E[X]\right)\right] \tag{4.3} \]
The inverse Laplace transform for $n = 2$ is computed in Section C.2.
After computing the inverse matrix and the determinant in (4.3) explicitly, the two-dimensional ($n = 2$) or bi-variate Gaussian probability density function is
\[ f_{XY}(x,y;\rho) = \frac{\exp\left[-\frac{\frac{(x-\mu_X)^2}{\sigma_X^2}-\frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\frac{(y-\mu_Y)^2}{\sigma_Y^2}}{2\left(1-\rho^2\right)}\right]}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \tag{4.4} \]

Figures 4.1 to 4.3 plot $f_{XY}(x,y;\rho)$ for various correlation coefficients $\rho$. If $\rho = 0$, we observe that $f_{XY}(x,y;0) = f_X(x)f_Y(y)$, which indicates that uncorrelated Gaussian random variables are also independent.

Fig. 4.2. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$ and $\rho = 0.8$.

Fig. 4.3. The joint probability density function (4.4) with $\mu_X = \mu_Y = 0$ and $\sigma_X = \sigma_Y = 1$ and $\rho = -0.8$.

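If we denote $x^* = \frac{x-\mu_X}{\sigma_X}$ and $y^* = \frac{y-\mu_Y}{\sigma_Y}$, the bi-variate normal density (4.4) reduces to
\[ f_{XY}(x^*,y^*;\rho) = \frac{\exp\left[-\frac{(x^*)^2-2\rho x^*y^*+(y^*)^2}{2\left(1-\rho^2\right)}\right]}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \]
from which we can verify the partial differential equation
\[ \frac{\partial f_{XY}(x^*,y^*;\rho)}{\partial\rho} = \frac{\partial^2 f_{XY}(x^*,y^*;\rho)}{\partial x^*\,\partial y^*} \tag{4.5} \]
and the symmetry relations
\[ f_{XY}(x^*,y^*;\rho) = f_{XY}(-x^*,y^*;-\rho) = f_{XY}(x^*,-y^*;-\rho) \tag{4.6} \]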

4.1.3 Generation of n correlated Gaussian random variables

Let $\{X_i\}_{1\le i\le n}$ be a set of $n$ independent normal random variables, where each $X_i$ is distributed as $N(0,1)$. The vector $X$ is rather easily simulated. The analysis above shows that $E[X] = 0$ (the null-vector) and $\Sigma_X = \mathrm{diag}(\mathrm{Var}[X_i]) = \mathrm{diag}(1) = I$, the identity matrix. We want to generate the correlated normal vector $Y$ with a given mean vector $E[Y]$ and a given covariance matrix $\Sigma_Y$. Since linear combinations of normal random variables are normal random variables, we consider the linear transformation $Y = AX + B$ where $A$ and $B$ are constant matrices. We will now determine $A$ and $B$. First,
\[ E[Y] = E[AX] + E[B] = A\,E[X] + E[B] = E[B] \]
Hence, the matrix $B$ is a vector with components equal to the given components $E[Y_i]$ of the mean vector $E[Y]$. Second,
\begin{align*}
\Sigma_Y &= E\left[(Y-E[Y])(Y-E[Y])^T\right] \\
&= E\left[AX(AX)^T\right] = E\left[AXX^TA^T\right] = A\,E\left[XX^T\right]A^T \\
&= A\,\Sigma_X A^T = AA^T
\end{align*}
From the eigenvalue decomposition of $\Sigma_Y = U\,\mathrm{diag}(\lambda_i)\,U^T$ with real eigenvalues $\lambda_i\ge 0$ and the fact that $\mathrm{diag}(\lambda_i) = \mathrm{diag}\left(\sqrt{\lambda_i}\right)\left(\mathrm{diag}\left(\sqrt{\lambda_i}\right)\right)^T$ such that
\[ AA^T = U\,\mathrm{diag}\left(\sqrt{\lambda_i}\right)\left(\mathrm{diag}\left(\sqrt{\lambda_i}\right)\right)^TU^T \]
we obtain
\[ A = U\,\mathrm{diag}\left(\sqrt{\lambda_i}\right) \]
The matrix $A$ is also called the square root matrix of $\Sigma_Y$ and can be found from the singular value decomposition of $\Sigma_Y$ or from Cholesky factorization (Press et al., 1992).
Example Generate a normal vector $Y$ with $E[Y] = (300, 300)^T$, with standard deviations $\sigma_1 = 106.066$, $\sigma_2 = 35.355$ and correlation $\rho_Y = 0.8$.
Solution: The covariance matrix of $Y$ is obtained using the definition of the linear correlation coefficient (2.58),
\[ \Sigma_Y = \begin{bmatrix}\sigma_1^2 & \rho_Y\sigma_1\sigma_2\\ \rho_Y\sigma_1\sigma_2 & \sigma_2^2\end{bmatrix} = \begin{bmatrix}11250 & 3000\\ 3000 & 1250\end{bmatrix} \]
The square root matrix $A$ of $\Sigma_Y$ is
\[ A = \begin{bmatrix}63.640 & 84.853\\ 0 & 35.355\end{bmatrix} \]
which is readily checked by computing $AA^T = \Sigma_Y$. It remains to generate $m$ independent draws for $X_1$ and $X_2$ from a normal distribution with zero mean and unit variance as explained in Section 4.1.1. Each pair $(X_1, X_2)$ out of the $m$ pairs is transformed as $Y = AX + E[Y]$. The result, component $Y_2$ versus $Y_1$, is shown in Fig. 4.4.
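The example can be reproduced with a short simulation. The sketch below is an illustration under stated assumptions: it uses a hand-written lower-triangular Cholesky factor for the 2 x 2 case, which differs from the upper-triangular square root matrix quoted in the example but also satisfies $AA^T = \Sigma_Y$; the function names, the seed and the number of draws are arbitrary.

```python
import math
import random

def cholesky_2x2(s11, s12, s22):
    """Lower-triangular L with L L^T = [[s11, s12], [s12, s22]]."""
    l11 = math.sqrt(s11)
    l21 = s12 / l11
    l22 = math.sqrt(s22 - l21 * l21)
    return ((l11, 0.0), (l21, l22))

def correlated_normal_pairs(mean, cov, count, rng):
    """Transform independent N(0,1) pairs X into Y = L X + mean."""
    (l11, _), (l21, l22) = cholesky_2x2(cov[0][0], cov[0][1], cov[1][1])
    pairs = []
    for _ in range(count):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mean[0] + l11 * x1, mean[1] + l21 * x1 + l22 * x2))
    return pairs

def sample_correlation(pairs):
    """Empirical linear correlation coefficient of a list of (y1, y2) pairs."""
    n = len(pairs)
    m1 = sum(p[0] for p in pairs) / n
    m2 = sum(p[1] for p in pairs) / n
    c12 = sum((p[0] - m1) * (p[1] - m2) for p in pairs)
    v1 = sum((p[0] - m1) ** 2 for p in pairs)
    v2 = sum((p[1] - m2) ** 2 for p in pairs)
    return c12 / math.sqrt(v1 * v2)

cov = ((11250.0, 3000.0), (3000.0, 1250.0))   # covariance matrix of the example
pairs = correlated_normal_pairs((300.0, 300.0), cov, 20_000, random.Random(4))
print(sample_correlation(pairs))              # should lie close to the target 0.8
```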
Fig. 4.4. The scatter diagram of the simulated vector $Y$ (component $Y_2$ versus $Y_1$).

4.2 Generation of correlated random variables

Let us consider the problem of generating two correlated random variables $X$ and $Y$ with given distribution functions $F_X$ and $F_Y$. The correlation is expressed in terms of the linear correlation coefficient $\rho(X,Y) = \rho$ defined in (2.58). The need to generate correlated random variables often occurs in simulations. For example, as shown in Kuipers and Van Mieghem (2003), correlations in the link weight structure may significantly increase the computational complexity of multi-constrained routing, called in brief QoS routing. The importance of measures of dependence between quantities in risk and finance is discussed in Embrechts et al. (2001b).
In general, given the distribution functions $F_X$ and $F_Y$, not all linear correlations from $-1\le\rho\le 1$ are possible. Indeed, let $X$ and $Y$ be positive real random variables with infinite range, which means that $F_X(x) < 1$ for all finite $x$ (with $F_X(x)\to 1$ only if $x\to\infty$) and that $F_X(x) = F_Y(x) = 0$ for $x < 0$. Consider $Y = aX + b$ with $a < 0$ and $b\ge 0$. For all finite $y < 0$,
\begin{align*}
F_Y(y) = \Pr[Y\le y] = \Pr[aX+b\le y] &= \Pr\left[X\ge\frac{y-b}{a}\right] \ge \Pr\left[X > \frac{y-b}{a}\right] \\
&= 1 - F_X\left(\frac{y-b}{a}\right) > 0
\end{align*}
which contradicts the fact that $F_Y(y) = 0$ for $y < 0$. Hence, positive random variables with infinite range cannot be correlated with $\rho = -1$. The requirement that the range needs to be unbounded is necessary because two uniform random variables on [0,1], $U_1$ and $U_2$, are negatively correlated with $\rho = -1$ if $U_1 = 1 - U_2$.
In summary, the set of all possible correlations is a closed interval $[\rho_{\min}, \rho_{\max}]$ for which $\rho_{\min} < 0 < \rho_{\max}$. The precise computation of $\rho_{\min}$ and $\rho_{\max}$ is, in general, difficult, as shown below.
4.3 The non-linear transformation method

The non-linear transformation approach starts from a given set of two random variables $X_1$ and $X_2$ that have a correlation coefficient $\rho_X\in[-1,1]$. If the joint distribution function
\[ F_{X_1X_2}(x_1,x_2;\rho_X) = \Pr[X_1\le x_1, X_2\le x_2] = \int_{-\infty}^{x_1}\int_{-\infty}^{x_2}f_{X_1X_2}(u,v;\rho_X)\,du\,dv \]
is known, the marginal distribution follows from (2.60) as
\[ \Pr[X_1\le x_1] = \int_{-\infty}^{x_1}\int_{-\infty}^{\infty}f_{X_1X_2}(u,v;\rho_X)\,du\,dv \]
Since for any random variable $X$ holds that $F_X(X) = U$ where $U$ is a uniform random variable on [0,1], it follows that $U_1 = F_{X_1}(X_1)$ and $U_2 = F_{X_2}(X_2)$ are correlated uniform random variables with correlation coefficient $\rho_U$. As shown in Section 3.2.1, if $U$ is a uniform random variable on [0,1], any other random variable $Y$ with distribution function $g(x)$ can be constructed as $g^{-1}(U)$. By combining the two transforms, we can generate $Y_1 = g_1^{-1}\left(F_{X_1}(X_1)\right)$ and $Y_2 = g_2^{-1}\left(F_{X_2}(X_2)\right)$ that are correlated because $X_1$ and $X_2$ are correlated. It may be possible to construct directly the correlated random variables $Y_1 = T_1(X_1)$ and $Y_2 = T_2(X_2)$ if the transforms $T_1$ and $T_2$ are known.
The goal is to determine the linear correlation coefficient $\rho_Y$ defined in (2.58),
\[ \rho_Y = \frac{E[Y_1Y_2] - E[Y_1]E[Y_2]}{\sqrt{\mathrm{Var}[Y_1]}\sqrt{\mathrm{Var}[Y_2]}} \]
as a function of $\rho_X$. Using (2.61),
\begin{align*}
E[Y_1Y_2] &= E\left[g_1^{-1}\left(F_{X_1}(X_1)\right)g_2^{-1}\left(F_{X_2}(X_2)\right)\right] \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g_1^{-1}\left(F_{X_1}(u)\right)g_2^{-1}\left(F_{X_2}(v)\right)f_{X_1X_2}(u,v;\rho_X)\,du\,dv \tag{4.7}
\end{align*}
This relation shows that $\rho_Y$ is a continuous function in $\rho_X$ and that the joint distribution function of $X_1$ and $X_2$ is needed. The main difficulty lies now in the computation of the integral appearing in $E[Y_1Y_2]$. For $X_1$ and $X_2$, Gaussian correlated random variables are most often chosen because an exact analytic expression (4.4) exists for the joint distribution function $F_{X_1X_2}(x_1,x_2;\rho_X)$.
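4.3.1 Properties of $\rho_Y$ as a function of $\rho_X$

From now on, we choose Gaussian correlated random variables for $X_1$ and $X_2$.
Theorem 4.3.1 The correlation coefficient $\rho_Y$ is a differentiable and increasing function of $\rho_X$.
Proof: From the partial differential equation (4.5) of $f_{X_1X_2}(u,v;\rho_X)$, it follows that
\[ \frac{\partial E[Y_1Y_2]}{\partial\rho_X} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g_1^{-1}\left(F_{X_1}(u)\right)g_2^{-1}\left(F_{X_2}(v)\right)\frac{\partial^2f_{X_1X_2}(u,v;\rho_X)}{\partial u\,\partial v}\,du\,dv \]
Partial integration with respect to $u$ and $v$ yields
\[ \frac{\partial E[Y_1Y_2]}{\partial\rho_X} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\frac{dg_1^{-1}\left(F_{X_1}(u)\right)}{du}\,\frac{dg_2^{-1}\left(F_{X_2}(v)\right)}{dv}\,f_{X_1X_2}(u,v;\rho_X)\,du\,dv \]
Applying the chain rule for differentiation and $\frac{dg^{-1}(x)}{dx} = \frac{1}{g'\left(g^{-1}(x)\right)}$ gives
\[ \frac{dg^{-1}\left(F_X(u)\right)}{du} = \left.\frac{dg^{-1}(x)}{dx}\right|_{x=F_X(u)}\frac{dF_X(u)}{du} = \frac{f_X(u)}{g'\left(g^{-1}\left(F_X(u)\right)\right)} \]
Since $g'(x)$ and $f_X(u)$ are probability density functions and positive, $\frac{\partial E[Y_1Y_2]}{\partial\rho_X} > 0$ and, thus, $\frac{\partial\rho_Y}{\partial\rho_X} > 0$. Hence, we have shown that $\rho_Y$ is a differentiable, increasing function of $\rho_X$.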

Since $\rho_X\in[-1,1]$, $\rho_Y$ increases from $\rho_{Y\min}$ at $\rho_X = -1$ to $\rho_{Y\max}$ corresponding to $\rho_X = 1$. In the sequel, we will derive expressions to compute the boundary cases $\rho_X = -1$ and $\rho_X = 1$.
Theorem 4.3.2 (of Lancaster) For any two strictly increasing real functions $T_1$ and $T_2$ that transform the correlated Gaussian random variables $X_1$ and $X_2$ to the correlated random variables $Y_1 = T_1(X_1)$ and $Y_2 = T_2(X_2)$, it holds that
\[ |\rho_Y| \le |\rho_X| \]
If two correlated random variables $Y_1$ and $Y_2$ can be obtained by separate transformations from a bi-variate normal distribution with correlation coefficient $\rho_X$, the correlation coefficient $\rho_Y$ of the transformed random variables cannot in absolute value exceed $\rho_X$. The interest of the proof is that it uses powerful properties of orthogonal polynomials and that $\rho_Y$ is expanded in a power series in $\rho_X$ in (4.12).
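Proof: The proof is based on the orthogonal Hermite polynomials $H_n(x)$ (see e.g. Rainville (1960) and Abramowitz and Stegun (1968, Chapter 22)) defined by the generating function
\[ \exp\left(2xt-t^2\right) = \sum_{n=0}^{\infty}\frac{H_n(x)\,t^n}{n!} \tag{4.8} \]
After expanding $\exp\left(2xt-t^2\right)$ in a Taylor series and equating corresponding powers in $t$, we find that
\[ H_n(x) = n!\sum_{k=0}^{[n/2]}\frac{(-1)^k(2x)^{n-2k}}{k!\,(n-2k)!} \tag{4.9} \]
with $H_0(x) = 1$. The Hermite polynomials satisfy the orthogonality relations
\[ \int_{-\infty}^{\infty}e^{-x^2}H_n(x)H_m(x)\,dx = 0 \quad (m\ne n), \qquad \int_{-\infty}^{\infty}e^{-x^2}H_n^2(x)\,dx = 2^n n!\sqrt{\pi} \]
These orthogonality relations enable us to expand functions in terms of Hermite polynomials (similar to Fourier analysis). If the expansion of a function $f(x)$,
\[ f(x) = \sum_{k=0}^{\infty}a_kH_k(x) \]
converges for all $x$, then it follows from the orthogonality relations that
\[ a_k = \frac{1}{2^kk!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-x^2}f(x)H_k(x)\,dx \]
The joint normalized Gaussian density function can be expanded (Rainville, 1960, pp. 197-198) in terms of Hermite polynomials
\[ \frac{\exp\left[-\frac{x^2-2\rho xy+y^2}{1-\rho^2}\right]}{\sqrt{1-\rho^2}} = e^{-x^2-y^2}\sum_{n=0}^{\infty}H_n(x)H_n(y)\frac{\rho^n}{2^nn!} \tag{4.10} \]
In order for the covariance $\mathrm{Cov}[Y_1,Y_2]$ to exist, both $E\left[Y_1^2\right]$ and $E\left[Y_2^2\right]$ must be finite. Since $Y_j = T_j(X_j)$ for $j = 1,2$, the mean is
\begin{align*}
E[Y_j] = \int_{-\infty}^{\infty}T_j(x)f_{X_j}(x)\,dx &= \frac{1}{\sigma_X\sqrt{2\pi}}\int_{-\infty}^{\infty}T_j(x)\exp\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right]dx \\
&= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}T_j\left(\mu_X+\sqrt2\sigma_Xu\right)e^{-u^2}du
\end{align*}
Let
\[ T_j\left(\mu_X+\sqrt2\sigma_Xu\right) = \sum_{k=0}^{\infty}a_{k;j}H_k(u) \tag{4.11} \]
with
\[ a_{k;j} = \frac{1}{2^kk!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-x^2}T_j\left(\mu_X+\sqrt2\sigma_Xx\right)H_k(x)\,dx \]
then, since $H_0(x) = 1$,
\[ E[Y_j] = a_{0;j} \]
The second moment $E\left[Y_j^2\right]$ follows from (2.34) as
\begin{align*}
E\left[Y_j^2\right] = \int_{-\infty}^{\infty}\left(T_j(x)\right)^2f_{X_j}(x)\,dx &= \frac{1}{\sigma_X\sqrt{2\pi}}\int_{-\infty}^{\infty}T_j^2(x)\exp\left[-\frac{(x-\mu_X)^2}{2\sigma_X^2}\right]dx \\
&= \frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}T_j^2\left(\mu_X+\sqrt2\sigma_Xu\right)e^{-u^2}du
\end{align*}
Substituting (4.11) gives
\[ E\left[Y_j^2\right] = \sum_{k=0}^{\infty}\sum_{m=0}^{\infty}a_{m;j}a_{k;j}\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}H_k(u)H_m(u)e^{-u^2}du = \sum_{k=0}^{\infty}a_{k;j}^2\,2^kk! \]
which is convergent. Similarly, using (4.4),
\begin{align*}
E[Y_1Y_2] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}T_1(u)T_2(v)f_{X_1X_2}(u,v;\rho_X)\,du\,dv \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}T_1(u)T_2(v)\frac{\exp\left[-\frac{\frac{(u-\mu_X)^2}{\sigma_X^2}-\frac{2\rho_X(u-\mu_X)(v-\mu_Y)}{\sigma_X\sigma_Y}+\frac{(v-\mu_Y)^2}{\sigma_Y^2}}{2\left(1-\rho_X^2\right)}\right]}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho_X^2}}\,du\,dv \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}T_1\left(\mu_X+\sqrt2\sigma_Xx\right)T_2\left(\mu_Y+\sqrt2\sigma_Yy\right)\frac{\exp\left[-\frac{x^2-2\rho_Xxy+y^2}{1-\rho_X^2}\right]}{\pi\sqrt{1-\rho_X^2}}\,dx\,dy \\
&= \frac{1}{\pi\sqrt{1-\rho_X^2}}\sum_{k=0}^{\infty}a_{k;1}\sum_{m=0}^{\infty}a_{m;2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}H_k(x)H_m(y)\exp\left[-\frac{x^2-2\rho_Xxy+y^2}{1-\rho_X^2}\right]dx\,dy
\end{align*}
Using (4.10),
\[ E[Y_1Y_2] = \frac{1}{\pi}\sum_{k=0}^{\infty}a_{k;1}\sum_{m=0}^{\infty}a_{m;2}\sum_{n=0}^{\infty}\frac{\rho_X^n}{2^nn!}\int_{-\infty}^{\infty}e^{-x^2}H_k(x)H_n(x)\,dx\int_{-\infty}^{\infty}e^{-y^2}H_m(y)H_n(y)\,dy \]
Introducing the orthogonality relations for Hermite polynomials leads to
\[ E[Y_1Y_2] = \sum_{n=0}^{\infty}a_{n;1}a_{n;2}\,2^nn!\,\rho_X^n \]
The correlation coefficient becomes
\[ \rho_Y = \frac{\sum_{n=0}^{\infty}a_{n;1}a_{n;2}2^nn!\rho_X^n - a_{0;1}a_{0;2}}{\sqrt{\sum_{k=0}^{\infty}a_{k;1}^22^kk!-a_{0;1}^2}\sqrt{\sum_{k=0}^{\infty}a_{k;2}^22^kk!-a_{0;2}^2}} = \frac{\sum_{n=1}^{\infty}a_{n;1}a_{n;2}2^nn!\rho_X^n}{\sqrt{\sum_{k=1}^{\infty}a_{k;1}^22^kk!}\sqrt{\sum_{k=1}^{\infty}a_{k;2}^22^kk!}} \]
Denote $\alpha_n = a_{n;1}\sqrt{2^nn!}$ and $\beta_n = a_{n;2}\sqrt{2^nn!}$, then $\mathrm{Var}[Y_1] = \sum_{k=1}^{\infty}\alpha_k^2$ and $\mathrm{Var}[Y_2] = \sum_{k=1}^{\infty}\beta_k^2$. Since the linear correlation coefficient $\rho(X,Y)$ equals the correlation coefficient of the corresponding normalized random variables with mean zero and variance 1, as shown in Section 2.5.3, we may choose $\sum_{k=1}^{\infty}\alpha_k^2 = \sum_{k=1}^{\infty}\beta_k^2 = 1$ such that
\[ \rho_Y = \sum_{n=1}^{\infty}\alpha_n\beta_n\rho_X^n \tag{4.12} \]
If $\alpha_1^2 = 1$ and $\beta_1^2 = 1$, then $|\rho_Y| = |\rho_X|$ because all other $\alpha_k$ and $\beta_k$ must then vanish. In all other cases, either $\alpha_1^2 < 1$ or $\beta_1^2 < 1$ or both, such that
\[ \rho_Y = \alpha_1\beta_1\rho_X + \sum_{n=2}^{\infty}\alpha_n\beta_n\rho_X^n \]
and
\[ \left|\sum_{n=2}^{\infty}\alpha_n\beta_n\rho_X^n\right| \le \sum_{n=2}^{\infty}|\alpha_n\beta_n|\,|\rho_X|^n \le \sqrt{\sum_{n=2}^{\infty}\alpha_n^2|\rho_X|^n}\sqrt{\sum_{n=2}^{\infty}\beta_n^2|\rho_X|^n} \]
where we have used the Cauchy-Schwarz inequality $\sum ab\le\sqrt{\sum a^2}\sqrt{\sum b^2}$ (see Section 5.5). By partial summation,
\[ \sum_{n=2}^{\infty}\alpha_n^2|\rho_X|^n = (1-|\rho_X|)\sum_{n=2}^{\infty}\sum_{k=2}^{n}\alpha_k^2|\rho_X|^n \le (1-|\rho_X|)\left(1-\alpha_1^2\right)\frac{|\rho_X|^2}{1-|\rho_X|} = \left(1-\alpha_1^2\right)|\rho_X|^2 \]
because $\sum_{k=2}^{n}\alpha_k^2 < \sum_{k=2}^{\infty}\alpha_k^2 = 1-\alpha_1^2$. Thus
\[ |\rho_Y| \le |\rho_X|\left(|\alpha_1\beta_1| + \sqrt{1-\alpha_1^2}\sqrt{1-\beta_1^2}\,|\rho_X|\right) \]
Finally, for $\alpha_1^2\le 1$ and $\beta_1^2\le 1$, the inequality $|\alpha_1\beta_1| + \sqrt{1-\alpha_1^2}\sqrt{1-\beta_1^2}\le 1$ holds. This proves Lancaster's theorem because $|\rho_X|\le 1$.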

4.3.2 Boundary cases

Let us investigate some cases for special values of $\rho_X$.
1. $\rho_X = 0$. Since uncorrelated Gaussian random variables ($\rho_X = 0$) are independent, also $Y_1 = g_1^{-1}\left(F_{X_1}(X_1)\right)$ and $Y_2 = g_2^{-1}\left(F_{X_2}(X_2)\right)$ are independent such that $\rho_Y = 0$. Hence, uncorrelated Gaussian random variables with $\rho_X = 0$ lead to uncorrelated random variables $Y_1$ and $Y_2$ with $\rho_Y = 0$.
2. $\rho_X = 1$. Perfect positively correlated Gaussian random variables $X_1 = X_2 = X$ have joint distribution
\[ f_{X_1X_2}(u,v;1) = \frac{1}{\sigma_X\sqrt{2\pi}}\exp\left[-\frac{(u-\mu_X)^2}{2\sigma_X^2}\right]\delta(u-v) \]
which follows from $\Pr[X_1\le x_1, X_2\le x_2] = \Pr[X\le x_1, X\le x_2] = \Pr[X\le x]$ with $x = \min(x_1,x_2)$. In that case,
\begin{align*}
E[Y_1Y_2] &= \int_{-\infty}^{\infty}g_1^{-1}\left(F_X(u)\right)g_2^{-1}\left(F_X(u)\right)dF_X(u) \\
&= \int_0^1g_1^{-1}(x)\,g_2^{-1}(x)\,dx \tag{4.13}
\end{align*}
which may lead to $\rho_{Y\max} < 1$ depending on the specifics of $g_1$ and $g_2$. By transforming $x = g_1(u)$, we obtain
\[ E[Y_1Y_2] = \int_{g_1^{-1}(0)}^{g_1^{-1}(1)}u\,g_2^{-1}\left(g_1(u)\right)g_1'(u)\,du \]
which shows that, if $g_1 = g_2 = g$,
\[ E[Y_1Y_2] = \int_{g^{-1}(0)}^{g^{-1}(1)}u^2g'(u)\,du = E\left[Y^2\right] \]
Hence, if $Y_1$ and $Y_2$ have the same distribution function $g$ as $Y$, the case $\rho_X = 1$ leads to
\[ \rho_Y = \frac{E\left[Y^2\right]-\left(E[Y]\right)^2}{\mathrm{Var}[Y]} = 1 \]
3. $\rho_X = -1$. Perfect negatively correlated Gaussian random variables $X_1 = -X_2 = X$ have joint distribution
\[ f_{X_1X_2}(u,v;-1) = \frac{1}{\sigma_X\sqrt{2\pi}}\exp\left[-\frac{(u-\mu_X)^2}{2\sigma_X^2}\right]\delta(u+v) \]
which follows from the symmetry relations (4.6). In that case, using $F_X(-u) = 1-F_X(u)$ for the symmetric Gaussian distribution,
\begin{align*}
E[Y_1Y_2] &= \int_{-\infty}^{\infty}g_1^{-1}\left(F_X(u)\right)g_2^{-1}\left(F_X(-u)\right)dF_X(u) \\
&= \int_{-\infty}^{\infty}g_1^{-1}\left(F_X(u)\right)g_2^{-1}\left(1-F_X(u)\right)dF_X(u) \\
&= \int_0^1g_1^{-1}(x)\,g_2^{-1}(1-x)\,dx \tag{4.14}
\end{align*}
which may lead to $\rho_{Y\min} > -1$, depending on the specifics of $g_1$ and $g_2$.
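As an illustration of (4.14), the minimum correlation for two exponential random variables can be computed numerically. The sketch below is a hedged example: it assumes the exponential quantile function $g^{-1}(x) = -\frac{1}{\alpha}\log(1-x)$, uses a plain midpoint rule, and all function names and step counts are illustrative. The integral $\int_0^1\log(x)\log(1-x)\,dx = 2-\frac{\pi^2}{6}$ is a standard result, so the computed value can be compared with $1-\frac{\pi^2}{6}$.

```python
import math

def exponential_quantile(x, alpha):
    """g^{-1}(x) for the exponential distribution with rate alpha."""
    return -math.log(1.0 - x) / alpha

def rho_min_exponential(alpha1=1.0, alpha2=1.0, steps=200_000):
    """Evaluate (4.14) with a midpoint rule and convert it to a correlation coefficient."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += exponential_quantile(x, alpha1) * exponential_quantile(1.0 - x, alpha2)
    e_y1y2 = total * h
    mean1, mean2 = 1.0 / alpha1, 1.0 / alpha2   # exponential means
    std1, std2 = 1.0 / alpha1, 1.0 / alpha2     # exponential standard deviations
    return (e_y1y2 - mean1 * mean2) / (std1 * std2)

# The minimum lies strictly above -1, in line with Section 4.2.
print(rho_min_exponential(), 1.0 - math.pi ** 2 / 6.0)
```

The result does not depend on the two rates, since they cancel in the ratio.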

74

Correlation

4.4 Examples of the non-linear transformation method


4.4.1 Correlated uniform random variables
Let us rst focus on the relation between [ and X . Since H [X ] = 12 and
1
X2 = 12
, the denition of the linear correlation coe!cient (2.58) gives
H [X1 X2 ] 

X =

1
4

1
12

where, using (2.61),


H [X1 X2 ] = H [I[1 ([1 )I[2 ([2 )]
Z "Z "
=
I[1 (x)I[2 (y)i[1 [2 (x> y; [ )gxgy
3"

3"

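In the case of Gaussian correlated random variables specified by (3.20) and (4.4), we must evaluate the integral

$$E[U_1U_2] = \int_{-\infty}^{\infty}\! du \int_{-\infty}^{\infty}\! dv \int_{-\infty}^{u}\! dt\; e^{-\frac{(t-\mu_{X_1})^2}{2\sigma_{X_1}^2}} \int_{-\infty}^{v}\! d\tau\; e^{-\frac{(\tau-\mu_{X_2})^2}{2\sigma_{X_2}^2}}\; \frac{\exp\left[-\frac{1}{2(1-\rho_X^2)}\left(\frac{(u-\mu_{X_1})^2}{\sigma_{X_1}^2} - \frac{2\rho_X(u-\mu_{X_1})(v-\mu_{X_2})}{\sigma_{X_1}\sigma_{X_2}} + \frac{(v-\mu_{X_2})^2}{\sigma_{X_2}^2}\right)\right]}{(2\pi)^2 \sigma_{X_1}^2 \sigma_{X_2}^2 \sqrt{1-\rho_X^2}}$$

Substituting successively $u' = \frac{u-\mu_{X_1}}{\sigma_{X_1}}$, $t' = \frac{t-\mu_{X_1}}{\sigma_{X_1}}$, $v' = \frac{v-\mu_{X_2}}{\sigma_{X_2}}$, $\tau' = \frac{\tau-\mu_{X_2}}{\sigma_{X_2}}$, we obtain

$$E[U_1U_2] = \int_{-\infty}^{\infty}\! du' \int_{-\infty}^{\infty}\! dv' \int_{-\infty}^{u'}\! e^{-\frac{t'^2}{2}} dt' \int_{-\infty}^{v'}\! e^{-\frac{\tau'^2}{2}} d\tau'\; \frac{\exp\left[-\frac{u'^2 - 2\rho_X u'v' + v'^2}{2(1-\rho_X^2)}\right]}{(2\pi)^2\sqrt{1-\rho_X^2}}$$

We now use the partial differential equation (4.5),

$$\frac{\partial}{\partial\rho_X}\left(\frac{\exp\left[-\frac{u'^2 - 2\rho_X u'v' + v'^2}{2(1-\rho_X^2)}\right]}{\sqrt{1-\rho_X^2}}\right) = \frac{\partial^2}{\partial u'\,\partial v'}\left(\frac{\exp\left[-\frac{u'^2 - 2\rho_X u'v' + v'^2}{2(1-\rho_X^2)}\right]}{\sqrt{1-\rho_X^2}}\right)$$

such that

$$\frac{\partial E[U_1U_2]}{\partial\rho_X} = \int_{-\infty}^{\infty}\! du' \int_{-\infty}^{\infty}\! dv' \int_{-\infty}^{u'}\! e^{-\frac{t'^2}{2}} dt' \int_{-\infty}^{v'}\! e^{-\frac{\tau'^2}{2}} d\tau'\; \frac{\partial^2}{\partial u'\,\partial v'}\, \frac{\exp\left[-\frac{u'^2 - 2\rho_X u'v' + v'^2}{2(1-\rho_X^2)}\right]}{(2\pi)^2\sqrt{1-\rho_X^2}}$$

Partial integration in the last integral over $v'$ (the boundary terms vanish),

$$\begin{aligned}
I_2 &= \int_{-\infty}^{\infty}\! dv' \left(\int_{-\infty}^{v'}\! e^{-\frac{\tau'^2}{2}} d\tau'\right) \frac{\partial^2}{\partial u'\,\partial v'} \exp\left[-\frac{u'^2 - 2\rho_X u'v' + v'^2}{2(1-\rho_X^2)}\right] \\
&= -\int_{-\infty}^{\infty}\! dv\; e^{-\frac{v^2}{2}}\, \frac{\partial}{\partial u'} \exp\left[-\frac{u'^2 - 2\rho_X u'v + v^2}{2(1-\rho_X^2)}\right]
\end{aligned}$$

yields

$$\frac{\partial E[U_1U_2]}{\partial\rho_X} = -\int_{-\infty}^{\infty}\! dv\; e^{-\frac{v^2}{2}} \int_{-\infty}^{\infty}\! du' \int_{-\infty}^{u'}\! e^{-\frac{t'^2}{2}} dt'\; \frac{\partial}{\partial u'}\, \frac{\exp\left[-\frac{u'^2 - 2\rho_X u'v + v^2}{2(1-\rho_X^2)}\right]}{(2\pi)^2\sqrt{1-\rho_X^2}}$$

and similarly in the $u'$ integral,

$$\begin{aligned}
\frac{\partial E[U_1U_2]}{\partial\rho_X} &= \frac{1}{(2\pi)^2\sqrt{1-\rho_X^2}} \int_{-\infty}^{\infty}\! dv \int_{-\infty}^{\infty}\! du\; \exp\left[-\frac{u^2(2-\rho_X^2) - 2\rho_X uv + v^2(2-\rho_X^2)}{2(1-\rho_X^2)}\right] \\
&= \frac{1}{(2\pi)^2\sqrt{1-\rho_X^2}} \int_{-\infty}^{\infty}\! dv\; \exp\left[-\frac{(4-\rho_X^2)\, v^2}{2(2-\rho_X^2)}\right] \int_{-\infty}^{\infty}\! du\; \exp\left[-\frac{2-\rho_X^2}{2(1-\rho_X^2)}\left(u - \frac{\rho_X v}{2-\rho_X^2}\right)^2\right] \\
&= \frac{1}{(2\pi)^2\sqrt{1-\rho_X^2}} \sqrt{\frac{2\pi(2-\rho_X^2)}{4-\rho_X^2}}\, \sqrt{\frac{2\pi(1-\rho_X^2)}{2-\rho_X^2}} = \frac{1}{2\pi\sqrt{4-\rho_X^2}}
\end{aligned}$$

Thus, we find that

$$\frac{\partial\rho_U}{\partial\rho_X} = \frac{6}{\pi}\, \frac{1}{\sqrt{4-\rho_X^2}}$$

or that

$$\rho_U = \frac{6}{\pi}\int \frac{d\rho_X}{\sqrt{4-\rho_X^2}} + c = \frac{6}{\pi}\arcsin\left(\frac{\rho_X}{2}\right) + c$$

It remains to determine the constant $c$. We have shown in Section 4.3.2 that random variables generated from uncorrelated Gaussian random variables are also uncorrelated, implying that $\rho_U = 0$ if $\rho_X = 0$ and, hence, that the constant $c = 0$. This finally results in

$$\rho_U = \frac{6}{\pi}\arcsin\left(\frac{\rho_X}{2}\right) \qquad (4.15)$$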

In summary, two uniform correlated random variables $U_1$ and $U_2$ with correlation coefficient $\rho_U$ are found by transforming two Gaussian correlated random variables $X_1$ and $X_2$ with correlation coefficient $\rho_X = 2\sin\frac{\pi\rho_U}{6}$. Equation (4.15) further shows that $\rho_U = \pm 1$ if $\rho_X = \pm 1$, which indicates that the whole range of the correlation coefficient $\rho_U$ is possible.
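The recipe of Section 4.4.1 can be tried numerically. The following is a minimal sketch, not the author's implementation: it assumes standard Gaussian $X_1, X_2$ (zero mean, unit variance), builds the correlated Gaussian pair by the usual two-variable construction, and uses only Python's standard library. The function names are hypothetical.

```python
import math
import random


def std_normal_cdf(x):
    """Distribution function of a standard Gaussian, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def correlated_uniforms(rho_u, n, seed=1):
    """Return n pairs (U1, U2), uniform on [0, 1], with target correlation rho_u."""
    rho_x = 2.0 * math.sin(math.pi * rho_u / 6.0)  # inverse of (4.15)
    scale = math.sqrt(max(0.0, 1.0 - rho_x ** 2))
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        # Gaussian pair with correlation rho_x (assumed construction).
        x2 = rho_x * x1 + scale * rng.gauss(0.0, 1.0)
        pairs.append((std_normal_cdf(x1), std_normal_cdf(x2)))
    return pairs


def sample_correlation(pairs):
    """Empirical linear correlation coefficient of a list of pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)
```

For a moderately large sample, `sample_correlation(correlated_uniforms(0.5, 20000))` should lie close to the requested value 0.5, up to statistical fluctuation.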
4.4.2 Correlated exponential random variables
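In Section 3.2.1, we have seen that, if $U$ is a uniform random variable on [0,1], $g^{-1}(U) = -\frac{1}{\alpha}\log U$ is an exponential random variable with mean $\frac{1}{\alpha}$. The correlation coefficient for two exponential random variables, $Y_1$ and $Y_2$, with mean $\frac{1}{\alpha_1}$ and $\frac{1}{\alpha_2}$ respectively, is

$$\rho_Y = \frac{E[Y_1Y_2] - \frac{1}{\alpha_1\alpha_2}}{\frac{1}{\alpha_1\alpha_2}} = E[\alpha_1\alpha_2 Y_1Y_2] - 1$$

As above, we generate $Y_1 = -\frac{1}{\alpha_1}\log F_{X_1}(X_1)$ and $Y_2 = -\frac{1}{\alpha_2}\log F_{X_2}(X_2)$, where $X_1$ and $X_2$ are correlated Gaussian random variables with correlation coefficient $\rho_X$. Then,

$$\begin{aligned}
E[\alpha_1\alpha_2 Y_1Y_2] &= \frac{\alpha_1\alpha_2}{\alpha_1\alpha_2}\, E[\log F_{X_1}(X_1)\, \log F_{X_2}(X_2)] \\
&= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \log F_{X_1}(u)\, \log F_{X_2}(v)\, f_{X_1X_2}(u, v; \rho_X)\, du\, dv
\end{aligned}$$

In the general case for $\rho_X \neq 0$, the previous method can be followed, which yields after substitution towards normalized variables,

$$E[\alpha_1\alpha_2 Y_1Y_2] = \int_{-\infty}^{\infty}\! du \int_{-\infty}^{\infty}\! dv\; \log\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{u} e^{-\frac{t^2}{2}} dt\right) \log\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{v} e^{-\frac{\tau^2}{2}} d\tau\right) \frac{\exp\left[-\frac{u^2 - 2\rho_X uv + v^2}{2(1-\rho_X^2)}\right]}{2\pi\sqrt{1-\rho_X^2}}$$

Unfortunately, we cannot evaluate this integral analytically.

Let us compute the upper bound $\rho_{Y\max}$ from (4.13) with $g_1^{-1}(x) = -\frac{1}{\alpha_1}\log x$ and $g_2^{-1}(x) = -\frac{1}{\alpha_2}\log x$,

$$E[\alpha_1\alpha_2 Y_1Y_2; \rho_X = 1] = \int_0^1 \log^2 x\, dx = 2$$

and thus $\rho_{Y\max} = 1$. The lower boundary $\rho_{Y\min}$ follows from (4.14) as$^1$,

$$E[\alpha_1\alpha_2 Y_1Y_2; \rho_X = -1] = \int_0^1 \log x\, \log(1-x)\, dx = 2 - \frac{\pi^2}{6}$$

Here, we find $\rho_{Y\min} = 1 - \frac{\pi^2}{6} = -0.644\,934\ldots$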


In summary, exponential correlated random variables can be generated from Gaussian correlated random variables, but the correlation coefficient $\rho_Y$ is limited to the interval $\left[1 - \frac{\pi^2}{6}, 1\right]$. As explained in the introduction of Section 4.2, the exponential random variables are positive with infinite range for which not all negative correlations are possible. The analysis demonstrates that it is not possible to construct two exponential random variables with correlation coefficient smaller than $\rho_{Y\min} = 1 - \frac{\pi^2}{6} \simeq -0.645$.

$^1$ Substituting the Taylor expansion $\log(1-x) = -\sum_{k=1}^{\infty}\frac{x^k}{k}$ gives

$$\int_0^1 \log x\, \log(1-x)\, dx = -\sum_{k=1}^{\infty}\frac{1}{k}\int_0^1 x^k \log x\, dx$$

and

$$\int_0^1 x^k \log x\, dx = -\int_0^{\infty} e^{-(k+1)u}\, u\, du = -\frac{1}{(k+1)^2}$$

Thus,

$$\int_0^1 \log x\, \log(1-x)\, dx = \sum_{k=1}^{\infty}\frac{1}{k(k+1)^2}$$

Since $\frac{1}{k(k+1)^2} = \frac{1}{k} - \frac{1}{k+1} - \frac{1}{(k+1)^2}$,

$$\int_0^1 \log x\, \log(1-x)\, dx = \sum_{k=1}^{\infty}\frac{1}{k} - \sum_{k=2}^{\infty}\frac{1}{k} - \sum_{k=2}^{\infty}\frac{1}{k^2} = 1 - \sum_{k=2}^{\infty}\frac{1}{k^2} = 2 - \sum_{k=1}^{\infty}\frac{1}{k^2} = 2 - \zeta(2) = 2 - \frac{\pi^2}{6}$$
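The value of the lower bound can be checked numerically. The sketch below is an illustration only, with hypothetical function names: it compares a midpoint-rule quadrature of $\int_0^1 \log x \log(1-x)\,dx$ with a partial sum of the series $\sum_k 1/(k(k+1)^2)$ from the footnote and with the closed form $2 - \pi^2/6$.

```python
import math


def integral_midpoint(steps=100000):
    """Midpoint rule for the integral of log(x) * log(1 - x) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.log(x) * math.log1p(-x)
    return total * h


def integral_series(terms=100000):
    """Partial sum of sum_{k>=1} 1 / (k (k+1)^2)."""
    return sum(1.0 / (k * (k + 1) ** 2) for k in range(1, terms + 1))


def rho_y_min():
    """Smallest correlation coefficient of two exponential random variables."""
    return 1.0 - math.pi ** 2 / 6.0
```

Both `integral_midpoint()` and `integral_series()` should agree with `2 - math.pi**2 / 6` to several decimal places, so that subtracting 1 reproduces `rho_y_min()`.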

4.4.3 Correlated lognormal random variables


Two correlated lognormal random variables $Y_1$ and $Y_2$ with distribution specified in (3.43) can be constructed directly from two correlated Gaussian random variables $X_1$ and $X_2$. In particular, let $Y_1 = e^{a_1X_1}$ and $Y_2 = e^{a_2X_2}$. The explicit scaling parameters can be used to determine the desired mean. From (4.7),

$$\begin{aligned}
E[Y_1Y_2] &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{a_1u}\, e^{a_2v}\, f_{X_1X_2}(u, v; \rho_X)\, du\, dv \\
&= \exp\left(a_1\mu_1 + a_2\mu_2 + \frac{\sigma_1^2}{2}a_1^2 + a_1a_2\rho_X\sigma_1\sigma_2 + \frac{\sigma_2^2}{2}a_2^2\right)
\end{aligned}$$

where the Laplace transform (4.2) for $n = 2$ has been used. Invoking (3.45) and (3.46) with $\mu_j \to a_j\mu_j$ and $\sigma_j^2 \to a_j^2\sigma_j^2$, the correlation coefficient $\rho_Y$ is

$$\rho_Y = \frac{e^{a_1\sigma_1 a_2\sigma_2\rho_X} - 1}{\sqrt{e^{a_1^2\sigma_1^2} - 1}\,\sqrt{e^{a_2^2\sigma_2^2} - 1}} \qquad (4.16)$$

If at least one (but not all) of the quantities $\sigma_1$, $\sigma_2$, $a_1$ or $a_2$ grows large, $\rho_Y$ tends to zero irrespective of $\rho_X$. Thus even if $X_1$ and $X_2$ and, hence also $Y_1$ and $Y_2$, have the strongest kind of dependence possible, i.e. $\rho_X = \pm 1$, the correlation coefficient $\rho_Y$ can be made arbitrarily small. In case $a_1\sigma_1 = a_2\sigma_2 = \sigma$, (4.16) reduces to

$$\rho_Y = \frac{e^{\sigma^2\rho_X} - 1}{e^{\sigma^2} - 1}$$

We observe that $\rho_{Y\max} = 1$, while $\rho_{Y\min} = -e^{-\sigma^2} > -1$ for $\sigma > 0$; again a manifestation that for positive random variables with infinite range not all negative correlations are possible.
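Relation (4.16) is easy to evaluate. The helper below is a small sketch (the function name and argument order are assumptions, not taken from the text) that can be used to explore how $\rho_Y$ shrinks when one of the products $a_j\sigma_j$ grows.

```python
import math


def lognormal_correlation(rho_x, a1, sigma1, a2, sigma2):
    """Correlation of Y1 = exp(a1*X1) and Y2 = exp(a2*X2) according to (4.16)."""
    s1 = a1 * sigma1
    s2 = a2 * sigma2
    numerator = math.expm1(s1 * s2 * rho_x)          # e^{s1 s2 rho_x} - 1
    denominator = math.sqrt(math.expm1(s1 ** 2)) * math.sqrt(math.expm1(s2 ** 2))
    return numerator / denominator
```

With equal products $a_1\sigma_1 = a_2\sigma_2 = \sigma$, the call `lognormal_correlation(-1.0, 1.0, sigma, 1.0, sigma)` returns $-e^{-\sigma^2}$, while strongly unequal products drive the result towards zero even for $\rho_X = 1$.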


4.5 Linear combination of independent auxiliary random variables

In spite of the generality of the non-linear transformation method, the computational difficulty involved suggests investigating simpler methods of construction. It is instructive to consider two independent random variables $V$ and $W$ with known probability generating functions $\varphi_V(z)$ and $\varphi_W(z)$ respectively. In the discussion of the uniform random variable in Section 3.2.1, it was shown how to generate by computer an arbitrary random variable from a uniform random variable. We thus assume that $V$ and $W$ can be constructed. Let us now write $X$ and $Y$ as a linear combination of $V$ and $W$,

$$\begin{aligned}
X &= a_{11}V + a_{12}W + b_1 \\
Y &= a_{21}V + a_{22}W + b_2
\end{aligned}$$

which is specified by the matrix

$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$$

and compute the covariance defined in (2.56),

$$\begin{aligned}
\operatorname{Cov}[X, Y] &= E[XY] - \mu_X\mu_Y \\
&= a_{11}a_{21}\left(E[V^2] - (E[V])^2\right) + (a_{11}a_{22} + a_{12}a_{21})\, E[VW] \\
&\quad - (a_{11}a_{22} + a_{12}a_{21})\, E[V]\, E[W] + a_{12}a_{22}\left(E[W^2] - (E[W])^2\right)
\end{aligned}$$

Since $V$ and $W$ are independent, $E[VW] = E[V]E[W]$, and with the definition of the variance (2.16) and denoting $\sigma_V^2 = \operatorname{Var}[V]$ and similarly for $W$, we obtain

$$\operatorname{Cov}[X, Y] = a_{11}a_{21}\sigma_V^2 + a_{12}a_{22}\sigma_W^2$$

In the same way, we find

$$\begin{aligned}
\sigma_X^2 &= a_{11}^2\sigma_V^2 + a_{12}^2\sigma_W^2 \\
\sigma_Y^2 &= a_{21}^2\sigma_V^2 + a_{22}^2\sigma_W^2
\end{aligned} \qquad (4.17)$$

such that the correlation coefficient, in general, becomes

$$\rho = \frac{a_{11}a_{21}\sigma_V^2 + a_{12}a_{22}\sigma_W^2}{\sqrt{a_{11}^2\sigma_V^2 + a_{12}^2\sigma_W^2}\,\sqrt{a_{21}^2\sigma_V^2 + a_{22}^2\sigma_W^2}}$$

which is independent of the constants $b_1$ and $b_2$ since for a centered moment $E\left[(X - E[X])^2\right] = E\left[(X + b - E[X + b])^2\right]$.

In order to achieve our goal of constructing two correlated random variables $X$ and $Y$, we can choose the coefficients of the matrix $A$ to obtain an expression as simple as possible. If we choose $X = V$ or $a_{11} = 1$, $a_{12} = b_1 = 0$, the correlation coefficient reduces to

$$\rho = \frac{a_{21}\sigma_X^2}{\sigma_X\sqrt{a_{21}^2\sigma_X^2 + a_{22}^2\sigma_W^2}} = \frac{1}{\sqrt{1 + \frac{a_{22}^2\sigma_W^2}{a_{21}^2\sigma_X^2}}}$$

By rewriting this relation, we obtain

$$a_{21} = \frac{\sigma_W}{\sigma_X}\, \frac{a_{22}\,\rho}{\sqrt{1-\rho^2}}$$

If we choose $a_{22} = \sqrt{1-\rho^2}$, the random variables $X$ and $Y$ are specified as

$$\begin{aligned}
X &= V \\
Y &= \rho\frac{\sigma_W}{\sigma_X}V + \sqrt{1-\rho^2}\, W + b_2
\end{aligned}$$

and the corresponding variances (4.17) are $\sigma_X^2 = \sigma_V^2$ and $\sigma_Y^2 = \sigma_W^2$. Finally, we require that $E[W] = \mu_W = 0$, which specifies

$$b_2 = E[Y] - \rho\frac{\sigma_Y}{\sigma_X}E[X]$$

If $W$ is a zero mean random variable with standard deviation $\sigma_W = \sigma_Y$, the random variables $X$ and $Y$ are correlated with correlation coefficient $\rho$,

$$Y = \rho\frac{\sigma_Y}{\sigma_X}X + \sqrt{1-\rho^2}\, W + \left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right) \qquad (4.18)$$

In the sequel, we take the positive sign for $\rho$.
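Relation (4.18) translates directly into a sampling procedure. The sketch below is one possible illustration under explicit assumptions that are not part of the text: $X$ is taken exponential with rate 1 (so $\mu_X = \sigma_X = 1$) and the auxiliary variable $W$ is taken Gaussian with zero mean and standard deviation $\sigma_Y$. The resulting $Y$ then has the requested mean, variance and correlation with $X$, but its distribution is whatever (4.19) dictates, not a freely chosen one.

```python
import math
import random


def linear_combination_pairs(rho, mu_y, sigma_y, n, seed=3):
    """Pairs (X, Y) built from (4.18); X ~ Exp(1) and W ~ N(0, sigma_y^2) are assumed."""
    mu_x = sigma_x = 1.0                      # mean and std of an exponential with rate 1
    slope = rho * sigma_y / sigma_x
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.expovariate(1.0)
        w = rng.gauss(0.0, sigma_y)           # zero mean, sigma_W = sigma_Y
        y = slope * x + math.sqrt(1.0 - rho ** 2) * w + (mu_y - slope * mu_x)
        pairs.append((x, y))
    return pairs


def sample_correlation(pairs):
    """Empirical linear correlation coefficient of a list of pairs."""
    n = len(pairs)
    m1 = sum(a for a, _ in pairs) / n
    m2 = sum(b for _, b in pairs) / n
    cov = sum((a - m1) * (b - m2) for a, b in pairs) / n
    v1 = sum((a - m1) ** 2 for a, _ in pairs) / n
    v2 = sum((b - m2) ** 2 for _, b in pairs) / n
    return cov / math.sqrt(v1 * v2)
```

For example, `linear_combination_pairs(0.6, 2.0, 3.0, 50000)` should produce a sample correlation near 0.6 and a sample mean of $Y$ near 2.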
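Let us now investigate what happens with the distribution functions of $X$ and $Y$. Using the pgfs for continuous random variables $\varphi_X(z) = E\left[e^{-zX}\right]$ and $\varphi_Y(z) = E\left[e^{-zY}\right]$, the last relation (4.18) becomes

$$E\left[e^{-zY}\right] = e^{-z\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\, E\left[e^{-z\rho\frac{\sigma_Y}{\sigma_X}X}\right] E\left[e^{-z\sqrt{1-\rho^2}\, W}\right]$$

because $V = X$ and $W$ are independent, or

$$\varphi_Y(z) = e^{-z\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\, \varphi_X\left(\rho\frac{\sigma_Y}{\sigma_X}z\right) \varphi_W\left(\sqrt{1-\rho^2}\, z\right) \qquad (4.19)$$

In order to produce two random variables $X$ and $Y$ that are correlated with correlation coefficient $\rho$, the pgf of the zero mean random variable $W$ with variance $\sigma_Y^2$ must obey,

$$\varphi_W(z) = e^{\frac{z}{\sqrt{1-\rho^2}}\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\, \frac{\varphi_Y\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_X\left(\frac{\rho\,\sigma_Y}{\sigma_X}\frac{z}{\sqrt{1-\rho^2}}\right)} \qquad (4.20)$$

which can be written in terms of the translated random variables $Y' = Y - \mu_Y$ and $X' = X - \mu_X$,

$$\varphi_W(z) = \frac{\varphi_{Y'}\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{\varphi_{X'}\left(\frac{\rho\,\sigma_Y}{\sigma_X}\frac{z}{\sqrt{1-\rho^2}}\right)}$$

This form shows that, if $X'$ and $Y'$ have the same distribution, $W$ possesses, in general, a different distribution. Only the pgf of a Gaussian (with zero mean) obeys the functional equation

$$f(z) = \frac{f\left(\frac{z}{\sqrt{1-\rho^2}}\right)}{f\left(\frac{\rho z}{\sqrt{1-\rho^2}}\right)}$$

The joint probability generating function follows from (2.61) as

$$\varphi_{XY}(z_1, z_2) = E\left[e^{-z_1X - z_2Y}\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-z_1x - z_2y} f_{XY}(x, y)\, dx\, dy$$

and the inverse is

$$f_{XY}(x, y) = \frac{1}{(2\pi i)^2}\int_{c_1 - i\infty}^{c_1 + i\infty}\int_{c_2 - i\infty}^{c_2 + i\infty} e^{z_1x + z_2y}\, \varphi_{XY}(z_1, z_2)\, dz_1\, dz_2 \qquad (4.21)$$

Using (4.18), we have

$$\begin{aligned}
\varphi_{XY}(z_1, z_2) &= e^{-z_2\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\, E\left[e^{-\left(z_1 + z_2\rho\frac{\sigma_Y}{\sigma_X}\right)X}\right] E\left[e^{-z_2\sqrt{1-\rho^2}\, W}\right] \\
&= e^{-z_2\left(\mu_Y - \rho\frac{\sigma_Y}{\sigma_X}\mu_X\right)}\, \varphi_X\left(z_1 + z_2\rho\frac{\sigma_Y}{\sigma_X}\right) \varphi_W\left(z_2\sqrt{1-\rho^2}\right) \qquad (4.22)
\end{aligned}$$

Introduced into the complex double integral (4.21), the joint probability density function of the two correlated random variables can be computed.

The main deficiency of the linear combination method is the implicit assumption that any joint distribution function $f_{XY}(x, y)$ can be constructed from two independent random variables $X$ and $W$. The corresponding joint pgf (4.22) possesses a product form that cannot always be made compatible with the form of an arbitrary pgf $\varphi_{XY}(z_1, z_2)$. The examples below illustrate this deficiency.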

4.5.1 Correlated Gaussian random variables

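If $X$ and $Y$ are Gaussian random variables with Laplace transform $E\left[e^{-zY}\right]$ given in (3.22), the expression (4.20) for $\varphi_W(z)$ becomes

$$\varphi_W(z) = \exp\left(\frac{\sigma_Y^2}{2}z^2\right)$$

which shows that $W$ is also a Gaussian random variable with mean $\mu_W = 0$ and standard deviation $\sigma_W = \sigma_Y$. Further, the joint pgf follows from (4.22) as

$$E\left[e^{-z_1X - z_2Y}\right] = \exp\left(-z_2\mu_Y - \mu_Xz_1 + \frac{\sigma_X^2}{2}z_1^2 + \rho z_1z_2\sigma_Y\sigma_X + \frac{\sigma_Y^2}{2}z_2^2\right)$$

Since

$$\frac{\sigma_X^2}{2}z_1^2 + \rho z_1z_2\sigma_Y\sigma_X + \frac{\sigma_Y^2}{2}z_2^2 = \frac{1}{2}\begin{bmatrix} z_1 & z_2 \end{bmatrix}\begin{bmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y \\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{bmatrix}\begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$$

formula (4.2) indicates that $E\left[e^{-z_1X - z_2Y}\right]$ is the two-dimensional pgf of a joint Gaussian with pdf (4.4). The linear combination method thus provides the exact results for correlated Gaussian random variables.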

4.5.2 Correlated exponential random variables


Let $X$ and $Y$ be two correlated, exponential random variables with rate $\alpha_x$ and $\alpha_y$. Recall that $E[X] = \sigma_X = \frac{1}{\alpha_x}$. Using the Laplace transform (3.16) in (4.20), we obtain

$$\varphi_W(z) = e^{\frac{z}{\alpha_y}\frac{1-\rho}{\sqrt{1-\rho^2}}}\; \frac{1 + \frac{\rho z}{\alpha_y\sqrt{1-\rho^2}}}{1 + \frac{z}{\alpha_y\sqrt{1-\rho^2}}}$$

The corresponding probability distribution function follows from (2.38) as

$$F_W(t) = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \frac{1 + \frac{\rho z}{\alpha_y\sqrt{1-\rho^2}}}{1 + \frac{z}{\alpha_y\sqrt{1-\rho^2}}}\; \frac{e^{z\left(t + \frac{1-\rho}{\alpha_y\sqrt{1-\rho^2}}\right)}}{z}\, dz \qquad (c > 0)$$

Define the normalized time $T = \frac{1}{\alpha_y\sqrt{1-\rho^2}}$, then

$$F_W(t) = \frac{1}{2\pi i}\int_{c - i\infty}^{c + i\infty} \frac{1 + \rho Tz}{1 + zT}\; \frac{e^{z(t + (1-\rho)T)}}{z}\, dz$$

Since $t + (1-\rho)T > 0$, the contour can be closed over the negative Re($z$) plane encircling the poles at $z = -\frac{1}{T}$ and $z = 0$. By Cauchy's residue theorem,

$$\begin{aligned}
F_W(t) &= \lim_{z\to 0}\frac{(1 + \rho Tz)\, z}{(1 + zT)\, z}\, e^{z(t + (1-\rho)T)} + \lim_{z\to -\frac{1}{T}}\frac{(1 + \rho Tz)\left(z + \frac{1}{T}\right)}{(1 + zT)\, z}\, e^{z(t + (1-\rho)T)} \\
&= 1 - (1-\rho)\, e^{-(1-\rho)}\, e^{-\frac{t}{T}}
\end{aligned}$$

Hence, for the generation of two exponential, correlated random variables, the auxiliary random variable $W$ has an exponential distribution with an atom of size $(1-\rho)e^{-(1-\rho)}$ at $t = 0$, which is fortunately easy to generate with a computer. It appears that only for $\rho \geq 0$, the linear combination method leads to correct results for exponential random variables. Moreover, the method does not give an indication of the validity in the range of $\rho$. We have shown above that $\rho \in \left[1 - \frac{\pi^2}{6}, 1\right]$.

While the linear combination method applied to generate two exponential random variables still correctly treats a range of $\rho$, the application to correlated uniform random variables leads to bizarre results and definitely shows the deficiency of the method. The difficulties already encountered in this chapter in generating $n = 2$ correlated random variables with arbitrary distribution suggest that the case for $n > 2$ must be even more intractable.
4.6 Problem
(i) Show in two dimensions that (4.3), or in explicit form (4.4), is indeed
the joint pdf corresponding to (4.2).

5
Inequalities

Hardy et al. (1999) view the most known inequalities from various angles, provide several different proofs and relate the nature of these inequalities. For example, starting from the most basic inequality between geometric and arithmetic mean$^1$,

$$\min(x, y) \leq \sqrt{xy} \leq \frac{x+y}{2} \leq \max(x, y) \qquad (5.1)$$

which directly follows from $\left(\sqrt{x} - \sqrt{y}\right)^2 \geq 0$, they masterly extend this relation to the theorem of the arithmetic and geometric mean in several real variables $x_k$,

$$\prod_{k=1}^{n} x_k^{q_k} \leq \sum_{k=1}^{n} q_kx_k \qquad (5.2)$$

where $\sum_{k=1}^{n} q_k = 1$. They further move to the inequalities of Cauchy-Schwarz, of Hölder, of Minkowski and many more. Only a few inequalities are reviewed here and we recommend the classic treatise on inequalities by Hardy, Littlewood and Pólya for those who search for more depth, elegance and insight.

$^1$ The arithmetic-geometric mean $M(x, y)$ is the limit for $n \to \infty$ of the recursion $x_n = \frac{1}{2}(x_{n-1} + y_{n-1})$, which is an arithmetic mean, and $y_n = \sqrt{x_{n-1}y_{n-1}}$, which is a geometric mean, with initial values $x_0 = x$ and $y_0 = y$. Gauss's famous discovery on intriguing properties of $M(x, y)$ (which lead e.g. to very fast converging series for computing $\pi$) is narrated in a paper by Almkvist and Berndt (1988).

5.1 The minimum (maximum) and infimum (supremum)

Since these concepts will be frequently used, we explain the difference by concentrating on the minimum and infimum (the maximum and the supremum follow analogously). Let $\Omega$ be a non-empty subset of $\mathbb{R}$. The subset $\Omega$ is said to be bounded from below by $M$ if there exists a number $M$ such that, for all $x \in \Omega$ holds that $x \geq M$. The largest lower bound (largest number $M$) is called the infimum and is denoted by $\inf(\Omega)$. Further, if there exists an element $m \in \Omega$ such that $m \leq x$ for all $x \in \Omega$, then this element $m$ is called the minimum and is denoted by $\min(\Omega)$. If the minimum $\min(\Omega)$ exists, then $\min(\Omega) = \inf(\Omega)$. However, the minimum does not always exist. The classical example is the open interval $(a, b)$, where $\inf((a, b)) = a$, but the minimum does not exist because $a \notin (a, b)$. On the other hand, for the closed interval $[a, b]$, we have that $\inf([a, b]) = \min([a, b]) = a$. This example also illustrates that every finite non-empty subset of $\mathbb{R}$ has a minimum.
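The recursion defining the arithmetic-geometric mean in the chapter introduction converges very quickly and is simple to iterate. The following sketch (function name, tolerance and iteration cap are assumptions made for illustration) also lets one observe the chain (5.1), since $M(x, y)$ always lies between the geometric and the arithmetic mean.

```python
import math


def arithmetic_geometric_mean(x, y, rel_tol=1e-15, max_iter=100):
    """Iterate x_n = (x_{n-1} + y_{n-1}) / 2 and y_n = sqrt(x_{n-1} * y_{n-1})."""
    if x <= 0 or y <= 0:
        raise ValueError("both arguments must be positive")
    for _ in range(max_iter):
        if abs(x - y) <= rel_tol * max(x, y):
            break
        x, y = 0.5 * (x + y), math.sqrt(x * y)
    return 0.5 * (x + y)
```

For instance, `arithmetic_geometric_mean(1.0, 2.0)` settles after a handful of iterations at a value between $\sqrt{2}$ and $1.5$.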
5.2 Continuous convex functions
A continuous function $f(x)$ that satisfies for $u$ and $v$ belonging to an interval $I$,

$$f\left(\frac{u+v}{2}\right) \leq \frac{f(u) + f(v)}{2}$$

is called convex in that interval $I$. If $f$ is convex, $-f$ is concave. Hardy et al. (1999, Section 3.6) demonstrate that this condition is fundamental from which the more general condition$^2$

$$f\left(\sum_{k=1}^{n} q_kx_k\right) \leq \sum_{k=1}^{n} q_kf(x_k) \qquad (5.3)$$

where $\sum_{k=1}^{n} q_k = 1$, can be deduced. Moreover, they show that a convex function is either very regular or very irregular and that a convex function that is not entirely irregular is necessarily continuous. Current textbooks, in particular the book by Boyd and Vandenberghe (2004), usually start with the definition of convexity from (5.3) in case $n = 2$ where $q_1 = 1 - q_2 = q$ and $0 \leq q \leq 1$ as

$$f(qu + (1-q)v) \leq qf(u) + (1-q)f(v) \qquad (5.4)$$

where $u$ and $v$ can be vectors in an $m$-dimensional space.

$^2$ The convexity concept can be generalized (Hardy et al., 1999, Section 98) to several variables in which case the condition (5.3) becomes

$$f\left(\sum q_kx_k, \sum q_ky_k\right) \leq \sum q_kf(x_k, y_k)$$

Geometrically with $m = 1$ as illustrated in Fig. 5.1, relation (5.4) shows that each point on the chord between $(u, f(u))$ and $(v, f(v))$ lies above the curve $f$ in the interval $I$.

[Figure 5.1: a convex curve $f(x)$ between $u$ and $v$, with chord $c_1$ over $(a, b)$ and chord $c_2$ over $(a', b')$.]

Fig. 5.1. The function $f$ is convex between $u$ and $v$.

The more general form (5.3) asserts that the centre of gravity of any number of arbitrarily weighted points of the curve lies above or on the curve. Figure 5.1 illustrates that for any convex function $f$ and points $a, a', b, b' \in [u, v]$ such that $a \leq a' < b'$ and $a < b \leq b'$, the chord $c_1$ over $(a, b)$ has a smaller slope than the chord $c_2$ over $(a', b')$ or, equivalently,

$$\frac{f(b) - f(a)}{b - a} \leq \frac{f(b') - f(a')}{b' - a'}$$

Suppose that $f(x)$ is twice differentiable in the interval $I$, then a necessary and sufficient condition for convexity is $f''(x) \geq 0$ for each $x \in I$. This theorem is proved in Hardy et al. (1999, pp. 76-77). Moreover, they prove that the equality in (5.3) can only occur if $f(x)$ is linear.

Applied to probability, relation (5.3) with $q_k = \Pr[X = k]$ and $x_k = k$ is written with (2.12) as

$$f(E[X]) \leq E[f(X)] \qquad (5.5)$$

and is known as Jensen's inequality. Jensen's inequality (5.5) also holds for continuous random variables. Indeed, if $f$ is differentiable and convex, then $f(x) - f(y) \geq f'(y)(x - y)$. Substitute $x$ by the random variable $X$ and $y = E[X]$, then

$$f(X) - f(E[X]) \geq f'(E[X])(X - E[X])$$

After applying the expectation operator to both sides, we obtain (5.5). An important application of Jensen's inequality is obtained for $f(x) = e^{-zx}$ with real $z$ as

$$e^{-zE[X]} \leq E\left[e^{-zX}\right] = \varphi_X(z)$$

Any probability generating function $\varphi_X(z)$ is, for real $z$, bounded from below by $e^{-zE[X]}$.

A continuous analog of (5.3) with $f(x) = e^x$ (and similarly for $f(x) = -\log x$)

$$\exp\left(\frac{1}{v-u}\int_u^v f(x)\, dx\right) \leq \frac{1}{v-u}\int_u^v e^{f(x)}\, dx$$

can be regarded as a generalization of the inequality between arithmetic and geometric mean.

5.3 Inequalities deduced from the Mean Value Theorem


The mean value theorem (Whittaker and Watson, 1996, p. 65) states that if $g(x)$ is continuous on $x \in [a, b]$, there exists a number $\xi \in [a, b]$ such that

$$\int_a^b g(u)\, du = (b - a)\, g(\xi)$$

or, alternatively, if $f(x)$ is differentiable on $[a, b]$, then

$$f(b) - f(a) = (b - a)\, f'(\xi) \qquad (5.6)$$

The equivalence follows by putting $f(x) = \int_a^x g(u)\, du$. It is convenient to rewrite this relation for $0 \leq \theta \leq 1$ as

$$f(x + h) - f(x) = hf'(x + \theta h)$$

In this form, the mean value theorem is nothing else than a special case for $n = 1$ of Taylor's theorem (Whittaker and Watson, 1996, p. 96),

$$f(x + h) - f(x) = \sum_{k=1}^{n-1}\frac{f^{(k)}(x)}{k!}h^k + \frac{h^n}{n!}f^{(n)}(x + \theta h) \qquad (5.7)$$

An important application of Taylor's Theorem (or of the mean value theorem) to the exponential function gives a list of inequalities. First,

$$e^x = 1 + x + \frac{x^2}{2}e^{\theta x}$$

and, since $e^{\theta x} > 0$ for any finite $x$, we have for any $x \neq 0$,

$$e^x > 1 + x \qquad (5.8)$$

A direct generalization follows from Taylor's Theorem (5.7),

$$e^x = \sum_{k=0}^{n-1}\frac{x^k}{k!} + \frac{x^n}{n!}e^{\theta x}$$

such that, for $n = 2m$ and any $x \neq 0$,

$$e^x > \sum_{k=0}^{2m-1}\frac{x^k}{k!}$$

and, for $n = 2m + 1$,

$$e^x > \sum_{k=0}^{2m}\frac{x^k}{k!} \quad (x > 0), \qquad e^x < \sum_{k=0}^{2m}\frac{x^k}{k!} \quad (x < 0)$$

Second, estimates of the product $\prod_{k=0}^{n}(1 + a_kx)$ where $a_kx \neq 0$ are obtained from (5.8) as$^3$

$$\prod_{k=0}^{n}(1 + a_kx) < \exp\left(\sum_{k=0}^{n} a_kx\right)$$

5.4 The Markov and Chebyshev inequalities


Consider first a non-negative random variable $X$. The expectation reads

$$\begin{aligned}
E[X] &= \int_0^{\infty} xf_X(x)\, dx = \int_0^a xf_X(x)\, dx + \int_a^{\infty} xf_X(x)\, dx \\
&\geq \int_a^{\infty} xf_X(x)\, dx \geq a\int_a^{\infty} f_X(x)\, dx = a\Pr[X \geq a]
\end{aligned}$$

Hence, we obtain the Markov inequality

$$\Pr[X \geq a] \leq \frac{E[X]}{a} \qquad (5.9)$$

$^3$ A tighter bound is obtained if all $a_k > 0$ (e.g. $a_k$ is a probability). The above relation indicates that $g(x) = \prod_{k=0}^{n}(1 + a_kx)$ is smaller than $f(x) = \exp\left(x\sum_{k=0}^{n} a_k\right)$ for any $x \neq 0$ and $g(0) = f(0) = 1$. Further, from $(1 + a_kx) < e^{a_kx}$ it can be verified that, for all Taylor coefficients $1 < k \leq n$ holds that $0 < g_k \leq f_k$ and $g_2 < f_2$ such that $g(x) = \sum_{k=0}^{n} g_kx^k < \sum_{k=0}^{n} f_kx^k < \sum_{k=0}^{\infty} f_kx^k$ for $x > 0$. Thus, for $x = 1$, we have $g(1) < \sum_{k=0}^{n} f_k$ or

$$\prod_{k=0}^{n}(1 + a_k) < \sum_{k=0}^{n}\frac{1}{k!}\left(\sum_{j=0}^{n} a_j\right)^k$$

Another proof of the Markov inequality follows after taking the expectation of the inequality $a1_{X \geq a} \leq X$ for $X \geq 0$. The restriction to non-negative random variables can be circumvented by considering the random variable $X = (Y - E[Y])^2$ and $a = t^2$ in (5.9),

$$\Pr\left[(Y - E[Y])^2 \geq t^2\right] \leq \frac{E\left[(Y - E[Y])^2\right]}{t^2} = \frac{\operatorname{Var}[Y]}{t^2}$$

From this, the Chebyshev inequality follows as

$$\Pr[|X - E[X]| \geq t] \leq \frac{\sigma^2}{t^2} \qquad (5.10)$$

The Chebyshev inequality quantifies the spread of $X$ around the mean $E[X]$. The smaller $\sigma$, the more concentrated $X$ is around the mean.

Further extensions of the Markov inequality use the equivalence between the events $\{X \geq a\} \Leftrightarrow \{g(X) \geq g(a)\}$ where $g$ is a monotonously increasing function. Hence, (5.9) becomes

$$\Pr[X \geq a] \leq \frac{E[g(X)]}{g(a)}$$

For example, if $g(x) = x^k$, then $\Pr[X \geq a] \leq \frac{E[X^k]}{a^k}$. An interesting application of this idea is based on the equivalence of the events $\{X \geq E[X] + t\} \Leftrightarrow \{e^{uX} \geq e^{u(E[X]+t)}\}$ provided $u \geq 0$. For $u \geq 0$,

$$\Pr[X \geq E[X] + t] = \Pr\left[e^{uX} \geq e^{u(E[X]+t)}\right] \leq e^{-u(E[X]+t)}E\left[e^{uX}\right] \qquad (5.11)$$

where in the last step Markov's inequality (5.9) has been used. If the generating function or Laplace transform $E\left[e^{uX}\right]$ is known, the sharpest bound is obtained by the minimizer $u^*$ in $u$ of the right-hand side because (5.11) holds for any $u > 0$. In Section 5.7, we show that this minimizer $u^*$ obeying Re $u > 0$ indeed exists for probability generating functions. The resulting inequality

$$\Pr[X \geq E[X] + t] \leq e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right] \qquad (5.12)$$

is called the Chernoff bound.
The Chernoff bound of the binomial distribution. Let $X$ denote a binomial random variable with probability generating function given by (3.2) such that $E\left[e^{uX}\right] = E\left[(e^u)^X\right] = (q + pe^u)^n$. Then, with $E[X] = np$,

$$e^{-u(E[X]+t)}E\left[e^{uX}\right] = e^{-u(np+t) + n\log(q + pe^u)}$$

Provided $\left.\frac{d^2}{du^2}\left(e^{-u(E[X]+t)}E\left[e^{uX}\right]\right)\right|_{u=u^*} > 0$, the minimum $u^*$ is solution of

$$\frac{d}{du}\left(e^{-u(E[X]+t)}E\left[e^{uX}\right]\right) = 0$$

Explicitly,

$$\frac{d}{du}\left(e^{-u(E[X]+t)}E\left[e^{uX}\right]\right) = e^{-u(np+t) + n\log(q + pe^u)}\left(-(np + t) + \frac{npe^u}{q + pe^u}\right)$$

from which $u^*$ follows using $q = 1 - p$ as

$$u^* = \log\left(\frac{npq + qt}{npq - pt}\right)$$

Hence,

$$e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right] = \frac{\left(1 - \frac{t}{nq}\right)^{t - nq}}{\left(1 + \frac{t}{np}\right)^{t + np}}$$

For large $n$, but $p$ and $t$ fixed, we observe$^4$ that $e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right] = e^{-\frac{t^2}{2npq}}\left(1 + O\left(\frac{1}{n}\right)\right)$. Since $\operatorname{Var}[X] = npq$ and by denoting $y^2 = \frac{t^2}{\operatorname{Var}[X]}$, we find that the asymptotic regime for large $n$,

$$\Pr\left[\frac{|X - E[X]|}{\sqrt{\operatorname{Var}[X]}} \geq y\right] \leq e^{-\frac{y^2}{2}} \qquad (5.13)$$

is in agreement with the Central Limit Theorem 6.3.1. The corresponding Chebyshev inequality,

$$\Pr\left[\frac{|X - E[X]|}{\sqrt{\operatorname{Var}[X]}} \geq y\right] \leq \frac{1}{y^2}$$

is considerably less tight for the binomial distribution than the Chernoff bound (5.13). More advanced and sharper inequalities than that of Chebyshev are surveyed by Janson (2002).

$^4$ Write

$$e^{-u^*(E[X]+t)}E\left[e^{u^*X}\right] = \exp\left[(t - nq)\log\left(1 - \frac{t}{nq}\right) - (t + np)\log\left(1 + \frac{t}{np}\right)\right]$$

and use the Taylor expansion of $\log(1 \pm x)$ around $x = 0$.
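To get a feeling for the relative tightness of these bounds, the closed form of the binomial Chernoff bound can be compared with the exact upper tail and with the Chebyshev bound (5.10). The sketch below uses hypothetical function names and requires $0 < t < nq$ so that the minimizer $u^*$ is positive and finite.

```python
import math


def chernoff_binomial(n, p, t):
    """Chernoff bound on Pr[X >= np + t] for a binomial(n, p), with 0 < t < nq."""
    q = 1.0 - p
    if not 0.0 < t < n * q:
        raise ValueError("need 0 < t < n*q")
    exponent = ((t - n * q) * math.log(1.0 - t / (n * q))
                - (t + n * p) * math.log(1.0 + t / (n * p)))
    return math.exp(exponent)


def exact_upper_tail(n, p, t):
    """Exact Pr[X >= np + t], summing the binomial probabilities."""
    k0 = math.ceil(n * p + t)
    return sum(math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)
               for k in range(k0, n + 1))


def chebyshev_bound(n, p, t):
    """Chebyshev bound (5.10) with variance npq."""
    return min(1.0, n * p * (1.0 - p) / t ** 2)
```

For $n = 100$, $p = 0.5$ and $t = 15$ the three quantities are ordered as exact tail, Chernoff bound, Chebyshev bound, each roughly an order of magnitude apart.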

5.5 The Hölder, Minkowski and Young inequalities


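The Hölder inequality is

$$\sum a^{\alpha}b^{\beta}\cdots z^{\lambda} \leq \left(\sum a\right)^{\alpha}\left(\sum b\right)^{\beta}\cdots\left(\sum z\right)^{\lambda}, \qquad \alpha + \beta + \cdots + \lambda = 1$$

Let $a = X^p$ and $b = Y^q$, and further $p = \frac{1}{\alpha} > 1$ and $q = \frac{1}{\beta} > 1$ such that $\frac{1}{p} + \frac{1}{q} = 1$, then we obtain as a frequently used application,

$$E[XY] \leq (E[X^p])^{1/p}(E[Y^q])^{1/q} \qquad (5.14)$$

The Hölder inequality can be deduced from the basic convexity inequality (5.4). Since $-\log x$ is a convex function for real $x > 0$, the basic convexity inequality (5.4) is with $0 \leq \alpha \leq 1$,

$$\log(\alpha u + (1-\alpha)v) \geq \alpha\log(u) + (1-\alpha)\log(v)$$

After exponentiation, we obtain for $u, v > 0$ a more general inequality than (5.1), which corresponds to $\alpha = \frac{1}{2}$,

$$u^{\alpha}v^{1-\alpha} \leq \alpha u + (1-\alpha)v$$

Substitute $u = \frac{|x_j|^p}{\sum_{j=1}^{n}|x_j|^p}$ and $v = \frac{|y_j|^q}{\sum_{j=1}^{n}|y_j|^q}$, then

$$\left(\frac{|x_j|^p}{\sum_{j=1}^{n}|x_j|^p}\right)^{\alpha}\left(\frac{|y_j|^q}{\sum_{j=1}^{n}|y_j|^q}\right)^{1-\alpha} \leq \alpha\frac{|x_j|^p}{\sum_{j=1}^{n}|x_j|^p} + (1-\alpha)\frac{|y_j|^q}{\sum_{j=1}^{n}|y_j|^q}$$

and summing over all $j$ yields

$$\sum_{j=1}^{n}|x_j|^{p\alpha}|y_j|^{q(1-\alpha)} \leq \left(\sum_{j=1}^{n}|x_j|^p\right)^{\alpha}\left(\sum_{j=1}^{n}|y_j|^q\right)^{1-\alpha} \qquad (5.15)$$

By choosing $p = \frac{1}{\alpha}$ and $q = \frac{1}{1-\alpha}$, we arrive at the Hölder inequality with $p > 1$ and $\frac{1}{p} + \frac{1}{q} = 1$,

$$\sum_{j=1}^{n}|x_jy_j| \leq \left(\sum_{j=1}^{n}|x_j|^p\right)^{\frac{1}{p}}\left(\sum_{j=1}^{n}|y_j|^q\right)^{\frac{1}{q}} \qquad (5.16)$$

A special important case of the Hölder inequality (5.14) for $p = q = 2$ is the Cauchy-Schwarz inequality,

$$(E[XY])^2 \leq E\left[X^2\right]E\left[Y^2\right] \qquad (5.17)$$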
It is of interest to mention that the Hölder inequality is of a general type in the following sense (Hardy et al., 1999, Theorem 101 (p. 82)). Suppose that $f(x)$ is convex (such that the inverse $g(x) = f^{-1}(x)$ is also convex) and that $f(0) = 0$. If $F(x) = \int_0^x f(u)\, du$ and $G(x) = \int_0^x g(u)\, du$, and if

$$\sum_{k=1}^{n} q_ka_kb_k \leq F^{-1}\left(\sum_{k=1}^{n} q_kF(a_k)\right)G^{-1}\left(\sum_{k=1}^{n} q_kG(b_k)\right)$$

with $\sum_{k=1}^{n} q_k = 1$ holds for all positive $a_k$ and $b_k$, then $f(x) = x^r$ and the above inequality is Hölder's inequality.

The next inequalities are of a different type. For $p > 1$, the Minkowski inequality is

$$(E[|X + Y|^p])^{1/p} \leq (E[|X|^p])^{1/p} + (E[|Y|^p])^{1/p} \qquad (5.18)$$

or, written algebraically,

$$\left(\sum_{j=1}^{n}|x_j + y_j|^p\right)^{\frac{1}{p}} \leq \left(\sum_{j=1}^{n}|x_j|^p\right)^{\frac{1}{p}} + \left(\sum_{j=1}^{n}|y_j|^p\right)^{\frac{1}{p}}$$

Suppose that $f(x)$ is continuous and strictly increasing for $x \geq 0$ and $f(0) = 0$. Then the inverse function $g(x) = f^{-1}(x)$ satisfies the same conditions. The Young inequality states that for $a \geq 0$ and $b \geq 0$ holds that

$$ab \leq \int_0^a f(u)\, du + \int_0^b g(u)\, du \qquad (5.19)$$

with equality only if $b = f(a)$. The Young inequality follows by geometrical consideration. The first integral is the area under the curve $y = f(x)$ from $[0, a]$, while the second is the area under the curve $x = g(y) = f^{-1}(y)$ from $[0, b]$.
Applications of the Cauchy-Schwarz inequality

1. We will demonstrate that both the generating function $\varphi_X(z) = E\left[e^{-zX}\right]$ and its logarithm $L_X(z) = \log(\varphi_X(z))$ are convex functions of $z$. First, the second derivative is continuous and non-negative because $\varphi_X''(z) = E\left[X^2e^{-zX}\right] \geq 0$. Further, since

$$L_X''(z) = \frac{\varphi_X(z)\varphi_X''(z) - (\varphi_X'(z))^2}{\varphi_X^2(z)}$$

it remains to show that $\varphi_X(z)\varphi_X''(z) - (\varphi_X'(z))^2 \geq 0$. From the Cauchy-Schwarz inequality (5.17) applied to $\varphi_X'(z) = -E\left[Xe^{-zX}\right]$ with $X \to e^{-\frac{z}{2}X}$ and $Y = Xe^{-\frac{z}{2}X}$, we obtain

$$(\varphi_X'(z))^2 = \left(E\left[Xe^{-zX}\right]\right)^2 \leq E\left[e^{-zX}\right]E\left[X^2e^{-zX}\right] = \varphi_X(z)\varphi_X''(z)$$

Hence, $L_X''(z) \geq 0$.

2. Let $Y = 1_{X>0}$ in (5.17) while $X$ is a non-negative random variable, then with (2.13),

$$(E[X])^2 \leq E\left[X^2\right]E[1_{X>0}] = E\left[X^2\right](1 - \Pr[X = 0])$$

such that an upper bound for $\Pr[X = 0]$ is obtained,

$$\Pr[X = 0] \leq 1 - \frac{(E[X])^2}{E[X^2]} \qquad (5.20)$$

5.6 The Gauss inequality


In this section, we consider a continuous random variable $X$ with even probability density function, i.e. $f_X(-x) = f_X(x)$, which is not increasing for $x > 0$. A typical example of such random variables are measurement errors due to statistical fluctuations.

In his epoch-making paper, Gauss (1821) established the method of the least squares (see e.g. Section 2.2.1 and 2.5.3). In that same paper, Gauss (1821, pp. 10-11) also stated and proved Theorem 5.6.1, which is appealing because of its generality.

We define the probability $m$ as

$$m = \Pr[-\lambda\sigma \leq X \leq \lambda\sigma] = \int_{-\lambda\sigma}^{\lambda\sigma} f_X(u)\, du \qquad (5.21)$$

where $\sigma = \sqrt{\operatorname{Var}[X]}$ is the standard deviation.

Theorem 5.6.1 (Gauss) If $X$ is a continuous random variable with even probability density function, i.e. $f_X(-x) = f_X(x)$, which is not increasing for $x > 0$, then

if $m < \frac{2}{3}$ then $\lambda \leq m\sqrt{3}$
if $m = \frac{2}{3}$ then $\lambda \leq \sqrt{\frac{4}{3}}$
if $m > \frac{2}{3}$ then $\lambda < \frac{2}{3\sqrt{1-m}}$

and, conversely,

if $\lambda < \sqrt{\frac{4}{3}}$ then $m \geq \frac{\lambda}{\sqrt{3}}$
if $\lambda > \sqrt{\frac{4}{3}}$ then $m \geq 1 - \frac{4}{9\lambda^2}$

Given a bound on the probability $m$, Gauss's Theorem 5.6.1 bounds the extent of the error $X$ around its mean zero in units of the standard deviation $\sigma$ or, equivalently, it provides bounds for the normalized random variable $X^* = \frac{X - E[X]}{\sigma}$. The proof of this theorem only uses real function theory and is characteristic for the genius of Gauss.
Proof: Consider the inverse function $x = g(y)$ of the integral $y = \int_{-x}^{x} f_X(u)\, du = F_X(x) - F_X(-x)$. An interesting general property of the inverse function is

$$\int_0^1 g^2(u)\, du = \int_{-\infty}^{\infty} x^2f_X(x)\, dx$$

which is verified by the substitution $x = g(u)$. Since $E[X] = 0$ and $\operatorname{Var}[X] = E\left[X^2\right]$, we have

$$\int_0^1 g^2(u)\, du = \sigma^2 = \operatorname{Var}[X] \qquad (5.22)$$

Beside $g(0) = 0$, the derivative

$$g'(y) = \frac{1}{\frac{d}{dx}\left[F_X(x) - F_X(-x)\right]} = \frac{1}{f_X(x) + f_X(-x)}$$

is increasing from $y = 0$ until $y = 1$ because $f_X(x)$ attains a maximum at $x = 0$ and is not increasing for $x > 0$. Hence, $g''(y) \geq 0$. From the differential

$$d\left(yg'(y)\right) = g'(y)\, dy + yg''(y)\, dy$$

we obtain by integration

$$yg'(y) - g(y) = \int_0^y ug''(u)\, du$$

Since $g''(y) \geq 0$, we have that $yg'(y) - g(y) \geq 0$ and since $yg'(y) > 0$ (for $y > 0$) that

$$h(y) = 1 - \frac{g(y)}{yg'(y)}$$

lies in the interval $[0, 1]$. From (5.21), it follows that $\lambda\sigma = g(m)$ and that $h(m) = 1 - \frac{\lambda\sigma}{mg'(m)}$ or

$$g'(m) = \frac{\lambda\sigma}{m(1 - h(m))}$$

With this preparation, consider now the following linear function

$$G(y) = \frac{\lambda\sigma}{m(1 - h(m))}\left(y - mh(m)\right) \qquad (5.23)$$

Clearly, we have that $G(m) = \lambda\sigma$ and that $G'(y) = \frac{\lambda\sigma}{m(1 - h(m))} = g'(m)$ is independent of $y$. Since $g'(y)$ is non-decreasing, which is the basic assumption of the theorem, the difference $g'(y) - G'(y)$ is negative if $y < m$, but positive if $y > m$. Since $g'(y) - G'(y) = \frac{d}{dy}\left(g(y) - G(y)\right)$, the function $g(y) - G(y)$ is convex with minimum at $y = m$ for which $g(m) - G(m) = 0$. Hence, $g(y) - G(y) \geq 0$ for all $y \in [0, 1]$. Further, $G(y)$ is positive for $y \in (mh(m), 1]$. Especially in this interval, the inequality $g(y) \geq G(y)$ is sharp because $g(y)$ is positive in $(0, 1]$. Thus,

$$\int_{mh(m)}^{1} G^2(y)\, dy \leq \int_{mh(m)}^{1} g^2(y)\, dy < \int_0^1 g^2(y)\, dy$$

Using (5.22) and with (5.23), we have

$$\frac{\lambda^2\sigma^2}{m^2(1 - h(m))^2}\, \frac{(1 - mh(m))^3}{3} < \sigma^2$$

from which we arrive at the inequality

$$\lambda^2 < \frac{3m^2(1 - z)^2}{(1 - mz)^3} \qquad (5.24)$$

where $z = h(m) \in [0, 1]$. The derivative of the right-hand side with respect to $z$,

$$\frac{d}{dz}\left(\frac{3m^2(1 - z)^2}{(1 - mz)^3}\right) = -\frac{3m^2(1 - z)}{(1 - mz)^4}\left(2 - 3m + mz\right)$$

shows that $\frac{3m^2(1 - z)^2}{(1 - mz)^3}$ is monotonously decreasing for all $z \in [0, 1]$ if $m < \frac{2}{3}$ with maximum at $z = 0$. Thus, if $m < \frac{2}{3}$, evaluating (5.24) at $z = 0$ yields $\lambda < \sqrt{3}\, m$. On the other hand, if $m > \frac{2}{3}$, then $\frac{3m^2(1 - z)^2}{(1 - mz)^3}$ is maximal provided $2 - 3m + mz = 0$ or for $z = 3 - \frac{2}{m}$. With that value of $z$, the inequality (5.24) yields $\lambda < \frac{2}{3\sqrt{1 - m}}$. Both regimes $m > \frac{2}{3}$ and $m < \frac{2}{3}$ tend to a same bound $\lambda < \frac{2}{\sqrt{3}}$ if $m \to \frac{2}{3}$. The converse is similarly derived from (5.24).

If $X$ has a symmetric uniform distribution with $f_X(x) = \frac{1}{2a}1_{x \in [-a, a]}$, then $m = \frac{\lambda\sigma}{a}$ and $\sigma = \frac{a}{\sqrt{3}}$ from which $m = \frac{\lambda}{\sqrt{3}}$. This example shows that Gauss's Theorem 5.6.1 is sharp for $m \leq \frac{2}{3}$ in the sense that equality can occur in the first condition $\lambda \leq \sqrt{3}\, m$.
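Theorem 5.6.1 can be illustrated on the Gaussian distribution, whose density is even and decreasing for $x > 0$. In the sketch below (names are hypothetical) the probability $m$ of (5.21) is evaluated with the error function, $m = \operatorname{erf}(\lambda/\sqrt{2})$, and compared with the bound on $\lambda$ that the theorem attaches to that value of $m$.

```python
import math


def gauss_lambda_bound(m):
    """Upper bound on lambda implied by Theorem 5.6.1 for a given probability m."""
    if not 0.0 <= m < 1.0:
        raise ValueError("m must lie in [0, 1)")
    if m <= 2.0 / 3.0:
        return m * math.sqrt(3.0)
    return 2.0 / (3.0 * math.sqrt(1.0 - m))


def normal_central_probability(lam):
    """m = Pr[-lam*sigma <= X <= lam*sigma] for a zero mean Gaussian X."""
    return math.erf(lam / math.sqrt(2.0))
```

For each $\lambda$ one should find `lam <= gauss_lambda_bound(normal_central_probability(lam))`; for the uniform distribution, where $m = \lambda/\sqrt{3}$, the first branch holds with equality.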

5.7 The dominant pole approximation and large deviations


In this section, we relate asymptotic results of generating functions to the theory of large deviations. An asymptotic expansion in discrete-time is compared to established large deviations results.

The first approach using the generating function $\varphi_X(z)$ of the random variable $X$ is an immediate consequence of Lemma 5.7.1.

Lemma 5.7.1 If $\varphi_X(z)$ is meromorphic with residues $r_k$ at the (simple) poles $p_k$ ordered as $0 < |p_0| \leq |p_1| \leq |p_2| \leq \cdots$ and if $\varphi_X(z) = o(z^{N+1})$ as $z \to \infty$, then

$$\begin{aligned}
\varphi_X(z) &= \sum_{k=0}^{N}\Pr[X = k]\, z^k + \sum_{k=0}^{\infty}\frac{r_k}{p_k^{N+1}}\, \frac{z^{N+1}}{z - p_k} \qquad (5.25) \\
&= \sum_{k=0}^{N}\Pr[X = k]\, z^k + \sum_{k=0}^{\infty} r_k\left(\frac{1}{z - p_k} + \sum_{m=0}^{N}\frac{z^m}{p_k^{m+1}}\right) \qquad (5.26)
\end{aligned}$$

The normalization condition $\varphi_X(1) = 1$ implies that

$$\Pr[X > N] = 1 - \sum_{k=0}^{N}\Pr[X = k] = \sum_{k=0}^{\infty}\frac{r_k}{p_k^{N+1}(1 - p_k)} \qquad (5.27)$$

The Lemma follows from Titchmarsh (1964, Section 3.21). Rewriting (5.26) gives,

$$\varphi_X(z) = \sum_{k=0}^{N}\Pr[X = k]\, z^k - \sum_{j=N+1}^{\infty}\left(\sum_{k=0}^{\infty}\frac{r_k}{p_k^{j+1}}\right)z^j \qquad (5.28)$$

and hence,

$$\Pr[X = j] = -\sum_{k=0}^{\infty}\frac{r_k}{p_k^{j+1}} \qquad (j > N) \qquad (5.29)$$

The tail probability for $K > N$ follows from (5.29) as

$$\Pr[X > K] = \sum_{j=K+1}^{\infty}\Pr[X = j] = \sum_{k=0}^{\infty}\frac{r_k}{p_k^{K+1}(1 - p_k)} \qquad (K > N) \qquad (5.30)$$

Lemma 5.7.1 means that, if the plot $\Pr[X = j]$ versus $j$ exhibits a kink at $j = N$, then $\varphi_X(z) = O\left(z^N\right)$ as $z \to \infty$. Alternatively$^5$, the asymptotic regime does not start earlier than $j \geq N$. For large $K$, only the pole with smallest modulus, $p_0$, will dominate. Hence,

$$\Pr[X > K] \approx \frac{r_0}{p_0^{K+1}(1 - p_0)} \qquad (5.31)$$

This approximation is called the dominant pole approximation with the residue at the simple pole $p_0$ equal to $r_0 = \lim_{z \to p_0}\varphi_X(z)(z - p_0)$.
The second approach is a large deviations approximation in discrete-time.
We have
"
X
Pr [[ = m]
 log Pr [[ A N] =  log
  log

m=N+1
"
X

{m3N31 Pr [[ = m]

m=N+1

  log C{3N31

"
X

({ 5 R and {  1)
4

{m Pr [[ = m]D

m=0

= (N + 1) log {  log *[ ({)

(5.32)

This inequality holds for all real $x \ge 1$. To get the tightest bound, we determine the maximizer $x_{\max}$ of (5.32), thus $I(K) = \sup_{x \ge 1} \left[ (K+1) \log x - \log \varphi_X(x) \right]$. There exists such a supremum on account of the convexity of $I(K)$ because $\varphi_X(x)$ and $\log \varphi_X(x)$ are convex for $x \ge 1$ as shown in Section 5.5. Assuming that the maximum, say $x_{\max}$, exists, then it is the solution of $x_{\max} = (K + 1) \frac{\varphi_X(x_{\max})}{\varphi_X'(x_{\max})}$ and the large deviations estimate becomes
$$\Pr[X > K] \approx e^{-\left[ (K+1) \log x_{\max} - \log \varphi_X(x_{\max}) \right]} = \varphi_X(x_{\max})\, x_{\max}^{-(K+1)} \tag{5.33}$$
Observe that (5.33) can be obtained directly from (5.11) with $K = t + E[X]$. Comparing (5.33) and (5.31) indicates, for large $K$, that $x_{\max} = p_0$ because
$$\lim_{K \to \infty} \frac{-\log \Pr[X > K]}{K} = \log p_0 = \log x_{\max}$$

5 In terms of the queue occupancy in ATM, the initial $\Pr[X = j]$-regime for $j < N$ reflects the cell scale, while the asymptotic regime $j \ge N$ refers to the burst scale.
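The dominant pole approximation (5.31) can be tried out on a small case. The sketch below is an illustration only, under the assumption that $X$ is a mixture of two geometric laws, $\Pr[X = j] = w(1-a)a^j + (1-w)(1-b)b^j$ with $0 < b < a < 1$. The function names and parameter values are hypothetical and not taken from the text. The generating function then has two simple poles, $1/a$ and $1/b$, and the pole of smallest modulus is $p_0 = 1/a$ with residue $r_0 = -w(1-a)/a$.

```python
import math

def tail_exact(K, w, a, b):
    """Exact Pr[X > K] for the assumed mixture of two geometric laws."""
    return w * a ** (K + 1) + (1 - w) * b ** (K + 1)

def tail_dominant_pole(K, w, a):
    """Dominant pole approximation (5.31): r0 / (p0**(K+1) * (1 - p0))."""
    p0 = 1.0 / a                 # pole of smallest modulus of the pgf
    r0 = -w * (1.0 - a) / a      # residue of the pgf at p0
    return r0 / (p0 ** (K + 1) * (1.0 - p0))

# Illustrative parameters (assumption): weight w, geometric ratios a > b.
w, a, b = 0.3, 0.8, 0.5
for K in (5, 20, 80):
    exact = tail_exact(K, w, a, b)
    approx = tail_dominant_pole(K, w, a)
    decay = -math.log(exact) / K  # tends to log p0 = -log a for large K
    print(K, exact, approx, decay)
```

For small $K$ the second pole still contributes visibly. For large $K$ the two tails agree and the decay rate approaches $\log p_0$, in line with the limit stated above.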

Example A frequently appearing dominant pole (see, for example, the extinction probability of a Poisson branching process in Section 12.3, the M/D/1 queue in Section 14.5 and the size of the giant component in the random graph in Section 15.6.4) is the real zero $\zeta$ different from 1 of $e^{\lambda(z-1)} - z$. The trivial zero is $z = 1$. The non-trivial solution $e^{\lambda(\zeta - 1)} = \zeta$ can be expressed as a Lagrange series (Markushevich, 1985, p. 94) for$^6$ $\lambda > 1$,
$$\zeta = e^{-\lambda} \sum_{n=0}^{\infty} \frac{(n+1)^{n-1}}{n!} \left( \lambda e^{-\lambda} \right)^n \tag{5.34}$$

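An exact and fast converging expansion for $\zeta$ around $\lambda = 1$,
$$\zeta = 1 + \frac{2}{\lambda} \left[ (1-\lambda) + \frac{(1-\lambda)^2}{3} + \frac{2 (1-\lambda)^3}{9} + \frac{22 (1-\lambda)^4}{135} + \frac{52 (1-\lambda)^5}{405} + \frac{20 (1-\lambda)^6}{189} + \frac{3824 (1-\lambda)^7}{42525} + \frac{1424 (1-\lambda)^8}{18225} + \frac{15856 (1-\lambda)^9}{229635} + \frac{11714672 (1-\lambda)^{10}}{189448875} + \frac{44536288 (1-\lambda)^{11}}{795685275} + o\left( (1-\lambda)^{11} \right) \right] \tag{5.35}$$
is derived in Van Mieghem (1996) as the zero of $\frac{e^{\lambda(z-1)} - z}{z - 1}$. The numerical data show that the approximation $\zeta \approx \frac{1}{\lambda^2}$, which can be deduced from the series, is within 1% accurate for $0.84 < \lambda \le 1$.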

6 From (14.43) we observe for $a = b = 1$ and $z = \lambda$ that in (5.34) $\zeta = 1$ for all $0 \le \lambda < 1$.
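The two expansions of this Example can be checked numerically. The sketch below is a hedged illustration: it writes $\lambda$ for the parameter and $\zeta$ for the non-trivial zero of $e^{\lambda(z-1)} - z$ (symbol names assumed here), sums the Lagrange series (5.34) with logarithms to avoid overflow, and compares it with a plain fixed-point iteration. For $\lambda < 1$ it compares the expansion (5.35), in the form $\zeta = 1 + \frac{2}{\lambda}\sum_k c_k (1-\lambda)^k$, with a bisection root. The function names are hypothetical.

```python
import math

# Coefficients c_1..c_11 of (1 - lam)^k inside the bracket of (5.35).
COEFFS = [1.0, 1 / 3, 2 / 9, 22 / 135, 52 / 405, 20 / 189, 3824 / 42525,
          1424 / 18225, 15856 / 229635, 11714672 / 189448875,
          44536288 / 795685275]

def zeta_lagrange(lam, terms=400):
    """Lagrange series (5.34), intended for lam > 1."""
    x = lam * math.exp(-lam)
    total = 0.0
    for n in range(terms):
        log_term = (n - 1) * math.log(n + 1) - math.lgamma(n + 1) + n * math.log(x)
        total += math.exp(log_term)
    return math.exp(-lam) * total

def zeta_fixed_point(lam, iterations=500):
    """Smallest positive root of z = exp(lam (z - 1)), iterating from z = 0."""
    z = 0.0
    for _ in range(iterations):
        z = math.exp(lam * (z - 1.0))
    return z

def zeta_expansion(lam):
    """Expansion (5.35) around lam = 1, truncated after the 11th power."""
    eps = 1.0 - lam
    return 1.0 + (2.0 / lam) * sum(c * eps ** (k + 1) for k, c in enumerate(COEFFS))

def zeta_bisection(lam):
    """Root larger than 1 of exp(lam (z - 1)) - z, assuming 0 < lam < 1."""
    f = lambda z: math.exp(lam * (z - 1.0)) - z
    lo, hi = 2.0 - lam, 4.0      # f(lo) < 0 just right of the trivial zero
    while f(hi) < 0.0:           # enlarge the bracket until f changes sign
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(zeta_lagrange(2.0), zeta_fixed_point(2.0))
print(zeta_expansion(0.9), zeta_bisection(0.9), 1.0 / 0.9 ** 2)
```

The last line also prints $1/\lambda^2$, so the quality of that simple approximation near $\lambda = 1$ can be read off directly.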

6
Limit laws

Limit laws lie at the heart of analysis and probability theory. Solutions of problems often simplify considerably in limit cases. For example, in Section 16.5.1, the flooding time in the complete graph with $N$ nodes and exponentially distributed link weights can be computed exactly. The exact expression is unattractive but, fortunately, the limit result for $N \to \infty$ is appealing. Many more results and deep discussions are found in the books of Feller (1970, 1971). In this chapter, we will mainly be concerned with sums of independent random variables, $S_n = \sum_{k=1}^{n} X_k$.

6.1 General theorems from analysis


In this section, we define modes of convergence of sequences of random variables and state (without proof) some general theorems that will be used later on.

6.1.1 Summability
We will need results from analysis on summability$^1$. First the discrete case is presented and then the continuous case.
Lemma 6.1.1 Let $\{a_n\}_{n \ge 1}$ be a sequence of numbers with $\lim_{n \to \infty} a_n = a$, then the average of the partial sums converges to $a$,
$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} a_m = a \tag{6.1}$$

1 In his classical treatise on Divergent Series, Hardy (1948) discusses Cesàro, Abel, Euler and Borel summability in depth.


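Proof: The demonstration of (6.1) is short enough to include here. The fact that there is a limit $a$ of the sequence $a_1, a_2, \ldots$ implies that, for an arbitrary $\varepsilon > 0$, there exists a finite number $n_0$ such that, for all $n > n_0$, $|a_n - a| < \varepsilon$. Consider the average partial sum $s_n = \frac{1}{n} \sum_{m=1}^{n} a_m$ or, rewritten,
$$s_n - a = \frac{1}{n} \sum_{m=1}^{n_0} (a_m - a) + \frac{1}{n} \sum_{m=n_0+1}^{n} (a_m - a)$$
Hence, with the constant $c = \sum_{m=1}^{n_0} |a_m - a|$,
$$|s_n - a| \le \frac{1}{n} \sum_{m=1}^{n_0} |a_m - a| + \frac{1}{n} \sum_{m=n_0+1}^{n} |a_m - a| < \frac{c}{n} + \frac{n - n_0}{n}\, \varepsilon < \frac{c}{n} + \varepsilon$$
Since $c$ is a constant, $\frac{c}{n}$ can be made smaller than $\varepsilon$ for $n$ large enough, such that $|s_n - a| < 2\varepsilon$. Because $\varepsilon$ is arbitrary, this is equivalent to (6.1).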


In fact, as illustrated by many examples in Hardy (1948, Chapters I and II), the limit in (6.1) exists in more cases than $\lim_{n \to \infty} a_n = a$ does. For example, if $a_{2n} = 1$ and $a_{2n+1} = 0$, the limit $\lim_{n \to \infty} a_n$ does not exist, but (6.1) tends to $\frac{1}{2}$. Probabilistically, Lemma 6.1.1 is closely related to the sample mean and the Law of Large Numbers (Section 6.2).
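Both statements about the averages in (6.1) are easy to see numerically. The short sketch below (helper names are illustrative only) computes the running averages $s_n$ for the alternating sequence $a_{2n} = 1$, $a_{2n+1} = 0$, which has no limit, and for a convergent sequence, here chosen as $a_n = 1 + 1/n$.

```python
def cesaro_means(a, n_max):
    """Return the averages s_n = (a(1) + ... + a(n)) / n for n = 1..n_max."""
    means, running = [], 0.0
    for n in range(1, n_max + 1):
        running += a(n)
        means.append(running / n)
    return means

alternating = lambda n: 1.0 if n % 2 == 0 else 0.0   # a_{2n} = 1, a_{2n+1} = 0
convergent = lambda n: 1.0 + 1.0 / n                  # a_n -> 1 (assumed example)

print(cesaro_means(alternating, 10_000)[-1])   # average of the alternating sequence
print(cesaro_means(convergent, 10_000)[-1])    # average of the convergent sequence
```

The first average settles at $\frac{1}{2}$ although the sequence itself keeps oscillating, while the second approaches the ordinary limit, only more slowly than the sequence does.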
The continuous case distinguishes between $\lim_{t \to \infty} g(t)$, which is called the pointwise limit (for sufficiently large $t$, all values $g(t)$ will be arbitrarily close to that limit), and the limit $\lim_{t \to \infty} \frac{1}{t} \int_0^t g(u)\, du$, which is called the time average$^2$ of $g$.
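Lemma 6.1.2 If the pointwise limit $\lim_{t \to \infty} g(t) = g_\infty$ exists, then the time average
$$\lim_{t \to \infty} \frac{1}{t} \int_0^t g(u)\, du = g_\infty$$
Proof: The proof is analogous to that of Lemma 6.1.1 in the discrete case since $\lim_{t \to \infty} g(t) = g_\infty$ means that, for an arbitrary $\varepsilon > 0$, there exists a finite number $T$ such that, for all $t > T$, $|g(t) - g_\infty| < \varepsilon$. For any $t > T$,
$$\frac{1}{t} \int_0^t g(u)\, du - g_\infty = \frac{1}{t} \int_0^T (g(u) - g_\infty)\, du + \frac{1}{t} \int_T^t (g(u) - g_\infty)\, du$$
and, with the constant $c = \left| \int_0^T (g(u) - g_\infty)\, du \right|$,
$$\left| \frac{1}{t} \int_0^t g(u)\, du - g_\infty \right| \le \left| \frac{1}{t} \int_0^T (g(u) - g_\infty)\, du \right| + \frac{1}{t} \int_T^t |g(u) - g_\infty|\, du < \frac{c}{t} + \frac{t - T}{t}\, \varepsilon$$
Since $c$ is a constant, the lemma follows by letting $t \to \infty$.

2 In summability theory, it is also known as the Cesàro limit of $g$.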

Both in Markov theory (Section 9.3.2) and in Little's Law (Section 13.6) these Lemmas will be used.

6.1.2 Convergence of a sequence of random variables


A sequence $\{X_k\}_{k \ge 0}$ of random variables may converge to a random variable $X$ in several ways. If
$$\Pr\left[ \lim_{k \to \infty} |X_k - X| = 0 \right] = 1$$
then the sequence $\{X_k\}_{k \ge 0}$ converges to $X$ with probability 1 (w.p. 1) or almost surely (a.s.). This mode of convergence is denoted by $X_k \to X$ w.p. 1 or a.s. as $k \to \infty$. If, for any $\varepsilon > 0$,
$$\lim_{k \to \infty} \Pr\left[ |X_k - X| > \varepsilon \right] = 0$$
then it is said that the sequence $\{X_k\}_{k \ge 0}$ converges in probability or in measure to $X$. This mode of convergence is denoted by $X_k \xrightarrow{p} X$ as $k \to \infty$. Convergence in probability is a weaker notion of convergence than almost sure convergence. Almost sure convergence implies convergence in probability, whereas convergence in probability only implies that there exists a subsequence of $\{X_k\}_{k \ge 0}$ that converges almost surely. An equivalent criterion for almost sure convergence is that, for any $\varepsilon > 0$,
$$\Pr\left[ |X_k - X| > \varepsilon \text{ i.o.} \right] = 0$$
where i.o. stands for infinitely often, thus for an infinite number of $k$.
If, for all $x$ with the possible exception of a set of measure zero where $F_X(x)$ is discontinuous, the distributions satisfy
$$\lim_{k \to \infty} F_{X_k}(x) = F_X(x)$$
then the sequence $\{X_k\}_{k \ge 0}$ converges in distribution to $X$, denoted as $X_k \xrightarrow{d} X$ as $k \to \infty$ or, sometimes, in mixed form as $X_k \xrightarrow{d} F_X$ as $k \to \infty$. If, for $1 \le q$,
$$\lim_{k \to \infty} E\left[ |X_k - X|^q \right] = 0$$
then the sequence $\{X_k\}_{k \ge 0}$ converges to $X$ in $L^q$, the space of all functions $f$ for which $\int_{-\infty}^{\infty} |f(x)|^q\, dx < \infty$. The most common values of $q$ are 1, 2 and $q = \infty$. This convergence is also called convergence in norm (see Appendix A.3). The Markov inequality (5.9)
$$\Pr\left[ |X_k - X| \ge \varepsilon \right] \le \frac{E\left[ |X_k - X| \right]}{\varepsilon}$$
shows that convergence in mean ($q = 1$) implies convergence in probability.
In general, it is fair to say that the convergence of sequences belongs to the most complicated topics in both analysis and probability theory. In many limit theorems, for example, the Law of Large Numbers in Section 6.2 and Little's Law in Section 13.6, the art consists in proving the theorem with the least possible number of assumptions or in its most widely applicable form.

6.1.3 List of general theorems


Theorem 6.1.3 (Continuity Theorem) Let $\{F_n\}_{n \ge 1}$ be a sequence of distribution functions with corresponding probability generating functions $\{\varphi_n\}_{n \ge 1}$. If $\lim_{n \to \infty} \varphi_n(z) = \varphi(z)$ exists for all $z$ and if, in addition, $\varphi$ is continuous at $z = 0$, then there exists a limiting distribution function $F$ with generating function $\varphi$ for which $F_n \xrightarrow{d} F$.
Proof: See e.g. Berger (1993, p. 51).

Theorem 6.1.4 (Dominated Convergence Theorem) Let $\{f_n\}_{n \ge 1}$ and $f$ be real functions and suppose that for each $x$
$$\lim_{n \to \infty} f_n(x) = f(x)$$
If there exists a real function $g(x)$ such that $|f_n(x)| < g(x)$ and for which the random variable $g(X)$ has finite expectation, then
$$\lim_{n \to \infty} E\left[ f_n(X) \right] = E\left[ f(X) \right]$$
Proof: See e.g. Royden (1988, Chapter 4).


6.2 Law of Large Numbers


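Theorem 6.2.1 (Weak Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If the expectation $\mu = E[X]$ exists, then, for any $\varepsilon > 0$,
$$\lim_{n \to \infty} \Pr\left[ \left| \frac{S_n}{n} - \mu \right| \ge \varepsilon \right] = 0 \tag{6.2}$$
Proof$^3$: Replacing $X_k$ by $X_k - \mu$ demonstrates that, without loss of generality, we may assume that $\mu = 0$. Denote $U_n = \frac{S_n}{n}$, then $\varphi_{U_n}(z) = E\left[ e^{-z U_n} \right] = E\left[ e^{-z S_n / n} \right]$. Since the set $\{X_k\}$ is independent with common distribution, applying relation (2.66) yields
$$\varphi_{U_n}(z) = \left( \varphi_X\left( \frac{z}{n} \right) \right)^n$$
Since the expectation exists ($\mu = 0$), the Taylor expansion (2.40) of $\varphi_X$ around $z = 0$ is $\varphi_X(z) = 1 + o(z)$ and $\varphi_{U_n}(z) = \left( 1 + o\left( \frac{z}{n} \right) \right)^n$. Taking the logarithm, $\log(\varphi_{U_n}(z)) = n \log\left( 1 + o\left( \frac{z}{n} \right) \right) = n\, o\left( \frac{z}{n} \right) = o(z)$ for large $n$ such that $\lim_{n \to \infty} \varphi_{U_n}(z) = 1$. By the Continuity Theorem 6.1.3, $\varphi_U(z) = E\left[ e^{-z U} \right] = 1$ which implies that $U_n \xrightarrow{d} 0$. Hence, the sequence $\frac{S_n}{n}$ converges in distribution to $\mu = 0$, which is equivalent to (6.2).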

The Weak Law of Large Numbers is a general result on the behavior of the sample mean $\frac{S_n}{n}$ of independent random variables with the same existing expectation $\mu$. It is weak in the sense that only convergence in probability is established. For large $n$, the Weak Law of Large Numbers states that the sample mean $\frac{S_n}{n}$ will be close (less than an arbitrary $\varepsilon$) to the expectation with high probability. It does not imply that $\left| \frac{S_n}{n} - \mu \right|$ remains small for all large $n$. In fact, large fluctuations in $\left| \frac{S_n}{n} - \mu \right|$ can happen; the Weak Law of Large Numbers only concludes that large values of $\left| \frac{S_n}{n} - \mu \right|$ occur with (very) small probability. For example, in a coin-tossing experiment with a fair coin such that $\Pr[X_k = 1] = \Pr[X_k = 0] = \mu = \frac{1}{2}$ in $n$ trials, the sequence of always head $\{X_k = 1\}_{1 \le k \le n}$ is possible with probability $2^{-n}$ and $\frac{S_n}{n} = 1 > \mu$. Only in the limit $n \to \infty$ does the probability of this always head sequence vanish ($\lim_{n \to \infty} 2^{-n} = 0$). For all finite $n$ there is a non-zero probability of having a large deviation from the mean.
If we assume, in addition to the existence of the expectation, that also the variance Var$[X]$ exists, the Weak Law follows from the Chebyshev inequality (5.10). This exemplifies the increasing complexity if fewer restrictions in the theorems are assumed. Indeed, using (2.57) for independent random variables, $\mathrm{Var}\left[ \frac{S_n}{n} \right] = \frac{\mathrm{Var}[X]}{n}$ and the Chebyshev inequality (5.10) gives
$$\Pr\left[ \left| \frac{S_n}{n} - \mu \right| \ge \varepsilon \right] \le \frac{\mathrm{Var}[X]}{n \varepsilon^2}$$
which tends to zero for any fixed $\varepsilon$ and finite Var$[X]$. In fact, with the additional assumption of a finite variance Var$[X]$, a much more precise result can be proved, known as the Central Limit Theorem (Section 6.3). We remark that the Weak Law of Large Numbers also holds in the case Var$[X]$ does not exist.

3 An alternative proof is given in Feller (1970, pp. 247-248).
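The coin-tossing example and the Chebyshev bound can be compared directly. The sketch below (function names are illustrative) computes the exact probability $\Pr\left[ \left| \frac{S_n}{n} - \frac{1}{2} \right| \ge \varepsilon \right]$ for a fair coin from the binomial distribution and sets it against $\frac{\mathrm{Var}[X]}{n \varepsilon^2}$ with Var$[X] = \frac{1}{4}$. Exact rational arithmetic is used so that boundary cases such as $\frac{k}{n} - \frac{1}{2} = \varepsilon$ are not lost to floating-point rounding.

```python
import math
from fractions import Fraction

def prob_deviation(n, eps):
    """Exact Pr[|S_n/n - 1/2| >= eps] for n tosses of a fair coin."""
    favourable = sum(math.comb(n, k) for k in range(n + 1)
                     if abs(Fraction(k, n) - Fraction(1, 2)) >= eps)
    return favourable / 2 ** n

def chebyshev_bound(n, eps):
    """Var[X] / (n eps^2) with Var[X] = 1/4 for a fair coin."""
    return float(Fraction(1, 4) / (n * eps ** 2))

eps = Fraction(1, 10)
for n in (10, 100, 1000):
    # exact probability, Chebyshev bound, probability of the always head sequence
    print(n, prob_deviation(n, eps), chebyshev_bound(n, eps), 2.0 ** -n)
```

The bound is valid but loose: the exact probability decays much faster in $n$ than $\frac{1}{n}$, which is the kind of exponential decay that the large deviations estimates of Section 5.7 describe.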
Theorem 6.2.2 (Strong Law of Large Numbers) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If the expectation $\mu = E[X]$ and variance Var$[X]$ exist, then,
$$\Pr\left[ \lim_{n \to \infty} \frac{S_n}{n} = \mu \right] = 1 \tag{6.3}$$
Proof: See e.g. Feller (1970, pp. 259-261), Berger (1993, pp. 46-48) or Wolff (1989, pp. 40-41). Their proof is based on the Kolmogorov criterion: the convergence of $\sum_{k=1}^{\infty} \frac{\mathrm{Var}[X_k]}{k^2}$ is a sufficient condition for the Strong Law of Large Numbers for independent random variables $X_k$ with mean $E[X_k]$ and variance Var$[X_k]$. If the existence of $E\left[ X^4 \right]$ is assumed, Ross (1996, pp. 56-58) and Billingsley (1995, p. 85) provide a different proof. Wolff (1989, pp. 41-42) remarks that both the Weak and Strong Laws hold under much weaker conditions: it is only needed that the $X_k$ are not correlated. In other words, $\frac{S_n}{n} \to \mu$ w.p. 1 implies $E[X_k] = \mu$ even if Var$[X_k] = \infty$.

The Strong Law of Large Numbers roughly states that $\left| \frac{S_n}{n} - \mu \right|$ remains small for sufficiently large $n$ with overwhelming probability. The importance of the Law of Large Numbers is that it provides the mathematical foundation for the intuition that the sample mean is the best estimator.
Theorem 6.2.3 (Law of the Iterated Logarithm) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with $\mu = E[X]$. If $\sigma^2 = \mathrm{Var}[X]$ exists, then,
$$\Pr\left[ \limsup_{n \to \infty} \frac{S_n - n\mu}{\sigma \sqrt{2 n \log \log n}} = 1 \right] = 1 \tag{6.4}$$
Proof: See e.g. Billingsley (1995, pp. 154-156) or Feller (1970, Section VIII.5)$^4$.
In addition to the Weak and Strong Laws of Large Numbers, the Law of the Iterated Logarithm provides information about large values of $\left| \frac{S_n}{n} - \mu \right|$. Specifically, it states that the bound $\left| \frac{S_n}{n} - \mu \right| \le \sigma \sqrt{\frac{2 \log \log n}{n}}$ holds almost surely. The latter means that it is satisfied infinitely often and that only for a finite number of values of $n$ the converse $\left| \frac{S_n}{n} - \mu \right| > \sigma \sqrt{\frac{2 \log \log n}{n}}$ may occur.


q

6.3 Central Limit Theorem


Theorem 6.3.1 (Central Limit Theorem) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with finite $\mu = E[X]$ and $\sigma^2 = \mathrm{Var}[X]$. Then $\frac{S_n - n\mu}{\sigma \sqrt{n}} \xrightarrow{d} N(0, 1)$ or, explicitly,
$$\Pr\left[ \frac{S_n - n\mu}{\sigma \sqrt{n}} \le x \right] \to \Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\, dt$$
Proof: Without loss of generality, we may confine ourselves to normalized random variables (replace $X_k$ by $\frac{X_k - \mu}{\sigma}$) such that $\mu = 0$ and $\sigma = 1$. Consider the scaled random variable $U_n = \alpha_n S_n$, where $\alpha_n$ is a real number depending on $n$ and to be determined later. Similarly as in the proof of the Weak Law of Large Numbers, we find that $\varphi_{U_n}(z) = \left( \varphi_X(\alpha_n z) \right)^n$. Due to the existence of the variance, the Taylor expansion (2.40) of $\varphi_X$ around $z = 0$ is known with higher precision as $\varphi_X(z) = 1 + \frac{z^2}{2} + o(z^2)$. For sufficiently small $z$, the logarithm
$$\log(\varphi_{U_n}(z)) = n \log\left( 1 + \frac{\alpha_n^2 z^2}{2} + o(\alpha_n^2 z^2) \right) = n \alpha_n^2 \frac{z^2}{2} + o(n \alpha_n^2 z^2)$$
only converges to a finite (non-zero) number if $\alpha_n = O\left( \frac{1}{\sqrt{n}} \right)$. Choosing the simplest function that satisfies this condition, $\alpha_n = \frac{1}{\sqrt{n}}$, leads to $\lim_{n \to \infty} \log(\varphi_{U_n}(z)) = \frac{z^2}{2}$ or, since the logarithm is a continuous, increasing function, $\lim_{n \to \infty} \varphi_{U_n}(z) = \exp\left( \frac{z^2}{2} \right)$. The transform (3.22) shows that the corresponding limit random variable is a Gaussian $N(0, 1)$. The theorem then follows by virtue of the Continuity Theorem 6.1.3.

4 Feller also mentions sharper bounds.

An alternative formulation of the Central Limit Theorem is that the $k$-fold convolution of any probability density function converges to a Gaussian probability distribution, $f_X^{(k*)}(x) \to \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu)^2}{2 \sigma^2} \right]$ with $\mu = k E[X]$ and $\sigma^2 = k \mathrm{Var}[X]$. Both the Law of Large Numbers and the Central Limit Theorem can be shown to be valid for a surprisingly large class of sequences $\{X_k\}$ where each random variable may have a different distribution. The conditions for the extension of the Central Limit Theorem are summarized in the Lindeberg conditions (Feller, 1971, p. 263). An example where the sum of independent random variables tends to a different limit distribution than the Gaussian appears in Section 16.5.1.
If higher moments are known, the convergence to the Gaussian distribution can be bounded. Feller (1971, Chapter XVI) devotes a chapter to expansions related to the Central Limit Theorem, culminating in the Berry-Esseen Theorem.
Theorem 6.3.2 (Berry-Esseen Theorem) Let $\{X_k\}$ be a sequence of independent random variables each with distribution identical to that of the random variable $X$ with finite $\mu = E[X]$, $\sigma^2 = \mathrm{Var}[X]$ and $\rho = E\left[ \frac{|X - \mu|^3}{\sigma^3} \right]$. Then, with $C = 3$,
$$\sup_x \left| \Pr\left[ \frac{S_n - n\mu}{\sigma \sqrt{n}} \le x \right] - \Phi(x) \right| \le \frac{C \rho}{\sqrt{n}} \tag{6.5}$$
Proof: See e.g. Feller (1971, Section XVI.5). The constant $C$ can be slightly improved to $C \le 2.05$.

As an example of the rate of convergence towards the Gaussian distribution, the $k$-fold convolutions of the uniform density given by (3.30) are plotted in Fig. 6.1 together with the Gaussian approximation (3.19).
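The bound (6.5) can also be examined on a case where the distribution of $S_n$ is known exactly. The sketch below assumes Bernoulli summands with $\Pr[X = 1] = p$, so that $S_n$ is binomial, and takes $\rho = E\left[ |X - \mu|^3 \right] / \sigma^3$ and $C = 3$ as in the theorem. Function names are hypothetical. Since the distribution of $S_n$ is a step function and $\Phi$ is increasing, the supremum is attained at a jump, either just before or at the jump.

```python
import math

def std_normal_cdf(x):
    """Phi(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binomial_pmf(n, k, p):
    """Pr[S_n = k], evaluated through logarithms to stay in floating-point range."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def sup_distance(n, p):
    """sup_x |Pr[(S_n - n mu)/(sigma sqrt(n)) <= x] - Phi(x)| for Bernoulli(p) terms."""
    sigma = math.sqrt(p * (1.0 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        phi = std_normal_cdf((k - n * p) / (sigma * math.sqrt(n)))
        worst = max(worst, abs(cdf - phi))   # left limit at the jump
        cdf += binomial_pmf(n, k, p)
        worst = max(worst, abs(cdf - phi))   # value at the jump
    return worst

def berry_esseen_bound(n, p, C=3.0):
    """C rho / sqrt(n) with rho = E|X - p|^3 / sigma^3."""
    sigma = math.sqrt(p * (1.0 - p))
    rho = (p * (1.0 - p) ** 3 + (1.0 - p) * p ** 3) / sigma ** 3
    return C * rho / math.sqrt(n)

for n in (10, 100, 1000):
    print(n, sup_distance(n, 0.5), berry_esseen_bound(n, 0.5))
```

Both columns shrink like $\frac{1}{\sqrt{n}}$, which shows that the rate in (6.5) cannot be improved in general, only the constant.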

6.4 Extremal distributions


6.4.1 Scaling laws
[Figure 6.1: Pdf of $k$-convolved uniform random variables; exact convolution and Gaussian approximation for $k = 2, 3, 4, 8, 16$.]
Fig. 6.1. Both the exact $f_U^{(k*)}(x)$ with $f_U(x) = 1_{0 \le x \le 1}$ and the Gaussian approximation for several values of $k$.

In this section, limit properties of the maximum and minimum of a set $\{X_k\}$ of independent random variables are discussed. For simplicity, we assume that all random variables $X_k$ have identical distribution $F(x) = \Pr[X \le x]$ such that (3.33) and (3.32) simplify to
$$\Pr\left[ \max_{1 \le k \le m} X_k \le x \right] = F^m(x)$$
$$\Pr\left[ \min_{1 \le k \le m} X_k > x \right] = (1 - F(x))^m$$
Consider the limit process when $m \to \infty$. Let $\{x_m\}$ be a sequence of real numbers. Then, confining ourselves to the maximum first,
$$\log \Pr\left[ \max_{1 \le k \le m} X_k \le x_m \right] = m \log F(x_m)$$
Since $0 \le F(x_m) \le 1$ and since the logarithm has a Taylor expansion $\log(1 - x) = -\sum_{k=1}^{\infty} \frac{x^k}{k}$ around $x = 0$, convergent for $|x| < 1$, we rewrite the right-hand side as $\log F(x_m) = \log\left[ 1 - (1 - F(x_m)) \right]$ and, after expansion,
$$\log \Pr\left[ \max_{1 \le k \le m} X_k \le x_m \right] = -m (1 - F(x_m)) + o\left[ m (1 - F(x_m)) \right]$$
If $\lim_{m \to \infty} m (1 - F(x_m)) = \xi$, we arrive at
$$\lim_{m \to \infty} \Pr\left[ \max_{1 \le k \le m} X_k \le x_m \right] = e^{-\xi} \tag{6.6}$$
Hence, by choosing an appropriate sequence $\{x_m\}$ such that $\xi$ is finite (and preferably non-zero), a scaling law for the maximum of a sequence can be obtained and, similarly, for the minimum, if $\lim_{m \to \infty} m F(x_m) = \xi$,
$$\lim_{m \to \infty} \Pr\left[ \min_{1 \le k \le m} X_k > x_m \right] = e^{-\xi} \tag{6.7}$$
The distributions of $\lim_{m \to \infty} \min_{1 \le k \le m} X_k$ and $\lim_{m \to \infty} \max_{1 \le k \le m} X_k$ are called extremal distributions.
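The scaling law (6.6) can be checked against exact finite-$m$ probabilities. The sketch below is an illustration under two assumed choices, which are also the first two examples of Section 6.4.3: exponential random variables with a hypothetical rate parameter `rate` and threshold $x_m = (\log m + x)/\text{rate}$, and uniform random variables on $[0, 1]$ with $x_m = 1 - \frac{x}{m}$. In both cases $F^m(x_m)$ is available in closed form, so no simulation is needed.

```python
import math

def max_exponential_cdf(m, x, rate=1.0):
    """Exact Pr[max X_k <= (log m + x)/rate] = F(x_m)^m, assuming log m + x > 0."""
    x_m = (math.log(m) + x) / rate
    return (1.0 - math.exp(-rate * x_m)) ** m

def max_uniform_cdf(m, x):
    """Exact Pr[max U_k <= 1 - x/m] for m uniforms on [0, 1], assuming 0 <= x <= m."""
    return (1.0 - x / m) ** m

def gumbel_cdf(x):
    """Limit law exp(-exp(-x)) obtained from (6.6) with xi = exp(-x)."""
    return math.exp(-math.exp(-x))

for m in (10, 1000, 10 ** 6):
    print(m,
          max_exponential_cdf(m, 0.5), gumbel_cdf(0.5),   # exponential maximum
          max_uniform_cdf(m, 2.0), math.exp(-2.0))        # uniform maximum
```

In both cases the exact probability approaches the limit with an error of order $\frac{1}{m}$, which follows from $\left( 1 - \frac{\xi}{m} \right)^m = e^{-\xi} \left( 1 + O\left( \frac{1}{m} \right) \right)$.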
[Figure 6.2: probability density functions of the Gumbel distribution and of the Fréchet and Weibull distributions for $\alpha = 0.5$, $\alpha = 1$ and $\alpha = 2$.]
Fig. 6.2. The probability density function of the three types of extremal distributions.

6.4.2 The Law of Extremal Types


Two distribution functions $F$ and $G$ are said to be of the same type if there exist constants $a > 0$ and $b$ for which $F(ax + b) = G(x)$ for all $x$.

Theorem 6.4.1 (Law of Extremal Types) Any extremal distribution of a sequence of i.i.d. random variables can only have one of the three types
1. Gumbel $F(x) = e^{-e^{-x}}$
2. Fréchet $F(x) = e^{-x^{-\alpha}}\, 1_{x \ge 0}$
3. Weibull $F(x) = e^{-(-x)^{\alpha}}\, 1_{x < 0} + 1_{x \ge 0}$
where $\alpha > 0$.
Proof: See e.g. Berger (1993, pp. 65-69).

The generality of this theorem is appealing: any maximum or minimum of a set of i.i.d. random variables has (apart from the scaling constants $a$ and $b$) one of the above three types. The corresponding probability density functions are plotted in Fig. 6.2.
6.4.3 Examples
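1. Consider the set $\{X_k\}$ of exponentially distributed random variables with $F(x) = 1 - e^{-\alpha x}$. The condition for the maximum is $m e^{-\alpha x_m} \to \xi$ or, equivalently, $x_m = \frac{1}{\alpha} (\log m - \log \xi)$ and (6.6) becomes, after putting $x = -\log \xi$,
$$\lim_{m \to \infty} \Pr\left[ \max_{1 \le k \le m} X_k \le \frac{1}{\alpha} (\log m + x) \right] = e^{-e^{-x}}$$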

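For the minimum, the condition $m F(x_m) = m \left( 1 - e^{-\alpha x_m} \right) \to \xi$ of (6.7) is, for large $m$, equivalent to $\alpha x_m \approx \frac{\xi}{m}$ or $x_m = \frac{\xi}{\alpha m}$. Hence, after putting $x = \xi$, the limit law for the minimum of exponential random variables is
$$\lim_{m \to \infty} \Pr\left[ \min_{1 \le k \le m} X_k > \frac{x}{\alpha m} \right] = e^{-x}$$
which even holds for every finite $m$ because $\Pr\left[ \min_{1 \le k \le m} X_k > y \right] = e^{-\alpha m y}$. For the maximum of exponential random variables, a scaling law thus exists that leads to a Gumbel distribution $F_{\mathrm{Gumbel}}(x) = e^{-e^{-x}}$: for large $m$, the random variable $M = \alpha \max_{1 \le k \le m} X_k - \log m$ has the Gumbel distribution. The scaled minimum $Q = \alpha m \min_{1 \le k \le m} X_k$, in contrast, is exponentially distributed with mean 1, which corresponds to the Weibull type with exponent 1 applied to $-Q$.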
2. Another example is the maximum of a set of i.i.d. uniform random variables $\{U_k\}$ in $[0, 1]$ with $F(x) = x$ for $0 \le x \le 1$. Since $m (1 - x_m) \to \xi$ or, equivalently, $x_m \to 1 - \frac{\xi}{m}$ with $0 \le x_m \le 1$ we have, after putting $x = \xi$ with $x \ge 0$,
$$\lim_{m \to \infty} \Pr\left[ \max_{1 \le k \le m} U_k \le 1 - \frac{x}{m} \right] = e^{-x}$$
3. Consider a rectangular lattice with size $z_1$ and $z_2$ and with independent and identical, uniformly distributed link weights on $(0, 1]$ between each lattice point. The number of lattice points (nodes) equals $N = (z_1 + 1)(z_2 + 1)$ and the number of links is $L = 2 z_1 z_2 + (z_1 + z_2)$. The shortest hop path between two diagonal corner points consists of $h = z_1 + z_2$ hops. The weight $W_h$ of such an $h$ hop path is the sum of $h$ independent uniform random variables with distribution specified in (3.29),
$$F(x) = \Pr[W_h \le x] = \frac{1}{h!} \sum_{j=0}^{h} \binom{h}{j} (-1)^j (x - j)^h\, 1_{j \le x}$$
In particular, $\Pr[W_h \le h] = 1$ and for small $x < 1$ it holds that $F(x) = \frac{x^h}{h!}$. The precise computation of the minimum weight of an $h$ hop path in a lattice is difficult due to dependence among those $h$ hop paths and we content ourselves here with an approximate estimate. If we neglect the dependence of the $h$ hop paths due to possible overlap, then the minimum weight among all $h$ hop paths can be approximated by (6.7) because the number$^5$ $m = \binom{z_1 + z_2}{z_1} = \frac{h!}{z_1!\, z_2!}$ of those $h$ hop paths is large. The limit sequence must obey $m F(x_m) \to \xi$ for sufficiently large $m$, which implies that $F(x_m)$ must be small or, equivalently, $x_m$ must be small. Hence, $m \frac{x_m^h}{h!} = \xi$ or $x_m = \left( \frac{h!\, \xi}{m} \right)^{\frac{1}{h}}$. The limit law (6.7) for the minimum weight $W = \min_{1 \le k \le m} W_{h,k}$ of the shortest hop path in a rectangular lattice is
$$\lim_{m \to \infty} \Pr\left[ \min_{1 \le k \le m} W_{h,k} > \left( \frac{h!\, x}{m} \right)^{\frac{1}{h}} \right] = e^{-x}$$
In other words, the random variable $\frac{m W^h}{h!}$ tends to an exponential random variable with mean 1 for large $m = \frac{h!}{z_1!\, z_2!}$ or
$$\Pr[W \le y] \approx 1 - \exp\left( -m \frac{y^h}{h!} \right)$$

5 Any path in a rectangular lattice can be represented by a sequence of r(ight), l(eft), u(p) and d(own), which is called an encoded path word. The encoded path word of the shortest hop path between diagonal corner points consists of $z_1$ r's (or l's) and $z_2$ d's (or u's). The total number of these paths equals $\binom{z_1 + z_2}{z_1}$. Two paths coincide in a same lattice point at $g \le h$ hops from the source node if their encoded path word has the same sum of r's and d's in the first $g$ letters. The number of overlapping links between two paths equals the number of the same consecutive letters (r or d) in a block after a same sum of r's and d's in the encoded path words. Checking for overlap between $h$ hop paths requires a comparison of $\binom{z_1 + z_2}{z_1}!$ permutations in the encoded path words.

From (2.35), the mean shortest weight of an $h$ hop path equals
$$E[W] = \int_0^{\infty} (1 - F_W(x))\, dx \approx \int_0^{\infty} \exp\left( -m \frac{x^h}{h!} \right) dx = \Gamma\left( 1 + \frac{1}{h} \right) (z_1!\, z_2!)^{\frac{1}{h}}$$
For a square lattice where $z_1 = z_2 = \frac{h}{2}$, we have
$$E[W] = \Gamma\left( 1 + \frac{1}{h} \right) \left( \frac{h}{2}! \right)^{\frac{2}{h}}$$
Using Stirling's formula (Abramowitz and Stegun, 1968, Section 6.1.38) for the factorial, $h! = \sqrt{2\pi}\, h^{h + \frac{1}{2}} e^{-h + \frac{\theta}{12 h}}$ where $0 < \theta < 1$, the mean $E[W]$ increases, for large $h$, about linearly in the number of hops $h$,
$$E[W] \simeq \frac{h}{2e} \left( \sqrt{\pi h}\; e^{\frac{\theta}{6 h}} \right)^{\frac{2}{h}} \approx \frac{h}{2e}$$
The average weight of a link (or 1 hop) of the shortest $h$ hop path is roughly $\frac{1}{2e} \approx 0.184$.
In spite of the fact that path dependence (overlap) has been ignored in the computation of the minimum weight, $E[W] = O(h) = O\left( \sqrt{N} \right)$ is correct. However, the approximate analysis gives neither the correct prefactor in $E[W]$ nor the correct limit pdf, which turns out to be Gaussian. Hence, if random variables are not independent, Theorem 6.4.1 does not apply. Finally, a shortest $h$ hop path is not necessarily the overall shortest path because it is possible, though with small probability, that the overall shortest path has $h + 2j$ hops with $j > 0$.
4. The probability density function of the longest shortest path. The most commonly used process that informs each node about changes in a network topology (e.g. an autonomous domain) is called flooding: every router forwards the packet on all interfaces except for the incoming one and duplicate packets are discarded. Flooding is particularly simple and robust since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overhead in routers is ignored). Therefore, the interesting problem lies in the determination of the flooding time $T_N$, which is the minimum time needed to inform the last node in a network with $N$ nodes. Only after $T_N$, all topology databases at each router in the network are again synchronized, i.e. all routers possess the same topology information. Rather than investigating the flooding time $T_N$ (for which we refer to Section 16.5), the largest number of traversed routers (hops) or the longest shortest path from the emitting node to the furthermost node in its shortest path tree is computed.
The number of hops, in short the hopcount $H_N$, along the shortest path between two arbitrary nodes in a network containing $N$ nodes is modeled subject to the following assumptions: (a) the hopcount $H_N$ is a Poisson random variable with mean $E[H_N] = \lambda = \alpha \log N$ with $\alpha > 0$, which is motivated in Section 16.3.1; (b) the number of nodes $N$ is very large$^6$; (c) all shortest paths from the emitting node towards any other node in the network are independent. The problem reduces to computing the pdf of the random variable $\max_{1 \le k \le N-1} H_k$. The distribution function follows from (3.9) as
$$F_{H_N}(x) = \sum_{k=0}^{x} \frac{\lambda^k e^{-\lambda}}{k!} = 1 - \sum_{k=x+1}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!}$$

6 The size of the Internet is currently estimated at about $N \approx 10^5$.

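The condition $\lim_{m \to \infty} m (1 - F(x_m)) = \xi$ becomes
$$\lim_{N \to \infty} N^{1 - \alpha} \sum_{k = x_N + 1}^{\infty} \frac{\lambda^k}{k!} = \xi$$
from which we must choose the appropriate $x_N$ as function of $N$. Observe that the maximum term in the series has index $k = [\lambda]$, where the latter denotes the largest integer smaller than or equal to $\lambda$. Indeed, the ratio between two consecutive (positive) terms in the $k$-sum equals $\frac{a_k}{a_{k-1}} = \frac{\lambda}{k}$ such that, if $\lambda > k$, then $a_k > a_{k-1}$, implying that the terms increase, while, if $\lambda < k$, the terms $a_k < a_{k-1}$ form a decreasing sequence. The series is rewritten as
$$\sum_{k = x_N + 1}^{\infty} \frac{\lambda^k}{k!} = \frac{\lambda^{x_N + 1}}{(x_N + 1)!} \sum_{k=0}^{\infty} \frac{(x_N + 1)!\, \lambda^k}{(x_N + 1 + k)!} = \frac{\lambda^{x_N + 1}}{(x_N + 1)!} \left[ 1 + \frac{\lambda}{x_N + 2} + \frac{\lambda^2}{(x_N + 2)(x_N + 3)} + \cdots \right]$$
We choose $x_N = [\lambda] + [\beta \lambda] \approx (1 + \beta) \lambda$ for large $N$ and, thus, large $\lambda$, where $\beta$ must be related to $\xi$. The series then consists of decreasing terms. Moreover, for large $\lambda$,
$$\sum_{k = [\lambda] + [\beta\lambda] + 1}^{\infty} \frac{\lambda^k}{k!} = \frac{\lambda^{(1+\beta)\lambda + 1}}{((1+\beta)\lambda + 1)!} \left[ 1 + \frac{1}{(1+\beta) + 2/\lambda} + \frac{1}{((1+\beta) + 2/\lambda)((1+\beta) + 3/\lambda)} + \cdots \right]$$
$$\le \frac{\lambda^{(1+\beta)\lambda + 1}}{((1+\beta)\lambda + 1)!} \left[ 1 + \frac{1}{1+\beta} + \frac{1}{(1+\beta)^2} + \cdots \right]$$
and thus,
$$\sum_{k = [\lambda] + [\beta\lambda] + 1}^{\infty} \frac{\lambda^k}{k!} = \frac{\lambda^{(1+\beta)\lambda + 1}}{((1+\beta)\lambda + 1)!}\, \frac{1+\beta}{\beta} \left( 1 + O\left( \frac{1}{\lambda} \right) \right)$$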

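Using Stirling's formula (Abramowitz and Stegun, 1968, Section 6.1.38), $x! \approx \sqrt{2\pi}\, x^{x + \frac{1}{2}} e^{-x}$, for large $x$ yields
$$\frac{\lambda^{(1+\beta)\lambda + 1}}{((1+\beta)\lambda + 1)!}\, \frac{1+\beta}{\beta} \approx \frac{1+\beta}{\beta}\, \frac{\lambda^{(1+\beta)\lambda + 1}\, e^{(1+\beta)\lambda}}{((1+\beta)\lambda + 1) \sqrt{2\pi}\, \lambda^{(1+\beta)\lambda + \frac{1}{2}} (1+\beta)^{(1+\beta)\lambda + \frac{1}{2}}} \approx \frac{e^{(1+\beta)\lambda [1 - \log(1+\beta)]}}{\beta \sqrt{2\pi (1+\beta) \lambda}}$$
For large $N$, the condition becomes
$$\xi \approx \frac{N^{\alpha (1+\beta)[1 - \log(1+\beta)] + 1 - \alpha}}{\beta \sqrt{2\pi\alpha (1+\beta) \log N}} \left( 1 + O\left( \frac{1}{\log N} \right) \right)$$
and, after taking the logarithm of both sides,
$$\log \xi \approx \left( \alpha (1+\beta)[1 - \log(1+\beta)] + 1 - \alpha \right) \log N - \frac{1}{2} \log\log N - \frac{1}{2} \log(2\pi\alpha(1+\beta)) - \log\beta + O\left( \frac{1}{\log N} \right)$$
or
$$\log \xi + (\alpha - 1) \log N + \frac{1}{2} \log\log N + O\left( \frac{1}{\log N} \right) \approx \alpha (1+\beta)[1 - \log(1+\beta)] \log N - \frac{1}{2} \log(2\pi\alpha(1+\beta)) - \log\beta \tag{6.8}$$
At this point, we will assume that $\beta < 1$, which justifies the expansion $\log(1+\beta) = \beta + O(\beta^2)$. This assumption will be checked later. Thus,
$$(1+\beta)[1 - \log(1+\beta)] \approx (1+\beta)[1 - \beta] \approx (1 - \beta^2)$$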

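and $\beta$ must be solved from
$$R \approx \alpha (1 - \beta^2) \log N - \frac{\beta}{2} - \log\beta$$
with $R = \log\left( \xi \sqrt{2\pi\alpha} \right) + (\alpha - 1) \log N + \frac{1}{2} \log\log N$. The Newton-Raphson iteration can be applied with starting value $\beta_0$ to find the solution of the equation up to the leading order in $\log N$, i.e. $R \approx \alpha (1 - \beta^2) \log N$. Hence,
$$\beta_0 = \sqrt{1 - \frac{R}{\alpha \log N}} \approx 1 - \frac{R}{2\alpha \log N} \approx 1 - \frac{\alpha - 1}{2\alpha} - \frac{\log\left( \xi \sqrt{2\pi\alpha} \right) + \frac{1}{2} \log\log N}{2\alpha \log N}$$
which demonstrates that, for $\alpha \ge 1$, the assumption $\beta < 1$ is correct for large $N$. The case $\alpha < 1$ requires the application of Newton-Raphson's method on (6.8), which we omit here. The second iteration in Newton-Raphson's method leads to
$$\beta_1 = \beta_0 - \frac{\frac{\beta_0}{2} + \log\beta_0}{2\alpha\beta_0 \log N + \frac{1}{2} + \frac{1}{\beta_0}}$$
and shows that the $k$-th iteration improves the previous one with a quantity of order $O\left( \log^{-k} N \right)$. Since (6.8) is only accurate up to $O\left( \log^{-1} N \right)$, a second iteration is superfluous and we obtain the choice $x_N = (1 + \beta)\lambda$, or
$$x_N = \frac{3\alpha + 1}{2} \log N - \frac{1}{2} \log\left( \xi \sqrt{2\pi\alpha} \right) - \frac{1}{4} \log\log N$$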

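After substituting $x = -\log\xi$, we finally arrive for $\alpha \ge 1$ and large $N$ at
$$\Pr\left[ \max_{1 \le k \le N-1} H_k \le \frac{x}{2} + \frac{3\alpha + 1}{2} \log N - \frac{1}{4} \log\log N - \frac{1}{4} \log(2\pi\alpha) \right] = e^{-e^{-x}}$$
from which the pdf of the hopcount of the longest shortest path (lsp) follows as
$$f_{\mathrm{lsp}}(x) = 2\, e^{-e^{-2(x - c)}}\, e^{-2(x - c)} \tag{6.9}$$
with
$$c = \frac{3\alpha + 1}{2} \log N - \frac{1}{4} \log\log N - \frac{1}{4} \log(2\pi\alpha) = \left( \frac{3}{2} + \frac{1}{2\alpha} \right) E[H_N] - \frac{1}{4} \log E[H_N] - \frac{1}{4} \log(2\pi)$$
and, with Euler's constant $\gamma \approx 0.5772$,
$$E[\mathrm{lsp}] = c + \frac{\gamma}{2} = \left( \frac{3}{2} + \frac{1}{2\alpha} \right) E[H_N] - \frac{1}{4} \log E[H_N] - 0.170$$
$$\mathrm{var}[\mathrm{lsp}] = \frac{\pi^2}{24} \simeq 0.4112$$
Observe that the average longest shortest path is about twice the average hopcount if $\alpha = 1$ while the variance is small, constant and independent of the scaling parameter $c$ or $\alpha$. Figure 6.3 compares the above approximate analysis with simulations.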
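The closed-form mean and variance of (6.9) can be verified by numerical integration. The sketch below is an illustration with hypothetical function names and illustrative values $\alpha = 1$ and $N = 10^5$. It evaluates the location parameter $c$, integrates the pdf (6.9) with a midpoint rule, and compares the resulting moments with $c + \frac{\gamma}{2}$ and $\frac{\pi^2}{24}$.

```python
import math

EULER_GAMMA = 0.5772156649015329   # Euler's constant

def lsp_location(alpha, n_nodes):
    """Location parameter c of (6.9) for E[H_N] = alpha log N."""
    log_n = math.log(n_nodes)
    return ((3 * alpha + 1) / 2 * log_n
            - 0.25 * math.log(log_n)
            - 0.25 * math.log(2 * math.pi * alpha))

def lsp_pdf(x, c):
    """Pdf (6.9): 2 exp(-exp(-2(x - c))) exp(-2(x - c))."""
    u = math.exp(-2.0 * (x - c))
    return 2.0 * u * math.exp(-u)

def numeric_moments(c, step=1e-3):
    """Mass, mean and variance of lsp_pdf by a midpoint rule on [c - 10, c + 30]."""
    mass = mean = second = 0.0
    x = c - 10.0 + step / 2
    while x < c + 30.0:
        weight = lsp_pdf(x, c) * step
        mass += weight
        mean += x * weight
        second += x * x * weight
        x += step
    return mass, mean, second - mean ** 2

alpha, n_nodes = 1.0, 10 ** 5       # illustrative values (assumption)
c = lsp_location(alpha, n_nodes)
print(c, numeric_moments(c))
print(c + EULER_GAMMA / 2, math.pi ** 2 / 24)
```

The integration window is chosen wide enough that the neglected tails are far below the accuracy of the midpoint rule. For other values of $\alpha$ the same functions apply, but the derivation above only covers $\alpha \ge 1$.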

[Figure 6.3: $\Pr[\mathrm{lsp} = k]$ versus the number of hops $k$; theory and simulation.]
Fig. 6.3. The hopcount of the shortest path for $N = 4000$. Both simulations based on an Internet-like topology generator (with unit link weights) and theory $f_{\mathrm{lsp}}(k)$ with $\alpha = 0.4786$ are shown.

Notes
(i) The classical theory of extremes, extremal properties of dependent
sequences and extreme values in continuous-time are treated in detail
in the book by Leadbetter et al. (1983).
(ii) A more recent book by Embrechts et al. (2001a) applies the theory of extremal events to problems in insurance and finance.

Part II
Stochastic processes

7
The Poisson process

The Poisson process is a prominent stochastic process, mainly because it frequently appears in a wealth of physical phenomena and because it is relatively simple to analyze. Therefore, we will first treat the Poisson process before considering the more general Markov processes.
7.1 A stochastic process
7.1.1 Introduction and definitions
A stochastic$^1$ process, formally denoted as $\{X(t), t \in T\}$, is a sequence of random variables $X(t)$, where the parameter $t$ (most often the time) runs over an index set $T$. The state space of the stochastic process is the set of all possible values for the random variables $X(t)$ and each of these possible values is called a state of the process. If the index set $T$ is a countable set, $X[k]$ is a discrete stochastic process. Often $k$ is the discrete time or a time slot in computer systems. If $T$ is a continuum, $X(t)$ is a continuous stochastic process. For example, the outcome of $n$ tosses of a coin is a discrete stochastic process with state space {heads, tails} and the index set $T = \{0, 1, 2, \ldots, n\}$. The number of arrivals of packets in a router during a certain time interval $[a, b]$ is a continuous stochastic process because $t \in [a, b]$. Any realization of a stochastic process is called a sample path. For example, a sample path of the outcome of $n$ tosses of a coin is {heads, tails, tails, ..., heads}, while a sample path of the number of arrivals in $[a, b]$ is $1_{a \le t < a+h},\; 3 \cdot 1_{a+h \le t < a+4h},\; 8 \cdot 1_{a+4h \le t < a+5h},\; \ldots,\; 13 \cdot 1_{a+(k-1)h \le t < b}$, where $h = \frac{b - a}{k}$. Other examples are the measurement of the temperature each day, the recording of the value of a stock each minute or rolling a die and recording its value, which is illustrated in Fig. 7.1.

1 The word stochastic is derived from στοχάζεσθαι in Greek, which means "to aim at, try to hit".
Especially in continuous stochastic processes, it is convenient to define increments as the difference $X(t) - X(u)$. The continuous time stochastic process $X(t)$ has independent increments if changes in the value of the process in different time intervals are independent, or, if for all $t_0 < t_1 < \cdots < t_n$, the random variables $X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent. The continuous (time) stochastic process has stationary increments if $X(t + s) - X(s)$ possesses the same distribution for all $s$. Hence, changes in the value of the process only depend on the distance $t$ between process events, not on the time point $s$.

Fig. 7.1. Two different sample paths of the experiment: roll a die and record the outcome. The total number of different sample paths is $6^T$ where $T$ is the number of times an outcome is recorded. The state space only contains 6 possible outcomes $\{1, 2, 3, 4, 5, 6\}$.

Stochastic processes are distinguished by (a) their state space, (b) the index set $T$ and (c) the dependence relations between the random variables $X(t)$. For example, a standard Brownian motion (or Wiener process)² is defined as a stochastic process $X(t)$ having continuous sample paths and stationary independent increments, for which $X(t)$ has a normal distribution $N(0, t)$. A Poisson process, defined in more detail in Section 7.2, is a stochastic process $X(t)$ having discontinuous sample paths and stationary independent increments, for which $X(t)$ has a Poisson distribution. A generalization of the Poisson process is a counting process. A counting process is defined as a stochastic process $N(t) \ge 0$ with discontinuous sample paths, stationary independent increments, but with arbitrary distribution. A counting process $N(t)$ represents the total number of events that have occurred in a time interval $[0, t]$. Examples of a counting process are the number of telephone calls at a local exchange during an interval, the number of failures in a telecommunication network, the number of corrupted bits after transmission due to channel errors, etc.
2

Harrison (1990) shows that the converse is also true: if $Y$ is a continuous process with stationary independent increments, then $Y$ is a Brownian motion.

7.1 A stochastic process


7.1.2 Modeling a stochastic process from measurements


In practice, understanding observed phenomena often asks for a stochastic model that captures the main characteristics of the studied phenomena and that enables computations of diverse quantities of interest. Examples in the field of data communications networks are the determination of the arrival process at a switch or router in order to dimension the number of buffer (memory) places, the modeling of the graph of the Internet, the distribution of the duration of a telephone call or web browsing session, the number of visits to certain websites, the number of links that refer to a web page, the amount of downloaded information, the number of routers traversed by an email, etc. Accurate modeling is in general difficult and often trades off complexity against accuracy of the model.
Let us illustrate some aspects of modeling by considering Internet delay measurements. A motivation for obtaining an end-to-end delay model for (a part of) the Internet is the question whether massive service deployment of voice over IP (VoIP) can substitute classical telephony with a comparable quality. Specifically, classical telephony requires that the end-to-end delay of an arbitrary telephone conversation hardly exceeds 100 ms.

Fig. 7.2. The raw data of the end-to-end delay of IP test packets along a same path of 13 hops in the Internet measured during 3.5 hours. [Plot: end-to-end delay D in ms (axis from 35 to 50 ms) versus time from 5:00 a.m. until 8:30 a.m., measured on 21/11/2002. Legend: interdeparture time = 12 s; hopcount IP path = 13; # measurement points = 1006; E[D] = 35.03 ms; V[D] = 1.36 ms; min[D] = 34.18 ms; max[D] = 53.95 ms.]


The end-to-end delay along a fixed path between source and destination measured during some interval is an example of a continuous time stochastic process. We have received data of the delay measured at RIPE-NCC as illustrated in Fig. 7.2. Figure 7.2 shows a sample path of this continuous stochastic process. The precise details of the measurement configuration are not relevant for the present purpose. It suffices to add that Figure 7.2 shows the time difference between the departure of an IP test packet of 100 bytes at the sending box and its arrival at the destination box, accurate within 10 μs. The average sending rate of IP test packets is $\frac{1}{12}$ packets per second. Each IP test packet is assumed to follow the same path from sending to receiving box. The steadiness of the path is checked by trace-route measurements every 6 minutes.
Usually, in the next step, the histogram of the raw data is made. A histogram counts the number of data points that lie in an interval of $\Delta D$ ms, which is often called the bin size. Most graphical packages allow the bin size to be chosen. Figure 7.3 shows two different histograms with bin size $\Delta D = 0.5$ ms and $\Delta D = 0.1$ ms. In general, there is no universal rule to choose the bin size $\Delta D$. Clearly, the bin size is bounded below by the measurement accuracy, in our case $\Delta D > 10$ μs. A finer bin size provides more detail, but the resulting histogram also exhibits more stochastic variations because there are fewer data points in a small bin and adjacent bins may possess a significantly different number of data points. Hence, compared to one larger bin that covers the same interval, less averaging or smoothing occurs in a set of smaller bins. The normalized histogram obtained by dividing the counts per bin by the total number of data points provides a first approximation to the probability density function of $D$. However, it is still discrete and approximates $\Pr[k < D \le k + \Delta D]$. A more precise description of constructing a histogram is given in Section C.1.
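The normalization step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalized_histogram` and the delay values are made up and are not the RIPE-NCC data, and the bins are taken half-open on the right, $[k, k + \Delta D)$, which differs from the interval convention in the text only at the bin edges.

```python
import math

def normalized_histogram(data, bin_size, start):
    """Return (left_edges, fractions); fractions[i] estimates the
    probability that a data point falls in bin i of width bin_size."""
    counts = {}
    for d in data:
        idx = math.floor((d - start) / bin_size)   # index of the bin containing d
        counts[idx] = counts.get(idx, 0) + 1
    n_bins = max(counts) + 1
    total = len(data)
    edges = [start + i * bin_size for i in range(n_bins)]
    fractions = [counts.get(i, 0) / total for i in range(n_bins)]
    return edges, fractions

# Hypothetical delay samples in ms (illustrative values only).
delays_ms = [34.2, 34.3, 34.9, 35.1, 35.1, 35.4, 36.0, 41.7]
edges, fractions = normalized_histogram(delays_ms, bin_size=0.5, start=34.0)
for edge, frac in zip(edges, fractions):
    if frac > 0:
        print(f"[{edge:.1f}, {edge + 0.5:.1f}) ms: {frac:.3f}")
```

Dividing each fraction by the bin size would turn the bin probabilities into an estimate of the density itself. Rerunning with a smaller `bin_size` shows the effect discussed above: more bins, fewer points per bin, and a more irregular profile.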
The histogram is generally better suited to decide whether outliers in the data points may be due to measurement errors or not. Figure 7.3 suggests either to neglect the data points with $D > 40$ ms or to measure at a higher sending rate of IP test packets in order to have more details in the intervals exceeding 38 or 40 ms. If there existed a good³ stochastic model for the end-to-end delay along fixed Internet paths, a normal procedure⁴ in engineering and physics would be to fit the histogram with that stochastic model to obtain the parameters of that stochastic model. The accuracy of the fit can be expressed in terms of the correlation coefficient $\rho$ explained in Section 2.5.3. The closer $\rho$ tends to 1, the better the fit, which gives confidence that the stochastic model corresponds with the real phenomenon.

3
Which is still lacking at the time of writing.

4
Other more difficult methods in the realm of statistics must be invoked in case the measurement data are so precious and rare that any additional measurement point has a far larger cost than the cost of extensive additional computations.

Fig. 7.3. The histogram of the end-to-end delay with a bin size of 0.1 ms (the insert has bin size of 0.5 ms). [Plot: number of data points versus D in ms.]

Assuming that the presented measurement is a typical measurement along a fixed Internet path (which is true for about 80% of the investigated different paths), it demonstrates that there is a clear minimum at about 34 ms due to the propagation delay of electromagnetic waves. In addition, the end-to-end delay lies between 34 and 38 ms in 99% of the measurements. However, there is insufficient data to make claims about the tail behavior ($\Pr[D > x]$ for $x > 40$ ms). Precisely this region is of interest for the quality of service, expressed as the requirement that the probability that the end-to-end delay exceeds $x$ ms is smaller than $10^{-a}$, where $a$ specifies the stringency of the quality requirement. Toll quality in classical telephony sets $x$ at 100 ms and $a$ in the range of 4 to 5. The existence of a good stochastic model covering the whole possible range of the end-to-end delay $D$ would enable us to compute tail probabilities based on the parameters that can be fitted from the measurements.
The histogram is in fact a projection of the raw measurement data onto the ordinate (end-to-end delay axis). All time information (the abscissa in Fig. 7.2) is lost. Usually, the time evolution and the dependencies or correlations over time of a stochastic phenomenon are difficult and most analyses are only tractable under certain simplifying conditions. For example, often only a steady state analysis is possible and the increments $X(t_k) - X(t_{k-1})$ of the process for all $t_0 < \cdots < t_{k-1} < t_k < \cdots < t_n$ are assumed to be independent or weakly dependent. The study of Markov processes (Chapters 9–11) basically tries to compute and analyze the process in steady state. Figure 7.2 is measured over a relatively long period of time and indicates that after 8:00 a.m. the background traffic increases. The background traffic interferes with the IP test packets and causes them to queue longer in routers such that larger variations are observed. However, it is in general difficult to ascertain that (a part of) the measurement is performed while the system operates in a certain stable regime (or steady state).
We have touched upon some aspects in the art of modeling to motivate the importance of studying stochastic processes. In the remainder of this chapter, one of the most basic and simplest stochastic processes is investigated.

7.2 The Poisson process


A Poisson process with parameter or rate $\lambda > 0$ is an integer-valued, continuous time stochastic process $\{X(t), t \ge 0\}$ satisfying
(i) $X(0) = 0$
(ii) for all $t_0 = 0 < t_1 < \cdots < t_n$, the increments $X(t_1) - X(t_0), X(t_2) - X(t_1), \ldots, X(t_n) - X(t_{n-1})$ are independent random variables
(iii) for $t \ge 0$, $s > 0$ and non-negative integers $k$, the increments have the Poisson distribution
$$\Pr[X(t+s) - X(s) = k] = \frac{(\lambda t)^k e^{-\lambda t}}{k!} \tag{7.1}$$

It is convenient to view the Poisson process $X(t)$ as a special counting process, where the number of events in any interval of length $t$ is specified via condition (iii). From this definition, a number of properties can be derived:
(a) Condition (iii) implies that the increments are stationary because the right-hand side does not depend on $s$. In other words, the increments only depend on the length $t$ of the interval and not on the time $s$ when the interval begins. Further, with (3.11), the mean $E[X(t+s) - X(s)] = \lambda t$ and, because the increments are stationary, this holds for any value of $s$. In particular with $s = 0$ and condition (i), the expected number of events in a time interval with length $t$ is
$$E[X(t)] = \lambda t \tag{7.2}$$
Relation (7.2) explains why $\lambda$ is called the rate of the Poisson process, namely, the derivative over time $t$ or the number of events per time unit.


(b) The probability that exactly one event occurs in an arbitrarily small time interval of length $h$ follows from condition (iii) as
$$\Pr[X(h+s) - X(s) = 1] = \lambda h e^{-\lambda h} = \lambda h + o(h)$$
while the probability that no event occurs in an arbitrarily small time interval of length $h$ is
$$\Pr[X(h+s) - X(s) = 0] = e^{-\lambda h} = 1 - \lambda h + o(h)$$
Similarly, the probability that more than one event occurs in an arbitrarily small time interval of length $h$ is
$$\Pr[X(h+s) - X(s) > 1] = o(h)$$
Example 1 A conversation in a wireless ad-hoc network is severely disturbed by interference signals according to a Poisson process of rate $\lambda = 0.1$ per minute. (a) What is the probability that no interference signals occur within the first two minutes of the conversation? (b) Given that the first two minutes are free of disturbing effects, what is the probability that in the next minute precisely 1 interfering signal disturbs the conversation?
(a) Let $X(t)$ denote the Poisson interference process, then $\Pr[X(2) = 0]$ needs to be computed. Since $X(0) = 0$ and with (7.1), we can write $\Pr[X(2) = 0] = \Pr[X(2) - X(0) = 0] = e^{-2\lambda}$, which equals $\Pr[X(2) = 0] = e^{-0.2} = 0.8187$.
(b) The events during two non-overlapping intervals of a Poisson process are independent. Thus the event $\{X(2) - X(0) = 0\}$ is independent of the event $\{X(3) - X(2) = 1\}$, which means that the requested conditional probability is $\Pr[X(3) - X(2) = 1 | X(2) - X(0) = 0] = \Pr[X(3) - X(2) = 1]$. From (7.1), we obtain $\Pr[X(3) - X(2) = 1] = 0.1 e^{-0.1} = 0.0905$.
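The two numbers in Example 1 can be checked directly from (7.1). The sketch below is a minimal illustration; the helper name `poisson_increment_pmf` is introduced here for convenience and does not come from the text.

```python
import math

def poisson_increment_pmf(k, rate, length):
    """Pr[X(s + length) - X(s) = k] for a Poisson process with the given rate, eq. (7.1)."""
    mean = rate * length
    return mean ** k * math.exp(-mean) / math.factorial(k)

rate = 0.1  # interference signals per minute, as in Example 1

# (a) no interference during the first two minutes
p_a = poisson_increment_pmf(0, rate, 2.0)
# (b) exactly one interference in the third minute; by independence of
#     increments the conditioning on the first two minutes drops out
p_b = poisson_increment_pmf(1, rate, 1.0)

print(round(p_a, 4), round(p_b, 4))  # prints 0.8187 0.0905
```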
Example 2 During a certain time interval $[t_1, t_1 + 10\ \text{s}]$, the number of IP packets that arrive at a router is on average 40/s. A service provider asks us to compute the probability that 20 packets arrive in the period $[t_1, t_1 + 1\ \text{s}]$ and 30 IP packets in $[t_1, t_1 + 3\ \text{s}]$. We may regard the arrival process as a Poisson process.
We are asked to compute $\Pr[X(1) = 20, X(3) = 30]$ knowing that $\lambda = 40\ \text{s}^{-1}$. Using the independence of increments and (7.1), we rewrite
$$\begin{aligned}
\Pr[X(1) = 20, X(3) = 30] &= \Pr[X(1) - X(0) = 20, X(3) - X(1) = 10] \\
&= \Pr[X(1) - X(0) = 20] \Pr[X(3) - X(1) = 10] \\
&= \frac{\lambda^{20} e^{-\lambda}}{20!} \, \frac{(2\lambda)^{10} e^{-2\lambda}}{10!} \approx 10^{-26} \approx 0
\end{aligned}$$
which means that the event described by the service provider practically never occurs.
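As a numerical check of Example 2, the product of the two increment probabilities can be evaluated with the same kind of helper. This is a sketch; the function name is an illustrative choice, not part of the text.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events when the expected number of events is mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

rate = 40.0  # packets per second

# 20 packets in the first second and 10 more in the following two seconds.
p_first = poisson_pmf(20, rate * 1.0)
p_second = poisson_pmf(10, rate * 2.0)
p_joint = p_first * p_second

print(f"{p_first:.3e} * {p_second:.3e} = {p_joint:.3e}")
```

The second factor dominates: observing only 10 packets in two seconds when 80 are expected is what drives the joint probability down to the order of $10^{-26}$.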

7.3 Properties of the Poisson process


The first theorem is the converse of the above property (b) that immediately followed from the definition. The theorems presented here reveal the methodology of how stochastic processes are studied.
Theorem 7.3.1 A counting process $N(t)$ that satisfies the conditions (i) $N(0) = 0$, (ii) the process $N(t)$ has stationary and independent increments, (iii) $\Pr[N(h) = 1] = \lambda h + o(h)$ and (iv) $\Pr[N(h) > 1] = o(h)$ is a Poisson process with rate $\lambda > 0$.
Proof: We must show that conditions (iii) and (iv) are equivalent to condition (iii) in the definition of the Poisson process. Denote $P_n(t) = \Pr[N(t) = n]$ and consider first the case $n = 0$, then
$$P_0(t+h) = \Pr[N(t+h) = 0] = \Pr[N(t+h) - N(t) = 0, N(t) = 0]$$
Invoking independence via (ii)
$$P_0(t+h) = \Pr[N(t+h) - N(t) = 0] \Pr[N(t) = 0]$$
By definition, $P_0(t) = \Pr[N(t) = 0]$ and from (iii), (iv) and the fact that $\sum_{k=0}^{\infty} \Pr[N(h) = k] = 1$, it follows that
$$\Pr[N(h) = 0] = 1 - \lambda h + o(h) \tag{v}$$
Combining these with the stationarity in (ii), we obtain
$$P_0(t+h) = P_0(t) \left(1 - \lambda h + o(h)\right)$$
or
$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda P_0(t) + \frac{o(h)}{h}$$
from which, in the limit $h \to 0$, the differential equation
$$P_0'(t) = -\lambda P_0(t)$$
is immediate. The solution is $P_0(t) = C e^{-\lambda t}$ and the integration constant $C$ follows from (i) and $P_0(0) = \Pr[N(0) = 0] = 1$ as $C = 1$. This establishes condition (iii) in the definition of the Poisson process for $k = 0$.


The verification for $n > 0$ is more involved. Applying the law of total probability (2.46),
$$\begin{aligned}
P_n(t+h) &= \Pr[N(t+h) = n] \\
&= \sum_{j=0}^{n} \Pr[N(t+h) - N(t) = j | N(t) = n-j] \Pr[N(t) = n-j]
\end{aligned}$$
By independence (ii),
$$\Pr[N(t+h) - N(t) = j | N(t) = n-j] = \Pr[N(t+h) - N(t) = j]$$
and by definition $\Pr[N(t) = n-j] = P_{n-j}(t)$, we have
$$P_n(t+h) = \sum_{j=0}^{n} \Pr[N(t+h) - N(t) = j] P_{n-j}(t)$$
By the stationarity (ii)
$$\Pr[N(t+h) - N(t) = j] = \Pr[N(h) - N(0) = j]$$
we obtain using (i)
$$P_n(t+h) = \sum_{j=0}^{n} \Pr[N(h) = j] P_{n-j}(t)$$
while (v) and (iii) suggest to write the sum as
$$P_n(t+h) = P_n(t) \Pr[N(h) = 0] + P_{n-1}(t) \Pr[N(h) = 1] + \sum_{j=2}^{n} P_{n-j}(t) \Pr[N(h) = j]$$
Since $P_{n-j}(t) \le 1$ and using (iv),
$$\sum_{j=2}^{n} P_{n-j}(t) \Pr[N(h) = j] \le \sum_{j=2}^{n} \Pr[N(h) = j] \le \Pr[N(h) > 1] = o(h)$$
we arrive with (v), (iii) at
$$P_n(t+h) = P_n(t) \left(1 - \lambda h + o(h)\right) + P_{n-1}(t) \left(\lambda h + o(h)\right) + o(h)$$
or
$$\frac{P_n(t+h) - P_n(t)}{h} = -\lambda P_n(t) + \lambda P_{n-1}(t) + \frac{o(h)}{h}$$
which leads, after taking the limit $h \to 0$, to the differential equation
$$P_n'(t) = -\lambda P_n(t) + \lambda P_{n-1}(t)$$


with initial condition $P_n(0) = \Pr[N(0) = n] = 1_{\{n=0\}}$. This differential equation is rewritten as
$$\frac{d}{dt} \left( e^{\lambda t} P_n(t) \right) = \lambda e^{\lambda t} P_{n-1}(t) \tag{7.3}$$
In case $n = 1$, the differential equation reduces with $P_0(t) = e^{-\lambda t}$ to $\frac{d}{dt} \left( e^{\lambda t} P_1(t) \right) = \lambda$. The general solution is $e^{\lambda t} P_1(t) = \lambda t + C$ and, from the initial condition $P_1(0) = 0$, we have $C = 0$ and $P_1(t) = \lambda t e^{-\lambda t}$. The general solution to (7.3) is proved by induction. Assume that $P_n(t) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$ holds for $n$, then the case $n + 1$ follows from (7.3) as
$$\frac{d}{dt} \left( e^{\lambda t} P_{n+1}(t) \right) = \lambda \frac{(\lambda t)^n}{n!}$$
and integrating from 0 to $t$ using $P_{n+1}(0) = 0$ yields $P_{n+1}(t) = \frac{(\lambda t)^{n+1} e^{-\lambda t}}{(n+1)!}$, which establishes the induction and finalizes the proof of the theorem.
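The system of differential equations derived in the proof can also be integrated numerically and compared with the Poisson probabilities. The sketch below uses a plain forward Euler scheme; the step count, the rate and the time horizon are arbitrary illustrative choices.

```python
import math

def integrate_counting_odes(rate, t_end, n_max, steps=20000):
    """Forward Euler integration of P_0' = -rate*P_0 and
    P_n' = -rate*P_n + rate*P_{n-1} with P_n(0) = 1 if n == 0 else 0."""
    dt = t_end / steps
    p = [1.0] + [0.0] * n_max
    for _ in range(steps):
        new = [p[0] - rate * p[0] * dt]
        for n in range(1, n_max + 1):
            new.append(p[n] + (-rate * p[n] + rate * p[n - 1]) * dt)
        p = new
    return p

rate, t_end = 2.0, 1.5
numeric = integrate_counting_odes(rate, t_end, n_max=6)
exact = [(rate * t_end) ** n * math.exp(-rate * t_end) / math.factorial(n)
         for n in range(7)]
max_err = max(abs(a - b) for a, b in zip(numeric, exact))
print(f"largest deviation from the Poisson probabilities: {max_err:.2e}")
```

Each Euler step of size `dt` moves probability `rate*dt` from state $n-1$ to state $n$, which is exactly conditions (iii) and (v) with the $o(h)$ terms dropped, so the numerical solution is a binomial approximation that approaches the Poisson distribution as the step shrinks.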

The second theorem has very important applications since it relates the number of events in non-overlapping intervals to the interarrival time between these events.
Theorem 7.3.2 Let $\{X(t); t \ge 0\}$ be a Poisson process with rate $\lambda > 0$ and denote by $t_0 = 0 < t_1 < t_2 < \cdots$ the successive occurrence times of events. Then the interarrival times $\tau_n = t_n - t_{n-1}$ are independent identically distributed exponential random variables with mean $\frac{1}{\lambda}$.
Proof: For any $s \ge 0$ and any $n \ge 1$, the event $\{\tau_n > s\}$ is equivalent to the event $\{X(t_{n-1} + s) - X(t_{n-1}) = 0\}$. Indeed, the $n$-th interarrival time $\tau_n$ is longer than $s$ time units if and only if the $n$-th event has not yet occurred $s$ time units after the occurrence of the $(n-1)$-th event at $t_{n-1}$. Since the Poisson process has independent increments (condition (ii) in the definition of the Poisson process), changes in the value of the process in non-overlapping time intervals are independent. By the equivalence in events, this implies that the interarrival times $\tau_n$ are independent random variables. Further, by the stationarity of the Poisson process (deduced from condition (iii) in the definition of the Poisson process),
$$\Pr[\tau_n > s] = \Pr[X(t_{n-1} + s) - X(t_{n-1}) = 0] = e^{-\lambda s}$$
which implies that any interarrival time has an identical, exponential distribution,
$$F_{\tau_n}(x) = \Pr[\tau_n \le x] = 1 - e^{-\lambda x}$$
This proves the theorem.

The converse of Theorem 7.3.2 also holds: if the interarrival times $\{\tau_n\}$ of a counting process $\{N(t), t \ge 0\}$ are i.i.d. exponential random variables with mean $\frac{1}{\lambda}$, then $\{N(t), t \ge 0\}$ is a Poisson process with rate $\lambda$.
A property associated with the exponential distribution is the memoryless property,
$$\Pr[\tau_n > s + t | \tau_n > s] = \Pr[\tau_n > t]$$
By the equivalence of the events, for any $t, s \ge 0$,
$$\begin{aligned}
\Pr[\tau_n > s + t | \tau_n > s] &= \Pr[X(t_{n-1} + s + t) - X(t_{n-1}) = 0 | X(t_{n-1} + s) - X(t_{n-1}) = 0] \\
&= \Pr[X(t_{n-1} + s + t) - X(t_{n-1} + s) = 0 | X(t_{n-1} + s) - X(t_{n-1}) = 0]
\end{aligned}$$
By the independence of increments (in non-overlapping intervals),
$$\Pr[\tau_n > s + t | \tau_n > s] = \Pr[X(t_{n-1} + s + t) - X(t_{n-1} + s) = 0]$$
and by the stationarity of the increments, the memoryless property is established,
$$\Pr[\tau_n > s + t | \tau_n > s] = \Pr[X(t_{n-1} + t) - X(t_{n-1}) = 0] = \Pr[\tau_n > t]$$
Hence, the assumption of stationary and independent increments is equivalent to asserting that, at any time $s$, the process probabilistically restarts with the same distribution and is independent of occurrences in the past (before $s$). Thus, the process has no memory and, since the only continuous distribution that satisfies the memoryless property is the exponential distribution, exponential interarrival times $\tau_n$ are a natural consequence.
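The converse of Theorem 7.3.2 suggests a simple way to simulate a Poisson process: add up i.i.d. exponential interarrival times and count how many arrivals fall in the observation window. The following sketch checks that the sample mean and sample variance of the count are both close to $\lambda t$, as they should be for a Poisson random variable. The seed, rate, horizon and number of runs are arbitrary choices made for this illustration.

```python
import random

def count_events(rate, horizon, rng):
    """Number of arrivals in [0, horizon] when interarrival times
    are i.i.d. exponential with mean 1/rate."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)   # next interarrival time
        if t > horizon:
            return count
        count += 1

rng = random.Random(12345)           # fixed seed for reproducibility
rate, horizon, runs = 3.0, 2.0, 20000
counts = [count_events(rate, horizon, rng) for _ in range(runs)]

mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / (runs - 1)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}, rate*horizon {rate * horizon:.1f}")
```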
The arrival time of the $n$-th event or the waiting time until the $n$-th event is $W_n = \sum_{k=1}^{n} \tau_k$. In Section 3.3.1, it is shown that the sum of independent exponential random variables has a Gamma distribution or Erlang distribution (3.24). Alternatively, the equivalence of the events, $\{W_n \le t\} \Leftrightarrow \{N(t) \ge n\}$, directly leads to the Erlang distribution,
$$F_{W_n}(t) = \Pr[W_n \le t] = \Pr[N(t) \ge n] = \sum_{k=n}^{\infty} \frac{(\lambda t)^k e^{-\lambda t}}{k!}$$
The equivalence of the events, $\{W_n \le t\} \Leftrightarrow \{N(t) \ge n\}$, is a general relation and a fundamental part of the theory of renewal processes, which we will study in Chapter 8.
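The event equivalence can be checked numerically by computing the distribution of $W_n$ in two ways: from the Poisson tail sum, and by integrating the standard Erlang density $\lambda (\lambda x)^{n-1} e^{-\lambda x} / (n-1)!$, which is assumed here to be the density that (3.24) refers to. The function names and parameter values are illustrative.

```python
import math

def erlang_cdf_via_counts(n, rate, t):
    """Pr[W_n <= t] = Pr[N(t) >= n] = 1 - sum_{k < n} (rate*t)^k e^{-rate*t} / k!"""
    mean = rate * t
    head = sum(mean ** k * math.exp(-mean) / math.factorial(k) for k in range(n))
    return 1.0 - head

def erlang_cdf_via_density(n, rate, t, steps=100000):
    """Midpoint-rule integral over [0, t] of the Erlang density."""
    dx = t / steps
    norm = math.factorial(n - 1)
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx
        total += rate * (rate * x) ** (n - 1) * math.exp(-rate * x) / norm
    return total * dx

via_counts = erlang_cdf_via_counts(3, 2.0, 1.5)
via_density = erlang_cdf_via_density(3, 2.0, 1.5)
print(f"{via_counts:.6f} {via_density:.6f}")
```

The finite sum over $k < n$ is used instead of the infinite tail sum in the text, since both describe the same probability and the finite form avoids truncation.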
Theorem 7.3.3 Given that exactly one event of a Poisson process $\{X(t); t \ge 0\}$ has occurred during the interval $[0, t]$, the time of occurrence of this event is uniformly distributed over $[0, t]$.


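Proof: Immediate application of the conditional probability (2.44) yields, for $0 \le s \le t$,
$$\Pr[\tau_1 \le s | X(t) = 1] = \frac{\Pr[\{\tau_1 \le s\} \cap \{X(t) = 1\}]}{\Pr[X(t) = 1]}$$
Using the equivalence $\{\tau_1 \le s\} \Leftrightarrow \{X(t_0 + s) - X(t_0) = 1\}$, valid when exactly one event occurs in $[0, t]$, and the fact that $\{X(t_0 + s) - X(t_0) = 1\} = \{X(s) = 1\}$ by the stationarity of the Poisson process gives
$$\begin{aligned}
\{\tau_1 \le s\} \cap \{X(t) = 1\} &= \{X(s) = 1\} \cap \{X(t) = 1\} \\
&= \{X(s) = 1\} \cap \{X(t) - X(s) = 0\}
\end{aligned}$$
Applying the independence of increments over non-overlapping intervals and (7.1) yields
$$\begin{aligned}
\Pr[\tau_1 \le s | X(t) = 1] &= \frac{\Pr[X(s) = 1] \Pr[X(t) - X(s) = 0]}{\Pr[X(t) = 1]} \\
&= \frac{(\lambda s) e^{-\lambda s} e^{-\lambda (t-s)}}{(\lambda t) e^{-\lambda t}} = \frac{s}{t}
\end{aligned}$$
which completes the proof.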

Theorem 7.3.3 is immediately generalized to $n$ events. For any set of real variables $s_j$ satisfying $0 = s_0 < s_1 < s_2 < \cdots < s_n < t$ and given that $n$ events of a Poisson process $\{X(t); t \ge 0\}$ have occurred during the interval $[0, t]$, the probability of the successive occurrence times $0 < t_1 < t_2 < \cdots < t_n < t$ of these $n$ Poisson events is
$$\Pr[t_1 \le s_1, \ldots, t_n < s_n | X(t) = n] = \frac{\Pr[\{t_1 \le s_1, \ldots, t_n < s_n\} \cap \{X(t) = n\}]}{\Pr[X(t) = n]}$$
Using a similar argument as in the proof of Theorem 7.3.3,
$$\begin{aligned}
p &= \Pr[\{t_1 \le s_1, t_2 \le s_2, \ldots, t_n < s_n\} \cap \{X(t) = n\}] \\
&= \Pr[X(s_1) - X(s_0) = 1, \ldots, X(s_n) - X(s_{n-1}) = 1, X(t) - X(s_n) = 0] \\
&= \left( \prod_{j=1}^{n} \Pr[X(s_j) - X(s_{j-1}) = 1] \right) \Pr[X(t) - X(s_n) = 0] \\
&= \left( \prod_{j=1}^{n} e^{-\lambda (s_j - s_{j-1})} \lambda (s_j - s_{j-1}) \right) e^{-\lambda (t - s_n)} \\
&= \lambda^n \prod_{j=1}^{n} (s_j - s_{j-1}) \, e^{-\lambda \sum_{j=1}^{n} (s_j - s_{j-1}) - \lambda (t - s_n)} = \lambda^n \prod_{j=1}^{n} (s_j - s_{j-1}) \, e^{-\lambda t}
\end{aligned}$$


Thus,
$$\Pr[t_1 \le s_1, t_2 \le s_2, \ldots, t_n < s_n | X(t) = n] = \frac{\lambda^n \prod_{j=1}^{n} (s_j - s_{j-1}) \, e^{-\lambda t}}{\frac{(\lambda t)^n e^{-\lambda t}}{n!}} = \frac{n!}{t^n} \prod_{j=1}^{n} (s_j - s_{j-1})$$
from which the density function
$$f_{\{t_j\}}(s_1, \ldots, s_n | X(t) = n) = \frac{\partial^n}{\partial s_1 \cdots \partial s_n} \Pr[t_1 \le s_1, \ldots, t_n < s_n | X(t) = n]$$
follows as
$$f_{\{t_j\}}(s_1, s_2, \ldots, s_n | X(t) = n) = \frac{n!}{t^n}$$
which is independent of the rate $\lambda$. If $0 < t_1 < t_2 < \cdots < t_n < t$ are the successive occurrence times of $n$ Poisson events in the interval $[0, t]$, then the random variables $t_1, t_2, \ldots, t_n$ are distributed as a set of order statistics, defined in Section 3.4.2, of $n$ uniform random variables in $[0, t]$. In other words, if $n$ i.i.d. uniform random variables on $[0, t]$ are sorted in increasing order, they may represent $n$ successive occurrence times of a Poisson process.
The average spacing between these q ordered i.i.d. uniform random variables
is qw as computed in Problem (ii) of Section 3.7.
A related example is the conditional probability where $0 < s < t$ and $0 \le k \le n$,
$$\begin{aligned}
\Pr[X(s) = k | X(t) = n] &= \frac{\Pr[\{X(s) = k\} \cap \{X(t) = n\}]}{\Pr[X(t) = n]} \\
&= \frac{\Pr[\{X(s) = k\} \cap \{X(t) - X(s) = n - k\}]}{\Pr[X(t) = n]} \\
&= \frac{\Pr[X(s) = k] \Pr[X(t) - X(s) = n - k]}{\Pr[X(t) = n]} \\
&= \frac{n!}{(\lambda t)^n e^{-\lambda t}} \, \frac{(\lambda s)^k e^{-\lambda s}}{k!} \, \frac{(\lambda (t - s))^{n-k} e^{-\lambda (t-s)}}{(n-k)!} \\
&= \binom{n}{k} \frac{s^k}{t^n} (t - s)^{n-k}
\end{aligned}$$
Hence, if $p = \frac{s}{t}$, the conditional probability becomes
$$\Pr[X(s) = k | X(t) = n] = \binom{n}{k} p^k (1 - p)^{n-k}$$


Given that a total number of $n$ Poisson events have occurred in the time interval $[0, t]$, the number of events that have taken place in the sub-interval $[0, s]$ is binomially distributed with parameters $n$ and $p = \frac{s}{t}$. Observe that this conditional probability is also independent of the rate $\lambda$. In addition, since $\lim_{t \to \infty} X(t) = \infty$ such that $n \to \infty$, while $np = \frac{ns}{t} \to \lambda s$ because the number of events per time unit tends to the rate $\lambda$, applying the law of rare events results in
$$\lim_{t \to \infty} \Pr[X(s) = k | X(t) = n] = \frac{(\lambda s)^k}{k!} e^{-\lambda s}$$
Given an everlasting Poisson process, the number of events that occur in the interval $[0, s]$ is Poisson distributed with mean $\lambda s$, the rate times the length of the interval, in agreement with (7.1).
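The binomial form of the conditional probability, and its independence of the rate, can be verified numerically from the Poisson probabilities themselves. In this sketch the function names and the values of $n$, $s$, $t$ and the two rates are illustrative choices.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def conditional_count_prob(k, n, s, t, rate):
    """Pr[X(s) = k | X(t) = n], computed via independent increments."""
    joint = poisson_pmf(k, rate * s) * poisson_pmf(n - k, rate * (t - s))
    return joint / poisson_pmf(n, rate * t)

def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, s, t = 10, 2.0, 5.0
max_diff = 0.0
for rate in (0.5, 7.0):          # two very different rates
    for k in range(n + 1):
        diff = abs(conditional_count_prob(k, n, s, t, rate) - binomial_pmf(k, n, s / t))
        max_diff = max(max_diff, diff)
print(f"largest difference from the binomial pmf: {max_diff:.1e}")
```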
Application The arrival process of most real-time applications (such as telephony calls, interactive video, ...) in a network is well approximated by a Poisson process. Suppose a measurement configuration is built to collect statistics of the arrival process of telephony calls in some region. During a period $[0, T]$, precisely 1 telephony call has been measured. What can be said of the time $x \in [0, T]$ at which the telephony call has arrived at the measurement device? Theorem 7.3.3 tells us that any time in that interval is equally probable.
Theorem 7.3.4 If $X(t)$ and $Y(t)$ are two independent Poisson processes with rates $\lambda_x$ and $\lambda_y$, then $Z(t) = X(t) + Y(t)$ is also a Poisson process with rate $\lambda_x + \lambda_y$.
Proof: It suffices to demonstrate that the counting process $N_Z(t) = N_X(t) + N_Y(t)$ has exponentially distributed interarrival times $\tau_Z$. Suppose that $N_Z(t_n) = n$, it remains to compute the next arrival at time $t_{n+1} = t_n + s$ for which $N_Z(t_n + s) = n + 1$. Due to the memoryless property of the Poisson process, the time until the next event from $t_n$ on for each of the processes $X$ and $Y$ is again exponentially distributed with parameter $\lambda_x$ and $\lambda_y$, respectively. In other words, it is irrelevant which process $X$ or $Y$ has previously caused the arrival at time $t_n$. Further, the event $\{\tau_Z > s\}$ that the interarrival time of the sum process exceeds $s$ is equivalent to $\{\tau_X > s\} \cap \{\tau_Y > s\}$ or
$$\Pr[\tau_Z > s] = \Pr[\tau_X > s, \tau_Y > s] = \Pr[\tau_X > s] \Pr[\tau_Y > s] = e^{-(\lambda_x + \lambda_y) s}$$
where the independence of $X(t)$ and $Y(t)$ has been used. This proves the theorem.


A direct consequence is that any sum of independent Poisson processes is


also a Poisson process with aggregate rate equal to the sum of the individual
rates. This theorem is in correspondence with the sum property of the
Poisson distribution.
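The sum property of the Poisson distribution mentioned above can be illustrated by convolving two Poisson probability mass functions and comparing the result with a single Poisson distribution whose mean is the sum of the two means. The function names and the numerical values below are illustrative.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k events with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def sum_pmf(k, mean_x, mean_y):
    """Pr[X + Y = k] for independent Poisson X and Y, by direct convolution."""
    return sum(poisson_pmf(j, mean_x) * poisson_pmf(k - j, mean_y)
               for j in range(k + 1))

rate_x, rate_y, t = 1.5, 4.0, 2.0     # two independent processes observed over [0, t]
max_diff = max(abs(sum_pmf(k, rate_x * t, rate_y * t)
                   - poisson_pmf(k, (rate_x + rate_y) * t))
               for k in range(30))
print(f"largest difference: {max_diff:.1e}")
```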
7.4 The nonhomogeneous Poisson process
As will be shown later in Section 11.3.2, the Poisson process is a special case of a birth-and-death process, which is in turn a special case of a Markov process. Hence, it seems more instructive to discuss these special processes as applications of the Markov process. Therefore, only associations to the Poisson process are treated here. In many cases, the rate is a time variant function $\lambda(t)$ and such a process is termed a nonhomogeneous or nonstationary Poisson process. For example, the arrival rate of a large number $m$ of individual IP-flows at a router is well approximated by a nonhomogeneous Poisson process, where the rate $\lambda(t)$ varies over the day depending on the number $m$ and the individual rate of each flow of packets. Since the sum of independent Poisson random variables is again a Poisson random variable, we have $\lambda(t) = \sum_{j=1}^{m(t)} \lambda_j(t)$.
If $X(t)$ is a nonhomogeneous Poisson process with rate $\lambda(t)$, the increment $X(t) - X(s)$ reflects the number of events in an interval $(s, t]$ and increments of non-overlapping intervals are still independent.
Theorem 7.4.1 If $\Lambda(t) = \int_0^t \lambda(u) \, du$ and $s < t$, then $X(t) - X(s)$ is Poisson distributed with mean $\Lambda(t) - \Lambda(s)$.
The demonstration is analogous to the proof of Theorem 7.3.1.
Proof (partly): Denote by $P_n(t) = \Pr[N(t) - N(s) = n]$, then
$$\begin{aligned}
P_0(t+h) &= \Pr[N(t+h) - N(s) = 0] \\
&= \Pr[N(t+h) - N(t) = 0, N(t) - N(s) = 0]
\end{aligned}$$
Invoking independence of the increments,
$$\begin{aligned}
P_0(t+h) &= \Pr[N(t+h) - N(t) = 0] \Pr[N(t) - N(s) = 0] \\
&= P_0(t) \left(1 - \lambda(t) h + o(h)\right)
\end{aligned}$$
or
$$\frac{P_0(t+h) - P_0(t)}{h} = -\lambda(t) P_0(t) + \frac{o(h)}{h}$$
from which, in the limit $h \to 0$, the differential equation
$$P_0'(t) = -\lambda(t) P_0(t)$$
is immediate. Rewritten as $\frac{d}{dt} \log P_0(t) = -\lambda(t)$, after integration over $(s, t]$, we find $\log P_0(t) = -(\Lambda(t) - \Lambda(s))$ since $P_0(s) = \Pr[N(s) - N(s) = 0] = 1$. Thus, for the case $n = 0$, we find $P_0(t) = \exp[-(\Lambda(t) - \Lambda(s))]$, which proves the theorem for $n = 0$.
The remainder of the proof ($n > 0$) uses the same ingredients as the proof of Theorem 7.3.1 and is omitted.

A nonhomogeneous Poisson process $X(t)$ with rate $\lambda(t)$ can be transformed to a homogeneous Poisson process $Y(u)$ with rate 1 by the time transform $u = \Lambda(t)$. Indeed, $Y(u) = Y(\Lambda(t)) = X(t)$, and $Y(u + \Delta u) = Y(\Lambda(t) + \Delta\Lambda(t)) = X(t + \Delta t)$ because $\Delta\Lambda(t) = \lambda(t) \Delta t$ for small $\Delta t$, such that
$$\begin{aligned}
\Pr[Y(u + \Delta u) - Y(u) = 1] &= \Pr[X(t + \Delta t) - X(t) = 1] \\
&= \lambda(t) \Delta t + o(\Delta t) \\
&= \Delta u + o(\Delta u)
\end{aligned}$$
because $\Delta u = \lambda(t) \Delta t + o(\Delta t)$. Hence, all problems concerning nonhomogeneous Poisson processes can be reduced to the homogeneous case treated above.
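Read in the opposite direction, the time transform gives a way to simulate a nonhomogeneous Poisson process: generate the arrival times $u_i$ of a unit-rate homogeneous process and map them back through $t_i = \Lambda^{-1}(u_i)$. The sketch below assumes, purely for illustration, the rate $\lambda(t) = 2t$, so that $\Lambda(t) = t^2$ and $\Lambda^{-1}(u) = \sqrt{u}$, and checks Theorem 7.4.1 on the interval $(1, 2]$, where the mean count should be $\Lambda(2) - \Lambda(1) = 3$.

```python
import math
import random

def nonhomogeneous_arrivals(horizon, rng):
    """Arrival times in [0, horizon] for the assumed rate lambda(t) = 2t,
    obtained by mapping unit-rate arrivals u through t = sqrt(u)."""
    arrivals = []
    u = 0.0
    while True:
        u += rng.expovariate(1.0)    # next arrival of the unit-rate process Y(u)
        t = math.sqrt(u)             # inverse time transform t = Lambda^{-1}(u)
        if t > horizon:
            return arrivals
        arrivals.append(t)

rng = random.Random(2024)            # fixed seed for reproducibility
s, t, runs = 1.0, 2.0, 20000
counts = [sum(1 for a in nonhomogeneous_arrivals(t, rng) if a > s)
          for _ in range(runs)]
mean = sum(counts) / runs
print(f"sample mean of X(2) - X(1): {mean:.3f} (Lambda(2) - Lambda(1) = 3)")
```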

7.5 The failure rate function


Previous sections have shown that the Poisson process is specified by a rate function $\lambda(t)$. In this section, we consider the failure rate function of some object or system. Often it is interesting to know the probability that an object will fail in the interval $[t, t + \Delta t]$ given that the object was still functioning well up to time $t$. Let $X$ denote the lifetime of an object⁵, then this probability can be written with (2.44) as
$$\begin{aligned}
\Pr[t \le X \le t + \Delta t | X > t] &= \frac{\Pr[\{t \le X \le t + \Delta t\} \cap \{X > t\}]}{\Pr[X > t]} \\
&= \frac{\Pr[t < X \le t + \Delta t]}{\Pr[X > t]}
\end{aligned}$$

If $f_X(t)$ is the probability density function of $X$ and $F_X(t) = \Pr[X \le t]$, then for small $\Delta t$ and assuming that $f_X(t)$ is well behaved⁶ such that $\Pr[t < X \le t + \Delta t] = f_X(t) \Delta t$,
$$\Pr[t \le X \le t + \Delta t | X > t] = \frac{f_X(t)}{1 - F_X(t)} \Delta t$$
This expression shows that
$$r(t) = \frac{f_X(t)}{1 - F_X(t)} \tag{7.4}$$
can be interpreted as the intensity or rate that a $t$-year old object will fail.

5
In medical sciences, $X$ can represent in general the time for a certain event to occur. For example, the time it takes for an organism to die, the time to recover from illness, the time for a patient to respond to a therapy and so on.

6
Recall the discussion in Section 2.3.

It is called the failure rate $r(t)$ and
$$R(t) = 1 - F_X(t) = \Pr[X > t] \tag{7.5}$$

is usually termed⁷ the reliability function. Since $r(t) = \frac{\Pr[t \le X \le t + \Delta t | X > t]}{\Delta t}$ for small $\Delta t$, the failure rate $r(t) > 0$ because $r(t) = 0$ would imply an infinite lifetime $X$. Using the definition (2.30) of a probability density function, we observe that
$$r(t) = -\frac{\frac{dR(t)}{dt}}{R(t)} = -\frac{d \ln R(t)}{dt} \tag{7.6}$$
Or, since $R(0) = 1$, the corresponding integrated relation is
$$R(t) = \exp\left[ -\int_0^t r(u) \, du \right] \tag{7.7}$$
The expressions (7.6) and (7.7) are inverse relations that specify $r(t)$ as a function of $R(t)$ and vice versa. The reliability function $R(t)$ is non-increasing with maximum at $t = 0$ since it is the complement of a probability distribution function. On the other hand, the failure rate $r(t)$, being a conditional probability density, can take any positive real value. From (7.4) we obtain the density function of the lifetime $X$ in terms of the failure rate $r(t)$ as
$$f_X(t) = r(t) R(t) = r(t) \exp\left[ -\int_0^t r(u) \, du \right]$$
with $f_X(0) = r(0)$. Using the tail relation (2.35) for the expectation of the lifetime $X$ immediately gives the mean time to failure,
$$E[X] = \int_0^{\infty} R(t) \, dt \tag{7.8}$$

In case $F_X(T) = 1$ and $f_X(T) \ne 0$ for a finite time $T$, which is the maximum lifetime, the definition (7.4) demonstrates that $r(t)$ has a pole at $t = T$. In practice, the failure rate $r(t)$ is relatively high for small $t$ due to initial imperfections that cause a number of objects to fail early, and $r(t)$ is increasing towards the maximum lifetime $T$ due to aging or wear and tear. This shape of $r(t)$, as illustrated in Fig. 7.4, is called a "bath-tub" curve, which is convex.

7
In biology, medical sciences and physics, $R(t)$ is called the survival function and $r(t)$ is the corresponding mortality rate or hazard rate.
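Fig. 7.4. Example of a "bath-tub" shaped failure rate function $r(t)$. [Plot: $r(t)$ versus $t$, starting at $f_X(0)$ for $t = 0$ and rising steeply towards the maximum lifetime $T$.]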

An often used model for the failure rate is $r(t) = \lambda a t^{a-1}$ with corresponding reliability function $R(t) = \exp[-\lambda t^a]$ and where the lifetime $X$ has a Weibull distribution function $F_X(t) = 1 - R(t)$ as in (3.40). In case $a = 1$, the failure rate $r(t) = \lambda$ is constant over time, while $a > 1$ ($a < 1$) reflects an increasing (decreasing) failure rate over time. Hence, a "bath-tub" shaped (realistic) failure function as in Fig. 7.4 can be modeled by a Weibull model for $r(t)$ with $a < 1$ in the beginning, $a = 1$ in the middle and $a > 1$ at the end of the lifetime.
For an exponential lifetime where $f_X(t) = \lambda e^{-\lambda t}$, the failure rate (7.4) equals $r(t) = \lambda$ and is independent of time. This means that the failure rate for a $t$-year-old object is the same as for a new object, which is a manifestation of the memoryless property of the exponential distribution. It also explains why $\lambda$ in both the exponential distribution and the Poisson process is often called a rate.
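The relations (7.6) and (7.8) can be checked numerically for the Weibull model. The sketch below assumes the parametrization $R(t) = \exp[-\lambda t^a]$ used above, differentiates $-\ln R(t)$ with a central difference, and integrates $R(t)$ for the exponential case $a = 1$, where the mean time to failure should equal $1/\lambda$. Function names, step sizes and parameter values are illustrative choices.

```python
import math

def weibull_reliability(t, lam, a):
    """R(t) = exp(-lam * t^a), the assumed Weibull reliability function."""
    return math.exp(-lam * t ** a)

def failure_rate_numeric(t, lam, a, dt=1e-6):
    """r(t) = -d ln R(t)/dt, eq. (7.6), via a central difference."""
    upper = math.log(weibull_reliability(t + dt, lam, a))
    lower = math.log(weibull_reliability(t - dt, lam, a))
    return -(upper - lower) / (2 * dt)

def failure_rate_closed(t, lam, a):
    """Closed form r(t) = lam * a * t^(a-1)."""
    return lam * a * t ** (a - 1)

def mean_time_to_failure(lam, a, t_max=100.0, steps=200000):
    """E[X] = integral of R(t) over [0, infinity), eq. (7.8), truncated at t_max."""
    dx = t_max / steps
    return sum(weibull_reliability((i + 0.5) * dx, lam, a) for i in range(steps)) * dx

lam, t = 0.5, 1.3
rate_errors = [abs(failure_rate_numeric(t, lam, a) - failure_rate_closed(t, lam, a))
               for a in (0.5, 1.0, 2.0)]
mttf = mean_time_to_failure(lam, a=1.0)
print(f"max failure-rate error {max(rate_errors):.1e}, MTTF for a = 1: {mttf:.4f}")
```

For $a = 0.5$ the closed form decreases with $t$, for $a = 2$ it increases, and for $a = 1$ it stays at $\lambda$, which matches the three regimes of the bath-tub description.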

7.6 Problems
(i) A series of test strings each with a variable number $N$ of bits all equal to 1 are transmitted over a channel. Due to transmission errors, each 1-bit can be affected independently from the others and only arrives non-corrupted with probability $p$. The length $N$ of the test strings (words) is a Poisson random variable with mean length $\lambda$ bits. In this test, the sum $Y$ of the bits in the arriving words is investigated to determine the channel quality via $p$. Compute the pdf of $Y$.
(ii) At a router, four QoS classes are supported and for each class packets arrive according to a Poisson process with rate $\lambda_j$ for $j = 1, 2, 3, 4$. Suppose that the router had a failure at time $t_1$ that lasted $T$ time units. What is the probability density function of the total number of packets of the four classes that has arrived during that period?
(iii) Let $N(t) = N_1(t) + N_2(t)$ be the sum of two independent Poisson processes with rates $\lambda_1$ and $\lambda_2$. Given that the process $N(t)$ had an arrival, what is the probability that that arrival came from the process $N_1(t)$?
(iv) Peter has been monitoring the highway for nearly his entire life and found that the cars pass his house according to a Poisson process. Moreover, he discovered that the Poisson process in one lane is independent from that in the other lanes. The rate of these independent processes differs per lane and is denoted by $\lambda_1, \lambda_2, \lambda_3$, where $\lambda_j$ is expressed in the number of cars on lane $j$ per hour.
(a) Given that one car passed Peter, what is the probability that it passed in lane 1?
(b) What is the probability that $n$ cars pass Peter in 1 hour?
(c) What is the probability that in 1 hour $n$ cars have passed and that they all have used lane 1?
(v) In a game, audio signals arrive in the interval $(0, T)$ according to a Poisson process with rate $\lambda$, where $T > 1/\lambda$. The player wins only if at least one audio signal arrives in that interval, and if he or she pushes a button (only one push allowed) upon the last of the signals. The player uses the following strategy: he or she pushes the button upon the arrival of the first signal (if any) after a fixed time $s \le T$.
(a) What is the probability that the player wins?
(b) Which value of $s$ maximizes the probability of winning, and what is the probability in that case?
(vi) The arrival process of voice over IP (VoIP) packets to a router is close to
a Poisson process with rate $\lambda = 0.1$ packets per minute. Due to an
upgrade to install weighted fair queueing as priority scheduling rule,
the router is switched off for 10 minutes.
(a) What is the probability of receiving no VoIP packets when
switched off?
(b) What is the probability that more than ten VoIP packets will
arrive during this upgrade?
(c) If there was one VoIP packet in the meantime, what is the most
probable minute of the arrival?
(vii) A link of a packet network carries on average ten packets per second.
The packets arrive according to a Poisson process. A packet has a
probability of 30 % to be an acknowledgment (ACK) packet independent of the others. The link is monitored during an interval of 1
second.
(a) What is the probability that at least one ACK packet has been
observed?
(b) What is the expected number of all packets given that five
ACK packets have been spotted on the link?
(c) Given that eight packets have been observed in total, what is
the probability that two of them are ACK packets?
(viii) An ADSL helpdesk treats exclusively customer requests of one of
three types: (i) login-problems, (ii) ADSL hardware and (iii) ADSL
software problems. The opening hours of the helpdesk are from 8:00
until 16:00. All requests are arriving at the helpdesk according to
a Poisson process with different rates: $\lambda_1 = 8$ requests with login
problems/hour, $\lambda_2 = 6$ requests with hardware problems/hour, and
$\lambda_3 = 6$ requests with software problems/hour. The Poisson arrival
processes for different types of requests are independent.
(a) What is the expected number of requests in one day?
(b) What is the probability that in 20 minutes exactly three requests arrive, and that all of them have hardware problems?
(c) What is the probability that no requests will arrive in the last
15 minutes of the opening hours?
(d) What is the probability that one request arrives between 10:00
and 10:12 and two requests arrive between 10:06 and 10:30?
(e) If at the moment $t + s$ there are $k + m$ requests, what is the
probability that there were $k$ requests at the moment $t$?
(ix) Arrival of virus attacks to a PC can be modeled by a Poisson process
with rate $\lambda = 6$ attacks per hour.
(a) What is the probability that exactly one attack will arrive
between 1 p.m. and 2 p.m.?
(b) Suppose that at the moment the PC is turned on there were no
attacks on the PC, but at the shut-down time precisely 60 attacks
have been observed. What is the expected amount of time
that the PC has been on?
(c) Given that six attacks arrive between 1 p.m. and 2 p.m., what
is the probability that the fifth attack will arrive between 1:30
p.m. and 2 p.m.?
(d) What is the expected arrival time of that fifth attack?
(x) Consider a system $S$ consisting of $n$ subsystems in series as shown
in Fig. 7.5. The system $S$ operates correctly only if all subsystems
operate correctly. Assume that the probability that a failure in a
subsystem $S_i$ occurs is independent of that in subsystem $S_j$. Given
the reliability functions $R_j(t)$ of each subsystem $S_j$, compute the
reliability function $R(t)$ of the system $S$.

Fig. 7.5. A system consisting of $n$ subsystems in series.

(xi) Same question as in the previous exercise but applied to a system $S$
consisting of $n$ subsystems in parallel as shown in Fig. 7.6.

Fig. 7.6. A system consisting of $n$ subsystems in parallel.

8
Renewal theory

A renewal process is a counting process for which the interarrival times $\tau_n$
are i.i.d. random variables with distribution $F_\tau(t)$. Hence, a renewal process
generalizes the exponential interarrival times in the Poisson process (see
Theorem 7.3.2) to an arbitrary distribution. Since the interarrival times are
i.i.d. random variables, at each event (or renewal) the process
probabilistically restarts. The classical example of a renewal process is the
successive replacement of light bulbs: the first bulb is installed at time $W_0$,
fails at time $W_1 = \tau_1$, and is immediately exchanged for a new bulb, which
in turn fails at $W_2 = \tau_1 + \tau_2$, and is thereafter replaced by a third bulb,
and so on. How many light bulbs are replaced in a period of $t$ time units given
the lifetime distribution $F_\tau(t)$?

Fig. 8.1. The relation between the renewal counting process $N(t)$, the interarrival
time $\tau_n$ and the waiting time $W_n$.

8.1 Basic notions

As illustrated in Fig. 8.1, the waiting time $W_n = \sum_{k=1}^{n} \tau_k$ (for $n \ge 1$, with
$W_0 = 0$ by convention) is related to the counting process $\{N(t), t \ge 0\}$ by the
equivalence $\{N(t) \ge n\} \Longleftrightarrow \{W_n \le t\}$: the number of events (renewals) up
to time $t$ is at least $n$ if and only if the $n$-th renewal occurred on or before
time $t$. Alternatively, the number of events by time $t$ equals the largest
value of $n$ for which the $n$-th event occurs before or at time $t$,
$N(t) = \max[n : W_n \le t]$. The convention that $W_0 = 0$ implies that $N(0) = 0$: the
counting process starts counting from zero at time 0. The main objective
of renewal theory is to deduce properties of the process $\{N(t), t \ge 0\}$ as a
function of the interarrival distribution $F_\tau(t) = \Pr[\tau \le t]$.
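The relation between the waiting times and the counting process can be made concrete with a small sketch. It assumes illustrative bulb lifetimes drawn uniformly from (0.5, 1.5), which is not a choice made in the text, and the helper names `waiting_times` and `count_renewals` are hypothetical.

```python
import bisect
import itertools
import random


def waiting_times(interarrivals):
    """Return [W_1, W_2, ...], where W_n is the sum of the first n interarrival times."""
    return list(itertools.accumulate(interarrivals))


def count_renewals(waits, t):
    """N(t) = max{n : W_n <= t}, and N(t) = 0 when W_1 > t."""
    # waits is increasing, so the number of entries <= t follows by bisection
    return bisect.bisect_right(waits, t)


rng = random.Random(1)                               # fixed seed, illustrative only
taus = [rng.uniform(0.5, 1.5) for _ in range(20)]    # assumed lifetimes, not from the text
W = waiting_times(taus)
t = 5.0
n_t = count_renewals(W, t)
print(n_t, W[:n_t])                                  # N(t) and the renewal epochs up to t
```

For every $n$, `count_renewals(W, t) >= n` holds exactly when `W[n - 1] <= t`, which is the equivalence between $\{N(t) \ge n\}$ and $\{W_n \le t\}$.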
8.1.1 The distribution of the waiting time $W_n$
If we assume that the interarrival times are i.i.d. having a Laplace transform
$$\varphi_\tau(z) = \int_0^\infty e^{-zt}\, dF_\tau(t) = \int_0^\infty e^{-zt} f_\tau(t)\, dt$$
the waiting time $W_n$ is the sum of $n$ i.i.d. random variables specified by
(2.66) as
$$\varphi_{W_n}(z) = \int_0^\infty e^{-zt} f_{W_n}(t)\, dt = \varphi_\tau^n(z). \tag{8.1}$$
By partial integration, we find the Laplace transform of the distribution
$F_{W_n}(t) = \Pr[W_n \le t] = \int_0^t f_{W_n}(u)\, du$,
$$\int_0^\infty e^{-zt} F_{W_n}(t)\, dt = \frac{\varphi_{W_n}(z)}{z} = \frac{\varphi_\tau^n(z)}{z} \tag{8.2}$$
The inverse Laplace transform follows (see footnote 1) with (2.38) as
$$\Pr[W_n \le t] = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\varphi_\tau^n(z)}{z}\, e^{zt}\, dz \tag{8.3}$$

As an alternative to the approach with probability generating functions,
we can resort to the $n$-th convolution, which follows from (2.63) as
$$f_{W_1}(t) = f_\tau(t)$$
$$f_{W_n}(t) = \int_{-\infty}^{\infty} f_{W_{n-1}}(t-y) f_\tau(y)\, dy = \int_0^t f_{W_{n-1}}(t-y) f_\tau(y)\, dy$$
Integrated,
$$\Pr[W_n \le t] = \int_{-\infty}^{t} du \int_{-\infty}^{\infty} f_{W_{n-1}}(u-y) f_\tau(y)\, dy
= \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{t-y} f_{W_{n-1}}(u)\, du \right] f_\tau(y)\, dy
= \int_0^t \Pr[W_{n-1} \le t-y]\, f_\tau(y)\, dy$$

1 In general, by integration of (2.38), we find
$$F_X(t) = \int_0^t f_X(u)\, du = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \varphi_X(z)\, \frac{e^{zt}-1}{z}\, dz$$
whose form seems different from (8.3). However, $\frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\varphi_X(z)}{z}\, dz = 0$ because the contour
can be closed over the positive $\mathrm{Re}(z) > c$ plane where $\varphi_X(z)$ is analytic and because
$\lim_{R\to\infty} \varphi_X(Re^{i\theta}) = 0$ for $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$, which follows from the existence of the Laplace integral
$\int_0^\infty e^{-zt} f_X(t)\, dt$.
By denoting $\Pr[W_n \le t] = F_\tau^{(n*)}(t)$, we have
$$F_\tau^{(1*)}(t) = F_\tau(t)$$
$$F_\tau^{(n*)}(t) = \int_0^t F_\tau^{((n-1)*)}(t-y)\, f_\tau(y)\, dy$$
These equations also show that we can define $F_\tau^{(0*)}(t) = 1$. Let us define
$U_n(t) = \sum_{k=1}^{n} F_\tau^{(k*)}(t)$. By summing both sides in the last equation, we
obtain
$$U_n(t) = \int_0^t \sum_{k=1}^{n} F_\tau^{((k-1)*)}(t-y)\, f_\tau(y)\, dy = \int_0^t \sum_{k=0}^{n-1} F_\tau^{(k*)}(t-y)\, f_\tau(y)\, dy$$
With the definition $F_\tau^{(0*)}(t) = 1$, we arrive at
$$U_n(t) = \int_0^t U_{n-1}(t-y)\, dF_\tau(y) + F_\tau(t) \tag{8.4}$$
or, written in terms of convolutions,
$$U_n(t) = (U_{n-1} \ast F_\tau)(t) + F_\tau(t)$$

Finally, we mention the interesting bound on the convolution $F_\tau^{(n*)}(t)$ for
a non-negative random variable $\tau$,
$$F_\tau^{(n*)}(t) = \int_0^t F_\tau^{((n-1)*)}(t-y)\, dF_\tau(y) \le F_\tau^{((n-1)*)}(t) \int_0^t dF_\tau(y) = F_\tau^{((n-1)*)}(t)\, F_\tau(t)$$
which follows from the monotone increasing nature of any distribution
function. By iteration on $n$ starting from $F_\tau^{(0*)}(t) = 1$, it is immediate that
$$F_\tau^{(n*)}(t) \le (F_\tau(t))^n \tag{8.5}$$
Since $(F_\tau(t))^n$ is the distribution of the maximum (3.33) of a set of $n$
i.i.d. random variables $\{\tau_k\}_{1 \le k \le n}$, the bound (8.5) means that, for $\tau_k \ge 0$,
$$\Pr\left[ \sum_{k=1}^{n} \tau_k \le x \right] \le \Pr\left[ \max_{1 \le k \le n} \tau_k \le x \right]$$
which is rather obvious because $\sum_{k=1}^{n} \tau_k \ge \max_{1 \le k \le n} \tau_k$. The equality sign
is only possible if $n - 1$ of the $\tau_k$ are zero.
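The bound (8.5) can be checked numerically in a case where the $n$-fold convolution is known in closed form. The sketch below assumes exponential interarrival times with an arbitrary rate `lam` (an illustrative choice), for which $W_n$ has an Erlang distribution; the function names are hypothetical.

```python
import math


def erlang_cdf(n, lam, t):
    """Pr[W_n <= t] for the sum of n i.i.d. exponential(lam) interarrival times."""
    if n == 0:
        return 1.0                 # convention F^(0*)(t) = 1
    term, total = 1.0, 0.0
    for k in range(n):             # sum_{k=0}^{n-1} (lam t)^k / k!
        total += term
        term *= lam * t / (k + 1)
    return 1.0 - math.exp(-lam * t) * total


def max_cdf(n, lam, t):
    """Pr[max of n i.i.d. exponential(lam) variables <= t] = F(t)^n."""
    return (1.0 - math.exp(-lam * t)) ** n


lam = 2.0                          # illustrative rate, not taken from the text
for n in (1, 2, 5):
    for t in (0.1, 1.0, 3.0):
        # the first probability never exceeds the second, as (8.5) states
        print(n, t, erlang_cdf(n, lam, t), max_cdf(n, lam, t))
```

For $n = 1$ both columns coincide, and for larger $n$ the gap widens because the sum of $n$ positive lifetimes is much larger than their maximum.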
8.1.2 The renewal function $m(t) = E[N(t)]$
From the equivalence $\{N(t) \ge n\} \Longleftrightarrow \{W_n \le t\}$, we directly have
$$\Pr[N(t) \ge n] = \Pr[W_n \le t] = F_\tau^{(n*)}(t) \tag{8.6}$$
$$\Pr[N(t) = n] = \Pr[N(t) \ge n] - \Pr[N(t) \ge n+1] = F_\tau^{(n*)}(t) - F_\tau^{((n+1)*)}(t)$$
The expected number of events in $(0, t]$ expressed via the tail probabilities
(2.36) follows with (8.6) as
$$m(t) = E[N(t)] = \sum_{k=1}^{\infty} F_\tau^{(k*)}(t) \tag{8.7}$$
and $m(t)$ is called the renewal function. According to a property of the
counting process, $N(0) = 0$, the number of events in $(0, t]$ when $t \to 0$ is
assumed to be zero such that $m(0) = 0$. From (8.5), it follows at each point
$t$ for which $F_\tau(t) < 1$ that
$$m(t) \le \sum_{k=1}^{\infty} (F_\tau(t))^k = \frac{1}{1 - F_\tau(t)} - 1$$
Hence, for finite $t$ where $F_\tau(t) < 1$, the renewal function $m(t)$ converges at
least as fast as a geometric series and is bounded. In the limit $t \to \infty$, where
$\lim_{t\to\infty} F_\tau(t) = 1$, we see that $m(t)$ is not bounded anymore. Intuitively, the
number of repeated events (renewals) in an infinite time interval is clearly
infinite.
The renewal function $m(t)$ completely characterizes the renewal process.
Indeed, if $\varphi_m(z)$ is the Laplace transform of $m(t)$, then after taking the
Laplace transform of both sides in (8.7) and using the definition
$\Pr[W_n \le t] = F_\tau^{(n*)}(t)$ together with (8.2), we obtain
$$\varphi_m(z) = \frac{1}{z} \sum_{k=1}^{\infty} \varphi_\tau^k(z) = \frac{1}{z}\, \frac{\varphi_\tau(z)}{1 - \varphi_\tau(z)} \tag{8.8}$$
provided $|\varphi_\tau(z)| < 1$. From this expression, the interarrival time
distribution can be found from
$$\varphi_\tau(z) = \frac{z \varphi_m(z)}{1 + z \varphi_m(z)}$$
after inverse Laplace transform. By taking the inverse Laplace transform
(2.38), $m(t)$ is written as a complex integral
$$m(t) = \frac{1}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{\varphi_\tau(z)}{1 - \varphi_\tau(z)}\, \frac{e^{zt}}{z}\, dz$$
8.1.3 The renewal equation
After taking the inverse Laplace transform of $\varphi_m(z) = \varphi_m(z) \varphi_\tau(z) + \frac{\varphi_\tau(z)}{z}$,
which is deduced from (8.8), a third relation for $m(t)$ that often occurs is
$$m(t) = \int_0^t m(t-u)\, dF_\tau(u) + F_\tau(t) = \int_0^t F_\tau(t-u)\, dm(u) + F_\tau(t) \tag{8.9}$$
and is called the renewal equation. Taking the limit $n \to \infty$ in (8.4) also
leads to the renewal equation. Since $m(0) = 0$, the renewal equation implies
that $F_\tau(0) = \Pr[\tau \le 0] = 0$ or that processes where a zero interarrival time
is possible (e.g. in simultaneous events) are ruled out. For a Poisson process,
Theorem 7.3.1 states that the probability of simultaneous events ($h \to 0$) is
zero. The requirement $m(0) = 0$ generalizes the exclusion of simultaneous
events in any renewal process.
The probabilistic argument that leads to the renewal equation is as follows.
By conditioning on the first renewal for $k > 0$,
$$\Pr[N(t) = k \mid W_1 = s] = \begin{cases} 0 & t < s \\ \Pr[N(t-s) = k-1] & t \ge s \end{cases}$$
where in the last case for $t \ge s$ the event $\{N(t) = k\}$ is only possible if $k - 1$
renewals occur in time interval $(s, t]$, which is, due to the stationarity of the
renewal process, equal to $k - 1$ renewals in $(0, t-s]$. By the law of total
probability (2.46), we uncondition to find for $k \ge 1$,
$$\Pr[N(t) = k] = \int_0^\infty \Pr[N(t) = k \mid W_1 = s]\, \frac{d \Pr[W_1 \le s]}{ds}\, ds = \int_0^t \Pr[N(t-s) = k-1]\, f_\tau(s)\, ds \tag{8.10}$$
Multiplying both sides by $k$ and summing over all $k \ge 1$ gives the average
at the left-hand side,
$$E[N(t)] = \sum_{k=1}^{\infty} k \Pr[N(t) = k]$$
The sum at the right-hand side is
$$\sum_{k=1}^{\infty} k \Pr[N(t-s) = k-1] = \sum_{k=0}^{\infty} (k+1) \Pr[N(t-s) = k] = E[N(t-s)] + 1$$
Combining both sides yields
$$E[N(t)] = F_\tau(t) + \int_0^t E[N(t-s)]\, dF_\tau(s)$$
which is again the renewal equation (8.9) since $m(t) = E[N(t)]$.
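When no closed form is available, the renewal equation (8.9) can be solved numerically on a grid. The following is a minimal sketch, not a method from the text: it assumes a uniform grid and approximates the Stieltjes integral over each grid cell by the average of $m(t-u)$ at the two cell ends times the increment of $F_\tau$. The function name `renewal_function` and the test distributions are illustrative assumptions.

```python
import math


def renewal_function(F, t_max, steps):
    """Approximate m(t) on a uniform grid by discretizing the renewal equation (8.9).

    F is the interarrival distribution function with F(0) = 0.
    Returns the list [m(0), m(h), ..., m(t_max)] with h = t_max / steps.
    """
    h = t_max / steps
    Fv = [F(i * h) for i in range(steps + 1)]
    dF = [0.0] + [Fv[j] - Fv[j - 1] for j in range(1, steps + 1)]
    m = [0.0] * (steps + 1)                    # m(0) = 0
    for i in range(1, steps + 1):
        acc = Fv[i] + 0.5 * m[i - 1] * dF[1]   # known half of the first cell
        for j in range(2, i + 1):
            acc += 0.5 * (m[i - j] + m[i - j + 1]) * dF[j]
        m[i] = acc / (1.0 - 0.5 * dF[1])       # the other half involves m(t_i) itself
    return m


lam = 1.5                                       # illustrative exponential rate
m_exp = renewal_function(lambda t: 1.0 - math.exp(-lam * t), 4.0, 400)
print(m_exp[-1], lam * 4.0)                     # numerical value next to lam * t
```

For exponential interarrival times the output can be compared with the linear renewal function of the Poisson process. For interarrival times uniform on $(0, 1)$, differentiating (8.9) for $t \le 1$ gives $m'(t) = 1 + m(t)$, so $m(t) = e^t - 1$ on that range, which offers a second check.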


8.1.4 A generalization of the renewal equation
The renewal equation (8.9) is a special case of the more general class of
integral equations
$$Y(t) = h(t) + \int_0^t Y(t-u)\, dF(u), \qquad t \ge 0 \tag{8.11}$$
in the unknown function $Y(t)$, where $h(t)$ is a known function and $F(t)$ is
a distribution function. This equation can be written using the convolution
notation as
$$Y(t) = h(t) + Y \ast F(t)$$
By conditioning on the first renewal as shown above, many renewal problems
can be recast into the form of the general renewal equation (8.11). An
example is the derivation of the residual life or waiting time given in Section
8.3. Therefore, it is convenient to present the solution to the general renewal
equation (8.11).

Lemma 8.1.1 If $h(t)$ is bounded for all $t$, then the unique solution of the
general renewal equation (8.11) is
$$Y(t) = h(t) + \int_0^t h(t-u)\, dm(u) \tag{8.12}$$
where $m(t) = \sum_{k=1}^{\infty} F^{(k*)}(t)$ is the renewal function.

Proof: Let us first concentrate on the formal solution. In general,
convolutions are best treated in the transformed domain. After taking the Laplace
transform of the general renewal equation (8.11), we obtain
$$\varphi_Y(z) = \varphi_h(z) + \varphi_Y(z)\, \varphi_F(z)$$
such that
$$\varphi_Y(z) = \frac{\varphi_h(z)}{1 - \varphi_F(z)}$$
There always exists a region in the $z$-domain where $|\varphi_F(z)| < 1$ such that
the geometric series applies,
$$\varphi_Y(z) = \varphi_h(z) \sum_{k=0}^{\infty} (\varphi_F(z))^k = \varphi_h(z) + \varphi_h(z) \sum_{k=1}^{\infty} (\varphi_F(z))^k$$
Back transforming and taking into account that $(\varphi_F(z))^k$ is the transform
of a $k$-fold convolution yields
$$Y(t) = h(t) + h \ast \sum_{k=1}^{\infty} F^{(k*)}(t) = h(t) + h \ast m(t)$$
This formal manipulation demonstrates (see footnote 2) that (8.12) is a solution of the
general renewal equation (8.11).
Suppose now that there are two solutions $Y_1(t)$ and $Y_2(t)$. Their difference
$V(t) = Y_1(t) - Y_2(t)$ obeys
$$V(t) = \int_0^t V(t-u)\, dF(u) = V \ast F(t)$$
By convolving both sides with $F$ and using the original equation, we deduce
that $V(t) = V \ast F \ast F(t)$. Continuing this process, for each $k$, we have
that $V(t) = V \ast F^{(k*)}(t)$. Since $F^{(k*)}(t) \to 0$ for all finite $t$ and $k \to \infty$
(because $m(t)$ exists for all finite $t$), and if $V(t)$ is bounded, this implies that
$V(t) = 0$ for all finite $t$. This demonstrates the uniqueness and motivates
the requirement that $h(t)$ should be bounded.

2 Alternatively, by substituting the solution into the equation, a check is
$$Y(t) = h(t) + Y \ast F(t) = h(t) + h \ast F(t) + h \ast m \ast F(t)$$
$$= h(t) + h \ast \left[ F(t) + \sum_{k=1}^{\infty} F^{(k*)}(t) \ast F(t) \right]$$
$$= h(t) + h \ast \left[ F(t) + \sum_{k=2}^{\infty} F^{(k*)}(t) \right] = h(t) + h \ast \left[ \sum_{k=1}^{\infty} F^{(k*)}(t) \right]$$
$$= h(t) + h \ast m(t)$$

8.1.5 The renewal function for a Poisson process


Before showing below that the renewal function $m(t)$ can be specified in
detail as $t \to \infty$, we consider first the Poisson process where the interarrival
times $\{\tau_n\}_{n \ge 1}$ are i.i.d. exponentially distributed with rate $\lambda$. Since
$\varphi_\tau(z) = \frac{\lambda}{z + \lambda}$,
$$m(t) = \frac{\lambda}{2\pi i} \int_{c-i\infty}^{c+i\infty} \frac{e^{zt}}{z^2}\, dz, \qquad c > 0$$
The contour can be closed over the negative $\mathrm{Re}(z)$-plane (because $t \ge 0$).
The only singularity of the integrand is a double pole at $z = 0$ with residue
$m(t) = \lambda \left. \frac{d e^{zt}}{dz} \right|_{z=0} = \lambda t$. This result, of course, follows directly from the
definition of the Poisson process given in (7.2). We see that the renewal
function $m(t)$ for the Poisson process is linear for all $t$. Moreover, the Poisson
process is the only continuous-time renewal process with a linear renewal
function $m(t)$. Indeed, if (see footnote 3) $m(t) = \alpha t$, the renewal equation is
$$\alpha t = \int_0^t \alpha (t-u)\, dF_\tau(u) + F_\tau(t) = \alpha \int_0^t F_\tau(u)\, du - \alpha t F_\tau(0) + F_\tau(t)$$
By differentiation with respect to $t$ and assuming non-zero interarrival times
such that $F_\tau(0) = \Pr[\tau \le 0] = 0$, we obtain a differential equation
$$\alpha = \alpha F_\tau(t) + \frac{dF_\tau(t)}{dt}$$
whose solution is $F_\tau(t) = 1 - e^{-\alpha t}$. By Theorem 7.3.2, exponential
interarrival times characterize a Poisson process with rate $\lambda = \alpha$.

3 A linear form $m(t) = \alpha t + \beta$ with $\beta \ne 0$ is impossible because $m(0) = 0$.

8.2 Limit theorems
In the limit $t \to \infty$, the equivalence relation (8.6) indicates that, for any
fixed value of $n$, $\Pr[N(t) \ge n] = 1$, which means that the number of events
$N(t) \to \infty$ as $t \to \infty$. Let us consider $\frac{W_{N(t)}}{N(t)}$, which is the sample mean
of the first $N(t)$ interarrival times in the interval $(0, t]$. The Strong Law of
Large Numbers (6.3) indicates that $\Pr\left[ \lim_{n\to\infty} \frac{W_n}{n} = \mu \right] = 1$ and, because
$N(t) \to \infty$ as $t \to \infty$, we have that $\frac{W_{N(t)}}{N(t)} \to \mu = E[\tau]$ as $t \to \infty$. Since
$W_{N(t)} \le t < W_{N(t)+1}$, we obtain the inequality
$$\frac{W_{N(t)}}{N(t)} \le \frac{t}{N(t)} < \frac{W_{N(t)+1}}{N(t)}$$
Since both lower and upper bound tend to $\mu$, we arrive at the important
result that $\lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{\mu}$. The random variable counting the number
of events in $(0, t]$ per interval length $t$ converges to the reciprocal of the
average interarrival time $\mu = E[\tau]$. Unfortunately (see footnote 4), we cannot
simply deduce the intuitive result that also the expectation, $E\left[ \frac{N(t)}{t} \right]$,
tends to $\frac{1}{\mu}$. On the other hand, the expectation of $W_{N(t)}$ is obtained from
Wald's identity (2.69) as $E[W_{N(t)}] = E[N(t)]\, E[\tau]$. Taking the expectation in
the inequality $W_{N(t)} \le t < W_{N(t)+1}$ leads to
$\frac{E[N(t)]}{t} \le \frac{1}{E[\tau]} < \frac{E[N(t)]}{t} + \frac{1}{t}$ from which, after
the limit $t \to \infty$, the intuitive result follows. Thus, we have proved (see
footnote 5) the following theorem:

Theorem 8.2.1 (Elementary Renewal Theorem) If $\mu = E[\tau]$ is the
average interarrival time of events in the renewal process, then
$$\lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{\mu}, \qquad \lim_{t\to\infty} \frac{E[N(t)]}{t} = \lim_{t\to\infty} \frac{m(t)}{t} = \frac{1}{\mu} \tag{8.13}$$

The left-hand side in (8.13) describes the long-run average number of
events (renewals) per unit time. The right-hand side is the reciprocal of
the average interarrival time (or lifetime). For example, in the light bulb
replacement process, if a bulb lasts on average $\mu$ time units, then, in the long
run or steady state, the light bulbs must be replaced at rate $\frac{1}{\mu}$ per time unit.
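The long-run replacement rate can be illustrated by simulation. The sketch below assumes bulb lifetimes that are uniform on $(0, 2)$, so that $\mu = 1$; this distribution, the seed and the function names are illustrative assumptions rather than choices from the text.

```python
import random


def renewals_up_to(t, draw):
    """Count the renewals in (0, t] when interarrival times are produced by draw()."""
    n, w = 0, draw()
    while w <= t:
        n += 1
        w += draw()
    return n


rng = random.Random(42)            # fixed seed, illustrative only


def lifetime():
    """Assumed lifetime law: uniform on (0, 2), hence mean mu = 1."""
    return rng.uniform(0.0, 2.0)


for t in (10.0, 1_000.0, 100_000.0):
    print(t, renewals_up_to(t, lifetime) / t)   # N(t)/t for growing t
```

As $t$ grows, the printed ratio $N(t)/t$ settles near $1/\mu = 1$, in line with (8.13).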
4 As remarked by Ross (1996, p. 108), if $U$ is uniformly distributed on $(0, 1)$, consider the random
variables $Y_n$ defined as $Y_n = n 1_{\{U \le \frac{1}{n}\}}$. For large $n$, $U > 0$ with probability 1, whence $Y_n \to 0$
if $n \to \infty$. However, $E[Y_n] = n E\left[ 1_{\{U \le \frac{1}{n}\}} \right] = n \cdot \frac{1}{n} = 1$, for all $n$. The sequence of random
variables $Y_n$ converges to 0, although the expected values of $Y_n$ are all precisely 1.
5 The elementary renewal theorem can be proved only by resorting to complex function theory
and using Laplace-Stieltjes transforms (Cohen, 1969, p. 100). The limit argument provided
by the Strong Law of Large Numbers follows then from a Tauberian theorem.

The extension (see footnote 6) of the Elementary Renewal Theorem is the Key Renewal
Theorem. The Key Renewal Theorem gives the limit $t \to \infty$ of the solution
(8.12) of the general renewal equation (8.11).
Theorem 8.2.2 (Key Renewal Theorem) If $g(t)$ is directly (see footnote 7) Riemann
integrable over $[0, \infty)$, then
$$\lim_{t\to\infty} \int_0^t g(t-u)\, dm(u) = \frac{1}{\mu} \int_0^\infty g(u)\, du \tag{8.14}$$
The proof (see footnote 8) is more complicated, based on analysis and found in Feller
(1971, Section XI.1). The essential difficulty is demonstrating that the limit
at the right-hand side indeed exists. An application of the Key Renewal
Theorem is presented in Section 8.3 and here we consider Blackwell's Theorem.
Blackwell's Theorem follows from the Key Renewal Theorem when choosing
$h(t) = 1_{\{t \in [0, T)\}}$ in the general renewal equation (8.11). The corresponding
solution (8.12) for $t > T$ is
$$Y(t) = \int_0^t 1_{\{t-u \in [0, T)\}}\, dm(u) = \int_{t-T}^{t} dm(u) = m(t) - m(t-T)$$
while the Key Renewal Theorem states that $\lim_{t\to\infty} Y(t) = \frac{1}{\mu} \int_0^\infty h(u)\, du = \frac{T}{\mu}$.
Hence, we arrive at Blackwell's Theorem, for any fixed $T > 0$,
$$\lim_{t\to\infty} \frac{m(t) - m(t-T)}{T} = \frac{1}{\mu}$$
The interpretation of Blackwell's Theorem is that the number of expected
renewals in an interval with length $T$ sufficiently far from the origin (or in
steady-state regime) is approximately equal to $\frac{T}{\mu}$. It can be shown that
the reverse, i.e. the Key Renewal Theorem can be deduced from Blackwell's
theorem, also holds. Hence, the Key Renewal Theorem is equivalent to
Blackwell's Theorem.
6 In the sequel we assume that the distribution of the interarrival times $F_\tau(t)$ is not periodic in
the sense that there exists no integer $d$ such that $\sum_{n=0}^{\infty} \Pr[\tau = nd] = 1$ or, the random variable
$\tau$ does not only take integer units of some integer $d$.
7 The concept is introduced to avoid wildly oscillating functions that are still integrable over
$[0, \infty)$, such as $g(t) = t 1_{\{|t-n| < \frac{1}{n^2}\}}$. The precise definition is given in Feller (1971). A
sufficient condition for direct Riemann integrability is (a) $g(t) \ge 0$ for all $t \ge 0$, (b) $g(t)$ is
non-increasing and (c) $\int_0^\infty g(u)\, du < \infty$.
8 Based on the relatively new probabilistic concept of coupling, alternative proofs of the Key
Renewal Theorem exist (see e.g. Grimmett and Stirzaker (2001, pp. 429-430)).

Similarly to the Key Renewal Theorem the difficulty in Blackwell's Theorem
is the proof that the limit exists. If the existence of the limit is proved,
which means that $\lim_{t\to\infty} m(t) - m(t-T) = a(T)$ exists, the Elementary
Renewal Theorem suffices to prove that the limit has value $\frac{1}{\mu}$. Following the
argument of Ross (1996, p. 110), we can write, for finite $x$ and $y$,
$$a(x+y) = \lim_{t\to\infty} [m(t) - m(t-x-y)]$$
$$= \lim_{t\to\infty} [m(t) - m(t-x)] + \lim_{t\to\infty} [m(t-x) - m(t-x-y)]$$
$$= a(x) + a(y)$$
Apart from the trivial solution $a(x) = 0$, the only other (see footnote 9) solution of
$a(x+y) = a(x) + a(y)$ is $a(x) = cx$, where $c$ is a constant. Hence, given that
$\lim_{t\to\infty} m(t) - m(t-T) = a(T)$ exists, this is equivalent to the fact that
the sequence $\{b_n\}_{n \ge 0}$ where $b_n = \frac{m(t_n) - m(t_n - T)}{T}$ and $t_n > t_{n-1}$ converges
to a constant $c$. The simplest sequence with this property is $\{b_n^*\}_{n \ge 0}$ where
$b_n^* = m(n) - m(n-1)$ and $T = 1$. Lemma 6.1.1 states that
$$c = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} b_k^* = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} [m(k) - m(k-1)] = \lim_{n\to\infty} \frac{m(n)}{n} = \frac{1}{\mu}$$
where the last equality follows from the Elementary Renewal Theorem (8.13).
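9 The proof is as follows: (i) if $y = 0$, we see that $a(x + 0) = a(x) + a(0)$ or $a(0) = 0$. (ii)
$a(nx) = n a(x)$ for integer $n$. (iii) Using (ii), we have that $a(nx + my) = n a(x) + m a(y)$. By
choosing $nx + my = 0$ it follows from (i) that $a\left( -\frac{m}{n} y \right) = -\frac{m}{n} a(y)$ such that (ii) holds for
rational numbers. Thus, $a(q_1 x + q_2 y) = q_1 a(x) + q_2 a(y)$ for rational numbers $q_1$ and $q_2$. (iv)
Recalling the definition $f\left( \frac{u+v}{2} \right) \le \frac{f(u) + f(v)}{2}$ of a convex function in Section 5.2 and the fact
that a function that is both concave and convex is a linear function, it follows that $a(x)$ is
linear and with (i) that $a(x) = cx$.

Theorem 8.2.3 (Asymptotic Renewal Distribution) If the average
$\mu = E[\tau]$ and variance $\sigma^2 = \mathrm{Var}[\tau]$ of the interarrival time of the events in
a renewal process exist, then
$$\lim_{t\to\infty} \Pr\left[ \frac{N(t) - \frac{t}{\mu}}{\sigma \sqrt{\frac{t}{\mu^3}}} < x \right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-u^2/2}\, du \tag{8.15}$$

Proof: The Elementary Renewal Theorem states that $N(t) \sim \frac{t}{\mu}$ for large
$t$, which suggests to consider the random variable $U(t) = N(t) - \frac{t}{\mu}$ with
$E[U(t)] \to 0$. From the equivalence $\{N(t) < n\} \Longleftrightarrow \{W_n > t\}$, we have
$\{U(t) < x_t\} \Longleftrightarrow \{W_{x_t + \frac{t}{\mu}} > t\}$ where $x_t$ is such that $x_t + \frac{t}{\mu}$ is a positive
integer. Then,
$$\Pr[U(t) < x_t] = \Pr\left[ W_{x_t + \frac{t}{\mu}} > t \right]
= \Pr\left[ \frac{W_{x_t + \frac{t}{\mu}} - \left( x_t + \frac{t}{\mu} \right) \mu}{\sigma \sqrt{x_t + \frac{t}{\mu}}} > \frac{t - \left( x_t + \frac{t}{\mu} \right) \mu}{\sigma \sqrt{x_t + \frac{t}{\mu}}} \right]$$
The waiting time $W_n$ consists of a sum of i.i.d. random variables with mean
$\mu$ and variance $\sigma^2$. By the Central Limit Theorem 6.3.1, there holds that
$$\lim_{n\to\infty} \Pr\left[ \frac{W_n - n\mu}{\sigma \sqrt{n}} > x \right] = \frac{1}{\sqrt{2\pi}} \int_{x}^{\infty} e^{-u^2/2}\, du$$
which implies that
$$\lim_{t\to\infty} \Pr\left[ \frac{W_{x_t + \frac{t}{\mu}} - \left( x_t + \frac{t}{\mu} \right) \mu}{\sigma \sqrt{x_t + \frac{t}{\mu}}} > \frac{t - \left( x_t + \frac{t}{\mu} \right) \mu}{\sigma \sqrt{x_t + \frac{t}{\mu}}} \right] = \frac{1}{\sqrt{2\pi}} \int_{y}^{\infty} e^{-u^2/2}\, du$$
provided $\lim_{t\to\infty} \frac{t - \left( x_t + \frac{t}{\mu} \right) \mu}{\sigma \sqrt{x_t + \frac{t}{\mu}}} = y$. Hence, we must determine $x_t$ such that,
for large $t$,
$$\frac{-x_t\, \mu}{\sigma \sqrt{x_t + \frac{t}{\mu}}} = y$$
which is satisfied if $x_t = \frac{y^2 \sigma^2}{2 \mu^2} \left( 1 \pm \sqrt{1 + \frac{4 \mu t}{y^2 \sigma^2}} \right)$ and provided the
negative sign is chosen. For large $t$, we see that $x_t \approx -\frac{y \sigma}{\mu} \sqrt{\frac{t}{\mu}} + O(1)$. Thus,
$$\lim_{t\to\infty} \Pr\left[ U(t) < -\frac{y \sigma}{\mu} \sqrt{\frac{t}{\mu}} \right] = \frac{1}{\sqrt{2\pi}} \int_{y}^{\infty} e^{-u^2/2}\, du$$
which is equivalent to
$$\lim_{t\to\infty} \Pr\left[ \frac{N(t) - \frac{t}{\mu}}{\sigma \sqrt{\frac{t}{\mu^3}}} < x \right] = \frac{1}{\sqrt{2\pi}} \int_{-x}^{\infty} e^{-u^2/2}\, du$$
Noting that $\int_{-x}^{\infty} e^{-u^2/2}\, du = \int_{-\infty}^{x} e^{-u^2/2}\, du$ finally proves (8.15).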

Comparing Theorem 8.2.3 to the Central Limit Theorem 6.3.1 shows that
the asymptotic variance of $N(t)$ behaves as
$$\lim_{t\to\infty} \frac{\mathrm{Var}[N(t)]}{t} = \frac{\sigma^2}{\mu^3} \tag{8.16}$$
Moreover, Theorem 8.2.3 is a central limit theorem for the dependent random
variables $N(W_n)$ where dependence is obvious from $N(W_n) = N(W_{n-1}) + 1$.
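The variance growth in (8.16) can be estimated by repeated simulation. This is a rough sketch under illustrative assumptions: interarrival times uniform on $(0, 1)$, so that $\mu = 1/2$, $\sigma^2 = 1/12$ and $\sigma^2/\mu^3 = 2/3$; the horizon, the number of runs and the function names are arbitrary choices.

```python
import random
import statistics


def renewals_up_to(t, draw):
    """Count the renewals in (0, t] when interarrival times are produced by draw()."""
    n, w = 0, draw()
    while w <= t:
        n += 1
        w += draw()
    return n


def variance_rate(t, runs, draw):
    """Estimate Var[N(t)] / t from independent simulation runs."""
    counts = [renewals_up_to(t, draw) for _ in range(runs)]
    return statistics.variance(counts) / t


rng = random.Random(3)                       # fixed seed, illustrative only
est = variance_rate(200.0, 2000, rng.random)  # rng.random() is uniform on [0, 1)
print(est, (1.0 / 12.0) / 0.5 ** 3)          # estimate next to sigma^2 / mu^3
```

The estimate fluctuates from seed to seed by a few percent, since a sample variance over 2000 runs is itself a noisy quantity.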
8.3 The residual waiting time
Suppose we inspect a renewal process at time $t$ and ask the question "How
long do we have to wait on average to see the next renewal?" This question
frequently arises in renewal problems. For instance, the arrivals of taxis at
a station form a renewal process and, often, we are interested to know how
long we have to wait until the next taxi. Also, packets arriving at a router
may find an earlier packet that is partially served. In order to compute the
total time spent in the system, it is desirable to know the residual service
time of that packet. In addition, this problem is one of the classical
examples that demonstrate how misleading intuition in probability problems
can be. There are two different arguments to the question above leading to
two different answers:
(i) since my inspection of the process does not alter or influence the
process, the distribution of my waiting time should not depend on the
time $t$; hence, my average waiting time equals the average interarrival
time of the renewal process.
(ii) the time $t$ of the inspection is chosen at random in (i.e. uniformly
distributed over) the interval between two consecutive renewals; hence
my expected waiting time should be half of the average interarrival
time.
Both arguments seem reasonable although it is plain that one of them
must be wrong. Let us try to sort out the correct answer to this apparent
paradox, which, according to Feller (1971, pp. 12-13), has puzzled many
before its solution was properly understood.
Fig. 8.2. Definition of the random variables: the age $A(t)$, the lifetime $L(t)$ and the
residual life (or waiting time) $R(t)$.

Figure 8.2 defines the setting of the renewal problem and the quantities of
interest: $A(t)$ is the age at time $t$, which is the total time elapsed since the
last renewal before $t$ at time $W_{N(t)}$, the residual waiting time (or residual
life or excess life) $R(t)$ is the remaining time at $t$ until the next renewal at
time $W_{N(t)+1}$ and $L(t)$ is the total waiting time (or lifetime). From Fig. 8.2,
we verify that
$$A(t) = t - W_{N(t)}$$
$$R(t) = W_{N(t)+1} - t$$
$$L(t) = W_{N(t)+1} - W_{N(t)} = A(t) + R(t)$$
The distribution of the residual waiting time, $F_{R(t)}(x) = \Pr[R(t) \le x]$ will
be derived. Similar to the probabilistic argument before, we condition on
the first renewal. If $W_1 = s \le t$, then the first renewal occurs before time
$t$ and the event $\{R(t) > x \mid W_1 = s\}$ has the same probability as the event
$\{R(t-s) > x\}$ because the renewal process restarts from scratch at time $s$.
If $s > t$, the residual waiting time $R(t)$ lies in the first renewal interval $[0, s]$.
In this case, the residual waiting time $R(t)$ is certainly shorter than $x$ if $s$ is
contained in the interval $[t, t+x]$; otherwise, the residual waiting time $R(t)$
is surely larger than $x$. In summary,
$$\Pr[R(t) > x \mid W_1 = s] = \begin{cases} \Pr[R(t-s) > x] & \text{if } 0 \le s \le t \\ 0 & \text{if } t < s \le t + x \\ 1 & \text{if } s > x + t \end{cases}$$
Using the law of total probability (2.46),
$$\Pr[R(t) > x] = \int_0^\infty \Pr[R(t) > x \mid W_1 = s]\, \frac{d \Pr[W_1 \le s]}{ds}\, ds$$
$$= \int_0^t \Pr[R(t-s) > x]\, f_\tau(s)\, ds + \int_{x+t}^{\infty} \frac{d \Pr[\tau \le s]}{ds}\, ds$$
$$= \int_0^t \Pr[R(t-s) > x]\, dF_\tau(s) + 1 - F_\tau(x+t)$$
This relation is an instance of the general renewal equation (8.11). Since
$1 - F_\tau(x+t)$ is monotonically decreasing, for all $x$, it holds with (2.35) that
$$\int_0^\infty (1 - F_\tau(x+t))\, dt \le \int_0^\infty (1 - F_\tau(t))\, dt = E[\tau] < \infty$$
which also implies that $\lim_{t\to\infty} 1 - F_\tau(x+t) = 0$. Hence, $h(t) = 1 - F_\tau(x+t)$
is bounded for all $t \ge 0$ and Lemma 8.1.1 is applicable, yielding
$$\Pr[R(t) > x] = 1 - F_\tau(x+t) + \int_0^t [1 - F_\tau(x+t-s)]\, dm(s)$$
Also, the conditions for direct Riemann integrability in the Key Renewal
Theorem 8.2.2 for $g(t) = 1 - F_\tau(x+t)$ are satisfied such that
$$\lim_{t\to\infty} \Pr[R(t) > x] = \lim_{t\to\infty} \int_0^t [1 - F_\tau(x+t-s)]\, dm(s)$$
$$= \frac{1}{E[\tau]} \int_0^\infty (1 - F_\tau(x+t))\, dt \qquad \text{with (8.14)}$$
$$= \frac{1}{E[\tau]} \int_x^\infty (1 - F_\tau(t))\, dt$$
In other words, the steady-state or equilibrium distribution function for the
residual waiting time equals
$$\lim_{t\to\infty} \Pr[R(t) \le x] = \Pr[R \le x] = F_R(x) = \frac{1}{E[\tau]} \int_0^x (1 - F_\tau(t))\, dt \tag{8.17}$$
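The equilibrium distribution (8.17) is easy to evaluate numerically for a given interarrival law. The sketch below uses the trapezoidal rule and, as an illustrative assumption, interarrival times uniform on $(0, 1)$, for which (8.17) integrates to $F_R(x) = 2x - x^2$ on $[0, 1]$; the function names are hypothetical.

```python
import math


def residual_cdf(F, mean, x, steps=10_000):
    """F_R(x) = (1 / E[tau]) * integral_0^x (1 - F(t)) dt, by the trapezoidal rule."""
    h = x / steps
    total = 0.5 * ((1.0 - F(0.0)) + (1.0 - F(x)))
    for i in range(1, steps):
        total += 1.0 - F(i * h)
    return total * h / mean


def uniform_cdf(t):
    """Assumed interarrival law: uniform on (0, 1), with mean 1/2."""
    return min(max(t, 0.0), 1.0)


for x in (0.25, 0.5, 1.0):
    print(x, residual_cdf(uniform_cdf, 0.5, x), 2 * x - x * x)
```

For exponential interarrival times the same routine returns the exponential distribution itself, since $1 - F_\tau(t) = e^{-\lambda t}$ integrates to $(1 - e^{-\lambda x})/\lambda$.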
Similarly, for $t > y$, the event $\{A(t) > y\}$ is equivalent to the event {no
renewals in $[t-y, t]$}, which is equivalent to $\{R(t-y) > y\}$. Hence,
$$\lim_{t\to\infty} \Pr[A(t) > y] = \lim_{t\to\infty} \Pr[R(t-y) > y] = \lim_{t\to\infty} \Pr[R(t) > y] = \frac{1}{E[\tau]} \int_y^\infty (1 - F_\tau(t))\, dt$$
or, both the residual waiting time $R$ and the age $A$ have the same
distribution in steady state ($t \to \infty$). Intuitively, when reversing the time axis in
steady state or looking backward in time, an identically distributed renewal
process is observed in which the role of the age $A$ and the residual life $R$
are interchanged. Thus, by a time symmetry argument, both distributions
must be the same in steady state.
It is instructive to compute the average residual waiting time $E[R] = E[A]$
in steady state. Using the expression of the average in terms of tail
probabilities (2.35), we have
$$E[R] = \int_0^\infty (1 - F_R(x))\, dx = \frac{1}{E[\tau]} \int_0^\infty dx \int_x^\infty (1 - F_\tau(t))\, dt$$
Reversing the order of the $x$- and $t$-integration yields
$$E[R] = \frac{1}{E[\tau]} \int_0^\infty dt\, (1 - F_\tau(t)) \int_0^t dx = \frac{1}{E[\tau]} \int_0^\infty t\, (1 - F_\tau(t))\, dt$$
After partial integration, we end up with
$$E[R] = \frac{1}{2 E[\tau]} \int_0^\infty t^2 f_\tau(t)\, dt = \frac{E[\tau^2]}{2 E[\tau]} = \frac{\mathrm{Var}[\tau] + (E[\tau])^2}{2 E[\tau]}$$
or
$$E[R] = \frac{E[\tau]}{2} + \frac{\mathrm{Var}[\tau]}{2 E[\tau]} \tag{8.18}$$

This expression shows that the average remaining waiting time equals half of
the average interarrival time plus half the ratio of the variance over the mean
of the interarrival time. The last term is never negative. Since $E[A] = E[R]$
and $E[L] = E[A] + E[R]$, we observe the curious result that
$$E[L] = E[\tau] + \frac{\mathrm{Var}[\tau]}{E[\tau]} \ge E[\tau]$$
or that the average total waiting time $E[L]$ is longer than the average
interarrival time $E[\tau]$, contrary to intuition. This fact is referred to as the
inspection paradox: the steady-state interrenewal time, $L(t) = W_{N(t)+1} - W_{N(t)}$,
containing the inspection point at time $t$, exceeds on average the generic
interarrival time, say $W_1$. The explanation is that the inspection point at
time $t$ is uniformly chosen over the time axis and every inspection point is
thus equally likely. The chance that the inspection point $t$ lies in a renewal
interval is proportional to the length of that interval. Hence, it has higher
probability to fall in a long interval, which explains (see footnote 10) why $E[L] \ge E[\tau]$.
Only for deterministic interarrival times where $\mathrm{Var}[\tau] = 0$ does the equality
sign hold, $E[L] = E[\tau]$. For exponential interarrival times, application of
(3.18) gives $\mathrm{Var}[\tau] = (E[\tau])^2$ and $E[R] = E[\tau]$ while $E[L] = 2E[\tau]$: the
fact of being inspected at time $t$ changes the lifetime distribution and even
doubles the expected total lifetime for exponentially distributed failure or
interoccurrence times.
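The inspection paradox is easy to reproduce in a simulation. The sketch below evaluates (8.18) and compares it with simulated averages of $R(t)$ and $L(t)$ for exponential interarrival times with rate 1; the rate, horizon, number of runs and function names are illustrative assumptions.

```python
import random


def mean_residual(mean, var):
    """E[R] from (8.18): half the mean plus the variance over twice the mean."""
    return mean / 2.0 + var / (2.0 * mean)


def mean_inspected_interval(mean, var):
    """E[L] = E[A] + E[R] = 2 E[R] in steady state."""
    return mean + var / mean


def simulate_inspection(t, runs, draw):
    """Average residual life R(t) and covering interval L(t) over independent runs."""
    sum_r = sum_l = 0.0
    for _ in range(runs):
        prev, w = 0.0, draw()
        while w <= t:                 # advance until the renewal interval covers t
            prev, w = w, w + draw()
        sum_r += w - t                # R(t) = W_{N(t)+1} - t
        sum_l += w - prev             # L(t) = W_{N(t)+1} - W_{N(t)}
    return sum_r / runs, sum_l / runs


rng = random.Random(5)                # fixed seed, illustrative only
lam = 1.0
r, l = simulate_inspection(20.0, 10_000, lambda: rng.expovariate(lam))
print(r, mean_residual(1 / lam, 1 / lam ** 2))
print(l, mean_inspected_interval(1 / lam, 1 / lam ** 2))
```

With exponential lifetimes the simulated residual life averages about $E[\tau]$ and the inspected interval about $2E[\tau]$, whereas a deterministic lifetime gives `mean_residual(mean, 0.0) == mean / 2`.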
Returning to the initial question, we observe that the intuitive result that
my waiting time $E[R] = \frac{E[\tau]}{2}$ is only correct for deterministic processes.
Thus, the variability in the interarrival process causes the paradox. We will
see later, in queueing theory in Section 14.3.1, that also in queueing systems
the variability in service discipline causes the average waiting time to
increase. At last, Feller (1971, p. 187) remarks that an apparently unbiased
inspection plan may lead to false conclusions because the actual observations
are not typical of the population as a whole. When people complain
that buses or trains start running irregularly, the inspection paradox shows
that above-average interarrival times are experienced more often. The
inspection paradox thus implies that complaints may be erroneously based on
an overestimation of the real deviations from the regular time schedule of
buses or trains.

10 A similar type of reasoning is used in the computation of the waiting time of the GI/D/m queueing
system in Section 14.4.2.
By separating each renewal interval into two non-overlapping subintervals
$A(t)$ and $R(t)$, we have described an alternating renewal process. An
alternating renewal process models a system that can be in an on- or off-period
with a repeating pattern $X_1, Y_1, X_2, Y_2, \ldots$ where each on-period $X_n$ has the same
distribution $F_{\text{on}}$ and is followed by an off-period $Y_n$. Each off-period also has
the same distribution $F_{\text{off}}$. The off-period $Y_n$ may depend on the on-period
$X_n$, but the $n$-th renewal cycle with duration $X_n + Y_n$ is independent of any
other cycle. An alternating renewal process can be used to model a data
stream of packets, where the on-period reflects the time to store or process
an arriving packet and the off-period a (random) delay between two packets.
Another example is the modeling of the end-to-end delay from a source $s$
to a destination $d$ in the Internet, where the off-period describes the queueing
in a router due to other interfering traffic along that path. During the
on-period, a packet is not blocked by other packets. The on-period equals the
propagation delay to travel from the output port of one router to the output
port of the next-hop router. The end-to-end delay along a path with $h$ hops
equals the sum of $h$ consecutive off-periods augmented by the propagation
time from $s$ to $d$.
8.4 The renewal reward process
The renewal reward process associates at each renewal at time $W_n$ a certain cost or reward $R_n$, which may vary over time and can be negative. For example, each time a light bulb fails, it must be replaced at a certain cost (negative reward) or each customer in a restaurant pays for his meal (positive reward). The reward $R_n$ may depend on the interarrival time $\tau_n$ or length of the $n$-th renewal interval, but it is independent of other renewal epochs (different from the $n$-th). Thus, the pairs $(R_n, \tau_n)$ are assumed to be independent and identically distributed. Most often one is interested in the total reward $R(t)$ over a period $t$ (not to be confused with the residual lifetime) defined as

$$R(t) = \sum_{n=1}^{N(t)} R_n \tag{8.19}$$

In this setting, the renewal reward process is a generalization of the counting process where $R_n = 1$.

By slightly rewriting the total reward $R(t)$ earned over an interval $t$ as

$$\frac{R(t)}{t} = \frac{\sum_{n=1}^{N(t)} R_n}{N(t)} \cdot \frac{N(t)}{t}$$

and taking the limit $t \to \infty$, the first fraction tends with probability one to the average reward $E[R]$ per renewal period by the Strong Law of Large Numbers (Theorem 6.2.2), while the second fraction tends to $\frac{1}{E[\tau]}$ by the Elementary Renewal Theorem 8.2.1. Hence, with probability one,

$$\lim_{t \to \infty} \frac{R(t)}{t} = \frac{E[R]}{E[\tau]} \tag{8.20}$$

which means that the time average reward rate equals the average reward per renewal period multiplied by the interarrival rate of renewals (or divided by the average length of a renewal interval).
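The limit (8.20) can be checked numerically. The following is a minimal simulation sketch, not part of the original text: the uniform interarrival distribution on [0.5, 1.5] and the reward rule $R_n = 2\tau_n + 1$ are assumptions chosen only so that $E[\tau] = 1$ and $E[R] = 3$ are known in advance.

```python
import random

def time_average_reward(horizon, seed=1):
    """Simulate a renewal reward process up to time `horizon` and return R(t)/t."""
    rng = random.Random(seed)
    t = 0.0
    total_reward = 0.0
    while True:
        tau = rng.uniform(0.5, 1.5)       # hypothetical interarrival time, E[tau] = 1
        if t + tau > horizon:
            break                         # the next renewal falls outside [0, horizon]
        t += tau
        total_reward += 2.0 * tau + 1.0   # hypothetical reward R_n, so E[R] = 3
    return total_reward / horizon

# For a long horizon the ratio should approach E[R]/E[tau] = 3.
rate = time_average_reward(50_000.0)
print(rate)
```

Because the reward depends on the interarrival time of the same cycle only, the pairs $(R_n, \tau_n)$ are i.i.d., as the renewal reward process requires.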
Similarly as in the proof of the Elementary Renewal Theorem 8.2.1, the inequality for any $t$,

$$\sum_{n=1}^{N(t)} R_n \le R(t) \le \sum_{n=1}^{N(t)} R_n + R_{N(t)+1}$$

leads, after taking the expectations and using Wald's identity (2.69), to an inequality for the averages,

$$E[N(t)]\, E[R] \le E[R(t)] \le E[N(t)]\, E[R] + E\left[R_{N(t)+1}\right]$$

Dividing by $t$, the limit $t \to \infty$ becomes

$$E[R] \lim_{t \to \infty} \frac{E[N(t)]}{t} \le \lim_{t \to \infty} \frac{E[R(t)]}{t} \le E[R] \lim_{t \to \infty} \frac{E[N(t)]}{t} + \lim_{t \to \infty} \frac{E\left[R_{N(t)+1}\right]}{t}$$

Since the average reward per renewal period is finite and $E\left[R_{N(t)+1}\right] = E[R]$, we obtain by the Elementary Renewal Theorem 8.2.1 that

$$\lim_{t \to \infty} \frac{E[R(t)]}{t} = \frac{E[R]}{E[\tau]} \tag{8.21}$$

Hence, by comparing (8.21) and (8.20), the time average of the average reward rate equals the time average of the reward rate.
Example The hard disc in a network server is replaced at cost $C_1$ at time $T$. The lifetime or age of this mass storage has pdf $f_A$. If the hard disc fails earlier, the cost of the repair and the penalties for service disruption is $C_2$. What is the long run cost of the hard disc in the server per unit time?

Consider the replacement of hard discs as a renewal process with i.i.d. interarrival times $\tau$ and with distribution

$$\Pr[\tau < t] = \begin{cases} \int_0^t f_A(u)\, du & \text{if } t < T \\ 1 & \text{if } t \ge T \end{cases}$$

The average replacement time follows from the tail expression (2.35),

$$E[\tau] = \int_0^\infty \left(1 - \Pr[\tau < t]\right) dt = \int_0^T \left(1 - \Pr[\tau < t]\right) dt$$

Since a failure before $T$ costs $C_2$ and a scheduled replacement at $T$ costs $C_1$, the replacement cost $C(\tau)$ equals $C(\tau) = C_2 1_{\tau < T} + C_1 1_{\tau \ge T}$ and the average cost is, with (2.13),

$$E[C] = C_2 \Pr[\tau < T] + C_1 \left(1 - \Pr[\tau < T]\right)$$

The Elementary Renewal Reward Theorem (8.21) and (8.20) states that the long-run cost of replacements equals

$$\frac{E[C]}{E[\tau]} = \frac{C_2 \Pr[\tau < T] + C_1 \left(1 - \Pr[\tau < T]\right)}{\int_0^T \left(1 - \Pr[\tau < t]\right) dt}$$

Usually the replacement time $T$ is chosen to minimize this long-run cost.
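As an illustration of choosing the replacement time, the sketch below evaluates the long-run cost $E[C]/E[\tau]$ on a grid of candidate ages. It is only a sketch under stated assumptions: the Weibull lifetime (shape 2, scale 10), the cost values, and the grid are hypothetical and not taken from the text, and a failure before the scheduled age is charged the failure cost `c_fail` while a scheduled replacement is charged `c_planned`.

```python
import math

def survival(t, shape, scale):
    """Hypothetical Weibull survival function 1 - F_A(t) of the disc lifetime."""
    return math.exp(-((t / scale) ** shape))

def cost_rate(T, c_planned, c_fail, shape=2.0, scale=10.0, steps=10_000):
    """Long-run cost per unit time E[C]/E[tau] when discs are replaced at age T."""
    h = T / steps
    # trapezoidal rule for E[tau] = integral over [0, T] of the survival function
    inner = sum(survival(i * h, shape, scale) for i in range(1, steps))
    mean_tau = h * (0.5 * survival(0.0, shape, scale) + inner
                    + 0.5 * survival(T, shape, scale))
    p_fail = 1.0 - survival(T, shape, scale)   # probability of failure before age T
    mean_cost = c_fail * p_fail + c_planned * (1.0 - p_fail)
    return mean_cost / mean_tau

# Scan candidate replacement ages 0.5, 1.0, ..., 30.0 and keep the cheapest one.
candidates = [0.5 * k for k in range(1, 61)]
best_T = min(candidates, key=lambda T: cost_rate(T, c_planned=1.0, c_fail=5.0))
print(best_T, cost_rate(best_T, 1.0, 5.0))
```

With an increasing failure rate (Weibull shape larger than 1) an interior optimum exists; for an exponential lifetime the failure rate is constant and preventive replacement brings no gain.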


8.5 Problems
More worked examples can be found in Karlin and Taylor (1975, Chapter
5).

(i) Calculate $\Pr\left[W_{N(t)} \le x\right]$.
(ii) Derive a recursion equation for the generating function $\varphi_{N(t)}(z) = E\left[z^{N(t)}\right]$ of the number of renewals in the interval $[0, t]$ and deduce from that equation the renewal equation (8.9) and a relation for $\mathrm{Var}[N(t)]$.
(iii) In a TCP session from A to B, IP data packets and IP acknowledgement packets travel a distance of 2000 km over precisely the same
bi-directional path. In case of congestion, the average speed is 40000
km/s and without congestion the speed is three times higher. Congestion only occurs in 20% of the travels. What is the average speed
of IP packets in the TCP session?
(iv) The production of digitalized speech samples depends primarily on the codec, with an effective average rate $r$ (bits/s). Since this rate is low compared to the ATM capacity $C$ (bits/s), UMTS will use AAL2 mini-cells in which 1 ATM cell is occupied by $N$ users. The financial cost of a UMTS operator increases at $nc$ euro per unit time whenever $n < N$ speech samples are waiting for transmission, and an additional cost of $K$ euro is incurred each time an ATM cell is transmitted. What is the average cost per unit time for the UMTS operator?
(v) The cost of replacing a router that has failed is $A$ euro. However, one can decide to replace a router that has been in service for a period of time $T$. The advantage of this approach is that the cost of replacing a working router is only $B$ euro, where $B < A$. The policy ChangeRouter consists of replacing a router either upon failure or upon reaching the age $T$, whichever occurs first. Replacement of the current router by a new one occurs instantaneously and at each time there can only be one router in the network. Let $\{X_j\}$ be a sequence of i.i.d. random variables, where $X_j$ is the lifetime of router $j$.
(a) Find the time average cost rate $C$ of the policy ChangeRouter.
(b) Compute $C$ if $T = 5$ years, the cost of replacing the failed router is $A = 10000$ euro and the cost of replacing a working router is $B = 7000$ euro. The independent random variables $X_j$ are exponentially distributed and the average lifetime of a router is 10 years.

9
Discrete-time Markov chains

A large number of stochastic processes belong to the important class of


Markov processes. The theory of Markov chains and Markov processes is well
established and furnishes powerful tools to solve practical problems. This
chapter will be mainly devoted to the theory of discrete-time Markov chains,
while the next chapter concentrates on continuous time Markov chains. The
theory of Markov processes will be applied in later chapters to compute or
formulate queueing and routing problems.

9.1 Definition
A stochastic process $\{X(t), t \in T\}$ is a Markov process if the future state of the process only depends on the current state of the process and not on its past history. Formally, a stochastic process $\{X(t), t \in T\}$ is a continuous-time Markov process if for all $t_0 < t_1 < \cdots < t_{n+1}$ of the index set $T$ and for any set $\{x_0, x_1, \ldots, x_{n+1}\}$ of the state space it holds that

$$\Pr[X(t_{n+1}) = x_{n+1} \mid X(t_0) = x_0, \ldots, X(t_n) = x_n] = \Pr[X(t_{n+1}) = x_{n+1} \mid X(t_n) = x_n] \tag{9.1}$$

Similarly, a discrete-time Markov chain $\{X_k, k \in T\}$ is a stochastic process whose state space is a finite or countably infinite set with index set $T = \{0, 1, 2, \ldots\}$ obeying

$$\Pr[X_{k+1} = x_{k+1} \mid X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_{k+1} = x_{k+1} \mid X_k = x_k] \tag{9.2}$$

A Markov process is called a Markov chain if its state space is discrete. The conditional probabilities $\Pr[X_{k+1} = j \mid X_k = i]$ are called the transition probabilities of the Markov chain. In general, these transition probabilities can depend on the (discrete) time $k$. A Markov chain is entirely defined by the transition probabilities (9.2) and the initial distribution of the Markov chain $\Pr[X_0 = x_0]$. Indeed, by the definition of conditional probability (2.45), we obtain

$$\Pr[X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_k = x_k \mid X_0 = x_0, \ldots, X_{k-1} = x_{k-1}] \Pr[X_0 = x_0, \ldots, X_{k-1} = x_{k-1}]$$

and, by the definition of the Markov chain (9.2),

$$\Pr[X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_k = x_k \mid X_{k-1} = x_{k-1}] \Pr[X_0 = x_0, \ldots, X_{k-1} = x_{k-1}]$$

This recursion relation can be iterated, resulting in

$$\Pr[X_0 = x_0, \ldots, X_k = x_k] = \prod_{j=1}^{k} \Pr[X_j = x_j \mid X_{j-1} = x_{j-1}] \Pr[X_0 = x_0] \tag{9.3}$$

which demonstrates that the complete information of the Markov chain is obtained if, apart from the initial distribution, all time-dependent transition probabilities are known.
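The product form (9.3) translates directly into a few lines of code. The sketch below assumes, for simplicity, that the transition probabilities do not depend on time, so that a single matrix suffices; the two-state matrix, the initial distribution and the zero-based state labels are hypothetical illustration values.

```python
def path_probability(initial, P, path):
    """Pr[X_0 = path[0], ..., X_k = path[k]] as the product in (9.3)."""
    prob = initial[path[0]]                # Pr[X_0 = x_0]
    for prev, nxt in zip(path, path[1:]):  # one factor per transition
        prob *= P[prev][nxt]
    return prob

# Hypothetical two-state chain with states labelled 0 and 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
initial = [1.0, 0.0]
p = path_probability(initial, P, [0, 0, 1, 1])   # 1.0 * 0.9 * 0.1 * 0.5
print(p)
```

For time-dependent transition probabilities the same loop applies with a different matrix for each step.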

9.2 Discrete-time Markov chain
If the transition probabilities are independent of time $k$,

$$P_{ij} = \Pr[X_{k+1} = j \mid X_k = i] \tag{9.4}$$

the Markov chain is called stationary. In the sequel, we will confine ourselves to stationary Markov chains. Since the discrete-time Markov chain is conceptually simpler than the continuous counterpart, we start the discussion with the discrete case.
Let us consider a state space $S$ with $N$ states (where $N = \dim S$ can be infinite). It is convenient to introduce a vector notation¹. Since $X_k$ can only take $N$ possible values, we denote the corresponding state vector at discrete-time $k$ by $s[k] = [s_1[k] \;\; s_2[k] \;\; \cdots \;\; s_N[k]]$ with $s_i[k] = \Pr[X_k = i]$. Hence, $s[k]$ is a $1 \times N$ vector. Since the state $X_k$ at discrete-time $k$ must be in one of the $N$ possible states, we have that $\sum_{i=1}^{N} \Pr[X_k = i] = 1$ or, in vector notation, $s[k].u = \sum_{i=1}^{N} 1.s_i[k] = 1$, where $u^T = [1 \;\; 1 \;\; \cdots \;\; 1]$. This fact is also written as $\|s[k]\|_1 = 1$, where $\|a\|_1$ is the $q = 1$ norm of vector $a$ defined in the Appendix A.3.

¹ Unfortunately, a vector in Markov theory is represented as a single row matrix which deviates from the general theory in linear algebra, followed in Appendix A, where a vector is represented as a single column matrix. In order to be consistent with the literature on Markov processes, we have chosen to follow the notation of Markov theory here, but elsewhere we adhere to the general convention of linear algebra.

In a stationary Markov chain, the states $X_{k+1}$ and $X_k$ are connected via the law of total probability (2.46),
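$$\Pr[X_{k+1} = j] = \sum_{i=1}^{N} \Pr[X_{k+1} = j \mid X_k = i] \Pr[X_k = i] = \sum_{i=1}^{N} P_{ij} \Pr[X_k = i] \tag{9.5}$$

which holds for all $j$, or, in vector notation,

$$s[k+1] = s[k] P \tag{9.6}$$

where the transition probability matrix $P$ is

$$P = \begin{bmatrix}
P_{11} & P_{12} & P_{13} & \cdots & P_{1;N-1} & P_{1N} \\
P_{21} & P_{22} & P_{23} & \cdots & P_{2;N-1} & P_{2N} \\
P_{31} & P_{32} & P_{33} & \cdots & P_{3;N-1} & P_{3N} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
P_{N-1;1} & P_{N-1;2} & P_{N-1;3} & \cdots & P_{N-1;N-1} & P_{N-1;N} \\
P_{N1} & P_{N2} & P_{N3} & \cdots & P_{N;N-1} & P_{NN}
\end{bmatrix} \tag{9.7}$$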

Since (9.6) must hold for any initial state vector $s[0]$, by choosing $s[0]$ equal to a base vector $[0 \; \cdots \; 0 \;\; 1 \;\; 0 \; \cdots \; 0]$ (all columns zero except for column $i$) which expresses that the Markov chain starts from one of the possible states, say state $i$, then $s[1] = [P_{i1} \;\; P_{i2} \;\; \cdots \;\; P_{iN}]$. Furthermore, since $\|s[k]\|_1 = 1$ for any $k$, it must hold that $\sum_{j=1}^{N} P_{ij} = 1$ for any state $i$. The relation

$$\sum_{j=1}^{N} P_{ij} = 1 \tag{9.8}$$

means that, at discrete-time $k$, there certainly occurs a transition in the Markov chain, possibly to the same state as at time $k - 1$. The $N \times N$ transition probability matrix $P$ thus consists of $N^2 - N$ transition probabilities $P_{ij}$ and at each row, one transition probability can be expressed in terms of the others, e.g. $P_{ik} = 1 - \sum_{j=1; j \ne k}^{N} P_{ij}$. A matrix with elements $0 \le P_{ij} \le 1$ obeying (9.8) is called a stochastic matrix whose properties are investigated in Appendix A. Apart from the matrix representation, Markov chains are often described by a directed graph (as illustrated in the figure below), where $P_{ij}$ is represented by an edge from state $i$ to $j$ provided $P_{ij} > 0$. In particular, this representation makes it possible to deduce structural properties of the Markov chain (such as communicating states) elegantly.

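[Figure: directed graph of a Markov chain with seven states, whose edges are labelled with the transition probabilities $P_{12}$, $P_{16}$, $P_{22}$, $P_{32}$, $P_{34}$, $P_{41}$, $P_{45}$, $P_{47}$, $P_{55}$, $P_{56}$, $P_{63}$, $P_{67}$, $P_{75}$ and $P_{76}$. The corresponding transition probability matrix is]

$$P = \begin{bmatrix}
0 & P_{12} & 0 & 0 & 0 & P_{16} & 0 \\
0 & P_{22} & 0 & 0 & 0 & 0 & 0 \\
0 & P_{32} & 0 & P_{34} & 0 & 0 & 0 \\
P_{41} & 0 & 0 & 0 & P_{45} & 0 & P_{47} \\
0 & 0 & 0 & 0 & P_{55} & P_{56} & 0 \\
0 & 0 & P_{63} & 0 & 0 & 0 & P_{67} \\
0 & 0 & 0 & 0 & P_{75} & P_{76} & 0
\end{bmatrix}$$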

Given the initial state vector $s[0]$, the general solution of (9.6) is

$$s[k] = s[0] P^k \tag{9.9}$$

Similarly, when knowledge of the Markov chain at discrete-time $k$ is available, we obtain from (9.6) that

$$s[k+n] = s[k] P^n$$

The elements of the matrix $P^n$ are called the $n$-step transition probabilities,

$$P^n_{ij} = \Pr[X_{k+n} = j \mid X_k = i] \tag{9.10}$$

for $k \ge 0$ and $n \ge 0$. Since the discrete Markov chain must surely be in one of the $N$ states $n$ time units later given that it started at time $k$ in state $i$, we obtain an extension of (9.8), for all $n \ge 1$,

$$\sum_{j=1}^{N} P^n_{ij} = 1 \tag{9.11}$$
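The relations (9.6), (9.9) and (9.11) are easy to verify numerically. The following sketch uses plain Python lists; the three-state matrix and the zero-based state labels are hypothetical illustration values, not taken from the text.

```python
def step(s, P):
    """One application of (9.6): s[k+1] = s[k] P, with s a row vector."""
    n = len(P)
    return [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]

def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(P, n):
    """n-th power of P by repeated multiplication, starting from the identity."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result

P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
s = [1.0, 0.0, 0.0]          # the chain starts in the first state
for _ in range(4):           # iterate (9.6) four times
    s = step(s, P)
P4 = mat_pow(P, 4)           # (9.9): s[4] = s[0] P^4, here the first row of P^4
row_sums = [sum(row) for row in P4]   # (9.11): every row of P^n sums to one
print(s, P4[0], row_sums)
```

Starting from a base vector, the state vector after four steps coincides with the corresponding row of $P^4$, which is exactly the interpretation of the $n$-step transition probabilities (9.10).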

The demonstration of (9.11) is by induction. If $n = 1$, (9.8) justifies (9.11). Assume that (9.11) holds. Any matrix element of $P^{n+1}$ can be written as $P^{n+1}_{ij} = \sum_{k=1}^{N} P_{ik} P^n_{kj}$. Summing over all $j$ yields, for $n \ge 0$,

$$\sum_{j=1}^{N} P^{n+1}_{ij} = \sum_{k=1}^{N} P_{ik} \sum_{j=1}^{N} P^n_{kj} = \sum_{k=1}^{N} P_{ik} \quad \text{(induction argument)} \quad = 1 \quad (n = 1 \text{ case})$$

This proves (9.11).

9.2.1 Definitions and classification

9.2.1.1 Irreducible Markov chains
A state $j$ in a Markov chain is said to be reachable from state $i$ if it is possible to proceed from state $i$ to state $j$ in a finite number of transitions, which is equivalent to $P^n_{ij} > 0$ for finite $n$. If every state is reachable from every other state, the Markov chain is said to be irreducible. The example of the Markov graph above is not irreducible because state 2 is absorbing. Markov theory is considerably simplified if we know that the chain is irreducible, which justifies investigating methods to determine irreducibility.
An equivalent requirement for the Markov chain to be irreducible is that the associated directed graph is strongly connected, i.e. there is a path from node $i$ to node $j$ for any pair of distinct nodes $(i, j)$. Let us review some basic notions from graph theory (see Appendix B.1). Denote by $A$ the adjacency matrix of $P$ where all non-zero elements in $P$ are replaced by 1. A walk of length $k$ from state $i$ to state $j$ is a succession of $k$ arcs of the form $(n_0 \to n_1)(n_1 \to n_2) \cdots (n_{k-1} \to n_k)$, where $n_0 = i$ and $n_k = j$. A path is a walk in which all nodes are different, i.e. $n_l \ne n_m$ for all $0 \le l \ne m \le k$. Lemma B.1.1 (proved in Appendix B.1, art. 5) states that the number of walks of length $k$ from state $i$ to state $j$ is equal to the element $\left(A^k\right)_{ij}$.
A directed graph is strongly connected if and only if each non-diagonal element of the matrix $\sum_{k=1}^{N-1} P^k$ or, equivalently, of $B = \sum_{k=1}^{N-1} A^k$ is positive. Since $P$ has $N$ states, the longest possible path between two states consists of $N - 1$ hops. By summing over all powers $1 \le k \le N - 1$, the element $b_{ij}$ of the matrix $B$ equals the number of all possible walks (of any possible length) between $i$ and $j$. Hence, if $b_{ij} > 0$ for all $i \ne j$, there exist walks from any state $i$ to any other state $j$. The converse is readily verified.
Another way to determine irreducibility follows from the definition of reducibility in the Appendix A.4. However, the methods for strong connectivity or irreducibility are still algebraic in that they require matrix operations. A computationally more efficient method consists of applying all-pair shortest path algorithms on the Markov graph. Examples of all-pair shortest path algorithms are that of Floyd-Warshall (with computational complexity $C_{\text{Floyd-Warshall}} = O(N^3)$) or the algorithm of Johnson (complexity $C_{\text{Johnson}} = O(N^2 \log N + NL)$, where $L$ is the total number of links in the Markov graph). These algorithms are nicely discussed in Cormen et al. (1991).
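A reachability test of this kind can be sketched in a few lines. The code below uses Warshall's transitive closure, the boolean counterpart of the Floyd-Warshall shortest path algorithm, rather than a shortest path computation itself. The first adjacency pattern is read from the seven-state example graph (with states renumbered from 0), the second is a hypothetical four-state cycle added for contrast.

```python
def is_irreducible(A):
    """True if the directed graph with 0/1 adjacency matrix A is strongly connected."""
    n = len(A)
    reach = [[bool(A[i][j]) for j in range(n)] for i in range(n)]
    for k in range(n):                 # Warshall's transitive closure, O(N^3)
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    # irreducible: every state reaches every other state
    return all(reach[i][j] for i in range(n) for j in range(n) if i != j)

# Adjacency pattern of the seven-state example; the second state is absorbing.
example = [[0, 1, 0, 0, 0, 1, 0],
           [0, 1, 0, 0, 0, 0, 0],
           [0, 1, 0, 1, 0, 0, 0],
           [1, 0, 0, 0, 1, 0, 1],
           [0, 0, 0, 0, 1, 1, 0],
           [0, 0, 1, 0, 0, 0, 1],
           [0, 0, 0, 0, 1, 1, 0]]
# Hypothetical four-state cycle 0 -> 1 -> 2 -> 3 -> 0.
cycle = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 0]]
print(is_irreducible(example), is_irreducible(cycle))
```

Only the zero pattern of $P$ matters for irreducibility, which is why the sketch works on the adjacency matrix $A$ instead of on the transition probabilities.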

9.2.1.2 Communicating states
If two states $i$ and $j$ are reachable from each other, they are said to communicate, which is often denoted by $i \leftrightarrow j$.
The concept of communication is an equivalence relation: (a) reflexivity: $i \leftrightarrow i$ since $P^0 = I$ or $P^0_{ij} = \delta_{ij}$, (b) symmetry: if $i \leftrightarrow j$ then $j \leftrightarrow i$, which follows from the definition of communication, and (c) transitivity: if $i \leftrightarrow j$ and $j \leftrightarrow k$ then $i \leftrightarrow k$. The transitivity follows from the non-negativity of $P$ and $P^{n+m} = P^n P^m$ such that the matrix element $P^{n+m}_{ik} = \sum_{l=1}^{N} P^n_{il} P^m_{lk} \ge P^n_{ij} P^m_{jk}$. By definition of $i \leftrightarrow j$ and $j \leftrightarrow k$, we have $P^n_{ij} > 0$ and $P^m_{jk} > 0$ for some finite $n$ and $m$. Hence, $P^{n+m}_{ik} > 0$, which implies $i \leftrightarrow k$. As an application, the total state space can be partitioned into equivalence classes. States in one equivalence class communicate with each other. If there is a possibility to start in one class and to enter another class, in which case there is no return possible to the first class (otherwise the two classes would form one class), the Markov chain is reducible. In other words, a Markov chain is irreducible if the equivalence relation results in one class.
9.2.1.3 Periodic and aperiodic Markov chains
Consider a state $j$ in a Markov chain with $P^n_{jj} > 0$ for some $n \ge 1$. The period $d_j$ is defined as the greatest common divisor of those $n$ for which $P^n_{jj} > 0$. The figure below illustrates a Markov chain with period $d = N$.

[Figure: directed cycle through the $N$ states, $1 \to 2 \to \cdots \to N \to 1$, with transition probability matrix]

$$P = \begin{bmatrix}
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & 0 & \cdots & 0
\end{bmatrix}$$

Since the greatest common divisor of a set is the largest integer $d$ that divides every integer in the set, it cannot exceed the minimum element in the set. Thus,

$$1 \le d_j \le \min\{n : P^n_{jj} > 0\}$$

The relation $P^{n+m}_{jj} = P^n_{jj} P^m_{jj} + \sum_{l=1; l \ne j}^{N} P^n_{jl} P^m_{lj}$, deduced from matrix multiplication, and the fact that all elements in $P$ are non-negative show that $P^{n+m}_{jj} \ge P^n_{jj} P^m_{jj}$, so that $P^k_{jj}$ with $k = n + m$ is positive whenever $P^n_{jj}$ and $P^m_{jj}$ are positive. Hence, if $P_{jj} > 0$, then all $P^k_{jj} > 0$ for $k > 1$ and thus $d_j = 1$.
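The definition of the period as a greatest common divisor can be turned into a small computation. The sketch below tracks only whether $P^n_{jj} > 0$ and accumulates the gcd of the corresponding $n$. The cut-off of $3N$ steps is an assumption of this sketch (intended for irreducible chains), and both example matrices, with states numbered from 0, are hypothetical.

```python
from math import gcd

def period(P, j, max_power=None):
    """gcd of all n <= max_power with (P^n)_jj > 0; 0 if no return is observed."""
    n_states = len(P)
    if max_power is None:
        max_power = 3 * n_states       # cut-off assumed sufficient in this sketch
    # row[i] is 1 when (P^n)_ji > 0, i.e. state i can be occupied after n steps
    row = [1 if i == j else 0 for i in range(n_states)]
    d = 0
    for n in range(1, max_power + 1):
        row = [1 if any(row[i] and P[i][k] > 0 for i in range(n_states)) else 0
               for k in range(n_states)]
        if row[j]:
            d = gcd(d, n)              # gcd(0, n) equals n for the first return
    return d

cycle = [[0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1],
         [1, 0, 0, 0]]
lazy = [[0.5, 0.5],
        [1.0, 0.0]]
print(period(cycle, 0), period(lazy, 0), period(lazy, 1))
```

The cyclic chain reproduces the period $d = N$ of the figure, while a single positive diagonal element is enough to make the second chain aperiodic.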

Lemma 9.2.1 If two states $i$ and $j$ communicate ($i \leftrightarrow j$), then $d_i = d_j$.

Proof: Let $n$ and $m$ be integers such that $P^n_{ij} > 0$ and $P^m_{ji} > 0$. From $P^{n+m} = P^n P^m$, the matrix element $P^{n+m}_{ii} = \sum_{k=1}^{N} P^n_{ik} P^m_{ki} \ge P^n_{ij} P^m_{ji}$. By definition of $n$ and $m$, $P^{n+m}_{ii} > 0$, and, by definition of a period, $d_i \mid (n + m)$. Similarly, from $P^{n+l+m} = P^n P^{l+m}$, the matrix element

$$P^{n+l+m}_{ii} = \sum_{r=1}^{N} P^n_{ir} \sum_{k=1}^{N} P^l_{rk} P^m_{ki} \ge P^n_{ij} P^l_{jj} P^m_{ji} \tag{9.12}$$

Now, if $P^l_{jj} > 0$, which implies by definition that $d_j \mid l$, then we also have that $P^{n+l+m}_{ii} > 0$ from which $d_i \mid (n + m + l)$. Both conditions $d_i \mid (n + m)$ and $d_i \mid (n + m + l)$ imply that $d_i \mid l$. But since $d_j$ is the largest such divisor, $d_j \ge d_i$. By symmetry of the communication relation (replace $i \to j$ and $j \to i$), $d_i \ge d_j$, which proves the lemma.

The consequence of Lemma 9.2.1 is that all the states in an irreducible Markov chain have a common period $d$. The irreducible Markov chain is periodic with period $d$ if $d > 1$, else it is aperiodic ($d = 1$). A simple sufficient condition for an irreducible chain to be aperiodic is that $P_{ii} > 0$ for some state $i$. Most Markov chains of practical interest are aperiodic.

9.2.2 The hitting time

Let $A$ be a subset of states, $A \subset S$. The hitting time $T_A$ is the first positive time the Markov chain is in a state of the set $A$, thus for $k > 0$, $T_A = \min(k : X_k \in A)$. The hitting time² of a state $j$ follows from the definition if $A = \{j\}$. For irreducible Markov chains, the hitting time $T_j$ is finite, for any state $j$.

² The hitting time $T_j$ is also called the first passage time into a state $j$.

From the definition of the hitting time, the recursion

$$\Pr[T_j = m \mid X_0 = i] = \sum_{k \ne j} \Pr[X_1 = k \mid X_0 = i] \Pr[T_j = m - 1 \mid X_0 = k]$$

is immediate. Indeed, in order to have the transition from state $i$ to state $j$ at discrete-time $m$, it is necessary to have first a transition from state $i$ to some other state $k$ and to pass from that state $k$ to state $j$ for the first time after $m - 1$ time units. For a stationary Markov chain, we have for $m > 0$ that

$$\Pr[T_j = m \mid X_0 = i] = \sum_{k \ne j} P_{ik} \Pr[T_j = m - 1 \mid X_0 = k] \tag{9.13}$$

and, by definition for $m \ge 0$,

$$\Pr[T_j = m \mid X_0 = j] = \delta_{0m} \tag{9.14}$$

The event $\{X_n = j\}$ can be decomposed in terms of the hitting time $T_j$. Indeed, since the events $\{T_j = m, X_n = j\}$ are disjoint for $1 \le m \le n$,

$$\{X_n = j\} = \bigcup_{m=1}^{n} \{T_j = m, X_n = j\}$$

Applied to the $n$-step transition probabilities,

$$\Pr[X_n = j \mid X_0 = i] = \Pr\left[\bigcup_{m=1}^{n} \{T_j = m, X_n = j\} \,\Big|\, X_0 = i\right] = \sum_{m=1}^{n} \Pr[T_j = m, X_n = j \mid X_0 = i]$$

$$= \sum_{m=1}^{n} \Pr[T_j = m \mid X_0 = i] \Pr[X_n = j \mid X_0 = i, T_j = m]$$

By definition of the hitting time, $\{T_j = m\} = \bigcap_{k=1}^{m-1} \{X_k \ne j\} \cap \{X_m = j\}$ such that

$$\Pr[X_n = j \mid X_0 = i, T_j = m] = \Pr\left[X_n = j \,\Big|\, X_0 = i, \bigcap_{k=1}^{m-1} \{X_k \ne j\} \cap \{X_m = j\}\right] = \Pr[X_n = j \mid X_m = j]$$

where the last step follows from the Markov property (9.2). Thus we obtain

$$\Pr[X_n = j \mid X_0 = i] = \sum_{m=1}^{n} \Pr[T_j = m \mid X_0 = i] \Pr[X_n = j \mid X_m = j]$$

or, written in terms of $n$-step transition probabilities with (9.10),

$$P^n_{ij} = \sum_{m=1}^{n} \Pr[T_j = m \mid X_0 = i] \, P^{n-m}_{jj} \tag{9.15}$$

For an absorbing state $j$ where $P_{jj} = 1$, relation (9.15) simplifies to

$$P^n_{ij} = \Pr[T_j \le n \mid X_0 = i] \tag{9.16}$$
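The recursion for the hitting time and the decomposition (9.15) can be checked against each other numerically. The sketch below reads $T_j$ as the first positive time in state $j$ and therefore starts the recursion from the base case $\Pr[T_j = 1 \mid X_0 = i] = P_{ij}$, which is an assumption of this sketch; the three-state matrix and the zero-based state labels are hypothetical.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def first_passage(P, j, n_max):
    """f[m][i] = Pr[T_j = m | X_0 = i] for m = 1..n_max (f[0] is unused padding)."""
    n = len(P)
    f = [[0.0] * n, [P[i][j] for i in range(n)]]   # base case m = 1
    for m in range(2, n_max + 1):
        # first step to some k != j, then first passage into j after m - 1 steps
        f.append([sum(P[i][k] * f[m - 1][k] for k in range(n) if k != j)
                  for i in range(n)])
    return f

P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
i, j, n = 0, 2, 6
f = first_passage(P, j, n)

# powers[m] holds P^m, with powers[0] the identity matrix
powers = [[[1.0 if a == b else 0.0 for b in range(3)] for a in range(3)]]
for _ in range(n):
    powers.append(mat_mul(powers[-1], P))

lhs = powers[n][i][j]                                        # P^n_ij
rhs = sum(f[m][i] * powers[n - m][j][j] for m in range(1, n + 1))
print(lhs, rhs)
```

Both sides of (9.15) agree up to rounding, and summing `f[m][i]` over $m$ gives the probability that state $j$ is reached within $n$ steps.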

9.2.3 Transient and recurrent states

The probability that a Markov chain initiated at state $i$ will ever come into state $j$ is denoted as

$$r_{ij} = \Pr[T_j < \infty \mid X_0 = i] \tag{9.17}$$

If the starting state $i$ equals the target state $j$, then $r_{ii}$ is the probability of ever returning to state $i$. If $r_{ii} = 1$, the state $i$ is a recurrent state, while, if $r_{ii} < 1$, state $i$ is a transient state. If $i$ is a recurrent state, the Markov chain started at $i$ will definitely (i.e. with probability 1) return to state $i$ after some time. On the other hand, if $i$ is a transient state, the Markov chain started at $i$ has probability $1 - r_{ii}$ of never returning to state $i$. For an absorbing state $i$ defined by $P_{ii} = 1$, we have by (9.16) that $r_{ii} = 1$, implying that an absorbing state is a recurrent state. Further, the mean return time to state $j$ when the chain started in $j$ is denoted by

$$m_j = E[T_j \mid X_0 = j] \tag{9.18}$$

Concepts of renewal theory will now be applied to a Markov process. Let $N_k(j)$ denote the number of times that the Markov chain is in state $j$ during the time interval $[1, k]$ given the chain started in state $i$ or $N_k(j) = \sum_{n=1}^{k} 1_{\{X_n = j \mid X_0 = i\}}$. Using (2.13), the average number of visits to state $j$ in the time interval $[1, k]$ is

$$E[N_k(j) \mid X_0 = i] = E\left[\sum_{n=1}^{k} 1_{\{X_n = j\}} \,\Big|\, X_0 = i\right] = \sum_{n=1}^{k} E\left[1_{\{X_n = j\}} \mid X_0 = i\right] = \sum_{n=1}^{k} \Pr[X_n = j \mid X_0 = i]$$

or, in terms of $n$-step transition probabilities (9.10),

$$E[N_k(j) \mid X_0 = i] = \sum_{n=1}^{k} P^n_{ij} \tag{9.19}$$

The average number of times that the Markov chain is ever in state $j$ given that it started from state $i$ is, with $N(j) = \lim_{k \to \infty} N_k(j)$,

$$E[N(j) \mid X_0 = i] = \sum_{n=1}^{\infty} \Pr[X_n = j \mid X_0 = i] = \sum_{n=1}^{\infty} P^n_{ij}$$

Hence, if state $j$ is reachable from state $i$, by definition, there is some $k$ for which $P^k_{ij} > 0$, which implies that $E[N(j) \mid X_0 = i] > 0$. Further, consider the probability $\Pr[N(j) \ge n \mid X_0 = i]$ that the number of visits to state $j$ is at least $n$, given the Markov chain started from state $i$. The event $\{N(j) \ge n\}$ is equivalent to the occurrence of the event $\{N(j) \ge n - 1\}$ and the event that the Markov chain will return to $j$ again given that it started from $j$. The probability of the latter event is precisely $r_{jj}$. Thus, we obtain the recursion

$$\Pr[N(j) \ge n \mid X_0 = i] = r_{jj} \Pr[N(j) \ge n - 1 \mid X_0 = i]$$

with solution for $n \ge 1$,

$$\Pr[N(j) \ge n \mid X_0 = i] = (r_{jj})^{n-1} \Pr[N(j) \ge 1 \mid X_0 = i]$$

Now, $\Pr[N(j) \ge 1 \mid X_0 = i] = \Pr[T_j < \infty \mid X_0 = i] = r_{ij}$, such that

$$\Pr[N(j) \ge n \mid X_0 = i] = (r_{jj})^{n-1} r_{ij} \tag{9.20}$$

The average computed with (2.36) yields

$$E[N(j) \mid X_0 = i] = \frac{r_{ij}}{1 - r_{jj}} = \sum_{n=1}^{\infty} P^n_{ij} \tag{9.21}$$

provided $r_{ij} > 0$. If $r_{ij} = 0$ then (9.20) vanishes for every $n$ and thus $E[N(j) \mid X_0 = i] = 0$, which means that state $j$ is not reachable from state $i$. In summary:

For a recurrent state $j$ for which $r_{jj} = 1$, we obtain from (9.21) that $E[N(j) \mid X_0 = i] \to \infty$ (if $r_{ij} \ne 0$, else $E[N(j) \mid X_0 = i] = 0$) and from (9.20),

$$\Pr[N(j) = \infty \mid X_0 = i] = \lim_{n \to \infty} \Pr[N(j) \ge n \mid X_0 = i] = r_{ij}$$

A state $j$ is recurrent if and only if $\sum_{n=1}^{\infty} P^n_{jj} \to \infty$.

For a transient state $j$ for which $r_{jj} < 1$, there holds that $E[N(j) \mid X_0 = i]$ will be finite and $\Pr[N(j) = \infty \mid X_0 = i] = 0$ or, equivalently,

$$\Pr[N(j) < \infty \mid X_0 = i] = 1$$

A state $j$ is transient if and only if $\sum_{n=1}^{\infty} P^n_{jj}$ is finite.

These relations explain the difference between a recurrent and a transient state. When the Markov chain starts at a recurrent state, it returns infinitely often to that state because $r_{jj} = \Pr[N(j) = \infty \mid X_0 = j] = 1$. If the chain starts at some other state $i$ from which the recurrent state $j$ is reachable ($r_{ij} > 0$), then the chain will visit state $j$ infinitely often with probability $r_{ij}$. From this analysis, some consequences arise.

Corollary 9.2.2 A finite-state Markov chain must have at least one recurrent state.
Proof: Suppose the contrary that, if the state space $S$ is finite, all states are transient states. For a transient state $j$ it follows from (9.21) that $\sum_{k=1}^{\infty} P^k_{ij}$ is finite, which implies that $\lim_{k \to \infty} P^k_{ij} = 0$ for any other state $i$. If the state space is finite and all states are transient states, then

$$\sum_{j \in S} \lim_{k \to \infty} P^k_{ij} = 0$$

Since the summation has a finite number of terms, the limit and summation operator can be reversed,

$$\lim_{k \to \infty} \sum_{j \in S} P^k_{ij} = 0$$

But the total law of probability (9.11) requires that $\sum_{j \in S} P^k_{ij} = 1$ (for any time $k$), which leads to a contradiction.

Theorem 9.2.3 If $i$ is a recurrent state that leads to a state $j$, then the state $j$ is also a recurrent state and $r_{ij} = r_{ji} = 1$.
Proof: Clearly, the theorem is true if $i = j$. Suppose that, for $i \ne j$, $\Pr[T_i < \infty \mid X_0 = j] = r_{ji} < 1$. This implies that the Markov chain starting from state $j$ has probability $1 - r_{ji} > 0$ of never hitting state $i$, which is impossible because $i$ is a recurrent state that will be visited infinitely often. Hence $r_{ji} = 1$.
Since state $j \ne i$ is reachable from state $i$, by definition $r_{ij} > 0$ and there is a minimum discrete-time $n$ such that $P^n_{ij} > 0$ and $\Pr[X_k = j \mid X_0 = i] = 0$ for $k < n$. Similarly, since $r_{ji} = 1$, there exists a minimum discrete-time $m$ to have a transition from $j \to i$ given the chain started in state $j$, thus, $\Pr[X_m = i \mid X_0 = j] = P^m_{ji} > 0$. From (9.12) and the fact that $P^n_{ij} > 0$ and $P^m_{ji} > 0$ and state $i$ is a recurrent state such that $P^l_{ii} > 0$, we have, for any $l \ge 0$,

$$P^{n+l+m}_{jj} \ge P^m_{ji} P^l_{ii} P^n_{ij} > 0$$

Summing over all $l$,

$$\sum_{l=1}^{\infty} P^{n+l+m}_{jj} \ge P^n_{ij} P^m_{ji} \sum_{l=1}^{\infty} P^l_{ii}$$

or

$$\sum_{l=1}^{\infty} P^l_{jj} \ge \sum_{l=n+m+1}^{\infty} P^l_{jj} = \sum_{l=1}^{\infty} P^{n+l+m}_{jj} \ge P^n_{ij} P^m_{ji} \sum_{l=1}^{\infty} P^l_{ii}$$

It follows from (9.21) that the right-hand side diverges. Hence, $\sum_{l=1}^{\infty} P^l_{jj}$ diverges and relation (9.21) indicates that $r_{jj} = 1$ or, that $j$ must be a recurrent state.

A non-empty set $C \subset S$ of states is said to be closed if no state $i \in C$ leads to a state $j \notin C$. Thus, $r_{ij} = 0$ for any $i \in C$ and $j \notin C$, which is equivalent to $P_{ij} = 0$. If the set $C$ is closed, the Markov chain starting in $C$ will remain, with probability 1, in $C$ all the time. For example, if $i$ is an absorbing state, $C = \{i\}$ is closed. A closed set $C$ is irreducible if state $i$ is reachable from state $j$ for all $i, j \in C$. Theorem 9.2.3 together with Corollary 9.2.2 implies that, if $C$ is a finite, irreducible closed set, all states are recurrent.
9.3 The steady-state of a Markov chain
9.3.1 The irreducible Markov chain
The steady-state vector $\pi = \lim_{k \to \infty} s[k]$ follows, after taking the limit $k \to \infty$ in (9.6), as

$$\pi = \pi.P \tag{9.22}$$

or, for each component $\pi_j$,

$$\pi_j = \sum_{k=1}^{N} P_{kj} \pi_k \tag{9.23}$$

with $\pi.u = 1$ or $\|\pi\|_1 = 1$. Equation (9.22) shows that the steady-state vector $\pi$ does not depend on the initial state $s[0]$.
Alternatively, in view of (9.9), we trivially write $P^k - P^{k-1} P = 0$ or $P P^{k-1} - P^k = 0$ and if $A = \lim_{k \to \infty} P^k$ exists, then $A - AP = 0$ or $PA - A = 0$. This implies that $A(I - P) = (I - P)A = 0$ or $A = Q(1)$, where $Q(\lambda)$ is the adjoint matrix of $P$ (see Appendix A.1, art. 7). The non-zero columns (or rows) of the adjoint matrix $Q(\lambda)$ consist of the (unscaled) eigenvector(s) belonging to eigenvalue $\lambda$. By (9.11) the rows of $P^k$ for any $k$ are normalized and so must $A = Q(1)$. Since there is only one eigenvector $\pi$ belonging to $\lambda = 1$ (Frobenius Theorem A.4.2), the rows of $A = \lim_{k \to \infty} P^k$ must all be the same and equal to $\pi = [a_{11} \;\; a_{12} \;\; \cdots \;\; a_{1N}]$. Furthermore, only if all rows of $A$ are equal to the steady-state vector $\pi$, then the dependence on the initial state $s[0]$ vanishes since relation (9.9) becomes $\pi_j = \sum_{i=1}^{N} s_i[0] a_{ij} = a_{1j} \sum_{i=1}^{N} s_i[0] = a_{1j}$. Hence, $A = \lim_{k \to \infty} P^k = u.\pi$ or, componentwise, for all $1 \le j \le N$,

$$\lim_{k \to \infty} P^k_{ij} = \pi_j \tag{9.24}$$

The sequence of matrices $P, P^2, P^3, \ldots, P^k$ thus converges to $A = u.\pi$ for sufficiently large $k$. Instead of multiplying the last matrix $P^k$ in the sequence by $P$ to obtain the next one $P^{k+1}$, with the same computational effort, the sequence $P, P^2, P^4, \ldots, P^{2^k}$, obtained by successively squaring, converges considerably faster to $A = u.\pi$ and may be useful for sparse $P$.
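The successive squaring can be sketched as follows. The stopping rule (largest element-wise change below a tolerance), the cap on the number of squarings and the two-state matrix are assumptions of this sketch, and the limit only exists for chains for which (9.24) holds.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def limit_by_squaring(P, tol=1e-12, max_squarings=60):
    """Approximate lim P^k through the sequence P, P^2, P^4, P^8, ..."""
    A = P
    for _ in range(max_squarings):
        A2 = mat_mul(A, A)
        change = max(abs(A2[i][j] - A[i][j])
                     for i in range(len(P)) for j in range(len(P)))
        A = A2
        if change < tol:
            break                      # successive squares no longer differ
    return A

P = [[0.9, 0.1],
     [0.5, 0.5]]
A = limit_by_squaring(P)
pi = A[0]                              # every row of the limit equals pi
print(pi)
```

For this hypothetical chain the balance equation $0.1\,\pi_1 = 0.5\,\pi_2$ together with the normalization gives $\pi = [5/6 \;\; 1/6]$, which the rows of the computed limit reproduce.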
On the other hand, relation (9.22) is an eigenvalue equation with eigenvalue $\lambda = 1$ and eigenvector $\pi$. The Frobenius Theorem A.4.2 states that the transition probability matrix $P$ has one eigenvalue $\lambda = 1$ with corresponding eigenvector $\pi$. Since in (9.22) the set $(P - I)^T \pi^T = 0$ has rank $N - 1$, the normalization condition $\|\pi\|_1 = 1$ furnishes the (last) remaining equation. Except for the trivial case where $P$ is the identity matrix $I$, the solution of $\pi$ is obtained from

$$\begin{bmatrix}
P_{11} - 1 & P_{21} & P_{31} & \cdots & P_{N-1;1} & P_{N1} \\
P_{12} & P_{22} - 1 & P_{32} & \cdots & P_{N-1;2} & P_{N2} \\
P_{13} & P_{23} & P_{33} - 1 & \cdots & P_{N-1;3} & P_{N3} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
P_{1;N-1} & P_{2;N-1} & P_{3;N-1} & \cdots & P_{N-1;N-1} - 1 & P_{N;N-1} \\
1 & 1 & 1 & \cdots & 1 & 1
\end{bmatrix}
\begin{bmatrix} \pi_1 \\ \pi_2 \\ \pi_3 \\ \vdots \\ \pi_{N-1} \\ \pi_N \end{bmatrix}
=
\begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{9.25}$$

In practice, this method is used, especially if the number of states $N$ is large and the transition probability matrix $P$ does not exhibit a special matrix structure.
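A direct solution of the linear set (9.25) can be sketched with plain Gaussian elimination. The code below builds the first $N - 1$ rows from $(P - I)^T$, replaces the last row by the normalization, and solves the system with partial pivoting. It assumes an irreducible chain so that the system is non-singular; the three-state matrix is a hypothetical example.

```python
def steady_state(P):
    """Solve (9.25): balance equations plus normalization, by Gaussian elimination."""
    n = len(P)
    # rows 0..n-2 come from (P - I)^T, the last row is the normalization sum(pi) = 1
    M = [[P[c][r] - (1.0 if r == c else 0.0) for c in range(n)]
         for r in range(n - 1)]
    M.append([1.0] * n)
    rhs = [0.0] * (n - 1) + [1.0]
    aug = [row + [b] for row, b in zip(M, rhs)]    # augmented matrix
    for col in range(n):
        # partial pivoting: bring the largest remaining entry of this column up
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        tail = sum(aug[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (aug[r][n] - tail) / aug[r][r]
    return pi

P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]
pi = steady_state(P)
residual = [sum(pi[i] * P[i][j] for i in range(3)) - pi[j] for j in range(3)]
print(pi, residual)
```

The residual of $\pi = \pi.P$ vanishes up to rounding, including for the balance equation that was dropped in favour of the normalization, which reflects the rank $N - 1$ of the set.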
In summary, for irreducible Markov chains, there are in general two ways³ of computing the state distribution $\pi$: via the limiting process (9.24) or via solving the set of linear equations (9.25). Recall that we have invoked the Frobenius Theorem A.4.2, which is only applicable for irreducible Markov chains. There exist cases of practical interest where (9.24) fails to hold. For example, in the two-state Markov chain studied in Section 9.3.3, there is a chain where the limit state bounces back and forth between state 0 and state 1. It is of importance to know whether the steady-state distribution $\pi$ exists in the sense that $\pi_j \ne 0$ for at least one $j$. If $\pi_j = 0$ for all $j$, then there is no stationary (or equilibrium or steady-state) probability distribution.

³ A third method consists of a directed graph solution of linear, algebraic equations discussed by Chen (1971, Chapter 3) and applied to the steady state equation (9.22) by Hooghiemstra and Koole (2000).

9.3.2 The average number of visits to a recurrent state

A direct application of Lemma 6.1.1 to the steady-state of a Markov chain is that, if (9.24) holds, then

$$\lim_{n \to \infty} \frac{1}{n} \sum_{m=1}^{n} P^m_{ij} = \pi_j$$

Invoking (9.19) where $\frac{N_n(j)}{n}$ is the fraction of the time the chain is in state $j$ during the interval $[1, n]$, the relation is equivalent to

$$\lim_{n \to \infty} \frac{E[N_n(j) \mid X_0 = i]}{n} = \pi_j \tag{9.26}$$

The time average of the average number of visits to state $j$ given the Markov chain started in state $i$ converges to the steady-state distribution. In other words, the long run mean fraction of time that the chain spends in state $j$ equals $\pi_j$ and is independent of the initial state $i$. From (9.21), it immediately follows that, if $j$ is a transient state, $\pi_j = 0$. Only recurrent states $j$ have a non-zero probability $\pi_j$ that the steady-state is in recurrent state $j$. Lemma 6.1.1 and its consequence (9.26) suggest investigating $\frac{N_n(j)}{n}$ for recurrent states $j$.
If the Markov chain starts in a recurrent state m, we know from (9.21)
that the chain returns to state m innitely often. Let Zn (m) denote the time
of the n-th visit of the Markov chain to state m. Then,
Zn (m) = min(Qp (m) = n)
pD1

The interarrival time between the n-th and (n1)-th visit is n (m) = Zn (m)
Zn31 (m). The interarrival times {n (m)}nD1 are independent and identically
distributed random variables as follows from the Markov property. Indeed,
every time w the Markov chain returns to state m, it behaves from that time
onwards as if the Markov process would have started from state m, ignoring
the past before time w. Moreover, they have a common mean H [ (m)] =
H [1 (m)] equal to the mean return time to m given by H [Wm |[0 = m] = pm
because the hitting time is Wm = 1 (m). In other words, just as in renewal
theory in Chapter 8, we have a counting process {Qp (m)> p  1} with
associated waiting times Zn (m) and i.i.d. interarrival times n (m), specied
by the equivalence
{Qp (m) ? n} +, {Zn (m) A p}
Invoking the Elementary Renewal Theorem (8.13), we obtain with pm =

9.3 The steady-state of a Markov chain

171

H [Wm |[0 = m]
lim

q<"

Qq (m)
1
=
q
pm

(9.27)

Thus, the chain returns to state $j$ on average every $m_j$ time units and, hence, the fraction of time the chain is in state $j$ is roughly $1/m_j$. These results are summarized as follows:
Theorem 9.3.1 (Limit Law of Markov Chains) If $j$ is a recurrent state and the Markov chain starts in state $i$, then, with probability 1,

$$\lim_{n\to\infty} \frac{N_n(j)}{n} = \frac{1_{\{T_j<\infty\}}}{m_j} \tag{9.28}$$

and

$$\lim_{n\to\infty} \frac{E[N_n(j)\,|\,X_0=i]}{n} = \frac{r_{ij}}{m_j} \tag{9.29}$$

Proof: Above we have proved the case (9.27) where the initial state $X_0 = j$. In that case $1_{\{T_j<\infty\}} = 1$. For an arbitrary initial distribution, it is possible that the chain will never reach the recurrent state $j$. In that case $1_{\{T_j<\infty\}} = 0$ given $X_0 = i$. It remains to prove (9.29). By definition, $0 \le N_n(j) \le n$ or $0 \le N_n(j)/n \le 1$, which demonstrates that, for any $n$, $N_n(j)/n$ is bounded. From the Dominated Convergence Theorem 6.1.4, we have

$$\lim_{n\to\infty} E\left[\left.\frac{N_n(j)}{n}\,\right|\,X_0=i\right] = E\left[\left.\lim_{n\to\infty}\frac{N_n(j)}{n}\,\right|\,X_0=i\right] = E\left[\left.\frac{1_{\{T_j<\infty\}}}{m_j}\,\right|\,X_0=i\right] = \frac{\Pr[T_j<\infty\,|\,X_0=i]}{m_j} = \frac{r_{ij}}{m_j}$$

which completes the proof.

Theorem 9.3.1 introduces the need for an additional definition. A recurrent state $j$ is called null recurrent if $m_j = \infty$, in which case (9.29) reduces to

$$\lim_{n\to\infty} \frac{E[N_n(j)\,|\,X_0=i]}{n} = 0 \tag{9.30}$$

By Tauberian theorems (which investigate conditions for the converse of Lemma 6.1.1 but which are far more difficult, as illustrated in the book by Hardy (1948)), it can be shown that, for null recurrent states, the stronger result $\lim_{n\to\infty} P_{ij}^{n} = 0$ also holds. A recurrent state $j$ is called positive recurrent if $m_j < \infty$. The difference between a transient and a null recurrent state, which both obey (9.30), lies in the fact that, for a transient state, the limit $\lim_{n\to\infty} E[N_n(j)\,|\,X_0=j]$ is finite while, for a null recurrent state, $\lim_{n\to\infty} E[N_n(j)\,|\,X_0=j] = \infty$. Relation (9.30) indicates that for a null recurrent state

$$E[N_n(j)\,|\,X_0=j] = O(n^{a})$$

where $0 < a < 1$, while for a positive recurrent state

$$E[N_n(j)\,|\,X_0=j] = \pi_j n + o(n)$$

The strength of the increase of $E[N_n(j)\,|\,X_0=j]$ motivates calling positive recurrent states strongly ergodic, while null recurrent states are called weakly ergodic. Figure 9.1 sketches the classification of states in a Markov process.
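[Figure 9.1: classification tree. A state $j$ is either transient (labelled "$\pi_j$ does not exist") or recurrent. A recurrent state is either null recurrent ($\pi_j = 0$) or positive recurrent ($\pi_j > 0$). A positive recurrent state is either aperiodic ($\lim_{k\to\infty} P_{ij}^{k} = \pi_j$) or periodic with period $d$ ($\lim_{k\to\infty} P_{ij}^{kd} = d\,\pi_j$).]

Fig. 9.1. Classification of the states in a Markov process with the corresponding steady-state vector $\pi_j$.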

With these additional definitions, Corollary 9.2.2 can be sharpened as follows:
Corollary 9.3.2 A finite-state Markov chain must have at least one positive recurrent state.

Proof: By summing (9.11) over the first $n$ steps and dividing by $n$, we find

$$\frac{1}{n}\sum_{m=1}^{n}\sum_{j=1}^{N} P_{ij}^{m} = 1$$

Using (9.19) yields

$$\sum_{j=1}^{N} \frac{E[N_n(j)\,|\,X_0=i]}{n} = 1$$

When taking the limit $n \to \infty$ of both sides, the summation and limit operator can be reversed because the summation involves a finite number of terms. Hence,

$$\sum_{j=1}^{N} \lim_{n\to\infty}\frac{E[N_n(j)\,|\,X_0=i]}{n} = 1$$

which is only possible if at least one state $j$ is positive recurrent, because transient and null recurrent states obey (9.30).

Similarly, Theorem 9.2.3 and the combined consequence can be sharpened:

Theorem 9.3.3 If $i$ is a positive recurrent state that leads to a state $j$, then the state $j$ is also a positive recurrent state.
Theorem 9.3.4 An irreducible Markov chain with finite-state space is positive recurrent.
Alternatively, a Markov chain with a finite number of states has no null recurrent states. Thus, finite-state Markov chains appear to have simpler behavior than infinite-state Markov chains.
Theorem 9.3.5 For an irreducible, positive recurrent Markov chain (even with an infinite-state space), the steady-state is unique.
Proof: The steady-state of a positive recurrent irreducible Markov chain satisfies both (9.23) and (9.24), even for an infinite-state Markov chain. Suppose that $a \neq \pi$ is a second steady-state vector which satisfies $\|a\|_1 = 1$ and

$$a_j = \sum_{k=1}^{N} P_{kj}\,a_k \tag{9.31}$$

Multiplying both sides by $P_{ji}$ and summing over all $j$ gives

$$\sum_{j=1}^{N} P_{ji}\,a_j = \sum_{j=1}^{N} P_{ji}\sum_{k=1}^{N} P_{kj}\,a_k = \sum_{k=1}^{N} a_k \sum_{j=1}^{N} P_{kj}P_{ji} = \sum_{k=1}^{N} a_k P_{ki}^{2}$$

The reversal in $j$- and $k$-summation is always allowed (even for $N \to \infty$) by absolute convergence. Using (9.31), the left-hand side equals $a_i$, so that

$$a_i = \sum_{k=1}^{N} a_k P_{ki}^{2}$$

Repeating this process leads, for any $n \ge 1$ and $i \ge 1$, to

$$a_i = \sum_{k=1}^{N} a_k P_{ki}^{n}$$

In the limit for $n \to \infty$, application of (9.24) yields

$$a_i = \pi_i \sum_{k=1}^{N} a_k = \pi_i$$

which demonstrates uniqueness.

Theorem 9.3.6 For an irreducible, positive recurrent Markov chain holds

$$\lim_{n\to\infty} \frac{E[N_n(j)\,|\,X_0=i]}{n} = \lim_{n\to\infty} \frac{N_n(j)}{n} = \frac{1}{m_j} = \pi_j \tag{9.32}$$

and

$$\frac{N_n(j) - n\pi_j}{\sigma_j\,\pi_j^{3/2}\sqrt{n}} \xrightarrow{d} N(0,1) \tag{9.33}$$

where $\sigma_j^2 = \mathrm{Var}[T_j\,|\,X_0=j]$.

Proof: For an irreducible, finite-state Markov chain where $r_{ij} = 1$, Theorem 9.3.1 and Theorem 9.3.4 together with (9.26) lead to the fundamental relation (9.32). Relation (9.33) is an application of the Asymptotic Renewal Distribution Theorem 8.2.3. We have shown that the interarrival times $\{\tau_k(j)\}_{k\ge 1}$ are i.i.d. with mean $E[\tau(j)] = E[T_j\,|\,X_0=j] = m_j$ and (assumed finite) variance $\mathrm{Var}[\tau(j)] = \mathrm{Var}[T_j\,|\,X_0=j] = \sigma_j^2$.

As a corollary, from (8.16), we have

$$\lim_{n\to\infty} \frac{\mathrm{Var}[N_n(j)\,|\,X_0=i]}{n} = \sigma_j^2\,\pi_j^3 \tag{9.34}$$

Moreover, since $\|\pi\|_1 = 1$, it must hold from (9.32) that

$$\sum_{j=1}^{N} \frac{1}{m_j} = 1$$

and, from (9.22), that

$$\sum_{i=1}^{N} \frac{P_{ij}}{m_i} = \frac{1}{m_j}$$

A Markov chain that is irreducible and for which all states are positive recurrent is said to be ergodic. Ergodicity implies that the steady-state distribution $\pi$ and the long-run probability distribution $\lim_{k\to\infty} s[k]$ are the same. Ergodic Markov chains are basic stochastic processes in the study of queueing theory.
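The relations between the steady-state vector and the mean return times can be checked numerically. The following is a minimal sketch, assuming a hypothetical three-state transition matrix `P` (not taken from the text) and a plain power iteration for the steady-state vector; the helper names are illustrative only.

```python
def step(vec, P):
    """Return the row vector vec * P."""
    n = len(P)
    return [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]

def steady_state(P, iterations=2000):
    """Approximate pi by iterating pi <- pi P (assumes an aperiodic, irreducible chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = step(pi, P)
    return pi

# Hypothetical irreducible, aperiodic chain; every row sums to 1.
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.1, 0.5]]

pi = steady_state(P)
mean_return = [1.0 / x for x in pi]            # m_j = 1 / pi_j, from (9.32)
sum_inverse = sum(1.0 / m for m in mean_return)  # should equal 1
# sum_i P_ij / m_i should equal 1 / m_j for every j
lhs = [sum(P[i][j] / mean_return[i] for i in range(3)) for j in range(3)]
```

Power iteration is only one way to obtain the steady-state vector; it fails for periodic chains, where a time average of the powers of `P` is needed instead.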
9.3.3 Example: the two-state Markov chain
The two-state Markov chain is defined by

$$P = \begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix}$$

and illustrated in Fig. 9.2.

[Figure 9.2: state graph with two states; self-loops labelled $1-p$ and $1-q$, and a transition labelled $q$.]

Fig. 9.2. A two-state Markov chain.

A matrix computation of the two-state Markov chain is presented in Appendix A.4.2. Here, we follow a probabilistic approach. Since there are only two states, at any discrete-time $k$, there holds that $\Pr[X_k = 0] = 1 - \Pr[X_k = 1]$. Hence, it suffices to compute $\Pr[X_k = 0]$. By the total law of probability and the Markov property (9.2), we have

$$\Pr[X_{k+1}=0] = \Pr[X_{k+1}=0\,|\,X_k=1]\Pr[X_k=1] + \Pr[X_{k+1}=0\,|\,X_k=0]\Pr[X_k=0]$$
or, from Fig. 9.2, the Markov chain can only be in state 0 at time $k+1$ if it is in state 0 at time $k$ and the next event at time $k+1$ brings it back to that same state 0, or if it is in state 1 at time $k$ and the next event at time $k+1$ induces a transfer to state 0. Introducing the transition probabilities,

$$\begin{aligned}
\Pr[X_{k+1}=0] &= q\,\Pr[X_k=1] + (1-p)\Pr[X_k=0] \\
&= q\,(1-\Pr[X_k=0]) + (1-p)\Pr[X_k=0] \\
&= (1-p-q)\Pr[X_k=0] + q
\end{aligned}$$

This recursion can be iterated back to $k = 0$,

$$\Pr[X_k=0] = (1-p-q)^{k}\Pr[X_0=0] + q\sum_{j=0}^{k-1}(1-p-q)^{j}$$

Using the finite geometric series $\sum_{j=0}^{k-1} x^{j} = \frac{1-x^{k}}{1-x}$ for any $x \neq 1$ (else $\sum_{j=0}^{k-1} x^{j} = k$),

$$\Pr[X_k=0] = \frac{q}{p+q} + (1-p-q)^{k}\left(\Pr[X_0=0] - \frac{q}{p+q}\right) \tag{9.35}$$

With $\Pr[X_k=1] = 1 - \Pr[X_k=0]$,

$$\Pr[X_k=1] = \frac{p}{p+q} + (1-p-q)^{k}\left(\Pr[X_0=1] - \frac{p}{p+q}\right) \tag{9.36}$$

If $|1-p-q| < 1$, the steady-state $\pi = \begin{bmatrix}\Pr[X_\infty=0] & \Pr[X_\infty=1]\end{bmatrix}$ directly follows as

$$\pi = \begin{bmatrix} \dfrac{q}{p+q} & \dfrac{p}{p+q} \end{bmatrix} \tag{9.37}$$

Observe from (9.35) and (9.36) that, if $\Pr[X_0=0] = \frac{q}{p+q} = \Pr[X_\infty=0]$ and $\Pr[X_0=1] = \frac{p}{p+q} = \Pr[X_\infty=1]$, the Markov chain starts and remains the whole time (for all $k$) in the steady-state. In addition, the probability of a particular sequence of states can be computed from (9.3) or directly from Fig. 9.2. For example,

$$\Pr[X_0=1, X_1=0, X_2=1, X_3=1] = q\,p\,(1-q)\Pr[X_0=1]$$
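The closed form (9.35) can be compared with a direct iteration of the recursion that precedes it. The sketch below is illustrative only: the function names and the numerical values of `p`, `q` and the initial probability `s0` are assumptions, not taken from the text.

```python
def prob_state0_closed_form(p, q, k, s0):
    """Pr[X_k = 0] from (9.35), with s0 = Pr[X_0 = 0]; requires p + q > 0."""
    limit = q / (p + q)
    return limit + (1.0 - p - q) ** k * (s0 - limit)

def prob_state0_recursion(p, q, k, s0):
    """Pr[X_k = 0] by iterating Pr[X_{k+1}=0] = (1-p-q) Pr[X_k=0] + q."""
    prob = s0
    for _ in range(k):
        prob = (1.0 - p - q) * prob + q
    return prob

p, q, s0 = 0.3, 0.1, 1.0   # illustrative values; the chain starts in state 0
comparison = [(k,
               prob_state0_closed_form(p, q, k, s0),
               prob_state0_recursion(p, q, k, s0))
              for k in (0, 1, 5, 50)]
steady = prob_state0_closed_form(p, q, 200, s0)   # approaches q / (p + q)
```

For these values $|1-p-q| = 0.6$, so the transient term decays geometrically and `steady` is numerically indistinguishable from $q/(p+q)$.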


We distinguish three cases:
(i) $p = q = 0$: The Markov chain consists of two separate states that do not communicate. Each state can be considered as a single state, irreducible, Markov chain. Any real number belonging to $[0,1]$ is a steady-state solution of each separate set. Also, $P = I$ and, hence, $P^{\infty} = \lim_{k\to\infty} P^{k} = I$.
(ii) $0 < p + q < 2$: The Markov chain is aperiodic irreducible positive recurrent with steady-state $\pi$ given in (9.37). This is the regular case.
(iii) $p = q = 1$: The Markov chain is periodic with period 2, but still irreducible positive recurrent with steady-state $\pi = \begin{bmatrix}\tfrac12 & \tfrac12\end{bmatrix}$ given above. However, $P^{2n} = I$ and $P^{2n+1} = P$ such that $\lim_{k\to\infty} P^{k}$ does not exist, but

$$\lim_{k\to\infty} \frac{1}{k}\sum_{j=1}^{k} P^{j} = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{bmatrix}$$
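Case (iii) can be made concrete with a few lines of code. This is a minimal sketch, assuming plain nested lists for matrices; the helper names are hypothetical. It shows that the powers of the periodic matrix alternate while their time average settles at the steady-state.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cesaro_average(P, k):
    """Return (1/k) * (P^1 + P^2 + ... + P^k)."""
    n = len(P)
    power = [row[:] for row in P]                 # current power, starts at P^1
    total = [[0.0] * n for _ in range(n)]
    for _ in range(k):
        for a in range(n):
            for b in range(n):
                total[a][b] += power[a][b]
        power = matmul(power, P)
    return [[x / k for x in row] for row in total]

P = [[0.0, 1.0],
     [1.0, 0.0]]            # two-state chain with p = q = 1
P_squared = matmul(P, P)    # equals the identity, so P^k has no limit
average = cesaro_average(P, 1000)
```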

9.4 Problems
(i) Given the transition probability matrix $P$,

$$P = \begin{bmatrix} 0.8 & 0.2 & 0.0 \\ 0.8 & 0.0 & 0.2 \\ 0.0 & 0.8 & 0.2 \end{bmatrix}$$

(a) draw the Markov chain, (b) compute the steady-state vector in three different ways.
(ii) Consider the discrete-time Markov chain with $N$ states and with transition probabilities at each state $j$,

$$P_{j,j+1} = 1 - \frac{1}{j}, \qquad P_{j1} = \frac{1}{j}$$

(a) draw the Markov chain, (b) show that the drift is positive, but that the Markov chain is nevertheless recurrent.
(iii) Assume that trees in a forest fall into four age groups. Let $b[k]$, $y[k]$, $m[k]$ and $u[k]$ denote the number of baby trees, young trees, middle-aged trees and old trees, respectively, in the forest at a given time period $k$. A time period lasts 15 years. During a time period, the total number of trees remains constant, but a certain percentage of trees in each age group dies and is replaced with baby trees. All surviving trees in the baby, young and middle-aged group enter into the next age group. Surviving old trees remain old. Let $0 < p_b, p_y, p_m, p_o < 1$ denote the loss rates in each age group in percent.
(a) Make a discrete Markov chain presentation of the process of aging and replacement in the forest.
(b) The distribution of tree population amongst different age categories in time period $k$ is represented by $x[k] = \begin{bmatrix} b[k] & y[k] & m[k] & u[k] \end{bmatrix}^{T}$. If $x[k+1] = P\,x[k]$, what is the transition probability matrix $P$?
(c) Let $p_b = 0.1$, $p_y = 0.2$, $p_m = 0.3$, $p_o = 0.4$ and suppose that $x[k] = \begin{bmatrix} 5000 & 0 & 0 & 0 \end{bmatrix}^{T}$. What is the number of trees in each category after 15 and after 30 years?
(d) What is the steady-state situation?
(iv) A faulty digital video conferencing system shows a clustered error
pattern. If a bit is received correctly, then the chance to receive the
next bit correctly is 0.999. If a bit is received incorrectly, then the
next bit is incorrect with probability 0.95.
(a) Model the error pattern of this system using the discrete-time
Markov chain.
(b) How many communicating classes does the Markov chain have?
Is it irreducible?
(c) In the long run, what is the fraction of correctly received bits
and the fraction of incorrectly received bits?
(d) After the system is repaired, it works properly for 99.9% of
the time. A test sequence after repair shows that, when always starting with a correctly received bit, the next 10 bits
are correctly received with probability 0.9999. What is the
probability now that a correctly (and analogously incorrectly)
received bit is followed by another correct (incorrect) bit?

10 Continuous-time Markov chains

Just as it was convenient in Chapter 2 to treat discrete and continuous random variables distinctly, the same recipe is advised for discrete-time and continuous-time Markov chains. Here also, it appears that the continuous case is more intricate than the discrete counterpart.

10.1 Definition
For the continuous-time Markov chain $\{X(t), t \ge 0\}$ with $N$ states, the Markov property (9.1) can be written as

$$\Pr[X(t+\tau)=j\,|\,X(\tau)=i,\ X(u)=x(u),\ 0\le u<\tau] = \Pr[X(t+\tau)=j\,|\,X(\tau)=i]$$

and reflects the fact that the future state at time $t+\tau$ only depends on the current state at time $\tau$. Similarly as for the discrete-time Markov chain, we assume that the transition probabilities for the continuous-time Markov chain $\{X(t), t \ge 0\}$ are stationary, i.e. independent of a point $\tau$ in time,

$$P_{ij}(t) = \Pr[X(t+\tau)=j\,|\,X(\tau)=i] = \Pr[X(t)=j\,|\,X(0)=i] \tag{10.1}$$

Analogous to (9.5) and (9.6), the state vector $s(t)$ in continuous-time with components $s_k(t) = \Pr[X(t)=k]$ obeys

$$s(t+\tau) = s(\tau)P(t) \tag{10.2}$$

Immediately, it follows from (10.2) that

$$\begin{aligned}
s(t+u+\tau) &= s(\tau)P(t+u) \\
s(t+u+\tau) &= s(\tau+u)P(t) = s(\tau)P(u)P(t) \\
&= s(\tau+t)P(u) = s(\tau)P(t)P(u)
\end{aligned}$$

such that, for all $t, u \ge 0$, the $N \times N$ transition probability matrix $P(t)$ satisfies

$$P(t+u) = P(u)P(t) = P(t)P(u) \tag{10.3}$$

This fundamental relation (10.3) is called the Chapman-Kolmogorov equation (see footnote 1). Furthermore, since the Markov chain must be at any time in one of the $N$ states, the analogon of (9.8) is, for any state $i$,

$$\sum_{j=1}^{N} P_{ij}(t) = 1 \tag{10.4}$$

For continuous-time Markov chains, it is convenient to postulate the initial condition of the transition probability matrix

$$P(0) = I \tag{10.5}$$

where $P(0) = \lim_{t\downarrow 0} P(t)$. The relations (10.1), (10.3), (10.4) and (10.5) are sufficient to describe the continuous-time Markov process completely.

10.2 Properties of continuous-time Markov processes


We will now concentrate on typical properties of a continuous-time Markov process.

10.2.1 The infinitesimal generator Q

Lemma 10.2.1 The transition probability matrix $P(t)$ is continuous for all $t \ge 0$.
Proof: Continuity is proved if $\lim_{h\to 0} P(t+h) = \lim_{h\to 0} P(t-h) = P(t)$. From (10.3) and (10.5), we have for $h > 0$,

$$\lim_{h\to 0} P(t+h) = P(t)\lim_{h\to 0} P(h) = P(t)\,I = P(t)$$

Similarly, the other limit follows for $t > 0$ and $0 < h < t$ from $P(t) = P(t-h)P(h)$.

Footnote 1: On a higher level of abstraction, $P(t)$ can be viewed as a linear operator acting upon the vector space defined by all possible state vectors $s(t)$. Relation $P(t+u) = P(u)P(t)$ is known as the semigroup property. The family of these commuting operators possesses an interesting algebraic structure (see e.g. Schoutens (2000)).

If a function is differentiable, it is continuous. However, the converse is not generally true. Therefore, we include the additional assumption that the matrix

$$\lim_{h\downarrow 0} \frac{P(h) - I}{h} = P'(0) = Q \tag{10.6}$$

exists. This matrix $Q$ is called the infinitesimal generator of the continuous-time Markov process and it plays an important role as shown below. The infinitesimal generator $Q$ corresponds to $P - I$ in discrete-time. From (10.4),

$$\sum_{j=1,\,j\neq i}^{N} P_{ij}(h) = 1 - P_{ii}(h)$$

and, dividing both sides by $h$ and letting $h$ approach zero, we find for each $i$ with the definition of $Q$ that

$$\sum_{j=1,\,j\neq i}^{N} q_{ij} = -q_{ii} \ge 0 \tag{10.7}$$

Hence, each row of $Q$ sums to zero, $q_{ij} = \lim_{h\downarrow 0} \frac{P_{ij}(h)}{h} \ge 0$ for $i \neq j$ and $q_{ii} \le 0$. The elements $q_{ij}$ of $Q$ are derivatives of probabilities and reflect a change in transition probability from state $i$ towards state $j$, which suggests calling them rates. Usually, one defines $q_i = -q_{ii} \ge 0$. Then, $\sum_{j=1}^{N} |q_{ij}| = 2q_i$, which demonstrates that $Q$ is bounded if and only if the rates $q_i$ are bounded. Karlin and Taylor (1981, p. 140) show that $q_{ij}$ is always finite. For finite-state Markov processes, $q_j$ are finite (since $q_{ij}$ are finite), but, in general, $q_j$ can be infinite. If $q_j = \infty$, the state is called instantaneous because when the process enters this state, it immediately leaves the state. In the sequel, we confine the discussion to non-instantaneous states, thus $0 \le q_j < \infty$. Continuous-time Markov chains with all states non-instantaneous are coined conservative.
Probabilistically, (10.1) indicates that, for small $h$,

$$\begin{aligned}
\Pr[X(t+h)=j\,|\,X(t)=i] &= q_{ij}h + o(h) \qquad (i \neq j) \\
\Pr[X(t+h)=i\,|\,X(t)=i] &= 1 - q_i h + o(h)
\end{aligned} \tag{10.8}$$

which clearly generalizes the Poisson process (see Theorem 7.3.1) and motivates us to call $q_i$ the rate corresponding to state $i$.
Lemma 10.2.2 Given the infinitesimal generator $Q$, the transition probability matrix $P(t)$ is differentiable for all $t \ge 0$,

$$P'(t) = P(t)\,Q \tag{10.9}$$

$$P'(t) = Q\,P(t) \tag{10.10}$$

These equations are called the forward (10.9) and backward (10.10) equation.
Proof: For $t = 0$, the lemma follows from the existence of $Q = P'(0)$. The derivative $P'(t)$ is defined, for $t > 0$, as

$$P'(t) = \lim_{h\to 0} \frac{P(t+h) - P(t)}{h}$$

where the derivative of the matrix has elements $P'_{ij}(t) = \frac{dP_{ij}(t)}{dt}$. Using (10.3),

$$\begin{aligned}
P(t+h) - P(t) &= P(t)P(h) - P(t) = P(t)\,(P(h) - I) \\
&= P(h)P(t) - P(t) = (P(h) - I)\,P(t)
\end{aligned}$$

we obtain

$$\begin{aligned}
P'(t) &= P(t)\lim_{h\to 0}\frac{P(h) - I}{h} = P(t)\,Q \\
&= \lim_{h\to 0}\frac{P(h) - I}{h}\,P(t) = Q\,P(t)
\end{aligned}$$

which proves the lemma.

Suppose we are interested in the probabilities $s_k(t) = \Pr[X(t)=k]$ of finding the system in state $k$ at time $t$. Each component of the state vector $s(t)$ is determined by (10.2) as

$$s_k(t+h) = \sum_{j=1}^{N} s_j(t)\,P_{jk}(h)$$

from which

$$\frac{s_k(t+h) - s_k(t)}{h} = s_k(t)\,\frac{P_{kk}(h) - 1}{h} + \sum_{j=1,\,j\neq k}^{N} s_j(t)\,\frac{P_{jk}(h)}{h}$$

In the limit $h \downarrow 0$, we find with $q_{jk} = \lim_{h\downarrow 0} \frac{P_{jk}(h)}{h}$ and $q_k = \lim_{h\downarrow 0} \frac{1 - P_{kk}(h)}{h}$ the differential equation for $s_k(t)$,

$$s'_k(t) = -q_k\,s_k(t) + \sum_{j=1,\,j\neq k}^{N} q_{jk}\,s_j(t) \tag{10.11}$$

which, together with the initial condition $s_k(0)$, completely determines the probability $s_k(t)$ that the Markov process is in state $k$ at time $t$.


10.2.2 Algebraic properties of the infinitesimal generator Q

Equation (10.10) is a matrix differential equation in $t$ that can be solved similarly to the scalar differential equation $f'(t) = q f(t)$. With the initial condition (10.5), the solution is

$$P(t) = e^{Qt} \tag{10.12}$$

which demonstrates the importance of the infinitesimal generator $Q$, explicitly given by

$$Q = \begin{bmatrix}
-q_1 & q_{12} & q_{13} & \cdots & q_{1,N-1} & q_{1N} \\
q_{21} & -q_2 & q_{23} & \cdots & q_{2,N-1} & q_{2N} \\
q_{31} & q_{32} & -q_3 & \cdots & q_{3,N-1} & q_{3N} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
q_{N-1,1} & q_{N-1,2} & q_{N-1,3} & \cdots & -q_{N-1} & q_{N-1,N} \\
q_{N1} & q_{N2} & q_{N3} & \cdots & q_{N,N-1} & -q_N
\end{bmatrix} \tag{10.13}$$
Moreover, if all eigenvalues $\lambda_k$ of $Q$ are distinct, art. 4 and art. 8 in Appendix A.1 indicate that

$$P(t) = e^{Qt} = X\,\mathrm{diag}(e^{\lambda_k t})\,Y^{T} \tag{10.14}$$

where $X$ and $Y$ contain as columns the right- and left-eigenvectors of $Q$ respectively. Written explicitly in terms of the right-eigenvectors $x_k$ and left-eigenvectors $y_k$ (which both are $N \times 1$ matrices or column vectors, as common in vector algebra), (10.14) reads

$$P(t) = \sum_{k=1}^{N} e^{\lambda_k t}\,x_k y_k^{T}$$

where the inner or scalar vector product $y_k^{T} x_k = 1$ while the outer product $x_k y_k^{T}$ is an $N \times N$ matrix,

$$x_k y_k^{T} = \begin{bmatrix}
x_{k1}y_{k1} & x_{k1}y_{k2} & x_{k1}y_{k3} & \cdots & x_{k1}y_{kN} \\
x_{k2}y_{k1} & x_{k2}y_{k2} & x_{k2}y_{k3} & \cdots & x_{k2}y_{kN} \\
x_{k3}y_{k1} & x_{k3}y_{k2} & x_{k3}y_{k3} & \cdots & x_{k3}y_{kN} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{kN}y_{k1} & x_{kN}y_{k2} & x_{kN}y_{k3} & \cdots & x_{kN}y_{kN}
\end{bmatrix}$$

If we further assume (thus omitting pathological cases) that $P(t)$ is a stochastic, irreducible matrix for any time $t$, Frobenius Theorem A.4.2 indicates that all eigenvalues satisfy $|e^{\lambda_k t}| < 1$, except for the largest one, which is precisely equal to 1, say $e^{\lambda_1 t} = 1$. The latter corresponds to the steady-state eigenvector $y_1^{T} = \pi$ and $x_1 = u$, where $u^{T} = [1\ 1\ \cdots\ 1]$. Frobenius Theorem A.4.2 thus implies that all eigenvalues of $Q$ have a negative real part, except for the steady-state eigenvalue $\lambda_1 = 0$. Hence, we may write

$$P(t) = u\pi + \sum_{k=2}^{N} e^{-|\mathrm{Re}\,\lambda_k|\,t + i\,\mathrm{Im}\,\lambda_k\,t}\,x_k y_k^{T} \tag{10.15}$$

where $P_\infty = u\pi$ is the $N \times N$ matrix with each row containing the steady-state vector $\pi$. The expression (10.15) is called the spectral or eigen decomposition of the transition probability matrix $P(t)$.
Apart from the eigen decomposition method and the Taylor expansion

$$e^{Qt} = \sum_{k=0}^{\infty} \frac{(Qt)^{k}}{k!} \tag{10.16}$$

the matrix equivalent of $e^{x} = \lim_{n\to\infty} (1 + x/n)^{n}$ can be used,

$$P(t) = e^{Qt} = \lim_{n\to\infty} \left(I + \frac{Qt}{n}\right)^{n} \tag{10.17}$$

Since $Q$ has negative diagonal elements and positive off-diagonal elements, computing the powers $Q^{k}$ as required in (10.16) suffers from numerical rounding-off error propagation. Relation (10.17) circumvents this problem by choosing $n$ sufficiently high, $\max_i q_i\,t < n$, such that $I + \frac{Qt}{n}$ has non-negative elements smaller than 1 everywhere. For stochastic matrices $P$, the sequence $P, P^{2}, P^{4}, \ldots, P^{2^{k}}$ rapidly converges. Yet another useful representation (10.24) of $P(t)$ is discussed in Section 10.4.1.
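Both series representations can be tried on a small example. The sketch below assumes a hypothetical three-state generator `Q` (not from the text), truncates the Taylor series after a fixed number of terms, and evaluates the limit form with $n = 2^{20}$ through repeated squaring; all function names are illustrative.

```python
def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def expm_taylor(Q, t, terms=60):
    """Truncated Taylor series (10.16): sum over k < terms of (Qt)^k / k!."""
    n = len(Q)
    Qt = [[q * t for q in row] for row in Q]
    term = identity(n)                      # (Qt)^0 / 0!
    result = identity(n)
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, Qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def expm_limit(Q, t, squarings=20):
    """(I + Qt/n)^n from (10.17) with n = 2**squarings, by repeated squaring."""
    n = len(Q)
    steps = 2 ** squarings
    M = [[(1.0 if i == j else 0.0) + Q[i][j] * t / steps for j in range(n)]
         for i in range(n)]
    for _ in range(squarings):
        M = matmul(M, M)
    return M

# Hypothetical generator: non-negative off-diagonal rates, rows sum to zero.
Q = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [0.5, 0.5, -1.0]]
P_taylor = expm_taylor(Q, 1.0)
P_limit = expm_limit(Q, 1.0)
max_gap = max(abs(P_taylor[i][j] - P_limit[i][j]) for i in range(3) for j in range(3))
```

For rates of this size the Taylor series is well behaved. The rounding problem mentioned above becomes visible only when $\max_i q_i\,t$ is large, which is where the limit form, whose factor has only non-negative entries, is the safer choice.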

10.2.3 Exponential sojourn times


We end this section on properties by proving a remarkable and important
characteristic of continuous-time Markov processes.
Theorem 10.2.3 The sojourn times $\tau_j$ of a continuous-time Markov process in a state $j$ are independent, exponential random variables with mean $\frac{1}{q_j}$.
Proof: The independence of the sojourn times follows from the Markov property (see the renewal argument in Section 9.3.2). The exponential sojourn time is proved in two different ways.
1. The proof consists in demonstrating that the sojourn times $\tau_j$ satisfy the memoryless property. In Section 3.2.2, it has been shown that the only continuous distribution that satisfies the memoryless property is the exponential distribution.

The event $\{\tau_j \ge t + T\,|\,\tau_j > T\}$ for any $T \ge 0$ and $t \ge 0$ is equivalent to the event $\{X(t+T+u)=j\,|\,X(T+u)=j,\ X(u)=j\}$. According to the Markov property (9.1) and with (10.1),

$$\begin{aligned}
\Pr[\tau_j \ge t+T\,|\,\tau_j > T] &= \Pr[X(t+T+u)=j\,|\,X(T+u)=j,\ X(u)=j] \\
&= \Pr[X(t+T+u)=j\,|\,X(T+u)=j] \\
&= P_{jj}(t)
\end{aligned}$$

which is independent of $T$, illustrating the memoryless property. Using the definition of conditional probability (2.44),

$$\Pr[\tau_j \ge t+T\,|\,\tau_j > T] = \frac{\Pr[\tau_j \ge t+T]}{\Pr[\tau_j > T]} = P_{jj}(t)$$

which holds for any $T$ and thus also for $T = 0$, where $\Pr[\tau_j > 0] = 1$. The distribution of the sojourn time at state $j$ satisfies

$$\Pr[\tau_j \ge t] = e^{-\alpha_j t} = P_{jj}(t)$$

After differentiation evaluated at $t = 0$, we find $\alpha_j = q_j$.
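2. An alternative demonstration of the exponential sojourn times starts by considering, for an initial state $j$, the probability $H_n$ that the process remains in state $j$ during an interval $[0,t]$. The idea is to first sample the continuous-time interval with step $\frac{t}{n}$ and afterwards proceed to the limit $n \to \infty$, which corresponds to a sampling with infinitesimally small step,

$$\begin{aligned}
H_n &= \Pr\left[X(0)=j,\ X\!\left(\tfrac{t}{n}\right)=j,\ X\!\left(\tfrac{2t}{n}\right)=j,\ \ldots,\ X(t)=j\right] \\
&= \prod_{m=0}^{n-1} \Pr\left[\left. X\!\left(\tfrac{(m+1)t}{n}\right)=j\,\right|\,X\!\left(\tfrac{mt}{n}\right)=j\right] \Pr[X(0)=j] \\
&= \left(\Pr\left[\left. X\!\left(\tfrac{t}{n}\right)=j\,\right|\,X(0)=j\right]\right)^{n} \Pr[X(0)=j] \\
&= \left(P_{jj}\!\left(\tfrac{t}{n}\right)\right)^{n} \Pr[X(0)=j]
\end{aligned}$$

where (9.3) and (10.1) are used. For large $n$, $P_{jj}\!\left(\frac{t}{n}\right)$ can be expanded in a Taylor series around the origin,

$$P_{jj}\!\left(\frac{t}{n}\right) = P_{jj}(0) + P'_{jj}(0)\,\frac{t}{n} + O\!\left(\frac{1}{n^{2}}\right) = 1 - q_j\,\frac{t}{n} + O\!\left(\frac{1}{n^{2}}\right)$$

such that

$$\left(P_{jj}\!\left(\frac{t}{n}\right)\right)^{n} = \exp\left[n \log\left(1 - q_j\,\frac{t}{n} + O\!\left(\frac{1}{n^{2}}\right)\right)\right]$$

For large $n$ the logarithm can be expanded to first order as

$$\log\left(1 - q_j\,\frac{t}{n} + O\!\left(\frac{1}{n^{2}}\right)\right) = -q_j\,\frac{t}{n} + O\!\left(\frac{1}{n^{2}}\right)$$

which shows that

$$\lim_{n\to\infty} \left(P_{jj}\!\left(\frac{t}{n}\right)\right)^{n} = e^{-q_j t}$$

On the other hand,

$$\lim_{n\to\infty} H_n = \Pr[X(u)=j,\ 0 \le u \le t]$$

Hence, the probability that the process remains in state $j$ at least for a duration $t$ equals

$$\Pr[X(u)=j,\ 0 \le u \le t] = e^{-q_j t}\,\Pr[X(0)=j]$$

Conditioned to the initial state with (2.44),

$$\Pr[X(u)=j,\ 0 \le u \le t\,|\,X(0)=j] = \Pr[\tau_j \ge t] = e^{-q_j t} \tag{10.18}$$

Without resorting to the memoryless property, Theorem 10.2.3 has been proved.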

In summary, the continuous-time Markov process $\{X(t), t \ge 0\}$ can be described in two equivalent ways, either by the transition probability matrix $P(t)$ or by the infinitesimal generator $Q$. In the first description, the process starts at time $t = t_0 = 0$ in state $x_0$, where it stays until a transition occurs at $t = t_1$, which makes the process jump to state $x_1$. In state $x_1$, the process stays until $t = t_2$, at which time it jumps to state $x_2$, and so on. The sequence of states $x_0, x_1, x_2, \ldots$ is a discrete Markov process and is called the embedded Markov chain. The embedded Markov chain is further discussed in Section 10.4. The infinitesimal description based on $Q$ formulates the evolution of the process in terms of rates. The process waits in a state $j$ until a jump or trigger occurs with rate $q_j$, and the average waiting time in state $j$ is $\frac{1}{q_j}$. If $q_j = 0$, the Markov process stays infinitely long in state $j$, implying that state $j$ is an absorbing state.


10.3 Steady-state
Theorems 9.3.4 and 9.3.6 demonstrate that, when a finite-state Markov chain is irreducible (all states communicate and $P_{ij}(t) > 0$), the steady-state $\pi$ exists. Since, by definition, the steady-state does not change over time, or $\lim_{t\to\infty} P'(t) = 0$, it follows from (10.9) and (10.10) that

$$Q\,P_\infty = P_\infty\,Q = 0$$

where $\lim_{t\to\infty} P(t) = P_\infty$. This relation implies that $P_\infty$ is the adjoint matrix of $Q$ belonging to eigenvalue $\lambda = 0$, which plays a role analogous to $\lambda = 1$ in the discrete case. By the same arguments as in the discrete case and as shown in Section 10.2.2, all rows of $P_\infty$ are proportional to the eigenvector of $Q$ belonging to $\lambda = 0$. Thus, the steady-state (row) vector $\pi$ is solution of

$$\pi Q = 0 \tag{10.19}$$

which means that $\pi$ is orthogonal to any column vector of $Q$ such that necessarily $\det Q = 0$ in order for a non-zero solution to exist. A single component of $\pi$ in (10.19) obeys, using (10.7),

$$\pi_i\,q_i = \sum_{j=1,\,j\neq i}^{N} \pi_j\,q_{ji} \tag{10.20}$$

This equation has a continuity or conservation law interpretation. The left-hand side reflects the long-run rate at which the process leaves state $i$. The right-hand side is the sum of the long-run rates of transitions towards the state $i$ from other states $j \neq i$, or the aggregate long-run rate towards state $i$. Both in- and outwards flux at any state $i$ are in steady-state precisely in balance. Therefore relations (10.20) are called the balance equations. The balance equation (10.20) directly follows from the differential equation (10.11) of the state probabilities $s_k(t)$ since $\lim_{t\to\infty} s_k(t) = \pi_k$ and $\lim_{t\to\infty} s'_k(t) = 0$.
Alternatively, the steady-state vector $\pi$ obeys (10.2) or

$$\pi = s(0)\,P_\infty = \lim_{t\to\infty} s(0)\,e^{Qt}$$

which, together with (10.14), implies that all non-zero eigenvalues of $Q$ must have negative real part such that only $\lambda = 0$ determines the steady-state. This stability condition on the eigenvalues corresponds to that in a linear, time-invariant system. Since all rows in $P_\infty$ are equal (see also (10.15)), the dependence of the steady-state vector $\pi$ on the initial state drops out. For, analogous to the discrete-time case and recalling the normalization $\|s(0)\|_1 = 1$, a single component becomes

$$\pi_j = \sum_{k=1}^{N} s_k(0)\,(P_\infty)_{kj} = (P_\infty)_{1j} \sum_{k=1}^{N} s_k(0) = (P_\infty)_{1j}$$
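The steady-state equation $\pi Q = 0$ together with the normalization can be solved as an ordinary linear system. The sketch below is one possible approach under stated assumptions: a hypothetical three-state generator `Q`, and Gaussian elimination in which the last balance equation is replaced by $\sum_i \pi_i = 1$. It also evaluates the residual of each balance equation (10.20).

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]      # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                # back substitution
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def ctmc_steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 (assumes an irreducible chain)."""
    n = len(Q)
    A = [[Q[i][j] for i in range(n)] for j in range(n)]   # transpose of Q
    A[-1] = [1.0] * n                                     # normalization row
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(A, b)

# Hypothetical generator (rows sum to zero).
Q = [[-2.0, 1.0, 1.0],
     [1.0, -3.0, 2.0],
     [0.5, 0.5, -1.0]]
pi = ctmc_steady_state(Q)
# Balance equations (10.20): outflow pi_i q_i minus inflow sum_{j != i} pi_j q_ji.
residual = [pi[i] * -Q[i][i] - sum(pi[j] * Q[j][i] for j in range(3) if j != i)
            for i in range(3)]
```

Replacing one equation is legitimate because the balance equations are linearly dependent ($\det Q = 0$); for this `Q` the result is $\pi = [4/17,\ 3/17,\ 10/17]$.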

10.4 The embedded Markov chain


The main difference between discrete and continuous-time Markov chains lies, apart from the concept of time, in the determination of the number of transitions. The sojourn time in a discrete chain is deterministic and all times are equal to 1. In other words, if $F_{ij}(t)$ denotes the distribution function of the time until a transition from state $i$ to state $j$ occurs, then it is plain that, for a discrete-time process,

$$F_{ij}(t) = 1_{\{t \ge 1\}}$$

Even though the process remains in state $j$ with probability $P_{jj}$, there has been a transition precisely after $t = 1$ units.
On the other hand, Theorem 10.2.3 demonstrates that the sojourn times in state $j$ are exponentially distributed with mean $\frac{1}{q_j}$. After, on average, $\frac{1}{q_j}$ time units, a transition from state $j$ to another state occurs. In contrast to discrete-time Markov chains, after the exponentially distributed sojourn time in state $j$ the process makes a transition to other states $i \neq j$. Let us investigate this fact in more detail. Let us denote

$$V_{ij}(h) = \Pr[X(h)=j\,|\,X(h)\neq i,\ X(0)=i]$$

which describes the probability that, if a transition occurs, the process moves from state $i$ to a different state $j \neq i$. Using the definition of conditional probability (2.44),

$$V_{ij}(h) = \frac{\Pr[\{X(h)=j\} \cap \{X(h)\neq i\}\,|\,X(0)=i]}{\Pr[X(h)\neq i\,|\,X(0)=i]} = \frac{P_{ij}(h)}{1 - P_{ii}(h)}$$

In the limit $h \downarrow 0$, we have

$$V_{ij} = \lim_{h\downarrow 0} V_{ij}(h) = \lim_{h\downarrow 0} \frac{P_{ij}(h)/h}{(1 - P_{ii}(h))/h} = \frac{q_{ij}}{q_i}$$

By (10.7), we see that $\sum_{j=1,\,j\neq i}^{N} V_{ij} = 1$, demonstrating that, given a transition, it is a transition out of state $i$ to another state $j$. The quantities $V_{ij}$ correspond to the transition probabilities of the embedded Markov chain.

Alternatively, we can write the rate $q_{ij}$ in terms of the transition probabilities $V_{ij}$ of the embedded Markov chain as

$$q_{ij} = q_i\,V_{ij} \tag{10.21}$$

Since $q_i$ is the rate (i.e. the number of transitions per unit time) of the process in state $i$, relation (10.21) shows that the transition rate $q_{ij}$ from state $i$ to state $j$ equals the rate of transitions in state $i$ multiplied by the probability that a transition from state $i$ to state $j$ occurs. By definition, $V_{ii} = 0$. For, if we assume that $V_{ii} > 0$, relation (10.21) would result in $q_{ii} = V_{ii}\,q_i > 0$, which contradicts the definition $q_{ii} = -q_i$. Hence, in the embedded Markov chain specified by the transition probability matrix $V$, there are no self-transitions ($V_{ii} = 0$), which implies that the sum of the eigenvalues of $V$ is zero (A.7), since $\mathrm{trace}(V) = 0$.
From the steady-state equation or balance equation (10.20), (10.21) and $V_{ii} = 0$, we observe that

$$\pi_i\,q_i = \sum_{j=1}^{N} \pi_j\,q_j\,V_{ji}$$

On the other hand, the embedded Markov chain has a steady-state vector $v$ that obeys (9.22) or (9.23)

$$v_i = \sum_{j=1}^{N} v_j\,V_{ji}$$

and $\|v\|_1 = 1$. The relations between the steady-state vector $\pi$ of the continuous-time Markov chain and the steady-state vector $v$ of its corresponding embedded discrete-time Markov chain are

$$v_i = \frac{\pi_i\,q_i}{\sum_{j=1}^{N} \pi_j\,q_j} \tag{10.22}$$

$$\pi_i = \frac{v_i/q_i}{\sum_{j=1}^{N} v_j/q_j} \tag{10.23}$$

The classification in the discrete-time case into transient and recurrent can be transferred via the embedded Markov chain to continuous Markov processes.
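The conversion between the two steady-state vectors in (10.22) and (10.23) can be verified exactly with rational arithmetic. The sketch below assumes a hypothetical three-state generator `Q` whose steady-state vector $\pi = [4/17,\ 3/17,\ 10/17]$ is supplied directly (it satisfies $\pi Q = 0$, which the code checks); the function names are illustrative.

```python
from fractions import Fraction as F

def embedded_chain(Q):
    """V_ij = q_ij / q_i for i != j and V_ii = 0, where q_i = -q_ii > 0."""
    n = len(Q)
    return [[F(0) if i == j else Q[i][j] / -Q[i][i] for j in range(n)]
            for i in range(n)]

def embedded_from_continuous(pi, Q):
    """Relation (10.22): v_i proportional to pi_i * q_i."""
    weights = [pi[i] * -Q[i][i] for i in range(len(pi))]
    total = sum(weights)
    return [w / total for w in weights]

def continuous_from_embedded(v, Q):
    """Relation (10.23): pi_i proportional to v_i / q_i."""
    weights = [v[i] / -Q[i][i] for i in range(len(v))]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical generator and its steady-state vector.
Q = [[F(-2), F(1), F(1)],
     [F(1), F(-3), F(2)],
     [F(1, 2), F(1, 2), F(-1)]]
pi = [F(4, 17), F(3, 17), F(10, 17)]

pi_times_Q = [sum(pi[i] * Q[i][j] for i in range(3)) for j in range(3)]  # all zero
V = embedded_chain(Q)
v = embedded_from_continuous(pi, Q)
v_times_V = [sum(v[i] * V[i][j] for i in range(3)) for j in range(3)]    # equals v
pi_back = continuous_from_embedded(v, Q)                                 # equals pi
```

The embedded chain weights each state by how often it is visited, whereas $\pi$ weights it by the time spent there. The two differ whenever the rates $q_i$ are not all equal.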

10.4.1 Uniformization
The restriction $V_{ii} = 0$ or $q_{ii} = 0$, which means that there are no self-transitions from a state into itself, can be removed. Indeed, we can rewrite the basic relation (10.12) between the transition probability matrix $P(t)$ and the infinitesimal generator $Q$ for all $\beta$ as

$$P(t) = \exp\left[-\beta I t + \beta t\left(I + \frac{Q}{\beta}\right)\right] = e^{-\beta t}\exp\left[\beta t\left(I + \frac{Q}{\beta}\right)\right]$$

Defining $T(\beta) = I + \frac{Q}{\beta}$ and $\beta \ge \max_i q_i$, a description alternative to (10.15), (10.16) and (10.17) appears,

$$P_{ij}(t) = e^{-\beta t}\sum_{k=0}^{\infty} \frac{(\beta t)^{k}}{k!}\,T_{ij}^{k}(\beta) \tag{10.24}$$

where $T(\beta)$ is a stationary transition probability matrix and, hence, a stochastic matrix.
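We also observe that $\beta T(\beta) = Q + \beta I$ can be regarded as a rate matrix, with the property that, for each state $i$,

$$\sum_{j=1}^{N} \beta T_{ij}(\beta) = \sum_{j=1}^{N} q_{ij} + \beta\sum_{j=1}^{N} \delta_{ij} = \beta$$

so that the transition rate in any state $i$ is precisely the same, equal to $\beta$. Whereas the embedded Markov chain defined by (10.21) has no self-transitions ($V_{ii} = 0$), we see for any $i$ and $j \neq i$ that $T_{ii}(\beta) = 1 - \frac{1}{\beta}\sum_{j=1,\,j\neq i}^{N} q_{ij} = 1 - \frac{q_i}{\beta} \ge 0$ while $T_{ij}(\beta) = \frac{q_{ij}}{\beta}$. Hence, $T(\beta)$ can be interpreted as an embedded Markov chain that allows self-transitions. In view of (10.21), the embedded structure of $T(\beta)$ is summarized as

$$\begin{aligned}
q_{ij} &= \beta\,T_{ij}(\beta) \qquad \text{for } i \neq j \\
q_i &= \beta\,(1 - T_{ii}(\beta))
\end{aligned}$$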

where the constant rate $q_i = \beta$ for any state $i$ is, besides self-transitions $q_{ii} \neq 0$, the characterizing property. These properties also reveal that, starting from the embedded chain $V$ where $V_{ii} = 0$, we can add self-transitions $q_{ii} > 0$ with the effect that, on (10.7), the transition rate $q_i \to q_i + q_{ii}$. The accompanying figure illustrates an embedded Markov chain with self-transitions and the corresponding transition probability matrix, where the transition rates $q_i$ follow from $\sum_{j=1}^{6} V_{ij} = 1$ with $V_{ij} = \frac{q_{ij}}{q_i}$. This change in transition rate will change the steady-state vector since the balance equations (10.20) change. However, the Markov process $\{X(t), t \ge 0\}$ is not modified because a self-transition does not change $X(t)$ nor the distribution of the time until the next transition to a different state. But self-transitions clearly change the number of transitions during some period of time. When the transition rates $q_j$ at each state $j$ are the same, the embedded Markov chain $T(\beta)$ is called a uniformized chain.

[Figure: six-state embedded Markov chain with self-transitions at states 2 and 3; the edges are labelled $q_{12}$, $q_{22}$, $q_{24}$, $q_{32}$, $q_{33}$, $q_{36}$, $q_{43}$, $q_{46}$, $q_{51}$, $q_{52}$, $q_{53}$, $q_{56}$, $q_{61}$, $q_{65}$.]

$$V = \begin{bmatrix}
0 & \frac{q_{12}}{q_1} & 0 & 0 & 0 & 0 \\
0 & \frac{q_{22}}{q_2} & 0 & \frac{q_{24}}{q_2} & 0 & 0 \\
0 & \frac{q_{32}}{q_3} & \frac{q_{33}}{q_3} & 0 & 0 & \frac{q_{36}}{q_3} \\
0 & 0 & \frac{q_{43}}{q_4} & 0 & 0 & \frac{q_{46}}{q_4} \\
\frac{q_{51}}{q_5} & \frac{q_{52}}{q_5} & \frac{q_{53}}{q_5} & 0 & 0 & \frac{q_{56}}{q_5} \\
\frac{q_{61}}{q_6} & 0 & 0 & 0 & \frac{q_{65}}{q_6} & 0
\end{bmatrix}$$

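In addition, in a uniformized chain, the steady-state vector $t(\beta)$ of $T(\beta)$ is the same as the steady-state vector $\pi$. Indeed, from (9.23),

$$t_j(\beta) = \sum_{k=1}^{N} T_{kj}(\beta)\,t_k(\beta)$$

we have, with $T(\beta) = I + \frac{Q}{\beta}$,

$$t_j(\beta) = \sum_{k=1}^{N} \left(\delta_{kj} + \frac{q_{kj}}{\beta}\right) t_k(\beta) = t_j(\beta) + \frac{1}{\beta}\sum_{k=1}^{N} t_k(\beta)\,q_{kj}$$

or, since $q_{jj} = -q_j$,

$$t_j(\beta)\,q_j = \sum_{k=1,\,k\neq j}^{N} t_k(\beta)\,q_{kj}$$

where $t_k(\beta) = \pi_k$ (independent of $\beta$) since it satisfies the balance equation (10.20) and Theorem 9.3.5 assures that the steady-state of a positive recurrent chain is unique.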
We will now interpret (10.24) probabilistically. Let $N(t)$ denote the total number of transitions in $[0, t]$ in the uniformized (discrete) process $\{X_k(\beta)\}$. Since the transition rates $q_i = \beta$ are all the same, $N(t)$ is a Poisson process with rate $\beta$ because, for any continuous-time Markov chain, the inter-transition or sojourn times are independent exponential random variables, here all with the same rate $\beta$. Thus, $\Pr[N(t) = k] = e^{-\beta t} \frac{(\beta t)^k}{k!}$ is recognized as the probability that the number of transitions that occur in $[0, t]$ in the uniformized Markov chain with rate $\beta$ equals $k$. With (9.10), $T^k_{ij}(\beta) = \Pr[X_k(\beta) = j \mid X_0(\beta) = i]$ is the $k$-step transition probability of that discrete $\{X_k(\beta)\}$ uniformized Markov process. Relation (10.24) can be interpreted as
$$P_{ij}(t) = \sum_{k=0}^{\infty} \Pr[X_k(\beta) = j \mid X_0(\beta) = i, N(t) = k]\, \Pr[N(t) = k]$$
or, the probability that the continuous Markov process moves from state $i$ to state $j$ in a time interval of length $t$ can be decomposed in an infinite sum of probabilities. Each probability corresponds to a transition from state $i$ to state $j$ in $k$ steps, where the number of intermediate transitions $k$ is a Poisson counting process with rate $\beta$.
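The decomposition above lends itself to a direct numerical check. The following is a minimal sketch, assuming the uniformization series $P(t) = \sum_{k} e^{-\beta t} \frac{(\beta t)^k}{k!} T^k(\beta)$ with $T(\beta) = I + Q/\beta$. The function name `uniformized_transition_matrix`, the truncation tolerance and the two-state example rates `a` and `b` are illustrative assumptions, not part of the text.

```python
import math
import numpy as np

def uniformized_transition_matrix(Q, t, tol=1e-12, max_terms=10_000):
    """Approximate P(t) by summing Poisson-weighted powers of T(beta)."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    beta = float(np.max(-np.diag(Q)))      # assumption: beta = max_i q_i > 0
    T = np.eye(n) + Q / beta               # stochastic matrix with self-transitions
    P = np.zeros((n, n))
    T_power = np.eye(n)                    # T^k, starting with k = 0
    weight = math.exp(-beta * t)           # Pr[N(t) = 0]
    accumulated = 0.0
    for k in range(max_terms):
        P += weight * T_power
        accumulated += weight
        if 1.0 - accumulated < tol:        # remaining Poisson mass is negligible
            break
        T_power = T_power @ T
        weight *= beta * t / (k + 1)       # Pr[N(t) = k + 1]
    return P

# Hypothetical two-state generator with rates a (state 1 -> 2) and b (state 2 -> 1).
a, b, t = 2.0, 3.0, 0.7
Q = np.array([[-a, a], [b, -b]])
P = uniformized_transition_matrix(Q, t)
# Closed-form P_11(t) of the two-state chain, used only as a reference value.
P11_exact = b / (a + b) + a / (a + b) * math.exp(-(a + b) * t)
```

Only matrix products with the stochastic matrix $T(\beta)$ and positive Poisson weights are involved, which is why this series is often preferred numerically over a direct Taylor expansion of $e^{Qt}$.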
10.4.2 A sampled-time Markov chain
The sampled-time Markov chain approximates the continuous Markov process in that the transition probabilities $P_{ij}(t)$ are expanded to first order as in (10.8) with fixed step $h = \Delta t$. The transition probabilities of the sampled-time Markov chain are
$$P_{ij} = q_{ij}\, \Delta t \qquad (i \neq j)$$
$$P_{ii} = 1 - q_i\, \Delta t$$
Clearly, the sampled-time Markov chain also allows self-transitions, as illustrated in Fig. 10.1.

[Figure: left, a five-state continuous-time Markov process with rates $q_{12}$, $q_{23}$, $q_{34}$, $q_{45}$, $q_{51}$, $q_{52}$ and $q_{53}$; right, the sampled-time Markov chain with transition probabilities $q_{ij} \Delta t$ and self-transition probabilities $1 - q_{12}\Delta t$, $1 - q_{23}\Delta t$, $1 - q_{34}\Delta t$, $1 - q_{45}\Delta t$ and $1 - (q_{51} + q_{52} + q_{53})\Delta t$.]

Fig. 10.1. A continuous-time Markov process and its corresponding sampled-time Markov chain.

From (10.8), we observe that the approximation lies in two facts: (a) $\Delta t$ is fixed such that $q_{ij} \Delta t \approx \Pr[X(t + \Delta t) = j \mid X(t) = i]$ is increasingly accurate as $\Delta t \to 0$ and (b) transitions occur at discrete times every $\Delta t$ time units. The sampling step $\Delta t$ should be chosen such that the transition probabilities obey $0 \leq P_{ij} \leq 1$, from which we find that $\Delta t \leq \frac{1}{\max_i q_i}$.

Let $v$ denote the steady-state vector of the sampled-time Markov chain with $\|v\|_1 = 1$. Being a discrete Markov chain, the steady-state vector components $v_j$ satisfy (9.23) for each component $j$,
$$v_j = \sum_{k=1}^{N} P_{kj} v_k = \Delta t \sum_{k=1; k \neq j}^{N} q_{kj} v_k + (1 - q_j \Delta t)\, v_j$$
or
$$q_j v_j = \sum_{k=1; k \neq j}^{N} q_{kj} v_k$$
By comparing with the balance equation (10.20) and on the uniqueness of the steady-state (Theorem 9.3.5), we observe that $v = \pi$ or, the steady-state of the sampled-time Markov chain is exactly (not approximately) equal to the steady-state of the continuous Markov chain for any sampling step $\Delta t \leq \frac{1}{\max_i q_i}$. Although we can possibly miss, by sampling every $\Delta t$ time units, the smaller-scale dynamics of the continuous Markov chain, the long-run behavior or steady-state is exactly captured!
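The exactness of the sampled-time steady-state can be illustrated numerically. The sketch below assumes a hypothetical three-state generator `Q`; the helper name `steady_state_from_generator` is an illustrative choice. It solves the balance equations of the continuous chain and then checks that the same vector is stationary for $P = I + Q\,\Delta t$ at two admissible sampling steps.

```python
import numpy as np

def steady_state_from_generator(Q):
    """Solve pi Q = 0 with sum(pi) = 1 by replacing one balance equation."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    A = Q.T.copy()
    A[-1, :] = 1.0                 # normalization row replaces a redundant equation
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

# Hypothetical three-state generator (each row sums to zero).
Q = np.array([[-3.0, 2.0, 1.0],
              [1.0, -4.0, 3.0],
              [2.0, 2.0, -4.0]])
pi = steady_state_from_generator(Q)

q_max = np.max(-np.diag(Q))
residuals = []
for dt in (1.0 / q_max, 0.1 / q_max):        # both satisfy dt <= 1 / max_i q_i
    P_sampled = np.eye(3) + Q * dt           # P_ij = q_ij dt, P_ii = 1 - q_i dt
    residuals.append(np.max(np.abs(pi @ P_sampled - pi)))
```

The residuals vanish up to rounding for both steps, which reflects that $\pi (I + Q \Delta t) = \pi$ is equivalent to $\pi Q = 0$ irrespective of $\Delta t$.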

10.5 The transitions in a continuous-time Markov chain


Based on the embedded Markov chain, there exists a framework that deduces all properties of the continuous Markov chain. In particular, the exponential sojourn times of a continuous-time Markov chain (Theorem 10.2.3) are postulated as a defining characteristic.
Theorem 10.5.1 Let $V_{ij}$ denote the transition probabilities of the embedded Markov chain and $q_{ij}$ the rates of the infinitesimal generator. The transition probabilities of the corresponding continuous-time Markov chain are found as
$$P_{ij}(t) = \delta_{ij} e^{-q_i t} + q_i \sum_{k \neq i} V_{ik} \int_0^t e^{-q_i u} P_{kj}(t - u)\, du \tag{10.25}$$

Proof: If $i$ is an absorbing state ($q_i = 0$), then, by definition, $P_{ij}(t) = \delta_{ij}$ for all $t \geq 0$. For a non-absorbing state $i$ and a process starting from state $i$, the event $\{\tau \leq t, X(\tau) = k\} \cap \{X(t) = j\}$ is possible if and only if the first transition from $i$ to $k$ occurs at some time $u \in [0, t]$ and the next transition from $k$ to $j$ takes place in the remaining time $t - u$. The probability density function of the sojourn time is $f_{\tau_i}(t) = \frac{d}{dt} \Pr[\tau_i \leq t] = q_i e^{-q_i t}$ for $t \geq 0$ and, for infinitesimally small $\epsilon$, we have
$$\begin{aligned}
p &= \Pr[\tau \leq t, X(\tau) = k, X(t) = j \mid X(0) = i] \\
&= \int_0^t du\, \Pr[\tau = u, X(u - \epsilon) = i \mid X(0) = i]\, \Pr[X(u) = k \mid X(u - \epsilon) = i]\, \Pr[X(t) = j \mid X(u) = k] \\
&= \int_0^t du\, f_{\tau_i}(u)\, V_{ik}\, P_{kj}(t - u) \\
&= V_{ik} \int_0^t q_i e^{-q_i u} P_{kj}(t - u)\, du
\end{aligned}$$
Furthermore,
$$\Pr[\tau \leq t \text{ and } X(t) = j \mid X(0) = i] = \sum_{k \neq i} \Pr[\tau \leq t, X(\tau) = k, X(t) = j \mid X(0) = i]$$
and
$$\Pr[\tau > t \text{ and } X(t) = j \mid X(0) = i] = \delta_{ij} \Pr[\tau_i > t] = \delta_{ij} e^{-q_i t}$$
Finally,
$$\begin{aligned}
P_{ij}(t) &= \Pr[X(t) = j \mid X(0) = i] \\
&= \Pr[\tau \leq t \text{ and } X(t) = j \mid X(0) = i] + \Pr[\tau > t \text{ and } X(t) = j \mid X(0) = i]
\end{aligned}$$
Combining all above relations into the last one proves the theorem.
By a change of variable $s = t - u$ in (10.25), we have
$$P_{ij}(t) = \delta_{ij} e^{-q_i t} + q_i \sum_{k \neq i} V_{ik}\, e^{-q_i t} \int_0^t e^{q_i s} P_{kj}(s)\, ds$$
and, after differentiation with respect to $t$, we find for $t \geq 0$,
$$\begin{aligned}
P'_{ij}(t) &= -q_i \delta_{ij} e^{-q_i t} - q_i \left( q_i \sum_{k \neq i} V_{ik}\, e^{-q_i t} \int_0^t e^{q_i s} P_{kj}(s)\, ds \right) + q_i \sum_{k \neq i} V_{ik} P_{kj}(t) \\
&= -q_i \delta_{ij} e^{-q_i t} - q_i \left( P_{ij}(t) - \delta_{ij} e^{-q_i t} \right) + q_i \sum_{k \neq i} V_{ik} P_{kj}(t) \\
&= -q_i P_{ij}(t) + q_i \sum_{k \neq i} V_{ik} P_{kj}(t)
\end{aligned}$$
Evaluated at $t = 0$, recalling that $P'(0) = Q$ and $P(0) = I$,
$$\begin{aligned}
P'_{ij}(0) &= -q_i P_{ij}(0) + q_i \sum_{k \neq i} V_{ik} P_{kj}(0) \\
q_{ij} &= -q_i \delta_{ij} + q_i \sum_{k \neq i} V_{ik} \delta_{kj} = -q_i \delta_{ij} + q_i V_{ij}
\end{aligned}$$
which is precisely relation (10.21). With $q_i = -q_{ii}$ and (10.21), we arrive at
$$P'_{ij}(t) = \sum_{k=1}^{N} q_{ik} P_{kj}(t)$$
which is precisely the backward equation (10.10). Hence, (10.25) can be interpreted as an integrated form of the backward equation and thus of the entire continuous-time Markov process.

10.6 Example: the two-state Markov chain in continuous-time


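The continuous-time two-state Markov chain is defined by the infinitesimal generator
$$Q = \begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}$$
where $\alpha, \beta \geq 0$. We will solve $P(t)$ from the forward equation (10.9),
$$\begin{bmatrix} P'_{11}(t) & P'_{12}(t) \\ P'_{21}(t) & P'_{22}(t) \end{bmatrix} = \begin{bmatrix} P_{11}(t) & P_{12}(t) \\ P_{21}(t) & P_{22}(t) \end{bmatrix} \begin{bmatrix} -\alpha & \alpha \\ \beta & -\beta \end{bmatrix}$$
which actually contains two independent transition probabilities because $P_{12}(t) = 1 - P_{11}(t)$ and $P_{21}(t) = 1 - P_{22}(t)$. The forward equation simplifies to
$$P'_{11}(t) = -(\alpha + \beta) P_{11}(t) + \beta$$
$$P'_{22}(t) = -(\alpha + \beta) P_{22}(t) + \alpha$$
Only the first equation needs to be solved since, by symmetry, the solution of $P_{11}(t)$ equals that of $P_{22}(t)$ after changing the role of $\alpha \to \beta$ and $\beta \to \alpha$. The solution of this linear, first-order, non-homogeneous differential equation consists of the solution to the corresponding homogeneous differential equation and a particular solution. The solution of the homogeneous differential equation, $P'_{11}(t) = -(\alpha + \beta) P_{11}(t)$, is $P_{11}(t) = C e^{-(\alpha + \beta) t}$. The particular solution is generally found by variation of the constant $C$, which proposes $P_{11}(t) = C(t) e^{-(\alpha + \beta) t}$ as general solution, where $C(t)$ needs to satisfy the original differential equation. Hence,
$$C'(t) = \beta e^{(\alpha + \beta) t}$$
or, after integration, $C(t) = \frac{\beta}{\alpha + \beta} e^{(\alpha + \beta) t} + c$. The integration constant $c$ follows from the initial condition (10.5), $P_{11}(0) = 1$. Finally, we arrive at
$$P_{11}(t) = \frac{\alpha}{\alpha + \beta} e^{-(\alpha + \beta) t} + \frac{\beta}{\alpha + \beta}$$
$$P_{22}(t) = \frac{\beta}{\alpha + \beta} e^{-(\alpha + \beta) t} + \frac{\alpha}{\alpha + \beta}$$
from which the steady-state vector is immediate,
$$\pi = \begin{bmatrix} \frac{\beta}{\alpha + \beta} & \frac{\alpha}{\alpha + \beta} \end{bmatrix}$$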
10.7 Time reversibility
In this section, we consider only ergodic Markov chains that have a non-zero steady-state distribution $\pi$. Suppose the Markov process operates already in the steady-state, or, in other words, the Markov process is stationary. We are interested in the time-reversed process defined by the sequence $X_n, X_{n-1}, \ldots$ We will show that this reversed time sequence again constitutes a Markov process.
Theorem 10.7.1 The time-reversed Markov process is a Markov chain.
Proof: It suffices to demonstrate that the time-reversed process satisfies the Markov property
$$\Pr[X_n = x_n \mid X_{n+1} = x_{n+1}, \ldots, X_{n+k} = x_{n+k}] = \Pr[X_n = x_n \mid X_{n+1} = x_{n+1}]$$
By definition of the conditional probability (2.44),
$$\begin{aligned}
R &= \Pr[X_n = x_n \mid X_{n+1} = x_{n+1}, X_{n+2} = x_{n+2}, \ldots, X_{n+k} = x_{n+k}] \\
&= \Pr\left[X_n = x_n \,\Big|\, \bigcap_{m=1}^{k} \{X_{n+m} = x_{n+m}\}\right] \\
&= \frac{\Pr\left[\bigcap_{m=0}^{k} \{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=1}^{k} \{X_{n+m} = x_{n+m}\}\right]}
\end{aligned}$$
Since the intersection is commutative, $A \cap B = B \cap A$, the indices can be reversed,
$$\begin{aligned}
R &= \frac{\Pr\left[\bigcap_{m=k}^{0} \{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=k}^{1} \{X_{n+m} = x_{n+m}\}\right]} \\
&= \frac{\Pr\left[X_{n+k} = x_{n+k} \,\Big|\, \bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right] \Pr\left[\bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right]}{\Pr\left[\bigcap_{m=k}^{1} \{X_{n+m} = x_{n+m}\}\right]}
\end{aligned}$$
The original stationary process is a Markov process that satisfies (9.2). Using (9.2) and (9.3) we have
$$\Pr\left[X_{n+k} = x_{n+k} \,\Big|\, \bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right] = \Pr[X_{n+k} = x_{n+k} \mid X_{n+k-1} = x_{n+k-1}]$$
and
$$\Pr\left[\bigcap_{m=k-1}^{0} \{X_{n+m} = x_{n+m}\}\right] = \Pr[X_n = x_n] \prod_{m=1}^{k-1} \Pr[X_{n+m} = x_{n+m} \mid X_{n+m-1} = x_{n+m-1}]$$
and, similarly,
$$\Pr\left[\bigcap_{m=k}^{1} \{X_{n+m} = x_{n+m}\}\right] = \Pr[X_{n+1} = x_{n+1}] \prod_{m=2}^{k} \Pr[X_{n+m} = x_{n+m} \mid X_{n+m-1} = x_{n+m-1}]$$
Hence,
$$\begin{aligned}
R &= \frac{\Pr[X_{n+k} = x_{n+k} \mid X_{n+k-1} = x_{n+k-1}] \cdots \Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]\, \Pr[X_n = x_n]}{\Pr[X_{n+k} = x_{n+k} \mid X_{n+k-1} = x_{n+k-1}] \cdots \Pr[X_{n+1} = x_{n+1}]} \\
&= \frac{\Pr[X_{n+1} = x_{n+1} \mid X_n = x_n]\, \Pr[X_n = x_n]}{\Pr[X_{n+1} = x_{n+1}]}
\end{aligned}$$
Applying Bayes' rule (2.48) to the last relation finally proves the theorem.

Consider the transition probability of the time-reversed Markov process
$$R_{ij} = \Pr[X_n = j \mid X_{n+1} = i]$$
With Bayes' rule (2.48),
$$\Pr[X_n = j \mid X_{n+1} = i] = \frac{\Pr[X_{n+1} = i \mid X_n = j]\, \Pr[X_n = j]}{\Pr[X_{n+1} = i]}$$
and, since the process is stationary,
$$\Pr[X_n = j] = \pi_j, \qquad \Pr[X_{n+1} = i] = \pi_i$$
the transition probability of the time-reversed process is
$$R_{ij} = \frac{\pi_j P_{ji}}{\pi_i} \tag{10.26}$$
A Markov chain is said to be time reversible if, for all $i$ and $j$, $P_{ij} = R_{ij}$. From (10.26), the condition for time reversibility is
$$\pi_i P_{ij} = \pi_j P_{ji} \tag{10.27}$$
This condition means that, for all states $i$ and $j$, the rate $\pi_i P_{ij}$ from state $i \to j$ equals the rate $\pi_j P_{ji}$ from state $j \to i$. An interesting property of time reversible Markov chains is that any vector $x$ satisfying $\|x\|_1 = 1$ and $x_i P_{ij} = x_j P_{ji}$ is a steady-state vector of a time-reversible Markov chain. Indeed, summing over all $i$,
$$\sum_i x_i P_{ij} = x_j \sum_i P_{ji} = x_j$$
Theorem 9.3.5 indicates that the steady-state is unique and, thus, $x = \pi$.
As a side remark, we note that a transition matrix is only equal to its transpose, $P = P^T$, if the Markov process is time reversible and doubly stochastic (i.e. $\pi_i = \frac{1}{N}$ for all $i$, as shown in Appendix A.5.1).
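The detailed balance condition (10.27) can be tried out on a small example. The sketch below assumes a hypothetical four-state tridiagonal chain `P` (such chains only move between neighbouring states); it constructs a vector from the detailed balance relation, checks that it is stationary, and forms the time-reversed transition matrix of (10.26).

```python
import numpy as np

# Hypothetical four-state chain that only moves between neighbouring states.
P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.3, 0.2, 0.5, 0.0],
              [0.0, 0.4, 0.1, 0.5],
              [0.0, 0.0, 0.6, 0.4]])

# Build x from the detailed balance relation x_i P_{i,i+1} = x_{i+1} P_{i+1,i}.
x = np.ones(4)
for i in range(3):
    x[i + 1] = x[i] * P[i, i + 1] / P[i + 1, i]
x /= x.sum()                                  # enforce ||x||_1 = 1

flow = x[:, None] * P                         # flow[i, j] = x_i P_ij
is_reversible = np.allclose(flow, flow.T)     # condition (10.27)
is_stationary = np.allclose(x @ P, x)         # x = x P

# Time-reversed chain, R_ij = x_j P_ji / x_i, following (10.26).
R = (P.T * x[None, :]) / x[:, None]
```

For this chain the reversed matrix `R` coincides with `P`, which is the definition of time reversibility given above.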
The continuous-time analogon can be immediately deduced from the discrete-time embedded Markov chain defined by the transition probabilities $V_{ij}$. Let $U_{ij}$ denote the transition probabilities of the time-reversed embedded Markov chain and $r_{ij}$ the rates of the corresponding continuous Markov chain, then by (10.21)
$$r_{ij} = r_i U_{ij} \tag{10.28}$$
We will now show that the sojourn times of the time-reversed continuous Markov process are indeed exponential random variables with rate $r_i$. Assume that the time-reversed process is in state $i$ at time $t$. The probability that the process is still in state $i$ at reversed time $t - u$ is, using Theorem 10.2.3,
$$\begin{aligned}
\Pr[X(\tau) = i, t - u \leq \tau \leq t \mid X(t) = i] &= \frac{\Pr[X(\tau) = i, t - u \leq \tau \leq t]}{\Pr[X(t) = i]} \\
&= \frac{\Pr[X(t - u) = i]\, e^{-q_i u}}{\Pr[X(t) = i]} = e^{-q_i u}
\end{aligned}$$
because, in steady-state $t \to \infty$, $\Pr[X(t - u) = i] = \Pr[X(t) = i] = \pi_i$ for any finite $u$. Thus, the sojourn time in state $i$ of the time-reversed process is exponentially distributed with precisely the same rate $r_i = q_i$ as the forward time process. The steady-state vector $v$ of the embedded Markov chain can be written in terms of the steady-state vector of the continuous Markov chain via (10.22). By (10.26), we obtain
$$U_{ij} = \frac{v_j V_{ji}}{v_i} = \frac{\pi_j q_j V_{ji}}{\pi_i q_i}$$
With (10.21) and (10.28),
$$\frac{\pi_i r_{ij}}{r_i} = \frac{\pi_j q_{ji}}{q_i}$$
but, since $r_i = q_i$, we finally arrive at
$$\pi_i r_{ij} = \pi_j q_{ji} \tag{10.29}$$
Comparing (10.29) with the discrete case (10.26), we see that the transition probabilities $P_{ij}$ and $R_{ij}$ are changed for the rates $q_{ij}$ and $r_{ij}$. We know that $\pi_j$ is the portion of time the process (both forward and reversed) spends in state $j$ and that $q_{ij}$ is the rate at which the process makes transitions from state $i$ to state $j$. Equation (10.29) has again a balance interpretation: $\pi_j q_{ji}$ is the rate at which the forward process moves from state $j$ to $i$, while $\pi_i r_{ij}$ is the rate of the time-reversed process from state $i$ to $j$ and both rates are equal. Intuitively, when a process jumps from state $i \to j$ in forward time, it is plain that the process makes, in reversed time, just the opposite transition from $j \to i$. Similarly as above, a continuous-time Markov chain is time reversible if, for all $i$ and $j$, it holds that $r_{ij} = q_{ij}$. For these processes (which occur often in practice, as demonstrated in the chapters on queueing), the rate from $i \to j$ is equal to the rate from $j \to i$ since $\pi_i q_{ij} = \pi_j q_{ji}$.

10.8 Problems
(i) Consider a computer that has two identical and independent processors. The time between failures has an exponential distribution. The
mean value of this distribution is 1000 hours. The repair time for a
damaged processor is exponentially distributed as well, with a mean
value of 100 hours. We assume damaged processors can be repaired
in parallel. There are clearly three states for this computer: (1) both
processors work, (2) one processor is damaged and (3) both processors are damaged.
(a) Make a continuous Markov chain presentation of these states.
(b) What is the infinitesimal generator matrix $Q$ for this Markov
chain? Give the relation between the state probability at time
$t$ and its derivative.
(c) Calculate the steady-state of this process.
(d) What is the availability of the computer if (i) both processors
are required to work, or (ii) at least one processor should work.
(ii) Consider two identical servers that are working in parallel. When one
server fails, the other has to do the whole job alone under a higher
load. The failure times of servers are exponentially distributed: H =
3 1034 k31 , when the servers are equally loaded and F = 7 1034
k31 , when one of the servers works under the full load. In addition,
both servers may fail at the same time with a failure rate of E =
6 1035 k31 .
As soon as one of the servers fails, the repair is initiated. The


average downtime of a server is 31 = 10 hours. However, if both


servers are damaged, the whole system must be shut down. The
average time needed to repair both damaged servers is 31
B = 20
hours.
(a) Draw the Markov chain for this system.
(b) Determine the infinitesimal generator matrix $Q$.
(c) Determine the steady-state probabilities.
(d) Determine the average lifetime of different states.
(e) What is the average number of server repairs needed during a
period of one year?

11
Applications of Markov chains

This chapter illustrates the theory of Markov chains with several examples.
Examples of queueing problems are deferred to later chapters. Generally,
Markov processes can be solved explicitly provided the transition probability
matrix $P$ or the infinitesimal generator $Q$ has a special structure. Only in a
very small number of problems is the entire time dependence of the process
available in analytic form.

11.1 Discrete Markov chains and independent random variables


This section illustrates examples of some simple Markov chains. Consider a set $\{Y_n\}_{n \geq 1}$ of positive integer, independent random variables that are identically distributed with $\Pr[Y = k] = a_k$.
The discrete-time Markov process, defined by $X_n = Y_n$ for $n \geq 1$, possesses the (infinite) transition probability matrix,
$$P = \begin{bmatrix}
a_1 & a_2 & a_3 & a_4 & \cdots \\
a_1 & a_2 & a_3 & a_4 & \cdots \\
a_1 & a_2 & a_3 & a_4 & \cdots \\
a_1 & a_2 & a_3 & a_4 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$
All rows are identical and $\Pr[X_{n+1} = j \mid X_n = i] = a_j$ shows that the states $X_{n+1}$ and $X_n$ are independent from each other.
Another, more interesting, discrete-time Markov process is defined by
$$X_n = \max[Y_1, Y_2, Y_3, \ldots, Y_n]$$
Hence, the process $X_n$ mirrors the maxima of the first $n$ random variables. Clearly, $X_{n+1} = \max[X_n, Y_{n+1}]$ reflects the Markov property: the next state is only dependent on the previous state of the process. From $X_{n+1} = \max[X_n, Y_{n+1}]$, we observe that $\Pr[X_{n+1} = j \mid X_n = i] = 0$ if $j < i$ because the maximum does not decrease by adding a new random variable in the list. If $j > i$, the state $j$ is determined by $Y_{n+1} = j$ with probability $a_j$. If $j = i$, then $Y_{n+1} \leq j$, which has probability
$$\Pr[Y_{n+1} \leq j] = \sum_{k=1}^{j} \Pr[Y_{n+1} = k] = \sum_{k=1}^{j} a_k = A_j$$
The corresponding probability matrix is
$$P = \begin{bmatrix}
A_1 & a_2 & a_3 & a_4 & \cdots \\
0 & A_2 & a_3 & a_4 & \cdots \\
0 & 0 & A_3 & a_4 & \cdots \\
0 & 0 & 0 & A_4 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$
A related discrete-time Markov chain is $X_n = \sum_{k=1}^{n} Y_k$ which obeys $X_{n+1} = X_n + Y_{n+1}$. Furthermore, if $j \leq i$, $\Pr[X_{n+1} = j \mid X_n = i] = 0$ because the random variables $Y_n$ are positive such that the sum strictly increases by adding a new member. If $j > i$, then
$$\begin{aligned}
\Pr[X_{n+1} = j \mid X_n = i] &= \Pr[X_n + Y_{n+1} = j \mid X_n = i] \\
&= \Pr[Y_{n+1} = j - i] = a_{j-i}
\end{aligned}$$
The corresponding probability matrix has a Toeplitz structure possessing the same elements on diagonal lines
$$P = \begin{bmatrix}
0 & a_1 & a_2 & a_3 & \cdots \\
0 & 0 & a_1 & a_2 & \cdots \\
0 & 0 & 0 & a_1 & \cdots \\
0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{bmatrix}$$
This list can be extended by considering other integer functions of the set $\{Y_n\}_{n \geq 1}$ (such as $X_{n+1} = \min[X_n, Y_{n+1}]$ or $X_n = \prod_{k=1}^{n} Y_k$ etc.).
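As an illustration of the running-maximum chain, the following sketch builds its transition matrix for a distribution with finite support. The finite support $\{1, \ldots, K\}$, the function name `running_maximum_matrix` and the example probabilities are assumptions made here for the sake of a runnable example; the text itself considers an infinite matrix.

```python
import numpy as np

def running_maximum_matrix(a):
    """Transition matrix of X_n = max(Y_1, ..., Y_n) for Pr[Y = k] = a[k-1], k = 1..K."""
    a = np.asarray(a, dtype=float)
    K = len(a)
    A = np.cumsum(a)                  # A_j = a_1 + ... + a_j
    P = np.zeros((K, K))
    for i in range(K):
        P[i, i] = A[i]                # stay: Y_{n+1} does not exceed the current maximum
        P[i, i + 1:] = a[i + 1:]      # jump to a larger value j with probability a_j
    return P

# Hypothetical distribution on {1, 2, 3, 4}.
a = [0.1, 0.2, 0.3, 0.4]
P_max = running_maximum_matrix(a)
```

With finite support the largest state $K$ is absorbing, since the maximum can no longer increase once it has been reached.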
11.2 The general random walk
The general random walk is an important model that describes the motion of an item that is constrained to move either one step forwards, stay at the position where it currently is or move one step backwards. In general, this three-possibility motion has transition probabilities that depend on the position $j$ as depicted in Fig. 11.1. Figure 11.1 illustrates that, if the process is in state $j$, it has three possible choices: remain in state $j$ with probability $r_j = \Pr[X_{k+1} = j \mid X_k = j]$, move to the next state $j + 1$ with probability $p_j = \Pr[X_{k+1} = j + 1 \mid X_k = j]$ or jump back to state $j - 1$ with probability $q_j = \Pr[X_{k+1} = j - 1 \mid X_k = j]$. A general random walk is defined by the $(N + 1) \times (N + 1)$ band matrix
$$P = \begin{bmatrix}
r_0 & p_0 & 0 & 0 & \cdots & 0 & 0 & 0 \\
q_1 & r_1 & p_1 & 0 & \cdots & 0 & 0 & 0 \\
0 & q_2 & r_2 & p_2 & \cdots & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & q_{N-1} & r_{N-1} & p_{N-1} \\
0 & 0 & 0 & 0 & \cdots & 0 & q_N & r_N
\end{bmatrix} \tag{11.1}$$
where $p_j > 0$, $q_j > 0$, $r_j \geq 0$ and $q_j + r_j + p_j = 1$ for all $0 < j < N$. The bordering states zero and $N$ are special: $p_0 \geq 0$, $r_0 \geq 0$ and $p_0 + r_0 = 1$ and $q_N \geq 0$, $r_N \geq 0$ and $q_N + r_N = 1$.

[Figure: states $0, \ldots, j - 1, j, j + 1, \ldots$ on a line; from state $j$ an arrow labeled $p_j$ leads to $j + 1$, an arrow labeled $q_j$ leads to $j - 1$ and a self-loop is labeled $r_j$.]

Fig. 11.1. A transition graph of the general random walk.

The general random walk serves as model for a number of practical phenomena:
• The one-dimensional motion of physical particles, electrons that hop from one atom to another. In this case, the number of states $N$ can be very large.
• The gambler's ruin problem: a state $j$ reflects the capital of a gambler whereas $p_j$ is the chance that the gambler wins while $q_j$ is the probability that he loses. The gambler achieves his target when he reaches state $N$, but he is ruined at state 0. In that case, both states are absorbing states with $r_0 = r_N = 1$. In games, most often the probabilities are independent of the state and simplify to $p_j = p$, $q_j = q$ and $r_j = 1 - p - q$.
• The continuous-time counterpart, the birth and death process (Section 11.3), has applications to queueing processes. For a wealth of examples and applications of the random walk, we refer to the classical treatise of Feller (1970, Chapter III, XIV).

11.2.1 The probability of gambler's ruin

The probability of gambler's ruin is defined as $u_j = \Pr[X_T = 0 \mid X_0 = j]$ where $T = \min\{k : X_k = 0\}$ is the hitting time to state 0, which is equivalent to $u_j = \Pr[T_0 < \infty \mid X_0 = j]$. By definition, $u_0 = 1$ and since the gambler achieves his goal at state $N$, he stops and never gets ruined, $u_N = 0$. The law of total probability (2.46) gives the situation after the first transition,
$$\begin{aligned}
\Pr[X_T = 0 \mid X_0 = j] &= \sum_{k=0}^{N} \Pr[X_T = 0 \mid X_0 = j, X_1 = k]\, \Pr[X_1 = k \mid X_0 = j] \\
&= \sum_{k=0}^{N} \Pr[X_T = 0 \mid X_1 = k]\, \Pr[X_1 = k \mid X_0 = j] \qquad \text{(Markov property)} \\
&= \sum_{k=0}^{N} P_{jk} \Pr[X_T = 0 \mid X_1 = k] \\
&= q_j \Pr[X_T = 0 \mid X_1 = j - 1] + r_j \Pr[X_T = 0 \mid X_1 = j] + p_j \Pr[X_T = 0 \mid X_1 = j + 1]
\end{aligned}$$
After the first transition, the probability $\Pr[X_T = 0 \mid X_1 = j] = u_j$ remains the same as the initial $\Pr[X_T = 0 \mid X_0 = j]$ because $T$ is a random variable depending on the state and not on the discrete time. Hence, we obtain the equations
$$u_0 = 1$$
$$u_j = q_j u_{j-1} + r_j u_j + p_j u_{j+1} \qquad (1 \leq j < N)$$
which are different from the corresponding steady-state equations (11.5). The difference lies in left-multiplication of $P$, yielding $u = P u$ instead of right-multiplication in $\pi = \pi P$. Substituting $r_j = 1 - p_j - q_j$ gives after some modification
$$u_{j+1} = -\frac{q_j}{p_j} u_{j-1} + \left(1 + \frac{q_j}{p_j}\right) u_j \tag{11.2}$$
Iteration on $j$ for the first few values using $u_0 = 1$ yields
$$u_2 = -\frac{q_1}{p_1} + \left(1 + \frac{q_1}{p_1}\right) u_1$$
$$u_3 = -\frac{q_2}{p_2} u_1 + \left(1 + \frac{q_2}{p_2}\right) u_2 = -\left(\frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right) u_1$$
which suggests
$$u_j = -\sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m} + \left(1 + \sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}\right) u_1$$
as readily verified by substitution in (11.2). The unknown $u_1$ is determined by the last relation, $u_N = 0$. Finally, the probability of gambler's ruin is
$$\Pr[X_T = 0 \mid X_0 = j] = \frac{\sum_{k=j}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}}{1 + \sum_{k=1}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}} \tag{11.3}$$
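A short numerical cross-check of (11.3) is sketched below, assuming hypothetical state-dependent probabilities for $N = 5$. The closed form is compared with a direct solution of the linear equations $u_j = q_j u_{j-1} + r_j u_j + p_j u_{j+1}$ with $u_0 = 1$ and $u_N = 0$; the function names and the example values are illustrative choices.

```python
import numpy as np

def ruin_probabilities(p, q):
    """Evaluate (11.3) for j = 0..N; p[j] and q[j] are used for j = 1..N-1 only."""
    N = len(p) - 1
    ratios = np.array([q[m] / p[m] for m in range(1, N)])
    # rho[k] = prod_{m=1}^{k} q_m / p_m for k = 0..N-1 (empty product for k = 0)
    rho = np.concatenate(([1.0], np.cumprod(ratios)))
    total = rho.sum()                                   # 1 + sum_{k=1}^{N-1} rho[k]
    # tail[j] = sum_{k=j}^{N-1} rho[k], with tail[N] = 0
    tail = np.concatenate((np.cumsum(rho[::-1])[::-1], [0.0]))
    return tail / total

def ruin_by_linear_system(p, q):
    """Solve u_j = q_j u_{j-1} + r_j u_j + p_j u_{j+1} with u_0 = 1 and u_N = 0."""
    N = len(p) - 1
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    A[0, 0] = 1.0
    b[0] = 1.0
    A[N, N] = 1.0
    for j in range(1, N):
        A[j, j - 1] = -q[j]
        A[j, j] = p[j] + q[j]        # equals 1 - r_j
        A[j, j + 1] = -p[j]
    return np.linalg.solve(A, b)

# Hypothetical probabilities for N = 5 (entries at j = 0 and j = N are unused).
p = [0.0, 0.4, 0.5, 0.3, 0.45, 0.0]
q = [0.0, 0.5, 0.3, 0.6, 0.35, 0.0]
u_formula = ruin_probabilities(p, q)
u_solved = ruin_by_linear_system(p, q)
```

Both routes give the same vector, and the ruin probability decreases monotonically with the start capital $j$.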

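Similarly, the mean hitting time $\eta_j = E[T \mid X_0 = j]$ follows by reasoning on the possible transitions. From state $j$, there is a transition to state $j - 1$ with probability $q_j$. In case of a transition from state $j$ to state $j - 1$, the hitting time $T$ consists of the first transition plus the remaining time from state $j - 1$ on, which is $\eta_{j-1}$. Using the law of total probability or reading all possible transitions from the transition graph (see Fig. 11.1), we find
$$\eta_j = 1 + q_j \eta_{j-1} + r_j \eta_j + p_j \eta_{j+1}$$
with the boundary equations for the state 0 and state $N$ (which in the gambler's ruin problem are absorbing states), $\eta_0 = \eta_N = 0$. With $r_j = 1 - p_j - q_j$,
$$\eta_{j+1} = -\frac{1}{p_j} - \frac{q_j}{p_j} \eta_{j-1} + \left(1 + \frac{q_j}{p_j}\right) \eta_j$$
By iteration,
$$\eta_2 = -\frac{1}{p_1} + \left(1 + \frac{q_1}{p_1}\right) \eta_1$$
$$\begin{aligned}
\eta_3 &= -\frac{1}{p_2} - \frac{q_2}{p_2} \eta_1 + \left(1 + \frac{q_2}{p_2}\right) \eta_2 \\
&= -\frac{1}{p_1} - \frac{1}{p_2}\left(1 + \frac{q_2}{p_1}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2}\right) \eta_1
\end{aligned}$$
$$\begin{aligned}
\eta_4 &= -\frac{1}{p_3} - \frac{q_3}{p_3} \eta_2 + \left(1 + \frac{q_3}{p_3}\right) \eta_3 \\
&= -\frac{1}{p_1} - \frac{1}{p_2}\left(1 + \frac{q_2}{p_1}\right) - \frac{1}{p_3}\left(1 + \frac{q_3}{p_2} + \frac{q_2 q_3}{p_1 p_2}\right) + \left(1 + \frac{q_1}{p_1} + \frac{q_1 q_2}{p_1 p_2} + \frac{q_1 q_2 q_3}{p_1 p_2 p_3}\right) \eta_1
\end{aligned}$$
or
$$\eta_j = -\sum_{k=1}^{j-1} \frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1} \prod_{m=1}^{n} \frac{q_{k-m+1}}{p_{k-m}}\right) + \left(1 + \sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}\right) \eta_1$$
Eliminating $\eta_1$ from $\eta_N = 0$ finally leads to the mean hitting time to ruin or the mean duration of the game
$$\eta_j = -\sum_{k=1}^{j-1} \frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1} \prod_{m=1}^{n} \frac{q_{k-m+1}}{p_{k-m}}\right) + \left(1 + \sum_{k=1}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}\right) \frac{\sum_{k=1}^{N-1} \frac{1}{p_k}\left(1 + \sum_{n=1}^{k-1} \prod_{m=1}^{n} \frac{q_{k-m+1}}{p_{k-m}}\right)}{1 + \sum_{k=1}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}} \tag{11.4}$$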

In spite of the relatively simple difference equations, the solution rapidly grows unattractive. The particular case of the Markov chain where $q_k = q$ and $p_k = p$ simplifies considerably. The probability of gambler's ruin (11.3) becomes
$$\Pr[X_T = 0 \mid X_0 = j] = \frac{\sum_{k=j}^{N-1} \left(\frac{q}{p}\right)^k}{\sum_{k=0}^{N-1} \left(\frac{q}{p}\right)^k} = \frac{\left(\frac{q}{p}\right)^j - \left(\frac{q}{p}\right)^N}{1 - \left(\frac{q}{p}\right)^N}$$
and, if $p = q = \frac{1}{2}$, via de l'Hospital's rule, $\Pr[X_T = 0 \mid X_0 = j] = \frac{N - j}{N}$. If the target fortune $N$ (at which the game ends) is infinitely large,
$$\Pr[X_T = 0 \mid X_0 = j] = \begin{cases} 1 & \text{if } q \geq p \\ \left(\frac{q}{p}\right)^j & \text{if } q < p \end{cases}$$
which demonstrates that the gambler surely will lose all his money if his chances $p$ of winning are smaller than those $q$ of losing. Even in a fair game where $p = q$, he will be defeated surely. In a favorable game ($p > q$) and with start capital $j$, ruin is possible with probability $\left(\frac{q}{p}\right)^j$. Another interpretation is a game with two players $a$ and $b$ in which player $a$ starts with capital $j$ and has winning chance of $p$, while player $b$ starts with capital $N - j$ and wins with probability $q = 1 - p$.
Similarly, the mean duration of the game (11.4) simplifies to
$$\eta_j = -\sum_{k=1}^{j-1} \frac{1 - \left(\frac{q}{p}\right)^k}{p - q} + \frac{1 - \left(\frac{q}{p}\right)^j}{1 - \left(\frac{q}{p}\right)^N} \sum_{k=1}^{N-1} \frac{1 - \left(\frac{q}{p}\right)^k}{p - q}$$
or
$$E[T \mid X_0 = j] = \frac{1}{p - q}\left[N\, \frac{1 - \left(\frac{q}{p}\right)^j}{1 - \left(\frac{q}{p}\right)^N} - j\right]$$
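The closed form for the mean duration with constant $p \neq q$ can be compared against a direct solution of the difference equations. The sketch below assumes hypothetical values $N = 10$, $p = 0.45$ and $q = 0.35$ (so that $r = 0.2$) and absorbing states 0 and $N$; the function names and the array name `eta` are illustrative.

```python
import numpy as np

def mean_duration_closed_form(j, N, p, q):
    """E[T | X_0 = j] for constant p != q, with absorbing states 0 and N."""
    x = q / p
    return (N * (1 - x**j) / (1 - x**N) - j) / (p - q)

def mean_duration_solved(N, p, q):
    """Solve eta_j = 1 + q eta_{j-1} + r eta_j + p eta_{j+1}, eta_0 = eta_N = 0."""
    A = np.zeros((N + 1, N + 1))
    b = np.zeros(N + 1)
    A[0, 0] = A[N, N] = 1.0
    for j in range(1, N):
        A[j, j - 1] = -q
        A[j, j] = p + q              # equals 1 - r
        A[j, j + 1] = -p
        b[j] = 1.0                   # the first transition always costs one step
    return np.linalg.solve(A, b)

N, p, q = 10, 0.45, 0.35             # hypothetical values; r = 1 - p - q = 0.2
eta = mean_duration_solved(N, p, q)
eta_formula = np.array([mean_duration_closed_form(j, N, p, q) for j in range(N + 1)])
```

Note that the closed form is indeterminate for $p = q$; that case has to be treated by a limit, as for the ruin probability.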


11.2.2 The steady-state


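The steady-state equation (9.23) for the vector component $\pi_j$ becomes, for $1 \leq j < N$,
$$\pi_j = p_{j-1} \pi_{j-1} + r_j \pi_j + q_{j+1} \pi_{j+1} \tag{11.5}$$
and, for $j = 0$ and $j = N$,
$$\pi_0 = r_0 \pi_0 + q_1 \pi_1$$
$$\pi_N = p_{N-1} \pi_{N-1} + r_N \pi_N$$
We rewrite these equations using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$ as
$$p_0 \pi_0 = q_1 \pi_1 \tag{11.6}$$
$$p_j \pi_j = (p_{j-1} \pi_{j-1} - q_j \pi_j) + q_{j+1} \pi_{j+1} \tag{11.7}$$
$$p_{N-1} \pi_{N-1} = q_N \pi_N$$
Explicitly, for a few values of $j$, we observe that
$$p_1 \pi_1 = (p_0 \pi_0 - q_1 \pi_1) + q_2 \pi_2 = q_2 \pi_2$$
$$p_2 \pi_2 = (p_1 \pi_1 - q_2 \pi_2) + q_3 \pi_3 = q_3 \pi_3$$
or, in general, for all $j$,
$$p_j \pi_j = q_{j+1} \pi_{j+1}$$
By iteration of $\pi_{j+1} = \frac{p_j}{q_{j+1}} \pi_j$ starting at $j = 0$, we find
$$\pi_{j+1} = \frac{p_j\, p_{j-1} \cdots p_0}{q_{j+1}\, q_j \cdots q_1}\, \pi_0 = \pi_0 \prod_{m=0}^{j} \frac{p_m}{q_{m+1}}$$
The normalization $\|\pi\|_1 = 1$ yields a condition for $\pi_0$,
$$\pi_0 = \left(1 + \sum_{j=1}^{N} \prod_{m=0}^{j-1} \frac{p_m}{q_{m+1}}\right)^{-1}$$
which determines the complete steady-state vector $\pi$ for the general random walk as
$$\pi_j = \frac{\prod_{m=0}^{j-1} \frac{p_m}{q_{m+1}}}{1 + \sum_{j=1}^{N} \prod_{m=0}^{j-1} \frac{p_m}{q_{m+1}}} \tag{11.8}$$
These relations remain valid even when the number of states $N$ tends to infinity provided the infinite sum converges.
In the simple case where $p_k = p$ and $q_k = q$, we obtain with $\rho = \frac{p}{q}$,
$$\pi_j = \frac{(1 - \rho)\, \rho^j}{1 - \rho^{N+1}} \tag{11.9}$$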
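The product form (11.8) and its geometric special case (11.9) can be checked against the band matrix (11.1). The sketch below assumes hypothetical constant probabilities $p = 0.3$ and $q = 0.5$ on the states $0, \ldots, 6$; the function names are illustrative.

```python
import numpy as np

def random_walk_steady_state(p, q):
    """Evaluate (11.8): pi_j proportional to prod_{m=0}^{j-1} p_m / q_{m+1}, j = 0..N."""
    N = len(p) - 1
    weights = np.ones(N + 1)
    for j in range(1, N + 1):
        weights[j] = weights[j - 1] * p[j - 1] / q[j]
    return weights / weights.sum()

def random_walk_matrix(p, q):
    """Band matrix (11.1) with r_j = 1 - p_j - q_j, using q_0 = 0 and p_N = 0."""
    N = len(p) - 1
    P = np.zeros((N + 1, N + 1))
    for j in range(N + 1):
        if j > 0:
            P[j, j - 1] = q[j]
        if j < N:
            P[j, j + 1] = p[j]
        P[j, j] = 1.0 - P[j].sum()   # remaining probability mass is r_j
    return P

# Hypothetical constant probabilities on states 0..6.
N, p_const, q_const = 6, 0.3, 0.5
p = [p_const] * N + [0.0]            # p_N = 0
q = [0.0] + [q_const] * N            # q_0 = 0
pi = random_walk_steady_state(p, q)
rho = p_const / q_const
pi_geometric = (1 - rho) * rho ** np.arange(N + 1) / (1 - rho ** (N + 1))   # (11.9)
P_walk = random_walk_matrix(p, q)
```

The same two functions accept state-dependent lists `p` and `q`, in which case only the comparison with (11.9) no longer applies.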

11.3 Birth and death process


A birth and death process$^1$ is defined by the infinitesimal generator matrix
$$Q = \begin{bmatrix}
-\lambda_0 & \lambda_0 & 0 & 0 & 0 & \cdots \\
\mu_1 & -(\lambda_1 + \mu_1) & \lambda_1 & 0 & 0 & \cdots \\
0 & \mu_2 & -(\lambda_2 + \mu_2) & \lambda_2 & 0 & \cdots \\
0 & 0 & \mu_3 & -(\lambda_3 + \mu_3) & \lambda_3 & \cdots \\
\vdots & \vdots & \vdots & \ddots & \ddots & \ddots
\end{bmatrix}$$
$^1$ Kleinrock (1975, p. ix) mentions that William Feller was the father of the birth and death process.

The transition graph is shown in Fig. 11.2. Although the theory in the previous chapter was derived for finite-state Markov chains, the birth and death process is a generalization to an infinite number of states. The general random walk (Section 11.2) forms the embedded Markov chain of the birth and death process with transition probabilities specified by (10.21) resulting in $V_{i,i-1} = \frac{\mu_i}{\lambda_i + \mu_i}$, $V_{i,i+1} = \frac{\lambda_i}{\lambda_i + \mu_i}$ and $V_{ik} = 0$ for $k \notin \{i - 1, i + 1\}$. The transition probability matrix is a tri-band diagonal matrix which is irreducible if all $\lambda_j > 0$ and $\mu_j > 0$.

[Figure: states $\ldots, j - 1, j, j + 1, \ldots$ on a line with birth rates $\lambda_0, \ldots, \lambda_{j-1}, \lambda_j$ on the arrows to the right and death rates $\mu_1, \ldots, \mu_j, \mu_{j+1}$ on the arrows to the left.]

Fig. 11.2. The transition graph of a birth and death process.

The basic system of differential equations that completely describes the birth and death process follows from the general state probability equations (10.11) for $s_k(t) = \Pr[X(t) = k]$ as
$$s'_0(t) = -\lambda_0 s_0(t) + \mu_1 s_1(t) \tag{11.10}$$
$$s'_k(t) = -(\lambda_k + \mu_k)\, s_k(t) + \lambda_{k-1} s_{k-1}(t) + \mu_{k+1} s_{k+1}(t) \tag{11.11}$$
with initial condition $s_k(0) = \Pr[X(0) = k]$. Exact analytic solutions for any $\lambda_k$ and $\mu_k$ are not possible. Indeed, let us denote the Laplace transform of $s_k(t)$ by
$$S_k(z) = \int_0^{\infty} e^{-zt} s_k(t)\, dt \tag{11.12}$$
Since $s_k(t)$ is a continuous and bounded function ($|s_k(t)| \leq 1$ for all $t > 0$), the Laplace transform exists for $\mathrm{Re}(z) > 0$. The Laplace transform of (11.10) and (11.11) becomes,
$$(\lambda_0 + z)\, S_0(z) = s_0(0) + \mu_1 S_1(z) \tag{11.13}$$
$$(\lambda_k + \mu_k + z)\, S_k(z) = s_k(0) + \lambda_{k-1} S_{k-1}(z) + \mu_{k+1} S_{k+1}(z) \tag{11.14}$$
which is a set of difference equations more complex due to the initial condition $s_k(0)$ than the set (A.51) in Appendix A.5.2.3. That set (A.51) appears in the general random walk whose solution is shown to be intractable in general. This infinite set of differential equations has been thoroughly studied over years under several simplifying conditions for $\lambda_k$ and $\mu_k$, for example, $\lambda_k = \lambda$ and $\mu_k = \mu$ for all $k$. As shown in Chapter 13, they form the basis for the simplest set of queueing models of the family M/M/m/K.

11.3.1 The steady-state


The steady-state follows from (10.19) as solution of the set
$$-\lambda_0 \pi_0 + \mu_1 \pi_1 = 0$$
$$\lambda_{j-1} \pi_{j-1} - (\lambda_j + \mu_j)\, \pi_j + \mu_{j+1} \pi_{j+1} = 0$$
This set is identical to (11.6) and (11.7) provided $p_j$ and $q_j$ are changed for $\lambda_j$ and $\mu_j$. After this modification in (11.8), the steady-state of the birth and death process is
$$\pi_0 = \frac{1}{1 + \sum_{j=1}^{\infty} \prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}}} \tag{11.15}$$
$$\pi_j = \frac{\prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}}}{1 + \sum_{j=1}^{\infty} \prod_{m=0}^{j-1} \frac{\lambda_m}{\mu_{m+1}}} \qquad j \geq 1 \tag{11.16}$$
Theorem 9.3.4 states that an irreducible Markov chain with a finite number of states is necessarily recurrent. However, it is in general difficult to decide whether an irreducible Markov chain with an infinite number of states is recurrent or transient. In case of the birth and death process, it is possible to determine when the process is transient or recurrent. The process is transient if and only if the embedded Markov chain (determined above) is transient. Section 9.2.3 discusses that, for a recurrent chain, $r_{ij} = \Pr[T_j < \infty \mid X_0 = i]$ equals 1: a finite hitting time means that every state $j$ is certainly visited starting from initial state $i$. Applied to the embedded Markov chain, it follows from the gambler's ruin (11.3) that
$$\Pr[T_0 < \infty \mid X_0 = j] = 1 - \frac{\sum_{k=0}^{j-1} \prod_{m=1}^{k} \frac{q_m}{p_m}}{\sum_{k=0}^{N-1} \prod_{m=1}^{k} \frac{q_m}{p_m}}$$

Thus, for any xed initial state m, the condition for a recurrent chain
Pr [Wm ? 4|[0 = l] = 1
PQ31 Qn
tp
is only possible in the limit Q $ 4 if limQ<" n=0
p=1 sp = 4. Transformed to the birth and death rates, the condition for recurrence becomes
P
Qm31 p
2 = "
p=0 p = 4. Furthermore, we observe from (11.16) that the
m=1
Qm31 p
P
innite series 1 = "
p=0 p+1 must converge to have a stationary or
m=1
steady-state distribution.
In summary, if 1 ? 4 and 2 = 4 the birth and death process is
positive recurrent. If 1 = 4 and 2 = 4, it is null recurrent. If 2 ? 4,
the birth and death process is transient.
11.3.2 A pure birth process
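A pure birth process is defined as a process $\{X(t), t \geq 0\}$ for which in any state $i$ it holds that $\mu_i = 0$. It follows from Fig. 11.2 that a birth process can only jump to higher states such that $P_{ij}(t) = 0$ for $j < i$. Similarly, in a pure death process $\{X(t), t \geq 0\}$ all birth rates $\lambda_i = 0$.
11.3.2.1 The Poisson process
Let us first consider the simplest case where all birth rates are equal, $\lambda_i = \lambda$, and where $P_{jj}(t) = \Pr[\tau_j > t \mid X(0) = j] = e^{-\lambda t}$. Using either the backward or forward equation or (10.25) with $V_{i,i+1} = 1$ yields
$$P_{ij}(t) = \delta_{ij} e^{-\lambda t} + \lambda \int_0^t e^{-\lambda u} P_{i+1,j}(t - u)\, du$$
or, for $j = i + k$ with $k > 0$,
$$P_{i,i+k}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i+1,i+k}(u)\, du$$
Explicitly, for $k = 1$,
$$P_{i,i+1}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} e^{-\lambda u}\, du = \lambda t\, e^{-\lambda t}$$
which is independent of $i$ and, thus, it holds for any state $i$. For $k = 2$,
$$P_{i,i+2}(t) = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i+1,i+2}(u)\, du = \lambda e^{-\lambda t} \int_0^t e^{\lambda u}\, \lambda u\, e^{-\lambda u}\, du = \frac{(\lambda t)^2}{2} e^{-\lambda t}$$
This suggests to propose, for any $i \geq 0$,
$$P_{i,i+k}(t) = \frac{(\lambda t)^k}{k!} e^{-\lambda t} \tag{11.17}$$
which is verified inductively as
$$\begin{aligned}
P_{i,i+k}(t) &= \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i+1,i+k}(u)\, du = \lambda e^{-\lambda t} \int_0^t e^{\lambda u} P_{i,i+k-1}(u)\, du \\
&= \lambda e^{-\lambda t} \int_0^t e^{\lambda u} \frac{(\lambda u)^{k-1}}{(k - 1)!} e^{-\lambda u}\, du = \frac{(\lambda t)^k}{k!} e^{-\lambda t}
\end{aligned}$$
Hence, the transition probabilities of a pure birth process with constant birth rate have a Poisson distribution (11.17) and are only a function of the difference in states $k = j - i \geq 0$ for any $t \geq 0$. Moreover, for $0 \leq u \leq t$, consider the increment $X(t) - X(u)$,
$$\begin{aligned}
\Pr[X(t) - X(u) = k] &= \sum_{i \geq 0} \Pr[X(u) = i, X(t) = i + k] \\
&= \sum_{i \geq 0} \Pr[X(u) = i]\, \Pr[X(t) = i + k \mid X(u) = i] \\
&= \sum_{i \geq 0} \Pr[X(u) = i]\, P_{i,i+k}(t - u) \\
&= \frac{(\lambda (t - u))^k}{k!} e^{-\lambda (t - u)} \sum_{i \geq 0} \Pr[X(u) = i]
\end{aligned}$$
Thus, the increment $X(t) - X(u)$ has a Poisson distribution,
$$\Pr[X(t) - X(u) = k] = \frac{(\lambda (t - u))^k}{k!} e^{-\lambda (t - u)} \tag{11.18}$$
and, since $X(0) = 0$ and the increments are independent (Markov property), we conclude that the pure birth process is a Poisson process (Section 7.2).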
11.3.2.2 The general birth process
In case the birth rates n depend on the actual state n, the pure birth process
can be regarded as the simplest generalization of the Poisson. The Laplace

212

Applications of Markov chains

transform dierence equations (11.13) and (11.14) reduce to the set


v0 (0)
0 + }
n31
vn (0)
Vn (}) =
+
Vn31 (})
n + } n + }
V0 (}) =

which, by the usual iteration, has the solution,


Qn31
n
X
vm (0) p=m
p
Vn (}) =
Qn
p=m (p + })
m=0

(11.19)

Q
with the convention that ep=d i (p) = 1 if d A e. The validity of this
general solution is veried by substitution into the dierence equation for
Vn (}). The form of Vn (}) is a ratio that can always be transformed back to
the time-domain provided that n is known. If all n A 0 are distinct, using
(2.38) with f A 0, we nd
vn (w) =

n
X

vm (0)

m=0

n31
Y
p=m

1
p
2l

f+l"

f3l"

Qn

h}w

p=m

(p + })

g}

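By closing the contour over the negative real plane ($\mathrm{Re}(z) < 0$), only simple poles at $z = -\lambda_n$ are encountered. The residue of the integrand at $z = -\lambda_n$ is $e^{-\lambda_n t} / \prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)$, so that

$$
\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{zt}}{\prod_{m=j}^{k} (\lambda_m + z)}\, dz = \sum_{n=j}^{k} \frac{e^{-\lambda_n t}}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)}
$$

resulting in

$$
s_k(t) = \sum_{j=0}^{k} s_j(0) \sum_{n=j}^{k} e^{-\lambda_n t}\, \frac{\prod_{m=j}^{k-1} \lambda_m}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)} \tag{11.20}
$$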

If some $\lambda_k = \lambda_j$, multiple poles occur and a slightly more complex result appears that can still be computed in exact analytic form.
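The partial-fraction form of the state probabilities of a general birth process with distinct rates can be evaluated numerically. The following is a minimal sketch, not part of the original text: the function name `birth_state_prob` and the convention that `rates[k]` holds $\lambda_k$ are our own assumptions, and the process is assumed to start in a single known state.

```python
import math

def birth_state_prob(k, t, rates, start):
    """Probability s_k(t) of a general birth process that starts in state
    `start`, evaluated as a sum over the simple poles z = -rates[n].
    Assumes all rates[start..k] are distinct (hypothetical helper)."""
    j = start
    if k < j:
        return 0.0  # a birth process never moves to a lower state
    # product of lambda_m for m = j..k-1 (empty product equals 1)
    numerator = math.prod(rates[m] for m in range(j, k))
    total = 0.0
    for n in range(j, k + 1):
        denominator = math.prod(
            rates[m] - rates[n] for m in range(j, k + 1) if m != n
        )
        total += math.exp(-rates[n] * t) / denominator
    return numerator * total
```

For linear rates `rates[k] = k * lam` and `start=1`, the values agree with the geometric law $e^{-\lambda t}(1 - e^{-\lambda t})^{k-1}$. For large $k$ the alternating sum suffers from cancellation, so this direct evaluation is only advisable for moderate $k$.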
11.3.2.3 The Yule process
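A classical example of a process with distinct birth rates is the Yule process, where $\lambda_k = k\lambda$. In that case, (11.20) can be simplified. With

$$
\frac{\prod_{m=j}^{k-1} \lambda_m}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)} = \frac{(k-1)!}{(j-1)!\, \prod_{m=j}^{n-1} (m - n)\, \prod_{m=n+1}^{k} (m - n)}
$$

and with $\prod_{m=j}^{n-1} (m - n) = (-1)^{n-j} \prod_{o=1}^{n-j} o = (-1)^{n-j} (n-j)!$ and $\prod_{m=n+1}^{k} (m - n) = \prod_{o=1}^{k-n} o = (k-n)!$ we find

$$
\frac{\prod_{m=j}^{k-1} \lambda_m}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)} = \frac{(-1)^{n-j}\, (k-1)!}{(j-1)!\,(n-j)!\,(k-n)!}
$$

such that

$$
\begin{aligned}
\sum_{n=j}^{k} e^{-\lambda_n t}\, \frac{\prod_{m=j}^{k-1} \lambda_m}{\prod_{m=j; m \neq n}^{k} (\lambda_m - \lambda_n)} &= \frac{(k-1)!}{(j-1)!} \sum_{n=0}^{k-j} \frac{e^{-(n+j)\lambda t}\, (-1)^{n}}{n!\,(k-j-n)!} \\
&= \frac{(k-1)!\, e^{-j\lambda t}}{(k-j)!\,(j-1)!} \sum_{n=0}^{k-j} \binom{k-j}{n} \left(-e^{-\lambda t}\right)^{n} \\
&= \binom{k-1}{j-1}\, e^{-j\lambda t} \left(1 - e^{-\lambda t}\right)^{k-j}
\end{aligned}
$$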

Finally, for the Yule process, we obtain from (11.20) the evolution of the state probabilities over time

$$
s_k(t) = \sum_{j=0}^{k} s_j(0) \binom{k-1}{j-1}\, e^{-j\lambda t} \left(1 - e^{-\lambda t}\right)^{k-j} \tag{11.21}
$$

In practice, $s_j(0) = \delta_{jn}$ if the process starts from state $n$ (implying $s_k(t) = 0$ for $k < n$ because the process moves to the right for $t \geq 0$) and the general form simplifies to

$$
s_k(t) = \binom{k-1}{n-1}\, e^{-n\lambda t} \left(1 - e^{-\lambda t}\right)^{k-n} \tag{11.22}
$$
The Yule process has been used as a simple model for the evolution of a population in which each individual gives birth at exponential rate $\lambda$ and $X(t)$ denotes the number of individuals in the population (that never decreases as there are no deaths) as a function of time $t$. At each state $k$ the population has precisely $k$ individuals that each generate births such that $\lambda_k = k\lambda$, the birth rate of the population. If the population starts at $t = 0$ with one individual, $n = 1$, the evolution over time has the distribution $s_k(t) = e^{-\lambda t} \left(1 - e^{-\lambda t}\right)^{k-1}$, which is recognized from (3.5) as a geometric distribution with mean $e^{\lambda t}$. Since the sojourn times of a Markov process are independent exponential random variables, the average time $T_k$ to reach $k$ individuals from one ancestor equals $E[T_k] = \sum_{j=1}^{k} \frac{1}{\lambda_j} = \frac{1}{\lambda} \sum_{j=1}^{k} \frac{1}{j}$ which is well approximated (Abramowitz and Stegun, 1968, Section 6.3.18) as $E[T_k] \approx \frac{\log(k+1) + \gamma}{\lambda}$, where $\gamma = 0.577\,215\ldots$ is Euler's constant. If the population starts with $n$ individuals, the distribution (11.22) at time $t$ consists of a sum of $n$ i.i.d. geometric random variables, which is a negative binomial distribution. The Yule process has been employed for example as a crude model to estimate the spread of a disease or epidemic and the split of molecules in new species by cosmic rays.
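As a numerical illustration of the Yule distribution, the sketch below evaluates the negative binomial form of the state probabilities and the harmonic sum for the mean time $E[T_k]$ exactly as stated in the text. The function names and parameter values are illustrative assumptions.

```python
import math

def yule_pmf(k, t, lam, n=1):
    """Probability that a Yule process with per-individual birth rate `lam`,
    started with `n` individuals, has `k` individuals at time `t`."""
    if k < n:
        return 0.0
    p = math.exp(-lam * t)
    return math.comb(k - 1, n - 1) * p**n * (1.0 - p) ** (k - n)

def mean_time_to_reach(k, lam):
    """Harmonic-sum expression (1/lam) * sum_{j=1}^{k} 1/j quoted in the text."""
    return sum(1.0 / j for j in range(1, k + 1)) / lam

# Example (assumed values): the probabilities sum to one and the mean
# population size equals n * exp(lam * t).
probs = [yule_pmf(k, 1.0, 0.7, n=3) for k in range(400)]
mean_size = sum(k * p for k, p in enumerate(probs))
```

The mean of the negative binomial distribution, $n e^{\lambda t}$, reflects that each of the $n$ ancestors independently founds a geometrically distributed family.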

11.3.3 Constant rate birth and death process


In a constant rate birth and death process, both the birth rate $\lambda_k = \lambda$ and death rate $\mu_k = \mu$ are constant for any state $k$. From (11.16), the steady-state for all states $j$ with $\rho = \frac{\lambda}{\mu} < 1$,

$$
\pi_j = (1 - \rho)\, \rho^{j} \qquad j \geq 0 \tag{11.23}
$$

only depends on the ratio of birth over death rate. The time-dependent constant rate birth and death process can still be computed in analytic form. In this case, the matrix form of the infinitesimal generator $Q$ has the tri-band Toeplitz structure, which can be diagonalized in analytic form as shown in Appendix A.5.2.1. In this section, we present an alternative approach. Instead of dealing with an infinite set of difference equations, a generating function approach seems more convenient. Let us denote the generating function of the Laplace transforms $S_k(z)$ by

$$
\varphi(x, z) = \sum_{k=0}^{\infty} S_k(z)\, x^{k} \tag{11.24}
$$

Using (11.12) into (11.24) gives

$$
\varphi(x, z) = \sum_{k=0}^{\infty} \int_0^{\infty} e^{-zt}\, s_k(t)\, x^{k}\, dt = \int_0^{\infty} e^{-zt} \sum_{k=0}^{\infty} s_k(t)\, x^{k}\, dt
$$

where the reversal of summation and integration is allowed because all terms are positive. Since $0 \leq s_k(t) \leq 1$, the sum is at least convergent for $|x| < 1$, which shows that $\varphi(x, z)$ is analytic inside the unit circle $|x| < 1$ for any $\mathrm{Re}(z) > 0$.
After multiplying (11.14) by $x^{k}$ and summing over all $k$, we obtain

$$
(\lambda + z)\, S_0(z) + (\lambda + \mu + z) \sum_{k=1}^{\infty} S_k(z)\, x^{k} = \sum_{k=0}^{\infty} s_k(0)\, x^{k} + \lambda \sum_{k=1}^{\infty} S_{k-1}(z)\, x^{k} + \mu S_1(z) + \mu \sum_{k=1}^{\infty} S_{k+1}(z)\, x^{k}
$$

and, written in terms of $\varphi(x, z)$,

$$
\varphi(x, z) = \frac{\mu (1 - x)\, S_0(z) - \sum_{k=0}^{\infty} s_k(0)\, x^{k+1}}{\lambda x^{2} - x (\lambda + \mu + z) + \mu}
$$

Note that in the general case of state-dependent rates $\lambda_k$ and $\mu_k$, an expression in terms of $\varphi(x, z)$ is not possible. Before continuing with the computations, we make the additional simplification that $s_k(0) = \delta_{kj}$: we assume that the constant rate birth and death process starts in state $j$. With this initial condition, the generating function

$$
\varphi(x, z) = \frac{\mu (1 - x)\, S_0(z) - x^{j+1}}{\lambda x^{2} - x (\lambda + \mu + z) + \mu} \tag{11.25}
$$

still depends on the unknown function $S_0(z)$. The following derivation involving the theory of complex functions demonstrates a standard procedure that will also be useful in other queuing problems.
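The denominator in (11.25) has two roots,

$$
\begin{aligned}
x_1 &= \frac{\lambda + \mu + z}{2\lambda} + \frac{1}{2\lambda} \sqrt{(\lambda + \mu + z)^{2} - 4\lambda\mu} \\
x_2 &= \frac{\lambda + \mu + z}{2\lambda} - \frac{1}{2\lambda} \sqrt{(\lambda + \mu + z)^{2} - 4\lambda\mu}
\end{aligned}
$$

We need the powerful theorem of Rouché (Titchmarsh, 1964, p. 116) to deduce more on the location of $x_1$ and $x_2$.

Theorem 11.3.1 (Rouché) If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $C$, and $|g(z)| < |f(z)|$ on $C$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros inside $C$.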
Choose $f(x) = \mu - x(\lambda + \mu + z)$ and $g(x) = \lambda x^{2}$ such that $f(x) + g(x) = \lambda x^{2} - x(\lambda + \mu + z) + \mu$, the denominator in (11.25). Since both $f(x)$ and $g(x)$ are polynomials, they are analytic everywhere in the complex $x$-plane. We know that $\varphi(x, z)$ is analytic inside the unit disk. If the roots $x_1$ or $x_2$ lie inside the unit disk, the numerator in (11.25) must have zeros at precisely the same place in order for $\varphi(x, z)$ to be analytic inside the unit disk. Hence, we consider as contour $C$ in Rouché's Theorem the unit circle $|x| = 1$. Clearly, $f(x)$ has one single zero $\frac{\mu}{\lambda + \mu + z}$ inside the unit circle (because $\lambda > 0$, $\mu > 0$ and $\mathrm{Re}(z) > 0$). Furthermore, on the unit circle $|x| = 1$,

$$
|\mu - x(\lambda + \mu + z)| \geq \big|\, \mu - |x|\, |\lambda + \mu + z| \,\big| = \big|\, \mu - |\lambda + \mu + z| \,\big| > \lambda = |\lambda x^{2}|
$$

which shows that $|g(x)| < |f(x)|$ on the unit circle. Rouché's Theorem then tells us that $f(x) + g(x)$ has precisely one zero inside the unit circle. This implies that $|x_1| > 1$ and $|x_2| < 1$ and that the numerator in (11.25) has a zero at $x_2$,

$$
\mu (1 - x_2)\, S_0(z) - x_2^{j+1} = 0
$$

This relation determines the unknown function $S_0(z)$ as

$$
S_0(z) = \frac{x_2^{j+1}}{\mu (1 - x_2)}
$$

such that (11.25) becomes

$$
\varphi(x, z) = \frac{x_2^{j+1}\, \frac{1 - x}{1 - x_2} - x^{j+1}}{\lambda (x - x_1)(x - x_2)} = \frac{x_2^{j+1} (1 - x) - x^{j+1} (1 - x_2)}{\lambda (1 - x_2)(x - x_1)(x - x_2)}
$$
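The location of the two roots and the resulting expression for $S_0(z)$ can be checked numerically. The sketch below is an illustration under our own naming (`roots`, `S0`); it selects the root inside the unit disk by its modulus rather than by the branch of the square root.

```python
import cmath

def roots(lam, mu, z):
    """Roots x1, x2 of lam*x^2 - x*(lam + mu + z) + mu = 0."""
    b = lam + mu + z
    disc = cmath.sqrt(b * b - 4.0 * lam * mu)
    return (b + disc) / (2.0 * lam), (b - disc) / (2.0 * lam)

def S0(lam, mu, z, j):
    """Laplace transform of the probability of state 0 when the process
    starts in state j, using the root that lies inside the unit disk."""
    x1, x2 = roots(lam, mu, z)
    x_in = x1 if abs(x1) < 1.0 else x2
    return x_in ** (j + 1) / (mu * (1.0 - x_in))

# Example (assumed values): exactly one root lies inside the unit circle.
x1, x2 = roots(0.8, 1.0, 0.3 + 0.5j)
inside = [abs(x) < 1.0 for x in (x1, x2)]
```

As a plausibility check, for small real $z$ the product $z S_0(z)$ approaches the steady-state probability $1 - \rho$ of state 0, in agreement with the final-value property of the Laplace transform.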
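We know that the numerator can be divided by $(x - x_2)$, or explicitly,

$$
\begin{aligned}
x_2^{j+1} (1 - x) - x^{j+1} (1 - x_2) &= x_2^{j+1} - x^{j+1} + x_2\, x \left(x^{j} - x_2^{j}\right) \\
&= (x - x_2) \left[ -\sum_{k=0}^{j} x_2^{j-k} x^{k} + x_2\, x \sum_{k=0}^{j-1} x_2^{j-1-k} x^{k} \right] \\
&= (x - x_2) \left[ -x_2^{j} + (x_2 - 1) \sum_{k=1}^{j} x_2^{j-k} x^{k} \right]
\end{aligned}
$$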

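Finally,

$$
\varphi(x, z) = \frac{-x_2^{j} + (x_2 - 1) \sum_{k=1}^{j} x_2^{j-k} x^{k}}{\lambda (1 - x_2)(x - x_1)} \tag{11.26}
$$

By expanding the denominator in a Taylor series around $x = 0$, $\frac{1}{x - x_1} = -\frac{1}{x_1} \sum_{m=0}^{\infty} x_1^{-m} x^{m}$ for $|x| < |x_1|$, and denoting

$$
\begin{aligned}
a_0 &= -x_2^{j} \\
a_k &= (x_2 - 1)\, x_2^{j-k} \qquad 1 \leq k \leq j \\
a_k &= 0 \qquad k > j
\end{aligned}
$$

we have

$$
\varphi(x, z) = -\frac{1}{\lambda (1 - x_2)\, x_1} \left( \sum_{k=0}^{\infty} a_k x^{k} \right) \left( \sum_{m=0}^{\infty} x_1^{-m} x^{m} \right) = -\frac{1}{\lambda (1 - x_2)\, x_1} \sum_{k=0}^{\infty} \left( \sum_{m=0}^{k} \frac{a_{k-m}}{x_1^{m}} \right) x^{k}
$$

Comparing with (11.24) and equating the corresponding powers in $x$, we find an explicit form of the Laplace transforms of the probabilities that the birth and death process is in state $k$,

$$
\begin{aligned}
S_k(z) &= -\frac{1}{\lambda (1 - x_2)\, x_1} \sum_{m=0}^{k} \frac{a_{k-m}}{x_1^{m}} \\
&= \frac{x_2^{j}}{\lambda (1 - x_2)\, x_1^{k+1}} \left[ 1 + (1 - x_2) \sum_{l=1}^{\min(k,j)} \left( \frac{x_1}{x_2} \right)^{l} \right] \\
&= \frac{x_2^{j}}{\lambda (1 - x_2)\, x_1^{k+1}} \left[ 1 + (1 - x_2)\, x_1\, \frac{\left( x_1 / x_2 \right)^{\min(k,j)} - 1}{x_1 - x_2} \right]
\end{aligned}
$$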

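This expression can be put in different forms by using relations among the zeros $x_1$ and $x_2$, such as $x_1 + x_2 = \frac{\lambda + \mu + z}{\lambda}$ and $x_1 x_2 = \frac{\mu}{\lambda}$. This ingenuity is required to recognize in $S_k(z)$ a known Laplace transform. Otherwise, one has to proceed by computing the inverse Laplace transform by contour integration via (2.38). In any case, the computation needs advanced skills in complex function theory and we content ourselves here to present the result without derivation (see e.g. Cohen (1969, pp. 80-82)),

$$
\begin{aligned}
s_k(t) = {} & e^{-(\lambda + \mu) t} \left[ \rho^{(k-j)/2}\, I_{k-j}(at) + \rho^{(k-j-1)/2}\, I_{k+j+1}(at) \right] \\
& + e^{-(\lambda + \mu) t}\, (1 - \rho)\, \rho^{k} \sum_{m=k+j+2}^{\infty} \rho^{-m/2}\, I_m(at)
\end{aligned}
\tag{11.27}
$$

where $\rho = \frac{\lambda}{\mu}$, $a = 2\mu\sqrt{\rho}$ and where $I_s(z)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6.1). Using the asymptotic formulas for the modified Bessel function, the behavior of $s_k(t)$ for large $t$ can be derived (see e.g. Cohen (1969, p. 84)),

$$
\begin{aligned}
s_k(t) &= (1 - \rho)\, \rho^{k} + \frac{\rho^{(k-j)/2}\, e^{-(1 - \sqrt{\rho})^{2} \mu t}}{2\sqrt{\pi}\, \left( \sqrt{\rho}\, \mu t \right)^{3/2}} \left[ \left( k - \frac{\sqrt{\rho}}{1 - \sqrt{\rho}} \right) \left( j - \frac{\sqrt{\rho}}{1 - \sqrt{\rho}} \right) + O\!\left(t^{-1}\right) \right] \\
&= \frac{1}{\sqrt{\pi \mu t}} \left( 1 + O\!\left(t^{-1}\right) \right) \qquad \text{only if } \rho = 1
\end{aligned}
$$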
This expression demonstrates that the constant rate birth death process converges to the steady-state $(1 - \rho)\rho^{k}$ with a relaxation rate $\mu \left(1 - \sqrt{\rho}\right)^{2}$. Clearly, the higher $\rho$, the lower the relaxation rate and the slower the process tends to equilibrium as illustrated in Fig. 11.3. Intuitively, two effects play a role. Since the probability that states with large $k$ are visited increases with increasing $\rho$, the build-up time for this occupation will be larger. In addition, the variability of the number of visited states (further derived for the M/M/1 queue in Section 14.1) increases with increasing $\rho$, which suggests that larger oscillations of the sample paths around the steady-state are likely to occur, enlarging the convergence time.

[Figure 11.3: plot of $s_4(t, \rho)$ versus $t$ from 0 to 100 for $\rho$ = 0.2, 0.4, 0.6, 0.7, 0.8 and 0.9; the insert plots the relaxation time versus $\rho$.]

Fig. 11.3. The probability $s_4(t)$ that the process is in state 4 given that it started from state 0 as function of time (in units of average death time, $\mu = 1$) for various $\rho = \frac{\lambda}{\mu}$. The insert shows the relaxation time $\tau = \left(1 - \sqrt{\rho}\right)^{-2}$ (in units of average death time, $\mu = 1$). The corresponding steady state probabilities $\pi_4$ are 0.0012, 0.015, 0.051, 0.072, 0.082, 0.065 for $\rho = 0.2, 0.4, 0.6, 0.7, 0.8, 0.9$ respectively. Observe that for $\rho = 0.9$, the plotted 100 time units are smaller than the relaxation time, which is 379 time units.
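The relaxation time and the steady-state probabilities quoted for Fig. 11.3 follow directly from the closed-form expressions. A minimal sketch (function names are our own) that reproduces these numbers:

```python
import math

def steady_state(k, rho):
    """Steady-state probability (1 - rho) * rho**k of state k, for rho < 1."""
    return (1.0 - rho) * rho**k

def relaxation_time(rho, mu=1.0):
    """Inverse of the relaxation rate mu * (1 - sqrt(rho))**2."""
    return 1.0 / (mu * (1.0 - math.sqrt(rho)) ** 2)

# Values for the loads shown in the figure (mu = 1 assumed).
for rho in (0.2, 0.4, 0.6, 0.7, 0.8, 0.9):
    print(rho, steady_state(4, rho), relaxation_time(rho))
```

The relaxation time grows without bound as $\rho \to 1$, which is the quantitative counterpart of the slow convergence discussed above.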

11.4 A random walk on a graph


Let $G(N, L)$ denote a graph with $N$ nodes and $L$ links. Suppose that the link weight $w_{ij} = w_{ji}$ of an edge from node $i \to j$ (or vice versa) is proportional to the transition probability $P_{ij}$ that a packet at node $i$ decides to move to node $j$. Clearly, $w_{ii} = 0$. Specifically, with (9.8),

$$
P_{ij} = \frac{w_{ij}}{\sum_{k=1}^{N} w_{ik}}
$$

This constraint (9.8) destroys the symmetry in the link weight structure ($w_{ij} = w_{ji}$) because, in general, $P_{ij} \neq P_{ji}$ since $\sum_{k=1}^{N} w_{ik} \neq \sum_{k=1}^{N} w_{jk}$. The sequence of nodes (or links) visited by that packet resembles a random walk on the graph $G(N, L)$ and constitutes a Markov chain. Moreover, the steady-state of this Markov process is readily obtained by observing that the chain is time reversible. Indeed, the condition for time reversibility (10.27) becomes

$$
\frac{\pi_i\, w_{ij}}{\sum_{k=1}^{N} w_{ik}} = \frac{\pi_j\, w_{ji}}{\sum_{k=1}^{N} w_{jk}}
$$

or, since $w_{ij} = w_{ji}$,

$$
\frac{\pi_i}{\sum_{k=1}^{N} w_{ik}} = \frac{\pi_j}{\sum_{k=1}^{N} w_{jk}}
$$

This implies that $\pi_i = \alpha \sum_{k=1}^{N} w_{ik}$ and using the normalization $\|\pi\|_1 = 1$, we obtain the steady-state probabilities for all nodes $i$,

$$
\pi_i = \frac{\sum_{k=1}^{N} w_{ik}}{\sum_{i=1}^{N} \sum_{k=1}^{N} w_{ik}} = \frac{\sum_{k=1}^{N} w_{ik}}{2 \sum_{i=1}^{N} \sum_{k=i+1}^{N} w_{ik}}
$$

This Markov process can model an active packet that monitors the network by collecting state information (number of packets, number of lost or retransmitted packets, etc.) in each router. Of course, the link weight structure $w_{ij}$ for the active packet is decisive and requires additional information to be chosen efficiently. For example, for traffic monitoring, the distribution of the number of packets forwarded by each router must be obtained. For the collection of these data, the active packet should in steady-state visit all nodes about equally frequently or $\pi_i = \frac{1}{N}$, implying that the Markov transition matrix $P$ must be doubly stochastic (see Appendix A.5.1).
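The steady-state of the random walk can be computed without solving any linear system, because it is proportional to the total weight incident to each node. The following sketch, with hypothetical helper names and a small assumed weight matrix, also verifies time reversibility numerically. It assumes a symmetric weight matrix with zero diagonal in which every node has at least one link.

```python
def transition_matrix(w):
    """Row-normalize the symmetric weight matrix w into P_ij."""
    return [[w_ij / sum(row) for w_ij in row] for row in w]

def random_walk_steady_state(w):
    """Steady-state pi_i = (sum_k w_ik) / (sum_i sum_k w_ik)."""
    strength = [sum(row) for row in w]
    total = sum(strength)
    return [s / total for s in strength]

# Example (assumed weights): a triangle with link weights 1, 2 and 3.
w = [[0, 1, 2],
     [1, 0, 3],
     [2, 3, 0]]
P = transition_matrix(w)
pi = random_walk_steady_state(w)   # [3/12, 4/12, 5/12]

# Time reversibility: pi_i * P_ij equals pi_j * P_ji for every pair.
reversible = all(
    abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
    for i in range(3) for j in range(3)
)
```

With unit weights on all links, $\pi_i$ reduces to the degree of node $i$ divided by twice the number of links, so that equal visiting frequencies require a regular graph.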
11.5 Slotted Aloha
The Aloha protocol is a basic example of a multiple access communication scheme of which Ethernet$^2$ is considered the direct descendant. Aloha (which means hello in the Hawaiian language) was invented by Norman Abramson at the University of Hawaii at the beginning of the 1970s to provide packet-switched radio communication between a central computer and various data terminals at the campus. Slotted Aloha is a discrete-time version of the pure Aloha protocol, where all transmitted packets have equal length and where each packet requires one timeslot for transmission.
2 The essential difference with Ethernet's CSMA/CD (carrier sense multiple access with collision detection) is that Aloha does not use carrier sensing and does not stop transmitting when collisions are detected. Carrier sensing is only adequate if the nodes are near to each other (as in a local area network) such that collisions can be detected before the completion of transmission. Only then is a timely reaction possible.

Consider a network consisting of $N$ nodes that can communicate with each other via a shared communication channel (e.g. a radio channel) using the slotted Aloha protocol. The simplest arrival process $A$ of packets at each node is a Poisson process. We assume that these Poisson arrivals at a node are independent from the Poisson arrivals at another node and that all Poisson arrivals at a node have the same rate $\frac{\lambda}{N}$ where $\lambda$ is the overall arrival rate at the network of $N$ nodes. The idea of the Aloha protocol is that, upon receipt of a packet, the node transmits that newly arrived packet in the next timeslot. In case two nodes happen to transmit a packet in the same timeslot, a collision occurs, which results in a retransmission of the packets. A node with a packet that must be retransmitted is said to be backlogged. Even if new packets arrive at a backlogged node, the retransmitted packet is the first one to be transmitted and, for simplicity (to ignore queueing of packets at a node), we assume that those new packets are discarded. If backlogged nodes retransmit the packet in the next timeslot, surely a new collision would occur. Therefore, backlogged nodes wait for some random number of timeslots before retransmitting. We assume, for simplicity, that $p_r$ is the probability (which is the same for all backlogged nodes) that a backlogged node retransmits in the next time slot. Moreover, the probability $p_r$ of retransmission is the same for each timeslot. The number of time slots between the occurrence of a collision and the retransmission is a geometric random variable $T_r$ (see Section 3.1.3) with parameter $p_r$ such that $\Pr[T_r = k] = p_r (1 - p_r)^{k-1}$.

11.5.1 The Markov chain


The slotted Aloha protocol constitutes a discrete-time Markov chain $X_k \in \{0, 1, \ldots, N\}$, where a state $j$ counts the number of backlogged nodes out of the $N$ nodes in total and the subscript $k$ refers to the $k$-th timeslot. Each of the $j$ backlogged nodes retransmits a packet in the next time slot with probability $p_r$, while each of the $N - j$ unbacklogged nodes will surely transmit a packet in the next time slot provided a packet arrives in the current timeslot. The latter event (at least one arrival $A$) occurs with probability $p_a = \Pr[A > 0] = 1 - \Pr[A = 0]$. If we assume that the arrival process is Poissonean, then $p_a = 1 - \exp\left(-\frac{\lambda}{N}\right)$, but the computations in this section are more generally valid.

The probability that $n$ backlogged nodes in state $j$ retransmit in the next time slot is binomially distributed

$$
b_n(j) = \binom{j}{n}\, p_r^{n}\, (1 - p_r)^{j-n}
$$

and, similarly, the probability that $n$ unbacklogged nodes in state $j$ transmit in the next time slot is

$$
u_n(j) = \binom{N - j}{n}\, p_a^{n}\, (1 - p_a)^{N-j-n}
$$

A packet is transmitted successfully if and only if (a) one new arrival and no backlogged packet or (b) no new arrival and one backlogged packet is transmitted. The probability of successful transmission in state $j$ and per time slot equals

$$
p_s(j) = u_1(j)\, b_0(j) + u_0(j)\, b_1(j)
$$
The transition probability $P_{j,j+m}$ equals

$$
P_{j,j+m} =
\begin{cases}
u_m(j) & 2 \leq m \leq N - j \\
u_1(j)\, (1 - b_0(j)) & m = 1 \\
u_1(j)\, b_0(j) + u_0(j)\, (1 - b_1(j)) & m = 0 \\
u_0(j)\, b_1(j) & m = -1
\end{cases}
$$

The state with $j$ backlogged nodes jumps to the state $j - 1$ with one backlogged node less if no new packets are sent and there is precisely 1 successful retransmission. The state $j$ remains in the state $j$ if there is 1 new arrival and there are no retransmissions or if there are no new arrivals and none or more than 1 retransmission. The state $j$ jumps to state $j + 1$ if there is 1 new arrival from a non-backlogged node and at least 1 retransmission because then there are surely collisions and the number of backlogged nodes increases by 1. Finally, the state $j$ jumps to state $j + m$ if $m \geq 2$ new packets arrive from $m$ different non-backlogged nodes, which always causes collisions irrespective of how many backlogged nodes also retransmit in the next time slot.
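The four cases of the transition probabilities translate directly into a routine that builds the complete transition matrix. The sketch below is illustrative (the function names and the example values of $N$, $\lambda$ and $p_r$ are assumptions) and checks that each row sums to one.

```python
from math import comb, exp

def binom_pmf(n, size, p):
    """Probability of n successes out of `size` independent trials."""
    if n < 0 or n > size:
        return 0.0
    return comb(size, n) * p**n * (1.0 - p) ** (size - n)

def aloha_transition_matrix(N, p_a, p_r):
    """Transition matrix of the backlog chain; state j = 0..N."""
    P = [[0.0] * (N + 1) for _ in range(N + 1)]
    for j in range(N + 1):
        b0, b1 = binom_pmf(0, j, p_r), binom_pmf(1, j, p_r)          # retransmissions
        u0, u1 = binom_pmf(0, N - j, p_a), binom_pmf(1, N - j, p_a)  # new packets
        if j >= 1:
            P[j][j - 1] = u0 * b1                # one successful retransmission
        P[j][j] = u1 * b0 + u0 * (1.0 - b1)      # backlog unchanged
        if j + 1 <= N:
            P[j][j + 1] = u1 * (1.0 - b0)        # new packet collides
        for m in range(2, N - j + 1):
            P[j][j + m] = binom_pmf(m, N - j, p_a)   # m new packets collide
    return P

# Example (assumed values): N = 6 nodes, overall Poisson rate 0.3, p_r = 0.2.
N = 6
P = aloha_transition_matrix(N, p_a=1.0 - exp(-0.3 / N), p_r=0.2)
row_sums = [sum(row) for row in P]
```

All entries more than one position below the diagonal remain zero, in agreement with the observation that the backlog can decrease by at most one per time slot.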
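The Markov chain is illustrated in Fig. 11.4, which shows that the state can decrease by at most 1 per time slot.

[Figure 11.4: state diagram with states 0, 1, 2, 3, ..., $N$, showing the upward transitions $P_{01}, P_{02}, P_{03}, P_{04}, \ldots$, the self-transitions $P_{00}, P_{11}, \ldots, P_{NN}$ and the downward transitions $P_{10}, P_{21}, P_{32}, P_{43}$.]

Fig. 11.4. Graph of the Markov chain for slotted Aloha. Each state $j$ counts the number of backlogged nodes.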

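The transition probability matrix $P$ has the structure

$$
P = \begin{bmatrix}
P_{00} & P_{01} & P_{02} & \cdots & \cdots & P_{0N} \\
P_{10} & P_{11} & P_{12} & \cdots & \cdots & P_{1N} \\
0 & P_{21} & P_{22} & \cdots & \cdots & P_{2N} \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & 0 & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\
0 & \cdots & 0 & 0 & P_{N,N-1} & P_{NN}
\end{bmatrix}
$$

whose eigenstructure is computed in Appendix A.5.3.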


In the asymptotic regime when $N \to \infty$, slotted Aloha has the peculiar property that the steady-state vector $\pi$ does not exist. Although for a small number of nodes $N$, the steady-state equations can be solved, when the number $N$ grows, slotted Aloha turns out to be unstable. It seems difficult to prove that $\lim_{N \to \infty} \pi = 0$, but there is another argument that suggests the truth of this awkward Aloha property. The expected change in backlog per time slot is

$$
E[X_{k+1} - X_k \,|\, X_k = j] = (N - j)\, p_a - p_s(j) \tag{11.28}
$$

and equals the expected number of new arrivals minus the expected number of successful transmissions. This quantity $E[X_{k+1} - X_k | X_k = j]$ is often called the drift. If the drift is positive for all timeslots $k$, the Markov chain moves (on average) to higher states or to the right in Fig. 11.4. Since $p_s(j) \leq 1$ and $p_a = 1 - \exp\left(-\frac{\lambda}{N}\right)$, it follows that

$$
\lim_{N \to \infty} E[X_{k+1} - X_k \,|\, X_k = j] = \infty
$$

Thus, the drift tends to infinity, which means that, on average, the number of backlogged nodes increases unboundedly and suggests (but does not prove, a counter example is given in problem (ii) of Section 9.4) that the Markov chain is transient for $N \to \infty$.

A more detailed discussion and engineering approaches to cure this instability are found in Bertsekas and Gallager (1992, Chapter 4). The interest of the analysis of slotted Aloha lies in the fact that other types of multiple access protocols, such as the important class of carrier sense multiple access (CSMA) protocols, can be deduced in a similar manner. Of the CSMA class with collision detection, Ethernet is by far the most important because it is the basis of local area networks. Multiple access protocols of the CSMA/CD type are discussed in our book Data Communications Networking (Van Mieghem, 2004a).

11.5.2 Efficiency of slotted Aloha and the offered traffic G


We now investigate the probability of a successful transmission in state $j$ in more detail,

$$
\begin{aligned}
p_s(j) &= u_1(j)\, b_0(j) + u_0(j)\, b_1(j) \\
&= (N - j)\, p_a (1 - p_a)^{N-j-1} (1 - p_r)^{j} + j p_r (1 - p_r)^{j-1} (1 - p_a)^{N-j} \\
&= \left[ \frac{(N - j)\, p_a}{1 - p_a} + \frac{j p_r}{1 - p_r} \right] (1 - p_a)^{N-j} (1 - p_r)^{j}
\end{aligned}
$$

For small arrival probability $p_a$ and small retransmission probability $p_r$, the probability of successful transmission in state $j$ can be approximated by using the Taylor expansions of $(1 - x)^{\alpha} = e^{\alpha \ln(1 - x)} = e^{-\alpha x} (1 + o(1))$ and $\frac{x}{1 - x} = x + O\left(x^{2}\right)$ as

$$
\begin{aligned}
p_s(j) &= \left[ (N - j)\, p_a + j p_r + O\left(p_a^{2} + p_r^{2}\right) \right] e^{-[(N-j) p_a + j p_r]} (1 + o(1)) \\
&= \left[ (N - j)\, p_a + j p_r \right] \exp\left( -\left[ (N - j)\, p_a + j p_r \right] \right) (1 + o(1))
\end{aligned}
$$

Similarly, the probability that no packet is transmitted in state $j$ equals

$$
\begin{aligned}
p_{no}(j) &= u_0(j)\, b_0(j) = (1 - p_r)^{j} (1 - p_a)^{N-j} \\
&= \exp\left( -\left[ (N - j)\, p_a + j p_r \right] \right) (1 + o(1))
\end{aligned}
$$

Hence, for small $p_a$ and small $p_r$, the probability of successful transmission and of no transmission in state $j$ is well approximated by

$$
\begin{aligned}
p_s(j) &\simeq t(j)\, e^{-t(j)} \\
p_{no}(j) &\simeq e^{-t(j)}
\end{aligned}
$$

Now, $t(j) = (N - j)\, p_a + j p_r$ is the expected number of arrivals and retransmissions in state $j$ or, equivalently, the total rate of transmission attempts in state $j$. That total rate of transmissions in state $j$, $t(j)$, is also called the offered traffic $G$. The analysis shows that, for small $p_a$ and small $p_r$, $p_s(j)$ and $p_{no}(j)$ are closely approximated in terms of a Poisson random variable with rate $t(j)$. Moreover, the probability of successful transmission $p_s(j)$ can be interpreted as the departure rate from state $j$ or the throughput $S_{\mathrm{SAloha}} = G e^{-G}$, which is maximized if $G = t(j) = 1$. By controlling $p_r$ to achieve $t(j) = (N - j)\, p_a + j p_r = 1$, slotted Aloha performs with highest throughput. The efficiency $\eta_{\mathrm{SAloha}}$ of slotted Aloha with many nodes $N > 1$ is defined as the maximum fraction of time during which packets are transmitted successfully, which is $\max p_s(j) = e^{-1}$. Hence, $\eta_{\mathrm{SAloha}} = e^{-1} \approx 36.8\%$.
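To see how close the Poisson-type approximation is to the exact success probability, both can be evaluated side by side. The sketch below uses our own function names and an assumed parameter set chosen such that the offered traffic equals 1.

```python
import math

def p_success_exact(N, j, p_a, p_r):
    """Exact u_1(j) b_0(j) + u_0(j) b_1(j) for the binomial model."""
    u0 = (1.0 - p_a) ** (N - j)
    b0 = (1.0 - p_r) ** j
    u1 = (N - j) * p_a * (1.0 - p_a) ** (N - j - 1) if N > j else 0.0
    b1 = j * p_r * (1.0 - p_r) ** (j - 1) if j > 0 else 0.0
    return u1 * b0 + u0 * b1

def offered_traffic(N, j, p_a, p_r):
    """Expected number of transmission attempts t(j) in state j."""
    return (N - j) * p_a + j * p_r

def p_success_approx(N, j, p_a, p_r):
    """Approximation t(j) * exp(-t(j)), valid for small p_a and p_r."""
    G = offered_traffic(N, j, p_a, p_r)
    return G * math.exp(-G)

# Example (assumed values): N = 100 nodes, j = 40 backlogged, so t(j) = 1.
exact = p_success_exact(100, 40, 0.005, 0.0175)
approx = p_success_approx(100, 40, 0.005, 0.0175)
```

For these values both numbers lie close to $e^{-1}$, the exact one being slightly larger because of the neglected second-order terms.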
Pure Aloha (Bertsekas and Gallager, 1992), where the nodes can start transmitting at arbitrary times instead of only at the beginning of time slots, only performs half as efficiently as slotted Aloha with $\eta_{\mathrm{PAloha}} = \frac{1}{2e} \approx 18.4\%$. Recall that each packet is assumed to have an equal length that corresponds with the length of one timeslot. In pure Aloha, a transmitted packet at time $t$ is successful if no other packet is sent during $(t - 1, t + 1)$. This time interval is precisely equal to two timeslots in slotted Aloha which explains why $\eta_{\mathrm{PAloha}} = \frac{1}{2} \eta_{\mathrm{SAloha}}$. The same observation tells us that, in pure Aloha, $p_{no}(j) \simeq e^{-2t(j)}$ because in the successful interval the expected number of arrivals and retransmissions is twice that in slotted Aloha. The throughput $S$ roughly equals the total rate of transmission attempts $G$ (which is the same as in slotted Aloha) multiplied by $p_{no}(j) \simeq e^{-2t(j)}$, hence, $S_{\mathrm{PAloha}} = G e^{-2G}$.
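The two throughput curves and their maxima can be compared with a simple grid search. This is a small sketch with assumed function names; the grid resolution of 0.001 is an arbitrary choice.

```python
import math

def throughput_slotted(G):
    """Throughput G * exp(-G) of slotted Aloha at offered traffic G."""
    return G * math.exp(-G)

def throughput_pure(G):
    """Throughput G * exp(-2G) of pure Aloha at offered traffic G."""
    return G * math.exp(-2.0 * G)

def argmax_on_grid(f, upper=3.0, steps=3000):
    """Grid point in (0, upper] at which f is largest."""
    grid = [upper * i / steps for i in range(1, steps + 1)]
    return max(grid, key=f)

G_slotted = argmax_on_grid(throughput_slotted)   # maximum at G = 1
G_pure = argmax_on_grid(throughput_pure)         # maximum at G = 1/2
```

The maxima $e^{-1}$ at $G = 1$ and $\frac{1}{2e}$ at $G = \frac{1}{2}$ confirm that pure Aloha reaches half the efficiency of slotted Aloha, and does so at half the offered traffic.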

11.6 Ranking of webpages


To retrieve webpages related to a user's query, current websearch engines first perform a search similar to that in text processors to find all webpages containing the query terms. Due to the massive size of the World Wide Web, this first action can result in a huge number of retrieved webpages. Several thousands of webpages related to a query are not uncommon. To reduce the list of webpages, many websearch engines apply a ranking criterion to sort this list. In this section, we discuss PageRank, the hyperlink-based ranking system used by the Google search engine. PageRank elegantly exploits the power of discrete Markov theory.

11.6.1 A Markov model of the web


The hyperlink structure of the World Wide Web can be viewed as a directed graph with $N$ nodes. Each node in the webgraph represents a certain webpage and the directed edges represent hyperlinks. Let us consider a small collection of webpages as in Fig. 11.5 to illustrate the underlying idea of PageRank, invented by Brin and Page, the founders of Google.

[Figure 11.5: (a) directed graph on the five webpages 1 to 5; (b) the transition probability matrix]

$$
P = \begin{bmatrix}
0 & P_{12} & 0 & P_{14} & P_{15} \\
0 & 0 & P_{23} & P_{24} & P_{25} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & P_{43} & 0 & 0 \\
P_{51} & P_{52} & 0 & 0 & 0
\end{bmatrix}
$$

Fig. 11.5. A subgraph of the World Wide Web (a) and the corresponding transition probability matrix $P$ (b).

The topology of any graph is determined by an adjacency matrix (see Appendix B.1). A reasonable criterion to assess the importance of a webpage is the number of times that this webpage is visited. This criterion suggests that we consider a discrete Markov chain whose transition probability matrix $P$ corresponds to the adjacency matrix of the webgraph (shown in (b) in Fig. 11.5). The element $P_{ij}$ of the Markov transition probability matrix is the probability of moving from webpage $i$ (state $i$) to webpage $j$ (state $j$) in one time step. The component $s_i[k]$ of the corresponding state vector $s[k]$ denotes the probability that at time $k$ the webpage $i$ is visited. The long run mean fraction of time that webpage $i$ is visited equals the steady-state probability $\pi_i$ of the Markov chain. This probability $\pi_i$ is the ranking measure of the importance of webpage $i$ used in Google. The basic idea is indeed simple, but we have not shown yet how to determine the elements $P_{ij}$ nor whether the steady-state probability vector $\pi$ exists. In particular, we will demonstrate that guaranteeing that the steady-state vector $\pi$ exists and that $\pi$ can be computed for a Markov chain containing some billions of states (the order of magnitude of the number $N$ of webpages) requires a deeper knowledge of discrete Markov chains.

To start determining the elements $P_{ij}$, we assume that, given we are on webpage $i$, any hyperlink on that webpage has equal probability to be clicked on. This assumption implies that $P_{ij} = \frac{1}{d_i}$ where the degree $d_i$ of a node $i$ (see also Section 15.3) equals the number of adjacent neighbors of node $i$ in the webgraph. This number $d_i$ is thus equal to the number of hyperlinks on webpage $i$. The transition probability matrix in Fig. 11.5 then becomes
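$$
P = \begin{bmatrix}
0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\
0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0
\end{bmatrix}
$$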

The uniformity assumption is in most cases the best we can make if no additional information is available. If, for example, web usage information is available showing that a random surfer accessing page 2 is twice as likely to jump to page 4 than to any other neighboring webpage of 2, then the second row can be replaced by $\begin{bmatrix} 0 & 0 & \frac{1}{4} & \frac{1}{2} & \frac{1}{4} \end{bmatrix}$.

When solely adopting the adjacency matrix of the webgraph as underlying structure of the Markov transition probability matrix $P$, we cannot assure that $P$ is a stochastic matrix. For example, it often occurs that a node (such as node 3 in our example in Fig. 11.5) does not contain outlinks. Such nodes are called dangling nodes. For example, many webpages may point to an important document on the web, which itself does not refer to any other webpage. The corresponding row in $P$ possesses only zero elements, which violates the basic law (9.8) of a stochastic matrix. To rectify the deviation from a stochastic matrix, each zero row must be replaced by a particular non-zero row vector$^3$ $v^T$ that obeys (9.8), i.e. $\|v\|_1 = v^T u = 1$ where $u^T = [1\ 1\ \cdots\ 1]$. Again, the simplest recipe is to invoke uniformity and to replace any zero row by $v^T = \frac{u^T}{N}$. In our example, we replace the third row by $\begin{bmatrix} \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \end{bmatrix}$ and obtain

$$
\bar{P} = \begin{bmatrix}
0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \\
0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\
\frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} & \frac{1}{5} \\
0 & 0 & 1 & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & 0 & 0 & 0
\end{bmatrix}
$$

However, this adjustment is not sufficient to ensure the existence of a steady-state vector $\pi$. In Section 9.3.1, we have shown that, if the Markov chain is irreducible, the steady-state vector exists. In an irreducible Markov chain any state is reachable from any other (Section 9.2.1.1). By its very nature, the World Wide Web leads almost surely to a reducible Markov chain. In order to create an irreducible matrix, Brin and Page have considered

$$
\bar{\bar{P}} = \alpha \bar{P} + (1 - \alpha)\, \frac{u\, u^T}{N}
$$

where $0 < \alpha < 1$, $u\, u^T$ is an $N \times N$ matrix with each element equal to 1 and $\bar{P}$ is the previously adjusted matrix without zero-rows. The linear combination of the stochastic matrix $\bar{P}$ and the stochastic perturbation matrix $\frac{u\, u^T}{N}$ ensures that $\bar{\bar{P}}$ is an irreducible stochastic matrix. Every node is now directly connected (reachable in one step) to any other (because of $u\, u^T$), which makes the Markov chain irreducible with aperiodic, positive recurrent states (see Fig. 9.1). Slightly more generally, we can replace the matrix $\frac{u\, u^T}{N}$ by $u\, v^T$, where $v$ is a probability vector as above but where we must additionally require that each component of $v$ is non-zero in order to guarantee reachability. Brin and Page have called $v^T$ the personalization vector, which makes it possible to deviate from uniformity. Hence, we arrive at the Brin and Page Markov transition probability matrix

$$
\bar{\bar{P}} = \alpha \bar{P} + (1 - \alpha)\, u\, v^T \tag{11.29}
$$

3 We use the normal vector algebra convention, but remark that the stochastic vectors $\pi$ and $s[k]$ are also row vectors (without the transpose sign)!

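For $v^T = \begin{bmatrix} \frac{1}{16} & \frac{4}{16} & \frac{6}{16} & \frac{4}{16} & \frac{1}{16} \end{bmatrix}$ and $\alpha = \frac{4}{5}$, the probability transition matrix in our example (with the zero row 3 replaced by $v^T$) becomes

$$
\bar{\bar{P}} = \begin{bmatrix}
\frac{1}{80} & \frac{19}{60} & \frac{3}{40} & \frac{19}{60} & \frac{67}{240} \\
\frac{1}{80} & \frac{1}{20} & \frac{41}{120} & \frac{19}{60} & \frac{67}{240} \\
\frac{1}{16} & \frac{1}{4} & \frac{3}{8} & \frac{1}{4} & \frac{1}{16} \\
\frac{1}{80} & \frac{1}{20} & \frac{7}{8} & \frac{1}{20} & \frac{1}{80} \\
\frac{33}{80} & \frac{9}{20} & \frac{3}{40} & \frac{1}{20} & \frac{1}{80}
\end{bmatrix}
$$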

If the presented method were implemented, the initially very sparse matrix $P$ would be replaced by the dense matrix $\bar{\bar{P}}$, which for the size $N$ of the web would increase storage dramatically. Therefore, a more effective way is to define a special vector $r$ whose component $r_j = 1$ if row $j$ in $P$ is a zero-row or node $j$ is a dangling node, and $r_j = 0$ otherwise. Then, $\bar{P} = P + r\, v^T$ is a rank-one update of $P$ and so is $\bar{\bar{P}}$ because

$$
\bar{\bar{P}} = \alpha \left( P + r\, v^T \right) + (1 - \alpha)\, u\, v^T = \alpha P + \left( \alpha r + (1 - \alpha)\, u \right) v^T
$$
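The rank-one construction can be reproduced exactly for the five-page example with rational arithmetic. In the sketch below, the function name `google_matrix` is our own and the dense matrix is only formed for illustration; every zero row of $P$ is replaced by the personalization vector $v^T$ before mixing with weight $\alpha$.

```python
from fractions import Fraction as F

def google_matrix(P, v, alpha):
    """Dense matrix alpha * (P with zero rows replaced by v) + (1 - alpha) * u v^T."""
    N = len(P)
    result = []
    for row in P:
        dangling = all(x == 0 for x in row)
        base = v if dangling else row          # rank-one fix for zero rows
        result.append([alpha * base[j] + (1 - alpha) * v[j] for j in range(N)])
    return result

# Transition matrix of the five-page example (row 3 belongs to a dangling node).
P = [
    [F(0), F(1, 3), F(0), F(1, 3), F(1, 3)],
    [F(0), F(0), F(1, 3), F(1, 3), F(1, 3)],
    [F(0), F(0), F(0), F(0), F(0)],
    [F(0), F(0), F(1), F(0), F(0)],
    [F(1, 2), F(1, 2), F(0), F(0), F(0)],
]
v = [F(1, 16), F(4, 16), F(6, 16), F(4, 16), F(1, 16)]
G = google_matrix(P, v, alpha=F(4, 5))
```

The resulting rows sum to one and the third row equals $v^T$ itself, since a dangling row is a mixture of $v^T$ with $v^T$.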


11.6.2 Computation of the PageRank steady-state vector
The steady-state vector $\pi$ obeys the eigenvalue equation (9.22), thus $\pi = \pi \bar{\bar{P}}$. Rather than solving this equation, Brin and Page propose to compute the steady-state vector from $\pi = \lim_{k \to \infty} s[k]$. Specifically, for any starting vector $s[0]$ (usually $s[0] = \frac{u^T}{N}$), we iterate the equation (9.6) $m$ times and choose $m$ sufficiently large such that $\|s[m] - \pi\| \leq \epsilon$ where $\epsilon$ is a prescribed tolerance. Before turning to the convergence of the iteration process that actually computes powers of $\bar{\bar{P}}$ as observed from (9.9), we first concentrate on the basic iteration (9.6),

$$
s[k+1] = s[k]\, \bar{\bar{P}} = s[k] \left( \alpha P + \left( \alpha r + (1 - \alpha)\, u \right) v^T \right)
$$

Since $s[k]\, u = 1$, we find

$$
s[k+1] = \alpha\, s[k]\, P + \left( \alpha\, s[k]\, r + (1 - \alpha) \right) v^T \tag{11.30}
$$

This formula indicates that only the product of $s[k]$ with the (extremely) sparse matrix $P$ needs to be computed and that $\bar{\bar{P}}$ and $\bar{P}$ are never formed nor stored. As shown in Appendix A.4.3, the rate of convergence of a Markov chain towards the steady-state is determined by the second largest eigenvalue. Furthermore, Lemma A.4.4 demonstrates that, for any personalization vector $v^T$, the second largest eigenvalue of $\bar{\bar{P}}$ is $\alpha \lambda_2$, where $\lambda_2$ is the second largest eigenvalue of $\bar{P}$. Lemma A.4.4 thus shows that by choosing $\alpha$ in (11.29) appropriately, the convergence of the iteration (11.30) tends at least as $\alpha^{k}$ (since $\lambda_2 < 1$ for irreducible and $\lambda_2 = 1$ for reducible Markov chains) towards the steady-state vector $\pi$. Brin and Page report that only 50 to 100 iterations of (11.30) for $\alpha = 0.85$ are sufficient. Clearly, a fast convergence is found for small $\alpha$, but then (11.29) shows that the true characteristics of the webgraph are suppressed.
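The sparse iteration can be sketched with adjacency lists, so that neither the adjusted nor the dense matrix is ever formed. The code below is an illustration under our own assumptions: the function name, the 0-based node numbering and the stopping rule, which compares successive iterates because the exact $\pi$ is of course unknown during the computation.

```python
def pagerank(out_links, v, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration s[k+1] = alpha*s[k]*P + (alpha*s[k]*r + 1 - alpha)*v^T.
    out_links[i] lists the pages linked from page i (empty if dangling)."""
    N = len(out_links)
    s = [1.0 / N] * N                      # uniform starting vector
    for _ in range(max_iter):
        new = [0.0] * N
        dangling_mass = 0.0                # equals the product s[k] r
        for i, links in enumerate(out_links):
            if links:
                share = alpha * s[i] / len(links)
                for j in links:
                    new[j] += share        # sparse product alpha * s[k] * P
            else:
                dangling_mass += s[i]
        coeff = alpha * dangling_mass + (1.0 - alpha)
        new = [new[j] + coeff * v[j] for j in range(N)]
        if sum(abs(a - b) for a, b in zip(new, s)) <= tol:
            return new
        s = new
    return s

# Five-page example with pages numbered 0..4 (page 2 is dangling).
out_links = [[1, 3, 4], [2, 3, 4], [], [2], [0, 1]]
v = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
pi = pagerank(out_links, v, alpha=0.8)
```

Each iteration costs a number of operations proportional to the number of hyperlinks plus $N$, which is what makes the method feasible at the scale of the web.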
This brings us to a final remark concerning the irreducibility approach. The original method of Brin and Page that resulted in (11.29) by enforcing that each node is connected to each other alters the true nature of the webgraph even though the connectivity strength to create irreducibility is extremely small, $\frac{1}{N}$ in the case $v^T = \frac{u^T}{N}$. Instead of maximally connecting all nodes, another irreducibility approach of minimally connecting nodes investigated by Langville and Meyer (2005) consists of creating one dummy node that is connected to all other nodes and to which all other nodes are connected to ensure overall reachability. Such an approach changes the webgraph less. The large size $N$ of the web introduces several challenges such as storage, stability and updating of the PageRank vector, choosing the personalization vector $v$ and other implementational considerations for which we refer to Langville and Meyer (2005).
11.7 Problems
(i) Determine the steady-state probability distribution for the birth-death processes with the following transition intensities
(a) $\lambda_i = \lambda$ and $\mu_i = i\mu$,
(b) $\lambda_i = \frac{\lambda}{i+1}$ and $\mu_i = \mu$,
where $\lambda$ and $\mu$ are constants.
(ii) Consider slotted ALOHA as in Section 11.5. There are eight stations that compete for slots by transmitting with probability 0.12 each in one slot. Assume that the stations always have packets to transmit. Compute the average time for one station to transmit seven packets.

12
Branching processes

A branching process is an evolutionary process that starts with an initial set of items that produce several other items with a certain probability distribution. These generated items in turn produce new items, and so on. If we denote by $X_k$ the number of items in the $k$-th generation and by $Y_{k,j}$ the number of items produced by the $j$-th item in generation $k$, then the basic law between the number of items in the $k$-th and $(k+1)$-th generation is, for $k \ge 0$,
\[
X_{k+1} = \sum_{j=1}^{X_k} Y_{k,j} \tag{12.1}
\]

Figure 12.1 illustrates the basic law (12.1) of a branching process.


[Figure 12.1: a tree with $X_0 = 1$, $X_1 = 3 = Y_0$ and $X_2 = 5 = Y_{1,1} + Y_{1,2} + Y_{1,3} = 2 + 0 + 3$.]

Fig. 12.1. A branching process with one root ($X_0 = 1$) drawn as a tree in which all nodes of generation $k$ lie at the same distance $k$ from the root (label 0).

In general, the production process in each generation $k$ can be different, but most often and in the sequel it is assumed that all generations produce items with the same probability distribution such that all random variables $Y_{k,j}$ are independent and have the same distribution as $Y$. The branching process is entirely defined by the basic law (12.1) and the distribution of the initial set $X_0$. The basic law (12.1) indicates that the number of items $X_{k+1}$ in generation $k+1$ is only dependent on the number of items $X_k$ in the previous generation $k$. The Markov property (9.2)
\[
\Pr[X_{k+1} = x_{k+1} \mid X_0 = x_0, \ldots, X_k = x_k] = \Pr[X_{k+1} = x_{k+1} \mid X_k = x_k] = \Pr\left[\sum_{j=1}^{x_k} Y = x_{k+1}\right] = \Pr[x_k Y = x_{k+1}]
\]
is obeyed, which shows that the branching process $\{X_k\}_{k \ge 0}$ is a Markov chain with transition probabilities $P_{ij} = \Pr[iY = j]$, where $iY$ is shorthand for the sum of $i$ independent copies of $Y$. The discrete branching process can be extended to a continuous-time branching process in which items are produced continuously in time, rather than by generations. Since continuous-time Markov processes are mathematically more difficult than their discrete counterpart, we omit the continuous-time branching processes but refer to the book of Harris (1963) and to a simple example, the Yule process, in Section 11.3.2.3.
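The basic law (12.1) translates almost literally into a simulation. The following is a minimal sketch, not part of the original text: the function names, the `offspring` callback and the binary production distribution are illustrative assumptions.

```python
import random


def simulate_generations(offspring, generations, x0=1, seed=None):
    """Return [X_0, X_1, ..., X_generations] for one sample path.

    `offspring(rng)` is a hypothetical callback that draws one sample of Y.
    """
    rng = random.Random(seed)
    sizes = [x0]
    for _ in range(generations):
        # Basic law (12.1): X_{k+1} is the sum of X_k independent copies of Y.
        sizes.append(sum(offspring(rng) for _ in range(sizes[-1])))
    return sizes


def binary_offspring(rng, p_two=0.75):
    """Illustrative production: Y = 2 with probability p_two, else Y = 0."""
    return 2 if rng.random() < p_two else 0
```

Because each generation is computed from the previous one only, the list returned by `simulate_generations` is a sample path of the Markov chain described above. Once a generation is empty, all later generations are empty as well.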
There are many examples of branching processes and we briefly describe
some of the most important. In biology, a certain species generates offspring
and the survival of that species after $n$ generations is studied as a branching
process. In the same vein, what is the probability that a family name that is
inherited by sons only will eventually become extinct? This was the question
posed by Galton and Watson that gave birth to the theory of branching
processes in 1874. In physics, branching processes have been studied to
understand nuclear chain reactions. A nucleus is split by a neutron and
several new free neutrons are generated. Each of these free neutrons again
may hit another nucleus producing additional free neutrons and so on. In
micro-electronics, the avalanche break-down of a diode is another example of
a branching process. In queuing theory, all new arrivals of packets during the
service time of a particular packet can be described as a branching process.
The process continues as long as the queue lasts. The number of duplicates
generated by a flooding process in a communications network is a branching
process: a flooded packet is sent on all interfaces of a router except for
the incoming interface. The spread of computer viruses in the Internet can
be modeled approximately as a branching process. The application of a
branching process to compute the hopcount of the shortest path between
two arbitrary nodes in a network is discussed in Section 15.7.


12.1 The probability generating function


Since the $Y_{k,j}$ are independent, identically distributed random variables with the same distribution as $Y$ and independent of the random variable $X_k$, the probability generating function $\varphi_{X_{k+1}}(z)$ of $X_{k+1}$ follows from (2.70) and the basic law (12.1) as
\[
\varphi_{X_{k+1}}(z) = \varphi_{X_k}(\varphi_Y(z)) \tag{12.2}
\]
with $\varphi_{X_0}(z) = E[z^{X_0}] = f(z)$, where $f(z)$ is a given probability generating function. Iterating the general relation (12.2) gives
\[
\varphi_{X_{k+1}}(z) = \varphi_{X_k}(\varphi_Y(z)) = \varphi_{X_{k-1}}(\varphi_Y(\varphi_Y(z))) = f(\varphi_Y(\varphi_Y(\cdots(\varphi_Y(z)))))
\]
where the last relation consists of $k+1$ nested repeated functions $\varphi_Y(\cdot)$, called the iterates of $\varphi_Y(\cdot)$.
The expectation can be derived from the probability generating function by differentiation and setting $z = 1$. More elegantly, by taking the expectation of the basic law (12.1) and recalling that $X_k$ and $Y_{k,j}$ are independent, we have with $\mu = E[Y]$
\[
E[X_{k+1}] = E\left[\sum_{j=1}^{X_k} Y_{k,j}\right] = E\left[\sum_{j=1}^{X_k} Y\right] = E[Y X_k] = \mu E[X_k]
\]
Iteration starting from a given average $E[X_0]$ of the initial population gives
\[
E[X_k] = \mu^k E[X_0] \tag{12.3}
\]

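Using (2.27), the variance of $X_{k+1}$ follows from (12.2) with
\[
\varphi'_{X_{k+1}}(z) = \varphi'_{X_k}(\varphi_Y(z))\,\varphi'_Y(z)
\]
\[
\varphi''_{X_{k+1}}(z) = \varphi''_{X_k}(\varphi_Y(z))\left(\varphi'_Y(z)\right)^2 + \varphi'_{X_k}(\varphi_Y(z))\,\varphi''_Y(z)
\]
as
\begin{align*}
\mathrm{Var}[X_{k+1}] &= \varphi''_{X_{k+1}}(1) + \varphi'_{X_{k+1}}(1) - \left(\varphi'_{X_{k+1}}(1)\right)^2 \\
&= \varphi''_{X_k}(1)\left(\varphi'_Y(1)\right)^2 + \varphi'_{X_k}(1)\varphi''_Y(1) + \varphi'_{X_k}(1)\varphi'_Y(1) - \left(\varphi'_{X_k}(1)\varphi'_Y(1)\right)^2 \\
&= \varphi''_{X_k}(1)\left(E[Y]\right)^2 + E[X_k]\,\varphi''_Y(1) + E[X_k]E[Y] - \left(E[X_k]E[Y]\right)^2 \\
&= \mu^2\,\mathrm{Var}[X_k] + E[X_k]\,\mathrm{Var}[Y]
\end{align*}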

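Iteration starting from a given variance $\mathrm{Var}[X_0]$ of the initial set of items and employing the expression for the average (12.3) yields
\begin{align*}
\mathrm{Var}[X_1] &= \mu^2\,\mathrm{Var}[X_0] + E[X_0]\,\mathrm{Var}[Y] \\
\mathrm{Var}[X_2] &= \mu^2\,\mathrm{Var}[X_1] + E[X_1]\,\mathrm{Var}[Y] = \mu^4\,\mathrm{Var}[X_0] + \left(\mu^2 + \mu\right) E[X_0]\,\mathrm{Var}[Y] \\
\mathrm{Var}[X_3] &= \mu^2\,\mathrm{Var}[X_2] + E[X_2]\,\mathrm{Var}[Y] = \mu^6\,\mathrm{Var}[X_0] + \left(\mu^4 + \mu^3 + \mu^2\right) E[X_0]\,\mathrm{Var}[Y]
\end{align*}
from which we deduce
\[
\mathrm{Var}[X_k] = \mu^{2k}\,\mathrm{Var}[X_0] + \sum_{j=k-1}^{2(k-1)} \mu^j\, E[X_0]\,\mathrm{Var}[Y]
\]
or
\[
\mathrm{Var}[X_k] = \mu^{2k}\,\mathrm{Var}[X_0] + E[X_0]\,\mathrm{Var}[Y]\,\mu^{k-1}\,\frac{1 - \mu^k}{1 - \mu} \tag{12.4}
\]
Substitution into the recursion for $\mathrm{Var}[X_k]$ justifies the correctness of (12.4).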
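The agreement between the recursion and the closed forms (12.3) and (12.4) can also be checked numerically. The sketch below is an illustration only; the function names and default arguments are assumptions, and the case $\mu = 1$ is handled through the limit of the geometric sum.

```python
def moments_by_recursion(k, mu, var_y, mean_x0=1.0, var_x0=0.0):
    """Iterate E[X_{k+1}] = mu E[X_k] and
    Var[X_{k+1}] = mu^2 Var[X_k] + E[X_k] Var[Y], k times."""
    mean, var = mean_x0, var_x0
    for _ in range(k):
        # The variance update must use the mean of the current generation.
        var = mu ** 2 * var + mean * var_y
        mean = mu * mean
    return mean, var


def moments_closed_form(k, mu, var_y, mean_x0=1.0, var_x0=0.0):
    """Evaluate (12.3) and (12.4) directly."""
    mean = mu ** k * mean_x0
    if mu == 1:
        geometric_sum = k  # limit of mu^(k-1) (1 - mu^k) / (1 - mu) for mu -> 1
    else:
        geometric_sum = mu ** (k - 1) * (1 - mu ** k) / (1 - mu)
    var = mu ** (2 * k) * var_x0 + mean_x0 * var_y * geometric_sum
    return mean, var
```

For example, `moments_by_recursion(5, 2.0, 6.0)` and `moments_closed_form(5, 2.0, 6.0)` should return the same pair up to rounding.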
The relations for the expectation (12.3) and the variance (12.4) of the number of items in generation $k$ imply that, if the average production per generation is $E[Y] = \mu = 1$, then $E[X_k] = E[X_0]$ and $\mathrm{Var}[X_k] = \mathrm{Var}[X_0] + k E[X_0] \mathrm{Var}[Y]$. In the case that the average production $E[Y] = \mu > 1$ ($E[Y] < 1$), the average population per generation grows (decreases) exponentially in $k$ with rate $\log \mu$ and, similarly for large $k$, the standard deviation $\sqrt{\mathrm{Var}[X_k]}$ grows (decreases) exponentially in $k$ with the same rate $\log \mu$. Hence, the most important factor in the branching process is the average production $E[Y] = \mu$ per generation. The variance terms and $E[X_0]$ only play a role as prefactor. A branching process is called critical if $\mu = 1$, subcritical if $\mu < 1$ and supercritical if $\mu > 1$. In the sequel, we will only consider supercritical ($\mu > 1$) branching processes.
Often, the initial set of items consists of only one item. In that case, $X_0 = 1$ and $\varphi_{X_0}(z) = f(z) = z$ and
\[
E[X_k] = \mu^k
\]
\[
\mathrm{Var}[X_k] = \mathrm{Var}[Y]\,\mu^{k-1}\,\frac{1 - \mu^k}{1 - \mu}
\]
while the explicit nested form of the probability generating function indicates that
\[
\varphi_{X_{k+1}}(z) = \varphi_Y(\varphi_{X_k}(z)) \tag{12.5}
\]
This relation is only valid if $f(z) = z$ or, equivalently, only if $X_0 = 1$. In case $f(z) = z$, $\varphi_{X_k}(z)$ is the $k$-th iterate of $\varphi_Y(z)$.


Example Due to the nested structure (12.2), closed form expressions for the $k$-th generation probability generating function $\varphi_{X_k}(z)$ are rare. Assume that $X_0 = 1$. A simple case that allows explicit computation occurs in a deterministic production of $m$ offspring in each generation for which $\varphi_Y(z) = z^m$. We have from (12.5) that $\varphi_{X_1}(z) = \varphi_Y(z) = z^m$ and
\[
\varphi_{X_k}(z) = z^{m^k}
\]
This branching process evolves as an $m$-ary tree shown in Fig. 17.7. A second example that can be computed exactly is the geometric branching process studied in Section 12.5.
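When $Y$ has finite support, $\varphi_Y(z)$ is a polynomial and the iterates in (12.5) can be computed exactly by polynomial composition, which yields the whole distribution of $X_k$. The following sketch assumes $X_0 = 1$ and represents a probability generating function by its list of coefficients; all names are illustrative.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def compose(outer, inner):
    """Coefficients of outer(inner(z)), evaluated with Horner's rule."""
    result = [0.0]
    for coeff in reversed(outer):
        result = poly_mul(result, inner)
        result[0] += coeff
    return result


def generation_pmf(pmf_y, k):
    """Return a list whose entry j is Pr[X_k = j], given X_0 = 1.

    pmf_y[j] = Pr[Y = j]; the returned list may end with trailing zeros.
    """
    pmf = [0.0, 1.0]  # phi_{X_0}(z) = z
    for _ in range(k):
        pmf = compose(pmf_y, pmf)  # (12.5): phi_{X_{k+1}} = phi_Y(phi_{X_k})
    return pmf
```

For the deterministic production above with $m = 2$, `generation_pmf([0, 0, 1], 3)` places all probability mass at $j = 8 = 2^3$. The list length grows like $m^k$, so this approach is only practical for a few generations.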
12.2 The limit $W$ of the scaled random variables $W_k$
The conditional expectation, defined in Section 2.6,
\begin{align*}
E[X_{k+1} \mid X_k, X_{k-1}, \ldots, X_0] &= E[X_{k+1} \mid X_k] \qquad \text{(Markov property)} \\
&= E\left[\left.\sum_{j=1}^{X_k} Y_{k,j} \,\right|\, X_k\right] = E[Y X_k \mid X_k] = \mu X_k
\end{align*}
is a random variable, which suggests us to consider the scaled random variable $W_k = \frac{X_k}{\mu^k}$ because
\[
E[W_{k+1} \mid W_k, W_{k-1}, \ldots, W_1] = W_k
\]
while (12.3) shows that $E[W_k] = E[X_0]$ for all $k$. The stochastic process $\{W_k\}_{k \ge 1}$ is a martingale process, which is a generalization of a fair game with characteristic property that at each step $k$ in the process $E[W_k]$ is a constant (independent of $k$). From (12.4), the variance of the scaled random variables $W_k = \frac{X_k}{\mu^k}$ is
\[
\mathrm{Var}[W_k] = \mathrm{Var}[X_0] + E[X_0]\,\frac{\mathrm{Var}[Y]}{\mu^2 - \mu}\left(1 - \mu^{-k}\right) \tag{12.6}
\]
which geometrically tends, provided $E[Y] = \mu > 1$, to a constant independent of $k$. The expression for the variance (12.6) indicates that
\[
\mathrm{Var}[W] = \lim_{k \to \infty} \mathrm{Var}[W_k] = \mathrm{Var}[X_0] + E[X_0]\,\frac{\mathrm{Var}[Y]}{\mu^2 - \mu} \tag{12.7}
\]
exists provided $E[Y] = \mu > 1$. We now show that the limit variable $W = \lim_{k \to \infty} W_k$ exists if $E[Y] > 1$.
Theorem 12.2.1 If $E[Y] = \mu > 1$, the scaled random variables $W_n \to W$ a.s.


Proof: Consider
\[
E\left[(W_{k+n} - W_n)^2\right] = E\left[W_{k+n}^2\right] + E\left[W_n^2\right] - 2E[W_{k+n} W_n]
\]
Using (2.72) with $h(x) = x$ and the Markov property,
\[
E[W_{k+n} W_n] = E\left[E[W_{k+n} \mid W_n]\, W_n\right] = E\left[W_n^2\right]
\]
we have, with (2.16), $E[W_{k+n}] = E[W_k] = E[X_0]$ and (12.6),
\[
E\left[(W_{k+n} - W_n)^2\right] = \mathrm{Var}[W_{k+n}] - \mathrm{Var}[W_n] = E[X_0]\,\frac{\mathrm{Var}[Y]}{\mu^2 - \mu}\,\mu^{-n}\left(1 - \mu^{-k}\right)
\]
In the limit $k \to \infty$,
\[
E\left[(W - W_n)^2\right] = O\left(\mu^{-n}\right)
\]
which means that the sequence $\{W_k\}_{k \ge 1}$ converges to $W$ in $L^2$ or in mean square (see Section 6.1.2). Moreover,
\[
\sum_{n=1}^{\infty} E\left[(W - W_n)^2\right] = E\left[\sum_{n=1}^{\infty} (W - W_n)^2\right] = O(1)
\]
which means that the series has finite expectation and that $\sum_{n=1}^{\infty} (W - W_n)^2$ is finite with probability 1. The convergence of this series implies for large $n$ that $(W - W_n)^2 \to 0$ with probability 1 or that $W_n \to W$ a.s. $\square$

Theorem 12.2.1 means that the number of items in generation $k$ is, for large $k$, well approximated by $X_k \approx W \mu^k$. Hence, an asymptotic analysis of a branching process crucially relies on the properties of the limit random variable $W$.

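The generating function $\varphi_W(z) = E[z^W]$ of this limit random variable can be deduced as the limit of the sequence of generating functions
\[
\varphi_{W_k}(z) = E\left[z^{W_k}\right] = E\left[z^{X_k/\mu^k}\right] = \varphi_{X_k}\left(z^{\mu^{-k}}\right) \tag{12.8}
\]
Using (12.5) in case¹ $X_0 = 1$ with $z \to z^{\mu^{-k-1}}$,
\[
E\left[z^{X_{k+1}/\mu^{k+1}}\right] = \varphi_Y\left(E\left[z^{X_k/\mu^{k+1}}\right]\right)
\]
leads with $W_k = \frac{X_k}{\mu^k}$ to the recursion of the pgf of the scaled random variables $W_k$,
\[
\varphi_{W_{k+1}}(z) = \varphi_Y\left(\varphi_{W_k}\left(z^{1/\mu}\right)\right)
\]

¹ The use of the general equation (12.2) is inadequate.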


In the limit $k \to \infty$ where $W_k \to W$ a.s., we can apply the Continuity Theorem 6.1.3 which results in the functional equation for the pgf of the continuous limit random variable $W$,
\[
\varphi_W(z) = \varphi_Y\left(\varphi_W\left(z^{1/\mu}\right)\right) \tag{12.9}
\]
Since $W$ is a continuous random variable except at $W = 0$ as explained below (see (12.19)), it is more convenient to define the moment generating function $\chi_W(t) = E\left[e^{-tW}\right]$. Obviously, the relation between the two generating functions is with $z = e^{-t}$
\[
\chi_W(t) = \varphi_W\left(e^{-t}\right)
\]
With $z = e^{-t}$ in (12.9) the functional equation of $\chi_W(t)$ is for $t \ge 0$ and $E[W] = E[X_0] = 1$,
\[
\chi_W(t) = \varphi_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) \tag{12.10}
\]
The functional equation (12.10) is simpler than (12.9) and $\chi_W(t)$ is convex for all $t$, while $\varphi_W(z)$ is not convex for all $z$. In particular, $\varphi_W(z) = \chi_W(-\log z)$ is not analytic at $z = 0$ and appears² to have a concave regime near $z \downarrow 0$ where $\chi'_W(-\log z) + \chi''_W(-\log z) < 0$.
Lemma 12.2.2 $\chi_W(t)$ is the only probability generating function satisfying the functional equation (12.10).

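Proof: Let $\psi_{W^*}(t) = E\left[e^{-tW^*}\right]$ and $\chi_W(t) = E\left[e^{-tW}\right]$ be two probability generating functions that both satisfy (12.10). Then $\psi_{W^*}(t) - \chi_W(t)$ is continuous for $\mathrm{Re}\, t \ge 0$ and, since $E[W] = E[W^*] = 1$, the Taylor series (2.40) around $t = 0$ is
\begin{align*}
\psi_{W^*}(t) - \chi_W(t) &= (-t)\, E[W^* - W] + E\left[\sum_{k=2}^{\infty} \frac{(-t)^k}{k!}\left((W^*)^k - W^k\right)\right] \\
&= t\, E\left[\sum_{k=1}^{\infty} \frac{(-t)^k}{(k+1)!}\left(W^{k+1} - (W^*)^{k+1}\right)\right]
\end{align*}
from which $\psi_{W^*}(t) - \chi_W(t) = t\, h(t)$ and $h(0) = 0$. Since $|\varphi'_Y(z)| \le \mu$ for $|z| \le 1$, equation (5.6) of the Mean Value Theorem implies $|\varphi_Y(a) - \varphi_Y(b)| \le \mu\, |a - b|$ for any $|a|, |b| \in [0, 1]$. Since $|\chi_W(t)| \le 1$ and $|\psi_{W^*}(t)| \le 1$ for $\mathrm{Re}(t) \ge 0$, we obtain
\[
|t\, h(t)| = \left|\varphi_Y\left(\psi_{W^*}\left(\frac{t}{\mu}\right)\right) - \varphi_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right)\right| \le \mu \left|\psi_{W^*}\left(\frac{t}{\mu}\right) - \chi_W\left(\frac{t}{\mu}\right)\right| = \mu \left|\frac{t}{\mu}\, h\left(\frac{t}{\mu}\right)\right|
\]
or
\[
|h(t)| \le \left|h\left(\frac{t}{\mu}\right)\right|
\]
After $K$ iterations, we have that $|h(t)| \le \left|h\left(\frac{t}{\mu^K}\right)\right|$, which holds for any integer $K$. Hence, for any finite $t$ and since $h(t)$ is continuous, which allows that $\lim_{K \to \infty} h\left(\frac{t}{\mu^K}\right) = h\left(\lim_{K \to \infty} \frac{t}{\mu^K}\right)$,
\[
|h(t)| \le \lim_{K \to \infty} \left|h\left(\frac{t}{\mu^K}\right)\right| = h(0) = 0
\]
which proves the Lemma. $\square$

² This fact is observed for both a geometric and a Poisson production distribution function.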

Lemma 12.2.2 is important because solving the functional equation, for example by Taylor expansion, is one of the primary tools to determine $\chi_W(t)$. If $\varphi_Y(z)$ is analytic inside a circle with radius $R_Y > 0$ centered at $z = 1$, then the Taylor series around $z_0 = 1$,
\[
\varphi_Y(z) = 1 + \sum_{k=1}^{\infty} u_k (z - 1)^k
\]
converges for all $|z - 1| < R_Y$. The definition $\chi_W(t) = E\left[e^{-tW}\right]$ implies that the maximum value of $|\chi_W(t)|$ inside and on a circle with radius $r$ around the origin is attained at $\chi_W(-r)$. The functional equation (12.10) then shows that $\chi_W(t)$ is analytic inside a circle around $t = 0$ with radius $R_W$ for which $\chi_W\left(-\frac{R_W}{\mu}\right) < 1 + R_Y$. Since $\chi_W(0) = 1$, $\chi_W(t)$ is convex and decreasing for real $t$ and $R_Y > 0$, there exists such a non-zero value of $R_W$. This implies that the Taylor series
\[
\chi_W(t) = 1 + \sum_{k=1}^{\infty} \omega_k t^k \tag{12.11}
\]
converges around $t = 0$ for $|t| < R_W$. There exists a recursion to compute $\omega_k$ for any $k \ge 1$ as shown in Van Mieghem (2005). If $\chi_W(t)$ is not known in closed form, the interest of the Taylor series (12.11) lies in the fast convergence for small values of $|t| < 1$. The recursion for the Taylor coefficients $\omega_k$ enables the computation of $\chi_W(t)$ for $|t| < 1$ to any desired degree of accuracy. The functional equation $\chi_W(t) = \varphi_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right)$ extends the $t$-range to the entire complex plane. For large values of $t$ and in particular for negative real $t$, $\chi_W(t)$ is best computed from $\chi_W\left(\frac{t}{\mu^{[\log_\mu |t|]+1}}\right)$ after $[\log_\mu |t|] + 1$ functional iterations of (12.10). Indeed, since $\mu > 1$ such that $\left|\frac{t}{\mu^{[\log_\mu |t|]+1}}\right| < 1$, the Taylor series (12.11) provides an accurate start value for this iterative scheme.
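The iterative scheme can be sketched numerically for real $t \ge 0$ and $X_0 = 1$. The sketch below does not use the recursion for the Taylor coefficients $\omega_k$ of Van Mieghem (2005); as a simplifying assumption it uses only the two-term start value $\chi_W(s) \approx 1 - s + \frac{1}{2} E[W^2] s^2$ at a very small argument $s$, with $E[W^2] = 1 + \frac{\mathrm{Var}[Y]}{\mu^2 - \mu}$ from (12.7). The function name and the threshold `s_max` are illustrative.

```python
def chi_w(t, phi_y, mu, var_y, s_max=1e-4):
    """Approximate chi_W(t) = E[exp(-t W)] for real t >= 0, mu > 1, X_0 = 1."""
    # Find K such that s = t / mu^K does not exceed s_max.
    K = 0
    s = t
    while s > s_max:
        s /= mu
        K += 1
    # Start value from the first two moments: E[W] = 1 and
    # E[W^2] = 1 + Var[Y] / (mu^2 - mu), see (12.7).
    second_moment = 1.0 + var_y / (mu * mu - mu)
    value = 1.0 - s + 0.5 * second_moment * s * s
    # Apply the functional equation (12.10) K times:
    # chi_W(mu * s) = phi_Y(chi_W(s)).
    for _ in range(K):
        value = phi_y(value)
    return value
```

For a geometric production with $\mu = 2$, for which $\varphi_Y(z) = \frac{1}{3 - 2z}$ and $\mathrm{Var}[Y] = 6$, the result can be compared with the closed form $\frac{t + 1}{2t + 1}$ that follows from (12.23). Choosing `s_max` much smaller does not help, because rounding errors in the start value are amplified by the repeated application of $\varphi_Y$.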

12.3 The Probability of Extinction of a Branching Process


In many applications the probability that the process will eventually terminate and which parameters influence this extinction probability are of interest. For instance, a nuclear reaction will only lead to an explosion if critical starting conditions are obeyed. The branching process terminates if, for some generation $n > 0$, $X_n = 0$ and, of course, $X_m = 0$ for all $m > n$. Let us denote
\[
q_k = \Pr[X_k = 0] = \varphi_{X_k}(0)
\]
If we assume that $X_0 = 1$, the analysis simplifies because the more specific version (12.5) holds. Hence, only if the initial set consists of a single item $X_0 = 1$,
\[
q_{k+1} = \varphi_{X_{k+1}}(0) = \varphi_Y(\varphi_{X_k}(0)) = \varphi_Y(q_k) \tag{12.12}
\]
and with $q_0 = \varphi_{X_0}(0) = 0$, $q_1 = \varphi_Y(0) = \Pr[Y = 0] \ge 0$. Obviously, if there is no production, $\Pr[Y = 0] = 1$, extinction is certain after one generation, whereas if there is always production, $\Pr[Y = 0] = 0$, extinction never occurs. By its definition (2.18), a probability generating function of a non-negative discrete random variable is strictly increasing along the positive real $z$-axis. When excluding the extreme cases such that $0 < \Pr[Y = 0] < 1$, by the strict increase of $\varphi_Y(x)$ for $x = \mathrm{Re}\, z \ge 0$, we observe that
\[
0 = q_0 < q_1 = \varphi_Y(0) < q_2 = \varphi_Y(q_1) < q_3 = \varphi_Y(q_2) < \cdots
\]
The sequence $q_0, q_1, q_2, \ldots$ is a monotone increasing sequence bounded by 1 because $\varphi_Y(1) = 1$. Hence, the probability of extinction
\[
\pi_0 = \lim_{k \to \infty} \Pr[X_k = 0] = \Pr[W = 0]
\]
exists and $0 < \pi_0 \le 1$. The existence of a limiting process and the fact that the probability generating function is analytic for $|z| < 1$ and hence, continuous, which allows us to interchange $\lim_{k \to \infty} \varphi_Y(q_k) = \varphi_Y(\lim_{k \to \infty} q_k)$, yields the equation for the extinction probability $\pi_0$,
\[
\pi_0 = \varphi_Y(\pi_0) \tag{12.13}
\]
It demonstrates that the extinction probability $\pi_0$ is a root of $\varphi_Y(x) - x$ in the interval $x \in [0, 1]$.
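The recursion (12.12) is itself a practical way to compute the extinction probability: starting from $q_0 = 0$, the increasing sequence $q_k$ converges to the root of (12.13). A minimal sketch, assuming $X_0 = 1$ and a user-supplied callable `phi_y` for $\varphi_Y$; the tolerance and iteration cap are illustrative choices.

```python
import math


def extinction_probability(phi_y, tol=1e-12, max_iter=100_000):
    """Iterate q_{k+1} = phi_Y(q_k) from q_0 = 0 until successive values agree."""
    q = 0.0  # q_0 = Pr[X_0 = 0] = 0 when X_0 = 1
    for _ in range(max_iter):
        q_next = phi_y(q)  # recursion (12.12)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q  # not converged within max_iter; returned as a best effort


def poisson_pgf(mu):
    """pgf of a Poisson random variable with mean mu."""
    return lambda z: math.exp(mu * (z - 1.0))
```

For instance, `extinction_probability(poisson_pgf(2.0))` returns a value close to 0.2032. In the critical case $\mu = 1$ the convergence is slow, since the sequence then approaches 1 only algebraically in $k$.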
Since $\varphi_W(0) = \Pr[W = 0] = \pi_0$, this equation (12.13) follows more directly from (12.9). Notice, however, that in the functional equation (12.9) the function $z^{1/\mu}$ is not analytic at $z = 0$ (since $1/\mu \notin \mathbb{N}$ for $\mu > 1$), which may cause that $f_W(x)$ is possibly not continuous at $x = 0$, although the limit $\lim_{z \to 0} \varphi_W(z) = \pi_0$ exists. On the other hand, since $\varphi_W(z) = \chi_W(-\log z)$, the extinction probability is found as
\[
\lim_{t \to \infty} \chi_W(t) = \pi_0
\]
and the convexity of $\chi_W(t)$ implies that, for any real value of $t$, $\chi_W(t) \ge \pi_0$.
An alternative, more probabilistic derivation of equation (12.13) is as follows. Applying the law of total probability (2.46) to the definition of the extinction probability
\begin{align*}
\pi_0 &= \Pr[X_n = 0 \text{ for some } n > 0] \\
&= \sum_{j=0}^{\infty} \Pr[X_n = 0 \text{ for some } n > 0 \mid X_1 = j]\, \Pr[X_1 = j]
\end{align*}
Only if $X_0 = 1$, relation (12.5) indicates that $\varphi_{X_1}(z) = \varphi_Y(z)$, which implies that $\Pr[X_1 = j] = \Pr[Y = j]$. In addition, given the first generation consists of $j$ items, the branching process will eventually terminate if and only if each of the $j$ sets of items generated by the first generation eventually dies out. Since each set evolves independently and since the probability that any set generated by a particular ancestor in the first generation becomes extinct is $\pi_0$, we arrive at
\[
\pi_0 = \sum_{j=0}^{\infty} \pi_0^j \Pr[Y = j] = \varphi_Y(\pi_0)
\]

The different viewpoints thus lead to the same result, summarized by:

Theorem 12.3.1 If $X_0 = 1$ and $0 < \Pr[Y = 0] < 1$, the extinction probability $\pi_0$ is (a) the smallest positive real root of $x = \varphi_Y(x)$ and (b) $\pi_0 = 1$ if and only if $E[Y] \le 1$ and $\Pr[Y = 0] + \Pr[Y = 1] < 1$.

Proof: (a) Suppose that $x_o$ is the smallest positive real root obeying $\varphi_Y(x_o) = x_o > 0$. Then, $q_1 = \varphi_Y(0) < \varphi_Y(x_o) = x_o$. Assume (induction hypothesis) that $q_n < x_o$. The recursion (12.12) and the strict increase of $\varphi_Y(x)$ then show that $q_{n+1} = \varphi_Y(q_n) < \varphi_Y(x_o) = x_o$. Hence, the principle of induction demonstrates that $q_n < x_o$ for all (finite) $n$ and, hence, that $\pi_0 \le x_o$. Since $\pi_0$ is itself a positive root of (12.13), equality holds.

(b) First, the condition $\Pr[Y = 0] + \Pr[Y = 1] < 1$ implies that $\Pr[Y > 1] > 0$ and that there exists at least one integer $j > 1$ such that $\Pr[Y = j] > 0$. In that case, for real $x > 0$ but $x$ smaller than the radius $R$ of convergence, which is at least $R = 1$, the second derivative $\varphi''_Y(x) = \sum_{j=2}^{\infty} j(j-1) \Pr[Y = j]\, x^{j-2}$ is positive, which implies that $\varphi_Y(x)$ is strictly convex in $(0, 1)$. Since $x = 1$ obeys $x = \varphi_Y(x)$ and $\varphi_Y(0) = \Pr[Y = 0] \in (0, 1)$, the strictly convex function $y = \varphi_Y(x)$ can only intersect the line $y = x$ in some point $x \in (0, 1)$ if $\varphi_Y(x)$ is below that line near their intersection at $x = 1$ or if $\varphi'_Y(1) = E[Y] > 1$. In the other case, if $E[Y] \le 1$, the only intersection is at $x = 1$. $\square$

The two possibilities are drawn in Fig. 12.2.


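[Figure 12.2: plot of $\varphi_Y(x)$ versus $x$ for two curves a and b, starting at $\Pr[Y_a = 0]$ and $\Pr[Y_b = 0]$ respectively, with the root $\pi_0$ and the iterates $x_3, x_2, x_1, x_0$ marked on the $x$-axis.]

Fig. 12.2. The generating function $\varphi_Y(x)$ along the positive real axis $x$. The two possible cases are shown: curve a corresponds to $E[Y] < 1$ and curve b to $E[Y] > 1$. The fast convergence towards the zero $\pi_0$ is exemplified by the sequence $x_0 > x_1 = \varphi_Y(x_0) > x_2 = \varphi_Y(x_1) > x_3 = \varphi_Y(x_2)$.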

A root equation such as (12.13) also appears in queuing models such as the M/G/1 (Section 14.3) and GI/D/1 (Section 14.4) and reflects the asymptotic behavior as explained in Section 5.7. The extinction probability $\pi_0$ can be expressed explicitly as a Lagrange series as demonstrated in Van Mieghem (1996).
The branching process with infinitely many generations $k \to \infty$ can be viewed as an infinite directed tree where each node has a finite degree a.s. The fact that $\pi_0 < 1$ if $E[Y] > 1$ implies that, in infinite directed trees, there exists an infinitely long path starting from the root with probability $1 - \pi_0$.
Theorem 12.3.2 The limiting branching process with $X_0 = 1$ obeys for $k \to \infty$
\[
\Pr[X_k = 0] \to \pi_0
\]
\[
\Pr[X_k = j] \to 0 \qquad \text{for any } j > 0
\]
Proof: First, if $E[Y] \le 1$, then Theorem 12.3.1 states that $\pi_0 = 1$. For any probability generating function $\varphi(z)$, it holds that $|\varphi(z)| \le 1$ for $|z| \le 1$. Hence, $\varphi_{X_k}(x) \le 1$ for real $x \in [0, 1]$. Moreover, $q_k = \varphi_{X_k}(0) \le \varphi_{X_k}(x)$. In the limit $k \to \infty$, $q_k \to \pi_0 = 1$, which implies that for all $x \in [0, 1]$ it holds that $\varphi_{X_k}(x) \to \pi_0 = 1$. The fact that a probability generating function, a Taylor series around $z = 0$, converges to a constant $\pi_0$ for $0 \le x \le 1$ implies that $\Pr[X_k = j] \to 0$ for any $j > 0$ and $\Pr[X_k = 0] \to \pi_0$.
The second case $E[Y] > 1$ possesses an extinction probability $\pi_0 < 1$. For $x \in (\pi_0, 1)$, Fig. 12.2 shows that $\pi_0 < \varphi_Y(x) < x < 1$. By induction using (12.5), we find that $\pi_0 < \varphi_{X_k}(x) < \varphi_{X_{k-1}}(x) < \cdots < 1$ or $\lim_{k \to \infty} \varphi_{X_k}(x) = \pi_0$ for $x \in (\pi_0, 1)$. For $x \in [0, \pi_0)$, the same argument $q_k = \varphi_{X_k}(0) \le \varphi_{X_k}(x) \le \pi_0$ shows that $\lim_{k \to \infty} \varphi_{X_k}(x) = \pi_0$ for $x \in [0, 1)$. This proves the theorem. $\square$

Theorem 12.3.2 states that, regardless of the value of $E[Y]$, the probability that the $k$-th generation will consist of any finite positive number of items tends to zero if $k \to \infty$. Theorem 12.3.2 is equivalent to the statement that, after an infinite number $k \to \infty$ of evolutions or generations, $X_k \to \infty$ with probability $1 - \pi_0$. Theorem 12.3.2 also illustrates that the Markov chain with an infinite number of states behaves differently than a chain with a finite number of states. In particular, Theorem 12.3.2 shows that the infinite Markov chain $\{X_k\}_{k \ge 0}$ has a single absorbing state $X_k = 0$ while all other states $j$ are transient ($\lim_{n \to \infty} P^n_{ij} = 0$ for $1 \le i, j < \infty$). The existence of the steady-state vector $\pi$ (not all components are zero) does not imply that the branching process with $X_0 = 1$ and $0 < \Pr[Y = 0] < 1$ and infinitely many states is an irreducible Markov chain.
12.4 Asymptotic behavior of $W$
The convexity of $\chi_W(t)$ implies that $\chi'_W(t) \le 0$ for all real $t$ and that $-\chi'_W(t)$ is decreasing in $t$. We know that $\chi'_W(0) = -1$. Since $\lim_{t \to \infty} \chi_W(t) = \pi_0$, it follows that $\lim_{t \to \infty} \chi'_W(t) = 0$. The following Lemma 12.4.1 is a little more precise.
Lemma 12.4.1 $\chi'_W(t) = o\left(t^{-1}\right)$ for $t \to \infty$.
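Proof: The derivative of the functional equation (12.10) is $\mu \chi'_W(\mu t) = \varphi'_Y(\chi_W(t))\, \chi'_W(t)$. By iteration, we have
\[
\mu^K \chi'_W\left(\mu^K t\right) = \chi'_W(t) \prod_{j=0}^{K-1} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right)
\]
Since $\chi_W(t) \in [\pi_0, 1]$ for real $t \ge 0$, then $\varphi'_Y(\pi_0) \le \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \le \mu$ for any $j$. Theorem 12.3.1 states that if $\mu = \varphi'_Y(1) > 1$, then there are two zeros $\pi_0$ and 1 of $f(z) = \varphi_Y(z) - z$ in $z \in [0, 1]$. By Rolle's Theorem applied to the continuous function $f(z) = \varphi_Y(z) - z$, there exists a $\xi \in (\pi_0, 1)$ for which $f'(\xi) = 0$. Equivalently, $\varphi'_Y(\xi) = 1$ and $\xi > \pi_0$. Since $\varphi'_Y(z)$ is monotonously increasing in $z \in [0, 1]$, we have that $\varphi'_Y(0) = \Pr[Y = 1] \le \varphi'_Y(\pi_0) < 1$. Since $\chi_W(t)$ is continuous and monotone decreasing, there exists an integer $K_0$ such that $\varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) < 1$ for $j > K_0$ and any $t > 0$. Hence,
\[
\lim_{K \to \infty} \prod_{j=0}^{K-1} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) = \prod_{j=0}^{K_0 - 1} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \prod_{j=K_0}^{\infty} \varphi'_Y\left(\chi_W\left(\mu^j t\right)\right) \to 0
\]
and, for any finite $t > 0$, $\mu^K \chi'_W\left(\mu^K t\right) \to 0$ for $K \to \infty$, which implies the lemma. $\square$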

Lemma 12.4.1 is, for large $t$, equivalent to $|\chi'_W(t)| \le C t^{-1-\beta}$ for some real $\beta > 0$ and where $C$ is a finite positive real number. Lemma 12.4.1 thus suggests that
\[
\chi'_W(t) = -g_\beta(t)\, t^{-\beta-1} \tag{12.14}
\]
where $0 < g_\beta(t) \le C$ on the real positive $t$-axis.

Lemma 12.4.2 If $\varphi'_Y(\pi_0) > 0$ and $\mu > 1$, then
\[
F = \lim_{t \to \infty} g_\beta(t) \tag{12.15}
\]
exists, is finite and strictly positive.


Proof: We first use (a) the convexity of any pgf $\chi_W(t)$, implying that $\chi''_W(t) \ge 0$ for all $t$, and we then invoke (b) the functional equation (12.10) of $\chi_W(t)$.
(a) The function $g_\beta(t) = -\chi'_W(t)\, t^{\beta+1}$ is differentiable, thus continuous, and has for real $t > 0$ only one extremum at $t = \tau$ obeying
\[
\tau = \frac{-\chi'_W(\tau)}{\chi''_W(\tau)} (\beta + 1) > 0
\]
Since $\chi'_W(0) = -1$, implying that $g_\beta(t) = t^{\beta+1}(1 + o(t))$ as $t \downarrow 0$ or that $g_\beta(t)$ is initially monotone increasing in $t$, the extremum at $t = \tau$ is a maximum. The derivative of $g_\beta(t) = -\chi'_W(t)\, t^{\beta+1}$ is, with (12.14),
\[
g'_\beta(t) = \frac{\beta + 1}{t}\, g_\beta(t) - \chi''_W(t)\, t^{\beta+1}
\]
such that, for $\tau$ finite, $\max g_\beta(t) = \frac{\tau^{\beta+2}}{\beta+1}\, \chi''_W(\tau)$. Since $\chi''_W(t) \ge 0$ for all $t$, we also obtain the inequality for $t \ge 0$
\[
g'_\beta(t) \le \frac{\beta + 1}{t}\, g_\beta(t) \le \frac{\beta + 1}{t}\, C
\]
from which $\lim_{t \to \infty} g'_\beta(t) \le 0$. Hence, $g_\beta(t)$ is not increasing for $t \to \infty$.
(b) Substitution of (12.14) in the derivative of the functional equation (12.10) yields
\[
g_\beta(t) = \varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) \mu^\beta\, g_\beta\left(\frac{t}{\mu}\right) \tag{12.16}
\]
Since $\varphi'_Y\left(\chi_W\left(\frac{t}{\mu}\right)\right) \ge \varphi'_Y(\pi_0) > 0$ (the restriction of this Lemma), there holds with $A = \varphi'_Y(\pi_0)\, \mu^\beta > 0$ for all $t > 0$ that
\[
g_\beta(t) \ge A\, g_\beta\left(\frac{t}{\mu}\right)
\]
For $t < \tau$, $g_\beta(t)$ is shown in (a) to be monotone increasing, which requires that $A \ge 1$ for $\mu > 1$. But, since the inequality with $A \ge 1$ holds for all $t > 0$, we must have that $\tau \to \infty$. Hence, $g_\beta(t)$ is continuous and strictly increasing for all $t \ge 0$ with a maximum at infinity, which proves the existence of a unique limit $F \le C$.
If $F = 0$, the suggestion (12.14) is not correct, implying that $\chi'_W(t)$ decreases faster than any power of $t^{-1}$. The proof of Lemma 12.4.1 indicates that this case can occur if $\varphi'_Y(\pi_0) = 0$. $\square$

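In fact, $A = 1$. For, when passing to the limit $t \to \infty$ in (12.16) using Lemma 12.4.2, we obtain
\[
\mu^{-\beta} = \varphi'_Y(\pi_0)
\]
which determines the exponent $\beta$ as
\[
\beta = -\frac{\log \varphi'_Y(\pi_0)}{\log \mu} \tag{12.17}
\]
After integration of (12.14), we have that
\[
\chi_W(t) = \pi_0 + \int_t^{\infty} g_\beta(u)\, u^{-\beta-1}\, du \tag{12.18}
\]
Approximating $g_\beta(u)$ by its limit $F$ for large $t$, we obtain the asymptotic form
\[
\chi_W(t) \approx \pi_0 + \frac{F}{\beta}\, t^{-\beta}
\]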
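The exponent in (12.17) is easy to evaluate once the extinction probability is known. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, the extinction probability is obtained by simple fixed-point iteration of $\pi_0 = \varphi_Y(\pi_0)$ from 0, and the two production distributions (Poisson, and geometric with $\Pr[Y = k] = q p^k$) are chosen as examples.

```python
import math


def fixed_point(phi_y, iterations=2000):
    """Extinction probability by iterating pi_0 = phi_Y(pi_0) from 0 (X_0 = 1)."""
    q = 0.0
    for _ in range(iterations):
        q = phi_y(q)
    return q


def tail_exponent(dphi_y, pi0, mu):
    """Exponent beta from (12.17): beta = -log(phi_Y'(pi_0)) / log(mu)."""
    return -math.log(dphi_y(pi0)) / math.log(mu)


def beta_poisson(mu):
    phi = lambda z: math.exp(mu * (z - 1.0))
    dphi = lambda z: mu * math.exp(mu * (z - 1.0))
    return tail_exponent(dphi, fixed_point(phi), mu)


def beta_geometric(mu):
    # Pr[Y = k] = q p^k with mean mu = p / q, so phi_Y(z) = 1 / (mu + 1 - mu z).
    phi = lambda z: 1.0 / (mu + 1.0 - mu * z)
    dphi = lambda z: mu / (mu + 1.0 - mu * z) ** 2
    return tail_exponent(dphi, fixed_point(phi), mu)
```

For $\mu = 2$, `beta_geometric(2.0)` evaluates to 1 up to rounding, while `beta_poisson(2.0)` is close to 1.3.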

Beside $\mu$ and the extinction probability $\pi_0$, the parameter $F$ appears as additional characterizing quantity of a branching process. The behavior of the Laplace transform (2.37) for large $t$ reflects the behavior of the probability density function for small $x$. Hence, using $\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{xt}}{t^s}\, dt = \frac{x^{s-1}}{\Gamma(s)}$ for $\mathrm{Re}\, s > 0$, the probability density function is, for small $x$,
\[
f_W(x) \approx \pi_0\, \delta(x) + \frac{F}{\Gamma(\beta + 1)}\, x^{\beta - 1} \tag{12.19}
\]
The probability density function $f_W(x)$ is not continuous at $x = 0$ if $\pi_0 > 0$ and reflects the two different regimes: (a) $W = 0$, implying that the branching process becomes extinct, $X_k = 0$, from some generation $k$ on and (b) $W > 0$, implying $X_k \approx W \mu^k$ for large $k$: the number of items per generation grows exponentially with prefactor $W$. If two sample paths of a same branching process are generated, $X_{k,1} \approx W_1 \mu^k$ and $X_{k,2} \approx W_2 \mu^k$ may be largely different for large $k$, because of the random nature of $W$: although the prefactors $W_1$ and $W_2$ both have the same probability density function $f_W(x)$, $W_1$ can differ substantially from $W_2$ as illustrated by the pdf $f_W(x)$ in Fig. 12.3.

[Figure 12.3: curves of $f_W(x)$ versus $x$ for $\mu = 2, 3, 4, 5$ (arrows indicate increasing $\mu$), for Poisson and geometric production.]

Fig. 12.3. The probability density function of the limit random variable $W$ for both a geometric and a Poisson production process for a same set of values of the average $\mu = E[Y]$.

12.5 A geometric branching process

Consider a production generating function $\varphi_Y(z)$ of the form $f(z) = \frac{az + b}{cz + d}$. Beside straightforward iteration of (12.5), a more elegant approach³ relies on the following property of $f(z)$. For $z \ne x$, the difference is
\[
f(z) - f(x) = \frac{ad - bc}{(d + cz)(d + cx)}\,(z - x)
\]
and, hence, for any two points $x_0$ and $x_1$,
\[
\frac{f(z) - f(x_0)}{f(z) - f(x_1)} = \frac{c x_1 + d}{c x_0 + d}\; \frac{z - x_0}{z - x_1}
\]
Let us now confine to the two fixed points, $x_0$ and $x_1$, of $f(z)$ that are solutions of $f(z) = z$ and let $\lambda = \frac{c x_1 + d}{c x_0 + d}$, then
\[
\frac{f(z) - x_0}{f(z) - x_1} = \lambda\, \frac{z - x_0}{z - x_1}
\]
Now, substitute $z \to f(z)$, then
\[
\frac{f(f(z)) - x_0}{f(f(z)) - x_1} = \lambda\, \frac{f(z) - x_0}{f(z) - x_1} = \lambda^2\, \frac{z - x_0}{z - x_1}
\]
Let us denote the iterates of $f(z)$ by $w_n = f_n(z) = f(f_{n-1}(z))$. By iterating, we find that the iterates obey
\[
\frac{w_n - x_0}{w_n - x_1} = \lambda^n\, \frac{z - x_0}{z - x_1}
\]
or
\[
w_n = \frac{x_0 (z - x_1) - x_1 \lambda^n (z - x_0)}{z - x_1 - \lambda^n (z - x_0)} \tag{12.20}
\]

³ The linear fractional transformation $f(z)$ is an automorphism of the extended complex plane and basic in the geometric theory of a complex function, for which we refer to the book of Sansone and Gerretsen (1960, vol. 2). Fixed points of an automorphism of the extended plane are solutions of $z = f(z)$, which is a quadratic equation $cz^2 + (d - a)z - b = 0$ and which shows that there are at most two different fixed points.

Since the probability generating function (3.6) of a geometric random variable $Y$ is of the form $f(z) = \frac{az + b}{cz + d}$, a geometric branching process is regarded as a basic reference model in the study of branching processes. The production process in each generation obeys $\Pr[Y = k] = q p^k$ for $k \ge 0$, leading to $\varphi_Y(z) = \frac{q}{1 - pz}$, which is slightly different from (3.6). We know that the equation $\varphi_Y(x) = x$ can have two real zeros in $[0, 1]$, one at $x_1 = 1$ since $\varphi_Y(z)$ is a probability generating function and another at $x_0 = \frac{q}{p} = \frac{1}{E[Y]} = \frac{1}{\mu} = \pi_0$ such that
\[
\lambda = \frac{-p x_1 + 1}{-p x_0 + 1} = \frac{q}{p} = \frac{1}{\mu}
\]
The functional equation (12.5) associates $w_n = \varphi_{X_n}(z)$ and after substitution in (12.20) we obtain
\[
\varphi_{X_n}(z) = \frac{\left(\mu^{n-1} - 1\right) z - \mu^{n-1} + \frac{1}{\mu}}{\left(\mu^n - 1\right) z - \mu^n + \frac{1}{\mu}} \tag{12.21}
\]
In the case that $E[Y] = \mu \to 1$ or $p = q$, using the rule of de l'Hôpital gives
\[
\varphi_{X_n}(z) = \frac{n - (n - 1) z}{n + 1 - n z}
\]
From (12.21), the probabilities of extinction at the $k$-th generation are
\[
\Pr[X_k = 0] = \varphi_{X_k}(0) = \frac{1}{\mu}\; \frac{\mu^k - 1}{\mu^k - \frac{1}{\mu}}
\]
If $E[Y] = \mu > 1$, then $\lim_{k \to \infty} \Pr[X_k = 0] = \frac{1}{\mu} = x_0 = \pi_0$ (Theorem 12.3.2), whereas for $E[Y] = \mu \le 1$, we find that $\lim_{k \to \infty} \Pr[X_k = 0] = 1$. If $T_0$ is the hitting time defined in Section 9.2.2 as the smallest discrete-time $k$ such that $X_k = 0$, then $\Pr[T_0 \le k] = \Pr[X_k = 0]$.
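The closed form (12.21) can be cross-checked against direct iteration of (12.5). The sketch below parametrizes the geometric production by its mean $\mu = p/q$, so that $\varphi_Y(z) = \frac{q}{1 - pz} = \frac{1}{\mu + 1 - \mu z}$; the function names are illustrative and $\mu \ne 1$ is assumed.

```python
def phi_geo(z, mu):
    """pgf of Y with Pr[Y = k] = q p^k and mean mu = p / q."""
    return 1.0 / (mu + 1.0 - mu * z)


def phi_xn_closed(z, n, mu):
    """Closed form (12.21) for the pgf of X_n with X_0 = 1 (requires mu != 1)."""
    num = (mu ** (n - 1) - 1.0) * z - mu ** (n - 1) + 1.0 / mu
    den = (mu ** n - 1.0) * z - mu ** n + 1.0 / mu
    return num / den


def phi_xn_iterated(z, n, mu):
    """n-th iterate of phi_Y, as prescribed by (12.5)."""
    value = z
    for _ in range(n):
        value = phi_geo(value, mu)
    return value


def extinction_by_generation(k, mu):
    """Pr[X_k = 0] = (1/mu) (mu^k - 1) / (mu^k - 1/mu)."""
    return (mu ** k - 1.0) / (mu * (mu ** k - 1.0 / mu))
```

Evaluating `phi_xn_closed(0.0, k, mu)` reproduces `extinction_by_generation(k, mu)`, which increases towards $\frac{1}{\mu}$ for $\mu > 1$ and towards 1 for $\mu < 1$.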
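The probability generating function of the scaled random variables $W_k = \frac{X_k}{\mu^k}$ follows from (12.21) and (12.8) as
\[
\varphi_{W_k}(z) = \frac{\left(\mu^{k-1} - 1\right) z^{\mu^{-k}} - \mu^{k-1} + \frac{1}{\mu}}{\left(\mu^k - 1\right) z^{\mu^{-k}} - \mu^k + \frac{1}{\mu}}
\]
from which $\varphi_{W;\mathrm{Geo}}(z) = \lim_{k \to \infty} \varphi_{W_k}(z)$ follows as
\[
\varphi_{W;\mathrm{Geo}}(z) = \frac{-\frac{1}{\mu} \log z + 1 - \frac{1}{\mu}}{-\log z + 1 - \frac{1}{\mu}} \tag{12.22}
\]
and
\[
\chi_{W;\mathrm{Geo}}(t) = \frac{\frac{t}{\mu} + 1 - \frac{1}{\mu}}{t + 1 - \frac{1}{\mu}} = 1 + \sum_{k=1}^{\infty} \frac{(-1)^k \mu^{k-1}}{(\mu - 1)^{k-1}}\, t^k \tag{12.23}
\]
Since $\chi_W(t) = E\left[e^{-Wt}\right]$ and using (2.40), all moments are found as $E\left[W^k\right] = \frac{k!\, \mu^{k-1}}{(\mu - 1)^{k-1}}$.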
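Furthermore, with $\pi_0 = \varphi_W(0) = \frac{1}{\mu}$ and from (2.38), the probability density function follows as
\[
f_{W;\mathrm{Geo}}(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\frac{t}{\mu} + 1 - \frac{1}{\mu}}{t + 1 - \frac{1}{\mu}}\; e^{xt}\, dt \qquad (c > 0)
\]
By closing the contour for $x > 0$ over the negative $\mathrm{Re}(t)$-plane, we encounter a simple pole at $t = -1 + \frac{1}{\mu} = -(1 - \pi_0) < 0$ (since $\mu > 1$) resulting in
\[
f_{W;\mathrm{Geo}}(x) =
\begin{cases}
\left(1 - \frac{1}{\mu}\right)^2 \exp\left(-\left(1 - \frac{1}{\mu}\right) x\right) & x > 0 \\
\frac{1}{\mu}\, \delta(x) & x = 0 \\
0 & x < 0
\end{cases} \tag{12.24}
\]
From (12.7) the variance is $\mathrm{Var}[W_{\mathrm{Geo}}] = \frac{\mu + 1}{\mu - 1}$. The limit random variable $W_{\mathrm{Geo}}$ of a geometric branching process is exponentially distributed with an atom at $x = 0$ equal to the extinction probability $\pi_0 = \frac{1}{\mu}$. From (12.17), the exponent $\beta_{\mathrm{Geo}} = 1$ for any value of $\mu > 1$. Comparing (12.24) and the general relation (12.19) for small $x$ indicates that the parameter $F = \left(1 - \frac{1}{\mu}\right)^2$ for a geometric production process.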

The limit random variable $W$ for production processes $Y$ of which all moments exist can be computed via Taylor series expansions. In Van Mieghem (2005), series for both $\chi_{W;\mathrm{Po}}(t)$ and $f_{W;\mathrm{Po}}(x)$ of a Poisson branching process are presented. Fig. 12.3 illustrates that the probability density function $f_{W;\mathrm{Po}}(x)$ of a Poisson branching process is definitely distinct from that of a geometric branching process. Since $E[W] = 1$, the variance $\mathrm{Var}[W_{\mathrm{Po}}] = \frac{1}{\mu - 1}$ of a Poisson limit random variable $W_{\mathrm{Po}}$ implies that $f_{W;\mathrm{Po}}(x)$ is centered around $x = 1$ more tightly as $\mu$ increases.

13
General queueing theory

Queueing theory describes basic phenomena such as the waiting time, the
throughput, the losses, the number of queueing items, etc. in queueing
systems. Following Kleinrock (1975), any system in which arrivals place
demands upon a finite-capacity resource can be broadly termed a queueing
system.
Queuing theory is a relatively new branch of applied mathematics that
is generally considered to have been initiated by A. K. Erlang in 1918 with
his paper on the design of automatic telephone exchanges, in which the famous Erlang blocking probability, the Erlang B-formula (14.17), was derived
(Brockmeyer et al., 1948, p. 139). It was only after the Second World War,
however, that queueing theory was boosted mainly by the introduction of
computers and the digitalization of the telecommunications infrastructure.
For engineers, the two volumes by Kleinrock (1975, 1976) are perhaps the
most well-known, while in applied mathematics, apart from the penetrating
influence of Feller (1970, 1971), the Single Server Queue of Cohen (1969)
is regarded as a landmark. Since Cohen's book, which incorporates most
of the important work before 1969, a wealth of books and excellent papers
have appeared, an evolution that is still continuing today.

13.1 A queueing system


Examples of queueing abound in daily life: queueing situations at a ticket window in the railway station or post office, at the cash points in the supermarket, the waiting room at the airport, train or hospital, etc. In telecommunications, the packets arriving at the input port of a router or switch are buffered in the output queue before transmission to the next hop towards the destination. In general, a queueing system consists of (a) arriving items (packets or customers), (b) a buffer or waiting room, (c) a service center and (d) departures from the system.

[Figure 13.1: block diagram showing the arrival process, the queueing process, the service process and the departure process.]

Fig. 13.1. The main processes in a general queueing system.

The main processes as illustrated in Fig. 13.1 are stochastic in nature. Initially in queueing theory, the main stochastic processes were described in continuous time, while with the introduction of the Asynchronous Transfer Mode (ATM) in the late eighties, many queueing problems were more effectively treated in discrete time, where the basic time unit or time slot was the minimum service time of one ATM cell. In the literature, there is unfortunately no widely adopted standard notation for the main random variables, which often hampers transparency. Let us start by defining the main random variables in continuous time.

13.1.1 The arrival process


The arrival process is characterized by the arrival time $t_n$ of the $n$-th packet (customer) and the interarrival time $\tau_n = t_n - t_{n-1}$ between the $n$-th and $(n-1)$-th packet. If all interarrival times are i.i.d. random variables with distribution $F_A(t)$, then
$$\Pr[\tau_n \le t] = F_A(t)$$
As illustrated in Fig. 8.1, we can associate a counting process $\{N(t), t \ge 0\}$ to the arrival process $\{t_n, t \ge 0\}$ by the equivalence $\{N(t) \ge n\} \Longleftrightarrow \{t_n \le t\}$. In other words, if all interarrival times are i.i.d., the number of arriving packets (customers) is a general renewal process with interarrival time distribution specified by $F_A(t)$. We mention explicitly the condition of independence, which was initially considered as a natural assumption. In recent measurements, however, arrivals of IP packets are shown not to obey this simple condition of independence, which has led to the use of complicated self-similar and long-range dependent arrival processes.
In the sequel, we will use the following notation: $N_A(t)$ is the number of arrivals at time $t$, while $A(t) = \int_0^t N_A(u)\,du$ is the total number of arrivals in the interval $[0, t]$.
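The equivalence between the counting process and the arrival times can be checked on a small example. The following is a minimal sketch, assuming a time origin at 0 and illustrative interarrival times; the helper names `arrival_times` and `count_arrivals` are hypothetical and not part of the text.

```python
from bisect import bisect_right
from itertools import accumulate


def arrival_times(interarrival_times):
    """Arrival times t_n as cumulative sums of the interarrival times (origin at 0)."""
    return list(accumulate(interarrival_times))


def count_arrivals(times, t):
    """N(t): number of arrivals with t_n <= t; `times` must be sorted."""
    return bisect_right(times, t)


# Illustrative data (assumed): four interarrival times.
taus = [0.5, 1.0, 0.25, 2.0]
times = arrival_times(taus)

# Check {N(t) >= n} <=> {t_n <= t} for every packet index n and a few times t.
for n, t_n in enumerate(times, start=1):
    for t in (0.0, 0.5, 1.6, 3.75, 5.0):
        assert (count_arrivals(times, t) >= n) == (t_n <= t)
```

The binary search only relies on the arrival times being non-decreasing, which holds whenever the interarrival times are non-negative.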


13.1.2 The service process


The service process is specified in a similar way by the service time $x_n$ of the $n$-th packet (customer). If the random variables $x_n$ are i.i.d. with distribution $F_x(t)$, then
$$\Pr[x_n \le t] = F_x(t) \tag{13.1}$$
The service process needs additional specifications. First of all, in a single-server queueing system, only one packet (customer) is served at a time. If there is more than one server, more packets can evidently be served simultaneously. Next, we must detail the service discipline or scheduling rule, which describes the way a packet is treated. There is a large variety of service disciplines. If all packets are of equal priority, the simplest rule is first-in-first-out (FIFO), which serves the packets in the same order in which they arrive. Other types such as last-in-first-out or a random order are possible, though in telecommunication, FIFO occurs most often. If we have packets of different multimedia flows, all with different quality of service requirements, not all packets have equal priority. For instance, a delay-sensitive packet (of e.g. a voice call) must be served as soon as possible, preferably before non-delay-sensitive packets (of e.g. a file transfer). In these cases, packets are extracted from the queue by a certain scheduling rule. The simplest case is a two-priority system with a head-of-the-line scheduling rule: high-priority packets are always served before low-priority packets. In the sequel, we confine the presentation to a single-server system with one type of packet and a FIFO discipline. Hence, we omit a discussion of scheduling rules. A next assumption is that of work conservation: if there is a packet waiting for service, the server will always serve the packet. Thus, the server is only idle if there are no packets waiting in the buffer and immediately starts service when the first packet is placed in the queue or arrives. In a non-work-conservative system, the server may stay idle, even if there are customers waiting (e.g. a situation where patients have to wait during a coffee break in a hospital). Finally, we assume that the arrival process is independent of the service process. Situations where arriving packets of some type (e.g. control) change the way the remaining packets in the buffer are served, or a service discipline that serves at a rate proportional to the number of waiting packets, are not treated.
The service in a router consists of fetching the packet from the buffer, inspecting the header to determine the correct output port and placing the packet on the output link for transmission.
In this chapter, unless the contrary is explicitly mentioned, we consider a single-server queueing system under a work-conservative, FIFO service discipline in which the arrival and service process are independent.

13.1.3 The queueing process


From Fig. 13.1, we observe at least two aspects regarding the queue or buffer: (a) the number of different queues and (b) the number of positions in the queue. In general, a queueing system may have several queues or even a shared queue for different servers. For example, in a router, there is one physical fast memory or buffer in which arriving packets are placed. Depending on the output interfaces, each link driver per output port is a server that extracts the packet destined for its link from the common buffer and transmits the packet on this link. For simplicity, we consider here only one queue with $K$ positions. Often queueing analyses are greatly simplified in the infinite buffer case $K \to \infty$. If the buffer is infinitely long, there is zero loss, as opposed to the finite buffer case in which losses can occur if the queue is full and packets arrive.
So far, the description of the queueing system is complete: we have specified the arrival process, the service process and the physical size of the waiting room or queue. We now turn our attention to desirable quantities that can be deduced from the model specification of the queueing system such as (a) the waiting or queueing time $w_n$ of the $n$-th packet, (b) the system time $T_n = w_n + x_n$ of the $n$-th packet, (c) the unfinished work (also called the virtual waiting time or workload) $v(t)$ at time $t$, (d) the number of packets in the queue $N_Q(t)$ or in the system $N_S(t)$ at time $t$ and (e) the departure time $r_n$ of the $n$-th packet.
The waiting or queueing time $w_n$ of the $n$-th packet is only zero if the queueing system is empty at arrival time $t_n$. The unfinished work $v(t)$ at time $t$ is the time needed to empty the queueing system or to serve all remaining packets in the system (queue plus server) at time $t$. Hence, the unfinished work at time $t$ is equal to the sum of the service times of the $N_Q(t)$ buffered packets at time $t$ plus the remaining service time of the packet under service at time $t$. Precisely at an arrival epoch $t = t_n$ as illustrated in Fig. 13.2, we observe that $v(t_n) = T_n = w_n + x_n$. In addition, while the system is not empty, $v'(t) = -1$ for all $t \neq t_n$, or $v(t) = \max[T_n - t + t_n, 0]$ for $t_n \le t < t_{n+1}$.
The departure times $r_n$ satisfy $r_n = t_n + T_n$. The time during which the server is busy is called a busy period, and likewise, the interval of non-activity is called an idle period.
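The quantities defined above can be computed directly for a small FIFO single-server example. The sketch below assumes a work-conserving server that starts empty; the function names and the sample data are illustrative assumptions, not taken from the text.

```python
def fifo_queue(arrivals, services):
    """Waiting times w_n and departure times r_n for a FIFO single-server queue."""
    waits, departures = [], []
    previous_departure = 0.0
    for t_n, x_n in zip(arrivals, services):
        start = max(t_n, previous_departure)   # service starts when the server is free
        waits.append(start - t_n)              # w_n
        previous_departure = start + x_n       # r_n = t_n + w_n + x_n
        departures.append(previous_departure)
    return waits, departures


def unfinished_work(arrivals, services, t):
    """v(t): system time of the last packet arrived at or before t, reduced at slope -1."""
    waits, _ = fifo_queue(arrivals, services)
    v = 0.0
    for t_n, w_n, x_n in zip(arrivals, waits, services):
        if t_n <= t:
            v = max(w_n + x_n - (t - t_n), 0.0)
    return v


# Illustrative data (assumed): four packets.
arrivals = [0.0, 1.0, 2.0, 6.0]
services = [2.0, 2.0, 1.0, 1.0]
waits, departures = fifo_queue(arrivals, services)
```

In this example the fourth packet arrives after the third has left, so it finds the system empty and starts a new busy period.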

Fig. 13.2. The unfinished work $v(t)$ and the number of packets in the system $N_S(t)$ as function of time. At any new arrival at $t_n$ holds $v(t_n) = w_n + x_n$. The unfinished work $v(t)$ decreases with slope $-1$ between two arrivals. The waiting times $w_n$ and departure times $r_n$ are also shown. Notice that $w_1 = w_5 = 0$.

13.1.4 The Kendall notation for queueing systems


Kendall introduced a notation that is commonly used to describe or classify the type of a queueing system. The general syntax is $A/B/n/K/m$, where $A$ specifies the interarrival process, $B$ the service process, $n$ the number of servers, $K$ the number of positions in the queue and $m$ restricts the number of allowed arrivals in the queueing system. Examples for both the interarrival distribution $A$ and the service distribution $B$ are $M$ (memoryless or Markovian) for the exponential distribution, $G$ for a general distribution¹ and $D$ for a deterministic distribution. When other letters are used besides these three common assignments, the meaning will be defined. For example, $M/G/1$ stands for a queueing system with exponentially distributed interarrival times, a general service distribution and 1 server. If one of the two last identifiers $K$ and $m$ is not written, it should be interpreted as infinity. Hence, $M/G/1$ has an infinitely long queue and no restriction on the number of allowed arrivals.

¹ Often $G$ is written where $GI$, general independent process, is meant. We interpret $G$ as a general interarrival process, which can be correlated over time.
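The convention that omitted trailing identifiers mean infinity can be made concrete with a small parser. This is a minimal sketch; the `Kendall` class and `parse_kendall` function are hypothetical helpers introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Kendall:
    arrival: str            # A: interarrival process, e.g. "M", "G", "D"
    service: str            # B: service process
    servers: int            # n: number of servers
    queue_positions: float  # K: infinity when omitted
    max_arrivals: float     # m: infinity when omitted


def parse_kendall(text):
    """Parse 'A/B/n', 'A/B/n/K' or 'A/B/n/K/m' into a Kendall record."""
    parts = text.strip().split("/")
    if not 3 <= len(parts) <= 5:
        raise ValueError("expected between 3 and 5 fields separated by '/'")
    extras = [float(p) for p in parts[3:]]
    extras += [math.inf] * (2 - len(extras))   # omitted K and m default to infinity
    return Kendall(parts[0], parts[1], int(parts[2]), extras[0], extras[1])


mg1 = parse_kendall("M/G/1")
mm2_10 = parse_kendall("M/M/2/10")
```

Here `mg1` carries an infinite queue and no restriction on arrivals, whereas `mm2_10` has two servers and ten queue positions.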

13.1.5 The traffic intensity $\rho$

An important parameter in any queueing system is the traffic intensity $\rho$, also called the load or the utilization, defined as the ratio of the mean service time $E[x] = \frac{1}{\mu}$ over the mean interarrival time $E[\tau] = \frac{1}{\lambda}$,
$$\rho = \frac{E[x]}{E[\tau]} = \frac{\lambda}{\mu} \tag{13.2}$$
where $\lambda$ and $\mu$ are the mean interarrival and service rate, respectively. Clearly, if $\rho > 1$ or $E[x] > E[\tau]$, which means that the mean service time is longer than the mean interarrival time, then the queue will grow indefinitely long for large $t$, because packets are arriving faster on average than they can be served. In this case ($\rho > 1$), the queueing system is unstable or will never reach a steady-state. The case where $\rho = 1$ is critical. In practice, therefore, mostly situations where $\rho < 1$ are of interest. If $\rho < 1$, a steady-state can be reached. These considerations are a direct consequence of the law of conservation of packets in the system, but can be proved rigorously by ergodic theory or Markov steady-state theory, which determine when the process is positive recurrent.
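As a numerical illustration of the load, consider a hypothetical output link. The figures below (1500-byte packets, a 10 Mb/s link, 500 packets per second) are assumptions chosen for the example and do not come from the text.

```python
def traffic_intensity(mean_service_time, mean_interarrival_time):
    """Load: ratio of the mean service time to the mean interarrival time."""
    return mean_service_time / mean_interarrival_time


# Assumed example: 1500-byte packets on a 10 Mb/s link, 500 packets/s on average.
packet_bits = 1500 * 8
link_rate = 10e6                              # bits per second
mean_service_time = packet_bits / link_rate   # 1.2 ms per packet
arrival_rate = 500.0                          # packets per second
mean_interarrival_time = 1.0 / arrival_rate   # 2 ms between packets

rho = traffic_intensity(mean_service_time, mean_interarrival_time)
stable = rho < 1   # a steady-state can only be reached below load 1
```

With these numbers the load is about 0.6, so the link is busy roughly 60% of the time and the queue can reach a steady-state.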
13.2 The waiting process: Lindley's approach

From the definition of the waiting time and from Fig. 13.2, a relation between $w_{n+1}$ and $w_n$ is found. Suppose the waiting time for the first packet is $w_1 = w$, which is the initialization. If $r_n \le t_{n+1}$, which means that the $n$-th packet leaves the queueing system before the $(n+1)$-th packet arrives, the system is idle and $w_{n+1} = 0$. In all other situations, $r_n > t_{n+1}$, the $n$-th packet is still in the queueing system while the next $(n+1)$-th packet arrives and $w_{n+1} = t_n + w_n + x_n - t_{n+1}$. Indeed, the waiting time of the $(n+1)$-th packet equals the system time $T_n = w_n + x_n$ of the $n$-th packet, which started at $t_n$, minus the time $t_{n+1} - t_n$ that elapsed before its own arrival. During the interval $[t_n, t_{n+1}]$, the queueing system has processed an amount of the unfinished work equal to $t_{n+1} - t_n = \tau_{n+1}$ time units. Hence, we arrive at the general recursion for the waiting time,
$$w_{n+1} = \max(w_n + x_n - \tau_{n+1}, 0)$$
Let $\zeta_n = x_n - \tau_{n+1}$, then
$$\begin{aligned} w_{n+1} &= \max[0, w_n + \zeta_n] \\ &= \max[0, \max[w_{n-1} + \zeta_{n-1}, 0] + \zeta_n] \\ &= \max[0, \zeta_n, w_{n-1} + \zeta_{n-1} + \zeta_n] \end{aligned} \tag{13.3}$$
and, by iteration,
$$w_{n+1} = \max\left[0, \zeta_n, \zeta_{n-1} + \zeta_n, \zeta_{n-2} + \zeta_{n-1} + \zeta_n, \ldots, \sum_{k=1}^{n} \zeta_k + w_1\right] \tag{13.4}$$
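The recursion for the waiting time and its unrolled form (13.4) can be compared numerically. The sketch below takes the differences between service times and the next interarrival times as given input; the function names and the sample values are illustrative assumptions.

```python
import random


def lindley_waits(w1, diffs):
    """Waiting times w_1, ..., w_{n+1} from w_{k+1} = max(0, w_k + d_k),
    where d_k is the service time of packet k minus the next interarrival time."""
    waits = [w1]
    for d_k in diffs:
        waits.append(max(0.0, waits[-1] + d_k))
    return waits


def explicit_wait(w1, diffs):
    """w_{n+1} from the unrolled form: the maximum of 0 and of all sums of the
    most recent differences, with w_1 added only to the sum of all n differences."""
    candidates = [0.0]
    tail = 0.0
    for k in range(len(diffs) - 1, -1, -1):   # walk back from the latest difference
        tail += diffs[k]
        candidates.append(tail + (w1 if k == 0 else 0.0))
    return max(candidates)


# Small hand-checkable case: the waits are 0.5, 0, 0.3 and about 0.1.
example = lindley_waits(0.5, [-1.0, 0.3, -0.2])

# Randomized comparison of both forms (illustrative data with a fixed seed).
rng = random.Random(7)
for _ in range(200):
    d = [rng.uniform(-2.0, 1.5) for _ in range(rng.randint(1, 30))]
    w1 = rng.uniform(0.0, 3.0)
    assert abs(lindley_waits(w1, d)[-1] - explicit_wait(w1, d)) < 1e-9
```

The recursion costs one comparison per packet, while the unrolled form is mainly useful for the distributional arguments that follow.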

A number of observations are in order:

First, if both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables and mutually independent, then the differences $\zeta_n$ are i.i.d. random variables. In addition, $w_n$ and $\zeta_n$ are also independent because (13.4) shows that $w_n$ only depends on $\zeta_k$ with indices $k < n$. Then, the waiting time process $\{w_n\}_{n \ge 1}$ is a discrete time Markov process with a continuous state space (the waiting times $w_n$ are non-negative real numbers) because the general relation (13.3) reveals that, since the random variable $\zeta_n$ is independent of $w_n$, the waiting time for the $(n+1)$-th packet is only dependent on the waiting time of the previous $n$-th packet. This is the Markov property. Since the state space is a continuum, it is not a Markov chain, merely a Markov process.
Second, if there exists a packet $m$ for which $w_m = 0$ (e.g. packet $m = 5$ in Fig. 13.2), which means that the $m$-th packet finds the system empty, then all packets after the $m$-th packet are isolated from the effects of those before the $m$-th. Mathematically, this separation between two busy periods directly follows from (13.3) because $w_{m+1} = \max[0, \zeta_m]$ leading via iteration for $n \ge m$ to
$$w_{n+1} = \max\left[0, \zeta_n, \zeta_{n-1} + \zeta_n, \ldots, \sum_{k=m+1}^{n} \zeta_k, \sum_{k=m}^{n} \zeta_k\right]$$
In other words, this relation is similar to (13.4) as if the system were started from $k = m$ and $w_m = 0$ instead of $k = 1$ with $w_1 = w$. Any busy period can be regarded as a renewal of the waiting process, independent of the previous busy periods.
Third, again invoking the assumption that $\zeta_n$ are i.i.d. random variables, then the order in the sequence $\{\zeta_n\}_{n \ge 1}$ is of no importance in (13.4) and we may relabel the random variables in (13.4) as $\zeta_k \to \zeta_{n-k+1}$ to obtain a new random variable
$$\omega_{n+1} = \max\left[0, \zeta_1, \zeta_1 + \zeta_2, \ldots, \sum_{k=1}^{n-1} \zeta_k, \sum_{k=1}^{n} \zeta_k + w_1\right]$$
which is identically distributed as $w_{n+1}$. The interest of this observation is that, provided $w_1 = 0$ and, hence, $\omega_n = \max_{1 \le j \le n} \sum_{k=0}^{n-j} \zeta_k$ where $\zeta_0 = 0$, the sequence $\{\omega_n\}_{n \ge 1}$ can only increase with $n$, because the maximum cannot decrease² if an additional term $\sum_{k=0}^{n+1} \zeta_k$ is added. Thus, if $w_1 = 0$, the event $\{\omega_{n+1} < x\}$ is always contained in $\{\omega_n < x\}$. In steady-state, which is reached if $n \to \infty$,
$$\lim_{n \to \infty} \{\omega_n < x\} = \bigcap_{n=1}^{\infty} \{\omega_n < x\} = \left\{\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right\}$$
which means that the random variable $\omega_n$ with the same distribution as the waiting time $w_n$ converges to a limit random variable that is the supremum of the terms $\sum_{k=0}^{j} \zeta_k$ in the series. From this relation, it follows that the steady-state distribution $W(x)$ of the waiting time is
$$W(x) = \lim_{n \to \infty} \Pr[w_n < x] = \lim_{n \to \infty} \Pr[\omega_n < x] = \Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right]$$
if the latter probability exists, i.e. is not zero for all $x$. Lindley has proved that, if $\rho < 1$, the latter corresponds to a proper probability distribution. In other words, the steady-state distribution of the waiting time in a GI/G/1 system³ exists. Alternatively, the Markov process $\{w_n\}_{n \ge 1}$ is ergodic if $\rho < 1$.
Lindley's proof is as follows. Due to the assumption that $\zeta_n$ are i.i.d. random variables, the Strong Law of Large Numbers (6.3) is applicable: $\Pr\left[\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n} \zeta_k = E[\zeta]\right] = 1$ where $E[\zeta] = E[x] - E[\tau] < 0$ (the mean service time is smaller than the mean interarrival time) if $\rho < 1$ while $E[\zeta] > 0$ if $\rho > 1$. In case $\rho > 1$, there exists a number $\nu > 0$ and $\alpha > 1$ such that, for all $n > \nu$, holds $\sum_{k=0}^{n} \zeta_k \ge \frac{1}{\alpha} E[\zeta]\, n$ with probability 1. For large $\nu$, $\sum_{k=0}^{n} \zeta_k$ can be made larger than any fixed $x$ such that $\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right] = 0$. In case $\rho < 1$, we have for sufficiently large $n$ that $\sum_{k=0}^{n} \zeta_k < 0$. Thus, for any $x > 0$ and $\epsilon > 0$, there exists a number $\nu$ (independent of $x$) such that, for all $n > \nu$,
$$\Pr\left[\sum_{k=0}^{n} \zeta_k < x\right] \ge \Pr\left[\sum_{k=0}^{n} \zeta_k < 0\right] > 1 - \epsilon$$
while, for $n < \nu$, we can always find a number $x_0 > 0$ such that for all $x > x_0$,
$$\Pr\left[\sum_{k=0}^{n} \zeta_k < x\right] > 1 - \epsilon$$
Since $\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k$ is attained for $j < \nu$ or $j > \nu$ and because both regimes can be bounded by the same lower bound,
$$\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right] > \Pr\left[\sum_{k=0}^{n < \nu} \zeta_k < x\right] 1_{j < \nu} + \Pr\left[\sum_{k=0}^{n > \nu} \zeta_k < x\right] 1_{j > \nu} > 1 - \epsilon$$
Clearly, $\lim_{x \to \infty} \Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right] = 1$ and $\Pr[w_n < 0] = 0$, thus $\Pr\left[\sup_{j \ge 0} \sum_{k=0}^{j} \zeta_k < x\right]$ is non-decreasing and a proper probability distribution. We omit the considerations for the case $\rho = 1$.

² This observation cannot be made from (13.4) because $\zeta_n$, which affects all but the first term in the maximum, can be negative.
³ Notice that the analysis crucially relies on the independence of the interarrival and service process.

We now concentrate on the computation of the steady-state distribution for the waiting time (in the queue) in the case that the load $\rho < 1$ and under the confining assumption that both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables. We find from (13.3) that
$$\Pr[w_{n+1} < x] = \Pr[w_n + \zeta_n < x] \quad \text{if } x \ge 0$$
$$\Pr[w_{n+1} < x] = 0 \quad \text{if } x < 0$$
With the law of total probability (2.46) and since $\zeta_n$ can be negative, the right-hand side is
$$\Pr[w_n + \zeta_n < x] = \int_{-\infty}^{\infty} \Pr[w_n < x - s \,|\, \zeta_n = s] \frac{d}{ds} \Pr[\zeta_n < s]\, ds$$
Using the independence of $w_n$ and $\zeta_n$, and that $w_n \ge 0$, we obtain for $x \ge 0$,
$$\Pr[w_n + \zeta_n < x] = \int_{-\infty}^{x} \Pr[w_n < x - s]\, d\Pr[\zeta_n < s]$$
The distribution $\Pr[\zeta_n < s] = \Pr[x_n - \tau_{n+1} < s] = C_n(s)$ can be computed (see Problem (v) in Chapter 3) provided the interarrival and service process are known. Proceeding to the steady-state by letting $n \to \infty$ amounts to Lindley's integral equation in $W(x) = \lim_{n \to \infty} \Pr[w_n < x]$ with $C(x) = \lim_{n \to \infty} C_n(x)$,
$$W(x) = \begin{cases} \int_{-\infty}^{x} W(x - s)\, dC(s) & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases} \tag{13.5}$$
The integral equation (13.5) is of the Wiener-Hopf type and treated in general by Titchmarsh (1948, Section 11.17) and specifically by Kleinrock (1975, Section 8.2) and Cohen (1969, p. 337). Apart from Lindley's approach, Pollaczek has used variants of the complex integral expression for
$$\max(x, 0) = \frac{x e^{-ax}}{2 \pi i} \int_{c - i\infty}^{c + i\infty} \frac{e^{xz}}{z - a}\, dz \qquad (c > \mathrm{Re}(a) > 0)$$
to treat the complicating non-linear function $\max(x, 0)$ in (13.3). Several other approaches (Kleinrock, 1975, Chapter 8) have been proposed to solve (13.3). We will only discuss the approach due to Beneš, because his approach does not make the confining assumption that both the interarrival times $\tau_{n+1}$ and the service times $x_n$ are i.i.d. random variables. As mentioned before, in Internet traffic, which has been shown to be long-range dependent (i.e. correlated over many time units mainly due to TCP's control loop), the interarrival times can be far from independent.
Fig. 13.3. The amount of work arriving to the queueing system $\alpha(t)$ versus time $t$. At $t = u$, we observe that $\xi(t) = \alpha(t) - t + v(0^-) > 0$. The largest value of $\xi(t) - \xi(u)$ is found for $u = t_1$ because $\xi(t_1) = -t_1$, the only negative value of $\xi(t)$ in $[0, u)$. Graphically, we shift the line at $45^\circ$ so as to intersect the point $(t_1, \alpha(t_1))$ to determine $v(u)$. At $t = \tau$, $\xi(\tau) < 0$ and the largest negative value of $\xi(t)$ in $[0, \tau)$ is attained at $t = t_8$. Three of the five idle periods have also been shown.

13.3 The Beneš approach to the unfinished work

Instead of observing the queueing system at a time $t$, the Beneš approach considers the behavior over a time interval $[0, t)$. Let $\alpha(t)$, $\eta(t)$ and $b(t)$ denote the amount of work arriving to the queueing system in the interval $[0, t)$, the total idle time of the server and the total busy time of the server in the interval $[0, t)$, respectively. The amount of work arriving to the system is expressed in units of time and must be regarded as the time needed to process this work, similarly to the definition of the unfinished work. If the work to process arrives at discrete times, then $\alpha(t)$ increases in jumps, as illustrated in Fig. 13.3,
$$\alpha(t) = \sum_{j=0}^{A(t)} x_j$$
In general, however, the work may arrive continuously over time with possibly jumps at certain times. The purpose is to determine the unfinished work or virtual waiting time $v(t)$ at time instant $t$, and not over a time interval $[0, t)$ as the previously defined quantities and $A(t) = \int_0^t N_A(u)\, du$. Clearly, for $t > 0$, the unfinished work at time $t$ consists of the total amount of work brought in by arrivals during $[0, t)$ plus the amount of work present just before $t = 0$ minus the total time the server has been active,
$$v(t) = v(0^-) + \alpha(t) - b(t) \tag{13.6}$$
From the definitions above,
$$\eta(t) + b(t) = t \tag{13.7}$$
Moreover, $\alpha(t)$, $\eta(t)$ and $b(t)$ are non-decreasing and right continuous (jumps may occur) functions of time $t$. Since $\eta(t)$ and $b(t)$ are complementary, it is convenient to eliminate $b(t)$ from (13.6) and (13.7) and further concentrate on the total idle time $\eta(t)$ given as
$$\eta(t) = v(t) + t - v(0^-) - \alpha(t) \tag{13.8}$$
If $v(u) > 0$ at every time $u \in [0, t)$, then $\eta(t) = 0$. On the other hand, if $v(u) = 0$ at some time $u \in [0, t)$, then it follows from (13.8) that
$$\eta(u) = u - v(0^-) - \alpha(u) \tag{13.9}$$
Since $\eta(t)$ is non-decreasing in $t$, the total idle time in the interval $[0, t)$ at moments $u$ when the buffer is empty ($v(u) = 0$) is the largest value for $\eta$ that can be reached in $[0, t)$,
$$\eta(t) = \sup_{0 < u < t} \left[u - v(0^-) - \alpha(u)\right]$$
and the supremum is needed because $\alpha(t)$ can increase discontinuously (in jumps). Combining the two regimes, we obtain in general that
$$\eta(t) = \max\left[0, \sup_{0 < u < t} \left(u - v(0^-) - \alpha(u)\right)\right] \tag{13.10}$$

Equating the two general expressions (13.8) and (13.10) for the total idle time of the server leads to a new relation for the unfinished work,
$$\begin{aligned} v(t) &= v(0^-) + \alpha(t) - t + \max\left[0, \sup_{0 < u < t} \left(u - v(0^-) - \alpha(u)\right)\right] \\ &= \max\left[v(0^-) + \alpha(t) - t, \sup_{0 < u < t} \left\{\alpha(t) - t - (\alpha(u) - u)\right\}\right] \end{aligned}$$
The quantity $\xi(t) = \alpha(t) - t + v(0^-)$ is recognized as the server overload during $[0, t)$, while $\alpha(t) - t$ is the amount of excess work arriving during the interval $[0, t)$. Thus, $\xi(t) - \xi(u)$ is the amount of excess work during $[u, t)$ or the overload of the server during $[u, t)$ provided $u > 0$ and $\xi(0) = v(0^-)$. Then,
$$v(t) = \max\left[\xi(t), \sup_{0 < u < t} \{\xi(t) - \xi(u)\}\right]$$
and, with the convention that $v(0^-) = \sup_{0 < u < 0^-} \{\xi(0) - \xi(u)\} = \xi(0)$,
$$v(t) = \sup_{0 < u < t} \{\xi(t) - \xi(u)\} \tag{13.11}$$
The unfinished work $v(t)$ at time $t$ is equal to the largest value of the overload or excess work during any interval $[u, t) \subset [0, t)$. The relation (13.11) is illustrated and further explained in Fig. 13.3. This general relation (13.11) shows that the unfinished work is the maximum of a stochastic process. Furthermore, if $v(t) = 0$, (13.11) indicates that $\sup_{0 < u < t} \{\xi(t) - \xi(u)\} = 0$. Let $u^*$ denote the value at which $\sup_{0 < u < t} \{\xi(t) - \xi(u)\} = \xi(t) - \xi(u^*) = 0$. But $\xi(u^*)$ is the lowest value of $\xi(t)$ in $[0, t)$ and, unless an arrival occurs during the interval $[t, t + \Delta t]$, $\xi(t + \Delta t) = \xi(t) - \Delta t < \xi(t)$. This argument shows that, as soon as a new idle period begins, $\xi(t)$ attains the minimum value so far.
During the idle period as shown in Fig. 13.4, $\xi(t)$ further decreases linearly with slope $-1$ towards a new minimum $\xi(b_j)$ in $[0, b_j]$ until the beginning of a new busy period, say the $j$-th at $t = b_j$. Then, for all $b_j < t < b_{j+1}$,
$$v(t) = \xi(t) - \xi(b_j) = \sup_{b_j < u < t} \{\xi(t) - \xi(u)\}$$
In other words, we observe that idle periods decouple the past behavior from future behavior, as deduced earlier from the waiting time analysis in Section 13.2. As illustrated in Fig. 13.4, the series $\{\xi(b_j)\}$, where $b_j$ denotes the start of the $j$-th busy period, is monotonously decreasing in $b_j$, i.e. $\xi(b_j) > \xi(b_{j+1})$ for any $j$.
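The representation of the unfinished work as the largest excess work over any past interval can be evaluated directly when work arrives in jumps. The sketch below implements the form with the explicit maximum that precedes (13.11); the function names, the choice of candidate points and the sample data are assumptions made for illustration.

```python
def excess_work(arrivals, services, t, v0=0.0):
    """Excess work at t: work arrived in [0, t) minus t, plus the initial work v0."""
    arrived = sum(x for a, x in zip(arrivals, services) if a < t)
    return arrived - t + v0


def unfinished_work_sup(arrivals, services, t, v0=0.0):
    """Unfinished work just before t as max[excess(t), sup_u {excess(t) - excess(u)}].

    Between arrivals the excess work falls with slope -1 and it jumps up at
    arrivals, so the supremum is approached at u = 0, just before an arrival
    epoch, or as u tends to t (which contributes the value 0).
    """
    current = excess_work(arrivals, services, t, v0)
    candidates = [0.0] + [a for a in arrivals if a < t] + [t]
    largest = max(current - excess_work(arrivals, services, u, v0) for u in candidates)
    return max(current, largest)


# Illustrative data (assumed): four packets, empty system at time 0.
arrivals = [0.0, 1.0, 2.0, 6.0]
services = [2.0, 2.0, 1.0, 1.0]
```

Because the work counted at time t excludes an arrival at t itself, the function returns the unfinished work just before a possible jump at t.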

Fig. 13.4. The excess work $\xi(t)$ for the same process as in the previous plot. The arrows with $b_j$ denote the start of the $j$-th busy period. Observe that $\xi(b_j)$ is the minimum so far and that a busy period ends at $t > b_j$ for which $\xi(t) = \xi(b_j)$. The length of a busy period has been represented by a double arrow.

Let us proceed to compute the distribution of the unfinished work following an idea due to Beneš. Beneš applies the identity⁴, valid for all $z$,
$$e^{-zt} = 1 - z \int_0^t e^{-z\theta}\, d\theta$$
to the total idle time of the server by putting $\theta = \eta(u)$,
$$e^{-zt} = 1 - z \int_{\eta^{-1}(0)}^{\eta^{-1}(t)} e^{-z\eta(u)}\, d\eta(u)$$
where $\eta^{-1}(t)$ is the inverse function. Note that $\eta(0) = 0$ and that $d\eta(u) = 1_{\{v(u) = 0\}}\, du = \delta(v(u))\, du$ where $\delta(x)$ is the Dirac impulse. Let $t \to \eta(t)$, then
$$e^{-z\eta(t)} = 1 - z \int_0^t e^{-z\eta(u)}\, \delta(v(u))\, du$$
Substituting (13.9) in the integral, which is only valid if $v(u) = 0$, and (13.8) at the left-hand side, which is generally valid, gives
$$e^{-z\left(v(t) + t - v(0^-) - \alpha(t)\right)} = 1 - z \int_0^t e^{-z\left(u - v(0^-) - \alpha(u)\right)}\, \delta(v(u))\, du$$
or, in terms of the excess work $\xi(t) = \alpha(t) - t + v(0^-)$,
$$e^{-zv(t)} = e^{-z\xi(t)} - z \int_0^t e^{-z(\xi(t) - \xi(u))}\, \delta(v(u))\, du$$

⁴ Borovkov (1976, p. 30) proposes another but less simple approach by avoiding the use of the identity ingeniously introduced by Beneš.
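Taking the expectation of both sides yields,
$$E\left[e^{-zv(t)}\right] = E\left[e^{-z\xi(t)}\right] - z \int_0^t E\left[e^{-z(\xi(t) - \xi(u))}\, \delta(v(u))\right] du$$
Recall that, with (2.34), with the definition of a generating function (2.37), and further with (2.61),
$$Q = E\left[e^{-z(\xi(t) - \xi(u))}\, \delta(v(u))\right] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-zx}\, \delta(y)\, \frac{\partial^2}{\partial x\, \partial y} \Pr[\xi(t) - \xi(u) \le x, v(u) \le y]\, dx\, dy$$
and with (2.45), we have
$$\begin{aligned} Q &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-zx}\, \delta(y)\, \frac{\partial^2}{\partial x\, \partial y} \left\{\Pr[\xi(t) - \xi(u) \le x \,|\, v(u) \le y] \Pr[v(u) \le y]\right\} dx\, dy \\ &= \int_{-\infty}^{\infty} dx\, e^{-zx} \frac{\partial}{\partial x} \int_{-\infty}^{\infty} \delta(y) \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) \le y] \frac{\partial \Pr[v(u) \le y]}{\partial y}\, dy \\ &= \int_{-\infty}^{\infty} e^{-zx} \frac{d}{dx} \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0] \Pr[v(u) = 0]\, dx \end{aligned}$$
Combining all leads to
$$\int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\, dx = \int_{-\infty}^{\infty} e^{-zx} f_{\xi(t)}(x)\, dx - z \int_{-\infty}^{\infty} e^{-zx} \frac{d}{dx} \left[\int_0^t \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0] \Pr[v(u) = 0]\, du\right] dx$$
By partial integration, we can remove the factor $z$ at both sides. Indeed, since $\Pr[v(t) \le x] = 0$ for $x < 0$,
$$\int_{-\infty}^{\infty} e^{-zx} f_{v(t)}(x)\, dx = z \int_{-\infty}^{\infty} e^{-zx} \Pr[v(t) \le x]\, dx$$
Hence, we arrive at
$$\int_{-\infty}^{\infty} e^{-zx} \Pr[v(t) \le x]\, dx = \int_{-\infty}^{\infty} e^{-zx} \left\{\Pr[\xi(t) \le x] - \frac{d}{dx} \int_0^t \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0] \Pr[v(u) = 0]\, du\right\} dx$$
which is equivalent to
$$\Pr[v(t) \le x] = \Pr[\xi(t) \le x] - \int_0^t \frac{d \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0]}{dx} \Pr[v(u) = 0]\, du \tag{13.12}$$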

This general relation for the distribution of the unfinished work in terms of the excess work is the Beneš equation. If $v(u) = 0$ for all $u \in [0, t)$, this means that during that interval no work arrives and that $\xi(t) - \xi(u) = u - t$ or that $-t \le \xi(t) - \xi(u) \le 0$ for any $u \in [0, t)$. Thus, if we choose $x \in [-t, 0)$ such that the event $\{\xi(t) - \xi(u) = u - t \le x\}$ is possible, the probabilities appearing in the right-hand side are not identically zero while $\Pr[v(t) \le x] = 0$. Hence, for $x \in [-t, 0)$, the Beneš equation reduces to
$$\Pr[\xi(t) \le x] = \int_0^t \frac{d \Pr[\xi(t) - \xi(u) \le x \,|\, v(u) = 0]}{dx} \Pr[v(u) = 0]\, du$$
from which the unknown probability of an empty system $\Pr[v(u) = 0]$ can be found⁵ for $t + x \le u \le t$. The Beneš equation translates the problem of finding the time-dependent virtual waiting time or unfinished work into an integral equation that, in principle, can be solved. We further note that in the derivation hardly any assumptions about the queueing system or the arrival process are made, such that the Beneš equation provides the most general description of the unfinished work in any queueing system. Of course, the price for generality is a considerable complexity in the integral equation to be solved. However, we will see examples⁶ of its use in ATM.

⁵ This relation in the unknown function $f(u) = \Pr[v(u) = 0]$ is a Volterra equation of the first kind (see e.g. Morse and Feshbach (1978, Chapter 8))
$$g(z) = \int_a^z K(z|u)\, f(u)\, du$$
These integral equations frequently appear in physics in boundary problems, potential and Green's function theory.
⁶ Borovkov (1976) investigates the Beneš method in more detail. He further derives from (13.12) formulae for light and heavy traffic, and the discrete time process.

13.3.1 A constant service rate

If the server operates deterministically as in ATM, for example, the amount of work arriving to the queueing system in the interval $[0, t)$ simplifies to $\alpha(t) = A(t)$, the number of ATM cells in the interval $[0, t)$, because $x_j = x$ is the time to process one ATM cell, which we take as time unit $x = 1$. With this convention, we have that $\xi(t) = A(t) - t$. After substitution of $u = t - y$, the integral $I$ in (13.12) is
$$I = \int_0^t \frac{d \Pr[\xi(t) - \xi(t - y) \le x \,|\, v(t - y) = 0]}{dx} \Pr[v(t - y) = 0]\, dy$$
and the event
$$\{\xi(t) - \xi(t - y) \le x\} = \{A(t) - A(t - y) \le x + y\}$$

Since $A(t) - A(t - y)$ is a non-negative integer $k$ and thus a discrete random variable, the probability density function is
$$\frac{d \Pr[\xi(t) - \xi(t - y) \le x \,|\, v(t - y) = 0]}{dx} = \Pr[A(t) - A(t - y) = k \,|\, v(t - y) = 0]\, 1_{\{x + y = k\}}$$
which implies that only values at $y = k - x$ contribute to the integral $I$. Hence, $0 \le y \le t$ implies that $\lceil x \rceil \le k \le \lfloor x + t \rfloor$ where $\lfloor z \rfloor$ (respectively $\lceil z \rceil$) is the integer equal to or smaller (respectively larger) than $z$,
$$I = \sum_{k = \lceil x \rceil}^{\lfloor x + t \rfloor} \Pr[A(t) - A(t + x - k) = k \,|\, v(t + x - k) = 0] \Pr[v(t + x - k) = 0]$$
Hence, for a discrete queue with time slots equal to the constant service time, the Beneš equation reduces to
$$\Pr[v(t) \le x] = \Pr[A(t) \le \lfloor x + t \rfloor] - \sum_{k = \lceil x \rceil}^{\lfloor x + t \rfloor} \Pr[A(t) - A(t + x - k) = k \,|\, v(t + x - k) = 0] \Pr[v(t + x - k) = 0] \tag{13.13}$$

13.3.2 The steady-state distribution of the virtual waiting time

Let us turn to the steady-state distribution $V(x) = \lim_{t \to \infty} \Pr[v(t) \le x]$. Since $\alpha(t)$ is the amount of work arriving to the queueing system in the interval $[0, t)$, $\frac{\alpha(t)}{t}$ is the mean amount of work arriving in that interval $[0, t)$. The steady-state stability condition, equivalent to $\rho < 1$, requires that
$$\lim_{t \to \infty} \frac{\alpha(t)}{t} = \rho < 1$$
because the server capacity is 1 unit of work per unit of time. Since $\alpha(t)$ is not decreasing, $\xi(t)$ decreases continuously with slope $-1$ between two arrivals and increases (possibly discontinuously with jumps) during arrival epochs, as illustrated in Fig. 13.3. In the stable steady-state regime where $\rho < 1$ and $\lim_{t \to \infty} \frac{\alpha(t)}{t} < 1$, we find that
$$\lim_{t \to \infty} \xi(t) = \lim_{t \to \infty} t \left(\frac{\alpha(t)}{t} - 1\right) = -\infty$$
and thus $\lim_{t \to \infty} \Pr[\xi(t) \le x] = 1$ and $\lim_{t \to \infty} \frac{\xi(t)}{t} = \rho - 1 < 0$. From (13.7), we see that
$$\lim_{t \to \infty} \frac{\eta(t)}{t} = 1 - \lim_{t \to \infty} \frac{b(t)}{t} = 1 - \rho$$
which, with (13.9), suggests that
$$V(0) = \lim_{t \to \infty} \Pr[v(t) = 0] = 1 - \rho \tag{13.14}$$
If the Strong Law of Large Numbers is applicable, which implies that the lengths of the idle periods are independent and identically distributed, this relation $V(0) = 1 - \rho$ is proved to be true by Borovkov (1976, pp. 33–34). Hence, for any stationary single-server system with traffic intensity $\rho$, the probability of an empty system at an arbitrary time is $1 - \rho$.
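The statement that the server is idle a fraction of time equal to one minus the load can be illustrated by simulation. The sketch below assumes exponential interarrival and service times (an M/M/1 queue), a system that starts empty and a fixed random seed; these choices, and the function name `idle_fraction`, are assumptions made only for this example.

```python
import random


def idle_fraction(arrival_rate, service_rate, n_packets, seed=1):
    """Fraction of time a FIFO single server is idle until the last departure."""
    rng = random.Random(seed)
    arrival = 0.0      # arrival time of the current packet
    departure = 0.0    # departure time of the previous packet
    busy = 0.0         # total time the server has been serving
    for _ in range(n_packets):
        arrival += rng.expovariate(arrival_rate)
        service = rng.expovariate(service_rate)
        departure = max(arrival, departure) + service   # work-conserving FIFO server
        busy += service
    return 1.0 - busy / departure


# Load 0.7 (assumed): the idle fraction should be close to 0.3.
estimate = idle_fraction(arrival_rate=0.7, service_rate=1.0, n_packets=200_000)
```

The estimate only approaches the limit for long runs, which is why a large number of packets is used.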
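Taking the limit $t \to \infty$ in (13.12) then yields
$$V(x) = 1 - \lim_{t \to \infty} \int_0^t \frac{d \Pr[\xi(t) - \xi(t - y) \le x \,|\, v(t - y) = 0]}{dx} \Pr[v(t - y) = 0]\, dy$$
The tail probability $1 - V(x) = \lim_{t \to \infty} \Pr[v(t) > x] = \Pr[v(t_\infty) > x]$ is
$$\Pr[v(t_\infty) > x] = \int_0^\infty \frac{d \Pr[\xi(t_\infty) - \xi(t_\infty - y) \le x \,|\, v(t_\infty - y) = 0]}{dx} \Pr[v(t_\infty - y) = 0]\, dy$$
This relation shows that, at a point in the steady-state $t_\infty \to \infty$, the contributions to $V(x)$ are due to arrivals and idle periods in the past. The corresponding steady-state equation for (13.13) is
$$\Pr[v(t_\infty) > x] = \sum_{k = \lceil x \rceil}^{\infty} \Pr\left[v(t_\infty + x - k) = 0 \,\Big|\, \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k\right] \Pr\left[\int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k\right] \tag{13.15}$$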

13.4 The counting process

A similar general conservation relation to (13.3) can be deduced for the counting process,
$$N_S(r_{k+1}) = \max(N_S(r_k) - 1, 0) + N_A(r_{k+1}) - N_A(r_k)$$
The number of packets in the system at the departure time of the $(k+1)$-th packet equals the number of packets in the system at the departure time of the previous packet $k$ minus that packet, but increased by the number of arrivals in the time interval $[r_k, r_{k+1}]$. Similarly, for the queue (which is the system minus the packet currently under service),
$$N_Q(r_{k+1}) = \max(N_Q(r_k) + N_A(r_{k+1}) - N_A(r_k) - 1, 0)$$
which is the direct analog of (13.3) for the waiting time.
Whereas the waiting process is more natural to consider in problems where interarrival times are specified, the counting process has more advantages in a discrete time analysis. In the latter, the queueing system is observed at certain moments in time, for instance, at the beginning of a timeslot $k$ that starts immediately⁷ after the departure of the $k$-th packet and is equal to the interval $[r_k, r_{k+1}]$, for all $r_k > 0$ and $r_0 = 0$. It will be convenient to simplify the notation: $\mathcal{S}_k = N_S(r_k^+)$ denotes the system content (i.e. the number of occupied queue positions including the packets currently being served) at the beginning of timeslot $k$, $\mathcal{Q}_k = N_Q(r_k^+)$ is the queue content at the beginning of timeslot $k$, and $\mathcal{X}_k$ and $\mathcal{A}_k$ are the number of served packets and of arriving packets during timeslot $k$ respectively. The system content satisfies the continuity (or balance) equation
$$\mathcal{S}_{k+1} = (\mathcal{S}_k - \mathcal{X}_k)^+ + \mathcal{A}_k \tag{13.16}$$
whereas the queue content obeys
$$\mathcal{Q}_{k+1} = (\mathcal{Q}_k - \mathcal{X}_k + \mathcal{A}_k)^+ \tag{13.17}$$
where $(x)^+ \equiv \max(x, 0)$. On the other hand, the relation between system and queue content implies that $\mathcal{Q}_k = (\mathcal{S}_k - \mathcal{X}_k)^+$ such that (13.16) is rewritten as
$$\mathcal{S}_{k+1} = \mathcal{Q}_k + \mathcal{A}_k \tag{13.18}$$
The number of cells at the beginning of a timeslot $k + 1$ in the system is the sum of the number of queued packets at the beginning of the previous timeslot $k$ and the newly arrived packets during timeslot $k$.
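The balance equation for the system content lends itself to a slot-by-slot evaluation. The sketch below assumes that at most one packet is served per timeslot and uses made-up arrival counts; `evolve_system_content` is a hypothetical helper name.

```python
def evolve_system_content(arrivals_per_slot, initial_content=0, served_per_slot=1):
    """System content at the start of each timeslot from (13.16) and (13.18).

    The queue content of a slot is the system content minus the packets served
    in that slot, floored at zero; adding the arrivals of the slot gives the
    system content at the start of the next slot.
    """
    contents = [initial_content]
    for arrivals in arrivals_per_slot:
        queue_content = max(contents[-1] - served_per_slot, 0)
        contents.append(queue_content + arrivals)
    return contents


# Assumed arrival counts for six consecutive timeslots.
contents = evolve_system_content([2, 0, 3, 0, 0, 0])
```

After the burst of three arrivals the content drains by one packet per slot until the system is empty again.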

13.4.1 Queue observations


It is worthwhile to investigate the relation between observations at various instants of time of the queueing process $\{N_S(t), t \ge 0\}$ which represents the number of packets in the system at time t. As seen before and as illustrated in Fig. 13.5, two time instants seem natural: an inspection at departure times where $N_S(r_n^+)$ describes the number of packets in the system just after departure of the n-th packet and an observation at arrival times where $N_S(t_n^-)$ describes the number of packets in the system just before the n-th packet enters.
Suppose that the n-th packet leaves $N_S(r_n^+) = k \le j$ packets behind in the system. This implies that precisely k arrivals after the n-th packet have entered the system. Hence, the (n + k + 1)-th packet sees, just before entering the system, at most k packets, because during the period between $t = r_n$ and $t = t_{n+k+1} \ge r_n$, only departures are possible. Thus, $N_S(t^-_{n+k+1}) \le k$ and, clearly, for $j \ge k$, the (n + j + 1)-th packet observes no more than $N_S(t^-_{n+j+1}) \le j$. Hence, the following implication holds
$$\left\{ N_S(r_n^+) \le j \right\} \Longrightarrow \left\{ N_S(t^-_{n+j+1}) \le j \right\}$$

$^{7}$ We write $x^+ = x + \epsilon$ and $x^- = x - \epsilon$ where $\epsilon > 0$ is an arbitrary, positive real number. The notation $x^+$ should not be confused with $(x)^+ = \max(x, 0)$.

Fig. 13.5. Relation between queue observations at arrival and at departure epochs.

Consider now the converse. Suppose that the (n + j + 1)-th packet sees precisely $k \le j$ packets in front of it upon arrival: $N_S(t^-_{n+j+1}) = k \le j$. This implies that the (n + j + 1 - k)-th packet is the first packet that will leave the system after $t = t_{n+j+1}$ and that the (n + j - k)-th packet has already left the system. At its departure at $t = r_{n+j-k} \le t_{n+j+1}$, it has observed at most k packets behind it, because only arrivals are possible in the interval $[r_{n+j-k}, t_{n+j+1})$. Hence, $N_S(r^+_{n+j-k}) \le k$ and, setting k = j, $N_S(r_n^+) \le j$, leading to the implication
$$\left\{ N_S(t^-_{n+j+1}) \le j \right\} \Longrightarrow \left\{ N_S(r_n^+) \le j \right\}$$
Combining both implications leads to the equivalence,
$$\left\{ N_S(t^-_{n+j+1}) \le j \right\} \Longleftrightarrow \left\{ N_S(r_n^+) \le j \right\}$$


or, for any sample path (or realization), it holds, for any non-negative integer j, that
$$\Pr\left[ N_S(t^-_{n+j+1}) \le j \right] = \Pr\left[ N_S(r_n^+) \le j \right]$$
In steady-state for $n \to \infty$, with $\lim_{n\to\infty} N_S(t_n^-) = N_{S;A}$ and $\lim_{n\to\infty} N_S(r_n^+) = N_{S;D}$, we find that
$$\Pr[N_{S;A} = j] = \Pr[N_{S;D} = j] \tag{13.19}$$
In words, in steady-state, the probability of the number of packets in the system observed by arriving packets is equal to the probability of the number of packets in the system left behind by departing packets. Of course, we have assumed that the steady-state distribution exists. If one of these distributions exists, the analysis demonstrates that the other must exist. Notice that no assumptions about the distribution or dependence are made and that (13.19) is a general result which only assumes the existence of a steady-state.
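The sample-path nature of this result can be checked on a simulated queue. The sketch below is an illustration under stated assumptions: a single-server FIFO queue with uniformly distributed (hence non-Poisson) interarrival and service times, a hypothetical helper `observe_queue`, and a path that starts and ends with an empty system. It tallies the system content seen just before each arrival and left behind just after each departure.

```python
import random
from collections import Counter


def observe_queue(interarrivals, services):
    """Single-server FIFO queue: tally N_S just before each arrival
    and N_S just after each departure."""
    t, last_departure = 0.0, 0.0
    arrivals, departures = [], []
    for gap, x in zip(interarrivals, services):
        t += gap
        arrivals.append(t)
        last_departure = max(t, last_departure) + x  # wait for a free server
        departures.append(last_departure)
    seen, left = Counter(), Counter()
    for n, t_n in enumerate(arrivals):
        # earlier packets that have not yet departed at time t_n
        seen[sum(1 for r in departures[:n] if r > t_n)] += 1
    for n, r_n in enumerate(departures):
        # later packets that have already arrived at time r_n
        left[sum(1 for t_k in arrivals[n + 1:] if t_k < r_n)] += 1
    return seen, left


random.seed(7)
n_packets = 500
gaps = [random.uniform(0.5, 1.5) for _ in range(n_packets)]
work = [random.uniform(0.2, 1.6) for _ in range(n_packets)]
seen, left = observe_queue(gaps, work)
print(seen == left)
```

Because every upward jump from j to j + 1 packets must be matched by a downward jump from j + 1 to j on a path that returns to the empty state, the two tallies coincide for each j, whatever the interarrival and service distributions.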

13.5 PASTA
Let us denote by $\lim_{t\to\infty} N_S(t) = N_S$ the steady-state system content or the number of packets in the system in steady-state. To compute the waiting time distribution (under a FIFO service discipline), we must take the view of how a typical arriving packet in steady-state finds the queue. Therefore, it is of interest to know when
$$\Pr[N_{S;A} = j] \stackrel{?}{=} \Pr[N_S = j] \tag{13.20}$$
The equality would imply that, in steady-state, the probability that an arriving packet finds the system in state j equals the probability that the system is in state j. Recall with (6.1) that the existence of the probabilities means that $\Pr[N_S = j]$ also equals the long-run fraction of the time the system contains j packets or is in state j. Similarly, $\Pr[N_{S;A} = j]$ also equals the long-run fraction of arriving packets that see the system in state j. In general, relation (13.20) is unfortunately not true. For example, consider a D/D/1 queue with a constant interarrival time c and a constant service time $x_c < c$. Clearly, the D/D/1 system has a periodic service cycle: a busy period takes $x_c$ time units and the idle period equals $c - x_c$ time units. Thus, every arriving packet always finds the system empty and concludes $\Pr[N_{S;A} = 0] = 1$, while $\Pr[N_S = 1] = \frac{x_c}{c}$ and $\Pr[N_S = 0] = \frac{c - x_c}{c}$. The waiting time computation of the GI/D/c system in Section 14.4.2 is another counterexample. Since the arrival process $\{N_A(t), t \ge 0\}$ interacts


with the system process $\{N_S(t), t \ge 0\}$ because every arrival increases the system content by one, they are dependent processes. Relation (13.20) is true for Poisson arrivals and this property is called "Poisson arrivals see time averages" (PASTA).
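The D/D/1 counterexample can be reproduced with a few lines of Python. This is a minimal sketch: the function name `dd1_fractions` and the numerical values c = 1 and $x_c$ = 0.25 are assumptions chosen for illustration.

```python
def dd1_fractions(c, x_c, packets=1000):
    """D/D/1 queue with arrivals at 0, c, 2c, ... and service time x_c < c.

    Returns (fraction of arrivals that find the system empty,
             fraction of time the system holds one packet)."""
    last_departure = 0.0
    seen_empty = 0
    busy = 0.0
    for n in range(packets):
        t_n = n * c
        if t_n >= last_departure:      # previous packet has already left
            seen_empty += 1
        last_departure = max(t_n, last_departure) + x_c
        busy += x_c
    return seen_empty / packets, busy / (packets * c)


p_arrival_empty, p_time_busy = dd1_fractions(c=1.0, x_c=0.25)
print(p_arrival_empty, p_time_busy)  # 1.0 0.25
```

Every arrival sees an empty system, yet the system is busy a fraction $x_c / c$ of the time, so arrival averages and time averages differ.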
Theorem 13.5.1 (PASTA) The long-run fraction of time that a process spends in state j is equal to the long-run fraction of Poisson arrivals that find the process in state j,
$$\Pr[N_{S;A} = j] = \Pr[N_S = j]$$
Proof: See$^{8}$ e.g. Wolff (1982).

The Poisson process has the typical property that future increments are independent of the past and, thus, also of the past system history. In a certain sense, Poisson arrivals perform a random sampling which is sufficient to characterize the steady-state of the system exactly. The PASTA property also applies to Markov chains. The transitions in continuous-time Markov chains are Poisson processes if self-transitions are allowed (see Section 10.4.1). For any state j, the fraction of Poisson events that see the chain in state j is $\pi_j$, which (see Lemma 6.1.2) also equals the fraction of time the chain is in state j.

13.6 Little's Law

Little's Law is perhaps the simplest of the general queueing formulae.
Theorem 13.6.1 (Little's Law) The average number of packets (customers) in the system $E[N_S]$ (or in the queue $E[N_Q]$) equals the average arrival rate $\lambda$ times the average time spent in the system $E[T]$ (or in the queue $E[w]$),
$$E[N_S] = \lambda E[T] \tag{13.21}$$
$$E[N_Q] = \lambda E[w]$$
$^{8}$ Although Wolff's general proof (Wolff, 1982) only contains two pages, it is based on martingales and on axiomatic probability theory.


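Little's Law holds if two of the three limits
$$\lim_{t\to\infty} \frac{A(t)}{t} = \lambda \tag{13.22}$$
$$\lim_{t\to\infty} \frac{1}{t} \int_0^t N_S(u)\,du = E[N_S] \tag{13.23}$$
$$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} T_k = E[T] \tag{13.24}$$
exist.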
Fig. 13.6. The arrival (bold) and departure (dotted) process, together with the system time $T_k$ for each packet in the queueing system.

Proof: Recall that $A(t)$ represents the total number of arrivals in time interval $[0, t]$. If $N_S(t) = 0$ or the system is idle at time t, then
$$\int_0^t N_S(u)\,du = \int_0^t \sum_{j=1}^{\infty} 1_{\{u \in [t_j, t_j + T_j)\}}\,du = \sum_{j=1}^{A(t)} \int_0^t 1_{\{u \in [t_j, t_j + T_j)\}}\,du = \sum_{k=1}^{A(t)} T_k$$
The general case where $N_S(t) \ge 0$ is more complicated as Fig. 13.6 shows for $t = \tau$ because not all intervals $[t_j, t_j + T_j)$ for $1 \le j \le A(\tau)$ are contained in $[0, \tau)$. Hence, $\sum_{k=1}^{A(\tau)} T_k$ counts too much and is an upper bound for $\int_0^{\tau} N_S(u)\,du$. If $D(t)$ denotes the number of departures in $[0, t]$, Fig. 13.6 illustrates that the area (in grey) in an interval $[0, t]$, which equals the total number of packets in the system in that interval $\int_0^t (A(u) - D(u))\,du = \int_0^t N_S(u)\,du$, can be bounded for any realization (sample path) and any $t \ge 0$ by
$$\sum_{k:\, T_k + t_k \le t} T_k \le \int_0^t N_S(u)\,du \le \sum_{k=1}^{A(t)} T_k$$
where the lower bound only counts the packets that have left the system by time t. By dividing by t, we have
$$\frac{A(t)}{t} \sum_{k:\, T_k + t_k \le t} \frac{T_k}{A(t)} \le \frac{1}{t} \int_0^t N_S(u)\,du \le \frac{A(t)}{t} \sum_{k=1}^{A(t)} \frac{T_k}{A(t)} \tag{13.25}$$

Since we assume that the limit (13.22) exists, we have that $A(t) = O(t)$ for $t \to \infty$. From the existence of the limit (13.24), we can thus write
$$\lim_{t\to\infty} \sum_{k=1}^{A(t)} \frac{T_k}{A(t)} = E[T]$$
When $t \to \infty$ in (13.25) and using the limits defined above, we find that the upper bound converges to $\lambda E[T]$. In order to prove (13.21), it remains to show that the lower bound in (13.25) also converges to the same limit $\lambda E[T]$. Since $A(t) = \lambda t + o(t)$ for $t \to \infty$, it follows for the sequence of arrival times $t_n$ that $t_n \to \infty$ if $n \to \infty$ and that
$$\frac{n}{t_n} = \frac{A(t_n)}{t_n} \to \lambda \quad \text{as } n \to \infty$$
The convergence of the series (13.24) implies for $n \to \infty$ that
$$\frac{T_n}{n} = \sum_{k=1}^{n} \frac{T_k}{n} - \frac{n-1}{n} \sum_{k=1}^{n-1} \frac{T_k}{n-1} \to 0$$
Combining both relations leads to
$$\frac{T_n}{t_n} = \frac{T_n}{n}\,\frac{n}{t_n} \to 0 \quad \text{as } n \to \infty$$

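which implies that, for any $\varepsilon > 0$, there exists a fixed m such that, for all $k > m$, we have that $\frac{T_k}{t_k} < \varepsilon$ or $t_k + T_k < (1 + \varepsilon) t_k$. Hence, every packet $k > m$ that arrives before $t/(1+\varepsilon)$ has left the system by time t. For $t > t_m$ large enough that the first m packets have also left, the lower bound in (13.25) obeys
$$\sum_{k > m:\, T_k + t_k \le t} T_k + \sum_{k=1}^{m} T_k \ge \sum_{k=1}^{A(t/(1+\varepsilon))} T_k$$
or
$$\frac{1}{t} \sum_{k \ge 1:\, T_k + t_k \le t} T_k \ge \frac{1}{1+\varepsilon}\, \frac{A(t/(1+\varepsilon))}{t/(1+\varepsilon)} \sum_{k=1}^{A(t/(1+\varepsilon))} \frac{T_k}{A(t/(1+\varepsilon))}$$
In the limit $t \to \infty$, the right-hand side tends to $\frac{\lambda E[T]}{1+\varepsilon}$. Since $\varepsilon$ can be made arbitrarily small and the lower bound never exceeds the upper bound, whose limit is $\lambda E[T]$, this finally proves (13.21).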

Although the proof may seem rather technical$^{9}$ for, after all, an intuitive result, it reveals that no assumptions about the distributions of the arrival and service process apart from steady-state convergence are made. No probabilistic arguments are used. In essence, Little's Law is proved by showing that two limits exist for any sample path or realization of the process, which guarantees a very general theorem. Moreover, no assumptions about the service discipline, nor about the dependence between arrival and service process or about the number of servers are made, which means that Little's Law also holds for non-FIFO scheduling disciplines, in fact for any scheduling discipline! Little's Law connects three essential quantities: once two of them are known, the third is determined by (13.21). Little's Law is very important in operations where it relates the average inventory (similar to $E[N_S]$), the average flow rate or throughput $\lambda$ and the average flow time $E[T]$ in a process flow of products or services. Several examples can be found in Chapter 14, in Anupindi et al. (2006) and Bertsekas and Gallager (1992, pp. 157–162).

$^{9}$ We have opted for a very general proof. Other proofs (e.g. in Ross (1996) and Gallager (1996)) use arguments from renewal reward theory (Section 8.4), which makes them less general because they require that the system has renewals.
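Since the proof works on a single sample path, Little's Law can be verified numerically on one simulated realization. The sketch below makes several assumptions for illustration: a single-server FIFO queue, exponential interarrival times, uniform service times, the hypothetical helper `little_quantities`, and an observation window that ends at the last departure so that the system is idle at the end. It integrates $N_S(u)$ event by event and compares the time average with $\lambda E[T]$ estimated from the same path.

```python
import random


def little_quantities(interarrivals, services):
    """Single-server FIFO sample path that starts and ends empty.

    Returns (time-average N_S, arrival rate, mean system time)."""
    t, last_departure = 0.0, 0.0
    events, system_times = [], []
    for gap, x in zip(interarrivals, services):
        t += gap
        last_departure = max(t, last_departure) + x
        events.append((t, +1))                # arrival epoch t_k
        events.append((last_departure, -1))   # departure epoch t_k + T_k
        system_times.append(last_departure - t)
    events.sort()
    area, in_system, previous = 0.0, 0, 0.0
    for epoch, step in events:
        area += in_system * (epoch - previous)  # integrate N_S(u) du
        in_system += step
        previous = epoch
    horizon = previous                        # last departure: system idle
    count = len(system_times)
    return area / horizon, count / horizon, sum(system_times) / count


random.seed(3)
gaps = [random.expovariate(1.0) for _ in range(2000)]
work = [random.uniform(0.1, 1.5) for _ in range(2000)]
mean_n, lam, mean_t = little_quantities(gaps, work)
print(abs(mean_n - lam * mean_t) < 1e-9)
```

Because the window ends with an idle system, the integral of $N_S(u)$ equals the sum of the system times exactly, which is the first case treated in the proof; the two sides therefore agree up to floating-point rounding.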

14
Queueing models

This chapter presents some of the simplest and most basic queueing models. Unfortunately, most queueing problems are not available in analytic form and many queueing problems require a specific and sometimes tailor-made solution.
Besides the simple and classical queueing models, we also present two other exactly solvable models that have played a key role in the development of Asynchronous Transfer Mode (ATM). In these ATM queueing systems the service discipline is deterministic and only the arrival process is the distinguishing element. The first is the N*D/D/1 queue (Roberts, 1991, Section 6.2) whose solution relies on the Beneš approach. The arrivals consist of N periodic sources each with a period of D time slots, but randomly phased with respect to each other. The second model is the fluid flow model of Anick et al. (1982), known as the AMS-queue, which considers N on-off sources as input. The solution uses Markov theory. Since the Markov transition probability matrix has a special tri-band diagonal structure, the eigenvector and eigenvalue decomposition can be computed analytically.
We would like to refer to a few other models. Norros (1994) succeeded in deriving the asymptotic probability distribution of the unfinished work for a queue with self-similar input, modeled via a fractional Brownian motion. The resulting asymptotic probability distribution turns out to be a Weibull distribution (3.40). Finally, Neuts (1989) has established a matrix analytic framework and was the founder of the class of Markov Modulated arrival processes and derivatives such as the Batch Markovian Arrival process (BMAP).

14.1 The M/M/1 queue


The M/M/1 queue consists of a Poisson arrival process of packets with exponentially distributed interarrival times, a service process with exponentially distributed service time, one server and an infinitely long queue. The M/M/1 queue is a basic model in queueing theory for several reasons. First, as shown below, the M/M/1 queue can be computed in analytic form, even the transient time behavior. Apart from the computational advantage, the M/M/1 queue possesses the basic feature of queueing systems: the quantities of interest (waiting time, number of packets, etc.) increase monotonically with the traffic intensity $\rho$.
Packets arrive in the M/M/1 queue with interarrival rate $\lambda$ and are served with service rate $\mu$. The M/M/1 queue is precisely described by a constant rate birth and death process. Any arrival of a packet to the queueing system can be regarded as a birth. The current state k that reflects the number of packets in the M/M/1 system jumps to state k + 1 at the arrival of a new packet and the transition rate equals the interarrival rate $\lambda$: on average every $\frac{1}{\lambda}$ time units a packet arrives to the system. A packet leaves the M/M/1 system after service, which corresponds to a death: at each departure from the system the current state is decreased by one, with death rate $\mu$ equal to the service rate: on average every $\frac{1}{\mu}$ time units, a packet is served. In the sequel, we concentrate on the steady-state behavior and refer for the transient behavior to the discussion of the birth and death process in Section 11.3.3.

14.1.1 The system content in steady-state


From the analogy with the constant rate birth and death process as studied in Section 11.3.3, we obtain immediately the steady-state queueing distribution (11.23) as
$$\Pr[N_S = j] = (1 - \rho)\,\rho^j, \qquad j \ge 0 \tag{14.1}$$
where $N_S = \lim_{t\to\infty} N_S(t)$ is the number of packets in the system in the stationary regime. In other words, $\Pr[N_S = j]$ is the probability that the M/M/1 system (queue plus server) contains j packets. It has been shown in Section 11.3 that the M/M/1 queue is ergodic (i.e. that an equilibrium or steady-state exists) if $\rho = \frac{\lambda}{\mu} < 1$, which is a characteristic of a general queueing system. The probability density function of the system content (14.1) is a geometric distribution reflecting the memoryless property. We observe that the M/M/1 system is empty with probability $\Pr[N_S = 0] = 1 - \rho$ and, of all states, the empty state has the highest probability. Immediately, the chance that there is a packet in the M/M/1 system is precisely equal to the traffic intensity $\rho$, namely $\Pr[N_S > 0] = 1 - \Pr[N_S = 0] = \rho$.


The corresponding probability generating function (2.18) is
$$\varphi_{N_S}(z) = \sum_{k=0}^{\infty} \Pr[N_S = k]\, z^k = \frac{1 - \rho}{1 - \rho z}$$
The average number of packets in the M/M/1 system $E[N_S] = \varphi'_{N_S}(1)$ equals
$$E\left[N_{S;M/M/1}\right] = \frac{\rho}{1 - \rho}$$
while the variance $\mathrm{Var}[N_S]$ follows from (2.27) as
$$\mathrm{Var}\left[N_{S;M/M/1}\right] = \frac{\rho}{(1 - \rho)^2}$$
Both the mean and the variance of the number of packets in the system diverge as $\rho \to 1$. When the interarrival rate tends to the service rate, the queue grows indefinitely long with indefinitely large variation. From Little's law (13.21), the average time spent in the M/M/1 system equals
$$E\left[T_{M/M/1}\right] = \frac{E[N_S]}{\lambda} = \frac{1}{\mu (1 - \rho)} = \frac{1}{\mu - \lambda} \tag{14.2}$$
where $\rho < 1$ or, equivalently, $\lambda < \mu$. If $\rho = 0$, there is no load in the system and the average system time attains its minimum equal to the average service time $\frac{1}{\mu}$. In the other limit $\rho \to 1$, the average system time grows unboundedly, just as the queue length or number of packets in the system. The behavior of the M/M/1 system in the limit $\rho \to 1$ is characteristic for the average of quantities ($N_S$, $w$, $T$, ...) in many queueing systems: a simple pole at $\rho = 1$.
As a remark, the average waiting time in the M/M/1 queue follows, after taking expectations from the general relation $T_n - x_n = w_n$ or $E[w] = E[T] - \frac{1}{\mu}$, as
$$E\left[w_{M/M/1}\right] = \frac{1}{\mu (1 - \rho)} - \frac{1}{\mu} = \frac{\rho}{\mu (1 - \rho)} \tag{14.3}$$
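The mean-value formulas of this subsection can be collected in a small helper. The following Python sketch is illustrative; the function name `mm1_metrics`, the dictionary keys and the example rates are assumptions. It evaluates $\rho$, $E[N_S]$, $\mathrm{Var}[N_S]$, (14.2) and (14.3) for given $\lambda < \mu$.

```python
def mm1_metrics(lam, mu):
    """Steady-state mean values of the M/M/1 queue (requires lam < mu)."""
    if not 0 <= lam < mu:
        raise ValueError("a steady-state requires 0 <= lam < mu")
    rho = lam / mu
    return {
        "rho": rho,
        "mean_system_content": rho / (1 - rho),
        "var_system_content": rho / (1 - rho) ** 2,
        "mean_system_time": 1 / (mu - lam),            # (14.2)
        "mean_waiting_time": rho / (mu * (1 - rho)),   # (14.3)
    }


metrics = mm1_metrics(lam=1.0, mu=2.0)
print(metrics["mean_system_content"], metrics["mean_system_time"])  # 1.0 1.0
```

For these example rates, Little's law is visible directly: $\lambda E[T] = 1 \cdot 1 = E[N_S]$.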

14.1.2 The virtual waiting time


For the M/M/1 queue, the virtual waiting time $v(t)$ at some time t consists of (a) the residual service time of the packet currently under service, (b) the time needed to serve the $N_Q(t)$ packets in the queue. As mentioned in Section 13.1.3, the virtual waiting time at arrival epochs equals the system time $T_n$. In other words, if a new packet, say the n-th packet, enters the M/M/1 system at $t = t_n$, the total time (system time) that the packet spends in the M/M/1 system equals $v(t_n) = T_n$. At $t = t_n^-$, the number $N_S(t_n^-)$ does not include the new packet at the last position and the packet sees $N_S(t_n^-)$ other packets in the system (queue plus the packet in the server) in front of it. We assume further that the server operates in FIFO (first in, first out) order. Since the service time is exponentially distributed and possesses the memoryless property, the residual or remaining service time of the packet currently under service has the same distribution. In other words, it does not matter how long the packet has already been under service. The more general argument is that the PASTA property applies. The system time of the n-th packet is thus the sum of $N_S(t_n^-) + 1$ exponential i.i.d. random variables. As shown in Section 3.3.1, if $N_S(t_n^-) = k$, the system time has an Erlang distribution given by (3.24) with n = k + 1,
$$f_{T_n}\left(t \mid N_S(t_n^-) = k\right) = \frac{\mu (\mu t)^k}{k!}\, e^{-\mu t}$$
Using the law of total probability (2.46), the system time $T_n$ of the n-th packet or the virtual waiting time at time $t = t_n$ becomes
$$f_{T_n}(t) = \frac{d}{dt} \Pr[T_n \le t] = \sum_{k=0}^{\infty} f_{T_n}\left(t \mid N_S(t_n^-) = k\right) \Pr\left[N_S(t_n^-) = k\right] = \mu e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!} \Pr\left[N_S(t_n^-) = k\right]$$
In Section 11.3.3, $s_k(t_n^-) = \Pr[N_S(t_n^-) = k]$ is computed in (11.27) assuming that the system starts with j packets, i.e. $s_k(0) = \delta_{kj}$. In steady-state, where $t_n \to \infty$, it is shown that $s_k(t_n^-) \to (1 - \rho)\rho^k$. In most cases, however, a time-dependent solution is not available in closed form. Fortunately, for Poisson arrivals, the PASTA property helps to circumvent this inconvenience. Based on the PASTA property, in steady-state, $\lim_{n\to\infty} \Pr[N_S(t_n^-) = k] = \Pr[N_S = k]$ given by (14.1). The probability density function $f_T(t) = \lim_{n\to\infty} f_{T_n}(t)$ of the steady-state system time T (or the total waiting time of a packet) is
$$f_T(t) = \mu e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\mu t)^k}{k!} (1 - \rho) \rho^k$$
or
$$f_T(t) = \mu (1 - \rho)\, e^{-\mu (1 - \rho) t} \tag{14.4}$$
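The step from the series to the closed form (14.4) only uses the Taylor series of the exponential function. Written out as a worked equation, with the same symbols as above:

```latex
\begin{align*}
f_T(t) &= \mu (1-\rho)\, e^{-\mu t} \sum_{k=0}^{\infty} \frac{(\rho \mu t)^k}{k!}
        = \mu (1-\rho)\, e^{-\mu t}\, e^{\rho \mu t}
        = \mu (1-\rho)\, e^{-\mu (1-\rho) t}, \\
\int_0^{\infty} f_T(t)\, dt &= 1, \qquad
E[T] = \int_0^{\infty} t\, f_T(t)\, dt = \frac{1}{\mu (1-\rho)} .
\end{align*}
```

The mean obtained from the density agrees with (14.2), which was derived independently from Little's law.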


In summary, the total time spent in the M/M/1 system in steady-state ($\rho < 1$) has an exponential distribution with mean $\frac{1}{\mu (1 - \rho)} = \frac{1}{\mu - \lambda}$, which has been found above in (14.2) by Little's law. Similarly$^{1}$, the waiting time in the M/M/1 queue is
$$f_w(t) = (1 - \rho)\, \delta(t) + \lambda (1 - \rho)\, e^{-\mu (1 - \rho) t} \tag{14.5}$$
where the first term with the Dirac function reflects a zero queueing time provided the system is empty, which has probability $\Pr[N_S = 0] = 1 - \rho$.

14.1.3 The departure process from the M/M/1 queue


There is a remarkable theorem due to Burke which has far-reaching consequences for networks of M/M/1 queues.
Theorem 14.1.1 (Burke) In a steady-state M/M/1 queue, the departure process is a Poisson process with rate $\lambda$.
Burke's Theorem is equivalent to the statement that the interdeparture times $r_n - r_{n-1}$ in steady-state are i.i.d. exponential random variables with mean $\frac{1}{\lambda}$.
Proof: Let us denote the probability density function of the interdeparture time r by
$$f_r(t) = \frac{d}{dt} \Pr[r \le t]$$
In steady-state, it holds in general that $\Pr[N_{S;A} = j] = \Pr[N_{S;D} = j]$, as shown in Section 13.4.1, while the PASTA property (Theorem 13.5.1) states that $\Pr[N_{S;A} = j] = \Pr[N_S = j]$. Hence, in steady-state in the M/M/1 queue, departing packets see the steady-state system content, i.e. $\Pr[N_{S;D} = j] = \Pr[N_S = j]$. Moreover, in steady-state, the departure process can be decomposed into two different situations after the departure of a packet: (a) the departing packet sees an empty system (which is equivalent to the system being empty) or (b) the departing packet sees a next packet in the queue (which is equivalent to the system immediately serving the next packet in
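$^{1}$ The Laplace transform of the waiting time in the queue follows from $T_n = w_n + x_n$ as
$$\varphi_w(z) = \frac{\varphi_T(z)}{\varphi_x(z)} = \frac{(1 - \rho)\mu}{z + (1 - \rho)\mu} \cdot \frac{z + \mu}{\mu} = (1 - \rho) + \rho\, \frac{(1 - \rho)\mu}{z + (1 - \rho)\mu}$$
which, after inverse Laplace transformation, gives (14.5).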


the queue),
$$\Pr[r \le t] = \Pr[r \le t \mid N_S = 0] \Pr[N_S = 0] + \Pr[r \le t \mid N_S > 0] \Pr[N_S > 0]$$
In case (a), we must wait for the next packet to arrive and to be served. This total time is the sum of an exponential random variable with rate $\lambda$ and an exponential random variable with rate $\mu$. It is more convenient to compute the Laplace transform as shown in Section 3.3.1,
$$\varphi_{r \mid N_S = 0}(z) = \int_0^{\infty} e^{-zt}\, d\left(\Pr[r \le t \mid N_S = 0]\right) = \frac{\lambda}{z + \lambda} \cdot \frac{\mu}{z + \mu}$$
In case (b), the next packet leaves the M/M/1 queue after an exponential service time with rate $\mu$,
$$\varphi_{r \mid N_S > 0}(z) = \int_0^{\infty} e^{-zt}\, d\left(\Pr[r \le t \mid N_S > 0]\right) = \frac{\mu}{z + \mu}$$
Hence,
$$\varphi_r(z) = \varphi_{r \mid N_S = 0}(z) \Pr[N_S = 0] + \varphi_{r \mid N_S > 0}(z) \Pr[N_S > 0] = \frac{\lambda}{z + \lambda} \cdot \frac{\mu}{z + \mu} (1 - \rho) + \frac{\mu}{z + \mu}\, \rho = \frac{\lambda}{z + \lambda}$$
which proves the theorem.

Burke's Theorem states that the steady-state arrival and departure process of the M/M/1 queue are the same! Consequently, the steady-state departure rate equals the steady-state arrival rate $\lambda$.
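The last simplification in the proof is compressed; the intermediate algebra, using only $\rho = \lambda / \mu$, can be written out as follows.

```latex
\begin{align*}
\varphi_r(z)
  &= \frac{\lambda \mu (1-\rho)}{(z+\lambda)(z+\mu)} + \frac{\rho \mu}{z+\mu}
   = \frac{\mu \left[ \lambda (1-\rho) + \rho (z+\lambda) \right]}{(z+\lambda)(z+\mu)} \\
  &= \frac{\mu \left( \lambda + \rho z \right)}{(z+\lambda)(z+\mu)}
   = \frac{\lambda (\mu + z)}{(z+\lambda)(z+\mu)}
   = \frac{\lambda}{z+\lambda} .
\end{align*}
```

The result is the Laplace transform of an exponential density with rate $\lambda$, which is the interarrival-time distribution of the Poisson arrival process.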
14.2 Variants of the M/M/1 queue
A number of readily obtained variants from the birth-death analogy are
worth considering here. Mainly a steady-state analysis is presented.
14.2.1 The M/M/m queue
Instead of one server, we consider the case with m servers. The buffer is still infinitely long and the interarrival process is exponential with interarrival rate $\lambda$. The M/M/m queue can model a router with m physically different interfaces (or output ports) with the same transmission rate $\mu$ towards the same next hop. All packets destined to that next hop can be transmitted over any of the m interfaces. This type of load balancing frequently occurs in the Internet.
As shown in Fig. 14.1, the M/M/m system can still be described by a birth and death process with birth rate $\lambda_k = \lambda$, but with death rate $\mu_k = k\mu$ for $0 \le k \le m$ and $\mu_k = m\mu$ if $k \ge m$. Indeed, if there are $k \le m$ packets in the system, they can all be served and the departure (or death) rate from the system is $k\mu$. If there are more packets, $k > m$, only m of them can be served such that the death rate is limited to the maximum service rate $m\mu$.
Fig. 14.1. The birth-death process corresponding to the M/M/m queue.

14.2.1.1 System content


From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find
$$\Pr[N_S = 0] = \frac{1}{1 + \sum_{j=1}^{m-1} \frac{\lambda^j}{j!\, \mu^j} + \sum_{j=m}^{\infty} \frac{\lambda^j}{m^{j-m}\, m!\, \mu^j}} = \frac{1}{\sum_{j=0}^{m-1} \frac{\lambda^j}{j!\, \mu^j} + \frac{\lambda^m}{m!\, \mu^m} \frac{1}{1 - \frac{\lambda}{m\mu}}} \tag{14.6}$$
$$\Pr[N_S = j] = \frac{\lambda^j}{j!\, \mu^j} \Pr[N_S = 0], \qquad j \le m \tag{14.7}$$
$$\Pr[N_S = j] = \frac{m^m}{m!} \left(\frac{\lambda}{m\mu}\right)^j \Pr[N_S = 0], \qquad j \ge m \tag{14.8}$$
The traffic intensity is $\rho = \frac{\lambda}{m\mu}$, the ratio between average interarrival rate and average (maximum) service rate. Again, $\rho < 1$ corresponds to the stable (ergodic) regime.
For the M/M/m system it is of interest to know what the probability of queueing is. Queueing occurs when an arriving packet finds all servers busy, which happens with probability $\Pr[N_S \ge m]$, or explicitly,
$$\Pr[N_S \ge m] = \frac{\Pr[N_S = 0]\, \lambda^m}{m!\, \mu^m \left(1 - \frac{\lambda}{m\mu}\right)} \tag{14.9}$$
This probability also corresponds to a situation in classical telephony where no trunk is available for an arriving call. Relation (14.9) is known as the Erlang C formula.
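The Erlang C formula is straightforward to evaluate numerically. The sketch below is a direct transcription of (14.6) and (14.9) under the assumption $\rho < 1$; the function names `mmm_empty_probability` and `erlang_c` and the example parameters are illustrative choices, not taken from the text.

```python
from math import factorial


def mmm_empty_probability(lam, mu, m):
    """Pr[N_S = 0] from (14.6); requires rho = lam / (m * mu) < 1."""
    r = lam / mu
    rho = r / m
    if not 0 <= rho < 1:
        raise ValueError("a steady-state requires lam < m * mu")
    head = sum(r ** j / factorial(j) for j in range(m))
    tail = r ** m / (factorial(m) * (1 - rho))
    return 1.0 / (head + tail)


def erlang_c(lam, mu, m):
    """Probability of queueing Pr[N_S >= m] from (14.9)."""
    r = lam / mu
    rho = r / m
    p0 = mmm_empty_probability(lam, mu, m)
    return p0 * r ** m / (factorial(m) * (1 - rho))


# Hypothetical example: two servers, each half loaded.
print(erlang_c(lam=1.0, mu=1.0, m=2))
```

For m = 1 the function returns $\rho$, the probability that the single server of an M/M/1 queue is busy. For large m, the factorials grow quickly, so a numerically safer recursion would be preferable in practice.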


14.2.1.2 Waiting (or queueing) time


Instead of computing the virtual waiting time (or system time, unfinished work), we will now concentrate on the waiting time of a packet in the M/M/m queue. The system time can be deduced from the basic relation $T_n = w_n + x_n$, where $x_n$ is an exponential random variable with rate $\mu$. A packet only experiences queueing if all servers are occupied. This event has probability $\Pr[N_S \ge m]$ specified by the Erlang C formula (14.9). Hence, the queueing time w can be decomposed into two cases: (a) an arriving packet does not queue (w = 0) and (b) an arriving packet must wait in the M/M/m queue,
$$\begin{aligned}
\Pr[w \le t] &= \Pr[w \le t \mid N_S < m] \Pr[N_S < m] + \Pr[w \le t \mid N_S \ge m] \Pr[N_S \ge m] \\
&= \Pr[N_S < m] + \Pr[w \le t \mid N_S \ge m] \Pr[N_S \ge m] \\
&= 1 - \Pr[N_S \ge m] + \Pr[w \le t \mid N_S \ge m] \Pr[N_S \ge m]
\end{aligned} \tag{14.10}$$
It remains to compute $\Pr[w \le t \mid N_S \ge m]$. The reasoning is analogous to that in the M/M/1 queue. An arriving packet must wait for the packet currently under service and for the j packets already in the queue before it. Thus, w equals the sum of j + 1 exponential random variables with rate $m\mu$ because, for the M/M/m queue with all servers busy, the service rate is $m\mu$. Hence (see Section 3.3.1), the distribution for the waiting time w in the queue, provided j packets are in the queue, is an Erlang distribution,
$$f_w(t \mid N_Q = j) = \frac{m\mu\, (m\mu t)^j}{j!}\, e^{-m\mu t}$$
Furthermore, the number of packets in the queue $N_Q$ in steady-state is related to the system content as $N_S = m + N_Q$. Using the law of total probability (2.46), the waiting time in the queue in steady-state is
$$f_w(t \mid N_S \ge m) = \frac{d}{dt} \Pr[w \le t \mid N_S \ge m] = \sum_{j=0}^{\infty} f_w(t \mid N_S = m + j) \Pr[N_S = m + j \mid N_S \ge m]$$
The conditional probability $\Pr[N_S = m + j \mid N_S \ge m]$ follows from (2.44) and (14.8) with $\rho = \frac{\lambda}{m\mu}$ as
$$\Pr[N_S = m + j \mid N_S \ge m] = \frac{\Pr[N_S = m + j]}{\Pr[N_S \ge m]} = \frac{\Pr[N_S = 0]\, m^m}{\Pr[N_S \ge m]\, m!} \left(\frac{\lambda}{m\mu}\right)^{j+m} = \left(1 - \frac{\lambda}{m\mu}\right) \left(\frac{\lambda}{m\mu}\right)^j = (1 - \rho)\rho^j$$
We observe from (14.1) that, if all m servers are busy, the system content of an M/M/m system behaves as that in an M/M/1 system. Thus, the conditional probability density function for the waiting time in the M/M/m queue is also an exponential distribution,
$$f_w(t \mid N_S \ge m) = (1 - \rho)\, m\mu\, e^{-m\mu t} \sum_{j=0}^{\infty} \frac{(\rho\, m\mu t)^j}{j!} = (1 - \rho)\, m\mu\, e^{-(1 - \rho) m\mu t}$$
or
$$\Pr[w \le t \mid N_S \ge m] = 1 - e^{-(1 - \rho) m\mu t}$$
Substitution in (14.10) finally results in the distribution of the waiting time in the queue of the M/M/m system,
$$F_w(t) = \Pr[w \le t] = 1 - \Pr[N_S \ge m]\, e^{-(1 - \rho) m\mu t} \tag{14.11}$$
Since $F_w(0) = 1 - \Pr[N_S \ge m] > 0$ while obviously $F_w(0^-) = 0$, there is probability mass at t = 0, which is reflected by a Dirac impulse in the probability density function,
$$f_w(t) = (1 - \Pr[N_S \ge m])\, \delta(t) + (1 - \rho)\, m\mu \Pr[N_S \ge m]\, e^{-(1 - \rho) m\mu t} \tag{14.12}$$
The pdf of the system time T = w + x follows after convolution of (14.12) and $f_x(t) = \mu e^{-\mu t}$ as
$$f_T(t) = (1 - \Pr[N_S \ge m])\, \mu e^{-\mu t} + \frac{\Pr[N_S \ge m]}{1 - m(1 - \rho)}\, (1 - \rho)\, m\mu \left( e^{-(1 - \rho) m\mu t} - e^{-\mu t} \right) \tag{14.13}$$
and the average system time can be computed from (14.13) with (2.33) or directly from $E[T] = E[w] + E[x]$ as
$$E[T] = \frac{1}{\mu} + \frac{\Pr[N_S \ge m]}{m\mu (1 - \rho)} \tag{14.14}$$
Also, in the single-server case (m = 1), (14.12) reduces to the pdf (14.5) of the M/M/1 queue. Furthermore, Burke's Theorem 14.1.1 can be extended to the M/M/m queue: the arrival and departure process of the M/M/m queue are both Poisson processes with rate $\lambda$.
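The waiting-time distribution (14.11) and the mean system time (14.14) can be evaluated once the Erlang C probability is known. The following Python sketch is illustrative only; the function names and the example rates are assumptions, and `erlang_c` here takes the offered load $r = \lambda/\mu$ as its argument.

```python
from math import exp, factorial


def erlang_c(r, m):
    """Pr[N_S >= m] from (14.9) for offered load r = lam / mu < m."""
    rho = r / m
    head = sum(r ** j / factorial(j) for j in range(m))
    tail = r ** m / (factorial(m) * (1 - rho))
    return tail / (head + tail)


def waiting_time_cdf(t, lam, mu, m):
    """F_w(t) = Pr[w <= t] from (14.11), for t >= 0."""
    rho = lam / (m * mu)
    return 1 - erlang_c(lam / mu, m) * exp(-(1 - rho) * m * mu * t)


def mean_system_time(lam, mu, m):
    """E[T] from (14.14)."""
    rho = lam / (m * mu)
    return 1 / mu + erlang_c(lam / mu, m) / (m * mu * (1 - rho))


# Single-server check: (14.14) should reduce to 1 / (mu - lam) from (14.2).
print(mean_system_time(lam=1.0, mu=2.0, m=1))   # 1.0
print(waiting_time_cdf(0.0, lam=1.0, mu=2.0, m=1))  # 0.5
```

The value $F_w(0) = 1 - \Pr[N_S \ge m]$ is the probability mass at t = 0, that is, the probability that an arriving packet does not have to queue.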
14.2.2 The M/M/m/m queue
The difference with the M/M/m queue is that the number of packets (calls) in the M/M/m/m queue is limited to m. Hence, when more than m packets (calls) arrive, they are lost. This situation corresponds with classical telephony where a conversation is possible if no more than m trunks are occupied, otherwise you hear a busy tone and the connection cannot be set up. The limitation to m arrivals is modeled in the birth and death process by limiting the interarrival rates, $\lambda_k = \lambda$ if $k < m$ and $\lambda_k = 0$ if $k \ge m$. The death rates are the same as in the M/M/m queue, $\mu_k = k\mu$ for $0 \le k \le m$ and $\mu_k = m\mu$ if $k \ge m$.
From the basic steady-state relations for the birth and death process (11.15) and (11.16), we find
$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\, \mu^j}} \tag{14.15}$$
$$\Pr[N_S = j] = \begin{cases} \dfrac{\lambda^j}{j!\, \mu^j} \Pr[N_S = 0] & j \le m \\ 0 & j > m \end{cases} \tag{14.16}$$
The quantity of interest in the M/M/m/m system is the probability that all trunks (servers) are busy, which is known as the Erlang B formula,
$$\Pr[N_S = m] = \frac{\frac{\lambda^m}{m!\, \mu^m}}{\sum_{j=0}^{m} \frac{\lambda^j}{j!\, \mu^j}} \tag{14.17}$$
In practice, a telephony exchange is dimensioned (i.e. the number of lines m is determined) such that the blocking probability $\Pr[N_S = m]$ is below a certain level, say below $10^{-4}$. In summary, the Erlang B formula (14.17) determines the blocking probability or loss probability (because only m calls or packets are allowed to the system), while the Erlang C formula (14.9) is the probability that a packet must wait in the (infinitely long) queue because all servers are busy.
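For numerical work, the Erlang B formula is usually evaluated with a recursion in the number of servers rather than with factorials, which overflow for large m. The sketch below uses the well-known recursion $B_k = \frac{r B_{k-1}}{k + r B_{k-1}}$ with $B_0 = 1$ and $r = \lambda/\mu$; this recursion is not stated in the text above and the function name `erlang_b` is an assumption.

```python
def erlang_b(r, m):
    """Blocking probability (14.17) for offered load r = lam / mu, m servers."""
    b = 1.0                       # zero servers: every call is blocked
    for k in range(1, m + 1):
        b = r * b / (k + r * b)   # add one server at a time
    return b


print(erlang_b(1.0, 1))  # 0.5
```

With one server and $r = 1$, (14.17) gives $\frac{1}{1 + 1} = 0.5$, which the recursion reproduces.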
Although the Erlang B formula (14.17) has been derived in the context of the M/M/m/m queue, it holds under much weaker assumptions, a fact already known to Erlang, as mentioned by Kelly (1991). Kelly starts his memoir on Loss Networks with the Erlang B formula, which Erlang obtained from his powerful method of statistical equilibrium. The latter concept is now identified as the steady-state of Markov processes. Kelly further relates the impact of the Erlang B formula from telephony to interacting particle systems and phase transitions in nature (e.g. the famous Ising model). Much effort has been devoted over time to generalize Erlang's results as far as possible. The Erlang B formula (14.17) holds for the M/G/m/m queue as well, thus for an arbitrary service process provided the mean service rate (per server) equals $\mu$. The proof by Gnedenko and Kovalenko (1989, pp. 237–240) is long and complicated, whereas the proof of Ross (1996, Section 5.7.2) is more elegant and is based on the time-reversed Markov chain. As a corollary, Ross demonstrates that the departure process (including both lost and served packets) of the M/G/m/m system is a Poisson process with rate $\lambda$.
Example 1 In case $m \to \infty$, the expressions (14.15) and (14.6) tend to $\lim_{m\to\infty} \Pr[N_S = 0] = \exp\left(-\frac{\lambda}{\mu}\right)$ while (14.16) and (14.7) become
$$\Pr[N_S = j] = \frac{\lambda^j}{j!\, \mu^j} \exp\left(-\frac{\lambda}{\mu}\right)$$
This queueing system is denoted as M/M/$\infty$. Thus, the number in the M/M/$\infty$ system (in steady-state) is Poisson distributed with parameter $\frac{\lambda}{\mu}$. Hence, in case $m \to \infty$, the average number in the system is $E[N_S] = \frac{\lambda}{\mu}$ (as follows from (3.11)) and the average time in the system follows from Little's theorem (13.21) as $E[T] = \frac{1}{\mu}$. The fact that, if $m \to \infty$, the mean time in the system M/M/$\infty$ equals the average service time has a consistent explanation: if the number of servers $m \to \infty$, implying that there is an infinite service capacity, there is no waiting and the only time a packet is in the system is its service time, with mean $\frac{1}{\mu}$.
Example 2 Consider two voice over IP (VoIP) gateways connected by a link with capacity C. Denote the capacity of a voice call by $C_{voice}$ (in bit/s). For example in ISDN, $C_{voice}$ = 64 kb/s. In general, $C_{voice}$ in VoIP depends on the specifics of the codecs used. The arrival rate $\lambda$ of voice traffic can be expressed in terms of the number a of call attempts per hour and the mean call duration d (in seconds) as
$$\lambda = \frac{a d}{3600}$$
The number m of calls that the link can carry simultaneously is
$$m = \frac{C}{C_{voice}}$$
Since the arrival process of voice calls is well modeled by a Poisson process with exponential holding time, the Erlang B formula (14.17) is applicable to compute the blocking probability $\epsilon$ or grade of service (GoS) as
$$\frac{\frac{r^m}{m!}}{\sum_{j=0}^{m} \frac{r^j}{j!}} = \epsilon \tag{14.18}$$
where $r = \frac{\lambda}{\mu} = m\rho$ and $\rho = \frac{\lambda}{m\mu}$ is the traffic intensity. This relation (14.18) specifies the probability that admission control will have to refuse a call request between the two VoIP gateways because the link is already transporting m calls. An Internet service provider can make a trade-off between the link capacity C (by hiring more links or a higher capacity link from a network provider) and the blocking probability $\epsilon$ or GoS. The latter must be small enough to keep its subscribed customers, but large enough to make profit. A reasonable value for GoS seems $\epsilon = 10^{-4}$. If the Internet service provider hires a 2 Mb/s link and offers its customers VoIP software with codec rate 40 kb/s (G.726 standard), then m = 50. Since the left-hand side of (14.18) is strictly increasing in $\rho$, solving the equation (14.18) for r yields $r \approx 28.87$ or the traffic intensity equals $\rho = 0.5775$. Furthermore, since $\mu = \frac{C}{m}$ = 40 kb/s, we obtain $\lambda$ = 1.155 Mb/s. If the mean call duration d (in seconds) is known, the number of call attempts per hour then follows as $a = \frac{4158129}{d}$. If we assume that a telephone call lasts on average 2 minutes or d = 120 s, the number of call attempts per hour that the Internet service provider can handle with a GoS of $10^{-4}$ equals a = 34651.
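Solving (14.18) for the offered load r has no closed form, but because the blocking probability increases with r a simple bisection suffices. The sketch below is an illustration under assumptions: the helper names `erlang_b` and `offered_load_for_blocking` are hypothetical, the Erlang B value is computed with the standard server-by-server recursion, and the search interval [0, m] presumes a target blocking probability smaller than the blocking at r = m.

```python
def erlang_b(r, m):
    """Erlang B blocking probability for offered load r and m servers."""
    b = 1.0
    for k in range(1, m + 1):
        b = r * b / (k + r * b)
    return b


def offered_load_for_blocking(target, m, tol=1e-10):
    """Largest offered load r in [0, m] with blocking at most `target`."""
    lo, hi = 0.0, float(m)        # blocking is increasing in r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if erlang_b(mid, m) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


m = 2_000_000 // 40_000           # 2 Mb/s link, 40 kb/s per call: m = 50
r = offered_load_for_blocking(1e-4, m)
rho = r / m
print(round(r, 2), round(rho, 4))
```

The computed load should land close to the value $r \approx 28.87$ quoted in the example, with the traffic intensity following as $\rho = r / m$.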

14.2.3 The M/M/1/K queue


In contrast to the basic M/M/1 queue, the M/M/1/K system cannot contain more than $K$ packets (including the packet in the server). Arriving packets that find the system completely occupied (with $K$ packets) are refused service and are to be considered as lost (or marked).
In the basic steady-state relations for the birth and death process (11.15) the appearing summation is limited to $K$ instead of infinity or $\lambda_k = \lambda$ if $k < K$ and $\lambda_k = 0$ if $k \geq K$. Thus, with $\rho = \lambda/\mu$ and

$$\Pr[N_S = 0] = \frac{1}{\sum_{j=0}^{K} \rho^j} = \frac{1-\rho}{1-\rho^{K+1}}$$

the pdf of the system content for the M/M/1/K system becomes,

$$\Pr[N_S = j] = \begin{cases} \dfrac{(1-\rho)\,\rho^j}{1-\rho^{K+1}} & 0 \leq j \leq K \\[2mm] 0 & j > K \end{cases} \tag{14.19}$$

The probability that $j$ positions in the M/M/1/K system are occupied is proportional to that in the infinite system (14.1) with proportionality factor $\left(1-\rho^{K+1}\right)^{-1}$.
The probability that the system is completely filled with $K$ packets equals

$$\Pr[N_S = K] = \frac{(1-\rho)\,\rho^K}{1-\rho^{K+1}} \tag{14.20}$$

This probability also equals the loss probability for packets in the M/M/1/K system. Regarding the QoS problem in multimedia based on IP-networks, a first crude estimate of the packet loss in a router with $K$ positions can be derived from (14.20). The estimate is rather crude because the arrival process of packets in the Internet is likely not a Poisson process and the variable length of the packets does not necessarily lead to an exponential service rate.
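As a small illustration of (14.19) and (14.20), the sketch below evaluates the system content distribution and the loss probability. The helper names `mm1k_pmf` and `mm1k_loss` are hypothetical, and the case $\rho = 1$ (where the formula becomes the uniform distribution by a limit argument) is handled separately as an added assumption.

```python
def mm1k_pmf(rho: float, K: int) -> list:
    """System content probabilities Pr[N_S = j], j = 0..K, from (14.19)."""
    if rho == 1.0:
        # limit of (14.19) for rho -> 1: all K + 1 states are equally likely
        return [1.0 / (K + 1)] * (K + 1)
    norm = (1.0 - rho) / (1.0 - rho ** (K + 1))
    return [norm * rho ** j for j in range(K + 1)]


def mm1k_loss(rho: float, K: int) -> float:
    """Loss probability (14.20): the system holds K packets."""
    return mm1k_pmf(rho, K)[K]


# Crude packet loss estimates for a router with K positions at load 0.8.
for K in (5, 10, 20):
    print(K, mm1k_loss(0.8, K))
```

For a fixed load below one, the loss estimate decreases roughly geometrically in $K$, which is why a modest increase in buffer positions lowers the estimated loss considerably in this model.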

14.3 The M/G/1 queue


The most general single-server queueing system with Poisson arrivals is the M/G/1 queue. The service time distribution $F_x(t)$ can be any arbitrary distribution. Due to its importance, we will derive the system content and the waiting time distribution in steady-state.
In order to describe the M/G/1 queueing system, special observation points in time must be chosen such that the equations for the evolution of the number of packets in the system are most conveniently deduced. The set of departure times $\{r_n\}$ appears to be a suitable set. Any other set of observation points is likely to lead to a more complex mathematical treatment, mainly because the remaining service time of the packet just under service is a stochastic variable. If the M/G/1 queue is observed at departure times, the evolution of the number of packets in the system $N_S(r_n^+)$ that the departing packets leave behind is a discrete Markov chain, namely the embedded Markov chain of the M/G/1 queue continuous-time process. Section 13.4.1 has shown that, in steady-state, relation (13.19) tells that the distribution of the number of packets in the system observed by arrivals equals that left behind by departures. In addition, the PASTA Theorem 13.5.1 states that, in steady-state, Poisson arrivals observe the actual distribution of the number of packets in the system. This PASTA property ensures that the embedded Markov chain, although it observes the system at departure epochs only, nevertheless provides the steady-state solution since the arrival process is Poisson.
Let us concentrate on deriving the transition probabilities that specify this embedded Markov chain entirely apart from an initial state distribution. With the notation of Section 13.4, $\mathcal{S}_k = N_S(r_k^+)$ and $\mathcal{A}_k$ denote the number of packets in the system at the discrete-time $r_k$ and the number of arrivals during a time interval $[r_k, r_{k+1}]$, respectively. The transition probability of the embedded Markov chain is

$$V_{ij} = \Pr[\mathcal{S}_{k+1} = j \mid \mathcal{S}_k = i]$$

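and the evolution over time follows from (13.16) as

$$\mathcal{S}_{k+1} = (\mathcal{S}_k - 1)^+ + \mathcal{A}_k \tag{14.21}$$

Hence, since $\mathcal{A}_k \geq 0$, we see that $\mathcal{S}_{k+1} \geq (\mathcal{S}_k - 1)^+$ or that $\mathcal{S}_{k+1} < (\mathcal{S}_k - 1)^+$ is impossible. Hence, $V_{ij} = 0$ for $j < i - 1$ and $i > 1$ while, for $i > 0$, $V_{ij} = \Pr[i - 1 + \mathcal{A}_k = j \mid \mathcal{S}_k = i] = \Pr[\mathcal{A}_k = j - i + 1]$. The case for $i = 0$ results in $V_{0j} = \Pr[\mathcal{A}_k = j] = V_{1j}$. Denoting $a_j = \Pr[\mathcal{A}_k = j]$, the transition probability matrix becomes²

$$V = \begin{pmatrix} a_0 & a_1 & a_2 & a_3 & \cdots \\ a_0 & a_1 & a_2 & a_3 & \cdots \\ 0 & a_0 & a_1 & a_2 & \cdots \\ 0 & 0 & a_0 & a_1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

and the corresponding transition graph is sketched in Fig. 14.2.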
and the corresponding transition graph is sketched in Fig. 14.2.
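[Figure 14.2: chain of states $\ldots, i-2, i-1, i, i+1, i+2, \ldots$ with the transitions out of state $i$ labelled $a_0, a_1, a_2, a_3, \ldots, a_{j-i+1}$.]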

Fig. 14.2. State transition graph for the M/G/1 embedded Markov chain.

The number of Poisson arrivals during a time slot $[r_k, r_{k+1}]$ clearly depends on the length of the service time $x_{k+1} = r_{k+1} - r_k$ that is distributed according to $F_x(t)$, which is independent of a specific packet $k$. Furthermore, the arrival process is a Poisson process with rate $\lambda$ and independent of the state of the queueing process, thus $\Pr[\mathcal{A}_k = j] = \Pr[\mathcal{A} = j]$. Hence, using the law of total probability (2.46),

$$\Pr[\mathcal{A} = j] = \int_0^\infty \Pr[\mathcal{A} = j \mid x = t]\, dF_x(t) = \int_0^\infty e^{-\lambda t}\, \frac{(\lambda t)^j}{j!}\, dF_x(t)$$

² The structure of this transition probability matrix $V$ has been investigated in great depth by Neuts (1989). Moreover, $V$ belongs to the class of matrices whose eigenstructure is explicitly given in Appendix A.5.3.

If we denote the Laplace transform of the service time by

$$\varphi_x(z) = \int_0^\infty e^{-zt}\, dF_x(t)$$

then we observe that

$$a_j = \Pr[\mathcal{A} = j] = \frac{\lambda^j}{j!} \int_0^\infty e^{-\lambda t}\, t^j\, dF_x(t) = \frac{(-\lambda)^j}{j!} \left. \frac{d^j \varphi_x(z)}{dz^j} \right|_{z=\lambda} \tag{14.22}$$

so that the transition probability matrix $V$ is specified. Since all $a_j > 0$ for all $j \geq 0$, Fig. 14.2 indicates that all states are reachable from an arbitrary state $i$ as the Markov process evolves over time in at most $i$ steps. These $i$ steps occur in the transition from state $i$ to state 0. This implies that the Markov process is irreducible and the steady-state stability requirement $\rho < 1$ makes the Markov process ergodic. The steady-state vector $\pi$ with components $\pi_j = \Pr[N_{S;D} = j]$ where $\lim_{n \to \infty} N_S(r_n^+) = N_{S;D}$ follows from (9.22) as solution of $\pi = \pi V$, where $V$ is a matrix of infinite dimensions.
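To make the embedded Markov chain concrete, the following sketch builds a truncated version of the matrix $V$ and iterates $\pi \leftarrow \pi V$. Several choices here are assumptions made only for illustration: the service time is taken exponential with rate $\mu$, for which (14.22) gives the geometric form $a_j = \frac{\mu}{\lambda+\mu}\left(\frac{\lambda}{\lambda+\mu}\right)^j$; the infinite matrix is cut off at `n` states with the tail mass lumped into the last column; and the function names are hypothetical.

```python
def arrival_probs_exponential(lam: float, mu: float, n: int) -> list:
    """a_j of (14.22) for exponential service times (assumed case)."""
    p = lam / (lam + mu)
    return [(1.0 - p) * p ** j for j in range(n)]


def embedded_matrix(a: list, n: int) -> list:
    """Truncated n x n transition matrix V of the embedded chain."""
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        base = max(i - 1, 0)  # content left after the departure
        for j in range(base, n - 1):
            V[i][j] = a[j - base]  # V_ij = a_{j-i+1}, rows 0 and 1 coincide
        V[i][n - 1] = 1.0 - sum(V[i][: n - 1])  # lump the tail mass
    return V


def stationary(V: list, steps: int = 400) -> list:
    """Approximate solution of pi = pi V by repeated multiplication."""
    n = len(V)
    pi = [1.0 / n] * n
    for _ in range(steps):
        pi = [sum(pi[i] * V[i][j] for i in range(n)) for j in range(n)]
    return pi


lam, mu, n = 0.5, 1.0, 40
pi = stationary(embedded_matrix(arrival_probs_exponential(lam, mu, n), n))
print([round(p, 6) for p in pi[:5]])
```

With exponential service the queue is an M/M/1 queue, so the computed vector can be compared with the geometric distribution $(1-\rho)\rho^j$ as a sanity check of the construction.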
14.3.1 The system content in steady-state
Rather than pursuing the matrix analysis that is explored by Neuts (1989), we present an alternative method to determine the steady-state distribution $\pi_j$ using generating functions. The generating function approach will lead in an elegant way to the celebrated Pollaczek-Khinchin equation.
The probability generating function (pgf) $G_k(z)$ of a discrete random variable $\mathcal{G}_k$ is defined in (2.18) as

$$G_k(z) = E\left[z^{\mathcal{G}_k}\right] = \sum_{j=0}^{\infty} g_k[j]\, z^j \tag{14.23}$$

where $g_k[j] = \Pr[\mathcal{G}_k = j]$. From (14.21), we have

$$S_{k+1}(z) = E\left[z^{(\mathcal{S}_k - 1)^+ + \mathcal{A}_k}\right] \tag{14.24}$$

Anticipating the corresponding result (14.31) derived in Section 14.4 for the GI/D/m system in discrete-time, we observe that the generating function $S_k(z)$ satisfies a formally similar equation when $m = 1$ in (14.31). This correspondence points to a more general framework because, by choosing appropriate observation points, the M/G/1 and GI/D/1 systems (in discrete-time) formally obey the same equation. Since the results deduced for the GI/D/m system are more general because of the $m$ servers instead of 1 here, we content ourselves here with copying the result (14.35) derived below³ in Section 14.4,

$$S(z) = \left(1 - A'(1)\right) \frac{(z-1)\, A(z)}{z - A(z)}$$

³ Notice that the notation $A(z)$ here is different from $A(t) = \int_0^t N_A(u)\, du$ used before.
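We further continue to introduce in this general equation the details of the M/G/1 queueing system by specifying $A(z) = \sum_{j=0}^{\infty} \Pr[\mathcal{A} = j]\, z^j$. With (14.22), we find the Taylor expansion,

$$A(z) = \sum_{j=0}^{\infty} \frac{(-\lambda z)^j}{j!} \left. \frac{d^j \varphi_x(z)}{dz^j} \right|_{z=\lambda} = \varphi_x(\lambda - \lambda z) \tag{14.25}$$

and the probability generating function of the system content of the steady-state M/G/1 queueing system,

$$S(z) = \left(1 + \lambda \varphi_x'(0)\right) \frac{(z-1)\, \varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)}$$

But, since $-\varphi_x'(0) = E[x] = 1/\mu$, which is the average service time, we finally arrive using (13.2) at the famous Pollaczek-Khinchin equation,

$$S(z) = (1-\rho)\, \frac{(z-1)\, \varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)} \tag{14.26}$$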

Let us further investigate what can be concluded from the Pollaczek-Khinchin equation (14.26). First, we can verify by executing the limit $z \to 1$ that $S(1) = 1$, the normalization condition for any probability generating function. More interestingly, the average number of packets in the M/G/1 system follows after some tedious manipulations using de l'Hospital's rule as

$$E[N_S] = S'(1) = \rho + \frac{\lambda^2 \varphi_x''(0)}{2(1-\rho)}$$

From (2.42), $\varphi_x''(0) = E\left[x^2\right]$,

$$E[N_S] = \rho + \frac{\lambda^2 E\left[x^2\right]}{2(1-\rho)} \tag{14.27}$$

Hence, the average number of packets in the M/G/1 system (in steady-state) grows linearly with the second moment of the service time distribution. Since $E\left[x^2\right] = \mathrm{Var}[x] + (E[x])^2$, the relation⁴ (14.27) shows that, for equal average service rates, the service process with highest variability leads to the largest average number of packets in the system. One of the early successes of the Japanese industry was the just in time (JIT) principle, which essentially tries to minimize the variability in a manufacturing process. Minimization of variability is also very important in the design of scheduling rules: the less variability, the more efficiently buffer places in a router are used. Since a deterministic server has the lowest variance (namely zero), the M/D/1 queue will hold on average the lowest number of packets. This design principle was used in ATM, where all service times precisely equal the time needed to serve one ATM cell. The average time spent in the system follows directly from Little's law (13.21),

$$E[T] = \frac{E[N_S]}{\lambda} = E[x] + \frac{\lambda E\left[x^2\right]}{2(1-\rho)}$$

and, since $E[T] = E[x] + E[w]$, the average waiting time in the queue is

$$E[w] = \frac{\lambda E\left[x^2\right]}{2(1-\rho)} \tag{14.28}$$

⁴ Sometimes the coefficient of variation for the service time, $C_x = \frac{\sqrt{\mathrm{Var}[x]}}{E[x]}$, is used such that $E[N_S] = \rho + \frac{\rho^2\left(1 + C_x^2\right)}{2(1-\rho)}$.
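The influence of the service time variability in (14.27) and (14.28) can be illustrated numerically. The sketch below is only an illustration with hypothetical function names; it compares an exponential service time (second moment $2/\mu^2$) with a deterministic one (second moment $1/\mu^2$) at the same load.

```python
def mg1_mean_system_content(lam: float, mean_x: float, second_moment_x: float) -> float:
    """E[N_S] of (14.27) from the first two moments of the service time."""
    rho = lam * mean_x
    if not 0.0 <= rho < 1.0:
        raise ValueError("steady-state requires 0 <= rho < 1")
    return rho + lam ** 2 * second_moment_x / (2.0 * (1.0 - rho))


def mg1_mean_waiting_time(lam: float, mean_x: float, second_moment_x: float) -> float:
    """E[w] of (14.28), the mean waiting time in the queue."""
    rho = lam * mean_x
    if not 0.0 <= rho < 1.0:
        raise ValueError("steady-state requires 0 <= rho < 1")
    return lam * second_moment_x / (2.0 * (1.0 - rho))


lam, mu = 0.8, 1.0
mm1 = mg1_mean_system_content(lam, 1.0 / mu, 2.0 / mu ** 2)  # exponential service
md1 = mg1_mean_system_content(lam, 1.0 / mu, 1.0 / mu ** 2)  # deterministic service
print(mm1, md1)
```

At a load of 0.8 the deterministic server holds 2.4 packets on average against 4 for the exponential server, and the mean waiting time in the queue is exactly halved.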
Observe a general property of averages in queueing systems: there is a simple pole at $\rho = 1$. Both the average number of packets in the system (and in the queue) and the average waiting time grow unboundedly as $\rho \to 1$.

14.3.2 The waiting time in steady-state


The derivation of the pgf (14.26) of the steady-state system content $S(z)$ has not made any assumption about the service discipline that determines the order in which packets are served. The waiting time (in the queue) and the system time (total time spent in the M/G/1 queueing system) will, of course, depend on the order. As mentioned earlier, a FIFO service discipline is assumed. At each departure time $r_k$, the number of packets left behind by that $k$-th packet is precisely $N_S(r_k)$. With FIFO, this implies that during the total time $T_k$ that the $k$-th packet has spent in the M/G/1 queueing system, precisely $N_S(r_k)$ packets have arrived. Similarly as above in (14.22), we compute the number $\mathcal{A}^*$ of Poisson arrivals during $T_k$ (instead of $x_k$) and directly find, in steady-state,

$$\Pr[\mathcal{A}^* = j] = \frac{\lambda^j}{j!} \int_0^\infty e^{-\lambda t}\, t^j\, dF_T(t) = \frac{(-\lambda)^j}{j!} \left. \frac{d^j \varphi_T(z)}{dz^j} \right|_{z=\lambda}$$

and the corresponding pgf,

$$A^*(z) = \sum_{j=0}^{\infty} \Pr[\mathcal{A}^* = j]\, z^j = \varphi_T(\lambda - \lambda z)$$

where $\varphi_T(z)$ is the Laplace transform of the system time $T$. Since the number of Poisson arrivals $\mathcal{A}^*$ during the system time $T$ of a packet in steady-state equals the number of packets left behind by that packet, $\Pr[\mathcal{A}^* = j] = \Pr[N_{S;D} = j]$. The PASTA property (Theorem 13.5.1) states that, in steady-state, the observed number of packets in the queue at departure or arrival times is equal in distribution to the actual number of packets in the queue or that $\Pr[N_{S;D} = j] = \Pr[N_S = j]$. By considering the pgfs of both sides, $A^*(z) = S(z)$, such that with (14.26),

$$\varphi_T(\lambda - \lambda z) = (1-\rho)\, \frac{(z-1)\, \varphi_x(\lambda - \lambda z)}{z - \varphi_x(\lambda - \lambda z)}$$

After a change of variable $s = \lambda - \lambda z$, we end up with the result that the Laplace transform of the total system time in steady-state is a function of the Laplace transform of the service time

$$\varphi_T(s) = (1-\rho)\, \frac{s\, \varphi_x(s)}{s - \lambda + \lambda \varphi_x(s)} \tag{14.29}$$

Since $T_k = w_k + x_k$ and, in steady-state $T = w + x$, we have that $E\left[e^{-sT}\right] = E\left[e^{-sw} e^{-sx}\right] = E\left[e^{-sw}\right] E\left[e^{-sx}\right]$, where the latter follows from independence between $x$ and $w$. Hence, $\varphi_T(s) = \varphi_x(s)\, \varphi_w(s)$, from which the Laplace transform of the waiting time in the queue follows as

$$\varphi_w(s) = (1-\rho)\, \frac{s}{s - \lambda + \lambda \varphi_x(s)} \tag{14.30}$$

Due to the correspondence with (14.26), these two relations (14.29) and (14.30) are also called Pollaczek-Khinchin equations for the system time and waiting time respectively. For example, for an exponential service time with average $1/\mu$, $\varphi_x(s) = \frac{\mu}{s+\mu}$ and (14.29) becomes $\varphi_T(s) = \frac{\mu(1-\rho)}{s + (\mu - \lambda)}$, which is indeed the Laplace transform of the pdf of total system time (14.4) in the M/M/1 queue. The relation (14.30) can be written in terms of the residual service time (8.17), which after Laplace transform becomes $\varphi_{r_w}(s) = \frac{1 - \varphi_x(s)}{s\, E[x]}$, as

$$\varphi_w(s) = \frac{1-\rho}{1 - \rho\, \varphi_{r_w}(s)}$$

It shows that the dominant tail behavior (see Section 5.7) arises from the pole at $\varphi_{r_w}(s) = 1/\rho$. By formal expansion into a Taylor series (only valid for $|\rho\, \varphi_{r_w}(s)| < 1$), we find

$$\varphi_w(s) = (1-\rho) \sum_{k=0}^{\infty} \rho^k\, \varphi_{r_w}^k(s)$$

or, after taking the inverse Laplace transform,

$$f_w(t) = \frac{d}{dt} \Pr[w \leq t] = \sum_{k=0}^{\infty} (1-\rho)\, \rho^k f_{r_w}^{(k*)}(t)$$

The pdf $f_w(t)$ of the waiting time in the queue can be interpreted as a sum of convolved residual service time pdfs $f_{r_w}^{(k*)}(t)$ weighted by $(1-\rho)\rho^k = \Pr[N_S = k]$, the steady-state probability of the system content in the M/M/1 system (14.1).

14.4 The GI/D/m queue


The analysis of the GI/D/m queue illustrates a discrete-time approach to queueing. Since each of the $m$ servers operates deterministically, which means that per unit time precisely one packet (or an ATM cell or customer) is served, the basic time unit in the analysis, also called a time slot, is equal to that service time. Hence, the arrival process is expressed as a counting process: instead of specifying the interarrival times, the number of arrivals at each time slot is used.
In the sequel, we confine ourselves to a deterministic server discipline that removes during timeslot $k$ precisely $m$ cells from the queue. Hence, we have $\mathcal{X}_k = m$ and $X_k(z) = E\left[z^{\mathcal{X}_k}\right] = z^m$. Substituting (13.16) in (14.23) leads to

$$S_{k+1}(z) = E\left[z^{(\mathcal{S}_k - m)^+ + \mathcal{A}_k}\right] \tag{14.31}$$
At this point, a further general evaluation of the expression (14.31) is only possible by assuming independence between the random variables $\mathcal{A}_k$ and $\mathcal{Q}_k$. From (13.18), it then follows that $S_{k+1}(z) = Q_k(z) A_k(z)$. This crucial assumption facilitates the analysis considerably. For,

$$\begin{aligned} S_{k+1}(z) &= E\left[z^{(\mathcal{S}_k - m)^+} z^{\mathcal{A}_k}\right] \\ &= E\left[z^{(\mathcal{S}_k - m)^+}\right] E\left[z^{\mathcal{A}_k}\right] && \text{(by independence)} \\ &= A_k(z) \sum_{j=0}^{\infty} \Pr[(\mathcal{S}_k - m)^+ = j]\, z^j && \text{(by definition (14.23))} \end{aligned}$$

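The summation can be worked out as

$$\begin{aligned} B &= \sum_{j=0}^{\infty} \Pr[(\mathcal{S}_k - m)^+ = j]\, z^j \\ &= \Pr[(\mathcal{S}_k - m)^+ = 0] + \sum_{j=1}^{\infty} \Pr[\mathcal{S}_k = m + j]\, z^j \\ &= \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j] + \sum_{j=1+m}^{\infty} \Pr[\mathcal{S}_k = j]\, z^{j-m} \end{aligned}$$

Setting in terms of the generating function of $S_k(z)$ yields

$$\begin{aligned} B &= \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j] + z^{-m} \left( \sum_{j=0}^{\infty} \Pr[\mathcal{S}_k = j]\, z^j - \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j]\, z^j \right) \\ &= z^{-m} \left( S_k(z) - \sum_{j=0}^{m} \Pr[\mathcal{S}_k = j] \left(z^j - z^m\right) \right) \end{aligned}$$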

Finally, we obtain a recursion relation for the generating function of the system content in the GI/D/m system at discrete-time $k$,

$$S_{k+1}(z) = A_k(z)\, z^{-m} \left( S_k(z) - \sum_{j=0}^{m-1} \Pr[\mathcal{S}_k = j] \left(z^j - z^m\right) \right) \tag{14.32}$$

In the single-server case $m = 1$, where precisely one cell is served per time slot (provided the queue is not empty), equation (14.32) simplifies with $S_k(0) = \Pr[\mathcal{S}_k = 0]$ to

$$\begin{aligned} S_{k+1}(z) &= A_k(z)\, z^{-1} \left\{ S_k(z) - \Pr[\mathcal{S}_k = 0]\, (1 - z) \right\} \\ &= A_k(z) \left[ \frac{S_k(z) - S_k(0)}{z} + S_k(0) \right] \end{aligned} \tag{14.33}$$

Notice that $S_{k+1}(0) = A_k(0) \left( S_k'(0) + S_k(0) \right)$.


14.4.1 The steady-state of the GI/D/m queue
The steady-state behavior is reached if the system's distributions do not change anymore in time. With $\lim_{k \to \infty} S_k(z) = S(z)$ and $\lim_{k \to \infty} A_k(z) = A(z)$, (14.32) reduces in steady-state to

$$S(z) = \frac{A(z) \sum_{j=0}^{m-1} \Pr[\mathcal{S} = j] \left(z^m - z^j\right)}{z^m - A(z)}$$

Recall that $X_k(z) = z^m$ is the generating function of the service process. At this point, we use the same powerful argument from complex analysis as in Section 11.3.3. Since a generating function of a probability distribution is analytic inside and on the unit circle, the possible zeros of $z^m - A(z)$ inside that unit circle must be precisely cancelled by zeros in the numerator. Clearly, $z = 1$ is a zero of $z^m - A(z)$. On the unit circle, excluding the point $z = 1$ where $A(1) = 1$, we know⁵ that $|A(z)| < 1$ for all points on the unit circle $|z| = 1$ (except for $z = 1$).
The region around $z = 1$ deserves some closer investigation. From the Taylor expansions

$$A(z) = 1 + \lambda (z - 1) + o(z - 1)$$
$$z^m = 1 + m (z - 1) + o(z - 1)$$

and the fact that $\lambda < m$ because a steady-state requires $\rho < 1$, we substitute $z - 1 = \varepsilon e^{i\theta}$, which describes a circle with radius $\varepsilon$ around $z = 1$. Along this circle with arbitrarily small radius $\varepsilon$, we find that

$$|A(z)| = \left|1 + \lambda \varepsilon e^{i\theta} + o(\varepsilon)\right| = \sqrt{(1 + \lambda \varepsilon \cos\theta + o(\varepsilon))^2 + (\lambda \varepsilon \sin\theta + o(\varepsilon))^2}$$
$$|z^m| = \left|1 + m \varepsilon e^{i\theta} + o(\varepsilon)\right| = \sqrt{(1 + m \varepsilon \cos\theta + o(\varepsilon))^2 + (m \varepsilon \sin\theta + o(\varepsilon))^2}$$

which demonstrates that $|z^m| > |A(z)|$ on this arbitrarily small circle if $\cos\theta > 0$ or $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$. Invoking Rouché's Theorem 11.3.1 with $f(z) = z^m$ and $g(z) = -A(z)$ on the contour $C$, the unit circle including the point $z = 1$ by an arbitrarily small arc $\varepsilon$ on the right of $z = 1$ as illustrated in Fig. 14.3, such that $|f(z)| > |g(z)|$ on the contour $C$, shows that $z^m - A(z)$ has precisely $m$ zeros $\zeta_1, \zeta_2, \ldots, \zeta_m = 1$ inside that contour $C$.
Since $Q(z) = \frac{S(z)}{A(z)}$ is the generating function of the number of occupied buffer positions, it is also analytic inside the unit circle. Therefore, the zeros $\{\zeta_n\}_{1 \leq n \leq m-1}$ and $\zeta_m = 1$ are also zeros of $p(z) = \sum_{j=0}^{m-1} \Pr[\mathcal{S} = j] \left(z^m - z^j\right)$. This leads to a set of $m$ equations for each $\zeta_n \neq 0$,

$$\sum_{j=0}^{m-1} \Pr[\mathcal{S} = j] \left(\zeta_n^m - \zeta_n^j\right) = 0$$

which determine the unknown probabilities $\Pr[\mathcal{S} = j]$. Since $p(z)$ is a polynomial of degree $m$, $p(z)$ is entirely determined by its zeros as

$$p(z) = \alpha\, (z - 1) \prod_{n=1}^{m-1} (z - \zeta_n)$$

Fig. 14.3. Details of the contour $C$ in the neighborhood of $z = 1$.

⁵ For any probability generating function $\varphi_G(z)$ it holds for $|z| \leq 1$ that

$$|\varphi_G(z)| = \left| \sum_{j=0}^{\infty} \Pr[G = j]\, z^j \right| \leq \sum_{j=0}^{\infty} \Pr[G = j]\, |z|^j \leq \sum_{j=0}^{\infty} \Pr[G = j] = 1$$

The unknown $\alpha$ is determined from the normalization condition $S(1) = Q(1) = 1$, which is explicitly

$$\lim_{z \to 1} \frac{p(z)}{z^m - A(z)} = 1$$

With de l'Hospital's rule,

$$\alpha^{-1} = \prod_{n=1}^{m-1} (1 - \zeta_n) \lim_{z \to 1} \frac{1}{m z^{m-1} - A'(z)} = \frac{\prod_{n=1}^{m-1} (1 - \zeta_n)}{m - \lambda}$$

Finally, we arrive at the generating function of the buffer content via that of the system content $S(z) = A(z) Q(z)$,

$$Q(z) = \frac{(m - \lambda)(z - 1)}{z^m - A(z)} \prod_{n=1}^{m-1} \frac{z - \zeta_n}{1 - \zeta_n} \tag{14.34}$$

With $A'(1) = \lambda$ the single-server case ($m = 1$) in (14.34) reduces to the well-known result for the pgf of the system and buffer content of a GI/D/1 system respectively as

$$S(z) = \left(1 - A'(1)\right) \frac{(z - 1)\, A(z)}{z - A(z)} \tag{14.35}$$

$$Q(z) = (1 - \lambda)\, \frac{z - 1}{z - A(z)} \tag{14.36}$$

The probability of an empty buffer, $Q(0) = \Pr[\mathcal{Q} = 0]$, immediately follows from (14.34). The average queue length for the single-server ($m = 1$) is obtained as $Q'(1)$, or

$$E[N_Q] = E[\mathcal{Q}] = \frac{A''(1)}{2(1 - \lambda)} \tag{14.37}$$
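The steady-state results (14.36) and (14.37) can be cross-checked numerically for the single-server case by iterating the distribution of the buffer content slot by slot. The sketch below rests on assumptions stated here and in the comments: arrivals per slot are Poisson (so that $A''(1) = \lambda^2$), the buffer recursion is taken as $\mathcal{Q}_{k+1} = (\mathcal{Q}_k + \mathcal{A}_k - 1)^+$, which follows from (14.21) with $\mathcal{Q} = (\mathcal{S} - 1)^+$, and the state space is truncated at `size` positions. The function names are hypothetical.

```python
import math


def poisson_pmf(lam: float, n_max: int) -> list:
    """Pr[A = k] for k = 0..n_max (Poisson arrivals per slot, assumed)."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_max + 1)]


def buffer_pmf_gid1(arrivals: list, size: int = 80, steps: int = 1000) -> list:
    """Iterate the pmf of the buffer content of a single deterministic server.

    Each step applies Q_{k+1} = (Q_k + A_k - 1)^+ to the distribution,
    starting from an empty buffer; mass beyond size - 1 is clipped.
    """
    q = [1.0] + [0.0] * (size - 1)
    for _ in range(steps):
        new = [0.0] * size
        for n, qn in enumerate(q):
            if qn == 0.0:
                continue
            for k, ak in enumerate(arrivals):
                j = min(max(n + k - 1, 0), size - 1)
                new[j] += qn * ak
        q = new
    return q


lam = 0.8
q = buffer_pmf_gid1(poisson_pmf(lam, 12))
mean_q = sum(j * p for j, p in enumerate(q))
print(round(q[0], 4), round(mean_q, 4))
```

For Poisson arrivals with $\lambda = 0.8$, the iteration should approach $Q(0) = (1-\lambda)e^{\lambda}$ from (14.36) and the mean $\lambda^2 / (2(1-\lambda)) = 1.6$ from (14.37).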

14.4.2 The waiting time in a GI/D/m system


Let $T = w + 1$ denote the steady-state system time of an arbitrary packet, a test packet, in units of a timeslot. In addition to the system content $\mathcal{S}$ that describes the number of packets in the system at the beginning of a time slot, an additional random variable must be introduced: $\mathcal{F}$ denotes the number of packets that arrive in the same timeslot just before the test packet. In the D-server and assuming a FIFO discipline, these $\mathcal{F}$ packets will be served before the test packet, possibly in the same time slot. The system time of the test packet equals

$$T = \left\lfloor \frac{(\mathcal{S} - m)^+ + \mathcal{F}}{m} \right\rfloor + 1$$

where $\lfloor x \rfloor$ denotes the largest integer smaller than or equal to $x$. Indeed, $(\mathcal{S} - m)^+ + \mathcal{F}$ is the number of packets in the system just before the arrival of the test packet. At the beginning of a timeslot, at most $m$ packets are served, which explains the integer division. The service time takes precisely one additional time slot. Let us simplify the notation by defining $\mathcal{R} = (\mathcal{S} - m)^+ + \mathcal{F}$. From this expression for the system time, we deduce, for each integer $k \geq 1$ (the minimal waiting time in the system equals 1 timeslot), that

$$\Pr[T = k] = \sum_{j=0}^{m-1} \Pr[\mathcal{R} = (k-1)m + j]$$

such that the generating function of the waiting time $T(z)$ is

$$T(z) = \sum_{k=1}^{\infty} \Pr[T = k]\, z^k = \sum_{k=1}^{\infty} \sum_{j=0}^{m-1} \Pr[\mathcal{R} = (k-1)m + j]\, z^k = \sum_{j=0}^{m-1} \sum_{k=0}^{\infty} \Pr[\mathcal{R} = km + j]\, z^{k+1}$$

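Also,

$$\begin{aligned} T(z^m) &= z^m \sum_{j=0}^{m-1} z^{-j} \sum_{k=0}^{\infty} \Pr[\mathcal{R} = km + j]\, z^{mk+j} \\ &= z^m \sum_{j=0}^{m-1} z^{-j} \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n\, \delta_{n, mk+j} \\ &= z^m \sum_{j=0}^{m-1} z^{-j} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n \sum_{k=0}^{\infty} \delta_{n, mk+j} \end{aligned}$$

where the Kronecker delta $\delta_{k,m} = 1$ if $k = m$ else $\delta_{k,m} = 0$. The sum

$$\sum_{k=0}^{\infty} \delta_{n, mk+j} = \sum_{k=-\infty}^{\infty} \delta_{n-j, mk} = 1_{m | n-j}$$

is one if $m$ divides $n - j$ else it is zero. Such an expression can be written as

$$\sum_{k=-\infty}^{\infty} \delta_{n-j, mk} = \frac{1}{m}\, \frac{1 - e^{2\pi i (n-j)}}{1 - e^{2\pi i (n-j)/m}} = \frac{1}{m} \sum_{k=0}^{m-1} e^{2\pi i k (n-j)/m}$$

Using the latter summation yields

$$\begin{aligned} T(z^m) &= \frac{z^m}{m} \sum_{j=0}^{m-1} z^{-j} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n \sum_{k=0}^{m-1} e^{2\pi i k (n-j)/m} \\ &= \frac{z^m}{m} \sum_{k=0}^{m-1} \sum_{j=0}^{m-1} \left(z e^{2\pi i k/m}\right)^{-j} \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n] \left(z e^{2\pi i k/m}\right)^n \\ &= \frac{z^m}{m} \sum_{k=0}^{m-1} \frac{1 - z^{-m}}{1 - \left(z e^{2\pi i k/m}\right)^{-1}}\, R\left(z e^{2\pi i k/m}\right) \end{aligned}$$

where we have introduced the generating function $R(z) = \sum_{n=0}^{\infty} \Pr[\mathcal{R} = n]\, z^n$. Thus, we arrive at

$$T(z^m) = \frac{z^m - 1}{m} \sum_{k=0}^{m-1} \frac{1}{1 - \left(z e^{2\pi i k/m}\right)^{-1}}\, R\left(z e^{2\pi i k/m}\right)$$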

The generating function $R(z)$ can be specified further since

$$R(z) = E\left[z^{(\mathcal{S} - m)^+ + \mathcal{F}}\right] = E\left[z^{\mathcal{F}}\right] E\left[z^{(\mathcal{S} - m)^+}\right]$$

where independence of the arrival process and queueing process (GI) has been used. From (14.31), the corresponding steady-state relation is

$$S(z) = E\left[z^{(\mathcal{S} - m)^+ + \mathcal{A}}\right] = E\left[z^{\mathcal{A}}\right] E\left[z^{(\mathcal{S} - m)^+}\right] = A(z)\, E\left[z^{(\mathcal{S} - m)^+}\right]$$

while from $S(z) = A(z) Q(z)$, we observe that $Q(z) = E\left[z^{(\mathcal{S} - m)^+}\right]$. Hence,

$$R(z) = F(z)\, Q(z)$$

where $Q(z)$ is given by (14.34).
We now turn our attention to the determination of $F(z)$, the generating function of the number of packets in front of the test packet. The test packet has been uniformly chosen out of the total flow of arriving packets at the system. Let us denote by $\mathcal{A}^*$ the number of arriving packets in the same time slot as the test packet. The random variable $\mathcal{A}^*$ is not the same as the number of arriving cells $\mathcal{A}$ per time slot. For example, we know that there is at least one arrival in the time slot of the test packet, namely, the test packet itself, hence, $\Pr[\mathcal{A}^* = 0] = 0$. Furthermore, the larger the number of arriving packets in a time slot, the higher the probability that the test packet is chosen out of those packets in this time slot. Hence, $\Pr[\mathcal{A}^* = j]$ is proportional to the number $j$ of arriving packets in a time slot. In addition, $\Pr[\mathcal{A}^* = j]$ is also proportional to $\Pr[\mathcal{A} = j]$, which describes how likely a number $j$ of arriving packets is. Combining both shows that

$$\Pr[\mathcal{A}^* = j] = \alpha\, j \Pr[\mathcal{A} = j]$$

with the proportionality factor equal to $\alpha = \frac{1}{E[\mathcal{A}]}$ because $\sum_{j=0}^{\infty} \Pr[\mathcal{A}^* = j] = 1$. The test packet is uniformly distributed among the arriving packets $\mathcal{A}^*$ in the time slot of the test packet (in steady-state). The probability of having precisely $k$ packets in front of the test packet given $\mathcal{A}^* = j \geq 1$ equals

$$\Pr[\mathcal{F} = k \mid \mathcal{A}^* = j] = \frac{1_{k < j}}{j}$$

Indeed, the test packet has equal probability $\frac{1}{j}$ of occupying any of the $j$ possible positions. The occupation of a position $k + 1$ implies precisely $k$ cells in front of the test packet in a FIFO discipline. Using the law of total probability (2.46),

$$\begin{aligned} \Pr[\mathcal{F} = k] &= \sum_{j=1}^{\infty} \Pr[\mathcal{F} = k \mid \mathcal{A}^* = j] \Pr[\mathcal{A}^* = j] \\ &= \sum_{j=k+1}^{\infty} \frac{1}{j}\, \frac{j \Pr[\mathcal{A} = j]}{E[\mathcal{A}]} = \frac{1}{E[\mathcal{A}]} \sum_{j=k+1}^{\infty} \Pr[\mathcal{A} = j] \end{aligned}$$

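The generating function $F(z)$ becomes

$$\begin{aligned} F(z) &= \sum_{k=0}^{\infty} \Pr[\mathcal{F} = k]\, z^k = \frac{1}{E[\mathcal{A}]} \sum_{k=0}^{\infty} \sum_{j=k+1}^{\infty} \Pr[\mathcal{A} = j]\, z^k \\ &= \frac{1}{E[\mathcal{A}]} \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j] \sum_{k=1}^{j} z^{k-1} = \frac{1}{E[\mathcal{A}]} \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j]\, \frac{1 - z^j}{1 - z} \\ &= \frac{1}{E[\mathcal{A}]\, (z - 1)} \left( \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j]\, z^j - \sum_{j=1}^{\infty} \Pr[\mathcal{A} = j] \right) \\ &= \frac{A(z) - \Pr[\mathcal{A} = 0] - 1 + \Pr[\mathcal{A} = 0]}{E[\mathcal{A}]\, (z - 1)} \end{aligned}$$

Since $E[\mathcal{A}] = A'(1) = \lambda$, we finally arrive at

$$F(z) = \frac{A(z) - 1}{(z - 1)\, A'(1)}$$

Combining all involved expressions leads to the expression of the generating function of the total time spent in the GI/D/m queue

$$T(z^m) = \frac{z^m - 1}{m} \left( \frac{m}{\lambda} - 1 \right) \sum_{k=0}^{m-1} \frac{A\left(z e^{2\pi i k/m}\right) - 1}{\left(1 - \left(z e^{2\pi i k/m}\right)^{-1}\right) \left(z^m - A\left(z e^{2\pi i k/m}\right)\right)} \prod_{n=1}^{m-1} \frac{z e^{2\pi i k/m} - \zeta_n}{1 - \zeta_n}$$

For the single-server case ($m = 1$), the generating function of the system time (queueing time plus service time) considerably simplifies to

$$T(z) = \left( \frac{1}{\lambda} - 1 \right) z\, \frac{A(z) - 1}{z - A(z)}$$

from which $E[T]$ and $\mathrm{Var}[T]$ readily follow. The computation of the pdf given the arrival process $A(z)$ is more complex, as illustrated for the M/D/1/K queue in the next section.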

14.5 The M/D/1/K queue


Suppose we have a buffer of $K$ cells and an aggregate arrival stream consisting of a large number of sources with none of them dominating the others. This input process is well modeled by a Poisson process with arrival rate $\lambda$. Both the input process as well as the buffer content and the output process have been simulated in Fig. 14.4. Observe the effect of variations and maximum number of cells in the queue and input process!

[Figure 14.4: three panels with horizontal axis "timeslot" (0 to 1000) and vertical axes "number of arriving cells" (Poisson(0.8)), "number of cells in queue" and "number of served cells".]

Fig. 14.4. On the left, the Poisson input process with $\lambda = 0.8$ in terms of the number of cells versus the timeslot. In the middle, the buffer occupancy for a buffer with $K = 20$ as function of time. On the right, the M/D/1/K output process in cells served per timeslot.

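14.5.1 The pdf of the buffer occupancy in the M/D/1 queue

For a Poisson process, $\Pr[\mathcal{A} = k] = \frac{\lambda^k}{k!} e^{-\lambda}$ and $A(z) = e^{\lambda(z-1)}$. The pgf $Q(z)$ of the buffer content immediately follows from (14.36) as

$$Q(z) = (1 - \lambda)\, \frac{(1 - z)\, e^{\lambda(1-z)}}{1 - z\, e^{\lambda(1-z)}}$$

The average queue length is obtained from (14.37) as

$$E\left[N_{Q;M/D/1}\right] = E[\mathcal{Q}] = \frac{\lambda^2}{2(1 - \lambda)}$$

and Little's law (13.21) provides the average waiting time in the queue

$$E\left[w_{M/D/1}\right] = \frac{E\left[N_{Q;M/D/1}\right]}{\lambda} = \frac{\lambda}{2(1 - \lambda)} \tag{14.38}$$

Since $z\, e^{\lambda(1-z)}$ is analytic everywhere, there always exists a neighborhood (depending on $\lambda$) around $z = 0$ for which $|z\, e^{\lambda(1-z)}| < 1$. Hence, we can use the series expansion for the geometric series to obtain

$$\frac{Q(z)}{1 - \lambda} = (1 - z)\, e^{\lambda(1-z)} \sum_{k=0}^{\infty} z^k e^{k\lambda(1-z)} = \sum_{k=0}^{\infty} (1 - z)\, z^k e^{(k+1)\lambda(1-z)}$$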

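Integrating with respect to $\lambda$ removes the factor $(1 - z)$,

$$\begin{aligned} \int \frac{Q(z)}{1 - \lambda}\, d\lambda &= \sum_{k=0}^{\infty} \frac{z^k}{k+1}\, e^{(k+1)\lambda} e^{-(k+1)\lambda z} \\ &= \sum_{k=0}^{\infty} \frac{z^k}{k+1}\, e^{(k+1)\lambda} \sum_{n=0}^{\infty} \frac{(-1)^n (k+1)^n \lambda^n}{n!}\, z^n \\ &= \sum_{k=0}^{\infty} \sum_{n=0}^{\infty} \frac{(-1)^n}{n!}\, e^{(k+1)\lambda} (k+1)^{n-1} \lambda^n z^{n+k} \end{aligned}$$

After a change in the variable $m = n + k$ with $m \geq 0$, which implies that $k \leq m$ since $n = m - k \geq 0$, we have

$$\begin{aligned} \int \frac{Q(z)}{1 - \lambda}\, d\lambda &= \sum_{m=0}^{\infty} \left( \sum_{k=0}^{m} \frac{(-1)^{m-k}}{(m-k)!}\, e^{(k+1)\lambda} (k+1)^{m-k-1} \lambda^{m-k} \right) z^m \\ &= \sum_{m=0}^{\infty} \left( \sum_{k=1}^{m+1} \frac{(-1)^{m+1-k}}{(m+1-k)!}\, e^{k\lambda} k^{m-k} \lambda^{m+1-k} \right) z^m \end{aligned}$$

Differentiation with respect to $\lambda$ gives

$$Q(z) = (1 - \lambda) \sum_{m=0}^{\infty} \left( \sum_{k=1}^{m+1} (-1)^{m+1-k} e^{k\lambda} \left[ \frac{(k\lambda)^{m+1-k}}{(m+1-k)!} + \frac{(k\lambda)^{m-k}}{(m-k)!} \right] \right) z^m$$

where the second term in the brackets is absent for $k = m + 1$. From this we finally deduce the probability $q[m] = \Pr[\mathcal{Q} = m]$ that $m$ positions in the buffer are occupied⁶

$$q[m] = (1 - \lambda) \sum_{k=1}^{m+1} (-1)^{m+1-k} e^{k\lambda} \left[ \frac{(k\lambda)^{m+1-k}}{(m+1-k)!} + \frac{(k\lambda)^{m-k}}{(m-k)!} \right] \tag{14.39}$$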

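One readily observes from the derivation above that the probability $s[m] = \Pr[\mathcal{S} = m]$ that $m$ positions in the system are occupied, is $s[m] = q[m-1]$ for $m \geq 2$ because the $z$-transform is $S(z) = (1 - \lambda)\, \frac{1 - z}{1 - z\, e^{\lambda(1-z)}}$. This result is a characteristic property of a deterministic server. Next, we rewrite (14.39) as

$$\begin{aligned} q[m] &= (1 - \lambda) \left[ \sum_{k=1}^{m+1} (-1)^{m+1-k} e^{k\lambda}\, \frac{(k\lambda)^{m+1-k}}{(m+1-k)!} - \sum_{k=1}^{m} (-1)^{m-k} e^{k\lambda}\, \frac{(k\lambda)^{m-k}}{(m-k)!} \right] \\ &= (1 - \lambda) \left[ e^{(m+1)\lambda}\, g\left(-\lambda e^{-\lambda}; m+1\right) - e^{m\lambda}\, g\left(-\lambda e^{-\lambda}; m\right) \right] \end{aligned} \tag{14.40}$$

where

$$g(x; m) = \sum_{k=0}^{m} \frac{x^k}{k!}\, (m - k)^k \tag{14.41}$$

⁶ Explicitly, we have for the queue content probabilities

$$\begin{aligned} q[0] &= e^{\lambda} (1 - \lambda) \\ q[1] &= e^{\lambda} (1 - \lambda) \left( e^{\lambda} - \lambda - 1 \right) \\ q[2] &= e^{\lambda} (1 - \lambda) \left( \frac{\lambda^2}{2} + e^{2\lambda} + \lambda - e^{\lambda} - 2\lambda e^{\lambda} \right) \end{aligned}$$

while the system content probabilities are

$$\begin{aligned} s[0] &= (1 - \lambda) \\ s[1] &= \left( e^{\lambda} - 1 \right) (1 - \lambda) \\ s[m] &= q[m-1] \qquad (m \geq 2) \end{aligned}$$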

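Due to the nature of the differences, we immediately find the cumulative distribution,

$$\sum_{m=1}^{K} q[m] = (1 - \lambda) \left[ e^{(K+1)\lambda}\, g\left(-\lambda e^{-\lambda}; K+1\right) - e^{\lambda} \right]$$

and since $q[0] = (1 - \lambda)\, e^{\lambda}$, we arrive at

$$\Pr[\mathcal{Q} \leq K] = \sum_{m=0}^{K} q[m] = (1 - \lambda)\, e^{(K+1)\lambda}\, g\left(-\lambda e^{-\lambda}; K+1\right) \tag{14.42}$$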

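The expressions in (14.40) are numerically only useful for small $m$ because the series is alternating. This problem may be solved by considering a famous result due to Lagrange (Markushevich, 1985, Vol. 2, Chapter 3, Section 14)

$$e^{bz} = 1 + b \sum_{n=1}^{\infty} \frac{(b + na)^{n-1}}{n!} \left( z e^{-az} \right)^n \tag{14.43}$$

that converges for $|z| \leq a^{-1}$. Differentiation of (14.43) with respect to $w = z e^{-az}$ leads to

$$\frac{e^{(b+a)z}}{1 - az} = \sum_{n=0}^{\infty} \frac{(b + a + na)^n}{n!}\, z^n e^{-naz}$$

because $\frac{d}{dw} e^{bz} = \frac{d}{dz} e^{bz}\, \frac{dz}{dw} = \frac{b\, e^{bz}}{(1 - az)\, e^{-az}}$. Choosing $a = 1$, $b + a = -m$ and $z = \lambda$, we obtain

$$\frac{e^{-m\lambda}}{1 - \lambda} = \sum_{n=0}^{\infty} \frac{(m - n)^n}{n!} \left( -\lambda e^{-\lambda} \right)^n = g\left(-\lambda e^{-\lambda}; m\right) + \sum_{n=m+1}^{\infty} \frac{(n - m)^n}{n!} \left( \lambda e^{-\lambda} \right)^n$$

and thus

$$g\left(-\lambda e^{-\lambda}; m\right) = \frac{e^{-m\lambda}}{1 - \lambda} - \sum_{n=m+1}^{\infty} \frac{(n - m)^n}{n!} \left( \lambda e^{-\lambda} \right)^n$$

where the infinite series consists of merely positive terms. Substituted in (14.42), this finally yields

$$\Pr[\mathcal{Q} > K] = (1 - \lambda)\, \lambda^{K+1} \sum_{n=1}^{\infty} \frac{n^{n+K+1}}{(n + K + 1)!} \left( \lambda e^{-\lambda} \right)^n \tag{14.44}$$

In the heavy traffic limit $\lambda = \rho \to 1$, the dominant zero (5.35) of $1 - z\, e^{\lambda(1-z)}$ is approximately equal to $\zeta \approx 1 + \frac{2(1-\rho)}{\rho}$ and the resulting tail asymptotic (5.31) for the buffer occupancy pdf is

$$\Pr[\mathcal{Q} > K] \approx \frac{1 - \rho}{\rho\zeta - 1}\, \zeta^{-K-1} \approx e^{-K \log \zeta} \approx e^{-2K \frac{(1-\rho)}{\rho}} \tag{14.45}$$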
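The alternating sum (14.39) and the positive series (14.44) can be compared numerically. The sketch below is only an illustration under the reading of (14.39) in which the term with $(m-k)!$ is absent for $k = m+1$; the function names are hypothetical, and the series (14.44) is truncated after a fixed number of terms, with each term evaluated through logarithms to avoid overflow.

```python
import math


def md1_queue_pmf(lam: float, m: int) -> float:
    """q[m] = Pr[Q = m] from the alternating sum (14.39)."""
    total = 0.0
    for k in range(1, m + 2):
        term = (k * lam) ** (m + 1 - k) / math.factorial(m + 1 - k)
        if k <= m:  # the second term does not exist for k = m + 1
            term += (k * lam) ** (m - k) / math.factorial(m - k)
        total += (-1) ** (m + 1 - k) * math.exp(k * lam) * term
    return (1.0 - lam) * total


def md1_tail(lam: float, K: int, terms: int = 400) -> float:
    """Pr[Q > K] from the positive series (14.44), truncated after `terms`."""
    x = lam * math.exp(-lam)
    s = 0.0
    for n in range(1, terms + 1):
        # log of n^(n+K+1) / (n+K+1)! * x^n
        log_term = (n + K + 1) * math.log(n) - math.lgamma(n + K + 2) + n * math.log(x)
        s += math.exp(log_term)
    return (1.0 - lam) * lam ** (K + 1) * s


lam = 0.5
q = [md1_queue_pmf(lam, m) for m in range(16)]
print([round(p, 6) for p in q[:4]], md1_tail(lam, 3))
```

For small $m$ both routes agree to high precision; for larger $m$ the cancellation in (14.39) grows, which is the numerical limitation the text points out and the reason the positive series is preferable for tail probabilities.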

14.6 The N*D/D/1 queue


The N*D/D/1 queue is a basic model for constant bit rate sources in ATM, as shown in Fig. 14.5. The input process consists of a superposition of $N$ independent periodic sources with the same period $D$ but randomly phased, i.e. arbitrarily shifted in time with respect to each other. The server operates deterministically and serves one ATM cell per timeslot. The buffer size is assumed to be infinitely long mainly to enable an exact analytic solution. During a time period $D$ measured in time slots or server time units, precisely $N$ cells arrive such that the traffic intensity or load (13.2) equals $\rho = \frac{N}{D}$.

[Figure 14.5: input lines 1, 2, 3, ..., N; the remaining labels in the figure are K and D.]

Fig. 14.5. Sketch of an ATM concentrator where $N$ input lines are multiplexed onto a single output line. The N*D/D/1 queue models this ATM basic switching unit accurately.

Whereas the arrivals in the M/D/1 queue are uncorrelated, the successive interarrival times in the N*D/D/1 queue are negatively correlated. For the same average arrival rate, this more regular arrival process results in shorter queues than in the M/D/1 queue, where the higher variability in the arrival process causes longer queues.
Due to the dependence of the arrivals over many timeslots, the solution method is based on the Beneš approach and starts from the complementary distribution (13.15) for the virtual waiting time or unfinished work in steady-state, $\Pr[v(t_\infty) > x] = \lim_{t \to \infty} \Pr[v(t) > x]$. Applied to the N*D/D/1 queue the unfinished work equals the number of ATM cells in the system, thus $\Pr[v(t_\infty) > x] = \Pr[N_S > x]$. Hence, in the steady-state for $\rho < 1$ or $N < D$,

$$\Pr[N_S > x] = \sum_{k=\lceil x \rceil}^{\infty} \Pr\left[ v(t_\infty + x - k) = 0 \,\middle|\, \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k \right] \Pr\left[ \int_{t_\infty + x - k}^{t_\infty} N_A(u)\, du = k \right]$$

The periodic cell trains with period equal to $D$ timeslots at each input line lead to a periodic aggregated arrival stream of the $N$ input lines also with period $D$. Each cell train transports precisely one cell per period $D$, which allows us to observe the characteristics of the aggregated arrival process during the time interval $[0, D)$. The computations are most conveniently performed if we choose the steady-state observation point $t_\infty = D$. Each of the $N$ ATM cells arrives uniformly in $[0, D]$ due to the random phasing of each cell train and the probability that it arrives in $[D + x - k, D]$ is $p = \frac{k - x}{D}$. Hence, the number of arrivals in $[D + x - k, D]$ is a sum of Bernoulli random variables, which is binomially distributed,

$$\Pr\left[ \int_{D + x - k}^{D} N_A(u)\, du = k \right] = \binom{N}{k} \left( \frac{k - x}{D} \right)^k \left( 1 - \frac{k - x}{D} \right)^{N-k}$$
The conditional probability is obtained as follows. The unfinished work at time $D + x - k$ only depends on past arrivals in the interval $[0, D + x - k]$. Given that the number of arrivals in $[D + x - k, D]$ equals $k$ while there are always precisely $N$ in $[0, D]$, the number of arrivals in $[0, D + x - k]$ equals $N - k$ and the corresponding traffic intensity $\rho' = \frac{N - k}{D + x - k} < 1$ since $N < D$ and, thus, $N - k < D - k$ for any $k$. From Section 13.3.2, we use the local stationary result: for any stationary single-server queueing system with traffic intensity $\rho$, the probability of an empty system at an arbitrary time is $1 - \rho$. If we take a random point in $t \in [0, D + x - k]$, then stationarity implies that $\Pr[v(t) = 0] = 1 - \rho' = \frac{D + x - N}{D + x - k}$. Since $\rho' < 1$, the system is necessarily empty at some instant $t^*$ in $[0, D + x - k)$. As explained in Section 13.2 and Section 13.3, we may consider that the system restarts from $t^*$ on ignoring the past. But the probability $\Pr[v(t) = 0]$ at a random time in the interval $[0, t^*]$ and in $[t^*, D + x - k]$ is the same, which means that $\Pr[v(D + x - k) = 0] = \frac{D + x - N}{D + x - k}$ because we can periodically repeat the system's process in $[0, t^*]$ while omitting any activity in $[t^*, D + x - k]$. With respect to this newly constructed periodic arrival pattern, the point $t = D + x - k$ is arbitrary such that the local stationary result is applicable. In summary, we arrive at the overflow probability in a N*D/D/1 system,

$$\Pr[N_S > x] = \sum_{k=\lceil x \rceil}^{N} \frac{D + x - N}{D + x - k} \binom{N}{k} \left( \frac{k - x}{D} \right)^k \left( 1 - \frac{k - x}{D} \right)^{N-k} \tag{14.46}$$

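Observe that $\Pr[N_S > N] = 0$. Rewriting (14.46) yields

$$
\begin{aligned}
\Pr[N_S > x] &= \frac{D+x-N}{D^N} \sum_{k=\lceil x\rceil}^{N} \binom{N}{k} (k-x)^k (D+x-k)^{N-k-1} \\
&= \frac{D+x-N}{D^N} \sum_{j=0}^{N-\lceil x\rceil} \binom{N}{j} (N-j-x)^{N-j} (D+x-N+j)^{j-1} \\
&= \frac{D+x-N}{D^N} \sum_{j=0}^{N} \binom{N}{j} (D+x-N+j)^{j-1} (N-x-j)^{N-j} \\
&\quad - \frac{D+x-N}{D^N} \sum_{j=N-\lceil x\rceil+1}^{N} \binom{N}{j} (D+x-N+j)^{j-1} (N-j-x)^{N-j}
\end{aligned}
$$

Applying Abel's identity (Comtet, 1974, p. 128), valid for all $u, y, z$,

$$
(u+y)^n = u \sum_{k=0}^{n} \binom{n}{k} (u - kz)^{k-1} (y + kz)^{n-k} \tag{14.47}
$$

with $n = N$, $u = D + x - N$, $y = N - x$ and $z = -1$ shows that the first sum equals $D^N/(D+x-N)$, so that

$$
\Pr[N_S \le x] = \frac{D+x-N}{D^N} \sum_{j=N-\lceil x\rceil+1}^{N} \binom{N}{j} (D+x-N+j)^{j-1} (N-j-x)^{N-j} \tag{14.48}
$$

demonstrating that, indeed, $\Pr[N_S \le 0] = 0$. For small $x$, relation (14.48)
is convenient, while (14.46) is more suited for large $x \to N$. For example,
$\Pr[N_S \le 1] = \frac{D+1-N}{D+1}\left(1 + \frac{1}{D}\right)^N$, while $\Pr[N_S > N-1] = \left(\frac{1}{D}\right)^N$.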
In the heavy traffic regime for $\rho = \frac{N}{D} \to 1$, a Brownian motion approximation (Roberts, 1991) is

$$
\Pr[N_S > x] \simeq e^{-2x\left(\frac{x}{N} + 1 - \rho\right)} \tag{14.49}
$$

Figure 14.6 compares the exact (14.46) overflow probability and the Brownian approximation (14.49) for $\rho = 0.95$. Observe from (14.45) that

$$
\Pr[N_S > x] \simeq e^{-\frac{2x^2}{N}} \Pr\left[N_{M/D/1} > x\right]
$$

which shows that, for sufficiently high $N$, the overflow probability of the
N*D/D/1 queue tends to that of the M/D/1 queue. Thus, an arrival process
consisting of a superposition of a large number of periodic processes tends to
a Poisson arrival process. The decaying factor $e^{-\frac{2x^2}{N}}$ reflects the effect of the
negative correlations in the arrival process and shows that a Poisson process
overestimates the tail probability in heavy traffic. Comparison of (14.46)
and (14.44) for lower loads $\rho = \frac{N}{D}$ illustrates that the Poisson approximation
becomes more accurate.
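The factorization of the Brownian approximation can be made concrete with a short sketch. It assumes that the heavy-traffic M/D/1 tail in (14.45), which lies outside this passage, has the form $e^{-2(1-\rho)x}$; that form and all function names are assumptions made for illustration.

```python
from math import exp


def brownian_overflow(N, rho, x):
    """Brownian approximation (14.49) of Pr[N_S > x] in the N*D/D/1 queue."""
    return exp(-2.0 * x * (x / N + 1.0 - rho))


def md1_heavy_traffic(rho, x):
    """Assumed heavy-traffic M/D/1 tail exp(-2(1 - rho)x), taken as the content of (14.45)."""
    return exp(-2.0 * (1.0 - rho) * x)


def correlation_factor(N, x):
    """Decaying factor exp(-2x^2/N) caused by the negative correlations."""
    return exp(-2.0 * x * x / N)


if __name__ == "__main__":
    rho, x = 0.95, 40.0
    for N in (50, 100, 200, 500, 1000, 5000):
        # As N grows the factor tends to 1 and the Poisson (M/D/1) tail is approached.
        print(N, brownian_overflow(N, rho, x), correlation_factor(N, x))
    print("M/D/1 limit:", md1_heavy_traffic(rho, x))
```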
[Figure 14.6: logarithmic plot of $\Pr[N_S > x]$ (from $10^{0}$ down to $10^{-15}$) versus $x$ (0 to 100) at $\rho = 0.95$ for $N$ = 50, 100, 200, 500, 1000 and 5000, showing the exact result, the Brownian approximation and the M/D/1 limit $N \to \infty$.]

Fig. 14.6. The overflow probability $\Pr[N_S > x]$ in the N*D/D/1 queue for $\rho = 0.95$
and various numbers of sources $N$.


14.7 The AMS queue


Many arrival patterns in telecommunication exhibit active or "on" periods
succeeded by silent, inactive or "off" periods. At the burst or flow level,
phenomena of the order of time of an on-off period are dominant and the finer
details of the individual packet arrivals within an on-period can be ignored.
The stream of packets can be regarded as a continuous fluid characterized
by the flow arrival rate.
The AMS queue is perhaps the simplest exactly solvable queueing model
that describes the queueing behavior at the burst or flow level. The AMS
queue, named after Anick, Mitra and Sondhi (Anick et al., 1982; Mitra, 1988),
considers $N$ homogeneous, independent on-off sources in a continuous fluid
flow approach. For each source, both the on- and off-period are exponentially
distributed, which makes the model Markovian by Theorem 10.2.3. In the
on-period each source emits a unit amount of information. Hence, at each
moment in time when $r$ sources are in the on-period, $r$ packets (units of
information) arrive at the buffer. The service rate is constant and equal to
$c < N$ packets per unit time. If $c > N$, then the buffer is always empty. The
time unit is chosen as the average time of an on-period while the average
time of an off-period is denoted by $\frac{1}{\lambda}$. The buffer size is infinitely long. The
traffic intensity then equals $\rho = \frac{N\lambda}{c(1+\lambda)}$ and stability requires that $\rho < 1$.
The ratio $\frac{\lambda}{1+\lambda}$ is the long term "on" time fraction of the sources.
Suppose the number of on-sources at time $t$ is $i$. During the next time
interval $\Delta t$ only two elementary actions can take place: a new source can
start with probability $(N - i)\lambda\,\Delta t$ or a source can turn off with probability
$i\,\Delta t$. Compound events have probabilities $O(\Delta t^2)$. The probability of no
change in the arrival process is $1 - [(N - i)\lambda + i]\Delta t$ during which $i$ sources
are active and the queue empties at rate $c - i$. The AMS queueing process is
a birth-death process where state $i$ describes the number of on-sources and
where the birth rate $\lambda_i = (N - i)\lambda$ and the death rate $\mu_i = i$. Let $P_i(t, x)$
where $0 \le i \le N$, $t \ge 0$, $x \ge 0$ be the probability that at time $t$, $i$ sources
are on and the buffer content does not exceed $x$. Then, we have

$$
\begin{aligned}
P_i(t + \Delta t, x) &= [N - (i - 1)]\lambda\,\Delta t\,P_{i-1}(t, x) + (i + 1)\,\Delta t\,P_{i+1}(t, x) \\
&\quad + [1 - \{(N - i)\lambda + i\}\Delta t]\,P_i(t, x - (i - c)\Delta t) + O(\Delta t^2)
\end{aligned}
$$

Passing to the limit $\Delta t \to 0$ yields

$$
\frac{\partial P_i(t, x)}{\partial t} + (i - c)\frac{\partial P_i(t, x)}{\partial x} = (N - i + 1)\lambda P_{i-1}(t, x) - [(N - i)\lambda + i]P_i(t, x) + (i + 1)P_{i+1}(t, x)
$$
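The time-independent equilibrium probabilities, $\pi_i(x) = \lim_{t\to\infty} P_i(t, x)$,
reflect the steady-state where $i$ sources are on and the buffer content does
not exceed $x$. Setting $\frac{\partial P_i(t,x)}{\partial t} = 0$, the steady-state equations become for
$0 \le i \le N$

$$
(i - c)\frac{d\pi_i(x)}{dx} = (N - i + 1)\lambda\,\pi_{i-1}(x) - [(N - i)\lambda + i]\,\pi_i(x) + (i + 1)\,\pi_{i+1}(x) \tag{14.50}
$$

In matrix notation, where $\pi(x)$ is a column vector as opposed to Markov
theory where $\pi(x)$ is a row vector,

$$
D\,\frac{d\pi(x)}{dx} = Q\,\pi(x) \tag{14.51}
$$

and $D = \mathrm{diag}[-c, 1 - c, 2 - c, \ldots, N - c]$ and $Q$ is a tri-diagonal $(N + 1) \times (N + 1)$ matrix,

$$
Q = \begin{bmatrix}
-N\lambda & 1 & 0 & 0 & \cdots & 0 & 0 \\
N\lambda & -[(N-1)\lambda + 1] & 2 & 0 & \cdots & 0 & 0 \\
0 & (N-1)\lambda & -[(N-2)\lambda + 2] & 3 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & -[\lambda + (N-1)] & N \\
0 & 0 & 0 & 0 & \cdots & \lambda & -N
\end{bmatrix}
$$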
The buffer overflow probability is $\Pr[N_S > x] = 1 - \|\pi(x)\|_1 = 1 - \sum_{j=0}^{N} \pi_j(x)$, which implies that $\|\pi(\infty)\|_1 = 1$. Moreover, $\lim_{x\to\infty} \frac{d\pi(x)}{dx} = 0$
and $Q\pi(\infty) = 0$ corresponds to the steady-state of the continuous Markov
chain (arrival process and service process as a whole). Furthermore,

$$
\pi_i(\infty) = \frac{1}{(1 + \lambda)^N}\binom{N}{i}\lambda^i \tag{14.52}
$$

is the probability that $i$ out of $N$ sources are on simultaneously irrespective
of what the buffer level in the system is.
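The binomial law (14.52) can be recovered directly from the birth-death rates $\lambda_i = (N - i)\lambda$ and $\mu_i = i$. The sketch below is an illustrative check, not the book's method: it builds the stationary distribution by detailed balance, compares it with (14.52), and evaluates the rows of $Q\pi(\infty)$. All function names are hypothetical.

```python
from math import comb


def on_source_distribution(N, lam):
    """Stationary law of the number of on-sources via detailed balance:
    pi[i + 1] * (i + 1) = pi[i] * (N - i) * lam."""
    weights = [1.0]
    for i in range(N):
        weights.append(weights[-1] * (N - i) * lam / (i + 1))
    total = sum(weights)
    return [w / total for w in weights]


def binomial_closed_form(N, lam):
    """Closed form (14.52)."""
    return [comb(N, i) * lam**i / (1 + lam) ** N for i in range(N + 1)]


def balance_residuals(pi, N, lam):
    """Rows of Q * pi, with Q the tri-diagonal matrix of (14.51)."""
    res = []
    for i in range(N + 1):
        below = (N - i + 1) * lam * pi[i - 1] if i > 0 else 0.0
        above = (i + 1) * pi[i + 1] if i < N else 0.0
        res.append(below - ((N - i) * lam + i) * pi[i] + above)
    return res


def traffic_intensity(N, lam, c):
    """rho = N * lam / (c * (1 + lam))."""
    return N * lam / (c * (1 + lam))


if __name__ == "__main__":
    N, lam, c = 10, 0.5, 5.0
    print(on_source_distribution(N, lam))
    print("rho =", traffic_intensity(N, lam, c))
```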
As shown in Section 10.2.2, besides $\pi(x) = e^{D^{-1}Qx}\pi(0)$, the solution of
(14.51) can be expressed in terms of the eigenvalues $\zeta_j$, the corresponding
right-eigenvector $x_j$ and left-eigenvector $y_j$ of $D^{-1}Q$,

$$
\pi(x) = \sum_{j=0}^{N} e^{\zeta_j x}\, x_j\, y_j^T \pi(0)
$$

where, as shown in Appendix A.5.2.2, the eigenvalues are labeled in increasing order $\zeta_{N-[c]-1} < \cdots < \zeta_1 < \zeta_0 < \zeta_N = 0 < \zeta_{N-1} < \cdots < \zeta_{N-[c]}$. This
way of writing distinguishes between underload and overload eigenvalues.
Only bounded solutions are allowed. As shown in Appendix A.5.2.2, there
are precisely $N - [c]$ negative real eigenvalues $\zeta_j$, with $j \in [0, N - [c] - 1]$.
In addition, $j = N$ corresponds to the eigenvalue $\zeta_N = 0$ and the $\pi(\infty)$
eigenvector. The general bounded solution of (14.51) is

$$
\pi(x) = \pi(\infty) + \sum_{j=0}^{N-[c]-1} a_j e^{\zeta_j x} x_j \tag{14.53}
$$

where the scalar coefficients $a_j = y_j^T \pi(0)$ still need to be determined. Rather
than determining $y_j^T \pi(0)$ as in Appendix A.5.2.2, a more elegant and physical method is used. The eigenvalue solution in Appendix A.5.2.2 has scaled
the eigenvectors by setting the $N$-th component equal to 1, hence, $(x_j)_N = 1$.
Writing the $N$-th component in (14.53) gives with (14.52)

$$
\pi_N(x) = \frac{\lambda^N}{(1 + \lambda)^N} + \sum_{j=0}^{N-[c]-1} a_j e^{\zeta_j x} \tag{14.54}
$$
The most convenient choice of $x$ is $x = 0$. If the number of on-sources $j$ at
any time exceeds the service rate $c$, then the buffer builds up and cannot be
empty,

$$
\pi_j(0) = 0 \quad \text{for } [c] + 1 \le j \le N
$$

This observation provides one equation in (14.54) for the coefficients $a_k$,

$$
\sum_{j=0}^{N-[c]-1} a_j = -\frac{\lambda^N}{(1 + \lambda)^N}
$$

and shows that $N - [c] - 1$ additional equations are needed to determine all
coefficients $a_j$. By differentiating (14.54) $m$ times and evaluating at $x = 0$,
we find these additional equations

$$
\left.\frac{d^m \pi_N(x)}{dx^m}\right|_{x=0} = \sum_{j=0}^{N-[c]-1} a_j \zeta_j^m
$$

in which the left-hand side will be determined with the help of the differential equation (14.51).
Indeed, for $m = 1$, the differential equation (14.51) gives

$$
\left.\frac{d\pi(x)}{dx}\right|_{x=0} = D^{-1}Q\,\pi(0)
$$

whose $N$-th component is the required derivative of $\pi_N(x)$. The important observation is that the effect of multiplication by $D^{-1}Q$
decreases the number of zero components in $\pi(0)$ by 1, i.e. $\left.\frac{d\pi_j(x)}{dx}\right|_{x=0} = 0$
for $[c] + 2 \le j \le N$. Any additional multiplication by $D^{-1}Q$ has the same
effect. Since $\frac{d^m\pi(x)}{dx^m} = \left(D^{-1}Q\right)^m \pi(x)$, we thus find, for $0 \le m \le N - [c] - 1$,
that

$$
\left.\frac{d^m \pi_N(x)}{dx^m}\right|_{x=0} = 0
$$
We write these $N - [c]$ equations in the unknowns $a_k$ in matrix form,

$$
\begin{bmatrix}
1 & 1 & \cdots & 1 \\
\zeta_0 & \zeta_1 & \cdots & \zeta_{N-[c]-1} \\
\zeta_0^2 & \zeta_1^2 & \cdots & \zeta_{N-[c]-1}^2 \\
\vdots & \vdots & & \vdots \\
\zeta_0^{N-[c]-2} & \zeta_1^{N-[c]-2} & \cdots & \zeta_{N-[c]-1}^{N-[c]-2} \\
\zeta_0^{N-[c]-1} & \zeta_1^{N-[c]-1} & \cdots & \zeta_{N-[c]-1}^{N-[c]-1}
\end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_{N-[c]-1} \end{bmatrix}
=
\begin{bmatrix} -\frac{\lambda^N}{(1+\lambda)^N} \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
$$

and recognize the matrix, denoted by $V$, as a Vandermonde matrix (Section
A.1, art. 5) with

$$
\det(V) = \prod_{i=0}^{N-[c]-1} \prod_{j=i+1}^{N-[c]-1} (\zeta_j - \zeta_i)
$$

Since all eigenvalues appearing in the Vandermonde matrix are distinct (Appendix A.5.2.2), $\det(V) \ne 0$ and a unique solution follows for all $0 \le j \le N - [c] - 1$ from Cramer's theorem as

$$
a_j = -\left(\frac{\lambda}{1 + \lambda}\right)^N \prod_{i=0;\, i \ne j}^{N-[c]-1} \frac{\zeta_i}{\zeta_i - \zeta_j} \tag{14.55}
$$

Together with the exact determination of the eigenvalues $\zeta_j$ and corresponding right-eigenvector $x_j$ explicitly given in Appendix A.5.2.2, the coefficients
$a_j$ completely solve the AMS queue.
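That the product form (14.55) solves the Vandermonde system can be checked numerically. The sketch below uses made-up, distinct negative values in place of the true eigenvalues of $D^{-1}Q$ (which require Appendix A.5.2.2); only the algebraic identity is illustrated, and the function name is hypothetical.

```python
def ams_coefficients(zetas, N, lam):
    """Coefficients a_j from (14.55) for given distinct eigenvalues zetas."""
    first_rhs = -((lam / (1.0 + lam)) ** N)
    coeffs = []
    for j, zj in enumerate(zetas):
        prod = 1.0
        for i, zi in enumerate(zetas):
            if i != j:
                prod *= zi / (zi - zj)
        coeffs.append(first_rhs * prod)
    return coeffs


if __name__ == "__main__":
    zetas = [-0.4, -1.1, -2.5]  # hypothetical values, NOT computed from an AMS model
    N, lam = 5, 0.5
    a = ams_coefficients(zetas, N, lam)
    for m in range(len(zetas)):
        # Row m of the Vandermonde system: sum_j a_j * zeta_j^m
        print(m, sum(aj * zj**m for aj, zj in zip(a, zetas)))
```

Row $m = 0$ should return $-\lambda^N/(1+\lambda)^N$ and the remaining rows should vanish up to rounding, which is the Lagrange interpolation property behind Cramer's rule for a Vandermonde matrix.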
The buffer overflow probability $\Pr[N_S > x] = 1 - \sum_{j=0}^{N} \pi_j(x)$ becomes
with $\sum_{j=0}^{N} \pi_j(\infty) = 1$,

$$
\Pr[N_S > x] = -\sum_{j=0}^{N-[c]-1} a_j e^{\zeta_j x} \sum_{l=0}^{N} (x_j)_l
$$

Using the explicit form of the generating function (A.44) where the roots
$r_1$ and $r_2$ belonging to eigenvalue $\zeta_k$ are specified in (A.42) and the residue
$k = k_j = c_1$ in (A.43), the buffer overflow probability is

$$
\Pr[N_S > x] = -\sum_{j=0}^{N-[c]-1} a_j e^{\zeta_j x} (1 - r_1)^{k_j} (1 - r_2)^{N-k_j} \tag{14.56}
$$

For large $x$, $\Pr[N_S > x]$ will be dominated by the exponential with the
largest negative eigenvalue $\zeta_0$ (for which $k_0 = N$),

$$
\Pr[N_S > x] \approx -a_0 e^{\zeta_0 x} \sum_{j=0}^{N} (x_N)_j
$$

Writing that largest negative eigenvalue (A.47) in terms of the traffic intensity $\rho$ gives

$$
\zeta_0 = -\frac{(1 + \lambda)(1 - \rho)}{1 - \frac{c}{N}}
$$

From (A.49), we have $\sum_{j=0}^{N} (x_N)_j = \left(\frac{N}{c}\right)^N$. Combined with (14.55), the
asymptotic formula for the buffer overflow probability becomes

$$
\Pr[N_S > x] \approx \rho^N e^{\zeta_0 x} \prod_{i=1}^{N-[c]-1} \frac{\zeta_i}{\zeta_i - \zeta_0} \tag{14.57}
$$

[Figure 14.7: logarithmic plot of $\Pr[N_S > x]$ (from $10^{-1}$ down to $10^{-15}$) versus the buffer level $x$ (up to 20) for $\rho$ = 0.5, 0.7 and 0.9, each with $N = 40$ and $N = 100$.]

Fig. 14.7. The overflow probability (14.56) in the AMS queue versus the buffer level
$x$ for fixed $\lambda = \frac{1}{2}$. For each traffic intensity $\rho$ = 0.5, 0.7 and 0.9, the upper curve
corresponds to $N = 40$ and the lower to $N = 100$. The asymptotic formula (14.57)
is shown in dotted line.

Figure 14.7 shows both the exact (14.56) and asymptotic (14.57) overflow
probability as a function of $x$ for various traffic intensities $\rho$ and two sizes of
$N$. The average off-period in Fig. 14.7 is two times the average on-period.
For large values of $x$ and large traffic intensities $\rho$, the asymptotic formula is
adequate. For smaller values, clear differences are observed. As mentioned in
Section 5.7, the asymptotic regime that nearly coincides with (14.57) refers
to the burst scale phenomena while the non-asymptotic regime reflects the
smaller scale variations. The AMS queue allows us to analyze the effect of
the burstiness of a source by varying $\lambda$.
14.8 The cell loss ratio
Due to its importance in ATM and in future time-critical communication
services, the QoS loss-performance measure, the cell loss ratio, deserves some
attention. In designing a switch for time-critical services with strict delay
requirements smaller than $D^*$, the buffer size $K$ is dimensioned as follows.
The order of magnitude of $D^*$ is about 10 ms, the maximum end-to-end
delay for high-quality telephony (world wide) advised in ITU-T standards.
Let $E[H]$ be the average number of hops of a path in an ATM network
that rarely exceeds 10 hops. The buffer size $K$ is determined such that the
maximum waiting time of a cell never exceeds $\frac{D^*}{E[H]}$, thus, $\frac{K}{\mu} \le \frac{D^*}{10}$. For
example, for STM-1 links where $C = 155$ Mb/s, we have that $\mu \simeq 366\,800$
ATM cell/s such that $K \le \frac{\mu D^*}{10} \simeq 366$ ATM cell buffer positions. This first-order estimate shows that the ATM buffer for time-critical traffic consists
of a few hundreds of ATM cell positions. That small number for $K$ indeed
assures that the delay constraints are met, but introduces the probability
to lose cells. Hence, the QoS parameter to be controlled for time-critical
services is the cell loss ratio.
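The first-order buffer estimate is simple arithmetic and can be reproduced as follows. The sketch assumes the exact STM-1 line rate of 155.52 Mb/s and the 53-byte ATM cell, which together give the quoted service rate of about 366 800 cells/s; the function names are illustrative.

```python
ATM_CELL_BITS = 53 * 8  # a 53-byte ATM cell


def cell_service_rate(link_bps):
    """Service rate mu in ATM cells per second for a link of link_bps bit/s."""
    return link_bps / ATM_CELL_BITS


def max_buffer_positions(link_bps, delay_budget_s, hops):
    """Largest K with K / mu <= delay_budget / hops."""
    return int(cell_service_rate(link_bps) * delay_budget_s / hops)


if __name__ == "__main__":
    stm1 = 155.52e6  # STM-1 line rate in bit/s
    print("mu =", cell_service_rate(stm1), "cells/s")
    print("K <=", max_buffer_positions(stm1, 0.010, 10))
```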
The cell loss ratio clr is defined as the ratio of the long-time average number of lost cells because of buffer overflow to the long-time average number
of cells that arrive in steady-state. There are typically two different views
to describe the cell loss ratio: a conservation-based and a combinatorial
one. The conservation law simply states that cells entering the system also
must leave it. The average number of entering cells are all those offered
per time slot minus the ones that have been rejected, thus $\lambda(1 - \mathrm{clr})$. On
the other side, the average number of cells that leave the system are related to the server activity as $\mu(1 - q[0])$, where $\mu$ is the service rate and
$q[j] = \Pr[N_Q = j]$. Hence, we have

$$
(1 - \mathrm{clr})\,\lambda = (1 - q[0])\,\mu \tag{14.58}
$$

In the combinatorial view, only the arrival process is viewed from a position
in the buffer and the number of ways in which cells are lost are counted
leading to

$$
\mathrm{clr} = \frac{1}{A'(1)} \sum_{n=0}^{\infty} n \sum_{j=0}^{K} q[K - j]\, a[j + n] \tag{14.59}
$$

with $A'(1) = \lambda$ and $a[j] = \Pr[A = j]$. Although equation (14.58) is simple,
its practical use is limited since the quantities involved are to be known with
extremely high accuracy if clr is of the order of $10^{-10}$, which in practice
means a virtually loss-free service. Therefore, we confine ourselves to the
combinatorial result and express (14.59) in terms of a generating function
as

$$
\mathrm{clr}\; A'(1) = \left.\frac{dV(z)}{dz}\right|_{z=1} \tag{14.60}
$$
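where

$$
\begin{aligned}
V(z) &= \sum_{n=0}^{\infty} z^n \sum_{j=0}^{K} q[K - j]\, a[j + n] = \sum_{n=0}^{\infty} z^n \left( \sum_{j=0}^{K} q[K - j]\, z^{-j}\, a[j + n]\, z^{j} \right) \\
&= \sum_{j=0}^{K} q[K - j]\, z^{-j} \sum_{n=0}^{\infty} a[j + n]\, z^{j+n}
\end{aligned}
$$

Rearranging in terms of the generating function for the arrivals $A(z)$ and
for the buffer occupancy $Q(z) = \sum_{j=0}^{K} q[j] z^j$, where $q[j] = 0$ for $j > K$,
yields

$$
\begin{aligned}
V(z) &= \sum_{j=0}^{K} q[K - j]\, z^{-j} \left( A(z) - \sum_{n=0}^{j-1} a[n] z^n \right) \\
&= A(z)\, z^{-K} \sum_{j=0}^{K} q[K - j]\, z^{K-j} - \sum_{j=0}^{K} q[K - j]\, z^{-j} \sum_{n=0}^{j-1} a[n] z^n \\
&= z^{-K} A(z) Q(z) - z^{-K} \sum_{j=0}^{K} q[j]\, z^{j} \sum_{n=0}^{K-j-1} a[n] z^n
\end{aligned}
\tag{14.61}
$$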

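In order to express the cell loss ratio entirely in terms of the generating
functions $A(z)$ and $Q(z)$, we employ (2.20),

$$
\begin{aligned}
\sum_{j=0}^{n} y[j] z^j &= \sum_{j=0}^{n} z^j\, \frac{1}{2\pi i} \int_{C(0)} \frac{Y(\omega)}{\omega^{j+1}}\, d\omega = \frac{1}{2\pi i} \int_{C} \frac{Y(\omega)}{\omega - z} \left( 1 - \frac{z^{n+1}}{\omega^{n+1}} \right) d\omega \\
&= Y(z) - \frac{1}{2\pi i} \int_{C} \frac{Y(\omega)}{\omega - z}\, \frac{z^{n+1}}{\omega^{n+1}}\, d\omega
\end{aligned}
\tag{14.62}
$$

where $C$ is a contour enclosing the origin and the point $z$ and lying within
the convergence region of $Y(z)$. Combining (14.61) and (14.62), we rewrite
$V(z)$ as

$$
\begin{aligned}
V(z) &= z^{-K} A(z) Q(z) - z^{-K} Q(z) A(z) + \frac{1}{2\pi i} \int_{C} \frac{A(\omega) Q(\omega)}{(\omega - z)\, \omega^{K}}\, d\omega \\
&= \frac{1}{2\pi i} \int_{C} \frac{A(\omega) Q(\omega)}{(\omega - z)\, \omega^{K}}\, d\omega
\end{aligned}
$$

Finally, our expression for the cell loss ratio in a GI/G/1/K system reads

$$
\mathrm{clr} = \frac{1}{2\pi i\, A'(1)} \int_{C} \frac{A(\omega) Q(\omega)}{(\omega - 1)^2\, \omega^{K}}\, d\omega \tag{14.63}
$$

where the contour $C$ encloses both the origin and the point $z = 1$ and lies in
the convergence region of $A(z)$. Usually, $A(z)$ is known while $Q(z)$ proves
to be more complicated to obtain. The product $Q(z)A(z) = S(z)$ is the pgf
of the system content.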
If $Q(z)$ and $A(z)$ are meromorphic functions$^7$ and if

$$
\lim_{z\to\infty} \frac{A(z)\, Q(z)}{(z - 1)^2 z^{K-1}} = 0,
$$

the contour $C$ in (14.63) can be closed over the $|\omega| > 1$-plane to get

$$
\mathrm{clr} = -\frac{1}{A'(1)} \sum_{p} \mathrm{Res}_{\omega\to p}\, \frac{A(\omega) Q(\omega)}{\omega^{K} (\omega - 1)^2} \tag{14.64}
$$

where $p$ are the poles of $A(z)Q(z)$ outside the unit circle. If these conditions
are met, a non-trivial evaluation of the cell loss ratio can be obtained. In
case the buffer pgf of the finite system is known, then $Q(z)$ is a polynomial
of degree at most $K$ so that the only pole of $\frac{Q(z)}{z^K}$ is zero and
$\lim_{z\to\infty} \frac{Q(z)}{z^K} = q(K) \le 1$ and the above conditions simplify to $\lim_{z\to\infty} \frac{z\, A(z)}{(z-1)^2} = 0$. Executing (14.63) then leads to

$$
\mathrm{clr} = -\frac{1}{A'(1)} \sum_{p} \frac{Q(p)}{p^{K} (p - 1)^2}\, \mathrm{Res}_{\omega\to p}\, A(\omega) \tag{14.65}
$$

where only the poles $p$ of the arrival process $A(z)$ play a role. For example,
if the number of arrivals has a geometric distribution $a[k] = (1 - \alpha)\alpha^k$
with $0 \le \alpha \le 1$ with generating function (3.6), $A_{\mathrm{geo}}(z) = \frac{1-\alpha}{1-\alpha z}$, then the
conditions for (14.65) are satisfied and we obtain,

$$
\mathrm{clr}_{\mathrm{geo}} = \alpha^{K}\, Q\!\left(\frac{1}{\alpha}\right)
$$

$^7$ Functions that only have poles in the complex plane.
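The geometric example can be verified against the combinatorial sum (14.59). The sketch below is a numerical check under stated assumptions: the buffer occupancy vector `q` is an arbitrary illustrative distribution, not the solution of an actual queue, and the infinite sum over $n$ is truncated at a hypothetical `n_max`.

```python
def clr_combinatorial(q, alpha, n_max=400):
    """Truncated evaluation of (14.59) for geometric arrivals a[k] = (1 - alpha) * alpha**k."""
    K = len(q) - 1
    mean_arrivals = alpha / (1.0 - alpha)  # A'(1) for the geometric pgf

    def a(k):
        return (1.0 - alpha) * alpha**k

    total = 0.0
    for n in range(1, n_max):
        total += n * sum(q[K - j] * a(j + n) for j in range(K + 1))
    return total / mean_arrivals


def clr_geometric(q, alpha):
    """Closed form alpha^K * Q(1/alpha) obtained from (14.65)."""
    K = len(q) - 1
    return alpha**K * sum(qj * alpha ** (-j) for j, qj in enumerate(q))


if __name__ == "__main__":
    q = [0.4, 0.3, 0.2, 0.1]  # illustrative occupancy probabilities, K = 3
    print(clr_combinatorial(q, 0.3), clr_geometric(q, 0.3))
```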

An important class excluded from (14.64) consists of entire functions $A(z)$
that possess a Taylor series expansion converging for all complex variables
$z$. The pgf of a Poisson process with parameter $\lambda$, $A_{\mathrm{Poisson}}(z) = e^{\lambda(z-1)}$,
is an important representative of that class. For a Poisson arrival process,
(14.63) is

$$
\mathrm{clr}_{\mathrm{Poisson}} = \frac{e^{-\lambda}}{2\pi i\,\lambda} \int_{C} \frac{e^{\lambda\omega} Q(\omega)}{(\omega - 1)^2\, \omega^{K}}\, d\omega
$$

Deforming the contour to enclose the negative half $\omega$-plane ($\mathrm{Re}(\omega) < c$)
yields

$$
\mathrm{clr}_{\mathrm{Poisson}} = \frac{e^{-\lambda}}{2\pi i\,\lambda} \int_{c - i\infty}^{c + i\infty} \frac{e^{\lambda\omega} Q(\omega)}{(\omega - 1)^2\, \omega^{K}}\, d\omega
$$

where the real number $c$ exceeds unity. This expression is recognized as an
inverse Laplace transform and since the argument of the Laplace transform
is a rational function, an exact evaluation is possible leading, however, again
to (14.59). Hence, the combinatorial view does not offer much insight immediately, which suggests considering a conservation-based approach. Indeed,
it is well known that, owing to the PASTA property (Theorem 13.5.1), an
exact expression (Syski, 1986; Bisdikian et al., 1992; Steyaert and Bruneel,
1994) in continuous-time for the cell loss ratio in a M/G/1/K system can
be derived, with the result

$$
\mathrm{clr}_{M/G/1/K;\mathrm{cont}} = (1 - \rho)\, \frac{\Pr[N > K - 1]}{1 - \rho \Pr[N > K - 1]} \tag{14.66}
$$

where, as usual, the traffic intensity $\rho = \frac{\lambda}{\mu}$ and $\Pr[N > K - 1]$ is the overflow probability in the corresponding infinite system M/G/1. Transforming
(14.66) to discrete-time using $\mathrm{clr}_{\mathrm{discr}} = \frac{\mathrm{clr}_{\mathrm{cont}}}{\rho\,(1 - \mathrm{clr}_{\mathrm{cont}})}$ yields

$$
\mathrm{clr}_{M/G/1/K;\mathrm{discr}} = \frac{1 - \rho}{\rho}\, \frac{\Pr[N > K - 1]}{1 - \Pr[N > K - 1]} \tag{14.67}
$$
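As a sanity check of the continuous-time result (14.66), the M/M/1 case can be worked out in closed form. The sketch assumes the standard M/M/1 tail $\Pr[N > K - 1] = \rho^K$ and the standard M/M/1/K blocking probability $(1-\rho)\rho^K/(1-\rho^{K+1})$; both are textbook facts rather than results from this section, and the function names are illustrative.

```python
def clr_mg1k_continuous(rho, overflow_prob):
    """Cell loss ratio (14.66) from the infinite-system overflow probability Pr[N > K - 1]."""
    return (1.0 - rho) * overflow_prob / (1.0 - rho * overflow_prob)


def mm1_overflow(rho, K):
    """Pr[N > K - 1] = rho**K in the infinite M/M/1 queue."""
    return rho**K


def mm1k_blocking(rho, K):
    """Blocking probability of the M/M/1/K queue with K system positions."""
    return (1.0 - rho) * rho**K / (1.0 - rho ** (K + 1))


if __name__ == "__main__":
    for K in (1, 5, 20):
        rho = 0.8
        print(K, clr_mg1k_continuous(rho, mm1_overflow(rho, K)), mm1k_blocking(rho, K))
```

For exponential service times the two columns should coincide, since substituting $\rho^K$ into (14.66) reproduces the M/M/1/K blocking formula.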

14.9 Problems
(i) A router processes data packets 80% of the time. On average 3.2
packets are waiting for service. What is the mean waiting time of a
packet given that the mean processing time equals $\frac{1}{\mu}$?
(ii) Compute in a M/M/m/m queue the average number of busy servers.
(iii) Let us model a router by a M/M/1 system with average service time
equal to 0.5 s.
(a) What is the relation between the average response time (average system time) and the arrival rate $\lambda$?
(b) How many jobs/s can be processed for a given average response
time of 2.5 s?
(c) What is the increase in average response time if the arrival
rate increases by 10%?
(iv) Assume that a company has a call center with two phone lines for service. During some measurements it was observed that both the lines
are busy 10% of the time. On the other hand, the average call holding
time was 10 minutes. Calculate the call blocking probability in the
case that the average call holding time increases from 10 minutes to
15 minutes. Call arrivals are Poissonean with constant rate.
(v) Consider a queueing network with Poisson arrivals consisting of two
infinitely long single-server queues in tandem with exponential service
times. We assume that the service times of a customer at the first
and second queue are mutually independent as well as independent
of the arrival process. Let the rate of the Poisson arrival process be
$\lambda$, and let the mean service rates at queues 1 and 2 be $\mu_1$ and $\mu_2$,
respectively. Moreover, assume that $\lambda < \mu_1$ and $\lambda < \mu_2$. Give the
probability that in steady-state there are $n$ customers at queue 1 and
$m$ customers at queue 2.
(vi) Let us consider the following simple design question: which queue of
the M/M/m family is most suitable if the arrival rate is $\lambda$ and the
required service rate is $k\mu$, with $k > 1$? We have the three options
illustrated in Fig. 14.8 at our disposal. All queues have infinite
buffers and the same traffic intensity $\rho = \frac{\lambda}{k\mu}$, and thus the same
throughput. The QoS qualifier of interest here is the delay, more
precisely, the system time $T$ of a packet. Compare the average system
time and draw conclusions.

[Figure 14.8: schematic of the options A, B and C, with arrival rate $\lambda$ (split into $\lambda/k$ per queue in option B) and servers of rate $\mu$ or $k\mu$.]

Fig. 14.8. Three different options: (A) one M/M/1 queue with service rate
$k\mu$, (B) $k$ M/M/1 queues with service rate $\mu$ and (C) one M/M/k queue with
service rate $k\mu$.

(vii) An aeroplane takes exactly 5 minutes to land after the airport's traffic
control has sent the signal to land. Aeroplanes arrive at random with
an average rate of 6/hour. How long can an aeroplane expect to circle
before getting the signal to land? (Only one aeroplane can land at a
time.)
(viii) There are two kinds of connection requests arriving at a base station
of a mobile telephone network: connection requests generated by
new calls (that originate from the same cell as the base station) or
handovers (that originate from a different cell, but are transferred
to the cell of the base station). The handovers are supposed not to
experience blocking. Therefore, the base station has to reject some of
the new call connection requests. Every accepted connection request
occupies one of the $M$ available channels. During a busy hour, the
average measured channel occupation time of a call is 1.64 minutes
irrespective of the type of call. Furthermore, the average number of
active calls is 52 and the measured blocking is 2% of the number of
all the connection requests. The average interarrival time between
two consecutive new call connection requests in the cell is 3 seconds.
(a) Calculate the arrival rate (in calls/minute) for the handover
calls.
(b) What is the percentage of new calls that are blocked?
(ix) Let $N$ denote the number of Poisson arrivals with rate $\lambda$ during
the service time $x$ (random variable) of a packet. Assume that the
Laplace transform of the service time $\varphi_x(s) = E[e^{-sx}]$ is known.
(a) Show that the pgf of $N$ is given by $\varphi_x(\lambda(1 - z))$.
(b) What is the pgf if the service time $x$ is exponentially distributed
with mean $\frac{1}{\mu}$? Deduce from this the distribution of $N$.
(x) A single-server queue has exponential inter-arrival and service times
with means $\lambda^{-1}$ and $\mu^{-1}$, respectively. New customers are sensitive
to the length of the queue. If there are $i$ customers in the system
when a customer arrives, then that customer will join the queue with
a probability $(i + 1)^{-1}$, otherwise he/she departs and does not return.
Find the steady-state probability distribution of this queueing system.
(xi) The M/M/m/m/s queue (the Engset formula). Consider a system
with $m$ connections and $s$ customers who all desire to telephone and,
hence, need to obtain a connection or line. Each customer can occupy
at most one line. The group of $s$ customers consists of two subgroups.
When a line has been assigned to a customer, this customer is transferred from the "still demanding" subgroup to the "served" group.
The number of call attempts decreases with the size of the "served"
group whose members all occupy one line. More precisely, the arrival rate in the Engset model is proportional to the size of the "still
demanding" subgroup and the number of arrivals is exponential. The
holding time of a line is also exponentially distributed with mean $\frac{1}{\mu}$.
(a) Describe the M/M/m/m/s queue as a birth-death process.
(b) Compute the steady-state.
(c) Compute the blocking probability (similar to the blocking in
the Erlang model).
(xii) Compare the cell loss ratio of the M/M/1/K and of the discrete
M/D/1/K using the dominant pole approximation in Section 5.7.
Hint: approximate the cell loss ratio by the overflow probability.

Part III
Physics of networks

15
General characteristics of graphs

The structure or interconnection pattern of a network can be represented by
a graph. Properties of the graph of a network often relate to performance
measures or specific characteristics of that network. For example, routing is
an essential functionality in many networks. The computational complexity
of shortest path routing depends on the hopcount in the underlying graph.
This chapter mainly focuses on general properties of graphs that are of
interest to Internet modeling.
Mainly driven by the Internet, a large impetus from different fields in
science makes the understanding of the growth and the structure of graphs
one of the currently most studied and exciting research areas. The recent
books by Barabasi (2002) and Dorogovtsev and Mendes (2003) nicely reflect
the current state of the art in stochastic graph theory and its applications
to, for example, the Internet, the World Wide Web, and social and biological
networks.

15.1 Introduction
Network topologies as drawn in Fig. 15.1 are examples of graphs. A graph
$G$ is a data structure consisting of a set of $V$ vertices connected by a set
of $E$ edges. In stochastic graph theory and communications networking,
the vertices and edges are called nodes and links, respectively. In order
to avoid confusion with the expectation operator $E[\cdot]$, the set of links is
denoted by $\mathcal{L}$ and the number of links by $L$ and similarly, the set of nodes
by $\mathcal{N}$ and the number of nodes by $N$. Thus, the usual notation of a graph
$G(V, E)$ in graph theory is here denoted by $G(N, L)$.
The full mesh or complete graph $K_N$ consists of $N$ nodes and $L = L_{\max} = \frac{N(N-1)}{2}$ links, where every node has a link to every other node. The graph
that is generated by the statement "any $i$ is directly connected to any $j$" in
a population of $N$ members is a complete graph $K_N$. Since in $K_N$ the number of links $L_{\max} = O(N^2)$ for large $N$, it demonstrates Metcalfe's law:
the value of networking increases quadratically in the number of connected
members.

Fig. 15.1. Several types of network topologies or graphs: full mesh (complete graph), star, ring, 2D (square) lattice and tree (connected, loopless graph).
The interconnection pattern of a network with $N$ nodes can be represented
by an adjacency matrix $A$ consisting of elements $a_{ij}$ that are either one or
zero depending on whether there is a link between node $i$ and $j$ or not.
The adjacency matrix is a real symmetric $N \times N$ matrix when we assume
bi-directional transport over links. If there is a link from $i$ to $j$ ($a_{ij} = 1$)
then there is a link from $j$ to $i$ ($a_{ji} = 1$) for any $j \ne i$. Moreover, we
exclude self-loops ($a_{jj} = 0$) or multiple links between two nodes $i$ and $j$.
More properties of the adjacency matrix of a graph are found in Appendix
B.
A walk from node $A$ to node $B$ with $k - 1$ hops or links is the node
list $\mathcal{W}_{A\to B} = n_1 \to n_2 \to \cdots \to n_{k-1} \to n_k$ where $n_1 = A$ and $n_k = B$.
A path from node $A$ to node $B$ with $k - 1$ hops or links is the node list
$\mathcal{P}_{A\to B} = n_1 \to n_2 \to \cdots \to n_{k-1} \to n_k$ where $n_1 = A$ and $n_k = B$ and where
$n_j \ne n_i$ for each pair of distinct indices $i$ and $j$. Sometimes the shorter notation $\mathcal{P}_{A\to B} = n_1 n_2 \cdots n_{k-1} n_k$ is used. All links $n_i \to n_j$ and the nodes $n_j$ in the path
$\mathcal{P}_{A\to B}$ are different, whereas in a walk $\mathcal{W}_{A\to B}$ no restrictions are put on the node
list. If the starting node $A$ equals the destination node $B$, that path
$\mathcal{P}_{A\to A}$ is called a cycle or loop. In telecommunications networks, paths and
not walks are basic entities in connecting two communicating parties. Two
paths between $A$ and $B$ are node(link)-disjoint if they have no nodes(links)
in common.
Apart from the topological structure specified via the adjacency matrix $A$, the link between node $i$ and $j$ is further characterized by a link
weight $w(i \to j)$, most often a positive real number$^1$ that reflects the
importance of that particular link. Often symmetry in both directions,
$w(i \to j) = w(j \to i)$, is assumed, leading to undirected graphs. Although
this assumption seems rather trivial, we point out that in telecommunications, transport of information in up-link and down-link is, in general, not
symmetrical. Via measurements in the Internet, Paxson (1997) found in
1995 that about 50% of the paths from $A \to B$ were different from those
from $B \to A$. Furthermore, it is often assumed that the link metric $w(i \to j)$
is independent from $w(k \to l)$ for all links $(i \to j)$ different from $(k \to l)$. In
the Internet's intra-domain routing protocol, the Open Shortest Path First
(OSPF) protocol, network operators have the freedom$^2$ to specify the link
weight $w(i \to j) > 0$ on the interfaces of their routers.

$^1$ In quality of service routing, a link is specified by a vector $w(i \to j)$ with positive components, each reflecting a metric (such as delay, jitter, loss, monetary cost, administrative weight,
physical distance, available capacity, priority, etc.).
15.2 The number of paths with $j$ hops
Let $X_j(A \to B; N)$ denote the number of paths with $j$ hops between a
source node $A$ and a destination node $B$. The most general expression for
the number of paths with $j$ hops between node $A$ and node $B$ is

$$
X_j(A \to B; N) = \sum_{k_1 \notin \{A, B\}}\; \sum_{k_2 \notin \{A, k_1, B\}} \cdots \sum_{k_{j-1} \notin \{A, k_1, \ldots, k_{j-2}, B\}} 1_{A \to k_1} \cdot 1_{k_1 \to k_2} \cdots 1_{k_{j-1} \to B} \tag{15.1}
$$

where $1_x$ is the indicator function. The number of paths with one hop equals
$X_1(A \to B; N) = 1_{A \to B}$. The maximum number of $j$ hop paths is attained
in the complete graph $K_N$ where $1_{k_1 \to k_2} = 1$ for each link $k_1 \to k_2$ and
equals

$$
\max(X_j(A \to B; N)) = \frac{(N - 2)!}{(N - j - 1)!} \tag{15.2}
$$

The maximum number of hops in any path is $N - 1$. This maximum occurs,
for example, in a line graph where the path runs from the one extreme node
to the other or in a ring (see Fig. 15.1) between neighboring nodes where
there is a one hop and a $(N - 1)$-hops path.
The total number of paths $M_N$ between two nodes in the complete graph
is

$$
M_N = \sum_{j=1}^{N-1} \max(X_j(A \to B; N)) = \sum_{j=1}^{N-1} \frac{(N - 2)!}{(N - j - 1)!} = (N - 2)! \sum_{k=0}^{N-2} \frac{1}{k!} = (N - 2)!\, e - R
$$
$^2$ In Cisco's OSPF implementation, it is suggested to use $w(i \to j) = \frac{10^8}{B(i \to j)}$ where $B(i \to j)$
denotes the capacity (in bit/s) of the link between nodes $i$ and $j$. An approach to optimize the
OSPF weights to reflect actual traffic loads is presented by Fortz and Thorup (2000).


where

$$
\begin{aligned}
R &= (N - 2)! \sum_{j=N-1}^{\infty} \frac{1}{j!} = \sum_{j=0}^{\infty} \frac{(N - 2)!}{(N - 1 + j)!} \\
&= \frac{1}{N - 1} + \frac{1}{(N - 1)N} + \frac{1}{(N - 1)N(N + 1)} + \cdots \\
&< \sum_{j=1}^{\infty} \left(\frac{1}{N - 1}\right)^{j} = \frac{1}{N - 2}
\end{aligned}
$$

implying that for $N \ge 3$, $R < 1$. But $M_N$ is an integer. Hence, the total
number of paths in $K_N$ is exactly equal to

$$
M_N = [e\,(N - 2)!] \tag{15.3}
$$

where $e = 2.718\,281\ldots$ and $[x]$ denotes the largest integer smaller than or
equal to $x$. Since any graph is a subgraph of the complete graph, the
maximum total number of paths between two nodes in any graph is upper bounded by $[e\,(N - 2)!]$.
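The closed form (15.3) can be confirmed for small $N$ by direct enumeration. The sketch below is illustrative only: it sums (15.2) over the hopcount, counts the simple paths between two fixed nodes of $K_N$ by brute force, and compares both with $[e(N-2)!]$. The function names are not from the text.

```python
from itertools import permutations
from math import e, factorial, floor


def max_paths_with_hops(N, j):
    """Number of j-hop paths between two nodes of K_N, equation (15.2)."""
    return factorial(N - 2) // factorial(N - j - 1)


def total_paths_complete_graph(N):
    """M_N as the sum of (15.2) over j = 1, ..., N - 1."""
    return sum(max_paths_with_hops(N, j) for j in range(1, N))


def brute_force_paths(N):
    """Count simple paths from node 0 to node N - 1 in K_N by listing the
    ordered selections of r distinct intermediate nodes, r = 0, ..., N - 2."""
    inner = range(1, N - 1)
    return sum(1 for r in range(N - 1) for _ in permutations(inner, r))


if __name__ == "__main__":
    for N in range(3, 9):
        print(N, total_paths_complete_graph(N), floor(e * factorial(N - 2)))
```

The floating-point evaluation of $e(N-2)!$ is only reliable for moderate $N$; the integer sum is exact for any $N$.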

15.3 The degree of a node in a graph


The degree $d_j$ of a node $j$ in a graph $G(N, L)$ equals the number of its
neighboring nodes and $0 \le d_j \le N - 1$. Clearly, the node $j$ is disconnected
from the rest of the graph if $d_j = 0$. Hence, in connected graphs, $1 \le d_j \le N - 1$. The basic law for the degree (see also Appendix (B.2)) is

$$
\sum_{j=1}^{N} d_j = 2L,
$$

since each link belongs to precisely two nodes and, hence, is counted twice.
In directed graphs, the in(out)-degree is defined as the number of the in(out)-going links at a node, while the sum of in- and out-degree equals the degree.
The minimum nodal degree in the graph $G$ is denoted by $d_{\min} = \min_{j \in G} d_j$.
The average degree of a graph is defined as $d_a = \frac{1}{N}\sum_{j=1}^{N} d_j = \frac{2L}{N}$ which
is, for a connected graph, bounded by $2 - \frac{2}{N} \le d_a \le N - 1$. The lower
bound is obtained for any spanning tree, a graph that connects all nodes
and that contains no cycles and where $L = L_{\min} = N - 1$. The upper bound
is reached in the complete graph $K_N$ with $L_{\max} = \frac{N(N-1)}{2}$. Graphs where
$d_{\min} = d_a$ such as $K_N$ and the ring topology in Fig. 15.1 are called regular
graphs since any node has precisely $d_a$ links.
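The degree relations are easy to verify on the topologies of Fig. 15.1. The following sketch represents a graph by its symmetric 0/1 adjacency matrix as a list of lists; the helper names (`complete_graph`, `ring`, `star`) are illustrative and not taken from the text.

```python
def complete_graph(N):
    """Adjacency matrix of K_N (no self-loops)."""
    return [[1 if i != j else 0 for j in range(N)] for i in range(N)]


def ring(N):
    """Adjacency matrix of a ring on N >= 3 nodes."""
    return [[1 if (i - j) % N in (1, N - 1) else 0 for j in range(N)] for i in range(N)]


def star(N):
    """Adjacency matrix of a star (a spanning tree) with node 0 as the hub."""
    return [[1 if (i == 0) != (j == 0) else 0 for j in range(N)] for i in range(N)]


def degrees(adjacency):
    """Degree d_j of every node: the row sums of the adjacency matrix."""
    return [sum(row) for row in adjacency]


def number_of_links(adjacency):
    """L: count each undirected link once (upper triangle)."""
    n = len(adjacency)
    return sum(adjacency[i][j] for i in range(n) for j in range(i + 1, n))


def average_degree(adjacency):
    """d_a = 2L / N."""
    return 2 * number_of_links(adjacency) / len(adjacency)


if __name__ == "__main__":
    for name, graph in (("complete", complete_graph(6)), ("ring", ring(6)), ("star", star(6))):
        d = degrees(graph)
        print(name, d, sum(d), 2 * number_of_links(graph), average_degree(graph))
```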
Sometimes networks are classified either as dense if $d_a$ is high or as sparse
if $d_a$ is small. For instance, the Internet is sparse with average degree $d_a \approx 3$,
although some backbone routers may have a much higher degree, even
exceeding 100. The distribution of the degree $D_{\mathrm{Internet}}$ of an arbitrary node
in the Internet is shown to be approximately polynomial (Siganos et al.,
2003),

$$
\Pr[D_{\mathrm{Internet}} = k] \approx \frac{k^{-\tau}}{\zeta(\tau)} \tag{15.4}
$$

with$^3$ $\tau \in (2.2, 2.5)$ and where $\zeta(s) = \sum_{k=1}^{\infty} k^{-s}$ for $\mathrm{Re}(s) > 1$ is the Riemann
Zeta function (Titchmarsh and Heath-Brown, 1986). A graph of this class
is called a degree graph. Figures 15.2 and 15.3 show two instances of a
degree graph.

Fig. 15.2. Degree graph with $\tau = 2.4$ and $N = 300$. All nodes are drawn on a
circle.
Also the web graph consisting of websites and hyperlinks features a power law for the in-degree. David Aldous has given the following argument why a power law of the in-degree of the web graph is natural. To a good approximation, the number of websites is growing exponentially at rate $\alpha > 0$. This means that the lifetime $T$ of a random website satisfies $\Pr[T > t] \sim e^{-\alpha t}$.
$^3$ A more general expression than (15.4) is $\Pr[d_j = k] = c\,k^{-\tau} g(k)$, where $c$ is a normalization constant and where $g(k)$ is a slowly varying function (Feller, 1971, pp. 275-284) with basic property that $\lim_{t \to \infty} \frac{g(tx)}{g(t)} = 1$, for every $x > 0$.

Fig. 15.3. Degree graph with $\tau = 2.4$ and $N = 200$. The higher degree nodes are put inside the circle.

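Let $l(u)$ denote the number of links into a site at time $u$ after its creation. At observation time $t$, the distribution of the number of links $X$ into a random website is, by the law of total probability,
\begin{align*}
\Pr[X > k] &= \int_0^t \Pr[X > k \mid T = u]\, \frac{d \Pr[T \le u]}{du}\, du \\
&\approx \alpha \int_0^t e^{-\alpha u} \Pr[X > k \mid T = u]\, du \\
&= \alpha \int_0^t e^{-\alpha u}\, 1_{\{l(u) > k\}}\, du = \alpha \int_{l^{-1}(k)}^{t} e^{-\alpha u}\, du = -e^{-\alpha t} + e^{-\alpha l^{-1}(k)}
\end{align*}
Only if $l$ increases exponentially fast as $l(u) \sim e^{\beta u}$ for some $\beta < \alpha$, a power law behavior of the in-degree
\[ \Pr[X > k] \approx k^{-\alpha/\beta} \]
arises for sufficiently large $t$. For a polynomial growth $l(u) \sim u^{\beta}$ and large $t$,
\[ \Pr[X > k] \approx e^{-\alpha k^{1/\beta}} \]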
The large difference in the decrease of $\Pr[X > k]$ with $k$ between both examples illustrates the importance of the growth law of $l(u)$. The argument shows that a polynomial scaling law, commonly referred to as a power law, is a natural consequence of exponential growth. An exponential growth possesses the property that $\frac{dl(u)}{du} = \beta\, l(u)$ which is established by preferential attachment. Preferential attachment means that new links are on average added to sites proportional to their size. The more links a site has, the larger the probability that a new link attaches to this site. For example, already popular websites are increasingly more often linked to than small or less popular websites. Since many aspects of the Internet, such as the number of IP packets, number of users, number of websites, number of routers, etc., are currently growing approximately exponentially fast, the often observed power laws are more or less expected.
15.4 Connectivity and robustness
A graph $G$ is connected if there is a path between each pair of nodes and disconnected otherwise. A telecommunication network should be connected. Moreover, it is essential that the network should be robust: it should still operate if some of the links between routers or switches are broken or temporarily blocked by other calls. Hence, the network graph should possess a redundancy of links. The minimum number of links to connect all nodes in the network equals $N-1$. This minimum configuration is called redundancy level 1. In general, a redundancy level of $D$ is defined by Baran (2002) as the link-to-node ratio in an infinite $D$-lattice$^4$. A redundancy level of at least 3 is regarded as a highly robust network. A consequence of this insight has been employed in the design of the early Internet (Arpanet): it would be theoretically possible to build extremely reliable communication networks out of unreliable links by the proper use of redundancy. Another more timely application of the same principle is the design of reliable ad-hoc and sensor networks.
$^4$ A $D$-lattice is a graph where each nodal position corresponds to a point with integer coordinates within a $D$-dimensional hyper-cube with size $Z$. Apart from the border nodes, each node has a same degree equal to $2D$. The number of nodes equals $N = Z^D$. From (B.2), the link-to-node ratio follows as
\[ \frac{L}{N} = \frac{1}{2N} \sum_{j=1}^{N} d_j = D - r \]
where the correction $r = O\left(N^{-1/D}\right)$ is due to the border nodes. For an infinite $D$-lattice, where the limit $Z \to \infty$ (which implies $N \to \infty$), we obtain
\[ \lim_{Z \to \infty} \frac{L}{N} = D \]

There exist interesting results from graph theory that help to dimension a reliable telecommunication network. Instead of the redundancy level, the edge and vertex connectivity seem more natural quantifiers from which robustness can be derived. The edge connectivity $\lambda(G)$ of a connected graph $G$ is the smallest number of edges (links) whose removal disconnects $G$. The vertex connectivity $\kappa(G)$ of a connected graph different$^5$ from the complete graph $K_N$ is the smallest number of vertices (nodes) whose removal disconnects $G$.
Fig. 15.4. An example of the edge and the vertex connectivity of a graph (two panels: edge connectivity with $\lambda(G) = 1$ and vertex connectivity with $\kappa(G) = 1$).

These definitions are illustrated in Fig. 15.4. For any connected graph $G$ holds that
\[ \kappa(G) \le \lambda(G) \le d_{\min}(G) \tag{15.5} \]
In particular, if $G$ is the complete graph $K_N$, then $\kappa(K_N) = \lambda(K_N) = d_{\min}(K_N) = N-1$. Due to the importance of the inequality$^6$ (15.5), it deserves some more discussion. Let us concentrate on a connected graph $G$ that is not a complete graph. Since $d_{\min}(G)$ is the minimum degree of a node, say $n$, in $G$, by removing all links of node $n$, $G$ is disconnected. By definition, since $\lambda(G)$ is the minimum number of links that leads to disconnectivity, it follows that $\lambda(G) \le d_{\min}(G)$ and $\lambda(G) \le N-2$ because $G$ is not a complete graph and consequently the minimum nodal degree is at most $N-2$. Furthermore, the definition of $\lambda(G)$ implies that there exists a set $S$ of $\lambda(G)$ links whose removal splits the graph $G$ into two connected subgraphs $G_1$ and $G_2$, as illustrated in Fig. 15.5. Any link of that set $S$ connects a node in $G_1$ to a node in $G_2$. Indeed, adding again an arbitrary link of that set makes $G$ again connected. But $G$ can be disconnected into the same two connected subgraphs by removing nodes in $G_1$ and/or $G_2$. Since possible disconnectivity inside either $G_1$ or $G_2$ can occur before $\lambda(G)$ nodes are removed, it follows that $\kappa(G)$ cannot exceed $\lambda(G)$, which establishes the inequality (15.5).

$^5$ The complete graph $K_N$ cannot be disconnected by removing nodes and we define $\kappa(K_N) = N-1$ for $N \ge 3$.

$^6$ A second general inequality (B.23) relates the second smallest eigenvalue of the Laplacian to the edge and vertex connectivity (see Section B.4).
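The chain of inequalities in (15.5) can be verified by exhaustive search on a small graph. The following is a minimal sketch, assuming a hypothetical "bow-tie" example (two triangles sharing node 2); the function names are illustrative, and the brute-force search is only practical for a handful of nodes.

```python
from itertools import combinations

def is_connected(nodes, links):
    """Depth-first search restricted to the given node set."""
    nodes = set(nodes)
    adj = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in adj[n] - seen:
            seen.add(m)
            stack.append(m)
    return seen == nodes

def edge_connectivity(nodes, links):
    """Smallest number of links whose removal disconnects the graph."""
    for size in range(1, len(links) + 1):
        for removed in combinations(links, size):
            remaining = [l for l in links if l not in removed]
            if not is_connected(nodes, remaining):
                return size

def vertex_connectivity(nodes, links):
    """Smallest number of nodes whose removal disconnects the graph."""
    for size in range(1, len(nodes) - 1):
        for removed in combinations(nodes, size):
            rest = [n for n in nodes if n not in removed]
            if not is_connected(rest, links):
                return size
    return len(nodes) - 1  # convention for the complete graph

def min_degree(nodes, links):
    return min(sum(1 for a, b in links if n in (a, b)) for n in nodes)

# Hypothetical example: two triangles {0,1,2} and {2,3,4} sharing node 2.
nodes = [0, 1, 2, 3, 4]
links = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
kappa = vertex_connectivity(nodes, links)
lam = edge_connectivity(nodes, links)
d_min = min_degree(nodes, links)
```

In this example, removing the shared node disconnects the graph, while no single link is a bridge, so the vertex connectivity is strictly smaller than the edge connectivity.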
Fig. 15.5. A graph $G$ with $N = 16$ nodes and $L = 32$ links. Two connected subgraphs $G_1$ and $G_2$ are shown. The graph's connectivity parameters are $\kappa(G) = 1$ (removal of node $C$), $\lambda(G) = 2$ (removal of links from $C$ to $G_1$), $d_{\min}(G) = 3$ and $d_a = \frac{2L}{N} = 4$.

Let us proceed to find the number of link-disjoint paths between $A$ and $B$ in a connected graph $G$. Suppose that $H$ is a set of links whose removal separates $A$ from $B$. Thus, the removal of all links in the set $H$ destroys all paths from $A$ to $B$. The maximum number of link-disjoint paths between $A$ and $B$ cannot exceed the number of links in $H$. However, this property holds for any set $H$, and thus also for the set with the smallest possible number of links. A similar argument applies to node-disjoint paths. Hence, we end up with Theorem 15.4.1:

Theorem 15.4.1 (Menger's Theorem) The maximum number of link(node)-disjoint paths between $A$ and $B$ is equal to the minimum number of links (nodes) separating $A$ and $B$.

Recall that the edge connectivity $\lambda(G)$ (analogously vertex connectivity $\kappa(G)$) is the minimum number of links (nodes) whose removal disconnects $G$. By Menger's Theorem, it follows that there are at least $\lambda(G)$ link-disjoint paths and at least $\kappa(G)$ node-disjoint paths between any pair of nodes in $G$.
In order to dimension the graph $G$ of a robust telecommunications network, the goal is to maximize both $\kappa(G)$ and $\lambda(G)$. Of course, the most reliable graph is the complete graph; however, it is also the most expensive. Usually, since the cost of digging and of installing/connecting the fibres is around 70% of the total network cost, the total number of links $L$ is minimized. Since the minimum cannot exceed the average, we have that $d_{\min} \le d_a = \frac{2L}{N}$. From (15.5), it follows that the best possible reliability is achieved if the network graph is designed such that
\[ \kappa(G) = \frac{2L}{N} \]
The optimum implies that $d_{\min}(G) = d_a = \frac{2L}{N}$ or that each node has the same degree $d_j = d_a$. Hence, a best possible reliable graph is a regular graph ($d_j = d_a$), but not every regular graph necessarily obeys $\kappa(G) = d_a$. Furthermore, two different graphs with the same parameters $N$, $L$, $\kappa(G)$, $\lambda(G)$ and $d_{\min}(G)$ are not necessarily equally reliable. Indeed, the edge and vertex connectivity only give a minimum number $\lambda(G)$ and $\kappa(G)$ respectively, but do not give information about the number of subsets in $G$ that lead to this number. It is clear that, if only one vulnerable set of nodes is responsible for a low $\kappa(G)$ while in another graph there are more such sets, the first graph is more reliable than the second one. In summary, the presented simplified analysis gives some insights, but more details (e.g. the number of vulnerable sets or subgraphs) must be considered in the dimensioning study.

15.5 Graph metrics


An important challenge in the modeling of a network is to determine the class of graphs that represents best the global and local structure of the network. Most of the valuable networks like the Internet, road infrastructures, neural networks in the human brain, social networks, etc. are large and changing over time. In order to classify graphs a set of distinguishing properties, called metrics, needs to be chosen. These metrics are in general functions of the graph's structure $G(N, L)$. Natural metrics of a graph are the degree distribution and the hopcount distribution of an arbitrary path. Beside quantities such as the diameter and the complexity of a graph defined in algebraic graph theory (Appendix B), some other metrics are the clustering coefficient, the expansion, the resilience, the distortion and the betweenness.

The clustering coefficient $c_G(v)$ characterizes the density of connections in the environment of a node $v$ and is defined as the ratio of the number of links $y$ connecting the $d_v$ neighbors of $v$ over the total possible $\frac{d_v (d_v - 1)}{2}$,
\[ c_G(v) = \frac{2y}{d_v (d_v - 1)} \tag{15.6} \]
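As an illustration of (15.6), the sketch below counts the links among the neighbors of a node in an adjacency-set representation. The function name, the example graph and the choice to return 0 for nodes with fewer than two neighbors (where the ratio is undefined) are assumptions made for this example.

```python
from itertools import combinations

def clustering_coefficient(adj, v):
    """Clustering coefficient of node v; adj maps each node to its neighbor set."""
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        # Assumption: the ratio is undefined for degree < 2; return 0 by convention.
        return 0.0
    # y = number of links that connect two neighbors of v.
    y = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2 * y / (d * (d - 1))

# Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c0 = clustering_coefficient(adj, 0)
c1 = clustering_coefficient(adj, 1)
c3 = clustering_coefficient(adj, 3)
```

Node 0 has three neighbors with only one link among them, whereas the two neighbors of node 1 are directly linked, so node 1 attains the maximum value.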

The expansion $e_G(h)$ of a graph reflects the number of nodes that can be reached in $h$ hops from a node $v$,
\[ e_G(h) = \frac{1}{N^2} \sum_{v \in \mathcal{N}} |C(h)| \tag{15.7} \]
where $C(h)$ is the set of nodes that can be reached in $h$ hops from a node $v$ and $|A|$ represents the number of elements in the set $A$. We can interpret $C(h)$ geometrically as a ball centered at node $v$ with radius $h$.
The resilience $r_G(m)$ measures the connectivity or robustness of a graph. Let $m = |C(h)|$ denote the number of nodes in a ball centered at node $v$ and with radius $h$, and define $l(v, m)$ as the number of links that needs to be removed to split $C(h)$ into two sets with roughly equal numbers of nodes (around $m/2$). The resilience $r_G(m)$ of a graph is
\[ r_G(m) = \frac{1}{L} \sum_{v \in \mathcal{N}} l(v, m) \tag{15.8} \]
The distortion $t_G(m)$ measures how closely the graph resembles a tree and is defined as
\[ t_G(m) = \frac{1}{N} \sum_{v \in \mathcal{N}} w(C(h)) \tag{15.9} \]
where $w(G)$ is the value of the minimum spanning tree in $G$ with unit link weight $w(i \to j) = 1$ for each link of $G$.
Consider a flow with a unit amount of traffic between each pair of nodes in the graph $G$. Each flow between a node pair follows the shortest path between that node pair. The betweenness $B$ of a link (node) is defined as the number of shortest paths between all possible pairs of nodes in $G$ that traverse the link (node). If $H_{i \to j}$ denotes the number of hops in the shortest path from $i \to j$, then the total number of hops $H_G$ in all shortest paths in $G$ is $H_G = \sum_{i=1}^{N} \sum_{j=i+1}^{N} H_{i \to j}$. This number is also equal to $H_G = \sum_{l=1}^{L} B_l$, where $B_l$ is the betweenness of a link $l$ in $G$. Taking the expectation of both relations gives the average betweenness of a link in terms of the average hopcount
\[ E[B] = \frac{\binom{N}{2}}{L}\, E[H_N] \ge E[H_N] \]
with equality only for the complete graph.

15.6 Random graphs


Besides the regular topologies in Fig. 15.1, the class of random graphs constitutes an attractive set of topologies to analyze network performance. The theory of random graphs originated from a series of papers by Erdős and Rényi in the late 1940s. There exists an astonishingly large amount of literature on random graphs. The standard work on random graphs is the book by Bollobás (2001). We also mention the work of Janson et al. (1993) on evolutionary processes in random graphs.
The two most frequently occurring models for random graphs are the Erdős-Rényi random graphs $G_p(N)$ and $G_r(N, L)$. The class of random graphs denoted by $G_p(N)$ consists of all graphs with $N$ nodes in which the links are chosen independently and with probability $p$. In the class $G_p(N)$ the total number of links is not deterministic, but on average equal to $E[L] = p\, L_{\max}$ where $L_{\max} = \binom{N}{2}$. Since $p = \frac{E[L]}{L_{\max}}$ we also call $p$ the link density of $G_p(N)$. An instance of the class $G_{0.013}(300)$ is drawn in Fig. 15.6. Related to $G_p(N)$ are the geometric random graphs $G_{\{p_{ij}\}}(N)$ where the links are still chosen independently but where the probability of $i \to j$ being an edge is $p_{ij}$. An example of $G_{\{p_{ij}\}}(N)$ is the Waxman graph (Waxman, 1998; Van Mieghem, 2001) with $p_{ij} = \exp(-a\,|\vec{r}_i - \vec{r}_j|)$ and where the vector $\vec{r}_i$ represents the position of node $i$ and $a$ a real, non-negative number. Geometric random graphs are good models for ad-hoc wireless networks where the probability $p_{ij} = f(r_{ij})$ that there is a wireless link between node $i$ and $j$ is specified by the radio propagation that is briefly explained at the end of Section 3.5.
The class $G_r(N, L)$ is the set of random graphs with $N$ nodes and $L$ links. In total, we can construct $\binom{L_{\max}}{L}$ different graphs, which corresponds to the number of ways we can distribute a set of $L$ ones in the $L_{\max}$ possible places in the upper triangular part above the diagonal of the adjacency matrix $A$. Each of the possible $L_{\max}$ links has equal probability to belong to a random graph of the class $G_r(N, L)$. The probability that an element in the adjacency matrix $A$ is $a_{ij} = 1$ equals $p = \frac{L}{L_{\max}}$. As opposed to the class $G_p(N)$, the number of non-zero elements in $A$ in each random graph of $G_r(N, L)$ is precisely $2L$ (see Appendix B.1, art. 2), which induces weak dependence between links in $G_r(N, L)$. The latter also explains why most computations are easier in $G_p(N)$ than in $G_r(N, L)$.
Fig. 15.6. A connected random graph $G_p(N)$ with $N = 300$ and $p = 0.013$ drawn on a circle.

The average number of paths with $j$ hops between two arbitrary nodes in $G_p(N)$ follows from (15.1) and (2.13) as
\[ E[X_j] = \frac{(N-2)!}{(N-j-1)!}\, p^j \tag{15.10} \]
for $1 \le j \le N-1$. The average total number of paths between two arbitrary nodes $A$ and $B$ equals
\[ E\left[\sum_{j=1}^{N-1} X_j\right] = (N-2)!\, p^{N-1} \sum_{l=0}^{N-2} \frac{p^{-l}}{l!} \le (N-2)!\, p^{N-1} e^{\frac{1}{p}} \]
where the latter bound is closely approached for large $N$. Moreover, when the random graph reduces to the complete graph ($p = 1$), we again obtain (15.3). Since the degree $d_j$ of a node $j$ is the number of links incident with that node, it follows directly from the definition of $G_p(N)$ that the probability density function of the degree $D_{rg}$ of an arbitrary node in $G_p(N)$ equals
\[ \Pr[D_{rg} = k] = \binom{N-1}{k} p^k (1-p)^{N-1-k} \tag{15.11} \]
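The path-count formula (15.10), its sum and the exponential upper bound can be evaluated numerically. The sketch below is an illustration under the assumption that the function names are free to choose; it also checks that, for link density 1, the total reduces to the integer part of $e\,(N-2)!$ as in (15.3) for $N \ge 3$.

```python
from math import e, exp, factorial, floor

def expected_paths_with_hops(N, p, j):
    """Average number of j-hop paths between two given nodes, as in (15.10)."""
    return factorial(N - 2) // factorial(N - j - 1) * p ** j

def expected_total_paths(N, p):
    """Sum of (15.10) over all hop counts j = 1, ..., N-1."""
    return sum(expected_paths_with_hops(N, p, j) for j in range(1, N))

def total_paths_upper_bound(N, p):
    """Upper bound (N-2)! p^(N-1) exp(1/p) on the average total number of paths."""
    return factorial(N - 2) * p ** (N - 1) * exp(1 / p)

# Complete graph (p = 1) on 5 nodes: 1 + 3 + 6 + 6 = 16 paths.
paths_k5 = expected_total_paths(5, 1)
# A sparser hypothetical example with N = 20 and p = 0.5.
mean_paths = expected_total_paths(20, 0.5)
bound = total_paths_upper_bound(20, 0.5)
```

With an integer link density of 1 the arithmetic stays exact, which makes the comparison with the integer part of $e\,(N-2)!$ straightforward for small $N$.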
The interest in random graphs is fueled by the fact that the topology of the Internet is inaccurately known and also that good models$^7$ are lacking. In some sense, the Internet can be regarded as a growing and changing organism. Such complex networks also arise in other fields. Increased interest from different disciplines to understand network behavior resulted in a new wave of science, which may be termed "the physics of networks" and which was recently reviewed by Strogatz (2001). Random graphs are an elegant vehicle to thoroughly analyze the performance of, for example, routing algorithms. Some constructed overlay networks such as Gnutella and mobile wireless ad-hoc networks seem reasonably well modeled by $G_p(N)$. However, the class $G_p(N)$ does not describe the Internet topology well, and the degree distribution especially deviates significantly. The degree (15.11) in $G_p(N)$ is binomially distributed, while that of the Internet is close to a power law (15.4). Hence, there is a discrepancy between Internet measurements and properties of the random graph $G_p(N)$.

$^7$ A detailed discussion on difficulties in modeling or simulating the Internet is presented by Floyd and Paxson (2001).

15.6.1 The number $C(N, L)$ of connected random graphs in the class $G_r(N, L)$
From the point of view of telecommunications networks, by far the most interesting graphs are those with connected topology. This limitation restricts the value of the link density $p$ from below by a critical threshold $p_c$. For large $N$, the critical threshold is $p_c \sim \frac{\log N}{N}$, as shown in Section 15.6.3. In the theory of random graphs, the problem to determine the number of connected random graphs $C(N, L)$ in the class $G_r(N, L)$ has been intensively studied. Gilbert (1956) has presented an exact recursion formula for $C(N, L)$ via the technique of enumeration. Erdős and Rényi (1959, 1960) have determined the asymptotic behavior of random graphs via the probabilistic method, largely introduced by Erdős himself. Since the analysis$^8$ of Gilbert is both exact and simple, we will review his results here and those of Erdős and Rényi in the next section.
Consider a particular random graph $J$ of the class of random graphs $G_r(N+1, L)$, which is constructed from the class $G_r(N, L)$ by adding one node labelled $N+1$. Suppose that the node labelled $N+1$ in the random graph $J$ belongs to a connected component $K$ that possesses $v$ other nodes and some number $\lambda$ of links. The remaining part of $J$ has $N-v$ nodes and $L-\lambda$ links. There are $\binom{N}{v}$ ways in which the $v$ nodes of $K$ can be chosen out of the $N$ nodes in $G_r(N, L)$. On the other hand, there are $C(v+1, \lambda)$ ways of picking a connected random graph $K$ while there are $\binom{\binom{N-v}{2}}{L-\lambda}$ ways of constructing the remaining part of $J$. Hence, since the number of ways we can construct a graph $J$ equals $\binom{\binom{N+1}{2}}{L}$, we obtain Gilbert's recursion formula,
\[ \binom{\binom{N+1}{2}}{L} = \sum_{v=0}^{N} \binom{N}{v} \sum_{\lambda=v}^{\binom{v+1}{2}} \binom{\binom{N-v}{2}}{L-\lambda}\, C(v+1, \lambda) \tag{15.12} \]

$^8$ A different, less admissible approach is found in Goulden and Jackson (1983).

Gilbert (1956) further derives the generating function for $C(N, L)$ as
\[ \sum_{N=1}^{\infty} \sum_{L=1}^{\infty} \frac{C(N, L)}{N!}\, x^N y^L = \log\left(1 + \sum_{k=1}^{\infty} \frac{(1+y)^{\binom{k}{2}} x^k}{k!}\right) \tag{15.13} \]
which converges for $-2 \le y \le 0$ and all $x$. So far, no other explicit formulae for $C(N, L)$ exist.

In 1889, Cayley (see Appendix B.1, art. 3) proved that, in the special case where $L = N-1$, there holds $C(N, N-1) = N^{N-2}$. In the other extreme where $L = L_{\max}$ corresponding with a full mesh, we have $C(N, L_{\max}) = 1$. Actually, when $\binom{N}{2} - N + 1 < L \le L_{\max}$, the graph is always connected because the adjacency matrix $A$ has necessarily at least one non-zero element per row. This means that $C(N, L) = \binom{\binom{N}{2}}{L}$ for $\binom{N}{2} - N + 1 < L \le L_{\max}$. In all cases where $L < N-1$, the random graphs are necessarily disconnected, leading to $C(N, L) = 0$.
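For computational purposes, we rewrite (15.12) as
\[ \binom{\binom{N+1}{2}}{L} = \sum_{\lambda=N}^{\binom{N+1}{2}} \binom{0}{L-\lambda}\, C(N+1, \lambda) + \sum_{v=0}^{N-1} \binom{N}{v} \sum_{\lambda=v}^{\binom{v+1}{2}} \binom{\binom{N-v}{2}}{L-\lambda}\, C(v+1, \lambda) \]
Since $\binom{0}{L-\lambda} = \delta_{\lambda, L}$, we arrive after a substitution of $N \to N-1$ at the recursion formula,
\[ C(N, L) = \binom{\binom{N}{2}}{L} - \sum_{v=0}^{N-2} \binom{N-1}{v} \sum_{\lambda=v}^{\binom{v+1}{2}} \binom{\binom{N-1-v}{2}}{L-\lambda}\, C(v+1, \lambda) \tag{15.14} \]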

Below we list a few values:

C(2,1) = 1
C(3,2) = 3      C(3,3) = 1
C(4,3) = 16     C(4,4) = 15     C(4,5) = 6      C(4,6) = 1
C(5,4) = 125    C(5,5) = 222    C(5,6) = 205    C(5,7) = 120    C(5,8) = 45    C(5,9) = 10    C(5,10) = 1
C(6,5) = 1296   C(6,6) = 3660   C(6,7) = 5700   C(6,8) = 6165   C(6,9) = 4945   C(6,10) = 2997
C(6,11) = 1365  C(6,12) = 455   C(6,13) = 105   C(6,14) = 15    C(6,15) = 1
C(7,6) = 16807    C(7,7) = 68295    C(7,8) = 156555   C(7,9) = 258125
C(7,10) = 331506  C(7,11) = 343140  C(7,12) = 290745  C(7,13) = 202755
C(7,14) = 116175  C(7,15) = 54257   C(7,16) = 20349   C(7,17) = 5985
C(7,18) = 1330    C(7,19) = 210     C(7,20) = 21      C(7,21) = 1
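The recursion (15.14) translates almost directly into a memoized function. The following is a minimal sketch, assuming the base case that a single node without links counts as one connected graph; the function name is illustrative and not taken from Gilbert's paper.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def connected_graphs(N, L):
    """Number of connected labelled graphs with N nodes and L links, via (15.14)."""
    if L < 0:
        return 0
    if N == 1:
        # Assumed base case: one node and no links form one connected graph.
        return 1 if L == 0 else 0
    total = comb(comb(N, 2), L)
    disconnected = 0
    # v = number of other nodes in the component that contains the last node.
    for v in range(N - 1):
        rest_places = comb(N - 1 - v, 2)
        for lam in range(v, comb(v + 1, 2) + 1):
            if lam > L:
                break
            disconnected += (comb(N - 1, v)
                             * connected_graphs(v + 1, lam)
                             * comb(rest_places, L - lam))
    return total - disconnected

# Reproduce one row of the list of values: all counts for N = 5.
row_n5 = [connected_graphs(5, L) for L in range(4, 11)]
```

The double sum enumerates all graphs in which the last node sits in a component with fewer than $N$ nodes, so subtracting it from the total number of graphs leaves the connected ones.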

15.6.2 The Erdős and Rényi asymptotic analysis

In a classical paper, Erdős and Rényi (1959) proved that
\[ \lim_{N \to \infty} \Pr\left[G_r\left(N, \left[\tfrac{1}{2} N \log N + xN\right]\right) \text{ is connected}\right] = e^{-e^{-2x}} \tag{15.15} \]
Ignoring the integral part $[\cdot]$ operator and eliminating $x$ using the number of links $L = \frac{1}{2} N \log N + xN$ gives, for large $N$,
\[ \Pr[G_r(N, L) = \text{connected}] \approx e^{-N e^{-\frac{2L}{N}}} \tag{15.16} \]
which should be compared with the exact result,
\[ \Pr[G_r(N, L) = \text{connected}] = \frac{C(N, L)}{\binom{\binom{N}{2}}{L}} \tag{15.17} \]
In contrast to the unattractive computation of the exact $C(N, L)$ via recursion (15.14), the Erdős and Rényi asymptotic expression (15.16) is simple. The accuracy for relatively small $N$ is shown in Fig. 15.7.
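The asymptotic expression (15.16) is cheap to evaluate. The sketch below, with function names chosen here for illustration, confirms that inserting the number of links $\frac{1}{2} N \log N + xN$ into (15.16) returns the double exponential of (15.15).

```python
from math import exp, log

def asymptotic_prob_connected(N, L):
    """Asymptotic probability (15.16) that a graph with N nodes and L links is connected."""
    return exp(-N * exp(-2.0 * L / N))

def links_for_offset(N, x):
    """Number of links 0.5 N log N + x N, ignoring the integral part."""
    return 0.5 * N * log(N) + x * N

# At x = 0 the limit law (15.15) gives exp(-1); at x = 1 it gives exp(-exp(-2)).
at_threshold = asymptotic_prob_connected(1000, links_for_offset(1000, 0.0))
above_threshold = asymptotic_prob_connected(1000, links_for_offset(1000, 1.0))
```

The value at the threshold is independent of $N$, which reflects that the offset $x$, and not the number of nodes, controls the limit probability.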
Fig. 15.7. The probability that a random graph $G_r(N, L)$ is disconnected: a comparison between the exact result (15.17) and Erdős' asymptotic formula (15.16) for $L = N$, $L = \frac{3}{2} N$, $L = 2N$ and $L = \frac{2}{3} N \log N$. (Vertical axis: $\Pr[G_r(N, L) = \text{disconnected}]$ from 0 to 1; horizontal axis: number of nodes $N$ from 0 to 60.)

The key observation of Erdős and Rényi (1959) is that a phase transition in random graphs with $N$ nodes occurs when the number of links $L$ is around $L_c = \frac{1}{2} N \log N$. Phase transitions are well-known phenomena in physics. For example, at a certain temperature, most materials possess a solid-liquid transition and at a higher temperature a second liquid-gas transition. Below that critical temperature, most properties of the material are completely different than above that temperature. Some materials are superconductive below a certain critical temperature $T_c$, but normally conductive (or even insulating) above $T_c$. Erdős and Rényi concentrated on the property $A_k$ that a random graph $G_r(N, L_x)$ with $L_x = \left[\frac{1}{2} N \log N + xN\right]$ consists of $N-k$ connected nodes and $k$ isolated nodes for fixed $k$. If $A_k^c$ means the absence of property $A_k$, they proved that, for all fixed $k$, $\Pr[A_k^c] \to 0$ if $N \to \infty$ which means that for a large number of nodes $N$, almost all random graphs $G_r(N, L_x)$ possess property $A_k$. This result is equivalent to a result proved in Section 15.6.3 that the class of random graphs $G_p(N)$ is almost surely disconnected if the link density $p$ is below $p_c \sim \frac{\log N}{N}$ and connected for $p > p_c$. In view of the analogy with physics, it is not surprising that corresponding sharp transitions also are observed for other properties than just $A_k$.
In the sequel, we will show that, for the random graph $G_r(N, L_x)$, the probability that the largest connected component, called the giant component $GC(N, L_x)$, has $N-k$ nodes is, for large $N$, Poisson distributed with mean $e^{-2x}$,
\[ \lim_{N \to \infty} \Pr[\text{number of nodes in } GC(N, L_x) = N-k] = \frac{(e^{-2x})^k\, e^{-e^{-2x}}}{k!} \tag{15.18} \]
If $k = 0$, then all nodes belong to the giant component and the graph is completely connected in which case (15.18) leads to (15.16).
The total number of graphs $G_r(N, L_x)$ with $k \ge 1$ isolated nodes equals $\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}$, the number of ways in which $k$ isolated nodes can be chosen out of the total of $N$ nodes multiplied by the number of graphs that can be constructed with $N-k$ nodes and $L_x$ links. Observe that this total number also includes those graphs where not all the $N-k$ nodes are necessarily connected. In other words, this total number includes the graphs that do not possess property $A_k$. The total number of graphs $T_0$ without isolated node follows from the inclusion-exclusion formula (2.10) as
\[ T_0(N, L_x) = \sum_{k=0}^{N} (-1)^k \binom{N}{k} \binom{\binom{N-k}{2}}{L_x} \]
where the index $k = 0$ equals the total number of graphs with $N$ nodes and $L_x$ links, i.e. the total number of elements in the sample space. Evidently, the total number $C(N, L_x)$ of connected random graphs of the class $G_r(N, L_x)$ is smaller than $T_0(N, L_x)$ because all of them must obey property $A_0$ as well.
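Since$^9$
\[ \lim_{N \to \infty} \frac{\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{(e^{-2x})^k}{k!} \]
we obtain
\[ \lim_{N \to \infty} \frac{T_0(N, L_x)}{\binom{\binom{N}{2}}{L_x}} = \sum_{k=0}^{\infty} (-1)^k \frac{(e^{-2x})^k}{k!} = e^{-e^{-2x}} \]
But, if $N \to \infty$, the difference
\[ \frac{T_0(N, L_x) - C(N, L_x)}{\binom{\binom{N}{2}}{L_x}} \le \Pr[A_0^c] \to 0 \]
which demonstrates (15.18) for $k = 0$. The remaining case for $k > 0$ in (15.18) follows from the observation that the number of graphs in $G_r(N, L_x)$ with property $A_k$ is equal to $\binom{N}{k}$ multiplied by the number of connected graphs with $N-k$ nodes and $L_x$ links, which is approximately
\[ \frac{\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} \cdot \frac{T_0(N-k, L_x)}{\binom{\binom{N-k}{2}}{L_x}} \to \frac{(e^{-2x})^k}{k!}\, e^{-e^{-2x}} \]
where the limit gives the correct result because the small difference between the total number and that without property $A_k$ tends to zero.

$^9$ It is convenient to take the logarithm of
\begin{align*}
t_k &= \frac{\binom{N}{k} \binom{\binom{N-k}{2}}{L_x}}{\binom{\binom{N}{2}}{L_x}} = \frac{1}{k!} \prod_{j=0}^{k-1} (N-j) \prod_{j=0}^{L_x-1} \frac{\binom{N-k}{2} - j}{\binom{N}{2} - j} \\
&= \frac{N^k}{k!} \prod_{j=0}^{k-1} \left(1 - \frac{j}{N}\right) \left(1 - \frac{k}{N}\right)^{L_x} \left(1 - \frac{k}{N-1}\right)^{L_x} \prod_{j=0}^{L_x-1} \frac{1 - \frac{2j}{(N-k)(N-1-k)}}{1 - \frac{2j}{N(N-1)}}
\end{align*}
which is
\begin{align*}
\log(k!\, t_k) = {} & k \log N + \sum_{j=0}^{k-1} \log\left(1 - \frac{j}{N}\right) + L_x \left[\log\left(1 - \frac{k}{N}\right) + \log\left(1 - \frac{k}{N-1}\right)\right] \\
& + \sum_{j=0}^{L_x-1} \left[\log\left(1 - \frac{2j}{(N-k)(N-1-k)}\right) - \log\left(1 - \frac{2j}{N(N-1)}\right)\right]
\end{align*}
For large $N$ and using the expansion $\log(1-z) = -z + O(z^2)$, we have for fixed $k$ with
\[ \log\left(1 - \frac{2j}{(N-k)(N-1-k)}\right) = \log\left(1 - \frac{2j}{N(N-1)}\right) + O\left(j N^{-3}\right) \]
that
\[ \log(k!\, t_k) = k \log N + O\left(N^{-1}\right) - L_x \frac{2k}{N} + O\left(L_x^2 N^{-3}\right) \]
In order to have a finite limit $\lim_{N \to \infty} \log(k!\, t_k) = c \in \mathbb{R}$, we must require that $k \log N - L_x \frac{2k}{N} = c$ which implies that $L_x = \frac{N}{2} \log N - \frac{N}{2k} c$. For this scaling the order term $O\left(L_x^2 N^{-3}\right)$ indeed vanishes if $N \to \infty$. By choosing $x = -\frac{c}{2k}$, we arrive at the correct scaling of $L_x = \frac{1}{2} N \log N + xN$ postulated above and $c = -2kx$.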

15.6.3 Connectivity and degree


There is an interesting relation between the connectivity of a graph, a global property, and the degree $D$ of an arbitrary node, a local property. The implication $\{G \text{ is connected}\} \Longrightarrow \{D_{\min} \ge 1\}$ where $D_{\min} = \min_{\text{all nodes} \in G} D$ is always true. The opposite implication is not always true, however, because a network can consist of separate, disconnected clusters containing nodes each with minimum degree larger than 1. A random graph can be generated from a set of $N$ labelled nodes by randomly assigning a link with probability $p$ to each pair of nodes. During this construction process, initially separate clusters originate, but at a certain moment, one of those clusters starts dominating (and swallowing) the other clusters. This largest cluster becomes the giant component. For large $N$ and a certain $p_N$ which depends on $N$, the implication $\{D_{\min} \ge 1\} \Longrightarrow \{G_p(N) \text{ is connected}\}$ is almost surely (a.s.) correct. A rigorous mathematical proof is fairly complex and omitted. Thus, for large random graphs $G_p(N)$ holds the equivalence $\{G_p(N) \text{ is connected}\} \Longleftrightarrow \{D_{\min} \ge 1\}$ almost surely such that
\[ \Pr[G_p(N) \text{ is connected}] = \Pr[D_{\min} \ge 1] \quad \text{a.s.} \]
From (3.32) and (15.11), we have that
\[ \Pr[D_{\min} \ge 1] = \left(\Pr[D_{rg} \ge 1]\right)^N = \left(1 - \Pr[D_{rg} = 0]\right)^N = \left(1 - (1-p)^{N-1}\right)^N \]
which shows that $\Pr[D_{\min} \ge 1]$ rapidly tends to one for fixed $0 < p < 1$ and large $N$. Therefore, the asymptotic behavior of $\Pr[G_p(N) \text{ is connected}]$ requires the investigation of the influence of $p$ as a function of $N$,
\begin{align*}
\Pr[G_p(N) \text{ is connected}] &= \exp\left(N \log\left(1 - (1-p_N)^{N-1}\right)\right) \\
&= \exp\left(-N \sum_{j=1}^{\infty} \frac{(1-p_N)^{j(N-1)}}{j}\right) \\
&= e^{-N (1-p_N)^{N-1}} \exp\left(-N \sum_{j=2}^{\infty} \frac{(1-p_N)^{(N-1)j}}{j}\right)
\end{align*}

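If we denote $c_N \triangleq N (1-p_N)^{N-1}$, then
\[ N \sum_{j=2}^{\infty} \frac{(1-p_N)^{(N-1)j}}{j} = \sum_{j=2}^{\infty} \frac{c_N^j}{j N^{j-1}} \]
can be made arbitrarily small for large $N$ provided we choose $c_N = O\left(N^{\beta}\right)$ with $\beta < \frac{1}{2}$. Thus, for large $N$, we have that
\[ \Pr[G_p(N) \text{ is connected}] = e^{-c_N} \left(1 + O\left(N^{2\beta-1}\right)\right) \]
which tends to 0 for $0 < \beta < \frac{1}{2}$ and to 1 for $\beta < 0$. Hence, the critical exponent where a sharp transition occurs is $\beta = 0$. In that case, $c_N = c$ (a real positive constant) and
\[ p_N = 1 - \exp\left(-\frac{\log \frac{N}{c}}{N-1}\right) = \frac{\log N}{N} + O\left(\frac{\log c}{N}\right) \]
In summary, for large $N$,
\[ \Pr[G_p(N) \text{ is connected}] \to \begin{cases} 0 & \text{if } p < \frac{\log N}{N} \\ 1 & \text{if } p > \frac{\log N}{N} \end{cases} \tag{15.19} \]
with a transition region around $p_c \sim \frac{\log N}{N}$ with a width of $O\left(\frac{1}{N}\right)$. Notice the agreement with (15.15) where $p_x = \frac{L_x}{L_{\max}} \simeq \frac{\log N}{N} + \frac{2x}{N}$: for large $x < 0$, $e^{-e^{-2x}} \to 0$, while for large $x > 0$, $e^{-e^{-2x}} \to 1$ and the width of the transition region for the link density $p$ is $O\left(\frac{1}{N}\right)$.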
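The sharpness of the transition around a link density of $\frac{\log N}{N}$ can be made visible by evaluating the probability that no node is isolated. The sketch below is an illustration only: the function name and the choice of one million nodes are assumptions, and the logarithm is used to keep the large power numerically stable.

```python
from math import exp, log, log1p

def prob_no_isolated_node(N, p):
    """Probability (1 - (1-p)^(N-1))^N that the minimum degree is at least 1."""
    q = (1.0 - p) ** (N - 1)  # probability that a given node has degree 0
    if q >= 1.0:
        return 0.0            # p = 0: every node is isolated
    return exp(N * log1p(-q))

N = 10 ** 6
threshold = log(N) / N
below = prob_no_isolated_node(N, 0.5 * threshold)
at = prob_no_isolated_node(N, threshold)
above = prob_no_isolated_node(N, 2.0 * threshold)
```

Halving or doubling the link density relative to the threshold moves the probability from practically 0 to practically 1, while exactly at the threshold it stays close to $e^{-1}$.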
15.6.4 Size of the giant component
Let $S = \Pr[n \in C]$ denote the probability that a node $n$ in $G_p(N)$ belongs to the giant component $C$. If $n \notin C$, then none of the neighbors of node $n$ belongs to the giant component. The number of neighbors of a node $n$ is the degree $d_n$ of a node such that
\begin{align*}
\Pr[n \notin C] &= \Pr[\text{all neighbors of } n \notin C] \\
&= \sum_{k \ge 0} \Pr[\text{all } k \text{ neighbors of } n \notin C \mid d_n = k]\, \Pr[d_n = k]
\end{align*}
Since in $G_p(N)$ all neighbors of $n$ are independent$^{10}$, the conditional probability becomes, with $1 - S = \Pr[n \notin C]$,
\[ \Pr[\text{all } k \text{ neighbors of } n \notin C \mid d_n = k] = \left(\Pr[n \notin C]\right)^k = (1-S)^k \]
Moreover, this probability holds for any node $n \in G_p(N)$ such that, writing the random variable $D_{rg}$ instead of an instance $d_n$,
\[ 1 - S = \sum_{k=0}^{\infty} (1-S)^k \Pr[D_{rg} = k] = \varphi_{D_{rg}}(1-S) \]
where $\varphi_{D_{rg}}(u) = E\left[u^{D_{rg}}\right]$ is the generating function of the degree $D_{rg}$ in $G_p(N)$. For large $N$, the degree distribution in $G_p(N)$ is Poisson distributed with mean degree $\mu_{rg} = p(N-1)$ and $\varphi_{D_{rg}}(u) \simeq e^{\mu_{rg}(u-1)}$. For large $N$, the fraction $S$ of nodes in the giant component in the random graph satisfies an equation similar to that in (12.13) of the extinction probability in a branching process,
\[ S = 1 - e^{-\mu_{rg} S} \tag{15.20} \]
and the average size of the giant component is $NS$. For $\mu_{rg} < 1$ the only solution is $S = 0$ whereas for $\mu_{rg} > 1$ there is a non-zero solution for the size of the giant component. The solution can be expressed as a Lagrange series using (5.34),
\[ S(\mu_{rg}) = 1 - e^{-\mu_{rg}} \sum_{n=0}^{\infty} \frac{(n+1)^n}{(n+1)!} \left(\mu_{rg}\, e^{-\mu_{rg}}\right)^n \tag{15.21} \]
By reversing (15.20), the average degree in the random graph can be expressed in terms of the fraction $S$ of nodes in the giant component,
\[ \mu_{rg}(S) = -\frac{\log(1-S)}{S} \tag{15.22} \]
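Equation (15.20) has no closed-form solution in elementary functions, but a fixed-point iteration converges quickly. The following sketch is one possible numerical approach, not the Lagrange series of (15.21); the function names, the starting value 1 and the iteration count are assumptions made for illustration.

```python
from math import exp, log

def giant_component_fraction(mean_degree, iterations=500):
    """Solve S = 1 - exp(-mean_degree * S) by fixed-point iteration from S = 1."""
    S = 1.0
    for _ in range(iterations):
        S = 1.0 - exp(-mean_degree * S)
    return S

def mean_degree_from_fraction(S):
    """Inverse relation (15.22): mean degree as a function of the fraction S."""
    return -log(1.0 - S) / S

# Below mean degree 1 the iteration collapses to 0; above it a giant component appears.
S_subcritical = giant_component_fraction(0.5)
S_supercritical = giant_component_fraction(2.0)
recovered_mean_degree = mean_degree_from_fraction(S_supercritical)
```

Starting from 1 rather than 0 matters: 0 is always a fixed point of (15.20), so an iteration started there would never find the non-zero solution.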

$^{10}$ This argument is not valid, for example, for a two-dimensional lattice $\mathbb{Z}_p^2$ in which each link between adjacent nodes at integer value coordinates in the plane exists with probability $p$. The critical link density for connectivity in $\mathbb{Z}_p^2$ is $p_c = \frac{1}{2}$, a famous result proved in the theory of percolation (see, for example, Grimmett (1989)).

15.7 The hopcount in a large, sparse graph with unit link weights
Routers in the Internet forward IP packets to the next hop router, which is found by routing protocols (such as OSPF and BGP). Intra-domain routing such as OSPF is based on the Dijkstra shortest path algorithm, while inter-domain routing with BGP is policy-based, which implies that BGP does not minimize a length criterion. Nevertheless, end-to-end paths in the Internet are shortest paths in roughly 70% of the cases. Therefore, we consider the shortest path between two arbitrary nodes because (a) the IP address does not reflect a precise geographical location and (b) uniformly distributed world wide communication, especially on the web, seems natural since the information stored in servers can be located in places unexpected and unknown to browsing users. The Internet type of communication is different from classical telephony because (a) telephone numbers have a direct binding with a physical location and (b) the intensity of average human interaction rapidly decreases with distance. We prefer to study the hopcount $H_N$ because it is simple to measure via the trace-route utility, it is an integer, dimensionless, and the quality of service (QoS) measures (such as packet delay, jitter and packet loss) depend on the hopcount, the number of traversed routers. In this section, we first investigate the hopcount in a sparse, but connected graph where all links have unit weight. Chapter 16 treats graphs with other link weight structures.

15.7.1 Bi-directional search


The basic idea of a bi-directional search for the shortest path is to start the discovery process (e.g. using Dijkstra's algorithm) from $A$ and $B$ simultaneously. When the two subsections from $A$ and from $B$ meet, their concatenation forms the shortest path from $A$ to $B$. In case all link weights are equal, $w(i \to j) = 1$ for any link $i \to j$ in a graph $G$, the shortest path from $A$ to $B$ is found when the discovery process from $A$ and that from $B$ have precisely one node of the graph in common.
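The bi-directional idea can be illustrated for unit link weights with a breadth-first search grown from both end nodes. The sketch below is only an illustration under stated assumptions: the function name `bidirectional_hopcount` and the adjacency-dictionary input are hypothetical, the graph is taken to be undirected, and the meeting of the two discovery processes is detected on a link (a variant of the common-node criterion in the text) so that odd hopcounts are handled as well.

```python
def bidirectional_hopcount(adj, a, b):
    """Hopcount of the shortest path between a and b for unit link weights.

    adj maps each node to an iterable of its neighbours (undirected graph).
    Returns None when a and b are not connected.
    """
    if a == b:
        return 0
    dist = {a: {a: 0}, b: {b: 0}}   # hop distances discovered from each side
    frontier = {a: [a], b: [b]}     # outermost complete level of each side
    while frontier[a] and frontier[b]:
        # Grow the side with the smaller frontier by exactly one hop.
        side, other = (a, b) if len(frontier[a]) <= len(frontier[b]) else (b, a)
        next_level = []
        for u in frontier[side]:
            for v in adj[u]:
                if v in dist[other]:
                    # Both discovery processes meet on the link u - v.
                    return dist[side][u] + 1 + dist[other][v]
                if v not in dist[side]:
                    dist[side][v] = dist[side][u] + 1
                    next_level.append(v)
        frontier[side] = next_level
    return None
```

Because each side always holds complete levels, the first link that joins the two discovered sets closes a shortest path, so the function can return immediately.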
Denote by $C_A(l)$, respectively $C_B(l)$, the set of nodes that can be reached from $A$, respectively $B$, in $l$ or fewer hops. We define $C_A(0) = \{A\}$ and $C_B(0) = \{B\}$. The hopcount is larger than $2l$ if and only if $C_A(l) \cap C_B(l)$ is empty. Conditionally on $|C_A(l)| = n_A$, respectively $|C_B(l)| = n_B$, the sets $C_A(l)$ and $C_B(l)$ do not possess a common node with probability
$$\Pr\left[C_A(l) \cap C_B(l) = \emptyset \,\middle|\, |C_A(l)| = n_A, |C_B(l)| = n_B\right] = \frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}}$$

which consists of the ratio of all combinations in which the $n_B$ nodes around $B$ can be chosen out of the remaining nodes that do not belong to the set $C_A$ over all combinations in which $n_B$ nodes can be chosen in the graph with $N$ nodes except for node $A$. Furthermore,

$$\frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \frac{(N-n_A-1)(N-n_A-2)\cdots(N-n_A-n_B)}{(N-1)(N-2)\cdots(N-n_B)} = \frac{\left(1-\frac{n_A+1}{N}\right)\left(1-\frac{n_A+2}{N}\right)\cdots\left(1-\frac{n_A+n_B}{N}\right)}{\left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\cdots\left(1-\frac{n_B}{N}\right)}$$

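For large $N$, we apply the Taylor series around $x = 0$ of $\log(1-x) = -x - \sum_{j=2}^{\infty} \frac{x^j}{j}$,

$$\log \frac{\binom{N-1-n_A}{n_B}}{\binom{N-1}{n_B}} = \sum_{k=1}^{n_B} \log\left(1 - \frac{n_A+k}{N}\right) - \sum_{k=1}^{n_B} \log\left(1 - \frac{k}{N}\right)$$

$$= -\sum_{k=1}^{n_B} \left(\frac{n_A+k}{N} - \frac{k}{N}\right) - \sum_{j=2}^{\infty} \frac{1}{jN^j} \sum_{k=1}^{n_B} \left[(n_A+k)^j - k^j\right]$$

$$= -\frac{n_A n_B}{N} - \left(\frac{n_A n_B}{N}\right)^2 \left(\frac{1}{2n_A} + \frac{1}{2n_B} + \frac{1}{2n_A n_B}\right) - R$$

where the remainder is

$$R = \sum_{j=3}^{\infty} \frac{1}{jN^j} \sum_{m=0}^{j-1} \binom{j}{m} n_A^{j-m} \sum_{k=1}^{n_B} k^m = O\left(\left(\frac{n_A n_B}{N}\right)^3\right)$$

After exponentiation,

$$\Pr\left[H_N > 2l \,\middle|\, |C_A(l)| = n_A, |C_B(l)| = n_B\right] = e^{-\frac{n_A n_B}{N}} \left(1 + O\left(\left(\frac{n_A n_B}{N}\right)^2\right)\right)$$

By the law of total probability (2.47) and up to $O\left(\frac{E\left[|C_A(l)|^2 |C_B(l)|^2\right]}{N^2}\right)$ for large $N$, we obtain

$$\Pr[H_N > 2l] \approx E\left[\exp\left(-\frac{|C_A(l)|\,|C_B(l)|}{N}\right)\right] \tag{15.23}$$

This probability (15.23) holds for any large graph with a unit link weight structure provided $E\left[|C_A(l)|^2 |C_B(l)|^2\right] = o(N^2)$. Formula (15.23) becomes increasingly accurate for decreasing $|C_A(l)|$ and $|C_B(l)|$, and so for sparser large graphs.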
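The quality of the exponential approximation behind (15.23) can be checked numerically for fixed set sizes. The sketch below evaluates the exact ratio of binomial coefficients in its product form and compares it with $e^{-n_A n_B / N}$; the function names and the chosen values of $N$, $n_A$ and $n_B$ are illustrative assumptions, not part of the text.

```python
import math

def no_overlap_exact(N, n_a, n_b):
    # Product form of binom(N-1-n_a, n_b) / binom(N-1, n_b).
    return math.prod((N - n_a - k) / (N - k) for k in range(1, n_b + 1))

def no_overlap_approx(N, n_a, n_b):
    # Leading term exp(-n_a * n_b / N) of the expansion for large N.
    return math.exp(-n_a * n_b / N)

if __name__ == "__main__":
    N, n_a, n_b = 10**6, 200, 200
    print(no_overlap_exact(N, n_a, n_b), no_overlap_approx(N, n_a, n_b))
```

Since every correction term in the expansion of the logarithm is negative, the exact probability lies slightly below the exponential.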


15.7.2 Sparse large graphs and a branching process


In order to proceed, the number of nodes in the sets $C_A(l)$ and $C_B(l)$ needs to be determined, which is difficult in general. Therefore, we concentrate here on a special class of graphs in which the discovery process from $A$ and $B$ is reasonably well modeled by a branching process (Chapter 12). A branching process evolves from a given set $C_A(l-1)$ in the next $l$-th discovery cycle (or generation) to the set $C_A(l)$ by including only new nodes, not those previously discovered. The application of a branching process implies that the newly discovered nodes do not possess links to any previously discovered node of $C_A(l-1)$ except for their parent node in $C_A(l-1)$. Hence, this assumption can be justified only for large and sparse graphs or tree-like graphs, provided that the number of links that point backwards to early discovered nodes in $C_A(l-1)$ is negligibly small.
Assuming that a branching process models the discovery process well, we will compute the number of nodes that can be reached from $A$ and similarly from $B$ in $l$ hops from a branching process with production $Y$ specified by the degree distribution of the nodes in the graph. The additional number of nodes $X_l$ discovered during the $l$-th cycle of a branching process that are included in the set $C_A(l)$ is described by the basic law (12.1). Thus, $|C_A(l)| = \sum_{k=0}^{l} X_k$ with $X_0 = 1$ (namely node $A$). In terms of the scaled random variable $W_k = \frac{X_k}{\mu^k}$ with unit mean $E[W_k] = 1$,

$$|C_A(l)| = \sum_{k=0}^{l} W_k \mu^k$$

and where $\mu = E[Y] - 1 > 1$ denotes the average degree minus 1, i.e. the outdegree, in the graph. Only the root has $E[Y]$ equal to the mean degree. Immediately, the average size of the set of nodes reached from $A$ in $l$ hops is, with $E[W_k] = 1$,

$$E[|C_A(l)|] = \sum_{k=0}^{l} \mu^k = \frac{\mu^{l+1} - 1}{\mu - 1}$$

which equally holds for $E[|C_B(l)|]$.
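The geometric growth of the mean cluster size can be illustrated by simulating a branching process. The sketch below assumes, purely for illustration, a Poisson production with mean $\mu$ in every generation (including the root, which the text treats separately); the helper names `poisson`, `cluster_size` and `expected_cluster_size` are hypothetical.

```python
import math
import random

def poisson(rng, mean):
    # Knuth's multiplication method; adequate for small means.
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def cluster_size(rng, mu, hops):
    # Number of nodes reached in at most `hops` generations, root included.
    size = generation = 1
    for _ in range(hops):
        generation = sum(poisson(rng, mu) for _ in range(generation))
        size += generation
    return size

def expected_cluster_size(mu, hops):
    # Closed form (mu^(l+1) - 1) / (mu - 1) of the geometric sum.
    return (mu ** (hops + 1) - 1) / (mu - 1)

if __name__ == "__main__":
    rng = random.Random(1)
    runs = 10000
    mean = sum(cluster_size(rng, 2.0, 5) for _ in range(runs)) / runs
    print(mean, expected_cluster_size(2.0, 5))
```

The sample mean only approaches the closed form; individual realizations fluctuate strongly, which is what the scaled variables $W_k$ capture.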


Applying Jensen's inequality (5.5) to (15.23) yields

$$\exp\left(-\frac{E[|C_A(l)|]\,E[|C_B(l)|]}{N}\right) \leq E\left[\exp\left(-\frac{|C_A(l)|\,|C_B(l)|}{N}\right)\right]$$

such that

$$\Pr[H_N > k] \gtrsim \exp\left(-\frac{\mu^2}{N(\mu-1)^2}\,\mu^k\right)$$

With the tail probability expression (2.36) for the average, we arrive at the lower bound for the expected hopcount in large graphs,

$$E[H_N] = \sum_{k=0}^{\infty} \Pr[H_N > k] \gtrsim \sum_{k=0}^{\infty} \exp\left(-\frac{\mu^2}{N(\mu-1)^2}\,\mu^k\right)$$

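The sum $S_1(t) = \sum_{k=0}^{\infty} \exp\left(-t\mu^k\right)$ can be evaluated exactly$^{11}$ as

$$S_1(t) = \frac{1}{2} - \frac{\log t + \gamma}{\log \mu} + \frac{2}{\sqrt{\log \mu}} \sum_{k=1}^{\infty} \frac{\cos\left[\frac{2k\pi}{\log \mu} \log t - \arg \Gamma\left(\frac{2k\pi i}{\log \mu}\right)\right]}{\sqrt{2k \sinh\left(\frac{2k\pi^2}{\log \mu}\right)}} + \sum_{k=1}^{\infty} \left(1 - e^{-t\mu^{-k}}\right)$$

Furthermore,

$$\left|\sum_{k=1}^{\infty} \frac{\cos\left[\frac{2k\pi}{\log \mu} \log t - \arg \Gamma\left(\frac{2k\pi i}{\log \mu}\right)\right]}{\sqrt{2k \sinh\left(\frac{2k\pi^2}{\log \mu}\right)}}\right| \leq \sum_{k=1}^{\infty} \frac{1}{\sqrt{2k \sinh\left(\frac{2k\pi^2}{\log \mu}\right)}} = b(\mu)$$

and the function $T(\mu) = \frac{2b(\mu)}{\sqrt{\log \mu}}$ is increasing, but for $1 < \mu \leq 5$ its maximum value $T(5)$ is smaller than 0.0035. Since $t = \frac{\mu^2}{N(\mu-1)^2}$ is small and $\mu > 1$, we approximate

$$S_1(t) \approx \frac{1}{2} - \frac{\log t + \gamma}{\log \mu} \tag{15.24}$$

and arrive, for large $N$, at

$$E[H_N] \gtrsim \frac{\log N}{\log \mu} + \frac{1}{2} - \frac{\log \frac{\mu^2}{(\mu-1)^2} + \gamma}{\log \mu} \sim \frac{\log N}{\log \mu}$$

This shows that in large, sparse graphs for which the discovery process is well modeled by a branching process, it holds that $E[H_N]$ scales as $\frac{\log N}{\log \mu}$ where $\mu = E[Y] - 1 > 1$ is the average degree minus 1 in the graph.

11 For $\sigma = \mathrm{Re}(s) > 0$ and $\mathrm{Re}(p) > 0$, we have

$$\frac{\Gamma(s)}{p^s} = \int_0^{\infty} t^{s-1} e^{-pt}\, dt$$

and

$$\Gamma(s) \sum_{k=0}^{\infty} \frac{1}{\mu^{ks}} = \int_0^{\infty} t^{s-1} \sum_{k=0}^{\infty} e^{-\mu^k t}\, dt$$

or

$$\int_0^{\infty} t^{s-1} S_1(t)\, dt = \Gamma(s) \frac{\mu^s}{\mu^s - 1}$$

By Mellin inversion, for $c > 0$,

$$S_1(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} \frac{\Gamma(s)}{\mu^s - 1} \left(\frac{\mu}{t}\right)^s ds$$

By moving the line of integration to the left, we encounter a double pole at $s = 0$ from $\Gamma(s)$ and $\frac{1}{\mu^s - 1}$ and simple poles at $s = \frac{2k\pi i}{\log \mu}$ from $\frac{1}{\mu^s - 1}$. Invoking Cauchy's residue theorem leads to the result.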
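How closely the two-term approximation (15.24) follows the sum $S_1(t)$ can be checked by direct summation. The sketch below is illustrative only: the function names, the truncation at 200 terms and the sample values of $t$ and $\mu$ are assumptions chosen for the check.

```python
import math

EULER_GAMMA = 0.5772156649015329

def s1_direct(t, mu, terms=200):
    # Truncated sum of exp(-t * mu^k); later terms underflow to zero.
    return sum(math.exp(-t * mu ** k) for k in range(terms))

def s1_approx(t, mu):
    # Approximation (15.24): 1/2 - (log t + gamma) / log mu.
    return 0.5 - (math.log(t) + EULER_GAMMA) / math.log(mu)

if __name__ == "__main__":
    for mu in (2.0, 3.0, 5.0):
        t = 1e-6
        print(mu, s1_direct(t, mu), s1_approx(t, mu))
```

For small $t$ the difference stays within the oscillation bound $T(\mu)$ plus a term of order $t$, consistent with the estimate $T(5) < 0.0035$ quoted in the text.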
We can refine the above analysis. Let us now assume that the convergence of $W_k \to W$ is sufficiently fast for large $N$ and that $W > 0$ such that

$$|C_A(l)| \approx W_A \sum_{k=0}^{l} \mu^k = W_A \frac{\mu^{l+1} - 1}{\mu - 1} \approx W_A \frac{\mu^{l+1}}{\mu - 1}$$

is a good approximation (and similarly for $|C_B(l)|$). The verification of this approximation is difficult in general. Theorem 12.3.2 states that $\Pr[W = 0] = \pi_0$ and equivalently $\Pr[W > 0] = 1 - \pi_0$ where the extinction probability $\pi_0$ obeys the equation (12.13). Using this approximation, we find from (15.23)

$$\Pr[H_N > 2l] \approx E\left[\exp\left(-\frac{W_A W_B\, \mu^{2l+2}}{N(\mu-1)^2}\right) \,\middle|\, W_A, W_B > 0\right]$$

where the condition on $W > 0$ is required, else there are no clusters $C_A(l)$ and $C_B(l)$ nor a path. Since the same asymptotics also holds for odd values of the hopcount, we finally arrive, for $k \geq 1$ and large $N$, at

$$\Pr[H_N > k] \approx E\left[\exp\left(-Z\mu^k\right) \,\middle|\, W_A, W_B > 0\right]$$

where the random variable

$$Z = \frac{\mu^2\, W_A W_B}{N(\mu-1)^2}$$

and $W_A \stackrel{d}{=} W_B \stackrel{d}{=} W$. A more explicit computation of $\Pr[H_N > k]$ requires the knowledge of the limit random variable $W$, which strongly depends on the nodal degree $Y$.
The average hopcount $E[H_N]$ is found similarly as in the analysis above by using (15.24) with $t = Z$,

$$E[H_N] \approx E\left[S_1(Z) \,\middle|\, W_A, W_B > 0\right]$$

$$\approx E\left[\frac{1}{2} - \frac{2\log W - \log N + 2\log\frac{\mu}{\mu-1} + \gamma}{\log \mu} \,\middle|\, W > 0\right]$$

$$= \frac{1}{2} + \frac{\log N - 2\log\frac{\mu}{\mu-1} - \gamma}{\log \mu} - \frac{2E[\log W \mid W > 0]}{\log \mu}$$

In sparse graphs with average degree $E[Y]$ equal to $\mu$ and for a large number of nodes $N$, the average hopcount is well approximated$^{12}$ by

$$E[H_N] = \frac{\log N}{\log \mu} + \frac{1}{2} - \frac{\gamma + 2\log\frac{\mu}{\mu-1}}{\log \mu} - \frac{2E[\log W \mid W > 0]}{\log \mu} \tag{15.25}$$

This expression (15.25) for the average hopcount, which is more refined than the commonly used estimate $E[H_N] \approx \frac{\log N}{\log \mu}$, contains the curious average $E[\log W \mid W > 0]$ where $W$ is the limit random variable of the branching process produced by the graph's degree distribution $Y$.
Application to $G_p(N)$. The above analysis holds for fixed $E[Y] = p(N-1)$ such that, for large $N$, we require that $p = \frac{\mu}{N}$ where $\mu$ is approximately equal to the average degree. Since the binomial distribution (15.11) for the degree in $G_p(N)$ is very well approximated by the Poisson distribution $\Pr[D_{rg} = k] \approx \frac{\mu^k}{k!} e^{-\mu}$ for large $N$ and constant $\mu$, formula (15.25) requires the computation of $E[\log W \mid W > 0]$ in a Poisson branching process, which is presented in Hooghiemstra and Van Mieghem (2005) but here summarized in Fig. 15.8. The numerical evaluation of the average hopcount (15.25) in a random graph of the class $G_p(N)$ for small average degree $\mu$ and large $N$ shows that (15.25) is much more accurate than only its first term $\frac{\log N}{\log \mu}$.

[Figure 15.8: plot of $E[\log W \mid W > 0]$ (vertical axis, from -0.2 to 1.2) versus the average degree $\mu$ (horizontal axis, from 1 to 10).]

Fig. 15.8. The quantity $E[\log W \mid W > 0]$ of a Poisson branching process versus the average degree $\mu$.

12 A more rigorous derivation that stochastically couples the graph's growth specified by a certain degree distribution to a corresponding branching process is found in van der Hofstad et al. (2005). In particular, the analysis is shown to be valid for any randomly constructed graph with a finite variance of the degree. More details on the result for the average hopcount are presented in Hooghiemstra and Van Mieghem (2005).
At the other end of the scale for a constant link density $p = c < 1$, which corresponds to an average degree $E[Y] = c(N-1)$, the above analysis no longer applies for such large values of the average degree $E[Y]$. Fortunately, in that case, an exact asymptotic analysis is possible (see Problem (iii)):

$$\Pr[H_N = 1] = p$$

$$\Pr[H_N = 2] = (1-p)\left[1 - (1-p^2)^{N-2}\right] \tag{15.26}$$

Values of $H_N$ higher than 2 are extremely unlikely since $\Pr[H_N > 2] = (1-p)\left(1-p^2\right)^{N-2}$ tends to zero rapidly for sufficiently large $N$. Hence, $E[H_N] \simeq \Pr[H_N = 1] + 2\Pr[H_N = 2] \simeq 2 - p$ and, similarly, we find $\mathrm{Var}[H_N] \simeq p(1-p)$. This asymptotic analysis even holds for a larger link density regime $p = cN^{-\frac{1}{2}+\varepsilon}$ with $\varepsilon > 0$ because

$$\lim_{N \to \infty} \Pr[H_N > 2] = \lim_{N \to \infty} \left(1 - cN^{-\frac{1}{2}+\varepsilon}\right)\left(1 - c^2 N^{-1+2\varepsilon}\right)^{N-2} = 0$$

but for $\varepsilon = 0$, it holds that $\lim_{N \to \infty} \Pr[H_N > 2] = e^{-c^2} > 0$.
In summary, if the link density $p$ scales as $p = cN^{-\alpha}$ with $\alpha \in [0, \frac{1}{2})$, the average hopcount $E[H_N] \simeq 2 - p$ is constant and very small. If $p = \frac{\mu}{N}$, equation (15.25) shows that $E[H_N] \approx \frac{\log N}{\log \mu}$. The regime in between for $\alpha \in [\frac{1}{2}, 1)$ needs other analysis techniques.
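The dense regime (15.26) is easy to evaluate numerically. The sketch below returns the three probabilities for given $N$ and $p$ and shows how quickly the mass beyond two hops vanishes; the function name `dense_hopcount_pmf` and the sample values are illustrative assumptions.

```python
def dense_hopcount_pmf(N, p):
    """Return (Pr[H=1], Pr[H=2], Pr[H>2]) according to (15.26)."""
    pr_more = (1 - p) * (1 - p * p) ** (N - 2)   # no direct link, no 2-hop path
    pr_one = p                                    # the direct link exists
    pr_two = (1 - p) - pr_more                    # (1-p) * [1 - (1-p^2)^(N-2)]
    return pr_one, pr_two, pr_more

if __name__ == "__main__":
    pr_one, pr_two, pr_more = dense_hopcount_pmf(1000, 0.2)
    # When Pr[H>2] is negligible, the mean is close to 2 - p.
    print(pr_one + 2 * pr_two, pr_more)
```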
15.8 Problems
(i) An extremely regular graph is a $d$-lattice where each nodal position corresponds to a point with integer coordinates within a $d$-dimensional hyper-cube with size $Z$. Apart from border nodes, each node has a constant degree (number of neighbors), precisely equal to $2d$. Assuming that all link metrics are equal to one, compute the probability generating function of the hopcount of the shortest path between two uniformly chosen points.
(ii) If $c_{G_p(N)}$ is the clustering coefficient of the random graph $G_p(N)$, then compute $\Pr\left[c_{G_p(N)} \leq x\right]$ and $E\left[c_{G_p(N)}\right]$.
(iii) Derive (15.26) in $G_p(N)$ with unit link weights.

16
The Shortest Path Problem

The shortest path problem asks for the computation of the path from a source to a destination node that minimizes the sum of the positive weights$^{1}$ of its constituent links. The related shortest path tree (SPT) is the union of the shortest paths from a source node to a set of $m$ other nodes in the graph with $N$ nodes. If $m = N - 1$, the SPT connects all nodes and is termed a spanning tree. The SPT belongs to the fundamentals of graph theory and has many applications. Moreover, powerful shortest path algorithms like that of Dijkstra exist. Section 15.7 studied the hopcount, the number of hops (links) in the shortest path, in sparse graphs with unit link weights. In this chapter, the influence of the link weight structure on the properties of the SPT will be analyzed. Starting from one of the simplest possible graph models, the complete graph with i.i.d. exponential link weights, the characteristics of the shortest path will be derived and compared to Internet measurements.
The link weights seriously impact the path properties in QoS routing (Kuipers and Van Mieghem, 2003). In addition, from a traffic engineering perspective, an ISP may want to tune the weight of each link such that the resulting shortest paths between a particular set of ingresses and egresses follow the desirable routes in its network. Thus, apart from the topology of the graph, the link weight structure clearly plays an important role. Often, as in the Internet or other large infrastructures, both the topology and the link weight structure are not accurately known. This uncertainty about the precise structure leads us to consider both the underlying graph and each of the link weights as random variables.

1 A zero link weight is regarded as the coincidence of two nodes (which we exclude), while an infinite link weight means the absence of a link.

16.1 The shortest path and the link weight structure


Since the shortest path is mainly sensitive to the smaller, positive link weights, the probability distribution of the link weights around zero will dominantly influence the properties of the resulting shortest path. A regular link weight distribution $F_w(x) = \Pr[w \leq x]$ has a Taylor series expansion around $x = 0$,

$$F_w(x) = f_w(0)\, x + O\left(x^2\right)$$

since $F_w(0) = 0$ and $F_w'(0) = f_w(0)$ exists. A regular link weight distribution is thus linear around zero. The factor $f_w(0)$ only scales all link weights, but does not influence the shortest path. The simplest distribution of the link weight $w$ with a distinctly different behavior for small values is the polynomial distribution

$$F_w(x) = x^{\alpha}\, 1_{x \in [0,1]} + 1_{x \in (1,\infty)}, \qquad \alpha > 0, \tag{16.1}$$

The corresponding density is $f_w(x) = \alpha x^{\alpha-1}\, 1_{x \in [0,1]}$. The exponent

$$\alpha = \lim_{x \downarrow 0} \frac{\log F_w(x)}{\log x}$$

is called the extreme value index of the probability distribution of $w$ and $\alpha = 1$ for regular distributions. By varying the exponent $\alpha$ over all non-negative real values, any extreme value index can be attained and a large class of corresponding SPTs, in short $\alpha$-trees, can be generated.
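Link weights with the polynomial distribution (16.1) can be generated by inverse transform sampling, since $F_w(x) = x^{\alpha}$ on $[0,1]$ inverts to $w = U^{1/\alpha}$ for a uniform $U$. The sketch below is a minimal illustration; the function names `sample_link_weight` and `empirical_cdf_at`, the seed and the sample size are assumptions.

```python
import random

def sample_link_weight(alpha, rng=random):
    # Inverse transform: F_w(x) = x**alpha on [0, 1]  =>  w = U**(1/alpha).
    return rng.random() ** (1.0 / alpha)

def empirical_cdf_at(samples, x):
    # Fraction of the samples that do not exceed x.
    return sum(1 for w in samples if w <= x) / len(samples)

if __name__ == "__main__":
    rng = random.Random(7)
    for alpha in (0.5, 1.0, 2.0):
        weights = [sample_link_weight(alpha, rng) for _ in range(20000)]
        # The empirical value should be close to 0.25**alpha.
        print(alpha, empirical_cdf_at(weights, 0.25), 0.25 ** alpha)
```

A small $\alpha$ concentrates probability mass near zero, which is the region that the shortest path is most sensitive to.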
[Figure 16.1: sketch of $F_w(x)$, rising to 1, with three curves for $\alpha < 1$, $\alpha = 1$ and $\alpha > 1$; the part to the right of the small region around zero is marked "larger scale".]

Fig. 16.1. A schematic drawing of the distribution of the link weights for the three different $\alpha$-regimes. The shortest path problem is mainly sensitive to the small region around zero. The scaling invariant property of the shortest path allows us to divide all link weights by the largest possible such that $F_w(1) = 1$ for all link weight distributions.


Figure 16.1 illustrates schematically the probability distribution of the link weights around zero $(0, \varepsilon]$, where $\varepsilon > 0$ is an arbitrarily small, positive real number. The larger link weights in the network will hardly appear in a shortest path provided the network possesses enough links. These larger link weights are drawn in Fig. 16.1 from the double dotted line to the right. The nice advantage that only small link weights dominantly influence the property of the resulting shortest path tree implies that the remainder of the link weight distribution (denoted by the arrow with larger scale in Fig. 16.1) only plays a second order role. To some extent, it also explains the success of the simple SPT model based on the complete graph $K_N$ with i.i.d. exponential link weights, which we derive in Section 16.2. A link weight structure effectively thins the complete graph $K_N$ (any other graph is a subgraph of $K_N$) to the extent that a specific shortest path tree can be constructed. Finally, we assume the independence of link weights, which we deem a reasonable assumption in large networks, such as the Internet with its many independent autonomous systems (ASs). Apart from Section 16.7, we will mainly consider the case for $\alpha = 1$, which allows an exact analysis.

16.2 The shortest path tree in $K_N$ with exponential link weights


16.2.1 The Markov discovery process
Let us consider the shortest path problem in the complete graph $K_N$, where each node in the graph is connected to each other node. The problem of finding the shortest path between two nodes $A$ and $B$ in $K_N$ with exponentially distributed link weights with mean 1 can be rephrased in terms of a Markov discovery process. The discovery process evolves as a function of time and stops at a random time $T$ when node $B$ is found. The process is shown in Fig. 16.2.
The evolution of the discovery process can be described by a continuous-time Markov chain $X(t)$, where $X(t)$ denotes the number of discovered nodes at time $t$, because the characteristics of a Markov chain (Theorem 10.2.3) are based on the exponential distribution and the memoryless property. Of particular interest here is the property (see Section 3.4.1) that the minimum of $n$ independent exponential variables each with parameter $\alpha_i$ is again an exponential variable with parameter $\sum_{i=1}^{n} \alpha_i$.
The discovery process starts at time $t = T_0$ with the source node $A$ and for the initial distribution of the Markov chain, we have $\Pr[X(T_0) = 1] = 1$. The state space of the continuous Markov chain is the set $S_N$ consisting of all positive integers (nodes) $n$ with $n \leq N$. For the complete graph $K_N$, the transition rates are given by

$$\lambda_n = n(N - n), \qquad n \in S_N \tag{16.2}$$

Indeed, initially there is only the source node $A$ with label$^{2}$ 0, hence $n = 1$. From this first node $A$ precisely $N - 1$ new nodes can be reached in the complete graph $K_N$. Alternatively one can say that $N - 1$ nodes are competing with each other, each with exponentially distributed strength, to be discovered and the winner amongst them, say $C$ with label 1, is the one reached in shortest time, which corresponds to an exponential variable with rate $N - 1$.
[Figure 16.2: left panel, the Markov discovery process with the discovery times $v_1, \ldots, v_8$ along the time axis; right panel, the corresponding URT drawn per level $h = 0, 1, 2, 3$.]

Fig. 16.2. On the left, the Markov discovery process as function of time in a graph with $N = 9$ nodes. The circles centered at the discovering node $A$ with label 0 present equi-time lines and $v_k$ is the discovery time of the $k$-th node, while $\tau_k = v_k - v_{k-1}$ is the $k$-th interattachment time. The set of discovered nodes redrawn per level is shown on the right, where a level gives the number of hops $h$ from the source node $A$. The tree is a uniform recursive tree (URT).

2 When continuous measures such as time and weight of a path are computed, the source node is most conveniently labeled by zero, whereas in counting processes, such as the number of hops of a path, the source node is labeled by one.

After having reached $C$ from $A$ at hitting time $v_1$, two nodes ($n = 2$) are found and the discovery process restarts from both $A$ and $C$. Although at time $v_1$ we had already progressed a certain distance towards each of the $N - 2$ other, not yet discovered, nodes, the memoryless property of the exponential distribution tells us that the remaining distance to these $N - 2$ nodes is again exponentially distributed with the same parameter 1. Hence, this allows us to restart the process from $A$ and $C$ by erasing the previously partial distance to any other not yet discovered node as if it had never been travelled. From the discovery time $v_1$ of the first node on, the discovery process has double strength to reach precisely $N - 2$ new nodes. Hence, the next winner, say $D$ labeled by 2, is reached at $v_2$ in the minimum time out of $2(N - 2)$ traveling times. This node $D$ has equal probability to be attached to $A$ or $C$ because of symmetry. When $D$ is attached to $A$ (the argument below holds similarly for attachment to $C$), symmetry appears to be broken, because $D$ and $C$ have only one link used, whereas $A$ has already two links used. However, since we are interested in the shortest path problem and since the direct link from $A$ to $D$ is shorter than the path $A \to C \to D$, we exclude the latter in the discovery process, hereby establishing again the full symmetry in the Markov chain. This exclusion also means that the Markov chain maintains single paths from $A$ to each newly discovered node and this path is also the shortest path. Hence, there are no cycles possible. Furthermore, similar to Dijkstra's shortest path algorithm, each newly reached node is withdrawn from the next competition round, which guarantees that the Markov chain eventually terminates. Besides terminating by extinction of all available nodes, after each transition when a new node is discovered, the Markov chain stops with probability equal to $\frac{1}{N-n}$, since each of the $n$ already discovered nodes has precisely 1 possibility out of the remaining $N - n$ to reach $B$ and only one of them is the discoverer. The stopping time $T$ is defined as the infimum for $t \geq 0$ at which the destination node $B$ is discovered. In summary, the described Markov discovery process, a pure birth process with birth rate $\lambda_n = n(N - n)$, models exactly the shortest path for all values of $N$.
16.2.2 The uniform recursive tree
A uniform recursive tree (URT) of size $N$ is a random tree rooted at $A$. At each stage a new node is attached uniformly to one of the existing nodes until the total number of nodes is equal to $N$. The hopcount $h_N$ (equivalent to the depth or distance) is the smallest number of links between the root $A$ and a destination chosen uniformly from all nodes $\{1, 2, \ldots, N\}$.

Denote by $\left\{X_N^{(k)}\right\}$ the $k$-th level set of a tree $T$, which is the set of nodes in the tree $T$ at hopcount $k$ from the root $A$ in a graph with $N$ nodes, and by $X_N^{(k)}$ the number of elements in the set $\left\{X_N^{(k)}\right\}$. Then, we have $X_N^{(0)} = 1$ because the zeroth level can only contain the root node $A$ itself. For all $k > 0$, it holds that $0 \leq X_N^{(k)} \leq N - 1$ and that

$$\sum_{k=0}^{N-1} X_N^{(k)} = N \tag{16.3}$$

Another consequence of the definition is that, if $X_N^{(n)} = 0$ for some level $n < N - 1$, then all $X_N^{(j)} = 0$ for levels $j > n$. In such a case, the longest possible shortest path in the tree has a hopcount of at most $n - 1$. The level set

$$L_N = \left\{1, X_N^{(1)}, X_N^{(2)}, \ldots, X_N^{(N-1)}\right\}$$

of a tree $T$ is defined as the set containing the number of nodes $X_N^{(k)}$ at each level $k$. An example of a URT organized per level $k$ is drawn on the right in Fig. 16.2 and in Fig. 16.3. A basic theorem for URTs, proved in van der Hofstad et al. (2002b), is the following:

Theorem 16.2.1 Let $\left\{Y_N^{(k)}\right\}_{k,N \geq 0}$ and $\left\{Z_N^{(k)}\right\}_{k,N \geq 0}$ be two independent copies of the vector of level sets of two sequences of independent URTs. Then

$$\left\{X_N^{(k)}\right\}_{k \geq 0} \stackrel{d}{=} \left\{Y_{N_1}^{(k-1)} + Z_{N-N_1}^{(k)}\right\}_{k \geq 0}, \tag{16.4}$$

where on the right-hand side the random variable $N_1$ is uniformly distributed over the set $\{1, 2, \ldots, N - 1\}$.
Theorem 16.2.1 also implies that a subtree rooted at a direct child of the root is a URT. For example, in Fig. 16.3, the tree rooted at node 5 is a URT of size 13 as well as the original tree without the tree rooted at node 5. By applying Theorem 16.2.1 to the URT subtree, any subtree rooted at a member of a URT is also a URT.
An arbitrary URT $U$ consisting of $N$ nodes and with the root labeled by 1 can be represented as

$$U = (n_2 \leftarrow 2)(n_3 \leftarrow 3) \ldots (n_N \leftarrow N) \tag{16.5}$$

where $(n_j \leftarrow j)$ means that the $j$-th node is attached to node $n_j \in [1, j-1]$ and $n_2 = 1$. Hence, $n_j$ is the predecessor of $j$ and the predecessor relation is indicated by the arrow $\leftarrow$. Moreover, $n_j$ is a discrete uniform random variable on $[1, j-1]$ and all $n_2, n_3, \ldots, n_N$ are independent.
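The representation (16.5) translates directly into a generator for URTs: each node $j$ draws its predecessor $n_j$ uniformly from $\{1, \ldots, j-1\}$. The sketch below builds such a tree and derives the hopcount of every node; the function names `random_urt_parents` and `hopcounts` and the dictionary representation are illustrative assumptions.

```python
import random
from collections import Counter

def random_urt_parents(N, rng=random):
    # parent[j] plays the role of n_j in (16.5); node 1 is the root.
    return {j: rng.randint(1, j - 1) for j in range(2, N + 1)}

def hopcounts(parent):
    # Parents always carry smaller labels, so increasing order is safe.
    depth = {1: 0}
    for j in sorted(parent):
        depth[j] = depth[parent[j]] + 1
    return depth

if __name__ == "__main__":
    rng = random.Random(3)
    depth = hopcounts(random_urt_parents(26, rng))
    # Number of nodes per level, the counterpart of the level set L_N.
    print(sorted(Counter(depth.values()).items()))
```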

[Figure 16.3: a tree with root 1 and 26 numbered nodes arranged over the levels $X_N^{(0)}$, $X_N^{(1)}$, $X_N^{(2)}$, $X_N^{(3)}$ and $X_N^{(4)}$.]

Fig. 16.3. An instance of a uniform recursive tree with $N = 26$ nodes organized per level $0 \leq k \leq 4$. The node number (inside the circle) indicates the order in which the nodes were attached to the tree.

Theorem 16.2.2 The total number of URTs with $N$ nodes is $(N - 1)!$

Proof: (a) Let the nodes be labeled in the order of attachment to the URT and assign label 1 for the root. The URT growth law indicates that node 2 can only be attached in one way, node 3 in two ways, namely to node 1 and node 2 with equal probability. The $k$-th node can be attached to $k - 1$ possible nodes. Each of these possible constructions leads to a URT.
(b) By summing over all allowable configurations in (16.5), we obtain

$$\sum_{n_2=1}^{1} \sum_{n_3=1}^{2} \cdots \sum_{n_N=1}^{N-1} 1 = (N-1)!$$

and this proves the theorem. $\square$

In general, Cayley's Theorem (Appendix B.1 art. 3) states that there are $N^{N-2}$ labeled trees possible. The URT is a subset of the set of all possible labeled trees. Not all labeled trees are URTs, because the nodes that are further away from the root must have larger labels.
The shortest path tree from the source or root $A$ to other nodes in the complete graph is the tree associated with the Markov discovery process, where the number of nodes $X(t)$ at time $t$ is constructed as follows. Just as the discovery process, the associated tree starts at the root $A$. We now investigate the embedded Markov chain (Section 10.4) of the continuous-time discovery process. After each transition in the continuous-time Markov chain, $X(t) \to X(t) + 1$, an edge of unit length is attached randomly to one of the $n$ already discovered nodes in the associated tree because a new edge is equally likely to be attached to any of the $n$ discovering nodes. Hence, the construction of the tree associated with the Markov discovery process and illustrated in Fig. 16.2 on the right demonstrates that the shortest path tree in the complete graph $K_N$ with exponential link weights is a uniform recursive tree. This property of the shortest path tree in $K_N$ with exponential link weights is an important motivation to study the URT. More generally, in van der Hofstad et al. (2001) we have proved that, for a fixed link density $p$ and sufficiently large $N$, the shortest path tree in the class RGU, the class of random graphs $G_p(N)$ with exponential or uniformly distributed link weights, is a URT. Smythe and Mahmoud (1995) have reviewed a number of results on recursive trees that have appeared in the literature from the late 1960s up to 1995.

16.3 The hopcount $h_N$ in the URT


16.3.1 Theory
The hopcount $h_N$ from the root to an arbitrarily chosen node in the URT equals the number of links or hops from the root to that node. We allow the arbitrary node to coincide with the root in which case $h_N = 0$.

Theorem 16.3.1 The probability generating function of the hopcount in the URT with $N$ nodes is

$$\varphi_{h_N}(z) = E\left[z^{h_N}\right] = \frac{\Gamma(N+z)}{\Gamma(N+1)\,\Gamma(z+1)} \tag{16.6}$$

Proof: Since the number of nodes at hopcount $k$ from the root (or at level $k$) is $X_N^{(k)}$, a node uniformly chosen out of $N$ nodes in the URT has probability $\frac{E\left[X_N^{(k)}\right]}{N}$ of having hopcount $k$,

$$\Pr[h_N = k] = \frac{E\left[X_N^{(k)}\right]}{N} \tag{16.7}$$

If the size of the URT grows from $n$ to $n + 1$ nodes, each node at hopcount $k - 1$ from the root can generate a node at hopcount $k$ with probability $1/n$. Hence, for $k \geq 1$,

$$E\left[X_N^{(k)}\right] = \sum_{n=k}^{N-1} \frac{E\left[X_n^{(k-1)}\right]}{n}$$

With (16.7), a recursion for $\Pr[h_N = k]$ follows for $k \geq 1$ as

$$\Pr[h_N = k] = \frac{1}{N} \sum_{n=k}^{N-1} \Pr[h_n = k-1]$$

The generating function of $h_N$ equals

$$\varphi_{h_N}(z) = E\left[z^{h_N}\right] = \Pr[h_N = 0] + \sum_{k=1}^{N-1} \Pr[h_N = k]\, z^k$$

$$= \frac{1}{N} + \frac{1}{N} \sum_{k=1}^{N-1} \sum_{n=k}^{N-1} \Pr[h_n = k-1]\, z^k$$

$$= \frac{1}{N} + \frac{1}{N} \sum_{n=1}^{N-1} \sum_{k=1}^{n} \Pr[h_n = k-1]\, z^k = \frac{1}{N} + \frac{z}{N} \sum_{n=1}^{N-1} \varphi_{h_n}(z)$$

Taking the difference between $(N+1)\varphi_{h_{N+1}}(z)$ and $N\varphi_{h_N}(z)$ results in the recursion

$$(N+1)\,\varphi_{h_{N+1}}(z) = (N+z)\,\varphi_{h_N}(z)$$

Iterating this recursion starting from $\varphi_{h_1}(z) = E\left[z^{h_1}\right] = E\left[z^0\right] = 1$ leads to (16.6). $\square$
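The recursion $(N+1)\varphi_{h_{N+1}}(z) = (N+z)\varphi_{h_N}(z)$ can be iterated on the polynomial coefficients to obtain the exact hopcount distribution for small $N$. The sketch below uses exact rational arithmetic; the function name `hopcount_pmf` is an illustrative assumption.

```python
from fractions import Fraction

def hopcount_pmf(N):
    """Return [Pr[h_N = k] for k = 0..N-1] by iterating the pgf recursion."""
    pmf = [Fraction(1)]                       # phi_{h_1}(z) = 1
    for n in range(1, N):
        nxt = [Fraction(0)] * (len(pmf) + 1)
        for k, prob in enumerate(pmf):
            nxt[k] += prob * n / (n + 1)      # contribution of n * phi_{h_n}(z)
            nxt[k + 1] += prob / (n + 1)      # contribution of z * phi_{h_n}(z)
        pmf = nxt
    return pmf

if __name__ == "__main__":
    pmf = hopcount_pmf(10)
    mean = sum(k * prob for k, prob in enumerate(pmf))
    print(pmf[0], mean)
```

The coefficient of $z^0$ reproduces $\Pr[h_N = 0] = 1/N$, and the mean equals $\sum_{l=2}^{N} 1/l$, the value that follows from differentiating (16.6) at $z = 1$.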

Corollary 16.3.2 The probability density function of the hopcount in the URT with $N$ nodes is

$$\Pr[h_N = k] = \frac{(-1)^{N-(k+1)}\, S_N^{(k+1)}}{N!} \tag{16.8}$$

Proof: The probability generating function $\varphi_{h_N}(z)$ in (16.6) is also the generating function of the Stirling numbers $S_N^{(k)}$ of the first kind (Abramowitz and Stegun, 1968, 24.1.3) such that the probability that a uniformly chosen node in the URT has hopcount $k$ equals (16.8). $\square$

The explicit form of the generating function shows that the average hopcount $h_N$ in a URT of size $N$ equals

$$E[h_N] = \varphi_{h_N}'(1) = \left.\frac{d}{dz} \log \varphi_{h_N}(z)\right|_{z=1} = \sum_{l=2}^{N} \frac{1}{l} = \psi(N+1) + \gamma - 1 \tag{16.9}$$

where $\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)}$ is the digamma function (Abramowitz and Stegun, 1968, Section 6.3) and the Euler constant is $\gamma = 0.57721\ldots$. Similarly, the variance (2.27) follows from the logarithm of the generating function $L_{h_N}(z) = \log \Gamma(N+z) - \log \Gamma(N+1) - \log \Gamma(z+1)$ as

$$\mathrm{Var}[h_N] = \psi'(N+1) - \psi'(2) + \psi(N+1) + \gamma - 1 = \psi(N+1) + \gamma - \frac{\pi^2}{6} + \psi'(N+1)$$

Using the asymptotic formulae for the digamma function leads to

$$E[h_N] = \log N + \gamma - 1 + O\left(\frac{1}{N}\right) \tag{16.10}$$

$$\mathrm{Var}[h_N] = \log N + \gamma - \frac{\pi^2}{6} + O\left(\frac{1}{N}\right) \tag{16.11}$$

For large $N$, we apply an asymptotic formula of the Gamma function (Abramowitz and Stegun, 1968, Section 6.1.47) to the generating function of the hopcount (16.6),

$$\varphi_{h_N}(z) = \frac{N^{z-1}}{\Gamma(z+1)} \left(1 + O\left(\frac{1}{N}\right)\right)$$

Introducing the Taylor series of $\frac{1}{\Gamma(z)} = \sum_{k=1}^{\infty} c_k z^k$ where the coefficients $c_k$ are listed in Abramowitz and Stegun (1968, Section 6.1.34), we obtain with $N^z = e^{z \log N}$,

$$\varphi_{h_N}(z) = \frac{1}{N} \left(1 + O\left(\frac{1}{N}\right)\right) \sum_{k=1}^{\infty} c_k z^{k-1} \sum_{k=0}^{\infty} \frac{\log^k N}{k!} z^k$$

$$= \frac{1 + O\left(\frac{1}{N}\right)}{N} \sum_{k=0}^{\infty} \left(\sum_{m=0}^{k} c_{m+1} \frac{\log^{k-m} N}{(k-m)!}\right) z^k$$

With the definition (2.18) of the probability generating function, we conclude that the asymptotic form of the probability density function (16.8) of the hopcount in the URT is

$$\Pr[h_N = k] = \frac{1 + O\left(\frac{1}{N}\right)}{N} \sum_{m=0}^{k} c_{m+1} \frac{\log^{k-m} N}{(k-m)!} \tag{16.12}$$

Since the coefficients $c_k$ are rapidly decreasing, approximating the sum in (16.12) by its first term ($m = 0$) yields to first order in $N$,

$$\Pr[h_N = k] \approx \frac{(\log N)^k}{N\, k!} \tag{16.13}$$

which is recognized as a Poisson distribution (3.9) with mean $\log N$. Hence, for large $N$ and to first order, the average and variance of the hopcount in the URT are approximately $E[h_N] \approx \mathrm{Var}[h_N] \approx \log N$. The accuracy of the Poisson approximation can be estimated by comparison with the average (16.10) and the variance (16.11) found above up to second order in $N$. For example, if the URT has $N = 10^4$ nodes, the Poisson approximation yields $E[h_N] = \mathrm{Var}[h_N] = 9.21034$, while the average (16.10) is $E[h_N] = 8.78756$ accurate up to $10^{-4}$ and the variance (16.11) is $\mathrm{Var}[h_N] = 8.14262$. The exact results are $E[h_N] = 8.78761$ and $\mathrm{Var}[h_N] = 8.14277$.
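The numbers quoted for $N = 10^4$ can be reproduced with a few lines. The sketch below relies on the factorization $\frac{\Gamma(N+z)}{\Gamma(N+1)\Gamma(z+1)} = \prod_{l=2}^{N} \frac{l-1+z}{l}$ of (16.6), which identifies $h_N$ as a sum of independent Bernoulli variables with means $1/l$; the function name `exact_mean_and_variance` is an illustrative assumption.

```python
import math

def exact_mean_and_variance(N):
    # h_N is distributed as a sum of independent Bernoulli(1/l), l = 2..N.
    mean = sum(1.0 / l for l in range(2, N + 1))
    var = mean - sum(1.0 / l ** 2 for l in range(2, N + 1))
    return mean, var

if __name__ == "__main__":
    N = 10**4
    mean, var = exact_mean_and_variance(N)
    gamma = 0.5772156649015329
    print("Poisson approximation:", math.log(N))
    print("asymptotic (16.10), (16.11):",
          math.log(N) + gamma - 1, math.log(N) + gamma - math.pi ** 2 / 6)
    print("exact:", mean, var)
```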

16.3.2 Application of the URT to the hopcount in the Internet


In trace-route measurements explained in Van Mieghem (2004a), we are interested in the hopcount $H_N$ denoted with capital $H$, which equals $h_N$ in the URT excluding the event $h_N = 0$. In other words, the source and the destination are different nodes in the graph. Since from (16.8) $\Pr[h_N = 0] = \frac{(-1)^{N-1} S_N^{(1)}}{N!} = \frac{1}{N}$, we obtain, for $1 \leq k \leq N - 1$,

$$\Pr[H_N = k] = \Pr[h_N = k \mid h_N \neq 0] = \frac{\Pr[h_N = k, h_N \neq 0]}{\Pr[h_N \neq 0]} = \frac{N}{N-1} \Pr[h_N = k]$$

Using (16.8), we find

$$\Pr[H_N = k] = \frac{N}{N-1}\, \frac{(-1)^{N-(k+1)}\, S_N^{(k+1)}}{N!} \tag{16.14}$$

with corresponding generating function,

$$\varphi_{H_N}(z) = \sum_{k=1}^{N-1} \Pr[H_N = k]\, z^k = \frac{N}{N-1} \sum_{k=0}^{N-1} \Pr[h_N = k]\, z^k - \frac{N}{N-1} \Pr[h_N = 0] = \frac{N}{N-1} \left(\varphi_{h_N}(z) - \frac{1}{N}\right)$$

The average hopcount $E[H_N] = E[h_N \mid h_N \neq 0]$ is

$$E[H_N] = \frac{N}{N-1} \sum_{l=2}^{N} \frac{1}{l} \tag{16.15}$$

Hence, for large $N$ and in practice, we find that

$$\Pr[H_N = k] = \Pr[h_N = k] + O\left(\frac{1}{N}\right)$$

which allows us to use the previously derived expressions (16.12), (16.10) and (16.11).
The histogram of the number of traversed routers in the Internet measured between two arbitrary communicating parties seems reasonably well modeled by the pdf (16.12). Figure 16.4 shows both the histogram of the hopcount deduced from paths in the Internet measured via the trace-route utility and the fit with (16.12). From the fit, we find a rather high number of nodes $e^{12.6} \approx 3 \cdot 10^5 \leq N \leq e^{13.5} \approx 7 \cdot 10^5$, which points to the approximate nature of modeling the Internet hopcount by that deduced from a URT. The relation between Internet measurements and the properties of the URT is further analyzed in a series of articles (Van Mieghem et al., 2000; van der Hofstad et al., 2001; Van Mieghem et al., 2001b; Janic et al., 2002; van der Hofstad et al., 2002b). At the time of writing, an accurate model of the hopcount in the Internet is not available.

[Figure 16.4: $\Pr[H = k]$ (vertical axis, from 0.00 to 0.10) versus hop $k$ (horizontal axis, from 0 to 30) for Asia, Europe and USA, together with the fits with $\log(N_{Asia}) = 13.5$, $\log(N_{Europe}) = 12.6$ and $\log(N_{USA}) = 12.9$.]

Fig. 16.4. The histograms of the hopcount derived from the trace-route measurement in three continents from CAIDA in 2004 are fitted by the pdf (16.12) of the hopcount in the URT.


16.4 The weight of the shortest path


The weight, sometimes also called the length, of the shortest path is defined as the sum of the link weights that constitute the shortest path. In Section 16.2.1, the shortest path tree in the complete graph with exponential link weights was shown to be a URT. In this section, we confine ourselves to the same type of graph and require that the source node $A$ (or root) is different from the destination node $B$.
By Theorem 10.2.3 of a continuous-time Markov chain, the discovery time of the $k$-th node from node $A$ equals $v_k = \sum_{n=1}^{k} \tau_n$, where $\tau_1, \tau_2, \ldots, \tau_k$ are independent, exponentially distributed random variables with parameter $\lambda_n = n(N-n)$ with $1 \le n \le k$. We call $\tau_j$ the interattachment time between the discovery or the attachment to the URT of the $(j-1)$-th and $j$-th node in the graph. The Laplace transform of $v_k$ is

\[
E\left[e^{-z v_k}\right] = \int_0^{\infty} e^{-zt}\, \frac{d}{dt} \Pr[v_k \le t]\, dt
\]

For a sum of independent exponential random variables, using the probability generating function (3.16), we have

\[
E\left[e^{-z v_k}\right] = E\left[\exp\left(-z \sum_{n=1}^{k} \tau_n\right)\right] = \prod_{n=1}^{k} E\left[e^{-z \tau_n}\right] = \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} \tag{16.16}
\]

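The probability generating function$^3$ $\varphi_{W_N}(z) = E\left[e^{-z W_N}\right]$ of the weight $W_N$ of the shortest path equals

\[
\varphi_{W_N}(z) = \sum_{k=1}^{N-1} E\left[e^{-z v_k}\right] \Pr[B \text{ is } k\text{-th attached node in URT}] = \frac{1}{N-1} \sum_{k=1}^{N-1} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} \tag{16.17}
\]

because any node apart from the root $A$ but including the destination node $B$ has equal probability to be the $k$-th attached node.
The average weight is

\[
E[W_N] = -\left. \frac{d \varphi_{W_N}(z)}{dz} \right|_{z=0} = -\frac{1}{N-1} \sum_{k=1}^{N-1} \left. \frac{d}{dz} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} \right|_{z=0}
\]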
$^3$ If the link weights have mean $\frac{1}{a}$ (instead of 1), then $W_N$ is multiplied by $a$ as explained in Sections 16.2.1 and 3.4.1. The weight of the scaled shortest path $W_{N;a}$ has pgf $\varphi_{W_{N;a}}(z) = E\left[e^{-z a W_N}\right] = \varphi_{W_N}(az)$.


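Using the logarithmic derivative of the product,

\[
\left. \frac{d}{dz} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} \right|_{z=0} = \left. \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)}\ \frac{d}{dz} \sum_{n=1}^{k} \log \frac{n(N-n)}{z + n(N-n)} \right|_{z=0} = -\sum_{n=1}^{k} \frac{1}{n(N-n)}
\]

gives

\[
E[W_N] = \frac{1}{N-1} \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n(N-n)} = \frac{1}{N-1} \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \sum_{k=n}^{N-1} 1 = \frac{1}{N-1} \sum_{n=1}^{N-1} \frac{N-n}{n(N-n)}
\]

The average weight is

\[
E[W_N] = \frac{1}{N-1} \sum_{n=1}^{N-1} \frac{1}{n} = \frac{\psi(N) + \gamma}{N-1} \tag{16.18}
\]

For large $N$,

\[
E[W_N] = \frac{\log N + \gamma}{N} + O\!\left(\frac{1}{N^2}\right)
\]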

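Similarly, the variance is computed (see problem (ii) in Section 16.9) as,

\[
\mathrm{Var}[W_N] = \frac{3}{N(N-1)} \sum_{n=1}^{N-1} \frac{1}{n^2} - \frac{\left( \sum_{n=1}^{N-1} \frac{1}{n} \right)^2}{(N-1)^2 N} \tag{16.19}
\]

and for large $N$,

\[
\mathrm{Var}[W_N] = \frac{\pi^2}{2N^2} + O\!\left(\frac{\log^2 N}{N^3}\right)
\]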
By inverse Laplace transform of (16.17), the distribution $\Pr[W_N \le t]$ can be computed. The asymptotic distribution for the weight of the shortest path is (see problem (iii) in Section 16.9)

\[
\lim_{N \to \infty} \Pr[N W_N - \log N \le x] = e^{-e^{-x}} \tag{16.20}
\]

A related but slightly more complex analysis is presented in Section 16.5.1 where we study the flooding time. The interest of such an asymptotic analysis is that it often leads to tractable solutions that are physically more appealing to interpret. Moreover, it turns out that results for finite, not too small $N$ are reasonably approximated by the asymptotic law.


Since $W_N$ equals the sum of the link weights of the shortest path from the root to an arbitrary node and since $H_N = h_N \mid h_N > 0$ is the number of links in that shortest path (where the arbitrary destination node is different from the root), one may wonder whether there is a relation between them. Although the shortest path has precisely $H_N$ hops, the destination node of that path is not necessarily the $H_N$-th attached node to the URT grown at the root. The destination node cannot be discovered sooner than the $H_N$-th attached node, otherwise the hopcount of the shortest path would be shorter than $H_N$. Hence, the destination node is the $k$-th discovered node and attached to the URT somewhere in between the $(H_N - 1)$-th and the last attached node. Thus, $k \in [H_N, N-1]$. If $k = H_N$, then all previously discovered nodes belong to the shortest path and the $j$-th attached node in the URT is linked to the $(j-1)$-th, for all $j \le k$. If $k > H_N$, precisely $k - H_N$ of the attached nodes do not belong to the shortest path. Hence $W_N = W_k$ provided $k - H_N$ nodes in the URT discovered so far do not belong to the path and precisely $H_N$ do. The latter condition requires the determination of all structurally favorable possibilities, which is rather complex.
Curiously, the probability that the shortest path consists of the direct link between source and destination is, with (16.14), (16.18) and $S_N^{(2)} = (-1)^N (N-1)! \sum_{k=1}^{N-1} \frac{1}{k}$,

\[
\Pr[H_N = 1] = \frac{1}{N-1} \sum_{k=1}^{N-1} \frac{1}{k} = E[W_N]
\]

16.5 The flooding time $T_N$

The most commonly used process that informs each node (router) about changes in the network topology is called flooding: the source node initiates the flooding process by sending the packet with topology information to all adjacent neighbors, every router forwards the packet on all interfaces except for the incoming one, and duplicate packets are discarded. Flooding is particularly simple and robust since it progresses, in fact, along all possible paths from the emitting node to the receiving node. Hence, a flooded packet reaches a node in the network in the shortest possible time (if overheads in routers are ignored). Therefore, an interesting problem lies in the determination of the flooding time $T_N$, which is the minimum time needed to inform all nodes in a network with $N$ nodes. Only after a time $T_N$, all topology databases at each router in the network are again synchronized, i.e. all routers possess the same topology information. The flooding time $T_N$ is defined as the minimum time needed to reach all $N - 1$ remaining nodes from a source node over their respective shortest paths.
We will here consider the flooding time $T_N$ in the complete graph containing $N$ nodes and with independent, exponentially distributed link weights with mean 1. The generalization to the random graph $G_p(N)$ with i.i.d. exponential (or uniform$^4$) distributed link weight is treated in van der Hofstad et al. (2002a).
The flooding time $T_N$ equals the absorption time, starting from state $n = 1$ of the birth-process with rates (16.2). The probability generating function follows directly from (16.16) with $k = N - 1$,

\[
\varphi_{T_N}(x) = E\left[e^{-x T_N}\right] = \int_0^{\infty} e^{-xt} f_{T_N}(t)\, dt = \prod_{n=1}^{N-1} \frac{n(N-n)}{n(N-n) + x} \tag{16.21}
\]

The average flooding time equals

\[
E[T_N] = \sum_{n=1}^{N-1} E[\tau_n] = \sum_{n=1}^{N-1} \frac{1}{n(N-n)} = \frac{2}{N} \sum_{n=1}^{N-1} \frac{1}{n} = \frac{2}{N} \left( \psi(N) + \gamma \right) \tag{16.22}
\]

Using the asymptotic expansion (Abramowitz and Stegun, 1968, Section 6.3.18) of the digamma function, we conclude that

\[
E[T_N] \sim \frac{2 \log N}{N}
\]

which demonstrates that the average flooding time in the complete graph with exponential link weights with mean 1 decreases to zero when $N \to \infty$. Also, the average flooding time is about twice as long as the average weight of an arbitrary shortest path (16.18). The variance of $T_N$ equals

\[
\mathrm{Var}[T_N] = \sum_{n=1}^{N-1} \mathrm{Var}[\tau_n] = \sum_{n=1}^{N-1} \frac{1}{n^2 (N-n)^2} = \frac{2}{N^2} \sum_{n=1}^{N-1} \frac{1}{n^2} + \frac{4}{N^3} \sum_{n=1}^{N-1} \frac{1}{n} \tag{16.23}
\]

For large $N$, we have that $\mathrm{Var}[T_N] = \frac{\pi^2}{3N^2} + O\!\left(\frac{\log N}{N^3}\right)$.
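The mean (16.22) can be compared with a direct simulation of flooding in the complete graph. The following is a minimal sketch under stated assumptions: i.i.d. exponential link weights with mean 1, node 0 as the source, and the flooding time taken as the largest shortest-path distance from the source, computed with a plain Dijkstra search. All function names are hypothetical.

```python
import heapq
import random


def flooding_time(N, rng):
    """One sample of T_N: largest shortest-path distance from node 0."""
    # Symmetric i.i.d. exponential link weights with mean 1.
    w = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            w[i][j] = w[j][i] = rng.expovariate(1.0)
    dist = [float("inf")] * N
    dist[0] = 0.0
    done = [False] * N
    heap = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if done[u]:
            continue  # stale heap entry
        done[u] = True
        for v in range(N):
            if not done[v] and d + w[u][v] < dist[v]:
                dist[v] = d + w[u][v]
                heapq.heappush(heap, (dist[v], v))
    return max(dist)


def mean_flooding_time(N):
    """E[T_N] from (16.22)."""
    return 2.0 / N * sum(1.0 / n for n in range(1, N))


def var_flooding_time(N):
    """Var[T_N] from (16.23)."""
    s1 = sum(1.0 / n for n in range(1, N))
    s2 = sum(1.0 / (n * n) for n in range(1, N))
    return 2.0 * s2 / N ** 2 + 4.0 * s1 / N ** 3
```

Averaging a few hundred samples of `flooding_time(N, random.Random(seed))` for moderate `N` should land close to `mean_flooding_time(N)`; the spread of the samples is governed by `var_flooding_time(N)`.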

$^4$ Both the exponential and uniform distribution are regular distributions with extreme value index $\alpha = 1$. This means that the small link weights that are most likely included in the shortest path are almost identically distributed for all regular distributions with same $f_w(0)$.

16.5.1 The asymptotic law for the flooding time $T_N$

The exact expression $f_{T_N}(t)$ for the probability density function of the flooding time $T_N$ derived in van der Hofstad et al. (2002a), does not provide much insight. Because we are interested in the flooding time in large networks, we investigate the asymptotic distribution of $T_N$, for $N$ large. We rewrite (16.21) as

\[
\varphi_{T_N}(x) = \frac{[(N-1)!]^2}{\prod_{n=1}^{N-1} \left[ x + \frac{N^2}{4} - \left( n - \frac{N}{2} \right)^2 \right]} \tag{16.24}
\]

For $N = 2M$, using $\frac{\Gamma(z+m)}{\Gamma(z+1)} = \prod_{n=1}^{m-1} (n + z)$, we deduce that

\[
\varphi_{T_{2M}}(x) = \left( \frac{\Gamma(2M)\, \Gamma\!\left(1 + \sqrt{x + M^2} - M\right)}{\Gamma\!\left(M + \sqrt{x + M^2}\right)} \right)^2 \tag{16.25}
\]

For large $M$, there holds $\sqrt{x + M^2} \sim M + \frac{x}{2M}$, provided $|x| < 2M$. After substitution of $x = 2My$ in (16.25), with $|y| < 1$, we obtain

\[
\varphi_{T_{2M}}(2My) \sim \Gamma^2(1 + y)\, \frac{\Gamma^2(2M)}{\Gamma^2(2M + y)} \sim \Gamma^2(1 + y)\, (2M)^{-2y}
\]

from which follows the asymptotic relation

\[
\lim_{N \to \infty} N^{2y} \varphi_{T_N}(Ny) = \Gamma^2(1 + y), \qquad |y| < 1
\]

Equivalently, we have for $|y| < 1$,

\[
\lim_{N \to \infty} E\left[e^{-y (N T_N - 2 \log N)}\right] = \lim_{N \to \infty} \frac{1}{N} \int_{-\infty}^{\infty} e^{-yt} f_{T_N}\!\left( \frac{t + 2 \log N}{N} \right) dt = \Gamma^2(1 + y) \tag{16.26}
\]

This limit demonstrates that the probability distribution function of the random variable $N T_N - 2 \log N$ converges to a probability distribution with Laplace transform $\Gamma^2(1 + y)$. Let us define the normalized density function

\[
g_N(t) = \frac{1}{N} f_{T_N}\!\left( \frac{t + 2 \log N}{N} \right) \tag{16.27}
\]
We can prove convergence in density, i.e. $\lim_{N \to \infty} g_N(t) = g(t)$ and that the latter exists. By the inversion theorem for Laplace transforms we obtain for $t \in \mathbb{R}$,

\[
\lim_{N \to \infty} g_N(t) = \lim_{N \to \infty} \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{yt} N^{2y} \varphi_{T_N}(Ny)\, dy
\]

where $0 < c < 1$. Since $\Gamma(z)$ is analytic over the entire complex plane except for simple poles at the points $z = -n$ for $n = 0, 1, 2, \ldots$, we find that $N^{2y} \varphi_{T_N}(Ny)$ is analytic whenever the real part of $y$ is non-negative. Evaluation along the line $\mathrm{Re}(y) = c = 0$ then gives

\[
\lim_{N \to \infty} g_N(t) = \lim_{N \to \infty} \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{itu} N^{2iu} \varphi_{T_N}(iNu)\, du
\]

As dominating function we take

\[
\left| e^{itu} N^{2iu} \varphi_{T_N}(iNu) \right| = \left| \varphi_{T_N}(iNu) \right| \le \frac{1 + u^2}{u^4}
\]

when $|u| > 1$, and $|\varphi_{T_N}(iNu)| \le 1$, for $|u| \le 1$. This follows from the first equality in (16.24), using only the factors in the product with $n = 1$ and $n = N - 1$, and bounding the other factors using

\[
\frac{n(N-n)}{|n(N-n) + iNu|} \le 1
\]

The Dominated Convergence Theorem 6.1.4 allows us to interchange the limit and integration operator such that

\[
\lim_{N \to \infty} g_N(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{itu} \lim_{N \to \infty} N^{2iu} \varphi_{T_N}(iNu)\, du = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{ty} \lim_{N \to \infty} N^{2y} \varphi_{T_N}(Ny)\, dy = \frac{1}{2\pi i} \int_{-i\infty}^{i\infty} e^{ty}\, \Gamma^2(1 + y)\, dy \tag{16.28}
\]

The right-hand side of (16.26) is a perfect square, which indicates that the limit distribution is a two-fold convolution. Now, the Mellin transform (Titchmarsh, 1948) of the exponential function is

\[
e^{-t} = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} t^{-y}\, \Gamma(y)\, dy, \qquad c > 0
\]

and thus with $t = e^{-u}$,

\[
\frac{d}{du} e^{-e^{-u}} = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} e^{yu}\, \Gamma(y + 1)\, dy
\]

which shows that (16.28) is the two-fold convolution of the probability density function $\frac{d}{dt} \Lambda(t)$, where $\Lambda(t) = e^{-e^{-t}}$ is the Gumbel distribution (3.37). Furthermore, the two-fold convolution is given by

\[
\frac{d}{dt} \Lambda^{(2*)}(t) = e^{-t} \int_{-\infty}^{\infty} e^{-e^{-u}} e^{-e^{-(t-u)}}\, du = e^{-t} \int_{-\infty}^{\infty} \exp\left( -2 e^{-t/2} \cosh\left( \frac{t}{2} - u \right) \right) du
\]

\[
= 2 e^{-t} \int_0^{\infty} \exp\left( -2 e^{-t/2} \cosh(u) \right) du = 2 e^{-t} K_0\!\left( 2 e^{-t/2} \right)
\]

where $K_\nu(x)$ denotes the modified Bessel function (Abramowitz and Stegun, 1968, Section 9.6) of order $\nu$.
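In summary,

\[
\lim_{N \to \infty} g_N(t) = g(t) = \frac{d}{dt} \Lambda^{(2*)}(t) = 2 e^{-t} K_0\!\left( 2 e^{-t/2} \right) \tag{16.29}
\]

and the corresponding distribution function is

\[
\lim_{N \to \infty} \Pr[N T_N - 2 \log N \le z] = 2 \int_{-\infty}^{z} e^{-t} K_0\!\left( 2 e^{-t/2} \right) dt = 2 e^{-z/2} K_1\!\left( 2 e^{-z/2} \right) \tag{16.30}
\]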
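The closed form (16.30) can be checked numerically against the density (16.29). The sketch below avoids external libraries by evaluating the modified Bessel function through the standard integral representation $K_\nu(a) = \int_0^\infty e^{-a \cosh u} \cosh(\nu u)\, du$ with a trapezoidal rule. The step sizes, the truncation points and all function names are illustrative assumptions, chosen for arguments $a = 2e^{-t/2}$ with moderate $t$.

```python
import math


def bessel_k(nu, a, du=0.01, u_max=20.0):
    """K_nu(a) for a > 0 via the integral of exp(-a cosh u) cosh(nu u)."""
    steps = int(round(u_max / du))
    total = 0.5 * math.exp(-a)  # half weight at u = 0, where cosh(0) = 1
    for i in range(1, steps + 1):
        u = i * du
        total += math.exp(-a * math.cosh(u)) * math.cosh(nu * u)
    return total * du  # the integrand is negligible at u_max


def limit_density(t):
    """g(t) = 2 exp(-t) K_0(2 exp(-t/2)) as in (16.29)."""
    a = 2.0 * math.exp(-t / 2.0)
    return 2.0 * math.exp(-t) * bessel_k(0, a)


def limit_cdf_closed(z):
    """Right-hand side of (16.30): 2 exp(-z/2) K_1(2 exp(-z/2))."""
    a = 2.0 * math.exp(-z / 2.0)
    return a * bessel_k(1, a)


def limit_cdf_numeric(z, t_min=-8.0, steps=450):
    """Trapezoidal integral of the density from t_min (mass below is tiny) to z."""
    dt = (z - t_min) / steps
    total = 0.5 * (limit_density(t_min) + limit_density(z))
    for i in range(1, steps):
        total += limit_density(t_min + i * dt)
    return total * dt
```

Comparing `limit_cdf_numeric(z)` with `limit_cdf_closed(z)` for a few values of `z` gives a quick consistency check of the integration step that leads from (16.29) to (16.30).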
The right-hand side of (16.29) is maximal for $t = 0.506357$, which is slightly smaller than $\gamma = 0.577216$, but still in accordance with $E[T_N]$ given by (16.22). The asymmetry shows that $\{N T_N \ge 2 \log N + z\}$ is much more likely than the event $\{N T_N \le 2 \log N - z\}$, which confirms the intuition that the flooding time can be much longer than the average $E[T_N]$, but not so much shorter than $E[T_N]$. Figure 16.5 illustrates the convergence of $g_N(t)$ to the limit in (16.29). When comparing (16.26) with the corresponding result (C.6) for the weight of the shortest path, we observe that, for large $N$, the random variable $N T_N - 2 \log N$ consists of the sum of $N W_{N;1} - \log N + N W_{N;2} - \log N$, where both $N W_{N;j} - \log N$ are i.i.d. random variables. Intuitively, we can say that the flooding time consists of the time to travel from a left-hand corner of the graph to the center and from the center to a right-hand corner of the graph.

Fig. 16.5. The scaled density $g_N(t)$ for three values of $N = 2M$ (dotted lines) and the asymptotic result (full line) on a log-lin scale. The insert is drawn on a lin-lin scale. [Plot: $g_{2M}(t)$ for $M = 5$, $M = 10$, $M = 20$ and the limit $M \to \infty$.]

The asymptotic distribution (16.30) is a beautiful example of a sum of $N$ independent random variables that clearly does not converge to a Gaussian and, hence, does not obey the (extended) Central Limit Theorem 6.3.1.

16.6 The degree of a node in the URT

Let us denote by $\{D_N^{(k)}\}$ the set of nodes with degree $k$ in a graph with $N$ nodes and by $D_N^{(k)}$ the cardinality (the number of elements) of this set $\{D_N^{(k)}\}$. Since each node appears only in one set, it holds for any graph that

\[
\sum_{k=1}^{N-1} D_N^{(k)} = N \tag{16.31}
\]

In a probabilistic setting, we may investigate the event that the degree $k$ occurs in a graph of size $N$. The expectation of that event is

\[
E\left[D_N^{(k)}\right] = E\left[ \sum_{j=1}^{N} 1_{\{d_j = k\}} \right] = \sum_{j=1}^{N} E\left[ 1_{\{d_j = k\}} \right] = \sum_{j=1}^{N} \Pr[d_j = k] \tag{16.32}
\]

By summing over all $k$, we verify that

\[
E\left[ \sum_{k=1}^{N-1} D_N^{(k)} \right] = \sum_{j=1}^{N} \sum_{k=1}^{N-1} \Pr[d_j = k] = \sum_{j=1}^{N} 1 = N
\]

which is again (16.31).


16.6.1 Recursion for $\Pr\left[D_N^{(k)} = j\right]$ in the URT

The growth law of URTs dictates the way a specific tree of size $N$ transforms to the tree of size $N + 1$ by adding the node with label $N + 1$ at random. Based on this growth law, the set of nodes with degree $k$ in a specific tree of size $N + 1$ consists of:
(i) the same set of nodes with degree $k$ in the ancestor tree of size $N$ provided the new node $n_{N+1}$ is not attached to any of the nodes of this set nor to any of the nodes with degree $k - 1$;
(ii) the same set of nodes with degree $k$ except for one, say node $n_l$, provided the new node $n_{N+1}$ is attached to that node $n_l$;
(iii) the same set of nodes with degree $k$ and one additional node of the set of $k - 1$ degree nodes provided the new node $n_{N+1}$ is attached to a node of the set of degree $k - 1$.


The evolution scenario in three parts is generally applicable for any class of trees that possess a growth law. It does not hold for graphs in general because only in a tree, a node has one well-defined parent node and the in-degree is one. Using the law of total probability (2.46) yields,

\[
\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j \mid n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right] \Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right]
\]
\[
\qquad + \Pr\left[D_N^{(k)} = j + 1 \mid n_{N+1} \in \{D_N^{(k)}\}\right] \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]
\]
\[
\qquad + \Pr\left[D_N^{(k)} = j - 1 \mid n_{N+1} \in \{D_N^{(k-1)}\}\right] \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right]
\]

If the process of attaching a new node $N + 1$ does not depend on the way the $N$ previous nodes are attached but rather on their number, there holds $\Pr\left[D_N^{(k)} = j \mid n_{N+1}\right] = \Pr\left[D_N^{(k)} = j\right]$. This property holds for the URT. We obtain a three point recursion for $k > 1$,

\[
\Pr\left[D_{N+1}^{(k)} = j\right] = \Pr\left[D_N^{(k)} = j\right] \Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right]
\]
\[
\qquad + \Pr\left[D_N^{(k)} = j + 1\right] \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]
\]
\[
\qquad + \Pr\left[D_N^{(k)} = j - 1\right] \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right]
\]
The probability generating function

\[
\varphi_D(z, N; k) = E\left[ z^{D_N^{(k)}} \right] = \sum_{j=0}^{\infty} \Pr\left[D_N^{(k)} = j\right] z^j
\]

is obtained after multiplication by $z^j$ and summing over all $j$,

\[
\varphi_D(z, N+1; k) = \Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right] \varphi_D(z, N; k)
\]
\[
\qquad + \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] \frac{\varphi_D(z, N; k) - \varphi_D(0, N; k)}{z}
\]
\[
\qquad + \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right] z\, \varphi_D(z, N; k)
\]

Now $\varphi_D(0, N; k) = \Pr\left[D_N^{(k)} = 0\right]$ is the probability of the event that there are no nodes with degree $k$ for $1 \le k \le k_{\max} \le N - 1$. Since the normalization of the generating function requires that $\varphi_D(1, N; k) = 1$ and since

\[
\Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right] + \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] + \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right] = 1
\]

it follows that $\Pr\left[D_N^{(k)} = 0\right] \Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] = 0$. Further, $\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] \neq 0$ for any $k \in [1, k_{\max}]$ because the attachment of the node $n_{N+1}$ is possible to any non-empty set. This means that the absence of nodes with degree $k \in [1, k_{\max}]$ cannot occur in URTs, thus $\Pr\left[D_N^{(k)} = 0\right] = 0$. A consequence is that the probability generating function $\varphi_D(z, N; k)$ is at least $O(z)$ as $z \to 0$ (for $N > 1$). After using $\varphi_D(0, N; k) = 0$ and eliminating $\Pr\left[n_{N+1} \notin \{D_N^{(k)}, D_N^{(k-1)}\}\right]$, the recursion relation for the probability generating function becomes$^5$

\[
\frac{\varphi_D(z, N+1; k)}{\varphi_D(z, N; k)} = 1 + \left( \frac{\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right]}{z} - \Pr\left[n_{N+1} \in \{D_N^{(k-1)}\}\right] \right) (1 - z) \tag{16.33}
\]

The special case for $k = 1$ and $N > 1$ is

\[
\varphi_D(z, N+1; 1) = \left( 1 + \Pr\left[n_{N+1} \in \{D_N^{(1)}\}\right] \frac{1 - z}{z} \right) \varphi_D(z, N; 1)
\]

16.6.2 The Average Number of Degree $k$ Nodes in the URT

In the URT, a new node $n_{N+1}$ is attached uniformly to any of $N$ previously attached nodes such that

\[
\Pr\left[n_{N+1} \in \{D_N^{(k)}\}\right] = \frac{E\left[D_N^{(k)}\right]}{N}
\]

Also, the probability that an arbitrary node in a URT of size $N$ has degree $k$ equals $\frac{E\left[D_N^{(k)}\right]}{N}$. We obtain from (16.33)

\[
\varphi_D(z, N+1; k) = \left( 1 + \left( \frac{E\left[D_N^{(k)}\right]}{Nz} - \frac{E\left[D_N^{(k-1)}\right]}{N} \right) (1 - z) \right) \varphi_D(z, N; k) \tag{16.34}
\]

$^5$ With the initialization $\varphi_D(z, m; k) = E\left[ z^{D_m^{(k)}} \right] = 1$ for all $m \le k$ because $D_m^{(k)} = 0$ if $1 < m \le k$, after iterating (16.33) we arrive at

\[
\varphi_D(z, N; k) = \prod_{m=k}^{N-1} \left( 1 - \Pr\left[n_{m+1} \in \{D_m^{(k)}\}\right] \left( 1 - \frac{1}{z} \right) - \Pr\left[n_{m+1} \in \{D_m^{(k-1)}\}\right] (1 - z) \right)
\]


By taking the derivative of both sides in (16.34) with respect to $z$ and evaluating at $z = 1$, a recursion for the average is found,

\[
E\left[D_{N+1}^{(k)}\right] = \frac{N-1}{N} E\left[D_N^{(k)}\right] + \frac{E\left[D_N^{(k-1)}\right]}{N} \tag{16.35}
\]

Let $r_N^{(k)} = (N-1) E\left[D_N^{(k)}\right]$, then the recursion valid for $1 < k \le N - 2$ becomes

\[
r_{N+1}^{(k)} = r_N^{(k)} + \frac{r_N^{(k-1)}}{N-1} \tag{16.36}
\]

Theorem 16.6.1 In the URT, the average number of degree $k$ nodes is given by

\[
E\left[D_N^{(k)}\right] = \frac{N}{2^k} + \frac{(-1)^{N+k-1} S_{N-1}^{(k)}}{(N-1)!} + \frac{(-1)^N}{2^k (N-1)!} \sum_{j=1}^{k-1} S_{N-1}^{(j)} (-2)^j \tag{16.37}
\]

Proof: See Section 16.8.
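The closed form (16.37) can be compared with values obtained by iterating the recursion (16.35) in exact arithmetic. The sketch below is illustrative only and rests on two assumptions that are not spelled out in the text: the iteration starts from the two-node tree with $E[D_2^{(1)}] = 2$, and for $k = 1$ the gain term is taken as 1 because every newly attached node is a leaf. The function names are hypothetical; the signed Stirling numbers of the first kind come from the standard recurrence $S_{n+1}^{(k)} = S_n^{(k-1)} - n S_n^{(k)}$.

```python
from fractions import Fraction
from math import factorial


def stirling_first(n_max):
    """Signed Stirling numbers of the first kind S[n][k], 0 <= k <= n <= n_max."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(n_max):
        for k in range(1, n + 2):
            S[n + 1][k] = S[n][k - 1] - n * S[n][k]
    return S


def mean_degree_counts_recursive(n_max):
    """E[D_N^(k)] for 2 <= N <= n_max by iterating (16.35)."""
    E = {2: {1: Fraction(2)}}  # two nodes joined by one link: both have degree 1
    for N in range(2, n_max):
        prev, cur = E[N], {}
        for k in range(1, N + 1):
            keep = Fraction(N - 1, N) * prev.get(k, Fraction(0))
            # Assumption for k = 1: the new node itself is a degree-1 node.
            gain = Fraction(1) if k == 1 else prev.get(k - 1, Fraction(0)) / N
            cur[k] = keep + gain
        E[N + 1] = cur
    return E


def mean_degree_count_closed(N, k, S):
    """Right-hand side of (16.37)."""
    fact = factorial(N - 1)
    tail = sum(S[N - 1][j] * (-2) ** j for j in range(1, k))
    return (Fraction(N, 2 ** k)
            + Fraction((-1) ** (N + k - 1) * S[N - 1][k], fact)
            + Fraction((-1) ** N * tail, 2 ** k * fact))
```

For small trees the values can also be confirmed by hand: in a URT of size 4 the tree is a path with probability 2/3 and a star with probability 1/3, so the average number of degree-2 nodes is 4/3.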

For large $N$ and using the asymptotics of the Stirling numbers $S_N^{(k)}$ of the first kind (Abramowitz and Stegun, 1968, Section 24.1.3.III), the asymptotic law is

\[
\Pr[D_{URT} = k] = \frac{E\left[D_N^{(k)}\right]}{N} = \frac{1}{2^k} + O\!\left( \frac{\log^{k-1} N}{N^2} \right) \tag{16.38}
\]

The ratio of the average number of nodes with degree $k$ over the total number of nodes, which equals the probability that an arbitrary node in a URT of size $N$ has degree $k$, decreases exponentially fast with rate $\ln 2$.
The variance $\mathrm{Var}\left[D_N^{(k)}\right]$ is most conveniently computed from the logarithm of the probability generating function with (2.27). By taking the logarithm of both sides in (16.34) and differentiating twice and adding (16.35), we obtain

\[
\mathrm{Var}\left[D_{N+1}^{(k)}\right] = f(N; k) + \mathrm{Var}\left[D_N^{(k)}\right]
\]

where

\[
f(N; k) = -\left( \frac{E\left[D_N^{(k)}\right]}{N} - \frac{E\left[D_N^{(k-1)}\right]}{N} \right)^2 + \left( \frac{E\left[D_N^{(k)}\right]}{N} + \frac{E\left[D_N^{(k-1)}\right]}{N} \right)
\]

Since $\mathrm{Var}\left[D_m^{(k)}\right] = 0$ for $m \le k$, the general solution is

\[
\mathrm{Var}\left[D_N^{(k)}\right] = \sum_{j=k}^{N-1} f(j; k)
\]

For large $N$, using (16.38), we observe that

\[
\frac{\mathrm{Var}\left[D_N^{(k)}\right]}{N} = \frac{3}{2^k} - \frac{1}{2^{2k}} + O\!\left( \frac{\log^{2k-2} N}{N^2} \right) \tag{16.39}
\]

In practice, if we use the estimator $t_N^{(k)} = \frac{D_N^{(k)}}{N}$ for the probability that the degree of a node equals $k$, then (a) the estimator is unbiased because the mean of the estimator $E\left[t_N^{(k)}\right]$ equals the correct mean $\frac{E\left[D_N^{(k)}\right]}{N}$ and (b) the variance $\mathrm{Var}\left[t_N^{(k)}\right] = \mathrm{Var}\left[\frac{D_N^{(k)}}{N}\right] = \frac{\mathrm{Var}\left[D_N^{(k)}\right]}{N^2} \to 0$ as $O\!\left(\frac{1}{N}\right)$ for large $N$.

Fig. 16.6. The histogram of the degree $D_U$ derived from the graph $G_U$ formed by the union of paths measured via trace-route in the Internet. Both measurements in 2003 and 2004 are fitted on a log-lin plot and the correlation coefficient $\rho$ quantifies the quality of the fit. [Plot: $\Pr[D_U = k]$ versus degree $k$ on a log-lin scale. RIPE data (May-June 2003), $N = 2574$, $L = 3992$, fit $\ln(\Pr[D_U = k]) = 0.44 - 0.67k$ with $\rho = 0.99$. RIPE data (Jan.-Feb. 2004), $N = 3850$, $L = 6743$, fit $\ln(\Pr[D_U = k]) = -0.49 - 0.41k$ with $\rho = 0.95$.]

The law (16.38) is observed in Fig. 16.6, which plots the histogram of the degree $D_U$ in the graph $G_U$. The graph $G_U$ is obtained from the union of trace-routes from each RIPE measurement box to any other box positioned mainly in the European part of the Internet. For about 50 measurement boxes in 2003, the correspondence is striking because the slope of the fit on a log-lin scale equals $-0.668$ while the law (16.38) gives $-\ln 2 = -0.693$. Ignoring in Fig. 16.6 the leaf nodes with $k = 1$ suggests that the graph $G_U$ is URT-like. For 72 measurement boxes in 2004, which obviously results in a larger graph $G_U$, deviations from the URT law (16.38) are observed. If measurements between a larger full mesh of boxes were possible and if the measurement boxes were more homogeneously spread over the Internet, a power law behavior is likely to be expected as mentioned in Section 15.3. However, these earlier reported trace-route measurements that lead to power law degrees have been performed from a relatively small number of sources to a large number of destinations. These results question the observability of the Internet: how accurate are Internet properties such as hopcount and degree that are derived from incomplete measurements, i.e. from a selected small subset of nodes at which measurement boxes are placed?

16.6.3 The degree of the shortest path tree in the complete graph with i.i.d. exponential link weights

In the complete graph $K_N$ with i.i.d. exponential link weights, any node $n$ possesses equal properties in probability because of symmetry. If we denote by $d_n$ the degree of node $n$ in the shortest path tree rooted at that node $n$, the symmetry implies that $\Pr[d_n = k] = \Pr[d_i = k]$ for any node $n$ and $i$. In fact, we consider here the degree of a URT as an overlay tree in a complete graph. Concentrating on a node with label 1, we obtain from (16.32)

\[
E\left[D_N^{(k)}\right] = N \Pr[d_1 = k] = N \Pr\left[X_N^{(1)} = k\right]
\]

The latter follows from the fact that the degree of a node is equal to the number of its direct neighbors, the nodes at level 1, $X_N^{(1)}$. By definition of the URT, the second node surely belongs to the level set 1, while node 3 has equal probability to be attached to the root or to node 2. In general, when attaching a node $j$ to a URT of size $j - 1$, the probability that node $j$ is attached to the root equals $\frac{1}{j-1}$. Thus, the number of nodes at level 1 in the URT (constructed upon the complete graph) is in distribution equal to the sum of $N - 1$ independent Bernoulli random variables each with different mean $\frac{1}{j-1}$,

\[
X_N^{(1)} \stackrel{d}{=} \sum_{j=2}^{N} \mathrm{Bernoulli}\left( \frac{1}{j-1} \right) = \sum_{j=1}^{N-1} \mathrm{Bernoulli}\left( \frac{1}{j} \right)
\]

because each node in the complete graph is connected to $N - 1$ neighbors. The generating function is

\[
E\left[ z^{X_N^{(1)}} \right] = E\left[ z^{\sum_{j=1}^{N-1} \mathrm{Bernoulli}\left(\frac{1}{j}\right)} \right] = \prod_{j=1}^{N-1} E\left[ z^{\mathrm{Bernoulli}\left(\frac{1}{j}\right)} \right]
\]

Using the probability generating function (3.1) of a Bernoulli random variable, $E\left[ z^{\mathrm{Bernoulli}\left(\frac{1}{j}\right)} \right] = 1 - \frac{1}{j} + \frac{z}{j}$, yields

\[
E\left[ z^{X_N^{(1)}} \right] = \prod_{j=1}^{N-1} \frac{z + j - 1}{j} = \frac{\Gamma(z + N - 1)}{\Gamma(z)\, \Gamma(N)}
\]

Compared to the generating function (16.6) of the hopcount $h_N$, we recognize that

\[
E\left[ z^{X_N^{(1)}} \right] = z\, \varphi_{N-1}(z) = \sum_{k=0}^{N-2} \Pr[h_{N-1} = k]\, z^{k+1} = \sum_{k=1}^{N-1} \Pr[h_{N-1} = k - 1]\, z^k
\]

from which we deduce, for $1 \le k \le N - 1$,

\[
\Pr\left[X_N^{(1)} = k\right] = \Pr[h_{N-1} = k - 1]
\]

Using (16.7), we arrive at the curious result

\[
\Pr\left[X_N^{(1)} = k\right] = \frac{E\left[X_{N-1}^{(k-1)}\right]}{N-1} \qquad \text{for } 1 \le k \le N - 1.
\]

The probability that the number of level 1 nodes in the shortest path tree in the complete graph with i.i.d. exponential link weights is $k$ equals the average number of nodes on level $k - 1$ in a URT of size $N - 1$ divided by that size $N - 1$. In other words, the horizontal distribution at level 1 is related to the vertical distribution of the size of the level sets. In summary$^6$, in the complete graph with i.i.d. exponential link weights, the probability that an arbitrary node $n$ as root of a shortest path tree has degree $k$ is

\[
\Pr[d_n = k] = \Pr[h_{N-1} = k - 1] = \frac{(-1)^{N-1-k} S_{N-1}^{(k)}}{(N-1)!} \tag{16.40}
\]

The degree of an arbitrary node in the union of all shortest path trees in the complete graph $K_N$ with i.i.d. exponential link weights is also given by (16.40) because in that union each node $n$ is once a root and further plays, by symmetry, the role of the $j$-th attached node in the URT rooted at any other node in $K_N$.

$^6$ This result is due to Remco van der Hofstad (private communication).
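The root-degree law (16.40) can be verified for small $N$ by expanding the product of Bernoulli generating functions $\prod_{j=1}^{N-1} \left(1 - \frac{1}{j} + \frac{z}{j}\right)$ and comparing its coefficients with the Stirling-number expression. The sketch below uses exact rational arithmetic; the function names are hypothetical and the Stirling numbers are generated with the standard recurrence $S_{n+1}^{(k)} = S_n^{(k-1)} - n S_n^{(k)}$.

```python
from fractions import Fraction
from math import factorial


def root_degree_pmf(N):
    """Coefficients of prod_{j=1}^{N-1} (1 - 1/j + z/j); entry k is Pr[X_N^(1) = k]."""
    poly = [Fraction(1)]
    for j in range(1, N):
        stay, grow = Fraction(j - 1, j), Fraction(1, j)
        nxt = [Fraction(0)] * (len(poly) + 1)
        for degree, coeff in enumerate(poly):
            nxt[degree] += coeff * stay      # node j + 1 not attached to the root
            nxt[degree + 1] += coeff * grow  # node j + 1 attached to the root
        poly = nxt
    return poly


def stirling_first(n_max):
    """Signed Stirling numbers of the first kind S[n][k], 0 <= k <= n <= n_max."""
    S = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    S[0][0] = 1
    for n in range(n_max):
        for k in range(1, n + 2):
            S[n + 1][k] = S[n][k - 1] - n * S[n][k]
    return S


def root_degree_closed(N, k, S):
    """Right-hand side of (16.40)."""
    return Fraction((-1) ** (N - 1 - k) * S[N - 1][k], factorial(N - 1))
```

The polynomial expansion is a direct transcription of the sum of independent Bernoulli variables, so agreement with `root_degree_closed` checks the step from the product formula to (16.40).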
16.7 The minimum spanning tree

From an algorithmic point of view, the shortest path problem is closely related to the computation of the minimum spanning tree (MST). The Dijkstra shortest path algorithm is similar to Prim's minimum spanning tree algorithm (Cormen et al., 1991). In this section, we compute the average weight of the MST in a graph with a general link weight structure.

16.7.1 The Kruskal growth process of the MST

Since the link weights in the underlying complete graph are chosen independently and assigned randomly to links in the complete graph, the resulting graph is probabilistically the same if we first order the set of link weights and assign them in increasing order randomly to links in the complete graph. In the latter construction process, only the order statistics or the ranking of the link weights suffice to construct the graph because the precise link weight can be unambiguously associated to the rank of a link. This observation immediately favors the Kruskal algorithm for the MST over Prim's algorithm. Although the Prim algorithm leads to the same MST, it gives a more complicated, long-memory growth process, where the attachment of each new node depends stochastically on the whole growth history so far. Pietronero and Schneider (1990) illustrate that in our approach Prim, in contrast with Kruskal, leads to a very complicated stochastic process for the construction of the MST.
The Kruskal growth process described here is closely related to a growth process of the random graph $G_r(N, L)$ with $N$ nodes and $L$ links. The construction or growth of $G_r(N, L)$ starts from $N$ individual nodes and in each step an arbitrary, not yet connected random pair of nodes is connected. The only difference with Kruskal's algorithm for the MST is that, in Kruskal, links generating loops are forbidden. Those forbidden links are the links that connect nodes within the same connected component or cluster. As a result, the internal wiring of the clusters differs, but the cluster size statistics (counted in nodes, not links) is exactly the same as in the corresponding random graph. The metacode of the Kruskal growth process for the construction of the MST is shown in Fig. 16.7.

KruskalGrowthMST
1. start with $N$ disconnected nodes
2. repeat until all nodes are connected
3.    randomly select a node pair $(i, j)$
4.    if a path $P_{i \to j}$ does not exist
5.       then connect $i$ to $j$
Fig. 16.7. Kruskal growth process

The growth process of the random graph $G_p(N)$, which is asymptotically equal to that of $G_r(N, L)$, is quantified in Section 15.6.4 for large $N$. The fraction of nodes $S$ in the giant component of $G_p(N)$ is related to the average degree or to the link density $p$ because $\mu_{rg} = p(N-1)$ in $G_p(N)$ by (15.20). For large $N$, the size of the giant cluster in the forest is thus determined as a function of the number of added links that increase $\mu_{rg}$.
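The Kruskal growth process of Fig. 16.7 can be sketched with a union-find structure that answers the question "does a path between $i$ and $j$ already exist?". This is a minimal illustration, not the author's implementation: the function name and return values are hypothetical, and node pairs are drawn with replacement, so a pair that was already connected is simply rejected as a forbidden link.

```python
import random


def kruskal_growth_mst(N, rng):
    """Grow a spanning tree on N nodes by adding random links that close no loop.

    Returns the accepted links and the number of selected pairs (attempts).
    """
    parent = list(range(N))  # union-find forest: one component per node at start

    def find(x):
        # Root of the component containing x, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    links = []
    attempts = 0
    while len(links) < N - 1:            # a tree on N nodes has N - 1 links
        i, j = rng.sample(range(N), 2)   # random pair of distinct nodes
        attempts += 1
        root_i, root_j = find(i), find(j)
        if root_i != root_j:             # no path from i to j exists yet
            parent[root_i] = root_j      # merge the two components
            links.append((i, j))
    return links, attempts
```

The ratio of accepted links to attempts at each stage is an empirical counterpart of the probability that a randomly selected link is eligible, which is why most rejections occur late in the growth, once a giant component has formed.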

Fig. 16.8. Component structure during the Kruskal growth process. [Diagram: link types a to f within and between the giant component and the small components.]

We will now transform the mean degree $\mu_{rg}$ in the random graph $G_p(N)$ to the mean degree $\mu_{MST}$ in the corresponding stage in the Kruskal growth process of the MST. In early stages of the growth each selected link will be added with high probability such that $\mu_{MST} = \mu_{rg}$ almost surely. After some time the probability that a selected link is forbidden increases, and thus $\mu_{rg}$ exceeds $\mu_{MST}$. In the end, when connectivity of all $N$ nodes is reached, $\mu_{MST} = 2$ (since it is a tree) while $\mu_{rg} = O(\log N)$, as follows from (15.19) and the critical threshold $p_c \sim \frac{\log N}{N}$.
Consider now an intermediate stage of the growth as illustrated in Fig. 16.8. Assume there is a giant component of average size $NS$ and $n_l = N(1 - S)/s_l$ small components of average size $s_l$ each. Then we can distinguish six types of links labelled a-f in Fig. 16.8. Types a and b are links that have been chosen earlier in the giant component (a) and in the small components (b) respectively. Types c and d are eligible links between the giant component and a small component (c) and between small components (d) respectively. Types e and f are forbidden links connecting nodes within the giant component (e), respectively within a small component (f). For large $N$, we can enumerate the average number of links $L_x$ of each type $x$:

\[
L_a + L_b = \tfrac{1}{2} \mu_{MST} N
\]
\[
L_c = SN \cdot (1 - S) N
\]
\[
L_d = \tfrac{1}{2} n_l^2 s_l^2
\]
\[
L_e = \tfrac{1}{2} (SN)^2 - SN
\]
\[
L_f = \tfrac{1}{2} n_l s_l (s_l - 1) - n_l (s_l - 1)
\]

To highest order in $O(N^2)$, we have

\[
L_c = N^2 S (1 - S), \qquad L_d = \tfrac{1}{2} N^2 (1 - S)^2, \qquad L_e = \tfrac{1}{2} N^2 S^2
\]

The probability that a randomly selected link is eligible is $q = \frac{L_c + L_d}{L_c + L_d + L_e + L_f}$ or, to order $O(N^2)$,

\[
q = 1 - S^2 \tag{16.41}
\]

In contrast with the growth of the random graph $G_p(N)$ where at each stage a link is added with probability $p$, in the Kruskal growth of the MST we are only successful to add one link (with probability 1) per $\frac{1}{q}$ stages on average. Thus the average number of links added in the random graph corresponding to one link in the MST is $\frac{1}{q} = \frac{1}{1 - S^2}$. This provides an asymptotic mapping between $\mu_{rg}$ and $\mu_{MST}$ in the form of a differential equation,

\[
\frac{d\mu_{rg}}{d\mu_{MST}} = \frac{1}{1 - S^2}
\]

By using (15.22), we find

\[
\frac{d\mu_{MST}}{dS} = \frac{d\mu_{MST}}{d\mu_{rg}} \frac{d\mu_{rg}}{dS} = \frac{(1 + S) \left( S + (1 - S) \log(1 - S) \right)}{S^2}
\]

Integration with the initial condition $\mu_{MST} = 2$ at $S = 1$, finally gives the average degree $\mu_{MST}$ in the MST as function of the fraction $S$ of nodes in the giant component

\[
\mu_{MST}(S) = 2S - \frac{(1 - S)^2}{S} \log(1 - S) \tag{16.42}
\]

As shown in Fig. 16.9, the asymptotic result (16.42) agrees well with the simulation (even for a single sample), except in a small region around the transition $\mu_{MST} = 1$ and for relatively small $N$.
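The mapping (16.42) can be checked numerically against the differential equation it was integrated from. The sketch below assumes that (15.22) has the form $\mu_{rg} = -\log(1 - S)/S$, which is how it is used in (16.46); the function names and the finite-difference step are illustrative choices.

```python
import math


def mu_rg(S):
    """Mean degree of the random graph as a function of S (assumed form of (15.22))."""
    return -math.log1p(-S) / S


def mu_mst(S):
    """Mean degree of the MST forest as a function of S, equation (16.42)."""
    return 2.0 * S - (1.0 - S) ** 2 / S * math.log1p(-S)


def derivative(f, S, h=1e-5):
    """Central finite difference, adequate for a smooth f away from S = 0 and S = 1."""
    return (f(S + h) - f(S - h)) / (2.0 * h)


def mapping_residual(S):
    """d(mu_MST)/dS - (1 - S^2) d(mu_rg)/dS, which should vanish."""
    return derivative(mu_mst, S) - (1.0 - S * S) * derivative(mu_rg, S)
```

Evaluating `mu_mst` near `S = 0` gives a value close to 1, the mean degree at which the giant component emerges, and near `S = 1` it approaches 2, the mean degree of a spanning tree.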
The key observation is that all transition probabilities in the Kruskal growth process asymptotically depend on merely one parameter $S$, the fraction of nodes in the giant component, and $S$ is called an order parameter in statistical physics. In general, the expectation of an order parameter distinguishes the qualitatively different regimes (states) below and above the phase transition. In higher dimensions, fluctuations of the order parameter around the mean can be neglected and the mean value can be computed from a self-consistent mean-field theory. In our problem, the underlying complete (or random) graph topology makes the problem effectively infinite-dimensional. The argument leading to (15.20) is essentially a mean-field argument.

Fig. 16.9. Size of the giant component (divided by $N$) as a function of the mean degree $\mu_{MST}$. Each simulation for a different number of nodes $N$ consists of one MST sample. [Plot: fraction $S$ of nodes in the giant component versus mean degree $\mu_{MST}$ for $N = 1000$, $N = 10000$, $N = 25000$ and theory.]

16.7.2 The average weight of the minimum spanning tree

By definition, the weight of the MST is

\[
W_{MST} = \sum_{j=1}^{L} w_{(j)} 1_{j \in MST} \tag{16.43}
\]

where $w_{(j)}$ is the $j$-th smallest link weight. The average MST weight is

\[
E[W_{MST}] = \sum_{j=1}^{L} E\left[ w_{(j)} 1_{j \in MST} \right]
\]

The random variables $w_{(j)}$ and $1_{j \in MST}$ are independent because the $j$-th smallest link weight $w_{(j)}$ only depends on the link weight distribution and the number of links $L$, while the appearance of the $j$-th link in the MST only depends on the graph's topology, as shown in Section 16.7.1. Hence,

\[
E\left[ w_{(j)} 1_{j \in MST} \right] = E\left[ w_{(j)} \right] E\left[ 1_{j \in MST} \right] = E\left[ w_{(j)} \right] \Pr[j \in MST]
\]

such that the average weight of the MST is

\[
E[W_{MST}] = \sum_{j=1}^{L} E\left[ w_{(j)} \right] \Pr[j \in MST] \tag{16.44}
\]
In general for independent link weights with probability density function $f_w(x)$ and distribution function $F_w(x) = \Pr[w \le x]$, the probability density function of the $j$-th order statistic follows from (3.36) as
\[ f_{w_{(j)}}(x) = \frac{j f_w(x)}{F_w(x)} \binom{L}{j} (F_w(x))^j (1 - F_w(x))^{L-j} \tag{16.45} \]
The factor $\binom{L}{j} (F_w(x))^j (1 - F_w(x))^{L-j}$ is a binomial distribution with mean $\mu = F_w(x) L$ and variance $\sigma^2 = L F_w(x)(1 - F_w(x))$ that, by the Central Limit Theorem 6.3.1, tends for large $L$ to a Gaussian $\frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(j-\mu)^2}{2\sigma^2}}$, which peaks at $j = \mu$. For large $N$ and fixed $\frac{j}{L}$, we have$^7$
\[ x_j = E[w_{(j)}] \simeq F_w^{-1}\left(\frac{j}{L}\right). \]
We found before in (16.41) that the link ranked $j$ appears in the MST with probability
\[ \Pr[j \in \mathrm{MST}] = 1 - S_j^2 \]
where $S_j$ is the fraction of nodes in the giant component during the construction process of the random graph at the stage where the number of links precisely equals $j$. Since links are added independently, that stage in fact establishes the random graph $G_r(N, L = j)$. Our graph under consideration is the complete graph $K_N$ such that we add in total $L = \binom{N}{2}$ links.
$^7$ In general, it holds that $w_{(j)} = F_w^{-1}(U_{(j)})$ and
\[ E[w_{(j)}] = E\left[F_w^{-1}(U_{(j)})\right] \neq F_w^{-1}\left(E[U_{(j)}]\right) \]
but, for a large number of order statistics $L$, the Central Limit Theorem 6.3.1 leads to
\[ E[w_{(j)}] \simeq F_w^{-1}\left(E[U_{(j)}]\right) \simeq F_w^{-1}\left(\frac{j}{L}\right) \]
because for a uniform random variable $U$ on [0,1] the average weight of the $j$-th smallest link is exactly
\[ E[w_{(j)}] = \frac{j}{L+1} \simeq \frac{j}{L} \]

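With (15.22) and $\mu_{rg} = \frac{2L}{N}$, it follows that
\[ \frac{2j}{N} = -\frac{\log(1 - S_j)}{S_j} \]
Hence,
\[ E[W_{\mathrm{MST}}] \simeq \sum_{j=1}^{L} F_w^{-1}\left(\frac{j}{L}\right) \left(1 - S_j^2\right) \tag{16.46} \]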
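As a numerical illustration of (16.46), the following minimal sketch evaluates the sum for uniform link weights on [0,1], for which $F_w^{-1}(y) = y$. The giant component fraction $S_j$ is obtained by bisection on the relation $S = 1 - e^{-\mu S}$ with $\mu = 2j/N$, taking $S = 0$ when $\mu \le 1$. The function names, the choice $N = 100$ and the tolerance are assumptions made for this example only.

```python
import math

def giant_fraction(mu, tol=1e-10):
    """Largest root S of S = 1 - exp(-mu * S); S = 0 when mu <= 1."""
    if mu <= 1.0:
        return 0.0
    lo, hi = tol, 1.0  # for mu > 1 the residual is positive at lo, negative at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - math.exp(-mu * mid) - mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mst_weight_sum(n_nodes):
    """Evaluate sum_j (j/L) (1 - S_j^2) for uniform link weights on K_N."""
    n_links = n_nodes * (n_nodes - 1) // 2
    total = 0.0
    for j in range(1, n_links + 1):
        s_j = giant_fraction(2.0 * j / n_nodes)
        total += (j / n_links) * (1.0 - s_j * s_j)
    return total

approx = mst_weight_sum(100)
zeta3 = sum(1.0 / k**3 for k in range(1, 10001))  # truncated series for zeta(3)
print(approx, zeta3)
```

For moderate $N$ the sum should already lie close to $\zeta(3)$, the value derived in (16.49) for this weight distribution.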

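We now approximate the sum by an integral,
\[ E[W_{\mathrm{MST}}] \simeq \int_1^L F_w^{-1}\left(\frac{u}{L}\right) \left(1 - S_u^2\right) du \]
Substituting $x = \frac{2u}{N}$ (which is the average degree in any graph $G(N, u)$) yields for large $N$ where $L \simeq \frac{N^2}{2}$,
\[ E[W_{\mathrm{MST}}] \simeq \frac{N}{2} \int_{2/N}^{2L/N} F_w^{-1}\left(\frac{N x}{2L}\right) \left(1 - S^2_{\frac{N}{2}x}\right) dx \simeq \frac{N}{2} \int_0^N F_w^{-1}\left(\frac{x}{N}\right) \left(1 - S^2(x)\right) dx \]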

It is known (Janson et al., 1993) that, if the number of links in the growth process of the random graph is below $\frac{N}{2}$, with high probability (and ignoring a small onset region just below $\frac{N}{2}$), there is no giant component such that $S(x) = 0$ for $x \in [0, 1]$. Thus, we arrive at the general formula valid for large $N$,
\[ E[W_{\mathrm{MST}}] \simeq \frac{N}{2} \int_0^1 F_w^{-1}\left(\frac{x}{N}\right) dx + \frac{N}{2} \int_1^N F_w^{-1}\left(\frac{x}{N}\right) \left(1 - S^2(x)\right) dx \tag{16.47} \]
The first term is the contribution from the smallest $N/2$ links in the graph, which are included in the MST almost surely. The remaining part comes from the more expensive links in the graph, which are included with diminishing probability since $1 - S^2(x)$ decreases exponentially for large $x$ as can be deduced from (15.21). The rapid decrease of $1 - S^2(x)$ makes only relatively small values of the argument $F_w^{-1}\left(\frac{x}{N}\right)$ contribute to the second integral.
At this point, the specifics of the link weight distribution need to be introduced. The Taylor expansion of $\frac{N}{2} F_w^{-1}\left(\frac{x}{N}\right)$ for large $N$ to first order is
\[ \frac{N}{2} F_w^{-1}\left(\frac{x}{N}\right) = \frac{N}{2} F_w^{-1}(0) + \frac{x}{2 f_w(0)} + O\left(\frac{1}{N}\right) = \frac{x}{2 f_w(0)} + O\left(\frac{1}{N}\right) \]
since we require that link weights are positive such that $F_w^{-1}(0) = 0$. This expansion is only useful provided $f_w$ is regular, i.e. $f_w(0)$ is neither zero nor infinity. These cases occur, for example, for polynomial link weights with $f_w(x) = \alpha x^{\alpha - 1}$ and $\alpha \neq 1$. For polynomial link weights, however, holds that $\frac{N}{2} F_w^{-1}\left(\frac{x}{N}\right) = \frac{N^{1 - \frac{1}{\alpha}}}{2} x^{\frac{1}{\alpha}}$. Formally, this latter expression reduces to the first order Taylor approach for $\alpha = 1$, apart from the constant factor $\frac{1}{f_w(0)}$. Therefore, we will first compute $E[W_{\mathrm{MST}}]$ for polynomial link weights and then return to the case in which the Taylor expansion is useful.
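16.7.2.1 Polynomial link weights
The average weight of the MST for polynomial link weights follows$^8$ from (16.47) as
\[ E[W_{\mathrm{MST}}(\alpha)] \simeq \frac{N^{1 - \frac{1}{\alpha}}}{2} \left( \frac{1}{\frac{1}{\alpha} + 1} + \int_1^N x^{\frac{1}{\alpha}} \left(1 - S^2(x)\right) dx \right) \]
Let $y = S(x)$ and use (15.22), then $x = S^{-1}(y) = -\frac{\log(1-y)}{y}$ and $dx = \frac{d}{dy}\left(-\frac{\log(1-y)}{y}\right) dy$ while $y = S(1) = 0$ and $y = S(N) = 1$, such that
\[ I = \int_1^N x^{\frac{1}{\alpha}} \left(1 - S^2(x)\right) dx = \int_0^1 \left(-\frac{\log(1-y)}{y}\right)^{\frac{1}{\alpha}} \left(1 - y^2\right) \frac{d}{dy}\left(-\frac{\log(1-y)}{y}\right) dy \]
After partial integration, we have
\[ I = -\frac{1}{\frac{1}{\alpha} + 1} + \frac{2}{\frac{1}{\alpha} + 1} \int_0^1 \frac{\left(-\log(1-y)\right)^{\frac{1}{\alpha} + 1}}{y^{\frac{1}{\alpha}}} dy = -\frac{1}{\frac{1}{\alpha} + 1} + \frac{2}{\frac{1}{\alpha} + 1} \int_0^\infty \frac{x^{\frac{1}{\alpha} + 1} e^{-x}}{\left(1 - e^{-x}\right)^{\frac{1}{\alpha}}} dx \]
Finally, we end up with
\[ E[W_{\mathrm{MST}}(\alpha)] \simeq \frac{N^{1 - \frac{1}{\alpha}}}{\frac{1}{\alpha} + 1} \int_0^\infty \frac{x^{\frac{1}{\alpha} + 1} e^{-x}}{\left(1 - e^{-x}\right)^{\frac{1}{\alpha}}} dx \tag{16.48} \]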

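$^8$ Since the average of the $k$-th smallest link weight can be computed from (3.36) as
\[ E[w_{(k)}] = \frac{L!\, \Gamma\left(k + \frac{1}{\alpha}\right)}{\Gamma(k)\, \Gamma\left(L + 1 + \frac{1}{\alpha}\right)} \]
the exact formula (16.44) reduces to
\[ E[W_{\mathrm{MST}}(\alpha)] = \frac{L!}{\Gamma\left(L + 1 + \frac{1}{\alpha}\right)} \sum_{j=1}^{L} \frac{\Gamma\left(j + \frac{1}{\alpha}\right)}{\Gamma(j)} \left(1 - S_j^2\right) \]
Analogously to the above manipulations, after conversion to an integral, substituting $x = \frac{2u}{N}$ and using (Abramowitz and Stegun, 1968, Section 6.1.47), for large $z$, that $\frac{\Gamma\left(z + \frac{1}{\alpha}\right)}{\Gamma(z)} = z^{\frac{1}{\alpha}} \left(1 + O\left(\frac{1}{z}\right)\right)$, we arrive at the same formula.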

If $\alpha < 1$, then $E[W_{\mathrm{MST}}(\alpha)] \to 0$ for $N \to \infty$, while for $\alpha > 1$, $E[W_{\mathrm{MST}}(\alpha)] \to \infty$. In particular, $\lim_{\alpha \to \infty} E[W_{\mathrm{MST}}(\alpha)] = N - 1$. Only for $\alpha = 1$, $E[W_{\mathrm{MST}}(1)]$ is finite for large $N$. More precisely,
\[ E[W_{\mathrm{MST}}(1)] = \zeta(3) = 1.202\ldots \tag{16.49} \]
where we have used (Abramowitz and Stegun, 1968, Section 23.2.7) the integral of the Riemann Zeta function $\Gamma(s)\zeta(s) = \int_0^\infty \frac{u^{s-1}}{e^u - 1} du$, which is convergent for $\mathrm{Re}(s) > 1$. This particular case for $\alpha = 1$ has been proved earlier by Frieze (1985) based on a different method.
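The limit (16.49) can be checked by a small Monte Carlo experiment. The sketch below runs Prim's algorithm on the complete graph $K_N$ with i.i.d. uniform link weights on [0,1] (the case $\alpha = 1$). Prim's algorithm, the lazy drawing of link weights, the seed, $N = 100$ and the number of trials are choices made for this example only and are not taken from the text.

```python
import random

def mst_weight_complete_graph(n_nodes, rng):
    """MST weight of K_N with i.i.d. uniform(0,1) link weights (Prim's algorithm)."""
    in_tree = [False] * n_nodes
    best = [float("inf")] * n_nodes  # cheapest known link from the tree to each node
    best[0] = 0.0
    total = 0.0
    for _ in range(n_nodes):
        # select the cheapest node that is not yet in the tree
        u = min((v for v in range(n_nodes) if not in_tree[v]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n_nodes):
            if not in_tree[v]:
                # link (u, v) is inspected exactly once, so its weight can be drawn here
                w = rng.random()
                if w < best[v]:
                    best[v] = w
    return total

rng = random.Random(12345)
trials = 40
estimate = sum(mst_weight_complete_graph(100, rng) for _ in range(trials)) / trials
print(estimate)  # expected to lie near zeta(3) = 1.202...
```

Each link weight is generated at the moment its first end node joins the tree, which avoids storing all $\binom{N}{2}$ weights while leaving the weights i.i.d.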
16.7.2.2 Generalizations
We now return to the Taylor series valid for link weights where $0 < f_w(0) < \infty$. The above result for $\alpha = 1$ immediately yields
\[ E[W_{\mathrm{MST}}] = \frac{\zeta(3)}{f_w(0)} \tag{16.50} \]
This result is for the complete graph $K_N$. A random graph $G_p(N)$ with $p < 1$ and weight density $f_w(x)$ is equivalent to $K_N$ with a fraction $1 - p$ of infinite link weights. Thus the effective link weight distribution is $p f_w(x) + (1 - p)\delta_{w,\infty}$, and we can simply replace $f_w(0)$ by $p f_w(0)$ in the expression (16.50) to obtain the average weight of the MST in the random graph $G_p(N)$.
16.8 The proof of the degree Theorem 16.6.1 of the URT
16.8.1 The case $k = N$: $E\left[D_N^{(N-1)}\right]$
If $k = N$, the recursion (16.35) becomes with $D_N^{(N)} = 0$,
\[ E\left[D_{N+1}^{(N)}\right] = \frac{E\left[D_N^{(N-1)}\right]}{N} \]
With initial value $D_2^{(1)} = 2$, the solution
\[ E\left[D_N^{(N-1)}\right] = \frac{2}{(N-1)!} \tag{16.51} \]
is readily verified. Since for any URT, it holds that $\Pr\left[D_N^{(k)} = j\right] = 0$ for $j > N - k$, we have that $E\left[D_N^{(N-1)}\right] = \Pr\left[D_N^{(N-1)} = 1\right]$. Since there exists in total $(N-1)!$ different URTs of size $N$, this result (16.51) means that there are precisely two possible URTs with a node of degree $N - 1$. Indeed, one is the root with $N - 1$ children and the other is the root with one child of degree $N - 1$ that in turn possesses $N - 2$ children. Also,
\[ r_N^{(N-1)} = (N-1) E\left[D_N^{(N-1)}\right] = \frac{2}{(N-2)!} \tag{16.52} \]

16.8.2 The case $k = 1$: $E\left[D_N^{(1)}\right]$
If $k = 1$ and $N \ge 3$, the recursion (16.35) is slightly different because the newly attached node $n_{N+1}$ necessarily belongs to the set of degree 1 nodes in the URT of size $N + 1$ such that
\[ E\left[D_{N+1}^{(1)}\right] = \frac{N-1}{N} E\left[D_N^{(1)}\right] + 1 \]
with $D_1^{(0)} = 1$ and $D_2^{(1)} = 2$. Hence, $E\left[D_3^{(1)}\right] = 2$. With $r_N^{(1)} = (N-1) E\left[D_N^{(1)}\right]$, the recursion for $k = 1$ becomes
\[ r_{N+1}^{(1)} = r_N^{(1)} + N \]
The particular solution is $r_{N;p}^{(1)} = aN^2 + bN + c$. Substitution of $r_N^{(1)} = aN^2 + bN + c$ into the difference equation yields
\[ aN^2 + (b + 2a)N + a + b + c = aN^2 + (b + 1)N + c \]
or, by equating corresponding powers in $N$, we find the conditions $b + 2a = b + 1$ and $a + b = 0$ from which $a = \frac{1}{2}$, $b = -\frac{1}{2}$. Thus,
\[ r_N^{(1)} = (N-1) E\left[D_N^{(1)}\right] = \frac{N}{2}(N-1) + c \]
and
\[ E\left[D_N^{(1)}\right] = \frac{N}{2} + \frac{c}{N-1} \]
Using $E\left[D_3^{(1)}\right] = 2$ shows that $c = 1$ such that, for $N > 2$,
\[ E\left[D_N^{(1)}\right] = \frac{N}{2} + \frac{1}{N-1} \tag{16.53} \]

16.8.3 The general case: $E\left[D_N^{(k)}\right]$
Let us denote
\[ R(x, y) = \sum_{N=3}^{\infty} \sum_{k=2}^{N-1} r_N^{(k)} x^k y^N \tag{16.54} \]
then the recursion (16.36) is transformed into
\[ \sum_{N=3}^{\infty} \sum_{k=2}^{N-1} (N-1) r_{N+1}^{(k)} x^k y^N = \sum_{N=3}^{\infty} \sum_{k=2}^{N-1} (N-1) r_N^{(k)} x^k y^N + \sum_{N=3}^{\infty} \sum_{k=2}^{N-1} r_N^{(k-1)} x^k y^N \]

Now, the left-hand side is


31
" Q
" Q 32
" Q 32
[
[
1 [ [
1 [ [
(n)
(n)
(n)
(Q 3 1)uQ +1 {n | Q =
(Q 3 2)uQ {n | Q =
(Q 3 2)uQ {n | Q
|
|
Q =3 n=2
Q =4 n=2
Q =3 n=2

" Q 31
"
1 [ [
1 [
(n)
(Q 31) Q 31 Q
QuQ {n |Q 3
QuQ
{
|
| Q =3 n=2
| Q =3

" Q 31
"
2 [ [ (n) n Q
2 [ (Q 31) Q 31 Q
uQ { | +
u
{
|
| Q =3 n=2
| Q =3 Q


Using (16.54) yields


31
" Q
[
[
CU({> |)
2
(n)
(Q 3 1)uQ +1 {n | Q =
3 U({> |)
C|
|
Q =3 n=2

"
"
2 [ (Q 31) Q 31 Q
1 [
(Q 31) Q 31 Q
QuQ
{
| +
u
{
|
| Q =3
| Q =3 Q

Invoking (16.52) yields


"
"
"
[
({|)Q
2 [ (Q 31) Q 31 Q
4 [ ({|)Q
uQ
{
| =
= 4{|
= 4{| (h{| 3 1)
| Q =3
{| Q =3 (Q 3 2)!
Q!
Q =1
"
"
"
[
2 [ Q ({|)Q
(Q + 2) ({|)Q
1 [
(Q 31) Q 31 Q
= 2{|
QuQ
{
| =
| Q =3
{| Q =3 (Q 3 2)!
Q!
Q =1

= 2 ({|)2

"
"
[
[
({|)Q
({|)Q
+ 4{|
= 2 ({|)2 h{| + 4{| (h{| 3 1)
Q!
Q!
Q =0
Q =1

such that
31
" Q
[
[
CU({> |)
2
(n)
(Q 3 1)uQ +1 {n | Q =
3 U({> |) 3 2 ({|)2 h{|
C|
|
Q =3 n=2

Similarly,
31
" Q
[
[

(n31) n Q

uQ

{ |

={

Q =3 n=2

32
" Q
[
[

(n)

uQ {n |Q

Q =3 n=1

={

31
" Q
[
[

(n)

uQ {n |Q + {2

Q =3 n=2
"
[

(1)

uQ | Q 3 {

Q =3
(1)

Q
2

{2

"
[

(1)

uQ |Q = {2

Q =3

"
[

(Q 31) Q 31 Q
uQ
{
|

=2

Q =3

(1)

uQ |Q 3 {

Q =3

= {U({> |) + {2

Using both (16.52) and uQ =

"
[

"
[

(Q 31) Q 31 Q

uQ

Q =3
"
[

(Q 31) Q 31 Q

uQ

Q =3

(Q 3 1) + 1 leads to

 3 
"
"
[
[
({|)2 g2
Q
|
{2 |3
(Q 3 1)|Q + {2
|Q =
+
2
2
2 g|
13|
13|
Q =3
Q =3

"
[
Q =3

"
[
({|)Q
({|)Q
= 2 ({|)2
= 2 ({|)2 (h{| 3 1)
(Q 3 2)!
Q!
Q =1

such that
31
" Q
[
[

(n31) n Q

uQ

{ |

Q =3 n=2

= {U({> |) +

({|)2 g2
2 g| 2

|3
13|


+

{2 | 3
3 2 ({|)2 (h{| 3 1)
13|

Combining all transforms the recursion (16.36) to a first order linear partial differential equation
(1 3 |)

CU({> |)
+
C|



2
{2 |3
3 3 3| + | 2
13{3
+ 2 ({|)2
+
U({> |) = {2 | 3
|
13|
(1 3 |)3


1
1
+
= {2 | 2
13|
(1 3 |)3


with boundary equations U({> 0) = U(0> |) = 0. Further,


]

(n)
31
31
" Q
" Q
l
k
[
[
[
[
uQ
U({> |)
(n)
n Q 31
g|
=
|
=
H GQ {n | Q 31
{
|2
Q
3
1
Q =3 n=2
Q =3 n=2

Hence, if { = 1,
]

31
" Q
" 
l
k
l
k
[
[
[
U(1> |)
(n)
(1)
Q 3 H GQ
| Q 31
g| =
H GQ |Q 31 =
2
|
Q =3 n=2
Q =3
"
"
"
l
k
[
[
[
Q Q 31
|Q 31
(1)
|
H GQ | Q 31 =
Q|Q 31 3
3
2
Q 31
Q =3
Q =3
Q =3
Q =3
Q =3
]
1
1
1
|
g|
=
3 (1 + 2|) 3
2 (1 3 |)2
2
13|

"
[

Q| Q 31 3

"
[

or
U(1> |) =

|2
(1 3 |)3

|2
(1 3 |)

(16.55)

It is more convenient to consider the differential equation as an ordinary differential equation in $y$ and to regard the variable $x$ as a parameter. The homogeneous differential equation,


2
CUk ({> |)
= { + 3 1 Uk ({> |)
(1 3 |)
C|
|
is solved after integration with respect to |,


]
]
] {+ 2 31
g|
g|
|
ln Uk ({> |) =
g| = ({ 3 1)
+2
13|
13|
| (1 3 |)
l
k
= (1 3 {) ln(1 3 |) + 2 (ln | 3 ln(1 3 |)) = ln (1 3 |)3{31 |2
or Uk ({> |) = (1 3 |)3{31 | 2 . The particular solution is of the form U ({> |) = F (|) Uk ({> |)
where F (|) obeys


CF (|)
1
1
= {2 (1 3 |){
+
C|
13|
(1 3 |)3
or
F (|) = {2
=3

] 

(1 3 |){33 + (1 3 |){31 g| + f ({)

{2 (1 3 |){32
{2 (1 3 |){
3
+ f ({)
{32
{

where f ({) is a function of {, independent of |, to be determined later. The solution is


U ({> |) = 3

({|)2
3

({ 3 2) (1 3 |)

{| 2
+ f ({) (1 3 |)3{31 | 2
(1 3 |)

The initial condition U (0> |) = 0 shows that f (0) = 0, while the boundary condition (16.55)
implies that f(1) = 0. Expanding this solution in a power series around { = 0 and | = 0 yields
Uk ({> |) = (1 3 |)3{31 |2 =

" 
[
3{ 3 1
(31)Q |Q +2
Q
Q =0


From the generating function of the Stirling numbers of the first kind (Abramowitz and Stegun,
1968, Section 24.1.3),
q
[
K({ + 1)
(m)
Vq {m
=
K({ + 1 3 q)
m=0

(16.56)

we observe that
(n+1)
Q
3{ 3 1
[
VQ +1 (31)n n
K(3{)
=
=
{
Q
Q !K(3{ 3 Q)
Q!
n=0

such that
Uk ({> |) =

(n+1)
" [
Q
[
VQ +1
Q =0 n=0

Q!

(31)Q +n {n | Q +2 =

(n+1)
32
" Q
[
[
VQ 31
Q =2 n=0

(Q 3 2)!

(31)Q +n {n |Q

Hence,

U ({> |) =

" [
"
[
Q =2 n=2

1
2n31

(n+1)
32
"
" Q
Q 
[
[
[
VQ 31
{n |Q 3 {
(31)Q +n {n | Q
|Q + f ({)
2
(Q
3
2)!
Q =2
Q =2 n=0

It remains to determine f ({) by equating the corresponding powers in { and | at both sides. With
the denition (16.54), equating the second power (Q = 2) in | yields
0=

"
[
n=2

1
{n 3 { + f ({)
2n31

which indicates that


f ({) = { 3

{2
23{

agreeing with f (0) = f (1) = 0. The Taylor series around { = 0 is f ({) =


1
f1 = 1 and fn = 3 2n1
for n A 1. Equating the power Q A 2 in |,
Q
31
[

(n)

uQ {n =

n=2

"
[
n=2

S"

n
n=0 fn {

with f0 = 0,

(n+1)
Q
32
Q 
[
VQ 31
n
{
3
{
+
f
({)
(31)Q +n {n
2n31 2
(Q
3
2)!
n=0

(n+1)
"
"
Q 
[
[
VQ 31
n
n
{
3
{
+
f
{
(31)Q +n {n
n
2n31 2
(Q 3 2)!
n=2
n=0
n=0
3
4
(m+1)
n31
"
"
[
[
[
VQ 31
1 Q  n
Q +m D n
C
(31)
=
fn3m
{
{ 3{+
2n31 2
(Q 3 2)!
m=0
n=2
n=1
3
3
4
4
(1)
(m+1)
n31
"
"
[
[
[
(31)Q VQ 31
(31)Q +m VQ 31
1 Q  n
C
C
D
D {n
{ 3 { + f1
=
fn3m
{+
2n31 2
(Q 3 2)!
(Q 3 2)!
m=0
n=2
n=2

"
[

(1)

which, by using VQ 31 = (31)Q (Q 3 2)! and f1 = 1, equals


Q
31
[
n=2

(n)
uQ {n

"
[
n=2

3
4
(m+1)
n31
"
Q 
[
[
VQ 31
Q +m D n
n
C
{ +
(31)
{
fn3m
2n31 2
(Q 3 2)!
m=0
n=2
1


Finally, by equating the corresponding powers in {, leads to


(n)

uQ =

(m+1)
Q  n31
[
VQ 31
(31)Q +m
f
+
n3m
2n31 2
(Q 3 2)!
m=0

1
2n31

n31
Q  (31)Q +n31 V (n)
[ (m)
(31)Q
Q 31
+
V
(32)m
+ n
2
(Q 3 2)!
2 (Q 3 2)! m=1 Q 31

or to (16.37). As a check, using (16.56) the generating function


reveals that
k
l
(Q 31)
=
H GQ

Q
2Q 31

(31)q K(q3{)
K(3{)

1
Q!
1
2
Q
+
3 Q 31
+
=
2Q 31
(Q 3 1)!
2
(Q 3 1)!
(Q 3 1)!
(Q 3 1)!
+

m=0

(m)

Vq {m ,

Q
32
[
1
(31)Q
(m)
V
(32)m
+ Q 31
(Q 3 1)!
2
(Q 3 1)! m=1 Q 31
&
%
(31)Q 31 K(Q 3 1 + 2)
1
(31)Q
(Q 31)
Q 31
+
+ Q 31
3 VQ 31 (32)
(Q 3 1)!
2
(Q 3 1)!
K(2)

Q
2Q 31

Q
2

Sq

 1
=
Also H GQ

1
Q 31

is readily veried.

16.9 Problems
(i) Comparison of simulations with exact results. Many of the theoretical results are easily verified by simulations. Consider the following standard simulation: (a) Construct a graph of a certain class, e.g. an instance of the random graphs $G_p(N)$ with exponentially distributed link weights, (b) Determine in that graph a desired property, e.g. the hopcount of the shortest path between two different arbitrary nodes, (c) Store the hopcount in a histogram and (d) repeat the sequence (a)-(c) $n$ times with each time a different graph instance in (a). Estimate the relative error of the simulated hopcount in $G_p(N)$ with $p = 1$ for $n = 10^4$, $10^5$ and $10^6$.
(ii) Given the probability generating function (16.17) of the weight of the shortest path in a complete graph with independent exponential link weights, compute the variance of $W_N$.
(iii) Prove the asymptotic law (16.20) of the weight of the shortest path in a complete graph with i.i.d. exponential link weights.
(iv) In a communication network often two paths are computed for each important flow to guarantee sufficient reliability. Apart from the shortest path between a source $A$ and a destination $B$, a second path between $A$ and $B$ is chosen that does not travel over any intermediate router of the shortest path. We call such a path node-disjoint to the shortest path. Derive a good approximation for the distribution of the hopcount of the shortest node-disjoint path to the shortest path in the complete graph with exponential link weights with mean 1.

17
The efficiency of multicast

The efficiency or gain of multicast in terms of network resources is compared to unicast. Specifically, we concentrate on a one-to-many communication, where a source sends a same message to $m$ different, uniformly distributed destinations along the shortest path. In unicast, this message is sent $m$ times from the source to each destination. Hence, unicast uses on average $f_N(m) = m E[H_N]$ link-traversals or hops, where $E[H_N]$ is the average number of hops to a uniform location in the graph with $N$ nodes. One of the main properties of multicast is that it economizes on the number of link-traversals: the message is only copied at each branch point of the multicast tree to the $m$ destinations. Let us denote by $H_N(m)$ the number of links in the shortest path tree (SPT) to $m$ uniformly chosen nodes. If we define the multicast gain $g_N(m) = E[H_N(m)]$ as the average number of hops in the SPT rooted at a source to $m$ randomly chosen distinct destinations, then $g_N(m) \le f_N(m)$. The purpose here is to quantify the multicast gain $g_N(m)$. We present general results valid for all graphs and more explicit results valid for the random graph $G_p(N)$ and for the $k$-ary tree. The analysis presented here may be valuable to derive a business model for multicast: How many customers $m$ are needed to make the use of multicast for a service provider profitable?
Two modeling assumptions are made. First, the multicast process is assumed$^1$ to deliver packets along the shortest path from a source to each of the $m$ destinations. As most of the current Internet protocols forward packets based on the (reverse) shortest path, the assumption of SPT delivery is quite realistic. The second assumption is that the $m$ multicast group member nodes are uniformly chosen out of the total number of nodes $N$. This assumption has been discussed by Phillips et al. (1999). They concluded that, if $m$ and $N$ are large, deviations from the uniformity assumption are negligibly small. Also the Internet measurements of Chalmers and Almeroth (2001) seem to confirm the validity of the uniformity assumption.

$^1$ The assumption ignores shared tree multicast forwarding such as core-based tree (CBT, see RFC2201).
17.1 General results for $g_N(m)$
Theorem 17.1.1 For any connected graph with $N$ nodes,
\[ m \le g_N(m) \le \frac{Nm}{m+1} \tag{17.1} \]
Proof: We need at least one edge for each different user; therefore $g_N(m) \ge m$ and the lower bound is attained in a star topology with the source at the center.
We will next show that an upper bound is obtained in a line topology. It is sufficient to consider trees, because multicast only uses shortest paths without cycles. If the tree does not have a line topology, then at least one node has degree at least 3 or the root has degree at least 2. Take the node closest to the root with this property and cut one of the branches at this node; we paste that branch to a node at the deepest level. Through this procedure the multicast function $g_N(m)$ stays unaltered or increases. Continuing in this fashion until we reach a line topology demonstrates the claim.
For the line topology we place the source at the origin and the other nodes at the integers $1, 2, \ldots, N-1$. The links of the graph are given by $(i, i+1)$, $i = 0, 1, \ldots, N-2$. The multicast gain $g_N(m)$ equals $E[M]$, where $M$ is the maximum of a sample of size $m$, without replacement, from the integers $1, 2, \ldots, N-1$. Thus,
\[ \Pr[M \le k] = \frac{\binom{k}{m}}{\binom{N-1}{m}}, \qquad m \le k \le N-1 \]
from which $g_N(m) = E[M]$ is
\[ g_N(m) = \sum_{k=m}^{N-1} k\, \frac{\binom{k}{m} - \binom{k-1}{m}}{\binom{N-1}{m}} = \sum_{k=m}^{N-1} k\, \frac{\binom{k-1}{m-1}}{\binom{N-1}{m}} = m \sum_{k=m}^{N-1} \frac{\binom{k}{m}}{\binom{N-1}{m}} = \frac{mN}{m+1} \sum_{k=m}^{N-1} \frac{\binom{k}{m}}{\binom{N}{m+1}} = \frac{mN}{m+1} \]
where we have used that $\sum_{k=m}^{N-1} \binom{k}{m} / \binom{N}{m+1} = 1$, because it is a sum of probabilities over all possible disjoint outcomes.
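The upper bound of Theorem 17.1.1 for the line topology can be verified with exact rational arithmetic. The sketch below computes $E[M]$ directly from $\Pr[M = k] = \binom{k-1}{m-1} / \binom{N-1}{m}$ and compares it with $Nm/(m+1)$; the function name and the tested values of $N$ are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def line_gain(n_nodes, m):
    """E[M] for the maximum M of m distinct draws from {1, ..., N-1}."""
    total = comb(n_nodes - 1, m)
    return sum(Fraction(k * comb(k - 1, m - 1), total) for k in range(m, n_nodes))

# compare with the closed form N m / (m + 1) for a few graph sizes
for n_nodes in (5, 12, 30):
    for m in range(1, n_nodes):
        assert line_gain(n_nodes, m) == Fraction(n_nodes * m, m + 1)
print(line_gain(12, 3))
```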

Figure 17.1 shows the allowable space for $g_N(m)$.

[Figure 17.1: $g_N(m)$ versus $m$ for $1 \le m \le N-1$, with the levels $c \log(N)$, $N/2$, $N-1$ and the curve $Nm/(m+1)$ marked.]
Fig. 17.1. The allowable region (in white) of $g_N(m)$. For exponentially growing graphs, $E[H_N] = c \log N$, implying that the allowable region for these graphs is smaller and bounded at the left (in dotted line) by the straight line $m (c \log N)$.

Theorem 17.1.2 For any connected graph with $N$ nodes, the map $m \mapsto g_N(m)$ is concave and the map $m \mapsto \frac{g_N(m)}{f_N(m)}$ is decreasing.
Proof: Define $Y_m$ to be the random variable giving the additional number of hops necessary to reach the $m$-th user when the first $m - 1$ users are already connected. Then we have that
\[ E[Y_m] = g_N(m) - g_N(m-1) \]
Moreover, let $Y'_m$ be the random number of additional hops necessary to reach the $m$-th multicast group member, when we discard all extra hops of the $(m-1)$-st group member. An example is illustrated in Fig. 17.2. The random variable $Y'_m$ has the same distribution as $Y_{m-1}$, because both the $(m-1)$-st and the $m$-th group member are chosen uniformly from the remaining $N - m - 1$ nodes. In general, $Y'_m \neq Y_{m-1}$, but, for each $k$, $\Pr[Y'_m = k] = \Pr[Y_{m-1} = k]$ and, hence,
\[ E[Y'_m] = E[Y_{m-1}] \tag{17.2} \]
Furthermore, we have by construction that $Y_m \le Y'_m$ with probability 1, implying that
\[ E[Y_m] \le E[Y'_m] \tag{17.3} \]
Indeed, attaching the $m$-th group member to the reduced tree takes at least as many hops as attaching that same group member to the non-reduced tree because the former is contained in the latter and the extra hops added by the $(m-1)$-st group member can only help us. Combining (17.2) and (17.3) immediately gives
\[ g_N(m) - g_N(m-1) = E[Y_m] \le E[Y'_m] = g_N(m-1) - g_N(m-2) \tag{17.4} \]
This is equivalent to the concavity of the map $m \mapsto g_N(m)$.
[Figure 17.2: multicast tree with the root and the intermediate node A marked.]
Fig. 17.2. A multicast session with $m = 5$ group members where $Y_5 = 1$ (namely link C-5). To construct $Y'_5$ the three dotted lines must be removed and we observe that $Y'_5 = 2$ (A-C-5), which is referred to as the reduced tree. In this example, $Y'_5 = Y_4 = 2$ because A-C-4 and A-C-5 both consist of 2 hops. In general, they are equal in distribution because the roles of group members 4 and 5 are identical in the reduced tree.

In order to show that $\frac{g_N(m)}{f_N(m)}$ is decreasing it suffices to show that $m \mapsto \frac{g_N(m)}{m}$ is decreasing, since $f_N(m)$ is proportional to $m$. Defining $g_N(0) = 0$, we can write $g_N(m)$ as a telescoping sum
\[ g_N(m) = \sum_{k=1}^{m} \{g_N(k) - g_N(k-1)\} = \sum_{k=1}^{m} x_k \]
where $x_k = g_N(k) - g_N(k-1)$, $k = 1, \ldots, m$. Then,
\[ \frac{g_N(m)}{m} = \frac{1}{m} \sum_{k=1}^{m} x_k \]
is the mean of a sequence of $m$ positive numbers $x_k$. By (17.4) the sequence $x_k \le x_{k-1}$ is decreasing and, hence,
\[ \frac{g_N(m)}{m} = \frac{1}{m} \sum_{k=1}^{m} x_k \le \frac{1}{m-1} \sum_{k=1}^{m-1} x_k = \frac{g_N(m-1)}{m-1} \]
This proves that $m \mapsto g_N(m)/m$ is decreasing.

Next, we will give a representation for $g_N(m)$ that is valid for all graphs. Let $X_i$ be the number of joint hops that all $i$ uniformly chosen and different group members have in common, then the following general theorem holds,
Theorem 17.1.3 For any connected graph with $N$ nodes,
\[ g_N(m) = \sum_{i=1}^{m} \binom{m}{i} (-1)^{i-1} E[X_i] \tag{17.5} \]
Note that
\[ g_N(1) = f_N(1) = E[X_1] = E[H_N] \]
so that the decrease in average hops or the gain by using multicast over unicast is precisely
\[ g_N(m) - f_N(m) = \sum_{i=2}^{m} \binom{m}{i} (-1)^{i-1} E[X_i] \]
However, computing $E[X_i]$ for general graphs is difficult.


Proof of Theorem 17.1.3: Let $A_1, A_2, \ldots, A_m$ be sets where $A_i$ consists of all links that constitute the shortest path from the source to multicast group member $i$. Denote by $|A_i|$ the number of elements in the set $A_i$. The multicast group members are chosen uniformly from the set of all nodes except for the root. Hence,
\[ E[X_1] = E[|A_i|], \qquad \text{for } 1 \le i \le N \]
and
\[ E[X_2] = E[|A_i \cap A_j|], \qquad \text{for } 1 \le i < j \le N \]
etc. Now, $g_N(m) = E[|A_1 \cup A_2 \cup \cdots \cup A_m|]$. Since $Q(A) = E[|A|] / \binom{N}{2}$ is a probability measure on the set of all links, we obtain from the inclusion-exclusion formula (2.3) applied to $Q$ and multiplied with $\binom{N}{2}$ afterwards,
\[ E[|A_1 \cup A_2 \cup \cdots \cup A_m|] = \sum_{i=1}^{m} E[|A_i|] - \sum_{i<j} E[|A_i \cap A_j|] + \cdots + (-1)^{m-1} E[|A_1 \cap A_2 \cap \cdots \cap A_m|] \]
\[ = m E[X_1] - \binom{m}{2} E[X_2] + \cdots + (-1)^{m-1} E[X_m] \]
This proves Theorem 17.1.3.

Corollary 17.1.4 For any connected graph with $N$ nodes,
\[ E[X_m] = \sum_{i=1}^{m} \binom{m}{i} (-1)^{i-1} g_N(i) \tag{17.6} \]
The corollary is a direct consequence of the inversion formula for the binomial (Riordan, 1968, Chapter 2). Alternatively, in view of the Gregory-Newton interpolation formula (Lanczos, 1988, Chapter 4, Section 2) for $g_N(m) = \sum_{i=1}^{\infty} \binom{m}{i} \Delta^i g_N(0)$, we can write $E[X_i] = (-1)^{i-1} \Delta^i g_N(0)$ where $\Delta$ is the difference operator, $\Delta f(0) = f(1) - f(0)$.
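The inversion pair (17.5) and (17.6) can be illustrated on the line topology of Theorem 17.1.1, where $g_N(m) = Nm/(m+1)$. A minimal sketch, assuming that topology and a hypothetical helper `binomial_transform`: applying the transform to $g_N$ gives $E[X_m]$, and applying it again returns $g_N$. For the line, $X_m$ is the minimum of the $m$ sampled positions, whose mean $N/(m+1)$ follows by symmetry from the mean of the maximum.

```python
from fractions import Fraction
from math import comb

def binomial_transform(seq):
    """Map (a_1, ..., a_M) to b_m = sum_{i=1}^{m} C(m, i) (-1)^(i-1) a_i."""
    return [
        sum(comb(m, i) * (-1) ** (i - 1) * seq[i - 1] for i in range(1, m + 1))
        for m in range(1, len(seq) + 1)
    ]

N = 10
g = [Fraction(N * m, m + 1) for m in range(1, N)]  # line topology gain g_N(m)
x = binomial_transform(g)                          # E[X_m] via (17.6)
g_back = binomial_transform(x)                     # g_N(m) recovered via (17.5)
print(x[:3], g_back == g)
```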
Corollary 17.1.5 For any connected graph, the multicast efficiency $g_N(m)$ is bounded by
\[ \frac{f_N(m)}{g_N(m)} \le E[H_N] \tag{17.7} \]
where $E[H_N]$ is the average number of hops in unicast.
Proof: We give two demonstrations. (a) From $g_N(N-1) = N-1$ (all nodes, source plus $N-1$ destinations, of the graph are spanned by a tree consisting of $N-1$ links) and the monotonicity of $m \mapsto \frac{g_N(m)}{f_N(m)}$ (see Theorem 17.1.2) we obtain:
\[ \frac{g_N(m)}{f_N(m)} \ge \frac{g_N(N-1)}{f_N(N-1)} = \frac{N-1}{(N-1) E[H_N]} = \frac{1}{E[H_N]} \]
(b) Alternatively, Theorem 17.1.1 indicates that $g_N(m) \ge m$, which, with the identity $f_N(m) = m E[H_N]$, immediately leads to (17.7).

Corollary 17.1.5 means that for any connected graph, including the graph describing the Internet, the ratio of the unicast over multicast efficiency is bounded by the expected hopcount in unicast. In other words, the maximum savings in resources an operator can gain by using multicast (over unicast) never exceeds $E[H_N]$, which is roughly 15 in the current Internet.
17.2 The random graph $G_p(N)$
In this section, we confine ourselves to the class RGU, the random graphs of the class $G_p(N)$ with independent identically and exponentially distributed link weights $w$ with mean $E[w] = 1$ and where $\Pr[w \le x] = 1 - e^{-x}$, $x > 0$. In Section 16.2, we have shown that the corresponding SPT is, asymptotically, a URT. The analysis below is exact for the complete graph $K_N$ while asymptotically correct for connected random graphs $G_p(N)$.

17.2.1 The hopcount of the shortest path tree
Based on properties of the URT, the complete probability density function of the number of links $H_N(m)$ in the SPT to $m$ uniformly chosen nodes can be determined. We first derive a recursion for the probability generating function $\varphi_{H_N(m)}(z) = E\left[z^{H_N(m)}\right]$ of the number of links $H_N(m)$ in the SPT to $m$ uniformly chosen nodes in the complete graph $K_N$.
Lemma 17.2.1 For $N > 1$ and all $1 \le m \le N-1$,
\[ \varphi_{H_N(m)}(z) = \frac{(N-m-1)(N-1+mz)}{(N-1)^2} \varphi_{H_{N-1}(m)}(z) + \frac{m^2 z}{(N-1)^2} \varphi_{H_{N-1}(m-1)}(z) \tag{17.8} \]

Proof: To prove (17.8), we use the recursive growth of URTs: a URT of size $N$ is a URT of size $N-1$, where we add an additional link to a uniformly chosen node.

[Figure 17.3: panels Case A, Case B, Case C and D.]
Fig. 17.3. The several possible cases in which the $N$-th node can be attached uniformly to the URT of size $N-1$. The root is dark shaded while the $m$ multicast member nodes are lightly shaded.

In order to obtain a recursion for $H_N(m)$ we distinguish between the $m$ uniformly chosen nodes all being in the URT of size $N-1$ or not. The probability that they all belong to the tree of size $N-1$ is equal to $1 - \frac{m}{N-1}$ (case A in Fig. 17.3). If they all belong to the URT of size $N-1$, then we have that $H_N(m) = H_{N-1}(m)$. Thus, we obtain
\[ \varphi_{H_N(m)}(z) = \left(1 - \frac{m}{N-1}\right) \varphi_{H_{N-1}(m)}(z) + \frac{m}{N-1} E\left[z^{1 + L_{N-1}(m)}\right] \tag{17.9} \]
where $L_{N-1}(m)$ is the number of links in the subtree of the URT of size $N-1$ spanned by $m-1$ uniform nodes and the one refers to the link from the added $N$-th node to its ancestor in the URT of size $N-1$. We complete the proof by investigating the generating function of $L_{N-1}(m)$. Again, there are two cases. In the first case (B in Fig. 17.3), the ancestor of the added $N$-th node is one of the $m-1$ previous nodes (which can only happen if it is unequal to the root), else we get one of the cases C and D in Fig. 17.3. The probability of the first event equals $\frac{m-1}{N-1}$, the probability of the latter equals $1 - \frac{m-1}{N-1}$. If the ancestor of the added $N$-th node is one of the $m-1$ previous nodes, then the number of links $L_{N-1}(m)$ equals $H_{N-1}(m-1)$, otherwise the generating function of the number of additional links equals
\[ \left(1 - \frac{1}{N-m}\right) \varphi_{H_{N-1}(m)}(z) + \frac{1}{N-m} \varphi_{H_{N-1}(m-1)}(z) \]
The first contribution comes from the case where the ancestor of the added $N$-th node is not the root, and the second from where it is equal to the root, which has probability $\frac{1}{N-1-(m-1)} = \frac{1}{N-m}$. Therefore,
\[ E\left[z^{L_{N-1}(m)}\right] = \frac{m-1}{N-1} \varphi_{H_{N-1}(m-1)}(z) + \frac{N-m}{N-1} \left( \frac{N-m-1}{N-m} \varphi_{H_{N-1}(m)}(z) + \frac{\varphi_{H_{N-1}(m-1)}(z)}{N-m} \right) \]
\[ = \frac{m}{N-1} \varphi_{H_{N-1}(m-1)}(z) + \frac{N-m-1}{N-1} \varphi_{H_{N-1}(m)}(z) \tag{17.10} \]
Substitution of (17.10) into (17.9) leads to (17.8).

Since $g_N(m) = E[H_N(m)] = \varphi'_{H_N(m)}(1)$, we obtain the recursion for $g_N(m)$,
\[ g_N(m) = \left(1 - \frac{m^2}{(N-1)^2}\right) g_{N-1}(m) + \frac{m^2}{(N-1)^2} g_{N-1}(m-1) + \frac{m}{N-1} \tag{17.11} \]
Theorem 17.2.2 For all $N \ge 1$ and $1 \le m \le N-1$,
\[ \varphi_{H_N(m)}(z) = E\left[z^{H_N(m)}\right] = \frac{m!\,(N-1-m)!}{((N-1)!)^2} \sum_{k=0}^{m} \binom{m}{k} (-1)^{m-k} \frac{\Gamma(N + kz)}{\Gamma(1 + kz)} \tag{17.12} \]
Consequently,
\[ \Pr[H_N(m) = j] = \frac{m!\,(-1)^{N-(j+1)} S_N^{(j+1)} \mathcal{S}_j^{(m)}}{(N-1)!\, \binom{N-1}{m}} \tag{17.13} \]
where $S_N^{(j+1)}$ and $\mathcal{S}_j^{(m)}$ denote the Stirling numbers of first and second kind (Abramowitz and Stegun, 1968, Section 24.1).
Proof: By iterating the recursion (17.8) for small values of $m$, the computations given in van der Hofstad et al. (2006a, Appendix) suggest the solution (17.12) for (17.8). One can verify that (17.12) satisfies (17.8). This proves (17.12) of Theorem 17.2.2. Using (Abramowitz and Stegun, 1968, Section 24.1.3.B), the Taylor expansion around $z = 0$ equals
\[ \varphi_{H_N(m)}(z) = \frac{m!\, N (N-1-m)!}{(N-1)!} \sum_{k=0}^{m} \binom{m}{k} (-1)^{m-k} \left( \frac{\Gamma(N + kz)}{N!\, \Gamma(1 + kz)} - \frac{1}{N} \right) \]
\[ = \frac{m!\, N (N-1-m)!}{(N-1)!} \sum_{k=0}^{m} \binom{m}{k} (-1)^{m-k} \sum_{j=1}^{N-1} \frac{(-1)^{N-(j+1)} S_N^{(j+1)}}{N!} k^j z^j \]
\[ = \frac{m!\, N (N-1-m)!}{(N-1)!} \sum_{j=1}^{N-1} \frac{(-1)^{N-(j+1)} S_N^{(j+1)}}{N!} \left( \sum_{k=0}^{m} \binom{m}{k} (-1)^{m-k} k^j \right) z^j \]
Using the definition of Stirling numbers of the second kind (Abramowitz and Stegun, 1968, 24.1.4.C),
\[ m!\, \mathcal{S}_j^{(m)} = \sum_{k=0}^{m} \binom{m}{k} (-1)^{m-k} k^j \]
for which $\mathcal{S}_j^{(m)} = 0$ if $j < m$, gives
\[ \varphi_{H_N(m)}(z) = \frac{(m!)^2 (N-1-m)!}{((N-1)!)^2} \sum_{j=1}^{N-1} (-1)^{N-(j+1)} S_N^{(j+1)} \mathcal{S}_j^{(m)} z^j \]
This proves (17.13) and completes the proof of Theorem 17.2.2.
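The probabilities (17.13) are straightforward to evaluate exactly. The sketch below builds both kinds of Stirling numbers from their standard recurrences (signed numbers of the first kind, as in Abramowitz and Stegun) and checks that the probabilities sum to one and reproduce the mean stated in Corollary 17.2.3. The function names and the test sizes are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def stirling_first(n_max):
    """Signed Stirling numbers of the first kind, s[n][k]."""
    s = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    s[0][0] = 1
    for n in range(n_max):
        for k in range(1, n + 2):
            s[n + 1][k] = s[n][k - 1] - n * s[n][k]
    return s

def stirling_second(n_max):
    """Stirling numbers of the second kind, t[n][k]."""
    t = [[0] * (n_max + 1) for _ in range(n_max + 1)]
    t[0][0] = 1
    for n in range(n_max):
        for k in range(1, n + 2):
            t[n + 1][k] = k * t[n][k] + t[n][k - 1]
    return t

def hopcount_pmf(n_nodes, m):
    """Pr[H_N(m) = j] for j = 1, ..., N-1 according to (17.13)."""
    s1, s2 = stirling_first(n_nodes), stirling_second(n_nodes)
    denom = factorial(n_nodes - 1) * comb(n_nodes - 1, m)
    return {
        j: Fraction(factorial(m) * (-1) ** (n_nodes - j - 1)
                    * s1[n_nodes][j + 1] * s2[j][m], denom)
        for j in range(1, n_nodes)
    }

pmf = hopcount_pmf(10, 4)
mean = sum(j * p for j, p in pmf.items())
closed_form = Fraction(4 * 10, 10 - 4) * sum(Fraction(1, k) for k in range(5, 11))
print(sum(pmf.values()), mean == closed_form)
```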

Figure 17.4 plots the probability density function of $H_{50}(m)$ for different values of $m$.
Corollary 17.2.3 For all $N \ge 1$ and $1 \le m \le N-1$,
\[ g_N(m) = E[H_N(m)] = \frac{mN}{N-m} \sum_{k=m+1}^{N} \frac{1}{k} \tag{17.14} \]
and
\[ \mathrm{Var}[H_N(m)] = \frac{N-1+m}{N+1-m}\, g_N(m) - \frac{g_N^2(m)}{N+1-m} - \frac{m^2 N^2 \sum_{k=m+1}^{N} \frac{1}{k^2}}{(N-m)(N+1-m)} \tag{17.15} \]
The formula (17.14) is proved in two different ways. The earlier proof presented in Section 17.6 below does not rely on the recursion in Lemma 17.2.1 nor on Theorem 17.2.2. The shorter proof is presented here. Formula (17.14) can be expressed in terms of the digamma function $\psi(x)$ as
\[ g_N(m) = mN\, \frac{\psi(N) - \psi(m)}{N-m} - 1 \tag{17.16} \]
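The recursion (17.11) and the closed form (17.14) can be compared with exact rational arithmetic. In the sketch below, the boundary value $g_N(0) = 0$ is used, and for $m = N-1$ the term $g_{N-1}(m)$ is multiplied by zero, so any placeholder value works there. The function names and the range of $N$ are assumptions for this example.

```python
from fractions import Fraction

def gain_closed_form(n_nodes, m):
    """g_N(m) according to (17.14)."""
    harmonic_part = sum(Fraction(1, k) for k in range(m + 1, n_nodes + 1))
    return Fraction(m * n_nodes, n_nodes - m) * harmonic_part

def gain_by_recursion(n_max):
    """Table g[N][m] built from (17.11), starting from g_N(0) = 0."""
    g = {1: {0: Fraction(0)}}
    for n in range(2, n_max + 1):
        g[n] = {0: Fraction(0)}
        for m in range(1, n):
            q = Fraction(m * m, (n - 1) ** 2)
            prev_same = g[n - 1].get(m, Fraction(0))  # weight 1 - q is 0 when m = n - 1
            g[n][m] = (1 - q) * prev_same + q * g[n - 1][m - 1] + Fraction(m, n - 1)
    return g

table = gain_by_recursion(12)
agree = all(
    table[n][m] == gain_closed_form(n, m)
    for n in range(2, 13) for m in range(1, n)
)
print(agree, table[4][1])
```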

[Figure 17.4: $\Pr[H_{50}(m) = j]$ versus $j$ hops.]
Fig. 17.4. The pdf of $H_{50}(m)$ for $m$ = 1, 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 47.
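Proof of Corollary 17.2.3: The expectation and variance of $H_N(m)$ will not be obtained using the explicit probabilities (17.13), but by rewriting (17.12) as
\[ \varphi_{H_N(m)}(z) = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)} \sum_{k=0}^{m} \binom{m}{k} (-1)^{m-k}\, \partial_t^{N-1} \left[ t^{N-1+kz} \right]_{t=1} = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)} (-1)^m\, \partial_t^{N-1} \left[ t^{N-1} (1 - t^z)^m \right]_{t=1} \tag{17.17} \]
Indeed,
\[ E[H_N(m)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)} (-1)^m\, \partial_z \partial_t^{N-1} \left[ t^{N-1} (1 - t^z)^m \right]_{t=z=1} = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\, m (-1)^{m-1}\, \partial_t^{N-1} \left[ t^N \log t\, (1-t)^{m-1} \right]_{t=1}, \]
\[ E[H_N(m)(H_N(m) - 1)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)} (-1)^m\, \partial_z^2 \partial_t^{N-1} \left[ t^{N-1} (1 - t^z)^m \right]_{t=z=1} \]
\[ = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)}\, m (-1)^{m-1}\, \partial_t^{N-1} \left[ t^N \log^2 t\, (1-t)^{m-2} \left[ -(m-1)t + (1-t) \right] \right]_{t=1} \]
We will start with the former. Using $\partial_t^i (1-t)^j \big|_{t=1} = j!\,(-1)^j \delta_{i,j}$ and Leibniz' rule, we find
\[ E[H_N(m)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)} \binom{N-1}{m-1} m!\; \partial_t^{N-m} \left[ t^N \log t \right]_{t=1} \]
Since
\[ \partial_t^k \left[ t^n \log t \right]_{t=1} = \frac{n!}{(n-k)!} \sum_{j=n-k+1}^{n} \frac{1}{j} \]
we obtain expression (17.14) for $E[H_N(m)]$.
We now extend the above computation to $E[H_N(m)(H_N(m) - 1)]$ that we write as
\[ E[H_N(m)(H_N(m) - 1)] = \frac{\Gamma(m+1)\Gamma(N-m)}{\Gamma^2(N)} (R_1 + R_2) \tag{17.18} \]
where
\[ R_1 = m(m-1)(-1)^{m-2}\, \partial_t^{N-1} \left[ t^{N+1} \log^2 t\, (1-t)^{m-2} \right]_{t=1} \]
\[ R_2 = m(-1)^{m-1}\, \partial_t^{N-1} \left[ t^N \log^2 t\, (1-t)^{m-1} \right]_{t=1} \]
Using
\[ \partial_t^k \left[ t^n \log^2 t \right]_{t=1} = 2\, \frac{n!}{(n-k)!} \sum_{i=n-k+1}^{n} \sum_{j=i+1}^{n} \frac{1}{ij} = \frac{n!}{(n-k)!} \left[ \left( \sum_{i=n-k+1}^{n} \frac{1}{i} \right)^2 - \sum_{i=n-k+1}^{n} \frac{1}{i^2} \right] \]
we obtain,
\[ R_1 = \binom{N-1}{m-2} m(m-1)(m-2)!\; \partial_t^{N-m+1} \left[ t^{N+1} \log^2 t \right]_{t=1} = (N+1)! \binom{N-1}{m-2} \left[ \left( \sum_{k=m+1}^{N+1} \frac{1}{k} \right)^2 - \sum_{k=m+1}^{N+1} \frac{1}{k^2} \right] \]
Similarly,
\[ R_2 = \binom{N-1}{m-1} m(m-1)!\; \partial_t^{N-m} \left[ t^N \log^2 t \right]_{t=1} = N! \binom{N-1}{m-1} \left[ \left( \sum_{k=m+1}^{N} \frac{1}{k} \right)^2 - \sum_{k=m+1}^{N} \frac{1}{k^2} \right] \]
Substitution into (17.18) leads to
\[ E[H_N(m)(H_N(m) - 1)] = \frac{m^2 N^2}{(N+1-m)(N-m)} \left[ \left( \sum_{k=m+1}^{N} \frac{1}{k} \right)^2 - \sum_{k=m+1}^{N} \frac{1}{k^2} \right] + \frac{2m(m-1)N}{(N+1-m)(N-m)} \sum_{k=m+1}^{N} \frac{1}{k} \]
From $g_N(m) = E[H_N(m)]$ and $\mathrm{Var}[H_N(m)] = E[H_N(m)(H_N(m) - 1)] + g_N(m) - g_N^2(m)$, we obtain (17.15). This completes the proof of Corollary 17.2.3.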

For $N = 1000$, Fig. 17.5 illustrates the typical behavior for large $N$ of the expectation $g_N(m)$ and the standard deviation $\sigma_N(m) = \sqrt{\mathrm{Var}[H_N(m)]}$ for all values of $m$. For any spanning tree, the number of links $H_N(N-1)$ is precisely $N-1$, so that $\mathrm{Var}[H_N(N-1)] = 0$.
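A simulation in the spirit of Problem (i) of Section 16.9 can be used to check (17.14). The sketch below relies on the statement of Section 17.2 that the SPT in the complete graph with exponential link weights is a URT: it grows a URT by attaching each new node to a uniformly chosen existing node, picks $m$ distinct non-root members uniformly and counts the links in the union of their paths to the root. The seed, $N = 50$, $m = 10$ and the number of trials are arbitrary choices for this example.

```python
import random

def simulate_hopcount(n_nodes, m, rng):
    """Number of links in the subtree of a URT spanned by the root and m members."""
    parent = [0] * n_nodes                 # node 0 is the root
    for v in range(1, n_nodes):
        parent[v] = rng.randrange(v)       # attach to a uniformly chosen earlier node
    members = rng.sample(range(1, n_nodes), m)
    used = set()                           # each link is identified by its child node
    for v in members:
        while v != 0 and v not in used:    # walk up until the root or a known link
            used.add(v)
            v = parent[v]
    return len(used)

rng = random.Random(2024)
n_nodes, m, trials = 50, 10, 4000
estimate = sum(simulate_hopcount(n_nodes, m, rng) for _ in range(trials)) / trials
exact = m * n_nodes / (n_nodes - m) * sum(1.0 / k for k in range(m + 1, n_nodes + 1))
print(estimate, exact)
```

With a few thousand trials the sample mean should agree with $g_N(m)$ to within a few tenths of a hop, consistent with the small standard deviation discussed in this section.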

[Figure 17.5: average hopcount $g_{1000}(m)$ (left axis) and standard deviation $\sigma_{1000}(m)$ (right axis) versus $m$.]
Fig. 17.5. The average number of hops $g_N(m)$ (left axis) in the SPT and the corresponding standard deviation $\sigma_N(m)$ (right axis) as a function of the number $m$ of multicast group members in the complete graph with $N = 1000$.

Figure 17.5 also indicates that the standard deviation $\sigma_N(m)$ of $H_N(m)$ is much smaller than the average, even for $N = 1000$. In fact, we obtain from (17.15) that
\[ \mathrm{Var}[H_N(m)] \le \frac{N-1+m}{N+1-m}\, g_N(m) \le \frac{2N g_N(m)}{N-m} = \frac{2 g_N^2(m)}{m \sum_{k=m+1}^{N} \frac{1}{k}} = o\left(g_N^2(m)\right) \]
This bound implies that with probability converging to 1 for every $m = 1, \ldots, N-1$,
\[ \left| \frac{H_N(m)}{g_N(m)} - 1 \right| \le \varepsilon \]
In van der Hofstad et al. (2006a), the scaled random variable $\frac{H_N(m) - g_N(m)}{\sqrt{g_N(m)}}$ is proved to tend to a Gaussian random variable, i.e. $\frac{H_N(m) - g_N(m)}{\sqrt{g_N(m)}} \xrightarrow{d} N(0, 1)$, for all $m = o(\sqrt{N})$. For large graphs of the size of the Internet and larger, this observation implies that the mean $g_N(m) = E[H_N(m)]$ is a good approximation for the random variable $H_N(m)$ itself because the variations of $H_N(m)$ around the mean are small. Consequently, it underlines the importance of $g_N(m)$ as a significant measure for multicast.


17.2.2 The weight of the shortest path tree


In this section, we summarize results on the weight $W_N(m)$ of the SPT and omit derivations, but refer to van der Hofstad et al. (2006a). For all $1 \le m \le N-1$, the average weight of the SPT is
\[
E[W_N(m)] = \sum_{j=1}^{m}\frac{1}{N-j}\sum_{k=j}^{N-1}\frac{1}{k} \tag{17.19}
\]
In particular, if the shortest path tree spans the whole graph, then for all $N \ge 2$,
\[
E[W_N(N-1)] = \sum_{n=1}^{N-1}\frac{1}{n^{2}} \tag{17.20}
\]
from which $E[W_N(N-1)] < \zeta(2) = \frac{\pi^{2}}{6}$ for any finite $N$. The variance is
\[
\mathrm{Var}[W_N(N-1)] = \frac{4}{N}\sum_{k=1}^{N-1}\frac{1}{k^{3}} + 4\sum_{j=1}^{N-1}\frac{1}{j^{3}}\sum_{k=1}^{j}\frac{1}{k} - 5\sum_{j=1}^{N-1}\frac{1}{j^{4}} \tag{17.21}
\]
or asymptotically, for large $N$,
\[
\mathrm{Var}[W_N(N-1)] = \frac{4\zeta(3)}{N} + O\!\left(\frac{\log N}{N^{2}}\right) \tag{17.22}
\]
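The three expressions (17.19) to (17.21) are easy to evaluate exactly for small $N$. The sketch below is only an illustration under the assumption that (17.21) has the form reconstructed above; the function names are hypothetical.

```python
from fractions import Fraction


def mean_weight(N, m):
    """E[W_N(m)] from (17.19): sum_{j=1}^{m} 1/(N-j) * sum_{k=j}^{N-1} 1/k."""
    return sum(
        Fraction(1, N - j) * sum(Fraction(1, k) for k in range(j, N))
        for j in range(1, m + 1)
    )


def var_weight_spanning(N):
    """Var[W_N(N-1)] from (17.21), accumulated in a single pass over j."""
    harmonic = Fraction(0)   # running H_j = sum_{k<=j} 1/k
    s3 = Fraction(0)         # sum 1/j^3
    mixed = Fraction(0)      # sum H_j / j^3
    s4 = Fraction(0)         # sum 1/j^4
    for j in range(1, N):
        harmonic += Fraction(1, j)
        s3 += Fraction(1, j**3)
        mixed += harmonic * Fraction(1, j**3)
        s4 += Fraction(1, j**4)
    return Fraction(4, N) * s3 + 4 * mixed - 5 * s4
```

For $N = 2$ the tree is a single link with exponential weight of mean 1, so both the mean and the variance should equal 1; for $m = N-1$ the double sum (17.19) should collapse to (17.20).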

Asymptotically for large $N$, the average weight of a shortest path tree is $\zeta(2) = 1.645\ldots$, while the average weight of the minimum spanning tree, given by (16.49), is $\zeta(3) < \zeta(2)$. This result has an interesting implication. The Steiner tree is the minimum weight tree that connects a set of $m$ members out of $N$ nodes in the graph. The Steiner tree problem is NP-complete, which means that it is infeasible to compute for large $N$. If $m = 2$, the weight of the Steiner tree equals that of the shortest path, $W_{\mathrm{Steiner},N}(2) = W_N$, while for $m = N$, we have $W_{\mathrm{Steiner},N}(N) = W_{\mathrm{MST}}$. Hence, for any $m < N$ and $N$, $E[W_{\mathrm{Steiner},N}(m)] \le \zeta(3)$ because the weight of the Steiner tree does not decrease if the number of members $m$ increases. The ratio $\frac{\zeta(2)}{\zeta(3)} = 1.368$ indicates that the use of the SPT (computationally easy) never performs on average more than 37% worse than the optimal Steiner tree (computationally infeasible). In a broader context and referring to the concept of "the Price of Anarchy", which is broadly explained in Robinson (2004), the SPT used in a communications network is related to the Nash equilibrium, while the Steiner tree gives the hardly achievable global optimum.
Simulations even for small $N$, which allow us to cover the entire $m$-range as illustrated in Fig. 17.6, indicate that the normalized random variable $W_N^{*}(m) = \frac{W_N(m) - E[W_N(m)]}{\sqrt{\mathrm{Var}[W_N(m)]}}$ lies between a normalized Gaussian $N(0,1)$ and a normalized Gumbel (see Theorem 6.4.1). Fig. 17.6 may suggest that, for all $m < N$,
\[
\lim_{N\to\infty}\Pr\left[W_N^{*}(m) \le x\right] \approx e^{-e^{-\frac{\pi x}{\sqrt{6}} - \gamma}} \tag{17.23}
\]
where $\gamma = 0.57721\ldots$ is Euler's constant. For the particular case of $m = 1$, the relation to the Gumbel distribution has been shown in Section 16.4 where the correct limit law is given in (16.20). However, van der Hofstad et al. (2006b) show that the weight of the shortest path tree for $m = N-1$ tends to a Gaussian,
\[
\sqrt{N}\left(W_N(N-1) - \zeta(2)\right) \xrightarrow{d} N\!\left(0, \sigma_{\mathrm{SPT}}^{2}\right)
\]
with $\sigma_{\mathrm{SPT}}^{2} = 4\zeta(3) \approx 4.80823$ as follows from (17.22). This shows that simulations alone may be inadequate to deduce asymptotic behavior. Finally, Janson (1995) gave the related result for the minimum spanning tree. He extended Frieze's result (16.49) by proving that the scaled weight of the minimum spanning tree also tends to a Gaussian for large $N$,
\[
\sqrt{N}\left(W_{\mathrm{MST}} - \zeta(3)\right) \xrightarrow{d} N\!\left(0, \sigma_{\mathrm{MST}}^{2}\right)
\]
where$^{2}$ $\sigma_{\mathrm{MST}}^{2} = 6\zeta(4) - 4\zeta(3) \approx 1.6857$.

[Figure 17.6: pdf $f_{W_N^{*}(m)}(x)$ on a logarithmic axis for $m = 1, 5, 10, 20, \ldots, 90, 95$ and $N = 100$, together with the normalized Gumbel and the normalized Gaussian $N(0,1)$.]

Fig. 17.6. The pdf of the normalized random variable $W_N^{*}(m)$ for $N = 100$.

[Figure 17.7: two trees, labelled $k = 2$ and $k = 5$.]

Fig. 17.7. The left hand side tree ($k = 2$) has $N = 31$ and $D = 4$, while the right hand side ($k = 5$) has $N = 31$ and $D = 2$.

17.3 The k-ary tree


In this section, we consider the $k$-ary tree of depth$^{3}$ $D$ with the source at the root of the tree and $m$ receivers at randomly chosen nodes (see Fig. 17.7). In a $k$-ary tree the total number of nodes satisfies
\[
N = 1 + k + k^{2} + \cdots + k^{D} = \frac{k^{D+1} - 1}{k - 1} \tag{17.24}
\]
Theorem 17.3.1 For the $k$-ary tree,
\[
g_{N,k}(m) = N - 1 - \sum_{j=0}^{D-1} k^{D-j}\,\frac{\binom{N-1-\frac{k^{j+1}-1}{k-1}}{m}}{\binom{N-1}{m}} \tag{17.25}
\]
Proof: See Section 17.7.

$^{2}$ Wästlund (2005) succeeded in computing the triple sum in Janson's original result
\[
\sigma_{\mathrm{MST}}^{2} = \frac{\pi^{4}}{45} - 2\sum_{i=0}^{\infty}\sum_{j=1}^{\infty}\sum_{k=1}^{\infty}\frac{(i+k-1)!\,k^{k}\,(i+j)^{i-2}\,j}{i!\,k!\,(i+j+k)^{i+k+2}}
\]
$^{3}$ The depth $D$ is equal to the number of hops from the root to a node at the leaves.

[Figure 17.8: $g_N(m)$ versus $m$ (both from 0 to 10000) for $k$-ary trees with $k = 2, 3, 5, 10$, the random graph and the $m^{0.8}$ law, $N = 10^{4}$.]

Fig. 17.8. The multicast gain $g_N(m)$ computed for the $k$-ary tree with four values of $k$, the random graph (with effective $k_{\mathrm{rg}} = e = 2.718\ldots$), and the Chuang-Sirbu power law for $N = 10^{4}$ on a linear scale where the prefactor $E[H_N]$ is given by (16.10).

Unfortunately, the $j$-summation seems difficult to express in closed form. Observe that $g_{N,k}(N-1) = N-1$, because all binomials vanish. The sum extends over all levels $j \le D-1$ for which the remaining number of nodes in the lower levels $l$ (i.e. $D - l > j$) is larger than $m$ nodes. In some sense, we may regard (17.25) as an (exact) expansion around $m = N-1$. Explicitly,

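\begin{align*}
g_{N,k}(m) &= N - 1 - k^{D}\left(1 - \frac{m}{N-1}\right) - k^{D-1}\prod_{q=0}^{k}\left(1 - \frac{m}{N-1-q}\right) \\
&\quad - \sum_{j=2}^{D-1} k^{D-j}\prod_{q=0}^{\frac{k^{j+1}-1}{k-1}-1}\left(1 - \frac{m}{N-1-q}\right) \tag{17.26}
\end{align*}
which shows that $g_{N,k}(m)$ is a polynomial in $m$ of degree $\frac{N-1}{k}$. Moreover, the terms in the $j$-sum rapidly decrease; their ratio equals
\[
\frac{1}{k}\prod_{q=\frac{k^{j}-1}{k-1}}^{\frac{k^{j+1}-1}{k-1}-1}\left(1 - \frac{m}{N-1-q}\right) < \frac{1}{k}\left(1 - \frac{m}{N-1}\right)^{k^{j}} \ll 1
\]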

Figure 17.8 indicates that formula (17.25), although derived subject to (17.24), also seems valid when $D = \left\lfloor \frac{\log[1+N(k-1)]}{\log k} - 1 \right\rfloor$, where $\lfloor x \rfloor$ is the largest integer smaller than or equal to $x$. This suggests that the deepest level $D$ need not be filled completely to count $k^{D}$ nodes and that (17.25) may extend to "incomplete" $k$-ary trees. As further observed from Fig. 17.8, $g_{N,k}(m)$ is monotonically decreasing in $k$. Hence, it is quite likely that the map $k \mapsto g_{N,k}(m)$ is decreasing in $k \in [1, N-1]$. Intuitively, this conjecture can be understood from Fig. 17.7. Both the $k = 2$ and $k = 5$ trees have an equal number of nodes. We observe that the deeper $D$ (or the smaller $k$), the more overlap is possible, hence, the larger $g_{N,k}(m)$.
Theorem 17.1.1 can also be deduced from (17.25). The lower bound is attained in a star topology where $k = N-1$, $D = 1$ and $E[H_N] = 1$. The upper bound is attained in a line topology where $k = 1$, $D = N-1$ and $E[H_N] = \frac{N}{2}$. Furthermore, for real values of $k \in [1, N-1]$, the set of curves specified by (17.25) covers the total allowable space of $g_{N,k}(m)$, as shown in Fig. 17.1. This suggests considering (17.25) for estimating $k$ in real topologies.
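Since $g_N(1) = E[H_N]$, the average hopcount in a $k$-ary tree follows from (17.25) as
\begin{align*}
E[H_N] &= N - 1 - \sum_{j=0}^{D-1} k^{D-j}\,\frac{N - 1 - \frac{k^{j+1}-1}{k-1}}{N-1} = \frac{1}{N-1}\sum_{j=0}^{D-1} k^{D-j}\,\frac{k^{j+1}-1}{k-1} \\
&= \frac{ND}{N-1} + \frac{D}{(N-1)(k-1)} - \frac{1}{k-1} \tag{17.27}
\end{align*}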
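Formula (17.25) is a finite sum of binomial ratios and can be evaluated directly. The following sketch is an illustration only (the function names are hypothetical and $k \ge 2$ is assumed so that (17.24) applies); it also evaluates the closed form (17.27) for comparison at $m = 1$.

```python
from fractions import Fraction
from math import comb


def kary_nodes(k, D):
    """Total number of nodes N = 1 + k + ... + k^D of a complete k-ary tree, (17.24)."""
    return sum(k**j for j in range(D + 1))


def g_kary(k, D, m):
    """Multicast gain g_{N,k}(m) from (17.25), for k >= 2 and 1 <= m <= N-1."""
    N = kary_nodes(k, D)
    total = Fraction(0)
    for j in range(D):
        A_j = (k**(j + 1) - 1) // (k - 1)          # nodes in a subtree of depth j
        total += k**(D - j) * Fraction(comb(N - 1 - A_j, m), comb(N - 1, m))
    return N - 1 - total


def mean_hopcount_kary(k, D):
    """E[H_N] from the closed form (17.27)."""
    N = kary_nodes(k, D)
    return Fraction(N * D, N - 1) + Fraction(D, (N - 1) * (k - 1)) - Fraction(1, k - 1)
```

For the two trees of Fig. 17.7 ($N = 31$), `g_kary(2, 4, m)` can be compared with `g_kary(5, 2, m)` to see the effect of the depth on the overlap of paths.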

For large $N$, we find with
\[
D = \frac{\log[1 + N(k-1)]}{\log k} - 1 = \log_k N + \log_k(1 - 1/k) + O(1/N)
\]
that
\[
E[H_N] = \log_k N + \log_k(1 - 1/k) - \frac{1}{k-1} + O\!\left(\frac{\log_k N}{N}\right) \tag{17.28}
\]
Comparing (17.28) with the average hopcount in the random graph (16.10) shows equality to first order if $k_{\mathrm{rg}} = e$. Moreover, both the second order terms $\gamma - 1 = -0.42$ and $\log(1 - 1/e) - \frac{1}{e-1} = -1.04$ are $O(1)$ and independent of $N$. This shows that the multicast gain in the random graph is well approximated by $g_{N,e}(m)$.


17.4 The Chuang-Sirbu law


We discuss the empirical Chuang-Sirbu scaling law for the Internet. Based on Internet measurements, Chuang and Sirbu (1998) observed that $g_N(m) \approx E[H_N]\, m^{0.8}$. Subsequently, Phillips et al. (1999) dubbed this observation the "Chuang-Sirbu law".
Corollary 17.1.5 implies that the empirical law of Chuang-Sirbu cannot hold true for all $m \le N$. Indeed, if $g_N(m) = E[H_N]\, m^{0.8}$, we obtain from the inequality (17.7) and the identity $f_N(m) = m E[H_N]$, that $m^{0.2} \le E[H_N]$. Write $m = xN$ for a fixed $0 < x < 1$ and $x$ independent of $N$. The inequality then requires $x^{0.2} N^{0.2} \le E[H_N]$, which fails for sufficiently large $N$ whenever $E[H_N]/N^{0.2} \to 0$. Hence, we have shown that

Corollary 17.4.1 For all graphs satisfying the condition that $\frac{E[H_N]}{N^{0.2}} \to 0$ for large $N$, the empirical Chuang-Sirbu law does not hold in the region $m = xN$ with $0 < x \le 1$ and sufficiently large $N$.

The most realistic graph models for the Internet assume that $E[H_N] \approx c \log N$, since this implies that the number of routers that can be reached from any starting destination grows exponentially with the number of hops. For these realistic graphs, Corollary 17.4.1 states that the empirical Chuang-Sirbu law does not hold for all $m$. On the other hand, there are more regular graphs (such as a $d$-lattice, where $E[H_N] \simeq \frac{d}{3} N^{1/d}$) with $E[H_N] \ge N^{0.2+\epsilon}$ (and $\epsilon > 0$) for which the mathematical condition $m^{0.2} \le E[H_N]$ is satisfied for all $m$ and $N$. As shown in Van Mieghem et al. (2000), however, these classes of graphs, in contrast to random graphs, do not lead to good models for SPTs in the Internet.

17.4.1 Validity range of the Chuang-Sirbu law


For the random graph $G_p(N)$, the SPT is very close to a URT for large $N$ and with (16.10), we obtain
\[
f_N(m) \approx m(\log N + \gamma - 1)
\]
From the exact $g_N(m)$ formula (17.16) for the random graph $G_p(N)$, the asymptotic for large $N$ and $m$ follows as
\[
g_N(m) \sim \frac{mN}{N-m}\log\frac{N}{m} - \frac{1}{2} \tag{17.29}
\]
The above scaling explains the empirical Chuang-Sirbu law for $G_p(N)$: for $m$ small with respect to $N$, the graphs of $(\log N + \gamma - 1)\, m^{0.8}$ and $\frac{mN}{N-m}\log\frac{N}{m} - \frac{1}{2}$ look very alike in a log-log plot, as illustrated in Fig. 17.9.
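The quality of the asymptotic form (17.29) can be inspected numerically. The sketch below assumes that the exact formula (17.16) is equivalent to $g_N(m) = \frac{mN}{N-m}\sum_{k=m+1}^{N} 1/k$ (which follows from $\psi(N) - \psi(m) = \sum_{k=m}^{N-1} 1/k$); the function names and the choice of taking the Chuang-Sirbu prefactor as the exact $g_N(1)$ are illustrative assumptions.

```python
import math


def g_exact(N, m):
    """Exact multicast gain in the URT: mN/(N-m) * sum_{k=m+1}^{N} 1/k."""
    return m * N / (N - m) * sum(1.0 / k for k in range(m + 1, N + 1))


def g_asymptotic(N, m):
    """Large-N approximation (17.29): mN/(N-m) * log(N/m) - 1/2."""
    return m * N / (N - m) * math.log(N / m) - 0.5


def chuang_sirbu(N, m):
    """Empirical power law E[H_N] * m^0.8, with E[H_N] taken as g_exact(N, 1)."""
    return g_exact(N, 1) * m**0.8


if __name__ == "__main__":
    N = 10**4
    for m in (10, 100, 1000, 5000):
        print(m, g_exact(N, m), g_asymptotic(N, m), chuang_sirbu(N, m))
```

Printing the three columns for a few group sizes shows where the power law starts to depart from the exact curve.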


Using the asymptotic properties of the digamma function $\psi$, we obtain (17.29) as an excellent approximation for large $N$ (and all $m$) or, in normalized form with $m = xN$ and $0 < x < 1$,
\[
\frac{g_N(xN) + 0.5}{N} \sim \frac{x\log x}{x - 1} \tag{17.30}
\]
The normalized Chuang-Sirbu law is $\frac{g_N(xN)}{N} = \frac{E[H_N]}{N^{0.2}}\, x^{0.8}$. It is interesting to note that the Chuang-Sirbu law is "best" if $\frac{E[H_N]}{N^{0.2}} = 1$, since then both endpoints $x = 0$ and $x = 1$ coincide with (17.30). This optimum is achieved when $N \approx 250\,000$, which is of the order of magnitude of the estimated number of routers in the current Internet. This observation may explain the fairly good correspondence on a less sensitive log-log scale with Internet measurements. At the same time, it shows that for a growing Internet, the fit of the Chuang-Sirbu law will deteriorate. For $N \ge 10^{6}$, the Chuang-Sirbu law underestimates $g_N(m)$ for all $m$.
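The value $N \approx 250\,000$ can be reproduced with a few lines of code. The sketch below assumes the large-$N$ approximation $E[H_N] \approx \log N + \gamma - 1$ used above and solves $E[H_N] = N^{0.2}$ by bisection; the bracket $[10^{4}, 10^{7}]$ and the function names are choices made for this illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329


def mismatch(N):
    """E[H_N] - N^0.2 with the approximation E[H_N] ~ log N + gamma - 1."""
    return math.log(N) + EULER_GAMMA - 1.0 - N**0.2


def crossover(lo=1e4, hi=1e7, tol=1.0):
    """Bisection for the N at which E[H_N] / N^0.2 = 1.

    Assumes mismatch(lo) > 0 > mismatch(hi), which holds for the default bracket.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mismatch(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(round(crossover()))
```

Beyond the crossover, $N^{0.2}$ grows faster than $\log N$, so the normalized prefactor $E[H_N]/N^{0.2}$ drops below 1, consistent with the deterioration described above.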
[Figure 17.9: log-log plot of $g_N(m)$ versus $m$ for the random graph and the $m^{0.8}$ law; insert: effective power exponent (0.75 to 1.00) versus the number of nodes $N$.]

Fig. 17.9. The multicast efficiency for $N = 10^{j}$ with $j = 3, 4, \ldots, 7$. The endpoint of each curve $g_N(N-1) = N-1$ determines $N$. The insert shows the effective power exponent $\beta(N)$ versus $N$.

17.4.2 The effective power exponent $\beta(N)$


For small to moderate values of $m$, $g_N(m)$ is very close to a straight line in a log-log plot. This power law behavior implies that $\log g_N(m) \approx \log E[H_N] + \beta(N)\log m$, which is a first order Taylor expansion of $\log g_N(m)$ in $\log m$. This observation suggests the computation$^{4}$ of the effective power exponent $\beta(N)$ as
\[
\beta(N) = \left.\frac{d\log g_N(m)}{d\log m}\right|_{m=1} \tag{17.31}
\]
Only for a straight line, the differential operator can be replaced by the difference operator such that $\beta(N) \approx \beta^{*}(N)$, where
\[
\beta^{*}(N) = \frac{\log\frac{g_N(2)}{E[H_N]}}{\log 2} \tag{17.32}
\]
In general, for small $m$, the effective power exponent (17.31) is not a constant 0.8 as in the Chuang-Sirbu law, but dependent on $N$. Since $g_N(m)$ is concave by Theorem 17.1.2, $\beta(N)$ is the maximum possible value for $\frac{d\log g_N(m)}{d\log m}$ at any $m \ge 1$. A direct consequence of Theorem 17.1.1 is that the effective power exponent $\beta(N) \in [\frac{1}{2}, 1]$. From recent Internet measurements, Chalmers and Almeroth (2001) found that $0.66 \le \beta(N) \le 0.7$.
$^{4}$ Although (17.5) only has meaning for integer $m$, analytic continuation to a complex variable is possible and, hence, differentiation can be defined.

The effective power exponent $\beta(N)$ as defined in (17.31) for the random graph is
\[
\beta(N) = \frac{N\left(\psi(N) + \gamma - \frac{\pi^{2}}{6} + \frac{\pi^{2}}{6N}\right)}{(N-1)\left(\psi(N) + (\gamma - 1) + \frac{1}{N}\right)}
\]
while, according to the definition (17.32),
\[
\beta^{*}(N) = \frac{\log\frac{g_N(2)}{E[H_N]}}{\log 2} = 1 + \log_2\frac{(N-1)\left(\psi(N) + \gamma - 3/2 + 1/N\right)}{(N-2)\left(\psi(N) + \gamma - 1 + 1/N\right)}
\]
The difference $\beta(N) - \beta^{*}(N)$ monotonically decreases and is largest, 0.048, at $N = 3$, while it is 0.0083 at $N = 10^{5}$ and 0.0037 at $N = 10^{10}$. This effective power exponent $\beta(N)$ is drawn in the insert of Fig. 17.9, which shows that $\beta(N)$ is increasing and not a constant close to 0.8. More interestingly, for large $N$, we find with (16.10) and (16.11) that $\beta(N) \sim \frac{\mathrm{Var}[H_N]}{E[H_N]}$ and that $\lim_{N\to\infty}\beta(N) = 1$. In Van Mieghem et al. (2000), the ratio $\alpha = \frac{\mathrm{Var}[H_N]}{E[H_N]}$ pops up naturally as the extreme value index of the distribution of the link weights in a topology. Since measurements of the hopcount in the Internet indicate that $\frac{\mathrm{Var}[H_N]}{E[H_N]} \approx 1$, which corresponds to a regular distribution, this extreme value index strongly favors the model of the hopcount based on shortest paths in $G_p(N)$, although random graphs do not model the Internet topology well.
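The two closed forms for the random graph, $\beta(N)$ from (17.31) and $\beta^{*}(N)$ from (17.32), can be evaluated for integer $N \ge 3$ with the identity $\psi(N) + \gamma = \sum_{k=1}^{N-1} 1/k$. The following is a minimal sketch with illustrative function names; the direct summation is an assumption of convenience and limits it to moderate $N$.

```python
import math


def psi_plus_gamma(N):
    """psi(N) + gamma = H_{N-1} for integer N >= 1 (direct summation)."""
    return sum(1.0 / k for k in range(1, N))


def beta(N):
    """Effective power exponent (17.31) for the random graph."""
    h = psi_plus_gamma(N)
    z2 = math.pi**2 / 6
    return N * (h - z2 + z2 / N) / ((N - 1) * (h - 1 + 1 / N))


def beta_star(N):
    """Difference-based exponent (17.32) for the random graph, N >= 3."""
    h = psi_plus_gamma(N)
    ratio = (N - 1) * (h - 1.5 + 1 / N) / ((N - 2) * (h - 1 + 1 / N))
    return 1 + math.log2(ratio)


if __name__ == "__main__":
    for N in (3, 10**3, 10**5):
        print(N, beta(N), beta_star(N), beta(N) - beta_star(N))
```

The printed differences can be compared with the values quoted in the text for $N = 3$ and $N = 10^{5}$.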
Thus, if the number of nodes in the Internet is still growing, we suggest, only for small to moderate values of $m$, the consideration of a power law approximation for the multicast gain
\[
g_N(m) \approx E[H_N]\; m^{\frac{\mathrm{Var}[H_N]}{E[H_N]}}
\]
instead of the Chuang-Sirbu law.
In summary, many properties in nature seem linear on an "insensitive" log-log scale. However, deriving from these plots simple and attractive power laws for complicated matter seems a little oversimplified$^{5}$.
17.5 Stability of a multicast shortest path tree
We now turn to the problem of quantifying the stability in a multicast tree. Inspired by Poisson arrival processes, at a single instant of time, we assume that either one or zero group members can leave. In the sequel, we do not make any further assumption about the time-dependent process of leaving/joining a multicast group and refrain from dependencies on time. The number of links in the tree that change after one multicast group member leaves the group has been chosen as measure for the stability of the multicast tree. If we denote this quantity by $\Delta_N(m)$, then, by definition of $g_N(m)$, the average number of changes equals
\[
E[\Delta_N(m)] = g_N(m) - g_N(m-1) \tag{17.33}
\]
Since $g_N(m)$ is concave (Theorem 17.1.2), $E[\Delta_N(m)]$ is always positive and decreasing in $m$. If the scope of $m$ is extended to real numbers, $E[\Delta_N(m)] \approx g_N'(m)$, which simplifies further estimates.
The situation where on average less than one link changes if one multicast group member leaves may be regarded as a stable regime. Since $E[\Delta_N(m)]$ is always positive and decreasing in $m$, this stable regime is reached when the group size $m$ exceeds $m_1$, which satisfies $E[\Delta_N(m_1)] = 1$. For example, for the URT that is asymptotically the SPT for the class RGU defined in Section 16.2.2, this condition approximately follows from (17.29) as
\[
E[\Delta_N(m)] \sim \frac{mN}{N-m}\log\frac{N}{m} - \frac{(m-1)N}{N-m+1}\log\frac{N}{m-1} \tag{17.34}
\]
$^{5}$ Many recent articles devote attention to power law behavior but most of them seem prudent: just recall the immense interest (hype?) a few years ago in the long range and self-similar nature of Internet traffic and the relation to the simple power law with only the Hurst parameter (comparable to $\beta(N)$ here) in the exponent.

Let $x = \frac{m}{N}$, then $0 < x < 1$ and
\[
\frac{E[\Delta_N(m)]}{N} \sim -\frac{x}{1-x}\log x + \frac{x - 1/N}{1 - (x - 1/N)}\log\left(x - \frac{1}{N}\right)
\]
After expanding the second term in a Taylor series around $x$ to first order in $\frac{1}{N}$,
\[
E[\Delta_N(xN)] \sim \frac{x - 1 - \log x}{(1-x)^{2}} + O\!\left(\frac{1}{N}\right)
\]
For large $N$, $E[\Delta_N(x_1 N)] \approx 1$ occurs when $x_1 = 0.3161$, which is the solution in $x$ of $\frac{x - 1 - \log x}{(1-x)^{2}} = 1$. For the class RGU, a stable tree as defined above is obtained when the multicast group size $m$ is larger than $m_1 = 0.3161N \approx \frac{N}{3}$. In the sequel, since $m_1$ is high and of less practical interest, we will focus on multicast group sizes smaller than $m_1$. The computation of $m_1$ for other graph types turns out to be difficult. Since, as mentioned above, the comparison with Internet measurement (Van Mieghem et al., 2001a) shows that formula (17.29) provides a fairly good estimate, we expect that $m_1 \approx \frac{N}{3}$ also approximates well the stable regime in the Internet.
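The threshold $x_1$ is the root of a transcendental equation and is most easily found numerically. The sketch below uses plain bisection on the large-$N$ expression $\frac{x - 1 - \log x}{(1-x)^{2}}$, which decreases from $+\infty$ at $x \to 0$ to $\frac{1}{2}$ at $x \to 1$; the function names and the bracket are illustrative choices.

```python
import math


def mean_changes_large_n(x):
    """Large-N limit of E[Delta_N(xN)]: (x - 1 - log x) / (1 - x)^2, for 0 < x < 1."""
    return (x - 1.0 - math.log(x)) / (1.0 - x) ** 2


def stable_fraction(lo=0.01, hi=0.99, tol=1e-10):
    """Bisection for x_1 with mean_changes_large_n(x_1) = 1 (the function is decreasing)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_changes_large_n(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print(stable_fraction())
```

Multiplying the result by $N$ gives the group size $m_1$ beyond which, on average, less than one link changes when a member leaves.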
The following theorem quantifies the stability in the class RGU.
Theorem 17.5.1 For sufficiently large $N$ and fixed $m$, the number of changed edges $\Delta_N(m)$ in a random graph $G_p(N)$ with uniformly distributed link weights tends to a Poisson distribution,
\[
\Pr[\Delta_N(m) = k] \sim e^{-E[\Delta_N(m)]}\,\frac{\left(E[\Delta_N(m)]\right)^{k}}{k!} \tag{17.35}
\]
where $E[\Delta_N(m)] = g_N(m) - g_N(m-1)$ and $g_N(m)$ is given by (17.16) or approximately by (17.29).
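To get a feeling for the numbers in Theorem 17.5.1, the Poisson approximation (17.35) can be tabulated. The sketch below assumes the exact URT expression $g_N(m) = \frac{mN}{N-m}\sum_{k=m+1}^{N} 1/k$ with the convention $g_N(0) = 0$; the function names are illustrative.

```python
import math


def g(N, m):
    """Multicast gain in the URT; g_N(0) = 0 by convention."""
    if m == 0:
        return 0.0
    return m * N / (N - m) * sum(1.0 / k for k in range(m + 1, N + 1))


def mean_changes(N, m):
    """E[Delta_N(m)] = g_N(m) - g_N(m-1), equation (17.33)."""
    return g(N, m) - g(N, m - 1)


def poisson_pmf(mean, k):
    """Poisson probability e^{-mean} mean^k / k!, the right-hand side of (17.35)."""
    return math.exp(-mean) * mean**k / math.factorial(k)


if __name__ == "__main__":
    N, m = 1000, 5
    mu = mean_changes(N, m)
    print(mu, [round(poisson_pmf(mu, k), 4) for k in range(10)])
```

Since the theorem is asymptotic in $N$ for fixed $m$, the table is only an approximation for finite $N$.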
Proof: In Section 16.2.1 we have mentioned that the SPT in the class RGU is a URT for large $N$. In addition, the random variable for the number of hops $H_N$ from the root to an arbitrary node tends, for large $N$, to a Poisson random variable with mean $E[H_N] \sim \log N + \gamma - 1$ as shown in Section 16.3.1. Now, $\Delta_N(m) = H_N(m) - H_N(m-1)$ is the positive discrete random variable that counts the absolute value of the difference between the hopcount $h_{R\to m}$ from the root (source) to user $m$ and the hopcount $h_{R\to m-1}$ from the root to the user closest in the tree to $m$, which we here relabel by $m-1$. The users $m$ and $m-1$ are not independent, nor are the two random variables $h_{R\to m}$ and $h_{R\to m-1}$ independent in general, due to possible overlap in their paths.
If the shortest paths from the root to each of the two users $m$ and $m-1$ overlap, there always exists a node in the SPT, say node $B$ as illustrated in Fig. 17.10, that sees the partial shortest paths from itself to $m$ and $m-1$ as non-overlapping and independent. Since the SPT is a URT, the subtree rooted at that node $B$ (enclosed in dotted line in Fig. 17.10) is again a URT as follows from Theorem 16.2.1. With respect to $B$, the nodes $m$ and $m-1$ are uniformly chosen and the number of links $\Delta_N(m)$ that change if the $m$-th node leaves is just its hopcount with respect to $B$ (instead of the original root). We denote the unknown number of nodes in that subtree rooted at $B$ by $\nu(m) \le N$. We have that $\nu(m) \le \nu(m-1)$ because by adding a group member, the size of the subtree can only decrease. For large $N$ and small $m$, $\nu(m)$ is large such that the above mentioned asymptotic law of the hopcount applies. If both $m$ and $N$ are large, $\nu(m)$ will become too small for the asymptotic law to apply. Thus, for fixed $m$ and large $N$, this implies that $\Delta_N(m)$ tends to a Poisson random variable with mean $E[\Delta_N(m)]$.

[Figure 17.10: tree with nodes Root, A, B, D, $m$ and $m-1$.]

Fig. 17.10. A sketch of a uniform recursive tree, where $h_{R\to m} = 3$ and $h_{R\to m-1} = 4$ and the number of links in common is two (shown in bold Root-A-B).

Simulations in Van Mieghem and Janic (2002) indicate that the Poisson law seems more widely valid than just in the asymptotic regime ($N \to \infty$).
The proof can be extended to a general topology. Assume for a certain class of graphs that the pdf of the hopcount $\Pr[H_N = k]$ and the multicast efficiency $g_N(m)$ can be computed for all sizes $N$. The subtree rooted at $B$ is again a SPT in a subcluster of size $\nu(m)$, which is an unknown random variable. An argument similar to the one in the proof above shows that
\[
\Pr[\Delta_N(m) = k] = \Pr\left[H_{\nu(m)} = k\right]
\]
This argument implicitly assumes that all multicast users are uniformly distributed over the graph. By the law of total probability,
\begin{align*}
\Pr\left[H_{\nu(m)} = k\right] &= \sum_{n=1}^{N}\Pr\left[H_{\nu(m)} = k \mid \nu(m) = n\right]\Pr[\nu(m) = n] \\
&= \sum_{n=1}^{N}\Pr[H_n = k]\,\Pr[\nu(m) = n]
\end{align*}
which, unfortunately, shows that the pdf of $\nu(m)$ is required to specify $\Pr[\Delta_N(m) = k]$. However, we can proceed further in an approximate way by replacing the unknown random variable $\nu(m)$ by its best estimate, $E[\nu(m)]$. In that approximation, the average size $E[\nu(m)]$ of the shortest path subtree rooted at $B$ can be specified, at least in principle, with the use of (17.33). Indeed, since $E\left[H_{E[\nu(m)]}\right] = \sum_{k=1}^{E[\nu(m)]-1} k\Pr\left[H_{E[\nu(m)]} = k\right]$, by equating
\[
E\left[H_{E[\nu(m)]}\right] = g_N(m) - g_N(m-1)
\]
a relation in one unknown $E[\nu(m)]$ is found and can be solved for $E[\nu(m)]$. In conclusion, we end up with the approximation
\[
\Pr[\Delta_N(m) = k] \approx \Pr\left[H_{E[\nu(m)]} = k\right]
\]
which roughly demonstrates that, in general, $\Pr[\Delta_N(m) = k]$ is likely related to the hopcount distribution in that certain class of graphs.
Unfortunately, for very few types of graphs, both the pdf $\Pr[H_N = k]$ and the multicast gain $g_N(m)$ can be computed. This fact augments the value of Theorem 17.5.1, although the class RGU is not a good model for the graph of the Internet. Fortunately, the shortest path tree deduced from that class seems a reasonable approximation as shown in Fig. 16.4 and sufficient to provide first order estimates.

17.6 Proof of (17.16): $g_N(m)$ for random graphs

Before embarking on the proof of formula (17.16), we first prove the following lemma.

Lemma 17.6.1 For $a > b$,
\[
S(a,b) = \sum_{k=1}^{b}\frac{(a-k)!}{(b-k)!}\,\frac{1}{k} = \frac{a!}{b!}\left[\psi(a+1) - \psi(a-b+1)\right]
\]
and
\[
S(b,b) = \sum_{k=1}^{b}\frac{1}{k} = \psi(b+1) + \gamma
\]
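Lemma 17.6.1 can be checked in exact arithmetic for small integers, using $\psi(a+1) - \psi(a-b+1) = \sum_{k=a-b+1}^{a} 1/k$ for integer arguments. The sketch below is only a sanity check with hypothetical function names, not a substitute for the proof.

```python
from fractions import Fraction
from math import factorial


def harmonic(n):
    """H_n = sum_{k=1}^{n} 1/k, so that psi(n+1) + gamma = H_n."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def S_direct(a, b):
    """Left-hand side of the lemma: sum_{k=1}^{b} (a-k)! / ((b-k)! k)."""
    return sum(Fraction(factorial(a - k), factorial(b - k) * k) for k in range(1, b + 1))


def S_closed(a, b):
    """Right-hand side: a!/b! * (psi(a+1) - psi(a-b+1)) = a!/b! * (H_a - H_{a-b})."""
    return Fraction(factorial(a), factorial(b)) * (harmonic(a) - harmonic(a - b))
```

With $a = N-1$ and $b = N-m$ the lemma yields the sum needed in the last step of the proof of (17.16).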

Proof: We start by writing
\[
S(a,b) = \sum_{k=1}^{b}\frac{(a-k)\,(a-1-k)!}{(b-k)!\;k} = a\sum_{k=1}^{b}\frac{(a-1-k)!}{(b-k)!\;k} - \sum_{k=1}^{b}\frac{(a-1-k)!}{(b-k)!}
\]
Since $\frac{(a-1-k)!}{(b-k)!} = (a-b-1)!\binom{a-1-k}{b-k}$ and by the recurrence for the binomial $\sum_{k=1}^{b}\binom{a-1-k}{b-k} = \binom{a-1}{b-1}$, we have that
\[
S(a,b) = a\,S(a-1,b) - \frac{1}{a-b}\,\frac{(a-1)!}{(b-1)!}
\]
After $p$ iterations, we have
\[
S(a,b) = a(a-1)\cdots(a-p+1)\,S(a-p,b) - \frac{a!}{(b-1)!}\sum_{j=0}^{p-1}\frac{1}{(a-j)(a-j-b)}
\]
and, if $p = a-b$, the recursion stops with result,
\begin{align*}
S(a,b) &= \frac{a!}{b!}\sum_{k=1}^{b}\frac{1}{k} - \frac{a!}{(b-1)!}\sum_{j=0}^{a-b-1}\frac{1}{(a-j)(a-j-b)} \\
&= \frac{a!}{b!}\sum_{k=1}^{b}\frac{1}{k} - \frac{a!}{b!}\left(\sum_{k=1}^{a-b}\frac{1}{k} - \sum_{k=b+1}^{a}\frac{1}{k}\right)
= \frac{a!}{b!}\left(\sum_{k=1}^{a}\frac{1}{k} - \sum_{k=1}^{a-b}\frac{1}{k}\right)
\end{align*}
from which the lemma follows.

Proof of equation (17.16): We will investigate $E[X_i] = E\left[X_i^{(N)}\right]$ in the URT with $N$ nodes. Here $E[X_i]$ is the number of joint hops in a multicast SPT from the root to $i$ uniformly chosen nodes in the URT and where all the group member nodes are different from the root. Let $\tilde{X}_i$ be the same quantity where we allow the group member nodes to be the root. Then,
\[
E\left[\tilde{X}_i\right] = \frac{N-i}{N}\,E[X_i]
\]
since there are $i$ possibilities each with probability $\frac{1}{N}$ that one of the nodes equals the root, in which case $\tilde{X}_i = 0$.
The average number of joint hops $E\left[\tilde{X}_i\right]$ is deduced from Fig. 17.11, where two clusters are shown with $k$ and $N-k$ nodes, respectively. The first cluster with $k$ nodes does not possess the root (dark shaded), but it contains the $i$ multicast group members (light shaded). There is already at least 1 joint hop because the link between the root and node $A$, which can be viewed as the root of the first cluster, is used by all $i$ group members lying in the first cluster. Given the size $k$ of the first cluster, the probability that all $i$ uniformly chosen group members belong to the first cluster equals $\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}$: the probability that the first group member belongs to that cluster is $\frac{k}{N}$, the probability that the second group member also belongs to the first cluster is $\frac{k-1}{N-1}$, and so on. Since the size of the first cluster connected to the root is uniform in between 1 and $N-1$, the probability that the size is $k$ equals $\frac{1}{N-1}$. When all $i$ nodes are in that first cluster of size $k$, $\tilde{X}_i$ is at least 1, and the problem restarts, but with $N$ replaced by $k$ and $A$ being the root. Hence, if all $i$ group members belong to the first cluster, the average number of joint hops is $\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\left(1 + E\left[\tilde{X}_i^{(k)}\right]\right)$ because we must sum over all possible sizes for the first cluster. If not all $i$ group member nodes are in the first cluster, the group member nodes are divided over the two clusters. But, in that case, we have no joint overlaps or $\tilde{X}_i = 0$. Thus, if not all $i$ group member nodes are in the first cluster, the only way that there are possible joint overlaps ($\tilde{X}_i > 0$) is that all $i$ group member nodes are in the second cluster. However, by removing the first cluster, we are left again with a uniform recursive tree of size $N-k$. The average number of joint hops in this case is $\frac{1}{N-1}\sum_{k=1}^{N-1}\frac{(N-k)(N-k-1)\cdots(N-k-i+1)}{N(N-1)\cdots(N-i+1)}\,E\left[\tilde{X}_i^{(N-k)}\right]$. Adding both contributions results in the recursion formula
\[
E\left[\tilde{X}_i^{(N)}\right] = \frac{1}{N-1}\sum_{k=1}^{N-1}\frac{k(k-1)\cdots(k-i+1)}{N(N-1)\cdots(N-i+1)}\left(1 + 2E\left[\tilde{X}_i^{(k)}\right]\right) \tag{17.36}
\]

[Figure 17.11: the root with two clusters, one of $k$ nodes and one of $N-k$ nodes.]

Fig. 17.11. The two contributing clusters leading to the $E\left[\tilde{X}_i^{(N)}\right]$-recursion.

We next write
\[
\tau_i^{(N)} = N(N-1)\cdots(N-i+1)\,E\left[\tilde{X}_i^{(N)}\right] = \frac{N!}{(N-i)!}\,E\left[\tilde{X}_i^{(N)}\right]
\]
then the above recurrence equation (17.36) turns into
\begin{align*}
\tau_i^{(N)} &= \frac{1}{N-1}\sum_{k=1}^{N-1}\left[k(k-1)\cdots(k-i+1) + 2\tau_i^{(k)}\right] \\
&= \frac{1}{N-1}\sum_{k=i}^{N-1}k(k-1)\cdots(k-i+1) + \frac{2}{N-1}\sum_{k=1}^{N-1}\tau_i^{(k)}
\end{align*}
Subtracting
\[
(N-1)\,\tau_i^{(N)} - (N-2)\,\tau_i^{(N-1)} = \frac{(N-1)!}{(N-i-1)!} + 2\tau_i^{(N-1)}
\]
from which we obtain
\[
\frac{\tau_i^{(N)}}{N} = \frac{(N-2)!}{N\,(N-i-1)!} + \frac{\tau_i^{(N-1)}}{N-1} \tag{17.37}
\]
Iterating (17.37) gives
\[
\frac{\tau_i^{(N)}}{N} = \sum_{j=0}^{k-1}\frac{(N-2-j)!}{(N-j)\,(N-i-1-j)!} + \frac{\tau_i^{(N-k)}}{N-k}
\]
Since $\tau_i^{(i)} = 0$ (because $E\left[\tilde{X}_i^{(i)}\right] = 0$: the root is then always one of the group member nodes), we finally obtain,
\[
\tau_i^{(N)} = N\sum_{j=0}^{N-i-1}\frac{(N-2-j)!}{(N-j)\,(N-i-1-j)!} = N\sum_{k=i+1}^{N}\frac{(k-2)!}{k\,(k-i-1)!} \tag{17.38}
\]
It can be shown that, for large $N$, $\tau_i^{(N)} \sim \frac{N}{i-1}\,\frac{(N-2)!}{(N-i-1)!}$. Because
\[
E\left[X_i^{(N)}\right] = \frac{N}{N-i}\,E\left[\tilde{X}_i^{(N)}\right] = \frac{(N-i-1)!}{(N-1)!}\,\tau_i^{(N)}
\]
we have that
\[
E\left[X_i^{(N)}\right] = \frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=i+1}^{N}\frac{(k-2)!}{k\,(k-i-1)!} \tag{17.39}
\]
and, for large $N$, $E\left[X_i^{(N)}\right] \sim \frac{N}{(i-1)(N-1)} \approx \frac{1}{i-1}$.

Invoking Theorem 17.1.3, the average number of multicast hops for $m$ uniformly chosen, distinct group members is
\begin{align*}
g_N(m) &= \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1}\,\frac{(N-i-1)!\,N}{(N-1)!}\sum_{k=2}^{N}\frac{(k-2)!}{k\,(k-i-1)!} \\
&= \frac{-N}{(N-1)!}\sum_{s=0}^{N-2}\frac{(N-2-s)!}{N-s}\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(N-i-1-s)!}
\end{align*}
The $i$-summation can be executed as follows. Consider $x^{N-1}(1 - 1/x)^{m} = \sum_{i=0}^{m}\binom{m}{i}(-1)^{i}x^{N-i-1}$. Differentiating $s$ times yields
\[
\sum_{i=0}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(N-i-1-s)!}\,x^{N-i-s-1} = \frac{d^{s}}{dx^{s}}\left[x^{N-1-m}(x-1)^{m}\right]
\]
Expanding the right-hand side around $x = 1$ gives
\begin{align*}
\frac{d^{s}}{dx^{s}}\left[x^{N-1-m}(x-1)^{m}\right] &= \sum_{k=0}^{\infty}\binom{N-1-m}{k}\frac{d^{s}}{dx^{s}}(x-1)^{k+m} \\
&= \sum_{k=0}^{\infty}\binom{N-1-m}{k}\frac{(k+m)!}{(k+m-s)!}\,(x-1)^{k+m-s}
\end{align*}
Evaluation at $x = 1$ only leads to a non-zero contribution if $k + m - s = 0$. Hence,
\[
\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(N-i-1-s)!} = s!\binom{N-1-m}{s-m} - \frac{(N-1)!}{(N-1-s)!}
\]
and
\begin{align*}
g_N(m) &= \frac{-N(N-1-m)!}{(N-1)!}\sum_{s=0}^{N-2}\frac{s!}{(N-s)(s-m)!\,(N-1-s)} + N\sum_{s=0}^{N-2}\frac{1}{(N-s)(N-1-s)} \\
&= \frac{-N(N-1-m)!}{(N-1)!}\left[\sum_{s=m}^{N-2}\frac{s!}{(s-m)!\,(N-1-s)} - \sum_{s=m}^{N-2}\frac{s!}{(s-m)!\,(N-s)}\right] + N\left[\sum_{k=1}^{N-1}\frac{1}{k} - \sum_{k=2}^{N}\frac{1}{k}\right] \\
&= \frac{-N(N-1-m)!}{(N-1)!}\left[\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k} - \sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k}\right] + N - 1
\end{align*}
Rewrite the first summation as
\begin{align*}
\sum_{k=1}^{N-m-1}\frac{(N-k-1)!}{(N-k-1-m)!\,k} &= \frac{(N-2)!}{(N-2-m)!} + \sum_{k=2}^{N-m}\frac{(N-k-1)!\,(N-k-m)}{(N-k-m)!\,k} \\
&= \frac{(N-2)!}{(N-2-m)!} + \sum_{k=2}^{N-m}\frac{(N-k)!}{(N-k-m)!\,k} - m\sum_{k=2}^{N-m}\frac{(N-k-1)!}{(N-k-m)!\,k}
\end{align*}
Then,
\begin{align*}
g_N(m) &= \frac{-N(N-1-m)!}{(N-1)!}\left[\frac{(N-2)!}{(N-2-m)!} - m\sum_{k=2}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!}\right] + N - 1 \\
&= \frac{N(m-1)+1}{N-1} + \frac{mN(N-1-m)!}{(N-1)!}\sum_{k=2}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!} \\
&= -1 + \frac{mN(N-1-m)!}{(N-1)!}\sum_{k=1}^{N-m}\frac{(N-k-1)!}{k\,(N-k-m)!}
\end{align*}
Using Lemma 17.6.1
\[
\sum_{k=1}^{N-m}\frac{(N-k-1)!}{(N-k-m)!}\,\frac{1}{k} = \frac{(N-1)!}{(N-m)!}\left[\psi(N) - \psi(m)\right] \tag{17.40}
\]
finally leads to (17.16).
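The chain from (17.39) through Theorem 17.1.3 to the closed form can be verified exactly for small $N$. The sketch below assumes that Theorem 17.1.3 has the inclusion-exclusion form $g_N(m) = \sum_{i=1}^{m}\binom{m}{i}(-1)^{i-1}E[X_i]$ used in the first line of the derivation above, and that (17.16) is equivalent to $\frac{mN}{N-m}\sum_{k=m+1}^{N} 1/k$; the function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def joint_hops(N, i):
    """E[X_i^{(N)}] from (17.39), for 1 <= i <= N-1."""
    prefactor = Fraction(factorial(N - i - 1) * N, factorial(N - 1))
    return prefactor * sum(
        Fraction(factorial(k - 2), k * factorial(k - i - 1)) for k in range(i + 1, N + 1)
    )


def g_inclusion_exclusion(N, m):
    """g_N(m) as the alternating sum over the joint hops of i-subsets of the group."""
    return sum((-1) ** (i - 1) * comb(m, i) * joint_hops(N, i) for i in range(1, m + 1))


def g_closed(N, m):
    """Closed form equivalent to (17.16): mN/(N-m) * (psi(N) - psi(m)) - 1."""
    return Fraction(m * N, N - m) * sum(Fraction(1, k) for k in range(m + 1, N + 1))
```

Agreement of the two functions for every $1 \le m \le N-1$ at a fixed small $N$ is a useful guard against slips in the index manipulations.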

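17.7 Proof of Theorem 17.3.1: $g_N(m)$ for k-ary trees

Let $\tilde{X}_i$ be the number of joint hops for $i$ different multicast group members (we allow the root to be a user in which case $\tilde{X}_i = 0$). Then,
\begin{align*}
\Pr\left[\tilde{X}_i \ge 1\right] &= \Pr[\text{All group members belong to the same cluster connected to the root}] \\
&= k\Pr[\text{All group members belong to the first cluster connected to the root}] \\
&= k\,\frac{\binom{(N-1)/k}{i}}{\binom{N}{i}} = k\,\frac{\binom{1+k+\cdots+k^{D-1}}{i}}{\binom{1+k+\cdots+k^{D}}{i}} \tag{17.41}
\end{align*}
which we denote by $p_i^{(D)}$. By self-similarity of $k$-ary trees we obtain
\[
\Pr\left[\tilde{X}_i \ge 2 \mid \tilde{X}_i \ge 1\right] = p_i^{(D-1)} = k\,\frac{\binom{1+k+\cdots+k^{D-2}}{i}}{\binom{1+k+\cdots+k^{D-1}}{i}}
\]
because each cluster extending from the root is itself a $k$-ary tree of depth $D-1$. In general, we have $\Pr\left[\tilde{X}_i \ge j\right] = \Pr\left[\tilde{X}_i \ge j \mid \tilde{X}_i \ge j-1\right]\Pr\left[\tilde{X}_i \ge j-1\right]$. Hence, by iteration,
\[
\Pr\left[\tilde{X}_i \ge j\right] = \prod_{n=D-j+1}^{D} p_i^{(n)}, \qquad j = 1, 2, \ldots, D-1 \tag{17.42}
\]
Note that for $i \ge 2$ the probability $\Pr\left[\tilde{X}_i \ge D\right] = 0$, because if $\tilde{X}_i = D$ some destinations must be identical. From (2.36) we obtain for $i \ge 2$,
\[
E\left[\tilde{X}_i\right] = \sum_{j=1}^{D-1}\prod_{n=D-j+1}^{D} p_i^{(n)} = \sum_{j=1}^{D-1} k^{j}\,\frac{\binom{1+\cdots+k^{D-j}}{i}}{\binom{1+\cdots+k^{D}}{i}} = \sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}} \tag{17.43}
\]
Since $E[X_i] = \frac{N}{N-i}\,E\left[\tilde{X}_i\right]$, we find
\[
E[X_i] = \frac{N}{N-i}\sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}}, \qquad i \ge 2 \tag{17.44}
\]
For the value of $E\left[\tilde{X}_1\right]$ and $E[X_1]$ we find
\[
E\left[\tilde{X}_1\right] = \frac{1}{N}\sum_{j=1}^{D} k^{j}\left(1 + \cdots + k^{D-j}\right) = \frac{1}{N(k-1)}\left\{D k^{D+1} - (N-1)\right\}
\]
and
\[
E[X_1] = \frac{1}{N-1}\sum_{j=1}^{D} k^{j}\left(1 + \cdots + k^{D-j}\right) = \frac{N}{N-1}\sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{1}}{\binom{1+\cdots+k^{D}}{1}} + \frac{k^{D}}{N-1}
\]
Invoking Theorem 17.1.3 yields
\[
g_{N,k}(m) = \frac{m k^{D}}{N-1} - \sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{N}{N-i}\sum_{j=1}^{D-1} k^{D-j}\,\frac{\binom{1+\cdots+k^{j}}{i}}{\binom{1+\cdots+k^{D}}{i}}
\]
Writing $A_j = \frac{k^{j+1}-1}{k-1}$ and reversing the $i$- and $j$-summation yields, using (17.24),
\[
g_{N,k}(m) = \frac{m k^{D}}{N-1} - N\sum_{j=1}^{D-1} k^{D-j}\,\frac{A_j!}{N!}\sum_{i=1}^{m}\binom{m}{i}(-1)^{i}\,\frac{(N-i-1)!}{(A_j-i)!}
\]
Concentrating on the inner sum with lower sum bound $i = 0$, denoted as $S_j$, and substituting $\ell = m - i$, we have
\[
S_j = \sum_{\ell=0}^{m}\binom{m}{\ell}(-1)^{m-\ell}\,\frac{\Gamma(N-m+\ell)}{\Gamma(A_j-m+\ell+1)}
\]
Invoking the Taylor series of the hypergeometric function (Abramowitz and Stegun, 1968, Section 15.1.1),
\[
F(a,b;c;z) = \frac{\Gamma(c)}{\Gamma(a)\Gamma(b)}\sum_{n=0}^{\infty}\frac{\Gamma(a+n)\Gamma(b+n)}{\Gamma(c+n)\,n!}\,z^{n}
\]
$S_j$ is the coefficient of $z^{m}$ in the Cauchy product of
\[
(1-z)^{m} = \sum_{\ell=0}^{\infty}\binom{m}{\ell}(-1)^{\ell}z^{\ell}
\]
and
\[
\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}\,F(1, N-m; A_j-m+1; z) = \sum_{\ell=0}^{\infty}\frac{\Gamma(N-m+\ell)}{\Gamma(A_j-m+1+\ell)}\,z^{\ell}
\]
Hence,
\[
S_j = \frac{1}{m!}\,\frac{\Gamma(N-m)}{\Gamma(A_j-m+1)}\left.\frac{d^{m}}{dz^{m}}\left[(1-z)^{m}F(1, N-m; A_j-m+1; z)\right]\right|_{z=0}
\]
Invoking the differentiation formula (Abramowitz and Stegun, 1968, Section 15.2.7),
\[
\frac{d^{m}}{dz^{m}}\left[(1-z)^{a+m-1}F(a,b;c;z)\right] = \frac{(-1)^{m}\,\Gamma(a+m)\Gamma(c-b+m)\Gamma(c)}{\Gamma(a)\Gamma(c-b)\Gamma(c+m)}\,(1-z)^{a-1}F(a+m,b;c+m;z)
\]
we have, since $a = 1$ and $F(a,b;c;0) = 1$,
\[
S_j = \frac{(-1)^{m}\,\Gamma(N-m)\,\Gamma(A_j+1-N+m)}{\Gamma(A_j+1-N)\,\Gamma(A_j+1)}
\]
Thus,
\begin{align*}
g_{N,k}(m) &= \frac{m k^{D}}{N-1} - N\sum_{j=1}^{D-1} k^{D-j}\,\frac{A_j!}{N!}\left[\frac{(-1)^{m}(N-m-1)!\,(A_j-N+m)!}{(A_j-N)!\,A_j!} - \frac{(N-1)!}{A_j!}\right] \\
&= \frac{m k^{D}}{N-1} + \sum_{j=1}^{D-1} k^{D-j} + \frac{(-1)^{m-1}(N-m-1)!}{(N-1)!}\sum_{j=1}^{D-1} k^{D-j}\,\frac{(A_j-N+m)!}{(A_j-N)!}
\end{align*}
from which (17.25) is immediate.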
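The end result of this proof can be checked without the hypergeometric machinery: for a small tree, summing the joint-hop expectations (17.44) with alternating signs should reproduce (17.25). The sketch below assumes the inclusion-exclusion form of Theorem 17.1.3 used above; the function names are illustrative and $k \ge 2$ is assumed.

```python
from fractions import Fraction
from math import comb


def level_sum(k, j):
    """1 + k + ... + k^j, i.e. A_j = (k^{j+1} - 1)/(k - 1) for a subtree of depth j."""
    return sum(k**t for t in range(j + 1))


def joint_hops_kary(k, D, i):
    """E[X_i]: (17.44) for i >= 2 and the separate expression for i = 1."""
    N = level_sum(k, D)
    if i == 1:
        return Fraction(sum(k**j * level_sum(k, D - j) for j in range(1, D + 1)), N - 1)
    inner = sum(
        k ** (D - j) * Fraction(comb(level_sum(k, j), i), comb(N, i)) for j in range(1, D)
    )
    return Fraction(N, N - i) * inner


def g_from_joint_hops(k, D, m):
    """Alternating sum over i = 1..m of C(m, i) E[X_i]."""
    return sum((-1) ** (i - 1) * comb(m, i) * joint_hops_kary(k, D, i) for i in range(1, m + 1))


def g_theorem(k, D, m):
    """Right-hand side of (17.25)."""
    N = level_sum(k, D)
    s = sum(
        k ** (D - j) * Fraction(comb(N - 1 - level_sum(k, j), m), comb(N - 1, m))
        for j in range(D)
    )
    return N - 1 - s
```

For the binary tree of depth 2 ($N = 7$) and $m = 2$, both routes give $44/15$, which can also be confirmed by enumerating the 15 pairs of non-root nodes by hand.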

17.8 Problem
(i) Compute the effective power exponent $\beta^{*}(N)$ for the $k$-ary tree.

18
The hopcount to an anycast group

In this chapter, the probability density function of the number of hops to the most nearby member of the anycast group consisting of $m$ members (e.g. servers) is analyzed. The results are applied to compute a performance measure $\eta$ of the efficiency of anycast over unicast and to the server placement problem. The server placement problem asks for the number of (replicated) servers $m$ needed such that any user in the network is not more than $j$ hops away from a server of the anycast group with a certain prescribed probability. As in Chapter 17 on multicast, two types of shortest path trees are investigated: the regular $k$-ary tree and the irregular uniform recursive tree treated in Chapter 16. Since these two extreme cases of trees indicate that the performance measure $\eta \approx 1 - a\log m$ where the real number $a$ depends on the details of the tree, it is believed that for trees in real networks (such as the Internet) the same logarithmic law applies. An order calculus on exponentially growing trees further supplies evidence for the conjecture that $\eta \approx 1 - a\log m$ for small $m$.

18.1 Introduction
IPv6 possesses a new address type, anycast, that is not supported in IPv4.
The anycast address is syntactically identical to a unicast address. However,
when a set of interfaces is specified by the same unicast address, that
unicast address is called an anycast address. The advantage of anycast
is that a group of interfaces at different locations is treated as one single
address. For example, the information on servers is often duplicated over
several secondary servers at different locations for reasons of robustness and
accessibility. Changes are only performed on the primary servers, which
are then copied onto all secondary servers to maintain consistency. If both
the primary and all secondary servers have the same anycast address, a query
from some source towards that anycast address is routed towards the closest
server of the group. Hence, anycast is more efficient than routing the packet
to the root server (primary server).
Suppose there are p (primary plus all secondary) servers and that these p
servers are uniformly distributed over the Internet. The number of hops from
the querying device D to the closest server is the minimum number of hops,
denoted by kQ (p), of the set of shortest paths from D to these p servers in
a network with Q nodes. In order to solve the problem, the shortest path
tree rooted at node D, the querying device, needs to be investigated. We
assume in the sequel that one of the p uniformly distributed servers can
possibly coincide with the same router to which the querying machine D is
attached. In that case, kQ (p) = 0. This assumption is also reflected in
the notation, small k, according to the convention made in Section 16.3.2
that capital K for the hopcount excludes the event that the hopcount can
be zero.
Clearly, if p = 1, the problem reduces to the hopcount of the shortest
path from D to one uniformly chosen node in the network and we have that
$$k_Q(1) = k_Q,$$
where $k_Q$ is the hopcount of the shortest path in a graph with Q nodes.
The other extreme for p = Q leads to
$$k_Q(Q) = 0$$
because all nodes in the network are servers. In between these extremes, it
holds that
$$k_Q(p) \le k_Q(p-1)$$
since one additional anycast group member (server) can never increase the
minimum number of hops from an arbitrary node to that larger group.
The hopcount to an anycast group is a stochastic problem. Even if the
network graph is exactly known, an arbitrary node D views the network
along a tree. Most often it is a shortest path tree. Although the sequel
emphasizes shortest path trees, the presented theory is equally valid for
any type of tree. The node D's perception of the network is very likely
different from the view of another node D'. Nevertheless, shortest path
trees in the same graph possess to some extent related structural properties
that allow us to treat the problem by considering certain types or classes
of shortest path trees. Hence, instead of varying the arbitrary node D over
all possible nodes in the graph and computing the shortest path tree at
each different node, we vary the structure of the shortest path tree rooted
at D over all possible shortest path trees of a certain type. Of course, the
confinement of the analysis then lies in the type of tree that is investigated.
We will only consider the regular n-ary tree and the irregular URT. It
seems reasonable to assume that real shortest path trees in the Internet
possess a structure somewhere in between these extremes and that scaling
laws observed in both extreme cases may also apply to the Internet.
The presented analysis allows us to address at least two different issues.
First, for a same class of trees, the efficiency of anycast over unicast, defined
in terms of a performance measure $\eta$,
$$\eta = \frac{H[k_Q(p)]}{H[k_Q(1)]} \le 1$$
is quantified. The performance measure $\eta$ indicates how many hops (or link
traversals or bandwidth consumption) can be saved, on average, by anycast.
Alternatively, $\eta$ also reflects the gain in end-to-end delay or how much faster
than unicast anycast finds the desired information. Second, the so-called
server placement problem can be treated. More precisely, the question "How
many servers p are needed to guarantee that any user request can access the
information within m hops with probability $\Pr[k_Q(p) > m] \le \epsilon$, where $\epsilon$ is
a certain level of stringency?" can be answered. The server placement problem
is expected to gain increased interest especially for real-time services where
end-to-end QoS (e.g. delay) requirements are desirable. In the most general
setting of this server placement problem, all nodes are assumed to be equally
important in the sense that users' requests are generated equally likely at
any router in the network with Q nodes. As mentioned in Chapter 17, the
validity of this assumption has been justified by Phillips et al. (1999). In
the case of uniform user requests, the best strategy is to place servers also
uniformly over the network. Computations of $\Pr[k_Q(p) > m] < \epsilon$ for given
stringency $\epsilon$ and hop m allow the determination of the minimum number p
of servers. The solution of this server placement problem may be regarded
as an instance of the general quality of service (QoS) portfolio of a network
operator. When the number of servers for a major application offered by the
service provider is properly computed, the service provider may announce
levels $\epsilon$ of QoS (e.g. via $\Pr[k_Q(p) > m] < \epsilon$) and accordingly price the use
of the application.
18.2 General analysis
Let us consider a particular shortest path tree W rooted at node D with
the level set $O_Q = \{X_Q^{(n)}\}_{1 \le n \le Q-1}$ as defined in Section 16.2.2. Suppose
that the result of uniformly distributing p anycast group members over the
graph leads to a number $p^{(n)}$ of those anycast group member nodes that
are n hops away from the root. These $p^{(n)}$ distinct nodes all belong to the
n-th level set $\{X_Q^{(n)}\}$. Similarly as for $X_Q^{(n)}$, some relations are immediate.
First, $p^{(0)} = 0$ means that none of the p anycast group members coincides
with the root node D, while $p^{(0)} = 1$ means that one of them (and at most one)
is attached to the same router D as the querying device. Also, for all n > 0,
it holds that $0 \le p^{(n)} \le X_Q^{(n)}$ and that
$$\sum_{n=0}^{Q-1} p^{(n)} = p \tag{18.1}$$
Given the tree W specified by the level set $O_Q$ and the anycast group members
specified by the set $\{p^{(0)}, p^{(1)}, \ldots, p^{(Q-1)}\}$, we will derive the lowest
non-empty level $p^{(m)}$, which is equivalent to $k_Q(p)$.
Let us denote by $h_m$ the event that all first m + 1 levels are not occupied
by an anycast group member,
$$h_m = \{p^{(0)} = 0\} \cap \{p^{(1)} = 0\} \cap \cdots \cap \{p^{(m)} = 0\}$$
The probability distribution of the minimum hopcount, $\Pr[k_Q(p) = m | O_Q]$,
is then equal to the probability of the event $h_{m-1} \cap \{p^{(m)} > 0\}$. Since the
event $\{p^{(m)} > 0\} = \{p^{(m)} = 0\}^c$, using the conditional probability yields
$$\Pr[k_Q(p) = m | O_Q] = \Pr\left[\{p^{(m)} > 0\} \,\middle|\, h_{m-1}\right] \Pr[h_{m-1}] = \left(1 - \Pr\left[\{p^{(m)} = 0\} \,\middle|\, h_{m-1}\right]\right) \Pr[h_{m-1}] \tag{18.2}$$
Since $h_m = h_{m-1} \cap \{p^{(m)} = 0\}$, the probability of the event $h_m$ can be decomposed as
$$\Pr[h_m] = \Pr\left[\{p^{(m)} = 0\} \,\middle|\, h_{m-1}\right] \Pr[h_{m-1}] \tag{18.3}$$
The assumption that all p anycast group members are uniformly distributed
enables us to compute $\Pr[\{p^{(m)} = 0\} | h_{m-1}]$ exactly. Indeed, by the uniform
assumption, the probability equals the ratio of the number of favorable
possibilities over the total number of possibilities. The total number of ways
to distribute p items over $Q - \sum_{n=0}^{m-1} X_Q^{(n)}$ positions (the latter constraint
follows from the condition $h_{m-1}$) equals $\binom{Q - \sum_{n=0}^{m-1} X_Q^{(n)}}{p}$. Likewise, the
favorable number of ways to distribute p items over the remaining levels higher
than m leads to
$$\Pr\left[\{p^{(m)} = 0\} \,\middle|\, h_{m-1}\right] = \frac{\binom{Q - \sum_{n=0}^{m} X_Q^{(n)}}{p}}{\binom{Q - \sum_{n=0}^{m-1} X_Q^{(n)}}{p}} \tag{18.4}$$
The recursion (18.3) needs an initialization, given by
$$\Pr[h_0] = \Pr\left[p^{(0)} = 0\right] = 1 - \frac{p}{Q}$$
which follows from $\Pr[p^{(0)} = 0] = \binom{Q-1}{p} / \binom{Q}{p}$ and equals $\Pr[p^{(0)} = 0 | h_{-1}]$
(although the event $h_{-1}$ is meaningless). Observe that $\Pr[p^{(0)} = 1] = \frac{p}{Q}$
holds for any tree such that
$$\Pr[k_Q(p) = 0] = \frac{p}{Q}$$
By iteration of (18.3), we obtain
$$\Pr[h_m] = \prod_{v=0}^{m} \frac{\binom{Q - \sum_{n=0}^{v} X_Q^{(n)}}{p}}{\binom{Q - \sum_{n=0}^{v-1} X_Q^{(n)}}{p}} = \frac{\binom{Q - \sum_{n=0}^{m} X_Q^{(n)}}{p}}{\binom{Q}{p}} \tag{18.5}$$
where the convention in summation is that $\sum_{n=d}^{e} i_n = 0$ if d > e. Finally,
combining (18.2) with (18.4) and (18.5), we arrive at the general conditional
expression for the minimum hopcount to the anycast group,
$$\Pr[k_Q(p) = m | O_Q] = \frac{\binom{Q - \sum_{n=0}^{m-1} X_Q^{(n)}}{p} - \binom{Q - \sum_{n=0}^{m} X_Q^{(n)}}{p}}{\binom{Q}{p}} \tag{18.6}$$

Clearly, while $\Pr[k_Q(0) = m | O_Q] = 0$ since there is no path, we have for
p = 1,
$$\Pr[k_Q(1) = m | O_Q] = \frac{X_Q^{(m)}}{Q}$$
It directly follows from (18.6) that
$$\Pr[k_Q(p) \le q | O_Q] = 1 - \frac{\binom{Q - \sum_{n=0}^{q} X_Q^{(n)}}{p}}{\binom{Q}{p}} \tag{18.7}$$
If $Q - \sum_{n=0}^{q} X_Q^{(n)} < p$ or, equivalently, $\sum_{n=q+1}^{Q-1} X_Q^{(n)} < p$, then equation
(18.7) shows that $\Pr[k_Q(p) > q | O_Q] = 0$. The maximum possible hopcount
of a shortest path to an anycast group strongly depends on the specifics of the
shortest path tree or the level set $O_Q$. A general result is worth mentioning:

Theorem 18.2.1 For any graph, it holds that
$$\Pr[k_Q(p) > Q - p] = 0$$
In words, the longest shortest path to an anycast group with p members
can never possess more than $Q - p$ hops.
Proof: This general theorem follows from the fact that the line topology
is the tree with the longest hopcount $Q - 1$ and only in the case that all
p last positions (with respect to the source or root) are occupied by the p
anycast group members, is the maximum hopcount $Q - p$.

For the URT, $\Pr[k_Q(p) = Q - p]$ is computed exactly in (18.12).

Corollary 18.2.2 For any graph, it holds that
$$\Pr[k_Q(Q-1) = 1] = \frac{1}{Q}$$
Proof: This corollary follows from Theorem 18.2.1 and the law of total
probability. Alternatively, if there are $Q - 1$ anycast members in a network
with Q nodes, the shortest path can only consist of one hop if none of the
anycast members coincides with the root node. This probability is precisely
$\frac{1}{Q}$.

Using the tail probability formula (2.36) for the average, it follows from
(18.7) that
$$H[k_Q(p) | O_Q] = \frac{1}{\binom{Q}{p}} \sum_{q=0}^{Q-2} \binom{Q - \sum_{n=0}^{q} X_Q^{(n)}}{p} \tag{18.8}$$
from which we find,
$$H[k_Q(1) | O_Q] = \frac{1}{Q} \sum_{n=1}^{Q-1} n X_Q^{(n)}$$
Thus, given $O_Q$, a performance measure $\eta$ for anycast over unicast can be
quantified as
$$\eta = \frac{H[k_Q(p) | O_Q]}{H[k_Q(1) | O_Q]} \le 1$$
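The conditional results (18.6) and (18.8) are straightforward to evaluate numerically once a level set is given. The following is a minimal sketch, not taken from the text: the function names and the 7-node example tree are illustrative assumptions, and the list `levels` plays the role of the level set (number of nodes at hop 0, 1, 2, ... with one root node).

```python
from fractions import Fraction
from math import comb


def hop_pmf(levels, p):
    """Conditional distribution of the minimum hopcount, following (18.6).

    levels[m] is the number of nodes at hop m from the root (levels[0] == 1),
    p is the number of uniformly placed anycast members.
    """
    Q = sum(levels)
    pmf, remaining = [], Q  # remaining = Q minus all nodes in levels below m
    for x in levels:
        pmf.append(Fraction(comb(remaining, p) - comb(remaining - x, p), comb(Q, p)))
        remaining -= x
    return pmf


def mean_hops(levels, p):
    """Conditional mean of the minimum hopcount for the given level set."""
    return sum(m * pr for m, pr in enumerate(hop_pmf(levels, p)))


def eta(levels, p):
    """Ratio of the anycast mean to the unicast (p = 1) mean; needs Q > 1."""
    return mean_hops(levels, p) / mean_hops(levels, 1)


levels = [1, 2, 4]  # hypothetical tree: root, 2 children, 4 grandchildren
print(hop_pmf(levels, 2))
print(eta(levels, 2))
```

Exact rational arithmetic is used so that small cases can be compared with hand calculations; for large Q, floating-point division would be the more practical choice.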
Using the law of total probability, the distribution of the minimum hopcount to the anycast group is
$$\Pr[k_Q(p) = m] = \sum_{\text{all } O_Q} \Pr[k_Q(p) = m | O_Q] \Pr[O_Q] \tag{18.9}$$
or explicitly,
$$\Pr[k_Q(p) = m] = \sum_{\sum_{n=1}^{Q-1} x_n = Q-1} \frac{\binom{\sum_{n=m}^{Q-1} x_n}{p} - \binom{\sum_{n=m+1}^{Q-1} x_n}{p}}{\binom{Q}{p}} \Pr\left[X_Q^{(1)} = x_1, \ldots, X_Q^{(Q-1)} = x_{Q-1}\right]$$
where the integers $x_n \ge 0$ for all n. This expression explicitly shows the
importance of the level structure $O_Q$ of the shortest path tree W. The level
set $O_Q$ entirely determines the shape of the tree W. Unfortunately, a general
form for $\Pr[O_Q]$ or $\Pr[k_Q(p) = m]$ is difficult to obtain. In principle, via
extensive trace-route measurements from several roots, the shortest path
tree and $\Pr[O_Q]$ can be constructed such that a (rough) estimate of the
level set $O_Q$ in the Internet can be obtained.
18.3 The n-ary tree
For regular trees, explicit expressions are possible because the summation
in (18.9) simplifies considerably. For example, for the n-ary tree defined in
Section 17.3,
$$X_Q^{(m)} = n^m$$
Provided the set $O_Q$ only contains these values of $X_Q^{(m)}$ for each m, we have
that $\Pr[O_Q] = 1$, else it is zero (because then $O_Q$ is not consistent with an
n-ary tree). Summarizing, for the n-ary tree with $Q = \frac{n^{G+1}-1}{n-1}$ and G levels,
the distribution of the minimum hopcount to the anycast group is
$$\Pr[k_Q(p) = m] = \frac{\binom{Q - \frac{n^m - 1}{n-1}}{p} - \binom{Q - \frac{n^{m+1} - 1}{n-1}}{p}}{\binom{Q}{p}} \tag{18.10}$$
Extension of the integer n to real numbers in the formula (18.10) is expected
to be of value as suggested in Section 17.3. When an n-ary tree was
used to fit corresponding Internet multicast measurements (Van Mieghem
et al., 2001a), a remarkably accurate agreement was found for the value
$n \approx 3.2$, which is about the average degree of the Internet graph. Hence,
if we were to use the n-ary tree as model for the hopcount to an anycast
group, we expect that $n \approx 3.2$ is the best value for Internet shortest path
trees. However, we feel we ought to mention that the hopcount distribution
of the shortest path between two arbitrary nodes is definitely not that of an
n-ary tree, because in an n-ary tree $\Pr[k_Q(1) = m]$ increases with the hopcount m,
which is in conflict with Internet trace-route measurements (see, for example, the
bell-shape curve in Fig. 16.4).
[Figure 18.1: plot of Pr[k_Q(p) ≤ m] versus the hops m for Q = 500 and n = 3, with one curve for each anycast group size p = 1, 2, 5, 10, 50.]
Fig. 18.1. The distribution function of k500 (p) versus the hops m for various sizes
of the anycast group in an n-ary tree with n = 3 and Q = 500

Figure 18.1 displays $\Pr[k_Q(p) \le m]$ for an n-ary tree with outdegree n = 3 and
Q = 500. This type of plot allows us to solve the server placement problem.
For example, assuming that the n-ary tree is a good model and the
network consists of Q = 500 nodes, Fig. 18.1 shows that at least p = 10
servers are needed to assure that any user is not more than four hops separated
from a server with a probability of 93%. More precisely, the equation
$\Pr[k_{500}(p) > 4] < 0.07$ is obeyed if $p \ge 10$.
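The server placement computation described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure used to produce Fig. 18.1: since Q = 500 is not of the form $(n^{G+1}-1)/(n-1)$ for n = 3, the helper `nary_levels` (a hypothetical name) fills complete levels 1, n, n^2, ... and puts the leftover nodes in a last, partially filled level. Only the first five levels matter for a four-hop target.

```python
from math import comb


def nary_levels(Q, n):
    """Level sizes 1, n, n^2, ... truncated so that they sum to Q (assumption)."""
    levels = []
    while sum(levels) < Q:
        levels.append(min(n ** len(levels), Q - sum(levels)))
    return levels


def tail(levels, p, m):
    """Probability that the nearest of p servers is more than m hops away, from (18.7)."""
    Q = sum(levels)
    return comb(Q - sum(levels[: m + 1]), p) / comb(Q, p)


def min_servers(levels, m, eps):
    """Smallest p for which the tail probability beyond m hops drops below eps."""
    Q = sum(levels)
    for p in range(1, Q + 1):
        if tail(levels, p, m) < eps:
            return p


levels = nary_levels(500, 3)
print(levels)
print(min_servers(levels, 4, 0.07))
```

The linear search over p is adequate here because the tail probability is non-increasing in p; a bisection would do the same job for very large Q.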
Figure 18.2 gives an idea how the performance measure $\eta$ decreases with
the size of the anycast group in n-ary trees (all with outdegree n = 3), but
with different size Q. For values of p up to around 20% of Q, we observe
that $\eta$ decreases logarithmically in p.

18.4 The uniform recursive tree (URT)


Chapter 16 motivates the interest in the URT. The URT is believed to
provide a reasonable, first order estimate for the hopcount problem to an
anycast group in the Internet.

18.4.1 Recursion for Pr [k(p) = m]


A combinatorial approach such as (18.9) is seldom successful for
URTs, while structural properties often lead to results. The basic Theorem 16.2.1
of the URT, applied to the anycast minimum hop problem, is
illustrated in Fig. 18.3.

[Figure 18.2: plot of the performance measure versus p/Q on a logarithmic axis for n = 3, with one curve for each of Q = 100, 500, 5000, 10^5, 10^6.]
Fig. 18.2. The performance measure $\eta$ for several sizes of n-ary trees (with n = 3)
as a function of the ratio of anycast nodes over the total number of nodes.

[Figure 18.3: schematic of a URT with its root, split into subtree W1 (root R1, n nodes, l anycast members) and subtree W2 (Q - n nodes, p - l anycast members).]
Fig. 18.3. A uniform recursive tree consisting of two subtrees W1 and W2 with n and
$Q - n$ nodes respectively. The first cluster contains l anycast members while the
cluster with $Q - n$ nodes contains $p - l$ anycast members.

Figure 18.3 shows that any URT can be separated into two subtrees W1 and
W2 with n and $Q - n$ nodes respectively. Moreover, Theorem 16.2.1 states
that each subtree is independent of the other and again a URT. Consider
now a specific separation of a URT W into W1 = w1 and W2 = w2, where the tree
w1 contains n nodes and l of the p anycast members and w2 possesses $Q - n$
nodes and the remaining $p - l$ anycast members. The event $\{k_W(p) = m\}$
equals the union over all possible sizes Q1 = n and subgroups p1 = l of the
event $\{k_{w_1}(l) = m - 1\} \cap \{k_{w_2}(p - l) \ge m\}$ and the event
$\{k_{w_1}(l) > m - 1\} \cap \{k_{w_2}(p - l) = m\}$,
$$\{k_W(p) = m\} = \bigcup_n \bigcup_l \Big\{ \{k_{w_1}(l) = m-1\} \cap \{k_{w_2}(p-l) \ge m\} \Big\} \cup \Big\{ \{k_{w_1}(l) > m-1\} \cap \{k_{w_2}(p-l) = m\} \Big\}$$
Because $k_Q(0)$ is meaningless, the relation must be modified for the case
l = 0 to
$$\{k_W(p) = m\} = \{k_{w_2}(p) = m\}$$
and for the case l = p to
$$\{k_W(p) = m\} = \{k_{w_1}(p) = m - 1\}$$
This decomposition holds for any URT W1 and W2, not only for the specific
ones w1 and w2. The transition towards probabilities becomes
$$\Pr[k_W(p) = m] = \sum_{\text{all } w_1, w_2, n, l} \Big( \Pr[k_{w_1}(l) = m-1] \Pr[k_{w_2}(p-l) \ge m] + \Pr[k_{w_1}(l) > m-1] \Pr[k_{w_2}(p-l) = m] \Big) \Pr[W_1 = w_1, W_2 = w_2, Q_1 = n, p_1 = l]$$
Since W1 and W2 and also p1 are independent given Q1, the last probability
o simplifies to
$$o = \Pr[W_1 = w_1, W_2 = w_2, Q_1 = n, p_1 = l] = \Pr[W_1 = w_1 | Q_1 = n] \Pr[W_2 = w_2 | Q_1 = n] \Pr[p_1 = l | Q_1 = n] \Pr[Q_1 = n]$$
Theorem 16.2.1 states that Q1 is uniformly distributed over the set with
$Q - 1$ nodes such that $\Pr[Q_1 = n] = \frac{1}{Q-1}$. The fact that l out of the p
anycast members, uniformly chosen out of Q nodes, belong to the recursive
subtree W1 implies that $p - l$ remaining anycast members belong to W2.
Hence, analogous to a combinatorial problem outlined by Feller (1970, p. 43)
that leads to the hypergeometric distribution, we have
$$\Pr[p_1 = l | Q_1 = n] = \frac{\binom{n}{l} \binom{Q-n}{p-l}}{\binom{Q}{p}}$$
because all favorable combinations are those $\binom{n}{l}$ ways to distribute l anycast
members in W1 with n nodes multiplied by all favorable $\binom{Q-n}{p-l}$ ways to distribute the
remaining $p - l$ in W2 containing $Q - n$ nodes. The total number of ways to distribute
p anycast members over Q nodes is $\binom{Q}{p}$. Finally, we remark that the hopcount
of the shortest path to p anycast members in a URT only depends on
its size. This means that the sum over all w1 of $\Pr[W_1 = w_1 | Q_1 = n]$, which
equals 1, disappears and likewise also the sum over all w2. Combining the
above leads to
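$$\Pr[k_Q(p) = m] = \sum_{n=1}^{Q-1} \sum_{l=1}^{p-1} \frac{\binom{n}{l} \binom{Q-n}{p-l}}{(Q-1) \binom{Q}{p}} \Big( \Pr[k_n(l) = m-1] \Pr[k_{Q-n}(p-l) \ge m] + \Pr[k_n(l) > m-1] \Pr[k_{Q-n}(p-l) = m] \Big)$$
$$\qquad + \sum_{n=1}^{Q-1} \frac{\binom{Q-n}{p}}{(Q-1) \binom{Q}{p}} \Pr[k_{Q-n}(p) = m] + \sum_{n=1}^{Q-1} \frac{\binom{n}{p}}{(Q-1) \binom{Q}{p}} \Pr[k_n(p) = m-1]$$
By substitution of $n' = Q - n$ and $p' = p - l$, we obtain the recursion,
$$\Pr[k_Q(p) = m] = \sum_{n=1}^{Q-1} \sum_{l=1}^{p-1} \frac{\binom{n}{l} \binom{Q-n}{p-l}}{(Q-1) \binom{Q}{p}} \Big( \Pr[k_n(l) = m-1] + \Pr[k_n(l) = m] \Big) \sum_{t=m}^{Q-n-1} \Pr[k_{Q-n}(p-l) = t]$$
$$\qquad + \sum_{n=1}^{Q-1} \frac{\binom{n}{p}}{(Q-1) \binom{Q}{p}} \Big( \Pr[k_n(p) = m] + \Pr[k_n(p) = m-1] \Big) \tag{18.11}$$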

This recursion (18.11) is solved numerically for Q = 20. The result is shown
in Fig. 18.4, which demonstrates that $\Pr[k_Q(p) > Q - p] = 0$ or that the
path with the longest hopcount to an anycast group of p members consists
of $Q - p$ links.
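A numerical solution of the recursion can be sketched with memoization. The code below is an illustrative implementation, not the program used for Fig. 18.4; the function name `urt_pmf` and the boundary conventions are assumptions: the probability is taken as zero outside $0 \le m \le Q - p$ (using Theorem 18.2.1), and a single-node tree has hopcount zero with probability one.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def urt_pmf(Q, p, m):
    """Probability that the nearest of p members is m hops away in a URT of Q nodes."""
    if not (1 <= p <= Q) or m < 0 or m > Q - p:
        return Fraction(0)          # outside the support (assumed convention)
    if Q == 1:
        return Fraction(1)          # the only node is both root and server
    norm = (Q - 1) * comb(Q, p)
    total = Fraction(0)
    for n in range(1, Q):
        # Double sum of (18.11): l members in the subtree with n nodes.
        for l in range(1, p):
            weight = comb(n, l) * comb(Q - n, p - l)
            if weight == 0:
                continue
            tail = sum(urt_pmf(Q - n, p - l, t) for t in range(m, Q - n))
            total += Fraction(weight, norm) * (urt_pmf(n, l, m - 1) + urt_pmf(n, l, m)) * tail
        # Single sum of (18.11): all p members in one subtree.
        total += Fraction(comb(n, p), norm) * (urt_pmf(n, p, m) + urt_pmf(n, p, m - 1))
    return total


print([urt_pmf(4, 2, m) for m in range(4)])
```

Exact fractions keep the small cases checkable by hand; for sizes such as Q = 20, switching to floating point and caching the tail sums would reduce the running time considerably.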
Since there are $(Q - 1)!$ possible recursive trees (Theorem 16.2.2) and
there is only one line tree with $Q - 1$ hops where each node has precisely
one child node, the probability to have precisely $Q - 1$ hops from the root is
$\frac{1}{(Q-1)!}$ (which also is $\Pr[k_Q = Q - 1]$ given in (16.8)). The longest possible
hopcount from a root to p anycast members occurs in the line tree where
all p anycast members occupy the last p positions. Hence, the probability
for the longest possible hopcount equals
$$\Pr[k_Q(p) = Q - p] = \frac{p!}{(Q-1)! \binom{Q}{p}} \tag{18.12}$$
because there are p! possible ways to distribute the p anycast members
at the p last positions in the line tree while there are $\binom{Q}{p}$ possibilities to
distribute p anycast members at arbitrary places in the line tree.
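For very small trees, (18.12) can be checked by brute force through the law of total probability (18.9). The sketch below is an assumption-laden illustration: it generates every recursive tree by letting node i (numbered from 0 at the root) attach to any earlier node, weighs all $(Q-1)!$ trees equally, and applies (18.6) to each level set. The names `level_sizes` and `urt_pmf_enum` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial


def level_sizes(parents):
    """Level sizes of the tree in which node i + 1 has parent parents[i] (root is node 0)."""
    depth = [0]
    for par in parents:
        depth.append(depth[par] + 1)
    sizes = [0] * (max(depth) + 1)
    for d in depth:
        sizes[d] += 1
    return sizes


def urt_pmf_enum(Q, p):
    """Distribution of the minimum hopcount, averaged over all recursive trees of size Q."""
    trees = list(product(*(range(i) for i in range(1, Q))))
    pmf = [Fraction(0)] * Q
    for parents in trees:
        remaining = Q
        for m, x in enumerate(level_sizes(parents)):
            pmf[m] += Fraction(comb(remaining, p) - comb(remaining - x, p),
                               comb(Q, p) * len(trees))
            remaining -= x
    return pmf


Q, p = 5, 2
print(urt_pmf_enum(Q, p)[Q - p], Fraction(factorial(p), factorial(Q - 1) * comb(Q, p)))
```

The enumeration grows as $(Q-1)!$, so this check is only practical for roughly Q below 10.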
[Figure 18.4: plot of Pr[k_Q(p) = m] on a logarithmic axis (from 10^-1 down to 10^-19) versus the hops m for Q = 20, one curve for each anycast group size p.]
Fig. 18.4. The pdf of $k_Q(p)$ in a URT with Q = 20 nodes for all possible p.
Observe that $\Pr[k_Q(p) > Q - p] = 0$. This relation connects the various curves
to the value for p.

Figure 18.4 allows us to solve the server placement problem. For example,
consider the scenario in which a network operator announces that
any user request will reach a server of the anycast group in no more than
m = 4 hops in 99.9% of the cases. Assuming his network has Q = 20 routers
and the shortest path tree is a URT, the network operator has to compute
the number of anycast servers p he has to place uniformly spread over the
Q = 20 routers by solving $\Pr[k_{20}(p) > 4] < 10^{-3}$. Figure 18.4 shows that
the intersection of the line m = 4 and the line $\Pr[k_{20}(p) = 4] = 10^{-3}$ is the
curve for p = 7. Since the curves for $p \ge 7$ are exponentially decreasing,
$\Pr[k_{20}(p) > 4]$ is safely approximated (see footnote 1) by $\Pr[k_{20}(p) = 4]$, which leads to
the placing of p = 7 servers. When following the line m = 4, we also observe
that the curves for p = 5, 6, 7, 8 lie near to that of p = 7. This means that
placing a server more does not considerably change the situation. It is a
manifestation of the law $\eta \approx 1 - d \log p$, which tells us that by placing p
servers the gain measured in hops with respect to the single server case is
slowly, more precisely logarithmically, increasing. The performance measure
$\eta$ for the URT is drawn for several sizes Q in Fig. 18.5.

Footnote 1: More precisely, since $\Pr[k_{20}(4) > 4] = 0.00106$ and $\Pr[k_{20}(5) > 4] = 0.00032$, only p = 5
servers are sufficient.

18.4.2 Analysis of the recursion relation


The product of two probabilities in the double sum in (18.11) seriously complicates
a possible analytic treatment. A relation for a generating function of
$\Pr[k_Q(p) = m]$ and other mathematical results are derived in Van Mieghem
(2004b). Here, we summarize the main results.
(a) Let us check $\Pr[k_Q(p) = 0] = \frac{p}{Q}$. Using $\Pr[k_n(l) > -1] = 1$, the
convention that $\Pr[k_n(l) = -1] = 0$ and $\Pr[k_{Q-n}(p-l) = 0] = \frac{p-l}{Q-n}$, the
right hand side of (18.11), denoted by u, simplifies to
$$u = \frac{1}{(Q-1) \binom{Q}{p}} \sum_{n=1}^{Q-1} \sum_{l=0}^{p} \frac{p-l}{Q-n} \binom{n}{l} \binom{Q-n}{p-l}$$
$$\;\; = \frac{1}{(Q-1) \binom{Q}{p}} \sum_{n=1}^{Q-1} \sum_{l=0}^{p-1} \binom{n}{l} \binom{Q-1-n}{p-1-l}$$
$$\;\; = \frac{1}{(Q-1) \binom{Q}{p}} \sum_{n=1}^{Q-1} \binom{Q-1}{p-1} = \frac{p}{Q}$$
(b) Observe that $\Pr[k_Q(Q) = m] = 0$ for m > 0.
(c) For p = 1,
$$\Pr[k_Q = m] = \frac{1}{Q-1} \sum_{n=1}^{Q-1} \frac{n}{Q} \Big( \Pr[k_n = m] + \Pr[k_n = m-1] \Big)$$
Multiplying both sides by $z^m$ and summing over all m leads to the recursion for
the generating function (16.6)
$$(Q + 1) \varphi_{Q+1}(z) = (z + Q) \varphi_Q(z)$$


(d) The case p = 2 is solved in Van Mieghem (2004b, Appendix) as

m
2(1)Q313m (m+1)
2(1)Q3m X
(n+m+1)
n 2n1
VQ
Pr [kQ (2) = m] =
VQ
+
(1)
Q!
Q !(Q  1)
n
n=1


m31 
2n + 1
2(1)Q3m X m + n + 1
n+m+1
(1)n VQ

+
n
m
Q !(Q  1)
n=0

(18.13)
In van der Hofstad et al. (2002b) we have demonstrated that the covariance
between the number of nodes at level u and m for u  m in the URT is

u
h
i (1)Q31 X
(u) (m)
(n+m+1)
n+m 2n + m  u
H [Q [Q =
VQ
(1)
(Q  1)!
n
n=0

For m  u = 1, the last term in (18.13) is recognized as


2n31 1 2n
= 2 n , the rst sum in (18.13) is
n

m
2(1)Q3m X
(n+m+1)
n 2n1
VQ
(1)
Q !(Q  1)
n
n=1

With

2(31)Q 313m (m+1)


VQ
Q!

l
k
(m31) (m)
H [Q
[Q

(Q2 )

. Since



(m) 2
H [Q
(m+1)
2(1)Q3m31 VQ

=

(Q  1)
Q!
2 Q2

= 2 Pr [kQ = m], we obtain



h
i
(m) 2
(m31) (m)
H [Q
H [Q [Q
2Q
Pr [kQ (2) = m] =

Pr [kQ = m] +

Q
Q 1
2 Q2
2

m
2(1)Q31 X m + n
n+m
(1)n+m VQ
+
Q !(Q  1)
n
n=1

It would be of interest to nd an interpretation for the last sum.


Without proof2 , we mention the following exact results:
Q
X
Q
p=1

Pr [kQ (p) = Q  2] =

Q1
X
p=1

Q  1 Pr [kQ 1 (p) = Q  3]
1
+
Q 1
(Q  2)!
p

For p  Q  3, it holds that


p!
Pr [kQ (p) = Q  p  1] =

(Q  1)! Q
p
2

"
#
p
X
Q
p+1
+ (p  1)(p@2 + 1) +
2
n

By substitution into the recursion (18.11), one may verify these relations.

n=2


18.5 Approximate analysis


Since the general solution (18.9) is in many cases difficult to compute as
shown for the URT in Section 18.4, we consider a simplified version of the
above problem where each node in the tree has equal probability $s = \frac{p}{Q}$
to be a server. Instead of having precisely p servers, the simplified version
considers on average p servers and the probability that there are precisely
p servers is $\binom{Q}{p} s^p (1-s)^{Q-p}$. In the simplified version, the associated
equations to (18.4) and (18.3) are
$$\Pr\left[\{p^{(m)} = 0\} \,\middle|\, h_{m-1}\right] = \Pr\left[\{p^{(m)} = 0\}\right] = (1-s)^{X_Q^{(m)}}$$
$$\Pr[h_m] = \prod_{o=0}^{m} \Pr\left[\{p^{(o)} = 0\}\right] = (1-s)^{\sum_{o=0}^{m} X_Q^{(o)}}$$
which implies that the probability that there are no servers in the tree is
$(1-s)^Q$. Since in that case the hopcount is meaningless, we consider
the conditional probability (18.2) of the hopcount, given that the level set
contains at least one server. This conditional hopcount is denoted by $\tilde{k}_Q(p)$ and obeys
$$\Pr\left[\tilde{k}_Q(p) = m | O_Q\right] = \frac{\left(1 - (1-s)^{X_Q^{(m)}}\right) (1-s)^{\sum_{o=0}^{m-1} X_Q^{(o)}}}{1 - (1-s)^Q}$$
Thus,
$$\Pr\left[\tilde{k}_Q(p) \le q | O_Q\right] = \frac{1 - (1-s)^{\sum_{o=0}^{q} X_Q^{(o)}}}{1 - (1-s)^Q}$$
Finally, to avoid the knowledge of the entire level set $O_Q$, we use $H[X_Q^{(o)}] = Q \Pr[k_Q(1) = o]$
from (16.7) as the best estimate for each $X_Q^{(o)}$ and obtain
the approximate formula
$$\Pr\left[\tilde{k}_Q(p) = m\right] = \frac{\left(1 - (1-s)^{H[X_Q^{(m)}]}\right) (1-s)^{\sum_{o=0}^{m-1} H[X_Q^{(o)}]}}{1 - (1-s)^Q} \tag{18.14}$$
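The approximation (18.14) is cheap to evaluate once the expected level sizes are known. The sketch below is an illustration for the URT only: it obtains the unicast hopcount distribution from the generating function recursion $(Q+1)\varphi_{Q+1}(z) = (z+Q)\varphi_Q(z)$, started from $\varphi_1(z) = 1$ (an assumption corresponding to a single-node tree), and then applies (18.14). The function names are hypothetical.

```python
def urt_hop_pmf(Q):
    """Unicast hopcount distribution in a URT with Q nodes, for hops 0 .. Q - 1."""
    pmf = [1.0]                       # one node: hopcount 0 with probability 1
    for size in range(1, Q):
        prev = pmf + [0.0]
        # Coefficient form of (size + 1) * phi_{size+1}(z) = (z + size) * phi_size(z).
        pmf = [((prev[m - 1] if m > 0 else 0.0) + size * prev[m]) / (size + 1)
               for m in range(size + 1)]
    return pmf


def approx_pmf(Q, p):
    """Approximate distribution (18.14) with each node a server with probability p / Q."""
    s = p / Q
    expected_levels = [Q * pr for pr in urt_hop_pmf(Q)]
    out, below = [], 0.0              # below = expected number of nodes in lower levels
    for x in expected_levels:
        out.append((1 - (1 - s) ** x) * (1 - s) ** below / (1 - (1 - s) ** Q))
        below += x
    return out


approx = approx_pmf(20, 5)
print(sum(approx))
print(sum(m * pr for m, pr in enumerate(approx)))   # approximate mean hopcount
```

The sum over all hops telescopes to one, which gives a simple sanity check on any implementation of (18.14).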
In the dotted lines in Fig. 18.5, we have added the approximate result for
the URT where $H[k_Q(p)]$ is computed based on (18.14), but where $H[k_Q(1)]$
is computed exactly. For p = 1, the approximate analysis (18.14) is not
well suited: Fig. 18.5 illustrates this deviation in the fact that
$\eta_{\text{appr}}(1) = H[\tilde{k}_Q(1)] / H[k_Q(1)] < 1$. For higher values of p we observe a fairly good
correspondence. We found that the probability (18.14) reasonably approximates
the exact result plotted on a linear scale. Only the tail behavior (on
log-scale) and the case for p = 1 deviate significantly. In summary for the
URT, the approximation (18.14) for $\Pr[k_Q(p) = m]$ is much faster to compute
than the exact recursion and it seems appropriate for the computation
of $\eta$ for p > 1. However, it is less adequate to solve the server placement
problem that requires the tail values $\Pr[k_Q(p) > m]$.
[Figure 18.5: plot of the performance measure versus p/Q on a logarithmic axis for URTs of size Q = 10, 20, 30, 50, with fitted logarithmic laws of magnitude 0.404 ln(p/Q), 0.295 ln(p/Q), 0.252 ln(p/Q) and 0.210 ln(p/Q), respectively.]
Fig. 18.5. The performance measure $\eta$ for several sizes Q of URTs as a function of
the ratio p/Q

18.6 The performance measure $\eta$ in exponentially growing trees

In this section, we investigate the observed law $\eta \approx 1 - d \log p$ for a much
larger class of trees, namely the class of exponentially growing trees to which
both the n-ary tree and the URT belong. Also most trees in the Internet
are exponentially growing trees. A tree is said to grow exponentially in the
number of nodes Q with degree $\kappa$ if $\lim_{m \to \infty} \left(X_Q^{(m)}\right)^{1/m} = \kappa$ or, equivalently,
$X_Q^{(m)} \sim \kappa^m$, for large m. The fundamental problem with this definition is that
it only holds for infinite graphs $Q = \infty$. For real (finite) graphs, there must
exist some level m = o for which the sequence $X_Q^{(o+1)}, X_Q^{(o+2)}, \ldots, X_Q^{(Q-1)}$
ceases to grow because $\sum_{m=0}^{Q-1} X_Q^{(m)} = Q < \infty$. This boundary effect complicates
the definition of exponential growth in finite graphs. The second
complication is that even in the finite set $X_Q^{(0)}, X_Q^{(1)}, \ldots, X_Q^{(o)}$ not necessarily
all $X_Q^{(m)}$ with $0 \le m \le o$ need to obey $X_Q^{(m)} \sim \kappa^m$, but enough should.
Without the limit concept, we cannot specify the precise conditions of exponential
growth in a finite shortest path tree. If we assume in finite graphs
that $X_Q^{(m)} \sim \kappa^m$ for $m \le o$, then $\sum_{m=0}^{o} X_Q^{(m)} = \alpha Q$ with $0 < \alpha < 1$. Indeed,
for $\kappa > 1$, the highest hopcount level o possesses by far the most nodes since
$\frac{\kappa^{o+1} - 1}{\kappa - 1} \sim \kappa^o$, which cannot be larger than a fraction $\alpha Q$ of the total number
of nodes.
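We now present an order calculus to estimate $\eta$ for exponentially growing
trees based on relation (18.8). Let us denote
$$y = \frac{\binom{Q-x}{p}}{\binom{Q}{p}} = \prod_{m=0}^{p-1} \left( 1 - \frac{x}{Q - m} \right)$$
For large Q and fixed p,
$$y = \exp\left( -\frac{xp}{Q} \right) (1 + o(1))$$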
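The quality of this exponential approximation is easy to probe numerically. The following sketch compares the exact ratio of binomial coefficients, the equivalent product form and the exponential; the function names and the sample values of Q, x and p are arbitrary choices for illustration.

```python
from math import comb, exp, prod


def y_exact(Q, x, p):
    """Ratio of binomial coefficients C(Q - x, p) / C(Q, p)."""
    return comb(Q - x, p) / comb(Q, p)


def y_product(Q, x, p):
    """The same quantity written as a product of p factors."""
    return prod(1 - x / (Q - m) for m in range(p))


def y_asymptotic(Q, x, p):
    """Leading-order approximation for large Q and fixed p."""
    return exp(-x * p / Q)


Q, x, p = 10**6, 1000, 5
print(y_exact(Q, x, p), y_product(Q, x, p), y_asymptotic(Q, x, p))
```

Increasing x towards Q or letting p grow with Q makes the gap between the exact value and the exponential visible, which is consistent with the restriction to fixed p.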
(m)

In the case where the tree is exponentially growing for m  o as [Q = m m


with m some slowly varying sequence, only very few levels o (bounded by
P
(n)
a xed number) around o obey qn=0 [Q = R(Q ) where q 5 [o  o> o],
Pq
(n)
while for all m A o, we have n=0 [Q = q Q with some sequence q ?
P
(n)
q+1 ? max q = 1. Applied to (18.8) where { = qn=0 [Q ? Q ,

Q32

p
X (13q )Q
q
p
H [kQ (p)|OQ ]  (1 + r (1))
exp  q  +
Q
Q
p
q=0
o
X

q=o+1

If there are only a few levels more than o, the last series is much smaller than
1 and can be omitted. Since the slowly varying sequence q is unknown, we
approximate q =  and
Z p o 3x
Z o

p
p
Q
h
1
q
q
gx
exp  q  
exp   gq =
p
Q
Q
log 
x

0
q=0
Q
Z " 3x
h
h3p
1
gx 

log  p  x
p log 
Q
p
h3p
p
1
 
 log  + R
=
log 
p
Q
Q
o
X

where in the last step a series (Abramowitz and Stegun, 1968, Section 5.1.11)


for the exponential integral is used. Thus,

3p
p
33 h p 3log p3log 
+R Q
1+
log Q

  (1 + r (1))
1
31

1  +hlog+log
+
R
Q
Q

!
3p
h
1
log p h31  p
= (1 + r (1)) 1 

+R
log Q
log Q
log2 Q
Since by denition  = 1 for p = 1, we nally arrive at

3p
1
log p h31  h p

+R
 1
log Q
log Q
log2 Q
which supplies evidence for the conjecture $\eta \approx 1 - d \log p$ that exponentially
growing graphs possess a performance measure $\eta$ that logarithmically
decreases in p, which is rather slow.
Measurement data in the Internet seem to support this log p-scaling law.
Apart from the correspondence with figures in the work of Jamin et al.
(2001), Fig. 6 in Krishnan et al. (2000) shows that the relative measured
traffic flow reduction decreases logarithmically in the number of caches p.

Appendix A
Stochastic matrices

This appendix reviews the matrix theory for Markov chains. In-depth analyses are found in classical books by Gantmacher (1959a,b), Wilkinson (1965)
and Meyer (2000).
A.1 Eigenvalues and eigenvectors
1. The algebraic eigenproblem consists in the determination of the eigenvalues
$\lambda$ and the corresponding eigenvectors x of a matrix D for which the
set of q homogeneous linear equations in q unknowns
$$D x = \lambda x \tag{A.1}$$
has a non-zero solution. Clearly, the zero vector x = 0 is always a solution of
(A.1). A non-zero solution of (A.1) is possible if and only if the matrix
$D - \lambda L$ is singular, that is
$$\det (D - \lambda L) = 0 \tag{A.2}$$
This determinant can be expanded in a polynomial in $\lambda$ of degree q,
$$f(\lambda) = (-1)^q \lambda^q + f_{q-1} \lambda^{q-1} + \cdots + f_1 \lambda + f_0 = 0 \tag{A.3}$$
which is called the characteristic (eigenvalue) polynomial of the matrix D
and where the coefficients
$$f_n = (-1)^n \sum_{\text{all}} P_{q-n} \tag{A.4}$$
and $P_n$ is a principal minor (see footnote 1). Since a polynomial of degree q has q complex
zeros, the matrix D possesses q eigenvalues $\lambda_n$, not all necessarily distinct.
In general, the characteristic polynomial can be written as
$$f(\lambda) = \prod_{n=1}^{q} (\lambda_n - \lambda) \tag{A.5}$$
Since $f(\lambda) = \det (D - \lambda L)$, it follows from (A.3) and (A.5) that for $\lambda = 0$
$$\det D = f_0 = \prod_{n=1}^{q} \lambda_n \tag{A.6}$$
Hence, if $\det D = 0$, there is at least one zero eigenvalue. Also,
$$(-1)^{q-1} f_{q-1} = \sum_{n=1}^{q} \lambda_n = \text{trace}(D) \tag{A.7}$$
If all $\lambda_n \ge 0$, we can apply the general theorem of the arithmetic and
geometric mean (5.2) to (A.6) and (A.7) with $t_n = \frac{\alpha_n}{\sum_{m=1}^{q} \alpha_m}$,
$$\prod_{n=1}^{q} \lambda_n^{\alpha_n} \le \left( \frac{\sum_{n=1}^{q} \alpha_n \lambda_n}{\sum_{m=1}^{q} \alpha_m} \right)^{\sum_{m=1}^{q} \alpha_m}$$
and by choosing $\alpha_m = 1$, we find the inequality
$$\det D \le \left( \frac{\text{trace}(D)}{q} \right)^q$$
To any eigenvalue $\lambda$, the set (A.1) has at least one non-zero eigenvector x.
Furthermore, if x is a non-zero eigenvector, then nx is also a non-zero eigenvector
for any scalar $n \ne 0$. Therefore, eigenvectors are often normalized, for instance, a probabilistic
eigenvector has the sum of its components equal to 1 or a norm $\|x\|_1 = 1$
as defined in (A.23). If the rank of $D - \lambda L$ is less than $q - 1$, there will
be more than one independent eigenvector. Just these cases seriously complicate
the eigenvalue problem. In the sequel, we omit the discussion on multiple
eigenvalues and refer to Wilkinson (1965).

Footnote 1: A principal minor $P_n$ is the determinant of a principal $n \times n$ submatrix $P_{n \times n}$ obtained by
deleting the same $q - n$ rows and columns in D. Hence, the main diagonal elements $(P_{n \times n})_{ll}$
are n elements of the main diagonal elements $\{d_{ll}\}_{1 \le l \le q}$ of D.
2. The eigenproblem of the transpose DW ,
DW | = |

(A.8)

is of singular importance. Since the determinant


of a matrix is equal to the

W
determinant of its transpose, det D  L = det (D  L) which shows
that the eigenvalues of D and DW are the same. However, the eigenvectors
are, in general, dierent. Alternatively, we can write (A.8) as
|W D = | W

(A.9)

A.1 Eigenvalues and eigenvectors

437

The vector $y_j^T$ is therefore called the left-eigenvector of $A$ belonging to the eigenvalue $\lambda_j$, whereas $x_j$ is called the right-eigenvector belonging to the same eigenvalue $\lambda_j$. An important relation between left- and right-eigenvectors of a matrix $A$ is, for $\lambda_j \ne \lambda_k$,

$$y_j^T x_k = 0 \tag{A.10}$$

Indeed, left-multiplying (A.1) with $\lambda = \lambda_k$ by $y_j^T$,

$$y_j^T A x_k = \lambda_k\, y_j^T x_k$$

and similarly right-multiplying (A.9) with $\lambda = \lambda_j$ by $x_k$,

$$y_j^T A x_k = \lambda_j\, y_j^T x_k$$

leads, after subtraction, to $0 = (\lambda_k - \lambda_j)\, y_j^T x_k$ and (A.10) follows. Since eigenvectors may be complex in general and since $y_j^T x_k = x_k^T y_j$, the expression $y_j^T x_k$ is not an inner-product that is always real and for which $y_j^T x_k = (x_k^T y_j)^*$ holds. However, (A.10) expresses that the sets of left- and right-eigenvectors are orthogonal if $\lambda_j \ne \lambda_k$.
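The orthogonality relation (A.10) can be checked by hand on a small case. The sketch below uses a hypothetical $2 \times 2$ matrix whose eigenvalues (2 and 3) and eigenvectors were worked out by hand; the helper names are illustrative, not part of the text.

```python
def matvec(A, x):
    """Right-multiplication A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def vecmat(y, A):
    """Left-multiplication y^T A."""
    return [sum(y[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical matrix with distinct eigenvalues 2 and 3
A = [[2, 1],
     [0, 3]]
right = {2: [1, 0], 3: [1, 1]}    # A x = lambda x
left = {2: [1, -1], 3: [0, 1]}    # y^T A = lambda y^T

for lam in (2, 3):
    assert matvec(A, right[lam]) == [lam * v for v in right[lam]]
    assert vecmat(left[lam], A) == [lam * v for v in left[lam]]

# (A.10): left- and right-eigenvectors of different eigenvalues are orthogonal
cross_23 = dot(left[2], right[3])
cross_32 = dot(left[3], right[2])
same_22 = dot(left[2], right[2])
same_33 = dot(left[3], right[3])
print(cross_23, cross_32, same_22, same_33)
```

The products for equal indices are non-zero, so the eigenvectors can be scaled to $y_k^T x_k = 1$.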
3. If $A$ has $n$ distinct eigenvalues, then the $n$ eigenvectors are linearly independent and span the whole $n$ dimensional space. The proof is by reductio ad absurdum. Assume that $s$ is the smallest number of linearly dependent eigenvectors labelled by the first $s$ smallest indices. Linear dependence then means that

$$\sum_{k=1}^{s} \alpha_k x_k = 0 \tag{A.11}$$

where $\alpha_k \ne 0$ for $1 \le k \le s$. Left-multiplying by $A$ and using (A.1) yields

$$\sum_{k=1}^{s} \alpha_k \lambda_k x_k = 0 \tag{A.12}$$

On the other hand, multiplying (A.11) by $\lambda_s$ and subtracting from (A.12) leads to

$$\sum_{k=1}^{s-1} \alpha_k (\lambda_k - \lambda_s)\, x_k = 0,$$

which, because all eigenvalues are distinct, implies that there is a smaller set of $s - 1$ linearly dependent eigenvectors. This contradicts the initial hypothesis.
This important property has a number of consequences. First, it applies to left- as well as to right-eigenvectors. Relation (A.10) then shows that the sets of left- and right-eigenvectors form a bi-orthogonal system with $y_k^T x_k \ne 0$. For, if $x_k$ were orthogonal to $y_k$ (or $y_k^T x_k = 0$), (A.10) demonstrates that $x_k$ would be orthogonal to all left-eigenvectors $y_j$. Since the set of left-eigenvectors spans the $n$ dimensional vector space, it would mean that the $n$ dimensional vector $x_k$ would be orthogonal to the whole $n$-space, which is impossible because $x_k$ is not the null vector. Second, any $n$ dimensional vector can be written in terms of either the left- or right-eigenvectors.
4. Let us denote by $X$ the matrix with in column $j$ the right-eigenvector $x_j$ and by $Y^T$ the matrix with in row $k$ the left-eigenvector $y_k^T$. If the right- and left-eigenvectors are scaled such that, for all $1 \le k \le n$, $y_k^T x_k = 1$, then

$$Y^T X = I \tag{A.13}$$

or, the matrix $Y^T$ is the inverse of the matrix $X$. Furthermore, for any right-eigenvector, (A.1) holds, rewritten in matrix form, that

$$A X = X \operatorname{diag}(\lambda_k) \tag{A.14}$$

Left-multiplying by $X^{-1} = Y^T$ yields the similarity transform of matrix $A$,

$$X^{-1} A X = Y^T A X = \operatorname{diag}(\lambda_k) \tag{A.15}$$

Thus, when the eigenvalues of $A$ are distinct, there exists a similarity transform $H^{-1} A H$ that reduces $A$ to diagonal form. In many applications, similarity transforms are applied to simplify matrix problems. Observe that a similarity transform preserves the eigenvalues, because, if $Ax = \lambda x$, then $\lambda H^{-1} x = H^{-1} A x = (H^{-1} A H) H^{-1} x$. The eigenvectors are transformed to $H^{-1} x$.
When $A$ has multiple eigenvalues, it may be impossible to reduce $A$ to a diagonal form by similarity transforms. Instead of a diagonal form, the most compact form when $A$ has $r$ distinct eigenvalues each with multiplicity $m_j$ such that $\sum_{j=1}^{r} m_j = n$ is the Jordan canonical form $C$,

$$C = \begin{bmatrix} C_{m_1 - a}(\lambda_1) & & & & \\ & C_{a}(\lambda_1) & & & \\ & & \ddots & & \\ & & & C_{m_{r-1}}(\lambda_{r-1}) & \\ & & & & C_{m_r}(\lambda_r) \end{bmatrix}$$

where $C_m(\lambda)$ is an $m \times m$ submatrix of the form

$$C_m(\lambda) = \begin{bmatrix} \lambda & 1 & 0 & \cdots & 0 \\ 0 & \lambda & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda & 1 \\ 0 & \cdots & 0 & 0 & \lambda \end{bmatrix}$$

The number of independent eigenvectors is equal to the number of submatrices. If an eigenvalue $\lambda$ has multiplicity $m$, there can be one large submatrix $C_m(\lambda)$, but also a number $k$ of smaller submatrices $C_{b_j}(\lambda)$ such that $\sum_{j=1}^{k} b_j = m$. This illustrates, as mentioned in art. 1, the much higher complexity of the eigenproblem in case of multiple eigenvalues. For more details we refer to Wilkinson (1965).
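5. The companion matrix of the characteristic polynomial (A.3) of $A$ is defined as

$$C = \begin{bmatrix} (-1)^{n-1} c_{n-1} & (-1)^{n-1} c_{n-2} & \cdots & (-1)^{n-1} c_1 & (-1)^{n-1} c_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

Expanding $\det(C - \lambda I)$ in cofactors of the first row yields $\det(C - \lambda I) = c(\lambda)$. If $A$ has distinct eigenvalues, $A$ as well as $C$ are similar to $\operatorname{diag}(\lambda_i)$. It has been shown that the similarity transform $H$ for $A$ equals $H = X$. The similarity transform for $C$ is the Vandermonde matrix $V(\lambda)$, where

$$V(x) = \begin{bmatrix} x_1^{n-1} & x_2^{n-1} & \cdots & x_{n-1}^{n-1} & x_n^{n-1} \\ x_1^{n-2} & x_2^{n-2} & \cdots & x_{n-1}^{n-2} & x_n^{n-2} \\ \vdots & \vdots & & \vdots & \vdots \\ x_1 & x_2 & \cdots & x_{n-1} & x_n \\ 1 & 1 & \cdots & 1 & 1 \end{bmatrix}$$

The Vandermonde matrix $V(\lambda)$ is clearly non-singular if all eigenvalues are distinct. Furthermore,

$$V(\lambda) \operatorname{diag}(\lambda_i) = \begin{bmatrix} \lambda_1^{n} & \lambda_2^{n} & \cdots & \lambda_{n-1}^{n} & \lambda_n^{n} \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_{n-1}^{n-1} & \lambda_n^{n-1} \\ \vdots & \vdots & & \vdots & \vdots \\ \lambda_1^{2} & \lambda_2^{2} & \cdots & \lambda_{n-1}^{2} & \lambda_n^{2} \\ \lambda_1 & \lambda_2 & \cdots & \lambda_{n-1} & \lambda_n \end{bmatrix}$$

while

$$C\, V(\lambda) = \begin{bmatrix} (-1)^{n-1} c(\lambda_1) + \lambda_1^{n} & (-1)^{n-1} c(\lambda_2) + \lambda_2^{n} & \cdots & (-1)^{n-1} c(\lambda_n) + \lambda_n^{n} \\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1} \\ \vdots & \vdots & & \vdots \\ \lambda_1^{2} & \lambda_2^{2} & \cdots & \lambda_n^{2} \\ \lambda_1 & \lambda_2 & \cdots & \lambda_n \end{bmatrix}$$

Since $c(\lambda_j) = 0$, it follows that $C\,V(\lambda) = V(\lambda) \operatorname{diag}(\lambda_i)$, which demonstrates the claim. Hence, the eigenvector $x_k$ of $C$ belonging to eigenvalue $\lambda_k$ is

$$x_k^T = \begin{bmatrix} \lambda_k^{n-1} & \lambda_k^{n-2} & \cdots & \lambda_k & 1 \end{bmatrix}$$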
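The relation between the companion matrix and the Vandermonde matrix can be verified with integer arithmetic. The sketch below assumes the sign convention $c(\lambda) = \det(A - \lambda I) = \prod_k (\lambda_k - \lambda)$ and a hypothetical set of eigenvalues $\{1, 2, 3\}$; all names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

roots = [1, 2, 3]          # assumed distinct eigenvalues
n = len(roots)

# Coefficients c_0..c_n of c(lambda) = prod_k (lambda_k - lambda)
coeffs = [1]
for r in roots:
    new = [0] * (len(coeffs) + 1)
    for i, c in enumerate(coeffs):
        new[i] += r * c      # contribution of the constant r
        new[i + 1] -= c      # contribution of -lambda
    coeffs = new

# Companion matrix: first row (-1)^(n-1) c_{n-1}, ..., (-1)^(n-1) c_0,
# ones on the sub-diagonal
sign = (-1) ** (n - 1)
C = [[sign * coeffs[n - 1 - j] for j in range(n)]]
for i in range(1, n):
    C.append([1 if j == i - 1 else 0 for j in range(n)])

# Vandermonde matrix with column k equal to (lambda_k^(n-1), ..., lambda_k, 1)
V = [[roots[k] ** (n - 1 - i) for k in range(n)] for i in range(n)]

CV = matmul(C, V)
VD = [[V[i][k] * roots[k] for k in range(n)] for i in range(n)]
print(CV == VD)
```

Each column of `V` is thus an eigenvector of `C`, in agreement with the expression for $x_k^T$.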


6. When left-multiplying (A.1) by $A$, we obtain

$$A^2 x = \lambda A x = \lambda^2 x$$

or, in general for any integer $k \ge 0$,

$$A^k x = \lambda^k x \tag{A.16}$$

Since any eigenvalue $\lambda$ satisfies its characteristic polynomial $c(\lambda) = 0$, we directly find from (A.16) that the matrix $A$ satisfies its own characteristic equation,

$$c(A) = 0 \tag{A.17}$$

This result is the Cayley-Hamilton theorem. There exist several other proofs of the Cayley-Hamilton theorem.
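For a $2 \times 2$ matrix the characteristic polynomial is $c(\lambda) = \lambda^2 - \operatorname{trace}(A)\lambda + \det A$, so (A.17) can be checked directly. The sketch below does this for a hypothetical integer matrix; the names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical example matrix
A = [[1, 2],
     [3, 4]]
I2 = [[1, 0], [0, 1]]

trace_A = A[0][0] + A[1][1]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
A2 = matmul(A, A)

# c(A) = A^2 - trace(A) A + det(A) I should be the zero matrix
c_of_A = [[A2[i][j] - trace_A * A[i][j] + det_A * I2[i][j] for j in range(2)]
          for i in range(2)]
print(c_of_A)
```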
7. Consider an arbitrary matrix polynomial in $\lambda$,

$$F(\lambda) = \sum_{k=0}^{m} F_k \lambda^k$$

where all $F_k$ are $n \times n$ matrices and $F_m \ne O$. Any matrix polynomial $F(\lambda)$ can be right and left divided by another (non-zero) matrix polynomial $B(\lambda)$ in a unique way as proved in Gantmacher (1959a, Chapter IV). Hence the left-quotient and left-remainder $F(\lambda) = B(\lambda) Q_L(\lambda) + L(\lambda)$ and the right-quotient and right-remainder $F(\lambda) = Q_R(\lambda) B(\lambda) + R(\lambda)$ are unique. Let us concentrate on the right-remainder in the case where $B(\lambda) = \lambda I - A$ is a linear polynomial in $\lambda$. Using Euclid's division scheme for polynomials,

$$\begin{aligned} F(\lambda) &= F_m \lambda^{m-1} (\lambda I - A) + (F_m A + F_{m-1})\, \lambda^{m-1} + \sum_{k=0}^{m-2} F_k \lambda^k \\ &= \left[ F_m \lambda^{m-1} + (F_m A + F_{m-1})\, \lambda^{m-2} \right] (\lambda I - A) + \left( F_m A^2 + F_{m-1} A + F_{m-2} \right) \lambda^{m-2} + \sum_{k=0}^{m-3} F_k \lambda^k \end{aligned}$$

and continued, we arrive at

$$F(\lambda) = \left[ F_m \lambda^{m-1} + \cdots + \lambda^{k-1} \sum_{j=k}^{m} F_j A^{j-k} + \cdots + \sum_{j=1}^{m} F_j A^{j-1} \right] (\lambda I - A) + \sum_{j=0}^{m} F_j A^{j}$$

In summary, $F(\lambda) = Q_R(\lambda)(\lambda I - A) + R(\lambda)$ (and similarly for the left-quotient and left-remainder) with

$$\begin{aligned} Q_R(\lambda) &= \sum_{k=1}^{m} \lambda^{k-1} \sum_{j=k}^{m} F_j A^{j-k}, & Q_L(\lambda) &= \sum_{k=1}^{m} \lambda^{k-1} \sum_{j=k}^{m} A^{j-k} F_j \\ R(\lambda) &= \sum_{j=0}^{m} F_j A^{j} = F(A), & L(\lambda) &= \sum_{j=0}^{m} A^{j} F_j \end{aligned} \tag{A.18}$$

and where the right-remainder is independent of $\lambda$. The Generalized Bézout Theorem states that the polynomial $F(\lambda)$ is divisible by $(\lambda I - A)$ on the right (left) if and only if $F(A) = O$ ($L(\lambda) = O$).

By the Generalized Bézout Theorem, the polynomial $F(\lambda) = g(\lambda) I - g(A)$ is divisible by $(\lambda I - A)$ because $F(A) = g(A) I - g(A) = O$. If $F(\lambda)$ is an ordinary polynomial, the right- and left-quotient and remainder are equal. The Cayley-Hamilton Theorem (A.17) states that $c(A) = 0$, which indicates that $c(\lambda) I = Q(\lambda)(\lambda I - A)$ and also $c(\lambda) I = (\lambda I - A) Q(\lambda)$. The matrix $Q(\lambda) = (\lambda I - A)^{-1} c(\lambda)$ is called the adjoint matrix of $A$. Explicitly, from (A.18),

$$Q(\lambda) = \sum_{k=1}^{n} \lambda^{k-1} \left( \sum_{j=k}^{n} c_j A^{j-k} \right)$$

and, with (A.6), $Q(0) = -A^{-1} \det A = \sum_{j=1}^{n} c_j A^{j-1}$. The main theoretical interest of the adjoint matrix stems from its definition $c(\lambda) I = Q(\lambda)(\lambda I - A) = (\lambda I - A) Q(\lambda)$ in case $\lambda = \lambda_k$ is an eigenvalue of $A$. Then, $(\lambda_k I - A)\, Q(\lambda_k) = 0$, which indicates by (A.1) that every non-zero column of the adjoint matrix $Q(\lambda_k)$ is an eigenvector belonging to the eigenvalue $\lambda_k$. In addition, by differentiation with respect to $\lambda$, we obtain

$$c'(\lambda) I = (\lambda I - A)\, Q'(\lambda) + Q(\lambda)$$

This demonstrates that, if $Q(\lambda_k) \ne O$, the eigenvalue $\lambda_k$ is a simple root of $c(\lambda)$ and, conversely, if $Q(\lambda_k) = O$, the eigenvalue $\lambda_k$ has higher multiplicity.

The adjoint matrix $Q(\lambda) = (\lambda I - A)^{-1} c(\lambda)$ is computed by observing that, on the Generalized Bézout Theorem, $\frac{c(\lambda) - c(\mu)}{\lambda - \mu}$ is divisible without remainder. By replacing in this polynomial $\lambda$ and $\mu$ by $\lambda I$ and $A$ respectively, $Q(\lambda)$ readily follows as illustrated in Section A.4.2.
8. Consider the arbitrary polynomial of degree $l$,

$$g(x) = g_0 \prod_{j=1}^{l} (x - \mu_j)$$

Substitute $x$ by $A$, then

$$g(A) = g_0 \prod_{j=1}^{l} (A - \mu_j I)$$

Since $\det(AB) = \det A \det B$ and $\det(kA) = k^n \det A$, we have

$$\det(g(A)) = g_0^n \prod_{j=1}^{l} \det(A - \mu_j I) = g_0^n \prod_{j=1}^{l} c(\mu_j)$$

With (A.5),

$$\det(g(A)) = g_0^n \prod_{j=1}^{l} \prod_{k=1}^{n} (\lambda_k - \mu_j) = \prod_{k=1}^{n} g_0 \prod_{j=1}^{l} (\lambda_k - \mu_j) = \prod_{k=1}^{n} g(\lambda_k)$$

If $h(x) = g(x) - \lambda$, we arrive at the general result: For any polynomial $g(x)$, the eigenvalues of $g(A)$ are $g(\lambda_1), \ldots, g(\lambda_n)$ and the characteristic polynomial is

$$\det(g(A) - \lambda I) = \prod_{k=1}^{n} (g(\lambda_k) - \lambda) \tag{A.19}$$

which is a polynomial in $\lambda$ of degree at most $n$. Since the result holds for an arbitrary polynomial, it should not surprise that, under appropriate conditions of convergence, it can be extended to infinite polynomials, in particular to the Taylor series of a complex function. As proved in Gantmacher (1959a, Chapter V), if the power series of a function $f(z)$ around $z = z_0$,

$$f(z) = \sum_{j=0}^{\infty} f_j(z_0)\,(z - z_0)^j \tag{A.20}$$

converges for all $z$ in the disc $|z - z_0| < R$, then $f(A) = \sum_{j=0}^{\infty} f_j(z_0)\,(A - z_0 I)^j$ provided all eigenvalues of $A$ lie within the region of convergence of (A.20), i.e. $|\lambda - z_0| < R$. For example,

$$e^{Az} = \sum_{k=0}^{\infty} \frac{z^k A^k}{k!} \quad \text{for all } A$$

$$\log A = \sum_{k=1}^{\infty} \frac{(-1)^{k-1}}{k} (A - I)^k \quad \text{for } |\lambda_k - 1| < 1, \text{ all } 1 \le k \le n$$

and, from (A.19), the eigenvalues of $e^{Az}$ are $e^{z\lambda_1}, \ldots, e^{z\lambda_n}$. Hence, the knowledge of the eigenstructure of a matrix $A$ allows us to compute any function of $A$ (under the same convergence restrictions as complex numbers $z$).

A.2 Hermitian and real symmetric matrices

A Hermitian matrix $A$ is a complex matrix that obeys $A^H = (A^*)^T = A$, where $a^H = (a_{ij})^*$ is the complex conjugate of $a_{ij}$. Hermitian matrices possess a number of attractive properties. A particularly interesting subclass of Hermitian matrices are real, symmetric matrices that obey $A^T = A$. The inner-product of vector $y$ and $x$ is defined as $y^H x$ and obeys $(y^H x)^* = (y^H x)^H = x^H y$. The inner-product $x^H x = \sum_{j=1}^{n} |x_j|^2$ is real and positive for all vectors except for the null vector.
9. The eigenvalues of a Hermitian matrix are all real. Indeed, left-multiplying (A.1) by $x^H$ yields

$$x^H A x = \lambda\, x^H x$$

and, since $(x^H A x)^H = x^H A^H x = x^H A x$, it follows that $\lambda\, x^H x = \lambda^H x^H x$ or $\lambda = \lambda^H$ because $x^H x$ is a positive real number. Furthermore, since $A = A^H$, we have

$$A^H x = \lambda x$$

Taking the complex conjugate yields

$$A^T x^* = \lambda x^*$$

In general, the eigenvectors of a Hermitian matrix are complex, but real for a real symmetric matrix since $A^H = A^T$. Moreover, the left-eigenvector $y^T$ is the complex conjugate of the right-eigenvector $x$. Hence, the orthogonality relation (A.10) reduces, after normalization, to an inner-product

$$x_k^H x_j = \delta_{kj} \tag{A.21}$$

where $\delta_{kj}$ is the Kronecker delta, which is zero if $k \ne j$ and else $\delta_{kk} = 1$. Consequently, (A.13) reduces to

$$X^H X = I$$

which implies that the matrix $X$ formed by the eigenvectors is a unitary matrix ($X^{-1} = X^H$). For a real symmetric matrix $A$, the corresponding relation $X^T X = I$ implies that $X$ is an orthogonal matrix ($X^{-1} = X^T$). Although the arguments so far (see Section A.1) have assumed that the eigenvalues of $A$ are distinct, the theorem applies in general (as proved in Wilkinson (1965, Section 47)): For any Hermitian matrix $A$, there exists a unitary matrix $U$ such that

$$U^H A U = \operatorname{diag}(\lambda_j) \qquad \text{real } \lambda_j$$

and for any real symmetric matrix $A$, there exists an orthogonal matrix $U$ such that

$$U^T A U = \operatorname{diag}(\lambda_j) \qquad \text{real } \lambda_j$$

10. To a real symmetric matrix $A$, a bilinear form $x^T A y$ is associated, which is a scalar defined as

$$x^T A y = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i\, y_j$$

We call a bilinear form a quadratic form if $y = x$. A necessary and sufficient condition for a quadratic form to be positive definite, i.e. $x^T A x > 0$ for all $x \ne 0$, is that all eigenvalues of $A$ should be positive. Indeed, art. 9 shows the existence of an orthogonal matrix $U$ that transforms $A$ to a diagonal form. Let $x = Uz$, then

$$x^T A x = z^T U^T A U z = \sum_{k=1}^{n} \lambda_k z_k^2 \tag{A.22}$$

which is only positive for all $z_k$ provided $\lambda_k > 0$ for all $k$. From (A.6), a positive definite quadratic form $x^T A x$ possesses a positive determinant, $\det A > 0$. This analysis shows that the problem of determining an orthogonal matrix $U$ (or the eigenvectors of $A$) is equivalent to the geometrical problem of determining the principal axes of the hyper-ellipsoid

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}\, x_i\, x_j = 1$$

Relation (A.22) illustrates that the eigenvalues $\lambda_k$ are the reciprocals of the squares of the principal semi-axes. A multiple eigenvalue refers to an indeterminacy of the principal axes. For example, if $n = 3$, an ellipsoid with two equal principal axes means that any section along the third axis is a circle. Any two perpendicular diameters of the largest circle orthogonal to the third axis are principal axes of that ellipsoid.

A.3 Vector and matrix norms


Vector and matrix norms, denoted by $\|x\|$ and $\|A\|$ respectively, provide a single number reflecting a "size" of the vector or matrix and may be regarded as an extension of the concept of the modulus of a complex number. A norm is a certain function of the vector components or matrix elements. All norms, vector as well as matrix norms, satisfy the three "distance" relations
(i) $\|x\| > 0$ unless $x = 0$
(ii) $\|\alpha x\| = |\alpha|\, \|x\|$ for any complex number $\alpha$
(iii) $\|x + y\| \le \|x\| + \|y\|$
In general, the Hölder $q$-norm of a vector $x$ is defined as

$$\|x\|_q = \left( \sum_{j=1}^{n} |x_j|^q \right)^{1/q} \tag{A.23}$$

For example, the well-known Euclidean norm or length of the vector $x$ is found for $q = 2$ and $\|x\|_2^2 = x^H x$. In probability theory where $x$ denotes a discrete pdf, the law of total probability states that $\|x\|_1 = \sum_{j=1}^{n} x_j = 1$ and we will write $\|x\|_1 = \|x\|$. Finally, $\max |x_j| = \lim_{q \to \infty} \|x\|_q = \|x\|_\infty$. The unit-spheres $S_q = \{x \mid \|x\|_q = 1\}$ are, in three dimensions $n = 3$, for $q = 1$, an octahedron, for $q = 2$, a ball and for $q = \infty$, a cube. Furthermore, $S_1$ fits into $S_2$, which in turn fits into $S_\infty$, which implies that $\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty$ for any $x$.

The Hölder inequality proved in Section 5.5 states that, for $\frac{1}{p} + \frac{1}{q} = 1$ and real $p, q > 1$,

$$\left| x^H y \right| \le \|x\|_p\, \|y\|_q \tag{A.24}$$

A special case of the Hölder inequality where $p = q = 2$ is the Cauchy-Schwarz inequality

$$\left| x^H y \right| \le \|x\|_2\, \|y\|_2 \tag{A.25}$$

The $q = 2$ norm is invariant under a unitary (hence also orthogonal) transformation $U$, where $U^H U = I$, because $\|Ux\|_2^2 = x^H U^H U x = x^H x = \|x\|_2^2$.
Another example of a non-homogeneous vector norm is the quadratic form $\sqrt{\|x\|_A} = \sqrt{x^T A x}$ provided $A$ is positive definite. Relation (A.22) shows that, if not all eigenvalues $\lambda_j$ of $A$ are the same, not all components of the vector $x$ are weighted similarly and, thus, in general, $\sqrt{\|x\|_A}$ is a non-homogeneous norm. The quadratic form $\|x\|_I$ equals the homogeneous Euclidean norm $\|x\|_2^2$.

A.3.1 Properties of norms


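All norms are equivalent in the sense that there exist positive real numbers $c_1$ and $c_2$ such that, for all $x$,

$$c_1 \|x\|_p \le \|x\|_q \le c_2 \|x\|_p$$

For example,

$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2$$

$$\|x\|_\infty \le \|x\|_1 \le n\, \|x\|_\infty$$

$$\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\, \|x\|_\infty$$

By choosing in the Hölder inequality (5.15) $p = q = 1$, $x_j \to \alpha_j x_j^s$ for real $s > 0$ and $y_j \to \alpha_j > 0$, we obtain with $0 < \theta < 1$ an inequality for the weighted $q$-norm

$$\left( \frac{\sum_{j=1}^{n} \alpha_j |x_j|^{s\theta}}{\sum_{j=1}^{n} \alpha_j} \right)^{\frac{1}{s\theta}} \le \left( \frac{\sum_{j=1}^{n} \alpha_j |x_j|^{s}}{\sum_{j=1}^{n} \alpha_j} \right)^{\frac{1}{s}}$$

For $\alpha_j = 1$, the weights $\alpha_j$ disappear such that the inequality for the Hölder $q$-norm becomes

$$\|x\|_{s\theta} \le \|x\|_{s}\; n^{\frac{1}{s}\left(\frac{1}{\theta} - 1\right)}$$

where $n^{\frac{1}{s}\left(\frac{1}{\theta} - 1\right)} \ge 1$. On the other hand, with $0 < \theta < 1$ and for real $s > 0$,

$$\frac{\|x\|_{s\theta}}{\|x\|_{s}} = \frac{\left( \sum_{j=1}^{n} |x_j|^{s\theta} \right)^{\frac{1}{s\theta}}}{\left( \sum_{k=1}^{n} |x_k|^{s} \right)^{\frac{1}{s}}} = \left( \sum_{j=1}^{n} \frac{|x_j|^{s\theta}}{\left( \sum_{k=1}^{n} |x_k|^{s} \right)^{\theta}} \right)^{\frac{1}{s\theta}} = \left( \sum_{j=1}^{n} \left( \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \right)^{\theta} \right)^{\frac{1}{s\theta}}$$

Since $y = \frac{|x_j|^s}{\sum_{k=1}^{n} |x_k|^s} \le 1$ and $\frac{1}{\theta} > 1$, it holds that $y^\theta \ge y$ and

$$\left( \sum_{j=1}^{n} \left( \frac{|x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \right)^{\theta} \right)^{\frac{1}{s\theta}} \ge \left( \frac{\sum_{j=1}^{n} |x_j|^{s}}{\sum_{k=1}^{n} |x_k|^{s}} \right)^{\frac{1}{s\theta}} = 1$$

which leads to the opposite inequality (without normalization as $n^{\frac{1}{s}\left(\frac{1}{\theta} - 1\right)}$),

$$\|x\|_{s\theta} \ge \|x\|_{s}$$

In summary, if $p > q > 0$, then the general inequality for the Hölder $q$-norm is

$$\|x\|_p \le \|x\|_q \le \|x\|_p\; n^{\frac{1}{q} - \frac{1}{p}} \tag{A.26}$$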
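A small numerical check of (A.26) may help to see how the two bounds behave. The sketch below evaluates the Hölder norms of a hypothetical vector for one pair $p > q > 0$, and also for the all-ones vector, where the upper bound is attained; the function name `holder_norm` is an assumption for illustration.

```python
def holder_norm(x, q):
    """Hoelder q-norm (A.23) of the vector x for real q > 0."""
    return sum(abs(v) ** q for v in x) ** (1.0 / q)

x = [3.0, -4.0, 1.0]       # hypothetical vector
n = len(x)
p, q = 3.0, 1.5            # any pair with p > q > 0

lower = holder_norm(x, p)
middle = holder_norm(x, q)
upper = lower * n ** (1.0 / q - 1.0 / p)
print(lower, middle, upper)

# For the all-ones vector the upper bound in (A.26) holds with equality
ones = [1.0] * n
ones_middle = holder_norm(ones, q)
ones_upper = holder_norm(ones, p) * n ** (1.0 / q - 1.0 / p)
print(ones_middle, ones_upper)
```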

For $m \times n$ matrices $A$, the most frequently used norms are the Euclidean or Frobenius norm

$$\|A\|_F = \left( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^2 \right)^{1/2} \tag{A.27}$$

and the $q$-norm

$$\|A\|_q = \sup_{x \ne 0} \frac{\|Ax\|_q}{\|x\|_q} \tag{A.28}$$

On the second "distance" relation, $\frac{\|Ax\|_q}{\|x\|_q} = \left\| A \frac{x}{\|x\|_q} \right\|_q$, which shows that

$$\|A\|_q = \sup_{\|x\|_q = 1} \|Ax\|_q \tag{A.29}$$

Furthermore, the matrix $q$-norm (A.28) implies that

$$\|Ax\|_q \le \|A\|_q\, \|x\|_q \tag{A.30}$$

Since the vector norm is a continuous function of the vector components and since the domain $\|x\|_q = 1$ is closed, there must exist a vector $x$ for which equality $\|Ax\|_q = \|A\|_q \|x\|_q$ holds. Since the $i$-th vector component of $Ax$ is $(Ax)_i = \sum_{j=1}^{n} a_{ij} x_j$, it follows from (A.23) that

$$\|Ax\|_q = \left( \sum_{i=1}^{m} \left| \sum_{j=1}^{n} a_{ij} x_j \right|^q \right)^{1/q}$$

For example, for all $x$ with $\|x\|_1 = 1$, we have that

$$\begin{aligned} \|Ax\|_1 &= \sum_{i=1}^{m} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \le \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|\, |x_j| = \sum_{j=1}^{n} |x_j| \sum_{i=1}^{m} |a_{ij}| \\ &\le \left( \sum_{j=1}^{n} |x_j| \right) \max_{j} \sum_{i=1}^{m} |a_{ij}| = \max_{j} \sum_{i=1}^{m} |a_{ij}| \end{aligned}$$

Clearly, there exists a vector $x$ for which equality holds, namely, if $k$ is the column in $A$ with maximum absolute sum, then $x = e_k$, the $k$-th basis vector with all components zero, except for the $k$-th one, which is 1. Similarly, for all $x$ with $\|x\|_\infty = 1$,

$$\|Ax\|_\infty = \max_{i} \left| \sum_{j=1}^{n} a_{ij} x_j \right| \le \max_{i} \sum_{j=1}^{n} |a_{ij}|\, |x_j| \le \max_{i} \sum_{j=1}^{n} |a_{ij}|$$

Again, if $r$ is the row with maximum absolute sum and $x_j = 1 \cdot \operatorname{sign}(a_{rj})$ such that $\|x\|_\infty = 1$, then $(Ax)_r = \sum_{j=1}^{n} |a_{rj}| = \max_i \sum_{j=1}^{n} |a_{ij}| = \|Ax\|_\infty$. Hence, we have proved that

$$\|A\|_\infty = \max_{i} \sum_{j=1}^{n} |a_{ij}| \tag{A.31}$$

$$\|A\|_1 = \max_{j} \sum_{i=1}^{m} |a_{ij}| \tag{A.32}$$

from which

$$\left\| A^H \right\|_1 = \|A\|_\infty$$

The $q = 2$ matrix norm, $\|Ax\|_2$, is obtained differently. Consider

$$\|Ax\|_2^2 = (Ax)^H A x = x^H A^H A x$$

Since $A^H A$ is a Hermitian matrix, art. 9 shows that all eigenvalues are real and non-negative because a norm $\|Ax\|_2^2 \ge 0$. These ordered eigenvalues are denoted as $\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_n^2 \ge 0$. Applying the theorem in art. 9, there exists a unitary matrix $U$ such that $x = Uz$ yields

$$x^H A^H A x = z^H U^H A^H A U z = z^H \operatorname{diag}\left(\sigma_j^2\right) z \le \sigma_1^2\, z^H z = \sigma_1^2 \|z\|_2^2$$

Since the $q = 2$ norm is invariant under a unitary (orthogonal) transform, $\|x\|_2 = \|z\|_2$, by the definition (A.28),

$$\|A\|_2 = \sup_{x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_1 \tag{A.33}$$

where the supremum is achieved if $x$ is the eigenvector of $A^H A$ belonging to $\sigma_1^2$. Meyer (2000, p. 279) proves the corresponding result for the minimum eigenvalue provided that $A$ is non-singular,

$$\left\| A^{-1} \right\|_2 = \frac{1}{\min_{\|x\|_2 = 1} \|Ax\|_2} = \sigma_n^{-1}$$

The non-negative quantity $\sigma_j$ is called the $j$-th singular value and $\sigma_1$ is the largest singular value of $A$. The importance of this result lies in an extension of the eigenvalue problem to non-square matrices which is called the singular value decomposition. A detailed discussion is found in Golub and Loan (1983). If $A$ has real eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, the above can be simplified and we obtain

$$\lambda_1 = \sup_{x \ne 0} \frac{x^T A x}{x^T x} \tag{A.34}$$

$$\lambda_n = \inf_{x \ne 0} \frac{x^T A x}{x^T x} \tag{A.35}$$

because, for any $x$, it holds that $\lambda_n x^T x \le x^T A x \le \lambda_1 x^T x$.
The Frobenius norm $\|A\|_F^2 = \operatorname{trace}\left(A^H A\right)$. With (A.7) and the analysis of $A^H A$ above,

$$\|A\|_F^2 = \sum_{k=1}^{n} \sigma_k^2 \tag{A.36}$$

In view of (A.33), the bounds $\|A\|_2 \le \|A\|_F \le \sqrt{n}\, \|A\|_2$ may be attained.
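The matrix norms (A.27), (A.31), (A.32) and (A.33) are easy to evaluate for a small real matrix. The sketch below is limited to a hypothetical $2 \times 2$ example so that the largest eigenvalue of $A^T A$ can be written in closed form; the function names are illustrative assumptions.

```python
import math

def norm_1(A):
    """Maximum absolute column sum, cf. (A.32)."""
    return max(sum(abs(A[i][j]) for i in range(len(A))) for j in range(len(A[0])))

def norm_inf(A):
    """Maximum absolute row sum, cf. (A.31)."""
    return max(sum(abs(v) for v in row) for row in A)

def norm_fro(A):
    """Frobenius norm, cf. (A.27)."""
    return math.sqrt(sum(v * v for row in A for v in row))

def norm_2_of_2x2(A):
    """Largest singular value of a real 2x2 matrix, cf. (A.33)."""
    g11 = A[0][0] ** 2 + A[1][0] ** 2                 # entries of A^T A
    g12 = A[0][0] * A[0][1] + A[1][0] * A[1][1]
    g22 = A[0][1] ** 2 + A[1][1] ** 2
    mean = (g11 + g22) / 2.0
    radius = math.hypot((g11 - g22) / 2.0, g12)
    return math.sqrt(mean + radius)                    # sqrt of largest eigenvalue

A = [[1.0, -2.0],
     [3.0, 4.0]]                                       # hypothetical matrix
n1, ninf, nf, n2 = norm_1(A), norm_inf(A), norm_fro(A), norm_2_of_2x2(A)
print(n1, ninf, nf, n2)
```

For this matrix the Frobenius norm lies strictly between $\|A\|_2$ and $\sqrt{2}\,\|A\|_2$.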

A.3.2 Applications of norms

(a) Since $\left\| A^k \right\| = \left\| A A^{k-1} \right\| \le \|A\| \left\| A^{k-1} \right\|$, by induction, we have for any integer $k$, that

$$\left\| A^k \right\| \le \|A\|^k$$

and

$$\lim_{k \to \infty} A^k = 0 \quad \text{if } \|A\| < 1$$

(b) By taking the norm of the eigenvalue equation (A.1), $\|Ax\| = |\lambda|\, \|x\|$ and with (A.30),

$$|\lambda| \le \|A\|_q \tag{A.37}$$

Applied to $A^H A$, for any $q$-norm,

$$\sigma_1^2 \le \left\| A^H A \right\|_q \le \left\| A^H \right\|_q \|A\|_q$$

Choose $q = 1$ and with (A.33),

$$\|A\|_2^2 \le \left\| A^H \right\|_1 \|A\|_1 = \|A\|_\infty \|A\|_1$$

(c) Any matrix $A$ can be transformed by a similarity transform $H$ to a Jordan canonical form $C$ (art. 4) as $A = H C H^{-1}$, from which $A^k = H C^k H^{-1}$. A typical Jordan submatrix $(C_m(\lambda))^k = \lambda^{k-2} B$, where $B$ is independent of $k$. Hence, for large $k$, $A^k \to 0$ if and only if $|\lambda| < 1$ for all eigenvalues.
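Application (a) can be observed numerically. The sketch below takes a hypothetical $2 \times 2$ matrix with $\|A\|_\infty = 0.7 < 1$, forms successive powers and compares $\|A^k\|_\infty$ with $\|A\|_\infty^k$; the names are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def norm_inf(A):
    """Maximum absolute row sum, cf. (A.31)."""
    return max(sum(abs(v) for v in row) for row in A)

A = [[0.5, 0.2],
     [0.1, 0.3]]               # hypothetical matrix with norm_inf(A) = 0.7
base = norm_inf(A)

power = [[1.0, 0.0], [0.0, 1.0]]
norms = []                      # norms[k-1] holds the norm of A^k
for k in range(1, 21):
    power = matmul(power, A)
    norms.append(norm_inf(power))

print(base, norms[0], norms[-1])
```

The norms of the powers decay at least geometrically, so $A^k$ tends to the zero matrix.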

A.4 Stochastic matrices


A probability matrix $P$ is reducible if there is a relabeling of the states that leads to

$$\tilde{P} = \begin{bmatrix} P_1 & B \\ O & P_2 \end{bmatrix}$$

where $P_1$ and $P_2$ are square matrices. Relabeling amounts to permuting rows and columns in the same fashion. Thus, there exists a similarity transform $H$ such that $P = H \tilde{P} H^{-1}$.

A.4.1 The eigenstructure


In this section, the basic theorem on the eigenstructure of a stochastic, irreducible matrix will be proved.
Lemma A.4.1 If $P$ is an irreducible non-negative matrix and if $v$ is a vector with positive components, then the vector $z = (P + I)v$ has always fewer zero components than $v$.
Proof: Denote

$$v = \begin{bmatrix} v_1 \\ 0 \end{bmatrix} \quad \text{and} \quad z = \begin{bmatrix} z_1 \\ 0 \end{bmatrix} \quad \text{where } v_1 > 0,\ z_1 > 0$$

which is always possible by suitable renumbering of the states and

$$P = \begin{bmatrix} P_{11} & P_{12} \\ P_{21} & P_{22} \end{bmatrix}$$

The relation $z = (P + I)v$ is written as

$$\begin{bmatrix} z_1 \\ 0 \end{bmatrix} = \begin{bmatrix} P_{11} v_1 \\ P_{21} v_1 \end{bmatrix} + \begin{bmatrix} v_1 \\ 0 \end{bmatrix}$$

Since $P$ is irreducible, $P_{21} \ne O$, such that $v_1 > 0$ implies that $P_{21} v_1 \ne 0$, which proves the lemma.

Observe, in addition, that all components of $z$ are never smaller than those of $v$. Also, transposing does not alter the result.
Theorem A.4.2 (Frobenius) The modulus of all eigenvalues $\lambda$ of an irreducible stochastic matrix $P$ are less than or equal to 1. There is only one real eigenvalue $\lambda = 1$ and the corresponding eigenvector has positive components.
Proof: The $q = \infty$ norm (A.31) of a probability matrix $P$ with $n$ states defined by (9.7) subject to (9.8) precisely equals $\|P\|_\infty = 1$. From (A.37), it follows that all eigenvalues are, in absolute value, smaller than or equal to 1. Since all elements $P_{ij} \in [0, 1]$ and because an irreducible matrix has no zero element rows, $v^T P$ has positive components if $v^T$ has positive components. Thus, there always exists a scalar $0 < \lambda_v = \min_{1 \le k \le n} \frac{(v^T P)_k}{(v^T)_k}$, such that $\lambda_v v^T \le v^T P$. By Lemma A.4.1, we can always transform the vector $v$ to a vector $z$ by right-multiplying both sides with $(I + P)$ such that

$$\lambda_v v^T (I + P) \le v^T P\, (I + P)$$

$$\lambda_v z^T \le z^T P$$

and, by definition of $\lambda_z$, $\lambda_v \le \lambda_z$ since the components of $z$ are never smaller than those of $v$. Hence, for any arbitrary vector $v$ with positive components, the transform in Lemma A.4.1 leads to an increasing set $\lambda_v \le \lambda_z \le \cdots$, which is bounded by 1 because no eigenvalue can exceed 1. This shows that $\lambda = 1$ is the largest eigenvalue and the corresponding eigenvector $y^T$ has positive components.
This eigenvector $y^T$ is unique. For, if there were another linearly independent eigenvector $w^T$ corresponding to the eigenvalue $\lambda = 1$, any linear combination $z^T = \alpha y^T + \beta w^T$ is also an eigenvector belonging to $\lambda = 1$. But $\alpha$ and $\beta$ can always be chosen to produce a zero component, which the transform method shows to be impossible. The fact that the eigenvector $y^T$ is the only eigenvector belonging to $\lambda = 1$ implies that the eigenvalue $\lambda = 1$ is a single zero of the characteristic polynomial of $P$.

The theorem proved for stochastic matrices is a special case of the famous Frobenius theorem for non-negative matrices (see for a proof, e.g. Gantmacher (1959b, Chapter XIII)). We note that, in the theory of Markov chains, the interest lies in the determination of the left-eigenvector $y^T = \pi$ belonging to $\lambda = 1$, because the right-eigenvector $x$ of $P$ belonging to $\lambda = 1$ equals $\alpha u^T = \alpha \begin{bmatrix} 1 & 1 & \cdots & 1 \end{bmatrix}$, where $\alpha$ is a scalar, because of the constraints (9.8). Recall (A.10) and (A.13), the proper normalization, $y^T u = 1$, precisely corresponds to the total law of probability. Using the interpretation of Markov chains, an alternative argument is possible. If all eigenvalues were $|\lambda| < 1$, application (c) in Section A.3.2 indicates that the steady-state would be non-existent because $P^k \to 0$ for $k \to \infty$. Since this is impossible, there must be at least one eigenvalue with $|\lambda| = 1$. Furthermore, (9.22) shows that at least one eigenvalue corresponding to the steady-state is real and precisely 1.
Corollary A.4.3 An irreducible probability matrix $P$ cannot have two linearly independent eigenvectors with positive components.
Proof: Consider, apart from $y^T = \pi$ belonging to $\lambda = 1$, another eigenvector $w^T$ belonging to the eigenvalue $\omega \ne 1$. On art. 3, $w^T y = 0$, which is only possible if not all components of $w^T$ are positive.

The corollary is important because no other eigenvector of $P$ than $y^T = \pi$ can represent a (discrete) probability density. Since the null vector is never an eigenvector, the corollary implies that at least one component in the other eigenvectors must be negative.
Since the characteristic polynomial of $P$ has real coefficients (because $P_{ij}$ is real), the eigenvalues occur in complex conjugate pairs. Since $\lambda = 1$ is an eigenvalue, for an even number of states $n$, there must be at least another real eigenvalue obeying $-1 \le \lambda < 1$. It has been proved that the boundary of the locations of the eigenvalues inside the unit disc consists of a finite number of points on the unit circle joined by certain curvilinear arcs.
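There exists an interesting property of a rank-one update $\bar{P}$ of a stochastic matrix $P$. The lemma is of a general nature and also applies to reducible Markov chains with several eigenvalues $\lambda_j = 1$ for $1 < j \le k$.
Lemma A.4.4 If $\{1, \lambda_2, \lambda_3, \ldots, \lambda_n\}$ are the eigenvalues of the stochastic matrix $P$, then the eigenvalues of $\bar{P} = \alpha P + (1 - \alpha)\, u v^T$, where $v^T$ is any probability vector, are $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n\}$.
Proof: We start from the eigenvalue equation (A.2),

$$\begin{aligned} \det\left(\bar{P} - \lambda I\right) &= \det\left(\alpha P - \lambda I + (1 - \alpha)\, u v^T\right) \\ &= \det\left( (\alpha P - \lambda I) \left( I + (\alpha P - \lambda I)^{-1} (1 - \alpha)\, u v^T \right) \right) \\ &= \det(\alpha P - \lambda I)\, \det\left( I + (1 - \alpha) (\alpha P - \lambda I)^{-1} u v^T \right) \end{aligned}$$

Applying the formula

$$\det\left( I + c\, d^T \right) = 1 + d^T c \tag{A.38}$$

which follows, after taking the determinant, from the matrix identity

$$\begin{bmatrix} I & 0 \\ d^T & 1 \end{bmatrix} \begin{bmatrix} I + c\, d^T & c \\ 0 & 1 \end{bmatrix} \begin{bmatrix} I & 0 \\ -d^T & 1 \end{bmatrix} = \begin{bmatrix} I & c \\ 0 & 1 + d^T c \end{bmatrix}$$

gives

$$\det\left(\bar{P} - \lambda I\right) = \det(\alpha P - \lambda I) \left( 1 + v^T (1 - \alpha)(\alpha P - \lambda I)^{-1} u \right)$$

Since the row sum of a stochastic matrix $P$ is 1, we have that $P u = u$ and, thus, $(\alpha P - \lambda I)\, u = (\alpha - \lambda)\, u$ from which $(\alpha P - \lambda I)^{-1} u = (\alpha - \lambda)^{-1} u$. Using this result leads to

$$1 + v^T (1 - \alpha)(\alpha P - \lambda I)^{-1} u = 1 + \frac{1 - \alpha}{\alpha - \lambda}\, v^T u = 1 + \frac{1 - \alpha}{\alpha - \lambda} = \frac{1 - \lambda}{\alpha - \lambda}$$

because a probability vector is normalized to 1, i.e. $v^T u = 1$. Hence, we end up with

$$\det\left(\bar{P} - \lambda I\right) = \frac{1 - \lambda}{\alpha - \lambda}\, \det(\alpha P - \lambda I)$$

Invoking (A.19) yields

$$\det\left(\bar{P} - \lambda I\right) = \frac{1 - \lambda}{\alpha - \lambda} \prod_{k=1}^{n} (\alpha\lambda_k - \lambda) = (1 - \lambda) \prod_{k=2}^{n} (\alpha\lambda_k - \lambda)$$

which shows that the eigenvalues of $\bar{P}$ are $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n\}$.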
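Lemma A.4.4 can be checked exactly on a two-state example, where the two eigenvalues are fixed by the trace and the determinant. In the sketch below the values of $p$, $q$, $\alpha$ and the probability vector $v$ are hypothetical choices; exact rational arithmetic avoids rounding.

```python
from fractions import Fraction as Fr

p, q = Fr(1, 3), Fr(1, 2)            # hypothetical transition probabilities
alpha = Fr(4, 5)                     # hypothetical mixing weight
v = [Fr(1, 4), Fr(3, 4)]             # any probability vector

P = [[1 - p, p],
     [q, 1 - q]]
lambda_2 = 1 - p - q                 # second eigenvalue of P

# Rank-one update: P_bar = alpha P + (1 - alpha) u v^T with u = (1, 1)^T
P_bar = [[alpha * P[i][j] + (1 - alpha) * v[j] for j in range(2)]
         for i in range(2)]

trace_bar = P_bar[0][0] + P_bar[1][1]
det_bar = P_bar[0][0] * P_bar[1][1] - P_bar[0][1] * P_bar[1][0]

# Eigenvalues {1, alpha*lambda_2} imply trace = 1 + alpha*lambda_2, det = alpha*lambda_2
print(trace_bar, det_bar, alpha * lambda_2)
```

The result does not depend on the choice of $v$, as the lemma states.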

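A similar property may occur in a special case where a Markov chain is supplemented by an additional state $n + 1$ which connects to every other state and to which every other state is connected (such that $\bar{P}$ is irreducible). Then,

$$\bar{P} = \begin{bmatrix} \alpha P & (1 - \alpha)\, u \\ v^T & 0 \end{bmatrix}$$

with corresponding eigenvalues $\{1, \alpha\lambda_2, \alpha\lambda_3, \ldots, \alpha\lambda_n, \alpha - 1\}$; the last eigenvalue follows because the trace of $\bar{P}$ equals $\alpha \operatorname{trace}(P)$. This result is similarly proved as Lemma A.4.4 using (Meyer, 2000, p. 475)

$$\det \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det A\, \det\left( D - C A^{-1} B \right) \tag{A.39}$$

provided $A^{-1}$ exists unless $C = 0$.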
A.4.2 Example: the two-state Markov chain
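The two-state Markov chain is defined by

$$P = \begin{bmatrix} 1 - p & p \\ q & 1 - q \end{bmatrix}$$

Observe that $\det P = 1 - p - q$. The eigenvalues of $P$ satisfy the characteristic polynomial $c(\lambda) = \lambda^2 - (2 - p - q)\lambda + \det P = 0$, from which $\lambda_1 = 1$ and $\lambda_2 = 1 - p - q = \det P$. The adjoint matrix $Q(\lambda)$ is computed (art. 7) via the polynomial $\frac{c(\lambda) - c(\mu)}{\lambda - \mu}$,

$$\frac{c(\lambda) - c(\mu)}{\lambda - \mu} = \lambda + \mu - (2 - p - q)$$

and after $\lambda \to \lambda I$ and $\mu \to P$,

$$Q(\lambda) = \lambda I + P - (2 - p - q) I = \begin{bmatrix} \lambda - 1 + q & p \\ q & \lambda - 1 + p \end{bmatrix}$$

The (unscaled) right- (left-) eigenvectors of $P$ follow as the non-zero columns (rows) of $Q(\lambda)$. For $\lambda_1 = 1$, we find $x_1 = (1, 1)$ and $y_1^T = (q, p)$. For $\lambda_2 = 1 - p - q$, the eigenvector $x_2 = (-p, q)$ and $y_2^T = (-1, 1)$. Normalization (art. 4) requires that $y_k^T x_k = 1$ or $x_1 = \frac{1}{p+q}(1, 1)$ and $x_2 = \frac{1}{p+q}(-p, q)$. If the eigenvalues are distinct ($p + q \ne 0$), the matrix $P$ can be written as (art. 4) $P = X \operatorname{diag}(\lambda_k)\, Y^T$,

$$P = \frac{1}{p+q} \begin{bmatrix} 1 & -p \\ 1 & q \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 - p - q \end{bmatrix} \begin{bmatrix} q & p \\ -1 & 1 \end{bmatrix}$$

from which any power $P^k$ is immediate as

$$\begin{aligned} P^k &= \frac{1}{p+q} \begin{bmatrix} 1 & -p \\ 1 & q \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & (1 - p - q)^k \end{bmatrix} \begin{bmatrix} q & p \\ -1 & 1 \end{bmatrix} \\ &= \frac{1}{p+q} \begin{bmatrix} q & p \\ q & p \end{bmatrix} + \frac{(1 - p - q)^k}{p+q} \begin{bmatrix} p & -p \\ -q & q \end{bmatrix} \end{aligned} \tag{A.40}$$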
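The closed form (A.40) can be compared with direct matrix multiplication. The sketch below uses hypothetical values $p = 1/3$, $q = 1/2$ and exact rational arithmetic; the helper names are illustrative.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

p, q = Fr(1, 3), Fr(1, 2)        # hypothetical transition probabilities
P = [[1 - p, p],
     [q, 1 - q]]

def power_direct(P, k):
    """P^k by repeated multiplication."""
    result = [[Fr(1), Fr(0)], [Fr(0), Fr(1)]]
    for _ in range(k):
        result = matmul(result, P)
    return result

def power_closed_form(p, q, k):
    """P^k from (A.40)."""
    r = (1 - p - q) ** k
    s = p + q
    return [[(q + r * p) / s, (p - r * p) / s],
            [(q - r * q) / s, (p + r * q) / s]]

k = 5
direct = power_direct(P, k)
closed = power_closed_form(p, q, k)
print(direct == closed)
```

As $k$ grows, the second term of (A.40) vanishes when $|1 - p - q| < 1$ and both rows approach $\left(\frac{q}{p+q}, \frac{p}{p+q}\right)$.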

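The steady-state matrix $P^\infty = \lim_{k \to \infty} P^k$ follows as

$$P^\infty = \frac{1}{p+q} \begin{bmatrix} q & p \\ q & p \end{bmatrix} \tag{A.41}$$

because $|1 - p - q| < 1$.
Alternatively, the steady-state vector is a solution of (9.25),

$$\begin{bmatrix} -p & q \\ 1 & 1 \end{bmatrix} \begin{bmatrix} \pi_1 \\ \pi_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Applying Cramer's rule with $D = \det \begin{bmatrix} -p & q \\ 1 & 1 \end{bmatrix} = -(p + q)$, we obtain $\pi_1 = \frac{1}{D} \det \begin{bmatrix} 0 & q \\ 1 & 1 \end{bmatrix}$ and $\pi_2 = \frac{1}{D} \det \begin{bmatrix} -p & 0 \\ 1 & 1 \end{bmatrix}$ or

$$\pi = \begin{bmatrix} \dfrac{q}{p+q} & \dfrac{p}{p+q} \end{bmatrix}$$

which indeed agrees with (A.41) and (9.37).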
A.4.3 The tendency towards the steady-state
A stochastic matrix $P$ and the corresponding Markov chain is regular if the only eigenvalue with $|\lambda| = 1$ is $\lambda = 1$. It is fully regular if, in addition, $\lambda = 1$ is a simple zero of the characteristic polynomial of $P$. The Frobenius Theorem A.4.2 indicates that a regular matrix is necessarily reducible. Application (c) in Section A.3.2 demonstrates that the steady-state only exists for regular Markov chains. Alternatively, a regular matrix $P$ has the property that $P^k > O$ (for some $k$), i.e. all elements are strictly positive. In the sequel, we concentrate on fully regular stochastic matrices $P$, where all eigenvalues lie within the unit circle, except for the largest one, $\lambda = 1$. If the $N$ eigenvalues of the regular stochastic matrix $P$ are ordered as $\lambda_1 = 1 > |\lambda_2| \ge \cdots \ge |\lambda_N| \ge 0$, the second largest eigenvalue $\lambda_2$ will determine the speed of convergence of the Markov chain towards the steady-state.
A.4.3.1 Example: the three-state Markov chain
The three-state Markov chain $P$ is defined by (9.7) with $N = 3$. Assuming that $P$ is irreducible, we determine the eigenvalues. Since the Frobenius Theorem A.4.2 already determines one eigenvalue $\lambda_1 = 1$, the remaining two $\lambda_2$ and $\lambda_3$ are found from (A.6) and (A.7). They obey the equations

$$\lambda_2 \lambda_3 = \det P$$

$$\lambda_2 + \lambda_3 = P_{11} + P_{22} + P_{33} - 1 = \operatorname{trace}(P) - 1$$

or the quadratic equation $x^2 - (\lambda_2 + \lambda_3)\, x + \lambda_2 \lambda_3 = 0$. The explicit solution is

$$\lambda_2 = \frac{1}{2} (\operatorname{trace}(P) - 1) + \frac{1}{2} \sqrt{(\operatorname{trace}(P) - 1)^2 - 4 \det P}$$

$$\lambda_3 = \frac{1}{2} (\operatorname{trace}(P) - 1) - \frac{1}{2} \sqrt{(\operatorname{trace}(P) - 1)^2 - 4 \det P}$$

All eigenvalues are real if the discriminant $(\operatorname{trace}(P) - 1)^2 - 4 \det P$ is non-negative, which leads to three cases:
(a) In case $(\operatorname{trace}(P) - 1)^2 > 4 \det P$, the eigenvalues obey $1 > \lambda_2 > \lambda_3$, but not necessarily $1 > |\lambda_2| > |\lambda_3|$. The latter inequality is true if $\operatorname{trace}(P) > 1$, in which case the speed of convergence towards the steady-state is determined by the decay of $(\lambda_2)^k$ as $k \to \infty$. If $\operatorname{trace}(P) = 1$, then $\lambda_2 = -\lambda_3 = \sqrt{-\det P}$ and if $\operatorname{trace}(P) < 1$, $|\lambda_3|$ determines the speed of convergence. Notice that $\lambda_2 > \frac{1}{2}(\operatorname{trace}(P) - 1) \ge -\frac{1}{2}$.
(b) In case $(\operatorname{trace}(P) - 1)^2 < 4 \det P$, there are two complex conjugate roots $\lambda_2 = \alpha + i\beta$ and $\lambda_3 = \alpha - i\beta$, both with the same modulus $|\lambda_2| = |\lambda_3|$, whose square equals $\alpha^2 + \beta^2 = \lambda_2 \lambda_3 = \det P$, and with real part $\alpha = \frac{1}{2}(\operatorname{trace}(P) - 1)$. In this case, we have that $0 \le \det P < 1$. Hence, the Markov chain converges towards the steady-state as $\left(\sqrt{\det P}\right)^k$ in the discrete-time $k$.
(c) In case $(\operatorname{trace}(P) - 1)^2 = 4 \det P$, there is a double eigenvalue

$$\lambda = \lambda_2 = \lambda_3 = \frac{1}{2} (\operatorname{trace}(P) - 1) = \sqrt{\det P}$$

and $P$ cannot be reduced by a similarity transform $H$ to a diagonal matrix (Section A.1, art. 4), but only to the Jordan canonical form $C$ such that $P^k = H C^k H^{-1}$. Since (Meyer, 2000, pp. 599-600)

$$C^k = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}^k = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda^k & k \lambda^{k-1} \\ 0 & 0 & \lambda^k \end{bmatrix}$$

the Markov chain converges towards the steady-state as $k \left(\sqrt{\det P}\right)^{k-1}$ in the discrete-time $k$. We observe that $-\frac{1}{2} \le \lambda_2 = \lambda_3 < 1$ because $0 \le \operatorname{trace}(P) < 3$. If $\operatorname{trace}(P) = 3$, then $P = I$, and $P$ is not irreducible.
The fastest possible convergence occurs when $\lambda_2 = \lambda_3 = 0$ or when $\det P = 0$ and $\operatorname{trace}(P) = 1$, in which case $P$ has rank 1. In any matrix of rank 1, all row vectors are linearly dependent. Since the row sum of a stochastic matrix $P$ is 1 by (9.8), every row in $P$ is precisely the same and (9.6) shows that after one discrete-time step, the steady-state $\pi = \begin{bmatrix} \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix}$ is reached. As shown in Section 9.3.1, a transition probability matrix with constant rows can be regarded as a limit transition probability matrix $A = \lim_{k \to \infty} P^k$ of a Markov process with transition probability matrix $P$.
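The recipe of this example, obtaining $\lambda_2$ and $\lambda_3$ from the trace and the determinant, can be sketched as follows. The transition matrix below is a hypothetical choice that falls in case (a); `cmath.sqrt` is used so that the same code also covers the complex case (b).

```python
import cmath

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

# Hypothetical three-state transition probability matrix (rows sum to 1)
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.3, 0.6]]

trace_P = P[0][0] + P[1][1] + P[2][2]
det_P = det3(P)

half_sum = (trace_P - 1.0) / 2.0
root = cmath.sqrt((trace_P - 1.0) ** 2 - 4.0 * det_P)
lambda_2 = half_sum + root / 2.0
lambda_3 = half_sum - root / 2.0

def char_poly(lam):
    """det(P - lam I), which should vanish at every eigenvalue."""
    shifted = [[P[i][j] - (lam if i == j else 0.0) for j in range(3)]
               for i in range(3)]
    return det3(shifted)

print(lambda_2, lambda_3, abs(char_poly(lambda_2)), abs(char_poly(lambda_3)))
```

For this matrix the convergence towards the steady-state is governed by the decay of $(\lambda_2)^k$.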

A.5 Special types of stochastic matrices


A.5.1 Doubly stochastic matrices
A doubly stochastic matrix $P$ has both row and column sums equal to 1,

$$\sum_{k=1}^{N} P_{ik} = \sum_{k=1}^{N} P_{kj} = 1 \qquad \text{for all } i, j$$

If $P$ is symmetric, $P = P^T$, then $P$ is doubly stochastic, but the reverse implication is not true. As observed in Section A.4.1, the left-eigenvector $y^T = \pi$ and the right-eigenvector $x = u$ belonging to eigenvalue $\lambda = 1$ satisfy $y^T u = 1$. For doubly stochastic matrices, it holds that the role of left- and right-eigenvector can be reversed, which leads to $y = x$ or

$$\pi = \begin{bmatrix} \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix}$$

The example in Section A.4.3 illustrates that a steady-state vector equal to $\pi = \begin{bmatrix} \frac{1}{N} & \frac{1}{N} & \cdots & \frac{1}{N} \end{bmatrix}$ does not necessarily imply that $P$ is doubly stochastic.

A.5.2 Tri-diagonal bandmatrices


A.5.2.1 Tri-diagonal Toeplitz bandmatrix
A Toeplitz matrix has constant entries on each diagonal parallel to the main diagonal. Of particular interest is the $N \times N$ tri-diagonal Toeplitz matrix,

$$A = \begin{bmatrix} b & a & & & \\ c & b & a & & \\ & \ddots & \ddots & \ddots & \\ & & c & b & a \\ & & & c & b \end{bmatrix}$$

that arises in the Markov chain of the random walk and the birth and death process. Moreover, the eigenstructure of the tri-diagonal Toeplitz matrix $A$ can be expressed in analytic form.

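An eigenvector $x$ corresponding to eigenvalue $\lambda$ satisfies $(A - \lambda I)x = 0$ or, written per component,
\[
\begin{aligned}
(b-\lambda)x_1 + a\,x_2 &= 0 \\
c\,x_{k-1} + (b-\lambda)x_k + a\,x_{k+1} &= 0 \qquad 2 \le k \le N-1 \\
c\,x_{N-1} + (b-\lambda)x_N &= 0
\end{aligned}
\]
We assume that $a \neq 0$ and $c \neq 0$ and rewrite the set with $x_0 = x_{N+1} = 0$ as
\[
x_{k+2} + \frac{b-\lambda}{a}\,x_{k+1} + \frac{c}{a}\,x_k = 0 \qquad 0 \le k \le N-1
\]
which are second order difference equations with constant coefficients. The general solution of these equations is $x_k = \alpha r_1^k + \beta r_2^k$ where $r_1$ and $r_2$ are the roots of the corresponding polynomial $x^2 + \frac{b-\lambda}{a}x + \frac{c}{a} = 0$. If $r_1 = r_2$, the general solution is $x_k = \alpha r_1^k + \beta k r_1^k$, which is impossible since it implies that all $x_k = 0$ due to the fact that $x_0 = x_{N+1} = 0$, which forces $r_1$ to be zero. An eigenvector is never the zero vector. Thus, we have distinct roots $r_1 \neq r_2$ that satisfy
\[
r_1 + r_2 = -\frac{b-\lambda}{a}, \qquad r_1 r_2 = \frac{c}{a}
\]
The constants $\alpha$ and $\beta$ follow from the boundary requirement $x_0 = x_{N+1} = 0$ as
\[
\alpha + \beta = 0, \qquad \alpha r_1^{N+1} + \beta r_2^{N+1} = 0
\]
Rewriting the last equation with $\beta = -\alpha$ yields
\[
\left(\frac{r_1}{r_2}\right)^{N+1} = 1 \quad \text{or} \quad \frac{r_1}{r_2} = e^{\frac{2\pi i m}{N+1}}
\]
for some $1 \le m \le N$ (the root $m = 0$ must be rejected since $r_1 \neq r_2$). Substitution of $r_1 = r_2\, e^{\frac{2\pi i m}{N+1}}$ into the last root equation yields
\[
r_1 = \sqrt{\frac{c}{a}}\; e^{\frac{i\pi m}{N+1}} \quad \text{and} \quad r_2 = \sqrt{\frac{c}{a}}\; e^{-\frac{i\pi m}{N+1}}
\]
The first root equation is only possible for special values of $\lambda = \lambda_m$ with $1 \le m \le N$, which are the eigenvalues,
\[
\lambda_m = b + a\sqrt{\frac{c}{a}}\left(e^{-\frac{i\pi m}{N+1}} + e^{\frac{i\pi m}{N+1}}\right) = b + 2\sqrt{ac}\,\cos\left(\frac{\pi m}{N+1}\right)
\]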

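Since there are precisely $N$ different values of $m$, there are $N$ distinct eigenvalues $\lambda_m$. The components $x_k$ of the eigenvector belonging to $\lambda_m$ are
\[
x_k = \alpha \left(\frac{c}{a}\right)^{\frac{k}{2}} \left(e^{\frac{i\pi m k}{N+1}} - e^{-\frac{i\pi m k}{N+1}}\right) = 2i\alpha \left(\frac{c}{a}\right)^{\frac{k}{2}} \sin\left(\frac{\pi m k}{N+1}\right)
\]
The scaling constant $\alpha$ follows from the normalization $\|x\|_1 = 1$ or
\[
2i\alpha \sum_{k=1}^{N} \left(\frac{c}{a}\right)^{\frac{k}{2}} \sin\left(\frac{\pi m k}{N+1}\right) = 1
\]
Since $\sin\left(\frac{\pi m k}{N+1}\right) = \operatorname{Im}\left[e^{\frac{i\pi m k}{N+1}}\right]$ we have
\[
\begin{aligned}
\sum_{k=1}^{N} \left(\frac{c}{a}\right)^{\frac{k}{2}} \sin\left(\frac{\pi m k}{N+1}\right)
&= \operatorname{Im}\left[\sum_{k=1}^{N} \left(\sqrt{\frac{c}{a}}\; e^{\frac{i\pi m}{N+1}}\right)^{k}\right]
= \operatorname{Im}\left[\frac{1 - \left(\sqrt{\frac{c}{a}}\; e^{\frac{i\pi m}{N+1}}\right)^{N+1}}{1 - \sqrt{\frac{c}{a}}\; e^{\frac{i\pi m}{N+1}}} - 1\right] \\
&= \frac{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\left(\frac{\pi m}{N+1}\right)}{1 - 2\sqrt{\frac{c}{a}}\cos\left(\frac{\pi m}{N+1}\right) + \frac{c}{a}}
\end{aligned}
\]
from which the scaling constant $\alpha$ is
\[
2i\alpha = \left[\frac{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\left(\frac{\pi m}{N+1}\right)}{1 - 2\sqrt{\frac{c}{a}}\cos\left(\frac{\pi m}{N+1}\right) + \frac{c}{a}}\right]^{-1}
\]
Finally, the components $x_k$ of the eigenvector $x$ belonging to $\lambda_m$ become, for $1 \le k \le N$,
\[
x_k = \frac{\left(\frac{c}{a}\right)^{\frac{k}{2}} \sin\left(\frac{\pi m k}{N+1}\right)\left(1 - 2\sqrt{\frac{c}{a}}\cos\left(\frac{\pi m}{N+1}\right) + \frac{c}{a}\right)}{\sqrt{\frac{c}{a}}\left(1 + (-1)^{m+1}\left(\frac{c}{a}\right)^{\frac{N+1}{2}}\right)\sin\left(\frac{\pi m}{N+1}\right)}
\]
Observe that for stochastic matrices $a + b + c = 1$ (see the general random walk in Section 11.2) and for the infinitesimal rate matrix $a + b + c = 0$ (see the birth and death process in Section 11.3), which only changes the eigenvalue through $b$.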
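The closed-form eigenpairs of the tri-diagonal Toeplitz matrix can be verified by substituting them back into the per-component equations. The sketch below assumes the names `a`, `b` and `c` for the super-diagonal, diagonal and sub-diagonal entries (with `a * c > 0`), uses the unnormalized eigenvector, and picks illustrative numerical values; the function names are hypothetical.

```python
import math

def toeplitz_eigenpair(a, b, c, N, m):
    """Closed-form eigenvalue and unnormalized eigenvector for index m = 1..N."""
    theta = math.pi * m / (N + 1)
    lam = b + 2.0 * math.sqrt(a * c) * math.cos(theta)
    x = [(c / a) ** (k / 2.0) * math.sin(k * theta) for k in range(1, N + 1)]
    return lam, x

def residual(a, b, c, N, lam, x):
    """Largest |((A - lam I) x)_k| with super-diagonal a, diagonal b, sub-diagonal c."""
    worst = 0.0
    for k in range(N):
        left = c * x[k - 1] if k > 0 else 0.0
        right = a * x[k + 1] if k < N - 1 else 0.0
        worst = max(worst, abs(left + (b - lam) * x[k] + right))
    return worst

# Illustrative values with a + b + c = 1 (a random-walk transition matrix).
a, b, c, N = 0.3, 0.5, 0.2, 6
residuals = [residual(a, b, c, N, *toeplitz_eigenpair(a, b, c, N, m))
             for m in range(1, N + 1)]
```

Each entry of `residuals` should vanish up to rounding, one entry per eigenvalue index.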


A.5.2.2 Tri-diagonal AMS matrix


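This section computes the exact spectrum of the tri-diagonal AMS matrix specified in (14.51). The analysis bears some resemblance to that of the birth and death process with constant birth and death rates in Section 11.3.3.
The eigenvalue equation $D^{-1}Qx = \xi x$ is rewritten for the $j$-th component of the right-eigenvector belonging to the eigenvalue $\xi$ as
\[
(N - j + 1)\lambda\, x_{j-1} - \left[(\xi + 1 - \lambda)\, j + N\lambda - \xi c\right] x_j + (j+1)\, x_{j+1} = 0
\]
for $0 \le j \le N$. This difference equation has linear coefficients whereas those in Section A.5.2.1 are constant. It is most conveniently solved using generating functions. Let $G(z) = \sum_{j=0}^{N} x_j z^j$, then the difference equation is transformed with $x_j = 0$ if $j \notin [0, N]$ to
\[
\lambda N z\left[G(z) - x_N z^N\right] - \lambda z^2\left[G'(z) - N x_N z^{N-1}\right] - (\xi + 1 - \lambda)\, z\, G'(z) - \left[N\lambda - \xi c\right] G(z) + G'(z) = 0
\]
from which the logarithmic derivative is
\[
\frac{G'(z)}{G(z)} = \frac{N\lambda z + \xi c - N\lambda}{\lambda z^2 + (\xi + 1 - \lambda)\, z - 1}
\]
The integration of the right-hand side requires a partial fraction decomposition,
\[
\frac{N\lambda z + \xi c - N\lambda}{\lambda z^2 + (\xi + 1 - \lambda)\, z - 1} = \frac{c_1}{z - r_1} + \frac{c_2}{z - r_2}
\]
where $r_1$ and $r_2$ are the roots of the quadratic polynomial $\lambda z^2 + (\xi + 1 - \lambda)\, z - 1$ and $c_1$ and $c_2$ are the residues computed for $k = 1, 2$ as
\[
c_k = \lim_{z \to r_k} \frac{(z - r_k)(N\lambda z + \xi c - N\lambda)}{\lambda (z - r_1)(z - r_2)}
\]
and they obey $c_1 + c_2 = N$ and $c_1 r_2 + c_2 r_1 = \frac{N\lambda - \xi c}{\lambda}$. Explicitly,
\[
\begin{aligned}
r_1 &= \frac{-(\xi + 1 - \lambda) + \sqrt{(\xi + 1 - \lambda)^2 + 4\lambda}}{2\lambda} > 0 \\
r_2 &= \frac{-(\xi + 1 - \lambda) - \sqrt{(\xi + 1 - \lambda)^2 + 4\lambda}}{2\lambda} < 0
\end{aligned}
\tag{A.42}
\]
with $r_1 r_2 = -\frac{1}{\lambda}$ and $r_1 + r_2 = -\frac{\xi + 1 - \lambda}{\lambda}$. Moreover, unless $\xi = \lambda - 1 \pm 2i\sqrt{\lambda}$, in which case $r_1 = r_2 = \mp \frac{i}{\sqrt{\lambda}}$, the roots are distinct. The residues are
\[
\begin{aligned}
c_1 &= \frac{N\lambda r_1 + \xi c - N\lambda}{\lambda (r_1 - r_2)} \\
c_2 &= \frac{N\lambda r_2 + \xi c - N\lambda}{\lambda (r_2 - r_1)} = N - c_1
\end{aligned}
\tag{A.43}
\]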

Integration now yields
\[
\log G(z) = c_1 \log(z - r_1) + c_2 \log(z - r_2) + b
\]
or
\[
G(z) = e^{b} (z - r_1)^{c_1} (z - r_2)^{N - c_1}
\]
The integration constant $b$ is obtained from $\lim_{z\to\infty} \frac{G(z)}{z^N} = x_N$. Thus,
\[
\lim_{z\to\infty} \frac{G(z)}{z^N} = e^{b} \lim_{z\to\infty} \left(\frac{z - r_1}{z - r_2}\right)^{c_1} \left(1 - \frac{r_2}{z}\right)^{N} = e^{b}
\]
such that $e^{b} = x_N$. The obvious scaling for the eigenvector is to choose $x_N = 1$ and we arrive at
\[
G(z) = \sum_{j=0}^{N} x_j z^j = (z - r_1)^{c_1} (z - r_2)^{N - c_1} \tag{A.44}
\]
which shows that $c_1$ must be an integer $k \in [0, N]$ for $G(z)$ to be a polynomial of degree $N$. Expanding the binomials with $c_1 = k$ gives
\[
\begin{aligned}
G(z) &= \sum_{j=0}^{k} \binom{k}{j} z^j (-r_1)^{k-j} \sum_{n=0}^{N-k} \binom{N-k}{n} z^n (-r_2)^{N-k-n} \\
&= \sum_{j=0}^{N} \left[\sum_{n=0}^{j} \binom{N-k}{j-n} \binom{k}{n} (-r_1)^{k-n} (-r_2)^{N-k-j+n}\right] z^j
\end{aligned}
\]
from which the eigenvector components belonging to $\xi = \xi(k)$ are, for $0 \le j \le N$,
\[
x_j(k) = (-1)^{N-j} \sum_{n=0}^{j} \binom{N-k}{j-n} \binom{k}{n}\, r_1^{\,k-n}\, r_2^{\,N-k-j+n} \tag{A.45}
\]
The requirement on $c_1$ also leads to equations for the eigenvalues $\xi$. Indeed, equating $c_1 = k$ in (A.43) and substituting the explicit expressions for the roots $r_1$ and $r_2$, we obtain after squaring the quadratic equations for the eigenvalue $\xi(k)$ for $0 \le k \le N$
\[
A(k)\, \xi^2(k) + B(k)\, \xi(k) + C(k) = 0 \tag{A.46}
\]
where
\[
\begin{aligned}
A(k) &= (N/2 - k)^2 - (N/2 - c)^2 \\
B(k) &= 2(1 - \lambda)(N/2 - k)^2 - N(1 + \lambda)(N/2 - c) \\
C(k) &= -(1 + \lambda)^2 \left[(N/2)^2 - (N/2 - k)^2\right]
\end{aligned}
\]
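The chain from the quadratic (A.46) back to the difference equation can be checked numerically. The sketch below is written under stated assumptions: the eigenvalue is called `xi`, the Greek parameter of the difference equation is called `lam`, the drift parameter is `c`, and the numerical values are illustrative only. For each root of each quadratic it recomputes the residue $c_1$ from (A.43), builds the eigenvector as the coefficients of $(z - r_1)^{c_1}(z - r_2)^{N - c_1}$ from (A.44), and measures how well the difference equation is satisfied. All function names are hypothetical.

```python
import math

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def check_root(N, lam, c, xi):
    """Return (c1, scaled residual of the difference equation) for a candidate xi."""
    s = math.sqrt((xi + 1 - lam) ** 2 + 4 * lam)
    r1 = (-(xi + 1 - lam) + s) / (2 * lam)
    r2 = (-(xi + 1 - lam) - s) / (2 * lam)
    c1 = (N * lam * r1 + xi * c - N * lam) / (lam * (r1 - r2))
    k = round(c1)
    x = [1.0]                      # coefficients of G(z), so x[N] = 1
    for _ in range(k):
        x = poly_mul(x, [-r1, 1.0])
    for _ in range(N - k):
        x = poly_mul(x, [-r2, 1.0])
    worst = 0.0
    for j in range(N + 1):
        below = (N - j + 1) * lam * x[j - 1] if j > 0 else 0.0
        above = (j + 1) * x[j + 1] if j < N else 0.0
        mid = ((xi + 1 - lam) * j + N * lam - xi * c) * x[j]
        worst = max(worst, abs(below - mid + above))
    return c1, worst / max(abs(v) for v in x)

N, lam, c = 5, 0.4, 1.7            # illustrative values, N odd so no double root
results = []
for k in range(N // 2 + 1):
    y = (N / 2 - k) ** 2
    A = y - (N / 2 - c) ** 2
    B = 2 * (1 - lam) * y - N * (1 + lam) * (N / 2 - c)
    C = -(1 + lam) ** 2 * ((N / 2) ** 2 - y)
    disc = B * B - 4 * A * C
    for sign in (1.0, -1.0):
        xi = (-B + sign * math.sqrt(disc)) / (2 * A)
        results.append((k, xi) + check_root(N, lam, c, xi))
```

Each tuple in `results` holds the quadratic index, the root, the recomputed residue and the scaled residual. Because the quadratic was obtained by squaring, the residue is expected to equal either the index or $N$ minus the index.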
Each of the $N + 1$ quadratic equations (A.46) has two roots $\xi_1(k)$ and $\xi_2(k)$, thus in total $2(N+1)$, while there are only $N + 1$ eigenvalues. The coefficients $A(k)$, $B(k)$ and $C(k)$ only depend on $k$ via $(N/2 - k)^2$, which means that the quadratics (A.46) for which $k' = N - k$ are identical. This observation reduces the set $\{\xi_1(k), \xi_2(k)\}_{0 \le k \le N}$ of roots to precisely $N + 1$ and confines the analysis to $0 \le k \le N/2$. We will show that all roots are real and distinct (except for $k = N/2$).
The discriminant $\Delta(k) = B^2(k) - 4A(k)C(k)$ is, with $y = (N/2 - k)^2 \in \left[0, (N/2)^2\right]$,
\[
\Delta(k) = -16\lambda y^2 + 4(1 + \lambda)\left[c^2(1 + \lambda) - 2cN\lambda + N^2\lambda\right] y
\]
which shows that $\Delta(k)$ is concave in $y$ because $\frac{d^2\Delta(k)}{dy^2} = -32\lambda < 0$. For $y = 0$, $\Delta(N/2) = 0$ and, for $y = (N/2)^2$, $\Delta(0) = N^2\left(c(1 + \lambda) - \lambda N\right)^2 > 0$ and, hence, $\Delta(k) \ge 0$ for $k \in [0, N/2]$. This means that, for $0 \le k < N/2$, the roots $\xi_1(k)$ and $\xi_2(k)$ are real and distinct and, for $k = N/2$ (only if $N$ is even) where $\Delta(N/2) = 0$,
\[
\xi_1(N/2) = \xi_2(N/2) = -\frac{B(N/2)}{2A(N/2)} = -\frac{1 + \lambda}{1 - 2\frac{c}{N}}
\]
For $\kappa < k \le N/2$, the roots $\{\xi_1(\kappa), \xi_2(\kappa)\}$ are different from the roots $\{\xi_1(k), \xi_2(k)\}$ because $A(k) z^2 + B(k) z + C(k) < A(\kappa) z^2 + B(\kappa) z + C(\kappa)$ for all $z$. Indeed, $A(\kappa) - A(k) = (N/2 - \kappa)^2 - (N/2 - k)^2 > 0$ and the discriminant $(B(\kappa) - B(k))^2 - 4(A(\kappa) - A(k))(C(\kappa) - C(k)) < 0$ shows that there are no real solutions. Thus, an extreme eigenvalue occurs for $k = 0$ for which $C(0) = 0$ such that $\xi_1(0) = 0$ and
\[
\xi_2(0) = -\frac{B(0)}{A(0)} = -\frac{1 + \lambda - \frac{N\lambda}{c}}{1 - \frac{c}{N}} \tag{A.47}
\]
The stability requirement $\rho = \frac{N\lambda}{c(1+\lambda)} < 1$ and $c < N$ shows that $\xi_2(0) < 0$, and thus $\xi_2(0)$ is the largest negative eigenvalue. The eigenvalues for other $0 < k \le N/2$ are either larger than 0 or smaller than $\xi_2(0)$. We need to consider two different cases (a) $c < N/2$ and (b) $c > N/2$ while $C(k) < 0$ for all $k \in (0, N)$.

(a) If $c < N/2$ and if $0 \le k < c$, then $A(k) > 0$. Hence, the product $\xi_1(k)\xi_2(k) = \frac{C(k)}{A(k)} < 0$, which means that $\xi_1(k) > 0 > \xi_2(k)$ and that there are precisely $[c]$ positive eigenvalues. Similarly, $A(k) < 0$ for $c < k < N/2$, such that $\xi_1(k)\xi_2(k) > 0$ while $\xi_1(k) + \xi_2(k) = -\frac{B(k)}{A(k)} < 0$ shows that both eigenvalues are negative because $B(k) < 0$. Indeed, if $\lambda \ge 1$ and $c < N/2$, the above expression immediately leads to $B(k) < 0$ while if $\lambda < 1$ and $c < N/2$, the expression
\[
B(k) = 2(1 - \lambda)\left[\left(\frac{N}{2} - k\right)^2 - \left(\frac{N}{2} - c\right)^2\right] - 2c\left(\frac{N}{2} - c\right)\left(\frac{\lambda(N - c)}{c} + 1\right)
\]
shows that both terms are negative.
(b) If $c > N/2$, we see that $A(k) > 0$ for $0 < k < N - c$ leading to $\xi_1(k) > 0 > \xi_2(k)$. For $N - c < k < N/2$, we have $A(k) < 0$ and thus $\xi_1(k)\xi_2(k) = \frac{C(k)}{A(k)} > 0$, while their common sign, which follows from $\xi_1(k) + \xi_2(k) = -\frac{B(k)}{A(k)}$, requires us to consider the sign of $B(k)$. If $\lambda \le 1$, then $B(k) > 0$. If $\lambda > 1$, then
\[
\begin{aligned}
B(k) &= N(1 + \lambda)\left(c - \frac{N}{2}\right) + 2(1 - \lambda)(N/2 - k)^2 \\
&> N(1 + \lambda)\left(c - \frac{N}{2}\right) + 2(1 - \lambda)\left(c - \frac{N}{2}\right)^2 \\
&= 2c\left(c - \frac{N}{2}\right)\left(\frac{\lambda(N - c)}{c} + 1\right) > 0
\end{aligned}
\]
which shows that $0 < \xi_2(k) < \xi_1(k)$. Hence, there are $N - [c] + 2(N/2 - N + [c]) = [c]$ positive eigenvalues.
In summary, there are $[c]$ positive eigenvalues, one $\xi_1(0) = 0$ and $N - [c]$ negative eigenvalues. Relabel the eigenvalues as $(\xi_k, \xi_{N-k}) = (\xi_1(k), \xi_2(k))$ in increasing order $\xi_{N-[c]-1} < \cdots < \xi_1 < \xi_0 < \xi_N = 0 < \xi_{N-1} < \cdots < \xi_{N-[c]}$. This way of writing distinguishes between underload and overload eigenvalues. In terms of the discriminant $\Delta(k) = B^2(k) - 4A(k)C(k)$, the non-positive eigenvalues are
(a) If $c < N/2$,
\[
\begin{aligned}
\xi_1(k) &= \frac{-B(k) - \sqrt{\Delta(k)}}{2A(k)} & 0 \le k \le [c] \\
\xi_{1,2}(k) &= \frac{-B(k) \pm \sqrt{\Delta(k)}}{2A(k)} & [c] + 1 \le k \le \frac{N}{2}
\end{aligned}
\]
(b) If $c > N/2$,
\[
\xi_1(k) = \frac{-B(k) - \sqrt{\Delta(k)}}{2A(k)} \qquad 0 \le k \le N - [c] - 1
\]
The eigenvector belonging to $\xi_j$ follows from (A.45) where $r_1$ and $r_2$ are given in (A.42) and $k$ is determined from (A.43) since $k = c_1$. The eigenvectors for $\xi_1(k)$ and $\xi_2(k)$ belonging to a same quadratic $k$ must be different. Especially in this case, the corresponding $k = c_1$ values can be determined from (A.43). For example, for $\xi_N = 0$, we find $r_1 = 1$, $r_2 = -\frac{1}{\lambda}$ and $k = 0$ and the eigenvector belonging to $\xi_N$ is with (A.45),
\[
x_j(0) = (-1)^{N-j} \binom{N}{j}\, r_2^{\,N-j} = \binom{N}{j} \frac{\lambda^j}{\lambda^N} \tag{A.48}
\]
After renormalization such that $\|x(0)\|_1 = 1$, i.e. by dividing each component by $\sum_{j=0}^{N} x_j(0) = \frac{1}{\lambda^N} \sum_{j=0}^{N} \binom{N}{j} \lambda^j = \frac{(1+\lambda)^N}{\lambda^N}$, the steady-state vector (14.52) is obtained. Similarly, for the largest negative eigenvalue $\xi_0$ in (A.47), we find with $r_1 = 1 - \frac{N}{c}$, $r_2 = \frac{1}{\lambda\left(\frac{N}{c} - 1\right)}$ and $k = c_1 = N$ such that
\[
x_j(N) = (-1)^{N-j} \binom{N}{j}\, r_1^{\,N-j}\, r_2^{\,0} = \binom{N}{j} \left(\frac{N}{c} - 1\right)^{N-j} \tag{A.49}
\]
The left-eigenvectors $y$ satisfy (A.9): $y^T D^{-1} Q = \xi y^T$. The above approach is applicable. However, there is a more elegant method based on the observation that there exists a diagonal matrix $W = \operatorname{diag}(W_0, \ldots, W_N)$ for which $W^{-1} Q W = \left(W^{-1} Q W\right)^T$, namely $W_j = \sqrt{\binom{N}{j} \lambda^j}$. Since $W^{-1} Q W$ is symmetric, the left- and right-eigenvectors corresponding to the same eigenvalue are the same (Section A.2, art. 9). Now $y^T D^{-1} Q = \xi y^T$ is equivalent to
\[
y^T W\, W^{-1} D^{-1} W\, W^{-1} Q W = y^T W \left(W^{-1} D W\right)^{-1} \left(W^{-1} Q W\right) = \xi\, y^T W
\]
With $y_W^T = y^T W$, $D_W = W^{-1} D W$ and $Q_W = W^{-1} Q W = Q_W^T$, we obtain $y_W^T D_W^{-1} Q_W = \xi y_W^T$. The transpose $\xi y_W = Q_W D_W^{-1} y_W$ is
\[
\xi\, W^2 y = Q D^{-1} W^2 y
\]
which shows compared to $D^{-1} Q x = \xi x$ that $x = W^2 y$ or, the vector components are, for $0 \le j \le N$,
\[
x_j = \binom{N}{j} \lambda^j\, y_j \tag{A.50}
\]
A.5.2.3 General tri-diagonal matrices
Since tri-diagonal matrices of the form (11.1) frequently occur in Markov theory, we devote this section to illustrate how far the eigen-analysis can be pushed. For an eigenpair (the right-eigenvector $x$ belonging to eigenvalue $\lambda$), the components in $(P - \lambda I)x = 0$ satisfy
\[
\begin{aligned}
(r_0 - \lambda)\, x_0 + p_0 x_1 &= 0 \\
q_j x_{j-1} + (r_j - \lambda)\, x_j + p_j x_{j+1} &= 0 \qquad 1 \le j < N \\
q_N x_{N-1} + (r_N - \lambda)\, x_N &= 0
\end{aligned}
\]
If $p_j = p$ and $q_j = q$, the matrix $P$ reduces to a Toeplitz form for which the eigenvalues and eigenvectors can be explicitly written, as shown in Appendix A.5.2.1. Here, we consider the general case and show how orthogonal polynomials enter the scene.
Using $r_j = 1 - q_j - p_j$, $r_0 = 1 - p_0$ and $r_N = 1 - q_N$, the set becomes, with $\xi = \lambda - 1$,
\[
\begin{aligned}
x_1(\xi) &= \frac{p_0 + \xi}{p_0}\, x_0(\xi) \\
x_{j+1}(\xi) &= \frac{p_j + q_j + \xi}{p_j}\, x_j(\xi) - \frac{q_j}{p_j}\, x_{j-1}(\xi) \qquad 1 \le j < N \\
x_N(\xi) &= \frac{q_N}{q_N + \xi}\, x_{N-1}(\xi)
\end{aligned}
\tag{A.51}
\]
The dependence on the eigenvalue $\xi$ is made explicit. Solving (A.51) iteratively for $j < N$,
\[
\begin{aligned}
x_2(\xi) &= \frac{x_0(\xi)}{p_0 p_1} \left[\xi^2 + (q_1 + p_1 + p_0)\, \xi + p_1 p_0\right] \\
x_3(\xi) &= \frac{x_0(\xi)}{p_2 p_1 p_0} \left[\xi^3 + (q_1 + q_2 + p_2 + p_1 + p_0)\, \xi^2 \right. \\
&\qquad \left. + (q_2 q_1 + q_2 p_0 + p_2 q_1 + p_2 p_1 + p_2 p_0 + p_1 p_0)\, \xi + p_2 p_1 p_0\right]
\end{aligned}
\]
reveals a polynomial of degree $j$ in the eigenvalue $\xi = \lambda - 1$. By inspection, the general form of $x_j(\xi)$ for $j < N$ is
\[
x_j(\xi) = \frac{x_0(\xi)}{\prod_{m=0}^{j-1} p_m} \sum_{k=0}^{j} c_k(j)\, \xi^k \tag{A.52}
\]

with
\[
\begin{aligned}
c_j(j) &= 1 \\
c_{j-1}(j) &= \sum_{m=0}^{j-1} (p_m + q_m) \\
c_0(j) &= \prod_{m=0}^{j-1} p_m
\end{aligned}
\]
where $q_0 = p_N = 0$. By substituting (A.52) into (A.51),
\[
\sum_{k=1}^{j-1} c_k(j+1)\, \xi^k = \sum_{k=1}^{j-1} \left[(q_j + p_j)\, c_k(j) - q_j p_{j-1}\, c_k(j-1) + c_{k-1}(j)\right] \xi^k
\]
and equating the corresponding powers in $\xi$, a recursion relation for the coefficients $c_k(j)$ ($0 \le k < N$) is obtained with $c_j(j) = 1$,
\[
c_k(j+1) = (q_j + p_j)\, c_k(j) - q_j p_{j-1}\, c_k(j-1) + c_{k-1}(j)
\]
from which all coefficients can be determined. Finally, for $j = N$, the explicit form of $x_N(\xi)$ follows from (A.51) as
\[
x_N(\xi) = \frac{q_N}{q_N + \xi}\, x_{N-1}(\xi) = \frac{q_N}{q_N + \xi}\, \frac{x_0(\xi)}{\prod_{m=0}^{N-2} p_m} \sum_{k=0}^{N-1} c_k(N-1)\, \xi^k
\]
We can always scale an eigenvector without affecting the corresponding eigenvalue. If we require a normalization of the eigenvector $\|x(\xi)\|_1 = 1$, then $x_0(\xi)$ is uniquely determined,
\[
x_0(\xi) = \frac{1}{1 + \sum_{j=1}^{N-1} \sum_{k=0}^{j} \frac{c_k(j)}{\prod_{m=0}^{j-1} p_m}\, \xi^k + \frac{q_N}{|q_N + \xi|} \sum_{k=0}^{N-1} \frac{c_k(N-1)}{\prod_{m=0}^{N-2} p_m}\, \xi^k}
\]
Another scaling consists of choosing $x_0(\xi) = 1$. Hence, apart from the eigenvalue $\xi$, all eigenvector components $x_j(\xi)$ are explicitly determined.
If $\lambda = 1$ or $\xi = 0$, the solution is $x_j(\xi) = x_0(0)$. If $\|x(\xi)\|_1 = 1$, then $x_j(\xi) = \frac{1}{N+1}$, which is, after proper scaling by $N + 1$ (art. 4 in Section A.1), the right-eigenvector $u = [1\ 1\ \cdots\ 1]$ belonging to the left-eigenvector $\pi$ (see also Section A.4.1). If $x_0(\xi) = 1$, we immediately obtain $u$. Eigenvectors belonging to different eigenvalues $\xi' \neq \xi$ are linearly independent (art. 3 in Section A.1), but only orthogonal if $P = P^T$, i.e. if $p_j = q_{j+1}$. Only in the latter case (art. 9 in Section A.2), where also all eigenvalues are real, we have
\[
\sum_{j=0}^{N} x_j(\xi)\, x_j(\xi') = \|x(\xi)\|_2^2\, \delta_{\xi\xi'}
\]
This orthogonality requirement determines the different eigenvalues $\xi$. Since $\xi' = 0$ is an eigenvalue, each other real eigenvalue $\xi \neq 0$ must obey
\[
\sum_{j=0}^{N} x_j(\xi) = 0
\]
while the normalization enforces $\|x(\xi)\|_1 = \sum_{j=0}^{N} |x_j(\xi)| = 1$. The scaling $x_0(\xi) = 1$ leads to the polynomial $\sum_{k=0}^{N} b_k \xi^k$ of degree $N$ whose $N$ zeros equal the eigenvalues $\xi \neq 0$ and whose coefficients are, with $p_j = q_{j+1}$ and for $2 \le k \le N - 2$,
\[
\begin{aligned}
b_0 &= (N + 1)\, q_N \\
b_1 &= N + q_N \sum_{j=1}^{N-2} \frac{c_1(j)}{\prod_{m=0}^{j-1} p_m} + 2 q_N \frac{c_1(N-1)}{\prod_{m=0}^{N-2} p_m} \\
b_k &= \sum_{j=k}^{N-2} \frac{q_N\, c_k(j)}{\prod_{m=0}^{j-1} p_m} + \sum_{j=k-1}^{N-1} \frac{c_{k-1}(j)}{\prod_{m=0}^{j-1} p_m} + \frac{2 q_N\, c_k(N-1)}{\prod_{m=0}^{N-2} p_m} \\
b_{N-1} &= \frac{p_{N-2} + 2 q_N + c_{N-2}(N-1)}{\prod_{m=0}^{N-2} p_m} \\
b_N &= \frac{1}{\prod_{m=0}^{N-2} p_m}
\end{aligned}
\]
The Newton identities (B.9) relate these coefficients to the sum of integer powers of the real zeros $\xi \neq 0$.
Proceeding much further in the case that $P$ is not symmetric is difficult. A similarity transform is needed to transform the linearly independent set of vectors $x(\xi)$ for different $\xi$ to an orthogonal set from which the eigenvalues then follow, as in the symmetric case above. Karlin and McGregor (see Schoutens (2000, Chapter 3)) have shown the existence of a set of orthogonal polynomials (similar to our set $x_j(\xi)$) that obey an integral orthogonality condition (similar to Legendre or Chebyshev polynomials) instead of our summation orthogonality condition. Only in particular cases, however, were they able to specify this orthogonal set explicitly.


A.5.3 A triangular matrix complemented with one subdiagonal


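The transition probability matrix $P$ has the structure of a triangular matrix complemented with one subdiagonal,
\[
P = \begin{bmatrix}
P_{00} & P_{01} & P_{02} & \cdots & \cdots & P_{0N} \\
P_{10} & P_{11} & P_{12} & \cdots & \cdots & P_{1N} \\
0 & P_{21} & P_{22} & \cdots & \cdots & P_{2N} \\
\vdots & \ddots & \ddots & \ddots & & \vdots \\
0 & \cdots & 0 & P_{N-1,N-2} & P_{N-1,N-1} & P_{N-1,N} \\
0 & \cdots & 0 & 0 & P_{N,N-1} & P_{NN}
\end{bmatrix}
\]
Besides the normalization $\|\pi\|_1 = 1$, the steady-state vector $\pi$ obeys the relation $\pi = \pi P$, or per vector component (9.23),
\[
\pi_j = \sum_{k=0}^{j+1} P_{kj}\, \pi_k
\]
because $P_{kj} = 0$ if $k > j + 1$. Immediately we obtain an iterative equation that expresses $\pi_{j+1}$ (for $j < N$) in terms of the $\pi_k$ for $0 \le k \le j$ as
\[
\pi_{j+1} = \frac{1 - P_{jj}}{P_{j+1,j}}\, \pi_j - \sum_{k=0}^{j-1} \frac{P_{kj}}{P_{j+1,j}}\, \pi_k
\]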
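The iterative equation for the steady-state vector translates directly into a forward recursion. The sketch below is an illustration under assumptions: it builds a random stochastic matrix with the required zero pattern (the generator `hessenberg_stochastic` and its seed are hypothetical), starts the recursion from an arbitrary first component equal to 1, and normalizes at the end.

```python
import random

def hessenberg_stochastic(n, seed=1):
    """Random stochastic matrix with P[k][j] = 0 whenever k > j + 1."""
    rng = random.Random(seed)
    P = []
    for k in range(n):
        row = [rng.random() + 0.1 if j >= k - 1 else 0.0 for j in range(n)]
        total = sum(row)
        P.append([v / total for v in row])
    return P

def steady_state_by_recursion(P):
    """Solve pi = pi P column by column, then normalize so the components sum to 1."""
    n = len(P)
    pi = [1.0]
    for j in range(n - 1):
        s = sum(P[k][j] * pi[k] for k in range(j))
        pi.append(((1.0 - P[j][j]) * pi[j] - s) / P[j + 1][j])
    total = sum(pi)
    return [v / total for v in pi]

P = hessenberg_stochastic(6)
pi = steady_state_by_recursion(P)
balance_error = max(abs(pi[j] - sum(pi[k] * P[k][j] for k in range(6)))
                    for j in range(6))
```

The recursion only uses the first columns of the balance equations; the last column is then satisfied automatically because every row of the matrix sums to 1.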

Let us consider the eigenvalue equation (A.1) that is written for stochastic matrices as $(P - \lambda I)^T x^T = 0$. The matrix $(P - \lambda I)^T$ is a $(N + 1) \times (N + 1)$ matrix of rank $N$ because $\det(P - \lambda I)^T = 0$ (else all eigenvectors $x$ are zero). When writing this set of equations in terms of $x_0$, we produce the following set of $N$ equations,
\[
\begin{bmatrix}
P_{10} & 0 & 0 & \cdots & 0 & 0 \\
P_{11} - \lambda & P_{21} & 0 & \cdots & 0 & 0 \\
P_{12} & P_{22} - \lambda & P_{32} & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
P_{1,N-2} & P_{2,N-2} & P_{3,N-2} & \cdots & P_{N-1,N-2} & 0 \\
P_{1,N-1} & P_{2,N-1} & P_{3,N-1} & \cdots & P_{N-1,N-1} - \lambda & P_{N,N-1}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{N-1} \\ x_N \end{bmatrix}
=
\begin{bmatrix} \lambda - P_{00} \\ -P_{01} \\ -P_{02} \\ \vdots \\ -P_{0,N-2} \\ -P_{0,N-1} \end{bmatrix} x_0
\]
Since the matrix on the left-hand side is a triangular matrix, its determinant equals the product of the diagonal elements or $\prod_{k=0}^{N-1} P_{k+1,k}$. By Cramer's rule, we find that
\[
\frac{x_j}{x_0} = \frac{1}{\prod_{k=0}^{N-1} P_{k+1,k}} \det
\begin{bmatrix}
P_{10} & 0 & \cdots & 0 & \lambda - P_{00} & 0 & \cdots & 0 \\
P_{11} - \lambda & P_{21} & \cdots & 0 & -P_{01} & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & & \vdots \\
P_{1,j-2} & P_{2,j-2} & \cdots & P_{j-1,j-2} & -P_{0,j-2} & 0 & \cdots & 0 \\
P_{1,j-1} & P_{2,j-1} & \cdots & P_{j-1,j-1} - \lambda & -P_{0,j-1} & 0 & \cdots & 0 \\
P_{1j} & P_{2j} & \cdots & P_{j-1,j} & -P_{0j} & P_{j+1,j} & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & \ddots & \vdots \\
P_{1,N-1} & P_{2,N-1} & \cdots & P_{j-1,N-1} & -P_{0,N-1} & P_{j+1,N-1} & \cdots & P_{N,N-1}
\end{bmatrix}
\]
The above determinant is of the form (Meyer, 2000, p. 467)
\[
\det \begin{bmatrix} A_{j \times j} & O_{j \times (N-j)} \\ B_{(N-j) \times j} & C_{(N-j) \times (N-j)} \end{bmatrix} = \det C\, \det A
\]
where $\det C = \prod_{k=j}^{N-1} P_{k+1,k}$. In the determinant $\det A$, we can change the $j$-th column with the $(j-1)$-th, and subsequently, the $(j-1)$-th with the $(j-2)$-th and so on until the last column is permuted to the first column, in total $j - 1$ permutations. After changing the sign of that first column, the result is that $\det A = (-1)^j \det\left(P_{j \times j} - \lambda I_{j \times j}\right)$ where $P_{j \times j}$ is the original transition probability matrix limited to $j$ states (instead of $N + 1$). Hence, for $1 \le j \le N$,
\[
x_j = \frac{x_0\, (-1)^j \det\left(P_{j \times j} - \lambda I_{j \times j}\right)}{\prod_{k=0}^{j-1} P_{k+1,k}}
\]
and the normalization of eigenvectors $\|x\|_1 = 1$ determines $x_0$ as
\[
x_0 = \frac{1}{1 + \sum_{j=1}^{N} \frac{(-1)^j \det\left(P_{j \times j} - \lambda I_{j \times j}\right)}{\prod_{k=0}^{j-1} P_{k+1,k}}}
\]
If the $N + 1$ eigenvalues $\lambda$ are known, we observe that all eigenvectors can be expressed in terms of the original matrix $P$ in a same way.

Appendix B
Algebraic graph theory

This appendix reviews the elementary basics of the matrix theory for graphs $G(N, L)$. The book by Cvetković et al. (1995) is the current standard work on algebraic graph theory.

B.1 The adjacency and incidence matrix


1. The adjacency matrix $A$ of a graph $G$ with $N$ nodes is an $N \times N$ matrix with elements $a_{ij} = 1$ only if $(i, j)$ is a link of $G$, otherwise $a_{ij} = 0$. Because the existence of a link implies that $a_{ij} = a_{ji}$, the adjacency matrix $A = A^T$ is a real symmetric matrix. It is assumed further that the graph $G$ does not contain self-loops ($a_{ii} = 0$) nor multiple links between two nodes. The complement $G^c$ of the graph $G$ consists of the same set of nodes but with a link between $(i, j)$ if there is no link $(i, j)$ in $G$ and vice versa. Thus, $(G^c)^c = G$ and the adjacency matrix $A^c$ of the complement $G^c$ is $A^c = J - I - A$ where $J$ is the all-one matrix ($(J)_{ij} = 1$). Information about the direction of the links is specified by the incidence matrix $B$, an $N \times L$ matrix with elements
\[
b_{ij} = \begin{cases} 1 & \text{if link } e_j = i \to j \\ -1 & \text{if link } e_j = i \leftarrow j \\ 0 & \text{otherwise} \end{cases}
\]

Fig. B.1. A graph with $N = 6$ and $L = 9$. The links are lexicographically ordered, $e_1 = 1 \to 2$, $e_2 = 1 \to 3$, $e_3 = 1 \leftarrow 6$, etc.

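Figure B.1 exemplifies the definition of $A$ and $B$:
\[
A = \begin{bmatrix}
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\qquad
B = \begin{bmatrix}
1 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 1 & -1 & -1 & 0 & 0 & 0 \\
0 & -1 & 0 & -1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & -1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & -1 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1
\end{bmatrix}
\]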

2. The relation between adjacency and incidence matrix is given by the admittance matrix or Laplacian $Q$,
\[
Q = B B^T = \Delta - A
\]
where $\Delta = \operatorname{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix. Indeed, if $i \neq j$ and noting that each column has only two non-zero elements at a different row,
\[
q_{ij} = \left(B B^T\right)_{ij} = \sum_{k=1}^{L} b_{ik} b_{jk} = -a_{ij}
\]
If $i = j$, then $\sum_{k=1}^{L} b_{ik}^2 = d_i$, the number of links that have node $i$ in common.
Also, by the definition of $A$, the row sum $i$ of $A$ equals the degree $d_i$ of node $i$,
\[
d_i = \sum_{k=1}^{N} a_{ik} \tag{B.1}
\]
Consequently, each row sum $\sum_{k=1}^{N} q_{ik} = 0$, which shows that $Q$ is singular, implying that $\det Q = 0$. The Laplacian is symmetric, $Q = Q^T$, because $A$ and $\Delta$ are both symmetric and the quadratic form defined in Section A.2 art. 10,
\[
x^T Q x = x^T Q^T x = x^T B B^T x = \left\|B^T x\right\|_2^2 \ge 0
\]
is positive semidefinite, which implies that all eigenvalues of $Q$ are non-negative and at least one is zero because $\det Q = 0$.
Since $\sum_{i=1}^{N} \sum_{k=1}^{N} a_{ik} = 2L$, the basic law for the degree follows as
\[
\sum_{i=1}^{N} d_i = 2L \tag{B.2}
\]
Notice that $P = \Delta^{-1} A$ is a stochastic matrix because all elements of $P$ lie in the interval $[0, 1]$ and each row sum is 1.
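The relation between the incidence matrix, the degree matrix and the adjacency matrix can be checked on the graph of Fig. B.1. In the sketch below only the direction of the first three links is stated in the figure caption; the orientation of the remaining six links is an assumption, which does not affect the product of the incidence matrix with its transpose.

```python
# Links of Fig. B.1 in lexicographic order, written as (tail, head).
# Orientations after the third link are assumed for illustration.
links = [(1, 2), (1, 3), (6, 1), (2, 3), (5, 2), (6, 2), (3, 4), (5, 4), (6, 5)]
N = 6

A = [[0] * N for _ in range(N)]
B = [[0] * len(links) for _ in range(N)]
for j, (tail, head) in enumerate(links):
    A[tail - 1][head - 1] = A[head - 1][tail - 1] = 1
    B[tail - 1][j] = 1       # link j leaves this node
    B[head - 1][j] = -1      # link j enters this node

degree = [sum(row) for row in A]
# Laplacian as the product of the incidence matrix with its transpose.
Q = [[sum(B[i][k] * B[j][k] for k in range(len(links))) for j in range(N)]
     for i in range(N)]
# Laplacian as degree matrix minus adjacency matrix.
expected = [[(degree[i] if i == j else 0) - A[i][j] for j in range(N)]
            for i in range(N)]
```

Both constructions should give the same matrix, every row of which sums to zero, and the degrees should add up to twice the number of links.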
3. Let $J$ denote the all-one matrix with $(J)_{ij} = 1$ and $\kappa(G)$ the total number of spanning trees in the graph $G$, also called the complexity of $G$, then
\[
\operatorname{adj} Q = \kappa(G)\, J \tag{B.3}
\]
where $Q^{-1} = \frac{\operatorname{adj} Q}{\det Q}$. We omit the proof, but apply the relation (B.3) to the complete graph $K_N$ where $Q = N I - J$. Equation (B.3) demonstrates that all elements of $\operatorname{adj} Q$ are equal to $\kappa(G)$. Hence, it suffices to compute one suitable element of $\operatorname{adj} Q$, for example $(\operatorname{adj} Q)_{11}$ that is equal to the determinant of the $(N - 1) \times (N - 1)$ principal submatrix of $Q$ obtained by deleting the first row and column in $Q$,
\[
(\operatorname{adj} Q)_{11} = \det \begin{bmatrix}
N - 1 & -1 & \cdots & -1 \\
-1 & N - 1 & \cdots & -1 \\
\vdots & \vdots & \ddots & \vdots \\
-1 & -1 & \cdots & N - 1
\end{bmatrix}
\]
Adding all rows to the first and subsequently adding this new first row to all other rows gives
\[
(\operatorname{adj} Q)_{11} = \det \begin{bmatrix}
1 & 1 & \cdots & 1 \\
-1 & N - 1 & \cdots & -1 \\
\vdots & \vdots & \ddots & \vdots \\
-1 & -1 & \cdots & N - 1
\end{bmatrix}
= \det \begin{bmatrix}
1 & 1 & \cdots & 1 \\
0 & N & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & N
\end{bmatrix}
= N^{N-2}
\]
Hence, the total number of spanning trees in the complete graph $K_N$, which is also the total number of possible spanning trees in any graph with $N$ nodes, equals $N^{N-2}$. This is a famous theorem of Cayley of which many proofs exist (van Lint and Wilson, 1996, Chapter 2).
4. The complexity of $G$ is also given by
\[
\kappa(G) = \frac{\det(J + Q)}{N^2} \tag{B.4}
\]
Indeed, observe that $J Q = (J B) B^T = 0$ since $J B = 0$. Hence,
\[
(N I - J)(J + Q) = N J + N Q - J^2 - J Q = N Q
\]
and
\[
\operatorname{adj}\left((N I - J)(J + Q)\right) = \operatorname{adj}(J + Q)\, \operatorname{adj}(N I - J) = \operatorname{adj}(N Q)
\]
Since $Q_{K_N} = N I - J$ and, as shown in art. 3, $\operatorname{adj}(N I - J) = N^{N-2} J$ and since $\operatorname{adj}(N Q) = N^{N-1} \operatorname{adj} Q = N^{N-1} \kappa(G)\, J$ where we have used (B.3),
\[
\operatorname{adj}(J + Q)\, J = N \kappa(G)\, J
\]
Left-multiplication with $J + Q$, taking into account that $J Q = 0$ and $J^2 = N J$, finally gives
\[
(J + Q)\, \operatorname{adj}(J + Q)\, J = \det(J + Q)\, J = N^2 \kappa(G)\, J
\]
which proves (B.4).
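Both expressions for the number of spanning trees can be evaluated exactly with rational arithmetic. The following sketch assumes hypothetical helper names (`det`, `laplacian`, `trees_by_cofactor`, `trees_by_b4`) and compares the cofactor of the Laplacian with the determinant formula (B.4) on complete graphs, where Cayley's count is known.

```python
from fractions import Fraction

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            sign = -sign
        result *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c2 in range(col, n):
                M[r][c2] -= factor * M[col][c2]
    return sign * result

def laplacian(A):
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

def trees_by_cofactor(A):
    """Element (1,1) of the adjugate: delete first row and column of the Laplacian."""
    Q = laplacian(A)
    return det([row[1:] for row in Q[1:]])

def trees_by_b4(A):
    """Determinant of (all-one matrix + Laplacian) divided by the squared node count."""
    n = len(A)
    Q = laplacian(A)
    return det([[Q[i][j] + 1 for j in range(n)] for i in range(n)]) / (n * n)

def complete_graph(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]
```

For example, `trees_by_cofactor(complete_graph(5))` and `trees_by_b4(complete_graph(5))` should both return 125.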
5. A walk of length $k$ from node $i$ to node $j$ is a succession of $k$ arcs of the form $(n_0 \to n_1)(n_1 \to n_2) \cdots (n_{k-1} \to n_k)$ where $n_0 = i$ and $n_k = j$. A path is a walk in which all vertices are different, i.e. $n_l \neq n_m$ for all $0 \le l \neq m \le k$.
Lemma B.1.1 The number of walks of length $k$ from node $i$ to node $j$ is equal to the element $\left(A^k\right)_{ij}$.
Proof (by induction): For $k = 1$, the number of walks of length 1 between state $i$ and $j$ equals the number of direct links between $i$ and $j$, which is by definition the element $a_{ij}$ in the adjacency matrix $A$. Suppose the lemma holds for $k - 1$. A walk of length $k$ consists of a walk of length $k - 1$ from $i$ to some vertex $r$ which is adjacent to $j$. By the induction hypothesis, the number of walks of length $k - 1$ from $i$ to $r$ is $\left(A^{k-1}\right)_{ir}$ and the number of walks with length 1 from $r$ to $j$ equals $a_{rj}$. The total number of walks from $i$ to $j$ with length $k$ then equals $\sum_{r=1}^{N} \left(A^{k-1}\right)_{ir} a_{rj} = \left(A^k\right)_{ij}$ (by the rules of matrix multiplication).
Explicitly,
\[
\left(A^k\right)_{ij} = \sum_{r_1=1}^{N} \sum_{r_2=1}^{N} \cdots \sum_{r_{k-1}=1}^{N} a_{i r_1} a_{r_1 r_2} \cdots a_{r_{k-2} r_{k-1}} a_{r_{k-1} j}
\]
As shown in Section 15.2, the number of paths with $k$ hops between node $i$ and node $j$ is
\[
X_k(i \to j; N) = \sum_{r_1 \neq \{i, j\}} \; \sum_{r_2 \neq \{i, r_1, j\}} \cdots \sum_{r_{k-1} \neq \{i, r_1, \ldots, r_{k-2}, j\}} a_{i r_1} a_{r_1 r_2} \cdots a_{r_{k-1} j}
\]
The definition of a path restricts the first index $r_1$ to $N - 2$ possible values, the second $r_2$ to $N - 3$, etc. such that the total possible number of paths is
\[
\prod_{l=1}^{k-1} (N - 1 - l) = \frac{(N - 2)!}{(N - k - 1)!}
\]
whereas the total possible number of walks clearly is $N^{k-1}$.
A graph is connected if, for each pair of nodes, there exists a walk or, equivalently, if there exists some integer $k > 0$ for which $\left(A^k\right)_{ij} \neq 0$ for each $i, j$. The lowest integer $k$ for which $\left(A^k\right)_{ij} \neq 0$ for each pair of nodes $i, j$ is called the diameter of the graph $G$. Lemma B.1.1 demonstrates that the diameter equals the length of the longest shortest hop path in $G$.
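Lemma B.1.1 can be illustrated by comparing a matrix power with a brute-force enumeration of node sequences. The sketch below uses the adjacency matrix of the graph in Fig. B.1 with zero-based node indices; the helper names and the choice of walk length 3 are illustrative assumptions.

```python
from itertools import product

# Adjacency matrix of the graph in Fig. B.1 (nodes renumbered 0..5).
A = [[0, 1, 1, 0, 0, 1],
     [1, 0, 1, 0, 1, 1],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [1, 1, 0, 0, 1, 0]]

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][r] * Y[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(X, k):
    """k-th power of X for an integer k >= 1."""
    result = X
    for _ in range(k - 1):
        result = mat_mul(result, X)
    return result

def count_walks(A, i, j, k):
    """Enumerate every node sequence i, r1, ..., r_{k-1}, j and keep those using only links."""
    n = len(A)
    total = 0
    for middle in product(range(n), repeat=k - 1):
        nodes = (i,) + middle + (j,)
        if all(A[u][v] for u, v in zip(nodes, nodes[1:])):
            total += 1
    return total

k = 3
Ak = mat_pow(A, k)
```

Every element of `Ak` should coincide with the enumerated number of walks of that length between the corresponding pair of nodes.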

B.2 The eigenvalues of the adjacency matrix


In this section, only general results of the eigenvalue spectrum of a graph $G$ are treated. For special types of graphs, there exists a wealth of additional, but specific properties of the eigenvalues.
1. Since $A$ is a real symmetric matrix, it has $N$ real eigenvalues (Section A.2), which we order as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$. Section A.1, art. 4 shows that, apart from a similarity transform, the set of eigenvalues with corresponding eigenvectors is unique. A similarity transform consists of a relabeling of the nodes in the graph that obviously does not alter the structure of the graph but merely expresses the eigenvectors in a different base. The classical Perron-Frobenius Theorem for non-negative square matrices (of which Theorem A.4.2 is a special case) states that $\lambda_1$ is a simple and positive root of the characteristic polynomial in (A.3) possessing the only eigenvector of $A$ with non-zero components. Moreover, it follows from (A.34) that
\[
\lambda_1 = \sup_{x \neq 0} \frac{x^T A x}{x^T x} = \max_{x \neq 0} \frac{\sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} x_i x_j}{\sum_{i=1}^{N} x_i^2}
\]
The maximum is attained if and only if $x$ is the eigenvector of $A$ belonging to $\lambda_1$ and for any other vector $y \neq x$, $\lambda_1 \ge \frac{y^T A y}{y^T y}$ as shown in Section A.3. By choosing the vector $y = u = (1, 1, \ldots, 1)$, we have, with $\sum_{j=1}^{N} a_{ij} = d_i$ and (B.2),
\[
\lambda_1 \ge \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij} = \frac{1}{N} \sum_{i=1}^{N} d_i = \frac{2L}{N} \tag{B.5}
\]
The stochastic matrix $P = \Delta^{-1} A$, where $\Delta = \operatorname{diag}(d_1, d_2, \ldots, d_N)$ is the degree matrix, has the characteristic polynomial $\det\left(\Delta^{-1} A - \lambda I\right) = \frac{\det(A - \lambda \Delta)}{\det \Delta}$ where $\det \Delta = \prod_{j=1}^{N} d_j$. Since the largest eigenvalue of a stochastic matrix equals $\lambda_1 = 1$ (Theorem A.4.2), for a regular graph where $d_j = r$, the largest eigenvalue equals $\lambda_1 = r$.

2. Since $a_{ii} = 0$, we have that $\operatorname{trace}(A) = 0$. From (A.7),
\[
(-1)^{N-1} c_{N-1} = \sum_{k=1}^{N} \lambda_k = 0 \tag{B.6}
\]

3. The Newton identities for polynomials. Let $p_n(z)$ denote a polynomial of order $n$ defined as
\[
p_n(z) = \sum_{k=0}^{n} a_k(n)\, z^k = a_n(n) \prod_{k=1}^{n} \left(z - z_k(n)\right) \tag{B.7}
\]
where $\{z_k(n)\}$ are the $n$ zeros. It follows from (B.7) that $p_n(0) = a_0(n) = a_n(n) \prod_{k=1}^{n} \left(-z_k(n)\right)$. The logarithmic derivative of (B.7) is
\[
p_n'(z) = p_n(z) \sum_{k=1}^{n} \frac{1}{z - z_k(n)}
\]
For $z > \max_k z_k(n)$ (which is always possible for polynomials, but not for functions), we have that
\[
p_n'(z) = p_n(z) \sum_{k=1}^{n} \sum_{j=0}^{\infty} \frac{\left(z_k(n)\right)^j}{z^{j+1}} = p_n(z) \sum_{j=0}^{\infty} \frac{Z_j(n)}{z^{j+1}}
\]
where
\[
Z_j(n) = \sum_{k=1}^{n} \left(z_k(n)\right)^j
\]
Thus
\[
\sum_{k=1}^{n} k\, a_k(n)\, z^k = \sum_{k=0}^{n} a_k(n)\, z^k \sum_{j=0}^{\infty} Z_j(n)\, z^{-j} = \sum_{j=0}^{\infty} \sum_{k=0}^{n} a_k(n)\, Z_j(n)\, z^{k-j} \tag{B.8}
\]
Let $l = k - j$, then $-\infty \le l \le n$. Also $j = k - l \ge 0$ such that $k \ge l$. Combined with $0 \le k \le n$, we have $\max(0, l) \le k \le n$. Thus,
\[
\begin{aligned}
\sum_{j=0}^{\infty} \sum_{k=0}^{n} a_k(n)\, Z_j(n)\, z^{k-j} &= \sum_{l=-\infty}^{n} \; \sum_{k=\max(l,0)}^{n} a_k(n)\, Z_{k-l}(n)\, z^l \\
&= \sum_{l=-\infty}^{-1} \sum_{k=0}^{n} a_k(n)\, Z_{k-l}(n)\, z^l + \sum_{l=0}^{n} \sum_{k=l}^{n} a_k(n)\, Z_{k-l}(n)\, z^l
\end{aligned}
\]
Equating the corresponding powers of $z$ in (B.8) yields
\[
\begin{aligned}
a_0(n)\, Z_{-l}(n) + \sum_{k=1}^{n} a_k(n)\, Z_{k-l}(n) &= 0 \qquad l < 0 \\
\sum_{k=l+1}^{n} a_k(n)\, Z_{k-l}(n) &= (l - n)\, a_l(n) \qquad 0 \le l \le n
\end{aligned}
\]
The last set of equations for $0 \le l < n$,
\[
a_l(n) = -\frac{1}{n - l} \sum_{k=l+1}^{n} a_k(n)\, Z_{k-l}(n) \tag{B.9}
\]
are the Newton identities that relate the coefficients of a polynomial to the sum of the positive powers of the zeros. Applied to the characteristic polynomials (A.3) and (A.5) of the adjacency matrix with $z_k(n) = \lambda_k$, $a_k(n) = (-1)^N c_k$ and $c_{N-1} = 0$ (from (B.6)) yields for the first few values,
\[
\begin{aligned}
(-1)^N c_{N-2} &= -\frac{1}{2} \sum_{k=1}^{N} \lambda_k^2 \\
(-1)^N c_{N-3} &= -\frac{1}{3} \sum_{k=1}^{N} \lambda_k^3 \\
(-1)^N c_{N-4} &= \frac{1}{8} \left[\left(\sum_{k=1}^{N} \lambda_k^2\right)^2 - 2 \sum_{k=1}^{N} \lambda_k^4\right]
\end{aligned}
\]
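The Newton identities (B.9) can be exercised by rebuilding the coefficients of a monic polynomial from the power sums of its zeros. The sketch below is a minimal illustration; the zeros are arbitrary example values and both function names are hypothetical.

```python
def coefficients_from_zeros(zeros):
    """Expand prod (z - z_k) into coefficients a_0..a_n (lowest power first)."""
    coeffs = [1.0]
    for zk in zeros:
        shifted = [0.0] + coeffs                      # multiply by z
        scaled = [-zk * c for c in coeffs] + [0.0]    # multiply by -z_k
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def coefficients_from_power_sums(zeros):
    """Rebuild a_0..a_n from a_n = 1 using the Newton identities (B.9)."""
    n = len(zeros)
    Z = [sum(zk ** j for zk in zeros) for j in range(n + 1)]   # power sums Z_j
    a = [0.0] * n + [1.0]
    for l in range(n - 1, -1, -1):
        a[l] = -sum(a[k] * Z[k - l] for k in range(l + 1, n + 1)) / (n - l)
    return a

zeros = [2.0, -1.0, 3.0, 0.5]
direct = coefficients_from_zeros(zeros)
newton = coefficients_from_power_sums(zeros)
```

The two coefficient lists should agree up to rounding, which confirms that the identities determine the coefficients from the highest order downwards.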

4. From (A.4), the coefficient of the characteristic polynomial $(-1)^N c_{N-2} = \sum_{\text{all}} M_2$. Each principal minor $M_2$ has a principal submatrix of the form $\begin{bmatrix} 0 & x \\ y & 0 \end{bmatrix}$ with $x, y \in [0, 1]$. A minor $M_2$ is non-zero if and only if $x = y = 1$ in which case $M_2 = -1$. For each set of adjacent nodes, there exists such non-zero minor, which implies that
\[
(-1)^N c_{N-2} = -L
\]
From art. 3, it follows that the number of links $L$ equals
\[
L = \frac{1}{2} \sum_{k=1}^{N} \lambda_k^2 \tag{B.10}
\]
5. Each principal submatrix $M_{3 \times 3}$ is of the form
\[
M_{3 \times 3} = \begin{bmatrix} 0 & x & z \\ x & 0 & y \\ z & y & 0 \end{bmatrix}
\]
with determinant $M_3 = \det M_{3 \times 3} = 2xyz$, which is only non-zero for $x = y = z = 1$. That form of $M_{3 \times 3}$ corresponds with a subgraph of 3 nodes that are fully connected. Hence, $(-1)^N c_{N-3} = -2 \times$ the number of triangles in $G$. From art. 3, it follows that
\[
\text{the number of triangles in } G = \frac{1}{6} \sum_{k=1}^{N} \lambda_k^3 \tag{B.11}
\]

6. In general, from (A.4) and by identifying the structure of a minor $M_k$, any coefficient $c_{N-k}$ can be expressed in terms of graph characteristics,
\[
(-1)^N c_{N-k} = \sum_{\mathcal{G} \in G_k} (-1)^{\operatorname{cycles}(\mathcal{G})} \tag{B.12}
\]
where $G_k$ is the set of all subgraphs of $G$ with exactly $k$ nodes and $\operatorname{cycles}(\mathcal{G})$ is the number of cycles in a subgraph $\mathcal{G} \in G_k$. The minor $M_k$ is a determinant of the $M_{k \times k}$ submatrix of $A$ and defined as
\[
\det M_k = \sum_{p} (-1)^{\sigma(p)} a_{1 p_1} a_{2 p_2} \cdots a_{k p_k}
\]
where the sum is over all $k!$ permutations $p = (p_1, p_2, \ldots, p_k)$ of $(1, 2, \ldots, k)$ and $\sigma(p)$ is the parity of $p$, i.e. the number of interchanges of $(1, 2, \ldots, k)$ to obtain $(p_1, p_2, \ldots, p_k)$. Only if all the links $(1, p_1), (2, p_2), \ldots, (k, p_k)$ are contained in $G$, $a_{1 p_1} a_{2 p_2} \cdots a_{k p_k}$ is non-zero. Since $a_{jj} = 0$, the sequence of contributing links $(1, p_1), (2, p_2), \ldots, (k, p_k)$ is a set of disjoint cycles and $\sigma(p)$ depends on the number of those disjoint cycles. Now, $\det M_k$ is constructed from a specific set $\mathcal{G} \in G_k$ of $k$ out of $N$ nodes and in total there are $\binom{N}{k}$ such sets in $G_k$. Combining all contributions leads to the expression (B.12).
7. Since $A$ is a symmetric 0-1 matrix, we observe that using (B.1),
\[
\left(A^2\right)_{ii} = \sum_{k=1}^{N} a_{ik} a_{ki} = \sum_{k=1}^{N} a_{ik}^2 = \sum_{k=1}^{N} a_{ik} = d_i
\]
Hence, with (A.16) or (B.10), (A.7) and the basic law for the degree (B.2) is expressed as
\[
\operatorname{trace}(A^2) = \sum_{k=1}^{N} \lambda_k^2 = \sum_{k=1}^{N} d_k = 2L \tag{B.13}
\]
Furthermore,
\[
\begin{aligned}
\sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \left(A^2\right)_{ij} &= \sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \sum_{k=1}^{N} a_{ik} a_{kj} = \sum_{k=1}^{N} \sum_{i=1}^{N} a_{ki} \sum_{j=1; j \neq i}^{N} a_{kj} \\
&= \sum_{k=1}^{N} \sum_{i=1}^{N} a_{ki} (d_k - a_{ki}) = \sum_{k=1}^{N} \left(d_k \sum_{i=1}^{N} a_{ki} - \sum_{i=1}^{N} a_{ki}\right)
\end{aligned}
\]
or
\[
\sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \left(A^2\right)_{ij} = \sum_{k=1}^{N} d_k (d_k - 1) \tag{B.14}
\]
Lemma B.1.1 states that $\sum_{i=1}^{N} \sum_{j=1; j \neq i}^{N} \left(A^2\right)_{ij}$ equals twice the total number of two-hop walks with different source and destination nodes. In other words, the total number of connected triplets of nodes in $G$ equals half (B.14).
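The counting relations (B.10), (B.11) and (B.14) can be checked without computing eigenvalues, because the sum of the $m$-th powers of the eigenvalues equals the trace of the $m$-th matrix power. The sketch below applies this to the adjacency matrix of the graph in Fig. B.1; the variable names are illustrative assumptions.

```python
# Adjacency matrix of the graph in Fig. B.1.
A = [[0, 1, 1, 0, 0, 1],
     [1, 0, 1, 0, 1, 1],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [1, 1, 0, 0, 1, 0]]
N = len(A)

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][r] * Y[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

A2 = mat_mul(A, A)
A3 = mat_mul(A2, A)

links = sum(A2[i][i] for i in range(N)) // 2          # half the trace of the square
triangles = sum(A3[i][i] for i in range(N)) // 6      # one sixth of the trace of the cube
degree = [sum(row) for row in A]
off_diagonal_sum = sum(A2[i][j] for i in range(N) for j in range(N) if i != j)
triplet_sum = sum(d * (d - 1) for d in degree)
```

For this graph the link count should be 9 and the triangle count 3 (the node triples 1-2-3, 1-2-6 and 2-5-6), while the off-diagonal sum of the squared matrix should equal the degree sum in (B.14).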
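8. The total number $N_k$ of walks of length $k$ in a graph follows from Lemma B.1.1 as
\[
N_k = \sum_{i=1}^{N} \sum_{j=1}^{N} \left(A^k\right)_{ij}
\]
Since any real symmetric matrix (Section A.2, art. 9) can be written as $A = U \operatorname{diag}(\lambda_j)\, U^T$ where $U$ is an orthogonal matrix of the (normalized) eigenvectors of $A$, we have that $A^k = U \operatorname{diag}(\lambda_j^k)\, U^T$ and $\left(A^k\right)_{ij} = \sum_{n=1}^{N} u_{in} u_{jn} \lambda_n^k$. Hence,
\[
N_k = \sum_{i=1}^{N} \sum_{j=1}^{N} \sum_{n=1}^{N} u_{in} u_{jn} \lambda_n^k = \sum_{n=1}^{N} \left(\sum_{i=1}^{N} u_{in}\right)^2 \lambda_n^k
\]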

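9. Applying the Hadamard inequality for the determinant of any matrix $C_{n \times n}$,
\[
|\det C| \le \prod_{j=1}^{n} \left(\sum_{i=1}^{n} |c_{ij}|^2\right)^{\frac{1}{2}}
\]
yields, with $a_{ij} = a_{ji}$ and (B.1),
\[
|\det A| \le \prod_{j=1}^{N} \left(\sum_{i=1}^{N} a_{ji}^2\right)^{\frac{1}{2}} = \prod_{j=1}^{N} \left(\sum_{i=1}^{N} a_{ji}\right)^{\frac{1}{2}} = \prod_{j=1}^{N} \sqrt{d_j}
\]
Hence, with (A.6),
\[
(\det A)^2 = \prod_{k=1}^{N} \lambda_k^2 \le \prod_{j=1}^{N} d_j \tag{B.15}
\]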

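10. Applying the Cauchy-Schwarz inequality (5.17)
\[
\left(\sum_{k=1}^{n} a_k b_k\right)^2 \le \sum_{k=1}^{n} a_k^2 \sum_{k=1}^{n} b_k^2
\]
to the vector $(\lambda_2, \ldots, \lambda_N)$ and the all-one vector $(1, 1, \ldots, 1)$ gives
\[
\left(\sum_{k=2}^{N} \lambda_k\right)^2 \le (N - 1) \sum_{k=2}^{N} \lambda_k^2
\]
Introducing (B.6) and (B.13),
\[
\lambda_1^2 \le (N - 1)\left(2L - \lambda_1^2\right)
\]
leads to the bound for the largest (and positive) eigenvalue $\lambda_1$,
\[
\lambda_1 \le \sqrt{\frac{2L(N - 1)}{N}} \tag{B.16}
\]
Alternatively, in terms of the average degree $d_a = \frac{1}{N} \sum_{j=1}^{N} d_j = \frac{2L}{N}$, the largest eigenvalue $\lambda_1$ is bounded by the geometric mean of the average degree and the maximum possible degree, $\lambda_1 \le \sqrt{d_a (N - 1)}$. Combining the lower bound (B.5) and upper bound (B.16) yields
\[
\frac{2L}{N} \le \lambda_1 \le \sqrt{\frac{2L(N - 1)}{N}} \tag{B.17}
\]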
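The two-sided bound (B.17) can be illustrated on the graph of Fig. B.1. The sketch below estimates the largest eigenvalue with a plain power iteration and a Rayleigh quotient, which is a substituted numerical technique rather than anything prescribed by the text; it assumes a connected graph that is not bipartite (this graph contains triangles), so that the iteration converges. The function name and iteration count are illustrative.

```python
import math

# Adjacency matrix of the graph in Fig. B.1.
A = [[0, 1, 1, 0, 0, 1],
     [1, 0, 1, 0, 1, 1],
     [1, 1, 0, 1, 0, 0],
     [0, 0, 1, 0, 1, 0],
     [0, 1, 0, 1, 0, 1],
     [1, 1, 0, 0, 1, 0]]

def largest_eigenvalue(A, steps=2000):
    """Power iteration started from the all-one vector; returns the Rayleigh quotient."""
    n = len(A)
    x = [1.0] * n
    lam = 0.0
    for _ in range(steps):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        lam = sum(yi * xi for yi, xi in zip(y, x)) / sum(xi * xi for xi in x)
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    return lam

N = len(A)
L = sum(map(sum, A)) // 2
lam1 = largest_eigenvalue(A)
lower = 2 * L / N                              # lower bound (B.5)
upper = math.sqrt(2 * L * (N - 1) / N)         # upper bound (B.16)
```

The first Rayleigh quotient, taken at the all-one vector, equals the lower bound exactly; the converged value should lie between `lower` and `upper`.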
11. From the inequality (A.26) for Hölder $q$-norms, we find that, if
\[
\sum_{k=2}^{N} |\lambda_k|^q < \lambda_1^q
\]
then
\[
\sum_{k=2}^{N} |\lambda_k|^p < \lambda_1^p
\]
for $p > q > 0$. Since $\sum_{k=1}^{N} \lambda_k = 0$, not all $\lambda_k$ can be positive and combined with $\sum_{k=2}^{N} \lambda_k^p \le \sum_{k=2}^{N} |\lambda_k|^p$, we also have that $\sum_{k=2}^{N} \lambda_k^p < \lambda_1^p$. Applied to the case where $q = 2$ and $p = 3$ gives the following implication: if $\sum_{k=2}^{N} \lambda_k^2 < \lambda_1^2$ then $\sum_{k=2}^{N} |\lambda_k|^3 < \lambda_1^3$. In that case, the number of triangles given in (B.11) is
\[
\text{the number of triangles in } G = \frac{1}{6} \lambda_1^3 + \frac{1}{6} \sum_{k=2}^{N} \lambda_k^3 \ge \frac{1}{6} \lambda_1^3 - \frac{1}{6} \sum_{k=2}^{N} |\lambda_k|^3 > 0
\]
Hence, if $\sum_{k=2}^{N} \lambda_k^2 < \lambda_1^2$, then the number of triangles in $G$ is at least one. Equivalently, in view of (B.10), if $\lambda_1 > \sqrt{L}$ then the graph $G$ contains at least one triangle.
12. A theorem of Turán states that

Theorem B.2.1 A graph $G$ with $N$ nodes and more than $\left[\frac{N^{2}}{4}\right]$ links contains at least one triangle.

This theorem is a consequence of art. 7 and 11. For, since $L$ is an integer, $L > \left[\frac{N^{2}}{4}\right]$ implies $L > \frac{N^{2}}{4}$, which is equivalent to $N < 2\sqrt{L}$. Using this in the bound on the largest eigenvalue (B.5),
$$\lambda_{1}\geq\frac{2L}{N}>\frac{2L}{2\sqrt{L}}=\sqrt{L}$$
and $\lambda_1 > \sqrt{L}$ is precisely the condition in art. 11 to have at least one triangle.

13. The eigenvalues of the complete graph $K_N$ are $\lambda_1 = N - 1$ and $\lambda_2 = \cdots = \lambda_N = -1$. This follows by computing the determinant in (A.2) in the same way as in Section B.1, art. 3. Alternatively, the adjacency matrix of the complete graph is $J - I$ and, if $u^{T} = [1\ 1\ \cdots\ 1]$ is the all-one vector, then $J = u.u^{T}$. A direct computation yields
$$\det(J-I-\lambda I)=\det\left(u.u^{T}-(\lambda+1)I\right)=\left(-(\lambda+1)\right)^{N}\det\left(I-\frac{u.u^{T}}{\lambda+1}\right)$$
Using (A.38) and $u^{T}u = N$,
$$\det(J-I-\lambda I)=\left(-(\lambda+1)\right)^{N}\left(1-\frac{N}{\lambda+1}\right)=(-1)^{N}(\lambda+1)^{N-1}(\lambda+1-N)$$
gives the eigenvalues of $K_N$. Since the number of links in $K_N$ is $L = \frac{N(N-1)}{2}$, we observe that the equality sign in (B.16) can occur. Since $L \leq \frac{N(N-1)}{2}$ for any graph, the upper bound (B.16) shows that $\lambda_1 \leq N - 1$ for any graph.


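14. The difference between the largest eigenvalue $\lambda_1$ and second largest $\lambda_2$ is never larger than $N$, i.e.
$$\lambda_{1}-\lambda_{2}\leq N \tag{B.18}$$
Since $\lambda_1 > 0$ as indicated by (B.17) and since $\lambda_k \leq \lambda_2$ for all $k \geq 2$, it follows from (B.6) that
$$0=\sum_{k=1}^{N}\lambda_{k}=\lambda_{1}+\sum_{k=2}^{N}\lambda_{k}\leq\lambda_{1}+(N-1)\lambda_{2}$$
such that
$$\lambda_{2}\geq-\frac{\lambda_{1}}{N-1}$$
Hence,
$$\lambda_{1}-\lambda_{2}\leq\lambda_{1}+\frac{\lambda_{1}}{N-1}=\frac{N\lambda_{1}}{N-1}$$
Art. 13 states that the largest possible eigenvalue is $\lambda_1 = N - 1$, attained by the complete graph, which proves (B.18). Again, the equality sign in (B.18) occurs in case of the complete graph.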
15. Regular graphs. Every node $j$ in a regular graph has the same degree $d_j = r$ and relation (B.1) indicates that each row sum of $A$ equals $r$.

Theorem B.2.2 The maximum degree $d_{\max} = \max_{1\leq j\leq N} d_j$ is an eigenvalue of the adjacency matrix $A$ of a connected graph $G$ if and only if the corresponding graph is regular (i.e. $d_j = d_{\max}$ for all $j$).

Proof: If $x$ is an eigenvector of $A$ belonging to eigenvalue $\lambda = d_{\max}$, so is each vector $kx$ for each complex $k$ (Section A.1, art. 1). Thus, we can scale the eigenvector $x$ such that the maximum component, say $x_m = 1$, and $x_k \leq 1$ for all $k$. The eigenvalue equation $Ax = d_{\max}x$ for that maximum component $x_m$ is
$$d_{\max}x_{m}=d_{\max}=\sum_{j=1}^{N}a_{mj}x_{j}$$
which implies that all $x_j = 1$ whenever $a_{mj} = 1$, i.e. when the node $j$ is adjacent to node $m$. Hence, the degree of node $m$ is $d_m = d_{\max}$. For any node $j$ adjacent to $m$ for which the component $x_j = 1$, the same eigenvalue relation holds and thus $d_j = d_{\max}$. Continuing this process shows that every node $k \in G$ has the same degree $d_k = d_{\max}$ because $G$ is connected. Hence, $x = u$ where $u^{T} = [1\ 1\ \cdots\ 1]$. Conversely, if $G$ is connected and regular, then $\sum_{j=1}^{N} a_{mj} = d_{\max}$ for each $m$ such that $u$ is the eigenvector belonging to eigenvalue $\lambda = d_{\max}$, and the only possible eigenvector (as follows from art. 1). Hence, there is only one eigenvalue $d_{\max}$.


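16. The characteristic polynomial of the complement $G^{c}$ is
$$\det\left(A^{c}-\lambda I\right)=\det\left(J-A-(\lambda+1)I\right)$$
$$=(-1)^{N}\det\left[\left(A+(\lambda+1)I\right)\left(I-\left(A+(\lambda+1)I\right)^{-1}J\right)\right]$$
$$=(-1)^{N}\det\left(A+(\lambda+1)I\right)\det\left(I-\left(A+(\lambda+1)I\right)^{-1}u.u^{T}\right)$$
where we have used that $J = u.u^{T}$ and $u$ is the all-one vector. Similar to the proof of Lemma A.4.4, we find
$$\det\left(A^{c}-\lambda I\right)=(-1)^{N}g(\lambda)\det\left(A+(\lambda+1)I\right) \tag{B.19}$$
where
$$g(\lambda)=1-u^{T}\left(A+(\lambda+1)I\right)^{-1}u$$
In general, $g(\lambda)$ is not a simple function of $\lambda$ although a little more is known. For example, $g(\lambda) = 1 - \left\|\left(A+(\lambda+1)I\right)^{-\frac{1}{2}}u\right\|_2^{2}$ which shows that $g(\lambda) \in (-\infty, 1]$. Unlike in the proof of Lemma A.4.4, $u$ is generally not an eigenvector of $A$ and we can write (Section A.1, art. 8)
$$\left(A+(\lambda+1)I\right)^{-1}=\frac{1}{\lambda+1}\sum_{k=0}^{\infty}\frac{(-1)^{k}A^{k}}{(\lambda+1)^{k}}$$
where the last sum $\sum_{k=0}^{\infty} A^{k}z^{k}$ can be interpreted as the matrix generating function of the number of walks of length $k$ (see Section B.1, art. 5 and art. 8). Since $A^{k} = U\,\mathrm{diag}(\lambda_j^{k})\,U^{T}$ (Section A.2, art. 9) where the orthogonal matrix $U$ consists of eigenvectors $u_j$ of $A$, the matrix product
$$u^{T}U=\left[u.u_{1}\ \ u.u_{2}\ \cdots\ u.u_{N}\right]=\sqrt{N}\left[\cos\alpha_{1}\ \ \cos\alpha_{2}\ \cdots\ \cos\alpha_{N}\right]$$
where $\alpha_j$ is the angle between the eigenvector $u_j$ and the all-one vector $u$. Hence, $u^{T}A^{k}u = u^{T}U\,\mathrm{diag}(\lambda_j^{k})\,U^{T}u = N\sum_{j=1}^{N}\lambda_j^{k}\cos^{2}\alpha_j$ and, with $\sum_{k=0}^{\infty}\frac{(-\lambda_j)^{k}}{(\lambda+1)^{k}} = \frac{\lambda+1}{\lambda+1+\lambda_j}$, we can write
$$g(\lambda)=1-N\sum_{j=1}^{N}\frac{\cos^{2}\alpha_{j}}{\lambda+1+\lambda_{j}}$$
With (A.5), we have $c(-\lambda-1) = \det\left(A+(\lambda+1)I\right) = \prod_{k=1}^{N}(\lambda_k+1+\lambda)$ and, hence,
$$\det\left(A^{c}-\lambda I\right)=\frac{(-1)^{N}}{N}\sum_{j=1}^{N}\left(\lambda+1+\lambda_{j}-N^{2}\cos^{2}\alpha_{j}\right)\prod_{k=1;k\neq j}^{N}(\lambda_{k}+1+\lambda) \tag{B.20}$$
which shows that the poles of $g(\lambda)$ are precisely compensated by the zeros of the polynomial $c(-\lambda-1)$. Thus, the eigenvalues of $A^{c}$ are generally different from $\{-\lambda_j - 1\}_{1\leq j\leq N}$ where $\lambda_j$ is an eigenvalue of $A$. Only if $u$ is an eigenvector of $A$ corresponding with $\lambda_k$, then $g(\lambda) = \frac{\lambda+1+\lambda_k-N}{\lambda+1+\lambda_k}$ and all eigenvalues of $A^{c}$ belong to the set $\{-\lambda_j - 1\}_{1\leq j\neq k\leq N} \cup \{N - 1 - \lambda_k\}$. According to art. 15, $u$ is only an eigenvector when the graph is regular.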

B.3 The stochastic matrix $P = \Delta^{-1}A$

The stochastic matrix $P = \Delta^{-1}A$, introduced in Section B.2, art. 1, characterizes a random walk on a graph. A random walk is described by a finite Markov chain that is time-reversible. Alternatively, a time-reversible Markov chain can be viewed as a random walk on an undirected graph. Random walks on graphs have many applications in different fields (see e.g. the survey by Lovász (1993)); perhaps the most important application is randomly searching or sampling.
The combination of Markov theory and algebra leads to interesting properties of $P = \Delta^{-1}A$. Sections 9.3.1 and A.4.1 show that the left-eigenvector of $P$ belonging to eigenvalue $\lambda = 1$ is the steady-state vector $\pi$ (which is a $1 \times N$ row vector) and that the corresponding right-eigenvector is the all-one vector $u$, which essentially follows from (9.8) and which indicates that, at each discrete time step, precisely one transition occurs. These eigenvectors obey the eigenvalue equations $P^{T}\pi^{T} = \pi^{T}$ and $Pu = u$ and the orthogonality relation $\pi u = 1$ (Section A.1, art. 3). If $d = (d_1, d_2, \ldots, d_N)$ is the degree vector, then the basic law for the degree (B.2) is written in vector form as $d^{T}u = 2L$, or, $\frac{d^{T}}{2L}u = 1$. Theorem 9.3.5 states that the steady-state eigenvector $\pi$ is unique such that the equations $\pi u = 1$ and $\frac{d^{T}}{2L}u = 1$ imply that the steady-state vector is
$$\pi=\frac{d^{T}}{2L}$$
or
$$\pi_{j}=\frac{d_{j}}{2L} \tag{B.21}$$
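The steady-state law (B.21) is easy to confirm on a small case. The following sketch uses exact rational arithmetic on a hypothetical 5-node graph (the link list is an assumption for illustration only) and checks that the row vector with components $d_j/(2L)$ is reproduced by one step of the random walk.

```python
from fractions import Fraction

# Hypothetical connected example graph (undirected links).
links = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
n_nodes = 5

adj = [[0] * n_nodes for _ in range(n_nodes)]
for i, j in links:
    adj[i][j] = adj[j][i] = 1

degrees = [sum(row) for row in adj]
num_links = len(links)

# Transition probability matrix P = Delta^{-1} A of the random walk.
P = [[Fraction(adj[i][j], degrees[i]) for j in range(n_nodes)]
     for i in range(n_nodes)]

# Candidate steady-state vector pi_j = d_j / (2L) from (B.21).
pi = [Fraction(d, 2 * num_links) for d in degrees]

# One step of the walk: (pi P)_j = sum_i pi_i P_ij.
pi_next = [sum(pi[i] * P[i][j] for i in range(n_nodes)) for j in range(n_nodes)]
print(pi == pi_next, sum(pi))
```

Because the arithmetic is exact, equality of `pi` and `pi_next` holds without any tolerance, and the components sum to one.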

In general, the matrix $P$ is not symmetric, but, after a similarity transform $H = \Delta^{1/2}$, a symmetric matrix $R = \Delta^{1/2}P\Delta^{-1/2} = \Delta^{-1/2}A\Delta^{-1/2}$ is obtained whose eigenvalues are the same as those of $P$ (Section A.1, art. 4). The powerful property (Section A.2, art. 9) of symmetric matrices shows that all eigenvalues are real and that $R = U\,\mathrm{diag}(\lambda_R)\,U^{T}$, where the columns of the orthogonal matrix $U$ consist of the normalized eigenvectors $v_k$ that obey $v_j^{T}v_k = \delta_{jk}$. Explicitly written in terms of these eigenvectors gives
$$R=\sum_{k=1}^{N}\lambda_{k}v_{k}v_{k}^{T}$$
where, with Frobenius' Theorem A.4.2, the real eigenvalues are ordered as $1 = \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \geq -1$. If we exclude bipartite graphs (where the set of nodes is $\mathcal{N} = \mathcal{N}_1 \cup \mathcal{N}_2$ with $\mathcal{N}_1 \cap \mathcal{N}_2 = \emptyset$ and where each link connects a node in $\mathcal{N}_1$ and in $\mathcal{N}_2$) or reducible Markov chains (Section A.4), then $|\lambda_k| < 1$ for $k > 1$. Section A.1, art. 4 shows that the similarity transform $H = \Delta^{1/2}$ maps the steady-state vector $\pi$ into $v_1 = H^{-1}\pi^{T}$ and, with (B.21),
$$v_{1}=\frac{\Delta^{-1/2}\pi^{T}}{\left\|\Delta^{-1/2}\pi^{T}\right\|_{2}}$$
or
$$v_{1j}=\frac{\frac{d_{j}}{2L\sqrt{d_{j}}}}{\sqrt{\sum_{j=1}^{N}\frac{d_{j}}{(2L)^{2}}}}=\sqrt{\frac{d_{j}}{2L}}=\sqrt{\pi_{j}}$$
Finally, since $P = \Delta^{-1/2}R\Delta^{1/2}$, the spectral decomposition of the transition probability matrix of a random walk on a graph with adjacency matrix $A$ is
$$P=\sum_{k=1}^{N}\lambda_{k}\Delta^{-1/2}v_{k}v_{k}^{T}\Delta^{1/2}=u\pi+\sum_{k=2}^{N}\lambda_{k}\Delta^{-1/2}v_{k}v_{k}^{T}\Delta^{1/2}$$
The $n$-step transition probability (9.10) is, with $\left(v_k v_k^{T}\right)_{ij} = v_{ki}v_{kj}$ and (B.21),
$$P_{ij}^{n}=\frac{d_{j}}{2L}+\sqrt{\frac{d_{j}}{d_{i}}}\sum_{k=2}^{N}\lambda_{k}^{n}v_{ki}v_{kj}$$
The convergence towards the steady state $\pi_j$ can be estimated from
$$\left|P_{ij}^{n}-\pi_{j}\right|\leq\sqrt{\frac{d_{j}}{d_{i}}}\sum_{k=2}^{N}|\lambda_{k}^{n}||v_{ki}||v_{kj}|<\sqrt{\frac{d_{j}}{d_{i}}}\sum_{k=2}^{N}|\lambda_{k}|^{n}$$
Denoting by $\xi = \max(|\lambda_2|, |\lambda_N|)$ and by $\xi_0$ the largest element of the reduced set $\{|\lambda_k|\} \setminus \{\xi\}$ with $2 \leq k \leq N$, we obtain
$$\left|P_{ij}^{n}-\pi_{j}\right|<\sqrt{\frac{d_{j}}{d_{i}}}\,\xi^{n}+O\left(\xi_{0}^{n}\right)$$

B.4 Eigenvalues and connectivity

A graph $G$ has $k$ components (or clusters) if there exists a relabeling of the nodes such that the adjacency matrix has the structure
$$A=\begin{bmatrix}A_{1} & O & \cdots & O\\ O & A_{2} & & \vdots\\ \vdots & & \ddots & \\ O & \cdots & & A_{k}\end{bmatrix}$$
where the square submatrix $A_m$ is the adjacency matrix of the connected component $m$. Disconnectivity is a special case of reducibility of a stochastic matrix defined in Section A.4 and expresses that no communication is possible between two states in a different component or cluster. Using (A.39) indicates that
$$\det(A-\lambda I)=\prod_{m=1}^{k}\det\left(A_{m}-\lambda I\right) \tag{B.22}$$
where each identity matrix $I$ has the dimension of the corresponding block. If $G$ is a regular graph with degree $r$, so is each component with adjacency matrix $A_m$. Since $A_m$ is connected, Section B.2, art. 15 states that the largest eigenvalue of any $A_m$ equals $r$. Hence, by (B.22), the multiplicity of the largest eigenvalue of $A$ equals the number of components in the regular graph.
As shown in Section B.1, art. 2, the Laplacian $Q$ has non-negative eigenvalues of which at least one equals zero. In addition, the matrix
$$(N-1)I-Q=(N-1)I-\Delta+A$$
is non-negative with constant row sums all equal to $N - 1$. Although the matrix $(N-1)I - Q$ is not an adjacency matrix and does not represent a regular graph, the main argument in the proof of Theorem B.2.2 is the property of constant row sums and non-negative matrix elements. Hence, the multiplicity of the largest eigenvalue of $(N-1)I - Q$ is equal to the number of components of $G$. But the largest eigenvalue of $(N-1)I - Q$ corresponds to the smallest eigenvalue of $Q - (N-1)I$ and also of $Q$. Hence, we have proved

Theorem B.4.1 The multiplicity of the smallest eigenvalue $\lambda = 0$ of the Laplacian $Q$ is equal to the number of components in the graph $G$.

If $Q$ has only 1 zero eigenvalue with corresponding eigenvector $u$ (because $\sum_{k=1}^{N} q_{ik} = 0$ for each $1 \leq i \leq N$ is, in vector notation, $Qu = 0$), then the graph is connected; it has only 1 component. Theorem B.4.1 also implies that, if the second smallest eigenvalue $\lambda_Q = \lambda_{N-1}^{(Q)}$ of $Q$ is zero, the graph $G$ is disconnected. Since the eigenvectors of a symmetric matrix can be chosen mutually orthogonal, the eigenvector $x_Q$ of $\lambda_Q$ must satisfy $x_Q^{T}u = 0$ since $u$ is the eigenvector belonging to $\lambda = 0$. By requiring this additional constraint and choosing the scaling of the eigenvector such that $x^{T}x = 1$, we obtain similar to (A.35) that
$$\lambda_{Q}=\min_{\|x\|_{2}^{2}=1\ \text{and}\ x^{T}u=0}x^{T}Qx$$
The second smallest eigenvalue $\lambda_Q$ has many interesting properties that characterize how strongly a graph $G$ is connected. It is interesting to mention the inequality (Cvetkovic et al., 1995, p. 265)
$$\kappa(G)\geq\lambda_{Q}\geq2\,\eta(G)\left(1-\cos\frac{\pi}{N}\right) \tag{B.23}$$
where $\kappa(G)$ and $\eta(G)$ are the vertex and edge connectivity respectively.
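Theorem B.4.1 can be illustrated without any eigenvalue solver, because for a symmetric matrix the multiplicity of the eigenvalue zero equals the dimension of its null space. The sketch below (all names and the example graph are assumptions made for illustration) computes the rank of the Laplacian by exact Gaussian elimination and compares $N - \mathrm{rank}$ with the number of components found by a depth-first search.

```python
from fractions import Fraction

def laplacian(n, links):
    """Laplacian Q = Delta - A of an undirected graph, with exact entries."""
    q = [[Fraction(0)] * n for _ in range(n)]
    for i, j in links:
        q[i][i] += 1
        q[j][j] += 1
        q[i][j] -= 1
        q[j][i] -= 1
    return q

def rank(matrix):
    """Rank via Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            if factor != 0:
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def count_components(n, links):
    """Number of connected components by depth-first search."""
    neighbors = {v: set() for v in range(n)}
    for i, j in links:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(neighbors[v] - seen)
    return components

# Hypothetical graph: a triangle, a single link and an isolated node.
n_nodes, links = 6, [(0, 1), (1, 2), (0, 2), (3, 4)]
zero_multiplicity = n_nodes - rank(laplacian(n_nodes, links))
print(zero_multiplicity, count_components(n_nodes, links))
```

Both printed numbers agree, as the theorem predicts for any undirected graph.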

B.5 Random matrix theory

Random matrix theory investigates the eigenvalues of an $N \times N$ matrix $A$ whose elements $a_{ij}$ are random variables with a given joint distribution. Even in case all elements $a_{ij}$ are independent, there does not exist a general expression for the distribution of the eigenvalues. However, in some particular cases (such as Gaussian elements $a_{ij}$), there exist nice results. Moreover, if the elements $a_{ij}$ are properly scaled, in various cases the spectrum in the limit $N \to \infty$ seems to converge rapidly to a deterministic limit distribution. The fascinating results of random matrix theory and applications from nuclear physics to the distributions of the non-trivial zeros of the Riemann Zeta function are discussed by Mehta (1991).
Random matrix theory immediately applies to the adjacency matrix of the random graph $G_p(N)$ where each element $a_{ij}$ is 1 with probability $p$ and zero with probability $1 - p$.


B.5.1 The spectrum of the random graph $G_p(N)$

Let $\lambda$ denote an arbitrary eigenvalue of the adjacency matrix of the random graph $G_p(N)$. Clearly, $\lambda$ is a random variable with mean $E[\lambda] = \frac{1}{N}\sum_{k=1}^{N}\lambda_k = 0$ because of (B.6). In addition, the variance $\mathrm{Var}[\lambda] = E[\lambda^{2}] = \frac{1}{N}\sum_{k=1}^{N}\lambda_k^{2}$ and from (B.10)
$$\mathrm{Var}[\lambda]=\frac{2L}{N}=p(N-1)$$
where the last equality uses the mean number of links $E[L] = p\frac{N(N-1)}{2}$. This result implies that, for fixed $p$ and large $N$, the eigenvalues of $G_p(N)$ grow as $O(\sqrt{N})$, with the exception¹ of the largest eigenvalue $\lambda_1$.
The number of links $L$ in $G_p(N)$ is binomially distributed with mean $E[L] = p\frac{N(N-1)}{2}$. Taking the expectation of the bounds (B.17) on the largest eigenvalue gives
$$\frac{2}{N}E[L]\leq E[\lambda_{1}]\leq\sqrt{\frac{2(N-1)}{N}}\,E\left[\sqrt{L}\right]$$
Using (2.12) yields
$$E\left[\sqrt{L}\right]=\sum_{k=0}^{\binom{N}{2}}\sqrt{k}\,\Pr[L=k]=\sum_{k=0}^{\binom{N}{2}}\binom{\binom{N}{2}}{k}\sqrt{k}\,p^{k}(1-p)^{\binom{N}{2}-k}$$
Unfortunately, the sum cannot be expressed in closed form, but
$$\frac{1}{\sqrt{\binom{N}{2}}}\sum_{k=0}^{\binom{N}{2}}\binom{\binom{N}{2}}{k}\sqrt{k}\,p^{k}(1-p)^{\binom{N}{2}-k}\leq\sqrt{p}$$
with equality for $N \to \infty$. In summary, for any $N$ and $p$,
$$p(N-1)\leq E[\lambda_{1}]\leq\sqrt{p}\,(N-1) \tag{B.24}$$
The degree distribution (15.11) of the random graph is a binomial distribution with mean $E[D_{rg}] = p(N-1)$ and $\mathrm{Var}[D_{rg}] = (N-1)p(1-p)$. The inequality (5.13) indicates that the degree $D_{rg}$ converges exponentially fast to the mean $E[D_{rg}]$ for fixed $p$ and large $N$, which means that the random graph tends to a regular graph with high probability. Section B.2, art. 1 states that $\lambda_1 \to p(N-1)$ with high probability. Comparison with the bounds (B.24) indicates that the upper bound is less tight than the lower bound and that the upper bound is only sharp when $p \to 1$, i.e. for the complete graph. Section B.2, art. 13 shows that only for the complete graph the upper bound is indeed exactly attained.

¹ It is known that, for large $N$, the second largest eigenvalue of $G_p(N)$ grows as $O\left(N^{\frac{1}{2}+\epsilon}\right)$.
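The inequality that bounds the binomial sum by $\sqrt{p}$ can be evaluated numerically. The sketch below is a hedged illustration: it computes the normalized mean of $\sqrt{L}$ for a binomially distributed number of links, using logarithms of the binomial probabilities (an implementation choice made here to avoid overflow, not something prescribed by the text). The function name and the parameter values are hypothetical.

```python
import math

def normalized_mean_sqrt_links(n_nodes, p):
    """E[sqrt(L)] / sqrt(C(N,2)) for L binomial with C(N,2) trials and 0 < p < 1."""
    m = math.comb(n_nodes, 2)
    total = 0.0
    for k in range(1, m + 1):  # the k = 0 term contributes sqrt(0) = 0
        log_pmf = (math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
                   + k * math.log(p) + (m - k) * math.log(1 - p))
        total += math.sqrt(k) * math.exp(log_pmf)
    return total / math.sqrt(m)

for n_nodes in (5, 10, 40):
    print(n_nodes, normalized_mean_sqrt_links(n_nodes, 0.3), math.sqrt(0.3))
```

The computed value stays below $\sqrt{p}$ and approaches it as the number of nodes grows, in line with the remark that equality holds for $N \to \infty$.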


B.5.2 Wigner's Semicircle Law

Wigner's Semicircle Law is the fundamental result in the spectral theory of large random matrices.

Theorem B.5.1 (Wigner's Semicircle Law) Let $A$ be a random $N \times N$ real symmetric matrix with independent and identically distributed elements $a_{ij}$ with $\sigma^{2} = \mathrm{Var}[a_{ij}]$ and denote by $\lambda(A_N)$ an eigenvalue of the set of the $N$ real eigenvalues of the scaled matrix $A_N = \frac{A}{\sqrt{N}}$. The probability density function $f_{\lambda(A_N)}(x)$ of $\lambda(A_N)$ tends for $N \to \infty$ to
$$\lim_{N\to\infty}f_{\lambda(A_{N})}(x)=\frac{1}{2\pi\sigma^{2}}\sqrt{4\sigma^{2}-x^{2}}\;1_{|x|\leq2\sigma} \tag{B.25}$$
Since Wigner's first proof (Wigner, 1955) of this Theorem and his subsequent generalizations (Wigner, 1957, 1958) many proofs have been published. However, none of them is short and easy enough to include here. Wigner's Semicircle Law illustrates that, for sufficiently large $N$, the distribution of the eigenvalues of $\frac{A}{\sqrt{N}}$ does not depend anymore on the probability distribution of the elements $a_{ij}$. Hence, Wigner's Semicircle Law exhibits a universal property of a class of large, real symmetric matrices with independent random elements. Mehta (1991) suspects that, for a much broader class of large random matrices, a mysterious yet unknown law of large numbers must be hidden. The scaling of $A$ by $\frac{1}{\sqrt{N}}$ can be understood from the previous Section B.5.1. The adjacency matrix of the random graph satisfies the conditions in Theorem B.5.1 with $\sigma^{2} = p(1-p)$ and its eigenvalues (apart from the largest) grow as $O(\sqrt{N})$. In order to obtain the finite limit distribution (B.25), scaling by $\frac{1}{\sqrt{N}}$ is necessary.
The spectrum of $G_p(50)$ together with the properly rescaled Wigner's Semicircle Law (B.25) is plotted in Fig. B.2. Already for this small value of $N$, we observe that Wigner's Semicircle Law is a reasonable approximation for the intermediate $p$-region. The largest eigenvalue $\lambda_1$ for finite $N$, which is distributed around $p(N-1)$ as demonstrated above and shown in Fig. B.2 but which is not incorporated in Wigner's Semicircle Law, influences the average $E[\lambda] = \frac{1}{N}\sum_{k=1}^{N}\lambda_k = 0$ and causes the major bulk of the pdf around $x = 0$ to shift leftward compared to Wigner's Semicircle Law, which is perfectly centered around $x = 0$.
The complement of $G_p(N)$ is $(G_p(N))^{c} = G_{1-p}(N)$, because a link in $G_p(N)$ is present with probability $p$ and absent with probability $1 - p$ and $(G_p(N))^{c}$ is also a random graph. For large $N$, there exists a large range of $p$ values for which both $p \geq p_c$ and $1 - p \geq p_c$ such that both $G_p(N)$ and $(G_p(N))^{c}$ are connected almost surely. Figure B.2 shows that the normalized spectra of $G_p(N)$ and $G_{1-p}(N)$ are, apart from a small shift and ignoring the largest eigenvalue, almost identical. Equation (B.20) indicates that the spectrum of a graph and its complement tend to each other if $\cos\alpha_j \to 0$ (except for the eigenvector belonging to the largest eigenvalue, which will tend to $u$). This seems to suggest that $G_p(N)$ and $G_{1-p}(N)$ are tending to a regular graph with degree $p(N-1)$ and $(1-p)(N-1)$ and that these regular graphs (even for small $N$) have nearly the same spectrum (apart from the largest eigenvalue $p(N-1)$ and $(1-p)(N-1)$ respectively): $\frac{\lambda_{1-p}}{\sqrt{N}} \simeq -\frac{\lambda_{p}}{\sqrt{N}} - \frac{1}{\sqrt{N}}$ where $\lambda_p$ is an eigenvalue of $G_p(N)$.

[Figure B.2: probability density function $f_\lambda(x)$ (vertical axis) versus eigenvalue $x$ (horizontal axis) for $N = 50$ and $p = 0.1, 0.2, \ldots, 0.9$, together with the Semicircle Law for $p = 0.5$; the peaks of the largest eigenvalue lie around $E[\lambda_1] = p(N-1)$.]

Fig. B.2. The probability density function of an eigenvalue in $G_p(50)$ for various $p$. Wigner's Semicircle Law, rescaled and for $p = 0.5$ ($\sigma^{2} = \frac{1}{4}$), is shown in bold. We observe that the spectrum for $p$ and $1 - p$ is similar, but slightly shifted. The high peak for $p = 0.1$ reflects disconnectivity, while the high peak at $p = 0.9$ shows the tendency to the spectrum of the complete graph where $N - 1$ eigenvalues are precisely $-1$.
Figure B.3 shows the probability density function $f_\lambda(x)$ of the eigenvalues of the adjacency matrix $A$ of $G_p(N)$ with $N = 100$ together with the eigenvalues of the corresponding matrix $A_U$ where all one elements in the adjacency matrix of $G_p(100)$ are replaced by i.i.d. uniform random variables on $[0,1]$. Wigner's Semicircle Law provides an already better approximation than for $N = 50$. Since the elements of $A_U$ are always smaller (with probability 1) than those of $A$, the matrix norm $\|A_U\|_2 < \|A\|_2$, which implies by Section B.2, art. 1 that $\lambda_1(A_U) < \lambda_1(A)$. In addition, relation (B.13) shows that $\sum_{k=1}^{N}\lambda_k^{2}(A_U) < 2L$ such that $\mathrm{Var}[\lambda(A_U)] < \mathrm{Var}[\lambda(A)]$, which is manifested by a narrower and higher peaked pdf centered around $x = 0$.

[Figure B.3: probability density function $f_\lambda(x)$ (vertical axis) versus eigenvalue $x$ (horizontal axis) for $N = 100$ and $p = 0.2, 0.3, \ldots, 0.9$, together with the Semicircle Law for $p = 0.5$.]

Fig. B.3. The spectrum of the adjacency matrix of $G_p(100)$ (full lines) and of the corresponding matrix with i.i.d. uniform elements (dotted lines). The small peaks at higher values of $x$ are due to $\lambda_1$.

Appendix C
Solutions of problems

C.1 Probability theory (Chapter 2)


(i) Using the general formula (2.12) for a non-zero random variable $X$, we have
$$E[\log X]=\sum_{k=1}^{\infty}\log k\,\Pr[X=k]$$
while (2.18), $\varphi_X(z) = \sum_{k=0}^{\infty}\Pr[X=k]\,z^{k}$, shows that we need to express $\log k$ in terms of $z^{k}$. A possible solution starts from the double integral with $0 < a \leq b$,
$$\int_{a}^{b}dx\int_{0}^{\infty}dt\,e^{-tx}=\int_{0}^{\infty}dt\int_{a}^{b}dx\,e^{-tx}$$
where the reversal of integration is justified by absolute convergence (Titchmarsh, 1964, Section 1.8). Since $\int_{0}^{\infty}dt\,e^{-tx} = \frac{1}{x}$, the left-hand side integral equals
$$\int_{a}^{b}dx\int_{0}^{\infty}dt\,e^{-tx}=\log\frac{b}{a}$$
while the integral at the right-hand side is
$$\int_{0}^{\infty}dt\int_{a}^{b}dx\,e^{-tx}=\int_{0}^{\infty}\frac{e^{-ta}-e^{-tb}}{t}\,dt$$
hence,
$$\log k=\int_{0}^{\infty}\frac{e^{-t}-e^{-tk}}{t}\,dt$$
Multiplying both sides by $\Pr[X=k]$ and summing over $k$, we obtain (reversal in operators is justified on absolute convergence),
$$\sum_{k=1}^{\infty}\log k\,\Pr[X=k]=\int_{0}^{\infty}\frac{dt}{t}\sum_{k=1}^{\infty}\left(e^{-t}-e^{-tk}\right)\Pr[X=k]=\int_{0}^{\infty}\frac{dt}{t}\left(e^{-t}\sum_{k=1}^{\infty}\Pr[X=k]-\sum_{k=1}^{\infty}e^{-tk}\Pr[X=k]\right)$$
which finally gives with (2.18)
$$E[\log X]=\int_{0}^{\infty}\frac{e^{-t}-\varphi_{X}\left(e^{-t}\right)+\varphi_{X}(0)}{t}\,dt$$


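(ii) (a) The pdf of the $k$-th smallest order statistic follows from (3.36) for an exponential distribution with rate $\alpha$ as
$$f_{X_{(k)}}(x)=m\alpha\binom{m-1}{k-1}\left(1-e^{-\alpha x}\right)^{k-1}e^{-\alpha(m-k+1)x}$$
The probability generating function (2.37) is
$$\varphi_{X_{(k)}}(z)=E\left[e^{-zX_{(k)}}\right]=m\alpha\binom{m-1}{k-1}\int_{0}^{\infty}\left(1-e^{-\alpha t}\right)^{k-1}e^{-(z+\alpha(m-k+1))t}\,dt$$
Let $u = e^{-\alpha t}$ and $\beta = z + \alpha(m-k+1)$, then the integral reduces to the well-known Beta function (Abramowitz and Stegun, 1968, Section 6.2),
$$\int_{0}^{\infty}\left(1-e^{-\alpha t}\right)^{k-1}e^{-\beta t}\,dt=\frac{1}{\alpha}\int_{0}^{1}(1-u)^{k-1}u^{\frac{\beta}{\alpha}-1}\,du=\frac{1}{\alpha}B\left(k,\frac{\beta}{\alpha}\right)=\frac{1}{\alpha}\frac{\Gamma(k)\,\Gamma\left(\frac{\beta}{\alpha}\right)}{\Gamma\left(k+\frac{\beta}{\alpha}\right)}$$
Hence,
$$\varphi_{X_{(k)}}(z)=\frac{m!}{(m-k)!}\frac{\Gamma\left(\frac{z}{\alpha}+m+1-k\right)}{\Gamma\left(\frac{z}{\alpha}+m+1\right)}=\frac{m!}{(m-k)!}\prod_{j=0}^{k-1}\frac{1}{\frac{z}{\alpha}+m-j}$$
The mean follows from $E[X_{(k)}] = -L'_{X_{(k)}}(0)$, where $L_X$ is the logarithm of the generating function (2.41), as
$$E\left[X_{(k)}\right]=\frac{1}{\alpha}\sum_{j=0}^{k-1}\frac{1}{m-j} \tag{C.1}$$
(b) For a polynomial probability density function $f_X(x) = \alpha x^{\alpha-1}1_{x\in[0,1]}$ with $\alpha > 0$, we have with (3.36) for $x \in [0,1]$ that
$$f_{X_{(k)}}(x)=m\alpha\binom{m-1}{k-1}x^{\alpha k-1}\left(1-x^{\alpha}\right)^{m-k}$$
with mean
$$E\left[X_{(k)}\right]=m\alpha\binom{m-1}{k-1}\int_{0}^{1}x^{\alpha k}\left(1-x^{\alpha}\right)^{m-k}dx=m\binom{m-1}{k-1}\int_{0}^{1}t^{k+\frac{1}{\alpha}-1}(1-t)^{m-k}\,dt=\frac{m!}{(k-1)!}\frac{\Gamma\left(k+\frac{1}{\alpha}\right)}{\Gamma\left(m+1+\frac{1}{\alpha}\right)}$$
If $\alpha \to \infty$, then $E[X_{(k)}] = 1$, while for $\alpha \to 0$, $E[X_{(k)}] = 0$. For a uniform distribution where $\alpha = 1$, the result is $E[X_{(k)}] = \frac{k}{m+1}$. Indeed, the $m$ independently chosen uniform random variables divide, after ordering, the line segment $[0,1]$ into $m + 1$ subintervals. The length $L$ of each subinterval has the same distribution, which more easily follows by symmetry if the line segment is replaced by a circle of unit perimeter. Since the length $L$ of each subinterval is equal in distribution, one can consider the first subinterval $[0, X_{(1)}]$ whose length $L$ exceeds a value $x \in (0,1)$ if and only if all $m$ uniform random variables belong to $[x, 1]$. The latter event has probability equal to $(1-x)^{m}$ such that $\Pr[L > x] = (1-x)^{m}$ and, with (2.35), $E[L] = \frac{1}{m+1}$.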
(iii) If $X$ were a discrete random variable, then $\Pr[X=k] \approx \frac{n_k}{n}$, where $n_k$ is the number of values in the set $\{x_1, x_2, \ldots, x_n\}$ that is equal to $k$. For a continuous random variable $X$, the values are generally real numbers ranging from $x_{\min} = \min_{1\leq j\leq n}x_j$ until $x_{\max} = \max_{1\leq j\leq n}x_j$. We first construct a histogram $H$ from the set $\{x_1, x_2, \ldots, x_n\}$ by choosing a bin size $\Delta x = \frac{x_{\max}-x_{\min}}{m}$, where $m$ is the number of bins (abscissa points). The choice of $1 < m < n$ is in general difficult to determine. However, most computer packages allow us to experiment with $m$ and the human eye proves sensitive enough to make a good choice of $m$: if $m$ is too small, we lose details, while a high $m$ may lead to high irregularities due to the stochastic nature of $X$. Once $m$ is chosen, the histogram consists of the set $\{h_0, h_1, \ldots, h_{m-1}\}$ where $h_j$ equals the number of $X$ values in the set $\{x_1, x_2, \ldots, x_n\}$ that lies in the interval $[x_{\min} + j\Delta x, x_{\min} + (j+1)\Delta x]$ for $0 \leq j \leq m-1$. By construction, $\sum_{j=0}^{m-1}h_j = n$.
The histogram $H$ approximates the probability density function $f_X(x)$ after dividing each value $h_j$ by $n\Delta x$ because
$$1=\int_{x_{\min}}^{x_{\max}}f_{X}(x)\,dx=\lim_{\Delta x\to0}\sum_{j=0}^{\frac{x_{\max}-x_{\min}}{\Delta x}-1}f_{X}(\xi_{j})\,\Delta x\approx\sum_{j=0}^{m-1}\frac{h_{j}}{n\Delta x}\,\Delta x=1$$
where in the Riemann sum $\xi_j$ denotes a real number $\xi_j \in [x_{\min} + j\Delta x, x_{\min} + (j+1)\Delta x]$. Alternatively, from (2.31) we obtain
$$f_{X}(\xi_{j})=\lim_{\Delta x\to0}\frac{\Pr[\xi_{j}<X\leq\xi_{j}+\Delta x]}{\Delta x}\approx\frac{\Pr[x_{\min}+j\Delta x<X\leq x_{\min}+(j+1)\Delta x]}{\Delta x}$$
such that
$$f_{X}(\xi_{j})\approx\frac{h_{j}}{n\Delta x}$$
which reduces to the discrete case where $\Delta x = 1$.
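The histogram construction above can be sketched in a few lines. The function name, the choice of $m = 20$ bins and the exponentially distributed sample are assumptions made only for this illustration; the largest sample value is placed in the last bin so that the counts $h_j$ add up to $n$.

```python
import random

def histogram_density(samples, m):
    """Return (x_min, dx, densities) with densities[j] = h_j / (n * dx)."""
    x_min, x_max = min(samples), max(samples)
    dx = (x_max - x_min) / m
    counts = [0] * m
    for x in samples:
        # The maximum falls on the right edge and is put in the last bin.
        j = min(int((x - x_min) / dx), m - 1)
        counts[j] += 1
    n = len(samples)
    return x_min, dx, [h / (n * dx) for h in counts]

rng = random.Random(1)  # fixed seed so that the run is reproducible
samples = [rng.expovariate(1.0) for _ in range(10000)]
x_min, dx, densities = histogram_density(samples, 20)

# The estimated density integrates to one over [x_min, x_max].
print(sum(densities) * dx)
```

Varying `m` in this sketch shows the trade-off described in the text: few bins smooth away detail, many bins produce an irregular estimate.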


(iv) The density of mobile nodes in the circle with radius $r$ equals $\nu = \frac{N}{\pi r^{2}}$. Let $R$ denote the (random) position of a mobile node. The probability that there is a mobile node between distance $x$ and $x + dx$ (and $x \leq r$) is
$$\Pr[x\leq R\leq x+dx]=\frac{2\pi x\,dx}{\pi r^{2}}$$
From (2.31), the pdf of $R$ equals $f_R(x) = \frac{2x}{r^{2}}1_{x\leq r}$ and the distribution function follows by integration as $F_R(x) = \frac{x^{2}}{r^{2}}1_{x\leq r} + 1_{x>r}$. The (random) position $R_{(m)}$ of the $m$-th nearest mobile node to the center is given by (3.36),
$$f_{R_{(m)}}(x)=Nf_{R}(x)\binom{N-1}{m-1}\left(F_{R}(x)\right)^{m-1}\left(1-F_{R}(x)\right)^{N-m}$$
Written in terms of the density $\nu$ for $x \leq r$,
$$f_{R_{(m)}}(x)=2\pi\nu x\binom{N-1}{m-1}\left(\frac{\pi\nu x^{2}}{N}\right)^{m-1}\left(1-\frac{\pi\nu x^{2}}{N}\right)^{N-1-(m-1)}$$
we recognize, apart from the prefactor $2\pi\nu x$, a binomial distribution (3.3) with $p = \frac{\pi\nu x^{2}}{N}$. Similar to the derivation of the law of rare events in Section 3.1.4, this binomial distribution tends, for large $N$ but constant density $\nu$, to a Poisson distribution with $\lambda = \pi\nu x^{2}$. Hence, asymptotically, the pdf of the position $R_{(m)}$ of the $m$-th nearest mobile node to the center is, for $x \leq r$,
$$f_{R_{(m)}}(x)=2\pi\nu x\,\frac{\left(\pi\nu x^{2}\right)^{m-1}}{(m-1)!}\,e^{-\pi\nu x^{2}}$$
(v) We use the law of total probability (2.46), first assuming that $W$ is discrete,
$$\Pr[V-W\leq x]=\sum_{k}\Pr[V-W\leq x|W=k]\Pr[W=k]$$
and, by independence, $\Pr[V-W\leq x|W=k] = \Pr[V\leq k+x]$. Hence,
$$\Pr[V-W\leq x]=\sum_{k}\Pr[V\leq k+x]\Pr[W=k]$$
If $W$ is continuous, the general formula is
$$\Pr[V-W\leq x]=\int_{-\infty}^{\infty}\Pr[V\leq x+y]\,\frac{d\Pr[W\leq y]}{dy}\,dy \tag{C.2}$$
from which the pdf follows by differentiation,
$$f_{V-W}(x)=\int_{-\infty}^{\infty}f_{V}(x+y)f_{W}(y)\,dy$$
This resembles the convolution integral (2.62). If both $V$ and $W$ have the same distribution, direct integration of (C.2) yields
$$\Pr[V\leq W]=\Pr[W\leq V]=\frac{1}{2}$$
This equation confirms the intuitive result that two independent random variables with the same density function have equal probability to be larger or smaller than the other.

C.2 Correlation (Chapter 4)


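(i) In two dimensions, formula (4.2) becomes
$$E\left[e^{-z_{1}X-z_{2}Y}\right]=\exp\left[-z_{2}\mu_{Y}-\mu_{X}z_{1}+\frac{\sigma_{X}^{2}}{2}z_{1}^{2}+\rho z_{1}z_{2}\sigma_{Y}\sigma_{X}+\frac{\sigma_{Y}^{2}}{2}z_{2}^{2}\right]$$
$$=\exp\left[-z_{2}\mu_{Y}-\mu_{X}z_{1}+\frac{1}{2}\left(\sigma_{X}z_{1}+\rho z_{2}\sigma_{Y}\right)^{2}+\frac{\sigma_{Y}^{2}\left(1-\rho^{2}\right)}{2}z_{2}^{2}\right]$$
Hence, with (4.21) the joint probability distribution is
$$f_{XY}(x,y;\rho)=\frac{1}{(2\pi i)^{2}}\int_{c_{1}-i\infty}^{c_{1}+i\infty}\int_{c_{2}-i\infty}^{c_{2}+i\infty}e^{z_{1}(x-\mu_{X})+z_{2}(y-\mu_{Y})}\,e^{\frac{1}{2}\left(\sigma_{X}z_{1}+\rho z_{2}\sigma_{Y}\right)^{2}+\frac{\sigma_{Y}^{2}(1-\rho^{2})}{2}z_{2}^{2}}\,dz_{1}\,dz_{2}$$
$$=\frac{1}{2\pi i}\int_{c_{2}-i\infty}^{c_{2}+i\infty}e^{\frac{\sigma_{Y}^{2}(1-\rho^{2})}{2}z_{2}^{2}}e^{z_{2}(y-\mu_{Y})}\,dz_{2}\;\frac{1}{2\pi i}\int_{c_{1}-i\infty}^{c_{1}+i\infty}e^{z_{1}(x-\mu_{X})}e^{\frac{1}{2}\left(\sigma_{X}z_{1}+\rho z_{2}\sigma_{Y}\right)^{2}}\,dz_{1}$$
Evaluating the last integral, denoted by $L$, yields
$$L=\frac{1}{2\pi i}\int_{c_{1}-i\infty}^{c_{1}+i\infty}e^{z_{1}(x-\mu_{X})}\exp\left[\frac{1}{2}\left(\sigma_{X}z_{1}+\rho z_{2}\sigma_{Y}\right)^{2}\right]dz_{1}=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{(c_{1}+it)(x-\mu_{X})}\exp\left[\frac{1}{2}\left(\sigma_{X}(c_{1}+it)+\rho z_{2}\sigma_{Y}\right)^{2}\right]dt$$
$$=\frac{1}{2\pi}e^{c_{1}(x-\mu_{X})}\int_{-\infty}^{\infty}e^{it(x-\mu_{X})}\exp\left[-\frac{\sigma_{X}^{2}}{2}\left(t-i\left(c_{1}+\frac{\rho z_{2}\sigma_{Y}}{\sigma_{X}}\right)\right)^{2}\right]dt$$
Since the integrand is an entire function, the contour can be shifted, which allows substitution as in real analysis. Thus, let $w = t - i\left(c_1 + \frac{\rho z_2\sigma_Y}{\sigma_X}\right)$, then
$$L=\frac{1}{2\pi}e^{-\frac{\rho z_{2}\sigma_{Y}}{\sigma_{X}}(x-\mu_{X})}\int_{-\infty}^{\infty}\exp\left[-\frac{\sigma_{X}^{2}}{2}w^{2}+iw(x-\mu_{X})\right]dw$$
$$=\frac{1}{2\pi}e^{-\frac{\rho z_{2}\sigma_{Y}}{\sigma_{X}}(x-\mu_{X})}\exp\left[-\frac{(x-\mu_{X})^{2}}{2\sigma_{X}^{2}}\right]\int_{-\infty}^{\infty}\exp\left[-\frac{\sigma_{X}^{2}}{2}\left(w-i\frac{x-\mu_{X}}{\sigma_{X}^{2}}\right)^{2}\right]dw$$
By substituting $t = w - i\frac{x-\mu_X}{\sigma_X^{2}}$, the integral becomes
$$\int_{-\infty}^{\infty}\exp\left[-\frac{\sigma_{X}^{2}}{2}t^{2}\right]dt=2\int_{0}^{\infty}\exp\left[-\frac{\sigma_{X}^{2}}{2}t^{2}\right]dt=\frac{\sqrt{2}}{\sigma_{X}}\int_{0}^{\infty}e^{-v}v^{-1/2}\,dv=\frac{\sqrt{2}}{\sigma_{X}}\Gamma\left(\frac{1}{2}\right)=\frac{\sqrt{2\pi}}{\sigma_{X}}$$
where we have used the Gamma function (Abramowitz and Stegun, 1968, Chapter 6). Hence,
$$L=\frac{e^{-\frac{\rho z_{2}\sigma_{Y}}{\sigma_{X}}(x-\mu_{X})}}{\sigma_{X}\sqrt{2\pi}}\exp\left[-\frac{(x-\mu_{X})^{2}}{2\sigma_{X}^{2}}\right]$$
and
$$f_{XY}(x,y;\rho)=\frac{\exp\left[-\frac{(x-\mu_{X})^{2}}{2\sigma_{X}^{2}}\right]}{\sigma_{X}\sqrt{2\pi}}\;\frac{1}{2\pi i}\int_{c_{2}-i\infty}^{c_{2}+i\infty}e^{\frac{\sigma_{Y}^{2}(1-\rho^{2})}{2}z_{2}^{2}}\,e^{z_{2}\left(y-\mu_{Y}-\frac{\rho\sigma_{Y}}{\sigma_{X}}(x-\mu_{X})\right)}\,dz_{2}$$
The last integral is recognized with (3.22) as the inverse Laplace transform of a Gaussian with variance $\sigma^{2} = \sigma_Y^{2}\left(1-\rho^{2}\right)$ and mean $\mu = \mu_Y + \frac{\rho\sigma_Y}{\sigma_X}(x-\mu_X)$. Thus,
$$f_{XY}(x,y;\rho)=\frac{\exp\left[-\frac{(x-\mu_{X})^{2}}{2\sigma_{X}^{2}}\right]}{\sigma_{X}\sqrt{2\pi}}\;\frac{\exp\left[-\frac{\left(y-\mu_{Y}-\frac{\rho\sigma_{Y}}{\sigma_{X}}(x-\mu_{X})\right)^{2}}{2\sigma_{Y}^{2}\left(1-\rho^{2}\right)}\right]}{\sigma_{Y}\sqrt{1-\rho^{2}}\sqrt{2\pi}}$$
which finally leads to the joint Gaussian density function (4.4). Hence, the linear combination method leads to exact results for Gaussian random variables.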

C.3 Poisson process (Chapter 7)


(i) Let $Y$ be a binomial random variable with parameters $N$ and $p$, where $N$ is a Poisson random variable with parameter $\lambda$. The probability density function of $Y$ is obtained by applying the law of total probability (2.46),
$$\Pr[Y=k]=\sum_{n=0}^{\infty}\Pr[Y=k|N=n]\Pr[N=n]$$
With (3.3) and (3.9), we have
$$\Pr[Y=k]=\sum_{n=k}^{\infty}\binom{n}{k}p^{k}q^{n-k}\frac{\lambda^{n}e^{-\lambda}}{n!}=\frac{p^{k}}{k!}e^{-\lambda}\sum_{n=k}^{\infty}\frac{q^{n-k}\lambda^{n}}{(n-k)!}=\frac{p^{k}\lambda^{k}}{k!}e^{-\lambda}\sum_{n=0}^{\infty}\frac{q^{n}\lambda^{n}}{n!}=\frac{(\lambda p)^{k}}{k!}e^{-\lambda+\lambda q}$$
Since $q = 1 - p$, we arrive at $\Pr[Y=k] = \frac{(\lambda p)^{k}}{k!}e^{-\lambda p}$, which means that $Y$ is a Poisson random variable with mean $\lambda p$. If a sufficient sample of test strings defined above is sent and received, the average number of "one bits" at the receiver divided by the average number of bits at the sender gives the probability $p$ (if errors occur indeed independently).
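The result that a binomially thinned Poisson variable is again Poisson with mean $\lambda p$ can be checked by evaluating the law of total probability directly. In the sketch below the parameter values $\lambda = 4$ and $p = 0.25$ and the truncation of the infinite sum at $n = 100$ are assumptions chosen for illustration; the neglected tail is far below floating-point precision for these values.

```python
import math

def poisson_pmf(k, mean):
    """Pr[N = k] for a Poisson random variable with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    """Pr[Y = k | N = n] for a binomial random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def thinned_pmf(k, lam, p, n_max=100):
    """Law of total probability, with the sum over n truncated at n_max."""
    return sum(binomial_pmf(k, n, p) * poisson_pmf(n, lam)
               for n in range(k, n_max + 1))

lam, p = 4.0, 0.25
for k in range(6):
    print(k, thinned_pmf(k, lam, p), poisson_pmf(k, lam * p))
```

The two columns agree to machine precision, matching the closed form derived above.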
(ii) Since the counting process of a sum of Poisson processes is again a Poisson counting process with rate equal to $\sum_{j=1}^{4}\lambda_j$, the average number of packets of the four classes in the router's buffers during interval $T$ is $\lambda = T\sum_{j=1}^{4}\lambda_j$. Hence, the probability density function for the total number $N$ of arrivals is $\Pr[N=n] = \frac{\lambda^{n}}{n!}e^{-\lambda}$.
(iii) Theorem 7.3.4 states that the sum $X(t) = X_1(t) + X_2(t)$ is a Poisson counting process with rate $\lambda_1 + \lambda_2$. Then,
$$\Pr[X_{1}(t)=1|X(t)=1]=\frac{\Pr[\{X_{1}(t)=1\}\cap\{X(t)=1\}]}{\Pr[X(t)=1]}=\frac{\Pr[\{X_{1}(t)=1\}\cap\{X_{2}(t)=0\}]}{\Pr[X(t)=1]}=\frac{\Pr[X_{1}(t)=1]\Pr[X_{2}(t)=0]}{\Pr[X(t)=1]}=\frac{\lambda_{1}}{\lambda_{1}+\lambda_{2}}$$
since the Poisson random variables $X_1$ and $X_2$ are independent. As an application we can consider a Poissonean arrival flow of packets at a router with rate $\lambda$. If the packets are marked randomly with probability $p = \frac{\lambda_1}{\lambda}$, the resulting flow consists of two types, those marked and those not. Each of these flows is again a Poisson flow, the marked flow with rate $\lambda_1 = p\lambda$ and the non-marked flow with $\lambda_2 = (1-p)\lambda$. Actually, this procedure leads to a decomposition of the Poisson process into two independent Poisson processes and leads to the reverse of Theorem 7.3.4.
(iv) (a) Applying the solution of the previous exercise immediately gives $\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3}$.
(b) Since the three Poisson processes are independent, the total number of cars on the three lanes, denoted by $X$, is also a Poisson process (Theorem 7.3.4) with rate $\lambda = \lambda_1 + \lambda_2 + \lambda_3$. Hence, $\Pr[X=n] = \frac{\lambda^{n}}{n!}e^{-\lambda}$.
(c) Let us denote the Poisson process in lane $j$ by $X_j$. Then, using the independence between the $X_j$,
$$\Pr[X_{1}=n,X_{2}=0,X_{3}=0]=\Pr[X_{1}=n]\Pr[X_{2}=0]\Pr[X_{3}=0]=\frac{\lambda_{1}^{n}}{n!}e^{-\lambda_{1}}e^{-\lambda_{2}}e^{-\lambda_{3}}=\frac{\lambda_{1}^{n}}{n!}e^{-\lambda}$$
(v) (a) The player relies on the fact that during the time there is exactly one arrival. Since the game rules mention that he should identify the last signal in $(0, T)$, signals arriving during $(0, s)$ do not influence his chance to win because of the memoryless property of the Poisson process. The number of arrivals in the interval $(s, T)$ obeys a Poisson distribution with parameter $\lambda(T-s)$. The probability that precisely one signal arrives in the interval $(s, T)$ is $\Pr[N(T)-N(s)=1] = \lambda(T-s)e^{-\lambda(T-s)}$.
(b) Maximizing this winning probability with respect to $s$ (by equating the first derivative to zero) yields
$$\frac{d}{ds}\Pr[N(T)-N(s)=1]=-\lambda e^{-\lambda(T-s)}+\lambda^{2}(T-s)e^{-\lambda(T-s)}=0$$
with solution $\lambda(T-s) = 1$ or $s = T - 1/\lambda$. This maximum (which is readily verified by checking that $\frac{d^{2}}{ds^{2}}\Pr[N(T)-N(s)=1] < 0$) lies inside the allowed interval $(0, T)$. The maximum probability of winning is $\Pr[N(T)-N(T-1/\lambda)=1] = 1/e$.

(vi) (a) We apply the general formula (7.1) for the pdf of a Poisson process with mean $E[X(t)] = \lambda t = 1$. Then, $\Pr[X(t+s)-X(s)=0] = e^{-\lambda t} = \frac{1}{e}$.
(b) $\Pr[X(t+s)-X(s)>10] = 1 - \Pr[X(t+s)-X(s)\leq 10] = 1 - \frac{1}{e}\sum_{k=0}^{10}\frac{1}{k!}$.
(c) Each minute is equally probable as follows from Theorem 7.3.3.
(vii) This exercise is an application of randomly marking in a Poisson flow as explained in solution (iii) above. The total flow of packets can be split up into an ACK stream, a Poisson process $N_1$ with rate $p\lambda = 3\ \mathrm{s}^{-1}$, and a data flow, an independent Poisson process $N_2$ with rate $(1-p)\lambda = 7\ \mathrm{s}^{-1}$. Then,
(a) $\Pr[N_1 \geq 1] = 1 - \Pr[N_1 = 0] = 1 - e^{-3}$.
(b) The average number is $E[N_1+N_2|N_1=5] = E[N_1|N_1=5] + E[N_2|N_1=5] = 5 + E[N_2] = 5 + 7 = 12$ packets.
(c) $\Pr[N_1=2|N_1+N_2=8] = \frac{\Pr[N_1=2,\,N_1+N_2=8]}{\Pr[N_1+N_2=8]} = \frac{\frac{3^{2}e^{-3}}{2!}\frac{7^{6}e^{-7}}{6!}}{\frac{10^{8}e^{-10}}{8!}} \approx 29.65\%$

(viii) (a) Since the three Poisson arrival processes are independent, the total number of requests
will also be a Poisson process with the parameter $\lambda = \lambda_1 + \lambda_2 + \lambda_3 = 20$ requests/hour
(Theorem 7.3.4). The expected number of requests during an 8-hour working day is
$E[N] = \lambda t = 20 \times 8 = 160$ requests.
(b) If we denote the arrival processes of requests with different ADSL problems each by a
random variable $X_i$ for $i = 1, 2$ and 3, then due to their mutual independence
\[ \Pr[X_1=0, X_2=k, X_3=0] = \Pr[X_1=0]\Pr[X_2=k]\Pr[X_3=0] = e^{-\lambda_1 t}\,\frac{(\lambda_2 t)^k}{k!}\,e^{-\lambda_2 t}\,e^{-\lambda_3 t} \]
from which
\[ \Pr[X_1=0, X_2=3, X_3=0] = e^{-8/3}\,\frac{(6/3)^3}{3!}\,e^{-6/3}\,e^{-6/3} = 1.7 \times 10^{-3}. \]
(c) If we denote the total number of requests by $X$, then $\Pr[X=0] = e^{-\lambda t} = e^{-20/4} = 6.7 \times 10^{-3}$.
(d) The precise time is irrelevant for Poisson processes, only the duration of the interval
matters. Here the intervals overlap and we need to compute the probability
\[ p = \Pr[\{X(0.2)=1\} \cap \{X(0.5)-X(0.1)=2\}] \]
\[ = \sum_{k=0}^{1} \Pr[\{X(0.1)=k\} \cap \{X(0.2)-X(0.1)=1-k\} \cap \{X(0.5)-X(0.2)=1+k\}] \]
\[ = \sum_{k=0}^{1} \Pr[X(0.1)=k]\,\Pr[X(0.2)-X(0.1)=1-k]\,\Pr[X(0.5)-X(0.2)=1+k] \]
\[ = \sum_{k=0}^{1} e^{-2}\frac{2^k}{k!}\; e^{-2}\frac{2^{1-k}}{(1-k)!}\; e^{-6}\frac{6^{1+k}}{(1+k)!} = 48\,e^{-10} = 2.18 \times 10^{-3} \]

(e) Given that at the moment $t+s$ there are $k+m$ requests, the probability that there
were $k$ requests at the moment $t$ is
\[ \Pr[X(t)=k \mid X(t+s)=k+m] = \frac{\Pr[\{X(t)=k\} \cap \{X(t+s)=k+m\}]}{\Pr[X(t+s)=k+m]} = \frac{\Pr[X(t)=k]\,\Pr[X(t+s)-X(t)=m]}{\Pr[X(t+s)=k+m]} \]
\[ = \frac{\dfrac{(\lambda t)^k e^{-\lambda t}}{k!}\,\dfrac{(\lambda s)^m e^{-\lambda s}}{m!}}{\dfrac{(\lambda(t+s))^{k+m} e^{-\lambda(t+s)}}{(k+m)!}} = \binom{k+m}{k}\left(\frac{t}{t+s}\right)^{k}\left(\frac{s}{t+s}\right)^{m} \]


(ix) (a) The number of attacks arriving at the PC is a Poisson random variable $X(t)$
with rate $\lambda = 6$. The probability that exactly one ($k=1$) attack arrives during one ($t=1$) hour
follows from (7.1) as $\Pr[X(1)=1] = 6e^{-6}$.
(b) Applying (7.2), the expected amount of time that the PC has been on is
$t = \frac{E[X(t)]}{\lambda} = \frac{60}{6} = 10$ hours.
(c) The arrival time of the fifth attack is denoted by $T$. Given that there are six attacks in
one hour ($t=1$), we compute the probability $\Pr[T < t \mid X(1)=6]$ that either five attacks
arrive in the interval $(0,t)$ and one arrives in $(t,1)$, or all six attacks arrive in $(0,t)$ and none
arrives in the interval $(t,1)$. Hence, for $0 \le t < 1$,
\[ F_T(t) = \Pr[T < t \mid X(1)=6] = \frac{\Pr[\{X(t)=5\} \cap \{X(1)=6\}] + \Pr[\{X(t)=6\} \cap \{X(1)=6\}]}{\Pr[X(1)=6]} \]
\[ = \frac{\Pr[X(t)=5]\,\Pr[X(1)-X(t)=1] + \Pr[X(t)=6]\,\Pr[X(1)-X(t)=0]}{\Pr[X(1)=6]} \]
\[ = \frac{\dfrac{(\lambda t)^5}{5!}e^{-\lambda t}\,\lambda(1-t)e^{-\lambda(1-t)} + \dfrac{(\lambda t)^6}{6!}e^{-\lambda t}\,e^{-\lambda(1-t)}}{\dfrac{\lambda^6}{6!}e^{-\lambda}} = 6t^5 - 5t^6 \]
The probability that the fifth attack will arrive between 1:30 p.m. and 2 p.m. is
$F_T(1) - F_T\!\left(\tfrac{1}{2}\right) = 1 - \tfrac{7}{64} = \tfrac{57}{64}$.
(d) The expectation of $T$ given $X(1)=6$ follows from (2.33) as $E[T \mid X(1)] = \int_0^1 x f_T(x)\,dx$,
where $f_T(t) = \frac{dF_T(t)}{dt}$ is derived in (c). Alternatively, the expectation can be computed from
(2.35), $E[T \mid X(1)] = \int_0^1 \left(1 - (6x^5 - 5x^6)\right) dx = \tfrac{5}{7}$. Hence the expected arrival time of the
fifth attack between 1 p.m. and 2 p.m. is about 1:43 p.m.
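
The fractions in (c) and (d) can be reproduced exactly with rational arithmetic. This is a minimal sketch assuming only the closed form $F_T(t) = 6t^5 - 5t^6$ derived above; the function name is illustrative.

```python
from fractions import Fraction

def cdf_fifth(t):
    """F_T(t) = 6 t^5 - 5 t^6 for 0 <= t <= 1 (time in hours after 1 p.m.)."""
    t = Fraction(t)
    return 6 * t**5 - 5 * t**6

# Probability that the fifth attack arrives in the second half hour.
prob_second_half = cdf_fifth(1) - cdf_fifth(Fraction(1, 2))

# E[T | X(1) = 6] = integral over [0, 1] of (1 - 6x^5 + 5x^6), integrated term by term:
# integral of 1 is 1, of 6x^5 is 6/6, of 5x^6 is 5/7.
mean_T = Fraction(1) - Fraction(6, 6) + Fraction(5, 7)
minutes_after_one = float(mean_T * 60)

print(prob_second_half, mean_T, minutes_after_one)
```

The mean of $5/7$ hour is a little under 43 minutes, consistent with the arrival time of about 1:43 p.m.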
(x) Let $X$ and $X_j$ denote the lifetime of the system and of subsystem $j$ respectively. For a series
of subsystems with independent lifetimes $X_j$, the event $\{X > t\} = \cap_{j=1}^{n}\{X_j > t\}$ and
$\Pr[X > t] = \prod_{j=1}^{n}\Pr[X_j > t]$. Recall with (3.32) that $\Pr[X > t] = \Pr\left[\min_{1 \le j \le n} X_j > t\right]$.
Using the definition of the reliability function (7.5) then yields
\[ R_{\mathrm{series}}(t) = \prod_{j=1}^{n} R_j(t) \]

(xi) The probability that the system $S$ shown in Fig. 7.6 fails is determined by the subsystem
with the longest lifetime, or $X = \max_{1 \le j \le n} X_j$. Invoking relation (3.33) combined with the
definition of the reliability function (7.5) leads to
\[ R_{\mathrm{parallel}}(t) = 1 - \prod_{j=1}^{n}\left(1 - R_j(t)\right) \]
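
The series and parallel formulas of (x) and (xi) are easy to compare numerically. The sketch below assumes independent exponential lifetimes with hypothetical failure rates, chosen only for illustration.

```python
from math import exp, prod

def series_reliability(rs):
    """A series system works only if every subsystem works."""
    return prod(rs)

def parallel_reliability(rs):
    """A parallel system fails only if every subsystem has failed."""
    return 1 - prod(1 - r for r in rs)

# Hypothetical exponential lifetimes: R_j(t) = exp(-a_j * t).
rates = [0.1, 0.2, 0.5]     # illustrative failure rates, not from the text
t = 1.0
rs = [exp(-a * t) for a in rates]

print(series_reliability(rs), parallel_reliability(rs))
```

For exponential lifetimes the series reliability is again exponential with rate equal to the sum of the rates, while the parallel reliability is never below that of the best subsystem.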

C.4 Renewal theory (Chapter 8)


(i) The equivalence $\{N(t) \ge n\} \Leftrightarrow \{W_n \le t\}$ indicates
\[ \Pr[W_{N(t)} \le x] = \sum_{n=0}^{\infty}\Pr[\{W_n \le x\} \cap \{W_{n+1} > t\}] = \Pr[W_0 \le x, W_1 > t] + \sum_{n=1}^{\infty}\Pr[W_n \le x, W_{n+1} > t] \]
The convention $W_0 = 0$ reduces $\Pr[W_0 \le x, W_1 > t] = \Pr[W_1 > t] = \Pr[\tau_1 > t] = 1 - F(t)$.
Furthermore, by the law of total probability,
\[ \Pr[W_n \le x, W_{n+1} > t] = \int_0^{\infty}\Pr[W_n \le x, W_{n+1} > t \mid W_n = u]\,\frac{d\Pr[W_n \le u]}{du}\,du = \int_0^{x}\Pr[W_{n+1} > t \mid W_n = u]\,d\Pr[W_n \le u] \]
A renewal process restarts after each renewal from scratch (due to the stationarity and the
independent increments of the renewal process). This implies that
$\Pr[W_{n+1} > t \mid W_n = u] = \Pr[\tau_{n+1} > t-u] = 1 - F(t-u)$ because the interarrival times are i.i.d. random variables.
Combined,
\[ \Pr[W_{N(t)} \le x] = \Pr[\tau > t] + \sum_{n=1}^{\infty}\int_0^{x}\Pr[\tau > t-u]\,d\Pr[W_n \le u] = \Pr[\tau > t] + \int_0^{x}\Pr[\tau > t-u]\,d\!\left(\sum_{n=1}^{\infty}\Pr[W_n \le u]\right) \]
With the basic equivalence (8.6) and the definition (8.7) of the renewal function $m(t)$, we
arrive at
\[ \Pr[W_{N(t)} \le x] = \Pr[\tau > t] + \int_0^{x}\Pr[\tau > t-u]\,dm(u) \]
This equation holds for all $x$. If $x = t$, we can use the renewal equation,
\[ \int_0^{t}\Pr[\tau > t-u]\,dm(u) = m(t) - \int_0^{t}\Pr[\tau \le t-u]\,dm(u) = m(t) - m(t) + F(t) \]
which indeed confirms $\Pr[W_{N(t)} \le t] = 1$.
(ii) The generating function of the number of renewals in the interval $[0,t]$ is with (8.10)
\[ \varphi_{N(t)}(z) = E\left[z^{N(t)}\right] = \sum_{k=0}^{\infty}\Pr[N(t)=k]\,z^k \]
\[ = \Pr[N(t)=0] + \int_0^{t}\left(\sum_{k=1}^{\infty}\Pr[N(t-s)=k-1]\,z^k\right) f(s)\,ds \]
\[ = \Pr[N(t)=0] + z\int_0^{t}\left(\sum_{k=0}^{\infty}\Pr[N(t-s)=k]\,z^k\right) f(s)\,ds \]
\[ = \Pr[N(t)=0] + z\int_0^{t}\varphi_{N(t-s)}(z)\,dF(s) \]
From (8.6), we have that $\Pr[N(t)=0] = 1 - F(t)$ and
\[ \varphi_{N(t)}(z) = 1 - F(t) + z\int_0^{t}\varphi_{N(t-s)}(z)\,dF(s) \]
By differentiation with respect to $z$, we arrive at the differential-integral equation for the
derivative of the generating function,
\[ \varphi'_{N(t)}(z) = \int_0^{t}\varphi_{N(t-s)}(z)\,dF(s) + z\int_0^{t}\varphi'_{N(t-s)}(z)\,dF(s) = \frac{\varphi_{N(t)}(z) - 1 + F(t)}{z} + z\int_0^{t}\varphi'_{N(t-s)}(z)\,dF(s) \]
which reduces to the renewal equation (8.9) for $z = 1$ since $\varphi'_{N(t)}(1) = m(t)$. The second
derivative
\[ \varphi''_{N(t)}(z) = 2\int_0^{t}\varphi'_{N(t-s)}(z)\,dF(s) + z\int_0^{t}\varphi''_{N(t-s)}(z)\,dF(s) \]
\[ = \frac{2}{z}\varphi'_{N(t)}(z) - \frac{2}{z}\int_0^{t}\varphi_{N(t-s)}(z)\,dF(s) + z\int_0^{t}\varphi''_{N(t-s)}(z)\,dF(s) \]
evaluated at $z = 1$, is
\[ \varphi''_{N(t)}(1) = 2m(t) - 2F(t) + \int_0^{t}\varphi''_{N(t-s)}(1)\,dF(s) \]
The variance $\mathrm{Var}[N(t)]$ follows from (2.27) as
\[ \mathrm{Var}[N(t)] = \varphi''_{N(t)}(1) + \varphi'_{N(t)}(1) - \left(\varphi'_{N(t)}(1)\right)^2 = 3m(t) - m^2(t) - 2F(t) + \int_0^{t}\varphi''_{N(t-s)}(1)\,dF(s) \]

(iii) Every time an IP packet is launched by TCP, a renewal occurs and the reward is that 2000
km are travelled in each renewal, thus $R_n = 2000$ km. The speed in a trip that suffers
from congestion is, on average, 40 000 km/s, while the speed without congestion
is 120 000 km/s. Since congestion only occurs in 1/5 of the cases, the average length (in s) of a
renewal period is
\[ E[\tau] = \frac{4}{5}\cdot\frac{2000}{120000} + \frac{1}{5}\cdot\frac{2000}{40000} = \frac{7}{300} \]
The average speed of an IP packet (in km/s) then follows from (8.20) as
\[ \lim_{t \to \infty}\frac{R(t)}{t} = \frac{E[R]}{E[\tau]} = \frac{2000}{7/300} = 85714.3 \]

(iv) Every transmission of an ATM cell is a renewal with average length of the renewal interval
equal to $E[\tau] = N/r$, where $1/r$ is the mean interarrival time of a voice sample. If $\tau_n$ is
the time between the $n$-th and $(n+1)$-th arrival of a sample, then the average total cost per
ATM cell transmission equals
\[ E[R] = E\left[\sum_{n=1}^{N-1} n c\,\tau_n + K\right] = c\sum_{n=1}^{N-1} n\,E[\tau_n] + K = \frac{c}{r}\,\frac{N(N-1)}{2} + K \]
Hence, the average cost per unit time incurred in UMTS is $\frac{E[R]}{E[\tau]} = c\,\frac{N-1}{2} + \frac{Kr}{N}$.
(v) (a) The replacement of a router is a renewal process where the time at which router $j$ is
replaced is $W_j = \min(X_j, T)$, and the cost is
\[ R_j = \begin{cases} A, & \text{if } X_j \le T \\ B, & \text{if } X_j > T \end{cases} \]
The average cost per renewal period is $E[R] = A\Pr[X_j \le T] + B\Pr[X_j > T]$ and the
average length of a renewal interval equals
\[ E[X] = \int_0^{\infty}\Pr[W_j > t]\,dt = \int_0^{\infty}\Pr[\min(X_j, T) > t]\,dt = \int_0^{T}\Pr[X_j > t]\,dt \]
The time average cost rate of the policy ChangeRouter is $C = \frac{E[R]}{E[X]}$.
(b) For $A = 10000$, $B = 7000$, $\Pr[X_j \le T] = 1 - e^{-\lambda T}$ with mean lifetime $\frac{1}{\lambda} = 10$ years
and $T = 5$, we have
\[ E[R] = A\Pr[X_j \le T] + B\Pr[X_j > T] = 10000\left(1 - e^{-1/2}\right) + 7000\,e^{-1/2} \simeq 8200 \]
and
\[ E[X] = \int_0^{T}\Pr[X_j > t]\,dt = \int_0^{5} e^{-0.1t}\,dt = 10\left(1 - e^{-1/2}\right) \simeq 4 \]
such that the time average cost rate of the policy ChangeRouter is $C \simeq \frac{8200}{4} = 2050$.
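
The cost rate in (b) can be recomputed without intermediate rounding. The sketch below assumes the exponential lifetime used in (b); the function name is illustrative.

```python
from math import exp

def change_router_cost(A, B, lam, T):
    """Mean cost per renewal, mean renewal length and cost rate for the replace-at-age-T policy."""
    p_fail = 1 - exp(-lam * T)                    # router fails before the planned replacement
    mean_cost = A * p_fail + B * (1 - p_fail)
    mean_length = (1 - exp(-lam * T)) / lam       # integral of exp(-lam * t) over [0, T]
    return mean_cost, mean_length, mean_cost / mean_length

mean_cost, mean_length, rate = change_router_cost(A=10000, B=7000, lam=0.1, T=5)
print(mean_cost, mean_length, rate)
```

Carrying full precision gives a mean cost near 8180 and a mean renewal length near 3.93 years, so the unrounded cost rate comes out somewhat above the rounded estimate $8200/4$.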

C.5 Discrete-time Markov chains (Chapter 9)


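(i) (a) The Markov chain is drawn in Fig. C.1.

[Fig. C.1. Three-state Markov chain, with transition probabilities 0.2 and 0.8 on the edges.]

(b) The steady-state vector $\pi$ is computed via (9.24). The sequence $P^{2^k}$ rapidly
converges to yield three correct digits after four multiplications:
\[ P^2 = \begin{bmatrix} 0.800 & 0.160 & 0.040 \\ 0.640 & 0.320 & 0.040 \\ 0.640 & 0.160 & 0.200 \end{bmatrix} \qquad P^4 = \begin{bmatrix} 0.768 & 0.186 & 0.046 \\ 0.742 & 0.211 & 0.046 \\ 0.742 & 0.186 & 0.072 \end{bmatrix} \]
\[ P^8 = \begin{bmatrix} 0.762 & 0.190 & 0.048 \\ 0.761 & 0.191 & 0.048 \\ 0.761 & 0.190 & 0.048 \end{bmatrix} \qquad P^{16} = \begin{bmatrix} 0.762 & 0.190 & 0.048 \\ 0.762 & 0.190 & 0.048 \\ 0.762 & 0.190 & 0.048 \end{bmatrix} \]
from which we find that the row vector in $P^{16}$ equals $\pi = \begin{bmatrix} 0.762 & 0.190 & 0.048 \end{bmatrix}$.
The second method consists in solving the set (9.25) by Cramer's method. Hence,
\[ \det M = \begin{vmatrix} -0.2 & 0.8 & 0.0 \\ 0.2 & -1.0 & 0.8 \\ 1 & 1 & 1 \end{vmatrix} = 0.84 \]
\[ \pi_1 = \frac{1}{\det M}\begin{vmatrix} 0 & 0.8 & 0.0 \\ 0 & -1 & 0.8 \\ 1 & 1 & 1 \end{vmatrix} = 0.762 \qquad \pi_2 = \frac{1}{\det M}\begin{vmatrix} -0.2 & 0 & 0.0 \\ 0.2 & 0 & 0.8 \\ 1 & 1 & 1 \end{vmatrix} = 0.19 \qquad \pi_3 = \frac{1}{\det M}\begin{vmatrix} -0.2 & 0.8 & 0 \\ 0.2 & -1 & 0 \\ 1 & 1 & 1 \end{vmatrix} = 0.048 \]
The third method relies on the specific structure of the Markov chain, a discrete birth and
death process or general random walk with constant $p_k = p$ and $q_k = q$. Applying formula
(11.9), taking into account that $N = 2$ and $\rho = \frac{0.2}{0.8} = \frac{1}{4}$, yields
\[ \pi_1 = \frac{1 - \frac{1}{4}}{1 - \frac{1}{4^3}} = \frac{16}{21} = 0.762 \]
\[ \pi_2 = \pi_1\rho = \frac{4}{21} = 0.190 \]
\[ \pi_3 = \pi_2\rho = \frac{1}{21} = 0.048 \]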
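
The repeated squaring of the first method can be replayed in a few lines. The transition matrix `P` below is an assumption inferred from the Cramer system (its first two rows are the rows of $P^T - I$); pure Python lists are used so that no external library is needed.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Birth-death chain with p = 0.2 (step up) and q = 0.8 (step down); inferred, see lead-in.
P = [[0.8, 0.2, 0.0],
     [0.8, 0.0, 0.2],
     [0.0, 0.8, 0.2]]

P2 = matmul(P, P)
power = P
for _ in range(4):               # four squarings: P^2, P^4, P^8, P^16
    power = matmul(power, power)

pi_closed = [16 / 21, 4 / 21, 1 / 21]
print(power[0], pi_closed)
```

Each row of the sixteenth power agrees with the closed form $(16/21, 4/21, 1/21)$ to well beyond three digits.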

(ii) The Markov chain is shown in Fig. C.2. The state 1 is an absorbing state.

[Fig. C.2. A recurrent Markov chain with positive drift. Edge labels: $1$; $\frac{1}{2}, \frac{1}{2}$; $\frac{1}{3}, \frac{2}{3}$; $\frac{1}{4}, \frac{3}{4}$; $\frac{1}{5}, \frac{4}{5}$; ...; $\frac{1}{k}$.]

From (9.23), the steady-state vector components are found as
\[ \pi_1 = \sum_{k=1}^{N}\frac{\pi_k}{k} \]
\[ \pi_2 = 0 \]
\[ \pi_j = \frac{j-2}{j-1}\,\pi_{j-1} \qquad j \ge 2 \]
or $\pi_1 = 1$ and $\pi_j = 0$ for $j > 1$. Hence, the steady-state vector exists, and is different from
$\pi = 0$, which demonstrates that the Markov chain is positive recurrent for any number of
states $N$. However, the drift for $j > 1$ (because $j = 1$ is absorbing) is
\[ E[X_{k+1} - X_k \mid X_k = j] = 1 - \frac{1}{j} - \frac{1}{j} = 1 - \frac{2}{j} \]
which is always positive for $j > 2$. Hence, given an initial state $j > 2$, the Markov chain
will, on average, move to the right (higher states).
(iii) (a) The Markov chain is shown in Fig. C.3.

[Fig. C.3. Markov chain of the growth process of trees in a forest during a period of 15
years. Edge labels: $p_b$, $1-p_b$, $p_y$, $1-p_y$, $p_m$, $1-p_m$, $p_o$, $1-p_o$.]

(b) The evolution of the Markov process is defined by
\[ \begin{bmatrix} b[k+1] \\ y[k+1] \\ m[k+1] \\ u[k+1] \end{bmatrix} = \begin{bmatrix} p_b & p_y & p_m & p_o \\ 1-p_b & 0 & 0 & 0 \\ 0 & 1-p_y & 0 & 0 \\ 0 & 0 & 1-p_m & 1-p_o \end{bmatrix}\begin{bmatrix} b[k] \\ y[k] \\ m[k] \\ u[k] \end{bmatrix} \]
(c) The number of trees in each category after 15 years (one period) is
\[ \begin{bmatrix} b[1] \\ y[1] \\ m[1] \\ u[1] \end{bmatrix} = \begin{bmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.9 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.7 & 0.6 \end{bmatrix}\begin{bmatrix} 5000 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 500 \\ 4500 \\ 0 \\ 0 \end{bmatrix} \]
and after 30 years (two periods)
\[ \begin{bmatrix} b[2] \\ y[2] \\ m[2] \\ u[2] \end{bmatrix} = \begin{bmatrix} 0.1 & 0.2 & 0.3 & 0.4 \\ 0.9 & 0 & 0 & 0 \\ 0 & 0.8 & 0 & 0 \\ 0 & 0 & 0.7 & 0.6 \end{bmatrix}\begin{bmatrix} 500 \\ 4500 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 950 \\ 450 \\ 3600 \\ 0 \end{bmatrix} \]
(d) The steady-state vector $\pi$ obeys equation (9.22) or, equivalently, (9.25). Applying a
variant of (9.25), we have
\[ \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1-p_b & -1 & 0 & 0 \\ 0 & 1-p_y & -1 & 0 \\ 0 & 0 & 1-p_m & -p_o \end{bmatrix}\begin{bmatrix} \pi_b \\ \pi_y \\ \pi_m \\ \pi_o \end{bmatrix} \]
The determinant is $\det P = -p_o - (1-p_b)\left(1 + 2p_o - p_y - p_m + p_y p_m - p_o p_y\right)$ and via
Cramer's method we have
\[ \pi_b = \frac{1}{\det P}\det\begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & -1 & 0 & 0 \\ 0 & 1-p_y & -1 & 0 \\ 0 & 0 & 1-p_m & -p_o \end{bmatrix} = \frac{1}{\det P}\det\begin{bmatrix} -1 & 0 & 0 \\ 1-p_y & -1 & 0 \\ 0 & 1-p_m & -p_o \end{bmatrix} = \frac{-p_o}{\det P} \]
With the numerical values given in (c), $\pi_b = 0.25773$. After a similar calculation for the
other categories, the total number of trees in steady growth is
\[ \begin{bmatrix} 5000\,\pi_b \\ 5000\,\pi_y \\ 5000\,\pi_m \\ 5000\,\pi_o \end{bmatrix} \simeq \begin{bmatrix} 1289 \\ 1160 \\ 928 \\ 1624 \end{bmatrix} \]
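
Iterating the evolution equation of (b) reproduces both the transient values of (c) and the steady state of (d). A minimal sketch with the numerical matrix from (c); the iteration count of 500 is an arbitrary choice that is ample for convergence here.

```python
def step(M, v):
    """One 15-year period: multiply the column vector v by the matrix M."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

# Rows and columns ordered as the four tree categories used in (b).
M = [[0.1, 0.2, 0.3, 0.4],
     [0.9, 0.0, 0.0, 0.0],
     [0.0, 0.8, 0.0, 0.0],
     [0.0, 0.0, 0.7, 0.6]]

start = [5000.0, 0.0, 0.0, 0.0]
after_15 = step(M, start)
after_30 = step(M, after_15)

state = start
for _ in range(500):             # iterate until the distribution has settled
    state = step(M, state)

print(after_15, after_30, [round(x) for x in state])
```

Because every column of the matrix sums to one, the total of 5000 trees is conserved at each step.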
(iv) (a) The clustered error pattern is modeled as a two-state discrete Markov chain. When a
bit is received incorrectly, the system is in state 0, else it is in state 1. The Markov chain
is shown in Fig. 9.2, where $p = 1 - 0.95 = 0.05$ and $q = 1 - 0.999 = 0.001$. The transition
probability matrix is
\[ P = \begin{bmatrix} 0.95 & 0.05 \\ 0.001 & 0.999 \end{bmatrix}. \]
(b) There is only one communicating class because both states 0 and 1 are reachable from
each other. The Markov chain is therefore irreducible.
(c) The steady-state vector follows from (9.37) as
\[ \pi = \begin{bmatrix} \frac{1}{51} & \frac{50}{51} \end{bmatrix} = \begin{bmatrix} 0.0196 & 0.9804 \end{bmatrix} \]
The fraction of correctly received bits in the long run is 98.04% and the fraction of incorrectly
received bits is 1.96%.
(d) After repair, the system operates correctly in 99.9% of the cases, which implies that
$\pi_1 = 0.999$ and $\pi_0 = 0.001$. Formula (9.37) indicates that $\frac{p}{p+q} = 0.999$ and $\frac{q}{p+q} = 0.001$,
or $p = 999q$. The test sequence shows that
\[ \Pr[X_0 = 1, \ldots, X_{11} = 1] = (1-q)^{10}\Pr[X_0 = 1] = (1-q)^{10} = 0.9999 \]
which leads to $q \simeq 10^{-5}$ and thus $p = 999q = 0.009\,99$. A correctly (incorrectly) received bit is
followed by a next correctly (incorrectly) received bit with probability $1 - q = 0.999\,99$,
respectively $1 - p = 0.990\,01$.

C.6 Continuous-time Markov processes (Chapter 10)


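(i) (a) The failure rate for each processor is $\lambda = 0.001$ per hour. The repair rate is $\mu = 0.01$
per hour. The Markov chain is shown in Fig. C.4.

[Fig. C.4. The Markov chain for the three states: (1) both processors work, (2) one processor
is damaged and (3) both processors are damaged. Edge labels include $2\lambda$ and $2\mu$.]

(b) The infinitesimal generator is
\[ Q = \begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda+\mu) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix} \]
If the state probability vector is denoted by $s(t)$, we can also write $s(t)Q = \frac{d}{dt}\left(s(t)\right)$, or
\[ \begin{bmatrix} s_1(t) & s_2(t) & s_3(t) \end{bmatrix}\begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda+\mu) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix} = \begin{bmatrix} s'_1(t) & s'_2(t) & s'_3(t) \end{bmatrix} \]
(c) The steady-state $\pi = \lim_{t \to \infty} s(t)$ obeys the equation (10.19)
\[ \begin{bmatrix} \pi_1 & \pi_2 & \pi_3 \end{bmatrix}\begin{pmatrix} -2\lambda & 2\lambda & 0 \\ \mu & -(\lambda+\mu) & \lambda \\ 0 & 2\mu & -2\mu \end{pmatrix} = \begin{bmatrix} 0 & 0 & 0 \end{bmatrix} \]
Since $\pi_1 + \pi_2 + \pi_3 = 1$, we find that $\pi_1 = \frac{\mu^2}{(\lambda+\mu)^2}$, $\pi_2 = \frac{2\lambda\mu}{(\lambda+\mu)^2}$ and $\pi_3 = \frac{\lambda^2}{(\lambda+\mu)^2}$. From
the balance equation, we know that the probability flux from state 1 to state 2 should
precisely equal that in the opposite direction such that $2\lambda\pi_1 = \mu\pi_2$ and similarly for the
transitions $2 \leftrightarrow 3$, $\lambda\pi_2 = 2\mu\pi_3$. Using $\pi_1 + \pi_2 + \pi_3 = 1$ leads faster to the solution. With
$\lambda = 0.001$ and $\mu = 0.01$, the values are $\pi_1 = 0.8264$, $\pi_2 = 0.1653$ and $\pi_3 = 0.0083$.
(d) The availability in case (i) is $\pi_1 = 0.8264$. The availability in case (ii) is $\pi_1 + \pi_2 = 0.9917$.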
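
The closed-form steady state of (c) can be checked against the balance equations. A small sketch; the function name is illustrative, and the generator is the one given in (b).

```python
def two_processor_steady_state(lam, mu):
    """Closed-form steady state for states (1) both up, (2) one down, (3) both down."""
    total = (lam + mu) ** 2
    return mu**2 / total, 2 * lam * mu / total, lam**2 / total

lam, mu = 0.001, 0.01
pi1, pi2, pi3 = two_processor_steady_state(lam, mu)

# Residuals of the three columns of pi * Q = 0.
residuals = [
    -2 * lam * pi1 + mu * pi2,
    2 * lam * pi1 - (lam + mu) * pi2 + 2 * mu * pi3,
    lam * pi2 - 2 * mu * pi3,
]

print(pi1, pi2, pi3, pi1 + pi2)
```

The residuals vanish up to rounding, and `pi1` and `pi1 + pi2` are the two availabilities quoted in (d).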
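(ii) (a) In state 0, both servers are damaged, state 1 refers to one server down and one operating
while in state 2, both servers are operating. The corresponding Markov chain is shown in
Fig. C.5.

[Fig. C.5. The Markov chain is specified by $\lambda = \frac{1}{15}\ \mathrm{h}^{-1} = 6.66 \times 10^{-2}\ \mathrm{h}^{-1}$, $\lambda_B = \frac{1}{20}\ \mathrm{h}^{-1} = 5 \times 10^{-2}\ \mathrm{h}^{-1}$,
$\mu_H = 3 \times 10^{-4}\ \mathrm{h}^{-1}$, $\mu_F = 7 \times 10^{-4}\ \mathrm{h}^{-1}$ and $\mu_E = 6 \times 10^{-5}\ \mathrm{h}^{-1}$.]

(b) The infinitesimal generator is
\[ Q = \begin{bmatrix} -\lambda_B & 0 & \lambda_B \\ \mu_F + \mu_E & -\mu_F - \mu_E - \lambda & \lambda \\ \mu_E & 2\mu_H & -\mu_E - 2\mu_H \end{bmatrix} \]
(c) The steady-state vector $\pi$ obeys (10.19). The solution of $\pi Q = 0$ is
\[ \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix}\begin{bmatrix} -\lambda_B & 0 & \lambda_B \\ \mu_F + \mu_E & -\mu_F - \mu_E - \lambda & \lambda \\ \mu_E & 2\mu_H & -\mu_E - 2\mu_H \end{bmatrix} = 0 \]
Since this linear set of equations is underdetermined, we remove an arbitrary equation and add
the normalization condition $\sum_{j=0}^{2}\pi_j = 1$,
\[ \begin{bmatrix} \pi_0 & \pi_1 & \pi_2 \end{bmatrix}\begin{bmatrix} -\lambda_B & 0 & 1 \\ \mu_F + \mu_E & -\mu_F - \mu_E - \lambda & 1 \\ \mu_E & 2\mu_H & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix} \]
The steady-state probabilities are
\[ \pi_2 = \frac{\lambda_B(\mu_F + \mu_E + \lambda)}{(\mu_E + \lambda_B)(\mu_F + \mu_E + \lambda) + 2\mu_H(\mu_F + \mu_E + \lambda_B)} = 0.9898 \]
\[ \pi_1 = \frac{2\mu_H}{\mu_F + \mu_E + \lambda}\,\pi_2 = 0.0088 \]
\[ \pi_0 = 1 - \pi_2 - \pi_1 = 0.0013 \]
(d) Theorem 10.2.3 states that the average lifetime of state $j$ is $E[\tau_j] = \frac{1}{q_j}$. This yields
\[ E[\tau_0] = \frac{1}{q_0} = \frac{1}{\lambda_B} = 20\ \mathrm{h} \]
\[ E[\tau_1] = \frac{1}{q_1} = \frac{1}{\mu_F + \mu_E + \lambda} = 14.8\ \mathrm{h} \]
\[ E[\tau_2] = \frac{1}{q_2} = \frac{1}{\mu_E + 2\mu_H} = 1515\ \mathrm{h} \]
(e) A repair takes place when the system transfers from state 1 to 2. When the system
jumps from state 0 to state 2, two repairs take place. The fraction of time during which
both servers are damaged is $\pi_0$ and the fraction of time in which one server is operating is
$\pi_1$. The rate of repairs will be the rate of changing from state 1 to 2, plus two times the
rate of changing from state 0 to state 2:
\[ f_r = \pi_1 q_{12} + 2\pi_0 q_{02} = \pi_1\lambda + 2\pi_0\lambda_B = 7.17 \times 10^{-4} \]
If we denote by $X$ the random variable of the number of total failures over the period of
1 year, then the average value of $X$ will be
\[ E[X] = f_r \times 24 \times 365 = 6.28 \]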
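
The numbers in (c) to (e) follow from the rates listed with Fig. C.5. The sketch below evaluates the closed forms of (c) in full precision; the variable names are illustrative. Small differences from the printed values are due to rounding of intermediate results in the text.

```python
lam, lam_B = 1 / 15, 1 / 20              # rates per hour, from the figure caption
mu_H, mu_F, mu_E = 3e-4, 7e-4, 6e-5      # rates per hour, from the figure caption

q1 = mu_F + mu_E + lam                   # total rate out of state 1
pi2 = lam_B * q1 / ((mu_E + lam_B) * q1 + 2 * mu_H * (mu_F + mu_E + lam_B))
pi1 = 2 * mu_H / q1 * pi2
pi0 = 1 - pi1 - pi2

# Balance check for state 0: flow out must equal flow in.
balance_gap = pi0 * lam_B - (pi1 * (mu_F + mu_E) + pi2 * mu_E)

repair_rate = pi1 * lam + 2 * pi0 * lam_B    # repairs per hour
repairs_per_year = repair_rate * 24 * 365

print(pi0, pi1, pi2, 1 / q1, repairs_per_year)
```

The balance gap for state 0 is zero up to rounding, which confirms that the closed forms solve $\pi Q = 0$.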


C.7 Continuous-time Markov processes (Chapter 11)


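(i) In both cases we apply the general formulae (11.15) and (11.16) for the steady-state of a
general birth and death process.
(a) Using the notation $\rho = \frac{\lambda}{\mu}$, we first compute
\[ \prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}} = \prod_{m=0}^{j-1}\frac{\lambda}{(m+1)\mu} = \rho^j\prod_{m=1}^{j}\frac{1}{m} = \frac{\rho^j}{j!} \]
Then, with (11.16),
\[ \pi_j = \frac{\dfrac{\rho^j}{j!}}{1 + \sum_{j=1}^{\infty}\dfrac{\rho^j}{j!}} = \frac{\rho^j}{j!}\,e^{-\rho} \qquad j \ge 0 \]
which demonstrates that the steady-state probability that the birth and death process is
in state $j$ is Poisson distributed with mean $\rho$.
(b) Similarly, we first compute with $\lambda_m = \frac{\lambda}{m+1}$ and $\mu_m = \mu$,
\[ \prod_{m=0}^{j-1}\frac{\lambda_m}{\mu_{m+1}} = \prod_{m=0}^{j-1}\frac{\lambda}{(m+1)\mu} = \frac{\rho^j}{j!} \]
which leads to precisely the same steady-state as in (a). Indeed, the steady-state is only a
function of the ratios $\frac{\lambda_m}{\mu_{m+1}}$, which are the same in both (a) and (b).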

(ii) All stations in slotted ALOHA operate independently and each has probability $p_t = 0.12$ to
transmit in a timeslot. A station is successful in one slot with probability $p_s = p_t(1-p_t)^{N-1}$
where the number of stations $N = 8$. Thus, $p_s = 0.049$. The waiting time $W$ to transmit
one packet is a geometric random variable with parameter $p_s$ from which (Section 3.1.3)
the mean $E[W] = \frac{1}{p_s}$. Alternatively, $E[W]$ obeys the equation
\[ E[W] = p_s + (1-p_s)\left(1 + E[W]\right) \]
because the average waiting time equals 1 timeslot with probability $p_s$ plus 1 timeslot
increased with the average waiting time with probability $1 - p_s$. Solving that equation
again yields $E[W] = \frac{1}{p_s} = 20.39$ timeslots. The average transmission time for 7 packets is
$7E[W] = 142.7$ timeslots.
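
The numbers above follow directly from the two formulas. A short check, with an illustrative function name:

```python
def success_probability(p_t, n_stations):
    """One station transmits while the other n_stations - 1 stay silent."""
    return p_t * (1 - p_t) ** (n_stations - 1)

p_s = success_probability(0.12, 8)
mean_wait = 1 / p_s                       # mean of a geometric random variable

# The recursion E[W] = p_s + (1 - p_s)(1 + E[W]) has the same fixed point.
recursion_gap = mean_wait - (p_s + (1 - p_s) * (1 + mean_wait))

print(p_s, mean_wait, 7 * mean_wait)
```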

C.8 Queueing (Chapters 13 and 14)


(i) Let us denote the number of packets in the server by $N_x$. Since a router either serves 0 or
1 packet, the problem states that $\Pr[N_x = 1] = 0.8$, and also that $E[N_x] = 0.8$. For any
queue, it holds that $N_S = N_Q + N_x$ and $T = w + x$: the number in the system equals the
number in the buffer plus the number that is being served. From Little's Theorem (13.21),
it follows with $E[x] = \frac{1}{\mu}$ that
\[ E[N_x] = \frac{\lambda}{\mu} \]
or $\lambda = \mu E[N_x]$. Substituted into Little's law for the waiting time in the buffer,
$E[N_Q] = \lambda E[w]$, and using $E[N_Q] = 3.2$ gives
\[ E[w] = \frac{E[N_Q]}{E[N_x]}\,\frac{1}{\mu} = \frac{4}{\mu} \]

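(ii) In an M/M/m/m queue, the number of busy servers equals the number (of packets) in the
system $N_S$. From (14.16) and the definition (2.11), the average number of busy servers
equals
\[ E[N_S] = \sum_{j=0}^{m} j\Pr[N_S = j] = \frac{1}{\sum_{j=0}^{m}\dfrac{\lambda^j}{j!\,\mu^j}}\sum_{j=1}^{m}\frac{\lambda^j}{(j-1)!\,\mu^j} \]
The sum can be rewritten as
\[ \sum_{j=1}^{m}\frac{\lambda^j}{(j-1)!\,\mu^j} = \frac{\lambda}{\mu}\sum_{j=0}^{m-1}\frac{\lambda^j}{j!\,\mu^j} = \frac{\lambda}{\mu}\left[\sum_{j=0}^{m}\frac{\lambda^j}{j!\,\mu^j} - \frac{\lambda^m}{m!\,\mu^m}\right] \]
such that
\[ E[N_S] = \frac{\lambda}{\mu}\left[1 - \frac{\dfrac{\lambda^m}{m!\,\mu^m}}{\sum_{j=0}^{m}\dfrac{\lambda^j}{j!\,\mu^j}}\right] = \frac{\lambda}{\mu}\left(1 - \Pr[N_S = m]\right) \]
where the last probability is recognized as the Erlang B formula (14.17).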


(iii) (a) Since the average service rate $\mu = 2\ \mathrm{s}^{-1}$, the average response time (average system
time) follows from (14.2) as
\[ E\left[T_{M/M/1}\right] = \frac{1}{2 - \lambda} \]
(b) If $E\left[T_{M/M/1}\right] = 2.5$ s, then it follows from (a) that $\lambda = 1.6\ \mathrm{s}^{-1}$. Hence, the number
of jobs/s that can be processed for a given average response time of 2.5 s equals 1.6 jobs/s.
(c) A 10% increase in arrival rate corresponds to $\lambda = 1.76\ \mathrm{s}^{-1}$ and from (a) we obtain that
$E\left[T_{M/M/1}\right] = \frac{1}{0.24} = 4.17$ s, which is with respect to 2.5 s an increase in average response
time of 67%.
(iv) We know that when the average call holding time is $1/\mu = 10$ min, the time blocking
probability $P_{Bt} = \frac{1}{10}$. Additionally, for Poisson call arrivals, the time blocking probability
$P_{Bt}$ equals the call blocking probability $P_B$ by the PASTA property. The number of
channels is $m = 2$. The arrival intensity $\lambda$ can be calculated from the Erlang B formula
(14.17)
\[ P_B = \frac{r^2/2}{1 + r + r^2/2} \]
where $r = \lambda/\mu$. Solving this equation for $r = 2\rho$ and taking into account that the traffic
intensity $\rho \in [0,1]$ yields
\[ r = \frac{P_B + \sqrt{2P_B - P_B^2}}{1 - P_B} \]
For $P_B = \frac{1}{10}$, we have that $r = \frac{1 + \sqrt{19}}{9}$ from which $\lambda = \frac{1 + \sqrt{19}}{90}$.
The blocking probability (14.17) corresponding to an average call
holding time $1/\mu = 15$ min, for which $r = \lambda/\mu = \frac{1 + \sqrt{19}}{90} \times 15$, is $P_{Bt} \approx 0.174$.
90
(v) Queue 1 is an M/M/1 queue. By Burke's Theorem 14.1.1, the departure process from
queue 1 is Poisson with rate $\lambda$. By assumption, this departure process, which is the
arrival process to the second queue, is also independent of the service process at queue 2.
Therefore, queue 2, viewed in isolation, is also an M/M/1 queue. We know that the queueing
processes in both queues are stable because the load $\rho_1 = \frac{\lambda}{\mu_1} < 1$ and $\rho_2 = \frac{\lambda}{\mu_2} < 1$. The
steady-state distributions of the number of customers in queue 1 and queue 2 follow from
(14.1) as $\Pr[n \text{ at queue 1}] = \rho_1^n(1-\rho_1)$ and $\Pr[m \text{ at queue 2}] = \rho_2^m(1-\rho_2)$. The number of
customers presently in queue 1 is independent of the sequence of earlier arrivals at queue
2 and therefore also of the number of customers presently in queue 2. This implies that
\[ \Pr[n \text{ at queue 1},\ m \text{ at queue 2}] = \Pr[n \text{ at queue 1}]\Pr[m \text{ at queue 2}] = \rho_1^n(1-\rho_1)\,\rho_2^m(1-\rho_2) \]

(vi) The average system times $E[T]$ for the three different queueing systems are immediate.
From (14.2), for system A, we have
\[ E[T_A] = \frac{1}{k\mu(1-\rho)} \]
For each of the $k$ subqueues of system B, (14.2) gives
\[ E[T_B] = \frac{1}{\mu(1-\rho)} \]
while for the M/M/k queue, (14.14) yields
\[ E[T_C] = \frac{k(1-\rho) + \Pr[N_S \ge k]}{k\mu(1-\rho)} \]
Clearly, $E[T_B] = kE[T_A]$ shows that by replacing $k$ small systems by one larger system
with the same processing capability, the average system time decreases by a factor $k$.
From the relation $E[T_C] = f(k,\rho)E[T_A]$ with $f(k,\rho) = k(1-\rho) + \Pr[N_S \ge k]$, it is
more complicated to decide whether $f(k,\rho)$ is larger or smaller than 1. The extreme values
of $f(k,\rho)$ are known: $f(k,0) = k$ and $f(k,1) = 1$. Since
$\frac{\partial f(k,\rho)}{\partial\rho} = -k + \frac{\partial\Pr[N_S \ge k]}{\partial\rho}$ and $\frac{\partial\Pr[N_S \ge k]}{\partial\rho} > 0$,
it cannot be concluded that $f(k,\rho)$ is monotonically decreasing
from $k$ to 1, in which case we would have $f(k,\rho) > 1$. Assuming $k$ real, we observe that
$\frac{\partial f(k,\rho)}{\partial k} = (1-\rho) + \frac{\partial\Pr[N_S \ge k]}{\partial k} > 0$ for all $\rho < 1$, which implies that $f(2,\rho) < f(3,\rho) < \cdots$
and allows us to concentrate only on $f(2,\rho)$. Numerical results show that $f(2,\rho) < 1$ if
$\rho \ge 0.85$, but $f(3,\rho) \ge 1$. This leads us to the conclusion that for $k > 2$, system A always
outperforms system C; only if $k = 2$ and in the heavy traffic regime $\rho \ge 0.85$, system C leads
to a slightly shorter average system time of maximum 1.7%. Hence, replacing $k > 2$
processing units (servers) by one with the same processing capability always lowers the total
time spent in the system. Of course, all conclusions only apply to systems that can be well
modeled as M/M/m queueing systems. To first order a computing device (processor) may
be regarded as an M/M/1 queue. Then, the analysis shows that replacing an old processor
by a $k$ times faster one is faster (on average) than installing $k$ old processors in parallel.
(vii) The waiting process for aeroplanes is modeled as an M/D/1 queue because the arrival process
is Poisson with rate $\lambda = \frac{1}{10}$ arrivals/minute, it consists of a single queue as 1 aeroplane
can land at a time, and the service process (the landing process) takes precisely $x = 5$
minutes (constant service time). Thus, $E[x] = 5$ minutes, $\mathrm{Var}[x] = 0$ and $\rho = \frac{5}{10}$. Since
the M/D/1 process is a special case of the M/G/1, we can apply the general formula (14.28)
for the average waiting time in the queue of an M/G/1 system
\[ E[w] = \frac{\lambda E[x^2]}{2(1-\rho)} = \frac{\frac{1}{10}\,5^2}{2 \cdot \frac{1}{2}} = 2.5 \text{ minutes} \]

(viii) (a) We know that the arrival intensity of new calls to the cell is $\lambda_f = 20$ calls/min. Let $\lambda_h$
denote the arrival rate of the handover calls. The average time spent by a call in the cell is
$E[T] = 1.64$ minutes and the average number of ongoing calls is $E[N] = 52$. Furthermore,
the blocking rate is $P_B = 0.02$. The total arrival rate of calls that are carried by the base
station is
\[ \lambda_{\mathrm{carried}} = (1 - P_B)\,\lambda_{\mathrm{offered}} = (1 - P_B)(\lambda_f + \lambda_h) \]
Little's formula (13.21) states that $E[N] = \lambda_{\mathrm{carried}}E[T]$. Note that only the carried calls
have an influence on the state of the system. We can solve the requested $\lambda_h$ from these two
equations as
\[ \lambda_h = \frac{E[N]}{E[T](1 - P_B)} - \lambda_f = 12.35 \text{ calls/minute} \]
(b) The arrival intensity of lost calls is $\lambda_{\mathrm{lost}} = P_B(\lambda_f + \lambda_h) = 0.647$ calls/minute. If only
the new calls are blocked, the requested blocking rate is
\[ P_{Bf} = \frac{\lambda_{\mathrm{lost}}}{\lambda_f} = 3.24\% \]

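(ix) (a) The derivation is given in the study of the M/G/1 queue in Section 14.3.1 where $A(z)$
given by (14.25) should be replaced by $E\left[z^N\right]$.
(b) Use in (14.25) the Laplace transform of an exponential random variable with mean $\frac{1}{\mu}$
given in (3.16)
\[ \varphi_x(s) = \frac{\mu}{\mu + s} \]
One obtains
\[ E\left[z^N\right] = \varphi_x\left(\lambda(1-z)\right) = \frac{\mu}{\lambda(1-z) + \mu} = \frac{\mu/(\lambda+\mu)}{1 - \left(\lambda/(\lambda+\mu)\right)z} = \left(1 - \frac{\lambda}{\lambda+\mu}\right)\sum_{k=0}^{\infty}\left(\frac{\lambda}{\lambda+\mu}\right)^k z^k \]
from which the probability density function
\[ \Pr[N = k] = \left(1 - \frac{\lambda}{\lambda+\mu}\right)\left(\frac{\lambda}{\lambda+\mu}\right)^k \]
follows. Thus, $N$ is recognized as a geometric random variable with mean $E[N] = \frac{\lambda}{\mu}$.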

(x) The queueing system is modeled by a birth and death process. The death rate is obvious
and equal to $\mu$. The arrival rate into state $j$ equals the arrival rate of customers $\lambda$ multiplied
by the probability of really going to that state $j$, which is $\frac{1}{j+1}$. Hence, $\lambda_j = \frac{\lambda}{j+1}$. The
steady-state distribution of this birth and death process is Poisson with mean $\rho = \frac{\lambda}{\mu}$,
as derived above in Section C.7 solution (i).
(xi) The M/M/m/m/s queue (the Engset formula). The arrival rate in the Engset model is
proportional to the size of the still demanding subgroup and the interarrival times are
exponentially distributed. The holding time of a line is exponentially distributed with mean $\frac{1}{\mu}$.
The Engset model is described as a birth-death process where each state $k$ refers to the size
of the served subgroup. Since the total number of customers is $s$, the still demanding subgroup
consists of $s - k$ members. The birth rate is $\lambda_k = (s-k)\alpha$ and the death rate $\mu_k = k\mu$. The
proportionality factor $\alpha$ can be interpreted as the arrival rate per still demanding
customer. The Markov graph is depicted in Fig. C.6.

[Fig. C.6. The Markov chain of the Engset loss model. Birth rates $s\alpha$, $(s-1)\alpha$, $(s-2)\alpha$, ..., $(s-m+1)\alpha$;
death rates $\mu$, $2\mu$, $3\mu$, ..., $m\mu$.]

Application of the general birth-death formulae for the steady-state vector (11.16) or
$\Pr[N_S = j]$ yields, with $r = \frac{\alpha}{\mu}$,
\[ \pi_j = \frac{\prod_{k=1}^{j}\dfrac{(s-k+1)\alpha}{k\mu}}{1 + \sum_{n=1}^{m}\prod_{k=1}^{n}\dfrac{(s-k+1)\alpha}{k\mu}} = \frac{\binom{s}{j}r^j}{\sum_{n=0}^{m}\binom{s}{n}r^n} \]
The computation of the blocking probability is more complex than for the Erlang B formula,
because the arrival process is not a Poisson process. Indeed, due to the finite number of
customers $s$, the largest number of possible arrivals is finite and the arrival rate depends
on the state. Hence, the PASTA property cannot be applied. For a small time interval
$\Delta t$, the blocking probability $P_B$ equals the ratio of $p_b(\Delta t)$, the probability of blocking
in $\Delta t$, over $p_a(\Delta t)$, the probability of an arrival in $\Delta t$. Since the arrival rates depend on
the state, the probability of an arrival in $\Delta t$ is not equal to $\lambda\Delta t$ as for the Erlang B model.
Instead, we have
\[ p_a(\Delta t) = \alpha\Delta t\sum_{n=0}^{m}(s-n)\Pr[N_S = n] = \alpha\Delta t\,\frac{\sum_{n=0}^{m}(s-n)\binom{s}{n}r^n}{\sum_{n=0}^{m}\binom{s}{n}r^n} = s\alpha\Delta t\,\frac{\sum_{n=0}^{m}\binom{s-1}{n}r^n}{\sum_{n=0}^{m}\binom{s}{n}r^n} \]
Furthermore, blocking is only caused if $N_S = m$ and if at least one of the $s - m$ customers
of the still demanding group generates an arrival. However, since the interval $\Delta t$ can
be made arbitrarily small$^1$, a generation of more than 1 arrival has probability $o(\Delta t)$ such
that it suffices to consider only one call attempt. Hence,
\[ p_b(\Delta t) = \alpha\Delta t\,(s-m)\Pr[N_S = m] = s\alpha\Delta t\,\frac{\binom{s-1}{m}r^m}{\sum_{n=0}^{m}\binom{s}{n}r^n} \]
The Engset call blocking probability $P_B = \frac{p_b(\Delta t)}{p_a(\Delta t)}$ becomes
\[ P_B = \frac{\binom{s-1}{m}r^m}{\sum_{n=0}^{m}\binom{s-1}{n}r^n} \qquad \text{(C.3)} \]
Observe that $P_B = \Pr[N_S = m]$ in a system with $s - 1$ instead of $s$ customers: an entering
customer observes a system with $s - 1$ customers, ignoring himself. At last, if we denote
$\lambda = s\alpha$, the Engset call blocking formula (C.3) can be rewritten as
\[ P_B = \frac{\dfrac{1}{m!}\left(\dfrac{\lambda}{\mu}\right)^m}{\sum_{n=0}^{m}\dfrac{(s-1-m)!\,s^{m-n}}{n!\,(s-1-n)!}\left(\dfrac{\lambda}{\mu}\right)^n} \]
The ratio $\frac{(s-1-m)!}{(s-1-n)!}$ is a polynomial in $s$ of degree $n - m$ such that
$\lim_{s \to \infty}\frac{(s-1-m)!\,s^{m-n}}{(s-1-n)!} = 1$. In conclusion, if $\lambda = s\alpha$ and $s \to \infty$, the Engset call blocking probability reduces to the
Erlang B formula (14.17).
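
The limit $s \to \infty$ with $\lambda = s\alpha$ fixed can be observed numerically. The sketch below evaluates (C.3) directly; the load of 2 Erlang and the 4 lines are arbitrary illustrative values, and the function names are not from the text.

```python
from math import comb, factorial

def engset_blocking(s, m, r):
    """Call blocking (C.3) for s customers, m lines and r = alpha/mu per idle customer."""
    numerator = comb(s - 1, m) * r**m
    denominator = sum(comb(s - 1, n) * r**n for n in range(m + 1))
    return numerator / denominator

def erlang_b(a, m):
    """Erlang B blocking for offered load a = lambda/mu and m lines."""
    terms = [a**j / factorial(j) for j in range(m + 1)]
    return terms[m] / sum(terms)

a, m = 2.0, 4                                  # illustrative load and number of lines
small = engset_blocking(10, m, a / 10)         # 10 customers sharing the same total load
large = engset_blocking(10**6, m, a / 10**6)   # very large population
print(small, large, erlang_b(a, m))
```

With few customers the blocking is well below the Erlang B value, since busy customers generate no new calls; for a very large population the two coincide.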
(xii) Although for an M/D/1 queue the exact expression of the overflow probability (14.44) exists, this
series converges slowly for high traffic intensities $\rho = \lambda$ so that fast executable expressions
are desirable. Substituting (14.45) with $\rho = \lambda$ into (14.67) gives
\[ \mathrm{clr}(\rho) \approx \frac{(1-\rho)\,\dfrac{1-\rho}{\rho\zeta - 1}\,\zeta^{-K}}{\rho\left[1 - \dfrac{1-\rho}{\rho\zeta - 1}\,\zeta^{-K}\right]} \]
For sufficiently high loads $\rho > 0.8$, we use the approximation $\zeta \approx \rho^{-2}$ of Section 5.7 to
obtain
\[ \mathrm{clr}_{M/D/1/K} \simeq \frac{(1-\rho)\,\rho^{2K}}{1 - \rho^{2K+1}} \qquad \text{(C.4)} \]
Comparing with (14.20) in the M/M/1/K queue,
\[ \mathrm{clr}_{M/M/1/K} \simeq \frac{(1-\rho)\,\rho^{K}}{1 - \rho^{K+1}} \]
the M-server (in continuous-time) needs approximately twice as many buffer places to
guarantee the same cell loss ratio as the corresponding D-server (in discrete-time). Further
combining (14.3) and (14.38) shows that
\[ \mu_{M/M/1}\,E\left[w_{M/M/1}\right] = 2\,\mu_{M/D/1}\,E\left[w_{M/D/1}\right] \]
or, the average waiting time in the queue (normalized to the average service time) for the
M/M/1 queue is exactly twice as long as for the M/D/1 queue. The variability of the
service in the M-server causes these rather large differences in performance. Furthermore,
the simple formula (C.4) is particularly useful to engineer ATM buffers or to dimension
simple queueing networks. If the number of individual flows that constitute the aggregate
flow is large enough and none of the individual flows is dominant, the aggregate arrival
process is quite well approximated by a Poisson process. Given as a QoS requirement a
stringent cell loss ratio $\mathrm{clr}_T$, the input flow $\lambda$ can be limited such that $\mathrm{clr}_{M/D/1/K} < \mathrm{clr}_T$.
Alternatively, the buffer size $K$ can be derived from (C.4) subject to $\mathrm{clr}_{M/D/1/K} = \mathrm{clr}_T$ for
an aggregate Poisson input flow $\lambda = 0.9$. As long as the input flow is limited to $\lambda < 0.9$,
the thus found buffer size $K$ always guarantees a cell loss ratio below $\mathrm{clr}_T$ provided the
input flow can be approximated as a Poisson arrival process.

$^1$ Similar arguments are used in Chapter 7 when studying the Poisson process.

C.9 General characteristics of graphs (Chapter 15)


(i) In one dimension ($d = 1$), the hopcount $h_N$ of the shortest path between two uniformly
chosen points $x_A$ and $x_B$ equals the distance between $x_A$ and $x_B$. We allow the hopcount to
be zero, which is reflected by the small $h$, while capital $H$ refers to the case where the source $A$
and the destination $B$ are different. Thus,
$\Pr[|x_A - x_B| = k] = \frac{1}{Z}\,1_{k=0} + \frac{2(Z-k)}{Z^2}\,1_{1 \le k \le Z-1}$
with corresponding generating function
\[ \varphi_Z(x) = \sum_{k=0}^{Z-1}\Pr[|x_A - x_B| = k]\,x^k = \frac{Z - Zx^2 + 2x(x^Z - 1)}{Z^2(x-1)^2} \]
Since the nodes are uniformly chosen, all coordinate dimensions are independent and the
generating function of the hopcount of the shortest path in a $d$-lattice is$^2$ $\varphi_Z^d(x)$. From
(2.26) and (2.27), the average number of hops is immediate as $E[h_N] = \frac{d}{3Z}(Z^2 - 1)$ and
the variance as $\mathrm{Var}[h_N] = \frac{d(Z^2-1)(Z^2+2)}{18Z^2}$. The total number of nodes in the $d$-lattice is
$N = Z^d$ such that, for large $N$, we obtain
\[ E[h_N] \simeq \frac{d}{3}N^{1/d} \]
and
\[ \mathrm{Var}[h_N] \simeq \frac{d}{18}N^{2/d} \]
both increasing in $d > 1$ (for constant $N$) as in $N$ (for constant $d$). For a two-dimensional
lattice, the average hopcount scales as $O\left(\sqrt{N}\right)$.
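
The mean and variance formulas can be confirmed by brute-force enumeration over a small lattice. The sketch assumes, as in the solution, that source and destination are drawn independently and uniformly (so they may coincide) and that the hopcount is the sum of the coordinate distances; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product

def hopcount_moments(Z, d):
    """Exact mean and variance of the hopcount over all ordered node pairs of a Z^d lattice."""
    points = list(product(range(Z), repeat=d))
    hops = [sum(abs(a - b) for a, b in zip(p, q)) for p in points for q in points]
    mean = Fraction(sum(hops), len(hops))
    second = Fraction(sum(h * h for h in hops), len(hops))
    return mean, second - mean**2

Z, d = 4, 2
mean, var = hopcount_moments(Z, d)
print(mean, Fraction(d * (Z**2 - 1), 3 * Z))
print(var, Fraction(d * (Z**2 - 1) * (Z**2 + 2), 18 * Z**2))
```

The enumeration visits $Z^{2d}$ pairs, so it is only practical for small lattices, which is sufficient for checking the closed forms.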
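Footnote 2: If the sizes of the hypercube are not identical, the pgf is $\prod_{j=1}^{d} \varphi_{Z_j}(x)$.

(ii) Using the definition (15.6) of the clustering coefficient and applying the law of total probability (2.46) yields
$$\Pr\left[c_{G_p(N)} \le x\right] = \sum_{k=0}^{N-1} \Pr\left[\left.\frac{2y}{d_v(d_v-1)} \le x \,\right|\, d_v = k\right] \Pr[d_v = k]$$
The degree distribution $\Pr[d_v = k]$ in the random graph is given by (15.11) and
$$\Pr\left[\left.\frac{2y}{d_v(d_v-1)} \le x \,\right|\, d_v = k\right] = \Pr\left[\left. y \le \binom{k}{2} x \,\right|\, d_v = k\right] = \sum_{j=0}^{\left\lfloor \binom{k}{2} x \right\rfloor} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}$$
because $y$ is the number of links between the $d_v = k$ neighbors of $v$, which is binomially distributed with parameter $p$. Combined gives
$$\Pr\left[c_{G_p(N)} \le x\right] = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1-p)^{N-1-k} \sum_{j=0}^{\left\lfloor \binom{k}{2} x \right\rfloor} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}$$
The average $E\left[c_{G_p(N)}\right]$ is computed via (2.35) as
$$E\left[c_{G_p(N)}\right] = \int_0^1 \Pr\left[c_{G_p(N)} > x\right] dx = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1-p)^{N-1-k} \int_0^1 \sum_{j=\left\lfloor \binom{k}{2} x \right\rfloor + 1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}\, dx$$
Let $t = \binom{k}{2} x$, then
$$\int_0^1 \sum_{j=\left\lfloor \binom{k}{2} x \right\rfloor + 1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}\, dx = \frac{1}{\binom{k}{2}} \int_0^{\binom{k}{2}} \sum_{j=\lfloor t \rfloor + 1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}\, dt = \frac{1}{\binom{k}{2}} \sum_{t=0}^{\binom{k}{2}-1} \sum_{j=t+1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}$$
Reversing the $t$- and $j$-sum yields
$$\int_0^1 \sum_{j=\left\lfloor \binom{k}{2} x \right\rfloor + 1}^{\binom{k}{2}} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j}\, dx = \frac{1}{\binom{k}{2}} \sum_{j=1}^{\binom{k}{2}} \sum_{t=0}^{j-1} \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j} = \frac{1}{\binom{k}{2}} \sum_{j=1}^{\binom{k}{2}} j \binom{\binom{k}{2}}{j} p^j (1-p)^{\binom{k}{2}-j} = p$$
Hence, we find that $E\left[c_{G_p(N)}\right] = p$. Along the same lines, we find that the generating function $\varphi_c(z)$ of the clustering coefficient $c_{G_p(N)}$ is
$$\varphi_c(z) = \sum_{k=0}^{N-1} \binom{N-1}{k} p^k (1-p)^{N-1-k} \left(1 - p + p\, e^{-z/\binom{k}{2}}\right)^{\binom{k}{2}}$$
The variance is computed from (2.43) as
$$\mathrm{Var}\left[c_{G_p(N)}\right] = \left(p - p^2\right) \sum_{k=2}^{N-1} \binom{N-1}{k} \frac{p^k (1-p)^{N-1-k}}{\binom{k}{2}}$$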
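The variance expression above is a finite sum and is easy to evaluate numerically. The following is a minimal sketch, assuming the formula exactly as written above (terms with $k \ge 2$ only); the function name `clustering_variance` is introduced here for illustration.

```python
from math import comb


def clustering_variance(N, p):
    """Variance of the clustering coefficient in G_p(N), summed over degrees k >= 2."""
    total = 0.0
    for k in range(2, N):
        degree_prob = comb(N - 1, k) * p**k * (1 - p) ** (N - 1 - k)
        # Given degree k, the coefficient is a Binomial(C(k,2), p) count divided by C(k,2).
        total += degree_prob / comb(k, 2)
    return (p - p * p) * total


if __name__ == "__main__":
    for N in (10, 50, 200):
        print(N, clustering_variance(N, 0.2))
```

For $N = 3$ only the term $k = 2$ survives, so the function reduces to $p^3(1-p)$, which gives a quick sanity check.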

(iii) The probability $\Pr[H_N = 2]$ is determined by the intersection of two independent events. First, there is no direct path between node $A$ and $B$. This event has probability $1 - p$. Second, there is at least one path with two hops. All $N - 2$ possible two-hop paths between $A$ and $B$ have the structure $(A \to j)(j \to B)$ and they have no links in common, i.e. they are mutually independent and independent from the direct link. The probability of the second event equals $1 - P_2$, where $P_2$ is the probability that there is no path with two hops. Hence, we have that $\Pr[H_N = 2] = (1-p)(1 - P_2)$ and it remains to compute $P_2$. The event of no path with two hops is
$$\left(\bigcup_{j=1}^{N-2} 1_{(A \to j)(j \to B)}\right)^c = \bigcap_{j=1}^{N-2} 1_{((A \to j)(j \to B))^c}$$
such that
$$P_2 = \Pr\left[\left(\bigcup_{j=1}^{N-2} 1_{(A \to j)(j \to B)}\right)^c\right] = \Pr\left[\bigcap_{j=1}^{N-2} 1_{((A \to j)(j \to B))^c}\right] = \prod_{j=1}^{N-2} \Pr\left[1_{((A \to j)(j \to B))^c}\right] = \prod_{j=1}^{N-2} \left(1 - \Pr\left[1_{(A \to j)(j \to B)}\right]\right) = \left(1 - p^2\right)^{N-2}$$
which demonstrates (15.26).
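As a small worked illustration of the result just derived, the sketch below evaluates $\Pr[H_N = 2] = (1-p)\left(1 - (1-p^2)^{N-2}\right)$. The function name `prob_two_hops` is an assumption made for this example.

```python
def prob_two_hops(N, p):
    """Pr[H_N = 2] in the random graph G_p(N): no direct link, at least one two-hop path."""
    no_two_hop_path = (1.0 - p * p) ** (N - 2)  # P_2 in the derivation above
    return (1.0 - p) * (1.0 - no_two_hop_path)


if __name__ == "__main__":
    for N in (3, 10, 100):
        print(N, prob_two_hops(N, 0.1))
```

With $N = 3$ there is a single candidate relay node, and the expression collapses to $(1-p)\,p^2$.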

C.10 The uniform recursive tree (Chapter 16)


(i) The relative error $r$, defined as 1 minus the simulated value over the exact value at hop $k$ given in (16.14), versus the number of hops $k$ is shown in Fig. C.7. The insert in Fig. C.7 illustrates that, on a linear scale, the difference between simulation and theory (full line) is not distinguishable for $n \ge 10^5$ iterations. The average $E[r_n]$ and standard deviation $\sigma[r_n]$ of the relative error for $n$ iterations versus the hops $k$ are

$E[r_{10^4}] = 0.12$, $\sigma[r_{10^4}] = 0.17$
$E[r_{10^5}] = 0.047$, $\sigma[r_{10^5}] = 0.073$
$E[r_{10^6}] = 0.017$, $\sigma[r_{10^6}] = 0.02$

where the range of $k$ values has been limited for $n = 10^4$ to 10 hops, for $n = 10^5$ to 11 hops, and for $n = 10^6$ to 12 hops. For larger hops, the simulations return zeros because the tail probability $\Pr[H_N > k]$ decreases as $O(1/k!)$ and simulating such a rare event requires on average at least as many simulations as $(\Pr[H_N = k])^{-1}$. The table roughly shows that the average error over the non-zero returned values decreases as $O(1/\sqrt{n})$, which is in agreement with the Central Limit Theorem 6.3.1. Each iteration of the simulation can be regarded as an independent trial and the histogram sums in a particular way the number of these trials.

[Fig. C.7. The relative error of the simulations of the hopcount in the complete graph with exponential link weight versus the hopcount for $10^4$, $10^5$ and $10^6$ iterations. Axes: relative error versus $k$ hops. Insert: $\Pr[H_{50} = k]$ versus $k$ hops, with the exact pdf.]
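The claimed $O(1/\sqrt{n})$ decay can be read off the table by multiplying each average error by $\sqrt{n}$: if the scaling holds, the products should be of the same order. The sketch below uses only the three averages quoted above; the names `MEAN_REL_ERROR` and `scaled_errors` are illustrative.

```python
from math import sqrt

# Average relative errors E[r_n] quoted in the table above, keyed by the number of iterations n.
MEAN_REL_ERROR = {10**4: 0.12, 10**5: 0.047, 10**6: 0.017}


def scaled_errors(errors):
    """Return sqrt(n) * E[r_n] for each n; roughly constant under O(1/sqrt(n)) decay."""
    return {n: err * sqrt(n) for n, err in errors.items()}


if __name__ == "__main__":
    for n, value in scaled_errors(MEAN_REL_ERROR).items():
        print(n, round(value, 2))
```

The products stay within a factor of about 1.5 of each other over two decades of $n$, which is what "roughly shows" refers to; three data points do not establish the exponent precisely.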
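(ii) Using (2.43), we have
$$\mathrm{Var}[W_N] = \varphi''_{W_N}(0) - \left(\varphi'_{W_N}(0)\right)^2 = \varphi''_{W_N}(0) - \left(E[W_N]\right)^2 = \frac{1}{N-1} \sum_{k=1}^{N-1} \left.\frac{d^2}{dz^2} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)}\right|_{z=0} - \left(\frac{1}{N-1} \sum_{n=1}^{N-1} \frac{1}{n}\right)^2$$
where $E[W_N]$ is given in (16.18). The derivatives of the product
$$g(z) = \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)}$$
are elegantly computed via the logarithmic derivative $\frac{dg(z)}{dz} = g(z)\,\frac{d \log g(z)}{dz}$. The second derivative is $\frac{d^2 g(z)}{dz^2} = g(z) \left(\frac{d \log g(z)}{dz}\right)^2 + g(z)\,\frac{d^2 \log g(z)}{dz^2}$. With
$$\frac{d \log g(z)}{dz} = \frac{d}{dz} \sum_{n=1}^{k} \log \frac{n(N-n)}{z + n(N-n)} = -\sum_{n=1}^{k} \frac{1}{z + n(N-n)}$$
$$\frac{d^2 \log g(z)}{dz^2} = \sum_{n=1}^{k} \frac{1}{\left(z + n(N-n)\right)^2}$$
we obtain, since $g(0) = 1$,
$$\mathrm{Var}[W_N] = \frac{1}{N-1} \sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n(N-n)}\right)^2 + \frac{1}{N-1} \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N-n)^2} - \left(\frac{\sum_{n=1}^{N-1} \frac{1}{n}}{N-1}\right)^2 \qquad \text{(C.5)}$$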

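The first sum is
$$\sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n(N-n)}\right)^2 = \sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n(N-n)} \sum_{j=1}^{k} \frac{1}{j(N-j)} = \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \sum_{k=n}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)}$$
and, with $\sum_{k=n}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)} = \sum_{k=1}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)} - \sum_{k=1}^{n-1} \sum_{j=1}^{k} \frac{1}{j(N-j)}$,
$$\sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n(N-n)}\right)^2 = \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \left(\sum_{k=1}^{N-1} \sum_{j=1}^{k} \frac{1}{j(N-j)} - \sum_{k=1}^{n-1} \sum_{j=1}^{k} \frac{1}{j(N-j)}\right)$$
$$= \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \left(\sum_{j=1}^{N-1} \frac{1}{j(N-j)} \sum_{k=j}^{N-1} 1 - \sum_{j=1}^{n-1} \frac{1}{j(N-j)} \sum_{k=j}^{n-1} 1\right)$$
$$= \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \sum_{j=1}^{N-1} \frac{1}{j} - \sum_{n=1}^{N-1} \frac{1}{N-n} \sum_{j=1}^{n-1} \frac{1}{j(N-j)} + \sum_{n=1}^{N-1} \frac{1}{n(N-n)} \sum_{j=1}^{n-1} \frac{1}{N-j}$$
Furthermore, since $\sum_{n=1}^{k} \frac{1}{n(N-n)} = \frac{1}{N} \sum_{n=1}^{k} \frac{1}{n} + \frac{1}{N} \sum_{n=N-k}^{N-1} \frac{1}{n}$, we have
$$\sum_{n=1}^{N-1} \frac{1}{N-n} \sum_{j=1}^{n-1} \frac{1}{j(N-j)} = \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{N-n} \sum_{k=1}^{n-1} \frac{1}{k} + \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{N-n} \sum_{k=N-n+1}^{N-1} \frac{1}{k} = \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=1}^{N-j-1} \frac{1}{k} + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=j+1}^{N-1} \frac{1}{k}$$
and
$$\sum_{n=1}^{N-1} \frac{1}{n(N-n)} \sum_{j=1}^{n-1} \frac{1}{N-j} = \frac{1}{N} \sum_{n=1}^{N-1} \left(\frac{1}{n} + \frac{1}{N-n}\right) \sum_{j=1}^{n-1} \frac{1}{N-j} = \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=j+1}^{N-1} \frac{1}{k}$$
Hence,
$$\sum_{k=1}^{N-1} \left(\sum_{n=1}^{k} \frac{1}{n(N-n)}\right)^2 = \frac{2}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 - \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=1}^{N-j-1} \frac{1}{k} + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}$$
$$= \frac{2}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 - \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \left(\sum_{k=1}^{N-1} \frac{1}{k} - \sum_{k=N-j}^{N-1} \frac{1}{k}\right) + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}$$
$$= \frac{1}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 + \frac{1}{N} \sum_{j=1}^{N-1} \frac{1}{j}\,\frac{1}{N-j} + \frac{2}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}$$
$$= \frac{1}{N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2 + \frac{2}{N^2} \sum_{j=1}^{N-1} \frac{1}{j} + \frac{2}{N} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}$$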
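Substituted into (C.5) yields
$$\mathrm{Var}[W_N] = -\frac{\left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2}{(N-1)^2 N} + \frac{2 \sum_{j=1}^{N-1} \frac{1}{j}}{(N-1) N^2} + \frac{2 \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k}}{N(N-1)} + \frac{\sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N-n)^2}}{N-1}$$
Further,
$$\sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N-n)^2} = \sum_{n=1}^{N-1} \frac{1}{n^2 (N-n)^2} \sum_{k=n}^{N-1} 1 = \sum_{n=1}^{N-1} \frac{1}{n^2 (N-n)}$$
The partial fraction expansion of $\frac{1}{n^2 (N-n)}$ is $\frac{1}{N^2 n} + \frac{1}{N n^2} + \frac{1}{N^2 (N-n)}$ such that
$$\sum_{k=1}^{N-1} \sum_{n=1}^{k} \frac{1}{n^2 (N-n)^2} = \frac{2}{N^2} \sum_{n=1}^{N-1} \frac{1}{n} + \frac{1}{N} \sum_{n=1}^{N-1} \frac{1}{n^2}$$
Combined,
$$\mathrm{Var}[W_N] = -\frac{\left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2}{(N-1)^2 N} + \frac{4 \sum_{n=1}^{N-1} \frac{1}{n}}{(N-1) N^2} + \frac{2}{N(N-1)} \sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} + \frac{\sum_{n=1}^{N-1} \frac{1}{n^2}}{N(N-1)}$$
Invoking the identity
$$\sum_{j=1}^{N-1} \frac{1}{N-j} \sum_{k=j}^{N-1} \frac{1}{k} = \sum_{n=1}^{N-1} \frac{1}{n^2}$$
(which can be verified by induction) yields
$$\sum_{j=1}^{N-1} \frac{1}{j} \sum_{k=N-j+1}^{N-1} \frac{1}{k} = \sum_{j=1}^{N-1} \frac{1}{N-j} \sum_{k=j+1}^{N-1} \frac{1}{k} = \sum_{j=1}^{N-1} \frac{1}{N-j} \left(\sum_{k=j}^{N-1} \frac{1}{k} - \frac{1}{j}\right)$$
$$= \sum_{j=1}^{N-1} \frac{1}{N-j} \sum_{k=j}^{N-1} \frac{1}{k} - \sum_{j=1}^{N-1} \frac{1}{j(N-j)} = \sum_{n=1}^{N-1} \frac{1}{n^2} - \frac{2}{N} \sum_{n=1}^{N-1} \frac{1}{n}$$
Finally, we arrive at (16.19).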
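The chain of sum manipulations above can be checked numerically with exact fractions. The sketch below compares the double-sum form (C.5) with the closed form obtained by substituting the last identity into the combined expression, namely $\frac{3}{N(N-1)} \sum_{n=1}^{N-1} \frac{1}{n^2} - \frac{1}{(N-1)^2 N} \left(\sum_{n=1}^{N-1} \frac{1}{n}\right)^2$. This closed form is presumably what (16.19) states, but (16.19) is not reproduced in this appendix, so treat that as an assumption; the function names are illustrative.

```python
from fractions import Fraction


def var_from_c5(N):
    """Var[W_N] evaluated directly from the double sums in (C.5)."""
    inv = [Fraction(1, n * (N - n)) for n in range(1, N)]
    first = Fraction(0)
    second = Fraction(0)
    for k in range(1, N):
        first += sum(inv[:k]) ** 2
        second += sum(x * x for x in inv[:k])
    mean = sum(Fraction(1, n) for n in range(1, N)) / (N - 1)  # E[W_N]
    return (first + second) / (N - 1) - mean**2


def var_closed(N):
    """Closed form obtained by combining the displayed lines of the derivation."""
    h1 = sum(Fraction(1, n) for n in range(1, N))
    h2 = sum(Fraction(1, n * n) for n in range(1, N))
    return 3 * h2 / (N * (N - 1)) - h1**2 / ((N - 1) ** 2 * N)


if __name__ == "__main__":
    for N in (2, 3, 10, 25):
        print(N, var_from_c5(N), var_closed(N))
```

For $N = 2$ the shortest path is the single direct link with an exponential weight of mean 1, so both expressions return a variance of 1.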


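(iii) The limit for $N \to \infty$ of the probability generating function (16.17) of the weight $W_N$ of the shortest path
$$\varphi_{W_N}(z) = E\left[e^{-z W_N}\right] = \frac{1}{N-1} \sum_{k=1}^{N-1} \prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)}$$
will be derived, from which the distribution then follows by taking the inverse Laplace transform. Since
$$z + n(N-n) = \left(\sqrt{\left(\tfrac{N}{2}\right)^2 + z} + \tfrac{N}{2} - n\right) \left(\sqrt{\left(\tfrac{N}{2}\right)^2 + z} - \tfrac{N}{2} + n\right)$$
we have with $y = \sqrt{\left(\frac{N}{2}\right)^2 + z}$,
$$\prod_{n=1}^{k} \frac{n(N-n)}{z + n(N-n)} = \frac{k!\,(N-1)!}{(N-k-1)!} \prod_{n=1}^{k} \frac{1}{y + \frac{N}{2} - n} \prod_{n=1}^{k} \frac{1}{y - \frac{N}{2} + n}$$
The products can be written in terms of the Gamma function,
$$\prod_{n=1}^{k} \frac{1}{y + \frac{N}{2} - n} = \frac{\Gamma\left(y + \frac{N}{2} - k\right)}{\Gamma\left(y + \frac{N}{2}\right)}, \qquad \prod_{n=1}^{k} \frac{1}{y - \frac{N}{2} + n} = \frac{\Gamma\left(y - \frac{N}{2} + 1\right)}{\Gamma\left(y - \frac{N}{2} + k + 1\right)}$$
Thus,
$$\varphi_{W_N}(z) = (N-2)!\,\frac{\Gamma\left(y - \frac{N}{2} + 1\right)}{\Gamma\left(y + \frac{N}{2}\right)} \sum_{k=1}^{N-1} \frac{\Gamma(k+1)\,\Gamma\left(y + \frac{N}{2} - k\right)}{\Gamma(N-k)\,\Gamma\left(y - \frac{N}{2} + k + 1\right)}$$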
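The factorization and the Gamma-function form of the two products can be verified numerically for a small $N$. The sketch below is an illustration under the notation above ($y = \sqrt{(N/2)^2 + z}$); the function names are assumptions, and `math.gamma` is used directly, which is adequate only for moderate arguments.

```python
import math


def product_direct(N, z, k):
    """prod_{n=1}^{k} n(N-n) / (z + n(N-n)), evaluated term by term."""
    prod = 1.0
    for n in range(1, k + 1):
        a = n * (N - n)
        prod *= a / (z + a)
    return prod


def product_gamma(N, z, k):
    """The same product written with Gamma functions, following the derivation above."""
    y = math.sqrt((N / 2) ** 2 + z)
    prefactor = math.factorial(k) * math.factorial(N - 1) / math.factorial(N - k - 1)
    upper = math.gamma(y + N / 2 - k) / math.gamma(y + N / 2)
    lower = math.gamma(y - N / 2 + 1) / math.gamma(y - N / 2 + k + 1)
    return prefactor * upper * lower


if __name__ == "__main__":
    N, z = 10, 0.7
    for k in range(1, N):
        print(k, product_direct(N, z, k), product_gamma(N, z, k))
```

For large $N$ the individual Gamma values overflow a double; a log-domain version based on `math.lgamma` would be the natural replacement in that regime.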

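Let the number of nodes be even, $N = 2M$, such that $y = \sqrt{M^2 + z} \sim M + \frac{z}{2M}$ (provided $|z| < 2M$). The sum, denoted by $S$, can be split as
$$S = \sum_{k=1}^{M} \frac{\Gamma(k+1)\,\Gamma(y + M - k)}{\Gamma(2M-k)\,\Gamma(y - M + k + 1)} + \sum_{k=M+1}^{2M-1} \frac{\Gamma(k+1)\,\Gamma(y + M - k)}{\Gamma(2M-k)\,\Gamma(y - M + k + 1)}$$
$$= \sum_{j=0}^{M-1} \frac{\Gamma(M - j + 1)\,\Gamma(y + j)}{\Gamma(M+j)\,\Gamma(y - j + 1)} + \sum_{k=1}^{M-1} \frac{\Gamma(M + k + 1)\,\Gamma(y - k)}{\Gamma(M-k)\,\Gamma(y + k + 1)} = \sum_{j=-(M-1)}^{M-1} \frac{\Gamma(y + j)\,\Gamma(M - j + 1)}{\Gamma(M+j)\,\Gamma(y - j + 1)}$$
and
$$\varphi_{W_{2M}}(z) = (2M-1)!\,\frac{\Gamma(y - M + 1)}{\Gamma(y + M)}\,\frac{1}{2M-1} \sum_{j=-(M-1)}^{M-1} \frac{\Gamma(y + j)\,\Gamma(M - j + 1)}{\Gamma(M+j)\,\Gamma(y - j + 1)}$$
For large $M$,
$$(2M-1)!\,\frac{\Gamma(y - M + 1)}{\Gamma(y + M)} \sim \Gamma(2M)\,\frac{\Gamma\left(\frac{z}{2M} + 1\right)}{\Gamma\left(2M + \frac{z}{2M}\right)} \sim (2M)^{-\frac{z}{2M}}\,\Gamma\left(\frac{z}{2M} + 1\right)$$
which suggests that we consider $z \to 2Mz$ since then, using (Abramowitz and Stegun, 1968, Section 6.1.47),
$$\varphi_{W_{2M}}(2Mz) \sim (2M)^{-z}\,\Gamma(z+1)\,\frac{1}{2M-1} \sum_{j=-(M-1)}^{M-1} \left(1 + O\left(\frac{1}{M}\right)\right) \sim (2M)^{-z}\,\Gamma(z+1) \left(1 + O\left(\frac{1}{M}\right)\right)$$
Hence,
$$\lim_{N \to \infty} N^z\,\varphi_{W_N}(Nz) = \Gamma(z+1)$$
or equivalently,
$$\lim_{N \to \infty} E\left[e^{-(N W_N - \log N) z}\right] = \Gamma(z+1) \qquad \text{(C.6)}$$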

The inverse Laplace transform of $\Gamma(z+1)$ is a Gumbel distribution (3.37) and we arrive at the asymptotic distribution for the weight of the shortest path (16.20). Since $\Pr[N W_N - \log N \le y] = \Pr\left[W_N \le \frac{y + \log N}{N}\right]$, substituting $x = \frac{y + \log N}{N}$ and ignoring the limit $N \to \infty$ gives $\Pr[W_N \le x] = e^{-N e^{-Nx}}$. The probability density function is found after differentiation as
$$f_{W_N}(x) = N^2\,e^{-N\left(e^{-Nx} + x\right)} \qquad \text{(C.7)}$$
The goodness of this asymptotic distribution (C.7) for finite $N$ is illustrated in Fig. C.8. Observe from Fig. C.8 that the simulated density has $f_{W_N}(0) = 1$, while (C.7) gives $f_{W_N}(0) = N^2 e^{-N} \approx 0$. Since $\varphi_{W_N}(z) = \int_0^\infty e^{-zt} f_{W_N}(t)\,dt$ is a single-sided Laplace transform, integrating by parts yields $z\,\varphi_{W_N}(z) = f_{W_N}(0) + \int_0^\infty e^{-zt} f'_{W_N}(t)\,dt$ provided $f'_{W_N}(t)$ exists for all $t \ge 0$. Hence, we find a well-known limit criterion of single-sided Laplace transforms,
$$f_{W_N}(0) = \lim_{z \to \infty} z\,\varphi_{W_N}(z) \qquad \text{(C.8)}$$
Applied to (16.17), this leads to $f_{W_N}(0) = 1$ for all finite $N$, and applied to the scaled link weight where the mean is $a^{-1}$, such that $\varphi_{W_{N;a}}(z) = \varphi_{W_N}(z/a)$, it gives $f_{W_{N;a}}(0) = a$. The interpretation of this property is related to the choice of the link weights. The shortest path includes almost surely the smallest link weights, whose distribution near zero is $F_w(x) = f_w(0)\,x + O(x^2)$ since $F_w(0) = 0$. Both the exponential (with parameter 1) and the uniform distribution are regular with $f_w(0) = 1$. Since the smallest values of the weight of the shortest path $W_N$ in $K_N$ occur for direct links a.s., the distribution of $W_N$ around zero is dominated by the distribution of the link weight $w$ around zero. The contribution cannot be due to an $n$-hop shortest path with $n > 1$ since such a path consists of the sum of $n$ exponentials, which has a probability density around $x = 0$ of the form $O(x^{n-1})$. Indeed, see (3.24) or apply (C.8) to the pgf of a sum of $n$ exponentials $\varphi_{S_n}(z) = \prod_{k=1}^{n} \frac{k}{z+k}$.

[Fig. C.8. The pdf $f_{W_N}(x)$ of the weight of the shortest path for various $N$ ($N = 50$, 100 and 200). Each simulation consists of $10^6$ iterations. The bold curves represent the finite $N$-equivalent (C.7) of the asymptotic result.]
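Both the limit criterion (C.8) and the finite-$N$ density (C.7) lend themselves to a quick numerical check. The sketch below evaluates the transform quoted from (16.17) at a large argument and compares (C.7) with a numerical derivative of the distribution $e^{-N e^{-Nx}}$. The function names and the chosen values of $N$, $z$ and $x$ are assumptions made for illustration.

```python
import math


def phi_W(N, z):
    """Laplace transform of W_N as quoted above: (1/(N-1)) sum_k prod_{n<=k} n(N-n)/(z+n(N-n))."""
    total, prod = 0.0, 1.0
    for k in range(1, N):
        a = k * (N - k)
        prod *= a / (z + a)  # running product up to n = k
        total += prod
    return total / (N - 1)


def asymptotic_cdf(N, x):
    """Finite-N form of the Gumbel limit: Pr[W_N <= x] = exp(-N exp(-N x))."""
    return math.exp(-N * math.exp(-N * x))


def asymptotic_pdf(N, x):
    """Density (C.7): N^2 exp(-N (exp(-N x) + x))."""
    return N * N * math.exp(-N * (math.exp(-N * x) + x))


if __name__ == "__main__":
    N = 50
    print("z * phi(z) at large z:", 1e9 * phi_W(N, 1e9))   # criterion (C.8): tends to 1
    print("(C.7) at x = 0:", asymptotic_pdf(N, 0.0))       # N^2 e^{-N}, practically 0
```

The two printed values make the mismatch at the origin explicit: the exact transform implies a density of 1 at zero, while the asymptotic form (C.7) is essentially zero there.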
(iv) When the intermediate nodes of the shortest path between a source $A$ and a destination $B$ are removed from $K_N$, we obtain again a complete graph with $N - h_N + 1$ nodes. The resulting graph contains link weights that are not perfectly exponentially distributed anymore nor are they perfectly independent, because we have removed a special set of nodes and not a random set. But, since we have removed at each node of the shortest path, apart from the shortest link, $N - 3$ other links, we assume that the dependence between $K_N$ and the reduced graph is ignorably small. Under these assumptions, the shortest node-disjoint path in $K_N$ is a shortest path in $K_{N - H_N + 1}$ with exponential link weight with mean 1. The distribution of the hopcount $h_N^{nd}$ of that shortest node-disjoint path is
$$\Pr\left[h_N^{nd} = k\right] \approx \sum_{j=0}^{N-1} \Pr\left[h_{N-j+1} = k \mid h_N = j\right] \Pr[h_N = j]$$
The hopcount $H_N$ of the shortest path in the complete graph $K_N$ with independent exponential link weights with mean 1 is given in (16.8). With the assumption that
$$\Pr\left[h_{N-j+1} = k \mid h_N = j\right] = \Pr\left[h_{N-j+1} = k\right]$$
we obtain
$$\Pr\left[h_N^{nd} = k\right] \approx \frac{(-1)^{k+1}}{N!} \sum_{j=0}^{N-1} \frac{S_{N-j+1}^{(k+1)}\,S_N^{(j+1)}}{(N-j+1)!}$$
For large $N$, we can use the Poisson approximation (16.13)
$$\Pr\left[h_N^{nd} = k\right] \approx \frac{1}{N\,k!} \sum_{j=0}^{N-1} \frac{\left(\log(N-j+1)\right)^k}{N-j+1}\,\frac{(\log N)^j}{j!}$$
Since $\left(\log(N-j+1)\right)^k = \log^k N - \frac{k(j-1)}{N} \log^{k-1} N + O\left(\frac{\log^{k-1} N}{N^2}\right)$ and $\frac{1}{N-j+1} = \frac{1}{N} + O\left(\frac{1}{N^2}\right)$, we have to highest order in $N$,
$$\Pr\left[h_N^{nd} = k\right] \approx \frac{1}{N\,k!} \sum_{j=0}^{N-1} \left(\frac{\log^k N}{N} + O\left(\frac{\log^{k-1} N}{N^2}\right)\right) \frac{(\log N)^j}{j!} \approx \frac{1}{k!} \left(\frac{\log^k N}{N} + O\left(\frac{\log^{k-1} N}{N^2}\right)\right) \approx \Pr[h_N = k]$$
For large $N$, we expect approximately that the hopcount of the shortest and that of the shortest node-disjoint path have about the same distribution. The validity of the assumption is illustrated in Fig. C.9 for relatively small values of $N = 50$, 100, and 200. Each simulation consisted of $n = 10^6$ iterations. The corresponding weights of the shortest and node-disjoint shortest path are drawn in Fig. C.10. The weight of the node-disjoint shortest path is evidently always larger than that of the shortest path in the same graph. Nevertheless, for large $N$, the simulations suggest that both pdfs tend to each other.

[Fig. C.9. Both the pdf of the hopcount of the shortest path (thin line) and the shortest node-disjoint path (bold line), for $N = 50$, 100 and 200. Axes: $\Pr[H_N = k]$ versus $k$ hops.]
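The last step of the argument, that the Poisson-approximated sum for the node-disjoint path collapses to the Poisson approximation of the shortest path itself, can be illustrated numerically. The sketch below evaluates both sides for one choice of $N$ and $k$. The function names are illustrative, and the weights $(\log N)^j / j!$ are computed in the log domain to avoid overflow for large $j$.

```python
import math


def poisson_hopcount(N, k):
    """Poisson approximation used above: Pr[h_N = k] ~ (log N)^k / (N k!)."""
    return math.log(N) ** k / (N * math.factorial(k))


def node_disjoint_hopcount(N, k):
    """Sum over j of (log(N-j+1))^k / (N-j+1) * (log N)^j / j!, divided by N k!."""
    log_n = math.log(N)
    total = 0.0
    for j in range(N):
        m = N - j + 1
        # (log N)^j / j! evaluated as exp(j log(log N) - log(j!)) to stay in floating range.
        weight = math.exp(j * math.log(log_n) - math.lgamma(j + 1))
        total += math.log(m) ** k / m * weight
    return total / (N * math.factorial(k))


if __name__ == "__main__":
    N, k = 1000, 3
    print(node_disjoint_hopcount(N, k) / poisson_hopcount(N, k))
```

A ratio close to 1 only confirms the algebraic approximation; it says nothing about the independence assumption itself, which the text validates by simulation in Fig. C.9.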

[Fig. C.10. Pdf $f_W(x)$ of the weight of the shortest path (thin line) and the node-disjoint shortest path (bold line), for $N = 50$, 100 and 200.]

C.11 The efficiency of multicast (Chapter 17)

(i) Using (17.25), we obtain
$$g_{N,k}(2) = N - 1 - \sum_{j=0}^{D-1} k^{D-j}\,\frac{\left(N - 1 - \frac{k^{j+1}-1}{k-1}\right) \left(N - 2 - \frac{k^{j+1}-1}{k-1}\right)}{(N-1)(N-2)}$$
$$= \frac{(2N-3)DN}{(N-1)(N-2)} + \frac{(2N-3)D}{(N-1)(N-2)(k-1)} - \frac{(2N-3)}{(N-2)(k-1)} - \frac{N(N-1-2D)}{(N-1)(N-2)(k-1)} - \frac{(N-1-2D)}{(N-1)(N-2)(k-1)^2} - \frac{1}{(N-2)(k-1)^2}$$
or, for large $N$,
$$g_{N,k}(2) \sim 2D - \frac{3}{k-1} + O\left(\frac{\log_k N}{N}\right)$$
The effective power exponent $\beta^*(N)$ as defined in (17.32) equals, for the $k$-ary tree and large $N$,
$$\beta^*(N) \sim \frac{\log \frac{2D - \frac{3}{k-1}}{D - \frac{1}{k-1}}}{\log 2} = 1 + \log_2\left[1 - \frac{1}{2(k-1)\left(\log_k N + \log_k(1 - 1/k) - \frac{1}{k-1}\right)}\right]$$
$$= 1 + \log_2\left(1 - \frac{1}{2(k-1)\,E[H_N]}\right) \sim 1 - \frac{1}{(\log 4)(k-1)\,E[H_N]}$$
which shows, for large $N$, that $\beta^*(N) < 1$, but that $\beta^*(N) \to 1$ if $k \to \infty$.
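The step from the sum over tree levels to the six-term closed form, and from there to the large-$N$ value $2D - \frac{3}{k-1}$, can be checked exactly for small trees. The sketch below assumes a complete $k$-ary tree of depth $D$ with $N = \frac{k^{D+1}-1}{k-1}$ nodes; the function names are introduced here for illustration only.

```python
from fractions import Fraction


def kary_nodes(k, D):
    """Number of nodes in a complete k-ary tree of depth D."""
    return (k ** (D + 1) - 1) // (k - 1)


def g2_sum(k, D):
    """g_{N,k}(2) evaluated from the sum over levels j = 0..D-1."""
    N = kary_nodes(k, D)
    total = Fraction(0)
    for j in range(D):
        a = (k ** (j + 1) - 1) // (k - 1)
        total += Fraction(k ** (D - j) * (N - 1 - a) * (N - 2 - a), (N - 1) * (N - 2))
    return N - 1 - total


def g2_closed(k, D):
    """The six-term closed form of g_{N,k}(2)."""
    N = kary_nodes(k, D)
    den = (N - 1) * (N - 2)
    return (Fraction((2 * N - 3) * D * N, den)
            + Fraction((2 * N - 3) * D, den * (k - 1))
            - Fraction(2 * N - 3, (N - 2) * (k - 1))
            - Fraction(N * (N - 1 - 2 * D), den * (k - 1))
            - Fraction(N - 1 - 2 * D, den * (k - 1) ** 2)
            - Fraction(1, (N - 2) * (k - 1) ** 2))


def g2_asymptotic(k, D):
    """Large-N value 2D - 3/(k-1)."""
    return 2 * D - 3 / (k - 1)


if __name__ == "__main__":
    for k, D in ((2, 5), (2, 20), (3, 10)):
        print(k, D, float(g2_sum(k, D)), g2_asymptotic(k, D))
```

For the smallest binary tree ($k = 2$, $D = 1$, $N = 3$) both exact forms return 2, the two links needed to reach both leaves from the root.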

Bibliography

Abramowitz, M. and Stegun, I. A. (1968). Handbook of Mathematical Functions. (Dover Publications, Inc., New York).
Allen, A. O. (1978). Probability, Statistics, and Queueing Theory. Computer Science and Applied Mathematics, (Academic Press, Inc., Orlando).
Almkvist, G. and Berndt, B. C. (1988). Gauss, Landen, Ramanujan, the Arithmetic-Geometric Mean, Ellipses, π and the Ladies Diary. American Mathematical Monthly 95, 585-608.
Anick, D., Mitra, D., and Sondhi, M. M. (1982). Stochastic theory of a data-handling system with multiple sources. The Bell System Technical Journal 61, 8 (October), 1871-1894.
Anupindi, R., Chopra, S., Deshmukh, S. D., Van Mieghem, J. A., and Zemel, E. (2006). Managing Business Flows. Principles of Operations Management, 2nd edn. (Prentice Hall, Upper Saddle River).
Barabasi, A.-L. (2002). Linked, The New Science of Networks. (Perseus, Cambridge, MA).
Baran, P. (2002). The beginnings of packet switching - some underlying concepts: The Franklin Institute and Drexel University seminar on the evolution of packet switching and the Internet. IEEE Communications Magazine, 2-8.
Berger, M. A. (1993). An Introduction to Probability and Stochastic Processes. (Springer-Verlag, New York).
Bertsekas, D. and Gallager, R. (1992). Data Networks, 2nd edn. (Prentice-Hall International Editions, London).
Billingsley, P. (1995). Probability and Measure, 3rd edn. (John Wiley & Sons, New York).
Bisdikian, C., Lew, J. S., and Tantawi, A. N. (1992). On the tail approximation of the blocking probability of single server queues with finite buffer capacity. Queueing Networks with Finite Capacity, Proc. 2nd Int. Conf., 267-280.
Bollobas, B. (2001). Random Graphs, 2nd edn. (Cambridge University Press, Cambridge, UK).
Borovkov, A. A. (1976). Stochastic Processes in Queueing Theory. (Springer-Verlag, New York).
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. (Cambridge University Press, Cambridge).
Brockmeyer, E., Halstrom, H. L., and Jensen, A. (1948). The Life and Works of A. K. Erlang. (Academy of Technical Sciences, Copenhagen).

Chalmers, R. C. and Almeroth, K. C. (2001). Modeling the branching characteristics and efficiency gains in global multicast trees. IEEE INFOCOM2001, Alaska.
Chen, L. Y. (1975). Poisson approximation for dependent trials. The Annals of Probability 3, 3, 534-545.
Chen, W.-K. (1971). Applied Graph Theory. (North-Holland Publishing Company, Amsterdam).
Chuang, J. and Sirbu, M. A. (1998). Pricing multicast communication: A cost-based approach. Proceedings of the INET'98.
Cohen, J. W. (1969). The Single Server Queue. (North-Holland Publishing Company, Amsterdam).
Cohen-Tannoudji, C., Diu, B., and Laloë, F. (1977). Mécanique Quantique. Vol. I and II. (Hermann, Paris).
Comtet, L. (1974). Advanced Combinatorics, revised and enlarged edn. (D. Riedel Publishing Company, Dordrecht, Holland).
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1991). An Introduction to Algorithms. (MIT Press, Boston).
Cvetkovic, D. M., Doob, M., and Sachs, H. (1995). Spectra of Graphs, Theory and Applications, third edn. (Johann Ambrosius Barth Verlag, Heidelberg).
Dorogovtsev, S. N. and Mendes, J. F. F. (2003). Evolution of Networks, From Biological Nets to the Internet and WWW. (Oxford University Press, Oxford).
Embrechts, P., Klüppelberg, C., and Mikosch, T. (2001a). Modelling Extremal Events for Insurance and Finance, 3rd edn. (Springer-Verlag, Berlin).
Embrechts, P., McNeil, A., and Straumann, D. (2001b). Correlation and Dependence in Risk Management: Properties and Pitfalls. Risk Management: Value at Risk and Beyond, ed. M. Dempster and H. K. Moffatt, (Cambridge University Press, Cambridge, UK).
Erdős, P. and Rényi, A. (1959). On random graphs. Publicationes Mathematicae Debrecen 6, 290-297.
Erdős, P. and Rényi, A. (1960). On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5, 17-61.
Feller, W. (1970). An Introduction to Probability Theory and Its Applications, 3rd edn. Vol. 1. (John Wiley & Sons, New York).
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, 2nd edn. Vol. 2. (John Wiley & Sons, New York).
Floyd, S. and Paxson, V. (2001). Difficulties in simulating the internet. IEEE Transactions on Networking 9, 4 (August), 392-403.
Fortz, B. and Thorup, M. (2000). Internet traffic engineering by optimizing OSPF weights. IEEE INFOCOM2000.
Frieze, A. M. (1985). On the value of a random minimum spanning tree problem. Discrete Applied Mathematics 10, 47-56.
Gallager, R. G. (1996). Discrete Stochastic Processes. (Kluwer Academic Publishers, Boston).
Gantmacher, F. R. (1959a). The Theory of Matrices. Vol. I. (Chelsea Publishing Company, New York).
Gantmacher, F. R. (1959b). The Theory of Matrices. Vol. II. (Chelsea Publishing Company, New York).
Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimus obnoxiae. Pars prior. Gauss Werke 4, 3-26.

Gilbert, E. N. (1956). Enumeration of labelled graphs. Canadian Journal of Mathematics 8, 405-411.
Gnedenko, B. V. and Kovalenko, I. N. (1989). Introduction to Queuing Theory, second edn. (Birkhauser, Boston).
Golub, G. H. and Van Loan, C. F. (1983). Matrix Computations. (North Oxford Academic, Oxford).
Goulden, I. P. and Jackson, D. M. (1983). Combinatorial Enumeration. (John Wiley & Sons, New York).
Grimmett, G. R. (1989). Percolation. (Springer-Verlag, New York).
Grimmett, G. R. and Stirzaker, D. (2001). Probability and Random Processes, 3rd edn. (Oxford University Press, Oxford).
Hardy, G. H. (1948). Divergent Series. (Oxford University Press, London).
Hardy, G. H., Littlewood, J. E., and Polya, G. (1999). Inequalities, 2nd edn. (Cambridge University Press, Cambridge, UK).
Hardy, G. H. and Wright, E. M. (1968). An Introduction to the Theory of Numbers, 4th edn. (Oxford University Press, London).
Harris, T. E. (1963). The Theory of Branching Processes. (Springer-Verlag, Berlin).
Harrison, J. M. (1990). Brownian Motion and Stochastic Flow Systems. (Krieger Publishing Company, Malabar, Florida).
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2001). First passage percolation on the random graph. Probability in the Engineering and Informational Sciences (PEIS) 15, 225-237.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2002a). The flooding time in random graphs. Extremes 5, 2 (June), 111-129.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2002b). On the covariance of the level sizes in recursive trees. Random Structures and Algorithms 20, 519-539.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2005). Distances in random graphs with finite variance degree. Random Structures and Algorithms 27, 1 (August), 76-123.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2006a). Size and weight of shortest path trees with exponential link weights. Combinatorics, Probability and Computing.
van der Hofstad, R., Hooghiemstra, G., and Van Mieghem, P. (2006b). The weight of the shortest path tree. Random Structures and Algorithms.
Hooghiemstra, G. and Koole, G. (2000). On the convergence of the power series algorithm. Performance Evaluation 42, 21-39.
Hooghiemstra, G. and Van Mieghem, P. (2005). On the mean distance in scale free graphs. Methodology and Computing in Applied Probability (MCAP) 7, 285-306.
Jamin, S., Jin, C., Kurc, A. R., Raz, D., and Shavitt, Y. (2001). Constrained mirror placement on the internet. IEEE INFOCOM'01.
Janic, M., Kuipers, F., Zhou, X., and Van Mieghem, P. (2002). Implications for QoS provisioning based on traceroute measurements. Proceedings of 3rd International Workshop on Quality of Future Internet Services, QofIS2002, ed. B. Stiller et al., Zurich, Switzerland, Springer Verlag LNCS 2511, 3-14.
Janson, S. (1995). The minimal spanning tree in a complete graph and a functional limit theorem for trees in a random graph. Random Structures and Algorithms 7, 4 (December), 337-356.

Janson, S. (2002). On concentration of probability. Contemporary Combinatorics, ed. B. Bollobás, Bolyai Soc. Math. Stud. 10, János Bolyai Mathematical Society, Budapest, 289-301.
Janson, S., Knuth, D. E., Luczak, T., and Pittel, B. (1993). The birth of the giant component. Random Structures and Algorithms 4, 3, 233-358.
Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd edn. (Academic Press, San Diego).
Karlin, S. and Taylor, H. M. (1981). A Second Course in Stochastic Processes. (Academic Press, San Diego).
Kelly, F. P. (1991). Special invited paper: Loss networks. The Annals of Applied Probability 1, 3, 319-378.
Kleinrock, L. (1975). Queueing Systems. Vol. 1 Theory. (John Wiley and Sons, New York).
Kleinrock, L. (1976). Queueing Systems. Vol. 2 Computer Applications. (John Wiley and Sons, New York).
Krishnan, P., Raz, D., and Shavitt, Y. (2000). The cache location problem. IEEE/ACM Transactions on Networking 8, 5 (October), 568-582.
Kuipers, F. A. and Van Mieghem, P. (2003). The Impact of Correlated Link Weights on QoS Routing. IEEE INFOCOM'03.
Lanczos, C. (1988). Applied Analysis. (Dover Publications, Inc., New York).
Langville, A. N. and Meyer, C. D. (2005). Deeper inside PageRank. Internet Mathematics 1, 3 (February), 335-380.
Le Boudec, J.-Y. and Thiran, P. (2001). Network Calculus, A Theory of Deterministic Queuing Systems for the Internet. (Springer Verlag, Berlin).
Leadbetter, M. R., Lindgren, G., and Rootzen, H. (1983). Extremes and Related Properties of Random Sequences and Processes. (Springer-Verlag, New York).
Leon-Garcia, A. (1994). Probability and Random Processes for Electrical Engineering, 2nd edn. (Addison-Wesley, Reading, Massachusetts).
van Lint, J. H. and Wilson, R. M. (1996). A course in Combinatorics. (Cambridge University Press, Cambridge, UK).
Lovász, L. (1993). Random Walks on Graphs: A Survey. Combinatorics 2, 1-46.
Markushevich, A. I. (1985). Theory of functions of a complex variable. Vol. I-III. (Chelsea Publishing Company, New York).
Mehta, M. L. (1991). Random Matrices, 2nd edn. (Academic Press, Boston).
Meyer, C. D. (2000). Matrix Analysis and Applied Linear Algebra. (Society for Industrial and Applied Mathematics (SIAM), Philadelphia).
Mitra, D. (1988). Stochastic theory of a fluid model of producers and consumers coupled by a buffer. Advances in Applied Probability 20, 646-676.
Morse, P. M. and Feshbach, H. (1978). Methods of Theoretical Physics. (McGraw-Hill Book Company, New York).
Neuts, M. F. (1989). Structured Stochastic Matrices of the M/G/1 Type and Their Applications. (Marcel Dekker Inc., New York).
Norros, I. (1994). A storage model with self-similar input. Queueing Systems 16, 3-4, 387-396.
Pascal, B. (1954). Oeuvres complètes. Bibliothèque de la Pléiade, (Gallimard, Paris).
Paxson, V. (1997). End-to-end Routing Behavior in the Internet. IEEE/ACM Transactions on Networking 5, 5 (October), 601-615.
Phillips, G., Schenker, S., and Tangmunarunkit, H. (1999). Scaling of multicast trees: Comments on the Chuang-Sirbu scaling law. ACM Sigcomm'99.

Pietronero, L. and Schneider, W. (1990). Invasion percolation as a fractal growth problem. Physica A 170, 81-104.
Press, W. H., Teukolsky, S. A., Vetterling, W. T., and Flannery, B. P. (1992). Numerical Recipes in C, 2nd edn. (Cambridge University Press, New York).
Rainville, E. D. (1960). Special Functions. (Chelsea Publishing Company, New York).
Riordan, J. (1968). Combinatorial Identities. (John Wiley & Sons, New York).
Roberts, J. W. (1991). Performance Evaluation and Design of Multiservice Networks. Information Technologies and Sciences, vol. COST 224. (Commission of the European Communities, Luxembourg).
Robinson, S. (2004). The prize of anarchy. SIAM News 37, 5 (June), 1-4.
Ross, S. M. (1996). Stochastic Processes, 2nd edn. (John Wiley & Sons, New York).
Royden, H. L. (1988). Real Analysis, 3rd edn. (Macmillan Publishing Company, New York).
Sansone, G. and Gerretsen, J. (1960). Lectures on the Theory of Functions of a Complex Variable. Vol. 1 and 2. (P. Noordhoff, Groningen).
Schoutens, W. (2000). Stochastic Processes and Orthogonal Polynomials. (Springer-Verlag, New York).
Siganos, G., Faloutsos, M., Faloutsos, P., and Faloutsos, C. (2003). Power laws and the AS-level internet topology. IEEE/ACM Transactions on Networking 11, 4 (August), 514-524.
Smythe, R. T. and Mahmoud, H. M. (1995). A survey of recursive trees. Theory of Probability and Mathematical Statistics 51, 1-27.
Steyaert, B. and Bruneel, H. (1994). Analytic derivation of the cell loss probability in finite multiserver buffers, from infinite buffer results. Proceedings of the second workshop on performance modelling and evaluation of ATM networks, Bradford UK, 18.1-11.
Strogatz, S. H. (2001). Exploring complex networks. Nature 410, 8 (March), 268-276.
Syski, R. (1986). Introduction to Congestion Theory in Telephone Systems, 2nd edn. Studies in Telecommunication, vol. 4. (North-Holland, Amsterdam).
Titchmarsh, E. C. (1948). Introduction to the Theory of Fourier Integrals, 2nd edn. (Oxford University Press, Ely House, London W. I).
Titchmarsh, E. C. (1964). The Theory of Functions. (Oxford University Press, Amen House, London).
Titchmarsh, E. C. and Heath-Brown, D. R. (1986). The Theory of the Zeta-function, 2nd edn. (Oxford Science Publications, Oxford).
Van Mieghem, P. (1996). The asymptotic behaviour of queueing systems: Large deviations theory and dominant pole approximation. Queueing Systems 23, 27-55.
Van Mieghem, P. (2001). Paths in the simple random graph and the Waxman graph. Probability in the Engineering and Informational Sciences (PEIS) 15, 535-555.
Van Mieghem, P. (2004a). Data Communications Networking. (Delft University of Technology, Delft).
Van Mieghem, P. (2004b). The Probability Distribution of the Hopcount to an Anycast Group. Delft University of Technology, Report 2003605 (www.nas.ewi.tudelft.nl/people/Piet/teleconference).

Van Mieghem, P. (2005). The limit random variable W of a branching process. Delft University of Technology, Report 20050206 (www.nas.ewi.tudelft.nl/people/Piet/teleconference).
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. (2000). A Scaling Law for the Hopcount in the Internet. Delft University of Technology, Report2000125 (www.nas.ewi.tudelft.nl/people/Piet/telconference).
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. (2001a). On the efficiency of multicast. IEEE/ACM Transactions on Networking 9, 6 (December), 719-732.
Van Mieghem, P., Hooghiemstra, G., and van der Hofstad, R. W. (2001b). Stochastic model for the number of traversed routers in internet. Proceedings of Passive and Active Measurement: PAM-2001, April 23-24, Amsterdam.
Van Mieghem, P. and Janic, M. (2002). Stability of a multicast tree. Proceedings IEEE INFOCOM2002 2, 1099-1108.
Veres, A. and Boda, M. (2000). The chaotic nature of TCP congestion control. IEEE INFOCOM2000, Tel-Aviv, Israel.
Walrand, J. (1998). Communication Networks, A First Course, 2nd edn. (McGraw-Hill, Boston).
Wästlund, J. (2005). Evaluation of Janson's constant for the variance in the random minimum spanning tree problem. Linköping studies in Mathematics. Series editor: Bengt Ove Turesson 7 (www.ep.liu.se/ea/lsm/2005/007).
Waxman, B. M. (1998). Routing of multipoint connections. IEEE Journal on Selected Areas in Communications 6, 9 (December), 1617-1622.
Whittaker, E. T. and Watson, G. N. (1996). A Course of Modern Analysis, Cambridge Mathematical Library edn. (Cambridge University Press, Cambridge, UK).
Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics 62, 3 (November), 548-564.
Wigner, E. P. (1957). Characteristic vectors of bordered matrices with infinite dimensions ii. Annals of Mathematics 65, 2 (March), 203-207.
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Annals of Mathematics 67, 2 (March), 325-327.
Wilkinson, J. H. (1965). The Algebraic Eigenvalue Problem. (Oxford University Press, New York).
Wolff, R. W. (1982). Poisson arrivals see time averages. Operations Research 30, 2 (April), 223-231.
Wolff, R. W. (1989). Stochastic Modeling and the Theory of Queues. (Prentice-Hall International Editions, New York).

Index

n-ary tree, 387, 401, 414, 417, 423, 432, 522

Pareto, 56
Poisson, 40, 116, 129, 335
polynomial, 44, 348, 494
regular, 348, 362
uniform, 43, 74
Weibull, 55, 107, 132

adjacency matrix, 320, 471, 488


eigenvalues, 475
Bayes rule, 28, 197
Benes equation, 261, 301
cell loss ratio (clr), 309, 512
Central Limit Theorem, 104, 148, 366, 377,
516
Cherno bound, 88
ChuangSirbu scaling law, 404407
complete graph, 319, 321, 327, 347, 349, 359,
371, 373, 380, 392, 473, 481, 482, 488, 520
conditional distribution function, 28
conditional expectation, 34, 233, 341
conditional probability, 26
conditional probability density function, 28
correlation coe!cient, 30, 61, 67, 69, 74, 119
covariance, 29, 71, 78
matrix, 63, 66
degree graph, 323
degree of a node, 225, 322, 472
disjoint paths, 520
Mergers Theorem, 327
distribution
n-th order statistics, 53, 494
Bernoulli, 37
binomial, 38, 332, 488
Cauchy, 54
chi-square, 50
Erlang, 48, 125, 274, 278
exponential, 44, 75
extremal, 106
Frchet, 107
Gamma, 48, 51, 125
Gaussian, 46, 103, 400
geometric, 39, 272
Gumbel, 54, 107, 400
joint Gaussian, 64
lognormal, 57, 77

Engset formula, 314, 511


Erlang B formula, 2, 280
Erlang C formula, 277
event, 10, 53
mutually exclusive, 10
failure rate, 131
ooding time, 362
giant component, 335, 337, 339
Google, 224
graph connectivity, 325, 486
edge connectivity, 326, 487
vertex connectivity, 326, 487
graph metrics
betweenness, 329
clustering coe!cient, 328, 346, 513
diameter, 475
distortion, 329
expansion, 328
hopcount, 329
resilience, 329
histogram, 118, 494
hopcount, 340, 347, 354, 357, 387, 392, 403,
409, 418, 420, 423, 431, 513, 520
incidence matrix, 471
inclusion-exclusion formula, 12, 335, 391
sieve of Eratosthenes, 15
indicator function, 12, 17, 43, 321
inequality
Boole, 15
Cauchy-Schwarz, 90, 91, 480
Chebyshev, 88
Gauss, 92


Hölder, 90, 480
Jensen, 85, 342
Markov, 88
Minkowski, 91
infinitesimal generator, 181
Laplacian (admittance matrix), 472, 486
law of rare events, 41, 128, 495
law of total probability, 27, 123, 142, 159, 204,
205, 238, 255, 274, 278, 284, 295, 324,
341, 367, 410, 422, 445, 495
level set of a tree, 352
Lindley's equation, 255
link weight, 320, 340, 341, 347, 349, 359, 362,
373, 392, 406, 408
Little's law, 267, 273, 275, 281, 287, 297, 508,
510
Markov chain
absorbing states, 164
communicating states, 162
conservative, 181
continuous-time, 179
discrete-time, 158
embedded, 186, 188
hitting time, 163
irreducible Markov chain, 161, 226
periodic and aperiodic, 162
transient and recurrent states, 165
mean time to failure, 131
memoryless property, 27, 40, 45, 125, 132, 185,
351
Metcalfe's law, 320
minimum spanning tree (MST), 373, 399
modes of convergence, 99
Newton identities for polynomials, 477
order statistics, 52, 127
PageRank (Google), 224
phase transition, 335, 376
Poisson arrivals see time averages (PASTA),
267, 274, 275, 283, 288, 312, 509, 511
Pollaczek-Khinchin equation, 286
power law, 325
probability density function (pdf), 16, 20, 22
joint, 28, 32
probability generating function (pgf), 18
logarithm of, 19, 25
moment generating function, 25, 235
process
arrival, 248, 270
birth and death, 208, 304, 351
branching, 229, 342
geometric, 244
Poisson, 246, 345
counting, 263
Markov, 180, 253, 349
balance equation, 187
Chapman-Kolmogorov equation, 180
forward and backward equation, 182, 195
time reversibility, 196
nonhomogeneous Poisson, 129
Poisson, 120, 210
queueing, 250
renewal, 137
service, 249, 270
stochastic, 115
modeling, 117
Yule, 212, 230
quality of service (QoS), 2, 249, 283, 309, 340,
419
random graph, 330, 332, 337, 339, 346, 354,
362, 373, 374, 377, 387, 392, 403, 404,
406, 408, 410, 488, 513
random variable
continuous, 20, 59
discrete, 16, 58
expectation, 17, 22
independent, 28, 29, 32, 34, 47, 49, 51, 78, 97, 104
normalized, 31, 93, 400
random vector, 62
random walk, 202, 484
redundancy level, 325
regular graphs, 322, 328, 475, 482, 486, 488
reliability function, 131
renewal
alternating renewal process, 153
Blackwell's Renewal Theorem, 146
Elementary Renewal Theorem, 145, 170
inspection paradox, 152
Key Renewal Theorem, 146, 151
renewal equation, 141
renewal function, 140
renewal process, 137
renewal theory, 165
server placement problem, 419, 424, 429
shortest path, 340, 347
tree (SPT), 387, 392, 399, 407, 419, 428
slotted Aloha, 219
stochastic matrix, 159, 450, 451, 455, 473
total variation distance, 42
transition probability matrix, 159, 190, 201
spectral decomposition, 184
unfinished work, 256, 261, 263, 301
uniform recursive tree (URT), 354, 380, 392,
404, 407, 411, 417, 419, 422, 424
uniformization, 189
Wald's identity, 34, 145, 154
web graph, 224, 323
Wigner's Semicircle Law, 489
