Sunteți pe pagina 1din 784

To my wife, Betsy and my mother, Vicky

Mircea Grigoriu

Stochastic Calculus

Applications in Science
and Engineering

Springer Science+Business Media, LLC


Mircea Grigoriu
Cornell University
School of Civil
and Environmental Engineering
Ithaca, NY 14853
U.S.A.

Library of Congress Cataloging-in-Publication Data

Grigoriu, Mircea.
Stochastic calculus : applications in science and engineering I Mircea Grigoriu.
p.cm.
Includes bibliographical references and index.
ISBN 978-1-4612-6501-6
1. Stochastic analysis. I. Title.

QA274.2.G75 2002
519.2-dc21 2002074386
CIP

ISBN 978-1-4612-6501-6 ISBN 978-0-8176-8228-6 (eBook)


DOI 10.1007/978-0-8176-8228-6

AMS Subject Oassifications: 37A50, 60GXX, 35SXX, 35QXX

Printed on acid-free paper. \:ti)®


© 2002 Springer Science+Business Media New York
Originally published by Birkhiiuser Boston in 2002
Birkhiiuser ao»
Softcover reprint of the hardcover 1st edition 2002

All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC),
except for brief excerpts in connection with reviews or
scholarly analysis. Use in connection with any form of information storage and retrieval, electronic
adaptation, computer software, or by similar or dissimilar methodology now known or hereafter
developed is forbidden .
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the
former are not especially identified, is not to be taken as a sign that such names, as understood by the
Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

SPIN 10835902

Typeset by the author.

9 876 54 32 1
Contents

1 Introduction 1

2 Probability Theory 5
2.1 Introduction . . 5
2.2 Probability space .. 5
2.2.1 Sample space . 5
2.2.2 a-field •• 0 • 6
2.2.3 Probability measure 8
2.3 Construction of probability spaces 12
2.3.1 Countable sample space 13
2.3.2 Product probability space . 13
2.3.3 Extension of probability measure 16
2.3.4 Conditional probability . 16
2.3.5 Sequence of sets 18
2.3.6 Sequence of events 20
2.4 Measurable functions . . . 21
2.4.1 Properties . . . . . 22
2.4.2 Definition of random variable 22
2.4.3 Measurable transformations 24
2.5 Integration and expectation . . . . . . 26
2.5.1 Expectation operator . . . . . 28
2.5.1.1 Finite-valued simple random variables 28
2.5.1.2 Positive random variables . . . . 29
2.5.1.3 Arbitrary random variables .... 29
2.5.2 Properties of integrals of random variables 30
2.5.2.1 Finite number of random variables 30
2.5.2.2 Sequence of random variables . 32
2.5.2.3 Expectation ............ 33
2.6 The Lq (Q, F, P) space . . . . . . . . . . . . . . . . . . . . 34
2.7 Independence . . . . . . . . . . . 36
2.7.1 Independence of a-fields . 36
2.7.2 Independence of events . 36

v
v1 Contents

2.7.3 Independence of random variables 38


2.8 The Fubini theorem . . . . 39
2.9 Radon-Nikodym derivative . 41
2.10 Random variables . . . . . . 42
2.10.1 Distribution function 43
2.10.2 Density function . . 45
2.10.3 Characteristic function 47
2.10.3.1 Properties 47
2.10.3.2 Infinitely divisible characteristic function . 52
2.10.3.3 a-Stable random variable . . 57
2.11 Random vectors . . . . . . . . . . . . . . . . . 58
2.11.1 Joint distribution and density functions 59
2.11.2 Independence . . . . . 61
2.11.3 Characteristic function 62
2.11.4 Moments . . . . 64
2.11.5 Gaussian vector . . . . 65
2.12 U sefu1 inequalities . . . . . . 68
2.13 Convergence of random variables . 70
2.14 Random walk 75
2.15 Filtration . . . . . . . . 78
2.16 Stopping time . . . . . . 78
2.17 Conditional expectation . 82
2.17.1 a-field generated by a countable partition of Q 84
2.17.2 General a-field 87
2.18 Martingales . . . . . . . . . 92
2.18.1 Properties . . . . . . 94
2.18.2 Stopped martingales 96
2.18.3 Inequalities 98
2.19 Problems . . . 99

3 Stochastic Processes 103


3.1 Introduction . 103
3.2 Definitions . . . 104
3.3 Continuity . . . 110
3.4 Stopping times 114
3.5 Finite dimensional distributions and densities 117
3.6 Classes of stochastic processes 119
3.6.1 Stationary processes 119
3.6.2 Ergodic processes . . . 120
3.6.3 Markov processes . . . 120
3.6.4 Independent increment processes 122
3.6.5 Gaussian processes . . . . . . . 124
3.6.6 Translation processes . . . . . . 125
3.6.7 Mixture of translation processes 126
Contents vii

3.7 Second moment properties . . . . . . . . . . . 127


3. 7.1 Properties of the correlation function 130
3.7.2 Power spectral density . . . . . . . . . 132
3.7.2.1 Bochner's theorem . . . . . . 132
3.7.2.2 ~-valued stochastic processes . 132
3.7.2.3 C-valued stochastic processes . 134
3.7.2.4 ~d -valued stochastic processes 135
3.8 Equivalent stochastic processes . 137
3.9 Second moment calculus 139
3.9.1 Continuity .. 141
3.9.2 Differentiation . 142
3.9.3 Integration . . . 145
3.9.3.1 Variation functions. 146
3.9.3.2 Conditions of existence 149
3.9.3.3 Properties for calculations . 151
3.9.4 Spectral representation . . . . . . . . 153
3.9.4.1 C- and ~-valued stochastic processes . 154
3.9.4.2 ~d -valued stochastic processes . . . . . 157
3.9.4.3 Random fields . . . . . . . . . . . 158
3.9.4.4 Karhunen-Loeverepresentation . 161
3.10 Extremes of stochastic processes . 165
3.10.1 Mean crossing rate . . . . 165
3.10.2 First passage time density 168
3.11 Martingales . . . . . . . . . 169
3.11.1 Properties . . . . . . 173
3.11.2 Stopped martingales 175
3.11.3 Inequalities . . . . . 176
3.11.4 Quadratic variation and covariation processes 179
3.12 Poisson processes . . . . . 182
3.13 Brownian motion process . 186
3.14 Levy processes . . . . . . 189
3.14.1 Properties . . . . . 191
3.14.2 The Levy decomposition . 193
3.15 Problems . . . . . . . . . . . . . 201

4 Ito's Formula and Stochastic Differential Equations 205


4.1 Introduction . . . . . . . . . . . . . 205
4.2 Riemann-Stieltjes integrals . . . . . 206
4.3 Preliminaries on stochastic integrals 208
4.4 Stochastic integrals . . . . . . . . . 216
4.4.1 Semimartingales . . . . . . 217
4.4.2 Simple predictable integrands 221
4.4.3 Adapted caglad integrands . . 223
4.4.4 Properties of stochastic integrals . 224
viii Contents

4.5 Quadratic variation and covariation . . 228


4.5.1 Definition . . . . . . . . . . 228
4.5.2 Properties . . . . . . . . . . 229
4.5.3 Stochastic integrals and covariation processes . 234
4.6 Ito's formula . . . . . . . . . . . 237
4.6.1 One-dimensional case . . . 238
4.6.2 Multi-dimensional case . . . 247
4.6.3 Fisk-Stratonovich's integral 249
4.7 Stochastic differential equations . . 253
4.7.1 Brownian motion input . . . 256
4.7.1.1 Existence and uniqueness of a solution 258
4.7.1.2 Properties of diffusion processes 262
4.7.1.3 Moments and other properties of
diffusion processes 267
4.7.2 Semimartingale input . 271
4.7.3 Numerical Solutions . . . . . 275
4.7.3.1 Definitions . . . . . 276
4.7.3.2 Euler and Milstein numerical solutions 277
4.8 Problems . . . 284

5 Monte Carlo Simulation 287


5.1 Introduction . . . . . 287
5.2 Random variables . . 288
5.2.1 Gaussian variables 288
5.2.2 Non-Gaussian variables 289
5.3 Stochastic processes and random fields . 293
5.3.1 Stationary Gaussian processes and fields. 293
5.3.1.1 Spectral representation. Stochastic processes . 293
5.3.1.2 Spectral representation. Random fields . 299
5.3.1.3 Sampling theorem. Stochastic processes 304
5.3.1.4 Sampling theorem. Random fields 309
5.3.2 Non-stationary Gaussian processes and fields . 310
5.3.2.1 Linear differential equations. . . . . 310
5.3.2.2 Fourier series. Stochastic processes . 312
5.3.2.3 Fourier series. Random fields 315
5.3.3 Non-Gaussian processes and fields . . . . 316
5.3.3.1 Memoryless transformations . 316
5.3.3.2 Transformations with memory 320
5.3.3.3 Point and related processes 325
5.4 Improved Monte Carlo simulation 329
5.4.1 Time change . . . . . . . . . . . . 330
5.4.2 Measure change . . . . . . . . . . 334
5.4.2.1 Time invariant problems . 334
5.4.2.2 Time variant problems . . 337
Contents IX

5.5 Problems 341

6 Deterministic Systems and Input 343


6.1 Introduction . . . . . . . . . . . . . . . . . . . 343
6.2 Random walk method . . . . . . . . . . . . . . 345
6.2.1 Dirichlet boundary conditions (q = 0) . 346
6.2.1.1 Local solution . . . . . 346
6.2.1.2 Monte Carlo algorithm 353
6.2.1.3 The Laplace equation . 355
6.2.1.4 The Poisson equation . 359
6.2.1.5 Heat and transport equations 361
6.2.2 Dirichlet boundary conditions (q ::/= 0) . 364
6.2.2.1 The Feynman-Kac functional 364
6.2.2.2 The inhomogeneous Schrodinger equation 367
6.2.2.3 The homogeneous Schrodinger equation 370
6.2.3 Mixed boundary conditions . . . . . . . . . . . . . 371
6.2.3.1 Brownian motion reflected at zero. . . . . 372
6.2.3.2 Brownian motion reflected at two thresholds 383
6.2.3.3 Brownian motion in the first orthant of IR 2 . . . 387
6.2.3.4 General case . 390
6.3 Sphere walk method . . . . 394
6.3.1 The Green function .. 395
6.3.2 Mean value property . 396
6.3.3 Dirichlet boundary conditions 399
6.3.4 Mixed boundary conditions 402
6.4 Boundary walk method . . . . . . 403
6.5 Algebraic equations . . . . . . . . . 406
6.5 .1 Inhomogeneous equations 407
6.5.2 Homogeneous equations . 413
6.6 Integral equations . . . . . . . . . 416
6.6.1 Inhomogeneous equations 418
6.6.2 Homogeneous equations 421
6. 7 Problems . . . . . . . . . . . . . 425

7 Deterministic Systems and Stochastic Input 429


7.1 Introduction . . . . . . . . . . . 429
7.2 Linear systems . . . . . . . . . . . . . . . 432
7.2.1 Brownian motion input . . . . . . . 432
7.2.1.1 Mean and correlation equations . 433
7.2.1.2 Linear random vibration . . . . . 437
7 .2.2 Semimartingale input . . . . . . . . . . . 449
7 .2.2.1 Direct method. Square integrable martingales 452
7.2.2.2 Direct method. General martingales. 455
7 .2.2.3 State augmentation method . . . . . . . . . . 460
x Contents

7.3 Nonlinear systems . . . . . . . . . . . 473


7.3.1 Brownian motion input . . . . 475
7.3.1.1 Moment equations . 475
7.3.1.2 Differential equation for characteristic function 478
7.3.1.3 Fokker-Planck-Kolmogorovequations . 481
7.3.1.4 Exact solutions. . . . . . . . . . 492
7.3.1.5 Nonlinear random vibration . . . . . . . 494
7.3 .2 Semimartingale input . . . . . . . . . . . . . . . 508
7.3 .2.1 Direct method. Square integrable martingales 508
7 .3.2.2 Direct method. General martingales . 509
7.3.2.3 State augmentation method 512
7.4 Applications . . 513
7.4.1 Models 513
7.4.1.1 Earth climate . . . . 514
7.4.1.2 Non-Gaussian input 515
7.4.2 Materials science . . . . . . 518
7.4.3 Reliability analysis . . . . . . 522
7.4.3.1 Crossing theory . . 523
7.4.3.2 First passage time . 527
7.4.4 Finance . . . . . . . . . . . 534
7 .4.5 Estimation . . . . . . . . . 539
7.4.5.1 Time invariant problems . 539
7.4.5.2 Time dependent problems. Discrete time 541
7.4.5.3 Time dependent problems. Continuous time 543
7.5 Problems . . . 546

8 Stochastic Systems and Deterministic Input 549


8.1 Introduction . . . . . 549
8.2 Local solutions . . . . . . . . . . 550
8.3 Algebraic equations . . . . . . . . 551
8.3 .1 Inhomogeneous equations 552
8.3.1.1 Monte Carlo simulation method . 552
8.3.1.2 Taylor series method .. . 554
8.3.1.3 Perturbation method .. . 558
8.3.1.4 Neumann series method . 561
8.3.1.5 Decomposition method . 563
8.3.1.6 Equivalent linearization method . 564
8.3.1.7 Iteration method . . . . . . . . . 565
8.3.2 Homogeneous equations . . . . . . . . . . 566
8.3.2.1 Deterministic eigenvalue problem . 567
8.3.2.2 Exact expressions and bounds . 569
8.3.2.3 Taylor series method . 570
8.3.2.4 Perturbation method . 572
8.3.2.5 Iteration method . . . 574
Contents xi

8.3.2.6 Level crossing for stochastic processes 575


8.4 Differential and integral equations . . . . . . . . . 582
8.4.1 Inhomogeneous equations . . . . . . . . . 583
8.4.1.1 Monte Carlo simulation method . 584
8.4.1.2 Taylor series method . . . 584
8.4.1.3 Perturbation method . . . 587
8.4.1.4 Neumann series method . 591
8.4.1.5 Other methods . . . . . . 598
8.4.2 Homogeneous equations . . . . . . 599
8.4.2.1 Deterministic eigenvalue problem . 600
8.4.2.2 Exact expressions and bounds . 602
8.4.2.3 Perturbation method . 603
8.4.2.4 Iteration method 604
8.5 Effective material properties . . . . . . 605
8.5.1 Conductivity . . . . . . . . . . 610
8.5.1.1 Homogeneous media 610
8.5.1.2 Heterogeneous media 611
8.5.1.3 Effective conductivity 613
8.5.2 Elasticity . . . . . . . . . . . . 617
8.5.2.1 Displacement controlled experiment.
Voigt's average. . . . . . . . . . . . 619
8.5.2.2 Stress controlled experiment. Reuss's average 622
8.5.2.3 Physically based approximations . 624
8.5.2.4 Analytically based approximations 629
8.6 Evolution and pattern formation 633
8.6.1 Elasticity . . . . . . . . . . . 633
8.6.2 Crystal plasticity . . . . . . . 645
8.6.2.1 Planar single crystal 645
8.6.2.2 Polycrystals 652
8.7 Stochastic stability . . . . 655
8.8 Localization phenomenon . 663
8.8.1 Soilliquefaction . 663
8.8.2 Mode localization. 666
8.9 Problems . . . . . . . . . 670

9 Stochastic Systems and Input 673


9.1 Introduction . . . . . . 673
9.2 Methods of analysis . . . . 674
9.2.1 Local solutions . . 676
9.2.2 Monte Carlo simulation 677
9.2.3 Conditional analysis 678
9.2.4 State augmentation . . . 682
9.2.5 Liouville equation . . . 687
9.2.6 Taylor, perturbation, and Neumann series methods . 690
xii Contents

9.2.7 Galerkin method . . . . . 694


9.2.8 Finite difference method . 700
9.3 Mechanics . . . . . . . . . . . . . 701
9.3.1 Variational principles . . . 701
9.3.2 Deterministic problems . . 703
9.3.3 Stochastic problems . . . 704
9.3.4 Methods of analysis . . . 706
9.3.4.1 Classical methods . 707
9.3.4.2 Polynomial chaos . 709
9.4 Physics . . . . . . . . . . . . . . . . 714
9.4.1 Boltzmann transport equation . 714
9.4.2 Ising model . . . . . . . . . . . 715
9.4.3 Noise induced transitions . . . . 720
9.5 Environment and ecology . . . . . . . . . . . . . . . . . . . . . . 725
9.5.1 Rainfallrunoffmodel . . . . . . . . . . . . . . . . . . . . 726
9.5.2 Water quality in streams . . . . . . . . . . . . . . . . . . 729
9.5.3 Subsurface flow and transport . 732
9.6 Waves in random media . . 736
9.7 Seismology . . . . . . . . . . . . 741
9.7.1 Physical model . . . . . . 741
9.7.2 Cellular automata model . 743
9.8 Model selection . . . . . . . . . . 745
9.8.1 Partially known input . . . 745
9.8.2 Partially known system and input . 748
9.9 Problems . . . . . . . . . . . . . . . . . . 754

Bibliography 757
Index 771
Stochastic Calculus

Applications in Science
and Engineering
Chapter 1

Introduction

Algebraic, differential, and integral equations are used in the applied sciences, en-
gineering, economics, and the social sciences to characterize the current state of a
physical, economic, or social system and forecast its evolution in time. Generally,
the coefficients of and/or the input to these equations are not precisely known be-
cause of insufficient information, limited understanding of some underlying phe-
nomena, and inherent randonmess. For example, the orientation of the atomic
lattice in the grains of a polycrystal varies randomly from grain to grain, the spa-
tial distribution of a phase of a composite material is not known precisely for a
particular specimen, bone properties needed to develop reliable artificial joints
vary significantly with individual and age, forces acting on a plane from takeoff to
landing depend in a complex manner on the environmental conditions and flight
pattern, and stock prices and their evolution in time depend on a large number of
factors that cannot be described by deterministic models. Problems that can be
defined by algebraic, differential, and integral equations with random coefficients
and/or input are referred to as stochastic problems.
The main objective of this book is the solution of stochastic problems, that
is, the determination of the probability law, moments, and/or other probabilistic
properties of the state of a physical, economic, or social system. It is assumed that
the operators and inputs defining a stochastic problem are specified. We do not
discuss the mathematical formulation of a stochastic problem, that is, the selection
of the functional form of the equation and of the probability laws of its random
coefficients and input for a particular stochastic problem.
The book is addressed to researchers and graduate students. It is intended
to serve as a bridge between heuristic arguments used at times in the applied sci-
ences and the very rich mathematical literature that is largely inaccessible to many
applied scientists. Mathematicians will find interesting unresolved technical prob-
lems currently solved by heuristic assumptions.

1
2 Chapter 1. Introduction

Organization
The book is largely self-contained and has two parts. The first part includes
Chapters 2-5, and develops the probabilistic tools needed for the analysis of the
stochastic problems considered in the book. Essentials of probability theory are
reviewed in Chapter 2. An extensive review of stochastic processes, elements of
stochastic integrals, Ito's formula, and a primer on stochastic differential equa-
tions and their numerical solution is presented in Chapters 3 and 4. Numerous ref-
erences are provided that contain details and material not included in Chapters 2,
3, and 4. Methods of Monte Carlo simulation for random variables, stochastic
processes, and random fields are discussed in Chapter 5. These methods provide
useful illustrations for some of the theoretical concepts in Chapters 2-4 and are
essential for the solution of many of the stochastic problems in this book.
The second part of the book, Chapters 6-9, develops methods for solving
stochastic problems. In this book stochastic problems are characterized by type

Table 1.1: Stochastic problems

INPUT

Deterministic Chapter6
Stochastic Chapter7

rather than by the particular field of application. Table 1 shows the four types
of problems considered in the book. Chapter 6 is concerned with deterministic
problems, that is, problems in which both the governing differential equations are
deterministic, but whose solution utilizes concepts of the theory of probability
and stochastic processes. Such problems arise in physics, mechanics, material
science, heat conduction, and many other fields. For example, consider a Laplace
equation "£f= 1 a2 u (x) 1axf = 0 with Dirichlet boundary conditions defined on an
open bounded subset D of Rd . The solution of this equation at an arbitrary point
x E Dis equal to the expectation of u(Y), where Y denotes the exit point from D
of an Rd-valued Brownian motion starting at x. Numerical algorithms based on
this approach are simpler and more efficient, though less general, than current fi-
nite element and finite difference based algorithms. Chapter 7 discusses problems
in which the equations describing a system have deterministic coefficients and the
input to these equations is stochastic. Such problems arise in the description of
various types of physical, economic, and ecological systems. Chapter 8 covers
problems defined by equations with random coefficients and deterministic input.
Examples include the derivation of macroscopic properties of a material from the
attributes of its constituents and phenomena, such as localization and pattern for-
mation, relevant in physics and mechanics. Chapter 9 is concerned with problems
characterized by equations with random coefficients and inputs. An example is
Chapter 1. Introduction 3

the system describing the propagation of pollutants through a soil deposit with
uncertain properties for which no realistic deterministic description is possible.
We now briefly describe the presentation of the material. Essential facts are
in boxes throughout the book. The book contains numerous examples that have
two purposes: to state consequences of the essential facts and to illustrate the use
of the stated facts in the solution of stochastic problems. The statements of the
facts and examples are followed by proofs or notes printed in smaller characters.
Complete proofs are given if they are not very technical. The notes include the
idea and/or the essential steps of technical proofs and references where complete
proofs can be found.
The advanced topics on probability theory and stochastic processes in the
first part of the book are essential for Chapters 6 and 7. Many of the developments
in Chapters 8 and 9 are largely based on the second moment calculus. How-
ever, extensions of the methods in Chapters 6 and 7 to the problems considered in
Chapters 8 and 9 require most of the advanced topics in Chapters 2-4.

Classroom use
The book can be used as a text for four different one-semester courses or
one two-semester course. The four one-semester courses can differ depending
on the applications they emphasize. The first, second, third, and fourth courses
emphasize the applications in Chapter 6, 7, 8, and 9, respectively, and require the
following background.

• The first course: properties of the conditional expectation (Chapter 2), stop-
ping times and martingales (Chapter 3), the Ito calculus and diffusion processes
(Chapter 4), and Monte Carlo techniques (Chapter 5).

• The second course: facts on stochastic processes (Chapter 3), the stochas-
tic integral, the Ito calculus, and the theory of stochastic differential equations
(Chapter 4), and Monte Carlo techniques (Chapter 5).

• The third course: a review of probabilistic concepts (Chapter 2), second mo-
ment calculus for stochastic processes (Chapter 3), and Monte Carlo techniques
(Chapter 5).

• The fourth course: essential probability concepts (Chapter 2), properties of


Brownian motion and second moment calculus for stochastic processes (Chap-
ter 3), use of Ito 's formula (Chapter 4), and Monte Carlo simulation techniques
(Chapter 5).
Owing to time limitations, the one-semester courses need to be focused
on applications. The presentation of the book, summarizing facts on probability
theory and stochastic processes that are essential for calculations and illustrating
these facts by numerous examples, facilitates the course development.
4 Chapter 1. Introduction

A two-semester course can include most of the topics discussed in this


book. A good part of the first semester needs to be spent on a review of the topics
in Chapters 2-4. Depending on students' interest and background, the lectures can
focus on understanding and use of the essential facts given in these chapters or
they may incorporate proofs of some of these facts based on the material and ref-
erences in the book. The Monte Carlo simulation techniques in Chapter 5 can be
used during lectures on Chapters 2-4 to illustrate various probabilistic concepts.
Most students in my classes on stochastic problems are in favor of this approach,
which facilitates the understanding of some technical concepts of probability the-
ory. This approach also contributes to the development of computational skills
needed for solving realistic stochastic problems that, generally, do not have ana-
lytical solutions. The remainder of the first semester can be used to discuss the
solution of problems defined by deterministic equations and inputs (Chapter 6).
The second semester of the course can include analysis of the stochastic problems
discussed in Chapters 7-9 and/or other problems that are relevant to the instruc-
tor's field. The addition of field specific applications is strongly recommended
and is consistent with the goal of this book: to use a unified framework for the
solution of stochastic problems arising in applications.

Acknowledgements
This book could not have been completed without the contributions of many
individuals. In particular, I wish to express my deepest appreciation to Professors
S. I. Resnick and G. Samorodnitsky of Cornell University for their technical ad-
vice and comments on various topics in the book, Dr. E. Simiu of National Insti-
tute of Standards and Technology for reviewing the entire manuscript, Professor
S. T. Ariaratnam of the University of Waterloo, Canada, for numerous stimulat-
ing discussions, as well as Professors P. R. Dawson, J. Jenkins, S. Mukherjee,
and Dr. C. Myers of Cornell University and Professor 0. Ditlevsen of Denmark
Technical University, Lyngby, for their useful comments. I also wish to thank my
doctoral students S. Arwade, E. Mostafa, and C. Roth for their many contribu-
tions, and Mr. C. Willkens for his enthusiastic and professional support of the
computer hardware and software used in this project. Finally, I am grateful to my
wife Betsy for understanding, encouragement, and support. During this project,
she became an accomplished sculptor and pilot.
My research on stochastic problems, some of which is incorporated in this
book, has been supported by the National Science Foundation, National Institute
of Standards and Technology, Electric Power Research Institute, Jet Propulsion
Laboratory, Air Force Office of Scientific Research, Federal Aviation Adminis-
tration, AON Financial Products, and other institutions. I am indebted to these
organizations and their continued support.
Chapter 2

Probability Theory

2.1 Introduction
Essential concepts of probability theory needed in this text are reviewed
and are illustrated by examples. The review includes the concepts of events,
sample space, a-field, measure, probability measure, probability space, condi-
tional probability, independence, random variable and vector, integral of random
variables, expectation, distribution, density, and characteristic functions, second
moment properties, convergence of sequences of random variables, conditional
expectation, and martingales. The readers familiar with these concepts can skip
this chapter entirely. However, some of those readers may benefit from using this
chapter as a summary of facts and examples needed in the rest of the book.

2.2 Probability space


The three components of probability space, the sample space n, the a-field
:F, and the probability measure P, are defined and illustrated.

2.2.1 Sample space


Consider an experiment and let n be the set of outcomes of this experiment,
called the sample space. For example, n = {1, 2, 3, 4, 5, 6} for the experiment
of rolling a die and n = [a, b] c [0, oo), 0 < a < b < oo, for the experiment
consisting of strength tests of nominally identical steel specimens. The first and
second sample spaces have a finite and an uncountable number of elements, re-
spectively. If n has a finite, countable, or uncountable number of elements, it is
referred to as a finite, countable, or uncountable sample space, respectively. The
elements of n are denoted by (J).

5
6 Chapter 2. Probability Theory

Example 2.1: Consider the experiment of rolling a die and two games associated
with this experiment. In the first game, one loses $10 if w :::: 3 and wins $10 if w >
3. In the second game, one loses $10 if w :::: 3 and wins $5, $10, or $15 if w = 4,
5, or 6. The relevant information for the first and second games is given by the
collections of subsets A1 = {{1, 2, 3}, {4, 5, 6}} and Az = {{1, 2, 3}, {4}, {5}, {6}}
of Q, respectively. Playing these games does not require knowing in the finest
detail the outcome of each roll of the die. Coarser information suffices. To play
the first game we need to know only if w :::: 3 or w > 3. The second game requires
a more refined description of the outcomes of the experiment because it has more
options of interest. ¢>
Note: Similar considerations apply to the experiment of testing steel specimens for strength.
Suppose that each steel specimen is subjected to the same force of magnitude x > 0 and
the objective is to design a safe steel structure. There are two relevant outcomes for this
"game", survival A = {w ~ x} and failure B = A c. The precise value of the strength w of a
particular specimen is not essential. To assess the likelihood of survival of steel specimens
subjected to action x it is sufficient to know whether w < x or w ~ x. A

2.2.2 a-field
Let F be a collection of subsets of Q relevant to a particular experiment. It
seems natural to re~uire that F has at least two pro_rerties: (1) If A E F, then the
outcomes corresponding to the non-occurrence of$ should also be in F, that is,
A E F implies A c E F, so that F is closed to co~plements, and (2) If A i E F,
i E I, that is, if the subsets Ai occur individually, then UiEJ Ai should also be in F
for any subset J of the index set I. These heuristic considerations are consistent
with the following properties defining F.

A non-empty collection of subsets F of Q is called a a-field on Q if

1. 0 E F.
2. A E F ===}AcE F.
3. Ai E F, i E I, I= a countable set ===?- UiE/ Ai E F. (2.1)

Note: The first condition in the definition of the a-field :F can be replaced with 0 E :F
because :F is closed to complements. The last conditions imply that countable intersections
of events are events. We have UiE/ Ai E :F for Ai E :F (condition 3) so that (UiEI Ai)c E
:F (condition 2) and (UiE/ Ai)c = niE/ Af E F by De Morgan's formulas.
If the last condition in the definition of a a- field F is replaced with the requirement
that F is closed under finite union, F is said to be a field. Hence, a a-field is a field.
However, a field may not be a a-field. There is no difference between a field and a a-field
for finite sample spaces. A
The members ofF are called events, or F-measurable subsets of Q, or just
measurable subsets of Q if there is no confusion regarding the reference a -field.
The pair (Q, F) is said to be a measurable space.
2.2. Probability space 7

Example 2.2: The collections Ai, i 1, 2, defined in Example 2.1 are not a-
fields. However,

:F1 = {members of A1, 0, Q} and


:F2 = {members of A2. {4, 5, 6}, {1, 2, 3, 5, 6}, {1, 2, 3, 4, 6}, {1, 2, 3, 4, 5}, 0, Q}
are a-fields. They are the smallest a-fields including Ai. ¢
Note: A 1 is not closed to union while A2 is not closed to both union and complements so
that neither is au -field. ~

Let A be a collection of subsets of Q and define

a(A) = n
Q2A
Q, (2.2)

where g are a-fields on n. Then a(A) is a unique a-field, called the a-field
generated by A. There is no a-field smaller than a(A) that includes A.

• The Borel a-field is generated by the collection of open sets of a topological


space. The members of this a -field are called Borel sets.

• The Borel a-fields on JRd, d > 1, and lR are generated by the intervals in these
spaces and are denoted by B(JRd) =Ed and B = .6 1 = B(JR), respectively. The
Borel a-fields on the intervals [a, b], [a, b), (a, b], and (a, b) of the real line are
denoted by B([a, b]), B([a, b)), B((a, b]), and B((a, b)), respectively.

Note: The Borel u-field constitutes a special case of u(A) in Eq. 2.2 that is generated
by the collection of open sets D of a space X. This collection is a topology on X if it
(1) contains the empty set and the entire space, (2) is closed to finite intersections, and
(3) is closed to uncountable unions. There are notable similarities between a topology D
and a u-field :F defined on a space X. Both D and :F include the entire space and the
empty set and are closed to countable unions and finite intersections. However, D is closed
to uncountable unions while :F is closed under complements and countable intersections.
If X= lR, B can be generated by any of the intervals (a, b), (a, b], [a, b), or [a, b]
([151], Section 1.7, p. 17), that is,

B = u((a, b), -oo:::; a:::; b:::; +oo) = u([a, b), -oo <a:::; b:::; +oo)
= u([a, b], -oo <a:::; b < +oo) = u((-oo, x], x E JR) = u(open subsets oflR).
The Borel u-field Bd, d > 1, can be generated, for example, by the open intervals
xf=l (ai, bi) of JRd. ~
Example 2.3: Let A be a subset of Q. The a-field generated by A is :F =
{0, A, Ac, Q}. The a-fields :Fi in Example 2.2 coincide with the a-fields gen-
erated by the collections of events Ai. ¢
8 Chapter 2. Probability Theory

Example 2.4: Let Q = lR and A be the collection of finite unions of intervals of


the form ( -oo, a], (b, c], and (d, oo). This collection is a field but is not au-field
on the sample space Q. <>
Proof: The empty set is a member of A because the interval (b, c] is in A and (b, b] =
0. The complements, (a, oo), (-oo, b] U (c, oo), and (-oo, d], of (-oo, a], (b, c], and
(d, oo), respectively, are in A. The collection A is also closed under finite unions by
definition. Hence, A is a field. However, A is not closed under countable intersections so
that it is not a a-field. For example, the intersection nn(b- ljn, c] = [b, c] is not in A
although the intervals (b- ljn, c] are in A for each n. •

Example 2.5: The intervals (a, b], [a, b), and [a, b], a < b, are Borel sets al-
though they are not open intervals, that is, they are not members of the usual
topology V on the real line. Singletons, that is, isolated points of the real line, are
also Borel sets. <>
Proof: The intervals (a, b + 1/n), (a -ljn, b), and (a- 1jn, b + 1/m) are open intervals
that are in B for each n = 1, 2, ... , so that nn,m;::l (a- 1/n, b + 1/m) = [a, b] is in B.
Similar calculations show that [a, b), and (a, b] are Borel sets. That singletons are Borel
sets follows from the equality {a} = nn,m:::I (a-1jn, a+ 1/m) since (a-1jn, a+ 1/m) E
B holds for each n, m, and a E lit •

Example 2.6: Let :F beau-field on a sample space Q. A member A of this field


is called an atom if A ::j:. 0 and if B ~ A implies that either B = 0 orB = A.
Hence, atoms are the finest members of a field. For example, the atoms of the u-
fields associated with the games of Example 2.1 are {1, 2, 3}, {4, 5, 6} E :F 1 and
{1, 2, 3}, {4}, {5}, {6} E :Fz. <>

2.2.3 Probability measure


Consider a measurable space ( Q, :F). We define two real-valued functions
on :F, a measure f.J, and a probability measure P, and give some of their properties.

A set function f.,/, : :F ~ [0, oo] is said to be a measure if it is countably additive,


that is,
00

f.J,(U~ 1 Ai) = 2:f.,l,(Ai) for Ai E :F, Ai n Aj = 0, i ::j:. j. (2.3)


i=l

The triple (Q , :F, f.J,) is called a measure space.

We now give four examples of measures.


1. The Lebesgue measure, denoted by A. d and defined on (JRd, Ed), is

Ad([at, bi] X··· X [ad, ba]) = n


i=l
d
(bi- ai),
2.2. Probability space 9

that is, the volume of the interval [a1, bl] x ... x [aa, ba].

2. The counting measure is used later in the book in conjunction with the Poisson
process. Suppose that :F is the power set of the sample space Q, that is, the
collection of all subsets of Q. The counting measure of A E :F is the cardinal of
A if A is finite and infinity otherwise.

3. The a-finite measure has the property that for every A E :F there exists a
sequence of disjoint events Ai, i = 1, 2, ... , such that J.t(Ai) < oo for every i,
Q = UiAi, and J.t(A) = L:;~ 1 J.t(A n Ai). For example, the Lebesgue measure A
on lR is a-finite because A(A) = LieZ A(A n [i, i + 1)), where A E !3, Z denotes
the set of integers, and A = A1 is the Lebesgue measure on the real line.

4. The finite measure is a measure with the property J.t(Q) < oo. The scaled
version of this measure, J.t(A) 1J.t(Q), takes values in [0, 1].

A set function P : :F--+ [0, 1] with the properties

1. P(Q) = 1,

L P(Ai),
00

2. P(U~ 1 Ai) = Ai E :F, Ai n A1 = 0, i -=1= j, (2.4)


i=l

is said to be a probability measure or a probability. The triple (Q, :F, P) is


called a probability space.

Note: A probability space can be obtained from a measure space (Q, F, J.l) with a finite
measure J.l by setting PO = J.l01 J.l(Q). A set N in :F with probability zero is called a
null set. A property that is true on Q \ N is said to hold almost everywhere (a.e.), almost
surely (a.s.), for almost every w, or with probability one (w.p.1). Because of the second
condition in Eq. 2.4, we say that the probability measure is countably additive.
The definition of P is consistent with our intuition. Let A and B be two events
associated with an experiment defining Q and F. Suppose that the experiment is performed
n times and denote by n A and n B the number of outcomes in which A and B are observed,
respectively. For example, nA and nB may denote the number of outcomes {w ::: 3} and
{w > 3} when a die is rolled n times. The probabilities of A and B are given by the limits
as n --+ oo of the relative frequencies of occurrence, n A In and n BIn, of the events A and
B. The relative frequencies have the properties (1) P(Q) = liilln--+oo(noln) = 1 since
no = n for each nand (2) if A and Bare disjoint events, P(A U B) = liilln--+oo((nA +
nB)/n) = limn--+oo(nA/n) + limn--+oo(nB/n) = P(A) + P(B), consistent with Eq. 2.4.
We also note that Eqs. 2.3 and 2.4 are meaningful because F is closed to countable unions.
Also, A E :F implies that A c is in the domain of P since F is a CJ' -field. &

We assume throughout the book that all probability spaces are complete. A
probability space (Q, F, P) is complete if for every A c B such that B E :F
and P(B) = 0 we have A E :F so that P(A) = 0 (Eq. 2.5). The following result
shows that this assumption is not restrictive.
10 Chapter 2. Probability Theory

For any probability space (Q, F, P) there exists a complete space (Q, F, P)
such that F ~ f: and P = P on F ([40], Theorem 2.2.5, p. 29).

The formulas in Eqs. 2.5-2.8 give properties of the probability measure P


that are useful for calculations.

P(A)::::: P(B), A~ B, A, BE F.
P(A) = 1- P(Ae), A E F.
P(A U B) = P(A) + P(B) - P(A n B), A, B E F. (2.5)

Proof: The first relationship follows since B = A U (B \ A), A and B \ A are disjoint
events, B \ A = B n A e, and the probability measure is a countably additive positive set
function.
We have 1 = P(Q) = P(A U A e) = P(A) + P(Ae) by Eq. 2.4. This also shows
that the probability of the impossible event is zero because P(0) = 1 - P(Q) = 0.
The last relationship results from Eq. 2.4 and the observations that A U B is the
union of the disjoint sets {A n Be, A n B, A e n B}, A is the union of the disjoint sets
{An Be, An B}, and B is the union of the disjoint sets {Ae n B, An B} . •

If A;, BE F and A;, i = 1, ... , n, partition Q, we have


n
P(B) = L P(B n A;). (2.6)
i=l

Proof: This equality follows from Eq. 2.4 since Q = Ut=l A;, A; E F, A; n AJ = 0,
i =f- j, and B = U; (B n A;) is a union of disjoint events. •

If A; E F, i = 1, 2, ... , then

L P(A;).
00

P(U~l A;)::::: (2.7)


i=l

Proof: Since
u~ 1 A;= A 1 u (A]' n A 2) u (AI n A2 n A3 ) U···

and the events AJ, AI n A2, AI n A2 n A3, ... are disjoints, we have

which gives Eq. 2.7 by using the first relation in Eq. 2.5. For example, P(Aj n A2) :::;
P(A2), PC AI n A2 n A3) :::; P(A3), and so on. A set function P satisfying Eq. 2.7 is said
to be subadditive. •
2.2. Probability space 11

The probability measure satisfies the inclusion-exclusion formula

n n i-1
P (uj= 1 A;)= L P(A;)- L L P(A; n Aj)
i=l i=2 j=l
n i-! j-1
+L L L P(A; n AJ n Ak)- ... + (-l)n+I p ( n~=JAq),
i=3 j=2k=!
(2.8)

where A; E F, i = 1, ... , n, are arbitrary.

Proof: This formula extends the last equation in Eq. 2.5 and results from a repeated appli-
cation of this equation. For example, the probability of P(A 1 U Az U A 3) is equal to the
probability of the union of A 1 U A 2 and A 3 so that

P((A 1 U A 2 ) U A 3 ) = P(A 1 U A 2) + P(A 3)- P((A 1 U A 2) n A 3)


= P(A 1 U A 2) + P(A 3) - P((A 1 n A 3) U (A 2 n A3))
= P(A 1) + P(Az)- P(At n Az) + P(A3)
- P(A 1 n A 3)- P(Az n A3)) + P(At n A 2 n A 3),

which is the inclusion-exclusion formula for n = 3. •

Example 2.7: Let F; be the failure event of component i of a series system with n
components, that is, a system that fails if at least one of its components fails. The
system probability of failure is P f = P(Uj= 1 F;). If the events F; are disjoint, the
failure probability is Pf = Lt=l P(F;). Otherwise, Lt=l P(F;) gives an upper
bound on P f. <>

Example 2.8: The probability of failure P f = P(Uj= 1 F;) of the series system
in the previous example can be calculated exactly by the inclusion-exclusion for-
mula. However, the use of this formula is prohibitive for large values of n or even
impossible if the probability of the events F; 1 n · · · n F;m is not known for m 2:: 2.
The calculation of the bounds
n n
Pt:::: Pf,u = L P(F;)- L.
i=1
max
i=2 j=l, ... ,l-l
P(Fj n F;) and

Pf 0> Pf,1 ~ P(FI) +~max ( 0, P(F;)- ~ P(FJ n F;)) (2.9)

on Pf is relatively simple since it involves only the probability of the events F;


and F; n Fj, i =1= j.
12 Chapter 2. Probability Theory

If P(Fi) = p and P(Fi n Fj) = p 2 , i =/= j, theupperandlowerboundson


the probability of failure Pf of this system are

n
Pf,u = n p- (n- 1) p 2 and Pf,l = p +I: max (o. p- (i- 1) p 2 ).
i=2

These bounds are shown in Fig. 2.1 for p = 0.1 as a function of the system size
n. The bounds are relatively wide and deteriorate as n increases. <>

1.2

Figure 2.1. Bounds on the probability of failure for a series system

Proof: The equality P(Fi n Fj) = p 2, i =j:. j, is valid if Fi and F1 are independent events,
as we will see later in this section (Eq. 2.54). We have (Eq. 2.7)

Pt = P(U/= 1Fi) = P(F1) + P(F[ n Fz) + P ((F1 U Fz)c n F3) + · ··


+ P ((F1 U .. · U Fn-1)c n Fn)
so that
n
Pt = P(F1) +L P(S1 n · · · n si-1 1 Fi) P(Fj), n 2:: 2,
i=2
where Si = F{. The conditional probabilities P(S1 n · · · n Si-1 I Fi) are smaller and
larger than P(Sj I Fi ), j = 1, ... , i - 1, and 1 - L~-:,11 P(Fj I Fi), respectively ([53],
pp. 236-237). These inequalities give the bounds ofEq. 2.9. •

2.3 Construction of probability spaces


The starting point in many applications involving probabilistic models is an
experiment rather than a probability space. We need to construct a probability
space based on the available information. This section illustrates three construc-
tions of a probability space.
2.3. Construction of probability spaces 13

2.3.1 Countable sample space


Let Q = {w1 , wz, ... }be a sample space and consider a collection of num-
bers Pi 2: 0, i = 1, 2, ... , such that L:~ 1 Pi = 1. Take :F to be the power set of
n, that is, the collection of all subsets of n.
The function P : :F ~ [0, 1] defined by

P(A) =L Pi· A E :F (2.10)


WiEA

is a probability measure on (Q, :F).

Proof: The set function Pis positive for any A E :F, P(Q) = L:~ 1 Pi = 1 by definition,
and is countably additive since

provided that A j• j = 1, 2, ... , are mutually disjoint events ([151], p. 41). •

2.3.2 Product probability space


Let p > 0 be the probability that maximum yearly wind speed V 2: 0 at
a site exceeds a value Vcr deemed critical for a facility, that is, the probability of
the event A = {V > Vcr}. The associated sample space, a-field, and probability
measure are Q = [0, oo), :F = {0, A, Ac, Q}, and P(A) = p, respectively. To
evaluate the probability that this facility performs satisfactorily during its design
life of n years we need to construct a new probability space for an "experiment"
consisting of n repetitions of the maximum yearly wind speed. This engineer-
ing application resembles the experiment of tossing a loaded coin n times ([151],
p. 41).
Consider two probability spaces (Qk. :Fk, Pk), k = 1, 2, describing two
experiments. These two experiments can be characterized jointly by the product
probability space (Q, :F, P) with the following components.

Product sample space:

(2.11)

Product a-field:
:F = :F1 = a(R), where
x :Fz (2.12)
R = {A1 x Az : A1 e :F1, Az e :Fz} = measurable rectangles. (2.13)
14 Chapter 2. Probability Theory

Product probability measure, P = Pt x Pz, on the measurable space (Q, F):


The probability P is unique and has the property

(2.14)

Note: The product sample space Q contains the outcomes of both experiments generating
the sample spaces r.lt and r.lz. Let

be two collections of subsets of Q. These collections are a-fields on r.l, are included in :F,
and :F = a(R) = a(9 1 , 9z) since every member of n is the intersection of sets from 9t
and 9z.
The construction of the product probability measure is less simple. It can be
shown that there exists a unique probability P on (Q, :F) such that it satisfies Eq. 2.14
([40], Theorem 3.3.5, p. 59) . .&

Example 2.9: Let Qk = {1, 2, 3, 4, 5, 6}, k = 1, 2, be sample spaces associated


with the experiment of rolling two dice. The a-fields Fk on these spaces consist
of all subsets of Qk. The probability measures on Qk are Pk({i}) = 1/6, i =
1, ... , 6.
The product sample space Q, the product a-field :F, and the product prob-
ability measure P are

(1, 1) (1, 2) (1, 6)


(2, 1) (2, 2) (2, 6)
(3, 1) (3, 2) (3, 6)
Q = n1 x Qz = {c:v = (i, })} =
(4, 1) (4, 2) (4, 6)
(5, 1) (5, 2) (5, 6)
(6, 1) (6, 2) (6, 6)

all subsets of Q, and P({c:v}) = 1/36, respectively.<>

Note: The sample space Q corresponds to the experiment of rolling two dice. The prod-
uct a-field consists of all subsets of Q since the members of n are (i, j), Uieh (i, j),
U je[z (i, j), Uieh ,je/z (i, j), where It, lz ~ {1, 2, 3, 4, 5, 6}. The product probability
measure can be defined for each outcome w = (i, j) and is P({w}) = 1/36 because the
members of Q are equally likely. Also, P({w} = (i, j)) is equal to P1 ({i}) Pz({j}) =
(1/6) (1/6) . .&

Example 2.10: Consider the same experiment as in the previous example but as-
sume that the a-fields on QI = Qz = {1, 2, 3, 4, 5, 6} are

:Ft ={At= {1, 2}, A}, 0, QI} and :Fz = {Az = {1, 2, 3}, A2, 0, Qz}.

The probabilities of At and Az are Pt (At)= 2/6 and Pz(Az) = 3/6.


2.3. Construction of probability spaces 15

The product probability space is Q = {w = (i, j), i, j = 1, ... , 6}. The


product a -field coincides with the collection of measurable sets
{AI X A2, AI X A2_, A! X A2, A! X A2_},
unions of the members of this collection, and sets in which at least one component
is the empty set, for example, the set 0 x A2. The product probability measure is
given by Eq. 2.14. o
Note: The product sample space n is as in Example 2.9 but the product cr-field is much
coarser than in this example because fewer events are relevant. The definition of the product
probability measure is consistent with our intuition. For example, P(A I x A2) = 6/36 and
coincides with the product of PI (AI) and P2(A2) . .&

The formulas ofEqs. 2.11-2.14 can be generalized to define a product prob-


ability space for three or more probability spaces. Consider the probability spaces
(Qk. :h, Pk), k = 1, 2, ... , n.

• If n is finite, the product probability space (Q, :F, P) is defined by

Q = QI X··· X Qn,

:F = FI X ••• X Fn,
P =PI X··· X Pn. (2.15)

• If n is infinity, the definition of Eq. 2.15 applies with n replaced by oo.


Note: The last equalities in Eq. 2.15 are commonly used notations. If the probability spaces
(Ok, :Fk, Pk) are identical, the product sample space, cr-field, and probability are denoted
by O'j, :Ff, and Pf for n < oo and Of, :Ff, and Pf for n = oo . .&

Example 2.11: A loaded coin with sides {1} and {0} and probabilities p E (0, 1)
and q = 1 - p, respectively, is tossed n times. The probability space for a single
toss is Q = {0, 1}, :F = {0, Q, {0}, {1}}, P({l}) = p, and P({O}) = q. The
corresponding elements of the product probability space for n tosses are
Qn = {w= (WI, ... ,Wn) :wi =0orl},

:Fn = all subsets of nn' and

pn(A) = L P({w}) = L pn"' qn-nw' A E :Fn'


weA weA

where nw = .L:7=I Wi gives the number of 1'sin w = (WI, ... , Wn). 0


Note: We have
pn(On) = L pn"' qn-nw = L pn'w qn'-n'w (pl qo +PO qi) = ... = 1
W!, •.. ,Wn lVJ, •.. ,Wnf

since pi q 0 + p 0 qi = 1, where n' = n- 1 and n:V = I:?~ 1 wi so that pncnn) = 1. The


set function pn is also positive and countably additive ([151], Section 2.3) . .&
16 Chapter 2. Probability Theory

2.3.3 Extension of probability measure


Consider an experiment with a sample space Q. Assume that a set function
R defined on a collection C of subsets of Q is available and that this function is
real-valued, positive, countably additive, and R(Q) = 1. For calculation purposes
we need to extend R to F = a (C) such that its extension is a probability measure
on the measurable space (Q, F). The following theorem states conditions under
which the extension of R to F is a probability measure. For proof and additional
information on this topic, see [151] (Section 2.4).

If (1) Cis a field on Q and (2) R is a real-valued, positive, and countably additive
function defined on C such that R(Q) = 1, then there exists a unique probability
P on F = a(C) such that P(A) = R(A) for each A E C, that is, the restriction
of P to Cis equal toR ([66], Theorem 14, p. 94).

Example 2.12: Let Q = lR and let C be the collection of all finite unions of
intervals of the type (a, b] for a < b, (-oo, a], (a, oo), and (-oo, oo) to which
we add the empty set. Let F : lR -+ [0, 1] be a continuous increasing function
such that Iimx-+-oo F(x) = 0 and limx-+oo F(x) = 1. DefineR : C -+ [0, 1]
by R((a, b]) = F(b)- F(a), R((-oo, a]) = F(a), R((a, oo)) = 1 - F(a),
and R((oo, oo)) = 1. This set function can be extended uniquely to a probability
measure on (JR, B). <>
Note: The collection of subsets Cis a field and a(C) = B. The set function R is well
defined because, for example, R((a, b] U (b, c]) = R((a, c]) and R(R) = 1, and it is
finitely additive. Moreover, R is countably additive ([66], Proposition 9, p. 90). The above
theorem implies the stated result. A

2.3.4 Conditional probability


Let (Q, F, P) be a probability space and B E Fan event with P(B) > 0.
We define a new probability measure on (Q, F) under the assumption that B has
occurred.
The probability of A conditional on B is

P(A I B) = P(A n B) for A, B E F and P(B) > 0. (2.16)


P(B)

Note: The set function P(· I B) is a probability because it is defined on :F, takes positive
values, P(Q I B) = P(B)/ P(B) = 1, and is countably additive, that is,
P(Uiel Ai I B) = P((Uie/ Ai) n B)/ P(B) = P(Uiel (Ai n B))/ P(B)
=L P(Ai n B)/ P(B) =L P(Ai 1 B)
iel iel
for any countable set I and disjoint events Ai, i E I. A
2.3. Construction of probability spaces 17

If B has not occurred in an experiment, then Be has occurred. The proba-


bility of A given that Be has been observed is given by Eq. 2.16 with Be in place
of B provided that P(Be) > 0. Hence, the conditional probability of A can take
the values P(A n B)/ P(B) or P(A n Be)/ P(Be) depending on whether B or Be
has been observed. In Example 2.86 we extend the definition of Eq. 2.16 to give
the probability of A given the information content of a sub-a-field g of :F.
Let Ai E :F, i = 1, ... , n, be a partition of n such that P(Ai) > 0, and
let B E :F be an arbitrary event. The following two properties of the conditional
probability are very useful for calculations.

The law of total probability:


n n
P(B) =L P(B n Ai) =L P(B 1 Ai) P(Ai), (2.17)
i=1 i=1

and
The Bayes formula:

P(Aj) P(B I Aj) P(Aj) P(B I Aj)


P(B) > 0.
P(Aj I B)= P(B) = L7= 1 P(Ai) P(B I Ai)'
(2.18)
Proof: The events Ai n B, i = 1, ... , n, are disjoint so that Eqs. 2.17 and 2.18 result
from Eq. 2.6 and the definition of the conditional probability (Eq. 2.16). To apply the
Bayes formula, the probability P(B) must be strictly positive. The probabilities P(Ai)
and P(A j I B) in Eq. 2.18 can be interpreted as the prior probability of Ai and the
posterior probability of Aj given the occurrence of B, respectively. P(B I Ai) is the
probability of B conditional on Ai. •

Example 2.13: Consider the experiment of rolling two dice. The sample space,
the a-field :F, and the probability measure are n = {w = (i, j) : i, j = 1, ... , 6},
the collection of all parts of n, and P(w) = 1/36, respectively. Consider the
events A1 = (6, 2) and A2 = {(6, 2), (4, 4), (1, 6)}. The probabilities of these
events given that B = {w E n : i + j = 8} has occurred are P(A 1 I B) = 1/5
and P(A2 I B) = 2/5. <>
Proof: The event B is {(6, 2), (5, 3), (4, 4), (3, 5), (2, 6)}. The probability of A1 condi-
tional on B is P(A1 I B) = 1/5 since A1 is one of the five equally likely members of B.
The same result can be obtained from Eq. 2.16 since P(A1 n B) = P(A 1) = 1/36 and
P(B) = 5/36. The conditional probability of A1 given Be is zero. Hence, the probability
of occurrence of A 1 is 1/5 and zero if B and Be has occurred, respectively. The conditional
probabilities of A2 under B and Be are P(A2 I B) = (2/36)/(5/36) = 2/5 and P(A2 I
Be) = (1/36)/(31/36) = 1/31 since P(B) = 5/36 and P(Be) = 1- P(B) = 31/36. We
will see in Example 2.86 that the probabilities P(A2 I B), P(A2 1 Be), P(B), and P(Be)
are sufficient to specify the conditional probability P(A2 I 9), where 9 = {0, Q, B, Be}
is the a-field generated by B. •
18 Chapter 2. Probability Theory

Example 2.14: Let X be the unknown strength of a physical system, that is, the
maximum load that the system can sustain. Suppose that the system survives
without any damage if subjected to a proof load test of intensity x pr· Consider the
events A = {X > x} and B = {X > Xpr} that the system strength exceeds x and
Xpr. respectively. The probability that the system can sustain a load x given that B
has occurred,
P8 (x) = P(X > max{Xpr. x})j P(X > Xpr),
is larger than the system reliability P(X > x) calculated when the information
provided by the proof load test is not available or is ignored. ~

Example 2.15: A system is subjected to two actions described by the events B 1


and B2, where P(B1) = 0.8 and P(B2) = 0.2. Let P(A I BI) = 0.9 and P(A I
B2) = 0. 7 be the system reliability under action B 1 and B2, respectively, where
A is the event that the system survives. By the law of total probability (Eq. 2.17),
the system reliability is Ps = P(A) = P(A I B1)P(B1) + P(A I B2)P(B2) so
that Ps = (0.9)(0.8) + (0.7)(0.2) = 0.86. ~

2.3.5 Sequence of sets


Let Ai. i = 1, 2, ... , be subsets of a set n. The limit supremum and the
limit infimum of this sequence are

lim supAi= n~ 1 u~j Ai = n~ 1 Bj, Bj = u~jAi ..j.., (2.19)


i--+00

(2.20)

Note: The limits in Eqs. 2.19-2.20 are subsets of n. The sequence of events Bj = U~j Ai
and Cj = n~ j Ai are monotone, B j is a decreasing sequence, that is, Bj 2 B j +1o whjle
C j is an increasing sequence, that is, Cj ~ Cj +1o for all j. These properties are denoted
by B j -).. and C j t, respectively. .A

The limits in Eqs. 2.19 and 2.20 have some notable properties.

Ii~supAi = {Ai i.o.} =


1--+00
\w: f=i=1
1A;(w) = oo}
= {w: wE Aij' j = 1, 2, ... } (2.21)

for some subsequence i j depending on w, where i.o. means infinitely often.

Proof: Recall that the function 1A : n -+ {0, 1}, called the indicator function, is defined
by 1A (w) = 1 for w E A and 1A (w) = 0 for w ¢ A.
2.3. Construction of probability spaces 19

If wE lim sup; A;, then wE Bj = u~ 1 A;, Vj, so that there exists iJ 2: j such
that w E A;1 , j= 1, 2, ... , and L~I 1A; (w) 2: Lj 1A;J (w) = oo implying that w E
{w: I:~ 1 1A; (w) = oo}. Therefore, we have lim sup; A; c {w: L~I lA; (w) = oo}.
If wE {w: L~I1A;(w) = oo}, there exists iJ -+ oo, j = 1, 2, ... , such that
wE A;r Hence, wE Uk-:::_jAk for all j's so that wE lim sup; or {w: I:~ 1 1A;(w) =
oo} c lim sup; A; ([151], Lemma 1.3.1, p. 6). •

lim inf A; = {w : w E An for all n except for a finite number}


i---+oo

= /w: f
!=1
1Af(w) < oo} = {w: wE A;, Vi ::: jo(w)}. (2.22)

Proof: If w E liminf; A;, there exists jo(w) such that w E Cia = n~ 10 A;, that is, w
belongs to all sets A; fori 2: jo(w), or w does not belong to the sets AJ, ... , Aio(w) so
thati:~ 1 1Af(w) is finite.
If w is such that L~ 1 1A c ( w) is finite, then w does not belong to a finite number of
subsets A;. Hence, there erlsts j~(w) such that w E A; fori 2: jo or w E n~ A;. •10

liminfA; = (limsupAf)c ~ 1imsupA;. (2.23)


l i i

Proof: If w E lim inf; A i, there exists jo such that w E CJo, that is, w E A i for
i 2: jo. Hence, w E A; infinitely often, that is, w E lim sup; A;. De Morgan's law
(u~ 1 n~i A;r = n~ 1 U~i Af yields Eq. 2.23 . •

If the limits in Eqs. 2.19 and 2.20 coincide, the sequence A i is said to be a
convergent sequence with the limit

lim A; =liminfA; =limsupA;. (2.24)


i---+oo i---+oo i---+oo

Example2.16: Let A; = {(X!,X2) E ~ :X! E [O,i),x2 E [0, 1/i)}, i =


1, 2, ... , be a sequence of sets in IR 2 . The sequences of events B 1 = U;-:::_ 1 A; and
CJ = n;-:::_JAi have the same limit [0, oo) x {0} as j -+ oo. Hence, {A;} is a
convergent sequence and lim;___,. 00 A; = [0, oo) x {0}. <>

Example 2.17: Consider a monotone sequence of subsets A;. This sequence is


convergent with lim;___,. 00 A; = u~ 1 A; and lim;___,. 00 A; = n~ 1 A; if A; t and
A; ,!.,, respectively. <>
20 Chapter 2. Probability Theory

Proof: We need to show that lim sup A; and lim inf A; are equal. If A; t, then B j =
U;::;:jAi = U;::;:tA; sothatlimsupA; = U;::;:tA;. The sequence Cj = ni::;:jAi inEq. 2.20
is Aj so that liminfA; = U;::;:tA; =limA;. Similar considerations give the limit of a
decreasing sequence of subsets. •

2.3.6 Sequence of events


Let (0, :F, P) be a probability space and consider a sequence of events
A; E:F, i = 1, 2, ... , on this space. The following three statements relate to
the continuity of the probability measure. The last result in this section is the
Borel-Cantelli lemma, a useful tool for proving almost sure convergence.

Continuity of probability measure. If A; t A or A; .,!. A, then P(A;) t P(A)


or P(A;).,!. P(A), respectively, where A= limi--+oo A;.
Proof: If A; is an increasing sequence, then Dt =At and D; =A;\ Ai-l· i = 1, 2, ... ,
provide a partition of u~ 1 A; so that
oo n
P(lim A;)= P(A) = P (u~ 1 D;) =
1--+oo
L P(D;) = n--+oo
lim L P(D;)
i=l i=l
= lim P {U7_ 1 D;) = lim P(An)
n--+00 - n--+oo

since P is a countably additive function. By the first property in Eq. 2.5, the numerical
sequence P(An) is increasing. Similar consideration can be applied for the case A; ,J... If A;
is a decreasing sequence such that n~ 1 A; = 0, we have li!Ilj--+ 00 P(A;) = P(n~ 1 A;)=
P(0) = 0. •

If a sequence of events A i is convergent with limit A = lim; A;, then

lim P(A;)
i--+oo
= P( i--+oo
lim A;) = P(A), (2.25)

that is, for a convergent sequence of events, probability and limit operations
can be interchanged.
Proof: A direct consequence of Eq. 2.25 is that for any sequence of events A; the equalities
P(limsupA;) = .lim P(Bj) = .lim P(U~jA;) and
i--+oo J--+OO J--+00
P(ljminfA;) = .lim P(Cj) = ,lim P(n~jA;)
1--+00 J --+00 j--+00

hold because Bj = U~jAi and Cj = n~jAi are decreasing and increasing sequences,
respectively, lim sup A; = limB j, and lim inf A; = lim C j.
Since A; is a convergent sequence of events, the decreasing and increasing se-
quences Dj = SUPi::;:j A; and Ej = inf;::;:j A; respectively, converge to A. The inequali-
ties P(Ej) :s P(A j) :s P(Dj) hold for each j because Ej ~ Aj ~ Dj· By the continuity
of the probability measure, P ( E j) and P ( D j) converge to P (A) as j --+ oo so that the
limit of P(A j) is P(A) . •
2.4. Measurable functions 21

Fatou's lemma. The pro~ability measure satisfies the inequalities

P(liminfA;) ~ limidfP(A;) ~ limsupP(A;) ~ P(limsupA;) (2.26)


i i ' i i

for any sequence of events A i.

Proof: Because lim sup; A; and liminf; A; are events as countable unions and intersec-
tions of events (Eqs. 2.19 and 2.20), we can calculate P(limsup; A;) and P(liminf; A;).
The inequality P(liminf; A;) ~ P(limsup; A;) follows from Eqs. 2.5 and 2.23. Because
P(A;) is a numerical series, it satisfies the inequality liminf; P(A;) ~ lim sup; P(A;).
The proof of the remaining inequalities is left to the reader.
If the sequence A; is convergent, lim sup; A; = lim inf; A; = lim; A; so that the
left and right terms of Eq. 2.26 coincide. This observation provides an alternative proof for
Eq. 2.25 . •

Borel-Cantelli Lemma. If L; P(A;) < oo, then


P(limsupA;) = P(A; i.o.) = 0. (2.27)
i

Proof: Because lim sup; A; ~ Bj and Bj = U;:::jAi (Eq. 2.19), we have

L P(A;),
00

P(lim_supA;) ~ P (u~ 1 A;) ~ j = 1, 2, ... ,


I i=j

by the monotonicity and subadditivity of the probability measure. The extreme right term in
these inequalities converges to zero as j --+ oo because it is assumed that P(A;) < oo. Li
Hence, P(limsup; A;)= 0. •

2.4 Measurable functions


Let (Q, :F) and (\11, Q) be two measurable spaces and consider a function
h : Q --+ \II with domain Q and range \II.

The function h is said to be measurable from (Q, :F) to (\II, Q) if

h- 1 (B) = {w: h(w) E B} E :F, VB E Q. (2.28)

Note: It is common to indicate this property of h by the notation h : (Q, :F) --+ (Ill, 9),
h E :F/9, or even h E :F provided that there can be no confusion about the a-field 9. If
the a-fields :F and 9 are not in doubt, it is sufficient to say that his measurable. &
22 Chapter 2. Probability Theory

2.4.1 Properties

If h : (Q, :F) -+ (\11, Q) is measurable and Pis a probability measure on (Q, :F),
then Q : Q -+ [0, 1], defined by

Q(B) = P(h- 1 (B)), BE Q, (2.29)

is a probability measure on (\11, Q), called the probability induced by h or the


distribution of h.

Proof: The function Q has the properties Q(\11) = 1 by definition and Q(Uie/ Bi) =
Lie/ Q(Bi) for any countable set I and disjoint events Bi E 9, since the sets h- 1 (Bi)
are disjoint events in F and P is a probability. Hence, the set function Q is a probability
measure on (\11, Q). •

If C is a collection of subsets of \II, then h - 1 (a (C)) = a (h - 1 (C)), where


h- 1 (a(C)) and a(h- 1 (C)) are the inverse image of a(C) and the a-field gen-
erated by h - 1 (C) in Q, respectively ([151], Proposition 3.1.2, p. 73).

Let C be a collection of subsets of \II such that a(C) = Q. The function h is


measurable if and only if h - 1 (C) s; :F ([151], Proposition 3.2.1, p. 76).

If g : IRd -+ IRq is a continuous function, then g is Bd I Bq -measurable.

Proof: Let vr denote the topology on lR' generated by the open intervals of lR'. Because
g is a continuous function, we have g - 1 (D) E Vd for all D E Vq . This function is
measurable since the Borel fields gi and Bq are generated by the open sets in W and Rq,
respectively. •

2.4.2 Definition of random variable


If a probability measure Pis attached to the measurable space (Q, :F), the
function h in Eq. 2.28 is called a random variable. We consider primarily random
variables corresponding to (\11, Q) = (JE.d, Bd) and denote them by upper case
letters, for example, X : Q -+ JRd for d > 1 and X : Q -+ lR for d = 1. Random
variables with values in JRd, d > 1, are also called random vectors.

The a-field generated by a random variable X is

(2.30)

and represents the smallest a-field with respect to which X is measurable.

Proof: The collection a(X) of subsets of Q is a a-field because (1) W E Bd so that r.l =
x- 1 (Rd) E a(X), (2) if BE Bd, then x- 1 (B) E a(X) and (X- 1(B))c = x- 1(Bc) E
2.4. Measurable functions 23

a(X), and (3) if Bi E Bd, i E /, we have x- 1 (Bi) E a(X) and Uie/X- 1 (Bi) =
x- 1 (Uie/ Bi) E a(X), where I is a countable index set.
If 1l is another u -field relative to which X is measurable, this field must include the
subsets x- 1(Bd) of 0 so that 1l includes a(X) . •

The distribution of X is the probability induced by the mapping X : Q --+ lR d


on the measurable space (JRd, Bd), that is, (Eq. 2.29)

(2.31)

The mapping X : (Q, :F) --+ (IR, B) is a random variable if and only if
x- 1 (( -00, x]} E F, X E !R ([151), Corollary 3.2.1, p. 77).
Note: This result is used extensively in applications to determine whether a function is
measurable and to find properties of random variables . .&

The JRd -valued function X : (Q, :F) --+ (JRd, Bd) is a random vector if and only
if its coordinates are random variables.

Note: If X is a random vector, its coordinates Xi = Jri a X, i = 1, ... , d, are random


variables because the projection map rri (x) = Xi is continuous.
S1pose now that Xi are random variables. Let R be the collection of open rectan-
gles in~ with members R = /1 x · · · x Ia, where /i are open intervals in JR. The a-field
Bd is generated by these rectangles, that is,~ = a(R) ([151], Proposition 3.2.4, p. 81),
so that it is sufficient to show that x- 1 (R) is in F. We have x- 1 (R) = nf= 1X/ 1(/i) so
that x- 1 (R) E :F since xi are random variables. This property shows that X is a random
vector if and only if {wE 0 : Xi (w) :::S xj} E }=for all Xi E JR, i = 1, ... , d.
These considerations can be extended to a countable collection (X1, X2, ... ) of
random variables, referred to as a time series or a discrete time stochastic process. A time
series (X1, X2, ... ) takes values in IR00 • The measurable spaces (0, :F) and (IR00 , 8 00 )
need to be considered to establish whether (X1, X2, ... ) is a measurable function . .&

Example 2.18: Consider the experiment of rolling a die. Let X, Y : n --+ lR be


defined by X(w) = 1 and -1 for w ~ 3 and w > 3, respectively, and Y(w) = w,
where n = {1, 2, 3, 4, 5, 6} and P({w}) = 1/6. The a-fields generated by X
and Y are ;:x = {0, {1, 2, 3}, {4, 5, 6}, Q} and FY = P(Q), where P(Q) is the
power set of n, that is, the collection of all subsets of the sample space. We have
X E ;:x, Y E :FY, and X E :FY since ;:x c :FY. However, Y ¢ Fx since the
information content of :F X is too coarse to describe Y. <>

Example 2.19: Let X : (IR, B) --+ (IR, B) be an increasing function. Then X


is measurable because B can be generated by the collection of intervals (a, b],
a ~ b, and x- 1 ((a, b]) = (x- 1(a), x- 1 (b)] is in B. <>
24 Chapter 2. Probability Theory

Example 2.20: Let Q = [0, 1], F = B([O, 1]), and P(dw) = dw. If a and f3 > a
are constants, the function w f--* X (w) = a + ({3 - a) w is a random variable on
(Q, F, P) since it is continuous. The distribution of X is Q((xt, xz]) = (xz-
Xt)/({3- a). ¢
Proof: Take B = (XJ, xz] in the range of X. The distribution Q ((x!, xz]) is equal to the
probability P((wt. wz]) of the interval x- 1((x 1, xz]), where w; = (xi - a)/({3 -a). •

2.4.3 Measurable transformations


The determination of the output of a physical system subjected to a random
input involves transformations of random variables. For example, suppose that the
input to a system is modeled by an JRd -valued random variable X and the objective
is to characterize the system output Y = g(X), where g : JRd ~ IR depends on
the system properties. The mapping

(Q' F) L (IRd' Bd) ~ (IR, B)


from (Q, F) to (lR, B) defines the output. A first question we need to pose is
whether Y is a random variable.
If (Q, F), (\11, 9), and (<I>, H) are measurable spaces and

h: (Q, F)~ (\11, 9), g: (\11, 9) ~ (<1>, H)

are measurable functions, then

go h: (Q, F)~ (<1>, H) is measurable.

Proof: ForB E 1{, g- 1 (B) ising since g is measurable. We have h- 1 (A) E :F for any
A E g since h is measurable, so that h-I (g -I (B)) E :F. Hence, g o h is measurable from
(Q, :F) to (<I>, 1{). •

Example 2.21: If X : (Q, F) ~ (lR, B) is a real-valued measurable function and


g : (IR, B) ~ (IR, B) is a Borel measurable function, that is g -I (B) c B, then
Y = g(X) =go X : (Q, F) ~ (lR, B) is a real-valued measurable function. <>
Proof: We need to show that y- 1 (B) = x- 1 (g- 1 (B)) E :F for any B E B. This
condition is satisfied because B 1 = g- 1(B) E B, VB E Band x- 1 (B 1) E :F, VB' E B,by
the properties of g and X (Fig. 2.2). •

Example 2.22: If X is a random variable on a probability space (Q, F, P), r > 0


is an integer, and A., u are some real constants, then

e
-J..X
' and eAux
are random variables on the same probability space. ¢
2.4. Measurable functions 25

Real Real
line line

g-1
set

Figure 2.2. Mappings X and g

Note: A continuous function g : lR -+ lR is Borel measurable because the open intervals


of lR generate the cr-field Band g is continuous if g- 1(1) E V for any I E V, where
V denotes the topology generated by the open intervals of JR. Hence, g o X is a random
variable if g is a continuous function and X is a random variable.
The function w-+ eA u X(w) E IC defined by the last transformation is complex-
valued. A complex-valued function defined on Q is measurable if its real and imaginary
parts are measurable functions from (Q, F) to (JR, B). Hence, eA u X is a measurable
function because its real and imaginary parts, cos(u X) and sin(u X), are F-measurable as
continuous transformations of X. 6

Example 2.23: If (X, Y) : (Q, F) --+ (JR.2 , B 2) and g : (JR.2 , B 2) --+ (JR., B) are
measurable functions, then g(X, Y) e :F jB. For example, the functions X v Y =
max(X, Y), X 1\ Y = min(X, Y), X+ Y, X- Y, X Y, and X/Y are measurable
from (Q, F) to (JR., B). The transformation XfY is defined for Y =f:. 0 ([40],
Theorem 3.1.5, p. 36)). ~

Example 2.24: If X;, i = 1, 2, ... , is a sequence of random variables defined on


a measurable space (Q, F), then inf; X;, sup; X;, liminf; X;, and lim sup; X; are
measurable. ~
Proof: The event {inf; X; > x} = nj{X; > x}, x E lR, is in F since X; are random
variables. Hence, inf; X; is a measurable function. Similarly, sup; X; is a random vari-
able since {sup; X; :::0 x} = n;{X; :::0 x}. Because liminf; X; = supi:O:l infj:o:i Xj and
infj:o:i X j are random variables, liminf; X; is measurable. A similar argument shows that
lim sup; Xi is a random variable. •

Example 2.25: Consider a series X = (X 1. X2, ... ), where X; are measurable


functions from (Q, :F) to (\11, Q). Let K be the collection of all subsets of Z + =
26 Chapter 2. Probability Theory

{1, 2, ... }. The function (m, w) ~---+ Xm(w) depending on the arguments m and w
is measurable from (/Z. + x Q, K x :F) to (IV, Q). Generally, this property does
not hold if the discrete index m in this example is allowed to take values in an
uncountable set, as we will see in the next chapter. ¢

Proof: Let A= {(m, w) : Xm(w) E B} be the inverse image of the function (m, w) ~--+
Xm(w) in z+ x n corresponding to an arbitrary member B of 9. Because Xm is mea-
surable, this set can be expressed as the countable union Um{w : Xm(w) E B} of sets
{w : Xm (w) E B} that are in :F for each m 2: 1. Hence, A is inK x :F. We also note that
the function m !--+ Xm (w) is K-measurable for a fixed (L) E n since {m : Xm (w) E B} is a
subset of z+ so that it is inK. •

Example 2.26: Let M : (Q, :F) --+ (IZ +, K) and Xm : (Q, :F) --+ (IV, Q) for
m E z+ be measurable functions, where K consists of all parts of Z +. The
functionw ~---+ XM(w)(w) is measurable from (Q, :F) to (IV, Q). ¢

Proof: We need to show that A = {w : XM(w)(w) E B} is in :F forB E 9. The sets


Am = {w: M(w) = m}, m = 1, 2, ... , provide a countable partition of nand Am E :F for
each m because M is :F/ K-measurable. Hence, A is equal to a countable union of disjoint
parts, that is,

Since Xm is an :F/9-measurable function for each m, A is in :F. •

2.5 Integration and expectation


Let (Q, :F, P) be a probability space. We define the expectation for a spe-
cial class of random variables taking a finite number of values and then extend
this definition to arbitrary random variables.

If I is a finite index set and A; E :F, i E /,is a partition of Q, then

X= I:x; lA;• x; E JR, (2.32)


ie/

defines a simple random variable.

Note: Because X takes a constant value x; in A; and the subsets A; partition n, we have
P(A;) = P(X = x;), where {X; = x;} is an abbreviated notation for the event {w E n :
X(w) = x;}.
The collection of simple random variables is a vector space. Moreover, if X and
Yare two simple random variables defined on a probability space (Q, :F, P), then X Y,
X A Y = min(X, Y) and X v Y = max(X, Y) are simple random variables on the same
space ([151], Section 5.1). ~
2.5. Integration and expectation 27

If A E F, the indicator function

1 ifwEA
1A(w) ={ 0 ifw ~A is a simple random variable. (2.33)

Proof: Because 1:4 1 (B) = {w: lA(w) E B} is 0, A, Ac, and Q ifO, 1 ¢ B, 1 E B but
0 ¢ B, 0 E B but 1 ¢ B, and 0, 1 E B, respectively, we have 1:4 1(B) E F for any BE B
so that lA is FIB-measurable. •

The following measurability theorem suggests that we may be able to ex-


tend properties established for simple random variables to arbitrary random vari-
ables.
LetX: Q--+ RbesuchthatX(w) ~ Oforallw E Q. ThenX E F/Bifand
only if there exists an increasing sequence of simple random variables X n ~ 0,
n = 1, 2, ... , converging to X, referred to as an approximating sequence for
X ([151], Theorem 5.1.1, p. 118).

Proof: If there is an approximating sequence of simple random variables Xn and X (w) =


liiDn--+oo Xn(w), then X is a random variable since limits preserve measurability (Exam-
ple 2.24).

Figure 2.3. Construction of the approximating sequence X n

If X E FjB, construct the sequence

Xn
nzn
= LX
(k 1)
; lAn,k + n lBn ~ 0
k=l

of simple random variables, where An,k = {w : (k- l)/2n ::::; X(w) < kj2n} and Bn =
{w : X(w) ~ n} (Fig. 2.3). The sequence Xn, n = 1, 2, ... , is increasing, that is, Xn ::::;
Xn+l• and its members are measurable functions that are smaller than X. If X(w) < oo,
28 Chapter 2. Probability Theory

then IXn(w)- X(w)l ::; 2-n --+ 0 as n--+ oo. If X(w) = +oo, then Xn(w) = n so that it
approaches infinity as n increases. •

2.5.1 Expectation operator


Consider a probability space (Q, :F, P). We define the expectation for
finite-valued simple random variables and then extend this definition to arbitrary
real-valued random variables.

2.5.1.1 Finite-valued simple random variables

If X is a simple random variable (Eq. 2.32) such that lxd < oo, i E /, its
expectation is
E[X] = I>i P(A;). (2.34)
iel

Note: The expectation of the indicator function in Eq. 2.33 is E[lA] = P(A). A

Example 2.27: Consider a loaded die with m < oo sides showing the numbers
Xi with probabilities Pi. i = 1, ... , m. Suppose we roll the die n times and
ni ;::: 0 denotes the number of times we see side i. The arithmetic average, i =
Lie! Xi (ni/n), of the observed values approximates E[X] in Eq. 2.34 for large
values of n because n if n converges to Pi as n --+ oo. <>

The following properties are direct consequences of the definition in Eq. 2.34.

If X andY are finite-valued simple random variables, then

• X ;::: 0 ==> E[X] ;::: 0.


• E[a X+ fJ Y] =a E[X] + fJ E[Y], a, fJ =constants==> E[·] is linear.
• X ::; Y implies E[X] ::; E[Y] ==> E[·] is monotone. (2.35)

Proof: Let X = Lief Xi 1Ai and Y = LjeJ Yj 1ni' where I and J are finite sets.
X 2:: 0 implies x; 2:: 0 so that E[X] is positive (Eq. 2.34), that is, the first property. To
prove the second property, note that a X + f3 Y is a finite-valued simple random variable
corresponding to the partition A; n Bj of f.l ([151], pp. 120-121). If X ::; Y, the first
property and the fact that Y- X 2:: 0 is a simple random variable imply E[Y- X] 2:: 0 or
E[Y] 2:: E[X] (Eq. 2.34) . •

If X and X n are finite-valued simple random variables and X n t X or Xn ,!.. X,


then ([151], Property 5, p. 121)

E[Xn] t E[X] or E[Xn] ,!, E[X]. (2.36)


2.5. Integration and expectation 29

2.5.1.2 Positive random variables

If P(X = oo) > 0, set E[X] = oo. Otherwise,

E[X] = lim E[Xn], (2.37)


n--j.CXJ

where Xn is an approximating sequence of finite-valued simple random vari-


ables such that Xn t X and E[Xn] is finite for each n.

Note: The measurability theorem guarantees the existence of an approximating sequence.


The expectation in Eq. 2.37 is well defined since the value of E[X] does not change if
another approximating sequence Ym t X is used in this equation in place of Xn ([151],
Proposition 5.2.1, p. 122). A

The properties of the expectation for finite-valued simple random variables


in Eqs. 2.35 and 2.36 can be extended to positive random variables with the range
[0, oo].

If X, Y, and Xn, n = 1, 2, ... , are positive random variables with range [0, oo],
then ([151], Section 5.2.3)

• E[X] E [0, oo].


• E[a X+ p Y] =a E[X] + p E[Y], a, p > 0.
• Xn t X ==} E[Xn] t E[X]. (2.38)

Note: The last property, called the monotone convergence theorem, gives conditions
under which limits and expectations can be interchanged ([151], Section 5.2.3). A similar
result is given in Eq. 2.36. A

2.5.1.3 Arbitrary random variables


An arbitrary random variable X has the representation X = X+ - x-,
where

x+ =X v o= max:(X, 0) and x- =(-X) v o= max:(-X, 0) (2.39)

are positive random variables.

• If X is an arbitrary random variable such that at least one of E[X +]and E[X-]
is finite, we define
(2.40)
elf E[X+] and E[X-] are both finite, then E[X] in Eq. 2.40 exists and is finite.
We say that X has a finite expectation or is P-integrable.

• If both E[X+] and E[X-] are unbounded, E[X] does not exist.
30 Chapter 2. Probability Theory

Note: If the expectations E[X+] and E[X-] are unbounded, E[X] given by Eq. 2.40 does
not exist. If E[X+] and E[X-] are finite, then both E[X] < oo and E[IXI] < oo since
X = x+ - x- and lXI = x+ + x-. If one of the expectations E[X+] and E[X-] is
finite and the other infinite, E[X] is defined but is unbounded. For example, E[ X] exists
and E[X] = +oo if E[X+] = +oo and E[X-1 is bounded. A

If the expectation of a random variable X exists, it is also denoted by

E[X] =In X(w) P(dw) or E[X] = f X dP. (2.41)

• The integral of X with respect to P over set A E F is

E[X 1A] =In 1A(w) X(w) P(dw) = i X(w) P(dw). (2.42)

• If E[X 1A] exists and is finite, X is said to be integrable with respect to P


over A or P-integrable over A.

Note: The integrals in Eqs. 2.41 and 2.42 are special cases of the Lebesgue-Stieltjes inte-
gral J h dJ.L, where his a measurable function on a measure space (r.l, :F, J.L). The random
variable X and the probability measure Pin Eqs. 2.41 and 2.42 correspond to the measur-
J
able function h and the measure J.L, respectively. Hence, X d P has properties similar to
fhdJ.L. A

Example 2.28: Let Q = {w = (i, j) : i, j = 1, 2, ... , 6} be the sample space


associated with the experiment of rolling two dice and F be the collection of all
subsets of n. Define a random variable X(w) = i 2 + } 2 . This function is positive
and takes on a finite number of values. The integral of X with respect to P over
A is (Eq. 2.42)

1 1
E[X lA] = 2)i 2 + } 2 ) 36 = [(2)(40) + (2)(34) + 32] 36 = 5,
wE A

where A= {w = (i, j): i +j = 8} = {(2, 6), (3, 5), (4, 4), (5, 3), (6, 2)}. <>

2.5.2 Properties of integrals of random variables


We review essential properties of integrable random variables. These prop-
erties are divided into three groups depending on the number of random variables
in the integrand and on the domain of integration.

2.5.2.1 Finite number of random variables


Let X and Y be random variables defined on a probability space (Q, F, P)
and let A be an event on this space, that is, A E F.
2.5. Integration and expectation 31

I fA X dP is finite if and only if fA lXI dP < oo ([40], Section 3.2).

Note: We have IE[X]I < oo if and only if E[IXI] < oo by setting A = Q. This result is
consistent with an earlier comment (Eq. 2.40). A

If X andY are P-integrable over A, a X+ bY is P-integrable over A and

i<aX+bY)dP=a LxdP+b LYdP, (2.43)

where a, bare constants ([40], Section 3.2).

Note: If A = Q, Eq. 2.43 yields E[a X + bY] = a E[X] + b E[Y] showing that E
is a linear operator in agreement with Eq. 2.35 and 2.38. This property implies that
fA L7=1 ai Xi dP =L:?=t ai fA Xi dP if the random variables Xi are P-integrable over
A and ai denote constants. A

If X is P-integrable and Ai E :F is a partition of Q, then ([40], Section 3.2)

(2.44)

IIf X:=::: 0 a.s., then fA X dP :=::: 0 ([40], Section 3.2).


If X is P-integrable, then X is finite a.s., that is, {w: X(w) = ±oo} is in :F and
has probability zero ([40], Section 3.2).

If Y ::::X a.s. and the integrals f X dP and f Y dP exist ([40], Section 3.2),

LYdP:::: L XdP. (2.45)

Note: Because X- Y 2: 0, we have fA (X- Y) dP 2: 0 by one of the above properties.


The mean value theorem, a P(A) :::; fA X d P :::; b P(A) if a :::; X :::; b a.s., and the
modulus inequality,

IL XdPI::::: LIXIdP, (2.46)

are direct consequences of Eq. 2.45. For example, a P(A) :::; fA X dP follows from
Eq. 2.45 with Y = a and X 2: a a.s. The modulus inequality can be obtained from
Eq. 2.45 with (-lXI, X) and (X, IX!) in place of (Y, X). A
32 Chapter 2. Probability Theory

2.5.2.2 Sequence of random variables


Let Xn, n =
1, 2, ... , andY be random variables defined on a probability
space (Q, F, P). Let A be an event on this space, that is, A E F.

Iflimn~oo Xn = X a.s. and if, for each n,


•IXnl:::;: Y a.s., Y ::=: 0, and fAY dP < oo, or
• IXn I :::;: c a.s. for a positive constant c, or
• Xn ::=: 0 a.s. is an increasing sequence that can take on the value +oo,

then lim { Xn dP
n~oo }A
= }A{ (n~oo
lim Xn) dP = { X dP.
}A
(2.47)

Note: Under the conditions of this theorem we can interchange the limit and the integra-
tion operations. The a.s. convergence of Xn to X means that the numerical series Xn (w)
converges to X(w) as n ~ oo for each w E Q \ N, where N E :F and P(N) = 0
(Section 2.13). An event N with this property is called a null set.
The statements in Eq. 2.47 corresponding to the conditions (1) IXnl :::=:: Y a.s and
fAY dP < oo, (2) IXnl :::=:: c a.s., and (3) Xn ~ 0 a.s. is an increasing sequence with +oo
being an allowed value are referred to as the dominated convergence, bounded conver-
gence, and monotone convergence theorems, respectively. Slightly weaker versions of the
dominated, bounded, and monotone convergence conditions under which Eq. 2.47 holds
can be found in [40] (Section 3.2). i.

If LnfA IXnl dP < oo, then integration can be performed term by term ([40],
Section 3.2)
(2.48)

If Xn ::=: 0 a.s. on A for each n, then ([40], Section 3.2)

{ (liminf Xn) dP:::;: liminf { Xn dP (Fatou's lemma). (2.49)


}A n n }A

Note: Recall that


liminf Xn
n~oo
= n~l
sup inf Xk = lim 1\k>nXk
k~n n~oo -
and

limsupXn
n~oo
= n~l
inf supXk = lim Vk>nXb
k~n n~oo -

where 1\k>nXk and Vk>nXk are increasing and decreasing sequences. Because Vk>nXk =
- 1\k~n (:_Xk), we hmre limsupn~oo Xn = -liminfn~ 00 (-Xn). The above limits of
Xn resembles the limits for sequences of subsets in Q (Eqs. 2.19 and 2.20). Also note that

{VnXn :::5 x} = nn{Xn :::5 x} E :F and {1\nXn > x} = nn{Xn > x} E :F


2.5. Integration and expectation 33

if Xn are random variables on (Q, :F, P). •

If the sequence of random variables Xn is such that IXnl ~ Z, where Z:::: 0 a.s.
and P-integrable over A, then

~ liminf1 XndP
JA{ (liminfXn)dP A
1 1
n n

~ lim sup Xn d P ~ (lim sup Xn) d P (Lebesgue theorem). (2.50)


n A A n

Proof: Since Z is P-integrable, Z is finite a.s. so that P({w E A : Z(w) = +oo}) = 0.


Note that Xn + Z 2: 0 a.s. so that the Fatou lemma yields

/(liminfXn)dP+ JZdP~liminf J XndP+ J ZdP

or j(liminfXn)dP ~ liminfJ XndP since liminf(Xn + Z) = liminfXn + Z. The


last inequality in Eq. 2.50 results from sup{Xn, Xn+l• . .. } = -inf{-Xn. -Xn+l• ... }.
The middle inequality is valid for any numerical sequence, in particular for the sequence
fAXndP . •

Example 2.29: The inequalities in Eq. 2.26 can be obtained from the Lebesgue
theorem in Eq. 2.50 with A = Q, X n = 1An, and An E F. ¢

Proof: Take A = Q and Xn = lAn, An E :F, in Eq. 2.50. Then

f (liminfXn)dP =/(sup inf lAk)dP = P (sup lnk::nAk) = P(liminfAn)


n-+oo n~l k~n n~l n-+oo

because infk~n lAk = lnk>nAk is an increasing sequence so that SUPn>Ilnk>nAk is equal


to Un~ tl nk>n Ak or lun> 1n;>nAk . Similar considerations give the last inequalitY in Eq. 2.26.
The integral fA Xn dP is J~ lAn dP = P(An) so that we have

liminf P(An) ~lim sup P(An)


n n

from the middle inequality in Eq. 2.50. •

2.5.2.3 Expectation

The properties in--Sections 2.5.2.1 and 2.5.2.2 hold for any A E F. If


A = Q, the above integrals become expectations. We give here a list of properties
of the expectation operator that are useful for calculations. It is assumed that all
expectations exist and are finite.
34 Chapter 2. Probability Theory

• The expectation is a linear operator.


• If X :::: 0 a.s., then E[X] :::: 0.
• If Y :::; X a.s., then E[Y] :::; E[X].
• IE[X]I s E[IXI].
• Iflimn--+oo Xn =X a.s., and there exists Z:::: 0 a.s. such that E[Z] < oo and
IXn I S Z, then limn--+oo E[Xn] = E[Iiffin--+oo Xn] = E[X].
• If Ln E[IXn ll < 00, then E [.En Xn] = Ln E[Xn].
Proof: The first four properties follow from Eq. 2.43, the fact that X ~ 0 a.s. implies
fA X dP ~ 0 for A E :F, Eq. 2.45, and Eq. 2.46, respectively. The Lebesgue theorem
gives fA(limXn)dP = limfA XndP if limXn =X. The last property follows from
Eq. 2.48 . •

2.6 The Lq(Q, F, P) space


Let (Q, F, P) be a probability space. Denote by Lq(Q, F, P) the collec-
tion of real-valued random variables X defined on (Q, F, P) such that E[IXI q] <
oo for q :::: 1. If X is in Lq(Q, F, P) we write X "' Lq(Q, F, P) or X "' Lq
provided that there is no confusion about the probability space. The case q = 2
is most relevant for applications because it relates to second moment calculus and
estimation theory discussed later in this and subsequent chapters. The L 2 space
has some notable properties.

• L2 is a vector space.
• L2 is a Hilbert space with the inner product (X, Y) = E[X Y] for X, Y E L2
and the norm II X II Lz= (E[X 2l) 112 , referred to as the L2-norm.
Proof: We need to show that (1) (X, Y) = E[X Y] is an inner product on L2, (2) L2 is a
vector space, and (3) L2 with the metric d(X, Y) = (X- Y, X- Y) 112 =II X- Y IIL 2 is
complete.
The expectation E[X Y] defines an inner product on L2 because
(0, X) = 0, X E L2,
(X, X) > 0, X E L2, X#= 0,
(X, Y) = (Y, X), X, Y E L2,
(X+ Y, Z) =(X, Y) + (Y, Z), X, Y, Z E L2,
A. (X, Y) =(A. X, Y), X, Y E L2,A. E R.
The function d: L2 x L2----+ [0, oo) defined by d(X, Y) =II X- Y IIL 2 is a metric on L2
because
d(X, Y) = 0 if and only if X= Y,
d(X, Y) = d(Y, X) for each X, Y E L2,
d(X, Y) ~ d(X, Z) + d(Z, Y) for each X, Y, Z E L2.
2.6. The Lq (Q, :F, P) space 35

The first condition in the above equation is not satisfied in a strict sense. The condition
holds if we do not distinguish between random variables that differ on a set of probability
zero.
That L2 is a vector space follows from the properties (a) X ~ L2 and ). E lR
imply ). X ~ L2 since E[(). X) 2] = ). 2 E[X2] < oo and (b) X, Y ~ L2 implies
X + Y ~ L2 since E[(X + Y) 2] = E[X 2] + E[Y 2] + 2 E[X Y], E[X 2] and E[Y 2]
are finite, and IE[X Yll < oo by the Cauchy-Schwarz inequality discussed later in this
chapter (Eq. 2.112).
It remains to show that L2 endowed with the metric (X, Y) r+ d(X, Y) =II X-
Y IILz is complete, that is, that any sequence Xn E L2, n = 1, 2, ... , with d(Xn, Xm) -+ 0
as n, m-+ oo is convergent and its limit is in L2. The proof that L2 is a Hilbert space can
be found in [66] (Proposition 4, p. 399). We only show here that, if X, Xn E L2 and
II Xn- X 11£ 2 -+ 0 as n -+ oo, that is, Xn converges in mean square (m.s.) to X, then
II Xn - Xm II Lz-+ 0 as n, m -+ oo, that is, Xn is a Cauchy sequence in the mean square
sense (Section 2.13). Take e > 0 and an index ii such that II Xn- X 11£ 2 < t:/2 and
II Xm- X IIL 2 < ef2 for n, m 2:: ii. This is possible since II Xn- X IIL 2 converges to zero
as n-+ oo. We have II Xn- Xm 11£ 2 :511 Xn- X 11£2 +II Xm- X 11£ 2 < e for n, m 2:: ii,
that is, II Xn - Xm IIL 2 converges to zero as n, m -+ oo. •

Example 2.30: The sequence of spaces L q ( Q , :F, P) is decreasing, that is, L ql ~


Lq for q :=: q'. However, the spaces Lq(Q, :F, tL) corresponding to a measure
space (Q, :F, tL) may not decrease with q. <>

Proof: The first part follows from (E[IXIJ)P :5 E[IXIP], 1 < p < oo, X E L P• de-
rived from HOlder's inequality with Y = 1. This inequality is proved later in this chapter
(Eq. 2.113). Holder's inequality with IXIq in place of X gives {E[IXIql)P :::: E[IXIPq]
j
or (E[IXIql)q q :::: E[IXIq l for p = q' jq 2:: 1. Hence, E[IXIq l < oo implies
1 I I

E[IXIql < oo, that is, Lql s; Lq forq' 2:: q.


Let h : (JR, B, ).) -+ (JR, B, ).) be a measurable function defined by h(w) = 0 for
w < 1 and h(w) = 1/w for w 2:: 1, where ).(dw) = dw is the Lebesgue measure. The
integrals fJR h d). and fJR h 2 d). are equal to +oo and one, respectively. Hence, h is an
element of L2(JR, B, ).) but is not in L1 (JR, B, ).), so that L2 rt. L1. •

Let (Q, F, P) be a probability space and Q a sub-a-field of F. Consider


an m.s. convergent sequence Xn in L2(Q, Q, P). If the m.s. limit X of Xn is in
L2(Q, Q, P), we say that L2(Q, Q, P) is a closed subspace of L2(Q, F, P). The
following theorem, referred to as the orthogonal projection theorem, gives an
essential property of the L 2 spaces.

If Q is a sub-a-field of :F, then for any X E L2(Q, :F, P) there is a unique


X E L2(Q, Q, P) such that ([156], Theorem4.11, p. 84)
II X- X 11£ 2 = min{ll X- Z IIL 2 : Z E L2(Q, Q, P)} and
(X- X, Z) = 0, vz E L2(Q, Q, P). (2.51)
36 Chapter 2. Probability Theory

Note: The random variable X E L2(f2, 9, P) has the smallest mean square error of all
members of L2(f2, 9, P), and is called the best m.s. estimator of X. The second equal-
ity in the above equation can be used to calculate X and shows that the error X - X is
orthogonal to L2(f2, 9, P). A

2.7 Independence
We define independent a-fields and apply this definition to events and ran-
dom variables. We also discuss the Borel Zero-One Law giving the probability of
lim sup of independent events.

2.7.1 Independence of a-fields


Consider a probability space ( Q , :F, P) and a collection of sub-a-fields, :F i,
i E I, of :F.

• If I is finite and

P (nieiA;) = n
ie/
P(Ai), 'v'Ai E :F;, (2.52)

then the a-fields :Fi, i E I, are independent.


• If I is infinite and Eq. 2.52 holds for all finite subsets of I, then the a-fields
:Fi are independent.

Note: The above condition implies that any sub-collection of events A; E :Fi, i E I, must
satisfy Eq. 2.52 since some of the events Ai may coincide with the sample space 0. This
requirement is consistent with Eq. 2.54. If the a-fields :Fi, i E I, are on different sample
spaces, the above independence condition has to be applied on the corresponding product
measure space. A

Example 2.31: The sub-a-fields :Ft and :F2 generated by the collections of events
At = {{1, 2, 3}, {4, 5, 6}} and A2 = {{1, 2, 3}, {4}, {5}, {6}} in Example 2.1 are
not independent. For example, the subset {1, 2, 3} belongs to both :F 1 and :F2,
but the probability P({1, 2, 3} n {1, 2, 3}) = P({1, 2, 3}) = 1/2 differs from
P({1, 2, 3}) P({1, 2, 3}) = 1/4. <>

2.7.2 Independence of events


The definition in Eq. 2.52 applied to events is consistent with the classical
definitions of independence between events.

The events Ai E :F, i E I, of a probability space (Q, :F, P) are independent if


the a-fields a(Ai) generated by these events are independent, where the index
set I may be finite or infinite.
2. 7. Independence 37

Example 2.32: Let A and B be two events in :F. These events are independent if
the a-fields a(A) = {0, A, Ac, Q} and a(B) = {0, B, Be, Q} are independent or,
equivalently, if P(A n B) = P(A) P(B). <>

Proof: The independence of the a-fields a(A) and a(B) requires that P(Ai n Bj) =
P(Ai) P(Bj) for all Ai E a(A) and Bj E a(B) (Eq. 2.52). The resulting non-trivial
conditions of independence are P(A n B) = P(A) P(B), P(Ac n B) = P(Ac) P(B),
P(A n Be) = P(A) P(Bc), and P(Ac n Be) = P(Ac) P(Bc). These conditions are
equivalent with the classical requirement P(A n B) = P(A) P(B) for the independence of
two events (Eq. 2.53). For example, P(AcnB) = P(B)- P(AnB) since A cnB = B\(An
B) so that P(A c n B) = P(A c) P(B) if P(A n B) = P(A) P(B). Similar considerations
apply to show that P(A n Be) = P(A) P(Bc) follows from P(A n B) = P(A) P(B) . •

We give now classical definitions for the independence of two or more


events and for families of events.
Two events, A and B, are said to be independent if

P(A n B)= P(A) P(B). (2.53)

Note: Let A and B be two events such that P(A) > 0 and P(B) > 0. If A and B are
independent, the occurrence of B does not affect the probability of A so that P(A I B) =
P(A) implying P(A n B) = P(A) P(B). An equivalent condition of independence for
the events A and B is P(B 1 A) = P(B). Note that the events A and B = Ac are not
independent because A cannot occur if B is observed. &

A finite collection of events, Ai, i = 1, ... , n, is independent if

n
m

P(Ai 1 n Ai2 n · · · n Aim)= P(Aik) (2.54)


k=l

holds for any subset {i 1, ... , im} C {1, ... , n}.

Example 2.33: The conditions of Eq. 2.54 are essential to assure the indepen-
dence of three or more events. It is not sufficient to require that Eq. 2.54 be
satisfied for the entire collection of events. For example, consider the sample
space Q = {1, 2, 3, 4} with :F given by all parts of Q and a probability measure
P defined by P({1}) = .../2/2- 1/4, P({2}) = 1/4, P({3}) = 3/4- .../ij2, and
P({4}) = 1/4. Let A1 = {1, 3}, A2 = {2, 3}, and A3 = {3, 4} be some events on
(Q, :F). The probability of A1 n A2 n A3 = {3} is P({3}) = 3/4- .../2/2 and
is equal to P(A1)P(A2)P(A3). However, P(A1 n A2) f. P(A1)P(A2) ([106],
p. 2). <>

Example 2.34: Let Si, i = 1, 2, ... , n, be the event that the maximum flow in a
river during year i does not exceed the height of a flood protection system. The
38 Chapter 2. Probability Theory

probability that there will be no flood inn years is P8 (n) = P(n?=l S;). If the
events S; are independent, the reliability of the flood protection system in n years
is P8 (n) = Jl?=l P(S;) so that P8 (n) = pn for the special case P(S;) = p. ¢

The families C; c :F, i = 1, ... , n, are independent, if for any A 1 E


cl •... ' An E Cn. the events AI, ... ' An are independent.

The families Ct. t E T, where T is an arbitrary index set, are independent if Cr.
t E I, are independent families for each finite subset I of T.

The above definitions and the following criterion can be used to prove in-
dependence of a-fields. The criterion uses classes of events forming a rr-system.
A collection C of subsets of n is said to be a rr -system if it is closed to finite
intersection, that is, A, B E C implies A n B E C.

If C; is a non-empty class of events in :F for each i = 1, ... , n, such that (1) Ci


is a rr-system and (2) C;, i = 1, ... , n, are independent, then the a-fields a(C;),
i = 1, ... ,n, areindependent([l51], Theorem4.1.l,p. 92).

Borel Zero-One Law. If A; is a sequence of independent events, then

P(l . A·)_ P(A·. ) _ { 0 ifandonlyif I:; P(A;) < oo, (2 .55)


tmisup 1 - 1 t.o. - 1 ifandonlyif I:; P(A;) = oo.

Proof: For proof see [151] (Proposition 4.5.2, p. 103). We only show that Li P(A;) =
oo implies P(A; i.o.) = 1. Note that, by the definition of the event {A; i.o.} and the
independence of A;,

1- P(A; i.o.) = P(U~ 1 n~j Aj) = P( _lim


J~OO
n~jAj)

00

= _lim P(n~jAj) = .lim n[1- P(A;)].


J~OO J~OO i=j

The inequality 1 - x .:::; e-x, 0 < x < 1, applied for x = P(A;) gives 1 - P(A;) .:::;
exp[-P(A;)] so that 1- P(A; i.o.) .:::; limj~oo e- L~j P(Ai). Because L~j P(A;) =
oo forallj, exp[- L~j P(A;)] is zero so that 1- P(A; i.o.) = 0. •

2.7.3 Independence of random variables

The random variables X;, i E I, defined on a probability space (!:2, :F, P) are
independent if the a-fields a(X;) generated by these random variables are inde-
pendent, where the index set I is finite or not.
2.8. The Fubini theorem 39

Example 2.35: Let X and Y be two real-valued random variables defined on


(Q, :F, P). If the a-fields a(X) and a(Y) are independent, the random variables
X and Y are said to be independent. This definition is equivalent to a classical
definition for the independence for random variables given in Section 2.11.2. ¢
Note: The condition of independence of the events A and B in Example 2.32 is equivalent
to the requirement that the random variables lA and ln be independent. .t.

Example 2.36: Let Xk. k = 1, 2, ... , be a sequence of independent random vari-


ables and CfJk be real-valued measurable functions defined on the real line. The
random variables CfJk o Xk are independent. For example, the collection of func-
tions sin(Xk) are independent since the "sin" function is continuous and, hence,
measurable. ¢
Proof: We have rpJ; 1(B) ~ Band XJ; 1(rpJ; 1(B)) ~ :F because (/Jk is a Borel measurable
function and Xk is a random variable. Because XJ; 1(rpJ; 1(B)) ~ XJ; 1(B) and the u-fields
XJ; 1(B), k = 1, 2, ... , are independent by hypothesis, the random variables CfJk o Xk are
independent. •

2.8 The Fubini theorem


The definition of the integrals in Eq. 2.42 is also valid for random variables
defined on a product probability space. The Fubini theorem gives conditions under
which the integration on the product space can be performed sequentially.
Let (Qk, :Fk. Pk), k = 1, 2, be two probability spaces and denote their
completions by (Qk. :ik. Pk). Denote by Q = Ql x Qz, :F = :Ft x :Fz, and
P = Pt x Pz the product sample space, a-field, and probability measure. Let
(:it, Pt), (:iz, Pz), and (:i, P) be the completions of (:Ft. Pt), (:Fz, Pz), and
(:F, P), respectively. The following statement is the Fubini theorem.

If (a>t, wz) r-+ X (a>t, wz) is :F-measurable and P-integrable, then


1. X(a>t, ·)is :iz-measurable and Pz-integrable for each a>t E Ql \Nt, Nt E :Ft.
and Pt (Nt) = 0.
2. f 02 X(·, wz) Pz(dwz) is :it-measurable and fit-integrable.
3. The equality

L X(w)i>(dw) = L [L1 2
X(a>t,a>z)Pz(dw 1)] P2 (dw 1) (2.56)

holds. If in addition X is positive and either side of Eq. 2.56 exists and is finite
or infinite, so does the other side and Eq. 2.56 is valid ([40], p. 59).
Note: We have seen in Section 2.2.3 that for any probability space (111, Q, Q) there exists
a complete probability space (111, Q, Q) such that Q c g and Q = Q on Q. Hence,
40 Chapter 2. Probability Theory

the assumption that X is defined on a complete probability space is not restrictive. The
last statement of the theorem considers the case in which X is positive but may not be
?-integrable, that is, the integral fn X dP may not be finite. &

Example 2.37: Let X : ([0, 1] x Q, B([O, 1]) x F) ~ (JR, B) be a measurable


function. The product measure space on which X is defined is endowed with
the product measure A x P, where A denotes the Lebesgue measure and P is a
probability measure on (Q, F). It is common to interpret the first argument of
Xastime. Theintegralx(A,w) = JAls(X(s,w))ds,A E B([O,l]),B E B,
represents the time X(·, w), w E Q, spends in B during a time interval A. The
expectation ofthis occupation time is E[x (A, w)] = fA P(X(s) E B) ds. <>

Proof: The measurable mapping (s, w) r-+ X (s, w) generalizes the time series considered
in Example 2.25 because the index s takes values in [0, 1] rather than in a countable set.
This mapping is a stochastic process (Chapter 3). The function X(·, w), called the sample
path w of X, is defined on [0, 1] for each w.
The indicator function, 1s : (JR, B) --+ ({0, 1}, K), is measurable because B E B
and K = {0, {0, 1}, {0}, {1}} so that

1s oX: ([0, 1] x n, B([O, 1]) x :F) --+ ({0, 1}, K)

is measurable. The expectation of the occupation time is

E[x(A, w)] = £ x(A, w)P(dw) = £[£ 1s(X(s, w)) ds J P(dw)

= L[£ 1s(X(s, w)) P(w)J ds = L P(X(s) E B)ds,

where we have used Fubini's theorem ([151], Example 5.9.1, p. 153). •

Example 2.38: If X is a positive random variable, its expectation can be calcu-


latedfromE[X] = ho,oo) P(X > x)dx. <>

Proof: The mapping (x, w) r-+ 1{X(w)>x) is measurable from ([0, oo) x Q, B([O, oo)) x:F)
to ({0, 1}, K), where K = {0, {0}, {1}, {0, 1} }. We have

{ P(X > x) dx = { [{ 1{X(w)>x} P(dw)] dx


J[O,oo) Jro,oo) Jn
= { [{ 1{X(w)>x} dx] P(dw) = { X(w) P(dw) = E[X]
Jn J[O,oo) Jn

by Fubini's theorem and the equality f[O,oo) 1{X(w)>x) dx = f[O,X(w)) dx = X(w). •


2.9. Radon-Nikodym derivative 41

2.9 Radon-Nikodym derivative


We will primarily use the Radon-Nikodym derivative to improve the effi-
ciency of the Monte Carlo simulation (Section 5.4.2).

Let (0, :F) be a measurable space and f.L, v : 0-+ [0, oo] be two measures on
this space. If JL(A) = 0, A E :F, implies v(A) = 0, we say that v is absolutely
continuous with respect to f.L and indicate this property by the notation v « f.L.
If v « f.L and f.L « v, v and f.L are said to be equivalent measures.

Example 2.39: Consider a measure space ( 0, :F, f.L) and a measurable function
h: (0, :F)~ ([0, oo), B([O, oo))). The set function

v(A) = l h df.L, A E :F, (2.57)

is a measure that is absolutely continuous with respect to f.L. <>


Proof: The set function v is positive by definition. This function is countably additive
since, for any disjoint sets An E :F, n = 1, 2, ... , we have

v(U~l An) = 1 00
Un=lAn
h df-1, = Jf n=l
h lAn df-1,

= fJ h lAn df-1, =f L h df-1, = f v(An).


n=l n=l n n=l
J
The above term by term integration is valid whether I;~ 1 h lAn df-1, is or is not finite
since h is positive. The measure v is absolutely continuous with respect to f1, because

v(A) = { h df-1, = { h lA df-1, ::0 sup (h(w)) fl,(A)


}A }g wEA
so that fl,(A) = 0 implies v(A) = 0 with the usual convention 0 · oo = 0
We also note that if f1, is a-finite so is v, that is, there exists a sequence of disjoint
events B j E :F such that v(B j) < oo for every j and v(B) = I;~ 1 v(B n B j ), B E :F.
Take Bj ={wE !J: j -1 ::0 h(w) < j} E :F and note that Vj(A) = fA(h lBj)df-1, is
finite for each j and A E :F such that fl,(A) < oo. Also, Vj(A) is bounded by j fl,(A). The
monotone convergence theorem for the sequence of measurable, increasing, and positive
functions Xn = LJ=l h lBj gives (Eq. 2.47)

lim
n~oo }A
{ Xndf-1, { (lim
= }A n~oo
Xn) df-1,.

The left and the right sides of this equation are

lim tJhlB.df-1,=-f { hlBdfl,=i:vj(A) and


n~oo A 1 JA 1
j=l . j=l j=l

Lc;hlsj) d~ ~ •(A),
42 Chapter 2. Probability Theory

respectively, so that v is a- finite. •

The Radon-Nikodym theorem can be interpreted as the converse of Ex-


ample 2.39. The statement of this theorem is given in Eq. 2.58 without proof.

If J-t and v are a -finite measures on a measurable space (!>2, :F) such that v « J-t,
then there exists a measurable function
dv
h = - : (!>2, F)-+ ([0, oo), B([O, oo))), (2.58)
d~-t

called the Radon-Nikodym derivative of v with respect to f.t, such that Eq. 2.57
holds ([66], Theorem 18, p. 116).

Example 2.40: Consider a probability space (!>2, :F, P), a partition A; E :F,
i = 1, .. . ,n,ofr>2 suchthatP(A;) > O,andarandomvariableX = L:7= 1 x; 1A;
defined on this space, where x; E R Let Q be a probability measure on (!>2, :F)
such that Q(A;) > Oandh; = P(A;)/Q(A;), i = 1, .. . ,n. Denote expectations
under the probability measures P and Q by E p and E Q, respectively. The expec-
tations Ep[X] and EQ[X h] of the random variables X and X h with respect to
the measures P and Q coincide. <>
Proof: The expectation of X h under Q is

h;) Q(A;) = L
n n ( P(A;))
EQ[X h] = ~)x; x; ~ Q(A;) = Ep[X].
i=l i=l Q( ,)

Since both P and Q are zero only on the impossible event 0, the function h is measurable
and Q is absolutely continuous with respect to P on the a -field a (A;, i = 1, ... , n). The
function his the Radon-Nikodym derivative of P with respect to Q. A

Example 2.41: Let X "' L 1 (!>2, :F, P) be a random variable. If Q is a probability


on a measurable space (!>2, :F) such that P « Q, we have

Ep[X] =I =I X~~ =I
X dP dQ X hdQ = EQ[X h], (2.59)

where h = d P1d Q is the Radon-Nikodym derivative. This result extends our


discussion in Example 2.40 to an arbitrary random variable. The importance sam-
pling technique uses Eq. 2.59 to improve the efficiency of the Monte Carlo solu-
tion (Section 5.4.2). <>

2.10 Random variables


Consider a probability space (!>2, :F, P) and a real-valued random variable
X defined on this space (Section 2.4). This section defines the distribution, den-
sity, and characteristic functions of X, and illustrates the use of these functions in
calculations.
2.10. Random variables 43

2.10.1 Distribution function


The cumulative distribution function or the distribution function of a ran-
dom variable X is defined by the mapping X : Q ---+ ffi. and the probability mea-
sure P.
The distribution function of X is

F(x) = P(X- 1 ((-oo,x])) = P({w: X(w) S x}) = P(X S x). (2.60)

Note: F is the probability induced by X on (R, B) and constitutes a special case of the
distribution in Eq. 2.29 forB = ( -oo, x] and h = X. The definition is meaningful because
X is F-measurable so that {w: X(w) ~ x} E F. The notation P(X ~ x) used in Eq. 2.60
is an abbreviation for P({w: X(w) ~ x}). &

The distribution function F has the following properties.

• F is a right continuous, increasing function with range [0, 1].


• F can have only jump discontinuities and the set of these jumps is countable.
The number of jumps ofF exceeding£ E (0, 1) is smaller than 1/£.
elimx-HXJ F(x) = 1 and limx-*-oo F(x) = 0.
• P(a < X S b) = F(b)- F(a) 0::: 0, a S b.
• P(a S X< b)= F(b)- F(a) + P(X =a)- P(X =b), aS b.

Proof: Since F is a probability measure, its range is [0, 1]. Also, F is an increasing
function because {X ~ x1} s; {X ~ xz} for x1 ~ xz (Eq. 2.5). That F is right continuous
follows from the continuity of the probability measure and the definition of F. Let {Xn}
be a decreasing numerical series converging to x, Bn = {w: X(w) ~ xn}, and B = {w:
X(w) ~ x}. The sequence of events Bn is decreasing sothatlirun--* 00 Bn = n~ 1 Bn = B
and (Eq. 2.25)

lim F(xn) = lim P(Bn) = P( lim Bn) = P(B) = F(x).


n-+oo n---+oo n---+oo

A function g : I -+ R I c R, is discontinuous at c E I if it is not continuous at


this point. The discontinuity of g at c can be of four _types:
(a) Infinite discontinuity if the left limit limxtc g(x) and/or the right limit limx,j,c g(x) of
g at c is infinity.
(b) Jump discontinuity if the left and right limits of g at c are finite but not equal.
(c) Removable discontinuity if limx-+c g(x) exists and is finite but differs from the value
of gat c.
(d) Oscillating discontinuity if g is bounded and the left and right limits of g at c do not
exist.
F can have only jump discontinuities because it is a bounded, increasing, and right
continuous function. We need to show that F has at most a countable number of jump
discontinuities. Let~ < ~' be two distinct jump points ofF and consider the open intervals
It; = (F(~-), F(~)) and It;' = (F(~'- ), F(()) associated with these jumps. Because the
jumppoints~and~'aredistinct,thereexists~ E (~.~')suchthatF(~) ~ F(~) ~ F(~ 1 -),
44 Chapter 2. Probability Theory

showing that I~g and I~;' are disjoint intervals. The collection of the intervals l~g is countable
since each /1; contains a rational number and the set of rational number is countable.
The sum of all jumps ofF is Ll;eJ[F(~+)- F(~-)] = Ll;eJ[F(~)- F(~-)] ::::
1, where J denotes the collection of jump points of F. Hence, 8 n8 :::: 1 so that n 8 :::: 1/8,
where n8 denotes the number of jumps of F larger than 8.
The third property results from the equalities

lim
x~+oo
P(( -oo, x]) = P(!J) = 1 and lim
x~-oo
P(( -oo, x]) = P(0) = 0.
The fourth property follows from the equality P(B \ A) = P(B) - P(A), which
holds since the event A= {w: X(w) ::::a} is included in B = {w: X(w):::: b} for a:::; b.
The last property holds because the event {a :::: X < b} is the union of the disjoint
events {X = a} and {a < X :::: b} \{X = b} so that the probability of {a :::: X < b} is
P(X = a) + P({a < X :::; b} \ {X = b}), and P({a < X :::; b} \{X = b}) is equal to
P(a < X :::: b) - P(X =b) . •

Example 2.42: Consider the system in Example 2.14 and let F denote the distri-
bution of the system strength X. The system reliability without and with a proof
test at Xpr is Ps(X) = 1 - F(x) and Ps(x) = (1- F(x V Xpr))/(1- F(xpr)).
respectively.
Figure 2.4 shows the system reliability disregarding and accounting for the

Reliability with
0.8
proof test

0.6

0.4 Reliability
without prootI test
0.2
X
pr
00 2 3 4 6
X

Figure 2.4. Reliability with and without proof test information

information provided by a proof test at x pr = 2 for a lognormal random variable


X = exp(Y), where Y is a Gaussian variable with mean JL = 1 and variance
a 2 = 9 (Eq. 2.68). The figure suggests that the additional information provided
by the proof test can be beneficial in practical applications. ¢

Example 2.43: Let F be the distribution of a random variable X defined on a


probability space (Q, :F, P). The distribution F is continuous at x E lR if and
only if P({X = x}) = 0. ¢
2.10. Random variables 45

Proof: Let {xn} be a positive sequence such thatxn -J, 0 as n--... oo. Define the sequence of
intervals An = {w E r.l : X (w) E (x- Xn, x]} for x E R. Since this sequence is decreasing,
we have P({X = x}) = P (n~ 1 An)= limn--+oo P(An) and P(An) = F(x)- F(xn)· If
F is continuous at x, then F(x)- F(xn) --... 0 as n--... oo so that P({x}) = 0. Conversely,
if P({X = x}) = 0, we have limn--+oo[F(x)- F(xn)l = 0 so that F is continuous at x
since it is right continuous. •

2.10.2 Density function


Consider a distribution function F that is absolutely continuous in IR, that
is, there exists an integrable function f, called a probability density function or a
density function, such that Eq. 2.61 is satisfied.

The density function f is an integrable function such that

F(b)- F(a) = ib f(x)dx, a :S: b. (2.61)

The density function has the following properties.

• j(x) = F'(x) so that f~oo f(~) d~ = F(x).


•f ~ 0 because F is an increasing function.
•f is not a probability measure.
• The area under f is f~oo f (x) dx = 1.

Consider a random variable X defined on a probability space (Q, :F, P) and


a measurable function g : (~,B) -+ (~,B). Then Y = go X is a random variable
defined on (Q, :F, P) (Section 2.4.3).

If X andY =go X are integrable and Q(B) = P(X- 1(B)), B E B, then

E[Y] =In Y(w) P(dw) =In g(X(w)) P(dw) and (2.62)

E[Y] = l g(x) Q(dx) = l g(x) dF(x) = l g(x) f(x) dx. (2.63)

Proof: If g = lB, B E B, then Eqs. 2.62 and 2.63 give E[Y] = P(X E B) = Q(B). If I
is a finite index set and g = LiEf bi lBi, the subsets B; E :F partition JR, and bi are real
constants, then Eqs. 2.62 and 2.63 give LiEf bi P(B;) because the integration is a linear
operator.
If g is an arbitrary positive Borel function, there exists a sequence of simple, in-
creasing, and measurable functions gn, n = 1, 2, ... , converging tog as n --... oo. We
have seen that the expectations of gn (X) in Eqs. 2.62 and 2.63 coincide. The monotone
46 Chapter 2. Probability Theory

convergence theorem shows that the expectations of g(X) in Eqs. 2.62 and 2.63 coincide.
If g is an arbitrary Borel function, it can be represented by g = g+ - g-, where g+ and
g- are positive Borel functions. Because g(X) is integrable and the expectation is a linear
operator, the formulas in Eqs. 2.62 and 2.63 give the same result.
That Eqs. 2.62 and 2.63 give the same result can be indicated by Ep[g o X] =
EQ[g], where Ep and EQ denote expectations under the probabilities P and Q, respec-
tively. We note that the probabilities P and Q live on different measurable spaces, P is
defined on (Q, :F) while Q is the probability measure induced by X on (R, B) (Eq. 2.29).
Generally, the last two formulas in Eq. 2.63 involving Riemann-Stieltjes and Riemann in-
tegrals are used to calculate E[Y]. •

Example 2.44: The expectation of the Cauchy random variable X with density
f(x) = aj[;rr (a 2 + x 2 )], a > 0, x E JR. does not exist.<>
Proof: The expectation of x+ is (Eq. 2.63)

E[X+] = E [lx>O X] = { 00 x 2a dx =~ log(a 2 + x 2 ) IQ"= +oo.


- lo n (a + x 2) 2n

Similarly, E[X-] is infinity. Hence, E[X] is not defined. •

Example 2.45: Let g : (JR, B) ---+ (JR, B) be a measurable function and X be a


random variable defined on a probability space (Q, :F, P). The expectation of
the random variable g(X) can be calculated by the last integral in Eq. 2.63 or
Ef[g(X)] = Eq [g(X) f(X)jq(X)], where q is a probability density function
such that If = {x E lR : f(x) > 0} ~ Iq = {x E lR : q(x) > 0}. The
expectations E f and Eq correspond to f and q, respectively. The requirement on
the subsets Iq and If of lR is consistent with the absolute continuity condition for
the existence of the Radon-Nikodym derivative (Eq. 2.58). <>
Proof: The expectation of g(X) is

E j[g(X)] =!,
It
g(x) f(x) dx =!, Iq
[g(x) f(x)
q(x)
Jq(x) dx,
where the last equality holds since f is zero in Iq \ If. The integral on Iq is well de-
fined because q is strictly positive on Iq, and gives the expectation of the random variable
g(X) f(X)jq(X) with respect to the density function q. •

Example 2.46: Let X be a random variable with density f. It is assumed that the
expectation of X and of all functions of X considered in this example exist and are
finite. Let X;, i = 1, ... , n, be independent copies of X. The classical estimator
X= (ljn) I:7= 1 X; of E[X] has mean E[X] and variance Var[X]/n. The impor-
tance sampling estimator of E[X] is Xis = (1/n) I:7= 1 X; f(X;)/q(X;), where
q is a density such that q(x) = 0 implies f(x) = 0. The mean and variance of
Xis are E[X] and Var[X f(X)/q(X)]jn, respectively.
The success of the importance sampling technique depends on the func-
tional form of q. For example, E[X] = 1/2 and Var[X] = 0.05 for a random
2.10. Random variables 47

variable X with density f(x) =


6x (1-x) 1 [O,IJ(X). The variance of Xis is 0.0929
and 0.0346 for q(x) = 1 [O,IJ(x) and q(x) = (20/7) x 1[o,0.7l + [2- (20/3)(x -
0.7)] 1(0.7,1]• respectively. This shows that Xis may or may not be superior to X
depending on q. ¢

Note: The importance sampling technique in this illustration is a special case of Exam-
ple 2.45 for g(x) = x. If we set q(x) = x f(x)/ E[X], the variance of Xis is zero. How-
ever, this estimator cannot be used because it depends on E[X], that is, on the parameter to
be estimated. A

2.10.3 Characteristic function


Let X be a real-valued random variable with distribution F and density f.
Define acomp1ex-valuedfunction g(x, u) = exp (.J=I ux), u, x E R

The characteristic function of X is

cp(u) = E [eAux] = k eAux dF(x) = k eAux f(x) dx. (2.64)

Note: We define the expectation for real-valued random variables and eA u X is complex-
valued. The expectation of eA u X is complex-valued with real and imaginary parts
E[cos(u X)] and E[sin(u X)], respectively. A

Example 2.47: The characteristic function exists for any random variable. For
example, the characteristic function of the Cauchy random variable in Exam-
ple 2.44 is cp ( u) = exp( -a Iu I), u E R We have seen that the expectation of
this random variable does not exist.
The moment generating function, m(u) = E[exp(u X)], u E R is also
used in calculations. However, m(·) may not be bounded, for example, the mo-
ment generating function of a Cauchy random variable. ¢

2.10.3.1 Properties

• lcp(u) I :::: cp(O) = 1.


• cp( -u) = cp(u)*, where z* denotes the complex conjugate of z E C.
• The characteristic function is positive definite.
• The characteristic and the density functions are Fourier pairs.
• cp is uniformly continuous in R
elf X E Lq, then cp E Cq(JR) andcp(k)(O) = (.J=T)k E[Xk], k = 1, ... , q.
48 Chapter 2. Probability Theory

Proof: Eq. 2.64 gives rp(O) = 1, lrp(u)l = IJ e R ux f(x) dxl ::'0 J leR uxl f(x) dx =
1, and rp(u) = rp(-u)*.
Let Z = Lk=lZk exp(.J=T Uk X) be a random variable. Because Z Z* = IZI 2 is
a positive random variable for any Zk E IC and uk E JR,
n n
0 ::'0 E[Z Z*] =L Zk z/ E [eR (uk-uz) X]= L Zk z/ rp(uk- UJ),
k,l=l k,l=l

so that rp is positive definite.


For the fourth property we need to show that

rp(u)= [ eRux f(x)dx and f(x)=-1- [ e-Ruxrp(u)du. (2.65)


lJR 2TC lJR
The first equation is the definition of the characteristic function. The second equation
follows from the fact that, if fJ : lR -+ <C is a continuous, positive definite function such that
{J(O) = 1 and J lfJ(u)l du < oo, then fJ is a characteristic function of a distribution F on
lR that is absolutely continuous with respect to the Lebesgue measure and has a continuous
density f given by ([66], Theorem 14, p. 231)

f(x) = - 1- [ e-Rux {J(u)du.


2 TC lJR
It can also be shown that the characteristic function defines uniquely the density and the
distribution of X ([66], Theorem 3, p. 211) and that

F(x2) - F(xl) = -
1
lim
1'
e-Ruxz _ e-Rux1
,--, rp(u) du, (2.66)
2:rc r---+oo -r -v-lu

where and x 1 and x 2 > x 1 are points of continuity ofF ([142], Theorem 3, p. 12).
To prove that rp is uniformly continuous in JR, we have to show that for an arbitrary
s > 0 there exists 8 > 0 such that lrp(u +h) - rp(u)l < s if lhl < 8 for all u E R The
increment of the characteristic function in ( u, u + h) satisfies the inequality

lrp(u +h)- rp(u)l = JE[eR<u+h)X]- E[eRux]l

::'0 E [JeRux (eRhx _ 1)J] ::'0 E [JeRhx -lJ]


= f JeRhx- 11 dF(x) + [ JeRhx- 11 dF(x)
Jlxl>a Jlxi:'Oa
for any a > 0. The first integral is smaller than 2 P(IXI > a). We can take a sufficiently
large such that P(IXI > a) < sj4. The second integral can be made smaller than s/2 for
the selected value of a by taking lh I smaller than a real 8 > 0 that is independent of u.
Hence, the increment lrp(u +h)- rp(u)l of the characteristic function does not exceed an
arbitrary s > 0 provided that lh I < 8 irrespective of the value of u.
The proof of the relationship between rp(k) (u) = dkrp(u)jduk and moments of X
can be based on the inequalities ([151], p. 298)

eRx- L (v-lx)
n ,--, k lxln+l
<--- and
21xln
<--
k=O k! - (n + 1)! n!
2.1 0. Random variables 49

For example, suppose that X ~ L1 and consider the difference

The latter expectation is finite because

IeR u
xlleHhX -1-FfhXI
h :::0: 21XI E L1

by the second inequality with n = 1 and x replaced by h X. The first inequality gives

eRhx -1- Ffhxl h2 x2 x2


l h
<--=h-----+0
- 2h 2 ,
ash-!- 0,

so that
lirulcp(u + h~- cp(u)- E [n X eRuxJI = 0

by dominated convergence. Hence, we have dcp(u)jdu = E[.J=IxeRuxl so that


cp1 (0) = F f E[X]. Higher order moments can be found in the same way. •
Example 2.48: Let X be a random variable with characteristic function q;. The
characteristic function of a X+ b is ({JaX+b(u) = q;(a u) eHub, where a and b
are constants. <>
Proof: The characteristic function of a X+ b is E[exp(.J=T u(a X+ b))] by Eq. 2.64 or
E[exp(Ff u a X)] exp(Ff u b). •

Example 2.49: Let X be a random variable with the distribution function F (x) =
LiEf Pi 1[xi,oo)(x), where I is a finite index set, Pi :=:: 0 such that LiEf p; = 1,
and {x;} is an increasing sequence of real numbers. The density and the charac-
teristic functions, f(x) = LiEf p; 8(x -Xi) and q;(u) = LiEf p; [cos(u Xi) +
A sin(u x;)], of X are Fourier pairs. The characteristic function consists of a
superposition of harmonics with amplitudes p; and frequencies coinciding with
the locations x; of the jumps ofF. <>

Example 2.50: Let X be a real-valued random variable on a probability space


(Q, :F, P) andY= g(X) = xr, where r :=:: 1 is an integer. Because g(x) = xr
is a continuous function, it is Borel measurable and Y = X r is a random variable.
If X ~ Lr (Q, :F, P), the expectation of Y exists and is finite. The moment of

l: l:
order r of X is

f.L(r) = E[Xr] = xr dF(x) = xr f(x) dx. (2.67)

If the functions g(x) = (x - t.L(1)Y and g(x) = lx- f.LClW are considered,
the resulting expectations define the central moments of order r and absolute
50 Chapter 2. Probability Theory

central moments of order r of X. The moments f.L = JL(1), a 2 = E(X- JL) 2,


v = afJL, Y3 = E[(X- JL) 3 ]ja 3 , and Y4 = E[(X- JL) 4 ]ja 4 are called mean,
variance, coefficient of variation, coefficient of skewness, and coefficient of
kurtosis, respectively. These moments can be calculated by direct integration
(Eq. 2.67) or from cp (r) (0) = ( RY E[Xr]. The positive square root of the
variance is called standard deviation and is denoted by a. <>
Example 2.51: A random variable X is said to be Gaussian with mean f.L and
variance a 2 , in short X ~ N (JL, a 2 ), if it has the density

2
f(x) = -1- exp [ - -1(X-JL)
-- ] , X E JR. (2.68)
,J2ii a 2 a
The skewness and kurtosis coefficients of X are Y3 = 0 and Y4 = 3. The skewness
coefficient is zero because f is symmetric about x = f.L.
The random variable X follows a gamma distribution with parameters k >
0 and A > 0 if it has the density
xk-1 Ak e-A.x
f(x) = r(k) , x:;:: o. (2.69)

The first four moments of this random variable, denoted by X r(k, A), are
JL = kjA, a = kjA
2 2 , Y3 = 2/ -Jk,, and Y4 = 3(1 + 2/ k).
The characteristic functions of the random variables with densities given by
Eqs. 2.68 and 2.69 are

cp(u) = exp ( ,J=l f.L u- ~a 2 u 2 ) (2.70)

1
for X"' r(k, A). (2.71)
cp(u) = (1 - J=T ujA)k
Figure 2.5 shows the density and the characteristic functions of a gamma and a
Gaussian random variable with mean f.L = 1.5 and variance a 2 = 1. Because
these densities are not even functions, their characteristic functions are complex-
valued.<>
Example 2.52: Let N be a discrete random variable taking values in {0, 1, 2, ... }
with the probability
An
P(N = n) = - e-:A, n = 0, 1, ... , (2.72)
n!
where A > 0 is an intensity parameter. The probability measure in Eq. 2.72 defines
a Poisson random variable. The characteristic function of X = a N + b is
(2.73)

where a and b are constants. The random variable X is referred to as Poisson with
parameters (a, b, A). <>
2.1 0. Random variables 51

f(x)

0.5

X
0
-3 -2 -1 0 2 3 4 5 6

9i{!p}
0.5

0
u
-0.5
-4 -3 -2 -1 0 2 3 4

S{<p}
0

u
-1
-4 -3 -2 -1 0 2 3 4

Figure 2.5. The density and characteristic functions of a gamma and a Gaussian
variable with the same mean and variance

Note: The characteristic function of N is (Eq. 2.64)

fPn(u) = E [eAuN] = Loo eAuk ~k e-J... = e-J... Loo (')..eAut


'
k=O k. k=O k.
= exp [ ').. (eA u - 1) J.
This function and fPx(u) = exp(Hub)cpn(ua) give Eq. 2.73, where fPx denotes the
characteristic functions of X. •

Example2.53: Let Nj, j = 1, .. . ,m, be independent Poisson variables with


intensity parameters AJ > 0. The characteristic function of the random variable
X = I:j= 1 x 1 N1 is

(2.74)

where xi are real numbers. <>


Proof: We have cp(u) = E [eA uX] = IT}=I E [eA uxj Nj] by the independence of
the random variables Nj. The result in Eq. 2.74 follows from this expression and Exam-
ple2.52. •
52 Chapter 2. Probability Theory

Example 2.54: Let X be a compound Poisson random variable defined by


N
X =L Yk for N > 0 and X = 0 for N = 0, (2.75)
k=l
where N is a Poisson variable with intensity A. > 0 (Eq. 2.72) and Yk. k = 1, 2, ... ,
are independent copies of Y 1· The random variables Yk are independent of N. The
characteristic function of X is

rp(u) = exp (-A l (1- eR"Y)dFy(y)), (2.76)

where Fy denotes the distribution function of Y 1· <>


Proof: We have

rp(u) = E [eRux 1w:~:l} +e.J=Iux 1(N=O}J = E [eRu 'Lf=l Yk J + P(N = 0)

~E { E [.Q ,v'=T•Y• 1N]) +P(N ~oJ


= L (q~y(u)}k
00

P(N = k) + P(N = 0)
k=l
~ (A. qiy(u))k A
= LJ k! e- = exp[-A. (1- qiy(u))],
k=O
where rpy denotes the characteristic function of Y1. The expression of rp in Eq. 2.76 is an
alternative form of the above result. •

2.10.3.2 Infinitely divisible characteristic function


Let X be a random variable with characteristic and distribution functions rp
and F, respectively.

If there is a characteristic function (/)n for every integer n :=:: 1 such that

rp(u) = (rpn(u)l, u E JR, (2.77)

then rp is said to be infinitely divisible (i.d.).

Note: This definition shows that a random variable X with an infinitely divisible char-
acteristic function has the representation X = L?=l X~n) for each n ~ 1, where x}n),
i = 1, ... , n, are independent identically distributed (iid) random variables with the char-
acteristic function (/Jn. A

Example 2.55: The characteristic function of X ,..... N(f.L, a 2 ) is infinitely divisi-


ble.<>
2.1 0. Random variables 53

Proof: The characteristic functions ({Jn (u) = exp ( J=T JJ, u In - u2 u 2I (2 n)) and <p in
Eq. 2.70 satisfy Eq. 2.77 for each n. Hence, X can be represented by a sum of n independent
Gaussian variables with mean JJ,In and variance u 2 ln. •
Example 2.56: Let N denote a Poisson random variable with intensity A > 0 and
let a, b be some constants (Example 2.52). The characteristic function of aN+ b
is infinitely divisible. ¢
Proof: <fJn(u)= exp (CAin) (eRau -1) + J=T (bin) u) and<p inEq. 2.73 satisfy the
condition <p = (<pn)n for any n :::=: I integer. Hence, a Poisson variable X with parameters
(a, b, A) can be represented by a sum of n independent Poisson variables with parameters
(a, bin, AIn) for each n (Example 2.52). •

Infinitely divisible characteristic functions are encountered in the analysis


of the Brownian motion and other processes with stationary independent incre-
ments introduced in the next chapter and used extensively in this book. We review
here essential properties, methods of construction, and canonical representations
of infinitely divisible characteristic functions.

Properties of i.d. characteristic functions.

If rp is an i.d. characteristic function, then rp has no real zeros, that is, rp(u) =j:. 0
for every u E IR ([124], Theorem 5.3.1, p. 80).
Proof: This property can be used as a criterion for finding whether a particular character-
istic function is not infinitely divisible. For example, the characteristic function in Exam-
ple 2.49 is not infinitely divisible.
If <p is infinitely divisible, there exist characteristic functions ({Jn = <plfn for each
integer n > 0 so that g(u) = liiDn--+oo <fJn(u) = liiDn--+oo(<p(u)) 1fn takes only two values,
zero for <p(u) = 0 and one for <p(u) I 0. Because <p and ({Jn are characteristic functions,
there is an interval I c R containing zero in which both <p and ({Jn are not zero so that
log(<pn(u)) = (lin) log(<p(u)), u E I. The right side of this equation approaches zero as
n --+ oo so that <fJn (u) --+ 1, u E I, as n increases indefinitely.
The function g is a characteristic function as a limit of characteristic functions
([124], Chapter 3), can be either zero or one, and g(u) =
1 for u e I. Hence, g(u) 1 =
everywhere by the continuity of the characteristic function so that <p(u) I 0, Vu E R. •

Finite products of i.d. characteristic functions are i.d. characteristic functions


([124], Theorem 5.3.2, p. 81).
Proof: Let <fJb k = 1, ... , q, be i.d. characteristic functions so that <fJk = (rpk,n)n, n 2: 1,
where rpk,n are characteristic functions. The function rp = Tik=l rpk is a characteristic

[n
function as a product of characteristic functions and

rp(u) = Ii
k=l
rpk(U) = n[(/Jk,n(U)]n =
k=l k=l
rpk,n(U)]n = rpn(u)n

for any n 2: 1, where rpn = Tik=l rpk,n· Hence, <pis infinitely divisible. •
54 Chapter 2. Probability Theory

If cp is an i.d. characteristic function so is lcp I ([124], Corollary to Theorem 5.3.2,


p. 81).

Proof: The function lcp(u)l 2 = cp(u) (cp(u))* = cp(u) cp(-u) is an i.d. characteristic func-
tion by the previous property so that (lcp(u)l 2 ) 1/(2 n) = lcp(u)l 1/n is a characteristic func-
tion for each integer n > 0. •

A characteristic function that is the limit of a sequence of i.d. characteristic


functions is infinitely divisible ([124], Theorem 5.3.3, p. 82).

If cp is an i.d. characteristic function, cp 01 is a characteristic function for each real


number a > 0. The converse is also true ([124], Corollary to Theorem 5.3.3,
p. 82).

Example 2.57: The characteristic function of X ,....., N(O, 1) is cp(u) = e-u 2 12 .


We have seen in Example 2.55 that cp is infinitely divisible. This function has no
real zeros consistent with a property of i.d. characteristic functions. The func-
tion cp(u) 01 = e-u 20112 , a > 0, is also an i.d. characteristic function because it
corresponds to the variable N(O, a), a result consistent with the last property. <>

Construction of an i.d. characteristic function.

If g : lE. -+ C is a characteristic function and p > 0 denotes a real number,

cp(u) = exp {p [g(u)- 1]}, u E JE., (2.78)

is an i.d. characteristic function ([124], Lemma 5.4.1, p 83).

Note: This fact can be used to construct infinitely divisible characteristic functions. For
example, the characteristic function in Eq. 2.73 with b = 0 can be obtained from Eq. 2.78
with p = J... and g(u) = exp(H au), a E JR. .&

A characteristic function cp is infinitely divisible if and only if

cp(u) = lim exp[pn (gn(u)- 1)], (2.79)


n--+oo

where Pn > 0 are real numbers and gn denote characteristic functions ([124],
Theorem 5.4.1, p. 83).

The limit of a sequence of finite products of Poisson type characteristic functions


is an i.d. characteristic function. Moreover, every i.d. characteristic function can
be written as the limit of a sequence of finite products of Poisson type charac-
teristic functions ([124], Theorem 5.4.2, p. 83).
2.1 0. Random variables 55

Proof: We only prove the stated representation of i.d. characteristic functions.


Suppose that q> is an i.d. characteristic function. According to the previous theorem,
we have q>(u) = liiiln--+oo exp[pn (gn(u)- 1)], where Pn > 0 and gn are characteristic
functions. Take gn(u) = f"M. e A ux dGn(x) in Eq. 2.79, where Gn. n = 1, 2, ... , is a
collection of distributions. For a > 0 consider a partition -a = ao :::; a1 :::; · · · :::; am = a
of (-a, a) such that maxi (ai -ai-d --+ 0 as m --+ oo. The function Pn (gn (u) - 1) can
be approximated by

where bk = Pn [Gn(ak)- Gn(ak-I)]. For any a > 0 the last expression is the limit
of a finite product of characteristic functions corresponding to Poisson random variables
(Eq. 2.73). The stated result follows by taking the limit as a --+ oo. •

Canonical representations of i.d. characteristic functions.

Let a E JR. be a constant, (J be a real-valued, bounded, and increasing func-


tion defined on the real line such that 0( -oo) = 0. If a function IP admits the
representation

11
log(qJ(u))=v-wu+ 1("M.
A ux-1-
e- .J=T
1 +x
ux) 1 +x x-dO(x) (2.80)
2 --
2
2

for all u E JR., then qJ is an i.d. characteristic function. The constant a and the
function (J are uniquely determined by qJ ([124], Lemma 5.5.1, p. 85).

Note: The integrand of the integral in Eq. 2.80 is defined by continuity at x = 0 so that it
is equal to -u 2 /2 at this point. A

If IP is an i.d. characteristic function, there exists a sequence of functions ~ n ( u)


such that limn--+oo ~n (u) = log(qJ(u)), u E lR, where

~n(U)=Hanu+ h(
"M.
eA- ux-1- R u x
1 +x
2
1 +x
) -
x
-
2
2 -dOn(X),

an= n { -1 X 2 dFn(X),
J"M. +x
On(X) = n l x

-00
Y
- -2 dFn(y),
1+y
(2.81)

and Fn denotes the distribution function of qJ lfn ([124], Lemma 5.5.3, p. 88).
56 Chapter 2. Probability Theory

Levy-Khinchine representation. A function cp is an i.d. characteristic function


if and only if log(cp(u)) is given by Eq. 2.80. The representation is unique and
the integrand defined at x = 0 by continuity is -u 2 /2 ([124], Theorem 5.5.1,
p. 89).

Note: If the function e has a jump CY 2 = &(0+) - &(0-) at X = 0, the Levy-Khinchine


representation can be given in the form

r> 2 u 2
log(rp(u))=J=lau---+
2
k (
IR\{0}
eR-lux_1- R
1 +x
u
2
x) -+
1 -x2
2 -d&(x)
x
(2.82)
for each u E R because the integrand is -u2 /2 at x = 0 (Eq. 2.80). An alternative form of
this equation is

r>2u2
log(rp(u)) = J=l au- - -+ ~ ( eR-lux - 1- .Pux) d~(x),
2 (2.83)
2 IR\{0} 1 +x

I
where
j:_ 00 l+t d&(y) for x < 0,
~(x) = Y (2.84)
-J
2
X
00 l+y
y2 d&( y ) for x > 0.

The function ~ is defined on R \ {0}, is increasing in (-oo, 0) and (0, oo), satisfies the
conditions~( -oo) = 0, ~(oo) = 0, and the integral fc-t:,t:)\{O} x 2 d~(x) < oo for any s >
0. The representation in Eq. 2.83 is unique and is referred to as the Levy representation
for the i.d. characteristic function rp. A

An alternative version of the Levy-Khinchine representation is

a2 u2
log(cp(u)) = J=T au- - 2-

+{ (eRux- 1- J=T u x(x)) AL(dx), u E R, (2.85)


liR\{0}

where x (y) = -1c-oo,1J(Y) + y 1(-l,l)(Y) + 1[l,oo)(Y) and AL is a Levy mea-


sure, that is, a measure defined on lR \ {0} such that fiR\{O} (y 2 1\ 1) AL(dy) < oo
([66], Theorem 13, p. 299).

Example 2.58: The characteristic functions of the Gaussian, Poisson, and com-
pound Poisson random variables are special cases of the Levy-Khinchine repre-
sentation given by Eq. 2.85. <>
Proof: The Uvy-Khinchine representation in Eq. 2.85 with A£ = 0 gives the characteristic
function of a Gaussian variable (Eq. 2.70). The characteristic function in Eq. 2.85 with
r> 2 = 0 and AL(B) =A 1rn(B) for~ E R \ {0}, A> 0, and BE B, is

log(rp(u)) = J=1 u (a- AxCm +A (eH u ~ - 1)


2.10. Random variables 57

and has the form of the characteristic function for a Poisson variable (Eq. 2. 73). If a 2 =0
and AL(dx) =A dF(x), A > 0, Eq. 2.85 becomes

log(rp(u)) = R au+). [ (eRux- 1- R u x(x)) dF(x)


JlR'.\(0}

=.J=lau-.J=l).u [ x(x)dF(x)+A [ (eRux_l)dF(x)


JlR'.\ {0} JJR>.\ {0}
and resembles the characteristic function of a compound Poisson variable (Eq. 2.76). •

2.10.3.3 a-Stable random variable

A random variable X is a -stable if, for any a i > 0 and independent copies Xi,
i = 1, 2, of X, there exist c > 0 and b such that

(2.86)

that is, c (a1 X 1 + az Xz) + b and X have the same distribution.


Note: Because a1 X1 + az Xz and (X- b)jc have the same distribution, Xb k = 1, 2, are
independent copies of X, and rp(u) = E[exp(.J=-1 u X)] is the characteristic function of
X, Eq. 2.86 gives rp(a1 u) cp(az u) = cp(ujc) exp( - A u bjc).
There are alternatives to the definition in Eq. 2.86. For example, X has an a-stable
distribution if for each n ::: 2 there are two real numbers, en > 0 and dn, such that L?
= 1 Xi
and en X+ dn have the same distribution, where Xi are independent copies of X ([162],
Definition 1.1.4, p. 3). A

We give here two properties of a-stable random variables that are particu-
larly useful for calculations. An extensive discussion on a-stable random variables
is in [161, 162].
The characteristic function of an a-stable variable is infinitely divisible.

Proof: Let cp denote the characteristic function of an a-stable variable X and let X 1, ... , Xn
denote independent copies of this random variable. Because X is an a-stable variable, we
have L?= 1 Xi if:. en X+ dn, where en > 0 and dn are real numbers. This equality gives
[rp(u)]n = cp(cn u) eH u dn or
rp(u) = [cp(ufcn)r e - R u dn/cn

if we replace u with ufcn. We have 'Pn(u) = cp(ufcn)eHudn/(ncn) and rp(u) =


['Pn (u) ]n. Hence, cp is an i.d. characteristic function. •

This property shows that the class of a-stable variables is a subset of the
class of random variables with i.d. characteristic functions. Hence, a-stable vari-
ables have all the properties of random variables with i.d. characteristic functions.
For example, the characteristic functions of a-stable variables have no real zeros.
58 Chapter 2. Probability Theory

The characteristic function of any a-stable variable has the form

rp(u) = exp{ H f.L u- aa /u/a [1 +R {3 sign(u) w(/u/, a)]}, (2.87)

- tan(Jra/2), a -::j:. 1,
where w(/u/, a)= { (2 /.7r) log /u/, a= 1, (2.88)

f.L, a ~ 0, /{3/ :::: 1, 0 <a :::: 2 are real constants ([124], Theorem 5.7.3, p. 102).

Note: The characteristic function of an a-stable random variable can be obtained from
Eqs. 2.83 and 2.84 with

~(u) ={ Ct lxl-a for x < 0,


-c21x1-a for x < 0,

where ct, c2 2::: 0 are real numbers such that ct + c2 > 0 ([124], Theorem 5.7.2, p. 101).
The integrals

1° (eAux
-oo
-1- J=I ux) ___:!:___
1 + x2 lxla+l
and

{
00
( Aux 1 J=I ux) dx
Jo e - - 1 +x2 lxla+l

can be calculated for a < 1, a = 1, and a > 1 and give Eqs. 2.87 and 2.88.
The parameters a, f3, a, and !J,, referred to as stability index, skewness, scale, and
shift or location, respectively, control the distribution type, the departure from a symmet-
ric distribution about !J,, the range of likely values, and the shift from zero. We use the
notation X ~ Sa (a, {3, !J,) to indicate that X is a real-valued a-stable random variable with
parameters (a, a, {3, !J,). The density of an a-stable variable with f3 = 0 is symmetric about
its location parameter !J,. 6

Example 2.59: The characteristic functions of Gaussian and Cauchy variables


with parameters S2(1, 0, 0) and St (1, 0, 0) are shown in the left graph of Fig. 2.6.
The imaginary parts of these characteristic functions are zero. The right graph in
the figure shows the real and imaginary parts of the characteristic function,

1p(u) = exp [-lui ( 1-2 .J=I sign(u) log lu//.7r) J,


of the Cauchy random variable S 1 ( 1, -1, 0). This characteristic function is comp-
lex-valued since {3 -::j:. 0. ¢

2.11 Random vectors


Let (Q, :F, P) be a probability space and

(2.89)
Figure 2.6. Characteristic function of some a-stable random variables

be a measurable function, that is, x- 1 (B) E :F for every Borel set B E Bd. This
function, called a random vector in !Rd or IRd -valued random variable, induces
the probability measure Q(B) = P(X E B) = P(X- 1 (B)), B E Bd, on the
measurable space (JRd, Bd).

2.11.1 Joint distribution and density functions


Let B = xf+ 1 ( -oo, xi] be a rectangle in JRd for Xi E R i = 1, ... , d. This
rectangle is in Bd so that x- 1(B) E :F because X is measurable.

The joint distribution function of X is

F(x) = P(nf= 1 {Xi :S xi}), X= (x1, ... , Xd) E ~. (2.90)

The domain and the range of this function are JRd and [0, 1], respectively.

The following properties ofF result directly from its definition in Eq. 2.90.

• Iimxk--+oo F(x), 1 :::; k :::; d, is the joint density of the JRd- 1 -valued random
variable (X1, ... , Xk-1, Xk+1, ... , Xd).
elimxk--+-oo F(x) = 0 fork E {1, ... , d}.
• The function x k 1-+ F (x) is increasing for each k E { 1, ... , d}.
• The function Xk 1-+ F(x) is right continuous for each k E {1, ... , d}.

If F is such that
f(x) = ad F(x) (2.91)
ax1 ... axd
exists, then f is called the joint density function of X.
60 Chapter 2. Probability Theory

Note: X takes values in the infinitesimal rectangle xf=l (xi, Xi + dxi] with probability
P ( nf=l {Xi E (xi, Xi + dxj]}) ~ f(x) dx. The distribution of one or more coordinates
of X can be obtained from the joint distribution or the joint density of X. For example, the
marginal distribution and marginal density of Xt are Ft (xt) = F(xt. oo, ... , oo) and
ft (xt) = fJRd-1 f(x) dx2 · · · dxa or ft (xt) = dFt (xt)fdxt. respectively. ~
Suppose that the last d2 < d coordinates of a random vector X have been
measured and are equal to z = (zt •... , za2 ). Let x<l) and x< 2) be vectors con-
sisting of the first dt = d- d2 and the last d2 coordinates of X.

The conditional density j01 2) of x<l) given x< 2) = z is

/ (112)( (1) I ) = f(x<l),z) (2.92)


x z f<2)(z) ,

where t<i) denotes the density of X (i), i = 1, 2, and x (1) = (xt •... , xa 1 ).

Note: The definition of the conditional probability P(A I B) in Eq. 2.16 with A= {Xt E
(xt, x1 +dxt ], ... , Xa 1 E (xa 1, xa 1+dxa1]} and B = {Xa1+1 E (zt. Zl +dzt], ... , X a E
(za2 , za2 + dza2 ]} provides a heuristic justification for Eq. 2.92. We can view P(A I B) as
[(f(x<l), z)/j<2)(z))] dx< 1) and gives the probability content of the infinitesimal rectangle
(X}, X} + dxt] X ••• X (xaJ. xaJ + dxaJ] under the condition x< 2) = z. For a rigorous
discussion, see [66] (Section 21.3, pp. 416-417). ~

If X and Y are Jlld -valued random variables and there is a one-to-one corre-
spondence between these variables given by y = g(x) and x = h(y), then the
densities fx and /y of X and Y are related by

/y(Y)
ax· I,
= fx(h(y)) Iay~ x,yer. (2.93)

Proof: The postulated correspondence implies that the Jacobian


9
ay1 ¥v;Yd
a(xt •...• xa)
o(Yl• · · ·, Yd) OXd
W,
Yl ayd
is non-zero and finite everywhere. The Jacobian of the inverse transformation has the same
properties since iJ(xJ ,... ,xd) a J, ... , d) = 1. Let Dx be a neighborhood of x E Ri and
iJ(yJ, ... ,yd) a XJ, ... ,Xd)
Dy = {q E JRd : 11 = g(~). ~ E Dx} denote the image of Dx by the transformation x ~--+
y = g(x). The equality P(X E Dx) = P(Y E Dy) can be written as fvx fx(~)d~ =
fv y /y(q)dq, and implies

{ fx(h(q)) l:t·····~d~~ dq= { /y(q)dq


lvy 171, ..•• TJd lvy
2.11. Random vectors 61

by change of variables. The last equality gives Eq. 2.93.


Suppose now that the mapping y ~---+ x = h (y) does not have a unique solution. Let
{Av} be a partition of the x-space so that the mapping y ~---+ xis unique in each Av. The
probability mass of Dy is equal to the sum Lv fv llvl of the corresponding contributions
in the subsets A v of the x -space, where each term fv IJ v I is equal to the right side of
Eq. 2.93 for the restriction of the mapping y ~---+ x = h(y) to Av. •

Example 2.60: Let X be a real-valued random variable with a continuous density


fx and define Y = X 2 . The density of Y is

/y(y) = (fx(Y1/2) + fx(-y1f2)) /(ly1f2)


for y > 0. <>
Proof: The mapping from y to x is unique on A1 = (-oo, 0) and A2 = (0, oo) and
is given by y ~---+ x = h(y) = -y 112 and y ~---+ x = h(y) = y 112 , respectively. The
contribution of y ~---+ x = -y 112 to /y is fx(-y 112 ) ld(-y 112 )jdyl (Eq. 2.93).
This result can also be obtained by direct calculations from the relationship P(Y :S
y) = P(-y1f2 <X:::; y1f2) . •

Example 2.61: Let X 1 and X2 be real-valued random variables and define Y =


X1 + X2. The density of Y is /y(y) = fJR fx(IJ, y- IJ) d!J. <>

Proof: It is convenient to augment Y to a vector Y E JR.d, which is related to X by a one-


to-one mapping. We take Y1 = X1 and Y2 = Y = X1 + X2 so that 13xif3Yjl = 1 and
!y Ch Y2) = fx (Yb 5'2 - Y1 ). The result follows by integration.
Direct calculations can also be used for solution. From

we can calculate /y by differentiation. •

2.11.2 Independence
Consider a family of random variables X;, i E I, defined on a probability
space (Q, :F, P), where the index set I is finite or infinite. We say that X; are
independent random variables if the a- fields a (X;) generated by these random
variables are independent (Section 2.7.3).

Example 2.62: Let X 1 and X 2 be random variables defined on a probability space


(Q, :F, P). Denote by F and F; the joint distribution of (X 1, X2) and the dis-
tribution of X;, respectively. If X 1 and X 2 are independent, then F (x 1, x2) =
F1 (x1) F2 (x2) for all x1, x2 E R <>
Note: The independence of X1 and X2 implies that the a-fields cr(X1) and cr(X2) gen-
erated by X1 and X2 are independent. Hence, P(A1 n A2) = P(A1) P(A2) for any
A; E cr(X;), i = 1, 2. This property implies F(x1, x2) = F1 (x1) F2(x2) if we take
A;= Xl 1 ((-oo, x;]) E :F, i = 1, 2. _.
62 Chapter 2. Probability Theory

The following equation gives an alternative definition of independence for


random variables. This new definition is equivalent with the above definition of
independence based on the a-fields generated by the coordinates of X ([151],
Corollary 4.2.2, p. 94).

A family of random variables Xi, i E I, is independent if and only if for all


finite J c I ([151], Theorem 4.2.1, p. 94)

P(Xi :::;: Xi, i E 1) = n


ieJ
P(Xi :::;: Xi), Xi E R (2.94)

Note: If I = {1, ... , d} is finite, the condition in Eq. 2.94 becomes P(N/= 1{Xi :::::Xi}) =
flf= 1P(Xi :::::Xi), Xi ER, or F(x) = flf=t Fi(Xi), where x = (x1, ... , Xd), F denotes
the joint distribution of X = (X 1, ... , Xd), and Fi is the distribution of Xi. The indepen-
dence condition F(x) = flf= 1Fi (xi) is also satisfied by all subsets of I = {1, ... , d}. For
example, if we set Xd = oo, this condition applies to the first d - 1 coordinates of X. If the
distributions F and Fi have densities f and fi, Eq. 2.94 implies f(x) = flfeJ fi(xi) . .l

2.11.3 Characteristic function


Consider a Borel measurable function g : (JRd, Bd) ~ (JRtl, Bq) and an
JRd -valued random variable X = (X 1•... , X d) defined on a probability space
(Q, :F, P). The function g(X) is measurable from (Q, :F) to (JRq, Bq) and its
expectation can be calculated from the following formulas.

E[g(X]) = l g(X(w)) P(dw) and (2.95)

E[g(X)] = { g(x) Q(dx) = { g(x) dF(x) = { g(x) f(x) dx. (2.96)


~ ~ ~

Note: That Eqs. 2.95 and 2.96 give the same result can be shown by extending the ar-
guments used to prove the equivalence of Eqs. 2.62 and 2.63. The chain of equalities in
Eq. 2.96 holds because Q(dx) = dF(x) = f(x)dx. If q = 1, lli denote some con-
stants, and g(x) = "L,f= 1ai xi, then the expectation of g(X) is "L,f=t ai E[Xj] showing
that expectation is a linear operator. .l

The formulas in Eqs. 2.95 and 2.96 can be extended to complex-valued


continuous functions of X, for example, we can take g(X) = exp( .J=T uT X),
where U = (u 1, ... , Ud) E JRd and UT denotes the transpose of U.

The joint characteristic function of X is

cp(u) = E [e,;=IuT X] = kd e,;=IuT x dF(x) = Ld e,;=IuT x f(x) dx.


(2.97)
2.11. Random vectors 63

The following properties of cp are useful for calculations.

•lcp(u)l::::: 1 for all u E JRd.

• The joint characteristic and the joint density functions are Fourier pairs .
• If X has independent coordinates, cp(u) = n%=1 cpk(Uk), where cpk is the
characteristic function of X k.
• cp is uniformly continuous.

Proof: The first property results from the definition of the characteristic function since

The characteristic and the density functions of X are Fourier pairs related by Eq. 2.97
and ([62], p. 524)
f(x) = _1_ { e-Rurx rp(u)du. (2.98)
(27l')d ]n:€_d
If the coordinates of the random vector X are independent, the characteristic func-
tion becomes (Eq. 2.97)

rp(u) = E [eR uT XJ = { d
ln:€.
nd

k=l
(eR Uk Xk fk(xk) dxk)

n E[eRukXk] = n rpk(uk),
d d
=
k=l k=l

where rpk(uk) = E[eR uk xk] is the characteristic function of the coordinate k of X. The
Fourier transform of rp(u) shows that the density of X is equal to the product of the densi-
ties of its coordinates. Hence, the above equality provides an alternative way of checking
whether a random vector has independent coordinates.
For the last property we need to show that for any e > 0 there is 8 > 0 such that
II h II< 8 implies lrp(u +h) - rp(u)l < e, where II · II denotes the usual norm in JRd. The
increment of the characteristic function from u to u + h is

lrp(u +h)- rp(u)l = IE [eR (u+h)T X - eR uT XJ I


:S E [lepur X (eRhT X -1)1] :S E [leyCihT X_ 1IJ
= Lc leRhT x- 11 dF(x) + fv leRhT x- 11 dF(x),
where D = xf=l [-a, a] and 0 < a < oo. The integral on Dc is smaller than 2 P(X E
Dc) so that it can be made smaller than e/2 for a sufficiently large a. For this value of a,
the integral on D can also be made smaller than e /2 by choosing an adequate value of 8
since maxxED I exp(.J=T hT x)- 11 converges to zero as II h II->- 0. •
64 Chapter 2. Probability Theory

2.11.4 Moments
Let X be an ~d -valued random variable on a probability space (Q, :F, P)
and consider the function g(x) = Tif=l xt;, where s; ~ 0 are integers. Because g
is continuous, g(X) is a real-valued random variable.

If X "" L 8 , that is, X; "" Ls for each i = 1, ... , d, the moments of order
s= "£1=
1 s; of X exist, are finite, and are given by

{L(SJ, ... , Sd) = E[g(X)] = E [fr x:;].


z=I
(2.99)

Note: The characteristic function can be used to calculate moments of any order of X
provided that they exist. For example,

(2.100)

If the coordinates of X are independent random variables, the moments of X can be


calculated from JL(SJ' ... 'Sd) = rrt=l E[x:i ]. A
Moments of any order of a random vector X can be obtained from Eq. 2.99
by selecting adequate values for the exponents s;. For example,

• Mean of X;: fLi = E[Xi] = fL(SJ, ... , sd), for s; = 1, SJ = 0, j =f. i.


• Correlationof(X;,Xj) :r;,J =E[X;Xj] ={L(SJ, ... ,sd),
for s i = si = 1, Sk = 0, k =f. i, j.
• Covariance of (X;, Xj): c;,J = E[(X;- {L;)(Xj- Jlj)] = r;,J - fLi fLJ·
(2.101)

Note: The relationship between c;,j and r;,j holds by the linearity of the expectation op-
erator. If c;,j = 0 fori f:. j, then X; and Xi are said to be uncorrelated. If r;,j = 0 for
i f:. j, then X; and Xi are said to be orthogonal. If Jli = tL i = 0, the coordinates X; and
Xi of X are uncorrelated if and only if they are orthogonal. A

It is common to list the means, correlations, and covariances of the coor-


dinates of X in vectors and matrices. We view all vectors as column vectors and
denote the transpose of a vector a by aT. The mean vector, the correlation
matrix, and the covariance matrix of X are, respectively,

/L = {tL;} = E[X], r = {ri,J} = E[X XT], and


c = {ci,J} = E[(X- /L)(X- /L)T]. (2.102)
2.11. Random vectors 65

The pair (JL, r) or (JL, c) gives the second moment properties of X. A short
hand notation for these properties is X ~ (JL, r) or X ~ (JL, c). The information
content of (JL, r) and (JL, c) is equivalent (Eq. 2.101).

The correlation coefficient of Xi and X 1 is

(2.103)

The correlation coefficient Pi,J has the following remarkable properties.

•IPi,JI::::: 1.
• Pi, 1 = ± 1 if and only if Xi and X 1 are linearly related.
• If Xi and X 1 are independent, Pi,J = 0. The converse is not generally true.

Proof: These properties can be obtained from classical inequalities discussed later in this
chapter or by direct calculations as shown here. Let Xi = (Xi - Mi)/ui denote a scaled
version of X; with mean zero and variance one. Because E[(Xi ±X j ) 2 ] = 2 (1 ±Pi,}) ~ 0,
we have IPi,J I :::: 1.
If a, b are constants and Xi = a X 1 + b, then Pi,} = ±1 results by direct calcu-
lations. If Pi,} = ±1, we have E[(Xi =F X1) 2] = 0. One of the inequalities in the next
section shows that the last equality implies that xi - Xj = 0 a.s. so that xi and X j are
linearly related a.s. (Example 2.65).
If Xi and X 1 are independent, we have

E[(Xi-Mi)(Xj-Mj)]= { (~-P.,i)(YJ-P.,j)f(~,YJ)d~d'fJ
1JR2
= E[Xi- p.,;] E[Xj- Mj] = 0,

because the joint density f of (Xi, X J) is equal to the product of the densities of Xi and
X J. Generally, the converse of this property is not true. For example, let Xi be a Gaussian
random variable with mean zero and variance one and let Xj = xf. These variables are
uncorrelated because E[Xi (Xj- 1)] = E[Xf- X;]= 0 and Xi has zero odd moments.
However, X; and Xi are not independent since x1 is perfectly known if X; is specified. •

2.11.5 Gaussian vector


Let X be an JRd -valued random variable with mean JL and covariance matrix
c. This vector is Gaussian with mean JL and covariance matrix c if its density and
characteristic functions are given by Eqs. 2.104 and 2.105, respectively.
66 Chapter 2. Probability Theory

f(x) = [(2:n')d det(c)r 112 exp [ -~(x- /L)T c- 1 (x- /L) J and (2.104)

<p(u) = exp (RuT /L- ~uT c u). (2.105)

Note: The functions f and rp are defined on JRlf and are real and complex-valued, respec-
tively. If the covariance matrix c is diagonal, the coordinates of X are not only uncorrelated
but also independent because the joint density of X is equal to the product of the marginal
densities of this vector. Hence, the independence and the lack of correlation are equivalent
concepts for Gaussian vectors.
To indicate that X is an !Rd -valued Gaussian variable with second moment proper-
ties (p,, c), we write X ~ Nd(/L, c) or just X ~ N(p,, c) if there can be no confusion about
the dimension of X. A

Example 2.63: If d = 1, 1-L = 0, and c = 1, X = X is called the standard


Gaussian variable. The density and the distribution of this variable are

¢(x) = -1- exp ( -x2)


..ffii
-
2
and <l>(x) = l~ ¢(y) dy. (2.106)

The standard bivariate Gaussian vector corresponding to d = 2, /L = 0, c 1,1 =


c2,2 = 1, and ct,2 = c2,1 = p has the density

¢(xt,x2;p)= 2 :rrJ11 -p2 exp [ Xf - 2 p XI X2 +


2 ( 1 -p 2)
xi] (2.107)

and the characteristic function

<p(ut,u2;p)=exp - ( ui + 2 p u21 u2 + u~) . (2.108)

Figure 2.7 shows the density and the characteristic functions in Eqs. 2.107 and
2.108 for p = ±0.8. ¢
We give two essential properties satisfied by any Gaussian vector X ,....,
N(/L, c). One of these properties uses the notations x< 1> and x< 2> for the first
dt < d and the last d2 = d - dt coordinates of X, /L (p) = E[X<P>], and
c(p,q) = E[(X(p) - /L(P))(X(P) - /L(P))T], p, q 1, 2.=
• Linear transformations of a Gaussian vector are Gaussian vectors.
• The conditional vector X = x< 1> 1 (X<2> = z) is an JRd' -valued Gaussian
variable with mean and covariance matrices
jl = /L(l) + c<I,2) (c(2,2))-l(z _ /L(2)) and
c = c(l,l) _ c(l,2) (c(2,2))-l c<2,1). (2.109)
2.11. Random vectors 67

Density Density
0.4

0.2

Characteristic function Characteristic function

p = -0.8
0.5

0
5
5 5

-5 -5 -5 -5

Figure 2.7. The density and the characteristic function of the standard bivariate
Gaussian vector with p = 0.8 and p = -0.8

Proof: Let Y =a X +b, where a and bare real-valued (q, d) and (q, 1) constant matrices.
The characteristic function of Y at v E JR.q is

The first term is the characteristic function of X for u =aT v so that (Eq. 2.105)

E [eR vT YJ = exp ( .J=l(aT v)T JL- ~(aT v)T c (aT v)) exp(.J=l vT b)
= exp ( .J=l vT (ap,+b)- ~vT (aT ca)v).
Hence, Y is a Gaussian vector with mean JLy = a JL + b and covariance cy = aT ca.
The properties of X can be obtained by direct calculations based on the density of
X in Eq. 2.104 and the definition of the conditional density (Eq. 2.92). If dt = d2 =
1, JL = 0, q 1 = c2 2 = 1, and c1 2 = c2 1 = p, then X = X1 I (X2 = z) is
c
a Gaussian v~iable with mean fl = ; z and ~ariance = 1 - p2 . If p = 0, that is,
X 1 and X 2 are independent, X and X 1 have the same distribution. Otherwise, X has
a smaller variance than X I· The probabilistic characteristics of the conditional vector X
have useful applications in reliability studies. For example, suppose that X 1 and X 2 control
the performance of a system and we can measure only X2. If the correlation coefficient p
is not zero, the measured value of X 2 can be used to reduce the uncertainty in X 1. •
68 Chapter 2. Probability Theory

Example 2.64: Let X be an ~2 -valued random variable with mean vector p, =


(J.q, f.L2) and covariance matrix c with entries c1,1 =a}, c1,2 = c2,1 = p a1 a2,
and c2,2 =a£. The random variable,

X1 =pal (X2- f.L2) + J.t~,


02

is the optimal, mean square, linear estimator of X 1 corresponding to an observa-


tion of X2. <>
Proof: Let Z = a X 2 + b, a, b E R, be a linear estimator of X 1· We require that the esti-
mator is unbiased and minimizes the mean square error E[(Z- X1) 2]. The first condition
implies E[Z] =ILl so that ILl =a IL2 +band Z =a (X2 - IL2) +ILl· The mean square
error of the estimator Z of X 1 is

E[(Z- X1) 2] = E[(a (X2- IL2)- (X1- IL1)) 2 ] = a2 a-i +a[- 2a pal a2,
and takes its minimum value at a = p a1fa2. This optimal value of a is the solution of
oE[(Z- x 1)2]/aa = o. •

2.12 Useful inequalities


The Chebyshev, Cauchy-Schwarz, HOlder, and Jenssen inequalities are use-
ful for both applications and theoretical considerations. These inequalities are
stated and proved. Let X and Y be random variables defined on a probability
space (Q, F, P).

Chebyshev's inequality. If r : ~ --+ (0, oo) is a strictly positive, even function


that increases in (0, oo) and E[r(X)] < oo, then

P(IXI > a) ::::; E~(~~)], a > 0. (2.110)

Proof: The function r(X) is a random variable because {w: r(X(w)) ~ y}, y ~ 0, is equal
to {w: X(w) E [-r- 1(y), ,- 1(y)]} and X is a random variable. We have

E[r(X)] = { r(X)dP ~
Jn
i IXI::a
r(X)dP ~ r(a) P(IXI >a)

because r > 0 andr(x) ~ r(a) for lxl ~a. The common form of the Chebyshev inequality
corresponds to r (x) = lx iP, where p ~ 1 is an integer. •

Jensen's inequality. If g : ~ --+ ~is a convex function and X and g(X) are
integrable random variables, then

g(E[X]) ::::; E[g(X)]. (2.111)


2.12. Useful inequalities 69

Proof: We use two facts to prove Eq. 2.111. First, convex functions are continuous so that
g(X) is a random variable. Second, a convex function g has the property

g(x) = sup{l(x) : l(u) ::=: g(u), VuE IR},

where l denotes a linear function. We have

E[g(X)] = E[sup{l(X)}]:;::: sup{E[l(X)]} = sup{l(E[X])} = g(E[X]),


where the above inequality and the above last equality hold by Eq. 2.50 and the linearity of
the expectation operator, respectively.
The inequalities,

IE[ X] I :::: E[IXI] and (E[X]) 2q :::: E[X 2q] for an integer q > 0,

follow from Eq. 2.111 with g(x) = lxl and g(x) = x 2 q, respectively. •

Cauchy-Schwarz's inequality . If E[X 2 ] and E[Y 2 ] are finite, then E[X Y]


exists and
(2.112)

Proof: The expectation E[(X +A Y) 2] = E[Y 2 ] A2 + 2 E[X Y] A+ E[X 2] is positive for


any A E I!t Hence, the polynomial of A has no real roots so that we must have (E[X YlP-
E[X 2 ] E[Y 2 ] :S 0 . •

HOlder's inequality. If 1 < p < oo and q is given by 1I p + 1I q 1,


E[IXIP] < oo, and E[IYiq] < oo, then

(2.113)

Proof: The inequality Ia bl ::=: la!P / p + lblq jq holds for a, bE lR and p, q as in Eq. 2.113.
The expectation of this inequality with a = X/(E[IXIP]) 11P and b = Yj(E[iYiq])lfq
gives Eq. 2.113. The HOlder inequality gives E[IXI] ::=: E[IXIP] 11P for Y = 1 and the
Cauchy-Schwarz inequality for p = q = 2.
The inequality E[IXI] ::=: E[IXIP] 11P shows that X~ Lp implies X~ Lt. where
p > 1 is an integer. •

Minkowski's inequality. If 1 :=:: p < oo and X, YELp, then X+ YELp and

(2.114)

Proof: That X, Y E L p implies X +Y E L P follows from the inequality


70 Chapter 2. Probability Theory

If p = 1, the inequality in Eq. 2.114 is trivial. If p > 1, the expectation of


IX+ YIP = IX+ YIIX + Ylp-l :::: (lXI + IYI) IX+ Ylp-l is
E[IX +YIP]:::: E[IXIIX + YIP- 1 ] + E[IYIIX + YIP- 1 ].
The Holder inequality gives the upper bounds
E[IXIP]lfp E[IX + Yl(p-l)q]lfq and E[IYIP]lfp E[IX + Yl(p-l)q]lfq

on the right side terms of the above inequality, where 11p+ 1Iq = 1, so that we have
E[IX +YIP] :::5 (E[IXIP]lfp + E[IYIP] 11P) E[IX + Yl(p-l)q]lfq.

Eq. 2.114results by division with E[IX + Yl(p-l)q]lfq since (p -1)q = p. •


Example 2.65: Consider an JR2 -valued random variable X = (X 1, X2) with the
second moment properties

JL= [ f..tl
f..t2
J and c= [ af
pal a2
pa1a2 ]·
a:}

The above inequalities can be used to prove that the correlation coefficient p takes
values in [-1, 1] and p = ± 1 if and only if X 1 and X 2 are linearly related. <>
Proof: Set X; = (X; -
JL;)IO'i ~ (0, 1). The Cauchy-Schwarz inequality applied to the
random variables XI and x2 gives lpl 2 :::: 1. If xl =a x2 + b, then p =ale so that
p = ±1. If p = 1, the expectation of [XI- X2J 2 is zero so that P(IXI- X2l >e)= 0,
Ve > 0, by the Chebyshev inequality. Hence, X1 = X2 a.s. or X 1 = a X 2 + b a.s. •

2.13 Convergence of random variables


Let X and Xn, n = 1, 2, ... , be real-valued random variables defined on a
probability space (Q, F, P). The distribution functions of X and X n are F and
Fn, respectively. The convergence of the sequence X n to X has various definitions
depending on the way in which the difference between X n and X is measured.
We now define the almost sure convergence (X n ~ X), the convergence in
probability (X n ~ X), the convergence in distribution (X n ~ X), and the
convergence in Lp or the Lp convergence (Xn ~ X), where p ::: 1 is an
integer. The L P convergence for p = 2, called convergence in mean square , is
denoted by Xn ~ X or l.i.m. n-+ooXn =X.

• Xn ~X iflimn-+oo Xn(w) = X(w), Yw E Q \ N, P(N) = 0.


• Xn ~ X iflimn-+oo P(IXn -XI > e) = 0, Ve > 0.
• Xn ~X iflimn-+oo Fn(X) = F(x), Yx E JR.
• Xn ~ X if limn-+oo E[IXn - XiP] = 0.
2.13. Convergence of random variables 71

Note: The a.s. convergence has some equivalent definitions. For example, it can be shown
that Xn ~ X if and only if for any e > 0 we have

lim P(IXn -XI :::; e for all n 0::: m)


m-+oo
=I or

lim P(IXn -XI > e for some n 0::: m)


m-+oo
= 0.

An alternative form of the latter condition is P(IXn - XI > e i.o.) = 0 since


Am (e) = U~m {I X n - X I > e}, Ve > 0, is a decreasing sequence of events, so that

lim P(Am(e))
m-+oo
= P(m-+oo
lim Am(e)) = P(limsup{IXn- XI> e})
n-+oo
= P({IXn- XI> e} i.o.),

where the last equality holds by Eq. 2.21. &

We summarize some useful properties of convergent sequences of ran-


dom variables and the relationships between different types of convergence ([151],
Section 6.3). These relationships are illustrated in Fig. 2.8.

~
a.s.
X b X
n


i
, ,~
On a
subsequence
I
, ,,
,
of every
subsequence , ,,
, If X is dominated
,,
n

pr ,, by some YE L
,,
p
X X
n

!
d
X X
n

Figure 2.8. Types of convergence for random sequences and their relationship
72 Chapter 2. Probability Theory

The convergence X n ~ X implies:


• X is unique with probability one.
• Xn is Cauchy in probability. The converse is also true.
d
•Xn~X.

• Xn ~X if Xn is dominated by a random variable in L P• that is, there


exists YELp and IXnl :5 Y a.s. for n ::=: 1.
Note: A sequence of random variables Xn is Cauchy in probability if for arbitrary 8 > 0
and 11 > 0 there exists n(e, 11) such that P(IXm- Xnl > 8) < 11 form, n ~ n(8, Tj).

The limits of sequences of random variables have similar properties as the limits
of numerical series. For example, Xn ~ X and Yn ~ Y imply a Xn +
fJ Yn ~a
~
X+ fJ Y, where a, fJ E ~and Xn Yn ~X Y.
~

If Xn ~ X andY :::: 0 is a random variable such that IXnl < Y a.s. for each
n, then lXI :5 Y a.s.

Xn ~ X if and only if each subsequence {Xnk} of {Xn} contains a subse-


quence{Xnk;}suchthatXnk; ~X.

IXn ~X implies Xn ~X.


m.p. X · X 1"f r < p.
IX n ~ Implies Xn ~
pr
X and Xn m.r.
~

Note: Proofs of the above statements can be found, for example, in [40] (Chapter 4) and
[150] (Section 6.3). A

Example 2.66: Let ([0, 1], 8([0, 1]), P) be a probability space with P(dw) =
dw and let Xn = 2n 1(0,1/n) be a sequence of random variables defined on this
space. The sequence X n converges to zero in probability as n --+ oo but does not
converge in L P• p :::: 1. ¢
Proof: For any 8 > 0, we have P(IXnl > 8) = P((O, 1/n)) = 1/n ~ 0 as n ~ oo.
Hence, Xn converges in probability to zero. On the other hand,
E[iXniP] = (2n) P((O, 1/n)) = 2n /n ~ oo as n ~ oo.
Hence, convergence in probability does not imply Lp convergence ([151], p. 182). •

Example 2.67: Consider a sequence of random variables X n defined on the prob-


ability space ([0, 1], 8([0, 1]), A.) by X 1 = 1[0.1/21• X2 = 1[1f2,1J, X3 = 1[1,1/31•
X4 = 1[1/3,2/3], Xs = 1[2/3,1], and so on, where A. is the Lebesgue measure. This
sequence converges in L P to zero but does not converge a.s. to zero. ¢
2.13. Convergence ofrandom variables 73

Proof: The sequence converges to zero in Lp. p > 0, since

so that E[IXnJP] -+- 0 as n -+- oo. However, the numerical sequence Xn(w) does not
converge to zero ([151], p. 182). •

Example 2.68: If X n is a sequence of uncorrelated random variables with finite


mean f..t and variance a 2 , then

Sn = Ln Xn- f..t pr
----+ 0, n --* 00. (2.115)
k=1 n
This result is referred to as the weak law of large numbers ([106], p. 36). ¢
Proof: The mean and variance of Sn are J.Ln = 0 and u; = u 2 jn. The Chebyshev
inequality (Eq. 2.110) gives P(ISn I > e) :::; u; je 2 = u 2 f(n e 2 ) for arbitrary e > 0 and
each n 2:: 1. Hence, P(ISnl > e) -+- 0 as n -+- oo so that Sn L 0. This convergence
indicates that most of the probability mass of Sn is concentrated in a small vicinity of zero
for a sufficiently large n. The convergence Sn L 0 does not provide any information
on the behavior of the numerical sequence Sn(w) for an arbitrary but fixed we n. Other
versions of the weak law of large numbers can be found in [151] (Section 7.2). •

Example 2.69: If Xn is a sequence of independent identically distributed random


variables such that E[JX 111 exists and is finite, then

-1 L:xk ~ E[XI],
n
n--* oo. (2.116)
n k=1

This result is known as the strong law of large numbers [151] (Sections 7.4
and 7.5). Figure 2.9 shows five samples of (1/n) E/:= 1 Xk. where Xk are indepen-
dent Gaussian variables with mean zero and variance one. All samples approach
the mean of X 1 as n increases in agreement with Eq. 2.116. ¢
Note: The strong law of large numbers characterizes the behavior of the numerical se-
quence (1/n) I:/:= 1 Xk(w). The a.s. convergence of(l/n) I:Z= 1 Xk top,= E[XI] means
that the numerical sequence (1 In) Lk=
1 Xk (w) converges to p, for each w E n \ N, where
N E F and P(N) = 0. •

Example 2.70: If Xk, k = 1, 2, ... , is a sequence of iid random variables with


finite mean f..t = E [X I] and variance a 2 , then

* 1 ~ Xk- f..t d
Sn = r;; ~ ----+ N(O, 1), n--* oo. (2.117)
vn k=1 a
This result is known as the central limit theorem ([151], Section 8.2). ¢
74 Chapter 2. Probability Theory

-1.5'---'---'---'---'-----'----'----'-----'-----'---___j
1 11 13 15 17 19 21
n

Figure 2.9. Five sample paths of (1/n) Lk=l xk for xk ""'N(O, 1)


Proof: The mean and variance of S~ are zero and one for each n so that this sequence
has the required second moment properties. We show that the characteristic function of .s;:'
approaches the corresponding function of the standard Gaussian variable as n -+ oo. The
characteristic function of s~ is

J
rfJn (u) = exp [ -.J=l u J.t ./fila rp(u/(../fi a))n,

where rp denotes the characteristic function of X 1· An alternative form of this characteristic


function is

,-, ../fi [ v-lu


,-, '- 2
r{Jn(u)=exp { -v-lutt-+n J.t + (J=Tu)
2 +O(n- 3/2 )]}
a -vna 2na

and results from rp(u/(../fi a))n = exp [ln (rp(u/(../fi a))n)] by expanding the function
1n (rp(u/(../fi a))n) in a Taylor series ([79], p. 376). The limit of the characteristicfunction
of S~ as n -+ oo is equal to exp( -u 2/2) which is the characteristic function of the standard
Gaussian variable. •

Example 2.71: Consider a sequence of Bernoulli variables X n taking the values


one and zero with probabilities P(Xn = 1) = Pn and P(Xn = 0) = 1 - Pn.
Pn E (0, 1). If L~l Pn < 00, then P(lirnn-+oo Xn = 0) = 1. ¢
Proof: The assumption I:~ 1 Pn = I:~ 1 P(Xn = 1) < oo and the Borel-Cantelli
lemma yield P((Xn = 1) i.o.) = 0 or
1 = P({(Xn = 1) i.o.}c) = P({limsup(Xn = 1W) = P(liminf(Xn = 0)),
n-+oo n-+oo
as stated ([151], Example 4.5.1, p. 103). •
2.14. Random walk 75

2.14 Random walk


Let X = (X 1 , X 2, ... ) be a sequence of iid random variables defined on
a probability space (Q, :F, P) with values in a measurable space (W, Q). The
sequenceR= (Ro, Rt, ... ) defined by

n
Rn=LXi, n=1,2, ... , and Ro=O (2.118)
i=l

is called a random walk.


The random variables (Rt, ... , Rn) and (X 1, ... , Xn) are related by a mea-
surable mapping for each n 2:: 1 so that (Rt, ... , Rn) is a(XI, ... , Xn)-measur-
able. The random variables (Ro, Rt, ... , Rn) are also a(X 1, ... , Xn)-measurable
because the a-field {0, Q} generated by the random variable Ro : Q ~ lR is in-
cluded in a(Xt, ... , Xn) for each n > 1. We will think of the index n of the
random walk as time.

Example 2.72: Suppose that the random variables Xi in Eq. 2.118 are real-valued
with finite mean p., and variance a 2 and that the measurable space (W, Q) is (lR, B).
The mean, variance, and coefficient of variation of the random walk R n corre-
sponding to these random variables are

y'Var[Rn] v
and c.o.v.[Rnl = = r.;;'
E[Rn] vn

where v = a/ p., and c.o.v.[Rn] are defined if p., -=/= 0.


The variance of the random walk increases linearly with the index n, a prop-
erty shared by the Brownian motion process (Section 3.13). If p., = 0, the random
walk has mean zero and oscillates about its mean value with an amplitude that
increases with n. If p., -=/= 0 and n ~ oo, the average and the coefficient of varia-
tion of Rn approach ±oo depending on the sign of p., and zero, respectively. This
observation suggests that Rn approaches +oo or -oo as time n increases. <>

The heuristic considerations in Example 2. 72 are correct in the sense of the


following two results, which are stated without proof.

If P(X 1 = 0) < 1, the asymptotic behavior of the random walk as n ~ oo is


one and only one of the following ([150], Proposition 7.2.2, p. 562):

• Rn ~ +oo a.s.,
• Rn ~ -oo a.s., or
• - oo = liminf Rn <lim sup Rn = +oo a.s. (2.119)
76 Chapter 2. Probability Theory

If P(X 1 = 0) < 1 and E[IX 1ll < oo, the asymptotic behavior of the random
walk Rn as n ~ oo is ([150], Proposition 7.2.3, p. 563):

• If E[X I] > 0, Rn ~ +oo a.s.


• If E[Xt] < 0, Rn ~ -oo a.s.
• If E[Xt] = 0, -oo = liminfRn < limsupRn = +oo. (2.120)

Note: Eq. 2.119 shows that Rn converges to an infinity or oscillates between -oo and +oo
as n --+ oo. The additional information on the mean of X1 in Eq. 2.120 allows us to predict
the behavior of the random walk more precisely than in Eq. 2.119. For example, if the
random variables Xk take the values one and zero with probabilities p, 0 < p < 1, and
1- p, the corresponding random walk Rn converges to +oo a.s. since E[XI] = p. ~

Example 2.73: Let X 1, X2, ... be a sequence of iid Gaussian random variables
with mean J.t and standard deviation u. Figure 2.10 shows samples of the random
walk Rn in Eq. 2.118 for X 1 ""N(J.t, u 2 ), where J.t = 1, J.t = 0, and J.t = -1 and
1500.--.--,----,----,-----,----,----,----,----,---------,

-1000 '-----'-----'-----'-----'-----'------'------'------'------'----.-J
0 100 200 300 400 500 600 700 800 900 1000

Figure 2.10. Sample paths of the random walk for X k ""N(J.t, 102 ) with J.t = 1,
J.t = 0, and J.t = -1

u = 10. The samples corresponding to J.t = 1 and J.t = -1 exhibit an increasing


and decreasing trend, respectively, while the samples for J.t = 0 oscillate about
zero in agreement with Eq. 2.120. <>
Example 2.74: Consider a sequence of iid random variables An. n = 1, 2, ... ,
defined on a probability space (Q, :F, P). Let Xn be another sequence of ran-
dom variables defined on the same probability space generated by the recurrence
2.14. Random walk 77

formula Xn =An Xn-1, n = 1, 2, ... , with Xo = x E lR \ {0}. Ifln/A1/ "'


L1 (Q., :F, P) and E[ln /A1/l < 0, then Xn converges to zero a.s. as n ~ oo. The
limit
1 n
ALE= lim - 2)n/Ak/ = E[ln/A1/l
n-+oo n k= 1

is called the Lyapunov exponent in the analysis of dynamic systems (Section 8. 7).
The solution Xn is stable a.s., that is, it converges a.s. to zero, if ALE < 0 and di-
verges if ALE > 0. <>

Proof: The recurrence formula for Xn yields Xn/x = 0k= 1 Ak for Xo = x so that we
have ln IXn/xl = Lk= 1 ln IAkl· It follows that ln IXn/xl is a random walk since ln IAkl
are iid random variables (Eq. 2.118). If E[ln IA1IJ < 0, the random walk ln IXn/xl con-
verges to -oo a.s. as n -+ oo (Eq. 2.120) so that the solution IXn/xl = exp (Lk= 1 ln IAk I)
converges to zero a.s. If E[ln IA1Il > 0, ln IXn/xl tends to +oo a.s. as n -+ oo so that
IXnfxl = exp (Lk= 1 ln IAkl) approaches +oo a.s. •

Example 2.75: Suppose that the random variables An in Example 2.74 are expo-
nential with distribution F (x) = 1 - exp(- p x ), p > 0, x :::: 0. The solution X n
is stable a.s. for p > p* ::: 0.56, that is, Xn ~ 0 as n ~ oo. Figure 2.11 shows
the dependence of the Lyapunov exponent ALE on p in the range [0.4, 0. 7]. <>

0.4,--~-~-~-~-~----,

0.3

0.2

0.1

'A o - - - - - - - - - ~I

I
-0.1

.
-0.2

-0.3 I

-0.4'----~-~-~.!...._~-~-__j
tp
0.4 0.45 0.5 0.55 0.6 0.65 0.7
p

Figure 2.11. Stability condition for Xn =An Xn-1 with An iid exponential vari-
ables

Proof: The mean of ln IAk I = ln(Ak) is finite and can be calculated from

ALE= {
h
00
ln(u)pexp(-pu)du = -p1 00

-00
~ exp[-~- pexp(-~)]d~
by numerical integration. The Lyapunov exponent ALE= E[ln IA1IJ decreases with p and
is negative for p > p* ::: 0.56. •
78 Chapter 2. Probability Theory

2.15 Filtration
Let (Q, F) be a measurable space. An increasing collection Fo !:; FI !:;
· · · Fn !:; · · · !:; F of sub-a-fields ofF is said to be a filtration in (n, F).

A probability space (Q, F, P) endowed with a filtration (Fnk::o is called a fil-


tered probability space and is denoted by (Q, F, (Fnk~:O• P). We assume that
Fo contains all the P-null sets of F.
Let (Q, F) and (\11, Q) be measurable spaces, (Fn)n:::o denote a filtration
on (Q, F), and X = (Xo, XI, ... ) be a sequence of measurable functions from
(n, F) to (\11, 9).
The sequence X is said to be adapted to the filtration (Fn)n:::o or Fn-adapted
if Xn is Fn-measurable for each n ::::: 0. The minimal or natural filtration
of X = (Xo, XI, ... ), that is, the smallest a-field with respect to which X is
adapted, is a(Xo, XI. ... , Xn).

Example 2.76: Let X = (X I, Xz, ... ) be outcomes of a sequence of coin tosses.


The a -field Fn = a (X I, ... , Xn) represents the knowledge at time n ::::: 1. The
information content of F n is sufficient to decide whether an event related to the
first n tosses has or has not occurred. For example, the event
A = {at least 2 heads in the first five tosses}
is Fs-measurable because we can decide after five tosses whether A has or has not
occurred. If {tail, tail, head, tail} is a sample (XI(w), Xz(w), X3(w), X4(w))
of the first four tosses, the event A remains undecided so that A ¢ F 4· If
(XI(W), Xz(w), X3(w), X4(w)) is {tail, head, head, tail}, A has been observed
in the first four tosses for this sample irrespective of the fifth outcome. How-
ever, we cannot conclude that A E F4. This conclusion would be correct if we
could tell whether or not A has occurred by watching only the first four outcomes
irrespective of the particular sample of (X I. Xz, X3, X4). <>
Example 2.77: Let Fn = a(RI, ... , Rn). n ::::: 1, and Fo = {0, n} be the natu-
ral filtration of the random walk sequence R = (Ro, RI, ... ) in Eq. 2.118. The
sequenceR is Fn-adapted. <>
Note: The sequence of a-fields :Fn contains the information accumulated by observing the
random walk Rn up to and including time n ~ 0. The a-field :Fn+I includes :Fn and other
sets such that Rn+l = Rn + Xn+I is measurable. Hence, the sequence of sub-a-fields
=
:Fn a(Ro, R1, ... , Rn) is a filtration in (Q, :F) and R is :Fn-adapted. A

2.16 Stopping time


In a game of chance we may decide to quit or continue to play after each
round depending on what happened up to that point and on the available resources,
2.16. Stopping time 79

for example, we may be out of money. Hence, the time T at which we stop
playing the game is a random variable that takes values in {1, 2, ... } and depends
on the entire game history. Let :Fn represent the knowledge accumulated at time
n 2: 1. The event {T = n} to quit the game after n rounds should be in :Fn. A
similar "game" can be envisioned between a physical or biological system and the
environment. The system quits the game when its damage state reaches a critical
level, which is the analogue to running out of cash. Depending on the situation,
the time T at which such a system quits the game may be called failure time or
death time.
We define in this section stopping times and prove some of their properties.
Additional information on stopping times can be found in [59] (Chapter 2) and
[151] (Section 10.7).

Let (Q, :F) be a measurable space and (:Fnk::O be a filtration of :F. A


{0, 1, 2, ... , oo}-valued random variable T defined on (Q, :F) is a stopping
time with respect to (:Fn)n:;::o or an :Fn-stopping time if {w : T(w) :::: n} E :Fn
for each n 2: 0.

Note: This definition states that T is an Fn-stopping time if Fn contains sufficient in-
formation to determine whether T is smaller or larger than n for each time n :=::: 0. For
example, T = inf{n :=::: 0 : \Rn I :=::: a}, a > 0, gives the first time when the random walk R
in Eq. 2.118 exits (-a, a). We can determine whether T:::; nor T > n by observing the
random walk up to time n.
We also note that a constant random variable t E {0, 1, ... } is a stopping time since
{w : t :::; n} is either 0 or Q so that the event {t :::; n} is in Fn for each n :=::: 0. If Tis an
Fn-stopping time so is T + t since {w: T(w) + t:::; n} = {w: T(w) :::; n- t} E F(n-t)vO
and F(n-t)vO C Fn fort :=::: 0. A

IT is a stopping time if and only if {T = n} E :Fn for each n 2: 0.


Proof: If Tis a stopping time, we have {T:::; n} E Fn and {T:::; n- l}c E Fn-1 ~ Fn.
Hence, {T = n} = {T :::; n} n {T :::; n - l}c E Fn since Fn is a a-field.
Suppose now that {T = n} E Fn for each n :=::: 0. Because {T = m} E Fm ~ Fn
form:::; nand {T:::; n} = U:!t=0 {T = m}, we have {T:::; n} E Fn. •

If Tis an :Fn-stopping time,

:Fr ={A E :F: An {T:::; n} E :Fn foralln 2: 0}


={A E :F: An {T = n} E :Fn foralln 2: 0}, (2.121)

is a sub-a-field of :F including events in :F that "occur up to timeT".

Proof: We show first that the collection of sets in the second definition of Fr is a a-field,
and then that the above two definitions of Fr are equivalent.
We have (1) Q n {T = n} = {T = n} E Fn since Tis a stopping time so that Q E
Fr. (2) A E Fr implies AcE Fr since Acn{T = n} = {T = n}n(A n {T = n})c E Fn,
80 Chapter 2. Probability Theory

and (3) Ai E :FT, i = 1, 2, ... , implies U~I Ai E :FT since


(u~I Ai) n {T = n} = u~I (Ai n {T = n}) E :Fn.

Hence, the collection of sets in the second definition of :FT is a cr-field.


We show now that the above two definitions of :FT coincide. Take A E :F and
assume that An {T = n} E :Fn for all n ~ 0. Hence, An {T:::; n} = Uk=I (An {T = k})
is in :Fn. It remains to show that An {T :::; n} E :Fn for all n ~ 0 implies An {T =
n} E :Fn for all n ~ 0. This follows from the equality An {T = n} = (An {T:::: n}) n
(An {T:::; n- 1))C since An {T:::; n} E :Fn and An {T:::: n- 1} E Fn-I £ :Fn. •

If X = (Xo, XI, ... ) is a sequence of functions measurable from (Q, :F) to


(\11, g) and :Fn = cr(Xo, XI, ... , Xn), then the hitting time

T(w) = inf{n 2: 0: Xn(w) E B}, B E g, (2.122)

is an :Fn -stopping time.

Proof: Take B E Q. The random variable T(w) = inf{n ~ 0 : Xn(w) E B} satisfies the
condition {T:::; n} if at least one of the events {Xo E B}, {Xo ¢:. B, ... , Xk-I ¢:. B, Xk E
B}, k = 1, ... , n, occurs, that is,

{T:::; n} = {Xo E B} U (uk=I {Xo ¢:. B, ... , Xk-I ¢:. B, Xk E B})

so that {T :::; n} E :Fn because it consists of finite intersections and unions of events in :Fk
and :Fk £ :Fn fork:::; n. •

IA stopping timeT is FT-measurable.


Proof: We need to show that {T :::; m} is in :FT for each m ~ 0, that is, {T :::; m} n {T :::;
n} E :Fn for all n ~ 0. This condition is satisfied because {T :::; m} n {T :::; n} = {T :::;
m 1\ n} and {T:::; m 1\ n} E :FmAn £ :Fn. •

IIf S and T are Fn-stopping times such that S ::::; T, then :Fs ~ :FT.
Proof: If A E :Fs, then A n {S :::; n} E :Fn for all n ~ 0. We need to show that
An {S:::; n} E :Fn implies An {T:::; n} E :Fn for all n ~ 0. This implication is true since
the sets An {S:::; n} and {T:::; n} are in :Fn, and An {T:::; n} =(An {S:::; n}) n {T:::; n}
holds if S:::; T. •

IIf SandT are Fn-stopping times, then S 1\ T and S v T are stopping times.

Proof: We have

{S 1\ T :::; n} = {S:::; n} U {T:::; n} and {S v T:::; n} = {S:::; n} n {T:::; n}


so that the events {S 1\ T:::; n} and {S v T:::; n} are in :Fn for each n ~ 0 since SandT are
stopping times. •
2.16. Stopping time 81

IIf SandT are Fn-stopping times, then Fsl\r = Fs n Fr.


Proof: Because S 1\ T :::: S, T, we have :Fsl\r c :Fs, :Fr by one of the above properties
so that :Fsl\r c :Fs n :Fr. To prove the opposite inclusion, take A E :Fs n :Fr and note
that, if A E :Fs n :Fr, then
An {S 1\ T :::;: n} =An ({S:::: n} U {T:::: n}) =(An {S:::: n}) U (An {T:::: n})

belongs to :Fn so that A E :FsAr· •

If X = (Xo, X1, ... ) is a sequence of functions measurable from (Q, F) to


(\11, 9), Fn = a(Xo, X1, ... , Xn). and T denotes anFn-stoppingtime, thenXr
is Fr-measurable.
Proof: Take B E 9. Then
{Xr E B} = U~ 0 {T = k} n {Xr E B} = U~0 {T = k} n {Xk E B} E :F
since {T = k} E :Fk C :F and {Xk E B} E :F. It remains to show that Xr is :Fr-
measurable, that is, that A = {X r E B} E :Fr for each B E 9 or A n {T :::: n} is in :Fn for
all n :::: 0. We have
An {T:::;: n} = {Xr E B} n {T :::;: n} = Uk=O{T = k} n {Xr E B}
= Uk=O{T = k} n {Xk E B} E :Fn
since :Fn is a filtration on :F, X is :Fn -adapted, and :Fk s; :Fn fork :::: n. •

If (1) X = (X 1, Xz, ... ) is a sequence of functions measurable from (Q, F)


to (\II, 9) that are iid, (2) P is the probability measure on (Q, F), and (3) T
is an a.s. finite Fn = a(X1, ... , Xn)-stopping time, n 2: 1, then the sequence
Y = (Y1, Yz, .. .), where

Yn(w) = Xr(w)+n(w), n 2: 1, (2.123)

is independent of Fr and has the same distribution as X.


Proof: Because Xk are independent random variables, (Xn+l• Xn+2· .. .), n:::: 1, obtained
from X by translating the time with n units, is independent of the past history, that is, of
the a-fields :Fm, m :::: n. We show that this property is preserved even if time is translated
by a random amount T, provided that Tis a finite stopping time, that is, that the sequence
Y = (Xr+l• Xr+2• .. .) is independent of the a-field :Fr. The Brownian motion and other
processes considered in this book have this property, called the strong Markov property.
We show first that the measurable functions X, Y : (Q, :F) -+ (\11 00 , 9 00 ) have the
same distribution. For B E goo, we have
P(Y E B)= P((Xr+l• Xr+2• .. .) E B)
00

= L P((Xk+I, xk+z, .. .) E B) P(T = k)


k=l
00

= L P((X1, Xz, .. .) E B) P(T = k) = P(X E B).


k=l
82 Chapter 2. Probability Theory

To show that Y is independent of Fy, we need to prove that P(An{Y E B}) = P(A) P(Y E
B), where A E Fr and B E goo. We have

L P(A n {Y
00

P(A n {Y E B}) = E B} 1 T = k) P(T = k)


k=l

=L
00

P(Ak n {(Xk+t· xk+ 2 , .. .) E B})


k=l
00

=L P(Ak) P((Xk+t· xk+ 2 , .. .) E B)= P(A) P(Y E B)


k=l
because the events Ak = An{T = k} and (Xk+l, xk+2· .. .) E Bare independent for each
k and the sequences X and Y have the same distribution. Since A is an arbitrary member
of Fr, we conclude that Y is independent of Fr. •

Example 2.78: Let R be the random walkinEq. 2.118, where Xi areiid Gaussian
random variables with mean f1- and standard deviation a > 0. LetT = inf{n ::::
0 : Rn tj. (-a, a)} be the first timeR leaves (-a, a), a > 0. Figure 2.12 shows
1000 samples of the stopping time T and the corresponding histogram of these
samples for f1- = 0.2, a = 1, and a = 5. <>
0.05,--~-~--------,

Histogram ofT
0.04

0.03

0.02

0.01

200 400 600 800 1000 20 40 60 80 100


Sample number Stopping time

Figure2.12. SamplesandahistogramofT for Rn = L7=t xi, Xt ~ N(0.2, 1),


and a= 5

2.17 Conditional expectation


We have defined the conditional density jCl1 2l of a random vector x<ll
given that another vector X <2l has a known value z (Eq. 2.92). This density can
be used to calculate the expectation

(2.124)
2.17. Conditional expectation 83

oftheconditional vector x<I) I cx< 2) = z), that is, the expectation of x<l) given
the information x<2) = z.
We extend the definition of the conditional expectation E[X (1) I x<2)] in
Eq. 2.124 by considering information more general than X <2) = z. This section
defines the conditional expectation E[· I Q], where (Q, :F, P) is a probability
space and Q is a sub-a-field of :F. The conditional expectation E[· I Q] is needed
in many applications involving stochastic processes, as we will see in this and the
subsequent chapters.

Example 2.79: Consider the experiment of rolling two dice. The sample space,
a-field, and probability measure for this experiment are Q = {w = (i, j) : i, j =
1, ... , 6}, all subsets of n, and P({w}) = 1/36 for w e Q, respectively. Define
the random variables X (w) = i + j and Z (w) = i 1\ j. The expected value of X
is E[X] = (1/36)[(1 + 1) + (1 + 2) + · · · + (6 + 6)] = 7.
Suppose we are told the value of Z and asked to determine the expectation
of X given Z, that is, the expectation of the conditional variable X I Z denoted
by E[X I Z]. The conditional expectation E[X I Z] is a simple random variable
with density in Fig. 2.13 so that

E{E[X I Z]} = L E[X I Z = z] P(Z = z) = (12)P(Z = 6)


z
+ (32j3)P(Z = 5) + · · · + (52/ll)P(Z = 1) = 7 = E[X].
is the expected value of E[X I Z]. <>
0.4

0.35
P(Z=l)
0.3

P(Z=2)
0.25

0.2
P(Z=3)

0.15 P(Z=4)

0.1 P(Z=5)
0.05
P(Z=6)
I
03 4 5 a 7 a 9 10 11 12 13
X

Figure 2.13. The density of E[X I Z]

Proof: If Z = i 1\ j = min(i, j) = 6, there is a single outcome (6, 6) so that X I (Z =


6) = 6 + 6 = 12. The density and the expectation of the conditional random variable
84 Chapter 2. Probability Theory

X I (Z
= 6) are J01 2)(x) = fx1z(x I 6) = 8(x- 12) and E[X I Z = 6] = 12, where 8
denotes the Dirac delta function. The probability of the event {Z = 6} is 1/36. If Z = 5,
there are three equally likely outcomes, (6, 5}, (5, 5), and (5, 6}, so that fx1z(x I 5) =
(1/3)8(x - 10) + (2/3)8(x - 11) and E[X I Z = 5] = (10)(1/3) + (11)(2/3) = 32/3.
The probability of the event {Z = 5} is 3/36.
Note that the random variable E[X I Z] does not have a density. The above expres-
sions of f(l/Z) are formal. They are used here and in some of the following examples for
simplicity. •

2.17.1 a-field generated by a countable partition of Q


Let (n, :F, P) be a probability space and let An be a countable collection
of measurable sets that partition n, that is, UnAn = n, Ann An' = 0 for n f. n',
and An E F. Denote by g the a-field generated by {An}. The members of Q
are unions of sub-collections of {An}. Let X be an integrable random variable on
(n, :F, P).
If P(An) > 0, define the function E[X I Q] : n~ R by

E[X I Q](w) = L E[X I An] 1An (w), where (2.125)


n

E[X I An]= { X(w) PAn(dw) = -1() { X(w) P(dw). (2.126)


Ja P An JAn

Note: E[X I An] is the conditional expectation of X with respect to An and PAn (A) =
P(A I An) = P(A nAn)/ P(An) . .l

The function E[X I Q]: n~ R has thefollowing properties.

• E[X I Q] is (}-measurable.
• E[X I Q] can be viewed as an approximation of X.
• E[X I Q] satisfies the defining relation

iXdP=iE[XIQ]dP, VAeQ. (2.127)

Proof: Since 1An e 9 and E[X I An] are some constants, the conditional expectation in
Eqs. 2.125 and 2.126 is a {;-measurable function. We also note that E[X I 9] is a discrete
random variable on (Q, 9) whose probability density function is f (x) = Ln P (An) 8(x-
E[X I AnD·
Since E[X I An] can be viewed as a local average of X over the subset An.
E[X I 9] provides an approximation for the random variable X. The accuracy of this
approximation depends on the refinement of the partition An. If the partition has a single
element that necessarily coincides with the sample space, E[X I 9] is equal to E[X]. If
2.17. Conditional expectation 85

the sample space is countable and the partition is given by the elements {w} of the sample
space, E[X I Q] is equal to X.
It remains to prove Eq. 2.127. If A coincides with the sample space n, then

{ E[X I Q] dP = E{E[X I Q]} = L E[X I An] P(An) = { X dP = E[X]


ln n ln
because E[X I Q] is equal to E[X I An] on the atoms An of g with probabilities P(An).
If A is an arbitrary element of g, that is, A =
UkeJ Ak is a union of some of the sets
partitioning n, we have (Eqs. 2.44 and 2.126)

1 A
X dP =L 1
keJ Ak
X dP =L
keJ
E[X I Ak] P(Ak) and

{ E[XIQ]dP=L { E[XIQ]dP=L { LE[XIAn]1An(w)P(dw)


h keJhk keJhk n

= L L { E[X I An] P(dw) = L { E[X I Ak] P(dw)


keJ n JAknAn keJ}Ak

=L E[X I Ak] P(Ak)·


keJ

The above chain of equalities is based on properties of integrals of random variables and
Eqs. 2.125 and 2.126. •

Example 2.80: Let (0, :F, P) be a probability space, where n = [0, 1], :F =
B([O, 1]), and P(dw) = dw. Consider a random variable X(w) = 2 + sin(2n w),
w E n, defined on this space. Let At = [0, 1/4), A2 = [1/4, 3/4), A3 =
[3/4, 1), and A4 = {1} be a partition of nand denote by g the a-field generated
by this partition. We have

where x can be any real number. The lack of uniqueness is of no concern since
P(A4) = 0. The conditional expectation will be defined as a class of random vari-
ables that are equal a.s. (Eq. 2.128). Figure 2.14 shows the measurable functions
X and E[X I Q] for x = 3. The conditional expectation E[X I Q] represents an
approximation of X whose accuracy depends on the partition {An} of Q. 0

Eq. 2.126. For example, we have E[X I At] =


for A1. •
m
Proof: The conditional expectation E[X I Q] is given by Eq. 2.125 with E[X 1 An] in
Jd/4 [2 + sin(2n w)] dw = 2(1 + 1/n)

The defining relation (Eq. 2.127) is essential for calculations and reveals
some useful properties of the conditional expectation.
86 Chapter 2. Probability Theory

2.

E[X IO](ro)
1.8

1.6
X(ro)
1.4

1.2

1
0 0.2 0.4 0.6
0)

Figure 2.14. The conditional expectation E[X I Q] of X(w) = 2 + sin(2Jr w) for


At = [0, 1/4), Az = [1/4, 3/4), A3 = [3/4, 1), and A4 = {1}

The integrals fA X dP and fA E[X I Q] dP are equal for every A E Q although


their integrands, X and E[X I Q], are F-measurable and Q-measurable, respec-
tively. These integrals represent local averages of the random variables X and
E[X I Q] over A.

IE[X I Q] is integrable because X is an integrable random variable.


IE[X I Q] is unique up to an equivalence.
Proof: Let hi, i = 1, 2, be two 9-measurable functions satisfying the defining relation
fA X dP = fA hi dP, VA E 9. Hence, ht and hz are two versions of the conditional
expectation. The subset A' = {w: ht (w) > hz(w)} is in 9 since the functions hi are 9-
measurable. The integral fA' (h 1 - hz) d P is zero by hypothesis so that P(A1) = 0 since
ht- hz > 0 on A'. Similar arguments show that the subset A"= {w: ht(w) < hz(w)}
has probability zero so that P({w : ht (w) =I= hz(w)}) = 0 or ht = hz a.s., that is, ht
and hz can differ only on a set of probability zero ([40], Section 9.1). The conditional
expectation E[X I 9] is represented by all9-measurable random variables satisfying the
defining relation in Eq. 2.127. The random variables in this class are equal a.s. and represent
versions of the conditional expectation. •

Example 2.81: Let An = {Z = n}, n = 1, ... , 6, be subsets of the sample space


n of the experiment in Example 2. 79. The a -field g = a (Z) consists of unions
of An's. The conditional expectation, E[X I Z], calculated previously coincides
with E[X I Q] in Eq. 2.125, where X(w) = i + j. 0

Proof: The conditional expectation E[X I 9] is constant on An and is equal to


E[X I Anl· For example, E[X I A6] is equal to (11(1136))(12)(1136)=12 and E[X I As]
is (11(3/36))(11+10+11)(1136)=3213. Hence, the conditional expectations E[X I Z] and
E[X I cr(Z)] coincide. •
2.17. Conditional expectation 87

Example 2.82: Let X be a random variable with distribution function F and finite
mean defined on a probability space (Q, :F, P). If a E ffi. is such that F(a) E
(0, 1), then

J~ 00 xdF(x) fa00 xdF(x)


E[X I Q] = F(a) Ic-oo,a] + 1 _ F(a) 1c-oo,a)C

for Q = {0, Q, A, Ac}, where A= x- 1 ((-oo, a]) E F.<>


Proof: The events A and Ac partition n so that E[X I 9] is given by Eq. 2.126 with

1-
E[X I A ] = -
P(A) A
1
X dP = -1-
F(a) -oo
xdF(x) la and

1-
E[XIAc]=- {
P(Ac) }Ac
XdP= 1
1- F(a) a
1 00
xdF(x).

The expectation of E[X I 9] is


E{E[X I 9]} = E[X I A] P(A) + E[X I Ac] P(Ac) = E[X]

so that the random variables E[X I 9] and X have the same expectation. •

2.17.2 General 0"-fi.eld


Let X be an integrable random variable defined on a probability space
(Q, :F, P) and Q denote a sub-a-field of :F.

The conditional expectation E [X I Q] of X with respect to Q is defined to be


the class of Q-measnrable functions satisfying the defining relation (Eq. 2.127)

I XdP= I E[XIQ]dP, VAEQ. (2.128)

Note: The definition is meaningful because, if E[IXIJ < oo and 9 is a sub-cr-field of :F,
there exists a unique equivalence class of integrable random variables, denoted by E[X I
9], that is 9-measurable and satisfies the defining condition of Eq. 2.128 for all A E 9
([40], Theorem 9.1.1, p. 297). &

The following properties result directly from Eq. 2.128.

• If Q = {0, Q}, then E[X I Q] = E[X].


• If Q = :F, then E[X I :F] = X a.s.
• E{E[X I 9]} = E[X].

Proof: If 9 = {0, Q}, then E[X I 9] is constant on n so that Eq. 2.128 gives E[X] =
E[X I 9] for A = n.
If 9 = :F, then E[X I 9] = E[X I F] is F-measurable and the defining relation
yields fA (X- E[X I :F]) dP = 0 for all A E :F so that X= E[X IF] a.s.
The defining relation gives E{E[X I 9]} = E[X] for A= Q . •
88 Chapter 2. Probability Theory

We list here some properties of the conditional expectation E[X I Q] that


are used frequently in calculations.

E [(X- E[X I Q]) Z] = 0, vz E Q. (2.129) 1

Proof: If Z = lA, A E Q, then Eq. 2.129 holds because E{(X- E[X I Q]) lA} =fA (X-
E[X I Q])dP and this integral is zero by the defining relation. Eq. 2.129 is also valid
for a simple random variable Z = Ln bn iAn, An E Q, by the linearity of expectation.
The extension to an arbitrary random variable Z results from the representation of Z by
a difference of two positive random variables, which can be defined as limits of simple
random variables ([40], Section 9.1). •

The conditional expectation E[X I Q] is the projection of X on Q and


X - E[X I Q] is orthogonal to Q (Eq. 2.129, Fig. 2.15). Moreover, E[X I Q]
represents the best mean square (m.s.) estimator of X with respect to the in-
formation content of Q ([98], Sections 4.3 and 4.4).

E[X lv]
{Z: ZEQ}

Figure 2.15. A geometrical interpretation of conditional expectation

Note: The function (X 1, X2) t-+ (X 1, X2) = E[X 1 X2l is the inner product defined on L2
(Section 2.6). If (X 1, X2) = 0, X 1 and X2 are said to be orthogonal. A random variable
Y E L2(Q, :F, P) is orthogonal to L2(Q, Q, P) if (Y, Z) = 0 for all Z E L2(Q, Q, P).
Eq. 2.129 states that X- E[X I Q] is orthogonal to L2(Q, Q, P). According to the or-
thogonal projection theorem in Eq. 2.51, E[X I Q] is the best m.s. estimator of X given the
information content of g. 4

E[X Z I Q] = Z E[X I Q] a.s., VZ E Q. (2.130) 1

Proof: The random variables E[X Z I Q] and Z E[X I Q] are Q-measurable by the def-
inition of the conditional expectation and the fact that the product of two Q-measurable
functions, the functions Z and E[X I Q], is Q-measurable. If Z = 1~, ~ E Q, Eq. 2.130
holds a.s. because for A E g the left and the right sides of this equation are

{ E[Xl~IQ]dP= { Xl~dP= { XdP and


}A }A JAn~
{l~E[XIQ]dP={ E[XIQ]dP={ XdP,
h hn~ hn~
2.17. Conditional expectation 89

respectively, by the defining relation. Hence, Eq. 2.130 holds for a simple random variable
Z because the conditional expectation is a linear function (Eq. 2.131). The extension to
general random variables results from the representation of an arbitrary random variable by
the difference of two positive random variables, which can be defined as limits of simple
random variables ([40], Theorem 9.1.3, p. 300). •

We now list essential properties of the conditional expectation . These properties


hold a.s., and are similar to the properties of the expectation operator.

If X and Xn are integrable random variables, then ([151], Section 10.3):

• X E 9 implies E[X I 91 =X.


• E[a X 1 + b X2 I 91 = a E[X 1 I QJ + b E[X2 I 91 (linearity) .
• X 1 ~ x2 implies E[X 1 I 91 ~ E[X2 I 9] (monotonicity).
• IE[X I 911 ~ E[IXI I QJ (modulus inequality).
• Xnt (-J,)X implies E[Xn I 91 t (i)E[X I 9]
(monotone convergence).
• IXn I ~ Y, E[Y] < oo, and Xn --?- X imply E[Xn I 91 --?- E[X I 91
(dominated convergence). (2.131)

Cauchy-Schwarz and Jensen inequalities are, respectively,

(E[X Y I 9]) 2 ~ E[X 2 I 9] E[Y 2 I 9] and (2.132)

g(E[X I 9]) ~ E[g(X) I Q] foraconvexfunctiong. (2.133)

If 91 and 92 are sub-a-fields of :F such that 91 C 92, then we can perform the
following change of fields in the conditional expectation

E[X I 911 = E[X I 92] {:=::} E[X I 921 E 91 and (2.134)

E{E[X I 92] I 91} = E[X I 911 = E{E[X I 911 I 92}. (2.135)

Proof: If E[X I 911 = E[X I 92] holds (Eq. 2.134), thedefiningrelationgivesfA X dP =


fA E[X I 9l]dP for any A E 9! and the latter integral is equal to fA E[X I 9z]dP
J J
by hypothesis. Hence, we have A X d P = A E [X I 921 d P for all A E 91, so that
E[X I 9z] is 91-measurable. Conversely, if E[X I 92] is 91-measurable, then
E{E[X I 921 I 9J} = E[X I 9zl by the first statement in Eq. 2.131. We also have
E[X I 911 = E{E[X 19 2 11 9J} by Eq. 2.135.
For Eq. 2.135 take A E 9! C 92 and apply the defining relation in Eq. 2.128 twice,
first for the random variable E[X I 92] and second for the random variable X, that is,

i E{E[X I 9211 9J}dP = i E[X I 92ldP = i XdP = i E[X I 9JldP.


90 Chapter 2. Probability Theory

This gives the first equality of Eq. 2.i35. The second equality of this equation results since
E[X I 9d is 91-measurable and, therefore, 9z-measurable. Hence, E{E[X I 9tll 9z} =
E[X 19tlbythefirstpropertyofEq.2.i31. •

Example 2.83: Let R = (Ro, R1, ... ) be the random walk in Eq. 2.118 and :Fm
be the a- field generated by ( R o, R1 , ... , Rm), m :=: 0. The average of the random
walk at a time n > m given the past history (Ro, R1, ... , Rm), that is, the a-field
:Fm, is E[Rn I :Fm] = Rm + (n- m) E[X 1], where Xk are independent identically
distributed random variables in L 1 . <>
Proof: We have

E[Rn I Fm] =E [Rm + t


k=m+1
Xk I .Fm] = E[Rm I Fm]

+E [ t
k=m+1
Xk I .Fm] = Rm + t
k=m+1
E[Xk]

because the conditional expectation is a linear operator, Rm is .Fm-measurable, and Xk is


independent of Fm fork > m. •

Example 2.84: Let (Q, :F, P) be a probability space, {An} denote a countable
partition of Q such that An E :F for each n, and Q be the a-field generated by
{An}. Then the random variable X - E[X I Q] is orthogonal to Q, where the
conditional expectation E[X I Q] is equal to Ln E[X I An] lAn. <>
Proof: The conditional expectation E[X I 9] is given by Eqs. 2.i25 and 2.i26, where the
members of 9 are UnEJ An} with J c; {1, 2, ... }. It remains to show that X- E[X I 9] is
orthogonal to 9, that is, that E[(X- E[X I 9]) Z] = 0 for all Z E 9. Because Z E 9, we
have Z = Ln ctn iAn so that

n m,n
The above expression is zero since E[X iAn] = E[X I An] P(An) (Eq. 2.i26) and
E[lAm iAn] = P(Am nAn) = 8mn P(An) so that the second summation becomes
Ln E[X I An] ctn P(An) . •

Example 2.85: Let X and Z be random variables in Lz(Q, :F, P). We have seen
that the conditional expectation E[X I a(Z)] = E[X I Z] is the best m.s. esti-
mator of X with respect to the information content of a(Z). The best m.s. linear
estimator of X is
E[X Z] - E[X] E[Z]
+
A

X = E[X] (Z - E[Z]).
E[Z 2] - E[Z] 2
This estimator has the functional form of the conditional Gaussian variable in Ex-
ample 2.64, represents the linear regression of X with respect to Z, and becomes
X = E[X] if X and Z are not correlated.<>
2.17. Conditional expectation 91

Proof: The general form of a linear estimator of X given Z is X = a Z + b, where a, b are


some constants. Hence, X is a (Z)-measurable. We require that X satisfies the defining and
the orthogonality conditions. These conditions give E[X] = E[X] for A = Q (Eq. 2.128)
and (X- X, Z) = 0 (Eq. 2.129) or E[X] =a E[Z] +band E[X Z] =a E[Z 2 ] +b E[Z].
The solution a, b of these equations introduce in a Z + b gives X. The resulting estimator
has the property that its m.s. error E[(X- X) 2] is smaller than the error E[(a Z +b- X) 2]
of any other linear estimator. •

The conditional expectation can be used to generalize the definition of con-


ditional probability in Eq. 2.16. Let (Q, F, P) be a probability space and consider
a sub-a-field gofF.

The conditional probability with respect to g is


P(A I 9) = E[IA I Q], A E F. (2.136)

Note: The definition is meaningful since 1A is an F-measurable function. Because condi-


tional probability is a conditional expectation, P(A I 9) is 9-measurable and

L P(A 19)dP = L E[lA 19JdP = L 1AdP = P(AnA), VA E 9,

where fA E [ 1A I 9] d P = JA 1A d P holds by the defining relation of conditional expec-


tation. A

Example 2.86: Let (Q, F, P) be a probability space and A, B E F such that


P(B) > 0 and P(Be) > 0. The conditional probability in Eq. 2.136 is

P(A n B) P(A n Be)


P(A I Q) = P(B) 1B + P(Be) 1Bc (2.137)

for the a-field g = {0, n, B, Be} generated by B. Hence, P(A I 9) is a random


variable taking the values P(A n B)/ P(B) and P(A n Be)/ P(Be) with prob-
abilities P(B) and P(Be), respectively. This observation is consistent with the
definition of conditional probability in Eq. 2.16. <>

Proof: We have P(A I 9) = E[lA I 9] by Eq. 2.136 and (Eqs. 2.125 and 2.126)
E[lA I 9] = E[1A I B]1B + E[1A I Be] 1Bc
= (-1-
P(B)
{ 1A dP) 1B
fn
+ (P(Be)
-1 - { 1A dP) 1nc.
fnc

If 9 = F, the corresponding conditional probability is P(A I F) = E[1A I F] =


1A, A E F, because 1A is F-measurable. Hence, P(A I F) is equal to one and zero for
w E A and w E A e, respectively. •
92 Chapter 2. Probability Theory

2.18 Martingales
Let (0, :F, P) be a probability space endowed with a filtration (:Fnk~.o
such that :F00 = U~ 1 :Fn ~ :F and let Xn, n = 0, 1, ... , be random variables
defined on (0, :F, P). The sequence X = (Xo, X1, X2, ... ) is referred to as a
discrete time stochastic process or just a stochastic process. The numerical
sequence (Xo(w), X1 (w), ... ), w E 0, is called a sample or sample path of X.
Continuous time stochastic processes are discussed in Chapter 3.

The sequence X is an :Fn-martingale if

1. E[iXnil < 00, n = 0, 1, ... ,


2. X is adapted to :Fn, that is, Xn E :Fn for each n :;:: 0, and
3. E[Xn I :Fm] = Xm, 0 ::S m ::S n. (2.138)

Note: If the equality in Eq. 2.138 is replaced by 2: and :5, then X is said to be an Fn-
submartingale and Fn-supermartingale, respectively. The filtration Fn need not be men-
tioned if there is no confusion about it.
If the random variables Xn are in Lp(rl., F, P), X is called a p-integrable martin-
gale, submartingale, or supermartingale depending on the last condition of Eq. 2.138. If
p = 2, then X is said to be a square integrable martingale, submartingale, or supermartin-
gale. .l

Suppose m :;:: 1 rounds have been completed in a game with unit stake. We
can think of Xn as our total winnings (or losses) after n rounds of a game so that
Xn - Xm, n > m, gives our net total winnings in the future rounds m + 1, ... , n.
The best m.s. estimator of Xn - Xm. n > m, given our knowledge :Fm after
m rounds is E[Xn - Xm I :Fm]. where :Fm = a(X1, ... , Xm). m :;:: 1, and
:Fo = {0, 0}. If X is a martingale, then E[Xn - Xm I :Fm] = 0, that is, our
average fortune E[Xn I :Fm] at a future time n is equal to our current fortune X m·
This game can be generalized by allowing stakes other then one. Let A i be our
stake at game i = 0, 1, ... , where Ao = 0. Because we decide our stake for round
m + 1 on our knowledge :Fm after m games, Am+l is :Fm-measurable. Processes
with this property are said to be predictable processes. The sequence A 1, A2, ...
is called gambling strategy. The summation,
n
Mn =L Ai (Xi - Xi-1), (2.139)
i=l

gives our total winnings after n :;:: 1 games, where X o = 0. The sequence M =
(Mo, M1, ... ) defined by Eq. 2.139 for n:;:: 1 and Mo = 0 is a discrete version of
the stochastic integral considered in Chapter 4. The integrand A i is a predictable
process and the integrator Xi is a martingale.
2.18. Martingales 93

Example 2.87: Consider the random walk in Eq. 2.118. We have shown that

(Ro, R1, ... , Rn) and (RI, ... , Rn)

area(XI, ... , Xn)-measurableforn ~ 1 sothatR = (Ro, R1, R2, ... ) is adapted


to the filtration (Fn)n~O given by Fn =
a(X1, ... , Xn), n =
1, 2, ... , and Fo =
{0, Q}. If the random variables Xi have finite mean, R is an Fn-supermartingale,
martingale, or submartingale depending on the value of E[X I].<>
Proof: It remains to prove the last condition in Eq. 2.138. For n :;:: m :;:: 0, we have
n
E[Rn I Fm] = Rm + L E[Xil = Rm + (n- m) E[XI]
i=m+l
by Example 2.83. If E[XI] is negative, zero, or positive, then R is a supermartingale,
martingale, or submartingale, respectively. •

Example 2.88: Let R = (Ro, R1, ... ) be the random walk in Eq. 2.118. If the
random variables Xi are in L2(Q, .r, P) and have mean zero, the sequence Sn =
R~ = L,7,j=l Xi X j, n ~ 1, with So = 0 is an Fn-submartingale and Sn -n E[XI]
is an Fn-martingale. <>
Proof: The sequences Sn and Sn- n E[XI] have the first two properties in Eq. 2.138. We
also have

E[Sn+l I Fn] = E[(Rn + Xn+I) 2 I Fnl = E[R~ + 2 Rn Xn+I + X~+l I Fn]


= R~ + 2 Rn E[Xn+Il + E[X~+I] = Sn + E[XI] 2: Sn
since Rn is Fn-measurable, Xn+l is independent of Fn, and E[Xn+Il = 0. Hence, S =
(So, S1, ... ) is a submartingale. We also have

E[Sn+l - (n + 1) E[XI] I Fn] = (Sn + E[XID- (n + 1) E[XI] = Sn - n E[XIJ

so that Sn - n E[XI] is a martingale. •

Example 2.89: LetYn = exp(H u Rn)/cpn(u), u E IR, where R = (Ro, R1, ... )
istherandomwalkinEq.2.118andcpn(u) = E[exp(HuRn)],u E IR, denotes
the characteristic function of Rn. Then Yn is a martingale. <>
Proof: The characteristic function of Rn is 'Pn(u) = rp(u)n, where rp denotes the charac-
teristic function of X 1· The random variables Yn are Fn = a (RI, ... , Rn)-measurable for
each n 2: 0 and have unit mean. For n > m, we have

E[Yn I Fm] = _1_ E [eHu(Rm+.E~=m+I xk I Fm]


'Pn(u)

= eHuRm [ ,-, n
E e" -1 u Lk=m+l xk
]
= eHuRm rp(u)n-m = Ym
f{Jn(u) f{Jn(u)
since Rm E Fm and the random variables Xko k::: m + 1, are independent of Fm. •
94 Chapter 2. Probability Theory

2.18.1 Properties
Most of the properties given in this section follow directly from the defini-
tion of submartingale, martingale, and supermartingale processes.

• The expectation of a submartingale, martingale, and supermartingale


X = (X o, X 1 , ... ) is an increasing, constant, and decreasing function of
time.
• X is a martingale if it is both a submartingale and a supermartingale.
• If X is a submartingale, then -X is a supermartingale.

Proof: The expectation of Eq. 2.138 is E{E[Xn I Fm]} = E[Xm], n 2: m 2: 0. We


also have E{E[Xn I Fm]} = E[Xn] (Eq. 2.128) so that E[Xn] is constant. If X is a
submartingale, then E[Xn I Fm] 2: Xm (Eq. 2.138) and the expectation of this inequality
is E[Xn] 2: E[Xm]. The opposite inequality results for a supermartingale. Martingales
model fair games because the average fortune E[Xn] of a player does not change in time.
Submartingales and supermartingales correspond to super-fair and unfair games, respec-
tively.
If X is both a submartingale and a supermartingale, it must satisfy simultaneously
the conditions E[Xn I Fm] 2: Xm and E[Xn I Fm] ::5 Xm for n 2: m, which imply
E[Xn I Fm] = Xm.
The condition E[Xn I Fm] 2: Xm, n 2: m, implies E[(-Xn) I Fm] ::5 (-Xm)
which gives the last property. •

The third condition in Eq. 2.138 can be replaced by

E[Xn+l I Fn] = Xn, n 2: 0. (2.140)

Proof: We have E[Xn+2 I Fn] = E{E[Xn+2 I Fn+d I Fn} = E[Xn+l I Fn] = Xn by


Eq. 2.135 and the fact that X is an Fn-martingale. •

If X= (Xo, X1, ... ) is an Fn-square integrable martingale, the sequence Y =


(Yo, Y1, ... ) defined by Yo= Xo- E[Xo] and Yn = Xn- Xn-1, n 2: 1, has the
orthogonality property

E[Yi YJ] = 0 fori i= j. (2.141)

Proof: We have Yn E L1, Yn E Fn, and

E[Yn+l I Fn] = E[Xn+l - Xn I Fn] = Xn- Xn = 0


since E[Xn+l I Fn] = Xn and E[Xn I Fn] = Xn. For n > m,
E[Yn Ym] = E{E[Yn Ym I Fm]} = E{Ym E[Yn I Fm]} = 0
since Ym E Fm and E[Yn I Fm] = 0. •
2.18. Martingales 95

Doob decomposition. If X = (X o, X 1 , ... ) is an Fn -submartingale, then there


is an Fn-martingale Mn and a process An such that Ao = 0, An t, An E Fn-1.
n ::: 1, and the representation

Xn = An + Mn, n ::: 0, is valid and unique. (2.142)

Proof: The notation An t means An :::: An-l· We say that A is predictable because An
is Fn-1-measurable. The representation in Eq. 2.142 shows that submartingales have a
predictable part A that can be told ahead of time and an unpredictable part M.
We first show that if the representation in Eq. 2.142 exists, it is unique. We have
E[Xn I Fn-11 = E[An I Fn-11 + E[Mn I Fn-11 =An+ Mn-l for n :::: 1. Substituting
Mn-l in this equation with its expression from Eq. 2.142, we obtain the recurrence formula
An = An-l + E[Xn I Fn-ll- Xn-l, which defines A uniquely with Ao = 0.
We now show that the decomposition in Eq. 2.142 exists, that is, that there are
processes A and M with the claimed properties. Let An, n = 0, 1, ... , be defined by the
above recurrence formula with Ao = 0. Note that An E Fn-1 and An :::: An-l since Xn
is a submartingale. Hence, An, n = 0, 1, ... , has the stated properties. We also have

E[Mn I Fn-ll = E[Xn- An I Fn-11 = E[Xn I Fn-ll


- E [An-l + E[Xn I Fn-11- Xn-l I Fn-d = -An-l + Xn-l = Mn-l
so that M is an Fn-martingale. •

Example 2.90: Let Xn, n ::: 0, be a square integrable martingale. Then X~ is a


submartingale with Doob decomposition in Eq. 2.142, where M n = X~ - An and
An= 1:?=1 E [xf- xf-t I Fi-1].

Proof: The process X~ satisfies the first two conditions in Eq. 2.138. Also,

E [X~ I Fn-1] = E [ (Xn- Xn-1) 2 + 2 Xn Xn-l - X~-1 I Fn-1]


= E [ (Xn- Xn-t) 2 I Fn-1] + X~-l :::: X~-1
since E [cxn- Xn-1) 2 I Fn-1 J : : 0, Xn-l is Fn-1-measurable, and Xn is a martin-
gale. Hence, X~ is a submartingale so that it admits the decomposition in Eq. 2.142.
The recurrence formula in the Doob decomposition defining A becomes An = An-l +
E [X~ I Fn-1]- X~_ 1 . •

Let X = (Xo, X1, ... ) be a square integrable Fn-martingale and A =


(Ao, At, ... ) be an Fn-predictable process such that Ao = 0 and E[A~] < oo.
Then Mn in Eq. 2.139 is an Fn-martingale.
Proof: The expectation of IMn I satisfies the condition
n n
L E [IAi II Xi- Xi-11]:::: L (E[AT] E[(Xi- Xi-1) 21)
1/2
E[IMnll::::
i=l i-l
96 Chapter 2. Probability Theory

so that it is finite since Ai and Xi have finite second moments. That Mn E :Fn follows
from its definition and the properties of An and Xn. Recall that An E :Fn-1, n :::: 1, since
A is :Fn-predictable. For n > m, we have

E[Mn I :Fml = E [Mm + t


i=m+1
Ai (Xi - Xi-d I :Fm]
n
= Mm + L E[Ai (Xi - xi-1) I :Fml
i=m+1
n
= Mm + L E {E[Ai (Xi- xi-1) I :Fi-!ll :Fm} = Mm
i=m+1

since Ai E:Fi-1 and xi is a martingale so that E[Ai (Xi- xi-1) I :Fi-ll= Ai E[Xi-
xi-1 I :Fi-d = o.
If the integrator X of the discrete stochastic integral in Eq. 2.139 is not a martingale,
the integral M = (Mo, M1, .. .) may not be a martingale either. For example, we have
E[Mn I :Fn-d = Mn-1 + E[An (Xn- Xn-1) I :Fn-d form= n -1 or Mn-1 + Zn An,
where Zn = E[Xn - Xn-1 I :Fn-d· •

2.18.2 Stopped martingales


Let X = (Xo, X 1, ... ) be an Fn-submartingale, martingale, or supermartin-
gale (Eq. 2.138) and letT denote an Fn-stopping time. We call

X~ (w) = Xn/\T(w)(w) forn = 0, 1, ... andw E Q, (2.143)

the sequence X stopped at T. Figure 2.16 shows five samples of a random walk
Rn = 2::7= 1 Xi for n ~ 1 and Ro = 0 (Example 2.73, Fig. 2.10) stopped at the
timeT when it leaves the interval (-a, a), a = 100, for the first time, where Xi
are independent Gaussian variables with mean zero and variance 100.

If Xn, n ~ 0, is adapted to a filtration Fn, so is X~ in Eq. 2.143.

Proof: Note that


n-1
Xn/\T = L xk 1(T=k) + Xn 1(T?;n)•
k=O
Xk E :Fk S::: :Fn fork ~ n since X is adapted, and l(T=k} E :Fk S::: :Fn and l(T?:n} E :Fn
since Tis a stopping time. Hence, Xn/\T is :Fn-measurable for all n :::: 0. •

If Xn, n ~ 0, is an Fn-submartingale, martingale, or supermartingale, so is X~


in Eq. 2.143.
2.18. Martingales 97

150r------.-------.----,----,------,------,

-150'------'-----'----'-------'---------'------'
0 50 100 150 200 250 300

Figure 2.16. Samples of a stopped random walk

first and third properties in Eq. 2.138. Thatxr


Proof: We have shown that X! is Fn-adapted. It remains to prove that X! has also the
is integrable follows from

E[IXnAriJ = r
J{T:Sn)
IXni\TidP + r
J{T>n)
IXni\TidP

= tf
k=O j{T=k}
IXkldP + f
j{T>n}
IXnldP,

E[IXni] < oo, and the inequalities f{T=k} IXkldP ::; E[IXkl1 and f{T>n) IXnldP ::;
E[IXniJ. For the last property in Eq. 2.138 we use Xni\T = I:Z:JXk l{T=k)+Xn l{r:::n}
sothatXcn+l)Ar-Xn/\T = (Xn+l-Xn) l{T>n)· The expectation E[X(n+l)I\T-XnAT I
Fn] is equal to l{T>n) E[Xn+l - Xn I Fn] since l{T>n} is Fn-measurable. If X is a

xr
submartingale, martingale, or supermartingale, E[Xn+l - Xn I Fn] is positive, zero, or
negative so that is a submartingale, martingale, or supermartingale, respectively. •

If (1) X is an :Fn -martingale, (2) T is a stopping time with respect to :Fn such that
T < oo a.s., (3) Xr is integrable, and (4) E[Xn l{T>n}l ---+ 0 as n ---+ oo, then
E[Xr] = JL, where JL = E[Xn]. This statement is referred to as the optional
stopping theorem.

Proof: The value of the martingale X at T can be written as Xr = Xni\T + (Xr -


Xn) l{T>n) so that E[Xr] = J-t + E[Xr l{T>n}J- E[Xn l{T>n)J since X! is a martin-
gale. The expectation E[Xn l{T>n}1 converges to zero as n --+ oo by hypothesis. The
expectation E[Xr l[T>n}1 = L~n+l E[Xk l[T=k}1 also converges to zero as n --+ oo
98 Chapter 2. Probability Theory

because XT is integrable so that the series I:f:,0 E[Xk l{T=k}l is convergent. Hence, the
expectation of XT is J.L. •

2.18.3 Inequalities
We present two inequalities that are most useful for applications. Addi-
tional inequalities and convergence theorems can be found in, for example, [66]
(Chapter 24) and [151] (Chapter 10).

Doobmaximalinequality. If X= (Xo, XI, ... ) is a positive Fn-submartingale


and A. > 0 is an arbitrary constant, then

(2.144)

Proof: Set X~ = maxo::;k::;n Xk. For A. > 0 letT = min{k :5 n : Xk 2: A.} if there exists
k ::: n such that Xk 2: A. and let T = n otherwise.
Because X is a positive submartingale and Tis a stopping time, we have E[Xn] 2:
E[XT] since T::; n. Also,

E[XT] = E[XT l{x;t~J.j] + E[XT l{x~<AJ] 2: A. P(X~ 2: A.)+ E[Xn l{X~<A}l

since X~ 2: A. implies XT 2: A. and X~ <A. implies XT = Xn. The above results give
E[Xn] 2: E[XT] 2: A. P(X~ 2: A.)+ E[Xn l{x;t<~.J]

or A. P(X~ 2: A.) :5 E[Xn] - E[Xn l{x~<~.J] = E[Xn l{x,::~J.}l· •

Doob maximal L2 inequality. If X = (Xo, XI, ... ) is a square integrable


positive Fn-submartingale, then

E [(max xk) 2]~ 4 E[X~].


O::;k::;n
(2.145)

Proof: The equality E[Y 2] = 2 J000 y P(Y


> y) dy holding for any positive square inte-
grable random variable Y applied to X~= maxo::;k::;n Xk gives

E[(X!) 2 ] = 2 fooo x P(X! > x) dx :5 2 fooo E[Xn l{x~~x}l dx,


where the above inequality follows from Eq. 2.144. We also have

fooo E[Xn L
l{x;:~xj]dx = fooo (fo Xn l{x;:~xjdP) dx = (foX~ Xndx) dP
{ Xn* Xn dP = E[Xn* Xn] :5 ( E[(Xn)
= Jo * 2] E[Xn]
2 )1/2 ,
2.19. Problems 99

where the Fubini theorem and the Cauchy-Schwarz inequality were used. We have found
E[(X~)2] :::; 2 ( E[(X~)2] E[X~]) 112 , which gives the Doob inequality in Eq. 2.145 fol-

lowing division by ( E[(X~) 2 ) 112 . •

Example 2.91: The sequence Xn = Lk=O Zk. n = 0, 1, ... , with Zo = 0 and


Zk 2: 0 independent random variables is a positive Fn-submartingale, where
Fn = a(Zt, ... , Zn) for n 2: 1 and Fo = {0, Q}. Figure 2.17 shows ten samples
200,---.-----.-----.----.----.----.----,---,---,----,

180

180

140

120

100
Samples of X
eo

60

40

Figure 2.17. Ten samples of a positive submartingale X

of X for random variables Zk uniformly distributed in the range (0, 2 J.tk), where
J.tk = p (ePk - 1) and p = 0.1. The samples resemble the growth of a crack in a
the plate with the number n 2: 0 of stress cycles ([79], p. 17). <>
Note: That X is a submartingale follows from E[Xn I Fm] = E[Xm + Lk=m+l zk I
Fm] = Xm + Lk=m+l E[Zk] ~ Xm. The expectation E[Xn] = Lk=O E[Zk] of Xn is
an increasing function of time.
The submartingale X in this example is square integrable so that the inequality in
Eq. 2.145 applies but gives a trivial result since X has increasing samples so that x:; =
maxo:::;k:::;n Xk = Xn. Estimates of P(X~ ~A.) and E[Xn 1x~::::>.J/A. in Eq. 2.144 based on
1,000 samples of X are 0.2910 and 0.3136, respectively for A.= 160. A.

2.19 Problems
2.1: Let Q be a sample space and let A be a collection of subsets of Q. Show that
a(A) = ng 2Ag is a a-field, where g are a-fields.
100 Chapter 2. Probability Theory

2.2: Let Q = { 1, 2, 3, 4, 5, 6} be a sample space and consider the collection

:F = {0, Q, {1, 2, 3}, {2, 4, 6}, {2, 4}, {1, 3, 5, 6}, {6}, {1, 2, 3, 4, 5}}

of subsets of Q. Is :Fa a-field?

2.3: Let :F be a a-field on a sample space Q. Show that :F is closed to countable


intersections.

2.4: Complete the proof ofEq. 2.8

2.5: Suppose that A and B are independent events. Show that A c and B are
independent events.

2.6: Prove that any monotone function is measurable.

2.7: Suppose that h : (JE., B) --+ (JE., B) is measurable and let a be a constant.
Show that ha(x) = h(x) if h(x) :::=:a and ha(x) =a if h(x) >a is measurable.

2.8: Let X be a random variable defined on a probability space (Q, :F, P). Show
that the a-field generated by X is the smallest field with respect to which this
random variable is measurable.

2.9: Complete the details of the proof of Eq. 2.26.

2.10: Showthatinfi Xi, supi Xi, liminfi Xi, andlimsupi Xi are random variables
on (Q, :F, P), where Xi, i = 1, 2, ... , are random variables on this space.

2.11: Prove the statements in Section 2.5.2.1.

2.12: Prove the statements in Section 2.5.2.2.

2.13: Let hn = l[n,n+l], n = 1, 2, ... , and let A. denote the Lebesgue measure.
J
We have fJE. hn dA. = 1 for all n. Show that (lim hn) dA. i= lim hn dA.. J
2.14: Use the dominated convergence theorem to find the limit as n --+ oo of the
integral Jt) .JX"/(1 + n x 3 ) dx.

2.15: Find the characteristic function of a random variable X that is uniformly


distributed in a bounded interval (a, b).

2.16: Let Xi "'"' Sa(ai, f3i, JLi), i = 1, ... , n, be independent a-stable random
variables. Find the characteristic function of "L7= 1 Xi. Comment on your result.
2.17: Let (Y1, Yz) be an JE.2 -valuedGaussian variable with the density in Eq. 2.107.
Find the joint density of (XJ = gJ(YJ), Xz = gz(Yz)), where gi, i = 1, 2, are
increasing functions. Specialize your results for g i (y) = exp(y).
2.19. Problems 101

2.18: Calculate the mean and variance of the estimator X in Example 2.64.
2.19: Show that an JRd -valued random variable X E Lz is Gaussian if and only
if Ef= 1 Oli Xi is a real-valued Gaussian variable for every collection of constants
ott, ... , aa.

2.20: Show that Xn ~ X and Yn ~ Y imply Xn + Yn ~ X+ Y.

2.21: Let g : lR ~ lR be a continuous function. Let X and X 1, Xz, ... be random


variables on a probability space (Q, :F, P). Show that X n ~ X and Xn ~ X
imply g(Xn) ~ g(X) and g(Xn) ~ g(X), respectively.

2.22: Take a collection of random variables Xi, i E I, with the same mean and
variance but different distributions. Plot the probabilities P(IX i I > x) as a func-
tion of x > 0 and a Chebyshev bound. Can the bound be used to approximate
P(IXil > x)?

2.23: Let Rn = '£7= 1 Xi for n :::: 1 and Ro = 0, where Xi > 0 are iid random
variables. Find the probability law of the stopping timeT= inf{n::;: 0: Rn >a},
where a > 0 is a constant.

2.24: Let X and Xi, i = 1, ... , n, be random variables on a probability space


(Q, :F, P). Show that the best mean square estimator of X given (X 1, ... , Xn) is
X= E[X I cr(Xt, ... , Xn)].
2.25: Let Xi, i = 0, 1, ... , be a sequence of random variables generated by
Xi+ 1 = p Xi + Wi, where p is a constant and Wi are iid random variables, and
Xo is independent of Wi. Is Xi a martingale, submartingale, or supermartingale?

2.26: Let X be an integrable random variable defined on a probability space


(Q, :F, P) and let :Fn. n ::;: 0, be a filtration on this space. Show that Xn =
E[X I :Fn] is an Fn-martingale.

2.27: Let Xi, i ::;: 1, be iid random variables defined on a probability space
(Q, :F, P) such that P(Xi = 1) = P(Xi = -1) = 1/2. Consider the filtra-
tion :Fn = cr(Xt, ... , Xn) on this space and the random walk Rn = '£7= 1 Xi.
Show that R~ - n and ( -l)n cos(rr Rn) are Fn-martingales.

2.28: Let Xn be an Fn-martingale and let B E B. Show that T = min{n : Xn E


B} is an :Fn -stopping time.
Chapter 3

Stochastic Processes

3.1 Introduction
In the previous chapter we defined a time series or a discrete time stochastic
process as a countable family of random variables X= (X t. X2, ... ). Time series
provide adequate models in many applications. For example, X n may denote the
damage of a physical system after n loading cycles or the value of a stock at the
end of day n. However, there are situations in which discrete time models are
too coarse. For example, consider the design of an engineering system subjected
to wind, wave, and other random forces over a time interval I. To calculate the
system dynamic response, we need to know these forces at each time t E I.
The required collection of force values is an uncountable set of random variables
indexed by t E I, referred to as a continuous time stochastic process or just a
stochastic process. We use upper case letters for all random quantities. A real-
valued stochastic process is denoted by {X (t), t E I} or X. If the process takes on
values in JRd, d > 1, we use the notation {X(t), t E I} or X.
If X is indexed by a space coordinate~ E D c JRq rather than timet E I,
then {X(~),~ E D} is called a random field. There are notable differences be-
tween random processes and fields. For example, the concept of past and future
has a clear meaning for stochastic processes but not for random fields. How-
ever, stochastic processes and random fields share many properties. There are
also situations in which a random function depends on both space and time pa-
rameters. The evolution in time of (1) the wave height in the North Sea, (2) the
temperature everywhere in North America, (3) the ground acceleration in Califor-
nia during an earthquake, and (4) the Euler angles and the dislocation density in
a material subjected to plastic deformation are some examples. These functions
are random fields at each fixed time t E I and stochastic processes at each site
~ E D. They are referred to as space-time stochastic-processes and are denoted
by {X(t, ~), t 2:: 0, ~ED}
This chapter reviews essential concepts on stochastic processes and illus-

103
104 Chapter 3. Stochastic Processes

trates them by numerous examples. The review includes properties of stochastic


processes, filtration, stopping time, and finite dimensional distribution. Second
moment calculus is considered in detail because of its extensive use in applica-
tions. The concluding sections of the chapter deal with martingales, Poisson,
Brownian motion, and Levy processes. These processes are needed subsequently
to develop the Ito formula and apply it to the solution of a wide class of stochastic
problems. Properties of the random fields used in the book are also reviewed.

3.2 Definitions
Let X : I X n 1--+ r be a function of two arguments, t E I and (j) E n,
where I is a subset of lR or [0, oo) and en, :F, P) denotes a probability space.

If X et) is an JRii -valued random variable on the probability space en, :F, P) for
each t E I, that is, X et) E :F for each t E I, then X is said to be an JRd- valued
stochastic process.

Note: We will also refer to an JRd -valued stochastic process with d > 1 as a vector stochas-
tic process. The function X (·, w) for a fixed w E Q is called a sample path, path, sample,
or realization of X. The function X(t, ·)for a fixed tis by definition an JRlf-valued ran-
dom variable. The measurable space (JRli, Bd) used in the definition of X is sufficiently
general for our discussion. The definition of a stochastic process X does not require that
the functions X(·, w) : [0, oo) 1-+ JRll and X(·,·) : [0, oo) x Q 1-+ JRll be measurable. A

Random fields are defined in the same way. Let en, :F, P) denote a proba-
bility space and let X : D X n 1--+ JRd be a function of two arguments, ~ E D and
w E n, where D C R.q and q ~ 1 is an integer.

If X e~) is an JRd -valued random variable on the probability space (n, :F, P) for
each ~ E D, that is, X e~) E :F for each ~ E D, then X is said to be an JRd -valued
random field in D.

Example 3.1: Figure 3.1 shows daily average temperatures measured in Central
Park, New York City, during two different years starting on the first of January
and ending on the last day of December. These records can be interpreted as
two samples of a stochastic process {X (t), t = 1, ... , 365} giving daily average
temperatures in Central Park. A collection of samples of X of the type shown in
this figure can be used to estimate the probability law of X. <>

Example 3.2: Let <I> denote one of the three Euler angles of the atomic lattice
orientation in aluminum AL 7075. Figure 3.2 shows 14,000 measurements of <I>
performed on an aluminum AL 7075 plateD with dimensions 540 x 540 J,Lm. The
left plot shows the spatial variation of <I> over the plate. The right plot presents
contour lines for <I>. These lines partition D in subsets characterized by nearly
constant values of <I>, called grains or crystals. The plots in this figure can be
3.2. Detinitions 105

100,--~--~-----, 100.----------------,
Sample#1 Sample#2

OL--~--~-~-~
0 100 200 300 400 100 200 300 400

time (days) time (days)

Figure 3.1. Two records of daily temperatures in Central Park, New York City in
two consecutive years

y (!Jm)

100 200 300 400 500


0 0
X (IJ.m)

Figure 3.2. Three dimensional and contour lines for an Euler angle of atomic
lattice orientation in aluminum AL 7075

interpreted as a sample of the random field <I> : D x Q -+ IR, D c JR 2 • Proper-


ties of this field can be inferred from measurements on a collection of aluminum
specimens. <>

Example 3.3: Let X : I x Q-+ lR be defined by X(t, w) = Y(w) cos(t), where


I = [0, 10] andY is a random variable defined by Y(w) = won the probability
space ([0, 1], 8([0, 1]), P), where P(dw) = dw. Figure 3.3 shows the mapping
(t, w) ~ X(t, w) and 10 samples of X.<>
Note: X(t, ·)is B([O, 1])-measurable for a fixed t, since cos(t) is a constant and Y(w) is a
continuous function of w so that it is B([O, 1])-measurable. Hence, X is a stochastic pro-
cess. X(·, w) is B([O, 10])-measurable for a fixed w because Y(w) is a constant and cos(t)
106 Chapter 3. Stochastic Processes

Mapping (t, ro) H X(t, ro) Ten samples of X

10

0 0

Figure 3.3. The mapping (t, w) 1-+ X (t, w) = Y (w) cos(t) and ten samples of X

is continuous. It is natural to ask whether the function (t, w) 1-+ X(t, w) is measurable
from ([0, 10] x [0, 1],8([0, 10]) x 8([0, 1])) to ([-1, 1],8([-1, 1])), where [-1, 1] is
the range of X. The answer to this question is in the affirmative (Example 3.4). A

We have seen that a time series X = (X 1, X2, ... ) viewed as a function


(m, w) 1-+ Xm(w) is measurable from G~+ x n, K x :F) to (IR, B), where z+ =
{1, 2, ... }, K consists of all subsets of z+, and (Q, :F) denotes the measurable
space on which the random variables X n• n = 1, 2, ... , are defined (Section 2.4.3).
Generally, this property is not preserved when the time becomes continuous, that
is, for stochastic processes and random fields.

A stochastic process X is measurable if the function X I x n 1-+ JRd is


measurable from (I x n, B(l) x :F) to (JRd, Ed).

Note: If X is a measurable stochastic process, then X(t, ·)is F-measurable and X(·, w) is
8(1)-measurable by Fubini's theorem (Section 2.8). The first statement is valid even if X
is not measurable because X is a stochastic process.
The assumption that a process is measurable may not be very restrictive since many
of the stochastic processes used in applications are measurable (Example 3.4). A

The requirements in the definitions of stochastic processes and random


fields are quite weak so that the samples of these random functions can be quite
erratic. We can impose additional conditions to control the behavior of these func-
tions, for example, we may require that a stochastic process satisfy some measur-
ability conditions and/or its samples are continuous. A stochastic process X (t),
t 2:: 0, is said to be right/left continuous or continuous if the function X (t, w),
t 2:: 0, is right/left continuous or continuous, respectively, for every w E n. If the
above properties hold for every wE n \ N such that N E :F and P(N) = 0, we
3.2. Definitions 107

say that X is right/left continuous a.s. or continuous a.s. Requirements on the


sample properties may imply some measurability properties, as illustrated by the
following examples.

Example 3.4: Let {X (t), t E [0, 1]} be a real-valued stochastic process. If X


(1) starts at zero, that is, X(O) = 0, (2) has a.s. continuous samples, (3) has
stationary independent Gaussian increments, that is, the random variable X (t) -
X (s), t 2: s, is Gaussian with mean zero and variance t - s, and (4) the increments
of X are independent of the past, that is, X (t) - X (s ), t > s, is independent of
Fs = a(B(u), 0 :::; u :::; s), then X is called a Brownian motion or a Wiener
process and is denoted by B. The Brownian motion process is measurable. <>
Proof: LetsCnl(t,w) = B(kz-n,w),t E ((k-l)z-n,kz-n],k = l, ... ,zn,and
s(nl(O, w) = 0 be an approximating sequence forB in the time interval [0, 1]. The pro-
cess s(n) has piece-wise constant sample paths, depends on a countable number of values
of B, and can be viewed as a random walk with time step t::.t = z-n defined by a sum of
independent Gaussian variables with mean zero and variance z-n for each n (Section 2.14).
Hence, the function (t, w) r-+ sCnl(t, w) is measurable from 8([0, 1]) x .F to (lR, B). The
measurable mapping (t, w) r-+ sCnl(t, w) converges to (t, w) r-+ B(t, w) point-wise by
the continuity of the sample paths of B so that the function B (t, w) = liflln-H,o B (n l (t, w)
is measurable, since limits of measurable functions are measurable. We will see in Sec-
tion 3.13 that B has continuous samples.
We also note that the random variables X(t)- X(s) and X(v)- X(u), u < v ::::
s < t, are independent since X(t)- X(s) is independent of :Fs by property (4). Hence, the
Brownian motion process has independent increments. •

In Section 2.15 we have defined filtration and adapted time series. These
concepts extend directly to the case in which the time index is continuous.

A collection of sub-a-fields (F1 h::::o ofF such that Fs ~ F 1 C F, 0 :::; s :::; t,


is called a filtration in (Q, F). A probability space (Q, F, P) endowed with
a filtration (F1) 1::::o is said to be a filtered probability space and is denoted by
(Q, F, (F1 ) 1::::o. P). A stochastic process X defined on (Q, F, P) is adapted
to the filtration (F1 ) 1::::o, or F 1 -adapted, or just adapted if X(t) E F 1 for each
t 2: 0.
Note: The definition is given for the most common case considered in our discussions
when t takes values in I = [0, oo). A similar definition holds for an arbitrary index set/.
If X is .F1-adapted, then X is a stochastic process since X(t) E Ft implies X (t) E .F for
each t :::: 0. Generally, the converse is not true. A

Let {X(t), t 2: 0} be a stochastic process defined on a probability space


(Q, F, P). The natural filtration

F 1X =a (X(s), 0:::; s:::; t) =a (uo 9 ga(X(s))) (3.1)

of X is the smallest filtration with respect to which X is adapted.


108 Chapter 3. Stochastic Processes

Note: In Section 2.15 we have defined the natural filtration :F! = o-(X 1, ... , Xn), n 2: 1,
for a time series X= (Xr, X2, ... ). The definition of :F1x in Eq. 3.1 is a direct extension
of the definition of :F! to a continuous time index. If (:F1)t>O is a filtration on (Q, :F, P)
such that :Fl c :F1 for each t 2: 0, then X is :F1 -adapted. A-

Example 3.5: Let ([0, 1], B([O, 1]), P) be a probability space and a random vari-
able Y(w) = w defined on this space, where P(dw) = dw. Consider also a
stochastic process X: [0, 1] x Q--+ IR defined by X(t, w) = Y(w). The natural
filtration of X is :F1x = B([O, 1]), t 2: 0. <>

Note: We have o-(X(t)) = o-(Y) for all t E [0, 1] so that :F1x = o-(Uo 99 o-(X(s))) =
o-(Y). Because w r+ Y(w) is the identity function, the o--field generated by the random
variable Y is B([O, 1]). A

Example 3.6: Let {B(t), t 2: 0} be a Brownian motion, :F1 = a(B(s), 0:::: s :::: t)
denote the natural filtration of B, and let g : [0, oo) x IR --+ IR be a continuous
function. The stochastic process Y(t) = g(t, B(t)) is :F1 -adapted. For example,
B(t) 2 , B(t) 2 - t, B(t)q for an arbitrary integer q 2: 1, and maxo<s<t B(s)q are
:Fradapted. We say that these processes are adapted to the Brownian motion.
On the other hand, the process Z(t) = B(t + r), r > 0, is not adapted to the
Brownian motion. <>
Note: The natural filtration of some of the processes derived from B may be smaller than
:F1 . For example, the natural filtration:;:; = o-(B(s) 2, 0 ::"= s ::"= t) is smaller than :F1 ,
t 2: 0, because :F{ contains the entire information on B 2 and IB I but not on B. Hence, the
stochastic process B2 is both :F{ and :Fradapted ([131], Example 1.5.2, p. 78).
The random variable B(t + r) is :Ft+r-measurable but it is not :F1-measurable for
r > 0. Hence, Z is not :F1-adapted. A

A stochastic process X defined on a filtered probability space


(Q, :F, (:F1 ) 1:c:o, P) is progressively measurable or progressive with respect to
(:F1 ) 1:c:o if the function X(s, w) : [0, t] x Q--+ IRis B([O, t)) x :F1 -measurable
for every t 2: 0.

Note: A progressively measurable process X is measurable since B([O, t)) x :Ft is included
in B([O, oo)) x :F. The process X is also :F1 -adapted because X(t, ·)is :Ft-measurable for
each t 2: 0 (Fubini's theorem). A

Example 3.7: If X is an :F1 -adapted stochastic processes with continuous sam-


ples, then X is progressively measurable. Hence, the Brownian motion is progres-
sively measurable with respect to its natural filtration. <>
Proof: Take a timet in [0, 1]. Let x(n) (t, w) = X (k 2-n, w) if t E ((k - 1) 2-n, k Tn],
k = 1, ... , 2n, and xCn) (t, w) = 0 if t = 0, n = 1, 2, ... , define a sequence approximating
{X(t),t E [0, 1]}. Themapping(t,w) 1-+ xCnl(t,w)isB([O,k2-n])x:Fk2-n-measurable
3.2. Definitions 109

if k and n are such that t E ((k -1) 2-n, k 2-n] so that it is also B([O, s]) x Fs-measurable
for s 2: k 2-n. That the process X is progressively measurable follows from the continuity
of the sample paths of X and the convergence of the mapping (t, w) r-+ x<nl(t, w) to
(t, w) r-+ X(t, w). •

Figure 3.4 illustrates the relationship between adapted, measurable, and


progressive processes. We have seen that a progressive process is measurable

Figure 3.4. Relationship between adapted, measurable, and progressive processes

and adapted. An adapted process may or may not be measurable. An adapted and
measurable process may not be progressive. The following statement generalizes
Example 3.7.

IAny right or left continuous adapted process X (t), t ?:: 0, is progressive.

Proof: Suppose that X is right continuous and define the sequence of processes

X ( ) _ { X(i tfn, w), if (i- 1) tfn < s ::=:: i tfn, i = 1, ... , n,


n,t s, w - X(s, w), if s = 0

for n = 1, 2, ... and t > 0. For any B E B, the set

{(s, w) : 0 ::'S s ::'S t, wE !J, Xn,t(s, w) E B} = {0} x {w: X(O, w) E B}


U (u?=t ((i- 1) t/n, i t/n] x {w: X(i t/n, w) E B})

is in B([O, t]) X Ft since X is adapted. Hence, Xn,t(S, w) with 0 :::: s :::: t and (J) E n is
B([O, t]) x Fr-measurable for each n 2: 1 so that Xn,t is progressive. By right continuity
we have Xn,t(s, w)-;. X(s, w) as n-;. oo for all 0 ::=:: s ::=:: t and wE !J. We conclude that
X is progressive since limits of measurable functions are measurable. •
110 Chapter 3. Stochastic Processes

We assume throughout the book that the filtration :F1 , t 2: 0, on (!J, :F, P)
has two properties, referred to as the usual hypotheses:
1. :F1 is complete meaning that (a) :F is complete, that is, for every A c B such
that B E :F and P(B) = 0 we have A E :F so that P(A) = 0 (Section 2.2.3), and
(b) :Fo contains all null subsets of :F, that is, A E :F such that P(A) = 0 implies
A E Fo.
2. :F1 is right continuous, that is, the a-field :Fr+ = ns>t:Fs is equal to :F1 for all
t 2: 0. The filtration Fr+ provides an "infinitesimal peek into the future" in the
sense that A E :Ft+ implies A E :Ft+s for ails > 0. Note that a right continuous
filtration :Fn indexed by n = 0, 1, ... , is constant since Fn+ = Fn+l·

Example 3.8: Let (!J, :F, (:F1) 1:;::o, P) be a filtered probability space and X be an
Fradapted stochastic process. If :F1 is right continuous, then the random variable
X(t) = lim supu{-t (X (u) - X (t)) /(u - t) is :F1 -measurable provided it exists. <>

Proof: Define the random variables Yn= n (X(t + 1/n)- X(t)) and Zn = SUPm>n Ym,
take p > n. Note that Zp E :Fr+lfp C :Ft+lfn so that infp Zp E :Ft+lfn for-any n
and infp Zp E n~ 1 :Ft+l/n = :Ft+· If the filtration :F1 is right continuous, then X(t) is
:F1-measurable for each t :::: 0. •

3.3 Continuity
We have already defined right/left continuity, continuity, a.s. right/left con-
tinuity, and a.s. continuity. Generally, it is difficult to establish whether a process is
sample continuous from these definitions. The following Kolmogorov criterion
provides a simple alternative. If {X (t), t E I} is a real-valued stochastic process
and I c [0, oo) is a closed interval, and there exist three constants a, {3, ~ > 0
such that
(3.2)

then limh-+0 SUPs,tEI,Is-tl<h [X(s)- X(t)[ = 0 a.s., that is, almost every sample
path of X is uniformly continuous in I. An JE.d -valued process, d > 1, is sam-
ple continuous if each of its coordinates satisfies the condition in Eq. 3.2 ([197],
Proposition 4.2, p. 57).
We give in this section some of the most common definitions of continuity
for stochastic processes. Both continuity at a point and in an interval are discussed.
Let {X(t), t 2: 0} be an JE.d-valued stochastic process defined on a probability
space (!J, :F, P).

X is continuous in probability at t 2: 0 if

lim P(ll X(s)- X(t) 112: e)= 0, \Is > 0. (3.3)


s-+t
3.3. Continuity 111

X is continuous in the p'th mean at t if

lim E[ll X(s)- X(t) liP] = 0. (3.4)


s-H

X is almost surely (a.s.) continuous at t if

P ({w: lim
s-+t
II X(s, w)- X(t, w) II= 0}) = 1 or

P(Q 1 ) = P({w: lim X(s, w)


s-+t
=f. X(t, w)}) = 0. (3.5)

x
Note: The Euclidean norm II II= ( 'Lf= xl)
1
112 is used in the above definition to assess

the magnitude of the difference X(t)- X(s).


If p = 2 in Eq. 3.4, X is said to be continuous in the mean square sense or
m.s. continuous at t. The mean square continuity is used extensively in the second moment
calculus considered later in this and the following chapters.
The subset 0 1 of Q in Eq. 3.5, consisting of the sample paths of X that are not con-
tinuous at timet, is in Fbecause itis defined by the condition II X(t+, w)-X(t-, w) II> 0
and II X(t+, w)- X(t-, w) II is a measurable function of the random variables X(t+) and
X(t-).
The process X is continuous in probability, continuous in the p'th mean, and almost
surely continuous in an interval I c [0, oo) if it is continuous in probability, continuous
in the p'th mean, and almost surely continuous at each t E I, respectively. ~

Example 3.9: A process {C(t) E JR, t ~ 0}, with C(O) = 0 and


N(t)

C(t) =L Yk, t > 0, (3.6)


k=1

is said to be a compound Poisson process, where Yk. k = 1, 2, ... , are indepen-


dent identically distributed (iid) real-valued random variables and N is a Poisson
counting process with intensity parameter A > 0. The compound Poisson pro-
cess in Eq. 3.6 and the Brownian motion process in Example 3.4 are continuous in
probability at each time t ~ 0. These processes are also continuous in probability
in [0, oo). Figure 3.5 shows a sample of a Brownian motion and a sample of a
compound Poisson process with jumps Y1 ,...., N(O, a 2 ) and intensity A such that
A E[Yf] = A a 2 = 1. The two processes have identical first two moments but
their samples differ significantly. Processes with the same first two moments are
said to be equal in the second moment sense (Section 3.8). ~

Proof: The process N has stationary independent increments, that is, (1) the distribution
of the random variable N(t) - N(s), s < t, depends only on the time lag t - s rather
than the times t and s and (2) the increments of N over non-overlapping time intervals are
112 Chapter 3. Stochastic Processes

5,----,----.----.----.----.----~---.----~---.-----

4.5

3.5

)\
........::J Compound Poisson
Brownian motion
process

process

0 2 3 4 5 6 7 8 9 10

Figure 3.5. Samples of a Brownian motion and a compound Poisson process with
Y1 "'N(O, a 2 ) andA.a 2 = 1

independent random variables, for example, N (t)- N (u) and N ( v)- N (s) are independent
for s < v :;:: u < t (Sections 3.6.4 and 3.12). The probability,

P(N(t) - N(s) = n) = (A (t- s))n e-).. (t-s), n = 0, 1, ... , (3.7)


n!
of N (t) - N (s ), s < t, depends only on the time lag t - s and an intensity parameter A > 0.
The compound Poisson process C in Eq. 3.6 has similar properties. If the random variables
Yk are equal to unity, then C is a Poisson process.
B is continuous in probability at timet because P(IB(t)- B(s)l > e) is equal to
2 4>( -e I .Jit=ST) so that it converges to zero as s ---+ t for any e > 0. The process C is
continuous in probability at t 2: 0 because the event IC(t) - C(s)l > e can occur if there
is at least one jump in the time interval (s, t], and the probability P(N(t - s) > 0) =
1- exp[-A (t- s)] of this event approaches zero ass---+ t. Because these processes are
continuous in probability at each time t 2: 0, they are also continuous in probability in
[0, oo). •

Example 3.10: The Brownian motion B is m.s. continuous at each t 2: 0. If


Y1 E L2, then the compound Poisson process C in Example 3.9 is m.s. continuous
at each t ::=: 0. <>
Proof: The second moment of the increments of Band Care E[(B(t)- B(s))2 ] = t - s
and A (t -s) E[Yf] +(A (t -s) E[Yd) 2 , respectively. The second moment of B(t)- B(s)
3.3. Continuity 113

follows from the definition of B. The expression of the expectation E[(C(t)- C(s)) 2 ] can
be found in [79] (Section 3.3). The limits of the second moments of the increments of B
and C are zero as s -+ t. •

Example 3.11: The Brownian motion process (Example 3.4) and the compound
Poisson process (Example 3.9) are a.s. continuous at each t :::: 0.
Proof: The random variable B(t)- B(t- 1/n) is Gaussian with mean zero and variance
1/n for any integer n ?:: 1. For ant: > 0 and n = 1, 2, ... define the sequence of events
An(t:) = {IB(t)- B(t- 1/n)l > t:}. By Fatou's lemma (Section 2.3.6) we have

P(liminf An (t:)) :::0 liminf P(An (t:)) = liminf2 (1 - ci>(.JI1 E))


n---+00 n---+oo n---+oo
= 0.
Consider now the compound Poisson process C and define the sequence of events
An= {IC(t)-C(t-1/n)l > 0}, n?:: 1. We have P(liminfn An) :::0 liminfn P(An), where
P(An) is the probability 1- e-'-fn that C has at least a jump in the interval (t- 1/n, t].
=
Because P(An) converges to zero as n-+ oo, we have P(Qt) 0 (Eq. 3.5). •

Example 3.12: Almost sure continuity at each time t does not imply sample con-
tinuity. <>
Proof: If X is a.s. continuous at each t ?:: 0, the probability of the set Qt in Eq. 3.5 is
zero at each time. However, the probability of Ut::::ont may not be zero or may not even be
defined since this set is an uncountable union of measurable sets, which is not necessarily
measurable.
For example, let (Q, :F, P) be a probability space with Q = [0, 1], :F = B([O, 1]),
and P(dw) = dw and let X be a real-valued stochastic process defined by X(t, w) = 0
and X(t, w) = 1 fort < wand t ?:: w, respectively. Because Qt in Eq. 3.5 is {t}, we
have P(Qt) = 0 so that X is continuous a.s. at each t E [0, 1]. However, the probability
of UtE[O,l]Qt = [0, 1] = Q is equal to 1 so that the process is not sample continuous.
Another example is the compound Poisson process that has a.s. continuous samples at each
timet but its samples are piece-wise constant functions. The Poisson process has the same
property. •

Example 3.13: The Brownian motion B is sample continuous while the com-
pound Poisson process C does not have continuous samples (Fig. 3.5). <>
Proof: Because B(t +h)- B(t) ~ N(O, h), h > 0, for any t ?:: 0, we have E[(B(t +h)-
B(t)) 4] = 3 h 2 so that the Kolmogorov condition in Eq. 3.2 is satisfied for a = 4, f3 = 1,
and I; ?:: 3. That Cis not sample continuous can be seen from Fig. 3.5. •

A process X is called cadh'tg if its sample paths are a.s. right continuous with
left limits. X is called dtglad if its sample paths are a.s. left continuous with
right limits. The names are acronyms for the French "continua droite, limites a
gauche" and "continua gauche, limites a droite", respectively.
Note: The Brownian motion is both cadlag and caglad because it has continuous samples
almost surely. The samples of the compound Poisson process C are right continuous with
left limit so that C is cadlag. The difference C (t+) - C (t-) = C (t) - C (t-) at time t is
one of the random variables Yk if C has a jump at t and zero otherwise . .i.
114 Chapter 3. Stochastic Processes

3.4 Stopping times


In Section 2.16 we have defined and illustrated stopping times associated
with time series. We consider now stopping times related to stochastic processes.
For example, let X(t), t 2:: 0, be the state of a system and assume that X has
continuous samples. Denote by T the first time when X exceeds a critical value
causing system failure and by f the time when X reaches its largest value during
a time interval [0, r], r > 0. To determine whether T is smaller or larger than a
timet ::: 0, it is sufficient to monitor X up to this time. Hence, the event {T :::: t}
is in F1 = a(X(s), 0 :::: s :::: t). On the other hand, we need to have complete
information on X in [0, r] to tell whether f is smaller than t < r. The information
content of F 1 is insufficient to make this assertion. Hence, { f :::: t} is not in F 1 •

Let (Q, F, (F1k:o:O• P) be a filtered probability space and T : Q --+ [0, oo] be
a random variable defined on this space. If {w : T(w) :::: t} E Ft for all t ::: 0,
then T is said to be an Frstopping time or a stopping time.

Note: A stopping timeT is said to be finite a.s. if P(T < oo) = 1 or P(T = oo) = 0 . .&
Example 3.14: Any positive constant is a stopping time on an arbitrary filtered
probability space (Q, F, (Ft)t~o. P). ¢

Proof: LetT =a, where a :=:: 0 is a real number. Then {w : T(w) :::; t}, t :=:: 0, is 0 if a > t
and the sample space Q if a :::; t so that it is in :Ft, t :=:: 0. •

• If Tis an Ft-stopping time, then {T < t} EFt and {T = t} E F 1 , t 2:: 0.


• If {T < t} E F 1 for all t ::: 0, then Tis an Frstopping time.

Proof: We have {T < t} = U~ 1 {T :::0 t- 1/n} and {T :::0 t - 1/n} E :Ft-lfn C :Ft so
that {T < t} is in :F1 . The event {T = t} is equal to {T :::; t} n {T < t}c so that it is in :Ft.
Suppose now that {T < t} E :F1 for all t :=:: 0. For an arbitrary t :=:: 0 and integer
m :=:: 1 we have {T :::; t} = n~m{T < t + 1/n} so that {T :::; t} E :Ft+l/m since
{T < t + 1/n} E :Ft+lfn c :Ft+lfm· Because m is arbitrary we have {T :::: t} E
n~=l :Ft+lfm = :Ft+ = :Ft, where the latter equality holds since :Ft is assumed to be a
right continuous filtration. •

Let (Q, F, (Ft)t~o, P) be a filtered probability space, Tt, Tz, ... be stopping
times on this space, and c E [0, oo] a constant. Then
• c + Tt and c 1\ Tt are F 1 -stopping times.
• supn~l Tn and infn~l Tn are Ft-stopping times.
eliminfn-+oo Tn, lim SUPn-+oo Tn are Frstopping times.

Proof: If c = oo, then c+ Tt = oo is a constant, which is a stopping time. Also, the random
variable c 1\ Tt = Tt is a stopping time. If c < oo, then {c + Tt :::; t} = {Tt :::: t - c} is 0
3.4. Stopping times 115

fort < c and {T1 ::; t - c} E Ft-c c :Ft fort 2: c. The event {c 1\ T1 ::; t} is Q fort 2: c
and {T1 ::; t} fort <c. Hence, c + T1 and c 1\ T1 are stopping times.
=
The statements in the second bullet results from {supn> 1 Tn ::; t} nn> dTn ::; t},
{infn:::1 Tn < t} = u~ 1 {Tn < t}, and {Tn < t} E :Ft by the right conti;;-uity of the
filtration :F1 , t 2: 0.
The statements in the last bullet follow from the definitions,

lim inf Tn = sup inf Tn and lim sup Tn = inf sup Tn,
n-'>oo m:C:1 n=::m n-'>00 m:C:1 n:C:m

of" liminf" and" lim sup". •

Let X (t), t :::: 0, be a right continuous adapted process and B denote an open
set. Then the first entrance timeT = inf{t :::: 0: X(t) E B} is an F 1 -stopping
time.

Proof: Since X is a right continuous process and B is an open set, we have {T < t} =
Uo::::s<t, s=rationadX(s) E B} and {X(s) E B} E :Fs C :Ft. Hence, Tis an :Ft-stopping
time because :F1 = :Ft+ by assumption. •

Example 3.15: LetT= inf{t 2': 0: B(t) >a} be the first time when a Brownian
motion B exceeds a level 0 < a < oo, that is, the first entrance time of B in
(a, oo), and let F 1 , t :::: 0, denote a right continuous filtration such that F 1 :J
rr(B(s), 0 :::; s :::; t). Then T is an F 1 -stopping time. Figure 3.6 shows the
density ofT for a = 1, 2, 3 calculated from P(T :::; t) = 2 [1 - <P(a/ .J()] by
differentiation. <>
Proof: T is an :Ft-stopping time by the previous property. We have P(B(t) > a)
1- 'P(a/Jf) and

P(B(t) > a) = P(B(t) > a IT > t) P(T > t) + P(B(t) > a IT ::; t) P(T ::; t)
= (1/2) P(T ::; t),

where ¢ denotes the distribution of the standard Gaussian variable. The last equality in
the above formula holds because P(B(t) > a I T > t) = 0 by the definition ofT, and
P(B(t) > a I T ::; t) = 1/2 because B(T) = a prior tot for T ::::: t and the Brownian
motion is symmetric with respect to its starting point. •

Example 3.16: LetT be an F 1 -stopping time and 0 =to < t 1 < · · · < tk < · · ·
a sequence of numbers such that tk -+ oo ask -+ oo. Define an approximation
T of T by T = tk+ 1 if T E [tk, tk+ 1) and T = oo if T = oo. Then T is an
F 1 -stopping time. <>

Proof: If t E [tk, tk+1), we have {'T ::; t} = {T < tk} and the last event is in :F1k c :F1 •
Note that the random variable f = tk if T E [tk, tk+1) and f = oo if T = oo is not an
:F1 -stopping time. •
116 Chapter 3. Stochastic Processes

0.6

0.5

0.4

0.3

a=3
0.2

0.1

0
0 10

time

Figure 3.6. Density of the stopping timeT = inf{t 2: 0 : B(t) 2: a} for a =


1, 2, 3

We define now two a-fields including events that occur up to a stopping


timeT using similar considerations as in Section 2.16.
Let T be an :F1-stopping time. Then

:Fr = {A E :F: An {T ::::: t} E :F1 , for all t 2: 0},


:Fr+ = {A E :F: An {T ::::: t} E :Ft+• for all t 2: 0} (3.8)

are sub-a -fields of :F including events in :F "occurring up to time T ."

Note: Fr and Fr + include all events in F occurring prior to T. If the filtration (Ft )1"=o
is right continuous as considered in our discussion, then Fr and Fr + coincide. The proof
that Fr and Fr+ are sub-u-fields ofF can be based on arguments as in Section 2.16 ([61],
Proposition 1.4, p. 52). A

In Section 2.18.2 we have defined stopped martingales in the discrete time


setting. This definition can be extended to stochastic processes. Let X be a
stochastic process defined on a probability space (Q, :F, P) endowed with a fil-
tration (:F1 }t;~o and letT, T' be :F1 -stopping times. Then

XT (t) = X(t 1\ T) (3.9)

is said to be the process X stopped at T. The following properties are valid.


3.5. Finite dimensional distributions and densities 117

• T and T 1\ T' are FT-measurable.


• If T :::; T', then FT s; FT'·
• If X is Frprogressive, then X(T) is FT-measurable on {T < oo}.
• Let 9t denote FT At for each t 2: 0. Then Wt )r:::o is a filtration and the stopped
process XT is both 9t and Fr-progressively measurable.

Note: Some of these properties will be used, for example, in Chapters 6 and 7 to solve de-
terministic partial differential equations and assess the evolution of mechanical, ecological,
or other systems. The proof of these and related properties can be found in [61] (Proposi-
tion 1.4, p. 52). A

3.5 Finite dimensional distributions and densities


Let {X(t), t 2: 0} be an JRd -valued stochastic process defined on a proba-
bility space (Q, F, P). Let n 2: 1 be an integer, ti 2: 0, i = 1, ... , n, be arbitrary
distinct times, and set Xn = (X(t1), ... , X(tn)).
The finite dimensional distributions (f.d.d.'s) of order n of X are the distri-
butions of the random vectors X n (Section 2.11.1 ), that is,

Fn(X(l), ... , x<n); t1, ... , tn) = P(n?=dX(ti) E x%= 1 ( -oo, Xi,k]}), (3.10)

where X(i) = (Xi,l, ... , Xi,d) E JRli and n 2: 1 is an integer.


Note: This definition is meaningful since x%= 1 ( -oo, xi,k] are Borel sets so that the
sets {X (ti) E x%= 1 ( -oo, Xi,k]} are in :F. The finite dimensional distributions Fn in
Eq. 3.10 are probability measures induced by X on the measurable space (Wd, Bnd)
(Sections 2.4.1 and 2.10.1). The marginal distribution F1 (·; t) of X at time t is also
denoted by F(·; t). A

If X is a real-valued stochastic process (d = 1), its finite dimensional dis-


tributions are

(3.11)

where Xi E JR. i = 1, ... , n. The corresponding finite dimensional densities of


X can be obtained from

(3.12)

provided that these derivatives exist. The marginal density of X is the density of
the random variable X(t) and is denoted by /1 (·; t) or f(·; t).
Generally, the information available on a stochastic process X in applica-
tions is sufficient to estimate at most its first and second order finite dimensional
118 Chapter 3. Stochastic Processes

distributions. Hence, assumptions need to be made to specify all finite dimen-


sional distributions of X. Two difficulties may be encountered. First, the postu-
lated finite dimensional distributions can be inconsistent with the physics of the
phenomenon modeled by X. For example, the Gaussian distribution is an inade-
quate model for the strength of a physical system because strength is positive and
bounded. Second, an arbitrary collection of distributions F n. n = I, 2, ... , may
not define a stochastic process.
If a collection of finite dimensional distributions F n, n = 1, 2, ... , satisfies
the consistency and the symmetry conditions, then it can be used to define
a stochastic process. A similar statement holds for random fields ([1], Sec-
tion 1.5).

Note: A collection of distributions Fn, n = 1, 2, ... , of the type in Eq. 3.10 satisfies the
consistency condition if any Fm of order m < n can be derived from Fn. For example,

Fm(Xl, ... ,xm; tl, ... ' tm) = Fn(Xl, ... 'Xm, 00, ... ' oo; flo ... ' tm, ... ' fn). (3.13)

The family of distributions Fn, n = 1, 2, ... , satisfies the symmetry condition if


the functions Fn are invariant when their arguments xi and ti are subjected to the same
permutation. &

Example 3.17: Denote by Q = C([O, 1]) the collection of real-valued continu-


ous functions defined on [0, 1] and let X : [0, 1] x C([O, 1]) ---+ ffi. be given by
X(t, w) = w(t), where w is a member of C([O, 1]). The family of finite dimen-
sional distributions

n
n
Fn(Xl, ... 'Xn; tl, ... ' tn) = <l>(xk), n = 1, 2, ... ,
k=l

satisfies the conditions of consistency and symmetry. Hence, this family can be
used to define a stochastic process but not the process X in this example. <>

Proof: That the family of distributions Fn satisfies the conditions of consistency and sym-
metry follows from its definition.
However, this family cannot be used to define the process X since (1) the sequence
of events An = {w : X(t, w) > £, X(t + 1/n, w) < -£}converges to the empty set as
n--+ oo by the continuity of the samples of X so that P(An)--+ P(0) =
0 as n--+ oo for
every e > 0 and (2) P(X(t) > £, X(s) < -£) = (<1>(-£)) 2 for any t =!=sand£> 0 by
the definition of Fn so that

P(An) = P(X(t) > e, X(t + 1/n) <-e)= (<l>(-e)) 2 , n = 1, 2, ... ,

does not approach zero as n --+ oo, a contradiction. The conclusion is somewhat obvious
in this example since the assumptions that X (t) is independent of X (s) for t =!= s and that
X has continuous samples are at variance. •
3.6. Classes of stochastic processes 119

The finite dimensional distributions of a process X do not provide infor-


mation on all of its statistics. For example, the probability of the set {IX(t)l <
a, 0 :::::: t :::::: 1} = no 9 :od1X(t)l < a}, a > 0, depends on an uncountable num-
ber of values of X so that it cannot be calculated from its f.d.d.'s. Moreover, this
set may not even be in :F since it is the intersection of an uncountable number
of events. Additional conditions are needed in the definition of X to assure that
{IX(t)l <a, 0:::::: t:::::: 1} and other similar sets are in F.

3.6 Classes of stochastic processes


We define stationary, ergodic, Markov, independent increment, Gaussian,
translation, and mixture of translation processes, and illustrate them by examples.
These processes are used frequently in applications [79, 175].

3.6.1 Stationary processes

An JRd -valued stochastic process X is said to be stationary in the strict sense


or stationary if

d
(X(ti), ... , X(tn)) = (X(ti + r), ... , X(tn + r)) (3.14)

for any n 2: 1, distinct times t i, i = 1, ... , n, and time shift r.


Note: The condition in Eq. 3.14 is equivalent to the requirement that all finite dimensional
distributions of X are invariant under an arbitrary time shift. If X is stationary, then the
finite dimensional distributions of X depend on only the lag between the times {!I, ... , tn}
rather than their values. Hence, the marginal distribution of X is time invariant. A

Example 3.18: The Brownian motion B is not a stationary process since B(t) "'
N(O, t) so that its marginal distribution, F(x; t) = P(B(t) :::::: x) = cf>(xj .ji),
changes in time. <>
Example 3.19: Let Yk be independent identically distributed random variables
that are independent of a random variable Y. The time series X k = Y + Yk,
k = 1, 2, ... , is stationary. <>
Proof: The finite dimensional distributions of the process X = (XI, X2, ... ) are
Fn(Xl, ... , xn; k1, ... , kn) = P(nt=I {Y + Yk; ::::xi})

~ J. [fV''• 9;- ,>] dF,(y) ~f.. [fV''"'- ,>] dF,(y),

where FI and Fy denote the distributions of YI and Y, respectively. The expression of Fn


shows that X is a stationary time series.
Note that the time series Zk = k Y + Yk is not stationary because its f.d.d.'s
Fn(XI, ... , xn; kJ, ... , kn) depend on the times ki. •
120 Chapter 3. Stochastic Processes

3.6.2 Ergodic processes

An JE.d -valued stochastic process X is said to be ergodic if ensemble averages


equal time averages, that is, if we have

11r/2
E[g(X(t))] = lim - g(X(s)) ds a.s. (3.15)
r---+oo r -r/ 2

for any real-valued measurable function g such that E[lg(X(t))l] < oo.

Note: The condition in Eq. 3.15 states that the ensemble average E[g(X(t))] can be calcu-
lated from the temporal average (1/r) J~~~ 2 g(X(s)) ds on a sample path of X of length
r -+ oo ([176], Section 3.2.3, [197], Section 2.3). A

Example 3.20: The time series X = (X 1, X2, ... ) in Example 3.19 is ergodic if
Y = c is a constant. Otherwise, X is not an ergodic time series. <>

Proof: If Y = c, the random variables g(c + Yk) are independent so that the sum

n
(1/n) L g(c + Yk)
k=I

converges a.s. to E[g(c+ Y1)] as n -+ oo by the strong law oflarge numbers (Section 2.13),
provided that E[lg(c + YJ)Il is finite.
If Y is not a degenerated random variable, X is not ergodic because Eq. 3.15 does
not hold for all measurable functions g. For example, if g(x) = x the left and right sides
ofEq. 3.15 are E[Y] + E[Yj] and Y(w) + E[Yj], respectively, where Y(w) is a sample of
the random variable Y. •

Weaker ergodicity conditions can be defined by restricting the form of the


function g in Eq. 3.15. For example, this equation gives the conditions for er-
godicity of the mean and ergodicity of the marginal distribution for g (x) = x
and g(x) = 1 xf=l(-oo,~il(x) respectively, where~ = (~J, ... , ~d) E JE.d ([141],
Section 9.8).

3.6.3 Markov processes


We limit our discussion to real-valued processes for simplicity. The exten-
sion to JE.d -valued processes is straightforward.
A real-valued stochastic process X is Markov if for every integer n 2: 1 and
times t1 < · · · < tn the conditional random variables X(tn) I XCtn-d and
(X Ctn-2), ... , X Ct1)) I X Un-I) are independent, that is, the past of X is inde-
pendent of its future conditional on the present.
3.6. Classes of stochastic processes 121

Note: This definition implies

fXCtn),X(tn-z), ... ,X(ti)IX(tn-I)(Xn, Xn-2• · · · • XI I Xn-I)


= fX(tn_ 2 ), ... ,X(t1)1X(tn-I)(Xn-2• ···'XI I Xn-I) fX(tn)IX(tn-I)(Xn I Xn-I), (3.16)

where the subscripts of f indicate the reference random variables. For example, the func-
tion fX(tn)IXCtn-Il is the density of X(tn) I X(tn-I). The alternative forms of this defini-
tion,

fn (xi, ... , Xn; fi, ... , tn)


= fn-1 (XI, ... , Xn-1; tj' ... , tn-I) fX(tn)IX(tn-Il(Xn I Xn-J) (3.17)

and

fX(tn)I{X(tn-Il, ... ,X(ti)}(Xn I Xn-1, · · ·, XJ) = fX(tn)IXUn-Il (xn I Xn-J), (3.18)

can be obtained from Eq. 3.16 by multiplying with the density of X(tn-I) and from
Eq. 3.17 by dividing with the density of (X(tn-J), ... , X(tt)), respectively. A Markov
processcanalsobedefinedbytheconditionE[g(X(t+s)) 1.1l1 = E[g(X(t+s)) I X(t)],
where t, s :::: 0 are arbitrary times and g : lR---+ lR is a Borel function ([61], p. 156) . .l

The density of the conditional random variable X(t) I X(s), t > s, called
the transition density of X, has the following remarkable properties.
If X is a Markov process, then all its finite dimensional densities are specified
completely by its marginal and transition densities.

Proof: Repeated application of Eq. 3.17 gives


n
fn (XJ, ... , xn; t1, ... , tn) = /(XJ; t1) Jl fx(ti)IX(ti-Il (xi I Xi-J),
i=2

Chapman-Kolmogorov equation. The conditional densities f X(t)IX(to)•


!x(t)IX(s)• and !x(s)IX(to)• to < s < t, are related by

!x(t)IX(t0 )(x I xo) = k. fx<t)IX(s)(X I y) fx(s)IX(t0)(Y I xo) dy. (3.19)

Proof: The density of the conditional random variable X (t) I X (to), to < s < t, is

fx(t)IX(to)(x I xo) =
f2(xo, x; to, t)
f ( t)
1 xo; o
1
= f 1 (xo; t)
h
o lR'. f3(xo, y, x; to, s, t)dy
= { f3(x 0 , y, x; to, s, t) f2(xo, y; to, s) dy,
llR'. f2(xo, y; to, s) !1 (xo; to)
which gives Eq. 3.19. •
122 Chapter 3. Stochastic Processes

Example 3.21: Let X (t) = L~=l Yk be a discrete time process, where t ::=: 1 is an
integer and Yk, k = 1, 2, ... , are iid random variables. The process X is Markov
and fPX(t)I(X(s)=y)(u) = eHuy n~=s+l cpy1 (u) is the characteristic function of
X(t) I X(s), t > s. o
Proof: Fort > s we have X(t) I (X(s) = y) = y + Li=s+l Yk so that future values X(t)
of X are independent of its past values conditional on X (s) = y. The characteristic function
results from the expression of X(t) I (X(s) = y), where cpy1 denotes the characteristic
function of Yt. •

3.6.4 Independent increment processes

• An JRd -valued stochastic process X is said to have independent increments


if the random variables X(t) - X(v) and X(u) - X(s), s < u :5 v < t, are
independent.
• If X has independent increments and the distribution of X (t) - X (s ), s :5 t,
depends on only the time lag t - s rather than the values of the times s and t,
then X is said to have stationary independent increments.

Note: The Brownian motion and the compound Poisson processes in Examples 3.4 and 3.9
have stationary independent increments. For example, by the definition of the Brownian
motion process B, the increments of B over non-overlapping intervals are independent and
B(t)- B(s) ~ N(O, t - s), t 2: s. A

The finite dimensional densities of a real-valued process X with X (0) = 0 and


independent increments are

fn (XI, ... , Xn; ft, ... , tn) = fy, (xt) fy2 (xz- XI) ... !Y. (Xn - Xn-1), (3.20)

where /Y; denotes the density of Yi = X(ti)- X(ti-t), i = 1, ... , n.

Proof: Let 0 = to < t1 < tz < · · · < tn be n arbitrary times. Then

P(X(tt).::; x1, ... , X(tn).::; xn) =P (rl.::; x1, Yt + Yz.::; xz, ... , t
k=l
Yk.::: xn)

so that the f.d.d. of order n of X is


X! 1X2-Y!
P(X(tt).::; x1, ... , X(tn).::; xn) = 1-oo dy1 fy 1 (Yt) -oo dyz IY/Yz) · · ·

l Xn-1-:Ek:iYk
· · · -oo dYn-1 fyn-! (Yn-1)
1Xn-:Ek:/Yk
-oo dyn fyn(Yn)

where fyk denotes the density of Yk. The corresponding finite dimensional density of order
n of X can be obtained from the above equation by differentiation or from the density of
the random vector (Yt. ... , Yn) and its relation to (X(t1), ... , X (tn)). •
3.6. Classes of stochastic processes 123

Let X be a real-valued process with stationary independent increments. The


following three properties of X are useful in applications.
If X is a real-valued process with stationary independent increments, then X is
a Markov process.

Proof: The densities of the conditional variables X Un) I {X Un-1), ... , X(t1)} and X(tn) I
XUn-1) coincide (Eq. 3.20). This property shows that the Brownian motion and the com-
pound Poisson processes are Markov. •

If X is a real-valued process with stationary independent increments, then

P(X(t + r1 + r2)- X(t) :S x) = ~ F,2 (x- y) fr: 1(y)dy,

!r:1+r:2 (x) = ~ !r: (x- y) f, 1 (y) dy,


2 and

'Pr:1 +r:z (u) = 'Pr:1 (u) 'Pr:z (u) (3.21)

for r1, r2 > 0, where F,, fr:, cp, denote the distribution, the density, and the
characteristic function of X(t + r)- X(t), r > 0.

Proof: Set .6.1 = X(t + r1)- X(t) and .6.2 = X(t + r1 + r2)- X(t + r1) for r1, r2 > 0.
The distribution of X (t + r1 + r2) - X (t) = .6.1 + .6.2 is

P(X(t + r1 + r2)- X(t) _:::: x) = P(.6.1 + .6.2 _:::: x) = k Fr:2 (x- y) /r:1(y)dy

since the random variables .6.1 and .6.2 are independent. The density of X (t + r1 + r2) -
X (t) results from the differentiation of the above equation. The characteristic function of
X(t + r1 + r2) - X (t) is the product of the characteristic functions of .6.1 and .6.2, that is,
CfJr:1+r:z(U) = fPr:l (u)rpr:2 (u). This relationship becomeS fPL,~=l Tk(U) = nk=1 fPr:k(U) for
m non-overlapping time intervals oflength Tk, k = 1, ... , m. •

The mean J.L(t) = E[X (t)] and variance a (t) 2 = E[(X (t)- J.L(t)) 2 ] of a process
X(t), t :::: 0, with X(O) = 0 and stationary independent increments are linear
functions of time.
Proof: Properties of the expectation give
M(t + r) = E[(X(t + r)- X(t)) + X(t)] = M(r) + M(t),
a(t + r) 2 = E{[(X(t + r)- X(t))- (M(t + r)- M(t)) + (X(t)- M(t))] 2}
= E{[(X(t + r)- X(t))- (M(t + r)- M(t))] 2} + E{[X(t)- M(t)] 2}
= a(r) 2 + a(t) 2
fort, r :::: 0 so that M(t)= t M(l) and a(t) 2 = ta(1) 2. This result is consistent with
the evolution of mean and variance of the Brownian motion B and the compound Poisson
process C. These moments are E[B(t)] = 0, Var[B(t)] = t, E[C(t)] = At E[Yd, and
Var[C(t)] =At E[Yf] provided that Y1 E L2, where A > 0 is the intensity of the Poisson
process N in the definition of C (Example 3.9). •
124 Chapter 3. Stochastic Processes

Example 3.22: The finite dimensional density of order n of a Brownian motion


B is

fn (XI, · · . , Xn; f1, ... , tn) = </J ( X~) </J ( ~) ... </J ( Xn - Xn-l ) ,
yfl tz-
Jtn- tn-1
t1
(3.22)
where ¢ denotes the density of the standard Gaussian variable. The characteristic
function of the increment C(t) - C(s), t ::;: s, of a compound Poisson process
C(t) = "£f~? Yk is

J
E [ eA u (C(t)-C(s)) = e). (t-s)[l-<py(u)], (3.23)

where A. > 0 is the intensity of the underlying Poisson process N and cp y is the
characteristic function of Y 1· <>
Proof: Properties of Brownian motion and Eq. 3.20 give Eq. 3.22. The characteristic
function of C(t)- C(s) 1: C(t- s), t 2: s, is
E [eA u C(t-s)] = f E [eA u C(t-s) I N(t- s) = n] [A. (t ~ s)]n e-A (t-s)

E(•,(•))" •)I') ~ ,-•


n=O

~ ,-• ,,_,, ( 1 + [A(<:, (H)[I-•,<•ll'

whererpy(u) = E[eHuYq. •

3.6.5 Gaussian processes


• A stochastic process X is said to be Gaussian if all its finite dimensional
distributions are Gaussian.
• A Gaussian process X satisfying the condition in Eq. 3.14 is called a station-
ary Gaussian process.

Note: To prove that a stochastic process X is Gaussian, we need to show that Xn


(X(ti), ... , X(tn)) E Rnd is a Gaussian vector for any integer n 2: 1 and times ti, i =
1, ... , n, (Section 2.11.5). &

Example 3.23: Let Z = (Z 1, ... , Zm) E JRm be a Gaussian vector and


m
X(t) = L zk wk(t) = w(t) z, t::: o, (3.24)
k=l

where Wk(t), k = 1, ... , m, are real-valued, deterministic, and continuous func-


tions of time and w (t) denotes a row vector with coordinates w k ( t). Then X is a
Gaussian process. <>
3.6. Classes of stochastic processes 125

Proof: Let Xn = (X(t1), ... , X(tn)), where n 0:: 1 is an integer and (tJ, ... , tn) denote
arbitrary times. The vector Xn can be expressed as a linear transformation of the Gaussian
vector Z so that it is Gaussian (Section 2.11.5). •

Let g : [0, oo) x IRm -+ IRd be a measurable function and Z be an !Rm-


valued random variable, where d, m ::: 1 are integers. The function X(t) =
g (t, Z) depends on a finite number of random variables, the variables Z 1, ... , Zm,
and is called a parametric stochastic process. Parametric processes are used, for
example, to approximate general stochastic processes for Monte Carlo simulation
(Chapter 5). The process in Eq. 3.24 is a parametric Gaussian process.

3.6.6 Translation processes

An JRd -valued stochastic process X is said to be a translation process if its


coordinates are memoryless, measurable, nonlinear functions of an JRd -valued
Gaussian process G, that is, Xi(t) = gi(G(t)) and gi : JRd -+ IR, i =I, ... , d,
are measurable functions.
Note: Generally,wedefinethecoordinatesofXbyXi(t) = gi(Gi(t)),i = 1, ... ,d,
so that gi : R -+ R are real-valued functions defined on R. If the processes Gi are
independent and Xi (t) = gi (Gi (t)), i = 1, ... , d, then the non-Gaussian processes Xi are
independent of each other. Otherwise, the coordinates of X are dependent non-Gaussian
processes. &

Example 3.24: Let

X(t) = g(G(t)) =p-I o <I>(G(t)), t 2::0, (3.25)


be a real-valued translation process, where F is an arbitrary distribution with den-
sity f, <I> is the distribution of N(O, 1), and G is a Gaussian process such that
E[G(t)] = 0 and E[G(t) 2 ] = 1. The marginal distribution of X is F. The finite
dimensional density of order n of X is

= [(2nt det(p)r 112 Il f(xp) exp


p=l ¢(Yp)
(-~ yT p- 1 y),
2
(3.26)

where p = {E[G(tp) G(tq)]}, Yp = <I>- 1 o F(xp), p, q = 1, ... , n, ¢is the


density of N(O, 1), andy= (yJ, ... , Yn). <>
Note: The marginal distribution of X is P(X(t)::; x) = P(F- 1 o CI>(G(t))) = P(G(t)::;
ci>- 1(F(x))) = F(x). The finite dimensional distribution Fn of order n of X is the prob-
ability of the event {X(tJ) :S x1, ... , X(tn) :S Xn} or {G(fJ) :S Y1· ... , G(tn) :S Yn}.
The density fn in Eq. 3.26 can be obtained by differentiating Fn. Alternatively, fn can
be calculated from the density of the Gaussian vector (G(tJ), ... , G(tn)) in Section 2.11.5
and the change of variable in Eq. 3.25. &
126 Chapter 3. Stochastic Processes

3.6. 7 Mixture of translation processes


Let Xk(t) = gk(Gk(t)) = (pCklf 1 o ct>(Gk(t)), t ::: 0, k = 1, ... , m, be
real-valued translation processes, where p(k) are absolutely continuous distribu-
tions and Gk denote stationary Gaussian processes with mean zero, variance one,
correlation function Pk(r) = E[Gk(t) Gk(t + r)], and spectral density Sk. The
processes G k are independent of each other. The finite dimensional densities of
the translation processes Xk are (Eq. 3.26)

(3.27)

where Pk = {E[Gk(tp) Gk(tq)]}, Yk,p = cp-l o p(kl(xp), p, q = 1, ... , n,


¢ is the density of N(O, 1), J?\x) = f(kl(x) = dF(kl(x)jdx, and Yk =
(Yk,l, · · ·, Yk,n).
Let X be the collection of real-valued stochastic processes with finite di-
mensional densities

(3.28)

where Pk ::: 0, k = 1, ... , m, are such that Lk=l Pk = 1. The processes in X


are referred to as mixtures of translation processes. A member of X is said to
be non-degenerate if m > 1 and Pk E (0, 1) for all k = 1, ... , m. We now give
some of the properties of these processes.

• Translation processes are included in X. The non-degenerate members of X


are not translation processes.
• The members of X are stationary non-Gaussian processes.
• The moments tLkCql, ... , qn; t1, ... , tn) = E [07=
1 Xk(ti)q;] and
tL(q}, ... ,qn;tl, ... ,tn) = E[07= 1 X(ti)q;] oforderq = L7= 1 qi ofXk and
X with finite dimensional densities in Eqs. 3.27 and 3.28, respectively, are re-
lated by
m
f.L(ql, · .. , qn; t1, ... , tn) =L Pk f.Lk(ql, ... , qn; t1, · · ·, fn). (3.29)
k=l

Proof: Take m = 2 and n = 2 in Eq. 3.28. Suppose that h in this equation defines
a non-degenerate process in X, the densities fik) satisfy the condition !?) (xl, x2) =
3. 7. Second moment properties 127

f(k)(xi) j(k)(x2), k = 1, 2, and Jlk(l; t) = 0, k = 1, 2. Hence, theW-valued ran-


dom variable X with density f2 = p 1?) + (1 - p) 1?), p E (0, 1), has uncorrelated
coordinates (Eq. 3.29). If X were a translation vector, then we would have

P J?) (XI, X2) + (1 - p) J?) (XI, X2) = p f(l) (XI) f(l) (x2) + (1 - p) /( 2) (xi) /( 2) (X2)
= (p f(I)(XI) + (1- p) J< 2)(XI)) (p j<I)(x2) + (1- p) f( 2)(x2))
since uncorrelated translation variables are independent. The above equality gives

(!(l)(xi)- !(2)(XI)) (!(I)(x2)- !(2)(x2)) =0


for all (xi, x2) E JR2 . This implies that j(l) coincides with j< 2) so that the mixture is
degenerate in contradiction with the initial assumption.
The last two properties follow from the properties of the Gaussian processes Gk and
the definitions of the processes Xk and X by elementary calculations. •

Example 3.25: Let {('r, a) = E[X(t) X(t + i) X(t +a)] be the second order
correlation function of X E X with finite dimensional density in Eq. 3.28. Then
{(i, a)= L~=I Pk {k(i, a), where sk(i, a)= E[Xk(t) Xk(t + i) Xk(t+a)]. ~
Note: This is a special case of Eq. 3.29. The second order correlation functions ~ and ~k
depend on only two arguments because the processes X and Xk are stationary. A

3.7 Second moment properties


Many of the definitions and results in this section relate to the second mo-
ment properties of random vectors (Section 2.11.4). Let X be an JRd -valued pro-
cess in L2(Q, :F, P), that is, the coordinates of X have finite second moments at
all times.

• Mean function: p,(t) = E[X(t)]


• Correlation function: r(t, s) = E[X(t) X(s)T]
• Covariance function: c(t, s) = E[(X(t) -p,(t)) (X(s) -p,(s))T]
(3.30)

Note: The moments in Eq. 3.30 exist and are finite because X(t) E L2. For example, the
entry (i, j) of r(t, s) giving the correlation function r;,j(f, s) = E[X;(t) Xj(s)] is finite
since (E[X; (t)X i (s)]) 2 .::; E[X; (t) 2] E[X j (s) 2] by the Cauchy-Schwarz inequality and
the expectations E[X; (t) 2 ] exist and are finite by hypothesis. Similar considerations show
that the mean and covariance functions of X exist and are finite.
Generally, the condition X(t) E L2, t ::::_ 0, is insufficient for the existence of
moments of X of order higher than two. The class of Gaussian processes is an exception
because their moments of order three and higher are related to the first two moments by
algebraic equations ([79], Appendix B). A
128 Chapter 3. Stochastic Processes

The second moment properties of X are given by the pair of functions (p,, r)
or (p,, c) in Eq. 3.30.

Note: The functions (JL, r) and (JL, c) contain the same information since the correlation
and covariance functions are related by c(t, s) = r(t, s)- /l(t) p,(tl.
If r;,j (t, s) = 0 at all times, the processes X; and Xi are said to be orthogonal.
If c i, j ( t, s) = 0 at all times, the processes X; and X J are said to be uncorrelated. If the
processes X; and X J have mean zero, that is, 1-ti (t) = 0 and 1-t J (t) = 0 at all times, they
are orthogonal if and only if they are uncorrelated. A

Example 3.26: Generally, the second moment properties of a process X are in-
sufficient to define its finite dimensional distributions. The Gaussian process is a
notable exception. <>
Proof: Let n :::: 1 be an integer and t;, i = 1, ... , n, denote arbitrary times. The
finite dimensional distribution of order n of X is the distribution of the vector Xn =
(X(t]), ... , X(tn)). If X is a Gaussian process, Xn is a Gaussian vector whose second
moment properties can be obtained from the mean and correlation functions of X. The
distribution of Xn is completely defined by its first two moments (Section 2.11.5). •

If a process X is stationary, then its finite dimensional distributions are in-


variant to a time shift so that (1) the mean function p, is a constant and (2) the
correlation and covariance functions r and c depend only on the time lag. These
observations suggest the following definition.

A stochastic process X is said to be weakly stationary or stationary in the


weak sense if
1. The mean function p,(t) = p, is time invariant and
2. The correlation and the covariance functions r(t, s) and c(t, s) depend on
only the time lag T = t - s, that is, r(t, s) = r(r) and c(t, s) = c(r).
Note: A stationary process in L2 is weakly stationary but the converse is not generally
true. A notable exception is the Gaussian process that is stationary if and only if it is
weakly stationary. For example, let X be a real-valued weakly stationary Gaussian process
with mean 1-t and covariance function c(t, s) = E[(X(t)- ~-t)(X(s)- ~-t)J. The vector
Xn = (X(t]), ... , X(tn)) is Gaussian with mean {t ln and covariance c = {c(t; - tj)},
i, j = 1, ... , n, for any integer n :::: 1 and times 0 :::;: t1 < · · · < tn, where ln is an
n-dimensional vector with unit entries. The finite dimensional density of order n of X at
x = (x1, ... , Xn), that is, the density

f(x; t1, ... , tn) = [(2rr)ndet(c)]-l/2 exp [- ~(x - {t ln)T c- 1 (x - {t ln) J


of Xn is invariant to a time shift.
The above notations r(t,s) = r(r) and c(t,s) = c(r), r = t - s, are used for
convenience when dealing with weakly stationary stochastic processes. They may cause
some confusion since the functions r(t, s), c(t, s) and r(r), c(r) are defined on~ and JR,
respectively. A
3. 7. Second moment properties 129

Example 3.27: Let X be a real-valued discrete time stochastic process defined by

where p E (0, 1), Wk are uncorrelated random variables with mean E[W k] = 0
and variance Var[Wk] = 1 that are uncorrelated to the initial value X o of X. The
process X is weakly stationary as k --+ oo but may or may not be stationary
depending on the higher order properties of W k· <>
Proof: The recurrence formulas 11-k+l = p Il-k and u£+ 1 = p 2 u£ + 1 for the mean
Il-k= E[Xk] and the variance u£ = E[(Xk- Mk) ] of Xk result by averaging the defining
2
equation for X and its square, respectively. The covariance function of X satisfies the
equation c(k + p, k) = E[Xk+p Xk] = pP u£, where xk = xk -Il-k· This equation
results by taking the expectation of the product of Xk+p = pP xk + Z:f=l ps-I Wk+p-s
with Xk because Xu and Wv, v 2: u, are not correlated. The asymptotic values of the mean,
variance, and covariance functions of X ask---+ oo are zero, 1/(1- p2), and pP j(l- p 2),
respectively, so that X becomes a weakly stationary process for large values of k.
However, X may not be stationary. For example, f{Jk(u) = E[exp(J=T u Xk)]
satisfies the recurrence formula (/Jk+l (u) = f{Jk(P u) f{JWk (u) if the random variables Wk
and Xk are independent for all k's, where cpwk(u) = E[exp(J=T u Wk)]. Suppose that X
becomes stationary for large values of k and denote the stationary characteristic function
of X by cp. For these values of k the recurrence formula becomes cp ( u) :::::: cp (p u) cpwk ( u) or
cp(u)jcp(p u) :::::: cpwk (u). This equality is not possible because its left side is time invariant
while its right side depends on time if, for example, Wk has the above second moment
properties but different characteristic functions at each time k. •

Example 3.28: Let X (t) = g(G(t)) = F- 1 o <l'(G(t)) be the translation process


in Eq. 3.25. Suppose that F is such that f x dF(x) = 0 and f x 2 dF(x) = 1
and that G is a stationary Gaussian process with mean zero, variance one, and
covariance function p(r) = E[G(t) G(t + r)]. The covariance function of X is

;(r) = E[g(G(t)) g(G(t + r))] = { g(yi) g(y2) </J(JI, y2; p(r)) dy1 dy2,
]Tft2
(3.31)
where¢ denotes the joint density of (G(t), G(t + r)) (Section 2.11.5). The
covariance function ; of X takes on values in the range [; *, 1], where ;* =
E[g(Z) g(-Z)] and Z is a standard Gaussian variable. The translation process
X is weakly stationary and stationary. <>
Proof: The formula in Eq. 3.31 results from the definition of the translation process X or
a theorem by Price ([79], Section 3.1.1). This theorem also shows that ~(r) increases with
p(r). The covariance function of X is ~(r) = 0 and 1 for p(r) = 0 and 1, respectively.
However, p(r) = -1 does not imply ~(r) = -1. If we have p(r) = -1, that is, G(t) = Z
and G(t + r) = -Z, the value of the covariance~* = E[g(Z) g(-Z)] of (X(t), X(t +
r)) depends on g. For example,~* = -1 if g(y) = y 3 but~* > -1 if g(y) = a+
exp(,B + y G(t)), where a, ,B, yare some constants ([79], Section 3.1.1). This property of
the covariance function of X is particularly relevant in applications when we attempt to fit
130 Chapter 3. Stochastic Processes

a non-Gaussian translation model to a marginal distribution F and covariance function ~


estimated from data. If~ takes on values outside the range [~*, 1], it is not possible to fit
a translation process to F and ~. Additional conditions for the existence of a translation
process with specified F and~ are given in [79] (Section 3.1.1).
That X is weakly stationary follows from Eq. 3.31. Also, the finite dimensional
density fn of translation processes in Eq. 3.26 depends on time through the covariance
matrix p = {E[G(tp) G(tq)] = p(tp- tq)} so that it is invariant to a time shift. •

3.7.1 Properties of the correlation function


Let X be an JRd -valued stochastic process in L2(Q, :F, P) with correlation
functions r;,J (t, s) = E[X; (t) X 1 (s)]. We divide the properties of these functions
in two groups corresponding tor i,i and r;,J, i =/= j.
• r;,; is symmetric, that is, r;,; (t, s) = r;,; (s, t). If X is weakly stationary, then
r;,; is an even function, that is, r;,; (r) = ru ( -r), where r = t - s.
•The inequality lr;,;(t, s)l 2 :S r;,; (t, t) r;,; (s, s) holds. If X is weakly stationary,
then lr;,;(r)l :S r;,;(O).
• r;,; is positive definite.
Proof: The first property follows from the definition of the correlation function and the
fact that this function depends only on the time lag for weakly stationary processes. The
Cauchy-Schwarz inequality gives the second property. That r;,; is positive definite, that
is, Lk,t ak a1 r;,; (tk, t1) ~ 0 for any integer n ~ 1, constants (al, ... , an), and times
(tb ... , tn), results from the fact that (Lk=l ak X; (tk) ) 2 is a positive random variable so
that its expectation Lk,t ak a1 r;,; (tk, t1) must be positive.
The notations r;,; (t, s) and r;,; (r), r = t - s, are somewhat inconsistent but are
common in the second moment analysis of stochastic processes, and we will use them. The
same comment applies to the correlation function r;,j of distinct coordinates of X. •

• r;,J, i =/= j, satisfies the condition r;,J (t, s) = TJ,i (s, t). If X is weakly station-
ary, then r;,J(r) = TJ,;(-r), where r = t - s.
• The inequality lr;,J (t, s) I :S Jr;,; (t, t) TJ,J (s, s) holds. If X is weakly station-
ary, then lr;,J(r)l :S Jr;,;(O)rj,J(O).
Proof: The first property follows from the definition of the correlation function because
r;,j (t, s) = E[X; (t) X j (s)] = E[X j (s) X; (t)] = r j,i (s, t). If X is weakly stationary, we
haver;,j(t+r, t) = r1,;(t, t+r) sothatr;,j(r) = rj,;(-r). TheCauchy-Schwarzinequal-
ity gives the second property. We also haver;,; (t, t) rj,j (s, s) .:::; [r;,; (t, t) + rj,j(s, s)] 2 /4
since 4a b.:::; (a+ b) 2 for any a, b E R •

Example 3.29: Let X be a <Cd -valued process in L2 (Q, :F, P), that is, its co-
ordinates are complex-valued processes such that E[X; (t) X; (t)*] < oo, i =
1, ... , d, at all times. The correlation functions of X are
r;,j(t, s) = E[X;(t) Xj(s)*], i, j = 1, ... , d, (3.32)
3. 7. Second moment properties 131

and satisfy the condition r;,J (t, s) = TJ,i (s, t)*, where z* denotes the complex
conjugate of z E IC. <)

Proof: Let U; and V; denote the real and imaginary parts of X; . Then

r;,j (t, s) = E [ ( U;(t) + J=1 V;(t)) J=1 Vj(s))]


( Uj(s)-

= E [U;(t) UJ(s) + V;(t) Vj(s)] + J=1 E [V;(t) Uj(s)- U;(t) Vj(s)]


and is equal to the complex conjugate of rj,;(s, t) that can be obtained by similar calcu-
lations. Alternatively, note that rj,i (s, t)* = (E[X j (s) X; (t)*J)* = E[X J (s)* X; (t)] =
r i, J (t, s). Other properties of the correlation functions for complex-valued processes can
be derived in the same manner. •

Example 3.30: Let Ak, Bk be uncorrelated random variables with mean zero and
unit variance. The function
n
X (t) = L ak [Ak cos(vk t) + Bk sin(vk t)], t 2: 0, (3.33)
k=l
is a real-valued parametric stochastic process with mean zero and covariance func-
tion
n
c(t, s) =La[ cos(vk (t- s)), (3.34)
k=l
where ak, Vk > 0, k = 1, 2, ... , n, are some constants. The process X is weakly
stationary. If in addition Ak, Bk are Gaussian variables, then X is a stationary
Gaussian process. <)

Proof: The mean is zero because X is a linear function of Ak and Bb and these random
variables have mean zero. The covariance function of X is
n
c(t, s) = L uk uzE[(Ak cos(vk t) + Bk sin(vk t))(At cos(vz s) + Bz sin(vt s))]
k,l=l
n
= L u[ (cos(vk t) cos(vk s) + sin(vk t) sin(vk s))
k=l
n
= L u£ cos(vk (t - s)) = c(t - s)
k=l
by the linearity of the expectation operator and the properties of the random variables
Ak, Bk. Because c(t, s) depends on the time lag t - s rather than the times t and s, X
is a weakly stationary process.
If in addition the random variables Ako Bk are Gaussian, then X is stationary since
it is a weakly stationary Gaussian process. •
132 Chapter 3. Stochastic Processes

3.7.2 Power spectral density


Let X be a weakly stationary stochastic process in L2(Q, F, P). We have
seen that the mean and the correlation functions or the mean and covariance func-
tions provide equivalent second moment properties for weakly stationary pro-
cesses. The spectral density of a process X provides an alternative way for speci-
fying the second moment properties of this process.

3.7.2.1 Bochner's theorem

i:
A continuous function r : ~ t-* <C is positive definite if and only if it has the
representation
r(r) = eA v-r dS(v), (3.35)

where Sis a real-valued, increasing, and bounded function.

Note: The function r is said to be positive definite if Lk,l=l ~k ~z* r(tk - tz) ~ 0 for any
integer n ~ 1, arguments tk E JR, and complex constants ~b k = 1, ... , n, where z*
denotes the complex conjugate of z E C.
The proof of Bochner's theorem can be found in [45] (Section 7.4). We only show
that r(·) given by Eq. 3.35 is positive definite. Note that

t ~k ~z* r(tk- tz) = t ~k ~z* L: eA v (tk-t1) dS(v)

L: (t~keAvtk) (t~zeAvtt)*
k,l=l k,l=l

= dS(v)
k=l 1=1

dS(v)

is positive for any integer n ~ 1, arguments tk, and complex constants ~k· &

3.7.2.2 R-valued stochastic processes


If X is a real-valued weakly stationary process, its correlation function
r(r) = E[X(t + r) X(t)] is a positive definite function. The Bochner theorem
states that, if the correlation function r : ~ ~ ~is continuous, then there ex-
ists a real-valued, increasing, and bounded functionS, called the spectral dis-
tribution function of X, such that r has the representation in Eq. 3.35. The
spectral distribution function is defined up to a constant that can be eliminated
by setting S(-oo) = 0. If Sis absolutely continuous, there exists a function
s(v) = dS(v)jdv, v E ~called the spectral density or the mean power spec-
tral density function of X.
3. 7. Second moment properties 133

L:
The correlation and spectral density functions are Fourier pairs, that is,

r(r) = eR vr s(v) dv 1
and s(v) = -
2rr
100
_
e-Rvr
00
r(r)dr.

(3.36)

Note: If M = E[X (t)] is not zero, the spectral density function of X has an atom at v =0
equal to M2 since r(r) = M2 + c(r) (Eq. 3.36). Formal calculations give

s(v) = M2 8(v) + s(v), where s(v) = l-


2rr -oo
100
e-Rvr c(r)dr, (3.37)

where 8(·) denotes the Dirac delta function. In engineering applications the one-sided
spectral density function

g(v) = M2 8(v) + 2s(v), v:::: 0, (3.38)

is preferred since v :::: 0 in Eq. 3.38 can be interpreted as frequency. By analogy with the
spectral distribution S of X, we can define the one-sided spectral distribution function of
this process by

where the notation J;+ indicates that the integral does not include the value of the integrand
at zero. &

We summarize now some properties of the spectral and the one-sided spec-
tral density functions.
• s ( ·) or g ( ·) provide the second moment properties of X.
• sO and g(·) are positive functions and s(v) = s( -v) for all v E R
• An alternative to Eq. 3.36 involving only real-valued functions is

r(T) = fooo g(v) cos(v r) dv and g(v) = _!:__ roo r(T) COS(V T) dT. (3.39)
rr lo
• The area under the spectral densities sO and g(·) is r(O). If E[X(t)] = 0,
this area is the variance of X.

Proof: The Fourier transform of the spectral density is the correlation function of X and
the mass of the atom of its spectral density at v = 0 is M2 . Hence, the functions sO and
gO deliver the second moment properties of X. Because Sis an increasing function, the
spectral densities sO and g(·) are positive. The spectral density sO is an even function
since the correlation function of X is even (Eq. 3.36). The last two properties result from
Eq. 3.36, Eq. 3.38, and the fact that the spectral density s(·) is an even function. •
134 Chapter 3. Stochastic Processes

Table 3.1. Examples of correlation and one-sided spectral density functions for
weakly stationary processes with mean zero (adapted from [175])

Process X r(r) g(v)


WN rr go o(r) go

!"'
0 < V ::'S Vc
BLWN go sin(vcr)/r
0, V > Vc
go, Va < V ::'S Vb
RP go(Vb- Va) sin(p) cos(y)jp,
0, v ¢. (va, Vb]

BN { a 2 (1- I• I/~). lrl ::'S ~ 0"2 A sin2 (v


A/2)
0, lrl > ~ rr (v A/2) 2
2 exp( -A.Irl) 2u 2 ),
1M a rr (v2+J,.2)
2M a2 exp( -A.Ir I) (1 + A.lr I) 4u 2 J,. 3
rr (v2+J,.2)2

Example 3.31: The spectral density and the one-sided spectral density functions
of the stochastic process in Eq. 3.33 are
1 n n
s(v) = 2 L af [o(v- Vk) + o(v + Vk)] and g(v) = L af o(v- Vk),
k=l k=l
(3.40)

respectively, where 8 ( ·) denotes the Dirac delta function. <i


Note: The spectral distribution function S is a piece-wise constant with jumps of magni-
tude a'f /2 at -vk and vk, k = 1, ... , n. The definition of the spectral density is based on
formal operations since dS (v) I d v does not exist in R This operation can be made rigorous
by considering a sequence of processes whose spectral densities are defined and converge
to the spectral density of X. ~

Table 3.1 gives examples of covariance and spectral density functions for
real-valued weakly stationary processes, where the WN, BLWN, RP, BN, 1M,
and 2M denote white noise, band-limited white noise, rectangular pulse, binary
noise, first order Markov, and second order Markov process, respectively, and go,
Vc, Va, Vb, ~.A., and a 2 are positive constants. The notations p = (vb- va) r/2
andy = (vb + va) r /2 are also used in the above table.

3.7.2.3 Cvalued stochastic processes


We have defined complex-valued processes in Example 3.29. If X is a
weakly stationary complex-valued process, its correlation function r(t - s) =
E[X(t) X(s)*] is complex-valued (Eq. 3.32). This correlation function can be
3. 7. Second moment properties 135

obtained from its real and imaginary parts (Example 3.29) and has the following
properties.
• The correlation function of X has the property r (t - s) = r (s - t) *.
• The correlation function of X is positive definite.

Proof: The equality r(t, s) = r(s, t)* (Example 3.29) implies the first property. The
second property results from the observation that the random variable

is positive for any integer n :::: 1, times fb and complex-valued constants ~k so that its
expectation Lk,t=l ~k ~z* r(tk- tt) is positive. The Bochner theorem (Eq. 3.35) states that
X has a spectral distribution function. •

Example 3.32: The correlation and spectral density of complex-valued processes


can be real-valued. For example, let X be a stationary Gaussian process with mean
zero and covariance function c(r) = E[X(t + r) X(t)] = exp(-A.Irl), A. > 0.
The correlation function and the spectral density of the complex-valued process
Y (t) = exp( R X (t)) are real-valued functions. <>
Proof: We have ry(r) = E[Y(t + r) Y(t)*] = E[exp(.J=T (X(t + r)- X(t)))] so that
ry(r) = exp ( -1 e-Jc lrl) because X(t + r)- X(t) is a Gaussian variable with mean
zero and variance e-Jc lrl. Hence, ry (r) is an even real-valued function so that the spectral
density of Y has the same properties. •

3.7.2.4 JRd -valued stochastic processes

If X is an JRd -valued weakly stationary stochastic process with mean zero and
continuous correlation function r(r) = E[X(t + r) X(t)T], then

r·l,j·(r) = i
lR
eRvr dSl,j·(v) = klR
eRvr s·l,j·(v)dv , where

S;,i = spectral distribution function of X;,

St,J = Sj,; = ~ [sr- .J=1 S2- (1- .J=l) (S;,; + Sj,j)], i # j,


(3.41)

Sp, p = 1, 2, is defined by Eq. 3.42 below, and s;,J (v) = dS;,J (v)jdv.
Proof: The representation of the entries r;, i of the correlation function follows from results
in Section 3.7.2.2. We show now that the correlation functions ri,J• i i= j, have a similar
representation. The integral representations of r;,j involving the spectral densities s;,j are
valid if the functions s;,J exist.
136 Chapter 3. Stochastic Processes

The complex-valued process Y(t) = L,f=l t;; X;(t), t;; E C, is weakly station-
ary because it has mean zero and correlation function ry(r) = E[Y(t + r) Y(t)*] =
L.%, 1= 1 sksz* rk,z(r). If ry is continuous, then ry has the representation in Eq. 3.36 since
it is positive definite. Let i, j E { 1, ... , d} be some fixed, distinct indices and let Y1 and
Y2 be special cases of Y corresponding to (1) t;; = sj = 1 and sk = 0 fork =I= i, j and
(2) t;; = A, t; j = 1, and sk = 0 for k =I= i, j, respectively. The correlation functions of
these processes,

ry 1 (r) = r;,; (r) + rj,j(r) + r;,j (r) + rj,i (r) and


ry 2 (r)=r··(r)+r·
1,1 ],1·(r) '
],]·(r)+J=lr·1,]·(r)-J=lr·

have the representations

ryp(r) =I: eR vr dSp(v), p =l, 2, (3.42)

where Sp denotes the spectral distribution of Yp, p = 1, 2 (Eq. 3.36). The last two equa-
tions and the representation of the correlation functions r;,; in Bochner's theorem give

1JR
r·1,]·(r)= [ eRvrdS·1,]·(v) , where

S·1,]· = s~.
],1
= ~
2 [s1 - R s2 - · ·)]
(1 - J=T)(S·1,1· + s],]

for the correlation functions r;,j, i =I= j. The spectral distributions S;,j, i =I= j, are bounded
complex-valued functions and the matrix {S;,j} is a Hermitian since S;,j = Sj,;. The
spectral distributions S;,j, i =1= j, are not monotone functions.
It remains to show that it is possible to define a spectral density for distinct pairs of
coordinates of X. Consider a bounded interval [v, v + Av) and set AS;,j (v) = S;,j (v +
Av) - S;,j (v). The Hermitian matrix {AS;,j (v)}, i, j = 1, ... , d, has the property
L,fJ=l t;; t;j AS;,J (v) ::: 0, t;; E C, since

ry(r) =I: eRvr dSy(v) =J~l I: t;; t;j eRvr dSu(v),

and the spectral distribution Sy of Y(t) = L,f=l sk X(tk) is an increasing function. For a
fixed pair of distinct indices i, j E { l, ... , d} and sk = 0 for k =I= i, j the sum
d
L t;; t;j AS;,j (v)
i,j=l

becomes

ASi,i (v) It;; 12 + ASJ,J (v) Is} 12 + 2 !R(ASi,J (v)t;; t;j) ::: 0, or

AS;,; (v) It;; 12 + ASJ,J (v) lsJ 12 + 2IAS;,J (v)llt;;llt;j I ::: 0,

because the real part of AS; ,1· (v) t;; (!'J is smaller than its absolute value. The latter ex-
pression divided by Is} 12 is a polynomial of 11 = It;; 1/lsj I that has no real roots so that
3.8. Equivalent stochastic processes 137

lilSi,j(v)i 2 :5 iilSi,i(v)liilSj,j(v)l. Hence, si,J(v) exists and satisfies the condition


lsi,j(v)l 2 :5 su(v)sj,J(v) at any v E lR, where si,i(v) = dSi,i/dv and Sj,j(v) =
dSJ,Jfdv ([45], Section 8.1). •

Example3.33: Let Xi(t) = .JT=/)Wi(t) + JPW(t), i = 1, 2, be the coordi-


nates of an JR2 -valued process X, where p E (0, 1) and (W1, W2, W) are zero-
mean, weakly stationary processes that are uncorrelated with each other. Then X
is a weakly stationary process with spectral densities

Si,i(v)=(l-p)sw;(v)+psw(v), i=l,2,
s1,2(v) = s2,1(v) = psw(v)

where sw; and sw denote the spectral densities of Wi and W, respectively.


Proof: The covariance functions of X are ci,J (r)= (1- p) liij cw; (r) + p cw(r), where
cw; and cw denote the covariance functions of Wi and W. The spectral densities si,i are
the Fourier transform of the covariance functions ci,i by Bochner's theorem (Eq. 3.35).
The correlation functions in Eq. 3.42 are

ry 1 (r) = (1- p) (cw1 (r) + cw2(r)) + 4 p cw(r),


ry 2 (r) = (1- p) (cw1 (r) + cw2 (r)) + 2 p cw(r),
so that

sl (v) = _I_ r e-A


2n )~
v Try! (r) dr = (1 - p) (sw! (v) + sw2 (v)) + 4 p sw(v),
s2(v) = -1- r e-A
2n )~
VT rYl (r) dr = (1 - p) (sw! (v) + sw2 (v)) + 2 p sw(v).
The spectral densities s1,2 and s2, 1 result from Eq. 3.41. •

3.8 Equivalent stochastic processes


In both applications and theoretical studies we need to assess differences
and similarities between two stochastic processes X and Y that may or may not be
defined on the same probability space. If X and Y have some common properties,
we say that they are equivalent in a sense defined by the shared properties.
The stochastic processes X and Y defined on the same probability space
(Q, F, P) are said to be indistinguishable if their samples coincide a.s.

Note: Let no be the subset of Q collecting the samples X(-, w) and Y(-, w) of X andY
that do not coincide. The processes X and Y are indistinguishable if P(Qo) = 0. A

The stochastic processes X and Y defined on the same probability space


(Q, :F, P) are said to be modifications if P({w : X(t, w) = Y(t, w)}) = 1
at each value oft.
138 Chapter 3. Stochastic Processes

Note: Indistinguishable processes are modifications but modifications may not be indistin-
guishable. Let Q 1 = {w : X(t, w) i= Y(t, w)} be the subset of Q on which the modifica-
tions X and Y differ at t. The set Q 1 is an event and P (Q1 ) = 0. If the probability of U1 Q 1
is zero, then X and Y will be indistinguishable. Because U1 Q 1 is an uncountable union of
members of :F, it may not be in :F, in which case P(U1 Q 1) is not even defined. A

The following conditions of equivalence for X and Y are weaker and do not
require that these processes be defined on the same probability space.
The stochastic processes X and Y are said to be versions if they have the same
finite dimensional distributions.
The stochastic processes X and Y are said to be equal in the second moment
sense if they have the same second moment properties.

Note: Generally, processes with the same second moment properties have different finite
dimensional distributions. An exception is the class of Gaussian processes. Recall that two
Gaussian processes with the same second moment properties are versions.
Other types of equivalence between stochastic processes can be defined. For ex-
ample, processes with the same marginal distribution and second moment properties or
processes with the same finite dimensional distributions of order two can be viewed as
equivalent. A

Example 3.34: Suppose that the stochastic processes X and Y are modifications
with right continuous sample. Then X and Y are indistinguishable. <>
Proof: The event {X(r) i= Y(r)} has zero measure for each r E Q so that the event
A = UrEQJ{X(r) i= Y(r)} has zero probability since the set of rational numbers Q is
countable. If rn -(, t, rn E Q, and t E R, then {X(rn) i= Y(rn)} C A for each n so that we
also have {X (t) i= Y (t)} c A at any time t E lR?. by right continuity. Hence, X and Y are
indistinguishable. •

Example 3.35: Let X (t) and Y (t), t E [0, 1], be real-valued processes defined
on the probability space ([0, 1], B([O, 1]), P), where P(dw) = dw such that
X (t, w) = 0 and Y (t, w) = 0 or 1 for t ::j::. w or t = w, respectively. The
processes X and Y are not indistinguishable but are versions and modifications. <>
Proof: The samples of X andY differ for all w E Q, for example, suptE[O, 11 X (t, w) = 0
while suptE[O,l] Y(t, w) = 1 foreachw E [0, 1].
The processes X and Y are modifications because Qt = {w : X ( t, w) i= Y (t, w)} =
{t} and P({t}) = 0 for any t E [0, 1]. These processes are also versions since the vectors
(X(t1), ... , X(tn)) and (Y(t1), ... , Y(tn)) differ on a subset of Q with measure zero for
any times t1, ... , tn in [0, 1]. •

Example 3.36: The mean and covariance functions of the Brownian motion and
the compound Poisson processes B and C are

J-LB(t) = 0, CB(t, s) = min(t, s) (3.43)


3.9. Second moment calculus 139

and
fi,c(t) =At E[Yl], cc(t, s) =A min(t, s) E[Yf], (3.44)
with the notations in Eq. 3.6 and the assumption E[Yf] < oo. If EY1 = 0
and A E[Yf] = 1, the Brownian motion and the compound Poisson process are
equal in the second moment sense. However, the samples of these processes differ
significantly (Fig 3.5) showing that the second moment equivalence provides little
information on the sample properties. <>
Proof: The Brownian motion has mean zero by definition. The expectation E[B(t) B(s)],
t > s, is

E[(B(t)- B(s) + B(s)) B(s)] = E[(B(t)- B(s)) B(s)] + E[B(s) 2] = s

since E[(B(t)- B(s)) B(s)] = E[B(t)- B(s)] E[B(s)] = 0 by the independence of the
increments of Band E[B(s) 2 ] = s.
We have E[C(t)] = L~o E[C(t) I N(t) = n] P(N(t) = n), E[C(t) I N(t) =
n] = n E[YI], and P(N(t) = n) =e-At (A t)n jn! so that

00 00

E[C(t)]= LnE[YI]e-At(At)njn!=E[YtJe-At Ln(At)njn!


n~ n~
00

= E[YI]e-At (At) L(At)n-l/(n -1)! = )..t E[Yt].


n=l
Similar calculations can be performed to find the mean and the covariance functions of C
in Eq. 3.44. •

Many physical phenomena are modeled in applications by stochastic pro-


cesses X whose properties have to be inferred from the available records of these
phenomena and/or other type of information, for example, restrictions imposed by
the physics of the modeled phenomena. Generally, the available information is in-
sufficient to find uniquely the probability law of X. There may be many stochastic
processes that are consistent with the available information. This lack of unique-
ness in modeling can have significant practical implications. For example, if the
available information on the input to a linear dynamic system is limited to the
mean and covariance functions, the second moment properties of the state of this
system are defined uniquely. However, the extreme, range, and other sample prop-
erties of the output can differ significantly depending on the particular model used
for X ([79], Chapter 5).

3.9 Second moment calculus


The Chebyshev inequality, the Cauchy-Schwarz inequality, the mean square
(m.s.) convergence of sequences of random variables (Sections 2.12 and 2.13) and
the following facts are essential tools for second moment calculations.
140 Chapter 3. Stochastic Processes

elf Xn. X E L2 andXn ~X, then

lim E[Xn] = E[I.i.m. n-H)oXn] = E[X]. (3.45)


n-HXl

• If Xn, X, Yn, Y E L2 and Xn ~ X, Yn ~ Y, then

lim E[Xm Yn]


m,n~oo
= E[(l.i.m. m~ooXm) (l.i.m. n~ooYn)] = E[X Y]. (3.46)

• If a sequence X n has two m.s. limits X and Y, then X = Y a.s., that is,
Xn ~ X and Xn ~ Y implies P(X i= Y) = 0. (3.47)

Proof: The above statements show that the expectations and the m.s. limits can be in-
terchanged under some conditions (Eqs. 3.45-3.46) and that the m.s. limit is unique with
probability one, that is, the subset of Q in which the m.s. limits of a sequence Xn may
differ has zero measure (Eq. 3.47).
We have

0 :::: IE[X]- E[Xnll = IE[X- Xnll :::; ( E[(X- Xnll) 112


for each n :=::: 1 by the Cauchy-Schwarz inequality so that liiDn~oo E[Xn] = E[X] since
Xn ~ X by assumption.
Note that

IE[Xm Yn -X Y]l = IE[Xm Yn - Xm Y + Xm Y- X Y]i


:::; IE[Xm (Yn - Y)]l + IE[(Xm - X)Y]I

:::: ( E[X~] E[(Yn - Y) 2 ]) 112 + ( E[(Xm - X) 2 ] E[Y 2 ]) 112 ,

where the Cauchy-Schwarz inequality has been used to obtain the final upper bound on
IE[Xm Yn - X Y]l. The property in Eq. 3.46 follows from the above inequality since
Xm ~ X and Yn ~ Y by hypothesis.
The uniqueness with probability 1 of the m.s. limit in Eq. 3.47 follows from the
Chebyshev inequality P(IX- Yl >e) :::; E[(X- Y) 2 ]/e 2 , e > 0, and the inequalities

0 :::; E[(X- Y) 2 ] :::; E[((X- Xn) + (Xn - Y)) 2 ]


:::; 2 E[(X- Xn) 2 ] + 2 E[(Xn - Y) 2 ],

where the last inequality holds since (a+ b)2 :::; 2a 2 + 2b2 . Because Xn ~ X, Y, we
have P(IX- Yl > e) = 0, Ve > 0. •

The remainder of this section gives definitions and basic properties of m.s. con-
tinuity, differentiation, and integration for real-valued processes in L 2(Q, :F, P).
We also extend these results to Rd -valued stochastic processes.
3.9. Second moment calculus 141

3.9.1 Continuity
We have defined in Section 3.3 several types of continuity for stochastic
processes. Generally, it is difficult to show that a process X is sample continuous,
that is, that almost all samples of X are continuous functions of time. Weaker def-
initions of continuity may be adequate in many applications. We define again m.s.
continuity and give simple criteria for determining whether a process is m.s. con-
tinuous at a time t and in a time interval.
A real-valued stochastic process X in L2 is m.s. continuous or continuous in
the mean square sense at timet ifl.i.m. 5 ~rX(s) = X(t), that is,

lim E[(X(s)- X(t)) 2 ] =lim II X(t)- X(s) II~= 0. (3.48)


s~t s~t

Note: This definition states that X is m.s. continuous at t if the distance between X(t)
and X (s) defined by the second moment of the difference X (t) - X (s) approaches zero as
s ~ t. The process X is said to be m.s. continuous in an interval/ if it is m.s. continuous
at each t E /.The norm in Eq. 3.48 is the norm in L2.
An JRd -valued stochastic process X is m.s. continuous at a time t if and only if its
coordinates Xi, i = 1, ... , dare m.s. continuous at t. &

The definition in Eq. 3.48 is not very useful for checking m.s. continuity.
We give a simple criterion for assessing whether a process is m.s. continuous.
• A real-valued process X is m.s. continuous at a time t if and only if its
correlation function r(u, v) = E[X(u) X(v)] is continuous at u = v = t, that
is, limu,v~t r(u, v) = r(t, t).
• A real-valued weakly stationary process X is m.s. continuous at a time t if
and only iflim,~o r(r) = r(O).

Proof: If X is m.s. continuous at t, the following chain of equalities and inequalities

0 :::=: !r(u, v)- r(t, t)!


= !E[X (u) X (v)] - E[X(t) X (u)] + E[X(t) X (u)]- E[X (t) X (t)ll
::::: IE[X(u) (X(v)- X(t))JI + IE[X(t) (X(u)- X(t))JI
::S: ( E[X(u) 2 ] E[(X(v) -X (t)) 2 ]) 112 + ( E[X(t) 2 ] E[(X (u) -X (t)) 2J) 112
shows that r(u, v) ~ r(t, t) as u, v ~ t. The last relationship follows by the Cauchy-
Schwarz inequality.
Conversely, the relationship E[(X(s)- X(t)) 2 ] = r(s, s) + r(t, t)- 2r(s, t) and
the convergence r(u, v) ~ r(t, t), u, v ~ t, imply the m.s. continuity of X at timet. •

A real-valued process X is m.s. continuous in an interval I if and only if its


correlation function r(u, v) = E[X (u) X (v)] is continuous on I x I.
142 Chapter 3. Stochastic Processes

Proof: We have E[(X(t) -X (s)) 2] = r(t, t) + r(s, s) - 2 r(t, s) fort, s E I arbitrary so


that X is m.s. continuous in I if r(t, t) +r(s, s)- 2 r(t, s) converges to zero as it -si ~ 0.
Hence, the process X is m.s. continuous in I if r is continuous in I x I.
A process X is uniformly m.s. continuous in I if for each s > 0 there exists 8 > 0
such that II X(u)- X(v) ii2< s for iu- vi < 8, u, v E I, where 8 depends on s but
not on the arguments of X. Hence, the distance between X(u) and X(v) can be bounded
uniformly by s if iu - vi is smaller than a fixed value 8. It can also be shown that if X is
m.s. continuous in a closed and bounded interval I, then X is uniformly m.s. continuous in
I ([157], Theorem 2.5, p. 34). A similar result holds for deterministic functions. •

Example 3.37: The Brownian motion B, the compound Poisson process C, and
the process X in Eq. 3.33 are m.s. continuous. Also, the processes B and X are
sample continuous while C is not. <>

Proof: The correlation functions of B and C at two times t and s are proportional with
t 1\ s. Because t 1\ sis a continuous function, Band Care m.s. continuous. The process
X is weakly stationary so that it is sufficient to observe that its correlation function is
continuous at the origin (Eq. 3.34).
The differences between the sample properties of B, C, and X show that m.s. con-
tinuity is a relatively weak requirement that does not provide much information on the
sample continuity of a process. •

3.9.2 Differentiation
The derivative X(t) = dX(t)jdt of the process X in Eq. 3.33 is

n
X(t, w) = L ak l!k [-Ak(w) sin(vk t) + Bk(w) cos(vk t)]
k=l

and can be calculated sample by sample. Generally, such calculations are not pos-
sible since analytical expressions are rarely available for the samples of arbitrary
stochastic processes.
A real-valued process X in L2 is mean square differentiable at t if
. X(t+h)-X(t)
L1.m. h---+0 h (3.49)

exists. This limit, denoted by X(t) or dX(t)jdt, is called the m.s. derivative of
X att.

Note: Because the limit in Eq. 3.49 is not known, it is not possible to calculate the second
moment of the difference X(t)- (X(t +h)- X(t))/ h. To prove the existence of X(t),
we need to show that {(X(t + hn)- X(t))/ hn} is a Cauchy sequence in L2 if hn ~ 0 as
n ~ oo. Because L2 is complete, {(X (t + hn) - X (t)) / hn} has a limit in L2 denoted by
X(t). Recall that a sequence {Yn} in L2 is Cauchy if II Yn- Ym ib~ 0 as m, n ~ oo.
3.9. Second moment calculus 143

An .!Rd -valued process X is m.s. differentiable at t if and only if its coordinates, that
is, the real-valued stochastic processes X;, i = 1, 2, ... , are m.s. differentiable at t. The
process X is said to be m.s. differentiable in an interval I if it has an m.s. derivative at
each timet E /. A

As for m.s. continuity, we give a criterion for assessing whether a real-


valued process X is m.s. differentiable. We also give some useful formulas in-
volving the m.s. differentiation and the expectation operators.
• X(t) exists in the mean square sense and E[X(t) 2 ] < oo if and only if
a2 r(u, v)j(au av) exists and is finite at u = v = t.
•If X is a weakly stationary process, then X(t) exists in the mean square sense
and E[X(t) 2 ] < oo if and only if r"(r) = d 2 r(r)jdr 2 exists and is finite at
r =0.
Note: The proof of these statements considers a sequence Yn = [X (t + hn) -X (t)]/ hn,
where hn ,!, 0 as n ~ oo. The existence of X in the m.s. sense implies that Yn is an
m.s. Cauchy sequence, that is II Yn- Ym 112~ 0 as m, n ~ oo. This property can be used
to show that a2 r(u, v)j(au av) exists and is finite at u = v = t. The converse results by
considering a finite difference approximation of a2 r(u, v)j(au av) and showing that the
existence of this partial derivative implies E[X(t) 2 ] < oo ([175], Section 4.4). A

If X is a real-valued m.s. differentiable stochastic process, then

d
dt E[X(t)] = E[X(t)],
. a .
at E[X(t) X(s)] = E[X(t) X(s)], and
a2 • . a2 r(t, s)
atasE[X(t)X(s)] = E[X(t)X(s)] or rx,x(t,s) = atas (3.50)

Additional properties are in Table 3.2.

Proof: The formulas in Eq. 3.50 show that differentiation and expectation can be inter-
changed. The proof of these formulas is based on Eq. 3.45. For example, the deriva-
tive of E[X(t)] is given by the limit of E[(X(t +h)- X(t))/h] ash ~ 0. Because
X exists, the expectation and the limit operations can be interchanged (Eq. 3.45) so that
limh---+0 E[(X (t +h)- X (t))/ h] is equal to E[l.i.m.h---;.O(X(t+h)- X(t))/ h] = E[X(t)].
The existence of derivatives of X of order two and higher require additional condi-
tions. For example, X(t) exists and E[X(t) 2 ] < 00 if and only if arx j((U, v)jau av exists
and is finite at u = v = t. • '

If X is a real-valued, weakly stationary, and m.s. differentiable process, then

d 2r(r)
rx
'
j((T) = ---
dr
2- = -r
11
(r) and rx,x(r) =- /_: v 2 eRv'dS(v).
(3.51)
Proof: The first formula in Eq. 3.51 follows from the last formula in Eq. 3.50. The
second formula in Eq. 3.51 can be obtained by differentiating Eq. 3.35. These results
144 Chapter 3. Stochastic Processes

Table 3.2. Properties of mean square differentiation (adapted from [175]).

Non-stationary Processes Weakly Stationary Processes

d dX (t) dX (t)
-[aX(t) +hY(t)] = a - - +b - - , a, b = constants
dt dt dt
d dg(t) dX (t)
d[g(t)X(t)]= --X(t)+g(t)--, g = a differentiable function
t dt dt
dn
E [ x(n) (t)] = - E [X(t)] E [ xCnl (t)] = 0, n2:,l
dtn

E [ x(n) (t) x(m) (s)] =


an+mr(t, s) [ J
E x(n) (t) x<m) (s) = ( -l)n dn+mr(r)
atn asm drn+m
dn
Note: x(n)(t) = -X(t) and T =s-t.
dtn

show that r 11 (r) = - f~oo v2 eR v r dS(v) so that the variance of X is E[X(t) 2]


f~oo v2 dS(v). The latter result can also be obtained by direct calculations. We have

d · d .
r (r) = -E[X(t + r) X(t)] = -E[X(t) X(t- r)]
11

dr dr
= -E[X(t) X(t- r)J = -E[X(t + r) X(t)J

since X is a weakly stationary process. •

Example 3.38: Let B denote a Brownian motion and C be a compound Poisson


process with E[Yt] = 0 and A. E[Yf] = 1 (Eq. 3.6). The correlation function of
these two processes is r(u, v) = u 1\ v so that they are not m.s. differentiable since
a 2 r(u, v)j(au av) does not exist.
Let us attempt to calculate the mixed derivative of u 1\ v anyway. Formal
calculations give a 2 r(u, v)j(au av) = 8(u- v), where 8(·) denotes the Dirac
delta function. Also, the one-sided spectral density of B and C is g(v) = 1/n,
v 2: 0. Processes with constant spectral density and delta correlation functions
are called white noise processes in the engineering and physics literature. These
calculations defining white noise processes are meaningless because B and C do
not exist in the m.s. sense. We will consider alternative ways of defining white
noise processes based on properties of the increments of the Brownian, compound
Poisson, and other processes. <>
Note: The differentiation of u /\ v relative to u gives a step function that is 1 for v 2: u and
zero otherwise. The differentiation of this step function with respect to v gives the Dirac
delta function that is zero for u i= v and has unit mass at u = v. Continuing with formal
calculations we find that the one-sided spectral density function of B is constant and equal
to 1/n (Eq. 3.39 and 3.38). &
3.9. Second moment calculus 145

Example 3.39: If X E L2 is m.s. differentiable at a timet, then X is m.s. contin-


uous at this time. 0

Proof: We need to show that II X (t +h) - X(t) 112 converges to zero ash -+ 0. Because
X is m.s. differentiable, the right side of the equation

II X(t +h)- X(t) ll2= lhl II [X(t +h)- X(t)]/ h ll2


converges to (0) II X(t) 112 ash -+ 0 so that, X is m.s. continuous at t. •

3.9.3 Integration
Let {X(t), t E [a, b]} be areal-valued stochastic process andleth : [a, b]-+
~be a real-valued function, where [a, b] C llt Our objective is to define the inte-
grals

1b h(t)dX(t) and 1b X(t)dh(t) (3.52)

in the mean square sense. These integrals are random variables because their
values depend on the particular sample of X used for calculations. If X is a para-
metric process, the integrals in Eq. 3.52 can be written explicitly. For example, if
X is in Eq. 3.33 and h is an integrable function in [0, t], we have

where hc,k(t) = f~ h(s) cos(vk s) ds and hs,k(t) = - f~ h(s) sin(vk s) ds are


Riemann integrals. The process Y has mean zero, covariance function
n
E[Y(t) Y(s)] =La[ Vf [hc,k(t) hc,k(s) + hs,k(t) hs,k(S) J
k=1
and can be interpreted as the output of a linear filter with transfer function h to a
random input X.
Generally, it is not possible to obtain analytical solutions for the integrals
in Eq. 3.52. We define these integrals in the m.s. sense. The definition of these
integrals requires some notation, that we introduce now. A finite set of points
p = (a = to < t1 < · · · < tm = b) is called a partition of [a, b] and t~ E
[tk-1, tk] is an intermediate point of p. The mesh of pis !::J.(p) = max1::;k::;m(tk-
tk-1). A partition p' is a refinement of p if p' 2 p. Refining partitions are not
necessary for the definition of the m.s. integrals in this section. We will use
refining partitions to define some of the stochastic integrals in Chapter 4.
Let Pn =(a= tan) < t{n) < · · · < tf/:} =b) be a sequence of partitions
of [a, b] with intermediate points t~(n) E [t~~l' tt)J such that !::J.(pn) -+ 0 as
146 Chapter 3. Stochastic Processes

n ~ oo. Define the sums

mn
Sh,x(Pn) = Lh(t£(n)) ( X(t~n))- X(t~~ 1 )) and
k=l
mn
Sx,h(Pn) = LX(t£(n)) (h(t~n))- h(t~~ 1 )). (3.53)
k=l

If Sh,x(Pn) and Sx,h(Pn) are Cauchy sequences in L2, we say that h is


m.s. Riemann-Stieltjes integrable on [a, b] with respect to X and that X is
m.s. Riemann-Stieltjes integrable on [a, b] with respect to h, respectively.

Note: The m.s. limits of the Cauchy sequences Sh,X and Sx,h define the integrals in
Eq. 3.52. These definitions are admissible because the limits of the Cauchy sequences
in Eq. 3.53 are in L2 and are independent of the intermediate points t£ (n) ([157], Theo-
rem 2.16, p. 41). The sequences Sh,X and Sx,h are similar to the Riemann-Stieltjes sums
used to define classical integrals.
The independence of the limits of the Cauchy sequences in Eq. 3.53 on the interme-
diate points t£ (n) of the partitions of [a, b] is essential for the definition of the m.s. integrals
in Eq. 3.52. If the limits were dependent on t£ (n), the above definitions would not be ad-
missible. This condition is not always satisfied (Section 4.3).

I: I:
The definition of the m.s. integrals in Eq. 3.52 can be extended to the integrals
X(t) dh(t) depending on an JRlf -valued process X. The coordinates
I: I:
h(t) dX(t) and
h(t)dXi(t) and Xi(t)dh(t), i = 1, ... ,d, of these integrals are given by limits of
Cauchy sequences as in Eq. 3.53. A

3.9.3.1 Variation functions

Consider a real-valued process X in L2, an interval [a, b] of the real line,


and a function h: [a, b] ~ R

m
• Vh(P) = L lh(tk)- h(tk-I)I = thevariationofh and
k=l

L II X(tk)- X(tk-I) ll2= the variation of X


m
• vx(p) = (3.54)
k=l

on [a, b] relative to the partition p = (a= to < t1 < · · · < tm =b).


3.9. Second moment calculus 147

m n
v,(p, q) = LL lr(tk, St) - r(tk, Sf-!) - r(fk-l, S[) + rCtk-l, S[-J)l
k=l l=l
= the variation of r (3.55)

on [a, b] x [a, b] relative to the partitions p =(a= to < t1 < · · · < tm =b)
and q =(a= so < SJ < · · · < sm =b) of [a, b].
Note: The variation of ron [a, b] x [a, b] is defined by
m n
v,(p, q) =LL IE[(X(tk)- X(tk-J))(X(st)- X(st-J))JI
k=ll=l
so that

L L (E[(X(tk)- X(tk-!)) 2 ] E[(X(sz)- X(sz-J)) 2l) l/2 = vx(p) vx(q)


m n
v,(p, q) :S
k=ll=l
for any partitions p and q of [a, b]. A

• vh = sup{vh(p)} =the total variation of h,


p

• vx = sup{vx(p)} =the total variation of X (3.56)


p

on [a, b], where the supremum is taken over the set of all partitions p of [a, b ].

v, =sup v,(p, q) =the total variation of ron [a, b] x [a, b], (3.57)
p,q

where the supremum is calculated over all partitions (p, q) of [a, b] x [a, b].

• If vh < oo, we say that h is of bounded variation on [a, b].


• If v x < oo ( v, < oo), we say that X is of bounded variation in the strong
(weak) sense on [a, b].
Note: It can be shown that a monotone or a differentiable function h is of bounded vari-
ation. The total variation of h on an interval [a, c], a < b < c, is the sum of the total
variations of this function on [a, b] and [b, c]. Moreover, vh (p) increases as the partition
pis refined ([157], Section 2.4.2).
It can be shown that ( 1) it is sufficient to consider identical partitions for the two
arguments of r, that is, we can take p = q in Eq. 3.57, (2) the variation v, (p, q) increases
as the partition of [a, b] x [a, b] is refined, (3) the total variation of r over finite unions of
disjoint intervals is the sum of the total variations of r on these intervals, and (4) if X is
of bounded variation in the strong sense, then X is of bounded variation in the weak sense
(see note following Eq. 3.55 and [157], Section 2.4.4). A
148 Chapter 3. Stochastic Processes

Example 3.40: If X is an m.s. differentiable stochastic process defined on [a, b ],


then X is of bounded variation in the strong sense on this interval. <>
Proof: Let p = (a= to < t1 < · · · < tm =b) be a partition of [a, b]. The variation of X
on [a, b] relative top is (Eq. 3.54)

vx(p) =L
m
II
ltk X(t) dt 112:::: Lm ltk II X(t) 112 dt :::=a (b- a),
k=1 tk-1 k=1 tk-1

where a= max1e[a,b] II X(t) liz. The equality and inequality in this equation follow from
Eq. 3.58 and 3.61 in a subsequent section. •

Example 3.41: The Brownian motion B is of bounded variation in the weak sense
on any bounded interval [0, r ], 0 < r < oo, butis not of bounded variation in the
strong sense on [0, r]. <>
Proof: Let p = (0 = to < t1 < · · · < tm = r) be a partition of [0, r ]. The variation of the
correlation function r(t, s) = E[B(s) B(t)] = s 1\ ton [0, r] x [0, r] with respect top is
m
v,(p, p) = L IE [<BCtk)- BCtk-1>> (BCtt)- BCtt-1>>]1
k,/=1
m m
=LIE [<B(tk)- B(tk-1))ZJI = L (tk- tk-1) = r < oo.
k=I k=1

The variation of B relative to a partition Pn of [0, r] with points t~n) = k r j n,


k = 0, 1, ... , n, is
n n
VB(Pn) = L II B(tt))- B(t~~1 ) liz= L FJn =./liT
k=I k=I

so that VB(Pn)--+ oo as n--+ oo. •

Example 3.42: The compound Poisson process C in Eq. 3.6 with jumps Y k ~ Lz
such that E[Yk] = 0 is of bounded variation in the weak sense on [0, r] but is not
of bounded variation in the strong sense. <>
Proof: Let p = (0 = to < t1 < · · · < tm = r) be a partition of [0, r ]. The variation of the
correlation function of C on [0, r] x [0, r] relative to p is

L E[(C(tk)- CCtk-I))z] = J.. r E[Yf] < oo.


m
v,(p, p) =
k=I

The variation of C on [0, r] relative to a partition Pn defined by the points t~n) =


krjn,k =0, 1, ... ,n,is
3.9. Second moment calculus 149

so that vc(pn) ---+ oo as n ---+ oo. The properties E[C(t)] = ). t E[YJl, Var[C(t)] =
). t E[Yf], and E[C(t) 2] = (). t E[YJ]) 2 + ). t E[Yf] of C have been used in the above
equations. •

3.9.3.2 Conditions of existence


J:
J:
We summarize conditions for the existence of the integrals X(t) dh(t)
and h(t) dX (t) in Eq. 3.52 and some relationships between these integrals.
If X is m.s. continuous and h is of bounded variation on [a, b], then

•1bX(t)dh(t) existsand 111bX(t)dh(t)ll2savh,

• II 1b X (t) dt 112::: 1b II X (t) 112 dt ::: a(b- a), and

• E [ {b X(t)dt {b Z(t)dt] = { E[X(u)Z(v)]dudv, (3.58)


la la J[a,b]Z
where a = maxtE[a,bl II X (t) ll2 and Z is an m.s. continuous stochastic process.
If X is m.s. continuous and h is continuously differentiable on [a, b], then

• 1b X(t)dh(t), 1b X(t)h(t)dt existand

• 1b X(t)dh(t) = 1b X(t)h(t)dt. (3.59)

If X is m.s. continuous on [a, b], then the integral J~ X(s) ds exists, is m.s. dif-
ferentiable on [a, b], and

-d
dt
lt
a
X(s) ds = X(t), tE[a,b]. (3.60)

If X is m.s. continuous on [a, b], then the following integral exists and

1b X(t)dt = X(b)- X(a). (3.61)

Note: The existence of the integral J/:X (t) dt in the last two properties in Eq. 3.58 follows
from the first property in this equation applied for the special case h(t) = t. The continuity
of the function t r+ II X(t) 112 on [a, b] implies the existence of J/: II X(t) 112 dt.
150 Chapter 3. Stochastic Processes

The proof of the statements in Eqs. 3.58, 3.59, 3.60, and 3.61 can be found in [157]
(Theorem 2.22, p. 46, Theorem 2.23, p. 50, Theorem 2.24, p. 51, and Theorem 2.25, p. 51,
respectively) . .&

If hi are continuous and Xi E Lz are processes of bounded variation in the weak


sense on [ai, bi], i = 1, 2, then

(3.62)

If h is continuous and X is m.s. differentiable with m.s. continuous derivative X


on [a, b], then

• ib h(t) dX(t), ib X h(t) dt exist and

• ib h(t) dX (t) = ib X h(t) (t) dt. (3.63)

If h, k : [a, b] ---+ ~ are continuous and X E Lz is a process of bounded


variation in the weak sense on [a, b] , then

• Y(t) =it h(s) dX(s) exists, is of bounded variation in the weak

sense on [a, b] fortE [a, b], and

• ib h(t)dY(t) exists, and ib h(t)dY(t) = ib h(t)k(t)dX(t). (3.64)

Note: If we take ai = a, bi = b, hi = h, and Xi = X, i = 1, 2, in Eq. 3.62, and assume as


in this equation that h is continuous and X E Lz is of bounded variation in the weak sense
on [a, b] , then

• 1b1b h(s)h(t)
a2r(s t)
' dsdt,
1b h(t)dX(t) existand

tt
a a as at a

o E [ ( t h (t) dX (t) ) ' ] ~ h(<) h(t) '';,'~·,') d, dt ~ p2 0, (3.65)


3.9. Second moment calculus 151

f3 = maxte[a,b]lh(t)l
I:
where ([157], Theorem 2.29, p. 59). This special case also shows
that the integral h (t) d X (t) exists if h is continuous and X is of bounded variation in
the strong sense on [a, b] ([157], Theorem 2.26, p. 52). The proof of the statements in
Eqs. 3.62, 3.63, and 3.64 can be found in [157] (Theorem 2.30, p. 60, Theorem 2.27, p. 53,
and Theorem 2.31, p. 61, respectively).
We also note that the m.s. integrals in Eqs. 3.58, 3.59, 3.60, 3.62, and 3.64 with X
and Z being equal to a Brownian motion process B are defined in the m.s. sense. For exam-
ple, we have II I:
B(t)dt 112:::0 I:
II B(t) 112 dt :5 b 112 (b-a) and E [ (!:
B(t)dt) 2] =
I[a,b]z(u v)dudv.
/1. A

3.9.3.3 Properties for calculations

We summarize some properties of the m.s. integrals in Eq. 3.52 that are rele-
vant for calculations. These properties resemble features of the Riemann-Stieltjes
integral.

Integration by parts. The m.s. integral I: h(t) dX (t) exists if and only if the
m.s. integral J: X(t) dh(t) exists and
1b h(t)dX(t) =h(t)X(t) I~ -1b X(t)dh(t). (3.66)

If c E [a, b] and the following m.s. integrals exist, then

• 1b h(t) dX (t) = 1c h(t) dX (t) + 1b h(t) dX (t),


• 1b X(t)dh(t) = 1c X(t)dh(t) + 1b X(t)dh(t). (3.67)

The m.s. integral is linear. If h, k and X, Y E L2 are such that J: h(t) dX (t),
J: k(t) dX (t), and I: h(t) dY (t) exist in m.s., then
• 1b [a h(t) + fJ k(t)] dX (t) =a 1b h(t) dX (t) + fJ 1b k(t) dX (t),
• 1b h(t)d[a X(t) + fJ Y(t)] =a 1b h(t)dX(t) + fJ 1b h(t)dY(t),
(3.68)

where a, fJ E lit Similar properties hold for the m.s. integral I: X (t) dh(t).
152 Chapter 3. Stochastic Processes

Expectations and m.s. integrals interchange. If the m.s. integral


I: h(t) dX (t) or I: X (t) dh(t) exists, then the integrals I: h(t) dE[X (t)] and
I: E[X (t)] dh(t) exist, respectively, and

• E [ib h(t)dX(t)J = ib h(t)dE[X(t)],

• E [ib X(t)dh(t)J = ib E[X(t)]dh(t). (3.69)

Note: The proof of the statements in Eqs. 3.66, 3.67, 3.68, and 3.69 can be found in [157]
(Theorem 2.17, p. 42, Theorem 2.18, p. 43, Theorem 2.19, p. 44, and Theorem 2.20, p. 44,
respectively) and are based on the definition of the m.s. integrals in Eq. 3.52. For example,
the expectation of Sh,x(Pn) is

mn
E[Sh,X(Pn)] = L h(t~(n)) (E[X(t~n))]- E[X(t~~ 1 )J).
k=l

The left side of this equation converges to E [!: h (t) d X (t) Jbecause Sh, x (Pn) is a Cauchy
sequence in Lz approximating I:
h(t) dX (t) as Ll(pn) ---+ 0 (Eq. 3.45). The right side of

I:
the above equation converges as Ll(Pn) ---+ 0 to the classical Riemann-Stieltjes integral
h(t)dE[X(t)]. &

Example 3.43: Let h : [a, b] ---+ lR be a differentiable function on [a, b] and let
X be an m.s. continuous process. The m.s. integral Y = I: X (t) dh(t) is well
defined and E[Y 2 ] =I: I: r(t, s) g(t) g(s) dt ds exists and is bounded, where r
denotes the correlation function of X and g(t) = dh(t)jdt. <>

Proof: The integral Y = I:


X(t)dh(t) exists in the mean square sense by one of the
properties in Eq. 3.58 since h being differentiable in [a, b] is also of bounded variation in
this interval. The expectation E[Y 2] results from the last property in Eq. 3.58 since the
process g X is m.s. continuous. •

Example 3.44: Let hand X be as in Example 3.43. The mean and variance of the
m.s. integral Y =I: X (t) dh(t) are

/-Ly = ib g(t) fJ,(t) dt and a; = ib ib g(t) g(s) c(t, s) dt ds

where f-L and c are the mean and covariance functions of X. If X is a Gaus-
sian process, Y is a Gaussian variable with the characteristic function cp y (u) =
exp( H u /-Ly - u 2 a'j /2). If X is a Brownian motion process, then f-Ly = 0 and
a'j =I: I:(t 1\s)g(t)g(s)dtds. <>
3.9. Second moment calculus 153

Proof: The properties in Eq. 3.58 can be used to confirm the above statements. We present
here a direct proof. The sum Sgx,l(Pn) = :E;~l g(tk(n)) X(tk(n)) (t~n)- t~':l 1 ) is a
Cauchy sequence in L2 with expectation 1-ty,n = Lk=l g(tk(n)) ~-t(tk (n)) (tt>- t~':l 1 ) that
converges to 1-ty as !:::.(pn) -+ 0. Similar considerations show that the expectation of the
square of the centered sums Sg x, 1 (Pn) - 1-ty,n converges to a} as !:::.(pn) -+ 0. If X is
Gaussian, then Y is a Gaussian variable with the specified characteristic function. •

Example 3.45: Let h : [0, oo) 4- IR be a differentiable function and let X be a


m.s. continuous process. Then Y(t) = J~ g(s) X(s) ds, t ::: 0, is a m.s. differen-
tiable process with mean and correlation functions

E[Y(t)] =lot g(s) f.-L(s) ds and


E[Y(t) Y(s)] =lot du g(u) los dv g(v) r(t, s),

where g(t) = dh(t)fdt. The process Y(t), t ::: 0, can represent the output of
a linear dynamic system with transfer function g subjected to an input X (Sec-
tion 7.2). If X is Gaussian, Y is a Gaussian process so that the above moments
define all finite dimensional distributions of Y. ¢

Proof: The existence of the process Y and the expressions of its first two moments result
from Eq. 3.58 since Y(t) = JJ X(s) dh(s) and his differentiable so that it is of bounded
variation on compact sets.
The process Y is m.s. differentiable since g X is m.s. continuous (Eq. 3.60) so that
Y is also m.s. continuous. Note that Y satisfies the differential equation Y(t) = g(t) X(t),
t :=:: 0, with the initial condition Y (0) = 0. The right side of this equation is m.s. continuous
so that its left side must also be m.s. continuous. Hence, JJ Y(s) ds = Y(t) - Y(O) by
JJ
Eq. 3.61, which gives Y(t) = g(s) X(s) ds, that is, the definition of Y.
We give the definition of a compact set here for convenience. Consider a topological
space (Section 2.2.2) and a subset K of this space. The set K is compact if every open
cover of K has a finite subcover, that is, if any collection Da, a E I, of open sets such that
K c Uael Da has a finite subcollection Da 1 , ••• , Dan with the property K c Uk=l Dak.
It can be shown that a set K c R is compact if and only if K is closed and bounded
([26], Theorem 13.6, p. 95). Hence, the intervals [a, oo) and [a, b) of the real line are not
compact sets, but [a, b] is a compact set. The same conclusions follows from the definition.
For example, [a, oo) is not a compact set since it is not possible to extract from the open
cover (a- lfn, n), n = 1, 2, ... , of this interval a finite cover. •

3.9.4 Spectral representation


We show that any weakly stationary stochastic process can be represented
by a superposition of harmonics with random amplitude and phase. The represen-
tation is very useful in many applications.
154 Chapter 3. Stochastic Processes

Example 3.46: Let X be the real-valued stochastic process in Eq. 3.33 with co-
variance and spectral density functions given by Eqs. 3.34 and 3.40, respectively.
An alternative representation of X and its covariance and spectral density func-
tions is
n
X(t) = L Cp eR vpt, c(r) = and
p=-n p=-n

L { [8(v -
n a2
s(v) = vp) + 8(v + vp)], (3.70)
p=-n

where V-k = -Vk, O"-k = ak, k = 1, ... , n, the random coefficients C P are
complex-valued with means E[C p] = 0 and correlations E[C P c;] = (a'j;/2) 8pq.
Co = 0, and ao = 0. <>
Note: The representation in Eq. 3.70 results from Eqs. 3.33, 3.34, and 3.40 and the rela-
tionships cos(u) = [eR u + e-R u]/2, sin(u) = [eR u - e-R u]/(2 .J=T). The
coefficients of the harmonics of X are Cp = O"p [Ap- .J=T Bp]/2, where A-p = Ap
and B-p = -Bp. The second moment properties of Cp are E[Cp] = 0 and E[Cp c;] =
(O"p O"q/4) E[(Ap- R Bp) (Aq + R Bq)] = (0"~/2) Opq. where A_k = Ak and
B_k = - Bk fork = 1, ... , n. A

The representation in Eq. 3.70 shows that X consists of a superposition of


harmonics C PeR vp t with frequencies vp. The coefficients C P of these harmon-
ics are orthogonal, have mean zero, and variance s(v p) = a'j;/2. This observation
suggests the representation X (t) = f~oo eR v t d Z (v) for an arbitrary weakly
stationary process X, where the random complex-valued coefficients d Z (v) asso-
ciated with the harmonics of frequencies v E IR are such that E [d Z (v)] = 0 and
E[dZ(v) dZ(v')*] = 8(v- v') s(v) dv. It turns that our intuition is correct.

3.9.4.1 <C- and IR-valued stochastic processes

Spectral representation theorem. If X is a complex-valued, weakly stationary,


and m.s. continuous process with spectral distribution S and spectral density
s, then there exists a complex-valued process Z with orthogonal increments
defined up to an additive constant such that the m.s. integral

X(t) =I: eRvt dZ(v) (3.71)

exists at any timet and, for Z( -oo) = 0, we have

E[Z(v)] = 0, E[IZ(v)l 2 ] = S(v), and E[ldZ(v)l 2 ] = dS(v) = s(v) dv.


(3.72)
3.9. Second moment calculus 155

Note: This theorem shows that a complex-valued weakly stationary stochastic process can
be constructed as a superposition of orthogonal harmonics eP v t d Z (v) with frequency
v and random amplitude and phase defined by the increments d Z (v) of a process Z, called
the spectral process associated with X ([45], Section 7.5). The representation in Eq. 3.71
is the continuous version ofEq. 3.70.
Since X is m.s. continuous, its correlation function is continuous so that the Bochner
theorem in Eq. 3.35 can be applied and gives E[X(t + r) X(t)] = f~oo eP vr dS(v).
The m.s. integral in Eq. 3.71 generalizes the m.s. integral defined by Eq. 3.52 be-
cause both its integrand and integrator are complex-valued. The definition, properties, and
calculation formulas established for the m.s. integral in Eq. 3.52 can be extended to this
type of integral by applying results of the previous section to the real and imaginary parts
of the integral in Eq. 3.71. These considerations show, for example, that the integral in
Eq. 3.71 exists in the mean square sense because its integrand eP vt is a continuous
function and the integrator Z is of bounded variation in the weak sense. The variation of
the correlation function of Z in a bounded interval [-a, a] x [-a, a], a > 0, for a partition
p = (-a = vo < VJ < · · · < Vm =a) of this interval is (Eq. 3.55)

m
Vr(p, p) = L IE[(Z(vk)- Z(vk-J))(Z(vl)- Z(vi-J))*ll
k,l=l
m m
=L E[IZ(vk)- Z(vk-1)1 2 ] = L(S(vk)- S(vk-J))) = S(a)- S(-a) (3.73)
k=l k=l

so that Z is of bounded variation in the weak sense on [-a, a] for any a > 0. The process
Z is also of bounded variation in the weak sense on lR because liiila~oo[S(a)- S( -a)]=
S(oo) < oo. A

Spectral representation theorem. If X is a real-valued, weakly stationary, and


m.s. continuous process with one-sided spectral density g, then there exist two
real-valued processes U, V with orthogonal increments such that the m.s. inte-
gral
X(t) = fooo [cos(v t) dU(v) + sin(v t) dV(v)] (3.74)

exists at any time t and we have

E[U(v)] = E[V(v)] = 0, E[dU(v) dV(v 1 )] = 0, and


E[dU(v) 2 ] = E[dV(v) 2 ] = g(v) dv for v, v' :::: 0. (3.75)

Note: The properties of the processes U, V are similar to the properties of the coefficients
Ak> Bk in Eq. 3.33. The moments of U and V in Eq. 3.75 can be obtained from Eqs. 3.71
and 3.72 under the assumption that X is real-valued ([45], Section 7.6). A

Example 3.47: The correlation functions of the processes in Eqs. 3.71 and 3.74
156 Chapter 3. Stochastic Processes

are

E[X(t)X(s)*]= i:e.J=I(t-s)vs(v)dv and

E[X(t) X(s)] = fooo cos(v(t- s)) g(v) dv,


respectively, if X takes complex and real values, respectively. Both processes have
mean zero. <>
Proof: We have seen in Eq. 3.69 that expectation and integration can be interchanged under
some conditions that are satisfied by the m.s. integrals in Eqs. 3.71 and 3.74 (Section 3.9.3.2
and 3.9.3.3). This result and the properties of the processes Z, U, and V give the above
correlation functions. •

The following two examples illustrate linear operations on weakly station-


ary processes. These operations are commonly used in applications, for example,
to find properties of the solution of linear differential equations with random input
(Chapter 7).
Example 3.48: Let Y (t) = I:~= I Ck X (t + tk), where X is a complex-valued
process with spectral representation in Eq. 3.71, tk are times, Ck E C denote
some constants, and n ~ 1 is an integer. The stochastic process Y has the spec-
tral representation Y(t) = f~oo eR v 1 dZ(v), where dZ(v) = h(v) dZ(v) and
h(v) = L~=l CkeRvtk. <>
Proof: We have

Y(t) = 1oo
-00
[tk=l
Ck eR v tk] eR v t dZ(v) = 1oo h(v) eR v t dZ(v)
-00

so that Y consists of a superposition of harmonics eR v t with amplitudes h ( v) d Z ( v). If


h, referred to as gain, is such that f~oo lh(v)l 2 dS(v) < oo, then Y is in Lz. •

Example 3.49: Let X be a real-valued weakly stationary process. If X exists in


an m.s. sense, then

X(t) = /_: .J=-1 v eR vt dZ(v) (3.76)

with the notation in Eqs. 3.71-3.72. <>


Proof: Take a, 1: > 0 and let

x<a)(t) = J:a eR vt dZ(v) and x<a)(t) = r: R v e R vt dZ(v).

We need to show that E [ ( (X(al(t + <)- x<al(t))/<- x<al(t) ) 2 ] converges to zero as


1: --+ 0 for any a > 0. The stated result follows by letting a approach infinity. That the
above expectation converges to zero as 1: --+ 0 follows by straightforward calculations. •
3.9. Second moment calculus 157

£:
Example 3.50: Let X be the process in Eq. 3.71 and define

Y(t) = eHvt h(v)dZ(v),

where h(v) = 0, 1, and 2 for V < -~, V E [-~,Hand V >~.respectively, and


~ > 0. We have

Y(t) = ~~ eH vt dZ(v) + 2 roo eH vt dZ(v) (3.77)


-~ 1;
so that

Y(t) = l.i.m.;--+oY(t) = D.Z(O) + 2 roo eR vt dZ(v), (3.78)


lo+
where D.Z(O) = l.i.m.;--+0 f~; eH vt dZ(v) denotes the jump of Z at v = 0. If
the spectral distributionS is continuous at v = 0, the jump D.Z (0) vanishes.
The Hilbert transform X of X is defined by Y(t) = X(t)+.J=l X(t). If X
is the real-valued process in Eq. 3.74, its Hilbert transform has the representation

X(t) = fooo [sin(v t) dU(v)- cos(v t) dV(v)] (3.79)

and is real-valued with the notation in Eqs. 3.74 and 3.75. <>
Note: The definition of X shows that this process can be obtained from X by a linear
operation with the gain h(v) = A. 0, and -A for v < 0, v = 0, and v > 0,
respectively, that is, X(t) = fJR h(v) eH v t dZ(v) ([45], p. 142). The expression of the
gain results from Eq. 3.78 and the definition of X. A

3.9.4.2 JRd -valued stochastic processes

Spectral representation theorem. If X is an JRd -valued, weakly stationary,


and m.s. continuous process with spectral density functions s i,J, there exist two
r -valued processes u' v with orthogonal increments and mean zero such that

X(t) = fooo [cos(vt) dU(v) + sin(vt) dV(v)] (3.80)

exists in the m.s. sense at any time and

E[dUi(v)dUJ(v')] = E[dVi(v)dVJ(v')] = o(v- v')gi,J(v)dv,


E[dUi(v) dVJ(v')] = -E[dVi(v) dUJ(v')] = o(v- v') hi,J (v) dv, where
8i,J (v) = Si,J (v) + Si,J ( -v) and hi,J (v) = -.J=T [si,J (v) - Si,J ( -v)].
(3.81)
158 Chapter 3. Stochastic Processes

Proof: We have shown that the correlation and spectral density functions of the coordinates
of X are Fourier pairs (Eq. 3.41). An alternative form of the correlation function in Eq. 3.41
is

ri,J (r) = k eR v' Si,J (v) dv = fooo [ (si,J (v) + Si,J ( -v)) cos(v r)

+ J=T (si,j(v) -si,j(-v)) sin(vr)J dv.

The correlation function ri,J based on the representation of X in Eq. 3.80 is

ri,J (r) = E [Xi (t + r) X 1(t)] = fooo [gi,j (v) cos(v r) -hi,} (v) sin(v r)] dv.

The above expressions of ri,J yield the definitions of the second moment properties of U
and V in Eq. 3.~1. •

Example 3.51: Let X be the IR2 -valued process in Example 3.33 with coordinates
Xi(t) = .Jf-=-pWi(t) + .JPW(t), where (W1, W2, W) are zero-mean, weakly
stationary processes that are mutually uncorrelated and p E (0, 1). The process
X = (X 1, X2) has the spectral representation

X(t) = fooo [cos(v t) dU(v) + sin(v t) dV(v)],


where U and V are uncorrelated with each other, E[dUi(t)] = E[dVi(t)] = 0,
E[dUi(v) dl:~j(v')] = E[dVi(v) dVj(v')] = [(1- p) g<i)(v) OiJ + p g(v)] 8(v-
v') dv and g<iJ, g denote the one-sided spectral densities of Wi, W. The one-sided
spectral densities of X are gi,J (v) = [(1 - p) g<i) (v) OiJ + p g(v)]. 0
Proof: The spectral representations,

Wi (t) = fooo [ cos(v t) dU(i) (v) + sin(v t) dV(i) (v) J, i = 1, 2,

W(t) = fooo [cos(v t) dU(v) + sin(v t) dV(v)],


of processes Wi and Ware given by Eqs. 3.74 and 3.75. These processes and the definition
of X show that the coordinates of U and V in the spectral representation of X are U; (v) =
y'1="p u<i)(v)+.JP U(t) and Vi(v) = y'1="p y(i)(v)+.JP V(t). We have hi,} (v) = 0
because E[dUi(v)dVj(v 1 )] = 0 (Eq. 3.81) so that si,j(v) = si,J(-v). The equality
gi,j(v) = si,J(v) + Si,j(-v) shows that the ordinates of gi,J are twice the ordinates of
si,J. The same result can be obtained by taking the Fourier transform of the covariance
functions Ci,J (r) = (1- p) 8ij cw; (r) + p cw(r) of X. •

3.9.4.3 Random fields


Let (Q, :F, P) be a probability space. A function X : D x n ~ JRd is
said to be an JRd -valued random field if X (t) is an JRd -valued random variable on
3.9. Second moment calculus 159

(Q, :F, P) for each t E D, where D c JRd' and d, d' ~ 1 are integers. We limit
our discussion to real-valued random fields (d = 1) and denote these fields by X
or {X(t), tED}.
The f.d.d. of order n of X is the probability ofn?=l {X(ti) ::::Xi}, where ti E
D and Xi E R The random field X is said to be strictly stationary/homogeneous
or just stationary/homogeneous if its finite dimensional distributions are invari-
ant under a space shift, that is, if the points t i, i = 1, ... , n, are mapped into
ti + T E D, where T E JRd' and the integers n ~ 1 are arbitrary. The marginal
and the finite dimensional distributions of X are space invariant and depend only
on the "space lag" (tt - t2), respectively.
Suppose that X E L2(Q, :F, P) and define by

JL(t) = E[X (t)],


r(t, s) = E[X(t) X(s)], and
c(t, s) = E[(X(t)- JL(t)) (X(s)- JL(s))] (3.82)

the mean, correlation, and covariance functions, respectively. The pair (JL, r)
or (JL, c) defines the second moment properties of X. The properties of these
functions result from Section 3.7.1. If X is stationary, then JL(t) = JL is constant
andr(t, s), c(t,s) depend only on (t- s). If X is such that

JL(t) =constant and


r(t, s), c(t, s) =functions of only t - s, (3.83)

then X is called a weakly stationary/homogeneous random field. A stationary


field is weakly stationary but the converse is not generally true.
If the correlation function of a weakly stationary random field X is contin-
uous, it has the representation (Bochner's theorem in Eq. 3.35)

r(T) = { eHT·V dS(v) = { eHT·V s(v)dv, (3.84)


J~d' J~d'
where Sis a bounded, real-valued function such that JR dS(v) ~ 0 for all R E
d' d'
!3 , S( -oo, ... , -oo) = 0, and T · v = Lk=l 'l'i Vi ([1], Theorem 2.1.2, p. 25).
The function S is called spectral distribution. If S is absolutely continuous,
then dS(v) =[ad' S(v)jav1 · · · avd'] dv = s(v )dv, where s denotes the spectral
density function. The last equality in Eq. 3.84 is valid if X has a spectral density
function.
The second moment calculus developed in Section 3.9 can be extended di-
rectly to random fields. We can define m.s. continuity and differentiability for
random fields as in Sections 3.9.1 and 3.9.2. The extension of the m.s. integration
in Section 3.9.3 to random fields involves minor changes because the integration
is performed over a subset of !Rd' rather then an interval of the real line. A very
useful discussion on the second moment calculus for random fields can be found
in [1] (Chapters l and 2). If X is a weakly stationary m.s. continuous random
160 Chapter 3. Stochastic Processes

field with spectral distribution S and spectral density s, then X has the spectral
representation (Eq. 3.71)

(3.85)

where the random field Z is such that E[Z(v)] = 0 and E[dZ(v) dZ(v')*] =
8(v- v') dS(v) = 8(v- v') s(v) dv. If X is a real-valued field, then (Eq. 3.84)

{ sin(v · t) s(v) dv = 0 (3.86)


]ffi.d'
because the correlation function of X must be real-valued. The condition in
Eq. 3.86 is satisfied if the spectral density of X is such that s(v) = s(-v) for
all v e r'.
Example 3.52: Let X be a real-valued random field defined by
n
X(t) = L ak [Ak cos(vk · t) + Bk sin(vk · t)], t E JR2 , (3.87)
k=l

where E[Ak] = E[Bk] = 0, E[Ak BL] = 0, E[Ak At] = E[Bk Bz] = 8kz, ak > 0,
and Vk E JR2 for all k, l = 1, ... , n. The field has mean zero. The correlation and
spectral density functions of X are
n
r(T) = E[X(t + T) X(t)] = L al cos(vk · T) and
k=l
n 2
s(v) =L ak [8(v - Vk) + 8(v + Vk)] (3.88)
k=l 2

so that X is a weakly stationary random field. <>


Proof: The expectation of the defining equations of X gives E[X(t)] = 0. The correlation
function of X is
n
E[X(t) X(s)] = L Uk Uf8kl [cos(vk · t) cos(vk · s) + sin(vk · t) sin(vk · s)]
k,/=1
n
=L al cos (vk · (t- s))
k=l
by the properties of the random coefficients (Ak, Bk). The Fourier transform of this func-
tion is the spectral density of X. The random field X is closely related to the stochastic
process in Example 3.30. Both random functions are constructed by superposing n har-
monics with random amplitude and phase.
The waves generating the random field X in Eq. 3.87 can be visualized simply since
the domain of definition of X is ~2 . The zeros of cos(vk ·t) are the lines Vk ·t = rr /2+q rr,
3.9. Second moment calculus 161

q = 0, 1, ... , in the (tl, t2)-space so that the wave associated with the frequency Vk has
length 2 rr I J
v'f. 1 + v'f. 2 and evolves in a direction perpendicular to the zero lines at an
angle ek = tan- 1(vk,21vk,1) relative to the coordinate t1. •

3.9.4.4 Karhunen-Loeve representation

We show that a complex-valued process in L 2 can be represented by a linear


combination of a countable number of deterministic functions with random coef-
ficients. This representation, referred to as the Karhunen-Loeve representation,
generalizes the expansion in Example 3.30 and holds also for random fields ([1],
Section 3.3). The Karhunen-Loeve representation can be useful in applications
since it provides alternative definitions for stochastic processes and random fields
as functions of a countable number of random variables. However, the Karhunen-
Loeve representation is relatively difficult to obtain and captures only the second
moment properties of a random function.
Let {X(t), t E [a, b]} be a complex-valued stochastic process with mean
zero and correlation function r(t, s) = E[X(t) X(s)*]. Recall that the correlation

I: I:
function of X is symmetric, that is, r(t, s) = r(s, t)* (Example 3.29) and satisfies
the condition lr(t, s)e dt ds < oo since X is in L2. Consider the integral
equation

1b r(t,s)ifJ(s)ds =A.ifJ(t), t E [a,b]. (3.89)

A number).. ¥= 0 and a function ifJ satisfying Eq. 3.89 such that I:


lifJ (t) 12 dt < oo
is called an eigenvalue and an eigenfunction for Eq. 3.89, respectively.
It can be shown that (1) Eq. 3.89 has at least one eigenvalue and eigenfunc-
tion, (2) the collection of eigenvalues for Eq. 3.89 is at most countable and strictly
positive, (3) for each eigenvalue Ak there exists at most a finite number of linearly

I:
independent eigenfunctions, (4) the eigenfunctions ifJ k corresponding to distinct
eigenvalues are orthogonal, that is, ifJk(t) ifJL(t)* dt = 0 fork ¥=land Ak ¥= A.1,
and (5) if the correlation function r of X is square integrable and continuous in
[a, b] x [a, b], the Mercer theorem holds, that is, r(t, s) = "£'1:::
1 Ak ifJk(t) ifJk(s)*,
and this series representation converges absolutely and uniformly in [a, b] x [a, b]
([1], Section 3.3, [49], Appendix 2, [98], Section 6.2).
162 Chapter 3. Stochastic Processes

Karhunen-Loeve representation. Let {X(t), t E [a, b]} be a complex-


valued stochastic process with mean zero and correlation function r(t, s) =
E[X(t) X(s)*]. If r(·, ·)is square integrable and continuous in [a, b] x [a, b],
then X has the representation
n
X (t) = l.i.m.
n-+oo ~
"'ak Xk ¢k(t) at each t E [a, b], where

1b
k=l

1
xk = - X(t) ¢k(t)* dt, (3.90)
Gk a

Note: We only show that Xk in Eq. 3.90 has the stated properties. The mean of Xk is zero
since E[X(t)] = 0. The correlation of these random variables is

E[Xk XlJ = -1-


r7k !7[
1b1ba a
E[X(t) X(s)*] rPk(t)* r/Jt(s) dt ds

= -1-
r7k !7[
1b [1b
a a
r(t, s) r/Jt(s) ds ] ¢k(t)* dt = -!7[
r7k
1b
a
r/Jt(t)¢k(t)* dt = Dkf·

It can be shown that the m.s. error,


n
E [ ( X(t)- x<nlct)) ( X(t)- x<nlct))*J = r(t, t)- L lad 1¢k(t)l 2 •
k=l

of x<nl(t) = Lk=l ak Xk ¢k(t) converges to zero as n--+ oo ([49], Section 6-4). A

Example 3.53: Let {X (t), t E [ -~, ~]}, ~ > 0, be a real-valued stochastic pro-
cess with mean zero and correlation function r(t,s) = (lj4)e- 2 "'1<1, where
r = t - s denotes the time lag and a > 0. The Karhunen-Loeve representa-
tion of X in [-~, ~] is

X(t) = f [A
k=l
xk ¢k(t) + [5:: xk ¢k(t) J'
where
1 cos(2 a bk t)
Ak = , ¢k (t) = -----n::=;=::;::.=T,i==;=~::;=n====r=~
4a (1 + b~) .j~ + sm(4a bk ~)j(4a bk)

1 sin(2a bk t)
J~ - sin(4 a bk ~)/(4a bk)
A A

Ak = ¢k(t) = '
+ b~)
A '

4a (1
3.9. Second moment calculus 163

Note: The eigenvalues and eigenfunction in the representation of X are the solution of the
integral equation J~~ e- 2 a lt-sl ¢(s) ds = 4 A¢(t) defined by Eq. 3.89 ([49], Example 6-
4.1, p. 99) ...

Example 3.54: Let X be a real-valued process defined in [0, 1] with mean zero
and correlation function r(t, s) = t 1\ s. The Karhunen-Loeve representation of
this process is

. v'2 Ln sin(k + 1/2) n t


X(t) = l.t.m.- k Zb
n--->oo n
k=O
+ 1/ 2
where E[Zk] = 0, E[Zf] = 1, and the sequence Zo, Z1, ... is uncorrelated. <>
Proof: The condition in Eq. 3.89 becomes

f s ¢(s) ds +t [
1
¢(s) ds =A ¢(t), t E (0, 1),

and gives J/ ¢(s)ds = AtjJ1 (t) by differentiation with respect tot. We can perform this
operation since the left side of the above equation is differentiable so that ¢ on its right
side is also differentiable. Differentiating one more time with respect to time, we have
At/J 11 (t) + ¢(t) = 0 in (0, 1) with the boundary conditions ¢(0) = 0 and ¢'(1) = 0, which
J/
result from Eq. 3.89 with r(t, s) = tl\s and the equation ¢(s) ds =A ¢ 1 (t), respectively.
The solution of the above differential equation gives the eigenvalues and eigenfunctions in
the representation of X.
We note that the Brownian motion and the compound Poisson processes have corre-
lation functions similar to the correlation function of the process X in this example. Hence,
the above Karhunen-Loeve representation can be used to characterize these two processes.
This observation also demonstrates a notable limitation of the Karhunen-Loeve representa-
tion. The representation cannot distinguish between processes that are equal in the second
moment sense. •

Example 3.55: Let {W(t), t E [a, b]} be a white noise defined formally as a
process with mean zero and correlation function E[W(t) W(s)] = y 8(t - s),
where t, s E [a, b], y > 0 is interpreted as the noise intensity, and 8(·) denotes
the Dirac delta function. Then W(t) = ,JY L~l wk ¢k(t) is the Karhunen-
Loeve representation of W, where Wk are random variables with E[Wk] = 0 and
E[Wk Wt] = 8kt and {¢k} is a collection of orthonormal functions spanning the
class of square integrable functions defined in [a, b]. <>

Proof: Eq. 3.89 gives y¢(t) = At/J(t) so that all eigenvalues are equal toy and the
eigenfunctions are arbitrary provided that they are orthonormal and span the space of square
integrable functions. The first two moments of the random variables {Wk} result from the
definition of Xk in Eq. 3.90. These calculations are formal since the process W does not
exists in the second moment sense. However, this representation of the white noise is
frequently used in engineering applications [175]. •
164 Chapter 3. Stochastic Processes

Example 3.56: Suppose that a weakly stationary process X with mean zero and
unknown correlation function E [X (t) X (s)] is observed through an imperfect de-
vice. The observation equation is Y(t) = X(t) + W(t), 0 ~ t ~ -r, where W
is the white noise process in the previous example. The processes X and W are
not correlated. It is assumed that the observed process Y has mean zero and a
known correlation function ry(t- s) = E[Y(t) Y(s)], t, s E [0, -r]. Then the best
m.s. estimator of X is

1
=L l +
00

Xopt(f) j).. Yk </>k(t)


k=l y k

where y fJ..k is the noise to signal ratio, (J..k, </>k) are the eigenvalues and eigenfunc-
tions given by Eq. 3.89 with r(t, s) = ry(t- s), and the properties of the random
variables Yk result from Eq. 3.90 written for the process Y. If the noise intensity
y decreases to zero, the optimal estimator Xopt has the same Karhunen-Loeve
representation as X. <>

Proof: Let X(t) = L~l ..ji:k xk r/Jk(t) be the Karhunen-Loeve representation of X,


where r/Jk and Ak are the solutions of Eq. 3.89 with r being the correlation function of X.
The previous example has shown that W admits the expansion W(t) = ~ L~l Wk r/Jk(t)
so that
00 00

Y(t) = X(t) + W(t) = L:<AXk + .JYWk)r/Jk(t) = LYkrfJk(t),


k=l k=l

where Yk = ..ji:k Xk + ~ Wk (Eq. 3.90 and Example 3.55). The above equation shows
that the Karhunen-Loeve representations of X and Y involve the same eigenfunctions cfrk
and these functions can be obtained from Eq. 3.89 with r(t, s) = ry(t- s). The Karhunen-
Loeve representation of Y can be calculated since this process is assumed to have known
second moment properties.
Consider the estimator X(t) = L~l hk Yk r/Jk(t), t E [0, -r] of X, where the coef-
ficients hk need to be determined. The m.s. error, e =E [fo ( X(t)- X(t) ) 2 dt]. of X
L~l E [ihk Yk- ../):k Xkl 2] so that it is minimized for hk = A.k/(A.k + y).
is equal to
The expression of X with hk = J...kf()..k + y) is the stated optimal estimator. In some of
the above operations we interchange summations with integrals and expectations. It can be
shown that these operations are valid ([49], Appendix 2). •

The results in Eqs. 3.89-3.90 can be extended directly to random fields in


Lz provided that their correlation functions satisfy similar conditions. For exam-
ple, let X (t) E Lz, t E R, be a real-valued random field, where R is a bounded
=
rectangle in JRd. If the correlationfunctionr(t, s) E[X(t) X(s)] of X is square
integrable and continuous in R x R, then X admits a Karhunen-Loeve represen-
tation similar to the one in Eq. 3.90 ([1], Section 3.3, [98], Section 6.2).
3.10. Extremes of stochastic processes 165

3.10 Extremes of stochastic processes


Let {X(t), t ::=:: 0} be the state of a physical system and suppose that the
system performance is adequate as long as X does not leave an open subset D of
JRd. The stopping timeT= inf{t ::=:: 0: X(t) ¢. D, X(O) ED} is defined to be the
system failure time. The probabilities Ps(t) = P(T > t) and Pt(t) = 1- Ps(t)
are referred to as reliability and failure probability in (0, t], respectively.

Let N(D; t) or N(D; I) denote the number of times X exits D during a time
interval I = [0, t]. Then

Pt(t):::; E[N(D; t)] + P(X(O) ¢.D). (3.91)

Proof: We have Pt(t) = P({N(D; t) > 0} U {X(O) ¢. D}) so that Pt(t)::; P(N(D; t) >
0) + P(X(O) ¢. D). Also,

=L L k P(N(D; t) = k) = E[N(D; t)].


00 00

P(N(D; t) > 0) P(N(D; t) = k) :::


k=l k=l

Numerical results show that the upper bound on Pt(t) in Eq. 3.91 is relatively tight for
highly reliable systems, that is, systems characterized by very small values of Pt(t) and
E[N(D; t)] ([175], Chapter 7). If E[N(D; t)] is not finite or is finite but larger than 1, the
upper bound in Eq. 3.91 provides no information on Pt(t). •

The following two sections give conditions under which E[N(D; t)] is fi-
nite and formulas for calculating E [ N ( D; t)] and the density of T.

3.10.1 Mean crossing rate


Let {X (t), t ::=:: 0} be a real-valued stationary stochastic process with a.s. con-
tinuous samples and continuous marginal distribution. We say that a sample
X ( ·, w) of X upcrosses a level x at time t or has an x -upcrossing at t if for each
e > 0 there exist t1 E (t- e, t) and tz E (t, t +e) such that X(tt, w) < x <
X(tz, w). Suppose that D = (-oo, x) and I = (0, t]. Since X can exit D only
through its right end, N(D; I) in Eq. 3.91 represents the number of x-upcrossings
of X in I, and is denoted by N(x; I) or N(x; t).
Let Xh be a piece-wise linear approximation of X so that Xh(j h, w) =
X(j h, w), j = 1, 2, ... , where h > 0 is a small time step. The scaled prob-
ability Jh(x) = P(X(O) < x < X(h))j h is equal to the average number of
x-upcrossings of Xh per unit of time or the mean x-upcrossing rate of X h since
Xh has one x-upcrossing in h with probability P(X(O) < x < X(h)) and no
x-upcrossing in this interval if the event {X(O) < x < X(h)} does not occur.
Let Nh (x; I) denote the number of x-upcrossings of X h in I. It can be shown
that Nh(x; I) :::; N(x; I) and that Nh(x; I)-+ N(x; I) a.s. and E[Nh(x; I)]-+
E[N(x; I)] ash -+ 0 ([121], Lemma 7.2.2, p. 148).
166 Chapter 3. Stochastic Processes

The following result, referred to as the Rice formula, is used extensively in


engineering. The proof of Rice's formula is based on the approximation X h of
X. Let Zh = (X(h)- X(O))/ h be the slope of Xh in a time step and let fh(·, ·)
denote the joint density of (X(O), Zh).

Rice's formula. If (1) X is a real-valued stationary process with a.s. contin-


uous samples and continuous marginal distribution, (2) the density f X(t),X(t)
of (X(t), X(t)) exists, and (3) fh(g, z) is continuous in g for all z and small
h > 0 and converges to f X(t),X(t) (g, z) uniformly in g for fixed z ash --+ 0, and
(4) / 000 z /h (g, z) dz < oo, then the mean x-upcrossing rate of X is

A.(x)+ lim Jh(X) = roo z fx(t) X(t)(x, z) dz


= h~O h '
= E [X(t)+ I X(t) =X] /x(t)(X). (3.92)

Proof: We first note that the mean rate at which X crosses with negative slope a level x E lR
at a timet, that is, the mean x-downcrossing rate of X at t, is
A.(x)- = E [X(t)- I X(t) = x] !x(t)(x).
If X is not a stationary process, its mean x-upcrossing and x-downcrossing rates will de-
pend on time and will be denoted by A.(x; t)+ and A.(x; t)-, respectively.
The event X(O) < x < X(h) that Xh has an x-upcrossing in h can be written as
{X(O) < x < X(O) + h Zh} = {X(O) < x} n {Zh > (x- X(O))/ h} so that

Jh(X)=h- 1 ("' dz1x


fo x-zh
dufh(u,z)=h- 1
fo
roo dz fo{ 1 dv(hz)fh(x-hzv,z)
where the last equality results by the change of variables u = x - h z v. The limit of Jh (x)
ash ~ 0 gives Eq. 3.92. Details on this proof and on crossings of stochastic processes can
be found in [121] (Chapter 7).
If X is an JRd -valued stochastic process and D E sd, it can be shown that under sim-
ilar assumptions the mean rate at which X exits D at a time t or the mean D-outcrossing
rate of X at tis ([175], Chapter 7)

A.(D; t) ={ E[X(n)(t)+ I X(t) = x] fX(t)(x)da(x), (3.93)


lav
where aD = iJ \ D denotes the boundary of D and X(n) (t) = X(t) · n(x) is the projection
of X(t) on the outer normal n(x) to D at x E aD. •
Example 3.57: Let X be an m.s. differentiable Gaussian process with mean and
covariance functions J.t(t) and c(t, s), respectively. The mean x-upcrossing rate
of X at time t is given by Eq. 3.92 with

E[X(t)+ I X(t) = x] = a(t) [¢ (~(t)) + ~(t)


a(t) a(t)
<1> (~(t))]
a(t)
and

1
f(x; t) = a(t) ¢
(X- {l(t))
a(t) ' where
3.1 0. Extremes of stochastic processes 167

{J,(t) = (l.,(t) + [C,t(f, s) lt=s jc(t, t)] (X- f.L(l)), C,t(f, s) = ac(t, s)jat, a(t) 2 =
C,ts(f, s) lt=s - (c,t(f, s) lt=s) 2 jc(t, t),
and C,ts(t, S) = s)j(at as).a2 c(t,
Figure 3.7 shows the variation in time of the mean x-upcrossing rate of

0.03

0.025

0.02

O.D15

~= 2.5
0.01 /~=2.0
/ /}=1.0
~= 0.0
0.005

0
0 0.5 2.5 3.5 4.5

time

Figure 3.7. Mean x-upcrossing rate for a non-stationary Gaussian process X (t) =
Y(t) I (Y(O) = n
X= 3.0, and t E [0, 5]

X(t) = Y(t) I (Y(O) = g) for x = 3 and t E [0, 5], where Y is a stationary


Gaussian process with mean zero and covariance function p(r) = E[Y(t) Y(t +
r)] = (1 +a lr I) e-a 1<1 for a = 1 and several values of g. We note that the
mean x-upcrossing rate .A(x; t)+ (1) converges to the stationary value .A(x)+ =
(a/(2 n)) exp( -x 2 /2) as t -+ oo and (2) is an increasing function of g at each
time t > 0. The first property is a consequence of the reduced effect of the initial
value of X on its future values as time increases. The increased likelihood that X
upcrosses a level x for an initial value X (0) = g closer to this level explains the
second property of .A(x; t) +. <>

Proof: The JR2 -valued Gaussian variable (X(t), X(t)) has mean JL = (J-L(t), (t(t)) and
covariance matrix c with entries cu = 1- p(t) 2 , c12 = c21 = p(t) p'(t), and c22 =
-p"(O)- p'(t) 2 . Hence, X(t) I (X(t) = x) is a Gaussian random variable with mean and
standard deviation {l(t) and &(t) (Section 2.11.5).
The mean x-upcrossing rate of Y is }..(x)+ = (aj.J21r) <f>(x) since Y(t) and Y(t)
are independent Gaussian variables so that E[Y(t)+ I Y(t)] = E[Y(t)+] and E[Y(t)+] =
aj.J21r. •
168 Chapter 3. Stochastic Processes

3.10.2 First passage time density


Let {X(t), t ~ 0} be a real-valued Gaussian process with mean zero and
covariance function c(t, s) = E[X(t) X(s)]. Suppose that X(O) < x and denote
by T = inf{t > 0 : X (t) ~ x} the first time X exceeds a level x.

Durbin formula. If c(·, ·) has continuous first order partial derivatives and
Iimstt Var[X(t)- X(s)]/(t- s) < oo, then

!T,X(T)(t, z) = E [1£x(s)<x, O:o::s<t) X(t)+ I X(t)= x, X(t) = z] fx(t),X(t)(x, z),


/T(t) = E [1£X(s)<x, o::::s<t) X(t)+ I X(t) = x] fxct)(x). (3.94)

Note: The density fT of T is equal to the mean x -upcrossing rate of the samples of X that
have not ever left D = (-oo, x) in [0, t]. Because l(X(s)<x, O:O::S<t) is zero on the subset
of the sample space corresponding to samples of X that upcross x at least once in [0, t], we
have fT(t) :::; A.(x; t)+. Details on Durbin's formulainEq. 3.94, extensions of this formula,
and computer codes for calculating the density ofT can be found in [56, 158, 159]. A

Example 3.58: Let {X(t), t = 0, 1, .. .} be a time series with independent values


following the same distribution F. Denote by T = inf{t ~ 0 : X(t) > x} the
first time the series X exceeds a specified level x. The exact and approximate
expressions of the distributions of T are

FT(t) = 1- F(t)t+l and FT(t):::::: F(x)e-J.(x)+t,

where A.(x)+ denotes the mean x-upcrossing rate of X.


Figure 3.8 shows the probability P(T > t) for F(x) = <l>(x) and x = 1.5.

0.5

0.4

0.3
Exact
Approximate
0.2

0.1

0
0 10 20 30 60 70 80 90 100

time

Figure 3.8. Exact and approximate probabilities P(T > t) for F(x) = <l>(x) and
X= 1.5
3.11. Martingales 169

The approximation of P(T > t) based on the mean x-upcrossing rate of X is


accurate in this case. ¢

Proof: Because the events {T > t} and ni=


0{X(k) :::: x} have the same probability, we
=
have P(T > t) (P(Xl :::; x))t+l =
F(x)t+l.
Note also that the event {T > t} occurs if X(O):::; x and X has no x-upcrossing in t
(Eq. 3.91) so that

P(T > t) = P({X(O):::; x} n {N(x; t) = 0}) ~ P(X(O) :::; x) P(N(x; t) = 0).


The above approximate equality is based on the assumption that {X (0) :::; x} and {N (x; t) =
0} are independent events. The approximation of P(T > t) in Fig. 3.8 is given by the
above equation in which the probability of {N(x; t) = 0} is taken to be exp(-A.(x)+ t),
where A.(x)+ = F(x) (1 - F(x)) is the mean x-upcrossing rate of X. This expression
of P(N(x; t) = 0) is based on an asymptotic result stating that the x-upcrossings of X
approach a Poisson process with intensity A.(x)+ as x increases indefinitely ([45], Sec-
tion 12.2). •

3.11 Martingales
We have examined in Section 2.18 discrete time martingales. Here we con-
sider continuous time stochastic processes that are martingales.

A real-valued stochastic process X defined on a filtered probability space


(Q, :F, (Ftk~:O• P) is an :F1-martingale if

1. E[IX (t) ll < oo, Vt 2:: 0,


2. X is :F1 - adapted, and
3. E[X(t) I :Fs] = X(s) a.s., Vs::;: t. (3.95)

Note: The last condition in Eq. 3.95 can be replaced with E[X(t) I Fs] = X(s At),
t, s ~ 0, since E[X(t) I Fsl = X(t) = X(t As) fort <sand E[X(t) I Fs] = X(s) for
t ~ s.
If the equality E[X(t) I Fs] =
X(s), s :::; t, is replaced by~ and:::;, X is said to be
an .1'1-submartingale and .1'1-supermartingale, respectively.
If X is in L2 (Q, F, P) and satisfies the last two conditions in Eq. 3.95, then X is said
to be an :Fi -square integrable martingale. Similarly, a submartingale/supermartingale
X E L2(0, F, P) is called a square integrable submartingale/supermartingale. If the first
condition in Eq. 3.95 is replaced by E[IX(t)IP] < oo, p > 1, then X is said to be a
p-integrable martingale. &

We have seen in Section 2.18 that the martingale, submartingale, and su-
permartingale can be viewed as models for fair, super-fair, and unfair games by
interpreting t 2:: s, s, E[X(t) I :F8 ], and X(s) as future, present, average future
fortune, and current fortune, respectively. For example, the average future fortune
170 Chapter 3. Stochastic Processes

of a player E[X(t) I Fs] is equal to his or her current fortune X (s) for martingales
but it is larger/smaller then X (s) for submartingales/supermartingales. The con-
ditions that a stochastic process X must satisfy to be a martingale, submartingale,
and supermartingale add structure to this process, as we will see in this section.

Example 3.59: Consider a filtered probability space (Q, F, (F1 ) 1::::o. P) and a
random variable Y E F such that E[IYI] < oo. The process X(t) = E[Y I Ft],
t 2: 0, is an F 1 -martingale. <>

Proof: We have E[IX(t)ll = E[IE(Y I Ft)ll::::: E[E[IYI I Ftll = E[IYI] < oo by the
Jensen inequality for conditional expectation (Section 2.17.2) and hypothesis. X is :Ft-
adapted because E[Y I F 1 ] is F 1 -measurable for each t :::: 0. Properties of the conditional
expectation give E[X(t) I Fs] = E[E(Y I Ft) I Fs] = E[Y I Fs] = X(s) for all s ::::: t.
Hence, X is a martingale, referred to as the martingale closed by the random variable Y. •

Example 3.60: The Brownian motion process B is an F 1 -martingale, where F 1 =


a (B (s) : 0 ::::; s ::::; t) denotes the natural filtration of B. <>

Proof: We have E[IB(t)l] ::::: ( E[B(t) 2 ]) 112 = t 112 by the Cauchy-Schwarz inequality
and properties of B. Since F 1 = a(B(s) : 0::::: s ::::: t) for all t:::: 0, B is F 1-adapted. For
t:::: s we have
E[B(t) IFsl = E[(B(t)- B(s)) + B(s) IFs]
=E[B(t)- B(s) I Fs] + E[B(s) I Fs] = B(s)

since the conditional expectation is a linear operator, the increment B(t)- B(s) is inde-
pendent of B(u), u ::::: s, so that E[B(t)- B(s) I Fs] = E[B(t)- B(s)], B has mean zero,
and B(s) is Fs-measurable. Hence, B is a martingale. •

Example 3.61: The compound Poisson process C in Eq. 3.6 can be an F 1 -martin-
gale, submartingale, or supermartingale depending on the sign of E[Y J], where F 1
denotes the natural filtration of C and Y 1 is assumed to be in L J. <>

1]
Proof: The mean of the absolute value of C(t) is smaller thanE [ L,f~? IYk =At E[IYJIJ,
which is finite because E[IYJil < oo by hypothesis. The process is Ft-adapted. Fort 2: s
we have

= E[(C(t)- C(s)) + C(s) IFsl


E[C(t) I.Fsl
= E[C(t)- C(s) IFsl + E[C(s) IFsl =A (t- s) E[Yj] + C(s)
since C has stationary independent increments, E[C(t) - C(s)] = A (t - s) E[Yj], and
C(s) is Fs-measurable. If E[Yj] is zero, positive, or negative, then C is a martingale,
submartingale, or supermartingale, respectively. The above equalities also show that the
compensated compound Poisson process C(t) -At E[Yj] is an F 1 -martingale. •
3.11. Martingales 171

Example 3.62: Let {Bi(v), 0 :::; v :::; v}, i = 1, 2, be two independent Brownian
motions, 0 < v1 < vz < · · · < Vn < Vn+l = v < oo be some frequencies, and
ak > 0, k = 1, ... , n, denote arbitrary numbers. Then

n
X(t) = L ak [B1 (vk) t:.B1 (vk) cos(vk t) + Bz(vk) !::.Bz(vk) sin(vk t)]
k=l y'vk
(3.96)
is a weakly stationary process with one-sided spectral density function g(v) =
Lk=l a[ o(v- Vk), where t:.Bi(Vk) = B(vk+l)- B(vk), i = 1, 2. The definition
of X constitutes a discrete version ofEq. 3.74.
Figure 3.9 shows a sample of X scaled to have unit variance for n = 100,
t:.v = Vk+l - Vk = 1/10, a[ = g(vk) t:.v, and g(v) = (2/JT)j(v 2 + 1). The
estimated value of the coefficient of kurtosis of X is 2.18, indicating that X is not
a Gaussian process. <>

1.5

05

-0.5

-1

-1.5

-2

- 2 . 5 o'---,c'co-~,o--:'eoc---=a~o-----c1~oo-1c'c20-~140c--~,.cc-o---c1B~o---o-'2oo

time

Figure 3.9. A sample of X in Eq. 3.96

Proof: The mean and variance of the increments Bi(vk) t:.Bi(vk) are zero and vk t:.vb
respectively. The correlation of these increments fork < l is

E[Bi(vk) t:.Bi(vk) Bi(vz) t:.Bi(vz)] = E {E[Bi(vk) t:.Bi(vk) Bi(vz) l!!.Bi(vz) I.Fz]}


= E {Bi(vk) t:.Bi(vk) Bi(vz) E[t:.Bi(vz) I.Fz]} = 0

since Bi(vk), l!!.Bi(vk), and Bi(vz) are .Fz-measurable and E[t:.Bi(vz) I .Fz] = 0. Hence,
the increments Bi ( v) t:. Bi ( v) have the same second moment properties as the increments
of U and V in the spectral representation of a real-valued process (Eq. 3.74). •
172 Chapter 3. Stochastic Processes

A real-valued process X defined on a filtered probability space


(Q, :F, (:Ftk::o, P) is an :Frlocal martingale, submartingale, or super-
martingale if there exists an increasing sequence Tn, n = 1, 2, ... , of
:Ft-Stopping times, that is, Tn :S Tn+l, such that (1) limn--;.oo Tn = +oo a.s. and
(2) the stopped process xTn (t) = X (t 1\ Tn) is an :Frmartingale, submartingale,
or supermartingale, respectively, for each n.

Note: The sequence Tn is said to reduce X, and is referred to as a localizing sequence. A


martingale, submartingale, or supermartingale can have two or more localizing sequences.
In many applications the stopping times Tn are related to sample properties of X, for exam-
ple, the sequence of stopping times Tn = inf{t 2: 0: IX(t)l > n} gives the first time when
X leaves the interval [-n, n]. This sequence of stopping times is increasing but may or
may not converge to infinity depending on the properties of X. If X is a Brownian motion
B, then Tn is a localizing sequence forB (Example 3.64).
We also note that every martingale, submartingale, and supermartingale is a local
martingale, submartingale, and supermartingale, respectively. For example, we can take
Tn = ooforalln 2:1. &

Example 3.63: Let N be a Poisson process with intensity ). > 0. The random
variables Tn = inf{t =:: 0 : N(t) = n}, n = 1, 2, ... , define an increasing
sequence of stopping times such that Tn --+ oo a.s. as n --+ oo. The processes
NTn, n = 1, 2, ... , are :F1 -submartingales so that N is an :F1 -local submartingale
with the localizing sequence Tn. Note also that N is an :Frsubmartingale. <>

Proof: The random variable Tn is an .1"1-stopping time since the events {Tn ~ t} and
{N(t) 2: n} coincide and {N(t) 2: n} E Ft. We also have Tn = Lt=l Zi, where Zi are
independent exponential random variables with mean 1/'A so that Tn+l = Tn + Zn+l 2: Tn
a.s. for all n 2: 1 and Tn --+ oo a.s. as n --+ oo since E[Zd = 1/'A > 0 (Section 2.14).
We will see in Section 3.11.2 that right continuous stopped martingales, submartin-
gales, and supermartingales are martingales, submartingales, and supermartingales, respec-
tively. Hence, NTn is an .1"1 -submartingale for each n 2: 1. •

Example 3.64: Let {B(t), t =:: 0} be a Brownian motion in ~d such that B(O) =
x E ~d and let S(x, n) = {y E ~d :II y- x II:S n}, n = 1, 2, ... , be a
sphere of radius n centered at x = (xi. ... , xa). The coordinates Bi of B are
independent real-valued Brownian motions starting at xi rather than zero. The
random variables Tn = inf{t =:: 0: B(t) fj. S(x, n)} define an increasing sequence
of stopping times with respect to the natural filtration of B and T n --+ oo a.s. as
n--+ oo. <>

Proof: The random variables Tn satisfy the condition Tn :::; Tn+l and are stopping times
since we can tell whether Tn exceeds an arbitrary time t 2: 0 from properties of B up to this
time. Since the Brownian motions Bi have continuous samples, they cannot reach infinity
in a finite time so that Tn --+ oo a.s. as n --+ oo. •
3.11. Martingales 173

3.11.1 Properties
Many of the properties of the martingales considered here are counterparts
of the properties of discrete time martingales in Section 2.18.1.

• The expectation of a martingale, submartingale, and supermartingale X is a


constant, increasing, and decreasing function, respectively.
• X is a martingale if it is both a submartingale and a supermartingale.
• If X is a submartingale, then -X is a supermartingale.
Proof: These properties follow from the definition of the martingale, submartingale, and
supermartingale processes. For example, the expectation of a submartingale X is an in-
creasing function of time because the condition X(s) ~ E[X(t) I .1$], s ~ t, implies
E[X(s)] ~ E[E(X(t) I Fs)] = E[X(t)] . •

Increments of a martingale X over non-overlapping intervals are orthogonal pro-


vided that X (t) E L2 for all t ::: 0.
Proof: For u ~ v ~ s ~ t we have

E[(X(t)- X(s)) (X(v)- X(u))] = E {E [(X(t)- X(s)) (X(v)- X(u)) I Fs]}


= E {(X(v)- X(u)) E [X(t)- X(s) I Fs]} = 0

since X(v)- X(u) is Fv-measurable, Fv C .1"8 , and E[X(t)- X(s) I Fs] = 0 by the
martingale property. The above expectations are finite since X (t) E L2, t 2:: 0. •

If X is an Frmartingale, then X is also an .rtx-martingale, where Ff =


a (X (s) : s :::; t) denotes the natural filtration of X.

Proof: We have E[IX(t)l] < oo because X is a martingale. The random variable X(t) is
Ff -measurable by the definition of the natural filtration. Fort 2:: s we have
E[X(t) I F{'J = E{E[X(t) I Fs] IF{'}= E[X(s) IF{']= X(s)

since X is an Ft-martingale and X(s) E .:rf. •

If X is an Ft-adapted process with finite mean and has increments independent


of the past, then the compensated process X(t)- E[X(t)] is an Frmartingale.
Proof: The compensated process X(t) - E[X(t)] satisfies the first two conditions in
Eq. 3.95 by hypothesis. We also have

E[X(t) I Fs] = E[(X(t)- X(s)) + X(s) I Fs] = E[(X(t)- X(s))] + X(s)


fort 2:: s since X (s) E Fs and X has increments independent of the past, that is, X(t) -
X(s) is independent of .1"8 • The above equation gives E[X(t)- E[X(t)] I Fs] = X(s)-
E[X(s)] and shows that X(t) - E[X(t)] is a martingale. This result is consistent with
Example 3.61 showing that a compensated compound Poisson process with jumps in L 1 is
a martingale. •
174 Chapter 3. Stochastic Processes

IEvery martingale has an optional modification ([61], Lemma 4.1, p. 72)).


Note: Two definitions are needed to clarify the meaning of the above statement. Let P and
0 be the smallest a -fields on [0, oo) x Q with respect to which all left and right continuous
Fradapted processes are measurable, respectively. A process X is called predictable if
the mapping (t, w) r-+ X (t, w) is P-measurable. If this mapping is 0-measurable, then X
is said to be optional. Predictable processes allow "a peek into the future" because they
are left continuous. We note two important properties:

1. Predictable and optional processes are measurable because the a-fields P and 0 are
included in B([O, oo)) x F and
2.P~O.

We only show that Pis included in 0. Let X be an Ft-adapted process with left
continuous paths and define x<n) (t) = X(kf2n) fort E [k/2n, (k+ 1)/2n) (Fig. 3.10). The
process x<n) is F 1-adapted, has right continuous samples, and approaches X as n ~ oo.

Sample of a predictable
process, X(·,ro)

time

Figure 3.10. A hypothetical sample of a predictable process and the correspond-


ing sample of an optional process x<n) approximating X

Hence, X is also an optional process so that P ~ 0. The use of a similar construction


showing the other inclusion fails. Let us assume that X is an Ft-adapted process with right
continuous paths. The sequence of processes X(n)(t) = X((k + l)/2n), t E (kf2n, (k +
1)/2n], is left continuous and approaches X as n ~ oo but is not Ft-adapted. .l

IEvery martingale has a unique cadlag modification ([147], Corollary 1, p. 8).


Note: This statement augments the previous property and shows that martingales have
well-behaved paths. If a martingale is specified by a collection of finite dimensional distri-
butions, we can assume that its sample paths are cadlag. .l
3.11. Martingales 175

Doob-Meyer decomposition. If X is a right continuous F 1-submartingale of


class DL, then there exists a unique (up to indistinguishability) right continuous
F 1-adapted increasing process A with A(O) = 0 such that M = X -A is an
F 1-martingale and

E [fot/\T Y(s-)dA(s)] = E [fot/\T Y(s)dA(s)] = E[Y(t 1\ T) A(t 1\ T)]

for every non-negative right continuous F 1-martingale Y, every t :::: 0, and Ft-
stopping timeT ([61], Theorem 5.1, p. 74).
Note: We have proved a similar decomposition for discrete time submartingales (Sec-
tion 2.18.1). A right continuous .1'1 -submartingale X is said to be of class DL ifthe family
X (t 1\ T) with T ranging over the collection of .1'1-stopping times is uniformly integrable
for each t :::>:: 0. A family of random variables Za, a E /, is uniformly integrable if
SUPae/ ~Zai:O::n IZal dP converges to zero as n--+ oo.
We say that A is an increasing process if A(t, w), t :::>:: 0, are increasing functions
for all w E Q. For example, the Poisson process is increasing. •

3.11.2 Stopped martingales

If X is an .1"1 -submartingale and T1, T2 are .1"1 -stopping times assuming values
in a finite set {t1, t2, ... , tn} ~ [0, oo), then ([61], Lemma 2.2, p. 55)

(3.97)

Note: This inequality is similar to the last condition in Eq. 3.95 with s, t replaced by the
stopping times T1, T2. By the defining properties of the conditional expectation, Eq. 3.97
is satisfied if and only if fA X (T2) d P :::0:: fA X (T1 1\ T2) d P, 'v' A E .FT1 • A

Optional sampling theorem. If X is a right continuous .1"1 -submartingale and


h T2 are .1"1 -stopping times, then ([61], Theorem 2.13, p. 61)
E[X (T2 1\ t) I FT,] :::: X (T1 1\ T2 1\ t) for each t > 0. (3.98)

If, in addition, T2 < oo a.s., limt-+oo E[IX(t)ll{Tz>tJ] = 0, and E[IX(T2)1] <


oo, then E[X(T2) I FT,]:::: X(T1 1\ T2).
Note: The inequality in Eq. 3.98 does not follow from Eq. 3.97 since the values of T1
and T2 are not restricted to a finite set. This generalization can be achieved by requiring
X to be right continuous. The results in Eqs. 3.97 and 3.98 hold also for martingales and
supermartingales with" :::>:: "replaced by" ="and" :::; ",respectively. A

Stopped right continuous (local) submartingales, martingales, and supermartin-


gales are (local) submartingales, martingales, and supermartingales, respec-
tively.
176 Chapter 3. Stochastic Processes

Proof: Suppose that X(t), t 2: 0, is a right continuous submartingale and Tis a stopping
time. We have E[IX (T 1\ t)l] < oo since X is a submartingale. Since X is right continuous
and adapted, it is progressive (Section 3.2) so that X(T 1\ t) E :FT 1\t c :F1 . Also, Eq. 3.98
with T1 = s ::::; t and Tz = T yields E[X(T 1\ t) I :Fsl 2: X(T 1\ s). Hence, xT is a
submartingale.
Suppose now that X(t), t 2: 0, is a right continuous local submartingale, T is a
stopping time, and that Tn, n = 1, 2, ... , is a localizing sequence for X. The above consid-
erations applied to the submartingale xTn show that xTn (T 1\ t), t 2: 0, is a submartingale.
Hence, xTn (T 1\ t) = X (Tn 1\ T 1\ t) = xT (Tn 1\ t), t 2: 0, is a submartingale for all n 2: 1
so that Tn is a localizing sequence for XT.
Similar arguments can be used to show that stopped right continuous local martin-
gales and supermartingales are local martingales and supermartingales, respectively. •

If X is an .F1 -right continuous submartingale which is bounded from below, then


X is of class D L. Hence, X admits the Doob-Meyer decomposition.

Proof: The inequality in Eq. 3.98 with Tz = T, T1 = T 1\ t, t 2: 0, gives

E[X(T 1\ t) I :FT/\tl 2: X(T 1\ t).

The family E [X ( T 1\ t) I :FT 1\t] of random variables indexed by the stopping times T
is uniformly integrable for each t 2: 0 ([40], Theorem 4.5.3, p. 96). Since X(T 1\ t) is
bounded from below, then X(T 1\ t) indexed by Tis uniformly integrable. Hence, X is
of class DL. For example, the Poisson process N is right continuous and is bounded from
below since N(t) 2: 0, t 2: 0. Hence, N is of class DL. •

3.11.3 Inequalities

Jensen inequality. If X is an .F1 -martingale and rp lR 1-7 lR is a convex


function such that E[frp(X(t))f] < oo, t 2: 0, then

E[rp(X(t)) I Fs] 2: rp(X(s)), s :St. (3.99)

Proof: We first note that Eq. 3.99 implies that cp(X) is an :F1-submartingale. The Jensen
inequality can be used to show, for example, that IB I and B 2 are submartingales, where B
is a Brownian motion. We have also established a Jensen inequality for random variables
(Section 2.12).
Because cp is a convex function, we have cp(x) = sup{l(x) : l(u) ::::; cp(u), 'v'u}, where
l(u) =au+ b is a linear function with a, b real constants. Also,

E[cp(X(t)) I :Fsl = E[sup{l(X(t))} I :Fsl2: sup{E[l(X(t)) I :Fs]}


= sup{l(E[X(t) I FsD} = sup{l(X(s))} = cp(X(s))

since E[sup{l(X(t))} I :Fsl 2: E[l(X(t)) I :Fsl for any linear function l, the conditional
expectation is a linear operator, and X is an .1(-martingale. •
3.11. Martingales 177

If X is an F 1-submartingale, 0 < -r < oo, and F c [0, -r] is a finite set, then for
each x > 0 and bounded interval (a, b) we have ([61], Lemma 2.3, p. 56 and
Lemma 2.5, p. 57)

P(max{X(t)} 2: x) ::=:: E[X(-r)+]/x,


teF
P(min{X(t)}::::: -x)::::: (E[X(-r)+]- X(O)) jx, and
tEF
E[U(a, b, F)]::::: (E[(X(-r)- a)+l) j(b- a). (3.100)

Note: If T1 = min{t E F : X(t) ::;: a} and Tz = min{t > T1 : t E F : X(t) 0:: b},
we say that X has an oscillation in (T1, Tz) larger than b - a. This event relates to the
event of an a-downcrossing followed by a b-upcrossing defined for differentiable processes
(Section 3.10.1). The number of oscillations of X in F larger than b- a is denoted by
U(a, b, F).
The bounds in Eq. 3.100 show that the samples of X cannot be very rough because
(1) E[X(t')+]/x and (E[X(t')+] - X(O))/x converge to zero as lxl --+ oo so that the
maximum and the minimum values of X in F are finite with probability one and (2) the
average number of oscillations of X in F that are larger than b - a is finite. A

If X is a right continuous submartingale, the first two inequalities in Eq. 3.100


can be extended to

P( sup {X(t)} 2: x) ::=:: E[X(-r)+]/x,

P( inf {X(t)}::::: -x)::::: (E[X(-r)+]- E[X(O)]) jx, (3.101)


o:::t:::-r

for each x > 0 and -r > 0 ([61], Proposition 2.16, p. 63).

Example 3.65: A Brownian motion B is a continuous martingale with respect


to its natural filtration so that the inequalities in Eq. 3.101 can be applied. The
probability P(maxre[O,-rJ{B(t)} 2: x) of the largest value of the standard Brownian
motion B in [0, -r] and its upper bound E[B(-r) +]jx given by the first inequality
in Eq. 3.101 are shown in Fig. 3.11 for -r = 10 and x E [1, 5]. The upper bound
E[B(-r)+]jx is relatively wide for all values of x so that it does not provide a very
useful approximation for P(maxre[O,-rJ{B(t)} 2: x). 0
Note: The expectation E[B(t')+] is (2:7r t')- 112 J000 x ex 2 f(Z-r) dx = .jt"j(2:7r). LetT=
inf{t ::=: 0 : B(t) ::=: x} be a stopping time with respect to the natural filtration of B. The
probability that the largest value of B exceeds x in [0, t'] is equal to P(T ::;: r) and the
expression of this probability is given in Example 3.15. A

Example 3.66: Let {Xn, n = 0, 1, ... } be an Fn-martingale and define a process


(H · X)n = Lk= 1 Hk(Xk - Xk-1), where {Hn, n = 1, 2, ... } is an arbitrary
sequence of random variables. The process H · X is an :Fn-martingale if H is
predictable, that is, Hn E Fn-1 (Section 2.18). 0
178 Chapter 3. Stochastic Processes

1.4,------,---.,.------r---.,------,---.,.----~-~

0.6

0.4

0.2
in [O,t) exceeds x

0~-~--~--~--~-~---L--~-~
1 1.5 2.5 3.5 4.5
X

Figure 3.11. Probability of the largest value of a Brownian motion and a bound
on this probability for -r = 10

Proof: If Hn E Fn-1• then

E[(H · X)n+1 I Fn] = (H · X)n + E[Hn+1 (Xn+1 - Xn) I Fn]


= (H · X)n + Hn+1E[Xn+1- Xn I Fn] = (H · X)n
so that (H ·X) is an Fn-martingale. If H is not predictable, the process (H ·X) may not be
a martingale since the random variables E[Hn+l (Xn+l - Xn) I Fn] and Hn+ 1E[Xn+ 1 -
Xn I Fn], generally, differ.
The formula of (H ·X) resembles an integral, referred to as a stochastic integral and
defined in Chapter4. In Section2.18 we have interpreted Hn, Hn Dn = Hn(Xn- Xn-1),
and (H · X)n as the amount of money we bet at round n, the amount of money we would
win or lose for a bet Hn, and our fortune after n bets, respectively. Because our bet Hn
at time n can be based on only the outcomes {X1, Xz, ... , Xn-1 }, we require that H be a
predictable process, that is, Hn E Fn-1· The condition Hn E Fn-1 is consistent with the
game rule that the bets Hn must be placed after completion of round n - 1.
If the time is continuous, the characterization of a predictable process is less obvi-
ous. However, a similar interpretation can be constructed. For example, consider the game
of roulette. Let T be the random time when the ball comes to rest measured from the mo-
ment at which the wheel is spun and the ball is released. The players can bet at any time
t < T but not at t :;:: T. One way of requiring that the bets be placed strictly before T
is to impose the condition that the amount of money we bet at time t is a left continuous
function so that it is not possible to take advantage of a jump in the process we are betting
on ([58], Sections 2.1 and 2.2) •
3.11. Martingales 179

3.11.4 Quadratic variation and covariation processes


We define variation and covariation processes for martingales and present
some useful properties. These processes are also discussed in Section 4.5 for
semimartingales. The variation and covariation processes are needed to establish
the Ito formula (Chapter 4) and solve various stochastic problems (Chapters 5-9).

If X is a right continuous F 1-local martingale, then there exists a right contin-


uous increasing process [X] such that, for each t =:: 0 and each sequence of
partitions Pn = (0 = tan) :::: t~n) :::: · · · :::: tf:'} = t) of [0, t] with D.(pn) -+ 0 as
n-+ oo, we have
mn
L (X(t~n))- X(t~':}1 ))
2
Sn = ~ [X](t), as n ~ 00. (3.102)
k=1

If X is a right continuous square integrable martingale, then the convergence in


Eq. 3.102 is in L 1 ([61], Proposition 3.4, p. 67).

Note: The process [X] is called the square bracket or quadratic variation process. This
process is also denoted by [X, X]. The latter notation is used almost exclusively in the later
chapters of the book. A

Example 3.67: For a Brownian motion B(t), t =:: 0, the quadratic variation pro-
cess is [B](t) = t.

Proof: Take t > 0 and consider the partition Pn = k tIn, k = 0, 1, ... , n, of [0, t]. The
sequence of sums in Eq. 3.102 is Sn = Lk=l (B(k tIn) - B((k- 1) t ln)) 2 so that Sn g,
(tin) Lk=l G~, where Gk, k = 1, ... , n, are independent N(O, 1), since the increments
B(k tIn) - B((k- 1) tin) of B are independent Gaussian variables with mean zero and
variance t ln. Hence, Sn ~ t E[GI] = t as n ~ oo by the law of large numbers. We will
revisit this property of the Brownian motion later in this chapter (Eq. 3.112). •

Example 3.68: Let N be a Poisson process with unit intensity. The quadratic
variation process of M(t) = N(t)- t, t =:: 0, is [M](t) = N(t).

Proof: Take t > 0 and consider the partition Pn = k tIn, k = 0, 1, ... , n, of [0, t].
For a sufficiently large n, the intervals ((k- 1) tin, k tin], k = 1, ... , n, of the partition
contain at most one jump of N. Let ln(t) be the collection of intervals ((k- 1) tin, k tIn]
containing a jump of N. The sum Sn in Eq. 3.102 is

Sn = L (M(ktln)- M((k -1)tln)) 2 + L (M(ktln)- M((k -1)tln)) 2


k¢.ln(t) kEln(t)

= L 1
n2 + L ( 1
1 - ;;
)2 1
= n 2 (n - N(t)) + N(t) ( 1
1 - ;;
)2 ,
k¢.Jn(t) kEln(t)

and the last expression converges to N(t) as n ~ oo. •


180 Chapter 3. Stochastic Processes

If X is a continuous .1"1-local martingale, then [X] can be taken as a continuous


process [61] (Proposition 3.6, p. 71).

If X is a (local) square integrable, right continuous .1"1 -martingale, then

X2 - [X] is a (local) martingale. (3.103)

Proof: The expectation of the absolute value of X (t )2 - [X](t) is finite for each t ~ 0 since
X is a square integrable martingale and [X] is in L1 (Eq. 3.102). The process X 2 - [X]
is F 1-adapted because [X] is adapted by definition. It remains to show the last property in
Eq. 3.95 holds. For t ~ s we have

E[X(t) 2 - X(s) 2 I Fs] = E[(X(t)- X(s)) 2 + 2X(t) X(s)- 2 X(s) 2 I Fs]


= E[(X(t)- X(s)) 2 1 Fs] + 2E[X(t)X(s) I Fs]
-2 E[X(s) 2 I Fs] = E[(X(t)- X(s)) 2 I Fs].

since X is a martingale. The above equality gives

E [%; (X(tj"1J -X (tj:')1 l)2 IF,] ~ [%; (X(tj"1)2


E l2) IF,]
-X (tj:')1

= E[X(t) 2 - X(s) 2 I Fs]

for any sequence of partitions s = tcin) :::; tt> :::; · · · :::; ti:'} t
= of the interval [s, t].
The left side of the above equalities converges in L1 to [X](t) - [X](s) as fl(pn) --+ 0
(Eq. 3.102) so that

E[X(t) 2 - [X](t) I Fs] = E[X(s) 2 - [X](s) I Fs] = X(s) 2 - [X](s) a.s.

showing that X2 - [X] is a martingale.


It can also be shown that, if X is a local square integrable martingale, X2 - [X] is a
local martingale ([61], Proposition 6.1, p. 79). •

If X andY are right continuous, square integrable martingales, then X Y- [X, Y]


is a martingale, where

[X, Y] = ~([X+ Y] -[X]- [Y])


(3.104)

is the covariation of X and Y.


Note: If X and Y are right continuous square integrable martingales so are the processes
X+ Y and X- Y. Hence, the definition in Eq. 3.104 is meaningful.
The covariation process [X, Y](t) can be calculated as the limit of the sums

'L,(X(tt))- X(t~~ 1 ))(Y(ukn))- Y(uk~ 1 ))


k
3.11. Martingales 181

as the meshes of the partitions {t~n)} and {ukn)} of [0, t] converge to zero (Eq. 3.102).
The convergence is in probability. If the martingales X and Y are square integrable, the
convergence is in L 1· Similar statements hold for local martingales ([ 61], Proposition 6.2,
p. 79).
If the covariation of the martingales X and Y is zero, that is, [X, Y] = 0, we say
that X and Y are orthogonal. A

If X is a right continuous, square integrable martingale, then there exists a


unique right continuous adapted increasing process < X > satisfying the con-
ditions for the Doob-Meyer decomposition so that

X 2 - <X > is a martingale. (3.105)

Note: The process x 2 is a submartingale which is bounded from below since X 2(t) ~ 0
for all t ~ 0. We have seen that such a process is of class DL. Hence, X2 admits the
Doob-Meyer decomposition, where A in the Doob-Meyer decomposition is denoted by
<X>.
Consider the decomposition of X 2 in Eqs. 3.103 and 3.105. Since X 2 - [X] and
X 2 - < X > are martingales, so is (X 2 - < X >)- (X 2 - [X]) = [X]- < X >.
Generally, [X] and < X > differ. If X is continuous, the processes [X] and < X > are
indistinguishable ([61], p. 79). A

Example 3.69: Let M (t) = N (t)-).. t, t 2: 0, be a compensated Poisson process,


where N is a Poisson process with intensity A > 0. We have [M](t) N(t) and =
< M > (t) =At, t 2: 0. 0
Proof: We have seen in Example 3.68 that [M](t) = N(t). We show here that the process
A in the Doob-Meyer decomposition is < M > (t) = '). t, t ~ 0.
Since M is a square integrable martingale and < M > is continuous, we only need
to show that M 2 (t)- < M > (t) = M 2 (t) - J... tis a martingale. For s ::::= t we have

E[M 2 (t) I Fsl = E [ (N(t) - N(s) - J.. (t- s) - (N(s) - J.. s)) 2 I Fs]

= E [ (N(t) - N(s) - J... (t - s)) 2 I Fs]

+ 2 E [(N(t)- N(s)- J.. (t- s)) (N(s)- J...s) I Fs]


+ E [ (N(s) - J.. s) 2 I Fs] = J.. (t - s) + (N(s) - ').. s) 2
= M(s) 2 + J.. (t- s)
so that M 2 (t)- J.. t, t ~ 0, is a martingale. •

In Section 3.9.3.1 we have defined the variation and the total variation for
deterministic functions and stochastic processes (Eqs. 3.54-3.56). In the follow-
ing sections we will extend the concept of variation for deterministic functions
to samples X(·, w), w E Q, of a stochastic process X(t), t 2: 0. The varia-
tion of X(·, w) in a time interval [a, b] is defined by Vx(w) = sup[all p) Vx(w, p),
182 Chapter 3. Stochastic Processes

where p = (a =to < t1 < · · · < tm =b) is a partition of [a, b], Vx(w, p) =
I:%'= 1 IX(tk, w) - X(tk-1, w)l, and the supremum is calculated over all partitions
p of [a, b]. We say that X is of finite variation if almost all samples of X are of fi-
nite variation on each compact of [0, oo). Otherwise, X is said to be of unbounded
variation.

3.12 Poisson processes


We have defined the Poisson and the compound Poisson processes in Exam-
ple 3.9. In this section we consider these processes in more detail. The processes
and random variables considered in this section are defined on the same filtered
probability space (Q, :F, (:F1 ) 1 ~o, P).

If {Tn, n = 0, 1, 2, ... , oo} with To = 0 a.s. is a strictly increasing sequence of


positive random variables, then

N(f) = L l(t~TnJ (3.106)


n~l

is the counting process associated with the sequence Tn. If T = supn {Tn} = oo
a.s., then N is said to be without explosions.

Note: The counting process N (1) has right continuous, piece-wise constant samples with
unit jumps at the random times Tn and starts at zero with probability one if T1 has no
probability mass at zero, (2) takes on positive integer values including {+oo}, (3) has cadlag
samples if T = oo a.s., and (4) is Fr-adapted if and only if Tn are .F1 -stopping times. A

A counting process N is said to be a Poisson process if it (1) is :F1 -adapted,


(2) is without explosions, and (3) has stationary increments that are independent
of the past, that is, N(t) - N(s), t > s, has the same distribution as N(t - s)
andisindependentof:Fs.

Note: The requirement of increments independent of the past is stronger than the require-
ment of independent increments stated in Section 3.6.4. Consider a process X adapted to a
filtration .F1 , t ~ 0, and take 0 :::; u < v :::; s < t. Since X (t) - X (s) is independent of Fs
by hypothesis, it is also independent of X ( v) - X (u) E Fv <; Fs . A

The probability that N has n jumps in (0, t] is


(A t)n
P(N(t) = n) = - - e-Jct, n = 0, 1, 2, ... , (3.107)
n!
for some A > 0. Hence, the random variable N(t) follows a Poisson distribution
with parameter At for each t > 0.
3 .12. Poisson processes 183

Note: The probability in Eq. 3.107 results from the defining properties of the Poisson
process ([147], Theorem 23, p. 14). The samples of N look like staircases with steps of
unit height and random width. The Poisson process is a special case of the compound
Poisson process (Fig. 3.5). We also note that N is continuous in probability, m.s., and
a.s. at each time t > 0 but does not have continuous samples.
The probability in Eq. 3.107 shows that (1) the period between consecutive jumps of
N is an exponential random variable with distribution P(T1 > t) = P(N(t) = 0) = e-1. t
and mean 1/). and (2) N starts at zero a.s. because P(T1 > t) --+ 1 as t--+ 0. •

The mean, variance, covariance function, cumulant of order q, and characteristic


function of a Poisson process N with intensity A. > 0 are

E[N(t)] =A. t, Var[N(t)] =A. t, Cov[N(s), N(t)] =A. min(s, t),


Xq = A. t, and q;(u; t) = exp [-A. t ( 1 - eR u) J. (3.108)

Note: These properties can be obtained from the probability law of N in Eq. 3.107 ([79],
pp. 79-83). The relationship between cumulants and moments can be found in [79] (Ap-
pendix B, p. 377), where q ~ 1 is an integer. Because E[N(t)] gives the expected number
of jumps of N in timet,). = E[N(t)]jt represents the average number of jumps per unit
of time and is called the intensity or the mean arrival rate of N. &

Example 3. 70: If N is a Poisson counting process with intensity A., then N is a


submartingale and N(t) -A. t, (N(t) -A. t) 2 -A. tare martingales.<>
Proof: The compensated process N(t) - ). t is a martingale by one of the properties in
Section 3 .11.1.
To prove that (N(t)-). t) 2 -). tis a martingale, we note that E[(N(t)-). t) 2 ]-). t <
oo and (N(t)-). t) 2 -). tis adapted. Also, fort ~ s,

E [ (N(t) -). t) 2 I :Fs] = E [ ((N(t) -). t) - (N(s) -). s) + (N(s) -). s)) 2 I :Fs]

= E [CN(t)- N(s)-). (t- s)) 2 I :Fs J + E [CN(s)- ).s) 2 I :Fs J


+2E[(N(t)- N(s)- ).(t -s)) (N(s)- ).s) I :Fs] = ).(t -s) + (N(s)- ).s) 2
since N(t) - N(s) is independent of :Fs, N(s) E :Fs, and N(t) - N(s) -). (t - s) is
a martingale. Hence, (N(t) - ). t) 2 - ). t is a martingale. We have performed similar
calculations in Example 3.69. •

Example 3.71: Let C(t) = Ln;:::l Yn 1(t::::Tn) = 2:::~? Yn be the compound Pois-
son process in Example 3.9 with Y1 E Lq, q ~ 2. The mean, variance, covariance
function, cumu1ant of order q, and characteristic function of C are
E[C(t)] =A. t E[YI], Var[C(t)] =A. t E[Yfl,
Cov[C(s), C(t)] =A. rnin(s, t) E[Yfl, Xq =A. t E[Y{], and
q;(u; t) = exp [-A. t (1 - q;y1 (u))], (3.109)
184 Chapter 3. Stochastic Processes

where A > 0 is the intensity of the Poisson process N and cp y 1 denotes the charac-
teristic function of Y !· <>
Note: These properties of C can be obtained from the definition of this process and the
probability law of the Poisson process in Eq. 3.107 ([79], Section 3.3). A

Example 3.72: Let C be a compound Poisson process with E[l Y Ill < oo. Denote
by F 1 the natural filtration of C. Then C is an F 1 -submartingale, martingale, or
supermartingale if E[Y!] ~ 0, = 0, or ::::: 0, respectively. The compensated
compound Poisson process C(t) = C(t)- At E[YI] is an .Frmartingale. If Y1 ~
0 a.s., then A(t) = At E[YI] is the process in the Doob-Meyer decomposition of
the submartingale C. <>
Proof: The first part of the statement has been proved in Example 3.61, where it was shown
that E[C(t) /J=:d =A (t- s) E[Yj] + C(s) for s ::::; t.
If Y1 :::: 0 a.s., then C is a submartingale such that C(t) :::: 0 for all t :::: 0. Hence,
C is of class DL (Section 3.11.2) so that it admits the Doob-Meyer decomposition in
Section 3.11.1. Since C(t) = C(t)- At E[Yj] is a martingale and At E[Yj] is an adapted
continuous increasing process, then the process A in the Doob-Meyer decomposition is
A(t) =At E[Yj] . •

Consider the compound Poisson process C(t) = L.fi?


Yk in Examples 3.9
and 3.71 and let Tk denote the time of jump k. Figure 3.12 shows a hypothetical

Poisson white noise

y+dy
y
- - - - - - -y- - -
I
~--------
y2 ---------------,--

I
0
time I

Figure 3.12. A hypothetical sample of the jumps of a compound Poisson process

sample of the sequence of jumps of C up to a time t > 0. This sequence, referred


to as Poisson white noise, can be interpreted as the formal derivative of C and
is Wp(t) = L.fi?
Yk 8(t- Tk), where 8(·) denotes the Dirac delta function. Let
3.12. Poisson processes 185

M (t, dy) be a random measure giving the number of jumps of C in (y, y + dy]
during a time interval (0, t]. This measure is equal to 2 for the sample of W p
in Fig. 3.12. The measure M(t, dy) (1) is random because its value depends on
the sample of C, (2) has the expectation E[M(t, dy)] = 'AtdF(y) = tL(dy)t,
where tL(dy) = 'AdF(y) and F denotes the distribution of Yr, and (3) provides
the alternative definition,

C(t) = .L y M(t, dy), (3.110)

for the compound Poisson process C ([172], Theorem 3.3.2, p. 145).

Let C be a compound Poisson process, A be a Borel set in IR, and M (t, dy) be
a random measure giving the number of jumps of C in (y, y + dy] during (0, t].
Then

cA(t)
N(t)

= LYk 1A(Yk) =
k=l
L
O<s<t
~C(s) 1A(~C(s)) = iA
y M(t, dy)

(3.111)
is a compound Poisson process, where ~C(s) = C(s)- C(s-) and C(s-) =
limstt C(s).

Note: Because cA retains only the jumps of C in A, it is called a thinned version of C. If


A = R, the processes cA and C coincide. Figure 3.13 shows a hypothetical sample of a

Compound Poisson
process ~o·--~Q~--~o----~._o
I 0 •

Poisson white noise

Thinned compound
Poisson process

10

time

Figure 3.13. Samples of a compound Poisson process C, its formal derivative


Wp(t) = dC(t)/dt, and a thinned version cA of C for A = (-a, a)c, a = 1
186 Chapter 3. Stochastic Processes

compound Poisson process C, the corresponding samples of the jumps of C, referred to as


Poisson white noise, and the thinned version cA for A = (-a, a)c, a = 1.
Denote by Yk = Yk lA (Yk) the jumps of C taking values in A. The characteristic
function of :Y1 is

fP:i\ (u) = E [eH u Y1 lA(YJ)]


= { eHuylA(Y)fy(y)dy+ { eHuylA(Y)fy(y)dy
h 1 h\A 1

= f e H u Y fy 1 (y) dy +f fy 1 (y) dy = rp~A) (u) + P(Y1 ¢ A)


h jiFI!.\A 1

so that J..t (1- 'Py1 (u)) = J..t (P(YJ E A)- rp~~l(u)) = 5.t (1- rj)(u)), where 5.

J..P(YJ E A),rp~~)(u) = fAeHuy fy 1 (y)dy,andrj)(u) = rp~~)(u)/P(YJ E A) de-


notes the characteristic function of Y1 I (Y1 E A). Hence, the characteristic function of
cA is 'PeA (u) = exp [ 5. t (I - rj)(u))], cA has jumps arriving at an average rate 5., and
the characteristic function of these jumps is rj).
The definitions in Eq. 3.111 can be modified to consider other transformations of C.
For example, the processes
N(t)
Ch(t) = L h(Yk) =~ h(y) M(t, dy) and
k=l y

ct(t) =
N(t)
L h(Yk)IA (Yk) = L
k=l O<s~t
h(~C(s))IA (~C(s)) = L h(y) M(t, dy)

are compound Poisson processes with jumps h(Yk) and h(Yk) lA (Yk) arriving at the mean
rates ). and 5., respectively, where h : JR. -+ JR. is a Borel measurable function. A

Example 3.73: Let C be a compound Poisson process with jumps satisfying the
condition E[Yf] < oo. The quadratic variation of the compensated compound
Poisson process C(t)
-
= C(t) -A. t E[YJ] is [C](t)
-
= Lk=l
N(t) 2
Yk. <>

Proof: Since C is a right continuous square integrable martingale, the sequence Sn


I:;~, (C(t~n))- CCtt!,)) 2 converges to [C](t) in probability and L1 as the mesh of the
sequence of partitions Pn = (0 = t~n) :::0 t~n) :::0 · • • :::0 t;r:} = t) of [0, t] converges to zero
(Eq. 3.102).
The expression of [C] results from the observation that the terms of the sum Sn in
Eq. 3.102 are not zero only in intervals (t~~l, t~n)] containing at least one jump of C and
considerations in Example 3.68. •

3.13 Brownian motion process


A Brownian motion B(t), t ::.:: 0, in JRd consists of d independent real-
valued Brownian motions B i, i = 1, ... , d. It is assumed unless stated otherwise
3.13. Brownian motion process 187

that B starts at 0 E JRd. If B (0) = x, we say that the Brownian motion starts at x.
Because the properties of B are determined by the properties of its coordinates B i,
we consider here only real-valued Brownian motions, and denote these processes
by B. We have defined the Brownian motion process B in Example 3.4. This
section gives essential properties of B.

There exists a modification of the Brownian motion with a.s. continuous paths
([147], Theorem 26, p. 17).

Note: This theorem is consistent with a previous result based on the Kolmogorov criterion
(Eq. 3.2). We will work exclusively with continuous modifications of the Brownian motion
B. Therefore, it will be assumed that B has continuous samples. A

If Pn = (0 = tcin) :S t{n) :S · · · :S t:::l = t) is a sequence of refining partitions


of [0, t] such that fl(pn) -+ 0 as n -+ oo, then ([147], Theorem 28, p. 18)

mn 2
lim "- (B(t(n)) - B(t<n_}_ )) = t a.s. (3.112)
n-+oo ~ k k 1
k=l

Note: This property justifies the notation (dB(t)) 2 = dt used extensively in applications.
If the mesh of the sequence of partitions Pn converges to zero as n --+ oo but is not
refining, then the limit in Eq. 3.112 exists in m.s. (Example 3.74). We have also seen that
the sequence Sn = I::~ 1 ( B(t~n))- B(t~':l 1 ) ) 2 converges tot as n --+ oo in probability
and L1 (Eq. 3.102).
Figure3.14showsthreesamplesofZmn(s) = I::~,; 1 (s(t~n) As)- B(t~':l 1 As)) 2

1.2,--------------, 1.2,--------------,
Samples of z Three samples of z
m
1 form =1000 n
n
0.8

0.6

0.4

0.2

0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8

time time

Figure 3.14. Samples of Zmn (s) = 2:~,; 1 ( B(t~n) i\ s) - B(t~':l1 i\ s))


2
for s <
t, t = 1, and t~n) = kfmn
188 Chapter 3. Stochastic Processes

for s ::::: t, t = 1, tt) = k/mn, and mn = 10, 100, and 1,000 corresponding to a single sam-
ple of the Brownian motion B. The figure also shows three samples of Zmn corresponding
to the same partition of [0, 1] (mn = 1, 000) but different samples of the Brownian motion.
If mn is small, the samples of Zmn can differ significantly from the identity function t ~--+ t
and from one another. However, these samples nearly coincide with t ~--+ t as the partition
is refined such that Ll(pn) -+ 0 as n -+ oo. A

The samples of a Brownian motion process B are of unbounded variation a.s. in


any bounded interval.

Proof: Let Pn = (0 = t6n) ::::: t}n) ::::: · · · ::::: tf;:} = t) be a sequence of refining partitions
of [0, t] such that Ll(pn) -+ 0 as n -+ oo. The left side of the inequality
mn mn
~)B(t~n))- B(t~':!1 )] 2 ::5 m:x IB(t~n))- B(t~':!1 )1 L IB(t~n))- B(t~':!1 )1
k=l k=l

and maxk IB(t~n)) - B(t~':! 1 )1 converge tot and zero a.s. as n -+ oo by Eq. 3.112 and
the continuity of the samples of B, respectively. To satisfy the above inequality, the
summationL:~,;, 1 1B(tt))- B(t~':!1 )1 must approach infinity a.s. as n -+ oo ([147], The-
orem 29, p. 19).

Brownian motion is a square integrable martingale with respect to its natural


filtration.
Proof: We have shown that B is an :Ft = a(B(s) : 0 :::; s :::; t)-martingale. Because
E[B(t) 2] = t < oo for each t > 0, B is an F 1-square integrable martingale. •

Example 3.74: The equality in Eq. 3.112 holds also in the mean square sense if
the mesh of the sequence of partitions Pn in this equation converges to zero as
n --+ oo. The sequence of partitions Pn does not have to be refining. <>

Proof: The firsttwo moments of Ymn = L:~,;, 1 [ B(tt))- B(t~':!1 ) t- tare E[Ymnl =0
and E[Y~n] = 2 L:~,;, 1 (tt)- t~':!1 ) 2 since the Brownian motion has stationary indepen-
dent Gaussian increments and the fourth moment of N(O, a 2) is 3 a 4 . Hence,
mn
E[Y: 2 ] < 2 max(/n) - /n) ) "(t(n) - t(n) ) = 2 t Ll(p ) -+ 0 as n -+ 00
mn - k k k-l ~ k k-1 n
k=l

so that Ymn -+ 0 in m.s. as n-+ oo. •

Example 3.75: Let B be a Brownian motion, N denote a Poisson process with


intensity A. > 0, and N(t) = N(t) -A. t . Assume that Band N are independent
of each other. The square bracket or quadratic variation process of B + N is
[B + N](t) = t + N(t). <>
3.14. Levy processes 189

Proof: Let Pn be a sequence of partitions of a time interval [0, t] and consider the sums

Sn = L [(B(tt)) + N(tt))- (B(t~~l) + N(t~~l)t


k

= L(LlB~n))Z + L(LlNt)) 2 + 2 L(ilB~n))(LlNt)),


k k k

where ilB~n) = B(tt))- B(t~~ 1 ) and ilN~n) = N(tt))- N(t~~ 1 ). If the sequence of
partitions Pn is refining and Ll(pn) --+ 0 as n--+ oo, then Lk(ilB~n)) 2 and Lk(/lNt)) 2
converge tot and N(t) a.s., while Lk(LlB~n))(LlN~n)) converges to zero since the Brow-
nian motion has continuous samples so that

and maxk lllB~n)l --+ 0 as n --+ oo. Hence, Sn converges tot+ N(t) a.s. as n --+ oo.
If the sequence of partitions Pn is not refining, then Sn converges to t + N (t) in m.s. and
probability as n --+ oo.
Similar arguments can be applied to show that [B + C](t) = t + L;f~? Yf", where
B is a Brownian motion, C(t) = L;f~? Yk is a compound Poisson process with jumps
Y1, Yz, ... in Lz, Cis independent of B, and C(t) = C(t)-). t E[Yll· •

3.14 Levy processes


This section defines Levy processes, presents some of their essential proper-
ties, and gives the Levy decomposition and the Levy-Khinchine formulas. It also
shows that Levy processes are more complex than the superposition of a Brownian
motion and compound Poisson processes.

Let (fJ, F, (F1 )t?:.O• P) be a filtered probability space. A process {X(t), t :=:: 0}
defined on this space is Levy if it
1. is F 1-adapted and starts at zero,
2. has stationary increments that are independent of the past, that is, X (t)- X (s ),
t > s, has the same distribution as X(t- s) and is independent of Fs, and
3. is continuous in probability.

Table 3.3 summarizes the three defining properties for the Poisson, Brow-
nian motion, and Levy processes. The third property differentiates these pro-
cesses. Yet, the Poisson, Brownian motion, and Levy processes are closely re-
lated. For example, the Poisson and Brownian motion processes are Levy pro-
cesses since they are continuous in probability. The compound Poisson process C
190 Chapter 3. Stochastic Processes

Table 3.3. Defining properties for Poisson, Brownian motion, and Levy processes

Poisson I Brownian motion I Levy

F 1-adapted starting at zero

stationary increments that are independent of the past


counting process I
Gaussian increments I
continuous in
without explosions B(t) - B(s) ,....., N(O, t - s) probability

in Eq. 3.6 is also a Levy process because it is F 1 -adapted, starts at zero, has station-
ary increments that are independent of the past, and P(IC(t) - C(s)l > 0) ---+ 0
as It- sl ---+ 0.

We have shown that the Poisson, compound Poisson, and Brownian motion
processes are not m.s. differentiable. It will be seen that the samples of the Levy
process are also too rough to be differentiable. Yet, it is common in the engi-
neering literature to use the formal derivative of the compound Poisson, Brownian
motion, and Levy processes, called the Poisson, Gaussian, and Levy white noise
process, respectively, to model the input to various physical systems. Because
the Poisson, Gaussian, and Levy white noise processes do not exist, calculations
involving these processes are formal so that the resulting findings may be ques-
tionable. We will see in the following chapters how white noise processes can be
incorporated in the theory of stochastic differential equations in a rigorous man-
ner.

Example 3.76: If X 1 and Xz are two independent Levy processes defined on the
same filtered probability space (Q, F, (F1k<:.O· P), then X = X 1 + Xz is also a
Levy process. <>

Proof: X is F 1 -adapted and has stationary independent increments, since it is the sum of
two processes with these properties. It remains to show that X is continuous in probability.
Set Ai = {IXi(t) -Xi(s)l > e/2}, i = 1, 2, and A= {IX(t)- X(s)l > e} for some e > 0.
Because Ac 2 A! n A2, we have A~ A1 U Az so that P(A) :::: P(A1) + P(Az) implies
P(IX(t)- X(s)l > e) -+ 0 as It- sl -+ 0 since the processes X1o Xz are continuous in
probability. •

Example 3.77: The characteristic function of a Levy process X has the form

({J(u; t) = E[eA u X(t)] = e-t 1/f(u), (3.113)

where 1/J is a continuous function with 1/J(O) = 0. <>


3.14. Levy processes 191

Proof: Levy processes have stationary independent increments, that is, the random vari-
ables X(t + s)- X(s) and X(s)- X(O), s, t::;: 0, are independent, so that

<p(u; t + s) = E [eR u [(X(t+s)-X(s))+(X(s)-X(O))]] = <p(u; t) <p(u; s),


which implies <p(u; t) = [<p(u; 1)]1 = exp( -t 1/f(u)) for any integer t ::;: 1. Hence, <pis an
infinitely divisible characteristic function (Section 2.10.3.2). A similar argument shows that
the above relationship can be extended to t ::;: 0 rational. It can be shown that the mapping
t ~--+ <p(u; t) is right continuous since X has a cadlag modification (Section 3.14.1) so that
<p(u; t) = exp( -t 1/f(u)) holds for any timet ::;: 0 ([147], Section 4).
The characteristic functions of the Poisson and compound Poisson processes in
Eqs. 3.108 and 3.109 have the functional form in Eq. 3.113. If X is a Brownian mo-
tion B, then <p(u; t) = E[eH u B(t)] = exp( -t u2 /2) since B(t) "' N(O, t) so that
1/f (u) = u2/2. The functional form of 1/f in Eq. 3.113 for an arbitrary Levy process is given
at the end of this section (Eq. 3.122). •

Example 3.78: Let X be an Fradapted process starting at zero that has station-
ary increments that are independent of the past. Suppose that the characteristic
function of the random variable X (t) for some t =:: 0 is cp(u; t) = e -t lula, where
u E lR and a E (0, 2). Then X is a Levy process that may or may not be an
Frmartingale depending on the value of a. <>
Proof: Let f denote the density of X(t)- X(s), t > s. For every e > 0 the probability
P(IX(t)- X(s)l >e) = f(-e,e)c f(x) dx converges to zero as (t- s)--+ 0 since f(x) =
-l;r f~oo eH u x e-<t-s) lui a du approaches a delta function centered at the origin as (t-
s) --+ 0 so that its integral over the interval (-e, e)c converges to zero.
It can be shown that E[IX(t)IP] is finite if and only if p E (0, a) ([162], Propo-
sition 1.2.16, p. 18). Hence, X is not a martingale for a ~ 1 because the expectation
E[IX(t)l] is not bounded but this process is an Ft-martingale for a > 1. If a = 2, the
characteristic function of X(t) is <p(u; t) = e-tu 2 so that X(t) "' N(0,2t) is a square
integrable martingale that has the same distribution as ../2 B, where B denotes a Brownian
motion process. •

3.14.1 Properties

Each Levy process has a unique modification which is Levy and cadlag ([147],
Theorem 30, p. 21).
Note: Only this modification is considered in our discussion so that we assume that the
Levy process has cadlag samples, that is, right continuous samples with left limits. The
Brownian motion and the compound Poisson are examples of processes with cadlag sam-
ples that are also Levy processes. j.

A Levy process preserves its properties if it is restarted at a stopping time ([147],


Theorem 32, p. 23).
192 Chapter 3. Stochastic Processes

Note: Let X be a Levy process, T denote a stopping time, and Y(t) = X(T + t)- X(T),
t ~ 0, be a new process derived from X. The process Y (1) is a Levy process adapted
to Fr+t• (2) is independent of Fr. and (3) has the same distribution as X. Hence, the
Brownian motion, Poisson, and compound Poisson processes have these properties because
they are Levy processes. ~

A Uvy process X can have only jump discontinuities, where

~X(t) = X(t)- X(t-) isthejumpofX att, (3.114)

X(t-) = limstt X(s), and X(t) = X(t+) = lims,l,t X(s).

Note: X has only jump discontinuities because it has cadUtg samples. The random variables
X(t-) and X(t+) = X(t) denote the left and the right limits of X at timet, respectively,
(Fig. 3.15). ~

X(t+)=X(t)
I

: M(t)=X(t)-X(t-)
I

~X(<-)
I

time

Figure 3.15. A jump discontinuity of X at timet

If a Levy process X has bounded jumps, that is, if there exists a finite constant
c > 0 such that supt I~X(t)l :::; c < oo a.s., then X has finite absolute moments
of any order, that is, E[IX(tWl < oo for n = 1, 2, ... ([147], Theorem 34,
p. 25).

Proof: Because Levy processes have dtdlag samples, they can have only jump discontinu-
ities ([147], p. 6).
LetO < c < oo be a constant such that sup1 I~X (t) I ::::; c a.s. and define the stopping
times

T1 = inf{t > 0: IX(t)l ~ c} and


Tn+l = inf{t > Tn : IX(t)- X(Tn)l ~ c} for n ~ 1.
3.14. Uvyprocesses 193

The sequence (T1, T2, ... ) is strictly increasing since X is right continuous. Moreover,
the stopped process xTn must satisfy the condition sup1 IXT•(t)l :::; 2nc < oo since
IX(t)- X(Tk)l cannot exceed 2c fortE (Tb Tk+l1 by the definition of the stopping times
Tk. Because a Levy process preserves its properties if it is restarted at a stopping time, the
random variable Tn - Tn-1 is independent of FTn-i and has the same distribution as Tt.
Hence,

E[e-T•J = fln E[e-<n-Tk-ilJ = ( E[e-Ti 1) n


k=l
since Tn = (Tn - Tn-t) + (Tn-1 - Tn-2) + ·· · + (Tt - To) and To = 0. Because
{IX(t)l > 2nc} £:; {Tn < t}, we have

P(IX(t)l > 2nc):::; P(Tn < t) = P(e-T" > e- 1 ):::; E[e-T"]e1 = (E[e-T1J)n i
by the Chebyshev inequality. The last formula giving the tail behavior of the distribution
of IX(t)l shows that X(t) has moments of any order since 0:::; E[e-T1] < 1. •

3.14.2 The Levy decomposition


The Levy decomposition and the Levy-Khinchine formulas show that a
Levy process X can be represented by the sum of three processes (Fig. 3.16). We
also show that X can be represented by the sum of three independent processes, a
Brownian motion, a compound Poisson process, and a limit of compound Poisson
processes (Example 3.84).

If X is a Levy process and A E B is a Borel set such that 0 ¢ A, then

L
00

NA(t) = 1A(.6.X(s)) = Ll!T/::::tl isaPoissonprocess (3.115)


n=l

with intensity AL(A) = E[NA(l)], called the Levy measure, where T1A
inf{t > 0: ~X(t) E A} and Tn~l = inf{t > TnA : ~X(t) E A}, n ~ 1.

Proof: The notation A is used for the closure of A, for example, the closure of an open
interval A = (a, {3) is A = [a, {3]. The process NA is defined by the jumps of X in A,
where T1A = inf{t > 0 : flX(t) E A} denotes the first time X has a jump in A and
Tn~l = inf{t > TnA : flX(t) E A}, n 2: 1, is the time of jump n + 1 of X in A. The
process NA is similar to cA in Eq. 3.111.
The definition in Eq. 3.115 is meaningful because Levy processes have cadUtg sam-
ples so that they can have only jump discontinuities. The process NA (1) is a counting
process giving the number of jumps of X in A during a time interval, (2) is adapted by
definition, (3) is without explosions since X has cadlag paths so that the random variables
T/ are stopping times with the property lirnn~oo TnA = oo, and (4) has stationary incre-
ments that are independent of the past. The distribution of this increments depends on only
the time lag t - s because X has stationary increments. Hence, NA is a Poisson counting
process with intensity AL(A) = E[NA(1)] < oo ([147], p. 26).
194 Chapter 3. Stochastic Processes

Let M(t, dy, w) = #{s :::0 t : ~X(s, w) E (y, y + dy]} be a random measure
counting the number of jumps of X in (y, y + dy] during (0, t] for each fixed (t, w). The
Poisson process NA can be defined by NA(t, w) = fA M(t, dy, w) for each w, where
AL(dy) = E[M(l, dy)] is the Levy measure of NA for A= (y, y + dy]. •

Example 3.79: If A is a Borel set such that 0 fl. A, then

E [i h(y) M(t, dy)] = t i h(y) )..L(dy) and

E [ (i h(y) M(t, dy)- t i h(y) AL(dy)) ] = t


2
i h(y) 2 )..L(dy), (3.116)

where )..L(dy) = E[M(1, dy)] denotes a Levy measure and h 1 A is a function in


L2 relative to the Levy measure ).. L. <>

Note: If A; are disjoint Borel sets and h = Li a; lAi, then

E [L h(y)M(t, dy)l ~ E [~a; NAnA; (I)] ~ ~" E [NAnA, Ctl]


= t Lai AL(A n A;)

since NAnAi is a Poisson process with intensity ).L(A n A;). Similar considerations can
be used to prove the second equality in Eq. 3.116. The extension to an arbitrary function
h lA E L2 can be found in [147] (Theorem 38, p. 28). &

Let X be a Levy process and A denote a Borel set such that 0 fl. A. The associ-
ated jump process,

JA(t) = L
O<s::St
~X(s) IA(~X(s)) = i y M(t, dy), (3.117)

and X- JA are Levy processes ([147], Theorem 37, p. 27).

Proof: J A has piece-wise constant cadlag sample paths with jumps ~X (TnA) arriving at
the times TnA defining the Poisson process NA (Eq. 3.115). J A is a compound Poisson
process defined by the large jumps of X, that is, the jumps of X in A. Therefore, almost all
samples of J A are of bounded variation on compacts.
Since J A is a compound Poisson process, it is also Levy. The process X - J A is
Levy as the difference of two Levy processes. •
3.14. Levy processes 195

If X is a Levy process and A = (-oo, -a) U (a, oo), a > 0, then


Y 0 (t) = X(t)- L
O<s:::t
.6.X(s) 1(1-!lX(s)l>a} = X(t) -1
lxl>a
Y M(t, dy)

(3.118)

has bounded absolute moments of any order.

Proof: The process in Eq. 3.118 can also be written as Y0 (t) = X(t)-JA(t) with the nota-
tion in Eq. 3.117. The jumps ofY 0 have magnitude smaller than a so that E[IY0 (t)ln] < oo
for any order n, by one ofthe properties in Section 3 .14.1. It is common to define the pro-
cess ya for a = 1, but any other value of a > 0 is acceptable. •

The jump processes J A; in Eq. 3.117 corresponding to Borel sets A 1 and Az


such that 0 ¢. Ai, i = 1, 2, and A 1 n Az = 0 are independent Levy processes
([147], Theorem 39, p. 30).

Note: A similar property holds for processes obtained from a compound Poisson process
by retaining its jumps in two disjoint sets. A

A Levy process X can be represented by X = Y + Z, where Y and Z are


independent Levy processes, Y is a zero mean martingale with bounded jumps
including all small jumps of X, and Z has paths of finite variation on compacts
including all the large jumps of X ([147], Theorem40, p. 31).

Proof: Recall that a process is said to have samples of finite variation on compacts if
almost all its samples are of bounded variation on each compact of [0, oo).
Take A = (-1, ll. The processes JA and X- JA are Levy (Eq. 3.117). Any
sample function s ~ X (s, w) has a finite number of jumps in A during a time interval
[0, t] so that J A is of finite variation on compacts. The jumps of the Levy process X- JA
are smaller than 1 so that this process has bounded absolute moments of any order. Because
X- J A has stationary independent increments and X (0) - JA (0) = 0, we have E[X (t)-
JA(t)] = f3 t, where f3 = E[X(l)- JA(l)]. Hence, Y(t) = (X(t)- JA(t))- f3 tis a
martingale. The claimed representation of X results by setting Z(t) = JA(t) + f3 t.
That Y and Z are independent follows from the observations that (1) the processes
JA and JA(e) are independent Levy processes according to the previous property applied
for A1 =A and Az = A(e) = (-1, -e) U (e, 1), e E (0, 1), and (2) the process JA(e)
approaches X - J A as e --+ 0. •
196 Chapter 3. Stochastic Processes

A Levy process X with bounded jumps, that is, supt ID.X(t)l :::; a a.s., a > 0,
has the representation

X(t)- E[X(t)] = zc(t) + r


j{IYI9}
y M(t, dy) = zc(t) + zd(t), (3.119)

where M (t, dy) =M (t, dy) - t 'AL (dy ), zd and zc are independent Levy pro-
cesses, zc is a martingale with continuous path, and zd is a martingale ([147],
Theorem41, p. 31).

Note: The representation in Eq. 3.119 shows that the process Y in the decomposition
X = Y + Z of a general Levy process defined by the previous statement can be represented
by the sum of two independent Levy processes, the processes ZC and zd. •

The last two statements show that any Levy process X has the representa-
tion (Fig. 3.16)

Jx= LevyJ
I
.. ..
Z(t) =JA(t)+~ t Y(t) =[X(t)- J~(t)]- ~ t
Levy with large compensated Levy with
jumps bounded jumps

t +
Z~(t) =m.artingale lzd(t)=flyl<a y [ M(t' dy)-t ')..,L (dy)]
With continuous
samples martingale

Figure 3.16. Components of a general Levy process X

X(t) = [(X(t)- JA(t))- f3 t] + [JA(t)) + f3 t] = zc(t) + zd(t) + Z(t),


(3.120)

where Zc, zd and Z are independent Levy processes, zc is a martingale with


continuous samples, zd is a martingale, and Z has paths of finite variation on
compacts (Fig. 3.16). The processes zc + zd and Z include the small and large
jumps of X, respectively.
The following theorem, referred to as the Levy decomposition, provides
additional information on the representation in Fig. 3.16 and Eq. 3.120 for an
arbitrary Levy process X. The representation depends on (1) a random measure
M defining a Poisson process NA(t) = fA M(t, dy) with intensity 'AL(A) for
any Borel set A, 0 ¢. A such that, iff' is another Borel set with the properties
0 ¢. f' and f' n A = 0, then NA and Nr are independent, (2) f3 = E[X(1)-
J A(1)] E lR, and (3) a Brownian motion process B that is independent of N A.
3.14. Levy processes 197

Levy decomposition. If X is a Levy process and A£ is a measure on R \ {0}


such that f min(1, y 2 ) AL(dy) < oo, then ([147], Theorem 42, p. 32)

X(t) = B(t) + { y(M(t, dy)- t AL(dy))


]lyl<l

+ t E [x(1)- { y M(1, dy)] + { y M(t, dy)


)IYI?.l )lyl~l
= B(t) + r
}lyl<l
y(M(t, dy)- t AL(dy)) + f3 t + L
O<s::::t
~X(s) 1(1.!lX(s)l~l}·
(3.121)

Levy-Khinchine formula. A Levy process X is defined by its characteristic


function E[eA u X(t)] =
e-t tfr(u), where

1/f(u) = a2 uz + { (1- eA u y + .J=I u y) AL(dy)


2 ]lyl<l

-Rf3u+ f (1-e.J=Iuy)AL(dy), (3.122)


)IYI~l
and A£, a 2 , and f3 need to be specified. The parameters A£, a 2 , and f3 define a
Levy process uniquely in distribution ([147], Theorem 43, p. 32).

Example 3.80: Let B denote a Brownian motion that is independent of a com-


pound Poisson process C(t) =
"Lf~? Yk. Then X =
B +Cis a Levy process.
The processes B and C correspond to the components Y and Z, respectively, in
the representation of a general Levy process (Fig. 3.16). o
Note: B is a martingale with a.s. continuous samples and E[IB(t)iP] < oo, p 2: 1,
and C has paths of finite variation on compacts. The variation 'Lf~? IYkl of C in (0, t],
0 < t < oo, is bounded a.s. since P(N(t) > n) --+ 0 as n --+ oo for each timet 2: 0. A

Example 3.81: Let X (t), t =::: 0, be a Levy process with characteristic function
({J(u; t) = E[eA u L(t)] = e-t lui", where a E (0, 2]. Figure 3.17 shows a sample
of a Brownian motion B, its variation, and its quadratic variation, respectively.
The figure also shows similar results for a Levy process X of the type considered
in Example 3.78 with a =
1.5. The sample of X exhibits jump discontinuities
whose effects are most visible in the sample of the quadratic variation process
[X] of X. The steady increase in the sample of [X] resembling the sample of [B]
is interrupted by jumps characteristic of the quadratic variation of a compound
Poisson process. 0
Note: The sample of X in Fig. 3.17 has been generated by the recurrence formula X(t +
llt) = X(t) + l!.X(t), t = 0, llt, 2/lt, .. ., where X(O) = 0 and /lX(t) are independent
198 Chapter 3. Stochastic Processes

Brownian motion Levy process (a=1.5)


6,-------~------~--~ 400 ,-------------------~
Sample Sample
4

-2
0 2 4 8 10 2 4 6 8 10
200 8000
Variation Variation
150 6000

100 4000

50 2000

0
2 4 6 10 0 2 4 6 8 10
X 104
15 6
Quadratic variation [B] Quadratic variation [X]
10 4

5 2

2 4 6 10 2 4 6 8 10
time time

Figure 3.17. Samples of a Brownian motion and a Levy process and their varia-
tion and quadratic variation

random variables with the characteristic function q;(u) = e-i':.t lui". Additional informa-
tion on the generation of samples of X and other processes is given in Chapter 5 . .&

Example 3.82: Let AL(dy) = dy fly Ia+! for y E lR \ {0} and a E (0, 2) be a
measure defining a Levy process X. Consider also a compound Poisson process
C(t) = L,~~? Yk. where A > 0 denotes the intensity of the Poisson process
N and Yk are independent random variables with the distribution F defined by
AL(dy) = A.dF(y).
There are notable similarities and essential differences between X and C.
Both processes are defined by random measures M (dt, dy) giving the number of
jumps in (t, t + dt] x (y, y + dy]. However, the average number of jumps in a
Borel set A per unit oftime is finite for C but may be infinite for X. <>

Proof: TheaveragenumberofjumpsinA E BperunittimeisfA AL(dy) =A fA dF(y) :S


A < oo for C. However, the average number of jumps of X per unit of time fA AL (d y)
3.14. Levy processes 199

may or may not be finite depending on A. For example,

{ AL(dy) =2 ('" y-(a+l) dy =~ < oo and


1(-a,a)c 1a aa
{ AL(dy) = 2 fa y-(a+l) dy = oo
1(-a,a) 1o+

for A = (-a, al and A = (-a, a), 0 < a < oo, respectively. The large jumps of X
generate the compound Poisson process J(-a,a)c defined by Eq. 3.117. This process has
iidjumps l.b.X(t)l ~a arriving in time at the mean rate 1/(a aa). The small jumps of X
do not define a compound Poisson process because the mean arrival rate of these jumps is
not finite. Additional considerations on the jumps of X and measures needed to count the
small jumps of this process can be found in [162] (Section 3.12). •

Example 3.83: Let n be a measure defined on ~d \ {0} satisfying the condition


f (1/\ II y 11 2) n(dy) < oo, and define the function
1
1/t(u) = J=l aT u + 2 q(u)
+ ld (
1 - e..;-:::rur Y + J=1 uT Y 1niYII<ll) n(dy) (3.123)

for every u E ~d, where a E ~d and q : ~d --* ~ is a quadratic form of u. Then


there exists a unique Levy process X in distribution such that its characteristic
function is qJ(u; t) = E [eAuT X(t)] = e-t1/I(U). Moreover, the jump process
{.b.X(t), t ~ 0} of X generates a Poisson process with measure 11: ([20], Theo-
rem 1, p. 13). The measure n(dy) gives the number of jumps of X in (y, y + dy]
per unit of time.
Figure 3.18 shows samples of a real-valued Levy X process with a = 0.1,
q(u) = a 2 u 2 , a = 1, 8 = 0.1, n(dy) = IYI-(a+l) dy, and several values of
ot. The representation X ::::: x<1l + x<2l + x<3,e) was used to generate the sam-
ples in this figure, where x<l)' x<2)' and x<3.e) are mutually independent pro-
cesses, X(l) = a B(t) - at, x< 2l and x< 3.e) are compound Poisson processes
with measures n<2l(dy) = n(dy) 1(1YI2:1} and n<3·"l(dy) = n(dy) l(e<IYI<l). and
8 E (0, 1). ~

Proof: We show that X = x< 1l + x< 2l + x<3l, where x< 1l (t) = a B(t)- at, a is a (d, d)
matrix, the coordinates of B E Rd are independent Brownian motion, x<2l is a compound
Poisson process with jumps of magnitude larger than 1, and x< 3l is a process including all
jumps of X strictly smaller than 1.
Thecharacteristicfunctionofx(ll(t) ~ N(-at,aaT t)isrp(ll(u; t) = e-ty,Ol(u),
i
where 1/f(l) (u) = AuT a + uTa aT u.
Consider first the large jumps of X, that is, jumps with magnitude larger than 1
and denote n<2l(dy) = :rt(dy) 1111YII?:l}· The total mass of n<2l is finite and x< 2l(t) =
Ls~t b.X(2) (s) is a compound Poisson process with characteristic function rp<2l(u; t) =
200 Chapter 3. Stochastic Processes

500,---~~~~~~~~~-, 20,-~~-~~~~~~~-,

a=1.0 a=1.5

-500

-1000

-1500'--~~~~~~~~~_j -80'--~~~~~~~~~_j
0 20 40 60 80 100 0 20 40 60 80 100

40,-~~~~~~~~~-,

a=1.7
20

-20

-40'--~~~~~~~~~___j

0 20 40 60 80 100
time time

Figure 3.18. Samples of a Levy process X with a 0.1, a = 1, E: 0.1,


n(dy) = IYI-(a+l) dy, and several values of a

e-t1fJ<Zl(u), where 1jJ<2l(u) = ffiii.d (1- eyCiuT Y) n<2l(dy), ~x<2l(s) = ~X(s) if


= 0 if II ~X(s) II< I.
II ~X(s) 112: 1, and ~x< 2l(s)
Consider now the process x< 3l (t) = Ls<t ~x< 3 l (s) corresponding to the small
jumps of X defined by ~x(3l (s) = ~X (s) if II b:x (s) II< 1 and zero otherwise. Define
the compensated compound Poisson process

x(3,e)(t) = L ~X(s) l{e<llt.X(s)ll<l}- t


s~t
r d yn(3,e)(dy), t 2:0,
jfili.

for every B > 0, where n< 3,sl(dy) = n(dy) l{s<IIYII<l}· This process is independent of
x< 2l and has the characteristic function rp< 3,e) (u; t) = e-t 1/1(3,s)(u)' where

It can be shown that {X( 3,sl,


B > 0} is a Cauchy sequence with the norm defined by the

J
expectation E [ sup09 g II Y (s) 11 2 and its limit denoted by x< 3l is a Levy process with
the characteristic function rp(3l(u; t) = e-t1/1< 3l(u), where ([20], Theorem 1, p. 13)
3.15. Problems 201

Hence, xC3) is a limit of compound Poisson processes with bounded jumps.


The processes x(l), xC2), and xC 3) correspond to the processes zc, Z, and zd in
Eq. 3.120 and Fig. 3.16 giving the decomposition of a general Levy process. •

In the following chapter we will define an integral, referred to as a stochastic


integral, with integrand and integrator given by a process with caglad samples and
a semimartingale, respectively. This integral is sufficiently general to establish
the Ito formula and find the solution of a variety of stochastic problems. We will
define semimartingales and give some of their essential properties in Section 4.4.1.
For example, we will see that an adapted process X with cadlag samples is a
semimartingale X if it admits the representation X(t) = X(O) + M(t) + A(t),
where M(O) = A(O) = 0, M is a local martingale, and A is an adapted process
with cadlag samples of finite variation on compacts.

Example 3.84: Any Levy process is a semimartingale. ¢


Proof: If X is a Levy process, then it has the representation X = Y + Z, where (1) Y and
Z are independent Levy processes, (2) Y is a martingale with bounded jumps so that it is
a square integrable martingale (Section 3.10.2), and (3) Z has paths of finite variation on
compacts. The process X is adapted and cadlag since it is Levy and so are the processes Y
andZ. •

Example 3.85: The Poisson process Nand the square of a Brownian motion B
are semimartingales. ¢
Proof: The processes Nand B 2 admit the representation M +A, where (M(t) = N(t)-
At, A(t) =At) and (M(t) = B(t) 2 - t, A(t) = t), respectively. The properties of M and
A show that Nand B 2 are semimartingales. For example, B(t) 2 - tis a martingale since
E[B(t) 2 ] = t < oo, B 2 is :Ft = cr(B(s), 0 ::; s ::; t)-adapted, and fort ~ s we have

E[B(t) 2 I :Fsl = E [ (B(t)- B(s)) 2 + 2 B(t) B(s)- B(s) 2 I :Fs] = (t- s) + B(s) 2 .
The martingale B(t) 2 - t has finite moments of any order and starts at zero. The process
A(t) = t is continuous, adapted, and A(O) = 0. We also note that the filtration :Ft
cr ( B (s), 0 ::; s ::; t) includes the natural filtration of B 2. •

3.15 Problems
3.1: Let Y be a real-valued random variable on a probability space (Q, F, P).
Find the natural filtration of the stochastic process X(t) = Y h(t), t 2:: 0, where
h : [0, oo) -+ ~ is a continuous function.

3.2: Complete the proofs of the properties of stopping times in Section 3.4.

3.3: Prove the first three statements following Eq. 3.9.


202 Chapter 3. Stochastic Processes

3.4: Write the finite dimensional densities for an IRd -valued Gaussian process X
with specified mean and covariance functions.

3.5: Is the process in Eq. 3.33 ergodic in the first two moments, that is, is Eq. 3.15
satisfied with g(x) = xP, p = 1, 2?

3.6: Let X (t), t ::: 0, be a real-valued Gaussian process with mean zero and
covariance function E[X(t) X(s)] = t 1\s. Show that this process has independent
increments.

3.7: Show that a compound Poisson process has the second moment properties in
Eq. 3.109.

3.8: Prove the results given in Table 3.1.

3.9: Consider the process X in Eq. 3.33 with Ak =A+ U + k and Bk = B + Vk.
where A, B, Uk, Vk are mutually uncorrelated random variables with zero-mean
zero and unit-variance. Is X a weakly stationary process?

3.10: Complete the proof of the m.s. continuity criterion in Section 3.9.1.

3.11: Show that if X is m.s. continuous on a closed bounded interval I, then X is


uniformly m.s. continuous on I.

3.12: Prove that the m.s. derivative of a process X exists and E[X(t) 2 ] < oo if
and only if a 2 r(u, v)j(au av) exists and is finite at u = v = t.

3.13: Consider an m.s. differentiable process X defined on an interval I. Show


that X(t) = 0 for all t E I if and only if X is constant on I.

3.14: Prove that the m.s. derivative of a weakly stationary process X exists and
E[X(t) 2 ] < oo if and only if d 2 r(r:)jdr: 2 exists and is finite at r: = 0.

3.15: Show that a monotone or differentiable function h : [a, b] -+ IR is of


bounded variation on [a, b].

3.16: Show that the variation Vh(P) defined by Eq. 3.54 increases as the partition
p is refined.

3.17: Prove the statements in Eqs. 3.58, 3.59, 3.60, and 3.61.

3.18: Prove Eqs. 3.62, 3.63, and 3.66.

3.19: Find the second moment properties of a process Y(t) = f~ ~(u)dX(u),


where X(t) = fR exp(.J=T tv) dZ(v) (Eq. 3.71) and Z has the properties in
Eq. 3.72.
3.15. Problems 203

3.20: Develop the Karhunen-Loeve representation for a band limited white noise
process with mean zero (Table 3.1).

3.21: Find the mean crossing rate of a process Y(t) = a(t) X(t), where X is
given by Eq. 3.33 and a is a differentiable function.

3.22: Find the mean upcrossing rate of the translation process in Eq. 3.25, where
G satisfies the conditions in Eq. 3.92. Specialize your results for a lognormal and
exponential distribution function F.

3.23: Develop an algorithm for calculating the density !T in Eq. 3.94 approxi-
mately.

3.24: Show that the expectation of a submartingale and supermartingale is an


increasing and decreasing function of time, respectively.

3.25: Prove that a counting process N associated with the strictly increasing se-
quence of positive random variables {Tn, n = 1, 2, ... } is adapted to the natural
filtration of N if and only if {Tn, n = I, 2, ... } are stopping times.

3.26: Complete the proof ofEq. 3.116 in Example 3.79.

3.27: Show that the process J A defined by Eq. 3.117 has paths of finite variation
on compacts.

3.28: Let X be a Levy process in L 1· Show that X (t) - E [X (t)] is a martingale.

3.29: Show that the sum of two independent compound Poisson processes is a
compound Poisson process.
Chapter 4

Ito's Formula and Stochastic


Differential Equations

4.1 Introduction
The probabilistic concepts reviewed in the previous chapters are applied to
develop one of the most useful tools for the solution of stochastic problems, the
Ito calculus. Our objectives are to:
1. Define the stochastic integral or the Ito integral, that is, an integral involving
dtglad (left continuous with right limits) integrands and semimartingale integra-
tors. The Ito integral cannot be defined in the Riemann-Stieltjes sense because
many semimartingales do not have sufficiently smooth samples. The stochastic
integral considered in this chapter is an extension of the original Ito integral de-
fined for Brownian motion integrators. We also define an alternative to the Ito
integral, called the Fisk-Stratonovich integral or the Stratonovich integral, that
has properties similar to the Riemann-Stieltjes integral. The relationship between
the Ito and the Stratonovich integrals is also established.
2. Develop a change of variable formula for functions of semimartingales, called
the Ito formula. This formula differs from the change of variable formula of
classical calculus and is essential for the solution of many stochastic problems in
engineering, physics, and other fields.

3. Introduce a new type of differential equation, called a stochastic differential


equation, giving the propagation in time of the state of a physical system driven by
semimartingales, and establish conditions under which the solution of a stochastic
differential equation exists and is unique. The semimartingale noise includes the
Gaussian, Poisson, and Levy white noise processes and is sufficiently general for
our developments. The theory of stochastic differential equations provides essen-
tial tools for the analysis of deterministic partial differential equations, random

205
206 Chapter 4. Ito's Formula and Stochastic Differential Equations

vibration, and other stochastic problem in sciences and engineering, as we will


see in the following chapters.
Examples are presented to clarify the calculation of the Ito and Fisk-Straton-
ovich integrals, illustrate the relationship between these integrals, and demonstrate
the use of the Ito formula to find probabilistic characteristics of the solution of
stochastic differential equations driven by white noise.

4.2 Riemann-Stieltjes integrals


A finite set of points p ={to, t1, ... , tm} such that 0 =to < · · · < tm = t
is called a partition of [0, t]. A point tk E [tk-1, fk] is an intermediate point of
p. The mesh of the partition pis f:...(p) = maxl:<:=k:<:=m Ctk- fk-1). A partition p' is
a refinement of p or refines p if p' 2 p. A real-valued function h is of bounded
r-variation on [0, t] if sup {I:i \h(ti)- h(ti-IW} < oo, where r > 0 and the
supremum is taken over all partitions of the interval [0, t]. If r = 1, h is said to
be of bounded variation on [0, t] (Section 3.9.3.1).
Consider two real-valued functions f and g defined on a bounded interval
[0, t] c JR. Define the Riemann-Stieltjes sum off with respect to g corre-
sponding to a partition p of [0, t] by

m
Sj,p(g) = L f(tk) [g(tk)- g(fk-I)]. (4.1)
k=l

We say that f is Riemann-Stieltjes integrable with respect tog on [0, t] if there


is a number a such that, for every 8 > 0, there is a partition p 8 of [0, t] with the
property Is f,p(g) -a\ < 8 for every p 2 Ps and every choice of intermediate
points ([5], p. 141). The number a is called the Riemann-Stieltjes integral off
with respect tog on [0, t] and is denoted by J~ f(s) dg(s) or J~ f dg.

It can be shown that J~ f dg exists if ([131], p. 94)


1. f and g have no discontinuities at the same points E [0, t] and
2. f and g are of bounded p-variation and q-variation, respectively, for some
p > 0 and q > 0 such that 1/ p + 1/q > 1.

Note: The Riemann-Stieltjes integral JJ


f dg can be obtained as the limit of a sequence of
sums s f,Pn (Eq. 4.1), where Pn is a sequence of refining partitions with arbitrary interme-
diate points tk (n) E [t~~~, t~n)] such that f}.(pn) -+ 0 as n -+ oo. &

Let /I, h be Riemann-Stieltjes integrable with respect tog on [0, t], f be


Riemann-Stieltjes integrable with respect tog, g1, g2 on [0, t], c1, c2 some con-
stants, and s E (0, t). The Riemann-Stieltjes integral has the following properties
([5], Theorem 7.2, p. 142, Theorem 7.3, p. 142, and Theorem 7.4, p. 143).
4.2. Riemann-Stieltjes integrals 207

• lot(qft+c2h)dg=q lot ftdg+c2 lot f2dg,

e lot f d(q g1 + C2 g2) = C1 lot f dg1 + C2 lot f dg2, and

• lot f dg =los f dg + [t f dg. (4.2)

Note: The last equality in Eq. 4.2 holds if JJ f dg exists. The integral J: f dg in this
equation is JJ l[s,t) f dg. A
Let B be a Brownian motion process. The sample paths s r--+ B (s, w) of B
are of bounded q-variation on any finite interval [0, t] for q > 2 [182]. Hence,
J~ f(s) dB(s, w) exists as a Riemann-Stieltjes integral for almost all sample paths
of B if f is of bounded variation since 1I p + 1I q > 1 for p = 1 and q > 2.
The definition of J~ f dB as a Riemann-Stieltjes integral corresponding to the
sample paths of the Brownian motion is referred to as path by path definition.
The integrals

exist as Riemann-Stieltjes integrals for almost all sample paths of the Brownian
motion because the functions e 8 , cos(s), and sk are of bounded variation.
These observations may suggest that the path by path definition can be
extended to more general integrals, for example, J~ B dB and J~ f dB, where
f is an arbitrary continuous function. However, the Riemann-Stieltjes integrals
J~ B(s, w) dB(s, w) and J~ f(s) dB(s, w) may not exist because:

1. The condition 1I p + 1I q > 1 is not satisfied by the integrand and the integrator
of the path by path integral J~ B (s, w) dB (s, w) and

2. The Riemann-Stieltjes integral J~ f(s) dg(s) does not exist for all continuous
functions f on [0, t] unless g is of bounded variation ([147], Theorem 52, p. 40)
so that the integral J~ f (s) dB (s, w) cannot be defined as a path by path Riemann-
Stieltjes integral for an arbitrary continuous function f.

We need an alternative definition for J~ B dB, J~ f dB, and other more


general integrals involving random integrands and integrators. The definition con-
. sidered in the following sections allows a broader class of integrands and integra-
tors than the path by path definition but is less intuitive than the Riemann-Stieltjes
integral.
208 Chapter 4. ItO's Formula and Stochastic Differential Equations

4.3 Preliminaries on stochastic integrals


Consider two real-valued stochastic processes, X and Y. Our objective is
to define the integral of X with respect to Y on a bounded interval I of the real
line, referred to as the stochastic integrals of X with respect to Y on I. We will
clarify in the first part of this chapter the precise meaning of the stochastic integral
and establish conditions that X and Y must satisfy so that this integral exists in
some sense. The definition of the stochastic integral is based on sequences of sums
resembling the Riemann-Stieltjes sums of classical calculus (Eq. 4.1). However,
there are two notable differences between the definition of stochastic integrals and
Riemann-Stieltjes integrals.

1. The sequence of sums defining stochastic integrals is random. Convergence


criteria for sequences of random variables need to be used to define the stochastic
integral. One commonly used convergence criterion is mean square convergence.
2. The limit of the sequence of sums defining a stochastic integral may depend on
the selection of the intermediate points t~ (n) of the sequence of partitions Pn of
[0, t]. In contrast, the Riemann-Stieltjes integral has the same value irrespective
of the intermediate points used to define the sequence of sums in Eq. 4.1.

We illustrate these differences by examples calculating the stochastic inte-


grals J;
B dB and J; N dN, where B and N denote the Brownian motion and
Poisson processes, respectively. These integrals are defined as limits of random
sequences of sums corresponding to partitions of the interval of integration [0, t].
Generally, the result of the integration depends on the selection of the intermediate
points of the partitions of [0, t]. Two stochastic integrals are defined for particular
selections of the intermediate points, the Ito and the Stratonovich integrals. We
show that the Ito integral of the Brownian motion with respect to itself is a mar-
tingale while the corresponding Stratonovich integral is not a martingale. We also
show that the Ito and the pathwise Riemann-Stieltjes definitions of the integral
J; N (s-) d N (s) coincide.

Example 4.1: Consider the random sequence


mn
fB,n(B) = L B(t~~ 1 ) [ B(tt))- B(t~~ 1 )],
k=!

where Pn = (0 = tcin) < t{n) < · · · < t~~ = t) is a sequence of partitions of


[0, t] with intermediate points t~ (n) = t~~! such that !.'i(pn) --+ 0 as n --+ oo. The
limit of fB,n(B) as n --+ oo exists in m.s. and in probability. This limit, denoted
by J;B(s) dB(s) or J;B dB and called the Ito integral of B with respect to B
on [0, t], is
lo
0
t
B(s) dB(s) =
1
-(B(t) 2
2
- t), t 2: 0. (4.3)
4.3. Preliminaries on stochastic integrals 209

If, in addition, the sequence of partitions of [0, t] is refining, that is, Pn C Pn+l•
then lB,n(B) converges also a.s. to the limit in Eq. 4.3. The Ito integral differs
from B(t) 2 j2, that is, the expression of the integral J;
B dB obtained by the for-
mal use of classical calculus. <>

Proof: The notation lB,n(B) is used for simplicity although itis slightly inconsistent with
the definition of s f,p(g) in Eq. 4.1. We have

where the last equality holds since :L~;, 1 ( (B(t~n)) 2 - B(t~~l ) 2 )) is a telescopic series
whose sum is B(t) 2 . The m.s.limit of J B n(B) is (B(t) 2 - t)/2 since

E [ (Js,.(B)- ~ (B(t) t)) '] ~


2- E [ ( -~ E( B(tl"'J- B(t)''\l )' + ~) ']
=! L E [ (6.Bk) 2 (6-B/) 2] - !_ L E [ (6.Bk) 2] + ::_
4 k,l 2 k 4

approaches zero as n -+ oo, where 6-Bk = B(t~n)) - B(t~~ 1 ). Hence, the Ito integral
JJ B dB can be defined as the m.s. limit of lB,n(B). The integral can also be defined as
the limit in probability of lB,n(B) because the m.s. convergence implies convergence in

t
probability (Section 2.13).

1
Because limn-+oo :L~;, [ B(tt))- B(t~~ ) 1
= t holds a.s. for a sequence of
refining partitions of [0, t] (Section 3.13 in this book, [147], Theorem 28, p. 18), JB,n(B)
converges a.s. to (B(t) 2 - t)/2 for refining partitions so that in this setting the Ito integral
can be defined as the a.s.limit of lB,n(B). •

Example 4.2: Consider the random sequence

mn
JB,n (B) = L B(ti (n)) [ B(t~n)) - B(t~~ 1 )]
k=l

that differs from lB,n(B) in Example 4.1 by the choice of the intermediate points,
tk (n) = (1 - O)t~~l + (} tt), (} E [0, 1], instead of t~~1 • The limit of JB,n (B) as
n ~ oo exists in m.s. and in probability and is

-
l.i.m. lB n(B)
n-+oo '
= -21 B(t) 2 + ((}- 1/2) t. (4.4)
210 Chapter4. Ito's Formula and Stochastic Differential Equations

If e = 1/2, this limit, denoted by f~ B(s) o dB(s) or f~ B o dB and called the


Stratonovich integral of B with respect to Bon [0, t], is

t
lo
B(s) o dB(s) = t
lo
B o dB = ~ B(t) 2 .
2
(4.5)

The Stratonovich integral coincides with the result obtained by the formal use of
the classical calculus. The difference between the Ito and Stratonovich integrals
is caused by the relationship between the integrands B(t ~~ 1 ) and B(t£(n)) and the
integrators B(t~n))- B(t~~ 1 ) of these integrals. They are independent for the Ito
integral but are dependent for the Stratonovich integral. <>
Proof: Take a fixed 8 E [0, 1] and denote by

t::.Bk = B(tt))- B(t~~ 1 ) ~ N(O, !::.tk),


t::.Bk = B(tk(n))- B(t~~ 1 ) ~ N(O, 8 !::.tk), and

t::.Bf: = B(t~n)) - B(tk (n)) ~ N(O, (1 - 8) !::.tk)

.
the mcrements of B m
. ( (n) (n)
tk-l, tk(n)) , ( tk-l, tk1 (n)) , an d (tk1 (n) , tk(n)) , respectiVe
. l
y, w h ere
!::.tk = tt) - t~~ 1 . With this notation we have
mn mn
JB,n(B) = L B(t~~ 1 ) !::.Bk + L !::.Bk !::.Bk
k=l k=l

since B(tk(n)) = B(t~~1 ) + t::.Bk. We have shown that the first term of J B,n(B) converges
in m.s. to (B(t) 2 - t)/2 as n -+ oo and its limit defines the Ito integral (Eq. 4.3). It remains
to show that the second term in JB,n(B), that is, L~~~ t::.Bk t::.Bk, converges in m.s. to
8 t since Xn~ X and Yn ~ Y imply Xn + Yn ~ X+ Y (Section 2.13). The first
moment of the second term of JB,n(B) is

E [I: t::.Bk t::.sk] = L E [<t::.Bk)2] +L E [t::.Bk t::.Bf:] = L8 t::.tk = 8t


k=l k k k

since t::.Bk = t::.Bk + t::.Bf: and E[t::.Bk t::.Bf:] = E[t::.Bk] E[!::.BkJ = 0. The second
moment of this term,

E [(I:
k=l
t::.Bk [),Bk) = E [(I:(t::.Bk) +I: t::.Bk t::.Bf:)
2

k=l
] 2
k=l
2
]

= L E[(t::.Bk> (t::.B[) +L E[(t::.Bk) +L E[t::.Bk t::.Bf: t::.B[ t::.B/'1


2 2] 4]
kf.l k k,l

+2 L E[(t::.Bk) 2 t::.B[ t::.B/'J = 82 L t::.tk t::.tt + 3 82 ,Lct::.tk) 2


k,l kf.l k

+8(1- 8) ,Lct::.tk) = 8 L t::.tk Mt +28


2 2 2 _L<t::.tk) 2 +8(1- 8) ,Lct::.tk) 2,
k k,l k k
4.3. Preliminaries on stochastic integrals 211

converges to cet) 2 as n --+ oo since Lk 1 D.tk Mz approaches t 2 and Lk(D.tk) 2 ::s


maxz{Ll.tz} Lk D.tk converges to zero as Ll.(;n) --+ 0. Hence, 2:~ 1 Ll.Bk Ll.Bk converges
in m.s. to 8 t so that the m.s. limit of JB ,n (B) exists and is

- 1 2 1 2
l.i.m. JB nCB)= -2 (B(t) - t)
n---+oo '
+e t = -2 B(t) + (8- 1/2) t.

This limit coincides with the Ito integral for () = 0 but differs from this integral for () i= 0.
The Stratonovich integral results for () = 1/2. •

The stochastic integrals in Examples 4.1 and 4.2 can be extended to define
stochastic processes. Take a fixed t > 0 and set

M(s) =lot 1[0,sJ(u) B(u) dB(u) = los B(u) dB(u) and (4.6)

M(s) = lot 1[0,sj(U) B(u) o dB(u) = los B(u) o dB(u), (4.7)

where s E [0, t]. These integrals can also be defined, for example, as the m.s. lim-
its of the sums
mn
lB,n(B)(s) = L B(t~':l1 ) [ B(t~n) 1\ s) - B(t~':l 1 1\ s)] and (4.8)
k=l
mn
JB,n(B)(s) = L B(tk (n)) [ B(tt) 1\ s) - B(t~':l 1 1\ s)]. (4.9)
k=l

Calculations as in Examples 4.1 and 4.2 can be used to show that J B n(B)(s) and
JB,n(B)(s) converge in m.s. as n--+ oo to (B(s) 2 - s)/2 and B(s) 2 (fJ -1/2) s, +
respectively.

Example 4.3: Let t > 0 be a fixed time and M be the stochastic process in
Eq. 4.6. Then (1) E[M(s)] = 0, (2) E[M(s) 2 ] = s2 /2, (3) M(s) = M(a) +
J; B(u) dB(u) for a E (0, s), (4) M has continuous samples, and (5) M is a
square integrable martingale with respectto the natural filtration .Ft = a ( B (s), 0 :;::
s =::: t) of the Brownian motion B. <>
Proof: The stated properties of M follow from Eq. 4.3 properties of the Brownian motion.
However, we present here a direct proof of these properties.
With the notations Bk-1 = B(t~~ 1 ), Ll.Bk = B(tt) As) - B(t~~l As), and
:Fk = :Ftk(n), the first moment of JB ' n(B)(s) is
212 Chapter 4. Ito's Formula and Stochastic Differential Equations

foreachn ands E [0, t] because b.Bk = Ofor s ~ rf~ 1 and E[b.Bk I .Fk-Il = E[b.Bk] =
0 for s > t~':l 1 . The second moment of JB,n(B)(s) is

E[(JB,n(B)(s)) 2 ] =E [L
k,l
Bk-1 Bz-1 b.Bk b.Bz]

=L E[Bf_ 1 (b.Bk) 2 J + 2 L E[Bk-1 Bz-I b.Bk b.Bzl


k k>l
=L E { E[Bf_ 1 (b.Bk) 2 I Fk-tl} +2 L E { E[Bk-I Bz-1 b.Bk b.Bz I Fk-tl}
k k>t
= L E { Bf- 1 E[(b.Bdl} +2 L E {Bk-1 Bt-I b.Bt E[b.Bkl}
k k>t
= "t(n)
~ k-1
(t(n) 1\ s- t(n) 1\
k k-1
s)
k

since Bk-I and Bt-l are .Fk-1-measurable fork > l, b.Bk is independent of Fk-I·
E[b.Bk] = 0, and E[(b.Bk) 2] = tt) 1\ s - t~':l 1 1\ s. Hence,

E[{JB,n(B)(s)) 2] = Lt~':l 1 (tkn) 1\s -t~':l 1 1\s)


k

for each nand s E [0, t] and converges to fo


u du = s 2 /2 as the partition mesh approaches
zero.
The third property of M follows from

los B(u) dB(u) = lot l[O,aJ(u) B(u) dB(u) + f 1[a,s] (u) B(u) dB(u)
and the linearity of JB,n(B)(·) with respect to the integrand, where 0 ~a ~ s ~ t.
A proof that M is sample continuous can be found in [42] (Theorem 2.6, p. 38). The
proof starts with the observation that lB,n(B)(·) is a sample continuous process since

(n) (n) (n)


lB,n(B)(s) = JB,n(B)(tk-I) + B(tk-I) [B(s)- B(tk_ 1)]

"
10rs E tk-l'tk(n)) .
( (n)

It is left to show that M is a square integrable martingale. This result also follows
from Eq. 4.3. Here we only sketch a direct proof of this property. We show that lB,n(B)
is a square integrable, that is, E[(JB,n(B)(s)) 2] < oo, s E [0, t], JB,n(B)(s) E .Fs,
0 ~ s ~ t, and E[JB,n(B)(s) I Fa] = JB,n(B)(a) for s ::=: a (Section 3.11). We have
already shown that E [ {1B,n(B)(s)) 2] is finite. That JB,n(B)(s) is .Fs-adapted follows
from the definition of JB,n(B)(s). The process JB,n(B)(s) has the martingale property
E[JB,n(B)(s) I .Fa]= JB,n(B)(a), s 2:: a, since

E[JB,n(B)(s) I .Fa]= E[JB,n(B)(s)- lB,n(B)(a) I .Fa]+ lB,n(B)(a)


4.3. Preliminaries on stochastic integrals 213

and

E[ls,n(B)(s)- ls,n(B)(u) I Fa-]= LE[Bq-1l!.Bq I Fa-]


q

=L E {E[Bq-1 l!.Bq I Fq-d I Fa-} =L E {Bq-1 E[l!.Bq] I Fa-} = 0,


q q

where Bq_ 1 and l!.Bq have the same meaning as in the previous equations but are restricted
to the interval [u, s]. The above equalities hold by properties of the Brownian motion.
Because the martingale property is preserved by L2-limits, M given by Eq. 4.6 is a square
integrable martingale ([42], Proposition 1.3, p. 13). •

Example 4.4: Consider a fixed time t and the stochastic process M in Eq. 4.7.
This process, that is, the Stratonovich integral of B with respect to B on [0, s ], is
not a martingale with respect to the natural filtration of B. <>

Proof: Properties of the sequence of sums defining the Stratonovich integral can be used
to show that M is not a martingale, as was done in the previous example. An alternative
-
approach is taken here. We note that M(t) = foB
t
o dB = B(t) 2 /2 (Eq. 4.5) cannot be
an F 1 = u(B(s), 0 s s s f)-martingale since its expectation E[B(t) 2/2] = t /2 is not
constant. •

Example 4.5: The sample paths of the It6 and Stratonovich integrals, J;
B dB
and J;B o dB, differ and the difference between these integrals increases in time.
Figure 4.1 shows a sample of J; B dB and a sample of J; B o dB corresponding
to the same sample path B(·, w) of a Brownian motion B, that is, the functions
(B(t, w) 2 - t)/2 and B(t, w) 2 /2, respectively.
Samples of the It6 and the Stratonovich integrals corresponding to the same
sample B(·, w) of a Brownian motion Bare also shown in Fig. 4.2 with solid lines.
The dotted lines in this figure give sample paths of J B,n(B)(·) and fs,n(B)(·) for
two partitions Pn of [0, t] with points ttl= k tfmn, k = 0, 1, ... , mn, mn = 10,
and mn = 100 obtained from the same sample B(·, w) of B. These plots suggest
that the sample paths of J B, n (B) ( ·) and JB, n (B) ( ·) approach the sample paths of
the It6 and Stratonovich integrals, respectively, as n increases. ()

Example 4.6: Consider the random sequence

L N(t~~1 ) [N(t~n))- N(t~~1 )],


mn
lN,n(N) =
k=1

where Pn = (0 = tan) < t{nl < · · · < t~~ = t) is a sequence of partitions of


[0, t] such that t::..(pn) -+ 0 as n -+ oo and N is a Poisson process with intensity
).. > 0. Then J N,n (N) converges in m.s. and a.s. as n -+ oo. The limit of J N,n (N),
214 Chapter 4. Ito's Formula and Stochastic Differential Equations

1.2 ,-------,,------,---r-----r-----r----.-----,---,----,-----,

time, t

Figure4.1. Sample paths oflto andStratonovichintegrals f~ B dB and f~ BodB

denoted by f~ N(s-) dN(s) or f~ N_ dN and called the Ito integral of N_ with


respect to N on [0, t], is

t t
Jo N-(s)dN(s) = Jo N(s-)dN(s) =
1
2 (N(t) 2 -N(t)). (4.10)

Because N has cadUtg samples, N_(s) = N(s-) = lillluts N(u) is a caglad


process. The Ito integral J N_ dN coincides with the path by path Riemann-
Stieltjes integral and is a martingale.
Figure 4.3 shows a sample of a Poisson process N and the correspond-
ing sample of the stochastic integral f~ N_ dN. The values of the sample of
J N_ dN at the jump times Tt = 3, T2 = 4, and T3 = 8 of the sample of N are
N(3-) ilN(3) = 0, 0 + N(4-) ilN(4) = 0 + (1)(1) = 1, 1 + N(8-) ilN(8) =
1 + (2)(1) = 3, respectively, where ilN(t) = N(t) - N(L). 0

Proof: We first show that the Ito and the path by path Riemann-Stieltjes integrals of N-
with respect toN coincide. Consider a classical Riemann-Stieltjes integral j}: a(x) d{3(x),
where {3 is a step function with jumps f3k at Xko k = 1, 2, ... , n. If a and {3 are such that
they are not discontinuous from the right or from the left at each Xk simultaneously, then
([5], Theorem 7.11, p. 148)

1
n
= L a(xk) f3k·
b
a(x) d{3(x)
a k=l
4.3. Preliminaries on stochastic integrals 215

0.6.----~-~---------,

mn=10
0.4

0.2

-0.2

-0.4'---~-~-~-~---'
' JB,n

0 0.2 0.4 0.6 0.8 02 0.4 Q6 OB

mn=100 mn=100
0.4 0.6

0.2 0.4

0.2

-0.2

-0.4'---~-~-~-~----' -0.2'---~-~-~-~---'
0 0.2 0.4 0.6 0.8 0 0.2 0.4 0.6 0.8
time, t time, t

Figure 4.2. Samples of the approximations J B,n(B) and JB,n(B) of J B dB and

-
J
-
B o dB for mn = 10 and mn = 100

3 3

2.5
N(t)
2.5
f01 N - dN
2 2

1.5 1.5

0.5 0.5

0 5 10 0 5 10
time,t time, t

Figure 4.3. A hypothetical sample of the Poisson process N and the correspond-
ing sample of the Ito integral N_ dN J
This result implies

lo t
N(s-, w) dN(s, w) =
N(t,w)
L N(Tk-l (w), w) [N(Tk(w), w)- N(Tk-1 (w), w)]
0 k=l
N(t,w) 1
L (k- 1) = 2.(N(t, w) 2 - N(t, w)),
k=1
216 Chapter 4. Ito's Formula and Stochastic Differential Equations

where Tk, k = 1, 2, ... , denote the jump times of Nand To = 0. Hence, the path by path
JJ
integral N(s-, w) dN(s, w) coincides with the Ito integral in agreement with a result in
[147] (Theorem 17, p. 54), and lN,n(N) converges a.s. to N- dN as n ~ oo.JJ
It remains to show that JN,n(N) converges in m.s. to JJ
N- dN as n ~ oo. Note
that JN,n (N) can be written as

1 mn l mn 2
lN,n(N) = 2 L (N(t~n)) 2 - N(t~~1 ) 2 )- 2 L (N(t~n))- N(t~~1 ))
k=l k=l
1 1 mn 2
= -N(t)
2
2- - " (N</n))- N(t(n) ))
2~ k k-1
k=l

so that JN,n (N) is an increasing sequence such that 0 .::; JN,n (N) .::; N(t) 2 /2 a.s. for each
n. Hence, we have lN,n(N) ~ JJ
N- dN as n ~ oo by dominated convergence and
JJ
the a.s. convergence of lN,n(N) to N- dN (Section 2.5.2.2)
We also present here an alternative proof for the m.s. convergence of JN,n(N) to
JJ N- dN = (N(t) 2 - N(t))/2 based on direct calculations and elementary arguments.
We need to show that the expectation of the square of

JN,n(N)- 2
1 (N(t) - N(t))2 = 2l "~ [ ANk- (ANk) 2]
k

converges to zero as n ~ oo, where ANk = N(t~n))- N(t~~ 1 ), that is, the sum
L E [ ( ANk - (ANk) 2) (ANt - (ANt) 2 ) J
k,l

=L E [ ANk ANt - ANk (ANt) 2 - ANt (ANk) 2 + (ANk) 2 (ANt) 2 ]


k,t

=A- 4 L(Atk) 2 (Att) 2 -n2 L(Atk) 2 +4A. 3 L(Atd


~t k k

converges to zero as Atk ~ 0, where Atk = t~n) - t~~ 1 • The first four moments of the
increments of N used to establish the above expression can be obtained from the observa-
tion that the cumulants Xr of any order r of ANk are A. Atk and the relationships f-Ll = Xl,
f-L2 = X2 +X f. f-L3 = X3 + 3 Xl X2 + Xf, and /-L4 = X4 + 3 xi+ 4 Xl X3 + 6 Xf X2 +xi
between the moments 1-Lr of order rand the cumulants of a random variable ([79], p. 83
and p. 377). The cumulant of order r of a random variable with characteristic function f{l is
equal to the value of c.J=1)-r d 7 f{l(u)jdu 7 at u = 0. •

4.4 Stochastic integrals


This section (1) provides additional information on semimartingales be-
cause of their close relation to stochastic integrals, (2) defines the stochastic inte-
gral for a class of simple predictable integrands and semimartingale integrators,
4.4. Stochastic integrals 217

(3) extends this definition to stochastic integrals with dtgUtd integrands and semi-
martingale integrators, and (4) gives properties of the stochastic integral that are
useful for calculations. The presentation in this and the following two sections is
based on [147].
The Ito integrals J; J;
B dB and N_ dN in the previous section are special
cases of the stochastic integral defined here. The stochastic integral differs signif-
icantly from the integrals in Section 3.9.3. The integrals in Section 3.9.3 (1) have
random integrand and deterministic integrator or vice versa and (2) the random
integrand/integrator of these integrals are in L2. In contrast, both the integrand
and integrator of the stochastic integral considered here are stochastic processes
whose expectations may not exist.

4.4.1 Semimartingales
Let (Q, F, (F1k::O• P) be a filtered probability space. We denote by S,
V, and .C the class of simple predictable, .1"1-adapted with cadlag paths, and
F 1-adapted with caglad paths processes, respectively.
Let 0 = To ~ T1 ~ · · · ~ Tn+ 1 < oo be a finite sequence of F,-stopping
times and Hi denote random variables such that Hi E Fr; and IHil < oo a.s.,
i = 0, 1, ... 'n, that is, IHi(w)i < 00 forw E Q \no and P(Qo) = 0.

A process {H (t), t 2:: 0} is said to be simple predictable if


n
H(t) = Ho 1{oJ(t) +L Hi 1m,T;+d(t). (4.11)
i=l

Note: We haveS c .C since the processes inS are Ft-adapted with caglad, piece-wise
constant sample paths.
The classes of processes S, 'D, and .C are vector spaces. For example, the sum of
two F 1 -adapted processes with cadlag paths is an Ft-adapted process with cadlag paths.
Also, multiplying an F 1-adapted process with cadlag paths with a real constant does not
change its type, so that 'D is a vector space. •

Consider a real-valued stochastic process X defined on a filtered probability


space (Q, F, (Ft )t;::o. P) and let Lo denote the collection of real-valued random
variables defined on this space.

Let Ix : S ~ Lo be a mapping defined by


n
/x(H) = Ho X(O) + L Hi [X(Ti+l)- X(Tj)]. (4.12)
i=l

Note: The mapping I x is linear and its range is Lo since it is a finite sum of random
variables Hi weighted by increments of X. We also note that Jx(H) resembles Js,n(B)
218 Chapter4. Ito's Formula and Stochastic Differential Equations

and J N,n (N) used in Examples 4.1 and 4.6 to define the stochastic integrals J B dB and
J N_ dN, respectively. A
To check whether I x is a continuous mapping, we need to define the mean-
ing of convergence inS and Lo. A sequence Hn inS is said to converge uni-
formly in (t, w) to H E S if, for any 8 > 0, there is an integer n£ ::: 1 indepen-
dent of (t, w) such that IHn(t, w)- H(t, w)l < 8 for n ::: n£ or, equivalently,
supwEr.!,t:c:O IHn (t, w) - H (t, w) I -+ 0, n -+ oo. The convergence in probability
is used in Lo.

• A stochastic process X is a total semimartingale if (1) X E V and (2) I x in


Eq. 4.12 is continuous from S to Lo.
• A process X is a semimartingale if X 1 is a total semimartingale for each
t E[0, oo).

Note: The continuity of lx means that the uniform convergence Hn ---+ H inS implies
pr .
the convergence I(Hn)--+ lx(H) m Lo as n---+ oo.
The definition of the (total) semimartingale shows that the collection of (total) semi-
martingales is a vector space because, if X 1 and X 2 are (total) semimartingales and c1, c2
are some constants, then q X 1 + c2 X2 is a (total) semimartingale. A

Example 4. 7: Let
N(t)
C(t) = L Yk, t ::: 0, (4.13)
k=1
be a compound Poisson process, where N is a Poisson process with respect to a
filtration :F1 and the iid random variables Yk are Fo-measurable. Then C_(t) =
C(t-) = limstt C(s) is a simple predictable process. <>
Proof: Recall that a Poisson process N is an adapted counting process without explosions
that has stationary independent increments (Section 3.12). We have
{w: Tn(w)::; t} = {w: N(t, w):::: n} E :F1 for each t,
so that the jump times T; of N are :F1-stopping times (see also [147], Theorem 22, p. 14).
We have H; = L~= 1 Yj E :Fo s; :FT;• i :::: 1, IH;I < oo a.s., and Tn+1 < oo a.s. for
each n. The jump times are finite because P(Tn+1 > t) ::; E[Tn+lllt by Chebyshev's
inequality and E[Tn+d = (n + 1)/A so that P(Tn+1 > t) approaches zero as t ---+ oo,
where A denotes the intensity of the Poisson process. Hence, C- is in S . •

Example4.8: SetH = c_ and X = Bin Eq. 4.12, where C_ is defined in


the previous example and B is a Brownian motion assumed to be independent of
C. Figure 4.4 shows a sample of c_ and B. The random variable in Eq. 4.12 is
Ix(H) = H1 [B(T2) - B(T1)] + H2 [B(T3) - B(T2)] for n = 2. If C has unit
jumps, that is, C is a Poisson process, then
4.4. Stochastic integrals 219

5.----.----,-----,----,-----,----,----,-----,----,

4 C_(·,ro)

-1
0 2 3 4 5 6 7 8 9

T 1(ro) T2(ro) T3(ro)


4
B(-,ro)
3

3 4 5 6 7 8 9
time

Figure4.4. A sampleofC_ and a sample of Bused to calculate /x(H) = lB(C_)

where <P denotes the distribution of the standard Gaussian variable. Figure 4.5
shows the distribution of Ix(H) for n = 2 and two values of the intensity param-
eter A. of the Poisson process. <>

0.9

0.8

0.7

0.6

0.5

04

03 A-=1
0.2

0.1

-1 ~ 0

Figure 4.5. The distribution of I x (H) defined in Fig. 4.4


220 Chapter4. Ito's Formula and Stochastic Differential Equations

Note: The increments B(7i+t)- B(Ti) of the Brownian motion are independent Gaussian
variables with mean zero and variance 11+ 1 - Ti conditional on the jump times 1i. Hence,
I x (H) is a zero mean Gaussian variable with variance (T2 - Tt) + 4 (T3 - T2) conditional
on the jump times. The formula for P(lx(H) ::::; ~)follows from the distribution P(T >
rJ) = exp( -'A rJ) of the time T between consecutive jumps of N. This distribution depends
only on rJ and ).. since N has stationary independent increments. A

Generally, it is difficult to establish from the definition whether a process is


a semimartingale. However, relatively simple criteria are available for identifying
semimartingales. Before giving some of these criteria we give a definition. A
process is said to have paths of finite variation on compacts if almost all its
samples are of bounded variation on each compact interval of [0, oo). Recall
that in Section 3.9.3 we have defined the variation of a real-valued deterministic
function. Now we state some criteria for identifying semimartingales.

• If X is an adapted process with cadlag paths of finite variation on compacts,


then X is a semimartingale ([147], Theorem 7, p. 47).
• If X is a square integrable martingale with cadlag paths, then X is a semi-
martingale ([147], Theorem 8, p. 47).
• If X is a locally square integrable local martingale with cadUtg paths, then X
is a semimartingale ([147], Corollary 1, p. 47).
• If X is a local martingale with continuous paths, then X is a semimartingale
([147], Corollary 2, p. 47).

Note: Local martingales were defined in Section 3.11. A (local) martingale X is locally
square integrable if there exists an increasing sequence Tn, n 2: 1, of stopping times such
that limn-+oo Tn = oo a.s. and such that for every n 2: 1 the process xTn is a square
integrable martingale.
These criteria show that a Brownian motion B is a sernimartingale because B has
continuous paths and martingales are local martingales. A compound Poisson process C is
also a sernimartingale because C is an adapted process with cadlitg paths of finite variation
on any bounded time interval (0, t]. We have also seen that C is a martingale if its jumps
Yk are in Lt and have mean zero. A

At the end of Section 3 .14.2 we introduced briefly a representation for semi-


martingales that is very useful in applications. We state again this representation
here.

An adapted, cadlag process X is decomposable if there exists a locally square


integrable martingale M and an adapted process A with dtdlag sample paths of
finite variation on compacts such that M(O) = A(O) = 0 and ([147], p. 48)

X(t) = X(O) + M(t) + A(t). (4.14)


4.4. Stochastic integrals 221

• An adapted, cadUtg process X is a classical semimartingale if there exists a


local martingale M and an adapted process A with dtdU:tg sample paths of finite
variation on compacts such that M(O) = A(O) = 0 and

X(t) = X(O) + M(t) + A(t). (4.15)

• An adapted, cadlag process X is a semimartingale if and only if X is a classical


semimartingale ([147], Theorem 14, p. 105 and Theorem 22, p. 114).

Note: We have used the representation of Levy processes in Section 3.14.2 to show that
these processes are classical semimartingales and therefore semimartingales. The crite-
ria stated previously in this section show that decomposable processes are classical semi-
martingales. Similar considerations can be applied to show that other processes are semi-
martingales. For example, a compound Poisson process C(t) = L,f~? Yk with Y1 E L1 is
a semimartingale since

C(t) = (C(t)- At E[YI]) +At E[YI] = C(t) +At E[YI],

the compensated compound Poisson process C(t) = C(t) -At E[YI] is a martingale, and
At E[Yd is adapted with continuous samples of finite variation on compacts (Eq. 4.15). A

Example 4.9: The square of a Brownian motion B, that is, the process X (t) =
B(t) 2 , t ::: 0, is a semimartingale. <>

Note: The process X can be given in the form X(t) = M(t)+A(t), where M(t) = B(t) 2- t
is a square integrable martingale, A(t) = t has continuous samples of finite variation on
compacts, and M(O) = A(O) = 0. Hence, X is a semimartingale.
We have seen that B2 is a submartingale. That B2 is a semimartingale also results
from the fact that submartingales are semimartingales. Let Y be a submartingale. Then Y is
a semimartingale since Y(t) = (Y(t) - E[Y(t)])+E[Y(t)], Y(t)- E[Y(t)] is a martingale,
and E[Y(t)] is an increasing function so that it is of finite variation on compacts. A

4.4.2 Simple predictable integrands


The mapping in Eq. 4.12 gives the random variable I x (H) E Lo. We extend
this mapping to define a stochastic process lx(H)(t), t ::: 0, called stochastic
integral, and derive properties of this process.

If H E S, X E V, and 0 = To :::: T1 :::: · · · :::: Tn+l < oo are stopping times, the
stochastic integral of H with respect to X on [0, t] is
n
Jx(H)(t) = HoX(O) +L H; [X(t A 7i+l)- X(t AT;)], t::: 0. (4.16)
i=l

Alternative notations for the stochastic integral 1 x(H)(t) defined by Eq. 4.16 are
J~ H(s)dX(s), J~ H dX, and (H · X)(t). If the timet= 0 is excluded from the
222 Chapter 4. Ito's Formula and Stochastic Differential Equations

definition in Eq. 4.16, the resulting stochastic integral, denoted by f~+ H dX, is

t n
lo H(s) dX(s) = L Hi [X(t /\ Ti+l)- X(t /\ 'li)]
O+ i=l

so that (Eq. 4.16)

lo
t H(s)dX(s) = HoX(O) + lo+
t H(s)dX(s). (4.17)

Figure 4.6 illustrates the definition of lx(H)(·) in Eq. 4.16 fort E [0, 8].

:rmple ~ of : :

Figure 4.6. A sample of HE S, a sample of X, and the corresponding sample of


lx(H)(·) in Eq. 4.16

The sample of the simple predictable process has Ho = 1, H1 = 2, H2 = 3,


To = 0, T1 = 3, and T2 = 7. The sample of the cadlag process X is a sample of
a Brownian motion starting at B(O) = -3 that has been modified to have jumps
of 2 and -3 at times t = 6 and t = 7.7, respectively. The corresponding sample
of lx(H)O in Fig. 4.6 shows that (1) only the jumps of the integrator result in
jumps of the stochastic integral and (2) the stochastic integral has cadlag sample
paths. The remainder of this section gives additional properties of the stochastic
integral in Eq. 4.16.
4.4. Stochastic integrals 223

The mappings in Eqs. 4.12 and 4.16 are related by

lx(H)(t) = /xt(H), where X 1 (-) = X(t /\ ·). (4.18)

If H, G E S, X 1, Xz ED, and a, f3 are some constants, then


lx(a H + f3 G)(t) =a lx(H)(t) + f3 lx(G)(t),
lax 1+f3X 2 (H)(t) =a lx (H)(t) + f3 lx 2 (H)(t).
1 (4.19)

If 0 .:::: s .:::: t, H E S, and X E D, then

( H(u)dX(u)= t H(u)dX(u)+1 H(u)dX(u),


1
(4.20)
lo lo s+

where J:+ H (u) dX (u) = J; 1(s,tJ(u) H(u) dX(u).

Proof: The properties in Eqs. 4.18 and 4.19 follow from the definitions of Ix(H) and
Jx(H), the linearity of these integrals with respect to their integrands and integrators, and
the fact that S and 'D are vector spaces. For the property in Eq. 4.20 take 0 ::: s ::: t and
note that (Eq. 4.19)

t
lo
H dX = Ho X(O) + t
lo
(l(o sJ(u)
'
+ l(s tJ(u))
'
H(u)dX(u)

= Ho X(O) + f 1(0,sJ(u) H(u) dX(u) + f 1(s,t](u) H(u) dX(u)

= Ho X(O) + {s H dX +
lo+
1t
s+
H dX = {s H dX +
Jo
1t
s+
H dX

as stated. •

4.4.3 Adapted caglad integrands


The class of integrands used to define the stochastic integral J x (H) in
Eq. 4.16 is too restrictive for the type of stochastic integrals encountered in many
stochastic problems. In this section we extend the definition of J x (H) to include
integrands H E £and integrators X E D. The extension requires a definition of
convergence in the collection S of simple predictable processes S, and is particu-
larly useful for the case in which X E D is a semimartingale.

A sequence of processes H n E S is said to converge to a process H uniformly


on compacts in probability, and denote it by H n ~ H, if for each t > 0
pr
sup IHn(s)- H(s)l---+ 0, as n-+ oo. (4.21)
O::::S::::t
224 Chapter 4. Ito's Formula and Stochastic Differential Equations

Note: The ucp convergence over a time interval [0, t] implies the convergence in probabil-
ity at an arbitrary timer E [0, t] since

{w: IHn(r, w)- H(r, w)l > e} ~ {w: sup IHn(s, w)- H(s, w)l > e}
o:::s::::r
for any e > 0 and r E [0, t ], P (supo<s<t IHn (s)- H (s) I) ~ 0 as n ~ oo, and probability
is an increasing set function. A - -

The following two facts, given without proof, yield the desired generaliza-
tion of the stochastic integral in Eq. 4.16.

1. The space S of simple predictable processes is dense in the space £ of


adapted, cagUtd processes under the ucp convergence([147], Theorem 10, p. 49).
2. If X is a semimartingale, the mapping J x : S --+ V is continuous under the
ucp convergence in both Sand V ([147], Theorem 11, p. 50). The continuous
linear mapping J x : £ --+ V extending J x : S --+ V is called the stochastic
integral of H E £ with respect to a semimartingale process X.

Note: These results show that any process H E £ can be approximated by a simple pre-
dictable process in the sense of the ucp convergence and suggest that most of the properties
of the stochastic integral in Eq. 4.16 hold also for its extension lx : £ ~ 1J.
J6
The extended definition of the stochastic integral includes the integrals B dB and
J6 N- dN in Examples 4.1 and 4.6 because their integrands, Band N-, are in£ and their
integrators are semimartingales. A

4.4.4 Properties of stochastic integrals


We give here some essential properties of the stochastic integral H · X de-
fined in Section 4.4.3, where H E £ and X is a semimartingale unless stated
otherwise.
The jump process tl.(H · X)(s) of the stochastic integral is indistinguishable
from H (s) fl. X (s) ([147], Theorem 13, p. 53), that is, the sample paths of these
two processes coincide with probability 1.

Note: Recall that b.Y(t) = Y(t) - Y(t-) is the jump process of a cadlag process Y. Our
illustration of the definition of the stochastic integral in Fig. 4.6 shows that the jumps of the
stochastic integral are equal to H (1i) b. X (Ti ), where 7i, i=1 ,2, ... , denote the jump times
of the integrator, in agreement with the above property. A

If X has paths of finite variation on compacts, the stochastic integral H · X is in-


distinguishable from the corresponding path by path Riemann-Stieltjes integral
([147], Theorem 17, p. 54).

Proof: If H is a simple predictable process, the path by path Riemann-Stieltjes integral


J6H(s, w) dX(s, w) and the stochastic integral in Eq. 4.16 coincide. This property of
4.4. Stochastic integrals 225

the stochastic integral has been illustrated in Example 4.6 showing that the Ito integral
JJ N- dN and the path bypath Riemann-Stieltjes integralJJ N(s-, w) dN(s, w) coincide.
. If H E .C, there exists a sequence Hn of processes in S converging to H in ucp so
that we can find a subsequence Hnk of Hn that converges a.s. to H (Section 2.13). The
integrals Hnk · X can be calculated as path by path Riemann-Stieltjes integrals, that is,

(Hnk. X)(t,w) = H;k(w)X(O,w) + LHtk(w) [X(t A Ti+l•w)- X(t A Tj,w)].


i
The statement in the theorem results by taking the limit as k --+ oo and using properties of
Hnk and Hnk ·X. •

Preservation. The process Y = H · X is a semimartingale. Moreover, if G is a


process in £, then

G · Y = G · (H ·X)= (G H)· X, (4.22)

where G · Y denotes the stochastic integral of G with respect to Y, and G H is


the product of G and H ([147], Theorem 19, p. 55). The property in Eq. 4.22 is
referred to as associativity.

Preservation. If H e £ is as stated at the beginning of this section and if


• X is a locally square integrable local martingale, then H · X is a locally square
integrable local martingale ([147], Theorem 20, p. 56);
• X is a local martingale, then H ·X is a local martingale ([147], Theorem 17,
p. 106).
Note: We only show that, if H is a bounded simple predictable process and X is a square
integrable martingale, then H · X is a square integrable martingale.
Assume X (0) = 0 without loss of generality since X (·) - X (0) is a martingale that
starts at zero. The stochastic integral (Eq. 4.16)
n
(H. X)(t) =L Hi [X(Ti+l 1\ t)- X{Tj 1\ t)]
i=l

is adapted. If His bounded by a > 0, then (H · X)(t) is in L2 for each t since

E[((H . X)(t)) 2 ] ~ E [ (E H, [X <'i+t At) - X (T; A t) I)']


S a2 E [t, (x(Ti+l A 1)2 - X(TI A 1)2)]

= a 2 ( E[X (Tn+l A t) 2] - E[X (Tt A t) 2])

as X is a square integrable martingale. The above calculations have used the martingale
properties of X. It remains to show that H · X has the martingale property. For s :::; t the
226 Chapter 4. Ito's Fonnula and Stochastic Differential Equations

integral (H · X)(t) is the sum of (H · X)(s) and terms H; [XCTi+l 1\ t)- X(T; 1\ t)] for
1j > s. The conditional expectation of these terms,
E [H; (XCTi+l 1\ t)- X(T; 1\ t)) I Fs]
= E {E [H; [XCTi+l 1\ t)- X(T; 1\ t)] I FrJ I Fs}
= E {H; E [X(Ti+l 1\ t)- X(T; 1\ t) I Fr;) I Fs}
is zero since E [X(Ti+l 1\ t)- X(Tj 1\ t) I FrJ = 0 by the optional sampling theorem
(Section 3.11.2) so that H ·X has the prop~ E [(H · X)(t) I Fs] = (H · X)(s).
Stochastic integrals with Brownian m'otion integrator are of the type considered
here. For example, JJ
B(s)dB(s) = (B(t) 2 - t)
/2 is a martingale that is also square
integrable (Example 4.6) and JJ H(s) dB(s), H E .C, is a square integrable martingale if
E [JJ H(s) 2 ds J < oo, as we will see later in this chapter (Eq. 4.36). A
We state without proof an additional property of the stochastic integral
but first, we need to introduce a new concept. A sequence of stopping times
an = (0 = Tdn) :S T 1(n) :S · · · :S T~~)) is said to tend to the identity if
(1) limn__.oo supk rt) = oo a.s. and (2) supk Irt.;}1 - Tk(n) I converges a.s. to zero
as n-+ oo.
If X is a semimartingale as stated at the beginning of this section, Y is a process
in D or£, and an is a sequence of random partitions tending to the identity, then

t Yan (s) dX(s) = LY(T/n))(X('lj~~ 1\ · ) - X('lj(n) 1\ ·)


lo+ i

~ { y_(s) dX(s), n-+ oo, (4.23)


lo+
where y_(s) = Y(s-) = lirnstt Y(s), y_(O) = 0, and Yan(t) = Y(O) 1[0} +
L; Y (1j(n)) 1(T;~~ .r/"ll ([147], Theorem 21, p. 57).
Note: This property holds also for a deterministic sequence of partitions. Numerical cal-
culations in this book use almost exclusively deterministic rather than random partitions of
the integration interval. There are some situations in which it is convenient to use random
partitions ([115], Section 9.5). A
Example 4.10: Let N be a Poisson process with intensity A > 0 and jump times
T1, T2, ... , M(t) = N(t) -At denote the associated compensated Poisson pro-
cess, G(t) = 1[o,r1)(t), and H(t) = 1(o,ri](t). The stochastic integral H · M is a
martingale. However, G · M given by Eq. 4.16 with (G, M) in place of (H, X) is
not a martingale. <>
Proof: We have shown that M is a martingale so that it is also a semimartingale (Sec-
tion 4.4.1). Note also that G E 'D and H E .C. The process in Eq. 4.16 with (H, X)
replaced by (G, M) is

lot G(s) dM(s) =lot G(s) dN(s)- J.. lot G(s) ds = -J... (t 1\ T1)
4.4. Stochastic integrals 227

so that G · M is not a martingale. On the other hand, Eq. 4.16 with (H, X) replaced by
( H, M) yields

lot H(s) dM(s) =lot H(s) dN(s)-). lot G(s) ds = N(t 1\ T1)-). (t 1\ T1).
That H · X is a martingale follows from

E[(H · M)(t) IFs] = E [N(t A T1)-). (t A T1) IFs]


= N(s 1\ t 1\ T1)-). (s 1\ t 1\ T1) = N(s 1\ T1)-). (s 1\ T1) = (H · M)(s)
for 0 :::; s :::; t, where the second equality is given by the optional sampling theorem
(Section 3.11.2) applied to the martingale N(t) -At. This example suggests that the
integrand must have di.glad sample paths to assure that the stochastic integral preserves the
martingale property of its integrator. •

Example 4.11: Consider the stochastic integral J(t) = J~ c_ dB, where Cis a
compound Poisson process (Eq. 4.13) and B denotes a Brownian motion that is
independentofC.Ifthejumps Yk ofC areinLz, then

J (t) = Jo
t C(s-) dB(s) = L Hi[B(Ti+100
1\ t) - B(Tj 1\ t)]
0 1=1

is a square integrable martingale with mean zero and

=L
00

E[J(t) J(s)] E[H,Z] E[TI+1 1\ t 1\ s- Ti 1\ t 1\ s],


1=1

where T1 denote the jump times of C and H1 = L~= 1 Yk. <>


Proof: That J is a square integrable martingale follows from a preservation property of the
stochastic integral. The first two moments of J are

= L E[Hil E[B(1i+11\ t)- B(TI 1\ t)] = 0


00

E[J(t)] and
1=1

00

E[J(t) J (s)] = L E[H1 Hj (B(TI+l 1\ t) - B(T1 1\ t)) (B(Tj+1 1\ s) - B(Tj 1\ s))]


l,j=1
00
=L E[Hf] E[(B(TI+l 1\ t 1\ s) - B(TI 1\ t 1\ s)) 2 ]
1=1

L E[Hf] E[T1+1 1\ t 1\ s -
00

= 7i 1\ t 1\ s],
1=1
where the last two equalities follow from properties of the increments of the Brownian
motion B and the independence between C and B. •
228 Chapter4. Ito's Formula and Stochastic Differential Equations

4.5 Quadratic variation and covariation


In this section, X and Y are semimartingales with the property X (0-) =
Y (0-) = 0. We define the quadratic variation and covariation processes for semi-
martingales, give properties of these processes that are useful for applications, and
illustrate how the quadratic variation and covariation processes can be used to cal-
culate stochastic integrals. The variation and covariation processes are essential
for establishing the Ito formula (Section 4.6). In Section 3.11.4 we have defined
variation and covariation processes for martingales.

4.5.1 Definition

The quadratic variation process of X, denoted by [X, X] or [X], is defined by

[X, X](t) =X (t) 2 - 21 1


X (s-) dX (s). (4.24)

Note: The stochastic integralJ6


X (s-) d X (s) is defined since X is a semimartingale so
that X- is in£, where X-(s) = X(s-) = limuts X(u). A

The path by path continuous part [X, X]c of [X, X] is defined by

[X, X](t) =[X, X]c(t) + X(0) 2 + L (~X(s)) 2


O<s:Sf
=[X, xnt) + L (~X(s)) . 2 (4.25)
0:9:9

Note: It can be shown that any semimartingale X has a unique continuous local martin-
gale part xc and that [Xc, xc] = [X, X]c ([147], p. 63). For example, xc for the semi-
martingale X= B +Cis xc = B so that [X, X]c(t) = t, [X, X](t) = t + L,~~? Y'f,
and [X, X](t)- [X, X]c(t) = L,~~? Yf = LO::;sg(~C(s)) 2 , where Band C denote
a Brownian motion and a compound Poisson process, respectively, and N, Yk are defined
by Eq. 4.13. The process [X, X](t) - [X, X]c(t) is given by the sum of the squares of
all jumps of C in [0, t]. This result holds for any semimartingale X (Eq. 4.25) because
semimartingales can have only jump discontinuities and the number of jumps is at most
countable.
We also note that the increment ~X (0) = X (0) - X (0-) of X at t = 0 is X (0)
since X (0-) = 0 by assumption so that [X, X](O) = X (0) 2 and [X, X]c (0) = 0. A

A semimartingale X with [X, X]c(t) = 0 is said to be a quadratic pure


jump semimartingale. The quadratic variation of a quadratic pure jump semi-
martingale X is [X, X](t) = Lo:::s:::rC~X(s)) 2 (Eq. 4.25).
4.5. Quadratic variation and covariation 229

The quadratic covariation or the bracket process of X and Y is

[X, Y](t) = X(t) Y(t)- lot X(s-)dY(s)- lot Y(s-)dX(s). (4.26)

Note: The definition of [X, Y] in Eq. 4.26 is meaningful since X, Yare semimartingales
so that X_, Y _ E .C and the above stochastic integrals are well defined. The quadratic
covariation of X and X coincides with the quadratic variation of X in Eq. 4.24. ~

4.5.2 Properties

Polarization identity.

[X, Y] = l1 ([X+ Y, X+ Y]- [X, X]- [Y, Y]). (4.27)

Proof: Since the collection of semimartingales defines a linear space, the quadratic vari-
ation process [X + Y, X + Y] is defined. By properties of the stochastic integral and
Eqs. 4.24-4.26 we have [X+ Y, X + Y] = [X, X] + [Y, Y] + 2 [X, Y], which yields the
polarization identity. •

Integration by parts.

( X(s-)dY(s)=X(t)Y(t)- ( Y(s-)dX(s)-[X,Y](t). (4.28)


lo+ lo+
Note: This formula is a direct consequence of the definition in Eq. 4.26. If [X, Y] = 0,
Eq. 4.28 coincides with the integration by parts formula in the classical calculus. ~

The process [X, X] in Eq. 4.24 (1) is adapted with cadlag, increasing samples,
(2) can be obtained for each t > 0 and s E [0, t] from

L (X(t~n) s))
~ 2
X(0) 2 + 1\ s)- X(tt!1 1\ -4 [X, X](s), (4.29)
k=l

where Pn = (0 = tan) < t}n) < · · · < t~~ = t) is a sequence of partitions


with f::!..(pn) ~ 0 as n ~ oo, and (3) has the properties [X, X](O) =X (0) 2 and
I::!..[ X, X](t) = [X, X](t)- [X, X](t-) = (I::!.. X (t)) 2 ([147], Theorem 22, p. 59).

Proof: We have S~n) = 1:;,;:1 ( X(tt)) 2 - X(t~~ 1 ) 2 ) = X(t) 2 - X(0) 2 and


230 Chapter 4. Ito's Formula and Stochastic Differential Equations

as n ~ oo, where we used Jci X(s-)dX(s) = X(O-) X(O) + Jci+ X(s-)dX(s) and
X(O-) = 0. The identity (b- a) 2 = b 2 - a 2 - 2a(b- a) with a = X(t~~1 )) and
b = X(t~n)) gives

after summation over k. Hence, sin) - 2 S~n) converges in ucp to

(x(t) 2 -X(0) 2)-2fX(s-)dX(s), n~oo,


as stated (Eq. 4.29).
The quadratic variation process [X, X] increases with t (Eq. 4.29) and has adapted,
cadlag samples (Eq. 4.24). It remains to show that X has the properties in (3). The equality
X(0) 2 = [X, X](O) follows directly from the convergence of the series in Eq. 4.29 to
[X, X](t). The second equality in property (3) can be obtained from the observation that

(~X(t)) 2 = (X(t)- X (t- )) 2 = X(t) 2 +X (t- ) 2 - 2 X(t) X(t-)

= X(t) 2 - X(t-) 2 - 2X(t-) ~X(t) = ~(X 2 )(t)- 2X(t-) ~X(t) = MX, X](t)
by the first property in Section 4.4.4 and Eq. 4.24. •

Example 4.12: Let C(t) = L.~~? Yk be a compound Poisson process. Then


Ll[C, C](t) = Y~(t) l(t.N{t)=ll• where ~N(t) = N(t)- N(t-). <>
Proof: If C has a jump at a timet 2: 0, then ~C(t) = YN(t) l(t.N(t)=l}· The expression
of MC, C] results from property (3) following Eq. 4.29. •

• The quadratic variation process [X, X] in Eq. 4.24 is a semimartingale.


• The quadratic covariation process [X, Y] in Eq. 4.26 is a semimartingale.
Proof: Since [X, X] is an adapted process with clldH1g, increasing samples by the above
property (Eq. 4.29), the process [X, X] is a semimartingale (Section 4.4.1). The polariza-
tion identity and the fact that the collection of martingales is a linear space show that the
quadratic covariation process [X, Y] is also a semimartingale.
We also note that the product of two semimartingales is a semimartingale since
stochastic integrals and quadratic covariation processes are semimartingales (Eq. 4.28). •

If X has a.s. continuous sample paths of finite variation on compacts, then


[X, X](t) = X (0) 2 .

Proof: For the sequence of partition Pn in Eq. 4.29


4.5. Quadratic variation and covariation 231

as n-+ oo since supk fX(t~n))- X(t~~ 1 )i-+ 0 and I::;,1 fX(t~n))- X(t~~1 )i < oo by
the continuity and the finite variation of the samples of X, respectively. •

For each t > 0 and partitions Pn = (0 = t~n) < tin) < · · · < tf::} = t) such that
.6.(pn) -+ 0 as n -+ oo we have
mn
X(O) Y(O) +L (X(tt) 1\ s)- X(t~~l 1\ s)) ( Y(t~n) 1\ s)- Y(t~~l 1\ s))
k=l
ucp
-----+ [X, Y](s), s E [0, t], (4.30)

[X, Y](O) = X(O) f(O) and Ll[X, Y](t) = .6.X(t) .6.Y(t) ([147], Theorem 23,
p. 61).

If X is an adapted process with cadlag paths of finite variation on compacts, then


X is a quadratic pure jump sernimartingale ([147], Theorem 26, p. 63).
Proof: Note that X is not assumed to be a semimartingale. However, X with the stated
properties is a semimartingale (Section 4.4.1) so that Eq. 4.24 applies and gives

X(t) 2 =2 f X(s-) dX(s) +[X, X](t).

Because X_ E £ and X is of finite variation on compacts, the stochastic integral


X_· X is indistinguishable from the path by path Riemann-Stieltjes integral (Section 4.4.4 ).
Note also that, for each w,
mn mn
X(t) 2 - X(0) 2 = L (X(tt)) 2 - X(t~~ 1 ) 2 ) = L [X(t~~1 ) ( X(t~n))- X(t~~1 )) J
k=l k=l
mn
+ L:[x(t~n))(X(t~n))-X(t~~ 1 ))],
k=l

where Pn = (0 = t~n) < · · · < t~n'j = t) is a partition of [0, t] such that fl(pn) -+ 0 as
n -+ oo. The first summation on the right side of the above equation converges in ucp to
f X- d X as n -+ oo. The second summation on the right side of the above equation is
mn mn
L X(t~~1 ) (x(ttl)- X(t~~1 )) + L (x<t~n))- X(t~~1 ))
2

k=l k=l
so that it converges in ucp to JJ X_ dX + LO<s::O:t (fl.Xs ) 2 as n -+ oo. Hence,
X(t) 2 - X(0) 2 = 2 f X(s-)dX(s) + L (.6.X(s)) 2 or

f
0<s:9

X(t) 2 =2 X(s-)dX(s) + L (.6.X(s)) 2 ,


0 O:::;s:::;t

which implies [X, X](t) = LO::;s 9 (.6.X(s)) 2 (Eq. 4.24). •


232 Chapter 4. Ito's Fonnula and Stochastic Differential Equations

If X is a quadratic pure jump semimartingale and Y is an arbitrary semimartin-


gale, then
[X, Y](t) = X(O) Y(O) + L
~X(s) ~Y(s). (4.31)
O<s:C::t

Note: The Kunita-Watanabe inequality discussed in the following section (Eq. 4.32) gives
I[X, Y](t)i :::0 ([X, X](t) [Y, Y](t)) 112 for H = K = 1 so that [X, X]c(t) = 0 implies
[X, Y]c(t) = 0. Hence, [X, Y](t) is equal to LO<s<t D.X(s) D.Y(s) by Eq. 4.30.
This result shows that, for example, the quadratic covariation of a compound Pois-
son process and a Brownian motion is zero. &

Example 4.13: Consider the compound Poisson process C in Eq. 4.13. This pro-
cess is a quadratic pure jump semimartingale with quadratic variation [C, C] (t) =
'£~~? Yf. If the jumps of C are one a.s., the compound Poisson process C coin-
cides with a Poisson process N so that [N, N](t) = '£~~? 1 = N(t). <>
Proof: The compound Poisson process C is a sernimartingale since it is adapted and has
cadUtg paths of finite variation on compacts (Section 4.4.1). We have seen in Example 4.12
that D.[C, C](t) = Y~(t) if C has a jump at timet and D.[C, C](t) = 0 otherwise. Hence,
the quadratic variation of C in a time interval (0, t] is [C, C](t) = L:f~? Yf. •

Example 4.14: We have calculated the stochastic integral N _ · N and found

lot N(s-)dN(s) = (N(t) 2 - N(t))/2,

in agreement with the definition of the quadratic covariation process of N. <>


Proof: We have shown that the quadratic variation of a Poisson process N is [N, N](t) =
N(t). This result and [N, N](t) = N(t) 2 - 2 f~ N(s-) dN(s) (Eq. 4.24) yield the above
expression for f~ N- dN. •

Example 4.15: If X is a process with continuous paths of finite variation on com-


pacts, then J~ X(s-) dX(s) = (X(t) 2 - X(0) 2 )/2 and [X, X](t) = X(0) 2 . <>

Note: We have [X, X](t) = X (0) 2 by one of the properties in this section so that f~ X- dX
results from Eq. 4.24. For example, take X(t) = A cos(v t), where A is a real-valued
random variable and v > 0 is a real number. Then

lot X(s-)dX(s) =lot X(s)dX(s) = fo 1


(A cos(vs))d(A cos(vs))

= (A 2 /2)[(cos(vt)) 2 - 1] = X(t) 2 j2- A 2 j2


so that [X, X](t) = X(t) 2 - 2 (x(t) 2 /2- A 2 j2) = A 2 = X(0) 2 . &
4.5. Quadratic variation and covariation 233

Example 4.16: The quadratic covariation [B, C] of a Brownian motion B and the
compound Poisson process C is zero. The process [B + C, B + C] is equal to the
sum of the quadratic variations of B and C, that is,
N(t)
[B + C, B + C](t) = [B, B](t) + [C, C](t) = t + LYf.
k=l

Figure 4.7 shows a hypothetical sample path of C and [B + C, B + C]. The linear

5
C(-,ro)
4

0 2 3 4 5 6 7 8 9

-
30
[B+C,B+C](-,ro) ~

20
__....o

10

.-J:)

0 2 3 4 5 6 7 8 9
time

Figure 4.7. Hypothetical samples of C and B +C


variation of [B, B] in time is interrupted by jumps caused by the quadratic pure
jump semimartingale C. <>
Proof: The polarization identity (Eq. 4.27) with (B, C) in place of (X, Y) gives [B, C] =
([B + C, B + C]- [B, B]- [C, C]) /2. We have already found the expressions for the
quadratic variation processes [B, B] and [C, C). The quadratic covariation process [B, C]
is zero by Eq. 4.31 and properties of (B, C). These observations give the stated expression
for [B + C, B +C). •

Example 4.17: Let X be the Uvy process with characteristic function


E[exp(-v'-1 u X(t))] = exp( -t 1/f(u)),
where

1/f(u) = v'-Iau+ a 22u 2 + { (t-eAuy +v'-Iuy1(Jyl<ll) n(dy),


Jn<l.\{0}

n(dy) = IYI-(a+l) dy, and ot E (0, 2) (Section 3.14.2). Figure 4.8 shows samples
of the quadratic variation process [X, X] fora = 0.1, a = 1, and several values of
234 Chapter 4. Ito's Formula and Stochastic Differential Equations

X 104
5 2500
a=1.0 a=1.5
4 2000

3 1500

2 1000

500

0 0
0 20 40 60 80 100 0 20 40 60 80 100

400 200
a=1.7 a=1.99
300 150

200 100

100 50

0 0
0 20 40 60 80 100 0 20 40 60 80 100
time time

Figure 4.8. Samples of the quadratic variation of a Levy process with a = 1, 1.5,
1.7, and 1.99

a. For small values of a, [X, X] resembles the quadratic variation of a compound


Poisson process. For large values of a, [X, X] increases almost linearly in time as
the quadratic variation of a Brownian motion. <>

Note: The generation of samples of X is based on the algorithm in Section 3.14.2 using
the approximation X ::::: X(l) + xC2l + xC 3 ,sJ, where (X(l), xC2l, xC 3 ,sl) are mutually
independent, xCll(t) = CJ B(t)- at, B is a Brownian motion, and xC 2l and xC 3 ,s) are
compound Poisson processes including jumps of X with magnitude in [1, oo) and (s, 1),
respectively, and s E (0, 1). The plots in Fig. 4.8 are for s = 0.1. •

4.5.3 Stochastic integrals and covariation processes

We give some useful inequalities and formulas involving stochastic inte-


grals whose integrators are quadratic variation and covariation processes. Also,
we present a simple formula for calculating the second moment of stochastic in-
tegrals with square integrable martingale integrators.
4.5. Quadratic variation and covariation 235

Kunita-Watanabe inequality. If X, Yare semimartingales and H, K are mea-


surable processes, then a.s. ([147], Theorem 25, p. 61))

1 00
IH(s)IIK(s)lld[X, Y](s)l

::::: (Joroo H(s) 2 d[X,X](s)


)1/2 (roo
Jo K(s) 2 d[Y,Y](s)
)1/2
(4.32)

Note: The integrals in Eq. 4.32 are defined since [X, X], [Y, Y] are cadlag, increasing,
adapted processes and [X, Y] is a process with samples of finite variation on compacts . .t.

If (1) X, Y are semimartingales, (2) H, K are measurable processes, and


(3) 1/p + 1/q = 1, then

E [looo IH(s)IIK(s)lld[X, Y](s)l]


:0 II (f H (s) 2 d[X, X](s>Y'lll (fp
K (s)' d[Y, Y](s) )''l , q
(4.33)

where II Z IILp= (E[ZP]) 11P denotes the p-norm of Z E L p·

Proof: Set Z1 = ( f000 H(s) 2 d[X, X](s) ) f000 K(s) 2 d[Y, Y](s) )1/2 . The
1/2 (
and Z2 =
Kunita-Watanabe inequality gives

E [fooo jH(s)jjK(s)jjd[X, Y](s)JJ ::; E (Z1 Z2],


by averaging. This result and HOlder's inequality,

JE[Z1 Z2JI ::; E[!Z1 Z2ll::; (E[JZ1JP])lfp (E[IZ21q])lfq,

yield Eq. 4.33. •

If X, Y are semimartingales and H, K E .C, then

[H ·X, K · Y](t) =lot H(s) K(s)d[X, Y](s). (4.34)

Note: The quadratic covariation [H · X, K · Y] is defined because stochastic integrals


are semimartingales. We need to show that [H ·X, Y](t) = JJ
H(s)d[X, Y](s) holds
for H E S and that this equality can be extended to processes H E C. A proof of this
result can be found in [147] (Theorem 29, p. 68), and uses the continuity of the stochastic
236 Chapter 4. Ito's Formula and Stochastic Differential Equations

integral and the fact that Sis dense in£. The above equality and the symmetry of[·,·]
give [H ·X, K · Y](t) = fJ
H(s) d[X, K · Y](s) and [X, K · Y](s) = jg
K(u) d[X, Y](u)
so that Eq. 4.34 results since d[X, K · Y](s) = K (s) d[X, Y](s). A

If X, Y are semimartingales and Z E V, then


mn
L Z(t~~1 ) ( X(tt))- X(t~~ 1 )) ( Y(t~n))- Y(t~~1 ))

rt
k=l
ucp
-+ Jo Z(s-)d[X, Y](s) as n--? oo, (4.35)

where Pn = (0 = tcin) :S tt) :S · · · :S tS:J = t) is a sequence of partitions of


[0, t] such that t:.(pn) --? 0 as n --? oo ([147], Theorem 30, p. 69).

Proof: We have

z_ · [X, Y] = z_ · (X Y) - z_ · (X- · Y) - z_ · (L ·X)


= z_ ·(X Y)- (Z- X-)· Y- (Z- L) ·X
= z_ ·(X Y)- (Z X)-· Y- (Z Y)- ·X,

by the definition of the quadratic covariation (Eq. 4.26) and the associativity of the stochas-
tic integral (Eq. 4.22), where Z_(t) = Z(t-) and H · V denotes f H dV. The sequence,
mn
L Z(t~~1 ) [ ( X(t~n)) Y(t~n)) - X (t~~l) Y(t~~ 1 ))
k=l

-X(t~~1 ) (ru~n))- Y(t~~ 1 ))- Y(t~~ 1 ) (xu~n))- X(t~~ 1 ))],


in Eq. 4.35 converges in ucp to z_ · (X Y)- (Z X)- · Y- (Z Y)- ·X as n -+ oo. This
completes the proof since the above limit is equal to z_ · [X, Y]. •

If M is a square integrable martingale with M (0) = 0 and H is an adapted


process with d1gU1d paths such that E [J~ H(s) 2 d[M, M](s)] < oo for each
t ::=: 0, then

E [(lot H(s)dM(s))
2
] = E [lot H(s) d[M,M](s)].
2 (4.36)

Proof: This property is very useful in applications since it provides a procedure for calcu-
lating the second moment of stochastic integrals with square integrable martingale integra-
tors. For example, if M is a Brownian motion B, then Eq. 4.36 becomes

(4.37)
4.6. Ito's formula 237

since [B, B](t) = t. The result in Eq. 4.37 is referred to as the Ito
isometry ([135],
Lemma 3.1.5, p. 26).
The expectation on the right side of Eq. 4.36 exists by hypothesis. The expectation
on the left side of this equation is defined because the stochastic integral H · M is a square
integrable martingale by a preservation property of the stochastic integral (Section 4.4.4).
This integral is zero at t = 0 by the definition of M. The assumption M(O) = 0 and
Eq. 4.24 give

[H · M, H · M](t) = (H · M)(t) 2 - 2 lot (H · M)(s-) d(H · M)(s).

The expectation of the integral in this equation is zero since H · M is a square integrable
martingale so that
E
[[H · M, H·M](t)] = E[<H ·
M)(t) 2].
We also have (Eq. 4.34)

[H · M, H · M](t) = f H(s) 2 d[M, M](s)

so that E [<H · M)(t) 2] = E [J6 H(s) 2 d[M, M](s)J..


Example 4.18: The second moment of the Ito integral J~ B dB is

E [(lot B(s) dB(s)) 2] = E [lot B(s) 2 ds J= t 2/2,


where B is a Brownian motion. <>
Proof: The isometry in Eq. 4.37 and the Fubini theorem (Section 2.8) give

E [f B(s) 2 ds] = f E[B(s) 2] ds =lots ds = t 2 /2.

An alternative solution is possible in this case since


E[(B(t) 2 - t) 2]/4 = E[B(t) 4 - 2 t B(t) 2 + t 2]/4 = (3 t 2 - 2 t 2 + t 2)/4 = t 2/2
and J6 B(s) dB(s) = (B(t) 2 - t)/2 (Example 4.1). •

4.6 Ito's formula


The Ito formula extends the change of variable formula of the classical cal-
culus to stochastic integrals with adapted, caglad integrands and sernimartingale
integrators. We will use Ito's formula to develop Monte Carlo simulation tech-
niques (Chapter 5), obtain the local solution of a class of deterministic partial dif-
ferential equations (Chapter 6), and solve various problems defined by differential
equations with random coefficients and/or input (Chapters 7, 8, and 9). This sec-
tion states and proves the Ito formula, defines the Fisk-Stratonovich integral, and
demonstrates by examples the use of Ito's formula and the relationship between
the Ito and Fisk-Stratonovich integrals.
238 Chapter 4. Ito's Formula and Stochastic Differential Equations

4.6.1 One-dimensional case


The classical change of variables formula,

g(h(t))- g(h(O)) = 1 h(t)

h(O)
g 1 (u) du =
lot

0
g'(h(s)) dh(s), (4.38)

gives the increment of a deterministic real-valued function t 1-+ g(h(t)) in an


interval [0, t]. The differential form of this formula is

d
dt [g(h(t))] = g'(h(t)) h'(t) or d[g(h(t))] = g'(h(t)) dh(t), (4.39)

where the first derivatives g' and h' of functions g and h, respectively, are assumed
to exist. The Ito formula extends the rules of the classical calculus in Eqs. 4.38
and 4.39 to the case in which the deterministic function h is replaced with a semi-
martingale X.
ThechangeofvariablesformulainEq.4.38gives f~ B(s)dB(s) B(t) 2 =!
2
for g(y) = y j2 and h replaced by a Brownian motion process B. This re-
sult is in disagreement with our previous calculations showing that the Ito inte-
gral f~ B(s)dB(s) is equal to B(t) 2 j2- t/2 (Eq. 4.3). On the other hand, the
Stratonovich integral f~ B(s) o dB(s) = B(t) 2 /2 (Eq. 4.5) and the classical cal-
culus give the same result.

Ito's formula. If X is a semimartingale and g E C2 (JR), then (1) g(X) is a


semimartingale and (2) the integral form of the Ito formula,

g(X(t))- g(X(O)) = t
fo+
g'(X(s-))dX(s) +~
2
t
fo+
g"(X(s-))d[X, XY(s)

+ L [g(X(s))- g(X(s-))- g'(X(s-)) ~X(s)], (4.40)


O<s::;t

is valid a.s. at each t :::: 0 ([147], Theorem 32, p. 71).

Note: The notation g E C2 (JR) means that the function g : lR -+ lR has continuous second
order derivative and g 1/g 11 denote the first/second derivatives of g. The last two terms on
the right side of Eq. 4.40 are not present in the classical calculus (Eqs. 4.38 and 4.39).
The lt6 formula can also be given in the form

g(X(t)) - g(X(O)) = lo0+t g 1 (X (s-)) dX(s) + -l lot g'' (X(s-)) d[X, X)(s)
2 0+

+ L [g(X(s))- g(X(s-))- g 1 (X(s-)) L'>X(s)- ~g'1 (X(s-)) (L'>X(s)) 2J,


O<s::::;t
(4.41)

by the relationship between [X, X] and [X, Xf (Eq. 4.25). A


4.6. Ito's formula 239

Ito's formula. If X is a continuous semimartingale and g E C(JR 2 ), then


(1) g(X) is a continuous semimartingale and (2) the differential and integral
forms of the Ito formula are
1
dg(X(t)) = g 1(X(t))dX(t) + "2g 11 (X(t))d[X, X](t) and

g(X(t))- g(X(O)) = t g (X(s)) dX(s) + ~2 lot g (X(s)) d[X, X](s).


lo
1 11

(4.42)

Proof: The proof of the Ito formula in Eq. 4.40 is presented in detail since we will use this
formula extensively. The proof, adapted from [147] (Theorem 32, p. 71), has two parts.
First, it is assumed that X is a continuous semimartingale. This result is then extended to
the case in which X is an arbitrary semimartingale. A heuristic proof of the Ito formula can
be found in [117] (Theorem 6.7.1, p. 184).
The following result is used in the proof. If g E C2 (R) is defined on a bounded
interval, the Taylor formula gives

h2
g(x +h)- g(x) = h g (x) + 2 g'' (x) + r(x, h),
1 h E R, (4.43)

where ir(x, h)l ::::: h 2 a(lhl), a : [0, oo) -+ [0, oo) is increasing, and limu,r,oa(u) = 0.
Two cases are considered.

Case 1: X is a continuous semimartingale. We have [X, X] and [X, X]c in the


Ito formula for X continuous (Eq. 4.42) and X arbitrary (Eq. 4.40), respectively, because
~X(t) = 0 at all times t 2: 0 if X has continuous samples so that [X, X](t) = [X, X]c(t)
in this case.
Since X is continuous and g E C(R2 ), the process g(X) has continuous samples.
If Eq. 4.42 holds, g(X) is a semimartingale since (1) the processes g' (X) and g 11 (X) are
adapted as memoryless transformations of the adapted process X, (2) g'(X) and g 11 (X)
have dtdHtg samples by the properties of g and X, for example,

limg 1 (X(s)) = g'(limX(s)) = g'(X(t)) and limg 1 (X(s)) = g1 (limX(s)) = g1 (X(t-)),


s,!,t s,!,t stt stt

(3) the integrals JJg 1 (X (s-)) dX (s) and JJ


g 11 (X(s-)) d[X, X]c(s) are semimartingales
by a preservation property of the stochastic integral (Section 4.4.4), and (4) sums of semi-
martingales are semimartingales.
Without loss of generality it can be assumed that X (0) = 0. If X (0) f= 0, we can
consider the process X(t)- X(O) that starts at zero. The Brownian motion process is an
example of a process considered in this part of the proof. The Taylor formula cannot be
applied to g(X) because X may not take values in a bounded interval. This difficulty is
removed by stopping X when it leaves for the first time an interval (-a, a), 0 < a < oo,
and then letting a -+ oo. Therefore, we can assume that X is bounded and apply Taylor's
formula to g(X). Fix t > 0 and consider a sequence of partitions Pn = (0 =tan) ::::: t{n) :::::
240 Chapter 4. Ito's Formula and Stochastic Differential Equations

· · · :S t/:} = t) of [0, t] such that t.(pn) -+ 0 as n -+ oo. Then


mn
g(X(t))- g(X(O)) = L (g(X(t~n)))- g(X(t~~1))) = s~n) +sin)+ sjnl, (4.44)
k=1

where

mn
s~n) = L r(X(t~~1), X(t~n))- X(t~~l))
k=l

correspond to the three terms of the Taylor formula in Eq. 4.43 applied to the intervals
of Pn· The sums S~n) and sin) converge in ucp to the Ito integrals g 1 (X(s-))dX(s) fd
and (1/2) Jd
g" (X (s-)) d[X, X](s) as n -+ oo, respectively (Eqs. 4.23 and 4.35). The
absolute value of s~n) can be bounded by

mn
ls~n)l = L r(X(t~~1 ), X(tt))- X(t~~1 ))
k=l
mn
< max a(IX(/n))- X(t(n) )I) "(X(t(n))- X(t(n) )) 2
- 1<k<m
- - n
k k-1 w
k=l
k k-1

for each n. Because Lk (x(tt))- X(t~~ 1 )) 2 ~ [X, X](t) (Eq. 4.29), this sequence

also converges to [X, X](t) in probability. We also have maxk a(IX(t~n))- X (t~~ 1 )I) -+ 0
as n -+ oo since limu{-0 a (u) = 0 and the function s r-+ X (s, w) is continuous on [0, t] for
almost all wand, hence, uniformly continuous in this time interval so that maxk IX(tt))-
(n)
X(tk_ 1)1-+ Oa.s.asn-+ oo.
In summary, we have shown that for each t 2: 0 the sequences ~n), sin), and S~n)
Jd
converge in probability to g 1 (X (s-)) dX(s), (1/2) JJ
g 11 (X(s-)) d[X, X](s), and zero,
respectively, as n -+ oo. Hence, we have

lim P (is(n)
n-+oo 1
+ s<n) + S(n)-
2 3 Jot g 1 (X(s-)) dX(s)

- ~lot g 11 (X(s-)) d[X, X](s) I > e) = 0


for any e > 0 so that the equality

g(X(t)) - g(X(O)) = Jot g 1 (X (s- )) dX (s) + l1 Jot g 11 (X (s-)) d[X, X](s) (4.45)
4.6. Ito's formula 241

holds a.s. for each t. Note that X(s-) = X(s) for each s E [0, t] since X is assumed to be
continuous. Because the processes on both sides of Eq. 4.45 are continuous, this equality
holds for all values oft. If X (0) is not zero, the Ito formula of Eq. 4.45 becomes

g(X(t))- g(X(O)) = lo tg (X(s))dX(s) +-1 lot g


1 11 (X(s))d[X, X](s). (4.46)
0 2 0

Case 2: X is a general semimartingale. Consider an arbitrary time t > 0 and


note that LO<s<t(~X(s)) 2 ~ [X, X](t) and [X, X](t) < oo a.s. because [X, X] is an
increasing process with cadlag paths (Eqs. 4.25 and 4.29). Given 8 > 0 and t > 0, denote
by V a collection of jumps of X in [0, t] such that Lsev<~X(s)) 2 ~ 8 2 and such that the
set U of jumps in [0, t] that are not in V is finite. Then Eq. 4.44 gives

mn
g(X(t))- g(X(O)) = L (g(X(t~n)))- g(X(t~~ 1 ))
k=l

=L (g(X(t~n)))- g(X(t~~1 )) + L (g(X(t~n)))- g(X(t~~1 )), (4.47)


~u ~v

where Lk,U corresponds to the intervals of the partition Pn of [0, t] including jumps in U
and Lk, v =
Lk - Lk,U. Since X has cadlag sample paths, it has at most a countable
number of jumps in [0, t] and, for a sufficiently fine partition Pn· the intervals (t~~~, t~n)]
contain at most one jump of X in U so that

L [ g(X(t~n))) - g(X(t~~1 ))] --+ L [g(X(s)) - g(X(s-))], as n --+ oo,


k,U seU

for almost all samples of X.


The Taylor formula in Eq. 4.43 can be applied to each term of the summation
Lk, v = Lk- Lk,U and gives

L (g(X(t~n)))- g(X(t~~ 1 ))
k,V

= ~ [lcxcr~~ 1 )) (xcrtl)- X(t~~1 )) + ~ g"(X(t~~ 1 )) (x(ttl)- x(t~~1 )) 2 ]

- L [g'(X(t~~1 )) (x(t~n))- X(t~~1 )) + ~ g 11 (X(t~~1 )) (x(t~n))- X(t~~~)t]


k,U
+ Lr(X(t~~ 1 ), X(tt>)- X(t~~ 1 )).
k,V

As in the previous case, the first term on the right side of the above equation converges in
ucp to Jci+
g 1(X(s-)) dX(s) + (1/2) Jci+
g 11 (X(s-)) d[X, X](s). The second term con-
verges to

- L [g'(X(s-)) ~X(s) + ~g"(X(s-)) (~X(s)) 2 ] as n--+ oo.


seU
242 Chapter 4. Ito's Formula and Stochastic Differential Equations

It remains to show that, as e {,. 0, LsEU (g(X(s))- g(X(s-))) and the above summation
converge to LO<s:::t [g(X(s)) - g(X (s- ))] and

- L [lcxcs-)) ~X(s) + ~g 11 (X(s-)) (~X(s)) 2 J,


O<s:::t

respectively, and Lk,V r(X(t~':l 1 ), X(t~n))- X(t~':l 1 )) approaches zero. Details on this
part of the proof can befound in [147] (pp. 73-74). Hence, the limit of
mn
g(X(t))- gcxco>> =L (g<X<tt))>- gcxct~':l 1 n)
k=l

as n ~ oo and e {,. 0 is

lo0+t g (X(s-)) dX(s) +-21 lot0+ g (X(s-)) d[X, X](s)


1 11

+ L [g(X(s))- g(X(s-))- g 1 (X(s-)) ~X(s)- ~ g 11 (X(s-)) (~X(s)) 2 ],


O<s:::t

which is the right side ofEq. 4.40 since (Eq. 4.25)

l0+
g"(X(s-))d[X, X](s)- ~ L
2 O<s~t
g11 (X(s-)) (~X(s)) 2

= r
lo+
g 11 (X(s-))d[X, X]c(s).

It remains to show that g(X) is a semimartingale. Considerations as in the previous


case show that g(X) is adapted and has cadHtg samples. The two integrals on the right
side of Eq. 4.40 are semimartingales by a preservation property of the stochastic integral
(Section 4.4.4). The last term in Eq. 4.40 can be split in two summations involving jumps
~X of X with magnitude larger and smaller than one. The summation including jumps
I~X (s) I > 1 is a process of finite variation on compacts since X has cadlag samples, so
that it is a semimartingale (Section 4.4.1). The summation including jumps I~X (s) I ::": 1
can be written in the form

LO<s~t[g(X (s)) - g(X(s-)) - g 1 (X(s-)) 6.X(s)] = i LO<s:::t g 11 (8(s)) (6.X (s)) 2 ,

where LO<s~t means LO<s~t.ILl.X(s)l~l and 18(s)l ::": I~X(s)l ::": 1 a.s. Since g E
C2 (JR), there is a constant a> 0 such that lg''(8(s))l ::":a and

which is bounded since X is a cadlag process. Hence, LO<s<t[g(X(s))- g(X(s-))-


g 1 (X(s-)) ~X(s)] is a semimartingale (Section 4.4.1). BeC"ause we have shown that
Eq. 4.40 holds and the right side of this equation is a semimartingale as a sum of semi-
martingales, g(X) is also a semimartingale. •
4.6. Ito's formula 243

The collection of semimartingales has some notable closure properties in


the sense that processes obtained by performing some operations on semimartin-
gales are semimartingales. Let X and Y be semimartingales and a, f3 be some
constants. We have seen that (1) a X+ f3 Y, (2) J X_ dY, (3) [X, X] and [X, Y],
and (4) X Yare semimartingales. The Ito formula gives another closure property
for semimartingales. This formula shows that any memoryless transformation of
a semimartingale is a semimartingale, provided that the transformation has con-
tinuous second order derivatives. For example, the processes sin(X), cos(X), and
ex are semimartingales. In contrast, memoryless transformations of martingales
may not be martingales. For example, the square of a Brownian motion is not a
martingale.
The Ito formula is applied in the following examples to derive some prop-
erties relevant for the analysis of some of the stochastic problems discussed in the
following chapters of the book.

Example 4.19: Consider the mapping B r-+ g(B) = Bn of a Brownian motion,


where n ?: 2 is an integer. Then

B(tt = n lo tB(st-
0
1 dB(s) + n(n -
2
1) lot B(st-
0
2 ds (4.48)

is an .F1-semimartingale, where .F1 is the natural filtration of B. The formula in


Eq. 4.48 is consistent with Eq. 4.3, provides an alternative method for calculating
the integral J~ B dB, and shows that this integral is a martingale since it is equal
to (B(t) 2 - t)/2. 0

Proof: The Ito formula in Eq. 4.40 (1) can be applied to the mapping B 1-+ Bn since this
mapping is infinitely differentiable and B is a continuous square integrable martingale so
that it is a semimartingale, (2) shows that Bn is a semimartingale, and (3) gives a recurrence
formula for calculating the stochastic integral JJ B(s)q dB(s), where q 2: 1 is an integer
(Eq. 4.48).
That g(B) = B 2 is a semimartingale also results from the representation B(t) 2 =
A(t) + M(t), where A(t) = tis an adapted continuous process with A(O) = 0 and paths
of finite variation on compacts and M(t) = B(t) 2 - t is a square integrable martingale
starting at zero. Hence, the process B2 is decomposable so that it is a semimartingale
(Section 4.4.1 ).
The equality in Eq. 4.48 also shows that the Ito integral JJ B(s)n- 1 dB(s) can be
calculated path by path from (1/n) B(t)n- ((n- 1)/2) JJ
B(s)n- ds and samples of the
2
Brownian motion B. •

Example 4.20: The process N 2 is a semimartingale and

N(t) 2 =2 ( N(s-) dN(s) + N(t), (4.49)


lo+
where N is a Poisson process with intensity 'A > 0 (Eq. 4.10). 0
244 Chapter 4. Ito's Formula and Stochastic Differential Equations

Proof: That N 2 is a semimartingale follows from the Ito formula. This formula applied to
the function g(N) = N 2 gives (Eq. 4.40)

N(t) 2 - N(0) 2 = lo t 2N(s-)dN(s) +-1lot 2d[N, N]c(s)


0+ 2 0+
+ L [ N(s) - N(s- ) - 2 N(s-) !).N(s) J.
2 2
O<s:::t

We have seen that N is a quadratic pure jump semimartingale so that [N, Nf (t) = 0 for
all t ::=: 0. We also have
N(t)
L [N(s) 2 - N(s-) 2 - 2N(s-) !).N(s)] =L [N(Ti) 2 - N(Ti-1) 2 - 2N('li-l)J
O<s:::t i=l
N(t)
=L [i 2 - (i - 1) 2 - 2 (i - 1)] = N(t),
i=l
where 1j, i = 1, 2, ... , denote the jump times of N. The above results and the Ito formula
yield Eq. 4.49 since N(O) = 0. •

Example 4.21: Let C be the compound Poisson process defined by Eq. 4.13 with
jumps given by a sequence Yk of iid real-valued random variables. The process
C 2 is a semimartingale and

C(t) 2 = 21 0+
t
C(s-)dC(s)
N(t)
+ Lyf_
i=l
(4.50)

This formula provides an alternative way for calculating the stochastic integral
JC-dC. <>
Proof: According to Ito's formula, C2 is a semimartingale and (Eq. 4.40)

C(t) 2 -C(0) 2 = (
lo+
1
2C(s-)dC(s)+-
2 t
lo+
2d[C,C]c(s)

+ L [C(s) 2 - C(s- ) 2 - 2 C(s-) f).C(s) J. (4.51)


O<s:::t

Because Cis a quadratic pure jump semimartingale, we have [C, C]C(t) = 0, t 2: 0. The
summation on the right side of the above equation is

L [C(s) 2 - C(s-)2 - 2 C(s-) f).C(s) J


O<s:::t
N(t)
=L [C(Tj) 2 - CCTi-t) 2 - 2 CC1i-l) yi]
i=l
4.6. Ito's formula 245

and constitutes a compound Poisson process with jumps ~2 occurring at the jump times ~
of C. These considerations and C(O) = 0 give Eq. 4.50. •

Example 4.22: Let C be the compound Poisson process in Eq. 4.13 and g a real-
valued function with continuous second order derivative. An alternative form of
the Ito formula in Eq. 4.40 is

g(C(t))- g(C(O)) = ( f [g(C(s-) + y)- g(C(s-))] M(ds, dy), (4.52)


Jo+JIR
where M (ds, dy) denotes a random measure giving the number of jumps of C in
the range (y, y + dy] during the time interval (s, s + ds]. <>
Proof: If g E C2 (R), Eq. 4.40 gives

g(C(t)) - g(C(O)) = t
lo+
g1 (C(s- )) dC(s) +~
2
t
lo+
g11 (C(s-)) d[C, C]c (s)

+ L [g(C(s)) - g(C(s-)) - g 1 (C(s-)) ~C(s)] = L [g(C(s)) - g(C(s-))]


O<s:s:t

since Jd+ g' (C(s-)) dC(s) = LO<s:::t g1 (C(s-)) ~C(s) and [C, C]c = 0. We also have

t r [g(C(s-) + y)- g(C(s-))]M(ds, dy) = L [g(C(s))- g(C(s-))]


Jo+JIR O<s~t
since the C has piece-wise constant samples with a finite number of jumps in (0, t] a.s. •

Example 4.23: The JR2 -valued process


X(t) = (XI(t) = cos(B(t)), X2(t) = sin(B(t))), t 2::0, (4.53)
takes on values on the unit circle centered at the origin of lR 2 and is called the
Brownian motion on the unit circle ([135], Example 7.5.5, p. 121). This process
is the solution of the stochastic differential equation
1
dX1 (t) = 2 X1 (t) dt- X2(t) dB(t),
1
dX2(t) = 2 X2(t) dt +X 1(t) dB(t). (4.54)

Stochastic differential equations are discussed in Section 4.7. Figure 4.9 shows a
sample path of X. <>
Note: The differential form of the Ito formula in Eq. 4.42 applied to the coordinates of X
gives

d(cos(B(t)) =- sin(B(t)) dB(t)- ~ cos(B(t)) dt = ~X 1(t) dt- X2(t) dB(t),

d(sin(B(t)) = cos(B(t)) d B(t)- ~ sin(B(t)) dt = ~ X2(t) dt +X 1 (t) dB(t)


since d[B, B]c(t) = d[B, B](t) = dt. A
246 Chapter4. Ito's Formula and Stochastic Differential Equations

A sample of
(sin(B), cos(B))

Figure 4.9. A sample path of the Brownian motion on the unit circle

Example 4.24: Let Y and Z be real-valued processes with cadlag samples adapted
to the natural filtration of a Brownian motion B. Let X(t) = A(t) + M(t) be an-
other process, whereA(t) = f~ Y(s-)ds andM(t) = f~ Z(s-)dB(s). Then X
is a sernimartingale, and

g(X(t))- g(X(O)) =
ht g (X(s-))dX(s) + ~2kt g 11 (X(s-))d[X, xns)
1

forgE C(l~_2), where [X, X]c(t) =[X, X](t) = [M, M](t) = ~~ Z(s-) 2 ds. 0
Proof: Since A and M are semimartingales (Section 4.4.4), X is also a semimartingale.
The expression of g(X(t))- g(X(O)) is given by the Ito formulainEq. 4.40.
The quadratic variation of X can be calculated from [X, X] = [M +A, M +A] =
[M, M] +[A, A]+ 2 [M, A]. We have (Eq. 4.34)

[M, M](t) = [Z · B, Z · B] = lot Z(s-) 2 d[B, B](s) = lot Z(s-) 2 ds.

The quadratic variation of A is [A, A](t) = 0, t 2:: 0, since

L IA(t~n))- A(t~~ 1 )1
mn
L (A(t~n))- A(t~~1 )) 2
mn
:::; sup IA(t~n))- A(t~~ 1 )1
k=l k k=l

converges to zero for partitions Pn of [0, t] with mesh !!.(pn) ---+ 0 as n ---+ oo. Leti (t) = t
denote the identity function. Then A(t) = JJ
Y(s-) di(s) so that the quadratic covariation
of MandA is [M, A]= JJ Y(s-) Z(s-) d[B, i](s), t 2:: 0, and
mn
L Itt) -
mn
L (B(tt))- B(t~~1 )) (t~n) - t~~ 1 ) :::; sup IB(tt))- B(t~~ 1 )1 t~~ 1 1,
k=l k k=l
4.6. Ito's formula 247

converges to zero since I::;~ I lt~n) - t~~ 1 1 = t and supk IB(t~n)) - B(t~~ 1 ) I -+ 0 a.s. as
n -+ oo by the continuity of B. Hence, the quadratic variation process of X is [X, X] =
[M, M] as stated. •

4.6.2 Multi-dimensional case

Ito's formula. If X is a vector of d semimartingales and g : m.d -+ IR has


continuous second order partial derivatives, then (1) g(X) is a semimartingale
and (2) the integral form of the Ito formula is ([147], Theorem 33, p. 74)

g(X(t))- g(X(O)) = L 11 _£(X(s-))


d a dXi(s)
i=l 0+ axi

+-1 L 11 --(X(s-))d[Xi,
da2g
Xj]c(s)
2.. axiaXj 0+
l,j=l

+ 0~' [ g(X (')) - g(X ('-)) - ~ ::.(X('-)) t.X; (')] . (4.55)

If the coordinates of X are continuous semimartingales, the differential and


integral forms of the Ito formula are

dg(X(t)) =" a d
_£(X(t))dXi(t)
~ ax·
i=l l
1
+-"
d a2
_g_(X(t))d[Xi, Xj](t)
2 ~ ax· ax.
i,j l J
and

g(X(t))- g(X(O)) ="


~
11 a
d
_£(X(s)) dXi(s)
0 ax·
i=l l

1
+ 2 L lo
d t a2g
ax· ax (X(s)) d[Xi, Xj](s). (4.56)
i,}=I 0 z J

Note: If the coordinates of X are continuous sernimartingales, then [Xi, X j ]c = [Xi, X j]


and X (s-) = X (s) so that the last summation in Eq. 4.55 vanishes.
The differential form of the Ito formula is used in applications to develop evolution
equations for moments and other properties of X. &

Example 4.25: Let B be a Brownian motion in JRd and let X(t) = g(B(t)) =
B(t)/ II B(t) II, t :=:: 0, be the Brownian motion on the unit sphere Sd(O, 1)
in m.d centered at the origin of this space, where g : m.d \ {0} --+ Sd(O, 1) and
II x II= <L.f=I xf) 112 denotes the Euclidean norm.
248 Chapter 4. Ito's Formula and Stochastic Differential Equations

The coordinates of X satisfy the differential form


d
d-1 Bp(t) "8;p-B;(t)Bp(t)
dXp(t) = --2- II B(t) 113 dt +~ II B(t) 113 dB;(t)
1=1
d
d- 1 Xp(t) d "8;p- X;(t) Xp(t)
= --2- II B(t) II t +~ II B(t) II dB;(t)

for each p = 1, ... , d. These forms are not stochastic differential equations for
X of the type defined later in this chapter (Eq. 4.69) since the increments dX (t)
of this process depend on X (t), B (t), and the increments dB (t) of B rather than
just on X(t) and dB(t). The process X can be modified to satisfy a stochastic
differential by a random time change ([135], Example 8.5.8, p. 149) or by state
augmentation. For example, we can consider the evolution of the process (X, Y),
where Y is an JE.d -valued process defined by dY(t) = dB(t). <>
Proof: The differential form of the Ito formula in Eq. 4.56 gives

a a2
dXp(t) =" d
L..
gp
(B(t))
ax·
dB;(t)
1
+-2 "L..
d
gp
ax· ax.
(B(t))
d[B;, B ·](t)
J
i=1 l i,j=1 l J

for p = 1, ... , d. The polarization identity (Eq. 4.27) gives


= zd ([B; + Bj, B; + Bj]- [B;, B;]- [Bj, Bj]) (t),
1
d[B;, Bj](t)

which is (1/2)(2 dt- dt- dt) = 0 fori =f. j since the processes B; are independent Brow-
nian motions. Hence, the double summation in the above expression of dXp(t) becomes
L.%= 1 (a 2gp(B(t))jax?) dt. •
Example 4.26: Consider a real-valued function g : JE.d -+ lR with continuous
and bounded first and second order derivatives. Let B be an lR d -valued Brownian
motion starting at x = (x1, ... , Xd) E JE.d whose coordinates B; are indepen-
dent Brownian motions such that B;(O) = x;, i = 1, ... , d. Denote by Ex the
expectation operator for B (0) = x. The limit,

A[g(x)] =lim Ex [g(B(t))] - g(x) =


t..j,O t
~
2
t azg(x)'
ax 2
(4.57)
i=1 l

called the generator of B, exists and is proportional with the Laplace operator
D. = "£.f= a ;ax?.
1 2 This result provides a link between a class of stochastic
processes and some deterministic partial differential equations (Chapter 6). <':>
Proof: The Ito formula in Eq. 4.56 gives

g(B(t))- g(B(O)) =L
d it a (B(s))
g dB;(s) +-1 Ld it a2 (B(s))
g 2 ds
i=1 0 ax; 2 i=1 0 axi
4.6. Ito's formula 249

since d[B;, B1 ] (s) = 8ij ds, where 8ij = 1 for fori = j and 8;1 = 0 fori f= j.
The integrals in the first summation on the right side of the above equation are
:Ft = a (B(s) : 0 :::: s :::: f)-martingales starting at zero. Hence, their expectation is zero.
The integrals in the second summation on the right side of this equation can be defined as
Riemann integrals and can be approximated by t a2 g(B(&(w) t, w))jax?, &(w) E (0, 1),
for almost all w's. The limit of these integrals scaled by t as t +
0 is deterministic and
equal to a2 g(x)jaxf. These observations yield Eq. 4.57 . •

Example 4.27: A stochastic process X is a standard Brownian motion if and only


if X is a continuous local martingale with X (0) = 0 and [X, X](t) = t. <>
Proof: Necessity follows from properties of the Brownian motion. For sufficiency, define
the function

Z(t) = g(X(t), t) = exp ( H u X(t) + ~ u2 t), u E JR,

that has continuous second and first derivatives relative to x and t, respectively, for each
u E JR. The Ito formula applied to Z(t) = g(X(t), t) = exp ( H u X(t) + t u 2 /2)
gives

Z(t) = Z(O) + t (ag(X(s), s) dX(s) + ag(X(s), s) ds)


lo ax as

+~ t a2g(X(s), s) d[X, X](s)


2 lo ax 2

= 1+ H u t Z(s)dX(s) + u22 lot Z(s)ds- u22 lot Z(s)d[X, X](s)


lo
= 1+ H u{ Z(s)dX(s).

By a preservation property of the stochastic integral, Z is a complex-valued local martin-


gale so that the stopped process zt is a martingale. Since Z has the martingale property,
we have E[Z(t) I Fsl = Z(s), t ~ s, which implies E [eR u(X(t)-X(s)) I :Fs J =
e-u 2 (t-s)/ 2 for each u E lR so that X(t)- X(s) is independent of :Fs and is normally
distributed with mean zero and variance t - s. In summary, X starts at zero, is Ft-adapted,
has continuous samples, and has stationary independent Gaussian increments with mean
zero and variance t - s. Hence, X is a Brownian motion. •

4.6.3 Fisk-Stratonovich's integral


Recall that the symbol " o " inserted between the integrand and the inte-
grator has been used to denote Stratonovich integrals. We preserve this notation
throughout the book. Let X and Y be semimartingales.

The Fisk-Stratonovich or Stratonovich integral of Y with respect to X is

lo tY(s-) o dX(s) = lot Y(s-) dX(s) + -[Y,


0 0
1
2
X]c (t). (4.58)
250 Chapter 4. Ito's Formula and Stochastic Differential Equations

Integration by parts. If at least one of the semimartingales X and Y is contin-


uous, then ([147], Corollary, p. 76)

X(t)Y(t)-X(O)Y(O)= r
lo+
X(s-)odY(s)+ r
lo+
Y(s-)odX(s). (4.59)

Proof: If at least one of the semimartingales X andY is continuous, we have [X, Y](t) =
[X, Y]C(t) + X(O) Y(O) so that (Eq. 4.28)

X(t)Y(t)= rr
lo+
X(s-)dY(s)+ {t Y(s-)dX(s)+[X,Y]c(t)+X(O)Y(O).
lo+
This observation, Eq. 4.58, and the equality [X, Y] = [Y, X] give the integration by parts
formula based on Stratonovich integrals. Eq. 4.59 coincides with the integration by parts
formula in the classical calculus. •

It is possible under some conditions to develop change of variable formulas


similar to the It() formulas based on Fisk-Stratonovich integrals.

If the third order derivative of a function g : lR ~ lR is continuous, then ([147],


Theorem 34, p. 75)

g(X(t))- g(X(O)) = t
lo+
g'(X(s-)) o dX(s)

+ L [g(X(s))- g(X(s-))- g'(X(s-)) ~X(s)]. (4.60)


O<s::;t

The Itoand Stratonovich integrals of g'(X) with respect to X are related by


([147], Theorem 34, p. 75)

r
lo+
g'(X(s-)) odX(s) = r
lo+
g'(X(s-))dX(s)

+ ~ t g"(X(s-))d[X, X]c(s). (4.61)


2 lo+

Proof: IfEq. 4.61 is valid, the Jt(j formula in Eq. 4.40 gives Eq. 4.60. Hence, it is sufficient
to show that (Eq. 4.58)

[g 1 (X),X]c(t) = rr g (X(s-))d[X,X]c(s).
lo+
11
4.6. Ito's formula 251

If g E C3 (R), the Ito formula can be applied to the semimartingale g1 (X) and gives

g1(X(t))- g1 (X(O)) = r g (X(s-))dX(s) + ~2 lo+


lo+
11 r g (X(s-))d[X, X]c(s) 111

+ L [g'(X(s))-g (X(s-))-g (X(s-))~X(s)].


1 11
O<s:::t

so that
[g1 (X), xr = [g (X-) ·X, X]c + ~[g
11
2
111 (X-) ·[X, xr, X]c.
The terms on the right side of the above equation are (Eq. 4.34)

[g 11 (X-)·X,X]c(t)= t g (X(s-))d[X,Xr(s) and


lo+
11

[g 111 (X-) ·[X, X]c, X](t) = t g (X(s-))d[[X, X]c, X](s)


111
lo+
= L g"'(X(s-)) MX, xr(s) ~X(s).
O<s::st

Since MX, X](t) = (~X(t)) 2 by Eq. 4.29, we have

L g"'(X(s-)) MX, X]c(s) ~X(s) = ( L g111 (X(s-)) (~X(t))3)c = 0


O<s~t O<s~t

so that [g 1(X), X]c is as stated. •


It can be shown that the change of variable formula in Eq. 4.60 holds also
for functions g E C 2 (JR) ([147], Theorem 20, p. 222). The generalization of this
change of variable formula to the case of an JRd -valued semimartingale is also
possible.

If X is a vector consisting of d semimartingales and g : JRd ---+ lR has second or-


der continuous partial derivatives, then (1) g(X) is a semimartingale and (2) the
change of variable formula is ([147], Theorem 21, p. 222)

g(X(t))- g(X(O)) = t t ~(X(s-)) o dXi(s)


t
i=l lo+ axi

+ O~t [g(X(s))- g(X(s-))- ::i (X(s-)) ~Xi(s)]. (4.62)

Note: If the coordinates of X are continuous semimartingales, then the change of variable
formula in Eq. 4.62 becomes

d !at a
g(X(t))- g(X(O)) = L
i=l 0+
_!_(X(s)) o dXi(s),
OXj
(4.63)
252 Chapter 4. Ito's Formula and Stochastic Differential Equations

and coincides with the change of variable formula of the classical calculus. A

The Ito and the Fisk-Stratonovich integrals in Eqs. 4.55 and 4.62 are related by

t lo+r ~(X(s-))
i=l oxi
odXi(s) = t lo+r ~(X(s-))dXi(s)
i=l axi

+-1 La lot --(X(s-))d[Xi,


a2g
Xj]c(s). (4.64)
2 i,j=l 0+ OXj ax j

Note: Generally, the evolution of the state X of a physical system is defined by a differential
equation driven by a non-white (colored) noise so that X satisfies a Stratonovich stochastic
differential equation ([102], Section 5.4.2). The relationship in Eq. 4.64 and the definition
of the Fisk-Stratonovich integral in Section 4.6.3 can be used to express the solution X of
a stochastic problem in functions of Riemann-Stieltjes and It() integrals (Section 4.7.12).
This representation of X is particularly useful for calculations because It() integrals are
martingales or semimartingales. A

Example 4.28: Let B be a Brownian motion and h : lR --+ lR be a function with a


continuous second order derivative. Then

r
lo
h(B(s)) o dB(s) = r
lo
h(B(s)) dB(s) +~
2 lo
r h'(B(s)) ds (4.65)

gives the relationship between the Stratonovich and the Ito integrals of h(B) with
respect to B. <>
Proof: The integrals JJ
h(B(s)) o dB(s) and JJ h(B(s)) dB(s) are defined by the prop-
erties of h and B. Eq. 4.64 with d = 1 and h = g' gives Eq. 4.65 since d[B, B]C(t) =
d[B, B](t) = dt.
For example, if h(x) = exp(x), we have
lo t 0
eB(s) o dB(s) = lot
0
eB(s) dB(s) + -1
2 0
lot eB(s) ds (4.66)

so that [eB, B](t) = JJ eB(s) ds (Eq. 4.58). For h(x) = x, we have


lot B(s) o dB(s) =lot B(s) dB(s) +~lot ds =lot B(s) dB(s) + ~·
in agreement with Examples 4.1 and 4.2. •

Example 4.29: Let X = N be a Poisson counting process. Then

lot h(N(s-)) o dN(s) =lot h(N(s-)) dN(s) (4.67)

for h E C2 (IR). <>


4. 7. Stochastic differential equations 253

Proof: The coincidence of the It() and the Stratonovich integrals with integrand depending
on N and integrator N results from Eq. 4.64 with d = 1 because N is a quadratic pure jump
semimartingale so that [N, N]c = 0. The same result can be obtained from Eq. 4.58 since
[h(N), Nf = 0. The equality in Eq. 4.67 remains valid if N is replaced by a compound
Poisson process. •

4.7 Stochastic differential equations


It is common in applications to characterize the current state of a physical
system by an JRd-valued function of time x(t), t ::: 0, called the state vector.
For example, the coordinates of x may denote the displacements and velocities at
the degrees of freedom of a discrete dynamical system or at the nodes of a finite
difference representation of a continuum. Generally, the behavior of a physical
system subjected to an input w(t), t ::: 0, is specified by a differential equation of
the form
x(t) = a(x(t), t) + b(x(t), t) w(t), t ::: 0, (4.68)

giving the rate of change of the state vector, where the functions a and b depend
on the system properties. Classical analysis is based on the assumptions that the
system properties and the input are perfectly known and deterministic. The evo-
lution of the state vector x is given by Riemann-Stieltjes integrals with integrands
depending on the system properties and integrators depending on input and time.
In this section, we generalize Eq. 4.68 by assuming that the input is a
stochastic process. The properties of the system are still assumed to be deter-
ministic and perfectly known so that the coefficients of Eq. 4.68 are deterministic,
known functions of the state vector and time. Because the input is random, the
state vector is a stochastic process. Let X denote the solution of Eq. 4.68 with w
replaced by a stochastic process W. It is common in applications to assume that
W is a white noise process for the following reasons ([175], Chapter 8).

1. In many cases the input has a much shorter memory than the system.

2. A broad class of correlated or colored inputs can be modeled by the output of


linear filters to white noise so that the augmented vector consisting of X and the
state of the filter defining the colored input satisfies a differential equation driven
by white noise.

3. The first two moments of the state X of a linear system driven by white noise
can be obtained simply provided that these moments exist. The system defined by
Eq. 4.68 is said to be linear if a(x(t), t) = «(t) x(t) and b(x(t), t) = {J(t) x(t),
where the matrices « and {J are functions of time. We will adopt in our discussion
a more restrictive definition for linear systems, which is common in random vibra-
tion studies. We say that the system in Eq. 4.68 is linear if a(x(t), t) = «(t) x(t)
and b(x(t), t) = {J(t).
254 Chapter4. Ito's Formula and Stochastic Differential Equations

Table 4.1. Noise input processes


DEFINITION
INPUT NOISE
Increment of I Formal
Gaussian white noise Brownian motion, B dB(t)jdt
Poisson white noise Compound Poisson, C dC(t)/dt
Levy noise Levy,L dL(t)/dt
Semimartingale noise Semimartingale, S dS(t)/dt

Table 4.1 summarizes the types of white noise processes used in this book. If the
jumps of the compound Poisson process Care in Lz, the second moment prop-
erties of the Poisson and the Gaussian white noise processes have the same func-
tional form and these processes can be made equal in the second moment sense.
The Levy white noise includes both the Gaussian and the Poisson white noise pro-
cesses (Section 3.14.2). The semimartingale noise is considered at the end of this
section. The use of the white noise model results in a significant simplification
in calculations, for example, linear filters driven by white noise ([175], Section
5.2) but can entail significant technical difficulties. For example, the Gaussian
white noise is frequently interpreted in applications as the derivative of the Brow-
nian motion, a stochastic process with a.s. non-differentiable sample paths! In
our discussion, we will deal with increments of the Brownian motion, compound
Poisson, Levy, and semimartingale processes rather than the corresponding white
noise processes. Accordingly, the evolution of the state vector X is defined by the

Stochastic differential equation

dX(t) = a(X(t-), t)dt + b(X(t-), t)dY(t), t ::=: 0, (4.69)

whose meaning is given by the

Stochastic integral equation

X(t) = X(O) +lot a(X(s-), s) ds +lot b(X(s-), s) dY(s), t ::=: 0, (4.70)

where a, bare (d, 1), (d, d')-dimensional matrices whose entries are real-valued
Borel measurable functions, Y is a vector in JRd' consisting of d' real-valued semi-
martingales, and the state X is an JRd -valued stochastic process. The input Y in-
cludes the Brownian motion, Poisson, compound Poisson, and Levy processes.
If Y is a Brownian motion B, the state X is called a diffusion or Ito process
and the matrices a and b b T are referred to as drift and diffusion coefficients,
respectively. The first and second integrals in Eq. 4.70 are Riemann-Stieltjes and
4. 7. Stochastic differential equations 255

stochastic or Ito integrals, respectively.


Two types of solutions are considered for Eqs. 4.69-4. 70, strong and weak
solutions. If the initial value X (0) of these equations is deterministic, a strong
solution is a stochastic process X (t), t 2: 0, satisfying the following conditions
([131], p. 137).

1. X is adapted to the natural filtration :ri = a(Y(s), 0 :::; s :::; t) of Y, that is,
X (t) is a function of Y (s), s :::; t, at each timet 2: 0.

2. X is a function of the sample paths of Y and the coefficients a and b.

3. The Riemann-Stieltjes and Ito integrals in Eq. 4.70 are well defined at all times
t > 0.
- If X(O) is random, :rr needs to be extended to Ft
= a ( X(O), Fr).
A version of the input Y needs to be specified to construct a strong solution of
Eq. 4. 70. If we replace one version of Y by another version of this process, a
different strong solution may result because these versions of Y can have different
sample properties. We construct approximations of the strong solution in Monte
Carlo studies because paths of X are calculated from paths of Y by integrating
Eq. 4.70. The path behavior is not essential for the weak solution. A weak solu-
tion of Eqs. 4.69-4.70 is a pair of adapted processes ( Y, X) defined on a filtered
p~ob~bility space (Q, 1i, (1i 1 )t-:~_o, P) such that Y is a version of Y and the pair
(Y, X) satisfies Eq. 4.70. The weak solution is completely defined by the initial
condition, the functions a and b, and the finite dimensional distributions of Y. The
particular version of the input does not have to be specified. The above definitions
show that a strong solution is a weak solution but the converse is not generally
true.
The uniqueness of the solution of Eqs. 4.69-4.70 is generally defined in two
ways. A strong solution is said to be unique in the strong or pathwise sense
if two different solutions of these equations have the same sample paths except
on a subset of Q of measure zero. The strong uniqueness requires that the set of
solutions of Eqs. 4.69-4.70 consists of indistinguishable processes. On the other
hand, two solutions, weak or strong, of these equations are unique in the weak
sense if they have the same finite dimensional distributions. Hence, the collection
of weak solutions of Eqs. 4.69-4.70 consists of processes that are versions of each
other ([42], Section 10.4, [131], Section 3.2.1, [135], Section 5.3).

Example 4.30: Consider the stochastic differential equation

dX(t) = s(X(t))dB(t), t 2:0,

s
where X (0) = 0, B is a Brownian motion starting at zero, and (x) = -1, 0, and
1 for x < 0, x = 0, and x > 0, respectively. It can be shown that this equation
has a weak solution ([42], Section 7.3). The solution is unique in the weak sense
([42], pp. 248-249) but it is not unique in the strong sense. <>
256 Chapter 4. Ito's Formula and Stochastic Differential Equations

Proof: Let X (·, w) be a solution of the above stochastic differential equation corresponding
to a sample B(·, w) of B, that is,

X(t, w)- X(O, w) =lot ~(X(s, w)) dB(s, w) so that

X(t, w)- X(O, w) = { ~(X(s, w)) dB(s, w),

where X= -X. If X(O) = 0 and X(·, w) is a solution corresponding to a sample B(·, w)


of B, then X(·, w) is also a solution corresponding to the same sample of B. The solutions
X and X have the same probability law but their sample paths differ. •

4. 7.1 Brownian motion input


Let Y in Eqs. 4.69 and 4. 70 be an JRd' -valued Brownian motion process B =
(Bt, ... , Bd'), where Bi, i = 1, ... , d', are independent real-valued Brownian
motions. The meaning ofEq. 4.69 with Bin place of Y is given by the stochastic
integral equation

X(t) = X(O) +lot a(X(s), s) ds +lot b(X(s), s) dB(s), (4.71)

where f~ a(X(s), s) ds and f~ b(X(s), s) dB(s) are Riemann and Ito integrals,
respectively.

JJ
Note: Generally, b(X(s), s) dB(s) cannot be defined as a path bypathRiemann-Stieltjes
integral because almost all sample paths of the Brownian motion process are of unbounded
variation. We have seen that there is a relatively simple relationship between the It() and
Stratonovich integrals (Eq. 4.58) so that the solution of Eq. 4.71 can also be expressed in
terms ofRiemann-Stieltjes and Stratonovich integrals. The process X(s-) in Eqs. 4.69 and
4.70 is replaced by X(s) in Eq. 4.71 since diffusion processes have continuous samples, as
we will see later in Section 4.7.1.1.
Note that diffusion processes are Markov, that is, the solution X of Eq. 4.71 is
a Markov process. Let to E (0, t) and X(to) = x. Because the increments of B are
independent of the past, future states,

X(t) =x +itto
a(X(s), s) ds +it
to
b(X(s), s) dB(s), t ~ to,
of X are independent of the past states X(u), u < fo, of this process conditional on its
present state X(to) = x. The converse is not true. For example, the compound Poisson
process is Markov but is not a diffusion process.
We also note that the solution of Eq. 4.71 has the strong Markov property, that is,
if Tis an Ft-Stopping time, the process X(T + r), r ~ 0, depends only on X(T), where
Ft denotes the filtration generated by the natural filtration of the Brownian motion B and
the initial state X(O). ~
4. 7. Stochastic differential equations 257

This section gives conditions for the existence and uniqueness of the so-
lution of Eq. 4.71, outlines some essential properties of the solution, illustrates
some theoretical concepts by examples, and demonstrates how Ito's formula can
be used to develop differential equations for the moments, the density, and the
characteristic function of X. In applications Eq. 4.71 is frequently given in the
form
X(t) = a(X(t), t) + b(X(t), t) W g(t), t ~ 0, (4.72)
where W g, called Gaussian white noise, is interpreted as the formal derivative of
the Brownian motion process B ([175], Section 5.2.1).
Example 4.31: Let X be the solution of dX (t) = -a X (t) dt + f3 dB(t), t ~ 0,
where a > 0 and f3 are some constants, X(O)""' N(fl,(O), y(O)), B is a Brownian
motion, and X (0) is independent of B. This real-valued diffusion process, called
the Ornstein-Uhlenbeck process, is a special case of Eq. 4.71 with d = d' = 1,
drift -ax, and diffusion f3 2 . The theorems in the following section guarantee the
existence and the uniqueness of the solution X.
Samples of the strong solution of this equation can be calculated from sam-
ples B ( ·, w) of B and the recurrence formula

X(t +/::it, w) = X(t, w) e-aLl.t + f3 i


1
t+Ll.t
e-a(t+Ll.t-s) dB(s, w)

giving X at the end of a time interval [t, t + ~t] from its value at the beginning
of the interval and the input in [t, t + ~t]. The weak solution of the Ornstein-
Uhlenbeck equation is given by a pair of a Brownian motion B and a Gaussian
process X with mean fl,(t) = E[X (t)] = f.l,(O) e-a 1 , variance y (t) = E[X (t) 2 ] =
y(O)e- 2 a 1 +f3 2 j(2a) (1-e- 2 a 1 ),andcovariancec(t,s) = E[X(t)X(s)] =
y(s 1\ t) e-a it-sl. <>
Proof: The recurrence formula for X shows that the Omstein-Uhlenbeck process is Gaus-
sian so that the second moment properties of X define its finite dimensional distributions.
We establish here differential equations for some of the second moment properties of X.
Complete differential equations for the second moment properties of the state vector of a
linear system driven by white noise are in Section 7 .2.1.1.
The mean equation can be obtained by averaging the defining equation of X. The
stochastic integral equation

X(t)=X(s)-a [ X(u)du+f31 dB(u),


1
t~s,
of X multiplied by X(s) and averaged gives

E[X(t) X(s)] = E[X(s) 2 ] - a E [it X(u) X(s) du J+ fJ E [ [ X(s) dB(u) J


or r(t, s) = r(s, s) -a J:r(u, s) du by using the Fubini theorem, the independence of
X(s) from future increments of the Brownian motion, and E[dB(u)] = 0, where r(t, s) =
E[X(t) X(s)]. This equation gives or(t, s)jot =-a r(t, s) by differentiation with respect
tot so that r(t, s) = r(s, s) e-a (t-s). •
258 Chapter 4. Ito's Formula and Stochastic Differential Equations

4.7.1.1 Existence and uniqueness of a solution

We need some definitions to state the main results of this section. Let a (x)
and b(x) be (d, 1) and (d, d') matrices whose entries are functions of only x E
JE.d. We say that a and b satisfy the uniform Lipschitz conditions if there exist a
constant c > 0 such that

II a(XJ)- a(x2) liS c II XJ- x2 II and


I h(XJ)- b(x2) llm::'S c II X! - X2 II, X!, X2 E JE.d, (4.73)

d 2) 1/2 ( d d' 2) 1/2


where II ~ II= ( Li=l ~i and II b lim= Li=l LJ=l bii denote the usual
Euclidean norm and a matrix norm, respectively.
The matrices a and b are said to be locally Lipschitz if, for each a > 0,
there is a constant Ca > 0 such that

I a(XJ)- a(x2) liS Ca I XJ - X2 II and


II b(xJ)- h(X2) llm::'S Ca I X!- X2 II (4.74)

for x 1, x2 E JE.d satisfying the condition II x 1 II, II x2 II< a.


The matrices a and b satisfy the growth condition if there is k > 0 such
that

x ·a(x)::::: k(l+ II x 11 2) and II b(x) 11~::::: k(l+ II x 11 2), (4.75)

where x · a(x) = :Lf=l Xi ai(x). The condition x · a(x) ::::: k (1+ II x 11 2 ) is


weaker than the typically stated growth condition, II a(x) 11 2::::: k' (1 + II x 11 2),
because (x - a(x)) · (x - a(x)) :=:: 0 so that

x · a(x)::::: 1( II x 11 2 +II a(x) 1 2) :::::


2 -k'+l(
2- 1+ II x 11 2) .

We present two theorems giving sufficient conditions for the existence


and uniqueness of the solution of Eq. 4.71 with drift and diffusion coefficients
that are not explicit functions of time. Alternative versions of these theorems
dealing with drift and diffusion coefficients that may depend explicitly on time are
available ([115], Section 4.5, [117], Theorem 7.1.1, p. 195, [135], Theorem 5.2.1,
p. 66, [165], Section 4.2). We also note that the theorems in this section can be
applied to general diffusion processes by considering an augmented state vector
X(t) = (XJ(t) = X(t),X2(t) = t) E m.d+l, where xl =X is defined by
Eq. 4.71 and x2 is the solution of the deterministic differential equationd X2(t) =
dt with the initial condition X2(0) = 0.
4. 7. Stochastic differential equations 259

If (1) the drift and diffusion coefficients do not depend on time explic-
itly, are bounded functions, and satisfy the uniform Lipschitz conditions in
Eq. 4.73, (2) B is a Brownian motion martingale on a filtered probability space
(Q, :F, (:Ft)r~o, P), (3) B(O) = 0, and (4) X(O) is :Fo-measurable, then
• There exists a strong solution X of Eq. 4. 71 unique in the strong sense,
• X is a B([O, oo)) x :F-measurable, adapted, and continuous process, and
• The law of X is uniquely determined by the drift and diffusion coefficients and
the laws of Band X(O) ([42], Theorem 10.5, p. 228).
Proof: A Brownian motion B is a a(B(s), 0 ::S s ::S t)-martingale. The strong uniqueness
of the solution means that if x< 1) and x<2) are two t-continuous solutions ofEq. 4.71, then
x< 1) (t, w) = x<2)(t, w) for all t ~ 0 a.s.
If some of the conditions of the above theorem are violated, it does not mean that
Eq. 4.71 does not have a solution or that its solution is not unique. The theorem gives
only sufficient conditions for the existence and uniqueness of the solution of Eq. 4. 71. We
outline the plan of the proof of this theorem. Technicalities involved in this proof can be
found in [42] (Chapter 10). Let

l(t, X)= X(O) + A(t) + M(t) (4.76)


denote the right side of Eq. 4. 71 with drift and diffusion coefficients depending only on the
state X, where

A(t) ={ a(X(s)) ds and M(t) ={ b(X(s)) dB(s). (4.77)

Generally, the processes /(·,X) and X(-) differ. This theorem gives conditions under
which these processes are indistinguishable. Let (Q, :F, (nh>O• P) be a filtered proba-
bility space with :Ft = a(X(O), B(s) : 0 ::S s ::S t) defined by-the initial condition X(O)
and the history of the Brownian motion B up to time t ~ 0. The Brownian motion B is
:Fr-adapted by the definition of this filtration. We present now the essential steps of the
proof.
(1) If X is :Fr-adapted and B([O, oo)) x :F-measurable, then l(t, X) is a continuous
semimartingale. Since X is B([O, oo)) x :F-measurable and b is continuous, the entries of
b(X) are also B([O, oo)) x :F-measurable. Since the diffusion coefficients are bounded,
M is in L2. These observations and a preservation property of the stochastic integral show
that M is a square integrable martingale. The martingale M starts at zero and is continu-
ous. The integral giving A can be defined path by path because the drift coefficients are
bounded functions. Since X E B([O, oo)) x :F and a is continuous, the entries of a(X) are
8([0, oo)) x :F-measurable. Also, A is an adapted continuous process, which starts at zero
and is of finite variation on compacts. Hence, A is also a semimartingale. We conclude
that/(·, X) is a semimartingale with continuous samples.
(2) Sensitivity of I(·, X) to X. The inequality ([42], Lemma 10.1, p. 222)

E [sup 11/(s, Y) -l(s, Z) 11 2 ] ::S 3£[11 Y(O)- Z(O) 11 2 ]


0:9:9

+ c-r E [f II Y(s) - Z(s) 11 2 ds J


260 Chapter 4. ItO's Formula and Stochastic Differential Equations

holds for each r > 0, t E [0, r], and B([O, oo)) x F-measurable, Ft-adapted processes
Y and Z such that Y(s)- Z(s) E L2, where cr > 0 is a constant. The distance between
I (·, Y) and I(-, Z) in [0, t] depends on differences between the initial values and the time
histories of Y and Z.
(3) The uniqueness of the solution of Eq. 4.71. We show that there is at most one
B([O, oo)) x F-measurable adapted solution of this equation ([42], Theorem 10.3, p. 224).
Let Y and Z be two solutions of Eq. 4.71 so that Y(O) = Z(O) = X(O), Y(t) = I(t, Y),
and Z(t) = I (t, Z) for all t 2:: 0. For a fixed r > 0, we have

E[ll Y(t)- Z(t) 11 2 ]::::; Cr E [{ II Y(s)- Z(s) 11 2 ds]

= Cr lot E [11 Y(s)- Z(s) 11 2 ] ds

for t E [0, r] by the above inequality and the Fubini theorem. We need one additional fact
to establish the uniqueness of the solution. Consider two Lebesgue integrable functions
f and gin [0, r] for some r E (0, oo). If there is a constant I; > 0 such that f(t) <
g(t) +I; Jci f(s) ds for all t E [0, r], then ([42], Lemma 10.2, p. 224)

f(t) ::::; g(t) +I; lot eS (t-s) g(s) ds.

The latter inequality with f(t) = E[ll Y(t) - Z(t) 11 2 ], g(t) = 0, and I; = cr gives
E[ll Y(t)- Z(t) 11 2 ] = 0 so that P(Y(t) = Z(t)) = 1 for each t E [0, r] and, since r
is arbitrary, this result holds for each t 2:: 0. The processes Y and X are indistinguishable
since they are continuous (Section 3.8).
(4) The existence of the solution ofEq. 4.71. The proof is based on the construction
of a convergent sequence of processes defined by

xC0l(t) = X(O),
x(n)(t) = I(t, x(n-ll), n = 1, 2, ... ,

for all t 2:: 0. The properties of the I in Eq. 4.76 imply that the processes xCnl defined
by the above recurrence formula are Ft-adapted and continuous, and their law is uniquely
determined by X(O) and B. Moreover, the sequence of processes x(n) converges to a limit
X a.s. and uniformly in [0, r] and this property holds on any bounded time interval since
r is arbitrary. The limit X has the same properties as the processes x(n), so that X is
an adapted continuous process whose probability law is determined by X (0) and B. The
process X is the strong solution of Eq. 4.71 since it is constructed from a specified version
of the Brownian motion and some initial conditions. Hence, the processes I(·, X) and X
are indistinguishable in [0, oo). •

If ( 1) X (0) and B satisfy the conditions in the previous existence and uniqueness
theorem and (2) the matrices a and b are locally Lipschitz continuous functions
(Eq. 4.74) and satisfy some growth conditions (Eq. 4.75), then there is a unique,
adapted, and continuous solution X of Eq. 4.71 whose law is uniquely deter-
mined by the laws of X(O) and B ([42], Theorem 10.6, p. 229).
4. 7. Stochastic differential equations 261

Example 4.32: Consider the deterministic differential equation dx (t) = x (t) 2 dt,
t 2: 0, starting at x(O) = 1. The drift coefficient a(x) = x 2 does not satisfy the
growth condition in Eq. 4.75 for all x e R Yet, this equation has the unique
solution x(t) = 1/(1 - t) in [0, 1). The solution x(t) = 1/(1 - t) becomes
unbounded as t t 1, so that the solution does not exist at all times. The behavior
of x(t) suggests that the growth condition is needed to assure the existence of a
solution for all times t. <>

Example 4.33: Let x be the solution of dx(t) = 3 x(t) 213 dt, t 2: 0, with the
initial condition x (0) = 0. The drift a (x) = 3 x 213 of this equation does not
satisfy the Lipschitz condition in Eq. 4.73 at x = 0. This equation has a solution
but its solution is not unique. For example, any member of the infinite collection
of functions Xa(t) defined by Xa(t) = (t - a) 3 fort > a > 0 and Xa(t) = 0 for
t ~ a is a solution. This observation suggests that the Lipschitz conditions are
needed to have a unique SIOlution. <>

Example 4.34: The stochastic differential equation

dX(t) = c X(t) dt +a X(t) dB(t), t E [0, r], (4.78)

has the unique solution

X(t) = X(O) exp[(c- a 2 /2) t +a B(t)], (4.79)

called the geometric Brownian motion process. <>


Proof: That the above stochastic differential equation has a unique solution follows from
the second existence and uniqueness theorem in Section 4.7.1.1 since (1) the coefficients
a(x) = c x and b(x) = ax of this equation satisfy the uniform Lipschitz conditions in
Eq. 4.73 and (2) the growth conditions in Eq. 4.75.
The first existence and uniqueness theorem in Section 4.7.1.1 cannot be applied
directly because the functions a(x), b(x) are not bounded. However, we can modify the
original problem so that its coefficients satisfy the requirements of this theorem. Let X be
the solution of the stochastic differential equation

dX(t) = a(X(t)) dt + b(X(t)) dB(t),


where the coefficients a, b satisfy the uniform Lipschitz conditions but may not be bounded.
For ~ > 0 define the function

=1
-~ if X<-~,
[x]~ x if-~ :sx ::::;~,
~ ifx>~,

and consider the stochastic differential equation

dX(~) (t) = a ( [ x<n (t) J~) dt +a ( [ x<~) (t) J~) dB(t).


262 Chapter 4. Ito's Formula and Stochastic Differential Equations

Since the functions a([·]~) and b([·]~) satisfy the conditions of the first existence and
uniqueness theorem in Section 4.7.1.1, the solution x<n of the above equation exists and
is unique in the strong sense for each~ > 0. Hence, Eq. 4.78 has a unique solution in the
strong sense.
We now show that

X(t) = g(B(t), t) = X(O) exp[(c- a 2/2) t +a B(t)]

solves Eq. 4.78. Let i(t) = t denote the identity function. The multi-dimensional It()
formula applied to the mapping (B, i) ~ g(B, i) gives (Eq. 4.56)

X(t)- X(O) = g(B(t), t)- g(B(O), 0) = t


lo
ag(B(s), s) dB(s)
au
+ t ag(B(s), s) ds +! t a2g(B(s), s) d[B, B](s)
lo as 2 lo au2

= t a X(s) dB(s) + lot (c- a 2/2) X(s) ds !f.-!2 lot a 2 X(s) ds,
lo
where agjau and a2gjau 2 denote partial derivatives of g with respect to its first argument.
Hence, the process X in Eq. 4.79 is the solution ofEq. 4.78. •

4.7.1.2 Properties of diffusion processes

We have seen that the solution X of Eq. 4.71 is a semimartingale process


that is also Markov. The largest difference during a bounded time interval [0, r],
t' > 0, between two solutions of Eq. 4.71 corresponding to different initial con-
ditions can be bounded by a function depending on r and the difference between
the two initial conditions. We restate these results and illustrate them with simple
examples. Examples are also used to show that memoryless monotonic transfor-
mations of diffusion processes are diffusion processes. A useful result for appli-
cations, the Wong-Zakai theorem, is stated at the end of the section.

Suppose that the drift and diffusion coefficients in Eq. 4.71 satisfy the conditions
of the first theorem of existence and uniqueness in Section 4.7.1.1. If X and Z
are solutions ofEq. 4.71 corresponding to the initial conditions X(O) and Z(O),
respectively, such that X(O), Z(O) E Fo and X(O)- Z(O) E L2, then

E [o~~~-r II X(t) - Z(t) 11 2 ] ::::; 3 E [11 X(O)- Z(O) 11 2 ] ecr -r, r~ 0,


(4.80)
where C-r > 0 is a constant depending on r ([42], Theorem 10.5, p. 228).

Example 4.35: Let X be the solutionofEq. 4.71 witha(x, t) =ax andb(x, t) =


b, where a and b are constant matrices. The difference between the solutions
x<O (t) of this equation corresponding to the initial conditions X (i) (0), i = 1, 2,
is X(l)(t)- x<2)(t) = f/l(t) (X< 1)(0)- x<2)(0)), where the Green function f/J(t)
4. 7. Stochastic differential equations 263

gives the solution X at time t ~ 0 for a unit initial condition. For example,
x< 1) (t)- x<2) (t) = (X<1) (0)- x<2) (0)) e-at for the Ornstein-Uhlenbeck process
X defined by dX (t) = -ex X (t) dt + f3 dB(t), t ~ 0, where ex > 0 and f3 are some
constants. Hence, SUPo::::s:::r (X(l) (s) - x<2) (s )) 2 = (X(l) (0) - x<2) (0) ) 2 for this
process. 0
Note: The bound in Eq. 4.80 depends on an undetermined constant Cr and its definition
requires that x<l) (0), x<2)(0) be in L2. Because c, is not known, the bound only provides
qualitative information on the evolution in time of the difference between x<l) and x<2).
For the Omstein-Uhlenbeck process X, Eq. 4.80 gives

Similar bounds can be obtained for any linear system. Hence, the bound in Eq. 4.80 is not
useful for these systems. A

Suppose that the drift/diffusion coefficients and the Brownian motion in Eq. 4. 71
satisfy the conditions of the first existence and uniqueness theorem in Sec-
tion 4. 7 .1.1. Then the solution X of this equation is a semimartingale with the
representation
X(t) = X(O) + A(t) + M(t), (4.81)
where A and M are defined by Eq. 4. 77, A is an adapted process with samples of
finite variation on compacts, M is an :Ft -square integrable martingale, A (0) = 0,
and M(O) = 0 ([42], Theorem 10.5, p. 228).

Note: The stated properties of A and M have been established in the proof of the existence
and uniqueness of the solution ofEq. 4.71 (Section 4.7.1.1). We calculate here just the first
two moments of a coordinate
d' t d'
Mi(t) =L fo bik(X(u))dBk(u) = L(bik(X) · Bk)(t)
k=l k=l

of M. The mean of Mi is zero because bik(X) · Bk is a martingale starting at zero. The


second moment of M; is

d'
E[M;(t) 21 = L E [Cbik(X) · Bk) (bu(X) · Bt)]
k,l=l

[lot
d'
= L E bik(X(u))bu(X(u))d[Bk, Bt](u)J
k,l=l
d1 d1
=L E [ {' bik(X(u)) 2 du] =L {' E[b;k(X(u)) 2 ] du
bl k blk
264 Chapter 4. Ito's Formula and Stochastic Differential Equations

by the definition of Mi, the linearity of the expectation operator, a property of the stochastic
integral (Eq. 4.34), and Fubini's theorem. The expectation of Mi is finite since the entries
of b are assumed to be bounded. A

Example 4.36: Let d = d' = 1, a(x) = a(x), and b(x) = b(x) in Eq. 4.71. If
the conditions of the first existence and uniqueness theorem in Section 4. 7 .1.1 are
satisfied, then

X(t) = X(O) +lot a(X(s)) ds +lot b(X(s)) dB(s)

has a unique solution and

g(X(t))- g(X(O)) = t g'(X(s)) dX(s) + ~2 lot g"(X(s)) b(X(s)) 2 ds


lo
for any function g with continuous second order derivative. <>
Proof: We have seen that X is a semimartingale so that the Ito formula can be applied
to g(X). Because X is continuous, we can use the Ito formula in Eq. 4.42. This formula
depends on the quadratic variation process [X, X] of X(t) = Z + A(t) + M(t), where
Z = X(O), A(t) = JJ
a(X(s)) ds, and M(t) = JJ
b(X(s)) dB(s). We have
[X,X]=[Z,Z]+[Z,A]+[Z,M]+[A,Z]+[A,A]
+[A, M] + [M, Z] + [M, A]+ [M, M],

by the linearity of quadratic covariation (Eq. 4.26). All quadratic covariations with argu-
ments Z and A or Mare zero because Z is a constant process and A(O) = M(O) = 0. The
quadratic variation of A and the quadratic covariation of A and M are also zero. Hence, we
have (Eq. 4.34)

[X, X]= z2 + [M, M] = z2 + f b(X(s)) 2 d[B, B](s) = z2 + f b(X(s)) 2 ds

so that d[X, X](t) = b(X(t)) 2 dt since z2 is a constant process. •


Example 4.37: Let X be the diffusion process in Example 4.36 and g : ffi.
~ ffi. be
an increasing function with continuous second order derivative. The memoryless
transformation Y(t) = g(X(t)) of X is a diffusion process defined by dY(t) =
ay(Y(t)) dt + by(Y(t)) dB(t), where
ay(y) = g'(x)a(x) + 2:1 g"(x) b(x) 2 , by(y) = g'(x) b(x), and x = g- 1(y).
Figure 4.10 shows a sample path of X and the corresponding sample of
Y(t) = g(X(t)) = X(t) 3 , respectively, for a(x) = -x and b(x) = 1. The dif-
ference between the samples X and Y is significant. Memoryless transformations
can be used to generate diffusion processes of specified marginal distribution (Sec-
tion 3.6.6in this book, [79], Section 3.1). For example, Y(t) = F -l (<l>(.J2 X(t)))
is a diffusion process with marginal distribution F, where <I> denotes the distribu-
tion of the standard Gaussian variable. <>
4. 7. Stochastic differential equations 265

6.------.-------,-------.-------,------,

5
3
Y(·,CO)=X(·,ro)

0.2 0.4 0.6 0.8


time

Figure 4.10. Sample paths of X defined by dX(t) = -X(t) dt + dB(t) and its
memoryless transformation Y(t) = X(t) 3

Proof: The Ito formula in Eq. 4.45 applied to Y = g(X) gives


Y(t)- Y(O) ={ g 1 (X(s)) [a(X(s)) ds + b(X(s)) dB(s)]
+ ~ (' g 11 (X(s))b(X(s)) 2 ds
2 Jo
and has the differential form

dY(t) = [g' (X(t)) a(X(t)) + ~ g11 (X(t)) b(X(t)) 2] dt + g1(X(t)) b(X(t)) dB(t).
I
Because g is an increasing function, x = g- 1(y) exists so that Y is a diffusion process
defined by the stochastic differential equation

dY(t) = [g1 (g- 1(Y(t))) a(g- 1(Y(t))) + ~ g'' (g- 1(Y(t))) b(g- 1(Y(t))) 2] dt

+ g1(g- 1(Y(t))) b(g- 1(Y(t))) dB(t)


with the stated drift and diffusion coefficients.
For example, the mapping x ~ y = g(x) = x3 has the inverse x = g- 1(y) =
JyJ 113 sign(y), where sign(y) = -1, 0, and 1 for y < 0, y = 0, andy> 0. The stochastic
differential equation of Y for a(x) = -x and b(x) = 1 is
= 3 (-Y(t) + JY(t)J 113 sign(Y(t)))dt + 3JY(t)J 213 dB(t).
dY(t)

Sample paths of Y can be obtained from samples of X and Y = X 3 or can be generated


directly from the above stochastic differential equation of Y. •
266 Chapter 4. Ito's Formula and Stochastic Differential Equations

Example 4.38: Let X be the diffusion process in Example 4.36 and h : lR ---+ lR
be a function with continuous second order derivative. Then,

lo t h(X(s)) odX(s) = lot h(X(s))dX(s) +-1 lot h (X(s))b(X(s)) 2 ds,


1

0 0 2 0

where f~ h(X(s)) o dX(s) and f~ h(X(s)) dX(s) denote Stratonovich and Ito in-
tegrals, respectively. <>
Proof: We have seen that X is a semimartingale with continuous sample paths and that
d[X, X]c(t) = d[X, X](t) = b(X(t)) 2 dt. The above relationship between the Ito and
Stratonovich integrals follows from Eq. 4.61 with g 1 replaced by h. •

Example 4.39: Let X be the solution of the Ito integral equation

X(t) = X(O) +lot a(X(s), s) ds +lot b(X(s), s) dB(s). (4.82)

This process also satisfies the Stratonovich integral equation

X(t) =X(O)+ lot a(X(s),s)ds+ lot b(X(s),s)odB(s), (4.83)

where a(x, t) = a(x, t)- (1/2) b(x, t) [ab(x, t)jax]. It is assumed that b(x, t)
has continuous second and first order partial derivatives with respect to x and t,
respectively. <>
Proof: Let U(s) = b(X(s), s). The Ito and Stratonovich integrals of U with respect to B
are related by (Eq. 4.58)

{' U(s) o dB(s) = {' U(s) dB(s) +! [U, Bf(t).


lo lo 2
The ltOformula applied to U(t) = b(X(t), t) gives

= [a(X(t), t )au- +-
au 1 2 a u] + b(X(t), t )au- dB(t)
2
dU(t)
ox ot +-2 b(X(t), t) -ax 2 dt
ax
so that
d[U, B](t) au dB(t), dB(t) J= b(X(t), t) ih
= [b(X(t), t) ih au dt.
Since [U, B] = [U, Bf, the relationship between the Ito and Stratonovich integrals be-
comes

{t U(s) o dB(s)
lo
= {' U(s) dB(s) +!
lo 2
t
lo
b(X(s), s) a~(s) ds,
uX
(4.84)

which yields the expression of ii in Eq. 4.83. •


4. 7. Stochastic differential equations 267

It is tempting in numerical calculations to (1) approximate Bin Eq. 4.82 by


a sequence of processes B (n) converging in some sense to B and (2) approximate
X by the limit of the sequence of processes X (n) defined by Eq. 4.82 with B (n) in
place of B. It turns out that this approximation can be incorrect. Let p n = (0 =
tan> < t~n) < · · · < tf:2 = t) be a partition of [0, t], t > 0, such that ll(pn) ---+ 0
as n---+ oo. Define B(n) to be a linear interpolation function between the values of
Bat the points of Pn· The process B(n) has continuous samples that approach the
samples of B as n ---+ oo but it differs from B in an essential way. The Brownian
motion is a martingale while B(n) is not. Moreover, B(n) is of finite variation on
compacts so that the integral J~ b(x<n>(s), s) dB(n)(s) can be calculated as a path
by path Riemann-Stieltjes integral.

Wong-Zakai theorem. If B(n) is a sequence of processes with continuous sam-


ples of finite variation on compacts and piece-wise continuous derivatives that
converges to B a.s. and uniformly, then the solution X (n) of Eq. 4.82 with B (n)
in place of B converges a.s. and uniformly to the solution of the Ito equation
([102], Section 5.4.2)

X(t) = X(O) +lot a*(X(s),s)ds +lot b(X(s),s)dB(s),


_*
where a (x, t) = a(x, t) +-21 b(x, t) CJb(x,
ax .
t)
(4.85)

Note: The Stratonovich version ofEq. 4.85 is (Eqs. 4.82 and 4.83)

X(t)=X(O)+ lo t[a*(X(s),s)--b(x,t)--'-
0
1
2
ab(x t) J ds+ lot b(X(s),s)odB(s)
ax 0

= X(O) +lot a(X(s), s) ds +lot b(X(s), s) o dB(s),


showing that the approximation of B in an Ito equation by a colored noise results in the
interpretation of this equation as a Stratonovich stochastic integral equation. •

4.7.1.3 Moments and other properties of diffusion processes


Examples are presented to illustrate how the Ito formula can be used to
derive differential equations for moments, densities, and characteristic functions
of X. These examples are extended significantly in Chapter 7. We also show how
the Stratonovich version of an It6 stochastic differential equation can be used to
find properties of X.

Example 4.40: Let X be a real-valued diffusion process representing the unique


solution of a stochastic differential equation with drift and diffusion coefficients
a(x) and b(x) 2 • If a and bare polynomials, we can develop ordinary differential
equations for the moments t-t(q; t) = E[X(t)q], q = 1, 2, ... , of X provided
268 Chapter4. Ito's Formula and Stochastic Differential Equations

these moments exist and are finite. Generally, the resulting moment equations
form an infinite hierarchy so that they cannot be solved exactly. 0
Proof: If q :::: 0 is an integer and g(x) = xq, the Ito formula can be applied to the process
g(X) since X is a semimartingale and g E C2 (IR). The expectation of this formula gives

dE[X(t)q] = q E[X(t)q- 1 dX(t)] + q (q2- l) E[X(t)q- 2 b(X(t)) 2 ] dt

= q E[X(t)q- 1 a(X(t))] dt + q (q 2- l) E[X(t)q- 2 b(X(t)) 2 ] dt.

This equation becomes a differential equation for the moments of X (t) if the drift and
diffusion coefficients are polynomials. If a and b are polynomials of degree at most one,
the moment equations are closed so that they can be solved. •

Example 4.41: Let X be an Omstein-Uhlenbeck process defined by dX (t) =


-a X (t) dt + {3 dB(t), where B is a Brownian motion and a > 0 and {3 are
some constants. The moments JL(q; t) = E[X(t)q] of X(t) satisfy the ordinary
differential equation
. {32q(q- 1)
JL(q; t) = -a q JL(q; t) + 2 JL(q- 2; t), q = 1, 2, ... , (4.86)

with the convention JL(u; t) = 0 for any integer u ::;: -1, where f-t(q; t) =
dJL(q; t)fdt. 0
Proof: The drift and diffusion coefficients of X satisfy the uniform Lipschitz and the
growth conditions so that there exists a unique, adapted, and continuous solution if X (0) E
Fo. The differential form of the Ito formula is (Example 4.36)

dg(X(t)) = g1 (X(t)) (-a X(t) dt + f3 dB(t)) + ~2 g'' (X(t)) dt,

[-a
so that
~E[g(X(t))] = E
dt
g 1 (X(t)) X(t) + /322 g 11 (X(t))].
Eq. 4.86 results from the above equation with g(x) = ~. •
Example 4.42: Let X be the Omstein-Uhlenbeck process in Example 4.41. The
characteristic function ({J(u; t) = E[exp( A u X (t)), u E IR, of X (t) satisfies
the partial differential equation
o({J o({J {3 2 u 2
- = - a u - - --qJ. (4.87)
ot au
2
The stationary solution ({Js(u) = limHoo qJ(u; t) = exp ( -{3 2 u 2/(4a)) of this
equation satisfies an ordinary differential equation obtained from Eq. 4.87 by set-
ting oqJjot = 0. Hence, the stationary value of an Omstein-Uhlenbeck process at
any time t is a Gaussian variable with mean zero and variance f3 2 I (2 a), a known
result ([175], Example 5.6, p. 179). 0
4. 7. Stochastic differential equations 269

Proof: The partial differential equation of the characteristic function given by Eq. 4.87 can
be obtained by applying Ito's formula to g(X) = exp(H u X), u E R. Because g is a
complex-valued function, the ItO formula previously developed cannot be applied directly.
However, this formula is linear in g so that we can apply it to the real and imaginary parts
of g separately and add the corresponding contributions.
This extended version of the Ito formula gives

~rp(w
at •
t) = E [-a X(t) J=T u e A u X(t) + fi 2 (J=l u)2 eA u X(t)]
2

when applied to g(X) = exp(J=T u X). The result of Eq. 4.87 follows since akrpjauk =
E [ ( J=T X (t) )k exp(J=T u X (t)) J. where k ~ 1 is an integer. •
Example 4.43: Let X be the solution of the stochastic differential equation
dX (t) = a(X (t), t) dt + b(X (t), t) dB(t), t 2: to, (4.88)
starting at X(to) = xo. The density f(x; t I xo; to) of the conditional variable
X(t) I (X(to) = xo), referred to as the transition density of X, satisfies the
Fokker-Planck equation
at a 1 a2
at = -ax (a f) + 2 ax2 (b 2 f) (4.89)

with the initial condition f(x; to I xo; to) = 8(x -xo) and the boundary conditions
limlxl ... oo a(x, t) f(x; t I xo; to)= 0, limlxl ... oo b(x, t) 2 f(x; t I xo; to)= 0, and
lim1x1 ... oo ja[b(x, t) 2 f(x; t I xo; to)]jax = 0. Because X is a Markov process,
the density of X(to) and the transition density f(x; t I xo; to) define all the finite
dimensional densities of this process.
The solution of the Fokker-Planck equation is completely defined by the
coefficients of Eq. 4. 88 and does not depend on the particular version of B. Hence,
the Fokker-Planck equation can only deliver the weak solution for Eq. 4.88. We
also note that, if a process Y is such that the density of Y (t) I Y(to) = yo satisfies
Eq. 4.89, it does not mean that Y is a diffusion process. ¢
Proof: The expectation of the Ito formula (Eq. 4.42) gives

d [ ag bz azg]
dt E[g] =E a ax +2 ax2

for any real-valued function x ~ g(x) with continuous second order derivative or

d
dt
r
lJR g f(x; t I xo; to) dx = JJR r [a axag + 2b2 a2g]
axZ f(x; t I x;, to) dx.

Integrating the last equation by parts and using the stated boundary condition, we find

r gat
lJR r [-axa (a f)+ 21 ax2
at dx = lJR a2 (b2 f) ] gdx.

The Fokker-Planck equation results since g is an arbitrary function. •


270 Chapter 4. Ito's Formula and Stochastic Differential Equations

Example 4.44: Let g(x, t) be a function with continuous second and first order
partial derivatives in x and t, respectively. The Stratonovich calculus obeys the
classical chain rule of differentiation, that is,

d 0 g(X(t), t) = Bg(~~), t) d 0 X(t) + Bg(~~t), t) dt, (4.90)

where X is defined by Eqs. 4.82-4.83 and do is used to indicate Stratonovich


calculus. <>
Proof: Let X be a diffusion process defined by Eq. 4.82. The Ito formula applied to
Y(t) = g(X(t), t) gives

Y(t)- Y(O) = r (axag dX(s) + agas ds) + 2~ lo(' axa2g2 b2 d[B, B](s)
lo
= f'
lo
(a ag
ax
+ ag
as
+ ~ b2 a2g)
2 ax2
ds + f' bag dB(s),
Jo ax
where g(x, t) has continuous second and first order partial derivatives in x and t, respec-
tively.
The relationship between the Ito and Stratonovich integrals given by Eq. 4.84 with
U replaced by b (agjax) yields

lo tb-odB=
0
a8
ax
lot b-dB+-
a8
ax
1 lot (ab
2
8 b
- -a+
ax ax
0
a2-
8 ) bds
ax2 0

so that

Y(t)- Y(O) = lo t(aaxa8- +a8as- - -1 baxab-axa8- ) ds + lot baxa8-


0 2 0
o dB

or
Y(t)- Y(O) = lo t(ag- + aag)
as 0 ax
- ds + lot ba-
0 ax
8 dB, o

where a =a- (1/2) b abjax (Example 4.39). Since we have d 0 X(t) =a dt + b o dB(t),
the differential form of the above equation is d 0 Y(t) = (agjax) d 0 X(t) + (agjat) dt as
stated. •

Example 4.45: The Stratonovich calculus can be used to find the solution of some
Ito differential equations. For example, the solution of the Ito equation dX (t) =
a(X(t)) dt + b(X(t)) dB(t), t ?::. 0, with a(x) = h(x) h' (x)j2, and b(x) = h(x),
is X(t) = g- 1 (g(X(O)) + B(t)), where his a differentiable function and g is the
primitive of h -l. <>
Proof: The Stratonovich integral representation of the diffusion process X is X(t) =
X(O) + f~ h(X(s)) o dB(s) since a = a- (1/2) b abjax = 0 (Eq. 4.83). The differ-
ential form of this equation is d 0 X(t) = h(X(t)) o dB(t). The rules of classical calculus

loot
apply and give
-do X (s)
- = dB(s) = B(t) lot
h(X(s)) 0
so that g(X(t))- g(X(O)) = B(t) ([131], Section 3.2.2). •
4. 7. Stochastic differential equations 271

Example 4.46: The solution of the Ito stochastic differential equation dX(t) =
a(t) X(t) dt + b(t) X(t) dB(t) is

X(t) = X(O) exp [lot (a(s)- 0.5 b(s) 2 ) ds +lot b(s) dB(s) J.
If a(t) = a and b(t) = bare constant, then

X(t) = X(O) exp [ca- 0.5 b 2 ) t + b B(t) J

is the geometric Brownian motion (Example 4.34). <>


Proof: The Ito formula applied to Y(t) = ln(X (t)) gives

Y(t)- Y(O) = f (a(s)- 0.5 b(s) 2 ) ds + f b(s) dB(s),

which yields the stated result. The process X is strictly positive a.s. on any bounded time
interval by the functional form of its stochastic differential equation and properties of the
Brownian motion so that the mapping X I-+ ln(x) is in C2 (JR) . •

4.7.2 Semimartingale input


This section considers stochastic differential equations driven by semimartin-
gale noise, that is, X is the solution of Eqs. 4.69 and 4.70, where Y is a semi-
martingaleS. The first and second integrals in Eq. 4.70 are Riemann-Stieltjes and
stochastic integrals, respectively.
We present without proof conditions for the existence and uniqueness of
the solution of Eq. 4.70 with Y being a semimartingale process S and illustrate
these results by examples.

If Sis a semimartingale with S(O) = 0, X(O) is finite and To-measurable, and


the function G : [0, oo) x Q x JR. -+ JR. is such that (1) the process (t, w) -+
G(t, w, x) is in£ for a fixedx and (2) IG(t, w, x)- G(t, w, x') I _:: K (w)lx -x'l
for each (t, w), where K is a finite-valued random variable, then the stochastic
integral equation

X(t) =X(O)+ lot G(s,·,X(s-))dS(s) (4.91)

has a unique solution that is a semimartingale ([147], Theorem 6, p. 194).

Note: The requirement that G(·, ·, x) be an adapted, caglad process for each x E JR. guar-
antees that the stochastic integral in Eq. 4.91 is defined. The requirement that G(t, w, ·)
satisfy a Lipschitz condition for each (t, w) resembles the conditions in Section 4. 7 .1.1 for
the existence and uniqueness of the solution of differential equations driven by Gaussian
white noise.
The theorem in Eq. 4.91 can be stated for a system of stochastic integral equations
involving a finite number of input semimartinga1es ([147], p. 194). A
272 Chapter 4. Ito's Formula and Stochastic Differential Equations

Example 4.47: Let dX (t) = b(X(t- ), t) dS(t) be a stochastic differential equa-


tion with the integral form

X(t) = X(O) +lot b(X(s-),s)dS(s),

where Sis a semimartingale. If X (0) E Fo, t 1--+ b(x, t) is continuous for each x,
and x 1--+ b(x, t) satisfies a uniform Lipschitz condition for each t 2: 0, then the
above equation has a unique solution that is a semimartingale.
If S is a compound Poisson process C, we have
N(t) t
X(t)=X(O)+ Lb(X(n-),n)Yk=X(O)+ f b(X(s-),s)dC(s),
k=l lo
where (T1, T2, .. .) and (Y1, Y2, .. .) denote the jump times and the jumps of C,
respectively. <>
Proof: The existence and uniqueness of the solution X follows from the above theorem.
The second condition on Gin Eq. 4.91 becomes the uniform Lipschitz condition lb(t, x)-
b(t, x') I :=:: c lx- x'l for each t 2: 0 and a constant c > 0 since b is a deterministic function.
If S is a compound Poison process C, the sample paths of X are constant between
consecutive jumps of C and have jumps

f).X(Tk) = X(Tk)- X(Tk-) = b(X(Tk-), Tk) f).C(Tk) = b(X(Tk-), Tk) Yk

at the jump times of C so that X(t) = X(Tk-) + b(X(n-), Tk) Yb t E [Tko Tk+l). This
recurrence formula gives the stated expression of X. &

Example 4.48: Let S be a semimartingale and X be the solution of dX (t)


a(X(t-), t) dt + b(X (t- ), t) dS(t). If X (0) E Fo and a, bare such that (1) the
functions t 1--+ a(x, t), b(x, t) are continuous for each x and (2) the functions
x 1--+ a(x, t), b(x, t) satisfy uniform Lipschitz conditions for each t 2: 0, then X
is the unique solution of the stochastic integral equation

X (t) =X (0) +lot a(X(s- ), s) ds +lot b(X (s- ), s) dS(s)

and is a semimartingale. <>


Note: We have mentioned that the statement of theorem guaranteeing the existence and
uniqueness of the solution of Eq. 4.91 can be extended to a vector input and/or output.
Consider the alternative form,

X(t)=X(O)+ f[a(X(s-),s) b(X(s-),s)] [ d~~s) J.


of the integral equation defining X, that extends Eq. 4.91 to the case in which the input
is a vector of semimartingales. Since a and b satisfy the conditions for the existence and
uniqueness of the solution of Eq. 4.91, X is the a unique solution of the above stochastic
differential equation and is a semimartingale since (s, S (s)) is a semimartingale. A
4. 7. Stochastic differential equations 273

We need to introduce some notation and definitions to state the following


conditions for the existence and uniqueness of the solution of stochastic differ-
ential equations driven by semimartingale noise. Let 'Dd denote the class of !Rd-
valued stochastic processes whose coordinates are adapted processes with cadlag
sample paths, that is, are processes in 'D. An operator G : 'Dd --+ 'D is said
to be functional Lipschitz if for any processes X, Y E 'Dd we have (1) for any
stopping timeT, xT- = yT- implies G(X)T- = G(Yl- and (2) there is an
increasing finite process K(t), t =::: 0, such that IG(X)(t)- G(Y)(t)l ::::: K(t) II
X(t)- Y(t) II a.s. for each t =::: 0 ([147], p. 195).

If S = (St, ... , Sa') is a vector of semimartingales, S(O) = 0, li E 'D, i =


1, ... , d, and the operators G{ : 'Dd --+ 'D, i = 1, ... , d, j = 1, ... , d', are
functional Lipschitz, then the system of stochastic integral equations

Xi(t) = li(t) + L
d'

j=l 0
1 t
Gf (X)(s-) dSj(s), i = 1, ... , d, (4.92)

has a unique solution in 'Dd. If the processes li are semimartingales, then Xi,
i = 1, ... , d, are also semimartingales ([147], Theorem 7, p. 197).

Note: The solution of Eq. 4.92 is an Ri -valued process X = (X 1 , ... , X a). This equation
extends Eq. 4.91 since it considers vector input/output processes and X depends on some
processes Ji. That the integrands in Eq. 4.92 do not depend explicitly on time is not a
restriction since the state X can be augmented to include time as a coordinate . .&

If conditions for the existence and uniqueness for the solution of Eq. 4.92 hold
and X}O) E 'D, i = 1, ... , d, then X(k), defined by recurrence formula

x?+l) (t) = ]j (t) + L


d'

j=l 0
1 t
G{ (X(k))(s-) dSj (s), i = 1, ... 'd, (4.93)

converges in ucp to the solution X ofEq. 4.92 ([147], Theorem 8, p. 199).

Example 4.49: Let X be the solution of the stochastic differential equation in Ex-
ample 4.48, where S = C is a compound Poisson process and a, b are not explicit
functions of time. This equation has a unique solution under the conditions stated
in Example 4.48.
We construct here the solution X based on elementary arguments. If there
is a function h with inverse h- 1 such that h'(x) = 1ja(x), then

X(t) = h- 1 (h(X(Tk)) + t - Tk), t E [Tk. Tk+t),

where X(Tk) = X(Tk-) + b(X(Tk-)) Yk. Tk are the jump times of C, and Yk
denote the jumps of this process. <>
274 Chapter4. Ito's Formula and Stochastic Differential Equations

Proof: We have already established the relationship between values of X immediately


before and after a jump of C. If t E [Tk, Tk+ 1), then X is the solution of the determinis-
tic differential equation X(t) = a(X(t)) with the random initial condition X(Tk) so that
JJ/R) dxja(x) = Jfk ds and h(X(t))- h(X(Tk)) = t- Tk fortE [Tb Tk-1).
Note also that if a ( ·) is such that X is a continuous differentiable function during the
time intervals [Tb Tk+ 1), then X is a pure jump semimartingale. Under this assumption
[X, X] is zero in each time interval [Tb Tk+1). The only contribution to the quadratic
variation process [X, X] is provided by the jumps 6.X (Tk) of X so that [X, X]c = 0 and
[X, X](t) = L,f~? (6.X(Tk)) 2. •

Example 4.50: Let X be the solution of dX (t) = X (t-) dC(t), t :=:: 0, where C
is a compound Poisson process. The Ito and Stratonovich interpretations of this
equation,

X(t)=X(O)+ lot X(s-)dC(s) and X(t)=X(O)+ lot X(s-)odC(s),


coincide and X(t) = X(O) nf~?(l + Yk), where (T1, h ... ) and (Y1, Y2, .. .)
denote the jump times and the jumps of C, respectively. <>
Proof: It can be shown that ([147], Theorem 23, p. 224)

X(t) = X(O) eC(t) n (1 + 6.C(s)) e-Ll.C(s) = X(O) n


N(t)
(1 + Yk).
O<s~t k=1
This solution can also be obtained by elementary arguments since d X (t) = 0 for t E
[Tb Tk-1) andeachk:::: 1. For example, we have X(t) = X(T1- )+6.X(T1) = X(T1-H

I
X (T1-) 6.C(T1) fort E [T1, T2) so that
X(O) fort E [0, T1),
X(t) = X(O) (1 + Y1) fort E [T1, T2),
X(O) (1 + Y1) (1 + Y2) fort E [T2, T3).

The relationship,

= f' X(s-) dC(s) + ! [X, C]c (t),


lof' X
(s-) o dC(s)
lo 2
between the Ito and Stratonovich integrals (Eq. 4.58) shows that these integrals coincide
since X and Care pure jump semimartingales so that [X, CjC = 0. •

Example 4.51: Let X be the process in Example 4.49 and g E C 2 (JP&) denote a
real-valued function. The change of the process g(X) in a time interval can be
obtained from the Ito formula in Eq. 4.40. A more familiar version of this result
is ([172], Theorem 4.2.2, p. 199)

g(X(t)) -g(X(O)) =lot a(X(s-))g'(X(s-))ds

+lot l[g(X(s-)+yb(X(s-)))-g(X(s-))]M(ds,dy), (4.94)


4. 7. Stochastic differential equations 275

where M is a Poisson random measure on [0, oo) x lE. defining a compound Pois-
son process C(t) = '£~~? Yk, E[M(ds, dy)] = ("Ads) dF(y), "A > 0 is the in-
tensity of the Poisson process N, and F denotes the distribution of the iid random
variables Yk (Section 3.12). <>
Proof: The Ito formula in Eq. 4.40 gives

g(X(t))-g(X(O))= t g'(X(s-))dX(s)+~ t g"(X(s-))d[X,X]c(s)


lo+ 2]o+
+ L [g(X(s)- g(X(s-))- g'(X(s-)) LlX(s)].
O<s:::t

Because X is a quadratic pure jump semimartingale the above integral with integrator
[X, X]c is zero so that

g(X(t))- g(X(O)) = t
lo+
g'(X(s-)) [a(X(s-))ds + b(X(s-))dC(s)]
+ L [g(X(s)- g(X(s-))- g'(X(s-)) LlX(s)].
0<s:9

Because LlX(s) = b(X(s-))LlC(s), we have


f0+
g' (X(s-)) b(X(s-)) dC(s) = L
O<s:c:t
g' (X(s- )) b(X(s-)) LlC(s)

= L g'(X(s-)) LlX(s)
O<s:::t

so that

g(X(t))- g(X(O)) = f0+


g'(X(s-)) a(X(s-))ds + L
O<s~t
[g(X(s)- g(X(s-))].

The result in Eq. 4.94 follows from the above equation and the integral form,

t f [g(X(s-) + y b(X (s- )) ·-


lo lJR
g(X(s- ))] M(ds, dy),

ofLO<s:c:t[g(X(s))- g(X(s-))]. •

4.7.3 Numerical Solutions


Explicit solutions of stochastic differential equations are only available in a
few cases ([115], Section 4.4). Numerical methods are needed for solving most
stochastic differential equations encountered in applications. Our discussion on
numerical methods is limited to real-valued diffusion processes defined by the Ito
differential equation

dX(s) = a(X(s)) ds + b(X(s)) dB(s), s E [0, t], (4.95)


276 Chapter 4. Ito's Formula and Stochastic Differential Equations

where a, b are such that this equation has a unique strong solution (Section 4. 7 .1.1 ).
An extensive discussion on numerical solutions for stochastic differential
equations driven by Brownian motion can be found in [115]. Essentials on nu-
merical solutions ofEq. 4.95 are in [131] (Section 3.4).

4.7.3.1 Definitions

Let Pn be a sequence of partitions of [0, t] with points 0 = tcin) < t{n) <
.. · < t/::! = t such that ~(Pn) ~ 0 as n ~ oo. We take mn =nand t~n) -t~~I =
tIn so that the partition mesh is ~ (Pn) = tIn. In this section we use the notation
~ (pn) = 8n = tIn for convenience.
A numerical solution x<nl ofEq. 4.95 is a stochastic process that approxi-
mates the solution X of this equation. The values of X (n) are calculated only at the
times rtl = k tIn, k = 0, 1, ... , n, that is, at the points of Pn· Any interpolation
method can be used to define X (n) at all times in [0, t ]. For example, the following
definition of x<n) is based on linear interpolation.

Numerical solution based on linear interpolation. If s E [t~~~, t~nl],

x<n) (t(n)) - x<n) (t(n) )


x<nl(s) = x<n)(t(n))
k-!
+ k
(n) (n)
k-! (s- t(n)
k-!
). (4.96)
tk - tk-!

Note: The approximating solution x<n) (s) at a times E [t~:_\, t~n)] depends linearly on
the values of x<n) at the two ends of this interval. The samples of x<n) are continuous
samples but are not differentiable at the times t~n) defining the partition Pn· A

To assess the accuracy of an approximating solution, we need to (1) define


a measure for the difference between X (n) and X, called the error of X (n), (2) de-
termine the dependence ofthe error of x<nl on the partition mesh ~(Pn) = 8n,
and (3) develop upper bounds on this error.

Error functions. For a partition Pn of [0, t], we define the error functions

es(Pn, t) = E [lx(t)- x<nl(t)IJ and

ew(Pn. t) =IE [X(t)]- E [ x<n)(t)JI· (4.97)

Note: The expectation E [sup09 :9 1X(s)- x<nl(s)l] provides a useful measure for the
accuracy of the approximation x<n) over the entire time interval [0, t], but it is difficult to
evaluate. The error functions in Eq. 4.97 are calculated at the end of the time interval [0, t]
for simplicity.
4. 7. Stochastic differential equations 277

We also note that es(Pn. t) measures the difference between the sample values of
the random variables X (t) and xCn) (t) while ew (Pn, t) gives the difference between the
expected values of these random variables. A

We give now some definitions characterizing the dependence of the approx-


imating solution xCn) on the mesh 8n of Pn and time. The first definitions are
based on the error functions in Eq. 4.97. The last definition involves properties of
the drift and diffusion coefficients of Eq. 4.95.
If es(Pn. t) -+ 0 as n -+ oo, we say that xCnl converges strongly at time
t to X. The approximation xCn) converges strongly with order y > 0 at timet
to X if there exist some constants c > 0 and 8 > 0 such that es(Pn. t) ::::; c (8n)Y
for each 8n ::::; 8 and c does not depend on 8 n. Weaker versions of these definitions
result by replacing es(Pn. t) with ew(Pn. t). We say that xCn) converges weakly
at timet to X if ew(Pn· t) -+ 0 as n -+ oo. Similarly, xCnl converges weakly
with order y > 0 at time t to X if there exist some constants c > 0 and 8 > 0
such that ew(Pn. t) ::::; c (8n)Y for each 8n ::::; 8 and c does not depend on 8n.
Let (:Ft)t>O be the natural filtration of X defined by Eq. 4.95. A discrete
time approximation xCn) is strongly consistent if there exists a positive function
~(u) such that limu-).0 nu) = 0,

and

E [ ( xCnl (t~~ 1 ) - xCn) (t~n)) - E [ xCnl (t~~ 1 )- xCnl (t~n)) I :Ftinl]

-b(X(n)(tt))) flBt)) 2 /8n J :S ~(8n).

Note: This definition requires that the conditional expectation of the increment of x(n)
in the time interval [t~n), t~~1 ] divided by On converges in m.s. to the drift of X and that
the variance of the difference of the random parts of the approximate and exact solutions
converges in m.s. to zero as on -+ 0. An extensive discussion on this topic can be found in
[115] (Chapter 9). A

4.7.3.2 Euler and Milstein numerical solutions

Euler numerical solution.


xCnl(t(n))- xCnl(t(n)) = a(X(t(n) )) Llt(n) + b(X(t(n) )) LlB(n) (4.98)
k k-1 k-1 k k-1 k ,

h AB(n) _ B( (n)) (n) ) (n) _ (n) (n) _ _


wereu k- tk -B(tk_ 1 ,Lltk -tk -tk_ 1 -8n,andk-l, ... ,n.
278 Chapter 4. Ito's Formula and Stochastic Differential Equations

Proof: The increment of the solution X in Eq. 4.95 during a time interval [t~~l, tt)] is
given by the recurrence formula

X(t~n))- X(t~~ 1 ) = 1 t(n)

k
ti':}l
a(X(s)) ds + 1 t(n)

k
ti':}l
b(X(s)) dB(s). (4.99)

The Euler numerical solution x<n) results from this equation by approximating the func-
tions a(X(s)) and b(X (s)), s E [t~~l, t~n)], by their values a(X (t~~ 1 )) and b(X (t~~l )) at
the left end of this time interval. Any interpolation method can be used to define x<n) at
all times in [0, t], for example, Eq. 4.96. •
The Euler numerical solution is the simplest scheme for solving stochastic
differential equations. This method is frequently used in the subsequent chap-
ters of the book to generate samples of diffusion and other processes defined by
stochastic differential equations.

The Euler numerical solution converges strongly with order y = 0.5 and weakly
with order y = 1 under relatively mild conditions ([115], Theorem 9.6.2, p. 324,
[131], p. 162).
Note: An extensive discussion on the convergence of the Euler approximation can be found
in [115] (Chapters 8 and 9). For example, if X(O) E Lz is .Fa-measurable and a, b in
Eq. 4.95 are measurable functions in Lz satisfying some Lipschitz and growth conditions
(Eqs. 4.73 and 4.75), then a strongly consistent approximation x<n) converges strongly to
X. Moreover, we have E [1x(n)(t)- X(t)l] :::: p (8n + ~(8n)) 1 1 2 , where pis a positive
constant and ~ is the function in the definition of a strongly consistent approximation, that
is,~ > 0 and limu-J,O nu) = 0 ([115], Theorem 9.6.2, p. 324). The function~ is zero for

the Euler approximation so that the above upper bound becomes p ~/Z. Hence, the Euler
approximation converges strongly with order y = 0.5. A

Example 4.52: Consider the differential equation x(s) = a(x(s), s), s E [0, t],
starting at x (0) = xo. If (1) the drift a is uniform Lipschitz in x, that is, there exists
f3 > 0 such that la(x, s)- a(y, s)l ::::: f31x- yl for x, y E IE. and s E [0, t], (2) the
functions aajat and aa;ax are continuous, and (3) all solutions x are bounded in
[0, t], then the error ekn) = x(t~n)) - xt) satisfies the inequality

le(n)l
k
<CiOn (ekf38n-
- 2{3
1) '
where ct =max 1aa;atl +max Ia aa;axl and xt) = x<n)(t~n)) denotes the Euler
approximation at time t~n).
Figure 4.11 shows time histories of lekn) I and of the above upper bound on
the absolute value of the difference between the exact and the Euler numerical
solutions for a(x, t) = -5x, x(O) = 1, ct = 25, f3 = 5, and two partitions Pn
of [0, 1] with n = 10 and 100. The bound on le kn) I is too wide to provide any
meaningful information on the accuracy of the Euler numerical solutions. Note
that the plots for n = 10 and n = 100 have different scales. <>
4. 7. Stochastic differential equations 279

le(n)l
k

0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8


time time

Figure 4.11. Upper bound and error of the Euler numerical solution

Proof: The difference between the Taylor expansion,

8 (n) E [t(n) /n) ]


k k ' k+l '

and the Euler approximation,

x<n) - x(n)
k+l - k
+ a(x(n) /n)) 8
k ' k n,

of the solution of x(t) = a(x, t) in the interval [t~n)' t~~~] is

e(n)
k+l
= e(n)
k
+ [a(x(t(n))
k ' k
t(n)) - a(x(n) t(n))] 8
k ' k n
+ !2 x((}(n))
k
82

The absolute value of the square bracket in the above equation is smaller than f31x(t~n)) -
x~n))l = f31e~n) I by assumption. We also have x = da(x, t)fdt = oafot +a oaf ox so
that there exists a > 0 such that lx(et))l ::::a. The above recurrence formula for ekn) and
the bounds give

< (1 +f38n)k+l_1 a8~


- f3 8n 2 '

since e&n) = 0. The latter inequality and ( 1 + c )n :::: en c, c > 0, deliver the stated bound
on the error with (c, n) replaced by ({3 8n, k + 1).
The drift of the differential equation considered in this example is a (x, t) = -5 x
so that x(t) = e- 5 t for x(O) = 1. The difference la(x, t)- a(y, t)l = 51x- Yl so that
we can take f3 = 5. Because x = da(x, t)fdt = oafot +a oaf ox, oafot = 0, and
a oaf ox = ( -5 x)( -5) = 25 x, we can take a = 25. •
280 Chapter 4. Ito's Formula and Stochastic Differential Equations

Milstein numerical solution.

x<n) (tin)) - x<n) (ti~ 1 ) = a(X (ti~ 1 )) Mt) + b(X (ti~1)) !lBt> + R.t?,
(4.100)

where R(n) -
k,1 - 2
!
b' (X (t(n) )) b(X (t(n) )) [(llB(n)) 2 - !lt(n)]
k-1 k-1 k k . (4.101)

Proof: Consider the increment of X in Eq. 4.99 during a time interval [ti~1 , tt>]. Instead
of approximating the functions a(X(s)) and b(X(s)), s E [ti~1 • tin)], by their values
a(X(ti~1 )) and b(X(ti~1 )) at the left end of this time interval (Eq. 4.98), we use the ItO
formula to represent the integrands a and bin [ti~ 1 , tin)]. The resulting expression is used
to establish the Milstein approximation.
The Ito formula applied to the functions a(X(s)) and b(X(s)) for s E [ti~1 • rt>J
gives

a(X(s))- a(X(ti~1 )) = [ s (a' a+ 21 a" b 2 )du + [s


(n) (n) a' bdB(u),

= [t(n)s (b' a+ -b" [s


tk-1 tk-1

1
b(X(s))- b(X(ti~1 )) b2 ) du + <) b' b dB(u),
2 t n
~I ~I

so that Eq. 4.99 becomes

= {k
t(n) [
a(X(tin\))+ {
s (a a+-a"b
1
1 2 )du+
[s a 1 bdB(u)
] ds
lri':!l - lri':!l 2 ri':!1
+ { t(n)
k
[
b(X(tk(n)1 )) + { s (b 1 a+ -b"
1
b2 ) du +
[s b' b dB(u)
] dB(s)
lr<•l
k-1
- Jr.<•)
k-1
2 r<•l
k-1

or
X (tin))- X(ti~ 1 ) = a(X(ti~ 1 )) t:..tin) + b(X(ti~ 1 )) t:..Bin) + Rkn),
where Rkn) = Rtf + Rt~ represents the correction relative to the Euler approximation
and

Rknl = { t0)
k
[
r b1 bdB(u) ] dB(s)
' 1ri~1 lt~~~
:::::. f?.(n) = b'(X(/n) ))b(X(/n) )) { k
t(n) [ s ]
dB(u) dB(s)
k-1 Jr<•)
{
k, 1 k-1 Jr<•l
k-1 k-1

= !2 b' (X(t(n)
k-1
)) b(X(t(n) ))
k-1
[<t:..B(n))Z- L!J.t(n)].
k k
4. 7. Stochastic differential equations 281

The arguments of the functions a and b were not written in some of the previous formulas
for simplicity. The final expression of R1nt
above is valid since (Eq. 4.3)

l t(n)
k
(n)
1k-!
1
k k
1
B(s) dB(s) = - (B(t(n)) 2 - /nl)- - (B(t(n) )2- /n) )
2 2 k-1 k-1

The Milstein approximation results by neglecting the component Rkni of Rkn) and replac-
. (n) b -(n) • '
mg Rk,l y Rk,l.

If the drift and diffusion coefficients of X satisfy some type of Lipschitz and
growth conditions, then the Milstein approximation converges strongly with or-
der y = 1 ([115], Theorem 10.3.5, p. 350).
Note: This property shows that the Milstein approximation is superior to the Euler approx-
imation. However, the difference between the Milstein and the Euler approximate solutions
for Eq. 4.95 corresponding to the same partition of the time interval [0, t] may not be sig-
nificant (Example 4.53). 4

Example 4.53: Consider the stochastic differential equation

dX(s) = c X(s) ds +a X(s) dB(s), E [0, t],

with the initial condition X (0) = xo, where c and a are some constants. The
geometric Brownian motion X(t) = xoe<c-O.saz)t+aB(t) is the solution of this
equation (Example 4.34).
Figure 4.12 shows with dotted and solid lines samples of the exact solution
and the corresponding samples of x<nl in the time interval [0, 1] obtained by the
Euler and Milstein formulas for c = 0.1, a = 2, and two partitions Pn of [0, 1]
with n = 50 and n = 150. The samples of X and X (n) in the figure correspond to
the same sample B ( ·, w) of B. The Milstein approximation is slightly superior to
the Euler approximation, an expected result since these approximations converge
strongly with order y = 1 and y = 0.5, respectively. <>
Note: The Euler and the Milstein formulas for the geometric Brownian motion are
X(n) - (1
k+l -
+ cjn) X(n)
k
+a X(n)
k
f),.B(n)
k ,
and

x1~ 1 = (1 + cjn) Xkn) +a Xkn) f),.Bkn) + (1/2) a 2 Xkn) [(f),.Bknl) 2 - 1/n],


282 Chapter 4. Ito's Formula and Stochastic Differential Equations

3,---~--~-----------,
Milstein, n=50
2.5 2.5

Euler, n=l50 Milstein, n=l50


2.5 2.5

0.2 0.4 0.6 0.8 0.2 0.4 0.6 0.8


time time

Figure 4.12. Samples of the Euler and Milstein numerical solutions for a geomet-
ric Brownian motion process

respectively, with the notations Xkn) = x<n)(tin)) and L'l.Bin) = B((k + 1)1n)- B(kln)
for the increment of the Brownian motion in [kIn, (k + 1) In].
Estimates of the error function es (Pn, t) in Eq. 4.97 must be based on samples of
the exact and approximate solutions corresponding to the same sample of the Brownian
motion, that is,
1 ns
es(Pn.t)= ns .L [ix(t,w)-x<n)ct,w)IJ·
lil=l

where X(·, w) and x<n) (·, w) are calculated from the same sample B(·, w) of Band ns 2: 1
is an integer denoting the number of samples used in estimation. The samples of X and
x<n) needed to estimate the error ew (Pn , t) do not have to correspond to the same samples
of the Brownian motion process B. &

The difference between the Euler and the Milstein numerical solutions in
Eqs. 4.98, 4.100, and 4.101 is the correction term Rkni
that depends on the diffu-
sion part of Eq. 4.95. A major source of numerical error and instability relates to
the crude approximation of the drift part of Eq. 4.95 by both methods. Methods
developed for the numerical solution of deterministic differential equations can
provide superior approximations to the solution for Eq. 4.95. Consider a deter-
4. 7. Stochastic differential equations 283

ministic differential equation

i(s) = a(x(s), s), s E [0, t], (4.102)

assumed to have a unique solution in [0, t]. For this equation the Euler and Mil-
stein approximations coincide. We present an alternative to these numerical solu-
tions, which provides a superior approximation to the solution of Eq. 4.102.

Runge-Kutta numerical solution.

x~~l = x~n) + ~ [ak.l + 2ak,2 + 2ak,3 + t:¥k,4] 8n (4.103)

where x~n) = x<nl(t~n)), ak,l = a(x~n), tt)), ak,2 = a(xt) + !Xk,l 8n/2, t~n) +
8n/2),ak,3 =a(xk(n) +ak,28n/ 2 ,tk(n) + 8n /2) ,and t:¥k,4 =a(xk(n) +ak.3 8n.tk+l.
(n) )

Note: Additional information on the Runge-Kutta and related numerical methods for solv-
ing deterministic differential equations can be found in [115] (Section 8.2). &

Example4.54: LetXbedefinedbyEq.4.95witha(x) = px+f3x 3 andb(x) =


a. The Euler and Runge-Kutta numerical solutions for this equation are

x[~ 1 = x[nl + (p x[nl + f3 ( x[nl) 3) 8n +a t.B~n) and

x[~ 1 = x[nl + ~ [Ak.l +2Ak,2 +2Ak,3 +Ak,4] 8n +a t.Bt) (4.104)

where x[n) = x<nl (t~n)), t.Bt) = B(t~~ 1 )- B(t~n)), and the coefficients A have
the definitions of the coefficients a in Eq. 4.103 with X [n) in place of x~n).
Figure 4.13 shows samples of the Euler and Runge-Kutta (R-K) solutions
for p = 1, f3 = -1, n = 250, and n = 10, 000 corresponding to the same sample
of B. The Euler numerical solution diverges if the time step is relatively large
(n = 250, 8n = 100/250). On the other hand, the solution given by the Runge-
Kutta method is bounded for n = 250. Moreover, estimates of moments and other
properties of X from samples of x<nl generated by the Runge-Kutta method with
n = 250 are accurate. The solutions of the Euler and the Runge-Kutta methods
are satisfactory and nearly coincide for 8n = 100/10, 000. <>

Note: The selection of the time step on for a numerical solution x<n) must be such that
the solution does not diverge and provides sufficient information on the samples and/or
other properties of X. For example, on = 100/250 is inadequate for the Euler method
because the corresponding numerical solution x<n) diverges, but seems to be satisfactory
if the analysis is based on the Runge-Kutta method. &
284 Chapter4. Ito's Formula and Stochastic Differential Equations

X 10245
2.-~~~--~--~---,

n =250 (Euler) n =250 (R-K)


0 1---------.,

-2

-4

-6

-8

-10 '--~--~--~--~----'
0 20 40 60 80 100 20 40 60 80 100

3,-~------~------, 3.-~~~~~-------,

n =10000 (Euler) n =1 0000 (R-K)

-1

-2 o'-----~~40--..!.........6Q-.--~80--.......J100
rrme

Figure 4.13. Samples of the Euler and Runge-Kutta (R-K) solutions

4.8 Problems
4.1: Show that the stochastic integral Jx(H) in Eq. 4.16 is an adapted cadlag
process.
4.2: Prove the equality J~ s dB(s) = t B(t)- J~ B(s) ds, where B is a Brownian
motion.
4.3: Prove the associativity and the preservation properties of the stochastic inte-
gral (Section 4.4.4) for the special case of simple predictable integrands.
4.4: Show that the jump process D..(H · X) is indistinguishable from H(D..X),
where H E £and X is a semimartingale (Section 4.4.4).
4.5: Show that C 2 is a semimartingale, where C is the compound Poisson process
inEq. 4.13.
4.6: Check whether the processes

X(t) = B(t)- 4t and X(t) = t 2 B(t)- 2 lots B(s) ds

are F,-martingales, where F 1 denotes the natural filtration of a Brownian motion


process B.
4.8. Problems 285

4.7: Show that X(t) = J~ h(s) dB(s), t E [0, r], r > 0, is a Gaussian process
with mean zero and covariance function E[X(s)X(t)] = f~M h(u) 2 du, where
h : [0, r] ~ ~ is a continuous function.

4.8: Use Ito's formula to show that f~ B dB is a martingale, where B is a Brow-


nian motion.

4.9: Suppose that X is a real-valued diffusion process. Show that Y ( t) = e x (t),


t ::: 0, is also a diffusion process.

4.10: Write the Ito and the Stratonovich differential equations satisfied by the
process Y(t) = eB(t), t::: 0, where B is a Brownian motion.

4.11: Transform the following Stratonovich and Ito differential equations into
Ito and Stratonovich differential equations, respectively, that is, the Stratonovich
equations

dX(t) =a X(t)dt + bX(t) o dB(t),


dX (t) = sin(X (t)) cos(X (t)) dt + (t 2 + cos(X (t)) o dB(t)

and the Ito equations

dX(t) =a X(t) dt + b X(t) dB(t),


dX(t) = 2 exp( -X(t)) dt + X(t) 2 dB(t),

where a and b are some constants.

4.12: Show that X (t) = X(O) e-cx 1 + f3 J~ e-cx(t-s) dB(s) is the solution of the
Omstein-Uhlenbeck process in Example 4.41.

4.13: Find the solution of

where Bk are independent Brownian motions and a, b k are constants.

4.14: Let X(t) = (X1(t) = B(t) 2, X2(t) = e8 Ul) be an ~2 -valued process,


where B denotes a Brownian motion. Find the stochastic differential equation
satisfied by X. Is X a diffusion process?

4.15: Let X be a stochastic process defined by the Stratonovich differential equa-


tion d 0 X(t) =a X(t) dt + b X(t) o dB(t), where a and bare constants. Find the
Ito version of this equation.

4.16: Find the solution of dX(t) =a X(t)dt + bX(t) dC(t), X(O) > 0, where
Cis the compound Poisson process in Eq. 4.13 and a, bare constants.
Chapter 5

Monte Carlo Simulation

5.1 Introduction

The Monte Carlo simulation method is used extensively in applied science


and engineering to solve a variety of stochastic problems. Solutions for this type
of problems by Monte Carlo simulation are presented in the following four chap-
ters of the book. We have already used the Monte Carlo method in previous chap-
ters to illustrate theoretical concepts on random variables, conditional expectation,
stochastic processes, and stochastic integrals.

The solution of a stochastic problem by the Monte Carlo method involves


three steps. First, independent samples of the random parameters and functions in
the definition of the stochastic problem need to be generated. Second, the result-
ing deterministic problems corresponding to the samples generated in the previous
step have to be solved. Third, the collection of deterministic solutions needs to be
analyzed statistically to estimate properties of the solution of the stochastic prob-
lem. The first step of the Monte Carlo solution is the focus of this chapter. We
give methods for generating samples of random variables, stochastic processes,
and random fields. Improved Monte Carlo simulation methods for estimating sys-
tem reliability are also presented.

Most algorithms for generating samples of random variables, stochastic


processes, and random fields consist of transformations with and without mem-
ory of random variables uniformly distributed in (0, 1) and Gaussian variables
with mean zero and variance one, denoted by U(O, 1) and N(O, 1), respectively.
Most computer software packages include functions producing samples of these
random variables. For example, the MATLAB functions rand and randn generate
independent samples of U(O, 1) and N(O, 1).

287
288 Chapter 5. Monte Carlo Simulation

5.2 Random variables


Let X be an ~-valued random variable defined on a probability space
(Q, :F, P). Our objective is to generate independent samples of X.

5.2.1 Gaussian variables


Let X "' N(p,, y) be a Gaussian vector in ~d, where p, andy denote the
mean and covariance matrices of X.

Case 1 (d = 1). If X "' N(J-t, y), the scaled and translated MATLAB function
J-t + .jY randn can be used to generate independent samples of X. Some of the
algorithms used to generate samples of a Gaussian variable are based on mem-
oryless transformations of some random variables [27, 79, 155]. For example,
Gaussian variables can be related simply to uniformly distributed random vari-
ables (Eq. 5.1).

Monte Carlo algorithm. The generation of independent samples of a random


variable X"' N(O, 1) involves two steps:
1. Generate samples of the independent random variables U t. U2"' U(O, 1).
2. Calculate two independent samples of X from

Zt = ./-2ln(Ut) cos(2n U2) and Z2 = J-2ln(Ut) sin(2n U2). (5.1)

Proof: If z1 and Z2 are independent copies of N(O, 1), their density is f(zt, Z2) =
exp[-(zt + z~)/2]/(2n). The density of the random variables (R, 8) defined by the
mapping z 1 = R cos(8) and Z2 = R sin(8) is fR,e(r,9) = r exp(-r 2/2)/(2n).
Hence, R has the density fR (r) = r exp( -r 2/2), 8 is uniformly distributed on (0, 2n ],
and the variables (R, 8) are independent. Samples of R can be generated from samples of
Ut ~ U(O, 1) andtherepresentationR 1::. ,J-21n(Ut) derivedfromFR(R) = Ut and the
fact that Ut and 1 - Ut have the same distribution, where FR denotes the distribution of
R. This method for generating samples of random variables is discussed further in Eq. 5.4.
The projection of Ron the coordinates (z1, Z2) gives Eq. 5.1. •

Case 2 (d > 1). The generation of independent samples of a Gaussian vector


X "'N(p,, y) can be based on the MATLAB function randn or Eq. 5.1 and the
Cholesky decomposition in Eqs. 5.2 and 5.3. The vectors X and
Z = 1L + {J G "'N(JL, y), (5.2)
have the same distribution, where the coordinates of G E JRd are independent
N (0, 1) and the non-zero entries of the lower triangular matrix {J are

22:=1
. 1 0
{3·· _
I}- [
Yij -
j-1
f3ir /3jr
2 ]1/2'
1 :::; j :::; i :::; d, with L f3ir f3jr = 0. (5.3)
Yjj - Lr=1 f3jr r=1
5.2. Random variables 289

Monte Carlo algorithm. The generation of samples of X "' N(p., y) involves


two steps:
1. Generate samples of G.
2. Calculate the corresponding samples of X from Eq. 5.2 with {3 in Eq. 5.3.

Note: A sample of the column vector G can be obtained from the MATLAB function
randn(d, 1). A proof of the validity of Eqs. 5.2 and 5.3 can be found in [155] (Sec-
tion 3.5.3). An alternative method for generating samples of X is in Problem 5.1. &

5.2.2 Non-Gaussian variables


Let X be an arbitrary JRd -valued random variable with dependent compo-
nents and distribution F.

Case 1 (d = 1). The generation of samples of an arbitrary non-Gaussian random


variable can also be based on the MATLAB function rand.
Monte Carlo algorithm. The generation of samples of X with distribution F
involves two steps:
1. Generate independent samples of U(O, 1).
2. Calculate the corresponding samples of X from

Z = F- 1 (U(O, 1)). (5.4)

Proof: The distribution of Z = F- 1(U(O, 1)) is


P(Z:::; z) = P(F- 1(U(O, 1)) :::; z) = P(U(O, 1) :::; F(z)) = F(z).
Samples of X can be calculated from Eq. 5.4 and independent samples of U(O, 1). For
example, n independent samples of an exponential random variable with mean 1/A., >.. > 0,
are given by -ln(l - rand(n, 1))/A. f!:. -ln(rand(n, 1))/A.. •

The Monte Carlo simulation algorithm based on Eq. 5.4, referred to as


the inverse transform method, is general but becomes inefficient if F - 1 has
to be calculated numerically. For example, the algorithm in Eq. 5.1, rather than
<1>- 1 (U(O, 1)), is used in Monte Carlo studies since the inverse <I> - 1 of the distri-
bution <I> of N(O, 1) is not available in closed form.

Example 5.1: Let X "' S01 (a, {3, p.) be an a-stable random variable with scale
a > 0, skewness {3, and location p., where 1/31 :::; 1 and a E (0, 2]. If f3 = 0 and
11- = 0, X has a symmetric density about zero and its samples for a = 1 can be
generated from

Z = sin(a V) [cos((1;: a) V)J(l-OI)/01 ,


(5.5)
[cos(V)] 1/01
290 Chapter 5. Monte Carlo Simulation

where V is uniformly distributed in ( -n /2, n /2) and W follows an exponential


distribution with mean one. The random variables V and W are independent. The
above definition of Z becomes
sin(V)
Z = - - = tan(V), for a= 1 (X~ Cauchy),
cos(V)
Z = 2-v'W sin(V), for a = 2 (X ~ N(O, 2)). (5.6)

If X ~ Sa(a, 0, tt), samples of X can be obtained from tt +a Z with Z in Eq. 5.5


or 5.6. If X ~ Sa(a, {3, tt), samples of X can be calculated from tt +a Z and

I
samples of Z ~ Sa(l, {3, 0), where

sin(a (V-vo)) [cos(V-a (V-vo))](l-a)ja


z !!._ (cos(V)) !fa W ' (5.7)
- ~ [ (~ + f3 V) tan(V) - f3 ln ( (nf;j!~o~(V))], a= I,

with V and Was in Eq. 5.5, vo = -n f3 h(a)j(2a), and h(a) = 1 -11- al ([79],
Section 4.8.1, [162], Section 1.7). <>

Example 5.2: Let X ~ S1 (a, 0, 0), a > 0, be a Cauchy variable with density
f(x) = aj[n (x 2 + a 2 )] and characteristic function q;(u) = e-a lui. Figure 5.1
shows n = 100 independent samples of X and a histogram of X based on 10,000

f(x)
1.5

-10
Histogram
-15 Sam/les of
-20
X- 1(0.2,0,0)

-25 0.5

A ~
-30

-35 0
0 20 40 60 80 100 -3 -2 -1 0 2 3
X
Sample number

Figure 5.1. Samples of a Cauchy variable X ~ S1 (0.2, 0, 0) and a histogram of


X based on 10,000 independent samples

samples of this variable, respectively, for a = 0.2. The figure also shows the
density f of X. The samples of X have been obtained from a Z and samples of
Z in Eq. 5.6 for a = 1. <>

Example 5.3: Let N be a Poisson random variable with probability law P (N =


n) = (A.n jn!) e-A, n = 0, 1, ... , and intensity A. > 0. It is possible to generate
5.2. Random variables 291

samples of N by the inverse method in Eq. 5.4. However, the approach is inef-
ficient if the probability that N takes relatively large values is not negligible. If
E[N] =A 2: 10, it is convenient to generate samples of N from

Z =max {0, [A+ G A112 - 1/2]} (5.8)

where [x] denotes the integer part of x and G "'N(O, 1) ([155], Section 3.7.2).

Note: The approximation of N by the random variable Z in Eq. 5.8 is acceptable for large
values of A because the distribution of the random variable (N - A)/...fi. converges to the
distribution of N(O, 1) as A increases indefinitely. •

Case 2 (d > 1). Let F1, Fklk-1, ... ,1. k = 2, ... , d, and F denote the dis-
tributions of the coordinate X 1 of X E JRli , the conditional random variable
Xk I (Xk-1, ... , X1), and X= (X1, ... , Xd), respectively.

Let Z = (Z1, ... , Zd) be an JRd -valued random variable defined by


F1 (Z1) = U1,
Fklk-1, ... ,1 (Zk) = Uk. k = 2, ... , d, (5.9)

where U1, ... , Ua are independent U(O, 1) random variables. Then, X and Z
are equal in distribution.

Proof: The random vector Z has the same distribution as X by definition (Eq. 5.9). The
representation of Z in Eq. 5.9 constitutes an extension of the inverse transform method in
Eq. 5.4 to random vectors. •

Monte Carlo algorithm. The generation of samples of X involves two steps:


1. Generate independent samples of U(O, 1) and construct samples of U =
(U1, ... , Ua).
2. Calculate the corresponding samples of X from Eq. 5.9.

Note: Let (u1, ... , ua) be a sample of (Ut. ... , Ua) and z = (Z1, ... , za) be the corre-
sponding sample of Z obtained from Eq. 5.9. The first coordinate of z is Z1 = F1 1(u1).
The other coordinates of z can be calculated from Fkjk-1, ... ,1 (Zk) = Uk for increasing
values of k ~ 2, where the vector (Zk-1, ... , Z1) has already been calculated in a previous
step. •

Example 5.4: Let X = (X 1, X2) be a non-Gaussian vector with X 1 "' N(JL, a 2)


and X2 I (X1 = x1) "'N(x1, 82). The density of X is

1 </J (X1-JL)
f(X1,X2) = a8 -0'- </J
(X2-X1)
8 .
292 Chapter 5. Monte Carlo Simulation

5,-------------,
..~.--· 15
4 cr=1, &=1 cr=1, &=3
3 10

·....
....
2
5
....:~·:-:~ ..~:
1
x2 x2
0
.'. ;.r •• •• 0
.. ' " . 0

-1

...
:.
'· .... 0

-2 -5
-3
~L__-~-~--~-___..J -10L__-~-~--~-___..J
~ -2 2 4 -4 -2 2 4

Figure 5.2. Samples X = (XI, Xz), where X1 "' N(tt. a 2 ) and Xz I (XI =
xi)"' N(x1, 82 ) for (a= 1, 8 = 1) and (a= 1, 8 = 3)

The mapping in Eq. 5.9 becomes Z 1 = tt +a <1>- 1(U1) and Zz I (Zt = Z1) =
Zl + 8 <I>- 1(Uz), where U1 and Uz are independent copies of U(O, 1). Figure 5.2
shows 100 independent samples of X generated by the algorithm in Eq. 5.9 for
a = 1, 8 = 1, and 8 = 3. The dependence between X 1 and Xz is stronger for
smaller values of 8. ¢

Example 5.5: Let X be an a-stable vector in ~d defined by the value of a, loca-


tion vector JL E ~d, and a finite measure y on the unit sphere Sd centered at the
origin of ~d. Define a discrete approximation y (n)(u) = L~= 1 Yr 8(u r - u) of
y, where (At, ... , An) is a partition of Sd, Yr = y(A,), andu, E A, are arbitrary

-I
points, r = 1, ... , n. Samples of the random vector X can be generated from

z(n) L~=1 y,tla v, Ur + JL, a =I= 1,


- L~=1 Yr (v, + ~ ln(y,)) u, + JL, a= 1,
(5.10)

where V, are independent copies of Sa(l. 1, 0). ¢

Note: The characteristic function rp(u) = E [ e.;=I uT X] of X is

exp 1a- fsd lbla y(du) +A tan {!If) fsd lbla sign(b) y(du)), a=1= 1, 2
exp a- fsd lbl y(du)- A~ fsd b ln(lbl) y(du)), a= 1
exp a- fsd lbl 2 y(du)), a= 2

where a =AuT/Land b = uT u. It can be shown that z(n) converges in distribution


to X ifmax1::;r::;d y(Ar)--+ 0 as n--+ oo ([79], Section 3.5.3). ~
5.3. Stochastic processes and random fields 293

Example 5.6: Let X E JRd be a translation vector, that is, X = g(Y), where Y
is an JRd -valued Gaussian vector with mean zero, covariance matrix p = {p i,J =
E[Y; Y1 ]} such that Pi,i = 1, i = 1, ... , d. Suppose that the memoryless mapping
from Y to X is given by X;= F;-l (<P(Y;)) = g;(Y;), i = 1, ... , d, where F; are
absolutely continuous distribution functions.
Samples of X can be generated from (1) samples of Y and the definition of
X or (2) samples of U(O, 1), the distribution of X, and the algorithm in Eq. 5.9.
The latter approach is less efficient for translation random vectors. <>

Note: Let Yi = gj 1(xi) andy= (Yl· ... , YJ). The distribution,

P(X1:::: Xi, ... , Xd:::: XJ) = P(Yt:::: Yt. · .. , Yd:::: Yd) = <l>d(y; p),
called multivariate translation distribution, is the input to the Monte Carlo simulation
algorithm based on Eq. 5.9, where <l>J(·; p) denotes the joint distribution function of the
Gaussian vector Y ~ N(O, p). A

5.3 Stochastic processes and random fields


Properties of stochastic processes and random fields have been discussed
in Chapter 3. In this section we present methods for generating sample paths of
these random functions. Let X (t) be an JRd -valued stochastic process defined on a
probability space (Q, :F, P), where the argument t E lR is viewed as time. We also
consider JRd -valued random fields X (t) defined on the same probability space, but
t E JRd' denotes a space argument.

5.3.1 Stationary Gaussian processes and fields


If all finite dimensional distributions of a random function, that is, a stochas-
tic process X(t) or a random field X(t), are Gaussian, this function is said to be
a Gaussian process or field. Our objective is to generate independent samples of
a Gaussian function X with mean zero and a prescribed covariance function. If
the mean of X is not zero, we generate samples of X - E [X] and add the mean
function E[X] to these samples.

5.3.1.1 Spectral representation. Stochastic processes


The spectral representation theorem shows that a weakly stationary pro-
cess can be approximated by a superposition of harmonics (Sections 3.9.4.1 and
3.9.4.2). This theorem can be used to develop models for generating sample paths
of stationary Gaussian proces~es, which consist of a superposition of harmonics
with (1) fixed frequencies and random amplitudes or (2) random frequencies and
amplitudes.
Let X(t), t E lR, be a stationary Gaussian process with mean zero, covari-
ance function c(r) = E[X(t + r) X(t)T], and spectral density s(v) = {skz(v)},
294 Chapter 5. Monte Carlo Simulation

k, l = 1, ... , d. Our objectives are to (1) construct a sequence of processes X (n),


n = 1, 2, ... , such that x<n) depends on a finite number of random variables for
each n and becomes a version of X as n --+ oo and (2) develop algorithms for
generating samples of x<n).

Fixed frequencies. We first introduce some notation and definitions. Let v * > 0
be a cutoff frequency and Pn = (0 = ao <at < ···<an = v*) be a partition of
the frequency range [0, v*]. Denote by L\v, =a, -ar-1 and v, = (ar-1 +a,)/2,
r = 1, ... , n, the length and the midpoint of the frequency intervals defined by p n.
It is common to take equal frequency intervals in which case we have .:\ v r = v* / n
and v, = (r- 1/2) v* fn. Consider also the Rd -valued Gaussian variables A,, B,
with mean zero and second moments
{Oir
E[Ar,k Ap,t] = E[Br,k Bp,l] = Orp Ja gk,l(V) dv:::::: Orp gk,z(v,) L\v,,

E[Ar,k Bp,l] = -E[Br,k Ap,l] = Orp Ja rr


Olr-1

hk,z(v) dv:::::: Orp hk,z(v,) L\v,,


Olr-l
(5.11)

where g(v) = s(v) +s(-v) andh(v) = -.J=T (s(v) -s(-v)) (Section 3.9.4.2).
The approximate equalities in Eq. 5.11 can be used for small values of .:\ v r.
If the frequency band of the process X is not bounded, a cutoff frequency
v* > 0 has to be selected to apply the Monte Carlo simulation algorithm consid-
ered in this section. The cutoff frequency v * > 0 has to be such that most of the
energy of the process X corresponds to harmonics with frequencies in [0, v *].
We now define a sequence of processes X (n) and show that X (n) is approxi-
mately a version of X for sufficiently large values of n. This property justifies the
use of samples of x<n) as substitutes for samples of X.

If L\(pn) --+ 0 and v* --+ oo, then


n
x<n)(t) = L [A, COS(Vr t) + B, sin(v, t)] (5.12)
r=l

becomes a version of the stationary Gaussian process X.

Proof: Recall that Ll(Pn) = max1::;r~n (a,- ar-t) denotes the mesh of Pn· For any
partition of the frequency range [0, v*], x<n) is a process with mean zero and covariance
function with entries
5.3. Stochastic processes and random fields 295

Hence, x<nl is weakly stationary for each n. This process is Gaussian because it depends
linearly on the Gaussian variables A, and B,. If d = 1, then gu = g and hn = 0 so that
the covariance function of x<n) is

For a fixed v*, we have

lim E
n-+oo
[(x(n) (t) x(n) (s)T) .. ]
lJ

= fov* [gi,J (v) cos(v (t- s)) -hi,} (v) sin(v (t- s))] dv

so that the covariance function of x(n) converges to the covariance function of X if in


addition to the limit as n ---+ oo we also let v* ---+ oo (Section 3.9.4.2). Hence, x(n)
becomes a version of X since X and x<nl are Gaussian processes.
The model x<nl has two properties that may be undesirable in some applications.
The sample paths of this process are ( 1) periodic with period 2 rr I v1 if ~ Vr = v* In and
Vr = (r - 112) ~Vr and (2) infinitely differentiable even if the samples of X are not
differentiable. Hence, samples longer than 2 rr lvl provide the same information as samples
of length 2 rr I v1, and estimates of, for example, mean crossing rates of X from samples of
x<nl are incorrect if X is not differentiable (Section 3.10.1) . •

Monte Carlo algorithm. The generation of samples of X in a time interval


[0, r], r > 0, involves three steps:
1. Select a cutoff frequency v* > 0, partition the frequency band [0, v*] inn
intervals, and select frequencies v,, r = 1, ... , n, in these intervals.
2. Generate samples of the Gaussian random variables A, and B,, r = 1, ... , n,
with mean zero and second moments in Eq. 5.11 using the algorithm in Sec-
tion 5.2.1.
3. Calculate the corresponding samples of x<nl in [0, r] from Eq. 5.12.

Example 5.7: Let X be an IR2 -valued stationary Gaussian process with spectral
density
Sk,J(V) = (1- p) /h1 Sk(V) + p sz(V) k, [ = 1, 2,

where sk(v) = 1/(2 vk) 1[-vk.vkl(v), sz(v) = 1/(2 ii) 1[-ii,vJ(v), and 0 < vko v <
oo. Figure 5.3 shows sample paths of the coordinates of X for vk = 25, v = 5,
two values of p, and n = 100 harmonics (Eq. 5.12). The frequency content of X
depends strongly on p. The coordinates X 1 and X 2 of X are nearly in phase for
values of p close to unity. <)
Note: Set v* = max(vlo iJ2, v), ~vr = v* In, and Vr = (r- 112) ~Vr for r = 1, ... , n.
The covariance matrix of the random amplitudes Ar and Br can be calculated from Eq. 5.11.
The samples in Fig. 5.3 have been generated by the model in Eq. 5.12. A
296 Chapter 5. Monte Carlo Simulation

4,---~-------------, 4,------------------.
Sample of ~ (p= 0.3) Sample of~ (p= 0.7)

-4 -4
0 2.5 5 7.5 10 0 2.5 5 7.5 10
time, t time, t

4 4
Sample of X2 (p= 0.3) Sample of X2 (p= 0. 7)
2

-4L---~----~----~--~ -4L---~----~----~--~
0 2.5 5 7.5 10 0 2.5 5 7.5 10
time, t time, t

Figure 5.3. Samples of the coordinates X 1 and X2 of X for p = 0.3 and p = 0.7
(adapted from [79], Fig. 4.4)

Random frequencies. We limit our discussion to real-valued processes for sim-


plicity. Our objective is to develop a Monte Carlo simulation algorithm for gen-
erating samples of a stationary Gaussian process {X(t) E ~} with mean zero,
covariance function c(r) = E[X(t + r) X(t)], and one-sided spectral density g.
This algorithm is based on a model consisting of a superposition of harmonics
with both random amplitudes and frequencies, in contrast to the model in Eq. 5.12
that has random amplitudes but fixed frequencies.

Let Cr ( v), r = 1, 2, be independent copies of a compound Poisson process


C(v) = I::~~) Yk. where N is a Poisson process with intensity A. > 0 and Yk
are iid random variables with E[Yt] = 0 and E[Yf] = J-t2 < oo. Denote by
(Tt, Tz, .. .) the random times of the jumps (Yt, Yz, .. .) of C. We continue to call
Tt, k = 1, 2, ... , jump times although they are frequencies. As for the previous
algorithm (Eq. 5.12) we select a cutoff frequency v* > 0.
5.3. Stochastic processes and random fields 297

If ~(v) 2 = g(v)j(AJ-t2),).-+ oo, and v*-+ oo, then


x<Al(t) = fov* ~(v) [cos(v t) dCt (v) + sin(v t) dC2(v)]
N, (v*) N2(v*)
L ~(Tl,k) Yt,k cos (Tt,k t) + L ~(T2,1) Y2,1 sin (T2,1 t)
k=l 1=1
(5.13)

becomes a version of the Gaussian process X [87].

Proof: The parameter ').. > 0 indexing the sequence of processes approximating X has a
similar role as n = 1, 2, ... in Eq. 5.12. The first equality in Eq. 5.13 is a statement of the
spectral representation for an approximation of X obtained by neglecting the power of this
process beyond a cutoff frequency v* > 0. The second equality in Eq. 5.13 holds since
Nr(v*) < oo a.s. for v* < oo and the integrators Cr. r = 1, 2, have piece-wise constant
samples.
The process x(A) has mean zero and covariance function

= Af-t2 fo ~(v) 2
v•
E[X(A)(t)X(A)(s)] [cos(vt) cos(vs) +sin(vt) sin(vs)]dv

rv*
= Jo g(v) cos(v (t- s)) dv,

where the last equality holds because').. f-L2 ~(v) 2 = g(v) by hypothesis. Hence, the co-
variance function of x(A) converges to the covariance function of X as v* -+ oo, that is,
X and limv*--+oo x(A) are equal in the second moment sense for each ').. > 0. It can also
be shown that i(A)(t) = X(A)(t)/Var[X(A)(t)] converges in distribution to N(O, 1) ([79],
Example 3.12, p. 83). The approach in [79] can be used to show that the distribution of
a vector with coordinates i(A) Uk) becomes Gaussian as ').. -+ oo, where tk are arbitrary
times. In summary, we showed that (1) the second moments of x(A) approach the second
moments of X as v* -+ oo for any ').. > 0 and (2) x(A) converges to a Gaussian process
as ').. -+ oo for any v* > 0. Hence, x(A) becomes a version of X as ').. and v* increase
indefinitely. •

The models x<n) and X(A) in Eqs. 5.12 and 5.13 have similar features but
differ in several notable ways. Both models consist of a superposition of a finite
number of harmonics with random amplitudes. However, the frequencies of these
harmonics are fixed for X (n) but are random for X (A). The frequencies of con-
stituent harmonics of X (A) coincide with the jump times of the compound Poisson
processes C1 and C2. Both processes, x<n) and x<Al, are weakly stationary for
any n, A., and v*. However, x<n) is a Gaussian process for any n and v * while X(A)
is Gaussian asymptotically as A. -+ oo for any value of v *.
298 Chapter 5. Monte Carlo Simulation

Monte Carlo algorithm. The generation of samples of X in a time interval


[0, r ], r > 0, involves three steps:
1. Select a cutoff frequency v * > 0, an intensity A > 0 for the Poisson
processes N,, and a distribution for the random variable Y1 E L2 such that
~(v) 2 = g(v)j(AJ-L2).
2. Generate samples of the jump times (Tr,l •... , Tr,N,(v•)) and the jumps
(Yr,l, ... , Yr,N,(v•)) of the compound Poisson processes C,, r = 1, 2.
3. Calculate the corresponding samples of X (J..) from the second equality in
Eq. 5.13.

Note: The methods in Section 5.2.1 can be used to generate samples of the jumps of the
compound Poisson processes Cr. The jump times of these processes can be generated in
at least two ways. One approach is based on the representation T,.,i = L~=l Wr,i of the
jump times of C,, where Wr,i are iid exponential random variables with mean 1/'A. The
other approach can first generate samples of the Poisson variables Nr(v*) (Example 5.3)
and then place Nr (v*) independent uniformly distributed points in [0, v*]. A

Example 5.8: Let X be a stationary Gaussian process with mean zero, covariance
function c(r) = E[X(t + r) X(t)] = sin(v* r)j(v* r), and one-sided spectral
density g(v) = (1/v*) 1[o,v•J(v).
Figure 5.4 shows a sample of x<J..l, the covariance function c(r) of X, and

4,-------------------,
3 Sample of x<A.>

0.5

Estimate of c('t)

-1
0

-3 '-. c('t)
-4 -0.5
0 5 10 15 20 0 0.5 1 1.5 2
time time lag

Figure 5.4. A sample of x<J..), the covariance function c(r) of X, and an estimate
of c(r) from 500 samples of x<J..)

an estimate of c(r) based on ns = 500 samples of x<J..) for v* = 20, A v* = 10,


and Yr,p independent N(O, 1) random variables. The figure also shows that X (J..)
does not have periodic samples and that the estimate of c(r) is satisfactory al-
though the samples of x<J..) have on average only A v* = 10 harmonics. In con-
trast, the samples of the model inEq. 5.12 with n = 10 equally spaced frequencies
in [0, 20] are periodic with period 2 n. <>
5.3. Stochastic processes and random fields 299

5.3.1.2 Spectral representation. Random fields

We have seen in Section 3.9.4.3 that the spectral representation theorem


applies also to weakly stationary or homogeneous random fields. We apply this
theorem as in the previous section to develop models for generating samples of
stationary Gaussian fields consisting of a superposition of harmonics with ( 1) fixed
frequencies and random amplitudes and (2) random frequencies and amplitudes.
We discuss only the case of real-valued random fields to simplify the pre-
sentation. Considerations in Section 3.9.4.3 can be used to extend results in this
section to vector random fields. Let X (t), t E Rd', be a real-valued, homogeneous
Gaussian field with mean zero, covariancefunctionc(T) = E[X(t+T) X(t)], and
spectral density s(v), where t, T E Rd' are spatial coordinates and v E Rd' is a fre-
quency. Our objectives are as in the previous section to ( 1) construct a sequence of
approximations of X whose limit is a version of this random field and (2) develop
algorithms for generating samples of X in a bounded subset S of R d' •

Fixed frequencies. Let D be a boundedsubsetofRd', (D1, ... , Dn) be a partition


of D, and Vr denote interior points of Dr. It is common to (1) take D to be a
bounded rectangle centered at the origin of Rd', that is, D = x%~ 1 [-vz, vz],
vZ r
> 0, (2) partition D in rectangles D 7 , = 1, ... , d', defined by a grid with
step .6-vk = vkfnk. k = 1, ... , d', where nk ~ 1 are integers, and (3) choose Vr
at the center of Dr. Denote by Vr the Lebesgue measure of Dr, that is, the volume
of this set, and define a collection of independent Gaussian variables, A r and Br,
r = 1, ... , n, with mean zero and variance

(5.14)

The following equation defines a sequence of random fields X (n), n =


1, 2, ... , that can be used to approximate a homogeneous Gaussian field X with
specified second moment properties.

n
x<n)(t) = L [Ar cos(vr. t) + Br sin(vr. t)]' (5.15)
r=l

.
becomes a vers10nof X, where Vr · t = Lk=l
d'
Vr,k tk. Vr = (vr,l, ... , Vr,d'), and
t = (tl, ... ' fd 1 ) .

Proof: The random field x<n) is Gaussian with mean zero for any D and partition of it.
300 Chapter 5. Monte Carlo Simulation

This field is homogeneous since

n
= L E (CAr cos(vr · t) + Br sin(vr · t)) (Ap cos(vp · s) + Bp sin(vp · s))]
r,p=l
n n
= LYr (cos(vr · t) cos(vr · s) + sin(vr · t) sin(vr · s)) = LYr cos(vr · (t- s)).
r=l r=l

Considerations as in Eq. 5.12 show that the covariance function of xCnl approaches the
covariance function of X as the partition of D is refined and D is increased to ~~. Because
x(n) is a Gaussian field for any D and partition of it, xCn) becomes in the limit a version
of X. This properties justifies the use of samples of xCn) as a substitute for samples of X.
We note that the second moments Yr of the random variables Ar and Br in Eq. 5.14
must be consistent with the properties of the spectral density of real-valued random fields
(Section 3.9.4.3). •

Monte Carlo algorithm. The generation of samples of X in a subset S of lR d'


involves three steps:
1. Select cutoff frequencies vZ > 0, k = 1, ... , d', partition D =
x f~ 1 [ - vZ, vZJ in n rectangles, and select frequencies v r, r = 1, ... , n, in these
rectangles.
2. Generate samples of the independent Gaussian variables A r, Br with mean
zero and variance in Eq. 5.14.
3. Calculate the corresponding samples of xCnl (t), t E S, from Eq. 5.15.

Example 5.9: Let X be a real-valued stationary Gaussian field defined on JRd'


with mean zero and spectral density s(v) = L~=l Yr 8(v - Vr) such that s(v) =
s(-v) and Vr E JEd' (Section 3.9.4.3). The field X has the representation

n
X(t) = L [Ar cos(vr · t) + Br sin(vr · t)],
r=l

where Ar and Br are independent Gaussian variables with mean zero and variance
E[A;] = E[B;] = Yr· This representation of X and the model of this process in
Eq. 5.15 have the same functional form. Figure 5.5 shows two sample of X for
d' = 2, n = 6, VJ = (1, 2), V2 = (2, 1), V3 = (2, 2), V4 = -V}, V5 = -V2,
and V6 = -v3. The left sample is for Yr = 1, r = 1, ... , 6. The right sample
corresponds to Yl = Y4 = 1 and Yr = 0.01 for r i= 1, 4, and has a dominant wave
with frequency VI = (1, 2). <>
Note: If d 1 = 2, we can think of cos(vr · t) = cos(vr,l t1 + Vr,2 t2) as a wave with length
2rr!Jvr,2 1 +vr,22 traveling in the direction Or = tan- 1 (vr2/VrJ).
, , The wave length is
5.3. Stochastic processes and random fields 301

10

5 2

0 0

-5 -2

-10 -4
6 6
6 6

tz 0 0
t1 t2
0 0
t1

Figure 5.5. Samples of X for two discrete spectral densities

given by the distance between the zeros of cos(117 • t), that is, the lines Vr,l t1 + Vr,2 t2 =
(2 p - 1) 1r /2 in JR2 , where p is an integer. &

Random frequencies. We construct a sequence of random fields X(;.), A > 0,


consisting of a superposition of waves with random amplitude and random fre-
quency or length, which becomes in the limit a version of X. A similar construc-
tion has been used for the processes in Eq. 5.13.

Let C(v) = I:f~fl Yk 1D(Tk) (v) be a compound Poisson process, where


D(T) = D n ( x1~ 1 [Tj. oo)), T = (Tt, ... , Td'), Dis a bounded Borel subset
of ~d', (VI, ... , vd') are the coordinates of v, Yk denote iid random variables with
mean zero and variance f.L2 = E[Y{l < oo, N is a Poisson process defined on ~d'
with intensity A > 0 and random points T k E JRd' . We will discuss the generation
of point processes in Section 5.3.3.3. Let

N,(D)

Cr(v) = L Yr,k 1D(Tr,k)(v), r = 1, 2, (5.16)


k=l

be two compound Poisson processes, where N r and Yr,k are independent copies
of Nand Yk. respectively, and the random variables T r,k have the same meaning
as T k in the definition of the compound Poisson process C.
302 Chapter 5. Monte Carlo Simulation

If ~(v) 2 = s(v)j(A. J-iz), A.--+ oo, and D approaches m_d', then

xP-l(t) = L~(v) [cos(v. t) dCJ (v)- sin(v. t) dCz(v)]


N,(D) N2(D)

= L ~(TJ,k) Y1,k cos (Tl.k · t)- L ~(Tz,I) Yz,1 sin (Tz,1 · t) (5.17)
k=l 1=1

becomes a version of the homogeneous Gaussian field X.

Proof: A real-valued random field X can be represented by (Section 3.9.4.3)

X(t)= { s(v) (cos(v·t)dCJ(V)-sin(v·t)dCz(v )],


}Tied'

where the processes CP• p = l, 2, are independent copies of C in Eq. 5.16. This represen-
tation suggests the definition of the approximating sequence of fields in Eq. 5.17.
The compound Poisson process C has mean zero and covariance function

where v, v1 E D, D(T k) = D n ( x1~ 1 [Tk,j, oo) ). Tk,j denotes the coordinate j of Tk,
and T1 is uniformly distributed in D. The above derivations used the fact that the Poisson
points Tk are uniformly distributed in D conditional on N(D). If D = x1~ 1 [-vj, vj].
J
then E [ 1D(T 1 )(v) 1D(T 1) (v 1) = TI1~ 1 (vj !\ vj + vj)/(2 vj) since the coordinates TJ,j
of T 1 are uniformly distributed in [- vj, vj].

J
The definition of X(A) in Eq. 5.17 gives E [ x<;_l(t) = 0 and

E[X(A)(t) x(A)(s)] =A IL2 L s(v) 2 cos(v. (t- s)) dv.


5.3. Stochastic processes and random fields 303

The above expression for the covariance function of x<"l results from

E[x<"l(t) x<"l(s)] = E [ M~
L I;(Tr,k) I;(Tu) yl,k Yl,l cos (Tl,k. t) cos (Tu· s) ]

+E [~~
L
k,l=l

k,l=l
I;(T2,k) I;(T2,1) Y2,k Y2,1 sin (T2,k · t) sin (T2,1 · s) ]
L
Nl(D) ]
=E [ I;(Tl,k) 2 rl,k cos (Tr,k · t) cos (Tr,k · t)
k=l

+E [ L
N2(D)
I;(T2,k) 2 ri,k sin (T2,k. t) sin (T2,k.
]
t) ,
k=l

the fact that the random variables Tr,k• k = 1, ... , Nr(D), r = 1, 2, are independent and
uniformly distributed in D conditional on Nr (D), the properties of the random variables
Yr,ko and E[Nr(D)] =A vv, where vv = fv dv denotes the volume of D.
Hence, x<"l is a weakly stationary random field for any A > 0 and D c JR'f'.
Moreover, x<"l becomes equal to X in the second moment sense if Dis increased to JR'f'.
Also, x<Jcl converges to a Gaussian field as A --+ oo for any D. We conclude that x<"l
becomes a version of X if A --+ oo and D is increased to JR'f'. •

The models in Eqs. 5.15 and 5.17 have features and limitations similar to
those of the models in Eqs. 5.12 and 5.13.

Monte Carlo simulation. The generation of samples of X in a bounded subset


S of JE.d' involves three steps:

1. Selectcutofffrequenciesv;,k = 1, ... ,d',definingD = xf~ 1 [-v;,vkJ,


an intensity A. > 0 for the Poisson processes Nr, and a distribution for Yr E L2
such that ~(v) 2 = s(v)j(A. f.-L2).
2. Generate samples of the Poisson points (T r,l, ... , Tr,N,(D)) and the jumps
(Yr, 1, ... , Yr,N, (D)) of the compound Poisson processes C r, r = 1, 2.

3. Calculate the corresponding samples of x<"l(t), t E S, from the second


equality in Eq. 5.17.

Note: The generation of Poisson points in a subset of JR'f' is discussed in Section 5.3.3.3.
The methods described in Section 5.2 can be used to generate samples of the random vari-
able (fr,l• ... , Yr,N,(D)). A

Example 5.10: Let X be a zero mean, homogeneous Gaussian field defined on


304 Chapter 5. Monte Carlo Simulation

JR2 with spectral density and covariance functions

s (v) =
1 [
exp -
vf - 2 p v1 v2 + vi ] , v E JR2 ,
2n Jl=P2 2 (1- p 2 )

r[ + 2 p r1 r2 + r 22 ] rEJR2 ,
c(r)=E[X(t)X(t+r)]=ex[p- 2 ,

where IPI < 1. The spectral density of X is shown in Fig. 5.6 for p 0.7.

0.25

-3 -3

Figure 5.6. Spectral density of X for p = 0.7

Figure 5.7 shows four samples of X fort E S = [0, 10] x [0, 10], A = 0.5, Y 1
a Gaussian random variable with mean zero and variance one, D = [-3, 3] x
[ -3, 3], and s(v) 2 = s(v)jA. The random field X(i.) has approximately the same
second moment properties as X for any A and S( v) satisfying the above condition.
The model x<i.) in Eq. 5.17 has been used to generate the samples shown in this
figure. The average number of Poisson points in D is A v D = 18, where vD
denotes the volume of D. <>

5.3.1.3 Sampling theorem. Stochastic processes

We show that a sampling theorem for deterministic functions can be ex-


tended to define a sequence of processes x<n>, n = 1, 2, ... , such that each x<n)
depends on a finite number of random variables and the limit of this sequence as
n ""'"+ oo is a version of a specified stationary Gaussian process X. The following
equation states the sampling theorem for deterministic functions.
5.3. Stochastic processes and random fields 305

2 2

0
0

-2 -1
10 10
10 10
5
0 0 0 0

10

0 0 0 0

Figure 5.7. Four samples of X(J.) approximating X

Let [ -v*, v*], 0 < v* < oo, be the frequency band of a deterministic function
x(t), t E JR. Then ([29], Sections 5-4, 5-5)
n
x(t) = lim "'x(kt*)ak(t; t*), where
n--+oo L.,;
k=-n
* sin(n(tjt*-k))
ak(t· t ) - and t* = njv*. (5.18)
' - n(tjt*-k)

Note: It is sufficient to know the values x(k t*), k E Z, of x to reconstruct the entire signal.
The function ak is 1 if tIt* = k, is zero if tIt* is an integer different from k, and decreases
rapidly to zero as It It* - kl increases. A

Let X(t), t E lR, be an JRd-valued stationary Gaussian process with mean


zero, covariancefunctionc(r) = E[X(t + r) X(t)T], and spectral density s(v) =
{skz(v)}, k, l = 1, ... , d. It is assumed that the coordinates X; of X have a
bounded frequency range [ -v;, v;J, 0 < v; < oo. If this condition is not satis-
f=
fied, we approximates by a spectral density s that is equal to s in x 1 [ -v;, v;J
and is zero outside this rectangle, and develop a representation for the process
with the spectral density s. We give first a version of the sampling theorem in
Eq. 5.18 for real-valued processes (d = 1) and then extend this result to vector
processes (d > 1).
306 Chapter 5. Monte Carlo Simulation

If X is a real-valued stationary Gaussian process with zero spectral density out-


side a frequency band [ -v*, v*], 0 < v* < oo, then

n1 +n+l
x<n)(t) = L X(kt*)ak(t; t*), (5.19)
k=n 1 -n

becomes a version of X as n --+ oo, where n 1 denotes the integer part oft j t *
and n ~ 1 is an integer.

Proof: We call nodes the times k t* at which the process is sampled, where v = vr,
v* = vj', t* = rr jv*, and k E Z. The integer n 2:: 1 is half the size of the window centered
on the current cell [n 1 t*, (n 1 + 1) t*], that is, the cell containing the time argument t
(Fig. 5.8). The model x<n) in Eq. 5.19 depends linearly on the values of X at 2 (n + 1)
nodes and coincides with X at the nodes included in its window.

Current cell
(n - n)t* r* ,--1-.
/ (n +l)r* (n + n +1)/
1\.
n
t t t

• ~I;_/ • ~ti~e
* J,
t

Figure 5.8. Moving window for X (n) with n = 2

We have (Eq. 5.18)


n
X(t, w) = lim '\" X(kt*, w)ak(t; t*)
n-+oo L..
k=-n

for almost all w's since the samples of X consists of a superposition of harmonics with
frequencies in the range [-v*, v*]. The representation limn-+oo x<n) cannot be used to
generate samples of X because it depends on an infinite number of random variables, the
random variables X(k t*), k E Z.
Consider the model in Eq. 5.19. This stochastic process is Gaussian with mean zero
and covariance function

n1 +n+l (ns+n+l )
E[x<n) (t) x<n) (s)] = k=~-n ak(t; t*) l=~-n c((k- l) t*) CX[ (s; t*) ,

where c denotes the covariance function of X. Hence, x<n) is not stationary for finite
values of n because E[x<n) (t) x<n) (s)] depends explicitly on the times t and s. However,
E[x<nl(t)X(nl(s)] converges to c(t- s) as n--+ oo This follows from the sampling
5.3. Stochastic processes and random fields 307

theorem in Eq. 5.18 applied twice to the covariance function cO in the above equation. We
can apply this theorem to the covariance function of X because the frequency band of its
Fourier transform is [ -v*, v*] by hypothesis. Therefore, x<nl becomes a version of X as
the window size is increased to infinity. This property justifies the use of samples of x<n)
as a substitute for samples of X provided that n is sufficiently large. Numerical results
show that x<nl with n :::::: 10 approximates satisfactorily X [78, 88]. •

The model x<nl in Eq. 5.19 can be used to develop an efficient Monte Carlo
simulation algorithm for generating samples of stationary Gaussian processes.

Monte Carlo algorithm. Suppose that a sample x<nl(., w) of x<n) has been
generated up to a time (n t + 1) t*. The extension of this sample in the next cell
[(nt + 1) t*, (nt + 2) t*] involves two steps:
1. Generate a sample of the conditional random variable

X((nt + n + 2) t*)
= X((nt + n + 2)t*) I [X((nt + n + l)t*) = X((nt +n + 1) t*, w),
X ((nt + n) t*) =X ((nt + n) t*, w), ... ,
X((nt - n) t*) =X ((nt - n) t*, w)]. (5.20)

2. Calculate the corresponding sample of X (n) in [ (nt + 1) t*, (nt + 2) t*] from
Eq. 5.19.

Note: The real-valued random variable X((nt + n + 2) t*) is Gaussian with known mean
and variance (Section 2.11.5). Because X is stationary and we consi?er in Eq. 5.20 only
values of X at the past 2 n +2 nodes, the second moment properties of X((nt +n +2) t*) do
not change in time so that they have ~o be calculated once. The algorithms in Section 5.2.1
can be used to generate a sample of X((nt + n + 2) t*).
The extension of the sample of x<n) beyond the time (nt + 1) t* should use a sample
of the conditional variable

X((nt + n + 2) t*)
= X((nt + n + 2) t*) I [X((n 1 + n + 1) t*) = X((n 1 + n + 1) t*, w),
X((nt + n) t*) = X((nt + n) t*, w), .. .] ,

which accounts for the entire time history, rather than a sample of X((nt + n + 2) t*).
However, this exact formulation is impractical since the properties of X((nt + n + 2) t*)
have to be recalculated at each node and depend on a vector of increasing length as time
progresses. Moreover, the improved accuracy of this formulation may be insignificant
because of the reduced influence of values of X at nodes far away from the cell containing
the current time. Numerical results support this statement for processes that do not exhibit
long range dependence. A process is said to have long range dependence if its covariance
function is such that c(r) ~ r-!3 for r-+ oo, where f3 E (0, 1) is a constant [78, 88].
308 Chapter 5. Monte Carlo Simulation

The algorithm in Eq. 5.20 can be applied to narrowband processes, that is, processes
whose spectral density is zero outside a small vicinity of a central frequency 1{) > 0. How-
ever, the Monte Carlo simulation algorithm based on Eq. 5.20 can be inefficient if the cen-
tral frequency vo is large. A modified version of this algorithm developed for narrowband
processes can be found in [78]. A

If X is an Rd -valued stationary Gaussian process whose coordinates X; have


a bounded frequency range [- v ;*, v;*], v;* > 0, i = 1, ... , d, then the process
x<n) with coordinates
n;, 1 +n;+1
x}n;)= L X;(ktt)ak(t;tt) (5.21)
k=n;, 1 -n;

becomes a version of X as n; --+ oo, i = 1, ... , d, where t;* = nfv;*, n; > 1


are integers, n;, 1 is the integer part oft It;*, and n = (n 1, ... , nd) [88].

Note: The process x<nl with coordinates in Eq. 5.21 is Gaussian with mean zero for
any values of n;. Straightforward calculations show that x<n) becomes a version of X as
min1::;:i::;:d n; -+ oo so that we can use samples of x<n) as a substitute for samples of X
provided that n; are sufficiently large. A

Example 5.11: Consider the R 2 -valued process X with coordinates

where Y; and Y are independent band-limited Gaussian white noise processes with
mean zero and spectral densities sy;(v) = (1/(2\i;)) 1[-v;,v;J(v) and sy(v) =
(1/(2 iio)) 1[-vo,voJ(v), respectively. The spectral density s of X has the entries
s;,; (v) = (1- p) sy; (v) + p sy(v), i = 1, 2, and s1,2(v) = s2,1 (v) = p sy(v). Fig-
ure 5.9 shows the exact and approximate covariance function E[X 1 (t + r) X1 (t)]
fort at a cell midpoint, lag times r ::=: 0, VJ = 2 n, ii2 = 0.4 n, and v = 0.2 n so
that vj = 2 n, vi = 0.4 n, tj = 0.5, and ti = 2.5. The approximate covariance
functions for p = 0 and p = 0.5 corresponding ton 1 = 10 nearly coincide with
the exact covariance functions of X 1 for these values of p. <>

Note: If the spacing t;* = t* is the same for all the coordinates of x<n) in Eq. 5.21, we have
n;,t = n 1 • Suppose that n; = n and n;,t = n 1 , and that a sample of X has been generated
up to a time (n 1 + 1) t*. The extension of this sample to the cell [(nt + 1) t*, (nt + 2) t*]
requires a sample of X((nt + n + 2) t*).
The simulation algorithm is less simple if the nodes in the representation of the
coordinates of x<n) in Eq. 5.21 are not equally spaced. Details on a simulation algorithm
for this case can be found in [88]. A
5.3. Stochastic processes and random fields 309

Covariance functions Covariance functions


=
{p 0) 0.8 (p = 0.5)
0.5

0.4
0

X 1(n , =10) 0.2

-0.5 L---~-~~-~-----' OL---~--~----~--~

0 2 3 4 0 2 3 4
time lag, t time lag, t

Figure 5.9. Covariance functions of X 1 and xintl

5.3.1.4 Sampling theorem. Random fields

Let X (t), t E JR!.d', be a real-valued, homogeneous, Gaussian random field


with mean zero, covariance function c(-r) = E[X(t + -r) X(t)], -r E JR!.d', and
spectral density s(v), v E lR!.d', 0 < v; < oo. It is assumed that the ordinates
of the spectral density are zero outside a rectangle D = x ~'= 1 [ - v;, v;]. If this
condition is not satisfied, we can truncate the spectral density of X, develop a
representation of X for this version of the field, and then let the cutoff frequencies
v; increase indefinitely. Results in this section can be extended simply to vector
random fields [88].

If X is a real-valued, homogeneous Gaussian field with zero spectral density


outside D = x~'= 1 [-v;, v;], u = 1, ... , d', then

n
nt 1 +nt+1 d'
x<m(t) = .L X (k1 tt, ... , kd' tj,) CXka (ta; t;)
kt=n 11 -nt
(5.22)
u
becomes a version of X as na --+ oo, = 1, ... , d', where t; = n;v;, n 1" is
the integer part of ta It;, na ::::_ 1 are integers, and n = (n 1, ... , nd' ).

Note: The random field x<n) is Gaussian with mean zero for any window width na. The
field is not homogeneous since E[x(n) (t) x<n) (s)] depends explicitly on t and s. The
covariance function of x<n) approaches the covariance function of X as na -+ oo, a =
1, ... , d 1 , so that x<n) becomes a version of X for a window with infinite width. This
property justifies the use of samples of x<n) as a substitute for samples of X.
The representation of X in Eq. 5.22 is fort E x~'= 1 [nat~, (na + 1) t~] and depends
on values of X at nodes around this cell. Suppose that a sample of x<n) has been generated
310 Chapter 5. Monte Carlo Simulation

for values of t in this cell. To extend this sample to a neighboring cell, it is necessary to
generate new values of X conditional on the previously generated values of this field. An
algorithm for generating samples of X based on the model in Eq. 5.22 is given in [88]. £.

Example 5.12: Let X (t), t E ~2 , be a real-valued, homogeneous, Gaussian field


with mean zero, spectral density s(v) = 1/(4v;v:i) in [-v;, vi] x [-v:i, v:i]
and zero outside this rectangle, and covariance c(r) = E[X(t + r) X(t)] =
n;=l sin(v~ Tq )/(v~ Tu ). The covariance function of x<n) in Eq. 5.22 is

n11 +n1+l n12 +nz+l n, 1+n,+l nz2 +nz+l


E[x<m(t) x<m(s)] = L L L L
k, =n11 -n, kz=n 12 -nz 1, =ns 1 -n, lz=ns2 -nz

n n
2 2
c((k! -lJ) t;, (k2 -12) t:i) aku (tu; t;) atv (sv; t;),
u=l v=l
so that x<m is not stationary for a finite window width.
Numerical results in [88] show that the covariance function of X (n) ap-
proaches rapidly the covariance function of X. Satisfactory approximations are
reported in [88] for n 1 = n2 ::::::: 10. <>

5.3.2 Non-stationary Gaussian processes and fields


Let X(t), t E [0, r], be an JR.d -valued stochastic process and X(t), t E S,
denote an ~a-valued random field, where r E (0, oo) and Sis a bounded sub-
set of JR.<i'. It is assumed that X(t) and X(t) are non-stationary Gaussian func-
tions with mean zero. Their covariance functions, c;,j(t, s) = E[X;(t) Xj(s)]
andc;,j(t,s) = E[X;(t)Xj(s)],dependontwotimeargumentst,s E [O,r]for
stochastic processes and two space arguments t, s E S for random fields. Our
objective is to generate independent samples of these Gaussian functions. If the
mean of a Gaussian function X is not zero, we generate samples of X- E[X] and
then add the mean function E[X] to these samples.

5.3.2.1 Linear differential equations


The Monte Carlo simulation algorithm in this section applies only to Gaus-
sian processes. Let X(t), t 2: 0, be a Gaussian process with mean zero and
covariance function c(t, s) = E[X(t) X(s)T]. Suppose that X is defined by the
linear stochastic differential equation

X(t) = a(t) X(t) + b(t) Y(t), t 2: 0, (5.23)

where Y E JR.d' is a stationary Gaussian process with mean zero and a prescribed
covariance function.
5.3. Stochastic processes and random fields 311

Note: It is assumed that the (d, d) and (d, d 1 ) matrices a and bare such that Eq. 5.23 has
a unique solution. If Y is a Gaussian white noise, then X is called a filtered Gaussian
process.
The solution of Eq. 5.23 is

X(t)=8(t,O)X(0)+ f 8(t,s)Y(s)ds=8(t,O)X(0)+ k.U(t,s)dZ(v),

where U(t, s) = JJ 8(t, s) eR vs ds, 8(t, s) denotes the Green function for Eq. 5.23,
=
Y (t) fJR e R v t d Z (v), and Z is a process with stationary, orthogonal increments (Sec-
tion 3.9.4). The second moment properties of X in Eq. 5.23 can be calculated by the
methods in Section 7.2.1.2. A

Monte Carlo algorithm. The generation of samples of a Gaussian process X


with specified second moment properties involves two steps:
1. Define the coefficients a, band the properties of the process Yin Eq. 5.23.
2. Generate samples of Y by any of the methods in Section 5.3.1 and integrate
Eq. 5.23 to obtain samples of X.

Note: The determination of the coefficients a and b of Eq. 5.23 is difficult in a general
setting. Relatively simple results are available in special cases, for example, Y is a station-
ary Gaussian process with mean zero and the matrices a, bare constant ([30], Theorem 1,
p. 106).
A finite difference approximation of Eq. 5.23 can be used to obtain samples of X
from samples of Y (Section 4.7.3). For example, we can use the forward finite difference
scheme
X(t + ~t) = [i + a(t) ~t] X(t) + b(t) ~Y(t),
where i denotes the (d, d) identity matrix, ~t > 0 is the time step, and ~Y(t) = Y(t +
~t)- Y(t). A

Example 5.13: Let X be a real-valued stationary Gaussian process defined by the


stochastic differential equation dX (t) = -p X (t) dt + ,j2j) dB(t) fort :::: 0,
where p > 0 and B denotes a Brownian motion. The above equation admits a
stationary solution X s, where X s is a zero-mean Gaussian process with covariance
functionc(r) = E[X(t+r)X(t)] = e-PI<I. IftheinitialstateX(O)hasthesame
marginal distribution as Xs, that is, X(O) ""' N(O, 1), and is independent of the
driving noise, then X is a version of X s.
The recurrence formula Xk+l = (1- p L\t) Xk + ,j2j)Wk. k = 0, 1, ... ,
can be used to generate samples of X, where ~t > 0 denotes the time step,
Xk =X (k !:it), and Wk are independent copies of N(O, !:it). <>

Proof: The solution of the above stochastic differential equation is

X(t) = X(O)e-pt + .JlP fat e-p(t-s) dB(s).


312 Chapter 5. Monte Carlo Simulation

If E[X(O)] = 0 and E[X(0)2] = 1, then E[X(t)] = 0 and


E[X(t)X(s)] = e-p(t+s) +2p lot du los dve-p(t-u) e-p(s-v) E[dB(u)dB(v)]
= e-P (t+s) + 2 p Jo r/\s e-P (t-u) e-P (s-u) du = e-P It-s!
since the increments dB(t) "' N(O, dt) of B are independent. The above second moment
properties of X can be obtained directly from the mean and covariance equations (Sec-
tion 7.2.1.2). If X(O) is not a zero mean, unit variance random variable, then X is not a
weakly stationary process. •

5.3.2.2 Fourier series. Stochastic processes


Let X(t), t e [0, -r], be an 1Rd-valued non-stationary Gaussian process
with mean zero and covariance functions Ci,j(t, s) = E[Xi(t) Xj(s)] fori, j =
1, ... , d and t, s e [0, -r].
Our objective is to develop an algorithm for generating samples of X. A
family of processes x<n> defined by partial sums of the Fourier series of X is
used to approximate this process. We show that the family of processes X (n) can
be used to generate samples of X in [0, -r] if the Fourier series of the covariance
functions Ci,j converge to Ci,j in the rectangle R = [0, -r] x [0, -r], -r E (0, oo).
We introduce some notation and definitions prior to presenting a Monte
Carlo algorithm for generating samples of X. Let v 1 = 2rr I -r and Vk = k v1,
where k = 1, ... , ni, ni :=: 1 are integers, i = 1, ... , d, and n = (n1. ... , nd).
Consider the weighted averages

Aik =~ (' Xi(U) cos (Vk u) du, k = 0, 1, ... , ni,


· -r lo
Bik
·
=~
-r
r
lo
Xj(U) sin(VkU) du, k=1, ... ,nj, (5.24)

of Xi in [0, -r ]. The following equation defines the sequence of processes X (n)


used to approximate X and generate samples of this process.

If the Fourier series of Ci,J converge to the covariance functions Ci,J of X at


every interior point of the rectangle R = [0, -r] x [0, -r ], then the process X (n)
with coordinates

Xzn;) (t) = A~o + f:


k=l
[Ai,k cos (vk t) + Bi,k sin (vk t)] , i = 1, ... , d,

(5.25)
becomes a version of the Gaussian process X as ni --+ oo, i = 1, ... , d.
Proof: The convergence of the Fourier series of ci,J to ci,j in R can be proved under
various conditions. For example, if Ci,j is continuous and its first order partial derivatives
5.3. Stochastic processes and random fields 313

are bounded in R, then the Fourier series of c;,j converges to c;,J at every interior point of
R in the vicinity of which the mixed derivative iPc;,j jot as exists. This property holds in
the special case in which c;,J and its partial derivatives oc;,J jot, oc;,J jos, and a2 c;,J jot as
are continuous in R ([183], Chapter 7).
The Fourier series of c;,j in R is

L
00

c;,j (t, s) = Ak,l [ ai~·j) cos(vk t) cos(vl s) + bi~·i) cos(vk t) sin( vis)
k,/=0

+ci~·i) sin(vk t) cos(vis)+ di:·J) sin(vk t) sin(vis)]


where
c ") = - 2
ak'/ 4la'f la'f
du dv Ci,j (u, v) cos(vk u) cos(vl v),
' T 0 0

b c'·1") = c c'· 1") = -4la'f du la'f dv c; · (u, v) cos(vk u) sin(vl v) and


k,l l,k T2 O O ,j

c ") = -4la'f
dk'/ 2 du la'f dv c; j (u, v) sin(vk u) sin(v/ v),
' T 0 0 '
fori, j = 1, ... , d, k,l = 0, 1, ... , vo = 0, Ak,l = 1j4 fork = l = 0, Ak,l = 1j2 for
k > 0, l = 0 or k = 0, l > 0, and Ak,l = 1 for k,l > 0 ([183], Section 7.2).
The covariance functions c~~·ni) (t, s) = E [ X~n;)(t) XJni\s)] represent partial

sums of the Fourier series ci,j (t, s) since abj) = E[A;,k A j,ll, bkY) = E[Ai,k B J,zl,
d d(i,J) E[B B ] H . R
ck,l = i,k j,l , an k,l =
(i,J) E[B A ] (n;,nj)
i,k j,l . ence, ci,j converges to Ci,j m as
ni, n i --+ oo because the Fourier series of c;,j is convergent in R by hypothesis. Hence,
x<n) becomes equal to X in the second moment sense as n1, ... , nd --+ oo. Because
x<n) and X are Gaussian processes, x<n) becomes a version of X as more and more terms
are used in Eq. 5.25. This property justifies the use of samples of x<n) as a substitute for
samples of X if all ni 's are sufficiently large.
We also note that the coefficients Ai,k> B;,k exist in m.s. since E[IAi,k B J,lll :::;
2 E[B j,l]
( E[Ai,k] 2 )1/2 and, for example,

E[Af k] = ~ {r du {r dv Ci,i (u, v) cos(vk u) cos(vk v) :::; ~ fr du {' dv lci,i (u, v)l,
• T Jo lo T Jo lo
and the last integral is bounded. •

Monte Carlo algorithm. The generation of samples of X in a time interval


[0, T] involves two steps:
1. Generate samples of the correlated Gaussian random variables (A i,k, Bi,k')
fori= I, ... , d, k = 0, 1, ... , n;, and k' = 1, ... , ni.
2. Calculate the corresponding samples of X (n) in [0, -r] from Eq. 5.25.
Note: The methods in Section 5.2.1 can be used to generate samples of the correlated
Gaussian variables (Ai,b B;,k') defined by Eq. 5.25. A
314 Chapter 5. Monte Carlo Simulation

Example 5.14: Let X be an Omstein-Uhlenbeck process defined by the stochas-


tic differential equation dX(t) = -pX(t)dt + ,J'lPdB(t), where t :=::: 0, B
is a Brownian motion, and p > 0. The mean and correlation functions of X
are p.,(t) = E[X(t)] = xoe-Pt and r(t,s) = E[X(t)X(s)] = pe-p(t+s) +
e 2 P min(t,s)-p (t+s) for X (0) = xo. The coefficients of the family of approxima-
tions x<n) of X in Eqs. 5.24 and 5.25 with d = 1 and n 1 = n can be obtained in
closed form [77].
Figure 5.10 shows with heavy and thin lines the evolution in time of the

1.4 , - - - - " T - - - , - - - - , - - - . . . , - - - - - . . - - - - - - ,

1.2 c(t,t)

0.8

0.6
n=lOO
0.4 n=50
n=5

0.5 1.5 2 2.5 3


time

Figure 5.10. Time evolution of exact and approximate variance functions, c(t, t)
and c<n,n) (t, t) for xo = 0, p = 0.6, and r = 3

exact and approximate variance functions, c(t, t) and c<n,n)(t, t), for xo = 0,
p = 0.6, r = 3, and several values of n. The approximate variance functions
c<n,n)(t, t) correspond to a function c* defined on [-r, r] x [ -r, r] and extended
periodically to the entire plane JR2 . This function is such that c*(t, s) = c(t, s)
for (t,s) E R = [O,r] x [0, r], c*(t,s) = c(-t, -s) for (t,s) E [-r,O] x
[-r, 0], and c*(t, s) = 0 for (t, s) E [0, r] x [-r, 0] U [-r, 0] x [0, r]. The
approximate covariance function c<n,n)(r, r) approaches c(r, r)/2 as n increases
since the periodic extension c* of c has a jump discontinuity on the boundary of
[-r, r] x [-r, r]. The difference between c<n,n)(r, r) and c(r, r) can be elimi-
nated by using a periodic function c** that coincides with c in [0, r] x [0, r] and
is continuous on the boundaries of [0, r] x [0, r]. <>

Note: The Fourier series of the covariance function c(t, s) = r(t, s) - {-t(t) f..t(s) of X is
convergent and converges to c(t, s) since this function is continuous in [0, -r] x [0, -r] and
its right and left partial derivatives are continuous almost everywhere in this set. •
5.3. Stochastic processes and random fields 315

5.3.2.3 Fourier series. Random fields


We restrict our discussion to real-valued random fields defined on R 2 , that
is, d = 1 and d' = 2. The extension to the general case is straightforward.
Our objective is to generate samples of an inhomogeneous Gaussian field X with
mean zero and covariance function c(t, s) = E[X(t) X(s)] in a bounded sub-
set of R2 . We first introduce some notation and definitions. It can be shown
that the functions eR (k 1 t 1+kz tz), kt, k2 = 0, 1, ... , define an orthogonal sys-
tem on R = [-rr, rr] x [ -rr, rr] in JR2 . An alternative representation of these
functions is cos(kt 1t) cos(k2 12), cos(kt 1t) sin(k212), sin(kt 1t) cos(k2 12), and
sin(kt 1t) sin(k2 12). Similarly, the family of functions eR Ck1 t1+kz tz+k3 t3+l4 t4),
kt, k2, k3, k4 = 0, 1, ... , defines an orthogonal system on R* = R x R. An
alternative representation of this system is n:=l g, (k, 1, }, where gl g2 g3 g4 are
distinct products of two sine and two cosine functions with the arguments of the
functions g,. The Fourier series of a real-valued periodic function defined on R or
R* converges absolutely and uniformly if its second partial derivatives exist and
are piece-wise continuous ([43], Section 11.5).
Consider the weighted averages

Ao =~ { X(t)dt,
(2rr) JR

Ak1,k2 =:2 1 X(t) cos(kt 1t) cos(k2 12) dt,

of X in R. Let also Bk1,k2 , Ck 1,k2 , and Dk1,k2 be averages of X over R weighted by


the functions cos(kt 1t) sin(k2 12), sin(kt 1t) cos(k2 12), and sin(kt 1t) sin(k2 t2),
respectively. The coefficients Ao, Ak1,k2 , Bk1,k2 , Ck 1,k2 , and Dk1,k2 are correlated
Gaussian variables.

If the Fourier series of c converges to the covariance function c of X at every


interior point of R*, then
n
xCn)(t) = Ao + L [Ak 1,k2 cos(kt tt) cos(k2t2)
k1,kz=l
+ Bk1 ,k2 cos(kt tt) sin(k2t2) + Ck 1 ,k2 sin(kt 1t) cos(k2t2)
+Dk1 ,k2 sin(kt 1t) sin(k2 t2)] (5.26)

becomes a version of the Gaussian field X as n ~ oo.

Proof: The Fourier series of c(t, s) is

L
00

c(t, s) = ao + akl ,kz,k3,k4 gl (kt It) g2(k2 t2) g3(k3 St) g4(k4 S2),
k1 ,kz,k3,k4=l
316 Chapter 5. Monte Carlo Simulation

where t = (tt. t2), s = (st. s2) and


ao = ~ f c(t,s)dtds,
JR*
L.
(2n)

ak,,k2 ,k3 ,/4 = : 4 c(t,s)gt(kt t1)g2(k2t2)g3(k3s1)g4(k4s2)dtds

with the appropriate expressions for the functions gr. The covariance function of x(n)
is equal to the partial sum of the Fourier series c. If the Fourier series of the covariance
function of X is convergent in R*, then the covariance function of x(n) converges to the
covariance function of X as n--+ oo. Because X and x(n) are Gaussian, x(n) becomes a
version of X as n --+ oo. Conditions for the convergence of Fourier series can be found,
for example, in [43] (Section 11.5) and [183] (Chapters 1 and 7). •

Monte Carlo algorithm. The generation of samples of X in a bounded rectan-


gle R involves two steps:
1. Generate samples of the correlated Gaussian variables Ao and Ak,,k2 , Bk,,k2 ,
Ck, ,k2 , and Dk,,k2 , kt, k2 = 1, ... , n (Section 5.2.1).
2. Calculate the corresponding samples of x(n) from Eq. 5.26.

The above Monte Carlo simulation algorithm is useful in many applications


involving functions that vary randomly in space. We will use this algorithm in
Chapters 8 and 9 to generate samples of the spatial variation of the atomic lattice
orientation and other material properties.

5.3.3 Non-Gaussian processes and fields


We consider non-Gaussian functions defined by transformations with and
without memory of Gaussian functions as well as point and related processes.
These processes have been discussed in Chapter 3. Our objective is to develop
Monte Carlo algorithms for generating samples of these classes of non-Gaussian
random functions.

5.3.3.1 Memoryless transformations


Let Y(t), t E r',
be an ~d -valued stationary Gaussian process (d' = 1)
or homogeneous Gaussian field (d' 2:: 1). The argument of Y is viewed as time
for stochastic processes and space for random fields. The coordinates of Y have
mean zero, variance one, and covariance functions Pi,j (T) = E[Yi(t + T) Yj(t)],
i, j = 1, ... , d. Consider also a continuous mapping g : ~d --+ r.
The non-
Gaussian random function
X(t) = g(Y(t)), (5.27)
is called a translation stochastic process or random field (Section 3.6.6). It is
common to define the coordinates of X by the transformations X i ( t) = g i ( Yi ( t)),
5.3. Stochastic processes and random fields 317

i = 1, ... , d, depending only on the corresponding coordinates of Y. For exam-


ple, we may take g i = Fj 1 o ci>, where Ft, i = 1, ... , d, are some distributions
and ci> denotes the distribution of N(O, 1) ([79], Section 3.1). The random func-
tions Xt have the marginal distribution Ft and correlation functions

~i,j (r) = E[Xt (t + T) Xj (t)) = k


2
gi(u) gj(V) </J(u, v; Pi,} (T)) du dv, (5.28)

where¢(·,·; p) denotes the density of an JR2 -valued Gaussian variable with mean
zero, variance one, and correlation coefficient p. If d = 1, the finite dimensional
densities of X are

(5.29)

where Y = (Y(tt), ... , Y(tn)) is a Gaussian vector with mean zero and covari-
ance matrix p = {p(ti- lJ) = E[Y(ti) Y(tJ)]}, Xt E lR, It E JRlf', i = 1, ... , n,
¢ (y1, ... , Yn; p) denotes the density of Y, and Yi = ci> - 1 ( F (xt)). Similar results
can be found for JRd -valued random functions with d > 1.
The translation processes and fields are stationary, can follow any marginal
distribution, and their correlation function is completely defined by the marginal
distributions Fi and the covariance functions Pi,J of Y. Generally, the marginal
distributions Ft and the correlation functions ~i,J are available in applications,
for example, they may be estimated from records of X. Hence, the covariance
functions Pi,J defining the underlying Gaussian function Y have to be calculated
from Eq. 5.28. It turns out that the functions (Ft, ~i,J) cannot be arbitrary. They
must be such that the solutions Pi,J of Eq. 5.28 are covariance functions ([79],
Section 3 .1.1 ). If this condition is not satisfied, there is no translation model with
the specified properties (Ft, ~i,J ).

Monte Carlo algorithm. The generation of samples of X involves two steps:


1. Generate samples of Y by any of the methods in Section 5.3.
2. Calculate the corresponding samples of X from Eq. 5.27.

Example 5.15: Suppose that X is a real-valued translation process with mean


zero, covariance function E[X(t + r)X(t)] = e-alrl (3 + 2e-2alrl)j5, a >
0, and marginal distribution F(x) = ci> (lxl 1/ 3 sign(x)). This process can be
defined by the memory less transformation X (t) = Y (t) 3 , where Y is a stationary
Gaussian process with mean zero and covariance function p(r) = e -a lrl ([79],
Section 3.1.1).
Figure 5 .11 shows a sample of Y and the corresponding sample of X for
a = 1. The sample of Y has been obtained by the spectral representation method
in Eq. 5.12 with v* =50 and n = 500. <>
318 Chapter 5. Monte Carlo Simulation

-2

~o~----~5o-------1~oo_______
1L5o_______
200L______2~5-o----~soo

30,------.------~-------,------~------~----~

Sample of X = Y
20

-20

-30 ' - - - - - - - - - ' - - - - - - - - ' - - - - - - - - - ' - - - - - - - - - " - - - - - - - - - ' - - - - - - _ _ _ J


0 50 100 150 200 250 300
time

Figure 5.11. Samples of a Gaussian process Y with an exponential covariance


function and of its translation image X = Y 3

Example 5.16: Let Y(t) = (a1 Y1 (ti) + a1 Y2(t2))) (af+ai)- 112 be areal-valued
Gaussian field, where t = (t1, t2) E R2 , ak are constants, and Yk are stationary
Gaussian fields with mean zero, variance one, and spectral densities that are zero
outside the frequency ranges [ -v;, vkJ, k = 1, 2, and take constant non-zero val-
ues in these ranges.
Figure 5.12 shows a sample of Y(t) fort E [0, 5] x [0, 5] and the translation
image of this samples defined by the memoryless transformation X (t) = Y (t) 3 .
Results are for a1 = a2 = 1, vi = 10, and vi
= 5. The samples of Y1 and Y2
have been obtained by the algorithm in Eq. 5.22 with n 1 = n2 = 5. <>

The class of translation processes in Eq. 5.27 can be extended to the class of
mixtures of translation processes (Section 3.6.7). The non-Gaussian processes in
this class can match not only correlation and marginal distributions as translation
processes, but also higher order correlation functions. This property is illustrated
by the following example.

Example 5.17: Let Xk(t) = (eYk(t) - e 112) j-./ e2- e, k = 1, 2, be lognormal


translation processes, where Y1 and Y2 denote two independent stationary Gaus-
5.3. Stochastic processes and random fields 319

Figure 5.12. Samples of a Gaussian field Y and of its translation image X = Y3


(adapted from [79], Fig. 3.7)

sian processes with mean zero and covariance functions


Pl (a) =e-A. lal, ). > 0,

( sin ((vb - Va) a/2) cos ((vb + Va) a/2)


~~=
(vb- Va)a/2
,
The marginal distribution and density of the processes X k are

F(x) = <I>(log(x .Je=1 + 1) + 1/2)),


f(x) =
x
~
e-1+1
<f>(log(x .Je=1 + 1) + 1/2) (5.30)

for x > -ej,Je 2 - e. The first and second order correlation functions of the
processes X k are
ePk(a) _ 1
gk(a) = E [Xk(t) Xk(t +a)]= ,
e-1
ePk(a)+pk(u)+pk(a-u)
~k(a, a) = E [Xk(t) Xk(t +a) Xk(t +a)] = (e _ 1) 312 (5.31)
320 Chapter 5. Monte Carlo Simulation

Let X be a mixture of the translation processes X 1 and X2 so that the fi-


nite dimensional distributions, Fn and F~k), of any order n ~ 1 of X and X k.
respectively, are related by Fn = .Ei=
1 Pk F~k), where Pk ~ 0 and .Ei=
1 Pk = 1
(Section 3.6. 7). The Monte Carlo algorithm in Section 5.3.1.3 based on the sam-
pling theorem can be applied to generate samples of X. However, the generation
i~ less simple in this case since the distribution of the conditional random variable

X((n 1 + n + 2) t*) in Eq. 5.20 cannot be obtained analytically.


Let j< 2 n+ 2) and J<2 n+ 3) be the joint density functions of

(X((n 1 - n) r), ... , X((n 1 + n + 1) r)) and


(X((n 1 - n) r), ... , X((n 1 + n + 1) r), X((n 1 + n + 2) r)),

respectively. Let (X((n 1 - n) r, w) = Z1, ... , X((n 1 + n + 1) r, w) = Z2n+2) be


a sample of the first vector. The density of the conditional vector in Eq. 5.20 is
j(x I z) = J< 2 n+ 3)(z, x)jJ< 2 n+ 2)(z), where Z = (Z1, ... , Z2n+2).
Figure 5.13 shows five samples of X for A.= 1/5, Va = 7, and Vb = 10 cor-
responding to P1 = 0, 1/4, 1/2, 3/4, and 1. The samples have been generated by
the Monte Carlo algorithm in Eqs. 5.20 and 5.21 using a nodal spacing t * = 0.1
and a window size n = 5. The samples illustrate the dependence of the correlation
structure of X on the correlation functions of the constituent processes X k and the
weights Pk of these processes in the definition of X. Figure 5.14 shows the first
order correlation function of X for p 1 = 1/2. Consider also a translation process
XT with the marginal distribution F in Eq. 5.30 and the covariance function in
Fig. 5.14. Figure 5.15 shows a three dimensional view and contour lines of the
difference ~ - ~T between the second order correlation functions of X and X T.
The figure suggests that the translation process X T may be inadequate to model
X if the second order correlation function of this process needs to be represented
accurately in addition to its marginal distribution and first order correlation func-
tions.<>
Note: There is no simple and efficient way to generate samples from/(· I z) since the
vector z changes in time. The following algorithm has been used for the calculations in the
figures. Let u be a sample of a random variable uniformly distributed in (0, 1) and denote
by (a, b) the range of/(· I z) used for numerical calculations. If u ::::; 0.5, integrate the
J: /Cot I z) dot = u. If u > 0.5,
J:
conditional density from the left to find x such that
integrate from the right to find x satisfying the condition j(a I z) dot = 1 - u. The
solution xis a sample of/(· I z). A

5.3.3.2 Transformations with memory


If the current value of a stochastic process X is obtained from the past
history of another process Y, the mapping from Y to X is said to have memory.
Transformations with memory can be defined by differential equations with input
Y and output X. These transformations cannot be used to define random fields
because, unlike time, space does not have a natural flow.
5.3. Stochastic processes and random fields 321

':~
-10
100 20 40 60 80 100

0~
-10

,:~r
-10
100 20 40 60 80 100

0~
- 1 0 ·

':~~0
-10L----------~------~----------------------~--------L-------~------~
0 ~ ~ w 00 100
time

Figure 5.13. Five samples of X(t) corresponding to p 1 = 0, 114, 112, 3/4, and 1

~('t)

0.8

oL---~--~----~--~--~
0 2 't 3 4 5

Figure 5.14. First order correlation function~ of X for p 1 = 1/2

Our objective is to generate samples of X. It is assumed that (1) X is an


JE.d -valued process defined by a differential equation with random input Y, (2) the
defining differential equation for X is known, and (3) Y is Gaussian, Poisson,
Levy, or sernimartingale noise.
322 Chapter 5. Monte Carlo Simulation

0.4

0.3

0.2

0.1

0
0
0

2 3
3 3 (j
(j

Figure 5.15. Difference~ - ~T of the second order correlation functions of the


processes X and X T

Monte Carlo algorithm. The generation of samples of X involves two steps:


1. Generate samples of the driving noise Y.
2. Integrate the differential equation of X driven by samples of Y to obtain
samples of this process.

Note: The numerical methods in Section 4.7.3 can be applied to the differential equation
defining X to calculate samples of X from samples of Y. This section considers only
the first step of the above Monte Carlo algorithm, that is, the generation of samples of Y,
assumed to be a Gaussian, Poisson, or Levy noise. .l

Gaussian white noise. Let B be a real-valued Brownian motion. The increments


of B in non-overlapping time intervals are independent Gaussian variables with
mean zero and variance equal to the duration of these time intervals, that is, the
successive time increments I::!..B(t) = B(t+l::!..t)-B(t), l::!...t > 0, of Bare indepen-
dent N(O, l::!..t) variables. If the driving noise is an JEd' -valued Brownian motion B
with independent coordinates, then its increments I::!..B(t) = B(t + l::!..t)- B(t) are
independent JEd' -valued Gaussian variables with mean zero and covariance matrix
i l::!..t, where i denotes the (d', d') identity matrix.
Samples of B can be generated by the recurrence formula

B(tk+l) = B(tk) + .;t;:i randn(d', 1), (5.32)

where l::!..t > 0 is a time increment, tk = k l::!..t, k = 0, 1, ... , and B(to) = 0.


5.3. Stochastic processes and random fields 323

Note: The time increment !':!..t should correspond to the time step used to integrate the dif-
ferential equation for X. The algorithm in Eq. 5.32 is based on properties of the Brownian
motion process . .A.

Poisson white noise. Let C(t) = 'Lf~? Yk be a compound Poisson process,


where N denotes a Poisson process with intensity A. > 0 and Yk are independent
copies of a random variable Y J. The process C has stationary independent incre-
ments. If A. /:).t « 1, the increment of C in (t, t + [).t] is either one of the variables
Yk or zero, with probabilities 1 - e-'A !'it ::::: A. /:).t or e-'A 1'1t ::::: 1 -A. /:).t, respec-
tively. If Y1 E Lz, the mean and covariance function of Care E[C(t)] =A. t E[Y J]
and E[(C(t)- E[C(t)])(C(s)- E[C(s)])] =A. min(t, s) E[Yf]. If E[YJ] = 0
and A. E[Yf] = 1, C and the Brownian motion B are equal in the second mo-
ment sense. However, the samples of these processes differ significantly. The
Brownian motion has continuous samples a.s. while the samples of C have jump
discontinuities.

Samples of C in a time interval [0, r ], r > 0, can be generated in two steps:


1. Generate samples of the jump times h and the jumps Yk of C.
2. Calculate samples of C corresponding to the samples of Tk and Yk.

Note: The number of jumps of C in (0, r] is given by a Poisson random variable N(). r)
with intensity ). r (Example 5.3). The jump times of C are uniformly distributed in (0, r]
conditional on the number of jumps. The generation of N (). r) independent samples of Y1
can be based on the methods in Sections 5.2.1 and 5.2.2. Additional information on the
generation of samples of C can be found in Sections 5.3.1.1 and 5.3.1.2.
The above approach can be extended to~~ -valued compound Poisson processes
C(t) = L,f~? Y b where Y k are iid Rd' -valued random variables . .A.

Levy white noise. The general form of a Levy process L is given by the Levy
decomposition in Section 3.14.2. The Levy process includes the Brownian mo-
tion and the compound Poisson processes so that the Gaussian and Poisson white
noise processes are special cases of the Levy noise. We have already generated
samples of L using the representation of this process in Section 3.14.2. We do not
restate this Monte Carlo algorithm. Instead, we give a Monte Carlo algorithm for
generating samples of an a-stable process La, that is, a Levy process whose incre-
ments [).La(t) = La(t + /:).t) - La(t) are independent Sa(([).t) 1fa, 0, 0) random
variables (Example 5.1).

Samples of La can be generated by the recurrence formula

(5.33)

where fk = k !':!...t, k = 0, 1, ... , La(O) = 0, and Zk are iid ~SaO, 0, 0).


324 Chapter 5. Monte Carlo Simulation

Example 5.18: Let La(t), t E [0, r], be an a-stable Levy process. Samples of
this process for a= 1.0, 1.5, 1.7, and 1.99 have been generated in Section 3.14.2
based on a representation of La as a sum of compound Poisson processes with
small and large jumps. Figure 5.16 shows samples of La for the same values of

500,--------~---------, 20,----~------,

a=l a=l.5

-500

-1000

-1500L__-~-~-~-~-_j -80'----~-~-~-~-----"

0 20 40 60 80 100 0 20 40 60 80 100
time time

10,--------------,
a=l.7
0

20 40 60 80 100 20 40 60 80 100
time time

Figure 5.16. Samples of a Levy process

a generated by the algorithm in Eq. 5.33. The size of the jumps of La decreases
with a. The sample of La for a = 1.99 resembles the sample of a Brownian
motion process. <>

Semimartingale noise. We have seen that any semimartingale admits a decom-


position in an adapted process with dtdlag samples of finite variation on compacts
and a local martingale (Section 4.4.1). The generation of samples of a process
with finite variation on compacts does not pose difficulties. An example of such a
process is the compound Poisson process. We focus here on the generation of the
martingale component of a semimartingale. Most of the martingales used in this
book are defined by (1) sums of two stochastic integrals with Brownian motion
and compensated Poisson processes integrators or (2) memoryless transforma-
tions of Brownian motions. Samples of these semimartingales can be calculated
from samples of Brownian motion and compound Poisson processes.
Example 5.19: We have seen that a Brownian motion B and a compound Poisson
process C can have the same second moment properties. Another example of a
5.3. Stochastic processes and random fields 325

process that is equal to B in the second moment sense is the process M * with
increments dM*(t) = B(t) dB(t)/ ,Jt, t > 0. Figure 5.17 shows several samples
of B and the corresponding samples of M *. The sample properties of these two

Figure 5.17. Samples of a Brownian motion and the corresponding samples of


M*

processes differ significantly although their increments have the same first two
moments and are not correlated.
It is common in the engineering literature to define white noise in the sec-
ond moment sense. This example demonstrates one more time the need for a
precise definition of the white noise process that has to go beyond second moment
properties. ¢
Proof: Consider the martingale M(t) = B(t) 2 - t. Denote by dM*(t) = B(t) dB(t)j./i,
t > 0, the increments dM(t) = 2 B(t)dB(t) of M scaled by 2./i. The first two moments
of the increments of M* are E[dM*(t)] = 0 and E[dM*(t) dM*(s)] = 8(t- s) dt so that
M* and B are equal in the second moment sense. However, while the increments of B are
independent, those of M* are not.
An algorithm for generating samples of B was discussed previously in this section.
The recurrence formula M*(t + dt) = M*(t) + dM*(t) can also be used to produce
samples of M*. •

5.3.3.3 Point and related processes


We have already used point processes in Examples 5.8 and 5.10 to pro-
duce samples of stationary Gaussian processes and fields. This section provides
algorithms for generating samples of (1) two point processes, the binomial and
the Poisson processes, (2) a compound Poisson processes, and (3) the Voronoi
tessellation.
Binomial process. Consider a bounded Borel set B in JRd' with Lebesgue measure
vB, that is, vB is the volume of B. A random point T E B is said to be uniformly
326 Chapter 5. Monte Carlo Simulation

distributed in B if T belongs to B' c B, B' E B(r'), with the probability


PB' = P(T E B') = VB' /VB.
Let A~n), i = 1, ... , n, be a partition of B in Borel sets and denote by v rn)
the volwne of A~n). The probability that a uniformly distributed point in B falls in
A~n) is p}n) = vjn) /VB. Consider k independent uniformly distributed points in B,
and let ki ::: 0 denote the number of these points falling in A }n), so that we have
2:7= 1 ki = k. The probability that ki points are in A~n), i = 1, ... , n, referred to
as the multinomial probability, is

kl
k1! · ···kn! n V~B
n

1=1
( ~n))ki
(5.34)

r),
The special case of Eq. 5.34 for n = 2 and k2 = k - k1 gives the probability that
k1 ::: 0 out of k independent uniformly distributed points in B fall in A that is,
the binomial distribution. If the number of points k in B and the probability p A 1
that a uniformly distributed point in B falls in A 1 tend to infinity and zero, respec-
tively, such that k p A 1 is a constant A. vB, the binomial distribution converges to a
Poisson distribution with intensity A..

Monte Carlo algorithm. The generation of n independent uniformly dis-


tributed points in a Borel set B c Rd' involves two steps:
1. Select a rectangle R c !Rd' including B.
2. Generate independent uniformly distributed points in R until n of these points
fall in B. The points in B are uniformly distributed in this set.
Note: The MATLAB function rand can be used to generate independent, uniformly dis-
tributed samples in the unit cube [0, l]d'. The columns ofrand(d1 , n) give the coordinates
of n such samples. Uniformly distributed samples in R can be obtained by translating
and scaling the entries of rand(d1, n). Alternative methods can be developed to produce
uniformly distributed samples in B ([180], pp. 38-40). A

Poisson process. A stationary or homogeneous Poisson process can be viewed as


a random measure
N : ( Rd' , B(Rd' ) ) ---+ (Z+, K) , (5.35)
where the a-field K consists of all the parts of Z+ = {0, 1, ... } and N(B) gives
the number of points in a bounded Borel set B E B(JRd'). This random measure
is defined by two properties: (1) N(B) is a Poisson variable with intensity A. v B·
where v B is the volume of B and A. > 0 is a constant and (2) the random variables
N(B) and N(B') are independent forB, B' E B(JRd') and B n B' = 0. These
properties of N imply

p (N(B1) = k1, ... ' N(Bm) = km)


m
=a~ exp -
(A. Vj )ki (
EAVj '
m )
(5.36)
5.3. Stochastic processes and random fields 327

where Bi E B(JE.d'), i = 1, ... , m, are disjoint subsets with volume vi. We


also note that (a) N is stationary and isotropic, that is, the process is invariant to
translation and rotation, (b) the conditional process N I (N(B) = n) is binomial,
and (c) E[N(B)] =A VB ([180], Section 2.4.1).
Consider a generalized Poisson process N characterized by a point depen-
dent intensity X(x), x E JE.d', and properties resembling N. For example, the prob-
ability of the event {N(B) = n} is (AB VB)n exp( -AB VB)/n! for n = 0, 1, ... ,
and B E B(JE.d') so that E[N(B)] = AB VB, where AB VB = JB X(x) dx. Also,
the random variables N(B) and N(B') are independent forB, B' E B(JE.d') and
B n B' = 0.

Monte Carlo algorithm. The generation of samples of N in B involves two


steps:
1. Generate a sample of the Poisson random variable N(B).
2. Place N (B) independent, uniformly distributed points in B.

Monte Carlo algorithm. The generation of samples of N in B involves two


steps:
1. Apply the above algorithm to generate samples of N * in B, where N* is a
Poisson process with intensity A* = maxxEB X(x).
2. Modify the resulting points of N* according to the following thinning rule. A
pointx E B of N* is eliminated with probability 1- X(x)/A *.This modification
is performed independently of the other points of this process.

Proof: The expectation n B = E [ L:1(B) I (T i) I A* Jgives the average number of points


retained from the N*(B) points in B since a Poisson point Ti is kept with probability
l(Ti)/A*. Wehave

"B ~E {E [f i(T;)/A' I N'(B)] l~ :, E ( N'(B) E [XCT;)]l

= -AB E[N *(B)]= AB VB


A*
since the points T i are mutually independent and uniformly distributed in B conditional
on N*(B), AB = (lfvB) fB l(x)dx = E[l(Ti)], and E[N*(B)] =A* VB· Hence, the
expected number of points kept from N* (B) is equal to E [N] = AB v B. •

Example 5.20: Let N be a Poisson process with intensity A > 0 defined on JE. 2
and B = [0, 1] x [0, 1] be a Borel set in this space. Consider also a generalized
PoissonprocessN with intensity X(x) = 6 AXf x2 defined on the same space. The
processes Nand N have the same average number of points in B. Figure 5.18
shows samples of N and N in B, respectively, for A = 100. The points of N are
uniformly spread in B while the points of N are concentrated in the upper right
comer of B where X takes on larger values. <>
328 Chapter 5. Monte Carlo Simulation

Samples of ./If
..Samples of N ·..
.....
. . .: . . :
\

0.8
. ... 0.8
.. . .. .· .. ···: .
.··.: .. ....· ...·· ..
0.6:: 0.6
\ ·...
. .. .. ..
. ·:
0.2 ..... 0.2
..
oL_~--~~--------~ OL-------~--------~
0 0

Figure 5.18. Samples of the Poison processes N and N

Compound Poisson processes in space. Consider the real-valued compound


Poisson process C(v) = L:f~f) Yk lB(Tk) (v) in Eq. 5.16, where B c m_d' is
a bounded Borel set, N(B) denotes a Poisson variable with intensity A v B. A > 0,
VB is the Lebesgue measure of B, T = (T1, ... , Ta') are points of N in B,

B(T) = B n ( x1~ 1 [Tj, oo)), and Yk denote independent copies of a random


variable Y1 with finite first two moments.

Monte Carlo algorithm. The generation of samples of C in B involves two


steps:
1. Generate a sample of the Poisson random variable N(B), N(B) independent,
uniformly distributed points T k in B, and N (B) independent samples of Y1·
2. Calculate the corresponding samples of C from its definition and samples of
Tk and Yk.

Voronoi tessellation. The Voronoi tessellation is used in forestry, astrophysics,


biology, material science, geology, physics, ecology, and other fields of applied
science and engineering ([180], Section 10.2).
Let P denote the class of open convex non-empty polygons p in R 2 • A
subset of P is said to be a planar tessellation if its elements p are disjoint, the
union of their closure fills the plane, and the set {p : p n B i= 0} is finite for any
bounded B E B(R2 ). A random tessellation is a random variable with values in
the class of tessellations.
Let N be a Poisson process defined on m_d' with intensity A > 0. Almost
all points x E m_d' have a nearest point T x in the set {T 1 , T 2, ... } of points of N.
5.4. Improved Monte Carlo simulation 329

The open convex polytopes defining the Voronoi tessellation are

Pk = {x E !Rd' :II x- Tk 11<11 x- T1ll,l #k}, k = 1,2, ... , (5.37)

that is, the polytope k consists of the points in JRd' that are closest to T k.

Monte Carlo algorithm. The Peneration of a sample of a Voronoi tessellation


in a bounded Borel set B of JRd involves two steps:
1. Generate a sample of a Poisson process N in B, that is, a sample of the vector
(T1, ... , TN(B))·
2. Construct the edges of the polytopes defining the Voronoi tessellation, that
is, the hyperplanes that pass through the midpoint of the segments connecting
adjacent points of N(B) and are orthogonal to these segments.
Note: The Voronoi tessellation can be generalized in various directions, for example, the
Poisson process N may not be stationary and/or the hyperplanes defining the polytope
edges may not intersect the segment between adjacent Poisson points at its midpoint. ~
Example 5.21: Consider the inhomogeneous Poisson process N in Example 5.20.
Figure 5.19 shows samples of Voronoi tessellations in B = [0, 1] x [0, 1] corre-

Homogeneous Inhomogeneous

0.8

0.6
~N

0.2

0
0 0.2 0.4 0.6 0.8

X!

Figure 5.19. Samples of Voronoi tessellations corresponding to the homogeneous


and inhomogeneous Poisson processes in Example 5.20

sponding to the samples of Nand N in Fig. 5.18. The sample of the Voronoi
tessellation obtained from the sample of N shows that it is possible to generate
microstructures with grains of a wide range of sizes and shapes. <>

5.4 Improved Monte Carlo simulation


The main limitation of the Monte Carlo method is the computation time,
which can be excessive. The time required to obtain a sample of the state X of
330 Chapter 5. Monte Carlo Simulation

a physical system and the number of samples needed to estimate properties of X


determine the computation time. Software packages developed for the integration
of deterministic differential equations are used to calculate samples of X from
input samples. Generally, (1) marginal reductions of the computation time can be
obtained by modifying existing packages for integrating deterministic differential
equations since most of these packages are efficient and (2) a large number of
output samples may be needed in a broad range of applications. For example,
suppose that the probability Ps ( r) that X does not leave a safe set D during a time
interval [0, r], that is, the system reliability in [0, r], needs to be calculated. If
Ps(r) is of the order 1- w- 5 , we need at least 10 6 samples of X to estimate this
probability.
It is shown that a stochastic problem can be modified by changing the time
and/or the measure such that the Monte Carlo solution of the modified problem
can be based on fewer samples of X and the calculation of each sample of this
process may even require less computation time. We use numerical examples to
assess the usefulness of changing the time and/or the measure in Monte Carlo
studies.

5.4.1 Time change


Suppose that X E m;d is the solution of the linear stochastic differential
equation

dX(t) = a(t) X(t) dt + b(t) dB(t), t 2:: 0, (5.38)

driven by a Brownian motion B E m;d', where a and b are such that this equation
has a unique solution (Section 4.7.1.1). Our objective is to estimate properties of
the solution X of this equation.
Let (} and X (0) be the Green function and the initial state, respectively, for
Eq. 5.38. The solution of this equation is X(t) = O(t, 0) X(O) + M(t), where

M(t)= fo 1
0(t,s)b(s)dB(s), t2::0, (5.39)

is a martingale under some conditions. For example, M is a square integrable mar-


tingale if(} b is square integrable in (0, t). We can generate samples of X from
(1) numerical solutions ofEq. 5.38 with B replaced by its samples or (2) samples
of M added to the function (} (t, 0) X (0). If X (0) is random, we need to use sam-
ples of X (0) in (} (t, 0) X (0). The latter approach is used in this section to generate
samples of X in conjunction with the time change theorem for martingales. Our
discussion is limited to a single driving noise, that is, B = B is a real-valued
Brownian motion.
5.4. Improved Monte Carlo simulation 331

Time change theorem. If M(t), t ::=:: 0, is a continuous local martingale such


that M(O) = 0 and limt-HlO[M, M](t) = oo a.s., then

M(t) = B([M, M](t)) a.s. fort ::=:: 0, (5.40)

where B is a Brownian motion ([147], Theorem 41, p. 81). We refer to t and


T(t) = inf{s > 0: [M, M](s) > t} as the original and the new clocks or times,
respectively. If T is deterministic, it is denoted by r.

Note: We only give here a heuristic justification of the time change theorem in a simplified
setting. Let (Q, :F, P) be a probability space with the filtration :F1 = a(B(s) : 0 :::;
s :::; t), t 2: 0, where B is a Brownian motion process. Let M(t) = J6
A(s) dB(s) and
assume that A is an :Fradapted real-valued process that is independent of B and has the
property limt--+oo J6 A(s) 2 ds = oo a.s. Conditional on A, the increments A(s) dB(s)
of M are independent and follow a Gaussian distribution with mean zero and variance
A(s) 2 ds. These increments have the same distribution as a Brownian motion B running in
a different clock T defined locally by dT(s) = A(s) 2 ds so that T(t) = J6
A(s) 2 ds and
limn--+oo T(t) = oo a.s. by hypothesis. If A(·) is a deterministic function a(·), the new
J6
clock T(t) = -r(t) = a(s) 2 ds is a deterministic increasing function.
We note that the time change in Eq. 5.40 modifies the diffusion coefficient since the
martingale in Eq. 5.39 giving the noise contribution to the solution of Eq. 5.38 becomes a
Brownian motion in the new clock. We will see that the change of measure considered in
the following section causes a change of the drift coefficient. A

Monte Carlo algorithm. The generation of samples of X involves two steps:


1. Generate samples B in the new clock and map them into the original clock to
find samples of M (Eq. 5.40).
2. Calculate the corresponding samples of X from X(t) = O(t, 0) X(O) + M(t),
where samples of X (0) need to be used for random initial conditions.

Note: The following examples show that this Monte Carlo algorithm is preferable to al-
gorithms based on a direct integration of Eq. 5.38 if the relationship between the new and
original clocks is deterministic. If this condition is not satisfied, the use of the time change
theorem in Monte Carlo simulation seems to be impractical. A

Example 5.22: Let X be defined by X(t) + 2 ~ v X(t) + v2 X (t) = {3 W (t), t ::=:: 0,


where ~ E (0, 1), v > 0, W(t) stands for the formal derivative of a Brownian
motion B, {3 is a constant, and (X(O), X(O)) = (Xo, Xo). We have

X(t) = e-{vt [XoOu(t) + Xo012(t) + M(t)], where

M(t) = f3 lot e{ vs 012(t - s) dB(s),

011 (t) = cos(v t) + ~ v sin(v t)fv, 012(t) = sin(v t)/v, and v = v ~-The
332 Chapter 5. Monte Carlo Simulation

time change theorem gives the representation

X(t) = e-svt [xoOn(t) +Xo812(t) + B('r)J, where

r(t) = (e) [<1- ~ 2)e2 svt -1


v
2
4~ v
+ ~ cos(2lit)- ~ sin(2lit)]
4v

denotes the new time. The mapping from the original to the new times is deter-
ministic in this case.
Figure 5.20 shows the relationship between the original and the new times

9 0.4,---~-~---~-----,

8
New time T(t)
7
0.3
6
5
0.25 \
\estimated
0.2
4
0.15
3
exact
2
0.1

2 4 6 8 10 2 4 6 8 10
Original time, t Original time, t

Figure 5.20. The time change and the evolution of the exact and the estimated
standard deviation of X

and the evolution of the exact and estimated standard deviation of X for v = 1,
~ = 0.05, and f3 = 0.2. The estimated standard deviation of X is based on 200
samples of this process generated in the new clock. <>

Note: The state X= (XI =X, x2 =X) is the solution of Eq. 5.38 with d = 2, d 1 = 1,
(a X)I = X2, (a X)2 = -v 2 X1 - 2 ~ v X2, b1 = 0, and b2 = {3. The process X repre-
sents the displacement of a linear oscillator with damping ratio ~ and natural frequency v
subjected to white noise.
The process M is a square integrable, continuous martingale. The new clock, r (t) =
[M, M](t) = {3 2 JJ e 2 svs B12 (t- s) 2 ds, is deterministic and approaches infinity as t---+
oo. The time change theorem states that M(t) = B(r) a.s. The samples of X used to
estimate the standard deviation of this process were obtained in two steps. First, samples
of B have been generated. Second, the corresponding samples of X have been calculated
from the samples of B and the relationship between M and B. The method is more efficient
than the classical Monte Carlo algorithms, which produce samples of X by integrating the
differential equation for this process driven by samples of W. A

Example 5.23: Consider the martingale M(t) = B(t) 2 - t, t 2: 0, where B is a


Brownianmotion. Thequadraticvariationofthismartingale, T(t) = [M, M](t) =
5.4. Improved Monte Carlo simulation 333

4 f~ B(s) 2 ds, converges to infinity a.s. as t ---+ oo so that the time change theo-
rem can be applied. Because the new time T is random, the mapping from B to M
is complex and its use in Monte Carlo simulation may be impractical. Figure 5.21
shows 20 samples of T obtained from samples of B in the original clock and the

800.-----------~--~--.
,...., 1
New clock, 15 B(s)=M(T- (s))
700
T(t)=[M,M](t)
600

500

400

300

200
-15
-20 L__----~----~-------'
0 10 20 30

Original clock, t New clock, s

Figure 5.21. Samples of the new clock T and the Brownian motion B
corresponding samples of B(s) = M(T- 1 (s)). The process B is a Brownian
motion in the new clock according to the time change theorem. ¢
Note: Because M(t) = 2 JJ B(s) dB(s), we have (Section 4.5.3)

T(t) = [M, M](t) = 4 [f B(s) dB(s), f B(s) dB(s) J= 4 f B(s) 2 ds

so that T is a random process with increasing samples. Define the sequence of stopping
times T1 = inf{t > 1 : B(t) = 0}, T2 = inf{t > T1 + 1 : B(t) = 0}, ... , and note that
(1) fo
00B(s) 2 ds = x1 + x2 + ... 'where x1 = ![
1 B(s) 2 ds, x2 = IP
B(s) 2 ds, .. .,
(2) the random variables X 1, X 2, ... are independent and identically distrib~ted, and (3) the
expectation of X 1 is

E[Xd =E [loTI B(s) 2 ds]::: E [fo1B(s) 2 ds] = fol E[B(s) 2]ds = 1/2.
We have JJ
B(s) 2 ds -+ oo a.s. as t -+ oo because Lk=
1 X; converges a.s. to infinity
as n -+ oo (Section 2.14 in this book, [150], Proposition 7.2.3, p. 563) and its limit is
fooo B(s)2 ds.
The time change theorem states that M(t) = B([M, M](t)) a.s., t ::: 0. The samples
of B in the new clock can be obtained from the recurrence formula

B(s + ~s. w) = B(s, w) + (M(T- 1(s + ~s. w), w)- M(T- 1(s, w), w)), s ::: 0,
334 Chapter 5. Monte Carlo Simulation

where .6.s > 0 is a time increment in the new clock. ~

Example 5.24: Let N be a Poisson process with jump times Tk. k = 1, 2, ... , and
intensity A > 0. The compensated Poisson process N(t) = N(t) -At is a mar-
tingale with quadratic variation [N, N](t) = [N, N](t) = N(t) (Section 3.12).
However, the process B(s) = N([N, Rr 1(s) ), s :::: 0, in Eq. 5.40 is not a Brown-
ian motion since N is not a continuous martingale as required by the time change
theorem in Eq. 5.40. <>
= 1 is
Proof: Let ih denote values of a B at time k .6.s, where k ;::: 0 is an integer and .6.s
a time unit in the new clock. The definition of Bk in Eq. 5.40 gives Bk = N(N- 1 (k)) =
k-). Tk so that Bk- Bk-1 = 1 -). Z, where Z is an exponential random variable with
mean 1/).. Hence, B cannot be a Brownian motion since its increments are not Gaussian
variables. •

5.4.2 Measure change


Consider a probability space (Q, :F, P) and suppose that the Monte Carlo
solution of a stochastic problem defined on this space requires a large number
of output samples. Our objective is to define a new probability measure Q on
the measurable space (Q, :F) such that the Monte Carlo solution of the origi-
nal stochastic problem solved in the probability space (Q, :F, Q) requires fewer
samples. This Monte Carlo solution is referred to as the importance sampling
method.

5.4.2.1 Time invariant problems


Let P and Q be two probability measures on a measurable space (Q, :F).
Recall that Pis said to be absolutely continuous with respect to Q if Q(A) = 0
implies P(A) = 0, A E :F, and we denote this property by P « Q. The mea-
sures P and Q are said to be equivalent if P « Q and Q « P. We indicate that
two measures are equivalent by the notation P ,....., Q (Section 2.9). The Radon-
Nikodym theorem states that if P « Q there exists a positive measurable func-
tion g = dP jdQ : (Q, :F) --+ ([0, oo), B([O, oo))), called the Radon-Nikodym
derivative, such that P(A) = fA g(w) Q(dw), A E :F (Section 2.9). The examples
in this section illustrate how this theorem can be used to improve the efficiency of
Monte Carlo simulation.
Example 5.25: Let (Q, :F, P) be a probability space and X denote a real-valued
random variable defined on this space. Our objective is to estimate the probability
P(X > x) by Monte Carlo simulation. The estimates of P(X > x) by direct
Monte Carlo and importance sampling simulation are
5.4. Improved Monte Carlo simulation 335

respectively, where x; and z; are independent samples generated from the densities
f and q, respectively, where f(~) = dP(X ::=:::Old~, q(~) = dQ(X ::=::: ~)ld~,
and Q is a measure on (Q, F) such that P « Q.
Let X = exp(Y), Y ~ N(1, (0.2) 2 ). The exact probability P(X > x) is
0.3712, 0.0211, 0.1603 X 10- 3 , 0.3293 X 10- 4 , and 0. 7061 X 10-5 for X = 3, 5,
8, 9, and 10, respectively. The corresponding estimates pMc(x) based on 10,000
samples are 0.3766, 0.0235, 0.2 x 10-3 , 0, and 0. The estimates p1 s(x) based on
the same number of samples are 0.3733, 0.0212, 0.1668 x 10 - 3 , 0.3304 x 10-4 ,
and 0.7084 x 10-5 for x = 3, 5, 8, 9, and 10, where the density q(z) = ¢((z-
x) I a) I a corresponds to a Gaussian variable with mean x and variance a 2 . While
the importance sampling provides satisfactory estimates of P(X > x) for levels
up to x = 10, the estimates PMc(x) are inaccurate for x 2: 8. <>
Note: The required probability is

P(X > x) = kl(x,oo)(~) f(~)d~ = Ep [l(x,ooj(X)],

where Ep denotes the expectation operator under P. The estimate PMc(x) approximates
this expectation. We also have

P(X > x) = { [l(x,oo)(~) !<OJ


JJR q(~) q(~)d~ = EQ [l(x,ooj(X) f(X)]
q(X) ,

where q is the density of X under Q. This density has been selected such that 50% of its
samples exceed x and the ratio f I q is bounded in JR. &

Example 5.26: In reliability studies we need to calculate the probabilities P s =


P(X E D) and P1 = P(X E De), called reliability and probability of failure,
where D c ocd is a safe set. The probability of failure is

P1 = [ 1vc(x) f(x) dx = Ep[1vc(X)] or

i[
Jfi!!.d

P1 = f(x)J
1vc(x) - - q(x) dx = EQ [ 1vc(X) -
f(X)J
- ,
fi!!.d q(x) q(X)

where f and q are the densities of X under the probability measures P and Q,
respectively. The measure P defines the original reliability problem. The measure
Q is selected to recast the original reliability problem and is such that P « Q.
The estimates of P1 by direct Monte Carlo and importance sampling are denoted
by pI,MC and pf,IS, respectively (Example 5.25).
Let X have independent N(O, 1) coordinates and D be a sphere of radius
r > 0 centered at the origin of ocd, that is, D = {x E ocd :II X liS r}. Table 5.1
lists the exact and estimated probabilities of failure ford = 10, and several values
of r. The estimates of P 1 are based on 10,000 samples. The density q corresponds
to an ocJ -valued Gaussian variable with mean (r, 0, ... , 0) and covariance matrix
a 2 i, where a > 0 is a constant and i denotes the identity matrix. The direct
336 Chapter 5. Monte Carlo Simulation

Table 5.1. Exact and estimated failure probabilities (d = 10)

r (j
Pt Pf,IS PJ,MC
5 0.5 0.053 0.0 0.0053
1 0.0009
2 0.0053
3 0.0050
6 0.5 0.8414 X 10 -4 0.0001 X 10 -4 0.0
1 o.1028 x 10- 4
2 o.5697 x 10-4
3 1.1580 x 10-4
4 1.1350 x 10- 4
7 0.5 0.4073 x 10- 6 0.0 0.0
1 o.oo16 x 10-6
2 o.I223 x 10_ 6
3 o.6035 x 10- 6
4 0.4042 x 10_6

Monte Carlo method is inaccurate for relatively large values of r, that is, small
probabilities of failure. The success of the importance sampling method depends
on the density q. For example, pf,IS is in error ford = 10 and a = 0.5 but
becomes accurate if a is increased to 3 or 4. <>
Note: The exact reliability,

1- p = p
f
d
( "'x2 < r2
~ l
)
-
= -1-
f'(d/2)
lor0 2
j2 ~d/2-1 e~ d~
,
!=l

is an incomplete gamma function, where f'(-) denotes the gamma function. The densities
of X under the measures P and Q are

f(r) ~ (2K)-d/Z oxp [ -~ t,x{] •nd

q(<) ~ [(2K) qZrd/Z oxp 1- ,:z [(<! -') 2 + ~ <i] )·


The estimate of P f by the importance sampling method is

, 1 ~ f(Zk)
Pf,IS = - ~ 1vc(Zk) -(-),
n k=l q Zk

where Zk are independent samples of X under Q. &


5.4. Improved Monte Carlo simulation 337

5.4.2.2 Time variant problems


The Girsanov theorem can be used to define a new measure Q that is abso-
lutely continuous with respect to the original measure P and provides the frame-
work for a more efficient Monte Carlo solution of the original stochastic problem.
We state the Girsanov theorem without proof and illustrate its use in Monte Carlo
simulation by examples.
Let (Q, :F, (F1 )t>O, P) be a filtered probability space and let B be an JRd'-
valued Brownian motio; on this space. Let X be an JRd -valued diffusion process
defined by

X(t) = x +lot a(X(s), s) ds +lot b(X(s), s) B(s), (5.41)

where X(O) = x denotes the initial state and the coefficients a, bare such that the
solution of this equation exists and is unique (Section 4. 7 .1.1 ).
We need some definitions to state the Girsanov theorem. Let
g(x, t) = a(x, t) + b(x, t) y(x, t), (5.42)
where y(X(t), t) E JRd' is a bounded, measurable function that is Ft-adapted with
dtglad samples. Define an JRd -valued stochastic process Z by

Z(t) = Z(O) +lot g(Z(s), s) ds +lot b(Z(s), s) dB(s), t E [0, -r], (5.43)

that starts as X at Z (0) = X (0) = x, where -r > 0 and

B(t) = B(t)- lot y(X(s), s)ds. (5.44)

Define also a new probability measure Q by

( ddPQ) (t) = exp [tJo y(X(s), s)T dB(s)-


1
2 Jo
t '!; y,(X(s), s)
d' 2 ds ] ,

(5.45)
where t E [0, -r].

Girsanov's theorem. This theorem gives the following properties that are es-
sential for Monte Carlo simulation ([147], pp. 108-114):
1. The process iJ in Eq. 5.44 is a Brownian motion under Q.
2. The solution X ofEq. 5.41 under P satisfies also Eq. 5.43 under Q.
Note: The stochastic equations for X and Z in Eqs. 5.41 and 5.43 (1) have the same
diffusion coefficients but different drift coefficients and (2) are driven by Brownian motions
under the measures P and Q, respectively. The drift of Z can be obtained from the drift
of X and Eq. 5.42. The sample paths of X under P and Z under Q differ. However, the
processes X and Z have the same probability law. l.
338 Chapter 5. Monte Carlo Simulation

The Girsanov theorem can be used to improve the efficiency and accuracy
of the direct Monte Carlo method. For example, let X denote the state of a system
that fails if X leaves a safe set D in a time interval [0, r]. Suppose that our objec-
tive is to estimate the probability of failure P f ( r) of this system by Monte Carlo
simulation. The classical Monte Carlo method generates n samples of X, count
the number of samples n f of X that leave D at least once in [0, r], and estimate
Pj(T) by n1jn. This solution can be inefficient in applications because (1) the
estimates of Pf ( r) have to be based on a very large number of samples of X for
highly reliable systems, that is, systems with P f ( r) « 1, and (2) the computation
time for obtaining a single sample of X can be significant for realistic systems.
The Girsanov theorem provides a procedure for modifying the drift of X if con-
sidered under a new probability Q. If a significant number of samples X under
Q cause system failure, then the number n of samples needed to estimate P f ( r)
can be reduced significantly relative to the classical Monte Carlo method based
on the definition of X under the original probability measure P. The following
algorithm outlines the estimation of P f ( r) based on samples of X under Q.

Monte Carlo algorithm. The estimation of the failure probability P f(T) in-
volves three steps:
1. Select the function y in Eq. 5.42 such that the fraction of samples of X exiting
D at least once during [0, r] is relatively large.
2. Generate samples z; (t) of X under Q and of the Radon-Nikodym derivative
dQjdP, i = 1, ... , n, fortE [0, r].
3. Estimate the failure probability P f(T) by (Girsanov's theorem)

(5.46)

Note: The success of the Monte Carlo simulation algorithm in Eq. 5.46 depends essentially
on the choice of the function y in Eq. 5.42. Unfortunately, there exists no simple method
for selecting y. The following two examples are presented to clarify the statement of the
Girsanov theorem. Estimates of Pj(T) based on this theorem are given in Examples 5.29
and 5.30. These examples also present some practical ideas for selecting an adequate ex-
pression for y in Eq. 5.42. &

Example 5.27: Let (Q, :F, (F1)r:::_o, P) be a filtered probability space and X be
a process on this space defined by dX(t) = adt + bdB(t), t E [0, r], where
r > 0, X(O) = x, B is a Brownian motion under P, and a, b are constants.
Let Z be another process on the measurable space (Q, F) satisfying the equation
dZ(t) = bdB(t), t E [0, r], where Z(O) = x and B is a Brownian motion with
respect to a probability measure Q defined by (d QjdP)(t) = exp[ -(ajb) B(t)-
(a 2 j(2b 2 )) t]. Then X and Z have the same law. The expectation of X(t) and the
probability of the event A = {X(t) _::: ~}, ~ E JR. are x +at and <I>((~- x-
5.4. Improved Monte Carlo simulation 339

a t)/b -J{), respectively, and can be calculated from X under P or from Z, that is,
X under the probability measure Q. <>

Proof: Take y = -ajb so that g = 0 (Eq. 5.42) and (Eq. 5.45),

( ~;) (t) = exp [f ( -ajb) dB(s) - ~ f( -a/b) 2 ds]

= exp [-(a/b) B(t) + a 2 t/(2b 2)]


- t
since B(t) = B(t) - fo (-a/b) ds = B(t) +at jb (Eq. 5.44) and Z(t) = x + b B(t) -

(Eq. 5.43).
We have E p [X (t)] = E p[x +at+ b B(t)] = x +at under P. This expectation is
also

EQ [ Z(t) ~~ (t)] = EQ[(x + b B(t)) exp(a B(t)/b- a 2 t /(2 b2 ))]

= e-a 2 t/(2b 2 ) ( x EQ [exp(a B(t)/b)] + b EQ [B(t) exp(a B(t)/b)]) = x +at

- d
since B(t) = 0 G, G ~ N(O, 1), under Q, and E[exp(u G)] = exp(u2 /2), u E R
Similar considerations can be used to calculate higher order moments of X ( t).
The probability of A= {X(t):::: ~}is as stated since X(t) ~ N(x +at, b2 t) under
P. We also have

P(A) = L(~ ~) (t) d Q(t) = e-a 2 t/(2 b2) Lexp(a B(t)jb) d Q(t)

=e-a2t/(2b2) r eayfbrp(yj../i)j../idy=cf>(~-x-at),
1(-oo,(f;-x)/b] b0

where cf> denotes the distribution of N(O, 1). •

Example 5.28: Let X be defined by dX (t) = a X (t) dt + b X (t) dB(t) under a


probability P. Let Z be the solution of dZ(t) = b Z(t) d B(t) under a probability
Q given by (dQ/dP)(t) = exp(-(a/b) B(t) + a 2 t/(2b 2 )). Then X and Z have
the same law. <>

Proof: The Radon-Nikodym derivative in Eq. 5.45 withy = -ajb gives the above ex-
pression of dQjdP. The process Z is a geometric Brownian motion equal to Z(t)
x exp( -0.5 b2 t + b B(t)) for Z(O) = x. •

Example 5.29: Let X be the process in Example 5.27 with X (0) = x = 0. Our
objective is to estimate P f(r) = P(maxo:::t:sr X(t) > Xcr). If X is a performance
index and Xcr denotes a critical level for a system, then P 1 (r) is the probability of
failure in [0, r].
Let Z be a process defined by dZ(t) = (a+ by)dt + bdB(t), where
iJ is a Brownian motion under a probability measure Q defined by Eq. 5.45 and
340 Chapter 5. Monte Carlo Simulation

y is a constant. The estimates of P f ( r) delivered by the classical Monte Carlo


simulation and the importance sampling methods are

PJ,Mc(r) = ~ ti=1
n
1(xcr.oo) (max {xi(!)})
o.:::r:::-r
and

PJ,IS(T) = ~~ 1(Xcr,OO) c~~;,{zi(t)}) (:~)i (r),


respectively, where Xi and Zi denote independent samples of X and Z. If a = 2,
b = 2, and Xcr = 6, the exact value of Pt(<) is 0.1316 x 10- 11 and 0.0098 for
r = 0.5 and r = 1.5, respectively. The estimate pf,Mc(r) is zero for sample
sizes smaller than 1,000. On the other hand, the estimates pf,l s ( r) based on 500
samplesandZdefinedbydZ(t) = (a+by)dt+bdB(t)are0.1311 x 10- 11 ,
0.1460 x 10- 11 ,0.1023 x 10- 11 , and 0.0919 x 10- 11 for y = 6, 8, 10, and 12
if r = 0.5 and 0.0099, 0.0092, 0.0085, and 0.0077 for y = 1, 1.5, 2.0, and 2.5
if r = 1.5. The latter estimates become 0.0079, 0.0088, 0.0084, and 0.0085 for
y = 1, 1.5, 2.0, and 2.5 if based on 1,000 independent samples of Z. <>
Proof: The exact probability of failure is ([111], Section 15.4)

Pj(r) = In ' Xcr- X


0 b (2 7( t )
[ 1
3 112 exp - -2- (Xcr- x- at)
2b t
2] dt.
Because the probability that the samples of X reach Xcr in [0, r] is very low, a
large number of samples of X is needed to estimate Pf (r) by pf, M c, for example, about
10/10- 11 = 1012 samples for r = 0.5. This number of samples can be reduced signifi-
cantly if we modify the drift coefficient. In this example we have changed the drift such that
itreachesxcr at timer, that is, (a+ by) r = Xcr so that y = (xcr/r- a)fb. For Xcr = 6,
a = 2, and b = 1, we have y = 10 and y = 2 for r = 0.5 and r = 1.5, respectively.
Samples (dQ/dP)i of the Radon-Nikodym derivative dQfdP can be calculated from

(~;) (t) = exp[y B(t) + y 2 t/2],


and samples of B. Alternative forms for y can be found in [181]. •

Example 5.30: Let X be an Ornstein-Uhlenbeck process defined by

dX(t) =-a X(t)dt + ~dB(t), t 2:0,

for X(O) = x and a > 0. Let pf,Mc(r) and PJ,Is(r) be the estimates of Pt(r)
in Example 5.29, where Z is defined by

dZ(t) = (-a Z(t) + y ~) dt + ~ dB(t), t 2: 0,

Z (0) = x, and B is a Brownian motion under Q.


5.5. Problems 341

If ex = 1, Xcr = 2.5, and r = 1, then pf,Mc(r) is zero for sample sizes


n = 500, 1,000, 5,000, and 10,000, and 0.5 x 10 - 4 for n = 100, 000 samples. The
corresponding values of pJ,Is(r) withy = 4 are 0.2340 x 10- 4 ,0.3384 x 10-4 ,
0.3618 X 10-4 , 0.3402 X 10-4 , and 0.3981 X 10- 4 for n = 500, 1,000, 5,000,
10,000, and 100,000 samples, respectively.
Figure 5.22 shows ten samples of X under the measure P and the corre-

4,---------------~---, 4,-------------------~
Samples of X under P Samples of X under Q
3 3

2 X/
cr

-1

-2L-------~--~--~--~
0.2 0.4 0.6 0.8 0 0.2 0.4 0.6 0.8
time time

Figure 5.22. Ten samples of X under P and under Q

sponding samples of this process under the measure Q. Because of the drift cor-
rection obtained by Girsanov's theorem, the samples of X under Q are more likely
to exceed Xcr· This property of the samples of X under Q results in superior esti-
mates of Pt(r). <>
Proof: As in the previous example we have modified the drift such that it reaches Xcr
at timer. This condition gives y = (Xcr- xe-a')/[.J2lii(l- e-ar)] so that y =
4.47 for the values considered. The Radon-Nikodym derivative (dQ/dP)(t) is equal to
exp[y B(t) + y 2 t/2].
The measure change used in this example has been established by elementary con-
siderations and is not optimal in the sense that the number of samples of X under the above
measure Q that exceed Xcr in [0, r] is relatively small (Fig. 5.22). Alternative measures Q
can be selected such that approximately 50% of the samples of X under Q exceed Xcr in
[0, r ]. The use of such measures would further increase the efficiency of the Monte Carlo
algorithm for estimating P1 (r). •

5.5 Problems
5.1: Let X ~ N (/L, y) be a Gaussian vector in JRd . Develop an algorithm for gen-
erating samples of X using the fact that conditional Gaussian vectors are Gaussian
vectors.

5.2: Prove Eqs. 5.5 and 5.6.


342 Chapter 5. Monte Carlo Simulation

5.3: Use Eq. 5.9 to develop a Monte Carlo algorithm for generating samples of
the translation vector considered in Example 5.6.

5.4: Complete calculations in the text giving the covariance function of X (n) in
Eq. 5.12.

5.5: Complete the details of the proof showing that X (n) in Eq. 5.12 becomes a
version of X as !l.(pn) --+ 0 and v* --+ oo.

5.6: Complete the details of the proof showing that x<J...) in Eq. 5.13 becomes
equal to X in the second moment sense as v* --+ oo and that x<J...) becomes a
Gaussian process as A.--+ oo.

5.7: Show that the second moment properties of the compound Poisson processes
Cp. p = 1, 2, used to define the process x<J...) in Eq. 5.17 are as stated. Find the
second moment properties of the increments d C P ( v) of these processes.

5.8: Suppose that X is a stationary Gaussian process with mean zero and one-
sided spectral density g(v) = (1/v*) 1[o, v*l (v), v :=:: 0, where v* > 0 is a constant.
Calculate the mean square errore = E[(X(t) - x<n)(t))] of the approximate
representation x<n) of X given by Eqs. 5.12 and 5.19.

5.9: Complete the details of the proof showing that X (n) in Eq. 5.22 becomes a
version of X as na --+ oo, a= 1, ... , d'.

5.10: Redo Example 5.14 with an alternative periodic extension of the covariance
function of X such that c<n,n)(r, r) is approximately equal to c(r, r).

5.11: Prove Eq. 5.29. Find also the finite dimensional density of X ford > 1.
5.12: Prove the convergence of a binomial distribution to a Poisson distribution
under the conditions stated following Eq. 5.34.

5.13: Show that the points of N I N(B) are uniformly distributed in B, where N
is a Poisson random measure defined on Rd' and B E B (Rd') is bounded.

5.14: Repeat the analysis in Example 5.22 for the case in which X is an Ornstein-
Uhlenbeck process.
5.15: Calculate moments of order 3 and higher of X in Example 5.27 using the
definition of this process under the probability measure Q.

5.16: Find the density of the random variable T = inf{t :=:: 0: X(t) = x}, x > 0,
by using the change of measure in Example 5.27 and the distribution of the first
time a Brownian motion B starting at B(O) = 0 reaches x (Section 3.4).

5.17: Extend the analysis in Example 5.30 to a process X defined by the stochastic
differential equationdX(t) = (a X(t) + fJ X(t) 3 ) dt+a dB(t), where a, {J, a are
some constants and B denotes a Brownian motion.
Chapter 6

Deterministic Systems and


Input

6.1 Introduction
The current state of a system is commonly given by the solution of a deter-
ministic differential, algebraic, or integral equation. For example, the functions
giving the displacements of the points of a solid satisfy partial differential equa-
tions obtained from equilibrium conditions, kinematic constraints, and material
constitutive laws. Generally, it is not possible to solve analytically the equations
defining the system state. Numerical methods are needed for their solution. Most
numerical methods are global, that is, they determine the solution everywhere or
at a large number of points in a system even if the solution is needed at a single
point.
This chapter develops alternative methods for solving deterministic differ-
ential, algebraic, and integral equations that have two common features.
( 1) The methods are local, that is, they give directly the value of the solution of
a differential or integral equation at an arbitrary point in the set on which the
solution is defined, rather than extracting its value from the field solution.
For algebraic equations, the methods give the value of a particular unknown
directly.
(2) The methods employ Monte Carlo simulation. Local solutions can be
expressed as averages of functions of some stochastic processes that can
be estimated by Monte Carlo simulation algorithms. A MATLAB function
for finding the local solution of a partial differential equation is given in
Section 6.2.1.3 for illustration.
We consider the following classes of differential, algebraic, and integral
equations and develop methods for finding their local solutions.

343
344 Chapter 6. Detenninistic Systems and Input

Differential eqnation (Sections 6.2 and 6.3):

d d 2
au(x,t) = "ai(X) au(x,t) +~ "fJiJ(X) a U(X,t)
at ~ ax· 2 ~ ax· ax.
i=l l i,J=l l J

+q(x,t)u(x,t)+p(x,t), (x,t)EDx(O,oo), (6.1)

where D is an open subset of JRd, ai, fJiJ are real-valued functions defined on
D C JR<f, d =:: 1 is an integer, and q, p denote real-valued functions defined on
D x [0, oo ). The solution u : D x (0, oo) ---+ lR ofEq. 6.1 depends on boundary and
initial conditions. This equation is meaningful if the first order partial derivative
and the second order partial derivatives of u with respect to t and x, respectively,
are continuous. If the functions q and p depend on only x and aujat = 0, Eq. 6.1
is said to be time-invariant.
In Section 6.4 we consider a class of partial differential equations that define
linear elasticity problems and differ from Eq. 6.1. The local solutions developed
in Sections 6.2 and 6.3 for Eq. 6.1 cannot be applied to elasticity problems. An
alternative solution based on Neumann series representations will be used to solve
locally some elasticity problems.

Algebraic equation (Section 6.5):

(i- Ag)u = tf>, (6.2)

where i is the (m, m) identity matrix, g is an (m, m) matrix, tf> and u denote m-
dimensional vectors, and A is a parameter. It is assumed that the entries of g and
tJ> are real-valued. If tf> =f. 0, A has a fixed value, and det(i -A g) =f. 0, Eq. 6.2 has
a unique solution. If tf> = 0, Eq. 6.2 defines an eigenvalue problem for matrix g.

Integral equation (Section 6.6):

u(x) = ¢(x) +A L g(x, y) u(y) dy, x, y ED, (6.3)

where D is an open subset of JRd and A is a constant. If ¢ = 0, this Fredholm


integral equation defines an eigenvalue problem for the kernel g.
The emphasis in this chapter is on local solutions for Eq. 6.1. Three methods
for obtaining local solutions are considered, the random walk method (RWM),
the sphere walk method (SWM), and the boundary walk method (BWM). Of
these three local methods, the RWM is discussed in detail. The method is a direct
consequence of properties of diffusion processes and the Ito formula. The RWM,
SWM, and RWM provide an alternative to traditional computation methods, for
6.2. Random walk method 345

example, the finite element, boundary element, and finite difference methods, used
currently to obtain numerical solutions of partial differential equations.
Some possible limitations of the traditional computational methods are:
(1) the computer codes used for solution are relatively complex and may require
extensive preprocessing to define a particular problem in a specified format, (2) the
numerical algorithms may become unstable in some cases, (3) the errors caused
by the discretization of the domain of integration and the numerical integration
methods used in analysis cannot be bounded, and (4) the field solution must be
calculated even if the solution is needed at a single point or a small collection of
points in D. In contrast, numerical algorithms based on the random walk method
are simple to program, always stable, accurate, local, and ideal for parallel compu-
tation. However, the local solutions are less general than those using the traditional
computational methods.

6.2 Random walk method


We consider the local solution ofEq. 6.1 with the initial condition

u(x, 0) = 1'/(X), XED, (6.4) I


giving the values of function u in D at the initial time t = 0. The boundary condi-
tions for Eq. 6.1 specify the values of u and/or its derivatives on the boundary aD
of D for t > 0. If derivatives of u are specified on a subset aD n of aD and values
of u are given on aD a = aD \ a Dn, the problem is said to have mixed boundary
conditions. The subsets aDa and aDn are referred to as Dirichlet and Neumann
boundaries. The conditions that the solution must satisfy on the boundaries aD a
and aDn are called Dirichlet and Neumann conditions, respectively.
The partial differential equation in Eq. 6.1 has numerous applications in
science and engineering. For example:
1. The beam equation in bending
d 2 u(x)
~ + p(x) = 0, xED= (0,1), (6.5)

giving the displacement function u of a beam of span l is a special case of


the time-invariant version of Eq. 6.1 with d = 1, u(x, t) = u(x), a 1 = 0,
f3n (x) = 2, q = 0, and p(x, t) = p(x).

2. The Poisson equation


~u(x) + p(x) = 0, x E D, (6.6)
corresponds to the time-invariant solution u(x, t) = u(x) of Eq. 6.1 with
ai(x) = 0, /3ij(X) = 28ij, i, j = 1, ... , d, q = 0, and p(x, t) = p(x),
346 Chapter 6. Deterministic Systems and Input

where .6. = ""£1= 1 a2 ;ax? is the Laplace operator, OiJ = 1 fori = j, and
OiJ = 0 fori f. j. If p = 0, Eq. 6.6 becomes Llu(x) = 0, and is referred to
as the Laplace equation.

3. The heat, transport, and diffusion equations for heterogeneous media are
given by Eq. 6.1 with q = 0, where p has the meaning of flux. The heat
equation for a homogeneous medium with diffusion coefficient a > 0 is

au(x, t) 2
-a'-t----'- =a .llu(x, t) + p(x, t), (6.7)

and is given by Eq. 6.1 with ai(X) = 0, f3iJ (x) = 2a 2 OiJ. and q = 0.
4. The Schrodinger equation for a heterogeneous medium is

d d 2
"" au(x)
~ai(x) -a-.
. Xr
+ 21 ..""
~
a u(x)
f3iJ(x) ~ +q(x)u(x)
X1 x 1
+ p(x) = 0,
I= 1 !,]= 1
(6.8)

and corresponds to the time-invariant solution u(x, t) = u(x) of Eq. 6.1


for q(x, t) = q(x) and p(x, t) = p(x). The Schr6dinger equation for a
homogeneous medium is

1
2 Llu(x) + q(x) u(x) + p(x) = 0, (6.9)

and can be obtained from Eq. 6.8 by setting a i (x) = 0 and f3iJ (x) = OiJ.
The above Schrodinger equations are referred to as inhomogeneous and ho-
mogeneous if p f. 0 and p = 0, respectively.

6.2.1 Dirichlet boundary conditions (q = 0)


Let D be an open subset of JRd and let g be a specified real-valued function
defined on the boundary aD of D. Denote by u the solution ofEq. 6.1 with q = 0,
the initial condition in Eq. 6.4, and the Dirichlet boundary condition

u(x, t) = g(x, t), X E aD, t > 0. (6.10) 1

Note: The boundary condition for the time-invariant version of Eq. 6.1 is u(x) = ~(x),
x E aD. The real-valued function~ giving the boundary values of u in this case differs
from~ in Eq. 6.10. We use the same notation for simplicity. A

6.2.1.1 Local solution


We establish the local solution of Eq. 6.1 with the initial and boundary
conditions in Eqs. 6.4 and 6.10 under the following assumptions.
6.2. Random walk method 347

1. The entries of the matrix {J(x) = {fiij (x)} are real-valued functions and fJ is
a symmetric positive definite matrix for each x E JRd, that is, the eigenvalues of
fJ (x) are strictly positive for all x E D. Hence, there exists a matrix b such that
{J(x) = b(x)b(x)T foreachx ED.
Note: There are several methods for calculating a matrix b satisfying the condition fJ (x) =
b(x) b(x)T for a fixed x E JRll. The Cholesky factorization provides such a representation
for fJ (Section 5.2.1 in this book, [155], Section 3.5.3). An alternative solution is given
by the equality rp(xl f:J(x) rp(x) = l(x), where rp(x) is a (d, d) matrix whose columns
are the eigenvectors of f:J(x) and l(x) is a diagonal matrix whose non-zero entries are the
eigenvalues of f:J(x). If the eigenvalues of f:J(x) are distinct, then we can take b(x) =
rp(x) fL(X), where fL(X) is a diagonal matrix with Jkii (x) = Jl..ii (x). A

2. Set ai(x) = ai(x), i = 1, ... , d, and b such that {J(x) = b(x)b(x)T for all
X E r. Let B be an JRd-valued Brownian motion and let X= (X, Xd+t) be an
JR.d+l-valued diffusion process defined by the stochastic differential equation

dX(s) = a(X(s)) ds + b(X(s)) dB(s),


dXd+l (s) = -ds. (6.11)

It is assumed that the drift and diffusion coefficients of X satisfy the conditions
in Section 4.7.1.1 so that the solution X of Eq. 6.11 exists and is unique. These
requirements impose additional conditions on the coefficients ofEq. 6.1. We also
note that X is a semimartingale with continuous samples.
3. For the time-invariant version of Eq. 6.1, let X be defined by Eq. 6.11 with
X(O) = x E D and let Fr = a(B(s) : 0 ~ s ~ t) denote the natural filtration of
B. If D is a Borel set, then

T = inf{s > 0: X(s) ¢ D} (6.12) 1

is an F 1-stopping time. It is assumed that T corresponding to X starting at x E D


has a finite expectation, that is, Ex [T] < oo.
Proof: Since D is an open subset of JRll, it is a Borel set. For all t ~ 0 and s E [0, t)
rational the event {T < t} = Uo:c:s<r{X(s) ~ D} is in :Ft since {X(s) ~ D} E :Fs C :F1 •
Hence, Tis an :Fr-stopping time (Section 3.4).
The assumption that T has a finite expectation is not restrictive in many cases. For
example, the average time B needs to leave a spherical domain is finite (Section 8.5.1.1).
Also, we will see in Section 7.3.1.3 that differential equations can be obtained for the
expectation ofT from the Fokker-Planck-Kolmogorov equations. The structure of these
differential equations can be used to establish conditions under which £X [T] is finite. •

4. For the general version of Eq. 6.1, let X be defined by Eq. 6.11 with X(O) =
X E D and xd+l (0) = t > 0. Define also the set Dt = D X (0, t) for any t > 0.
If D is a Borel set, then
348 Chapter 6. Deterministic Systems and Input

f = inf{s > 0: X(s) fj. Dr}

is an.F1 = u(B(s): 0.::; s.::; t)-stoppingtime. Note that f cannotexceedt.


Proof: Since D is an open subset of JRli, Dr = D x (0, t) c Rd+ 1 is a Borel set. Also, the
event {f < t} = Uo<sd'{X(s) ¢:Dr} is in Fr for all t E (0, t) and s E [0, i) rational since
c
{X(s) ¢ Dr} E :Fs :Ft- Hence, f is an Frstopping time (Section 3A). •

5. The functions 1J and ~ are continuous in their arguments, the function p is


HOlder continuous in D, the boundaries of D are regular, and the partial deriva-
tives of u in Eq. 6.1 are bounded in Dr.

Note: Consider Eq. 6.1 with q = 0. This equation can be written in the form

showing that p must be continuous in its arguments. It turns out that p has to satisfy a
stronger condition for u to have continuous partial derivatives of order 1 in t and order
2 in x, The function p must be Holder continuous in D, that is, there exist constants
c, a E (0, oo) such that lp(x, t)- p(x', t)l _:: : c II x- x' lla, where x, x' E D ([58],
p. 133).
Let X be a diffusion process starting at x E 3D. We say that x is a regular point
of D if the first timeT, when X exits D, is zero a.s., that is, P(T = 0) = 1. A boundary
point that does not satisfy this condition is said to be an irregular point. The boundary
point {0} of D = {x E R 2 , II x II< 1} \ {0} is an irregular point with respect to a Brownian
motion B in R 2 because B will not hit {0} in a finite time even if it starts at this point ([58],
p. 99). Useful considerations and examples on regular and irregular boundary points can be
found in [58] (Section 4A) and [134] (Section 9.2), That D may have irregular points does
not have significant practical implications on the local solution of Eq. 6.1 because (1) D
can be approximated by a regular subset of r if it has irregular points and (2) almost no
path of the diffusion processes used to estimate the local solution of Eq. 6.1 will ever hit
the irregular points of D, provided that D has a finite number of irregular points ([41],
pp. 49-50). A

If the conditions 1, 2, 4 and 5 are satisfied, then the local solution of Eq. 6.1
with the initial and boundary conditions in Eqs. 6.4 and 6.10 is

u(x, t) = E(x,r)[u(X(T))] + E(X,t) [!of p(X(s)) ds]

= E(X,r)[I](X(t)) If= t] P(T = t) + E(X,t)[~(X(t- T)) If< t] P(T < t)

+ E(X,r) [!of p(X(s)) ds] . (6.14)


6.2. Random walk method 349

Proof: The superscripts of the expectation operator E(X,t) indicate that X(O) = (x, t),
x E D and t > 0. The boundary value of u depends on the exit point of X. This function
is equal to rJ(X(t)) iff= t (Eq. 6.4) and ~(X(t- f)) iff < t (Eq. 6.10).
The generator of the diffusion process i is defined by the limit
. E(x,tl[g(X(s))]- g(x, t)
A[g(x, t)] =lim , (6.15)
s.j,O s
where g is a real-valued function defined on ~+1 with continuous second order partial
derivatives in x E JRd and continuous first order partial derivative in s 2: 0. Denote by
agjax; and a2gjax; ax j the partial derivatives of g with respect to the coordinates x; and
Xi, Xj of i = (x, xa+l) so that Xi =Xi fori = 1, ... , d, d + 1. We have seen that X is
a continuous semimartingale (Section 4.7.1.1) so that we can apply the multi-dimensional
Ito formula (Section 4.6.2) to g(X(s)). This formula gives
E(X,t)[g(X(s))]- g(X(O)) = d+1 E(X,t) [ { sL -
ag(x_~cr)) dX;(cr) ]
Jo
t
i=1 ax,

+!
2 .. 1
E(x,t) [ rs
Jo
(b(X(cr)) b(X(cr))T) .. a 2 g5X~cr)) dcr]
I} ax;aXj
1,]=

by averaging. The processes JJ[ag(X(cr))ja.X;]bij(X(cr))dBj(cr), i, j = 1, ... ,d, are


martingales starting at zero since the functions ag(X(cr))ja.X; are bounded in [0, s] by the
continuity of agja.X; and i, bij are bounded functions by hypothesis, and X is adapted to
the natural filtration of B. Hence, the above equation becomes

E(x,t)[g(X(s))]- g(x, t) = E(x,t) [


Jo
rs (t
i=1
ag(X(cr)) a;(X(cr))- ag(X(cr))) dcr
ax; axd+l

+! t Jo
2 .. 1
l,j=
{s (b(X(cr) b(X(cr))T) .. a 2 g(X(cr)) dcr] .
lj ax;axj

For a small s > 0, the sum of integrals under the expectation on the right side of the above
equation divided by s can be approximated by

Ld -
ag(X(8(w) s, w)) a; (X(8(w) s, w))- ag(X(8(w) s, w))
-

i=1 axi axd+1

+~ Ld
(b(X(8(w) s, w)) b(X(8(w) s, w))T) .. a g(X(B(w) s, w))
2 -

i,j=1 lj ax;axj

for each w E Q, where 9(w) E [0, 1]. Since the drift and diffusion coefficients of X are
bounded functions, i has continuous sample paths, and g has continuous partial derivatives
of order 1 and order 2 with respect to i and x, respectively, the limit of the above function
ass {. 0 is deterministic so that A in Eq. 6.15 is
a a a 1 a a2
A=---+"' ai-
axd+1 L...t ax·l
+-2 "'(bbT);· - - .
L...t J ax· ax.
(6.16)
i=1 i,j=1 l J
350 Chapter 6. Deterministic Systems and Input

The generator of X coincides with the differential operator of Eq. 6.1 for the drift and diffu-
sion coefficients of X in Eq. 6.11. We have performed similar calculations in Section 4.6.2
to show that the generator of a Brownian motion is proportional to the Laplace operator.
The expectation of the Ito formula applied to the function g (X (s)) can be given in the form
(Eq. 6.16)

E(X,t)[g(X(s))]- g(x, t) = E(X,t) [los A[g(X(CT))]dCT].

We have seen that any semimartingale X has the representation X (t) = X (0) +
A(t) + M(t),
where A is an adapted process with dtdUtg samples of finite variation on
compacts and M is a local martingale such that A(O) = M(O) = 0 (Section 4.4.1). Let
Xt (t) = X(O) +(A+ M)t (t) = X(O) +At (t) + Mt (t) be the process X stopped at
- -t -t - - -t
T. Since A and M have the same properties as A and M, respectively, then X is a
semimartingale so that Eq. 6.16 can be applied to the function g(Xt ), and gives

E(x,t)[g(Xt (s))]- g(x, t) = E(x,t) [los A[g(Xt (CT))]dCT].

- t - -
Also, g(X (s)) = g(X(T 1\ s)), fo A[g(X (CT))]dCT =
S - t
fot 1\S A[g(X(CT))]dcr,
- -
and T ~
t, so that the above equation gives

g(x, t) = E(x,t)[g(X(T))]- E(x,t) [lot A[g(X(CT))]dCT] (6.17)

for s 2: t. Eq. 6.17 is referred to as the Dynkin formula ([135], Theorem 7.4.1, p. 118).
If X is defined by Eq. 6.11 and the solution u of Eq. 6.1 with the initial and boundary
conditions in Eqs. 6.4 and 6.10 is used in place of g, then Eq. 6.17 gives Eq. 6.14 since
X(s) EDt for s < f and X(f) is on the boundary of D 1 so that A[u(X(s))] =- p(X(s))
(Eq. 6.1) and
u(X(T)) = { ~(X(t- f)), iff < t,
ry(X(t)), iff= t.
We can replace g with u in Eq. 6.17 since the solution u has a continuous first order deriva-
tive in t and continuous second order partial derivatives in x. Additional technical consid-
erations related to the above proof can be found in [42] (Chapter 6), [58] (Chapter 4), and
[135] (Chapter 9). •

If the conditions 1, 2, 3 and 5 are satisfied, the local solution of the time-
invariant version ofEq. 6.1 with the boundary condition u(x) = ~(x), x E aD,
is
u(x) =Ex [~(X(T))] +Ex [loT p(X(a)) da]. (6.18)

Proof: The expectation of the Ito formula applied to a real-valued function g E C2 (Rd)
6.2. Random walk method 351

and the Rd -valued diffusion process X in Eq. 6.11 is

Ex[g(X(s))]- g(x) =Ex


~los ag(X(u)) ai(X(u))du
[ LJ
i=l 0 axi

+! tr
2 . .
!,j=l
lo !j axiaXj
2
(b(X(u) b(X(u)l) .. a g(X(u)) du] .

Arguments similar to those used to derive Eq. 6.14 show that the generator of X is

a a 1 a a2
A=" ai(X)-
LJ ax·
+-2 "(b(x)b(xlk - - ,
LJ J ax· ax·
i=l ! i,j=l ! J

and the Ito formula applied to g(X) gives

Ex[g(X(s))]- g(x) =Ex [los A[g(X(u))]du J.


The local solution in Eq. 6.18 results by substituting in the above equation T and u for s
andg,respectively, sinceJ[ A[u(X(u))]du =- p(X(u))du. J[
It remains to show that we can use T in place of s in the Ito formula. Let n ~ 1 be
an integer and let Tn = T A n be a sequence of stopping times. The Ito formula can be

[!.' t
applied to the function u(XT•) for each n ~ 1, and gives

E" [u(XT"(')) ] - u(x) ~ E' h j(XT• (a)) dBj(a)]

+Ex [los A[u(XT" (u))] du J,


where h j(XTn (u)) = .Ef=l [au(XTn (u))jaxj] bij (XTn (u)). Since the stochastic integral
{T" {n
Jo hj(X(u))dBj(u) = Jo 1{a<T}hj(X(u))dBj(u)

is a martingale starting at zero, its expectation is zero. If h j is a bounded Borel function in


D, then there is a constant a > 0 such that !hi (x)l ::5 a for all x E D and

E" [ (f" h j(X(a)) dB1 (a))'] ~ [f" E' h 1 (X(a)) 2 da l~ a2 E"(T).

Hence, the family Jf" h j (X(u)) dB j(u), n = 1, 2, ... , is uniformly integrable ([135],
Appendix C) so that

0 = lim Ex [ {Tn hj(X(u))dBj(u)] =Ex [lim {Tn hj(X(u))dBj(u)]


n-+00 lo n-+00 lo
=Ex [loT hj(X(u))dBj(u)].
352 Chapter 6. Deterministic Systems and Input

The functions hi are bounded Borel functions in D by the properties of the diffusion coef-
ficients of X and of the solution u of the time-invariant version of Eq. 6.1. •

Example 6.1: Let D be a simply connected open bounded subset of ffi.d and let
u be the solution of the Laplace equation ~u(x) = 0, x E D, with the boundary
condition u(x) = ~(x), x E aD. Denote by T = inf{t > 0 : B(t) ¢. D} the first
time when an JEd -valued Brownian motion B starting at x E D leaves D. The
local solution is u(x) = Ex [;{B(T))] for each x E D. <>

Proof: The process BT is a square integrable martingale since Tis a stopping time and B
is a square integrable martingale. The Ito formula applied to the function u(BT) gives

u(BT (t)) = u(x) + Ld lot au(BT (s)) dB[ (s) +-I lot t:.u(BT (s))ds
i=l 0 axi 2 0

d lot au(BT (s))


= u(x) + L...
~
0 ax·
dB[ (s),
i=l l

where the latter equality holds since t:.u(BT (s)) = 0 for all s ::::: 0. The stochastic integrals
Mi (t) = JJ [au(BT (s))jaxj] dB[ (s), t ::::: 0, are .Ft-measurable, have finite expectation
since E[Mi(t) 2] = E [JJ (au(B(s))jaxi) 2 ds J and the functions au;axi are bounded in
D,and

s :5 t.

Hence, the processes Mi are martingales. Then u(BT (t)) = u(x) + L.f=
1 Mi (t) is a mar-
tingale with expectation Ex[u(BT (t))] = u(x) so that Ex[u(B(T))] = u(x) by letting
t--+ oo. This result and the specified boundary conditions yield u(x) = Ex[~(B(T))]. •

Example 6.2: The RWM can solve locally elliptic and parabolic partial differen-
tial equations. The method cannot be used to solve hyperbolic partial differential
equations. <>

Proof: Let X be the diffusion process in Eq. 6.11. The generator of this process is

d a 1 d az
A= ~ai-
L... ax·
+-2 ~ (bbTk - - .
L... J ax· ax·
i=l l i,j=l l J

Since b bT is a real-valued symmetric matrix, its eigenvalues cannot be negative so that the
RWM cannot be applied to solve parabolic partial differential equations ([73], Chapter 26).
For example, consider the time-invariant version of Eq. 6.1 with d = 2 and ai (x) =
0 and let X be an R2 -valued diffusion process defined by the stochastic differential equation
6.2. Random walk method 353

dX(t) = b dB(t), t ::::_ 0, where B is a Brownian motion in R2 and b denotes a (2, 2) real-
valued matrix. The generator of X is (Eq. 6.16)

Since

the differential operator defined by the generator of X cannot match the operator of a hy-
perbolic equation ([73], Chapter 26). •

6.2.1.2 Monte Carlo algorithm


The RWM in Eqs. 6.14 and 6.18 gives the value of the solution u ofEq. 6.1
at an arbitrary point x E D and time t > 0 as a sum of expectations depending
on the sample paths of X. Generally, it is not possible to find these expectations
analytically, but they can be estimated from samples of X generated by Monte
Carlo simulation.
Figure 6.1 shows a set Dr ford = 2 and two sample paths of X that exit
D 1 at the times T = t and T < t. The sample paths of X can be divided in two
classes, sample paths exiting Dr through D at T = t and sample paths leaving Dr

Figure 6.1. Hypothetical sample paths of X exiting Dr at times T= t and T < t


ford= 2
354 Chapter 6. Deterministic Systems and Input

through aD at an earlier time f < t. The values of u(X(T)) at these exit points
are I](X(t)) and HX(t- f)), respectively.
Suppose that ns independent samples of X have been generated. Denote
by n~ and n7 the number of sample paths of X that exit D 1 through D and aD,
respectively. We haven~ 2: 0, n7 2: 0, and ns = n~ + n7.

The local solution in Eq. 6.14 can be estimated by

I 1 n, " n,
u(x, t) = ns - "IJ(X(t, w'))
I ] + ns [ __!__ "~(X(t- T(w"), w"))
If ]

ns [
n' ~ ns n" ~
s w'=l s w"=l

+- L
1 n, iT(w)
p(X(s, w)) ds. (6.19)
ns w=l 0

Note: There is no reason to continue the calculation of a sample of X after this sample has
exited Dt. Hence, we need to generate samples of the process X stopped at f.
The terms in the square brackets of Eq. 6.19 are estimates of the expectations
E(X,tl[u(X(f)) I f = t] and E(X,tl[u(X(f)) I f < t] and the ratios n~/ns and n7/ns
weighting these terms approximate the probabilities P(f = t) and P(f < t), respectively.
Hence, the sum of the first two terms in Eq. 6.19 is an estimate for E(X.tl[u(X(T))].
The expectation E(X,t) [Jt p(X(s)) ds] in Eq. 6.14 is approximated by the last term in
Eq. 6.19.
The accuracy of the local solution given by Eq. 6.19 depends on (1) the sample size
ns = n~ + n7, (2) the time step used to generate sample paths of the diffusion process X,
and (3) the accuracy if the recurrence formula used for generating sample paths of X. The
algorithms in Section 4.7.3 can be used to generate samples of X. _.

The local solution in Eq. 6.18 can be estimated by

n, ns iT(w)
u(x) = -
1
L HX(T(w), w)) +-1 L p(X(s, w)) ds, (6.20)
ns w=l ns w=l 0

where X ( ·, w) and T (w), w = 1, ... , n s, are samples of X and T, respectively.


Example 6.3: Let u be the solution of the partial differential equation
2 a2u 2 a2u 2 2
xl -2 +x2 -2 = 2xl x2
axl ax2
defined in D = (c:, 1) x (c:, 1), where c: E (0, 1). The values of the solution u on
the boundary aD of D are given by the values of function X xi/2 on aD. The r
local solution is

u(x) =Ex [u(X(T))]- Ex [loT X1 (s) 2 X2(s) 2 ds J,


6.2. Random walk method 355

where x E D, the coordinates of X are defined by the stochastic differential


equationsdXi(t) = Xi(t)dBi(t), B1 andB2 areindependentBrownianmotions,
and T = inf{t > 0 : X(t) ¢ D}. Estimates of u(x) at many points in D for
s = 0.01 calculated from ns = 5, 000 samples of X generated with a time step
At = 0.0005 are in error by less than 3%. <>
Proof: The ItO formula applied to X I-* u(X) gives

Ex [u(X(t))]- u(x) =~Ex [


2
r
k
(xt(s) 2 alu(~(s)) + X2(s) 2 alu(~(s))) ds]
a~ a~
by expectation. The stated local solution results by substituting T for t and using the
defining equation of u.

U(XI, X2) = Xf xv2.


It is simple to verify that the solution of the above partial differential equation is
This exact solution has been used to assess the accuracy of the
estimates delivered by the local solution. •

6.2.1.3 The Laplace equation


LetD beanopenboundedsubsetin!Rd with boundaries Dr, r a =
1, ... , m,
and let ~r be functions defined on these boundaries. If m =
1, then Dis a simply
connected subset of JRd . Otherwise, D is multiply connected.

Let u be the solution of the Laplace equation

Llu(x) = 0, X E D,
u(x) = ~r(X), X E aD 7 , r = 1, ... , m. (6.21)

Denote by B an JRd -valued Brownian motion starting at x E D. The samples


of B will exit D through one of the boundaries aDr of D. Figure 6.2 shows
two samples of a Brownian motion B in JR2 and their exit points from a multiply
connected set D. We define as in Eq. 6.13 a stopping time T as the first time B
with B(O) = x E D exits D, that is,

T = inf{t > 0: B(t) ¢ D}. (6.22) 1

The local solution of the Laplace equation at an arbitrary point x E D is


m
u(x) = LEX[~r(B(T)) I B(T) E aDr]P(B(T) E aDr) (6.23)
r=l

Note: The local solution in Eq. 6.23 is a special case of Eq. 6.18 for ai (x) = 0, fJij (x) =
2 liij , and q = p = 0. The Brownian motion B is adequate for solution because its
356 Chapter 6. Deterministic Systems and Input

Figure 6.2. Local solution of the Laplace equation for a multiply connected do-
main Din JR2

generator is proportional to the Laplace operator. If D is simply connected, then m = 1 in


Eq. 6.23, aDl =aD, Ex[~r(B(T)) I B(T) E aD1] = Ex[~l(B(T))], and P(B(T) E
aD1) = 1.
If the boundary conditions are ~r(X) = kr, X E a Dr, then Eq. 6.23 gives
m
u(x) = Ex[u(B(T))] = Lkr Pr(x) xED, (6.24)
r=l
where Pr(X) = P(B(T) E aDr) [82, 84]. A

Example 6.4: Let u(x) be the steady-state temperature at an arbitrary point x =


(x1, x2) of an eccentric annulus

D = {(x1, x2) : xf +xi- 1 < 0, (x1 - 1/4) 2 +xi- (1/4) 2 > 0}

in JR2 with boundaries 8D1 = {(Xl, X2) : xf +xi- 1 = 0} and 8D2 = {(Xl, X2) :
(x1 - 1/4) 2 +xi - (1/4) 2 = 0} kept at the constant temperatures 50 and 0,
respectively. The local solution by the RWM is u (x) = 50 p 1(x) for any point x E
D (Eq. 6.24). Because Pl (x) cannot be found analytically, it has been calculated
from Eq. 6.20 and samples of B starting at x. The largest error recorded at x =
(0.7, 0), (0.9, 0), (0, 0.25), (0, 0.5), and (0, 0.75) was found to be 2.79% for n s =
1, 000 independent samples of B with time steps !'!,.t = 0.001 or 0.0001. Smaller
time steps were used at points x close to the boundary of D. The error can be
reduced by decreasing the time step and/or increasing the sample size.
Figure 6.3 shows the exact and RWM solutions along a circle of radius
r = 3/4 centered at the origin, where the angle() E [0, 2n) measured from the
coordinate Xl gives the location on this circle. The RWM solutions with (l'!,.t =
0.01, ns = 100) and (!'!,.t = 0.0001, ns = 5000) are shown with circles and
stars, respectively. The accuracy of the RWM method improves as the time step
decreases and the sample size increases [38]. <>
6.2. Random walk method 357

46
0 0
0
44 * *
0
0
* 0
0
42 0 *
0

0
* 0 0
40
0 0

0
u 38
0
0
36
*
34

Exact solution *
32 *

300 2 3 4 5 6 7
e
Figure 6.3. Solution along a circle of radius r = 3/4 centered at the origin

Note: The temperature u is the solution of~ u = 0 with the boundary conditions u(x) =
50 for X E aDl and u(x) = 0 for X E aD2. The exact solution is ([73], Example 16.3,
p. 296)

log(~(x) 2 +p(x) 2 ))
u~)=~ ( 1-
2 log(a)
, where a = 2 + v'3,

This exact result was used to evaluate the accuracy of the local solution. .&

Example 6.5: Let D = {x E IH?.2 : (x1/a1) 2 + (x2/a2) 2 < 1} be an ellipse, where


a1, a2 > 0 are some constants. Let u be the solution of the Laplace equation
llu(x) = 0, x E D, such that u(x) = ~(x), x E aD, where~ is a specified
function and aD = {x E JH?.2 : (x1/a1) 2 + (x2/a2) 2 = 1} denotes the boundary
of D. The local solution of this equation is u(x) = Ex [~(B(T))] at any point
x E D, where B denotes an IH?.2 -valued Brownian motion starting at x and T =
inf{t 2:: 0: B(t) ¢:. D} is the first time B exits D.
The following MATLAB function can be used to estimate u(x) (Eq. 6.19).
358 Chapter 6. Deterministic Systems and Input

function uest =laplace( a 1,a2, 1~~ ,dt,ns,nseed)


dts=sqrt(dt);
randn('seed' ,nseed)
for ks=l:ns,
xbr=x; xbrl=abs(xbr(l)); xbr2=abs(xbr(2));
xx=(xbrl/a1) 2 + (xbr2/a2) 2;
while xx < 1
xbr=xbr+randn(2,l)*dts;xbrl=abs(xbr(l)); xbr2=abs(xbr(2));
xx=(xbrlja1) 2 + (xbr2/a2) 2 ;
end,
u(ks)=~(xbrl,xbr2); end,
uest=mean( u);

The output "uest" is the estimated value of u at x E D based on "ns" independent


samples of B generated with a time step "dt". <>
Note: The input to this MATLAB function consists of the dimensions a1 and a2 of D, the
point x E D at which the local solution is to be determined, the seed "nseed" and the time
step "dt" used to generate "ns" independent samples of B, and the function~ giving the
values of u on aD . .!

Example 6.6: Let u be the solution of the partial differential equation

~ au(x) 1 ~ a 2 u(x)
L....ai(x)~+2 L....f3iJ(x)ax·Bx· =0, xEDc!R2 ,
i=l l i,j=l l 1

with the boundary conditions u(x) = ~(x), x E aD. If {3(x) = {f3iJ (x)} admits
the decomposition {3(x) = b(x) b(x) T for each x E D, then the local solution is
u(x) = Ex[u(X(T))], where X is an JR2 -valued diffusion process with drift and
diffusion coefficients ai (x) = ai (x) and biJ (x) and T denotes the first time X
starting at x E D exits D (Eq. 6.14). It is assumed that Ex [T] < oo.
Generally, the expectation Ex[u(X(T))] cannot be obtained analytically
but it can be estimated from samples of X generated by Monte Carlo simulation.
Numerical results were obtained for D = (- 2, 2) x ( -1, 1), boundary conditions
~(x) = 1 for x E ( -2, 2) x {-1} and zero on the other boundaries of D, and the
coefficients a1 (x) = 1, a2(x) = 2, f3n (x) = 1.25, f312(x) = /321 (x) = 1.5, and
/322 (x) = 4.25 of the partial differential equation of u. The local approximations
of u based on ns = 1, 000 samples of X and a time step of !1t = 0.001 are
satisfactory. For example, these values are 0.2720, 0.2650, and 0.2222 at x =
(0.5, 0), (1.0, 0), and (1.5, 0). They differ from the finite element solution by less
than 5%. <>
Note: The diffusion process X needed to estimate u(x) at an arbitrary point x E D is
defined by the stochastic differential equation
dX1(t) = dt +dB1(t) + (1/2)dB2(t),
{
dX2(t) = 2dt + (1/2) dB1 (t) + 2dB2(t),
6.2. Random walk method 359

where B 1 and B2 are independent Brownian motions. A finite element solution has been
used to assess the accuracy of the local solution because the exact solution is not known in
this case. A

6.2.1.4 The Poisson equation


Let D be an open bounded subset in JE.d with boundaries a Dr, r = 1, ... , m,
and let ~r be functions defined on these boundaries. Consider an JE.d -valued Brow-
nian motion B starting at x E D and let T be the stopping time in Eq. 6.22.

Let u be the solution of the Poisson equation

~u(x) = -p(x), xED,


u(x) = ~r(X), X E a Dr, r = 1, ... , m. (6.25)

The local solution of a Poisson equation at an arbitrary point x E D is

u(x) = Ex [u(B(T))] -Ex [loT A[u(B(s))] ds J


=~Ex [~r(B(T)) I B(T) E aDr] Pr(X) + ~Ex [loT p(B(s)) ds J.
(6.26)

Note: The function Pr (x) = P(B(T) E iJDr) is the probability that B starting at x E D
exits D for the first time through iJDr, r = 1, ... , m (Eq. 6.23). The operator A= (1/2) ~
is the generator of the Brownian motion B. Generally, the probabilities and expectations in
Eq. 6.26 cannot be obtained analytically but can be estimated from samples of B. A

Example 6.7: The Prandtl stress function for a bar in torsion is the solution of
~ u(x) = -2 G {3, x E D c JR2 , where G is the modulus of elasticity in
shear and f3 denotes the angle of twist. If D is a multiply connected domain
with the external boundary aD! and the interior boundaries a Dr, r = 2, ... , m,
delineating the cavities of the beam cross section, the Prandtl function satisfies the
conditions u(x) = kr for x E a Dr, r = 1, ... , m, where kr are constants. One
of the constants k, is arbitrary and can be set to zero, for example k1 = 0 ([23],
Section 7-6).
The local solution for the Prandtl stress function is

u(x) = Ex [u(B(T))]- ~Ex [loT ~ u(B(s)) ds J


m
=L Pr(X) kr + G f3 EX[T]
r=2
360 Chapter 6. Deterministic Systems and Input

with the notation in Eq. 6.26. If the Brownian motion B starts at a point on
the boundary of D, for example, X E oDr, and D is regular, then Pr(X) = 1,
pq (x) = 0 for q # r, and Ex [T] = 0 so that u (x) = kr, that is, the local solution
satisfies the boundary conditions exactly [82, 84]. 0

Note: The constant fJ can be calculated from the condition that the applied twist is twice the
volume of the Prandtl function over D ([23], Section 7-6). Therefore, the global solution
is needed to find {J. However, the value of fJ may not be necessary in the design phase
selecting the shape of D from several possible alternatives. Because local solutions help
detect regions of stress concentration in D, they provide a useful tool for the selection of
adequate shapes forD, that is, shapes which do not cause large localized stresses.
If D is a simply connected set, then Pr (x) = 0 for r = 2, ... , m so that the local
solution becomes u(x) = G fJ Ex[T] .l

Example 6.8: Suppose D is a simply connected circular cross section of radius


r > 0. LetT be the first time an JR.2 -valued Brownian motion B starting at x E D
leaves D. The Prandtl function given by the local solution in Eq. 6.26 is

G R(r2 - x2 - x2)
u(x) = G f3 Ex [T] = P
2
1 2 , x E D,

and coincides with the expression of this function in solid mechanics books ([23],
Section 7-4). o
Note: Consider a sphere of radius r > 0 centered at the origin of rand an JRli -valued
Brownian motion B starting at x, II x II < r. Let T be the first time that B starting at x
(r
exits the sphere. Then Ex[T] = 2- II x 11 2 ) fd (Section 8.5.1.1 in this book, [135],
Example 7.4.2, p. 119) . .l

Example 6.9: Estimates of the Prandtl stress function u(x) in Example 6.7 at
x = (2, 1), (3, 1), (4, 1), (2, 2), and (3, 2) for an elliptical cross section D = {x :
(x1/a1) 2 + (x2/a2) 2 < 1} with a1 = 5 and a2 = 3 have been obtained by Monte
Carlo simulation for G f3 = 1/2. For ns = 100 samples of B generated with a
time step !:l.t = 0.001 the largest error of the estimated Prandtl function is nearly
33%. The error decreases to 2% if the sample size is increased tons = 1, 000. 0

Note: The exact expression,

of the Prandtl stress function ([23], Section 7-4) was used to assess the accuracy of the local
solution. .l
6.2. Random walk method 361

Example 6.10: Consider a hollow circular shaft with inner radius a > 0 and outer
radius b > a and let T be the first time an JR2 -valued Brownian motion B starting
at x E D leaves D = {x E JR2 :a <II x II< b}. The local solution is

for u(x) = kt = 0 and u(x) = kz = 0.5 G {3(b2 - a 2 ) on the exterior and


the interior boundaries, aD1 = {x :II x II= b} and aDz = {x :II x II= a},
respectively (Eqs. 6.14 and 6.26).
No analytical results are available for Ex[T] and pz(x). The Monte Carlo
simulation method was therefore used. Numerical results have been obtained for
a = 1, b = 3, and G {3 = 2. The largest errors of the estimated values of the
Prandtl function at x = (1.5, 0), (2, 0), and (2.5, 0) are 21% for n s = 100 and
1.35% for ns = 1, 000 independent samples of B generated with a time step of
!1t = 0.001. 0

Note: The Prandtl stress function is u(x) = -~G {3(xT x - b 2) for a2 < xT x < b2
([23], Section 7-9). This expression of u has been used to assess the accuracy of the local
solution. &

6.2.1.5 Heat and transport equations

Let D be an open bounded subset in Rd, a > 0 denote a constant thermal


diffusion coefficient, and u(x, t) be the temperature at x E D and timet > 0.

Let u be the solution of the heat equation

au<x. t) = az I: a u<x. t)
d 2
(6.27)
at i=1 ax;
with specified initial and boundary conditions giving the functions u(x, 0) for
X ED and u(x, t) for X E aD and t > 0, respectively.

Consider an JRd -valued Brownian motion B starting at x E D and the stop-


ping time Tin Eq. 6.13, where X is an JRd+l_valued diffusion process starting at
(Xt (0), ... ' Xa(O)) =X E D and xd+l (0) = t > 0. The process X(s), s 2: 0, is
defined by the following stochastic differential equation.

dXi(s) = ...fia dBi(s), i = 1, ... , d,


dXa+t(s) = -ds. (6.28)
362 Chapter 6. Deterministic Systems and Input

The local solution of Eq. 6.27 is

u(x, t) = E(X,t) [u(X(T))]. (6.29)

Proof: The generator of X is A = -ajaxd+l + a 2 ~ so that Eq. 6.29 giving the local
solution of Eq. 6.27 is a special case of Eq. 6.14 for ai = 0, f3i} = 2a 2 li;1 , q = 0, and
p = 0. The steady-state solution u 8 (x) = limt-+oo u(x, t) satisfies the Laplace equation
~Us = 0 since CJusfCJt = 0.
The local solution in Eq. 6.29 can be extended to random heterogeneous media.
This extension is used in Section 8.5.1 to find effective conductivity coefficients for random
heterogeneous media. •

Example 6.11: Consider a rod oflength I > 0 with initial temperature u(x, 0) =
100, x E D = (0, 1), and boundary temperatures u(O, t) = u(l, t) = 0 for
t > 0. The temperature at an arbitrary time t > 0 and location x E (0, I) is
u(x, t) = 100 p3(x, t), where PI (x, t), p2(x, t), and p3(x, t) are the probabilities
that the R2 -valued diffusion process X= (XI, X2) defined by

dXI(s) =./2adB(s), Xt(O)=xE(O,I),


{
dX2(s) = -ds, X2(0) t > 0, =
exits D 1 = (0, I) x (0, t) for the first time through the boundaries {0} x (0, t),
{I} x (0, t), and (0, I) x {0}, respectively. The probabilities p,(x, t) satisfy the
condition I;;=I p,(x, t) = 1. Estimates of u(x, t) for a 2 = 1/2, l = 1, t E
(0, 2), x = 0.1, 0.3, 0.5, 0.7, and 0.9, ns = 1, 000 samples, and a time step of
M = 0.001 are in error by less than 3%. Figure 6.4 shows three samples of X
starting at (x = 0.25, t = 0.3) that exit D 1 at times f < t and f = t. ¢

Note: The generator of X= (XI, X2) is A= a 2 0 22


x1
° - a~2 so that (Eq. 6.14)

E(x,t)[u(X(T))] = u(x, t) + E(x,t) [!of A[u(X(s))]ds] = u(x, t),

where T is the first time that X starting at (x, t) exits Dt = (0, l) x (0, t), x E (0, l),
and t > 0. The last equality in the above equation holds since u is the solution of the
heat equation, X(s) E Dt for s E (0, T), and A coincides with the differential operator of
Eq. 6.27 ford = 1.
The exact solution is

u(x, t) = L -;-;
400
00 nrrx 2
sin - 1- exp{ -(n rr aj l) t}
n=I,3, ...
for x E (0, l) and t ~ 0 ([73], Example 26.2, pp. 528-531). A
6.2. Random walk method 363

'-;{1} X (0, t)
0.9

0.8

0.7

0.1

Figure 6.4. Samples of i with generator a 2 (a 2 ;ax?)- a;axz

Example 6.12: Consider the mass transport equation

au = -c au + d a2u
at ax ax 2
defined in D = {x : x > 0} fort > 0 with the initial and boundary conditions
u(x, 0) = 0 and u(O, t) = uo, t > 0, respectively. The local solution of this
equation at (x, t) is u(x, t) = uo P(T < t), where D1 = D x (0, t), T = inf{s >
0: X(s) ¢ Dt}, i = (Xt, Xz) is defined by

{
dXt(s) = -cds + (2d) 112 dB(s),
dXz(s) = -ds,

and B denotes a Brownian motion. Figure 6.5 shows two samples of i starting at
(x = 0.4, t = 0.2) and exiting D 1 at times T < t and T = t.
Estimates of u(x, t) calculated at several locations x and times tare in er-
ror by less than 2% for c = 1, d = 1, uo = 1, a time step !l.t = 0.001, and
ns = 500 samples. The error can be reduced by increasing the sample size. For
example, the error decreases to approximately 1% if n s = 1, 000 samples are used
in calculations [86]. <>
Note: The generator A= d (8 2 ;axr)-c (8jax 1)-8f8xz of X, Eq. 6.14, and the boundary
and initial conditions yield the stated local solution.
The exact solution of the mass transport equation giving the concentration of pollu-
tant at an arbitrary point x and time t > 0 is

u(x, t) =Z
uo[erfc (x-ct)
2./di + exp(c xfd) erfc (x+ct)]
./di ,
2
364 Chapter 6. Deterministic Systems and Input

(x = 0.4, t = 0.2)

U=U
/ 0

time
f
Figure 6.5. Samples of X with generator A= d (a 2 ;ax?) - c (ajaxl)- a;ax2
for c = 1 and d = 1

where erfc(x) = (2/ ..(ir) J000 e-~ 2 d~ is the complementary error function ([195], Exam-
ple Problem 3.1, p. 65). £

6.2.2 Dirichlet boundary conditions (q f. 0)


Our objective is the development of a local solution for Eq. 6.1 with the
initial and boundary conditions given by Eqs. 6.4 and 6.10. The method is based
on the Ito formula and the Feynman-Kac functional.

6.2.2.1 The Feyuman-Kac functional

Let X be an ~d -valued diffusion process defined by the stochastic differen-


tial equation

dX(s) = a(X(s)) ds + b(X(s)) dB(s), s ::: 0, (6.30)

starting at X (0) = x, where B is a Brownian motion in ~d and the entries of the


(d, 1), (d, d) matrices a, b do not depend explicitly on time. We have seen in the
proof ofEq. 6.18 that the generator of X is

d a 1 d a2
A=" a;(x)-
~
+- "(b(x)b(x)T)iJ - - .
ax· 2 i,j=l
~ ax· ax·J
(6.31)
i=l l l
6.2. Random walk method 365

Let p, 1; : JE.d --+ lR be two functions such that (1) p has a compact support
and continuous second order partial derivatives, (2) I; is a bounded Borel measur-
able function, and (3) the drift and diffusion coefficients in Eq. 6.30 are such that
this equation has a unique solution X that belongs a.s. to the support of p.

The Feynman-Kac functional

v(x, t) = Ex [P (X (t)) ef~ ~(X (u)) du J (6.32)

is the solution of the partial differential equation

av (6.33)
at=A[v]+l;v

with v (x, 0) = p (x) and some boundary conditions.

Proof: The derivation of the Feynman-Kac formula involves two steps. First, consider the
JR2 -valued process (Y, Z) defined by

Y(t) = p(X(t)),
{
Z(t) =exp(Jd~(X(u))du),

Y(O) = p(X(O)) = p(x), and Z(O) = 1. The expectation of the product of these two
processes is v(x, t) = Ex [Y(t) Z(t)] in Eq. 6.32. The Ito formula applied to the above
functions of X defining the processes Y and Z gives

{
dY(t) = A[p(X(t))]dt + 'L1=l ('Lf=l bij(X(t))ap~:?))) dBj(t),
dZ(t) = ~(X(t)) Z(t)dt.
Second, the product Y(t) Z(t) can be given in the form

Y(t)Z(t)=p(X(t)) [exp(foh ~(X(u))du) -1] exp({ S(X(u))du)


+ p(X(t)) exp ( { S(X(u)) du)

for 0 < h < t so that we have

v(x, t) =Ex { [exp (foh ~(X(u)) du) - 1]

xEX(h) [p(X(t- h)) exp (fo'-h nX(u)) du)]}

+Ex { EX(h) [p(X(t- h) exp (fo'-h ~(X(u)) du)]}


366 Chapter 6. Deterministic Systems and Input

by averaging, or

v(x,t)=Ex {[exp(foh i;(X(u))du) -1] v(X(h),t-h)}

l
+Ex [v(X(h), t - h)]

=Ex [exp (foh I;(X(u)) du) v(X(h), t - h)


so that, for a small h > 0, v(x, t) can be approximated by

v(x, t) ::::::: (1 + h l;(x) + o(h)) Ex [v(X(h), t - h)].


We now apply the Ito formula to the function v(X(a), t- h) of X(a) for a E (0, h). The
expectation of this formula is

Ex[v(X(h), t - h)]= v(x, t- h)+ Ex [foh A[v(X(a), t- h)]da].


Since h is small, we have

Ex [foh A [v(X(a), t - h)] da] ::::::: h A [v(x, t - h)]+ o(h)


so that
Ex[v(X(h), t - h)J::::::: v(x, t - h)+ h A [v(x, t - h)]+ o(h).
The approximation v(x, t) ::::::: (1 + h i;(x) + o(h)) Ex[v(X(h), t - h)] of v established
previously and the above approximate expression of Ex[v(X(h), t - h)] give

v (x, t) ::::::: v (x, t - h) + h I; (x) v (x, t - h) +h A [ v (x, t - h)] + o(h),


which yields Eq. 6.33 by moving v(x, t - h) to the left side of this equality, dividing by h,
and taking the limit as h ---+ 0.
It can also be shown that a bounded function w(x, t) satisfying Eq. 6.33 coincides
with v(x, t) provided that it has the same initial and boundary conditions as v ([134],
Theorem 8.2.1, p. 135). •

Example 6.13: Let r denote the time a Brownian motion B starting at zero spends
in the positive half line up to a time t > 0. The distribution and density of r are

F(r) = ~ arcsin ([f) and f(r) =


1
n ,Jr (t- r)
, T E (0, t).

This property of r is referred to as the arcsine law. <>


Proof: Take p = 1, d = 1, l;(x) = -fJ 1(x>0}• and X = Bin Eq. 6.32, where fJ > 0 is
arbitrary. Hence, J6
!;(X (u)) du is equal to -{J J6
1(B(u)>0} du, and is proportional to the
6.2. Random walk method 367

time B spends in (0, oo) during [0, t]. The corresponding Feynman-Kac functional is the
solution of the partial differential equation

av(x, t) - - fJ v(x, t ) + 21 a2v(x,t) X> 0, t > 0,


[ ax2 '
at - l a2v(x,t)
X :S 0, t > 0.
2 ax2 '

The function v must be bounded and satisfies the continuity conditions v(O+, t) = v(O-, t)
and av(O+, t)jax = av(O-, t)jax at each t > 0. It can be shown that

v(O, t) = t
Jo
1
n -,/r (t - r)
e-f3r dr = t e-f3r j(r)dr
Jo
so that the density of r is as asserted ([111], pp. 224-226). •

6.2.2.2 The inhomogeneous Schrodinger equation


Let X = (X, xd+l) be the m:d+ 1 -valued diffusion process in Eq. 6.11 and
let f be the first time X starting at (x, t) exits D 1 = D x (0, t), where x E D and
t > 0 (Eq. 6.13). It is assumed that the solution u of Eq. 6.1 and the function q in
this equation are such that Eqs. 6.32 and 6.33 can be applied with X, u, and q in
place of X, p, and s.

The local solution ofEq. 6.1 is

u(x, t) = E(X,t) [u(X(i')) exp (!of q(X(a))da)]

+ E(X,t) [!of p(X(s)) exp (los q(X(a)) da) ds]. (6.34)

Proof: We first note that the local solution in Eq. 6.14 is a special case of Eq. 6.34 corre-
sponding to q = 0.
Consider an JRd+ 2 -valued diffusion process (X, Xd+2) = (X, Xd+l• Xd+2 = Z)

I
defined by the stochastic differential equation

dX(s) = a(X(s)) ds + b(X(s)) dB(s),


dXd+l (s) = -ds,
dZ(s) = q(X(s), s) Z(s) ds,
for s 2: 0 and the initial condition X(O) = x E D, Xd+l (0) = t > 0, and Z(O) = 1. The
generator of the diffusion process (X, Z) =(X, Xd+l, Xd+2 = Z) is

2
a
A*=---+ :Ea;- + q z - -
axd+l
d
i=l
a
ax;
a
axd+2
+-
1
Ld
2 i,j=l
a
(bbT);· - - - ,
1 ax; ax j
368 Chapter 6. Deterministic Systems and Input

and can be applied to an arbitrary function g (x, s, z) with continuous first order partial
derivatives in s and z and continuous second order partial derivatives in x. For the function
g(x, Xd+b xd+2 = z) = 1/r(x, Xd+J) z, we have

a d a
A*[g(x, s, z)] = z [ --"-- + La;(x) ; - + q(x, s)
uXd+l i=l ux;

+ 21 L
d
(b(x) b(x)T)ij ~
az ] 1jr(x, xd+J).
i,j=l Xz X]

For g (X, Z) = 1/r (X) Z and the solution u of Eq. 6.1 in place of 1jr, the Ito formula
gives

E(x,t)[u(X(f)) Z(f)]- u(X(O)) Z(O) = E(x,t) [!of A*[u(X(s)) Z(s)]ds]

by averaging. The above formula yields Eq. 6.34 since (X, Z) starts at (x, t, 1) so that
u(X (0)) Z(O) = u(x, t), and X (s) E D 1 for s E (0, f) implies

fof A* [u(X(s)) Z(s)] ds = - fof p(X(s), s) Z(s) ds


for any (x, t) in D x (0, oo) (Eq. 6.1). Additional technical considerations on the local
solution of the Schrodinger equation can be found in [42] (Section 6.4 ). •
Generally, the expectations in Eq. 6.34 cannot be obtained analytically but
can be estimated from sample paths of X generated by Monte Carlo simulation.
The estimation of these expectations involves a notable computational effort since
they involve integrals depending on the entire history of X in D 1 • A simpler case
is considered in the following equation.

If q(x) = q is a constant, the local solution in Eq. 6.34 becomes

Example 6.14: If q(x) = q is constant, p(x) = 0, a; = 0, and {3;1 = 2t>;1 in


Eq. 6.8, the local solution of this special case of the SchrOdinger equation is

u(x) =Ex [~(.J2B(T))eqT], xED,

for the boundary condition u(x) = ~(x), where B is a Brownian motion in Iffi.d
starting at x E D and T = inf{t > 0: ,fiB(t) rf. D} is a stopping time. The
value of u can be estimated at x E D from n s samples ofT and B (T) by
l
L ~ (.J2 B(T(w), w)) eq
ns
u(x) = ns T(w).
w=l
6.2. Random walk method 369

Numerical results obtained forD = {x E ffi. 2 : x? +xi < 4}, q = - L:f=l a;,
and ~ (x) = exp(ao + a1 XI + a2 x2) are in error by less then 1.5% for ao = 0.5,
a1 = 0.3, a2 = 0.3, and ns = 500 samples of B generated with a time step
flt = 0.001. <>

Note: The errors were calculated with respect to the exact solution u(x) = exp(ao +
a 1 x 1 + a2 x2). That this function satisfies the SchrOdinger equation results by elementary
calculations. ~

Example 6.15: The local solution of the one-dimensional Schrodinger equation


(1/2) u"(x)+q u(x) = 0, x E D = (0, /),with the boundary conditions u(O) =a
and u(l) = f3 is u(x) = Ex [~(B(T)) eq T], where ~(B(T)) =a or f3 if B(T) = 0
or B(T) = l, respectively. The local solution can be estimated by (Eq. 6.35)

u(x)= :s [ a f
n'

1
eqT(w')+f3
n"
w~leqT(w")
]
=c1(x)a+c 11 (x)f3,

where x E (0, 1), n~ and n7 denote the number of samples of a Brownian motion
B starting at x E D that exit D = (0, /) through the left and the right ends of
d, ns = n~ + n7 denotes the sample size, and c'(x), c"(x) giving the weights
of the boundary values u(O) = a, u(l) = f3 result from the above expression of
u(x). The largest value, emax. of the errore= lu(x)- u(x)l/lu(x)l at X = k/10,
k = 1, ... , 9, is smallerthan5%for/ = l,a = 1, f3 = 2,q = 1, flt = 0.001, and
ns = 500. The error can be reduced by increasing the sample size and/or reducing
the time step flt used to generate Brownian motion samples. For example, emax
is less than 1.5% for ns = 1, 000 and flt = 0.0005 [85]. <>
Note: The exact solution of this Schrodinger equation used to asses the accuracy of the
estimator u is
;-;:;-: f3 - 01 cos (.j2q /) ;-;:;-:
u(x)=OICOS(y.<.qx)+ . f"l"":: sin(y2qx).
sm(y2q/)
The estimator u of u is less accurate when D and/or q is large. This unsatisfactory
performance is caused by the dependence of u(x) on exp(q T), a random variable that can
have a heavy tail. To clarify this statement, consider the special case 01 = f3 = 1, q = 1,
l = 2a, and x =a in which u(x) is equal to the expected value of exp(T). Estimates of
E[T] based on 1,000 sample paths of B generated with a time step D.t = 0.001 are 1.0097
for a = 1 and 4.0946 for a = 2. The corresponding estimates of coefficient of variation vr
ofT are 0.7554 and 0.7663, respectively. The estimates of the coefficient of variation of
exp(T) are 1.7506 for a = 1 and 31.6228 for a = 2. The large coefficients of variation are
the cause of unstable estimates for the expectation of exp(T). In fact, E[exp(T)] may not
be bounded. For example, if T is an exponential random variable with expectation 1j A and
A E (0, 1], then E[exp(T)] = +oo so that it will not be possible to obtain stable estimates
of the mean of exp(T) from samples ofT ([79], Section 2.2.6).
The performance of the estimator u can be improved by dividing D in sufficiently
small parts for which this estimator is accurate. Suppose that D = (0, I) is divided in m + 1
370 Chapter 6. Deterministic Systems and Input

equal intervals and letak = u(k lf(m + 1)) be the unknown value of u at the division point
Xk = k lf(m + 1), k = 1, ... , m. The relationships ak = c/c ak-1 + c'/c ak+l> between the
values of u at the division points can be obtained from the functional form of the estimate
u
of applied in the intervals ((k -1)1/(m + 1), (k+ 1)1/(m + 1)). The unknown values of u
at the division points can be obtained from these relationships and the boundary conditions
giving two additional equations, ao =a and am+l = {3. For example, let a = 1, f3 = 2,
q = 1, l = 2, and m = 3. The relationships between the values of u at the division
points are ak = (ak-1 + ak+I)/2, k = 1, 2, 3, by the symmetry of the Brownian motion.
These equations and the boundary conditions give a1 = 1.25, a2 = 1.50, and a3 = 1.75.
Suppose that u(0.3) needs to be estimated. The estimate of u(0.3) can be obtained by this
approach for the domain (0, 0.5) and the boundary conditions u(O) = 1, u(0.5) = 1.25.
The error of this estimate is under 1.2% for ns = 500 and D.t = 0.0005. A

6.2.2.3 The homogeneous SchrOdinger equation

Let u be the solution of Eq. 6.8 or 6.9, where q(x) = A. is a parameter and
p(x) = 0.
Our objective is to find a local solution for these equations, that is, to
find the lowest eigenvalue for Eqs. 6.8 and 6.9. We use Eq. 6.9 to illustrate the
construction of the local solution.

Let (A., u) be the solution of the eigenvalue problem

1
- l ~u(x) = A.u(x), xED,
u(x) = 0, X E aD. (6.36)

The lowest eigenvalue of the operator in Eq. 6.36 is

A.o = sup{p: Ex[exp(p T)] < oo, for all xED}, (6.37)

where T = inf{t ~ 0: B(t) ¢, D} and B(O) = x E D.

Note: The trivial solution u = 0 always satisfies Eq. 6.36. A constant A is said to be an
eigenvalue of Eq. 6.36 if there is a function u that is not zero everywhere in D and the pair
(A, u) satisfies Eq. 6.36. If Dis smooth, the operator -(1/2) D. has a countable number of
distinct eigenvalues 0 < AQ < AI < · · · such that An ~ oo as n ~ oo ([43], Volume 1,
Section V.5). Also, the eigenvalues of the operator -(1/2) D. are strictly positive as it can
be found from the equality ((1/2) D.u, u) = -A (u, u) using integration by parts and the
factthatu is an eigenfunction, where (u, v) = fv u(x) v(x)dx.
Let X= (X= B, xd+l) be anJR'i+ 1-valued diffusion process, where dXd+l (t) =
dt, X(O) =X ED, and xd+l (0) = 0. The generator of X is A= a;axd+l + (1/2) /), =
a;at + (1/2) D., where /), = "'£1=1 a2 ;ax'f. Suppose that Up is a solution of Eq. 6.36
for A = p > 0. The Ito formula applied to the function exp(p xd+l (t)) Up(X(t)) =
6.2. Random walk method 371

exp(p t) up(B(t)) gives

up(x) =Ex [epT up(B(T))]- Ex [loT A[eps up(B(s))]ds]

= EX [eP T Up (B(T))] -EX [loT eP 8 (p + (1/2) Ll) Up(B(s)) ds] , X E D.

If Ex[eP T] is finite, then up(x) = 0 for all x E D since up(B(T)) = 0 by the boundary
condition in Eq. 6.36, B(s) E D for s E (0, T), and up(B(s)) satisfies Eq. 6.36 for J... = p
by assumption. In this case p cannot be an eigenvalue of Eq. 6.36 so that Ao must be at
least sup{p : Ex[eP T] < oo, Vx E D}. It can be shown that the lowest eigenvalue of the
operator of Eq. 6.36 is given by ([57], Section 8.8, pp. 263-270)

J...o = sup{p : Ex [eP T] < oo, Vx E D}.

The use of Eq. 6.37 is impractical because it requires us to estimate the expecta-
tion Ex[exp(p T)] for many values of x E D and values of p. A more efficient Monte
Carlo method for estimating the lowest eigenvalue of integral and differential operators is
discussed in Section 6.6.2 . .l

6.2.3 Mixed boundary conditions


The RWM in Eqs. 6.14 and 6.18 has been developed to solve locally Eq. 6.1
with Dirichlet boundary conditions. If the unknown function u is not specified
everywhere on the boundary aD of D, Eqs. 6.14 and 6.18 cannot deliver the local
solution ofEq. 6.1.
Example 6.16: Let u be the solution of u"(x) = -m(x)fx, x E D = (0, 1),
with the boundary conditions u'(O) = 0 and u(1) = 0, where m(·) is a specified
continuous function and x > 0 is a constant. The local solution in Eq. 6.18 with
d = 1, D = (0, 1), and X = B gives

u(x) = Ex[u(B(T))]- Ex [loT A [u(B(s))] ds J


= Ex[u(B(T))] + 21X Ex [loT m(B(s)) ds J,

where B is a Brownian motion and T = inf{t ~ 0 : B(t) fl. D} is the first time B
starting at x E D exits D. We cannot calculate the value of u(x) from the above
equation since B can exit D through both its ends so that u(B(T)) is either u(O) or
u(1) and u(O) is not known. Hence, the expectation Ex[u(B(T))], and therefore
the value u(x) of u, cannot be calculated.<>
We will see that mixed boundary value problems can be solved locally by
using Brownian motion and diffusion processes, but these processes have to be
reflected toward the interior of D when they reach aD at a point where Neumann
conditions are prescribed. Also, we will have to extend the classical Ito formula
to handle functions of reflected Brownian motion and diffusion processes.
372 Chapter 6. Deterministic Systems and Input

6.2.3.1 Brownian motion reflected at zero


Let B be a Brownian motion. The absolute value of B, that is, the process
IB I, is called the Brownian motion reflected at zero. We ( 1) state without proof
Tanaka's formula providing a representation of the Brownian motion reflected
at zero, (2) prove a first extension of Tanaka's formula for the process g(IBI),
g E C2 (JR), and (3) apply the extended version of Tanaka's formula to solve locally
some mixed boundary value problems.

The local time process is

L(t) =lim- 1 lot 1(-e,eJ(B(s))ds, (6.38)


£,!-0 2£ 0

where the limit exists in Lz and a.s. ([42], Chapter 7).


Note: The local time L (1) is adapted to the natural filtration of B, (2) has continuous
increasing samples starting at zero, so that it is of finite variation on compacts, and (3) can
increase a.s. only when B is zero, that is, J000 l(t:B(t);ioO)dL(t) = 0 a.s. ([42], Theorem 7.6,
p. 150). An extensive discussion on the local time process can be found in [42] (Chapters 7
and 8). &

Example 6.17: The expectation of the local time process L exists and is equal to
E[L(t)] = ,J2 t jn for all t 2: 0. <>
Proof: The integral JJ1(-e,e) (B(s)) ds in the definition of the local time process is bounded
by t for any e > 0 so that

E [f 1(-e,e)(B(s))ds] = f E [1(-e,eJ(B(s))] ds = f [2 ¢>(ei-Js)- 1] ds

by Fubini's theorem and properties of the Brownian motion. Hence,

E[fot 1(-e,eJ(B(s))ds]::::: f (,J~:s +o(e)) ds= ~ +o(e), e ,(, 0,

and ( 4 e ..(iI .f27r + o (e)) I (2 e) converges to ,J2 t I rc as e ,(, 0 so that the expectation of
L(t) is as stated.
Tanaka's formula given later in this section (Eq. 6.39) provides an alternative way
for calculating E[L(t)]. According to this formula we have E[IB(t)l]- E[IB(O)IJ =
E[B(t)] + E[L(t)], where B is a Brownian motion. For B(O) = 0 we have E[L(t)] =
E[IB(t)i], which gives the stated expectation for L(t) since B(t) ~ N(O, t). •

Example 6.18: Let X be a real-valued mean square differentiable stationary Gaus-


sian process with mean zero and unit variance. The expectation of the local time
of X is E[L(t)] = t j ...;'2rr for all t 2: 0, where Lis given by Eq. 6.38 with X in
place of B.
We note that the expectations of the local time process L(t), t 2: 0, for a
Brownian and an m.s. differentiable Gaussian process with mean zero and vari-
ance 1 are proportional tot 112 and t, respectively.<>
6.2. Random walk method 373

Proof: The mean rate at which X crosses zero is iJ jrr:, where &2 E[X(t) 2] (Sec-
tion 3.10.1). Let N(t) be the number of zero-crossings of X during a time interval [0, t]
and let Di be the largest time interval including the time 7i of the zero-crossing i of X such
that X (s) E ( -e, e) for all s E Di. The approximate length of Di is 2 e IX(Ti) and has
the expectation 2 e Jli72!ir since the density of X at the time of a zero-crossing of X is
f(z) = (z/ir 2) exp( -z2 /(2 &2 )), z ~ 0.
The expected value of the total residence D(t) =I:~~) Di of X in ( -e, e) during
a time interval (0, t) is

E[D(t)] =
.
EIE [~
i=I
Di I N(t)] ) = E[N(t)] E[J_!____]
X(Ti)
= iJ t 2 t:
rr
~
o-

so that E[D(t)]/(2e) = tj../2ii. Note that the expectation of the local time is a fixed
fraction oft for stationary mean square differentiable Gaussian processes. •

The following equation provides a representation of the Brownian motion


reflected at zero, that is based on the local time process in Eq. 6.38. The represen-
tation is very useful for both theoretical developments and applications.

Tanaka's formula. If B is a Brownian motion, then

Jot Jot
IB(t)l - IB(O)I = sign(B(s)) dB(s) +lim...!_ 1(-s,sJ(B(s)) ds
s.J.-0 2e
= B(t) + L(t) a.s., (6.39)

where the limit is in L2 and iJ is a Brownian motion ([42], Section 7.3).

JJ
Note: The process B(t) = IB(O)I + sign(B(s)) dB(s) is a square integrable martingale
with continuous samples and quadratic variation (Section 4.5.3)

[B, B](t) = f (sign(B(s))) 2 d[B, B](s) =t a.s.,

where sign(x) is -1, 0, and 1 for x < 0, x = 0, and x > 0, respectively. Hence, B is a
Brownian motion (Section 4.6.2). The second integral on the right side of Eq. 6.39 is the
local time process L(t) in Eq. 6.38. Figure 6.6 shows a few sample paths of B and the
corresponding samples of the Brownian motion process B(t) = JJ
sign(B(s)) dB(s) for
B(O) = 0.
The Tanaka formula also shows that IB I is a semimartingale since B is a square
integrable martingale and L is a continuous increasing process that is adapted to the natural
filtration of B (Section 4.4.1 ). ~

Let hs be a real-valued function defined by h 8 (x) = lxl for lxl > e and
he (x) = e /2 + x 2 j (2 e) for lx I :S e, where e > 0 is arbitrary (Fig. 6.7).
374 Chapter 6. Deterministic Systems and Input

10.-----------------~ 10.------------------ ,
Samples of B Samples of f3

-5L---------~--------~
0 5 10 5 time 10
time

Figure 6.6. Five sample paths of a Brownian motion B and the corresponding
samples of B for B(O) = 0

Figure 6.7. The approximationh 8 (x) of lxl

Extended local time process. If g E C2 (IR),

i(t) =lot g'(IB(s)j) dL(s) = g'(O) L(t). (6.40)

Note: If g is the identity function, then the process L becomes the local time process L
defined by Eq. 6.38. A
6.2. Random walk method 375

First extension of Tanaka's fonnula. If g E C2 (IR), then

g(IB(t)l)- g(IB(O)i)

= t g'(IB(s)i) sign(B(s))dB(s) +!2 lot g"(IB(s)i)ds + L(t),


lo
(6.41)

where L(t) = g'(O) L(t) (Eq. 6.40).

Proof: The classical Ito formula cannot be applied to g(X) = g(IBI) since the mapping
B r+ g(IBI) is not differentiable. Formal calculations using Ito's formula yield

g(IB(t)l)- g(IB(O)I) = lo0t g' (IB(s)l) sign(B(s)) dB(s) + -2I lot0 g" CIB(s)l) ds
+lot g' (IB(s)l) 8(B(s)) ds
since

dg(lxl)jdx = g 1 (I xi) dlxlfdx = g 1 (lxl) sign(x) and


d 2 g(lxl)jdx 2 = g" (lxl) (sign(x)) 2 + g' (lxl) d sign(x)jdx = g'' (lxl) + 2 g' (lxl) 8(x),
where 8 is the Dirac delta function. We show that the above equation holds if the integral
JJ g (IB(s)l) 8(B(s)) ds is the process L in Eq. 6.40.
1 The formula in Eq. 6.4I can be
obtained from Tanaka's formula or by direct calculations. If g is the identity function,
Eq. 6.4I coincides with Tanaka's formula.
Tanaka'sformula: Because g has a continuous second derivative and the R'2-valued
process (B, L) is a sernimartingale, the classical Ito formula can be applied to g(X) =
g(B + L) and gives

g(IB(t)l)- g(IB(O)I) = lo0t g' (IB(s)l) d(B + L)(s) + -2I lot0 g" CIB(s)l) ds
= Jot g'(IB(s)l)dB(s)
IJot g"(IB(s)i)ds + Jo
+2 t g (IB(s)l)dL(s)
1

= Jot g'(IB(s)l)dB(s)
IJot g"(IB(s)l)ds +g'(O)L(t),
+2
that is, the result in Eq. 6.41.
Direct calculations: The proof of Eq. 6.4I involves three steps. First, a sequence of
approximations hn,e (x) converging to lx I is constructed, where hn,e E C2 (JR) for each inte-
ger n 2::: I and s > 0. Second, the Ito formula is applied to the sequence of approximations
g(hn,e(B)) of g(IBI). Third, it is shown that the resulting sequence of equations giving
the increment of g(hn,e(B)) in a time interval [0, t] converges to Eq. 6.4I as n --+ oo and
s ,).. 0. Similar steps are used in [42] (Chapter 7) to prove Tanaka's formula.
1. Let he be the function in Fig. 6.7 defined for somes > 0. The first derivative of he is
hi(x) =-I, xjs, and I for x.:::; s, x E [-s, s], andx 2::: s, respectively. The function he
is not twice differentiable at x = ±s, and we seth~ zero at x =±e. With this definition
the second derivative of he is h~(x) = (Ijs) I(-e,e)Cx).
376 Chapter 6. Deterministic Systems and Input

L: L:
Consider the sequence of functions

hn,e(x) = he(x- y) C{Jn(Y) dy = he(z) CfJn(X- z) dz,

where the functions CfJn E C00 (JR) have compact support shrinking to {0} as n ~ oo, for
example, CfJn (y) = n rp(n y) and rp(y) = c exp( -(1 - y 2 )- 1) for IYI < 1 and rp(y) = 0
for IYI 2: 1, where c > 0 is a constant such that J2 1 rp(y) dy = 1. The functions hn,e are
infinitely differentiable and are such that (1) hn,e ~ he, h~,e ~ h~ uniformly in lR as
n ~ oo and (2) h~,e ~ h~ pointwise as n ~ oo except at x = ±e.

L:
Consider first the functions hn,e. Since

lhn,e(X)- he(x)l :5 lhe(x- y)- he(x)l CfJn(Y) dy, X E IR,

it is sufficient to show that for '7 > 0 arbitrary there is n(17) such that f~oo lhe(x - y)-
he(x)l CfJn(Y) dy < '7 for all n > n(17) and for all x E JR. The function he is uniformly
continuous so that there is a 8 > 0 for which lhe(u)- he(v)l < '7 if lu- vi < 8. Divide
the domain of integration of f~oo Ihe (x - y) - he (x) I CfJn (y) dy in two intervals, ( -8, 8)
and ( -8, 8/. The integral on ( -8, 8) is smaller than '7 for all n. The integral on ( -8, 8)c is
equal to zero for n large enough so that there is n(17) such that lhn,e(x)- he(x)l is smaller
than '7 for all n > n ('7). The same arguments can be used to prove the uniform convergence
of h~ e to h~. That h~ e converges to h~ pointwise for x i= ±e results by similar consider-
ation~. At x = e, the difference h~,e(e) - h~ (e) is f~oo (h~ (e - y) - h~ (e)) CfJn (y) dy =
(1/e) J5e CfJn (y) dy and approaches 1/(2e) i= 0 as n ~ oo. Hence, h~,e does not converge
to h~ at x = e and at x = -e.
2. The Ito formula applied to B(t) r+ (go hn,e)(B(t)) = g(hn,e(B(t))) gives

g(hn,e(B(t)))- g(hn,e(B(O))) =lot g'(hn,e(B(s)))h~,e(B(s))dB(s)


+~lot g" (hn,e(B(s))) (h~,e(B(s))) 2 ds
+~ f g'(hn,e(B(s)))h~,e(B(s))ds a.s. (6.42)

for each t 2: 0. The expectation,

Ex[g(hn,e(B(t)))]- g(hn,e (x)) = ~Ex [f g" (hn,e(B(s))) (h~,e(B(s))) 2 ds]


+~Ex [f g1 (hn,e(B(s)))h~,e(B(s))ds],
of the above equation with B(O) = x can be used to estimate the local solution of mixed
boundary value problems by Monte Carlo simulation, as illustrated later in this section.
The stochastic integral JJ g 1(hn,e (B(s))) h~,e (B(s )) d B(s) is a martingale starting at zero
so that its expectation is zero.
The first extension of the Tanaka formula in Eq. 6.41 results by taking the limits of
Eq. 6.42 as n ~ oo and e .,(, 0. Before performing these operations we make the following
6.2. Random walk method 377

observation. Consider the sequence Tq = inf{t 2: 0 : IB(t)l 2: q}, q = I, 2, ... , of


stopping times and let BTq be the Brownian motion B stopped at Tq. The process BTq is
bounded by q. If Eq. 6.41 with BTq in place of B is valid for each q, then this equation
is valid. Hence, we can assume that B takes values in a compact set. This assumption is
made in the following step.
3(a). For each t ::,:: 0 and g E C2 (JR), we have almost surely

g(he(B(t)))- g(he(B(O))) = f g1 (he(B(s)))h~(B(s))dB(s)


+! t g''
2}o
(he(B(s))) (h~(B(s))) 2 ds + ! (
2}o
g 1 (he(B(s))) h~ (B(s)) ds. (6.43)

Consider first the left side of Eq. 6.42. By the definitions and the properties of hn,e
and he the sequence of functions go hn,e converges togo he uniformly in IR as n -+ oo so
that (go hn,e)(B) converges to (go he)(B) in L2 and a.s. Hence, the left side of Eq. 6.42
converges to the left side of Eq. 6.43 in L2 and a.s. for each t 2: 0 as n -+ oo.
Consider the first integral on the right side of Eq. 6.42. For each t ::,:: 0 the sequence
of functions l[O,t] (g 1 o gn,e) (g~,e)(B) converges to l[O,t] (g 1 o he) (h;,)(B) uniformly on
[0, oo) x Q as n -+ oo. This observation and the equality (Section 4.5.3)

E [ (f g' (hn,e(B(s))) h~,e(B(s)) dB(s)- lot g (he(B(s))) h~(B(s)) dB(s)) 2 ]


1

=lot E [ (g' (hn,e(B(s))) h~,e(B(s))- g (he(B(s))) h~(B(s)) ) 2] ds,


1

imply the L2 convergence of Jci g 1 (hn,e(B(s))) h~,e (B(s)) dB(s) to the stochastic integral
Jcig'(he(B(s)))h~(B(s))dB(s) as n-+ oo. The a.s. convergence of this sequence of
integrals results by bounded convergence.
Consider the second integral on the right side of Eq. 6.42. The sequence of func-
tions (g" o hn,e) (h~,e) converges to (g 11 o he) (h~) uniformly in IR as n -+ oo so that
(g 11 o hn,e) (h~,e)(B) converges to (g 11 o he) (hi)(B) in L2. Because the functions (g" o
hn,e) (h~,e)(B) are bounded a.s. on [0, t], Jci g 11 (hn,e(B(s))) (h~,e(B(s))) 2 ds converges
to Jci g 11 (he(B(s))) (h~(B(s))) 2 ds in L2 and a.s. as n-+ oo by bounded convergence.
Consider now the last integral in Eq. 6.42. The integrand g' (hn,e(B(s))) h~,e(B(s))
converges to g 1 (he(B(s))) (h~ (B(s))) as n -+ oo for w E Q \ N(s) and a fixed s ::,:: 0,
where N(s) = {w: B(s, w) = ±e} c Q and P(N(s)) = 0. Hence,

lim g 1 (hn e(B(s))) h~ e(B(s))


n-+oo ' '
= g (he(B(s))) h~ (B(s))
1

holds a.s. for each fixed s 2: 0. Let N = {(s, w) E [0, t] x Q : B(s, w) = ±e} be a subset of
[0, t] x Q where the preceding equality does not hold. The set N is A. x P-measurable, where
A. denotes the Lebesgue measure. This set has measure zero since fN(A. x P)(ds, dw) =
f[O,t] A.(ds) fN(s) P(dw) (Fubini's theorem) and JN(s) P(dw) = 0 for each s E [0, t].
Therefore,
lim g' (hn e(B(s))) hn" e(B(s)) = g 1 (he(B(s))) h'!. (B(s))
n-+oo ' ' c.

is true for A.-almost all s ::,:: 0, almost surely. Hence, the third integral on the right side of
Eq. 6.42 converges to Jci g' (he(B(s))) h~ (B(s)) ds in L2 and a.s. for each t ::,:: 0 as n -+ oo
by bounded convergence (Section 2.5.2.2). This completes the proof of Eq. 6.43.
378 Chapter 6. Deterministic Systems and Input

3(b). For each t :2:: 0 we have a.s.

g(/B(t)l)- g(/B(O)I) =lot g (/B(s)l) sign(B(s)) dB(s)


1

+-1 lot g (/B(s)/)ds +lim -21 lot g'(he(B(s))) 1(-s s)(B(s))ds,


11 (6.44)
2 0 s.j,O 8 0 '

where the above limit is in Lz and is equal to g1 (0) L (t).


Consider first the left side of Eqs. 6.43 and 6.44. The function (go h 8 )(B(t)) con-
verges to g(/B(t) /)in Lz and a.s. ass .J, 0 since these functions differ only in ( -s, s ), their
difference is bounded in this interval so that E [(g(/B(t)/)- g(h 8 )(B(t))) 2 ] is smaller
than a finite constant times the probability P(/B(t)/ :::e), and this probability approaches
zero ass .J, 0.
Consider now the integrals on the right side of Eqs. 6.43 and 6.44. The second
moment of the difference

I (t) = lot g1(h 8 (B(s))) h~(B(s)) dB(s) -lot g1(/B(s)/) sign(B(s)) dB(s)

of the first integrals on the right side of Eqs. 6.43 and 6.44 is (Section 4.5.3)

E[I(t) 2] = f E [(g 1 (h 8 (B(s)))h~(B(s))- g (/B(s)/) sign(B(s))) 2 ]


1 ds.

The above integrand is bounded by

2 E [(g1 (h 8 (B(s)))- g 1 (/B(s)l)) 2 (h~(B(s))) 2 ]


+ 2E[g1(/B(s)/) 2 (h~(B(s))- sign(B(s))) 2]

and this bound converges to zero ass .J, 0 since (h~(x))) 2 ::: 1, /h~(x) - sign(x)/ :::
1(-s,s)(x), and g 1 is continuous.
The expectation of the square of the difference,

J (t) = f g 11 (h 8 (B(s))) (h~(B(s))) 2 ds- f g 11 (/B(s)/)) ds,

of the second integrals on the right side of Eqs. 6.43 and 6.44 is

E[J(t) 2] =E [ (f (g 11 (h 8 (B(s)))- g 11 (/B(s)/)) (h~(B(s))) 2 ds

+lot g"<IB<s)l) <h~<B<s)) 2 - l)dsrJ = E [ut<t) + Jz<t)) 2]


::0 2 E [ 1t (t) 2 ] + 2 E [Jz(t) 2 ],
where J 1 and lz denote the first and second integrals, respectively, in the above equation.
6.2. Random walk method 379

Also, we have

\h (t)\ :5lot \l (hs(B(s)))- g (\B(s)\)\ (h~(B(s))) 2 ds


1 11

:5 lot \g' (hs(B(s)))- g (\B(s)l)\1c-s,s)(B(s))ds :5


1 11 f 1(-s,sJ(B(s))ds, c1

\h(t)\ :5 lot \g 11 (\B(s)\)\\h~(B(s)) 2 -1\ ds :5 f 1(-s,s)(B(s))ds,


c2

where the constants q, c2 > 0 exist since the functions g" and hs are continuous and the
Brownian motion is restricted to the interval ( -8, 8). Hence,

Lc{ r
and

2
E [ ( { 1(-s,s)(B(s))ds) ] = t 2 1(-s,sJ(B(s))ds dP

:5 t 2 LGlot 1(-s,s)(B(s))ds) dP = t { (L 1(-s,s)(B(s))dP) ds

= t fP(\B(s)\ < 8)ds = t { (ci>(8/vS)- ci>(-8/vS)) ds-+ 0, 8-+ 0,

Jd
proving the convergence of g 11 (hs(B(s))) (h's(B(s))) 2 ds to Jd
g 11 (\B(s)\)ds in L2 as
8 {, 0. Hence, the limit in Eq. 6.44 must also exist in L2 and the equality in Eq. 6.41 holds
a.s. for each fixed t 2:: 0. That lims-J_o(1/28) Jd
g 1 (hs(B(s))) 1(-s,s) (B(s))ds is equal
to g 1 (0) L(t) follows from the observations that JJ1c-s,s) (B(s))ds converges to zero as
8 {, 0 and

. 1 !of gI (hs(B(s))) 1(-s s) (B(s))ds- g I (0) L(t) I


Ihm-
s{-028 o '
= llim _!_ t [g 1 (h 8 (B(s)))- g 1 (0)] 1(-s s) (B(s))dsl
s{-028]o '

. 1
:5 hm- !of Ig (hs(B(s)))- g (0) I 1(-s s) (B(s))ds
s{-028 0
I I
'

:Slim- lot 1(-s (B(s))ds,


C8
s)
s{-0 2 8 0 '

where c > 0 and 8 bound \g'1 (x) I and hs (x ), respectively, for x E ( -8, 8). •

The expectation ofEq. 6.41 for IB(O)I =xis

£X[g(\B(t)i)]- g(x) = ~ex [lot g"(IB(s)l) ds J+ g (0) Ex [L(t)]. 1 (6.45)


380 Chapter 6. Deterministic Systems and Input

Note: This formula can be used to find the local solution for mixed boundary value prob-
lems. For example, if g is the solution of a partial differential equation defined in (0, l)
and g 1 is prescribed at x = 0, the last term in the above equation captures the contribution
of this boundary condition to the local solution. &

The process i in Eq. 6.40 can be approximated by

- g'(O) (
L(t, w):::: 2S Jo 1(-e,e)(B(s, w) ds,

for a small 8 > 0, that is, by the time B spends in the range ( -8, 8) during the
time interval (0, t) scaled by g'(0)/(2 8). The approximation has been used in the
following two examples to find local solutions by Monte Carlo simulation.

Example 6.19: Let u be the solution of u"(x) = -m(x)fx, x E (0, 1) (Exam-


ple 6.16). The function u gives the displacement in a beam with unit span and
stiffness x > 0, where m is the bending moment in the beam. It is assumed that
the beam supports a uniform load of intensity q > 0. Three boundary conditions
are considered.
(i) Fixed left end andfreerightendimplying u(O) = 0, u '(0) = 0, and -m(x)fx =
a (1- x) 2 , where a= q/(2 x).
(ii) Fixed left end and simply supported right end implying u(O) = 0, u '(0) = 0,
u(l) = 0, and -m(x)fx = a (x 2 - x) + {3 (x - 1), where {3 = mo/x and mo
denotes the unknown bending moment at the left end of the beam.
(iii) Simply supported ends with a rotational spring of stiffness kat the left end
implying u(O) = 0, u(1) = 0, and -m(x)fx =a (x 2 - x)- (k u'(O)fx)(x- 1).
Let X= IBI be a Brownian motion reflected at zero, starting at x E (0, 1).
LetT= inf{t > 0: IB(t)i ~ 1} denote the first time X= IBI exits (0, 1). The
local solution is (Eq. 6.45, [82]),

u(x) = Ex[u(X(T))] +Ex [loT m~~s)) ds J- u'(O) Ex [L(T)].

This equation specialized for the above boundary conditions gives the following
local solutions.
(i) The local solution cannot be applied directly since Ex [u(X (T))] = Umax is not
known. The boundary condition u(O) = 0 and the local solution at x = 0 yield
Umax = ~E 0 [J[ (1- X(s)) 2 ds J.
Once Umax has been calculated, we can apply
the local solution to find the value of u at an arbitrary x E (0, 1). The exact value
of the tip displacement is Umax = 0.25 a. The estimated value of Umax based on
ns = 1, 000 samples of X generated with a time step /).t = 0.001 is in error by
0.24%. Note that the value of Umax can also be used to solve locally the Dirichlet
boundary value problem for u(O) = 0 and u(1) = u max·
6.2. Random walk method 381

(ii) The local solution,

u(x) =~Ex [loT (a (X(s)- X(s) 2 ) + : 0 (1- X(s))) ds J,


depends on the unknown moment m O· We can find mo from the local solution
applied at x = 0 and the boundary condition u (0) = 0. The estimate of m o can be
used in the above equation to find u(x) at any x E (0, 1). The estimated value of
mo from ns = 1, 000 samples of X with a time step of /)..t = 0.001 is in error by
0.16%.
(iii) The local solution is
u(x) = -u 1 (0) Ex[L(T)]

+~Ex [loT ( 0 ·~ a (X(s)- X(s) 2 ) + k u'(O) (1- X(s))) ds J,


since u(X(T)) = u(1) = 0. The Monte Carlo estimate of Ex[L(T)] based on
ns = 500 samples and a time step /)..t = 0.001 is 1.0248. From the condition
u(O) = 0, the resulting estimate of the moment mo is 0.0494. The error in the
estimated value of mo is 1.2%. The value of u' (0) and the above expression of the
displacement function u can be used to estimate the value of this function at any
X E (0, 1). <>
Proof: Let X = IB I = B + L be a Brownian motion reflected at zero, where B is a
Brownian motion and L denotes the local time process in Eq. 6.38 and X(O) = x. The Ito
formula in Eq. 6.41 gives

u(X(t))- u(X(O)) = t u (X(s))dB(s) +u'(O)L(t) +!2 lot u (X(s))ds


lo
1 11

for g = u, where u'(x) = du(x)jdx and u11 (x) = d 2 u(x)jdx 2 . The expectation of the
above equation in the time interval (0, T) is

Ex [u(X (T))] - u(x) = u1 (0) Ex [L(T)] + ~Ex [loT u (X (s)) ds] .


11

Because u is the solution of u"(X(s)) = -m(X(s))/x for s < T, the above equation
yields the stated local solution [82]. •

Example 6.20: Let u be the solution of (1/2) u"(x)+q u(x) = 0, x E D = (0, l),
with the Dirichlet and Neumann boundary conditions u'(O) =a and u(l) = {3,
where a, {3 are some constants. The local solution of this Schrodinger equation is

u(x) = fJ Ex [eq T] -a Ex [loT eqs dL(s) J,


where T = inf{t > 0: IB(t)l fJ. D}, B(O) = x ED.
The estimates of u given by the local solution are in error by less than 2%
for a = 1, fJ = 2, q = 1, l = 1, and ns = 500 samples of X generated with a
time step of M = 0.0005. <>
382 Chapter 6. Deterministic Systems and Input

Proof: Let X be an JR2-valued process with coordinates Xr = IBI and X2 defined by


the differential equation dX2(t) = q X2(t)dt with X2(0) = 1 (Section 6.2.2.1). The Ito
=
formula applied to the function u(IB(t)l) X2(t) u(B(t) + L(t)) X2(t) gives

u(IB(t)l) X2(t) - u(x) ={ u' (B(s) + L(s)) X2(s) d(B(s) + L(s))

+lot u(B(s) + L(s)) dX2(s) + ~ { u11 (B(s) + L(s)) X2(s) ds

so that

Ex [u(IB(T)I) X2(T)] - u(x) = Ex [loT u' (IB(s)l) X2(s) dL(s)]

+ q Ex [loT u(IB(s)l) X2(s) ds] + ~Ex [loT u" (IB(s)l) X2(s) ds]
by considering the time interval [0, T], using the definition of X2, and averaging. Since
IB(s)l is in D for s < T, then (1/2) u" (IB(s)l) + q u(IB(s)l) = 0 so that

This formula yields the stated expression of u(x) since u(IB(T) I) = f3 and X2(s) = eq s.
The exact solution of the Schrodinger equation is

u(x) = f3- (a/~)sin(~l)


~
cos(y2ql)
r;;-:
cos(y2qx) +
a
~
y2q
r;;-:
sin(y2qx)

for q > 0. This solution has been used to calculate the error of the local solution. •

Example 6.21: Let X be a diffusion process defined by the stochastic differential


equation dX(t) = a(X(t)) dt + b(X(t)) dB(t), t ::: 0, where B is a Brownian
motion. It is assumed that the coefficients a, bare bounded and satisfy the uni-
form Lipschitz conditions so that the above equation has a unique solution (Sec-
tion 4.7.1.1). The absolute value of X, referred to as X reflected at zero, is given
by

IX(t)l- IX(O)I = lootsign(X(s)) dX(s) +lim-


1 lot

b(X(s))
e,I.O 0
2 1c-e,e)(X(s)) ds

= X(t) + b(0) 2 L(x)(t),


where L(x)(t) = liiDe,!.O 1/(2£) J~ 1(-e,e)(X(s)) ds and the limit is in L2. If a =
0 and b = 1, X is a Brownian motion and the above equation coincides with
Tanaka's formula (Eq. 6.39). 0
6.2. Random walk method 383

Proof: Let hn,s and h 8 be the functions used to prove Eq. 6.41, where 8 > 0 and n ~ 1 is
an integer. The classical ItCi formula applied to function hn,s (X) gives

hn,e(X(t))- hn,e(X(O)) = Jot h~, 8 (X(s)) dX(s) + 2 Jot h~. 8 (X(s)) b(X(s)) 2 ds.
1

The limit of this equation as n --.. oo and 8 ,!. 0 yields the stated representation of
IX(t)l - IX(O)I based on arguments similar to those used to prove Eq. 6.41. The second
integral on the right side of the above equation is equal to b(Of L (x) (t) since it converges
to (1/2) JJh~(X(s))b(X(s)) 2 ds = (1/28) JJ b(X(s)) 2 1(-s,s)(X(s))ds asn--.. oo.
The proof can be completed by considerations of the type used to prove Eq. 6.44. •

6.2.3.2 Brownian motion reflected at two thresholds

Let a and b > a be some constants and let r be a periodic function with
period 2 (b -a) defined by r(x) =
lx - al for lx - al ::: b -a. Figure 6.8
shows this function in [ -3, 3] for a = 0 and b = 1. Let B be a Brownian motion.

-2 -1 0 2 3
X

Figure 6.8. Mapping for Brownian motion reflected at a = 0 and b = 1

The process r(B) has the range [0, b- a], and is referred to as Brownian motion
reflected at two thresholds. Figure 6.9 shows a sample of a Brownian motion B
and the corresponding sample of r (B) for a = 0 and b = 1.
Let Xk(a) = a+ 2k (b- a) and Xk(b) = b + 2k (b- a), k E Z, be the
collection of points on the real line where r (x) is equal to 0 and b- a, respectively.
Fore E (0, (b- a)/2) define the intervals Ik,e(a) = {x : lx- Xk(a)l ::: s} and
h,s(b) = {x : lx - Xk(b)l ::: s} centered on Xk(a) and Xk(b), respectively, and set
J(a) = U~_ 00 (Xk(a), Xk(b)) and /(b)= U~_ 00 (Xk(b), Xk+l (a)).
384 Chapter 6. Deterministic Systems and Input

Sample of B Sample of r(B)


2.5r-___;---------, 1~~--r.--,--~,

2
0.8

1.5
0.6

0.4

0.2

-0.5L-~-~-~-~--' 0
0 2 3 4 5 0 2 3 4 5

time time

Figure 6.9. A sample of Band the corresponding sample of r(B) for a = 0 and
b=1

Second extension of Tanaka's formula. If g E C2 (JR.), then

g(r(B(t)))- g(r(B(O))) =lot g'(r(B(s))) (1/(a) -1/(b)) (B(s))dB(s)

+ L
00
[g'(O)L(t;xk(a))-g'(b-a)L(t;xk(b)))+- 11t
2 0
g 11 (r(B(s)))ds,
k=-00
(6.46)

where L(t; Xk(a)) = lim8 _J..o(1/2s) J~ 1h(a)(B(s)) ds and the expression of


L(t; Xk(b)) is given by L(t; Xk(a)) with lk(a) in place of Ik(a).

Proof: The Tanaka formulas in Eqs. 6.39 and 6.41 cannot be used to prove Eq. 6.46 since
they deal with Brownian motion processes reflected at a single threshold. Therefore, we
have to use direct calculations to establish the second extension of Tanaka's formula.
We only prove that r(B) satisfies the condition,

r(B(t))- r(B(O)) =lot (1I(a) - 1/(b)) (B(s)) dB(s)


00
+ L (L(t; Xk(a))- L(t; Xk(b))), a < b, (6.47)
k=-00

that is, a special case of Eq. 6.46 for g equal to the identity function. The proof involves
three steps (Eq. 6.41). First, a sequence of smooth approximations hn, 8 (B) converging to
r(B) is developed. Second, the ItO formula is applied to the function hn, 8 (B). Third, the
limit of the resulting formula is calculated for n --+ oo and e . (. 0.
6.2. Random walk method 385

l
1. Let E: > 0 and define
r(x), x ¢'. (uklk,s(a)) U (Uzlz, 8 (b)),
h ( ) _ f. + (X-Xk(a)) 2 xEh, 8 (a), kEd::,
sX- 2 2s '
. b _ a _ f. _ (X-Xk(b)) 2 h, 8 (b),
2 2s ' x E k E Z.
Let h n, 8 (x) = fJR h 8 (x - y) f{Jn (y) d y be a sequence of approximations of hs, where the
functions f{Jn E C 00 (R) have compact support shrinking to {0} as n --* oo. The functions
hn,s are infinitely differentiable for each n and E: > 0. Note that the sequence of functions
hn,s used in the proofs ofEqs. 6.41 and 6.46-6.47 share the same notation but have different
definitions.
For every & > 0, the functions hn,s and h~.s converge uniformly to h 8 and hi in
R as n --* oo, respectively, and h~ s converges pointwise to h~ in R as n --* oo except at
x = Xk(a)±& and x = Xk(b)±do~ k E Z. The function h~ is not defined atx = Xk(a)±&,
x = Xk (b) ± E: and is set to zero at these points (Section 6.2.3.1).
2. The Ito formula applied to the function hn,s(B(t)) yields

hn,s(B(t))- hn,s(B(O)) = t h~,s(B(s))dB(s) + 21 Jot h~, 8 (B(s))ds.


Jo

3. For each t 2: 0 we have almost surely


hs(B(t))- h 8 (B(O))

= r h~(B(s))dB(s) +
fo 2s
t ~
_..!:__
L-fo
(1h
·'
k=-00
(a)- lh (b)) (B(s))ds
·'

lo "' -1 lot (1h


00
t
=
0
hi(B(s)) dB(s) + £--- 2& 0 ,s
(a)- lh (b)) (B(s)) ds
,e
(6.48)
k=-00

as n --* oo, since h~(x) = L~-oo (lh,s(a)- lh,,(b)) (x). The last equality in the
above equation holds since the sequence

1 t L
Jo
m
l~t,,(a)(B(s))ds = Lm 1
fo
t llk,s(a)(B(s))ds
28 28
k=-m k=-m
is positive, increasing, and bounded by t j (2 E:) for each integer m 2: 1. The L2 limit of the
above equation as & {. 0 yields Eq. 6.47 for each t 2: 0. Similar arguments can be used to
establish Eq. 6.46.
The Tanaka formula in Eq. 6.39 is a special case ofEq. 6.47 for a = 0 and b--* oo.
Under these conditions (1) the subsets /(a) and /(b) ofR approach (0, oo) and (-oo, 0),
respectively, so that 1/(a)(x)- 1/(b)(x) = sign(x), (2) the summation on the right side
of Eq. 6.47 degenerates in a single term corresponding to Ik,o(&) = 1(-s,s)• and (3) the
function r(x) becomes lxl. •

Example 6.22: Consider a steady-state transport problem defined in D = (0, a) x


(0, b), b » a > 0. The pollutant concentration u satisfies the partial differential
equation
386 Chapter 6. Deterministic Systems and Input

with the boundary conditions u(x1, 0) = uo for x1 E (0, ~a), 0 < ~ < 1,
u(x1, 0) = 0 for x1 E (~a, a), u(x1, b) = ~ uo, and Bu(x)jBx1 = 0 at x1 = 0,
x1 =a.
The local solution is u(x) = Ex[u(X(T))], where X= (r(X1), X2), X=
(X 1, X2) is a diffusion process defined by the stochastic differential equation

dX1(t) = -CJ dt + (2d 1) 112 dB 1(t),


{
dX2(t) = (2d2) 112 dB2(t),

with the initial condition X (0) = x E D, B;, i = 1, 2, are independent Brownian


motions, and T = inf{t > 0: X(t) fl. D}. The process X can exit D only through
the boundaries (0, a) x {0} and (0, a) x {b} since its samples are reflected at the
boundaries x1 = 0 and XJ =a.
Numerical results have been obtained for c1 = 1, d1 = 1, d2 = 1, a = 1,
~ = 0.3, b = 20, and a large number of points in D. The resulting estimates of
u (x) given by the local solution based on n s = 1, 000 samples of X generated
with a time step of l:lt = 0.005 are in error by approximately 1%. <>

Proof: The generator


a a2 a2
A= -c~- +d1- +d2-
ax1 ax? axi
of X matches the differential operator of the transport equation. However, X cannot be
used to solve the posed boundary value problem since u is not known on the boundaries
XJ = 0 and XJ = a. We modify the process X 1 such that it is reflected at these boundaries
of D. The extended Tanaka formula in Eq. 6.46 cannot be applied since X1 is a Brownian
motion with drift. We have to derive another extension of the Tanaka formula to solve this
transport problem.
Let X= (r(X1), X2) be a new process, where r is a periodic function with period
2 a defined by r(~) = I~ I for~ E [-a, a] and X = (XJ, X2) denotes the diffusion process
defined above. The process X can exit D only through the boundaries x2 = 0 and x2 = b
because its first coordinate takes values in the range (0, a) so that T = inf{ t ::0: 0 : X(t) ¢:.
D} = inf{t ::0: 0: X2(t) ¢:. (0, b)} is the first time that X exits D = (0, a) x (0, b).
Consider the sequence of approximations u(hn,s(XJ), X2) of u(X), where hn,s is
the function used in the proof of Eq. 6.46 with a, b replaced by 0, a. Accordingly, the
intervals I (a) and I (b) in this proof become I (0) and I (a), respectively. Since u(~Io ~2)
has continuous second order partial derivatives with respect to~~ = hn,s (XJ) and h = x2,
and ~1 = hn,s(XJ) is infinitely differentiable with respect to XJ for each nand e > 0, the
Ito formula can be applied to (X1, X2) f-+ u(hn,s(Xr), X2) and gives

u(hn,sCX 1(t)), X2(t)) - u(hn,sCX 1(0)), X2(0))


6.2. Random walk method 387

The limit of the above equation as n --+ oo is

u(h6(X 1(t)), X2(t)) - u(h6(X 1(0)), X2(0))

= lo t
o
[ -q -au h 1
a~1 s
+ dt (
2
-a u (h 1 ) 2 + -au h ")
a~'f 6 a~1 6
a-
2
+ d2
a~?
u] ds

+ ('
lo
(/2dl ~hi
a~1
dBt + /fd2 ~ dB2),
a~2
where the arguments of functions u and h6 are (h 6 (X 1(s)), X2(s)) and X 1(s), respectively.
The limit of the above equation as 8 {. 0 provides a further extension of the Tanaka
formula, which can be used to develop local solutions for transport equations. This ap-
proach has not been used here. The local solution and the numerical results reported in
these examples have been obtained from the expectation of the above equation with 8 in
the range (0, 0.1).
The exact steady-state concentration distribution in (0, a) x (0, oo) is ([194], Ex-
ample Problem 3.2, p. 68)

= ~ uo + L- sm(n '[{ n cos(--)


00 2uo . n rr x1
u(x) exp [0.5 (q/d2- jn) X2]'
n=l nrr a

where jn = [(q/d2) 2 + (2n rr dtf(a d2)) 2] 112. The exact solution shows that the con-
centration approaches a constant value ~ uo as x2 --+ oo. This solution is used to assess
the accuracy of the local solution calculated in D = (0, a) x (0, b) with the additional
boundary condition u(xt, b)=~ uo. •

6.2.3.3 Brownian motion in the first orthant of ~2


The definition of a reflected Brownian motion in spaces of dimension 2
and higher requires us to specify the subsets of the boundary aD of D where the
Brownian motion is reflected and the direction of reflection at each point of these
subsets. Let X be a Brownian motion in the first orthant of ~ 2 , that is, X (t) is in
the closure of D =
{x E ~2 : Xt > 0, x2 > 0} at each t 2:: 0. Three cases are
examined: (1) X has independent coordinates and its reflections are orthogonal
to the boundaries of D, (2) X has independent coordinates and its reflections are
at an arbitrary angle relative to the boundaries of D, and (3) X has dependent
coordinates and its reflections are at an arbitrary angle relative to the boundaries
of D.

g(X(t))- g(X(O)) = t, lot ag~~;s)) (dBi(s) + dLi(s))

+~lot dg(X(s)) ds. (6.49)


388 Chapter 6. Deterministic Systems and Input

Proof: Let (BI, B2) be two independent Brownian motions. The 1R:.2-valued process X =
(IBJI, IB2I) lives in the first orthant of the plane and its reflections are orthogonal to the
boundaries of D. For example, if X is on the boundary XJ = 0, then L1 has a positive
increment causing an increase of X1 in the positive direction of coordinate x1.

The Ito formula can be applied to g(X) in conjunction with Tanaka's formula pro-
viding a representation for IB; I and gives Eq. 6.49. Direct calculations based on arguments
similar to those used to derive Eqs. 6.41 and 6.46 give

lo
2
=L
t ag(X(s))
g(X(t))- g(X(O)) sign(Bi(s))dBi(s)
i=!
0 ax·l
I t
+ 2: Jo ~g(X(s))ds + ~ii(t)
2

Case 2: Let dij, i, j = 1, 2, be a constant matrix with d 11, d22 > 0. Then

X(t) =[ ~I(t) J+ [ dn (6.50)


B2(t) d21

lives in the first orthant D of JR2 , where B1 and B2 are independent Brownian
motion processes and IB;I = Bi + L;, i = 1, 2 (Eq. 6.39).

Proof: The coordinates of X are independent Brownian motions in D because lh and


B2 are independent Brownian motions and the local time processes L1 and L2 are zero
if X(t) E D. The classical Ito formula can be applied directly to functions g(X(t)), g E
C2 (R2 ), by the definition of X.

The process X takes values in the first orthant of R2 because d11 and d22 are strictly
positive. For example, suppose that X(t) = (O,x2), x2 > 0, at a timet. The incre-
ment dL1 (t) of the local time L1 causes a change of X from (0, x2) to (du dL1 (t), x2 +
d21 dL1 (t)) E D. Similar considerations apply for the other coordinate. Additional infor-
mation and examples can be found in [42] (Chapter 8). •
6.2. Random walk method 389

Case 3: Xi = IB71. i = 1, 2, X (Xt,X2), B*(t) = bB(t), c


g E C2 (JE.), and

~ fo
2 t
g(X(t))- g(X(O)) = g,i(X(s)) sign(B7(s)) dBt(s)

+~lot [en g,u (X(s)) + c22 g,22(X(s))

it
+2c12 g,12(X(s)) sign(Bf(s)) sign(Bi(s))] ds

1
+ '""cu
2
L,.;
lim-
028 £ 1
g i (X(s)) 1(-£ e)(B 1~(s)) ds.
, ,
(6.51)
i=l + 0

Note: The processes IB7 I can be approximated by the sequence of processes hni ,s (B;>,
ni = 1, 2, ... , fori = 1, 2 and s > 0. The ltOformula applied to the mapping ( Bi, Bz) r--+
g(hn 1 ,s(Bi), hn 2 , 8 (Bz)) gives

g(hn 1 ,s(Bi(t)), hn 2 ,s(Bz(t)))- g(hn 1 ,s(Bi(O)), hn 2 ,s(Bz(O)))

= L2 lot g i h~
i=l 0 , ',
8 dB7(s) +2 L
1 2
i=l 0
lot cu g i h~
, ',
8 ds

+~lot [en 8,11 (h~J,e)2 + 2ct28,12h~J,£ h~z,s + C228,22 (h~z,s)2] ds,


Where the arguments are omitted, 8,i = agja~i, 8,ij = a 2 g j(a~i a~j ), and ~i = hni,s(Xi),
i = 1, 2. The formula in Eq. 6.51 results by taking the limit of the above expression as
ni --+ oo and Bi ,1- 0, i = 1, 2.
The formula in Eq. 6.51 can be used to find the solution of the Laplace equation
defined on a wedge D* = {x E IR2 : Xt > 0, x2 > axt, a > 0} in IR2 by the change
of variables Yl = x1, Y2 = x2 - a x1. In the new coordinates D* is mapped into the
first orthant D of IR2 and B becomes a Gaussian process B* = (Bi = Bt, Bz = B2 -
a Bt) with dependent coordinates. The condition that B is in D* is equivalent with the
condition that B* is in D. The solution of the two problems corresponding to D and D*
coincide if the reflections at the Neumann boundaries of the two problems are consistent.
An alternative solution for Neumann boundary value problems defined in a wedge set can
be found in [42] (pp. 174-177). 6

Example 6.23: Consider the partial differential equation

a2 u a2 u
- + 3- = -16, XED= (0, 1) X (0, 1),
axr ax~

corresponding to the time invariant version of Eq. 6.1 with d = 2, a i (x) = 0,


,Bn(x) = 2, ,8!2(x) = ,821(x) = 0, .B22(x) = 6, q(x) = 0, and p(x) = 16.
The solution satisfies the mixed boundary conditions au;ax 1 = 0 on {0} x (0, 1),
390 Chapter 6. Deterministic Systems and Input

u = 0 on {1} x (0, 1), oujox 2 = 0 on (0, 1) x {0}, and u = -3.5349xr +


0.5161xi + 3.0441 on (0, 1) x {1}.
The local solution of this equation is

u(x) = 8Ex[T] + Ex[u(X(T))], x ED,

where the IR2-valued process X has the coordinates X I = IBII = BI + LI and


X2 = .J3IB2I = .J3 (B2+L2) (Eq. 6.39), Bi areindependentBrownianmotions,
i = 1, 2, and T = inf{t 2:: 0 : X(t) ¢. D} is the first time X exits D. The local
solution is the sum of two expectations that can be estimated from samples of X
and T.
Estimated values of the local solution at x = (0.25, 0.25), (0.5,0.5), and
(0.75,0.75) based on ns = 500 samples of X generated with a time step !1t =
0.001 were found to be 3.2571, 4.1808, and 1.8325. The corresponding values of
u obtained by the finite difference method are 3.258, 4.195, and 1.801. <>
Proof: A slight generalization of the Ito formula in Eq. 6.49 gives

~ [ {T au(X(s))
axi
,
(dBi(s)+dLi(s))
]
Ex[u(X(T))]-u(x)=;:{iEx fo

1 x
+ -E [loT (a 2u(X(s)) +3 a2u(X(s))) ds ] ,
2 2
2 0 axi ax2

where q = 1 and c2 = .J3. If u is a solution, the above equation gives the stated re-
sult because (1) the partial derivatives au;axi are zero on the Neumann boundaries and
the stochastic integrals j[aujaxi] dB are martingales starting at zero so that the first two
expectations on the right side of the above equation are zero and (2) a2u(X(s))jax? +
3 (a 2u(X(s))jaxi) = -16 for s E (0, T) .•

6.2.3.4 General case


There are no simple extensions of the Tanaka and Ito formulas for an ar-
bitrary JRd -valued diffusion process reflected at a boundary of a subset D of IR d
(Method 1). However, heuristic approximations can be developed for finding the
local solution of general Neumann boundary value problems (Method 2).
Method 1: Let u be the solution of a partial differential equation defined in a
rectangle D = xf=I (ai, bi) of IRd, d 2:: 1. Let X be a diffusion process whose
generator coincides with the operator of this differential equation. Suppose that
the equation has mixed boundary conditions and that the partial derivative oujox i
is specified on the boundary xi = ai. To incorporate the contribution of the Neu-
mann boundary conditions to the solution, we need to reflect one of the coordi-
nates of X when the process reaches this boundary of D, for example, the coor-
dinate XI if X becomes equal to XI = ai and XI = a1 is a Neumann boundary.
The method in Examples 6.19 and 6.20 can be used to find the local solution of
6.2. Random walk method 391

this class of problems. However, the approach cannot be extended to solve locally
general mixed boundary value problems characterized by less simple Neumann
boundary conditions and arbitrary subsets Din JRd.
Method 2: Consider the time-invariant version of Eq. 6.1 defined on an open
bounded subset D of JRd with the mixed boundary conditions u(x) = ~d(x),
X E aDd, and 'Vu(x). c(x) = ~n(X), X E aDn, where ~d and ~n are prescribed
functions, aDd and aDn are subsets of aD such that aD = aDd U aDn and
aDdnaDn = 0, V' = (ajaxl, ... ,a;axd),andcdenotesaspecifiedlRd-valued
function. Our objective is to develop a heuristic algorithm for calculating the local
solution of this partial differential equation.
Let x E D be a point at which we would like to calculate the local solution
and let X be an JRd -valued diffusion process whose generator coincides with the
differential operator of Eq. 6.1. The process X is started at x E D if we seek the
solution of Eq. 6.1 at this point. The sample paths of X can be divided in two
classes:
1. Sample paths that reach aD for the first time at y E aDd. The con-
tribution of these sample paths of X to the local solution is given by results for
Dirichlet boundary value problems since the classical Ito formula is valid in D
and u (y) is known.
2. Sample paths that reach aD for the first time at y E aD n. The solution
for Dirichlet boundary value problems cannot be applied since u is not known
on aDn. The contribution of these samples of X to the local solution can be
calculated by the following heuristic algorithm:
- Apply the classical Ito formula between x andy E aDn to derive a rela-
tionship between the values of u at x E Dandy E aDn.

- Reflect X, that is, shift y E a Dn to a point x' E D by a selected amount in


a direction consistent with the Neumann boundary conditions at y.

- Restart X at x' E D and find the point where the restarted process reaches
aD for the first time. If this point is on aDd the sample of X is stopped
because u is known on aDd. Otherwise, the previous step is repeated till
the restarted process reaches aDd.
The above algorithm giving the contribution of the Neumann boundaries to
the local solution is illustrated in detail by the following example. In this algorithm
the shift II y - x II from y E aD to x E D must be small but remains unspecified.

Example 6.24: Consider the local solution of the partial differential equation

a2 u a2 u
ax2 + 3 ax2 = -16, X= (Xl, xz) ED= (0, 1) X (0, 1),
1 2

with the mixed boundary conditions au;ax1 = f3(xz) on {0} x (0, 1), u = 0
on {1} x (0, 1), au;axz = a(x1) on (0, 1) x {0}, and u = y(x1) on (0, 1) x
392 Chapter 6. Deterministic Systems and Input

{1}, where a, {3, andy are prescribed functions. The generator of the diffusion
process X = (X 1 = B1, X2 = J3 B2) coincides with the operator of the above
partial differential equation, where B1 and B2 are independent Brownian motion
processes.
Let X(·, w) be a sample of X that reaches aD for the first time at Y(w) E
aDJ. This sample is stopped at Y(w) because u(Y (w)) is known. The contribution
of the sample X(·, w) to the local solution is u(x, w) = ~(Y(w)) + 8 T(w), where
T(w) is the time it takes this sample of X to reach aDJ (Example 6.23). For
example, ~J(Y(w)) is equal to zero or y(Y(w)) if Y(w) belongs to {1} x (0, 1) or
(0, 1) x {1}, respectively.
Let X ( ·, w) be a sample of X that reaches aD for the first time at Y (I l (w) E
aDn (Fig. 6.10). This sample cannot be stopped because u is not known at

Dirichlet
0.9

Dirichlet
0.8

0.6/
0.7

yO)(m) Y(m)
0.5

~----·
o.4 (hl,O)
0.3

0.2

Neumann

Figure 6.10. A hypothetical sample path of X with two reflections

y(ll(w). The point y(ll(w) is reflected to x<l)(w) = y(ll(w) +(hi, 0) E D,


h1 > 0, and the sample of X is restarted at x<l)(w). Suppose that the restarted
sample reaches aD for the first time at y(2) (w) E aDn (Fig. 6.10). Because u is
not known at Y(2)(w), we reflect the sample to x< 2l(w) = y(2l(w) + (0, h2) ED,
h2 > 0. Suppose, as illustrated in Fig. 6.10, that the sample of X restarted at
x< 2l(w) reaches aD at Y(w) E aDJ. This sample of X can now be stopped be-
cause u(Y(w)) is known. Let T<il(w) and T(w) be the times the sample w of X
takes to travel from x(i-l)(w) to y(il(w), i = 1, 2, and from X(2)(w) to Y(w),
respectively, where x<0l (w) = x E D. It is assumed that reflections from Y (i) (w)
to x<il (w) take no time. The contribution of this sample of X to the local solution
6.2. Random walk method 393

is (Fig. 6.10)

u(x, w):::: ~d(Y(w)) + 8 [T(l)(w) + r< 2l(w) + T(w)]


- .BCYJll (w)) h1 - acr?l (w)) h2

where yCiJ = (r?l, riil) and -.BCYJll(w))hl- a(Y1(2)(w))h2 represents the


contribution of the Neumann boundary to the local solution. <>

Note: LetfCil(w) E aDn andletX(il(w) E Dbetherefiectionoff(il(w). The first order


Taylor expansion of u at xCil (w) gives

u(X(i)(w)) = u(Y(i)(w) + (X(il(w)- y(i)(w)))


::::: u(Y(i)(w)) + Vu(Y(i)(w)). (X(il(w)- yCil(w))

for II xCil (w) - y(i) (w) II small, or


u(Y(i)(w)::::: u(X(il(w))- Vu(Y(il(w)) · (X(i)(w)- yCil(w)).

The approximate contribution to the local solution of the sample of X shown in


Fig. 6.10 can be calculated from the following considerations. First, the contribution of
sample win the time interval [0, rCll (w)] to the local solution at x is

u(x, w) = u(Y(ll(w)) + 8 r<ll(w).


The first order Taylor expansion applied to u(X(l) (w)) gives

since xCll(w)- y(ll(w) =(hi, 0), hl > 0, so that

according to the boundary condition. Second, the contribution during the time interval
[TCI)(w), r< 2l(w)] of the process restarted at x(ll(w) to the local solution is

The first order Taylor expansion applied to u (XC 2l (w )) gives

for a shift h2 > 0 in the positive direction of coordinate x2. Third, the contribution during
the time interval [T< 2l(w), T(w)] of the process restarted at x< 2l(w) to the local solution
is
u(X(2 l(w)) = ~d(Y(w)) + 8 T(w).
The stated result follows from the above equations. •
394 Chapter 6. Deterministic Systems and Input

The approach illustrated in Fig. 6.10 can be applied to arbitrary Neumann


conditions. For example, suppose that the boundary condition at an arbitrary point
x E aDn is 'Vu(x) · c(x) = ~n(x) and that a sample of X has reached a point
a
yCil (w) E Dn. Following the approach in Fig. 6.10, this point is reflected to
XCil(w) E D such that xCil(w)- yCil(w) = h c(Y(il), where h > 0 is a spec-
ified constant. The sample values of u at xCil(w) and u(Y(i)(w)) can be related
approximately by

where Vu(Y(i)(w))·c(Y(i) (w)}} = ~n(Y(i) (w)) by the prescribed Neumann bound-


ary condition.

6.3 Sphere walk method


The RWM examined in Section 6.2 and the SWM considered here are simi-
lar in the sense that both methods ( 1) deliver local solutions for partial differential
equations and (2) are based on processes that originate at a point x E D at which
the solution of a partial differential equation has to be calculated and exit D after
a random time T. However, the stochastic processes used by these methods are
different. The RWM is based on Brownian motion and diffusion processes while
the SWM uses spherical processes.

Boundary "dD of D

Figure 6.11. Definition of the spherical process

Figure 6.11 illustrates the construction of the spherical process. The pro-
cess consists of a sequence of open spheres included in D whose centers approach
6.3. Sphere walk method 395

the boundary aD of D. The point X E D where the local solution has to be cal-
culated is the center of the first sphere. This sphere has radius r = min yeaD II
x - y II so that it is tangent to aD. The center X 1 of the second sphere is a random
point on the surface of the first sphere and its radius R 1 is such that it is tangent to
aD. The spherical process in Fig. 6.11 is stopped when the center of a sphere in
this sequence enters a small vicinity of aD for the first time ([160], Section 1.2.1).
The local solution given by the SWM is based on the Green function, the
mean value property, and the Monte Carlo simulation method. The SWM can be
applied to find the local solution of partial differential equations of the type given
by Eq. 6.1 with Dirichlet and mixed boundary conditions. Efficient local solutions
by the SWM have been reported only for the Laplace and Poisson equations [200].

6.3.1 The Green function


Consider the partial differential equation
C [u(x)] = -p(x), xED, (6.52)
where C is a linear partial differential operator, for example, the operator of the
time-invariant version of Eq. 6.1, D denotes an open bounded subset of~ d, and
pis a specified real-valued function defined on ~d. Let g(x, y) be the solution of
Eq. 6.52 atx corresponding to a unit action applied at a "source point" y E D, that
is, the solution of C [g(x, y)] = -8(x- y), x, y e D, with g(x, y) = 0 for x E
aD. The function g (x, y) is the Green function for C and can be used to calculate
the solution u ofEq. 6.52 at an arbitrary point x E D ([43], Section V.14).

If u in Eq. 6.52 satisfies the boundary condition u(x) = 0, x E aD, then

u(x) = L g(x, y) p(y) dy. (6.53)

If u in Eq. 6.52 satisfies the boundary condition u(x) = ~(x), x e aD, then

(6.54)

where~ denotes a continuous extension of~ to D U aD.


Note: The Dirichlet boundary conditions in Eqs. 6.53 and 6.54 are referred to as homoge-
neous and inhomogeneous, respectively. The solution in Eq. 6.53 holds since Cis a linear
operator. The formula in Eq. 6.54 is valid since v = u - ~ satisfies the partial differential
equation
C[v] = -p(x)- C[~(x)]
with v(x) = 0 for x E aD so that v(x) = fv g(x, y) (p(y) + C[~(y)l) dy by Eq. 6.53,
which gives Eq. 6.54. Useful information on Green's functions can be found in [43] (Vol-
ume 1, Chapter V), [73] (Section 22.5), and [128]. A
396 Chapter 6. Deterministic Systems and Input

Example 6.25: Let u be defined in D by C [u(x)] = -q(x) u(x), x E D,


with the boundary condition u(x) = 0, X E aD. If £ is a linear differen-
tial operator and q is integrable in D, then u also satisfies the integral equation
u (x) = f D g(x, y) q (y) u (y) d y, where g denotes the Green function of£. <>

Note: The formula in Eq. 6.53 with q u in place of p gives an integral equation for u. If
g(x, y) is positive for all x, y E D, the integral equation for u can be given in the form
u(x) = c(x) E[q(Y) u(Y)], where c(x) = fv g(x, y) dy andY is a random vector with
density g(x, ·) = g(x, ·)/c(x). A

Example 6.26: Consider the Laplace operator~ = L-1=1 2 a ;axF


in ]Rd, where
d ::: 2 is an integer. The Green function of this operator is g(x, y) = y(r) +
w (x, y), where w is regular, that is, it satisfies the equation ~ w = 0 and has
continuous second order partial derivatives, r =II x - y II,

Y(r )
_
-
{ 2 ~ Ilog(1/r),
d-2
d = 2,
(d-2)ad r ' d::: 3,

and ad = 2 nd/21r(d /2) is the surface area of the unit sphere in JRd ([128], Sec-
tion 1.2). The Green functions have various singularities depending on the func-
tional form of the operator £ and the dimension d. For example, the singularities
of the Green function for the Laplace operator in IR 2 and JR3 are given by log(r)
and r- 1, respectively.<>

Note: The Green function can be calculated as the limit as 8 -1, 0 of a sequence of solutions
g8 of L'lg 8 (x, y) = -pe(x, y), x, y E D, with the homogeneous boundary conditions
g8 (x, y) = 0, where p 8 (-, y) is zero outside the disc Ds(Y) = {x :II x- y II< 8} of radius
8 centered at y E D and satisfies the conditions JDs (y) p 8 (x, y) dx = 1 and Ds (y) C D
for each 8 > 0. The limit of g8 (·, ·)as 8 -1, 0 yields the Green function. A comprehensive
review on Green's function can be found in [128]. A

6.3.2 Mean value property


Let u be the solution of

£ [u(x)] = 0, X E D, (6.55)

where D is an open bounded subset of IRd and £ is the differential operator in


Eq. 6.52. Consider a sphere S(x, r) = {y E D :II x - y II< r} with radius
r > 0, center x E D, and boundary aS(x, r) = {y E D :II x - y II= r}
whose closure S(x, r) U aS(x, r) is in D. The area of aS(x, r) is ad(r) =
2 ;rd/ 2 rd-l I r(d/2). Let da(y) denote an infinitesimal element on aS(x, r) so
that (1/ad(r)) fas(X,r) da(y) = 1.
6.3. Sphere walk method 397

The mean value theorem states that ([44], Section IV.3)

u(x) = - 1 - f k(x, y) u(y) da(y), (6.56)


ad(r) las(x,r)
where the kernel k depends on £.

Note: The formula in Eq. 6.56 states that the value of u at the center of S (x, r) is equal to
the average of the values of this function on aS(x, r) weighted by the kernel k of C. We
will see that the kernel k is an essential ingredient of the SWM. Efficient algorithms for
finding the local solution of partial differential equations by the SWM are possible only if
k has a known analytical expression.
Two methods can be used to find the kernel of a differential equation. The kernel
of C can be obtained from Eq. 6.52 using integration by parts and the Green formulas
([44], Volume 2, Chapter IV). There are few differential operators for which the kernel kin
Eq. 6.56 has a simple form, for example,

1, for~.
k(x,y)= { 1\X-yl\cl/2
sinh(IIX-yl\ cl/2)' for~- c, c > 0.

Other examples can be found in [128, 160].


An alternative method for finding the kernel k of C can be based on the RWM in
Section 6.2. Let u be such that C[u] =
0 in S(x, r). If there is a diffusion process X with
generator C that starts at x, then

Ex [u(X(T))]- u(x) =Ex [loT C[X(s)]] = 0,

where T = inf{t > 0: X(t) ¢ S(x, r)} (Eq. 6.14). Hence,


u(x) =Ex [u(X(T))] = f u(y) fX(T)iX(O)(Y I x) da(y),
fas(x,r)
where fX(T)IX(O) is the density of X(T) conditional on X(O) =
x. If the kernel k is
positive for all x, y E D, the above representation of u and Eq. 6.56 imply

fX(T)iX(O)(Y I x) = k(x, y)fad(r).


Under this condition the kernel k of C can be approximated by a histogram of the condi-
tional random variable X(T) I (X(O) = x). &

Example 6.27: The local solution of the Laplace equation in Eq. 6.23 for D =
S(x, r) is u(x) = Ex [u(B(T))], where B denotes an JRd -valued Brownian mo-
tion starting at B(O) = x and T = inf{t > 0: B(t) ¢ S(x, r)}. Because B(T) is
uniformly distributed on as(x, r), we have

u(x) = _..!___() f u(y) da(y)


ad r las(x,r)
so that k = 1 for the Laplace operator since ad(r) is the area of aS(x, r). <>
398 Chapter 6. Deterministic Systems and Input

If y is a uniformly distributed random variable on asex, r), then


u(x) = Ex[k(x, Y) u(Y) ], (6.57)
au(x)
-a- = E
x [ak(x, Y)
aXi u(Y) ,
J i = 1, ... ' d. (6.58)
Xi

Note: The expression of u in Eq. 6.56 implies Eq. 6.57 since Y is a uniformly distributed
aS(x, r)-valued random variable, and Eq. 6.58 follows from Eq. 6.57 by differentiation
with respect to x; . The notation Ex is used to indicate that the expectation is performed on
a sphere centered at x.
The relationships in Eqs. 6.57 and 6.58 state that u and its partial derivatives at an
arbitrary point x E D are equal to averages of the values of u on aS (x, r) weighted by the
kernel k and by its partial derivatives ak I ax;, respectively. Hence, we can calculate u (x)
and au(x)jax; at X from the values of U on aS(X, r) provided that the kernel k is known.
If k 2: 0 on aS(x, r), the mean value property in Eq. 6.57 can also be given in the
form u(x) = ctd(x, r) Ex [u(Z)], where Z is an aS(x, r)-valued random variable with the
probability density function

k(x, y) = k(x, y)/ad(x, r) (6.59)

Let Y(w), w = 1, ... , ns, be ns uniformly distributed and independent


points on aS(x, r). Assume that u, k, and akjax; are known on aS(x, r).

The estimates of u and au;ax; at x E Dare

1 ns
u(x) = - L k(x, Y(w)) u(Y(w)) and
ns w=l
au(x) 1 ~ ak(x, Y(w)) y
-- = - ~ u( (w)). (6.60)
ax; ns w=l ax;

Note: Let G;, i = 1, ... , d, be independent copies of N(O, 1) and define the random
variable U = (Lf=l cT) 112 . The vector (G1/U, ... , Gd/U) is uniformly distributed
on aS(O, 1) in JRd ([155], Procedure N-4, p. 89). This observation can be used to generate
independent samples of Y. ~

In applications the function u and its derivatives are specified on the bound-
ary aD of D, which generally is not a sphere. Hence, Eqs. 6.57 and 6.58 cannot be
applied directly. The following two sections use the spherical process in Fig. 6.11
to find the local solution for Dirichlet and mixed boundary value problems.
6.3. Sphere walk method 399

6.3.3 Dirichlet boundary conditions


Let u be the solution of Eq. 6.55 defined in an open bounded subset D of
Jll!.d with the boundary conditions u(x) = ~(x), x E aD. Let X(w) = {X o(w) =
x, X 1 (w), ... , XN(w) (w)}, w = 1, ... , ns, be ns independent samples of a spher-
ical process starting at x E D, where N(w) denotes the number of steps until a
sample X (w) of X reaches for the first time a small vicinity of aD, that is, the first
time that the distance between X N(w) (w) and aD is smaller than a specified e > 0
(Figs. 6.11 and 6.12). This condition defines the number of steps N after which X
is stopped. The radii of the spheres centered at x, X 1, ... , X N are r, R1, ... , RN.

Spherical

vicinity of oD

Figure 6.12. Stopping rule for spherical processes

The points X; are uniformly distributed on aS(X i-1, R;-d (Eq. 6.57). Let X N+l
be the point of aD that is the closest to X N (Fig. 6.12). By the definition of N,
we have II XN- XN+I II< e.

The local solution of Eq. 6.55 is

(6.61)

Proof: Suppose that N(w) = 2 (Fig. 6.11). The use of Eq. 6.57 along the sample
(Xo(w), X 1 (w), Xz(w), X3(w)) of X gives

u(Xz(w)) = k(Xz(w), X3(w)) u(X3(w)),


u(X 1 (w)) = k(X 1(w), Xz(w)) u(Xz(w)), and
u(x,w) =k(x,XI(w))u(XI(w)),

so that u(x, w) = flT=I k(X;-1 (w), X; (w)) sCX3(w)) since the X 3((v) is on the boundary
JD of D so that u(X3(w)) = s(X3(w)). The equality u(XN) = k(XN, XN+IJS(XN+I)
400 Chapter 6. Deterministic Systems and Input

is approximate since the spherical process is stopped before reaching the boundary of D.
The approximation is likely to be satisfactory since u is a smooth function.
We have seen that samples of a Brownian motion (Example 6.27) or properties
of Gaussian variables (Eq. 6.60) can be used to generate uniformly distributed points on
8S(Xi, Ri)· We will see in Section 8.5.1.1 that the exit time for a Brownian motion from a
sphere is finite a.s. This property guarantees that the number of steps N in Eq. 6.61 that a
spherical process takes to reach the boundary of D is finite a.s. •

The local solution in Eq. 6.61 can be estimated by

1 ns
I: u(x, w),

err
ucx> = - where
ns w=1

u(x, w) = k(Xt-1 (w), X, (w))) <(XN(o)+l (w)). (6.62)

Note: The calculation of the estimate uof u can be inefficient since the radii Ri (w) of the
spheres S(Xi (w), Ri (w)), i = 0, 1, ... , N(w), are the solutions of nonlinear optimization
problems. This difficulty can be reduced by using specialized algorithms for calculating
Ri (w) based on simplified representations of 8D [200]. A

Example 6.28: Let u be a harmonic function on an open bounded set D c ~ d,


that is, du(x) = 0, that satisfies the Dirichlet boundary conditions u(x) = g(x),
x e aD. The local solution at an arbitrary point x e D can be estimated by
u(x) = (1/ns) 2:::,S= 1 g(XN(w)+1 (w)) (Eq. 6.62). 0
Note: The above local solution by the spherical process method follows from Eq. 6.62
since the kernel of fl. is k = 1. We also note that the local solution u(x) = Ex[~(XN+1)]
delivered by the SWM coincides with the local solution by the RWM in Eq. 6.23. The
coincidence is of no surprise since the points Xo = x, X 1(w), ... are on the sample paths
of a Brownian motion starting at x. A

Example 6.29: Let u be the solution of Eq. 6.55 with

a a2
2
C = Lai-
. 1 axi
+-
1
L
2 ..
2
dij - - ,
axi ax
1 j
xeD= (-2, 2) x (-1, 1),
!= ~J=

and the Dirichlet boundary conditions u(x) = 1 for x e ( -2, 2) x {-1} and
u (x) = 0 on the other boundaries of D, where a i and dij are some constants.
Two versions of the SWM can be implemented depending on the sign of
the kernel k of C. If k does not have a constant sign in D, the local solution must
be based on Eq. 6.57. If k has a constant sign, for example, k ~ 0 in D, the local
solution can be based on Eq. 6.59 and a modified spherical process whose points
are not uniformly distributed on the boundary of the spheres S(X i , Ri).
6.3. Sphere walk method 401

0.9

0.8 1\ Approximate kernel (r =1)


0.7

0.6

0.5
/Histogram (r = 1)
0.4
(\
0.3

I
0.2
\
l 11
Figure 6.13. Histogram and approximation of k for r = 1

Figure 6.13 shows a histogram of X(T) I X(O) and an analytical approxi-


mation of kin Eq. 6.59 for a sphere of radius r = 1 and the coefficients a 1 = 1,
az = 2, du = 1, d12 = dz1 = 0.5, and dzz = 2 of C. Because the coefficients of
C are constants, the kernel of C does not depend on the location of the center x of
the sphere S(x, r). The parameter a E [0, 2n) defines the position on aS(x, r).
Estimates of the local solution u based on n s = 1, 000 samples of the modified
spherical process and the analytical approximation of kin Fig. 6.13 are in error by
less than 3% relative to a finite element solution. 0

Note: Let X be an JR2-valued diffusion process defined by the stochastic differential equa-
tion dX(t) =a dt + b dB(t), t ~ 0, where a, b denote (2, 1), (2, 2) constant matrices and
B is an JR2 -valued Brownian motion. If a = (ai, az) and b bT = d = {dij }, the generator
of X coincides with .C. It was found that the function

h(a,r) =L
2

i=l
Pi(r) c/J
O"j
(a -1-Li)
O"j

with I-Ll = 1.25, 1-LZ = 3.4, a1 = az = 0.3, and pz(r) = 1 - Pl (r) approximates
k satisfactorily, where PI (r) = 0.6; 0.7; 0.8; 0.9 for r = 0.5; 1.0; 1.5; 2.0, and c/J(p) =
(2rr)- 112 exp(-0.5 p 2 ). The histograms in Fig. 6.13 are based on ns = 1, 000 samples of
X generated with a time step of 0.001. &

We have seen that the spherical process used to define the local solution in
Eq. 6.61 reaches a small vicinity of the boundary aD of Din a finite number of
steps. However, the product of kernels k(Xi-1. Xi) in Eq. 6.61 may not converge
402 Chapter 6. Deterministic Systems and Input

as the number of steps N increases, in which case the local solution given by this
equation cannot be used, as demonstrated by the following example.

Example 6.30: Consider a thin rectangular plate in D = (0, a) x (0, b), a, b > 0,
in JR2 . The plate is subjected to a uniform traction at the boundaries {0} x (0, b)
and {a} x (0, b). The state of stress at a point x E D in the plate is given by
the vector r(x) = (ru (x), r22(x), r12(x)). There is a mean value theorem for r
similar to Eq. 6.57 stating that

r(x) = Ex [a(8) r(8)],

where 8""' U(O, 2n) gives the location on the unit circle centered at x, a(8) =
a1 (8)T a2(8),

a1(8) = [4 cos 2(8)- 1, 4 sin 2(8)- 1, 2 sin(2 8)],


a2(8) = [cos 2(8), sin2(8), sin(2 8)],

and r is a co~mn vector ([51], Eq. 5.22).


The local solution in Eq. 6.61 depends on the product B n = TI7=1 a(8i)
I
of random matrices. Let A (n) 2 be the largest eigenvalue of B B n. We will see
in Section 8.8.2 that A(n) converges to a deterministic value A as n ---+ oo under
some conditions and gives the rate of change of r for large values of n. If A > 0,
the algorithm in Eq. 6.61 diverges so that the SWM cannot be applied. Exten-
sive Monte Carlo simulations have shown that the samples A(n, w) of A(n) are
positive and they seem to converge to a deterministic value A > 0 as n increases
for all w's. For example, A(lO, 000, w) ~ 0.015 for all the generated samples
suggesting that the SWM diverges. <>

6.3.4 Mixed boundary conditions


Let u : JRa ---+ lR be the solution of Eq. 6.55 defined in an open bounded
subset D of JRa with Dirichlet and Neumann boundary conditions on aD a and
aDn, respectively, where aDa U aDn =aD and aDa n aDn = 0. For example,
consider the Laplace equation

llu(x) = 0, XED,
{ u(x) = ~a(x), X E aDa, (6.63)
'Vu(x) · c(x) = ~n (x), X E aDn,
where c and ~a, ~n are JRa- and JR-valued prescribed functions, respectively.
The SWM in the previous section cannot solve mixed boundary value prob-
lems since u is not known on aDn. We present here an extended version of the
SWM for the local solution of Eq. 6.63. Alternative versions of the SWM as well
as numerical examples can be found in [200] (Sections 3.3 and 5.3).
Suppose we need to calculate the local solution of Eq. 6.63 at a point
x E D and that the spherical process in Fig. 6.12 has reached a point X N in
6.4. Boundary walk method 403

a small vicinity of aD. If X N+1 E aDa, then we can calculate u(X N) =


k(XN, XN+l) g(XN+l) and determine the contribution to u(x) from Eq. 6.61.
If XN+l E aDn, then u(XN+l) is not known. Method 2 in Section 6.2.3.4 can be
used for solution. Accordingly, we reflect X N+l in the direction of c(X N+l) to
Yo E D at a distance p > 0 from X N+l· We have

u(Yo) = u(XN+l +(Yo- XN+l))


+ Vu(XN+I) ·(Yo- XN+l)
:::::: u(XN+I)
= u(XN+I) + pgn(XN+I)

since Yo - XN+l = p c(X N+t). Hence, the unknown value of u at X N+l can be
related top gn(XN+I), a known value, and u(Yo). Now we restart the spherical
process from Yo. Suppose that after M steps the restarted spherical process is at
Y Min a small vicinity of aD and that Y M+l E aDa. By Eq. 6.61 u(Yo) can be
related to u(Y M+l) = ga(Y M+l), which is known. If Y M+l E aDn, then Y M+l
has to be reflected in D and the previous algorithm needs to be repeated until a
Dirichlet boundary is reached. Hence, the value of u (x) can be calculated for any
sample of the spherical process.

6.4 Boundary walk method


The differential equations of linear elasticity do not have the form in Eq. 6.1
so that the RWM cannot be used to solve elasticity problems. We have seen in Ex-
ample 6.30 that the SWM may give unreliable results because it involves infinite
products of random matrices, which may diverge. Some preliminary results sug-
gest that the boundary walk method (BWM) can be used to solve linear elasticity
problems [118]. The BWM is based on a representation of the local solution by
an infinite series whose terms are boundary integrals. We present essentials of the
BWM and illustrate it by a simple example.
Let D be an open bounded subset of ~d, d 2: 1, with boundary aD on which
tractions or displacements are prescribed. Our objective is to find the solution
u (x) of a linear elasticity problem at an arbitrary point x E D, where u is a scalar
or vector function.
To establish the local solution for a linear elasticity problem, we need to
introduce some notation and definitions. Let k(x, y) be a (d, d) matrix whose
entries are real-valued functions defined on ~2d. Define the operators

.K;<O[h(x)]= { k(i)(x,y)h(y)da(y), i=l,2, ... , (6.64)


lan
for X E aD, where

k(i)(x,y)= { k(x,~)k(i-l)(~,y)da(~), i=2,3, ... , (6.65)


lan
404 Chapter 6. Deterministic Systems and Input

and k (O) ( ~, y) = 1. It can be shown that the equations of linear elasticity can be
given in the form

JL(X) = g(x) + { k(x, y) p,(y) du(y) = g(x) + ,K:(l)[p,(x)], XE aD,


lav
(6.66)
where JL and g are unknown and known functions on aD, respectively. The func-
tion JL can be used to obtain the solution u for a linear elasticity problem in D
from [118]

u(~) = { r(~, y) p,(y) du(y) = R[p,(~)], xED, (6.67)


lav
where R denotes an integral operator and the kernel r(x, y) is a (d, d) matrix with
real-valued entries defined on JR2d. The kernels k(x, y) and r(x, y) are closely
related [118].

Ifsupxeavfav II k(x,y) II du(y) < 1,thelocalsolutionforalinearelasticity


problem is
00

u(x) = R[g(x)] + L:nx::<i)[g(x)], xED. (6.68)


i=l

Note: Let I denote the identity operator. Under the above condition, the Neumann series
of the operator I - K(l) is absolutely and uniformly convergent, so that the solution of
(I- K(l)) [p,(~)] = g(~). that is, the solution ofEq. 6.66is (Section 8.3.1.4)
00

p,(~) = g(~) + L K(i) [g(~)], ~ e av.


i=l

The above series representation of IL and Eq. 6.67 yield Eq. 6.68. Generally, the
condition under which Eq. 6.68 holds is not satisfied for elasticity problems. However, it
is possible to modify the series representation of the local solution such that it becomes
convergent, as demonstrated later in this section (Example 6.31) . .l

Suppose that the representation of the local solution in Eq. 6.68 is valid.
This representation cannot be used for numerical calculation since it has an infinite
number of terms. We have to approximate the local solution u (x), x E D, by the
first m < oo terms in Eq. 6.68, that is,
m
u(x)::: u(x) = R[g(x)] + LRKi[g(x)]. (6.69)
i=l

No probabilistic concepts have been used to establish Eqs. 6.68 and 6.69. We
will use Monte Carlo simulation to calculate the terms of u(x) in Eq. 6.69. A
random walk on aD, referred to as the boundary walk method (BWM), is used
to evaluate u(x) in Eq. 6.69.
6.4. Boundary walk method 405

Let Yo, YI, ... , Yi, ... be a aD-valuedMarkovchain, where fo and fili-I
denote the densities of Yo and Yi I Yi-I, i = 1, 2, ... , respectively. Define also
an Rd -valued process with states Zo, ZI, ... , Z;, ... , where

Zi = qi g(Yi), i = 0, 1, ... ,
qi-I k(Yi-I, Y;)
qi = , i = 1, 2, ... , and
fili-I (Yi I Yi-I)
r(x, Yo)
(6.70)
qo = fo(Yo) ·

The states Zo, ZI, ... , and Zi depend on Yo, (Yo, Y I), ... , and (Yo, ... , Y;), re-
spectively. The joint density fo.I, ... ,i of (Yo, ... , Yi) is fo,I, ... ,i (y 0, YI· ... , Yi) =
fo(Yo) n~=I fJij-1(Yj I Yj-1)for j ~ 1.

If fo.I .... ,i(Yo.YI· ... ,yi)= 0 implies that the entries of the matrix
r(x,yo) n~=Ik(Yi-I•Yi)g(yi) arezeroforallx ED, all boundary points
Yo, Y1• ... , Yi E aD, and i ~ 0, then

nK<O[g(x)] = E [Z;], i ~ 0. (6.71)

Proof: We use the definitions in Eqs. 6.66, 6.67, 6.70, and 6.70 to prove Eq. 6.71. Note
that

R[g(x)] = fav r(x, y) g(y) du(y) = fav [r~:C~) g(y)] fo(Y) du(y) = E [zo]
provided that fo(Y) = 0 implies that the entries of r(x, y) g(y) are zero for all x E D and
y E aD. The second term in Eq. 6.69 is

R.K:(l)[g(x)] ={ r(x, Yo) k(yo, YI) g(yi) du(yo) du(yi)


lavxav
= Ia [r(x,
.: ( Yo)) f k(yo,
(
YI)
)
J
g(yi) fo.1 (Yo. YI) du(yo) du(yi) = E [z 1]
avxav JO Yo 110 YI Yo
1

since fo,I(Yo. YI) = 0 implies the entries of r(x,y0)k(y0 ,yi)K(YI) are zero for all
x E D and Yo, Y1 E aD . •
Let (Yo(w), Y 1(w), ... ), w = 1, ... , ns. be ns independent samples of the
Markov process (Yo, Y1, ... ) and let (Zo(w), ZI(w), ... ) denote the correspond-
ing samples of (Zo, Z1, ... ).

The terms of the local solution in Eq. 6.68 can be approximated by the following
estimates of the expectation of the random variables Z i, that is,

(6.72)
406 Chapter 6. Deterministic Systems and Input

Example 6.31: Let u be the solution of ilu(x) = 0, x E D, with the boundary


conditions u(x) = ~(x), x E aD, where Dis an open, convex, bounded subset in
IR2 and ~ is a prescribed function. The local solution of this equation is

u(x) = ~ R[g(x)] + ~ t(
i=l
RK(i-I) [g(x)] + RK(i) [g(x)])

+2L
1 1 00
= 2 E[Zo] (E[Z;_I] + E[Z;]).
i=l

The local solution u has been calculated forD = {x E IR 2 , I x II< 1} and


~(x) = 100 cos(e), wherex 1 = cos(e), x 2 = sin(e), ande E [0, 2n). Numerical
results on a circle of radius 1/2 centered at zero are in error by less than 1% when
based on the approximation u(x) = (1/2) E[Zo] + (1/2) (E[Zo] + E[ZI]) +
(1/2) (E[ZI] + E[Z2]), where the expectations E[Z;] in the expression of have u
been calculated from 20,000 independent samples of the boundary walk. <>
Proof: The initial and transition densities fo(Yo) = r(x, Yo) and fili-I (y; I Yi-I) =
lk(Yi-1, Y;)l, i = 1, 2, ... , have been used in calculations, where

r(x, Yo)= cos(rpx,y 0 )/(2n: ro),


k(Yi-1, Yi) = - cos(rpyi-1 ,y)/(n: r;-I,i ),

ro =II x - Yo II, r;-I,i =II Yi-1- Yi II, f/Jx,y 0 denotes the angle between the interior
normalatyo E aDandtherayfromyo E aDtox E D,andrpyi_ 1 ,yi istheanglebetween
the interior normal at Y; and the ray from Yi E aD to Yi-1 E aD.
The series L~o E[Z;] cannot be used to calculate u(x) since it is not convergent
Consider the series v(x) = L~o Ai E[Z;]. This series has a single pole at A = -1 and
coincides with the series expansion of u (x) for A = 1. The idea is to eliminate the pole of
v(x) and then use the resulting series to calculate the local solution u(x).
Note that

I:>i (E[Z;_j] + E[Z;l)


00 00

(1 +A) v(x) = (1 +A) ~:::>; E[Z;] = E[Zo] +


i=l
or
= - - E[Zol +--I>'
1 1
+E[Z;l).
00 .
v(x) (E[Z;_j]
1 +A 1 +A i=l
Since the above series is convergent and u(x) is equal to v(x) for A= 1, u(x) has the stated
representation. •

6.5 Algebraic equations


Consider the algebraic equation (Eq. 6.2)

(i- 'A g)u = ~. (6.73)


6.5. Algebraic equations 407

where i is the (m, m) identity matrix, g is an (m, m) matrix, t/> and u denote
m-dimensional vectors, and /.. is a parameter. The entries of g and t/> are real
numbers. If tJ> =1= 0 and det(i -/..g) =I= 0 for a specified value of/.., then Eq. 6.73
has a unique solution. If t/> = 0, Eq. 6.73 becomes

(i- /..g)u = 0 (6.74)

and defines an eigenvalue problem. We refer to Eqs. 6.73 and 6.74 as inhomoge-
neous and homogeneous, respectively.
Our objectives are to develop methods for finding directly the value of
a particular coordinate of the solution u of Eq. 6.73 and the dominant eigen-
value/eigenvector of Eq. 6.74, that is, to develop methods for the local solution
ofEqs. 6.73 and 6.74.

6.5.1 Inhomogeneous equations


Consider the algebraic equation

(6.75)

defined by Eq. 6.73 with/.. = 1. This choice of/.. is not restrictive. The objective
is to find the value of a coordinate u i of u directly rather than extracting its value
from the global solution ofEq. 6.75.
Let X be a discrete time Markov process with discrete states {1 , ... , m + 1},
where m + 1 is an absorbing state. The process X starts at a state i E {1, ... , m}
and exits {1, ... , m} or, equivalently, enters the absorbing state m + 1, for the first
time in T transitions, where

T = inf{r > 0: X(r) = m + 1, X(O) = i E {1, ... , m}}. (6.76)

Let p = {Pi,J }, i, j = 1, ... , m, be a matrix such that Pi,J =::: 0, Pi = 1 -


I:j= 1 Pi,J=::: 0, i = 1, ... , m, and at least one of the Pi's is strictly positive. Let
v ={vi,}} be an (m, m) matrix with entries

v· . _ { gi,JIPi,J. if Pi,J > 0,


l,J- 0, (6.77)
if Pi,J = 0,

where gi,J are the entries of gin Eq. 6.75.

Let V = (V(O), V(1), ... ) be a process defined by

V(k + 1) = V(k) VX(k),X(k+l)· k = 0, 1, ... '


V(O) = 1. (6.78)
408 Chapter 6. Deterministic Systems and Input

Note: The (m + 1, m + 1) matrix p


with the first m rows and columns given by p,
Pi,m+l =Pi fori= l, ... ,m,fJm+l,j = Oforj = l, ... ,m,andfim+l,m+l = 1
is a transition probability matrix that is used to define the Markov chain X with the discrete
states {1, ... , m + 1} and a unit time step, where m + 1 is an absorbing state. The timeT
in Eq. 6.76 gives the number of transitions till X starting at X(O) = i E {1, ... , m} enters
the absorbing state m + 1 for the first time.
If all probabilities Pi were zero, the Markov chain X would not reach the absorbing
state m + 1 a.s. so that T would be unbounded. Because at least one of the probabilities
Pi is strictly positive, X reaches the absorbing state m + 1 in a finite number of transitions
a.s. (Eq. 6.80). &

If det(i- g) i= 0 and II g lim< 1, the local solution ofEq. 6.75 is


ui = E[V (T - 1) </Jir_ 1 / Pir- 1 I X (0) = i]. (6.79)

Proof: The above matrix norm II · lim is II g lim= maxll~ll=l II g ~ II, where II · II is the
Euclidian norm.
LetS = T- 1 and i, i1, ... , is(w)• ir(w) be a sample path X(-, w) of X, which
starts at state i E { 1, ... , m} and reaches the absorbing state m + 1 in T (w) transitions. The
expectation in Eq. 6.79 is

E[V(S) r/>; 5 /Pis I X(O) = i] = L (V(S(w)) rPiscw/ Pis(w)) P(w),


w

where the sum is performed over all sample paths w of X,

P(w) ={ Pi,i1 Pi1 ,i2 · · · Pis(w)-1 ,is(w) Pis(w) for S(w) > 0,
Pio =Pi for S(w) = 0

is the probability of sample path w, and V (S(w)) = vi,i 1 Vi 1,i2 ••. viscwl-! ,is(w) denotes the
value of V just prior to X entering the absorbing state m + 1 calculated along a sample path
w of X. An alternative form of Eq. 6.79 is

u; = Lw (vi,i1 v;1,i2 · · · Visrw)-J,is(w) rPiscw/Pis(w)) P(w)


or, equivalently,

oo m m
Ui = L:: L:: ... L:: gi,i1 g;l .i2 ... gik-1 .ik rPik
k=Oi1=l ik=l

= ¢; + (g t/1) i + (g 2 t/1) i + . .. ,
where the first value of k is zero since, if X enters the absorbing state at the first transition,
T = 1 and S = 0. The resulting expression of u; represents the entry i of the Neumann
series expansion L~o gs t/1, which is convergent since II g lim< 1 by hypothesis. Since
(i- g)- 1 t/1 = L~o gs tfi, u; in Eq. 6.79 solves locally Eq. 6.75. •
6.5. Algebraic equations 409

If at least one of the probabilities p; is not zero, the average number of transitions
of X to the absorbing state m + 1 is finite and equal to

w = (i - p)- 1 1, (6.80)

where 1 E !Rm has unit entries and w; = E[T I X(O) = i], i = 1, ... , m.
Proof: Let p be a real-valued function defined on {1, ... , m} and

w; =E [t k=O
p(X(k)) I X(O) = i]
fori E {1, ... , m} and S = T -1. It can be shown that w; = p(i) + LJ=I Pi,j Wj ([150],
pp. 108-110) or w = p + p w. If p(i) = 1 for all i E {1, ... , m}, then

w; = E[S + 1 I X(O) = i] = E[T I X(O) = i]


is the expected number of transitions that X, starting at X (0) = i, takes to enter the ab-
sorbing state m + 1 and w = ( i - p) - 1 1 as stated. For example, let m = 3 and

0.2 0.4 0.0 ]


p =[ 0.4 0.2 0.2 .
0.0 0.5 0.2
The expected values of the time to absorption are (3.09, 3.68, 3.55) for the initial states
(1, 2, 3) of X. •
The efficiency and accuracy of the local solution in Eq. 6.79 depends on
matrices g and p. We have no control on g but the selection of pis rather arbitrary
provided it defines a Markov chain X with a reachable absorbing state. There are
no simple rules for selecting an optimal transition probability matrix p. Some
suggestions for the selection of matrix p can be found in [94] (Section 7.1).
Leti ~--+ X(1, w) ~--+ · · · ~--+ X(T(w)-1, w) ~--+ X(T(w), w) be a sample of
X and let X(T(w)- 1, w) = X(S(w), w) denote the last value of X in {1, ... , m}
prior to its exit from this set. Suppose that n s independent samples of X have been
generated by using the transition probability matrix jJ defined following Eq. 6.78.
The values of V along a sample w of X can be calculated from Eq. 6.78.

The local solution in Eq. 6.79 can be estimated by

u; = -1 L V(T(w)- 1, w)
ns A.
'!'ir(w)-l. (6.81)
ns w= 1 Pir(w)-1

Example 6.32: Let u be defined by Eq. 6.75 with m = 1. The exact solution of
this equation is u = ¢/ (1 - g). The associated Markov process X has two states
and the transition probability matrix

-
p= [ p
0
1- p
1
J.
410 Chapter 6. Deterministic Systems and Input

If lgl < 1, Eq. 6.79 gives u = L~o gk ¢ = ¢ L~o gk, which coincides with
the exact solution¢ I (1- g) since the Neumann series L~o gk is convergent with
sum 11(1- g). 0

Example 6.33: Suppose that all probabilities Pi = 1 - L::j= 1 Pi,} are equal to
p E (0, 1). Then the average number of transitions to absorption for the Markov
chain X with transition probabilities Pi,} is Wi = 11p, for each i E {1, ... , m}. 0
Proof: The solution wi = ljp, i E {1, ... , m}, can be obtained directly by noting that X
is absorbed with probability p or takes values in {1, ... , m} with probability 1 - p at each
transition. Hence, T is the time to the first success for a sequence of independent Bernoulli
trials with probability of success p. Hence, Tis independent of the initial value of X and
has the mean and variance E[T] = ljp and Var[T] = (1- p)/r. respectively.
Consider the Neumann series w = 1 + L:~ 1 ps 1 corresponding to the solution
w = (i- p)- 1 1. Because the terms ps 1 of L:~ 1 ps 1 are equal to (1- p)s 1, this series
is convergent and all coordinates of ware equal to L:~0 (1- p)s = ljp. For example,
the entries of p 1 are L:j= 1 Pi,} = 1- p so that p 2 1 = p (p 1) = (1- p) p 1. •

Example 6.34: Let u be the solution of Eq. 6.75 with m = 3, ¢1 = 1, ¢2 = 2,


¢3 = 3, and
0.2 0.4 0.0 ]
g = [ 0.4 0.2 0.2 .
0.0 0.5 0.2
Suppose we take p equal to

0.3 0.3 0.3 ]


P1 = [ 0.2 0.2 0.5
0.5 0.2 0.1
or p 2 with all entries equal to 0.1. The average number of transitions of X
to absorption are (7.75, 7.63, 7.11) and (1.43, 1.43, 1.43) under p 1 and p 2 , re-
spectively, for the initial states (1, 2, 3) (Eq. 6.80). The ratio, ((7.75 + 7.63 +
7.11) 13)) 11.43 = 5.24, shows that the computation times for finding the local so-
lution based on 2,000 and 10,000 samples of X defined by p 1 and p 2 , respectively,
are similar.
The exact solution is u = (4.68, 6.84, 8.03). The algorithm in Eq. 6.81
can be applied since II g lim< 1. The estimates of u are (4.23, 6.65, 8.01)
and (3.90, 5.78, 7.14) for the transition probability matrices p 1 and p 2 based
on 2,000 and 10,000 samples, respectively. The errors of these estimates are
(-9.62, -2.78, -0.25)% for p 1 and (-16.67, -12.35, -11.08)% for p 2 . The
probability matrix with smaller absorbing probabilities yields a superior local so-
lution in this case for approximately the same amount of calculations. 0
Note: Experience provides some guidelines for selecting the probabilities Pi,}. If the
probabilities Pi are "too large", the average number of transitions of X to absorption is
very small and the estimates of the local solutions can be unstable and less accurate, as
demonstrated by this example. A
6.5. Algebraic equations 411

Example 6.35: Consider the Poisson equation ~u(x) + p(x) = 0 in Eq. 6.6
defined on an open bounded set D c ~ 2 with the Dirichlet boundary condition
u(x) = ~(x), x E aD. The local solution of this problem by the RWM is in
Section 6.2.1.4. We present here an alternative local solution based on the finite
difference approximation,

1 a2
Ui,j = 4 (Ui+1,j + Ui-1,} + Ui,J+1 + Ui,j-1} + 4 p(Xi,j), (6.82)

of this Poisson equation, where x i,j denotes the coordinate of the node (i, j) of
the finite difference mesh in D, a denotes the step of the finite difference mesh,
and Ui,J = u(xi,J ).
The method in this section can be applied to solve locally the linear system
defined by Eq. 6.82. However, we present an alternative local solution based on
n 8 walkers that start at a node x of the finite difference mesh and can move to a
node just right, left, up, or down from x with equal probability. A walker k travels
through the nodes x ~k), s = 1, ... , mk. of the finite difference mesh and exits
D at x~l E aD, where mk denotes the number of transitions walker k takes to
reach the boundary aD of D. For simplicity, we change notation and identify the
nodes of the finite difference mesh by a single subscript. The local solution can
be estimated by

The result provides an approximation for the local solution in Eq. 6.26. <>

Proof: Note that the walkers paths define a random walk in W. The first term in the
u,
expression of that is, the sum (1/ns) L:Z~ 1 Hx~h. is an estimate of the expectation
Ex[u(B(T))] in Eq. 6.26. The connection between the second term in the expression of u
and the integral Jcf p(B(s)) ds in Eq. 6.26 is less obvious.
To show that Ex [JcJ p(B(s)) ds Jis approximated by the second term in the ex-
pression of u(x), we develop an alternative interpretation of Jcf p(B(s)) ds. Consider a
circle S(x, a) of radius a > 0 centered at x E D, where a is the step of the finite dif-
ference mesh. Let ~T1 be the time a Brownian motion B starting at x needs to reach the
boundary of S(x, a) for the first time. Denote by Z1 the exit point of B from S(x, a).
Consider another circle S (Z 1 , a) and a Brownian motion starting at Z 1· Let ~ T2 be the
time it takes this Brownian motion to reach for the first time the boundary of S(Z1, a)
and so on. We stop this construction at step M if S(ZM, a) is the first circle that is not
included in D. The process x = Zo H- Z1 H- · • • H- ZM is similar to the spher-
ical process in Fig. 6.11. Because the radius a is small, the last point of the process
Z = (x = Zo. Z1, ... , ZM) is close to the boundary aD of D so that Jl
p(B(s)) ds
412 Chapter 6. Deterministic Systems and Input

can be approximated by 'L~ 1 p(Zr-1) D.Tr. Also,

so that E [ 'L! 1 p(Zr-1) D.Tr J= (a 2/2) E [ 'L!1 p(Zr-d J. where we used E[D.Tr] =
a2 /2 (Section 8.5.1.1). These considerations show that the second term in the expression
of uconstitutes a Monte Carlo estimate for (1/2) E [ L~l p(Zr-1)] (a 2/2). •

Example 6.36: Consider the partial differential equation in the previous example
with p(x) = 0 and D = (0, 1) x (0, 1). The coordinate of node (i, j) of the finite
difference mesh is Xi,j = ((i - 1) a, (j - 1) a), i, j = 1, ... , n + 1, where n
denotes the number of equal intervals in (0, 1) and a = 1/n. The estimate of the
solution u of the Laplace equation L\u(x) = 0 in D at an arbitrary node (i, j) of
the finite difference mesh is Ui,j = (1/ns) L~~~ ~(x~~) (Example 6.35), where
~ is the boundary value of u and x ~~ denotes the terminal point of the sample
paths k of the random walk.
If~ is equal to 100 on {1} x (0, 1) and zero on the other boundaries of D,
the above estimate of u gives Ui,j = 100n~/ns, where n~ denotes the number
of samples of the random walk that exit D through boundary {1} x (0, 1). The
estimates of u calculated at a relatively large number of points in D are in error
by less than 3% for a finite difference mesh with a = 1/1,000 and ns = 1, 000
samples.¢
Note: The exact solution of the Laplace equation with the domain and Dirichlet boundary
conditions considered in this numerical example is

400
L sinh(n :n: x1) sin(n :n: xz)
00
u(x) = -
:n: sinh(n :n:)
n=l,3, ...

and can be found in [73] (Example 26.13, pp. 550-551) . .l

Example 6.37: Let u be the solution of Eq. 6.75. Suppose that the condition
II g lim< 1 is not satisfied so that the local solution in Eq. 6.79 cannot be used.
Let a = i - g, J-L be an eigenvalue of a, {L > max{JJ-LI}, c = ({L) -z, 1/)* = c a 1/),
and g* = i- ca 2 . Then u is also the solution u = 1/)* + g* u and II g* lim< 1.
Hence, Eq. 6.79 can be used to solve u = 1/)* + g* u locally.¢
Proof: That u is the solution of u = !/)* + g* u results by straightforward calculations
using the definitions of!/)* and g*.
6.5. Algebraic equations 413

We can assume g = gT without loss of generality since, if this condition is not


satisfied, we can solve aT au =aT cp, and this equation can be set in the form of Eq. 6.75
with g symmetric. Also, we have a 2 v = 1-t a v = ~-t 2 v so that the eigenvalues ~-t* = c ~-t 2
of matrix c a 2 are positive and strictly smaller than one. The eigenvalues A* of g* are the
solution of
0 = det(g* -A* i) = det((l -A*) i - c a 2 )
so that they are equal to 1 - ~-t*. Because ft * E (0, 1) so are the eigenvalues A*. Hence, the
Monte Carlo solution is convergent for the modified problem [93]. •

6.5.2 Homogeneous equations


Consider the eigenvalue problem

(6.83)

obtained from Eq. 6.73 with if> = 0 and f.L = 1/A., where g is assumed to be a
real-valued symmetric matrix with simple positive eigenvalues. Our objective is
to find the largest eigenvalue of g and the corresponding eigenvector, referred to
as the dominant eigenvalue and eigenvector, respectively.

Example 6.38: Consider the eigenvalue problem in Eq. 6.83. Suppose that the
eigenvalues f.Lk of g satisfy the condition f.L1 > f.Lk for all k > 1. Let y(r) E
JRm, r = 0, 1, ... , be a sequence of vectors generated by the recurrence formula
y<r) = g y<r- 1), where y< 0l does not exclude the dominant eigenvector of g, that
is, y<0l = 2:r= 1 {J; u;, where u; denote the eigenvectors of g and fJ1 =f:. 0. Then
y<r) / f.L} and Y;(r+ 1) IY;(r) converge to fJ1 UJ and f.LJ, respectively, as r --+ oo, where

Y;(r) denotes the coordinate i of y<r). <>


Proof: We have
m m
yer) = g y<r-1) = g2 y<r-2) = ... = gr yCO) = L {3; gr u; = L {3; I-tt u;
i=1 i=l

so that

i'' ~ ~~ [p, "' + t,p, (:: )' ••].


Hence, the scaled vector y(r) I ft~ converges to fJ1 u1 as r ~ oo since /hi I ft1 < 1, and the
. (r+ I) (r)
ratiO Y; IY; approaches ft! as r ~ oo. •

Let p = {p;,j} be an (m, m) matrix such that Pi,j ::0: 0 and LJ=I Pi,j =
1 and let X be a Markov chain with states {1, ... , m} defined by the transition
probability matrix p. Let a = (aJ, ... , am) E JRm and rr; > 0, i = 1, ... , m,
such that 2:r= 1 rr; = 1. Define a discrete random variable Y taking the values
a; = a;/n; with probabilities n;, i = 1, ... , m, and let V be the process defined
414 Chapter 6. Deterministic Systems and Input

by the recurrence formula in Eq. 6.78, but starting at V(O) = ai rather than 1.
Consider also the recurrence formula

gx<r-l) = PrX(r), r = 1, 2, ... , (6.84)

starting at x (O) E JRm such that x (O) does not exclude the dominant eigenvector of
g. Set Pr =II g x<r-l) II in Eq. 6.84 so that the vectors x<l), x< 2), ... have unit
norm. Let Vi,J = gi,J I Pi,J if Pi,J > 0 and Vi,J = 0 if Pi,J = 0.
If the eigenvalues ILk of g = gT are such that IL =ILl > ILk fork > 1, then

n_
-· (e(q'))l/(q'-q)
,_ ·- for q' > q and large q, where (6.85)
e(q)
m m
e(q) = LE[V(q) I X(O) = Y, X(q) = j] =PI .. ·Pq 'L<x(q))J· (6.86)
}=1 }=1

Proof: Let wi,j be a sample path of X starting at state i and reaching a state j for the first
time after q transitions. The value of V along this sample is ai vi,i 1 Vi 1 h ... v;q_ 1 ,j so that
the expectation of V along all samples wi,j of X is

E[V(q) I X(O) = i, X(q) = j]


L (a; Vi,i 1 Vi 1 ,i2 ... viq_ 1 ,j)(Pi,i 1 Pi 1,i2 ···Piq_ 1 ,j)
iJ, ... ,iq-1 e{l, ... ,m}

L CXj 8i,iJ 8iJ,i2 ••• 8iq-1.} = CXj (gq);,j .


i1, ... ,iq-1 e{l, ... ,m}

If the initial value of X is Y, the above expectation becomes


m
E[V(q) I X(O) = Y, X(q) = j] = L (gq)i,j ai = (gq a) j.
i=l

For x<0) =a we have g 7 a =PI ... Pr-1 Pr x<r) (Eq. 6.84) so that

E[V(q) I X(O) = Y, X(q) = j] =PI ... Pq (x(q))j,


and summation with respect to j gives Eq. 6.86.
The dominant eigenvalue in Eq. 6.85 results from (Eq. 6.86)

e(q')
"m
L-J'=l(x
(q')
)j '-
q q q' > q'
-
eq
() = Pq+l · · · Pq' --",..---,....,..-- ~
"m
L..j=l
( (q)). - 1-L
x 1

where the approximate equality holds for large values of q since the vectors x<q) and x<q')
nearly coincide with the dominant eigenvector and the numbers Pq+ 1, ... , Pq' are approx-
imately equal to 1-L (Example 6.38). •
6.5. Algebraic equations 415

If the conditions under which Eqs. 6.85 and 6.86 hold, the coordinate u i of the
dominant eigenvector of Eq. 6.83 can be approximated by

1-~ i .
Uj::::: 1 ,
~q+ (1 - ~q -q)
'"'
~
E[V(r) I X(O) = Y, X(r) = J]. (6.87)
r=q+l

Proof: Suppose that q is sufficiently large so that the expectation E[V(r) I X(O)
Y, X (r) = j] can be approximated by 1-Lr u j. For q 1 > q we have

q' q' 1-~-tq'-q


" " E[V(r)
L..J
I X(O) = Y, X(r) = j] ~ ""
L..J
J-Lr Uj = J-Lq+l 1-J-L
Uj
r=q+l r=q+l

showing that Eq. 6.87 holds. •

Example 6.39: The dominant eigenvalue of the matrix

0.2 0.4 0.1 ]


g = [ 0.4 0.2 0.2
0.1 0.5 0.2
is~= 0.7653. The estimate of this eigenvalue in Eq. 6.85 varies between 0.7 and
0.7663 for q' = 11, 12, ... , 20, q = 10, the transition probability matrix

0.3 0.3 0.4 ]


p =[ 0.2 0.3 0.5 '
0.5 0.2 0.3

probabilities 1Ti = 1/3, and a = (100, 100, 100). The values of e(q) and e(q ')
have been estimated from ns = 1, 000 samples. ¢
Note: The expectations in Eqs. 6.86 and 6.87 have been estimated from samples of X
generated from the transition probability matrix p defining this process and the probability
(:rrb ... , :rrm) of its initial state. A

Example 6.40: The dominant eigenvalue and eigenvector of the matrix

0.6 0.2 0.1 ]


g = [ -0.2 0.7 -0.1
0.1 -0.1 0.7
are, respectively, 0.9388 and (0.5441, -0.6681, 0.5075). The dominant eigen-
value and eigenvector of g given by the recurrence formula in Eq. 6.84 with
x<0 ) = (1, 1, 1)/ .J3 are 0.9377 and (0.5346, -0.6470, 0.5437) after 10 iterations,
and 0.9388 and (0.5441, -0.6681, 0.5075) after 30 iterations. These estimates are
based on the transition probability matrix and the probability of X (0) in the pre-
vious example. ¢
416 Chapter 6. Deterministic Systems and Input

6.6 Integral equations


We consider the inhomogeneous and homogeneous Fredholm equations of
the second kind defined by (Eq. 6.3)

u(x) = ¢(x) +A L g(x, y) u(y) dy, x, y ED, and (6.88)

u(x) =A L g(x, y) u(y) dy, x, y ED, (6.89)

respectively, where D is an open bounded set in JRd, A is a parameter, and¢ and g


are real-valued functions defined on D and D x D, respectively. These equations
are frequently encountered in science and engineering ([37], Section 5.3). Our
objectives are to find local solutions for Eqs. 6.88 and 6.89, that is, the value of u
in Eq. 6.88 for a specified value of A at an arbitrary point x E D and the lowest
eigenvalue and eigenfunction of Eq. 6.89, respectively.
The methods of the previous section can be applied to solve locally Eqs. 6.88
and 6.89 if they are approximated by algebraic equations, as demonstrated by the
following three examples. Alternative local solutions for Eqs. 6.88 and 6.89 are
discussed in the remainder of this section.

Example 6.41: Let Di, i = 1, ... , m, be a partition of D, that is, D = U;"= 1 Di


and Di n D J = fZJ, i f. j. If the partition is sufficiently fine and the functions ¢,
g, and u are smooth, the solution ofEq. 6.88 can be approximated by the solution
u of the algebraic equations
m
u(Xi) =¢(xi)+ A Lg(Xj, X j) u(x j) Vj,
J=l

where xi E Di and Vi is the volume Di. The above equations defining has the u
formofEq. 6.73 in which the entries of the matrices g, "''and u are g(x i, Xj) Vj,
¢(xi), and u(xi). Similar arguments show that Eq. 6.89 can be cast in the form of
Eq. 6.74. ¢

Example 6.42: Suppose that the kernel of Eqs. 6.88 and 6.89 is degenerate, that
is, g(x, y) = L~=l gl,s(x) g2,s(y), x, y E D, and that the functions g1,s are
linearly independent. Then the solution of Eq. 6.88 is
m
u(x) = ¢(x) +A L gt,s(X) hs,
s=l

in which the coefficients hs = fv g2,s(Y) u(y) dy are given by the linear system
of algebraic equations
m
hs=as+ALhthts. s=l, ... ,m,
t=l
6.6. Integral equations 417

where as= fv82,s(Y)¢(y)dy and hts = fv81,t(y)gz,s(Y)dy (Eq. 6.73). If


as = 0, s = 1, ... , m, the above equations define an algebraic eigenvalue problem
forthematrix{brs}. <>

Note: If u has the above representation, Eq. 6.88 becomes

or

f.gl,s(X) [hs-as-A.thtbst]=O, XED.


s=l t=l
The coefficients of the above linear form must be zero since the functions g1 ,s are assumed
to be linearly independent. This condition yields the defining equations for hs. A

Example 6.43: Let Xs, s = 1, 2, ... , be a complete collection of real-valued func-


tions defined on D so that the representations

¢(x) =Las Xs(X) and u(x) = L.Bs Xs(X)


s

hold. The coefficients as can be calculated since ¢ is a known function. If the


series in the representations of ¢ and u are absolutely and uniformly convergent,
the unknown coefficients ,B s are the solutions of the algebraic system

.Bs- IXs- A L ,Bq Ysr = 0, S = 1, 2, ... ,


q

with an infinite number of equations, where Ysr are some coefficients. <>
Note: Because JD g (x, y) Xs (y) d y is an element of the space spanned by the functions
Xr, it has the representation Lq Ysq Xq (x). The coefficients Ysq can be calculated since
JD g(x, y) Xs(Y) dy is a known function.
If the integral in Eq. 6.88 can be performed term by term, we have

Li'is Xs(X) =Las Xs(X) +A Li'is LYsq Xq(X).


s s q

Hence,

so that the square brackets must be zero. The resulting conditions for tis constitute a sys-
tem with an infinite number of linear equations. The solution technique for this system
resembles the local method for solving Eq. 6.73 ([94], p. 89). A
418 Chapter 6. Deterministic Systems and Input

6.6.1 Inhomogeneous equations


Our objective is to find the value of the solution u of Eq. 6.88 at an arbitrary
point x E D, that is, to find the local solution of this equation.

Let UJ, u2, ... be a sequence of functions defined by

Uk(X) = l gk(X, y) ¢(y) dy, where

gk+J(X,y)= lgJ(X,Z)gk(Z,y)dz, k=1,2, ... , andg 1 =g. (6.90)

If l¢(x)l ::::: CJ, ig(x, y)l ::::: c2 for all x, y ED, 0 < CJ, c2 < oo, v = fv dx <
oo, and lA I c2 v < 1, then the local solution ofEq. 6.88 is

+ I>k Uk(x),
00

u(x) = ¢(x) xED. (6.91)


k=l

Proof: The Neumann series in Eq. 6.91 introduced in Eq. 6.88 gives

¢(x) +I:k=l
Ak Uk(x) = ¢(x) +A lv{ g(x, y) ¢(y) dy

+I:>k+l r g(x,y)uk(y)dy
00

k=l lv
under the assumption that the integral JD g(x, y) u(y) dy with u in Eq. 6.91 can be per-
formed term by term. These calculations are formal since we have not yet imposed any
conditions on the series in Eq. 6.91. The functions Uk in Eq. 6.90 result by equating the
terms of the above equation that have A at the same power. For example, the first two
functions of the series expansion in Eq. 6.91 are (Eq. 6.90)

u 1 (x) = fv g(x, y) ¢(y) dy

u2(x) = L g(x, y) UJ (y) dy = fv g(x, y) ( L g(y, z) ¢(z) dz) dy

= Lg2(x,z)¢(z)dz.

Because the inequalities 1¢ (x) I ::::; CJ and lg(x, y) I ::::; c2 are assumed to hold for
some constants c1, c2 > 0 and all x, y E D, lg2(x, y)l ::::; fv lg(x, z) llg(z, y)l dz :::0 v c~,
and, generally, lgk(X, y)l ::::; (v c2)k jv for all k = 1, 2, ... , the absolute value of u in
6.6. Integral equations 419

Eq. 6.91 is dominated by

lu(x)l::::: lcf>(x)l + f
k=l
I.A.Ik r lgk(X, y)llcf>(y)l dy
lv
00

::::: c1 +q ~)I.A.I cz v)k.


k=l

If the condition I.A. I cz v < 1 is satisfied, then Lk(I.A.I cz v)k is convergent implying that
the Neumann series in Eq. 6.91 is absolutely and uniformly convergent so that the term by
term integration performed above is valid. Moreover, the value of u (x) at an arbitrary point
x E D can be approximated by the first ii < oo terms of the Neumann series in Eq. 6.91.
The method in this section for solving locally Eq. 6.88 relates to the BWM devel-
oped in Section 6.4 for solving locally linear elasticity problems. Both methods are based
on Neumann series representations of the local solutions and the terms of these series are
calculated by Monte Carlo simulation (Eqs. 6.71 and 6.92). •

If the Neumann series of Eq. 6.91 is absolutely and uniformly convergent, and
the kernel g is positive, then

Uk(x) = ak(x) E [ ¢ ( y(k) (x)) J, (6.92)

where y(k) (x) is aD-valued random variable with probability density gk (x, ·) =
8k(x, ·)/ak(X) and ak(x) = fv 8k(X, y) dy.

Note: The functions gk (x, ·) are proper density functions with support D because they are
positive and the volume under their graph is unity. The definition of Uk in Eq. 6.90 has also
the form uk(x) = ak(x) f D gk(X, y) cf>(y) dy showing that the integral is the expectation
of cf>(Y(k)(x)). The recurrence formula given by Eq. 6.90 shows that, if g is positive, so
are the kernels gk. Generally, the integrals in Eq. 6.90 must be calculated numerically.
If the kernel g is not positive on D x D, uk (x) can be interpreted as the expectation
of the random variable g(x, z(k)) cp(z(k)) v, where z(k) is uniformly distributed in D and
v = fv dx is the volume of D . .6.

The local solution in the preceding section (Eq. 6.79) can be extended to
find local solutions for Eq. 6.88 ([48] and [95], p. 90). Let X = (X(O) =
x, X (1), ... ) be a discrete time, continuous state Markov process starting at the
point x E D where the local solution needs to be determined. Let f(· I y) denote
the density of the conditionalrandorn variable X(i) I (X(i -1) = y), i ~ 1. This
density, referred to as transition density, needs to be specified. Given a transition
density f(· I ·),the probability that X starting at x E D exits Din one transition
is p(x) = fvc f(~ I x) d~. The following two equations define a stopping time
T giving the number of transitions that X starting at x E D takes to exit D for the
first time and a discrete time process V = (V(O), V(l), ... ). It is assumed that
the transition density f(· I x) is such that T < oo a.s. for all x E D.
420 Chapter 6. Deterministic Systems and Input

T = inf{r > 0: X(r) f/. D, X(O) =xED}. (6.93) 1

V(k + 1) =A V(k) g(X(k), X(k + 1)) k = 0, 1, ... , (6.94)


f(X(k + 1)) I X(k))
where V starts at V(O) =1.

If the Neumann series in Eq. 6.91 is absolutely and uniformly convergent, the
local solution of Eq. 6.88 at x
E D is

u(x) = E [V(T- 1) cf>(X(T- 1))/ p(X(T- 1)) I X(O) =x ED]. (6.95)

Proof: Letx = xo r-+ XJ r-+ · · · Xk r-+ Xk+l with xo, ... , Xk ED and Xk+l E De be a
sample path of X that starts at x E D and exits D for the first time ink+ 1 transitions. Note
that X(T- 1) and X(T) are the last and the first values of X in D and De, respectively
(Eq. 6.93). The expectation of V(T -1) cp(X(T -1))/ p(X(T -1)) conditional on X(O) =
x and T = k + 1, k = 0, 1, ... , is cp(x)jp(x) fork= 0 and

fork= 1, 2, ... , where Dk = x~=l D so that

E [V(T- 1)¢(X(T- 1))/p(X(T- 1)) I X(O) =xED)= ¢(x)

+A { g(X,XJ)c/J(Xj)dXJ+A 2 {
fv fvxD g(X,Xj)g(XJ,X2)c/J(X2)dxldx2+ ...
= cp(x) +A UJ (x) + A2 u2(x) + · ..

with the notation in Eq. 6.90. Hence, the expectation in Eq. 6.95 coincides with the local
solution in Eq. 6.91. Because the Neumann series in Eq. 6.91 is absolutely and uniformly
convergent by hypothesis, Eq. 6.95 gives the local solution of Eq. 6.88. •

The local solution in Eq. 6.95 can be estimated from samples of X and V.
Letx r+ X(1, w) r+ · · · r+ X(T(w)- 1, w) r+ X(T(w), w), w = 1, ... , ns, be
ns independent samples of X starting at x E D.

The local solution in Eq. 6.95 can be estimated by

u(x) = _!.._t
ns w=l
V(T(w)- 1, w) cf>_(X(T(w)- 1' w)).
f(X(T(w)- 1, w))
(6.96)
6.6. Integral equations 421

Note: Let X (r, w) = x r E D be the value of a sample X (·, w) of X after r ~ 0 transitions.


The value of this sample at time r+l can be generated from the transition density f(· I x 7 ).
The sample X (·, w) is stopped when it leaves D for the first time. Samples of V can be
calculated from Eq. 6.94 and samples of X. A

Example 6.44: Consider the differential equation

d 4 u(x)
~ + X(X) u(x) = q(x), xED= (0, l),

with the boundary conditions u(O) = u(l) = 0 and u"(O) = u"(l) = 0, where
x > 0 and q are specified functions. The integral form of the above differential
equation is

fo
1 .
u(n = g*(~, T/) [q(T/)- X(T/) U(T/)] d17
= r/J(~)- fol g(~, T/) U(T/) dT/,
where g* denotes the Green function of the differential operator d 4 I dx 4 , ~ = xI l,
11 = yll,rjJ(~) = J~g*(~.T/)q(T/)dT/andg(~.T/) = g*(~.T/)X(T/). Because
max~, 11 e(O,I) lg(~. T/)1 < 1, the local solution in Eq. 6.95 is valid and has been
used to calculate the local solution.
Numerical results have been obtained for l = 1, x = 1, q = 1, and X
defined by X(k + 1) I (X(k) = ~) ,...., N(~, a 2 ), where a > 0 is a specified
parameter. These results are in error by less than 2% for a in the range (0.2, 0.6)
and ns = 500 samples. <>

Note: The Green function of the differential operator cf4 ldx 4 is

where~ = xI l, 11 = y I l E (0, 1). This function gives the value of u at x caused by a unit
action applied at y.
The solution of the above differential equation is the displacement function for a
simply supported beam with unit stiffness and span l that is supported on an elastic foun-
dation of stiffness x and is subjected to a load q. A

6.6.2 Homogeneous equations


Suppose that the real-valued kernel g in Eq. 6.89 is symmetric, that is,
g(x, y) = g(y, x) for all x, y E D, and is square integrable. Also, assume
that the eigenvalues of Eq. 6.89 are simple. Our objective is to find the lowest
422 Chapter 6. Deterministic Systems and Input

eigenvalue and the corresponding eigenfunction. Before discussing local solu-


tions for Eq. 6.89, we will review briefly some essential properties of the eivenval-
ues/eigenfunctions of this equation. Let Ak and Uk, k = 0, 1, ... , be the eigenval-
ues and eigenfunctions ofEq. 6.89. The eigenfunctions cannot be zero everywhere
in D, Uk E L2, and the pair (A.k, Uk), k = 0, 1, ... , satisfies the equation

(6.97)

• The eigenvalues are real-valued.


• The eigenfunctions corresponding to distinct eigenvalues are orthogonal and
linearly independent.

Proof: Let v, w : D --+ IR be square integrable functions. The inner product, (v, w) =
fv v(x) w(x)* dx, induces the norm II v II= (v, v) 112, where z* denotes the complex
conjugate of z E IC. This definition is meaningful by the Cauchy-Schwarz inequality since
the functions v, ware square integrable in D.
Suppose that there exists a complex-valued solution (A., u) of the eigenvalue prob-
lem in Eq. 6.89. The complex conjugate,

u(x)* =A* L g(x, y) u(y)* dy,

ofEq. 6.89 multiplied by u(x) and integrated with respect to x over D yields

fv u(x) u(x)* dx =A* L dx u(x) L g(x, y) u(y)* dy.

We also have

L u(x) u(x)* dx =A L dx u(x)* L g(x, y) u(y)dy

by multiplying Eq. 6.89 with u(x)* and integrating the result with respect to x over D. The
above two equations imply A = A* since g is symmetric and u is an eigenfunction, so that
the inner product (u, u) is real and non-zero.
Let u 1 and u 11 be two eigenfunctions corresponding to the eigenvalues >! and A." =j:.
A. 1, respectively. The inner product of these eigenfunctions,

(u 1 ,u11 ) = f u 1 (x)u 11 (x)*dx =A.' f g(x,y)u 1 (y)u 11 (x)*dxdy


Jv lvxD
= )..1 L[L g(x, y) u" (x)* dx Ju' (y) dy = ~:, (u', u 11 ),

shows that (u1 , u 11 ) must be zero since A. 11 =j:. A. 1 by hypothesis.


Let u 1 , ... , un be eigenfunctions corresponding to distinct eigenvalues ofEq. 6.89
and suppose that there exist some coefficients q, ... , en which are not all zero such that
Lk=l ck uk (x) = 0. The inner product of Lk Ck Uk with ui gives ci (ui, Ui) = 0 so that
all ci must be zero for all i = 1, ... , n in contradiction with our assumption. •
6.6. Integral equations 423

Let u<0 ) : D -+ lR be an initial trial function for the lowest eigenfunction


uo of Eq. 6.89 and construct a sequence of functions u (i) : D -+ lR, i = 1, 2, ... ,
by the recurrence formula

u(i+l) (x) = L g(x. y) u(i)(y) dy. i = o. 1•...• (6.98)

whereu<i)(y) = u<i)(y)/ II u<i) 11.

If (1) the kernel g in Eq. 6.89 is real-valued and symmetric, (2) the initial trial
function u<0 ) does not exclude uo, and (3) 0 < A.o < Ak fork ::=: 1, then

II u< 0 II
A.o ~ {u(i+ 1), u(i)} and uo(x) is proportional to u (i) (x) fori large. (6.99)

Proof: Because the collection of eigenfunctions of Eq. 6.89 is complete, we can represent
the initial trial function by

= I:CkO) Uk(x),
00

u<0)(x) x ED.
k=O

ca
Suppose that 0) -:f. 0 so that the eigenfunction uo is not excluded from u(O). The proof of
Eq. 6.99 involves three steps. First, we note that
(i)
= L ~ uk(x),
00

u<i)(x) i = 1, 2, ... ,
k=O Ak

where cii) = cr- I 1) II uCi- 1) II. For example, (Eqs. 6.97 and 6.98)

where ci1) = ckO) I II u<0l 11. Because uCll has the same functional form as u<0l, we have
uC2l(x) = Lk(ck2) /A~) uk(x). The above representation of u<i) follows by induction.
Second, the first term of the series giving u<i) is dominant for a sufficiently large
value of i since 0 < .Ao < Ak for all k ;:: 1 so that Ab u<il (x)lcg) converges to uo(x) as
i ~ 00.
Third, consider two successive approximations,
424 Chapter 6. Deterministic Systems and Input

Their inner product,


(i) 2
(i+I) (i) ~ (co )
(u ,u )- 2 "+I . (uo, uo),
A. 0 ' II u(z) II

divided by II u(il 11 2::::: (cg)) 2 (uo, uo)JA.5i gives the approximate expression of A.o in
Eq. 6.99. The above equations also show that uo becomes proportional to u<il for a suffi-
ciently large i. •

If the conditions in Eq. 6.99 are satisfied and g is a positive function, then

(6.100)

where a(x) = fv g(x, y) dy and Y(x) denotes an !Rd -valued random variable
with the probability density function g(x, ·) = g(x, ·)/a(x).

The functions in Eqs. 6.98 and 6.100 can be estimated from n s independent
samples of Y(x) (Eq. 6.100) by

(6.101)

Example 6.45: Let u be the solution of the ordinary differential equation

u"(x)+A.u(x)=O, xED=(-1,1), A>O,

with the boundary conditions u(±1) = 0. The integral form of this equation is

where g is the Green function of the differential operator d 2 1dx 2 . This equation
is a simple version of the SchrOdinger equation, which can also be used to analyze
the stability of a simply supported beam subjected to an axial load.
Figure 6.14 shows the lowest eigenfunction and eigenvalue of the above
differential equation. The approximate eigenfunction calculated from Eq. 6.101
with i = 100 iterations and ns = 500 samples at each iteration is practically
indistinguishable from the exact solution uo(x) = cos(rr xj2). In fact, a very
good approximation of uo results in a few iterations. However, the estimate of A.o
in Eq. 6.99 becomes stable after a larger number of iterations, as demonstrated by
its variation with the number of iterations. <>
Note: The Green function of the differential operator d 2 j dx 2 is the solution of the differ-
ential equation g 11 (x, y) = 8(x- y), x, y ED= (-1, l), with the boundary conditions
6. 7. Problems 425

0.8
2.8
Estimate of A.0
0.6
/ A-0=2.4674
2.6
0.4
Exact and
approximate
u0(x)

-0.5 0 0.5 0 20 40 60 80 100

X Iteration number

Figure 6.14. Exact/estimated eigenfunction uo and the variation of the estimate


of .Ao with the number of iteration

g(±l, y) = 0, where the primes denote derivatives with respect to x and <'lOis the Dirac
delta function. Elementary calculations give

(1 - y)(l + x)/2, X:::;: y,


{
g(x, y) = (1 + y)(l- x)/2, X ~y.

The Green function g is positive, symmetric, and square integrable in D so that Eq. 6.101
can be applied. The density of Y(x) is g = gfa(x), where a(x) = J~ 1 g(x, y)dy =
(1- x 2 )j2.
The solution of u 11 +'Au = 0 is an even function since, if u (x) satisfies this equation,
so does u( -x). Hence, we can replaceD by D* = (0, 1) and solve u" +'Au = 0 in D* with
= =
the boundary conditions u (1) 0 and r/ (0) 0. The general solution of this equation,

u(x) =a cos(-v'Ax) + f3 sin(-v'A x), x E D*,

where a and f3 are some constants. Since the function u must satisfy the boundary condi-
tions and cannot be zero everywhere in D*, we have

uk(x) = cos(j):/; x), where Ak = (2k + 1) 2 (n /2) 2 .


The parameters Ak and the functions Uk> k = 0, 1, ... , are the eigenvalues and the eigen-
functions of u" + 'A u = 0 with the specified boundary conditions. &

6.7 Problems
6.1: Develop an algorithm for solving locally Eq. 6.21 for the case in which~ r
are random fields defined on the boundaries aD r of D.

6.2: Extend the algorithm in Problem 6.1 to solve locally Eq. 6.25 with p and~
random fields defined on D and aD r, respectively.
426 Chapter 6. Deterministic Systems and Input

6.3: Solve the SchrOdinger equation (1/2) u"(x) + q(x) u(x), 0 < x < l, with
the Dirichlet boundary conditions u(O) =a and u(O) = f3 for two cases: (1) q is
a continuous function and (2) q is a random field with continuous samples.

6.4: Show that the eigenvalues of Eq. 6.36 with d = 1 and D = (0, 1) must be
strictly positive.

6.5: Let g(IB(t)l) and g(hn,e(B(t))) be the processes given by Eq. 6.41 and
Eq. 6.42, respectively. Evaluate numerically the difference between these two
processes in a time interval [0, 1] for several values of nand s > 0, g(x) = x 2 .

6.6: Write a MATLAB function generating samples of (1/s) J~ 1(-s,sJ(B(s)) ds


for t :::: 0 and some s > 0. Calculate statistics of this process approximating the
local time process Lin Eq. 6.38 for decreasing values of s.

6.7: Complete the details of the proof given in the text for the second extension
of Tanaka's formula in Eq. 6.46.

6.8: Generalize Eq. 6.41 to find a formula for the increment g(X(t))- g(X(O)),
where g E C2 (JR) and X denotes an arbitrary diffusion process. Apply your re-
sult to two diffusion processes, an Omstein-Uhlenbeck process and a geometric
Brownian motion process.

6.9: Prove Eq. 6.51.

6.10: Apply the algorithm described in Fig. 6.10 to solve locally the partial dif-
ferential equation in Example 6.22.

6.11: Use the SWM to solve locally the Laplace equation Llu(x) = 0, x E
D = (- 2, 2) x ( -1, 1) with the Dirichlet boundary conditions u (x) = 1 for
x E ( -2, 2) x {-1} and u(x) = 0 on the other boundaries of D.

6.12: Repeat the analysis of the partial differential equation in Example 6.22 for
the same Dirichlet boundary conditions on (0, a) x {0} and (0, a) x {b} but different
Neumann boundary conditions. Use in your calculations au(x)jax 1 =a on {0} x
(0, b) and au(x)jaxl = {3 on {a} x (0, b), where a and f3 are some constants.

6.13: Show that the solution of Eq. 6.75 is u = (i - g) -I t/J = (l:~o g') t/J if
there exists y E (0, 1) such that II g x II< y II x II for all x E !Rm, where g 0 = i
denotes the identity matrix.

6.14: Apply Eq. 6.79 to solve locally the problem in Example 6.34 by using tran-
sition probability matrices other than p 1 and p 2 in this example. Comment on the
dependence of the solution accuracy on the transition probability matrix. Extend
your analysis to the case in which t/J in Eq. 6.75 is random.
6. 7. Problems 427

6.15: Apply Eq. 6.85 and 6.87 to find the dominant eigenvalue and eigenvector
for some real-valued symmetric matrices of your choice.

6.16: Use Eq. 6.95 to solve locally the differential equation in Example 6.44.
Consider for X (k + 1) I X (k) a different probability law than the one used in the
text.
Chapter 7

Deterministic Systems and


Stochastic Input

7.1 Introduction
This chapter examines stochastic problems defined by

V[X(x, t)] = Y(x, t), t 2: 0, xED C ~n, (7.1)

where n 2: 1 is an integer, x and t denote space and time parameters, respec-


tively, V can be an algebraic, differential, or integral operator with deterministic
coefficients that may or may not depend on time, Y is the random input, and X
denotes the output. It is common in applications to concentrate on values of X at
a finite number of points x k E D rather then all points of D. Systems described
by the evolution in time of X at a finite number of points and at all points in D
are called discrete and continuous, respectively. The focus of this chapter is on
discrete systems since they are used extensively in applications and are simpler to
analyze than continuous systems. The vector X(t) E ~d collecting the processes
X(xk. t), called the state vector, defines the evolution of a discrete system. The
mapping from Y to X can be with or without memory. An extensive discussion
on memoryless transformations of random processes can be found in [79] (Chap-
ters 3, 4, and 5) and is not presented here.
Our objective is to determine the probability law of X from the probability
law of Y, the defining equation of X (Eq. 7.1), and the initial and/or boundary
conditions for this equation. This objective cannot be achieved for all types of
operators and inputs in Eq. 7.1. Therefore, in some cases, we will derive only
second moment properties of X. The methods needed to characterize X depend on
the properties of the input, the definition of V, and the required output statistics.
For example, if V is a linear operator and Y is Gaussian, then X is a Gaussian
process so that the first two moments of this process define its probability law.

429
430 Chapter 7. Deterministic Systems and Stochastic Input

On the other hand, the second moment properties of X are insufficient to define
its probability law if Y is not Gaussian or if Y is Gaussian but V is a nonlinear
operator. Linear and nonlinear operators are examined separately. It is also shown
that some of the methods developed for properties of the solution of initial value
problems can be extended to a class of boundary value problems.

Table 7.1. System, input, solutions

EQ Brownian motion Semimartingale, Semimartingale,


DM SAM
• Meanlcorr. • Meanlcorr. • PDE for rp in
• Linear if ME L2 special cases
random vibration • PDE for rp eHOMif

~
(formal calculus) if A, H, and K are GWN/PWN input
deterministic
• HOM for special
cases
• HOM if drift/ • Meanlcorr. • HOM if
diffusion are if a, bare poly( X)
drift/diffusion
poly( X) • PDE for rp in are poly(X, S)
~ • PDE forrp special cases • PDE for rp in

~z
if drift/diffusion special cases
are poly( X)
0 • Fokker-Planck-
Kolmogorov eqs
• Approximations
• System:
dX(t) = a(X(t-), t) dt + b(X(t-), t) dS(t)
X(t) = a(X(t-), t) + b(X(t-), t) p(S(t-))
• Input:
dS(t) = dA(t) + H(t) dB(t) + K (t) C(t) = dA(t) + dM(t)

Table 7.1 lists the type of problem considered and results obtained in the
following two sections. The notations mean/corr, HOM, and PDE for rp mean
that differential equations are obtained for the mean/correlation, the higher order
moments, and the characteristic function rp of X. Qualifiers such as Z = solution
of an SDE with GWNIPWN or H = poly( B) mean that Z satisfies a stochastic
differential equation driven by Gaussian and/or Poisson white noise processes or
H is a polynomial of a Brownian motion B. The qualifiers DM and SAM in the
table heading refer to direct and state augmentation methods used in the analysis.
The defining equations for the state X and the driving noise S are given at the
bottom of the table.
The analysis is based on the Ito formula for semimartingales. We give this
7.1. Introduction 431

formula here for convenience. Let Y be an JRm -valued semimartingale and let
g : JRm -+ lR be a function with continuous second order partial derivatives. Then
g(Y) is a semimartingale and (Section 4.6.2)

If Y has continuous samples, Eq. 7.2 becomes

g(Y(t))- g(Y(O)) = t lot ~(Y(s))


i=l By;
dY;(s)

+- L loto --(Y(s))d[Y;,
1 d
2 ..
Bzg
By; By1
Yj](s). (7.3)
l,j= 1

If Y is a pure jump semimartingale, Eq. 7.2 becomes

g(Y(t))- g(Y(O)) = t lo+t ~(Y(s-))dY;(s)


By;

E:~
i=l

+ 0~, [c(Y(,))- g(Y(,-))- (Y('-)) t.Y;(')J. (7.4)

Note: If Y has continuous samples, then Y(s-) = Y(s), L'.Y;(s) = 0, and [Y;, Yj]c =
[Y;, Yj] so that Eq. 7.2 becomes Eq. 7.3. If Y is a quadratic pure jump semimartingale,
then [Y, Y]c = 0 (Section 4.5.2), and Eq. 7.2 becomes Eq. 7.4. The formula in Eq. 7.4
applies if, for example, Y is a compound Poisson process . .A.
432 Chapter 7. Deterministic Systems and Stochastic Input

7.2 Linear systems


For a discrete linear system, Eq. 7.1 is an initial value problem that can be
given in the form

X(t) = a(t) X(t) + b(t) Y(t), t ::_ 0, (7.5)

where the state X is an JE.d -valued stochastic process, the input Y takes values
in JE.d', a denotes a (d, d)-matrix, and b is a (d, d')-matrix. The solution of this
differential equation is

X(t) = 9(t, 0) X(O) +lot 9(t, s) b(s) Y(s) ds, (7.6)

where the Green function 9, also called the unit impulse response function in
dynamics, satisfies the equation ([175], Section 5.1.1)

a9(t,s)
- - - = a(t) 9(t, s), t ::_ s ::_ 0, (7.7)
at
and 9(s, s) is the identity matrix.

Example 7.1: Suppose that Eq. 7.1 is

X(t) +a(t)X(t) + f3(t)X(t) = Y(t), t ::_ 0,

and set xl = X and x2 = X. The JE..2 -valued process X = (Xl, X2) is the
solution of

d[X1(t)]
dt X2(t) =
[0-f3(t)
1-a(t) ][X1(t)]
X2(t) + [OJ
1 Y(t),

that is, it satisfies a differential equation of the type given by Eq. 7.5. <>
Our objective is to develop differential equations for moments and other
probabilistic properties of X. If Y is a Gaussian process, the probability law of
X is defined completely by its first two moments since Eq. 7.5 is linear. If Y is
not Gaussian, X is generally non-Gaussian so that moments of order higher than
two and/or other statistics are needed to characterize its probability law ([79],
Section 5.2).

7.2.1 Brownian motion input


If Yin Eq. 7.5 is a Gaussian white noise, the state X is a diffusion process
defined by the stochastic differential equation

dX(t) = a(t) X(t) dt + b(t) dB(t), t ::_ 0, (7.8)


7.2. Linear systems 433

where the coordinates of B E ~d' are independent Brownian motions. The solu-
tion ofthis differential equation is (Eq. 7.6)

X(t) = IJ(t, 0) X(O) +lot IJ(t, s) b(s) dB(s). (7.9)

In Eq. 7.8 we can assume without loss of generality that the input is white
noise. If the input to this equation is colored, it can be approximated by the output
Y of a linear filter driven by white noise ([30], Section 2.17). The augmented
vector Z = (X, Y) satisfies a differential equation of the same form as Eq. 7 .8.
Note: If the coefficients a and b ofEq. 7.8 satisfy the conditions in Section 4.7.1.1, then the
equation has a unique solution. If the functions(} and bare bounded, the integral in Eq. 7.9
with Brownian motion integrator is a a (B(s), 0 ::::; s ::::; f)-square integrable martingale. We
have also seen that X is a sernimartingale (Section 4.7.1.1) . .&

Example 7.2: Let X be the process in Example 7.1, where Y is the solution of
dY(t) = p(t) Y(t) dt + ~(t) dB(t), t ::: 0, and B denotes a Brownian motion.
The augmented state vector Z = (X 1, X2, Y) is the solution of

X1(t)]
d [ X2(t) =
[0-f3(t)
1
-a(t) 01 ] [ X2(t)
X 1 (t) ] dt +[ 00 ] dB(t)
Y(t) 0 0 p(t) Y(t) ~(t)

so that Z is a diffusion process (Eq. 7.8). <>

7.2.1.1 Mean and correlation equations


Our objective is to calculate the second moment properties of X, that is,
the mean function p,(t) = E[X(t)] and the correlation function r(t, s) =
E[X(t) X(s)T] or the covariance function c(t, s) = E[(X(t) - p,(t)) (X(s) -
p,(s))T] of this process for deterministic and random initial conditions. If X(O)
is random, it is assumed to be independent of the driving noise. The state X is a
Gaussian process if X (0) is a constant or a Gaussian vector. Otherwise, the proba-
bility of X can be calculated from the second moment properties of the conditional
Gaussian process X(t) I X(O) and the distribution of X(O).

The mean and correlation functions of X in Eq. 7.8 satisfy the differential
equations

jt(t) = a(t) p,(t), t ::: 0,


r(t, t) = a(t) r(t, t) + r(t, t) a(t)T + b(t) b(t)r, t::: 0, and
ar(t,s)
- - - = a(t) r(t, s), t > s::: 0, (7.10)
at

with p,(O) = E[X(O)] and r(O, 0) = E[X(O) X(O)r]. The covariance function
c of X satisfies the same differential equation as r.
434 Chapter 7. Deterministic Systems and Stochastic Input

Proof: The mean equation results by averaging Eq. 7.8.


The integral form of the Ito formula for continuous semimartingales (Section 4.6.2
and Eq. 7.3) applied to function X(t) r-+ Xp(t) Xq(t), p, q = 1, ... , d, gives

Xp(t) Xq(t)- Xp(O) Xq(O) = Ld lot (fipk Xq(s) + Xp(s) fiqk) dXk(s)
k=l 0

Xp(t) Xq(t)- Xp(O) Xq(O) =lot Xq(s) dXp(s) +lot Xp(s) dXq(s)
r
+ "21 Jo (d[Xp, Xq](s) + d[Xq, Xp](s)).
The average of the left side of the above equation is rpq (t, t) - rpq (0, 0) so that its deriva-
tive with respect tot is fpq(t, t). The average of the first term on the right side of this
equation is

E[lot Xq(s) dXp(s) J= E [lot Xq(s) ( ~>pu(s) Xu(s) ds + ~ hpv(s) dBv(s))]


= ~ E [lot apu(s) Xq(s) Xu(s) ds J+ ~ E [f hpv(s) Xq(s) dBv(s) J

= L t apu(s)rqu(s,s)ds,
ulo
so that its derivative with respect to time is Lu apu (t) rqu (t, t). In the same way, the
average of JJ Xp(s) dXq(s) can be obtained. The quadratic covariation process [Xp, Xq]
is (Section 4.5)

r
d' t
[Xp. Xq](t) =L hpv(s)bqv(s)ds,
v=l Jo
with time derivative I:_~~~ hpv(t)bqv(t). The expectation of the last term on the right
side of the Ito formula is 'L~~ 1 JJbpv(s)bqv(s)ds. Hence, the Ito formula gives by
expectation and differentiation

d'
fpq (t, t) =L apu(t) rqu(t, t) +L aqu (t) rpu (t, t) +L hpv (t) bqv(t).
u u v=l

The matrix form of this equation is Eq. 7.1 0.


The Ito formula applied to mapping X(t) r-+ Xp(t) in the time interval [s, t] gives

fort2:s.
7.2. Linear systems 435

Note that the above equation can also be obtained by using classical analysis. We have

Xp(t)Xq(s)- Xp(s)Xq(s) =[ Xq(s)dXp(u)

= L:lt
d

u=l
apu(u)Xu(u)Xq(s)du+ Ld' 1ts bpv(u)Xq(s)dBv(u)
v=l

by multiplying the above equation with Xq (s). The average of the above equation is

rpq(t,s)-rpq(s,s)= Ld it apu(u)ruq(u,s)du
u=l s

since

E[Xq(s)dBv(u)] = E {E[Xq(s)dBv(u) IFsJ} = E {Xq(s) E[dBv(u) IFsl} = 0


where Fs = u{B(u), 0:::: u :::: s} is the natural filtration of B. The differentiation of the
above equation with respect to t yields the last formula in Eq. 7.1 0.
The calculation of r in Eq. 7.10 involves two steps. First, the function r(t, t), t ;:::: 0,
needs to be obtained for the specified initial condition r(O, 0) = E[X(O) X(Ol]. Second,
the last differential equation in Eq. 7.10 has to be integrated fort;:::: sand known r(s, s).
That c satisfies the same differential equations as r results from the relationship

r(t, s) = c(t, s) + /L(t) /L(s)T


and the mean equation. For example, the second formula in Eq. 7.10 is

:t (c(t, t) + /L(t) /L(t)T) = a(t) (c(t, t) + /L(t) /L(t/)


+ (c(t, t) + /L(t) /L(t)T) a(t)T + b(t) b(t)T

showing that c and r are the solutions of the same differential equations since

:t (c(t, t) + /L(t) /L(t)T) = c(t, t) + jL(t) /L(t)T + /L(t) jL(t)T


and jL(t) = a(t) /L(t). •
It is useful for applications to consider a noise B b with increments

dBb(t) = JLb(t)dt + ~b(t)dB(t), (7.11)

where the (d', 1) and (d', d') matrices ILb and ~bare interpreted as the mean and
the intensity of the driving white noise, respectively. The coordinates of the noise
B b are correlated.
436 Chapter 7. Deterministic Systems and Stochastic Input

If X is the solution ofEq. 7.8 with B replaced by B bin Eq. 7.11, the mean and
correlation functions of X are given by

jJ,(t) = a(t) ~t(t) + b(t) ILb(t), t ::::: 0,


r(t, t) = a(t) r(t, t) + r(t, t) a(t)T + b(t) ~b(t) ~ b(t)T b(t)T
+ b(t) ILb(t) /L(t)T + /L(t) ILb(t)T b(t)T, t::::: 0,
ar(t,s) =a(t)r(t,s)+b(t)/Lb(t)/L(S)T, t >S ::::0, (7.12)
at

with /L(O) = E[X(O)] and r(O, 0) = E[X(O) X(O)T]. The last two formulas
hold with c in place of r and IL b = 0.

Proof: The mean equation results by averaging the defining equation of X. The difference
between the equations for X and JL gives

dX(t) = a(t) X (t) dt + b(t) t; b(t) dB(t),


where X = X - JL. The covariance function of X satisfies the last two equations in Eq. 7.10
with b i;b in place of b. These equations and the relationship between the correlation and
the covariance functions give the conditions satisfied by the correlation function of X. If
JLb = 0 and i;b = i is the identity matrix, Eq. 7.12 coincides with Eq. 7.10. •

Example 7.3: Let X lR be an Omstein-Uhlenbeck process defined by dX (t) =


E
-a X (t) dt + f3 dB(t), t
::::: 0, where a > 0 and f3 are constants. If X (0) E L 2 is
independent of B, the mean and correlation functions of X are

f.J.,(t) = E[X(O)]e-" 1 , t::::: 0,

r(t, t) = E[X(0) 2 ] e- 2 at + ;: (1- e- 2 " 1 ) , t::::: 0,

r(t, s) = r(s, s) e-a (t-s), t ::::: s ::::: 0.

The probability law of X can be obtained from the properties of the Gaussian
process X (t) I (X (0) = ~) whose first two moments are given by the above
equations with (~, ~ 2 ) in place of (E[X (0)], E[X (0) 2 ]).
The differential equations for the correlation function of X become

f(t, t) = -2a r(t, t) + {3 2 i;b(t) 2 + 2{3 Mb(t) f.J.,(t), t::::: 0,


ar(t, s)
-a-t- = -a r(t, s) + f3 Mb(t) Jl,(s), t ::::: s,

if dB(t) is replaced by dBb(t) = Mb(t) dt + /;b(t) dB(t) (Eq. 7.11 ). <>


Proof: The second moment properties of X satisfy the differential equations jl(t)
-a Jh(t), f(t, t) = -2a r(t, t) + {3 2 , and ar(t, s)jat = -a r(t, s) (Eq. 7.10). To solve
these equations, the mean and variance of X (0) need to be specified. The differential equa-
tions for the mean and correlation functions of X driven by the white noise Bb are given
by Eq. 7.12 . •
7.2. Linear systems 437

7.2.1.2 Linear random vibration


One of the main objectives of linear random vibration studies is the determi-
nation of the second moment properties of the state X of a linear dynamic system
driven by white noise. The state X is the solution of Eq. 7.5 withY = W, where
W is a white noise process specified by

E[W(t)] = ILw(t),
E[(W(t) - ILw(t))(W(s)- /Lw(s))T] = qw(t) 8(t- s). (7.13)

The functions ILw(t) and qw(t) denote the mean and the intensity of W at time
t, respectively, and 8 (·) is the Dirac delta function. It is assumed that there is no
correlation between the noise and the initial state X(O).
The classical mean and correlation/covariance equations in linear random
vibration coincide with Eq. 7.12 for ILb = ILw and ~b ~b = qw. However, the
derivation of these equations in linear random vibration is based on a heuristic
definition of white noise (Eq. 7 .13) and on formal calculations ([175], Chapter 5).

Example 7.4: Let X be the solution of the differential equation

X(t) =-a X(t) + {J W(t), t =::: 0, (7.14)

where X (0) = 0, a > 0 and {J are constants, and W is a white noise with J-t w =
0 and qw = 1 (Eq. 7.13). Denote the solutions of Eq. 7.14 by X G and Xp if
W(t) is a Gaussian white noise Wa(t) = dB(t)fdt and a Poisson white noise
Wp(t) = dC(t)/dt, respectively. Suppose that Wa and Wp have the same first
two moments as W so that X G and X p are equal in the second moment sense. The
classical linear random vibration can only deliver the second moment properties
of the state so that it cannot distinguish between the processes X G and X p. This
is a severe limitation since the samples of these processes can differ significantly.
Figure 7.1 shows a sample of X G and three samples of X p for C(t) = 'Lz;,?
Yk,
where N is a Poisson process with intensity A. = 5, 10, and 100, and Yk are
independent copies of a Gaussian variable Yt with mean zero and variance E[Yf]
such that A. E[Yf] = 1. The qualitative difference between the samples of X p and
X G decreases as A. increases. ¢
Note: Recall that the mean and correlation functions of Care E[C(t)] =), E[Yt] = 0 and
E[C(t) C(s)] = J... E[Yf] min(t, s) (Section 3.12). Formal calculations of the type used in
the classical linear random vibration show that the mean and the covariance functions of
the Poisson white noise Wp(t) = dC(t)/dt are zero and J...E[Yf]8(t- s), respectively.
H we set J... E[Yf] = 1, the white noise processes Wa and Wp are equal in the second
moment sense. &

In some applications it is of interest to ( 1) establish conditions under which


X becomes a weakly stationary process X s and (2) determine the second moment
438 Chapter 7. Deterministic Systems and Stochastic Input

1.5,-----~-------, 2.----~-------,

1 xp 0.=5) 1 .5 xp (A.=IO)
0.5

0 0.5

-1
-2L--~-~----~-~ -1.5L--""--'-~----~-~
0 2 4 6 8 10 0 2 4 6 8 10

3,----~-------,

X (1.=100)
2 p

-1

-2L--~-~----~-~
0 2 4 10 2 4 6 10
6 time 8 time 8

Figure 7.1. A sample of X G and three samples of X p for A = 5, 10, and 100 such
that A E[Yf] = 1

properties of X s. Let p, s, r s, and c s denote the mean, correlation, and covariance


functions of X s provided they exist.

If (1) the coefficients of a linear system are time-invariant and the driving noise
is weakly stationary, that is, a(t) =a, b(t) = b, Jlw(t) = Jlw, and qw(t) = qw,
and (2) p,(t) and r(t, t) in Eq. 7.12 become time-invariant as t--+ oo, then X s
exists and its mean and covariance functions are given by

a lls + b llw = 0,
acs+csaT +hqwbT =0,
Cs('r) =a Cs(T), T :::: 0, (7.15)

where Cs and Cs(T) are shorthand notations for Cs (t, t) and Cs (t + r, t), respec-
tively.

Proof: If X is weakly stationary, the above equations follow from Eq. 7.12 since JL(t) and
c(t, t) do not vary in time so that their time derivatives are zero and c(t, s) depends on only
the time lag r = t - s rather than the times t and s.
The differential equations for JL(t) and the entries Cij(t, t), i 2: j, of c(t, t) have
the form ~(t) = a ~(t) + ~. t 2: 0, where a and~ are constant matrices. The matrix a
7.2. Linear systems 439

is equal to a for /L(t). Hence, /Ls = limt-+oo /L(t) exists and is finite if the eigenvalues
of a have negative real part. Similarly, Cs = limr-+oo c(t, t) exists and is finite if the
eigenvalues of the corresponding matrix Cl! have negative real part ([166], Section 4.1.3,
[175], Section 5.1.1).
Suppose that u has distinct eigenvalues >..i with negative real part and let ~s =
Li di Vi be the stationary solution of~ (t) = Cl! ~ (t) +~.where Vi denote the eigenvectors
ofu. Then~(t) = Li (ci eJ...; 1 + di) Vj, where Ci are some constants. If ~(0) =Lid[ Vj,
then W) = Li ( (d[- di) eJ...; + di) Vj. If ~(0) = ~s• then d[- di = 0 for all i's so that
1

= Li di vi = ~s• as stated. •
the general solution is ~(t)

Example 7.5: Consider a linear dynamic system with n degrees of freedom, mass
matrix m, damping matrix~, and stiffness matrix k that is subjected to a random
forcing function Y. Let X be an !Rn -valued process collecting the system displace-
ments at its degrees offreedom. This process is defined by ([175], Section 5.2.1.1)

m X(t) + ~ X(t) + k X(t) = Y(t), t 2:: 0. (7.16)

Let Vi and t/Ji, i = 1, ... , n, be the eigenvalues and eigenvectors of the eigenvalue
problem v2 m t/J = k t/J. Suppose that (1) the damping matrix is~ =am+ {3 k,
where a, {3 are some constants, and (2) the eigenvalues v are distinct. Then Xl
has the representation
n
X(t) = Lt/Ji Qj(t), t 2::0, (7.17)
i=l

where

the columns of the (n, n) matrix ci» are the eigenvectors t/Ji, mi are the non-zero
s
by ( .T .
entries of the diagonal matrix ci» T m ci», and the modal damping ratios i are given
~ )ii = 2 Si Vj mi. Any system response can be represented in the form

n
X(t) = LPt/Ji Q;(t), t 2::0, (7.19)
i=l

where p is a matrix depending on the response considered. For example, the


relative displacement between the first two degrees of freedom of the system is
X(t) = '£7= 1 (c/Ji,2 - tPi,l) Qi(t) so that pis a (1, n) matrix with Pll = -1,
P12 = 1, and Pli = 0 fori = 3, ... , n.
The definitions of X and X in Eqs. 7.17 and 7.19 show that the second
moment properties of any response depend only on the system characteristics and
the second moment properties of the modal coordinates Q;, i = 1, ... , n. ~
440 Chapter 7. Deterministic Systems and Stochastic Input

Note: The matrices 4»T m 4» and 4»T k 4» are diagonal by the definition of the eigenvalue
problem. Because s= s
am+ f3 k, the matrix 4»T 4» is also diagonal. The parameters iii;,
~i, and k;=Vf /iii; are called modal mass, damping ratio, and stiffness, respectively.
s
If the matrix 4»T 4» is not diagonal, a similar approach can be used for the state
vector (X, X) rather then X. In this case, the system eigenvalues are complex-valued,
and a right and a left complex-valued eigenvector corresponds to each eigenvalue ([127],
Section 3.7). ~

Example 7.6: Consider a special case of the previous example withY replaced
by -m 1 W(t), where 1 denotes an (n, 1) matrix with unit entries and W is a scalar
white noise with mean zero and intensity qw(t) = qw = rr go. This case corre-
sponds to a building with n-degrees of freedom subjected to seismic ground ac-
celeration modeled as a white noise with a one-sided spectral density of intensity
go > 0. The stationary displacement vector X (t) has mean zero and covariance
• y q •T, where y q denotes the covariance matrix of the n-dimensional vector
Q(t) = (Q1 (t), ... , Qn(t)). The entries (i, i) of • 'Yq are •T
i = 1, ... ,n, (7.20)

where r; = <•m 1); jm; are called modal participation factors. The correlation
coefficients between the modal coordinates are

PQ;,Qj =8'A/):/J,{).. J.t+1H}/x.


PQ.;,Qj = PQ;,Qj ('A+ J.t)/('A J.t + 1),
PQ· Q..
l' J
= -pQ·. Q.
l' 1
'A- 4'A /):/1, (1- 'A 2 HJ/X. (7.21)

where x = 4~? 'A ('AJ.t + 1) ('A+ J.t) + (1- 'A 2 ) 2 , 'A= v;jv1, and f..t = ~;/~J·
The above results can be used to calculate the second moment properties of the
response of a linear system subjected to seismic ground acceleration and develop
approximations for response maxima ([175], Example 5.10, p. 190). <>

Proof: An arbitrary pair of modal coordinates, Q; and Q1 , i "I= j, satisfy the differential
equations

1 0 0
!!._ [ Q;(t)
Q;(t)
]
= -vl
[ 0 -2~; v? 0 0 Q; (t) ]
Q;(t)
] [ Qj(t) +[ 0
-r;
]
W(t).
dt Qj(t) 0 0 0 1 0
Qj(f) 0 0 -v~
J
-2~j v] Q1 (t) -ri

The above equation defines the evolution of the state vector ( Q; , Q; , Q1 , Q1). The second
moment properties given by Eqs. 7.20 and 7.21 result from Eq. 7.15 applied to this state
vector. •
7.2. Linear systems 441

Example 7.7: Let X be the solution of Eq. 7.5 with constant coefficients, that is,
a(t) = a and b(t) = b so that the Green function (} in Eq. 7.7 depends only
on the time lag. Suppose that (1) the system is causal, that is, 9(r) = 0 for
r < 0, (2) Y is a weakly stationary process with mean zero, correlation function
ry, and spectral density sy, and (3) Eq. 7.5 admits a weakly stationary solution.
The spectral density of X is

s(v) = h(v)* sy(v) h(v)T, (7.22)

where z* denotes the complex conjugate of z E C and

h(v) = ~ 9(u) b e-R vu du (7.23)

is the system transfer function. <>

L: L:
Proof: We have

X(t) = fJ(t- a) b Y(a) da = fJ(u) b Y(t- u) du,

since the system is causal and time invariant. The starting time -oo is needed for stationary
response ([175], Section 5.2.2). The second equality in the above equation follows from
the change of variable u = t - a. Because Y has mean zero, the expectation of X is zero
at all times. If the mean of Y is a constant ILy f=. 0, then (Section 3.9.3.3)

E[X(t)] =E [L: fJ(u) b Y(t- u) du J= L: fJ(u) b JLy du


is a constant.

L: L:
The correlation function of X,

L: L:
E[X(t) X(s)T] = fJ(u) b E[Y(t- u) Y(s- v)T] bT fJ(v)T du dv

= fJ(u) b ry(t- s- u + v) bT fJ(v)T du dv = r(t- s),


depends on the time lag r = t - s rather than the times t and s, so that this process is
weakly stationary. The Fourier transform of the above equation is (Section 3. 7.2.4)

L: L:
s(v) = _1_ { e - A r v r(r) dr = _1_ { e<-A (r-u+v)+(u-v)) v r(r) dr
2n}~ 2n}~

= e-R (u-v) v fJ(u) b sy(v) bT fJ(v)T du dv

= h(v) sy (v) h(v)T,

where Sy (v) = (1/(2 n)) J~ e-R (r-u+v) v ry(r - u + v) dr for each (u, v). •
442 Chapter 7. Deterministic Systems and Stochastic Input

Example 7.8: Let X(t), t:::: 0, be the solution ofEq. 7.5 with a(t) =a, b(t) =
b, X(O) = x, and a Gaussian white noise input Y = W with second moment
properties in Eq. 7.13. Suppose that Eq. 7.5 has a stationary solution X s and let
ILs and Cs(·) denote its mean and covariance functions. The process

X(t) = Xs(t) I (Xs(O) = x), t 2: 0, (7.24)

is a version of X(t), t :::: 0, defined by Eq. 7.5 with X(O) = x. The mean and
covariance functions of X are

{L(t) = ILs + 9(t) (x - /Ls),


c(t, s) = fJ(t- s) Cs(O)- fJ(t) Cs(O) fJ(s)T, t 2: s, (7.25)

where (J is the Green function defined by Eq. 7. 7. <:>

Proof: That X is Gaussian with the first two moments in Eq. 7.25 follows from properties
of Gaussian vectors (Section 2.11.5). Because X and X are Gaussian processes, they are
versions if they have the same second moment properties. The. stationary solution Xs
satisfies the differential equation Xs (t) =a Xs (t) +bY (t) so that X(t) =a X(t) +b Y(t),
t ::: 0, results from Eq. 7.5 by conditioning with respect to X s (0) = x, where Y(f) = Y (t) I
Xs(O) and X(O) = x. Because Y is a white noise process, Y(t) = W(t) = W(t) so that X
and X satisfy the same differential equation and initial condition.
The extension to a colored stationary Gaussian input Y is discussed in [76] for the
case in which Y is defined by some of the coordinates of the state of a linear filter driven
by Gaussian white noise. •

Example 7.9: Let X be the solution of X(t) = -a X (t) + Y (t), t :::: 0, where
X (0) = x, Y is the stationary solution of Y(t) = - f3 Y (t) + W (t), W is a station-
ary Gaussian white noise with mean f-tw and intensity qw, a =f. {3, and a, f3 > 0.
The mean and variance functions of X (t), t 2: 0, are

t-t(t)=xe-at+t-ti (1-e-at),
qw [l-e-2at _ 2 e-(a+{J)t_e-2at]
f3 (7.26)
c(t, t) = 2{3 (a+ {3) a a-

where M = t-twf(a {3). <:>

Proof: The augmented state vector (X, Y) E ~2 satisfies the differential equation

!!_ [
dt
X (t)
Y(t)
J = [ 0-a 1
-fJ X(t) J+ [ 01 JW(t).
J[ Y(t)
This system has a stationary solution (Xs, Ys) since it has constant coefficients satisfying
the conditions in Eq. 7.15. The first two moments of (Xs, Ys) are MI = E[Xs(t)] =
Mwf(a fJ), M2 = E[Ys(t)] = 1-Lw/fJ, Y!,! = E[Xs(t) ] = qw/[2a fJ (rx + fJ)], Y!,2 =
- 2
7.2. Linear systems 443

Y2,1 = E[Xs(t) Ys(t)] = qw/[2{3 (a+ {3)], and Y2,2 = E[l\(t) 2 ] = qw/2{3, where
Z = Z- E[Z] denotes the centered variable Z. The Green function is

-ar
8(-r:) =[ ~

so that the mean and covariance functions of (X(t), Y(t)) = (Xs(t), Ys(f)) I (Xs(O) =
x, Ys(O) = y) are (Eq. 7.25)

'
~1 t
( )
= ~I + (x - ) -a t
~1 e + Ya -_ ~2
f3

fl2Ct) = ~2 + (y- ~2) e-!3 1 ,


' qw { I -at [ 1 -at 1 e-!3 t - e-at]
CJ, 1 (t,t)=2f3 a(a+f3)-e a(a+f3)e +a+f3 a-{3

e-!3 t _ e-at [ 1 _ e-!3 t _ e-at]}


_ --eat+ ,
a-{3 a+f3 a-{3

i\2Ct, t) = 22,1(t, t)
= qw { _1_ _ _1_ e-(a+f3) t _ _I_ e-!3 t (e-!3 t _e-at)},
2{3 a+f3 a+f3 a-{3

22, 2 (t,t)= ~; (1-e- 2 /3 1).

where the subscripts 1 and 2 refer to X andY, respectively.


If Ys(O) is a random variable with mean E[Y:~(O)] = ~2· we have ~(t) = E[Xs(t) I
Xs(O) = x] since the first two terms in the expression for flJ(f) = E[Xs(t) I Xs(O) =
x, Ys (0) = y] do not depend on Ys (0) and the expectation with respect to Ys (0) of the last
term in the expression for {l 1 (t) is zero. Similar considerations can be used to calculate the
variance of Xs(t) I Xs(O) = x [76]. •

Example 7.10: Let t E (0, l) in Eq. 7.5 be a space parameter and let Y = W
be a white noise with the second moment properties in Eq. 7.13. The solution of
this equation satisfies the homogeneous boundary conditions 2:7= 1 a pi X; (0) = 0
and 2:7= 1 {3qi X;(l) = 0, where p = 1, ... , d1, q = 1, ... , d- d1, 0 :S d1 :S d,
and (ap;. {3q;) are some constants. Generally, one or more coordinates of X(O)
are not known so that Eqs. 7.10 and 7.12 cannot be solved. This is the essential
difference between the initial value problem considered previously in this chapter
and the boundary value problem in this example.
444 Chapter 7. Deterministic Systems and Stochastic Input

The mean and covariance function of X are [83]


iL(t) = a(t) p,(t) + b(t) ILw(t),
C(t, t) = a(t) c(t, t) + C(t, t) a(t)T + b(t) q w (t) b(tl
+ fJ(t, 0) Yw,o(t) b(t)T + b(t) Y w,oCtl fJ(t, Ol,
c(t, s) = fJ(t, s) c(s, s)
+[it fJ(t, u)b(u) Yw,o(u)T du] fJ(s, O)T, 0 < s < t < l, (7.27)

where Yw,o(t) = E [X(O) W(tf] and() denotes the Green function in Eq. 7.7.
Initial conditions for these equations are given by the second moment properties
of X(O) = ~ Z, where~ is a (d, d) constant matrix depending on fJ(l, 0) and the
()
coefficients (a pi, /3qi) in the boundary conditions and Z = J~ (l, s) b(s) W (s) ds
is an JRd -valued random variable with mean and covariance matrices

P,z =lot fJ(l, s) b(s) ILw(s) ds and

yz =lot fJ(l, s) b(s) qw(s) b(s)T fJ(l, s l ds.

The above covariance equations differ in two ways from the covariance
equation used in linear random vibration. The equations for c(t, t) and c(t, s)
include new terms and c(O, 0) is not generally available but can be determined
from the relationship between X (0) and X (l). ¢
Proof: To derive Eq. 7.27, we use the formal definition of white noise in Eq. 7.13. The
solution ofEq. 7.8 att = 1,

X(1) = 8(1, 0) X(O) + f 8(1, s) b(s) W(s) ds = 8(1, 0) X(O) + Z,


and the boundary conditions give X(O) = ~ Z. To see that X(O) = ~ Z holds, consider the
linear system of equations defined by the above equation and the boundary conditions for
the coordinates of X(O) and X(1), which are assumed to be homogeneous.
The equation for the mean of X results by averaging and differentiation. The cen-
tered state vector is

X(t) = X(t)- /L(t) = 8(t, 0) X(O) + f 8(t, s) b(s) W(s) ds

so that
c(t, t) = E[X(t) X(t)T]
= 8(t, 0) c(O, 0) 8(t, O)T + 8(t, 0) t Yw '0 (s) b(s)T 8(t, s)T ds
lo
+[lot 8(t, s) b(s) Yw,o(s)T ds J 8(t, O)T
+lot ds lot du 8(t, s) b(s) E W(u)T] b(u)T 8(t, u)T,
[W(s)
7.2. Linear systems 445

where Yw,o(t) = E [X(O) W(t)T]. The above double integral degenerates into
lot O(t, s) b(s) qw(s) b(s)T O(t, sl ds

by the definition of the driving noise. The derivative of the covariance function of X (t) is

i:(t, t) = iJ(t, 0) c(O, 0) O(t, O)T + O(t, 0) c(O, 0) iJ(t, O)T


+ iJ(t, 0) ( Yw 0 (s) b(s)T O(t, s)T ds + O(t, 0) [rw 0 (t) b(t)T O(t, t)T
lo ' '
+ f Yw,o(s)b(s)T iJ(t,s)T ds] + [f O(t,s)b(s)Yw,o(s)T ds] iJ(t,Ol

+ [o(t, t) b(t) Yw,o(tl +lot iJ(t, s) b(s) Yw,oCsl ds] O(t, ol


+lot iJ(t,s)b(s)qw(s)b(s)TO(t,s)T ds+ lot O(t,s)b(s)qw(s)b(sliJ(t,sl ds

+ O(t, t) b(t) qw(t) b(t)T O(t, t)T,


where iJ(t, s) denotes aO(t, s)jat. The differentiation rule fr JJ
g(t, s) ds = g(t, t) +
JJ ft g(t, s) ds was used in the above calculations. The second formula in Eq. 7.27 follows
from the above equation, the definition of the Green function 0, and the expression of
c(t,t).
The last formula in Eq. 7.27 follows from

c(t, s) =E [ X(t) X(sl] = E [o(t, s) X(s) +[ O(t, u) b(u) W(u) du] X(s)T

= O(t, s) c(s, s) + 1t O(t, u) b(u) E [ W(u) X(s)T] du, t 2::. s,

since

E [ W(u) X(s)T] =E [ W(u) X(O)TJ O(s, O)T

+losE [ W(u) W(v)T] b(vl O(s, v)T du,

and E [ W(u) W(v)T] = 0 for u f. v.


The mean and covariance equations in Eq. 7.27 can also be solved as initial value
problems for the second moment properties, fL(O) = E[X(O)] = ~ fLz and c(O, 0) =
E [(X(O)- fL(O))(X(O)- fL(O))T] = ~ Yz ~T . •

Example 7.11: Let V, cl>, M, and Q be the displacement, the rotation, the bend-
ing moment, and the shear force in a beam of length l fixed at the left end and
simply supported at the right end. The state vector X = (V, cl>, M, Q) satisfies
the differential equations V(t) = cl>(t), <i>(t) = -a M(t), M(t) = Q(t), and
Q(t) = -W(t), where 1/a is the beam stiffness and W is a distributed white
noise load with mean /Lw and intensity qw.
446 Chapter 7. Deterministic Systems and Stochastic Input

The mean functions of the coordinates of X are (Eq. 7.27)

414 5312
f.Ll(t) = -aJLwl (-2l + 48~ - 16~ ),
3 1 3 5 2 1
JL2(t) =-a f.Lw l (-6~ + 16 ~ - 8),

2 1 2 5 1
f.L3(t) = f.Lwl (-2~ + 8~- 8),
5
f.L4(t) = f.Lw l (-~ + 8),
where~ = tIl E (0, 1). Figure 7.2 shows the functions Ji, 1 = f.Ll/(a f.Lw 14 ) and
JL2 = f.L2/(a f.Lw 13), JL3 = f.L3/(JLw 12), and JL4 = f.L4/(JLw l).

0.8

-0.025'---~-~-~~---' -0.4L.__~-~-~~---'
0 0.2 0.4 til 0.6 0.8 0 0.2 0.4 0.6 0.8
til

Figure 7.2. Mean function of X

The differential equations for the variance functions of X are (Eq. 7 .27)

c1,1 (t, t) = 2ct,2(t, t),


c1,2(t,t) =c2,2(t,t) -ac1,3(t,t),
c1,3(t, t) = c2,3(t, t) + CI,4(t, t),
c1,4(t, t) = c2,4(t, t) + ~ t2 E [ M(O) W(t)] + ~ t3 E [ Q(O) W(t)],

C2,2(f, t) = -2ac2,3(t, t),


c2,3(t, t) = -a c3,3(t, t) + c2,4(t, t),

c2,4(t, t) = -a c3,4(t, t) +atE [ M(O) W(t)] + ~ t 2 E [ Q(O) W(t)],

c3,3(t, t) = 2c3,4(t, t),

C3,4(f, t) = C4,4(f, t)- E [ M(O) W(t)] - t E [ Q(O) W(t)]'

C4,4(f, t) = qw - 2 E [ Q(O) W(t)] .


7.2. Linear systems 447

Figure 7.3 shows the scaled variance functions

X 10-4
6 0.6

4 0.4
L'44

2 0.2

0
\ 'C'l,l
-2 -0.2
0 0.5 til 0 til

Figure 7.3. Variance function of X

ct,l (t, t) = ct,l (t, t)j(cx 2 Yw 17 ), c2,2(t) = c2,2(t)j(cx 2 Yw 15)},


C3,3(f) = C3,3(t)/(Yw 13), and C4,4(f) = C4,4(t)/(Yw l)}.
The variance functions are consistent with the boundary conditions. For example,
=
ct,l (0, 0) ct,l (l, l) =
0 since the displacement is zero at the beam ends.~
Proof: The state vector X satisfies Eq. 7.5 withY = Win Eq. 7.13, and the boundary
conditions V(O) = 0, 4>(0) = 0, V(l) = 0, and M(l) = 0. The non-zero entries of
matrices a(t) and b(t) in Eq. 7.5 are a12 = 1, a23 = -a, a14 = 1, and b4 = -1. The
non-zero entries of the Green function 8 (t, 0) are On = 1, 012 = t, 013 = -a t 2/2,
014 = -a t 3/6, 022 = 1, 023 = -at, 024 = -a t 2 /2,033 = 1, 034 = t, and 044 = 1 so
that the coordinates of X(O) and X(l) are related by [83]

al al a 2 3 !ol
V(l) = V(O) + l <I>(O)-- M(O)-- Q(O) +- (l- s) 3 W(s) ds,
2 6 6 0
2
<I>(l) = <I>(O) -a l M(O) - -al Q(O) + -a!ol (l- s) 2 W(s) ds,

t
2 2 0

M(l) = M(O) + l Q(O)- (l- s) W(s) ds,

Q(l) = Q(O)- fo 1 W(s) ds.


The last two equalities constitute global equilibrium conditions for the beam. The first and
third equalities together with the boundary conditions V (0) = 0, <I> (0) = 0, V (l) = 0, and
M(l) = 0 can be used to find the unknown entries in X(O). The bending moment and the
shearforceattheleftendofthe beamareM(O) = i<3
Yt- Y2) and Q(O) = ff(Y2- Yt).
where Yt = 3 ~ 2 JJ(l- s) W(s)ds and Y2 = JJ(l- s) W(s)ds so that c3,3(0,0) =
3
2qwl 3 j105, c3,4(0,0) = -3qwl 2 j35, and c4,4(0,0) = 11qwlf35. The covariances
448 Chapter 7. Deterministic Systems and Stochastic Input

= 1, 2 and j = 1, ... , 4 because V (0) = 0 and <I> (0)


Ci,j (0, 0) are zero for i = 0. The
expressions of M(O) and Q(O) can also be used to calculate the expectations

E [ M(O) W(t)] = q; l (1- ~) [(1- ~) 2 - 1].


E [ Q(O) W(t)] = 3 ;w (1 - ~) [ 1 - ~ (1 - ~)2 J,
giving the non-zero entries in Yw,o(t). •

Example 7.12: Let X be the solution of

X(t) + fJ X(t) + v5 X(t)- /_: k(t- s) X(s) ds = Y(t),


where {J, vo > 0 are some constants, the kernel k is such that k(u) = 0 for u < 0
and f~oo lk(u)l du < oo, andY denotes a weakly stationary process with mean
zero and spectral density sy. If the above equation admits a weakly stationary
solution X 8 , then this solution has mean zero and spectral density

sy(v)
Sx(v) = 2,
j-v 2 + J=T fJ v + v5- h(v)j
where h(v) = f~oo k(u) e - R vu du. The process X describes the torsional mo-
tion of an airplane wing in a turbulent wind andY models the effect of the buffet-
ing forces on the wing. <>
Proof: We first note that the moment equations developed in this section cannot be applied
to find the second moment properties of X defined by the above equation.
Let X 8 (t) = f~oo eH vt dZ(v) be the spectral representation of X8 , where Z is a
process with E[dZ(v)] = 0 and E[dZ(v) dZ(v')*] = sx(v) 8(v- v 1 ) dv (Section 3.9.4.1).
The second moment of the increments of Z depend on the spectral density sx of X, which
is not known. The representation of Xs gives

£: = £: [£: J
L:
k(t- a) X 8 (a)da k(u)e-H vu du eH vt dZ(v)

= h(v)eHvt dZ(v)

by the change of variables u = t - a and the definition of h. Also, h(t) is bounded for
each t ~ 0 since lh(v)l ::; f~oo lk(u)l du, which is finite. The defining equation of X and

L: [
the above equality give

-v 2 + H fJ v + v6- h(v)] eH vt dZ(v) = Y(t).

Let X denote the left side of the above equation. Then, X and Y must have the same
second moment properties. The process X has mean zero consistently with Y. The second
7.2. Linear systems 449

moments of X and Y are

= L: l-v + n + v5- h(vf Sx(V) dv,

L:
E[X(t) 2 ] 2 ,8 v

E[Y(t) 2] = sy(v) dv,

so that
L: [l-v 2 + n ,8 v + v5- h(v)l 2
Sx(v)- Sy(v)] dv = 0,
which gives the stated expression of Sx. •

7.2.2 Semimartingale input


Let S be an ffi.d' -valued semimartingale defined on a filtered probability
space (Q, F, (F1) 1:c:o, P). Denote by X the solution of Eq. 7.5 withY being the
formal derivative of S with respect to time or a polynomial of S, that is,

dX(t) = a(t) X(t-) dt + b(t) dS(t), t 2: 0, (7.28)


X(t) =a(t)X(t-)+b(t)p(S(t-),t), t 2:0, (7.29)

where the coordinates of the ffi.dp -valued process p are polynomials of S with
coefficients that may depend on time.

The semimartingale noise S is represented by

S(t) = S(O) + A(t) + M(t), t 2: 0, (7.30)

where S(O) E Fo, A is an F 1 -adapted process with cadlag samples of finite


variation on each compact of [0, oo), A(O) = 0, M is an F 1-local martingale,
and M(O) = 0.

Note: We have seen that a real-valued process S defined on a probability space (r.l, :F, P)
endowed with a right continuous filtration (:F1) 1:c:o is a semimartingale if and only if it is a
classical semimartingale, that is, it has the representation

S(t) = S(O) + A(t) + M(t), t :::: 0, (7.31)

where S(O), A, and M have the properties in Eq. 7.30 (Section 4.4.1 in this book, [147],
Theorem 14, p. 105).
The matrices bin Eqs. 7.28 and 7.29 have the dimensions (d, d1) and (d, dp). re-
spectively. We use the same notation for simplicity. It is assumed that a and b are such that
the solutions of Eqs. 7.28 and 7.29 exist and are unique (Sections 4.7.1.1 and 4.7.2). •
450 Chapter 7. Deterministic Systems and Stochastic Input

The martingale component of Sin Eq. 7.30 is assumed to admit the representa-
tion
dM(t) = H(t) dB(t) + K(t)dC(t), (7.32)
where the entries of the (d', db) and (d', de) matrices Hand K are processes in
£, the coordinates of B and C are independent Brownian motion and compen-
sated compound Poisson processes, respectively, the jumps of the coordinates of
C have finite mean, B and C are independent of each other, H is independent
of (K, C), and K is independent of (H, B).

Note: Recall that the processes in£ are .Ft-adapted and have caglad paths (Section 4.4.1).
The stochastic integrals defined in Section 4.4.3 are for integrands in £ and semimartingale
integrators. We also note that the processes defined by Eqs. 7.30 and 7.32 do not include
all semimartingales. For example, a general Levy process (Section 3.14.2) does not admit
the representation in these equations.
Since M(O) = 0, we have (Eq. 7.32)

M(t) = t
lo+
H(s)dB(s) + t
lo+
K(s)dC(s).

The integrals JJ+ H;u (s) dBu (s) and JJ+


K;v (s) dCv(s), i = 1, ... , d 1 , u = 1, ... , db,
and v = 1, ... , de, in the above definition of M are local martingales by a preserva-
tion property of the stochastic integral (Section 4.4.4). Hence, the coordinates M;, i
1, ... , d 1 , of M are local martingales as sums of local martingales.
If in addition C E L2 and the conditions

E [Ia~ H;u Cd ds J < oo and

E [Ia~ K;v (s) 2 d[C, C](s)] < oo (7.33)

are satisfied at each timet :::: 0 for all indices i = 1, ... , d 1 , u 1, ... , db, and v =
1, ... , de, then M;, i = 1, ... , d 1 , are square integrable martingales. The second moment
of M;(t) = Lu H;u · Bu(t) + Lv K;v -
· Cv(t) is E[M;(t) ] 2 = Lu E [(H;u · Bu(t)) 2] +
E [ ( K;v · Cv (t)) 2] by the properties of H, K, B, and C. We have seen in Section 4.5.3

that£ [ (JJ+ H;u(s)dBu(s)) 2 ] = E [JJ+ H;u(s) 2 ds Jand£ [ (!J+ K;v(s)dCv(s)) 2] =

E [JJ+ K;v(s) 2 d[Cv, Cv](s)J. Hence, M; is a square integrable martingale under the
above conditions.
Let fh, t :::: 0, be the filtration generated by B and C. Generally, 9t is smaller
than the filtration .F1 considered at the beginning of this section. If the processes H and
K are 9 1-adapted and have caglad paths, then M is an 9t-local martingale. If in addition
the process A in the representation of S (Eq. 7.30) is also 9t-adapted, then S is an 9t-
semimartingale. For example, A can be a deterministic function with continuous samples of
finite variation on compacts and the processes H and K can be memoryless transformations
of B and C_, respectively. A
7.2. Linear systems 451

Example 7.13: Let B be a Brownian motion and C = r.:~? Yk be a compound


Poisson process that is independent of B, where N denotes a Poisson process
with intensity A. > 0 and Yk are iid random variables such that E[Yl] = 0. Then
the process M defined by dM(t) = 3 (B(t) 2 - t) dB(t) + 2 C(t-) dC(t) with
B(O) = C(O) = 0 is a martingale. If E[Yf] < oo, M is a square integrable
martingale. ~
Proof: Let gt = a(B(s), C(s), 0 ::; s ::; t), t ~ 0, be the filtration generated by
B and C. That M is a Qt·local martingale follows from a preservation property of the
stochastic integral since 3 (B(t) 2 - t), 2 C(t-) are {it-adapted processes with caglad sam-
ple paths and B, C are martingales (Section 4.4.4). The process M is a martingale since
E [JJ+ 3 (B(s) 2 - s) dB(s)] = 0 and E [JJ+ 2 C(s-) dC(s)] = 0 for all t ~ 0.
The process JJ+ 3 (B(s) 2 - s) dB(s) is a square integrable martingale since its
square has the expectation 9 E [JJ+ (B(s) 4 + s 2 - = 9 JJ+ (3 s 2 + s 2 -
2 s B(s) 2) ds]

2s s) ds = 6 t 3. The square of JJ+ 2 C(s-) dC(s) is 4 L,'C,_~ (L.~;:;:f Yi) 2 Yf so that its
expectation is finite provided that E[Yf] < oo since C has a.s. a finite number of jumps in
(0, t]. Hence, JJ+ 2 C(s-) dC(s) is also a square integrable martingale. Since this process
is independent of JJ+ 3 (B(s) 2 - s) dB(s), M is a square integrable martingale. •

Let

X 1(t)= lotO(t,a)b(a)dA(a) and

X2(t) =lot O(t, a) b(a) dM(a) (7.34)

be two processes corresponding to the components A and M of S (Eq. 7 .30).

The solutions of Eqs. 7.28 and 7.29 are, respectively,

X(t)=O(t,O)X(O)+X1(t)+X2(t), t:;::O, and (7.35)

X(t) = O(t,O)X(O) +lot O(t,a)b(a)p(S(a-),a)da, t:;:: 0, (7.36)

where 0 is defined by Eq. 7.7.

Our objective is to find the first two moments and/or other probabilistic
properties of the JRd- valued processes X in Eqs. 7.35 and 7.36 provided they exist.
The analysis is based on the Ito formula for semimartingales applied to functions
of X or to functions of an augmented state vector Z, which includes X. We
refer to these two approaches as the direct and the state augmentation methods,
respectively (Table 7.1). The first method is used in the next two sections to find
properties of X in Eq. 7.28 for square integrable and general martingale inputs.
Section 7.7.2.3 applies the state augmentation method to solve Eqs. 7.28 and 7.29.
Numerical examples are used to illustrate the application of these methods.
452 Chapter 7. Deterministic Systems and Stochastic Input

7.2.2.1 Direct method. Square integrable martingales

Let X be the solution of Eq. 7.28 with S in Eq. 7.30 and M in Eq. 7.32,
where Cq (t) = Cq (t) - 'Aq t E[Yq, J],

Nq(t)

Cq(t) = L Yq,k. t:::: 0, (7.37)


k=l

Nq are Poisson processes with intensities Aq > 0, and Yq,k are independent copies
of Yq,l, q = 1, ... , de. It is assumed that Cis in L2, the conditions in Eq. 7.33
are satisfied, and the coefficients a, bin Eq. 7.28 are such that this equation has a
unique solution.
We now derive differential equations for the mean function f.L(t) = E[X(t)]
and the correlation function r(t, s) = E[X(t) X(s) T] of X.

If A in Eq. 7.30 is a deterministic function ot with continuous first order deriva-


tive, then

jl(t) = a(t) f.L(l) + b(t) a(t),


r(t, t) = a(t) r(t, t) + r(t, t)a(t)T
+ f.L(l) a(t)T b(t)T + b(t) a(t) f.L(t)T
+ E[H(t) H(t)T] + E[K (t) e er K (t)r],
ar(t, s) . T
at =a(t)r(t,s)+b(t)Ol(t)JL(f)' t::=:s, (7.38)

where if (t) = b(t) H (t), K (t) = b(t) K (t), and e is a (de, de) matrix with
non-zero entries euu = .j'Au E[Y;,d, u = 1, ... , de.

Proof: The assumption that A is deterministic restricts the class of acceptable inputs. How-
ever, this assumption is adequate for the type of stochastic problems considered in the book,
as demonstrated by the examples in this and following chapters.
The average ofEq. 7.35 gives

JL(t) = 8(t, 0) JL(O) +{ 8(t, a-) b(a-) da(a-)

since E[dM(t)] = 0. The derivative with respect to time of this equation yields the above
differential equation for JL (Eq. 7.7).
7.2. Linear systems 453

The ItiHormula in Eq. 7.2 applied to X(t) r+ Xp(t) Xq (t) gives

Xp(t) Xq(t)- Xp(O) Xq(O)

= t f (8piXq(s-)+Xp(s-)8qi) [taij(s)Xj(s-)ds
i=l O+ }=1

+ E
d'
biu(s) (dau(s) + dMu(s))
]

+2
1 d t
L Jo (8pi 8qj + 8pj 8qi) L
d' - -
Hiu (s) Hjv(s) d[Bu, Bv](s)
i,j=l O+ u,v=l

+ L [xp(s)Xq(s)- Xp(s-)Xq(s-)- t(8pi Xq(s-) +Xp(s-)8qi) .6.Xi(s)],


O<s~t i=l

where .6.Xj(S) = Xj(S)- Xj(S-) is the jump of xi at times. The differentiation with
respect to time of the expected value of the above equation gives the second formula in
Eq. 7 .38. Except for the last term on the right side of the above equation, the calculations
are straightforward and are not presented. The last term takes the simpler form

L [(Xp(s-) + .6.Xp(s)) (Xq(s-) + .6.Xq(s))


0<S9
d
- L(8pi Xq(s-) + Xp(s-) 8qi) .6.Xi(s)] = L .6.Xp(s).6.Xq(s),
i=l 0<s9

which is zero at a times unless both Xp and Xq have a jump at this time. The jumps
d - - -
of Xp are .6.Xp(s) = Lv~l Kpv(s) .6.Cv(s). Two distinct coordinates of C have a
jump at the same time with zero probability since C has independent coordinates so that
.6.Xp(s).6.Xq(s) = Lv~l d -
Kpv(s) -
Kqv(s) ( .6.Cv(s)
- .)2
Consider a small time interval (s, s + .6.s], that will be reduced to a point by taking
the limit .6.s {. 0. If A.v .6.s« 1, a component Cv of C has at least one jump during the
time interval (s, s + .6.s] with probability 1 - el.u D.s :::: A.v .6.s. We have

by conditioning with respect to Fs since .D.Cv(s) is independent of Fs and the random


variables Kpv(s), Kqv(s) are Fs-measurable.
The last formula in Eq. 7.38 is obtained by multiplying the equality

Xp(t)- Xp(s) = 1t
s+
dXp(u), t ~ s,
with Xq(s), calculating the expectation of the resulting equation, and differentiating this
equation with respect to t. •
454 Chapter 7. Deterministic Systems and Stochastic Input

If H = h and K = k in Eq. 7.38 are also deterministic functions, the


state X is driven by Gaussian and Poisson white noise processes. In this case,
Eq. 7.38 coincides with Eq. 7.12 if a = ILb and the functions h and k are such
that the combined intensity of the Gaussian and Poisson noise processes has the
s
same intensity as Bb in Eq. 7.11, that is, b(t) Sb(tl is equal to

E[(dS(t) - da(t))(dS(t) - da(t))T]/dt

= [h(t)h(t)T +k(t)e(k(t)e)T],

where e is defined in Eq. 7.38 and E[dS(t)] = da(t). Hence, the formal second
moment calculus used in classica11inear random vibration theory can be applied
to find the second moment properties of X under the above conditions.

Example 7.14: Let X be a real-valued process defined by the stochastic differen-


tial equationdX(t) =a X(t-) dt + bdS(t), where dS(t) = dA(t) + dM(t) and
dM(t) = H(t) dB(t) + K (t) dC(t) with the notation of Eq. 7.32. The first two
moments of X satisfy the differential equations

,1(1; t) =a JL(l; t) + b E[A(t)],


,1(2; t) = 2a JL(2; t) + 2 b E[X (t-) A(t)]
+ b2 E[H (t) 2] +A b 2 E[Yf] E[K (t) 2 ],

where JL(p; t) = E[X (t- )P] = E[X (t)P]. The expectations E[X (t- )P] and
E[X (t)P] coincide since P(X (t) =X (t- )) = 1 at a fixed t.
If A(t) = 0 and H(t) = K(t) = 1, X is the solution of a stochastic
differential equation driven by Brownian and Poisson white noise, and its first
two moments satisfy the differential equations ,1(1; t) =a JL(l; t) and ,1(2; t) =
2a JL(2; t) + b 2 (1 +A E[Yf]). These moment equations are in agreement with
our previous calculations (Section 4.7.1.3). <>
Proof: The formulas in Eq. 7.38 do not apply since A is a random process. The above
equations involving moments of X have been obtained by Ito's formula applied to the
function X(t)P. Recall the notation /1-(t) = 11-(l; t) = E[X(t)], r(t, t) = fJ-(2; t) =
E[X(t) 2], H(t) = b H(t), K(t) = b K(t), and e = J;.. E[Yf]. •
Example 7.15: Let X be defined by dX(t) =a X(t) dt+bdS(t), wheredS(t) =
dM(t) and M(t) = B(t) 3 - 3 t B(t). The differential equations,

{1(t) =a JL(t), t ::: 0,


r(t,t)=2ar(t,t)+18b 2 t 2 , tO::O,
ar(t,s)
- - - = ar(t,s),
at
t::: s::: 0,
can be solved to find the second moment properties of X. <>
7.2. Linear systems 455

Note: The differential form of lt6's formula applied to the function B(t) r-+ B(t)n for an
integer n ~ 1 gives

d(B(t)n) = n B(t)n-l dB(t) + n(n 2- 1) B(t)n- 2 dt

so that dM(t) = 3 (B(t) 2 - t)dB(t). Hence, H(t) = 3 (B(t) 2 - t), E[H(t) 2]


E[(bH(t))2] = 18b2 t 2, and K(t) = 0. The above moment equations result from
Eq. 7.38 .•

The Ito formula in Eq. 7.2 can be applied to develop differential equations
for higher order moments of X and other probabilistic properties. However, the
determination of some coefficients of these equations can be cumbersome. Some
computational difficulties involved in the determination of higher order moments
of X are illustrated by the following example.

Example 7.16: Let X be defined by dX (t) = a(t) X (t) dt + b(t) dS(t), where
dS(t) = dM(t) and dM(t) = H(t) dB(t). The moments JL(p; t) = E[X(t)P]
are the solutions of

(;.(p; t) = pa(t) JL(p; t) + p(p2- l) b(t) 2 E [ X(t)P- 2 H(t) 2].

The expectation E [X(t)P- 2 H(tP] is difficult to calculate for p =::: 3 because of


the complex dependence between X and M. <>

Proof: lt()'s formula applied to the function X(t) r-+ X(t)P gives

X(t)P- X(O)P = lo tp X(s)p-l dX(s) +-1 lot p(p- 1) X(s)P- 2 d[X, X](s),
0 2 0
which yields the stated moment equation by averaging and differentiation with respect to
time since d[X, X](s) = b(s)2 H(s)2 dt.
JJ
For X(O) = 0, the state is X(t) = 8(t, s) b(s) H(s) dB(s) so that

E[X(t)P- 2 H(t) 2 ]= lro,t]q


{ E[(nH(sk)dB(sk))H(t)
k=I
2] (n()(t,sk)b(sk)) dsb
k=l

where q =p - 2 and () is the Green function for X. •

7.2.2.2 Direct method. General martingales


Let X be the solution of Eq. 7.28 with S defined by Eqs. 7.30 and 7.32.
We only require that the jumps of the coordinates of i: are in Lr. The con-
ditions in Eq. 7.33 need not be satisfied so that the martingale component of
S is not generally square integrable and the expectations E[H(t) H(t)T] and
456 Chapter 7. Deterministic Systems and Stochastic Input

E[K (t) e eT K (t)T] in Eq. 7.38 may not exist. We attempt here to develop a dif-
ferential equation for the characteristic function cp(u; t) = E [eFfuT X(t-)],
u E ffi.d, of X (t) because this function is always defined.

If A in Eq. 7.30 is a deterministic function a with continuous first order deriva-


tive, then

+~A, { E [exp (R [•T XCH + ~"' k4(t) r,,])] -~I·


(7.39)

Proof: The arguments u and t of the characteristic function rp(u; t) are not shown in the
above equation for simplicity.
The Ito formula in Eq. 7.2 applied to X(t) ~ exp (AuT X(t)) gives

1 d
-- L
t

2 k,l=l O+
1
Uk U[ eFf u
T
X(s-) d[Xk, XtJc(s)

+ L
O<s~t
[eFfuT X(s)- eFfuT X(s-)- tn
k=l
Uk eFfuT X(s-) b.Xk(s)],

(7.40)

where
d d'
dXk(s) = Lakt(s)Xt(s-)ds + Lbkr(s)dar(s)
1=1 r=l
db de
+L Hkp(s)dBp(s) +L Kkq(s)dCq(s),
p=l q=l

db db
d[Xk, Xt]c(s) = L Hkp(s) Htp'(s) d[Bp, Bp'](s) = L Hkp(s) Htp(s) ds,
p,p'=l p=l
7.2. Linear systems 457

and AXk(s) = Xk(s)- Xk(s-) = "d -


~q"::l Kkq(s) -
ACq(s). The expectation of the left
side ofEq. 7.40 is cp(u; t)- cp(u; 0), and its derivative with respect to time is acp(u; t)jat.
Consider now the first term on the right side of Eq. 7.40. There is no contribution
from the Brownian component of the driving noise. The terms corresponding to the com-
pensated compound Poisson process of the driving noise and the last term including the
jumps AXk(s) cancel. The only contribution of the first term comes from the drift of X
and the component « of the driving noise. The differentiation with respect to time of the
expectation of these contributions and properties of the characteristic function yield the first
two terms in Eq. 7.39.
The expectation of the second term on the right side of Eq. 7.40 and the expression
for [Xk, Xtf give the third term in Eq. 7.39.
The expectation of eAuT X(s)- eAuT X(s-) in the last term of Eq. 7.40
can be obtained from X(s) = X(s-) + AX(s), the expression of the jumps of X, and
properties of C. If Aq At « 1, the probability that a coordinate Cq of C has at least a jump
in (t, t +At] is approximately Aq At and the corresponding jump of Xk is Kkq (t) ACq (t),
where ACq(t) is equal to Yq,k for some k. The events Dq, q = 1, ... , de, that Cq jumps
in (t, t +At] and Cr, r =f. q, have no jumps in this time interval are disjoint so that

td
E [eA uT X(t-) e A I:Z=I uk Kkq(t) Yq,I _ e A uT X(t-) I Dq] P(Dq)
q=l

=At L Aq ( E [eAu
de T d -
X(t-) e A Lk=I Uk Kkq(t) Yq,l] _ cp(u; t))
q=l

since P(Dq) = (1 - e-Aq ~t) 0~~l,r;fq e-Ar M ~ Aq At provided that Aq At « 1.


The probability of the events corresponding to jumps of two or more processes Cq in (t, t +
At] are of order (At) 2 or higher so that they will not contribute to the partial differential
equation for cp. These considerations yield the last term in Eq. 7.39. •

Generally, Eq. 7.39 is not a partial differential equation for the characteristic
function of X since some of the expectations in this equation cannot be expressed
in terms of derivatives of q;. The condition in Eq. 7.39 becomes a partial differen-
tial equation for cp if, for example, H and K are deterministic functions, in which
case X is the state of a linear system driven by Gaussian and Poisson white noise
(Table 7.1). This special case has been considered in the previous section under
the assumption Yq,l E Lz, q = 1, ... , de.

Example 7.17: Let X be the solution of

dX (t) = a(t) X (t-) dt + b1 (t) dB(t) + bz(t) dC(t), t ::: 0,

where a, b1, and bz are deterministic functions of time. The input consists of
Brownian motion B and a compensated compound Poisson process C. The char-
acteristic function of X satisfies the partial differential equation

am
_.,..
at
am
= u a(t) _.,..
au
- 1
u2 b (t) 2
2
cp +A. (E [
e A ub1(t) YI] - 1) q;
'
458 Chapter 7. Deterministic Systems and Stochastic Input

where the arguments of the characteristic function are not shown. <>
Proof: Apply Eq. 7.39 ford= d 1 = 1, a(t) = 0, H(t) = b1(t), and K(t) = b 2 (t).
A special case of the above equation corresponding to h).(t) = 0 has been discussed in
Section 4.7.1.3. •

Example 7.18: If the coefficients b 1, b2 in the previous example are replaced by


some processes H, K E £ (Eq. 7.32), the characteristic function of X satisfies the
condition (Eq. 7.39)

Because the expectations on the right side of this equation cannot be expressed as
partial derivatives of cp, it is not possible to obtain a partial differential equation
for the characteristic function of X. <>

Example 7.19: Suppose that fi (t) 3 b (B(t) 2 - t) and K(t) = 0 in Exam-


ple 7.18 so that X is the solution of

dX(t) =a X (t) dt + 3 b (B(t) 2 - t) dB(t),

where a, bare some constants and B denotes a Brownian motion. The character-
istic function of X satisfies the condition (Eq. 7.39)

acp = ua acp- 9b2 u2 E


at au 2
[(B(t)2- t)2 eHuX(t)].
This is not a partial differential equation for cp since the complex dependence
between B(t) and X(t) prevents the representation of the above expectation as a
function of a partial derivative of cp. <>

Example 7.20: Let X be the solution of dX(t) = -p X(t-) dt + dLa(t), t ::::_


0, where X(O) = 0, p > 0, a E (1, 2), and La is an a-stable process with
E[exp(.J=T udLa(t))] = exp(-lula dt) (Sections 2.10.3.3 and 3.14.2). The
characteristic function cp(u; t) = E[exp( A u X (t))] of X (t) is the solution of
the partial differential equation

acp acp a
- = - p u - - i u l cp
at au

with the initial condition cp(u; 0) = I. <>


7.2. Linear systems 459

Proof: Note that the driving noise La does not admit the representation in Eqs. 7.30 and
7.32 so that Eq. 7.39 cannot be applied. The Ito formula in Eq. 7.3 applied to the function
X (t) 1-+ eH u X (t) gives

eHuX(t) -1 = t .J=lueHuX(s-) (-pX(s-)ds+dLa(s))


lo+
+ L [eHuX(s) _ eHuX(s-) _ .J=-1 u eHuX(s-) ~X(s)]
0<s::9
=-p.J=lu lot X(s-)eHuX(s-)ds+ L [e.J=IuX(s)_eHuX(s-)]
0+ O<s~t

since [X, X]c(t) = 0 and JJ+ eH uX(s-) dLa(s) = LO<s::;t eH u X(s-) ~X(s), so
that

rp(u; t)- 1 = E { -.J=-1 pu fo~ eHuX(s-) X(s-)ds

+ L
O<s::;t
eHuX(s-) (eHuLlLa(s) -l)).
Errors can result if we calculate the above expectation term by term since the expectations
of the individual terms may not be finite, as we will see in Example 7.53. However, we
will continue with formal calculations and perform the expectation term by term. The
expectation of the derivative with respect to time of the first term is

-p.J=-iuE[X(s-)eHuX(s-)] = -pu ::.


The expectation of the increment of the second term in a small time interval (t, t + ~t] can
be approximated by

L E [e.J=I u X(s-) (eR u LlLa(s) _ 1)] ~ rp(u; t) (e-lul" At _ 1)


t<s:::t+ilt
~ rp(u; t) (-lula ~t)

as ~t .j.. 0. Since the time derivative of the above term is q;(u; t) (-lula), our formal
calculations suggest that q; satisfies the stated partial differential equation.
We also note that the solution X(t) = JJ e-P (t-s) dLa(s) is an a-stable variable
Sa(u, {3, JJ-) with u = ((1- e-pat)/(pa)/fa, f3 = 0, and JJ- = 0 (Section 2.10.3.2 in
this book, [79], Example 3.27, p. 112). The characteristic function of X(t) is

- 1 u X(t) ] = exp ( -lula 1 - pe-Pa


q;(u; t) = E [ e-v,--, a t)

so that
aq; - t aq; 1-e-pat
-at = -e pa lula q;, -=-a
au pa
sign(u) lula- 1 q;,

and aq;jat + p u aq;;au = -lula q; showing that the above formal calculations yield a
correct result in this case. •
460 Chapter 7. Deterministic Systems and Stochastic Input

7.2.2.3 State augmentation method

The state augmentation method applies if it is possible to define a vector


process Z that includes the state X and satisfies a stochastic differential equa-
tion driven by Gaussian and/or Poisson white noise. Generally, the differential
equations defining the augmented state Z are nonlinear so that the methods in the
second part of this chapter dealing with nonlinear systems can be used to find
properties of Z.
It is possible to write conditions for various properties of the augmented
vector Z, for example, higher order moments and the characteristic or marginal
density function of Z. The resulting conditions may or may not be differential
equations for these properties of Z depending on the noise and system properties.
Instead of presenting general conditions for properties of Z, we give several ex-
amples illustrating the use of the state augmentation method for two special cases
of semimartingale noise (Eqs. 7.43 and 7.44). The last two examples in this sec-
tion examine the response of linear systems to polynomials of filtered Gaussian
and Poisson processes, that is, processes defined as the output of linear filters to
Gaussian and Poisson white noise, respectively.
We now outline the essentials of the state augmentation method for a state
vector X defined by

dX(t) = a(X(t-), t) dt + b(X(t-), t) dS(t), t;::: 0, (7.41)


X(t) = a(X(t-), t) + b(X(t-), t) p(S(t-), t), t;::: 0, (7.42)

rather thanEqs. 7.28 and 7.29. This definition of X includes the class of nonlinear
systems, which will be considered in Section 7.3.2.3. If a is linear in X and b
does not depend on X, the above equations coincide with Eqs. 7.28 and 7.29. It
is assumed that the matrices a, bare such that Eqs. 7.41 and 7.42 have a unique
solution (Sections 4.7.1.1 and 4.7.2). Let S be the semimartingale process in
Eqs. 7.30 and 7.32. We consider two special cases for S corresponding to the
following definitions of the processes A, H, and K.

Case 1. Let«, h, and k be some deterministic functions and define

A(t) = «(t), H(t) = h(B(t), t), and K(t) = k(C(t-), t). (7.43)

Case 2. Let {J, h, and k be some deterministic functions and define

dA(t) = {J(S(t-), t)dt, H(t) = h(S(t-), t), and K(t) = k(S(t-), t).
(7.44)
7.2. Linear systems 461

State augmentation formulation. If X satisfies Eq. 7.41 and Sis defined by


= (X, U = B, V = C) is the solution of
Eqs. 7 .30, 7 .32, and 7 .43, then Z

dX (t) = a(X (t-), t) dt + b(X (t- ), t) [da_(t)


{ +h(U(t), t) dB(t) + k(V(t- ), t) dC(t)],
(7.45)
dU(t) = dB(t),
dV(t) = dC(t).

Note: The augmented vector Z in Eq. 7.45 satisfies a stochastic differential equation driven
by Gaussian and Poisson white noise. Generally, this equation is nonlinear even if the
differential equation for X is linear.
If S corresponds to the special case in Eq. 7.44, then the augmented vector Z =

I
(X, S) is the solution of the stochastic differential equation

dX(t) = a(X(t- ), t) dt + b(X (t- ), t) [d{j(S(t- ), t)


+h(S(t-), t) dB(t) + k(S(t- ), t) dC(t)], (7.46)
dS(t) = p(S(t-), t) dt + h(S(t-), t) dB(t) + k(S(t- ), t) dC(t)
driven by Gaussian and Poisson white noise. A

State augmentation formulation. If X satisfies Eq. 7.42 and Sis defined by


Eqs. 7.30, 7.32, and 7.44, then Z =(X, S) is the solution of

dX(t) =a(X(t-),t)dt+b(X(t-),t)p(S(t-),t)dt,
{
dS(t) = p(S(t-), t) dt + h(S(t-), t) dB(t) + k(S(t- ), t) dC (t).
(7.47)

Example7.21: Let X be the solution of dX(t) = aX(t)dt + bdS(t), where


S(t) = M(t) = B(t) 3 - 3 t B(t) and B is a Brownian motion. The moments
fl-(p, q; t) = E[X(t)P Y(t)q] of the augmented vector Z = (X, Y = B) satisfy
the differential equations
9b 2
ft(p, q; t) = p a f1,(p, q; t) + T p (p- 1) [/1-(P- 2, q + 4; t)
-2t /1-(P- 2, q + 2; t) + t 2 /1-(P- 2, q; t)]
+ 3b p q [/1- (p - 1, q + 1 ; t) - t 11- (p - 1, q - 1; t)]
1
+ 2 q (q - 1) /1-(p, q- 2; t),

where p, q :::: 0 are integers and 11-(u, v; t) = 0 if at least one of its arguments u, v
is strictly negative. The equations of order p+q contain the moments fl-(p-2, q+
4; t) of order (p- 2) + (q+4) = p +q +2. It may seem that we cannot calculate
the moments /1-(p, q; t) of Z. However, the moments /1-(P - 2, q + 4; t) can
be obtained sequentially for increasing values of p since the moments 11-(0, q; t)
depend only on S, so that they can be calculated from the defining equation of this
process. We say that the above infinite hierarchy of equations is closed. <>
462 Chapter 7. Deterministic Systems and Stochastic Input

Proof: The augmented vector Z is a diffusion process defined by the stochastic differential
equation
dX(t) =a X(t) dt + 3 b (Y(t) 2 - t) dB(t),
{
dY(t) = dB(t),

since dS(t) = 3 (B(t) 2 - t)dB(t). The Ito formula applied to function Z =(X, Y) r-+
X P yq gives the above moment equations by averaging and differentiation. This example
illustrates the state augmentation method in Eq. 7.45. •

Example 7.22: Let X be the process in Example 7.21. We have found that the
direct method cannot deliver a partial differential equation for the characteristic
function of X (Example 7 .19). However, we have

where rp(u, v; t) = E [eA(uX(t)+vY(t))] is the characteristic function of the


augmented state vector Z =(X, Y =B).<>

Proof: We have seen in the previous example that Z = (X, Y) is a diffusion process. Let
x(u, v; t) = eACuX(t)+vY(t)) be a differentiable function of Z(t). The Ito formula in
Eq. 7.3 applied to the mapping (X, Y) r-+ x (u, v; t) gives

x(u, v; t)- x(u, v; 0) =lot v"=1 u x(u, v; s) dX(s) +lot R v x(u, v; s) dY(s)

+~lot (cv"=l u) 2 x(u, v; s) d[X, X](s) + 2 (H u) (H v) x(u, v; s) d[X, Y](s)


+( H v) 2 X (u, v; s) d[Y, Y](s)) ,

where d[X, X](s), d[X, Y](s), and d[Y, Y] are equal to 9 b2 (Y(s) 2 - s) 2 ds, 3 b (Y(s) 2 -
s) ds, and ds, respectively. By averaging the above equation and differentiating the result
with respect to time, we obtain the stated partial differential equation of((!. The character-
istic function({! must satisfy the condition ({J(O, 0; t) = 1 (Section 2.11.3). Also,({! has to
satisfy the initial condition ({J(u, v; 0) = <P<J(u) for B(O) = 0, as considered here, where
({Jo is the characteristic function of X (0). Boundary conditions needed to solve the partial
differential equation for({! are considered later in this chapter (Eq. 7.62).
The formula in Eq. 7.39 can also be used to obtain the above partial differential
equation for the characteristic function of Z. •

The previous examples have considered stochastic problems of the type in


Eq. 7.45. In the remainder of this section it is assumed that the state X is contained
7.2. Linear systems 463

in a vector Z =(X, S) satisfying Eq. 7.47. Our objective is to calculate moments

(7.48)

of order p +q of Z, where p;, qk ::: 0 are integers, p = "£f=l p;, and q =


"£t~ 1 qk. Theconventiontt(PI, ... , pa, q1, ... , qar; t) =
Ois usedifatleastone
of the p; 's or qk 's is strictly negative. If the state vector Z becomes a stationary
process Zs as t -+ oo, the moments of Zs are time invariant and are denoted by
J.t(PI, ... , pa, q1, ... , qar), that is, the last argument indicating the dependence
of time is eliminated.
If Z =(X, S) is defined by Eq. 7.47, then

provided that the above expectations are finite.

Note: The Ito formula in Eq. 7.2 applied to the function


d d'
(X(t), S(t)) t-+ TI X; (t)Pi TI Sk(t)qk
i=l k=l

gives the above condition. If the coefficients of the differential equations defining Z are
polynomials of X and S, Eq. 7.49 is a differential equation for the moments of the aug-
mented state vector Z. Generally, these equations form an infinite hierarchy. However,
there are numerous special cases of Eq. 7.47 that are relevant in applications and the resul-
tant moment equations are closed, as demonstrated by the following examples. .l
464 Chapter 7. Deterministic Systems and Stochastic Input

Example 7.23: Let X be defined by


n
X(t) + 2~ v X(t) + v2 X(t) = l:az(t) sui
1=0

where dS(t) = -a S(t) dt +a v'2ct dB(t), a > 0, a,~ > 0, and v > 0 are some
constants, B denotes a Brownian motion, n 2:: 1 is an integer, and a z are continuous
functions. The process X represents the displacement of a linear oscillator with
natural frequency v and damping ratio ~ driven by a polynomial of the Ornstein-
Uhlenbeck process S.
The moments f,L(p, q, r; t) = E [X (t)P X(t)q S(tY] of orders = p+q +r
of Z = (X, X, S) satisfy the ordinary differential equation

M(p, q, r; t) = p fL (p - 1, q + 1, r; t) - q v 2 fL (p + 1, q - 1, r; t)
n
- (q f3 + r a) f,L(p, q, r; t) + q l:az(t) f,L(p, q- 1, r + l; t)
1=0
+ r(r - 1) a a 2 f,L(p, q, r - 2; t)

at each time t, where f3 = 2 ~ v and fL(p, q, r; t) = 0 if at least one of the


arguments p, q, orr of f,L(p, q, r; t) is strictly negative. If (X, X, S) becomes sta-
tionary as t --+ oo, the stationary moments of this vector process can be obtained
from a system of algebraic equations derived from the above moment equation by
setting /L(p, q, r; t) = 0.
Figure 7.4 shows the dependence on damping f3 = 2 ~ v of the coefficients

3 '5 Skewness
3

QOL---~~--~----~
0.5 1.5 0.5 1.5

Figure 7.4. Coefficients of skewness and kurtosis of the stationary displacement


X of a simple linear oscillator driven by the square of an Ornstein-Uhlenbeck
process

of skewness and kurtosis, Y3 and y4, of the stationary process X for v 2 = 1.6,
v2 = 9.0, and the input 'E.?=o az (t) S(t) 1 with n = 2, ao(t) = a1 (t) = 0,
7.2. Linear systems 465

a2(t) = 1, a = 0.12, and a = 1. The skewness and the kurtosis coefficients


are increasing functions of f3. If the linear system has no damping ({3 = 0), these
coefficients are zero and 3, respectively, that is, the values corresponding to Gaus-
sian variables. This result is consistent with a theorem stating roughly that the
output of a linear filter with infinite memory to a random input becomes Gaussian
as time increases indefinitely ([79], Section 5.2). The linear filter in this example
has infinite memory for f3 = 0. <>
Proof: The state vector Z = (X, X, S) is a diffusion process defined by

X(t) ]
d [ X(t) =[ X(t)
-v2 X(t)- f3 X(t) + L:?=o a1(t) S(t)l ]
dt +[ o0 ] dB(t).
S(t) -a S(t) u ,.fi(X

l[ l
The Ito formula applied to (X(t), X(t), S(t)) r-+ X(t)P X(t)q S(tY gives the above dif-
ferential equation for the moments of (X, X, S) following averaging and differentiation.
To see that the moment equations are closed, consider the collection of equations,

11-(s, 0, r; t) 11-(s, 0, r; t)
:, M(' ~I, I,'· t) ~c M(' ~I, I,'· t)

l [ l
[
!1-(1,s-1,r,t) !1-(1,s-1,r,t)
11-(0, s, r; t) 11-(0, s, r; t)

L:1al(f)/-L(s -1,0,
0 r +l; t) 11-(s, -1,
11-(s 0, r -
1, 2;
r -t) 2; t)
+[ : +r(r -1)au 2 : ,
(s -1)L:1al(t)/1-(1,s -2,r+l;t) !1-(1,s -1,r -2;t)
s Li a1(b) 11-(0, s- 1, r + l; t) 11-(0, s, r - 2; t)
for a fixed value of s = p + q, where cis a time-invariant matrix that can be constructed
from the moment equations. The differential equations for the moments of Z with s = 1
can be used sequentially for increasing values of r to find /1-(1, 0, r; t) and 11-(0, 1, r; t)
for any value of r. These moments are needed to solve the moment equations for Z with
s = 2. These calculations can be continued to obtain moments of any order of the state
vector Z =(X, X, S). •

Example 7.24: Let X be the solution of X(t) = -p X(t) + g(S(t)), where g


is an increasing function with continuous second order derivative and S denotes
the Omstein-Uhlenbeck process in Example 7.23. The augmented vector Z =
(X, Y = g(S)) is a diffusion process defined by

{
dX(t) =
(-pX(t) + Y(t)) dt,
dY(t) = -ay(Y(t)) dt + by(Y(t)) dB(t),

where ay(y) =a a 2 g"(z) -a z g'(z) and by(z) 2 = 2a a 2 (g'(z)) 2 , z = g- 1 (y),


are the drift and the diffusion coefficients of Y, respectively. <>
466 Chapter 7. Deterministic Systems and Stochastic Input

Proof: Because g is an increasing function, it has an inverse so that it is possible to derive


a stochastic differential equation for Y from the Ito formula applied toY= g(S) since g- 1
exists and S = g- 1(Y). Hence, (X, Y) is an JR2 -valued diffusion process. •

Example 7.25: Let X be the process in Example 7.23. The characteristic function
of the augmented state vector Z (Z1 = X, Z2 X, Z3 = S) satisfies the
partial differential equation
arp 2 arp
2 2
- = - a a w rp- v v - + (u- f3 v )arp-
at au av
-a warp+ v taz(t) (-1) 1 cJ-=1)1+ 1 azrp,
aw 1=0 awl

where rp(u, v, w; t) = E[exp(J=T (u Z1(t) + v Z2(t) + w Z3(t)))] denotes the


characteristic function of Z(t) and f3 = 2 v. <>s
Proof: Apply the ltOforrnula in Eq. 7.3 to function Z(t) r--+ exp(.J=T (u Z1 (t)+v Z2(t) +
w Z3(t))). The derivative with respect to time of the expected value of this formula gives
the equation for cp. •

The state vector Z = (X, S) in the following two examples is defined by

d[ X(t)
S(t)
J= [ a(t)X(t-)+p(S(t-),t)
a(t) S(t-)
J dt+ jffi.d'
{ [ 0 J M(dt d)
y ' y '
(7.50)

where M is a Poisson measure (Section 3.12) and p is a polynomial in S with


coefficients that may depend on time. We note that the driving compound Poisson
process C(t) = f~+ fffi.d' y M(ds, dy) is not compensated and differs from the
process in Eq. 7.37.
Note: The coordinates of p are
d'
p;(S(t), f)= L /;;,dy) Il Sr(t)di,r, (7 .51)
d; r=1

where d; = (di,l, ... , di,d'), d;,r :::: 0 denote integers, and /;;,d(t) are time dependent
coefficients. The compound Poisson process driving this equation has the representation
N(t) 1
C(t) = '"_LYk = f yM(dt,dy), (7.52)
k=1 lo+
where N is a Poisson process of intensity ). > 0, Y k are independent copies of an JR<l'-
valued random variable Y 1 with distribution F and finite moments, and M (d t, d y) is a
Poisson random measure with expectation E[M(dt, dy)] = A.dt dF(y). The moments
{L(q1, ... , qn; t) = E [TI/= 1 Z;(t)q;] of Z = (X, S) are defined as in Eq. 7.48, where
q; :::: 0 are integers, q = I:/= 1 q;, and n = d + d'. A
7.2. Linear systems 467

If Z is the solution ofEq. 7.50, then

/J,(qi, · · ·, qn; t)
d
= L qk akz(t) p,(q1, ... , qk. ... , qz, ... , qa, ... , qn; t)
k,l=l
d
+L Lqk ~k.dk p,(q1, · · ·, qk- 1, ... , qa, qa+I + dk,I, ... , qn + dk,d'; t)
k=l dk
n
+ L qk CXk-d,l-d p,(qi, ... , qa, ... , qk, ... , qz, ... , qn; t)
k,l=d+l

+A.E [ nxi(t-)q; {
.
d
n"
lJRd' .
d' q; ·'
q,.
~ ri!(qi- ri)!
Sj(t-tiy~;-r;dF(y) ]
1
1=l J=l r;=O
- A. p,(ql, ... , qn; t), (7.53)

Proof: The augmented vector Z = (X, S) E R", n = d + d1, is the solution of

dZ(t) = m(Z(t-), t) dt + { c(Z(t-), t, y) M(dt, dy), t ~ 0,


lJRd'

where matrices m(z, t) and c(z, t, y) are such that the above equation has a unique solution
(Section4.7.2). ThedifferentialformoftheltOformulaappliedtog(Z(t)) = flt=l Zi(t)q;
gives (Eq. 7.2)
468 Chapter 7. Deterministic Systems and Stochastic Input

where 8,k(Z) = og(z)jozk· The expectation of this equation is

=It
dJ-L(ql, ... , qn; t)

qkakz(t)J-L(ql, ... ,qko ... ,qz, ... ,qd, ... ,qn;t)


kJ==l
d
+L Lqk ~;k,dk J-L(ql, ...• qk- 1, ...• qd. qd+l + dk,l, ...• qn + dk,d'; t)
k=l dk

+ t qkak-d,l-dJ-L(ql, ... ,qd, ... ,qko ... ,qz, ... ,qn;t)l dt

In
k,l=d+l

+). dt E
.
z=l
d
Xi (t)q; {
}fii!.d'.
d' q;
n '"""'
·'
q,. S · (t)'i yq;-r; dF(y)
~ ri!(qi -ri)! 1
1=1 r;=O
1

- fd,fizi(t)q;dF(y)l,
}Til!. i=l

with the convention that moments J-L(ql, ... , qn; t) with at least one strictly negative argu-
ment are zero.
An algorithm can be developed for solving exactly Eq. 7.53 following the approach
in Example 7.23. The moments J-L(O, ... , 0, rd+l, ... , rn) can be calculated exactly since
S is a filtered Poisson process ([79], Section 3.3). If the coefficients of the differential
equation defining Z are time invariant and this process becomes stationary as t ---+ oo,
the moments J-L(ql, ... , qn; t) = J-L(ql, ... , qn) do not depend on time so that they satisfy
algebraic equations obtained from Eq. 7.53 by setting jL(ql, ... , qn; t) = 0. •

Example 7.26: Let Z =(X, S) be the solution of

d[ S(t)
X(t) J = [ aX(t-)+S(t-) J d { [ 0J M(d d) 1
aS(t-) t+ }Til!. t, y y

with the notation in Eq. 7.50. If E[Yf] exists and is finite fork::=: 1, the moments
tt(p, q; t) = E[X (t)P S(t)q] of Z = (X, S) satisfy the differential equation

f_L(p, q; t) = (p a+ q a) tt(p, q; t) + p tt(P- 1, q + l; t)


~ q!
+A. 6_ k!(q- k)! f.L(p, q - k; t) E[Yl].
k

Figure 7.5 shows the first four moments tt(p, 0; t), p = 1, ... , 4, of X
calculated from the above moment equations and their Monte Carlo estimates for
Z(O) = 0, a= -2:rr, a= -1, l = 2, A.= 2, and Y1 ~ N(O, 5). The dependence
of the stationary skewness and kurtosis coefficients of X on the parameter a is
illustrated in Fig. 7.6 for A. = 2 and A. = 200. The stationary skewness and kurtosis
7.2. Linear systems 469

Mean
1.5
/
MC simulation MC simulation

2 3 4 5 2 3 4 5

30 140
Skewness 120 Kurtosis
25
100 MC simulation
20
MC simulation

/;Momeot
80
15
60
10
;equations 40
5\... 20

0 0
0 2 3 4 5 0 2 3 4 5
time time

Figure 7.5. The mean, variance, coefficient of skewness, and coefficient of kurto-
sis functions of X

40
35 Kurtosis
3.5 A=2 30

25

1..=200

1.5 A=200
5

0.5 0
0 2 3 0 2 3
-a/1t -a/n

Figure 7.6. Stationary coefficients of skewness and kurtosis of X

coefficients of X differ significantly from their Gaussian counterparts Yg,3 = 0


and Yg,4 = 3 indicating that X is a non-Gaussian process. The skewness and
kurtosis coefficients of X approach Yg,3 and Yg,4 as a -+ 0, consistent with a
result in ([79], Section 5.2). The stationary skewness and kurtosis coefficients of
X also approach Yg,3 and Yg,4 if).. increases indefinitely and a is kept constant
470 Chapter 7. Deterministic Systems and Stochastic Input

since the input becomes Gaussian as A ~ oo under adequate scaling. ¢


Proof: The expectation in Eq. 7.53 for the mapping Z t-+ XP Sq is

E [x<t-)P (csct-) + Y1)q- s(t-)q)] = Lq '


q.
k=l k!(q- k)!
E [xct-)P sct-)q-k Yf]

q!
=L
q
k!(q - k)! p,(p, q - k; t) E[Yf]
k=l
and involves moments ofordersmallerthan p+q. We also note that the moments p,(O, q; t)
can be calculated in advance from the characteristic function

q~(u; t) = E [eA uS(t)] = exp { -).. t [ 1- ~lot E [exp(.J=l u Y1 /.l!(t-s))] ds J}


or the cumulants

Xq(t) =)..
t E [(Y1 ea(t-s)) q] ds = ~
Jo
)..E[Yq]
[exp(qat) -1]

of S(t) ([79], Section 3.3).


Consider now the set of moment equations corresponding to increasing values of
s = p + q = 1, 2, ... , that is, the equations
jl(l, 0; t) =a p,(l, 0; t) + p,(O, l; t),
jl(O, 1; t) =a p,(O, 1; t) +).. E[YI] p,(O, 0; t),

fors = 1,
jl(2, 0; t) = 2a p,(2, 0; t) + 2p,(l, l; t),
jl(1, 1; t) =(a+ a) p,(1, 1; t) + p,(O, l + 1; t) +).. E[YI] p,(1, 0; t),
jl(O, 2; t) = 2a p,(O, 2; t) +).. (2 E[Yd p,(O, 1; t) + E[Yf] p,(O, 0; t)),

fors = 2, and
jl(3, 0; t) = 3 a p,(3, 0; t) + 3 p,(2, l; t),
jl(2, 1; t) = (2a +a)p,(2, 1; t) +2p,(1,l + 1; t) +)..E[YI]p,(2,0; t),
jl(l, 2; t) = (a+ 2a) p,(l, 2; t) + p,(O, l + 2; t) +).. (2 p,(1, 1; t) E[YI]
+ p,(1, 0; t) E[Yf]).
JL(O, 3; t) = 3 a p,(O, 3; t) +).. (3 p,(O, 2; t) E[YI] + 3 p,(O, 1; t) E[Yf]

+p,(O, 0; t) E[Yf]) ,
for s = 3. The moments of orders = 1 can be calculated since p,(O, q; t) are known for
all values of q. The moment equations for s = 2 involve the unknown moments p,(1, l; t).
These moments can be determined recursively from the moment equations
q!
+ l; t) +).. L
q
jl(1, q; t) = (a+ q a) p,(1, q; t) + p,(O, q k!(q _ k)! p,(1, q - k) E[Yf]
k=l
7.2. Linear systems 471

by increasing q. The moment equations of order s = 3 involve the unknown moments


J.L(2, l; t) that can be determined recursively from

q!
= (2a +q a) J.L(2, q; t)+2J.L(1, q +l; t)+A. L
q
it(2, q; t) k' _ k)'p,(2, q -k) E[Yf],
k=l .(q .

starting with q = 1. •

Example 7.27: Let Z = (X, S) be the process on Example 7.26. The charac-
teristic function of the augmented state vector Z satisfies the partial differential
equation

aq; aq;
- = ua-
at au
+ (-1) I (.J=i) I+1 ualq;
av1
aq;
- +av- +A.q;[q; (v)
av y
-1],

where q;(u, v; t) = E[exp( .J=T (u X (t)+v S(t)))] and the characteristicfunction


of Y1 is denoted by q;y(v) = E[exp(.Ff v Y1)]. <>

Proof: Apply Ito's formula in Eq. 7.4 to (X(t), S(t)) r+ exp(H (u X(t) + v S(t))).
We discuss here only the contribution of the last term in Eq. 7.4. The jumps of the co-
ordinates of Z at a time a > 0 are ~X(a) = 0 and ~S(a) = ~C(a), where C(t) =
J6+ y M (da, dy ). We have
E [ev'=1(u X(t)+v S(t)) _ eH (u X(t-)+v S(t-)) J
=E [eR (u X(t-)+vS(t-)) (eR v t..C(t) _ 1) J
= E [ eH (u x (t- )+v S(t-))] ( E[eH v t..C(t)] _ 1)

where the last equality holds since Z(t-) and ~C(t) are independent.

Note also that the characteristic function qi(v) = E[eH v S(t)] of S(t) satisfies the
partial differential equation

aq5 aq5 _
- = a v - +A.rp[rpy(v) -1],
at au
which can be obtained from the above partial differential equation of rp for u = 0. •

Example 7.28: Let X be the displacement of a linear oscillator with damping


ratio ~ > 0 and frequency v > 0 satisfying the equation

X(t) + 2 ~ v X(t) + v 2 X(t) = S(t) 1 ,

where S is a filtered Poisson process defined by

dS(t) =a S(t) dt + dC(t),


472 Chapter 7. Deterministic Systems and Stochastic Input

C denotes a compound Poisson process, and l > 0 is an integer. The moments


f.L(p, q, r; t) = E[X(t)P X(t)q S(tY] of Z =(X, X, S) are given by

f.t(p, q, r; t) = (-2 ~ v q +a r) f.L(p, q, r; t) + p f.L(p - + 1, r; t)1, q


- v 2 q t-L(P + 1, q- 1, r; t) + q f.L(p, q- 1, r + l; t)
r
~ r! k
+A~ k!(r _ k)! f.L(p, q, r- k; t) E[Y1 ].
k=l

Figure 7. 7 shows the first four moments of X calculated from the above
moment equations and their Monte Carlo estimates for Z (0) = 0, v = 2 n:, ~ =
0.05,a = -l,l = 2,A = 2,andY1 "'"'N(O, 12). Thedependenceofthestationary

Variance

0.5
0.1 Moment ~Moment
equations equations
0
2 3 4 5 0 2 3 4 5

30 150
Skewness Kurtosis
25

20
MC simulation
100 MC simulation

/
15 Moment
Moment 50 equations
equations

0 0
0 2 3 4 0 2 3 4 5

time time

Figure 7. 7. The mean, variance, coefficient of skewness, and coefficient of kurto-


sis functions

skewness and kurtosis coefficients of X on~ is illustrated in Fig. 7.8 and Fig. 7.9
for A = 2 and A = 200, respectively, and v = 2 n:, n:, and n: /2. These coefficients
approach Yg,3 = 0 and Yg,4 = 3, that is, the values corresponding to Gaussian
variables, as ~ --+ 0. The stationary skewness and kurtosis coefficients of X differ
significantly from Yg,3 and Yg,4 for other values of~ indicating that X is a non-
Gaussian process. The differences between the skewness and kurtosis coefficients
of the stationary displacement X and Yg,3 and Yg,4 decrease as A increases, an
expected finding since the input approaches a Gaussian process as A --+ oo. <>
7.3. Nonlinear systems 473

6,-------------------, 50--------------------,
Skewness Kurtosis
5
V=21t

0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5

Figure 7.8. Stationary coefficients of skewness and kurtosis of X for A= 2

4,-------------------, 20,-------------------,

3.5
Skewness Kurtosis
3

0.1 0.2 0.3 0.4 0.5 0.1 0.2 0.3 0.4 0.5

Figure 7.9. Stationary coefficients of skewness and kurtosis of X for A = 200

Proof: The above moment equations have been obtained from Eq. 7.53 applied to the
augmented vector Z = (X, X, S). The resulting system of equations for the moments of Z
is closed so that we can calculate moments of any order of Z. •

7.3 Nonlinear systems


If the matrix a in Eq. 7.5 is a nonlinear function of X or both a and b
depend on the state vector, the system is nonlinear. The matrices a and b may
also depend on time explicitly. The general form of a nonlinear system is
X(t) = a(X(t), t) + b(X(t), t) Y(t), t :=:: 0, (7.54)
where the driving noise Y is an JE.d' -valued process and the entries of the (d, 1)
and (d, d') matrices a and bare real-valued functions of state and time. Systems
474 Chapter 7. Deterministic Systems and Stochastic Input

defined by Eq. 7.54 in which a and b are linear functions of X are viewed as
nonlinear. This definition of a nonlinear system is not general ([131], Section 3.3).
We have defined a linear system by Eq. 7.5 consistently with the theory of
linear random vibration [ 175]. Moreover, there are relatively simple and general
methods for finding properties of the state X of Eq. 7.5, as we have seen in Sec-
tion 7 .2. For example, the probability law of X is completely defined by its second
moment properties if the input to Eq. 7.5 is a Gaussian process. In contrast, there
is no general method for finding the probability law and other properties of X de-
fined by Eq. 7 .54. Some examples of nonlinear systems for which it is possible to
find analytically properties of the state defined by Eq. 7.54 are in Section 7.3.1.4.
The state vector X in Eq. 7.54 can define the evolution of a dynamic sys-
tem, material structure, damage state in a system, portfolio value, and other pro-
cesses (Section 7.4). The determination of probabilistic properties of X is the
main objective of nonlinear random vibration, a branch of stochastic mechan-
ics and dynamics analyzing the response of nonlinear dynamic systems subjected
to random input [ 175]. Two cases are distinguished in nonlinear random vibration
depending on the functional form of b. If b does not depend on the state X, that is,
b(X(t), t) = b(t), the nonlinear system is said to have additive noise. Otherwise,
we say that the system has multiplicative noise. The behavior of systems with
additive and multiplicative noise can differ significantly.

Example 7.29: Let X be the solution of the stochastic differential equation


dX(t) = (/3 X(t)- X(t) 3 ) dt +a b(X(t)) dB(t), t ~ 0,

where {3 = ot + a 2 /2, ot and a are constants, b(x) = 1 or b(x) = x, and B denotes


a Brownian motion. The noise is additive if b(x) = 1 and multiplicative if b is a
function of x. Figure 7.10 shows samples of X for b(x) = 1, b(x) = x, ot = ±1,
and a = 1. The samples of X corresponding to multiplicative noise converge to
zero for ot = -1 and oscillate about a constant value for ot = 1. The samples of
X corresponding to additive noise have a very different behavior. For ot = -1,
X oscillates about zero. For ot = 1, X oscillates for some time about ../[5 then
jumps and continues to oscillate about - ../[5. <>
Note: We will come back to this example in our discussion on the stability of the solution
of dynamic systems with multiplicative random noise, referred to as stochastic stability
(Section 8.7). &

Our objectives in this section are as in the first part of the chapter. We at-
tempt to find properties of X satisfying Eq. 7.54 driven by various noise processes
Y. Because X is the solution of a nonlinear differential equation, it is not a Gaus-
sian process even if Y is Gaussian. Most available results are for inputs Y that
can be modeled by Gaussian white noise, but even for these inputs, probabilistic
properties of X are not generally available in analytical form. Numerical methods,
Monte Carlo simulation, and heuristic assumptions are used in most applications
to characterize X.
7.3. Nonlinear systems 475

1.---~--------------, 4,-------------------,
a=-1 (multiplicative noise) a=1 (multiplicative noise)
0.8
3

8 10 10 20 30 40 50

a=-1 (additive noise)


2

-2L---~--~--~--~~
0 10 20 30 40 50 10 20 30 40 50
time time

Figure 7.10. Samples of X for multiplicative and additive noise, ot = ±1, and
a= 1

7 .3.1 Brownian motion input


Let Y in Eq. 7.54 be a Gaussian white noise defined formally by the deriva-
tive dB I d t of a Brownian motion B with respect to time. Then X is a diffusion
process satisfying the stochastic differential equation

dX(t) = a(X(t), t) dt + b(X(t), t) dB(t), t 2: 0. (7.55)

Conditions for the existence and uniqueness of the solution of Eq. 7.55 are dis-
cussed in Section 4.7.1. The coefficients a and bbT in Eq. 7.55 are called drift
and diffusion, respectively.
We develop differential equations for moments, densities, and characteristic
functions of X and discuss the solution of these equations. Solutions of Eq. 7.55
by Monte Carlo simulation and various approximations are also presented.

7.3.1.1 Moment equations


Let g(X(t)) = flf= 1 Xi (t)q; and denote by
(7.56)
476 Chapter 7. Deterministic Systems and Stochastic Input

the moments of order q = 'Lf=


1 qi of X(t), where qi ::: 0 are integers and X
is the solution of Eq. 7.55. The convention tJ-(ql, ... , qd; t) = 0 is used if at
least one of the qi 's is strictly negative. If X becomes a stationary process X s as
t -+ oo, the moments of X s are time invariant, and we denote these moments by
tJ-(ql, ... ' qd).

The moments of X defined by Eq. 7.55 satisfy the condition

.
tJ-(ql, ... , qd; t) =~ [ ag(X(t))]
~ E ai(X(t), t) --=----'-:...:....
i=l axi(t)

+~
2
t
i,j=l
E [(b(X(t))b(X(t))T) a2g(X(t))
ij axi(t) axj(t)
] .

(7.57)

If the state X becomes a stationary process X s as t -+ oo, then

where Xs,i is the coordinate i of Xs.


Proof: Apply Ito's formula in Eq. 7.3 to function X(t) r+ g(X(t)). This formula can
be used since g E C2(]Rd) and X is a semimartingale with continuous samples (Sec-
tion 4.7.1.1), and gives

dg(X(t)) =L
d og(X(t)) (
ai(X(t), t) dt +L
d'
bik(X(t), t) dBk(t)
)
~~ a~U) bl

1 d d' a2g(X(t))
+ 2 .~ L axi(t)oXj(t)bik(X(t),t)bjz(X(t),t)d[Bk,Bz1(t).
I,J=l k,/=1
The second term, dM(t) = (og(X(t))/oXi(t)) bik(X(t)) dBk(t), on the right side
of the above equation is a local martingale by a preservation property of the stochastic
integral so that E[dM(t)] = 0 since M(O) = 0. The expectation of the above equation
divided by dt gives Eq. 7.57 because d[Bh Bz](t) = 8kz dt.
If X= Xs, its moments are time invariant so that Ji,(ql, ... , qd; t) = 0 and Eq. 7.57
yields Eq. 7.58. •
The condition in Eq. 7.57 cannot be used to calculate moments of X since
the expectations on its right side are not moments of X. If the drift and diffu-
sion coefficients are polynomials of the state vector, these expectations become
7.3. Nonlinear systems 477

moments of X. Under these conditions, Eqs. 7.57 and 7.58 are differential and
algebraic equations for the moments of the state vector, respectively, and are re-
ferred to as moment equations.

Example 7.30: Let X be defined by

dX(t) = a(X(t)) dt + b(X(t)) dB(t), t:;:: 0,

where the drift and diffusion are such that the solution X exists and is unique. The
moments of X (t) satisfy the condition (Eq. 7 .57)

jJ,(q; t) = q E [ X(t)q-l a(X(t))] + q (q 2- 1) E [ X(t)q- 2 b(X(t)) 2].

If the drift and diffusion coefficients are polynomials of X, this condition becomes
a differential equation for the moments of the state X. <>

Example 7.31: Suppose that the drift and diffusion coefficients in Example 7.30
are a(x) = {Jx- x 3 and b(x) =a. The moments of X satisfy the differential
equation

. q (q- l)a 2
J.t(q; t) = q fJ J.t(q; t) - q J.t(q + 2; t) + 2 J.t(q - 2; t).

It is not possible to calculate the moments of X exactly since the above moment
equations form an infinite hierarchy. The calculation of a moment of any order
of X(t) involves the solution of an infinite set of equations simultaneously. We
present in Section 7.3.1.5 some heuristic methods, called closure methods, used
in physics and nonlinear random vibration theory to solve infinite hierarchies of
moment equations ([ 175], Section 6.1.1 ). If the moment equations do not form an
infinite hierarchy, we say that the moment equations are closed. In this case it is
possible to calculate exactly moments of any order of X. <>

Note: The moment equations up to order q ~ 3 are

Ji,(l; t) = f3 JL(l; t)- JL(3; t),


jL(2; t) = 2 f3 JL(2; t)- 2 JL(4; t) + a 2,
Ji,(3; t) = 3 f3 JL(3; t)- 3 JL(5; t) + 3 a 2 JL(l; t).

These moment equations involve five unknowns, the moments JL(l; t), JL(2; t), JL(3; t),
JL(4; t), and JL(5; t). Hence, it is not possible to find the first three moments of X(t). The
situation persists for any value of q.
The stochastic differential equation for X has a unique solution because its coeffi-
cients satisfy local Lipschitz and growth conditions (Section 4.7.1.1). The theory of deter-
ministic ordinary differential equations can be used to assess the existence and uniqueness
of the solution of the moment equations. .&
478 Chapter 7. Deterministic Systems and Stochastic Input

Linear systems driven by polynomials of filtered Gaussian and Poisson pro-


cesses are examples of nonlinear systems for which the moment equations are
closed (Section 7.2.2.3). The following example presents another nonlinear sys-
tem characterized by closed moment equations.

Example 7.32: Let X be defined by the stochastic differential equation dX (t) =


-a X (t) dt + a X (t) dB(t), t 2: 0, where a, a are constants and B denotes a
Brownian motion. We have

JL(q; t) = E[X(t)q] = JL(q; 0) exp [(-a+ (q-l)a2) q t


2
J
for any integer q 2: 0. The moment of order q of X approaches zero and ±oo as
t -+ oo if a > a*(q) and a < a*(q), respectively, where a*(q) = a 2 (q - 1)/2.
If a = a*(q), then JL(q; t) = JL(q; 0) is time invariant. o

Note: The moment equation is

. q (q- l)a 2
!J.-(q; t) = -a q ~J.-(q; t) + 2 !J.-(q; t),

and has the stated solution. If -a + (q - 1) a 2 /2 < 0, or equivalently, a > a*(q), then
~J.-(q; t) decreases to zero as t -+ oo. The moment 11-(q; t) is time invariant if a = a*(q).
If a < a*(q), ~J.-(q; t) converges to ±oo as t -+ oo depending on the sign of ~J.-(q; 0).
We also note that X is a geometric Brownian motion process with solution (Sec-
tion 4.7.1.3)
X(t) = X(O) exp(( -(a+ a 2 f2)t +a B(t))].
If X (0) and B are independent, then

which coincides with 11-(q; t) since E[eu G] = eu 212 for G ~ N(O, 1). &

7.3.1.2 Differential equation for characteristic function


Let X be the solution of Eq. 7.55 corresponding to a deterministic or ran-
dom initial condition X (0) that needs to be specified. The expectation,

~(u; t) = E [eAuT X(t)], u E ~. (7.59)

of g(X(t)) = exp (AuT X(t)} is the characteristic function of the JRd-valued


random variable X(t).
7.3. Nonlinear systems 479

The characteristic function of X satisfies the condition

t)]
d
8cp(u; t) = J=T~::::ui E [e.J=IuT X(t) ai(X(t),

-~ .t
at i=l

Ui ui E [e.J=IuT X(t) (b(X(t), t) b(X(t), t)T) ij J.


1,]=1
(7.60)

If the state X becomes a stationary process X s as t ~ oo, then

J=TI:: d
Ui E [e.J=IuT X,(t) ai(Xs(t), t) J

-~ .t
i=l

Ui u Ej [e.J=IuT X,(t) (b(X 8 (t), t) b(X8 (t), t)T) ij] = 0. (7.61)


l,j=l

Proof: Apply Eq. 7.3 to the mapping X(t) ~ eA uT X(t). We can use this version of
the It<> formula since X is a continuous semimartingale. The expectation of the It<> formula
JJ
scaled by dt yields Eq. 7.60 since eA uT X(s) bik(X(s)) dBk(s) is a local martingale
starting at zero and d[Bk, BL](t) = Okl dt.
Let qi 2: 1 be integers and q = I:f=l qi. If the entries of the matrices a and b are
polynomials of X and the partial derivatives,

of ({! exist, the right sides of Eqs. 7.60 and 7.61 can be expressed as partial derivatives of ({!
with respect to the coordinates of u.
The characteristic function of the stationary solution Xs satisfies the condition in
Eq. 7.60 but it is time invariant so that Eq. 7.60 becomes Eq. 7.61. •

We have seen in the previous section that moment equations can be obtained
for a diffusion process X only if its drift and diffusion coefficients are polynomials
of its coordinates. The same condition is needed in the current setting. If the drift
and diffusion coefficients of X are polynomials of this process, then Eqs. 7.60
and 7.61 become partial differential equations for the characteristic function of X.
In this case, it is possible to find the characteristic function of the state if initial
and boundary conditions are specified for Eq. 7.60 and boundary conditions are
given for Eq. 7.61. The initial condition for Eq. 7.60 is cp(u; 0) = e .J=IuT Xo
and cp(u; 0) = E [e.J=IuT X(O)J if X(O) = xo and X(O) is an JRd-valued ran-
480 Chapter 7. Deterministic Systems and Stochastic Input

dom variable, respectively. The boundary conditions for Eqs. 7.60 and 7.61 are
discussed later in this section.
Example 7.33: The displacement X of an oscillator with cubic stiffness, called
the Duffing oscillator, satisfies the equation

X(t) + 2s v X(t) + v2 ( X(t) +a X(t) 3 ) = W(t),

s
where E (0, 1) is the damping ratio, v denotes the initial frequency, a is a real
constant, and W is a Gaussian white noise with mean zero and one-sided spectral
density of intensity go > 0. The characteristic function of X satisfies the partial
differential equation

-arp arp
= U1 -au2 - V
2
U2-
arp a3rp
+ V 2 a U2 -
arp
- 2 S V U2 -
rr go 2
- - - u 2 (/)
at au1 aui au2 2
with initial condition given by the distribution of the state X at t = 0. ¢
Proof: The vector process (X 1 = X, X 2 = X) satisfies the differential equation
dX1(t) = X2(t)dt,
{
dX2(t) = -v 2 (X1(t) +a X1 (t) 3) dt- 2 ~ v X2(t) dt + .JiiTo dB(t),
where B is a Brownian motion. The partial differential equation for the characteristic func-
tion of (Xt. X2) is given by Eq. 7.60. This equation must satisfy the boundary conditions
tp(O, 0; t) = 1 and tp, aktpjilu~, iltpjilu 2 --+ 0 as ui + u~ --+ oo fork = 1, 2, 3 since the
random variables X1 (t) and X2(t) have finite moments (Eq. 7.62). •

Boundary conditions. Suppose that the drift and diffusion coefficients in Eq. 7.55
are polynomials of X so that Eqs. 7.60 and 7.61 are partial differential equations
for the characteristic function rp ( ·; t) of X (t). Initial and boundary conditions
need to be specified to solve these equations. The initial conditions result directly
from the properties of the state at time t = 0. The boundary conditions can be
obtained from the following properties of the characteristic function.

• rp(O; t) = 1,
• lrp(u; t) I ::: 1,
• If X(t) has a density, then rp(u; t) --+ 0 as II u II-+ oo,
• If X (t) has a density and finite moments of order q, then
aq rp(u; t)
q1 qd --+ o as II u 11--+ oo,
au1 ... "a
d
where qi ::: 0 are integers and L qi = q. (7.62)
i=1
7.3. Nonlinear systems 481

Note: The first two conditions result from the definition of the characteristic function.
The last two properties follow from the fact that, if g is an integrable function and
y(u) = fJR eHux g(x)dx, then y(u)--+ 0 as lui--+ oo ([62], Lemma 3, p. 513). This
fact implies cp(u; t) --+ 0 as II u II-+ oo since the density f of X(t) is assumed to be
integrable. If X(t) has finite moments of order q, then x{ 1· · ·XJd f(x) is integrable so
that the partial derivatives of order q = '£%= 1 qk of cp converge to zero as II u II-+ oo. A

7.3.1.3 Fokker-Planck-Kolmogorov eqnations

The density of the solution X of Eq. 7.55 satisfies two partial differen-
tial equations, the Fokker-Planck or the forward Kolmogorov equation and the
backward Kolmogorov equation. These equations can be derived from (1) the
relationship between the characteristic and density functions and Eq. 7.60, (2) the
Ito formula (Section 4.7.1.3), or (3) the Chapman-Kolmogorov formula (Exam-
ples 7.38 and 7.40).
Let f(x; t I xo; 0) be the density of X(t) I (X(O) = xo), t ~ 0, where
X (0) = xo is a deterministic initial state. If the starting time is s > 0 rather then
zero, the density of X(t) I (X(s) = xo) is denoted by f(x; t I xo; s), 0 < s ::=: t.
The shorthand notation f (x; t) will also be used for f (x; t I x o; 0) provided there
is no ambiguity. If the initial conditions are random, that is, X (0) is an JRd -valued
random variable, the density of X(t) can be calculated from f(x; t I x o; 0) and
the density of X (0). It is assumed that f (x; t I x o; 0) satisfies the conditions

lim [ai(X, t) f(x; t I xo; 0)] = 0, i = 1, ... , d,


lx;l---+oo

lim (b(x, t) b(x, t)T)ij f(x; t I xo; 0) = 0, i, j = 1, ... , d, and


lx;l---+oo

lim o[(b(x, t) b(x, t)T)ij f(x; t I xo; 0)]/0Xi = 0, i, j = 1, ... , d.


lx;l---+oo
(7.63)

If f(x; t I xo; 0) satisfies the conditions in Eq. 7.63, then f is the solution of
the Fokker-Planck equation

of(x; t I xo; 0)
at =-
L - a [ai(x, t) f(x; t I x
d

0 ; 0)]
ox·
i=l '

+ 21 Ld 8
ox· zox. [ (b(x, t) b(x, t)T)ii f(x; t 1 x 0 ; O) J.
i,j=l ' J
(7.64)
482 Chapter 7. Deterministic Systems and Stochastic Input

If the state X becomes a stationary process X s with density fs as t --? oo and


the conditions in Eq. 7.63 are satisfied, then fs is the solution of the stationary
Fokker-Pianck equation

d a
- "-[ai(x,t)fs(x)]+- " - - (b(x,t)b(x,t)T) .. fs(X)
1 d a2 [ J =0.
~ax·
i=l l
2 ~ax· v i,j=l l
ax·J
(7.65)

Proof: We need to show that Eq. 7.60 multiplied by e-FfuT x j(2:rr)d and integrated
with respect to u over JR.d gives Eq. 7.64. Alternatively, we can show that Eq. 7.64 multi-
plied by eH u r x and integrated over JR.d with respect to x gives Eq. 7 .60. We follow the
latter approach. The left side of Eq. 7.64 gives

A term of the first summation on the right side of Eq. 7.64 multiplied by eH uT x and
integrated over JR.d with respect to x can be written as

k n Trnd-J
m.
d
}=l,jofti
(eRuJxJ dx ·)
j
{k _i_
ID
m.
ax·
l
[ai(x, t) f(x; t)] eHu;x; dxi}.

The integral in brackets becomes

ai(X, t) f(x; t) eHu; x; lx;=±oo -.J=l Ui k ai(X, t) f(x; t) eHu; x; dxi

= -J=T Ui k ai(X, t) f(x; t)eHu;x; dxi

using integration by parts and the first set of conditions in Eq. 7.63. Hence, the term we
considered becomes -.J=T ui E [ ai (X(t), t) eH ur X(t) J.
A term of the second summation on the right side of Eq. 7.64 can be written as

kd-2 n.. (e.J=I


d
'f/ij = Uk Xk dXk)
k=l,koftl,J

X { { _a_z_[(b(x,t)b(x,t)T) .. f(x;t)J eH(u;x;+ujXJ)dxidXj}


1JR2 axi ax j lj

following multiplication with eH u T X and integration over JR<f with respect to x. The
double integral in the brackets has also the form
7.3. Nonlinear systems 483

Integration by parts of the inner integral in the expression of ;ij gives

a~j [(b(x,t)b(x,t)T)ij f(x;t)J eHuiX; lx;=±oo

- ,J=l u; { _a_
1~ axj
[(b(x, t) b(x, t)T) ..
lj
f(x; t)J e A u; x; dx;
so that (Eq. 7.63)

;ij = -R u; kz eH(u;x;+uJ Xj) a~j [ (b(x, t)b(x, t)T)ij f(x; t)J dx; dxj
= -H u; k eAu;x; dx; k a~j [ (b(x, t)b(x, t)T)ij f(x; t)J eAuJXJ dxj.
The above inner integral gives

(b(x,t)b(x,t)T)ij f(x;t)eAuJXJ lxj=±oo

-H Uj k (b(x, t)b(x, t/)ij f(x; t)eAuJxJ dxj

upon integration by parts, so that (Eq. 7.63)

;ij =-u;Uj {
hz eH(u;x;+ujXj) (b(x,t)b(x,tl) v.. j(x;t)dx;dXj

and

TJij =-u;uj k_deAuTx (b(x,t)b(x,t/)ij f(x;t)dx

= -u; u j E [eA uT X(t) (b(X(t), t) b(X(t), t)T)iJ.


The proof is completed by assembling the above results.
Generally, it is not possible to find analytical solutions for Eqs. 7.64 and 7.65 (Sec-
tion 7.3.1.4) so that the solution of these equations has to be obtained numerically in most
applications. The available numerical solutions of the Fokker-Planck equations (1) replace
the domain of definition of these equations with an open bounded subset D of ~ and
(2) impose the boundary conditions f(x; t I xo: 0) = o and aj(x; t I xo: O)jax; = ofor
x E aD, where aD denotes the boundary of D. The solution of the resulting problem can
be solved by the finite difference or finite element methods [179]. •

Example 7.34: Let X be defined by dX(t) = -p X(t-) dt + dS(t) for p > 0,


t ~ 0, and X(O) = x, where Sis a Gaussian (GWN), Poisson (PWN), or Levy
(LWN) white noise. The GWN, PWN, and LWN are interpreted as the formal
derivative of the Brownian motion, compound Poisson, and a-stable processes,
respectively. The increments of the Levy white noised S (t) = d La (t) are a-stable
random variables with the characteristic function E [ e A u dLa(t) J = e-lul" dt,
u E JR.
484 Chapter 7. Deterministic Systems and Stochastic Input

The characteristic function qJ(u; t) = E[eA uX(t)] of X(t) satisfies the


partial differential equations

-a(/) a(/) u 2

at
= - p u - - -qJ
au 2
forGWN,

~~ = -pu ::+A (E [eAuY1] -1) (/) forPWN,


a(/) a(/) a
at = -p u au - lui (/) forLWN,

where A > 0 and Yt are parameters of the compound Poisson process and a E
(0, 2) defines the Levy white noise. The density, f(x; t I xo; 0), of the conditional
process X (t) I (X (0) = xo) satisfies the partial differential equations

aj a(xf) 1 a2f
at = p --a:;- + 2 ax2 forGWN,

aj = P a(x f)+ A f: (-l)k E[Yf] ak { forPWN.


at ax k=l k! ax

The first formula is a Fokker-Planck equation given by Eq. 7.64. The second for-
mula is not a classical Fokker-Planck equation, and we refer to it as a generalized
or extended Fokker-Pianck equation. The generalized Fokker-Planck equation
is valid if Yt has finite moments of any order and some additional conditions
stated in the following comments are satisfied. It is not possible to derive a partial
differential equation for f under Levy white noise with a E (0, 2). <>
Proof: The derivation of the partial differential equations for the characteristic and density
functions of X under Poisson white noise is outlined here. The differential equation for
the characteristic function of X driven by a Levy white noise is in Example 7.20. The
corresponding equation for a Gaussian noise results from Eq. 7.60.
LetC(t) = I:,f~? Yk beacompoundPoissonprocess, whereNisaPoissonprocess
with intensity ).. > 0 and Yk are independent copies of a random variable Yt. The Ito
formula in Eq. 7.4 applied toeA uX(t) gives

eAuX(t) -1 = -Hpu ( X(s-)eAux(s-)ds


lo+
+ L [eAuX(s) _ eAuX(s-)]
0<s:9

so that we have

cp(u;t)-1=-pu Jo r acp(u- s)
---a;;--ds+ L cp(u;s)E [ ev-lu~C(s)_l
r-1 ]

O+ O<s::::t

by averaging. The last term has this form since

X(s) = X(s-) + LlX(s) = X(s-) + LlC(s),


7.3. Nonlinear systems 485

X(s-) is independent of future jumps .b..C(s) = C(s) - C(s-) of C, and .b..C(s) is either
Yk for some k or zero in any small time interval .b..s > 0 so that the random variable
eH u llC(s) -1 is eH u Yk -1 with probability)... .b..s and zero with probability 1-)... .b..s
as .b..s -)., 0. The partial differential equation for rp results by differentiation.
The Fourier transform of the left side of the partial differential equation for rp is
af/at. The first term on the right side of the Fokker-Planck equation is given by the corre-
sponding term of the equation for rp by using integration by parts, the boundary conditions,
and the equality

ak f(x· t)
-----,-'-
axk
= 1-
2Jr
h
JR.
r-1
(-v'=Iule-y-luxrp(u;t)du,

which is valid for any integer k 2: 1. The Fourier transform of rp(u; t) E[eH u Y,] is

_1_ { e-H ux rp(u; t) E[eH u Y,] du

[f (H~YJ)k]
2Jr lJR.

= _1 { e-Ruxrp(u;t)E du
2n }JR. k=O k.

- ~ E[Yfl_l
- L. k' 2Jr e
k
-..;=Tux (. )( C[ )kd - ~ (-1)kE[Yfl akj
rp u, t y - 1 u u- L. k! a k,
k=O . JR. k=l X

where it is assumed that Y1 has finite moments of any order,

and the above integration can be performed term by term. The last equality follows from
the relationships between the density and the characteristic functions given above. •

The Fokker-Planck equations (Eqs. 7.64 and 7.65) are deterministic, linear
partial differential equations of the second order defined for all t 2: 0 and x E IE. d .
Conditions for the existence and uniqueness of the solution of these equations can
be found in [44] (Chapter IV). Comprehensive discussions on the Fokker-Planck
equations are in [70, 174].
In reliability studies and other applications we are interested in some global
properties of the state X defined by Eq. 7 .55, for example, the distribution of the
stopping time T = inf{ t > 0 : X (t) ¢ D} corresponding to an open subset D of
JE.d and deterministic initial state X(O) = xo ED. Let XT (t) = X(t /\ T) denote
the process X stopped at T. The samples of X T and X coincide for times t ::=: T
but differ fort > T. The samples of xT are constant and equal to X(T) E aD at
times t 2: T. If X is the state of a physical system and D denotes a safe set for this
system, the probability of the event {XT (t) E D} gives the fraction of samples of
X that never left D in [0, t], so that

Ps(t) = P(T > t) = P (xT(t) ED), t 2:0, (7.66)


486 Chapter 7. Deterministic Systems and Stochastic Input

is the system reliability in [0, t]. We will see that the probability Ps (t) in Eq. 7.66
can be calculated from the solution of the Fokker-Planck equation for the den-
sity of X with appropriate boundary conditions. We review now some boundary
conditions that are relevant for reliability studies and other applications. Extensive
discussions on the type and classification of boundary conditions for diffusion pro-
cesses and the Fokker-Planck equation can be found elsewhere ([70], Chapter 5,
[111], Section 15.6).

Boundary conditions. Let pv(t) = fv f(x; t I xo; 0) dx be the probability that


the solution X of Eq. 7.55 belongs to an open subset D of JRd at a time t ::: 0,
where X(O) = xo ED and f(x; t I xo; 0) denotes the density of the 1Rd-valued
random variable X(t) I (X(O) = xo). The evolution of f(x; t I xo; 0) and PD in
time is given by the following two equations.

d
of = _"'"" BA.i (x; t)' where (7.67)
at ~
i=l
axi

a
Ai(x; t) = ai f - 2 L
1 d
~[(bbT)iJ f], i = 1, ... , d, (7.68)
J=l ux1

are the coordinates of a vector A.(x; t) E !Rd, called the probability current.

opv(t)
- - =-
at
~
~
.
/,
D
oA.i(x; t)
OXi
~
dx =- ~
. aD
i
Ai(x; t)ni(x)da(x), (7.69)
z=l z=l

wheren(x) = (nl(X), ... ,nd(X)) istheexteriornormalatx E oD.


Note: We note that Eq. 7.67 is an alternative form of the Fokker-Planck equation (Eq. 7.64).
The rate of change of Pn(t) is
apv(t) = r aj(x, t I XQ, 0) dx
at lv at
""/,
=-~
i=l D
d
iD.iax·
'
d
(x, t) dx=-~ ""h
i=l aD
Ai(X,t)ni(X)da(x),

where the last equality in the above equation follows from the Gauss or the divergence
theorem ([68], p. 116). The function
d
- L'Ai(x,t)ni(x)da(x) = -A.(x,t) ·n(x)da(x)
i=l
gives the probability flow from D to De through an infinitesimal surface element da(x)
of aD. Hence, -fan' A.(x, t) · n(x) da(x) represents the probability flow from D to De
through a subset aD1 of aD.
7.3. Nonlinear systems 487

For numerical calculations, if D is not bounded, we have to approximate it by a


bounded subset iJ C D. For example, if D = {x E Rd : XJ < a}, a E lR, we can take
D = {x E Rd : XJ E (-at. a), Xi E ( -ai, ai), i = 2, ... , d} and solve Eq. 7.69 with D
in place of D, where ai > 0 are some constants. This approximation is adequate for large
values of ai since f and its first order partial derivatives vanish as II x II--.. oo so that the
contribution of the boundaries of iJ to the surface integral in Eq. 7.69 is virtually limited
to the hyperplane XJ =a [179]. A

Let T be the stopping time in Eq. 7.66 corresponding to a solution X of


Eq. 7.55 with X(O) = xo E D and an open subset D of ffi.d. We define two types
of boundaries depending on the properties of T.

• If T = oo a.s., aD is an inaccessible boundary. An inaccessible boundary


is natural or attracting if X never reaches aD orlim 1 ~ 00 X(t) E aD, respec-
tively.
• If T < oo a.s., aD is an accessible boundary. An accessible boundary is
reflecting or absorbing if l(x; t) · n(x) = 0, x E aD, or f(x; t I xo; 0) = 0,
X E aD, respectively.

Note: The above definitions can be applied to subsets of the boundary aD of D. Distinct
subsets of aD can be accessible or inaccessible boundaries.
The probability P ( XT (t) E De) is zero at t = 0 since X(O) = xo E D. This
probability is equal to P(T ::: t) (Eq. 7.66) and increases in time since more and more
samples of X reach the boundary of D. The density of xT (t) in D is the solution of
Eq. 7.64 with the absorbing boundary condition f(x; t I xo; 0) = 0 for X E aD. This
boundary condition eliminates the samples of X once they reach aD so that P(T > t) is
equal to Pv(t) in Eq. 7.69.
Absorbing boundary conditions are used in reliability analysis to characterize sys-
tem performance (Section 7.4.3). We have used reflecting boundaries in Section 6.2.3 to
find local solutions for deterministic partial differential equations with Neumann boundary
conditions. A

Example 7.35: Let X be a geometric Brownian motion defined by dX (t) =


c X (t) dt + a X (t) dB(t), t :;:: 0, where c, a are constants, B denotes a Brow-
nian motion, and X (0) > 0. If c - a 2 /2 :;:: 0, the boundary x = 0 is natural.
Otherwise, x = 0 is an attracting boundary. <>

Proof: We have seen that X(t) = X(O) e<c-a 2 / 2) t+a B(t) (Example 7.32) so that X(t) ?:
0 at all times since X (0) > 0 by hypothesis. The process can be zero if and only if
(c - a 2 /2) t + a B(t) becomes -oo. The function (c - a 2 /2) t + a B(t) is bounded
a.s. for t finite and

lim
t~oo
~[(c-
t
a 2 /2) t +a B(t)] = (c- a 2 /2) + lim [a B(t)ft]
t~oo
= (c- a 2 /2)
since limt~oo B(t)jt = 0 a.s. ([150], Section 6.4). If c- a 2 /2 < 0, then lim1 ~ 00 X(t) =
0 a.s. so that x = 0 is an inaccessible attracting boundary. Otherwise, the boundary x = 0
cannot ever be reached so that x = 0 is an inaccessible natural boundary. •
488 Chapter 7. Deterministic Systems and Stochastic Input

Example 7.36: Let B be a Brownian motion. Then

f(x;tly;O)= ~¢(~)
for B(O) = 0 and a reflecting boundary at x = 0. <>
Proof: The density f(x; t I y; 0) satisfies the Fokker-Planck equation
af 1 a2 f
at 2 axz
with the boundary condition aj(x; t I 0; O)jax 0 at x = =
0, t > 0, since ).(x; t) =
-(1/2) aj(x; t I 0; O)jax. The density f can be calculated from the above Fokker-Planck
equation with the stated boundary condition and the initial condition f(x; 0 1 0; 0) = 1 for
x = 0 and zero otherwise.
We follow here a different approach using the observation that B I (B(O) = 0)
reflected atx = 0 is equal in distribution with IB I I (B(O) 0). Because B(t) I (B(O)= 0) =
and ..fi N(O, 1) have the same distribution, we have for~ > 0

P(IB(t)l ~~I (B(O) =0)) =P(-~ < ..fi N(O, 1) ~~)=<I>(~)- <I>(:%),
so that the corresponding density is f(x; t I 0; 0) = (d/d~)P(IBI ~ ~ I B(O) = y) is as
stated ([154], Section 6.3). •

Example 7.37: Let dX (t) = (a X (t) + f3 X (t) 3 ) dt + a dB(t) be a stochastic


differential equation defining a real-valued diffusion process X starting at X (0) =
xo E lR, where a, {3, a are some constants.
Figure 7.11 shows the evolution in time of the density f(x; t I xo; 0) of
X(t) I (X(O) = xo) fort E [0, 0.125], a = f3 = -1, a = 1, and xo =
0. This density has been obtained by a finite difference solution of the Fokker-
Planck equation in the interval [-2, 2] with zero boundary conditions, time step
I::J.t = 0.00625, and space step ilx = 0.1. The plot shows that f(x; t I xo; 0) is
practically time invariant toward the end of the time interval. We also can see that
the selected integration interval [-2, 2] is adequate since the entire probability
mass of X (t) seems to be contained in this interval. <>
Note: The density f(x; t I xo; 0) of X(t) I (X(O) = xo) is the solution of the Fokker-
Planck equation
aj a
-=--((ax+ {3x 3) f ) + - -
a 2 a2 f
at ax 2 ax2
with f(-2, t I xo; 0) = /(2, t I xo; 0) = 0.

-2 + i Llx, i = 0, 1, ... , and ti =


j Llt, j =
Let Llx > 0 and Llt > 0 be the space and time steps defining the nodes Xi
0, 1, ... , of the finite difference mesh. The
finite difference approximation of the Fokker-Planck equation at node (i, j) is

!i,J+l - !i,j - -( + 3 f3 2) 1'. . - ( . + f3 3) fi+l,j - fi-1,}


2Llx
Llt - a xi Jz,J axz xi

+ -a 2 .:fi+I,J-
. .:. . :. : .:.;__ 2fi,j + fi-1,}
_:_:~__.:_c__:.:.f.._

2 (Llx)2 '
7.3. Nonlinear systems 489

10
. f(x;tJ 0;0) · >
8

-2
2

0.14

-2 0

Figure 7.11. The evolution of the density of X(t) I X(O) = xo for ot = f3 = -1,
a = 1, and xo = 0

where fi,j = f(xi; tj I xo; 0). The above linear system of equations has been used to
propagate the vector JCi) = (!l,j. h.i• .. .) in time. •

Example 7.38: Let X be a diffusion process defined by the stochastic differen-


tial equation dX(t) = a(X(t)) dt + b(X(t)) dB(t) and f(x; t I xo; 0) denote the
density of X(t) I (X(O) = xo). The Markov property of X can be used to de-
rive the Fokker-Planck equation for the density function f(x; t I xo; 0) under the
conditions in Eq. 7.63. 0
Proof: Let g E C(IR2) be an arbitrary real-valued function and consider the integral 'fJ =
fJR [f(x; t +hI xo; 0)- f(x; t I xo; 0)] g(x)dx, h > 0. BecauseXisaMarkovprocess,
the Chapman-Kolmogorov equation can be used and gives (Section 3.6.3)

f(x; t +hI xo; 0) ={ f(x; t +hI~; t) f(~; t I xo; O)d~


so that

'fJ = { [{ f(x; t + h I~; t) f(~; t I xo; 0) d~- f(x; t I xo; 0)] g(x) dx

= { [{ f(~; t + h I x; t) g(~) d~ - g(x)] f(x; t I xo; 0) dx


490 Chapter 7. Deterministic Systems and Stochastic Input

and

TJ/ h = k[k. J(t;; t + h I x; t) g(t;) ~ g(x) dt; J f(x; t I xo; 0) dx.


The Fokker-Planck equation results by taking the limit as h .,1. 0 of the above expression
with the approximation

g(l;) ~ g(x) = g1(x) (t; - x) + ~ g 11 (x) (t; - x) 2 + o((t; - x) 2 ), h ..1- 0,

and using integration by parts and the fact that g is an arbitrary function. •

The density f(x; t I ~; s) of the conditional vector X(t) I (X(s) = ~),


0 < s < t, satisfies, in addition to the Fokker-Planck equation, another partial
differential equation, called the backward Kolmogorov equation.

If X is the solution of Eq. 7.55, then f(x; t I y; s), 0 < s < t, satisfies the
backward Kolmogorov equation

aj(x; t I y; s) ~ 8f(x; t I y; s)
a = - L ai (y' s) ....:...._----:a--'-:....._-
s i=1 Yi

-~ t 2 i,J= 1
(b(y, s) b(y, s/)
iJ
az f(x; t I y; s).
ayi ay1
(7.70)

Proof: Let u(~, cr) = E(~,u)[g(X(t))], where cr is an arbitrary point in (0, t), the super-
scripts of E(~,u) indicate the condition X(cr) =~.and g E C2 (mtd). Takes, h > 0 such
that 0 < s - h < s < t and X (s - h) = y. We have
. E(y,s-h)[g(X(s))]- g(y) . E(y,s-h)[g(X(s))- g(y)]
hm = lim ----=-------'-'--
h h

(t
h.l-0 h.j.O

=lim~ E(y,s-h) [is ag(X(cr)) dXi(cr)


h.j.O h s-h axi

,t
i= 1

+~ (b(X(u), u) b(X(u), u)T) 11 '',:~~:~)) du)]


a
= L:>i (y, s) -a-. + 2
.
ag(y)
Y1
1
La
. .
( ) azg(y)
b(y, s) b(y, s)T i. ~
J y, Y;
= A[g(y)],
I= 1 l,j= 1

where the second equality in the above equation holds by Ito's formula and A is the gener-
ator of the diffusion process X (Section 6.2.1.1). The definition of the function u gives
u(y, s- h)= E(y,s-h)[g(X(t))] = E(y,s-h) {E(X(s),s)[g(X(t))]}
= E(y,s-h)[u(X(s), s)]
7.3. Nonlinear systems 491

so that
au(y, s) =lim u(y, s)- u(y, s- h)
as h,),O h
=-lim E(y,s-h)[u(X(s), s)]- u(y, s) = -A[u(y, s)]
h,),O h

or au(y, s)jas + A[u(y, s)] = o. The functions aujas and A[u] are
au(y, s)
- --=
as
1]Rd
< ) aj(x; t I y; s) d
gx
as
x and

A[u(y, s)J = f g(x) A[f(x; t I y; s)] dx,


lJRd
where f(x; t I y; s) denotes the density of X(t) I (X(s) = y). The last equality in the
above equation holds since A does not depend on the variable of integration. Because g is
arbitrary, we have
aj(x; t I y; s) + A[f(X,. t I y,. S )J -_ 0 •
--'----"-----
as
If the drift and diffusion coefficients of X do not depend explicitly on time, then the
transition density of X depends only on the time lag, that is, f(x; t I y; s) = f(x; 1: I y; 0),
where 1: = t- sis the time lag. Because aflas = -af/a1:, the above equation becomes
ana1: = A[ fl. This version of the backward K.olmogorov equation is particularly useful
for calculating moments of the first exit time T in Eq. 7.66 as illustrated by the following
example and [175] (Section 7.4.2). •

Example 7.39: Let X be defined by dX(t) = a(X(t)) dt + b(X(t))dB(t). The


mean JLT of the first exit timeT= inf{t :=: 0: X(t) fj. (a, ,8)} for X(O) = xo E
D = (a, ,8) is the solution of

dJLr d ILT 2
-1 = a(xo)- +-1 b(xo) 2 - -
dxo 2 dx5

with the boundary conditions JLr(a) = JLr(,B) = 0. ~

Proof: Let f(x, t I xo; 0) be the solution of the Fokker-Planck equation for X with the
boundary conditions f(a, t I xo; 0) = f(f3, t I xo; 0) = 0. The function Pv(t) =
J! f(x; t I xo; 0) dx gives the fraction of samples of X starting at xo E D that have not
reached the boundaries of D in [0, t], that is, the probability P(T > t). Integrating the
backward Kolmogorov equation with respect to x over (a, {3), we find

-apv = a(xo) -apv +-1 b(xo) 2 -


a2Pv
-
a 'I: axo 2 ax5

since the drift and diffusion coefficients of X do not depend explicitly on time. The integral
of the above equation over the time interval [0, oo) gives an equation for the mean ofT
0
since f (apvfa7:) dr = Pv(oo)- Pv(O) = -1 and f000 Pv(r) d1: = f.LT· •
492 Chapter 7. Deterministic Systems and Stochastic Input

Example 7.40: The Chapman-Kolmogorov equation can be used to derive the


backward Kolmogorov equation in Eq. 7.70.
Proof: Let X be a diffusion process defined by Eq. 7.55 with d = d 1 = 1 and let 0 < s <
s + h < t, where h > 0. Consider the density f(x; t I z; s +h) of X(t) I (X(s +h)= z)
and view it as a function of z. The second order Taylor expansion of this function about
z =~is
aj(x; t 1 z; s +h)
f(x; t I z; s +h) = f(x; t I ~; s +h)+ az lz=g (z- ~)

1 a2 f(x; t 1 z; s +h)
+2 az2 lz=g (z- ~)2 + o((z- ~)2).

This representation of f(x; t I z; s +h) and the Chapman-Kolmogorov equation,

f(x; t I~; s) = k. f(x; t I z; s +h) f(z; s + h I~; s) dz,


give

i
f(x; t I ~; s)
af(x; t 1 z; s +h)
=f(x;tl~;s+h)+ lz=g (z-~)f(z;s+hl~;s)dz

h
az JR

+ -21 a f(x·' t2 I z·' s)


2
lz=~ (z- ~) 2 f(z; s + h I ~; s) dz

k
az JR

+ o((z- ~) 2 ) f(z; s + h I ~; s) dz
or
f(x; t I~; s)- f(x; t I~; s +h)
h
_ aj(x; t I z; s +h) I
- az
!_
z=~ h
{ ( _ ")
}JR Z 5
f( .
Z, S
+ h I"·
5, S
)d
Z

+ !_2 a f(x; tazI2z; s +h) 1z=~ !_h


2 1( _JR
z
")2 f( .
s z, s + h I"·
5, s
) dz

+ *k o((z- ~) 2 ) f(z; s + h I ~; s) dz.


The above equation yields the backward Kolmogorov equation by taking the limit as h ,j.. 0
and noting that the last two integrals in this equation converge to the drift and diffusion
coefficients of X ([111], Section 15.2). •

7.3.1.4 Exact solutions


Generally, it is not possible to find analytical solutions for the state X in
Eq. 7.55 or the Fokker-Planck and backward Kolmogorov equations (Eqs. 7.64,
7.65, and 7.70). We have obtained in Section 4.7.1.2 analytical solutions for the
7.3. Nonlinear systems 493

state of some simple diffusion equations. Additional examples can be found in


[115] (Section 4.4). This section gives examples of nonlinear systems for which
the Fokk:er-Planck equation admits analytical solutions. Other examples are avail-
able in [67].

Example 7.41: Let X be a diffusion process defined by (Example 7.30)

dX (t) = a(X (t)) dt + b(X (t)) dB(t).


If the conditions in Eq. 7.63 are satisfied and the drift and diffusion coefficients
are such that X becomes a stationary process X s as t ~ oo, then the density of
Xs is

fs(x) = b(:) 2 exp [2 Jx :(~:2 da],


where c > 0 is a constant and r a(a) da denotes the primitive of a.¢

Proof: We have (Eq. 7.65)

a (a(x) fs(x)) - 21 dx
dx a22 ( b(x) 2 fs (x) ) =0

so that a(x) fs(x)- (1/2) (b(x) 2 fs(x) )' = -q or h'(x)- 2~(x) h(x) = -2q, where
h = b2 fs, ~ = ajb2 , and q is a constant, which must be zero by Eq. 7.63. Therefore, we
have h' (x) - 2 ~ (x) h (x) = 0, which yields the stated expression for !s. •

Example 7.42: Let X be the displacement of a simple oscillator defined by

X(t) + p X(t) + u'(X(t)) = W(t),


where p > 0 is the viscous damping coefficient, u denotes a potential, u '(x) =
du(x)fdx, and W is a stationary Gaussian white noise process with mean zero
and one-sided spectral density g(v) =go > 0 for all v :::: 0. The total mechanical
energy of the system is h (X, X) = u (X) + X2 /2. Denote the partial derivatives
of this function by h,k = ah;axk. where xl = X and x2 = X. The stationary
density of (X, X) is

fs(X, X)= C exp [- Zp h(X,


n go
x)],
where c > 0 is a constant. ¢

Proof: The diffusion process X = (X 1 = X, X 2 = X) is the solution of


{
dX 1(t) = h 2 (X(t)) dt,
dX2(t) = _: [h,l (X(t)) + p h, 2 (X(t))) dt, +,;'7f8o dB(t),
494 Chapter 7. Detenninistic Systems and Stochastic Input

where B is a Brownian motion. The stationary Fokker-Planck equation for the density fs
of the stationary solution Xs is (Eq. 7.65)

a a rr g0 a2 Is
- - (h 2 I s ) - - (-(h 1 +ph 2) I s ) + - - =
axl . ax2 . . 2 axi
o.

Suppose that there exists a function g such that fs(x) = g(h(x)). Then the above Fokker-
Planck equation becomes

J
p [<h,2) 2 g'(h)+g(h)]+ rr 0 [<h, 2)2 g"(h)+g'(h)] =0

since ls,l = g'(h)h,l, 1s,2 = g1(h)h,2, ls,22 = (h,2) 2 g"(h) + g'(h), h,l2 = 0, and
h. 22 = I. Hence, Is viewed as a function of h is the solution of

a rr go a2 Is
p - [ h 2 1 s ] + - - =0
ax2 · 2 axi

so that ph,2 Is+ (rr go/2) (alsfax2) = l(xl), where lis an arbitrary function of x1.
Because the left side of this equation approaches zero as lx21 -+ oo for each x1 (Eq. 7.63),
l(xl) must be zero. We have p h,2 Is + (rr go/2) h,2 g1 = 0 or p g + (rr go/2) g 1 = 0
following simplification with h,2 = x2 so that dgfg = -(2 pj(rr go)) dh. The solution of
this equation is the stated stationary density Is. •

7.3.1.5 Nonlinear random vibration


The main objective of most random vibration studies is the determination
of properties of the response or state X of a dynamic system defined by Eq. 7 .55,
for example, moments, marginal densities, mean crossing rates, and first passage
times. We have already seen in the previous section that analytical solutions of
Eqs. 7.55, 7.64, and 7.70 are possible only in a few cases. These difficulties jus-
tify the extensive use of numerical methods, Monte Carlo simulation, and approx-
imations for solving random vibration problems. We discuss here approximate
solutions of nonlinear random vibration problems, which are used extensively in
applications.
Fokker-Pianck equation. The previous section shows that analytical solutions
of the Fokker-Planck and backward Kolmogorov equations (Eqs. 7.64 and 7. 70)
are possible only in special cases. Numerical solutions of the Fokker-Planck and
backward Kolmogorov equations are limited to state vectors X with dimensions
d ::=:: 3 or 4 and are based on finite difference and finite element methods [179,
196].
Path integral method. Let X be defined by Eq. 7.55 and suppose that the coeffi-
cients a and b are such that this equation has a unique solution. Our objective is
to find the global solution of Eq. 7 .55, that is, the probability law of X over a time
interval [0, t], t ~ 0.
7.3. Nonlinear systems 495

The path integral method constructs the global solution ofEq. 7.55 by link-
ing local solutions of this equation, that is, solutions over relatively small time
intervals [tk-1, fk], where p = (0 = to < t1 < · · · < tn) is a partition of
[0, t] with mesh t..(p) = max1:sk:snCtk - tk-1). Two properties of X are used
in the path integral method. First, X is a Markov process so that its finite di-
mensional distributions can be obtained from the density of the initial state X (0)
and the density /(·; t I y; s) of the conditional vector X(t) I (X(s) = y),
0 :::: s :::: t, referred to as the transition density (Section 3.6.3). Second, if the
time step tk - fk-1 is sufficiently small, the vector X (tk) I (X Ctk-1) = y) is ap-
proximately Gaussian with mean y +a(y, fk-1) (tk- lk-1) and covariance matrix
b(y, lk-d b(y, fk-1)T (tk- fk-1).
For the numerical implementation of the path integral method both time
and space need to be partitioned. Let C i, i = 1, ... , n, be a partition of the
set D c JE.d in which X takes values. If XCtk-d E C;, the probability of the
event {X (tk) E C j} can be approximated by Jc f (x; tk I y i; tk-1) dx, where
J
/(·; tk I y; tk-1) denotes the density of X(tk) I (XCtk-d = y) and Yi E C; may
be selected at the center of C;. These considerations show that the evolution of X
can be approximated by a Markov chain with known transition probability matrix.
If the drift and diffusion coefficients of X do not depend explicitly on time, the
transition probability matrix has to be calculated once. Otherwise, the value of
this matrix needs to be updated at each time step. Theoretical considerations on
the path integral method and examples illustrating its application can be found
[132].

Moment closure. Generally, the moment equations cannot be solved exactly be-
cause they form an infinite hierarchy (Example 7.31). Several heuristic solutions
have been advanced in nonlinear random vibration theory and physics to close the
moment equations. Most of these solutions are based on postulated relationships
between moments of the state vector X providing additional equations, which are
used to close the infinite hierarchy of moment equations [22, 175]. For example:

- Central moment closure postulates that all central moments of order q =


'Lf= 1 q; > qo of X are zero, where qo is called the closure level ([175],
Section 6.1.1).

- Gaussian closure assumes that the moments of X of order q 2: 3 have the


same properties as the corresponding moments of Gaussian vectors, that is,
all odd and even central moments of X of order q 2: 3 are assumed to be
zero and be related in the same way as for Gaussian variables, respectively
([175], Section 6.1.1).

- Cumulant closure postulates that the cumulants of X of order higher than


a closure level qo are zero. The cumulant closure with qo = 2 and the Gaus-
sian closure approximations give the same answer ([175], Section 6.1.1).
496 Chapter 7. Deterministic Systems and Stochastic Input

- Consistent closure postulates a functional form,


n
j(x; t) =L Pk(t) fk(X),
k=I

for the density of X(t), where fk(X) are fixed densities, and Pk(t) are
weights such that Pk(t) 2: 0 and Lk=I Pk(t) = 1 for all t 2: 0. If X(t)
is stationary, the weights Pk are time-invariant. Let X= X be a real-valued
stationary diffusion process and let ei (PI, ... , Pn) be the expression of a
moment equation of order i in which the moments of X have been ob-
tained from the density f. Define an objective function e(pi, ... , Pn) =
Lf~I e; (PI, ... , Pn) 2 , where qo denotes a selected closure level. The op-
timal functions Pk minimize the objective function, that is, they are the
solutions of the system of equations aejapk = 0, k = 1, ... , n [74].
There is no justification for the relationships between moments of X postu-
lated by the central moment, Gaussian, and cumulant closure methods. Moreover,
the closure methods can deliver incorrect results, for example, negative even-order
moments [97]. Yet, these methods are used in large scale applications since they
are relatively simple and provide useful results in some cases [22, 105].
Example 7.43: Let X be the solution of
dX(t) = -[X(t) +a X(t) 3 ] dt + ,JJTiOdB(t),
where a and go > 0 are some constants. The moments of X satisfy the infinite
hierarchy of equations
•(
JL p; t) = - p JL(p; t) -a p JL(p + 2; t) + l7r go p (p - 1) JL(p- 2; t).

If X is a stationary process, the Gaussian closure method gives the nonlinear al-
gebraic equations
JL{l) -a (3 JL(1) JL(2) - 2 JL(1) 3) = 0,
{
-2 JL(2)- 2a [3 (JL(2)- JL{1) 2 ) 2 + 6 JL{1) 2 JL(2)- 5 JL(1) 4 ] + 1r go= 0,
for the first two moments of X. Approximate/exact values of the second moment
JL(2) are 0.2773/0.2896, 0.3333/0.3451, 0.4415/0.4443, and 0.4927/0.4927 for
a = 1, 0.5, 0.1, and 0.01, respectively, and 1r go/2 = 1/2. The approximations in
this case are satisfactory ([175], Example 6.7, p. 228). 0
Proof: The moment equations are given by Eq. 7.57. Let il(p; t) = E[(X(t)- J.L(l; t))P]
denote the central moment of order p of X(t). The Gaussian closure requires that the
moments of X(t) of order three and higher behave as moments of Gaussian variables, for
example, il(3; t) = 0 and il(4; t) = 3 il(2; t) 2 for the third and fourth order moments, so
that
J.L(3; t) = 3J.L(l; t)J.L(2; t)- 2J.L(l; t) 3 ,
J.L(4; t) = 3 [J.L(2; t) - J.L(l; t) 2 ] 2 + 6 J.L(l; t) 2 J.L(2; t)- 5 J.L(l; t) 4 .
7.3. Nonlinear systems 497

These conditions and the moment equations for p = 1 and p = 2 are used in the Gaussian
closure procedure to calculate the first four moments of X. If X is stationary, we need to
solve the above coupled nonlinear algebraic equations to find the first two moments of the
state. •

Example 7.44: Let X be the displacement of the Duffing oscillator in Exam-


ple 7.33 with a > 0. The moments p,(p, q; t) = E[X(t)P X(t)q] satisfy the
differential equation

/1-(p, q; t) = p p,(p- 1, q + 1; t) - v 2 q p,(p + 1, q- 1; t)


-a v2 q p,(p + 3, q- 1; t)
n: go
- 2 ~ v q p,(p, q; t) + -2- q (q - 1) p,(p, q - 2; t)

with the usual convention that p,(p, q; t) = 0 if at least one of the arguments
(p, q) is strictly negative. These moment equations form an infinite hierarchy that
cannot be solved exactly.
For a closure level qo = 2, the central moment and cumulant closure give
the relationship

p,(3, 0; t) = 3 p,(1, 0; t) p,(2, 0; t)- 2 p,(1, 0; t) 3 '


p,(4, 0; t) = 3 p,(2, 0; t) 2 - 2 p,(1, 0; t) 4 '
p,(3, 1; t) = -2 p,(1, 0; t) 3 p,(O, 1; t) + 3 p,(2, 0; t) p,(1, 1; t),

which can be used to close the moment equations of order p + q = 1 and 2. The
corresponding approximate stationary solutions are p,(l, 0) = 0, p,(O, 1) = 0,
p,(2,0) = ~: (J1 + 3 ~:r -1), p,(1, 1) = 0, andp,(0,2) = :t~· If a= 0,
the Duffing oscillator becomes a linear system so that the variance of the stationary
displacement is (n: go)/(4~ v 3 ). o
Note: The above moment equations have been obtained from Eq. 7.57. They can also be
derived by applying ItO's formula to the mapping (X(t), X(t)) 1-+ X(t)P X(t)q. •

Equivalent linearization. Let X be the solution of Eq. 7.55 with b(x, t) = b(t).
Denote by X 1 the solution of the linear stochastic differential equation

dX1(t) = a(t) X1(t) dt + b(t) dB(t), t :::: 0, (7.71)


driven by the noise in Eq. 7.55. The objective of the equivalent linearization
method is to find the matrix a in Eq. 7.71 such that X I is a satisfactory approx-
imation for X. Accordingly, the entries of a are selected to minimize the m.s.
difference between the drift coefficients of the exact and approximate differential
equations, that is, the mean square error [153, 175]

e = E [ (a(X (t), t) - a(t) X1 (t))T (a(X(t), t) - a(t) X1(t)))] . (7.72)


498 Chapter 7. Deterministic Systems and Stochastic Input

This seems to be a reasonable criterion for finding the parameters of the approx-
imating equation. The problem is that we cannot calculate the mean square error
in Eq. 7.72 since the probability law of X is not known. If the law of X were
known, there would be no need for constructing an approximation. To overcome
this difficulty, we consider the following objective function.

Objective function. lfEq. 7.55 has additive noise and X 1 is defined by Eq. 7.71,
then ii is selected to minimize the m.s. error

e= E [ (a(XJ(t), t)- ii(t) Xz(t))T (a(XJ(t), t)- ii(t) X1(t))]. (7.73)

Note: The objective function e in Eq. 7.73 depends only on the probability law of the
Gaussian process X1 defined completely by its first two moments, which can be calculated
from Eq. 7.10. Because the moments of X1 depend on the entries of a, the objective
function ein Eq. 7. 73 is a function of a. The optimal matrix a minimizes e. The entries of
the optimal matrix a satisfy a coupled system of nonlinear algebraic equations.
There are several reasons to question the accuracy of the approximating solution Xz
delivered by the equivalent linearization method: ( 1) there is little justification for using e
in place of e, (2) the approximation Xz is a Gaussian process while X is not, and (3) the
frequency contents of Xz and X may differ significantly. .A
Despite all these limitations that can cause significant errors in reliability
studies when using X 1 as an approximation for X, the equivalent linearization
method is used extensively in engineering applications. Numerical results and
theoretical considerations suggest that the method gives satisfactory approxima-
tions for the first two moments of the state vector at a fixed time [ 177, 153]. The-
oretical considerations on the equivalent linearization method can be found in
[19, 177, 153]. The equivalent linearization method has been extended to (1) ac-
commodate Poisson rather than Gaussian noise [80] and (2) use nonlinear systems
with known solutions to approximate the properties of the state of nonlinear sys-
tems whose solution is not known [175].
Example 7.45: Let X be the displacement of a Duffing oscillator defined by

X(t)+2f3X(t)+v 2 (x(t)+aX(t) 3 ) = W(t)

where f3 > 0 denotes the damping ratio, a > 0 provides a measure of nonlinearity,
v > 0 is the frequency of the associated linear oscillator (a = 0), and W is a
stationary Gaussian white noise with mean zero and one-sided spectral intensity
go > 0. The displacement X z of the equivalent linear system is the solution of
Xz(t) + 2 f3 Xz(t) + v;q Xz(t) = W(t),
where Veq is the only unknown coefficient. If the response is stationary, the natural
frequency of the equivalent linear oscillator is

1+
3a n 2go)
2v2 1+
(
Veq = .
f3v
7.3. Nonlinear systems 499

The frequency veq is equal to the frequency of the associated linear oscillator
(a= 0) corrected by a factor depending on the magnitude of a.¢
Proof: The linear differential equation for Xz is constructed by replacing the nonlinear
restoring force v 2 (X(t) +a X(t) 3) by the linear restoring force v~ Xz(t). The solution of
aejav~ = 0 under the assumption that Xz is a stationary process is
2 = v2 ( 1 +a E[Xt(t) 4 ]) 3rr-go)
= v2 ( 1 +a -
Veq 2 2- ,
E[Xt (t) ] 4,8 Veq

where e= E [ (v 2 (Xt(t) +a Xz(t) 3 ) - v~ Xz(t) ) 2]. The last equality in the expression
of v~q holds because Xz is a Gaussian process and E[Xz(t) 2 ] = rr go/(4 ,8 v~) for a linear
oscillator. The above nonlinear algebraic equation for ~ can be solved exactly in this
case. Generally, the equations satisfied by the unknown parameters of an equivalent linear
system have to be solved numerically. •

Perturbation method. Suppose that the linear and nonlinear parts of the matrices
a and b in Eq. 7.55 are of order one and s, 0 < s «
1, respectively, that is,
a = az +san and b = bz + s bn, where the entries of matrices a, az, an, b, bz,
and bn are of order one. Then the defining equation for X is
dX(t) = (az(X(t), t) + s an(X(t), t)) dt
+ (bz(X(t), t) + s bn(X(t), t)) dB(t). (7.74)
The solution is sought as a power series in s, that is,
+ s X(l)(t) + s2 X( 2)(t) + · ·. ,
X(t) = X(O)(t) (7.75)
and is usually approximated by X (t) ~ X (O) (t) + s X(l) (t), referred to as the first
order perturbation solution.

The processes X(O) and X(l) in the first order perturbation solution satisfy the
differential equations

dX(O)(t) = az(X(O)(t), t) dt + bz(X(O)(t), t) dB(t),


dX(l)(t) = az(X( 1)(t), t) dt + bz(X(l)(t), t) dB(t)
+ an(X(O)(t), t) dt + bn(X(O)(t), t) dB(t). (7.76)

Proof: The perturbation solution and the definition of X (Eqs. 7.74 and 7.75) give

d (x(0)(t) + e X(l)(t) + ... )


= [a 1 ( x(O) (t) + e x(I) (t) + · · · , t) + e an ( x(O) (t) + e x(I) (t) + ... , t) J dt
+ [bz (x(0l(t) +e xOl(t) + ... , t) + ebn (x(0l(t) + ex( 1l(t) + ... , t)] dB(t).
500 Chapter 7. Deterministic Systems and Stochastic Input

Because a1 and bt are linear functions of X, we have

at(X(t), t) = at(X(O)(t), t) + eat(X( 1)(t), t) + · · · ,


bt(X(t), t) = bt(X(O)(t), t) + e bt(X(l)(t), t) + · · · .

First order Taylor approximations of an and bn about x<0l (t) are

an(X(t),t)=an(X(O)(t),t)+"
d aan (X<0l ' t) (
eX~l)(t)+e 2 x~2)(t)+···),
~ 8x· I I
i=l I

The above equations and approximations yield Eq. 7.76 by equating terms of the same
order of magnitude. •

Additional considerations on the perturbation method are in Chapter 8. We


only state here that the method applies for small values of e, that is, in the context
of this chapter, for nearly linear dynamic systems. However, in some cases, the
method can give incorrect results even for small values of e (Example 7.47).

Example 7.46: Let X be the displacement of the Duffing oscillator in Exam-


ple 7.33 in which a =
e > 0 is the only small parameter in the defining equation
for X. The components of the first order perturbation of X are defined by

x<0l(t) + 2 ~ v x<O) + v2 x<0l(t) = W(t),


X(l) (t) + 2 ~ v X(l) + v 2 X(l) (t) = -v 2 X(O) (t) 3 ,

so that x<O) is a Gaussian process but x<I) is not. The process x<I) is the solution
of a linear system driven by a polynomial of a filtered Gaussian process (Exam-
ple 7 .23). The stationary mean and variance of X obtained from the first order
perturbation solution are zero and Var[X(t)] :::::: aJ (1- 3eaJ) + o(e), where
aJ = n go/(4 ~ v 3 ) is the variance of the displacement of the associated linear
system (e = 0), that is, the process X(O). ¢

Proof: The differential equations for x(O) and x(l) follow from Eq. 7.76.
The stationary processes x<O) and x< 1l have mean zero. The variance of the first
order perturbation solution is

Var[X(t)]:: E[(X(O)(t) + e x< 1l(t)) 2 ]


= E[X(O)(t) 2] + 2 e E[X(O)(t) X(l)(t)] + e 2 E[X( 1)(t) 2].

The approach in Example 7.23 or direct calculations using the solutions of x<0l and x<I)
can be used to find the approximate variance of X based on the first order perturbation
solution ([175], Example 6.11, p. 237). •
7.3. Nonlinear systems 501

Example 7.47: Suppose that the damping ratio~ of the Duffing oscillator in the
previous example is also a small parameter, that is, ~ = s > 0. The perturbation
method fails because the variance of the perturbation solutions increases indefi-
nitely in time while the density of the stationary displacement X is (Example 7 .42)

fs(x) = c exp [- 2 ~J (x 2 + i J.
x4) s > 0,

where c > 0 is a constant. 0


Proof: The first two terms of the perturbation solution are defined by (Eq. 7.76)

_x(O)(t) + v2 X(O)(t) = W(t),


_x(l) (t) + v2 x<l) (t) = -2 v x<0) (t) - v2 x<0) (t) 3 ,

so that the variance of the processes x(O) and x(l) approaches infinity as t --+ oo since the
linear oscillators defining these processes have no damping. •

Stochastic averaging. There are many applications in which the state X varies
slowly in time, that is, X(t) "' O(s), where s > 0 is a small parameter ([122],
Section 4.7). It is shown that approximate equations can be developed for the
evolution of X by averaging the differential equations developed for some mem-
oryless transformations of this process. We illustrate this approach, referred to as
the method of averaging, with two examples. A theorem justifying the validity of
the method of averaging is stated.

Example 7.48: Let x be the displacement of a nonlinear oscillator defined by

x(t) + v 2 x(t) = s g(x(t), x(t), t),

where v > 0 is a constant, g is an integrable function, and s > 0 is a small param-


eter. The solution of this equation can be approximated by x(t) = a(t) cos( 1jr(t)),
where 1fr(t) = v t + ~(t),

ii(t) = -2 -
8 lo
Jl' v 0
2n
g(a cos(1jr), -a v sin(1jr), t) sin(1jr) d1jr,

~(t) = -2 -
8 _ f2n g(a cos(1jr), -a v sin(1jr), t) cos(1jr) d1jr,
n va lo
a
and and ~ represent averages of the amplitude and phase of x (t) over a window
of size 2 n jv. 0
Proof: The change of variables,

x(t) = a(t) cos('f/f(t)) and i(t) = -v a(t) sin('f/f(t)),


502 Chapter 7. Deterministic Systems and Stochastic Input

and the defining equation for x give

a(t) = --8v g(a cos(l/1), -a v sin(l/1), t) sin(l/1),


• 8
cp(t) = - - g(a cos(1/f), -a v sin(l/1), t) cos(l/1)
va
for the amplitude a and the phase cp of x, where 1/f(t) = vt + cp(t). The differential
equations for a and cp are exact. They show that (1) a and cp vary slowly in time since
their rate of change is of order 8 and (2) the right side of the above equations are nearly
periodic functions of time with period 2 1r / v. The approximate equations for the amplitude
and phase result by averaging the above equations over a time interval [0, 2rr/v] under
the assumption that a(t) and cp(t) do not change during this time interval, so that these
functions can be approximated by their average values a(t) and ~(t) in [0, 2rrfv]. The
average value of 1/1 is 'if,(t) = v t + ~(t). An alternative version of this approach, referred
to as the Krylov-Bogoliubov-Mitropolsky averaging method, can be used to derive more
accurate approximations for x ([175], Section 6.3). •

Example 7.49: Let X be a real-valued stochastic process defined by

X(t) + e2 h(X(t), X(t)) + v 2 X(t) = e Y(t),

where h is a nonlinear function, v, e > 0 are some constants, and Y is a sta-


tionary, broadband Gaussian process with mean zero, correlation function r ( -r) =
E[Y(t) Y(t + -r)], spectral density s(·), and correlation time 't"c. The correlation
time provides information on the time lag beyond which the dependence between
values of Y is weak. The correlation time can be calculated, for example, from
the condition r(-rc)/r(O) = e- 1. Consider the change of variable

X(t) = A(t) cos(\ll(t)) and X(t) = -v A(t) sin(\ll(t)),


where \ll(t) = v t + <l>(t). If e is a small parameter, the evolution of the amplitude
A and phase <I> can be approximated by

-
dA(t) = e2 [ -1 hs(A(t))
-
+ 1r s(v)
_ ]
dt + e ../rr s(v) dB1 (t)
v 2 v2 A(t) v
1 -
- 2
d<l>(t) = e - -- hc(A(t)) dt + e ../rrs(v)
_ dBz(t),
v A(t) v A(t)
where

hs(a) = - 1- f2 h(a cos(l/1), -v a sin(l/1)) sin(l/1) dl/1,


lo
1f
2rr

hc(a) = - 1- f2 h(a cos(l/1), -v a sin(l/1)) cos(l/1) d1/f,


lo
1f
2rr
and B1, Bz denoteindependentBrownianmotions [152]. ¢
7.3. Nonlinear systems 503

Proof: The change of variables (X, X) t-+ (A, \ll) applied to the defining equation for X
gives
2
A(t) = ~v h(A cos(\ll), -v A sin(\ll)) sin(\ll) - :_ Y(t) sin(\ll),
v
.
= -e e
2
<l>(t) h(A cos(\ll), -v A sin(\ll)) cos(\ll) - - Y(t) cos(\ll).
vA vA
The temporal average over a time interval of duration 2 n I v of the terms in the above
equations that do not include the input Y is

!. e2 - e -
A= - hs(A)- - Y(t) sin(\ll),
v v
e 2- e -
!.
<I> = -- hc(A)- - - Y(t) cos(\ll),
vA vA
where ~(t) = v t + cP(t). We have performed a similar temporal average in the previous
example.
The next step of the stochastic averaging method is to approximate the random input.
It is assumed that the correlation time rc of the driving noise Y is such that there exists a
time D.t > 0 with the properties (1) A and if> are nearly constant in [t- D.t, t] for any t ?: 0
and (2) D.t » rc. Conditional on A(t - D.t) =a and cP(t - M) = ¢,we have

Yt (t) = Y(t) sin(v t + cP(t)) ~ Y(t) [sin(v t + ¢) + cos(v t + ¢) D.<t>(t)],


since

sin(v t+ cP(t)) = sin(v t + ¢ + D.cP(t))


= sin(v t + ¢) + cos(v t + ¢) D.cP(t) + O((D.cP(t)) 2),
where D.cP(t) = (cP(t)- cP(t- D.t)) I (A(t- D.t) = a, cP(t- M) = ¢). We need to
determine the second moment properties of the processes Yt (t) = Y(t) sin(v t + cP(t)) and
Y2(t) = Y(t) cos(v t + cP(t)) driving the amplitude and phase equations. The expectation
E[Yt (t)] ~ cos(v t + ¢) E[Y(t) D.cP(t)] can be calculated from the evolution equation for
if> giving
-
D.<l>(t) ~
e2
-hc(a)--
va va t-At
e it
Y(a) cos(va +</J)da

so that

e
E[Yt (t)] = - -
va
it
t-At
E[Y(t) Y(a)] cos(v t + ¢) cos(v a+ rfJ) da

~ --e
va
10
-oo
r(r) cos(v t + ¢) cos(v (t + r) + r/J) dr

= -- e
2 -va
10_ 00
r(r)[cos(v r) + cos(v (2t + r) + 2¢)]dr
e
= -2 -
va
10-oo
r(r) cos(v r) dr: + oscillatory terms.
504 Chapter 7. Deterministic Systems and Stochastic Input

The second equality in the above equation holds approximately since the correlation func-
tion is nearly zero for time lags larger than <c « lit. The oscillatory terms vanish by
temporal averaging over a time interval [0, 2 7l' I v] so that

-1°
8
E[Y1 (t)]:::::: - -
2v a _ 00
r(r) cos(v r) dr

= ---
B 100 r(r) cos(v r) dr = ---
7l' B
s(v).
4va -oo 2 va

The correlation and covariance functions of Y1 are

r1 (<) = E[Y1 (t) Y1 (t + r)]:::::: E[Y(t) Y(t + r) sin(v t + f/J) sin(v (t + r) + f/J)]
1
= 2 r(r) [cos(v r)- cos(v (2t + r) + 2f/J)]
= 21 r(r) cos(v r) + .
oscillatory terms

and q (<) :::::: r1 (<) - (tv~ s(v) ) 2 = rt (<) + O(s 2), respectively. Because r1 and q are
proportional to r, Y1 is a broadband process that can be approximated by a white noise with
intensity

q1 =! 100
2 -00
r(r) cos(v r) dr + oscillatory terms:::::: 7l' s(v),

where the last equality results by neglecting the oscillatory terms. Hence, Y1 can be ap-
proximated by
Jl'B ~
yl (t) :::::: - - s(v) - v 7l' s(v) wl (t),
2va
where W1 is a stationary Gaussian white noise with mean zero. Similar considerations
can be used to show that Y2(t) :::::: -v'rr s(v) W2(t), where W2 is a zero-mean Gaussian
white noise. The above approximate representations of Y1 and Y2 introduced in the dif-
ferential equations for (A, cP) give the stochastic differential equations for the approximate
amplitude and phase of X. •

It turns out that the approach used in the previous two examples to derive
simplified differential equations for the state of a system driven by random noise
is valid under some conditions. Consider an JRd -valued stochastic process defined
by the differential equation

X(t) = e 2 «(X(t), t) + B {J(X(t), t, Y(t)), (7.77)

where the JRd -valued functions«, fJ satisfy a sequence of conditions given by the
Stratonovich-Khas'minskii theorem [112], s > 0 denotes a small parameter, and
Y is an JRd' -valued stationary Gaussian process whose coordinates are broadband
processes with mean zero.
7.3. Nonlinear systems 505

If a and fJ in Eq. 7.77 satisfy the conditions in [112], then the solution X of this
equation converges in the weak sense to the solution of

dX(t) = 82 a(X(t)) dt + 8 b(X(t)) dB(t) (7.78)

as 8 ,1- 0, where B an ~d' -valued Brownian motion and

(7.79)

Note: The approximate solution of X in Eq. 7.78 has the same functional form as the
stochastic differential equation for (A, cP) in Example 7.49. The conditions that the matri-
ces« and fJ in Eq. 7.77 need to satisfy and the type of weak convergence of X to X are
defined in [112], and are not stated here.
The average operator rav is applied over all explicit time arguments. If the functions
« and fJ are periodic with period 't'(), this operator becomes

T~v {·} =-
1 lt+ro {·} ds.
t'Q t

The symbol fJ{JjfJX denotes a (d, d) matrix with rows (8f3k/8Xb ... , 8f3k/8Xd), k =
1, ... , d. The entries bij of b can be calculated from the expressions of b 17 given by
Eq. 7.79. A

Example 7.50: The Stratonovich-Khas'minskii theorem applied to the differen-


tial equation in Example 7.49 yields the stochastic differential equations for the
approximate amplitude and phase of the state X obtained by direct calculations in
this example [152]. <>

Proof: The exact differential equations for X = (A, <I>) in Example 7.49 show that
a1 = -v1 h(A cos(\11), -v A sin(\11)) sin(\11),
az = v1A h(A cos(\11), -v A .
sm(\11)) cos(\11),

lh = - ~ Y sin(\11),
v
1
{Jz =- vA Y cos(\11),

with the notation in Eq. 7.77, where ll'(t) = v t + <l>(t). The first coordinate of the drift
of the stochastic differential equation for the approximate amplitude and phase of X is
506 Chapter 7. Deterministic Systems and Stochastic Input

(Eqs. 7.78 and 7.79)

a1 = Tav E { ~ h(A cos(\11), -v A sin(\11)) sin(\11)}

+ Tav {[
0
00 E [( :~~) t <Pl)t+, + ( :~~) t (P2)t+< Jd<} .
Because iJP1/iJX1 = iJP1/iJA = 0 and iJP1/iJX2 = iJP1/iJcf> = -Y cos(\11)/v, the second
term on the right side of the above equation is

+ {1° Tav E [Y(t) Y(t + <) cos(\ll(t)) cos(\ll(t + <))] d•}


+ {1°
v A -oo

= Tav r(<) cos(\ll(t)) cos(\ll(t + <)) d<} = 7r 82(v),


v A -oo 2v A

where the last equality results by time averaging, as performed in Example 7.49 to find
E[Y1 (t)]. Hence, we have a1 = hs(A)jv + 1r s(v)/(2 v 2 A). The first and second com-
ponents of a1o that is, hs(A)/v and 1r s(v)/(2 v2 A), correspond to the time and stochastic
averaging (Example 7.49). Similar calculations can be used to find the remaining drift and
diffusion coefficients. For example, the entry (1, 1) of b bT is

btl+ ht2 = v12 Tav {L: r(<) sin(v t + rp) sin(v (t + <) + rp) d<}.
The expression under the integral coincides with the approximate correlation function of
the process Y1 in Example 7.49. •

Example 7.51: Let X be the solution of

X(t) + 2e 2 v X(t) + v 2 X(t) = e Y(t),

with the same notation as in Examples 7.49 and 7 .50. The approximate amplitude
and phase of X are the solutions of

dA(t)
-
= -e 2 ( -
v A(t) -
n s(v) )
_ dt + e ,Jn s(v) dB1 (t),
2 v2 A(t) v
-
dc'l>(t) = e
.../n_s(v) dB2(t).
v A(t)
An alternative form of the amplitude equation is

dR(t) = -e 2 v ( R(t)- 2 ~(t)) dt + e ../V dB1 (t),


where R = Af(a .J'i), where a 2 = n s(v)/(2 v3 ). The approximate amplitude
equations in Example 7.49 and this example are not coupled with the phase so that
they can be solved independently. <>
7.3. Nonlinear systems 507

Note: The above stochastic differential equations for A and <i> can be obtained by following
the approach in Example 7.49 or using the formulation in Eqs. 7.78 and 7.79. The equations
for A and <i> were originally derived in [8]. 6

Monte Carlo simulation. Monte Carlo simulation is the most general and con-
ceptually simple method for estimating properties of the state vector X. The
method involves (1) the generation of samples of the driving noise B in Eq. 7.55,
(2) numerical solutions of Eq. 7.55 to obtain samples of X (Sections 4.7.3 and
5.3.3.2), and (3) statistical techniques for estimating the required properties of X.
The only limitation of the Monte Carlo method is the computation time,
which can be excessive. For example, the estimation of the reliability of very
safe dynamic systems requires the generation of many samples of X since the
probability that the design conditions are violated is extremely low. The following
example presents another practical situation in which Monte Carlo simulation can
be inefficient.

Example 7.52: The process X = x e-t/ 2+B(t), called geometric Brownian mo-
tion, is the solution of dX (t) = X (t) dB(t), t 2: 0, with the initial condi-
tion X (0) = x, where B denotes a Brownian motion (Example 7 .35). Sup-
pose we want to estimate the moment of order q of X(t) from samples of this
process generated by Monte Carlo simulation. Let fl(q; t) be an estimator of
E[X (t)q] defined as the arithmetic average of the random variables X; (t)P, where
X;(t), i = 1, .. ., are independent copies of X(t). The number of samples of
X required such that the coefficient of variation of [l(q; t) has a specified value
Vreq is nreq = (eq 2 t - 1) jv}eq·
This number can be very large, for example,
nreq = 1, 192, 000 samples are needed so that Vreq = 0.05 fort = 1 and q = 4. <>

Proof: We have

where the last equality holds because q B(t) ~ N(O, q2 t). This result can also be obtained
by Ito's formula applied to the function X(t)q.
k
Let (l(q; t) = :L7=I Xi(t)q be an estimator of E[X(t)q], where Xi(t) are inde-
pendent copies of X ( t). This estimator is unbiased with variance

and coefficient of variation C.o.v.[(l(q; t)] = J<eq 2 t - l)jn. The required number of
samples nreq results from the condition C.o.v.[(l(q; t)] = Vreq· •
508 Chapter 7. Deterministic Systems and Stochastic Input

7.3.2 Semimartingale input


Let X be the solution ofEq. 7.54 in which the input Y is a semimartingale
noise or a polynomial of a semimartingale process, that is, X is the solution of
(Eqs. 7.41 and 7.42)

dX(t) = a(X(t-), t) dt + b(X (t- ), t) dS(t), t ::: 0, or (7.80)


X(t) = a(X(t-), t) + b(X(t-), t) p(S(t-), t), t ::: 0, (7.81)

where Sis an JRd' -valued semimartingale defined by Eqs. 7.30 and 7.32 and the
coordinates of the JRdp -valued process pare polynomials of S (Eq. 7.29). Polyno-
mials of semimartingales are semimartingales since sums of semimartingales are
semimartingales and so are smooth memoryless transformations of semimartin-
gales (Ito's formula, Section 4.6.2).
As in the previous sections, our objective is to develop differential equations
for properties of the state vector X. The analysis is based on Ito's formula for
semimartingales (Eq. 7.2). This formula can be applied to functions of the state
X in Eqs. 7.80 and 7.81 or functions of an augmented state vector Z including X.
We have referred to these two ways of using the Ito formula as the direct and the
state augmentation methods, respectively.

7.3.2.1 Direct method. Square integrable martingales


We have seen that the moment equations for the state X of a nonlinear
system driven by Gaussian white noise generally form an infinite hierarchy so that
the moments of X cannot be calculated exactly. This situation does not change if
the Brownian motion in Eq. 7.55 is replaced by a semimartingale noise.
LetS be defined by Eqs. 7.30 and 7.32 with A = 0 and H = 0, that is,
dS(t) = K(t) dC(t). It is assumed that K E £satisfies a condition in Eq. 7.33.
We derive differential equations satisfied by the mean and correlation functions,
p,(t) = E[X(t)] and rpq(t, t) = E[Xp(t) Xq(t)], of X in Eq. 7.80, and use the
notation K(s) = b(X(s-), s) K(s).

If X is defined by Eq. 7.80 with dS(t) = K(t) dC(t), then

jL(t) = E[a(X(t- ), t)],


rpq(t, t) = E[ap(X(t-), t) Xq(t-)] + E[aq(X(t-), t) Xp(t-)]
+ t
a=l
E [ (ilCa(s)f] E [ Kpa(t) Kqa(t)], (7.82)

provided that the above expectations exist and are finite.


Proof: The expectation ofEq. 7.80 is

dE[dX(t)] = E [a(X(t- ), t)] dt + E [b(X(t- ), t) K (t) dC(t)].


7.3. Nonlinear systems 509

Because theincrementdC(t) ofC is independent of b(X(t-), t) K(t) and E [ dC(t) J= 0,


the second expectation on the right side of the above equation is zero. These observations
yield the first formula in Eq. 7.82.
The Ito formula applied to the function X(t) ~-'* Xp(t) Xq(t) gives

d lot
Xp(t) Xq(t)- Xp(O) Xq(O) =L (.5p; Xq(s-) + Xp(s-) .5qi] dX;(s)

t
i=l O+

+ L [xp(s) Xq(s)- Xp(s-) Xq(s-)- [.5p; Xq(s-) + Xp(s-) .5qi] t.X;(s)].


O<s~t i=l

The equality Xp(s) = Xp(s-) + t.Xp(s) and previous calculations (Eq. 7.38) show that
the last term on the right side of the above equation is LO<s<t t.Xp(s) t.Xq(s). The
average of the resulting equation is -

rpq(t, t)- rpq(O, 0) = E [fo~ ap(X(s-), s) Xq(s-) ds

+In' aq(X(s-),s)Xp(s-)ds+ L t.Xp(s)t.Xq(s)].


0+ O<s~t

If the above expectations exist and are finite, the equalities,

E [ L t.Xp(s) t.Xq(s)] =E [ L t Kpa(s) Kqp(s) t.Ca(s) t.Cp(s)]


0<s9 0<s9 a,f3=1

= L tE[Kpa(s)Kqa(s)JE[(t.Ca(s)t].
0<S9ct=l

yield Eq. 7.82 by differentiation with respect to time. •

Generally, Eq. 7.82 is not a differential equation for the moments of X but it
becomes such an equation if a and bare polynomials of X. Usually, the collection
of equations for moments of X forms an infinite hierarchy so that these equations
cannot be solved exactly. Approximations, for example, closure methods, can be
used to find the second moment properties of X.

7.3.2.2 Direct method. General martingales


Suppose that the input is dS(t) = H(t) dB(t) + K (t) C(t), that is, S
is given by Eqs. 7.30 and 7.32 with A = 0. The state X may not have mo-
ments so that we attempt to derive a differential equation for the characteristic
function cp(u; t) = E [exp(.J=TuT X(t))], u E JRli, of X(t). The notation
H(t) = b(X(t-), t) H(t) and K(t) = b(X(t-), t) K(t) used in the following
equation differ from ii and Kin Eq. 7.38 since bin the current setting depends
on X.
510 Chapter 7. Deterministic Systems and Stochastic Input

The characteristic function of X satisfies the condition

t)]
d
acp~~; t) = .J=1 I > i E [e.J=IuT X(t-) ai(X(t-),
i=l
d
- ~ ~ Ui UJ E [e.J=IuT X (f-) ( H(t) H(t)T)iJ]
1,]

de
+ L>a { E [e.J=IuT X(t-) e.J=T I:f=l ui I:~ 1 Kia(t) ~Ca(t) J _cp(u; t)}.
a=l
(7.83)

Proof: The increment of function X(t) r-+ e R uT X(t), u E JRd, in [0, t] is

=L t
d
e.J=IuT X(t)- e.J=IuT X(O) J=T Ui e.J=IuT X(s-) dXi(s)
i=l lo+

- ~ t lot ui uJ e R uT X(s-) [~ iiik(s) dBk(s), ~ Hjl(s) dBz(s)]


i,J=l O+ k=l 1=1

+ L [e.J=IuT X(s) -e.J=IuT X(s-)- tHuie.J=IuT X(s-) !J.Xi(s)],


O<s~t i=l

where

db de
dXi(s) = ai(X(s-), s) ds + L Hik(s) dBk(s) +L Kia(s) dCa(s)
k=l a=l

d - -
and !J.Xi(s) = La~l Kia(s) l!.Ca(s). The above Ito formula, the assumption that all
expectations in Eq. 7.83 exist and are finite, and considerations similar to those used to
derive Eq. 7.39 give Eq. 7.83 by differentiation with respect to time. •

The condition in Eq. 7.83 is not a partial differential equation for cp because
the expectations in this equation cannot be expressed as functions of cp and its
partial derivatives for arbitrary a, b, il, and K. This condition becomes a par-
tial differential equation for cp under rather restrictive conditions. More general
systems can be analyzed by the state augmentation method discussed in the next
section.

Example 7.53: Let X be the solution of the stochastic differential equation

dX(t) = -(p X(t) + ~ X(t) 3 ) dt +a dLa(t), t::: 0,


7.3. Nonlinear systems 511

where p > 0, ~, and a are some constants and La is an a-stable process. The
marginal distribution of X has a heavy tail, that is,

P(X(t) > x) "'cx-P asx ~ oo,

where c, {J > 0 are some constants. Estimates of fJ based on a sample of X of


length 1,000,000 generated with a time step tl.t = 0.001 are in the range (2.5, 5)
for p = 1, ~ = 1, a = 0.2, and a = 1.
Formal calculations show that the characteristic function cp(·; t) of X(t) is
the solution of

This equation cannot explain the heavy tail of the distribution of X.


We have seen that the characteristic function cp(u; t) = E[eH u X(t)] of X
is the solution of the partial differential equation (Example 7 .20)

acp acp a
at= -pu au -lui cp.

for ~ = 0 and a = 1. This equation is a special case of the above partial differen-
tial equation of cp. ¢
Proof: Note that the process La in this example does not belong to the class of inputs S
in Eq. 7.83 so that this equation cannot be used. The It() formula applied to the function
X(t) ~ eHuX(t) gives

eHuX(t) _ eHuX(O) = -.J=l u f' (pX(s-) + s X(s-)3) eHuX(s-) ds


lo+
+ L [eAuX(s-) (eHuLU(s)_ 1)].
O<s:::t

The expectation of the terms on the left side of the above equation gives q; at times t and
t = 0. However, the expectation of the right side of this equation cannot be calculated term
by term since individual terms may not have finite expectations. Hence, it is not possible to
find a partial differential equation for q;. If we calculate the expectation term by term, the
result of these formal calculations is

<p(u; t)- <p(u; 0) = -.J=l u fo~ E [ (p X(s-) + s X(s-) 3) eHuX(s-)] ds

+ L E [eAuX(s-) (eAuAX(s) _ 1)].


O<s:::t

Differentiation with respect to time and considerations as in Example 7.20 yield the above
equation for the characteristic function.
Let Xi = X(i ~t), i = 1, ... , n, be a sample of X generated by Monte Carlo
simulation with a time step ~~ > 0. Let X(l) 2: · · · 2: X(n) be the order statistics
512 Chapter 7. Deterministic Systems and Stochastic Input

derived from the sample X1, ... , Xn. For example, X(l) = mroq:Si:Sn Xi and X(n) =
min1:si:sn Xi are the largest and the smallest observations in the record of X. It can be
shown that the Hill estimator [100]

Hk,n =! t
k 1=1
log ( X (j) )
x<k+1)
converges in probability to 1/f3 ask, n --+ oo such that kjn --+ 0 [100]. The range (2.5,5)
of the estimated values of f3 correspond to ratios kjn = 0.0001, 0,0005, 0.001, 0.005, and
0.01 and n = 1, 000, 000.
The parameter f3 gives the rate of decay of the tail of the distribution of X(t). We
also note that the asymptotic expression of P(X(t) > x) gives
log (P(X(t) > x)) ~ log(c)- f3 log(x) as x--+ oo
so that the graph (log(x), log (P(X(t) > x))) is a straight line with slope -{3. It is common
to refer to this behavior of the tail of the distribution of X as power law [15]. •

7 .3.2.3 State augmentation method


The state augmentation method can be applied to both evolution equations
for X, that is, Eqs. 7.80 and 7 .81. The analysis of the two evolution equations is
based on the same formulas but the augmented state vector used in analysis may
differ. Details of the state augmentation method are given in Section 7.2.2.3 and
are not repeated here. We only present an example illustrating the application of
the method.
Example 7.54: Let X be the solution of
dX(t) = a(X(t)) dt + b(X(t)) dS(t)
where dS(t) = 3 (B{t) 2 - t) dB(t). The moments p,(p, q; t) = E[X(t)P Y(t)q]
of order p + q of Z =(X, Y =B) satisfy the condition
jt(p, q; t) = p E[X(t)P- 1 Y(t)q a(X(t))]
+ (9 p (p- 1)/2) E[X(t)P- 2 Y(t)q b(X(t)) 2 (Y(t) 2 - t) 2 ]
+ 3 pq E[X(t)P- 1 Y(t)q- 1 b(X(t)) (Y(t) 2 - t)] + (q (q- 1)/2) p,(p, q- 2; t).
If a and b are polynomials of X, this equation becomes a differential equation for
the moments of Z = (X, Y). However, the resulting moment equations are not
closed.
The characteristic function q;(u; t) of Z(t) satisfies the differential form

~~ = .J=-1 Ul E [eAur Z(t) a(X(t))]


- (9/2) ui E [eAur Z(t) b(X(t)) 2 (Y(t) 2 - t) 2 ]

-3 u1 uz E [eAur Z(t) b(X(t)) (Y(t) 2 - t)]- (1/2) u~ q;,


7.4. Applications 513

which becomes a partial differential equation for cp if the coefficients a and b are
polynomials of X. <>
Proof: The augmented vector Z is the solution of

{
dX(t) = a(X(t)) dt + 3 b(X(t)) (Y(t) 2 - t) dB(t),
dY(t) = dB(t).
The differential equations for the moments and the characteristic function of Z result by ap-
plying the Ito formula to the functions X (t)P Y (t)q and eH uT Z(t), respectively, taking
the average, and differentiating the result with respect to time. •

7.4 Applications
The developments in the first part of this chapter are used to illustrate solu-
tions of some stochastic problems, which can be defined by Eq. 7.1. We present
elementary criteria for model selection (Section 7.4.1), demonstrate how stochas-
tic differential equations and diffusion processes can be used to describe the evo-
lution of some properties of random heterogeneous materials (Section 7 .4.2), dis-
cuss methods for reliability analysis based on the crossing theory for stochastic
processes, Fokker-Planck equation, probability bounds, and Monte Carlo simula-
tion (Section 7 .4.3), introduce elements of finance and establish the Black-Scholes
formula (Section 7.4.4), and give essentials of linear estimation and Kalman-Bucy
filtering (Section 7 .4.5).

7.4.1 Models
In many cases, the laws defining the evolution in time of the state X of a
system are either partially understood or too complex to be derived from basic
principles, for example, climatic changes, wave and wind forces, stock prices,
seismic ground acceleration, and many other phenomena. A common approach in
these cases is to select a probabilistic model for X.
The selection of a model for X is based on the available information,
which consists of (1) one finite record or a set of finite records of X and (2) phys-
ical properties of this process, for example, if X(t) represents the strength of a
fiber at location t, it must be positive. Because of the limited information the
model selection problem does not have a unique solution. Generally, there are
many competing models, that is, models that are consistent with the available
information. The selection of a model for X involves the following two steps.
1. Specify a finite collection of competing models Mk, k = 1, 2, ... , m.
There are no theorems defining the models Mk. Simplicity and use should
guide the selection of the collection of competing models. Nevertheless, in
applications, familiarity with a particular model plays a significant role in
the construction of the collection of competing models.
514 Chapter 7. Deterministic Systems and Stochastic Input

2. Find the model Mopt = Mk0 , ko E {1, ... , m}, that is optimal in some
sense. In Section 9.8.2 we present a Bayesian framework for finding M opt·

We use examples in this section to illustrate the classical approach for model
selection and demonstrate the practical importance of accounting for use in the
solution of the model selection problem.

7.4.1.1 Earth climate

It has been shown that earth climate is strongly correlated with calcium
concentration in ice deposits [199]. This dependence suggests that calcium con-
centration records obtained from ice-core samples can be used as a proxy for the
evolution of earth climate in time. Figure 7.12 shows an 80,000 year record of the

-1

-2

-3

~o~ __ _ L_ _ _ _L __ _ ~----4L---~s----~--~--~

time (years)

Figure 7.12. Logarithm of calcium concentration in an ice-core record

logarithm of calcium concentration X obtained from an ice-core sample.


The sample in Fig. 7.12 shows that X has two equilibrium points corre-
sponding to the interstadial and the full glacial states. The typical residence of X
in a state is between 1,000 and 2,000 years. These observations suggest that we
model the evolution of X by a stochastic differential equation whose drift corre-
sponds to a potential with two wells.
7.4. Applications 515

A model of X. The process X defined by

dX(t) = -u'(X(t)) dt + ay dY(t) + a1 dLa(t),


dY(t) = -Y(t)dt +-/I+ Y(t)2dB(t) (7.84)

has been proposed to model the evolution of earth climate, where u is a two-well
potential, B denotes a Brownian motion, La is an a-stable process, and ay, a1
are some constants [55].

Note: According to [55], Eq. 7.84 is a competing model since it is consistent with (1) the
record in Fig. 7.12 for a = 1.75 and ay/CFi = 3 and (2) current climate dynamics theories.
Note that the differential equation of X can be given in the form

dX(t) = -u' (X(t)) dt + dS(t),


where S is defined by

so that it is a semimartingale. Hence, properties of X can be obtained by the methods


presented previously in this chapter. ~

The selection of the stochastic process in Eq. 7.84 essentially follows the
classical approach in modeling. First, a functional form has been postulated for the
logarithm of the calcium concentration X. Second, the record in Fig. 7.12 has been
used to estimate the parameters of the postulated model. Details of the method
used to estimate the parameters of the processes in Eq. 7.84 are in [55]. The
potential in Eq. 7.84 has been obtained from the marginal histogram of X and the
relationship between the marginal distribution and the potential (Example 7.41).
The noise properties have been obtained from the statistical analysis of the record
in Fig. 7.12 corrected by the estimated drift of X. The analysis has shown that the
driving noise has a component ofthe a-stable type with a= 1.75 [55].
In most applications the collection Mk. k = 1, ... , m, of competing mod-
els consists of a single model M with a specified functional form but unknown
parameters. Therefore, the model selection problem is reduced to the estimation
of the unknown parameters of M. This classical solution of the model selection
problem has been used in [55] to establish the model M in Eq. 7.84.

7.4.1.2 Non-Gaussian input


Non-Gaussian processes such as (1) memoryless nonlinear transformations
of Gaussian processes, referred to as translation processes (2) Gaussian processes
scaled by random parameters and other conditional Gaussian processes, and (3) pro-
cesses defined by stochastic differential equations, for example, diffusion and fil-
tered Poisson processes, have been used in many fields of applied science and
engineering ([79], Chapter 2).
516 Chapter 7. Deterministic Systems and Stochastic Input

Translation processes can have any marginal distribution F but their cor-
relation function r cannot be chosen independently ofF, and their finite dimen-
sional distributions Fn, n ;:::: 2, are uniquely defined by F and r (Section 3.6.6 in
this book, [79], Section 3.1.1). Conditional Gaussian processes can be useful in
some particular applications but they cannot be calibrated to a specified marginal
distribution and correlation function ([79], Section 3.4). Diffusion processes can
match the marginal distribution of any signal provided that its correlation is ex-
ponential [122]. Filtered Poisson processes can match the correlation function
of any time series but they can only approximate its marginal distribution ([79],
Section 3.3).
Translation processes have attractive features for applications. However,
they cannot be viewed as universal non-Gaussian models. In addition to restric-
tions on their correlation and finite dimensional distribution functions, these mod-
els are not recommended in some cases for other reasons. For example, suppose
that our objective is to find properties of the state X of a system driven by a non-
Gaussian process Y. The representation of Y by a translation model Y T causes
difficulties since there is no theory for finding properties of X driven by Y T. The
output of a linear filter to Poisson white noise or to a polynomial of a class of
Gaussian processes is a preferable model for Y since the augmented system of
equations consisting of the defining differential equations for Y and X can be
solved by the methods discussed in this chapter. We consider the latter model of
Y in this section.
Suppose that the non-Gaussian input Y to a system with state X can be
modeled by the output of the linear filter in Eq. 7.85 to a polynomial Lt=! S1 of
an Omstein-Uhlenbeck process S. Properties of X can be obtained by the state
augmentation method discussed earlier in this chapter, the representation of Y,
and the evolution equation for X. The selection of a model for Y is by no means
trivial and may require many iterations.

Model of input Y.
n
Y(t) + /3 Y(t) + v 2 Y(t) = L S(t)1,
1=0
dS(t) = -a S(t) dt +a~ dB(t), (7.85)

where a > 0, a are some constants and B denotes a Brownian motion.

Note: We have shown in Section 7.2.2.3 that the moment equations for the diffusion pro-
cess Z = (Y, Y, S) are closed so that the moments of any order of the W -valued random
variable Z(t) can be calculated exactly for any t 2: 0. Let

dZ(t) = a(Z(t), t) dt + b dB(t)


be the defining differential equation for Z (Example 7.23).
7.4. Applications 517

The correlation function rpq(t, s) = E[Zp(t) Zq (s)] of Z satisfies the condition


arpq(t,s)
at = E [ ap(Z(t), t) Zq(s) ] , t 2::: s,

since Zp(t)- Zp(s) = J: [ap(Z(u), u) du + bp dB(u)], p = 1, 2, 3, so that


Zp(t) Zq(s)- Zp(s) Zq(s) = lt ap(Z(u), u) Zq(s) du +[ bp Zq(s) dB(u), t 2::: s,

which gives the above differential equation for rpq(t, s) by averaging and differentiating
with respect to t.
The following example shows that the correlation functions rpq can be calculated
from the above equation. Hence, we can calculate exactly the correlation function of the
non-Gaussian process Yin Eq. 7.85 and moments of any order of Y(t), t 2::: 0. A

Example 7.55: Suppose that :L?=o S(t) 1 = S(t) 2 , fJ = 0.25, v 2 = 1.6, a = 0.12,
and a = 1 in Eq. 7.85. The stationary skewness and kurtosis coefficients of Yare
Y3 : : : : 1.8 and Y4 ::::::: 8, respectively. Hence, if we need to analyze the response of a
system to an input with Y3 : : : : 1.8 and Y4 : : : : 8, the process Y can be used to model
this input. Figure 7.13 shows some of the correlation functions of the stationary

0.2

-0.2

-0.4
E[dY(t)/dt dY(t+'t)/dt]
-0.6'----------'----":----'-------'-------'-------.J
0 10 15 20 25 30
time lag, 't

Figure 7.13. Some of the correlation functions of Z

IR3 - valued process Z = ( Y, Y, S). The properties of the linear filter defining Y
control to a great extent the correlation function of this process if :L?=o S 1 is a
broadband process. <>
Proof: The moments J-L(p, q, r; s) are given in Example 7.23. The differential equations
giving the correlation functions rpq(t, s) = E[Zp(t) Zq(s)] involve only moments of Z
518 Chapter 7. Deterministic Systems and Stochastic Input

since the drift coefficients ap(Z(t), t) are polynomials of Z. However, some of these
moments are of order higher than 2 so that they are not correlations. The moments,
E [ Z3(t) 2 Z1 (s)]. E [ Z3(t) 2 Zz(s) ]. and E [ Z3(t) 2 Z3(s) ]. t 2: s, in the differential
equations for the correlation function of Z are not known but can be calculated simply.
For example,

where G is a zero-mean Gaussian variable with variance a 2 (1 - e-Za(t-s) ). which is


independent of Z1 (s) and Z3 (s ). That this representation is valid follows from the solution,

Z3(t) = Z3(s) + 1t e- 01 (t-u) a ,J2a dB(u) = Z3(s) + G,


of the last coordinate of Z. •

Consider a system with state X and let Y and YT be two models of the input
to this system, which are consistent with the available information. The model Y
i.s defined by a differential equation of the type in Eq. 7.85 and the model Y T is a
translation process. The construction of the model of Y exceeds by far the effort
require to define Y T. However, once Y has been obtained, properties of the state
X corresponding to Y can be established by the methods in this chapter. On the
other hand, there are no practical methods for calculating properties of X driven
by Yr.

7.4.2 Materials science


The evolution in time of stiffness, atomic lattice orientation, and other prop-
erties of random heterogeneous media subjected to persistent actions is discussed
extensively in Chapter 8. In this section we show that some global measures
of these properties satisfy differential equations with deterministic coefficients
driven by Gaussian white noise (Section 7.3.1).
Consider a random heterogeneous medium in a bounded subset D of R q,
q = 1, 2, 3, so that the properties of this material can be modeled by a random
field defined on a probability space (Q, :F, P). The material is subjected to an
external deterministic strain Ye(t), t 2: 0, with a constant rate Ye(t) = Ye > 0.
Denote by r(~, t) an R-valued continuous function of the strain tensor at location
~ E D and timet induced by Ye(t). Because the medium is random and heteroge-
neous, r : D x [0, oo) x Q ~ R is a random field defined on the probability space
(Q, :F, P) that depends on time and space arguments. A sampler(·, t, w) of r is
a real-valued function depending on the strain field in D at timet for a material
specimen w. Because strains are bounded in D, the moments of any order of r
exist and are finite, so that we can write

r(~. t) = E[r(~. t)J + f<~. t) = y(t) + f(~. t). (7.86)


7.4. Applications 519

It is assumed that r is ergodic in g and that D is sufficiently large with respect to


the correlation length of r so that the approximation,

E[f(g, t)] = y(t) c::: - 1 /, f(g, t, w) dg, (7.87)


VD D

holds for all t 2:: 0 and almost all w's, where v D denotes the volume of D. Hence,
the ensemble average E[f(t t)] = y(t) can be obtained from the spatial average
of an arbitrary sample of r at any time t and can be interpreted as a strain measure
at time t induced by the external action y e in a homogeneous material.
Let X (g, t) be an internal variable characterizing a material property at lo-
cation g E D and time t 2:: 0, for example, one of the Euler angles of the atomic
lattice orientation in a polycrystal. Because material properties and internal strains
vary randomly in space, X is a random function defined on the same probability
space as r. Our objective is to derive an evolution equation for X. We proceed
under the assumptions that (1) the evolution of material properties is governed
only by plastic straining and (2) the internal variable X takes values on the real
line. Generally, X is an ~d -valued random function but the assumption that X is
real-valued is satisfactory in many cases [14, 92].
If the material were homogeneous, then X(t t) = X h (t) would have the
same value everywhere in D and its evolution would be given by a deterministic
differential equation of the type Xh (t) = a(Xh (t)) y (t). The function a is prob-
lem dependent, for example, a(x) is proportional with 2 cos(x)- c if x denotes
the atomic lattice orientation of a planar crystal with two slip systems, where c is
a constant (Section 8.6.2.1).

If the medium is random and heterogeneous, the internal variable X charac-


terizing the evolution of its properties can be modeled by a diffusion process
satisfying the stochastic differential equation

dX(O = a(X(0) d~ + qw,o b(X(0) dB(O, (7.88)

where~ is a new time defined by d~(t) = y(t) dt, B denotes a Brownian mo-
tion, qw,o is a constant, and b(·) is a function of the state X.

Proof: Because the medium is random and heterogeneous, the internal variable X is a
random function depending on time and space. The evolution of X is characterized by the
change of its probability law in time. However, the determination of the evolution of the
probability law of X is a very difficult task. A simplified description of this evolution has
been proposed in [92]. Let C be a collection of points~ in D that are sufficiently far apart
so that the correlation between strain processes at distinct points in C can be neglected, that
is, the processes f'(~, ·)and f'(~', ·)in Eq. 7.86 can be assu~ed to be uncorrelated, where
~, f E C and ~ =/= ~ 1 • The collection of functions of time f' (~, ·), ~ E C, can be viewed
as samples of a stochastic process y 0 + Y (·), where y is the spatial average of the strain
rates in C and the noise Y captures the sample to sample variation of these strain rates.
Hence, the evolution of the internal variable X in a random heterogeneous medium can be
520 Chapter 7. Deterministic Systems and Stochastic Input

described by
X(t) = a(X(t)) y(t) + a(X(t)) Y(t),
that is, the evolution equation Xh(t) = a(Xh (t)) y(t) for a homogeneous medium in which
the strain rate y(t) is replaced with y(t) + Y(t). The equation of X becomes
dX Y
-d = a(X) + a(X) -;-,
~ ' y
in the new "time"~ defined by d~(t) = y(t)dt. Because Jie(t) > 0 by hypothesis, we
have y(t) > 0 so that the new time is well defined. It is assumed in [14] that W = Y jy in
Eq. 7.88 is a Gaussian white noise W with mean zero and intensity

where ~corr = y fcorr and fcorr are correlation times. The expectation E[f(t S) 2] is per-
formed with respect to the space parameter, and can be related to properties of the random
medium [14]. Generally, qw may depend on X in which case it can be given in the form
qw(~) 2 = q~ 0 h(X(n) 2 [14]. •
The methods in Section 7.3.1 dealing with nonlinear systems driven by Brownian
motion can be used to calculate properties of the diffusion process X in Eq. 7.88.
The Fokker-Planckequation (Section 7.3.1.3) can be used to assess changes in the
properties of a material subjected to a sustained action.

The density f(x; ~ I xo; ~o) of the conditional process X(S) I (X(~o) = xo) is
the solution of the Fokker-Planckequation (Eq. 7.64)

2 2
qw o a
-aj = - -
a (a(x)f) +- '- ( b(x) 2 f )
. (7.89)
a~ ax 2 ax2

Note: The dependence of the density f(x; ~ I .xo; ~o) on ~ can be used to characterize
the change in time of some material properties, referred to as the structural evolution.
If f(x; ~ I xo; ~o) converges to a stationary density fs(x) =
lims~oo f(x; ~ I xo; ~o),
we say that the structural evolution emerges in a pattern (Section 8.6). The stationary
density fs is the solution of

d q;
0 d2 (
- (a(x) fs(x))- - ' - 2 b(x) 2 fs(x)
)
=0
dx 2 dx

sothatf8 (x) = (cjb(x) 2 )exp[zra(u)j(q;,ob(u) 2 )du].wherec > Oisascaling


constant (Example 7.41).
We also note that f(x; ~ I xo; ~o) dx can be interpreted as the probability that the
state of a randomly selected point ~ in D belongs to an infinitesimal range (x, x + dx]
if the medium in Dis subjected to an average strain ~ and the initial state is X(~o) =
xo. The stationary density fs has a similar meaning. If the initial conditions are random,
J
this probability is given by the integral dx f(x; ~ I xo; ~o) fo(xo) dxo, where fo is the
density of X(~o). ~
7.4. Applications 521

Example 7.56: Consider a polycrystal consisting of brick-type grains of identical


dimensions characterized by 12 slip systems with random orientation specified by
their Euler angles (Section 8.6.2). The polycrystal is subjected to the deterministic
strain history eu in the direction of coordinate Xl (Fig. 7.14, left plot). The right

-200L--~--~-~-----'
50 100 150 200 0 50 100 150 200
time time

Figure 7.14. Applied strain history and corresponding stress histories in five
grains

plot in Fig. 7.14 shows the corresponding shear stress histories in the (x 1, x2)
plane in five different grains.
The stress histories in Fig. 7.14 represent samples of the random function
r(~, t) in Eq. 7.86, recorded at five different locations~ in the polycrystal. By
following the arguments used to define the process X in Eq. 7.88, the stress sam-
ples in Fig. 7.14 can be used to define a stochastic process for the shear stress in
the polycrystals. The differences between these stress samples is caused by the
uncertainty in the material properties, since the input is deterministic.

Note: The stress analysis is based on the Taylor hypothesis, assuming that there is no inter-
action between grains. Crystal plasticity theory has been used to calculate stress histories
in individual grains. Details on this topic are in Section 8.6.2.2. ~

Example 7.57: Let X be the atomic lattice orientation in a single crystal with two
slip systems subjected to a random shear strain rate y + Y = 1 + q W, where q is
a constant and W is a Gaussian white noise. The atomic lattice orientation process
is the solution of (Section 8.6.2.1 in this book, [12], [14])

X(s) = (2 cos(X(s))- 1) (1 + q W(S))' s ~ 0.

Figure 7.15 shows the evolution of the density f(x; S) for an initial state X(O),....,
u ([0, 2 iT]) and two values of the noise intensity q. The evolution of f (x; n
reaches a steady configuration that depends on the noise intensity. 0
522 Chapter 7. Deterministic Systems and Stochastic Input

q = 0.1 q=2

1.5
1.5

f(x;l;) .f(x;/;)1
0.5
0.5

0 0

0.15

0 0 0 0

Figure 7.15. Density f(x; 0 for X(O) ~ U([O, 2n]) and two noise intensities

Note: Generalities on crystal plasticity are in Section 8.6.2. The internal variable X in this
example is equal to twice the atomic lattice orientation. The general evolution equation for
a single crystal is in Section 8.6.2.1.
Because we assume that the crystal has two slip systems with deterministic prop-
erties, the material does not have random properties. The internal variable X is a random
process since the initial condition and applied strain are uncertain. A

7.4.3 Reliability analysis


Engineering systems are designed for specified functions and level of per-
formance. For example, a bridge has to be sufficiently strong to support its own
weight and traffic loads without excessive deformations and/or vibrations over a
specified time interval. Cars and planes are designed to carry passengers comfort-
ably and safely. Computers controlling air traffic, nuclear power plants, or bank
accounts need to function according to specified performance criteria. The height
of a flood protection system must be equal to a water level that is exceeded on the
average, for example, at most once in a hundred years. It is not possible to man-
ufacture a system that fulfills some design conditions with certainty because of
the randomness in both system properties and actions. However, we can calculate
the probability that these conditions are met during a specified time interval and
modify a design such that this probability is sufficiently large.
Let X(t) be an ~d-valued process defining the state of a system at time
t 2: 0 and denote by r > 0 the projected life of the system, called design life.
The coordinates of X may denote displacements and velocities at some points of
a bridge, stresses at critical points of an aircraft, the current earth temperature, or
the value of a stock. The design life r can vary significantly depending on the
7.4. Applications 523

system. Common values of r are 50 years for a bridge, 20 years for an aircraft,
thousands of years for earth temperature, and weeks or months for a stock price.
The reliability in r is the probability

Ps(r) = P(X(t) ED, 0 :S t :S r), (7.90)

where D is an open subset of JRd, called the safe set, and results from the design
conditions. Our objective is to calculate the reliability P s ( r) or the probability of
failure Pt(r) = 1- Ps(r) in r.

Example 7.58: Let X and X be the displacement and velocity of a linear or non-
linear oscillator that is subjected to a random input. Denote by X = (X 1 =
X, X 2 = X) the state vector of this system. The safe set is

D = {x E JR2 : lxtl <a}, a> 0, or


D = {x E JR2 : Xf +xi < e}, e> 0,

if the design objective is to control the displacement or the total energy of the
oscillator, respectively. <>

Generally, the reliability of an engineering system is found by using Monte


Carlo simulation, crossing theory for stochastic processes, or numerical solution
ofFokker-Planck-Kolmogorovequations. The following sections illustrate the use
of these methods.

7.4.3.1 Crossing theory

Crossing properties of stochastic processes were discussed in Section 3.1 0.1.


We have shown that the frequency at which a state vector X crosses the boundary
aD of the safe set D from D to Dc can be used to bound and approximate Ps(r).
A aD-crossing of X from D to DC is called a D-outcrossing.
Let N(D; r) be a random variable giving the number of D-outcrossings of
X during a time interval [0, r]. The random variable N(D; r) takes values in
{0, 1, ... }. It is assumed that X is such that E[N(D; r)] < oo. If X is a sta-
tionary process, then E[N(D; r)] = r E[N(D; 1)] so that A.(D) = E[N(D; 1)]
represents the average number of D-outcrossings per unit of time, and is called
the mean D-outcrossing rate of X. The corresponding values of E[N(D; 1)] for
the x-upcrossings and x-downcrossings of a real-valued process are denoted by
A.(x)+ and A.(x)-, respectively (Section 3.10.1).

Let A = {X(O) E D} and B = {N(D; r) = 0}. A common approximation for


Ps ( r) in reliability studies is

Ps(r):::: P(A) P(B):::: P(X(O) ED) exp (-E[N(D; r)]). (7.91)


524 Chapter 7. Deterministic Systems and Stochastic Input

Note: The first approximate equality is based on the assumption that the events A and B
are independent. The second approximate equality assumes that the D-outcrossings of X
follow a Poisson process. There is no justification for the first assumption but numerical
results support it. The second assumption is correct asymptotically as the size of D and the
reference period r increase indefinitely ([45], Section 12.2). _.

The probability of failure P f(T) takes values in [Pf,l (r), Pf,u (r)], where

Pf,u(<) =min (1, 1- P(X(O) ED)+ E[N(D; r)]),


Pj,t(T) = (qT p)2 j(qT Jr q), q E JE.m, (7.92)

0::::; t1 < · · · < tm ::::; r, A; = {X(t;) ¢ D}, p; = P(A;), TC;J = P(A; n AJ),
p =(pi, ... , Pm), and 1r = {TCiJ }, i, j = 1, ... , m.
Proof: Let A= {X(O) E D} and B = {N(D; r) = 0}. Then Ps (r) = P(A n B) so that
Pj(<) = P((A n B)c) = P(Ac U Be) .::0 P(Ac) + P(Bc),

where

P(Ac) = P(X(O) E De)= 1- P(X(O) ED),

L L k P(N(D; r) = k) = E[N(D; r)].


00 00

P(Bc) = P(N(D; r) = k) .::0


k=I k=I
These results give the upper bound Pf, u ( r).
For the lower bound Pf,l· note that {X(t) E D, 0 .::0 t .::0 r} c n;n=l Af for any
distinct times t; E [0, r], i = 1, ... , m, so that Ps(<) .::0 P (n;"=l Af) and

Pj(T)::: Pj,t(<) = P (U;"=l A;)= P(U;"=l {X(t;) E De}).


If m = 1, then Pj,t(<) = P(X(t) E De) with the optimal value P(X(t*) E De), where
t* E [0, r] maximizes P(X(t) E De). If X is stationary, the probability P(X(t) E De) is
time invariant so that t can be any time in [0, r ]. If m > 1 is fixed, an optimization algo-
rithm can be used to find the times Cti, ... ,
t~) that maximize Pf,l ( r) in Eq. 7 .92. Con-
sider some fixed distinct times (t1, ... , tm) and define a random variable Z = I:;"= I q; 1Ai'
where q; 's are arbitrary constants. The random variable Z is zero on the subset (u;"=l A; )c
of n and has the properties

P(Z i= 0) .::0 P (uf=l A;) ,


m m
E[Z] = Lq; E [1AJ = Lq; P(A;) = qT p, and
i=I i=I
m m
E[Z 2 J= L q;q1 E[1A;1A1 ]= L q;q1 P(A;nA 1 )=qrirq,
i,J=l i,J=l

where q = (qi, ... , qm) and pare viewed as column vectors. Because
E [(x Z- l{z;<oo}) 2] = x 2 E[Z2]- 2x E[Z] + P(Z i= 0)::: 0, Vx E R,
7.4. Applications 525

we have (E[Z]) 2 - E[Z 2] P(Z "I 0) .::; 0 or P(Z "I 0) ~ (E[Z]) 2 / E[Z 2] which gives

P (r) > P(U'?l A-) > P(Z-'- 0) > (E[Z]) 2 = (qT p) 2


f - 1=l 1 - 1 - E[Z2] qT lC q

for any q E JRm. Tighter lower bounds can be obtained by selecting q such that it maxi-
mizes (qT p) 2 j(qT Hq) [170].
Alternative lower bounds on the failure probability can be obtained. For example,
we have shown that the probability

is a lower bound on P f,l ( r) (Section 2.2.3). •

Example 7.59: Consider an elasto-plastic oscillator with a small damping ratio


~, elastic frequency v > 0, and yield displacement u > 0. The oscillator is
subjected to a broadband, stationary, Gaussian input with mean zero. Let Tel
denote the random time between consecutive plastic excursions of the oscillator
displacement X, that is, the time between consecutive ( -u, u)-outcrossings of X.
If u -+ oo, then Tel -+ oo and X approaches a stationary, narrowband Gaussian
process xel with mean zero as time increases (Section 7.2.1.2). It is assumed that
(1) the input is scaled such that xel has unit-variance and (2) u < oo is such
that Tel is much larger than the duration of the transient displacement following
a plastic excursion. Hence, X can be approximated by X el in Tel and the plastic
excursions are rare events.
Let N ( r) and !:ii be the number of plastic excursions in [0, r] and the plastic
deformation at the end of plastic excursion i = 1, ... , N(r), respectively. The
reliability in r can be defined by Ps(r) = P (.2:::~:) I!:iii_::: 8cr), where 8cr is a
specified limit value. The definition of Ps ( r) is meaningful since 2:::~:) I!:iii is
proportional to the energy dissipated by plastic deformation in [0, r] so that it can
be used as a damage measure [54]. It is assumed that !:i i are iid random variables
that are independent of the counting process N so that, if the laws of the random
variables !:i 1 and N ( r) are known, we can calculate the reliability P s ( r) from the
characteristic function,

L (~l~tl(u)r P(N(r) = n),


00

=
n=O

of 2:::~:) I!:iiI, where ~~~ 1 1 denotes the characteristic function of I!:i 11-
Methods of various complexity and accuracy have been proposed for cal-
culating the laws of !:i 1 and N ( r) [54]. One of the simplest methods for finding
the distribution of !:i 1 uses (1) the equivalence between the kinetic energy of the
526 Chapter 7. Deterministic Systems and Stochastic Input

oscillator mass at the initiation of a plastic excursion and the energy dissipated
by plastic deformation l11 and (2) the distribution of _Xel at the time of a ±u-
crossing of xel, which can be obtained from the properties of this process [ 189].
The assumption that N(r) is a Poisson random variable with intensity A. given by
the mean ( -u, u)-outcrossing of xel is not satisfactory since xel is a narrowband
process so that its ( -u, u)-outcrossings occur in clusters and therefore they can-
not be independent. This difficulty can be overcome by replacing N with another
counting process N, where N(r) gives the number of u-upcrossings in [0, r] of
the envelope process R(t) = [Xe1(t) 2 + xe1(t) 2] 112, where xel denotes the Hilbert
transform of xei (Section 3.9.4.1). The envelope process (1) is larger than IXe11
at all times, (2) is equal to 1xe11 at the zero-crossing times of xe1, (3) has a single
excursion above u for each cluster of (- u, u) -outcrossings of X el, and (4) may
have excursions outside ( -u, u) during which xei does not exit ( -u, u). The ex-
cursions of R which are not associated with ±u-crossings of X el are called empty
excursions, and are not considered in the calculation of P s ( r). The process N can
be approximated by a homogeneous Poisson process with intensity

where 8 = )1- A.i/A.2, Ak = f 0


00
vk g(v) dv, and g denotes the one-sided spec-
tral density of xeJ ([45], Sections 11.7 and 11.8).
To complete the analysis we need to estimate the number of empty envelope
crossings and modify N so that only the qualified envelope excursions, that is, the
envelope crossings that are accompanied by crossings of X el, will be counted. The
marked process N counting only the qualified envelope crossings has the intensity
r AR(u), where r denotes the long run fraction of qualified envelope excursions,
and can be calculated, for example, from

1
r = -- (1- e-n 8 u) or
n8u
-
r-1-2}0 Q>(a)
r [1- ,JiJT
2rr
<l>(y (u
yn(u
1T 2 -
2
a 2 )ju)-
-a 2 )/u
1/2] ,

where y = 8/~. Comparisons of the fraction of empty envelopes 1 - r


in the above expressions with simulation results show that the first expression
becomes less satisfactory as the spectral bandwidth 8 of X increases [54]. <:>

Note: The first expression of r results from the distribution P(R(t) > u) = e-u 2 12 of R,
the expression of the mean crossing rates :AR(u) and :A(u) of Rand xel, respectively, and
the following approximations and assumptions.
LetT be the random duration of an excursion of R above u. The long run average
ofT is P(R(t) > u)/:AR = 1/(u n y0:2) since r P(R(t) > u) approximates the average
time R is above u during a time interval [0, r ], provided that r is large, and AR r gives
the average number of u-upcrossings of R in [0, r]. The average interarrival time between
7.4. Applications 527

consecutive zero crossings of xel is 1/A.(O) = n j ./):2. If T ~ 1/A.(O), it is assumed that


corresponding envelope excursions are qualified. If T < 1/A.(O), empty envelope crossings
are possible, and the probability of an empty envelope crossing can be approximated by
[1/A.(O) - TJ/(1/A.(O)) or 1 - T ./}:ijn by assuming that Tis randomly located on the
time axis and the zeros of X are multiples of 1/A.(O). The first expression of r above
results under the assumption that Tis an exponentially distributed random variable [189].
The second expression of r is based on rather complex arguments regarding properties of
the envelope process [54]. These considerations also allow us to derive the probability of
empty envelope excursions. A

7.4.3.2 First passage time


LetT = inf{t 2: 0 : X(t) ¢. D} be a stopping time giving the first time
when the state X starting at X (0) E D leaves the safe set D. The reliability in
[0, r] is (Eqs. 7.66 and 7.90)

Ps(r) = P(T > r) = 1- Fr(r), (7.93)

where Fr denotes the distribution ofT. We discuss the calculation of Fr by


Monte Carlo simulation, Durbin's formula, and Fokker-Planck's equation.

Monte Carlo simulation. In Section 5.4.2 the Monte Carlo simulation method
was used to estimate the probability that the state of a physical system does not
leave a safe set during a specified time interval. Here we estimate the probability
that a structural system performs according to some design specifications during
an earthquake. The coordinates of the state vector X are displacements and ve-
locities at a finite number of points of a structural system. The safe set D results
from conditions on X. The input is the seismic ground acceleration, which can be
modeled by a process A(t) = ~(t) As(t), t E [0, r], where r > 0 is the duration
of the seismic event, As(t) is a stationary Gaussian process with mean zero and
spectral density s(v), and ~(t) denotes a deterministic modulation function.
Statistical analysis of a large number of seismic ground acceleration records
and plate tectonics considerations have been used to develop expressions for the
spectral density s(v), modulation function ~(t), and earthquake duration r of the
seismic ground acceleration process A(t) at many sites in the US. It has been
shown that s(v), ~(t), and r are completely defined by the moment magnitude M
of the seismic sources in the proximity of a site and the site-source distance R
[96, 139].
Let Pj(m, r) denote the probability that a structural system subjected to
the ground acceleration process A corresponding to a seismic source with (M =
m, R = r) does not satisfy some specified performance requirements, that is,

Pf(m, r) = 1- P(X(t) E D, 0 :S t :S r I M = m, R = r).


The plot of P f(m, r) versus all possible values (m, r) of (M, R) is referred to as
528 Chapter 7. Deterministic Systems and Stochastic Input

the fragility surface. The unconditional probability of failure

Pt =I I Pt(m, r) !M,R(m, r) dm dr

can be used to assess the seismic performance of the system under consideration,
where !M,R denotes the density of (M, R). Estimates of this density for the cen-
tral and eastern United States can be found in [64].

Example 7.60: A steel water tank with dimensions 20.3 x 16.8 x 4.8 ft is located
at the top of a 20 story steel frame building. The tank is supported at its comers
by four vertical legs of length 6.83 ft. It is assumed that (1) the steel frames of
the building remain linear and elastic during the seismic event but the tank legs
may yield, (2) the building and the tank oscillate in a vertical plane containing
the direction of the seismic ground motion, (3) there is no feedback from the tank
to the building, (4) the tank can be modeled as a rigid body, and (5) the design
condition for the tank requires that the largest deformation of its legs be smaller
than a critical value dcr·
Let Abt be the absolute acceleration process at the connection point be-
tween the building and the tank, so that this process represents the seismic input
to the tank. The calculation of fragility surfaces by Monte Carlo simulation in-
volves three steps. First, a pair of values (m, r) needs to be selected for (M, R)
and ns samples of Abt corresponding to these values of (M, R) have to be gen-
erated. Second, the dynamic response of the tank has to be calculated for each
sample of Abt. Third, the probability P f (m, r) has to be estimated by, for ex-
ample, the ratio n; j ns, where n; denotes the number of tank response samples
that exceed a critical displacement dcr· Figure 7.16 shows fragility surfaces for
the tank corresponding to several values of dcr and two different designs, which
are sketched in the figure. The fragility surfaces in Fig. 7.16 corresponding to the
same critical value dcr differ significantly for the two designs illustrated in this fig-
ure. Fragility surfaces of the type shown in Fig. 7.16 and information on the cost
of various design alternatives can be used in earthquake engineering to select an
optimal repair strategy for an existing system and/or develop efficient new seismic
designs.<>
Note: The building's steel frame can be modeled by a system with r = 20 translation
degrees of freedom. If the system has classical modes of vibration, the displacement at
the structure-tank connection point is Y(t) = :Lr=l ¢; Q;(t), where the constants¢; can
be obtained from the modal shapes of the building and the functions Q; denote modal
coordinates. The modal coordinates satisfy the differential equations
.. . 2
Q; (t) + 2 ~i v; Q; (t) + V; Q; (t) = -y; A(t), t :::: 0,

with zero initial conditions, where ~i and v; denote modal damping ratios and frequencies,
respectively, and Yi are modal participation factors (Examples 7.5 and 7.6). We have

Q;(t)= fo'e;(t-s)[-y;A(s)]ds, t:=::O,


7.4. Applications 529

dcr =0.5%

0.5 0.5

8
0 4
dcr =2.0%

0.5 0.5

8 8
0 4
dcr =6.0%

0.5 0.5

8 8
0 4 m m

Figure 7.16. Fragility surfaces for several values of dcr

where ()i is the unit impulse response function for mode i (Example 7.5). Let
n
A(t) = g{t) L Uk [ Ak cos(vk t) + Bk sin(vk t)]
k=l

be an approximate representation of the seismic ground acceleration for Monte Carlo sim-
ulation, where uk, Vk > 0 are constants and Ak, Bk are independent N(O, 1) variables
(Section 5.3.1.1). Then
n
Qi(t) = L uk [Auc,ik(t) + Bus,ik(t)] and
k=l

Y(t) = t
k=l
uk [Ak (t. z=l
tPi 8c,ik(t)) + Bk (t.
z=l
tPi 8s,ik(t))]

where

8c ik(t) = -Yi t ()i (t- s) ~(s) cos(vk s) ds,


· lo
8s ik(t) = -Yi t ()i (t- s) Hs) sin(vk s) ds.
' lo
The absolute acceleration at the structure-tank connection point is Abt(t) = A(t) + Y(t).
530 Chapter 7. Deterministic Systems and Stochastic Input

The stress -strain relationship for the tank legs considered in Fig. 7.16 is linear from
(0,0) to (0.0012, 36 ksi), constant from (0.0012, 36 ksi) to (0.1356, 36 ksi), linear from
(0.1356, 36 ksi) to (0.585, 60 ksi), and constant for larger strains, where the first and second
numbers in parentheses denote strains and stresses, respectively. &

Durbin's formula. We have given in Section 3.10.2 formulas for the densities of
T and (T, X(T)), where X is a real-valued process, T = inf{t > 0 : X (t) :=:: u },
and u E Ilt These formulas are repeated in the following equation.

The joint density of the random variables (T, X(T)) and the density of T are
fr.x(T)(t, z) = E [l(x(s)<u,O:;:s<t} X(t)+ I X(t) = u, X(t) = z] fx(t),X(t/u, z),
fr(t) = E [l(X(s)<u,O:;:s<t}X(t)+ I X(t) = u] /X(tj(U). (7.94)

We show that Eq. 7.94 can be used to bound the density ofT. Consider n
distinct times 0 < t1 < · · · < tn < tin [0, t] and the events A = {X(t) < u <
X(t + M)} and Bn = n?=l {X(ti) < u}, where M > 0 is a small time interval.
Then fr can be bounded by the functions in Eq. 7 .95. The bounds on f T in this
equation have been used to develop computer codes for calculating the density of
T approximately [159].

The density of T satisfies the inequalities

fr (t) :=:: A.(u; t)+

-lot E [1£T1=s} X(s)+ X(t)+ I X(s) J


= X(t) = u /x(t),X(s)(u, u) ds,

fr(t):::: E [lB. X(t)+ I X(t) = u J /x(t)(u):::: A.(u; t)+. (7.95)

Proof: The upper bound on fT results from Eq. 7.94 since 1(X(s)<u,0:::S<t} ::::: 1Bn ::::: 1.
Let Tk> k = 1, 2, ... , denote the time of the k'th u-upcrossing of X. The probability
that this upcrossing occurs in [t, t+D.t] can be approximated by frk (t) D.t, where fTk is the
density of Tk. Hence, the mean u-upcrossing rate A.(u; t)+ of X at timet is I:~ 1 fn (t)
so that

fT1<t> = A.(u; t)+- I: Jn<t> = A.(u; t)+- I:


00 00 lot !r n<s. t)ds.
k=2 k=2 ° . 1
7.4. Applications 531

We also have

= f lot
k=2
E [1(Tt=s} 1{Tk=tl X(s)+ X(t)+ I X(s) = X(t) = u] fx(s),X(t)(u, u)ds

~ fo' E [ ,,T, ~, 1 x,,,+ (E • 1T,~n x,,,+) 1 x ,,, ~ x u1 ~ "J tx1, 1.x1o'"· "' J,
::=:lot E [l{Tt=s} X(s)+ X(t)+ I X(s) = X(t) = u] fx(s),X(t)(u, u) ds

since I:~ 2 1(Tk=t} X(t)+ = (1- 1{Tt=tJ) X(t)+ ::=: X(t)+. The expression of fT1 and
the above inequality give the lower bound on fT in Eq. 7.95. •

Fokker-Planck equation. In Section 7.3.1.3 we have established the boundary


conditions that the Fokker-Planck equation must satisfy to deliver the distribution
and/or moments of the stopping timeT in Eq. 7.93. Generally, the Fokker-Planck
equation cannot be solved analytically. Numerical solutions have been obtained
for low dimension state vectors (d .:::; 3). The following examples illustrate how
the distribution ofT can be obtained from the solution of the Fokker-Planck equa-
tion.

Example 7.61: Let X = B be a Brownian motion and let

Pt(t) = 1- P( sup B(s) <a)


O:::sg

be the probability of failure for a safe set D = (-oo, a), a > 0. Figure 7.17
shows the evolution in time of the exact and approximate probability of failure
Pt (t) for a = 1 and X (0) = 0. The approximate solutions correspond to Monte
Carlo simulation and a finite difference solution of the Fokker-Planck equation for
the density of X (t) I (X (0) = 0) with an absorbing boundary at x =a. <>
Note: LetT= inf{t > 0: X(t) if. ( -oo, a)}. The distribution of this stopping time is

P(T:::: t) = 2[1- <I> (a!0)],


where <I> denotes the distribution of N(O, 1) ([154], Section 6.2). By the definitions ofT
and Pt(t) we have P(T ::=: t) = Pt(t).
The Monte Carlo estimate of Pt(t) is given by the fraction of samples of X that
have left D = ( -oo, a) at least once in [0, t].
A finite difference representation of the type in Example 7.37 has been used to solve
the Fok:ker-Planck equation aflat = (1/2) a2 f/ax 2 with the initial and boundary condi-
tions f(x; 0 I 0; 0) = 8(x) and f(a; t I 0; 0) = 0, t > 0, respectively. The probability of
failure has been calculated from Pt(t) = 1- J!:_ 00 f(x; t I 0; 0) dx . .&
532 Chapter 7. Deterministic Systems and Stochastic Input

Fokker-Pianck solution
o.a finite difference)

0.7

0.6

0.5

0.1

time

Figure 7.17. Evolution of the exact and approximate probability of failure P f(t)
for X = B and a safe set ( -oo, 1)

Example7.62: LetXbethesolutionofdX(t) = (s X(t)- X(t) 3 )dt+r:rdB(t),


s,
t ::: 0, where a are constants and B is a Brownian motion. Consider a safe set
D = (-a, a) and an initial state X (0) = xo E D, where a > 0 is a constant.
Figure 7.18 shows the evolution of the probability of failure P f(t) for = ±1, s
a= 1.6 a= 1.1
0.8 ~= 1 0.8 ~=-1

0.6 0.6

0.4 0.4

Fokker-Pianck Fokker-Planck

2 4 10 2 4 10

Figure 7.18. Probability of failure P f(t)) for X defined by dX (t) = (±X (t) -
X (t) 3 ) dt + dB(t) and a safe set D = (-a, a)

a = 1, xo = 0, a = 1.6, and a = 1.1. The finite difference and the Monte Carlo
simulation methods have been used for calculations. <>

Note: The initial and boundary conditions for the finite difference solution of the Fokker-
Planck equation are f(x; 0 I xo; 0) = 8(x - xo) and f(x; t I xo; 0) = 0 fort > 0,
respectively. The condition f(x; t I xo; 0) = 0, t > 0, defines an absorbing boundary for
the Fokker-Planck equation associated with the diffusion process X. &
7.4. Applications 533

Example 7.63: Let X be the solution of

X(t) + 2e 2 vo X(t) + v5 X(t) = eY(t), t > 0,

s,
where v, e > 0 are some constants and Y is a stationary Gaussian process
with mean zero and a broadband spectral density s(v). The envelope A(t) =
(X(t) 2 + X(t) 2 fv5) 112 of X can be approximated by a diffusion process (Exam-
ples 7.48 and 7.51). Let Pf(t) = 1 - P(maxo9 9 1X(t)l <a) be the probability
of failure for the state X in the above equation. An approximation of this proba-
bility offailure is Pf(t) = 1 - P(maxo 9 9 A(t) < a).
Figure 7.19 shows Monte Carlo estimates of P f(t) and Pf(t) obtained from

Pjt) (Fokker-Pianck)
0.25

0.2

0.15

0.1
Pjt) (Monte Carlo)
0.05

0 oL_~-=~~~~~--~5---L--~--~--~~10
time

Figure 7.19. Evolution of probabilities P 1 and P1 in time

1,000 samples of X and A, respectively. The figure also shows the probability
Pf (t) obtained from the finite difference solution of the Fok:ker-Planck equation
for the density of A. Numerical results are fore = 0.283, v = 1, s(vo) = 3.978,
and a = 5. That Pf(t) is larger than Pf(t) is consistent with our discussion in
Example 7.59 on the relationship between crossings of A and X.
Figure 7.20 shows the evolution of the joint density of (X, X) in time for
the case in which the driving noise is Y = W, where W is a zero-mean stationary
Gaussian white noise with spectral density sw(v) = s(vo), v E lit The results in
this figure have been obtained by a finite difference solution of the Fokker-Planck
equation for the density (X, X) [179]. ¢

Note: Let X = (XI = X, X2 = X) be the state of the above oscillator driven by W


rather than Y. Hence, X is an JR2 -valued diffusion process. The density f(x; t I xo; 0) of
534 Chapter 7. Deterministic Systems and Stochastic Input

Figure 7.20. Evolution of joint density of (X, X) in time (adapted from [179])

X(t) I (X(O) = xo) satisfies the Fokk:er-Planck equation


aj a(x2f) a[(v5xl +2s 2 vox2)!J a2j
- =--- + +ns(vo) - .
at ax! ax2 ax~

The probability of failure of this oscillator for the safe set D = (-a, a) x lR is given by
Pj(t) = 1- f!:_a f~oo f(x; t I xo; O)dx1 dx2, where f(x; t I xo; 0) isthesolutionofthe
above Fokk:er-Planck equation with the initial and boundary conditions f(x; 0 I xo; 0) =
8(x - xo) and f(±a, x2; t I xo; 0) = 0 fort > 0, x2 E R, and xo E D. A

7.4.4 Finance
The discussion in this section is based on Chapter 4 in [131]. Useful refer-
ences on this topic are [60], [126], and [151] (Section 10.16).
Let X (t) denote the price of a stock at time t. The relative return in a small
time interval [t, t + L\t], L\t > 0, can be modeled by

X(t + Ll.t)- X(t)


- - - - - - - = c Ll.t +a n01se ,
" . ,
X(t)

where c > 0 and a > 0 are the mean return rate and volatility, respectively.
Bond, money market, savings account, and other similar assets are viewed as
riskless so that their evolution is given by a deterministic differential equation.
7.4. Applications 535

The following models are considered for the stock and bond prices and for the
value of a portfolio.

Stock price: dX (t) = c X (t) dt +a X (t) dB(t).


Bond price: df3(t) = r f3(t) dt.
Portfolio value for trading strategy <Us(t), Ub(t)):
V(t) = Us(t) X(t) + Ub(t) {3(t).
Note: Because we use Gaussian white noise to model market volatility, the value of a
stock at time t, X (t) = X (0) e<c-O.S a 2 ) t+a B(t), is a geometric Brownian motion. The
evolution of the price of a bond is given by {3(t) = {3(0) r! t, where r > 0 denotes the
return rate. Alternative models have been proposed for the evolution of stock prices, for
example, stochastic differential equations driven by Levy or Poisson white noise and the
ARCH and GARCH processes ([126], Chapter 10).
The pair (Us (t), Ub (t)) is called a trading strategy and gives the number of shares
held in stocks and bonds at time t, respectively. The processes Us and Ub are assumed to
be u (B(s), 0 ::::: s ::::: t)-adapted. The above expression for V gives the value of a portfolio
at timet corresponding to the trading strategy (Us(t), Ub(t)). The processes Us and Ub
can take positive and negative values. A negative value of Us (t) means that stocks are sold
at time t, that is, short sale. A negative value of Ub means that money was borrowed at the
bond interest rate . .&
Options are contracts agreed at time t = 0 that entitle one to buy one share
of stock for a specified price k, called strike or exercise price, at or prior to a time
t = r. If an option can be exercised at time t = r or at any time t :::: r, it is called
a European call option or an American call option, respectively. The holder of
a call option does not have to exercise it. An option to sell a stock at a specified
price k at or until a time of maturity t = r is called a put. European and American
put options are defined in the same way as the corresponding call options. The
purchaser of a European call option can make a profit of max(O, X(r) - k) by
buying the stock at the strike price k and selling immediately at the market value
X(r). The purchaser of a European put makes a profit max(O, k- X(r)). Hence,
the European call and put options with strike price k are the random variables
max(O, X(r) - k) and max(O, k- X(r)), respectively. There are many other
types of options, for example, Asian and Russian options [60].
The following assumptions are used in our discussion:

• The options are European calls, there is no cost for transactions, and riskless
assets have a constant rate of return.
• The market is rational.
• The self-financing condition dV (t) = Us (t) dX (t) + Ub(t) df3(t) holds.
Note: In a rational market there is no opportunity for arbitrage, that is, riskless strategies
guaranteeing a profit do not exist. The self-financing condition implies that changes in a
portfolio's value are caused solely by changes in the prices of stocks and bonds. There is
no infusion of new capital in or consumption of wealth from a portfolio. .&
536 Chapter 7. Deterministic Systems and Stochastic Input

Consider an investor starting a portfolio, that is, buying a number of stocks


and bonds at time t = 0. The invested capital and the initial balance between
stocks and bonds is a personal decision. The price of riskless assets is not in
question. However, the price the investor is willing to pay for a stock at time
t = 0 is in question because the time evolution of stock prices is random. Our
objectives are to find (1) a rational price X(O) for a stock at timet = 0 and (2) a
self-financing strategy yielding a payoff max(O, X(r) - k), that is, the payoff
corresponding to a European call option for this stock. If these objectives are
satisfied we say that the investor is hedged. The Black-Scholes formula provides
a self-financing strategy yielding the desired payoff. Let u(X(t), r - t) = V(t)
be the value of a portfolio V at time t. Because we require that V at maturity be
max(O, X ( r) - k), the function u must satisfy the terminal condition for V ( r) =
u(X(r), 0) = (X(r)- k)+ = max(O, X(r)- k).

The Black-Scboles formula states that the rational price for a European call
option at time t = 0 and exercise price k is V (0), where

V(t) = u(X(t), r - t) = X(t) <I> (g(X(t), r - t))


- ke-r(r:-t) <I> (h(X(t), 'l'- t)), (7.96)

g(x,t)=
ln(xj k) + (r + a 2 /2) t , h(x,t)=g(x,t)-at 112 , (7.97)
112
at

and <I> denotes the distribution of N(O, 1).

The process V(t) in Eqs. 7.96 and 7.97 is the value of a portfolio at time t
defined by the self-financing strategy

U ( ) _ au(X (t), r - t) ) u(X(t), r - t)- Us(t) X(t)


(7.98)
s t -
ax ' Ub(t = {J(t) .

Proof: Consider the JR2-valued diffusion process X = (Xt, Xz) defined by Xt = X


and dXz(t) = -dt with the initial condition Xz(O) = t. The It() formula applied to the
function V(t) = u(Xt(t), Xz(t)) = u(X(t), t - t) gives (Eq. 7.3)

V(t)- V(O) = f [c X(s) u,l (X(s), t- s)- u,z(X(s), t- s) J ds

u21ot X(s) 2 u 11 (X(s), t - s) ds +


+-
2 0 '
lot
0
u X(s) u 1(X(s),
'
t- s) dB(s),

where u, 1 and u, 11 denote the first and second partial derivatives of u with respect to x1
and u,z is the partial derivative of u relative to xz.
7.4. Applications 537

The self-financing strategy and the portfolio value

dV(t) = U8 (t)dX(t) + Ub(t)df3(t)


= [c U8 (t) X(t) + r Ub(t) {3(t)] dt + u U8 (t) X(t) dB(t) and
V(t)- U8 (t) X(t)
Ub(t) = {3(t)

imply
dV(t) = [(c- r) U8 (t) X(t) + r V(t)] dt +u U8 (t) X(t) dB(t)
so that

V(t)- V(O) ={ [(c- r) U8 (s) X(s) + r V(s)] ds +u { Us(s) X(s) dB(s).

Because the above two expressions for V(t)- V(O) must coincide, we have
(c- r) U8 (t) X(t) + r u(X(t), -r- t)
(J2
= c X(t) u ' 1(X(t), -r- t)- u ' 2 (X(t), -r- t) +-
2
X(t) 2 u 11 (X(t), -r- t),
'
U8 (t) = U,l (X(t), t ' - t).

The first equation, with Us in the second equation above, gives


(J2
u 2 (X(t),-r-t)
'
= -X(t)
2
2 u 11 (X(t),-r-t)+rX(t)u t(X(t),-r-t)-ru(X(t),-r-t).
' '
This identity provides a partial differential equation,

u2 2
u 2 (x, s)
!
= -2 x u 11 (x, s)
'
+ r x u ' 1(x, s) - r u (x, s)

for function u defined for x > 0 and s E [0, -r] with the deterministic terminal condition
u (x, 0)= (x - k) +. The solution of this differential equation is
u(x, t) = x <l>(g(x, t))- ke-rt <l>(h(x, t))
with the notation in Eq. 7.97. •

Example 7.64: Let c = 1, u = 0.5, and r = 0.5 be the mean return rate, volatil-
ity, and return rate, respectively. Suppose that (1) current stock and bond prices
are X (0) = 6 and f3 (0) = 6 and (2) a European call option is available at a strike
price k at time r = 1. The question is whether the current offering of X (0) = 6 is
acceptable.
Figure 7.21 shows ten samples of the self-financing strategies Us and Ub in
Eq. 7.98 and the portfolio value V in Eq. 7.96 fork = 10 and ten samples of V
for k = 20. These samples suggest that it is unlikely to make any profit at these
strike prices. Estimates of the probability Pprf of making a profit and the expected
profit eprr are Pprf = 0.03 and eprf = 2.2 fork = 10 and Pprf ~ 0 and eprf = 2.1
fork = 20. If k = 5, the estimates are Pprf = 0.55 and eprf = 7.03, so that we
have a better then 50% chance to profit. The above estimates have been calculated
from 500 samples. <>
538 Chapter 7. Deterministic Systems and Stochastic Input

-4

-6
Samples of ub (k=lO)
-8L_~--~--~--~--~
0.2 0.4 0.6 0.8 0 0.2 0.4 0.6 0.8

60,-~---------------, 50,-~-----------,

50 Samples of V (k=lO) 40 Samples of V (k=20)


40 30

30 20

20 10

10

-10'---~-~-~-~--'
0 0.2 0.4 o.6tim&·s

Figure 7.21. Samples of trading strategies, stock price, and portfolio value

Example 7.65: Let X(t) = e-rt X(t) and V(t) = e-rt V(t) be the discounted
stock price and portfolio value at a timet E [0, r], where X and V are defined
on a probability space (Q, :F, P) with a filtration :Fr = a(B(s), 0 ::::; s ::::; t)
and B is the Brownian motion in the defining equation for X. The processes
X and V satisfy the differential equations dX(t) =a X(t)dB(t) and dV(t) =
Us(t) dX(t), where B is a Brownian motion under a probability measure Q de-
fined by (dQ/dP)(t) = exp[-(cja)B(t)- (c 2 j2a)t]. The process Vis an
:Fr- martingale and

V(t) = EQ [e-r(-r-t) h(X(r)) I :Fr],


where h(X(r)) = (X(r) - k)+ for a European call option and E Q denotes ex-
pectation with respect to probability measure Q. The above expression for V(t)
gives the Black-Scholes formula.<>
Proof: Ito's formula and the self-financing condition give
dX(t) = -r e-r t X(t) + e-r t dX(t) =a X(t) [(c- a) dt +a dB(t)],
dV(t) = Us (t) ( -r e-r t X(t) + e-r t dX(t)) = Us (t) dX(t).

That B(t) = (c-r) tfa+B(t) is a Brownian motion under Q was shown in Section 5.4.2.2.
The process
V(t) = V(O) +a lot U (s)X(s)dB(s)
8
7.4. Applications 539

is an Fr-martingale under probability Q since it is defined by a stochastic integral whose


integrator is a Brownian motion under Q and the process Us X is .1'1-adapted and has
continuous samples. The martingale property of V implies V(t) = EQ[V('r) I Ftl for
t E [0, r] so that

V(t) = e-rt V(t) = EQ [e-n h(X(r)) I Fr],


as stated.
The above expression of V(t) and the relationships

X(r) = X(t)e-(a 2 /2)(-r-t)+a(B(r)-B(t)),


- -

X(r) = X(t)e(r-a 2 /2)(-r-t)+a(B(r)-B(t)),


- -

give

V(t) = EQ [e-re h(X(r)) I Fr] = EQ [e-re h (x(t)e<r-a2/2)!9+a(B(r)-B(t))) I Fr]


with the notation(} = r - t. Because X(t) E Fr and B(r)- B(t) is independent of Fr,
we have V(t) = ~(X(t), t), where

~(x, t) =e-re k h (xe<r-a2j2)!9+av'Bz) cp(z)dz.

This function coincides with u in Eqs. 7.96 and 7.97 for a European call option ([131],
Section 4.2.2). •

7.4.5 Estimation
The evolution of the state X of many physical systems can only be modeled
approximately because of complexity and/or limited understanding. Generally,
the resulting approximate models are satisfactory for short term predictions of fu-
ture values of X but cannot be used for long term predictions of a system state.
Measurements of the actual state of a system can be used to update X and re-
duce errors caused by approximations in modeling. The Kalman-Bucy filtering
approach is an efficient tool for propagating the state X based on its model and
measurements. We present the essentials of this approach starting with time in-
variant estimation problems and then extending these results to time dependent
linear problems with discrete and continuous time.

7.4.5.1 Time invariant problems


Let X be an unknown JRd -valued random variable and let Z E JRd' be a
measurement of X. This measurement may not include all the coordinates of X
and may be altered by noise. Our objective is to find an estimator X of X that is
optimal in some sense and is based on the observation vector Z. The following
formulas use the notations JLx = E[X], JLz = E[Z], and Yuv for the covariance
matrix of a random vector (U, V).
540 Chapter 7. Deterministic Systems and Stochastic Input

• The best mean square (m.s.) estimator of X given Z is the conditional ex-
pectation X= g(Z) = E [X I .rZJ = E [X I Z].
• The best m.s. linear estimator of X given Z is
-1
(7.99)
A

X=JLx+YxzYzz (Z-JLz),

and has the error covariance matrix

y=E (X-X)(X-X)
[
A A T] =Yxx-YxzYzz-1 Yzx· (7.100)

Proof: Let X = g(Z) be an estimator of X. The m.s. error of this estimator is

E [(X- g(Z)/ (X- g(Z))]

= r [ r (x- g(z)/ (x- g(z)) fxlz(X I z) dx] fz(Z) dz


lJE.d' lJE.d
= kd' E [ex- g(Z))T (X- g(Z)) z = z] fz(z)dz 1

where X, g(Z), and Z are column vectors, fxlz is the density of X I Z, and fz denotes the
density of Z. The expectation in the above integral,

E[XT X I Z = z]- 2g(z)T E[X I Z = z] + g(z)T g(z),


is positive, and is minimized by g(z) = E[X I Z = z] for each z E JRCI'. Hence, X =
E [X I Z] is the best m.s. estimator, and its error X - X = X - E[X I Z] is orthogonal to
the a-field FZ generated by Z (Section 2.17.2).
Assume that the first two moments of (X, Z) exist and are finite. Consider the linear
estimator X =a Z + b, where a and bare unknown matrices. The requirements that (1) X
is unbiased, that is, E[X] = E[X] = JLx, and (2) the estimation error X- X is orthogonal
on FZ give Eqs. 7.99 and 7 .100. We note that the error covariance matrix can be calculated
from the second moment properties of (X, Z) prior to taking any observations. If X and Z
are not correlated, that is, y xz = 0, then X = JLx is the best m.s. linear estimator of X and
its error covariance matrix is y = y xx. In this case, Z provides no information on X. •

Example 7.66: Suppose that Z = h X+ Vis the observation vector, where his a
known deterministic matrix and Vis an JRd' -valued observation noise with mean
JLv and covariance matrix y vv· It is assumed that X and V are not correlated. The
best m.s. linear estimator of X and its error covariance matrix are
T T -1
X=JLx+Yxxh (hYxxh +Yvv) (Z-JLz),
A

Y = Yxx- Yxx hT (h Yxx hT + Yvv)- 1 h YxX' (7.101)

If d = d' = 1, h = 1, and 1-Lx = 1-Lz = 0, then

X= 1 + Yvv/Yxx z and y = Yxx (1- 1 )


1 + Yvv/Yxx
:S Yxx·
7.4. Applications 541

The quality of the estimation depends on the noise to signal ratio Yvv!Yxx· The er-
ror covariance matrix is zero for perfect measurements (Yvv = 0) and approaches
Yxx as Yvv!Yxx -+ oo. 0

Proof: The above results follow by elementary calculations from the expressions for the
estimator X in Eq. 7.99 and of its error covariance matrix. •

7.4.5.2 Time dependent problems. Discrete time

Let X(t) E ntd, t = 0, 1, ... , be the state of a system that is observed at


each time step. The defining equations for X and the observation vector Z E ~ d'
are

X(t + 1) = a(t) X(t) + b(t) W(t),


Z(t) = h(t) X(t) + V(t), (7.102)

where E[X(O)] = 0, E[W(t)] = E[V(t)] = 0 fort 2: 0, E[W(t) W(s) 7 ] =


q(t)81s, E[V(t) V(s) 7 ] = r(t)8 1s, E[X(O)X(0) 7 ] =
1/1, E[W(t) V(s) 7 ] 0 =
for t, s ::: 0, there is no correlation between the random process V, W and the
random variable X(O), and a, b, h are specified deterministic matrices. The as-
sumption that X (0), V, and W have mean zero is not restrictive. If this assumption
is not valid, the original equation for X can be replaced with an equation for the
state X(t)- E[X(t)], which has mean zero.
Our objectives are to determine (1) the best m.s.linear estimator

X(t Is)= E [X(t) I Z(l), ... , Z(s)] (7.103)

of X (t) given the observation vector (Z(l), ... , Z(s)) and (2) the error covariance
matrix,

y(t Is)= E[(X(t)- X(t Is)) (X(t)- X(t I s)) 7 ], (7.104)

for X(t I s). The solution of this problem is referred to as filtering, prediction,
and smoothing if t = s, t > s, and t < s, respectively. The definitions in
Eqs. 7.103 and 7.104 imply X(O I 0) = E[X(O)] = 0 and y(O I 0) = 1/f.
We note that X(t I s) andy (t I s) can be obtained from Eqs. 7.99 and 7 .100.
However, the approach is inefficient for a large observation vector ( Z ( 1), ... , Z (s)).
The following formulation provides a procedure for calculating X(t I s) and
y (t I s) efficiently. We give equations for the one-step predictor X(t I t - 1),
the estimator X(t I t), and their error covariance matrices.
542 Chapter 7. Deterministic Systems and Stochastic Input

The best m.s.Iinear one-step predictor and the best m.s. estimator of X sat-
isfy the equations

X(t 1 t - 1) = a(t- 1) X(t- 1 1 t - 1),

X(t 1 t) =xu 1 t -1)+k(t) (zct) -h(t)X(t 1 t -1)), (7.105)

where X(O I 0) = E[X(O)] = 0, y(O I 0) = 1/1,


y(t I t - 1) = a(t- 1) y(t- 1 I t - 1) a(t- 1) 7
+b(t -1)q(t -1)b(t -1) 7 ,
y(t I t) = (i - k(t) h(t)) y(t I t - 1),
7 rl
and

k(t) = y(t I t - 1) h(t) 7 ( h(t) y(t I t - 1) h(t) + r(t)


(7.106)

Proof: The first two formulas in Eq. 7.105 are based on the state propagation equation and
the estimation theory. The formulas in Eq. 7.106 are the Kalman-Bucy equations.
The first formula in E9. 7.105 results from Eq. 7.102 applied in the time interval
[t- 1, t], that gives a(t- 1) X(t- 1 I t- 1) + b(t- 1) W(t- 1) for the initial condition
X(t- 1 I t -1). Hence, the best m.s. estimator of X(t) is X(t I t- 1) = a(t -1) X(t- 1 I
t - 1) since W(t- 1) and X(t- 1) are uncorrelated and W has mean zero. If we assume
that the noise processes V and W and the random variable X (0) are mutually independent,
then

X(t It- 1) = E[X(t) I Z(1), ... , Z(t- 1)]


= E[a(t- 1) X(t- 1) + b(t- 1) W(t- 1) I Z(l), ... , Z(t- 1)]
= a(t- 1) X(t- 1 t - 1) 1

since W(t) is independent of Ft =a (Z(l), ... , Z(t - 1)) so that

E[b(t) W(t) I Z(1), ... , Z(t- 1)] = b(t) E[W(t)] = 0.


As expected, we obtained the first formula in Eq. 7.105.
The expression for X(t I t) is based on the following property. Let X E ~ and
Z i E lRd', i = 1, 2, be random vectors in L2 with mean zero. Denote the best m.s. linear
estimator of X given an observation vector U E ~' by X Iu. Straightforward calculations
show that

where N2 = z2- i2 lzl = z2- Yz2,ZI y~~ZI ZJ, called the innovation vector, includes
only information in z2 that is not in zl ([166], Section 6.2) and, for example, Yz2.Z!
denotes the covariance matrix of (Z2, Z1). The error covariance matrix of X lcz 1 ,z 2 ) is
7.4. Applications 543

the error covariance matrix of X Iz 1 less a positive definite matrix of the type in Eq. 7.100
corresponding to the information content in N2, that is,
-1
Yx
l(zt.Zzl
= Yx lzt - Yx,n 2 Yn 2 ,n 2 Yn 2 ,x·

The formulas in Eqs. 7.105 and 7.106 result by replacing Z1, Z2, and i2 lz 1 in the above
equations by (Z(l), ... , Z(t- 1)), Z(t), and X(t It- 1), respectively. •

To find X (t I t - 1), X (t I t), and their properties, we need to perform the


following operations. First, we need to solve the Kalman-Bucy equations for the
initial condition y(O I 0) = 1/1. The formulas in Eq. 7.106 give y(l I 0), k(1),
and y(l I 1) fort = 1. Now we can apply the Kalman-Bucy equations fort = 2
to find y(2 I 1), k(2), and y(2 I 2) and so on. Note that the covariance matrices
y(t I t - 1) and y(t I t) and the ~ain k(t) can be c:Iculated independently of the
observations. Second, calculate X(t I t - 1) and X(t I t) from Eq. 7.105 for the
initial condition X(O I 0) = 0 and the given observation vector (Z(l), Z(2), ... ).

Example 7.67: Let d = d' = 1, b = h = 1, and Ia I < 1, q > 0, r > 0 be some


constants (Eq. 7.102). The error covariance matrix of X(t It) is (Eq. 7.106)

a 2 y(t- 1 It- 1) + q ) ( 2
y(t I t) = ( 1 - a y(t- 1 I t- 1) + q) .
a y(t- 1 It- 1) + q + r
2

The error covariance matrix y (t I t) is zero for perfect measurements (r = 0) and


approaches a 2 y(t- 1 I t - 1) + q as r --+ oo. The measurements do not help in
the latter case since they are polluted by a large noise. <>

7.4.5.3 Time dependent problems. Continuous time


Let X (t), t :=::: 0, be an JRd -valued stochastic process in L2 that is observed
continuously. The defining equations for X and the observation process Z are

X(t) = a(t) X(t) + b(t) W(t),


Z(t) = h(t) X(t) + V(t), (7.107)
where X(O), W(t), and V(s) are uncorrelated with mean zero fort, s :=::: 0,
E[X(O)X(O)T] = 1/1, E[W(t) W(s)T] = q(t)8(t- s), and E[V(t) V(s)T] =
r(t) 8(t - s). The processes Wand V are white noise processes defined heuristi-
cally by their first two moments (Eq. 7.13).
Our objectives are to find (1) the best m.s. linear estimator X (t) = X(t I t)
of X(t) given the observations (Z(s), 0 :=: s :=: t) and (2) the error covariance
matrix

y(t) = y(t It)= E[(X(t)- X(t)) (X(t)- X(t))T] (7.108)


of X(t). We give equations for the evolution of X(t) and the propagation of the
error covariance matrix y ( t).
544 Chapter 7. Deterministic Systems and Stochastic Input

The best m.s. linear estimator of X is the solution of


d
+ k(t) N(t),
A A

dt X(t) = a(t) X(t) (7.109)

where X(0) = 0, the innovation N is a zero-mean white noise process with


E[N(t) N(s)T] = r(t) 8(t- s), andy satisfies the Riccati equation

y(t)= a(t) y(t) + y(t) a(t)T + b(t) q(t) b(tl- y(t) h(tl r(t)- 1 h(t) y(t)
where k(t) = y(t) h(t) r(t)- 1 and y(O) = t. (7.110)

Note: We have seen that the bestm.s.linear estimator X(t I t) of the state X(t) of a discrete
time system depends on observations linearly. We consider a similar estimator for the state
X in Eq. 7.107, that is, we take

X(t) = X(t I t) = f p(t, s) Z(s) ds,

where the matrix p(t, s) will be determined from the condition that it minimizes the co-
variance of the error X(t)- X(t). The above definition of X implies X(O) = 0 so that
y(O) = t (Eq. 7.108).
Under the assumption that W inEq. 7.107 is a Gaussian white noise, it can be shown
that p satisfies the Wiener-Hopf equation ([33], Chapter 4, [135], Chapter 6))

E[X(t) Z(sl] = f p(t, u) E[Z(u) Z(s)T] du, s:::: t.

Because E [ X(t) Z(s)T] = Jci p(t, u) E[Z(u) Z(s)T] du by the above definition of X,
the Wiener-Hopf equation implies E [ ( X(t)- X(t)) Z(s)T] = 0 for all s ::=:: t so that
X(t) = E [X(t) I .1'tl, where .1't = u(Z(s), 0 ::=:: s ::=:: t).
A heuristic proof of Eqs. 7.109 and 7.110 can be found in [166] (Section 7.2). The
proof is based on the finite difference approximation

X(t + !:l.t):::::: (i + a(t) !:l.t) X(t) + b(t)!:. W(t)


of Eq. 7.107, where !:l.t > 0 is a time step, i denotes the identity matrix, and f!. W(t) =
W(t + !:l.t)- W(t). This approximation has the functional form of the defining equation
for X in Eq. 7.102 so that Eqs. 7.105 and 7.106 can be applied to find the evolution of
X(t 1 t). The limit of the resulting equations as fit --+ 0 gives Eqs. 7.109 and 7.110.
Heuristic arguments need to be invoked to perform this limit. A

To find X(t) and its properties, we need to perform the following opera-
tions. First, the Riccati equation has to be solved for the initial condition y(O) =
1/f. The solution of this equation and the second formula in Eq. 7.110 yield the
time evolution of the gain k. Second, Eq. 7.109 has to be solved for the initial con-
dition X(0) = 0. The methods in Section 7 .2.1.2 can be used to find the second
7.4. Applications 545

moment properties of X. If the state X is observed at some discrete times, the pro-
cedure in the previous section can be applied in conjunction with the recurrence
formula (Eq. 7.6)

X(t) = O(t, s) X(s) + lt O(t, u) b(u) W(u) du, t > s.

Example 7.68: Let X be a real-valued process observed at the discrete times t =


n l!,.t, where n ::: 1 is an integer and l!,.t > 0 denotes the time step. The observation
equation is Z(n l!,.t) = X(n M) + V(n), where E[V(n)] = 0, E[V(n) V(m)] =
r 8nm, and r > 0. The state is defined by X(t) = W (t) with X (0) = 0, t ::: 0,
where W is a white noise with E[W(t)] = 0, E[W(t) W(s)] = 8(t - s), and
E[W (t) V (s)] = 0 for all s, t ::: 0. The error covariance matrices of X(n I n - 1)
and X(n In)= E[X(n !!,.) I Z(M), ... , Z(n !!,.t)] are

y(n In- 1) = y(n- 1 In- 1) + M,


r y(n I n- 1)
y(n In)= .
r+y(nln-1)

Ifthe observation noise is much smaller than the driving noise (l!,.t » r), then
y(n I n) :::::: r and y(n I n - 1) :::::: l!,.t. Hence, the prediction uncertainty is
approximately equal to the variance of the driving noise in a time step !!,.t. This
uncertainty is reduced in the filtering phase to the observation variance. (>

Proof: The discrete version of the state equation is X (n t!.t) = X((n - 1) M) + W(n),
where E[W(n)] = 0 and E[W(n) W(m)] = t!.t Onm· The formulas in Eq. 7.106 give the
above recurrence equations for y(n I n- 1) and y(n I n). Since y(O I 0) = 0, we have
y(l I 0) = M and y(1 I l) = r t!.tj(r + M):::::: r, where the approximate equality holds
for M » r. Similar approximations hold for y (n I n - 1) and y (n I n ). •

Example 7.69: Let X be the state in Eq. 7.5, where Y = Wand W is the white
noise in Eq. 7.13 assumed to have mean zero, that is, JL w (t) = 0. The mean and
covariance matrices of the initial state are p,(O) and c(O, 0), respectively. Sup-
pose that we observe the state X through an imperfect device giving the readings
Z(t) = h(t) X(t) + V(t) (Eq. 7.107), where his a (d, d) matrix and Vis a white
noise with E[V (t)] = 0 and E[V (t) V (s)T] = q v(t) 8 (t - s) that is uncorrelated
with X and W. The best m.s. linear estimator of the state X is the solution of

k(t) = a(t) X(t) + k(t) ( Z(t)- h(t) X(t)), (7.111)

where k(t) = c(t, t) h(t)T qv(t)- 1 and cis the covariance function of X. This
result is consistent with Eq. 7.109. (>

Proof: Define X by X(t) = cx(t) X(t) + k(t) Z(t) and impose two conditions: (1) the
estimator X of X is unbiased, that is, E[X(t)] = E[X(t)] for all t :::: 0 and (2) the error
546 Chapter 7. Deterministic Systems and Stochastic Input

X(t) = X(t)- X(t) is minimized in the m.s. sense, for example, we may require that the
trace of E[X(t) X (t)T] be minimized.
The means IL and [L of X and X satisfy the equations

jt(t) = a(t) p,(t)) and


/l(t) = a(t) [L(t) + k(t) #Lz(t) = a(t) [L(t) + k(t) h(t) p,(t),
where the last equality holds since #Lz(t) = E[Z(t)] = h(t) p,(t). To satisfy the first
condition, that is, that X is an unbiased estimator of X, we must have a(t) + k(t) h(t) =
a(t) and {L(O) = p,(O). Accordingly, X is the solution of

k(t) = a(t) X(t) + k(t) ( Z(t)- h(t) X(t))


with X(0) = p,(O). The difference between the differential equations for X and X yields

X(t) = [a(t)- k(t) h(t)] X(t) + b(t) W(t)- k(t) V(t).

The evolution of {L(t) = E[X(t)] and c(t, t) = E [ ( X(t) - {L(t)) ( X(t) - {L(t)) TJ can
be calculated from results in Section 7.2.1 since b(t) W(t) - k(t) V(t) is a white noise
process with mean zero and intensity b(t) qw (t) b(t)T + k(t) qv(t) k(t)T. The initial con-
dition for the differential equation of c is c(O, 0) = c(O, 0) since X(0) = p,(O) so that
X(O)- {L(O) = X(O) -p,(O).
The second condition requiring the minimization of the error X(t) = X (t) - X(t)
is used to find the optimal gain, that is, the function k that minimizes c(t, t) at each time
t ;::: 0. These calculations can be found in [129] (pp. 265-267). •

7.5 Problems
7.1: Derive the mean, correlation, and covariance equations in Eq. 7.12.

7.2: Find the second moment properties of the stochastic process X(t), t ;::: 0, in
Example 7.5 for~ 1= am+ fJ k.
7.3: Extend the approach in Example 7.8 to the case in which the driving noise Y
is a colored stationary Gaussian process given by some of the coordinates of the
state of a linear time-invariant filter driven by Gaussian white noise.

7.4: Let X be the solution ofEq. 7.5 withY being the white noise Win Eq. 7.13.
Suppose that the drift, diffusion, and noise parameters are time invariant, that is,
a(t) =a, b(t) = b, JLw(t) = JLw, and qw(t) = qw, and that the drift coefficients
are such that Eq. 7.5 has a stationary solution X s with second moment properties
given by Eq. 7.15. Show that the second moment properties of X are given by
Eq. 7.15 if X (0) is random and follows the marginal distribution of X s.

7.5: Find the first four moments of the martingale M(t) = B(t) 3 -3 t B(t), t :=: 0,
where B is a Brownian motion.
7.5. Problems 547

7.6: Let C(t) = '£~~? Yk be a compound Poisson process, where N is a Poisson


process with intensity ).. > 0 and the random variables Y k are independent copies
of Y1 with mean zero and finite moments. Let M(t) = C(t) 2 - C2(t), where
C2 = '£~~? Yf. Calcutate the first two moments of M. Is M a martingale with
respect to the natural filtration of C?

7.7: Find the first four moments of a process X defined by the stochastic differ-
ential equation dX (t) = -ot X (t) dt + f3 dM (t) fort ~ 0 and X (0) = 0, where
ot > 0, f3 are some constants, M(t) = B(t) 3 - 3 t B(t), and B is a Brownian
motion.

7.8: Complete the details of the proof ofEq. 7.38.

7.9: Let X be the solution of Eq. 7.28, where S = M and M is a continuous


square integrable martingale. Find whether it is possible to develop differential
equations for the moments J-L(ql, ... , qa) = E [ Il%= 1 Xk(t)qk Jof X.
7.10: Let X be the solution of the stochastic differential equation

dX(t) = -ot X(t) dt + dM(t), ot > 0, t ~ 0,

where dM(t) = x(M(t)) dB(t) and xis a continuous function. Find whether the
moment equations corresponding to the augmented vector (X, M) are closed for
x(g) = g and x(g) = g 2.

7.11: Complete the details of the proof ofEq. 7.39.

7.12: Write a differential equation involving moments of the augmented state vec-
tor Z = (X, S) in Eq. 7.47. Outline conditions under which these equations can
be used to find the moments of Z.

7.13: Repeat the calculations in Example 7.23 for l = 3.

7.14: Prove the result in Example 7.25.

7.15: Find the solution for X defined in Example 7.32 in which B is replaced by
a compound Poisson process. Develop moment equations for X.

7.16: Let X be the process defined in Example 7.33. Develop moment equations
for (X, X). Are these moment equations closed?

7.17: Let f(x) = (1/ h) l[xo-h/2,xo+h/2J(X), h > 0, be a density function. Find


the characteristic function q;(u) of this density. Comment on the behavior of q;(u)
as lui ---+ oo.

7.18: Show that limHoo B(t)ft = 0 a.s.


548 Chapter 7. Deterministic Systems and Stochastic Input

7.19: Derive the partial differential equation for the characteristic function of the
state vector (X, X, S) defined in Example 7.28 and specify the initial and bound-
ary conditions needed to solve this equation.

7.20: Derive the Fokker-Planck equation for the density of X defined by dX (t) =
a(X(t)) dt + b(X(t)) dB(t), t ::=: 0, starting at X(O) = xo by using Ito's formula.
Assume that the conditions in Eq. 7.63 are satisfied.

7.21: Find the density of the geometric Brownian motion in Example 7.36 by
solving the Fokker-Planck equation numerically.

7.22: Find the stationary density of the Ornstein-Uhlenbeckprocess X defined by


dX (t) = -a X (t) dt + ..jf(X dB(t), where a > 0 is a constant.

7.23: Complete the details of the proof in Example 7 .38.

7.24: Complete the details of the proof in Example 7.39.

7.25: Take u'(x) = v5 x (1 + s x 2 ) in Example 7.42, where sis a constant. Show


that the stationary density of the state vector (X 1 =X, Xz =X) is

fs(xl, xz) = c ..,12; ao exp [- 2 ~J (xr + ~ s x 4) ]

x 1
../f7i a0
xi J'
exp [ - - -
2 aJ

where ao = rr go/ (2 p v5), aJ = v5 aJ, and c > 0 is a constant.

7.26: Complete the calculations in Example 7.49.

7.27: Prove the statement in Example 7.51.

7.28: Develop an alternative model of the ice-core data in Fig. 7.12.

7.29: Develop the differential equations for the correlation functions r pq (t, s) of
the augmented state Z = (X, X, S) in Section 7.4.1.2 for both the transient and
stationary solutions. Confirm the numerical results in Fig. 7.13.

7.30: Define a reliability problem and calculate the bounds in Eq. 7.92 on the
probability of failure P f ( r). Find also Monte Carlo estimates of P f ( r).

7.31: Find the solution of the problem in Example 7.68 for the case in which X
is an Ornstein-Uhlenbeck process.
Chapter 8

Stochastic Systems and


Deterministic Input

8.1 Introduction
This chapter examines algebraic, differential, and integral equations with
random coefficients. The boundary and/or initial conditions required for the so-
lution of differential and integral equations can be deterministic or random. The
input is deterministic for all types of equations. We refer to these problems as
stochastic systems with deterministic input.
Our objective is to find probabilistic properties of the solution of stochas-
tic systems subjected to deterministic inputs. Two classes of methods are avail-
able for solving this type of stochastic problems. The methods in the first class
constitute direct extensions of some of the results in Chapters 3 and 6, for ex-
ample, the local solution for a class of differential equations with random coef-
ficients (Section 8.2), the crossing solution for the random eigenvalue problem
(Section 8.3.2.6), and some of the homogenization methods for random hetero-
geneous media (Section 8.5.1). These methods involve advanced probabilistic
concepts and in many cases deliver detailed properties of solutions. The methods
in the second class are largely based on relatively simple probabilistic concepts
on random variables, processes, and fields. The main objective of these methods
is the calculation of the second moment properties of the solution of algebraic,
differential, and integral equations with random coefficients.
We (1) examine partial differential equations with the functional form in
Chapter 6 (Section 8.2), inhomogeneous and homogeneous algebraic equations
(Sections 8.3.1 and 8.3.2), and inhomogeneous and homogeneous differential and
integral equations (Sections 8.4.1 and 8.4.2), (2) calculate effective or macro-
scopic constants for random heterogeneous materials (Sections 8.5.1 and 8.5.2),
(3) study the evolution of material properties and the possibility of pattern for-

549
550 Chapter 8. Stochastic Systems and Deterministic Input

mation in the context of elasticity and crystal plasticity (Sections 8.6.1 and 8.6.2),
(4) define the stochastic stability problem and illustrate its solution by some simple
examples, and (5) examine localization phenomena in heterogeneous soil layers
and nearly periodic dynamic systems (Section 8.8.1 and 8.8.2).

8.2 Local solutions


Consider the partial differential equations defined in Section 6.1 and sup-
pose that they have random coefficients, boundary conditions, and/or initial con-
ditions. Assume that the local solutions developed in Chapter 6 hold for almost
all samples of the random parameters in these equations. The local solutions are
random variables that can be characterized by estimates of their mean, covariance,
and other properties.
A three step procedure can be applied to find the local solution for partial
differential equations with random coefficients, boundary conditions, and/or ini-
tial conditions. First, the local solutions in Sections 6.2, 6.3, and 6.4 need to be
calculated for a sample w' of the random parameters in the definition of these
equations. Second, the first step has to be repeated for a collection of samples
{w'}. Third, statistics of the local solution need to be inferred from results in the
previous step. We illustrate this procedure by a simple example.

Example 8.1: Let U be the solution of the partial differential equation

with the boundary condition U(x) = 8(x), x E aD, where D is an open set
in JR2 , and (Ai, Hij) and 8 are real-valued random fields defined on D and aD,
respectively. It is assumed that ( 1) the matrix H = { H ij} admits the representation
H(x) = C(x) C(x)T ateachx ED and (2) the random fields Ai, Hii• Cii• and 8
are such that the local solution in Section 6.2 can be applied for almost all samples
of these fields.
Let Ai (·, w'), Cij (·, w'), and 8(·, w') be samples of the random fields Ai,
Cij, and 8, respectively, and define for each w' an W -valued diffusion process by
the stochastic differential equation

dX(t; w') = A(X(t; w1), w') dt + C(X(t; w'), w') dB(t, w'),
where B is an JR2 -valued Brownian motion. For a sample w' of Ai, Cij, and 8,
the local solution is

U(x, w') = Ex[S(X(T(w 1), w'), w1)], xED,

where T(w') = inf{t > 0: X(t, w') ¢ D} denotes the first time X(·; w') starting
at x E D exits D and the above expectation is with respect to the random variable
8.3. Algebraic equations 551

X(T(w'), w') for a fixed w' (Section 6.2.1). Statistics of the local solution U(x)
can be obtained from its samples U (x, w').
Numerical results have been obtained forD= ( -2, 2) x (-1, 1),

A; (x), Cii (x) "' U(i -a;, i +a;)

with lail < i, i = 1, 2, C12(x) "' U(1/2- a3, 1/2 + a3) with la31 < 1/2,
8(x) "' U(1 - a1, i + a1) on the side (-2, 2) x {-1} of D, and 8(x) = 0
on all other boundaries of D, where a1 = 0.4, a2 = 1, and a3 = 0.2, and the
random variables A;, C;j, and 8 are independent of each other. Figure 8.1 shows
a histogram of U((0.5, 0)) based on 100 independent samples of A;, Cij. and 8.

Histogram of U((0.5,0))
4.5

=
Mean 0.2623
Std = 0.1328
3.5 (100 samples)

2.5

1.5

1m~
0.5

0
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7

Figure 8.1. Histogram of the local solution U (x) at x = (0.5, 0)

The solutions U((0.5, 0), w') for each w' are based on 1,000 independent samples
of X generated with a time step of M = 0.001. For a; = 0, i = 1, 2, 3, the local
solution is deterministic, and its estimate at x = (0.5, 0) for the same sample size
is 0.2731. <>

8.3 Algebraic equations


Consider the algebraic equation

(A-A.i) X=q, (8.1)

where i denotes the (n, n) identity matrix, q is ann-dimensional column vector


with real coordinates, A is an (n, n) matrix whose entries are real-valued random
variables defined on a probability space (Q, F, P), and A. is a parameter. We
552 Chapter 8. Stochastic Systems and Deterministic Input

refer to Eq. 8.1 with q f= 0 and q = 0 as inhomogeneous and homogeneous


equations, respectively. Our objective is to find probabilistic properties of the
solution X for the inhomogeneous version of Eq. 8.2 with a fixed value of A and
of the eigenvalues and eigenvectors of the homogeneous version of this equation.
The solution X is an n-dimensional column vector.
The local solutions in Sections 6.5.1 and 6.5.2 can be extended to solve
Eq. 8.1locally by following the approach in Section 8.2. This method of solution
is not considered here.

8.3.1 Inhomogeneous equations

Let Y =A- Ai for a fixed value of A such that y- 1 exists a.s. Then Eq. 8.1
becomes
YX=q, qEffi.n. (8.2)

Note: Because X = y- 1 q and the mapping Y r-+ y- 1 q is measurable, X is a random


variable on (Q, :F, P). The uncertainty in Y can have many sources, for example, geomet-
rical imperfections for mechanical systems, limited understanding of a phenomenon, and
random errors related to the definition of the boundary conditions.
The conditions in Eq. 8.2 are encountered frequently in applications [173]. For
example, the finite difference and finite element formulations of differential and integral
equations defining problems in solid mechanics, physics, economics, and other fields yield
conditions of the type in Eq. 8.2. A

Our objective is to find the probability law of X from its definition in Eq. 8.2
and the probability law of Y. We will be able to achieve this objective in only a
few cases since (1) the inverse y-l of a random matrix Y cannot generally be
found analytically and (2) the determination of the probability law of X from
X = y-l q is usually impractical even if Y - 1 is known. Most of the methods in
this section focus on the approximate calculation of the first two moments of X
under the assumption Y E L2.

8.3.1.1 Monte Carlo simulation method

The Monte Carlo simulation method provides the most general solution for
Eq. 8.2 in the sense that it can deliver an estimate of the probability law of X. The
essential limitation of the method is the computation time, that can be excessive if
the solution of the associated deterministic problem (Eq. 8.3) is computationally
intensive and/or the required sample size n s is large. Another potential limitation
of Monte Carlo simulation relates to the available information on Y. For example,
if only the first two moments of Y are available, the use of Monte Carlo simulation
requires us to postulate the probability law of this random vector.
8.3. Algebraic equations 553

1. Generate n 8 independent samples Yk• k = 1, ... , n 8 , of Y and calculate

Xk=Yk" 1 q, k=1, ... ,n8 • (8.3)

2. Estimate statistics of X, for example, the mean and covariance matrices of X,


ILx andy x• can be approximated by

(8.4)

Note: The algorithms in Section. 5.2 can be used to generate samples of Y. The Monte
Carlo simulation method is not restricted to linear equations. The method can be applied
to nonlinear equations in which case the determination of the samples Xk of X requires
solution of a nonlinear algebraic equation rather than Eq. 8.3. •

Example 8.2: Let X = (X 1, X2) be the solution of

[ ~:(<:s ~:(<::; J[ i~ J= [ ~ J.
where E>1 = tan- 1 (Z2/(Z1 +a)), E>2 = tan- 1 ((a- Z2)/(Z1 +a)), Z1 and
Z2 are independent random variables uniformly distributed in ( -e a, e a), and
0 ::: e ::: 1/3. The random vector X gives the forces in the two-member truss
in Fig. 8.2 under a deterministic load q. Because of geometric imperfections, the

t
a
!q
' '
::~~z,
I
~
1 ~
l 1l
a z1

Figure 8.2. A two-member truss with random geometric imperfections

position of the joint defined by the intersection of the two bars is random with the
coordinates (a + Z 1, Z2).
Figure 8.3 shows histograms of X 1 and X 2 obtained from n 8 = 1, 000 truss
samples for a = 1, e = 0.3, and q = 1. The estimated mean, standard deviation,
554 Chapter 8. Stochastic Systems and Deterministic Input

1.4,-------------, 1.5,---------------,

1.2 Histogram Histogram


of x 1 of x 2

0.8

0.6
0.5
0.4

Figure 8.3. Histograms of X 1 and X 2

skewness, and kurtosis are -1.16, 0.52, 1.23, and 4.17 for X 1 and 1.58, 0.52, 1.18,
and 3.94 for X2. If e = 0, that is, there is no uncertainty in the truss geometry, the
member forces are X 1 = -1 and X 2 = J2. <>

Note: The defining equations for (Xt. X2) result from the equilibrium conditions, where
81 and 82 are the random orientations of the bars (Fig. 8.2). The entries of matrix Y in
Eq. 8.2 are Yu = cos(E>1), Y12 = cos(E>z), Y21 = sin(E>1), and Y22 = sin(E>2). The
Monte Carlo simulation was based on Eqs. 8.3 and 8.4. First, ns independent samples
of the joint coordinates (a + Z1, Z2) were generated and the corresponding samples of
the angles (81, 82) and member forces (X 1, X2) were calculated. Second, the resulting
member forces were used to estimate moments and develop histograms for (X1, X2). •

8.3.1.2 Taylor series method

Suppose that all entries of Y in Eq. 8.2 are measurable functions of an


m.m-valued random variable Z E L2 with mean /Lz = {J.tz,u} and covariance
>'z = {Yz,uv}, u, v = 1, ... , m. Hence, the solution X can also be viewed as a
function Z ~--+ X ( Z). It is assumed that this function has continuous second order
partial derivatives in m_m. The following two equations give the first order Taylor
approximation of X and the first two moments of this approximation.
8.3. Algebraic equations 555

1. Approximation of X:

~ aX(/Lz)
X::: X(/Lz) + L... a (Zu- J.Lz,u). (8.5)
u=1 Zu

2. Approximate second moment properties of X:


m T
""' aX(/Lz) aX(/Lz)
and y X ::: L... a a Yz,uv· (8.6)
u,v=l Zu Zv

Note: The approximation of the solution X in Eq. 8.5 is referred to as the linear or first
order Taylor approximation. We also note that the approximate second moment properties
of X in Eq. 8.6 depend only on the first two moments of Z.
The Taylor expansion is

where
Rz(z<o)) = -21 L aazu
m 2 X(Z*)
azv
(
Zu- Zu(0)) ( Zv- Zv(0))
u,v=1
and Z* = e z(O) + (1 - e) Z, e E (0, 1) ([149], Remark 2, p. 453, [26],Theorem 21.1,
p. 142). The approximation in Eq. 8.5 is given by the above Taylor expansion for z( 0l = /Lz
without the remainder Rz.
The error of the linear approximation in Eq. 8.5 can be bounded by using the ex-
pression of Rz. This bound is rarely used in applications since it requires the calculation
of the second order partial derivatives a2x;azu azv. u, v = 1, 2, ... , m, which is a time
consuming task for large vectors Z. &

The approximate solutions in Eqs. 8.5 and 8.6 depend on the values of X
and its gradient for Z equal to its mean value, which are given by the following
two equations.

X(/Lz) = Y(/Lz)- 1 q and (8.7)

a~~:z) = Y(/Lz)-1 ( _ a~~z) X(/Lz)). (8.8)

Proof: The solution of Eq. 8.7 corresponds to Eq. 8.2 with Z replaced by its mean value
and involves the inversion of a deterministic matrix Y (/Lz). The derivative of Eq. 8.2 with
respect to a coordinate of Z is

8Y(Z) X(Z) + Y(Z) aX(Z) = .!.!!._


azu azu azu
556 Chapter 8. Stochastic Systems and Deterministic Input

and gives Eq. 8.8 by setting z = /Lz since aq jazu = 0. The partial derivatives aX(/Lz)fazu
are called sensitivity factors because they give the rate of change of X with respect to
perturbations in the coordinates of Z about /Lz. If both the sensitivity factor with respect
to a coordinate Zk of Z and the uncertainty in this coordinate are small, then Zk can be set
equal to its mean value as a first approximation. •

Example 8.3: Let X be the solution of Eq. 8.2 with m = n = 1. Suppose that
Z takes values in a bounded interval (a, b) and that the function Z ~--+ X ( Z) has
a continuous second order derivative. The error of the first order approximation,
X(Z) = X(J.Lz) + X'(J.Lz) (Z- J.Lz), can be bounded by

IX(Z(w))- X(Z(w))l ::S max IX"(~)I (Z(w)- J.Lz) 2


a<~<b

for almost all w E Q. The calculation of this bound is impractical for realistic ap-
plications since X" is not known and its numerical calculation can be prohibitive.
Moreover, the resulting bound may be too wide to be informative. <>

Example 8.4: Consider the algebraic equation in Example 8.2 with a = 1, q = 1,


and e = 0.3. The approximate mean and standard deviation given by Eq. 8.6 are
-1 (-14%) and 0.6 (14.81 %)for X 1 and ../2 (-10.65%) and 0.49 (-6.42%) for
X2. The numbers in parentheses give differences between these approximations
and the Monte Carlo simulation in Example 8.2, and can be viewed as errors
since the Monte Carlo solution is based on a large sample. The errors of the
approximations in Eq. 8.6 decrease with the uncertainty in Z and vanish if there
is no uncertainty in the truss geometry (e = 0). <>

Note: The approximate mean of X given by Eq. 8.7 can be obtained from the equilibrium
conditions for Z1 = 0 and Z2 = 0. The gradients of the coordinates of X relative to Z are
given by Eq. 8.8. The partial derivatives a cos(8u)faZv and a sin(8u)faZv, u, v = 1, 2,
have to be calculated to find the entries of matrices aY(Z)jazu, for example, the element
( 1, 1) of this matrix is

for u = 2. The entries (1,1), (1,2), (2,1), and (2,2) of matrices aY(/Lz)fazu are 0, l/v'2-1,
0, and -1 for u = 1 and 0, 0, 1, and -1/.J2 for u = 2. The derivatives aX(/Lz)fazu in
Eq. 8.8 are (2, -2) for u = 1 and (2.83, 2) for u = 2. The second formula in Eq. 8.6 gives
an approximation for the covariance matrix of X.
Figure 8.4 shows the functions Z f-+ Xu (Z), u = 1, 2. The approximate coordinate
of X in Eq. 8.5 are hyperplanes 7ru tangent to the graphs of these functions at (/Lz, xu(/Lz)),
u = 1, 2. The error of the approximate covariance in Eq. 8.6 depends on (1) the differences
between the hyperplanes 7ru and the functions Z f-+ Xu(Z) and (2) the uncertainty in Z.
For example, the approximate second moments of X are likely to be inaccurate if the above
differences are significant and/or the variance of the coordinates of Z is relatively large. A.
8.3. Algebrmcequations 557

Function X 1(Z) Function X2(Z)

0 4

-1 3 ..

-4
30 30
30 30
Zz zz
0 0 0 0

Figure 8.4. The mapping Z 1-+ Xu(Z) for a two-member truss

The approximations of the second moment properties of X in Eqs. 8.5-8.8


can be (a) improved by retaining additional terms of the Taylor series represent-
ing the function Z 1-+ X(Z) and (b) generalized to solve nonlinear algebraic
equations. We examine briefly these extensions.
Higher order approximations. If the function Z 1-+ X(Z) has continuous third
order partial derivatives, then X can be approximated by the quadratic form

X(Z) ~ X(P,z) +~
~
aX(p,z)
(Zu- JLz,u)
u=l azu

+ -2
1
Lm aZX(P,z)
a a (Zu -
u,v=l Zu Zv
JLz,u) (Zv - JLz,v). (8.9)

of Z. The calculation of the second moment properties of X based on this ap-


proximation requires knowledge of the first four moments of Z. For example, the
mean of X can be approximated by

/Lx ~ X(P,z)
1
+ -2 Lm aZX(P,z)
a a Yz,uv•
u,v=l Zu Zv

The above second order partial derivatives of X can be calculated from the second
partial derivatives ofEq. 8.2.
Nonlinear equations. Suppose that X is the solution of
YX+N=q, (8.10)
where Y is as in Eq. 8.2, N denotes an (n, 1) matrix whose entries are nonlinear
real-valued functions of (Z, X), and the 1Rm-valued random variable Z collects
558 Chapter 8. Stochastic Systems and Deterministic Input

the distinct random parameters in this equation. Suppose that X is approximated


by Eq. 8.5 so that we need to find X(/Lz) and oX(/Lz)fozu. u = 1, ... , m, to apply
Eq. 8.6. The solution ofEq. 8.10 with Z replaced by its mean value gives X(p. z).
The derivative ofEq. 8.10 with respect to a coordinate Z u of Z is

The gradients of X can be obtained from the above equation since the functions
Z 1--+ Y(Z) and (Z, X) 1--+ N(Z, X) are known so that the partial derivatives
oY jozu, oN jozu. and oN joxk can be calculated.

8.3.1.3 Perturbation method


Suppose that the random matrix Y in Eq. 8.2 has the representation

Y =a+8R, (8.11)

where a is a deterministic matrix, 8 denotes a small parameter, and R is the ran-


dompartofY.Itiscommonto seta= E[Y] sothat8 R = Y -a, E[R] = 0, and
Yij,kl = E[Rij Rkz]. The approximate perturbation solution of Eq. 8.2 withY in
Eq. 8.11 and its second moment properties are given by the following equations.

1. Approximation of X:

X=Xo+8Xt+8 2 X2+0(8 3 ), where


-1 -
X,= -a R Xr-1. r = 1, 2, ... , and Xo =a- 1 q. (8.12)

2. Approximate second moment properties of X:

E[X] = a- 1 q + 82 a- 1 E[R a- 1 R] a- 1 q + 0(8 3 ),


E[X XT] = E[Xo X61 + 82 E[Xo XI+ X 1 Xf + X2 X61 + 0(8 3 ). (8.13)

Proof: The above perturbation solution is regular because the random part of Y is small
and the solution X does not differ significantly from the solution x = a- 1 q of Eq. 8.2
with s = 0 (Example 8.5 in this section, [101], Section 1.2).
The representation of X in Eq. 8.12 and Eq. 8.2 give

(a+ s R) (Xo + s X 1 + s 2 X2 + · · ·) = q or
(a Xo- q) + s (a X1 + R Xo) + s (a X2 + R X t) + · · · = 0,
- 2 -

that is, a power series that must be zero for all values of s. We require that the coefficients
of all powers of s be zero according to a fundamental theorem of perturbation theory ([171],
8.3. Algebraic equations 559

p. 12). This requirement gives


a Xo =q (order 1),
aX1 = -RXo (order e),
a X2 = -R X 1 (order e2 )
for the terms of order er, r = 0, 1, 2. The solution of order p ~ 1 of X is

x<P)
p
= L(-1)k ek (a- 1 R)
- k
a- 1 q,
k=O

where ( a- 1 Rtis the identity matrix fork = 0. Note that the equations for Xo, X 1, ...
have the same deterministic operator.
The moments in Eq. 8.13 show that it is not possible to approximate E[X xT] to
the order e 2 by using the first order approximation of X. The first order approximation
of X delivers only one of the three terms of order e2, the term E[X 1 x[l. The other two
terms cannot be obtained from the first order approximation. •

Generally, it is difficult to assess whether a perturbation series, X o + 8 X 1 +


82 X 2 + · ··, is convergent since its general term is rarely known even for simple
problems (Example 8.5). The representation given by Eq. 8.12 can be generalized
in at least two directions, as shown by the following two results.

1. The perturbation method can be used to find the solution of Eq. 8.2 when the
random matrix Y has a more general form than in Eq. 8.11, for example, Y =
a+ 8 R1 + 82R2 + 0(8 3).
2. The perturbation method can also be applied to determine the solution of non-
linear problems defined by
YX+8N=q, (8.14)
where Y is given by Eq. 8.11 and the entries of the (n, 1) matrix N are real-
valued nonlinear functions of X and may be random. The first three terms of the
perturbation solution of this equation can be calculated from

aXo = q,
a X1 = -R Xo- N(Xo),
aX2 = R X1- N*(Xol X1, (8.15)

where N*(Xo) is an (n, n) matrix with columns aN jaxi evaluated at Xo.


Proof: The perturbation series in Eq. 8.12 and Eq. 8.14 give

(a +eR)(Xo +eX1 + ·· ·) +eN(Xo +eX1 + · ··) = q,


which can be approximated by

- - ~ aN(Xo)
(a +eR)(Xo +eX1 + · · ·) +eN(Xo) +e L.J (eX1 i + ···) = q,
i=1 axi ,
560 Chapter 8. Stochastic Systems and Deterministic Input

where X 1,i are the coordinates of X 1· •

Example 8.5: Let x be the solution of the deterministic nonlinear algebraic equa-
tion x + 6 x 3 = 1, where 6 is a small parameter. The approximation of the real
root of this equation to the order 6 3 is x ~ x< 3) = 1-6 + 3 6 2 - 126 3.
Figure 8.5 shows the exact root and the perturbation solutions of the first

0.9

0.8
/
o.? Solution x<l)
0.6

0.5

o.4 Exact solution


0.3

0.2 J'
0.1 Solution X (3)
0 o~--~o.~1----~o.2~--~o.3~--~o.4~_L~o.s
£

Figure 8.5. Exact and perturbation solutions

three orders. For small values of 6 the perturbation solutions are satisfactory, and
their accuracy improves slightly with the order of the approximation. For large
values of 6 the perturbation solutions are inaccurate. ¢
Note: The power series x = Lk=O, 1, ... ek Xk and the equation x + e x 3 = 1 give
(xo + e x1 + e2 x2 + · · ·) + e (xo + e x1 + 8 2 x2 + ·· ·)3 = 1 or
xo + e x1 + 8 2 x2 + 8 3 x3 + · · · + 8 [x6 + 3 8 x5 x1 + 3 8 2 (x5 x2 + xo xi) + · · · ] = 1
so that we have xo = 1, x1 + x6 = 0, x2 + 3 x5 x1 = 0, and x3 + 3 (x5 x2 + xo xi) = 0 by
equating terms of the same order of magnitude. These equations give xo = 1, x1 = - x6 =
-1, x2 = -3x5 x1 = 3, and x3 = -3 <x5 x2 + xo xi)= -12.
We note that the algebraic equation in this example provides an illustration for a
singular perturbation problem since the limit point 8 = 0 differs in an essential way from
the limit 8 ~ 0. The equations with e = 0 and 8 > 0 have a single root and three roots,
respectively. Problems that are not singular are said to be regular ([101], Section 1.2). .&

Example 8.6: Consider the algebraic equation in Example 8.2 and represent the
random matrix Y as the sum of a deterministic part a = E[Y] and a random part
e R = Y- a. For 6 = 0.3 the approximate means of X 1 and X2 are -1.0130
(1.30%) and 1.4245 (0.73%), and the approximate standard deviations of X 1 and
X2 are 0.3825 (-26.81%) and 0.3831 (-26.82%), respectively. These approxi-
mate results are based on Eq. 8.13. The numbers in parentheses give errors relative
to the Monte Carlo solution in Example 8.2. ¢
8.3. Algebraic equations 561

Note: The expectations E[Y] and E[Rij Rkzl needed for solution are difficult to find ana-
lytically because of the complex dependence of R on the random variables specifying the
random geometry of the truss. These averages were estimated from ns = 1, 000 samples
of Y generated by Monte Carlo simulation. For example, the estimates of the entries (1, 1),
(1, 2), (2, 1), and (2, 2) of E[Y] are 0.9848, 0.7003, 0.0025, and 0.7038, respectively. -'.

8.3.1.4 Neumann series method


The solution X = y- 1 q of Eq. 8.2 requires the inversion of an arbitrary
random matrix, and there are no efficient algorithms for finding Y - 1 . The Tay-
lor expansion and the perturbation methods calculate approximately probabilistic
characteristics of X without evaluating y- 1 . In contrast, the Neumann series
method approximates Y - 1 and uses this approximation to calculate probabilistic
properties of X. The Neumann series method is based on the following fact.

If m is an (n, n )-matrix such that

llmxii:S:yllxll, O<y<l, Vxe~n. then (8.16)


00

(i +m)- 1 = L(-l) 7 m7 , (8.17)


r=O

where i denotes the (n, n) identity matrix, m 0 = i, II ~ II is the Euclidean norm


of~ E !Rn, and the series in Eq. 8.17 is absolutely convergent ([187], Chapter 2).

Proof: Consider the sequence of sums Sk = E~=O (-1) 7 m 7 • Fork > l, we have
k k k
llskX-S[XII=II L (-1tm 7 xll::: L llm7 XII:S: L y7 llxll
r=l+1 r=l+1 r=l+1
yl+l(l- yk-1) yl+1
= -y
1 II
II::: -1--y II X II X

by norm properties and Eq. 8.16. Because the upper bound on II Sk x - S[ x II approaches
zero for each x E Rn as l, k - oo and Rn is complete ([32], Theorem 3.8, p. 41), Sk xis
Cauchy in Rn and has a limit
k 00

sx = lim SkX = lim L:<-1l m 7 x = L(-1) 7 m 7 x.


k...-.oo k-+oo r=O r=O

It is left to show that s = (i + m) - 1, that is, that the equalities s (i + m) x =


(i + m) s x = x hold for each x. The difference,

II Sk (i +m)x -x 11=11 skx +skmx -x 11=11 (-1)k mk+ 1 x 11::: yk+ 1 II x 11.
between Sk (i + m)x and x approaches zero ask - oo because y E (0, 1) so that
limk-+oo sk (i + m) x = x. Because s x = liffik_,. 00 Sk x holds for each x E lR" and
562 Chapter 8. Stochastic Systems and Deterministic Input

(i + m) x E JRn, we haves (i + m) x = x. Similar considerations can be used to show that


(i+m)sx=x.
We also note that under the condition in Eq. 8.16 the series I::~0 (-l)' m' xis
absolutely convergent since
100 00 00 1
2L(-l/m'x :::s_LIIm'xll:::s_Ly' llxll=-
1- llxll,
r=O r=O r=O - y
which is finite. •
Example 8.7: Let X E lP& be the solution of (a+ R) X= q, where a, q are some
constants and R is a random variable. If IRfal :::; y < 1 a.s., the sequence

SkXo=(t,(-1)' (~)')(qja), k=1,2, ... ,

converges a.s. to X so that X = (1 + I:~ 1 (-1)' (RfaY) (qja) is the solution


of (a + R) X = q. <>
Proof: An alternative form of (a+ R) X= q is (1 + a- 1 R) X= a- 1 q so that
X= (1 + a- 1 R)- 1 (a- 1 q).
Let Sk(w) = 1 + L~=1 (-1)' a-r R(w)' be a sample of sk. Because IR(w)fal::::; y < 1
for almost all w's, then
1- k+l
L la-r R(w)'l :::S 1 + LYr =
k k
ISk(w)l :::S 1 + 1~ y
r=1 r=l
is bounded by a geometric series with finite sum since y < 1 by hypothesis. Hence, the
numerical series Sk(w) approaches a limit S(w) = X(w) ask---+ oo for almost all w's. A

Let X be the solution of Eq. 8.2, where Y E L2. Denote by a = E[Y] and
R = Y- a the deterministic and random parts of Y, respectively. The Neumann
series solution for Eq. 8.2 and the approximate second moment properties of X
can be calculated from the following equations.

1. If there exists a y E (0, 1) such that II a - 1 R x II:::; y II x II a.s., then X has


the Neumann series representation:
00

X= LX,, where Xo = a- 1 q and


r=O
X,=-a- 1 RXr-1, r=1,2, .... (8.18)

2. Approximate second moment properties of X based on the first two terms of


the Neumann series, that is, X :::::: X o + X 1:

(8.19)
8.3. Algebraic equations 563

Proof: Consider the alternative form,


X= (a+ R)- 1 q = (i + a- 1 R)- 1 a- 1 q,
ofEq. 8.2. If II a- 1 Rx 11::0 y II x II a.s. forO< y < 1 andx E JR'I, then (Eq. 8.17)

X= (i + f:(-1/ ( Rr)
r=1
a- 1 a- 1 q a.s.

Generally, the first few terms of the above series are used to approximate X. For example,
the second moment properties of X given by Eq. 8.19 are based on the approximation

X ~Xo+X1 =a- 1 q -a- 1 Ra- 1 q.


Similar expressions can be found for higher order approximations of X. •

Example 8.8: Consider the equation Y X = q, where Y = a + R, R is uniformly


distributed in (-a, a), q = 10, a = 2, and a = 0.5. The solution X can be
interpreted as the deformation of a linear spring with random stiffness Y under a
constant force q.
The mean and variance of X are 5.00 and 0.52 for the approximation X ~
[ 1 - (RIa)] (q I a). The corresponding moments given by the approximation X ~
[1- (Rja) + (Rja) 2 ] (q fa) are 5.10 and 0.57, respectively. The exact mean and
variance of X are 5.11 and 0.55. The first order approximation is satisfactory in
this case. <>
Note: The Neumann series solution in Eq. 8.18 is convergent a.s. because there exists a
constant y E (0, 1) such that I(R/a)xl :::; ylxl for each x E R. Accordingly, we have
X= [1- (Rfa) + (Rfa) 2 - (Rfa) 3 + · · ·] (qfa).
The distribution of X is P(qf(a+R) :::; x) = 1-FR(qfx-a), where FR denotes the
distribution of R. The density of X is equal to q /(2 a x 2) in the range (qf(a+a), q f(a-a))
and zero outside this range so that the first two moments of this variable are E[X]
(O.Sqfa) ln((a + a)f(a- a)) and E[X2 ] = q 2f(a2- a 2 ). &

8.3.1.5 Decomposition method

The solution for Eq. 8.2 is represented by the series

L Ur,
00

X= where Uo = a- 1 q and
r=O
Ur = -a- 1 RUr-1. r = 1,2, .... (8.20)

Note: Let X = L~o A. 7 U r be a representation of X, where A. is an arbitrary parameter.


The equation,

LA
00 00
7 Ur =a- 1 q -)..a- 1 R LAr Ur,
r=O r=O
564 Chapter 8. Stochastic Systems and Deterministic Input
I,

coincides with the representation X = a- 1 q - a- 1 R X of Eq. 8.2 with X in Eq. 8.20


for A = 1. The formulas in Eq. 8.20 result by setting the coefficients of each power of
A equal to zero. The representation of X in Eq. 8.20 corresponds to A 1. The above =
manipulations are formal since the series in Eq. 8.20 may not be convergent. Extensive
considerations on this method can be found in [3]. We note that the representations of
the solution X in the Neumann series method (Eq. 8.18) and the decomposition method
(Eq. 8.20) have the same general terms. A

8.3.1.6 Equivalent linearization method


Generally, the solution X of Eq. 8.2 is a nonlinear function of the uncertain
parameters Z (Fig. 8.4). The objective of the equivalent linearization method is
to express X as a linear function of Z that is optimal in some sense. The follow-
ing two equations give the approximate solution by the equivalent linearization
method and the corresponding second moment properties of X.

1. Approximate solution X :

X~ Xt =a. Z + {3, (8.21)

where a. and f3 minimize e = E[ll q- Y Xt 11 2 ].


2. Approximate second moment properties of X:

JLx ~ lt. /Lz + {J and y X ~ lt. y Z lt.T. (8.22)

Proof: The unknown coefficients a and {J should be determined from the condition that the
difference between X and Xt is minimized in some sense. However, this difference cannot
be calculated directly since the probability law of X is unknown. If II y- 1 x II~ y II x II
a.s. for all x E Rn and a constant y < oo, then

II X- Xt 11=11 Y- 1 q- y-l Y Xt II~ y II q- Y Xt II a.s.

so that the use of the objective function e = E[ll q - Y X1 11 2 ] is meaningful. The


entries of the matrices a and {J are the solutions of the linear system of equations given by
ae;aaij = o and ae;api = o, i, j = 1, 2, ... , n. •
Example 8.9: Consider the problem in Example 8.8 and assume that Y is a log-
normal variable with mean J.t and variance a 2 and q > 0 is a constant. Figure 8.6
shows the exact mean and variance of X scaled by J.t I q and (J.t I q) 2 , respectively,
as a function of the coefficient of variation v = a I J.t. The figure also shows the
approximate second moment properties of the solution X given by the equivalent
linearization and the first order Taylor expansion. The approximate solutions are
accurate for small values of the coefficient of variation but become unsatisfactory
for relatively large values of v. 0
Proof: The equivalent linearization approximates X by Xt = a Y + f3 so that the objective
function is e = E[(q -a Y2 - f3 Y) 2 ]. The conditions aejaa = 0 and aeja{3 = 0 give
8.3. Algebraic equations 565

4.-----------------.

3
1.5

0.5 '---------~----------'
0 0.5 0.5
v = 0'/l.l v = 0'/ll

Figure 8.6. Exact and approximate mean and variance of X

a = -q ft-2 (1 + v2 )- 4 and .B = q rt- 1 (2+ v 2 ) (1 + v 2 )- 2 , so that the first two moments


of Xt are

E[Xtl = q rt- 1 (1 + v2)-4 (1 + 5 v2 + 4 v4 + v6 ),


Var[Xtl = q2 ft-2 (1 + v2)-8 v2.

These moments are used as approximations for the corresponding moments of X. •

8.3.1.7 Iteration method


The iteration method is based on (1) the recurrence formula

(8.23)

derived from Eq. 8.2, where a = E[Y] and R = Y -a, and (2) a heuristic
approximation, referred to as the local independence hypothesis, stating that the
correlation between X and R can be disregarded ([164], p. 18, [173], Section 11.8).
There is no theoretical justification for this hypothesis. We review briefly this
method because of its wide use in some fields [164, 173]. Two versions of the
iteration method are presented.

Version 1. Approximate second moment properties of X:

E[X]::::::: a- 1 q,
E[X XT]::::::: a- 1 q(a- 1 q)T + a- 1 E[R X XT RT] (a- 1)T. (8.24)

Proof: The above equations for the first two moments of X can be obtained from the
iteration of order zero of Eq. 8.2, that is, Eq. 8.23.
566 Chapter 8. Stochastic Systems and Deterministic Input

The expectation of Eq. 8.23 and the local independence hypothesis, E[R X] =
E[R] E[X] = 0, yield the first formula in Eq. 8.24.
The expression of the product of Eq. 8.23 with itself and the local independence
hypothesis give the second formula in Eq. 8.24. This formula and the local independence
hypothesis can be used to calculate the expectations E[X; X j ]. •

Version 2. Approximate second moment properties of X:

E[X]:::::: (i- a- 1 E[R a- I R])- 1 a- I q,


E[X XT] :::::: E[(a- 1 q - a- 1 R X) (a- 1 q - a- 1 R X)T]

=a- I ( q qT - q E[XT RT]- E[R X] qT

+E[R X XT RT]) (a-I)T. (8.25)

Proof: The average of Eq. 8.23 involves the unknown expectation E[R X]. The approach
here is to develop an equation for E[R X] rather then using the local independence hypoth-
esis in this equation. By multiplying Eq. 8.23 to the left with R and then averaging, we
have

E[R X]= E[R]a-I q- E[Ra- 1 R X]= -E[R a-IR X]:::: -E[R a-IR] E[X],

where the last approximate equality results by the local independence hypothesis. This
equality and Eq. 8.23 give the first formula in Eq. 8.25. The second formula in Eq. 8.25
results from the hypothesis of independence and the expression of X in Eq. 8.23. This
formula and the hypothesis of independence can be used to calculate the expectations
E[X; Xj] . •

8.3.2 Homogeneous equations

Let q = 0 in Eq. 8.1. Then this equation becomes

AX= AX. (8.26)

It is assumed that the matrix a = E[A] is symmetric.

Our objective is to solve the above random eigenvalue problem, that is, to
find probabilistic properties of the eigenvalues A; and the eigenvectors X; of A.
Note: It can be shown that the eigenvalues A; are Borel measurable functions of the entries
of A ([21], Theorem 2.2, p. 23) so that they are random variables on (Q, :F, P). The eigen-
vectors X; of A are also random variables on this probability space ([164], Theorems 1.6,
p. 37, and Theorems 1.8, p. 43). A

We consider eigenvalue problems defined by Eq. 8.26 and by equations of


similar form in which A is a linear differential or integral equation with random
8.3. Algebraic equations 567

coefficients. These types of problems are referred to as discrete and continuous


random eigenvalue problems, respectively. This section deals with discrete ran-
dom eigenvalue problems. Continuous problems are discussed in Section 8.4.2.
The determination of probabilistic properties of A; and X; can be based on
(1) extensions of the local solution in Section 6.5.2, (2) Monte Carlo simulation,
(3) exact or approximate expressions for the eigenvalues and eigenvectors of A
and bounds on A;, and (4) equations for moments of the eigenvalues and eigenvec-
tors of A. The first two classes of methods are not discussed since they constitute
direct extensions of previous developments in the book. The methods in the third
class are illustrated by exact expressions for A; and X; and bounds on A; (Sec-
tion 8.3.2.2), Taylor expansion (Section 8.3.2.3), and perturbation (Sec. 8.3.2.4).
The iteration (Section 8.3.2.5) and the crossing theory for stochastic processes
(Section 8.3.2.6) belong to methods in the fourth class. The next section reviews
essential properties of the eigenvalues and eigenvectors for a deterministic matrix.

8.3.2.1 Deterministic eigenvalue problem


Let a be an (n, n)-symmetric, non-singular matrix with real-valued entries.
The roots A. 1, ... , A.n of the characteristic equation det(a - A. i) = 0 are the
eigenvalues of a, where i denotes the (n, n) identity matrix. These roots may
not be distinct. We index the eigenvalues of a in increasing order, that is, A. 1 :S
A.2 :S · · · :S A.n. The polynomial det(a - A. i) of A. is called the characteristic
polynomial. An alternative form of the characteristic equation is

det(a- Ai) =co An + q An- 1 + · · · + Cn-1 A+ Cn = 0,

where Ck = -(Ck-1 d1 + Ck-2 d2 + · · · + q dk-1 + dk)/ k, k = 1, ... , n, co= 1,


and dk = tr(ak), k = 1, ... , n ([143], p. 84). A solution x of ax = A. x is said to
be non-trivial if x :f= 0. A non-trivial solution x; of ax; = A.; x;, i = 1, ... , n, is
called an eigenvector of a associated with the eigenvalue A.; of this matrix.
We summarize essential properties of the eigenvalues A.; and eigenvectors
x; of a and provide bounds on A. ; and a formula for calculating A.; .

• The eigenvalues A.; are real and the eigenvectors x; are in JE.n.
• If a is positive definite, its eigenvalues are positive.
• The eigenvectors corresponding to distinct eigenvalues are orthogonal and lin-
early independent.
• The eigenvectors are unique in the sense that an eigenvalue that is not multiple
can have only one eigenvector.
• If a has a multiple eigenvalue, it is possible to define a set of n linearly inde-
pendent eigenvectors providing a basis in IE. n .

Proof: The equation ax;= A;x; implies the equalities [((ax;)*)T]x; =A; (x7)T x;
and (x7)T (ax;) =A; (x7)T x; so that (A;- A;) (x7)T x; = 0, where x7 is the complex
568 Chapter 8. Stochastic Systems and Deterministic Input

conjugate of x;. Because x; is an eigenvector, (xi)T x; # 0 so that A.i' -A.; = 0. This


property also shows that the eigenvectors of a are elements of gt .
A matrix a is positive definite if xT ax ~ 0 for all x E !Rn. We have xf ax i =
A.;xf X; from ax;= A.; x; so that!..;~ Obecausexf ax;~ Obyhypothesis andx; is an
eigenvector such that xT xi > 0.
Two vectors, xi and x j, are said to be orthogonal if their inner product (xi , x j) =
xT x j is zero. The eigenvalue problems ax; = A.; x; and ax j =A. j x j imply x] ax; =
A.;x] x; and (axj)T x; = AjXJ x; so that (A.;- A.j) (x;,Xj) = 0. If A.; # Aj, then
(xi , x j) must be zero.
Suppose that A. 1 # /..2 and assume that the corresponding eigenvectors are linearly
+
dependent, that is, a1 XJ a2 x2 = 0 with a1, a2 # 0. We have (XI, a1 XJ a2 x2) = +
a1 (xi, xl) = 0 so that a1 must be zero in contradiction with our assumption. Hence, XJ
and x2 are linearly independent. If the eigenvalues of a are distinct, its eigenvectors span
the space JR1I and define a basis of this space.
Let (x 1, ... , Xn) be n linearly independent eigenvectors of a and suppose that AJ
has two eigenvectors, XJ and y 1. We have y 1 = Lk=l f3k Xk so that
n n n
AJYl =ayl = Lf3kaXk = Lf3kA.kXk and A.tYl = Lf3kA.tXk.
k=l k=l k=l
The difference, Lk= 2 f3k (A.t - A.k) Xk = 0, of the above equalities shows that the coeffi-
cients f3k (A.t - Ak) must be zero because the eigenvectors are linearly independent so that
f3k = 0 for k > 1 since A. 1 # Ak for k > 1. Hence, y 1 must be proportional to x 1·
Suppose that A. 1 is a double root of the characteristic equation and all the other roots
of this equation are simple. Hence, the eigenvectors (x3, ... , Xn) corresponding to the
distinct eigenvalues (/..3, ... , A.n) are linearly independent so that they span a subspace S
of IRn with dimension n - 2. The eigenvectors associated with A. 1 are linearly independent
of (x3, ... , Xn) since /..1 # A.;, i ~ 3, so that they are in the subspace JR1I \ S. Under
some conditions it is possible to select two linearly independent eigenvectors for AJ ([193],
pp. 7-21) .•

• A.; = {ax;, x;}j{x;, x; }.


eA.1::;: AR = (ax,x}j(x,x}::;: An forx E JRn \ {0}.

• lA.; - !hi I ::: ( .E~j=l r5) 112 ::: n max lrij 1. where r is an (n, n) symmetric
matrix with real-valued entries and /hi denote the eigenvalues of a+ r ([127],
pp. 96-99, and [193], pp. 101-103).
• lA.- akkl ::: Tk = .E7=I,i# lakd for at least one k ([120], p. 371, [193], p. 71).
Proof: We prove the first two statements. The first statement follows from the inner product
of ax;= A.; x; withx;.
Suppose that the eigenvectors of a are normalized such that their norm is II Xk II=
(xb Xk) 112 = 1. If x = Lk=l ck Xk is a vector in IRn with II x II= (x, x) 112 = 1, the
Rayleigh quotient is

~ 2
(ax,x)
AR = - - - = L.- ck Ak = /..1 +~ 2
L.- ck (A.k - AJ) ~ AJ
(x' x) k=l k=2
8.3. Algebraic equations 569

since II x =
11 2 Lk=l c~ =
1 and Ak :::: AI fork 2::: 2. Also, AR
An Lk=l c~ =An. If Ck = e; Cp fork =f. p,
n
AR = Ap + cp2 L...J
"" (Ak- Ap)Bk,
2
k=l,kf-p

so that AR = Ap + O(e 2) fore; = e and lei « 1. These properties imply


. (ax, x) d (ax, x)
AI = rmn --- an An = max
(x, x)
XEJRn\{0} (x, x) XEJRn\{0}

and provide bounds on the smallest and largest eigenvalues of a. •

8.3.2.2 Exact expressions and bounds


Suppose that it is possible to express the eigenvalues and the eigenvectors
of A as continuous functions of the entries of A. Because continuous func-
tions are Borel measurable the corresponding eigenvalues and eigenvectors are
random variables on (rl, :F, P). Statistics of A; and X; can be obtained by di-
rect calculations, Monte Carlo simulation, or other techniques using the mapping
A~ (A;, X;).

Example 8.10: Let SiJ, i, j = 1, 2, be a random stress tensor at a point in a


thin plate loaded in its own plane. Our objective is to determine the principal
stresses and stress directions of the stress tensor Sij, that is, the eigenvalues and
eigenvectors of random matrix A = {Su }. The eigenvalues and eigenvectors of
this matrix are, respectively,

2 ] I/2
A =S11+S22± [ (Su-S22) +S2
I,2 2 2 12

and X; = (1, -(Su- A;)/SI2), i = 1, 2. Probabilistic properties of A; and X;


can be obtained from the above equations by Monte Carlo simulation or direct
calculations. <>
Note: The characteristic equation of A is A2 - (Su + S22) A + Su S22 - Sf2 = 0.
The eigenvectors of A can be obtained from any of the two equations of the linear system
(A - A; i) Xi = 0, i = 1, 2, since det(A - A; i) = 0 a.s. &

Example 8.11: Consider the random matrix


-1
1
-1
~I g] ~a+Zb.
where Z is a random variable. The smallest eigenvalue of A is bounded by A I ::::
xf(a+Zb)xi =xfax+Zxfbxl =0.03983+0.3172Za.s.,wherexl =
[0.9295, -0.3159, 0.1903]T, II x II= 1, denotes the first eigenvector of a. The
bound follows from the properties of the Rayleigh quotient. <>
570 Chapter 8. Stochastic Systems and Deterministic Input

8.3.2.3 Taylor series method


Suppose that the entries of A are measurable functions of an JE.m -valued ran-
dom variable Z E Lz with mean JLz = {f.Lz,u} and covariance y z = {Yz,uvL u, v =
1, ... , m. Let a = A(p,z), A.; = A;(JLz), X; = X;(JLz), a,p = 8A(p,z)/8zp,
X;,p = 8X;(JLz)f8zp, Ai,p = aA;(JLz)f8zp, and Ci,p = aC;(JLz)/8zp, where C;
are the coefficients of the characteristic polynomial det(A - A i). It is assumed
that the eigenvalues A. i of a are distinct and that the eigenvectors xi of this matrix
are normalized to have unit norm.

If a is a symmetric matrix with det(a) f. 0 and distinct eigenvalues A.;, then


m
A; ~A.;+ L A.;,p (Zp- E[Zp]), where
p==1
,n-1
A. _ _ C1,p Ai + · •· + Cn-1,p Ai, + Cn,p and
(8.27)
l,p- ,n-1
n A;
(
+ n- l) CJ A;,n-2 + .. · + Cn-1

m
L
n T .
xk a,pX 1
X;~ x; + l:xi,p (Zp- E[Zp]), where x;
'
p =
A.· - Ak
Xk.
p==1 k==1,kf-i I

(8.28)

Proof: The partial derivative Ai,p and x;,p of A; and X; relative to a coordinate Zp of
Z in Eqs. 8.27 and 8.28 are called sensitivity factors. The expressions of A; and X; in
Eqs. 8.27 and 8.28 are referred to as first order approximations.
Let An+ C1 An- 1 + · · · + Cn-1 A+ Cn = 0 be the characteristic equation for A.
The differentiation of this equation with respect to Z p yields

nAn-lA ,p +C I,p An- 1 +(n-l)C 1 An-ZA ,p

+ · · · + Cn-1,p A+ Cn-I A,p + Cn,p = 0 or


(n An-l + (n- 1) C1 An- 2 + · · · + Cn-1) A,p
= -(Cl,p An-l + · · · + Cn-1,p A+ Cn,p).
where W,p stand for oWjozp· The above equation written for A = A; and Z = JLz
gives Ai,p in Eq. 8.27. Similar equations can be obtained for higher order derivatives of the
eigenvalues of A.
The differentiation of A X;= A; X; with respect to Zp gives A,p X;+ A X;,p =
A;,p X;+ A; X;,p or, by setting Z equal to JLz,

(a-}..;i)x;,p =A;,pXi -a,px;.

The above equation can be used to calculate the coordinates bjjl of x;,p = L:!== 1 hjjl Xl
in the basis defined by the eigenvectors of a by noting that (1) (a- A; i) Xi,p is orthogonal
8.3. Algebraic equations 571

to x; since (x TI
a - J..; x T) xi • p
I
= 0 and (2) the product of the above equation with xkT to
the left yields
(Ak -A;)b}f) = A;,k8ki - x [ a,pX;.
The latter equation gives the coordinates biJ:) of x;,p fori f= k since a has distinct eigenval-
ues by assumption. The coordinate b}f) remains undetermined and can be taken as zero so
"n
that x;,p = L...k=!,kf=i b(p) · b(p)
ik Xk· The ch o1ce · not restnctlve
ii = 0 1s ·
· · smce ' 1") x;,p
(a- ""i
and x; are orthogonal so that the above results do not change if we add to x;,p a vector
proportional to xi . •

The first order approximations of A; and Xi in Eqs. 8.27 and 8.28 are linear
functions of Z so that the second moment properties of Z are sufficient to calcu-
late the first two moments of the eigenvalues and eigenvectors of A. For example,
the approximate mean and variance of A; are A.; and r;;,q=l
Ai,p Ai,q Yz,pq. re-
spectively.

Example 8.12: Consider a (3, 3) random matrix A= a+ R, where

·=[
2.6 -1.1 0.0 ]
-1.1 5.2 -../2 '
0.0 -../2 10

0.1 -0.1
0.0]
R = Z [ -0.1 0.2 0.0 '
0.0 0.0 0.0

and Z is a random variable with mean zero (f-L z = 0) and variance a The i.
eigenvalues and eigenvectors of a, that is, of A with Z = 0, are A. 1 = 2.1647,
A.z = 5.2385, A.3 = 10.3967, and

0.9278 0.3711 0.0381 ]


b = [ 0.3671 -0.8902 -0.2699 '
0.0663 -0.2646 0.9621

where the columns of bare the eigenvectors of a. The first order Taylor approxi-
mations of the eigenvalues and eigenvectors of A given by Eqs. 8.27 and 8.28 are
At ::::: 2.1647 + 0.0449 Z, Az::::: 5.2385 + 0.2383 Z, A 3 ::::: 10.3967 + 0.0168 Z,
and
-0.0046 0.0118 0.0054 ]
X ::::: b + Z [ 0.0113 0.0025 -0.0105 '
0.0024 -0.0082 -0.0032
respectively. <>
Note: The random coefficients of the characteristic polynomial det(A- A i) are Co = 1,
Ct = -0.3 Z- 17.8, Cz = 0.01 Z 2 + 3.82 Z + 88.31, and C3 = -0.1 z 2 - 8 Z- 117.9
572 Chapter 8. Stochastic Systems and Deterministic Input

so that Co,I = 0, C1,1 = -0.3, C2,1 = 0.022 + 3.82, and C3,1 = -0.22-8. The
sensitivity factors for the eigenvalues can be calculated from (Eq. 8.27)

with 2 set as zero. The first order approximations of the eigenvectors of A is given by
Eq. 8.28. For example, the coordinates of x 1, 1 are

T T
x 3 a,1 XJ
x 2 a,l x1
b (l) _ --"'--'--- (1)
= -0.0124 and b13 = = -0.00089
12 - }q - A2 A.1 - A3

so that X 1 = XJ + (bi~ x2 +hi~ x3) 2. A

8.3.2.4 Perturbation method


Suppose that
A=a+cR, (8.29)

where a and k are (n, n) matrices with real-valued entries that are deterministic
and random, respectively, det(a) f. 0, E[R] = 0, and cis a small parameter. It
is assumed that a is symmetric and has distinct eigenvalues. Let Ai and xi be the
eigenvalues and eigenvectors of a. The eigenvectors xi are scaled to have unit
norm. It is assumed that a has distinct eigenvalues.

The first order approximations of the eigenvalues and eigenvectors of A are


T - 2
(8.30)
A;=::A;+cxi Rx;+O(c) and
T -
L
n
xkRXi 2
X;::::::: x; + c Xk + O(c ). (8.31)
A· -Ak
k=l,koFi l

Proof: The representations,

of the eigenvalues and eigenvectors on A introduced in A Xi = A; Xi with A in Eq. 8.29


yields
- (1) (1) (1)
(a+eR)(xi+eXi + .. ·)=(A.;+eAi + .. ·)(x;+eXi + .. ·)

so that

ax;=A.;x; (order 1),


(1) - (1) (1)
aX; +Rxi=A.iX; +Ai x; (order e).
8.3. Algebraic equations 573

The alternative form, (a- Aj i) x~l) = AP) Xj - Rx;, of the above equation for order 8
and the orthogonality of the vectors (a-).; i) xp) and x; imply xf (A?) x; - Rx;) = 0
orA;(1)
=X; Rx; smcex; x; = 1.
T - · T
Alternative calculations can be performed to find this result. For example, the equa-
tion for order 8 multiplied to the left by x]
gives
n n
xiT a "LBikXk+xiT Rx;
- =).;xiT "LBikXk+A;(1) xiT x;,
k=l k=l

where xp) = Lk=l B;k Xk and B;b k = 1, ... 'n, denote the projections of xp) on the
eigenvectors of a. The last equation simplifies to Bij (). i- ).;) +x] Rx; = A} 1) 8;i since
x] ax; =A;8ij andx] x; =8;i so that
(1) T -
A; =X; Rx; fori= j and

x~ Rx;
B;· =-1_ _ fori =I= j.
1 ).; - ).i

The projection B;; of X~l) on x; remains undetermined. Adding a vector proportional with
x; to X~l) = Li=l,i~i Bij xi does not change the above results so that we take B;; = 0,
and the resulting first order approximation of X; is x; + 8 x?). Additional considerations
on the determination of xp)
and approximations of order 2 and higher of the eigenvalues
and eigenvectors of matrix A can be found, for example, in [120] (Chapters. 9 and 11). •

The first order approximations of A; and X; in Eqs. 8.30 and 8.31 are linear
functions of R so that the second moment properties of this random matrix are
sufficient to calculate the first two moments of the eigenvalues and eigenvectors
of the random matrix A.

Example 8.13: Suppose that the deterministic and stochastic parts, a and R, of a
(3, 3) random matrix A (Eq. 8.29) are the matrices a and R in Example 8.12. It is
assumed that the random variable Z has mean zero and variance of order 1.
The zero order perturbation solution is given by the eigenvalues and the
eigenvectors of a (Example 8.12). The first order corrections of the eigenvalues
and eigenvectors of A are Ail) = x[ RXt = 0.045971 Z, A~l) = xi Rx2 =
0.238710 Z, A~1 ) = xf RX3 = 0.015319 Z, and
-0.0046 0.0118 0.0054 ]
X(l) = Z [ O.Ql 13 0.0025 -0.0105 '
0.0024 -0.0082 -0.0032
respectively. The eigenvalues and eigenvectors to the order e are A ; ~ J...; +
eA?) and X; ~X;+ ex?>. The above expressions for x<i) and A; can be
used to calculate approximately moments and other probabilistic properties of the
eigenvalues and eigenvectors of A. <>
574 Chapter 8. Stochastic Systems and Deterministic Input

Note: The first order approximation of the first eigenvector of A is (Eq. 8.31)

(l) xT Rx1 xT Rx1


X1 = 2 x2+ 3 X3=(-0.0124x2-0.00089x3)Z.
A.t-A.2 A.t-A.3

The calculations of moments and other probabilistic properties of A} 1) and x}l) are very
simple in this case since they depend linearly on Z. •

8.3.2.5 Iteration method


Let A be an (n, n) random symmetric matrix whose entries are real-valued
square integrable random variables and let

A =a+R, (8.32)

where a = E[A] is a symmetric matrix and R = A- a. Our objective is to


develop approximations for the first two moments of the eigenvalues and eigen-
vectors of A. The analysis is based on a version of the local independence hy-
pothesis (Section 8.3.1.7) stating that the dependence among A, X, and R can
be disregarded. As previously stated, this hypothesis has no justification. We in-
clude the iteration method in our discussion because of its wide use in some fields
[164, 173].
We present just a few results based on both versions of the iteration method
discussed in Section 8.3.1.7. Additional information on this method can be found
in [164] (pp. 18-19).

The expectation of the eigenvalues and eigenvectors of A can be approximated


by the solution of the eigenvalue problem

a E[X] ::::: E[A] E[X]. (8.33)

Note: The above approximations of the expectations of the eigenvalues and eigenvectors
of A coincide with the eigenvalues and eigenvectors of a.
The equation, a E[X] + E[R X] = E[A X], obtained by averaging (a + R) X =
A X cannot be used to calculate the first order moments of A and X since E[R X] and
E[A X] are not known. The approximations

E[R X] ~ E[R] E[X] =0 and E[A X] ~ E[A] E[X],

based on the hypothesis of local independence, yield Eq. 8.33. •

The second moment of the eigenvalues of A can be approximated by

E[Af] ::::: t..f + xf E[R R] Xi. (8.34)


8.3. Algebraic equations 575

Note: Consider the alternative form,


X= A - 1 (a+ R) X= A - 1 (a+ R)[A - 1 (a+ R) X]
=A - 2 (a a+ a R + R a+ R R) X,
of A X = A X. The expectation of this equation is
E[A 2 X] = a a E[X] +a E[R X] + E[R a X] + E[R R X]

and depends on the unknown averages E[A2 X], E[R X], E[R a X], and E[R R X]. The
local independence hypothesis and the last equation give E[A2 ] E[X] :::::: a a E[X] +
E[R R] E[X] or
E[ATJxt Xi:::::: xt a a xi +xt E[R R]Xj,
which yields Eq. 8.34 sincexf Xi= 1 andxf a a xi= >..fxf Xi= >..T- .&

Example 8.14: Consider the (3, 3) random matrix A in Example 8.13 with s = 1.
The first moments of the eigenvalues and eigenvectors of A coincide with the
deterministic solution corresponding to matrix a. We have obtained the same ap-
proximation for the mean values of the eigenvalues and eigenvectors by the pertur-
bation method. However, the approximate second moment of the first eigenvalue
of A is E[AI] ~ 4.69 + 0.0038 a}, and differs from the perturbation solution.¢

8.3.2.6 Level crossing for stochastic processes


We use the Rice formula discussed in Section 3.10.1 to (1) find the likely
values of the eigenvalues of an (n, n) real-valued random matrix A and (2) reduce
the uncertainty in the eigenvalues of A if some of its eigenvalues are specified. For
example, A may characterize a physical system with uncertain properties whose
lower eigenvalues have been measured without errors. Since A is assumed to be
symmetric and to have real-valued entries, its eigenvalues are real-valued. The
level crossing method discussed here does not seem to be useful for finding prop-
erties of the eigenvectors of A.
Let

V(A.) = A.n + C1 A.n- 1 + · · · + Cn-1 A.+ Cn = A.n +«(A.) C, (8.35)


be the characteristic polynomial, det(A -A. i), of A, where
a(A.) = [A.n- 1, A.n- 2 , ... , A., 1] and C = [C1, C2, ... , Cn-1, Cnf. (8.36)
The derivative V'(>..) = dV(>..)jd)... of Vis
V'(A.)=nA.n- 1 +fl(A.)C, where (8.37)
fl(A.) = [(n - 1) An- 2 , (n - 2))... n- 3 , ... , 1, 0]. (8.38)
The coefficients Ck. k = 1, ... , n, are measurable functions of the entries of A
so that they are random variables on the probability space (Q, :F, P) on which A
is defined. The zeros of the polynomial V(A.) are the eigenvalues of A, and these
zeros are on the real line since A is real-valued and symmetric.
576 Chapter 8. Stochastic Systems and Deterministic Input

Example 8.15: Let A be a (3, 3) symmetric matrix with entries A 11 = K1 + Kz,


Azz = Kz + K3, A33 = K3, A12 = -Kz, A13 = 0, and Az3 = -K3, where
K = (K1, Kz, K3), K = exp(G), and G is an JR3 -valued Gaussian variable with
mean (1, 1, 1) and covariance matrix y = {pli-jl}, i, j = 1, 2, 3. Figure 8.7
shows three samples of the polynomial V (A.) corresponding to three independent

10 r------,-----,------,---,----,----.----,-----,----,--,

: 7Third zero-crossing

-s Second zero-crossing
-8

Figure 8.7. Three samples of V(A.) for p = 0.3

samples of A with p = 0.3. The zero-crossings of stochastic process V give the


eigenvalues of the random matrix A. <>

The polynomial Y(A.), A. E [0, oo), in Eq. 8.35 is a real-valued stochas-


tic process defined on the real line that has continuous and differentiable sample
paths. The process V is not stationary and its finite dimensional distributions are
not usually Gaussian. The mean zero-crossing rate v(A.) of V at each A. 2: 0 is
given by the Rice formula (Section 3.10.1). It is difficult if not impossible in ap-
plications to calculate v(A.) analytically because of the properties of V and the
complex dependence of the coefficients C in Eq. 8.35 on A. To simplify calcula-
tions, we assume that Vis a non-stationary Gaussian process V o, which has the
same second moment properties as V. There is no theoretical justification for this
assumption, but numerical results in [75] provide a heuristic support for it. Note
that the probability law of VG is completely determined by the second moment
properties of V.
Let tJ-(A.) = E[Y(A.)] and 1-L'(A.) = E[Y'(A.)] be the mean functions of
the processes v<0>(A.) = V(A.) and y(l)(A.) = V'(A.), respectively, and denote
by Ypq(A., p) = Cov[V<P>(A.), y(q)(p)], p, q = 0, 1, their covariance functions.
8.3. Algebraic equations 577

Consider also the functions

m(A) = '(A) _ Yol (A, A) (A)


f.L Yoo(A, A) f.L '

(A) 2 = (A A) _ YOl (A, A) YlO(A, A) (8.39)


a Yll ' Yoo(A, A) '

where x (u) = u (2 <l>(u)- 1) + 2¢(u), <l>(u) = f~oo ¢(s) ds is the distribution


of N(O, 1), and¢(u) = (2rr)- 112 exp(-0.5u 2 ).

If V is assumed to be a Gaussian process V G. then v(A) becomes [75]

VG(A) = a(A) X (m(A)) ¢ (- JL(A) ) , A~ 0. (8.40)


JYOo(A, A) a(A) Jyoo(A, A)

L:
Proof: By the Rice formula the mean zero-crossing rate of V is

v(A.) = lzl fvo.), V'(J.)(O, z) dz or

v(A.) = [L: lzl fv'(J.)IV(J.)(Z I O)dz] fv(J.)(O) = E [iV'(A.)II V(A.) = o] fv(J.)(O),

where fv(J.), V'(J.) and fv'(J.)IV(J.) denote the densities of (V(A.), V' (A.)) and V'(A.) I V(A.),
respectively. The mean rate v(A.) includes zero-crossings with both positive and negative
slopes. The second moment properties of the JRZ-valued random variable (V(A.), V 1(A.))
are

tt(A.) = E[V(A.)] = A.n +«(A.)p., tt'(A.) = E[V 1 (A.)] = nA.n-l + {l(A.) p.,
Yoo(A.,p) = «(A.)y«(p)T, Yot(A.,p) = «(A.)y{l(p)T, Ylo(A.,p) = fl(A.)y«(p)T, and
Yll (A., p) = fl (A.) y fl (p) T, where p. and y denote the mean and covariance matrices of
the random vector C.
If it is assumed that Cis an 1Rn-valued Gaussian variable, then (V(A.), V 1(A.)) is a
Gaussian vector, and the mean and variance of the conditional random variable V'(A.) I
V(A.) are m(A.) and a(A.) 2 in Eq. 8.39, respectively (Section 2.11.5). The Rice formula
applied to the process VG yields vG(A) in Eq. 8.40.
The second moment properties of C may be difficult to obtain analytically, but can
be estimated simply from samples A generated by Monte Carlo simulation. •

Suppose that the first r < n eigenvalues of a physical system character-


ized by a random matrix A have been measured without errors on a particu-
lar sample of A. Our objective is to find properties of the remaining eigenval-
ues of A conditional on this information. Let f(z I y) be the joint density of
(V'({t), ... , V'({r)), r < n, conditional on (V({l) = Yl· ... , V({7 ) = Yr),
where {I < · · · < {r· Denote by v(A I 0, z, ~)the mean zero-crossing rate of Vat
578 Chapter 8. Stochastic Systems and Deterministic Input

A. > ~r conditional on V having zero-crossings at ~ s with slopes Zs, s = 1, ... , r,


where y = (yl, ... , Yr), Z = (ZI, ... , Zr), and~= (~1, ... , ~r).

The joint density of V ' ( ~s) at the Ys -crossings of V, s = 1, ... , r, that is, the
density of (V'(~I), ... , V'(~r)) conditional on (V(~I) = Yl, ... , V(~r) = Yr),
is
( J-) = TI~=I lzsl f(z I y)
g z I y, ~ r nr . (8.41)
JJE.r s=llzsl f(z I y)dz

The mean zero-crossing rate of V (A.) = det( A - A. i) conditional on A 1 = ~1 <


· · · < Ar = ~r. r < n, is

v(A. I~)= foc,g(z I 0, n v(A. I 0, Z, ~)dz, A> ~r· (8.42)

The corresponding mean zero-crossing rate for V a is denoted by va (A. I ~).


Proof: We present a heuristic proof for Eqs. 8.41 and 8.42. Rigorous arguments can be
constructed based on properties of Slepian processes ([121], Chapter 10).
Let C be a set defined by the Ys-crossings of V at ~s, s = 1, ... , r. The set can be
approximated by n~=l (Us U Ds) for small values of ll~s. s = 1, ... , r, where

Us= {V(~s) < Ys < V(~s) + V'(~s) ll~s} and


Ds = {V(~s) > Ys > V(~s) + V'(~s) ll~s}, s = 1, ... , r.

Consider also the event 9 = n;=l {V(A 1) E (x 1 , x 1 + llxtl}, !lx 1 > 0, for some arbitrary
values of A such that At > ~r. The probability of the simultaneous occurrence of C and 9
can be approximated by

where !lx1 is small and f(z, y, x) is the joint density of (V'(~s), V(~s)), s = 1, ... , r,
and V(A 1 ), t = 1, ... , q, so that

P(Q I C)= frt=l


ilXt fJE.r n~;=l lzs I f(z, y, x) dz
fJE.r Ds=l Ius I f(u, y) du

=n
l
q

t=l
ilXt
JE.r
D~- 1 1zsl f(z, y) f(x I y, z) d
r
fJR,ns=lluslf(u,y)du
Z

= n q

t=l
flxt {
}JE.r
g(z I y, ~) f(x I y, Z),

where f (x I y, z) denotes the density of V (At), t = 1, ... , q, conditional on (V (~s) =


Ys. V' (~s) = Zs ), s = 1, ... , r, and f(x, y) denotes the density of (V(~s), V' (~s)), s =
1, ... , r. The above result gives Eq. 8.41.
8.3. Algebraic equations 579

Let v(). 1 y, z, l;) be the mean zero-crossing rate of the conditional stochastic pro-
cess V().) I (V(l;s) = Ys. V'(l;s) = Zs, s = 1, ... , r) for A> l;r. The mean zero-crossing
rate of V(A), A > l;r, conditional on only V(l;s) = y 5 , s = 1, ... , r, can be obtained by
integrating v(). 1 y, z, l;) weighted by g(z I y, l;) over all possible values of z. This mean
crossing rate withy = 0 yields v(A II;) in Eq. 8.42.
The determination of the functions g(z I y, l;) and v(). I 0, z) is difficult for an
arbitrary vector C. If Cis Gaussian, the conditional zero-crossing rate v(). I 0, z), denoted
by vc(A I 0, z), can be obtained from Eq. 8.40 with the appropriate parameters and
g(z I y, l;) becomes

n~=1
lzs I exp [ -(1/2) (z- m)T y- 1 (z- m)]
gc(zly,l;)= [ 1 ] •
f~r TI~= 1 Ius I exp -(1/2) (u- m)T y- (u- m) du

where m and y denote the mean and covariance matrices of the conditional random vector
(V'(/;1), ... , V'(l;r)) I (V(/;1) = Y1o •.. , V(l;r) = Yr) [75] . •

n
The mean zero-crossing rates v(A.) and v(A. I provide useful information
on the eigenvalues Ai of A. For example, (1) the average number of eigenvalues of
A in an interval I of the real line is f 1 v(A.) dA. or f 1 v(A. I ~) dA., (2) the probability
that the number of eigenvalues N (I) in I exceeds a specified value q > 0 is
bounded by

P(N(l) ~ q) ::S E[N(l)]


J1 v(A.)dA. q > 0,
q q
and (3) the probability that A has at least one eigenvalue in I can be approximated
by 1- exp (- f 1 v(A.)dA.) or 1- exp (- f1 v(A. I ~)dA.).
Note: The definitions of v and v give the first property. The second property follows from
the Chebychev inequality. The last property is based on the heuristic assumption that the
zero-crossings of the characteristic polynomial define an inhomogeneous Poisson process
with intensity V(A) or v(A II;). A

Example 8.16: Consider a random matrix

l
where g denotes the constant of gravity, 0 < a < l and m > 0 are some constants,
and K > 0 is a random variable. The eigenvalues, A 1 = gIl and A2 = gIl +
2 a 2 K l(m 12 ), of A are the square of the natural frequencies of a mechanical
system consisting of two pendulums with length l and mass m coupled by a spring
of random stiffness K. The mean zero-crossing rate of the process V can be
calculated exactly and is [75]

(
v(A.) = 8 A. - - g) + -mz2 fK (m- z2 (A. - -)
g) .
l 2a 2 2a 2 l
580 Chapter 8. Stochastic Systems and Deterministic Input

The process V has a zero-crossing at )., = g j I, indicating a deterministic eigen-


J
value. The average number of zero-crossings of Vis 000 v()..) d). = 2. 0
Note: The probability density functions of the eigenvalues of A,

ft.. 1 (A.) =&(A.- gj l) and

can be obtained by direct calculations and show that the first eigenvalue of A is determin-
istic, in agreement with the crossing solution. •

Example 8.17: Consider a random matrix

A _ [ (Kt + Kz)fm
- (Kzlz- Ktlt)fm

where m,lt,fz. g > 0 are some constants and K 1. Kz > 0 are random variables.
This matrix defines the eigenvalue problem for a rigid bar with mass m, length I,
and moment of inertia g relative to its mass center, which is supported at its ends
by two springs with stiffness K 1 and Kz. The mass center is at distances It and lz
from the two springs so that I = It + lz. Numerical results are given for m g =
3220 lb, It = 4.5 ft, lz = 5.5 ft, g = m r 2 , r =4ft, and lognormal variables K 1
and Kz with E[KI] = 2400 lb/ft, E[Kz] = 2600 lb/ft, coefficients of variation
VI = vz = 0.3, and correlation coefficient p [75]. The lognormal vector (K 1. Kz)
represents the image of an JR2 -valued Gaussian variable by a nonlinear mapping
J
([79], pp. 48-49). The matrix A has two eigenvalues and we have 000 v()..) d)., =
2.
Figure 8.8 shows the mean zero-crossing rates v and va of V and Va, re-
spectively, for p = 0, 0.5, and 0.9 in the left three plots. The mean zero-crossing
rates v have been obtained by numerical integration. The plots also show the con-
ditional mean zero-crossing rates va().. I At = 46). The mean zero-crossing rates
va().. I At = 46) are also shown in the three plots on the right side of Fig. 8.8
together with Monte Carlo (MC) simulation results. The Gaussian assumption is
satisfactory for the matrix A in this example. The additional information A 1 = 46
reduces the uncertainty in Az. The reduction in uncertainty increases with the cor-
relation coefficient p between K 1 and K 2. 0
Note: The joint density of (V(A.), V' (A.)) is

IK 1,K2 (kt. kz)


fv().. ), V'(J.)(x, z) = l(a A.+ c kz) b- (b A.+ c kt)l a'

where a= -(ljm +If/~). b = -(ljm + Zif~). and c = (lf +Zi)f(m ~).The arguments
ks. s = 1, 2, in the above expression are related to x and z by the mappings V(A.)
= x and
V 1 (A.) = z. _.
8.3. Algebraic equations 581

0.16 r--~-:--~-------,
p =0.0 p =0.0
0.12 0.12
v(A.) MC
0.08 0.08

0 oL-~----~~----~~
30 40 50 60 70 80 90 100 30 40 50 60 70 80 90 100

0.16 0.16 r-----------------,


p =0.5 p =0.5
0.12 0.12

0.08

0.04

0.16 r--~~~-------,
~p =0.9
0.12
MC~
0.08 0.08
v G (A. I A1= 46)
0.04 0.04 ~ ~
o~LL--~--~----~~
0 .____._____._____.______.__)__.___,\._______,
30 40 50 60 70 80 90 100 30 40 50 60 70 80 90 100
A. A.

Figure 8.8. Exact and approximate zero-crossing rates

Example 8.18: Consider a system with n identical masses m > 0 connected in


series by n springs of random stiffnesses K 1 , ... , K n. The first spring of stiffness
K 1 is attached to a fixed support. Figure 8.9 shows the approximate mean zero-
crossing rate of V assumed to be a Gaussian process (Eq. 8.40) and a histogram
of the eigenvalues of the system obtained by Monte Carlo simulation for n =
4, independent Gaussian variables Ki with means E[KI] = 30,000, E[K2] =
20, 000, and E[K 3] = E[K4] = 10, 000 lb/in and coefficients of variation vi =
v = 0.15, and m = 100 lb sec 2/in [75]. Results show that the approximate
solution given by VG(A.) is consistent with simulation results. 0

Note: The square frequencies of the system coincide with the eigenvalues of the (n, n)
582 Chapter 8. Stochastic Systems and Deterministic Input

0.25 ,--.,---,...-----.----.---~-.---,.----,

0.2

0.15

0.1

0.05
Gaussian assumption
Monte Carlo

Figure 8.9. Approximate zero-crossing rates based on the Gaussian assumption


and Monte Carlo simulation

matrix A. The non-zero entries of A are Ai,i = (Ki + Ki+l)fm, i = 1, ... , n - 1,


An,n = Kn/m, and Ai,i+l = Ai+l,i = -Ki+lfm, i = 1, ... , n- 1. A

8.4 Differential and integral equations


Let D be an open bounded subset of Rd and t ~ 0 be a time argument.
Consider the differential equation

(A- AI) [U(x, t)] = q(x, t), x ED, t > 0, (8.43)

where I denotes the identity operator, A is a parameter, q is an 1Rn-valued deter-


ministic function, and A is a differential operator with random coefficients defined
on a probability space (Q, :F, P). Initial and boundary conditions need to be spec-
ified to find the solution U of Eq. 8.43. These conditions can be deterministic or
random. Generally, the solution of Eq. 8.43 is an 1Rn-valued random function
defined on D* = D x [0, oo). An alternative form of Eq. 8.43 is the integral
equation
U(x*) = F(x*) +A { K(x*, ~*) U(~*) d~*, (8.44)
ln•
wherex* = (x,t), ~* = (~,s), x,~ ED c Rd, s E (O,t), t > 0, F is a
random function depending on the initial and boundary conditions, and K denotes
a random kernel.
Note: The operator A in Eq. 8.43 and the generator of diffusion processes, used exten-
sively in Chapter 6 and other parts of the book, share the same notation but are different
8.4. Differential and integral equations 583

mathematical objects and should not be confused. The symbol A in Eq. 8.43 has been used
to parallel the notation A in Eq. 8.1. A

Example 8.19: Let U be the solution of U"(x) +A Y1 U(x) = q(x), x E D =


(0, 1), with the boundary conditions U (0) = Y2 and U' (0) = 0, where Y1, Y2 are
real-valued random variables and q is an integrable function. The operator A in
Eq. 8.43 for this differential equation is A = d 2jdx 2 +A (Y1 + 1). The integral
form of the differential equation for U is ([99], p. 226)

where the term in square bracket is a random function F (x) and the kernel of the
second integral K (x, 0 = 1[O,xJ (0 (~ - x) Y1 is random (Eq. 8.44). <>

The problem defined by Eq. 8.43 with q i= 0 or Eq. 8.44 with F i= 0 is


said to be inhomogeneous. If q = 0 and F = 0 in these equations, we deal with
a homogeneous problem. Our objective is to find probabilistic properties of the
solution of the inhomogeneous and homogeneous versions of the above equations.
The local solutions in Sections 6.2, 6.3, 6.4, and 6.5 can be extended to solve
locally some forms ofEqs. 8.43 and 8.44 by following the approach in Section 8.2.
These extensions are not considered here.

8.4.1 Inhomogeneous equations

Let V =A- AI for a fixed A such that Eq. 8.43, that is,

V[U(x,t)]=q(x,t), xEDC~d, t~O, (8.45)

has a.s. a unique solution.

Note: The solution U of this equation is a random field on (Q, :F, P) because it is a
measurable function of the random coefficients of V and of the random initial/boundary
conditions for Eq. 8.45, which are defined on this probability space. A

It is assumed that (1) all random parameters in Eq. 8.45 have finite second
moments, (2) Vis a linear operator, that is, V[a1 U1 + a2 U2] = a1 V[Ul] +
a2 V[U2], where a;, i = 1, 2, are some constants, and (3) the deterministic part
£ = E[V] of V has an inverse £- 1 that is a linear operator. The random part
R = V - £ of V has zero expectation by definition. For example, the random
operator in Example 8.19 is V = d 2 jdx 2 +A Y1 so that£= d 2jdx 2 +A E[Yl]
and R = A (Y1 - E[YI]).
Our objective is to find probabilistic properties of the solution U of Eq. 8.45.
We have already mentioned that some of the local solutions in Chapter 6 can be
extended to solve a class of differential equations with random coefficients. We
584 Chapter 8. Stochastic Systems and Deterministic Input

also can approximate Eq. 8.45 by an algebraic equation with random coefficients
and use any of the methods in Section 8.3.1 to solve the resulting equation. The fi-
nite difference, finite element, or other methods can be used to discretize Eq. 8.45.

8.4.1.1 Monte Carlo simulation method


The approach outlined in Sec. 8.3.1.1 for solving Eq. 8.2 can be applied
to find probabilistic properties of the solution U of Eq. 8.45. Our comments on
features and limitations of Monte Carlo simulation in Section 8.3 .1.1 remain valid.
The application of the Monte Carlo simulation involves the following two steps.

1. Generate ns independent samples Vk = £ + Rk, k = 1, ... , ns. of the


stochastic operator V and calculate

Vk[Uk(X, t)] = q(x, t), k = 1, ... , n 8 • (8.46)

2. Estimate statistics of U, for example, the mean and covariance functions, JL u


and Cu, of U, can be approximated by

1 ns
fLu(X, f)=- LUk(X, t),
ns k=l
1 ns
Cu(Xt, X2, ft, t2) = - L(Uk(Xt, ft)- fLu(Xt, ft))
ns k=l
X (Uk(X2, t2)- fLu(X2, t2))T. (8.47)

Note: The algorithms in Chapter 5 can be used to generate samples of V. The Monte Carlo
simulation method is not restricted to linear operators. It can be applied to nonlinear equa-
tions, in which case the determination of the samples Uk of U requires solving a nonlinear
differential equation rather than Eq. 8.46. A

8.4.1.2 Taylor series method


Suppose that V in Eq. 8.45 and the initiaVboundary conditions of this equa-
tion depend on m < oo random parameters collected in an ocm -valued random
variable Z E L2 defined on a probability space (Q, F, P) with mean JL z = {J.tz,u}
and covariance y z = {Yz,uv}. If a coefficient of D is, for example, a random
field, it can be approximated by a parametric model consisting of a finite sum
of deterministic functions weighted by random coefficients. We have used this
approximation in Section 5.3 to generate samples of an arbitrary random field.
The solution U of Eq. 8.45 can be viewed as a function not only of (x, t)
but also of Z, that is, U = U(x, t; Z). It is assumed that the mapping Z 1-+ U
has continuous second order partial derivatives for each (x, t). Since the map-
8.4. Differential and integral equations 585

ping Z ~---+ U is measurable, U is a random function on the probability space


(fJ, F, P).

1. Approximation of U:

~ aU(x, t; P.z)
U(x, t; Z):::::: U(x, t; P.z) +~ a (Zu -t-Lz,u). (8.48)
u=l Zu

2. Approximate second moment properties of U:

(8.49)

Note: The approximate solution for U in Eq. 8.48 is referred to as the linear or first order
Taylor solution. Considerations similar to the discrete case (Eqs. 8.5 and 8.6) can be used
to derive Eqs. 8.48 and 8.49. A
The functions U(x, t; P.z) and aU(x, t; P.z)fazu needed to apply Eqs. 8.48
and 8.49 can be calculated from the following two equations. Let l be the operator
D with JLz in place of Z.

C[U(x, t; P.z)] = q(x, t), (8.50)


r [aU(x, t; P.z)] __ aD(x, t; P.z) U( .
~.., - x,t,p.z). (8.51)
azu azu

Proof: Generally, l differs from C = E[V]. The function U(x, t; /Lz) in Eq. 8.50 results
from Eq. 8.45 with its initial and boundary conditions in which Z is set equal to /Lz·
The partial derivative W. J
[U] + V [ ~~ = 0 of Eq. 8.45 relative to a coordinate
Zu of Z gives Eq. 8.51 by taking Z equal to /Lz in this equation.
The functions u and au ;azu satisfy deterministic differential equations with the
same differential operator, the operator C. The derivatives au jazu give the rate of change
of U with respect to the coordinates of Z and are called sensitivity factors. If a sensitivity
factor au ;azk is small and the variance of Zk is not excessive, the effect of the uncertainty
in Zk on the solution is limited so that Zk may be taken equal to its mean value as a first
approximation. •
The approximate second moment properties of U in Eq. 8.49 can be (a) im-
proved by retaining additional terms of the Taylor series approximating the map-
ping Z ~---+ U and (b) generalized to solve nonlinear equations. For example, let
U be the solution of
D [U(x, t)] +N [U(x, t)] = q(x, t), (8.52)
586 Chapter 8. Stochastic Systems and Deterministic Input

where V is given by Eq. 8.45 and N is an n-dimensional column vector whose


entries are nonlinear functions of Z and U.
Note: Suppose that the approximation in Eq. 8.48 is augmented with second order terms.
In addition to Eqs. 8.50 and 8.51, we need a new equation giving the mixed derivatives,
a2 U(x, t; /Lz)fazuazv, u, v = 1, ... , m. This equation can be obtained by differentiating
Eq. 8.45 with respect to Zu and Zv, u, v = 1, ... , m, and setting Z equal to its mean value.
The first order approximation of Eq. 8.52 cannot use Eqs. 8.50 and 8.51 if U is de-
fined by Eq. 8.52 rather then Eq. 8.45. The function U (x, t; /Lz) is the solution of Eq. 8.52
with Z replaced by its mean value. The sensitivity factors for U can be obtained from
Eq. 8.52 by differentiation. Since

av [U]
azu
+v
azu
J
[au +aN [U] +taN [U] auk = 0
azu auk azu
k=l

and the functions z r-+ 1) and (Z, U) r-+ N are known, the derivatives av;azu. aN ;azu.
and au;auk can be calculated, and therefore the gradients aU(x, t; /Lz)/azu can be ob-
tained from the above equation. A

Example 8.20: Let U (x; Z) E lR, x ~ 0, be the solution of the differential


equation U'(x; Z) + (x Z 1 + x 2 Z2) U(x; Z) = 0, where U(O; Z) = h(Z) and
Z = (Z1, Z2) E JR.2 is the vector of uncertain parameters. Hence, the operator in
v
Eq. 8.45 is = djdx +X zl + x 2 z2.
The first order solution of U is given by Eq. 8.48, where U(x; p, z) and
8U(x; /Lz)/8zu can be calculated from

.C [U(x; /Lz)] = 0 with U(O; /Lz) = h(P,z) and

C [aU(x; /Lz)/8zu] =-xu U(x; /Lz) with 8U(O; /Lz)/8zu = 8h(p,z)/8zu.

In this case .C coincides with£ = E[V] since the coefficients of V are linear in
the coordinates of Z. <>

Note: The differential equations for U(x; /Lz) and aU(x,; /Lz)/azu. u = 1, 2, result from
Eqs. 8.50 and 8.51. Classical methods for solving deterministic differential equations can
be used to find U(x; /Lz) and aU(x,; /Lz)/azu. A

Example 8.21: Let U (t), t ~ 0, be the solution of the nonlinear differential equa-
tion (dfdt- a- Z U(t) 2 ) U(t) = q(t), where t ~ 0, a is a constant, Z denotes
a random variable, and U (0) = 0. The function U (t; f:-L z) and the sensitivity factor
V(t; t-Lz) = 8U(t; t-Lz)/8z satisfy the differential equations

U(t; t-Lz) =a U(t; t-Lz) + 1-Lz U(t; t-Lz) 3 + q(t),


V(t; t-Lz) =a V(t; t-Lz) + U(t; t-Lz) 3 + 3t-Lz U(t; t-Lz) 2 V(t; t-Lz),

with zero initial conditions.


Figure 8.10 shows the evolution of U(t; t-Lz) and V(t; t-Lz) fora= 1, q(t) =
8.4. Differential and integral equations 587

1.8 ,-----~-----,-----,.-----,------,

time

Figure 8.10. Solution and sensitivity factor for Z = fJ. z

sin(v t), v = 5, and IJ.z = -0.5. The sensitivity factor increases rapidly in time
and reaches a plateau at about t = 4, showing that the dependence of the solution
on the value of Z increases in time for t ::: 4 and is nearly constant for t > 4.
This observation is consistent with the graph of U(t; -0.4), showing a notable
difference from U ( t; fJ. z) for times t ~ 4. <>

Note: The differential equation defining U is nonlinear. The linear and nonlinear parts
of the differential operator are 'D[U] = (djdt -a) U and N[U] = -Z u 3 , respectively
(Eq. 8.52).
The equation

V(t; Z) =a V(t; Z) + U(t; Z) 3 + 3 Z U(t; Z) 2 V(t; Z),


for V(t; Z) = aU(t; Z)jaz, results from the equation for U by differentiation with respect
to Z, where V(t; Z) = a V (t; Z)jat. The sensitivity factor V(t; ~-tz) is the solution of the
above equation with V (0; Z) = 0 and Z set equal to its mean value. &

8.4.1.3 Perturbation method


Suppose that the random operator Din Eq. 8.45 has the representation

D=£+ER, (8.53)

where£ = E[D], E is a small parameter, E[R] = 0, and q is of order 1. The


perturbation series solution for Eq. 8.45 with Din Eq. 8.53 and its second moment
properties are given by the following equations.
588 Chapter 8. Stochastic Systems and Detenninistic Input

1. Approximation of U:

U(x, t) ~ Uo(x, t) + 8 Ut (x, t) + 8 2 U2(x, t) + 0(8 3), where


.C[Ur(X,t)]=-R[Ur-l(X,t)], r=1,2, ... , and
.C[Uo(x, t)] = q(x, t). (8.54)

2. Approximate second moment properties of U:

E[U(x, t)] ~ Uo(x, t) + 82 E[U2(x, t)] + 0(8 3),


E[U(xt, tt) U(x2, t2)T] ~ Uo(xt, tt) Uo(X2, t2)T
+ 8 2 E[Uo(xt, tt) U2(x2, t2l + Ut(Xt, tt) U1(x2, t2)T
+ U2(x1, tt) Uo(X2, t2l1 + 0(8 3). (8.55)

Proof: The approximation for U based on the first q terms of the power series in Eq. 8.54
is called approximation of order q. The first order approximation U ::::: Uo + s U 1
corresponds to q = 1. The initial and boundary conditions for the differential equation
defining U k, k = 0, 1, ... , consist of the terms of order sk of the corresponding conditions
for Eq. 8.45.
The recurrence formulas in Eq. 8.54 giving Uk can be obtained from Eq. 8.45 with
1J in Eq. 8.53 and the solution U is represented by the power series in Eq. 8.54. These
equations give

or, by grouping terms of the same order of magnitude,

The coefficients of each power of s must be zero by the fundamental theorem of perturba-
tion theory ([171], p. 12) so that we have

c [Uo]- q = o (order 1),


L:[Ut]+R[Uo]=O (orders),
L:[U2l+R[Ut] =0 (order s 2 ),

for the first three terms of the perturbation series. These conditions give Eq. 8.54. The
expressions of the first two moments of U depend on the order of the approximation con-
sidered for the solution. For example, the approximations of the mean of U to the orders
sand s2 are, respectively, E[U] = Uo + O(s 2) and E[U] = Uo + s 2 E[Uz] + O(s 3 ).
Errors may result if the order of the approximating solution is insufficient to deliver all the
terms needed for calculating moments for a specified accuracy. For example, the first order
approximation for U delivers only one of the three terms of order sZ in the expression of
E[U(xl, ft) U(x2, t2)T] (Eq. 8.55). •

The perturbation series method has notable features: ( 1) the differential


equations delivering the terms U k, k :::: 0, have the same deterministic operator
8.4. Differential and integral equations 589

£but different inputs (Eq. 8.54), (2) the calculation of the terms U k can be per-
formed sequentially starting with U o, (3) the formulas in Eq. 8.54 can be used to
express the solution of Eq. 8.45 in terms of the input q, and (4) the method can
be applied to solve nonlinear problems. For these problems, U o is the solution of
a nonlinear differential equation, but higher order terms satisfy linear differential
equations (Example 8.24).

Example 8.22: Let V = djdt + f3 +8 Y(t), t::: 0, inEq. 8.45 with d = 1, where
q and f3 > 0 are some constant, Y is a stochastic process with mean zero, and
U(O) = 0. Hence, the deterministic and random parts of V are£ = (djdt) + f3
and R = Y, respectively. The first two terms of the perturbation series are

and Ut(t) = q
--e-flt
f3
it
0
(efls -1) Y(s)ds.

The approximate mean of U is E[U(t)] = Uo(t) + 0(8 2). If Y is a white noise


process with unit spectral intensity, then

E[Ut (t) Ut (s)] = q2


/32
e-fl (t+s) [ -1- e 2 fl (t/\s) -
2 f3
~ efl (t/\s) + t /\ s +
f3
2_]f3
2
is one of the three terms entering the approximation of order 8 2 of the correlation
function of U (Eq. 8.55). <>
Note: The functions Uo and Ut satisfy the differential equations £[Uo(t)] = q and
£[Ut (t)] = - Y(t) Uo(t) with zero initial conditions. The Green function giving the kernel
of the integral representation of c-
1 is e-fl (t-s) for t ::: s. The term of order 82 of the
correlation function of U is E[Uo(t) U2(s) + Ut (t) U1 (s) + U2(t) Uo(s)] . .&

Example 8.23: Let U(t) be the solution ofthe differential equation in the previ-
ous example with Y(t) = Y a random variable uniformly distributed in (0, 1), q
a constant, and U(O) = 0. Denote by U 51 = lim1--;. 00 U(t) the stationary solu-
tion of this equation. The distribution of this random variable is P(U 51 .::; u) =
P(Y > (q- uf3)/(8u)) and can be approximated by P(Ust .::; u):::::: P(Y >
(/3 2/(8 q))(q j f3 - u)) for U :::::: Uo + 8 U1. Figure 8.11 shows with dotted and
solid lines the exact and approximate probabilities P(Ust .::; u), respectively, for
q = 1, f3 = 1, and 8 = 0.1, 0.5, 1.0. As expected, the perturbation solution is
accurate for small values of 8. <>
Proof: The exact solution is

U(t) = t e-(fJ+e Y)(t-s) q ds = __


q_ [1- e-(fJ+e Y) 1]
lo f3 + 8 Y
so that Ust = q/({3 + 8 Y) and P(Ust :::: u) is as stated. The first two terms of the
perturbation solution satisfy the equations

(dfdt + fJ)Uo(t) = q (order 1),


(djdt + {3)U1 (t) = -Y Uo(t) (orders),
590 Chapter 8. Stochastic Systems and Deterministic Input

P(U st $: u)
0.9

0.8

0.7 £ =0.1
£=0.5
0.6
£ =1.0
0.5

0.4

0.3

0.2

0.1

Figure 8.11. Exact and approximate distributions of the stationary solution

so that Uo(t) = (q I {3) (1 - e-.B 1 ) and U1 (t) = -(q Y I {3 2 ) [ 1 - (1 + f3 t) e-.B 1J since


U(O) = U1(0) = 0. Thefirstorderapproximation, Ust::::: qlf3-8(qYif3 2),ofthe
stationary solution yields the stated approximation for P(Ust _::: u). •

Example 8.24: Let U be the solution of the nonlinear differential equation U(t) =
-U(t) 2 - (U(t) + Y) 2 sin(8 t), t ~ 0, starting at U(O) = 1, where Y is a ran-
dom variable taking values in [0, 1] and 8 is a small parameter. The terms of the
perturbation solution

U(t) = Uo(t) + 8 U1(t) + 8 2 U2 (t) + 0(8 3),


are given by Uo = -U?J, U] = -2 Uo ul - (Uo + Y) 2 t, and u2 = -U'f-
2 Uo U2 - 2 t U1 (Uo + Y) with the initial conditions Uo(O) = 1, U1 (0) = 0,
and U2(0) = 0. The differential equation for Uo is nonlinear, but the equations
defining the terms Uk, k ~ 1, are linear. The approximate expression of U can be
used to calculate probabilistic properties of the solution ([17], pp. 56-62). <>
Note: The function sin(8 t) needs to be approximated by a polynomial in 8 to develop
differential equations for the terms Uk> k :::: 0, of the perturbation series. The resulting
approximation of sin (8 t) is accurate for small values of 8 t.
The expressions for Uk> k = 0, 1, 2, are
(2
Uo(t) = (t + 1)- 1, u1 (t) = - - -2 (112 + Z rp 1(t) + z 2 rp2 (t)), and
(t + 1)
tk+ I (YO + Y7) t
U2(t)
1
= -- -2 [ 6
LYk - - - + Y7 ln(t + 1)] ,
(t + 1) k=l k+1 t +1
8.4. Differential and integral equations 591

where CfJi, i = 1, 2, and Yb k = 1, ... , 7, are some deterministic functions of t and Y ( [ 17],
pp. 57-58).
Approximate mappings relating the random parameter Y to U(t) can be used to find
the marginal distribution of the solution at an arbitrary time 1. For example, consider the
mapping Y r-+ [; (1) = h (t, e, Y) corresponding to the first order approximation U
Uo + e U1 for U. This mapping has the inverse ([17], p. 61),

Y = ~ [rp1 (t)- (rp1 (1) 2 -


2rp2(1)
4rp2(1) 1/f(U, e)) 112 J,
in [0, 1], whererp1(1) = 2t/3+ 1, rp2(t) = t 2 j4+21/3 + 1/2, and 1/J(u, e)= 1/2- (1 +
t- u (1 + 1) 2)/(t: 12 ), which can be used to calculate approximately the probability law of
U(t) ([17], p. 62). •

8.4.1.4 Neumann series method


Suppose that Eq. 8.45 has the integral form in Eq. 8.44. We have used
such integral representation in Sections 6.3, 6.4 and 6.6.1. A useful discussion
on the relations between differential and integral equations can be found in [99]
(pp. 225-228).

Example 8.25: Consider the differential equation

d 4 U(x)
dx 4 + Y(x) U(x) = q(x), x ED= (0, 1)

with the boundary conditions U(O) = U(l) = 0 and U"(O) = U"(l) = 0,


where Y is a real-valued random field with strictly positive samples a.s. and q is
an integrable function. The solution U is a real-valued random field satisfying the
integral equation

U(x) = £- 1 [q(x)]- ( c- 1 R) [U(x)]

= fo 1
g(x,v)q(v)dv- fo 1
g(x,v)Y(v)U(v)dv,

where£= d 4 jdx 4 , R = Y(x), and

( ) (1 - v) x [I - x 2 - (1 - v) 2 ]j6, X :S V,
{ (8.56)
g x, v = (1-x)v[l- v 2 - (l-x) 2 ]/6, x:::v

is the Green function of£ = d 4 I dx 4 . 0

Note: The solution U can be interpreted as the deflection of a beam on elastic foundation
that has unit length, is simply supported at its ends, and is subjected to a load q. The
random field Y defines the stiffness of the beam-elastic support ensemble.
592 Chapter 8. Stochastic Systems and Deterministic Input

The above integral equation defining U can be approximated by

n n
U(xi):::: h L g(xi, Xj) q(xj)- h L g(xi, Xj) Y(xj) U(xj),
j=l j=l

where Xi = i h, i = 0, 1, ... , n, and h = lfn. The matrix form of the above set of
equations is Y U = q, in which U collects the unknown values U (xi) of U, the coordinate
i of q ish LJ=l g(xi, Xj) q(x j ), andY is a random matrix depending on the values of the
field Y at the nodes Xi of the finite difference mesh. The resulting algebraic equation has
random coefficients and can be solved by the methods in Section 8.3.1. A

Example 8.26: Suppose that U(t), t 2: 0, is the solution of the differential equa-
tion of (djdt + f3 + Y(t)) U(t) = q(t), where U(O) = Z is a random variable,
Y denotes a stochastic process with positive sample paths a.s., f3 > 0, and q is an
integrable function of time. The deterministic and random parts of the differential
operator ofthis equation are£ = djdt + f3 and R = Y(t), respectively. The
integral form of the differential equation for U is

U(t) = F(t) +lot e-f3(t-s) q(s) ds- lot K(t, s) U(s) ds

where F(t) = Z e-f3t, the Green function of£ is g(t, s) =


exp(-fJ (t- s)) for
t 2: sand zero otherwise, and K(t, s) = g(t, s) Y(s) is a stochastic kernel.<>

Note: The integral equation for U can be obtained by multiplying the differential equation
of U with the Green function of£ and integrating over the time interval [0, t], that is,

lot e-f3(t-s) (:s + f3) U(s)ds =lot e-f3(t-s) q(s)ds- lot e-f3(t-s) Y(s) U(s)ds.

Integration by parts of this equation gives the stated integral equation for U.
The integral equation for U can also be obtained by viewing U(t) = -{3 U(t) +
q(t) - Y(t) U(t) as a linear system with Green function g(t, s) = exp( -{3 (t- s)), t ~ s,
driven by q(t)- Y(t) U(t) . .&

Consider the deterministic Fredholm integral equation of the second kind

qJ(x) = f(x) +A fo\cx, y) qJ(y) dy, (8.57)

defining a real-valued function qJ(x) for x E [0, 1], where f is a real-valued func-
tion and A is a constant. The Neumann series solution of Eq. 8.44 is based on the
following fact.
8.4. Differential and integral equations 593

If the kernel k is square integrable on [0, 1] x [0, 1] and lA. I <II k 11- 1 , the
solution ofEq. 8.57 is

1
cp(x)=f(x)-A fo h(x,y;A.)f(y)dy, where (8.58)

I>r-l k,(x, y),


00

h(x, y; A)=- k1(x, y) = k(x, y), and


r=l

k,(x,y)= fo\cx,z)kr-J(z,y)dz, k=2,3,.... (8.59)

Proof: Let a(x) 2 = JJ k(x, y) 2 dy and b(y) 2 = JJ k(x, y) 2 dx. The norm,

is finite by hypothesis so that there exists an ex with the above property. We have

k2(x, y) 2 = (fol k(x, z) k(z, y) dz) 2


::; a(x) 2 b(y) 2 ,

= (fol k(x, z) k2(z, y) dz)


2 1
k3(x, y) 2 ::; a(x) 2 fo k2(z, y) 2 dz::; a 2 a(x) 2 b(y) 2 ,

k4(x, y) 2 = (fol k(x, z) k3(z, y) dz) 2


::; a(x) 2 fo 1 k3(z, y) 2 dz
b(y) a fo
::; a(x) 2 2 2 1 dz::; a
a(z) 2 4 a(x) 2 b(y) 2 ,

from Eq. 8.59, properties of the norm of k, and the Cauchy-Schwarz inequality. For exam-
ple,

by the Cauchy-Schwarz inequality,

fol k(x, z) 2 dz = a(x) 2 , and

fo 1 k2(z, y) 2 dz :S fo 1 a(z) 2 b(y) 2 dz = b(y) 2 II k f:s b(y) 2 a 2


594 Chapter 8. Stochastic Systems and Deterministic Input

::5
so that k3(x, y) 2 a(x) 2 b(y) 2 a 2 and kn+2 (x, y) 2 ::5 a 2 n a(x) 2 b(y) 2 or lkn+2 (x, y)l ::5
an a(x) b(y), n = 0, 1, .. ., which gives

:;)f
1
00
kr(x, y)
I
::5:; IA.Ir lkr(X, y)l ::5:; IA.Ir a(x)b(y)ar- 2
00 00

00

=a- 2 a(x)b(y) L(IA.Ia)r.


r=l

The geometric series I;~ 1 (lA. I a/ converges if lA. I < a- 1 or lA. I <II k 11- 1 since we can
take a =II k 11. Hence, the series I;~ 1 A.r kr (x, y) is absolutely and uniformly convergent
in [0, 1] x [0, 1] because its absolute value is bounded by a convergent series independent
of the arguments (x, y) multiplied by the function a- 2 a(x) b(y), which is bounded almost
everywhere in [0, 1] x [0, 1].
If lA. I <II k 11- 1, the solution of the Fredholm equation is given by Eq. 8.58 since
the series in Eq. 8.59 giving the resolvent kernel h is absolutely and uniformly convergent
so that it can be integrated term by term, that is,

L A.r kr+l (x, y) = k(x, y) + h(x, y; A.).


00

=-
r=1

Hence, we have A. JJ
k(x, z) h(z, y; A.) dz = k(x, y) + h(x, y; A.).
It remains to show that rp in Eq. 8.58 satisfies Eq. 8.57, that is, that

f(x)- A. fo\cx, y; A.) f(y) dy = f(x)

+A. fo 1 k(x, y) [f(y)- A. fo 1 h(y, z; A.) f(z) dz] dy,


holds, or

fo
1
[-h(x,y;A.)-k(x,y)+A. fo 1 k(x,z)h(z,y;A.)dz] f(y)dy=O.
The last condition is satisfied since the above integrand is zero. Hence, rp given by Eqs. 8.58
and 8.59 is the solution of Eq. 8.57. Additional considerations on the Neumann series in
Eq. 8.57 can be found in [99] (pp. 266-269), [110] (Chapter 3), and [187] (pp. 49-53).
The proof that Eqs. 8.58 and 8.59 give the solution of Eq. 8.57 can be obtained under
other conditions. For example, this result holds if the kernel is bounded in its domain of
definition, that is, lk(x, y)l ::5
fJ, 0 < fJ < oo, and lA. I < {J- 1. Moreover, the requirement
IA.I < {J- 1 can be replaced by IA.I < IA.1I, where A.1 denotes the smallest eigenvalue of k
([99], p. 267).
8.4. Differential and integral equations 595

Because the series representation for h(x, y; A.) in Eq. 8.59 is absolutely and uni-
formly convergent almost everywhere in [0, 1] x [0, 1], the integration in Eq. 8.58 can be
performed term by term. Therefore, the Neumann series solution is

= f(x) + L )/ fo
00 1
rp(x) kr(X, y) f(y) dy.
r=1

Numerical calculations based on the Neumann series solution are usually based on the first
two or three terms of this series. •

Example 8.27: Consider the integral equation

cp(x) = f(x) +A. l k(x, y) cp(y) dy, x E D, (8.60)

where Dis an open bounded subset of JRd, cp, f : D --+ lR", and k : D x D --+
JRnxn. If (1) lkij(X, y)l ::::; {3, 0 < f3 < oo, for all X, y ED and i, j = 1, ... , n,
(2) f D dx = v v < oo, and (3) lA. I < (n v v {3) -I, then the Neumann series
solution of Eq. 8.60 is

cp(x) = f(x)- A. l h(x, y; A.) f(y) dy, (8.61)

where h(x, y; A.) = - L~ 1 A.r- 1 kr(X, y) and the kernels kr can be obtained
from k1 (x, y) = k(x, y) and kr(X, y) = fv k(x, z) kr-1 (z, y) dz for r ~ 2. 0

Proof: We have

lk2,ij (x, y)l ::0 t


p=1
fv lk1,ip(x, z)llk1,pj (z, y)l dz ::0 n vv tP

by the definition of kr and the above requirements on k. Similar considerations give


lkr,lJ··I -< nr- 1 vr- 1
D f3r so that

If:)/
r=l
kr,ij (x, y) I ::0 f:
r=1
IA.Ir nr- 1 v;- 1 f3r

showing that 1:~ 1 ').f kr,ij (x, y) is absolutely and uniformly convergent in D x D if the
condition P-I < (n v D {3) -I is satisfied. The function h satisfies the equation

A. fv k(x, z) h(z, y; A.) dz = k(x, y) + h(x, y; A.)


since its series representation can be integrated term by term. The above property of h
shows that Eq. 8.61 is the solution of Eq. 8.60. •
596 Chapter 8. Stochastic Systems and Deterministic Input

The solutions of Eqs. 8.57-8.59 and Eqs. 8.60-8.61 can be extended to ob-
tain a Neumann series solution for Eq. 8.44. There is a notable difference between
these equations. The kernels of the integrals in Eqs. 8.57 and 8.60 are determinis-
tic while the kernel of the integral in Eq. 8.44 is stochastic.

I
If (1) there is f3 E (0, oo) such that Kij (x*, y*) I ::
f3 a.s. for all x *, y* E D*
and i, j = 1, ... , n, (2) fn• dx* = v1 < oo, and (3) /A/ < (n v1 {3)- 1, then
the solution ofEq. 8.44 completed with initial and boundary conditions is

U(x*) = F(x*)- A { H(x*, y*; A) F(y*) dy*, where (8.62)


ln•
00

H(x*, y*; A)=- LAr-l K,(x*, y*), Kt(x*, y*) = K(x*, y*), and
r=l
K,(x*,y*)= { K(x*,z*)Kr-t(z*,y*)dz*, r=2,3, .... (8.63)
ln•
Note: The result in Example 8.27 can be applied for almost all samples of the stochastic
kernel K since the entries of K are assumed to be bounded almost surely. Hence, the

vv
resulting Neumaun series is absolutely and uniformly convergent a.s. We also note that the
condition I.A. I < (n {3)- 1 implies that the Neumann series is absolutely and uniformly
J
convergent a.s. on any D c D* since jj dx* = v :::; v* . .A.

Approximate second moments properties of U can be calculated from a few


terms of the Neumann series. For example, the approximate first two moments of
U are

E[U(x*)] ::= E[F(x*)] +A { E[K(x*, y*) F(y*)] dy*,


ln•
E[U(x*) U(~*)T] ::= E[F(x*) F(~*lJ (8.64)

+A { E[F(x*) F(z*)T K (~*, z*)T] dz*


ln•
+A { E[K(x*, y*) F(y*) F(~*)T] dy*
ln•
+A2 { E[K(x*,y*)F(y*)F(z*l K(~*,z*lJdy*dz*
ln•xD*
(8.65)

for U(x*) ::= F(x*) +A fv• K(x*, y*) F(y*) dy*.

Example 8.28: Let U be the solution of (dfdt + f3 + Y) U(t) = q, t E [0, -r],


where Y is a random variable and U (0) = 0. The integral form of this equation is
8.4. Differential and integral equations 597

U(t) = f(t) +A J;
K(t, s) U(s) ds where f(t) = (q/fJ)(l- e-f3t), K(t, s) =
1[o, 11 (s) Y e-f3(t-s), and A = -1 (Example 8.26).
If IYI < 1/r a.s., the Neumann series is absolutely and uniformly conver-
gent in [0, r] a.s., and the solution is

U(t)=f(t)-A fo' H(t,s;A)j(s)ds,


where H(t, s; A) = - I:~ 1 Ar- 1 Kr(t, s) and the kernels Kr are K1(t, s)
K(t, s) and Kr(t, s) = J;
K(t, a) Kr-1(a, s)da for r :::: 2. Figure 8.12 shows
the variation in time of the exact and approximate mean and standard deviation of

0.4
Mean Standard deviation
0.8 ,•'
II I II II

0.3
0.6
0.2 ··. \ approximate
0.4
'exact 0.1
0.2

0 0
0 0.5 1.5 2 0 0.5 1.5 2
time time

Figure 8.12. Exact and approximate mean and standard deviation functions of U

U for f3 = 1, Y ~ U(-0.5, 0.5), and r = 2. The approximate moments of U


are based on the first two terms of the Neumann series solution. The approximate
mean of U is accurate but the approximate standard deviation of the solution is
unsatisfactory. Superior approximations can be obtained by retaining additional
terms of the Neumann series solution, but the calculation becomes tedious. <:>

Note: The absolute value of the kernel K can be bounded by

IK(t,s)l = 'l[o,t]Cs)Ye-f3(t-s)'::::: IYI

so that the Neumann series is absolutely and uniformly convergent in [0, r] if r IYI < 1
a.s. Because IYI < 0.5 a.s., the Neumann series giving the solution U is convergent for
r < 2. The first two kernels are K 1 = K and

K2(t, s) = Y 2 fo' l[o,t](u) e-f3(t-a) l[O,aj(s) e-f3(a-s) du

= Y 2 l[O,tJ(s) (t- s) e-f3(t-s)


598 Chapter 8. Stochastic Systems and Deterministic Input

so that U(t)::::: j(t)- Y !1 (t) + Y 2 fz(t), where


fJ(t)= fo' K(t,s)f(s)ds= ; 2 (1-(1+,8t)e-ll 1) ,

fz(t) = fo' Kz(t, s) f(s) ds = ~~ ( 1- (1 + ,B t) e-ll 1)


- ; 3 (.Bt-1+(1-(,Bt) 2 j2)e-ll 1 ) .

The approximate solution can be used to calculate moments and other properties of
U. The exact solution U(t) = (q j(q + Y)) ( 1- e-<ll+Y) 1) is a simple function of Y so
that its moments can be obtained without difficulty. A

8.4.1.5 Other methods


The decomposition and iteration methods are briefly discussed. The equiv-
alent linearization method introduced in Section 8.3.1.6 appears to be impractical,
and is not considered. The following two examples illustrate the decomposition
and the iteration methods. We have outlined the essentials of these methods in
Sections 8.3.1.5 and 8.3.1.7.

Example 8.29: Let U(x), x E [0, 1], be the solution of the differential equa-
tion (d/dt + fJ + Y(t)) U(t) = q with U(O) = 0, where Y is a zero-mean
process. The decomposition method is based on a series representation U(t) =
L~o Ur(t) of U, where

Uo(t) = _c- 1 [q] = te-ll {t-s) q ds = 1 (1 -e-ll 1) and


lo fJ
U1(t) = -(£- 1 R)[Uo(t)] = - t e-ll(t-s) Y(s) 1 (1- e-lls)ds
lo fJ
= --
q e-f3t lot (ells- 1) Y(s) ds
fJ 0

are the first two terms of the above decomposition of U, £ = djdt + {3, and
R = Y(t) ([3], Sections 4.7, 4.10, 5.1, 7.2, and 8.2). The functions Uo and Ut
coincide with the first two terms of the Neumann series solution for U. <>
Note: The differential equation satisfied by U is (C.+ R) [U(t)] = q. The integral ver-
sion of the differential equation for U can be written symbolically as U(t) = c.- 1[q]-
c.-1R[U(t)] or

U(t) = Uo(t) + Ut (t) + · · · = c.- 1[q] + c.- 1R[Uo(t) + U1 (t) + · · ·]


= ;:.- 1 [q] + ;:.- 1R[Uo(t)] + c.- 1 R[U 1 (t)] + · · · ,

suggesting the recurrence formulas Uo = c.- 1[q] and Uk = c.- 1R[Uk-tl fork 2': 1. A
8.4. Differential and integral equations 599

Example 8.30: Let U(t), t ::=: 0, be the solution of the differential equation
(.C + R) [U(t)] = q(t) in the previous example, where .C and Rare determin-
istic and random differential operators and q is an integrable function. The mean
of U can be calculated approximately from

E[U(t)]::::::: .c- 1[q(t)]- .c- 1 [E[R.C- 1 R] E[U(t)]]


with E [U (0)] = 0, according to the second version of the iteration method (Sec-
tion 8.3.1.7). ¢

Note: The expectation, .C[E[U(t)]] + E[R U(t)] = q(t), of the differential equation for U
cannot be used to calculate the first moment of the solution since E['R. U(t)] is not known.
An equation can be developed for this unknown expectation by applying the operator R to
the solution U(t) = .c- 1[q(t)]- .c- 1 R[U(t)], which gives

R[U(t)] = n.c- 1 [q(t)]- n.c- 1 R[U(t)].


The stated mean equation follows from the expectation of the above equation and the ap-
proximations,

E[R..C- 1[q(t)]]:::: E['R.].C- 1[q(t)] = 0,


E[R..C- 1 'R.[U(t)]]:::: E[R..C- 1 'R.] E[U(t)],

based on the local independence hypothesis. A

8.4.2 Homogeneous equations

Let q = 0 in Eq. 8.43. Then this equation becomes

A[<l>(x)] =A <l>(x), x E D c JRd. (8.66)

It is assumed that (1) A is a linear differential operator with time-invariant ran-


dom coefficients and (2) .C = E[A] is a linear self-adjoint operator.

Our objective is to solve the random eigenvalue problem defined by Eq. 8.66,
that is, to find properties of the eigenvalues A i and the eigenfunctions <l>i of A.
To complete the definition of the eigenvalue problem in Eq. 8.66, boundary con-
ditions need to be specified for this equation.

Note: It can be shown that the eigenvalues and eigenfunctions of A are random variables
and random fields, respectively, on a probability space (Q, :F, P) on which A is defined
([164], Section 1.2).
More general eigenvalue problems arise in some applications. For example, we may
be interested in finding the eigenvalues and eigenfunctions of the homogeneous equation
At [<I>] = A A2[<l>], where At and A2 are differential or integral operators with random
coefficients. A
600 Chapter 8. Stochastic Systems and Deterministic Input

The analysis can be based on ( 1) extensions of the local solutions in Sec-


tions 6.2.2.3 and 6.6.2, (2) Monte Carlo simulation, (3) algebraic eigenvalue prob-
lems approximating Eq. 8.66, which can be obtained, for example, by finite dif-
ference or finite element representations of Eq. 8.66, (4) exact or approximate
expressions for the solution of Eq. 8.66 and bounds on the eigenvalues of A (Sec-
tion 8.4.2.2), and (5) moment equations for the eigenvalues and eigenvectors of A
(Sections 8.4.2.3 and 8.4.2.4).

We do not discuss solutions of Eq. 8.66 by Monte Carlo simulation and


algebraic eigenvalue problems because these solutions can be implemented on the
basis of previous developments in this chapter. Also, an example illustrating the
Monte Carlo solution for a deterministic eigenvalue problem has been presented
in Section 6.6.2. Numerical comparisons between various methods for solving
Eq. 8.66 can be found in [25] (Section VI, pp. 53-71).

8.4.2.1 Deterministic eigenvalue problem

Let H be the collection of complex-valued square integrable functions de-


fined on a subset D of JRd, that is, JD f(x) f(x)* dx < oo for f E H. The inner
product (f,g} = JDJ(x)g(x)*dx, f,g E H, on His finite since l(f,g}l:::::
[(f, f} {g, g}] 112 by the Cauchy-Schwarz inequality and properties of H. The
norm on H induced by the inner product is II f II= (f, f} 112 , f E H.

An operator C : H ---* His said to be linear and self-adjoint if C[a f +


f3 g] = a C[f] + f3 C[g] and {,C[f], g} = (f, C[g]} for f, g E H and a, f3 E
C. We assume that C is a linear and self-adjoint operator. The eigenvalues and
eigenfunctions, A; and¢;, of C satisfy the equation

,C [¢;(x)] =A; ¢;(X) (8.67)

completed with boundary conditions. Let U be a vector space consisting of square


integrable functions u : D ---* C satisfying the boundary conditions specified for
Eq. 8.67 such that (,C -AI) u is defined for each u E U. The members of U are
called admissible functions. Hence, the eigenfunctions of C are in U.

Suppose that Cis a linear self-adjoint operator with real-valued coefficients.


We summarize some essential properties of the eigenvalues and eigenfunctions of
C defined by Eq. 8.67.
8.4. Differential and integral equations 601

• The eigenvalues and the eigenfunctions of£ are real-valued.


• If£ is positive definite and self-adjoint, then £ possesses a countable number
of eigenvalues and all eigenvalues are positive.
• The eigenfunctions corresponding to distinct eigenvalues are orthogonal and
linearly independent.
• The eigenfunctions are unique in the sense that an eigenvalue that is not mul-
tiple can have only one eigenfunction.
• The lowest eigenvalue£ satisfies the condition A.t :=:: (£[u], u) for all u E U
with II u II= 1 so that A.t = minueU, llull=l (C[u], u).
• The eigenfunctions {<f>k} of C span U and u = I:f: 1 Ck <f>k for any u E U,
where Ck = (u, <f>k) ([133], Chapter V).

Proof: Because C is self-adjoint, we have (C[¢], ¢} = (¢, £[¢]} so that A(¢,¢} =


A* (¢, 4>}. If 4> is an eigenfunction, II 4> II :f= 0 so that A = A*. This property also shows that
the eigenfunctions of C are real-valued functions.
An operator Cis positive definite if (C[u], u} is positive for u E U and (C[u], u} = 0
if and only if u = 0. If Cis positive definite, then (C[u], u} = A (u, u} is positive so that
A :;::: 0. It can be shown that the eigenvalues of C form a countable sequence converging to
infinity under relatively mild conditions ([65], Chapter 2 and [133], Chapter V). We index
the eigenvalues of C in increasing order, that is, Al ::; A2 ::; · · · so that limi~oo Ai = oo.
We have (C[¢j], 1/>j} = (1/>i, C[l/>j]} since Cis self-adjoint so that Ai (1/>i, 1/>j} =
A1 (1/>i, 4> j} or (Ai - Aj) (1/>i, 4>j} = 0. If Ai :f= A1, we have (1/>i, 4>1} = 0, that is, the
eigenfunctions corresponding to distinct eigenvalues are orthogonal. If (c/Ji, 1/>i} = 1 for
all values of i and (1/>i, 4>j} = 0 for Ai :f= Aj, i :f= j, the eigenfunctions are said to be
orthonormal.
If a 1/>i + f3 4>j = 0 implies a = f3 = 0, 1/>i and 4>1 are said to be linearly in-
dependent. Suppose that the orthonormal eigenfunctions c/Ji and 4>1 are linearly depen-
dent, that is, there are some non-zero coefficients a, f3 such that a 1/>i + f3 4> j = 0. Then
(a 1/>i + f3 4>1, 1/>i} is zero, but is also equal to a (1/>i, 1/>i} since 1/>i and 4>1 are orthogonal so
that we must have a = 0. Similar considerations show that f3 = 0, in contradiction with
our assumption.
Suppose that the first eigenvalue has two eigenfunctions, the functions ¢1 and¢.
Because the set of eigenfunctions c/Ji is orthonormal and complete, we have 4> = Li Ci 1/>i
and q = (¢, ¢k} = 0 fork :f= 1 so that 4> differs from ¢1 by a constant.
Let u E U such that II u II= 1 and {1/>j} be orthonormal eigenfunctions of C. We
have u(x) = Li ci 1/>i (x), where Ci are constants, and

(C[u],u} = :~::>iCiCj (1/>i,l/>j} = LAiCf =At+ L(Ai -Al)cf ::=:At


i,j i i~2

since Li cr = 1 and Aj ::::: A1· The inner product can be calculated term by term because
the space spanned by the eigenfunctions of C is complete.
The last property implies that any function in U can be represented as a linear form
of the eigenfunctions of C. •
602 Chapter 8. Stochastic Systems and Deterministic Input

8.4.2.2 Exact expressions and bounds

Denote the deterministic and random parts of the operator A in Eq. 8.66 by
.C and n, respectively. We assume that the random coefficients of A are square
integrable. Set .C = E[A] and n =A- .C. Bounds on the eigenvalues of A can
provide useful information on, for example, the dynamic properties of uncertain
dynamic systems. Exact expressions for the eigenvalues and eigenfunctions of A
can be most valuable in applications. However, these expressions are available
only for very simple problems.

Example 8.31: Consider the eigenvalueproblemd 2 cl>(x)jdx 2 +A cl>(x) = 0, x E


D = (0, 1), with the boundary conditions cl>(x) = 0 at x = 0 and dcl>(x)jdx +
Z cl>(x) = 0 at x = 1, where Z is a positive random variable. The lowest eigen-
value of this boundary value problem is the solution of tan(A ~/2 ) = -A~/ 2 jZ
in the interval [rr/2, rr] ([25], p. 53). The solution is a function of Z, and its
expression can be used to calculate moments and the distribution of A 1· <>

Note: The differential operator for this boundary value problem A = .C = d 2I dx 2 is deter-
ministic. The boundary condition at x = 1 is the only source of uncertainty. The set of ad-
missible functions for a sample Z (w) of Z consists of real-valued twice differentiable func-
tions <1>(·, w) with support [0, 1] and boundary conditions <1>(0, w) = 0 and d<l>(1, w)jdx+
Z(w) <1>(1, w) = 0. The solution of this equation is <l>(x) =a cos(A 1/ 2 x)+b sin(A 1/ 2 x),
where a and b are arbitrary constants. The first and second boundary conditions imply
a = 0 and A 1/ 2 cos(A 112) + Z sin(A 112 ) = 0, respectively. •

Example 8.32: Consider the boundary value problem in the previous example.
The inner product ((d 2 cl>(x)jdx 2 ), cl>(x)} provides an upper bound on the lowest
eigenvalue AI ofthisproblemsothatE[AI]::;; E[((d 2 ct>(x)jdx 2 ), cl>(x)}], where

-1/2
Z+ 2
2 [ 1 Z+ 2 (Z + 2) 2 ]
cl>(x) = (x - z + 1 x) 5- 2 (Z + 1) + 3 (Z + 1) 2

is an admissible function. <>


Note: The polynomial <l>(x) = cl x 2 + c2 X+ c3 is differentiable and satisfies the bound-
aryconditions stated in Example 8.31 for C3 = Oand C2 = -C1 (Z +2)/(Z + 1). Hence,
<l>(x) = C1 [x 2 - (Z +2)x/(Z + 1)]
is an admissible function. The square of the norm
of this function is

II
<I> 2
II = lo{
1 <I>
(x)
2d
x =
c2 [ 1
1 5-
z + 2 + 3cz(Z++2)1)2
2 (Z + 1)
2 J.
The constant C1 is selected such that II <I> II= 1. •
8.4. Differential and integral equations 603

8.4.2.3 Perturbation method


Suppose that
(8.68)
where £ = E[A], t: R A - £, and t: is a small parameter. It is assumed
as previously that the random coefficients of A are in L2. Let 'A; and¢; denote
the eigenvalues and eigenfunctions of the eigenvalue problem defined by £ with
boundary conditions given by the order 1 boundary conditions of the eigenvalue
problem for A.

If II ¢; II= 1 and 'A; is not a multiple eigenvalue, the first order approximation of
the eigenvalues and eigenfunctions of A are:

+ t: (R[¢i1, ¢i} + O(t: )


- 2
A; :::::'A; and (8.69)

cl>;(x)::::: ¢;(x) + t: L (R[¢i1, ¢k} ¢k(X) + O(t: 2 ). (8.70)


ki'i 'A; - Ak

Proof: We assume that the eigenvalues and eigenfunctions of A can be expanded in the
convergent power series
(I) 2 (2)
A; =A; + B A; + B A; + ... ,
<I>; (x) = </Ji(x) + e <1>~ 1 ) + e2 <~>?) + · · · (8.71)

in the small parameter e. This representation and Eq. 8.66 give

(£ + e R)[ifi 1· + e <1>(l 1) +···]=(A;+ e A(l 1) + · · ·) (ifi1·(x) + e <1>~l 1 ) + · · ·)

so that £[<fli1 =A; if!; and £[<1>} 1)] + R[<fiiJ =A; <t>jll + Aj 1l if!; for order 1 and orders,
respectively. These equations are accompanied by conditions given by the terms of order
1 and e of the boundary conditions prescribed for the eigenvalue problem of A. The inner
product of the equation of order e and <fi} gives

(£[<1>} 1\¢J) + (R[ifiiJ,ifiJ) =A; (<l>}l),ifiJ) +A} 1l (ifi;,¢J)

or B;j ().. j -A; )+(R[<fliJ, if!j) = A} 1l 8;1 since the collection of eigenfunctions is complete
so that <1>} 1) = Lk B;k <Pk and (if!;, if!j) = 8ij. This equation yields Ajll = (R[<fliJ, <Pi)
fori = j and B;1 = (R[<fliJ, <fiJ)f(),;- Aj) fori =!= j. The projection Bii of <t>} 1l on
¢; remains undetermined. We take Bii = 0 since the above results do not change if we
add to <1>} 1) a function proportional to if!;. Moreover, the eigenfunction <I>; ::::o <Pi+ e <t>jll
approaches ¢; as e ---+ 0. •

Example 8.33: Consider the eigenvalue problem (1 + t: Z) ~cl>(x) = A cl>(x),


x E D c JE.d, where~ = 2:,~= 1 a2 ;ax;, Z is a random variable with mean zero
and finite second moment, and ci> (x) = 0 on the boundary of D. The approximate
eigenvalues and eigenfunctions to the order t: are A i :::::'A; (1 + t: Z) and cl>;(x):::::
¢;(x). <>
604 Chapter 8. Stochastic Systems and Deterministic Input

Note: The deterministic and random parts A are £ = Ll and n = Z Ll, respectively. The
first order correction of the eigenvalue i relative to A.; is (Eq. 8.69)

(1)
A; = (Z Ll[¢;], ¢;) = Z (A.;¢;,¢;)= Z A.;.

The coordinates of the first order correction of the eigenfunctions of (1 + E: Z) Ll are


(Eq. 8.70)
BiJ = (Z M¢;], ¢j) = Z A.;(¢;, ¢j) = 0 fori =I j
A.; - Aj A.; -A. j '

jl)
so that <P (x) is zero. The result also follows from the alternative form Ll <P (x)
(A/(l + E: Z)) <P(x) of the original boundary value problem, showing that the random
and deterministic operators, (1 + E: Z) Ll and Ll, have the same eigenfunctions but different
eigenvalues. The eigenvalues of the random operator scaled by ( l + E: Z) coincide with the
eigenvalues of Ll. .&

8.4.2.4 Iteration method

Consider the decomposition A = £ + R of the random operator A, where


£ = E[A] and R = A - £ has mean zero. It is assumed that the random
coefficients of A are in L2.
Our objective is to find moments of the eigenvalues and eigenfunctions of
A. The analysis is based on the local independence hypothesis, which states that
the dependence between the eigenvalues, eigenfunctions, and the random part of
A can be ignored. We mention again that there is no theoretical justification for
this hypothesis. Results are given only for some of the first two moments of the
eigenvalues and eigenfunctions of A. Additional information on this method can
be found in [25] (pp. 43-52).

The expectation of the eigenvalues and eigenfunctions of A can be approximated


by the solution of the eigenvalue problem

£[E[<I>;(x)]] ~ E[A;] E[<I>;(x)]. (8.72)

Note: The expectation, .C[E[<P]] + E[R[<P]] = E[A <P], ofEq. 8.66 cannot be used to find
the mean values of eigenvalues and eigenfunctions of A[<P] = A <P since E['l?[<P]] and
E[A <P] are not known. The expectation ofEq. 8.66 and the local independence hypothesis,
giving the approximations E[R[<P]]::::: E[R] E[<P] = 0 and E[A <P] ::::: E[A] E[<P], yield
Eq. 8.72. .&

The second moment of the eigenvalues of A can be approximated by

E[AT] ~AT+ ((E[R R])[¢;], ¢;). (8.73)


8.5. Effective material properties 605

Note: The sequence of equalities,

<1> =A -l (.C + R)[<l>] = A - 2 (.C + R) (.C + R)[<l>]


=A - 2 (.C.C + .C R + R.C + R R)[<l>],

gives E[A2] E[<l>] :::::: .C .C[E[<l>]] + E[R R] E[<l>] by averaging and using the local inde-
pendence hypothesis. The inner product of the last formula for A = A; and <1> = <1>; with
1/Ji yields Eq. 8.73 since E[<l>j]:::::: 1/J;, (1/Ji, 1/Ji) = 1,

E[AT] (1/J;, 1/J;) :::::: (.C .C[I{Jj], 1/J;) + (E[R R] [1/Jj], 1/J; ),
and.Cisaself-adjointoperator, so that (.C.C[I{Jj], 1/J) = (.C[I{Jj], .C[I{Jj]) = >..T (1/J;, 1/J;). There
is no simple formula for the second moment of the eigenfunctions of a random operator A.
A

Example 8.34: Let (Ai, ci>i) be the solution of the eigenvalue problem

(1 + Z) D.ci>(x) =A ci>(x),
x E D, here Z is a random variable with mean zero and finite second moment.
The approximate mean solution is E[A;] ::::: Ai and E[ci>;] ::::: ¢i, where Ai and
<Pi denote the eigenvalues and eigenfunctions of the deterministic part .C = D.
of A = (1 + Z) D.. The approximate second moment of an eigenvalue of A is
E[AT] :::::AT (1 + E[Z 2]), so that the approximate variance of Ai is AT E[Z 2 ]. 0

Note: The inner product (E[R R][I{Jj], 1/Ji) is

(E[Z 2 jj. jj.][I{Jj], 1/Ji) = E[Z 2] (jj. fj.[I{Jj], 1/Ji) = E[Z 2] AT

since /j. is a self-adjoint operator, jj.[¢;] = Ai 1/Ji, and (1/Ji, 1/J;) = 1. A

We conclude here our discussion on methods for solving stochastic systems


with deterministic input. The remaining part of this chapter presents stochastic
problems selected from various fields of applied science and engineering. The
solutions of the problems in the following sections are based on the methods in
Chapters 5, 6, 7, and the first part of this chapter.

8.5 Effective material properties


Most materials are heterogeneous at the scale of their constituents, referred
to as the microscopic scale or microscale. For example, concrete at the scale of
its aggregates, granular materials at the scale of their particles, metals at the scale
of their grains or crystals, and multi-phase media or suspensions at the scale of
their inclusions. Moreover, material properties at the microscale tend to fluctu-
ate randomly in space so that they can be modeled by random fields at this scale.
Generally, we say that the microstructure of a material is known if there exists
606 Chapter 8. Stochastic Systems and Deterministic Input

complete information on the material features at the microscopic level, for exam-
ple, we know the concentration, the properties, and the geometry of the constituent
phases of a multiphase material as well as their spatial distribution.
Our objective is to calculate bulk, global, or effective properties, that is,
material properties at the macroscale or laboratory scale. These properties are
some type of averages of the material features at the microscale. Effective prop-
erties (1) are measured in most laboratory tests, (2) are used as material constants
in continuum theories, for example, the Lame constants of linear elasticity, and
(3) can be obtained in some cases even if the available information on the mi-
crostructure is partial, for example, we may know only the volume fraction and
mechanical properties of the constituents of a multi -phase media but not their ge-
ometry and spatial distribution.
Consider a microstructure and some differential equations describing the
state of the microstructure subjected to a specified action. Because material prop-
erties are random at the microscopic scale, these equations have random coeffi-
cients. The solution of these equations provides detailed information on the state
of the microstructure, which may not be needed. We can derive equations for
the material state at the macroscopic scale by averaging the solution of the state
equations for the microstructure, but this approach is computationally inefficient
since it requires finding first the state of the microstructure. Two methods have
been developed for establishing state equations at the macroscopic scale, the ho-
mogenization and representative volume methods. The homogenization method
considers a family of problems for microstructures with scale 8 > 0 and calcu-
+
lates the limit of the solution of these problems as 8 0. The method also delivers
effective material properties. The representative volume method develops differ-
ential equations for the macroscopic state of a material, that is, for an average of
the microscopic state over a window much larger than the scale of the microstruc-
ture, referred to as representative volume. The macroscopic state equations can be
obtained directly, rather than by averaging the microscopic state equations.
It is common to assume that the material properties at the microscopic scale
can be modeled by ergodic random fields. Spatial averages of these properties
over a representative volume much larger than the microscale are nearly deter-
ministic and practically independent of the particular microstructure sample used
for calculations.
Generally, numerical algorithms are needed for calculating effective prop-
erties. However, some homogenization problems can be solved analytically, for
example, the determination of the effective properties of series and parallel sys-
tems. The components of a series and a parallel system carry the same load, the
applied load, and experience the same deformation, the system deformation, re-
spectively. A series system fails by its weakest component while a parallel system
fails when its last surviving component breaks.
Example 8.35: Consider a series and a parallel system with n linear elastic com-
ponents of stiffness Ki, i = 1, ... , n, where Ki are independent copies of a real-
valued random variable K > 0 with finite mean. These heterogeneous systems
8.5. Effective material properties 607

can be replaced by homogeneous systems, that is, systems with components of


the same stiffness Keff = nj ('L:7= 1 1/ K;) and Keff = (1/n) 'L:7= 1 K; for series
and parallel systems, respectively. The equivalence between the original hetero-
geneous systems and their homogenized versions is very strong in this case. The
homogenized and the heterogeneous systems have the same force-displacement
relationship.
The effective stiffness Keff converges a.s. as n -+ oo to 1/ £[1/ K] and
E[K] for series and parallel systems, respectively, by the strong law of large num-
bers since the mean of K exists and is finite (Section 2.13). <:>
Proof: Let d > 0 be an imposed deterministic displacement at the boundaries, that is, the
two ends, of a series system. This displacement is generated by a force F which is ran-
dom because of the uncertainty in the system components. The force in each component
is F so that the elongation of component i is a random variable D; = F j K;. The system
deformation, Lt=l D; = Lt=l F I K; = F Lt=l 11 K; must coincide with d so that the
overall stiffness of the system is F ld = 11 Lt=l 11 K;. Hence, the stiffness of the compo-
nents of a homogeneous series system with n components and the same global constitutive
law is nl Lt=l 11 K;. If a deterministic force f > 0 is applied at the system boundaries,
the elongation of component i is f I K; so that f Lt=l (1I K;) is the system deformation.
The ratio of f to the system deformation gives the system stiffness, which has the same
expression as in the deformation controlled experiment.
Similar considerations can be applied to find the effective stiffness for a parallel
system. For example, suppose that the system is subjected to a deterministic force f > 0
causing a random deformation D. Because the components have the same deformation D,
the force carried by component i is K; D so that f = D Lt=l K;, showing that the system
stiffness is Lt=l K;. •

Example 8.36: Suppose that the components of the series and parallel systems
in Example 8.35 are made of Maxwell or Kelvin materials, that is, they are mod-
eled by a spring and a damper in series or a spring and a damper in parallel,
respectively. The overall force-displacement relations for a parallel system with
Maxwell and Kelvin components are

f
r(t) = tr
n

1/K;
1
+ tjC; and r(t)
f
= tr
n K;
1-exp[-(K;/C;)t]'

respectively, where K; and C;, i = 1, ... , n, denote the stiffness and damping
parameters of component i, n is the number of components, t ::::: 0 is the time
measured from the application of the force f > 0, and r(t) denotes the system
elongation at time t.
The overall force-displacement relations for a series system are

f 1
and _!_ = 1---------,--
=:::-:::---;:-----
r(t) r(t) 'L:7=t [1-exp(-(K;/C;)t)]/K;
for Maxwell and Kelvin components, respectively. <:>
608 Chapter 8. Stochastic Systems and Deterministic Input

Proof: The above results are consistent with findings in Example 8.35. If Ci ---+ oo and
Ci ---+ 0, the Maxwell and Kelvin materials degenerate into linear springs and the above
equations yield the results in Example 8.35.
The elongation f' (t) of Maxwell and Kelvin materials with parameters ( Ki, C i)
subjected to a force x > 0 are

respectively.
Suppose that a parallel system with Maxwell components is loaded by f > 0. The
deformation f'(t) at timet of all system components is f'(t) = Fi (t) (11 K; + t I Ci), where
the force F; (t) in component i = 1, ... , n at time t is unknown. Because L:7= 1 Fi (t) = f
at each t by equilibrium, we have

n n l
f = "'F;(t)
L...
= f'(t) " '
L... liK· +tiC
.
i=1 i=l l l

The elongation f'(t) can be determined from the above equation for a specified load and
time and then used to find internal forces F; (t). The deformation f' (t) and internal forces
F; (t) are parametric stochastic processes (Section 3.6.5) since they are deterministic func-
tions of time depending on a finite number of random parameters, the component stiffness
and damping. A similar approach can be applied to find the global force-displacement
relations for other systems. •

Example 8.37: Consider a rod of length l > 0 and random stiffness K(~) > 0,
~ E [0, l]. The effective stiffness oftherodis Keff = 11 J~ d~/K(~). Figure 8.13
shows samples of Keff as functions of the rod length. The samples nearly coincide
as the rod length increases. The plots are for the stiffness random field K (~) =
a+ <P(X(~))(b- a),~ E [0, l], a = 1, b = 2, and a process X defined by
dX(~) =-pX(~)d~ + .J'IPdB(~), where p = 10, B is a Brownian motion,
and X(O) ~ N(O, 1). <>

Proof: The determination of the effective stiffness is based on the approach in Exam-
ple 8.35 for series systems. Suppose that the rod is subjected to a fixed elongation d > 0.
A random force F > 0 needs to be applied at the ends of the rod to generate this elonga-
tion. The strain caused by Fat~ E [0, l] is f'(~) = F I K(~) so that d = JJ r(~) d~ or
F JJ d~ I K (~) = d, which gives the rod stiffness. The equivalent homogeneous rod has
stiffness K(l) = ll Keff·
We note that the stiffness K is a translation random field (Section 3.6.6) with the
properties K (~) ~ U(a, b) for all~ E [0, l] and scaled covariance function that is approx-
imately equal to exp(-p 1~1 -hi), where ~1, ~2 are some points in [0, l]. •

Consider a multi-phase material. The phase properties and their volume


fraction provide useful information on the microstructure. This partial information
on the microstructure suffices to calculate effective material properties in some
8.5. Effective material properties 609

0.9

Samples of Keff

0.4

0.3

0.2

0·1oc__----=o'=-.5---'----:1':-.5---:-----::2':-.5---:-----::3"::.5--:----:4"::.5-~

le.ngth

Figure 8.13. Samples of effective stiffness K eff

cases. For example, suppose that the series system in Example 8.35 has two types
of components with stiffnesses ka and kb and volume fractions ~a and ~b. respec-
tively, that is, there are na = n ~a and nb = n ~b components with stiffness ka and
kb. The effective stiffness of the components of a homogeneous series system with
n components is keff = 1I (~a I ka + ~b I kb). However, the above partial informa-
tion on the microstructure is insufficient in other cases. For example, consider two
microstructures of a material consisting of disconnected inclusions imbedded in a
matrix with a 50% inclusion volume fraction. In the first microstructure the inclu-
sions have much higher conductivity than the matrix. In the second microstructure
the matrix has much higher conductivity than the inclusions. The microstructures
are indistinguishable if the information is limited to phase properties and concen-
tration. However, the effective conductivity of the second microstructure exceeds
by far the corresponding property of the first microstructure. In this case we need
detailed information on the microstructure to calculate the effective conductivity.
The following two sections illustrate methods for calculating effective prop-
erties for random heterogeneous materials. In Section 8.5.1 we estimate the ef-
fective conductivity of multi-phase random materials. The analysis is based on an
extension of the random walk method (Section 6.2) and smooth approximations
of the conductivity field. In Section 8.5.2 we illustrate the determination of the
effective material properties for linear elasticity. The analysis is based on proper-
ties of random fields, concepts of linear elasticity, and perturbation and Neumann
series methods.
610 Chapter 8. Stochastic Systems and Deterministic Input

8.5.1 Conductivity
There are numerous studies providing approximations of and bounds on
macroscopic properties of heterogeneous media, such as electric or thermal con-
ductivity, magnetic permeability, and diffusion coefficients. Most of the available
results are for special microstructures, such as two-phase composites with inclu-
sions of simple geometries [184, 192]. A notable exception is the Monte Carlo
simulation method in [113, 186] for calculating the effective conductivity of two-
phase materials. The method uses samples of Brownian motions having differ-
ent speeds in different phases and non-zero probabilities of reflection at phase
interface. Numerical results by this Monte Carlo simulation method have been
reported for two-phase media. The method appears to be impractical for calculat-
ing the effective conductivity of general heterogeneous media because it requires
the solution of complex first passage problems for JR 2 - and JR3 -valued Brownian
motion processes.
In this section we discuss an alternative method for estimating the effective
conductivity of random heterogeneous media. The method is general, and can
be applied to estimate the effective conductivity of heterogeneous media with an
arbitrary number of phases, which can have any geometry. As previously stated,
the analysis is based on an extension of the random walk method discussed in
Section 6.2.

8.5.1.1 Homogeneous media


Suppose that u is the solution of the Dirichlet boundary value problem
a Llu(x) = -1, XED, (8.74)

defined on an open bounded subset D of ffi.d with u(x) = 0 for x E aD, where
aD denotes the boundary of D, (1 > 0 is a constant, and Ll = L~=l a2 ;ax~. The
solution u can be interpreted as the temperature in a homogeneous medium with
constant conductivity a > 0, which is subjected to a unit flux everywhere in D
and zero boundary conditions. Let
T = inf{t C:: 0: B(t) tJ. D, B(O) =x ED} (8.75)
be the first time an ffi.d -valued Brownian motion B starting at x E D exits D.

The local solution ofEq. 8.74 is

1
u(x) = -Ex[T], xED. (8.76)
2a

Proof: The Ito formula applied to the mapping B 1-+ g(B), g E C2 (lRd), in the time
interval [0, T] gives

Ex[g(B(T))]- g(x) =Ex [loT A[g(B(s))]ds]


8.5. Effective material properties 611

by expectation, where A = (1/2) y:_d =l a2 jax~ = (1/2) ~is the generator of B (Sec-
tion 6.2.1.3). The notation A used in 'iliis section for the generator of Band in the following
section for the generator of a diffusion process X, has no relation to a similar notation used
previously in this chapter for some differential operators (Section 8.4).
The local solution in Eq. 8.76 results by writing the above equation with u in place
of g and noting that B(T) E aD so that u(B(T)) = 0 and that B(s) ED for s < T so that
A[u(B(s))] = -l/(2cr). •

Example 8.38: If Din Eq. 8.74 is a sphere Sd(r) = {~ E r :II ~ II< r} of radius
r > 0 centered at the origin of m.d , then

Ex[T(r)] r 2 - llxll 2
u(x) = 2a = 2 da , x E Sd(r),

where T(r) is given by Eq. 8.75 with D = Sa(r). Hence, the temperature at the
center of the sphere is u(O) = r 2 /(2d a).¢
Proof: Let k > 0 be an integer and let Tk = min (k, T(r)). The expectation of the Ito
formula applied to the mapping B 1-+ II B 11 2 in the time interval (0, Tk) gives

Ex [11 B(Tk) 11 2]- II x 11 2= Ex [Jon~~ II B(s) 11 2 ds]


or Ex [11 B(Tk) 11 2]- II x 11 2= Ex [Tk]d since~ II x 11 2= 2d. We have

Ex [11 B(Tk) 11 2]- II x 11 2 r2- 11 x 112


Ex[Tk]= d ::::; d

so that T(r) = limk~oo Tk < oo a.s. and Ex[T(r)] = (r 2 - II x 11 2 )/d. Additional


considerations on this proof and related topics can be found in [135] (Chapter 7 and Exam-
ple 7.4.2, p. 119). •

8.5.1.2 Heterogeneous media


Consider a heterogeneous medium in D c m.d characterized by the conduc-
tivity field a(x) > 0, xED. Then Eq. 8.74 becomes

\1· (a(x) \lu(x)) = -y(x), xED, or


d
" ' aa(x) au(x)
L...- - - - - + a(x)~u(x) = -y(x), X ED, (8.77)
p=l axp axp

with u(x) = 0 for X E aD, where \1 = (ajaxt, ... , ajaxa) and y(x) denotes the
fluxatx ED.
Let X be a diffusion process defined by

dX(t) = a(X(t)) dt + b(X(t)) dB(t), (8.78)


612 Chapter 8. Stochastic Systems and Deterministic Input

where the entries of the (d, 1) and (d, d) matrices a and b are real-valued func-
tions defined on Jlld and B is an Jlld -valued Brownian motion. Define the drift and
diffusion coefficients of X by

aa(x) ~
ap(x) = -a-, p = 1, ... , d, and b(x) = v' 2a(x) i, (8.79)
Xp

where i is the (d, d) identity matrix. It is assumed that the drift and diffusion
coefficients defined by Eq. 8. 79 satisfy the conditions in Section 4. 7 .1.1 so that
the solution ofEq. 8.78 exists and is unique. Let

T* = inf{t 2: 0: X(t) 1. D, X(O) =x E D} (8.80)

denote the first time when X starting at X (0) = x E D exits D. It is assumed that
T* has a finite expectation (Section 6.2.1.1).

The local solution ofEq. 8.77 is

u(x) =Ex [loT* y(X(s)) ds J. (8.81)

Proof: We first note that, if the flux y is unity, then the local solution of Eq. 8.77 becomes
u(x) = Ex[T*].
Let g be a function with continuous second order partial derivatives and assume
X(O) =X. The average of the It6formula applied to the mapping X ~ g(X), g E C2 (JR.d),
in a time interval [0, t] gives

Ex[g(X(t))]- g(x) =Ex [f A[g(X(s))] ds J,


where
1
L ap(X)~p + 2 p,q=l
d a
L (b(x)b(xl)pq axp axq ,
d a2
A=
p=l
is the generator of X (Eqs. 8.78-8.79 and Section 6.2.1.1). The above Ito formula applied
to the solution u of Eq. 8.77 in the time interval (0, T*) yields

Ex [u(X(T*))]- u(x) =Ex [loT* A[u(X(s))] ds] ,


where A[u(x)] = L~=l (aa(x)jaxp) (au(x)jaxp) + a(x)f>u(x) (Eq. 8.79). The local
solution in Eq. 8.81 follows since X(T*) E aD and X(s) E D for s < T*, so that
u(X(T*)) = 0 and A[u(X(s))] = -y(X(s)).
Generally, the expectation in Eq. 8.81 cannot be calculated analytically. However,
this expectation can be estimated from samples of X. Methods for generating sample paths
of diffusion processes are discussed in Section 5.3.3.2.
8.5. Effective material properties 613

The conductivity field in a multi-phase medium has jumps at the interface between
its constituents so that the drift and diffusion coefficients given by Eq. 8.79 do not satisfy
the uniform Lipschitz conditions. However, we can develop smooth approximations u for
the conductivity field a of a multi-phase material, and use these approximations to calculate
the effective conductivity, as illustrated later in Example 8.39. •

8.5.1.3 Effective conductivity


Consider a deterministic heterogeneous medium with conductivity field a (x)
and a deterministic homogeneous medium of unknown conductivity a eff· The two
media are contained in two d-dimensional spheres Sa(r) with the same radius
r > 0. We define aeff from the condition that the temperatures at the center of
the heterogeneous and the homogeneous spheres coincide and referred to it as the
effective conductivity of the heterogeneous medium ([185], Chapter 18). The
effective conductivity depends on the radius r > 0 of the sphere.
Let T*(r) be the stopping time in Eq. 8.80 forD = Sa(r), y(x) = 1,
the diffusion process X in Eq. 8.78 with coefficients in Eq. 8.79, and initial state
X(O) = 0.
The effective conductivity for a deterministic heterogeneous material contained
in a sphere Sa(r), r > 0, is

r2
aeff(r) = 2d £0[T*(r)]. (8.82)

Proof: We have shown in the previous sections that the values of function u at the center of
a sphere Sa (r) containing a homogeneous material with conductivity O'eff and a heteroge-
neous material with conductivity field a(x), x E Sa(r), are r 2 /(2d aeff) and E 0 [T*(r)],
respectively (Eqs. 8.76 and 8.81 and Example 8.38). The effective conductivity of a het-
erogeneous medium results from the condition E0 [T*(r)] = r 2 /(2d Ueff(r)).
If the conductivity field is periodic, the dependence of O'eff on r becomes weak
for values of r exceeding the scale of material periodicity because additional segments of
material beyond this value do not bring any new information. •

Consider now a random heterogeneous material subjected to a unit flux, that


is, y (x) = 1 in Eq. 8. 77, and zero boundary condition. Let S d (r) be two identical
spheres containing a random heterogeneous material with conductivity given by
a random field :E(x), x E Sa(r), and a homogeneous material of unknown con-
ductivity :Eeff· Our objective is to find properties of :Eeff· We determine :Eeff, as
in Eq. 8.82, from the condition that the temperatures at the center of the hetero-
geneous and homogeneous spheres coincide. The problem is less simple in this
case since the temperature U(x) in the heterogeneous sphere is a random field.
For a sample w E Q of the random heterogeneous material the temperature at the
center of Sa(r) is U(O, w) = E 0 [T*(r, w)], where T*(r, w) is given by Eq. 8.80
forD= Sa(r), material sample w, and X(O) = 0.
614 Chapter 8. Stochastic Systems and Deterministic Input

To find :Eeff, we require that the temperature at the center of the sphere
Sd(r) containing the homogeneous material matches U(O, w) =
E 0 [T*(r, w)] for
almost all w's, that is, r 2 I (2d :Eeff(r, w)) =
U(O, w), which yields (Eq. 8.82)
r2
:E ff(r w) - w E Q (8.83)
e ' - 2d E 0 [T*(r, w)]' ·
The resulting effective conductivity is a random variable. The following Monte
Carlo algorithm can be used to calculate the effective conductivity :Eeff·

• Select a sphere Sd(r), that is, a radius r > 0, and generate nw independent
samples :E(x, w), x E Sd(r), of the conductivity random field :E.

• Estimate for each material sample w the effective conductivity :Eeff(r, w) in


Eq. 8.83. The estimation of the expectation E 0 [T*(r, w)] is based on ns samples
of the diffusion process X in Eqs. 8.78 and 8.79 for each material sample w.

• Estimate properties of :Eeff(r), for example, E[:Eeff(r)] can be estimated by

(8.84)

Note: The conductivity random field can be defined by a translation random field, that is,
l;(x) = F- 1 o <l>(G(x)), where F is a specified absolutely continuous distribution, <l>
denotes the distribution of N(O, 1), and G is a Gaussian random field with differentiable
samples (Sections 3.6.6 and 5.3.3.1 in this book, [79], Section 3.1).
Let 1;(-, w) be a sample of l; and let x(w) denote the diffusion process X in
Eqs. 8.78 and 8.79, in which a(·) is replaced by l;(·, w). Let T*(r, w) be the first time
x(w) starting at x = 0 exits Sd(r). The expectation of T*(r, w) can be estimated from ns
independent samples of x(w) generated by the methods in Section 5.3.3.2. The effective
conductivity corresponding to a material sample win Sd(r) is given by Eq. 8.83. An esti-
mate of the expected value of l;eff(r) associated with Sd(r) can be obtained from Eq. 8.84.
If the radius r of Sd(r) is much larger than the correlation distance of the conductiv-
ity field and l; is an ergodic random field, the dependence of :Eeff(r, w) on w is likely to be
insignificant for almost all material samples so that l;eff(r) can be obtained from a single
material sample and is nearly deterministic. &

Example 8.39: Consider a random heterogeneous medium in S d (r) = S1 (r) with


two homogeneous phases of conductivity a1 for the matrix and a2 > a1 for in-
clusions. The conductivity field a (x) of this medium is a piece-wise constant
function taking the values a1 and a2 on intervals of random location and length.
Let the intervals (~i, rJi) and (rJi, ~i+1), ~i < 'f/i < ~i+1• give the location in S1(r)
of inclusion i and the matrix to the right end of this inclusion for an arbitrary re-
alization w of the medium. The spatial distribution of the inclusions is given by a
Poisson process N with intensity A. > 0.
Numerical results have been obtained for a1 = 1, a2 = 2, r = 10, and A. =
1. The variance of the estimator :Eeff(r) in Eq. 8.84 can be relatively large for small
8.5. Effective material properties 615

values of nw and ns so that the resulting estimates of the effective conductivity can
be unstable, that is, their values may differ significantly from sample to sample.
For example, f:eff(r) is 1.7270 and 1.5812 for (nw = 10, ns = 10) and (nw =
50, ns = 10), respectively. The corresponding estimates of the coefficients of
variation of f:eff(r) are 8.33% and 4.12%. Stable values of i:eff(r) result for
nw ~ 50 and ns ~ 500. For nw = 50 and ns = 500 the estimated effective
conductivity is i:eff(r) is 1.4997, and has a coefficient of variation of 1.6%. ~

Note: The above numerical results are for a smooth approximation :E E C2 (St(r)) of the
conductivity random field I: of the two-phase material. The approximation :E is such that
Eq. 8.78 with drift and diffusion coefficients in Eq. 8.79 has a unique solution. 4

Example 8.40: Let U be the solution of d ( :E (x) d~~x)) j dx = -1 for x E


S(r) = (-r, r), r > 0, and U(±r) = 0, where the conductivity :E is a random
field with smooth strictly positive samples. The effective conductivity is

ir ir - - ir --/ ir
-1
r2 [ 0 Y dy r Y dy 0 dy r dy ]
:Eeff--
- 2
- --
:E(y)
+ :E(y) :E(y)
- -
:E(y)

Suppose that :E(t) = Ct + c2 exp(G(t)), t E S(r), where q, c2 > 0


are some constants and the values of the Gaussian field G at the locations t k =
-r +k !l.t, !l.t = rjn, are given by the autoregressive Gaussian sequence G k+l =
p Gk + y'1 - p 2 Wk, k=0,1, ... ,2n-1, with Go and Wk independent N(O, 1) vari-
ables.
Figure 8.15 shows estimates of the expectation and the standard deviation

2.8 2.5

2.7 E[l: e~ Std[l: elf]


2
2.6

2.5 1.5

j
2.4

2.3

2.2
0.5
2.1

2 0
0 0.2 0.4 0.6 0.8 0 0.2 0.4 0.6 0.8
p p

Figure 8.14. Second moment properties of :E eff

of :Eeff based on 1,000 material samples. Small and large values of p corre-
spond physically to rapid and slow spatial fluctuations of conductivity, respec-
tively. These values of p can also be viewed as corresponding to large and small
616 Chapter 8. Stochastic Systems and Deterministic Input

material specimens with respect to the microscale fluctuations. If p ~ 1, the con-


ductivity is nearly constant in each material specimen so that Eeff(W) ~ a+f3 e 0 o,
and the mean and standard deviation of Eeff are 1 +e 112 ~ 2.65 and (e 2 - e) 112 ~
2.16, respectively. If p ~ 0, the conductivity is nearly a white noise, the depen-
dence of Eeff(W) on w is weak so that Std[Eeffl is small. The relationship between
the estimate of E[Eeffl and p indicate a size effect phenomenon for the effective
conductivity. o
Proof: The solution of the above equation for U is

U(x) = -lx
-r
y dy
:E(y)
+
fr
-r
ydy
~
Jr --1L_
lx _!!!____,
-r :E(y)
-r !:(y)

The conditions in Eqs. 8.82 and 8.83 give the stated expression for l::eff· This expression
and samples of :E were used to construct the plots in Fig. 8.15. •

Example 8.41: Consider a random heterogeneous material in an open bounded


subset D of!R2 with conductivity E(t) =a+ f3 L:f~f'l Yk h(t- fk), where a, f3
are some constants, N is a Poisson process with intensity A > 0, and D' ::) Dis
such that points of N outside D' have nearly no contribution to E. Figure 8.15
shows contour lines for two samples of E corresponding to a = 2, f3 = 0.05,

Sample 1 Sample 2

6 I 8
1

Figure 8.15. Contour lines for two samples of E

Yk = 1, and h(t) = f/J(tl/0.2) f/J(t2/0.2), where r/J denotes the density of N(O, 1).
Figure 8.16 shows the dependence of estimates of the expectation of E eff on A.
The estimates of E [Eeffl for each value of A have been obtained from ten ma-
terial samples and thirty samples of X for each material sample (Eq. 8.84). The
estimates of E [Eeff] nearly coincide with and are much larger than the matrix
conductivity a for small and large values of A, respectively. 0
8.5. Effective material properties 617

•.• Estimates of E[l: affl

'•

Figure 8.16. Estimates of E [~eff]

Note: The average number of inclusions in D is equal to Avv', where vD' denotes the
volume of D 1• The effective conductivity increases with A since the inclusions have higher
conductivity than the matrix. A

8.5.2 Elasticity
Our objective is the determination of effective properties for a class of ran-
dom heterogeneous materials characterized by properties with a scale of spatial
fluctuations much smaller than the overall dimension of material specimens. This
type of heterogeneity is common to polycrystalline metals and multi-phase mate-
rials, for example, aluminum and glass fiber reinforced plastics.
Consider a random heterogeneous material in an open bounded subset D
of ~d that is in equilibrium under some external actions. Let Sij(X), fij(X),
Aijkt(x), and Cijkt(x) denote, respectively, the strain, stress, stiffness, and com-
pliance tensors at x E D. It is assumed that:
1. The material is linearly elastic with the stress-strain relationship

(8.85)

where x E D, i, j, k, l = 1, ... , d, and summation is performed over repeated


subscripts.
Note: The matrix form of these equations is

S(x) = A(x) r(x), x E D and r(x) = C(x) S(x), x E D, (8.86)

where S = (Su, S22. S33, S12, S23, S31) and r = (f'u, f'22· !33, f't2· !23, !31) denote
column vectors, and A and C are the corresponding matrices of stiffness and compliance
coefficients for three-dimensional elasticity problems (d = 3). The fourth order tensor
Aijkl has the properties: (a) Aijkl = Aijlk and Aijkl = Ajikl by the symmetry of the
stress and strain tensors and (b) Aijkl = Aklij by the existence of the strain energy ([125],
p. 141). The compliance coefficients Cijkl have the similar properties. Hence, the tensors
A;jkl and Cijkl in IR3 have 21 distinct entries. A
618 Chapter 8. Stochastic Systems and Deterministic Input

2. The deformations are small so that


1
rij (x) = 2 (Ui,J (x) + Uj,i (x)), x E D, (8.87)

where Ui (x) denotes the displacement function at x in the direction of coordinate


Xi and Ui,j = aUi(iJxj.

3. The stiffness A(x), x E D c JRd, is an JRq -valued random field defined on a


probability space (Q, F, P), where the integer q ::: 1 gives the number of distinct
entries of A. The random field A is homogeneous, square integrable, and ergodic.
Moreover, D is sufficiently large so that the spatial average of an arbitrary sample
of A over Dis nearly constant and equal to its ensemble average E[A(x)].

The constitutive law (Eq. 8.85), the kinematics relationships (Eq. 8.87), the
equilibrium conditions

Sij,J (x) + Fi (x) = 0, x E D, (8.88)

where Fi is the coordinate i of the body force vector, and the boundary conditions,
for example, traction, displacements, or a mixture of these actions on the boundary
aD of D, need to be used to find the displacement, strain, and stress fields in D.
The equilibrium conditions expressed in terms of displacements are

or [Aijkt(x) Uk,t(x)],J + Fi(x) = 0, so that the displacement field is the solu-


tion of a system of partial differential equations with random coefficients. The
solution Ui, i = 1, ... , d, of these equations is a random field with differentiable
functions defined on (Q, F, P). Hence, the strains and stresses are also random
fields defined on the same probability space (Eqs. 8.85 and 8.87). Note that the
local solution in Sections 6.2, 6.3, and 6.4 can be used to solve some elasticity
problems for samples A ( ·, w) of A.
Let (Sij(X), riJ(X), Ui(x)), xED, i, j = 1, ... , d, be the solution of the
stochastic boundary value problem defined by Eqs. 8.85-8.88. This solution pro-
vides detailed information on the stress, strain, and displacement fields that may
be unnecessary. Coarser information provided, for example, by local averages of
these fields may ~uffice in many applications. Denote by

- (x)
SiJ =-1
v(x)
1
D(X)
SiJ (~) d~ and rij
- (x) = -1-
v(x)
1
D(X)
rij (~) d~ (8.89)

the stress and strain moving average fields, where v(x) = fv(x) dx and D(x)
is a subset of D centered on x such that D(x) is small relative to D but large
with respect to the scale of fluctuation of SiJ and riJ. Also, D(x) has the same
shape and size for every x. A set D(x) with this property is referred to as a
representative volume. Our objective is to find properties of the moving average
8.5. Effective material properties 619

fields SiJ and f' ij. Two options are available. The first option is to derive properties
of Sij and f'iJ from Eq. 8.89. The disadvantage of this approach is that it requires
knowledge of the probability laws of the random fields (SiJ, f'iJ, Ui). The second
option is to find properties of Sij and f'ij directly. This approach requires the
development of new constitutive laws for the average stress and strain fields, and
such laws are difficult to derive in a general setting. For example, Eqs. 8.85 and
8.89 yield

- (x)
SiJ = -(I ) 1 Aijkl (~) f'kl (~) d~

1
V X D(X)

['kl (~)
=(-1 Aijkl(~) - - - - d~
) -
['ki(X) = Aijkt(X)
- -
['kl(X) (8.90)
V(X) D(X) ['kl(X)

and

- (x)
rij = -(1) 1 Cijkl (~) ski(~) d~

1
V X D(X)

skl (~)
=(-1 Cijkl(~) - - - - d~
) -
Ski(X) = Cijki(X)
- -
Ski(X), (8.91)
V(X) D(X) Skl(X)

respectively, provided that f'kl (x) and ski (x) are not zero. The matrix form of
the above equations is S(x) = A(x) f(x) and f(x) = C(x) S(x), respectively,
with notation as in Eq. 8.86. If the average is performed over D the argument x is
dropped.
The weighted averages of the stiffness and compliance coefficients defined
by Eqs. 8.90 and 8.91 give the constitutive law for the average stresses and strains,
where Aijkl and C;jkl are called the overall or effective stiffness and compliance
coefficients, respectively. These definitions of Aij kl and Cij kl are impractical since
they involve the fields S iJ and [' ij . The following sections present approximations
of and bounds on the effective stiffness and compliance coefficients, and are based
on developments in [137, 194].

8.5.2.1 Displacement controlled experiment. Voigt's average

Consider an arbitrary linear elastic heterogeneous continuum in D with vol-


ume v = fv dx, subjected to the boundary displacements

(8.92)

where Yij = Yji are specified constants defining the magnitude of the applied
boundary deformation. Let S;j , f' ij , and i1 denote the average over D of S ij , [' iJ ,
and U, respectively, where U is the strain energy of the body. It is assumed that
the body forces Fi in Eq. 8.88 are zero.
620 Chapter 8. Stochastic Systems and Deterministic Input

The average strain and the average strain energy are ([194], Section II.B)

- 1 = -1
r; !, U;, 1 (~) d~ = 9;1 ,
1/, 1-
V D

u-= 2v D Sij(~) U;,J(~)d~ = :2 SiJ y;1 , (8.93)

Proof: Let ~ : Rd ~ Rd be a differentiable function. The Gauss divergence theorem


states

f div(~(x))dx = f ~(x) ·n(x)da(x) = f s;(x)n;(x)da(x),


Jn Jan Jan

where div(~(x)) = si,i (x), n; (x) is the coordinate i of the exterior normal n(x) at x E an,
and da(x) denotes a surface element on an ([73], Section 9.4).
Let uCil(~) be a vector in Rd with entries U;(~)8qJ• where i, j are some fixed
indices and q = 1, ... , d, that is u<il (~) has zero coordinates except for the coordinate j
which is U; (~). The boundary conditions in Eq. 8.92 imply that the entries of u<il (~) are
(Yik ~k) 8qj for~ E an. These observations and the Gauss divergence theorem give

{ U;,J (~) d~ = { V. u<il (~) d~ = { uCil (~). n(~) da(~)


Jn
= i _
Jn

aD
Yik ~k 8qj nq(~) dam=
Jan
1 D
acr;k ~k)
a~j
d~ = 1_ _
D
Yij d~ = v Yij

so that I\1 = (ljv) JD U;,j (~) d~ = Yij.


We now calculate the average of the strain energy. Let iJ (~) be a vector in Rd with
entries S;q(~) Ui(~)8qJ• q = 1, ... , d, and note that

since Sij,j (~) = 0. The indices i and j are arbitrary but fixed. The average strain energy
accumulated in the elastic body is

- 11
U=-
2 V D lJ
S .. (x)U··(x)dx=-
l,J
11 -
2V D
'V·U(~)d~

= - 1/o-
2v an
Uq (~) nq (~) da(~) = -21/o
v an
SiJ (~) Yik ~k n 1 (~) da(~)

- __!_ f a(S;J (~) Yik h) dl: - ~ - .. f S·. (I:) dl: - ~ S·. -..
-2vJn a~j .,-2Y'1JD lJ'> .,-2 zJYIJ•

where the above equalities follow from the Gauss divergence theorem, boundary condi-
tions, and the definition of the average stress provided that there are no body forces. •
8.5. Effective material properties 621

The Voigt average


A(V) = ~ { A(x) dx (8.94)
v lv
and the overall stitTness A in Eq. 8.90 with D(x) = D satisfy the inequality

(8.95)

Proof: The Voigt average corresponds to the assumption that the strain field in a hetero-
geneous medium is uniform, and is analogous to the effective stiffness of a parallel system
(Example 8.35). We say that A (V) is an upper bound on A since the matrix A (V) - A is
positive definite (Eq. 8.95).
Let A be the overall material stiffness given by Eq. 8.90 in which D(x) coincides
with D. The corresponding average stress and strain fields over D are related by Sij =
!
Aijkl i\z so that the average strain energy is U = yT Ay (Eq. 8.93), where y is a
column vector including all distinct components of the tensor Yij, that is, the average strains
( Yll, Y22, Y33, Y12, 923, Y:3r). As previously stated, these relationships are of little use for
applications because Aijkl depends on detailed information on the stress and strain fields
(Eqs. 8.90-8.91).
Let r ij (x) be the strain field corresponding to the equilibrium configuration of a
linear elastic random heterogeneous material in D with the boundary conditions in Eq. 8.92.
!
The strain energy corresponding to this field is U = ;;T A y. Consider also the uniform
strain field Yij corresponding to a linear elastic homogeneous material in D under the same
boundary conditions (Eq. 8.92). We have

ii = ~ ;;T Ay = 21v fv rij (x) Aijkl (X) rkz(x) dx ~ 21v fv Yij Aijkl (x) Ykl dx

= ~ Yij Ykl [;v JD Aijkl(x)dxJ= ~ ;;T A(V) y,


since the actual strain field minimizes the strain energy ([68], Section 10.7). •

Example 8.42: Consider a rod of length v > 0 and unit cross section character-
ized by the random stiffness A(x), 0 .::5 x .::5 v. The displacements of the left and
right ends of the rod are zero and 9 > 0, respectively. The overall constitutive
law is S = A f' and
- llov A(x) -f'(x)
A= - - - dx .::5 A (V) = -1 lov A(x) dx,
v o r v o
where A is defined by Eq. 8.90. <>
Proof: The average strain energy accumulated in the rod isU = (1/2) A y2 . According to
the principle of minimum potential energy, we have

U= !
2
A 92 ~ __!._
2v
r
lo A(x) 9 2 dx = ! 9 2 (! {v A(x) dx) = !
2 v ]0 2
A (V) 92

so that A ~ A (V). •
622 Chapter 8. Stochastic Systems and Deterministic Input

8.5.2.2 Stress controlled experiment. Reuss's average

Consider a linearly elastic material in an open bounded subset D of IR d de-


fined at the beginning of Section 8.5.2. The material is subjected to the boundary
traction
(8.96)

where Sij = SJi are specified constants, n J (x) are the coordinates of the exterior
normal n(x) to aD at XED, and v denotes the volume of D. Let Sij. rij.
and u
be the average of SiJ, riJ, and U over D, respectively. It is assumed that the body
forces Fi in Eq. 8.88 are zero.

The average stress and the average strain energy are ([194], Section ll.B)

- = -11
SiJ
V D
SiJ (~) d~ = siJ,
u- = -
1 /, 1 -
sj1 (~) ri1 (~) d~ = - siJ ri1 . (8.97)
2v v 2

Proof: For any stress field we have

{ SiJ(~HknJ(~)da(~) = { (siJ(xHk) 1. d~
lav lv ·
= { (SiJ 1 C~Hk + SiJ(~)8Jk) d~ = { Sik(~)d~
lv ' lv
by the Gauss divergence theorem and the equilibrium condition Sjj,j (x) = 0. Because the
stress field is constant on the boundary of D, we have

so that sik = (1/v) fv sik(X) dx = Sik·


The average of the strain energy over D is

with the notation used to derive U in Eq. 8.93 and boundary conditions. •
8.5. Effective material properties 623

The Reuss average


c<R) = ~ /, C(x) dx (8.98)
V D

and the overall compliance C in Eq. 8.91 with D(x) = D satisfy the inequality

(8.99)

Proof: The Reuss average corresponds to the assumption that the stress field is uniform
throughout the heterogeneous material, and is analogous to the effective stiffness of a series
system (Example 8.35). We say that c(R) is an upper bound on C since c(R) -Cis positive
definite (Eq. 8.99).
Let i: be the overall material compliance given by Eq. 8. 91 in which D (x) coincides
with D. The corresponding average strains and stresses are related by f'ij = Cijkl Skz so
that the average strain energy is i1 = 1sT i: s (Eq. 8.97), where s is a column vector
whose entries are the distinct elements of Sij . As previously mentioned, the calculation of
the average compliance coefficients Cijkl is impractical.
Let Sij (x) be the stress field corresponding to the equilibrium configuration of a lin-
ear elastic random heterogeneous material subjected to the boundary conditions in Eq. 8.96.
1
The average strain energy of this solid is given by 11 = sT i: s. Consider also the con-
stant stress field Sij corresponding to a linear elastic homogeneous material in D satisfying
Eq. 8.96. We have

uc = ~ sT i:s = 21v fv Sij(x) CiJkl(x) Skz(x)dx :s 1


2v fv siJ CiJkl(x)skzdx

1_ _ 1 { 1 -T (R) -
= 2 siJskz-;;JvciJkZ(x)dx= 2 s c s,

since the actual stress field minimizes the complementary strain energy and Sij satisfies the
traction boundary conditions ([68], Section 10.9). •

The Voigt and Reuss averages can be used to bound the overall stiffness in
the sense of the inequalities in Eqs. 8.95 and 8.99. For example, we have seen that
A (V) provides an upper bound on A. We also have that

(8.100)

is a lower bound on A= (C)- 1 since A- A(R) is positive definite ([166], Ap-


pendix A.2).

Example 8.43: Consider the random heterogeneous rod in Example 8.42 and as-
sume that it is subjected to a traction s at its ends. The overall constitutive law is
S = A f' or f' = C S and
-
C = -
1
v
1v
0
S(x)
s
1
C(x) - - - dx ::S C(R) = -
v
1v
0
C(x) dx

so that A= 1/C::: A(R) = 1jc<Rl. <>


624 Chapter 8. Stochastic Systems and Deterministic Input

Proof: The average complementary strain energy accumulated in the rod isUc = (1 /2) Cs2.
According to the principle of minimum complementary potential energy, we have

uc = !2 Cs2 :5 __!__
2v lo
r C(x) s2 dx = !2 s2 (!v Jofv C(x) dx) = !2 c<Rl s2
so that C :5 C(R) . •

Example 8.44: Let D be a cube with unit sides containing a multi-phase material
consisting of n isotropic homogeneous phases with shear and bulk elastic moduli
o<r) and K(r)' and volume v,, r = 1, ... 'n, so that L~=l Vr = 1. It is assumed
that the dimension of individual constituents is much smaller than unity. Let G and
K denote the effective shear and bulk moduli of the material in D corresponding
to uniform pure shear and hydrostatic pressure, respectively. For example, G
for a uniform pure shear so in the (X!, X2)-plane is G = so/f'l2. where r12 =
fv f12(x) dx since D has unit volume (Eq. 8.90). The Voigt and Reuss averages
give the bounds

on the effective shear and bulk moduli, G and K. <>


Proof: Suppose that the cube is sheared in the (xl, x2)-plane. The effective shear modulus
Gin this plane is bounded by the Voigt and Reuss averages given by Eq. 8.94 and Eq. 8.100.
These equations can also be used to bound the effective bulk modulus K corresponding to
hydrostatic pressure.
The Reuss averages provide lower bounds on G and K and correspond to uniform
boundary stresses (Eq. 8.100), while the Voigt averages relate to uniform boundary strains
(Eq. 8.94). That the Reuss average is smaller than the Voigt average also follows from the
Cauchy inequality ([149], Theorem 2, p. 46)

applied for a, = (vr Gr) 1/ 2 and br = (Vr / Gr) 112. This inequality holds for any a,, br in
lR or C. •

8.5.2.3 Physically based approximations


We have mentioned that the determination of macroscopic material proper-
ties from Eqs. 8.90 and 8.91 is computationally intensive because the strain and
stress fields, respectively, must be known everywhere in D. The previous two sec-
tions provide bounds on the overall stiffness and compliance coefficients Aijkl and
Cijkl· An alternative approach is to calculate Aijkl and Cijkl from Eqs. 8.90 and
8.5. Effective material properties 625

8.91 in which the actual strain and stress fields are approximated. The accuracy
of the resulting effective moduli A;jkl and C;jkl depends on the postulated strain
and stress fields. We illustrate this approach by two examples. The perturbation
and the Neumann series method are applied in the following section to develop
alternative approximations for A;jkl and C;jkl·

Example 8.45: Consider the multi-phase material in Example 8.44 and let v (r) be
the Poisson ratio of phase r, r = 1, ... , n. Suppose that the material is sheared
in the (x1, x2)-plane by a deterministic stress so. The overall or effective shear
modulus in this plane is G = L~=l Vr c<r) ri7 /f'12, where ri7 and r12 are
averages of the strain field f'12 over Dr and D, respectively, and Dr denotes the
subset of D occupied by phase r (Eq. 8.90 and Example 8.44). We also denote by
K the effective bulk modulus (Example 8.44).
The use of the above expression for the effective shear modulus G requires
knowledge of the strain field. Two options are available. We can calculate the
actual strain field f' 12 or we can postulate an approximate strain field f' 12 and then
calculate f'12· We use the latter option and take ri7 to be the strain in a spheri-
cal inclusion with the properties of phase r that is embedded in an infinite elastic
matrix with the macroscopic properties G and K of the composite. The resulting
stress field is ri7 =so/ (G + ~cc<rl - G)) where~ = 2 (4- 5 v) (15 (1 - v)),
v = (3 K - 2 G)/ (6 K + 2 G), and K = L~=l Vr K(r) e<rl /G [34]. The ap-
proximate contraction e<rl of phase r is e<rl = pj (K +a (K(r) - K) ), where
a = (1 +i!)/(3 (1- v)) and p denotes a deterministic hydrostatic pressure applied
to D. The average strains and contraction over Dare

n
and e = LVr e<rl.
r=1

Figure 8.17 shows histograms of K and G for a three-phase material with


concentrations (0.2, 0.3, 0.5). The corresponding phase shear and bulk moduli
are (1, 2, 3) and (4, 1, 2), respectively. Let n v denote the average number of con-
stituents. A larger nv means a larger reference volume D. The histograms in the
figure are for nv = 20 and nv = 200. Larger values of nv can be interpreted
as larger material specimens. The histograms in Fig. 8.17 are centered on the
asymptotic values, 1.9122 and 2.1420, of K and G corresponding to nv --+ oo.
The uncertainty in K and G decreases with the specimen size. For example, the
range of the histograms of G is (1.57, 2.71) for nv = 20 and (1.97, 2.33) for
nv = 200. The corresponding standard deviations of G are 0.1941 and 0.0657 for
the small and large specimen, respectively. <>

Proof: The stress-strain relationship for an isotropic homogeneous material is Sij


A1 cwt Oij + 2 G [' ij, where A and G are material constants. The average of the stress
626 ChapterS. Stochastic Systems and Deterministic Input

3 4
G K
(n =20) 3
2 v (n =20)
v
2

0
1.5 2 2.5 3
0
1
l
1.5 2
ln 2.5 3

10 10
G K
8 8
(n =200) (n =200)
v v
6 6

4 4

2 2

0 0
1.5 2 2.5 3 1 1.5 2 2.5 3

Figure 8.17. Histograms of K and G for a three-phase material

field in Dis (Eq. 8.90 with D(x) = D)


- 11
Sij = - Sij (x) dx

t
v D

=
r= 1
Vr [.sij A(r) ..!._
Vr
r
Jv,
r aa(X) dx + 2 o(r) ..!._
Vr
r
Jv,
rij (x) dx]

so that

which yields the stated constitutive law since the total shearing strain is equal to 2f\2·
The relationship between pressure and volume change r a a = r 11 + r 22 + r 33 for
an isotropic homogeneous material is Sa a = (3 A + 2 G) r aa = K r a a, where K denotes
the bulk modulus. The average Saa of Saa over D is

so that K is as stated.
The expressions of G, K, a, {3, and ii form a system of nonlinear algebraic equations
whose solution gives G and K. The plots in Fig. 8.17 represent the solutions of these
algebraic equations for material samples generated by Monte Carlo simulation.
8.5. Effective material properties 627

A multinomial distribution with n distinct outcomes was used to generate samples


of the multi-phase material. Let Pr > 0, L:~=l Pr = 1, be the probability of occurrence
of phase r in a trial and let Xr denote the number of such occurrences in nv independent
trials, so that
P ( nr=l
n {X
r = Xr
})
= ' nvl ' plx 1 ... PnXn
Xl· · · ·Xn.
ifL:;~=l Xr = nv and zero otherwise, wherexr = 0, 1, ... , nv for each phase. The average
of the ratio Xr / nv is Pr, that is, the fractional volume of phase r in D. •

Example 8.46: A collection of material particles that may have random shape
and/or mechanical properties whose interaction is localized at their contacts is
referred to as a granular material. In contrast to multi-phase media, the parti-
cles of a granular material are not included in a matrix. Most studies on granular
materials focus on the development of continuum constitutive equations and the
analysis of potential changes in material properties, referred to as structure evo-
lution [72, 163]. The analysis of granular materials can be based on (1) concepts
of molecular dynamics, (2) continuum smoothing and balance approach, (3) ki-
netic theory based on the conservation theorem of statistical mechanics and the
Maxwell-Bolzman distribution, and (4) mean field hypothesis [72, 108, 148].
We discuss solutions based on the mean field hypothesis. The effective bulk
and effective shear moduli, j_ and 2 [L, of a random packing of homogeneous and
isotropic spheres of the same radius r > 0 given by the mean field hypothesis are

j_ = 11-k (1-
Snr
a) [-a- __2!?___],
1-v 2-v
2 /L = JLk(l-a) [~ _ ~], (8.101)
Snr 1-v 2-v

where 11- and v are the shear modulus and the Poisson ratio of the spheres, k
denotes the average number of contact points for a sphere of the aggregate, a and
b are some constants related to the geometry of the contact between two spheres,
and a is the aggregate porosity, that is, the ratio of void to total volume. The
mean field hypothesis states that ( 1) each particle has the same number of contacts
that is equal to the average number of particle contacts over the entire aggregate
and (2) the contacts are equally spaced on the boundary of each particles. These
assumptions hold in a perfectly ordered configuration of the aggregate. The use of
the mean field hypothesis eliminates the only source of uncertainty, the geometry
of packing, and gives an overall relationship between stresses and strains that is
deterministic.
The derivation of the effective moduli in Eq. 8.101 involves two steps. First,
the constitutive law of the interaction between two particles needs to be estab-
lished. Second, the mean field hypothesis is applied to eliminate the uncertainty
in the coefficients of the equilibrium conditions and establish overall constitutive
laws. Details on these derivations can be found in [52]. <>
628 Chapter 8. Stochastic Systems and Deterministic Input

Note: We summarize the essential steps of the mean field solution for the special case of an
assembly of infinitely long cylindrical particles with circular cross section of radius r > 0
and parallel axes. Let S be a plane section through the assembly, which is parallel to the
axes of the particles. Consider a rectangular subset S' of S with unit side in the direction
of the particle axes. The other side of S' has length a » r. The area of the intersection
of S' with the cylindrical particles of the aggregate is a* ::::::: Li N; [2 r sin(a;) ], where
N; denotes the number of cylindrical particles intersecting S' with an angle [a;, a; + da;)
(Fig. 8.18). The numbers N; are random since the packing is not ordered. The interaction

Aggregate geometry Particle interaction

2r sin( a.) 0 F(p,c)


l n

x2c : ' 2
x\l
x'l

e (p,c) X
1

Figure 8.18. Particles of a granular material

between two particles in contact consists of the forces Fn and F1 normal to and contained in
the plane tangent to the boundaries of these particles at their contact point. It is assumed that
there is no relative movements between particles along their axes so that we can consider
that F1 is contained in the (XJ, xz) plane (Fig. 8.18).
Suppose that the assembly is in equilibrium under equal pressure in the XJ and
xz directions. These boundary conditions would generate uniform stress and strain fields
in a homogeneous isotropic material replacing the aggregate of particles. Let os;i be a
perturbation of the uniform stress field in this material and oy;j denote the corresponding
increment in strains. Our objective is to determine the relationship between 8sij and 8y;j,
that is, the overall constitutive law or, equivalently, the effective constants of the aggregate.
Consider again the set S' and suppose that the stress increment is limited to the component
parallel to axis xz. The total force increment on this section is a 8szz. The equilibrium
of the assembly is preserved if the original contact forces between particles are slightly
altered. Let 8F~p,c) and oF/p,c) denote changes in the contact forces caused by the stress
increment 8szz. The assembly is in equilibrium if the condition
a 8szz =- L L [8F~p,c) cos(0(p,c)) + oF/p,c) sin(e(p,c)) J,
particles contacts
8.5. Effective material properties 629

is satisfied, where e(p,c) is the angle between Fn and axis x2 (Fig. 8.18). The summation
includes contact forces acting on the particles which intersect s' and have their center above
this plane. These forces are identified by the superscripts (p, c) in the above equation. The
angles 9(p,c) are uncertain because of random packing, and therefore oF/ip,c) and oF/p,c)
are random variables.
The analysis of the interaction between two particles can be used to obtain the re-
lationships oFn = en oyn and oFt = Ct OYt between the force increments oFn and oFt
and corresponding deformation increments, OJin and OYt, in the normal and tangential di-
rections, where en and c1 are constants. The strain increments in the (xi, x2) reference
corresponding to these displacements can be calculated simply. Classical formulas of ten-
sor calculus can be used to find the image orij of the strain tensor or;j in the (XI, X2)
reference, for example,

or'11 +or'22 or'11 -or'22


or22 = 2 + 2 cos(2 E>) + ori 2 sin(2 E>).

The last two equations and the equilibrium conditions give relations between the stress
increment 0S22 and the strain increments orij. The above results and the mean field hy-
pothesis yield Eq. 8.101 following the arguments in [52]. ~

8.5.2.4 Analytically based approximations

The methods in Sec. 8.4.1 can be applied to find moments and other prob-
abilistic properties of the displacement, strain, and stress fields for random het-
erogeneous media. We show here that some of the methods in this section, for
example, the perturbation and Neumann series methods, can be applied to cal-
culate approximately effective material constants and develop bounds on these
constants that are tighter than the Voigt and Reuss averages.
It is assumed, as stated at the beginning of Section 8.5.2, that the stiffness
coefficients Aijkl are homogeneous, square integrable random fields so that their
expectations E[AiJkl (x)] = OiJkl exist and are constant. The random part of the
stiffness coefficients, Aijkl (x) - Oijkl. has mean zero.

Perturbation method. Suppose that the uncertainty in the stiffness co-


efficients is sufficiently small so that AiJkl has the representation AiJkl(X) =
OiJkl + 8 Rijkl (x), where 8 is a small parameter. We will seek solutions for the dis-
placement, strain, and stress fields in power series, for example, the displacement
functions are represented by Ui(X) = u/
0)(x) + 8 u?)(x) + 0(8 2 ).

If (1) the random fields AiJkl and Fi are ergodic and independent of each other
and (2) D is sufficiently large so that sample and ensemble averages of A ijkl
and Fi nearly coincide, the constitutive law for the average stress and strain
fields can be approximated by Sij (x) = aiJkl f'iJ (x), where SiJ and denote riJ
approximations to the order e of Sij and f'ij in Eq. 8.89, respectively.
630 Chapter 8. Stochastic Systems and Deterministic Input

Proof: The representation of the displacement field and Eqs. 8.85, 8.87, and 8.88 give

~ a;Jkt ( u~~& (x) + u 1:% (x)) = -F;,


l
2 a;Jkl
( (1)
uk,lJ (x) + ut,k/x)
(1) )
= - 21 R;Jkl,J
-
(x)
( (0)
uk,I (x) + u 1(0)
,k (x)
)

- 21 R;Jki(x)
- ( (0)
uk,IJ(x) + ul,kJ(x)
(0) )

to order 1 and 8, respectively, provided that F; ~ 0(1). The boundary conditions for the
above equations correspond to the terms of order 1 and 8, respectively, of the specified
boundary conditions. If the boundary conditions for the equations of order 1 and the body
forces F; are deterministic, the solutions u?l are deterministic. The solutions u?l are
random fields irrespective of the properties of F; and the type of boundary conditions since
the differential equations defining these functions are driven by the random part of the
stiffness coefficients. The approximate strain and stress fields to the order 8 are

r;i (x) = rii(0) (x) + 8 riJ(1) (x) + 0(8 2 ),


SiJ(x)=a;Jklrkl (x)+8
(0) [ - (0)
R;Jkl(x)rkt (x)+a;Jklrkt (x) +0(8)
(1) J 2

= S(O)
lj
(x) + 8 S(l) (x) + 0(8 2 ),
lj

where rki) = ( u~~) + ul:V) /2 and the expression for S;j is given by the terms of order
1 and 8 of A;jkt(X) rkJ(X).
The expectation of the above expression of S;i is

E [ s;(0)
1 (x) 1 (x) = a;Jkl E r;1 (x) + 8 r;1 (x)
+ 8 s;(1) ] [ (0) (1) J
since R;jkl and rk~) are not related so that the expectation E [ R;jkl (x) rk~) (x) Jis equal to
J J
E [ R;j kl (x) E [ r~~) (x) and R;j kl has mean zero. If the random fields A;j kl and F; are
ergodic so are the stress and strain fields. Hence, for a sufficiently large D, the expectations
E [ sfjl(x) + 8 sgl(x)] and E [r~l(x) + 8 rgl(x)J can be approximated by the sample
' ' .
(0)
averages of Sii + 8 S;i(1) and riJ(0) + 8 rii(1) , denoted by SiJ and rij, respectlVely. These
observations give the approximate constitutive law sij = a;j kl f\t . •
Neumann series method. Let R;jkt(X) = A;Jkt(X)- a;jkt be the random
part of A;Jkl and assume that the displacement field satisfies the boundary condi-
tion in Eq. 8.92. Consider a trial strain field
m
f(m)(x) = 2)-GR)kyk(x),
k=O

where G denotes an integral operator, y 0 = ji is the uniform strain field used


to establish the Voigt bound (Eqs. 8.92 and 8.93), and y ko k = 1, ... , m, denote
unspecified deterministic strain fields.
8.5. Effective material properties 631

The average strain energy satisfies the inequality

for any y k. The above upper bound on U can be minimized by an adequate


selection of the strain fields y k, k = 1, ... , m, and provides a tighter bound on
the effective stiffness coefficients than the Voigt average.

Proof: The displacement field satisfies the integral equation

where gkp are Green's functions defined by

aijkl gkp,lj (x, ~) + 8ip 8(x- ~) = 0, x E D

with gkp (x, ~) = 0 for X E aDand Uk


depends on the boundary conditions ([194], p. 14).
The equation for U gives by differentiation r (x) = y* (x) - (G R f) (x), where G denotes
an integral operator that results from the integral equation for U by differentiation and y*
is the strain field corresponding to UJ.An alternative form of the equation for r is

f(x) = (i + G R)-l y*(x) = [I=


k=O
(-G R)k] y*(x),

where the last equality is valid if the above Neumann series is convergent. Under this
condition we can approximate the strain field by the first m terms of the Neumann series
expansion of r, that is, by the field

r<ml(x) = [t(-G R)k] y*(x).


k=O

The strain energy satisfies the inequality

-
U=-y
1 -T - -
2
Ay:::-
1
v
1
D
ri.(m) (x)(a+R)ijkl(x)rkl(m) (x)dx
1

since r<ml is an approximate strain field ([194], Section IV).


Note that we can use the strain field r<m) even if the Neumann series is not conver-
gent since all we need is an approximate strain field, which is consistent with the boundary
conditions. Other trial strain fields can be used, for example, the strain field f(m), since the
above bound on the average strain energy holds for any approximate strain field. •

Example 8.47: Consider a rod with unit cross section and random stiffness A (x),
0::: x ::: v, subjected to boundary conditions U(O) = 0 and U(v) = y vat its left
and right ends, respectively. The average and random parts of A, assumed to be a
632 Chapter 8. Stochastic Systems and Deterministic Input

homogeneous random field in L2, are a = E[A(x)] and R =A- a, respectively.


The integral equation for the strain r is

r(x) = 9 + fov G(x, y) r(y) dy,


where the kernel G(x, y) = R(y) [1 - v 8(x - y)]l(a v) depends on the average
and random parts of A.
Let fcx) = 9 + H (x) Yl be a trial strain field, where Yl is a constant
and H (x) = J;
G(x, y) dy. The optimal strain field with this functional form is
-
r(x) = 9 + H(x) y 1opt, where
opt _ J; A(x) H(x) dx
yl = -y J; A(x) H(x) 2 dx ·
This optimal field provides a tighter bound on A than the Voigt average. For
example, suppose that A is given by a sum of terms Yk e-.Bix-Zki, where Yk are
identical copies of an exponential random variable with mean 1I p and Z k denote
the jumps of a homogeneous Poisson process N(x), x ::: 0, with intensity A > 0.
The average of A is a = E[A(x)] = 2AI(P {3) so that the expectation of the
Voigt average is E[A (V)] = a = 10 for a = 10, A = 30, {3 = 10, and 11 p =
a {3 I (2 A). The estimated expectation of the upper bound on A corresponding
to the optimal strain field is 8.72. This estimate has been calculated from 100
independent samples of A. <>
Note: The equilibrium condition in terms of displacements is

U 11 (x) = _.!_ [R 1(x) U 1(x) + R(x) U 11 (x)]


a
and results from the condition (A(x) r(x)) 1 = 0, where a prime denotes a differentiation
with respect to x. The integration of the above equation and the boundary conditions give

U(x) = [9 + __!__ {v M(y)dy] x- .!_ {x M(y)dy,


av Jo a Jo
where M(x) = J0 [R 1 (y) U 1(y) + R(y) U" (y)] dy. Hence, the strain field is
f'(x)
1
= 9 +- fnv M(y) dy- -1 M(x)
av o a
or, following integration by parts,

f'(x) = 9 + -1 fnv R(y) (1 - v 8(y- x)) f'(y) dy.


av o
Let Gk(x, y) = fo G(x, z) Gk-1 (z, y) dz, k = 2, 3, ... , where G1 (x, y) = G(x, y). If
lv G(·, ·)I < 1 a.s., the Neumann series,

f'(x) = 9+ E [fov Gk(X, y) dy] 9,


8.6. Evolution and pattern formation 633

is convergent a.s., and represents the strain field in the rod (Eqs. 8.58 and 8.59). We use
the above representation of[' to select trial strain fields f. The Voigt average (Eq. 8.94)
correspondstothetrialstrainfieldf(x) = ji. SinceAji 2 ::::: (1/v)j~ A(x)ji(x) 2 dxholds
for any strain field ji, it can be applied to the strain field ji + H (x) Yl· We have selected
y~pt to minimize the upper bound (1/v) J~ A(x) (ji + H(x) YI) 2 dx on A ji 2 . A

8.6 Evolution and pattern formation


Many physical, economical, biological, and other systems experience no-
table changes in time because of internal and/or external factors. A change in the
attributes of a system beyond a scale factor is called evolution. If the evolution
ceases as time increases, we say that a pattern has emerged. Heterogeneity is
a necessary condition for evolution. A homogeneous system cannot evolve. For
example, a homogeneous population cannot evolve by natural selection but a het-
erogeneous population does evolve by natural selection since its individuals have
different fitness levels. Randomness of a system property adds another dimen-
sion to evolution studies in the sense that probabilistic tools are needed to assess
whether evolution and/or pattern formation occurs.
We illustrate the evolution and pattern formation phenomena by examples
from elasticity and crystal plasticity (Section 8.6.1 and 8.6.2). Let A(x, t) be a
random field specifying a property of an elastic body at location x and time t
(Section 8.6.1). We say that this property evolves if A changes in time beyond a
scale factor. For example, suppose that A is a homogeneous random field consist-
ing of a superposition of a finite number of waves with fixed length but random
amplitude and phase (Section 5.3.1.2). A relative change in time of the ordinates
of the spectral density of A indicates a change in the frequency content of this
random field so that the field evolves. On the other hand, a proportional increase
or decrease in time of all spectral ordinates of A does not mean evolution since
such a change affects only the scale but not the frequency content of A. If A
becomes time-invariant up to a scale factor as t ~ oo, we say that a pattern has
emerged. Generally, the resulting pattern is simpler than the initial property. For
example, the amplitudes of some of the constituent waves of A may decrease to
zero in time so that A consists of the superposition of fewer waves at a relatively
large time t > 0 than at the initial time t = 0. Similar considerations apply for
crystal plasticity problems (Section 8.6.2).

8.6.1 Elasticity
Let D be an open bounded subset of JR:d, d = 1, 2, 3, containing a random
heterogeneous elastic material. The boundary aD of D is subjected to specified
displacements varying slowly in time. The stiffness tensor of the material in D
is a random field Aijkz(x, t), where x E D, i, j, k, l = 1, 2, 3, and t ::: 0. Our
634 Chapter 8. Stochastic Systems and Deterministic Input

objective is to assess whether the stiffness A;jkl can evolve and, if it evolves,
whether its evolution emerges in a pattern.
Let flU(x, t), flrij(X, t), and flSij(x, t) denote the displacement, strain,
and stress increments at x E D during the time interval [t, t + flt], flt > 0, for
example, flU(x, t) = U(x, t + M)- U(x, t). We assume in our discussion the
following models for material, kinematics, and external actions.

1. Material. It is assumed that the material:


(a) is random, heterogeneous, and isotropic with stiffness [68]

A;jkt(X, t) = A(x, t) 8ij 8kt + S(x, t) (8;k 8jt + 8;j 8kz), (8.102)

where

A(x, t) = A{t) + 8 A(x, t), S(x, t) = ~(t) +8 S(x, t), (8.103)

A and ~ are deterministic functions of time, and 8 is a small parameter. At


the initial time, t = 0, A and 8 are homogeneous random fields with mean
zero. It is assumed that the stiffness random fields A and 8 in Eq. 8.103 are
strictly positive a.s., and that these fields are sufficiently smooth so that all
the following operations on these fields are valid.
(b) obeys the evolution rule

flA(x, t) = c 0·l [!lr~n(x, t) flr~m(x, t)] 112 ,


flS(x, t) = cCn ( flr~n(x, t) flr~m(x, t)] 1/ 2 , (8.104)

where r~n = r mn - (1/d) r pp 8mn denotes the entry (m, n) of the devia-
toric strain tensor and cC>-) and cC~l are specified constants. This rule can be
viewed as an approximation of the crystal plasticity evolution defined later
in Section 8.6.2 (Eq. 8.122).

(c) is locally linear, that is, the relationship between stress and strain incre-
ments is

!lS;j(X, t) = Aijkt(X, t) flrkz(x, t) , XED, t 2::: 0. (8.105)

2. Kinematics. The deformation is small so that strain increments in [t, t + flt]


are
1
!lr;j(X, t) = 2 (llU;,j(X, t) + !lUj,i(X, t)), XED, t 2::: 0. (8.106)

3. Loading. The boundary aD of D is subjected to the distortion


1
2 (llU;,j(X, t) + !lUj,i(X, t)) = fletij(f) = Cij M, X E aD, t 2:::0,
(8.107)
8.6. Evolution and pattern formation 635

in [t, t + D.t], where CiJ = cJi are some constants. The loading is quasi-static and
body forces are zero.

Note: If e = 0 in Eq. 8.103, the material is deterministic, homogeneous, and isotropic so


that 6.ui (x, t) = Cik Xk 6.t gives the increment of the displacement field in [t, t + 6.t]. The
resulting strain increments

1
Ay· · - -2 (Au·
'-" I) - '-' 1,]· (x ' t) + 6.u ],1· · (x ' t)) -- .. (t) -- cI).. M
6.a I)

are space invariant. Because 6.Yij I 6.t = Cij, we refer to Cij as strain rates.
The material model defined by Eqs. 8.102, 8.103, 8.104, and 8.105 is quite simple.
We use this model because it allows analytical developments. Such developments are not
possible when dealing with more realistic material models, for example, the model for
crystal plasticity in Section 8.6.2. We also note that the symbol 6. used in this section to
indicate increments of strain, stress, and other functions should not be confused with the
Laplace operator used previously in the book. &

Our objective is to determine whether the solid defined by Eqs. 8.102 to


8.106 and subjected to the boundary conditions in Eq. 8.107 evolves, that is,
whether the frequency content of the random fields A and/or 8 (Eqs. 8.102 and
8.103) changes in time. If the material experiences evolution, we will examine
whether its evolution emerges into a pattern. To achieve this objective, we need
to calculate the properties of the random fields A and 8 at all x E D and t 2:: 0.
The determination of these properties involves two steps. First, we need to find
properties of the displacement field U. Second, the displacement field and the
evolution rule in Eq. 8.104 can be used to find properties of the random fields A
and 8. The starting point of developments in this section is [116] dealing with
deterministic heterogeneous materials, that is, materials that can be described by
Eqs. 8.102 and 8.103, in which A and 8 are deterministic functions.
The first order perturbation method is used to find properties of the incre-
ments D.U of the displacement field in the time interval [t, t + D.t]. Accordingly,
these increments are represented by

D.U(x, t) = D.u(x, t) + e D.U(x, t), (8.108)

where D.u(x, t) is linear in x, because it constitutes the increment of the dis-


placement field in a homogeneous and isotropic material with parameters (A.,~)
(Eq. 8.103) and the boundary conditions in Eq. 8.107. It is convenient to solve
the differential equations for D. if (x, t) in the frequency domain. Let D. (]j (q, t),
A(q, t), and S(q, t) be Fourier transforms of D.Ui(x, t), D.A(x, t), and D. 8(x, t)
with respect to the spatial coordinate x at each t 2:: 0. These definitions hold for
almost all samples of D. A and D. 8 by the above properties of these fields.
636 Chapter 8. Stochastic Systems and Deterministic Input

The Fourier transforms of the differential equations of equilibrium in terms of


displacements for order c are

/3ijkl (t) [qt qj flfh(q, t) + qk qj tl(ft (q, t)]


= v'=-1 qj [A(q, t) ~ijkl + S(q, t) PijkJ], (8.109)

where

~iJkl = 8iJ 8kt, PiJkl= 8;k 8Jl + 8u 8kJ and


2 f3iJkl (t) = A(t)i;ijkl + !;(t) PiJkl· (8.110)

Proof: We note that the Fourier transforms l:.U;(q, t) of l:.U;(x, t) in Eq. 8.109 depend
on the particular samples of t:.A(q, t) and t:.S(q, t) used in this equation so that they are
random. The following calculations apply to samples of the random fields A and S. We
also note that the functions l:.U; (x, t), t:.l\(x, t), and l:.S(x, t) are defined only on a subset
D of Rd. The Fourier transforms in Eq. 8.109 correspond to extensions of these functions
to the entire space Rd. It is common to use periodic extensions.
The equilibrium condition t:.Sij,J (x, t) = 0 can be given in the form

(Aijkl(X, f) l:.fk[(X, t)) ,]. = 0


or

[ ( ().(t) + 8 1\(x, t)) ~ijkl + CHt) + 8 S(x, t)) Pijkl )(t:.akl

+-82 (l:.Uk /(X, t) + l:.Ut k(X, t)))] 1· = 0


- -
' ' '

since t:.rij(x, t) = t:.aij(t)+(8/2) ( l:.U;,j(X, t) + l:.Uj,i(x, t))


(Eqs. 8.106 and8.108).
The resulting condition involving terms of order l is satisfied identically and the conditions
for terms of order 8 gives

The result in Eq. 8.109 is the Fourier transform of the above equality. •

The evolution of material parameters in [t, t + tlt] is given by


(8.111)

- ~) -
tlA(x, t) = bmn tlUm,n(X, t),
~
fl.:~(x,
~) -
t) = bmn tlUm,n(X, t), (8.112)

where b(J..), b(l;), b};~, and b;!~ are constants depending on the average strain
rates c;i.
8.6. Evolution and pattem formation 637

Proof: The change I::!..A(x, t) = A(x, t + l::!..t)- A(x, t) = 1::!../..(t) + 8l::!..A(x, t) of the
material parameter A during the time interval [t, t + l::!..t] depends on the strain increment
in this interval (Eq. 8.104). The function D..r~n l::!..f~m can be approximated to the order 8
by

l::!..f mn *
* l::!..f nm = l::!..amn l::!..anm - 31 (l::!..app) 2
+ 28 ( l::!..amn I::!..Bnm- ~ l::!..app I::!..Bqq),
where I::!..Bpq(x, t) = (I::!..Up,q(x, t) + I::!..Uq,p(x, t))/2. The arguments of the functions in
the above and some of the following equations are not shown for simplicity. The corre-
sponding approximation of the square root of this function is

* ~ r::
* )1/2 -va+
(l::!..fmnl::!..rnm y'a
8 ( 1
l::!..apq-3l::!..arr8qp ) I::!..Bqp(X,t),

!
where a = l::!..apq l::!..aqp- (l::!..arr ) 2 and the function (y + 8 z) 112 has been approximated
by y1/2 + (8 z)/(2 y 112). We have

1::!../..(t) + 8 I::!..A(x, t) = c(A.) [ y'a + :a (!::!..a pq (t) - ~ I::!..CXrr (t) 8qp) I::!..Bqp(X, t)]
so that

1::!../..(t) = c(A.) y'a,


_ c<A-) ( 1 )
I::!..A(x,t)= y'a l::!..apq(t)- 3 D..arr(t)8qp I::!..Bqp(x,t).

The results in Eqs. 8.111-8.112 follow since l::!..aij (t) = cij M (Eq. 8.107) so that alf2 =
(cpq Cqp - (err ) 2 /3) 112 l::!..t and

l::!..amn - (1/3) l::!..amn 8nm Cmn + Cnm - (2/3) Cpp 8nm


Ja = (cpq Cqp - (1/3) (err )2)1/2 ·

The constants of the first equalities in Eqs. 8.111 and 8.112 are, respectively,

b(A.) = c(A.) (c c _ (c )2/ 3) 1/2 and b(A.) = c(A.) Cmn + Cnm - (2/3) Cpp 8nm .
pq qp rr mn (Cpq Cqp- ( 1/ 3) (Crr )2)1/2
(8.113)
The constants b(l;) and b;i~ in the second equalities ofEqs. 8.111 and 8.112 can be obtained
from b(A.) and b~~ by multiplication with c<~) jc<A-). •

The Fourier transform of Eq. 8.112 is

(8.114)
638 Chapter 8. Stochastic Systems and Deterministic Input

Note: The above equations show that the Fourier transforms of ~A(x, t) and ~S(x, t)
are linear functions of the Fourier transform of the displacement increments, that is, of the
functions ~Om (q, t). 6

The material evolution rules in Eq. 8.111 show that the deterministic part
of the stiffness coefficients varies linearly in time and remains space invariant.
The evolution of the random part of the stiffness coefficients cannot be obtained
directly from Eq. 8.112 since the increments of these coefficients depend on the
increments D..Um,n of the random part of the displacement field. However, the
Fourier transform of this equation (Eq. 8.114) and Eq. 8.109 yield finite differ-
ence equations for A and 8. These equations are coupled so that they have to be
solved simultaneously. Generally, the coefficients of the equations defining the
evolution of A and 8 depend on the wave number q (Eq. 8.109). Hence, the rela-
tive magnitude of the amplitudes of the constitutive waves of A(q, t) and 8(q, t)
may change in time and this change implies evolution. Accordingly, the proba-
bility law of the random fields A(x, t) and B(x, t) would also change in time in
which case we say that the material properties evolve.

Example 8.48: Consider a rod of length l > 0 with stiffness A(x, t) = a(t) +
8 A(x, t), where a and A denote the deterministic and random parts of A and
8 is a small parameter. It is assumed that the stiffness obeys the evolution rule
D..A(x, t) = c<a) D..f(x, t), where D..f(x, t) = D..U'(x, t) and D..A(x, t) denote
strain and stiffness increments at x during the time interval [t, t + D..t], U (x, t) is
the displacement function at location x and timet, and U'(x, t) = aU(x, t)jax.
The rod is subjected to an average elongation D.. a (t) x = c x D. t, c > 0, per unit
length during the time interval [t, t + D..t].
The stiffness A does not evolve because the spectral density of A at a time
t > 0 can be obtained from the spectral density of A at time t = 0 by scaling.
Hence, a very simple system of the type considered in this example cannot evolve
although its properties vary randomly in space. We will see in the following two
examples that this behavior does not persist for more general systems. <>
Proof: The increment of the displacement field in the rod during the time interval [t, t + M]
can be approximated by ~U(x, t) = ~u(x, t) + s ~U(x, t), x E D = (0, 1). The material
model and the material evolution rule give (Eqs. 8.102, 8.103, and 8.104)
~a(t) + s ~A(x, t) = c(a) (~a(t) + s ~(;' (x, t)) so that

~a(t) = c(a) ~a(t) = c(a) c M and ~A(x, t) = c(a) ~U 1 (x, t)


are consistent with Eqs. 8.111-8.112.
The evolution of the deterministic part of the stiffness is a(t) = a(O) + c(a) ct.
The evolution of the random part of A can be determined from (1) the Fourier trans-
form ~A(q, t) = c<a) .J=I q ~U(q, t) of the above relationship between ~A and~(;'
(Eq. 8.114) and (2) the term A' (x, t) ~a(t) + a(t) ~U" (x, t) = 0 of orders of the equi-
librium condition ~S'(x, t) = 0, whose Fourier transform is
R q A(q, t) ~a(t)- a(t) q 2 ~U(q, t) = o.
8.6. Evolution and pattern formation 639

The above relationship between !:l.U(q, t) and A(q, t) and the Fourier transform of the
evolution of A give
~ c<a) c ~
!:l.A(q, t) = -- - A(q, t) !:l.t
a(t)
or, by taking the limit as !:l.t --+ 0 of the finite difference equation for A scaled by !:l.t,
a~ c0)c ~
-A(q, t) = - - - A(q, t).
at a(t)

Hence, Aat time t is

A(q, t) = A(q, 0) ( r c<a) c


exp - Jo a(s) ds
)
so that

= A(x, 0) exp (- !ao


t c<a) c )
A(x, t) - - ds
a(s)

gives the fluctuating part of the stiffness at an arbitrary time t ~ 0. The above equation
holds for almost all material samples. If A(x, 0) is a homogeneous random field with
spectral density s(q, 0), the spectral density of A(x, t) is

s(q, t) = s(q, 0) exp (-2 !aot -


c<a) c
- ds
a(s)
)

because the spectral density and covariance functions

[_ _ J = E [A(x,
E A(x, t) A(y, t)
_ 0) J exp (-2 Jot
_ 0) A(y, c<a) c
a(s) ds
)

are Fourier pairs. The relationship between the spectral densities of A at the initial and an
arbitrary timet ~ 0 shows that A(x; t) is a scaled version of A(x; 0). Therefore, there is
no evolution. •

Example 8.49: Consider a heterogeneous isotropic material in an open bounded


subset D of JR2 specified by Eqs. 8.102 to 8.106. The material is subjected to an
average uniform extension and distortion per unit length with increments
!:l..an(t) = c M and .6.a12(t) = c IL M
during the time interval [t, t + !:l..t], respectively, where c, IL > 0 are some con-
stants (Fig. 8.19).
The change in time of the Fourier transform Aof the material parameter A
is given by the finite difference equation
!:l..A(q, t) = a(q; j.L, A.,~) A(q, t) M,
where a(q; j.L, A.,~) depends on the wave number q, the loading parameter j.L, and
the average material properties A. and ~. The solution A(q, t) of the above finite
difference equation shows that the amplitudes of the constituent waves of A(x, 0)
change in time at different rates depending on the current value of a(q; j.L, A.,~).
Hence, the material experiences evolution, which may emerge in a pattern. ¢
640 Chapter 8. Stochastic Systems and Deterministic Input

Figure 8.19. Specimen loading

Proof: The increments of the displacements in D during a time interval [t, t + l!!.t] are

t!.U1 (x, t)= !!!.au x1 + f!.a12 x2 + t: f!.U1 (x, t) and


f!.U2(x, t) = t: t!.U2(x, t)

forD = (0, 1) x (0, 1) so that

M'u (x, t) = !!!.au + t: t!.U1,1 (x, t),


f!.i22(x, t) = t: t!.U2,2(x, t), and
f!.a12 t: - _
i!!.i12(X, t) = -2- + -2 (t!.U1 ,2(x, t) + t!.U2 ,1(x, t))

represent the corresponding strain increments at x E Din [t, t + t!.t] (Eq. 8.106).
The relationship between stress and strain increments is (Eq. 8.105)

t!.Su (x, t) ]
[ f!.S22(x, t)
f!.Sn(x, t)

][
A(x,t)+28(x,t) A(x, t) 0 1!!.1u(x,t)]
= [ A(x, t) A(x, t) + 2 S(x, t) 0 f!.i22(X, t)
0 0 8 (x, t) f!.i12(X, t)

so that the quasi -static equilibrium conditions of order t: are

cA., 1 (x, t) + 2 8, 1 ex, t)) t!.a 11 + (A.(t) + 2 ~(t)) t!.U1,u (x, t) + A.(t) t!.U2,21 (x, t)
+ S,2(x, t) f!.a1212 + ~(t)(f!.Ut,22(x, t) + f!.U2,12(x, t))/2 = 0 and
S, 1(x, t) t!.au/2 + W) (f!.U1,21 (x, t)/2 + t!.U2,u (x, t))
+ S,2(x, t) !!!.au +A.(t) f!.U1,12(x, t) + (A.(t) + 2~(t)) t!.U2,22(x, t) = 0.
8.6. Evolution and pattern formation 641

The Fourier transform of these equations (Eq. 8.109),

[
q[ (A.(t) + 2W)) + qi ~(t)/2 q1 q2 (A.(t) + ~(t)/2) ] [ ~q 1 (q, t) J
q1 q2 (A.(t) + W)/2) qi (A.(t) + 2 ~(t)) + q[ ~(t)/2 ~U2(q, t)
= R [ q1 ~an (A(q, tl + 2 S(q, t)) + q2 ~~12 S(q, t)/2 J
q1 ~a12 S(q, t)/2 + q2 ~an S(q, t) '

constitutes an algebraic system of linear equations for (~Ut. ~U2) with solution

[ ~U1 (q,t) J Ac~t [ b11 b12 J[ A(q,t) J


~U2 (q, t) =an a 22 - a[2 b21 b22 S(q, t) '

where

an= qf (A.+ 2~) + qh/2, a12 = a21 = q1 q2 (A.+ ~/2),


a22 = qi (A.+ 2~) + qf ~/2, bn = q1 a22- q2 a12,
b21 = a11 qz- a12 q1, b12 = (2ql + q2 t-t/2) a22- q1 t-ta12 and
b22 =an q1 t-t- a12 (2qi + q2 t-t/2).

The coefficients aij and bij are functions of the wave numbers q and the current average
material parameters A. and ~. The arguments in the above equations are suppressed for
simplicity.
The strain function,

* *
[rmn<x.t)rnm<x,t)] 1/2"" 1 [ (~an )2 + ( ~a12 )2]1/2
- ..fi

e - -
+ .../2[(~an) 2 + (~a 12 ) 21112 [~au (~UI,I(x, t)- ~U2,2(x, t))

+ ~a12 (M\2(x, t) + ~U2,1 (x, t))],


is used to define the evolution of the material parameters (Eq. 8.104). The resulting equa-
tions describing the evolution of the material parameters for the average uniform extension
and distortion, ~au = c ~t and ~a12 = c t-t ~t, are

~A.= c(J..) c (1 + t-t 2) 112 ~t/..fi and ~~ = c<n c t-t (1 + t-t2) 112 ~tj./2
for terms of order 1, and

~A(x, t) = i:O·) [~U1,1 (x, t) - ~U2,2(x, t) + t-t (~UI,z(x, t) + ~U2,1 (x, t))],
~S(x, t) = c<~) [~U1,1 (x, t)- f!J.U2,2(x, t) + t-t (f!J.Ut,z(x, t) + f!J.Uz,l (x, t))],
for terms of order e, where c(J..) = c(J..) /J2 (1 + t-t2) and c(~) = c(;) /J2 (1 + t-t2). The
material parameters A. and ~ vary linearly in time at different rates if t-t f= 1. The evolution
of the fluctuating part of material parameters is less simple. The Fourier transforms of the
last two equations,

~A(q, t) = J=I c(J..) c [(q 1 + t-tqz) f!J.U1 (q, t) + (-q2 + t-tql) ~U2(q, t)],
~S(q, t) = J=Ic<~) c [(ql + t-tq2) f!J.U1(q, t) + (-q2 + t-tqi) ~U2(q, t)],
642 Chapter 8. Stochastic Systems and Deterministic Input

and the linear system of equations giving the Fourier transform of U as a function of the
Fourier transforms of A and 8 can be used to determine the evolution in time of A and S.
Because the increments of A and Sin [t, t + ll.t] are related by

A(q, t) = (c(l.} jc(lg)) S(q, t), t:::: 0,


it is sufficient to calculate the evolution of, for example, A, and use the above equation to
find S. We note that the initial conditions need to be consistent with scale factors relating
A and S so that A(q, 0) = (c(l.) jc<~l) S(q, 0).
The evolution equation for A can be given in the form

ll.A(q, t) = -c<l.l c ll.t f(q; JJ-, A,~) A(q, t)


or, by taking the limit as ll.t ---+ 0 of the above finite difference equation for A after division
with ll.t,

!!__A(q; t) = -c<l.l c ll.t f(q; JJ-, A, 0 A(q; t) = a(q; JJ-, A,~) A(q; t),
at
where
(ql + qz JJ-) (bn + b12 c(l.) jc<O) + (-qz + q1 JJ-) (bzl + bzz c(l.) ;c<~l)
f(q; !J-, A,~) = 2
an azz- a12
The differential equation aAjat = a(q; JJ-, A,~) A shows that the constitutive waves of
A(q; 0) change in time at different rates since a(q; JJ-, A, 0 is a function of q. That A
evolves in time can also be seen from the solution,

A(q, t) = A(q, 0) exp (f a(q; JJ-, A(s), ~(s)) ds) ,


of this equation. •

Example 8.50: The previous example demonstrates that deterministic heteroge-


neous materials can evolve. We extend these considerations to random hetero-
geneous materials, that is, we assume that the material parameters A and 8 are
homogeneous random fields. We use the spectral densities of the random fields
A and 8 to assess whether the material parameters exhibit evolution. A similar
approach was used in Example 8.48.
Suppose that A is a homogeneous random field with mean zero and spectral
density
1 n
s(q, 0) = 2 2::>.?
[o(q + q<kl) + 8(q- q<kl)]
k=l

at the initial time t = 0, where ak > 0 are some positive constants and q (k) E JR2
denote wave numbers. The field has the spectral representation
n
A(x, 0) = L ak [ Ak cos(q(k) · x) + Bk sin(q(k) · x)],
k=l
8.6. Evolution and pattern formation 643

where Ak and Bk are uncorrelated random variables with mean zero and variance
1, and q<k) · x = qfk) x1 + qik) x2 (Section 3.9.4.3).
The spectral density, covariance function, and spectral representation of the
random field A(x, t) at an arbitrary time t 2: 0 are

1 n
s(q, t) = 2L Yk(t) [o(q + q<kl) + 8(q- q<kl) J,
k=l
n
c(~, t) = E [ A(x, t) A(x + ~. t) J = L Yk(t) cos(q(k) · ~), and
k=l
n
A(x, t) = L Yk(t) 1 1 2 [ Ak cos(q(k) · x) + Bk sin(q(k) · x) J,
k=l

respectively, where Yk(t) = a'f exp [ 2 f~ a(q<kl; tt. A.(s), g(s)) ds J. Hence, (1)
A(x, t) is a weakly homogeneous random field with mean zero at each time t 2: 0
and (2) the spectral representations of the random fields A(x, 0) and A(x, t),
t > 0, involve waves of the same frequencies but of different amplitudes.
Suppose that the spectral density of A(x, t) has only four waves with q (1) =
(2, 3), q< 2l = (3, 6), q< 3l = (5, 7), and q<4 l = (10, 0.01) and ak = 1, k =
1, ... , 4. Figure 8.20 shows the evolution in time of a sample path of A for tt = 1,

t= 1.8
5 5

0 0

-5 -5
2 2
2 2
0 0 0 0
x2 xl x2

t= 3.6 t= 6.0
5 5

0 0

-5 -5
2 2
2 2
0 0 0 0
x2 xl x2

Figure 8.20. A sample path of A


644 Chapter 8. Stochastic Systems and Deterministic Input

c = 1, cO.) = 1, c<O = 1, A.(O) = 10, and ~(0) = 5. As time progresses, a single


wave becomes dominant. Figure 8.21 shows snapshots of the covariance function

t= 0.0 t= 1.8
5 5

0 0

-5 -5
2 2
2 2
-2 -2 -2 -2
~2
t= 3.6 t= 6.0
5 5

0 0

-5 -5
2 2
2 2

-2 -2 -2 -2
~2 ~2

Figure 8.21. Evolution of the covariance function of A


of A at the initial and later times. These graphs show that material evolves and a
pattern emerges as time increases. The resulting pattern is simpler than the initial
material structure. <>
Note: The Fourier transform of a sample A(x, 0, w), wE Q, of A(x, 0) is

,
A(q, 0, w) = -(2n)2
2- Ln ak [(Ak(w) + .J=l Bk(w) ) 8(q + q(k))
k=l

+ ( Ak(w)- .J=l Bk(w)) 8(q- q(k)) J.


The spectral density of the random field A(x, 0) and the Fourier transforms of the samples
of this field have energy only at frequencies (q(k), -q<kl).
The solution in the previous example gives

A(q, t, w) = A(q, 0, w) exp (f a(q; J-t, .A(s), ~(s)) ds)

for the Fourier transform of each sample w E Q of the material parameters. Because
A(q; 0, w) has energy only at (q(k), -q<kl), so does A(q, t, w). Hence, A(q, t, w) and
A(q, 0, w) have the same frequencies, but the energy associated with these frequencies
8.6. Evolution and pattern formation 645

differ for the two random fields because the ordinates of the spectral density of Aare mod-
ulated differently in time depending on the values of a(q; f-t, .l..(s), Hs)). The difference in
the evolution of the energy associated with each frequency is caused by the dependence of
the function a(q; f-t, .1.., 0 on q.
The sample in Fig. 8.20 is generated from a Gaussian random field. The model is
inadequate for representing material stiffness because stiffness is positive and bounded and
the Gaussian fields do not have these properties. We use the Gaussian model in Fig. 8.20
only for graphical illustration. ~

8.6.2 Crystal plasticity


Generally, two scales are used to characterize the evolution of the atomic
lattice of polycrystalline metals subjected to time dependent actions: ( 1) a micro-
scopic scale, defined by the grain (crystal) size and (2) a macroscopic scale, de-
fined by a sufficiently large assembly of crystals. At macroscopic scale, polycrys-
talline metals can be described mathematically by continuum models. Linking
hypotheses are needed to relate material behavior at the microscopic and macro-
scopic scales. Averaging techniques can be used to go upscale from crystals to
continuum properties. Several hypotheses have been proposed to go downscale
from continuum to grains.
This section provides basic information on crystal plasticity, presents briefly
a numerical method for analyzing the evolution of stresses, strains, and other state
variables in a polycrystal, and demonstrates by examples that the orientation of
the crystal atomic lattice can evolve in time and that this evolution may emerge in
a pattern. Our discussion is based on [50].

8.6.2.1 Planar single crystal


It is assumed that the plastic deformation in a crystal can occur only on a
collection of parallel planes, called slip planes. A slip system is defined by a
collection of slip planes and the slip directions in these planes. The orientation of
the slip planes is determined by the configuration of the atomic lattice. We limit
our discussion to crystals in JR2 . More general results can be found in [145].
Two reference frames are considered in crystal plasticity, the laboratory or
sample frame (e1, ez), a global system of coordinates attached to the polycrystal,
and the lattice frame (e 1, ez), a local frame attached to the lattice of each crystal
(Fig. 8.22). The angle <I> between e 1 and e1 varies in time during deformation and
is a stochastic process for a grain sample if the applied action is a random function
of time. If only one crystal is considered, the sample and lattice frames may be
taken identical. The geometry of a slip system a is completely defined by the unit
vectors (sa, na).

Note: The unit vectors (sa, na) are related to the lattice frame (e 1 , e2) by
646 Chapter 8. Stochastic Systems and Deterministic Input

slip system

Figure 8.22. Reference frames for crystal plasticity

where ea denotes the angle between the direction sa of the slip system a and e1. The
Schmid tensor in the lattice frame is ta = sa 0 na and has the entries ti) = nj, sf
i, j = 1, 2, where sf
= cos(()a), s2
= sin(()a), n'l = - sin(()a), and = cos(()a). n2
Note that the trace and the determinant of matrix ta are zero. We decompose ta into its
symmetric and skew parts denoted by

a a [ - cos(()a) sin(()a) (cos2 (ea)- sin 2 (ea))/2 ]


p = sym(t ) = (cos 2 (ea)- sin2 (ea))/2 cos(()a) sin(ea)

and
q a =s kew(t a )= [ 0_ 112 1/2 ]
0 '
respectively. The tensor qa is constant only for planar crystals. A

Let S;j, i, j = 1, 2, be stresses acting on a crystal defined in the lattice


frame. The pressure, the deviatoric stress tensor, and the resolved shear stress on
a slip system a are, respectively,

= 21 (Su + Szz) = 21 Spp, Sij* = SiJ- Sm OiJ, andSa = Sij a a


Sm s; ni.
(8.115)

An alternative form of the resolved shear stress is

(8.116)

sa = -2 cos(Ba) sin(Ba) Z1 + (cos2 (Ba)- sin2 (Ba)) Zz, (8.117)

where Z1 = (St1 - S22 )/2 = (Su - Szz)/2 and Zz = Si2 = S12.


8.6. Evolution and pattern formation 647

Note: The deviatoric stress tensor depends on two parameters, Z1 = (Si1 - S?,2)/2 =
(S11 - Szz)/2 and Zz = Si2 = S12. The resolved shear stress corresponding to the
pressure tensor Sm 8ij is sa = Sm 8ij sf
nj = Sm (sf
nj) = 0 since sa and na are
orthogonal, so that

by the definition of the Schmid tensor and the symmetry of the deviatoric tensor, where
8iJ = 1 for i = j and 8ij = 0 for i =I= j.
The relationships between the components (JP, qa) and (Pa, Qa) of the Schmid
tensors in the lattice and laboratory frames are

(8.118)

where A 11 = Azz = cos(<l>) and A12 = -Az1 = - sin(<l>) are the entries of the (2, 2)
matrix A. If the deformation of the crystal varies randomly in time, the angle of rotation <l>
and the components (Pa, Qa) of the Schmid tensor are stochastic processes. &

The distortion of a crystal subjected to a stress field SiJ consists of elas-


tic and plastic strains. Elasticity is neglected in our discussion to simplify the
mathematical model [145]. We need to specify a plastic law for the crystal char-
acterizing the evolution in time of its slip system. A plastic law is defined by a
yield condition, a flow rule, and an evolution rule.

1. The yield condition, defining a criterion for the occurrence of plastic flow, is

If ISa I < s~. no slip,


If !Sal= S~, slip can occur, (8.119)

where S~r can be interpreted as a strength parameter for the slip system a and
represents a material property.

Note: sg. may or may not depend on the shearing rates tfJ of the slip systems f3 in the
crystal. Because the resolved shear stress sa depends linearly on the stress parameters Z1
and Zz (Eq. 8.117), the yield condition is aa Z1 +ba Zz = ±sg., where aa and ba depend
on the orientation of the slip system a.
If sgr does not depend on tfJ, the yield condition is said to be rate indepen-
dent. Otherwise, we deal with a visco-plastic model. A typical visco-plastic model is
sa
sg. = f (... ' t fJ' ...)' where f is a specified function of the shearing rates t ~ and
sa is a state variable whose time evolution is given by the differential equation sa =
g(Sa, ... , 1tfJ j, ...), for example,

(8.120)

where hafJ are specified functions. This evolution equation is analogous to a modified
version of the Voce macroscopic model [50]. &
648 Chapter 8. Stochastic Systems and Deterministic Input

2. The flow rule, giving the velocity gradient L P, is

(8.121)

where ta denotes the shearing rate of the slip system a.


3. The evolution rule, giving the rotation rate W of the atomic lattice, is

(8.122)

where R is a matrix defining the rigid body rotation and A = k RT denotes the
lattice spin.

Note: The sum of G = La ta pa and W - A coincides with the plastic velocity gradient
LP given by Eq. 8.121.
The evolution of the geometry and structure of a crystal experiencing plastic defor-
mation is illustrated in Fig. 8.23. Let Do be a subset of Rd (d = 2 in our discussion)

~
~

Figure 8.23. Crystal evolution under plastic deformation

giving the geometry of the crystal at the initial time t = 0, and let Dt c Rd be its de-
formed shape at a later time t > 0. Denote by x 1--+ y(x, t) the mapping from Do to
Dt. If the imposed deformation and/or crystal properties are random, Dt is a random
subset in Rd. It is assumed that the Jacobian of the mapping from Do to Dt is not zero
so that the mapping can be inverted. The physical meaning of this assumption is that a
non-zero volume in Do cannot be mapped into a zero volume in Dt or that there is no
subset of Do of non-zero Lebesgue measure whose image in Dt has measure zero. The
gradients F(x, t) = {ayi(x, t)jaxj) of the mapping x 1--+ y(x, t) in the laboratory frame
are given by F = R F P, where R and F P correspond to rigid body rotation and lat-
tice distortion, respectively (Fig. 8.23). The velocity gradient L = {ayi(x, t)jayj) in
the current coordinates can be calculated from L = F p-l by the definition ofF and
8.6. Evolution and pattern formation 649

ay;/ayi = (ay;/axk) caxkfayj), so that


L = R RT + R if RT, where i,P = ftP (FP)- 1,
since F = R FP + R ftP, F- 1 = (FP)- 1 R- 1, and R- 1 = RT. The velocity gradient
depends on only three parameters as we neglect the elastic deformation of the crystal, so
that there is no volume change (Lpp = 0). In analysis it is assumed that the deformation
is homogeneous at the grain scale, that is, Lis constant in each grain [50].
It is common to separate the velocity gradient in two components

G = sym ( R if RT) and W = RRT + skew ( R i./ RT)


giving the rates of deformation and spin, respectively. The equalities in Eqs. 8.121 and
8.122follow from the above equation with the notation A = R RT since LP = R if RT =
Lata ta (Eq. 8.121). ~
The evolution of the lattice distortion and orientation in a crystal is con-
trolled by the input velocity gradients, G and W, which need to be specified. The
analysis involves two steps. First, the strain rates ta have to be calculated. There
are three distinct cases, crystal with one (rank deficient), two (kinematically de-
termined), and three or more (redundant) slip systems [145]. Second, Eqs. 8.121
and 8.122 can be used to determine the evolution of strains and lattice orientation,
respectively. For example, Eq. 8.122 gives

.
cl> = A12 = W12 - "21"''a
~r (8.123)
a

for the evolution of the lattice orientation since q f1 = q!f.2 = 0 and qf2 = -q!f.1 =
1/2 (Fig. 8.22).
Note: If the crystal has a single slip system, there is no solution.
If the crystal has two slip systems, the unknown shearing rates ta, a = 1, 2, can
be calculated from the definition of G in Eq. 8.121. Then Eq. 8.122 yields a differential
equation for the angle <l> between the current lattice frame and the laboratory frame.
If the crystal has three or more slip systems, the constitutive laws of the slip systems
are needed to find ta. The symmetric part of the velocity rate can be given in the form

G = Lta pa = LPa cta;sa) sa= [Leta ;sa) pa (pal] S* = M(S*) S*,


a a a
(8.124)

because the stress field is assumed to be uniform in the crystal so that sa


= (pa)T S*,
where S* is a vector whose coordinates are the distinct entries of the deviatoric tensor Sij .
Generally, numerical algorithms are needed to calculateS* from Eq. 8.124 for an imposed
deformation rate G. The resolved shear stress corresponding to S* (Eq. 8.116),
sa = Pij Sij = (pa)T S* = ±Scr = ±Scr Ita !f'lm, (8.125)

can be used to determine ta with Scr obtained, for example, from Eq. 8.120. The evolution
of the lattice orientation follows from Eq. 8.122. ~
650 Chapter 8. Stochastic Systems and Deterministic Input

Example 8.51: Consider an infinite planar crystal with two slip systems. The
crystal is subjected to plastic deformation with velocity gradient L. Let <l>(t)
denote the angle between e 1 and e1 at timet (Fig. 8.22), where e1 is aligned with
e
the first slip system, that is, 1 = s 1' and f) is the angle between the two slip
systems of the crystal. The solution of the differential equation

. L12(t)-L21(t) sec(2fJ)
<l>(t) = 2 - 2 [(LJ2(t)

+ L21 (t)) cos(2 <l>(t)) + 2 Lu (t) sin(2 <l>(t))],

gives the lattice orientation at timet > 0 [12, 145]. <>


Proof: The velocity gradient L depends on only three parameters since L 11 + L 22 = 0, so
that its symmetric part has the entries

and

Because the grain has two slip systems, the slip shearing rates ta, a = 1, 2, can be deter-
mined from the above equations as functions of G11 and G22. These expressions of ta,
a = 1, 2, and Eq. 8.123 yield a differential equation for ci> with known coefficients and
input.
We also note that the velocity gradient can be decomposed in three parts correspond-
ing to pure compression, pure shear, and spin modes,

(8.126)

with intensities Ln, (L12 + L21)/2, and (L12- L21)/2, respectively. •

Example 8.52: Suppose that the crystal in Example 8.51 is subjected to a pure
time-invariant compression, that is, L12(t) = L21(t) = 0 and Ln is a con-
stant. The lattice orientation angle <1> (Fig. 8.22) satisfies the differential equation
<i>(t) = -Ln sec(2fJ) sin(2 <l>(t)) and is

<l>(t) = coC1 [cot(<l>(O)) e-2L 11 t sec(2e)],

where <1> (0) denotes the initial state. The fixed points of the differential equation
=
for <1> are ¢o n (n /2), where n is an integer.
Figure 8.24 shows trajectories of the lattice orientation angle <1> for L 11 = 1,
several initial conditions <1>(0) E (-n/2, n/2), and an angle fJ = nj6 between
8.6. Evolution and pattern formation 651

2.-------~------~------~-------,

-2L-------~------~------~------~
0 0.5 1 1.5 2
time

Figure 8.24. Evolution of lattice orientation for pure time-invariant compression

the two slip systems of the crystal. The trajectories of the orientation angle ap-
proach the stable equilibrium point ¢o = 0 as t --+ oo so that the density of <l>(t)
approaches a delta Dirac function centered at zero as time increases, irrespective
of the initial state <1>(0). Hence, atomic lattice evolves in time and its evolution
emerges in a pattern [ 11, 119].
The trajectories of the lattice orientation are less simple if L 11 is time de-
pendent. For example, suppose that L 12 = L21 = 0 and that L 11 is a stationary
Gaussian process with mean zero, variance one, and one-sided spectral density
g(v) = 2 {Jj[n (v 2 + {3 2 )], v :::: 0. Figure 8.25 shows a sample of L 11 for f3 = 0.1

2.5,-----,-~----~----~----~----.

10 20 30 40 50
time

Figure 8.25. Evolution of lattice orientation for pure time-variant random traction

and the corresponding sample of <I> for <1>(0) = 0.5 and(} = n j6. The sample
path of the lattice orientation <I> oscillates between the fixed points 0 and n /2 be-
652 Chapter 8. Stochastic Systems and Deterministic Input

cause the equilibrium position depends on the sign of L 11, which changes in time
[11, 119]. In this case the lattice orientation <I> evolves but no pattern emerges
from its evolution. <>

Note: Let z be a real-valued function defining the state of a physical system specified by
the differential equation z(t) = h(z(t), 9), where h is a known function and (J denotes
a vector of parameters. The solutions zo(fJ) with the property h(zo(fJ), 9) = 0 are the
singular or fixed points of the above differential equation. Hence, a singular point is also
an equilibrium point since z(t) is zero at zo(fJ). A fixed point is said to be stable if the
solution z starting in a small vicinity of zo(fJ) remains near zo(fJ) as t -+ oo. If a fixed
point does not have this property, it is said to be unstable.
Several methods can be used to determine whether a fixed point is or is not stable.
For example, the linearization technique ([104], Chapter 8, [119]) considers the differential
equation

i(t) = h(zo(fJ), 9) + h'(zo(fJ), 9) (z(t)- zo(fJ))


= h'(zo(fJ), 9) (z(t)- zo(fJ))
derived from the defining equation of z by linearization about zo(fJ). The solution of z
this differential equation approximates z in a small vicinity of zo(fJ) and can be used to
evaluate the stability of zo(fJ). We note that y(t) = z(t)- zo(fJ) = y(O) eh'(zo(fJ),fJ) t since
y(t) = h' (zo(fJ), 9) y(t). If the real part of h 1 (zo(fJ), 9) is strictly negative, y approaches
zero as time increases or limt---+oo z(t) = zo(fJ) so that zo(fJ) is stable. 4

8.6.2.2 Polycrystals
A polycrystal is an ensemble of a large number of crystals interacting with
each other. Consider a polycrystal occupying an open bounded subset D of IR d
with boundary aD subjected to a specified traction and velocity on aDs and a Dv,
respectively, where aDs U aDv = aD and aDs n aDv = 0. The objective
is to find the velocity and stress fields in D. The stress S and velocity V in
the polycrystal viewed as a continuum must satisfy (1) equilibrium conditions,
Sij,j + b; = 0, where b; denote body forces, (2) kinematics constraints, L =
grad(V) = G + W, (3) a constitutive law, for example, S* = KG, where K
represents the material stiffness that can be calculated from the crystal properties,
and (4) an incompressibility requirement, div(V) = V;,; = 0.
The material stiffness K depends in a complex way on the properties of the
constitutive grains and their interaction. Generally, K is determined by the finite
element method based on:
(1) The relationships Gg = Cg Sg and Sg = Kg Gg derived from Eqs. 8.121 and
8.122, where Cg and Kg denote the grain compliance and stiffness, respectively,
Gg is the symmetric part of the plastic velocity gradient in a grain, and S g denotes
the deviatoric part of the strain tensor in a grain. The parameters C g and Kg
depend on the resolved shear stress and the Schmid tensor in the grain.
8.6. Evolution and pattern formation 653

(2) Linking hypotheses relating the microscopic and macroscopic solutions. The
Taylor and Sachs hypotheses are frequently used.
- The Taylor hypothesis states that (a) the macroscopic and microscopic ve-
locity fields coincide, that is, G = Gg and (b) the macroscopic deviatoric
stress tensor S is equal to the average of the deviatoric tensor S g over the
crystal volume, that is, S =< Sg >,where< · >denotes volume average.
Accordingly, we have the macroscopic constitutive law S = K G, where
K =< Kg >. The Taylor hypothesis provides an upper bound analogous
to the Voigt average in Eq. 8.94.

- The Sachs hypothesis states that (a) the macroscopic and microscopic de-
viators coincide, that is, S = Sg and (b) the macroscopic velocity is equal
to the average velocity over the crystal volume, that is, G =< Gg >.
Accordingly, we have the macroscopic constitutive law G = C S, where
c =< cg >.
Example 8.53: The calculation of the evolution of the stress, strain, and lattice
orientation in a polycrystal involves three steps. First, the constituent grains of the
polycrystal need to be characterized. Second, the finite element method can be
used to calculate the evolution of the polycrystal state variables. Third, statistical
methods need to be used to estimate global features of the polycrystal behavior.
Euler angles measurements of the atomic lattice orientation in a material
specimen can be used to develop probabilistic models for the geometrical proper-
ties of the grains in a polycrystal. For example, Figs. 8.26 and 8.27 show three-
dimensional and contour plots of <I> 1 for aluminum AL 7075, where <I> 1 is one
of the three Euler angles defining the atomic lattice orientation. The plots corre-

Experiment Simulation

4 .. ·
<1>,
2

0
550
550 550

0 0 0 0
y (IJlll) x(flm) y (IJlll) X (IJlll)

Figure 8.26. Experimental data of <I> 1 and a sample of its translation random field
model. Three-dimensional representation
654 Chapter 8. Stochastic Systems and Deterministic Input

Experiment Simulation

X (J.U11) X (J.U11)

Figure 8.27. Experimental data of <I> 1 and a sample of its translation random field
model. Contour lines

0.4

0.35 Histogram of <1> 1


0.3

0.25

0.2

0.15

0.1 0.2
r-,

0.05
h
2
nnnn 4
"
6
0

0 50 100 150 200


cjll distance

Figure 8.28. Histogram and correlation function of <I> 1

spond to experimental data and a sample of a translation random field calibrated


to these data (Section 5.3.3.1). Figure 8.28 shows a histogram and an estimate of
the correlation function of <I> 1 based on the experimental data in the previous two
figures, where the correlation function of <I> 1 is assumed to be isotropic. The esti-
mates in this figure have been used to calibrate the translation random field model
for <I> 1 [ 11]. These models and statistics on the grain geometry provide sufficient
information to capture the essential attributes of a polycrystal.
The finite element method has been used in [188] to find the evolution of
a polycrystal consisting of 4096 grains under the assumptions that (1) the grains
have a simple geometry, that is, brick grains with the same dimensions, (2) the
Euler angles are uniformly distributed random variables in each grain, and (3) the
Euler angles in different grains are independent of each other. Numerical results
8. 7. Stochastic stability 655

have been obtained for a polycrystal subjected to a 5% initial prestrain and subse-
quently to strain cycles of amplitude 3%. Figure 8.29 shows the number of grains

1600

1400

1200

1000 20 cycles
aoo 50 cycles
soo 100 cycles
400

200

~00 300 400 500 600 700 900 1000

stress level, s

Figure 8.29. Stress distribution in a polycrystal sample

in the polycrystal with stresses exceeding a stress levels after 1, 20, 50, and 100
strain cycles. The plot provides information on the global state of stress in the
polycrystal and its evolution in time. <>

8.7 Stochastic stability


Let X be an JRd -valued stochastic process representing the state of a non-
linear system defined by

X(t) = g(X(t), (X+ W(t)), t::: 0, (8.127)

for an initial state X (0) = xo, where a E JRd' is a parameter, W E JRd' denotes a
white noise process, and d, d' :=:: 1 are integers. It is assumed that Eq. 8.127 has a
stationary solution denoted by X s· If Xs = 0 satisfies Eq. 8.127, we say that this
equation admits a trivial stationary solution. For example, X s = 0 is a solution of
Eq. 8.127 if this equation is homogeneous.
A main objective of stochastic stability analysis is the determination of sub-
sets of JRd' including all values of a for which solutions X of Eq. 8.127 starting
in a small vicinity of a stationary solution X s converge to X s as time increases
indefinitely. If this behavior is observed for almost all samples of X, then X s is
said to be stable a.s. Otherwise, X s is unstable and a bifurcation, that is, a qual-
itative change of this solution, may occur. Stability analysis is relevant for many
aerospace, mechanical, and structural systems that exhibit nonlinear behavior [9].
Generally, the functional form of the density of X s and the Lyapunov expo-
nent are used to assess whether X s is or is not stable [6, 24, 102, 130, 191]. The
656 Chapter 8. Stochastic Systems and Deterministic Input

functional form of the density of X s can change significantly as a is perturbed


and such a change may be associated with a phase transition in a physical sys-
tem (Section 9.4.3 in this book, [102]). We define the Lyapunov exponent and
present three examples illustrating the stability analysis of the stationary solution
for stochastic differential equations driven by Gaussian and Poisson white noise.
Let X= X- Xs be an JRd -valued process giving the difference between a
solution X of Eq. 8.127 and a stationary solution X s of this equation. If X is an
ergodic process and d ::: 2, then In II X(t) II ft converges a.s. as t ~ oo to two
or more values, referred to as Lyapunov exponents, depending on the initial state
X(O). For stability analysis we are interested in the largest Lyapunov exponent,
called the upper or the top Lyapunov exponent, since this exponent controls the
rate of growth of X. Let ALE denote the top Lyapunov exponent.

Lyapunov exponent. If

1 -
ALE= lim -In
t---+00 t
II X(t) II< 0 and > 0, (8.128)

then the stationary solution X sis stable and unstable a.s., respectively. If ALE =
0, then X s may or may not be stable [9, 10].
Note: Generally, the calculation of the Lyapunov exponent involves two steps. First, a
differential equation needs to be developed for the process X = X- Xs. An approxi-
mate version of this equation obtained by linearization about Xs is used in the analysis.
Accordingly, X is the solution of the differential equation
d d' d
dXi(t) = L aij Xj(t)dt + L L bt Xj(t)dBk(t), i = 1, ... , d,
j=l k=l j=l
where Bk> k = 1, ... , d 1 , are Brownian motions independent of each other and the co-
efficients aij, bt may depend on Xs. Hence, X is a diffusion process conditional on
Xs. Second, the Ito formula can be used to show that S(t) = X(t)/ II X(t) II is a dif-
fusion process on the unit sphere in JRd. It can be proved under some mild conditions
that the top Lyapunov exponent is given by ALE = E[q(S(t))] = q(s) p,(ds), where J
q(s) = ((ii - bil) s, s) + (1/2) tr(b i/ ), ii = {aij}, 'L1=l bfj Xj are the entries of
b, and p,O denotes the marginal probability density function of S assumed to be ergodic
[112]. Theoretical considerations on Lyapunov exponents and their use in applications can
be found in [6, 7, 13]. A
Example 8.54: Let X(t), t ::: 0, be an JR2 -valued diffusion process defined by
dXt(t) =aXt(t)dt+a (Xt(t)dBt(t)+Xz(t)dBz(t))
{
dXz(t) =bXt(t)dt +a (Xz(t)dBt(t)- Xt(t)dBz(t)),
where a, b, a are some constants and Bt, Bz denote independent Brownian mo-
tion processes. The trivial stationary solution of the above stochastic differen-
tial equation is stable a.s. if (a + b) Io(fJ) + (a - b) It (fJ) < 0, where fJ =
(a- b)/(2a 2 ) and / 0 , It. denote modified Bessel functions of the first kind. 0
8. 7. Stochastic stability 657

Proof: The process X = X - X s satisfies the same stochastic differential equation as


X since the coefficients of the defining equation for X depend linearly on this process.
Moreover, X = X because we examine the stability of the trivial stationary solution. The
ItO formula in Section 4.6.2 applied to the function g(X(t)) =In II X(t) 11 2 gives

dR(t) = (a St (t) 2 + b S2(t)2) dt +a dBt (t),


where R(t) = In II X(t) II and S(t) = X(t)/ II X(t) 11. Hence, the function q(s) in
the above definition of ALE is q (s) = a + bsr si. We also note that the process S is
S = (cos(CI>), sin(CI>)) in polar coordinates, where Cl>(t) = tan- 1 (.X2(t)/Xt (t)).
The Ito formula applied to the mapping X(t) ~ Cl>(t) = tan- 1 (X2 (t)/X 1(t))
yields
dCI>(t) = (b- a) cos(CI>(t)) sin(CI>(t)) dt- a dB2(t),

so that Cl> is a diffusion process. The stationary density f(rp) of Cl> satisfies the Fokker-
Planck equation

d a2 d2 f(rp)
- ((b- a) cos(rp) sin(rp) f(rp)) = - - -2-,
drp 2 drp

and has the expression f(rp) = c exp ( 2,8 cos(rp) 2), where c > 0 is a constant such that
J~ rr f (rp) d rp = 1 and f satisfies the boundary condition f (0) = f (2rr). Elementary
calculations give c = 1I (2 ef3 rr lo (,8)).
The Lyapunov exponent ALE = E[q(S(CI>(t)))] = J~rr q(S(rp) f(rp) drp has the
expression
1
ALE = 2 Io(,B) [(a+ b) Io(,B) +(a- b) It (,8)].

Hence, the trivial stationary solution is stable a.s. if the stated condition is satisfied. For
example, the stationary trivial solution is stable a.s. for a = -1, b = -2, and a = 1 but
unstable for a= 1, b = 2, and a= 1. •

Two additional simple examples are presented. In both examples X is a


real-valued process defined by the differential equation

dX (t) = ({3 X (t) -X (t)m) dt +X (t) dS(t), t 2:: 0, (8.129)

where m 2:: 2 is an integer, a, a E IR, {3 = a + a 2 /2, and S denotes an


elementary semimartingale noise. The two examples correspond to S = a B
and S = C, where B is a Brownian motion, C = 'L:;,?
Yk denotes a com-
pound Poisson process, N is a Poisson process with intensity A > 0, and Y k
are iid random variables with distribution F. An alternative representation of C
is C(t) = f~+ fJR y M(ds, dy), where M is a Poisson random measure so that
M(ds, dy) gives the number of jumps ofC in (s, s + ds] x (y, y + dy] and has
the expectation E[M(ds,dy)] = AdsdF(y). Since the state X has dimension
d = 1, there is a single Lyapunov exponent so that this exponent is ALE.
658 Chapter 8. Stochastic Systems and Deterministic Input

The stationary solution Xs satisfies Eq. 8.129 so that X = X - Xs is the


solution of

dX(t) = [f3X(t)- (X(t)m- Xs(t)m)] dt +X(t)dS(t) (8.130)

that is approximated by

dX(t) = ~(t) X(t) dt + X(t) dS(t), (8.131)

where ~(t) = f3- m Xs(t)m-l and X(t)m- Xs(t)m ::::: m Xs(t)m-l X(t). The
approximation is acceptable since we consider only small initial deviations from
the stationary solution. If ALE is strictly negative, then almost all samples of the
solution of Eq. 8.131 converge to zero as t -+ oo, and we say that the stationary
solution of Eq. 8.129 is stable a.s.

Example 8.55: If S =a Bin Eq. 8.129, then

A _ { a for Xs = 0,
LE- (1- m)a for Xs =I= 0,

so that the trivial and non-trivial solutions ofEq. 8.129 are stable a.s. for a < 0 and
a > 0, respectively. Figure 8.30 shows five samples of the solution of Eq. 8.129
with S = a B, a = ±1, a = 1, m = 3, and X(O) = 0.1. For a = -1, the
samples of X approach the trivial solution X s = 0 as time increases. The samples

0.35
a=-1
0.3

0.25

0.2

0.15

0.1

0.05

0
0 5 10 15 20 5 10 15 20

time time

Figure 8.30. Samples of X in Eq. 8.129 with S = a B for a = ±1, a 1,


m = 3, and X(O) = 0.1

of X for a = 1 seem to converge as time increases to a stationary process with a


mean that is approximately equal to ,J7J = ,.j3!2, that is, the location of the right
well of the potential of X. The samples of X remain in this well for a relatively
long time because the noise intensity a is relatively small. The properties of the
samples of X in Fig. 8.30 are consistent with the predicted behavior of X based
on ALE·
8. 7. Stochastic stability 659

The density of Xs is ls(x) = c x<2 afcr 2 -l) exp [ -2xm-l /((m - 1) a 2 )] for
a > 0, where c > 0 is a constant. This density is the solution of the Fokker-Planck
equation for X s (Section 7.3 .1.4). For a < 0 the density of X s is concentrated at
zero, that is, ls(X) = o(x). We note that Is for a > 0 has different functional
forms for a < acr and a > acr. where acr = a 2 /2. Figure 8.31 shows the density
Is form = 3, a = 1, and several values of a. The thin and heavy lines correspond
to values of a smaller and larger than acr·

1.8

2.5
X

Figure 8.31. Densities of the stationary solution for m = 3 and a = 1

There are two values of a marking qualitative changes of the stationary


solution Xs, a = 0 and a = a 2 /2. These values of a are called bifurcation
points. The stable stationary solution X s is trivial and non-trivial for a < 0 and
a > 0, respectively. At a = a 2 /2 the density Is of the stable stationary solution
undergoes a notable change. The mode of Is is at zero for a E (0, a 2 /2) but
moves at a strictly positive value for a E (a 2 /2, oo). Hence, the values a = 0
and a = a 2 /2 define a transition from the trivial solution to non-trivial stationary
solutions and a transition between two different functional forms of the stationary
density Is, respectively. ¢

Proof: The solution of Eq. 8.131 with S = a B for a deterministic function ~(-) is a
geometric Brownian (Section 4.7.1.1). Accordingly, we have

X(t) = X(O) exp (lot ~(s) ds- ~2 t +a B(t)) = X(O) et Y(t),


660 Chapter 8. Stochastic Systems and Deterministic Input

where Y(t) = (lft) Jci ~(s) ds- a 2 f2 +(aft) B(t) or


y { a+ (aft) B(t), for Xs = 0 and
(t) = a-m (lft) JJ
Xs(u)m- 1 du +(aft) B(t), for Xs i= 0.
Note that (1) B(t)ft--+ 0 a.s. as t--+ oo ([150], p. 496) and (2) the moment of order
m - 1 of the non-trivial stationary solution is E[Xs(t)m- 1 ] =a. The Ito formula applied
to the function ln(Xs (t) 2 ) gives

d ln ( Xs(t) 2 ) = 2 (<13- Xs(t)m- 1 - a 2 f2) dt +a dB(t))

since Xs satisfies the original nonlinear equation defining X, or

so that E[Xs(t)m- 1 ] = 13 - a 2 f2 =a because the left side of the above equation is zero.
If Xs is ergodic, then (lft) JJ
Xs(u)m- 1 du converges a.s. to E[Xs(t)m- 1 ] =
a as t --+ oo so that lilllt--+oo Y(t) = a a.s. for Xs = 0 and lilllt--+oo Y(t) = a -
m E[Xs(t)m- 1 ] = (1 - m) a for Xs i= 0. These asymptotic values of Y coincide with the
mean of this process at any t 2: 0. Since X(t) = X(O)etY(t), the trivial and non-trivial
stationary solutions are stable a.s. if a < 0 and (1 - m) a < 0, respectively.
We now examine the stability of the stationary solution Xs by calculating the Lya-
punov exponent ALE in Eq. 8.128. Consider the approximate definition of X in Eq. 8.131
with S =a B. The differential equation for R(t) = ln(X(t) 2) is (Ito's formula)

R(t)- R(O) =2 f (13-m Xs(u)m- 1 - a 2 f2) du + 2a dB(t) = 2t Y(t),

or ln(IX(t)lfiX(O)I) = t Y(t), so that In IX(t)lft converges a.s. to E[Y(t)] as t--+ oo.


In summary, Xs = 0 is a stationary solution for Eq. 8.129 with S = a B that is
stable a.s. for a < 0. If a > 0, the trivial solution is not stable but there is a stable non-
trivial solution. These conclusions follow from both the analytical solution of Eq. 8.129
and the Lyapunov exponent. In this example, it was not necessary to calculate the Lyapunov
exponent ALE for stability analysis since Eq. 8.131 has a known analytical solution. The
Lyapunov exponent has been calculated for illustration. However, in most applications, we
do not know X so that the Lyapunov exponent needs to be calculated. •

Example 8.56: If S = C in Eq. 8.129 and 1 + Y1 > 0 a.s., then [81]

ALE =a+ a 2 /2 +A. E[ln(1 + Y1)].


The value of a marking the transition from X s = 0 to a non-trivial stationary
solution is a*(A.) = -a 2 /2- A. E[ln(1 + Y1)].
Figure 8.32 shows the boundary a*(A.) between the stability regions for the
trivial and non-trivial stationary solutions for A. in the range [5, 40], a = 1, Y 1 "'
U( -a, a), and a =a ../3fi.. For these parameter values, we have A. E[Yf] = a 2
so that C is equal in the second moment sense to a B in the previous example.
The plot in Fig. 8.32 suggests that a*(A.) approaches zero as A. increases, that is,
8. 7. Stochastic stability 661

0.14,----~----.----,---~--,------,------,

Estimate of a· (I.) from


10,000,000 samples
0.08

0.06

0.04

0.02

Figure 8.32. Boundary a*(A.) between stability regions for the trivial and non-
trivial stationary solutions

it converges to the bifurcation point between the trivial and non-trivial stationary
solutions in Example 8.55. The result is not surprising since C becomes a version
of the driving noise a Bas A.-+ oo.
Figure 8.33 shows samples of X with X(O) = 0.5, a= 1, Yt "'U(-a, a),
a = a ,f3!3, A. = 30, and two values of the parameter a, a = 0.015 < a* (30)
and a = 1 > a*(30). Consistently with the stability regions in Fig. 8.32, the

2.5 4
a= 0.015 a= 1.0
2
3
1.5
2

0.5
0
0
~1
50
u 100 150 200 50 100 150 200
time time

Figure 8.33. Samples of X for two values of a

sample of X corresponding to a = 0.015 converges to the trivial solution in time.


The sample of X for a = 1 seems to approach a stationary solution X s f. 0 as
time increases. <>
Proof: We first show that the solution of Eq. 8.131 with S = C, referred to as the geometric
662 Chapter 8. Stochastic Systems and Deterministic Input

compound Poisson process by analogy with the geometric Brownian motion, is

X(t) = X(O)exp (fo~ ~(s-)ds + C*(t)) = X(O)iZU),

where C*(t) = :L~? ln(1 + Yk) and Z(t) = (1/t) JJ ~(s-)ds + C*(t)jt. The Ito
formula, applied to the function

(t, x) r+ g(t, x) = exp (fo~ ~(u-) du + x)

whose arguments (t, x) correspond to the~-valued process (X1, X2) defined by dX1 (t) =
dt and dX2(t) = dC*(t), gives

g(t,C*(t))-g(O,C*(O)) = t
lo+
~(s-)g(s,C*(s-))ds
+ L [g(s, C*(s-) + ~C*(s))- g(s, C*(s-))]
O<s:::;t

which, following multiplication with X(O), becomes

X(t)- X(O) =lot ~(s-)X(s-)ds + L X(s-) [et.C*(s) -1].


O+ O<s::Ot

The differential form of the above equation is

dX(t) = W-) X(t-) dt + X(t-) dC(t)


since et.C*(s) - 1 is Yk for some k ~ 1 and zero if C has a jump Yk or no jump at times.
Hence, the above expression of X satisfies Eq. 8.131 with S =C.
If limt-+oo Z(t) < 0 a.s., the stationary solution Xs is stable a.s. as t --+ oo. For
the trivial stationary solution, Z(t) = f3 + C*(t)/t so that limt-+oo Z(t) =ex+ a 2 j2 +
A.E[ln(1 + Y1)] since C*(t)jt--+ A. E[ln(1 + Y1)] a.s. as t--+ oo ([150], Theorem 3.3.2,
p. 189). Hence, the trivial solution is stable a.s. for ex < ex*(A.) = -(a 2/2 +A. E[ln(1 +
Y1)]). If a non-trivial stationary solution Xs is an ergodic process, then

Z(t) = f3- m!
t
t
lo+
Xs(u-)m- 1 du + C*(t)/t

converges a.s. to f3- mE [ Xs(t-)m- 1J +A. E[ln(1 + Y1)] as t --+ oo. This solution
is stable a.s. for ex > ex*(A.) because E[Xs(t- )m- 1] = f3 +A. E[ln(1 + Y1)]. To obtain

L:
E[Xs(t)m- 1], note that the Ito formula applied to the function ln(Xs(t) 2) gives

d ln(Xs(t) 2) =2 (!3- Xs(t-)m- 1) dt + 2 ln(1 + y) M(dt, dy)

since Xs satisfies the original nonlinear equation for X so that we have


8.8. Localization phenomenon 663

by averaging. Because Xs is a stationary process, the left side of the above equality is zero
so that E[Xs (t- )m- 1] = {3 +A E[ln(l + Y1)].
As in the previous example, the Lyapunov exponent is not needed since we know
the exact solution, but it is calculated for illustration. The Lyapunov exponent can be found

L:
by Ito's formula applied to R(t) = ln(X(t) 2 ). This formula gives

dR(t) = 2!;(t-)dt +2 log(l + y)M(dt,dy).

so that IX(t)l = IX(O)I exp (JJ s(u-) du + C*(t)) and


ALE= lim
t-+oo t
! ( )(0 s(u-) du + C*(t)) ={3-m E[X8 (t)m- 1] +A. E[ln(l + Y1)]

is the Lyapunov exponent. •

8.8 Localization phenomenon


Consider a system occupying a subset D of ~d . A state variable of the
system that has much larger values in one or more small subsets D k of D than in
D \ UkDk is said to experience localization.
Localization is frequently observed in applications, and is used in engineer-
ing designs. For example, failure of a chain is localized at its weakest link because
the links have slightly different capacities and are subjected to the same action. To
increase automobile safety, the passenger cabin is designed to be much stiffer than
the car ends so that the deformation caused by front and rear collisions will be lo-
calized at these ends. Material heterogeneity and geometry are common causes
for localization in continuum mechanics. Material and/or geometry randomness
is not an essential condition for localization. Uncertainty adds a new dimension
to localization, and requires the use of probabilistic tools.
Two examples are used to illustrate the localization phenomenon. The first
example shows that earthquake induced strains localize because of soil hetero-
geneity. Strain localization can cause a phase change in a soil deposit. The second
example shows that, while the modes of vibration of a perfectly periodic contin-
uous beam extend over all beam spans, the corresponding modes of a beam with
nearly equal spans are concentrated or localized on only a few spans.

8.8.1 Soilliquefaction
Soils are heterogeneous materials whose properties exhibit significant spa-
tial fluctuations. Under external actions, soil deposits can experience a phase
transition. For example, sands and other cohesionless soils can change abruptly
their consistency from solid to near fluid if subjected to cyclic actions. This phase
transition phenomenon, called liquefaction, occurs if the pore water pressure ex-
ceeds a critical value. Soil liquefaction can have catastrophic effects on building
664 Chapter 8. Stochastic Systems and Deterministic Input

and bridges during seismic events. Generally, relatively small volumes of a soil
deposit liquefy during an earthquake. The pockets of liquefied soil appear to have
random size and location [144].
Our objective is to (1) assess the potential for liquefaction of cohesionless
soil deposits subjected to earthquakes and (2) examine the spatial distribution and
the size of liquefied soil pockets. The potential for liquefaction is measured by the
pore water pressure. It is assumed that liquefaction occurs at a point x in a soil
deposit if the pore water pressure at x exceeds a critical value.
The cone tip resistance Z1 and the soil classification index Zz largely con-
trol the occurrence and the extent of liquefaction in a soil deposit. Because Z 1 and
Zz exhibit a significant spatial variation, it has been proposed to model them by
random functions [144]. Let Z = (Z 1. Zz) be an JR(2 -valued random field defined
in an open bounded subset D of JR 3 . Because Z1 and Zz are bounded, Z cannot be
approximated by a Gaussian random field. We model Z by a translation random
field defined by

Zk(X) = Fk- 1 o <l>(Gk(t)(x)) = 8k(Gk(X)), k = 1, 2,


where Fk denotes the marginal distribution of Z k. <I> is the distribution of N (0, 1),
and G = ( G 1, Gz) : D x Q -+ JR2 is a homogeneous Gaussian random field
defined on a probability space (Q, :F, P). The Gaussian field G has mean zero,
variance one, and covariance functions Pkt(~) = E[Gk(x) Gt(X + ~)], x, x + ~ E
D. The coordinates of the random field Z can follow any marginal distribution.
The correlation functions E[Zk(X) Zt(X + ~)] = E[gk(Gk(X)) 8t(Gt(x +~))]of
Z are determined completely by Fk and Pkt (Section 5.3.3.1).
Suppose that a soil deposit in D with properties given by the random field
Z is subjected to a seismic ground acceleration record x(t), t E [0, r], where r
denotes the earthquake duration. A finite element code, DYNAFLOW, has been
used in [144] to calculate the evolution in time of the pore water pressure and
other soil states in D under a seismic action x(t). Let W(x, w), x E D, be a
sample of the pore water pressure at the end of a seismic event corresponding to a
sample Z(x, w), x E D, of Z. We can estimate moments and other probabilistic
properties of the pore water pressure random field from a collection of indepen-
dent samples of W. Information on the finite element code DYNAFLOW can be
found in [146].

Example 8.57: Consider a vertical section through a soil deposit D and let D v =
(0, a) x (0, b) c JR2 be a rectangle included in this section with the sides of length
a and b oriented in the horizontal and vertical directions, respectively (Fig. 8.34).
The soil deposit in D is subjected to the deterministic ground acceleration x(t)
in Fig. 8.34. Data analysis shows that the random fields Z 1 and Zz follow beta
distributions. The parameters of these distributions and the correlation functions
of Z are given in [144].
Figure 8.35 shows contour lines for six samples of the pore water pres-
sure random field W corresponding to six independent samples of Z. The darker
8.8. Localization phenomenon 665

Free field Water table

1
bou\ruy condffion• / /v 15

1.6m

b= 12m

~~----a-=-6~0m------~l
+~·-~=-==--------t-<1'

Figure 8.34. Soil deposit and seismic ground acceleration

Sample 3

Figure 8.35. Six samples of the pore water pressure random field W

shades in the figure are for larger values of W, and indicate an increased poten-
tial for liquefaction. The samples of W are consistent with observed liquefaction
patterns [144]. The size and the location of the pockets of liquefaction show a
remarkable sample to sample variation. <>

Note: An extensive data set has been used in [144] to estimate the probability law of the
random field Z. If it is assumed that the soil deposit in D is homogeneous, then liquefaction
either does not occur or occurs everywhere in D, in contrast with field observations. I.
666 Chapter 8. Stochastic Systems and Deterministic Input

8.8.2 Mode localization


Consider the dynamic system in Fig. 8.36 consisting of large number of
oscillators connected by springs with the same stiffness k. Suppose first that the
oscillators have identical natural frequencies. Then the modes of vibration extend
throughout the entire system. Suppose now that the natural frequencies of the
oscillators differ slightly and that k = 0. The modes and the modal frequencies of
the system coincide with the vibrations of individual oscillators and their natural
frequencies, respectively. If k > 0 is small, the system modes of vibration are
localized around individual oscillators and the corresponding modal frequencies
are approximately equal to the frequencies of these oscillators. For example, if
the system is subjected to a harmonic action of frequency equal to the natural
frequency of an oscillator, this oscillator will vibrate with a large amplitude. Its
nearest neighbors will be driven off resonance and, since the coupling between
the oscillators is weak, will vibrate with much smaller amplitudes. Similarly,

Figure 8.36. A system of simple oscillators

their neighbors will vibrate with even smaller amplitudes, so that the modes of
vibration of the dynamic system in Fig. 8.36 with slightly different oscillators and
weak couplings will be localized.
It is shown that properties of products of random matrices and the Lyapunov
exponents introduced in Section 8.7 can be used to find the rate of decay of the
localized modes.
Let Xn = An Xn-1. n = 1, 2, ... , where A1, A2, ... are independent
copies of a (d, d) matrix A with real-valued entries and X o = xo E !Rd. Note that
the sequence Xn = Cn Xo, n = 1, 2, ... , is an 1Rd-valued Markov chain, where
Cn =An An-!··· A1.
We present some properties of matrix C n and define the upper Lyapunov
exponent for the sequence X n.

If E [(log II A1 lim)+]< oo, the sequence E[1n II Cn llmJ is subadditive, that


is,
E [In II Cn+p lim] :::: E [In II Cn llmJ + E [In II C p lim] (8.132)

for n, p 2::: 1 integers, where II a II m= sup{ II ax II: x E !Rd, II x II= 1} denotes


a matrix norm.

Proof: Because E [ (ln I A1 lim)+] < oo, the expectation E [In I A1 lim] exists but may
8.8. Localization phenomenon 667

be -oo. Let n, p ~ 1 be integers and note that

E [1n II Cn+p lim]= E [In II An+p · · · Ar lim]


.:::: E [In II An+p · · · An+1 lim]+ E [In II An··· Ar lim]
=E[lniiCp llm]+E[lniiCn llmJ

by properties of the norm, the definition of Cn, and the assumption that the matrices Ak
are independent and have the same distribution. The subadditivity of In II Cn II implies the
convergence

1 . 1
-E [In II Cn lim]- mf -E [In II Cq lim], as n--+ oo,
n q:;::l q

in lR U {-oo} ([24], Example 5.2, p. 13). •

If E [ (ln II A 1 II m) +] < oo and the independent identically distributed matrices


Ar, A2, ... have inverses, the limit

. 1 1
ALE= hm -In
n-+oo n
II Cn lim= lim -ln
n-+oo n
II An··· Ar lim (8.133)

holds a.s. and is called the upper or the top Lyapunov exponent ([24], Theo-
rem4.1,p.ll).

Note: The result in Eq. 8.133 is a statement of a theorem by Furstenberg and Keston [69].
Recall that we have used the upper Lyapunov exponent in Section 8.7 to assess the
sample stability of the stationary solutions of differential equations driven by Gaussian and
Poisson white noise. We have seen that these solutions are stable a.s. if ALE < 0. The Lya-
punov exponent in Eq. 8.133 can also be used for the stability analysis of discrete systems
with random properties, for example, a system with state Xn, n = 0, 1, ... , defined by
the recurrence relationship Xn = An X n-1· n = 1, 2, ... , and a specified initial state X 0·
Such relationships also result from discrete time approximations of stochastic differential
equations of the type in Eq. 8.131.
That the Lyapunov exponent gives the rate of growth of the solution is consistent
with our intuition. For example, let ai, i = 1, 2, ... , be deterministic (d, d) matrices and
define the sequence Xn(xo) = an "n-1 · · · a1 xo = Cn xo, n = 1, 2, ... , starting from a
deterministic initial state xo. Let a~n and a~ax denote the smallest and largest eigenvalues
of ci Cn. We have (Section 8.3.2.1)

a~n II xo 11 2 .: : 11 Xn(xo) 11 2 = x6 ci CnXO :S aitax II xo 11 2 •


so that
1n (amin) + 1n II xo II::: ln II Xn(Xo) II:S ln (amax) + ln II xo II
and liffin__..oo In (O"min) /n :S A(xo) :S limn-+oo ln (amax) jn, where A(xo) is the limit of
ln II Xn(xo) II /n as n --+ oo. If the limits of ln(amin)/n and ln(amax)/n as n --+ oo are
bounded, so is A(Xo) for every xo and II Xn(Xo) II~ enA.(Xo) for large values of n so that
A(xo) gives the rate of growth of Xn (xo) with n. A
668 Chapter 8. Stochastic Systems and Deterministic Input

We have seen in Section 8.7 that systems with dimension d ::: 2 have more
than one Lyapunov exponent. For stability analysis the upper Lyapunov exponent
is relevant. However, for localization we need the smallest positive Lyapunov
exponent since it gives the least rate of decay of the state X n, and this rate provides
information on the extent of the localization. We denote this Lyapunov exponent
by XLE· Additional information on the use of the Lyapunov exponent for mode
localization analysis and methods for calculating XLE in some special cases can
be found in [198] and [122] (Chapter 9).

Example 8.58: Consider a perfectly periodic beam with uniformly distributed


mass, n spans of equal length l, and torsional springs with the same stiffness s
at its supports (Fig. 8.37). Suppose that the spans of this beam are perturbed so

s~--s~~~s~~~~-s~~--s~~--s~~
~~~~~~«---+}- ~J~·---,r~---J~#
Figure 8.37. A perfectly periodic continuous beam

that the original spans become Li = l + Ri, i = 1, ... , n, where Ri are indepen-
dent copies of a random variable R such that IR I « l /2 a. s. The resulting beam
is referred to as a randomly disordered or nearly periodic beam.
Figures 8.38 and 8.39 show the first five modes of vibration of the perfectly
periodic and randomly disordered beams, respectively, for l = 1, s = 20, n =
100, R ~ U( -0.03, 0.03), and unit mass per unit of length. The modes of the
periodic beam extend over all spans. On the other hand, the modes of the randomly
disordered beam are localized on only a few spans. <>
Note: The modal shapes in Figs. 8.38 and 8.39 were calculated by using the finite element
method. Two beam elements were used in each span of the perfectly periodic and randomly
disordered beams, so that both beams were modeled by discrete systems with 2 n + 1
degrees of freedom. An algorithm written in MATLAB was used for the modal analysis.
The example is taken from [198].
Let ei be the rotation at support i = 1, ... , n + 1 and let e<il = (E>i, E>i+l)·
Then e<il = T(i) eU-l), where T(i) denotes a transition matrix that can be obtained
by classical methods of structural analysis. For the perfectly periodic beam the transition
matrices are deterministic and identical. On the other hand, for the randomly disordered
beam the matrices T(i) are independent and identically distributed. The recurrence formula
for 8 gives
e<k> = T<k> ... T<2> e<l)
so that II e<k) 11=11 T(k) · · · T<2>eO> II and
II e<k) II= (11 T(k) · · · T<2>e<t> I II e<l) 1111) II e<l) 11~11 T(k) · · · T<2> lim II 9(1) II,
8.8. Localization phenomenon 669

o.
1
c~A ~AA~AA
AAA AA AAA AAAAAAAA AAAA AAAAAA
olvvv~vvvv~vvvv~VVVVV,HH~V!V~
A~AAAA~AAAJ Mode 1
-0.1 l . . . - - - ' - - - - - - ' - - - - ' - - - - ' - - - - ' - - - ' - - - ' - - ' - - - ' - - - - - '
0.121 41 61 1 1 1 121 141 1 1 1 1 2 1

0 ~WW~W~WW~~w~m~w~~M~ 2
-0.1 ...__.___..___..___..J....__ _ ! . __ __,__ _,__....~.- _ _ . __ _.J

0.121 41 61 1 1 1 121 141 161 181 2 1

O~~~w~~~W~~w~,M~3
-0.1 '---'----'----'----'---'-----'---'----'---'---'
0.121 41 61 1 1 21 1 1 1 1 2 1

orw~m~wv~~~m~ww~~M~ 4
-0.1 ' - - - ' - - - " ' - - - " ' - - - - ' - - - ' - - - - - ' - - . . . . 1 . - - - - ' - - - - ' - - - - '
0.1141 1 1 101 121 141 1 1 1 1 2 1

-0.1
o~~~~w~~w~~v~~w~~
'---'---.l...---'----'---'----'----'---'---'---'
5
1 21 41 61 81 101 121 141 161 181 201
Degrees of freedom

Figure 8.38. First five modes of a perfectly periodic beam

_] : :±: : : : : : Mode 1
••' 21 41 61 81 1Q1 121 141 ~·
I

_.:1 : : " ~ : : : ~Mode2


'!'
-O:EF :I' "~ :'i' : : : : TMode3
0.5' 21 41 81 81 1Q1 121 141 161

o.sl 21 41 121 'I' ~~


o: : : = : ~Mode4
,. :· ·:· E ·:· ·:1~....
=• :

~~I ~· ~
1 21 41 61 81 101 121 141 161 181 201
Degrees of freedom

Figure 8.39. First five modes of a randomly disordered beam

so that
II a<k) II~ iLE (k-l) II a<l) II. as k-+ 00,
670 Chapter 8. Stochastic Systems and Deterministic Input

where ~LE is the smallest positive Lyapunov exponent corresponding to the random matri-
ces T(k). The Lyapunov exponent ~LE in the above equation gives the rate of decay of the
rotation at supports, and is referred to as a localization factor. &

8.9 Problems
8.1: Consider a modified version of the partial differential equation in Exam-
ple 8.1 obtained by replacing its right side with an integrable function p(x),
x E D. Find the local solution of the resulting partial differential equation. Extend
your results to the case in which p(x) is a random field.

8.2: Extend the local solution of the partial differential equation in Problem 8.1
to the case in which D is a random set. Assume that the point x at which the local
solution is determined is in D a.s.

8.3: Derive approximate expressions for the second moment properties of X in


Eq. 8.2 based on a second order Taylor expansion (Eq. 8.9). Use your findings to
improve the second moment properties of X calculated in Example 8.4.

8.4: Find the perturbation solution of X in Eq. 8.2 withY= a+ s R1 + s 2 R2 +


O(s 3), where sis a small parameter and the random matrices R1 and R2 have
mean zero and are independent with each other. Develop expressions for the sec-
ond moment properties of X.

8.5: Solve the problem in Example 8.8 for Y being a shifted exponential random
variable with the density f(y) =A e-J... (y-a), y ::: a, where A > 0 and a are some
constants.

8.6: Find the first two moments of X defined by Y X = q by both versions of the
iteration method, where Y = a + R, R "' U (-a, a), and 0 < a < a.

8.7: Let S = (Sn, S12 = S21, S22) be a Gaussian vector, where SiJ are random
stresses. Suppose that the coordinates of Shave the mean values (100, -60, 300),
are equally correlated with correlation coefficient p = 0.7, and have the same
coefficient of variation v = 0.3. Find the first two moments of the eigenvalues
and eigenvectors of the stress tensor {Sij}.

8.8: Consider an infinite thin plate with a through crack of length 2 a. The plate
is subjected to uniform far field stresses, a tension Q 1 perpendicular to the crack
and a shear Q2. Suppose that (Ql, Q2) is a Gaussian vector with mean (10, 4),
coefficients of variation (0.3, 0.3), and correlation coefficients p = 0, 0.7, and
0.9. Let 8 be the direction of crack extension. Find the distribution of 8 based
on the relationship Q1/ Q2 = [1 - 3 cos(8)]/ sin(8).
8.9: Find second order Taylor approximations for the eigenvalues and eigenvec-
tors of a square random matrix A.
8.9. Problems 671

8.10: Consider an (n, n) symmetric matrix A= a+ R. Denote by Ai and Ai the


eigenvalues of the deterministic and random parts, a and R, of A, respectively.
Develop bounds on the probability P(Ai :S x).

8.11: Find the second moment properties of the eigenvalues and eigenvectors of
the random matrix in Example 8.12 by the iteration method.

8.12: Let G be an ~n -valued Gaussian variable with mean IL and covariance func-
tion y. Find the mean zero-crossing rate v(A.) of the stochastic process V (A.) =
),n + G1 A.n-l + · · · + Gn-1 A.+ Gn. Use your results to calculate v(A.) forn = 3,
IL = 0, and y equal to the identity matrix.

8.13: Let U be defined by dU(t) = -Z U(t) dt + dB(t), t ~ 0, where Z > 0


a.s., U (0) = 0, and B denotes a Brownian motion. Develop a differential equation
for the sensitivity factor V(t) = aU(t)jaZ and calculate the second moment
properties of (U, V) for Z "' U(l, 3).

8.14: Let U be the solution of Eq. 8.45 with D given by Eq. 8.53, where 8 is
a small parameter. Find expressions for the mean of U to the order 8 2 and for
the correlation function of U to the order 8 4 . Calculate the confidence interval
(E[U] - Std[U], E[U] + Std[U]) on U to the order 8 2 .

8.15: I:.et U be the solution of d 4 U(x)jdx 4 + N(x, U(x)) = q, x E (0, l), with
the boundary conditions U(O) = 0, dU(O)jdx = 0, U(l) = 0, andd 2 U(l)jdx 2 =
0, where q is constant and N(x, U(x)) is a nonlinear random function. De-
rive equations for the second moments of U(x) for N(x, U(x)) = A(x) U(x) +
B(x) U (x ) 3 , where A and B are square integrable random fields that are uncorre-
lated with each other.

8.16: Repeat the analysis in Problem 8.15 by using a finite difference approxima-
tion for the partial differential equation defining U.

8.17: Let U be the solution of dU(t)fdt = -Z U(t), t ~ 0, with the initial


condition U(O) = ~ > 0, where Z "' U(l, 2). Find U by the Neumann series
method, establish conditions for the convergence of this series representation of U,
and calculate the first two moments and the marginal distribution of the solution
based on the first three terms of the Neumann series.

8.18: Consider the random eigenvalue problem Z <l> 11 (x) +A <l>(x) = 0, x E


(0, 1), with the boundary conditions <1>(0) = <1>(1) = 0, where Z is a random
variable with positive mean. Find an upper bound on the lowest eigenvalue in this
problem.

8.19: Apply the Monte Carlo simulation method to estimate the second moment
properties and the distributions of the first three eigenvalues and eigenfunctions
of the random eigenvalue problem in Problem 8.18. Assume that Z is uniformly
distributed in (1, 2).
672 Chapter 8. Stochastic Systems and Detenninistic Input

8.20: Assume that conductivity a in Eq. 8. 77 is a translation random field defined


by :E(x) = a + exp(~ G(x)), x E JR3 , where G is a zero-mean Gaussian field
with covariance function E[G(x) G(x')] = exp( -A II x- x' II), and a,~. A > 0
are some constants. Find the effective conductivity for this random heterogeneous
material.

8.21: Generate additional samples of the process in Example 8.56 for different
values of A and a, and determine whether the observed sample properties are
consistent with the results in Fig. 8.32.
Chapter 9

Stochastic Systems and Input

9.1 Introduction
In most stochastic problems both the system properties and the input char-
acteristics are random. However, the degree of uncertainty in the input and the
system can be very different. For example, it is common in earthquake engineer-
ing to assume that the structural system is deterministic because of the very large
uncertainty in the seismic ground acceleration. Under this simplifying assumption
we can evaluate the seismic performance of structural systems by the methods in
Chapter 7. Similarly, developments in Chapters 6 or 8 can be applied to stochastic
problems characterized by a negligible uncertainty in both the system and input
or the input alone, respectively.
This chapter considers stochastic problems in which both the system and
input uncertainty has to be modeled. Methods for analyzing this type of stochastic
problems are discussed primarily in Section 9.2. The subsequent sections in this
chapter present applications from mechanics, physics, environment and ecology,
wave propagation in random media, and seismology. Some brief considerations
on model selection conclude the chapter. We start with a summary of the main
sources of uncertainty for general stochastic problems.

• Randomness in system and input properties. For example, the atomic lat-
tice orientation in metals exhibits random spatial variation (Section 8.6.2.2),
identically designed light bulbs and other physical systems have different
service life, properties of geological formations are characterized by notable
spatial fluctuations, the details of the next earthquake at a site in California
are not known.

• Finite information. For example, the functional form and the parameters of
most mathematical models used in finance, physics, mechanics, and other
fields of applied science and engineering need to be estimated from finite

673
674 Chapter 9. Stochastic Systems and Input

records so that they are uncertain and the degree of uncertainty depends on
the available sample size.
• Limited understanding and model simplification. For example, the un-
derlying principles of a phenomenon may not be well understood and/or
may be too complex for modeling and analysis so that simplified repre-
sentations need to be considered. The prediction of the outcome of a coin
tossing experiment is possible in principle because the coin can be viewed
as a rigid body subjected to an initial condition. However, the high sen-
sitivity of the outcome of this experiment to initial conditions, properties
of the landing surface, and many other parameters render the mathematical
modeling of this experiment impractical if not impossible. We need to set-
tie for a global characterization of the coin tossing experiment giving the
probability of seeing the head or the tail in a toss.
This chapter examines stochastic problems defined by
D[X(x, t)] = Y(x, t), t 2: 0, x ED C JRq, (9.1)
where D is a subset of JRq , D can be an algebraic, integral, or differential operator
with random coefficients, Y(x, t) is an 1Rm-valued random function depending
on space x E D and time t ::=: 0 arguments, the )Rn -valued random function X
denotes the output, and m, n, q 2: 1 are some integers. The output X depends on
Y, D, and the initial/boundary conditions for Eq. 9.1, which can be deterministic
or random. It is difficult to give conditions for the existence and uniqueness of the
solution ofEq. 9.1 in this general setting. We will give such conditions for special
forms of Eq. 9.1. In many applications it is sufficient to find the solution X at a
finite number of points Xk E D. Then Eq. 9.1 becomes an equation giving the
evolution in time of an JRd -valued stochastic process X (t), t 2: 0, collecting the
functions X(t, Xk). Similar considerations have been used in Section 7.1. Most
of the results in this chapter relate to differential equations, that is, Din Eq. 9.1 is
a differential operator.
Our objective is to calculate second moment and other properties of X in
Eq. 9.1 supplied with deterministic or random initial and boundary conditions, or
of its discrete counterpart, the stochastic process X. There is no general method
that can deliver the probability law of X or X. The next section presents methods
for calculating probabilistic properties of X and X. Most of the methods in this
section can be viewed as extensions of developments in Chapters 6, 7, and 8. The
subsequent sections present applications from various fields of applied science
and engineering.

9.2 Methods of analysis


Let X be the solution of Eq. 9.1. If the coefficients of the operator Din
this equation are not functions of X, we say that D has state independent co-
efficients. Otherwise, D has state dependent coefficients. We will see that a
9.2. Methods of analysis 675

stochastic problem defined by an operator V with state dependent coefficients can


be recast into another problem defined by an operator V * with state independent
coefficients. The state independent coefficients of an operator and the input Y
may or may not be defined on the same probability space.

Example 9.1: Let X be the solution of dX (t) = - R X (t) dt + a dB(t), t 2: 0,


where R is a random variable that is independent of the Brownian motion B and a
is a constant. The operator 'D[X (t)] = (d + R dt)[X (t)] has state independent co-
efficients. If the coefficients of the differential equation for X satisfy a.s. the con-
ditions in Section 4.7.1.1, the above stochastic differential equation has a unique
strong solution and X is well defined. ~

Example 9.2: Let X be the solution of

m X(t) + c X(t) + x(A(t)) X(t) = Y(t), t:::: 0,

where m, c > 0 are some constants, Y is a Gaussian process, A is a process de-


fined by the evolution equation A(t) = a [k(A(t)) H (t)]fl, a, f3 > 0 are some
constants, H(t) = [X(t) 2 + (X(t)jv(t)) 2 ] 112 denotes the envelope of X, v(t) =
[X (A(t))/m] 112 , and x, k are strictly positive functions. The coefficients m, c, a, f3
in the operator'D = m (d 2 jdt 2 )+c (djdt)+ x(A(t)) and the differential equation
for A can be random variables.
The operator V involves the process A whose value at a time t 2: 0 depends
on the entire history of X up to this time so that V has state dependent coeffi-
cients. However, the coefficients of the operator V * defining the evolution of the
augmented vector (X, A) are state independent.~
Note: Consider a thin massless rectangular plate that is kept vertical by a continuous sup-
port at its top edge. The plate has a mass m attached at its bottom edge and an initial
central crack. The process X (t) gives the displacement of the mass m at time t 2: 0 under
a loading Y, applied at the mass location. The parameter c > 0 denotes the plate damping
coefficient. The process A(t) denotes the crack length at time t. The constants 01, f3 are
material parameters ([175], Section 7.5.2.2). &

Two classes of methods are available for finding properties of the solution
X ofEq. 9.1. The methods in the first class constitute direct extensions of some
of the results in Chapters 6 and 7, for example, the local solution for some par-
tial differential equation, the Monte Carlo simulation, the conditional analysis,
the state augmentation, and the Liouville methods in Sections 9.2.1, 9.2.2, 9.2.3,
9.2.4, and 9.2.5, respectively. These methods involve rather advanced probabilis-
tic tools and can deliver detailed properties of X. The methods in the second class
extend some of the results in Chapter 8, for example, the Taylor, perturbation, and
Neumann series, the Galerkin, the finite difference, and finite element methods
in Sections 9.2.6, 9.2.7, 9.2.8, and 9.3, respectively. Most of these methods in-
volve elementary probabilistic concepts and are particularly useful for finding the
second moment properties of X.
676 Chapter 9. Stochastic Systems and Input

9.2.1 Local solutions


We have seen in Sections 8.2 and 8.5.1 that the local solutions in Chapter 6
can be extended to solve a class of stochastic problems with stochastic operator
and deterministic input. The methods in Chapter 6 can also be generalized to find
local solutions for stochastic problems defined by a class of operators with the
functional form of the deterministic problems considered in this chapter but with
random coefficients, input, and initial/boundary conditions. The solution of this
class of stochastic problems involves two steps. First, the methods in Chapter 6
need to be applied to estimate the required local solution for a collection of sam-
ples of the operator 'D, the input Y, and the initial/boundary conditions. Second,
moments and other probabilistic properties can be estimated from the collection
oflocal solutions delivered by the first step. The following example illustrates this
generalization of the methods in Chapter 6 by using the random walk method to
solve locally a partial differential equation with random coefficients.

Example 9.3: Let U be the solution of the partial differential equation


a2U(x) a2U(x)
-~ + Zr - -Z2 XED= (0, 1) x (0, 1)
ax? axi - '

with the boundary conditions au jaxr = 0 on {0} x (0, 1), U = 0 on {1} x (0, 1),
aU jax2 = 0 on (0, 1) X {0}, and U = 3.5349xf+0.516lxr +3.0441 On (0, 1) X
{1}. The coefficient Zr and the input Z2 are independent random variables that
are uniformly distributed in (3 -a, 3 +a), 0 < a < 3, and (16- b, 16 +b),
0 < b < 16, respectively.
Let X(t; w) be an JR2 -valued diffusion process with coordinates

Xr(t,w) =Br(t)+Lr(t),
{
X2(t, w) = JZ1(w) (B2(t) + L2(t)),

where B; are some independent Brownian motions and L i are local time processes
(Section 6.2.3.1). Let (Z1, Z2)(w) be a sample of (Z1, Z2). Denote by T(w) =
inf{t > 0 : X(t, w) ¢ D} the first time X(·, w) corresponding to this sample
(Z1, Z2) and starting at x ED leaves D. The local solution of the above partial
differential equation at x E D for a sample (Z r, Z2)(w) of (Zr, Z2) is

U(x, w) =Ex [h(X(t, w))] + -Z22(w)


- Ex [T(w)],

where h gives the values of U on the boundaries {1} x (0, 1) and (0, 1) x {1}
of D. Figure 9.1 shows a histogram of U(x) for a = 0.5, b = 4, and x =
(0.25, 0.25) that is obtained from 100 samples of Z. The estimates of U (x, w) for
each w have been obtained from 500 samples of X ( ·, w) generated with a time step
!1t = 0.001. The estimated local solution at x = (0.25, 0.25) of the associated
deterministic problem (a = b = 0) is 4.1808, and nearly coincides with the mean
of the histogram in Fig. 9 .1. 0
9.2. Methods of analysis 677

2.5r------,-------.------,
Histogram of U((0.25,0.25))
Mean =4.2635
2 Std = 0.3205
(100 samples)

1.5

0.5

4 4.5 5

Figure 9.1. Histogram of U((0.25, 0.25))

Proof: The Ito formula gives (Section 6.2.3.3)

Ex [h(X(T(w), w))]- U(x, w)

~
= L c; E
x [!oT(w) ---'---'--'----'------'-
oU(X(s, w), w) (
dB; (s) + d L; (s)
)]
i=l 0 ox;

+ ! Ex [ {T(w) ( a2U(X(s; w), w) + 21 (w) a2U(X(s; w), w)) ds] '


2 lo axl ax2

where q = 1 and c2 = ../Z1 (w). The boundary conditions and the partial differential
equation defining U give the above local solution. •

9.2.2 Monte Carlo simulation


Monte Carlo simulation is the most general method for solving stochas-
tic problems. The method can be used if (1) algorithms are available for solv-
ing the deterministic problems corresponding to samples of 1J, Y, and the ini-
tial/boundary conditions and (2) the probability laws of all random variables and
functions defining Eq. 9.1 are known. The essential limitation of the Monte Carlo
simulation is the computation time, that can be excessive because the time needed
to calculate a sample of X and/or the number of samples needed to find some
properties of X can be very large. The Monte Carlo simulation solution of Eq. 9.1
with state independent coefficients involves the following steps.
678 Chapter 9. Stochastic Systems and Input

1. Generate ns samples of y, D, and the initial/boundary conditions for Eq. 9.1.


2. Calculate the solutions X(·,·, w), w = 1, ... , ns, ofEq. 9.1 for the samples
generated in the previous step.

3. Estimate moments and other properties of X from its samples.

Note: The samples of X can be used to estimate properties of this random function. For
example, r(x, t, y, s) = (1/ns) :L:S=l X(x, t; w) X(y, s; w)T is an estimate for the cor-
relation function E[X(x, t) X(y, s)T] of X . .l

If the operator D in Eq. 9.1 has state dependent coefficients, the following
Monte Carlo algorithm can be applied to find properties of X.

1. Generate ns samples of the input Y, the subset of coefficients of D that are


state independent, and the initial/boundary conditions for Eq. 9.1.

2. Calculate the joint evolution of X and the state dependent coefficients of D


for the samples generated in the previous step.

3. Calculate moments and other properties of X from its samples.

The above Monte Carlo simulation algorithms can be applied to generate


samples of solutions for the stochastic problems in Examples 9.1 and 9 .2, and es-
timate properties of these solutions. For example, samples of X in Example 9.1
can be obtained by the method in Section 4. 7.3 for each sample of R, which can be
produced by the algorithms in Section 5.2. This approach can be extended with-
out difficulty to the case in which R is a stochastic process. Samples of (X, A) in
Example 9.2 can be obtained from samples of Y and numerical algorithms for in-
tegrating nonlinear differential equations, for example, the Runge-Kutta algorithm
(Section 4.7.3.2). If the parameters (m, c) and/or (a, {3) are random, samples of
these parameters need also to be generated. A sample path of (X, A) would cor-
respond in this case to a sample of (m, c), (a, {3), andY.

9.2.3 Conditional analysis


Suppose that Din Eq. 9.1 has state independent coefficients and let e be a
vector collecting all random coefficients of D. The solution of Eq. 9.1 by condi-
tiona! analysis involves the following steps.

1. Calculate the required properties of X I e by the methods in Chapter 7.


2. Calculate the required properties of X by eliminating the condition on e.
Note: The method is particularly efficient if the dependence of the required properties of
X 1 9 is a known function of 9 and the dimension of 9 is relatively small. If these
conditions are not satisfied, the method can still be applied in conjunction with the Monte
Carlo simulation and involves the steps: (1) generate samples 9(w), w = 1, ... , ns, of
9.2. Methods of analysis 679

8, (2) calculate the required properties of X I (8 =


8(w)), w =
1, ... , ns, by the
methods in Chapter 7, and (3) estimate the required properties of X from the properties of
X I (8 = 8(w)), w = 1, ... , n8 , obtained in the previous step.
If V has state dependent coefficients, the method can be applied to a modified ver-
sion ofEq. 9.1 defined by an operator V* that has state independent coefficients, as we will
see later in Section 9.2.4. A

Example 9.4: Let X be the process in Example 9.1 with the deterministic initial
condition X (0) = x. If R and B are independent, then X 19 = X IR is a Gaussian
process with mean and correlation functions
J.t(t I R) = E[X(t) I R] = x e-Rt,
r(t,s I R) = E[X(t)X(s) I R] = x 2 e-R(t+s) + ;: e-R(t+s) (e 2 R(tAS) -1).

These two functions define the probability law of X IR. Properties of X can be
obtained from the probability law of XIR by eliminating the condition on R.
Numerical integration or Monte Carlo simulation can be used for solution. Fig-
ure 9.2 shows the evolution in time of the variance of X for a = ,fi, R = 1, and
R"' U(a- b, a+ b) with a= 1 and b = 0.7. ¢
1.4r---.,----~----.---~------,

R- U (0.3,1.7)

R=1

oL---~--~--~--~--~
0 2 4 5
time

Figure 9.2. Evolution of the variance of X for a = ,fi, R "' U(0.3, 1.7), and
R= 1

Proof: The mean and correlation equations in Section 7 .2.1.1 can be used to find the second
moment properties of XIR. Alternatively, we can use direct calculations to obtain the above
equations. For example, the mean equation, f.t(t I R) = - R JJ,(t I R) with initial condition
J-L (0 I R) = x, results by calculating the expectation of the equation for X IR. We also have

X(t) = x e-R t +a lot e-R (t-u) dB(u),

so that the second moment properties of X I R are as stated. •


680 Chapter 9. Stochastic Systems and Input

Example 9.5: Let X be a real-valued stochastic process defined by dX (t) =


-R(t)X(t)dt + adB(t), t 2: 0, where X(O) = x, a is a real constant, and
R is a stochastic process that is independent of the Brownian motion B. It is as-
sumed that the solution of the above differential equation exists and is unique for
almost all samples of R (Section 4.7.1.1). Conditional on Rand X (0) = x, X is a
Gaussian process with mean and correlation functions

t-t(t I R) =xe-f~R(u)du,
rtl\s t s
r(t, sIR)= t-t(t I R) t-t(s I R) + a 2 Jo e- fu R(Od~- fu R(rJ)drJ du.

Generally, it is not possible to find simple expressions for t-tCt I R) and


r(t, s I R). However, conditional analysis and Monte Carlo simulation can be
used to estimate properties of X. First, formulas need to be developed for the re-
quired properties of X IR, for example, the mean and correlation functions of X IR
given above. Second, these formulas and samples of R can be used to estimate the
required properties of X.
Let R be a lognormal translation process defined by R(t) =a+ exp(G(t)),
where a > 0 and G denotes a stationary Gaussian process with mean zero, vari-
ance 1, and covariance function p(r:) = E[G(t) G(t + r:)]. Figure 9.3 shows ten
samples of G with p ( r:) = exp( -I r: I) generated by the spectral representation
method with a cutoff frequency v * = 5 and n = 50 equally spaced frequencies
(Section 5.3.1.1), the corresponding samples of R with a = 0.5, the corresponding
conditional correlations r(t, t I R), and an estimate of r(t, t) based on a hundred
samples of R. <>
Note: The mean and correlation formulas in Section 7.2.1.1 or direct calculations can be
used for solution. For example, we can use the solution

to calculate the second moment properties of XIR.


Let aCn) be an approximation of G depending on 2n independent Gaussian vari-
ables with zero-mean and unit-variance collected in a vector Z (Section 5.3.1.1). The
approximation R(n) (t) =a +exp(G(n) (t)) of the lognormal translation process R depends
on the Gaussian vector Z. Denote the conditional correlation function of X corresponding
to this representation of R by r(-, · 1 Z) . The samples of G in Fig. 9.3 have been ob-
tained from aCnl with n = 50 so that r(t, s I Z) depends on 100 Gaussian variables. The
calculation of r (t, s) by direct integration from

r(t, s) = h
JRlOO
r(t, s 1 z)
.
•=1
1
0100( v2:rr
~ e-zi 1 dz;
2
2 )

is not feasible for this value of n. However, r(t, s) can be estimated without difficulty from
r(·, · 1 Z) and samples of Z, as demonstrated in Fig. 9.3. A.
9.2. Methods of analysis 681

4.---------------. 15r--------------.
Samples of G(t) Samples of R(t)

10

-4L-------~------~
0 5 10 5 10
time time

1.-------------------, 0.5 .---------------.


Samples of r(t,tiR) Estimate of r(t,t)
0.8 0.4

0.6 0.3

0.4 0.2

0.2 0.1
OL-------~------~
5 10 0 5 10
time time

Figure 9.3. Ten samples of G, the corresponding samples of R and of the condi-
tional correlations r(t, t I R), and an estimate of r(t, t) based on 100 samples

Example 9.6: Let X be a real-valued process defined by the differential equation


X(t) = R(t) X(t) + Y(t), t ~ 0, where RandY are real-valued stochastic pro-
cesses. It is assumed that R is independent of Y and has continuous samples. The
process X is given by

X(t) = H(t,O)X(O)+ lot H(t,s)Y(s)ds,


where the transition matrix H (t, s) satisfies aH (t, s)jat = R(t) H (t, s), 0 ::; s ::;
t, H(s,s) = 1, sothatH(t,s) = exp(J: R(u)du). If X(O) = 0, the second
moment properties of X are

=lot E[H(t, s)] E[Y(s)]ds,


E[X(t)]

E[X(t)X(s)]= lot los E[H(t,u)H(s,v)]E[Y(u)Y(v)]dudv.


682 Chapter 9. Stochastic Systems and Input

The kernel H(t, s) for X in Example 9.4 is exp (-R (t- s)) 1(t:::sl• where R is
a random variable. The expectations E[H(t, s)] and E[H(t, u) H(s, v)] in the
expressions of the mean and correlation functions of X are given by the moment
generating function E[e~ R] of the random variable Revaluated at~ = -(t - s)
and~ = -(t - u + s - v), respectively.<>

Note: Consider a deterministic differential equation x(t) = a(t) x(t) + y(t), where X E
Rd, a is a (d, d) matrix whose entries are real-valued functions of time, y E ~ denotes
the input, and the time argument t takes values in a bounded interval [0, r ]. If the entries
of a are continuous functions, then (1) the transition matrix h(·, ·)is the unique solution of
oh(t, s)jot = a(t) h(t, s), 0 :::; s :::; t, with h(s, s) equal to the identity matrix and (2) the
unique solution of the above differential equation is

x(t) = h(t, 0) x(O) + f h(t, s) y(s) ds

in the time interval [0, r] ([30], Theorem 1, p. 20, and Theorem 1, p. 40). It is assumed that
the samples of R satisfy the condition of these theorems a.s.
The kernels H(t, s), E[H(t, s)], and E[H(t, u) H(s, v)] in the above equations can
be interpreted as a Green's function for X, the mean of X, and the correlation function of
this process. The expectation E[H(t, u) H(s, v)] is also referred to as a stochastic Green's
function ([3], p. 86). &

9.2.4 State augmentation


We have seen that the operator D in Eq. 9.1 may have state independent
or dependent coefficients. If D has state dependent coefficients, Eq. 9.1 can be
replaced by the equation D*[X*(x, t)] = Y*(x, t), where X* andY* are aug-
mented versions of X and Y, respectively, and V * is an operator with state inde-
pendent coefficients, as illustrated later in this section (Example 9.10). Hence, we
can assume that V in Eq. 9.1 has state independent coefficients without loss of
generality. Let 0 be a vector collecting all random parameters in this equation.
The coordinates of e can be random variables or stochastic processes. The state
augmentation method involves the following steps.

1. Augment the solution X of Eq. 9.1 or of its discrete counterpart X with 0.


Let Z = (X, 0) be a random function including the output X and the random
parameters in D.
2. Write a differential equation specifying Z. If Z is a random process defined
by a stochastic differential equation, the methods in Chapter 7 can be used to
find moments and other properties of this process.

Example 9.7: Let X be the process in Example 9.4. The operator defining this
process has state independent coefficients and depends on a single random param-
eter 8 = R. The augmented process Z has the coordinates (Z 1 = X, Zz = R)
9.2. Methods of analysis 683

and satisfies the stochastic differential equation

d [ ~~~~~ J= [ ~Z1(t) Z2(t) J dt + [ ~ ] dB(t)

with the initial condition Z1(0) = X(O) and Z2(0) = R. Let JL(p,q; t)
E[X(t)P Rq] be the moment of order p + q of Z(t), where p, q :::: 0 are inte-
gers. The moments of Z(t) satisfy the differential equation

. p(p-l)a 2
JL(p, q; t) = -p JL(p, q + 1; t) + 2 JL(p- 2, q; t)

with the convention that JL (p, q; t) is zero if at least one of the arguments p, q is
strictly negative. Since the moment equations for Z (t) form an infinite hierarchy,
it is not possible to solve exactly these equations.
The Fokker-Planck equation can be used to find the evolution in time of the
density I of the diffusion process (ZJ, Z2). For example, the stationary density
Is of Z is Is (Zl, Z2) = c(z2) exp (-zi Z2/a 2), where c(z2) > 0 is a function of
Z2 such that J"'R dz1 j000 dz2 Is (ZI, Z2) = 1. <>

Proof: The differential equation for the second coordinate of Z states that E> is time-
invariant. The moment equations for J-t(p, q; t) are given by the Ito formula applied to the
function (X(t), R) 1-+ X(t)P Rq. Although the moments ~-t(O, q; t) = E[Rq] are known
for any value of q, the moment equations for Z(t) are not closed. For example, there are
two differential equations for p + q = 1. The equation for (p = 0, q = 1) provides no
information since the moments of R are known, while the equation for (p = 1, q = 0)
involves the moments J-t(l, 0; t) and J-t(1, 1; t). There are three differential equations for
p + q = 2. The equation for (p = 0, q = 2) provides no information, while the equations
for (p = 2, q = 0) and (p = 1, q = 1) involve the moments J-t(2, 0; t), J-t(2, 1; t),
J-t(l, 1; t), and ~-t(l, 2; t). Hence, up to moments of order 2 we have three informative
equations and five unknown moments.
The density f(·, ·; t) of (Zl (t), Z2 (t)) is the solution of the partial differential equa-
tion (Section 7 .3.1.3)

aj =
at
~
az 1
[z1 Z2 f + CT 2
2
~]
az 1
.

If R > 0 a.s., then f converges to a stationary density fs as t --7 oo so that

2
- a [ Zl Z2 fs
az 1 + -cr2 -az
afs] = 0.
1

Because the expression in the square bracket is a constant and this constant is zero by
the boundary conditions at infinity, we have ln Us(ZJ, Z2)) = -zi z2/cr + d(z2), where
d (Z2) is a constant with respect to the variable of integration Zl, which yields the stated
expression for fs. •

Example 9.8: Let X be the solution of dX (t) = - R X (t) dt + r dB(t), where


R and f' are random variables independent of each other and independent of the
684 Chapter 9. Stochastic Systems and Input

driving Brownian motion B. The process in Example 9.7 is a special case cor-
responding to a deterministic rather than random noise intensity. The augmented
state Z(t) = (Z1(t) = X(t), Z2(t) = R, Z3(t) = f) satisfies the stochastic
differential equation

Z1(t)] [ -Z1(t)Z2(t)] [ Z3(t)]


d [ Z2(t) = 0 dt + 0 dB(t)
Z3(t) 0 0
with the initial conditions Z 1(0) = X (0), Z2 (0) = R, and Z3 (0) = r. The
moments JL(p, q, r; t) = E[X(t)P Rq rr] of the state vector Z are given by the
infinite hierarchy of equations
. p (p- 1)
JL(p, q, r; t) = -p JL(p, q + 1, r; t) + 2 JL(p- 2, q, r + 2; t)
so that they cannot be obtained exactly from these equations. We use the conven-
tion JL(p, q, r; t) = 0 if at least one of the arguments p, q, r is strictly negative.
The Fokk:er-Planck equation can be used to find the evolution in time of
the density f of the augmented state vector Z(t). If R > 0 a.s., f converges to
the time-invariant density fs(Z1, Z2, Z3) = c(z2, Z3) exp (-z1 z2/zD as t-+ oo,
where c(z2, Z3) > 0 is a function of Z2 and Z3· <>
Note: The Ito formula applied to Z ~--+ zf z:{ 23 gives the moment equations. The
Fokker-Planck equation,

a (-Z1 Z2 f ) +1 -a-2 (Z32 f ) '


-at = - -
at az1 2 azi

giving the evolution of the density f of the state vector Z can be used to establish and solve
the differential equation for fs. A

Example 9.9: Let X be the solution of


X(t) + 2 f3 X(t) + [1 + cr1 w1 (t)J X(t) = a2 W2(t), t :::: o,
where f3 > 0, and cr1, cr2 are real constants, and W1 and W2 denote Gaussian
white noise processes defined as the formal derivative of the independent Brow-
nian motion processes B1 and B2. The moments JL(p, q; t) = E[X 1(t)P X2(t)q]
of X = (X 1 = X, x2 = X) satisfy the equations
/L(p, q; t) = p JL(p- 1, q + 1; t)- q JL(p + 1, q- 1; t)- 2q f3 JL(p, q; t)
q (q - 1) (T2 q (q - 1) (T2
+ 2 1 JL(p + 2, q - 2; t) + 2 2 JL(p, q - 2; t)

with the convention in the previous two examples. The moments of order p + q =
1 and p + q = 2 are given, respectively, by
/L(l, 0; t) = JL(O, 1; t),
jL(O, 1; t) = -JL(l, 0; t)- 2{3 JL(O, 1; t) and
9.2. Methods of analysis 685

/.L(2, 0; t) = 2~-t(1, 1; t),


/.L(1, 1; t) = ~-t(O, 2; t)- ~-t(2, 0; t)- 2~-t(1, 1; t),
f.t(O, 2; t) = -2 J.t(l, 1; t)- 4,8 J.t(O, 2; t) + o} J.t(2, 0; t) + a:f.
The above equations are closed, so that we can calculate moments of any order of
X exactly. <>
Proof: The process X = (X 1 = X, X 2 = X) is a diffusion process defined by the stochas-
tic differential equation

= X 2 (t)dt,
= -[X 1(t) + 2 fi Xz(t)] dt- o-1 X1 (t) dB1 (t) + uz dBz(t).
The average of the Ito formula applied to the function (XI Xz) r-+ xf Xi
gives the mo-
ment equations.
The moment equations of a specified order p + q have the form m(t) = a m(t).
If the real parts of the eigenvalues of a are negative, the moments m(t) converge to a
time-invariant finite value as t --+ oo and we say that the process is asymptotically stable
in the moments of order p + q. For example, the eigenvalue of a for p + q 1 are =
J
AI,2 = -fi ± fi2- 1 so that X is asymptotically stable in the mean if fi > 0. If the
dimension of the vector m is large, it is convenient to use the Hurwitz criterion to assess
the sign of the real part of the eigenvalues of a ([30], Theorem 2, p. 55).
The differential equation defining X can be used to model a simply supported beam
with constant stiffness x and length l > 0. The beam is subjected to a fluctuating axial
force W1 applied at its ends. The beam deflection V(x, t) at timet 2: 0 and coordinate
x E [0, l] satisfies the differential equation

a4 V(x, t) ) a 2 v(x, t) a2 V(x, t)


X ax4 = - Wt (t axZ - m atZ

where m is the beam mass per unit length. The representation V (x, t) = Y (t) sin(:ll' xI l)
of V corresponding to the first buckling mode gives

:i\t) + v2 (1- W(t)IPcr) Y(t) = 0,


where v2 = 1l' 4 X I (m 14 ) and Per = 1l' 2 XI 12 is the first buckling load for the beam. This
equation has the same form as the equation defining X without the damping term 2 fiX (t)
and the forcing function uz Wz, which have not been considered in the beam model. We
also note that the above beam equation can be generalized by considering the possibility
that the stiffness x is random. •

Example 9.10: Consider the stochastic problem in Example 9.2 characterized by


an operator V with state dependent coefficients and assume that the input Y is a
Gaussian white noise with a constant mean J.ty, that is, we have formally Y(t) =
J.ty +dB(t)fdt, where B denotes a Brownian motion. The augmented state vector
Z* = (Z1 =X, Zz =X, Z3 =A) is a diffusion process defined by
dZt(t) = Zz(t)dt,
{ dZz(t) = -~ [x(Z3(t)) Z1 (t) + c Zz(t)] dt + J.ty dt + ~ dB(t),
dZ3(t) =a [k(Z3(t)) H(t)]P dt,
686 Chapter 9. Stochastic Systems and Input

where H(t) [Z1(t) 2 + (Zz(t)jv(t)) 2 ] 112 , v(t) = [x(Z3)/m] 112 denotes the
instantaneous frequency of the plate, and the parameters a, fJ in the definition of
the functions x and k may be random. The operator D * defining Z* has state
independent coefficients.
The methods in Section 7 .3.1 can be applied to find properties of Z * con-
ditional on the random parameters in D*. For example, the stochastic averaging
method can be used to derive an approximate evolution equation for the crack
length Z3 = A ([175], Section 7.5.2.3). <>

Example 9.11: Let X be the state of a mechanical system

dX(t) = a(X(t), 8o) dt + b(X(t), 8o) dB(t), t 2: 0,

where 8o E JE.m is a vector depending on some system properties that can be


altered by design changes. To assess the usefulness of changing a current value
of 8o, it is necessary to calculate the effects of such changes on X. The functions
V;(t) = oX(t)joe;, i =I, ... , m, evaluated at 8 = 8o provide a useful measure
of the sensitivity of X to perturbations of 8 about 8 o and are called sensitivity
factors.
The sensitivity factors satisfy the differential equations

dV:·(t) = (oa(X(t), 8o) V: (t) + aa(X(t), 8o)) dt


l ax(t) l ae;

+ cb(:iZ/o) V;(t) + ob(Xa~:· 8o)) dB(t), i = 1, ... ' m,

with coefficient depending on the processes X and V;. It is assumed that the
functions a and b are such that the stochastic differential equations for X and V;
have unique solutions. The methods in Section 7.3.1 can be used to find properties
of X and V;.
Figure 9.4 shows three samples of X and the corresponding samples of
the sensitivity factor V = ax;ae for X(O) = 0, a(X(t), 8 0 ) = -eo X(t),
b(X(t), 8o) = I, and eo = 1. The resulting sensitivity factor V is the solu-
tion of the differential equation dV(t) = -(eo V(t) +X (t)) dt, and has samples
much smoother than the samples of X. <>

Note: The differential equations for the sensitivity factors result by the differentiation of
the defining equation for X with respect to the coordinates B; of () and then setting () equal
to Oo.
Note that the differential equations for sensitivity factors are linear in these factors.
Also, the generation of samples of X and V can be performed simultaneously by consid-
ering the evolution of the vector (X, V), or it can be performed sequentially by calculating
first samples of X and using them to calculate the corresponding samples of V. .._
9.2. Methods of analysis 687

Samples of V

-2L-----------------~ -1.5 '-------------------~


0 5 10 0 5 time 10
time

Figure 9.4. Three samples of X and the corresponding samples of V for() o = 1

9.2.5 Liouville equation

Let X be an Rd -valued process defined by the differential equation

X(t) = h(X(t), t), t 2: 0, (9.2)

where his a deterministic function and X(O) = Xo is an JRd-valued random


variable with density fo. It is assumed that the solution of Eq. 9.2 exists and is
unique. Let X(t) = g(X(O), t) be the solution of this equation. The density
f(·, t) of X(t), t 2: 0, can be obtained from the mapping X(O) ~ X(t) or
the solution of a special case of the Fokker-Planck equation, called the Liouville
equation. We present both methods for finding the density of X(t).

If~ ~ g(~, t) has continuous first order partial derivatives with respect to the
coordinates of~ and defines a one-to-one mapping, then

f(x; t) =Ill fo(g- 1 (x, t)), X E r' (9.3)

where Ill and g- 1 (~, t) denote the Jacobian and the inverse of~~ g(~, t).

Note: The relationship between the random vectors X(t) and Xo gives Eq. 9.3. This
equation is useful if the expression for the mapping ~ 1-+ g (~, t) can be obtained explicitly
(Section 2.11.1 in this book, [176], Theorem 6.2.1, p. 143). The relationship X(t)
g(Xo, t) can also be used to calculate moments and other properties of X(t). &
688 Chapter 9. Stochastic Systems and Input

If the solution ofEq. 9.2 exists and is unique, the density/(·; t) of X(t) satisfies

t
the Liouville equation
at=_ a(fhi) (9.4)
at i=l axi
with the initial and boundary conditions f(x, 0) = fo(x), limJxd-Hxl f(x; t) =
0, i = 1, ... , d, respectively, so that

f(x, t) = fo(g- 1 (x, t)) exp [ - {


lo
t
Ld
i=l
ah·( s)
1
~· ds ] ,
ay,
(9.5)

where y = g(~, s) and~= g- 1(y, s).

Proof: The proof of the Fokker-Planck equation in Section 7.3.1.3 can be used to establish
Eq. 9.4. Let cp(u; t) = E [eAur X(t)J be the characteristic function of X(t) so that
f(x; t) = (1/(2 n)d) JJR;d e A ur x cp(u; t) du. The derivative with respect to time of cp
gives

~~ = E [ t R ukXk(t)eAur X(t)] = R t u k E [xk(t)eAur X(t)J


k=l k=l
d
= RL:>k E [hk(X(t), t)eAuT X(t)J.
k=l

The Fourier transform of the left side ofEq. 9.4 is aj jat. The Fourier transform of the right
side ofEq. 9.4 is- L.t=l [JR;d[a(f hk)faxk]eAuT x dx which gives the right side of
the above equation following integration by parts.
The explicit solution of Eq. 9.4 in Eq. 9.5 can be obtained from the associated La-
grange system ([176], Theorem 6.2.2, p. 146) •

If the function h in Eq. 9.2 depends on a finite number of random parameters


9, the vector X can be augmented to include these parameters. The evolution of
the augmented vector X = (Xx = X, Xe = 9) is defin~d by a differential
equation consisting of Eq. 9.2 and the additional equations Xe(t) = 0 with the
initial condition Xe(O) = 9.

Example 9.12: Let X be the solution of X(t) + v2 X(t) = 0, t 2: 0, with the


random initial conditions (X(O) = Xo, X(O) = Xo), where v > 0 is a constant
and fo denotes the density of (Xo, Xo). The relationship between the values of
the state vector X= (X 1 =X, Xz =X) at a timet 2: 0 and the initial timet = 0
is
X(t) = (X(O) t) = [ cos(~t)
g '
(lfv) sin(vt)
-v sm(v t) cos(v t)
X(O) J
9.2. Methods of analysis 689

so that

X(O) = -1 (X(t) t) =[ cos_(v t) -(1/v) sin(v t) ] X(t).


g ' v sm(v t) cos(v t)

The mapping X(O) 1--+ X(t) = g(X(O), t) can be used to calculate moments of
X (t) from moments of X (0) and to find the density,

f(x; t) = fo(xl cos(v t) - (x2jv) sin(v t), Xl v sin(v t) + x2 cos(v t))


of X(t) from the density of X(O) (Eqs. 9.3 and 9.5). ~

Note: The Jacobian of~ H- g(~, t) is unity so that f (x, t) = fo(g- 1 (x, t)) by Eq. 9.3.
The evolution equation for X is given by xl (t) = hl (X(t), t) = X2(t) and X2(t) =
h2(X(t), t) = -v 2 Xt(t) sothatah1/ax1 = 0, ah2/ax2 = O,andtheexponentialfunction
in Eq. 9.5 is unity. The density f(x; t) in the above equation coincides with the result in
Eq. 9.3 ([176], Example 6.1, p. 143). •

Example 9.13: Let X be the solution of X(t) + C X(t) + K X(t) =a sin(v t),
where C > 0 and K > 0 are independent random variables and a, v > 0 are some
constants. The augmented vector X with coordinates (Xl = X, = X, = x2 x3
C, X4 = K) satisfies the differential equation

so that the density j of X(t) satisfies the Liouville equation (Eq. 9.4)

aj a (- -) a [ _ _ _ _ . 1-]
- = - - - - X2f ---- (-X4Xt-X3X2+asm(vt))
at axl ax2

with solution given by Eq. 9.5. ~

Note: If in addition the input parameters (a, v) were random, the corresponding augmented
vector i would be an JR6 -valued process. The evolution of this process would be defined
by the above differential equations and two additional equations stating that the parameters
(a, v) are time-invariant. &

Example 9.14: Let X be an JR2 -valued process defined by X(t) = A X(t) with
the initial condition X(O) =X o, where the entries (1, 1), (1, 2), (2, 1), and (2, 2)
of A are -A1, A2, A1, and -A2, respectively, and At, A2 are random variables.
Then ([176], Example 8.2, p. 223)

- 1 [ A2+Ate-(AJ+Az)t A2 (1-e-(AJ+Az)t) J
X(t)-A1+A2 A1(1-e-(A 1+Az)t) At+A2e-<AI+Az)t Xo.
690 Chapter9. Stochastic Systems and Input

0.2
Mean of X 1 Std of X 1
0.8 p=O.O

0.6 0.1 p=1.0


p=O.O
0.4 Jfo' 0.05
'p=1.0
0.2 0
0 2 3 0 2 3
time time

Figure 9.5. Evolution of the mean and standard deviation of X 1 for lognormal
random variables (A 1, A2) and initial condition X o = (1, 0)

Figure 9.5 shows the evolution in time of estimates of the mean and stan-
dard deviation of X 1 for Xo = (1, 0) and lognormal random variables (A 1, A2)
with means (1.0, 0.5), coefficients of variation (0.3, 0.4), and two values of the
correlation coefficient p of the Gaussian images of A 1 and A2. The estimates are
based on 500 independent samples of X 1 . The first two moments of X 1 converge
to the corresponding moments of A2/(A1 + A2) as t--+ oo. <>

Note: The lognormal random variables Ai, i = 1, 2, are defined by Ai = exp(ftg,i +


ag,i Gi), where Gi are correlated N(O, 1) variables with correlation coefficient p. The
parameters (rtg,i• ag,i) relate to the first two moments (fti, (vi rti) 2) of Ai by aff,i =
ln(1+vf) andrtg,i = ln (rti/}1 + vr). The correlation coefficients p and~ of(G1, G2)
and (A 1, A2) satisfy the condition ([79], pp. 48-52)

Because X1(t) + X2(t) = 0 by the defining equation for X, we have X1(t) +


X2(t) = 1 for the initial condition Xo = (1, 0), so that E[X 1(t)] + E[X2(t)] = 1 and
Var[X1 (t)] = Var[X2(t)] at any timet :::: 0 A

9.2.6 Taylor, perturbation, and Neumann series methods


We have solved inhomogeneous differential and integral equations with ran-
dom coefficients in Sections 8.4.1.2, 8.4.1.3, 8.4.1.4, and 8.4.1.5 by the Taylor,
perturbation, Neumann series, and the decomposition and iteration methods, re-
spectively. The extension of these methods to the case in which both the operator
9.2. Methods of analysis 691

and the input are stochastic is elementary for the case in which D has state inde-
pendent coefficients.
We only present three examples illustrating how some of the methods in
Sections 8.4.1.2-8.4.1.5 can be extended to solve Eq. 9.1.

Example9.15: LetUbethesolutionof(d/dt-a+ZU(t) 2 )U(t) = Q sin(vt),


where t 2: 0, U(O) is specified, (Z, Q) are random variables in L2 with means
(f.,Lz, /Lq) and variances (a}, aJ'), Z and Q are uncorrelated, and a and v > 0 are
some constants. The solution of this equation, denoted by U(t; Z, Q), is a func-
tion of time and depends on the random parameters Z and Q. Let U(t; f.L z, f.,Lq).
Vz(t; f.Lz, /Lq). and Vq(t; f.Lz, /Lq) denote the functions U, au ;az, and au jaQ,
respectively, evaluated at the mean of (Z, Q).
Figure 9.6 shows the deterministic functions U(t; f.Lz, f.Lq), Vz(t; f.Lz, f.Lq),
and Vq(t; f.Lz, /Lq) for a = 1, f.Lz = -0.5, f.Lq = 1, and v = 5. These functions

1.4

1.2
\
0.8
U(t;l!
z .11q)

-0.2

-o. 4o'-----~--~--~--~-___J10

time

Figure 9.6. Functions U(t; f.Lz, f.Lq), Vz(t; f.Lz, f.,Lq). and Vq(t; f.Lz, f.,Lq) for a = 1,
f.Lz = -0.5, f.,Lq = 1, and v = 5

can be used to find properties of U approximately from

U(t; Z, Q) ~ U(t; f.Lz, f.,Lq) + Vz(t; f.Lz, f.,Lq) (Z- f.Lz) + Vq(t; f.Lz, f.,Lq) (Q- f.,Lq).
For example, the approximate mean and variance of U(t) are f.,L(t) ~ U(t; f.Lz, f.,Lq)
and a(t) 2 ~ Vz(t; f.Lz, f.,Lq) 2 a}+ Vq(t; f.Lz, f.Lq) 2 aJ. <>
Proof: The function U(t; J.Lz, J.Lq) is the solution of the differential equation for U with
(J.Lz, J.Lq) in place of (Z, Q). The sensitivity factors Vz and Vq can be obtained as in
Section 8.4.1.2. For example, the differential equation,

(d fdt -a) Vq (t; J.Lz, J.Lq) - 3 J.Lz U(t; J.Lz, J.Lq ) 2 Vq (t; J.Lz, J.Lq) = sin(v t),
for Vq results by differentiating the defining equation for U with respect to Q and setting
(Z, Q) equal to its expectation (J.Lz, J.Lq). •
692 Chapter 9. Stochastic Systems and Input

Example 9.16: Let U be the solution of (d/dt + f3 + e Y(t)) U(t) = Q, where


t 2: 0, U (0) = 0, Y is a stochastic process with mean zero, Q denotes a random
variable, and e is a small parameter.
The first order perturbation solution is U(t) = Uo(t) + e Ut(t), where
Uo(t) = (Q/{3) (1-e-fi 1 ), Ut(t) = -(Q/f3)Z(t)e-fi 1 , and Z is a process
given by Z(t) = J; (efis - 1) Y(s) ds. Moments and other probabilistic proper-
ties of U can be approximated from this or any higher order perturbation solution.
Figure 9.7 shows histograms of the exact and first order perturbation solu-
tions for U (0) = 0, f3 = 1, t = 5, Q ~ U (0.5, 1.5), andY (t) = Y ~ U (0.5, 1.5).
The histograms have been calculated from 1,000 samples of these solutions under

1.5 ,------~---------, 1.5 ,-----~-~--~-----,


Exact Approximate
(£=0.1) (£=0.1)

0.5 0.5

0 ~ 0 h
0 0.5 1.5 2 0 0.5 1.5 2

2 2
Exact Approximate
1.5 (£=0.5) 1.5 (£=0.5)

2 3

Figure 9.7. Histograms of the exact and first order perturbation solutions for
U(O) = 0, {3 = 1, t = 5, Q ~ U(0.5, 1.5), and Y(t) = Y ~ U(0.5, 1.5)

the assumption that Q and Y are independent random variables. The histograms
of the exact and first order perturbation solutions are similar fore = 0.1 but differ
significantly fore = 0.5. <>
Note: The functions Uo and U1 satisfy the differential equations Uo(t) + fJ Uo(t) = Q
with Uo(O) = 0 and U1 (t) + fJ U1 (t) = - Y(t) Uo(t) with U1 (0) = 0, respectively, which
yield the stated solutions. The approximate mean is (E[Q]/ fJ) (I - t)
e-fi + O(e 2 ). The
expectation of the product (Uo(t) + e U1 (t)) (Uo(s) + e U1 (s)) gives the terms of orders
one and order e but only one of the three terms of order e2 (Section 8.4.1.3). A
9.2. Methods of analysis 693

Example 9.17: Consider the differential equation (djdt + {3 + Y) U(t) = Q,


where t E [0, r], {3 = 1, U(O) = 0, Y "' U( -0.5, 0.5), Q "' U(0.5, 1.5)
is independent of Y, and r = 2. Because IYI < 1/r a.s., the Neumann series
representation of U is convergent and

U(t) = f(t)- A. loT: H(t, s; A.) f(s) ds


where A.= -1, f(t) = (Q/{3) (1- e-13t), H(t, s; A.)= - .E~ 1 A.'- 1 K,(t, s),
K,(t,s) = J;K(t,cr)Kr-t(cr,s)dcrforr 2: 2andKt(t,s) = K(t,s) =
l[O,t](S) Y e-13 (t-s).
Figure 9.8 shows with solid and dotted lines the exact and approximate

0.4
Mean Standard
0.8
0.3 deviation
0.6
0.2
0.4
0.1
0.2

0.5 1 1.5 2 0.5 1 1.5 2


time time

Figure 9.8. Exact and approximate mean and variance of U for {3 = 1, U (0) = 0,
Y"' U(-0.5,0.5), Q"' U(0.5, 1.5),andr = 2

mean and variance of the solution U in the time interval [0, 2]. The approximate
solution is given by the first two terms of the above Neumann series. In this case
the approximate mean and variance functions are accurate. Generally, a larger
number of terms needs to be retained from the Neumann series to obtain satisfac-
tory approximations. ¢
Proof: The integral form of the differential equation for U is

U(t) = U(O) e-13 t +lot e-fl (t-s) ( -Y U(s) + Q) ds

=j (1-e-l3t)-Ye-l3t lo el3su(s)ds
1

by viewing - Y U 0 + Q in the equation for U as the input. Hence, U is the solution of


the Fredholm equation of the second kind,

U(t)=f(t)+).. loT: K(t,s)U(s)ds,


694 Chapter 9. Stochastic Systems and Input

with A. = -1, where f(t) = (Q/ {3) (1 - e-13 1 ) and K (t, s) = 1[0,t] (s) Y e-13 (t-s) (Sec-
tion 8.4.1.4). •

9.2.7 Galerkin method


Let X be defined by Eq. 9.1 with some initial and boundary conditions. Let
g be a collection of ~n -valued deterministic functions with support D that are
square integrable, linearly independent in D, and satisfy the initial and boundary
conditions for Eq. 9.1. The members of g are called trial functions. We intro-
duce the inner product (f, g) = fv f(x)T g(x) dx on Q. For computation it is
convenient to construct g such that its members are orthogonal, that is, (f, g) = 0
for f, g E g distinct.
Let ({Jp E Q, p = 1, ... , m, and define

m
X(x, t) = L Cp(t)({Jp(x), xED, t 2:0, (9.6)
p=l

where the real-valued coefficients C P are random and may depend on time. The
function X is a member of the space S spanned by the trial functions ({J 1, ... , ({Jm.
Suppose that we approximate the solution of Eq. 9.1 by a member of S. The error
or residual of this approximation is

m
R(x, t) = D[X(x, t)]- Y(x, t) = L D[Cp(t) ({Jp(x, t)]- Y(x, t), (9.7)
p=l

where the last equality holds if D is linear in X.


The Galerkin solution for Eq. 9.1 is a member X of S whose coordinates
CP are determined from the condition that the residual R in Eq. 9. 7 is orthogonal
a.s. to the trial functions ((JI, •.. , ({Jm· The Galerkin solution is (1) useful if the
number m of trial functions in Eq. 9.6 is small since, in this case, the solution of
Eq. 9.1 is replaced by the solution of a much smaller problem for the unknown co-
efficients CP in Eq. 9.6 and (2) accurate if the selected trial functions capture the
essential features of the solution of Eq. 9 .1. There is no theory for selecting ad-
equate trial functions. Generally, physical and analytical considerations are used
to select the trial functions for a particular problem. If there is little information
on the class of adequate trial functions, the approximate representation in Eq. 9.6
can be based on, for example, Bernstein polynomials, as illustrated later in this
section (Example 9.21).
9.2. Methods of analysis 695

The coefficients C P of the Galerkin solution in Eq. 9.6 are given by


m
l)D[Cp(t) cpp], cpq) = (Y, cpq) a.s, q = 1, ... , m, (9.8)
p=l
provided that V is linear in X.
Note: We require that almost all samples of X be orthogonal to the trial functions. This
condition gives Eq. 9.8, that is, m algebraic or differential equations for the unknown coef-
ficients Cp.
The coefficients Cp are random because of the uncertainty in the coefficients of D
and the input Y. Generally, these coefficients depend on time so that they are stochastic
processes giving the evolution in time of the coordinates of the Galerkin solution X in the
basis defined by the trial functions f{!p. A

Under some conditions Eq. 9.8 becomes an algebraic equation with random
coefficients for C. For example, if V is a linear differential operator involving
only partial derivatives with respect to the coordinates of the space argument x,
the solution of Eq. 9.1 is time-invariant so that X in Eq. 9.6 becomes X(x) =
r;;=l Cp cpp(x) and Eq. 9.8 yields the algebraic equation A C = Y, where A =
{(V[cpp],cpq)},p,q = 1, ... ,m,C = (Ct, ... ,Cm),andY = {(Y,cpq)},q =
1, ... , m. Properties of C can be obtained from the solution of the above algebraic
equation.

Example 9.18: Let X be the solution of X(t) =A X(t), t E [0, 1], with the initial
condition X (0) = 1. The exact solution of this equation is X(t) = exp(A t). If A
is uniformly distributed in the range (a 1, a2), the exact moments of X are

E[X(t)m] = E [emAt] = 1 (ema2t _ ema 1 t).


m (a2- at) t

The Galerkin solution with the functional form X(t) = 1 + Ct t + c2 t 2 is


- -36 A (2 A - 5) + 20 A (3 A - 8)
X(t) = 1 + 3 A2- 12 A+ 20 t
30 A (3 A- 4) + 40 A (2 A- 3) 2
+ 3A 2 -12A+20 t
Figure 9.9 shows the mean and standard deviation of X and its approxima-
tion X for A "' U ( -1.5, -0.5). The first two moments of X have been estimated
from 500 independent samples of this process. <>
Note: The functional form considered for X satisfies the initial condition for any values of
Ct and C2 since ¢1 (t) = t and ¢2 = t 2 are zero at t = 0. The conditions in Eq. 9.8 imply
JJ t R(t) dt = 0 and JJ
t 2 R(t) dt = 0 so that

{ -A/2 + (1/2- A/3) C 1 + (2/3- A/4) C2 = 0,


-A/3 + (1/2- A/4) Ct + (1/2- A/5) C2 = 0,
696 Chapter 9. Stochastic Systems and Input

0.2 '-----~~------'
0 0 ·5 time 0 ·5 time

Figure 9.9. Mean and standard deviation of X and X for A"' U( -1.5, -0.5)

where R(t) =-A+ (1- At) C1 + t (2- At) Cz denotes the residual. These equations
show that the vector C = (Ct. Cz) is the solution of an algebraic equation with random
coefficients and input, and give the above expressions for C. If C has a large dimension,
it is not possible to obtain analytical expressions for the coordinates of C. In this case, the
Taylor series, perturbation, and other methods presented in the first part of this chapter can
be used for solution ([178], pp. 511-522).
The accuracy of the Galerkin solution depends strongly on the trial functions. For
example, the Galerkin solution X(t) = 1 + C t with C = A/(1 - 2 A/3) derived from the
orthogonality condition JJR(t) t dt = 0 provides unsatisfactory approximations for both
the mean and variance of X in [0, t].
We also note that the process Y(t) = X(t)- 1 satisfies the differential equation
Y(t) =A (Y(t) + 1) with the initial condition Y(O) = 0. It is convenient in some applica-
tions to modify the original differential equation so that its new version has homogeneous
initial and boundary conditions. A

Example 9.19: The Galerkin solution can also be applied to solve nonlinear dif-
ferential equations with random coefficients and input. Let X be the solution of
X(t) = AX(t) 2 fort e [0, l]andX(O) = 1. LetX(t) = l+L:;=l CpgJp(t)with
gJp(O) = 0, p = 1, ... , m, be a Galerkin solution for this equation. The unknown
coefficients of the Galerkin solution satisfy the nonlinear algebraic equation

m m
,L:apkCp-A L /3pqkCpCq=0, k=l, ... ,m,
p=l p,q=l

where CXpk = J~ ~p(t) glk(t) dt and /3pqk = J~ gJp(t) gJq(t) glk(t) dt. <>

Note: The Galerkin solution X satisfies the initial conditions for any values of the coeffi-
9.2. Methods of analysis 697

cient C1, ... , Cm. The residual

R(t) = t
p=!
Cpri;p(t)- A (t
p=l
Cprpp(t))
2

n m
= :Lcpri;p(t)-A L CpCqrpp(t)rpq(t)
p=! p,q=!

and the orthogonality conditions JJ


R(t) rpk(t) dt = 0, k = 1, ... , m, give the equations
for Cp.
Generally, the solutions C 1, ... , Cm of the above nonlinear algebraic equations can-
not be obtained analytically. Approximate properties of C1, ... , Cm can be obtained by
Monte Carlo and other methods for solving algebraic equations with random coefficients
(Section 8.3.1). A.

Example 9.20: Let X be the solution of Eq. 9.1 with D = (0, l), l > 0, and
D = a4 1ax 4 + M (x) a21at 2 depending on a random field M with strictly positive
samples in (0, l). The initial and boundary conditions are X(x, 0) = 0, X(O, t) =
X(l, t) = 0, and aX(O, t)lax 2 = aX(l, t)lax 2 = 0.
The coefficients C P of the Galerkin solution

_
X(x, t) = L Cp(t) fPp(x) = cl (t) sin(-[-)+
2 nx (2nx)
C2(t) sin - l -
p=l

satisfy the differential equations

{
Mn cl (t) + M12 C2(t) + (JT I l) 4 (l 12) cl (t) = Yl (t),
M21 cl (t) + Mn c2 (t) + (21T I l) 4 (l 12) cl (t) = Y2(t),
where Mpq = J6
M(x) sin(p JT xl l) sin(q JT xl l) dx, C p(O) = 0, and Cp(O) =
0, p, q = 1, 2. The terms on the right side of the above equation are Y q (t) =
Jci Y(x, t)sin(q JT xll) dx, q = 1, 2.
Suppose that the input is Y(x, t) = Y(t) =a Y(t), where a is a constant
andY denotes an Omstein-Uhlenbeckprocess defined by dY(t) = -p Y(t) dt +
-J2P dB(t), where p > 0 is a constant and B denotes a Brownian motion. Then
the vector C(t) = (CJ(t), C1(t), C2(t), C2(t), Y(t)) is an JR5 -valued diffusion
process conditional on the random coefficients M pq . The methods discussed pre-
viously in this chapter and results in Section 7.2.1 can be used to find properties
of C.<>
Note: The error of the approximate solution,
2
R(x, t) = "'[(p:rr)4
~ . ] sin (pTC
- 1- Cp(t) + M(x) Cp(t) - 1 -X) - Y(x, t),
p=!
698 Chapter 9. Stochastic Systems and Input

and the orthogonality conditions give the differential equations for the unknown coeffi-
cients Cp. These coefficients and their first derivative are zero at the initial time since
X(x, 0) = 0. The coefficients Cp are stochastic processes defined by differential equa-
tions with random coefficients and input.
The random function X gives the deflection of a simply supported beam with unit
stiffness, span l > 0, and random mass M(x), x E [0, l], that is subjected to a random load
Y(x, t), x E [0, l], t 2: 0 . .&

Example 9.21: Let g : [0, 1] --+ ~be a bounded function that is continuous
at x E [0, 1] and let bn(x) = L~=0 g(kjn)Pnk(x) be the Bernstein polyno-
mial of order n of g, where Pn,k(x) = [n!j(k! (n- k)!)]xk (1- x)n-k. Then
limn--+oo bn(x) = g(x). This limit holds uniformly in [0, 1] if g is continuous
in this interval. The Bernstein polynomials can be extended simply to represent
real-valued continuous functions defined on ~d, d > 1 ([123], p. 51).
Let G : [0, 1] x n --+ ~be a bounded random function on a probability
space (Q, F, P), which is uniformly continuous in probability on [0, 1]. Then the
sequence of random Bernstein polynomials,
n
Bn(X) =L G(kjn) Pn,k(X), (9.9)
k=O

converges uniformly in probability to Gin [0, 1]. This property suggests the use
of Bernstein polynomials as trial functions in Eq. 9 .6. The corresponding Galerkin
solution is X(x, t) = L~=O Ck(t) Pn,k(x), where Ck are random unknown coef-
ficients, which can be determined from Eq. 9.8.
Random Bernstein polynomials can also be used to develop parametric rep-
resentations for stochastic processes. These representations are useful in Monte
Carlo simulation and analytical studies ([79], Section 4.3.2.3). ¢

Proof: Foragivene > Owecanfind/l > Osothat ix-x'i < /l implies ig(x)-g(x')i < e
by the properties of g. We have

n
ig(x)- bn(x)i = ~)g(x)- g(kfn)) Pn,k(x) ~ L ig(x)- g(kjn)i Pn,k(x)
k=O lkfn-xl<8
+ L ig(x) - g(kjn)i Pn,k<x).
lkfn-xi:0:8

The sum indexed by ikfn -xi < /l is smaller than e Llkfn-xl<8 Pn,k(x) ~ e since
ig(x)- g(kjn)i ~ e and Lk=O Pn,k(x) = 1. The sum indexed by ikfn -xi 2: /lis smaller
than 2 fJ Llkfn-xl::::8 Pn,k(x), where fJ bounds g, that is, SUPxe[O, l]ig(x)i ~ {3. We also
have
n n
L(k -nx) 2 Pn,k(X) = L[k(k -1)- (2nx -l)k+n2 x 2 ]Pn,k(x) = nx (1-x),
k=O k=O
9.2. Methods of analysis 699

since Lk=O k Pn,k(x) = n x and Lk=O k (k- 1) Pn,k(x) = n (n- 1) x 2, and

"' 1 "' 2 nx(l-x) 1


LJ Pn,k(x) :S 82 LJ (k/n- x) Pn,k(x) :S n2 82 :S 4n 82.
lk/n-xl?.8 lk/n-xl?.8

These results show that lg(x)- bn(x)l :S e + fJ/(2n8 2 ) so that lg(x)- bn(x)l :S 2e
for n ~ fJ/(2e8 2 ). If g is continuous in [0,1], then lg(x)- bn(x)l :S 2e holds with 8
independent of x so that bn converges uniformly to g in this interval.
It remains to show that Bn in Eq. 9.9 converges uniformly in probability to G in
[0, 1], that is, that for any e, 17 > Othere exists n(e, 17) such that P(IBn(x)-G(x)l ~e) :S 17
for n ~ n(e, 17), or equivalently that

~
d(Bn, G)= sup d(Bn(x), G(x)) = sup
xe[O,l]
i
I ( )
xe[O,l] n 1 + Bn X
IBn(x)- G(x)l
-
G( )I P(dw)
X

can be made as small as desired as n ~ oo. Take a fixed x E [0, 1] and an integer n ~ 1.
Let li = [(i - 1)/n, i/n] be the subinterval of [0, 1] containing x. Since G is uniformly
continuous in probability in [0, 1], for any e, 17 > 0 there exists 8 = 8(e, 17) such that
lx- x'l :S 8 implies P(IG(x)- G(x')l ~e) :S 11· Hence, we have P(IG(x)- G(v/n)l ~
e) :S 17 for v = i - 1, i if 1/n :S 8. Suppose that n satisfies the condition 1/n ::::; 8. Then

~
d(Bn, X)= sup d (Bn(t), X(t)) = sup
xe[O,l]
i
IBn(x)- G(x)l
I
xe[O,l] n 1 + Bn(x)- G(x)
I P(dw)

= xe[O,l]
~ i
n
I L~-0 G(vjn) Pn,v- G(x) L~-oPn,vl
1 + IBn(x)- G(x)l
P(d )
())
< sup
- xe[O,l]Jn
r L~=O1 +IG(v/n) - G(x)l
IBn(x)-G(x)l
Pn, v P(dw) < 4 fJ rJ + 3 e +
-
_!!_,
2n8

where SUPxe[O,l]IG(x)l :S {J a.s. The above upper bound holds since (1) the integral in the
above summation corresponding to v = i - 1 can be separated in two integrals over the
events {IX((i- 1)/n)- X(t)l ~ e} and {IX((i- 1)/n)- X(t)l < e} and these integrals
are smaller than 2 fJ 17 and e, respectively, (2) similarly, the integral corresponding to v = i
is smaller than 2 fJ 17 + e, and (3) the remaining sum of integrals satisfies the inequality

r L~1IG(vjn)-
Jn
G(x)l Pn v r~
+ IBn(x)- B(x)l , P(dw) :S Jn LJ IG(vjn)- G(x)l Pn,v P(dw)
v=O

so that it is smaller thane+ {Jj(2n 82 ) by the upper bound on lg(x)- bn(x)l, where the
sum L~ includes the terms with indices v '# i - 1, i. We have found that

for any e,17 > 0 and n ~ n(e,17) so that Bn converges uniformly in probability toG on
[0, 1] as n ~ oo. •
700 Chapter 9. Stochastic Systems and Input

9.2.8 Finite difference method


Let X be the solution ofEq. 9.1 with some initial and boundary conditions.
Consider a time sequence ti = i At, i = 0, 1, ... , At > 0, and a finite collection
of points Xk ED. Denote by Xk,i = X(xk, ti) the value of X at the node (k, i)
of the finite difference representation for Eq. 9.1. The finite difference solution of
this equation involves the following steps.

1. Develop a finite difference representation for Eq. 9.1 and for the initial and
boundary conditions of this equation by approximating all derivatives at the
nodes (k, i) by finite differences.

2. Find moments and other probabilistic properties of the unknowns X k,i by the
methods presented in this and previous chapters.

Note: The derivatives in Eq. 9.1 and in the initial and boundary conditions for this equa-
tion can be approximated by finite differences of various accuracy. For example, the par-
tial derivative 8Xf8t can be approximated at (x, t) by [X(x, t + .t.t)- X(x, t)]j.t.t or
[X(x, t + .t.t)- X(x, t- .t.t)]/(2 At). The errors of these finite difference approximations
are of orders 0(-t.t) and O((At) 2 ), respectively.
If only the spatial derivatives are approximated by finite differences, we obtain dif-
ferential equations giving the time evolution for a vector with entries X(t, Xk). The coeffi-
cients of these equations depend on values of the random coefficients of V at the nodes of
the finite difference representation for Eq. 9.1. A

Example 9.22: Let X be the solution of

[ U(x) 2ataz a V(x) - az2


- V'(x)-- J X(x, t) = Y(x, t), X E (0, 1), t ~ 0,
ax ax

where U, V > 0 are random fields, V'(x) = dV(x)jdx, and the input Y(x, t)
depends on both time and space arguments. The initial and boundary conditions
are X(x, 0) = o, X(O, t) = o, and ax(l, t)jax = o.
Take ti = i At and Xk = kjn, k = 0, I, ... , n, where n > 1 is an in-
teger giving the number of equal intervals considered in D = (0, 1). The finite
difference representation of the equation of X at (x k, ti) is

where Xk,i = X(xk, t;), Vk = V(xk), Vfc = V'(xk), Uk = U(xk), and Yk,i =
Y(xk. ti). The coefficients in the above finite difference equation are

Ak = Wk ( Vk/(2jn) 2 - V£/(2/n)),

Bk = 2 Wk ( Uk/((Atp)- Vk/(2fn) 2 ), and

Ck = Wk ( V£/(2/n) + Vk/(2jn) 2 ),
9.3. Mechanics 701

where wk = (.M) 2 I uk. These coefficients are random because they depend on
the random variables Uk. Vk. and V{ The above finite difference equations for
Xk,i satisfy the initial and boundary conditions X k,-l = Xk,O = 0 and Xo,i = 0,
Xn+l,i = Xn-l,i· respectively.<>

Note: The solution X gives the elongation of a rod with unit length and cross section area,
material density U(x), and modulus of elasticity V(x), subjected to an axial force Y(x, t)
at location x and timet. The differential equation for X follows from (1) the Hooke law
stating that the stress in the rod is S(x, t) = V(x) aX(x, t)jax and (2) the Newton law
giving the condition
U( ) a2 X(x, t) _ as(x, t) Y( )
X at2 - ax + X' t •
The rod is at rest at the initial time so that Xk,-1 = Xk,O = 0 for all k's. The boundary
conditions Xo,i = 0 and Xn+l,i = Xn-l,i show that the rod is fixed at x =
0 and that the
stress is zero at x = 1. .l

We conclude here our discussion on some methods for solving Eq. 9.1. The
remaining part of this chapter presents stochastic problems selected from various
fields of applied science and engineering. The solutions of the problems in the
following sections are based on the methods in Chapters 6, 7, 8, and the first
part of Chapter 9. An exception is the discussion of the stochastic finite element
method in the following section. This method can be applied to find the solution
of Eq. 9.1, but is discussed in the following section because of its wide use in
mechanics.

9.3 Mechanics
The finite element method is the preferred method for solving determinis-
tic and stochastic problems in mechanics. The method has been applied to analyze
problems in physics, chemistry, aerospace, and other fields [28].
The finite element method can be viewed as a mixture of the finite difference
and the Rayleigh-Ritz methods. The unknowns in the finite element method are
values of X in Eq. 9.1 at a finite number of points, called nodes. The equations
for the values of X at nodes are obtained from variational principles.
We formulate a variational principle used frequently in mechanics, illus-
trate the Rayleigh-Ritz method by a simple example, give essentials of the finite
element method for deterministic problems, extend this formulation to stochastic
problems, and present finite element solutions for some stochastic problems.

9.3.1 Variational principles


Consider an elastic solid in an open bounded subset Din IRq, q = 1, 2, 3,
with boundary aD that is in equilibrium under the action of body forces b(x, t),
702 Chapter 9. Stochastic Systems and Input

XED, boundary traction p(x, t) for X E aDa and t 2: 0, and imposed displace-
ment u(x, t) for X E aDu and t 2: 0, where aD= aDa u aDu and aDa n aDu =
0. Let Ui (x, t), Sij (x, t), and Yij (x, t) denote the displacement, stress, and strain
fieldsatx E Dattimet 2:0. Thevectorss = (su,s12,S13,S22,s23.s33)and
y = (yu, YI2. YI3. J122, J123, Y33) contain the entire information on stresses and
strains since Sij = sji and Yij = Yji. The equilibrium conditions require

Sij,j(X, t) + bi(X, t) = 0, XED,


Sij(X, t)nj(X, t)- Pi(X, t) = 0, X E aDa, (9.10)

fort 2: 0, where Sij,j = asij ;ax j, n j (x, t) denotes the coordinate j of the exterior
normal n(x, t) at x E aDa and timet, and summation is performed on repeated
subscripts (Section 8.5.2 in this book, [68], Section 3.4).
Letu(x,t) = (ut,U2,U3)(x,t)bethedisplacementatx ED c ~3 and
t 2: 0 and let Uk, k =1, 2, be two arbitrary displacement fields in jj =
D u aD
such that Uk(X, t) = U(X, t) for X E aDu and t 2: 0. The difference 8u = Ut- U2
is said to be a virtual displacement. Virtual displacements must vanish on aD u
but are arbitrary on aDa.

The principle of virtual displacements states that an elastic solid in D c ~q ,


=
q 1, 2, 3, is in equilibrium if

{ sT 8y dx = { bT 8u dx + { pT 8u du(x), V8u, (9.11)


Jn Jn lana
where 8y is the first order variation of the strain vector corresponding to 8u and
du(x) is an infinitesimal surface element in aDa ([28], Section 1.3).

Note: The left and right sides of Eq. 9.11 are the first order variations of the strain energy
and external work, respectively. It can be shown that the principle of virtual displacements
constitutes an alternative statement of equilibrium. This principle is valid for any material
behavior and magnitude of the displacement field ([28], Section 1.3). A dual of the prin-
ciple of virtual displacements, called principle of virtual forces, can also be established.
This principle provides an alternative statement of compatibility ([28], Section 1.5, [68],
Section 10.9). 6

Example 9.23: Let u (x, t) be the solution of

(axa4
- 4
m a2)
+- 2
X at
u(x, t) = y(x, t),

for t 2: 0, x E D = (0, 1), m, x, l > 0, the boundary conditions u(O, t) =


u(l, t) = 0 and a 2u(O, t)jax 2 =
a2u(l, t)jax 2 =
0, and the initial condi-
tions u(x, 0) = g(x) and au(x, O)jat = h(x), where g, h, andy are spec-
ified continuous functions. Consider an approximate representation u(x, t) =
Lk=l Ck(t) IPk(X) of the solution u, where (/lk are specified functions satisfying
9.3. Mechanics 703

at least the kinematical boundary conditions and the weights c k need to be deter-
mined. If we take ({Jk(x) =sin (k :n: x/ l), the Rayleigh-Ritz method gives

k4 :n:4 X 2 !ol
Ck(t) +- -4- q(t) =- y(x, t) sin (k, :n: xj l) dx,
ml ml o

where q(O) and Ck(O) result from g(x) = L~=l q(O) sin (k :n: x/ l) and h(x) =
I::~= I Ck (0) sin (k :n: xI l), respectively, and the dots denote differentiation with
respect to time. <>
Proof: The function u represents the displacement at cross section x and time t of a simply
supported beam with span l, stiffness x, and mass m per unit length. The beam strain
energy, SE, is

SE
(a2ii)2 dx = -:n:4 X ~ k 4q(t) 2,
= -X2 lolo -ax2 n
413 L
k=l

so that 8(SE) = (n 4 x/(21 3)) Lk=l k 4 ck(t) 8ck(t), where 8q(t) denotes the first order
variation of the coefficients ck (t). The first order variation of the external work, EW, is

8(EW) = fol (y(x, t)- m


02 ~~~· t))) 8ii(x, t) dx

= E[{ y(x, t) sin(k n xjl) dx - (m Z/2) ck(t)] 8q(t).

The differential equations for ck follow from the condition 8(EW) = 8(SE) that must hold
for any 8ck(t) by the principle of virtual displacements (Eq. 9.11).
The Rayleigh-Ritz and the Galerkin methods consider similar representations for
the solution. However, there are two notable differences between these two methods. First,
the trial functions in the Galerkin method must satisfy all boundary conditions while the
trial functions in the Rayleigh-Ritz can satisfy only the kinematical boundary conditions.
Second, the unknown coefficients of the Rayleigh-Ritz and the Galerkin representations
are determined from the principle of virtual displacement and an orthogonality condition,
respectively. •

9.3.2 Deterministic problems


Consider an elastic solid in an open bounded subset D c JRq, q = 1, 2, 3,
that is in equilibrium under some external actions. The finite element and the
Rayleigh-Ritz methods differ only by the trial functions used in analysis. The
support of the trial functions is D for the Rayleigh-Ritz method. On the other
hand, the trial functions used in the finite element method are defined locally
on non-overlapping subsets of D, called finite elements, that partition D ([28],
Section 3.2).

Example 9.24: Consider a linear elastic rod of length l > 0, constant stiffness
x > 0, unit cross section area, and specific weight p > 0, that is suspended
704 Chapter 9. Stochastic Systems and Input

vertically from one of its ends. Let x E D = (0, I) denote the distance from the
suspension point. We assume small deformations so that the strain field is equal
to the first derivative of the displacement function with respect to x. The finite
element method involves the following steps.

1. Partition D in finite elements, select nodes, and define node displacements. For
example, let 0 = x1 < x2 < · · · < Xn+l = I, partition Din the finite elements
Dk = (xk, Xk+l), select Xk to be the node coordinates, and denote by Uk the
displacement of node x k. k = 1, ... , n, n + 1, in the direction of coordinate x.

2. Postulate the displacement field in each finite element. For example, take the
displacement in the finite element k to be

3. Calculate the corresponding strain field in each finite element from

(k)(x) = du(k)(x) = [ __1_


y dX Xk+J-Xk

4. Apply the principle of virtual displacements for the postulated displacement


field, which gives Lk=l (uf af- yf) 8uk = 0, where ak = f~k+J X fJf fJk dx
denotes the element stiffness matrix, y k = f~k+l p a.f dx is the element force
matrix, and 8uk is the first variation of the node displacement vector Uk. We
cannot conclude that uf af - yT = 0 for each k = 1, ... , n since the variations
8uk are linearly related.
S.Define theglobaldisplacementvectoru = (u1, ... , Un+l) and setuk = ik u.
The (2, n + 1) matrix j k selects the coordinates k and k + 1 of u. With this
notation, the principle of virtual displacement becomes (uT aT - yT) 8u = 0,
where a = Lk=l if
af j k andy = Lk=l if
Yk denote the global stiffness and
force matrices. Because the coordinates of u are not related, we have au = y.
This condition can be used to calculate u and the corresponding stress and strain
fields.<>
Note: Let u(x) denote the rod displacement at x E [0, l] and u'(x) = du(x)fdx. We
JJ
have [x u 1 (x) llu 1 (x)- p llu(x)] dx = 0 by Eq. 9.11 under the assumptions that the rod
deformation is small and the material is linear elastic. Hence, the rod strain energy and
JJ<x/2)
the first order variation of the external work are (u'(x)) 2 dx and JJ p llu(x) dx,
respectively. •

9.3.3 Stochastic problems


The element and global stiffness and force matrices in Example 9.24 are
random if the material properties and the applied actions are uncertain. Therefore,
9.3. Mechanics 705

the resulting equilibrium condition is an algebraic equation with random coeffi-


cients and input. An extensive review of the stochastic finite element method can
be found in [114, 169, 190].

Example 9.25: Consider the problem in Example 9.24 but assume that the rod
stiffness is a random field A (x ), x E [0, l], and that the rod is subjected to a
random action Y(x), x E [0, l]. If we retain the finite element partition and the
functional form of the displacement field in Example 9.24, the element stiffness
and force matrices are
Xk+1 1Xk+J
Ak = 1 A(x) P[ Pk dx and Yk = (p + Y(x)) a:[ (x) dx,
Xk Xk

respectively. The global stiffness and force matrices A* and Y* are linear forms
of A k and Yk, respectively. Moments and other properties of the random matrices
Ak, Yk, A*, andY* can be calculated from the definition of these matrices and
the probability laws of A and Y. For example, the first two moments of the entries
Ak,pq of the matrices Ak are

E[Ak,pq] = 1 Xk+1
E[A(x)]
(
/3[ Pk)
pq
dx,

1
Xk
Xk+J1Xf+1
E[Ak,pq Ai,st] = E[A(x) A(y)] (Pf Pk) (Pf Pk) st dx dy
~ ~ M

provided that A is in L2. ~

The approach in Examples 9.24 and 9.25 can be applied to general solid
mechanics problems with random material properties, input, and/or initial and
boundary conditions. Consider an elastic solid in an open bounded subset D of
lRq, q = 1, 2, 3, that is in equilibrium. Let aDa and aDu denote the parts of the
boundary aD of D where traction and displacement are specified, respectively.
It is assumed that (1) the density of the material in D exhibits a random spatial
variation that can be modeled by a random field R(x), x E D, (2) the material
in D is linear elastic with random properties so that the relationship between the
stress and strain fields is given by Hooke's law S(x, t) = A(x) f(x, t), where the
random field A = AT denotes the stiffness tensor, and (3) the solid in D is indy-
namic equilibrium under the random action Y (x, t), x E aD a, and displacement
constraint on aDu for all t ::_:: 0.
Let Dk. k = 1, ... , n, be a collection of open subsets in D such that jj =
Uk=l Dk and Dk n D1 = 0 fork =!= l. The members of this partition of D are
called finite elements. We denote by aD a,k = aDa n Dk the common part of the
boundary of the finite element k and the boundary of D with specified traction. Let
U k be a vector collecting the displacements of the nodes of the finite element k.
The displacement and strain fields in this element are uCkl(x, t) = lXk(X) Uk(t)
and r<kl(x, t) = f3k(x) Uk(t), where a:k is a specified function and Pk results
706 Chapter 9. Stochastic Systems and Input

from «k by differentiation. The vector of global displacements U collects the


distinct node displacements in all U k 's and can be related to these vectors by
U k(t) = h U (t) (Example 9.24).

The vector of global displacements U is the solution of

M U(t) +A* U(t) = Y*(t), t ::=: 0, where (9.12)

M =~if (Lk R(x) «k(x)T «k(X) dx) h = ~ i[ Mk h.


A*= tH (L
k=l k
fJk(x)T A(x)fJk(x)dx) h = tH
k=l
AZh.

Y*(t) =~if (laDu,k Y(x, t)T a.k(x)da(x)) = ~jf YZ(t). (9.13)

Proof: The principle of virtual displacement gives (Eq. 9.11)

t lvk[ U k(t)T fJk(x)T A(x) fJk(x) 8U k(t) dx

t lvk{
k=I

= (-R(x)«k(x) ih<t)f «k(x)8Uk(t)dx

t lavu,k
{
k=I

+ Y(x, t)T «k(x)8Uk(t)da(x)


k=l

for b(t, x) set equal to the inertia force -R(x) U(t, x). The relationship between the el-
ement displacement vectors u(k) and the global displacement vector U, the requirement
that the above equation be satisfied for all virtual displacements 8U (t), and elementary ma-
trix manipulations yield Eqs. 9.12 and 9.13. The matrices Mb AZ, and YZ can be random
and are referred to as element mass, stiffness, and force matrices, respectively. The global
mass, stiffness, and force matrices are denoted by M, A*, andY*. •

9.3.4 Methods of analysis


Our objective is to find second moment and other properties of the process
U defined by Eq. 9.12. We discuss solutions ofEq. 9.12 based on (1) some of the
methods in Section 9.2, referred to as classical solutions, and (2) a method based
on polynomial chaos.
9.3. Mechanics 707

9.3.4.1 Classical methods


The methods in Section 9.2 can be used to find second moment and other
properties of the solution U of Eq. 9.12. The applications of these methods re-
quires us to find properties of the random matrices M, A*, and Y*. These proper-
ties can be calculated from their definitions in Eq. 9.13 and the probability laws of
the random fields R(x), A(x), and Y(x, t). For example, the first two moments
of the element mass matrices are

E[Mk] = /, E[R(x)]«k(t/ «k(x)dx,


Dk
E[Mk,iJ Mt,pq]

= { E[R(x) R(y)] ( «k(X)T «k(X)) .. ( «t(Y)T «t(Y)) dx dy.


]DkXDt lj pq

It is common in applications to calculate properties of the random matrices


M, A*, Y*, M k. Ak', and Yk' approximately to reduce the computation time. Most
approximations are based on the representation of the random fields R, A, and Y
by random functions that are constant in each finite element, for example, we can
take these functions equal in each finite element Dk with (1) one of their values
in this element or (2) their spatial average over Dk. There is little justification for
these approximations. If we take R(x) :::::: R(~ k) for all x E Dk. where ~k is a
point in Dk, the calculation of the second moment properties of M k,

E[Mk,iJ Mt,pq] and

= E[R(~k) R(~t)] { (a.kCxl a.k(x)) .. ( «t(Y)T «t(Y)) dx dy,


hkx~ U N

is reduced significantly. On the other hand, the calculation efforts required to find
the second moment properties of M k exactly and by the second approximation are
similar.
Frequently, the parameters defining constitutive laws in continuum material
models are assumed to be random variables and/or fields for use in stochastic fi-
nite element studies, for example, the Lame constants of linear elastic materials
are represented by random fields [71, 114]. These representations are not recom-
mended since they can be inconsistent with the material microstructure [136]. The
incorporation of the microstructure features in the analysis does not pose any con-
ceptual difficulty. It requires the use of adequate mechanical models and the cali-
bration of the random coefficients of these models to the available information on
the material microstructure. We have used such a representation in Section 8.6.2.2
to describe properties of metallic polycrystals.
708 Chapter 9. Stochastic Systems and Input

Example 9.26: Let U be the solution of A U = Y, where A and Y are indepen-


dent real-valued random variables with finite mean. This stochastic problem is a
special case ofEq. 9.12 defining a static scalar problem. H (A- E[A])I E[A] < 1
a.s., then (Section 8.3.1.4)

00
A- E[A])r] Y
+~
(
U = [
l E[A] E[Af

Approximations of the above solution obtained by retaining a finite number of


terms from the above series can be used to calculate properties of U. For example,
the approximate mean of U based on the first three terms of the above series
1 1
solution is ( 1+a I f.J, ~) (J.J, y I f.J, A), where f.J, A and a denote the mean and variance
of A and f.J,Y is the mean of Y. o
Example 9.27: Let X be the solution of dX (t) = -(a+ e R) X (t) dt +a dB(t),
t 2: 0, with zero initial conditions, where a > 0 and a are some constants, R
is a random variable with mean zero and standard deviation a r > 0, e is a small
parameter, and B denotes a Brownian motion process that is independent of R.
The above equation is a special case of Eq. 9.12.
Let Yij(t I R) = E[(Xi(t)- E[Xi(t)]) (Xj(t)- E[Xj(t)l) I R], i, j =
0, 1, ... , denote the conditional covariance functions of the processes Xi in the
perturbation solution X (t) = X o(t) + e X 1 ( t) + · · · of the above equation. These
functions are

Yoo(t I R) =;a (1- e-


2
2 at),

YOI(t I R) =- a2 R
2a
(1- e-2at - te-2at)'
2a

a2 R2 ( 1 - e-2at t )
Yll
(t I R) =~ 2a2
- -a e-2at - t2 e-2at '

and approach a 21(2a), -a 2 Rl(4a 2), and a 2 R 21(4a 3 ) as t --+ oo. The sta-
tionary variance of X based on the first order perturbation X o(t) + eX 1 (t) is
Var[X(t)] = (al(2a)) (1 +e 2 a;l(2a 3)). 0

Proof: The JR2 -valued process (Xo, Xt) is the solution of

d[ ~~~~~ J= [ =~ ~a J[ ~~~~~ Jdt + [ ~ JdB(t)


so that Yoo(t I R) =
-2 a Yoo(t I R) + u 2 , YO! (t I R) =
-2 a Yo! (t I R) - R Yoo(t I R),
and Yll (t I R) = -2a Y11 (t I R) -2 R YO! (t I R) (Section 7.2.1.1). The solutions of these
equations with zero initial conditions give the above expressions for Yij . Conditional on R
and using the representation X (t) ~ Xo(t) +eX 1(t), the approximate second moment of
X is Yoo(t I R) + 2 e YO! (t I R) + e2 Yll (t I R). The condition on R can be eliminated by
9.3. Mechanics 709

averaging with respect to this random variable. The approximate variance of X based on
the first order perturbation does not include all terms of order s2 .
A similar result can be obtained by the first order Taylor method. The processes
X and V in the first order Taylor approximation X(t) ~ X(t) + V(t) R* of X satisfy the
differential equations dX(t) =-a X(t) dt +a dB(t) and dV(t) =-a V(t) dt- X(t) dt,
where R* = s R. •

9.3.4.2 Polynomial chaos


Let (Q, F, P) be a probability space and let Gi, i = 1, 2, ... , be indepen-
dent N(O, 1) random variables defined on this space. Denote by Sp the collection
of all polynomials of Gi with degree smaller than an integer p 2: 0. The family of
polynomials in Sp that are orthogonal to every polynomial in Sz, l < p, is denoted
by Sp and is referred to as the p th polynomial chaos. Two polynomials are said
to be orthogonal if the expectation of their product is zero. We setS o = So = { 1}.
The polynomials in Sp can be obtained simply. For example, let ~o = 1,
~~ (Gi) = Gi, and ~2(Gk, Gz) = ao + a1 Gk + a2 Gz + a12 Gk Gz be members of
So, St, and S2, respectively. Then~~ (Gi) is also in S1 because E[~l (Gi) ~o] =
E[Gi] = 0. The orthogonality conditions,

E[~2(Gk, Gz) ~o] = 0 and E[~2(Gk, Gz) ~~ (Gi)] = 0,

imply ao + a12 8kz = 0 and a18ki + a2 8u = 0 so that ~2(Gk, Gz) = Gk Gz- OkJ
is in S2. Similar orthogonality conditions show that the polynomials in S3 have
the form ~3(Gk. Gz, Gm) = Gk Gz Gm - Gk 8zm - Gz8km - Gm OkJ, where Gk.
Gz, and Gm are independent copies of N(O, 1). A table giving polynomial chaos
of various orders can be found in ([71], Section 2.4).

If X is a random variable with finite variance defined on (Q, F, P), the series
representation,

(9.14)

of X is convergent in mean square, where a~;:::::~; are some real constants


([109], Chapter 6, [71], Section 2.4).

Note: The convergence of the series in Eq. 9.14 indicates that we can approximate a square
integrable random variable to any degree of accuracy by polynomials of Gaussian variables.
The coefficients a~::::::~; in the above expression of X can be determined simply since the
Hermite polynomials t;p are orthogonal ([71], Section 2.4).
Approximations X of X defined by finite sums including the first terms in Eq. 9.14,
rather than the series representation in this equation, have been employed to solve stochastic
problems [71]. The statement in Eq. 9.14 has only been used in these applications to select
the functional form for X. A
710 Chapter 9. Stochastic Systems and Input

Example 9.28: Let G be an N (0, 1) random variable and consider its memoryless
transformations X 1 = exp( G) and X 2 = IG 1. These random variables can be
approximated by
m m
x1 = L ap,1 ~p(G) and x2 = L:ap,2 ~p(G),
p=O p=O

respectively, where ~p(G) denote Hermite or chaos polynomials of G "' N(O, 1)


and ap,k. k = 1, 2, are some coefficients. For example, ap,O = e 112 and ap,1 =
_ 1, ... , m, ..,~or x- 1·
e 112; p.,, p-
Figure 9.10 shows the exact and approximate mappings, G 1-+ X k and
G t-+ Xk, for several values of m. These mappings suggest that the tails of
the distributions of the random variables X2 and X2 may differ significantly. Fig-

Figure 9.10. Exact and approximate mappings G 1-+ X k and G 1-+ Xk (adapted
from [63])

ure 9.11 showing ratios of approximate to exact variances and kurtosis coefficients
confirms this observation [63]. <>

Proof: Consider an approximation X= L.;=O ap ~p(G) of an arbitrary square integrable


random variable X suggested by Eq. 9.14. The coefficients ap of this approximation can
be determined from the conditions
-
E[X ~q(G)] ~ E[X ~q(G)] = aq E[~q(G) 2 ], q = 0, 1, ... , m,
where the last equality holds since E[~p(G) ~q(G)] = 0 for p i= q. We have aq
E[X ~q(G)]/ E[~q(G) 2 ], q = 0, 1, ... , m. These equations can be used to calculate the
coefficients ap,k in the above approximate representations for Xk, k = 1, 2. •

Example 9.29: Let X(t) = F- 1 o <I>(G(t)) = g(G(t)), t E [0, o], t" > 0, be
a translation process, where F is a distribution and G denotes a stationary Gaus-
sian process with mean zero and covariance function p(t") = E[G(t + t") G(t)]
9.3. Mechanics 711

4.5
X1=exp(G)
X2=1 Gl
3.5

o.

X2=1 Gl

X1=exp(G)

Variance ratios

9 1

Figure 9.11. Ratios of approximate to exact variances and kurtosis coefficients


(adapted from [63])

(Section 3.6.6). The process X can be approximated by

m*
Y(t) = l:)v(t)Sv(Gip ... , Giv),
v=O

where bv (-) are some deterministic functions of time depending on the marginal
distribution and the correlation function of X, and l; v are Hermite polynomials. <>

Proof: We can approximate X(t) at a fixed timet :::: 0 by X(t) = r,;=oap ~p(G(t))
since X(t) is a random variable (Example 9.28). Since X is a stationary process, the
coefficients ap in the representation of X do not depend on time and can be calculated
from ap::: E[X(t) ~q(G(t))]/ E[~p(G(t)) 2 ]. The approximation X represents a nonlinear
memoryless transformation of a Gaussian process and is not very useful for calculations.
For example, it is difficult to find even the marginal distribution of X.
Consider the approximation G(t) = Lk=l Gk ((Jk(t) of G(t), where Gt, ... , Gm
denote independent N(O, 1) variables and CfJk are specified deterministic functions. For
example, the spectral representation (Sections 3.9.4.1 and 5.3.1.1) and the Karhunen-Loeve
expansion (Section 3.9.4.4) can be used to define G. The approximate representation X
with G in place of G becomes

where ~v(Gi 1 , ••• , Giv) are Hermite polynomials and bv(·) are functions of time that can
be obtained from the above equality [71].
In many applications X is specified partially by its marginal distribution F and cor-
relation function r(-r) = E[X(t + 1:) X(t)]. The approximations X andY of X need to be
calibrated such that they match as closely as possible these properties of X. The accuracy
of the marginal distribution F of X depends on the number of terms considered in its rep-
resentation. The marginal distribution of Y is likely to be less accurate than F because its
712 Chapter 9. Stochastic Systems and Input

definition is based on the approximation {; of G. The correlation function of X is defined


by the memoryless mapping G(t) r-+ X(t) and the correlation function p(·) of G. Since X
is a translation process, there is no guarantee that there is a correlation function p ( ·) such
that E[X(t + -r) X(t)] is a satisfactory approximation of r(-r) (Section 3.6.6 in this book,
[79], Section 3.1.1). Similar comments apply to the approximation :Y. •

Example 9.30: Consider a beam with length l > 0 and random stiffness. The
beam is fixed and free at its left and right ends, respectively, and is subjected to a
unit transverse load at its free end. Suppose that the beam stiffness can be modeled
by a homogeneous Gaussian random field Z (x ), x E [0, l], with mean f.L z > 0 and
covariance function Cz. Let Zm(x) = f.Lz + L:;=l Gq hq(x), x E [0, l], be an
approximate parametric representation of Z, where Gq are independent N(O, 1)
random variables and hq are specified deterministic functions. The above approx-
imation of Z can be obtained, for example, by the spectral representation method
(Sections 3.9.4.1 and 5.3.1.1 in this book, [79], Section 4.3.2) or the Karhunen-
Loeve series (Section 3.9.4.4 in this book, [71], Section 2.3).
Let 0 = Xl < x2 < · · · < Xn+l = l be nodes in [0, l] defining the finite
elements Dk = (xk. Xk+l), k = 1, ... , n. The displacement vector U k and the
interpolation matrix a k (x) for the beam element k are

Uk = [U(xk), U'(xk), U(Xk+l), U'(xk+l)r,

«k(X)=[l-3~ 2 +2e. h<~-2~ 2 +~ 3 ), 3~ 2 -2~\ h<-~ 2 +~ 3 )].

where U'(x) = dU(x)fdx, ~ = (x -xk)/ h, h = (Xk+l -xk). The corresponding


displacement field in this element is U(kl(x) = «k(X) Uk.
Figure 9.12 shows estimates of the density f corresponding to the exact
tip displacement U (l) and approximations of U (l) given by polynomial chaos of
order 1, 2, and 3 based on two terms in the Karhunen-Loeve representation Z m of
Z. The estimates are for f.Lz = 1 and Cz(Xl, x2) = (0.3) 2 exp( -lxl - x21) and
are based on 10,000 independent samples of the tip displacement. The differences
between the Monte Carlo estimates of the density of U (l) based on the exact
solution and polynomial chaos solutions are relatively small and seem to decrease
with the order of the polynomial chaos expansions. <>

Note: LetA.k(x) = d 2a.k(x)fdx 2. The strain energy in the finite element k is


9.3. Mechanics 713

15r-----~.-----.-------------,-------------,

1 1 Density of U([)
1
~ 3rd order polynomial chaos
2nd order polynomial chaos
10

./ 1st order polynomial chaos

· MC simulation
5

0.4 0.6
u

Figure 9.12. Estimates of the density of U (l) by Monte Carlo simulation using
the exact solution and polynomial chaos solutions for f.k z = 1 and cz (x1, x2)
(0.3) 2 exp(-lxl- x21) (adapted from [71], Fig. 5.36, p. 147)

so that the global stiffness becomes (Eq. 9.13)

The equilibrium condition in Eq. 9.12 gives [ ao + :L;=l Gq aq J U = y*, where y*


relates to the loading. The solution of this equation by polynomial chaos involves three
steps. First, U needs to be represented by a polynomial chaos of the random variables
(Gl, ... , Gm) used in the representation of Z. Second, the coefficients a~::::::~~ of the
polynomial chaos representation for U (Eq. 9.14) have to be calculated from the above
equilibrium condition and the orthogonality property of the polynomial chaos. Third, the
resulting expression of U can be used to approximate probabilistic properties of U.
The Gaussian model used for Z is not physically acceptable because the beam stiff-
ness cannot take negative values. It has been used in [71] to simplify calculations . .i.
714 Chapter 9. Stochastic Systems and Input

9.4 Physics
We consider large systems whose macroscopic properties can be described
by global variables. The Boltzmann equation, the Ising model, and noise induced
transitions are discussed in Sections 9.4.1, 9.4.2, and 9.4.3, respectively. The
Boltzmann equation examines systems consisting of a large number of identical
particles, and gives the evolution in time of the fraction of particles with some
attributes, the Ising model provides a global characterization of magnetism for
ferromagnetic materials, and diffusion processes can model the predator-prey re-
lationship in a random environment. In some cases, the qualitative behavior of
these systems changes abruptly and this change is referred to as phase transition.

9.4.1 Boltzmann transport equation


Let x = (x1, x2, x3) and p = (Pl, P2· P3) denote the position and momen-
tum vectors of a particle with mass min JR3 , where p = m v and v = (v1, v2, v3)
denotes the particle velocity. The motion of the particle is completely character-
ized by the evolution of the vector (x, p) E JR 6 in time.
Consider a system consisting of n identical particles. We can study the
evolution of this system by tracking the motion of every particle or by finding the
evolution in time of a macroscopic system property, for example, the density or
temperature. If n is very large, the first approach is not feasible. On the other
hand, the second approach applies and delivers macroscopic attributes directly.
Denote by g(x, p; t) dx dp the number of particles in the infinitesimal sub-
set (x, x + dx) x (p, p + d p) of the space JR6 with coordinates (x, p) at a time
t ::: 0.
If g has continuous partial derivatives, it satisfies the Boltzmann transport
equation
3 3
ag + 2_ LPi ~ + Lhi ~ = (ag) (9.15)
at m i=l axi i=l api at coli
where a= (o1, 02, 03) = dvfdt, h = m a, and (ag;at)coll quantifies the effect
of particle collisions.

Proof: If there are no collisions, the number of particles in an elementary volume along a
flow line is preserved so that g(x, p; t) = g(x + dx, p + d p; t + dt). This conservation
condition gives Eq. 9.15 by expanding its right side in Taylor series, taking the limit of the
resulting expression as dt ~ 0, and using the relationships dx = vdt and dp = hdt
that give the change of the position and momentum vectors in (t, t + dt). The Boltzmann
equation with no collisions and the Liouville equation in Eq. 9.4 coincide.
If there are collisions, the term (agjat)coll needs to be specified. Generally, this
term is approximated by (agjat)coll = -(g- go)fr:c, where go is the distribution g at
thermal equilibrium and r:c denotes a relaxation time ([4], Section 9.1). This approximation
gives (g- go)t = (g- go)r=O exp(-tfr:c) since agofat = 0 by the definition of the
equilibrium distribution. •
9.4. Physics 715

Example 9.31: The steady-state Boltzmann transport equation is given by Eq. 9.15
with ag ;at = 0. If rc is a small parameter and (ag ;at)coll = -(g- go)/rc, the
first order approximation of this equation is

with the notation in Eq. 9.15. <>

Proof: The perturbation solution 80 + Tc 81 + · · · introduced in the steady-state Boltzmann


transport equation yields

Tc (
1
-
3 a 3 a)
LPi-+Lh;- (8o+rc8t+···)=-[C8o+rc81+···)-8o]
m i=1 ax; i=l ap;

The Boltzmann equation is deterministic so that its presentation in this


chapter may appear out of place. We give the Boltzmann equation here because it
is closely related to the Liouville equation (Eq. 9.4) and its output is a probability
density function if normalized. The Boltzmann equation has numerous application
in physics and engineering depending on the particular functional form selected
for (agjat)coll ([4], Chapter 9).

9.4.2 Ising model


The ferromagnetism phenomenon can be observed in some metals at tem-
peratures lower than a characteristic value Tc. called the Curie temperature, and
is caused by the polarization of a fraction of atomic spins in one direction. At
temperatures exceeding Tc the atomic spins have random orientation so that no
macroscopic magnetic field results.
The Ising model provides a simple and approximate representation of the
structure of ferromagnetic materials and of the ferromagnetism phenomenon. The
model consists of a lattice in JRq, q = 1, 2, 3, that has n sites organized in various
geometrical shapes. For example, a two-dimensional square lattice is a square
with equally spaced sites in the two coordinates. LetS; = ±1 denote the atomic
spin at site i = 1, ... , n. The vector S = (S 1, ... , Sn) defines a particular lattice
configuration.
716 Chapter 9. Stochastic Systems and Input

The energy of the random field Ising model with n sites and configuration S is
n n
£=- L BijSiSj- .L<h+Zi)Si, (9.16)
{ij}=l i=l

where Bij denotes the interaction energy and h + Z i is the energy corresponding
to an external magnetic field with a deterministic component h and a random
component Z i.

Note: The classical Ising model is given by Eq. 9.16 with Zt = 0 at all sites.
The symbol L(ij)=l indicates that summation is performed over the nearest neigh-
bor sites. For example, the summation has at each site of a two-dimensional square lattice
four terms corresponding the right/left and above/below neighboring sites ([4], Chapter 11,
[103], Chapter 14). The random field Z can be interpreted as a spatially fluctuating external
field superposed on h. J.

If the interaction energy is isotropic, that is, B ij = B, B > 0, and the fields
hand Zt are zero, the minimum of the resulting lattice energy is-s y nj2, where
y n j2 gives the number of terms in the first summation of Eq. 9.16 and y = 4 for
a square lattice. This minimum value of £ corresponds to a completely polarized
lattice, that is, a lattice with all spins of the same orientation. It is not possible
to find analytically the minimum of the lattice energy for the general setting in
Eq. 9.16.
We give an alternative form of the lattice energy that can be used to de-
velop approximations for the minimum lattice energy, and offers some interesting
interpretations of the ferromagnetism phenomenon.

If Bij = B, the energy£ in Eq. 9.16 is


n
£ = -4sN++ +2(sy -h)N+ + (h- syj2)n- ,Ls1 Zt, (9.17)
i=l

where N+ and N_ = n- N+ denote the number of up (Si = +1) and down


(S1 = -1) spins in the lattice with a configurationS, respectively, and N ++ is
the number of nearest neighbor pairs with up spins.

Note: Denote by N++• N+-· and N- the number of nearest neighbor pairs with spins
(+, +), (+, -) and (-, +), and (-, -), respectively. It can be shown that y N+ =
2N++ + N+- andy N- = 2N- + N+- so that N+- = y N+- 2N++ and N_ =
(y N-- N+-)/2 = y n/2 + N++- y N+ ([103], Section 14.1). Hence, we have

n
L si Sj = N++ + N- - N+- = 4 N++ - 2 y N+ + y n/2
{i,j}=l

and I:?=t Si = N+- N_ = 2N+- n ([103], Section 14.1).


9.4. Physics 717

The expression of£ in Eq. 9.17 shows that, if Z; =0 for all i =


1, ... , n, the lattice
energy does not depend on the details of the configuration S but only on the lattice size n
and its macroscopic attributes N+ and N++· Hence, many lattice configurations can have
the same energy. A

Example 9.32: Suppose that Bij = e > 0 and Z; = 0 for all i, j = 1, ... , n, so
that £ can be expressed in terms of the macroscopic attributes N + and N++ of the
Ising lattice (Eq. 9.17). Let L = (N+ - N-)/n E [ -1, 1] denote the fraction of
sites with up spin in excess of sites with down spin, that is, the magnetization per
particle.
Numerical algorithms are needed to find the minimum of the lattice energy
even for the special case in which Z i = 0 at all sites, as considered here. To sim-
plify the analysis, the Bragg-Williams model assumes N++ = (N+fn) 2 (y n)/2,
which yields the expression £ = -n (e y L 2 /2 + h L) for the lattice energy.
If h = 0, then£ = -n (e y L 2 /2) with minimum -n e y /2 corresponding to
L = 1. If h :j: 0, the determination of the minimum energy is less simple. We use
the fact that the value of L minimizing £ maximizes the partition function

1 '
Q= L
L=-1
[n (1 + L)/2]~[n (1 - L)/2]! exp [ -{3 £]

where f3 = 1/(k D. k denotes the Boltzmann constant, and Tis the temperature
([103], Section 14.4). If n -* oo, the partition function is maximized at L = l *,
where l* is the solution of l* =tanh ((h + y e l*)/(k D). If h = 0, this equation
has the solutions l* = 0 and l* = ±lo for T > Tc and T < Tc. respectively,
where Tc = e y I k. Hence, the lattice is ferromagnetic forT < Tc but has no
magnetization otherwise. <>

Note: The Bragg-Williams model and arguments similar to those used to obtain Eq. 9.17
yield the expression of the lattice energy.
The partition function is Q = Lsi exp(-fl £),where the summation is over all
possible values of S; ([103], Section 14.1). Because Lis determined by N+, the number
of configurations (S1, ... , Sn) having the same value of L is equal to the number of ways
in which we can extract N+ out of n objects, that is, n!/(N+! N- !). The equation l* =
tanh (h/(k D + 8 y l* /(k D) giving the value of L that maximizes Q results by using
Sterling's formula to approximate the combinatorial coefficient in the expression of Q and
retaining the dominant term of Q ([103], Section 14.4).
If h = 0, we have l* = tanh(eyl*/(kD) so that l* = 0 for 8 yj(kD < 1 or
[* = 0 and[* = ±lo fore y /(k n > 1, where lo E (0, 1). The solution[* = 0 under
the condition e y 1(k n > 1 is not valid since it corresponds to a minimum rather than a
maximum of Q. The solution l* = ±lo is not unique because if h = 0 there is no intrinsic
difference between up and down spins. A

Example 9.33: Consider a lattice with n = 20 x 20 = 400 sites, Bij = e = 1,


h = 0, and let Zi, i = 1, ... , n, be independent Gaussian variables with mean
718 Chapter 9. Stochastic Systems and Input

zero and variance a 2 • Let Z(w) be a sample of Z = (Zt, ... , Zn). and denote
by Emin(w) the minimum of the energy in Eq. 9.16 corresponding to this noise
sample.
Figure 9.13 shows estimates of the mean and the coefficient of variation of
Emin = Emin/ (n a) for noise intensities a in the range [0, 10]. The estimates have
been calculated from 1,000 samples of £min at each noise level. The figure also
shows samples of the lattice configuration S at the minimum energy for several

o.---.----r--~----r---.----r--~----r---.---~0

-2 -0.01

-4 X 0 -0.02

o Coefficient of variation of £!m1n.


-6 0
/ (right scale) -0.03

-/{rr/2-1)/n o o o o o o o o
-----~-9-~-----------------
-8 '---><-----'------'-----'-----'-------'------'-----'-----'-------'-----_l-0.04
0 2 3 4 5 6 7 8 9 10
Noise scale, cr
cr=4 cr=6 cr=10

~iii
Figure 9.13. Estimates of the mean and standard deviation of Emin

noise levels. For a = 0 the minimum energy is -y n/2 so that its coefficient of
variation is zero. The magnitude of the coefficient of variation of &min increases
with the noise, and approaches an asymptotic value of - J (1T /2 - 1) In, which
is 0.0378 for n = 400. The estimated mean of £min converges to -,J2!1i =
-0.0378 as the noise intensity increases. <>
Note: An algorithm in Section 5.2.1 has been used to generate independent samples of Z
needed to construct the plots in Fig. 9.13.
If the noise is very large, the atomic spins tend to have the same orientation as the
noise so that (1) the contribution of the first term in Eq. 9.16 becomes negligible and (2) the
second term is approximately equal in distribution with- L:?=l IZil. Hence, &min can be
9.4. Physics 719

approximated by the random variable X = -(1/n) Z:?=l IGi I, where Gi are independent
copies of N(O, 1). Because E[IG1Il = .j2fii and E[IG1I 2l = 1, the mean and variance
of X are -.j2{ii and (1 - 2/rr:)jn, respectively, which yields the stated coefficient of
variation of Emin ~

Example 9.34: The previous example examines the random field Ising model in
Eq. 9.16 with h = 0 at an arbitrary temperature T. Here we take T = 0 and
h i= 0. If the noise Zi is zero at all sites and h < 0 has a large magnitude,
then Si = -1 at all sites, that is, all spins point down. As the external field h is
increased, it reaches a critical value at which all spins change suddenly their ori-
entation and magnetization jumps from -1 to + 1 (Fig. 9.14). Hence, the lattice
experiences a phase transition since the atomic spins at all its sites change orien-

0.8

0.6

0.4

0.2

-0.2

-0.4

-0.6
0' =1.5
-0.8

-1
-3 -2 -1 0 4 5 6
h

Figure 9.14. Magnetization in the presence of a random external field (adapted


from [167])

tation abruptly. Suppose now that the previous experiment is repeated and that
the noise in Eq. 9.16 is not zero but consists of independent Gaussian variables
Zi with mean zero and variance a 2 • We can still observe a phase transition in the
presence of noise. However, the transition is less abrupt and becomes smoother
as the noise intensity increases. The Ising model in this example has been used to
study Barkhausen noise in magnetic materials [168]. <>
Note: For small noise intensities, that is, small values of a, the couplings between spins
dominate and the lattice changes polarization through a large avalanche that sweeps through
the entire system as the external field is increased to a particular level. For large values of
a, the coupling between spins is insignificant compared to the local disorder, and the lattice
tend to magnetize in many single-spin flips, that is, avalanches contain a single spin. There
exists a critical value ac of the disorder parameter a defining a phase transition between
720 Chapter 9. Stochastic Systems and Input

these two limit cases. Near the critical point ac, avalanches can be described by a power
law up to a size depending on the magnitude of Ia- acl [168]. A

9.4.3 Noise induced transitions


Let X be the solution of the stochastic differential equation

dX(t) = a(X(t)) dt + b(X(t)) dS(t), t:::: 0, (9.18)

where S is either a scaled Brownian motion or a compound Poisson process. It is


assumed that the functions a, b and the noise S are such that Eq. 9.18 has a unique
solution (Sections 4.7.1.1 and 4.7.2). Stochastic problems of the type in Eq. 9.18
have been examined in Section 8.7. We revisit this type of stochastic problems in
this section because (1) elementary extensions of Eq. 9.18 obtained, for example
by randomizing some of the coefficients of and/or supplying an additive noise to
this equation, yield stochastic problems of the type in Eq. 9.1 and (2) the state
X in Eq. 9.18 can experience notable phase transitions, encountered in physics,
biology, chemistry, and other fields [102].
Our objectives in this section and Section 8.7 differ slightly. In Section 8.7
we have studied the stability of the stationary solution X s of stochastic differen-
tial equations of the type in Eq. 9.18 based on the Lyapunov exponent. A main
objective here is to determine whether the density fs of Xs can exhibit qualitative
changes under some conditions. If such changes occur, we say that f s undergoes
transitions. If the transitions of fs are caused by the noise intensity, they are
refer to as noise induced transitions. Qualitative changes in fs indicate possi-
ble qualitative changes in some macroscopic properties of a large system ([102],
Chapter 1).
We illustrate the transition phenomenon by some extensions of the deter-
ministic Verhulst model describing the growth of a biological population. The
state x E [0, oo) of this model is a measure of the population size, for example,
the average population per unit area, and satisfies the differential equation

i:(t) = px(t) -x(t) 2 , t > 0, (9.19)

where p x(t) is the rate of population increase or decrease depending on the sign
of p E IR and the term -x(t) 2 accounts for the fact that resources are limited. The
parameter p is positive and negative for a favorable and hostile environment. The
solution of the Verhulst model is

x(t)=x(O)ept (1+x(O)(eP 1 -l)/pr 1 . (9.20)

The stable steady-state solution of the model is x s ( t) = 0 for p < 0 and x s ( t) = p


for p > 0. The steady-state solution Xs(t) = 0 for p < 0 becomes unstable at
p = 0 and a new steady-state solution Xs (t) = p emerges. The system undergoes
a phase transition at p = 0. Hence, Xs can be viewed as a degenerate random
9.4. Physics 721

variable with the density Is (x) = 8 (x) for p < 0 and Is (x) = 8 (x - p) for p > 0,
where 8 denotes the Dirac delta function.
The Verhulst model is a special case of Eq. 9.1 in which V has deterministic
coefficients and Y is zero. The following two examples consider extensions of
the Verhulst model that account for the random variations of the environment.
It is shown that the state of these models experience one or more noise induced
transitions.
Example 9.35: Suppose that the rate p in the Verhulst model is replaced by p +
a dB(t)jdt, where B denotes a Brownian motion and a is a constant giving the
scale of the environmental noise. The state X of this version of the Verhulst model
is a diffusion process defined by the stochastic differential equation

dX(t) = (p X(t)- X(t) 2 ) dt +a X(t) dB(t), t > 0.


Let Xs be the stationary solution of the above equation. If p > a 2 /2, the density
of Xs exists and is ls(x) = cx 2 (pfa 2 -l) exp (-2xja 2 ), x > 0, where c > 0
is a constant. If p < a 2 j2, then ls(x) = 8(x). The mean and variance of the
non-trivial stationary solution Xs, that is, Xs for p > a 2 /2, are E[Xs(t)] = p and
Var[Xs(t)] = p a 2 /2, so that the expectation of Xs(t) for p > a 2/2 coincides
with the steady-state solution Xs of the deterministic Verhulst model. Also, X s =
Xs = 0 for p < a 2 /2.
Figure 9.15 shows the stationary density Is for a = 1 and pin the ranges
(-oo, a 2 /2), (a 2 /2, a 2 ), and (a 2 , oo ). The stationary density Is has transitions
at p = a 2 /2 and p = a 2 , in contrast to the Verhulst model that has a single

0.9
cl-!2<p<cl- (p=3/4)
0.8

0.7
p<cl-12 (p=O)

0.5 1.5 2 2.5 3 3.5 4


X

Figure 9.15. The density Is of the stationary state Xs for a = 1, p = 0, p = 3/4,


andp = 2

transition at p = 0. The transitions in Fig. 9.15 represent qualitative changes in


722 Chapter 9. Stochastic Systems and Input

the functional form of fs and are caused by random fluctuations in the environ-
ment since E[p dt +a dB(t)]/dt = p has the same value as in the deterministic
Verhulst model ([102], Section 6.4).
It can be shown that the above differential equation has the solution

X (0) exp [ (p - a 2 /2) t +a B(t)]


X(t) = 1 ,
1 +X (0) fo exp [(p- a 2 /2) s +a B(s)] ds

where X (0) denotes the initial state ([115], p. 125). Figure 9.16 shows samples of

1.5 r r - - - - - - - - - , 3.---------,
p =0.0 p =0.75 p=2.0
2 4

0.5

0
0
u 50
time

Figure 9.16. Samples of X for a = 1 and p = 0, 3/4, and 2


X for a = 1 and p = 0, 3/4, and 2, that is, the values of pin Fig. 9.15. The sample
behavior is consistent with the properties of the stationary density in Fig. 9.15. <>

Proof: The density l(x; t I xo; 0) of X(t) I (X(O) = xo) satisfies the Fokker-Planck
equation (Section 7.3.1.3)

a1= - -
- a ( (px-x 2 )1 ) +cr2-a2
- ( x 2 I).
at ax 2 ax2
The stationary density ls(x) = limt--+oo l(x; t I xo; 0) is the solution of the above equa-
tion with aIs 1at = o, that is,

d 0"2 d2 )
0 = - dx ( (p x - x2) Is) +2 dx2 (x2 Is ,

so that (cr 2 /2) d(x 2 ls)/dx = (p x- x 2) Is or


dls(x)
Is (x)
= 2_
cr 2
(p- x
cr 2 _ 1)

since Is and its first derivative approach zero as lx I -+ oo. The above equation gives
In <fs(x)) = 2 (p- cr 2) ln(x)jcr 2 - 2xjcr 2 + q by integration, where q is a constant.
The stated expression of Is results by elementary manipulations.
If p > a 2j2, the integral J000 x 2 (p/0" 2 -l) e- 2 xf0" 2 dx is bounded so that Is exists
and has the above expression. If p < cr 2 /2, Is is a delta function at x = 0, that is,
9.4. Physics 723

ls(x) = 8(x) ([102], Section 6.4), a result consistent with the expression of X(t) showing
that, if p < u 2 /2, then X(t) --+ 0 a.s. as t --+ oo since p- u 2/2 + u B(t)/t converges
a.s. top- u 2 /2 as t --+ oo (Section 8.7). Also, note that the first two moments of Xs have
the same expression for p > u 2 /2. Hence, the qualitative change in Is at p = u 2 cannot
be captured by the second moment characterization of Xs.
The solution x(t) = x(O) eP 1 [1 + x(O) (eP 1 - 1)/ p rl ofEq. 9.19 has the asymp-
totic value Xs = limt-Hxl x(t) = 0 and Xs = p for p < 0 and p > 0, respectively. Hence,
p = 0 constitutes a bifurcation point between two stable solutions. The steady-state solu-
tion Xs coincides with the mean of the stationary solution Xs for all values of p.
The Lyapunov exponent ALE obtained by the method in Section 8.7 is ALE = p-
u 2/2 and ALE = (1 - m) (p - u 2/2), m = 2, for the trivial and non-trivial stationary
solutions, respectively. Since the stationary solution is stable a.s. if ALE < 0, the trivial and
non-trivial stationary solutions are stable a.s. for p < u 2 /2 and p > u 2/2, respectively,
in agreement with the results in this section showing that Is (x) = 8(x) for p < u 2/2 and
Xs > 0 a.s. for p > u 2/2. The approach in Section 8.7 delivers only the value p = u 2/2
of p at which the stationary solution Xs bifurcates but not the transition of Is at p = u 2.
However, the Lyapunov exponent method can be extended to detect the transition of !s at
p = u 2 [10].
We have modeled the uncertainty in the environment by a Gaussian white noise
for simplicity. A broadband noise with finite variance would have been more realistic. If
such a noise were used to model the environment, the differential equation for X would
have been a Stratonovich rather than an It() equation. The relationship between the It() and
Stratonovich differential equations in Section 4.7.1.2 can be used to recast a Stratonovich
equation into an It() equation. •

Example 9.36: Let X be defined by the stochastic differential equation in the


previous example in which the Gaussian white noise a dB(t)jdt is replaced by a
Poisson white noise dC(t)jdt, where C(t) = '£~7 Yk is a compound Poisson
process, N denotes a Poisson process with intensity A > 0, and Yk. k = 1, 2, ... ,
are independent copies of a random variable Y E L 2 with mean zero and distribu-
tion F. Denote by X s the stationary solution of the differential equation

dX(t) = (pX(t-)- X(t-) 2)dt + X(t-)dC(t), t > 0.

It is not possible to find analytically the stationary density Is of the sta-


tionary solution X 8 • To determine whether Is exhibits transitions, we consider
the characteristic function q;(u; t) = E[exp(J=T u X(t))]. This function is the
solution of
aq;(u; t) aq;(u; t) '1 a2q;(u; t)
at = pu au + v-I au2

+A [i q;(u (1 + y); t) dF(y)- q;(u; t) J.


The stationary characteristic function q; s (u) = limr--;. 00 q; ( u; t) satisfies the above
equation in which we set aq;jat = 0. The density Is of Xs is the Fourier transform
of (/Js·
724 Chapter 9. Stochastic Systems and Input

Imaginary part

5 10 15 20
u
Figure 9.17. Real and imaginary parts of the characteristic function rp s for A = 12,
Y1 ""U(-a, a), a= 0.5, and p = 1, 2

Figure 9.17 shows the real and imaginary parts of the characteristic function
fPs for A = 12, Y1 "" U( -a, a), a = 0.5, and two values of p. The significant
difference between the two characteristic functions suggest that Is may exhibit a
transition. The corresponding plots of Is in Fig. 9.18 show that there is a qualita-

5 6

Figure 9.18. The density Is for A = 12, Y1 ,.._, U (-a, a), a = 0.5, and p = 1, 2

tive difference between the stationary solutions for p = 1 and p = 2. The most
likely values of Xs are in a small vicinity of x = 0 and away from x = 0 for
p = 1 and p = 2, respectively. We also note that Var[C(t)] = At E[Yf] = t
9.5. Environment and ecology 725

for A.= 12 and Y1 ,....., U(-a, a) with a = 0.5 so that the processes C(t) in this
example and a B(t) with a = 1 in the previous example have the same second
moment properties. 0

Proof: The Ito formula applied to the function e A u X(t) gives

eHuX(t) -eAuX(O) = t .J=lueAuX(s-)dX(s)


lo+
+ L [eA u X(s) _ e A u X(s-) _ .J=I u e A u X(s-) ~X(s) J,
O<s:::t

where ~X(s) = X(s)- X(s-), or (Section 4.6.1)


eAuX(t) -eAuX(O) = t .J=lueAuX(s-) (pX(s-) -X(s-)2) ds
lo+
+ L [eAuX(s) _ eHuX(s-)].
O<s:::t

The expectation of the above equation can be performed term by term. The left side gives
rp(u; t) - rp(u; 0). The calculation of the expectation of the integral on the right side does
not pose difficulties and is not discussed. To calculate the expectation of the summation in
the above equation, we note that the term

eHuX(s) _ eHuX(s-) = eHuX(s-)(l+~C(s)) _ eAuX(s-)

is zero or eHuX(Tk-)(l+Yk)- eAuX(Tk-) if there is no jump at s or s = Tk. re-


spectively, where Tk = inf{t ~ 0 : N(t) = k}, k = 1, ... , N(t), and ~C(Tk)
C(Tk) - C(Tk-) = Yk. Since the expectation of e A u X(s) - e A u X(s-) is

and the jump times of Care uniformly distributed in (0, t] conditional on N(t), the above
conditional expectation is fR dF(y) Jci rp(u (1 + y); ~) d~/t- Jci rp(u; ~) d~/t for each
k = 1, ... , N(t). The stated equation for rp results by differentiating the expectation of the
above Ito formula for e A u X(t).
The numerical results in Fig. 9.17 have been obtained by using a finite difference
approximation of the equation for rp with the boundary conditions rp(±u; t) = 0 and
arp(±u; t)jau = 0, where u = 200. The densities Is in Fig. 9.18 are the Fourier transforms
of the corresponding characteristic functions IPs. •

9.5 Environment and ecology


We present environmental measures that are defined by equations with ran-
dom coefficients and input of the type of Eq. 9.1. For example, the moisture
726 Chapter 9. Stochastic Systems and Input

content and pollutant concentration in soil deposits are random functions satisfy-
ing partial differential equations with random coefficients and input. This section
presents models for the moisture content in soil deposits (Section 9.5.1), water
quality indicators in rivers (Section 9.5.2), and pollutant concentration in soil de-
posits (Section 9.5.3). We also give some numerical results for these environment
and ecology measures.

9.5.1 Rainfall runoff model


The spatial variation of soil properties and rainfall intensity can have no-
table effects on runoff production, river flows, and other hydrological measures
[90]. Because both soil properties and rainfall vary randomly in space, the analy-
sis of most hydrology problems involves solutions of partial differential equations
with random coefficients and input involving space and time partial derivatives.
Let 8(z, t) denote the moisture content in a soil deposit at depth z 2': 0 and
time t 2': 0 and let e s be the saturation water content, where z = 0 is the soil
surface so that the coordinates x and y are in the plane defined by this surface.
Our objective is to find properties of the ponding time T P defined by the condition
8(0, Tp) = 8 8 , that is, the time at which the soil surface reaches its saturation
water content. It is assumed that (1) soil properties are uniform in the vertical
direction z but vary randomly in the horizontal plane, (2) water flow is solely in
the z-direction, (3) effects of water accumulation at soil surface and water runoff
can be neglected, and (4) the soil saturated hydraulic conductivity K s is the only
soil parameter exhibiting spatial variation.

If the above four assumptions are satisfied, then e is the solution of

(9.21)

with the initial and boundary conditions 8(z, 0) = e i (z), z > 0, and Q =
ae(O, t)jaz = R, t 2': 0, where R denotes the rainfall rate at (x, y), and the
coefficients 1/f and K are the suction head and the hydraulic conductivity, re-
spectively ([90], Chapter 5).

Note: The coefficients 1/J and K can be related to e and Ks by empirical constitutive
relationships, so that Eq. 9.21 becomes a partial differential equation for <9. The functions
ei and Q = ae(O, t)jaz are the moisture content at the initial timet = 0 and the water
flux in the vertical direction, respectively.
Because of the uncertainty in Ks, Eq. 9.21 has random coefficients. The boundary
condition at z = 0 can also be random since the rainfall rate R exhibits random spatial
and temporal fluctuations. The dependence of the functions <9, <98 , Ks, and R on the
coordinates (x, y) is not shown in Eq. 9.21 because we assume that the water can flow only
in the z-direction. Therefore, the solution of Eq. 9.21 at a site (x, y) depends only on the
soil properties and rainfall rate at this site. A
9.5. Environment and ecology 727

It is not possible to find the probability laws of e and T P analytically. Prop-


erties of these random functions can be ( 1) estimated from samples of e given by
solutions ofEq. 9.21 corresponding to samples of K s and R or (2) calculated from
approximate analytical solutions of Eq. 9.21 based on heuristic approximations,
for example, the approximate expression for T p in the following equation.

The ponding time can be approximated by

for R > Ks, (9.22)

where c and s are some constants ([90], p. 85). If K s ::: R, ponding cannot occur
so that Tp = +oo.
Note: The constants c and s depend on <9j, <9s, soil water diffusion coefficient, soil satu-
rated hydraulic conductivity, and other soil properties ([90], p. 84).
The ponding time in Eq. 9.22 is a nonlinear mapping of Ks and R so that Tp is a
random field if Ks and R are random fields. We can calculate properties of Tp from the
probability laws of Ks and R. &

Example 9.37: Suppose that Ks and R are independent lognormal translation


random fields with expectations E[Ks(x, y)] = 0.018325 and E[R] = 0.01665
measured in centimeters/minute and coefficients of variation v k = 0.5 and Vr =
0.5, respectively. For example,

Ks(x, y) = exp (t-tg + ag G(x, y)), x = (x, y) E JR2 , (9.23)

where tL g, a g are constants and G denotes a homogeneous Gaussian field with


mean zero, variance one, and covariance function p(x 1, x2) = E[G(x1) G(x2)],
x1, x2 E JR2 ([79], Section 3.1). We take G(x, y) = [Gt (x) + G2(y)]j,J2,
where Gi, i = 1, 2, are independent stationary Gaussian processes with mean
zero, covariance function E[Gi(u) Gi(u + nJ = exp(-a lsi), and a = 0.5.
The constants tL g, ag can be calculated from the prescribed mean and coefficients
of variation of Ks. The rainfall rate R is also modeled by a similar lognormal
translation field but the parameter of the covariance function of its Gaussian image
is a = 1 rather than a = 0.5.
Figure 9.19 shows samples of Ks and R in a square D = (0, a) x (0, a)
for a= 5. The figure also shows the inverse of the ponding time 1/TP (Eq. 9.22)
corresponding to the samples of (Ks, R) and (Ks, E[R]) in this figure. The sig-
nificant spatial variation of the ponding time and the differences between ponding
times for random and deterministic rainfall rates indicate that the rainfall runoff
model must account for the random spatial variation of both the soil properties
and the rainfall rates. <>
Note: Since the soil conductivity Ks and rainfall rate R are positive, they cannot be mod-
eled by, for example, Gaussian random fields. Lognormal translation fields have been used
to represent these parameters.
728 Chapter 9. Stochastic Systems and Input

Sample of K Sample of R
s

0.04 0.04

0.02 0.02

5 5
0 0 0 0
Sample of liT
P,
Sample of liT (R
p
= E[R])
0.4

0.2 0.5

0
5
5 5
0 0 0 0

Figure 9.19. Samples of Ks, R, and samples of 1/Tp corresponding to (Ks, R)


and (Ks, E[R])

The lognormal random field Ks is homogeneous with mean and covariance function

11-k = exp (11-g + aff/2) and

Ck(~) = E[Ks(~) Ks(~ + ~)] = exp ( 2{1-g + aff (1 + p(~)))- {1-~,

respectively, so that a'f = ck(O) and Vk = O'k/ 11-k ([79], p. 49). The finite dimensional
distributions of Ks can be obtained from the definition of Ks and the probability law of
G (Section 3.6.6 in this book, [79], p. 45). For example, the first order or the marginal
distribution of this random field is P(Ks :5 z) = <P((ln(z)- /1-g)/ag), where <P denotes
the distribution of N(O, 1). It can also be shown that the scaled covariance function of
Ks satisfies the condition I~(~) I :5 IP(~)I for all~ E JR2 , where;(~)= ck(~)fck(O) =
[(1 + Vf)P(~) - 1]/Vf ([79], p. 49). A

Probabilistic models forKs and R of the type in Example. 9.37 can also
be used in Eq. 9.21. Moreover, the model for R can be generalized by allowing
the rainfall to vary randomly not only in space but also in time. We have already
stated that analytical solutions cannot be obtained for Eq. 9.21, but this equation
can be solved by Monte Carlo simulation. First, samples of (K s, R) need to be
generated and the corresponding samples of (\If, K) calculated. Second, Eq. 9.21
has to be solved for the samples of (Ks, R, \11, K) obtained in the previous step.
9.5. Environment and ecology 729

Third, the corresponding samples of the ponding time T P result from its defining
equation 8(0, Tp) = E>s. Fourth, statistics need to be calculated for 8 and T p
from the calculated samples of these random fields.

9.5.2 Water quality in streams


The biochemical oxygen demand (BOD) and the dissolved oxygen (DO)
are essential indicators for water quality in streams. The BOD is the total amount
of oxygen required by bacteria stabilizing decomposable substances in a stream
and the DO is oxygen available in the water. Let v be the average velocity of a
stream and denote by l(t) and c(t) the value of BOD and DO, respectively, at a
distance t :;: 0 downstream from a polluting source. Because the stream velocity
is assumed to be constant, time t I v is a scaled distance.
Two classes of stochastic models have been developed for BOD and DO,
( 1) models based on random walks, birth-death process, or time series and (2) mod-
els based on stochastic differential equations. We consider models of the second
type ([2], pp. 135-150). One of the simplest deterministic models giving the evo-
lution of the BOD and DO is

{
i(t) = -(k1jv) /(t), (9.24)
i:(t) = -(k2jv) (c(t)- Cs) - (k1/V) /(t),

where k1 and k2 denote the deoxygenation and reaeration rates, respectively, and
Cs is the DO saturation concentration ([2], p. 137). The evolution of the state
vector (l, c) can be calculated simply from Eq. 9.24 for specified initial conditions.
Alternative models have been proposed for the evolution of (/, c) that account
for additional processes that may take place in streams ([2], pp. 135-150). Our
objectives are to establish a stochastic version for the evolution of BOD and DO
and calculate statistics of these water quality measures at any distance t > 0
downstream from a pollution source.
Consider the differential equation

X(t) =A X(t) + Y(t), t:;: 0, (9.25)

where L(t) and C(t) are the coordinates of the state vector X(t), A 11 = -(K1 +
K3)jv, A12 = 0, A21 = -K1jv, A22 = -K2jv, Y1(t) = Z(t)jv, and Y2(t) =
(K2 Cs- K2,p)/v are the entries of A and Y(t). The uppercase letters in Eq. 9.25
are used to denote random variables and functions. The parameters K 1 and K2
have the same meaning as in Eq. 9.24 but can be random. The parameters K 3 , Z,
and K2,p denote the rate at which pollution decreases because of sedimentation
and absorption, the rate of pollution increase along the river banks caused by
small pollution sources, and the rate of DO decrease owing to photosynthesis,
respectively ([2], pp. 138, [138]). The parameters K 3, Z, and K2,p may depend
on the distance t from the main pollution source. The model in Eq. 9.25 is a linear
differential equation with random coefficients, random input, and deterministic
730 Chapter 9. Stochastic Systems and Input

or random initial conditions. The model in Eq. 9.25 is an extension of Eq. 9.24
because it (1) considers additional processes, such as sedimentation, absorption,
and photosynthesis, and (2) incorporates the uncertainty in the model parameters
and input.
The method for calculating probabilistic properties of X in Eq. 9.25 de-
pends on the type of the driving noise. For example, the conditional analysis in
Section 9 .2.3 and the methods in Chapter 7 can be used if Y is a Brownian motion,
a martingale, or a semimartingale. If Y is a random variable rather than a random
function, then

\ll1(t) 0 J
X(t) =[ \ll2(t) \ll3(t) X(O) +[ 81(t) J
82 (t)
.....
= \ll(t) X(O) + .:=.(t), (9.26)

where ([2], pp. 144-145, [138])

\ll1(t) =exp[-(K1 +K3)tjv],


\ll2(t) = K1 [\ll1(t)- \ll3(t)]j(K1- K2 + K3),
\ll3(t) = exp[-K2tjv],
81(t) = Z [1- \ll1(t)]j(K1 + K3),
,. . , K1Z [l-\ll1(t)
.':.2(t) = K1-K2+K3 K1+K3
-l-\ll3(t)J
- K2--
c K2- K2
+ s ,p [1 - \ll3(t)]. (9.27)
K2
The probability law of X can be obtained from (1) Eqs. 9.26-9.27 and the prop-
erties of the random parameters in these equations by Monte Carlo simulation,
(2) the relationship between the densities of functionally dependent random vari-
ables (Eqs. 9.3), and (3) the Liouville equation (Eqs. 9.4-9.5). We illustrate the
determination of the density of X (t) by Liouville's equation.

If Kr = k,, r = 1, 2, 3, are deterministic constants and (Z, K2,p) are random


variables, the density of the augmented state vector (X, Z, K 2, p) at t :::=: 0 is

f(x, z, x; t)
_.f. (X1-~1(t) -1/J2(t)(XI-~I(t))+1/II(t)(X2-~2(t)) )
- JO 1/1'1 (t) ' 1/1'1 (t) 1/1'3 (t) 'Z, X

x exp[(k1 + k2 + k3) tfv] (9.28)

where ~q and 1/fp have the expressions of 8, q = 1, 2, and IIJ P• p = 1, 2, 3, in


Eq. 9.27, respectively, with Kr = k,, r = 1, 2, 3, and (Z, K2,p) = (z, x). The
function fo is the density of (X(O), Z, K2,p).

Note: The augmented vector X= (Xt = Xt. X2 :"" X2, X3 =.z. X4 = K2,p) is defined
by Eqs. 9.26-9.27 and the differential equations X3(t) = 0, X4(t) = 0 with the initial
9.5. Environment and ecology 731

conditions X3(0) = Z, X4(0) = K2,p· The density f(x, z, x; t) is given by Eq. 9.5
applied to X in which we use X(O) = \ll(t)- 1 [X(t)- :S:(t)] (Eq. 9.26). Generally, the
determination of the density f(x; t) = fJR2 f(x, z, x; t) dz dx of X(t) requires numerical
integration. A

Example 9.38: Suppose that X(O), Z, and K2,p in Eq. 9.26 are random variables
and all the other parameters in these equation are deterministic. Let the densities
of X1 (0) and X2(0) be proportional to the densities of N(6.8, 1) and N(8.7, 0.03)
in the intervals [0, oo) and [0, cs], respectively, and zero outside these intervals.
For example, fx 1 (o)(~) = ctl[O,oo)(~)f/J(~- 6.8), where q > 0 is a constant and
f/J denotes the density of N(O, 1). Let Z and K2,p be random variables that are
uniformly distributed in the intervals (0, 0.4) and (0, 0.2), respectively. It is also
assumed that the random variables X 1(0), X2(0), Z, and K2,p are mutually inde-
pendent. Then the joint density of (X 1(0), X2(0), Z, K2,p), that is, the density of
the augmented state vector X(t) = (X1 (t), X2(t), Z, K2,p) at t = 0, is

fo(i) = c exp[ -(i1 - 6.8) 2/2- (x2 - 8.7) 2/0.06]

for i1 ::=:: 0, i2 E [0, cs], X3 E [0, 0.4], and X4 E [0, 0.2], where c > 0 is a
normalization constant. If any of these conditions is violated, fo is zero. The
density of X(t) = (X(t), Z, K2,p) at a later timet::=:: 0 is (Eq. 9.28)

+ k2 + k3) tfv] exp { - -1 [i1 - ~1 (t)


_ 2
f(x; t) = c exp [(k1 -6.8]
2 1/11 (t)

_ _1_ [-1/12(t) (i1- ~1(t)) + 1/11(t) (i2- ~2(t)) _ 8 .7 ] 2 }


0.06 1/11 (t) 1/13(t)

if i1 E [~1 (t), oo), i2 E [c*, c*], X3 E [0, 0.4], and X4 E [0, 0.2] and zero if any
of these conditions is not satisfied, where c * = max(O, (1/12/1/11) (i1 - ~I) + ~2)
and c* = min(cs, Cs 1/13 + (1/12/1/11)(i1 - ~1) + ~2) [138].
Figure 9.20 shows the joint density of X(t) at distances t = 0 and t = 5
miles downstream from a pollution source for k1 = 0.35, k2 = 0.75, k3 = 0.20,
cs = 10, and an average stream velocity v = 7.5 miles/day. The plots have
been calculated from the above expression of the density of X(t) by numerical
integration. They show the trend of (BOD, DO) with the distance t ::::: 0 and the
spread about this trend. Such plots can be used to calculate the probability that the
DO level does not fall below a critical level DOer at any site downstream from a
pollution source. Values of DO smaller than DOer can have serious consequences
on the aquatic life in a stream. <>
Note: The densities in Fig. 9.20 have been calculated from the density of X(t) in which
the random variables Z and K2,p were assumed to take discrete values. Numerical results
are for ten equally spaced values in the range of possible values of these variables.
The relationship between X(t) and (X(O), Z, K2,p) in Eq. 9.26 can also be used
to calculate moments of BOD and DO. For example the expectations of these pollution
732 Chapter 9. Stochastic Systems and Input

2.5

1.5

0.5

0
10
10

Figure 9.20. The joint density of (BOD, DO) at t = 0 and t = 5 miles down-
stream from a pollution source

measures are

E[L(t)] = E[X 1(t)] = 'if/1 (t) E[L(O)] + 1 -1/fl (t) E[Z],


kl +k3
E[C(t)] = E[Xz(t)] = 1/fz(t) E[L(O)] + 1/f 3 (t) E[C(O)]

+ k1 [1-1/fl(t) _ 1-1/f3(t)J E[Z]


k 1 - kz + k3 k1 + k 3 k2
+ cs kz- E[Kz,pl [ 1 -1/f3 (t)]
kz

since k1, k2, and k3 are deterministic. &

9.5.3 Subsurface flow and transport


Consider a soil deposit in a subset D of Rq, q = 1, 2, 3. Let k(x) and
diJ (x) denote the hydraulic conductivity and the dispersion tensor at x E D and
timet 2:: 0, respectively. The flux and head, q(x) and h(x), at x E Dare assumed
for simplicity to be time-invariant. Let v(x, t) and c(x, t) be the flow velocity
and the concentration of a pollutant at x E D and time t 2:: 0 corresponding to a
specified head and flux on the boundary of D and to a pollutant source.
9.5. Environment and ecology 733

The spatial and temporal variation of the flow velocity and pollutant con-
centration depend on the properties of the soil deposit in D, pollutant source, and
boundary and initial conditions. Our objective is to find properties of pollutant
·concentration c(x, t). This function varies randomly in space and time if the soil
properties, the pollutant source, and/or initial and boundary conditions are uncer-
tain.
The steady-state head in D is the solution of the partial differential equation

V · (k(x) Vh(x)) + s(x) = 0, x E D, (9.29)

with the boundary conditions

h(x) = ho(x), X E aDu, and


-q(x) · n(x) = qo(x), X E aD,., (9.30)

where the source functions, the head ho, and the flux qo are prescribed func-
tions, aDu and aD,. partition the boundary of D, and n(x) is the unit outward
normal at x E aD,. ([39], Chapter 20).

Note: The flux in D can be calculated from the Darcy law q(x) = k(x) Vh(x) and the
solution of Eqs. 9.29 and 9.30. The velocity v of ground water is v = q frp, where rjJ
denotes the effective soil porosity ([39], Chapter 20). •

The concentration c(x, t) of a pollutant at x E D and timet 2:: 0 is the solution


of the transport equation

ac q (adij ac a2c )
-+cVv+v·Vc-" - - + d i j - - - =g(x,t), (9.31)
at .!....J ax·I ax.J
i,j=l
ax·I ax.J

where g is a specified pollutant source function and q = 1, 2, 3.


Note: Initial and boundary conditions as well as the pollutant source function need to be
specified for solution. The conditions on the boundary aD of D can be of the Dirichlet
and Neumann type specifying the concentration cat and the pollutant flux through aD,
respectively.
If the flow velocity is constant, the above equation simplifies to

ac
-+v·Vc-"
q (adij ac
--+d;·---
a2 c ) =g(x,t).
at L ax· ax·
i,j=l I
1 ax· ax·
I I ]

If in addition the coefficients d;j are space invariant, the above equation becomes

ac q a2 c
- +c'Vv- "di' - - - = g(x,t).
at L 1 ax·ax·J
i,j=l I
734 Chapter 9. Stochastic Systems and Input

The defining equation for c is a special case of the type of partial differential equa-
tion considered in Section 6.1. Hence, the local solutions in Sections 6.2 and 6.3 can be
used to find properties of the concentration at a specified location x E D and time t ::=: 0. A

Example 9.39: Suppose that (1) the ground water flow in an infinite medium has
a known constant velocity v > 0 along the coordinate x 1, (2) an amount co > 0
of pollutant is placed instantaneously at x = (0, 0, 0) and time t = 0, and (3) the
dispersion tensor diJ = d OiJ is constant. Then c satisfies the partial differential
equation ac;at + v ac;ax1 = d ilc, where il = L:~= 1 a2 ;axf, so that

CO [ (X1 - Vt) 2 +xi+ X~]


c(x,t)=(4T(dt)3!2exp- 4dt .

Figure 9.21 shows contour lines of the pollutant concentration at timet = 0.1, 1.5,
and 3.0 in a two dimensional medium ford = 0.2 and v = 4. Two phenomena
can be observed. The pollutant eye travels in the x 1 direction with velocity v and
the pollutant diffuses in the x1 and xz directions as time increases. <>

4
0.1 1.5 3.0

0
t= t= t=

0
2

x2 o

-2

-4
-2 0 2 4 6 8 10 12 14 16
x,

Figure 9.21. Propagation of pollutant in a two-dimensional medium

Note: That the above expression of c is the solution of the transport equation ocfot +
v ocjox1 = d ll.c can be found by direct calculations ([46], Chapter 1). As t t 0, the
concentration converges to co 8 (x). The equality co = f&,3 c(x, t) dx holds at each time
t ::=: 0 and expresses the conservation of mass. A

Example 9.40: Consider the transport problem in the previous example but as-
sume that the dispersion tensor DiJ (x) = D(x) 8ij varies randomly in space. The
pollutant concentration C(x, t) becomes a random function of space and time de-
fined by (Eq. 9.31)

ac +v ac _~[aD ac +D
at axl 6 ax; ax;
aaxfc]
2
=O,
9.5. Environment and ecology 735

Sample of D ( vd = 1.5)
3~~~~~~~~

-3
0 3 -3 0 3
xl xl

Sample of c ( vd =0.7) Sample of C ( vd = 1.5)

0 x<t· e 01
3t = 0.1 t =1.5 t = 3.0 t= 0.1 t = 1.5 t= 3.0

• 0
x2
0

-3 -3
0 5 10 15 0 5 10 15
xl xl

Figure 9.22. Samples of dispersion tensor and contour lines for the corresponding
samples of pollutant concentration

with the initial condition C(x, t) =co 8(x). It is not possible to find the probabil-
ity law of C analytically so that numerical methods need to be employed.
The top plots in Fig. 9.22 are two samples of the dispersion D assumed to
be a lognormal translation field defined by D (x) = exp(J-L g + a g G (x)), where
G(x) = (Gt (xt) + G2(x2))j,Ji, x E ~2 , and Gi, i = 1, 2, denote independent
Gaussian processes with mean zero and covariance functions E[Gi(Xi) Gi(Xi +
si)] = exp( -a lsi!), a = 2. The expected value of D(x) is 0.2, that is, the
value of the dispersion in Example 9.39. The samples of Dare for coefficients of
variation Vd = 0.7 and Vd = 1.5.
The bottom two plots in Fig. 9.22 are contour lines for the corresponding
samples of C at distances t = 0.1, 1.5, and 3 downstream from the pollution
source for v = 4. Figures 9.21 and 9.22 show that the uncertainty in the dispersion
tensor can have a significant effect on pollution spread. Also, the sample to sample
variation of pollutant concentration can be significant. <>

Note: The parameters f.lg and ag in the definition of D(x) can be related to the mean 1-ld
736 Chapter 9. Stochastic Systems and Input

and coefficient of variation Vd of this field by

These equations can be used to calculate the values of the parameters J-tg and a g for speci-
fied values of J-td and ad.
The Monte Carlo simulation algorithm in Section 5.3.3.1 has been used to generate
samples of the dispersion tensor D. A MATLAB tool box has been applied to solve the
partial differential equation of the pollutant concentration for samples of D.
Note that an extended version of the local solution in Section 6.2 of the type con-
sidered in, for example, Section 8.2 can be used to find properties of C(x, t) at an arbitrary
location and time. Also, the methods discussed in the first part of this chapter can be used
to find moments and other properties of the pollutant concentration approximately. A

The calculations in the previous two examples are based on the assumption
that the flow velocity is known and deterministic. Generally, the flow velocity is
not known and may be a random function of x E D and t 2::: 0. In this case, we
have to solve also Eq. 9.29 to find the head that gives the flux and flow velocity.
There are no simple methods for solving Eqs. 9.29 and 9.31 even in a deterministic
setting.
The stochastic version of Eqs. 9.29 and 9.31 results by assuming that the
hydraulic conductivity, the dispersion tensor, the input, and the initial and bound-
ary conditions are random functions with known probability law. Generally, the
probability law of these functions is not known so that it has to be inferred from
observations. Measurements of conductivity and dispersion can be performed at
a finite number of points x k and constitute spatial averages over relatively small
volumes centered at x k· where the size of these volumes depends on the measur-
ing device. Higher resolution devices tend to identify smaller scale features of
the measured fields so that the resulting records have a larger variability. Meth-
ods for incorporating the quality of measurements in the probabilistic models for
conductivity and dispersion are discussed in [39] (Chapter 20) and [47].

9.6 Waves in random media


Waves represent disturbances that propagate in a medium and carry energy.
These disturbances are functions of the space and time arguments, and are de-
fined by partial differential equations, referred to as the wave equations. The
coefficients of the wave equations are random because of the uncertainty in the
properties of the medium. Propagation of sound and electromagnetic radiation,
vibration of mechanical systems, and variation of ocean surface elevation in time
are examples of wave propagation.
The stochastic Helmholtz or Schrodinger wave equation,

I::!..U(x) + k5 (1 + £ Z(x)) U(x) = 0, x E D C ffi.3 , (9.32)


9.6. Waves in random media 737

is another example of a wave propagation equation in random media ([173], Sec-


tion m.13). In Eq. 9.32, sis a scale parameter, ko > 0 denotes the wave number
corresponding to the deterministic Schrodinger equation (s = 0), and the ran-
dom field Z characterizing the uncertainty in the medium properties is such that
1 + s Z(x) > 0 a.s. in D. We have obtained in Section 6.2.2.2 a local solution
for the deterministic Schrodinger equation based on diffusion processes and the
Feynman-Kac formula. This local solution can be extended by considerations in
Section 9.2.1 to solve locally the stochastic Schrodinger equation in Eq. 9.32. We
note that Eq. 9.32 belongs to the class of stochastic problems defined by Eq. 9.1.
Our objective is to determine properties of waves propagating in random
heterogeneous media. There is no general method for calculating moments and
other properties of the solution of the stochastic wave equation. In our brief dis-
cussion on waves we assume that (1) the differential operator of the wave equa-
tion is linear with time-invariant random coefficients (2) the Sommerfeld radiation
condition holds, that is, the waves from a point source depart to infinity ([173],
p. 27). Even under these assumptions there is no general solution for the stochastic
wave equation. However, the analysis simplifies significantly because the super-
position principle applies and Green functions exist. Methods discussed in the
first part of this chapter can be used to find moments and other properties of the
solution of the stochastic wave equation, for example, the perturbation, Neumann
series, iteration, and Monte Carlo methods. A comprehensive review of the ap-
plication of these methods to the solution of various types of wave equations can
be found in [173] (Chapter 3). The following two examples consider some simple
stochastic wave equations.

Example 9.41: Consider a rod with length l > 0, cross section of unit area, and
random stiffness K(x) > 0 and density R(x) > 0, x E (0, l). The rod is free
and fixed at its left and right ends, respectively. It is assumed that a force Y(t) is
acting on the left end of the rod and that the rod is at rest at the initial time. Let
U (x, t) be the rod displacement at location x E (0, l) and time t :::: 0. Then

R(x) a2 U(x, t) = ~ [K(x) aU(x,


at 2 ax ax
t)], X E (O, J),

which becomes

I _a_t_
oV(x,t)

_a_t_
oT(x,t)
_
-
1
R(x)
= K(x)
oT(x,t)
----ax•
oV(x,t)
ax '
with the notation V(x, t) = aU(x, t)jat and T(x, t) = K(x) aU(x, t)jax. The
functions V and T are the rate of change of U in time and the stress in the rod,
respectively. The initial and boundary conditions for the above equations are
V(x, 0) = 0, T(x, 0) = 0, T(O, t) = Y(t), and V(l, t) = 0.
Figure 9.23 shows contour lines for two samples of U corresponding to
Y(t) = sin(v t), v = 0.1, l = 10, R(x) = 1, and two independent samples of
K(x) = ko +a exp(G(x)), where ko = 2, a= 5, and G is a stationary Gaussian
738 Chapter9. Stochastic Systems and Input

10 10

8 8
X X
6 6

4 4

2 2

0 0
0 2 3 4 5 0 2 3 t 4 5

Figure 9.23. Contour lines of two samples of the solution of the wave equation
for a rod with random stiffness

field with mean zero and covariance function E[G(x) G(x ')] = exp( -lx - x'l).
The distinct pattern of the two wave samples is caused by the random fluctuations
of the rod stiffness. <>
Note: Because we assume that the rod is made of a linearly elastic material and its defor-
mation is small, the strain and stress at (x, t) are r(x, t) = aU(x, t)jax and S(x, t) =
K(x) aU(x, t)jax, respectively. The equilibrium condition,

aS(x, t)jax- R(x) a 2U(x, t)jat 2 = 0,


gives the wave equation.
Let Vk,l and Tk,l denote the values of V and T at (x = Xk> t = tz), respectively,
where Xk = kh are equally spaced points in (0,1), h > 0, k = 0, 1, ... , n + 1, and
tz = l 11t, where 11t > 0 is the integration time step and l :::: 0, n > 0 are integers. The
(2n, 1) vector X1 = <Vt,t• ... , Vn,l• Tt,l• ... , Tn,t) satisfies the finite difference equation
Xl+l =A X[, where the entries of the random matrix A are zero except for A;,i = 1 for
i = 1, ... , 2n, Ai,n+i+l = B; fori = 1, ... , n- 1, Ai,n+i-1 = -B; fori = 2, ... , n,
An+i,i-1 = -C; fori = 2, ... , n, and An+i,i+1 = C; fori = 1, ... , n - 1, where
B; = 11t/(2R(x;)h) and C; = K(x;) 11t/(2h). This finite difference equation needs to
be supplemented by initial and boundary conditions. The evolution in time of moments and
other properties of X can be calculated by the methods in Sections 9.2 and 9.3.
If the rod is subjected to an externally applied action Q(x, t), the wave equation
becomes
R(x) a2 U(x, t) = ~ [K(x) aU(x,
at2 ax ax
t)J
+ Q(x, t)
or
1
= R(x) ax
( oT(x,t)
+ Q(x, t )) •
_ K( ) oV(x,t)
- X -ax-•
so that the corresponding finite difference equation is X1+1 =A X1+ Ql, where the entries
in the first n rows of Q 1 are Q(xb tz) 11t, k = 1, ... , n, and the last n rows of this matrix
are zero.
9.6. Waves in random media 739

Let X(x, t) be an JR2 -valued function with coordinates (V, T). Then X is the solu-
tion ofEq. 9.1 with 'D = i (ojot) -« (ojox) and Y(x, t) = fJ Q(x, t)j R(x), where i is the
identity matrix, and the entries of the matrices« and fJ are au = a22 = 0, a12 = 1/ R(x),
a21 = K(x), fh = 1, and f32 = 0. The methods discussed in this chapter can be applied to
find properties of X. A

Example 9.42: Consider an infinite rod that is homogeneous and deterministic


exceptforafinite interval (0, l), l > 0, in which therodhasrandomheterogeneous
properties. The amplitude U of the harmonic waves in the rod is the solution of
the differential equation

U"(x) + k5 (1 + e l(o,l)(X) Z(x)) U(x) = 0, x E IR,

where Z is a random function defined in (0, l).


Suppose an incident wave e -A ko x propagates toward the random hetero-
geneous interval of the rod from its right side. Our objective is to find properties
of the reflected wave A(l) eAkox, x E (l, oo). The reflection coefficient A
is viewed as a stochastic process of the argument l that satisfies the differential
equation

dA(l) = e HkoZ(l) [eAkol A(l) +e-Akoz] 2


dl 2
with the initial condition A(O) = 0. This equation has random coefficients be-
cause of the rod properties in (0, l). Figure 9.24 shows the real and imaginary
parts of ten samples of A in the range (0, 3). The samples have been calculated

1.-----------------~
Real part of A Imaginary part of A

-1~----~----~----~ -1~----~----~----~
0 2 3 0 2 3

Figure 9.24. Ten samples of A in the range (0, 3)

from the above differential equation for A and the samples of Z in Fig. 9.25 for
ko = 5, e = 0.5, and Z(x) = exp(G(x)), where G is a stationary Gaussian pro-
cess with mean zero and covariance function E[G(x) G(x')] = exp(-lx- x'l),
X, X 1 E (0, l).
740 Chapter 9. Stochastic Systems and Input

14.-------------~--~ 4.---------~--~---.
Samples of z Phase
12
3 oo
@~ «® ,Yo
8
6 2~~;g
4
1 wr~ 0
2
~ €£ 0 0
0 n~O.
1.5 2 2.5 3 0 0.1 0.2 0.3 0.4
Amplitude

Figure 9.25. Ten samples of the field Z and 100 samples of the amplitude and
phase corresponding to 100 samples of Z and l = 3

The solution of the wave equation in (l, oo) is


e-Hkox +AeH(kox+<l>),

where A and <I> denote the random amplitude and phase of the reflected wave.
The right graph in Figure 9.25 shows the amplitude of the reflected waves for 100
samples of Z and the corresponding phase of these waves relative to the incident
wave for l = 3. The sample to sample variation of the amplitude and phase of
the reflected waves is significant. Statistics of A and <I> can be calculated from
Monte Carlo simulation results as shown in Fig. 9.25. For example, an estimate
of the correlation coefficient PA,<I> of (A, <I>) based on 1000 samples is -0.0195
for l = 3. This low value of PA,<I> is consistent with the plot of the amplitude and
phase of the reflected wave in Fig. 9.25 showing virtually no linear trend between
these two random variables. <>
Proof: The solution of the wave equation in (l, oo) is U(x) = e-Hkox+A(l) eHkox.
Denote by[; the solution of the wave equation in (0, l). The continuity conditions at x = l
require [; (l) = U (l) and U' (l) = U' (l). The ratio of these two conditions is
U1 (l) A(l)eHkol_e-Hkol
--- = .J=I ko - - - ' - - - = , - - - - - = , - - -
u(t) A(l)eHkol + e-Hkol
so that
A(l) = e-2Hkol A ko ~(l) + ~'(l)
.J=1 ko U(l)- U 1 (l)
which gives the equation for A by differentiation with respect to l and by using continuity
conditions at x = 0 [140].
We have not imposed any condition on e in our discussion. If e is a small parameter,
it is possible to derive an evolution equation for the transition probability density of a
stochastic process related to the reflection coefficient A. The derivation of this asymptotic
result can be found in [140]. •
9. 7. Seismology 741

9.7 Seismology
The energy supplied by the movement of the tectonic plates strains the
earth's crust and can cause slips along faults. The fault slips are violent events
called earthquakes. Large slips can produce earthquakes that may have devas-
tating social and economical consequences. There are few large earthquakes but
many small seismic events.
Seismologists have had a limited success in predicting the occurrence of in-
dividual future earthquakes. However, simple global laws for earth dynamics have
been found from empirical observations, for example, the Gutenberg-Richter law
log(n(m)) = a - b m and the energy-magnitude relationship m = ot + f3 log(e ),
where n(m) is the number of earthquakes with magnitude larger then m, e de-
notes the energy released during an earthquake, and a, b, ot, f3 are some constants.
These two equations show that log(n(m)) and log(e) are linearly related.
Large systems with many components tend to evolve into a critical, unstable
state that can change abruptly because of minor disturbances. The critical state
results solely from the interaction between the system components so that it is
self-organized. A system may evolve in a critical state if it is large, receives a
steady supply of energy, and is dissipative [16]. These conditions are satisfied by
the collection of interacting crust faults. The system of tectonic plates extends
over large regions, receives a steady supply of energy from the motion of tectonic
plates, and dissipates energy through fault slips. This observation suggests that
the Gutenberg-Richter law can be a manifestation of the self-organized critical
behavior of earth dynamics [16].
We consider two earthquake fault models related to the plate tectonic theory.
The first model, referred to as a physical model, consists of a collection of con-
nected blocks that are pulled at a constant speed on a rough surface. The second
model, called cellular automata, is a mathematical abstraction. The states of both
models satisfy equations with random coefficients and random or deterministic
inputs, which can be described by Eq. 9.1.

9.7.1 Physical model


Figure 9.26 shows a collection of equally spaced blocks sitting on a hori-
zontal fixed plate with rough surface. The blocks are connected to each other by
coil springs and to a moving plate by flat springs that pressure the blocks to the
fixed plate. The velocity of the moving plate is constant and deterministic. If the
force acting on a block in the horizontal plane does not exceed the friction be-
tween the block and the fixed plate, the block adheres to this plate. Otherwise, the
block slips and the released force is transferred to the neighboring blocks.
Suppose that all blocks and springs are identical and the friction coefficient
is the same at all points of the fixed plate. Then the blocks are in identical condi-
tions so that either they are at rest or slip at once. If this ideal system is perturbed,
then block slips are usually localized, and the energy release associated with slips
742 Chapter 9. Stochastic Systems and Input

Moving plate
. I

Flat I
I I
I

Rough

Figure 9.26. Physical model of interacting crust faults (adapted from [35], Fig. 3)

is small. If the friction at a block site is exceeded, the block slips and the released
force may cause a single block or a collection of blocks to slip. It is possible that
a single block slip will trigger the slip of many neighboring blocks resulting in a
large avalanche of block slips and a significant reduction of the system potential
energy. Generally, this is not the case, and block slips are confined to small neigh-
borhoods. The block slips resemble the fault slips. The energies released by the
slip of one or a few blocks and the slip of many blocks corresponds to small and
large seismic events, respectively.
Figure 9.27 shows the time evolution of the potential energy e(t) for the
system in Fig. 9.26. Following an initial time interval when potential energy is
built in the system, the energy reaches a steady-state regime characterized by small
frequent and large infrequent negative jumps of magnitude X (t) = le(t)- e(t-) J.
Consider a large time interval in the steady-state regime of the potential energy

3500 10'
Energy, e(t) n@
oo
3000 00
0
2500 0
102
0
2000
0
0
1500 0
101 0
1000 0

500 0
0

0 100
0 100 200 300 10° 102
time, t ~

Figure 9.27. Potential energy and frequency-energy relation (adapted from


[35],Figs. 4 and 5)

and let n(~) denote the number of jumps of X in this time interval with magnitude
larger than an arbitrary energy value ~ > 0. The resulting relationship between
9. 7. Seismology 743

the logarithms of n(~) and~ shown in Fig. 9.27 is nearly linear with a slope of
approximately -1.12, in agreement with the Gutenberg-Richter law [35].
The relationship between n(~) and ~ in Fig. 9.27 implies that the jump
process X associated with the potential energy must have a power-like tail, that is,
P(X (t) > ~) "" c ~-p as ~ ~ oo for some p, c > 0 ([161], Section 2) because
(1) the above power law implies that the logarithms of P(X(t) > ~) and~ are
linearly related and (2) n(~) scaled by n(O) is an estimate of P(X (t) > n

Note: Therelationshipbetweenn(~) and~ in Fig. 9.27 gives P(X(t) > ~) ~ c~-1.1 2 for
large values of~, and shows that the upper tail of the distribution of X (t) is approximately
Cauchy. Let Y be a Cauchy random variable with density f(y) = 1/[:rr (1 + y)] and
distribution F(y) = 1/2 + arctan(y)j:rr, y E JR. Hence, limy-+oo P(Y > y)jy- 1 = 1/:rr
so that P(Y > y) ~ (1/:rr) y- 1 as y -+ oo and log(P(Y > y)) ~ log(l/:rr) - log(y) so
that log(P(Y > y)) is linearly related to log(y) and the slope of this linear relationship is
-1 for Cauchy distributions.
We also note that the results in Fig. 9.27 are for a stochastic system subjected to a
deterministic input since the properties of the rough surface and the velocity of the moving
plate in Fig. 9.26 are random and deterministic, respectively. Problems characterized by
stochastic systems and deterministic input have been examined in Chapter 8. We discuss
the model in Fig. 9.26 here because it can be extended simply to the case in which the
velocity of the moving plate is random. .t.

9.7.2 Cellular automata model

Consider a square lattice with n 2 sites and denote by Z(i, j; t) the value of
a random field defined on this lattice at timet = 0, 1, ... and site (i, j), i, j =
1, ... , n. We can think of the random field Z as a model for forces acting on
hypothetical blocks located at the lattice sites (Fig 9.26).
Suppose that the lattice is in equilibrium at a time t :::: 0, that is, Z (i, j; t) :=:
Zc at all nodes, where Zc > 0 is an integer that can be related to the magnitude
of the friction force between the blocks and the fixed plate of the physical model
in Fig. 9.26. The effect of the moving plate on the system in Fig. 9.26 can be
simulated in cellular automata model by increasing the value of Z with unity at a
site selected at random. This site selection attempts to model the spatial variability
in the mechanical properties of a fault. If the updated value of Z exceeds z c. the
values of Z have to be modified according to a redistribution rule until the field Z
does not exceed Zc at all sites.
The algorithm for evolving the random field Z involves the following three
steps [16, 107]. Consider a time t and assume that at this time the lattice is in
equilibrium, that is, we have Z(i, j; t) < Zc for all i, j = 1, ... , n.
744 Chapter 9. Stochastic Systems and Input

1. Select a site (i, j) at random and increase the value of Z at this site by unity,
that is, Z(i, j; t) ~--+ Z(i, j; t) + 1 = Z(i, j; t). The values of Z at all the other
sites is not changed, that is, Z (k, l; t) ~--+ Z (k, l; t) = Z(k, l; t) fork f= i and/or
l f= j.

2. If Z(i, j; t) < Zc, then set Z(p, q; t + 1) = Z(p, q; t), p, q = 1, ... , n,


and return to the previous step. If Z(i, j; t) 2: Zc, then the following values
of the field are changed according to the rule: Z(i, j; t) 1-+ Z(i, j; t) - 4,
Z(i ± 1, j; t) ~--+ Z(i ± 1, j; t) + 1, and Z(i, j ± 1; t) ~--+ Z(i, j ± 1; t) + 1.

3. Modify Z according to the rules in the previous step at all sites with values
larger than Zc till equilibrium has been achieved. The resulting values of Z
correspond to timet+ 1 and are denoted by Z(i, j; t + 1), i, j = 1, ... , n.

Note: The initial values of Z are taken zero, that is, Z(i, j; 0) = 0 at all nodes. The
redistribution in the second step differs at boundary sites. For example, suppose that
Z(i, n; t) ~ Zc and i E {2, ... , n -1}. Then Z(i, n; t) is reduced by four, and Z(i ±1, n; t),
Z(i, n - 1; t) are increased by unity. A unit of energy is allowed to escape through the
boundary. The uncertainty in the input can be incorporated by replacing the first step with
Z(i, j; t) r+ Z(i, j; t) + Q, where Q is a {1, ... , a}-valued random variable and a > 0
denotes an integer smaller than Zc.
An earthquake is generated during a time interval (t, t + 1] if the application of the
above algorithm causes one or more sites to exceed the critical value Zc· If at timet the
values of Z(i, j; t) are relatively small with respect to Zc, a local slip associated with an
increase of Z at a site (step 1) is unlikely to propagate in the lattice. On the other hand, if
most of the values of Z are nearly equal to Zc, a local slip may trigger an avalanche of slips
covering a large region of the lattice. These two extreme events correspond to small and
large earthquakes, respectively. _.

Let E>(t) = 2:7,j=l S(i, j; t), t = 0, 1, ... , r, and X(t) = IE>(t)- E>(t-
1)1, where r > 0 is an integer. The random variables E>(t) and X(t) can be
interpreted as potential energy and earthquake magnitude, respectively, at time t.
Denote by N(x) = #{X(t) > x : t = 1, ... , r} the number of earthquakes with
magnitudelargerthanx > Ointhetimeinterval [0, r]. Then N(x) = N(x)/N(O)
is an estimate of the probability P (X (t) > x).
Figure 9.28 shows the evolution in time of two samples of 8 for n = 50,
Zc = 3, and r = 30, 000. The significant difference in the details of the samples of
8 suggests that there is little hope to ever predict details of future seismic events,
for example, the time and the magnitude of the next major earthquake. However,
the corresponding samples of N versus x = xI n2 are quite similar, indicating that
global features of the earthquake dynamics can be predicted. The relation between
1og(N) and x is nearly linear for both samples with slopes of approximately -1.17
and -1.29. This relationship is similar to the Gutenberg-Richter law introduced
at the beginning of Section 9.7.
9.8. Model selection 745

7800.----~--~------, 100 ,-------~---~------.

x.
X

L
X

10-• Two samples X X X

of fl('%)
10-5L___ _~---~--_j
8500 9000 9500 10000 0 2 3
time, t

Figure 9.28. 1Wo samples of e and the corresponding samples of N

9.8 Model selection


Our objective in this chapter and through the entire book has been to solve
the output prediction problem ([18], Chapter 2), that is, to find properties of the
output X in Eq. 9.1 for known probability law of Y, functional form of 'D, and
probability laws of the random coefficients of 'D. Generally, the probability law
of Y. the functional form of 'D, and/or the probability laws of the random coeffi-
cients of 'D are not known. Records and information other than records, referred
to in Bayesian inference as prior information ([201], Chapter II), can be used to
characterize the input Y and the operator 'D of Eq. 9.1. For example, the prior
information that the elasticity modulus is positive implies that Gaussian random
fields cannot be used to model this material property. In stochastic hydrology it
is common to augment the information provided by the record available at a par-
ticular river basin with prior information derived from records collected at similar
river basins.
The following two sections outline difficulties related to the mathemati-
cal formulation of a particular stochastic problem caused by limited information
and/or understanding of the input and the system features. The first section exam-
ines the sensitivity of the output X of Eq. 9.1 with a known deterministic operator
'D to a class of models for Y that are consistent with the available information. The
last section presents a Bayesian framework for selecting an optimal model for 'D
and Y from a collection of competing models. The coefficients of the resulting
optimal model are random since the available information is finite.

9.8.1 Partially known input


Suppose that the operator 'D in Eq. 9.1 and the initial and boundary con-
ditions for this equation are specified but the information on Y is limited to its
746 Chapter 9. Stochastic Systems and Input

marginal distribution F and correlation function r(r) = E[Y(t) Y(t + r)]. Our
objective is to assess the sensitivity of the output X ofEq. 9.1 to equivalent mod-
els for Y, that is, to a class of stochastic processes with marginal distribution
F and correlation function r. Sensitivity is measured by the difference between
estimates of the marginal distribution, the skewness coefficient, and the kurtosis
coefficient of X corresponding to equivalent models for Y.
In the following two examples we assume that Y is a stationary process with
mean zero, lognormal marginal distribution F, and covariance function r (r) =
c(r) = exp(-A lrl), A > 0. Two equivalent models are considered for Y, a
translation process YT and a diffusion process Y D, defined by

YT(t) = p-l o <P(G(t)) and


dYv(t) =-A Yv(t) dt + b(Yv(t)) dB(t),
where G is a stationary Gaussian process with mean zero, variance 1, and covari-
ance function p(r) = E[G(t) G(t + r)], <I> denotes the distribution of N(O, 1),
and B is a Brownian motion. The function p is such that Y T has the specified
exponential covariance function (Sections 3.6.6 and 5.3.3.1 in this book, [79],
Section 3 .1.1) and
b(y) 2 = - 2 A [Y u f(u) du,
f(y) ]yz

where yz is the smallest value in the range of the lognormal density f (y)
dF(y)/dy of Y [36].

Note: The stationary density fv of Y D is the solution of the Fokker-Planck equation

d 1 d 2 ( b(y) 2 fv(Y) ) =0
--(-A.yfv(y))+-- 2
dy 2 dy

so that A.y fv(y) + (l/2)d (b(y) 2 fv(Y)) /dy = 0 by the boundary conditions in Sec-
tion 7.3.1.3. This equation yields the above expression for bO by requiring that fv coin-
cides with the target density f, and solving the resulting equation forb(·). A

We denote by XT and Xv the stationary outputs ofEq. 9.1 to the processes


Y T and Y D, respectively. In the following two examples the processes Y T and Y v
modeling Y follow the same shifted lognormal marginal distribution with zero-
mean and unit-variance and the same exponential correlation function with decay
parameter A = 1, so that they are equivalent.

Example 9.43: Suppose that D = d 2 fdt 2 +2 ~ v d/dt+v 2 . This operator defines


a linear oscillator with natural frequency v > 0 and damping ratio 0 < ~ < 1.
The mean and the covariance functions of X T and X v coincide since D is
linear and Y T and Y D have the same second moment properties (Section 7 .2.1.1 ).
However, higher order statistics of X T and Xv differ. Figure 9.29 shows samples
of X T and X D corresponding to samples of YT and Yv for v = 10 and ~ = 0.1.
9.8. Model selection 747

0.1 .-----.--...---,-----.-----,---,------,----.------.-----,
Sample of XT

-0.05

-0. 1o 10 20 30 40 50 60 70 80 90 100

0.1 ~-~--,---~---r--~-~-~-~-~-~
Sample of X0
0.05

-0.05 L__ _.___..J...__ __.__--'-----'---__J,_-----'------L-----'-___J


0 10 20 30 40 50 60 70 80 90 100
time

Figure 9.29. Samples of X T and X v corresponding to samples of YT and Yv for


a linear oscillator with v = 10 and ~ = 0.1

Estimates of the skewness and kurtosis coefficients, Y3 and y4, are Y3 = 1.34 and
Y4 = 7.6829 for Xr and Y3 = 1.81 and Y4 = 8.06 for Xv. The differences be-
tween these estimates are relatively small. However, the differences between the
marginal histograms of X T and X v and estimates of the probabilities P (X T > x)
and P(Xv > x) in Fig. 9.30 indicate that reliability measures limiting the allow-
able values of X are sensitive to the particular model used for Y. All estimates are
based on 1,000 independent output samples<>

Example9.44: SupposethatV=d 2 /dt 2 +2~vdjdt+k(·),where0< ~ < 1,


k(y) = v2 y if IYI :::; a and k(y) = v 2 [a+ b(l- exp(-,B(Iyl- a)))] sign(y)
if IYI 2:: a, and a, b, v > 0 are constants. The operator V defines a nonlinear
oscillator with restoring force k (·) and damping ratio ~.
The processes X T and X v are not equal in the second moment sense since
V is nonlinear. For example, estimates of the variance of these two processes
are 0.01 and 0.017, respectively. Figure 9.31 shows samples of X T and Xv
corresponding samples of Yr and Yv for v = 10, ~ = 0.1, a = 0.04, and
b = 0.03. Estimates of the skewness and kurtosis coefficients differ signifi-
cantly. These estimates are "93 = 3.03 and "94 = 23.24 for X T and "93 = 2.29
and "94 = 12 for Xv. Differences between properties of X T and Xv can also be
seen in Fig. 9.32 showing histograms of X T and Xv and estimates of the probabil-
ities P(Xr > x) and P(Xv > x). All estimates are based on 1,000 independent
output samples. <>
748 Chapter 9. Stochastic Systems and Input

so~~~----~~--~---,
Histogram of XT
60 60

40 40

20

X X

P(XT>x)
0.1

0.05 /
P(XD >x)
0
0.01 0.02 0.03 0.04 0.05 0.06
X

Figure 9.30. Histograms of X r and X D and estimates of the probabilities


P(Xr > x) and P(Xv > x) for a linear oscillator with v = 10 and { = 0.1

In Examples 9.43 and 9.44 we have assumed that (1) the operator V in
Eq. 9.1 is perfectly known and has deterministic coefficients and (2) the input Y
to this equation is a real-valued stationary process with known marginal distri-
bution and correlation function. We have seen that the properties of the solution
X of Eq. 9.1 depend on the particular model used for Y, although the models
considered for Y are equivalent in the sense that they have the same marginal
distribution and correlation function. These results are of concern since the infor-
mation available in most applications rarely exceeds the information postulated in
Examples 9.43 and 9.44, and suggest that, at least for important systems, we need
to find properties of X in Eq. 9.1 for a collection of equivalent models of Y.

9.8.2 Partially known system and input


Consider a stochastic problem of the type in Eq. 9.1 and suppose that the
properties of the operator V and the input Y are not known. The selection of the
operator V, referred to as system identification ([18], Chapter 2), requires the
specification of the functional form of V and the coefficients of this functional
form. However, it is common to limit the solution of the system identification
problem to the estimation of the coefficients of V assumed to have a known func-
tional form ([18], Chapter 2). This partial solution of the system identification
problem can be unsatisfactory.
9.8. Model selection 749

0.15 .---.-------.-.------.----,-------,--,-----,------,------r----,
Sample of XT
0.1

0.05

0 bvi.IIMI'!MI"''~'IJIWIWIII
-0.05
-0.1 L___..L..__ __.__ ___,__ ____.__ ____jl____..J..__ _j___ _ J __ __.__ ___J

0 10 20 30 40 50 60 70 80 90 100

0.15 .---.------.-----,-----,-------,--.---..,-----.------.----,
Sample of XD
0.1

-0.05
-0.1 '-------'----'-----'----'------'--"---..L.._---'-----'-----'
0 10 20 30 40 50 60 70 80 90 100
time

Figure 9.31. Samples of X T and X D corresponding to samples of YT and YD for


s
a nonlinear oscillator with v = 10, = 0.1, a = 0.04, and b = 0.03

Example 9.45: Let Eq. 9.1 be a memoryless relation between two random vari-
ables, the input Y and the output X. Suppose that Eq. 9.1 is X = V - 1 [Y] = y 2
and we chose arbitrarily to model this equation by a linear regression X =a Y+b,
where (a, b) are unknown coefficients. If n independent samples (y i, Xi), i
1, ... , n, of these random variables are available, a and b can be estimated by
n n
and b A
-~--a~-·
- " Xi A " Yi
n n
i=1 i=1

The selected linear regression model may provide no information on the relation
between X andY. For example, if Y "' N(O, 1), the estimates (a, b) converge
to (0, 1) as n --7 oo so that the regression line becomes X = 1 under perfect
information. ~
Note: The estimates aand b minimize the error "L?-1 (a Yi + b -Xi ) 2 and have the above
a
expressions. Note that and b converge to Yxy /Yyy and J.tx - (Yxy /Yyy) J.ty as n --;. oo,
where (J.tx, J.ty) denote the means of (X, Y), 'Yxy is the covariance of (X, Y), and 'Yyy
denotes the variance of Y. If Y ~ N(O, 1), then J.ty = 0, J.tx = E[X] = E[Y 2 ] = 1, and
Yxy = E[X Yl = E[Y 3 ] = 0, so that a --;.
0 and b --;. 1 as n ___... oo. 6

We now present a method for solving the system identification problem,


which selects the optimal models for V andY in Eq. 9.1 from a specified collec-
tion of competing models with known functional form but unknown coefficients.
750 Chapter 9. Stochastic Systems and Input

so~~~--~~~------, 8007~~----~~--~---.
Histogram of XT Histogram of XD
60 60

40 40

20 20

0 0
_nn ~nnn.
-0.04 -0.02 0 0.02 0.04 0.06 -0.04 -0.02 0 0.02 0.04 0.06
X X

0.1
'•,
'•,

0.05
P(XD >x)
? '•

0
O.Q1 0.02 0.03 0.04 0.05 0.06
X

Figure 9.32. Histograms of X T and X D and estimates of the probabilities


P(Xr > x) and P(Xv > x) for a linear oscillator with v = 10, ~ = 0.1,
a = 0.04, and b = 0.03

The method is based on a Bayesian inference approach, and is illustrated for the
case in which Eq. 9.1 is a Gaussian autoregressive process with unknown order,
coefficients, and input noise scale. We denote the output X and the input Y in
Eq. 9.1 by X and W, respectively.

Let Mk, k = 1, 2, ... , m, be a collection of autoregressive Gaussian processes


with state X defined by the recurrence formula

Xt+l = f3o + f3r Xt + · · · + f3k Xt-k+l + Wt, (9.33)

where W1 are independent N(O, 1/ hk) variables, hk > 0, (f3o, ... , f3k, hk) are
unknown coefficients, and m ~ 1 is a specified integer defining the size of the
collection of competing models.

The ingredients of the Bayesian method for model selection are:

- Prior information on the collection of competing models, that is, the value
of m ~ 1 defining the largest autoregressive model considered in the analy-
sis, the prior densities f£ (/l k> hk) of the coefficients fl k = (f3o, f3r, ... , f3k)
and hk in the definition of Mk. k = 1, ... , m, and the prior probabilities
Pi > 0, Lk=l Pi = 1, where Pi is the initial likelihood that Mk is the cor-
rect model. An extensive discussion on the selection of ffc(flk, hk) can be
9.8. Model selection 751

found in [201] (Chapters II and III). It is possible to define non-informative


prior densities, that is, densities that carry no information regarding the pa-
rameters {Jk and hk. Physics must also be included in the selection of the
competing models. For example, Gaussian models are inadequate for phys-
ical parameters known to have bounded values.

- Measurements of the state X consisting of n consecutive readings, that is,


a record z = (zt, ... , Zn) of (X 1, ... , Xn) with size n > m, referred to as
an observation vector. In our discussion we consider perfect measurements.
Noisy measurements can be incorporated simply in the analysis.

- Utility function, which is problem specific and may depend on the model
use. Utility functions are needed in many applications since the obser-
vation vector z is frequently insufficient to provide a reliable solution to
the model selection problem. Let l (M k. Mq) be a utility function giving
the penalty if the model M k is selected and the correct model is M q and
let E[Mk] = L:;= 1 l(Mk. Mq) p~ denote the expected utility associated
with the selection of the model M k. where p~ denotes the posterior prob-
ability that Mq is the correct model and will be calculated later in this
section (Eq. 9.37). The optimal model minimizes E[M k] [89]. We take
l (Mk, Mq) = 1 - lhq in the following example, so that the optimal model
will have the highest posterior probability p k.
Our analysis delivers (1) an optimal model for V andY. that is, the model
Mk• with E[Mk•] 2: E[Mk] fork =f. k* and(2)theposteriordensities f£'(/Jk> hk)
of the coefficients of the models M k and the probabilities pf:, k = 1, ... , m. The
resulting optimal model M k* can be used to estimate extremes, integrals, and
other properties of X relevant for reliability studies [89]. The coefficients of M k*
are random because of the limited information. We now give formulas for calcu-
lating f£' and pf:.
Consider the model Mk in Eq. 9.33 and its regression form
1
Vk=akfJk+ fl:":"Gk (9.34)
vhk
corresponding to an observation vector z, where G k is an (n - k, 1) matrix with
independent N(O, 1) entries, V k = (Xk+t, ... , Xn) denotes a column vector,

Zk
Zk+l
(9.35)

Zn-1

and {J k = (f3o, f31, ... , f3k)T is a column vector. Given z = (zt, ... , Zn). the
vector V k and the matrix ak are perfectly known so that Eq. 9.34 becomes a
multiple regression model. Methods for regression analysis can be used to find
752 Chapter 9. Stochastic Systems and Input

the posterior density f£'({3k> hk) of the coefficients Pk and hk of Mk as well


as the posterior probability p~ that Mk is the correct model, k = 1, ... , m.
The posterior densities f£' and probabilities p~ incorporate the prior information
consisting off£ and p~ and the additional information provided by the observation
vector z.

If f£(f3k> hk) is a normal-gamma density with parameters ({3~, y~, p~, v~) and
n > k, then f£'({3k> hk) is also a normal-gamma density with parameters

(y%)- 1 = <rD- 1 +a[ ak


{3~ = Y~ [ (y~)- 1 {3~ + af Zk J
P~ v~ = P~ v~ + (f3~l <rD- 1 {3~- ({3~)T (y%)- 1 {3~ + z[ Zk
v~ = v~ +n- k, (9.36)

where Zk = (Zk+1, ... , Zn) is a column vector.

Note: Consider a model Mk defined by Eq. 9.33. Under the above assumption on the prior
density of (fJk> hk), we have fJk I hk ~ N(fJ~, rUhk) is a Gaussian vector with mean fJ~
and covariance matrix rU hk and hk ~ G2(p~, v~) follows a gamma-2 distribution with
parameters (p~, vic), that is, the prior density of hk is ([91], p. 226)

A density f//fJb hk) is said to be a conjugate prior density if!£ and !£' have
the same functional form, for example, the prior and posterior densities in Eq. 9.36. The
posterior density f!/ (fJk, hk) can be obtained by applying the Bayes formula (Section 2.3.4
in this book, [201], Section 3.2.3).
The results in Eq. 9.36 can be used to calculate the simple predictive distribution,
that is, the distribution of y(k) = Xn+ 1 conditional on Mk and based on the prior informa-
tion f£(fJb hk) and the observation vector z = (Z1, ... 'Zn). We have y(k) = z[ fJk+ Wn
from Eq. 9.33 with t = n, where Zk = (1, Zn, ... , Zn-k+1) is a column vector. Hence,
y(k) I hk ~ N ( zf fJ'£, (1 + zf y'£ Zk)/ hk) so that the density of y(k) is given by

where fk(y 1 ~)is the density of y(k) I (hk =~)and f~: denotes the posterior density of
hk· The above integral shows that the random variable (Y(k) -zr J
fJ'£) I v;: (1 + z[ y'£ Zk)
is a standard Student t with v',: degrees of freedom ([201], Section 3.2.4). A
9.8. Model selection 753

The posterior probability that M k is the correct model is


m
p'fc = c p~ fk(Zk), where c- 1 = L p~ fk(Zk) (9.37)

=I
k=1

fk(Zk) fk(Zk I {Jk> hk) f£(/Jk, hk) d{Jk dhk, (9.38)

fk(Zk I fJb hk) is the density of ak fJk + Gk/../hk conditional on ({Jb hk),
Zk = (Zk+1, .•. , Zn) (Eq. 9.36), and f£(/Jk> hk) denotes the prior density of
the coefficients ({Jk, hk) of Mk.

Note: The vector ak flk + Gki.../Tik is Gaussian with mean ak flk and covariance matrix
ik! hk conditional on (flk> hk), where ik is an identity matrix. Generally, numerical inte-
gration is needed to find fk(Zk). Straightforward calculations show that, if f£<flb hk) is
the normal-gamma density in Eq. 9.36, then

~
" ' _k_
( ) _ (211')-(vk-vk)/2 IY 11 1112 (p k1 v'k /2)vkf 2 f(vk11 /2)
Jk Zk - I>'kI 11/2 (pkII vk/2)
II v"/2
k r( vk'/2)'
where r (·) denotes the Gamma function. Extensive considerations on the selection of the
prior density can be found, for example, in [91] (Chapter 3) and [201] (Section 8.4). &

Example 9.46: Two sets of samples of length n = 10, 20, 50, 100, 150, 200,
and 250 have been generated from an autoregressive Gaussian process X of order
k = 3 with parameters fJo = 0, fJ1 = 0.7, fJ2 = -0.5, {J3 = -0.3, and 1/ hk =
1 (Eq. 9.33). We assume that the samples are of unknown origin, consider a
collection Mk, k = 1, 2, 3, of autoregressive process, and apply the method in
this section to find the optimal model M k* for X.
Figure 9.33 shows the dependence on the sample size of the posterior prob-
abilities p'fc in Eq. 9.37 for the two sets of samples generated from M 3· The
posterior probabilities P1 are nearly zero for all values of n and are not shown.
The posterior probabilities p~ and p~ vary significantly from sample to sample
and with the sample size. However, the method delivers the correct model as
the available information, that is, the sample size n of the record, increases. The
parameters ({Jo, fJ1, fJ2, {J3, h) of the optimal model Mk• = M3 are random be-
cause n is finite. For example, the means of the parameters ({Jo, fJ1, fJ2, {J3) of this
model are
(1.0056, 0.6736, -0.4555, -0.3499) and (1.1558, 0.6636, -0.4734, -0.3330)

for the two samples of X with length n = 250 considered in Fig. 9.33. The
standard deviations of these parameters are
(0.0885, 0.0591, 0.0660, 0.0591) and (0.0826, 0.0596, 0.0676, 0.0598),
respectively. The posterior density of ({J o, fJ1 , fJ2, {J3, h) given by Eq. 9.36 can be
used to find properties of the output of Eq. 9.1 at this information level.
754 Chapter 9. Stochastic Systems and Input

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

100 150
Sample size, n

Figure 9.33. Posterior probabilities of second and third order autoregressive


Gaussian processes

The evolution of the posterior probabilities in Fig. 9.33 shows that M 2 can
be the optimal model if the sample size n is not sufficiently large although (1) the
available samples have been generated from M 3 and (2) M2 is a special case of
M3. That simpler models may be superior to more general models for small sam-
ples is not surprising. For small values of n the uncertainty in the coefficients of
M3 is so large that its probability law becomes inconsistent with the observation
vector.<>
Note: The prior information used in the analysis has been derived from five consecutive
values of X generated from M3 with f3o = 0, f3I = 0.7, f32 = -0.5, /33 = -0.3, and
1/ hk = 1. This prior information has virtually no influence on the outcome in Fig. 9.33
since the sample size n considered in analysis is much larger.
Because the absolute values of the roots of the polynomial 1 - /31 z - f32 z2 - /33 z3
with /31 = 0.7, f32 = -0.5, and /33 = -0.3 differ from one, the state of the model
M 3 becomes stationary as t increases ([31], Theorem 3.1.3, p. 88). The samples used in
Fig. 9.33 correspond to the stationary state of M3. &

9.9 Problems
9.1: Find the local solution of the partial differential equation in Example 9.3
with Z1 and Z2 replaced by the random fields a1 + exp(G1 (x)) and -a2 + G2(x),
respectively, where a 1, a2 > 0 are constants, x E D, G 1 and G2 are homogeneous
Gaussian fields with mean zero and covariance functions q(~) = e-'Ak 11~11, k =
1, 2, and G1 is independent of G2. For numerical calculations take a1 = 1, a2 =
5, and A.1 = A.2 = 5.
9.9. Problems 755

9.2: Find the local solution of


d 2
au ex, t) = A(x)2 La U(~, t), xED, t :::. o,
at i=l axi

where D is an open bounded subset of m.d and A > 0 denotes a random field.
Assume the initial and boundary conditions U(x, 0) = a > 0, x E D, and
U(x, t) = 0 for X E aD and t:::. 0, respectively.

9.3: Find U in Example 9.3 by conditional analysis. Use any numerical method
to calculate the solution U I (Z 1, Z2).

9.4: Find U in Example 9.3 by the first order Taylor approximation.

9.5: Find the first four moments of X in Example 9.7 by a closure method. Com-
pare the resulting moments with Monte Carlo solutions.

9.6: Find the function c(z2) in the expression for the stationary density of the state
vector Z in Example 9.7.

9.7: Let X be the solution of (a+ R) X(t) = Q(t), t :::_ 0, where R is a real-
valued random variable such that IR j a I < 1 a.s. and Q is a stochastic process in
L2. Find the Neumann series representation of X and examine its convergence.
Calculate the second moment properties of X.

9.8: Find the second moment properties of X in Example 9.8 by one of the closure
methods in Section 7.3.1.5.

9.9: Find the function c(z2, Z3) in the expression for the stationary density of the
state vector Z in Example 9.8.

9.10: Let X be the solution of X(t) = -a X (t) 2 for a > 0, t :::_ 0, and X (0) = Z,
where Z is a real-valued random variable. Find the density of X at some time
t :::_ 0 by using Eq. 9.3 and Eq. 9.5.

9.11: Complete the solution of the problem in Example 9.20.

9.12: Solve the problem in Example 9.20 by the Galerkin method with Berstein
polynomials as trial functions.

9.13: Find the first two moments of U in Example 9.3 by the stochastic finite
element method.

9.14: Complete the details of the stochastic finite element analysis in Exam-
ple 9.30 based on polynomial chaos.
756 Chapter 9. Stochastic Systems and Input

9.15: Find the second order approximation of the solution of the steady-state
Boltzmann equation in Example 9.31.

9.16: Calculate the Lyapunov exponent for the stochastic differential equation in
Example 9.35. Discuss the stochastic stability of the stationary solution of this
equation as a function of p.

9.17: Suppose that Kr, r = 1, 2, 3, X(O), Z, and Kzp in Example 9.38 are real-
valued random variables. Find the marginal density of the state X (t) at an arbitrary
timet?: 0.

9.18: Find the local solution for Eq. 9.31 and a generalized version of this equa-
tion for the case in which the dispersion tensor dij is random.

9.19: Repeat the calculations in Examples 9.43 and 9.44 for other input processes,
which are equivalent in some sense. Also, assume that some of the coefficients of
the differential equations in these examples are random.

9.20: Complete the details of the proof presented for Eqs. 9.36 and 9.38.

9.21: Extend results in Example 9.46 to the case in which the observations are
imperfect, that is, the sample values Zi are replaced with Zi + Ui, where Ui are
independent copies of a real-valued random variable with mean zero and known
variance.

9.22: Let X be a stationary Gaussian process with mean zero and covariance func-
tion c('r) = E[X(t) X(t + r)]. The covariance function of X is not given. It
is known that cis one of the functions ct(r) = a 2 sin(vc r)j(vc r), cz(r) =
a 2 exp(-a Jrl), or c3(r) = a 2 (1 + fJJrl) exp(-{JJrl), where a, Vc, a, {3 > 0
are some unknown constants. Generate a sample of length n of X with c = c 1,
a = 1, and Vc = 30. Suppose that the resulting sample is of unknown origin.
Apply the approach in Example 9.46 to find the optimal model for X from the
collection of stationary Gaussian processes with mean zero and covariance func-
tions q, k = 1, 2, 3.
Bibliography

[1] R. J. Adler. The Geometry of Random Fields. John Wiley & Sons, New
York, 1981.

[2] G. Adomian, editor. Applied Stochastic Processes. Academic Press, New


York, 1980.

[3] G. Adomian. Stochastic Systems. Academic Press, New York, 1983.

[4] B. K. Agarwal and M. Eisner. Statistical Mechanics. John Wiley & Sons,
New York, 1988.

[5] T. M. Apostol. Mathematical Analysis. Addison-Wesley Publishing Com-


pany, Reading, Massachusetts, 1974.

[6] S. T. Ariaratnam. Some illustrative examples of stochastic bifurcation. In


J. M. T. Thompson and S. R. Bishop, editors, Nonlinearity and Chaos in
Engineering Dynamics, pages 267-274. John Wiley & Sons, Ldt., New
York, 1994.

[7] S. T. Ariaratnam and N. M. Abdelrahman. Almost sure stochastic stability


of viscoelastic plates in supersonic flow. AIAA Journal, 39(3):465-472,
2001.

[8] S. T. Ariaratnam and H. N. Pi. On the first-passage time for envelope cross-
ing for a linear oscillator. International Journal of Control, 18(1):89-96,
1973.

[9] S. T. Ariaratnam and W. C. Xie. Almost-sure stochastic stability of cou-


pled non-linear oscillators. International Journal of Non-Linear Mechan-
ics, 29(2):197-204, 1994.
[10] L. Arnold. A formula connecting sample and moment stability of linear
stochastic systems. SIAM Journal of Applied Mathematics, 1994.

[11] S. R. Arwade. Probabilistic models for aluminum microstructure and in-


tergranular fracture analysis. Master's thesis, Cornell University, Civil and
Environmental Engineering, Ithaca, NY, 1999.

757
758 Bibliography

[12] S. R. Arwade. Stochastic characterization and simulation of material mi-


crostructures with applications to alluminum. PhD thesis, Cornell Univer-
sity, Ithaca, NY, January 2002.

[13] S. F. Asokanthan and S. T. Ariaratnam. Almost-sure stability of a gyropen-


dulum subjected to white noise random support motion. Journal of Sound
and Vibration, 235(5):801-812, 2000.

[14] M. Avlonis, M. Zaiser, and E. C. Aifantis. Some exactly solvable models


for the statistical evolution of internal variables during plastic deformation.
Probabilistic Engineering Mechanics, 15(2):131-138, 2000.

[15] P. Bak. How Nature Works. Springer-Verlag, New York, 1996.

[16] P. Bak and C. Tang. Earthquakes as a self-organized critical phenomenon.


Journal of Geophysical Research, 94(B11):15,635-15,637, 1989.

[17] N. Bellomo and R. Rigantti. Nonlinear Stochastic Systems in Physics and


Mechanics. World Scientific, Singapore, 1987.

[18] J. S. Bendat. Nonlinear System Analysis and Identification from Random


Data. John Wiley & Sons, New York, 1990.

[19] P. Bernard and L. Wu. Stochastic linearization: The theory. Journal of


Applied Probability, 35:718-730, 1998.

[20] J. Bertoin. Levy Processes. Cambridge University Press, New York, 1998.

[21] A. T. Bharucha-Reid and M. Sambandham. Random Polynomials. Aca-


demic Press, Inc., New York, 1986.

[22] V. V. Bolotin. Random Vibration of Elastic Systems. Martinus Nijhoff


Publishers, Boston, 1984.

[23] A. P. Boresi and K. P. Chong. Elasticity in Engineering Mechanics. John


Wiley & Sons, Inc., New York, 2000.

[24] P. Bougerol and J. Lacroix. Products ofRandom Matrices with Applications


to Schrodinger Operators. Birkhauser, Boston, 1985.

[25] W. E. Boyce. Probabilistic Methods in Applied Mathematics, volume 1.


Academic Press, New York, 1968.

[26] R. L. Brabenec. Introduction to Real Analysis. PWS-KENT Publishing


Company, Boston, 1990.

[27] P. Bratley, B.L. Fox, and L. E. Schrage. A Guide to Simulation. Springer-


Verlag, New York, second edition, 1987.
Bibliography 759

[28] C. A. Brebbia and J. J. Connor. Fundamentals of Finite Element Tech-


niques. Butterworth, London, 1973.

[29] E. 0. Brigham. The Fast Fourier Transform. Prentice-Hall, Inc., Engle-


wood Cliffs, NJ, 1974.

[30] R. W. Brockett. Finite Dimensional Linear Systems. John Wiley & Sons,
Inc., New York, 1970.

[31] P. J. Brockwell and R. A. Davis. Time Series: Theory and Methods.


Springer-Verlag, New York, 1987.

[32] V. Bryant. Metric Spaces. Iteration and Applications. Cambridge Univer-


sity Press, 1987.

[33] R. S. Bucy and P. D. Joseph. Filtering for Stochastic Processes with Appli-
cations to Guidance. Chelsea Publishing Company, New York, 1987.

[34] B. Budiansky. On the elastic moduli of some heterogeneous materials.


Journal of the Mechanics and Physics of Solids, 13:223-227, 1965.

[35] R. Burridge and L. Knopoff. Model and theoretical seismicity. Bulletin of


the Seismological Society of America, 57(3):341-371, 1967.

[36] G. Q. Cai andY. K. Lin. Generation of non-Gaussian stationary stochastic


processes. Physical Review, E, 54(1):299-303, 1995.

[37] A. S. Cakmak, J. F Botha, and W. G Gray. Computational and Applied


Mathematics for Engineering Analysis. Springer-Verlag, New York, 1987.

[38] Grigoriu M. Kulkarni S. S. Chati, M. K. and S. Mukherjee. Random walk


method for the two- and three-dimensional Laplace, Poisson and Helmholtz
equations. International Journal for Numerical Methods in Engineering,
51:1133-1156,2001.

[39] A. H. D. Cheng and C. Y. Yang, editors. Computational Stochastic Me-


chanics. Elsevier Applied Science, New York, 1993.

[40] K. L. Chung. A Course in Probability Theory. Academic Press, Inc., New


York, 1974.

[41] K. L. Chung. Green, Brown, and Probability. World Scientific, New Jersey,
1995.

[42] K. L. Chung and R. J. Williams. Introduction to Stochastic Integration.


Birkhauser, Boston, 1990.

[43] R. Courant and D. Hilbert. Methods of Mathematical Physics, volume 1.


Interscience Publishers, Inc., New York, 1953.
760 Bibliography

[44] R. Courant and D. Hilbert. Methods of Mathematical Physics, volume 2.


lnterscience Publishers, Inc., New York, 1953.

[45] H. Cramer and M. R. Leadbetter. Stationary and Related Stochastic Pro-


cesses. John Wiley & Sons, Inc., New York, 1967.

[46] G. T. Csanady. Turbulent Diffusion in the Environment. D. Reidel Publish-


ing Company, Boston, 1973.

[47] J. H. Cushman. Development of stochastic partial differential equations


for subsurface hydrology. Stochastic Hydrology and Hydraulics, 1(4) :241-
262, 1987.

[48] R. E. Cutkosky. A Monte Carlo method for solving a class of integral equa-
tions. Journal ofResearch of the National Bureau of Standards, 47(2): 113-
115, 1951.

[49] W. B. Davenport and W. L. Root. An Introduction to the Theory of Random


Signals and Noise. McGraw-Hill Book Company, Inc., New York, 1958.

[50] P.R. Dawson and E. B. Martin. Computational mechanics for metal de-
formation processes using polycrystal plasticity. Advances in Applied Me-
chanics, pages 77-169, 1998.

[51] J. B. Diaz and L. E. Payne. Mean value theorems in the theory of elasticity.
In Proceedings of the Third U. S. National Congress ofApplied Mechanics,
pages 293-303. The American Society of Mechanical Engineers, June 11-
14 1958.

[52] P. J. Digby. The effective elastic moduli of porous granular rocks. Journal
ofApplied Mechanics, 48:803-808, 1981.

[53] 0. Ditlevsen. Uncertainty Modeling with Applications to Multidimensional


Civil Engineering Systems. McGraw-Hill Inc., New York, 1981.

[54] 0. Ditlevsen and G. Lindgren. Empty envelope excursions in stationary


Gaussian processes. Journal of Sound and Vibration, 122(3):571-587,
1988.

[55] P. Ditlevsen. Observations of a-stable noise induced millennial cli-


mate changes from an ice--core record. Geophysical Research Letters,
26(10):1441-1444, 1999.

[56] J. Durbin. The first-passage density of a continuous gaussian process to a


general boundary. Journal ofApplied Probability, 22:99-122, 1985.

[57] R. Durrett. Brownian Motion and Martingales in Analysis. Wadsworth


Advanced Books & Software, Belmont, California, 1984.
Bibliography 761

[58] R. Durrett. Stochastic Calculus. A Practical Introduction. CRC Press, New


York, 1996.
[59] R. J. Elliott. Stochastic Calculus and Applications. Springer-Verlag, New
York, 1982.
[60] R. J. Elliott and P. E. Kopp. Mathematics of Financial Markets. Springer-
Verlag, New York, 1999.
[61] S. N. Ethier and T. G. Kurtz. Markov Processes. Characterization and
Convergence. John Wiley & Sons, New York, 1986.
[62] W Feller. An Introduction to Probability Theory and Its Applications, vol-
ume II. John Wiley & Sons, Inc., New York, second edition, 1971.
[63] R. V. Field, Jr. and M. Grigoriu. A new perspetive on polynomial chaos. In
P. D. Spanos and G. Deodatis, editors, Proceedings of the Fourth Interna-
tional Conference on Computational Stochastic Mechanics, to appear. Pre-
sented at the Fourth International Conference on Computational Stochastic
Mechanics, Corfu, Greece, June 9-12,2002.
[64] A. Frankel. Mapping seismic hazard in the central united and eastern
United States. Seismological Research Letters, 66(4):8-21, 1995.

[65] B. Friedman. Lectures on Applications-Oriented Mathematics. John Wiley


& Sons, Inc., New York, 1969.

[66] B. Fristedt and L. Gray. A Modem Approach to Probability Theory.


Birkhauser, Boston, 1997.
[67] A. T. Fuller. Analysis of nonlinear stochastic systems by means of the
Fokker-Planck equation. International Journal of Control, 9(6):603-655,
1969.
[68] Y. C. Fung. Foundations of Solid Mechanics. Prentice-Hall, Inc., Engle-
wood Cliffs, NJ, 1965.
[69] H. Furstenberg and H. Keston. Products of random matrices. Annals od
Mathematical Statistics, 31:457-469, 1960.
[70] C. W. Gardiner. Handbook of Stochastic Methods for Physics, Chemistry
and the Natural Sciences. Springer-Verlag, New York, second edition,
1985.
[71] R. G. Ghanem and P. D. Spanos. Stochastic Finite Elements: A Spectral
Approach. Springer-Verlag, New York, 1991.
[72] J. D. Goddard. Continuum modelling of granular assemblies. In H. J.
Herrmann, J.-P. Hovi, and S. Luding, editors, Physics of Dry Granular
Media, pages 1-24. Kluwer Academic Publishers, Boston, 1998.
762 Bibliography

[73] M.D. Greenberg. Foundations ofApplied Mathematics. Prentice Hall, Inc.,


Englewood Cliffs, New Jersey, 1978.

[74] M. Grigoriu. A consistent closure method for nonlinear random vibration.


International Journal ofNonlinear Mechanics, 26(6):857-866, 1991.

[75] M. Grigoriu. A solution of the random eigenvalue problem by crossing


theory. Journal of Sound and Vibration, 158(1):58-80, 1992.

[76] M. Grigoriu. Transient response of linear systems to stationary gaussian


inputs. Probabilistic Engineering Mechanics, 7(3):159-164, 1992.

[77] M. Grigoriu. Simulation of nonstationary Gaussian processes by random


trigonometric polynomials. Journal of Engineering Mechanics, ASCE,
119(2):328-343, 1993.

[78] M. Grigoriu. Simulation of stationary processes via a sampling theorem.


Journal of Sound and Vibration, 166(2):301-313, 1993.

[79] M. Grigoriu. Applied Non-Gaussian Processes: Examples, Theory, Simu-


lation, Linear Random Vibration, and MATLAB Solutions. Prentice Hall,
Englewoods Cliffs, NJ, 1995.

[80] M. Grigoriu. Equivalent linearization for Poisson white noise input. Prob-
abilistic Engineering Mechanics, 11(1):45-51, 1995.

[81] M. Grigoriu. Lyapunov exponents for nonlinear systems with Poisson white
noise. Physics Letters A, 217:258-262, 1996.

[82] M. Grigoriu. Local solutions of Laplace, heat, and other equations by


Ito processes. Journal of Engineering Mechanics, ASCE, 123(8):823-829,
1997.

[83] M. Grigoriu. Mean and covariance equations for boundary value problems.
Journal of Engineering Mechanics, ASCE, 123(5):485-488, 1997.

[84] M. Grigoriu. Solution of some elasticity problems by the random walk


method. ACTA Mechanica, 125:197-209, 1997.

[85] M. Grigoriu. A local solution of the SchrOdinger equation. Journal of


Physics A: Mathematical and General, 31:8669-8676, 1998.

[86] M. Grigoriu. A Monte Carlo solution of transport equations. Probabilistic


Engineering Mechanics, 13(3):169-174, 1998.

[87] M. Grigoriu. A spectral representation based model for Monte Carlo sim-
ulation. Probabilistic Engineering Mechanics, 15(4):365-370, October
2000.
Bibliography 763

[88] M. Grigoriu and S. Balopoulou. A simulation method for stationary Gaus-


sian random functions based on the sampling theorem. Probabilistic Engi-
neering Mechanics, 8(3-4):239-254, 1993.
[89] M. Grigoriu, D. Veneziano, and C. A. Cornell. Probabilistic modelling as
decision making. Journal of the Engineering Mechanics Division, ASCE,
105(EM4):585-596, 1979.

[90] V. K. Gupta, I. Rodriguez-Iturbe, and E. F. Wood, editors. Scale Problems


in Hydrology. D. Reidel Publishing Company, Boston, 1986.

[91] Raiffa H. and R. Schlaifer. Applied Statistical Decision Theory. The M.I.T.
Press, Cambridge, Massachusetts, 1961.

[92] P. Halmer. On the foundations of stochastic dislocation dynamics. Applied


Physics A, 62:473-481, 1996.
[93] J. H. Halton. Sequential monte carlo. In Proceedings of the Cambridge
Philosophical Society (Mathematical and Physical Sciences), volume 58,
pages 57-78. Cambridge at the University Press, 1962.

[94] J. M. Hammersley and D. C. Handscomb. Monte Carlo Methods. Methuen


& Co. Ltd., London, 1964.

[95] J. M. Harmmersley and D. C. Handscomb. Monte Carlo Methods. Methuen


& Co. Ltd., London, 1967.

[96] S. Harmsen and A. Frankel. Deaggregation of probabilistic ground mo-


tions in the central and eastern United States. Bulletin of the Seismological
Society of America, 89(1):1-13, 1999.

[97] A. M. Hasofer and M. Grigoriu. A new perspective on the moment closure


method. Journal ofApplied Mechanics, 62(2):527-532, 1995.

[98] D. B. Hernandez. Lectures on Probability and Second Order Random


Fields. World Scientific, London, 1995.

[99] F. B. Hildebrand. Methods of Applied Mathematics. Prentice Hall, Inc.,


Englewood Cliffs, N.J., 1965.

[100] B. M. Hill. A simple general approach to inference about the tail of a


distribution. The Annals of Statistics, 3(5):1163-1174, 1975.
[101] E. J. Hinch. Perturbation Methods. Cambridge University Press, Cam-
bridge, 1994.

[ 102] W. Horsthemke and R. Lefever. Phase transitions induced by external noise.


Physics Letters, 64A(1):19-21, 1977.
[103] K. Huang. Statistical Mechanics. John Wiley & Sons, New York, 1987.
764 Bibliography

[104] I.D. Huntley and R. M. Johnson. Linear and Nonlinear Differential Equa-
tions. John Wiley & Sons, New York, 1983.

[105] R. A. Ibrahim. Parametric Random Vibration. John Wiley & Sons Inc.,
New York, 1985.

[106] R. Iranpour and P. Chacon. The Mark Kac Lectures. Macmillan Publishing
Company, New York, 1988.

[107] K. Ito and M. Matsuzaki. Earthquakes as self-organized critical phenom-


ena. Journal of Geophysical Research, 95(B5):6853-6860, 1990.

[108] J. T. Jenkins and 0. D. L Strack. Mean field inelastic behavior of random


arrays of identical spheres. Mechanics of Materials, 16:25-33, 1993.

[109] G. Kallianpur. Stochastic Filtering Theory. Springer-Verlag, New York,


1980.

[110] R. P. Kanwal. Linear Integral Equations. Theory and Technique. Academic


Press, New York, 1971.

[111] S. Karlin and H. M. Taylor. A Second Course in Stochastic Processes.


Academic Press, New York, 1981.

[112] R. Z. Khas'minskii. A limit theorem for the solution of differential equa-


tions with random right-hand sides. Theory of Probability and Its Applica-
tions, XI(3):390--406, 1966.

[113] I. C. Kim and S. Torquato. Determination of the effective conductivity of


heterogeneous media by brownian motion simulation. Journal of Applied
Physics, 68(8):3892-3903, 1990.

[114] M. Kleiber and T. D. Hien. The Stochastic Finite Element. John Wiley &
Sons, New York, 1992.

[115] P. E. Kloeden and Platen E. Numerical Solutions of Stochastic Differential


Equations. Springer-Verlag, New York, 1992.

[116] M. A. Koenders. Evolution of spatially structures elastic materials using a


harmonic density function. Physical Review E, 56(5), 1997.

[117] V. Krishnan. Nonlinear Filtering and Smoothing: An Introduction to Mar-


tingales, Stochastic Integrals and Estimation. John Wiley & Sons, New
York, 1984.

[118] S. Kulkarni, S. Mukherjee, and M. Grigoriu. Applications ofthe boundary


walk method to potential theory and linear elasticity. Journal of Applied
Mechanics, ASME, 2002. in press.
Bibliography 765

[ 119] A. Kumar and P. R. Dawson. The simulation of texture evolution with finite
element over orientation space. Application to planar polycrystals. Applied
Mechanics and Engineering, 130:247-261, 1996.

[ 120] P. Lancaster and M. Tismenetsky. The Theory ofMatrices. Academic Press,


Inc., New York, second edition, 1985.

[121] M. R. Leadbetter, G. Lindgren, and H. Rootzen. Extremes and Related


Properties of Random Sequences and Processes. Springer-Verlag, New
York, 1983.

[122] Y. K. Lin and G. Q. Cai. Probabilistic Structural Dynamics. Advanced


Theory and Applications. McGraw-Hill, New York, 1995.

[123] G. G. Lorentz. Bernstein Polynomials. Chelsea Publishing Company, New


York, 1986.

[124] E. Lukacs. Characteristic Functions. Number 5 in Griffin's Statistical


Monographs & Courses. Charles Griffin & Company Limited, London,
1960.

[125] K. A. Mal and S. J. Singh. Deformation of Elastic Solids. Prentice Hall,


1991.

[126] R.N. Mantegna and H. E. Stanley. An Introduction to Econophysics. Cor-


relation and Complexity in Finance. Cambridge University Press, Cam-
bridge, UK, 2000.

[127] L. Meirovitch. Computational Methods in Structural Dynamics. Sijthoff &


Noordhoff, Rockville, Maryland, 1980.

[128] Yu. A. Melnikov. Green's Functions in Applied Mathematics. Computa-


tional Mechanics Publications, Boston, 1995.

[129] J. L. Melsa and A. P. Sage. An Introduction to Probability and Stochastic


Processes. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1973.

[130] C. Meunier and A. D. Verga. Noise and bifurcations. Journal of Statistical


Physics, 50(112):345-375, 1988.

[131] T. Mikosch. Elementary Stochastic Calculus. World Scientific, New Jersey,


1998.

[132] A. Naess and J. M. Johnsen. Response statistics of nonlinear, compliant


offshore structures by the path integral solution method. Probabilistic En-
gineering Mechanics, 8(2):91-106, 1993.

[133] M. Nicolescu. Funcfii Reale# Elemente de Topologie (in Romanian). Ed-


itura Didactica si Pedagogica, Bucharest, Romania, 1968.
766 Bibliography

[134] B. 0ksendal. Stochastic Differential Equations. Springer-Verlag, New


York, 1992.

[135] B. 0ksendal. Stochastic Differential Equations. An Introduction with Ap-


plications. Springer-Verlag, New York, 1998.

[136] M. Ostoja-Starzewski. Micromechanics as a basis of random elastic con-


tinuum approximations. Probabilistic Engineering Mechanics, 8:107-114,
1993.
[137] M. Ostoja-Starzewski. Micromechanics as a basis of continuum random
fields. Applied Mechanics Reviews, 47:221-230, 1994.
[138] W. J. Padgett, G. Schultz, and C. P Tsokos. A random differential equation
approach to the probability distribution of bod and do in streams. SIAM
Journal ofApplied Mathematics, 32(2):467-483, 1977.
[139] A. S. Papageorgiou. On the characteristic frequencies of acceleration spec-
tra: Patch comer frequency and f-max. Bulletin ofthe Seismological Society
of America, 78(2):509-529, 1988.
[140] G. C. Papanicolaou. Wave propagation in a one-dimensional random
medium. SIAM Journal of Applied Mathematics, 21(1):13-18, 1971.
[141] A. Papoulis. Probability, Random Variables, and Stochastic Processes.
McGraw-Hill Book Company, New York, 1965.
[142] V. V. Petrov. Sums of Independent Random Variables. Springer-Verlag,
New York, 1975.
[143] A. J. Pettofrezzo. Matrices and Transformations. Dover Publications, Inc.,
New York, 1966.
[144] R. Popoescu, J. H. Prevost, and G. Deodatis. Effects of spatial variabil-
ity on soil liquefaction: Some design recommendations. Geotechnique,
XLVII(5):1019-1036, December 1997.
[145] V. C. Prantil, J. T. Jenkins, and P. R. Dawson. An analysis of texture and
plastic spin for planar polycrystals. Journal of the Mechanics and Physics
of Solids, 42(8):1357-1382, 1993.
[146] J. H. Prevost. A simple plasticity theory for frictional cohesionless soils.
Soil Dynamics and Earthquake Engineering, 4:9-17, 1985.
[147] P. Protter. Stochastic Integration and Differential Equations. Springer-
Verlag, New York, 1990.
[148] F. Radjai. Multicontact dynamics. In H. J. Herrmann, J.-P. Hovi, and
S. Luding, editors, Physics ofDry Granular Media. Kluwer Academic Pub-
lishers, Boston, 1998.
Bibliography 767

[149] K. Rektorys. Survey of Applicable Mathematics. The M.I.T. Press, Cam-


bridge, Massachusetts, 1969.

[150] S. I. Resnick. Adventures in Stochastic Processes. Birkhauser, Boston,


1992.

[151] S. I. Resnick. A Probability Path. Birkhauser, Boston, 1998.

[152] J. B. Roberts. Averaging Methods in Random Vibration. Number R-245.


Technical University of Denmark, Lyngby, Denmark, 1989.

[153] J. B. Roberts and P. D. Spanos. Random Vibration and Statistical Lineariza-


tion. John Wiley & Sons, New York, 1990.

[154] S.M. Ross. Stochastic Processes. John Wiley & Sons, New York, 1983.

[155] R. Rubinstein. Simulation and the Monte Carlo Method. John Wiley &
Sons, New York, NY, 1981.

[156] W. Rudin. Real and Complex Analysis. McGraw-Hill, Inc., New York,
1974.

[157] P. A. Ruymgaart and T. T. Soong. Mathematics of Kalman-Bucy Filtering.


Springer-Verlag, New York, 1988.

[158] I. Rychlik. A note on Durbin's formula for the first-passage density. Statis-
tics & Probability Letters, 5:425-428, 1987.

[159] I. Rychlik and G. Lindgren. Crossreg, a computer code for first passage and
wave density analysis. Technical report, University of Lund and Lund Insti-
tute of Technology, Department of Mathematical Statistics, Lund, Sweden,
1990.

[160] K. K. Sabelfeld. Monte Carlo Methods in Boundary Value Problems.


Springer-Verlag, New York, 1991.

[161] G. Samorodnitsky. Long Range Dependence, Heavy Tails and Rare Events.
Lecture notes. MaPhySto, Center for Mathematical Physics and Stochas-
tics. Aarhus, Denmark, 2002.

[162] G. Samorodnitsky and M.S. Taqqu. Stable Non-Gaussian Random Pro-


cesses. Stochastic Models with Infinite Variance. Birkhauser, New York,
1994.

[163] S. B. Savage. Modeling and granular material boundary value problems. In


H. J. Herrmann, J.-P. Hovi, and S. Luding, editors, Physics of Dry Granular
Media. Kluwer Academic Publishers, Boston, 1998.

[164] J. Scheidt and W. Purkert. Random Eigenvalue Problem. North Holland,


New York, 1983.
768 Bibliography

[165] Z. Schuss. Theory and Applications of Stochastic Differential Equations.


John Wiley & Sons, New York, 1980.
[166] F. C. Schweppe. Uncertain Dynamic Systems. Prentice Hall, Inc, Engle-
wood Cliffs, NJ, 1973.

[167] J.P. Sethna, K. Dahmen, S. Kartha, J. A. Krumhansi, B. W. Roberts, and


J.D. Shore. Hysteresis and hierarchies: Dynamics of disorder-driven first-
order phase transformations. Physical Review Letters, 70(21):3347-3350,
1993.
[168] J. P. Sethna, K. Dahmen, and C. R. Myers. Crackling noise. Nature,
410:242-250,2001.

[169] M. Shinozuka. Structural response variability. Journal of Engineering Me-


chanics, ASCE, 113(EM6):825-842, 1987.
[170] M. Shinozuka and Yang J.-N. On the bounds of first excursion probability.
Journal of the Engineering Mechanics Division, ASCE, 95(EM2):363-377,
1969.

[171] J. G. Simmonds and J. E. Mann. A First Look at Perturbation Theory.


Robert E. Krieger Publishing Company, Malabar, Florida, 1986.
[172] D. L. Snyder. Random Point Processes. John Wiley & Sons, New York,
1975.
[173] K. Sobczyk. Stochastic Wave Propagation. Elsevier, New York, 1985.

[174] C. Soize. The Fokker-Planck Equation for Stochastic Dynamical Systems


and Its Explicit Steady State Solution. World Scientific, New Jersey, 1994.

[175] T. T. Soong and M. Grigoriu. Random Vibration of Mechanical and Struc-


tural Systems. Prentice Hall, Englewood Cliffs, N.J., 1993.

[176] T.T. Soong. Probabilistic Modeling and Analysis in Science and Engineer-
ing. John Wiley & Sons, New York, NY, 1981.
[177] P. D. Spanos. Stocahstic linearization in structural dynamics. Applied Me-
chanics Reviews, 34(1):1-8, 1981.
[178] P. D. Spanos and C. A. Brebbia, editors. Computational Stochastic Me-
chanics. Elsevier Applied Science, New York, 1991.
[179] B. F. Spencer and L.A. Bergman. On the numerical solutions of the Fokker-
Planck equations for nonlinear stochastic systems. Nonlinear Dynamics,
4:357-372, 1993.
[180] D. Stoyan, W. S. Kendall, and J. Mecke. Stochastic Geometry and Its Ap-
plications. John Wiley & Sons, New York, 1987.
Bibliography 769

[ 181] H. Tanaka. Application of importance sampling method to time-dependent


system reliability analysis using Girsanov transformation. InN. Shiraishi,
M. Shinozuka, and Y. K. Wen, editors, Structural Safety and Reliability,
ICOSSAR'97, volume 1, Rotterdam, Netherlands, 1998. A. A. Balkema.

[182] S. J. Taylor. Exact asymptotic estimates of brownian path variation. Duke


Mathematical Journal, 39, 1972.

[183] G. P. Tolstov. Fourier Series. Dover Publications, Inc., New York, 1962.

[184] S. Torquato. Thermal conductivity of disordered heterogeneous media form


the microstructure. Reviews in Chemical Engineering, 4(3 & 4):151-204,
1987.

[185] S. Torquato. Random Heterogeneous Materials. Microstructure and


Macroscopic Properties. Springer, New York, 2002.

[186] S. Torquato and I. C. Kim. Efficient simulation technique to compute


effective properties of heterogeneous media. Applied Physics Letters,
55(18):1847-1849, 1989.

[187] F. G. Tricomi. Integral Equations. Dover Publications, Inc., New York,


1957.

[188] H. S. Thrkmen, R. Loge, P. R Dawson, and M. P. Miller. The micro-


mechanical behavior of a polycrystalline metal during cycling loading, in
preparation. 2002.

[189] E. Vanmarcke. On the distribution of the first-passage time for normal


stationary random processes. Journal of Applied Mechanics, 42:215-220,
1975.

[190] E. Vanmarcke, M. Shinozuka, S. Nakagiri, G.l. Schueller, andM. Grigoriu.


Random fields and stochastic finite elements. Structural Safety, 3:143-166,
1986.

[191] W. V. Wedig. Lyapunov Exponents of Stochastic Systems and Related Bi-


furcation Problems. Elsevier Applied Science, 315-327, New York, 1988.

[192] S. Whitaker. Multiphase transport phenomena: matching theory and ex-


periments. In G. Papanicolaou, editor, Advances in multiphase flow and
related problems, pages 273-295. SIAM, 1986.

[193] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press,


Oxford, 1988.

[194] J. R. Willis. Variational and related methods for the overall properties of
composites. Advances in Applied Mechanics, 21:1-78, 1981.
770 Bibliography

[195] R. Willis and W. W.-G. Yeh. Groundwater Systems Planning and Manage-
ment. Prentice Hall, Englewood Cliffs, NJ, 1987.

[196] S. F. Wojtkiewicz and L. A. Bergman. Numerical solution of high-


dimensional Fokker-Planck equations. In A. Kareem, A. Haldar, B. F.
Spencer, and Johnson E. A., editors, Proceedings of the 8'th ASCE Joint
Specialty Conference on Probabilistic Mechanics and Structural Reliabil-
ity, CD-ROM, July 24-26 2000.

[197] E. Wong and B. Hajek. Stochastic Processes in Engineering Systems.


Springer-Verlag, New York, 1985.

[198] W-C Xie and S. T. Ariaratnam. Vibration mode localization in large


randomly disordered continuous beams. Fields Institute Communication,
9:219-238, 1996.

[199] P. Yiou, K. Fuhrer, L. D. Meeker, J. Jouzel, S. Johnsen, and P. A.


Mayewski. Paleoclimatic variability inferred from the spectral analysis of
Greenland and Antarctic ice-core data. Journal of Geophysical Research,
102(C12):26441-26454, 1997.

[200] J. M. Zagajac. Engineering Analysis over Subdomains. PhD thesis, Cor-


nell University, Sibley School of Mechanical and Aerospace Engineering,
Ithaca, New York, May 1997.

[201] A. Zellner. An Introduction to Bayesian Inference in Econometrics. John


Wiley & Sons, Inc., New York, 1971.
Index

Lq(n, :F, P) space, 34 Decomposition method, 563


a-field, 6, 116 Differential equations for
Borel, 7 characteristic function, 455, 478,
generated by random process, 107 510
generated by random variable, density, 481
22 backward Kolmogorov, 490
Fokker-Planck equation, 481
Backward Kolmogorov equation, 490, moments, 433, 437, 452, 463,
494 475,494,508
Bochner's theorem, 132 Diffusion coefficient, 254
Borel-Cantelli Lemma, 21 Diffusion process, 254, 258, 262, 267,
Brownian motion, 107, 256 432,475
in the first orthant of JR2 , 387 Doob decomposition, 95
properties, 186 Doob-Meyer decomposition, 175
reflected at two thresholds, 383 Drift coefficient, 254
reflected at zero, 372 Dynkin formula, 350
local time, 372
Tanaka formula, 373 Earthquake engineering and seismol-
ogy, 741
Cadlag, caglad, 113 cellular automata model, 743
Cauchy distribution, 46 fragility surface, 528
Chapman-Kolmogorov equation, 121 soil liquefaction, 663
Compound Poisson process, 111, 182 Eigenvalue problem, 370, 413, 421,
Conditional expectation, 82 566,599
change of fields, 89 Equivalent linearization method, 564
conditional probability with re- Expectation operator, 28, 33, 143
spect to a a-field, 91 Extremes of stochastic processes, 165
defining relation, 84 first passage time, 168, 527
properties, 89 mean crossing rate, 165
Conditional probability, 16
Convergence of random variables, 70 Fatou's lemma, 21
Lp, 70 Feynman-Kac functional, 364, 365
a.s., 70 Filtration, 78, 107
distribution, 70 Finance, 534
mean square, 70 Black-Scholes formula, 536
probability, 70 stock price model, 535

771
772 Index

Finite difference method, 411, 488, Levy measure, 56


531,533,600,700 Levy process, 189
Finite dimensional distribution, 117 Levy decomposition, 193, 197
Finite element method, 600, 652, 668, Levy-Khinchine formula, 197
701 Levy-Khinchine representation, 56
First passage time, 168, 527 Liouville equation, 687, 714, 730
Fokk:er-Planckequation, 269,481,494, Lipschitz condition, 258
531 Local solution
Fubini theorem, 39,99 algebraic and integral equations
homogeneous,413,421
Girsanov's theorem, 337, 338 inhomogeneous,407,418
differential equation
Independence boundary walk method, 403
a-fields, 36 Feynman-Kac functional, 364
events, 36 random walk method, 345, 550
random variables, 38 Schrodingerequation,367,370
Independentincrements, 122, 182, 186, spherical process method, 394
189 Lyapunov exponent
Inequalities modelocalization,666
Cauchy-Schwarz, 69 noise induced transitions, 720
Chebyshev, 68 stochastic stability, 655
Doob, 98
Holder, 69 Markov property, 81
Jensen, 68 Martingale, 94
Minkowski, 69 Doob decomposition, 95
Infinitely divisible characteristic func- Doob inequality, 98
tion, 52 Doob-Meyerdecomposition, 175
a-stable, 57 Jensen inequality, 176
canonical representation, 55 stopped, 96, 175
construction, 54 submartingale, 92, 94, 169
Levy-Khinchine representation, supermartingale, 92, 94, 169
56 variation and covariation, 179
properties, 53 Martingales, 92, 169
Integrals of random variables Materials science, 518
Fatou lemma, 32 effective properties, 605
Lebesgue theorem, 33 evolution, 633
properties,30 Reuss average, 623
Ito formula, 237 Voigt average, 621
multi-dimensional case, 247 Measurable functions, 21
one-dimensional case, 238 Measurable space, 6
Iteration method, 565, 574, 604 Mixture of translation process, 126
Modeling
Karhunen-Loeve representation, 161 non-Gaussian input, 515
Kolmogorov criterion, 110 partially known input, 745
Index 773

unknown input and system, 748 Radon-Nikodym derivative, 41


Monte Carlo simulation Random field, 104, 158
applications, 507,527,528,531, Random variable, 22
533, 552, 554, 567, 581, ?-integrable, 29
584,614,677,736 arbitrary, 29
improved characteristic function, 47
measure change, 334 density function, 45
time change, 330 distribution function, 43
non-Gaussian process and field finite-valued simple, 28
memoryless transformations, positive, 29
316 Random variables, 42
transformation with memory, Random vector, 22
320 characteristic function, 62
non-stationary Gaussian process Gaussian vector, 65
and field independence,61
Fourier series, 312,315 joint density function, 59
linear differential equations, joint distribution function, 59
310 moments, 64
point processes, 325
second moment properties, 65
random variable, 288
Random vectors, 58
stationary Gaussian process and
Random walk, 7 5
field, 293
Reliability, 522
sampling theorem, 304, 309
Reuss average, 623
spectral representation, 293,
299 Riemann-Stieltjes integral, 206

Neumann series method, 561, 591, Sample space, 5


630,693 SchrOdingerequation, 346,367,370
Second moment calculus for processes
Omstein-Uhlenbeckprocess, 257,263, in L2, 139
314,340 expectation and mean square in-
tegrals, 152
Path integral solution, 494
Karhunen-Loeve representation,
Perturbation method, 499, 558, 572,
161
587,603,629,692
mean square continuity, 111, 141
Poisson process, 182, 326
Probability measure, 8 mean square differentiation, 14 2
Probability space, 5 mean square integration, 145
Product probability space, 13 spectral representation, 153
variation functions, 146
Quadratic variation and covariation, Second moment properties, 65, 138,
179,228 159
integration by parts, 229 Semimartingale, 324
Kunita-Watanabe inequality, 235 Sequence of events, 20
polarization identity, 229 State augmentation, 460, 512, 682
774 Index

Stationary increments, 111, 122, 182, Taylor series method, 554, 570, 584,
186, 189 691
Stationary process, 119 Translation process, 125
in the strict sense, 119
in the weak sense, 128 Voigt average, 621
spectral density, 132
White noise process, 144, 184, 186,
Stochastic differential equation, 253
254,322,323
numerical solution, 275 Wong-Zakai theorem, 267
Euler, 277
Milstein, 280
semimartingale input, 271
Brownian motion input, 256
diffusion process, 262
equations for characteristic func-
tions, 267
equations for densities, 267
equations for moments, 267
semimartingale, 263
Wong-Zakai theorem, 267
Stochastic integral, 208
associativity, 225
Ito integral, 208, 250
preservation, 225
semimartingale, 217
simple predictable integrand, 221
Stratonovich integral, 210, 249,
250
Stochastic process, 104
adapted, 107
dtdlag, caglact, 113
classes of, 119
correlation, 127, 130
covariance, 127
finite dimensional densities, 117
finite dimensional distribution,
117
measurable, 106
progressively measurable, 108
sample properties, 110
second moment properties, 127
Stopping time, 78, 114
Stratonovich integral, 249

S-ar putea să vă placă și