
The Development of Rigor in Mathematical Probability (1900-1950)

Author(s): Joseph L. Doob


Source: The American Mathematical Monthly, Vol. 103, No. 7 (Aug. - Sep., 1996), pp. 586-595
Published by: Mathematical Association of America
Stable URL: http://www.jstor.org/stable/2974673

THE EVOLUTION OF...
Edited by: Abe Shenitzer
Mathematics, York University, North York, Ontario M3J 1P3, Canada

The Development of Rigor in Mathematical Probability (1900-1950)
Joseph L. Doob
1 Introduction. This paper is a brief informal outline of the history of the introduction of rigour into mathematical probability in the first half of this century. Specific results are mentioned only in so far as they are important in the history of the logical development of mathematical probability.
The development of science is not a simple progression from one advance to the next. Judged by hindsight, the development is slow, proceeds in a zigzag course, with many wrong turns and blind alleys, and frequently moves in directions condemned by leading scientists. In the 1930's Banach spaces were sneered at as absurdly abstract, later it was the turn of locally convex spaces, and now it is the turn of nonstandard analysis. Mathematicians are no more eager than other humans to embrace new ideas, and full acceptance of mathematical probability was not realized until the second half of the century. In particular, many statisticians and probabilists resented the mathematization of probability by measure theory, and some still place mathematical probability outside analysis. The following quotations (in translation) are relevant.
Planck: A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up with it.
Poincaré: Formerly, when one invented a new function, it was to further some practical purpose; today one invents them in order to make incorrect the reasoning of our fathers, and nothing more will ever be accomplished by these inventions.
Hermite: (in a letter to Stieltjes) I recoil with dismay and horror at this lamentable plague of functions which do not have derivatives.
Probability theory began, and remained for a long time, an idealization and analysis of certain real life phenomena outside mathematics, but gradually, in the first half of this century, mathematical probability became a normal part of mathematics. The mathematization of probability required new ideas, and in particular required a new approach to the idea of acceptability of a function. In view of the above quotations it is not surprising that acceptance of this mathematization was slow and faced resistance. In fact even now some probabilists fear that mathematization has removed the intrinsic charm from their subject. And they are right in the sense that the charm of the old, vague probability-mathematics, based on nonmathematical definitions, has split into two quite different charms: those of real world probability and of mathematical precision. But it must be stressed that many of the most essential results of mathematical probability have been suggested by the nonmathematical context of real world probability, which has never even had a universally acceptable definition. In fact the relation between real world probability and mathematical probability has been simultaneously the bane of and inspiration for the development of mathematical probability.

Reprinted with kind permission of Birkhäuser Verlag AG, from Development of Mathematics 1900-1950, edited by Jean-Paul Pier, Basel, 1994, ISBN 0-8176-2821-5.

2 What is the real world (nonmathematical) problem? What is usually called (real world) probability arises in many contexts. Besides the obvious contexts of gambling games, of insurance, of statistical physics, there are such simple contexts as the following. Suppose an individual rides his bicycle to work. The rider would be surprised if, when the bicycle is parked, the valve on the front tire appeared in the upper half of the tire circle 10 successive days, just as surprised as if 10 successive tosses of a coin all gave heads. However, it is clear that (tire context) if the ride is very short, or (coin context) if the coin starts close to the coin landing place and the initial rotational velocity of the coin is low, the surprise would decrease and the probability context would become suspect. The moral is that the specific context must be examined closely before any probabilistic statement is made. If philosophy is relevant, an arguable question, it must be augmented by an examination of the physical context.
3 The law of large numbers. In a repetitive scheme of independent trials, such as coin tossing, what strikes one at once is what has been christened the law of large numbers. In the simple context of coin tossing it states that in some sense the number of heads in n tosses divided by n has limit 1/2 as the number of tosses increases. The key words here are "in some sense". If the law of large numbers is a mathematical theorem, that is, if there is a mathematical model for coin tossing, in which the law of large numbers is formulated as a mathematical theorem, either the theorem is true in one of the various mathematical limit concepts or it is not. On the other hand, if the law of large numbers is to be stated in a real world nonmathematical context, it is not at all clear that the limit concept can be formulated in a reasonable way. The most obvious difficulty is that in the real world only finitely many experiments can be performed in finite time. Anyone who tries to explain to students what happens when a coin is tossed mumbles words like "in the long run", "tends", "seems to cluster near", and so on, in a desperate attempt to give form to a cloudy concept. Yet the fact is that anyone tossing a coin observes that for a modest number of coin tosses the number of heads in n tosses divided by n seems to be getting closer to 1/2 as n increases. The simplest solution, adopted by a prominent Bayesian statistician, is the vacuous one: never discuss what happens when a coin is tossed. A more common equally satisfactory solution is to leave fuzzy the question of whether the context under discussion is or is not mathematics. Perhaps the fact that the assertion is called a law is an example of this fuzziness. The following statements have been made about this law (my emphasis):
Laplace: (1814) This theorem, implied by common sense, was difficult to prove by analysis.
Ville: (1939) One sees no reason for this proposition to be true; but as it is impossible to prove experimentally that it is false, one can at least safely state it.
Bauer: (translated from the context of dice to that of coins) It is an experimentally established fact that the quotient ... exhibits a deviation from 1/2 which approaches 0 for large n.

These statements illustrate the enduring charm of discussions of real world probability. Mathematicians, unfortunately, have felt forced to think about the following question, or at least to write about it.
4 What is probability? Here are some attempts to answer this question and to discuss the teaching of the subject.
Poincaré: (1912) One can scarcely give a satisfactory definition of probability.
Mazurkiewicz: (1915) The theory of probability is not an independent element of mathematical instruction; nevertheless it is very desirable that a mathematician knows its general principles. Its fundamental concepts are incompletely determined. They contain many unsolved difficulties.
v. Mises: (1919) In fact, one can scarcely characterize the present state other than that probability is not a mathematical discipline. (He proceeded to make it into a mathematical discipline by basing mathematical probability on a sequence of observations («Beobachtungen») with properties that cannot be satisfied by a mathematically well defined sequence. In a lighter mood he is said to have defined probability as a number between 0 and 1 about which nothing else is known.)
E. Pearson: (1935) (Oral communication) Probability is so linked with statistics that, although it is possible to teach the two separately, such a project would be just a tour de force.
Uspensky: (1937) In a useful textbook, he gave the following once common textbook definition. If, consistent with condition S, there are n mutually exclusive, and equally likely cases, and m are favorable to the event A, then the mathematical probability of A is defined as m/n.
The foregoing should make obvious the advisability of separating mathematical probability theory from its real world applications. Note however that no one doubts the real world applicability of mathematical probability. Gambling, genetics, insurance and statistical physics are here to stay.
Only mathematical probability will be discussed below, except for the following remark on coin tossing. Newtonian mechanics provides a partial mathematical model for coin tossing. In coin tossing, a solid body falls under the influence of gravity. Its motion is determined in Newton's model by his laws, and any discussion of what the coin does cannot be complete unless these laws are applied. Only these laws, rather than philosophical remarks, can explain the quantitative influence and importance of the initial and final conditions of the coin motion in order to justify allusions to equal likelihood of heads and tails. Of course these laws can at best reduce the analysis to considerations of the initial and final conditions of the toss, but these conditions can show what the «equal likelihoods» depends on and thereby give it a plausible interpretation.


5 Mathematical probability before the era of precise definitions. There were many important advances in mathematical probability before 1900, but the subject was not yet mathematics. Although nonmathematical probabilistic contexts suggested problems in combinatorics, difference equations and differential equations, there was a minimum of attention paid to the mathematical basis of the contexts, a maximum of attention to the pure mathematics problems they suggested. This unequal treatment was inevitable, because measure theory, needed for mathematical modeling of real world probabilistic contexts, had not yet been invented.
It was always clear that, however classical mathematical probability was to be developed, the concept of additivity of probability as applied to incompatible real world events was fundamental. Additive functions of sets were of course familiar to mathematicians from concepts of volume, mass and so on, long before 1900. It was realized that contexts involving averages led to probability. It was frequently clear how to use the contexts to suggest problems in analysis, but it was not clear how to formulate an overall mathematical context, that is, how to define a mathematical structure in which the various contexts could be placed.
A weaker condition than additivity was less familiar but turned out to be essential later. The standard loose language will be used here. If $x_1, x_2, \ldots$ are numbers obtained by chance, and if $A$ is a set of numbers, consider the probability that at least one of the members of this sequence lies in $A$, or, in more colorful language, consider the probability that an orbit of this motion through points of a line hits $A$. The usual calculation (ignoring here all notions of rigour) defines a function $A \mapsto \phi(A)$ which in general is not additive. In fact $\phi$ satisfies the inequality

\[ \phi(A) + \phi(B) - \phi(A \cup B) \ge \phi(A \cap B), \tag{5.1} \]

whereas additivity of $\phi$ would imply equality in (5.1). The point is that the left side of (5.1) is the probability that the sequence $x_1, x_2, \ldots$ hits both $A$ and $B$, a probability at least equal to, and in general greater than, $\phi(A \cap B)$, the probability that the sequence hits $A \cap B$. The inequality (5.1), the strong subadditivity inequality, is satisfied also by the electrostatic capacity of a body in $\mathbb{R}^3$, and this fact hints at the close connection between potential theory and probability, developed in great detail in the second half of the century with the help of Choquet's theory of mathematical capacity.
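
As a hedged illustration of the strong subadditivity inequality (5.1), the sketch below fixes one very simple chance mechanism, chosen here only for concreteness and not taken from the text: `num_draws` independent draws, each uniform on a finite set of `num_points` points. Under that assumption the probability of hitting a set A is 1 - (1 - |A|/num_points)^num_draws, and the inequality can be checked exactly for every pair of subsets. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def hitting_probability(subset, num_points, num_draws):
    """phi(A): chance that at least one of num_draws uniform draws lands in A."""
    miss = Fraction(num_points - len(subset), num_points)
    return 1 - miss ** num_draws


def all_subsets(num_points):
    """Every subset of {0, ..., num_points - 1} as a frozenset."""
    points = range(num_points)
    return [frozenset(c)
            for size in range(num_points + 1)
            for c in combinations(points, size)]


def check_strong_subadditivity(num_points=4, num_draws=3):
    """Return (inequality holds for all pairs, number of strictly unequal pairs)."""
    subsets = all_subsets(num_points)
    phi = {s: hitting_probability(s, num_points, num_draws) for s in subsets}
    strict_pairs = 0
    for a in subsets:
        for b in subsets:
            left = phi[a] + phi[b] - phi[a | b]   # chance of hitting both A and B
            right = phi[a & b]                    # chance of hitting A intersect B
            if left < right:
                return False, strict_pairs
            if left > right:
                strict_pairs += 1
    return True, strict_pairs
```

In this assumed model the inequality is strict whenever neither set contains the other and there are at least two draws, which is one concrete way to see that the hitting probability is not additive.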
6 The development of measure theory. Recall that a Borel field ($\sigma$-algebra) of subsets of a space is a collection of subsets which is closed under the operations of complementation and the formation of countable unions and intersections. The class of Borel sets of a metric space is the smallest $\sigma$-algebra containing the open sets of the space. A measurable space is a pair $(S, \mathcal{S})$, where $S$ is a space and $\mathcal{S}$ is a $\sigma$-algebra of subsets of $S$. The sets of $\mathcal{S}$ are the measurable sets of the space. In the following, if $S$ is metric, the coupled $\sigma$-algebra making it into a measurable space will always be the $\sigma$-algebra of its Borel sets. In particular $(\mathbb{R}^N, \mathcal{R}^N)$ denotes $N$-dimensional Euclidean space coupled with its Borel sets. The superscript will be omitted when $N = 1$. A measurable function from a measurable space $(S_1, \mathcal{S}_1)$ into a measurable space $(S_2, \mathcal{S}_2)$ is a function from $S_1$ into $S_2$ with the property that the inverse image of a set in $\mathcal{S}_2$ is a set in $\mathcal{S}_1$.
Measure theory started with Lebesgue's thesis (1902), which extended the definition of volume in $\mathbb{R}^N$ to the Borel sets. Radon (1913) made the further step to general measures of Borel sets of $\mathbb{R}^N$ (finite on compact sets). These measures are usually extended to slightly larger classes than the class of Borel sets, by completion. Finally Fréchet (1915), 13 years after Lebesgue's thesis, pointed out that all that the usual definitions and operations of measure theory require is a $\sigma$-algebra of subsets of an abstract space on which a measure, that is, a positive countably additive set function, is defined. At each step of this progression not necessarily positive countably additive set functions (signed measures) were incorporated into the theory. As noted below, the Radon-Nikodym theorem (1930), which gives conditions necessary and sufficient that a countably additive function of sets can be expressed as an integral over the sets, turned out to be the final essential result needed to formulate the basic mathematical probability definitions. Thus it was 28 years before Lebesgue's theory was extended far enough to be adequate for the mathematical basis of probability. This extension was not developed in order to provide a basis for probability, however. Measure theory was developed as a part of classical analysis, and applications in analysis were immediate, for example to the (Lebesgue measure) almost everywhere derivability of a monotone function.
There has been criticism of the fact that mathematical probability is usually prescribed not only to be additive but even to be countably additive. The question whether real world probability is countably additive, if the question is to be meaningful, asks whether a mathematical model of real world probabilistic phenomena necessarily always involves countably additive set functions. In fact there may well be real world contexts for which the appropriate mathematical model is based on finitely but not countably additive set functions. But there have been very few applications of such set functions in either mathematical or nonmathematical contexts, and such set functions will not be discussed further here.
7 Early applications of explicit measure theory to probability. Some probabilistic slang will be needed, enduring relics of the historical background of probability theory. A probability space is a triple $(S, \mathcal{S}, P)$, where $(S, \mathcal{S})$ is a measurable space and $P$ is a measure on $\mathcal{S}$ with $P(S) = 1$. A measure with this normalization is a probability measure. A random variable is a measurable function from a probability space $(S, \mathcal{S}, P)$ into a measurable space $(S', \mathcal{S}')$. The space $S'$, or, when one writes carefully, $(S', \mathcal{S}')$, is the state space of the random variable. Mutual independence of random variables is defined in the classical way. The distribution of a random variable $x$ is the measure $P_x$ on $\mathcal{S}'$ defined by setting

\[ P_x(A') = P\{s \in S : x(s) \in A'\}. \]

The joint distribution of finitely many random variables defined on the same probability space is obtained by making $x$ into a vector and specifying $\mathcal{S}'$ and $S'$ correspondingly. A stochastic process is a family of random variables $\{x(t, \cdot), t \in I\}$ from some probability space $(S, \mathcal{S}, P)$ into a state space $(S', \mathcal{S}')$. The set $I$ is the index set of the process. Thus a stochastic process defines a function of two variables, $(t, s) \mapsto x(t, s)$, from $I \times S$ into the state space. The function $x(t, \cdot)$ from $S$ into $S'$ is the $t$th random variable of the process; the function $x(\cdot, s)$ from $I$ into $S'$ is the $s$th sample function, or sample path, or sample sequence if $I$ is a sequence.
Borel (1909) pointed out that in the dyadic representation $x = .x_1 x_2 \ldots$ of a number $x$ between 0 and 1, in which each digit $x_j$ is either 0 or 1, these digits are functions of $x$, and if the interval $[0,1]$ is provided with Lebesgue measure, a probability measure on this interval, these functions miraculously become random variables which have exactly the distributions used in calculating coin tossing probabilities. That is, $2^{-n}$ is the probability assigned to the event that, in a tossing experiment, the first $n$ tosses yield a specified sequence of heads and tails, and $2^{-n}$ is also the total length (= Lebesgue measure) of the finite set of intervals whose points $x$ have dyadic representations with a specified sequence of 0's and 1's in the first $n$ places. Thus a mathematical version of the law of large numbers in the coin tossing context is the existence in some sense of a limit of the sequence of function averages $\{(x_1 + \cdots + x_n)/n,\ n \ge 1\}$. Classical elementary probability calculations imply that this sequence of averages converges in measure to 1/2, but a stronger mathematical version of the law of large numbers was the fact deduced by Borel, in an unmendably faulty proof, that this sequence of averages converges to 1/2 for (Lebesgue measure) almost every value of $x$. A correct proof was given a year later by Faber, and much simpler proofs have been given since. [Fréchet remarked tactfully: «Borel's proof is excessively short. It omits several intermediate arguments and assumes certain results without proof.»] This theorem was an important step, an example of a new kind of convergence theorem in probability. Observe that (fortunately) pure mathematicians need not interpret this theorem in the real world of real people tossing real coins. Some of the quotations given above indicate that they not only need not but should not.
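
A minimal sketch of Borel's observation, under the assumption that a point of the unit interval is represented by the integer numerator of a dyadic rational k / 2^n (a convenience chosen here, not something the text prescribes). The first function reads off the binary digits, the second counts dyadic cells to recover the measure of the set of points whose expansion begins with a given digit pattern, and the third averages the digits of a pseudo-randomly chosen point. The function names and the seed are hypothetical, and a finite sample can only illustrate, never establish, the almost-everywhere statement.

```python
import random
from fractions import Fraction


def dyadic_digits(numerator, n):
    """Digits x_1, ..., x_n of x = numerator / 2**n written in base 2."""
    return [(numerator >> (n - j)) & 1 for j in range(1, n + 1)]


def measure_with_prefix(prefix, n):
    """Lebesgue measure of the points whose first len(prefix) digits equal prefix,
    found by counting the cells [k / 2**n, (k + 1) / 2**n) inside that set."""
    m = len(prefix)
    cells = sum(1 for k in range(2 ** n)
                if dyadic_digits(k, n)[:m] == list(prefix))
    return Fraction(cells, 2 ** n)


def sampled_digit_average(n, seed):
    """Average of the first n digits of a point picked by n fair random bits."""
    rng = random.Random(seed)
    numerator = rng.getrandbits(n)
    return sum(dyadic_digits(numerator, n)) / n
```

For instance, `measure_with_prefix((1, 0, 1), 6)` counts 8 of the 64 cells, matching the coin tossing probability of one specified sequence of three tosses.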
Daniell (1918) used a deep approach to measure theory in which integrals are defined before measures to get a (rather clumsy) approach to infinite sequences of random variables by way of measures in infinite dimensional Euclidean space.
The Brownian motion stochastic process in $\mathbb{R}^3$ is the mathematical model of Brownian motion, the motion of a microscopic particle in a fluid as the particle is hit by the molecules of the fluid. The process is normalized by supposing it starts at the origin of a cartesian coordinate system in $\mathbb{R}^3$, and a (normalized) Brownian motion process in $\mathbb{R}$ is the process of a coordinate function of a normalized process in $\mathbb{R}^3$, vanishing initially. A (normalized) Brownian motion process in $\mathbb{R}^N$ is a process defined by $N$ mutually independent Brownian motion processes in $\mathbb{R}$. It was well known what the joint distributions of the random variables of a Brownian motion process should be, and it had been taken for granted that in a proper mathematical model the class of continuous paths would have probability 1. By 1900, Bachelier had even derived various important distributions related to the Brownian motion process in $\mathbb{R}$, such as that of the maximum change during a time interval, by finding corresponding distributions for a certain discrete random walk and then going to the limit as the walk steps tended to 0. More precisely, what Bachelier derived were distributions valid for a Brownian motion process if in fact there was such a thing as a Brownian motion process, and if it was approximable by his random walks. Observe that there was no question about the existence of Brownian motion; Brownian motion is observable under a microscope. But there was as yet no proof of the existence of a stochastic process, a mathematical construct, with the desired properties. Wiener (1923) constructed the desired Brownian motion process, now sometimes called the Wiener process, by applying the Daniell approach to measure theory to obtain a measure with the desired properties on a space $S$ of continuous functions: if $x(t, \cdot)$ is the random variable defined by the value at time $t$ of a function in $S$, the stochastic process of these random variables is a stochastic process with sample functions the members of $S$, and with the joint random variable distributions those prescribed for the Brownian motion process.
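
To make the phrase "corresponding distributions for a certain discrete random walk" concrete, here is a small sketch for the symmetric walk with steps +1 and -1. It compares a brute-force count of the distribution of the running maximum with the standard reflection-principle expression P(S_n >= m) + P(S_n > m). This is offered as an illustration of the kind of discrete computation involved; it is not claimed to reproduce Bachelier's own derivation, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_max_at_least(n, m):
    """P(max of S_0, ..., S_n >= m), found by listing all 2**n paths."""
    hits = 0
    for steps in product((-1, 1), repeat=n):
        position = 0
        running_max = 0  # the walk starts at 0
        for step in steps:
            position += step
            running_max = max(running_max, position)
        if running_max >= m:
            hits += 1
    return Fraction(hits, 2 ** n)


def prob_end_equals(n, k):
    """P(S_n = k): the walk needs (n + k) / 2 up-steps out of n."""
    if abs(k) > n or (n + k) % 2:
        return Fraction(0)
    return Fraction(comb(n, (n + k) // 2), 2 ** n)


def reflection_value(n, m):
    """P(S_n >= m) + P(S_n > m), for 0 <= m <= n."""
    at_least = sum(prob_end_equals(n, k) for k in range(m, n + 1))
    above = sum(prob_end_equals(n, k) for k in range(m + 1, n + 1))
    return at_least + above
```

The two computations agree exactly for every level m between 0 and n. In the continuous limit the same identity takes the familiar form that the maximum of Brownian motion over a time interval exceeds a level with twice the probability that the endpoint does.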
Bachelier's results remained unnoticed for years, and in fact were rediscovered several times. Wiener's work, like his fundamental work in potential theory, had little immediate influence because it was published in a journal which was not widely distributed. It was an aspect of his genius that he carried out his Brownian motion research then and later without knowledge of the slang and some of the useful elementary mathematical techniques of probability theory.
Steinhaus (1930) demonstrated that classical arguments to derive standard probability theorems could be placed in a rigorous context by taking Lebesgue measure on a linear interval of length 1 as the basic probability measure, interpreting random variables as Lebesgue measurable functions on this interval, and expectations of random variables as their integrals. No new proofs were required; all that was required was a proper translation of the classical terminology into his context. If this were all mathematization of probability by measure theory had to offer, the scorn of rigorous mathematics expressed by some nonmathematicians would be justified.
8 Kolmogorov's 1933 monograph. Kolmogorov (1933) constructed the following mathematical basis for probability theory.
(a) The context of mathematical probability is a probability space $(S, \mathcal{S}, P)$. The sets in $\mathcal{S}$ are the mathematical counterparts of real world events; the points of $S$ are counterparts of elementary events, that is of individual (possible) real world observations.
(b) Random variables on $(S, \mathcal{S}, P)$ are the counterparts of functions of real world observations. Suppose $\{x(t, \cdot), t \in I\}$ is a stochastic process on a probability space $(S, \mathcal{S}, P)$, with state space $S'$. A set of $n$ of the process random variables has a probability distribution on $S'^n$. Such finite dimensional distributions are mutually compatible in the sense that if $1 \le m < n$, the joint distribution of $x(t_1, \cdot), \ldots, x(t_m, \cdot)$ on $S'^m$ is the $m$-dimensional distribution induced by the $n$-dimensional distribution of $x(t_1, \cdot), \ldots, x(t_n, \cdot)$ on $S'^n$.
(c) Conversely, Kolmogorov proved that given an arbitrary index set $I$, and a suitably restricted measurable space $(S', \mathcal{S}')$ (for example, the measurable space can be a complete separable metric space together with the $\sigma$-algebra of its Borel sets) and a mutually compatible set of distributions on $S'^n$, for integers $n \ge 1$, indexed by the finite subsets of $I$, there is a probability space and a stochastic process $\{x(t, \cdot), t \in I\}$ defined on it, with state space $S'$, with the assigned joint random variable distributions. To prove this result he constructed a probability measure on a $\sigma$-algebra of subsets of the product space $S'^I$, the space of all functions from $I$ into $S'$, and obtained the required random variables as the coordinate functions of $S'^I$.
(e) The expectation of a numerically valued integrable random variable is its integral with respect to the given probability measure.
(f) The classical definition of the conditional probability of an event (measurable set) $A$, given an event $B$ of strictly positive probability, is $P(A \cap B)/P(B)$. In this way, for fixed $B$, new probabilities are obtained, and expectations of random variables for given $B$ are computed in terms of these new conditional probabilities. More generally, given an arbitrary collection of random variables, conditional probabilities and expectations relative to given values of those random variables are needed, functions of the values assigned to the conditioning random variables. If $(S, \mathcal{S}, P)$ is a probability space, and if a collection of random variables is given, let $\mathcal{F}$ be the smallest sub $\sigma$-algebra of $\mathcal{S}$ relative to which all the given random variables are measurable. This $\sigma$-algebra is the $\sigma$-algebra generated by conditions imposed on the given random variables. A reasonable interpretation of a measurable real valued function of the given collection of random variables is a measurable function from $(S, \mathcal{F})$ into $\mathbb{R}$. The Kolmogorov conditional expectation of a real valued integrable random variable $x$ on $(S, \mathcal{S}, P)$, relative to a $\sigma$-algebra $\mathcal{G}$ of measurable sets, is a random variable which is measurable relative to $\mathcal{G}$ and has the same integral as $x$ on every set in $\mathcal{G}$. The existence of such a random variable, and its uniqueness up to $P$-null sets, is assured by the Radon-Nikodym theorem. The conditional expectation of $x$ relative to a collection of random variables is defined as the conditional expectation of $x$ relative to the $\sigma$-algebra generated by conditions on the random variables. A conditional probability of a measurable set $A$ is defined as the conditional expectation of the random variable which is 1 on $A$ and 0 elsewhere.
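
On a finite probability space the Kolmogorov conditional expectation can be written down directly, and the following sketch does so under assumptions introduced only for illustration: the conditioning sigma-algebra is the one generated by a finite partition, and the example is a fair die conditioned on parity. The code averages the random variable over each block of the partition and then verifies the defining property quoted above, namely equal integrals over every set of the generated sigma-algebra. All names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def conditional_expectation(prob, x, partition):
    """E[x | partition] as a dict point -> value, constant on each block."""
    result = {}
    for block in partition:
        mass = sum(prob[s] for s in block)
        # On a block of probability zero any value is allowed; 0 is an arbitrary choice.
        value = sum(prob[s] * x[s] for s in block) / mass if mass else Fraction(0)
        for s in block:
            result[s] = value
    return result


def integral(prob, f, event):
    """Integral of f over the event with respect to prob."""
    return sum(prob[s] * f[s] for s in event)


def generated_sets(partition):
    """All unions of blocks: the sigma-algebra generated by a finite partition."""
    sets = []
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            sets.append(frozenset().union(*blocks))
    return sets


# Assumed example: a fair die, x(s) = s, conditioning on parity.
die = {s: Fraction(1, 6) for s in range(1, 7)}
face = {s: s for s in range(1, 7)}
parity = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]
cond = conditional_expectation(die, face, parity)
```

Here `cond` is constant on the odd faces and on the even faces, which is what measurability relative to the parity sigma-algebra means in this finite setting. The arbitrary value chosen on a null block reflects the uniqueness only up to null sets.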
Kolmogorov's 1933 exposition paints a discouraging picture of mathematical progress. In the first pages of his monograph he states explicitly that real valued random variables are measurable functions and expectations are their integrals. Even as late as 1933, however, he must have thought that mathematicians were not familiar with measure theory. In fact in the body of his monograph, when he comes to the definition of a real valued random variable, he does not simply refer back to the first pages of the monograph and say that a random variable is a measurable function. Instead he actually defines measurability of a real valued function, and similarly when he defines the expected value of a random variable he does not simply state that it is the integral of the random variable with respect to the given probability measure, but he actually defines the integral. Later in the monograph, when he needs Lebesgue's theorem allowing taking limits of convergent function sequences under the sign of integration, he does not simply refer to Lebesgue but gives a detailed proof of what he needs. As confirmation of Kolmogorov's caution in invoking measure theory, the author recalls his student experience in 1932 when there were professorial disapproving remarks on the extreme generality of a seminar lecture given by Saks on what is now called the Vitali-Hahn-Saks theorem, a theorem which has since become an important tool in probability theory. [He also recalls that he did not understand the point of Kolmogorov's measure on a function space until long after he had read the monograph.]
It was some time before Kolmogorov's basis was accepted by probabilists. The idea that a (mathematical) random variable is simply a function, with no romantic connotation, seemed rather humiliating to some probabilists. A prominent statistician in 1935 wondered whether two orthogonal real valued random variables with zero means (integrals) are necessarily independent, as they are under the added hypothesis that they have a bivariate Gaussian distribution. He was rather surprised by the example of the sine and cosine functions on the interval $[0, 2\pi]$, with probability measure defined as Lebesgue measure divided by $2\pi$. These two functions, orthogonal and with zero means but not independent, are not the kind of random variables probabilists were used to. Some analysts may be gratified, some humiliated, to learn that in discussing Fourier series they can be accused of discussing probabilities and expectations.
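
The sine and cosine example can be checked numerically. The sketch below approximates expectations on the interval from 0 to 2π, with Lebesgue measure divided by 2π, using a midpoint rule; the rule and the number of cells are choices made here for illustration. It confirms zero means and orthogonality, and then exhibits the failure of independence: independence would force the expectation of sin² cos² to equal the product of the expectations of sin² and cos², but the former is 1/8 and the latter is 1/4.

```python
import math


def expectation(f, cells=10_000):
    """Midpoint-rule approximation of the integral of f over [0, 2*pi] divided by 2*pi."""
    width = 2 * math.pi / cells
    return sum(f((k + 0.5) * width) for k in range(cells)) / cells


mean_sin = expectation(math.sin)
mean_cos = expectation(math.cos)
inner_product = expectation(lambda t: math.sin(t) * math.cos(t))

# Independence would force E[sin^2 cos^2] = E[sin^2] * E[cos^2].
joint = expectation(lambda t: math.sin(t) ** 2 * math.cos(t) ** 2)
product_of_marginals = (expectation(lambda t: math.sin(t) ** 2)
                        * expectation(lambda t: math.cos(t) ** 2))
```

The dependence is of course visible without any computation, since sin² + cos² = 1 identically; the numerical check merely makes the gap between `joint` and `product_of_marginals` explicit.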
9 Expansion backwards of the Kolmogorov basis. Kolmogorov's basis for mathematical probability can be expanded, and should be expanded in the view of some probabilists, who want to start with some not necessarily numerical mathematical version of the confidence of observers that certain events will occur, and to proceed postulationally to numerical evaluations of this confidence, and finally to additivity. Such an analysis may be enlightening in discussing the appropriateness of mathematical probability as a model for real world phenomena, but any approach to the subject which ends with a justification of the classical calculations and is mathematically usable will end with Kolmogorov's basis, however phrased, because all the measure basis to probability does is to give a formal precise mathematical framework for the classical calculations and their present refinements. This framework has made it possible to apply mathematical probability in many other mathematical fields, for example to potential theory and partial differential equations. Although such applications were made in the past before the acceptance of measure theory as the basis of probability, the probabilistic context served only to suggest mathematics and was not an integral part of the mathematics. The meaning of solutions as probabilities and expectations could not be formulated and exploited.
10 Uncountable index sets. If the index set $I$ of a stochastic process $\{x(t, \cdot), t \in I\}$ is an interval of the line, and if the state space of the random variables is $\mathbb{R}$, the class of continuous sample functions may not be measurable. This difficulty arises in the processes derived by the Kolmogorov construction of a measure on a function space, for example, whatever the choice of joint distributions of the process random variables. To understand the difficulty, observe that if the index set $I$ of a stochastic process with state space $\mathbb{R}$ is an interval, and if $Z$ is a subset of $I$, the function $s \mapsto \sup_{t \in Z} x(t, s)$ is measurable if $Z$ is countable, but need not be measurable if $Z$ is uncountable. If boundedness and continuity of sample functions are to be discussed, some modification of the probability relations of the random variables of a stochastic process should be devised to make such suprema measurable functions. A clumsy approach was proposed by Doob (1937) but a more usable one was not devised until after 1950.
11 Reluctance to accept measure theory by probabilists. There was considerable resistance to the acceptance and exploitation of measure theory by probabilists, both in Kolmogorov's day and later. The following quotation is an example of the reluctance of some mathematicians to separate the mathematics from the context that inspired it.
Kac: (1959) How much fuss over measure theory is necessary for probability theory is a matter of taste. Personally I prefer as little fuss as possible because I firmly believe that probability theory is more closely related to analysis, physics and statistics than to measure theory as such.
12 New relations between functions made possible by the mathematization of
probability. Probability theory suggested new relations between functions. For
example consider the sequence $x_1, x_2, \ldots$ of real valued integrable
random variables on a probability space $(S, \mathcal{S}, P)$ and suppose that
the conditional expectation of $x_n$ given $x_1, \ldots, x_{n-1}$ vanishes ($P$)
almost everywhere, for $n > 1$, that is, the integral of $x_n$ over any set
determined by conditions on the preceding random variables vanishes. If these
random variables are square integrable, this condition is equivalent to the
condition, much stronger than mutual orthogonality, that $x_n$ is orthogonal to
every square integrable function of $x_1, \ldots, x_{n-1}$. Bernstein (1927)
seems to have been the first to treat such sequences systematically. This
condition on a sequence of functions means that in a reasonable sense the
sequence of partial sums of the given sequence is the counterpart of a fair
game. In fact, the partial sums $y_1, y_2, \ldots$ are characterized by the
property that the expectation of $y_n$ relative to $y_1, \ldots, y_{n-1}$ is
equal to $y_{n-1}$ almost everywhere on the probability space. Processes with
this property, called martingales, first used explicitly by Ville (1939), have
had many applications, for example to partial differential equations, to
derivation, and to potential theory. Another important class of sequences of
random variables is the class of sequences with the Markov property. These
sequences are characterized by the fact that when $n > 1$ the conditional
probabilities for $x_n$ relative to $x_1, \ldots, x_{n-1}$ are equal almost
everywhere to those for $x_n$ relative to $x_{n-1}$. Roughly speaking, the
influence of the present, given the past, depends only on the immediate past.
The Markov property, introduced in a very special case by Markov in 1906 (named
in his honor by others), has proved very fruitful, for example, leading in the
second half of the century to a probabilistic potential theory, generalizing and
including classical potential theory.
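The fair-game condition of Section 12 can be checked exactly on a small finite example. The Python sketch below is an illustration only and is not taken from the paper: it assumes independent fair flips with values -1 and +1, and a hypothetical betting rule `stake` that doubles the bet after each loss. The names `N`, `stake`, `increments`, `partial_sums`, `is_martingale` and `expectation_times_past_function` are all introduced here for the purpose of the example.

```python
from fractions import Fraction
from itertools import product

N = 4  # number of rounds; an arbitrary choice for this sketch


def stake(past):
    """Hypothetical betting rule: start at 1, double after each consecutive loss."""
    s = 1
    for flip in reversed(past):
        if flip == -1:
            s *= 2
        else:
            break
    return s


def increments(flips):
    """x_n = stake(earlier flips) * n-th flip, so x_n has conditional mean zero."""
    return [stake(flips[:k]) * flips[k] for k in range(len(flips))]


def partial_sums(flips):
    """y_n = x_1 + ... + x_n for one sequence of flips."""
    total, sums = 0, []
    for x in increments(flips):
        total += x
        sums.append(total)
    return sums


def conditional_expectation_of_y(n, past):
    """E[y_n | first n-1 flips = past]: average over the two values of flip n."""
    values = [partial_sums(list(past) + [f])[n - 1] for f in (-1, 1)]
    return Fraction(sum(values), 2)


def is_martingale():
    """Check E[y_n | past] == y_{n-1} for every past and every n = 2, ..., N."""
    for n in range(2, N + 1):
        for past in product((-1, 1), repeat=n - 1):
            y_prev = partial_sums(list(past))[n - 2]
            if conditional_expectation_of_y(n, past) != y_prev:
                return False
    return True


def expectation_times_past_function(n, f):
    """E[x_n * f(x_1, ..., x_{n-1})] over all 2**n equally likely flip sequences."""
    total = Fraction(0)
    for flips in product((-1, 1), repeat=n):
        xs = increments(flips)
        total += Fraction(xs[n - 1] * f(xs[: n - 1]), 2 ** n)
    return total


if __name__ == "__main__":
    print("martingale property holds:", is_martingale())
    print("E[x_3 * f(x_1, x_2)] =",
          expectation_times_past_function(3, lambda past: sum(past) ** 2 + 1))
```

In this example the increments are not independent, since the size of each bet depends on the earlier outcomes, yet each increment is orthogonal to functions of its predecessors and the partial sums satisfy the fair-game property. Conditioning is done on the earlier flips, which here carry the same information as the earlier partial sums because the stake is always positive.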
13 What is the place of probability theory in measure theory, and more generally
in analysis? It is considered by some mathematicians that if one deals with
analytic properties of probabilities and expectations then the subject is part
of analysis, but that if one deals with sample sequences and sample functions
then the subject is probability but not analysis. These authors are in the
interesting situation that in considering a function of two variables,
$(t, s) \mapsto x(t, s)$, as in considering stochastic processes, they call it
analysis if the family of functions $x(t, \cdot)$ as $t$ varies is studied, but
call it probability and definitely not analysis if the family of functions
$x(\cdot, s)$ as $s$ varies is studied. More precisely, they regard discussions
of distributions and associated questions as analysis, but not discussions in
terms of sample functions. This point of view is expressed in the following
quotation.

Protter By developing his integral in 1944 with stochastic processes as integrands,
Ito was able to study multidimensional diffusions with purely probabilistic
techniques, an improvement over the analytic methods of Feller.
The following remark on the convergence of a sum of orthogonal functions
illustrates the difficulty in separating (mathematical) probability from the
rest of analysis. The measure space is a probability space, but with trivial
changes the discussion is valid for any finite measure space.
If $x_1, x_2, \ldots$ is an orthogonal sequence of functions on a probability
measure space, and if $x_n^2$ has integral $\sigma_n^2$, then (Riesz-Fischer)
$\sum x_n$ converges in the mean if

\[ \sum_n \sigma_n^2 < +\infty. \tag{13.1} \]

The orthogonal series converges almost everywhere if either
(Mensov-Rademacher) (13.1) is strengthened to

\[ \sum_n \sigma_n^2 \log^2 n < +\infty, \tag{13.2} \]

or (Levy, 1937) the condition (13.1) is kept but the orthogonality condition is
strengthened to the condition in Section 12.
The reader should judge which of these results is measure theoretic and which
is probabilistic, whether there is any point in evicting mathematical
probability from analysis, and if so whether measure theory should also be
evicted.

101 West Windsor Road, Apt. 1104
Urbana, Illinois 61801-6663
doob@symcom.math.uiuc.edu
