Statistical Science

2006, Vol. 21, No. 1, 70–98


DOI 10.1214/088342305000000467
© Institute of Mathematical Statistics, 2006

The Sources of Kolmogorov’s Grundbegriffe

Glenn Shafer and Vladimir Vovk

Abstract. Andrei Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung put probability’s modern mathematical formalism in place. It also provided a philosophy of probability: an explanation of how the formalism can be connected to the world of experience. In this article, we examine the sources of these two aspects of the Grundbegriffe, namely the work of the earlier scholars whose ideas Kolmogorov synthesized.

Key words and phrases: Axioms for probability, Borel, classical probability, Cournot’s principle, frequentism, Grundbegriffe der Wahrscheinlichkeitsrechnung, history of probability, Kolmogorov, measure theory.

Glenn Shafer is Professor, Rutgers Business School, Newark, New Jersey 07102, USA and Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK (e-mail: gshafer@andromeda.rutgers.edu). Vladimir Vovk is Professor, Royal Holloway, University of London, Egham, Surrey TW20 0EX, UK (e-mail: vovk@cs.rhul.ac.uk).

1. INTRODUCTION

Andrei Kolmogorov’s Grundbegriffe der Wahrscheinlichkeitsrechnung, which set out the axiomatic basis for modern probability theory, appeared in 1933. Four years later, in his opening address to an international colloquium at the University of Geneva, Maurice Fréchet praised Kolmogorov for organizing a theory Émile Borel had created many years earlier by combining countable additivity with classical probability. Fréchet (1938b, page 54) put the matter this way in the written version of his address:

    It was at the moment when Mr. Borel introduced this new kind of additivity into the calculus of probability (in 1909, that is to say) that all the elements needed to formulate explicitly the whole body of axioms of (modernized classical) probability theory came together.

    It is not enough to have all the ideas in mind, to recall them now and then; one must make sure that their totality is sufficient, bring them together explicitly, and take responsibility for saying that nothing further is needed in order to construct the theory. This is what Mr. Kolmogorov did. This is his achievement. (And we do not believe he wanted to claim any others, so far as the axiomatic theory is concerned.)

Perhaps not everyone in Fréchet’s audience agreed that Borel had put everything on the table, but surely many saw the Grundbegriffe as a work of synthesis. In Kolmogorov’s axioms and in his way of relating his axioms to the world of experience, they must have seen traces of the work of many others: the work of Borel, yes, but also the work of Fréchet himself, and that of Cantelli, Chuprov, Lévy, Steinhaus, Ulam and von Mises.

Today, what Fréchet and his contemporaries knew is no longer known. We know Kolmogorov and what came after; we have mostly forgotten what came before. This is the nature of intellectual progress, but it has left many modern students with the impression that Kolmogorov’s axiomatization was born full grown, a sudden brilliant triumph over confusion and chaos.

To understand the synthesis represented by the Grundbegriffe, we need a broad view of the foundations of probability and the advance of measure theory from 1900 to 1930. We need to understand how measure theory became more abstract during those decades, and we need to recall what others were saying about axioms for probability, about Cournot’s principle and about the relationship of probability with measure and frequency. Our review of these topics draws mainly on work by authors listed by Kolmogorov in the Grundbegriffe’s bibliography, especially Sergei Bernstein, Émile Borel, Francesco Cantelli, Maurice Fréchet, Paul Lévy, Antoni Łomnicki, Evgeny Slutsky, Hugo Steinhaus and Richard von Mises.

We are interested not only in Kolmogorov’s mathematical formalism, but also in his philosophy of probability: how he proposed to relate the mathematical formalism to the real world. In a letter to Fréchet, Kolmogorov (1939) wrote, “You are also right in attributing to me the opinion that the formal axiomatization should be accompanied by an analysis of its real meaning.” Kolmogorov devoted only two pages of the Grundbegriffe to such an analysis, but the question was more important to him than this brevity might suggest. We can study any mathematical formalism we like, but we have the right to call it probability only if we can explain how it relates to the phenomena classically treated by probability theory.

We begin by looking at the classical foundation that Kolmogorov’s measure-theoretic foundation replaced: equally likely cases. In Section 2 we review how probability was defined in terms of equally likely cases, how the rules of the calculus of probability were derived from this definition and how this calculus was related to the real world by Cournot’s principle. We also look at some paradoxes discussed at the time.

In Section 3 we sketch the development of measure theory and its increasing entanglement with probability during the first three decades of the twentieth century. This story centers on Borel, who introduced countable additivity into pure mathematics in the 1890s and then brought it to the center of probability theory, as Fréchet noted, in 1909, when he first stated and more or less proved the strong law of large numbers for coin tossing. However, the story also features Lebesgue, Radon, Fréchet, Daniell, Wiener, Steinhaus and Kolmogorov himself.

Inspired partly by Borel and partly by the challenge issued by Hilbert in 1900, a whole series of mathematicians proposed abstract frameworks for probability during the three decades we are emphasizing. In Section 4 we look at some of these, beginning with the doctoral dissertations by Rudolf Laemmel and Ugo Broggi in the first decade of the century and including an early contribution by Kolmogorov, written in 1927, five years before he started work on the Grundbegriffe.

In Section 5 we finally turn to the Grundbegriffe itself. Our review of it will confirm what Fréchet said in 1937 and what Kolmogorov says in the preface: it was a synthesis and a manual, not a report on new research. Like any textbook, its mathematics was novel for most of its readers, but its real originality was rhetorical and philosophical.

2. THE CLASSICAL FOUNDATION

The classical foundation of probability theory, which begins with the notion of equally likely cases, held sway for 200 years. Its elements were put in place early in the eighteenth century, and they remained in place in the early twentieth century. Even today the classical foundation is used in teaching probability.

Although twentieth century proponents of new approaches were fond of deriding the classical foundation as naive or circular, it can be defended. Its basic mathematics can be explained in a few words, and it can be related to the real world by Cournot’s principle, the principle that an event with small or zero probability will not occur. This principle was advocated in France and Russia in the early years of the twentieth century, but disputed in Germany. Kolmogorov retained it in the Grundbegriffe.

In this section we review the mathematics of equally likely cases and recount the discussion of Cournot’s principle, contrasting the support for it in France with German efforts to find other ways to relate equally likely cases to the real world. We also discuss two paradoxes, contrived at the end of the nineteenth century by Joseph Bertrand, which illustrate the care that must be taken with the concept of relative probability. The lack of consensus on how to make philosophical sense of equally likely cases and the confusion revealed by Bertrand’s paradoxes were two sources of dissatisfaction with the classical theory.

2.1 The Classical Calculus

The classical definition of probability was formulated by Jacob Bernoulli (1713) in Ars Conjectandi and Abraham de Moivre (1718) in The Doctrine of Chances: the probability of an event is the ratio of the number of equally likely cases that favor it to the total number of equally likely cases possible under the circumstances.

From this definition, de Moivre derived two rules for probability. The theorem of total probability, or the addition theorem, says that if A and B cannot both happen, then

\[
\begin{aligned}
\text{probability of } A \text{ or } B \text{ happening}
  &= \frac{\#\text{ of cases favoring } A \text{ or } B}{\text{total } \#\text{ of cases}} \\
  &= \frac{\#\text{ of cases favoring } A}{\text{total } \#\text{ of cases}}
   + \frac{\#\text{ of cases favoring } B}{\text{total } \#\text{ of cases}} \\
  &= (\text{probability of } A) + (\text{probability of } B).
\end{aligned}
\]

The theorem of compound probability, or the multiplication theorem, says

\[
\begin{aligned}
\text{probability of both } A \text{ and } B \text{ happening}
  &= \frac{\#\text{ of cases favoring both } A \text{ and } B}{\text{total } \#\text{ of cases}} \\
  &= \frac{\#\text{ of cases favoring } A}{\text{total } \#\text{ of cases}}
   \cdot \frac{\#\text{ of cases favoring both } A \text{ and } B}{\#\text{ of cases favoring } A} \\
  &= (\text{probability of } A) \cdot (\text{probability of } B \text{ if } A \text{ happens}).
\end{aligned}
\]

These arguments were still standard fare in probability textbooks at the beginning of the twentieth century, including the great treatises by Henri Poincaré (1896) in France, Andrei Markov (1900) in Russia and Emanuel Czuber (1903) in Germany. Some years later we find them in Guido Castelnuovo’s (1919) Italian textbook, which has been held out as the acme of the genre (Onicescu, 1967).

Geometric probability was incorporated into the classical theory in the early nineteenth century. Instead of counting equally likely cases, one measures their geometric extension (their area or volume). However, probability is still a ratio, and the rules of total and compound probability are still theorems. This was explained by Antoine-Augustin Cournot (1843, page 29) in his influential treatise on probability and statistics, Exposition de la théorie des chances et des probabilités. This understanding of geometric probability did not change in the early twentieth century, when Borel and Lebesgue expanded the class of sets for which we can define geometric extension. We may now have more events with which to work, but we define and study geometric probabilities as before. Cournot would have seen nothing novel in Felix Hausdorff’s (1914, pages 416–417) definition of probability in the chapter on measure theory in his treatise on set theory.

The classical calculus was enriched at the beginning of the twentieth century by a formal and universal notation for relative probabilities. Hausdorff (1901) introduced the symbol p_F(E) for what he called the relative Wahrscheinlichkeit von E, posito F (relative probability of E given F). Hausdorff explained that this notation can be used for any two events E and F, no matter what their temporal or logical relationship, and that it allows one to streamline Poincaré’s proofs of the addition and multiplication theorems. Hausdorff’s notation was adopted by Czuber in 1903. Kolmogorov used it in the Grundbegriffe, and it persisted, especially in the German literature, until the middle of the twentieth century, when it was displaced by the more flexible P(E|F), which Harold Jeffreys (1931) introduced in his Scientific Inference.

2.2 Cournot’s Principle

An event with very small probability is morally impossible: it will not happen. Equivalently, an event with very high probability is morally certain: it will happen. This principle was first formulated within mathematical probability by Jacob Bernoulli. In his Ars Conjectandi, published in 1713, Bernoulli proved a celebrated theorem: in a sufficiently long sequence of independent trials of an event, there is a very high probability that the frequency with which the event happens will be close to its probability. Bernoulli explained that we can treat the very high probability as moral certainty and so use the frequency of the event as an estimate of its probability.

Probabilistic moral certainty was widely discussed in the eighteenth century. In the 1760s, the French savant Jean d’Alembert muddled matters by questioning whether the prototypical event of very small probability, a long run of many happenings of an event as likely to fail as happen on each trial, is possible at all. A run of a hundred may be metaphysically possible, he felt, but it is physically impossible. It has never happened and never will happen (d’Alembert, 1761, 1767; Daston, 1979). Buffon (1777) argued that the distinction between moral and physical certainty is one of degree. An event with probability 9999/10,000 is morally certain; an event with much greater probability, such as the rising of the sun, is physically certain (Loveland, 2001).

Cournot, a mathematician now remembered as an economist and a philosopher of science (Martin, 1996, 1998), gave the discussion a nineteenth century cast. Being equipped with the idea of geometric probability, Cournot could talk about probabilities that are vanishingly small. He brought physics to the foreground. It may be mathematically possible, he argued, for a heavy cone to stand in equilibrium on its vertex, but it is physically impossible. The event’s probability is vanishingly small. Similarly, it is physically impossible for the frequency of an event in a long sequence of
trials to differ substantially from the event’s probability (Cournot, 1843, pages 57 and 106).

In the second half of the nineteenth century, the principle that an event with a vanishingly small probability will not happen took on a real role in physics, most saliently in Ludwig Boltzmann’s statistical understanding of the second law of thermodynamics. As Boltzmann explained in the 1870s, dissipative processes are irreversible because the probability of a state with entropy far from the maximum is vanishingly small (von Plato, 1994, page 80; Seneta, 1997). Also notable was Henri Poincaré’s use of the principle in celestial mechanics. Poincaré’s (1890) recurrence theorem says that an isolated mechanical system confined to a bounded region of its phase space will eventually return arbitrarily close to its initial state, provided only that this initial state is not exceptional. The states for which the recurrence does not hold are exceptional inasmuch as they are contained in subregions whose total volume is arbitrarily small.

Saying that an event of very small or vanishingly small probability will not happen is one thing. Saying that probability theory gains empirical meaning only by ruling out the happening of such events is another. Cournot (1843, page 78) seems to have been the first to say explicitly that probability theory does gain empirical meaning only by declaring events of vanishingly small probability to be impossible:

    . . . The physically impossible event is therefore the one that has infinitely small probability, and only this remark gives substance (objective and phenomenal value) to the theory of mathematical probability.

[The phrase “objective and phenomenal” refers to Kant’s distinction between the noumenon, or thing-in-itself, and the phenomenon, or object of experience (Daston, 1994).] After the Second World War, some authors began to use “Cournot’s principle” for the principle that an event of very small or zero probability singled out in advance will not happen, especially when this principle is advanced as the unique means by which a probability model is given empirical meaning.

2.2.1 The viewpoint of the French probabilists. In the early decades of the twentieth century, probability theory was beginning to be understood as pure mathematics. What does this pure mathematics have to do with the real world? The mathematicians who revived research in probability theory in France during these decades, Émile Borel, Jacques Hadamard, Maurice Fréchet and Paul Lévy, made the connection by treating events of small or zero probability as impossible.

Borel explained this repeatedly, often in a style more literary than mathematical or philosophical (Borel, 1906, 1909b, 1914, 1930). Borel’s many discussions of the considerations that go into assessing the boundaries of practical certainty culminated in a classification more refined than Buffon’s. A probability of 10^{-6}, he decided, is negligible at the human scale, a probability of 10^{-15} at the terrestrial scale and a probability of 10^{-50} at the cosmic scale (Borel, 1939, pages 6–7).

Hadamard, the preeminent analyst who did pathbreaking work on Markov chains in the 1920s (Bru, 2003), made the point in a different way. Probability theory, he said, is based on two notions: the notion of perfectly equivalent (equally likely) events and the notion of a very unlikely event (Hadamard, 1922, page 289). Perfect equivalence is a mathematical assumption which cannot be verified. In practice, equivalence is not perfect (one of the grains in a cup of sand may be more likely than another to hit the ground first when they are thrown out of the cup), but this need not prevent us from applying the principle of the very unlikely event. Even if the grains are not exactly the same, the probability of any particular one hitting the ground first is negligibly small. Hadamard was the teacher of both Fréchet and Lévy.

Among the French mathematicians of this period, it was Lévy who expressed most clearly the thesis that Cournot’s principle is probability’s only bridge to reality. In his Calcul des probabilités (Lévy, 1925) Lévy emphasized the different roles of Hadamard’s two notions. The notion of equally likely events, Lévy explained, suffices as a foundation for the mathematics of probability, but so long as we base our reasoning only on this notion, our probabilities are merely subjective. It is the notion of a very unlikely event that permits the results of the mathematical theory to take on practical significance (Lévy, 1925, pages 21, 34; see also Lévy, 1937, page 3). Combining the notion of a very unlikely event with Bernoulli’s theorem, we obtain the notion of the objective probability of an event, a physical constant that is measured by frequency. Objective probability, in Lévy’s view, is entirely analogous to length and weight, other physical constants whose empirical meaning is also defined by methods established for measuring them to a reasonable approximation (Lévy, 1925, pages 29–30).

By the time he undertook to write the Grundbegriffe, Kolmogorov must have been very familiar with
Lévy’s views. He had cited Lévy’s 1925 book in his 1931 article on Markov processes and subsequently, during his visit to France, had spent a great deal of time talking with Lévy about probability. He could also have learned about Cournot’s principle from the Russian literature. The champion of the principle in Russia had been Chuprov, who became professor of statistics in Petersburg in 1910. Chuprov put Cournot’s principle, which he called Cournot’s lemma, at the heart of this project; it was, he said, a basic principle of the logic of the probable (Chuprov, 1910; Sheynin, 1996, pages 95–96). Markov, who also worked in Petersburg, learned about the burgeoning field of mathematical statistics from Chuprov (Ondar, 1981), and we see an echo of Cournot’s principle in Markov’s (1912, page 12 of the German edition) textbook:

    The closer the probability of an event is to one, the more reason we have to expect the event to happen and not to expect its opposite to happen.

    In practical questions, we are forced to regard as certain events whose probability comes more or less close to one, and to regard as impossible events whose probability is small.

    Consequently, one of the most important tasks of probability theory is to identify those events whose probabilities come close to one or zero.

The Russian statistician Evgeny Slutsky discussed Chuprov’s views in his influential article on limit theorems (Slutsky, 1925). Kolmogorov included Lévy’s book and Slutsky’s article in his bibliography, but not Chuprov’s book. An opponent of the Bolsheviks, Chuprov was abroad when they seized power, and he never returned home. He remained active in Sweden and Germany, but his health soon failed, and he died in 1926 at the age of 52.

2.2.2 Strong and weak forms of Cournot’s principle. Cournot’s principle has many variations. Like probability, moral certainty can be subjective or objective. Some authors make moral certainty sound truly equivalent to absolute certainty; others emphasize its pragmatic meaning.

For our story, it is important to distinguish between the strong and weak forms of the principle (Fréchet, 1951, page 6; Martin, 2003). The strong form refers to an event of small or zero probability that we single out in advance of a single trial: it says the event will not happen on that trial. The weak form says that an event with very small probability will happen very rarely in repeated trials.

Borel, Lévy and Kolmogorov all subscribed to Cournot’s principle in its strong form. In this form, the principle combines with Bernoulli’s theorem to produce the unequivocal conclusion that an event’s probability will be approximated by its frequency in a particular sufficiently long sequence of independent trials. It also provides a direct foundation for statistical testing. If the meaning of probability resides precisely in the nonhappening of small-probability events singled out in advance, then we need no additional principles to justify rejecting a hypothesis that gives small probability to an event we single out in advance and then observe to happen.

Other authors, including Chuprov, enunciated Cournot’s principle in its weak form, and this can lead in a different direction. The weak principle combines with Bernoulli’s theorem to produce the conclusion that an event’s probability will usually be approximated by its frequency in a sufficiently long sequence of independent trials, a general principle that has the weak principle as a special case. This was pointed out in the famous textbook by Castelnuovo (1919, page 108). On page 3, Castelnuovo called the general principle the empirical law of chance:

    In a series of trials repeated a large number of times under identical conditions, each of the possible events happens with a (relative) frequency that gradually equals its probability. The approximation usually improves as the number of trials increases.

Although the special case where the probability is close to 1 is sufficient to imply the general principle, Castelnuovo preferred to begin his introduction to the meaning of probability by enunciating the general principle, and so he can be considered a frequentist. His approach was influential. Maurice Fréchet and Maurice Halbwachs adopted it in their textbook in 1924. It brought Fréchet to the same understanding of objective probability as Lévy: objective probability is a physical constant that is measured by frequency (Fréchet, 1938a, page 5; 1938b, pages 45–46).

The weak point of Castelnuovo and Fréchet’s position lies in the modesty of their conclusion: they conclude only that an event’s probability is usually approximated by its frequency. When we estimate a probability from an observed frequency, we are taking a
further step: we are assuming that what usually happens has happened in the particular case. This step requires the strong form of Cournot’s principle. According to Kolmogorov (1956, page 240 of the 1965 English edition), it is a reasonable step only if we have some reason to assume that the position of the particular case among other potential ones “is a regular one, that is, that it has no special features.”

2.2.3 British indifference and German skepticism. The mathematicians who worked on probability in France in the early twentieth century were unusual in the extent to which they delved into the philosophical side of their subject. Poincaré had made a mark in the philosophy of science as well as in mathematics, and Borel, Fréchet and Lévy tried to emulate him. The situation in Britain and Germany was different.

In Britain there was little mathematical work in probability proper in this period. In the nineteenth century, British interest in probability had been practical and philosophical, not mathematical (Porter, 1986, page 74ff). Robert Leslie Ellis (1849) and John Venn (1888) accepted the usefulness of probability, but insisted on defining it directly in terms of frequency, leaving no role for Bernoulli’s theorem and Cournot’s principle (Daston, 1994). These attitudes persisted even after Pearson and Fisher brought Britain into a leadership role in mathematical statistics. The British statisticians had no puzzle to solve concerning how to link probability to the world. They were interested in reasoning directly about frequencies.

In contrast with Britain, Germany did see a substantial amount of mathematical work in probability during the first decades of the twentieth century, much of it published in German by Scandinavians and eastern Europeans, but few German mathematicians of the first rank fancied themselves philosophers. The Germans were already pioneering the division of labor to which we are now accustomed, between mathematicians who prove theorems about probability, and philosophers, logicians, statisticians and scientists who analyze the meaning of probability. Many German statisticians believed that one must decide what level of probability will count as practical certainty in order to apply probability theory (von Bortkiewicz, 1901, page 825; Bohlmann, 1901, page 861), but German philosophers did not give Cournot’s principle a central role.

The most cogent and influential of the German philosophers who discussed probability in the late nineteenth century was Johannes von Kries (1886), whose Principien der Wahrscheinlichkeitsrechnung first appeared in 1886. von Kries rejected what he called the orthodox philosophy of Laplace and the mathematicians who followed him. As von Kries saw it, these mathematicians began with a subjective concept of probability, but then claimed to establish the existence of objective probabilities by means of a so-called law of large numbers, which they erroneously derived by combining Bernoulli’s theorem with the belief that small probabilities can be neglected. Having both subjective and objective probabilities at their disposal, these mathematicians then used Bayes’ theorem to reason about objective probabilities for almost any question where many observations are available. All this, von Kries believed, was nonsense. The notion that an event with very small probability is impossible was, in von Kries’ eyes, simply d’Alembert’s mistake.

von Kries believed that objective probabilities sometimes exist, but only under conditions where equally likely cases can legitimately be identified. Two conditions, he thought, are needed:

• Each case is produced by equally many of the possible arrangements of the circumstances, and this remains true when we look back in time to earlier circumstances that led to the current ones. In this sense, the relative sizes of the cases are natural.
• Nothing besides these circumstances affects our expectation about the cases. In this sense, the Spielräume are insensitive. [In German, Spiel means game or play, and Raum (plural Räume) means room or space. In most contexts, Spielraum can be translated as leeway or room for maneuver. For von Kries the Spielraum for each case was the set of all arrangements of the circumstances that produce it.]

von Kries’ principle of the Spielräume was that objective probabilities can be calculated from equally likely cases when these conditions are satisfied. He considered this principle analogous to Kant’s principle that everything that exists has a cause. Kant thought that we cannot reason at all without the principle of cause and effect. von Kries thought that we cannot reason about objective probabilities without the principle of the Spielräume.

Even when an event has an objective probability, von Kries saw no legitimacy in the law of large numbers. Bernoulli’s theorem is valid, he thought, but it tells us only that a large deviation of an event’s frequency from its probability is just as unlikely as some other unlikely event, say a long run of successes. What will actually happen is another matter. This disagreement between Cournot and von Kries can be seen as
a quibble about words. Do we say that an event will not happen (Cournot) or do we say merely that it is as unlikely as some other event we do not expect to happen (von Kries)? Either way, we proceed as if it will not happen. However, the quibbling has its reasons. Cournot wanted to make a definite prediction, because this provides a bridge from probability theory to the world of phenomena (the real world, as those who have not studied Kant would say). von Kries thought he had a different way to connect probability theory with phenomena.

von Kries’ critique of moral certainty and the law of large numbers was widely accepted in Germany (Kamlah, 1983). Czuber, in the influential textbook we have already mentioned, named Bernoulli, d’Alembert, Buffon and De Morgan as advocates of moral certainty and declared them all wrong; the concept of moral certainty, he said, violates the fundamental insight that an event of ever so small a probability can still happen (Czuber, 1903, page 15; see also Meinong, 1915, page 591).

This wariness about ruling out the happening of events whose probability is merely very small does not seem to have prevented acceptance of the idea that zero probability represents impossibility. Beginning with Wiman’s work on continued fractions in 1900, mathematicians writing in German worked on showing that various sets have measure zero, and everyone understood that the point was to show that these sets are impossible (see Felix Bernstein, 1912, page 419). This suggests a great gulf between zero probability and merely small probability. One does not sense such a gulf in the writings of Borel and his French colleagues; as we have seen, the vanishingly small, for them, was merely an idealization of the very small.

von Kries’ principle of the Spielräume did not endure, because no one knew how to use it, but his project of providing a Kantian justification for the uniform distribution of probabilities remained alive in German philosophy in the first decades of the twentieth century (Meinong, 1915; Reichenbach, 1916). John Maynard Keynes (1921) brought it into the English literature, where it continues to echo, to the extent that today’s probabilists, when asked about the philosophical grounding of the classical theory of probability, are more likely to think about arguments for a uniform distribution of probabilities than about Cournot’s principle.

2.3 Bertrand’s Paradoxes

How do we know cases are equally likely, and when something happens, do the cases that remain possible remain equally likely? In the decades before the Grundbegriffe, these questions were frequently discussed in the context of paradoxes formulated by Joseph Bertrand, an influential French mathematician, in a textbook published in 1889.

We now look at discussions by other authors of two of Bertrand’s paradoxes: Poincaré’s discussion of the paradox of the three jewelry boxes and Borel’s discussion of the paradox of the great circle. (In the literature of the period, “Bertrand’s paradox” usually referred to a third paradox, concerning two possible interpretations of the idea of choosing a random chord on a circle. Determining a chord by choosing two random points on the circumference is not the same as determining it by choosing a random distance from the center and then a random orientation.) The paradox of the great circle was also discussed by Kolmogorov and is now sometimes called the Borel–Kolmogorov paradox.

2.3.1 The paradox of the three jewelry boxes. This paradox, laid out by Bertrand (1889, pages 2–3), involves three identical jewelry boxes, each with two drawers. Box A has gold medals in both drawers, box B has silver medals in both, and box C has a gold medal in one and a silver medal in the other. Suppose we choose a box at random. It will be box C with probability 1/3. Now suppose we open at random one of the drawers in the box we have chosen. There are two possibilities for what we find:

• We find a gold medal. In this case, only two possibilities remain: the other drawer has a gold medal (we have chosen box A) or the other drawer has a silver medal (we have chosen box C).
• We find a silver medal. Here also, only two possibilities remain: the other drawer has a gold medal (we have chosen box C) or the other drawer has a silver medal (we have chosen box B).

Either way, it seems, there are now two cases, one of which is that we have chosen box C. So the probability that we have chosen box C is now 1/2.

Bertrand himself did not accept the conclusion that opening the drawer would change the probability of having box C from 1/3 to 1/2, and Poincaré (1912, pages 26–27) gave an explanation: Suppose the drawers in each box are labeled (where we cannot see) α and β, and suppose the gold medal in box C is in drawer α. Then there are six equally likely cases for the drawer we open:
1. Box A, drawer α: gold medal.
2. Box A, drawer β: gold medal.
3. Box B, drawer α: silver medal.
4. Box B, drawer β: silver medal.
5. Box C, drawer α: gold medal.
6. Box C, drawer β: silver medal.

When we find a gold medal, say, in the drawer we have opened, three of these cases remain possible: case 1, case 2 and case 5. Of the three, only one favors our having our hands on box C, so the probability for box C is still 1/3.

2.3.2 The paradox of the great circle. Bertrand (1889, pages 6–7) begins with a simple question: if we choose at random two points on the surface of a sphere, what is the probability that the distance between them is less than 10′?

By symmetry, we can suppose that the first point is known. So one way to answer the question is to calculate the proportion of a sphere’s surface that lies within 10′ of a given point. This is 2.1 × 10^{-6}.

Bertrand also found a different answer. After fixing the first point, he said, we can also assume that we know the great circle that connects the two points, because the possible chances are the same on great circles through the first point. There are 360 degrees—2160 arcs of 10′ each—in this great circle. Only the points in the two neighboring arcs are within 10′ of the first point, and so the probability sought is 2/2160, or 9.3 × 10^{-4}. This is many times larger than the probability found by the first method. Bertrand considered both answers equally valid, the original question being ill-posed. The concept of choosing points at random on a sphere was not, he said, sufficiently precise.

In his own probability textbook Borel (1909b, pages 100–104) explained that Bertrand was mistaken. Bertrand’s first method, based on the assumption that equal areas on the sphere have equal chances of containing the second point, is correct. His second method, based on the assumption that equal arcs on a great circle have equal chances of containing it, is incorrect. Writing M and M′ for the two points to be chosen at random on the sphere, Borel explained Bertrand’s mistake as follows:

    . . . The error begins when, after fixing the point M and the great circle, one assumes that the probability of M′ being on a given arc of the great circle is proportional to the length of that arc. If the arcs have no width, then in order to speak rigorously, we must assign the value zero to the probability that M and M′ are on the circle. In order to avoid this factor of zero, which makes any calculation impossible, one must consider a thin bundle of great circles all going through M, and then it is obvious that there is a greater probability for M′ to be situated in a vicinity 90 degrees from M than in the vicinity of M itself (Fig. 13).

FIG. 1. Borel’s Figure 13.

To give this argument practical content, Borel discussed how one might measure the longitude of a point on the surface of the earth. If we use astronomical observations, then we are measuring an angle, and errors in the measurement of the angle correspond to wider distances on the ground at the equator than at the poles. If we instead use geodesic measurements, say with a line of markers on each of many meridians, then to keep the markers out of each other’s way, we must make them thinner and thinner as we approach the poles.

2.3.3 Appraisal. Poincaré, Borel and others who understood the principles of the classical theory were able to resolve the paradoxes that Bertrand contrived. Two principles emerge from the resolutions they offered:

• The equally likely cases must be detailed enough to represent new information (e.g., we find a gold
medal) in all relevant detail. The remaining equally likely cases will then remain equally likely.
• We may need to consider the real observed event of nonzero probability that is represented in an idealized way by an event of zero probability (e.g., a randomly chosen point falls on a particular meridian). We should pass to the limit only after absorbing the new information.

Not everyone found it easy to apply these principles, however, and the confusion surrounding the paradoxes was another source of dissatisfaction with the classical theory.

3. MEASURE-THEORETIC PROBABILITY BEFORE THE GRUNDBEGRIFFE

A discussion of the relationship between measure and probability in the first decades of the twentieth century must navigate many pitfalls, because measure theory itself evolved, beginning as a theory about the measurability of sets of real numbers and then becoming more general and abstract. Probability theory followed along, but since the meaning of measure was changing, we can easily misunderstand things said at the time about the relationship between the two theories.

The development of theories of measure and integration during the late nineteenth and early twentieth centuries has been studied extensively (Hawkins, 1975; Pier, 1994a). Here we offer only a bare-bones sketch, beginning with Borel and Lebesgue, and touching on those steps that proved most significant for the foundations of probability. We discuss the work of Carathéodory, Radon, Fréchet and Nikodym, who made measure primary and integral secondary, as well as the contrasting approach of Daniell, who took integration to be basic, and Wiener, who applied Daniell’s methods to Brownian motion. Then we discuss Borel’s strong law of large numbers, which focused attention on measure rather than on integration. After looking at Steinhaus’ axiomatization of Borel’s denumerable probability, we turn to Kolmogorov’s use of measure theory in probability in the 1920s.

3.1 Measure Theory from Borel to Fréchet

Émile Borel is considered the founder of measure theory. Whereas Peano and Jordan had extended the concept of length from intervals to a larger class of sets of real numbers by approximating the sets inside and outside with finite unions of intervals, Borel used countable unions. His motivation came from complex analysis. In his doctoral dissertation Borel (1895) studied certain series that were known to diverge on a dense set of points on a closed curve and hence, it was thought, could not be continued analytically into the region bounded by the curve. Roughly speaking, Borel discovered that the set of points where divergence occurred, although dense, can be covered by a countable number of intervals with arbitrarily small total length. Elsewhere on the curve—almost everywhere, we would say now—the series does converge and so analytic continuation is possible (Hawkins, 1975, Section 4.2). This discovery led Borel to a new theory of measurability for subsets of [0, 1] (Borel, 1898).

Borel’s innovation was quickly seized upon by Henri Lebesgue, who made it the basis for his powerful theory of integration (Lebesgue, 1901). We now speak of Lebesgue measure on the real numbers R and on the n-dimensional space R^n, and of the Lebesgue integral in these spaces. We need not review Lebesgue’s theory, but we should mention one theorem, the precursor of the Radon–Nikodym theorem: any countably additive and absolutely continuous set function on the real numbers is an indefinite integral. This result first appeared in (Lebesgue, 1904; Hawkins, 1975, page 145; Pier, 1994a, page 524). He generalized it to R^n in 1910 (Hawkins, 1975, page 186).

Wacław Sierpiński (1918) gave an axiomatic treatment of Lebesgue measure. In this note, important to us because of the use Hugo Steinhaus later made of it, Sierpiński characterized the class of Lebesgue measurable sets as the smallest class K of sets that satisfy the following conditions:

I. For every set E in K, there is a nonnegative number µ(E) that will be its measure and will satisfy conditions II, III, IV and V.
II. Every finite closed interval is in K and has its length as its measure.
III. The class K is closed under finite and countable unions of disjoint elements, and µ is finitely and countably additive.
IV. If E_1 ⊃ E_2, and E_1 and E_2 are in K, then E_1 \ E_2 is in K.
V. If E is in K and µ(E) = 0, then any subset of E is in K.

An arbitrary class K that satisfies these conditions is not necessarily a field; there is no requirement that the intersection of two of K’s elements also be in K.

Lebesgue’s measure theory was first made abstract by Johann Radon (1913). Radon unified Lebesgue and Stieltjes integration by generalizing integration with
respect to Lebesgue measure to integration with respect to any countably additive set function on the Borel sets in R^n. The generalization included a version of the theorem of Lebesgue we just mentioned: if a countably additive set function g on R^n is absolutely continuous with respect to another countably additive set function f, then g is an indefinite integral with respect to f (Hawkins, 1975, page 189).

Constantin Carathéodory was also influential in drawing attention to measures on Euclidean spaces other than Lebesgue measure. Carathéodory (1914) gave axioms for outer measure in a q-dimensional space, derived the notion of measure and applied these ideas not only to Lebesgue measure on Euclidean spaces, but also to lower dimensional measures on Euclidean space which assign lengths to curves, areas to surfaces and so forth (Hochkirchen, 1999). Carathéodory also recast Lebesgue’s theory of integration to make measure even more fundamental; in his textbook (Carathéodory, 1918) on real functions, he defined the integral of a positive function on a subset of R^n as the (n + 1)-dimensional measure of the region between the subset and the function’s graph (Bourbaki, 1994, page 228).

It was Fréchet who first went beyond Euclidean space. Fréchet (1915a, b) observed that much of Radon’s reasoning does not depend on the assumption that one is working in R^n. One can reason in the same way in a much larger space, such as a space of functions. Any space will do, so long as the countably additive set function is defined on a σ-field of its subsets, as Radon had required. Fréchet did not, however, manage to generalize Radon’s theorem on absolute continuity to the fully abstract framework. This generalization, now called the Radon–Nikodym theorem, was obtained by Otton Nikodym fifteen years later (Nikodym, 1930).

Did Fréchet himself have probability in mind when he proposed a calculus that allows integration over function space? Probably so. An integral is a mean value. In a Euclidean space this might be a mean value with respect to a distribution of mass or electrical charge, but we cannot distribute mass or charge over a space of functions. The only thing we can imagine distributing over such a space is probability or frequency. However, Fréchet thought of probability as an application of mathematics, not as a branch of pure mathematics itself, so he did not think he was axiomatizing probability theory.

It was Kolmogorov who first called Fréchet’s theory a foundation for probability theory. He put the matter this way in the preface to the Grundbegriffe:

    . . . After Lebesgue’s investigations, the analogy between the measure of a set and the probability of an event, as well as between the integral of a function and the mathematical expectation of a random variable, was clear. This analogy could be extended further; for example, many properties of independent random variables are completely analogous to corresponding properties of orthogonal functions. But in order to base probability theory on this analogy, one still needed to liberate the theory of measure and integration from the geometric elements still in the foreground with Lebesgue. This liberation was accomplished by Fréchet.

It should not be inferred from this passage that Fréchet and Kolmogorov used “measure” in the way we do today. Fréchet may have liberated measure and integration from its geometric roots, but Fréchet and Kolmogorov continued to reserve the word measure for geometric settings. Throughout the 1930s, what we now call a measure, they called an additive set function. The usage to which we are now accustomed became standard only after the Second World War.

3.2 Daniell’s Integral and Wiener’s Differential Space

Percy Daniell, an Englishman working at the Rice Institute in Houston, Texas, introduced his integral in a series of articles (Daniell, 1918, 1919a, b, 1920) in the Annals of Mathematics.

Like Fréchet, Daniell considered an abstract set E, but instead of beginning with an additive set function on subsets of E, he began with what he called an integral on E—a linear operator on some class T_0 of real-valued functions on E. The class T_0 might consist of all continuous functions (if E is endowed with a topology) or perhaps all step functions. Applying Lebesgue’s methods in this general setting, Daniell extended the linear operator to a wider class T_1 of functions on E, the summable functions. In this way, the Riemann integral is extended to the Lebesgue integral, the Stieltjes integral is extended to the Radon integral and so on (Daniell, 1918). Using ideas from Fréchet’s dissertation, Daniell also gave examples in infinite-dimensional spaces (Daniell, 1919a, b). Daniell (1921) even used his theory of integration to construct a theory of Brownian motion. However, he did not succeed in gaining recognition for this last contribution; it seems to have been completely ignored until Stephen Stigler spotted it in the 1970s (Stigler, 1973).
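
As a rough illustration of Daniell's starting point, the sketch below records the properties usually asked of an elementary integral in later textbook presentations of the Daniell scheme. The symbol I for the integral, the monotone-continuity condition and the first extension step are assumptions drawn from that later tradition; Daniell's own 1918 formulation may differ in detail.

```latex
% Assumed notation: I is the elementary integral defined on the class T_0.
\begin{align*}
  % Linearity, for f, g in T_0 and real constants a, b
  I(af + bg) &= a\,I(f) + b\,I(g), \\
  % Nonnegativity
  f \ge 0 &\;\Longrightarrow\; I(f) \ge 0, \\
  % Continuity along sequences in T_0 decreasing pointwise to zero
  f_1 \ge f_2 \ge \cdots,\ f_n \downarrow 0 &\;\Longrightarrow\; I(f_n) \to 0, \\
  % First extension step: f_n in T_0 increasing pointwise to f
  f_n \uparrow f &\;\Longrightarrow\; I(f) := \lim_{n \to \infty} I(f_n)
  \quad (\text{possibly } +\infty).
\end{align*}
```

When T_0 consists of step functions on an interval and I is the elementary area under a step function, an extension of this kind leads to the Lebesgue integral, which matches the example given in the text.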
The American ex-child prodigy and polymath Norbert Wiener, when he came upon Daniell’s 1918 and July 1919 articles (Daniell, 1918, 1919a), was in a better position than Daniell himself to appreciate and advertise their remarkable potential for probability (Wiener, 1956; Masani, 1990). Having studied philosophy as well as mathematics, Wiener was well aware of the intellectual significance of Brownian motion and of Einstein’s mathematical model for it.

In November 1919, Wiener submitted his first article (Wiener, 1920) on Daniell’s integral to the Annals of Mathematics, the journal where Daniell’s four articles on it had appeared. This article did not yet discuss Brownian motion; it merely laid out a general method for setting up a Daniell integral when the underlying space E is a function space. However, by August 1920, Wiener was in France to explain his ideas on Brownian motion to Fréchet and Lévy (Segal, 1992, page 397). He followed up with a series of articles (Wiener, 1921a, b), including a later much celebrated article on “differential-space” (Wiener, 1923).

Wiener’s basic idea was simple. Suppose we want to formalize the notion of Brownian motion for a finite time interval, say 0 ≤ t ≤ 1. A realized path is a function on [0, 1]. We want to define mean values for certain functionals (real-valued functions of the realized path). To set up a Daniell integral that gives these mean values, Wiener took T_0 to consist of functionals that depend only on the path’s values at a finite number of time points. One can find the mean value of such a functional using Gaussian probabilities for the changes from each time point to the next. Extending this integral by Daniell’s method, he succeeded in defining mean values for a wide class of functionals. In particular, he obtained probabilities (mean values for indicator functions) for certain sets of paths. He showed that the set of continuous paths has probability 1, while the set of differentiable paths has probability 0.

It is now commonplace to translate this work into Kolmogorov’s measure-theoretic framework. Kiyoshi Itô, for example, in a commentary published along with Wiener’s articles from this period in Volume 1 of Wiener’s collected works (Wiener, 1976–1985, page 515), wrote as follows concerning Wiener’s 1923 article:

    Having investigated the differential space from various directions, Wiener defines the Wiener measure as a σ-additive probability measure by means of Daniell’s theory of integral.

It should not be thought, however, that Wiener defined a σ-additive probability measure and then found mean values as integrals with respect to that measure. Rather, as we just explained, he started with mean values and used Daniell’s theory to obtain more. This Daniellian approach to probability, making mean value basic and probability secondary, has long taken a back seat to Kolmogorov’s approach, but it still has its supporters (Haberman, 1996; Whittle, 2000).

3.3 Borel’s Denumerable Probability

Impressive as it was and still is, Wiener’s work played little role in the story leading to Kolmogorov’s Grundbegriffe. The starring role was played instead by Borel.

In retrospect, Borel’s use of measure theory in complex analysis in the 1890s already looks like probabilistic reasoning. Especially striking in this respect is the argument Borel gave for his claim that a Taylor series will usually diverge on the boundary of its circle of convergence (Borel, 1897). In general, he asserted, successive coefficients of the Taylor series, or at least successive groups of coefficients, are independent. He showed that each group of coefficients determines an arc on the circle, that the sum of lengths of the arcs diverges and that the Taylor series will diverge at a point on the circle if it belongs to infinitely many of the arcs. The arcs being independent and the sum of their lengths being infinite, a given point must be in infinitely many of them. To make sense of this argument, we must evidently take “in general” to mean that the coefficients are chosen at random and take “independent” to mean probabilistically independent; the conclusion then follows by what we now call the Borel–Cantelli lemma. Borel himself used probabilistic language when he reviewed this work in 1912 (Borel, 1912; Kahane, 1994). In the 1890s, however, Borel did not see complex analysis as a domain for probability, which is concerned with events in the real world.

In the new century, Borel did begin to explore the implications for probability of his and Lebesgue’s work on measure and integration (Bru, 2001). His first comments came in an article in 1905 (Borel, 1905), where he pointed out that the new theory justified Poincaré’s intuition that a point chosen at random from a line segment would be incommensurable with probability 1 and called attention to Anders Wiman’s (1900, 1901) work on continued fractions, which had been inspired by the question of the stability of planetary motions, as an application of measure theory to probability.
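
Poincaré's intuition that a randomly chosen point of a segment is incommensurable with probability 1 can be checked with the kind of covering argument that runs through Borel's early work. The sketch below is a modern reconstruction rather than Borel's own wording. The enumeration q_1, q_2, ... of the rational points of [0, 1], the covering intervals J_n and the tolerance ε are notation introduced here for illustration, and the argument takes countable additivity of Lebesgue measure for granted.

```latex
% Assumed notation: q_1, q_2, \dots enumerates the rational points of [0,1];
% \varepsilon > 0 is arbitrary; |J| denotes the length of an interval J.
\begin{align*}
  % Cover the n-th rational by an interval of length \varepsilon / 2^n
  J_n &= \Bigl( q_n - \frac{\varepsilon}{2^{n+1}},\; q_n + \frac{\varepsilon}{2^{n+1}} \Bigr), \\
  % The total length of the cover is a geometric series
  \sum_{n=1}^{\infty} |J_n| &= \sum_{n=1}^{\infty} \frac{\varepsilon}{2^{n}} = \varepsilon .
\end{align*}
% Since \varepsilon is arbitrary, the rationals in [0,1] have measure zero,
% so a point drawn uniformly from [0,1] is irrational with probability 1.
```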
Then, in 1909, Borel published a startling result—his strong law of large numbers (Borel, 1909a). This new result strengthened measure theory’s connection both with geometric probability and with the heart of classical probability theory—the concept of independent trials. Considered as a statement in geometric probability, the law says that the fraction of 1’s in the binary expansion of a real number chosen at random from [0, 1] converges to 1/2 with probability 1. Considered as a statement about independent trials (we may use the language of coin tossing, though Borel did not), it says that the fraction of heads in a denumerable sequence of independent tosses of a fair coin converges to 1/2 with probability 1. Borel explained the geometric interpretation and he asserted that the result can be established using measure theory (Borel, 1909a, Section I.8). However, he set measure theory aside for philosophical reasons and provided an imperfect proof using denumerable versions of the rules of total and compound probability. It was left to others, most immediately Faber (1910, page 400) and Hausdorff (1914), to give rigorous measure-theoretic proofs (Doob, 1989, 1994; von Plato, 1994).

Borel’s discomfort with a measure-theoretic treatment can be attributed to his unwillingness to assume countable additivity for probability (Barone and Novikoff, 1978; von Plato, 1994). He saw no logical absurdity in a countably infinite number of zero probabilities adding to a nonzero probability, and so instead of general appeals to countable additivity he preferred arguments that derive probabilities as limits as the number of trials increases (Borel, 1909a, Section I.4). Such arguments seemed to him stronger than formal appeals to countable additivity, because they exhibit the finitary pictures that are idealized by the infinitary pictures. He saw even more fundamental problems in the idea that Lebesgue measure can model a random choice (von Plato, 1994, pages 36–56; Knobloch, 2001). How can we choose a real number at random when most real numbers are not even definable in any constructive sense?

Although Hausdorff did not hesitate to equate Lebesgue measure with probability, his account of Borel’s strong law, in his Grundzüge der Mengenlehre (Hausdorff, 1914, pages 419–421), treated it as a theorem about real numbers: the set of numbers in [0, 1] with binary expansions for which the proportion of 1’s converges to 1/2 has Lebesgue measure 1. Later, Francesco Paolo Cantelli (1916a, b, 1917) rediscovered the strong law (he neglected, in any case, to cite Borel) and extended it to the more general result that the average of bounded random variables will converge to their mean with arbitrarily high probability. Cantelli’s work inspired other authors to study the strong law and to sort out different concepts of probabilistic convergence.

By the early 1920s, it seemed to some that there were two different versions of Borel’s strong law—one concerned with real numbers and one concerned with probability. Hugo Steinhaus (1923) proposed to clarify matters by axiomatizing Borel’s theory of denumerable probability along the lines of Sierpiński’s axiomatization of Lebesgue measure. Writing A for the set of all infinite sequences of ρ’s and η’s (ρ for “rouge” and η for “noir”; now we are playing red or black rather than heads or tails), Steinhaus proposed the following axioms for a class K of subsets of A and a real-valued function µ that gives probabilities for the elements of K:

I. µ(E) ≥ 0 for all E ∈ K.
II. 1. For any finite sequence e of ρ’s and η’s, the subset E of A consisting of all infinite sequences that begin with e is in K.
    2. If two such sequences e_1 and e_2 differ in only one place, then µ(E_1) = µ(E_2), where E_1 and E_2 are the corresponding sets.
    3. µ(A) = 1.
III. K is closed under finite and countable unions of disjoint elements, and µ is finitely and countably additive.
IV. If E_1 ⊃ E_2, and E_1 and E_2 are in K, then E_1 \ E_2 is in K.
V. If E is in K and µ(E) = 0, then any subset of E is in K.

Sierpiński’s axioms for Lebesgue measure consisted of I, III, IV and V, together with an axiom that says that the measure µ(J) of an interval J is its length. This last axiom being demonstrably equivalent to Steinhaus’ axiom II, Steinhaus concluded that the theory of probability for an infinite sequence of binary trials is isomorphic with the theory of Lebesgue measure.

To show that his axiom II is equivalent to setting the measures of intervals equal to their length, Steinhaus used the Rademacher functions—the nth Rademacher function being the function that assigns a real number the value 1 or −1 depending on whether the nth digit in its dyadic expansion is 0 or 1. He also used these functions, which are independent random variables, in deriving Borel’s strong law and related results. The work by Rademacher (1922) and Steinhaus marked the beginning of the Polish school of “independent functions,” which made important contributions to
probability theory during the period between the wars (Holgate, 1997).

3.4 Kolmogorov Enters the Stage

Although Steinhaus considered only binary trials in 1923, his reference to Borel’s more general concept of denumerable probability pointed to generalizations. We find such a generalization in Kolmogorov’s first article on probability, co-authored by Khinchin (Khinchin and Kolmogorov, 1925), which showed that a series of discrete random variables y_1 + y_2 + · · · will converge with probability 1 when the series of means and the series of variances both converge. The first section of the article, due to Khinchin, spells out how to represent the random variables as functions on [0, 1]: divide the interval into segments with lengths equal to the probabilities for y_1’s possible values, then divide each of these segments into smaller segments with lengths proportional to the probabilities for y_2’s possible values and so on. This, Khinchin noted with a nod to Rademacher and Steinhaus, reduces the problem to a problem about Lebesgue measure. This reduction was useful because the rules for working with Lebesgue measure were clear, while Borel’s picture of denumerable probability remained murky.

Dissatisfaction with this detour into Lebesgue measure must have been one impetus for the Grundbegriffe (Doob, 1989, page 818). Kolmogorov made no such detour in his next article on the convergence of sums of independent random variables. In this sole-authored article (Kolmogorov, 1928), he took probabilities and expected values as his starting point, but even then he did not appeal to Fréchet’s countably additive calculus. Instead, he worked with finite additivity and then stated an explicit ad hoc definition when he passed to a limit. For example, he defined the probability P that the series \sum_{n=1}^{\infty} y_n converges by the equation

    P = \lim_{\eta \to 0} \lim_{n \to \infty} \lim_{N \to \infty} W\left[ \operatorname{Max}_{p=n}^{N} \left| \sum_{k=n}^{p} y_k \right| < \eta \right],

where W(E) denotes the probability of the event E. [This formula does not appear in the Russian (Kolmogorov, 1986) and English (Kolmogorov, 1992) translations provided in Kolmogorov’s collected works; there the argument has been modernized so as to eliminate it.] This recalls the way Borel proceeded in 1909: think through each passage to the limit.

It was in his seminal article on Markov processes (Kolmogorov, 1931) that Kolmogorov first explicitly and freely used Fréchet’s calculus as his framework for probability. In this article, Kolmogorov considered a system with a set of states A. For any two time points t_1 and t_2 (t_1 < t_2), any state x ∈ A and any element E in a collection F of subsets of A, he wrote P(t_1, x, t_2, E) for the probability, when the system is in state x at time t_1, that it will be in a state in E at time t_2. Citing Fréchet, Kolmogorov assumed that P is countably additive as a function of E and that F is closed under differences and countable unions, and contains the empty set, all singletons and A. However, the focus was not on Fréchet; it was on the equation that ties together the transition probabilities, now called the Chapman–Kolmogorov equation. The article launched the study of this equation by purely analytical methods, a study that kept probabilists occupied for 50 years.

As many commentators have noted, the 1931 article makes no reference to probabilities for trajectories. There is no suggestion that such probabilities are needed for a stochastic process to be well defined. Consistent transition probabilities, it seems, are enough. Bachelier (1900, 1910, 1912) is cited as the first to study continuous-time stochastic processes, but Wiener is not cited.

4. HILBERT’S SIXTH PROBLEM

At the beginning of the twentieth century, many mathematicians were dissatisfied with what they saw as a lack of clarity and rigor in the probability calculus. The whole calculus seemed to be concerned with concepts that lie outside mathematics: event, trial, randomness, probability. As Henri Poincaré wrote, “one can hardly give a satisfactory definition of probability” (Poincaré, 1912, page 24).

The most celebrated call for clarification came from David Hilbert. The sixth of the twenty-three open problems that Hilbert presented to the International Congress of Mathematicians in Paris in 1900 was to treat axiomatically, after the model of geometry, those parts of physics in which mathematics already played an outstanding role, especially probability and mechanics (Hilbert, 1902; Hochkirchen, 1999). To explain what he meant by axioms for probability, Hilbert cited Georg Bohlmann, who had labeled the rules of total and compound probability axioms rather than theorems in his lectures on the mathematics of life insurance (Bohlmann, 1901). In addition to a logical investigation of these axioms, Hilbert called for a “rigorous and satisfactory development of the method of average values in mathematical physics, especially in the kinetic theory of gases.”
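
In modern notation, the two classical rules that Bohlmann labeled as axioms are usually written as shown below. The event symbols A and B and the conditional-probability notation P(B | A) are assumptions introduced here for illustration; they are not Bohlmann's or Hilbert's own notation.

```latex
\begin{align*}
  % Rule of total probability (finite additivity), for mutually exclusive events A and B
  P(A \cup B) &= P(A) + P(B), \\
  % Rule of compound probability
  P(A \cap B) &= P(A)\, P(B \mid A), \\
  % Special case of independent events
  P(A \cap B) &= P(A)\, P(B).
\end{align*}
```

The third line is the restricted form that a treatment confined to independent events would state; the second line is the general rule.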
Hilbert’s call for a mathematical treatment of average values was answered in part by the work on integration that we discussed in the preceding section, but his suggestion that the classical rules for probability should be treated as axioms on the model of geometry was an additional challenge. Among the early responses, we may mention the following:

• In his Zürich dissertation, Rudolf Laemmel (1904) discussed the rules of total and compound probability as axioms, but he stated the rule of compound probability only in the case of independence, a concept he did not explicate. (For excerpts, see Schneider, 1988, pages 359–366.)
• In his Göttingen dissertation, directed by Hilbert himself, Ugo Broggi (1907) gave only two axioms: an axiom stating that the sure event has probability 1, and an axiom stating the rule of total probability. Following tradition, he then defined probability as a ratio (a ratio of numbers of cases in the discrete setting; a ratio of the Lebesgue measures of two sets in the geometric setting) and verified his axioms. He did not state an axiom that corresponds to the classical rule of compound probability. Instead, he gave this name to a rule for calculating the probability of a Cartesian product, which he derived from the definition of geometric probability in terms of Lebesgue measure. (For excerpts, see Schneider, 1988, pages 367–377.) Broggi mistakenly claimed that his axiom of total probability (finite additivity) implied countable additivity (Steinhaus, 1923).
• In an article written in 1920, published in 1923 and listed in the bibliography of the Grundbegriffe, Antoni Łomnicki (1923) proposed that probability should always be understood relative to a density φ on a set M in $\mathbb{R}^r$. Łomnicki defined this probability by combining two of Carathéodory’s ideas: the idea of p-dimensional measure and the idea of defining the integral of a function on a set as the measure of the region between the set and the function’s graph (see Section 3.1 above). The probability of a subset m of M, according to Łomnicki, is the ratio of the measure of the region between m and φ’s graph to the measure of the region between M and this graph. If M is an r-dimensional subset of $\mathbb{R}^r$, then the measure being used is Lebesgue measure on $\mathbb{R}^{r+1}$; if M is a lower dimensional subset of $\mathbb{R}^r$, say p-dimensional, then the measure is the (p + 1)-dimensional Carathéodory measure. This definition covers discrete as well as continuous probability: in the discrete case, M is a set of discrete points, the function φ assigns each point its probability, and the region between a subset m and the graph of φ consists of a line segment for each point in m, whose Carathéodory measure is its length (i.e., the point’s probability). The rule of total probability follows. Like Broggi, Łomnicki treated the rule of compound probability as a rule for relating probabilities on a Cartesian product to probabilities on its components. He did not consider it an axiom, because it holds only if the density itself is a product density.
• In an article published in Russian, Sergei Bernstein (1917) showed that probability theory can be founded on qualitative axioms for numerical coefficients that measure the probabilities of propositions. He also developed this idea in his probability textbook (Bernstein, 1927), and Kolmogorov listed both the article and the book in the bibliography of the Grundbegriffe. John Maynard Keynes included Bernstein’s article in the bibliography of his probability book (Keynes, 1921), but Bernstein’s work was subsequently ignored by English-language authors on qualitative probability. It was first summarized in English in Samuel Kotz’s translation of Leonid E. Maistrov’s (1974) history of probability.

We now discuss at greater length responses by von Mises, Slutsky, Kolmogorov and Cantelli.

4.1 von Mises’ Collectives

The concept of a collective was introduced into the German scientific literature by Gustav Fechner’s (1897) Kollektivmasslehre, which appeared ten years after the author’s death. The concept was quickly taken up by Georg Helm (1902) and Heinrich Bruns (1906). Fechner wrote about the concept of a Kollektivgegenstand (collective object) or a Kollektivreihe (collective series). It was only later, in Meinong (1915) for example, that we see these names abbreviated to Kollektiv. As the name Kollektivreihe indicates, a Kollektiv is a population of individuals given in a certain order; Fechner called the ordering the Urliste. It was supposed to be irregular (random, we would say). Fechner was a practical scientist, not concerned with the theoretical notion of probability, but as Helm and Bruns realized, probability theory provides a framework for studying collectives.

The concept of a collective was developed by Richard von Mises (1919, 1928, 1931). His contribution was to realize that the concept can be made into a mathematical foundation for probability theory. As von Mises defined it, a collective is an infinite sequence of outcomes that satisfies two axioms:

1. The relative frequency of each outcome converges to a real number (the probability of the outcome) as we look at longer and longer initial segments of the sequence.
2. The relative frequency converges to the same probability in any subsequence selected without knowledge of the future (we may use knowledge of the outcomes so far in deciding whether to include the next outcome in the subsequence).

The second property says we cannot change the odds by selecting a subsequence of trials on which to bet; this is von Mises’ version of the “hypothesis of the impossibility of a gambling system,” and it assures the irregularity of the Urliste.

According to von Mises, the purpose of the probability calculus is to identify situations where collectives exist and the probabilities in them are known, and to derive probabilities for other collectives from these given probabilities. He pointed to three domains where probabilities for collectives are known: (1) games of chance where devices are carefully constructed so the axioms will be satisfied, (2) statistical phenomena where the two axioms can be confirmed to a reasonable degree, and (3) branches of theoretical physics where the two axioms play the same hypothetical role as other theoretical assumptions (von Mises, 1931, pages 25–27).

von Mises derived the classical rules of probability, such as the rules for adding and multiplying probabilities, from rules for constructing new collectives from an initial one. He had several laws of large numbers. The simplest was his definition of probability: the probability of an event is the event’s limiting frequency in a collective. Others arose as one constructed further collectives.

The ideas of von Mises were taken up by a number of mathematicians in the 1920s and 1930s. Kolmogorov’s bibliography includes an article by Arthur Copeland (1932) that proposed founding probability theory on particular rules for selecting subsequences in von Mises’ scheme, as well as articles by Karl Dörge (1930), Hans Reichenbach (1932) and Erhard Tornier (1933) that argued for related schemes. But the most prominent mathematicians of the time, including the Göttingen mathematicians (Mac Lane, 1995), the French probabilists and the British statisticians, were hostile or indifferent.

Collectives were given a rigorous mathematical basis by Abraham Wald (1938) and Alonzo Church (1940), but the claim that they provide a foundation for probability was refuted by Jean Ville (1939). Ville pointed out that whereas a collective in von Mises’ sense will not be vulnerable to a gambling system that chooses a subsequence of trials on which to bet, it may still be vulnerable to a more clever gambling system, which also varies the amount of the bet and the outcome on which to bet.

4.2 Slutsky’s Calculus of Valences and Kolmogorov’s General Theory of Measure

In an article published in Russian, Evgeny Slutsky (1922) presented a viewpoint that greatly influenced Kolmogorov. As Kolmogorov (1948) said in an obituary for Slutsky, Slutsky was “the first to give the right picture of the purely mathematical content of probability theory.”

How do we make probability purely mathematical? Markov had claimed to do this in his textbook, but Slutsky did not think Markov had succeeded, because Markov had retained the subjective notion of equipossibility. The solution, Slutsky felt, was to remove both the word “probability” and the notion of equally likely cases from the theory. Instead of beginning with equally likely cases, one should begin by assuming merely that numbers are assigned to cases and that when a case assigned the number α is further subdivided, the numbers assigned to the subcases should add to α. The numbers assigned to cases might be equal or they might not. The addition and multiplication theorems would be theorems in this abstract calculus, but it should not be called the probability calculus. In place of “probability,” he suggested the unfamiliar word valentnost, or “valence.” (Laemmel had earlier used the German valenz.) Probability would be only one interpretation of the calculus of valences, a calculus fully as abstract as group theory.

Slutsky listed three distinct interpretations of the calculus of valences:

1. Classical probability (equally likely cases).
2. Finite empirical sequences (frequencies).
3. Limits of relative frequencies. (Slutsky remarked that this interpretation is particularly popular with the English school.)

Slutsky did not think probability could be reduced to limiting frequency, because sequences of independent trials have properties that go beyond their possessing limiting frequencies. Initial segments of the sequences have properties that are not imposed by the eventual convergence of the frequency, and the sequences must be irregular in a way that resists the kind of selection discussed by von Mises.
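
The kind of selection von Mises had in mind can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the function names `relative_frequency` and `select_after_tails` are hypothetical, finite sequences stand in for infinite ones, and Python's pseudo-random generator stands in for an irregular sequence (a computable sequence is not a collective in the rigorous sense later given by Wald and Church). It compares a simulated coin with the perfectly regular sequence 0, 1, 0, 1, ..., which satisfies the first axiom but not the second.

```python
import random


def relative_frequency(outcomes):
    """Fraction of 1s (heads) in a finite sequence of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def select_after_tails(outcomes):
    """Place selection: keep an outcome only if the previous outcome was tails (0).

    The decision to include an outcome uses past outcomes only,
    never the outcome being selected.
    """
    return [x for prev, x in zip(outcomes, outcomes[1:]) if prev == 0]


n = 100_000
rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
coin = [rng.randint(0, 1) for _ in range(n)]
alternating = [i % 2 for i in range(n)]  # 0, 1, 0, 1, ...

coin_all = relative_frequency(coin)
coin_sel = relative_frequency(select_after_tails(coin))
alt_all = relative_frequency(alternating)                      # exactly 0.5
alt_sel = relative_frequency(select_after_tails(alternating))  # exactly 1.0

print("simulated coin:", coin_all, coin_sel)
print("alternating:   ", alt_all, alt_sel)
```

For the alternating sequence the overall frequency of heads is 1/2, yet betting only after tails always wins, so the selection changes the odds. For the simulated coin the two frequencies should stay close to each other, which is what the second axiom demands of a collective.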

Slutsky’s idea that probability could be an instance of a broader abstract theory was taken up by Kolmogorov in a thought piece in Russian (Kolmogorov, 1929), before his forthright use of Fréchet’s theory in his article on Markov processes in 1930 (Kolmogorov, 1931). Whereas Slutsky had mentioned frequencies as an alternative interpretation of a general calculus, Kolmogorov pointed to more mathematical examples: the distribution of digits in the decimal expansions of irrationals, Lebesgue measure in an n-dimensional cube and the density of a set A of positive integers (the limit as $n \to \infty$ of the fraction of the integers between 1 and n that are in A).

The abstract theory Kolmogorov sketches is concerned with a function M that assigns a nonnegative number M(E) to each element E of a class of subsets of a set A. He called M(E) the measure (mera) of E and he called M a measure specification (meroopredelenie). So as to accommodate all the mathematical examples he had in mind, he assumed, in general, neither that M is countably additive nor that the class of subsets to which it assigns numbers is a field. Instead, he assumed only that when $E_1$ and $E_2$ are disjoint and M assigns a number to two of the three sets $E_1$, $E_2$ and $E_1 \cup E_2$, it also assigns a number to the third, and that

    $M(E_1 \cup E_2) = M(E_1) + M(E_2)$

then holds (cf. Steinhaus’ axioms III and IV). In the case of probability, however, he did suggest (using different words) that M should be countably additive and that the class of subsets to which it assigns numbers should be a field, for only then can we uniquely define probabilities for countable unions and intersections, and this seems necessary to justify arguments involving events such as the convergence of random variables.

He defined the abstract Lebesgue integral of a function f on A, and he commented that countable additivity is to be assumed whenever such an integral is discussed. He wrote $M_{E_1}(E_2) = M(E_1 E_2)/M(E_1)$ “by analogy with the usual concept of relative probability.” He defined independence for partitions, and he commented, no doubt in reference to Borel’s strong law and other results in number theory, that the notion of independence is responsible for the power of probabilistic methods within pure mathematics.

The mathematical core of the Grundbegriffe is already here. Many years later, in his commentary in Volume II of his collected works (Kolmogorov, 1992, page 520), Kolmogorov said that only the set-theoretic treatment of conditional probability and the theory of distributions in infinite products were missing. Also missing, though, is the bold rhetorical move that Kolmogorov made in the Grundbegriffe: giving the abstract theory the name probability.

4.3 The Axioms of Steinhaus and Ulam

In the 1920s and 1930s, the city of Lwów in Poland was a vigorous center of mathematical research, led by Hugo Steinhaus. (Though it was in Poland between the two World Wars, Lwów is now in Ukraine. Its name is spelled differently in different languages: Lwów in Polish, Lviv in Ukrainian and Lvov in Russian. When part of Austria–Hungary and, briefly, Germany, it was Lemberg. Some articles in our bibliography refer to it as Léopol.) In 1929, Steinhaus’ work on limit theorems intersected with Kolmogorov’s, and his approach promoted the idea that probability should be axiomatized in the style of measure theory.

As we saw in Section 3.3, Steinhaus had already, in 1923, formulated axioms for heads and tails isomorphic to Sierpiński’s axioms for Lebesgue measure. This isomorphism had more than a philosophical purpose; Steinhaus used it to prove Borel’s strong law. In a pair of articles written in 1929 and published in 1930 (Steinhaus, 1930a, b), Steinhaus extended his approach to limit theorems that involved an infinite sequence of independent draws $\theta_1, \theta_2, \ldots$ from the interval [0, 1]. His axioms for this case were the same as for the binary case (Steinhaus, 1930b, pages 22–23), except that the second axiom, which determines probabilities for initial finite sequences of heads and tails, was replaced by an axiom that determines probabilities for initial finite sequences $\theta_1, \theta_2, \ldots, \theta_n$:

    The probability that $\theta_i \in \Theta_i$ for $i = 1, \ldots, n$, where the $\Theta_i$ are measurable subsets of [0, 1], is

        $|\Theta_1| \cdot |\Theta_2| \cdots |\Theta_n|$,

    where $|\Theta_i|$ is the Lebesgue measure of $\Theta_i$.

Steinhaus presented his axioms as a “logical extrapolation” of the classical axioms to the case of an infinite number of trials (Steinhaus, 1930b, page 23). They were more or less tacitly used, he asserted, in all classical problems, such as the problem of the gambler’s ruin, where the game as a whole, not merely finitely many rounds, must be considered (Steinhaus, 1930a, page 409). As in the case of heads and tails, Steinhaus showed that there are probabilities that uniquely satisfy
his axioms by setting up an isomorphism with Lebesgue measure on [0, 1], this time using a sort of Peano curve to map $[0, 1]^{\infty}$ onto [0, 1]. He used the isomorphism to prove several limit theorems, including one that formalized Borel’s 1897 claim concerning the circle of convergence of a Taylor series with randomly chosen coefficients.

Steinhaus’ axioms were measure-theoretic, but they were not yet abstract. His words suggested that his ideas should apply to all sequences of random variables, not merely ones uniformly distributed, and he even considered the case where the variables were complex-valued rather than real-valued, but he did not step outside the geometric context to consider probability on abstract spaces. This step was taken by Stanisław Ulam, one of Steinhaus’ junior colleagues at Lwów. At the International Congress of Mathematicians in Zürich in 1932, Ulam announced that he and another Lwów mathematician, Zbigniew Łomnicki (a nephew of Antoni Łomnicki), had shown that product measures can be constructed in abstract spaces (Ulam, 1932).

Ulam and Łomnicki’s axioms for a measure m were simple. We can put them in today’s language by saying that m is a probability measure on a σ-algebra that is complete (includes all null sets) and contains all singletons. Ulam announced that from a countable sequence of spaces with such probability measures, one can construct a probability measure that satisfies the same conditions on the product space.

We do not know whether Kolmogorov knew about Ulam’s announcement when he wrote the Grundbegriffe. Ulam’s axioms would have held no novelty for him, but he would presumably have found the result on product measures interesting. When their article finally appeared (Łomnicki and Ulam, 1934), it listed the same axioms as Ulam’s announcement had, but it now cited the Grundbegriffe as authority for them. Kolmogorov (1935) cited their article in turn in a short list of introductory literature in mathematical probability.

4.4 Cantelli’s Abstract Theory

Like Borel, Castelnuovo and Fréchet, Francesco Paolo Cantelli turned to probability after distinguishing himself in other areas of mathematics. It was only in the 1930s, about the same time as the Grundbegriffe appeared, that he introduced his own abstract theory of probability. This theory, which has important affinities with Kolmogorov’s, is developed most clearly in an article included in the Grundbegriffe’s bibliography (Cantelli, 1932) and a lecture he gave in 1933 at the Institut Henri Poincaré in Paris (Cantelli, 1935).

Cantelli (1932) argued for a theory that makes no appeal to empirical notions such as possibility, event, probability or independence. This abstract theory, he said, should begin with a set of points that have finite nonzero measure. This could be any set for which measure is defined, perhaps a set of points on a surface. He wrote m(E) for the area of a subset E. He noted that $m(E_1 \cup E_2) = m(E_1) + m(E_2)$, provided $E_1$ and $E_2$ are disjoint, and $0 \le m(E_1 E_2)/m(E_i) \le 1$ for i = 1, 2. He called $E_1$ and $E_2$ multipliable when $m(E_1 E_2) = m(E_1) m(E_2)$. Much of probability theory, he noted, including Bernoulli’s law of large numbers and Khinchin’s law of the iterated logarithm, can be carried out at this abstract level.

Cantelli (1935) explained how his abstract theory relates to frequencies in the world. The classical calculus of probability, he said, should be developed for a particular class of events in the world in three steps:

1. Study experimentally the equally likely cases (check that they happen equally frequently), thus justifying experimentally the rules of total and compound probability.
2. Develop an abstract theory based only on the rules of total and compound probability, without reference to their empirical justification.
3. Deduce probabilities from the abstract theory and use them to predict frequencies.

His own theory, Cantelli explains, is the one obtained in the second step.

Cantelli’s 1932 article and 1933 lecture were not really sources for the Grundbegriffe. Kolmogorov’s earlier work (Kolmogorov, 1929, 1931) had already gone well beyond anything Cantelli did in 1932, in both degree of abstraction and mathematical clarity. The 1933 lecture was more abstract, but obviously came too late to influence the Grundbegriffe. However, Cantelli did develop independently of Kolmogorov the project of combining a frequentist interpretation of probability with an abstract axiomatization that retained in some form the classical rules of total and compound probability. This project had been in the air for 30 years.

5. THE GRUNDBEGRIFFE

The Grundbegriffe was an exposition, not another research contribution. In his preface, after acknowledging Fréchet’s work, Kolmogorov said this:

    In the pertinent mathematical circles it has been common for some time to construct probability theory in accordance with this general point of view. But a complete presentation of the whole system, free from superfluous complications, has been missing (though a book by Fréchet, [2] in the bibliography, is in preparation).

Kolmogorov aimed to fill this gap, and he did so brilliantly and concisely, in just 62 pages. Fréchet’s much longer book, which finally appeared in two volumes (Fréchet, 1937–1938), is regarded by some as a mere footnote to Kolmogorov’s achievement.

Fréchet’s own evaluation of the Grundbegriffe’s contribution, quoted at the beginning of this article, is correct so far as it goes. Borel had introduced countable additivity into probability in 1909, and in the following 20 years, many authors, including Kolmogorov, had explored its consequences. The Grundbegriffe merely rounded out the picture by explaining that nothing more was needed. However, Kolmogorov’s mathematical achievement, especially his definitive work on the classical limit theorems, had given him the grounds and the authority to say that nothing more was needed.

Moreover, Kolmogorov’s appropriation of the name probability was an important rhetorical achievement, with enduring implications. Slutsky in 1922 and Kolmogorov himself in 1927 had proposed a general theory of additive set functions but had relied on the classical theory to say that probability should be a special case of this general theory. Now Kolmogorov proposed axioms for probability. The numbers in his abstract theory were probabilities, not merely valences or mery. His philosophical justification for proceeding in this way so resembled the justification that Borel and Lévy had given for the classical theory that they could hardly take exception.

It was not really true that nothing more was needed. Those who studied Kolmogorov’s formulation in detail soon realized that his axioms and definitions were inadequate in a number of ways. Most saliently, his treatment of conditional probability was not adequate for the burgeoning theory of Markov processes. In addition, there were other points in the monograph where he could not obtain natural results at the abstract level and had to fall back to the classical examples: discrete probabilities and probabilities in Euclidean spaces. These shortcomings only gave impetus to the new theory, because the project of filling in the gaps provided exciting work for a new generation of probabilists.

In this section we take a fresh look at the Grundbegriffe. We review its six axioms and two ideas that were, as Kolmogorov himself pointed out in his preface, novel at the time: the construction of probabilities on infinite-dimensional spaces (his famous consistency theorem) and the definition of conditional probability using the Radon–Nikodym theorem. Then we look at the explicitly philosophical part of the monograph: the two pages in Chapter I where Kolmogorov explains the empirical origin and meaning of his axioms.

5.1 The Mathematical Framework

Kolmogorov’s six axioms for probability are so familiar that it seems superfluous to repeat them, but so concise that it is easy to do so. We do repeat them and then we discuss the two points just mentioned: the consistency theorem and the treatment of conditional probability and expectation. As we will see, the mathematics was due to earlier authors: Daniell in the case of the consistency theorem and Nikodym in the case of conditional probabilities and expectations. Kolmogorov’s contribution, more rhetorical and philosophical than mathematical, was to bring this mathematics into a framework for probability.

5.1.1 The six axioms. Kolmogorov began with five axioms concerning a set E and a set F of subsets of E, which he called random events:

I. F is a field of sets.
II. F contains the set E.
III. To each set A from F is assigned a nonnegative real number P(A). This number P(A) is called the probability of the event A.
IV. P(E) = 1.
V. If A and B are disjoint, then P(A ∪ B) = P(A) + P(B).

He then added a sixth axiom, redundant for finite F but independent of the first five axioms for infinite F:

VI. If $A_1 \supseteq A_2 \supseteq \cdots$ is a decreasing sequence of events from F with $\bigcap_{n=1}^{\infty} A_n = \emptyset$, then $\lim_{n \to \infty} P(A_n) = 0$.

This is the axiom of continuity. Given the first five axioms, it is equivalent to countable additivity.

The six axioms can be summarized by saying that P is a nonnegative additive set function in the sense of Fréchet with P(E) = 1.

Unlike Fréchet, who had debated countable additivity with de Finetti (Fréchet, 1930; de Finetti, 1930; Cifarelli and Regazzini, 1996), Kolmogorov did not
make a substantive argument for it. Instead, he said this (page 14):

    . . . Since the new axiom is essential only for infinite fields of probability, it is hardly possible to explain its empirical meaning. . . . In describing any actual observable random process, we can obtain only finite fields of probability. Infinite fields of probability occur only as idealized models of real random processes. This understood, we limit ourselves arbitrarily to models that satisfy Axiom VI. So far this limitation has been found expedient in the most diverse investigations.

This echoes Borel, who adopted countable additivity not as a matter of principle but because he had not encountered circumstances where its rejection seemed expedient (Borel, 1909a, Section I.5). However, Kolmogorov articulated even more clearly than Borel the purely instrumental significance of infinity.

5.1.2 Probability distributions in infinite-dimensional spaces. Suppose, using modern terminology, that $(E_1, F_1), (E_2, F_2), \ldots$ is a sequence of measurable spaces. For each finite set of indices, say $i_1, \ldots, i_n$, write $F_{i_1,\ldots,i_n}$ for the induced σ-algebra in the product space $\prod_{j=1}^{n} E_{i_j}$. Write E for the product of all the $E_i$ and write F for the algebra (not a σ-algebra) that consists of all the cylinder subsets of E corresponding to elements of the various $F_{i_1,\ldots,i_n}$. Suppose we define consistent probability measures for all the marginal spaces $(\prod_{j=1}^{n} E_{i_j}, F_{i_1,\ldots,i_n})$. This defines a set function on (E, F). Is it countably additive?

In general, the answer is negative; a counterexample was given by Erik Sparre Andersen and Børge Jessen in 1948, but as we noted in Section 4.3, Ulam had given a positive answer for the case where the marginal measures are product measures. Kolmogorov’s consistency theorem, in Section 4 of Chapter III of the Grundbegriffe, gave a positive answer for another case, where each $E_i$ is a copy of the real numbers and each $F_i$ consists of the Borel sets. (Formally, we should acknowledge, Kolmogorov had a slightly different starting point: finite-dimensional distribution functions, not finite-dimensional measures.)

In his September 1919 article (Daniell, 1919b), Daniell had proven a closely related theorem. Although Kolmogorov did not cite Daniell in the Grundbegriffe, the essential mathematical content of Kolmogorov’s result is already in Daniell’s. This point was recognized quickly; Jessen (1935) called attention to Daniell’s priority in an article that appeared in MIT’s Journal of Mathematics and Physics, together with an article by Wiener that also called attention to Daniell’s result. In a commemoration of Kolmogorov’s early work, Doob (1989) hazards the guess that Kolmogorov was unaware of Daniell’s result when he wrote the Grundbegriffe. This may be true. He would not have been the first author to repeat Daniell’s work; Jessen had presented the result as his own to the Seventh Scandinavian Mathematical Conference in 1929 and had become aware of Daniell’s priority only in time to acknowledge it in a footnote to his contribution to the proceedings (Jessen, 1930).

It is implausible that Kolmogorov was still unaware of Daniell’s construction after the comments by Wiener and Jessen, but in 1948 he again ignored Daniell while claiming the construction of probability measures on infinite products as a Soviet achievement (Gnedenko and Kolmogorov, 1948, Section 3.1). Perhaps this can be dismissed as mere propaganda, but we should also remember that the Grundbegriffe was not meant as a contribution to pure mathematics. Daniell’s and Kolmogorov’s theorems seem almost identical when they are assessed as mathematical discoveries, but they differed in context and purpose. Daniell was not thinking about probability, whereas the slightly different theorem formulated by Kolmogorov was about probability. Neither Daniell nor Wiener undertook to make probability into a conceptually independent branch of mathematics by establishing a general method for representing it measure-theoretically.

Kolmogorov’s theorem was more general than Daniell’s in one respect: Kolmogorov considered an index set of arbitrary cardinality, whereas Daniell considered only denumerable cardinality. This greater generality is merely formal, in two senses: it involves no additional mathematical complications and it has no practical use. The obvious use of a nondenumerable index would be to represent continuous time, and so we might conjecture that Kolmogorov was thinking of making probability statements about trajectories, as Wiener had done in the 1920s. However, Kolmogorov’s construction does not accomplish anything in this direction. The σ-algebra on the product obtained by the construction contains too few sets; in the case of Brownian motion, it does not include the set of continuous trajectories. It took some decades of further research to develop general methods of extension to σ-algebras rich enough to include the infinitary events one typically wants to discuss (Doob, 1953; Bourbaki, 1994, pages 243–245). The topological character of these extensions and the failure of the consistency theorem
for arbitrary Cartesian products remain two important caveats to the Grundbegriffe’s thesis that probability is adequately represented by the abstract notion of a probability measure.

5.1.3 Experiments and conditional probability. In the case where A has nonzero probability, Kolmogorov defined $P_A(B)$ in the usual way. He called it bedingte Wahrscheinlichkeit, which translates into English as “conditional probability.” Before the Grundbegriffe, this term was less common than “relative probability.”

Kolmogorov’s treatment of conditional probability and expectation was novel. It began with a set-theoretic formalization of the concept of an experiment (Versuch in German). Here Kolmogorov had in mind a subexperiment of the grand experiment defined by the conditions S. The subexperiment may give only limited information about the outcome ξ of the grand experiment. It defines a partition $\mathfrak{A}$ of the sample space E for the grand experiment: its outcome amounts to specifying which element of $\mathfrak{A}$ contains ξ. Kolmogorov formally identified the subexperiment with $\mathfrak{A}$. Then he introduced the idea of conditional probability relative to $\mathfrak{A}$:

• In the finite case, he wrote $P_{\mathfrak{A}}(B)$ for the random variable whose value at each point ξ of E is $P_A(B)$, where A is the element of $\mathfrak{A}$ that contains ξ, and he called this random variable the “conditional probability of B after the experiment $\mathfrak{A}$” (page 12). This random variable is well defined for all the ξ in elements of $\mathfrak{A}$ that have positive probability, and these ξ form an event that has probability 1.
• In the general case, he represented the partition $\mathfrak{A}$ by a function u on E that induces it and he wrote $P_u(B)$ for any random variable that satisfies

    $P_{\{u \subset A\}}(B) = E_{\{u \subset A\}} P_u(B)$

for every set A of possible values of u such that the subset $\{\xi \mid u(\xi) \in A\}$ of E (this is what he meant by $\{u \subset A\}$) is measurable and has positive probability (page 42). By the Radon–Nikodym theorem (only recently proven by Nikodym), this random variable is unique up to a set of probability 0. Kolmogorov called it the “conditional probability of B with respect to (or knowing) u.” He defined $E_u(y)$, which he called “the conditional expectation of the variable y for a known value of u,” analogously (page 46).

Kolmogorov was doing no new mathematics here; the mathematics is Nikodym’s. However, Kolmogorov was the first to point out that Nikodym’s result can be used to derive conditional probabilities from absolute probabilities.

We should not, incidentally, jump to the conclusion that Kolmogorov had abandoned the emphasis on transition probabilities he had displayed in his 1931 article and now wanted to start the study of stochastic processes with unconditional probabilities. Even in 1935, he recommended the opposite (Kolmogorov, 1935, pages 168–169 of the English translation).

5.1.4 When is conditional probability meaningful? To illustrate his understanding of conditional probability, Kolmogorov discussed Bertrand’s paradox of the great circle, which he called, with no specific reference, a Borelian paradox. His explanation of the paradox was simple but formal. After noting that the probability distribution for the second point conditional on a particular great circle is not uniform, he said:

    This demonstrates the inadmissibility of the idea of conditional probability with respect to a given isolated hypothesis with probability zero. One obtains a probability distribution for the latitude on a given great circle only when that great circle is considered as an element of a partition of the entire surface of the sphere into great circles with the given poles (page 45).

This explanation has become part of the culture of probability theory, but it cannot completely replace the more substantive explanations given by Borel.

Borel insisted that we explain how the measurement on which we will condition is to be carried out. This accords with Kolmogorov’s insistence that a partition be specified, because a procedure for measurement will determine such a partition. Kolmogorov’s explicitness on this point was a philosophical advance. On the other hand, Borel demanded more than the specification of a partition. He demanded that the measurement be specified realistically enough that we can see partitions into events of positive probability, not just a theoretical limiting partition into events of probability 0.

Borel’s demand that we be told how the theoretical partition into events of probability 0 arises as a limit of partitions into events of positive probability again compromises the abstract picture by introducing topological ideas, but this seems to be needed so as to rule out nonsense. This point was widely discussed in the 1940s and 1950s. Dieudonné (1948) and Lévy (1959) gave examples in which the conditional probabilities defined by Kolmogorov do not have versions
(functions of ξ for fixed B) that form sensible probability measures (when considered as functions of B for fixed ξ). Gnedenko and Kolmogorov (1949) and Blackwell (1956) formulated conditions on measurable spaces or probability measures that rule out such examples. For modern formulations of these conditions, see Rogers and Williams (2000).

5.2 The empirical origin of the axioms

Kolmogorov devoted about two pages of the Grundbegriffe to the relation between his axioms and the real world. These two pages, a concise statement of Kolmogorov’s frequentist philosophy, are so important to our story that we quote them in full. We then discuss how this philosophy was related to the thinking of his predecessors and how it fared in the decades following 1933.

5.2.1 In Kolmogorov’s own words. Section 2 of Chapter I of the Grundbegriffe is titled “Das Verhältnis zur Erfahrungswelt.” It is only two pages in length. This subsection consists of a translation of the section in its entirety.

The relation to the world of experience

The theory of probability is applied to the real world of experience as follows:

1. Suppose we have a certain system of conditions S, capable of unlimited repetition.
2. We study a fixed circle of phenomena that can arise when the conditions S are realized. In general, these phenomena can come out in different ways in different cases where the conditions are realized. Let E be the set of the different possible variants $\xi_1, \xi_2, \ldots$ of the outcomes of the phenomena. Some of these variants might actually not occur. We include in the set E all the variants we regard a priori as possible.
3. If the variant that actually appears when conditions S are realized belongs to a set A that we define in some way, then we say that the event A has taken place.

EXAMPLE. The system of conditions S consists of flipping a coin twice. The circle of phenomena mentioned in point 2 consists of the appearance, on each flip, of heads or tails. It follows that there are four possible variants (elementary events), namely

heads-heads, heads-tails, tails-heads, tails-tails.

Consider the event A that there is a repetition. This event consists of the first and fourth elementary events. Every event can similarly be regarded as a set of elementary events.

4. Under certain conditions, that we will not go into further here, we may assume that an event A that does or does not occur under conditions S is assigned a real number P(A) with the following properties:
   A. One can be practically certain that if the system of conditions S is repeated a large number of times, n, and the event A occurs m times, then the ratio m/n will differ only slightly from P(A).
   B. If P(A) is very small, then one can be practically certain that the event A will not occur on a single realization of the conditions S.

Empirical deduction of the axioms. Usually one can assume that the system F of events A, B, C, . . . that come into consideration and are assigned definite probabilities forms a field that contains E (Axioms I and II and the first half of Axiom III, the existence of the probabilities). It is further evident that 0 ≤ m/n ≤ 1 always holds, so that the second half of Axiom III appears completely natural. We always have m = n for the event E, so we naturally set P(E) = 1 (Axiom IV). Finally, if A and B are mutually incompatible (in other words, the sets A and B are disjoint), then $m = m_1 + m_2$, where $m$, $m_1$ and $m_2$ are the numbers of experiments in which the events A ∪ B, A and B happen, respectively. It follows that

$$\frac{m}{n} = \frac{m_1}{n} + \frac{m_2}{n}.$$

So it appears appropriate to set P(A ∪ B) = P(A) + P(B).

REMARK I. If two assertions are both
they are simultaneously correct is practically certain, though with a little lower degree of certainty. But if the number of assertions is very large, we cannot draw any conclusion whatsoever about making the assertions simultaneously from the practical certainty of each of them individually. So it in no way follows from Principle A that m/n will differ only a little from P(A) in every one of a very large number of series of experiments, where each series consists of n experiments.

REMARK II. By our axioms, the impossible event (the empty set) has the probability P(∅) = 0. But the converse inference, from P(A) = 0 to the impossibility of A, does not by any means follow. By Principle B, the event A’s having probability zero implies only that it is practically impossible that it will happen on a particular unrepeated realization of the conditions S. This by no means implies that the event A will not appear in the course of a sufficiently long series of experiments. When P(A) = 0 and n is very large, we can only say, by Principle A, that the quotient m/n will be very small; it might, for example, be equal to 1/n.

5.2.2 The philosophical synthesis. The philosophy set out in the two pages we have just translated is a synthesis, combining elements of the German and French traditions.

By his own testimony, Kolmogorov drew first and foremost from von Mises. In a footnote, he put the matter this way:

. . . In laying out the assumptions needed to make probability theory applicable to the world of real events, the author has followed in large measure the model provided by Mr. von Mises . . .

The very title of this section of the Grundbegriffe, “Das Verhältnis zur Erfahrungswelt,” echoes the title of the passage in von Mises (1931) that Kolmogorov cites, “Das Verhältnis der Theorie zur Erfahrungswelt,” but Kolmogorov does not discuss collectives. As he explained in a letter to Fréchet in 1939, he thought only a finitary version of this concept would reflect experience truthfully, and a finitary version, unlike von Mises’ infinitary version, could not be made mathematically rigorous. So for mathematics, one should adopt an axiomatic theory “whose practical value can be deduced directly” from a finitary concept of collectives.

Although collectives are in the background, Kolmogorov starts in a way that echoes Chuprov more than von Mises. He writes, as Chuprov (1910, page 149) did, of a system of conditions (Komplex von Bedingungen in German; kompleks usloviĭ in Russian). Probability is relative to a system of conditions S, and yet further conditions must be satisfied in order for events to be assigned a probability under S. Kolmogorov says nothing more about these conditions, but we may conjecture that he was thinking of the three sources of probabilities mentioned by von Mises: gambling devices, statistical phenomena and physical theory.

Where do von Mises’ two axioms (probability as a limit of relative frequency and its invariance under selection of subsequences) appear in Kolmogorov’s account? Principle A is obviously a finitary version of von Mises’ axiom that identifies probability as the limit of relative frequency. Principle B, on the other hand, is the strong form of Cournot’s principle (see Section 2.2.2 above). Is it a finitary version of von Mises’ principle of invariance under selection? Evidently. In a collective, von Mises says, we have no way to single out an unusual infinite subsequence. One finitary version of this is that we have no way to single out an unusual single trial. It follows that when we do select a single trial (a single realization of the conditions S, as Kolmogorov puts it), we should not expect anything unusual. In the special case where the probability is very small, the usual is that the event will not happen.

Of course, Principle B, like Principle A, is only satisfied when there is a collective, that is, under certain conditions. Kolmogorov’s insistence on this point is confirmed by the comments we quoted in Section 2.2.2 on the importance and nontriviality of the step from “usually” to “in this particular case.”

As Borel and Lévy had explained so many times, Principle A can be deduced from Principle B together with Bernoulli’s theorem, which is a consequence of the axioms. In the framework that Kolmogorov sets up, however, the deduction requires an additional assumption: we must assume that Principle B applies not only to the probabilities specified for repetitions of conditions S, but also to the corresponding probabilities (obtained by assuming independence) for repetitions of n-fold repetitions of S. It is not clear that this additional assumption is appropriate, not only
because we might hesitate about independence (see Shiryaev’s comments on page 120 of the third Russian edition of the Grundbegriffe, published in 1998), but also because the enlargement of our model to n-fold repetitions might involve a deterioration in its empirical precision to the extent that we are no longer justified in treating its high-probability predictions as practically certain. Perhaps these considerations justify Kolmogorov’s presenting Principle A as an independent principle alongside Principle B rather than as a consequence of it.

Principle A has an independent role in Kolmogorov’s story, however, even if we do regard it as a consequence of Principle B together with Bernoulli’s theorem, because it comes into play at a point that precedes the adoption of the axioms and hence the derivation of Bernoulli’s theorem: it is used to motivate the axioms (cf. Bartlett, 1949). The parallel to the thinking of Lévy is striking. In Lévy’s picture, the notion of equally likely cases motivates the axioms, while Cournot’s principle links the theory with reality. The most important change Kolmogorov makes in this picture is to replace equally likely cases with frequency; frequency now motivates the axioms, but Cournot’s principle remains the most essential link with reality.

In spite of the obvious influence of Borel and Lévy, Kolmogorov cites only von Mises in this section of the Grundbegriffe. Philosophical works by Borel and Lévy, along with those by Slutsky and Cantelli, do appear in the Grundbegriffe’s bibliography, but their appearance is explained only by a sentence in the preface: “The bibliography gives some recent works that should be of interest from a foundational viewpoint.” The emphasis on von Mises may have been motivated in part by political prudence. Whereas Borel and Lévy persisted in speaking of the subjective side of probability, von Mises was an uncompromising frequentist. Whereas Chuprov and Slutsky worked in economics and statistics, von Mises was an applied mathematician, concerned more with aerodynamics than social science, and the relevance of his work on collectives to physics had been established in the Soviet literature by Khinchin (1929; see also Khinchin, 1961, and Siegmund-Schultze, 2004). (For more on the political context, see Blum and Mespoulet, 2003; Lorentz, 2002; Mazliak, 2003; Seneta, 2004.)

5.2.3 Why was Kolmogorov’s philosophy not more influential? Although Kolmogorov never abandoned his formulation of frequentism, his philosophy has not enjoyed the enduring popularity of his axioms. Section 2 of Chapter I of the Grundbegriffe is seldom quoted. Cournot’s principle remained popular in Europe during the 1950s (Shafer and Vovk, 2005), but never gained substantial traction in the United States.

The lack of interest in Kolmogorov’s philosophy during the past half century can be explained in many ways, but one important factor is the awkwardness of extending it to stochastic processes. The first condition in Kolmogorov’s credo is that the system of conditions should be capable of unlimited repetition. When we define a stochastic process in terms of transition probabilities, as in Kolmogorov (1931), this condition may be met, for it may be possible to start a system repeatedly in a given state, but when we focus on probabilities for sets of possible trajectories, we are in a more awkward position. In many applications, there is only one realized trajectory; it is not possible to repeat the experiment to obtain another. Kolmogorov managed to overlook this tension in the Grundbegriffe, where he showed how to represent a discrete-time Markov chain in terms of a single probability measure (Chapter I, Section 6), but did not give such representations for continuous stochastic processes. It became more difficult to ignore the tension after Doob and others succeeded in giving such representations.

6. CONCLUSION

Seven decades later, the Grundbegriffe’s mathematical ideas still set the stage for mathematical probability. Its philosophical ideas, especially Cournot’s principle, also remain powerful, even for those who want to go beyond the measure-theoretic framework (Shafer and Vovk, 2001). As we have tried to show in this article, the endurance of these ideas is not due to Kolmogorov’s originality. Rather, it is due to the presence of the ideas in the very fabric of the work that came before. The Grundbegriffe was a product of its own time.

ACKNOWLEDGMENTS

Glenn Shafer’s research was partially supported by NSF Grant SES-98-19116 to Rutgers University. Vladimir Vovk’s research was partially supported by EPSRC Grant GR/R46670/01, BBSRC Grant 111/BIO14428, MRC Grant S505/65 and EU Grant IST-1999-10226 to Royal Holloway, University of London.

We want to thank the many colleagues who have helped us broaden our understanding of the period
discussed in this article. Bernard Bru and Oscar Sheynin were particularly helpful. We also benefited from conversation and correspondence with Pierre Crépel, Elyse Gustafson, Sam Kotz, Steffen Lauritzen, Per Martin-Löf, Thierry Martin, Laurent Mazliak, Paul Miranti, Julie Norton, Nell Painter, Goran Peskir, Andrzej Ruszczynski, J. Laurie Snell, Stephen M. Stigler and Jan von Plato.

We are also grateful for help in locating references. Sheynin gave us direct access to his extensive translations. Vladimir V’yugin helped us locate the original text of Kolmogorov’s 1929 article, and Aleksandr Shen’ gave us a copy of the 1936 Russian translation of the Grundbegriffe. Natalie Borisovets, at Rutgers’ Dana Library, and Mitchell Brown, at Princeton’s Fine Library, have also been exceedingly helpful.

REFERENCES

ANDERSEN, E. S. and JESSEN, B. (1948). On the introduction of measures in infinite product sets. Det Kongelige Danske Videnskabernes Selskab, Matematisk-Fysiske Meddelelser 25 (4), 8 pp.
BACHELIER, L. (1900). Théorie de la spéculation. Ann. Sci. École Norm. Supér. (3) 17 21–86. This was Bachelier’s doctoral dissertation. Reprinted in facsimile in 1995 by Éditions Jacques Gabay, Paris. An English translation, by A. J. Boness, appears in P. H. Cootner, ed. (1964). The Random Character of Stock Market Prices 17–78. MIT Press.
BACHELIER, L. (1910). Les probabilités à plusieurs variables. Ann. Sci. École Norm. Supér. (3) 27 339–360.
BACHELIER, L. (1912). Calcul des probabilités. Gauthier-Villars, Paris.
BARONE, J. and NOVIKOFF, A. (1978). A history of the axiomatic formulation of probability from Borel to Kolmogorov. I. Arch. Hist. Exact Sci. 18 123–190.
BARTLETT, M. S. (1949). Probability in logic, mathematics and science. Dialectica 3 104–113.
BAYER, R., ed. (1951). Congrès international de philosophie des sciences, Paris, 1949 4. Calcul des probabilités. Hermann, Paris.
BERNOULLI, J. (1713). Ars Conjectandi. Thurnisius, Basel. This pathbreaking work appeared eight years after Bernoulli’s death. A facsimile reprinting of the original Latin text is sold by Éditions Jacques Gabay, Paris. A German translation appeared in 1899 (Wahrscheinlichkeitsrechnung von Jakob Bernoulli. Anmerkungen von R. Haussner, Ostwald’s Klassiker, Nr. 107–108, Engelmann, Leipzig), with a second edition (Deutsch, Frankfurt) in 1999. A Russian translation of Part IV, which contains Bernoulli’s law of large numbers, appeared in 1986: Я. Бернулли, О законе больших чисел. Nauka, Moscow. It includes a preface by Kolmogorov, dated October 1985, and commentaries by other Russian authors. B. Sung’s English translation of Part IV, dated 1966, remains unpublished but is available in several university libraries in the United States. O. Sheynin’s English translation of Part IV, dated 2005, can be downloaded from www.sheynin.de.
BERNSTEIN, F. (1912). Über eine Anwendung der Mengenlehre auf ein aus der Theorie der säkularen Störungen herrührendes Problem. Math. Ann. 71 417–439.
BERNSTEIN, S. N. (1917). Опыт аксиоматического обоснования теории вероятностей (On the axiomatic foundation of the theory of probability). Сообщения Харьковского Математического Общества (Communications of the Kharkiv Mathematical Society) 15 209–274. Reprinted in S. N. Bernstein (1964). Собрание Сочинений 10–60. Nauka, Moscow.
BERNSTEIN, S. N. (1927). Теория вероятностей (Theory of Probability). Государственное Издательство (State Publishing House), Moscow and Leningrad. Second edition 1934, fourth 1946. This work was included in the Grundbegriffe’s bibliography.
BERTRAND, J. (1889). Calcul des probabilités. Gauthier-Villars, Paris. Some copies of the first edition are dated 1888. Second edition 1907. Reprinted by Chelsea, New York, 1972.
BLACKWELL, D. (1956). On a class of probability spaces. Proc. Third Berkeley Symp. Math. Statist. Probab. 2 1–6. Univ. California Press, Berkeley.
BLUM, A. and MESPOULET, M. (2003). L’Anarchie bureaucratique. Statistique et pouvoir sous Staline. Découverte, Paris.
BOHLMANN, G. (1901). Lebensversicherungs-Mathematik. In Encyklopädie der Mathematischen Wissenschaften 1(2) 852–917. Teubner, Leipzig.
BOREL, E. (1895). Sur quelques points de la théorie des fonctions. Ann. Sci. École Norm. Supér. (3) 12 9–55.
BOREL, E. (1897). Sur les séries de Taylor. Acta Math. 20 243–247. Reprinted in Borel (1972) 2 661–665.
BOREL, E. (1898). Leçons sur la théorie des fonctions. Gauthier-Villars, Paris.
BOREL, E. (1905). Remarques sur certaines questions de probabilité. Bull. Soc. Math. France 33 123–128. Reprinted in Borel (1972) 2 985–990.
BOREL, E. (1906). La valeur pratique du calcul des probabilités. La revue du mois 1 424–437. Reprinted in Borel (1972) 2 991–1004.
BOREL, E. (1909a). Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. Mat. Palermo 27 247–270. Reprinted in Borel (1972) 2 1055–1079.
BOREL, E. (1909b). Éléments de la théorie des probabilités. Gauthier-Villars, Paris. Third edition 1924. The 1950 edition was translated into English by J. E. Freund and published as Elements of the Theory of Probability by Prentice-Hall in 1965.
BOREL, E. (1912). Notice sur les travaux scientifiques. Gauthier-Villars, Paris. Prepared by Borel to support his candidacy to the Académie des Sciences. Reprinted in Borel (1972) 1 119–190.
BOREL, E. (1914). Le Hasard. Alcan, Paris. The first and second editions both appeared in 1914, with later editions in 1920, 1928, 1932, 1938 and 1948.
BOREL, E. (1930). Sur les probabilités universellement négligeables. C. R. Acad. Sci. Paris 190 537–540. Reprinted as Note IV of Borel (1939).
BOREL, E. (1939). Valeur pratique et philosophie des probabilités. Gauthier-Villars, Paris. Reprinted in 1991 by Éditions Jacques Gabay, Paris.
BOREL, E. (1972). Œuvres de Émile Borel. Centre National de la Recherche Scientifique, Paris. Four volumes.
BOURBAKI, N. (pseudonym) (1994). Elements of the History of Mathematics. Springer, Berlin. Translated from the 1984 French edition by J. Meldrum.
BROGGI, U. (1907). Die Axiome der Wahrscheinlichkeitsrechnung. Ph.D. thesis, Universität Göttingen. Excerpts reprinted in Schneider (1988) 359–366.
BRU, B. (2001). Émile Borel. In Statisticians of the Centuries (C. C. Heyde and E. Seneta, eds.) 287–291. Springer, New York.
BRU, B. (2003). Souvenirs de Bologne. Journal de la Société Française de Statistique 144 134–226. Special volume on history.
BRUNS, H. (1906). Wahrscheinlichkeitsrechnung und Kollektivmasslehre. Teubner, Leipzig and Berlin. Available at historical.library.cornell.edu.
BUFFON, G.-L. (1777). Essai d’arithmétique morale. In Supplément à l’Histoire naturelle 4 46–148. Imprimerie Royale, Paris.
CANTELLI, F. P. (1916a). La tendenza ad un limite nel senso del calcolo delle probabilità. Rend. Circ. Mat. Palermo 41 191–201. Reprinted in Cantelli (1958) 175–188.
CANTELLI, F. P. (1916b). Sulla legge dei grandi numeri. Atti Reale Accademia Nazionale Lincei, Memorie Cl. Sc. Fis. 11 329–349. Reprinted in Cantelli (1958) 189–213.
CANTELLI, F. P. (1917). Sulla probabilità come limite della frequenza. Atti Reale Accademia Nazionale Lincei 26 39–45. Reprinted in Cantelli (1958) 214–221.
CANTELLI, F. P. (1932). Una teoria astratta del calcolo delle probabilità. Giornale dell’Istituto Italiano degli Attuari 3 257–265. Reprinted in Cantelli (1958) 289–297.
CANTELLI, F. P. (1935). Considérations sur la convergence dans le calcul des probabilités. Ann. Inst. H. Poincaré 5 3–50. Reprinted in Cantelli (1958) 322–372.
CANTELLI, F. P. (1958). Alcune memorie matematiche. Giuffrè, Milan.
CARATHÉODORY, C. (1914). Über das lineare Mass von Punktmengen – eine Verallgemeinerung des Längenbegriffs. Nachr. Akad. Wiss. Göttingen Math.-Phys. II Kl. 4 404–426.
CARATHÉODORY, C. (1918). Vorlesungen über reelle Funktionen. Teubner, Leipzig and Berlin. Second edition 1927.
CASTELNUOVO, G. (1919). Calcolo delle probabilitá. Albrighi e Segati, Milan, Rome, and Naples. Second edition in two volumes, 1926 and 1928. Third edition 1948.
CHUPROV, A. A. (1910). Очерки по теории статистики (Essays on the Theory of Statistics), 2nd ed. Sabashnikov, St. Petersburg. The first edition appeared in 1909. The second edition was reprinted by the State Publishing House, Moscow, in 1959.
CHURCH, A. (1940). On the concept of a random sequence. Bull. Amer. Math. Soc. 46 130–135.
CIFARELLI, D. M. and REGAZZINI, E. (1996). de Finetti’s contribution to probability and statistics. Statist. Sci. 11 253–282.
COPELAND, A. H., SR. (1932). The theory of probability from the point of view of admissible numbers. Ann. Math. Statist. 3 143–156.
COURNOT, A.-A. (1843). Exposition de la théorie des chances et des probabilités. Hachette, Paris. Reprinted in 1984 as Volume I (B. Bru, ed.) of Cournot (1973–1984).
COURNOT, A.-A. (1973–1984). Œuvres complètes. Vrin, Paris. Ten volumes, with an eleventh to appear.
CZUBER, E. (1903). Wahrscheinlichkeitsrechnung und ihre Anwendung auf Fehlerausgleichung, Statistik und Lebensversicherung. Teubner, Leipzig. Second edition 1910, third 1914.
D’ALEMBERT, J. (1761). Réflexions sur le calcul des probabilités. Opuscules mathématiques 2 1–25.
D’ALEMBERT, J. (1767). Doutes et questions sur le calcul des probabilités. Mélanges de littérature, d’histoire, et de philosophie 5 275–304.
DANIELL, P. J. (1918). A general form of integral. Ann. of Math. (2) 19 279–294.
DANIELL, P. J. (1919a). Integrals in an infinite number of dimensions. Ann. of Math. (2) 20 281–288.
DANIELL, P. J. (1919b). Functions of limited variation in an infinite number of dimensions. Ann. of Math. (2) 21 30–38.
DANIELL, P. J. (1920). Further properties of the general integral. Ann. of Math. (2) 21 203–220.
DANIELL, P. J. (1921). Integral products and probability. Amer. J. Math. (2) 43 143–162.
DASTON, L. (1979). d’Alembert’s critique of probability theory. Historia Math. 6 259–279.
DASTON, L. (1994). How probabilities came to be objective and subjective. Historia Math. 21 330–344.
DE FINETTI, B. (1930). A proposito dell’estensione del teorema delle probabilità totali alle classi numerabili. Rend. Reale Instituto Lombardo Sci. Lettere 63 901–905, 1063–1069.
DE FINETTI, B. (1939). Compte rendu critique du colloque de Genève sur la théorie des probabilités. Actualités Scientifiques et Industrielles 766. Hermann, Paris. Number 766 is the eighth fascicle of Wavre (1938–1939).
DE MOIVRE, A. (1718). The Doctrine of Chances: Or, A Method of Calculating the Probability of Events in Play. Pearson, London. Second edition 1738, third 1756.
DIEUDONNÉ, J. (1948). Sur le théorème de Lebesgue–Nikodym. III. Ann. Univ. Grenoble 23 25–53.
DOOB, J. L. (1953). Stochastic Processes. Wiley, New York.
DOOB, J. L. (1989). Kolmogorov’s early work on convergence theory and foundations. Ann. Probab. 17 815–821.
DOOB, J. L. (1994). The development of rigor in mathematical probability, 1900–1950. In Pier (1994b) 157–170. Reprinted in Amer. Math. Monthly 103 (1996) 586–595.
DÖRGE, K. (1930). Zu der von R. von Mises gegebenen Begründung der Wahrscheinlichkeitsrechnung. Math. Z. 32 232–258.
ELLIS, R. L. (1849). On the foundations of the theory of probabilities. Trans. Cambridge Philos. Soc. 8 1–6. The paper was read on February 14, 1842. Part 1 of Volume 8 was published in 1843 or 1844, but Volume 8 was not completed until 1849.
FABER, G. (1910). Über stetige Funktionen. II. Math. Ann. 69 372–443.
FECHNER, G. T. (1897). Kollektivmasslehre. Engelmann, Leipzig. Edited by G. F. Lipps.
FRÉCHET, M. (1915a). Définition de l’intégrale sur un ensemble abstrait. C. R. Acad. Sci. Paris 160 839–840.
FRÉCHET, M. (1915b). Sur l’intégrale d’une fonctionnelle étendue à un ensemble abstrait. Bull. Soc. Math. France 43 248–265.
FRÉCHET, M. (1930). Sur l’extension du théorème des probabilités totales au cas d’une suite infinie d’événements. Rend. Reale Instituto Lombardo Sci. Lettere 63 899–900, 1059–1062.
FRÉCHET, M. (1937–1938). Recherches théoriques modernes sur la théorie des probabilités. Gauthier-Villars, Paris. This work
was listed in the Grundbegriffe’s bibliography as in preparation. It consists of two books, Fréchet (1937) and Fréchet (1938a). The two books together constitute Fascicle 3 of Volume 1 of Émile Borel’s Traité du calcul des probabilités et ses applications.
FRÉCHET, M. (1937). Généralités sur les probabilités. Variables aléatoires. Gauthier-Villars, Paris. Second edition 1950. This is Book 1 of Fréchet (1937–1938).
FRÉCHET, M. (1938a). Méthode des fonctions arbitraires. Théorie des événements en chaîne dans le cas d’un nombre fini d’états possibles. Gauthier-Villars, Paris. Second edition 1952. This is Book 2 of Fréchet (1937–1938).
FRÉCHET, M. (1938b). Exposé et discussion de quelques recherches récentes sur les fondements du calcul des probabilités. Actualités Scientifiques et Industrielles 735 23–55. Hermann, Paris. In Wavre (1938–1939), second fascicle, entitled Les fondements du calcul des probabilités.
FRÉCHET, M. (1951). Rapport général sur les travaux du Colloque de Calcul des Probabilités. In Bayer (1951) 3–21.
FRÉCHET, M. and HALBWACHS, M. (1924). Le calcul des probabilités à la portée de tous. Dunod, Paris.
GNEDENKO, B. V. and KOLMOGOROV, A. N. (1948). Теория вероятностей (Probability theory). In Математика в СССР за тридцать лет 1917–1947 (Thirty Years of Soviet Mathematics 1917–1947) 701–727. Гостехиздат, Moscow and Leningrad. English translation in Sheynin (1998) 131–158.
GNEDENKO, B. V. and KOLMOGOROV, A. N. (1949). Предельные распределения для сумм независимых случайных величин. State Publishing House, Moscow. Translated into English by K. L. Chung and published in 1954 as Limit Distributions for Sums of Independent Random Variables, Addison–Wesley, Cambridge, MA, with an appendix by J. L. Doob.
HABERMAN, S. J. (1996). Advanced Statistics 1. Description of Populations. Springer, New York.
HADAMARD, J. (1922). Les principes du calcul des probabilités. Revue de métaphysique et de morale 39 289–293. A slightly longer version of this note, with the title “Les axiomes du calcul des probabilités,” was included in Oeuvres de Jacques Hadamard 4 2161–2162. Centre National de la Recherche Scientifique, Paris, 1968.
HAUSDORFF, F. (1901). Beiträge zur Wahrscheinlichkeitsrechnung. Sitzungsber. Königlich Sächs. Gesellschaft Wiss. Leipz. Math.-Phys. Kl. 53 152–178.
HAUSDORFF, F. (1914). Grundzüge der Mengenlehre. von Veit, Leipzig.
HAWKINS, T. (1975). Lebesgue’s Theory of Integration: Its Origins and Development, 2nd ed. Chelsea, New York. First edition 1970, Univ. Wisconsin Press, Madison. The second edition differs only slightly from the first, but it corrects a consequential error on p. 104. Second edition reprinted in 1979 by Chelsea, New York, and then in 2001 by the American Mathematical Society, Providence, RI.
HELM, G. (1902). Die Wahrscheinlichkeitslehre als Theorie der Kollektivbegriffe. Annalen der Naturphilosophie 1 364–384.
HILBERT, D. (1902). Mathematical problems. Bull. Amer. Math. Soc. 8 437–479. Hilbert’s famous address to the International Congress of Mathematicians in Paris in 1900, in which he listed twenty-three open problems central to mathematics. Translated from the German by M. W. Newson.
HOCHKIRCHEN, T. (1999). Die Axiomatisierung der Wahrscheinlichkeitsrechnung und ihre Kontexte: Von Hilberts sechstem Problem zu Kolmogoroffs Grundbegriffen. Vandenhoeck and Ruprecht, Göttingen.
HOLGATE, P. (1997). Independent functions: Probability and analysis in Poland between the wars. Biometrika 84 161–173.
JEFFREYS, H. (1931). Scientific Inference. Cambridge Univ. Press. Second edition 1957, third 1973.
JESSEN, B. (1930). Über eine Lebesguesche Integrationstheorie für Funktionen unendlich vieler Veränderlichen. In Den Syvende Skandinaviske Matematikerkongress I Oslo 19–22 August 1929 127–138. A. W. Brøggers Boktrykkeri, Oslo.
JESSEN, B. (1935). Some analytical problems relating to probability. J. Math. Phys. Mass. Inst. Tech. 14 24–27.
JOHNSON, N. L. and KOTZ, S., eds. (1997). Leading Personalities in Statistical Sciences. Wiley, New York.
KAHANE, J.-P. (1994). Des séries de Taylor au mouvement brownien, avec un aperçu sur le retour. In Pier (1994b) 415–429.
KAMLAH, A. (1983). Probability as a quasi-theoretical concept – J. V. Kries’ sophisticated account after a century. Erkenntnis 19 239–251.
KEYNES, J. M. (1921). A Treatise on Probability. Macmillan, London.
KHINCHIN, A. YA. (1929). Учение Мизеса о вероятностях и принципы физической статистики (Mises’ work on probability and the principles of statistical physics). Успехи Физических Наук 9 141–166.
KHINCHIN, A. YA. (1961). On the Mises frequentist theory. Вопросы философии (Questions of Philosophy) 15 (1, 2) 91–102, 77–89. Published after Khinchin’s death by B. Gnedenko. English translation in Sheynin (1998) 99–137, reproduced with footnotes by R. Siegmund-Schultze in Science in Context 17 (2004) 391–422. We have seen only this English translation, not the original.
KHINCHIN, A. YA. and KOLMOGOROV, A. N. (1925). Über Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden. Математический Сборник (Sbornik: Mathematics) 32 668–677. Translated into Russian in Kolmogorov (1986) 7–16 and thence into English in Kolmogorov (1992) 1–10.
KNOBLOCH, E. (2001). Emile Borel’s view of probability theory. In Probability Theory: Philosophy, Recent History and Relations to Science (V. F. Hendricks, S. A. Pedersen and K. F. Jørgensen, eds.) 71–95. Kluwer, Dordrecht.
KOLMOGOROV, A. N. (1928). Über die Summen durch den Zufall bestimmter unabhängiger Grössen. Math. Ann. 99 309–319. An addendum appears in 1930: 102 484–488. The article and the addendum are translated into Russian in Kolmogorov (1986) 20–34 and thence into English in Kolmogorov (1992) 15–31.
KOLMOGOROV, A. N. (1929). Общая теория меры и исчисление вероятностей (The General Theory of Measure and the calculus of probability). In Сборник работ Математического Раздела, Коммунистическая Академия, Секция Естественных и Точных Наук (Collected Works of the Mathematical Section, Communist Academy, Section for Natural and Exact Sciences) 1 8–21. The Socialist Academy was founded in Moscow in 1918 and was renamed The Communist Academy in 1923 (Vucinich, 2000). The date 8 January 1927, which appears at the end of the article in the journal, was
omitted when the article was reproduced in the second volume of Kolmogorov’s collected works (Kolmogorov, 1986, 48–58). The English translation (Kolmogorov, 1992, 48–59) modernizes the article’s terminology somewhat: M becomes a “measure” instead of a “measure specification.”
KOLMOGOROV, A. N. (1931). Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. 104 415–458. Dated July 26, 1930. Translated into Russian in Kolmogorov (1986) 60–105 and thence into English in Kolmogorov (1992) 62–108.
KOLMOGOROV, A. N. (1933). Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer, Berlin. A Russian translation by G. M. Bavli appeared under the title Основные понятия теории вероятностей (Nauka, Moscow) in 1936, with a second edition, slightly expanded by Kolmogorov with the assistance of A. N. Shiryaev, in 1974, and a third edition (FAZIS, Moscow) in 1998. An English translation by N. Morrison appeared under the title Foundations of the Theory of Probability (Chelsea, New York) in 1950, with a second edition in 1956.
KOLMOGOROV, A. N. (1935). О некоторых новых течениях в теории вероятностей (On some modern currents in the theory of probability). In Труды 2-го Всесоюзного математического съезда, Ленинград, 24–30 июня 1934 г. (Proceedings of the 2nd All-Union Mathematical Congress, Leningrad, 24–30 June 1934) 1 (Plenary Sessions and Review Talks) 349–358. Издательство АН СССР, Leningrad and Moscow. English translation in Sheynin (2000) 165–173.
KOLMOGOROV, A. N. (1939). Letter to Maurice Fréchet. Fonds Fréchet, Archives de l’Académie des Sciences, Paris.
KOLMOGOROV, A. N. (1948). Евгений Евгеньевич Слуцкий: Некролог (Obituary for Evgeny Evgenievich Slutsky). Успехи математических наук (Russian Mathematical Surveys) 3(4) 142–151. English translation in Sheynin (1998) 77–88, reprinted in Math. Sci. 27 67–74 (2002).
KOLMOGOROV, A. N. (1956). Теория вероятностей (Probability theory). In Математика, ее содержание, методы и значение (A. D. Aleksandrov, A. N. Kolmogorov and M. A. Lavrent’ev, eds.) 2 252–284. Nauka, Moscow. The Russian edition had three volumes. The English translation, Mathematics, Its Content, Methods, and Meaning, was first published in 1962 and 1963 in six volumes by the American Mathematical Society, Providence, RI, and then republished in 1965 in three volumes by the MIT Press, Cambridge, MA. Reprinted by Dover, New York, 1999. Kolmogorov’s chapter occupies pp. 33–71 of Part 4 in the 1963 English edition and pp. 229–264 of Volume 2 in the 1965 English edition.
KOLMOGOROV, A. N. (1986). Избранные труды. Теория вероятностей и математическая статистика. Nauka, Moscow.
KOLMOGOROV, A. N. (1992). Selected Works of A. N. Kolmogorov 2. Probability Theory and Mathematical Statistics. Kluwer, Dordrecht. Translation by G. Lindquist of Kolmogorov (1986).
LAEMMEL, R. (1904). Untersuchungen über die Ermittlung von Wahrscheinlichkeiten. Ph.D. thesis, Universität Zürich. Excerpts reprinted in Schneider (1988) 367–377.
LEBESGUE, H. (1901). Sur une généralisation de l’intégrale définie. C. R. Acad. Sci. Paris 132 1025–1028.
LEBESGUE, H. (1904). Leçons sur l’intégration et la recherche des fonctions primitives. Gauthier-Villars, Paris. Second edition 1928.
LÉVY, P. (1925). Calcul des probabilités. Gauthier-Villars, Paris.
LÉVY, P. (1937). Théorie de l’addition des variables aléatoires. Gauthier-Villars, Paris. Second edition 1954.
LÉVY, P. (1959). Un paradoxe de la théorie des ensembles aléatoires. C. R. Acad. Sci. Paris 248 181–184. Reprinted in Lévy (1973–1980) 6 67–69.
LÉVY, P. (1973–1980). Œuvres de Paul Lévy. Gauthier-Villars, Paris. In six volumes. Edited by D. Dugué.
ŁOMNICKI, A. (1923). Nouveaux fondements du calcul des probabilités (Définition de la probabilité fondée sur la théorie des ensembles). Fund. Math. 4 34–71.
ŁOMNICKI, Z. and ULAM, S. (1934). Sur la théorie de la mesure dans les espaces combinatoires et son application au calcul des probabilités. I. Variables indépendantes. Fund. Math. 23 237–278.
LORENTZ, G. G. (2002). Mathematics and politics in the Soviet Union from 1928 to 1953. J. Approx. Theory 116 169–223.
LOVELAND, J. (2001). Buffon, the certainty of sunrise, and the probabilistic reductio ad absurdum. Arch. Hist. Exact Sci. 55 465–477.
MAC LANE, S. (1995). Mathematics at Göttingen under the Nazis. Notices Amer. Math. Soc. 42 1134–1138.
MAISTROV, L. E. (1974). Probability Theory: A Historical Sketch. Academic Press, New York. Translated and edited by S. Kotz.
MARKOV, A. A. (1900). Исчисление вероятностей (Calculus of Probability). Типография Императорской Академии Наук, St. Petersburg. Second edition 1908, fourth 1924.
MARKOV, A. A. (1912). Wahrscheinlichkeitsrechnung. Teubner, Leipzig. Translation of second edition of Markov (1900). Available at historical.library.cornell.edu.
MARTIN, T. (1996). Probabilités et critique philosophique selon Cournot. Vrin, Paris.
MARTIN, T. (1998). Bibliographie cournotienne. Annales littéraires de l’Université de Franche-Comté, Besançon.
MARTIN, T. (2003). Probabilité et certitude. In Probabilités subjectives et rationalité de l’action (T. Martin, ed.) 119–134. CNRS Éditions, Paris.
MASANI, P. R. (1990). Norbert Wiener, 1894–1964. Birkhäuser, Basel.
MAZLIAK, L. (2003). Andrei Nikolaevitch Kolmogorov (1903–1987). Un aperçu de l’homme et de l’œuvre probabiliste. Prépublication PMA-785, Univ. Paris VI. Available at www.proba.jussieu.fr.
MEINONG, A. (1915). Über Möglichkeit und Wahrscheinlichkeit: Beiträge zur Gegenstandstheorie und Erkenntnistheorie. Barth, Leipzig.
NIKODYM, O. (1930). Sur une généralisation des intégrales de M. J. Radon. Fund. Math. 15 131–179.
ONDAR, KH. O., ed. (1981). The Correspondence Between A. A. Markov and A. A. Chuprov on the Theory of Probability and Mathematical Statistics. Springer, New York. Translated from the Russian by C. M. and M. D. Stein.
ONICESCU, O. (1967). Le livre de G. Castelnuovo Calcolo della probabilità e applicazioni comme aboutissant de la suite des grands livres sur les probabilités. In Simposio Internazionale
di Geometria Algebrica (Roma, 30 Settembre–5 Ottobre 1965) xxxvii–liii. Edizioni Cremonese, Rome.
PIER, J.-P. (1994a). Intégration et mesure 1900–1950. In Pier (1994b) 517–564.
PIER, J.-P., ed. (1994b). Development of Mathematics 1900–1950. Birkhäuser, Basel.
POINCARÉ, H. (1890). Sur le problème des trois corps et les équations de la dynamique. Acta Math. 13 1–271.
POINCARÉ, H. (1896). Calcul des probabilités. Leçons professées pendant le deuxième semestre 1893–1894. Carré, Paris. Available at historical.library.cornell.edu.
POINCARÉ, H. (1912). Calcul des probabilités. Gauthier-Villars, Paris. Second edition of Poincaré (1896).
PORTER, T. (1986). The Rise of Statistical Thinking, 1820–1900. Princeton University Press, Princeton, NJ.
RADEMACHER, H. (1922). Einige Sätze über Reihen von allgemeinen Orthogonalfunktionen. Math. Ann. 87 112–138.
RADON, J. (1913). Theorie und Anwendungen der absolut additiven Mengenfunktionen. Akad. Wiss. Sitzungsber. Kaiserl. Math.-Nat. Kl. 122 1295–1438. Reprinted in his Gesammelte Abhandlungen 1 45–188. Birkhäuser, Basel, 1987.
REICHENBACH, H. (1916). Der Begriff der Wahrscheinlichkeit für die mathematische Darstellung der Wirklichkeit. Barth, Leipzig.
REICHENBACH, H. (1932). Axiomatik der Wahrscheinlichkeitsrechnung. Math. Z. 34 568–619.
ROGERS, L. C. G. and WILLIAMS, D. (2000). Diffusions, Markov Processes, and Martingales. 1. Foundations, reprinted 2nd ed. Cambridge Univ. Press.
SCHNEIDER, I., ed. (1988). Die Entwicklung der Wahrscheinlichkeitstheorie von den Anfängen bis 1933: Einführungen und Texte. Wissenschaftliche Buchgesellschaft, Darmstadt.
SEGAL, I. E. (1992). Norbert Wiener. November 26, 1894–March 18, 1964. Biographical Memoirs 61 388–436. National Academy of Sciences, Washington.
SENETA, E. (1997). Boltzmann, Ludwig Edward. In Johnson and Kotz (1997) 353–354.
SENETA, E. (2004). Mathematics, religion and Marxism in the Soviet Union in the 1930s. Historia Math. 31 337–367.
SHAFER, G. and VOVK, V. (2001). Probability and Finance: It’s Only a Game! Wiley, New York.
SHAFER, G. and VOVK, V. (2005). The origins and legacy of Kolmogorov’s Grundbegriffe. Working Paper No. 4. Available at www.probabilityandfinance.com.
SHEYNIN, O. (1996). Aleksandr A. Chuprov: Life, Work, Correspondence. The Making of Mathematical Statistics. Vandenhoeck and Ruprecht, Göttingen.
SHEYNIN, O., ed. (1998). From Markov to Kolmogorov. Russian papers on probability and statistics. Containing essays of S. N. Bernstein, A. A. Chuprov, B. V. Gnedenko, A. Ya. Khinchin, A. N. Kolmogorov, A. M. Liapunov, A. A. Markov and V. V. Paevsky. Hänsel-Hohenhausen, Egelsbach, Germany. Translations from Russian into English by the editor. Deutsche Hochschulschriften No. 2514. In microfiche.
SHEYNIN, O., ed. (2000). From Daniel Bernoulli to Urlanis. Still more Russian Papers on Probability and Statistics. Hänsel-Hohenhausen, Egelsbach, Germany. Translations from Russian into English by the editor. Deutsche Hochschulschriften No. 2696. In microfiche.
SIEGMUND-SCHULTZE, R. (2004). Mathematicians forced to philosophize: An introduction to Khinchin’s paper on von Mises’ theory of probability. Sci. Context 17 373–390.
SIERPIŃSKI, W. (1918). Sur une définition axiomatique des ensembles mesurables (L). Bull. Internat. Acad. Sci. Cracovie A 173–178. Reprinted in W. Sierpiński (1975). Oeuvres choisies 2 256–260. PWN (Polish Scientific Publishers), Warsaw.
SLUTSKY, E. (1922). К вопросу о логических основах теории вероятности (On the question of the logical foundation of the theory of probability). Вестник статистики (Bulletin of Statistics) 12 13–21.
SLUTSKY, E. (1925). Über stochastische Asymptoten und Grenzwerte. Metron 5 3–89.
STEINHAUS, H. (1923). Les probabilités dénombrables et leur rapport à la théorie de la mesure. Fund. Math. 4 286–310.
STEINHAUS, H. (1930a). Über die Wahrscheinlichkeit dafür, daß der Konvergenzkreis einer Potenzreihe ihre natürliche Grenze ist. Math. Z. 31 408–416. Received by the editors 5 August 1929.
STEINHAUS, H. (1930b). Sur la probabilité de la convergence de séries. Première communication. Studia Math. 2 21–39. Received by the editors 24 October 1929.
STIGLER, S. M. (1973). Simon Newcomb, Percy Daniell, and the history of robust estimation 1885–1920. J. Amer. Statist. Assoc. 68 872–879.
TORNIER, E. (1933). Grundlagen der Wahrscheinlichkeitsrechnung. Acta Math. 60 239–380.
ULAM, S. (1932). Zum Massbegriffe in Produkträumen. In Verhandlung des Internationalen Mathematiker-Kongress Zürich 2 118–119.
VENN, J. (1888). The Logic of Chance, 3rd ed. Macmillan, London and New York. First edition 1866, second 1876.
VILLE, J. (1939). Étude critique de la notion de collectif. Gauthier-Villars, Paris. This differs from Ville’s dissertation, which was defended in March 1939, only in that a 17-page introductory chapter replaces the dissertation’s one-page introduction.
VON BORTKIEWICZ, L. (1901). Anwendungen der Wahrscheinlichkeitsrechnung auf Statistik. In Encyklopädie der Mathematischen Wissenschaften 1 821–851. Teubner, Leipzig.
VON KRIES, J. (1886). Die Principien der Wahrscheinlichkeitsrechnung. Eine logische Untersuchung. Mohr, Freiburg. The second edition, which appeared in 1927, reproduced the first without change and added a new 12-page foreword.
VON MISES, R. (1919). Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. 5 52–99.
VON MISES, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Springer, Vienna. Second edition 1936, third 1951. A posthumous fourth edition, edited by his widow Hilda Geiringer, appeared in 1972. English editions, under the title Probability, Statistics and Truth, appeared in 1939 and 1957.
VON MISES, R. (1931). Wahrscheinlichkeitsrechnung und ihre Anwendung in der Statistik und theoretischen Physik. Deuticke, Leipzig and Vienna.
VON PLATO, J. (1994). Creating Modern Probability: Its Mathematics, Physics, and Philosophy in Historical Perspective. Cambridge Univ. Press.
VUCINICH, A. (2000). Soviet mathematics and dialectics in the Stalin era. Historia Math. 27 54–76.
WALD, A. (1938). Die Widerspruchfreiheit des Kollectivbegriffes. In Actualités Scientifiques et Industrielles 735 79–99.
Hermann, Paris. Titled Les fondements du calcul des probabilités, Number 735 is the second fascicle of Wavre (1938–1939).
WAVRE, R. (1938–1939). Colloque consacré à la théorie des probabilités. Hermann, Paris. This celebrated colloquium, chaired by Maurice Fréchet, was held in October 1937 at the University of Geneva. The proceedings were published by Hermann in eight fascicles in their series Actualités Scientifiques et Industrielles. The first seven fascicles appeared in 1938 as numbers 734 through 740; the eighth, de Finetti’s summary of the colloquium, appeared in 1939 as number 766 (de Finetti, 1939).
WHITTLE, P. (2000). Probability via Expectation, 4th ed. Springer, New York. The first two editions (Penguin, 1970; Wiley, 1976) were titled Probability. The third edition, also by Springer, appeared in 1992.
WIENER, N. (1920). The mean of a functional of arbitrary elements. Ann. of Math. (2) 22 66–72.
WIENER, N. (1921a). The average of an analytical functional. Proc. Natl. Acad. Sci. U.S.A. 7 253–260.
WIENER, N. (1921b). The average of an analytical functional and the Brownian movement. Proc. Natl. Acad. Sci. U.S.A. 7 294–298.
WIENER, N. (1923). Differential-space. J. Math. Phys. Mass. Inst. Tech. 2 131–174.
WIENER, N. (1924). The average value of a functional. Proc. London Math. Soc. 22 454–467.
WIENER, N. (1956). I am a Mathematician. The Later Life of a Prodigy. Doubleday, Garden City, NY.
WIENER, N. (1976–1985). Collected Works with Commentaries. MIT Press, Cambridge, MA. Four volumes. Edited by P. Masani. Volume 1 includes Wiener’s early papers on Brownian motion (Wiener, 1920; Wiener, 1921a; Wiener, 1921b; Wiener, 1923; Wiener, 1924), with a commentary by K. Itô.
WIMAN, A. (1900). Über eine Wahrscheinlichkeitsaufgabe bei Kettenbruchentwicklungen. Öfversigt af Kongliga Svenska Vetenskaps-Akademiens Förhandlingar. Femtiondesjunde Årgången 57 829–841.
WIMAN, A. (1901). Bemerkung über eine von Gyldén aufgeworfene Wahrscheinlichkeitsfrage. Håkan Ohlssons boktrykeri, Lund.