This page begins with million, billion, etc., proceeds through Googolplex
and Skewes' numbers (organised into "classes" based on the height of the
power-tower involved), then moves on through "tetration", the Moser and
the "Graham-Rothschild number", on to lesser-known hierarchies of
recursive functions, the theory of computation, transfinite numbers and
infinities. If it's a number and it's large, it's probably here.
As we have found the need to use large numbers in our lives, various
interesting systems have been proposed. Impress your friends with some of
these!
So we get
n=11 undecillion
Page 1 of 100
n=18 octodecillion
n=25 quinvigintillion
In case your Latin needs a refresher, here’s the table to create the
prefix, as refined by Olivier Miakinen. It works left-to-right, so for
your n, you do the units first, then the tens, then the hundreds.
     units           tens                  hundreds
0    —               —                     —
1    un              (n) deci              (nx) centi
2    duo             (ms) viginti          (n) ducenti
3    tre (s)         (ns) triginta         (ns) trecenti
4    quattuor        (ns) quadraginta      (ns) quadringenti
5    quin            (ns) quinquaginta     (ns) quingenti
6    se (sx)         (n) sexaginta         (n) sescenti
7    septe (mn)      (n) septuaginta       (n) septingenti
8    octo            (mx) octoginta        (mx) octingenti
9    nove (mn)       nonaginta             nongenti
When placed before a component marked (s) or (x), “tre” becomes “tres” and
“se” becomes “ses” or “sex”. Similarly, placed before a component marked
(m) or (n), “septe” and “nove” become “septem” and “novem” or “septen” and
“noven”.
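These assembly rules can be sketched in Python. The code below is an illustration, not an official implementation: it uses "quin" for 5 as this article's examples do (the original Conway-Wechsler system has "quinqua"), and the helper name `zillion_name` is made up for the example.

```python
# Sketch of the prefix assembly described above: units first, then tens,
# then hundreds, with the (s)(x)(m)(n) assimilation rules for tre/se/septe/nove.
UNITS = ["", "un", "duo", "tre", "quattuor", "quin", "se", "septe", "octo", "nove"]
TENS = [("", ""), ("deci", "n"), ("viginti", "ms"), ("triginta", "ns"),
        ("quadraginta", "ns"), ("quinquaginta", "ns"), ("sexaginta", "n"),
        ("septuaginta", "n"), ("octoginta", "mx"), ("nonaginta", "")]
HUNDREDS = [("", ""), ("centi", "nx"), ("ducenti", "n"), ("trecenti", "ns"),
            ("quadringenti", "ns"), ("quingenti", "ns"), ("sescenti", "n"),
            ("septingenti", "n"), ("octingenti", "mx"), ("nongenti", "")]

def zillion_name(n):
    """Name 10^(3n+3) for 10 <= n <= 999 (below 10 the standard names apply)."""
    u, t, h = n % 10, (n // 10) % 10, n // 100
    tens_part, tens_marks = TENS[t]
    hund_part, hund_marks = HUNDREDS[h]
    marks = tens_marks if tens_part else hund_marks  # marks of the next component
    unit = UNITS[u]
    if unit == "tre" and ("s" in marks or "x" in marks):
        unit = "tres"
    elif unit == "se" and "s" in marks:
        unit = "ses"
    elif unit == "se" and "x" in marks:
        unit = "sex"
    elif unit in ("septe", "nove") and "m" in marks:
        unit += "m"
    elif unit in ("septe", "nove") and "n" in marks:
        unit += "n"
    prefix = unit + tens_part + hund_part
    return prefix.rstrip("aeiou") + "illion"  # final vowel is replaced by -illion

print(zillion_name(11), zillion_name(18), zillion_name(25))
```

Running it reproduces the examples above: undecillion, octodecillion, quinvigintillion.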
Large numbers are often seen in terms of money. But how large? Estimates
suggest the total amount of money in the world is around $1 quadrillion:
this is only $10^15. The number of atoms in the observable universe is a
good deal bigger, around 10^80
= 100 quinvigintillion
A googol
10^100 = 10 × 10^(3×32+3)
= 10 duotrigintillion
≈ 93 × 10^(3×51+3)
= 93 unquinquagintillion
The largest known prime is a Mersenne prime, named after Marin Mersenne:
2^74,207,281 − 1 ≈ 3.004…×10^22,338,617
≈ 300×10^(3×7,446,204+3) = ???
Fret not! The system expands further. Run the naming system for each group
of thousands and put an –illi– between them. So,
2^74,207,281 − 1 ≈ 3.004…×10^22,338,617
≈ 300×10^(3×7,446,204+3)
= 300 septillisesquadragintaquadringentilliquattuorducentillion!
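The mantissa and exponent above can be checked with ordinary double-precision logarithms; a minimal sketch (subtracting the 1 cannot change the leading digits of a number this large):

```python
import math

def leading_form(p):
    """Return (mantissa, exponent) with 2**p - 1 ~= mantissa * 10**exponent."""
    log10_value = p * math.log10(2)            # log10 of 2**p
    exponent = int(log10_value)                # integer part: the power of ten
    mantissa = 10 ** (log10_value - exponent)  # fractional part: leading digits
    return mantissa, exponent

m, e = leading_form(74_207_281)
print(f"2^74,207,281 - 1 ≈ {m:.3f}×10^{e:,}")   # ≈ 3.004×10^22,338,617
```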
The number of years you’d have to wait for the universe to regenerate
itself in a similar state to now, if you let the universe repeat its
history arbitrarily many times: 10^10^10^10^10^1.1
OK, what? The big bang theory gives us an age for the universe of 13.8
billion years: that's 4.355×10^17 seconds. Still not bigger than the number
of atoms in the universe, though. But in order to get that incredibly
large number—in fact, probably the largest number you've ever seen—we need
to get a bit physical. This number even dwarfs a googolplex, often said to
be the largest number to have a name.
The word million is thought to have come from pre-13th-century Italian.
As the Romans had no names for numbers larger than 100,000, the Italians
added the augmentative ending -one to the Latin mille for 1000—from which
we also get millennium (1000 years) and mile (1000 paces)—to make it
larger: milione.
Chuquet’s first mention of million, billion, trillion, and so on (starting
top line, in French).
The first mark can signify million, the second mark byllion, the third
mark tryllion, the fourth quadrillion, the fifth quyllion, the sixth
sixlion, the seventh septyllion, the eighth ottyllion, the ninth nonyllion
and so on with others as far as you wish to go.
Chuquet records the usage here as the ‘long scale’: a billion equals a
million million, instead of a thousand million. This usage was common in
British English up until the mid-1970s, when pressure from American
English (‘a billion dollars’) tipped the balance. Most European languages
still use the long scale: ironic, given that the short scale was adopted
in the US from a 17th-century French convention.
The Conway system for naming large numbers expands the current system in a
logical way, but suffers from the possibility of long scale/short scale
confusion, as well as inconsistency between n<10 and n≥10. Russ Rowlett’s
2001 suggestion is to use Greek prefixes to create new, unambiguous
numbers:
10^3 thousand
10^6 million
10^9 gillion
10^12 tetrillion
The Conway system also suffers from a bad starting point, which makes it
difficult to work out from the naming system what, say, 1 million × 1
billion is. (It's a quadrillion.) Donald Knuth (of TeX fame) invented an
exponential system of ‘-yllions’ where a new name is only introduced at
10^2, 10^4, 10^8, 10^16 and so on.
10^1 ten
10^2 hundred
10^4 myriad
10^8 myllion
10^16 byllion
Names for large numbers only catch on when they appear in our daily lives.
In measurement they’re avoided by the use of SI prefixes. Barring any
hyperinflation, the highest we’d expect to see for a while is the world
wealth of $1 quadrillion. For everything else, we can be pretty happy with
standard form. For more large number fun, Robert Munafo has a terrific
(but long) read on names for large numbers on his website.
Contents
Author's Introduction
Class 0 Numbers (like 3)
Class 1 Numbers (like 100)
Class 2 Numbers (like googol)
The -illion Names
Conway-Wechsler Extension
Knuth -yllion System
Class 3 Numbers (like googolplex)
Class 4 Numbers
Skewes' Number
Higher Classes
The Quality of Uncomputably Larger
Power Towers
Inventing New Operators and Functions
Beyond Exponents: hyper4
Hyperfactorial and Superfactorial
Higher hyper Operators
Bowers' Array Notation
Steinhaus-Moser-Ackermann operators
Friedman sequences
The various "Graham's number"s:
The "Graham-Rothschild Number"
The "Graham-Gardner Number"
The "Graham-Conway Number"
Superclasses
Conway's Chained Arrow Notation
A Partial Ordering for short Conway chains
More Bowers Constructions:
Bowers' Extended Operators
Bowers' Array Notation (4-element Subset)
Bowers Arrays with 5 or More Elements
Generalised Invention of Recursive Functions
Formal Grammars
The Lin-Rado/Goucher/Rayo/Wojowu Method
Lin-Rado Busy Beaver Function
Beyond BB Function
Oracle Turing Machines
Declarative Computation and Combinatory Logic
Rayo's Number
BIG FOOT
The Frontier
Transfinite and Infinite Numbers
Ordinal Infinities
The First Cardinal Infinity: Aleph-Null
The Ordinal "Countable" Infinities
Epsilon-Null
All Ordinals Countable by Reordering
Aleph-One
The Continuum
The Continuum Hypothesis
The Power Sets of the Continuum
Inaccessible Infinities
Footnotes
Bibliography and other References
Other Links
Author's Introduction
This page covers all the huge numbers I have seen discussed in books and
web pages, and it actually does so in numerical order, as near as I can
tell (see the uncomparable and superclass 5 discussions).
One important thing to notice is that all discussions like this ultimately
lead to difficult and unsolved problems in the theory of algorithms and
computation. This page ends with Turing machines just before crossing over
to the transfinite numbers. If you want to learn something about the
theory of algorithms and computation, get two or more fairly knowledgeable
people to compete at describing the highest number they can, and then
stand back! One such competition (detailed in a footnote) took only a few
days to move beyond the range of everything discussed in the first two-
thirds of this webpage, and then spent another few years discussing formal
proofs.
Classes
First of all, I'm going to define what I call "classes" of numbers. This
is a somewhat refined and more precise version of the "levels of
perceptual realities" presented by Douglas Hofstadter in a 1982 Scientific
American column [39] (and reprinted in his 1985 book [41]). It is a
powerful and basic concept but usually goes unsaid. I think you'll agree
that the classes make sense and are a useful way to distinguish numbers.
Almost all numbers that are easy to make simple statements about (such as
which of two numbers is larger) can be put into the class system.
All numbers that anyone ever has to deal with in any practical application
(unless you count abstract mathematics and nerdy one-upmanship contests as
practical :-) are members of one of the first four classes. Googol and
googolplex are examples from class 2 and class 3, respectively.
Class-0 Numbers
(the concept of subitising)
Class-0 numbers are those that are small enough to have an immediate
intuitive or perceptual impact. Perceiving such a number is called
subitising, and for most purposes the limit has been shown to be somewhere
from 5 to 9 (see Kaufman [30] and Miller [31]). I'll be a bit conservative
here and place the limit at six. So, the numbers 0 through 6 are class 0.
One way to see this phenomenon for yourself is to use flash cards (or a
computer program set up to simulate flash cards) that present pictures of
objects that can be counted and placed in random arrangements — but look
at the picture only long enough to see it, and not long enough to start
counting. Then, after the picture is hidden, ask how many objects there
were. You then try to count the number of objects in your mental image of
the picture you've just seen. If the number of objects is a class 0
number, you'll usually be able to give the right answer. As you increase
the number of objects, your counts will be less and less likely to be
correct. Obviously, this gives a rather fuzzy definition of "class 0", but
the value you get will almost always be "around" 6.
Class-1 Numbers
Class-1 numbers are those that are small enough to be perceived as a bunch
of objects seen directly by the human eye. What I mean by "seen directly"
is that it is possible to see the number as a set of separate, distinct
objects in a single scene (no time limit, but the observer and the objects
cannot move). 100 is a class-1 number because it is possible to see 100
objects (goats for example) in a single scene. The limit for class-1
numbers is around a million, 1,000,000 or 10^6. You can just barely put
1,000,000 dots on a large piece of paper and stand at a distance such that
you can perceive each individual dot as a distinct dot, and at the same
time be within viewing distance of the other 999,999 dots. (I have
actually done this, just for fun!) As with class 0, the definition is
fuzzy: some people have better vision and could manage 10,000,000 dots or
even more.
Class-1 numbers include all quantities that people can comfortably handle
or perceive. For values in class 1, it is easy to distinguish the
magnitude of the value just by looking at it. Most people have realised
that, if they walk into a room with 85 people, although they can't tell
it's exactly 85, they know right away it's somewhere around 75 to 100. No
thought or calculation is necessary. This is an immediate perception of
magnitude, and the ability extends to numbers up into the thousands and
tens of thousands, but drops off after that. A person in a stadium with
10,000 people will have a fuzzier magnitude perception (they might guess
anywhere from 3,000 to 30,000). By the time we get to numbers like 10^8
(the number of blades of grass in an acre) a person is probably about as
likely to believe "10 million" (10^7) as "a trillion" (10^12) unless they
take the time to do some calculations.
Class-1 numbers also include most types of things that people aggregate or
count with the passage of time. If you have kept count of how many times
you have done something (e.g. jogging) or the number of things in a
collection (e.g. stamps) it probably numbers in the class 1 range. The
actual act of counting usually wears out before exceeding class 1, partly
because of the difficulty of accurately remembering the digits. (While
counting the number of days you have jogged is fairly easy, most people
would not be able to persist in keeping count of how many steps they had
taken once that number gets into 6 or 7 digits!) I tried this myself at
age 9 and reached 35000 before memory became too difficult.
Class-2 Numbers
Class-2 numbers are those that can be represented in exact form using
decimal place-value notation (or another small integer base, like base 2,
16 or 60). Typically this depends on how the digits are recorded and what
you need to do with them. Since I used 6 as the upper limit of class 0,
and 10^6 = 1,000,000 for the upper limit of class 1, I'll just continue
the pattern and say that the class-2 numbers go from 10^6 to about
10^1,000,000.
Place-value notation was popularised in the Arabic culture (but came from
India, and perhaps from China before that, again see [45]). It opened up
the range of class-2 numbers to anyone who wanted to use them. It was no
longer necessary to come up with new symbols for each successive power of
10. Generalizations in arithmetic rules were obvious: adding 2000+7000 was
not only analogous to adding 2+7, it was essentially the same thing.
Handling huge numbers became easy. To make an exact calculation about
thousands of objects, only a handful of objects (the digits) need to be
manipulated.
Googol is a class-2 number, as are the various large prime numbers used in
cryptography, all of the known perfect numbers (until 1997!), the Fermat
numbers with known factorization, etc. All of the large physical constants
like 6.02×10^23 (Avogadro's number) and 10^80 (the number of protons in
the universe) are class-2. So are most of the numbers with names ending in
-illion, like vigintillion (10^63), centillion (10^303), and on up to the
somewhat contrived milli-millillion (10^3,000,003) (which, by my admittedly
arbitrary decision, is a bit beyond the class-2 range).
The word million comes from around 1270, and entered the English language
around 1370. The names billion, trillion, and so on up to nonillion, plus
the general idea of continuing with Latin-derived prefixes all first
appear in the late 15th century, in writing by Nicolas Chuquet, a French
mathematician living in Lyon from 1480 until his death in 1488. (There
were also the longer forms bymillion and trimillion used as early as 1475
by Jehan Adam, but these never caught on). Follow this link for more
details: Origins of the Chuquet number names.
The long scale is Chuquet's original system, and has digits grouped 6 at a
time, thus trillion is a million times larger than billion. This is the
"billion = 10^12 system". Peletier's names for 10^(6N+3) (in the English
spelling, milliard = 10^9, billiard = 10^15, etc.) are compatible with this
system.
The use of number-names during the following few centuries eventually led
to widespread usage of billion to mean 10^9, trillion for 10^12, and
similar redefinitions of the higher names. These definitions are the short
scale or "billion = 10^9 system". Follow this link for more on the history
of short vs. long scale. Here is a related video by Numberphile: How big is
a billion?.
While the confusion between short and long scale was becoming well-
established, the big-number words ending in -illion were also becoming
popular for the purpose of expressing an excessively or unimaginably
large, or even infinite, quantity. This is a type of usage that was
already common for hundreds, thousands, myriads and millions. For example,
OED's [42] HUNDRED heading 2 a. begins: "Often used indefinitely or
hyperbolically for a large number: cf. thousand. (With various
constructions, as in [heading] I.)", and then gives nine quotations dating
from 1300 AD to 1885. In the following table I show the first documented
use of each number-name in both the literal sense and in this
"superlative" sense.
(It should be noted that zillion more generally can refer to far larger
things. For example, Howard DeLong[34] used the term "zillion" to refer to
an iterated Ackermann function of some other really large number c1.[49])
This table shows all positive powers of ten that have authoritatively
accepted names in English (by [42]) up to Chuquet's highest name
nonillion. The numeric values here follow the billion = 10^9 system ("short
scale"). I am also including a few other non-powers of 10 that have names
in English, but leaving out many base-20 constructions and other names
less than 100, about which you can read plenty in [45]. I include all
former and current official SI prefixes because they are quasi-"words"
that have a purely numerical meaning. The dates of first literal and
superlative usage are largely from OED [42] but are augmented as indicated
in the footnotes.
N    Latin     10^(3N+3)   name          first literal   first superlative   SI
                                         usage [42]      usage [42]          prefix(es)
—    —         10^1        ten           —               —                   deca- or deka- (da, dk)
1    —         10^6        million       —               —                   mega- (M)
3    tres      10^12       trillion      1690            1847                Tera- (T)
6    sex       10^21       sextillion    1690            1855                Zetta- (Z)
7    septem    10^24       septillion    1690            ?                   Yotta- or Yotto- (Y)
Chuquet left it to others to work out the details of extending the names
beyond nonillion. Although there is much discrepancy between the actual
number-names in Latin and the -illion names Chuquet listed, it was
nevertheless understood that Latin number-names were to be used to extend
the names as needed. Using Latin for prefixes goes smoothly as far as
vigintillion. The following names are found in many dictionaries;
vigintillion and centillion are a little more common than the others. Some
popular non-dictionary sources have made reference to millillion and
milli-millillion (mostly due to Henkle/Brooks, and Borgmann [33]).
100   centum   10^303   centillion
10^10^100   "googolplex"
The system is based on the short scale (billion = 10^9) but the names could
easily be used in a long scale system. A number name is built out of
pieces representing powers of 10^3, 10^30 and 10^300 as shown by this table:
     units           tens                  hundreds
0    —               —                     —
1    un              (n) deci              (nx) centi
2    duo             (ms) viginti          (n) ducenti
3    tre (s)         (ns) triginta         (ns) trecenti
4    quattuor        (ns) quadraginta      (ns) quadringenti
5    quinqua         (ns) quinquaginta     (ns) quingenti
6    se (sx)         (n) sexaginta         (n) sescenti
7    septe (mn)      (n) septuaginta       (n) septingenti
8    octo            (mx) octoginta        (mx) octingenti
9    nove (mn)       nonaginta             nongenti
- Take the power of 10 you're naming, subtract 3, and divide by 3.
- For a quotient less than 10, use the standard names thousand, million,
billion and so on through nonillion. Otherwise:
- Break the quotient up into 1's, 10's and 100's. Find the appropriate
name segments for each piece in the table. (NOTE: The original Conway-
Wechsler system specifies quinqua for 5, not quin.)
- For the special case of tre, the letter s should be inserted if the
following part is marked with either an s or an x.
Many of the resulting names are only slightly different from one another.
For example
10^2421 is sexoctingentillion.
Then there's
10^903 = trecentillion.
As their example shows, the beginning parts of the standard names such as
million and trillion are used for the "1" and "003" parts (respectively)
of the number 1,000,003, with the placeholder "nilli" for the central
"000" portion. This is the "1,000,003rd zillion", which is
10^(3×1,000,003+3) = 10^3,000,012. In general, when naming 10^(3N+3), the
rules above are to be used for each group of 3 digits in the number N.
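The grouping rule can be sketched like so; for brevity this illustration handles only single-digit groups (using the beginnings of the standard names, with "nilli" for a 000 group), and `zillion_of` is a made-up helper name. Groups from 10 to 999 would be built from the table of units, tens and hundreds instead.

```python
# Beginnings of the standard zillion names for group values 0-9; groups of
# 10..999 would be assembled from the units/tens/hundreds table instead.
GROUP = ["nilli", "milli", "billi", "trilli", "quadrilli",
         "quintilli", "sextilli", "septilli", "octilli", "nonilli"]

def zillion_of(n):
    """Name 10^(3n+3), running the naming system once per 3-digit group of n."""
    groups = [int(g) for g in f"{n:,}".split(",")]  # split n into 3-digit groups
    assert all(g < 10 for g in groups), "simplified: single-digit groups only"
    name = "".join(GROUP[g] for g in groups)
    return name[:-1] + "ion"            # final "...illi" becomes "...illion"

# The "1,000,003rd zillion", 10^(3x1,000,003+3) = 10^3,000,012:
print(zillion_of(1_000_003))   # millinillitrillion
```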
A Practical Alternative
If the above tables seem a bit much to deal with, here is my modest
proposal for a simpler naming system:
2. Beyond that, use "Ten to the power of..." followed by the appropriate
class 1 number.
Donald Knuth created a system that extends much further than the standard
Latin-based system. In the essay Supernatural Numbers[38] he wrote:
So in this system the word "thousand" is not used, and instead everything
up to 9999 is named using the traditional names for numbers up to 99 plus
"hundred", and no comma is used. For example:
127 = One hundred twenty-seven
1000 = Ten hundred
1356 = Thirteen hundred fifty-six
3000 = Thirty hundred
4192 = Forty-one hundred ninety-two
10^4 is called "myriad", a name that originally comes from ancient Greek.
It is written 1,0000 — note that the comma is added to separate the lowest
four digits, not three. Numbers up to 9999,9999 are named like so:
Then 10^16 is called "byllion", and a new punctuation mark is used. Knuth
points out the advantage of avoiding the long scale vs. short scale
confusion. Notice each punctuation mark can be read exactly when it
appears so it's easy to read off these numbers in words:
Each new number name is the square of the previous one — therefore, each
new name allows us to name numbers with twice as many digits. This gives
us a lot more mileage out of each name. Knuth continues borrowing the
traditional names, changing "illion" to "yllion" on each one.
"vigintyllion" ends up being 10^4,194,304, a bit beyond the upper limit of
class-2 numbers.
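Since each new name is the square of the previous one, the n-th yllion works out to 10^(2^(n+2)); a quick sketch:

```python
# Each yllion squares the one before: myriad = 10^4, myllion = myriad^2 = 10^8,
# byllion = myllion^2 = 10^16, so the n-th yllion is 10^(2^(n+2)).
def yllion_exponent(n):
    """Power of ten for the n-th yllion (n=1 myllion, n=2 byllion, ...)."""
    return 2 ** (n + 2)

for n, name in [(1, "myllion"), (2, "byllion"), (3, "tryllion"), (20, "vigintyllion")]:
    print(name, "= 10^", yllion_exponent(n))
```

For n = 20 this gives 10^4,194,304, matching the value quoted above for vigintyllion.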
In the same article [38], Knuth reports that Hsu Yo (living near the end
of the Han dynasty) used the names wan=10^4, i=10^8, chao=10^16 and
ching=10^32 as part of a nomenclature system for large numbers. The names
descended
into the present-day Chinese wàn, yì, zhào and jīng respectively. Usage of
the names zhào and jīng for 10^16 and 10^32 respectively is the "higher
degree system" reported by [45], but this usage did not continue into the
present (see Wikipedia's Chinese numerals article). The ancient usage
corresponds directly to myriad, myllion, byllion and tryllion in Knuth's
system, including the ordering of words to make the names of arbitrarily
large numbers. A specific example showing the recursive grouping, with
Chinese spelling, phonetic pronunciation and translation into more
familiar numeric notation is shown in [45] figure 21.41 (page 278). The
Chinese names continue with gai, which would be 10^64, all the way up to
zài=10^4096 (which is Knuth's decyllion), but usage of the larger ones has
only ever been "theoretical" — no actual usage is known.
As with class-0 and class-1, the limit for class-2 numbers is subjective.
I defined class 2 numbers as those that "can be represented in exact form
using place-value notation", and this depends on where and how the digits
are recorded, which in turn depends on what you want to do with the
number. If you just want to store the exact value of a number and not do
anything with it, you can keep it on a tape or disk, which has much more
capacity — perhaps as much as 10^12 digits. For some simpler algorithms,
such as squaring a number and adding together all the digits of the
result, the limit might be quite large — say a billion digits. For
algorithms involving many intermediate results, lookup tables or auxiliary
data, the limit might be lower — perhaps as few as 1000 or 10000 digits.
So, the limit for class 2 could be anywhere from 10^3 to 10^12 digits,
depending on the desired operation. We'll just continue the pattern and
say that class 2 ends at 1 million digits, i.e. numbers up to 10^1,000,000
or 10^10^6.
Class-3 Numbers
Class-3 numbers are the largest which can effectively be compared to see
if they are of comparable magnitude. For example, the following two
numbers are class-3 (and are at the low end, as class-3 numbers go) :
A = 2^79,641,170,620,168,673,833
B = 3^50,247,984,153,525,417,450
Which is larger?
We cannot compute the exact values of these two numbers and compare
directly — they have way too many digits to store the values on a
computer. That is the nature of class-3 numbers. However, we can represent
both in scientific notation with 10 digits of accuracy. This is
accomplished in much the same way that your computer or a scientific
calculator would do it. Starting with the logarithm of 2 (or 3), multiply
by the exponent, then divide by the logarithm of 10, separating the
integer from the fractional part, and use the fractional part to determine
the first few digits of the answer. In this case we get:
A = 5.0760252191 × 10^23,974,381,246,463,762,439
B = 5.0760252191 × 10^23,974,381,246,463,762,439
Now you begin to see the problem. Using 10 decimal places, both values
seem to be the same. (We know they are not, because one is a power of 2
and must be even, and the other, being a power of 3 is odd). As it turns
out, you need at least 20 decimal places to see that B is slightly larger.
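Python's decimal module can carry the 20-plus digits the comparison needs; a sketch, assuming the two numbers A = 2^79,641,170,620,168,673,833 and B = 3^50,247,984,153,525,417,450 given above:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60   # far more working digits than the ~20 we need

# A = 2^79,641,170,620,168,673,833 and B = 3^50,247,984,153,525,417,450
logA = Decimal(79641170620168673833) * Decimal(2).log10()
logB = Decimal(50247984153525417450) * Decimal(3).log10()

expA, fracA = int(logA), logA % 1   # integer part: the power of ten
expB, fracB = int(logB), logB % 1   # fractional part: log of the mantissa

print(expA == expB)    # same power of ten
print(fracB > fracA)   # B's mantissa is (just barely) the bigger one
```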
Jonathan Bowers, about whom we'll say a lot more further below, has
invented many names for special numbers in this area: myrillion=10^30,003,
micrillion=10^3,000,003, killillion=10^(3×10^3000+3),
megillion=10^(3×10^3,000,000+3), gigillion=10^(3×10^3,000,000,000+3), and
likewise with higher SI prefixes, which he extends, e.g.
tedakillion=10^(3×10^(3×10^42)+3). There are also a few ad-hoc Chuquet
extensions that attempt to reach up into this area.
Class-4 Numbers
Now we move on to Class-4 numbers and higher classes. You may have already
seen a pattern here; we'll just continue the pattern:
class-4 numbers are those numbers that are larger than class-3, and whose
logarithm can be represented as a class-3 number.
C = 2^2^2^83
D = 3^3^3^52
As before we take the logarithm of both, but this time we must do it
twice, and we find
log10(log10(C)) ≈ 2.911×10^24
log10(log10(D)) ≈ 3.083×10^24
so D is larger.
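Taking the logarithm twice leaves only class-2 numbers, so the comparison fits in ordinary floating point; a sketch comparing C = 2^2^2^83 and D = 3^3^3^52 by their double logarithms:

```python
import math

# log10(log10(2^2^2^83)) = 2^83 * log10(2) + log10(log10(2))
loglog_C = 2 ** 83 * math.log10(2) + math.log10(math.log10(2))
# log10(log10(3^3^3^52)) = 3^52 * log10(3) + log10(log10(3))
loglog_D = 3 ** 52 * math.log10(3) + math.log10(math.log10(3))

print(f"{loglog_C:.3e}  {loglog_D:.3e}")   # about 2.911e+24 and 3.083e+24
print(loglog_D > loglog_C)                 # so D is larger
```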
Skewes' Numbers
These numbers occur in the study of prime numbers, and particularly the
frequency of occurrence of prime numbers. Gauss' well-known estimate of
the number of prime numbers less than N is
Li(N) = ∫ dt/ln(t), integrated from 2 to N.
Skewes showed (assuming the Riemann hypothesis) that the actual count must
exceed Li(N) for some N less than 10^10^10^34. Since then, others have
improved the bounds dramatically. Conway and Guy (The Book of Numbers,
page 61) cite the result of Lehman, who in 1966 gave an upper bound of
about 10^1167. According to Eric W. Weisstein and Wikipedia, in 1987 H. J.
J. te Riele reduced the upper bound of the first crossing to e^e^(27/4), a
class 2 number approximately equal to 8.185×10^370. In 2000 Bays and
Hudson found an actual crossover point using numerical techniques — around
1.39822×10^316. Most recently, in 2005 Patrick Demichel found a smaller
crossover point near 1.397162914×10^316. In any case, the original Skewes'
Number is now just an interesting part of history.
class-5 numbers are those numbers that are larger than class-4, and whose
logarithm can be represented as a class-4 number.
class-N numbers are those numbers that are larger than class N-1, and
whose logarithm can be represented as a class N-1 number.
but as it turns out, these higher classes aren't too useful for
representing the large numbers of abstract mathematics. Once we get into
the really big numbers like the ones discussed below, exponents are so
unwieldy that they are no longer used directly — instead faster-growing
functions like the hyper4 function are used.
10^10^10^10^1,000,000 = 10^10^10^10^10^6: a class-5 number X of this size
is indistinguishable from X^2.
For an example of this, imagine A has trillions of digits. If you add some
small number to it, only the last few digits will change — and all of the
digits would have to be stored and examined to tell the difference. On the
other hand, multiplying A by a small number N will change all the digits,
and you can distinguish the difference by comparing the logarithm of A to
that of A×N.
This pattern does not continue with higher operators, because the "class"
system is based on exponents. For example, if A is a class 10 number and K
is class 9 or smaller, it is still easy to distinguish A④K from A, and
hard to distinguish A^K from A.
Notice that this definition depends not only on A and B but also on one's
knowledge and/or ability. As you go to higher and higher operators and
functions it becomes quite difficult to determine which values are larger
than others (I refer to this later in my discussion of superclass 5). It
is easy to see that Skewes' Number is bigger than googolplex, but not
nearly so easy to figure out which of the "Graham-Rothschild number" and
the Moser is bigger.
The "Graham-Rothschild number" and the Moser are defined with different
systems of representation, and the two systems cannot be readily converted
into each other. They would be called uncomparable until the two systems
are studied and a method is developed to show which number is larger. Once
such a method was developed, and it was determined which is larger, they
are no longer "uncomparable".
computably larger (using 10 digits in both mantissa and exponent):
143 > 127
uncomputably larger (using 10 digits) but computably larger (using 20 or
more digits in both mantissa and exponent):
3^50,247,984,153,525,417,450 > 2^79,641,170,620,168,673,833
Power Towers
Problem: Start with the 3-level power tower 2^2^10. Consider two different
ways to make it bigger: increase the bottom-most number, making it H^2^10
where H is something really huge like 1000000, or make the power tower
higher by making it S^2^2^10, where S is something really small like
1.001. Determine which is biggest: the original power tower X = 2^2^10, or
the two altered versions, A = 1000000^2^10 or B = 1.001^2^2^10?
First we show that A and B are both bigger than X. A>X is obvious. For B
it's less obvious. We're comparing:
B = 1.001^2^2^10 ⋛ 2^2^10 = X
B = (2^0.001442)^2^2^10 ⋛ 2^2^10 = X
log2(log2B) is about 1014.56, much bigger than log2(log2X) which is 10. So
B>X.
B' = 2^2^10
B' = 2^1024, so
We could have used any really big number H in place of 1000000 and any
small number S in place of 1.001 and B would still be the biggest, as long
as log_S(H) is less than 2^1014. 2^1014 is about 1.7556×10^305, a class 2
number. To show how extreme this is, let H be a googolplex and let
S be 1+1/googol. Then log_S(H) = ln(H)/ln(S) ≈ (2.3×10^100)/(10^-100) =
2.3×10^200,
still much less than 2^1014. So even with this really huge H and really
small S, the power tower S^2^2^10 is still bigger than H^2^10.
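The whole exercise can be checked by comparing log2(log2(...)) of the three towers; a sketch:

```python
import math

log2 = math.log2

# X = 2^2^10, A = 1000000^2^10, B = 1.001^2^2^10 -- compare log2(log2(.))
llX = log2(2 ** 10)                      # log2(X) = 2^10
llA = log2(2 ** 10 * log2(1_000_000))    # log2(A) = 2^10 * log2(10^6)
llB = 2 ** 10 + log2(log2(1.001))        # log2(B) = 2^1024 * log2(1.001)

print(llX, llA, llB)     # 10.0, about 14.3, about 1014.56
print(llB > llA > llX)   # the heightened tower B wins
```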
1000^1000^1000
and
A similar phenomenon, the power tower paradox, causes two power towers to
be effectively the same if the numbers at the top are the same, even if
the numbers near the bottom are different. For example, 27^10^10^100 is
almost exactly the same as 10^10^10^100
The concept of the "classes" described so far does quite well at handling
everything that can be done with exponents, which are the most powerful
operator known to most people. To proceed further we begin to invent new
operators. This practice of inventing new operators continues over and
over again as you go to higher and higher large numbers. The new operators
overcome the limits of the old operators, limits that are reached as the
old notation becomes unwieldy.
3158 = ((3 × 10 + 1) × 10 + 5) × 10 + 8
When expressing larger numbers, like Avogadro's number and googol, one
usually uses exponents and power towers, as discussed above:
but after a while that becomes unwieldy too. Eventually there are so many
exponents that it cannot be written on a page. Then it becomes a good idea
to invent a new shorthand, which amounts to defining a new operator.
The first new operators used by those seeking large numbers are usually
higher dyadic operators. A dyadic operator is one that has two arguments —
two numbers that it acts on. Usually in notation the operator is placed
between the two numbers.
The most common higher dyadic operators follow the pattern set by the
well-known three (addition, multiplication and exponentiation). These
operators come up a lot in the definitions of large numbers that are to
follow.
operation        representation          absolute definition    inductive definition
addition         a + b or a①b            —                      successor(a + (b-1)) or successor((a-1) + b)
multiplication   a × b or a②b            a + a + ... + a        a + (a②(b-1)) or (a②(b-1)) + a
exponentiation   a^b or a↑b or a③b       a × a × ... × a        a×(a③(b-1)) or (a③(b-1))×a
hyper4           a^^b or a↑↑b or a④b     a^(a^(...^a))          a^(a④(b-1))
Note that for the last operator, there are two ways to interpret the
absolute and inductive definitions, producing different hyper4 operators.
In common practice, the first one is used because the other one can be
reduced to a combination of two exponent operators: a④b = a^(a^(b-1)), and
thus it does not really count as a new operator.
The names tetration, superpower, and superdegree have also been used to
refer to the hyper4 operator. (As a child I used the somewhat misleading
name powerlog for hyper4, as in 2 powerlog 5 is 65536.)
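The inductive definitions translate directly into a recursive function. This is an illustrative sketch using Python's exact integers; the first three operators are delegated to the built-in +, × and ** so the recursion stays shallow:

```python
def hyper(a, n, b):
    """a combined with b by the n-th hyper operator (n=1 add, 2 multiply, ...)."""
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 1:
        return a                    # base case: a(n)1 = a above exponentiation
    return hyper(a, n - 1, hyper(a, n, b - 1))   # a(n)b = a (n-1) (a(n)(b-1))

print(hyper(2, 4, 4))        # 2^^4 = 2^2^2^2 = 65536
print(hyper(3, 4, 2))        # 3^^2 = 3^3 = 27
print(hyper(2, 5, 3))        # 2^^^3 = 2^^(2^^2) = 2^^4 = 65536
```

Even modest arguments overflow any physical computer (hyper(2, 4, 6) already has about 10^19728 digits), which is exactly why these operators are useful for naming large numbers.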
Extension to reals: Now, suppose you want to calculate 2④2.5 or pi④e. The
above definition isn't too useful because the number after the ④ has a
fractional part. What we would need is a way to "extend" the hyper4
operator to real numbers. Unfortunately, this is tough to do in a way that
meets the types of standards mathematicians generally want such things to
have. I also know of no proof that such extension is impossible. A lot of
people have worked on this over the years, and if you're interested, I
suggest you check my notes here, and the Tetration FAQ.
(given: number X, we want to find R such that R④2 = X. Note that R④2 =
RR.)
hyperlog(2) ≈ 0.39
hyperlog(100) ≈ 1.39
hyperlog(10^100) ≈ 2.39
hyperlog(10^10^100) ≈ 3.39
...
The function "below" addition: Some people have also developed a hyper0
function. If you think about it, addition is a shortcut for counting, in
much the same way multiplication is shortcut for addition. The following
definition for a hyper0 function was developed by Constantin Rubtsov:
a⓪b = a (if b = -∞)
a⓪b = b (if a = -∞)
a⓪b = a+2 = b+2 (if a = b)
a⓪b = a+1 (if a > b)
a⓪b = b+1 (if b > a)
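Rubtsov's five cases collapse to a couple of comparisons. A minimal sketch (hyper0 is a hypothetical name):

```python
def hyper0(a, b):
    """Rubtsov's hyper0 ("zeration"), one level below addition,
    with -infinity handled as in the definition above."""
    neg_inf = float("-inf")
    if b == neg_inf:
        return a
    if a == neg_inf:
        return b
    if a == b:
        return a + 2
    return max(a, b) + 1

print(hyper0(3, 3))  # 5
print(hyper0(5, 2))  # 6
```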
N.J.A. Sloane and Simon Plouffe use hyperfactorial to refer to the integer
values of the K-function, a function related to the Riemann Zeta function,
the Gamma function, and others. It is H(n) = 1^1 × 2^2 × 3^3 × ... × n^n.
For example, H(3) = 27×4×1 = 108 and H(5) = 86400000. This function does
not really grow much faster than the normal factorial function.
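The product form is a one-line loop; a sketch matching the examples above (hyperfactorial is a hypothetical name):

```python
def hyperfactorial(n):
    """H(n) = 1^1 × 2^2 × ... × n^n, as in the examples in the text."""
    result = 1
    for k in range(1, n + 1):
        result *= k ** k
    return result

print(hyperfactorial(3))  # 108
print(hyperfactorial(5))  # 86400000
```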
n$ = n!^(n!^(n!^(...^n!)))
where there are n! copies of n! in the tower on the right hand side. Using
the hyper4 operator, n$ is equivalent to:
n$ = n! ④ n!
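Only the very first values of this superfactorial are computable at all; a sketch under that caveat (superfactorial is a hypothetical name):

```python
from math import factorial

def superfactorial(n):
    """Pickover's n$ = n! ④ n!: a tower of n! copies of n!.
    Only n = 1 or 2 give values small enough to compute."""
    f = factorial(n)
    result = 1
    for _ in range(f):
        result = f ** result
    return result

print(superfactorial(2))  # 2! ④ 2! = 2^2 = 4
```

Already 3$ = 6④6 is a tower of six 6's, far beyond any direct computation.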
There are other ways to define a higher version of the factorial, such as
this and this.
To get an idea how big the hyperfactorial of a pretty normal number can
be, read Wayne Baisley's wonderful article "Quantity Has A Quality All Its
Own" (and bring your towel).
More Bowers Names
Jonathan Bowers, mentioned above, has many names covering this area. For
example, in analogy to googol and googolplex he refers to 10④100 as giggol
and 10④(10④100) as giggolplex.
Higher hyper operators
operation    representation               absolute definition    inductive definition
hyper5       a^^^b or a↑↑↑b or a⑤b        a④(a④( ... ④a))        a④(a⑤(b-1))
hyper6       a^^^^b or a↑↑↑↑b or a⑥b      a⑤(a⑤( ... ⑤a))        a⑤(a⑥(b-1))
and so on.
Bowers has several named numbers in this area, including trisept, 7⑦7;
tridecal, 10⑩10; and the aptly named boogol, the frighteningly large
hy(10,100,10) (10 and 10 combined by the hundredth hyper operator).
hy(a,3,b) = a↑b = a^b
hy(a,4,b) = a↑↑b
hy(a,5,b) = a↑↑↑b
hy(a,6,b) = a↑↑↑↑b
etc.
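The hy() function can be transcribed directly from its inductive rule. In this sketch the first three operators are written as base cases, which keeps the recursion shallow enough for small test values (the function name follows the text; everything else is illustrative):

```python
def hy(a, n, b):
    """Generalised hyper operator: hy(a,1,b)=a+b, hy(a,2,b)=a×b, hy(a,3,b)=a^b,
    and for n >= 4 the inductive rule hy(a,n,b) = hy(a, n-1, hy(a, n, b-1))
    with hy(a,n,1) = a."""
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 1:
        return a
    return hy(a, n - 1, hy(a, n, b - 1))

print(hy(3, 3, 4))  # 3^4 = 81
print(hy(2, 4, 3))  # 2↑↑3 = 16
print(hy(2, 5, 3))  # 2↑↑↑3 = 2↑↑4 = 65536
```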
Bowers' Array Notation (3-element Subset)
1. For one- and two-element arrays, just add the elements. [a] = a and
[a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's, remove
them: [a,b,1] = [a,b] = a+b; [a,1,1] = [a].
3. If neither previous rule applies, and the 2nd entry is a 1, remove all
but the first element: [a,1,n] = [a] = a.
4. There is no rule 4 (there will be when we get to bigger arrays).
5. Otherwise replace the array [a,b,n] with [a,[a,b-1,n],n-1], then go
back and repeat the rules to expand it further.
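The five rules can be transcribed almost verbatim (bowers3 is a hypothetical name; only tiny values terminate in practice):

```python
def bowers3(arr):
    """Evaluate a Bowers array of up to 3 entries by the rules above."""
    arr = list(arr)
    if len(arr) <= 2:                     # rule 1
        return sum(arr)
    while arr and arr[-1] == 1:           # rule 2: remove trailing 1's
        arr.pop()
    if len(arr) <= 2:
        return sum(arr)
    a, b, n = arr
    if b == 1:                            # rule 3
        return a
    # rule 5: [a,b,n] = [a, [a,b-1,n], n-1]
    return bowers3([a, bowers3([a, b - 1, n]), n - 1])

print(bowers3([3, 2, 2]))  # 3×2 = 6
print(bowers3([3, 2, 3]))  # 3^2 = 9
print(bowers3([2, 3, 4]))  # 2↑↑3 = 16
```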
With just a little effort you can see that these rules make [a,b,n]
equivalent to hy(a,n,b) except for the special case of n=0. Compare the
formula of rule 5:
[a,b,n] = [a,[a,b-1,n],n-1]
hy(a,n,b) = hy(a,n-1,hy(a,n,b-1))
They are the same except the order of the arguments is different. Bowers
arranges the arguments in order of increasing "growth potential" — the
operator has higher growth potential than b, so it goes last.
So, all 3-element Bowers arrays are equivalent to the normal hyper
operators. [3,2,2] = 3②2 = 3×2 = 6; [3,2,3] = 3③2 = 32, [4,5,6] = 4⑥5,
etc.
a ↑↑ b = hy(a,4,b)
a ↑↑↑ b = hy(a,5,b)
a ↑↑↑↑ b = hy(a,6,b)
(etc.)
Using the hy() function allows for a more compact representation of really
large numbers that would otherwise take a lot of arrows. For example,
hy(10,20,256) is equivalent to 10 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ 256
2^3 is smaller than 3^2
2^4 is the same as 4^2
for any other a and b, if b is greater than a then a^b is greater than b^a.
in general, to compare a^b to c^d we can probably calculate both directly, so
long as all four numbers are class 1.
A pattern emerges: except when a is 2 or when b is 2, the values of a↑↑b
generally follow the rule:
Now let's make a similar list of a↑↑↑b examples, and showing how the a↑↑b
values fit in:
4↑↑↑4 = 4↑↑(4↑↑↑3), a tower of height 4↑↑↑3
.. 5↑↑↑4 through 13↑↑↑4
2↑↑↑6 = 2↑↑(2↑↑↑5), a tower of height 2↑↑↑5
.. 14↑↑↑4 through 7625597484980↑↑↑4 ...
Gödel Numbers
Goodstein sequences
Almost certainly higher than this (but who can say?) are numbers related
to the Goodstein sequence.
(For more detailed descriptions, see the Wiki entry and this page by
Justin Miller)
v^2 + 2
v^2 + 1
v^2
v^2 - 1 = (a-1) × v + (a-1)   where a = value of v at this step
...
(a-1) × v
(a-2) × v + (b-1)   where b = value of v at this step
...
(a-2) × v
(a-3) × v + (c-1)   where c = value of v at this step
...
2 × v
v + (d-1)   where d = value of v at this step
...
v
e - 1   where e = value of v at this step
When higher-level exponents are involved, the series will get longer each
time a higher-level exponent has to be decremented. Each time the series
will become enormously longer, but will still be of finite length.
Therefore, the same principle applies.
Consider the lower Goodstein sequence, and look at just one of the
exponents in that sequence, and call it "c". As we have already shown, "c"
will eventually get decreased to a lower number. Call that number "d"
(which is c minus one). At this point the iteration continues for an even
longer time with no change to "d" or any of the other "exponents", but
eventually as before, the lowest exponent will have to get diminished
again. So in this way we see that each exponent will eventually get
replaced with a lower one. Each step takes massively longer than the
previous step, but all steps are still of a finite length (not an infinite
length) so eventually even the highest exponent will get decreased.
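The "lower" (weak) Goodstein idea — write the value in the current base, reinterpret the same digits in the next base, then subtract 1 — can be sketched directly. This is an illustrative helper (weak_goodstein is a hypothetical name), practical only for tiny starting values, since even a start of 4 takes a very long time to reach 0:

```python
def weak_goodstein(n, max_terms=50):
    """Weak Goodstein sequence starting from n in base 2: bump the base by one
    each step and subtract 1; return the terms until 0 (or a cutoff)."""
    terms, base, value = [n], 2, n
    while value > 0 and len(terms) < max_terms:
        digits, v = [], value
        while v:
            digits.append(v % base)        # digits of value in the current base
            v //= base
        base += 1                          # same digits, next base...
        value = sum(d * base ** i for i, d in enumerate(digits)) - 1
        terms.append(value)                # ...minus one
    return terms

print(weak_goodstein(3))  # [3, 3, 3, 2, 1, 0] -- it does reach 0, as argued above
```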
Ackermann's Function
The example value most commonly cited is ack-rm(3,5) = 2④5, which is 2^65536, a
large class-2 number. Of course, as with Steinhaus-Moser notation it is
easy to transcend the classes entirely.
a1(n) = ack-h(n,n,n)
While it is true that a1(x) grows just as fast as the ack-h() function,
and therefore serves as a good way of defining large numbers as a function
of one variable, actually computing those numbers involves the recursive
definition of the function. If x>1, we have:
a1(x) = ack-h(x,x,x) = ack-h(x-1, x, ack-h(x,x,x-1))
note that the arguments of the two ack-h functions on the right are not
equal to each other, and therefore we can't substitute from the definition
of a1(n) to make the right side be in terms of the a1() function.
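For comparison, the common two-argument Ackermann-Péter function (a cousin of the three-argument ack-h and ack-rm forms discussed here, not identical to them) is easy to transcribe and shows the same explosive growth:

```python
def ack(m, n):
    """Two-argument Ackermann-Peter function. Values beyond ack(3, n)
    are already out of reach of direct recursion."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

print(ack(2, 3))  # 9
print(ack(3, 3))  # 61 = 2^6 - 3
```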
These numbers were constructed by Hugo Steinhaus and Leo Moser (in the
late 1960's or earlier, exact dates unknown) just to show that it is easy
to create a notation for extremely large numbers.
(Both versions of the notation predate 1983; I learned about Moser's
number at an academic library in June 1979.)
sm(2,1,3) = 2^2 = 4
sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256
sm(2,1,4) = sm(2,2,3) = 256
mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3)
     = sm(256^256,255,3) = sm((256^256)^(256^256),254,3)
     = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ...
megiston = sm(10,1,5)
moser = sm(2,1,Mega)
10 let mega = 256
20 for n = 1 to 256
40 let mega = mega ^ mega
80 next n
160 print "Mega = ", mega
320 end
256 = 2^8 = 2^2^3
256^256 = 2^2048 = 2^2^11 ≈ 3.231700607153×10^616
(256^256)^(256^256) = 2^2^2059 ≈ 10^(1.992373902866×10^619)
[(256^256)^(256^256)]^[(256^256)^(256^256)] = 2^2^(2059+2^2059) ≈ 10^10^(1.992373902866×10^619)
Each time through the loop there are twice as many 256's — so there are
2^256 256's in mega. However, the parentheses are grouped differently from
the power towers discussed above. After two times through the loop, for
example, it's (256^256)^(256^256). That is not as big as 256^(256^(256^256)) — the
former is 10^(1.992373902866×10^619), the latter is 10^[10^(7.78271055807×10^616)]. This
discrepancy continues, with the result that the mega is about
10^(10^(10^...^(1.992373902866×10^619)...)), with 255 10's before the
"1.992373902866×10^619" part. For reasons explained here, that is equivalent
to what you get if you replace all but the last few 10's with nearly any
other number between say 2 and 1000. hypercalc's final answer is:
255 PT ( 1.992373902865×10^619 )
which represents a power tower with 255 powers of 10 ("255 P.T.") and
1.992373...×10^619 at the top.
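The bookkeeping hypercalc does can be sketched in a few lines: after the first two passes, every further x = x^x step just stacks one more power of ten under an essentially unchanged top value. This is an illustrative sketch (mega_estimate is a hypothetical name, and it drops correction terms that hypercalc tracks to more digits):

```python
import math

def mega_estimate(loops=256):
    """Estimate the mega loop (x -> x^x done `loops` times, starting at 256).
    Returns (pt, top): the result is roughly a power tower of `pt` tens with
    the number 10^top at the very top."""
    top = math.log10(256.0)        # x = 256 = 10^top
    top = 256.0 * top              # pass 1: log10(x^x) = x*log10(x), about 616.5
    top = top + math.log10(top)    # pass 2: now x = 10^10^top, top about 619.3
    # Passes 3..loops each multiply log10(x) by x, which just stacks one more
    # power of ten under a (to many digits) unchanged top value.
    return loops - 1, top

pt, top = mega_estimate()
print(pt, "PT (", f"{10 ** (top % 1):.4f}e{int(top)}", ")")
```

This reproduces the "255 PT" height and a top value close to 1.99×10^619, agreeing with hypercalc's answer quoted above.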
Friedman Sequences
A_7198(158386) = ack-rm(7198,158386)
This value is less than the "Graham-Rothschild number" and the other
"Graham's number"s (which we'll go into next). However, the N=4 case gives
a result that is immensely bigger than all the versions of Graham's
number. Friedman describes these relations at [47].
The original genuine "Graham's number", from a 1971 paper by Graham and
Rothschild [35], is an upper bound for a problem in Ramsey theory (graph
colouring, combinatorics).
F(m,n) = 2^n                  for m=1, n>=2
F(m,n) = 4                    for m>=1, n=2
F(m,n) = F(m-1, F(m,n-1))     for m>=2, n>=3
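The three cases of F transcribe directly; a sketch that only works for the very smallest arguments (anything larger immediately overflows any computer):

```python
def F(m, n):
    """Graham & Rothschild's F from the 1971 paper, for tiny arguments."""
    if m == 1 and n >= 2:
        return 2 ** n
    if n == 2:
        return 4
    return F(m - 1, F(m, n - 1))

print(F(1, 5))  # 2^5 = 32
print(F(2, 3))  # F(1, F(2,2)) = F(1, 4) = 16
print(F(3, 3))  # 65536
```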
Graham and Gardner's 1971 description of the number
lgn(1) = 2↑↑↑↑↑↑↑↑↑↑↑↑3
lgn(2) = 2↑↑↑...↑↑↑3 (with lgn(1) up-arrows)
lgn(3) = 2↑↑↑...↑↑↑3 (with lgn(2) up-arrows)
. . .
lgn(7) = 2↑↑↑...↑↑↑3 (with lgn(6) up-arrows)
So Gardner and Graham came up with the definition for a larger upper
bound, which was popularised by Martin Gardner in 1977. This came to be
known as "Graham's number", but I call it the "Graham-Gardner number". Its
value is gn(64), where gn() is defined as follows:
The "curly braces" indicate that the number of up-arrows in each "layer"
is counted by the number immediately below, with 4 arrows in the bottom
layer.
Todd Cesere and Tim Chow have both proven that the "Graham-Gardner number"
is bigger than the Moser, and in fact even gn(2) is much bigger than the
Moser. Tim's proof is outlined on a page by Susan Stepney, here.
Conway and Guy's book The Book of Numbers [43] includes a description of a
"Graham's number" which is inexact, and which differs from the other
"Graham's number" definitions above. The exact text from [43] is:
Graham's number is
4↑↑ . . . ↑↑4, where the number of arrows is
4↑↑ . . . ↑↑4, where the number of arrows is
. . . et cetera . . .
(where the number of lines like that is 64)
It lies between 3→3→64→2 and
3→3→65→2.
(The last bit with the right-arrows is using Conway's Chained Arrow
Notation). The problem with this description is that it doesn't tell you
how many up-arrows is in the last of the "lines like that", though we know
there are 64 lines. Sbiis Saibian has a thorough page on Graham's number
[53] and calls this the "Graham-Conway number". He makes an educated guess
that the number of up-arrows on the last 4↑..↑4 line should be the same as
in the Graham-Gardner definition, i.e. 4 up-arrows:
Clearly the "Graham-Conway number" is larger than the others, and it's the
largest "Graham's number" I've seen, notwithstanding xkcd:
ga(g64) = AAUGHHHH!!!
Superclasses
Now let's take a break from the specific examples and functions for a
moment to describe another type of "number class" that is evident in the
way people describe large numbers. We'll call these superclasses.
(1) Let's start with 3↑3. This is 3^3 = 27, a number that is small enough
for anyone to visualise.
(2) Next, consider 3↑↑3 = 3④3. This is 3^(3^3) = 3^27 = 7625597484987, about
7 trillion. Even 7 billion would be too big for anyone to visualise, but
most people can understand it as being comparable to something familiar.
(For example, it is about 1000 times the world population and about a
tenth the number of cells in a human body.)
(3) 3↑↑↑3 or 3⑤3, is 3④(3④3), which is 3④7625597484987. That would be a
power tower of 3's, 7 trillion levels high. Now we are far beyond the
ability to understand — nothing in the real world is this numerous, and
even combinations of things (such as the number of ways to shuffle every
particle in the known universe) only require, at most, 4 or 5 levels of a
power-tower to express. However, even though the quantity is beyond
understanding, the procedure for computing it can be visualised: Start
with n=1. Now replace n with 3^n. Replace n with 3^n again, and so on —
repeat seven trillion times. This is a procedure that one can visualise
doing, and it can even be completed in a reasonable amount of time using a
fast computer and a suitable representation format such as level-index (be
advised: some rounding may occur).
(4) Now consider 3↑↑↑↑3, or 3⑥3, the first step in the "Graham-Gardner
number" definition. This is 3⑤(3⑤3), which is 3④(3④(3④(...④(3④3))...)),
a chain of 7 trillion 3's and hyper4 operators. Every one of these
requires performing the exponent operation an unimaginable number of
times. So now, the procedure is beyond human ability to visualise,
although it can be understood. Start with n=3. Now replace n with an
exponential tower of threes of height n. Repeat 3⑤3 - 2 more times, where
3⑤3 is the huge number whose calculation we visualised in the previous
paragraph! Martin Gardner, also writing about "Graham's number" (the one
here I call the "Graham-Gardner number"), said:
(5) We have just seen four numbers in a sequence: 3↑3, 3↑↑3, 3↑↑↑3, and
3↑↑↑↑3. Now consider the formula for the "Graham-Gardner number". Begin
with x = 3↑↑↑↑3, the number whose calculation procedure cannot even be
visualised — then increase the number of up-arrows from 4 to x. Then
increase the number of up-arrows to this new, larger value of x again.
Then — repeat 61 more times! Here's what Yudkowsky had to say:
Graham's number is far beyond my ability to grasp. I can describe it, but
I cannot properly appreciate it. (Perhaps Graham can appreciate it, having
written a mathematical proof that uses it.) This number is far larger than
most people's conception of infinity. I know that it was larger than mine.
...
Superclass 3: The number cannot be understood, but the procedure for
computing it can be visualised. (Example: 3↑↑↑3)
Notice that the first three (superclass 0, 1 and 2) are roughly equivalent
to class 0, class 1 and class 2 above. The division points between them
will be similar for most readers, but the upper range of superclass 2 will
probably vary from one reader to another. I think I "understand" numbers
about as big as the size of the visible universe, although I find it
harder on some days than on others. There are probably people who have an
understanding of such things as the size of the universe after cosmic
inflation, or the size that would be needed for a spontaneous origin of
life.
The first three superclasses can all be calculated fairly easily, using
familiar techniques or modest towers of exponents — in the latter case
sizes can easily be compared by looking at the number of levels in the
tower and the value of the number at the top. Practical tools like
hypercalc exist to actually work with these numbers.
be followed and understood fairly easily, but it probably isn't too easy
to remember.
I have found my own boundary between 4 and 5 varies widely from one day to
another. Clearly there are those (such as the people who worked on the
numbers we are about to discuss) for whom "superclass 4" extends far
higher than mine. I suggest it might be useful to define:
Here is Conway's notation, adapted from Susan Stepney's excellent web page
on large numbers:
a→b→...→x→1 = a→b→...→x
C. Conway is silent about the meaning of a two-element chain, but a
definition is necessary. The most logical definition combines the two
previous rules and assumes that they work in reverse:
a→b = a↑b = a^b
D. a→b→...→x→1→(z+1) = a→b→...→x
E. The last number in the chain can be reduced in size by 1 by taking the
second-to-last number and replacing it with a copy of the entire chain
with its second-to-last number reduced by 1:
a→b→...→x→(y+1)→(z+1)
= a→b→...→x→ a→b→...→x→y→(z+1)) →z
a→b→...→x→ (y+1) → 2
= a→b→...→x→ (a→b→...→x→y→2) →1
= a→b→...→x→ (a→b→...→x→y→2)
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2) →1 )
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2))
and in general,
a→b→...→x→ (y+1) → 2
= a→b→...→x→ (a→b→...→x→ ( ... (a→b→...→x) ... ))
(with "a→b→...→x" nested y times)
a→b→...→x→2→ (z+1)
= a→b→...→x→ (a→b→...→x→1→(z+1)) →z
= a→b→...→x→ (a→b→...→x ) →z
a→b→...→x→3→ (z+1)
= a→b→...→x→ (a→b→...→x→2→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z
a→b→...→x→4→ (z+1)
= a→b→...→x→ (a→b→...→x→3→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z) →z
and in general,
The parentheses can only be removed after the chain inside the parentheses
has been evaluated into a single number. Remember, the arrows are all
taken together at once. You can't group them to evaluate, you can only
reduce terms off the end using one of the rules above. a→b→c→d is not
equivalent to (a→b→c)→d, a→(b→c→d), a→(b→c)→d, a→b→(c→d) or
anything else like that.
The wikipedia article, Conway chained arrow notation, gives this set of
rules which is equivalent to the above but probably a bit easier to use:
4b. The chain X→a→b for any values of a and b where both are greater than
1, is equivalent to X→(X→(a-1)→b)→(b-1).
4c. By repeating rule 4b we can see that any chain X→a→b with a and b
both larger than 1 eventually turns into a chain X→c where "c" is the
value of the recursive construction X→(X→(...(X→1→1)...)→(b-1))→(b-1),
with the number of nested parentheses depending on a.
Regarding 1's
Any chain starting with 1 eventually ends up as 1→a→b for some a and b,
which is 1↑↑↑...↑↑↑a with b arrows. This is just a power of 1, so any chain
starting with 1 is equal to 1.
Any chain with a 1 in it someplace else, like a→b→c→1→d→e→f, reduces
to a→b→c→1→x for some x, and this then immediately reduces to a→b→c.
So if you have anything with a 1 in it, you can remove the 1 and
everything after it.
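The rules above (including the handling of 1's) make a complete evaluator for tiny chains. A sketch (chain is a hypothetical name; anything beyond the smallest examples fails to terminate in practice):

```python
def chain(c):
    """Evaluate a Conway chained-arrow expression given as a list of integers."""
    c = list(c)
    if 1 in c:                        # a 1 ends the chain: drop it and the rest
        c = c[:c.index(1)]
    if not c:
        return 1
    if len(c) == 1:
        return c[0]
    if len(c) == 2:
        return c[0] ** c[1]           # a→b = a^b
    *x, a, b = c
    # rule 4b:  X→a→b  =  X→( X→(a-1)→b )→(b-1)
    return chain(x + [chain(x + [a - 1, b]), b - 1])

print(chain([2, 3, 2]))  # 2↑↑3 = 16
print(chain([2, 2, 5]))  # any chain starting 2→2 is 4
print(chain([3, 3, 2]))  # 3↑↑3 = 7625597484987
```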
Some Correlations
The rules for Conway chained arrows express them in terms of Knuth's up-
arrow notation and recursion; we can also express some of the other
functions and numbers discussed above in terms of chained arrows:
A(m, n) = ( 2→(n+3)→(m-2) ) - 3
Now start with f^64(1), and note that we can reverse rule 4c above (with
"3→3" as the value of X), and show that:
f^64(27) = 3→3→(3→3→(...3→3→
Since f^64(1) < f^64(4) < f^64(27), the "Graham-Gardner number" must be
between 3→3→64→2 and 3→3→65→2. See here for a somewhat different
discussion of the same thing.
Partial Ordering
One may speculate on the general problem of determining which is the
larger of two chains a→b→...→c and x→y→...→z. We can begin to
answer that question for some of the shorter chains (most of which is
simply a restatement and re-ordering of the examples in my partial
ordering for Knuth up-arrows):
First, as noted above we can remove any 1's and anything coming after a 1.
Also, if the chain starts with 2→2 the whole thing can be replaced with 4.
For any a, the chain a→2→2 = a↑↑2 = a^a. So for these, the one with a
larger a is the larger chain.
If the first two items are the same, as in comparing a→b→c to a→b→d,
both are like a↑↑↑...↑↑↑b but one has more arrows. So if a and b are both
greater than 2, then the chain with the larger last element is larger.
Similarly if the first and last items are the same, as in comparing a→b→d
to a→c→d, we are comparing two things with the same number of arrows
(a↑↑↑...↑↑↑b versus a↑↑↑...↑↑↑c) and clearly the one with the larger middle
number is larger.
2→3→2 = 2↑↑3 = 2↑2↑2 = 2↑4, but 2→2→3 = 2↑↑↑2 = 2↑↑2 = 2↑2 = 4. Since
2→2→anything is 4, it's clear that if we have two 2's, the arrangement
with the larger-than-2 number in the middle is larger. Also, 3→2→2 = 3↑↑2
= 3↑3 = 27, so this is the largest of the three combinations of two 2's and
a 3.
The thing to notice is that the winners have the 4 at the end, and among
them the one with the 3 first is a lot larger. Let's try it again with
bigger numbers all around:
4→3→5 = 4↑↑↑↑↑3
= 4↑↑↑↑(4↑↑↑↑4)
= 4↑↑↑↑(4↑↑↑4↑↑↑4↑↑↑4)
3→4→5 = 3↑↑↑↑↑4
= 3↑↑↑↑(3↑↑↑↑(3↑↑↑↑3))
= 3↑↑↑↑(3↑↑↑↑(3↑↑↑3↑↑↑3))
Here it appears that 3→4→5 is going to end up being larger than 4→3→5.
This is a reflection of the general rule for partial ordering of the hyper
operator described above.
Using this notation makes it easier to make the operator itself a variable
or expression, and unlike using the hy() function it retains the look of
applying an operator (because the operator part is in the middle where it
"belongs"). For example:
a <1+2> b = a <3> b = a^b
gn(1) = 3 <6> 3
gn(2) = 3 <3 <6> 3> 3
Mega ≈ 256 <4> 2
Moser ≈ Mega <Mega+1> 2 ≈ (256 <4> 2) <256 <4> 2> 2
a <<1>> 2 = a <a> a
a <<1>> 3 = a <a <a> a> a
a <<1>> 4 = a <a <a <a> a> a> a
and so on
a <<1>> b is "a expanded to b".
If you wish, you might prefer to represent this operator in a way similar
to the higher hyper operators but with a 1 inside two circles (which I'll
enlarge here for clarity):
a⓵b
Since the two circles are hard to see in normal font sizes, I'll instead
use the hyper1 operator symbol ① inside a set of parentheses: a(①)b.
3(①)2 = 3 <3> 3
gn(1) = 3 <6> 3
3(①)3 = 3 <3 <3> 3> 3 = 3 <27> 3
gn(2) = 3 <3 <6> 3> 3
and so on
3(①)65
gn(64) = "Graham-Gardner number"
3(①)66
a <<2>> 2 = a <<1>> a
a <<2>> 3 = a <<1>> (a <<1>> a)
a <<2>> 4 = a <<1>> (a <<1>> (a <<1>> a))
and so on
a <<2>> b = a <<1>> (a <<2>> (b-1))
a <<2>> b is called "a multiexpanded to b".
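The "expanded" and "multiexpanded" operators can be sketched on top of a hy() helper. All names here are illustrative, and only a = 2 (where every 2 <n> 2 is 4) gives values small enough to actually evaluate:

```python
def hy(a, n, b):
    """Hyper operator: n=1 addition, n=2 multiplication, n=3 exponentiation, ..."""
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 1:
        return a
    return hy(a, n - 1, hy(a, n, b - 1))

def expand(a, b):
    """a <<1>> b: nest "a <a> a" b levels deep, e.g. a <<1>> 3 = a <a <a> a> a."""
    if b == 1:
        return a
    return hy(a, expand(a, b - 1), a)

def multiexpand(a, b):
    """a <<2>> b = a <<1>> (a <<2>> (b-1)), with a <<2>> 1 = a."""
    if b == 1:
        return a
    return expand(a, multiexpand(a, b - 1))

print(expand(3, 2))       # 3 <3> 3 = 3^3 = 27
print(multiexpand(2, 3))  # stays 4: any 2 <n> 2 is 4
```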
Note the similarity of the definition a <<2>> b = a <<1>> (a <<2>> (b-1))
to the corresponding definition for a hyper operator, e.g. a②b = a①(a②(b-
1)).
a <<<1>>> 2 = a <<a>> a
a <<<1>>> 3 = a <<a <<a>> a>> a
a <<<1>>> 4 = a <<a <<a <<a>> a>> a>> a
and so on
a <<<1>>> b is "a exploded to b".
a <<<2>>> 2 = a <<<1>>> a
a <<<2>>> 3 = a <<<1>>> (a <<<1>>> a)
a <<<2>>> 4 = a <<<1>>> (a <<<1>>> (a <<<1>>> a))
and so on
a <<<2>>> b is called "a multiexploded to b".
and so on:
You can see that this generalises easily to a function of four numbers, a,
b, the number inside the angle-brackets and a number telling how many
angle-brackets there are. This can be written as a function, f(a,b,c,d) or
something like that — but Bowers wanted to go further.
All of the operators defined thus far can be expressed as an array with up
to four elements, as follows:
[a] = a
[a,b] = a+b
[a,b,1] = [a,b] = a+b = a①b
[a,b,2] = a×b = a②b
[a,b,3] = a^b = a③b
[a,b,c] = a <c> b = hy(a,c,b)
[a,b,c,1] = a <c> b (combining a and b with the cth operator from the
added, multiplied, exponentiated, ... sequence)
[a,b,c,2] = a <<c>> b (combining a and b with the cth operator from the
expanded, multiexpanded, powerexpanded, ... sequence)
[a,b,c,3] = a <<<c>>> b (combining a and b with the cth operator from
the exploded, multiexploded, powerexploded, ... sequence)
1. For one- and two-element arrays, just add the elements. [a] = a and
[a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's, remove
them: [a,b,1,1] = [a,b]; [a,b,c,1] = [a,b,c], etc.
3. If neither previous rule applies, and the 2nd entry is a 1, remove all
but the first element: [a,1,b,c] = [a] = a.
4. If none of the previous rules applies, and the 3rd entry is a 1:
[a,b,1,c] becomes [a,a,[a,b-1,1,c],c-1]
5. Otherwise all four elements are greater than 1: [a,b,c,d] becomes
[a,[a,b-1,c,d],c-1,d]
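Rules 1 through 5 for four-entry arrays can be transcribed the same way as before (bowers4 is a hypothetical name; almost every array beyond the examples below is uncomputably large):

```python
def bowers4(arr):
    """Evaluate a Bowers array of up to 4 entries by rules 1-5 above."""
    arr = list(arr)
    if len(arr) <= 2:                    # rule 1
        return sum(arr)
    while arr and arr[-1] == 1:          # rule 2: remove trailing 1's
        arr.pop()
    if len(arr) <= 2:
        return sum(arr)
    if arr[1] == 1:                      # rule 3
        return arr[0]
    if len(arr) == 4 and arr[2] == 1:    # rule 4: [a,b,1,c] -> [a,a,[a,b-1,1,c],c-1]
        a, b, _, c = arr
        return bowers4([a, a, bowers4([a, b - 1, 1, c]), c - 1])
    if len(arr) == 3:                    # rule 5, three entries
        a, b, n = arr
        return bowers4([a, bowers4([a, b - 1, n]), n - 1])
    a, b, c, d = arr                     # rule 5, four entries
    return bowers4([a, bowers4([a, b - 1, c, d]), c - 1, d])

print(bowers4([2, 2, 2, 2]))  # anything starting [2,2,... is 4
print(bowers4([3, 2, 2, 1]))  # [3,2,2] = 3×2 = 6
```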
1a. f(s(s(a)),1,1,1) = a
1b. f(s(s(a)),s(s(s(b))),1,1) = f(s(s(s(a))),s(s(b)),1,1)
3. f(s(s(a)),1,s(s(b)),s(s(c))) = a
4. f(s(s(a)),s(s(s(b))),1,s(s(s(c)))) =
f(s(s(a)),s(s(a)),f(s(s(a)),s(s(b)),1,s(s(s(c)))),s(s(c)))
5. f(s(s(a)),s(s(s(b))),s(s(s(c))),s(s(d))) =
f(1,f(1,s(s(b)),s(s(s(c))),s(s(d))),s(s(c)),s(s(d)))
With a little effort you can see that anything starting with [2,2 is equal
to 4. To get anything bigger than 4, you have to have at least one 3. Here
is the simplest example:
3 <<2>> 2 is the same as [3,2,2,2]. In fact, the rules for the 4-element
array notation are equivalent to definitions of the extended operators.
The array [a,b,c,2] is equal to a <<c>> b; [a,b,c,3] is a <<<c>>> b; and
in general [a,b,c,d] is a <<<<<c>>>>> b with d sets of brackets around the
c.
Of course, Bowers wanted to extend the system, so the rules were designed
to work with arrays of arbitrary length. This is done by changing rules 4
and 5 to the following:
4. If none of the previous rules applies, and the 3rd entry is a 1: Define
the variables a,b,S,d and R so that the array is [a,b,S,1,d,R] where a,b
are the first two elements, [S,1] is the string of 1 or more 1's; d is the
first element bigger than 1 and [R] is the remaining part of the array.
Replace the array with [a,a,S',[a,b-1,S,1,d,R],d-1,R] where [S'] is a
string of a's of equal length as string [S].
I am fairly well convinced that Bowers is right in stating that the value
represented by the 5-element array [n,n,n,n,n] is at least as large as
n→n→n→...→n in the Conway chained-arrow notation, where there are n
items in the chain.
Formal Grammars
If the foregoing makes little sense, consider this concrete (but somewhat
non-rigorous) example. Select any well defined, "sufficiently powerful"
grammar G, consisting of a symbol-set of S symbols, finite in number, and
well-defined rules of what constitutes a syntactically valid string of
symbols specifying an integer. An example grammar that should be fairly
familiar uses the symbols:
0 1 2 3 4 5 6 7 8 9 + * ^ ( )
and the rules that these symbols are to be strung together to make a legal
set of additions, multiplications and exponentiations yielding an integer
result; in this example S = 15 because we listed fifteen symbols. Just to
be unambiguous, we'll require parentheses whenever two or more operators
appear in a string.
Since N is always larger than X + 2 we can define our new grammar G' just
by adding the symbol h, used as:
a h b
where a and b are valid strings, and interpreted as a④b. The resulting
function grows faster than G's m(x) function. In this new grammar, which we now
call G':
m'(3) is 9④9
m'(7) is 9④(9④9)
m'(11) is 9④(9④(9④9))
Now the process could continue to grammar G'' and so on. If you continue
the same idea indefinitely you just get higher hyper operators, but you
could also define new symbols using the ideas given above — the Ackermann
function, the Conway chained-arrow notation, etc. At each stage you have a
grammar Gx with its maximal function mx(n) to which the same idea can be
applied to generate another bigger function.
Perhaps the first and best-known such formulation is the busy beaver
problem. It achieves truly staggeringly huge numbers with the least
amount of symbols, rules, and input data of anything we've seen so
far; simply put, it's hard to beat. We'll see why soon, when we get to the
next table of large numbers.
The Busy Beaver Function was originally defined by Tibor Rado at Ohio
State in 1962. It is defined by specifying that you must start with a
blank tape (all 0's), with a finite number of symbols per position on the
tape (we usually use two: 0 and 1), and you're limited to N states in the
state machine. What is the greatest number of marks (1's) you can have it
write on the tape before stopping? A set of rules that never stops doesn't
count. The maximum number of marks for N states is BB(N). This is a
well-defined function and it grows very, very fast.
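For the very smallest N the search can actually be done by brute force. A sketch (busy_beaver is a hypothetical name): a step cap stands in for a genuine non-halting proof, which is fine for N ≤ 2 but not in general, since halting is undecidable:

```python
from itertools import product

def busy_beaver(n, step_limit=1000):
    """Brute-force BB(n) for tiny n: run every n-state, 2-symbol Turing
    machine and count the most 1's left on the tape by any machine that
    halts within the step cap."""
    best = 0
    # A transition is (symbol to write, move -1/+1, next state); state n = halt.
    options = list(product((0, 1), (-1, 1), range(n + 1)))
    for table in product(options, repeat=2 * n):
        tape, pos, state = {}, 0, 0
        for _ in range(step_limit):
            if state == n:                       # halted: count the 1's
                best = max(best, sum(tape.values()))
                break
            write, move, state = table[2 * state + tape.get(pos, 0)]
            tape[pos] = write
            pos += move
        # machines still running at the cap are treated as non-halting
    return best

print(busy_beaver(1))      # 1
print(busy_beaver(2, 50))  # 4
```

Note the machine count for N=1 is (4·1+4)^(2·1) = 64, matching the formula quoted below.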
In this table, the column labeled "Machines" tells how many Turing
machines of N states exist; this is (4N+4)^(2N) (the number that actually
have to be checked is lower). The column labeled "steps" shows how many
steps are taken by the current record-holder before halting. Here are some
older record setters and a more detailed description of the difficulty of
the problem. A good page for recent information about the problem is
Marxen's own page.
BB(N) is not "computable" in the formal sense — you cannot predict how long
it might take to count the number of 1's written by all Turing machines
with N states for arbitrary values of N. But for specific small values of
N, it is possible to do a brute-force search, with human assistance to
examine all the "non-halting" candidates and equip your program with
pattern-matching techniques to identify these as non-halting.
However, this takes massively greater amounts of work for each higher
value of N, and so the Busy Beaver function is unwieldy to calculate.
No-one has been able to complete the brute-force search for any value of N
greater than 4.
So the Busy Beaver function is not actually a good way to calculate big
numbers — for example, 10^10^27 isn't nearly as big as BB(27), but it's
bigger than any BB(N) value we've been able to calculate, and it can be
calculated much more quickly and easily.
The only way in which the Busy Beaver function "grows fastest" is when you
look at it in terms of the function's value compared to the amount of
information required to specify the formal system, the function, and the
function's parameter(s). This is a highly abstract concept and shouldn't
be considered important unless you are studying the theory of
deterministic algorithms specified within formal systems. To understand
this, you can imagine defining a precise set of rules for manipulating
symbols, which define all of the functions above (starting with addition
and going up through chained arrow notation, iteratively defining new
functions, and so on). Each new rule, symbol and function would take a bit
more information to define completely. If you wrote a computer program to
compute each function, each program would be a bit more complex. You could
also do the same thing by starting with a definition of the rules of the
Turing machine, then start with 1-state Turing machines and then increase
the number of states by adding a few extra bits of information per state.
It is generally believed that, as the amount of information used gets
higher, the Turing machine based system will produce higher function
values than any other formal system.
The Busy Beaver function, and anything we'll discuss after it, by
necessity must go beyond functions, algorithms and computability. Imagine
any sufficiently general definition of formalism (such as the Turing
machine) and then define a function f(N) giving the maximum value of the
results of its computation in terms of N, a suitably formalised
specification of the amount of information used to define the formal
system and the algorithm. f(N) will have a finite value for any finite N
and can be said to grow at a certain rate. Because all f(N) are finite for
all finite N, there exists a g() such that g(N) is greater than f(N) for
all N.
At this point in the discussion (or usually sooner) it becomes apparent
that there is additional knowledge and assumptions "outside the system".
An effort is made to identify these, define them precisely and add them
into the quantity N. After doing this, it is soon discovered that the
resulting formal system itself depends on things outside itself, and so
on. I have encountered many expositions, discussion threads, etc. over the
years, that begin with an optimistic determination to formalise the
problem and quantify exactly how large numbers can be derived from first
principles; they all have ended up somewhere in this jungle of
abstraction. Here is a relevant quote:
One can of course imagine Turing machines that have two or more oracle
functions, or a single oracle function that answers questions about
another type of oracle machine. If a "first order oracle machine" is a
Turing machine with an oracle that computes the Busy Beaver function for
normal Turing machines, then a "second order oracle machine" has an oracle
that computes the Busy Beaver function for first order oracle machines,
and so on.
Nabutovsky and Weinberger have shown[48] that group theory can be used to
define functions that grow as quickly as the Busy Beaver function of a
second-order oracle Turing machine.
Turing's work was closely related to, and produced largely similar results
to, that of Alonzo Church. The Church encoding can be used to represent
data and computation operations in the lambda calculus, and computation
occurs by beta-reducing assertions into results.
The lambda calculus is more powerful because (among other reasons) it
allows results to be expressed without the need to figure out how those
results might actually be accomplished. For this reason, in practical
computing problems it is more powerful; that is why so much good work has
been able to be done in LISP and Haskell, for example.
As I'll mention later ("by whichever specific axioms and means..."), there
are multiple approaches to this type of work. We'll eventually get to Peano
arithmetic, set-theory and first-order formal logic. These approaches
might be avoided for various reasons, including Gödelian incompleteness or
because they simply aren't needed for constructing the desired result.
In the case of large numbers like those we've managed so far, we need only
a three-symbol combinatory logic calculus combined with a simple (first-
order) oracle for its own version of the Collatz conjecture's halting
problem. This three-symbol calculus uses symbols I (identity), K
(constant) and S (substitution) on parenthesised expressions that are
equivalent to binary trees, i.e. every pair of parentheses contains two
entities which may either be a single symbol or a pair of parentheses that
itself contains two entities. The SKI combinator calculus or "SKI
calculus" is equivalent to the more commonly-known lambda calculus of
Alonzo Church.
Any study of lambda calculus defines symbols for needed terms, operations,
functions, etc. as it goes along (see the Wikipedia lambda calculus article for
examples). The SKI calculus might seem simpler in that we're just sticking
to these symbols and the parentheses, but it is equally powerful. In
particular, S, K, and I are just three of the commonly-abbreviated
combinators of the lambda calculus:
I := λx.x
K := λx.λy.x
S := λx.λy.λz.x z (y z)
The SKI calculus is close to being the simplest that is needed to provide
all of the power of the lambda calculus (in fact, only S and K are needed,
because ((SK)K) is equivalent to I). Anything in the SKI calculus can be
converted to an equivalent form in lambda calculus, and vice-versa [26].
Therefore (by the Church-Turing thesis), the SKI calculus itself is just
as powerful, from a theory of computation viewpoint, as Turing machines:
for every Turing machine, there is a parenthesised expression that is
valid in the SKI calculus and, when beta-reduced, produces a result that is
analogous to the final state (including tape) of the Turing machine.
Ix => x
Kxy => x
Sxyz => xz(yz)
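These three rules can be animated with a short program. The following is my own illustrative sketch (not from the text), performing leftmost-outermost SKI reduction in Python, with an application (fx) represented as the tuple (f, x):

```python
# Minimal SKI reducer (illustrative sketch). An expression is either one
# of the atoms 'S', 'K', 'I', or a tuple (f, x) meaning "apply f to x".

def step(e):
    """Do one leftmost-outermost reduction; return (new_expr, changed)."""
    head, args = e, []
    while isinstance(head, tuple):       # unwind the left spine
        head, arg = head
        args.append(arg)
    args.reverse()
    if head == 'I' and len(args) >= 1:   # Ix => x
        return rebuild(args[0], args[1:]), True
    if head == 'K' and len(args) >= 2:   # Kxy => x
        return rebuild(args[0], args[2:]), True
    if head == 'S' and len(args) >= 3:   # Sxyz => xz(yz)
        x, y, z = args[:3]
        return rebuild(((x, z), (y, z)), args[3:]), True
    for i, a in enumerate(args):         # no head redex: reduce an argument
        a2, changed = step(a)
        if changed:
            args[i] = a2
            return rebuild(head, args), True
    return e, False

def rebuild(head, args):
    """Re-apply a list of arguments to a head, left to right."""
    for a in args:
        head = (head, a)
    return head

def normalise(e, limit=1000):
    """Reduce to normal form, or give up after `limit` steps (a crude
    stand-in for the halting function h, which no program can compute)."""
    for _ in range(limit):
        e, changed = step(e)
        if not changed:
            return e
    return None

# (((SK)S)((KI)S)) reduces to the single symbol I:
print(normalise(((('S', 'K'), 'S'), (('K', 'I'), 'S'))))  # -> I
```

Note the step limit: deciding whether reduction truly halts is exactly what the oracle, introduced below, is for.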
As you might imagine, then, SKI calculus has its versions of the "halting
problem" and the "busy beaver function".
Alternatively (for the oracle that we'll get to soon), there is the
function O(S), which tells whether beta-reduction produces just the single
symbol "I". Three examples:

(((SK)S)((KI)S)) :
  S.K.S.((KI)S) → ((K((KI)S))(S((KI)S)))
  K.((KI)S).(S((KI)S)) → ((KI)S)
  K.I.S → I
h(S) = true, O(S) = true

(((SS)S)(SI)) :
  S.S.S.(SI) → ((S(SI))(S(SI)))
h(S) = true, O(S) = false
(length 6 symbols after 1 step; the maximum length possible from a
5-symbol start)

((((SS)S)(SI))(SI)) :
  S.S.S.(SI) → (((S(SI))(S(SI)))(SI))
  S.(SI).(S(SI)).(SI) → (((SI)(SI))((S(SI))(SI)))
  S.I.(SI).((S(SI))(SI)) → ((I((S(SI))(SI)))((SI)((S(SI))(SI))))
  I.((S(SI))(SI)) → (((S(SI))(SI))((SI)((S(SI))(SI))))
  S.(SI).(SI).((SI)((S(SI))(SI))) →
    (((SI)((SI)((S(SI))(SI))))((SI)((SI)((S(SI))(SI)))))
  S.I.((SI)((S(SI))(SI))).((SI)((SI)((S(SI))(SI)))) →
    ((I((SI)((SI)((S(SI))(SI)))))(((SI)((S(SI))(SI)))((SI)((SI)((S(SI))(SI))))))
  (etc.)
h(S) = false, O(S) = false (grows forever at an exponential rate)
For the "busy beaver function" of SKI calculus, we define the length of a
string in the obvious way:
l(S) := the number of symbols (I, K and S) in the string S
and the "output" of a string is the length of the result of beta-reducing
it:
o(S) := l(the normal form that S beta-reduces to)
bb(n) := the largest value o(S) of any string S for which l(S)=n and h(S)
is true
Some time after Rayo's Number came along (we'll get to it later), Adam
Goucher [54] was attempting to define a large number in a way like that of
the Lin-Rado Busy Beaver function. He recognised that this combinatory
logic system was only equal in power to Turing machines, and that its bb(n)
would grow comparably to the Lin-Rado BB(n). To make his definition bigger, he
added the oracle symbol O, which indeed makes for a faster-growing bb(n)
function. His description defines what is essentially first-order
arithmetic, i.e. a system like that coming out of the Peano axioms above,
along with a rich set of predefined things like operators + and ×, an
infinite alphabet of variables, etc. As such, it looks a lot like
Hofstadter's Typographical Number Theory [36]. Its computational
capabilities are equivalent to normal Turing machines and the SKI
calculus.
Adding the oracle operator O brings the system up to the level of second-
order arithmetic. Goucher then defined the Xi function Ξ(n):
Ξ(n) = the largest value o(S) of any string S for which l(S)=n and h(S) is
true
This is the same as bb(n) above, but with S allowed to include O symbols
in addition to S, K, and I.
Finally we'll spend some time on the third popular way to formulate
systems that express large numbers. This is closely related to formal
logic (ostensibly still part of the school of philosophy) and its use to
establish the foundations of mathematics. This was the approach of Frege,
Russell and Whitehead, and Gödel.
Rayo's number, which we'll get to eventually, appears to use a popular and
well-studied "structure": a set of objects similar to the von Neumann
universe; with finitary operations and relations common in set theory:
negation (~), conjunction (∧), set-membership (∈), equality (=), and
existential quantification (∃). Other well-known operations can be defined
in terms of these, possibly by implicit definitions, examples of which
follow.
Peano Arithmetic
In the late 1800s, Giuseppe Peano presented the Peano axioms, which can be
used to formalise arithmetic. The original Peano axioms comprise:
0 is a natural number.
For any two things a and b, if a is a natural number and a=b, then b
is a natural number (natural numbers are closed under the equality
relation.)
For every natural number n, there exists a natural number m such that
m=S(n).
For any two natural numbers m and n, m=n if and only if S(m)=S(n)
(the successor function is an injective function.)
For every natural number n, S(n)=0 is false: there is no natural number
whose successor is 0 (and so the successor function is not a bijection.)
{ 0, 1, 2, 3, 4, 5, 6, 7, .... , A, B, C }
would satisfy the first eight axioms if S(x) were defined such that:
the three elements A, B, and C form a cycle that is "disjoint" from the
other numbers starting with 0. If you start within this cycle, S(x) is
always defined and unique, as it is if you start anywhere in the normal
numbers, but there is no way to get from one to the other or back through
repeated application of the successor function. This type of situation is
eliminated by including the Axiom of Induction.
Regardless of how induction is handled, the Peano axioms can then be used
along with suitable definitions of addition and multiplication, and the
total ordering that comes from whichever type of induction axiom(s) were
used, to develop a theory of arithmetic in which it is possible to prove
such things as the fundamental theorem of arithmetic (that every natural
number except for 0 and S(0) has a unique factorisation into prime
factors; for outlines, see [51] or [55]); there is a fairly thorough
discussion of this sort of thing in [36].
As stated so far, we imagined an element "0", and elements "1", "2", "3",
etc. without establishing our right to add these symbols to the basic
symbols of set theory. Set theory itself includes only the null set ∅ = {}
and the ability to construct or define sets that include other sets. This
turns out to be enough, if we define the first natural number 0 as being
the null set ∅, and the successor function S(a) as a ∪ {a}, the union of a
with the set containing a as its only element. The succession of natural
numbers becomes:
0 = ∅ = {} ;
1 = ∅ ∪ {∅} = {∅} ;
2 = {∅} ∪ {{∅}} = {∅, {∅}} ;
3 = {∅, {∅}} ∪ {{∅, {∅}}} = {∅, {∅}, {∅, {∅}}} ;
4 = {∅, {∅}, {∅, {∅}}} ∪ {{∅, {∅}, {∅, {∅}}}} = {∅, {∅}, {∅, {∅}}, {∅,
{∅}, {∅, {∅}}}} ;
etc.
ℕ = {0, 1, 2, 3, 4, ...}
a < b ↔ a ∈ b
It is also useful to have a set of all natural numbers so that we can use
"a ∈ ℕ" to assert that a is a natural number (and not some other thing,
like an ordered pair, which will be needed later).
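This construction is easy to play with in code. Here is a small sketch of my own in Python, using frozenset so that sets can be elements of other sets:

```python
# Von Neumann naturals: 0 = {} and S(a) = a ∪ {a}.

def succ(a):
    """The successor S(a) = a ∪ {a}."""
    return a | frozenset([a])

zero = frozenset()        # 0 = {}
one = succ(zero)          # 1 = {0}
two = succ(one)           # 2 = {0, 1}
three = succ(two)         # 3 = {0, 1, 2}

# The numeral n has exactly n elements, and a < b is just a ∈ b:
assert len(three) == 3
assert one in three and two in three     # 1 < 3 and 2 < 3
assert three not in one                  # not (3 < 1)
print("von Neumann naturals behave as advertised")
```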
p is an element of q, and
q is an element of n
We have "and" (∧) and "not" (~); from these we get "or" in the standard
way. If p and q are predicates, then "p∨q" can be expressed by
"~((~p)∧(~q))"; or more formally:
p∨q := ~((~p)∧(~q))
p→q := ~((~q)∧p)
p↔q := (p∧q)∨((~p)∧(~q))
∀x(p(x)) := ~(∃y(~(p(y))))
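The propositional definitions can be checked by brute force over truth values; this short Python sketch (my own, for illustration) confirms each against the usual truth table:

```python
# The derived connectives, built only from ∧ and ~, as defined above.

def OR(p, q):   return not ((not p) and (not q))    # p∨q := ~((~p)∧(~q))
def IMP(p, q):  return not ((not q) and p)          # p→q := ~((~q)∧p)
def IFF(p, q):  return (p and q) or ((not p) and (not q))

for p in (False, True):
    for q in (False, True):
        assert OR(p, q) == (p or q)
        assert IMP(p, q) == ((not p) or q)
        assert IFF(p, q) == (p == q)
print("all four truth-table rows agree for each connective")
```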
If a and b are sets, then their union c=a∪b can be defined, and its
existence asserted, with:
"There exists (a set) C such that for all d, the statement 'd is an
element of C' is true if and only if d is an element of A or of B."
We are given no symbol for the null set, but we can define it and assert
its existence:
∃ ∅ ( ∀ a ( ~(a∈∅) ) )
"There exists (a set) ∅ such that for all a, a is not an element of ∅."
Forming Predicates
By whichever specific axioms and means, the methods of set theory and
formal logic are used to define more predicates, functions, and relations
on the natural numbers. We've already seen a few that are basic to set
theory: the equality and element relations, the successor function, the
predicate φ indicating membership in a certain set. These naturally give
us very basic number-theory operations of equality, the successor
function, and the ordering/comparison operator.
Not So Fast!
Peano arithmetic, in the forms that use the induction axiom of the second
order (i.e. the single axiom covering all possible predicates φ) might be
avoided for various reasons, including Gödelian incompleteness or because
they simply aren't needed for constructing the desired result.
Rayo's Calculus
To really surmount the Busy Beaver function, we'll go to the winner of the
rather colourfully-advertised Big Number Duel at MIT in 2007.
Formulas
a ∈ b (set-membership)
a = b (equality)
~ a (negation)
a ∧ b (conjunction)
∃ a : b (existential quantification)
When we get to Rayo's number there will be the concept of "being able to
name a number m in a certain number n of symbols".
All numbers are nameable: at the very least, one can assert that the
number is equal to one of the von Neumann cardinals like 1={∅}; this
assertion is done by the rather awkward construction:
1 exists ↔ ∃x2: (
(~∃x3: (x3∈x2)) ∧ "x2={}"
(x2∈x1) ∧ "x2<x1"
(~∃x3: (x3∈x1 ∧ (~x3=x2))) "there is no x3 in x1 other than x2"
)
The only way for that to be true is for x1 to have the value "{∅}"=1: the
first part is a sub-assertion forcing x2 to be "{}"=0 ; the next part says
that x1 is bigger than x2 (so must be 1 or higher); and the third part says
that x1 is not bigger than 1.
(I'll note here that it must seem unusual to some readers that it takes so
many symbols to assert the fact that the number 1 exists. But far more
symbols can be used, as seen in Whitehead and Russell's programme of
metamathematics "Principia Mathematica".)
The fact that all numbers are namable this way is of little use; we're
trying to get a specific, well-defined large number. We'll use a Rayo-like
approach, and define a predicate that says "a certain number is nameable
in an assertion of this type, with a limited number of symbols":
ST-namable-in(m, n) ↔
∃Φ(x1): {
Φ has fewer than n symbols ∧
∃s: s = Assign(m, x1) ∧
(∀t: Sat([Φ(x1)],t) → t = Assign(m, x1))
}
(It turns true at 37 because the definition of ST-namable-in(m, n)
requires that we can do it in "fewer than n symbols").
We can say the same thing in a different way by changing the first
argument:
Using the type of "existence of n" formula just shown, where we invoke
only the existence of zero and the successor function, is very
inefficient. With each successive assertion of "c is less than b but not
enough that another number d can get inbetween" we need more
subexpressions than we did the last time. To assert the existence of the
number n requires (9n^2+43n+20)/2 symbols. If a googol symbols were allowed,
the largest number we could assert would be about 4.714×10^49.
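That estimate can be reproduced with a few lines of integer arithmetic; this is my own check, using the symbol-count formula above:

```python
# Largest n whose existence-assertion fits in a googol symbols, given
# that asserting "n exists" takes (9n^2 + 43n + 20)/2 symbols.
from math import isqrt

googol = 10**100

def symbols_needed(n):
    return (9*n*n + 43*n + 20) // 2

n = isqrt(2 * googol // 9)              # first guess: sqrt(2·googol/9)
while symbols_needed(n + 1) <= googol:  # nudge up if we undershot
    n += 1
while symbols_needed(n) > googol:       # nudge down if we overshot
    n -= 1
print(n)  # about 4.714×10^49
```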
But the subset of set theory and formal logic that Rayo chose can do more
than just string together a bunch of a∈b relations. It is, in fact "Gödel-
complete" in the sense that an entire Peano arithmetic can be built upon
it.
Since we're using set theory, functions and relations can be expressed as
(infinite) sets, and set-membership asserts a relation. For example, the
set P (for "Plus") would be the set of all valid addition relations,
consisting of ordered triples; and it might start out: Plus = { (0,0,0),
(0,1,1), (1,0,1), (0,2,2), (1,1,2), (2,0,2), (0,3,3), ...} where each of
those digits is a von Neumann cardinal like 2={{},{{}}}. We don't have to
spend an infinite number of symbols to define P explicitly; we could "just"
say:
b1+1=a1 ∧ b2=a2, or
b1=a1 ∧ b2+1=a2
It's unwieldy, but it works (and again, it's very close to how addition is
built up in the Gödel construction for demonstrating incompleteness.) We
used things like "for all" (∀) and "if and only if" (↔) that are not in
the allowed 7 symbols, but that's okay because these can be defined in
terms of the others. (For example, ∀x:P(x) is equivalent to ~(∃x:~P(x)).)
I've skipped over how we make ordered triples: we have parentheses but
have no comma "," to construct ordered-tuple literals like this; instead
"(a,b,c)" is shorthand for a set like {{a},{a,{b}},{a,{b},{{c}}}}.
Things like "the second item of b" have to be expressed in terms of more
temporary variables.
a+b=c ↔ (a,b,c)∈P
If you look back through these pages, you can see that all of the fast-
growing functions have been defined this way: from tetration to the
Ackermann function to chained arrow notation to the Bowers Extended Array
notation, everything is defined by subtracting 1 from one number and
applying an operation (either the one being defined, or a previously-
defined operation) to another number.
So, this gives us a more efficient way to prove numbers are ST-namable in
a googol symbols. Suppose that the definition of P outlined above took
1000 symbols, and we didn't define M or any other "operators". Given the
primitive (as above) assertion that "2 exists", we can combine it with the
definition and application of addition as follows:
We are iterating addition in the same way that our previous approach
iterated the successor function. If the definitions needed to set up
ordered triples and P take 1000 symbols, and each "a+b=c" takes another
1000 symbols, then the number 2^n could be expressed in a bit over
1000n+1000 symbols. With a googol symbols, we can now assert the existence
of numbers as high as 2^(googol/1000) ≈ 10^(3.01×10^96). We've made it to
class 3! Woo hoo!!
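A quick logarithm check (my own arithmetic, not from the text) confirms the size of that last number:

```python
# log10 of 2^(googol/1000) is (10^100 / 1000) × log10(2) ≈ 3.01×10^96,
# so the number itself is about 10^(3.01×10^96) -- a class-3 number.
from math import log10

doublings = 10**100 // 1000       # roughly how many "a+a=c" steps fit
exponent = doublings * log10(2)   # log10 of the resulting number
print(exponent)                   # about 3.01e96
```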
Does this define a single number x? We don't know. The largest known
Mersenne prime is less than 10^10^10 and probably will be for some time.
There might or
might not be a finite number of Mersenne primes (that's an unproven
conjecture), but if there are a finite number of them there is a largest
one, and if that largest one were the only one greater than 10^10^100, then
Φ(x) would be true for only one x. The trouble is that, given the
difficulty of proving the conjecture, it might take an infinite amount of
computation to confirm that there is only one such value.
This issue is addressed in the Busy Beaver function by assuring that only
halting Turing machines are allowed to be considered as candidates: the
number x must be computed in a finite number of steps and the machine must
then stop and not consider any higher values for x.
In formal logic, we need the formula Φ(x) to be true for some x, and we
also need the system to somehow verify this, through a finite number of
deductive steps.
I'm not going to try to explain all the concepts and methods fully, but
several articles in the Stanford Encyclopedia of Philosophy should be
helpful:
The basic terminology and symbols of formal logic are described in the
Classical Logic article.
Tarski's Truth Definitions discusses the task of expressing "truth
predicate" True(x) in terms of a formula Φ(x). It also discusses "variable
assignments that satisfy formulas". The entries on model theory and on
quantifiers and quantification discuss these ideas more generally, and the
latter links them to the definition of truth.
Rayo refers to "second order" and plural quantification in the definition
of Sat([Φ],s).
Rayo's Number
I still don't understand quite how this works, though most of the needed
explanation is in the links I gave at the end of the previous section, and
in the further readings list at the bottom of the Big Number Duel page at
MIT. My general impression is that most of Rayo's definition is explained
by the need to use methods of model theory to formalise the definition of
a "namable number" and ensure that the (second-order) system determines
the truth of the existence and uniqueness of any number that is so
namable, through arithmetisation of the first-order formulas, and
variable-assignments, formulas and satisfiability predicates in the
second-order system. I will describe the building blocks of Rayo's number
as nearly as I can.
Variable Assignments
This isn't quite what Rayo's construction does, but it's conceptually
similar.
Gödel-Coding
'∈' = 2, '=' = 3, '(' = 5, ')' = 7, '~' = 11, '∧' = 13, '∃' = 17 ; x1 = 19,
x2 = 23, x3 = 29, ...
A formula is coded by raising the successive primes 2, 3, 5, ... to the
codes of its successive symbols and multiplying. For example:
[x1∈x2] = 2^19 × 3^2 × 5^23
where the exponents {19, 2, 23} correspond to the three symbols {x1, ∈,
x2}.
Assignment
Variable assignments are so-called because they are the result of taking a
set of expressions with a free variable and replacing that free variable
with something that has a specific definition. (This type of thing is
also used in Gödel's incompleteness proofs.)
Assign(m, x1) = s ↔ s is a variable assignment in which every x1 is
changed to m
R() has to be an infinite set of ordered pairs, but let's give a finite
example and call it r().
where P() and Q() are predicates. It states that there is at least one
thing with the property P(), and all things with the property Q() also
have the property P().
where P and Q are sets. It states that P is a non-empty set and that Q is
a subset of P.
Definition of Sat()
Sat([φ],s) :=
∀ R {
  {for any (coded) formula [ψ] and any variable assignment t,
  (R([ψ],t) ↔
    ([ψ] = 'xi ∈ xj' ∧ t(xi) ∈ t(xj)) ∨
    ([ψ] = 'xi = xj' ∧ t(xi) = t(xj)) ∨
    ([ψ] = '(~θ)' ∧ ~R([θ],t)) ∨
    ([ψ] = '(θ∧ξ)' ∧ R([θ],t) ∧ R([ξ],t)) ∨
    ([ψ] = '∃xi (θ)' ∧, for some xi-variant t' of t, R([θ],t'))
  )
  } →
R([φ],s) }
Rayo-nameability
Rayo-namable(m) ↔
∃Φ(x1): {
∃s: s = Assign(m, x1) ∧
(∀t: Sat([Φ(x1)],t) → t = Assign(m, x1))
}
But the fact that all numbers are namable this way is of little use; we're
trying to get a specific, well-defined large number. So Rayo expresses
"the largest number that satisfies an assertion of this type, but with a
limited number of symbols":
Rayo-namable-in(m, n) ↔
∃Φ(x1): {
Φ has fewer than n symbols ∧
∃s: s = Assign(m, x1) ∧
(∀t: Sat([Φ(x1)],t) → t = Assign(m, x1))
}
Rayo's Number
Having gotten all that out of the way, the rest is simple. The "busy
beaver function" for this assertion-schema in the von Neumann universe,
sometimes called FOST(), is:
FOST(n) = the largest m for which Rayo-namable-in(m, n) is true
and Rayo's number itself is FOST(10^100).
BIG FOOT
In the years since the Big Number Duel, an online wiki / forum has built
up, centred around the "Googology wiki" on wikia.com.
Not satisfied with Rayo's number, the self-named "googologists" have made
several attempts to top it. Re-use of earlier ideas does not count; any of
these:
FOST(10^100) + 1
10^FOST(10^100)
FOST(10^10^100)
FOST(Mega)
FOST(Graham)
FOST(googol→googol→googol→googol→googol)
FOST(Bowers{googol,googol,(googol),googol})
FOST(BB(googol))
FOST(FOST(googol))
FOST^10(googol)
would not be considered a new champion, because (under the rules of the Big
Number Duel that spawned Rayo's number), every new champion must use a
significantly new technique.
They start by augmenting Rayo's calculus with the [ and ] symbols. These
are not the same as Rayo's [ ] (which were for Gödel-coding of a formula);
rather
Ordα is the least oodinal which is greater than every ordinal which can be
defined in the language of FOOT with parameters of rank below Ordα, if we
allow it to use constant symbols Ordv(β) for every β<α.
There is a lot more needed to define what "oodinals" are and how to work
with them.
They are not (necessarily) sets, but they have a lot in common with ranked
set theory e.g. the von Neumann universe. You can read all about it here:
First-order oodle theory (http://snappizz.com/foot). You might want to
start with the "Higher order set theory" article by the same author.
The Frontier
Beyond all the finite numbers are transfinite numbers and infinities. Once
we go beyond finite numbers, we enter an area where it is essential to
define exactly what theory of numbers we're working in.
Depending on what type of number theory you're looking at, there may or
may not be transfinite numbers and there may or may not be a plurality of
infinities. These differences result from the use of different axioms and
rules for deriving results. Different axioms and rules lead to different
results, including different answers to the question "what lies beyond all
the integers?" Because different systems are useful for different things
and none can generate all useful results (due to incompleteness as
demonstrated by Gödel) we end up with several different 'right answers' to
the question. None is the 'best' answer, but some are more popular than
others. (The term transfinite itself is a result of this — it was Cantor's
effort to avoid using the term infinite for certain quantities that were
definitely not finite, but did not share all the properties of what he
considered truly infinite, and now called "Absolute Infinite".)
Ordinal Infinities
you counted the integers by taking all the evens first, and then the odds:
infinity even numbers plus infinity odd numbers; the total is just
infinity, not "two times infinity". All you did was reorder the numbers;
that never changes how many there are.
This infinity is also the size of an infinite Euclidean geometrical
object, like the length of a line, the area of a plane, etc. when measured
in terms of another finite unit such as a line segment. Here we are
referring to "size" in terms of measure, where specific distances are
taken into account, not in terms of order, which is the number of elements
in a set and therefore the number of points in a geometric object.
The Ordinal "Countable" Infinities
"omega" = ω = 1 + ω = 2 × ω = ℵ0
ω + 1
ω + 2
ω + ω = ω × 2
ω + ω + 1
ω × 3
ω × ω = ω^2
ω^2 + 1
ω^2 + ω
ω^3
ω^3 + ω^2 × 3 + ω × 3 + 1
ω^ω = 1 + ω + ω^2 + ω^3 + ω^4 + ω^5 + ...
ω^ω + 1
ω^ω + ω
ω^ω + ω^2
ω^ω + ω^3
ω^ω + ω^ω = ω^ω × 2
ω^ω × ω = ω^(ω+1)
ω^(ω+1) + ω
ω^(ω+1) + ω^ω
ω^(ω+2)
ω^(ω×2)
ω^(ω^2)
ω^(ω^3)
ω^ω^ω
ω^ω^ω + 1
ω^ω^ω × 2
ω^ω^ω^ω
ω^ω^ω^ω^ω
ω^ω^ω^ω^ω^...
ε0 = ω^ω^ω^ω^ω^... (with ω omegas)
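The absorption rules behind this list (1 + ω = ω, and yet ω + 1 > ω) can be checked mechanically for ordinals below ω^ω. The sketch below is my own illustration, storing an ordinal in Cantor normal form as a descending list of (exponent, coefficient) pairs:

```python
# Ordinal addition below ω^ω. ω is [(1, 1)], 1 is [(0, 1)],
# ω^2 + ω×3 is [(2, 1), (1, 3)], and 0 is the empty list.

def ordinal_add(a, b):
    """a + b: terms of a below b's leading exponent are absorbed by b."""
    if not b:
        return list(a)
    lead = b[0][0]
    kept = [(e, c) for (e, c) in a if e > lead]
    match = [c for (e, c) in a if e == lead]
    if match:
        # equal leading exponents: the coefficients add
        return kept + [(lead, match[0] + b[0][1])] + list(b[1:])
    return kept + list(b)

one, omega = [(0, 1)], [(1, 1)]
assert ordinal_add(one, omega) == omega              # 1 + ω = ω
assert ordinal_add(omega, one) == [(1, 1), (0, 1)]   # ω + 1 > ω
assert ordinal_add(omega, omega) == [(1, 2)]         # ω + ω = ω × 2
print("ordinal addition is not commutative")
```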
Epsilon-Null
Somewhere along this sequence or perhaps after it (it is unclear from the
sources I have access to) are various higher "epsilons" ε1, ε2, εω, ε_ε0
and so on, and then a quantity Cantor calls alpha, which represents the
first quantity that cannot be handled by the epsilon sequence [7, 11].
All of this is possible because of the original axioms and rules of the
ordinal system, which state that the order you count things in makes a
difference. But what if you're allowed to reorder the items when counting
them? That would amount to switching to a cardinal counting system. When
this is done, all of these ordinal infinities turn out to be equal! They
are all equivalent to the cardinal ℵ0. For that reason, Cantor put all the
ordinal infinities listed so far in a "class" and labeled that class ℵ0.
Definition of ℵ1
In geometric set theory systems, which are cardinal systems, the ℵ-series
is not used (although ℵ0 may occasionally be used or implied by the use of
the term "countable"). In these systems, the next infinity after the
"countable" is c, called the "order of the continuum" or sometimes simply
the continuum. One also sees reference to a continuum, in which case the
reference is to a geometric/topological set that has c elements, that is
to say, a geometric object containing c points. Examples of a continuum
are a straight line, or the real numbers.
c = 2^ℵ0
Imagine a line segment of length 1 and an infinite line. The line segment
has a midpoint Q_0 and the line has an arbitrary centre point P_0. Now,
every point P on the line has a coordinate C_P corresponding to that
point's distance from P_0, positive on one side and negative on the other.
Every point Q on the line segment has a similar coordinate C_Q. To show
that the two objects (the line and the line segment) have the same number
of points, all we need to do is to supply a mapping function such as the
following:
C_Q = arctan(C_P) / π
Each point P has a unique coordinate C_P, and each value for C_P generates
a unique value for C_Q by this formula, which corresponds to a unique
point Q on the line segment.
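The mapping is easy to check numerically; this is a sketch of my own using Python's math module:

```python
# C_Q = arctan(C_P)/π maps the whole line into the open interval
# (-1/2, 1/2), one-to-one, with inverse C_P = tan(π·C_Q).
from math import atan, tan, pi

def to_segment(cp):
    return atan(cp) / pi

def to_line(cq):
    return tan(pi * cq)

for cp in (-1e6, -1.0, 0.0, 2.5, 1e6):
    cq = to_segment(cp)
    assert -0.5 < cq < 0.5                  # always lands inside the segment
    assert abs(to_line(cq) - cp) <= 1e-6 * max(1.0, abs(cp))  # invertible
print("every sample point round-trips")
```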
The continuum is the number of real numbers. Real numbers include anything
that has a decimal point and a finite (or infinite) number of digits, with
a repeating or nonrepeating decimal pattern. Most real numbers have an
infinite number of digits after the decimal point and no repeating
pattern.
Real numbers can be used to show that the number of points on a plane is
equal to the number of points on a line. For each point on a plane, there
is a unique pair of coordinates, such as (2.21751..., 6.40861...) or
(9.40589..., 3.25361...), etc. Take the digits of the two coordinates and
form a single number by interleaving the digits: one digit from the first
coordinate, then one digit from the second, then another digit from the
first coordinate and another from the second, and so on. All the digits
get used once, none get duplicated or thrown away. The result is a single
real number that is different from the number you would get from any other
pair of coordinates:
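On (truncated) digit strings, the interleaving and its inverse look like this; a sketch of my own:

```python
# Interleave the digits of two coordinates into one string, and split
# them back out; nothing is lost and nothing is duplicated.

def interleave(x_digits, y_digits):
    """x1 y1 x2 y2 x3 y3 ... for equal-length digit strings."""
    return ''.join(a + b for a, b in zip(x_digits, y_digits))

def deinterleave(z_digits):
    """Even positions rebuild x, odd positions rebuild y."""
    return z_digits[0::2], z_digits[1::2]

z = interleave('221751', '640861')   # the example coordinates above
print(z)                             # -> 262410785611
assert deinterleave(z) == ('221751', '640861')
```

A full proof has to treat decimals with two representations (0.4999... = 0.5000...) carefully; this sketch ignores that detail.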
-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, ...
-2, 1, 2, 4, 5, 7, 8, 10
0, 2, 4, 5, 7, 8, 10, 16, 17, 19, 22, ...
1, 2, 4, 8, 16, 32, 64, 128, 256, ...
1, 3, 4, 7, 10
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, ...
where there are a finite or infinite number of integers and each one is
bigger than the one before it. The number of possible sequences is
infinite, and can be proven to be bigger than the number of integers. It
can also be proven to be equal to the number of real numbers with another
one-to-one mapping (here, we're skipping a detail that is necessary to
avoid problems with integer sequences that have no definite start, as for
example the set of negative even integers):
Starting with any real number X, define its simple continued fraction
to be the expression of the form:
A + 1 / (B + 1 / (C + 1 / (D + ... ) ) )
where A, B, C, D, ... are integers, and write its sequence of terms as:
[ A, B, C, D, ... ]
For each simple continued fraction there is exactly one such sequence and
each simple continued fraction gives a different sequence.
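The expansion is easy to compute for rationals (irrationals simply never terminate); the following sketch is my own, using exact fractions:

```python
# Simple continued fraction [A, B, C, ...] of a number:
# x = A + 1/(B + 1/(C + ...)).
from fractions import Fraction

def continued_fraction(x, max_terms=30):
    terms = []
    x = Fraction(x)
    for _ in range(max_terms):
        a = x.numerator // x.denominator   # integer part (floor)
        terms.append(a)
        x -= a
        if x == 0:
            break                          # rationals terminate
        x = 1 / x                          # continue with the reciprocal
    return terms

print(continued_fraction(Fraction(23, 8)))   # -> [2, 1, 7]
```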
Each ordered sequence gives exactly one ascending sequence and each
ordered sequence gives a different ascending sequence.
There are other ways to prove that c is the order of the power set of the
integers; Cantor proved it in a manner similar to that discussed here.
The Continuum Hypothesis
After developing the ordinal and cardinal theories to this point, Cantor
could not determine whether c was distinct from ℵ1 or equal to it. Cantor
tried for a long time to discover a set of points that had more than ℵ0
points but less than c (if found, he could say that this set had ℵ1 points,
and c would be ℵ2 or larger). He couldn't find such a set, and then
proposed what is now called the continuum hypothesis:
Gödel showed in 1940 that Cantor could not have disproved the continuum
hypothesis using his axioms (which are now called "Zermelo-Fraenkel set
theory with the Axiom of Choice", often abbreviated ZFC), and Paul Cohen
showed in 1963 that Cantor could not have proved it either. For this work,
Gödel and Cohen both did major new work in the field of metamathematics,
which involves "modeling" mathematical axiom-proof systems with "bigger"
systems.
So, at least in standard ZFC set theory, the continuum hypothesis must be
declared to be true or false using a new axiom, or left undecided (as
Cantor did). You get a different system of infinities each way. By the
1990's, most mathematicians preferred to define the continuum hypothesis
as being false (mostly because of the usefulness of the results that can
be derived). The implication is that (if you follow the preference of the
mathematicians) c is greater than ℵ1.
Let S1 be a set with ℵ0 elements (like the set of integers)
Let S2 be the set of all countable ordinals
Let T be a set with c elements (like the set of points on a line)
Let T' be the set of all subsets of T (the power set of T).
Let T'' be the set of all subsets of T' (the power set of T').
etc.
In cardinal set theories it can be shown that there are no infinities
"in between" these. Any definition of an infinite quantity can be shown to
be equivalent to a member of the power set sequence. Since the continuum
hypothesis is taken to be false, c cannot be equivalent to ℵ1, but it could
be ℵ2 or one of the higher ones. All of the higher power sets would then
coincide in the same way. For example, if c were ℵ2, then 2^c would be ℵ3,
and so on.
This idea of only three useful infinities is hauntingly reminiscent of the
(perhaps mythical) "one, two, three, many" of the Hottentots, bringing us
full-circle back to class-0 numbers.
Inaccessible Infinities
In each of these processes, imagine the infinity you "get to" as you carry
the process on "forever". This includes any algorithmic process in which
the number of steps is finite, working up to such things as ℵ_BB(n), where
BB() is the busy beaver function and n is some gratuitous huge integer.
If you stay "within the system" while doing this process, by sticking to
well-defined symbols, rules, axioms, etc. you can create more and more
infinities, but you will always be working within a formal system of
number theory or set theory.
However, all number theories and set theories are incomplete. It has been
shown that by going outside the system you can demonstrate the existence
of "inaccessible cardinals" or "inaccessible infinities", which are bigger
than all of those producible through formal systems. This result is
analogous to the computation-theory result of the uncomputable functions.
Note. I try to explain things at least a little bit, and to give suitable
references. I definitely do not follow my own First Law of Mathematics. If
you suggest an improvement for these pages, I'll probably be able to do
something to make it better — just let me know (contact links at the
bottom of the page).
Footnotes
2 : http://www.miakinen.net/vrac/nombres#lettres_zillions Olivier
Miakinen, Écriture des nombres en français, (web page) 2003.
3 :
http://web.archive.org/web/20061021030550/http://www.io.com/~iareth/bignum.html
Gregg William Geist, Big Numbers (web page), 2006 (Latin number
names, some of the large examples like centumsedecillion)
7 : Conway and Guy, The Book of Numbers. See bibliography entry [43]
below.
12 : http://www.toothycat.net/wiki/wiki.pl?CategoryMaths/BigNumbers
Douglas Reay, commenting on discussion of formal theory of computation,
toothycat.net wiki (created by Sergei and Morag Lewis), CategoryMaths,
BigNumbers.
13 : http://www.math.ohio-state.edu/~friedman/manuscripts.html Papers by
Harvey M. Friedman. In the "preprints, drafts and abstracts" is Enormous
Integers in Real Life, 2000, which summarises several methods of producing
large integers, related to combinatorics and theory of computation.
15 : http://math.eretrandre.org/tetrationforum/showthread.php?tid=184
Henryk Trappman and Andrew Robbins, Tetration FAQ (online document)
16 : Martin Gardner, The Colossal Book of Mathematics: Classic Puzzles,
Paradoxes, and Problems, W. W. Norton (2001), ISBN 0393020231.
17 : Knuth, Donald E., Coping With Finiteness, Science vol. 194 n. 4271
(Dec 1976), pp. 1235-1242.
18 : http://www.stars21.com/translator/english_to_latin.html InterTran
English-Latin Translator, via Stars21.
20 : http://www.numericana.com/answer/units.htm#prefix Gérard P.
Michon's Numericana, Final Answers — Measurements and Units. (Has lots of
details about real and bogus SI prefixes) (formerly at
http://home.att.net/~numericana/answer/units.htm)
22 : OED [42] does not cite billion in the superlative sense, but milliard
was used in the superlative sense as far back as 1823.
An example of such a discussion is the long-running xkcd forum discussion
thread "My number is bigger!". This thread was begun on the 7th of July,
2007, and remained continually active for nearly three years (last checked
May 2010). The initial message began the competition with 9000; the first
respondent offered 3.250792...×10^548; several class 2 replies brought it up
to 3.454307...×10^1661; then it jumped to 10^10^10, 10^10^10^10, 10↑↑512, and
10↑↑↑3 = 10↑↑(10↑↑10). All of this was within the first 24 hours. Up-arrow
notation was no longer of any use by the third day of the discussion, and
the participants then began defining recursive functions and discussing
proofs. It continued along those lines for the following three years.
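The up-arrow values quoted above follow Knuth's recursive definition of the hyperoperations. As a rough illustration (not part of the original thread), here is a direct Python transcription of that definition — fine for tiny arguments, though anything like 10↑↑512 is astronomically beyond what could ever be evaluated:

```python
def arrow(a, n, b):
    """Knuth's up-arrow a ↑^n b: n=1 is exponentiation,
    n=2 is tetration (↑↑), n=3 is pentation (↑↑↑), etc."""
    if n == 1:
        return a ** b          # base case: a single arrow is a^b
    if b == 0:
        return 1               # a ↑^n 0 = 1 by convention
    # a ↑^n b = a ↑^(n-1) (a ↑^n (b-1))
    return arrow(a, n - 1, arrow(a, n, b - 1))

print(arrow(10, 1, 3))   # 10^3 = 1000
print(arrow(2, 2, 3))    # 2↑↑3 = 2^2^2 = 16
print(arrow(2, 3, 2))    # 2↑↑↑2 = 2↑↑2 = 4
```

Even 3↑↑↑3 already overwhelms this recursion (and any computer), which is exactly why the thread's participants had to abandon up-arrow notation for faster-growing recursive functions.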
Bibliography
[28] Kasner, Edward and Newman, James, Mathematics and the Imagination,
Penguin, 1940
[29] Gamow, George, One, Two, Three... Infinity: Facts and Speculations of
Science, Viking, 1947 (reprinted in paperback by Dover, 1988).
This was an early source for me and unfortunately gave me the impression
that the continuum hypothesis had been proven. This figure implies that
the ℵn series of infinities is the complete set of infinities:
Gamow p. 23, implying CH
If these are really "the first three infinite numbers", then there can be
nothing between ℵ0 and ℵ1, and that's CH.
[31] George Miller, The magical number seven plus or minus two: some
limits on our capacity for processing information. The Psychological
Review 63 (1956), pp. 81-97
[32] Davis, Philip J., The Lore of Large Numbers, New York: Random House,
1961
[33] Dmitri Borgmann, Naming the numbers. Word Ways: the Journal of
Recreational Linguistics 1 (1), pp. 28-31, 1968. Cover and contents are
here and article is here.
[35] R.L. Graham, B.L. Rothschild, Ramsey's Theorem for n-Parameter Sets.
Transactions of the American Mathematical Society 159 (1971), 257-292.
(Another PDF is here).
[42] The Compact Oxford English Dictionary (Second Edition), 1991. This is
the version that has 21473 pages photographically reduced into a single
book of about 2400 pages.
[43] John Horton Conway and Richard Guy, The Book of Numbers, Springer-
Verlag, New York, 1996. ISBN 038797993X.
pp. 266-276 (Cantor ordinal infinities)
pp. 277-282 (cardinal infinities and the continuum)
[46] Chris Bird, Proof that Bird's Linear Array Notation with 5 or more
entries goes beyond Conway's Chained Arrow Notation, 2006. Available here
(and formerly at
uglypc.ggh.org.uk/chrisb/maths/superhugenumbers/array_notations.pdf)
[47] Harvey Friedman, n(3) < Graham's number < n(4) < TREE(3), message to
FOM (Foundations of Mathematics) mailing list.
[51] N. Mohan Kumar, Construction of Number Systems (for Math 310 course
at Washington University in St. Louis), 2011.
[52] John Baez, Google+ post, 2013 Jan 11 (See also this mathoverflow
question)
[53] Sbiis Saibian, 3.2.10 Graham's Number, web article, 2013 Feb 15.
Other Links
Aaronson, Scott, Who Can Name the Bigger Number?, essay about how to win
the often-contemplated contest; covers many of the topics discussed here.
Bird, Chris, Array Notations for Super Huge Numbers, 2006. (An older
version of his work, which includes much of the material found here).
----, Super Huge Numbers, 2012. There are several sections, with the
simplest and slowest-growing functions first. The initial chapter "Linear
Array Notation" is roughly comparable to Bowers arrays; the other chapters
define higher and higher recursive functions.
Bowers, Jonathan, Big Number Central.
Rucker, Rudy, Infinity and the Mind, 1980. (ordinal infinities: the
relevant chapter was reproduced here the last time I checked.)
Steinhaus, Hugo, Mathematical Snapshots (3rd revised edition) 1983, pp. 28-
29.
----, Graham's Number (referring to the more well-known version, i.e. the
"Graham-Gardner number")
Acknowledgments
To Morgan Owens (packrat at mznet gen nz) for news of the Knuth -yllion
names and the Busy Beaver function