This page begins with million, billion, etc., proceeds through Googolplex
and Skewes' numbers (organised into "classes" based on the height of the
power-tower involved), then moves on through "tetration", the Moser and
the "Graham-Rothschild number", on to lesser-known hierarchies of
recursive functions, the theory of computation, transfinite numbers and
infinities. If it's a number and it's large, it's probably here.
As we have found the need to use large numbers in our lives, various
interesting systems have been proposed. Impress your friends with some of
these!
So we get
n=11 undecillion
Page 1 of 100
n=18 octodecillion
n=25 quinvigintillion
In case your Latin needs a refresher, here’s the table to create the
prefix, as refined by Olivier Miakinen. It works left-to-right, so for
your n, you do the units first, then the tens, then the hundreds.
     units           tens                  hundreds
0    —               —                     —
1    un              (n) deci              (nx) centi
2    duo             (ms) viginti          (n) ducenti
3    tre (s)         (ns) triginta         (ns) trecenti
4    quattuor        (ns) quadraginta      (ns) quadringenti
5    quin            (ns) quinquaginta     (ns) quingenti
6    se (sx)         (n) sexaginta         (n) sescenti
7    septe (mn)      (n) septuaginta       (n) septingenti
8    octo            (mx) octoginta        (mx) octingenti
9    nove (mn)       nonaginta             nongenti
When placed before a component marked (s) or (x), “tre” becomes “tres” and
“se” becomes “ses” or “sex”. Similarly, placed before a component marked
(m) or (n), “septe” and “nove” become “septem” and “novem” or “septen” and
“noven”.
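These assembly rules can be sketched in Python. The code below is an illustration, not an official implementation: it uses "quin" for 5 as this article's examples do (the original Conway-Wechsler system has "quinqua"), and the helper name `zillion_name` is made up for the example.

```python
# Sketch of the prefix assembly described above: units first, then tens,
# then hundreds, with the (s)(x)(m)(n) assimilation rules for tre/se/septe/nove.
UNITS = ["", "un", "duo", "tre", "quattuor", "quin", "se", "septe", "octo", "nove"]
TENS = [("", ""), ("deci", "n"), ("viginti", "ms"), ("triginta", "ns"),
        ("quadraginta", "ns"), ("quinquaginta", "ns"), ("sexaginta", "n"),
        ("septuaginta", "n"), ("octoginta", "mx"), ("nonaginta", "")]
HUNDREDS = [("", ""), ("centi", "nx"), ("ducenti", "n"), ("trecenti", "ns"),
            ("quadringenti", "ns"), ("quingenti", "ns"), ("sescenti", "n"),
            ("septingenti", "n"), ("octingenti", "mx"), ("nongenti", "")]

def zillion_name(n):
    """Name 10^(3n+3) for 10 <= n <= 999 (below 10 the standard names apply)."""
    u, t, h = n % 10, (n // 10) % 10, n // 100
    tens_part, tens_marks = TENS[t]
    hund_part, hund_marks = HUNDREDS[h]
    marks = tens_marks if tens_part else hund_marks  # marks of the next component
    unit = UNITS[u]
    if unit == "tre" and ("s" in marks or "x" in marks):
        unit = "tres"
    elif unit == "se" and "s" in marks:
        unit = "ses"
    elif unit == "se" and "x" in marks:
        unit = "sex"
    elif unit in ("septe", "nove") and "m" in marks:
        unit += "m"
    elif unit in ("septe", "nove") and "n" in marks:
        unit += "n"
    prefix = unit + tens_part + hund_part
    return prefix.rstrip("aeiou") + "illion"  # final vowel is replaced by -illion

print(zillion_name(11), zillion_name(18), zillion_name(25))
```

Running it reproduces the examples above: undecillion, octodecillion, quinvigintillion.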
Large numbers are often seen in terms of money. But how large? Estimates
suggest the total amount of money in the world is around $1 quadrillion:
this is only $10^15. The number of atoms in the observable universe is a
good deal bigger, around 10^80
= 100 quinvigintillion
A googol
10^100 = 10 × 10^(3×32+3)
= 10 duotrigintillion
≈ 93 × 10^(3×51+3)
= 93 unquinquagintillion
The largest known prime is a Mersenne prime, named after Marin Mersenne:
2^74,207,281 − 1 ≈ 3.004…×10^22,338,617
≈ 300×10^(3×7,446,204+3) = ???
Fret not! The system expands further. Run the naming system for each group
of thousands and put an –illi– between them. So,
2^74,207,281 − 1 ≈ 3.004…×10^22,338,617
≈ 300×10^(3×7,446,204+3)
= 300 septillisesquadragintaquadringentilliquattuorducentillion!
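The mantissa and exponent above can be checked with ordinary double-precision logarithms; a minimal sketch (subtracting the 1 cannot change the leading digits of a number this large):

```python
import math

def leading_form(p):
    """Return (mantissa, exponent) with 2**p - 1 ~= mantissa * 10**exponent."""
    log10_value = p * math.log10(2)            # log10 of 2**p
    exponent = int(log10_value)                # integer part: the power of ten
    mantissa = 10 ** (log10_value - exponent)  # fractional part: leading digits
    return mantissa, exponent

m, e = leading_form(74_207_281)
print(f"2^74,207,281 - 1 ≈ {m:.3f}×10^{e:,}")   # ≈ 3.004×10^22,338,617
```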
The number of years you’d have to wait for the universe to regenerate
itself in a similar state to now, if you let the universe repeat its
history arbitrarily many times: 10^10^10^10^10^1.1
OK, what? The big bang theory gives us an age for the universe of 13.8
billion years: that's 4.355×10^17 seconds. Still not bigger than the number
of atoms in the universe, though. But in order to get that incredibly
large number—in fact, probably the largest number you've ever seen—we need
to get a bit physical. This number even dwarfs a googolplex, often said to
be the largest number to have a name.
The word million is thought to have come from pre-13th-century Italian.
As the Romans had no names for numbers larger than 100,000, the Italians
added the augmentative ending -one to the Latin mille for 1000—from which
we also get millennium (1000 years) and mile (1000 paces)—to make it
larger: milione.
Chuquet’s first mention of million, billion, trillion, and so on (starting
top line, in French).
The first mark can signify million, the second mark byllion, the third
mark tryllion, the fourth quadrillion, the fifth quyllion, the sixth
sixlion, the seventh septyllion, the eighth ottyllion, the ninth nonyllion
and so on with others as far as you wish to go.
Chuquet records the usage here as the ‘long scale’: a billion equals a
million million, instead of a thousand million. This usage was common in
British English up until the mid-1970s, when pressure from American
English (‘a billion dollars’) tipped the balance. Most European languages
still use the long scale: ironic, given that the short scale was adopted
in the US from a 17th-century French convention.
The Conway system for naming large numbers expands the current system in a
logical way, but suffers from the possibility of long scale/short scale
confusion, as well as inconsistency between n<10 and n≥10. Russ Rowlett’s
2001 suggestion is to use Greek prefixes to create new, unambiguous
numbers:
10^3 thousand
10^6 million
10^9 gillion
10^12 tetrillion
The Conway system also suffers from a bad starting point, which makes it
difficult to work out from the naming system what, say, 1 million × 1
billion is. (It's a quadrillion.) Donald Knuth (of TeX fame) invented an
exponential system of ‘-yllions’ where a new name is only introduced at
10^2, 10^4, 10^8, 10^16 and so on.
10^1 ten
10^2 hundred
10^4 myriad
10^8 myllion
10^16 byllion
Names for large numbers only catch on when they appear in our daily lives.
In measurement they’re avoided by the use of SI prefixes. Barring any
hyperinflation, the highest we’d expect to see for a while is the world
wealth of $1 quadrillion. For everything else, we can be pretty happy with
standard form. For more large number fun, Robert Munafo has a terrific
(but long) read on names for large numbers on his website.
Contents
Author's Introduction
Class 0 Numbers (like 3)
Class 1 Numbers (like 100)
Class 2 Numbers (like googol)
The -illion Names
Conway-Wechsler Extension
Knuth -yllion System
Class 3 Numbers (like googolplex)
Class 4 Numbers
Skewes' Number
Higher Classes
The Quality of Uncomputably Larger
Power Towers
Inventing New Operators and Functions
Beyond Exponents: hyper4
Hyperfactorial and Superfactorial
Higher hyper Operators
Bowers' Array Notation
Steinhaus-Moser-Ackermann operators
Friedman sequences
The various "Graham's number"s:
The "Graham-Rothschild Number"
The "Graham-Gardner Number"
The "Graham-Conway Number"
Superclasses
Conway's Chained Arrow Notation
A Partial Ordering for short Conway chains
More Bowers Constructions:
Bowers' Extended Operators
Bowers' Array Notation (4-element Subset)
Bowers Arrays with 5 or More Elements
Generalised Invention of Recursive Functions
Formal Grammars
The Lin-Rado/Goucher/Rayo/Wojowu Method
Lin-Rado Busy Beaver Function
Beyond BB Function
Oracle Turing Machines
Declarative Computation and Combinatory Logic
Rayo's Number
BIG FOOT
The Frontier
Transfinite and Infinite Numbers
Ordinal Infinities
The First Cardinal Infinity: Aleph-Null
The Ordinal "Countable" Infinities
Epsilon-Null
All Ordinals Countable by Reordering
Aleph-One
The Continuum
The Continuum Hypothesis
The Power Sets of the Continuum
Inaccessible Infinities
Footnotes
Bibliography and other References
Other Links
Author's Introduction
This page covers all the huge numbers I have seen discussed in books and
web pages, and it actually does so in numerical order, as near as I can
tell (see the uncomparable and superclass 5 discussions).
One important thing to notice is that all discussions like this ultimately
lead to difficult and unsolved problems in the theory of algorithms and
computation. This page ends with Turing machines just before crossing over
to the transfinite numbers. If you want to learn something about the
theory of algorithms and computation, get two or more fairly knowledgeable
people to compete at describing the highest number they can, and then
stand back! One such competition (detailed in a footnote) took only a few
days to move beyond the range of everything discussed in the first two-
thirds of this webpage, and then spent another few years discussing formal
proofs.
Classes
First of all, I'm going to define what I call "classes" of numbers. This
is a somewhat refined and more precise version of the "levels of
perceptual realities" presented by Douglas Hofstadter in a 1982 Scientific
American column [39] (and reprinted in his 1985 book [41]). It is a
powerful and basic concept but usually goes unsaid. I think you'll agree
that the classes make sense and are a useful way to distinguish numbers.
Almost all numbers that are easy to make simple statements about (such as
which of two numbers is larger) can be put into the class system.
All numbers that anyone ever has to deal with in any practical application
(unless you count abstract mathematics and nerdy one-upmanship contests as
practical :-) are members of one of the first four classes. Googol and
googolplex are examples from class 2 and class 3, respectively.
Class-0 Numbers
(the concept of subitising)
Class-0 numbers are those that are small enough to have an immediate
intuitive or perceptual impact. Perceiving such a number is called
subitising, and for most purposes the limit has been shown to be somewhere
from 5 to 9 (see Kaufman [30] and Miller [31]). I'll be a bit conservative
here and place the limit at six. So, the numbers 0 through 6 are class 0.
One way to see this phenomenon for yourself is to use flash cards (or a
computer program set up to simulate flash cards) that present pictures of
objects that can be counted and placed in random arrangements — but look
at the picture only long enough to see it, and not long enough to start
counting. Then, after the picture is hidden, ask how many objects there
were. You then try to count the number of objects in your mental image of
the picture you've just seen. If the number of objects is a class 0
number, you'll usually be able to give the right answer. As you increase
the number of objects, your counts will be less and less likely to be
correct. Obviously, this gives a rather fuzzy definition of "class 0", but
the value you get will almost always be "around" 6.
Class-1 Numbers
Class-1 numbers are those that are small enough to be perceived as a bunch
of objects seen directly by the human eye. What I mean by "seen directly"
is that it is possible to see the number as a set of separate, distinct
objects in a single scene (no time limit, but the observer and the objects
cannot move). 100 is a class-1 number because it is possible to see 100
objects (goats for example) in a single scene. The limit for class-1
numbers is around a million, 1,000,000 or 10^6. You can just barely put
1,000,000 dots on a large piece of paper and stand at a distance such that
you can perceive each individual dot as a distinct dot, and at the same
time be within viewing distance of the other 999,999 dots. (I have
actually done this, just for fun!) As with class 0, the definition is
fuzzy: some people have better vision and could manage 10,000,000 dots or
even more.
Class-1 numbers include all quantities that people can comfortably handle
or perceive. For values in class 1, it is easy to distinguish the
magnitude of the value just by looking at it. Most people have realised
that, if they walk into a room with 85 people, although they can't tell
it's exactly 85, they know right away it's somewhere around 75 to 100. No
thought or calculation is necessary. This is an immediate perception of
magnitude, and the ability extends to numbers up into the thousands and
tens of thousands, but drops off after that. A person in a stadium with
10,000 people will have a fuzzier magnitude perception (they might guess
anywhere from 3,000 to 30,000). By the time we get to numbers like 10^8
(the number of blades of grass in an acre) a person is probably about as
likely to believe "10 million" (10^7) as "a trillion" (10^12) unless they
take the time to do some calculations.
Class-1 numbers also include most types of things that people aggregate or
count with the passage of time. If you have kept count of how many times
you have done something (e.g. jogging) or the number of things in a
collection (e.g. stamps) it probably numbers in the class 1 range. The
actual act of counting usually wears out before exceeding class 1, partly
because of the difficulty of accurately remembering the digits. (While
counting the number of days you have jogged is fairly easy, most people
would not be able to persist in keeping count of how many steps they had
taken once that number gets into 6 or 7 digits!) I tried this myself at
age 9 and reached 35000 before memory became too difficult.
Class-2 Numbers
Class-2 numbers are those that can be represented in exact form using
decimal place-value notation (or another small integer base, like base 2,
16 or 60). Typically this depends on how the digits are recorded and what
you need to do with them. Since I used 6 as the upper limit of class 0,
and 10^6 = 1,000,000 for the upper limit of class 1, I'll just continue
the pattern and say that the class-2 numbers go from 10^6 to about
10^1,000,000.
Place-value notation was popularised in the Arabic culture (but came from
India, and perhaps from China before that, again see [45]). It opened up
the range of class-2 numbers to anyone who wanted to use them. It was no
longer necessary to come up with new symbols for each successive power of
10. Generalizations in arithmetic rules were obvious: adding 2000+7000 was
not only analogous to adding 2+7, it was essentially the same thing.
Handling huge numbers became easy. To make an exact calculation about
thousands of objects, only a handful of objects (the digits) need to be
manipulated.
Googol is a class-2 number, as are the various large prime numbers used in
cryptography, all of the known perfect numbers (until 1997!), the Fermat
numbers with known factorization, etc. All of the large physical constants
like 6.02×10^23 (Avogadro's number) and 10^80 (the number of protons in
the universe) are class-2. So are most of the numbers with names ending in
-illion, like vigintillion (10^63), centillion (10^303), and on up to the
somewhat contrived milli-millillion (10^3,000,003) (which, by my admittedly
arbitrary decision, is a bit beyond the class-2 range).
The word million comes from around 1270, and entered the English language
around 1370. The names billion, trillion, and so on up to nonillion, plus
the general idea of continuing with Latin-derived prefixes all first
appear in the late 15th century, in writing by Nicolas Chuquet, a French
mathematician living in Lyon from 1480 until his death in 1488. (There
were also the longer forms bymillion and trimillion used as early as 1475
by Jehan Adam, but these never caught on). Follow this link for more
details: Origins of the Chuquet number names.
The long scale is Chuquet's original system, and has digits grouped 6 at a
time, thus trillion is a million times larger than billion. This is the
"billion = 10^12 system". Peletier's names for 10^(6N+3) (in the English
spelling, milliard = 10^9, billiard = 10^15, etc.) are compatible with this
system.
The use of number-names during the following few centuries eventually led
to widespread usage of billion to mean 10^9, trillion for 10^12, and
similar redefinitions of the higher names. These definitions are the short
scale or "billion = 10^9 system". Follow this link for more on the history
of short vs. long scale. Here is a related video by Numberphile: How big is
a billion?.
While the confusion between short and long scale was becoming well-
established, the big-number words ending in -illion were also becoming
popular for the purpose of expressing an excessively or unimaginably
large, or even infinite, quantity. This is a type of usage that was
already common for hundreds, thousands, myriads and millions. For example,
OED's [42] HUNDRED heading 2 a. begins: "Often used indefinitely or
hyperbolically for a large number: cf. thousand. (With various
constructions, as in [heading] I.)", and then gives nine quotations dating
from 1300 AD to 1885. In the following table I show the first documented
use of each number-name in both the literal sense and in this
"superlative" sense.
(It should be noted that zillion more generally can refer to far larger
things. For example, Howard DeLong[34] used the term "zillion" to refer to
an iterated Ackermann function of some other really large number c1.[49])
This table shows all positive powers of ten that have authoritatively
accepted names in English (by [42]) up to Chuquet's highest name
nonillion. The numeric values here follow the billion = 10^9 system ("short
scale"). I am also including a few other non-powers of 10 that have names
in English, but leaving out many base-20 constructions and other names
less than 100, about which you can read plenty in [45]. I include all
former and current official SI prefixes because they are quasi-"words"
that have a purely numerical meaning. The dates of first literal and
superlative usage are largely from OED [42] but are augmented as indicated
in the footnotes.
N    Latin     10^(3N+3)   name          first literal   first superlative   SI
                                         usage [42]      usage [42]          prefix(es)
—    —         10^1        ten           —               —                   deca- or deka- (da, dk)
1    —         10^6        million       —               —                   mega- (M)
3    tres      10^12       trillion      1690            1847                Tera- (T)
6    sex       10^21       sextillion    1690            1855                Zetta- (Z)
7    septem    10^24       septillion    1690            ?                   Yotta- or Yotto- (Y)
Chuquet left it to others to work out the details of extending the names
beyond nonillion. Although there is much discrepancy between the actual
number-names in Latin and the -illion names Chuquet listed, it was
nevertheless understood that Latin number-names were to be used to extend
the names as needed. Using Latin for prefixes goes smoothly as far as
vigintillion. The following names are found in many dictionaries;
vigintillion and centillion are a little more common than the others. Some
popular non-dictionary sources have made reference to millillion and
milli-millillion (mostly due to Henkle/Brooks, and Borgmann [33]).
100   centum   10^303   centillion
10^10^100   "googolplex"
The system is based on the short scale (billion = 10^9) but the names could
easily be used in a long scale system. A number name is built out of
pieces representing powers of 10^3, 10^30 and 10^300 as shown by this table:
     units           tens                  hundreds
0    —               —                     —
1    un              (n) deci              (nx) centi
2    duo             (ms) viginti          (n) ducenti
3    tre (s)         (ns) triginta         (ns) trecenti
4    quattuor        (ns) quadraginta      (ns) quadringenti
5    quinqua         (ns) quinquaginta     (ns) quingenti
6    se (sx)         (n) sexaginta         (n) sescenti
7    septe (mn)      (n) septuaginta       (n) septingenti
8    octo            (mx) octoginta        (mx) octingenti
9    nove (mn)       nonaginta             nongenti
- Take the power of 10 you're naming, subtract 3, and divide by 3.
- For a quotient less than 10, use the standard names thousand, million,
billion and so on through nonillion. Otherwise:
- Break the quotient up into 1's, 10's and 100's. Find the appropriate
name segments for each piece in the table. (NOTE: The original Conway-
Wechsler system specifies quinqua for 5, not quin.)
- For the special case of tre, the letter s should be inserted if the
following part is marked with either an s or an x.
Many of the resulting names are only slightly different from one another.
For example
10^2421 is sexoctingentillion.
Then there's
10^903 = trecentillion.
As their example shows, the beginning parts of the standard names such as
million and trillion are used for the "1" and "003" parts (respectively)
of the number 1,000,003, with the placeholder "nilli" for the central
"000" portion. This is the "1,000,003rd zillion", which is
10^(3×1,000,003+3) = 10^3,000,012. In general, when naming 10^(3N+3), the
rules above are to be used for each group of 3 digits in the number N.
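The grouping rule can be sketched like so; for brevity this illustration handles only single-digit groups (using the beginnings of the standard names, with "nilli" for a 000 group), and `zillion_of` is a made-up helper name. Groups from 10 to 999 would be built from the table of units, tens and hundreds instead.

```python
# Beginnings of the standard zillion names for group values 0-9; groups of
# 10..999 would be assembled from the units/tens/hundreds table instead.
GROUP = ["nilli", "milli", "billi", "trilli", "quadrilli",
         "quintilli", "sextilli", "septilli", "octilli", "nonilli"]

def zillion_of(n):
    """Name 10^(3n+3), running the naming system once per 3-digit group of n."""
    groups = [int(g) for g in f"{n:,}".split(",")]  # split n into 3-digit groups
    assert all(g < 10 for g in groups), "simplified: single-digit groups only"
    name = "".join(GROUP[g] for g in groups)
    return name[:-1] + "ion"            # final "...illi" becomes "...illion"

# The "1,000,003rd zillion", 10^(3x1,000,003+3) = 10^3,000,012:
print(zillion_of(1_000_003))   # millinillitrillion
```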
A Practical Alternative
If the above tables seem a bit much to deal with, here is my modest
proposal for a simpler naming system:
2. Beyond that, use "Ten to the power of..." followed by the appropriate
class 1 number.
Donald Knuth created a system that extends much further than the standard
Latin-based system. In the essay Supernatural Numbers[38] he wrote:
So in this system the word "thousand" is not used, and instead everything
up to 9999 is named using the traditional names for numbers up to 99 plus
"hundred", and no comma is used. For example:
127 = One hundred twenty-seven
1000 = Ten hundred
1356 = Thirteen hundred fifty-six
3000 = Thirty hundred
4192 = Forty-one hundred ninety-two
10^4 is called "myriad", a name that originally comes from ancient Greek.
It is written 1,0000 — note that the comma is added to separate the lowest
four digits, not three. Numbers up to 9999,9999 are named like so:
Then 10^16 is called "byllion", and a new punctuation mark is used. Knuth
points out the advantage of avoiding the long scale vs. short scale
confusion. Notice each punctuation mark can be read exactly when it
appears so it's easy to read off these numbers in words:
Each new number name is the square of the previous one — therefore, each
new name allows us to name numbers with twice as many digits. This gives
us a lot more mileage out of each name. Knuth continues borrowing the
traditional names, changing "illion" to "yllion" on each one.
"vigintyllion" ends up being 10^4,194,304, a bit beyond the upper limit of
class-2 numbers.
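Since each new name is the square of the previous one, the n-th yllion works out to 10^(2^(n+2)); a quick sketch:

```python
# Each yllion squares the one before: myriad = 10^4, myllion = myriad^2 = 10^8,
# byllion = myllion^2 = 10^16, so the n-th yllion is 10^(2^(n+2)).
def yllion_exponent(n):
    """Power of ten for the n-th yllion (n=1 myllion, n=2 byllion, ...)."""
    return 2 ** (n + 2)

for n, name in [(1, "myllion"), (2, "byllion"), (3, "tryllion"), (20, "vigintyllion")]:
    print(name, "= 10^", yllion_exponent(n))
```

For n = 20 this gives 10^4,194,304, matching the value quoted above for vigintyllion.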
In the same article [38], Knuth reports that Hsu Yo (living near the end
of the Han dynasty) used the names wan=10^4, i=10^8, chao=10^16 and
ching=10^32 as part of a nomenclature system for large numbers. The names
descended
into the present-day Chinese wàn, yì, zhào and jīng respectively. Usage of
the names zhào and jīng for 10^16 and 10^32 respectively is the "higher
degree system" reported by [45], but this usage did not continue into the
present (see Wikipedia's Chinese numerals article). The ancient usage
corresponds directly to myriad, myllion, byllion and tryllion in Knuth's
system, including the ordering of words to make the names of arbitrarily
large numbers. A specific example showing the recursive grouping, with
Chinese spelling, phonetic pronunciation and translation into more
familiar numeric notation is shown in [45] figure 21.41 (page 278). The
Chinese names continue with gai, which would be 10^64, all the way up to
zài=10^4096 (which is Knuth's decyllion), but usage of the larger ones has
only ever been "theoretical" — no actual usage is known.
As with class-0 and class-1, the limit for class-2 numbers is subjective.
I defined class 2 numbers as those that "can be represented in exact form
using place-value notation", and this depends on where and how the digits
are recorded, which in turn depends on what you want to do with the
number. If you just want to store the exact value of a number and not do
anything with it, you can keep it on a tape or disk, which has much more
capacity — perhaps as much as 10^12 digits. For some simpler algorithms,
such as squaring a number and adding together all the digits of the
result, the limit might be quite large — say a billion digits. For
algorithms involving many intermediate results, lookup tables or auxiliary
data, the limit might be lower — perhaps as few as 1000 or 10000 digits.
So, the limit for class 2 could be anywhere from 10^3 to 10^12 digits,
depending on the desired operation. We'll just continue the pattern and
say that class 2 ends at 1 million digits, i.e. numbers up to 10^1,000,000
or 10^10^6.
Class-3 Numbers
Class-3 numbers are the largest which can effectively be compared to see
if they are of comparable magnitude. For example, the following two
numbers are class-3 (and are at the low end, as class-3 numbers go) :
A = 2^79,641,170,620,168,673,833
B = 3^50,247,984,153,525,417,450
Which is larger?
We cannot compute the exact values of these two numbers and compare
directly — they have way too many digits to store the values on a
computer. That is the nature of class-3 numbers. However, we can represent
both in scientific notation with 10 digits of accuracy. This is
accomplished in much the same way that your computer or a scientific
calculator would do it. Starting with the logarithm of 2 (or 3), multiply
by the exponent, then divide by the logarithm of 10, separating the
integer from the fractional part, and use the fractional part to determine
the first few digits of the answer. In this case we get:
A = 5.0760252191 × 10^23,974,381,246,463,762,439
B = 5.0760252191 × 10^23,974,381,246,463,762,439
Now you begin to see the problem. Using 10 decimal places, both values
seem to be the same. (We know they are not, because one is a power of 2
and must be even, and the other, being a power of 3 is odd). As it turns
out, you need at least 20 decimal places to see that B is slightly larger.
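Python's decimal module can carry the 20-plus digits the comparison needs; a sketch, assuming the two numbers A = 2^79,641,170,620,168,673,833 and B = 3^50,247,984,153,525,417,450 given above:

```python
from decimal import Decimal, getcontext

getcontext().prec = 60   # far more working digits than the ~20 we need

# A = 2^79,641,170,620,168,673,833 and B = 3^50,247,984,153,525,417,450
logA = Decimal(79641170620168673833) * Decimal(2).log10()
logB = Decimal(50247984153525417450) * Decimal(3).log10()

expA, fracA = int(logA), logA % 1   # integer part: the power of ten
expB, fracB = int(logB), logB % 1   # fractional part: log of the mantissa

print(expA == expB)    # same power of ten
print(fracB > fracA)   # B's mantissa is (just barely) the bigger one
```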
Jonathan Bowers, about whom we'll say a lot more further below, has
invented many names for special numbers in this area: myrillion=10^30,003,
micrillion=10^3,000,003, killillion=10^(3×10^3000+3),
megillion=10^(3×10^3,000,000+3), gigillion=10^(3×10^3,000,000,000+3), and
likewise with higher SI prefixes, which he extends, e.g.
tedakillion=10^(3×10^(3×10^42)+3). There are also a few ad-hoc Chuquet
extensions that attempt to reach up into this area.
Class-4 Numbers
Now we move on to Class-4 numbers and higher classes. You may have already
seen a pattern here; we'll just continue the pattern:
class-4 numbers are those numbers that are larger than class-3, and whose
logarithm can be represented as a class-3 number.
C = 2^2^2^83
D = 3^3^3^52
As before we take the logarithm of both, but this time we must do it
twice, and we find
log10(log10(C)) ≈ 2.911×10^24
log10(log10(D)) ≈ 3.083×10^24
so D is larger.
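Taking the logarithm twice leaves only class-2 numbers, so the comparison fits in ordinary floating point; a sketch comparing C = 2^2^2^83 and D = 3^3^3^52 by their double logarithms:

```python
import math

# log10(log10(2^2^2^83)) = 2^83 * log10(2) + log10(log10(2))
loglog_C = 2 ** 83 * math.log10(2) + math.log10(math.log10(2))
# log10(log10(3^3^3^52)) = 3^52 * log10(3) + log10(log10(3))
loglog_D = 3 ** 52 * math.log10(3) + math.log10(math.log10(3))

print(f"{loglog_C:.3e}  {loglog_D:.3e}")   # about 2.911e+24 and 3.083e+24
print(loglog_D > loglog_C)                 # so D is larger
```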
Skewes' Numbers
These numbers occur in the study of prime numbers, and particularly the
frequency of occurrence of prime numbers. Gauss' well-known estimate of
the number of prime numbers less than N is
Li(N) = ∫ dt/ln(t), integrated from 2 to N.
Skewes showed (assuming the Riemann hypothesis) that the actual count must
exceed Li(N) for some N less than 10^10^10^34. Since then, others have
improved the bounds dramatically. Conway and Guy (The Book of Numbers,
page 61) cite the result of Lehman, who in 1966 gave an upper bound of
about 10^1167. According to Eric W. Weisstein and Wikipedia, in 1987 H. J.
J. te Riele reduced the upper bound of the first crossing to e^e^(27/4), a
class 2 number approximately equal to 8.185×10^370. In 2000 Bays and
Hudson found an actual crossover point using numerical techniques — around
1.39822×10^316. Most recently, in 2005 Patrick Demichel found a smaller
crossover point near 1.397162914×10^316. In any case, the original Skewes'
Number is now just an interesting part of history.
class-5 numbers are those numbers that are larger than class-4, and whose
logarithm can be represented as a class-4 number.
class-N numbers are those numbers that are larger than class N-1, and
whose logarithm can be represented as a class N-1 number.
but as it turns out, these higher classes aren't too useful for
representing the large numbers of abstract mathematics. Once we get into
the really big numbers like the ones discussed below, exponents are so
unwieldy that they are no longer used directly — instead faster-growing
functions like the hyper4 function are used.
10^10^10^10^1,000,000 = 10^10^10^10^10^6: a class-5 number X of this size
is indistinguishable from X^2.
For an example of this, imagine A has trillions of digits. If you add some
small number to it, only the last few digits will change — and all of the
digits would have to be stored and examined to tell the difference. On the
other hand, multiplying A by a small number N will change all the digits,
and you can distinguish the difference by comparing the logarithm of A to
that of A×N.
This pattern does not continue with higher operators, because the "class"
system is based on exponents. For example, if A is a class 10 number and K
is class 9 or smaller, it is still easy to distinguish A④K from A, and
hard to distinguish A^K from A.
Notice that this definition depends not only on A and B but also on one's
knowledge and/or ability. As you go to higher and higher operators and
functions it becomes quite difficult to determine which values are larger
than others (I refer to this later in my discussion of superclass 5). It
is easy to see that Skewes' Number is bigger than googolplex, but not
nearly so easy to figure out which of the "Graham-Rothschild number" and
the Moser is bigger.
The "Graham-Rothschild number" and the Moser are defined with different
systems of representation, and the two systems cannot be readily converted
into each other. They would be called uncomparable until the two systems
are studied and a method is developed to show which number is larger. Once
such a method was developed, and it was determined which is larger, they
are no longer "uncomparable".
computably larger (using 10 digits in both mantissa and exponent):
143 > 127
uncomputably larger (using 10 digits) but computably larger (using 20 or
more digits in both mantissa and exponent):
3^50,247,984,153,525,417,450 > 2^79,641,170,620,168,673,833
Power Towers
Problem: Start with the 3-level power tower 2^2^10. Consider two different
ways to make it bigger: increase the bottom-most number, making it H^2^10
where H is something really huge like 1000000, or make the power tower
higher by making it S^2^2^10, where S is something really small like
1.001. Determine which is biggest: the original power tower X = 2^2^10, or
the two altered versions, A = 1000000^2^10 or B = 1.001^2^2^10?
First we show that A and B are both bigger than X. A>X is obvious. For B
it's less obvious. We're comparing:
B = 1.001^2^2^10 ⋛ 2^2^10 = X
B = (2^0.001442)^2^2^10 ⋛ 2^2^10 = X
log2(log2B) is about 1014.56, much bigger than log2(log2X) which is 10. So
B>X.
B' = 2^2^10
B' = 2^1024, so
We could have used any really big number H in place of 1000000 and any
small number S in place of 1.001 and B would still be the biggest, as long
as log_S(H) is less than 2^1014. 2^1014 is about 1.7556×10^305, a class 2
number. To show how extreme this is, let H be a googolplex and let
S be 1+1/googol. Then log_S(H) = ln(H)/ln(S) ≈ (2.3×10^100)/(10^-100) =
2.3×10^200,
still much less than 2^1014. So even with this really huge H and really
small S, the power tower S^2^2^10 is still bigger than H^2^10.
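The whole exercise can be checked by comparing log2(log2(...)) of the three towers; a sketch:

```python
import math

log2 = math.log2

# X = 2^2^10, A = 1000000^2^10, B = 1.001^2^2^10 -- compare log2(log2(.))
llX = log2(2 ** 10)                      # log2(X) = 2^10
llA = log2(2 ** 10 * log2(1_000_000))    # log2(A) = 2^10 * log2(10^6)
llB = 2 ** 10 + log2(log2(1.001))        # log2(B) = 2^1024 * log2(1.001)

print(llX, llA, llB)     # 10.0, about 14.3, about 1014.56
print(llB > llA > llX)   # the heightened tower B wins
```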
1000^1000^1000
and
A similar phenomenon, the power tower paradox, causes two power towers to
be effectively the same if the numbers at the top are the same, even if
the numbers near the bottom are different. For example, 27^10^10^100 is
almost exactly the same as 10^10^10^100
The concept of the "classes" described so far does quite well at handling
everything that can be done with exponents, which are the most powerful
operator known to most people. To proceed further we begin to invent new
operators. This practice of inventing new operators continues over and
over again as you go to higher and higher large numbers. The new operators
overcome the limits of the old operators, limits that are reached as the
old notation becomes unwieldy.
3158 = ((3 × 10 + 1) × 10 + 5) × 10 + 8
When expressing larger numbers, like Avogadro's number and googol, one
usually uses exponents and power towers, as discussed above:
but after a while that becomes unwieldy too. Eventually there are so many
exponents that it cannot be written on a page. Then it becomes a good idea
to invent a new shorthand, which amounts to defining a new operator.
The first new operators used by those seeking large numbers are usually
higher dyadic operators. A dyadic operator is one that has two arguments —
two numbers that it acts on. Usually in notation the operator is placed
between the two numbers.
The most common higher dyadic operators follow the pattern set by the
well-known three (addition, multiplication and exponentiation). These
operators come up a lot in the definitions of large numbers that are to
follow.
operation        representation          absolute definition    inductive definition
addition         a + b or a①b            —                      successor(a + (b-1)) or successor((a-1) + b)
multiplication   a × b or a②b            a + a + ... + a        a + (a②(b-1)) or (a②(b-1)) + a
exponentiation   a^b or a↑b or a③b       a × a × ... × a        a×(a③(b-1)) or (a③(b-1))×a
hyper4           a^^b or a↑↑b or a④b     a^(a^(...^a))          a^(a④(b-1))
Note that for the last operator, there are two ways to interpret the
absolute and inductive definitions, producing different hyper4 operators.
In common practice, the first one is used because the other one can be
reduced to a combination of two exponent operators: a④b = a^(a^(b-1)), and
thus it does not really count as a new operator.
The names tetration, superpower, and superdegree have also been used to
refer to the hyper4 operator. (As a child I used the somewhat misleading
name powerlog for hyper4, as in 2 powerlog 5 is 65536.)
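The inductive definitions translate directly into a recursive function. This is an illustrative sketch using Python's exact integers; the first three operators are delegated to the built-in +, × and ** so the recursion stays shallow:

```python
def hyper(a, n, b):
    """a combined with b by the n-th hyper operator (n=1 add, 2 multiply, ...)."""
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 1:
        return a                    # base case: a(n)1 = a above exponentiation
    return hyper(a, n - 1, hyper(a, n, b - 1))   # a(n)b = a (n-1) (a(n)(b-1))

print(hyper(2, 4, 4))        # 2^^4 = 2^2^2^2 = 65536
print(hyper(3, 4, 2))        # 3^^2 = 3^3 = 27
print(hyper(2, 5, 3))        # 2^^^3 = 2^^(2^^2) = 2^^4 = 65536
```

Even modest arguments overflow any physical computer (hyper(2, 4, 6) already has about 10^19728 digits), which is exactly why these operators are useful for naming large numbers.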
Extension to reals: Now, suppose you want to calculate 2④2.5 or pi④e. The
above definition isn't too useful because the number after the ④ has a
fractional part. What we would need is a way to "extend" the hyper4
operator to real numbers. Unfortunately, this is tough to do in a way that
meets the types of standards mathematicians generally want such things to
have. I also know of no proof that such extension is impossible. A lot of
people have worked on this over the years, and if you're interested, I
suggest you check my notes here, and the Tetration FAQ.
(given: number X, we want to find R such that R④2 = X. Note that R④2 =
RR.)
hyperlog(2) ≈ 0.39
hyperlog(100) ≈ 1.39
hyperlog(10^100) ≈ 2.39
hyperlog(10^10^100) ≈ 3.39
...
The function "below" addition: Some people have also developed a hyper0
function. If you think about it, addition is a shortcut for counting, in
much the same way multiplication is shortcut for addition. The following
definition for a hyper0 function was developed by Constantin Rubtsov:
a⓪b = a (if b = -∞)
a⓪b = b (if a = -∞)
a⓪b = a+2 = b+2 (if a = b)
a⓪b = a+1 (if a > b)
a⓪b = b+1 (if b > a)
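Rubtsov's five cases collapse to a couple of comparisons. A minimal sketch (hyper0 is a hypothetical name):

```python
def hyper0(a, b):
    """Rubtsov's hyper0 ("zeration"), one level below addition,
    with -infinity handled as in the definition above."""
    neg_inf = float("-inf")
    if b == neg_inf:
        return a
    if a == neg_inf:
        return b
    if a == b:
        return a + 2
    return max(a, b) + 1

print(hyper0(3, 3))  # 5
print(hyper0(5, 2))  # 6
```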
N.J.A. Sloane and Simon Plouffe use hyperfactorial to refer to the integer
values of the K-function, a function related to the Riemann Zeta function,
the Gamma function, and others. It is H(n) = 1^1 × 2^2 × 3^3 × ... × n^n.
For example, H(3) = 27×4×1 = 108 and H(5) = 86400000. This function does
not really grow much faster than the normal factorial function.
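The product form is a one-line loop; a sketch matching the examples above (hyperfactorial is a hypothetical name):

```python
def hyperfactorial(n):
    """H(n) = 1^1 × 2^2 × ... × n^n, as in the examples in the text."""
    result = 1
    for k in range(1, n + 1):
        result *= k ** k
    return result

print(hyperfactorial(3))  # 108
print(hyperfactorial(5))  # 86400000
```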
n$ = n!^(n!^(n!^(...^n!)))
where there are n! copies of n! in the tower on the right hand side. Using
the hyper4 operator, n$ is equivalent to:
n$ = n! ④ n!
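Only the very first values of this superfactorial are computable at all; a sketch under that caveat (superfactorial is a hypothetical name):

```python
from math import factorial

def superfactorial(n):
    """Pickover's n$ = n! ④ n!: a tower of n! copies of n!.
    Only n = 1 or 2 give values small enough to compute."""
    f = factorial(n)
    result = 1
    for _ in range(f):
        result = f ** result
    return result

print(superfactorial(2))  # 2! ④ 2! = 2^2 = 4
```

Already 3$ = 6④6 is a tower of six 6's, far beyond any direct computation.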
There are other ways to define a higher version of the factorial, such as
this and this.
To get an idea how big the hyperfactorial of a pretty normal number can
be, read Wayne Baisley's wonderful article "Quantity Has A Quality All Its
Own" (and bring your towel).
More Bowers Names
Jonathan Bowers, mentioned above, has many names covering this area. For
example, in analogy to googol and googolplex he refers to 10④100 as giggol
and 10④(10④100) as giggolplex.
Higher hyper operators
operation    representation               absolute definition    inductive definition
hyper5       a^^^b or a↑↑↑b or a⑤b        a④(a④( ... ④a))        a④(a⑤(b-1))
hyper6       a^^^^b or a↑↑↑↑b or a⑥b      a⑤(a⑤( ... ⑤a))        a⑤(a⑥(b-1))
and so on.
Bowers has several named numbers in this area, including trisept, 7⑦7;
tridecal, 10⑩10; and the aptly named boogol, the frighteningly large
hy(10,100,10) (10 and 10 combined by the hundredth hyper operator).
hy(a,3,b) = a↑b = a^b
hy(a,4,b) = a↑↑b
hy(a,5,b) = a↑↑↑b
hy(a,6,b) = a↑↑↑↑b
etc.
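The hy() function can be transcribed directly from its inductive rule. In this sketch the first three operators are written as base cases, which keeps the recursion shallow enough for small test values (the function name follows the text; everything else is illustrative):

```python
def hy(a, n, b):
    """Generalised hyper operator: hy(a,1,b)=a+b, hy(a,2,b)=a×b, hy(a,3,b)=a^b,
    and for n >= 4 the inductive rule hy(a,n,b) = hy(a, n-1, hy(a, n, b-1))
    with hy(a,n,1) = a."""
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 1:
        return a
    return hy(a, n - 1, hy(a, n, b - 1))

print(hy(3, 3, 4))  # 3^4 = 81
print(hy(2, 4, 3))  # 2↑↑3 = 16
print(hy(2, 5, 3))  # 2↑↑↑3 = 2↑↑4 = 65536
```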
Bowers' Array Notation (3-element Subset)
1. For one- and two-element arrays, just add the elements. [a] = a and
[a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's, remove
them: [a,b,1] = [a,b] = a+b; [a,1,1] = [a].
3. If neither previous rule applies, and the 2nd entry is a 1, remove all
but the first element: [a,1,n] = [a] = a.
4. There is no rule 4 (there will be when we get to bigger arrays).
5. Otherwise replace the array [a,b,n] with [a,[a,b-1,n],n-1], then go
back and repeat the rules to expand it further.
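The five rules can be transcribed almost verbatim (bowers3 is a hypothetical name; only tiny values terminate in practice):

```python
def bowers3(arr):
    """Evaluate a Bowers array of up to 3 entries by the rules above."""
    arr = list(arr)
    if len(arr) <= 2:                     # rule 1
        return sum(arr)
    while arr and arr[-1] == 1:           # rule 2: remove trailing 1's
        arr.pop()
    if len(arr) <= 2:
        return sum(arr)
    a, b, n = arr
    if b == 1:                            # rule 3
        return a
    # rule 5: [a,b,n] = [a, [a,b-1,n], n-1]
    return bowers3([a, bowers3([a, b - 1, n]), n - 1])

print(bowers3([3, 2, 2]))  # 3×2 = 6
print(bowers3([3, 2, 3]))  # 3^2 = 9
print(bowers3([2, 3, 4]))  # 2↑↑3 = 16
```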
With just a little effort you can see that these rules make [a,b,n]
equivalent to hy(a,n,b) except for the special case of n=0. Compare the
formula of rule 5:
[a,b,n] = [a,[a,b-1,n],n-1]
hy(a,n,b) = hy(a,n-1,hy(a,n,b-1))
They are the same except the order of the arguments is different. Bowers
arranges the arguments in order of increasing "growth potential" — the
operator has higher growth potential than b, so it goes last.
So, all 3-element Bowers arrays are equivalent to the normal hyper
operators. [3,2,2] = 3②2 = 3×2 = 6; [3,2,3] = 3③2 = 32, [4,5,6] = 4⑥5,
etc.
a ↑↑ b = hy(a,4,b)
a ↑↑↑ b = hy(a,5,b)
a ↑↑↑↑ b = hy(a,6,b)
(etc.)
Using the hy() function allows for a more compact representation of really
large numbers that would otherwise take a lot of arrows. For example,
hy(10,20,256) is equivalent to 10 ↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑↑ 256
2^3 is smaller than 3^2
2^4 is the same as 4^2
for any other a and b, if b is greater than a then a^b is greater than b^a.
in general, to compare a^b to c^d we can probably calculate both directly, so
long as all four numbers are class 1.
A pattern emerges: except when a is 2 or when b is 2, the values of a↑↑b
generally follow the rule:
Now let's make a similar list of a↑↑↑b examples, and showing how the a↑↑b
values fit in:
4↑↑↑4 = 4↑↑(4↑↑↑3), a tower of height 4↑↑↑3
.. 5↑↑↑4 through 13↑↑↑4
2↑↑↑6 = 2↑↑(2↑↑↑5), a tower of height 2↑↑↑5
.. 14↑↑↑4 through 7625597484980↑↑↑4 ...
Gödel Numbers
Goodstein sequences
Almost certainly higher than this (but who can say?) are numbers related
to the Goodstein sequence.
(For more detailed descriptions, see the Wiki entry and this page by
Justin Miller)
v^2 + 2
v^2 + 1
v^2
v^2 - 1 = (a-1) × v + (a-1)   where a = value of v at this step
...
(a-1) × v
(a-2) × v + (b-1)   where b = value of v at this step
...
(a-2) × v
(a-3) × v + (c-1)   where c = value of v at this step
...
2 × v
v + (d-1)   where d = value of v at this step
...
v
e - 1   where e = value of v at this step
When higher-level exponents are involved, the series will get longer each
time a higher-level exponent has to be decremented. Each time the series
will become enormously longer, but will still be of finite length.
Therefore, the same principle applies.
Consider the lower Goodstein sequence, and look at just one of the
exponents in that sequence, and call it "c". As we have already shown, "c"
will eventually get decreased to a lower number. Call that number "d"
(which is c minus one). At this point the iteration continues for an even
longer time with no change to "d" or any of the other "exponents", but
eventually as before, the lowest exponent will have to get diminished
again. So in this way we see that each exponent will eventually get
replaced with a lower one. Each step takes massively longer than the
previous step, but all steps are still of a finite length (not an infinite
length) so eventually even the highest exponent will get decreased.
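The "lower" (weak) Goodstein idea — write the value in the current base, reinterpret the same digits in the next base, then subtract 1 — can be sketched directly. This is an illustrative helper (weak_goodstein is a hypothetical name), practical only for tiny starting values, since even a start of 4 takes a very long time to reach 0:

```python
def weak_goodstein(n, max_terms=50):
    """Weak Goodstein sequence starting from n in base 2: bump the base by one
    each step and subtract 1; return the terms until 0 (or a cutoff)."""
    terms, base, value = [n], 2, n
    while value > 0 and len(terms) < max_terms:
        digits, v = [], value
        while v:
            digits.append(v % base)        # digits of value in the current base
            v //= base
        base += 1                          # same digits, next base...
        value = sum(d * base ** i for i, d in enumerate(digits)) - 1
        terms.append(value)                # ...minus one
    return terms

print(weak_goodstein(3))  # [3, 3, 3, 2, 1, 0] -- it does reach 0, as argued above
```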
Ackermann's Function
The example value most commonly cited is ack-rm(3,5) = 2④5, which is 2^65536, a
large class-2 number. Of course, as with Steinhaus-Moser notation it is
easy to transcend the classes entirely.
a1(n) = ack-h(n,n,n)
While it is true that a1(x) grows just as fast as the ack-h() function,
and therefore serves as a good way of defining large numbers as a function
of one variable, actually computing those numbers involves the recursive
definition of the function. If x>1, we have:
a1(x) = ack-h(x,x,x) = ack-h(x-1, x, ack-h(x,x,x-1))
note that the arguments of the two ack-h functions on the right are not
equal to each other, and therefore we can't substitute from the definition
of a1(n) to make the right side be in terms of the a1() function.
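For comparison, the common two-argument Ackermann-Péter function (a cousin of the three-argument ack-h and ack-rm forms discussed here, not identical to them) is easy to transcribe and shows the same explosive growth:

```python
def ack(m, n):
    """Two-argument Ackermann-Peter function. Values beyond ack(3, n)
    are already out of reach of direct recursion."""
    if m == 0:
        return n + 1
    if n == 0:
        return ack(m - 1, 1)
    return ack(m - 1, ack(m, n - 1))

print(ack(2, 3))  # 9
print(ack(3, 3))  # 61 = 2^6 - 3
```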
These numbers were constructed by Hugo Steinhaus and Leo Moser (in the
late 1960's or earlier, exact dates unknown) just to show that it is easy
to create a notation for extremely large numbers.
(Both versions of the notation predate 1983; I learned about Moser's
number at an academic library in June 1979.)
sm(2,1,3) = 2^2 = 4
sm(2,2,3) = sm(sm(2,1,3),1,3) = sm(4,1,3) = 4^4 = 256
sm(2,1,4) = sm(2,2,3) = 256
mega = sm(2,1,5) = sm(2,2,4) = sm(sm(2,1,4),1,4) = sm(256,1,4) = sm(256,256,3)
     = sm(256^256,255,3) = sm((256^256)^(256^256),254,3)
     = sm([(256^256)^(256^256)]^[(256^256)^(256^256)],253,3) = ...
megiston = sm(10,1,5)
moser = sm(2,1,Mega)
10 let mega = 256
20 for n = 1 to 256
40 let mega = mega ^ mega
80 next n
160 print "Mega = ", mega
320 end
256 = 2^8 = 2^2^3
256^256 = 2^2048 = 2^2^11 ≈ 3.231700607153×10^616
(256^256)^(256^256) = 2^2^2059 ≈ 10^(1.992373902866×10^619)
[(256^256)^(256^256)]^[(256^256)^(256^256)] = 2^2^(2059+2^2059) ≈ 10^10^(1.992373902866×10^619)
Each time through the loop there are twice as many 256's — so there are
2^256 256's in mega. However, the parentheses are grouped differently from
the power towers discussed above. After two times through the loop, for
example, it's (256^256)^(256^256). That is not as big as 256^(256^(256^256)) — the
former is 10^(1.992373902866×10^619), the latter is 10^[10^(7.78271055807×10^616)]. This
discrepancy continues, with the result that the mega is about
10^(10^(10^...^(1.992373902866×10^619)...)), with 255 10's before the
"1.992373902866×10^619" part. For reasons explained here, that is equivalent
to what you get if you replace all but the last few 10's with nearly any
other number between say 2 and 1000. hypercalc's final answer is:
255 PT ( 1.992373902865×10^619 )
which represents a power tower with 255 powers of 10 ("255 P.T.") and
1.992373...×10^619 at the top.
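The bookkeeping hypercalc does can be sketched in a few lines: after the first two passes, every further x = x^x step just stacks one more power of ten under an essentially unchanged top value. This is an illustrative sketch (mega_estimate is a hypothetical name, and it drops correction terms that hypercalc tracks to more digits):

```python
import math

def mega_estimate(loops=256):
    """Estimate the mega loop (x -> x^x done `loops` times, starting at 256).
    Returns (pt, top): the result is roughly a power tower of `pt` tens with
    the number 10^top at the very top."""
    top = math.log10(256.0)        # x = 256 = 10^top
    top = 256.0 * top              # pass 1: log10(x^x) = x*log10(x), about 616.5
    top = top + math.log10(top)    # pass 2: now x = 10^10^top, top about 619.3
    # Passes 3..loops each multiply log10(x) by x, which just stacks one more
    # power of ten under a (to many digits) unchanged top value.
    return loops - 1, top

pt, top = mega_estimate()
print(pt, "PT (", f"{10 ** (top % 1):.4f}e{int(top)}", ")")
```

This reproduces the "255 PT" height and a top value close to 1.99×10^619, agreeing with hypercalc's answer quoted above.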
Friedman Sequences
A_7198(158386) = ack-rm(7198,158386)
This value is less than the "Graham-Rothschild number" and the other
"Graham's number"s (which we'll go into next). However, the N=4 case gives
a result that is immensely bigger than all the versions of Graham's
number. Friedman describes these relations at [47].
The original genuine "Graham's number", from a 1971 paper by Graham and
Rothschild [35], is an upper bound for a problem in Ramsey theory (graph
colouring, combinatorics).
F(m,n) = 2^n                  for m=1, n>=2
F(m,n) = 4                    for m>=1, n=2
F(m,n) = F(m-1, F(m,n-1))     for m>=2, n>=3
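The three cases of F transcribe directly; a sketch that only works for the very smallest arguments (anything larger immediately overflows any computer):

```python
def F(m, n):
    """Graham & Rothschild's F from the 1971 paper, for tiny arguments."""
    if m == 1 and n >= 2:
        return 2 ** n
    if n == 2:
        return 4
    return F(m - 1, F(m, n - 1))

print(F(1, 5))  # 2^5 = 32
print(F(2, 3))  # F(1, F(2,2)) = F(1, 4) = 16
print(F(3, 3))  # 65536
```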
Graham and Gardner's 1971 description of the number
lgn(1) = 2↑↑↑↑↑↑↑↑↑↑↑↑3
lgn(2) = 2↑↑↑...↑↑↑3 (with lgn(1) up-arrows)
lgn(3) = 2↑↑↑...↑↑↑3 (with lgn(2) up-arrows)
. . .
lgn(7) = 2↑↑↑...↑↑↑3 (with lgn(6) up-arrows)
So Gardner and Graham came up with the definition for a larger upper
bound, which was popularised by Martin Gardner in 1977. This came to be
known as "Graham's number", but I call it the "Graham-Gardner number". Its
value is gn(64), where gn() is defined as follows:
The "curly braces" indicate that the number of up-arrows in each "layer"
is counted by the number immediately below, with 4 arrows in the bottom
layer.
Todd Cesere and Tim Chow have both proven that the "Graham-Gardner number"
is bigger than the Moser, and in fact even gn(2) is much bigger than the
Moser. Tim's proof is outlined on a page by Susan Stepney, here.
Conway and Guy's book The Book of Numbers [43] includes a description of a
"Graham's number" which is inexact, and which differs from the other
"Graham's number" definitions above. The exact text from [43] is:
Graham's number is
4↑↑ . . . ↑↑4, where the number of arrows is
4↑↑ . . . ↑↑4, where the number of arrows is
. . . et cetera . . .
(where the number of lines like that is 64)
It lies between 3→3→64→2 and
3→3→65→2.
(The last bit with the right-arrows is using Conway's Chained Arrow
Notation). The problem with this description is that it doesn't tell you
how many up-arrows is in the last of the "lines like that", though we know
there are 64 lines. Sbiis Saibian has a thorough page on Graham's number
[53] and calls this the "Graham-Conway number". He makes an educated guess
that the number of up-arrows on the last 4↑..↑4 line should be the same as
in the Graham-Gardner definition, i.e. 4 up-arrows:
Clearly the "Graham-Conway number" is larger than the others, and it's the
largest "Graham's number" I've seen, notwithstanding xkcd:
ga(g64) = AAUGHHHH!!!
Superclasses
Now let's take a break from the specific examples and functions for a
moment to describe another type of "number class" that is evident in the
way people describe large numbers. We'll call these superclasses.
(1) Let's start with 3↑3. This is 3^3 = 27, a number that is small enough
for anyone to visualise.
(2) Next, consider 3↑↑3 = 3④3. This is 3^(3^3) = 3^27 = 7625597484987, about
7 trillion. Even 7 billion would be too big for anyone to visualise, but
most people can understand it as being comparable to something familiar.
(For example, it is about 1000 times the world population and about a
tenth the number of cells in a human body.)
(3) 3↑↑↑3 or 3⑤3, is 3④(3④3), which is 3④7625597484987. That would be a
power tower of 3's, 7 trillion levels high. Now we are far beyond the
ability to understand — nothing in the real world is this numerous, and
even combinations of things (such as the number of ways to shuffle every
particle in the known universe) only require, at most, 4 or 5 levels of a
power-tower to express. However, even though the quantity is beyond
understanding, the procedure for computing it can be visualised: Start
with n=1. Now replace n with 3^n. Replace n with 3^n again, and so on —
repeat seven trillion times. This is a procedure that one can visualise
doing, and it can even be completed in a reasonable amount of time using a
fast computer and a suitable representation format such as level-index (be
advised: some rounding may occur).
(4) Now consider 3↑↑↑↑3, or 3⑥3, the first step in the "Graham-Gardner
number" definition. This is 3⑤(3⑤3), which is 3④(3④(3④(...④(3④3))...)),
a chain of 7 trillion 3's and hyper4 operators. Every one of these
requires performing the exponent operation an unimaginable number of
times. So now, the procedure is beyond human ability to visualise,
although it can be understood. Start with n=3. Now replace n with an
exponential tower of threes of height n. Repeat 3⑤3 - 2 more times, where
3⑤3 is the huge number whose calculation we visualised in the previous
paragraph! Martin Gardner, also writing about "Graham's number" (the one
here I call the "Graham-Gardner number"), said:
(5) We have just seen four numbers in a sequence: 3↑3, 3↑↑3, 3↑↑↑3, and
3↑↑↑↑3. Now consider the formula for the "Graham-Gardner number". Begin
with x = 3↑↑↑↑3, the number whose calculation procedure cannot even be
visualised — then increase the number of up-arrows from 4 to x. Then
increase the number of up-arrows to this new, larger value of x again.
Then — repeat 61 more times! Here's what Yudkowsky had to say:
Graham's number is far beyond my ability to grasp. I can describe it, but
I cannot properly appreciate it. (Perhaps Graham can appreciate it, having
written a mathematical proof that uses it.) This number is far larger than
most people's conception of infinity. I know that it was larger than mine.
...
Superclass 3: The number cannot be understood, but the procedure for
computing it can be visualised. (Example: 3↑↑↑3)
Notice that the first three (superclass 0, 1 and 2) are roughly equivalent
to class 0, class 1 and class 2 above. The division points between them
will be similar for most readers, but the upper range of superclass 2 will
probably vary from one reader to another. I think I "understand" numbers
about as big as the size of the visible universe, although I find it
harder on some days than on others. There are probably people who have an
understanding of such things as the size of the universe after cosmic
inflation, or the size that would be needed for a spontaneous origin of
life.
The first three superclasses can all be calculated fairly easily, using
familiar techniques or modest towers of exponents — in the latter case
sizes can easily be compared by looking at the number of levels in the
tower and the value of the number at the top. Practical tools like
hypercalc exist to actually work with these numbers.
be followed and understood fairly easily, but it probably isn't too easy
to remember.
I have found my own boundary between 4 and 5 varies widely from one day to
another. Clearly there are those (such as the people who worked on the
numbers we are about to discuss) for whom "superclass 4" extends far
higher than mine. I suggest it might be useful to define:
Here is Conway's notation, adapted from Susan Stepney's excellent web page
on large numbers:
a→b→...→x→1 = a→b→...→x
C. Conway is silent about the meaning of a two-element chain, but a
definition is necessary. The most logical definition combines the two
previous rules and assumes that they work in reverse:
a→b = a↑b = a^b
D. a→b→...→x→1→(z+1) = a→b→...→x
E. The last number in the chain can be reduced in size by 1 by taking the
second-to-last number and replacing it with a copy of the entire chain
with its second-to-last number reduced by 1:
a→b→...→x→(y+1)→(z+1)
= a→b→...→x→ a→b→...→x→y→(z+1)) →z
a→b→...→x→ (y+1) → 2
= a→b→...→x→ (a→b→...→x→y→2) →1
= a→b→...→x→ (a→b→...→x→y→2)
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2) →1 )
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→(y-1)→2))
and in general,
a→b→...→x→ (y+1) → 2
= a→b→...→x→ (a→b→...→x→ ( ... (a→b→...→x) ... ))
(with "a→b→...→x" nested y times)
a→b→...→x→2→ (z+1)
= a→b→...→x→ (a→b→...→x→1→(z+1)) →z
= a→b→...→x→ (a→b→...→x ) →z
a→b→...→x→3→ (z+1)
= a→b→...→x→ (a→b→...→x→2→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z
a→b→...→x→4→ (z+1)
= a→b→...→x→ (a→b→...→x→3→(z+1)) →z
= a→b→...→x→ (a→b→...→x→ (a→b→...→x→ (a→b→...→x) →z) →z) →z
and in general,
The parentheses can only be removed after the chain inside the parentheses
has been evaluated into a single number. Remember, the arrows are all
taken together at once. You can't group them to evaluate, you can only
reduce terms off the end using one of the rules above. a→b→c→d is not
equivalent to (a→b→c)→d, a→(b→c→d), a→(b→c)→d, a→b→(c→d) or
anything else like that.
The wikipedia article, Conway chained arrow notation, gives this set of
rules which is equivalent to the above but probably a bit easier to use:
4b. The chain X→a→b for any values of a and b where both are greater than
1, is equivalent to X→(X→(a-1)→b)→(b-1).
4c. By repeating rule 4b we can see that any chain X→a→b with a and b
both larger than 1 eventually turns into a chain X→c where "c" is the
value of the recursive construction X→(X→(...(X→1→1)...)→(b-1))→(b-1),
with the number of nested parentheses depending on a.
Regarding 1's
Any chain starting with 1 eventually ends up as 1→a→b for some a and b,
which is 1↑↑↑...↑↑↑a with b arrows. This is just a power of 1, so any chain
starting with 1 is equal to 1.
Any chain with a 1 in it someplace else, like a→b→c→1→d→e→f, reduces
to a→b→c→1→x for some x, and this then immediately reduces to a→b→c.
So if you have anything with a 1 in it, you can remove the 1 and
everything after it.
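The rules above (including the handling of 1's) make a complete evaluator for tiny chains. A sketch (chain is a hypothetical name; anything beyond the smallest examples fails to terminate in practice):

```python
def chain(c):
    """Evaluate a Conway chained-arrow expression given as a list of integers."""
    c = list(c)
    if 1 in c:                        # a 1 ends the chain: drop it and the rest
        c = c[:c.index(1)]
    if not c:
        return 1
    if len(c) == 1:
        return c[0]
    if len(c) == 2:
        return c[0] ** c[1]           # a→b = a^b
    *x, a, b = c
    # rule 4b:  X→a→b  =  X→( X→(a-1)→b )→(b-1)
    return chain(x + [chain(x + [a - 1, b]), b - 1])

print(chain([2, 3, 2]))  # 2↑↑3 = 16
print(chain([2, 2, 5]))  # any chain starting 2→2 is 4
print(chain([3, 3, 2]))  # 3↑↑3 = 7625597484987
```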
Some Correlations
The rules for Conway chained arrows express them in terms of Knuth's up-
arrow notation and recursion; we can also express some of the other
functions and numbers discussed above in terms of chained arrows:
A(m, n) = ( 2→(n+3)→(m-2) ) - 3
Now start with f^64(1), and note that we can reverse rule 4c above (with
"3→3" as the value of X), and show that:
f^64(27) = 3→3→(3→3→(...3→3→
Since f^64(1) < f^64(4) < f^64(27), the "Graham-Gardner number" must be
between 3→3→64→2 and 3→3→65→2. See here for a somewhat different
discussion of the same thing.
Partial Ordering
One may speculate on the general problem of determining which is the
larger of two chains a→b→...→c and x→y→...→z. We can begin to
answer that question for some of the shorter chains (most of which is
simply a restatement and re-ordering of the examples in my partial
ordering for Knuth up-arrows):
First, as noted above we can remove any 1's and anything coming after a 1.
Also, if the chain starts with 2→2 the whole thing can be replaced with 4.
For any a, the chain a→2→2 = a↑↑2 = a^a. So for these, the one with a
larger a is the larger chain.
If the first two items are the same, as in comparing a→b→c to a→b→d,
both are like a↑↑↑...↑↑↑b but one has more arrows. So if a and b are both
greater than 2, then the chain with the larger last element is larger.
Similarly if the first and last items are the same, as in comparing a→b→d
to a→c→d, we are comparing two things with the same number of arrows
(a↑↑↑...↑↑↑b versus a↑↑↑...↑↑↑c) and clearly the one with the larger middle
number is larger.
2→3→2 = 2↑↑3 = 2↑2↑2 = 2↑4, but 2→2→3 = 2↑↑↑2 = 2↑↑2 = 2↑2 = 4. Since
2→2→anything is 4, it's clear that if we have two 2's, the arrangement
with the larger-than-2 number in the middle is larger. Also, 3→2→2 = 3↑↑2
= 3↑3 = 27, so this is the largest of the three combinations of two 2's and
a 3.
The thing to notice is that the winners have the 4 at the end, and among
them the one with the 3 first is a lot larger. Let's try it again with
bigger numbers all around:
4→3→5 = 4↑↑↑↑↑3
= 4↑↑↑↑(4↑↑↑↑4)
= 4↑↑↑↑(4↑↑↑4↑↑↑4↑↑↑4)
3→4→5 = 3↑↑↑↑↑4
= 3↑↑↑↑(3↑↑↑↑(3↑↑↑↑3))
= 3↑↑↑↑(3↑↑↑↑(3↑↑↑3↑↑↑3))
Here it appears that 3→4→5 is going to end up being larger than 4→3→5.
This is a reflection of the general rule for partial ordering of the hyper
operator described above.
Using this notation makes it easier to make the operator itself a variable
or expression, and unlike using the hy() function it retains the look of
applying an operator (because the operator part is in the middle where it
"belongs"). For example:
a <1+2> b = a <3> b = a^b
gn(1) = 3 <6> 3
gn(2) = 3 <3 <6> 3> 3
Mega ≈ 256 <4> 2
Moser ≈ Mega <Mega+1> 2 ≈ (256 <4> 2) <256 <4> 2> 2
a <<1>> 2 = a <a> a
a <<1>> 3 = a <a <a> a> a
a <<1>> 4 = a <a <a <a> a> a> a
and so on
a <<1>> b is "a expanded to b".
If you wish, you might prefer to represent this operator in a way similar
to the higher hyper operators but with a 1 inside two circles (which I'll
enlarge here for clarity):
a⓵b
Since the two circles are hard to see in normal font sizes, I'll instead
use the hyper1 operator symbol ① inside a set of parentheses: a(①)b.
3(①)2 = 3 <3> 3
gn(1) = 3 <6> 3
3(①)3 = 3 <3 <3> 3> 3 = 3 <27> 3
gn(2) = 3 <3 <6> 3> 3
and so on
3(①)65
gn(64) = "Graham-Gardner number"
3(①)66
a <<2>> 2 = a <<1>> a
a <<2>> 3 = a <<1>> (a <<1>> a)
a <<2>> 4 = a <<1>> (a <<1>> (a <<1>> a))
and so on
a <<2>> b = a <<1>> (a <<2>> (b-1))
a <<2>> b is called "a multiexpanded to b".
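The "expanded" and "multiexpanded" operators can be sketched on top of a hy() helper. All names here are illustrative, and only a = 2 (where every 2 <n> 2 is 4) gives values small enough to actually evaluate:

```python
def hy(a, n, b):
    """Hyper operator: n=1 addition, n=2 multiplication, n=3 exponentiation, ..."""
    if n == 1:
        return a + b
    if n == 2:
        return a * b
    if n == 3:
        return a ** b
    if b == 1:
        return a
    return hy(a, n - 1, hy(a, n, b - 1))

def expand(a, b):
    """a <<1>> b: nest "a <a> a" b levels deep, e.g. a <<1>> 3 = a <a <a> a> a."""
    if b == 1:
        return a
    return hy(a, expand(a, b - 1), a)

def multiexpand(a, b):
    """a <<2>> b = a <<1>> (a <<2>> (b-1)), with a <<2>> 1 = a."""
    if b == 1:
        return a
    return expand(a, multiexpand(a, b - 1))

print(expand(3, 2))       # 3 <3> 3 = 3^3 = 27
print(multiexpand(2, 3))  # stays 4: any 2 <n> 2 is 4
```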
Note the similarity of the definition a <<2>> b = a <<1>> (a <<2>> (b-1))
to the corresponding definition for a hyper operator, e.g. a②b = a①(a②(b-
1)).
a <<<1>>> 2 = a <<a>> a
a <<<1>>> 3 = a <<a <<a>> a>> a
a <<<1>>> 4 = a <<a <<a <<a>> a>> a>> a
and so on
a <<<1>>> b is "a exploded to b".
a <<<2>>> 2 = a <<<1>>> a
a <<<2>>> 3 = a <<<1>>> (a <<<1>>> a)
a <<<2>>> 4 = a <<<1>>> (a <<<1>>> (a <<<1>>> a))
and so on
a <<<2>>> b is called "a multiexploded to b".
and so on:
You can see that this generalises easily to a function of four numbers, a,
b, the number inside the angle-brackets and a number telling how many
angle-brackets there are. This can be written as a function, f(a,b,c,d) or
something like that — but Bowers wanted to go further.
All of the operators defined thus far can be expressed as an array with up
to four elements, as follows:
[a] = a
[a,b] = a+b
[a,b,1] = [a,b] = a+b = a①b
[a,b,2] = a×b = a②b
[a,b,3] = a^b = a③b
[a,b,c] = a <c> b = hy(a,c,b)
[a,b,c,1] = a <c> b (combining a and b with the cth operator from the
added, multiplied, exponentiated, ... sequence)
[a,b,c,2] = a <<c>> b (combining a and b with the cth operator from the
expanded, multiexpanded, powerexpanded, ... sequence)
[a,b,c,3] = a <<<c>>> b (combining a and b with the cth operator from
the exploded, multiexploded, powerexploded, ... sequence)
1. For one- and two-element arrays, just add the elements. [a] = a and
[a,b] = a+b
2. If rule 1 does not apply, and if there are any trailing 1's, remove
them: [a,b,1,1] = [a,b]; [a,b,c,1] = [a,b,c], etc.
3. If neither previous rule applies, and the 2nd entry is a 1, remove all
but the first element: [a,1,b,c] = [a] = a.
4. If none of the previous rules applies, and the 3rd entry is a 1:
[a,b,1,c] becomes [a,a,[a,b-1,1,c],c-1]
5. Otherwise all four elements are greater than 1: [a,b,c,d] becomes
[a,[a,b-1,c,d],c-1,d]
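Rules 1 through 5 for four-entry arrays can be transcribed the same way as before (bowers4 is a hypothetical name; almost every array beyond the examples below is uncomputably large):

```python
def bowers4(arr):
    """Evaluate a Bowers array of up to 4 entries by rules 1-5 above."""
    arr = list(arr)
    if len(arr) <= 2:                    # rule 1
        return sum(arr)
    while arr and arr[-1] == 1:          # rule 2: remove trailing 1's
        arr.pop()
    if len(arr) <= 2:
        return sum(arr)
    if arr[1] == 1:                      # rule 3
        return arr[0]
    if len(arr) == 4 and arr[2] == 1:    # rule 4: [a,b,1,c] -> [a,a,[a,b-1,1,c],c-1]
        a, b, _, c = arr
        return bowers4([a, a, bowers4([a, b - 1, 1, c]), c - 1])
    if len(arr) == 3:                    # rule 5, three entries
        a, b, n = arr
        return bowers4([a, bowers4([a, b - 1, n]), n - 1])
    a, b, c, d = arr                     # rule 5, four entries
    return bowers4([a, bowers4([a, b - 1, c, d]), c - 1, d])

print(bowers4([2, 2, 2, 2]))  # anything starting [2,2,... is 4
print(bowers4([3, 2, 2, 1]))  # [3,2,2] = 3×2 = 6
```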
1a. f(s(s(a)),1,1,1) = a
1b. f(s(s(a)),s(s(s(b))),1,1) = f(s(s(s(a))),s(s(b)),1,1)
3. f(s(s(a)),1,s(s(b)),s(s(c))) = a
4. f(s(s(a)),s(s(s(b))),1,s(s(s(c)))) =
f(s(s(a)),s(s(a)),f(s(s(a)),s(s(b)),1,s(s(s(c)))),s(s(c)))
5. f(s(s(a)),s(s(s(b))),s(s(s(c))),s(s(d))) =
f(1,f(1,s(s(b)),s(s(s(c))),s(s(d))),s(s(c)),s(s(d)))
With a little effort you can see that anything starting with [2,2 is equal
to 4. To get anything bigger than 4, you have to have at least one 3. Here
is the simplest example:
3 <<2>> 2 is the same as [3,2,2,2]. In fact, the rules for the 4-element
array notation are equivalent to definitions of the extended operators.
The array [a,b,c,2] is equal to a <<c>> b; [a,b,c,3] is a <<<c>>> b; and
in general [a,b,c,d] is a <<<<<c>>>>> b with d sets of brackets around the
c.
Of course, Bowers wanted to extend the system, so the rules were designed
to work with arrays of arbitrary length. This is done by changing rules 4
and 5 to the following:
4. If none of the previous rules applies, and the 3rd entry is a 1: Define
the variables a,b,S,d and R so that the array is [a,b,S,1,d,R] where a,b
are the first two elements, [S,1] is the string of 1 or more 1's; d is the
first element bigger than 1 and [R] is the remaining part of the array.
Replace the array with [a,a,S',[a,b-1,S,1,d,R],d-1,R] where [S'] is a
string of a's of equal length as string [S].
I am fairly well convinced that Bowers is right in stating that the value
represented by the 5-element array [n,n,n,n,n] is at least as large as
n→n→n→...→n in the Conway chained-arrow notation, where there are n
items in the chain.
Formal Grammars
If the foregoing makes little sense, consider this concrete (but somewhat
non-rigorous) example. Select any well defined, "sufficiently powerful"
grammar G, consisting of a symbol-set of S symbols, finite in number, and
well-defined rules of what constitutes a syntactically valid string of
symbols specifying an integer. An example grammar that should be fairly
familiar uses the symbols:
0 1 2 3 4 5 6 7 8 9 + * ^ ( )
and the rules that these symbols are to be strung together to make a legal
set of additions, multiplications and exponentiations yielding an integer
result; in this example S = 15 because we listed fifteen symbols. Just to
be unambiguous, we'll require parentheses whenever two or more operators
appear in a string.
Since N is always larger than X + 2 we can define our new grammar G' just
by adding the symbol h, used as:
a h b
where a and b are valid strings, and interpreted as a④b. The resulting
function grows faster than G's m(x) function. In this new grammar, which we now
call G':
m'(3) is 9④9
m'(7) is 9④(9④9)
m'(11) is 9④(9④(9④9))
Now the process could continue to grammar G'' and so on. If you continue
the same idea indefinitely you just get higher hyper operators, but you
could also define new symbols using the ideas given above — the Ackermann
function, the Conway chained-arrow notation, etc. At each stage you have a
grammar Gx with its maximal function mx(n) to which the same idea can be
applied to generate another bigger function.
Perhaps the first and best-known such formulation is the busy beaver
problem. It achieves truly staggeringly huge numbers with the least
amount of symbols, rules, and input data of anything we've seen so
far; simply put, it's hard to beat. We'll see why soon, when we get to the
next table of large numbers.
The Busy Beaver Function was originally defined by Tibor Rado at Ohio
State in 1962. It is defined by specifying that you must start with a
blank tape (all 0's), with a finite number of symbols per position on the
tape (we usually use two: 0 and 1), and you're limited to N states in the
state machine. What is the greatest number of marks (1's) you can have it
write on the tape before stopping? A set of rules that never stops doesn't
count. The maximum number of marks for N states is BB(N). This is a
well-defined function and it grows very, very fast.
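For the very smallest N the search can actually be done by brute force. A sketch (busy_beaver is a hypothetical name): a step cap stands in for a genuine non-halting proof, which is fine for N ≤ 2 but not in general, since halting is undecidable:

```python
from itertools import product

def busy_beaver(n, step_limit=1000):
    """Brute-force BB(n) for tiny n: run every n-state, 2-symbol Turing
    machine and count the most 1's left on the tape by any machine that
    halts within the step cap."""
    best = 0
    # A transition is (symbol to write, move -1/+1, next state); state n = halt.
    options = list(product((0, 1), (-1, 1), range(n + 1)))
    for table in product(options, repeat=2 * n):
        tape, pos, state = {}, 0, 0
        for _ in range(step_limit):
            if state == n:                       # halted: count the 1's
                best = max(best, sum(tape.values()))
                break
            write, move, state = table[2 * state + tape.get(pos, 0)]
            tape[pos] = write
            pos += move
        # machines still running at the cap are treated as non-halting
    return best

print(busy_beaver(1))      # 1
print(busy_beaver(2, 50))  # 4
```

Note the machine count for N=1 is (4·1+4)^(2·1) = 64, matching the formula quoted below.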
In this table, the column labeled "Machines" tells how many Turing
machines of N states exist; this is (4N+4)^(2N) (the number that actually
have to be checked is lower). The column labeled "steps" shows how many
steps are taken by the current record-holder before halting. Here are some
older record setters and a more detailed description of the difficulty of
the problem. A good page for recent information about the problem is
Marxen's own page.
BB(N) is not "computable" in the formal sense — you cannot predict how long
it might take to count the number of 1's written by all Turing machines
with N states for arbitrary values of N. But for specific small values of
N, it is possible to do a brute-force search, with human assistance to
examine all the "non-halting" candidates and equip your program with
pattern-matching techniques to identify these as non-halting.
However, this takes massively greater amounts of work for each higher
value of N, and so the Busy Beaver function is unwieldy to calculate.
No-one has been able to complete the brute-force search for any value of N
greater than 4.
So the Busy Beaver function is not actually a good way to calculate big
numbers — for example, 10^10^27 isn't nearly as big as BB(27), but it's
bigger than any BB(N) value we've been able to calculate, and it can be
calculated much more quickly and easily.
The only way in which the Busy Beaver function "grows fastest" is when you
look at it in terms of the function's value compared to the amount of
information required to specify the formal system, the function, and the
function's parameter(s). This is a highly abstract concept and shouldn't
be considered important unless you are studying the theory of
deterministic algorithms specified within formal systems. To understand
this, you can imagine defining a precise set of rules for manipulating
symbols, which define all of the functions above (starting with addition
and going up through chained arrow notation, iteratively defining new
functions, and so on). Each new rule, symbol and function would take a bit
more information to define completely. If you wrote a computer program to
compute each function, each program would be a bit more complex. You could
also do the same thing by starting with a definition of the rules of the
Turing machine, then start with 1-state Turing machines and then increase
the number of states by adding a few extra bits of information per state.
It is generally believed that, as the amount of information used gets
higher, the Turing machine based system will produce higher function
values than any other formal system.
The Busy Beaver function, and anything we'll discuss after it, by
necessity must go beyond functions, algorithms and computability. Imagine
any sufficiently general definition of formalism (such as the Turing
machine) and then define a function f(N) giving the maximum value of the
results of its computation in terms of N, a suitably formalised
specification of the amount of information used to define the formal
system and the algorithm. f(N) will have a finite value for any finite N
and can be said to grow at a certain rate. Because all f(N) are finite for
all finite N, there exists a g() such that g(N) is greater than f(N) for
all N.
At this point in the discussion (or usually sooner) it becomes apparent
that there is additional knowledge and assumptions "outside the system".
An effort is made to identify these, define them precisely and add them
into the quantity N. After doing this, it is soon discovered that the
resulting formal system itself depends on things outside itself, and so
on. I have encountered many expositions, discussion threads, etc. over the
years, that begin with an optimistic determination to formalise the
problem and quantify exactly how large numbers can be derived from first
principles; they all have ended up somewhere in this jungle of
abstraction. Here is a relevant quote:
One can of course imagine Turing machines that have two or more oracle
functions, or a single oracle function that answers questions about
another type of oracle machine. If a "first order oracle machine" is a
Turing machine with an oracle that computes the Busy Beaver function for
normal Turing machines, then a "second order oracle machine" has an oracle
that computes the Busy Beaver function for first order oracle machines,
and so on.
Nabutovsky and Weinberger have shown[48] that group theory can be used to
define functions that grow as quickly as the Busy Beaver function of a
second-order oracle Turing machine.
Turing's work was closely related to, and produced largely similar results
to, that of Alonzo Church. The Church encoding can be used to represent
data and computation operations in the lambda calculus, and computation
occurs by beta-reducing assertions into results.
The lambda calculus is more powerful because (among other reasons) it
allows results to be expressed without the need to figure out how those
results might actually be accomplished. For this reason, in practical
computing problems it is more powerful; that is why so much good work has
been able to be done in LISP and Haskell, for example.
As I'll mention later ("by whichever specific axioms and means..."), there
are multiple approaches to this type of work. We'll eventually get to Peano
arithmetic, set-theory and first-order formal logic. These approaches
might be avoided for various reasons, including Gödelian incompleteness or
because they simply aren't needed for constructing the desired result.
In the case of large numbers like those we've managed so far, we need only
a three-symbol combinatory logic calculus combined with a simple (first-
order) oracle for its own version of the Collatz conjecture's halting
problem. This three-symbol calculus uses symbols I (identity), K
(constant) and S (substitution) on parenthesised expressions that are
equivalent to binary trees, i.e. every pair of parentheses contains two
entities which may either be a single symbol or a pair of parentheses that
itself contains two entities. The SKI combinator calculus or "SKI
calculus" is equivalent to the more commonly-known lambda calculus of
Alonzo Church.
Any study of lambda calculus defines symbols for needed terms, operations,
functions, etc. as it goes along (see the Wikipedia lambda calculus article for
examples). The SKI calculus might seem simpler in that we're just sticking
to these symbols and the parentheses, but it is equally powerful. In
particular, S, K, and I are just three of the commonly-abbreviated
combinators of the lambda calculus:
I := λx.x
K := λx.λy.x
S := λx.λy.λz.x z (y z)
The SKI calculus is close to being the simplest that is needed to provide
all of the power of the lambda calculus (in fact, only S and K are needed,
because ((SK)K) is equivalent to I). Anything in the SKI calculus can be
converted to an equivalent form in lambda calculus, and vice-versa [26].
Therefore (by the Church-Turing thesis), the SKI calculus itself is just
as powerful, from a theory of computation viewpoint, as Turing machines:
for every Turing machine, there is a parenthesised expression that is
valid in the SKI calculus and, when beta-reduced, produces a result that is
analogous to the final state (including tape) of the Turing machine.
Ix => x
Kxy => x
Sxyz => xz(yz)
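These three rules can be animated with a short program. The following is my own illustrative sketch (not from the text), performing leftmost-outermost SKI reduction in Python, with an application (fx) represented as the tuple (f, x):

```python
# Minimal SKI reducer (illustrative sketch). An expression is either one
# of the atoms 'S', 'K', 'I', or a tuple (f, x) meaning "apply f to x".

def step(e):
    """Do one leftmost-outermost reduction; return (new_expr, changed)."""
    head, args = e, []
    while isinstance(head, tuple):       # unwind the left spine
        head, arg = head
        args.append(arg)
    args.reverse()
    if head == 'I' and len(args) >= 1:   # Ix => x
        return rebuild(args[0], args[1:]), True
    if head == 'K' and len(args) >= 2:   # Kxy => x
        return rebuild(args[0], args[2:]), True
    if head == 'S' and len(args) >= 3:   # Sxyz => xz(yz)
        x, y, z = args[:3]
        return rebuild(((x, z), (y, z)), args[3:]), True
    for i, a in enumerate(args):         # no head redex: reduce an argument
        a2, changed = step(a)
        if changed:
            args[i] = a2
            return rebuild(head, args), True
    return e, False

def rebuild(head, args):
    """Re-apply a list of arguments to a head, left to right."""
    for a in args:
        head = (head, a)
    return head

def normalise(e, limit=1000):
    """Reduce to normal form, or give up after `limit` steps (a crude
    stand-in for the halting function h, which no program can compute)."""
    for _ in range(limit):
        e, changed = step(e)
        if not changed:
            return e
    return None

# (((SK)S)((KI)S)) reduces to the single symbol I:
print(normalise(((('S', 'K'), 'S'), (('K', 'I'), 'S'))))  # -> I
```

Note the step limit: deciding whether reduction truly halts is exactly what the oracle, introduced below, is for.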
As you might imagine, then, SKI calculus has its versions of the "halting
problem" and the "busy beaver function".
Alternatively (for the oracle that we'll get to soon), there is the
function O(S), which tells whether beta-reduction produces just the single
symbol "I". Three examples:

(((SK)S)((KI)S)) :
  S.K.S.((KI)S) → ((K((KI)S))(S((KI)S)))
  K.((KI)S).(S((KI)S)) → ((KI)S)
  K.I.S → I
h(S) = true, O(S) = true

(((SS)S)(SI)) :
  S.S.S.(SI) → ((S(SI))(S(SI)))
h(S) = true, O(S) = false
(length 6 symbols after 1 step; the maximum length possible from a
5-symbol start)

((((SS)S)(SI))(SI)) :
  S.S.S.(SI) → (((S(SI))(S(SI)))(SI))
  S.(SI).(S(SI)).(SI) → (((SI)(SI))((S(SI))(SI)))
  S.I.(SI).((S(SI))(SI)) → ((I((S(SI))(SI)))((SI)((S(SI))(SI))))
  I.((S(SI))(SI)) → (((S(SI))(SI))((SI)((S(SI))(SI))))
  S.(SI).(SI).((SI)((S(SI))(SI))) →
    (((SI)((SI)((S(SI))(SI))))((SI)((SI)((S(SI))(SI)))))
  S.I.((SI)((S(SI))(SI))).((SI)((SI)((S(SI))(SI)))) →
    ((I((SI)((SI)((S(SI))(SI)))))(((SI)((S(SI))(SI)))((SI)((SI)((S(SI))(SI))))))
  (etc.)
h(S) = false, O(S) = false (grows forever at an exponential rate)
For the "busy beaver function" of SKI calculus, we define the length of a
string in the obvious way:
l(S) := the number of symbols (I, K and S) in the string S
and the "output" of a string is the length of the result of beta-reducing
it:
o(S) := l(the normal form that S beta-reduces to)
bb(n) := the largest value o(S) of any string S for which l(S)=n and h(S)
is true
Some time after Rayo's Number came along (we'll get to it later), Adam
Goucher [54] was attempting to define a large number in a way like that of
the Lin-Rado Busy Beaver function. He recognised that this combinatory
logic system was only equal in power to Turing machines, and that its bb(n)
would grow comparably to the Lin-Rado BB(n). To make his definition bigger, he
added the oracle symbol O, which indeed makes for a faster-growing bb(n)
function. His description defines what is essentially first-order
arithmetic, i.e. a system like that coming out of the Peano axioms above,
along with a rich set of predefined things like operators + and ×, an
infinite alphabet of variables, etc. As such, it looks a lot like
Hofstadter's Typographical Number Theory [36]. Its computational
capabilities are equivalent to normal Turing machines and the SKI
calculus.
Adding the oracle operator O brings the system up to the level of second-
order arithmetic. Goucher then defined the Xi function Ξ(n):
Ξ(n) = the largest value o(S) of any string S for which l(S)=n and h(S) is
true
This is the same as bb(n) above, but with S allowed to include O symbols
in addition to S, K, and I.
Finally we'll spend some time on the third popular way to formulate
systems that express large numbers. This is closely related to formal
logic (ostensibly still part of the school of philosophy) and its use to
establish the foundations of mathematics. This was the approach of Frege,
Russell and Whitehead, and Gödel.
Rayo's number, which we'll get to eventually, appears to use a popular and
well-studied "structure": a set of objects similar to the von Neumann
universe; with finitary operations and relations common in set theory:
negation (~), conjunction (∧), set-membership (∈), equality (=), and
existential quantification (∃). Other well-known operations can be defined
in terms of these, possibly by implicit definitions, examples of which
follow.
Peano Arithmetic
In the late 1800s, Giuseppe Peano presented the Peano axioms, which can be
used to formalise arithmetic. The original Peano axioms comprise:
0 is a natural number.
For any two things a and b, if a is a natural number and a=b, then b
is a natural number (natural numbers are closed under the equality
relation.)
For every natural number n, there exists a natural number m such that
m=S(n).
For any two natural numbers m and n, m=n if and only if S(m)=S(n)
(the successor function is an injective function.)
For every natural number n, S(n)=0 is false: there is no natural number
whose successor is 0 (and so the successor function is not a bijection.)
{ 0, 1, 2, 3, 4, 5, 6, 7, .... , A, B, C }
would satisfy the first eight axioms if S(x) were defined such that:
the three elements A, B, and C form a cycle that is "disjoint" from the
other numbers starting with 0. If you start within this cycle, S(x) is
always defined and unique, as it is if you start anywhere in the normal
numbers, but there is no way to get from one to the other or back through
repeated application of the successor function. This type of situation is
eliminated by including the Axiom of Induction.
Regardless of how induction is handled, the Peano axioms can then be used
along with suitable definitions of addition and multiplication, and the
total ordering that comes from whichever type of induction axiom(s) were
used, to develop a theory of arithmetic in which it is possible to prove
such things as the fundamental theorem of arithmetic (that every natural
number except for 0 and S(0) has a unique factorisation into prime
factors; for outlines, see [51] or [55]); there is a fairly thorough
discussion of this sort of thing in [36].
As stated so far, we imagined an element "0", and elements "1", "2", "3",
etc. without establishing our right to add these symbols to the basic
symbols of set theory. Set theory itself includes only the null set ∅ = {}
and the ability to construct or define sets that include other sets. This
turns out to be enough, if we define the first natural number 0 as being
the null set ∅, and the successor function S(a) as a ∪ {a}, the union of a
with the set containing a as its only element. The succession of natural
numbers becomes:
0 = ∅ = {} ;
1 = ∅ ∪ {∅} = {∅} ;
2 = {∅} ∪ {{∅}} = {∅, {∅}} ;
3 = {∅, {∅}} ∪ {{∅, {∅}}} = {∅, {∅}, {∅, {∅}}} ;
4 = {∅, {∅}, {∅, {∅}}} ∪ {{∅, {∅}, {∅, {∅}}}} = {∅, {∅}, {∅, {∅}}, {∅,
{∅}, {∅, {∅}}}} ;
etc.
ℕ = {0, 1, 2, 3, 4, ...}
a < b ↔ a ∈ b
It is also useful to have a set of all natural numbers so that we can use
"a ∈ ℕ" to assert that a is a natural number (and not some other thing,
like an ordered pair, which will be needed later).
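This construction is easy to play with in code. Here is a small sketch of my own in Python, using frozenset so that sets can be elements of other sets:

```python
# Von Neumann naturals: 0 = {} and S(a) = a ∪ {a}.

def succ(a):
    """The successor S(a) = a ∪ {a}."""
    return a | frozenset([a])

zero = frozenset()        # 0 = {}
one = succ(zero)          # 1 = {0}
two = succ(one)           # 2 = {0, 1}
three = succ(two)         # 3 = {0, 1, 2}

# The numeral n has exactly n elements, and a < b is just a ∈ b:
assert len(three) == 3
assert one in three and two in three     # 1 < 3 and 2 < 3
assert three not in one                  # not (3 < 1)
print("von Neumann naturals behave as advertised")
```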
p is an element of q, and
q is an element of n
We have "and" (∧) and "not" (~); from these we get "or" in the standard
way. If p and q are predicates, then "p∨q" can be expressed by
"~((~p)∧(~q))"; or more formally:
p∨q := ~((~p)∧(~q))
p→q := ~((~q)∧p)
p↔q := (p∧q)∨((~p)∧(~q))
∀x(p(x)) := ~(∃y(~(p(y))))
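The propositional definitions can be checked by brute force over truth values; this short Python sketch (my own, for illustration) confirms each against the usual truth table:

```python
# The derived connectives, built only from ∧ and ~, as defined above.

def OR(p, q):   return not ((not p) and (not q))    # p∨q := ~((~p)∧(~q))
def IMP(p, q):  return not ((not q) and p)          # p→q := ~((~q)∧p)
def IFF(p, q):  return (p and q) or ((not p) and (not q))

for p in (False, True):
    for q in (False, True):
        assert OR(p, q) == (p or q)
        assert IMP(p, q) == ((not p) or q)
        assert IFF(p, q) == (p == q)
print("all four truth-table rows agree for each connective")
```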
If a and b are sets, then their union c=a∪b can be defined, and its
existence asserted, with:
"There exists (a set) C such that for all d, the statement 'd is an
element of C' is true if and only if d is an element of A or of B."
We are given no symbol for the null set, but we can define it and assert
its existence:
∃ ∅ ( ∀ a ( ~(a∈∅) ) )
"There exists (a set) ∅ such that for all a, a is not an element of ∅."
Forming Predicates
By whichever specific axioms and means, the methods of set theory and
formal logic are used to define more predicates, functions, and relations
on the natural numbers. We've already seen a few that are basic to set
theory: the equality and element relations, the successor function, the
predicate φ indicating membership in a certain set. These naturally give
us very basic number-theory operations of equality, the successor
function, and the ordering/comparison operator.
Not So Fast!
Peano arithmetic, in the forms that use the induction axiom of the second
order (i.e. the single axiom covering all possible predicates φ) might be
avoided for various reasons, including Gödelian incompleteness or because
they simply aren't needed for constructing the desired result.
Rayo's Calculus
To really surmount the Busy Beaver function, we'll go to the winner of the
rather colourfully-advertised Big Number Duel at MIT in 2007.
Formulas
a ∈ b (set-membership)
a = b (equality)
~ a (negation)
a ∧ b (conjunction)
∃ a : b (existential quantification)
When we get to Rayo's number there will be the concept of "being able to
name a number m in a certain number n of symbols".
All numbers are nameable: at the very least, one can assert that the
number is equal to one of the von Neumann cardinals like 1={∅}; this
assertion is done by the rather awkward construction:
1 exists ↔ ∃x2: (
(~∃x3: (x3∈x2)) ∧ "x2={}"
(x2∈x1) ∧ "x2<x1"
(~∃x3: (x3∈x1 ∧ (~x3=x2))) "there is no x3 in x1 other than x2"
)
The only way for that to be true is for x1 to have the value "{∅}"=1: the
first part is a sub-assertion forcing x2 to be "{}"=0 ; the next part says
that x1 is bigger than x2 (so must be 1 or higher); and the third part says
that x1 is not bigger than 1.
(I'll note here that it must seem unusual to some readers that it takes so
many symbols to assert the fact that the number 1 exists. But far more
symbols can be used, as seen in Whitehead and Russell's programme of
metamathematics "Principia Mathematica".)
The fact that all numbers are namable this way is of little use; we're
trying to get a specific, well-defined large number. We'll use a Rayo-like
approach, and define a predicate that says "a certain number is nameable
in an assertion of this type, with a limited number of symbols":
ST-namable-in(m, n) ↔
∃Φ(x1): {
Φ has fewer than n symbols ∧
∃s: s = Assign(m, x1) ∧
(∀t: Sat([Φ(x1)],t) → t = Assign(m, x1))
}
(It turns true at 37 because the definition of ST-namable-in(m, n)
requires that we can do it in "fewer than n symbols").
We can say the same thing in a different way by changing the first
argument:
Using the type of "existence of n" formula just shown, where we invoke
only the existence of zero and the successor function, is very
inefficient. With each successive assertion of "c is less than b but not
enough that another number d can get inbetween" we need more
subexpressions than we did the last time. To assert the existence of the
number n requires (9n^2+43n+20)/2 symbols. If a googol symbols were allowed,
the largest number we could assert would be about 4.714×10^49.
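That estimate can be reproduced with a few lines of integer arithmetic; this is my own check, using the symbol-count formula above:

```python
# Largest n whose existence-assertion fits in a googol symbols, given
# that asserting "n exists" takes (9n^2 + 43n + 20)/2 symbols.
from math import isqrt

googol = 10**100

def symbols_needed(n):
    return (9*n*n + 43*n + 20) // 2

n = isqrt(2 * googol // 9)              # first guess: sqrt(2·googol/9)
while symbols_needed(n + 1) <= googol:  # nudge up if we undershot
    n += 1
while symbols_needed(n) > googol:       # nudge down if we overshot
    n -= 1
print(n)  # about 4.714×10^49
```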
But the subset of set theory and formal logic that Rayo chose can do more
than just string together a bunch of a∈b relations. It is, in fact "Gödel-
complete" in the sense that an entire Peano arithmetic can be built upon
it.
Since we're using set theory, functions and relations can be expressed as
(infinite) sets, and set-membership asserts a relation. For example, the
set P (for "Plus") would be the set of all valid addition relations,
consisting of ordered triples; and it might start out: Plus = { (0,0,0),
(0,1,1), (1,0,1), (0,2,2), (1,1,2), (2,0,2), (0,3,3), ...} where each of
those digits is a von Neumann cardinal like 2={{},{{}}}. We don't have to
spend an infinite number of symbols to define P explicitly; we could "just"
say:
b1+1=a1 ∧ b2=a2, or
b1=a1 ∧ b2+1=a2
It's unwieldy, but it works (and again, it's very close to how addition is
built up in the Gödel construction for demonstrating incompleteness.) We
used things like "for all" (∀) and "if and only if" (↔) that are not in
the allowed 7 symbols, but that's okay because these can be defined in
terms of the others. (For example, ∀x:P(x) is equivalent to ~(∃x:~P(x)).)
I've skipped over how we make ordered triples: we have parentheses but
have no comma "," to construct ordered-tuple literals like this; instead
"(a,b,c)" is shorthand for a set like {{a},{a,{b}},{a,{b},{{c}}}}.
Things like "the second item of b" have to be expressed in terms of more
temporary variables.
a+b=c ↔ (a,b,c)∈P
If you look back through these pages, you can see that all of the fast-
growing functions have been defined this way: from tetration to the
Ackermann function to chained arrow notation to the Bowers Extended Array
notation, everything is defined by subtracting 1 from one number and
applying an operation (either the one being defined, or a previously-
defined operation) to another number.
So, this gives us a more efficient way to prove numbers are ST-namable in
a googol symbols. Suppose that the definition of P outlined above took
1000 symbols, and we didn't define M or any other "operators". Given the
primitive (as above) assertion that "2 exists", we can combine it with the
definition and application of addition as follows:
We are iterating addition in the same way that our previous approach
iterated the successor function. If the definitions needed to set up
ordered triples and P take 1000 symbols, and each "a+b=c" takes another
1000 symbols, then the number 2^n could be expressed in a bit over
1000n+1000 symbols. With a googol symbols, we can now assert the existence
of numbers as high as 2^(googol/1000) ≈ 10^(3.01×10^96). We've made it to
class 3! Woo hoo!!
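A quick logarithm check (my own arithmetic, not from the text) confirms the size of that last number:

```python
# log10 of 2^(googol/1000) is (10^100 / 1000) × log10(2) ≈ 3.01×10^96,
# so the number itself is about 10^(3.01×10^96) -- a class-3 number.
from math import log10

doublings = 10**100 // 1000       # roughly how many "a+a=c" steps fit
exponent = doublings * log10(2)   # log10 of the resulting number
print(exponent)                   # about 3.01e96
```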
Does this define a single number x? We don't know. The largest known
Mersenne prime is less than 10^10^10 and probably will be for some time.
There might or
might not be a finite number of Mersenne primes (that's an unproven
conjecture), but if there are a finite number of them there is a largest
one, and if that largest one were the only one greater than 10^10^100, then
Φ(x) would be true for only one x. The trouble is that, given the
difficulty of proving the conjecture, it might take an infinite amount of
computation to confirm that there is only one such value.
This issue is addressed in the Busy Beaver function by assuring that only
halting Turing machines are allowed to be considered as candidates: the
number x must be computed in a finite number of steps and the machine must
then stop and not consider any higher values for x.
In formal logic, we need the formula Φ(x) to be true for some x, and we
also need the system to somehow verify this, through a finite number of
deductive steps.
I'm not going to try to explain all the concepts and methods fully, but
several articles in the Stanford Encyclopedia of Philosophy should be
helpful:
The basic terminology and symbols of formal logic are described in the
Classical Logic article.
Tarski's Truth Definitions discusses the task of expressing "truth
predicate" True(x) in terms of a formula Φ(x). It also discusses "variable
assignments that satisfy formulas". The entries on model theory and on
quantifiers and quantification discuss these ideas more generally, and the
latter links them to the definition of truth.
Rayo refers to "second order" and plural quantification in the definition
of Sat([Φ],s).
Rayo's Number
I still don't understand quite how this works, though most of the needed
explanation is in the links I gave at the end of the previous section, and
in the further readings list at the bottom of the Big Number Duel page at
MIT. My general impression is that most of Rayo's definition is explained
by the need to use methods of model theory to formalise the definition of
a "namable number" and ensure that the (second-order) system determines
the truth of the existence and uniqueness of any number that is so
namable, through arithmetisation of the first-order formulas, and
variable-assignments, formulas and satisfiability predicates in the
second-order system. I will describe the building blocks of Rayo's number
as nearly as I can.
Variable Assignments
This isn't quite what Rayo's construction does, but it's conceptually
similar.
Gödel-Coding
'∈' = 2, '=' = 3, '(' = 5, ')' = 7, '~' = 11, '∧' = 13, '∃' = 17 ; x1 = 19,
x2 = 23, x3 = 29, ...
A formula is coded by raising the successive primes 2, 3, 5, ... to the
codes of its successive symbols and multiplying. For example:
[x1∈x2] = 2^19 × 3^2 × 5^23
where the exponents {19, 2, 23} correspond to the three symbols {x1, ∈,
x2}.
Assignment
Variable assignments are so-called because they are the result of taking a
set of expressions with a free variable and replacing that free variable
with something that has a specific definition. (This type of thing is
also used in Gödel's incompleteness proofs.)
Assign(m, x1) = s ↔ s is a variable assignment in which every x1 is
changed to m
R() has to be an infinite set of ordered pairs, but let's give a finite
example and call it r().
where P() and Q() are predicates. It states that there is at least one
thing with the property P(), and all things with the property Q() also
have the property P().
where P and Q are sets. It states that P is a non-empty set and that Q is
a subset of P.
Definition of Sat()
Sat([φ],s) :=
∀ R {
  {for any (coded) formula [ψ] and any variable assignment t,
  (R([ψ],t) ↔
    ([ψ] = 'xi ∈ xj' ∧ t(xi) ∈ t(xj)) ∨
    ([ψ] = 'xi = xj' ∧ t(xi) = t(xj)) ∨
    ([ψ] = '(~θ)' ∧ ~R([θ],t)) ∨
    ([ψ] = '(θ∧ξ)' ∧ R([θ],t) ∧ R([ξ],t)) ∨
    ([ψ] = '∃xi (θ)' ∧, for some xi-variant t' of t, R([θ],t'))
  )
  } →
R([φ],s) }
Rayo-nameability
Rayo-namable(m) ↔
∃Φ(x1): {
∃s: s = Assign(m, x1) ∧
(∀t: Sat([Φ(x1)],t) → t = Assign(m, x1))
}
But the fact that all numbers are namable this way is of little use; we're
trying to get a specific, well-defined large number. So Rayo expresses
"the largest number that satisfies an assertion of this type, but with a
limited number of symbols":
Rayo-namable-in(m, n) ↔
∃Φ(x1): {
Φ has fewer than n symbols ∧
∃s: s = Assign(m, x1) ∧
(∀t: Sat([Φ(x1)],t) → t = Assign(m, x1))
}
Rayo's Number
Having gotten all that out of the way, the rest is simple. The "busy
beaver function" for this assertion-schema in the von Neumann universe,
sometimes called FOST(), is:
FOST(n) = the largest m for which Rayo-namable-in(m, n) is true
and Rayo's number itself is FOST(10^100).
BIG FOOT
In the years since the Big Number Duel, an online wiki / forum has built
up, centred around the "Googology wiki" on wikia.com.
Not satisfied with Rayo's number, the self-named "googologists" have made
several attempts to top it. Re-use of earlier ideas does not count; any of
these:
FOST(10^100) + 1
10^FOST(10^100)
FOST(10^10^100)
FOST(Mega)
FOST(Graham)
FOST(googol→googol→googol→googol→googol)
FOST(Bowers{googol,googol,(googol),googol})
FOST(BB(googol))
FOST(FOST(googol))
FOST^10(googol)
would not be considered a new champion, because (under the rules of the Big
Number Duel that spawned Rayo's number), every new champion must use a
significantly new technique.
They start by augmenting Rayo's calculus with the [ and ] symbols. These
are not the same as Rayo's [ ] (which were for Gödel-coding of a formula);
rather
Ordα is the least oodinal which is greater than every ordinal which can be
defined in the language of FOOT with parameters of rank below Ordα, if we
allow it to use constant symbols Ordv(β) for every β<α.
There is a lot more needed to define what "oodinals" are and how to work
with them.
They are not (necessarily) sets, but they have a lot in common with ranked
set theory e.g. the von Neumann universe. You can read all about it here:
First-order oodle theory (http://snappizz.com/foot). You might want to
start with the "Higher order set theory" article by the same author.
The Frontier
Beyond all the finite numbers are transfinite numbers and infinities. Once
we go beyond finite numbers, we enter an area where it is essential to
define exactly what theory of numbers we're working in.
Depending on what type of number theory you're looking at, there may or
may not be transfinite numbers and there may or may not be a plurality of
infinities. These differences result from the use of different axioms and
rules for deriving results. Different axioms and rules lead to different
results, including different answers to the question "what lies beyond all
the integers?" Because different systems are useful for different things
and none can generate all useful results (due to incompleteness as
demonstrated by Gödel) we end up with several different 'right answers' to
the question. None is the 'best' answer, but some are more popular than
others. (The term transfinite itself is a result of this — it was Cantor's
effort to avoid using the term infinite for certain quantities that were
definitely not finite, but did not share all the properties of what he
considered truly infinite, and now called "Absolute Infinite".)
Ordinal Infinities
you counted the integers by taking all the evens first, and then the odds:
infinity even numbers plus infinity odd numbers; the total is just
infinity, not "two times infinity". All you did was reorder the numbers;
that never changes how many there are.
This infinity is also the size of an infinite Euclidean geometrical
object, like the length of a line, the area of a plane, etc. when measured
in terms of another finite unit such as a line segment. Here we are
referring to "size" in terms of measure, where specific distances are
taken into account, not in terms of order, which is the number of elements
in a set and therefore the number of points in a geometric object.
The Ordinal "Countable" Infinities
"omega" = ω = 1 + ω = 2 × ω = ℵ0
ω + 1
ω + 2
ω + ω = ω × 2
ω + ω + 1
ω × 3
ω × ω = ω^2
ω^2 + 1
ω^2 + ω
ω^3
ω^3 + ω^2 × 3 + ω × 3 + 1
ω^ω = 1 + ω + ω^2 + ω^3 + ω^4 + ω^5 + ...
ω^ω + 1
ω^ω + ω
ω^ω + ω^2
ω^ω + ω^3
ω^ω + ω^ω = ω^ω × 2
ω^ω × ω = ω^(ω+1)
ω^(ω+1) + ω
ω^(ω+1) + ω^ω
ω^(ω+2)
ω^(ω×2)
ω^(ω^2)
ω^(ω^3)
ω^ω^ω
ω^ω^ω + 1
ω^ω^ω × 2
ω^ω^ω^ω
ω^ω^ω^ω^ω
ω^ω^ω^ω^ω^...
ε0 = ω^ω^ω^ω^ω^... (with ω omegas)
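The absorption rules behind this list (1 + ω = ω, and yet ω + 1 > ω) can be checked mechanically for ordinals below ω^ω. The sketch below is my own illustration, storing an ordinal in Cantor normal form as a descending list of (exponent, coefficient) pairs:

```python
# Ordinal addition below ω^ω. ω is [(1, 1)], 1 is [(0, 1)],
# ω^2 + ω×3 is [(2, 1), (1, 3)], and 0 is the empty list.

def ordinal_add(a, b):
    """a + b: terms of a below b's leading exponent are absorbed by b."""
    if not b:
        return list(a)
    lead = b[0][0]
    kept = [(e, c) for (e, c) in a if e > lead]
    match = [c for (e, c) in a if e == lead]
    if match:
        # equal leading exponents: the coefficients add
        return kept + [(lead, match[0] + b[0][1])] + list(b[1:])
    return kept + list(b)

one, omega = [(0, 1)], [(1, 1)]
assert ordinal_add(one, omega) == omega              # 1 + ω = ω
assert ordinal_add(omega, one) == [(1, 1), (0, 1)]   # ω + 1 > ω
assert ordinal_add(omega, omega) == [(1, 2)]         # ω + ω = ω × 2
print("ordinal addition is not commutative")
```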
Epsilon-Null
Somewhere along this sequence or perhaps after it (it is unclear from the
sources I have access to) are various higher "epsilons" ε1, ε2, εω, ε_ε0
and so on, and then a quantity Cantor calls alpha, which represents the
first quantity that cannot be handled by the epsilon sequence [7, 11].
All of this is possible because of the original axioms and rules of the
ordinal system, which state that the order you count things in makes a
difference. But what if you're allowed to reorder the items when counting
them? That would amount to switching to a cardinal counting system. When
this is done, all of these ordinal infinities turn out to be equal! They
are all equivalent to the cardinal ℵ0. For that reason, Cantor put all the
ordinal infinities listed so far in a "class" and labeled that class ℵ0.
Definition of ℵ1
In geometric set theory systems, which are cardinal systems, the ℵ-series
is not used (although ℵ0 may occasionally be used or implied by the use of
the term "countable"). In these systems, the next infinity after the
"countable" is c, called the "order of the continuum" or sometimes simply
the continuum. One also sees reference to a continuum, in which case the
reference is to a geometric/topological set that has c elements, that is
to say, a geometric object containing c points. Examples of a continuum
are a straight line, or the real numbers.
c = 2^ℵ0
Imagine a line segment of length 1 and an infinite line. The line segment
has a midpoint Q_0 and the line has an arbitrary centre point P_0. Now,
every point P on the line has a coordinate C_P corresponding to that
point's distance from P_0, positive on one side and negative on the other.
Every point Q on the line segment has a similar coordinate C_Q. To show
that the two objects (the line and the line segment) have the same number
of points, all we need to do is to supply a mapping function such as the
following:
C_Q = arctan(C_P) / π
Each point P has a unique coordinate C_P, and each value for C_P generates
a unique value for C_Q by this formula, which corresponds to a unique
point Q on the line segment.
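The mapping is easy to check numerically; this is a sketch of my own using Python's math module:

```python
# C_Q = arctan(C_P)/π maps the whole line into the open interval
# (-1/2, 1/2), one-to-one, with inverse C_P = tan(π·C_Q).
from math import atan, tan, pi

def to_segment(cp):
    return atan(cp) / pi

def to_line(cq):
    return tan(pi * cq)

for cp in (-1e6, -1.0, 0.0, 2.5, 1e6):
    cq = to_segment(cp)
    assert -0.5 < cq < 0.5                  # always lands inside the segment
    assert abs(to_line(cq) - cp) <= 1e-6 * max(1.0, abs(cp))  # invertible
print("every sample point round-trips")
```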
The continuum is the number of real numbers. Real numbers include anything
that has a decimal point and a finite (or infinite) number of digits, with
a repeating or nonrepeating decimal pattern. Most real numbers have an
infinite number of digits after the decimal point and no repeating
pattern.
Real numbers can be used to show that the number of points on a plane is
equal to the number of points on a line. For each point on a plane, there
is a unique pair of coordinates, such as (2.21751..., 6.40861...) or
(9.40589..., 3.25361...), etc. Take the digits of the two coordinates and
form a single number by interleaving the digits: one digit from the first
coordinate, then one digit from the second, then another digit from the
first coordinate and another from the second, and so on. All the digits
get used once, none get duplicated or thrown away. The result is a single
real number that is different from the number you would get from any other
pair of coordinates:
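On (truncated) digit strings, the interleaving and its inverse look like this; a sketch of my own:

```python
# Interleave the digits of two coordinates into one string, and split
# them back out; nothing is lost and nothing is duplicated.

def interleave(x_digits, y_digits):
    """x1 y1 x2 y2 x3 y3 ... for equal-length digit strings."""
    return ''.join(a + b for a, b in zip(x_digits, y_digits))

def deinterleave(z_digits):
    """Even positions rebuild x, odd positions rebuild y."""
    return z_digits[0::2], z_digits[1::2]

z = interleave('221751', '640861')   # the example coordinates above
print(z)                             # -> 262410785611
assert deinterleave(z) == ('221751', '640861')
```

A full proof has to treat decimals with two representations (0.4999... = 0.5000...) carefully; this sketch ignores that detail.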
-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, ...
-2, 1, 2, 4, 5, 7, 8, 10
0, 2, 4, 5, 7, 8, 10, 16, 17, 19, 22, ...
1, 2, 4, 8, 16, 32, 64, 128, 256, ...
1, 3, 4, 7, 10
2, 3, 5, 7, 11, 13, 17, 19, 23, 29, ...
where there are a finite or infinite number of integers and each one is
bigger than the one before it. The number of possible sequences is
infinite, and can be proven to be bigger than the number of integers. It
can also be proven to be equal to the number of real numbers with another
one-to-one mapping (here, we're skipping a detail that is necessary to
avoid problems with integer sequences that have no definite start, as for
example the set of negative even integers):
Starting with any real number X, define its simple continued fraction
to be the expression of the form:
A + 1 / (B + 1 / (C + 1 / (D + ... ) ) )
where A, B, C, D, ... are integers, and write its sequence of terms as:
[ A, B, C, D, ... ]
For each simple continued fraction there is exactly one such sequence and
each simple continued fraction gives a different sequence.
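The expansion is easy to compute for rationals (irrationals simply never terminate); the following sketch is my own, using exact fractions:

```python
# Simple continued fraction [A, B, C, ...] of a number:
# x = A + 1/(B + 1/(C + ...)).
from fractions import Fraction

def continued_fraction(x, max_terms=30):
    terms = []
    x = Fraction(x)
    for _ in range(max_terms):
        a = x.numerator // x.denominator   # integer part (floor)
        terms.append(a)
        x -= a
        if x == 0:
            break                          # rationals terminate
        x = 1 / x                          # continue with the reciprocal
    return terms

print(continued_fraction(Fraction(23, 8)))   # -> [2, 1, 7]
```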
Each ordered sequence gives exactly one ascending sequence and each
ordered sequence gives a different ascending sequence.
There are other ways to prove that c is the order of the power set of the
integers; Cantor proved it in a manner similar to that discussed here.
The Continuum Hypothesis
After developing the ordinal and cardinal theories to this point, Cantor
could not determine whether c was distinct from ℵ1 or equal to it. Cantor
tried for a long time to discover a set of points that had more than ℵ0
points but less than c (if found, he could say that this set had ℵ1 points,
and c would be ℵ2 or larger). He couldn't find such a set, and then
proposed what is now called the continuum hypothesis:
Gödel showed in 1940 that Cantor could not have disproved the continuum
hypothesis using his axioms (which are now called "Zermelo-Fraenkel set
theory with the Axiom of Choice", often abbreviated ZFC), and Paul Cohen
showed in 1963 that Cantor could not have proved it either. For this work,
Gödel and Cohen both did major new work in the field of metamathematics,
which involves "modeling" mathematical axiom-proof systems with "bigger"
systems.
So, at least in standard ZFC set theory, the continuum hypothesis must be
declared to be true or false using a new axiom, or left undecided (as
Cantor did). You get a different system of infinities each way. By the
1990's, most mathematicians preferred to define the continuum hypothesis
as being false (mostly because of the usefulness of the results that can
be derived). The implication is that (if you follow the preference of the
mathematicians) c is greater than ℵ1.
Let S1 be a set with ℵ0 elements (like the set of integers)
Let S2 be the set of all countable ordinals
Let T be a set with c elements (like the set of points on a line)
Let T' be the set of all subsets of T (the power set of T).
Let T'' be the set of all subsets of T' (the power set of T').
etc.
In cardinal set theories it can be shown that there are no infinities
"in between" these. Any definition of an infinite quantity can be shown to
be equivalent to a member of the power set sequence. Since the continuum
hypothesis is taken to be false, c cannot be equivalent to ℵ1, but it could
be ℵ2 or one of the higher ones. All of the higher power sets would then
coincide in the same way. For example, if c were ℵ2, then 2^c would be ℵ3,
and so on.
This idea of only three useful infinities is hauntingly reminiscent of the
(perhaps mythical) "one, two, three, many" of the Hottentots, bringing us
full-circle back to class-0 numbers.
Inaccessible Infinities
In each of these processes, imagine the infinity you "get to" as you carry
the process on "forever". This includes any algorithmic process in which
the number of steps is finite, working up to such things as ℵ_BB(n), where
BB() is the busy beaver function and n is some gratuitous huge integer.
If you stay "within the system" while doing this process, by sticking to
well-defined symbols, rules, axioms, etc. you can create more and more
infinities, but you will always be working within a formal system of
number theory or set theory.
However, all number theories and set theories are incomplete. It has been
shown that by going outside the system you can demonstrate the existence
of "inaccessible cardinals" or "inaccessible infinities", which are bigger
than all of those producible through formal systems. This result is
analogous to the computation-theory result of the uncomputable functions.
Note. I try to explain things at least a little bit, and to give suitable
references. I definitely do not follow my own First Law of Mathematics. If
you suggest an improvement for these pages, I'll probably be able to do
something to make it better — just let me know (contact links at the
bottom of the page).
Footnotes
2 : http://www.miakinen.net/vrac/nombres#lettres_zillions Olivier
Miakinen, Écriture des nombres en français, (web page) 2003.
3 :
http://web.archive.org/web/20061021030550/http://www.io.com/~iareth/bignum.html
Gregg William Geist, Big Numbers (web page), 2006 (Latin number
names, some of the large examples like centumsedecillion)
7 : Conway and Guy, The Book of Numbers. See bibliography entry [43]
below.
12 : http://www.toothycat.net/wiki/wiki.pl?CategoryMaths/BigNumbers
Douglas Reay, commenting on discussion of formal theory of computation,
toothycat.net wiki (created by Sergei and Morag Lewis), CategoryMaths,
BigNumbers.
13 : http://www.math.ohio-state.edu/~friedman/manuscripts.html Papers by
Harvey M. Friedman. In the "preprints, drafts and abstracts" is Enormous
Integers in Real Life, 2000, which summarises several methods of producing
large integers, related to combinatorics and theory of computation.
15 : http://math.eretrandre.org/tetrationforum/showthread.php?tid=184
Henryk Trappman and Andrew Robbins, Tetration FAQ (online document)
16 : Martin Gardner, The Colossal Book of Mathematics: Classic Puzzles,
Paradoxes, and Problems, W. W. Norton (2001), ISBN 0393020231.
17 : Knuth, Donald E., Coping With Finiteness, Science vol. 194 n. 4271
(Dec 1976), pp. 1235-1242.
18 : http://www.stars21.com/translator/english_to_latin.html InterTran
English-Latin Translator, via Stars21.
20 : http://www.numericana.com/answer/units.htm#prefix Gérard P.
Michon's Numericana, Final Answers — Measurements and Units. (Has lots of
details about real and bogus SI prefixes) (formerly at
http://home.att.net/~numericana/answer/units.htm)
22 : OED [42] does not cite billion in the superlative sense, but milliard
was used in the superlative sense as far back as 1823.
An example of such a discussion is the long-running xkcd forum discussion
thread "My number is bigger!". This thread was begun on the 7th of July,
2007, and remained continually active for nearly three years (last checked
May 2010). The initial message began the competition with 9000; the first
respondent offered 3.250792...×10^548; several class 2 replies brought it up
to 3.454307...×10^1661; then it jumped to 10^10^10, 10^10^10^10, 10↑↑512, and
10↑↑↑3 = 10↑↑(10↑↑10). All of this was within the first 24 hours. Up-arrow
notation was no longer of any use by the third day of the discussion, and
the participants then began defining recursive functions and discussing
proofs. It continued along those lines for the following three years.
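The up-arrow values quoted above follow Knuth's recursive definition of the hyperoperations. As a rough illustration (not part of the original thread), here is a direct Python transcription of that definition — fine for tiny arguments, though anything like 10↑↑512 is astronomically beyond what could ever be evaluated:

```python
def arrow(a, n, b):
    """Knuth's up-arrow a ↑^n b: n=1 is exponentiation,
    n=2 is tetration (↑↑), n=3 is pentation (↑↑↑), etc."""
    if n == 1:
        return a ** b          # base case: a single arrow is a^b
    if b == 0:
        return 1               # a ↑^n 0 = 1 by convention
    # a ↑^n b = a ↑^(n-1) (a ↑^n (b-1))
    return arrow(a, n - 1, arrow(a, n, b - 1))

print(arrow(10, 1, 3))   # 10^3 = 1000
print(arrow(2, 2, 3))    # 2↑↑3 = 2^2^2 = 16
print(arrow(2, 3, 2))    # 2↑↑↑2 = 2↑↑2 = 4
```

Even 3↑↑↑3 already overwhelms this recursion (and any computer), which is exactly why the thread's participants had to abandon up-arrow notation for faster-growing recursive functions.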
Bibliography
[28] Kasner, Edward and Newman, James, Mathematics and the Imagination,
Penguin, 1940
[29] Gamow, George, One, Two, Three... Infinity: Facts and Speculations of
Science, Viking, 1947 (reprinted in paperback by Dover, 1988).
This was an early source for me and unfortunately gave me the impression
that the continuum hypothesis had been proven. This figure implies that
the ℵn series of infinities is the complete set of infinities:
Gamow p. 23, implying CH
If these are really "the first three infinite numbers", then there can be
nothing between ℵ0 and ℵ1, and that's CH.
[31] George Miller, The magical number seven plus or minus two: some
limits on our capacity for processing information. The Psychological
Review 63 (1956), pp. 81-97
[32] Davis, Philip J., The Lore of Large Numbers, New York: Random House,
1961
[33] Dmitri Borgmann, Naming the numbers. Word Ways: the Journal of
Recreational Linguistics 1 (1), pp. 28-31, 1968. Cover and contents are
here and article is here.
[35] R.L. Graham, B.L. Rothschild, Ramsey's Theorem for n-Parameter Sets.
Transactions of the American Mathematical Society 159 (1971), 257-292.
(Another PDF is here).
[42] The Compact Oxford English Dictionary (Second Edition), 1991. This is
the version that has 21473 pages photographically reduced into a single
book of about 2400 pages.
[43] John Horton Conway and Richard Guy, The Book of Numbers, Springer-
Verlag, New York, 1996. ISBN 038797993X.
pp. 266-276 (Cantor ordinal infinities)
pp. 277-282 (cardinal infinities and the continuum)
[46] Chris Bird, Proof that Bird's Linear Array Notation with 5 or more
entries goes beyond Conway's Chained Arrow Notation, 2006. Available here
(and formerly at
uglypc.ggh.org.uk/chrisb/maths/superhugenumbers/array_notations.pdf)
[47] Harvey Friedman, n(3) < Graham's number < n(4) < TREE(3), message to
FOM (Foundations of Mathematics) mailing list.
[51] N. Mohan Kumar, Construction of Number Systems (for Math 310 course
at Washington University in St. Louis), 2011.
[52] John Baez, Google+ post, 2013 Jan 11 (See also this mathoverflow
question)
[53] Sbiis Saibian, 3.2.10 Graham's Number, web article, 2013 Feb 15.
Other Links
Aaronson, Scott, Who Can Name the Bigger Number?, essay about how to win
the often-contemplated contest; covers many of the topics discussed here.
Bird, Chris, Array Notations for Super Huge Numbers, 2006. (An older
version of his work, which includes much of the material found here).
----, Super Huge Numbers, 2012. There are several sections, with the
simplest and slowest-growing functions first. The initial chapter "Linear
Array Notation" is roughly comparable to Bowers arrays; the other chapters
define higher and higher recursive functions.
Bowers, Jonathan, Big Number Central.
Rucker, Rudy, Infinity and the Mind, 1980. (ordinal infinities: the
relevant chapter was reproduced here the last time I checked.)
Steinhaus, Hugo, Mathematical Snapshots (3rd revised edition) 1983, pp. 28-
29.
----, Graham's Number (referring to the more well-known version, i.e. the
"Graham-Gardner number")
Acknowledgments
To Morgan Owens (packrat at mznet gen nz) for news of the Knuth -yllion
names and the Busy Beaver function