
Discrete Math Boot Camp – Cullen Schaffer

1. Compute 𝟏 + 𝟐 + 𝟑 + ⋯ + 𝟏𝟎𝟎.

A good idea is to add the sum to itself backwards:


1 + 2 + 3 + ⋯ + 100 +
100 + 99 + 98 + ⋯ + 1
Then, adding each vertical pair of numbers, we have

101 + 101 + 101 + ⋯ + 101


which is 100 ⋅ 101 = 10,100. Since this is twice the sum we want, our answer is 10,100/2 = 5,050.

2. Compute 𝟏 + 𝟐 + 𝟑 + ⋯ + 𝒏.

The same trick works, yielding a general formula. We have:


1 + 2 + 3 + ⋯ + 𝑛 +
𝑛 + (𝑛 − 1) + (𝑛 − 2) + ⋯ + 1
Then, adding each vertical pair of numbers, we have

(𝑛 + 1) + (𝑛 + 1) + (𝑛 + 1) + ⋯ + (𝑛 + 1)

which is 𝑛 ⋅ (𝑛 + 1). Since this is twice the sum we want, our answer is 𝑛(𝑛 + 1)/2.
Note that we can think of this as the total of 𝑛 items, of average value (𝑛 + 1)/2.
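
Results like this are easy to sanity-check with a few lines of code; here’s a quick sketch in Python (the same language used for the function in Item 32):

# Check the formula n(n+1)/2 against a brute-force sum.
for n in [100, 1000, 12345]:
    assert sum(range(1, n + 1)) == n * (n + 1) // 2

print(100 * 101 // 2)  # 5050, the answer to Item 1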

3. Eight people arrive at a meeting and everyone shakes hands. Compute the number of
handshakes.

Pick any particular person 𝐴 and note that she participates in seven handshakes. Pick any other person
𝐵 and note that, not counting the handshake with 𝐴, he participates in six handshakes. Pick any other
person 𝐶 and note that, not counting the handshakes with 𝐴 and 𝐵, he participates in five handshakes.
Continuing in this way, we see that the total number of handshakes is

7+6+5+4+3+2+1
Applying our formula, that’s 7 ⋅ 8/2 = 28.

Having the formula makes it easy to answer the question for larger groups. If there are 50 people, the
number of handshakes is 49 ⋅ 50/2. If we like, we can even give a general formula; with 𝑛 people, the
number of handshakes will be (𝑛 − 1)𝑛/2.
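
A brute-force check in Python, using the standard library’s itertools (each handshake is a 2-element subset of the people):

from itertools import combinations

print(len(list(combinations(range(8), 2))))   # 28
print(len(list(combinations(range(50), 2))))  # 49 * 50 / 2 = 1225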

4. Find the number of comparisons in a bubblesort of 𝟏, 𝟎𝟎𝟎 items.

To sort is to put things in order. Bubblesort is one way of doing this. Starting with an unsorted list, we
compare the first two items, then the second and third, then the third and fourth and so on, swapping
items that are out of order after each comparison. Note that, as soon as we come to the largest item, it
will be repeatedly swapped forward, so that it will certainly be moved to the last place in the list, just
where it should be. Then we make another pass, comparing and swapping, but ignoring the last item

1
©2018, Cullen Schaffer
(since it’s already in place). This second pass leaves the second largest item in the correct place. A third
pass leaves the three largest items in place, and so on.

If we start with 1,000 items, we’ll make 999 comparisons on the first pass. On the second pass, we’ll
make 998. In all, we’ll make

999 + 998 + 997 + ⋯ + 1


comparisons. Applying our formula, the total number of comparisons is 999 ⋅ 1,000/2, about half a
million. That’s a lot of comparisons to put just 1,000 items in order; we’ll do much better before long.
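
Here’s a minimal bubblesort in Python, instrumented to count comparisons (the function and variable names are our own):

import random

def bubblesort_count(items):
    items = list(items)
    comparisons = 0
    # Each pass locks one more item in place at the end of the list,
    # so each pass makes one fewer comparison than the one before.
    for end in range(len(items) - 1, 0, -1):
        for j in range(end):
            comparisons += 1
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items, comparisons

data = random.sample(range(10**6), 1000)
result, count = bubblesort_count(data)
assert result == sorted(data)
print(count)  # always 999 * 1000 / 2 = 499500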

5. Compute 𝟓 + 𝟖 + 𝟏𝟏 + ⋯ + 𝟐𝟕𝟓.

Adding the sum to itself in reverse will work here as well, but instead, let’s do a bit of algebra. The sum
we want is

(2 + 3 ⋅ 1) + (2 + 3 ⋅ 2) + (2 + 3 ⋅ 3) + ⋯ + (2 + 3 ⋅ 91)

Using the commutative law, we can rearrange this:


(2 + 2 + 2 + ⋯ + 2) + (3 ⋅ 1 + 3 ⋅ 2 + 3 ⋅ 3 + ⋯ + 3 ⋅ 91)

The first part is 91 ⋅ 2. The second part, factoring out the 3, is 3 ⋅ (1 + 2 + 3 + ⋯ + 91). Putting the
two together, and applying our formula, we get
91 ⋅ 2 + 3 ⋅ (91 ⋅ 92)/2 = 12,740
It’s important to get used to seeing manipulations of this sort written using Σ notation. A more compact
way of writing (2 + 3 ⋅ 1) + (2 + 3 ⋅ 2) + (2 + 3 ⋅ 3) + ⋯ + (2 + 3 ⋅ 91) is
∑_{𝑖=1}^{91} (2 + 3𝑖)

Our first step above was to notice that it was helpful to add up the 2’s separately:
∑_{𝑖=1}^{91} 2 + ∑_{𝑖=1}^{91} 3𝑖

The next step was to use the distributive property, factoring out something common to all the terms in a
sum:
∑_{𝑖=1}^{91} 2 + 3 ∑_{𝑖=1}^{91} 𝑖

Adding a number to itself 91 times is the same as multiplying it by 91, so the first sum is 91 ⋅ 2. In
general, ∑_{𝑖=1}^{𝑛} 𝑘 = 𝑛 ⋅ 𝑘. For the second sum, we have a formula, which we can now write using Σ
notation:

∑_{𝑖=1}^{𝑛} 𝑖 = 𝑛(𝑛 + 1)/2

A sequence of numbers that increase by a fixed amount (like the 5, 8, 11, … in this example, which go up
by 3’s) is called arithmetic. The sum is called an arithmetic series. With the techniques we’ve outlined,
we can now compute any arithmetic series.
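
A quick check of this computation in Python, by brute force and by the manipulation above:

# Brute force: 5 + 8 + 11 + ... + 275 (91 terms).
print(sum(range(5, 276, 3)))        # 12740
# 91 copies of 2, plus 3 times (1 + 2 + ... + 91).
print(91 * 2 + 3 * (91 * 92 // 2))  # 12740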

6. Evaluate ∑_{𝒊=−𝟓}^{𝟓𝟎} (𝟏 + 𝟒𝒊).

There are 56 terms in this sum, but the index doesn’t run from 1 to 56, which makes it hard to apply the
formula we’ve developed. Let’s rewrite the sum using an index that does run in the usual way:
∑_{𝑗=1}^{56} (1 + 4(𝑗 − 6))

Note that we’re adding up exactly the same numbers. As 𝑗 goes from 1 to 56, the parenthetical that has
taken the place of 𝑖 goes from −5 to 50, just as before. But now our standard manipulations and
formula will get us an answer:
∑_{𝑗=1}^{56} (1 + 4(𝑗 − 6)) = ∑_{𝑗=1}^{56} (4𝑗 − 23) = 4 ∑_{𝑗=1}^{56} 𝑗 − ∑_{𝑗=1}^{56} 23 = 4 ⋅ (56 ⋅ 57)/2 − 56 ⋅ 23 = 5,096
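
Again, a quick check in Python:

# The original index and the shifted index give the same total.
print(sum(1 + 4 * i for i in range(-5, 51)))  # 5096
print(4 * (56 * 57 // 2) - 56 * 23)           # 5096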

7. In a set of 𝟐𝟎 elements, how many subsets have 𝟎 elements? How many have 𝟏 element? How
many have 𝟐 elements?

A set is an unordered collection of elements and is written this way: 𝑆 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒}. We say, for
example, that 𝑑 is an element of the set 𝑆, and abbreviate that 𝑑 ∈ 𝑆. A subset of 𝑆 is a set, all of whose
elements are also elements of 𝑆. For example, we say that 𝑅 = {𝑎, 𝑐, 𝑑} is a subset of 𝑆 and abbreviate
that 𝑅 ⊆ 𝑆.

Note carefully that

1. 𝑆 ⊆ 𝑆; that is, a set is always a subset of itself.


2. The empty set, written { } or 𝜙, is a subset of every set; for example, 𝜙 ⊆ 𝑆. If it seems
counterintuitive that 𝜙 ⊆ 𝑆, see if you can find any elements of 𝜙 that are not in 𝑆.

We have a set of 20 elements, say {𝑎1 , 𝑎2 , 𝑎3 , … , 𝑎20 }. There can only be one set with 0 elements,
namely 𝜙. The subsets of exactly one element are easily listed: {𝑎1 }, {𝑎2 }, {𝑎3 }, … , {𝑎20 }. Clearly there
are 20 of them.

To make a list of all subsets of two elements, start with just those including 𝑎1 . These are
{𝑎1 , 𝑎2 }, {𝑎1 , 𝑎3 }, {𝑎1 , 𝑎4 }, … , {𝑎1 , 𝑎20 } and we clearly have 19 of them.

Next we list two-element subsets containing 𝑎2 , being careful not to include {𝑎1 , 𝑎2 }, which we’ve
already listed. The new subsets are
{𝑎2 , 𝑎3 }, {𝑎2 , 𝑎4 }, {𝑎2 , 𝑎5 }, … , {𝑎2 , 𝑎20 }

and there are 18 of them.

Then we’ll have 17 subsets containing 𝑎3, not counting those already listed, 16 subsets containing 𝑎4 and so on. The total is ∑_{𝑖=1}^{19} 𝑖 = (19 ⋅ 20)/2 = 190, applying our formula.

The number of ways of choosing two elements from a set of 20 to form a subset is written (20 choose 2) and is read “20 choose 2.” The number of three-element subsets is (20 choose 3), and so on; we’ll have a way of calculating this before long. Note that we already know that (20 choose 0) = 1 and (20 choose 1) = 20.
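
The standard library’s itertools.combinations will list these subsets for us, so we can check the counts:

from itertools import combinations

S = range(1, 21)  # stand-ins for a1, ..., a20
for k in [0, 1, 2]:
    print(k, len(list(combinations(S, k))))  # 1, 20, 190
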
8. In (𝒙 + 𝒚)^𝟓𝟎, what are the coefficients of 𝒙^𝟓𝟎, 𝒙^𝟒𝟗𝒚 and 𝒙^𝟒𝟖𝒚^𝟐?

Look first at (𝑥 + 𝑦)^2 or (𝑥 + 𝑦)(𝑥 + 𝑦). The 𝑥 in the first factor is identical to the one in the second
factor; likewise for the 𝑦’s. For clarity, though, we temporarily label these with subscripts, rewriting the
product as (𝑥1 + 𝑦1 )(𝑥2 + 𝑦2 ). Multiplying this out, we get 𝑥1 𝑥2 + 𝑥1 𝑦2 + 𝑦1 𝑥2 + 𝑦1 𝑦2 .

Note carefully that each term has one variable from the first factor (𝑥1 + 𝑦1 ), that is, either 𝑥1 or 𝑦1 ,
and one variable from the second factor (𝑥2 + 𝑦2 ), either 𝑥2 or 𝑦2 .

We can see the process even more clearly if we go one step farther and multiply out (𝑥1 + 𝑦1 )(𝑥2 +
𝑦2 )(𝑥3 + 𝑦3 ). We start by multiplying out the first two factors, getting the answer we’ve already seen:

(𝑥1 𝑥2 + 𝑥1 𝑦2 + 𝑦1 𝑥2 + 𝑦1 𝑦2 )(𝒙𝟑 + 𝒚𝟑 )

Then we distribute:

(𝑥1 𝑥2 + 𝑥1 𝑦2 + 𝑦1 𝑥2 + 𝑦1 𝑦2 )𝒙𝟑 + (𝑥1 𝑥2 + 𝑥1 𝑦2 + 𝑦1 𝑥2 + 𝑦1 𝑦2 )𝒚𝟑

If we distribute again, we’ll get 𝑥3 with every combination of one variable from the first factor and one
from the second and 𝑦3 with every one of the same combinations. The result is a series of terms with all
possible combinations of one variable from each of the three original factors.

Now that the pattern is clear, consider (𝑥 + 𝑦)^50. If we again temporarily add subscripts, that will be (𝑥1 + 𝑦1)(𝑥2 + 𝑦2) ⋯ (𝑥50 + 𝑦50). Multiplying this out, we’ll get a very long sum. But a typical term
will look like this

𝑦1 𝑥2 𝑥3 𝑥4 ⋯ 𝑦49 𝑥50
with one variable from each factor.

Suppose we now drop the subscripts. Then all of the terms that have exactly 13 𝑥’s and 37 𝑦’s will be equal to 𝑥^13 𝑦^37 and can be collected together. The number of such terms will be the coefficient of 𝑥^13 𝑦^37.

We’re interested, first, in the number of terms with all 𝑥’s, since that will be the coefficient of 𝑥^50. But
the only way to get such a term is to pick the 𝑥 from each of the 50 factors. That is, there’s only one
such term and the coefficient is 1.

Next, how can we get terms of the form 𝑥^49 𝑦? There’s exactly one 𝑦 and it can come from the first factor or the second factor or the third factor, etc. There are 50 possibilities for which factor supplies the 𝑦 and each one yields one 𝑥^49 𝑦 term. So the coefficient of 𝑥^49 𝑦 is 50.

Finally, what about terms of the form 𝑥^48 𝑦^2? Here, there are exactly two 𝑦’s. They might be 𝑦1 and some other 𝑦: 𝑦1 and 𝑦2, 𝑦1 and 𝑦3 and so on, up to 𝑦1 and 𝑦50—that’s 49 terms. Or they might be 𝑦2 and some other 𝑦: 𝑦2 and 𝑦3, 𝑦2 and 𝑦4 and so on, for a total of 48 more terms. Or they might be 𝑦3 with some other 𝑦—47 more terms.

The total number of terms is ∑_{𝑖=1}^{49} 𝑖 = 49 ⋅ 50/2 = 1,225, so this is the coefficient of 𝑥^48 𝑦^2.

Notice that each term of the form 𝑥^48 𝑦^2 is the result of picking a pair of 𝑦’s with subscripts from the set {1, 2, 3, … , 50}. The number of such terms is the same as the number of subsets of {1, 2, 3, … , 50} with two elements. That is, the coefficient of 𝑥^48 𝑦^2 in (𝑥 + 𝑦)^50 is (50 choose 2).
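
Python’s math.comb (available since Python 3.8) computes these counts directly, so we can check all three coefficients:

from math import comb

# Coefficients of x^50, x^49 y and x^48 y^2 in (x + y)^50.
print(comb(50, 0), comb(50, 1), comb(50, 2))  # 1 50 1225
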
9. In (𝒙 + 𝒚)^𝟓𝟎, what is the coefficient of 𝒙^𝟏𝟑𝒚^𝟑𝟕 written in “choose” notation?

It will be the number of ways of forming a term with 𝑦’s chosen from 37 of the 50 factors, i.e. (50 choose 37). We still don’t have a way of computing this number, but we will soon.

Two additional points are worth noting. First, terms with 37 𝑦’s can also be described as terms with 13 𝑥’s. We get one of these for every way of choosing 13 of the 50 factors, i.e. (50 choose 13). So we have (50 choose 37) = (50 choose 13). The argument applies in general: if we have a set of 𝑛 elements, the number of ways of choosing 𝑘 to be in a subset is the same as the number of ways of choosing 𝑛 − 𝑘 to not be in the subset. That is,

(𝑛 choose 𝑘) = (𝑛 choose 𝑛 − 𝑘)

So we can also say that the coefficient of 𝑥^13 𝑦^37 in (𝑥 + 𝑦)^50 is (50 choose 13).
13
Second, since, again, there was nothing special about the number 50, we get a general formula for (𝑥 + 𝑦) raised to any power:

(𝑥 + 𝑦)^𝑛 = (𝑛 choose 0) 𝑥^𝑛 + (𝑛 choose 1) 𝑥^(𝑛−1) 𝑦 + (𝑛 choose 2) 𝑥^(𝑛−2) 𝑦^2 + ⋯ + (𝑛 choose 𝑛 − 1) 𝑥𝑦^(𝑛−1) + (𝑛 choose 𝑛) 𝑦^𝑛

Or, more compactly, using Σ notation:

(𝑥 + 𝑦)^𝑛 = ∑_{𝑖=0}^{𝑛} (𝑛 choose 𝑖) 𝑥^(𝑛−𝑖) 𝑦^𝑖

This is called the binomial theorem. The factor (𝑥 + 𝑦) is a bi (two) nomial (name), since each of the two variables is a name for some quantity. The “chooses” (𝑛 choose 𝑘) are formally called binomial coefficients.
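
The theorem is easy to test numerically; here’s a sketch checking one arbitrarily chosen case:

from math import comb

# Check the binomial theorem at x = 3, y = 5, n = 12.
x, y, n = 3, 5, 12
lhs = (x + y) ** n
rhs = sum(comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1))
print(lhs == rhs)  # True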

10. Given a wardrobe of seven pants, three shirts and five jackets, how many outfits are possible?

Label the pants 𝑝1 , 𝑝2 , … , 𝑝7 and the shirts 𝑠1 , 𝑠2 , 𝑠3. For each choice of pants, there are three possible
shirts, giving 7 ⋅ 3 = 21 combinations:

𝑝1 𝑠1
𝑝1 𝑠2
𝑝1 𝑠3
𝑝2 𝑠1
𝑝2 𝑠2
𝑝2 𝑠3

𝑝7 𝑠1
𝑝7 𝑠2
𝑝7 𝑠3
Then, for each of these 21, there are five possible jacket choices. Labeling the jackets 𝑗1 , 𝑗2 , … , 𝑗5 , we
have:
𝑝1 𝑠1 𝑗1
𝑝1 𝑠1 𝑗2
𝑝1 𝑠1 𝑗3
𝑝1 𝑠1 𝑗4
𝑝1 𝑠1 𝑗5

𝑝7 𝑠3 𝑗1
𝑝7 𝑠3 𝑗2
𝑝7 𝑠3 𝑗3
𝑝7 𝑠3 𝑗4
𝑝7 𝑠3 𝑗5
So the total number of choices is 105 = 21 ⋅ 5 = 7 ⋅ 3 ⋅ 5. In general, if we make a decision in stages,
we multiply the number of choices in each stage to get the total number of possible decisions. This is
called the multiplication principle.
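
The staged decisions are exactly what itertools.product generates; a quick check of the count:

from itertools import product

pants = ["p%d" % i for i in range(1, 8)]
shirts = ["s%d" % i for i in range(1, 4)]
jackets = ["j%d" % i for i in range(1, 6)]
# One outfit per (pants, shirt, jacket) decision: 7 * 3 * 5 = 105.
print(len(list(product(pants, shirts, jackets))))  # 105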

11. If we multiply out (𝒙𝟏 + 𝒚𝟏 )(𝒙𝟐 + 𝒚𝟐 ) ⋯ (𝒙𝟓𝟎 + 𝒚𝟓𝟎 ), how many terms will we get?

Apply the multiplication principle. To form a term, we first decide what to pick from the first factor: 𝑥1
or 𝑦1 . Then we decide what to pick from the second factor and so on. Each stage has two choices and
there are 50 stages, so the total number of possibilities is 2 ⋅ 2 ⋅ 2 ⋅ ⋯ ⋅ 2 (50 times) = 2^50.
12. Compute ∑_{𝒊=𝟎}^{𝒏} (𝒏 choose 𝒊).
Suppose we multiply out (𝑥1 + 𝑦1)(𝑥2 + 𝑦2) ⋯ (𝑥50 + 𝑦50), getting 2^50 terms. What happens if we then drop the subscripts? As we’ve seen, (50 choose 0) of the terms will then be of the form 𝑥^50 𝑦^0, (50 choose 1) of the terms will be of the form 𝑥^49 𝑦^1, and so on up to the (50 choose 50) terms of the form 𝑥^0 𝑦^50.

All of the terms must be of one of these forms, so the total (50 choose 0) + (50 choose 1) + ⋯ + (50 choose 50) must be 2^50. Since there’s nothing special about the number 50, we get a general formula:

∑_{𝑖=0}^{𝑛} (𝑛 choose 𝑖) = 2^𝑛
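
A quick numerical check of the formula:

from math import comb

for n in [4, 10, 50]:
    print(sum(comb(n, i) for i in range(n + 1)) == 2 ** n)  # True each time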

13. How many subsets can be formed from elements of the set {𝒂𝟏 , 𝒂𝟐 , 𝒂𝟑 , … , 𝒂𝒏 }?

Apply the multiplication principle. To form a subset, we first decide whether to include 𝑎1 —there are
two choices, yes or no. Then we decide whether to include 𝑎2, and so on. There are 𝑛 stages, each with two choices. The total number of possibilities is 2 ⋅ 2 ⋅ 2 ⋅ ⋯ ⋅ 2 (𝑛 times) = 2^𝑛.

This gives us an alternative way of deriving the formula of Item 12. The total number of subsets is the number of 0-element subsets plus the number of 1-element subsets plus the number of 2-element subsets and so on, up to the number of 𝑛-element subsets. In other words, 2^𝑛 = (𝑛 choose 0) + (𝑛 choose 1) + (𝑛 choose 2) + ⋯ + (𝑛 choose 𝑛). Or, using Σ notation: 2^𝑛 = ∑_{𝑖=0}^{𝑛} (𝑛 choose 𝑖).
14. In how many ways can the elements of the set {𝒂𝟏 , 𝒂𝟐 , 𝒂𝟑 , … , 𝒂𝒏 } be ordered?

Start by thinking about an example, a set of four elements {𝑤, 𝑥, 𝑦, 𝑧}, and apply the multiplication
principle. To put the elements in an order, choose an element to be first, say 𝑦. Then choose one of the
remaining elements to be second, say 𝑤. Then choose a remaining element to be third, say 𝑧. This
leads to the order 𝑦𝑤𝑧𝑥.

We’re making a decision in three stages, with four choices in the first stage, three in the second and two
in the third. The total number of possibilities is 4 ⋅ 3 ⋅ 2, each matching up with one order, so there are
24 possible orders.

We can get a neater answer if we consider adding the last element as an additional stage. At this point,
there is only one remaining element, so we have just one choice. Our answer is then 4 ⋅ 3 ⋅ 2 ⋅ 1, exactly
the same as before.

Since there is nothing special about the number 4, we have a general formula: the number of ways to
order the elements of a set of 𝑛 is 𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2) ⋅ ⋯ ⋅ 1. The standard notation for this product is
𝑛! and is read “𝑛 factorial.” The technical term for an ordering is a permutation; we say the number of
permutations of 𝑛 distinct objects is 𝑛!.
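
A quick check with the standard library:

from itertools import permutations
from math import factorial

print(len(list(permutations("wxyz"))))  # 24
print(factorial(4))                     # 24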

15. What is 𝒏!/(𝒏 − 𝟏)!? What is 𝒏!/(𝒏 − 𝟑)!?

In either case, most of the factors in the numerator cancel with factors in the denominator. Let’s start
by looking at 𝑛 = 7:
𝑛!/(𝑛 − 1)! = 7!/6! = (7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/(6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1) = 7
In general,
𝑛!/(𝑛 − 1)! = (𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2) ⋅ ⋯ ⋅ 1)/((𝑛 − 1) ⋅ (𝑛 − 2) ⋅ ⋯ ⋅ 1) = 𝑛
Likewise, if 𝑛 = 7:
𝑛!/(𝑛 − 3)! = 7!/4! = (7 ⋅ 6 ⋅ 5 ⋅ 4 ⋅ 3 ⋅ 2 ⋅ 1)/(4 ⋅ 3 ⋅ 2 ⋅ 1) = 7 ⋅ 6 ⋅ 5
Note that when the denominator is (𝑛 − 3)!, there are 3 factors left in the answer. In general,

𝑛!/(𝑛 − 3)! = (𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2) ⋅ (𝑛 − 3) ⋅ ⋯ ⋅ 1)/((𝑛 − 3) ⋅ ⋯ ⋅ 1) = 𝑛 ⋅ (𝑛 − 1) ⋅ (𝑛 − 2)
16. In how many ways can 𝒌 elements of the set {𝒂𝟏 , 𝒂𝟐 , 𝒂𝟑 , … , 𝒂𝒏 } be ordered?

Start with an example—ordering three of a set of five elements {𝑣, 𝑤, 𝑥, 𝑦, 𝑧}—and apply the
multiplication principle. We have five choices for which element comes first. For each of these, we
have four choices for which remaining element comes second and then three choices for which comes
third. The number of orderings is 5 ⋅ 4 ⋅ 3. Note that there are three factors.

In the general case, we have 𝑛 choices for which element comes first. For each of these, we have 𝑛 − 1
choices for which remaining element comes second and so on. The final answer is:
𝑛 ⋅ (𝑛 − 1) ⋅ ⋯ ⋅ (𝑛 − (𝑘 − 1))   (𝑘 factors)

As we’ve just seen, this can be written more compactly as


𝑛!/(𝑛 − 𝑘)!
Careful, though. To compute the number of ways of ordering four elements from a set of 500, it would
be very silly to calculate the enormous number 500! and then divide it by the enormous number 496!.
The answer is just 500 ⋅ 499 ⋅ 498 ⋅ 497; it’s far better to dispense with factorials and apply the
multiplication principle directly.

The factorial form will be very useful though, as we’ll see, especially in writing proofs.
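
Here’s the direct approach as a sketch (our own helper, not a library routine):

# Orderings of 4 elements from 500: multiply the k factors directly
# rather than computing the enormous factorials 500! and 496!.
result = 1
for i in range(4):
    result *= 500 - i
print(result)  # 500 * 499 * 498 * 497 = 61752747000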

17. In how many ways can we choose 𝒌-element subsets of the set {𝒂𝟏 , 𝒂𝟐 , 𝒂𝟑 , … , 𝒂𝒏 }?

Start with an example—choosing 3-element subsets of a set of five elements {𝑣, 𝑤, 𝑥, 𝑦, 𝑧}. We already
know that there are 5 ⋅ 4 ⋅ 3 = 60 orderings of three elements. Suppose we list them out, keeping
orderings of the same elements together:

𝑣𝑤𝑥, 𝑣𝑥𝑤, 𝑥𝑣𝑤, 𝑥𝑤𝑣, 𝑤𝑥𝑣, 𝑤𝑣𝑥   (orderings of 𝑣, 𝑤, 𝑥)
𝑣𝑤𝑦, 𝑣𝑦𝑤, 𝑤𝑣𝑦, 𝑤𝑦𝑣, 𝑦𝑣𝑤, 𝑦𝑤𝑣   (orderings of 𝑣, 𝑤, 𝑦)
…
𝑥𝑦𝑧, 𝑥𝑧𝑦, 𝑦𝑥𝑧, 𝑦𝑧𝑥, 𝑧𝑥𝑦, 𝑧𝑦𝑥   (orderings of 𝑥, 𝑦, 𝑧)
Each group corresponds to a single one of the subsets we are trying to count. For example, the first
group consists of orderings of the subset {𝑣, 𝑤, 𝑥}. To find the number of subsets or groups, divide the
total length of the list (60) by the number of items in each group—that’s 3! = 6 since each group is the
number of orderings of three elements. The number of 3-element subsets is 60/6 = 10.

In general, if we have a set of 𝑛 elements, our list of orderings will have 𝑛!/(𝑛 − 𝑘)! items, but each
group of 𝑘! of them will correspond to the same subset. Dividing, we get a formula for the number of
subsets:
𝑛!/(𝑘! (𝑛 − 𝑘)!)
Keep in mind that we already have a notation (see the very end of Item 7) for the number of 𝑘-element subsets of a set of 𝑛 elements: (𝑛 choose 𝑘). Now we have a formula for computing the binomial coefficients:

(𝑛 choose 𝑘) = 𝑛!/(𝑘! (𝑛 − 𝑘)!)
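
Here’s that formula as a small Python function, multiplying out the 𝑘 factors on top and bottom rather than computing full factorials:

def choose(n, k):
    # k factors counting down from n, over k factors counting down from k.
    top = bottom = 1
    for i in range(k):
        top *= n - i
        bottom *= k - i
    return top // bottom

print(choose(5, 3))    # 10
print(choose(50, 13))  # 354860518600, as computed in Item 18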

18. In (𝒙 + 𝒚)^𝟓𝟎, what is the coefficient of 𝒙^𝟏𝟑𝒚^𝟑𝟕?


We already know (from Item 9) that the answer is (50 choose 37). We also know (from Item 9) that this is the same as (50 choose 13). In fact, now we can prove it, using our new formula. In the first case, the formula gives

50!/(37! (50 − 37)!) = 50!/(37! 13!)

In the second case, the formula gives

50!/(13! (50 − 13)!) = 50!/(13! 37!)

In either case, though, it would be silly to start by calculating 50!. Instead, remember that (50 choose 13) is the number of orderings of 13 elements taken from 50,

50 ⋅ 49 ⋅ 48 ⋅ ⋯ ⋅ 38   (13 factors)

divided by the 13! ways each subset can be ordered. This gives us an answer with 13 factors in both the numerator and denominator:

(50 ⋅ 49 ⋅ 48 ⋅ ⋯ ⋅ 38)/(13 ⋅ 12 ⋅ 11 ⋅ ⋯ ⋅ 1)
Cancelling and then computing, we get a final answer of 354,860,518,600.
In general, (𝑛 choose 𝑘) has 𝑘 factors in both the numerator and denominator. Count down from 𝑛 in the numerator and from 𝑘 in the denominator. For example, (12 choose 4) has four factors counting down from 12 in the numerator and four counting down from 4 in the denominator:

(12 ⋅ 11 ⋅ 10 ⋅ 9)/(4 ⋅ 3 ⋅ 2 ⋅ 1)

Again, as we’ll see, the factorial formula 𝑛!/(𝑘! (𝑛 − 𝑘)!) is useful, but for proofs, not for calculations.

19. Represent 457 in binary.

In our usual number-representation system, base 10, the value of a digit depends on its position or place. Place values are powers of 10, increasing from right to left. For example, in the number 457, the digit 4 has the value 400 because it sits two places in from the rightmost digit, in the place with value 10^2. The full picture is as follows:

   4     5     7
10^2  10^1  10^0

That is, 457 means 4 ⋅ 10^2 + 5 ⋅ 10^1 + 7 ⋅ 10^0.

We can use bases other than 10, however. In particular, base-2, or binary, is heavily used in computer
science. Note that, in addition to having different place values, binary uses a different set of digits. Base
10 uses the ten digits 0, 1, 2, … , 9. Base 2 uses the two digits 0, 1; we normally refer to these as bits,
short for binary digits.

Consider the number represented in binary as 111001001. We can get the value by taking place values
into account:
  1    1    1    0    0    1    0    0    1
2^8  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0

Leaving out the zeroes, which contribute nothing, the number represented is 1 ⋅ 2^8 + 1 ⋅ 2^7 + 1 ⋅ 2^6 + 1 ⋅ 2^3 + 1 ⋅ 2^0 = 256 + 128 + 64 + 8 + 1 = 457. This is our answer; but how can we find it? We want a systematic way to turn 457₁₀ into 111001001₂ (subscripts indicate the base).

The key is to notice what happens when we divide a number by a base. If we take 457 and divide by
10, for example, we get 45 with a remainder of 7. The remainder is the last digit. If we then divide 45
by 10, we get 4, with a remainder of 5. The remainder is the next digit. In fact, if we take any number
and divide by 10 repeatedly, the remainders will be the digits of the number taken from right to left.

Writing the number out with place values makes the process clearer. We start with 4 ⋅ 10^2 + 5 ⋅ 10^1 + 7 ⋅ 10^0 = 10 ⋅ (4 ⋅ 10^1 + 5 ⋅ 10^0) + 7. Dividing by ten leaves the number in parentheses—all but the last digit, shifted one place to the right—with the final digit as remainder.

Now what if we divide repeatedly by two? In this case, the remainders will be the binary digits, still from
right to left: 457 divided by two is 228 with a remainder of 1, the rightmost bit of 111001001; 228
divided by two is 114 with a remainder of 0, the second bit from the right; and so on. Continuing, we
can get all the bits of the binary representation.

Again, we can see more clearly what’s happening by showing place values explicitly. For example, our
original number was

457 = 1 ⋅ 2^8 + 1 ⋅ 2^7 + 1 ⋅ 2^6 + 0 ⋅ 2^5 + 0 ⋅ 2^4 + 1 ⋅ 2^3 + 0 ⋅ 2^2 + 0 ⋅ 2^1 + 1 ⋅ 2^0
    = 2 ⋅ (1 ⋅ 2^7 + 1 ⋅ 2^6 + 1 ⋅ 2^5 + 0 ⋅ 2^4 + 0 ⋅ 2^3 + 1 ⋅ 2^2 + 0 ⋅ 2^1 + 0 ⋅ 2^0) + 1
Dividing by two leaves the number in parentheses—everything but the last digit of the original
number—shifted one place to the right. The final bit is left as the remainder.

Note very carefully that dividing a number by two or ten gives the same result no matter what
representation we use. If we have 457 pieces of candy and give them out to ten people, each will get
45 pieces and we will have 7 pieces left over. This is a fact about numbers and not about how they are
written. If we use Roman numerals and say that we have CDLVII pieces of candy and give them out to X
people, we will still have seven pieces left over.

Dividing a base-10 number by ten leaves a remainder equal to the last digit (in base 10). Dividing a
base-2 number by two does the same, leaving a remainder equal to the last bit (in base 2). But dividing
a base-10 number by two leaves exactly the same remainder and still gives us the last bit. This is why
our procedure of dividing by two repeatedly produces binary digits, even though we do our division
calculations in base ten.
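
Here’s the repeated-division procedure as a short Python function (a sketch of our own; Python’s built-in bin does the same job):

def to_binary(n):
    # Repeatedly divide by two; the remainders are the bits, right to left.
    bits = ""
    while n > 0:
        bits = str(n % 2) + bits
        n //= 2
    return bits or "0"

print(to_binary(457))       # 111001001
print(int("111001001", 2))  # 457, converting back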

20. How many 𝒏-bit binary numbers are there?

Apply the multiplication principle. The first bit is either 0 or 1, two choices. The second bit is either 0 or
1, two choices. There are 𝑛 stages, each with two choices. The total number of possibilities is 2 ⋅ 2 ⋅ 2 ⋅ ⋯ ⋅ 2 (𝑛 times) = 2^𝑛.

There is a second way to get the answer. Suppose we have a 4-element set {𝑎, 𝑏, 𝑐, 𝑑}. We can think of
any 4-bit binary number as a code specifying a subset, the first bit telling us whether to include the first
element (𝑎)—1 for yes and 0 for no—the second bit telling us whether to include the second element
(𝑏) and so on. For example, 1011 would be the code for {𝑎, 𝑐, 𝑑}. In this case, every binary number
corresponds to a subset. We already know that there are 2^4 subsets, so there must also be 2^4 binary numbers. But there is nothing special about the number 4; there are 2^𝑛 subsets of an 𝑛-element set, so there must be 2^𝑛 𝑛-bit binary numbers.

There is yet another way to get the answer. How many 3-digit numbers are there in base 10? This is easy: the complete list of numbers is 000, 001, 002, 003, … , 999, so the number of numbers is one more than the highest number: 999 + 1 = 1,000 (one more because there’s a zero (000) in addition to the numbers 1 through 999).

In base 2, the list of 3-bit numbers is 000, 001, 010, … , 111. The number of numbers in this list is one more than the highest number (111₂ = 7₁₀). So the total number of numbers is 7 + 1 = 8 = 2^3. In general, all we need to do is add 1 to the highest number and that will tell us how many numbers are in the list.
What is the highest binary number with 𝑛 bits? It’s 111⋯1 (𝑛 bits) = 1 ⋅ 2^(𝑛−1) + 1 ⋅ 2^(𝑛−2) + 1 ⋅ 2^(𝑛−3) + ⋯ + 1 ⋅ 2^0. That is, looking at the sum in the opposite order: 1 + 2 + 2^2 + ⋯ + 2^(𝑛−1).

Let’s call this sum 𝑆 and then apply a trick. We write 𝑆 and then also 2𝑆:

𝑆 = 1 + 2 + 2^2 + ⋯ + 2^(𝑛−1)
2𝑆 =     2 + 2^2 + ⋯ + 2^(𝑛−1) + 2^𝑛
If we now subtract the top line from the bottom, nearly everything cancels, and we’re left with just

𝑆 = 2^𝑛 − 1

This is the highest number. Add one to get the total number of numbers: 2^𝑛.

By the way, there’s an easier way to get the value of the highest binary number with 𝑛 bits. The highest 4-digit number in base 10—9,999—is one less than the smallest 5-digit number—10,000. Likewise, the highest 𝑛-bit binary number—111⋯1 (𝑛 bits)—is one less than the smallest (𝑛 + 1)-bit number—100⋯0, a 1 followed by 𝑛 zeroes, i.e. 2^𝑛. That is, it’s 2^𝑛 − 1. In fact, this is an alternative way to show that 1 + 2 + 2^2 + ⋯ + 2^(𝑛−1) = 2^𝑛 − 1.

We prefer the method above, though, because, as we’re about to see, it can be applied to other
problems.

21. Compute ∑_{𝒊=𝟎}^{𝒏} 𝒓^𝒊.

When 𝑟 = 2, this is 1 + 2 + 2^2 + ⋯ + 2^𝑛, that is, it’s the sum we just tackled, but extended for one additional term. The same trick works again. Call the sum we want 𝑆 and write both 𝑆 and 𝑟𝑆:

𝑆 = 1 + 𝑟 + 𝑟^2 + ⋯ + 𝑟^𝑛
𝑟𝑆 =     𝑟 + 𝑟^2 + ⋯ + 𝑟^𝑛 + 𝑟^(𝑛+1)
Subtract the top line from the bottom to get:

𝑟𝑆 − 𝑆 = 𝑆(𝑟 − 1) = 𝑟^(𝑛+1) − 1
Then divide by (𝑟 − 1) to get a formula for 𝑆:
∑_{𝑖=0}^{𝑛} 𝑟^𝑖 = (𝑟^(𝑛+1) − 1)/(𝑟 − 1)

If 𝑟 is a number like 2 or 3, then the term 𝑟^(𝑛+1) in this formula increases without bound if we use larger and larger values of 𝑛. Suppose, however, that 𝑟 = 1/2. In this case, 𝑟^(𝑛+1) gets smaller and smaller as 𝑛 increases; in fact, if we let 𝑛 go all the way to infinity, it disappears entirely. This gives us a simpler formula:

∑_{𝑖=0}^{∞} 𝑟^𝑖 = (0 − 1)/(𝑟 − 1) = 1/(1 − 𝑟)

Keep in mind that this applies only if the size of 𝑟 is less than 1 (so that 𝑟^(𝑛+1) disappears as 𝑛 increases to infinity) and when the sum goes on forever. Here’s an example:

1 + 1/3 + (1/3)^2 + (1/3)^3 + ⋯ = 1/(1 − 1/3) = 3/2

A sequence of numbers that increases by a fixed factor (for example, one like 1, 2, 4, 8, … where each
number is two times the preceding one) is called geometric. The sum of a geometric sequence is a
geometric series. With the techniques we’ve outlined, we can now evaluate any geometric series.
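
A quick check of both formulas in Python:

# Finite case: 1 + 2 + 2^2 + ... + 2^9.
r, n = 2, 9
print((r ** (n + 1) - 1) // (r - 1))      # 1023
print(sum(r ** i for i in range(n + 1)))  # 1023
# Infinite case with r = 1/3: partial sums approach 1/(1 - 1/3) = 3/2.
print(sum((1 / 3) ** i for i in range(50)))  # 1.5 (to machine precision)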

As an example, let’s try 𝑥^10 𝑦 − 2𝑥^7 𝑦^2 + 4𝑥^4 𝑦^3 − 8𝑥𝑦^4 + ⋯ − 512𝑦^10/𝑥^17. This may not immediately look like a geometric series, but notice that each term is −2𝑦/𝑥^3 times the previous one; if terms increase by a fixed factor, it’s geometric. To evaluate a geometric series:

1. Factor out the first term, so that the series starts from 1.
2. Write what remains so that it’s in the form 1 + 𝑟 + 𝑟^2 + ⋯ + 𝑟^𝑛 (or, if the series doesn’t stop, 1 + 𝑟 + 𝑟^2 + ⋯).
3. Apply one of our two formulas.

Here’s what that looks like for our example:

1. 𝑥^10 𝑦 (1 − 2𝑦/𝑥^3 + 4𝑦^2/𝑥^6 − 8𝑦^3/𝑥^9 + ⋯ − 512𝑦^9/𝑥^27)
2. 𝑥^10 𝑦 (1 + (−2𝑦/𝑥^3) + (−2𝑦/𝑥^3)^2 + (−2𝑦/𝑥^3)^3 + ⋯ + (−2𝑦/𝑥^3)^9)
3. 𝑥^10 𝑦 ⋅ ((−2𝑦/𝑥^3)^10 − 1)/((−2𝑦/𝑥^3) − 1)

Of course, this answer can then be simplified.

22. Compute ∑_{𝒊=𝟏}^{𝒏} 𝒋.


This is easier to see for a particular value of 𝑗, say 𝑗 = 5. In this case we have ∑_{𝑖=1}^{𝑛} 5 = 5 + 5 + ⋯ + 5 (𝑛 terms) = 𝑛 ⋅ 5.

In general, ∑_{𝑖=1}^{𝑛} 𝑗 = 𝑛 ⋅ 𝑗.

Even more generally, so long as what’s being summed remains the same as the index changes, it simply
gets multiplied by the number of times it’s added. For example:

∑_{𝑥=1}^{𝑧^2} √𝑘 log 𝑞 = 𝑧^2 √𝑘 log 𝑞

23. What is the sum of the entries in an 𝒏 × 𝒏 addition table?

A 3 × 3 addition table looks like this:

     1   2   3
 1   2   3   4
 2   3   4   5
 3   4   5   6

Adding up the nine entries, we get (2 + 3 + 4) + (3 + 4 + 5) + (4 + 5 + 6) = 36.

Here, we have arbitrarily added up entries for each column and then totaled the column-sums. For any particular column 𝑐, the column-sum is ∑_{𝑟=1}^{3} (𝑟 + 𝑐). If we add all the column-sums, we get the grand total: ∑_{𝑐=1}^{3} (∑_{𝑟=1}^{3} 𝑟 + 𝑐). Normally, this would be written without parentheses: ∑_{𝑐=1}^{3} ∑_{𝑟=1}^{3} 𝑟 + 𝑐.

In the general 𝑛 × 𝑛 case, the sum we want is:


∑_{𝑐=1}^{𝑛} ∑_{𝑟=1}^{𝑛} 𝑟 + 𝑐

Let’s work on the inner sum first. As we’ve seen (Item 5), the commutative law allows us to break ∑_{𝑟=1}^{𝑛} (𝑟 + 𝑐) into two parts: ∑_{𝑟=1}^{𝑛} 𝑟 + ∑_{𝑟=1}^{𝑛} 𝑐. By the result of Item 22, the second of these is just 𝑐𝑛; and the first is very familiar—it’s 1 + 2 + 3 + ⋯ + 𝑛—and we can apply our formula. So the inner sum is

𝑛(𝑛 + 1)/2 + 𝑐𝑛
Plugging this in, we have:
∑_{𝑐=1}^{𝑛} (𝑛(𝑛 + 1)/2 + 𝑐𝑛)

Using the commutative law again, we break this into two sums:
∑_{𝑐=1}^{𝑛} 𝑛(𝑛 + 1)/2 + ∑_{𝑐=1}^{𝑛} 𝑐𝑛

The result of Item 22 applies to the first, yielding 𝑛^2(𝑛 + 1)/2. In the second, we factor out the 𝑛 which is common to each term to get 𝑛 ∑_{𝑐=1}^{𝑛} 𝑐 and then note that the remaining sum is a familiar friend; the result is 𝑛 ⋅ 𝑛(𝑛 + 1)/2 = 𝑛^2(𝑛 + 1)/2.

Adding the first and second sums together, we get 𝑛^2(𝑛 + 1)/2 + 𝑛^2(𝑛 + 1)/2 = 𝑛^2(𝑛 + 1). Note that, for 𝑛 = 3, this yields 36, matching our example.
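
A brute-force check of the general formula:

for n in [3, 10, 25]:
    table = sum(r + c for c in range(1, n + 1) for r in range(1, n + 1))
    print(table == n ** 2 * (n + 1))  # True each time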

24. What is the sum of the entries in an 𝒏 × 𝒏 multiplication table?

A 3 × 3 multiplication table looks like this:

     1   2   3
 1   1   2   3
 2   2   4   6
 3   3   6   9

Adding up the nine entries, we get (1 + 2 + 3) + (2 + 4 + 6) + (3 + 6 + 9) = 36.

The sum we want is


∑_{𝑐=1}^{𝑛} ∑_{𝑟=1}^{𝑛} 𝑟𝑐

We can factor out the 𝑐 which is common to terms in the inner sum
∑_{𝑐=1}^{𝑛} 𝑐 ∑_{𝑟=1}^{𝑛} 𝑟

and then the inner sum is again familiar


∑_{𝑐=1}^{𝑛} 𝑐 ⋅ 𝑛(𝑛 + 1)/2

Next, factor out the 𝑛(𝑛 + 1)/2 common to each term


(𝑛(𝑛 + 1)/2) ∑_{𝑐=1}^{𝑛} 𝑐

and note the same familiar sum again

(𝑛(𝑛 + 1)/2) ⋅ (𝑛(𝑛 + 1)/2) = 𝑛^2(𝑛 + 1)^2/4
Substituting 𝑛 = 3, we can check that the formula gives 36, as it should.

If we’re clever, we can get the same answer in another way, avoiding all the summation manipulations.
Suppose we multiply (1 + 2 + 3 + ⋯ + 𝑛)(1 + 2 + 3 + ⋯ + 𝑛). We’ll get a sum consisting of every
possible product of one number from the first factor of (1 + 2 + 3 + ⋯ + 𝑛) and one from the second.
That is, we’ll get exactly all of the entries in the multiplication table added up. This is the sum we want
and now we can see clearly that it’s (1 + 2 + 3 + ⋯ + 𝑛)(1 + 2 + 3 + ⋯ + 𝑛) = (𝑛(𝑛 + 1)/2) ⋅ (𝑛(𝑛 + 1)/2) = 𝑛^2(𝑛 + 1)^2/4.
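
And the corresponding check for the multiplication table:

for n in [3, 10, 25]:
    table = sum(r * c for c in range(1, n + 1) for r in range(1, n + 1))
    print(table == n ** 2 * (n + 1) ** 2 // 4)  # True each time
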
25. Calculate (𝒏 choose 𝒌) using only addition.

Suppose we want to calculate (7 choose 3), the number of 3-element subsets of a set of seven elements, say {𝑎, 𝑏, 𝑐, 𝑑, 𝑒, 𝑓, 𝑔}. These subsets are of two kinds—the ones that include 𝑎 and the ones that don’t. If we do include 𝑎, then we have to pick two more elements out of the remaining six to complete the subset; there are (6 choose 2) ways to do that. If we don’t include 𝑎, then we have to make up the subset by choosing three elements out of the remaining six; there are (6 choose 3) ways to do that. In all we have (6 choose 3) + (6 choose 2) ways of forming the subset. That is,

(7 choose 3) = (6 choose 3) + (6 choose 2)

This argument applies in general, yielding the extremely useful formula

(𝑛 choose 𝑘) = (𝑛 − 1 choose 𝑘) + (𝑛 − 1 choose 𝑘 − 1)

If we like, we can apply it to each of (6 choose 3) and (6 choose 2) in our example and get

(7 choose 3) = (6 choose 3) + (6 choose 2) = (5 choose 3) + (5 choose 2) + (5 choose 2) + (5 choose 1)

And then we can apply it to each of the four terms on the right, and so on. This process of substitution has a nice property—it’s guaranteed to stop. Each iteration reduces the number at the top of the binomial coefficients. Given this, one of two things is bound to happen sooner or later. Either:

• We’ll get a binomial coefficient with the top number equal to the bottom number—i.e. something like (3 choose 3). In this case, we can stop applying the formula and just replace the binomial coefficient with 1, since there is exactly one way of forming a subset of 𝑛 elements from a set of 𝑛 elements.
• We’ll get a binomial coefficient with zero as the bottom number. In this case, we can also replace the binomial coefficient with 1, since, no matter how large a set we are considering, there is exactly one subset of size zero: 𝜙.

In other words, eventually, we’ll have (7 choose 3) = 1 + 1 + ⋯ + 1, and we can calculate the value just by adding.
7
Rather than using our formula to break ( ) into smaller and smaller pieces, however, we can apply the
3
formula in the other direction. Consider the following pattern of numbers:
0
( )
0
1 1
( ) ( )
0 1
2 2 2
( ) ( ) ( )
0 1 2
3 3 3 3
( ) ( ) ( ) ( )
0 1 2 3

We’ve just argued that each of the numbers along the outer diagonals must be 1, since it’s either of the form (𝑛 choose 0) or of the form (𝑛 choose 𝑛). All of the other numbers can be filled in row by row, working downward using our formula, which says that each is the sum of the two closest numbers in the preceding row. For example, once the “2” row has been filled in, we get (3 choose 1) = (2 choose 1) + (2 choose 0) and (3 choose 2) = (2 choose 2) + (2 choose 1).
Working downward in this way, we quickly get to the following:

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 𝟑𝟓 35 21 7 1
The number 35, fourth from the left on the bottom row, is (7 choose 3) and it has been calculated using only additions.

This triangular pattern of binomial coefficients is called Pascal’s triangle, and it has very many
interesting and useful properties. One that we’ve already seen is that numbers on the rows sum to
successive powers of two. For example, 1 + 4 + 6 + 4 + 1 = (4 choose 0) + (4 choose 1) + (4 choose 2) + (4 choose 3) + (4 choose 4) = 2^4. Another is that the triangle is symmetrical. This is just another way of saying, as we’ve seen (Item 9), that (𝑛 choose 𝑘) = (𝑛 choose 𝑛 − 𝑘).
𝑘 𝑛−𝑘
Note that the approach we’ve taken in constructing Pascal’s triangle—saving the results of small
versions of a problem and using them to get solutions for ever larger versions of the same problem—is
an example of dynamic programming. You’ll likely study other applications of the technique in a class
on algorithms.
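
Here’s the build-up as a short Python sketch, using additions only:

def pascal_row(n):
    # Start from the "0" row and repeatedly add adjacent pairs.
    row = [1]
    for _ in range(n):
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

print(pascal_row(7))     # [1, 7, 21, 35, 35, 21, 7, 1]
print(pascal_row(7)[3])  # 35, i.e. (7 choose 3)
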
26. Compute ∑_{𝒊=𝟎}^{𝒏} (−𝟏)^𝒊 (𝒏 choose 𝒊).
The inclusion of a factor of (−1)^𝑖 in a sum is a standard notational trick. Writing out the first few terms of the sum, we can see its effect:

(−1)^0 (𝑛 choose 0) + (−1)^1 (𝑛 choose 1) + (−1)^2 (𝑛 choose 2) + (−1)^3 (𝑛 choose 3) + ⋯ = (𝑛 choose 0) − (𝑛 choose 1) + (𝑛 choose 2) − (𝑛 choose 3) + ⋯
The result is an alternating series, with every other term negative.

One way of computing this particular alternating series is to start by recalling the binomial theorem
(Item 9):
(𝑥 + 𝑦)^𝑛 = ∑_{𝑖=0}^{𝑛} (𝑛 choose 𝑖) 𝑥^(𝑛−𝑖) 𝑦^𝑖

This holds for any values of 𝑥 and 𝑦; let’s choose 𝑥 = 1 and 𝑦 = −1. The left side of the equation is then (1 + (−1))^𝑛 = 0^𝑛 = 0. The right side is

∑_{𝑖=0}^{𝑛} (𝑛 choose 𝑖) 1^(𝑛−𝑖) (−1)^𝑖 = ∑_{𝑖=0}^{𝑛} (−1)^𝑖 (𝑛 choose 𝑖)

exactly the sum we are trying to compute.

Note that this result says that an alternating sum of the entries in any row of Pascal’s triangle is equal to
zero. For example, on the “4” row, we have: 1 − 4 + 6 − 4 + 1 = 0.

A second way of arriving at the same result is to apply the formula from Item 25 (we’ve reversed the
order of the terms on the right, because it will be more convenient for this application):
(𝑛 choose 𝑘) = (𝑛 − 1 choose 𝑘 − 1) + (𝑛 − 1 choose 𝑘)

The sum we want is:

(𝑛 choose 0) − (𝑛 choose 1) + (𝑛 choose 2) − (𝑛 choose 3) + ⋯ ± (𝑛 choose 𝑛 − 1) ∓ (𝑛 choose 𝑛)

Whether 𝑛 is even or odd will determine the signs of the last two terms; for now we leave this ambiguous, but indicate that the signs are opposite. Now we apply the formula to all but the first and last terms:

(𝑛 choose 0) − ((𝑛 − 1 choose 0) + (𝑛 − 1 choose 1)) + ((𝑛 − 1 choose 1) + (𝑛 − 1 choose 2)) − ((𝑛 − 1 choose 2) + (𝑛 − 1 choose 3)) + ⋯ ± ((𝑛 − 1 choose 𝑛 − 2) + (𝑛 − 1 choose 𝑛 − 1)) ∓ (𝑛 choose 𝑛)

Note that, except for the first two and the last two terms, everything now cancels. And since binomial
coefficients with the bottom zero or the top and bottom the same are equal to one, both the first and
last pair are 1 − 1 = 0. Thus the whole sum is zero.

A third way to get the result is to think what the positive and negative parts of the sum are counting.
The positive parts are:
(𝑛 choose 0) + (𝑛 choose 2) + (𝑛 choose 4) + ⋯

This is counting the total number of subsets of even size. Likewise, the negative parts

(𝑛 choose 1) + (𝑛 choose 3) + (𝑛 choose 5) + ⋯
count the number of subsets of odd size. To say that the difference of these is zero is exactly the same
as saying that the number of even subsets is the same as the number of odd subsets, and we can argue
this directly.

If we are forming subsets from some set, say {𝑎, 𝑏, 𝑐, … }, we can form the subsets into pairs which are
identical in every way except that one contains 𝑎 and the other does not. Since the difference in size is exactly one, one of the subsets must be of even size and the other of odd size. But if each pair
contributes one even-sized subset and one odd-sized, the total number of each must be the same.

The part of mathematics that deals with counting problems, like the ones we have been discussing, is called combinatorics. This third argument, relying as it does on counting arguments, is called combinatorial. We say that we have given a combinatorial proof that

∑_{𝑖=0}^{𝑛} (−1)^𝑖 (𝑛 choose 𝑖) = 0
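
A quick numerical check:

from math import comb

for n in range(1, 8):
    print(sum((-1) ** i * comb(n, i) for i in range(n + 1)))  # 0 every time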

27. Prove algebraically that (𝒏 choose 𝒌) = (𝒏 − 𝟏 choose 𝒌) + (𝒏 − 𝟏 choose 𝒌 − 𝟏).
In Item 25, we gave a combinatorial proof of this formula, counting separately two ways of forming a 𝑘-
element subset—either skipping the first element and choosing 𝑘 from the rest or including the first
element and choosing 𝑘 − 1 from the rest. Now we get the same result just by doing some algebra.
The right side of the equation is (𝑛 − 1 choose 𝑘) + (𝑛 − 1 choose 𝑘 − 1). We can write each coefficient in terms of factorials, using the formula of Item 17 (note that the last factor on the lower right is ((𝑛 − 1) − (𝑘 − 1))! = (𝑛 − 𝑘)!):

(𝑛 − 1)!/(𝑘! (𝑛 − 1 − 𝑘)!) + (𝑛 − 1)!/((𝑘 − 1)! (𝑛 − 𝑘)!)
We’d like to get a common denominator. Noting that 𝑘! = 𝑘 ⋅ (𝑘 − 1)!, we multiply the second fraction by 𝑘/𝑘 to get

(𝑛 − 1)!/(𝑘! (𝑛 − 1 − 𝑘)!) + 𝑘(𝑛 − 1)!/(𝑘! (𝑛 − 𝑘)!)
Likewise, (𝑛 − 𝑘)! = (𝑛 − 𝑘)(𝑛 − 𝑘 − 1)!. So, multiplying the first fraction by (𝑛 − 𝑘)/(𝑛 − 𝑘), we get

(𝑛 − 𝑘)(𝑛 − 1)!/(𝑘! (𝑛 − 𝑘)!) + 𝑘(𝑛 − 1)!/(𝑘! (𝑛 − 𝑘)!)
Now that we have a common denominator, we can combine the fractions:

((𝑛 − 𝑘)(𝑛 − 1)! + 𝑘(𝑛 − 1)!)/(𝑘! (𝑛 − 𝑘)!)

Factor out (𝑛 − 1)! on top to get

(𝑛 − 𝑘 + 𝑘)(𝑛 − 1)!/(𝑘! (𝑛 − 𝑘)!) = 𝑛(𝑛 − 1)!/(𝑘! (𝑛 − 𝑘)!) = 𝑛!/(𝑘! (𝑛 − 𝑘)!) = (𝑛 choose 𝑘)
28. Prove algebraically and combinatorially that (𝒏 choose 𝒌) = (𝒏/𝒌)(𝒏 − 𝟏 choose 𝒌 − 𝟏).

For the algebraic proof, rewrite the right side, using factorials for the binomial coefficient (note that for the second factor on the bottom, we’ve relied on the fact that (𝑛 − 1) − (𝑘 − 1) = 𝑛 − 𝑘):

(𝑛/𝑘) ⋅ (𝑛 − 1)!/((𝑘 − 1)! (𝑛 − 𝑘)!)

Since 𝑛(𝑛 − 1)! = 𝑛! and 𝑘(𝑘 − 1)! = 𝑘!, we then have

𝑛!/(𝑘! (𝑛 − 𝑘)!) = (𝑛 choose 𝑘)
For the combinatorial proof, remember that the left side, (𝑛 choose 𝑘), is the number of 𝑘-element subsets of a set of 𝑛 elements. One way to choose such subsets is to first pick a single element out of the original 𝑛 and then an additional 𝑘 − 1 elements out of the remaining 𝑛 − 1. There are 𝑛 choices for the first stage and (𝑛 − 1 choose 𝑘 − 1) for the second stage, so the total number of ways of specifying a subset is 𝑛 ⋅ (𝑛 − 1 choose 𝑘 − 1).
If we make a list of all these ways, however, we’ll find that each subset is listed more than once. For
example, if we are forming 3-element subsets from the set {𝑎, 𝑏, 𝑐, 𝑑, 𝑒}, then the list will include

Choose 𝑎 and then {𝑐, 𝑑} yielding {𝑎, 𝑐, 𝑑}


Choose 𝑐 and then {𝑎, 𝑑} yielding {𝑎, 𝑐, 𝑑}
Choose 𝑑 and then {𝑎, 𝑐} yielding {𝑎, 𝑐, 𝑑}

As the example shows, every subset will be listed once for each element in it—three times here, since
we are forming 3-element subsets, and 𝑘 times in general. The number of distinct subsets is therefore the length of the list divided by 𝑘, i.e.

(𝑛/𝑘)(𝑛 − 1 choose 𝑘 − 1)
29. What is the sum of the first 𝒏 odd numbers?

The first 𝑛 odd numbers can be written this way: 2 ⋅ 1 − 1, 2 ⋅ 2 − 1, 2 ⋅ 3 − 1, … ,2 ⋅ 𝑛 − 1. So the sum
we want is
∑_{𝑖=1}^{𝑛} (2𝑖 − 1)

By commutativity, this breaks into two sums


∑_{𝑖=1}^{𝑛} 2𝑖 − ∑_{𝑖=1}^{𝑛} 1

We factor 2 out of the first sum and are left with a familiar sum for which we have a formula; the second
sum is easy (Item 22). Thus we have
2 ⋅ 𝑛(𝑛 + 1)/2 − 𝑛 ⋅ 1 = 𝑛^2 + 𝑛 − 𝑛 = 𝑛^2

The sum of the first 𝑛 odd numbers is 𝑛^2.
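
A quick check of the result:

for n in [1, 2, 10, 100]:
    print(sum(2 * i - 1 for i in range(1, n + 1)) == n ** 2)  # True each time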

A very different proof of this fact starts off as follows. Suppose, somehow, we happen to know that the sum of the first 27 odd numbers is 27^2. What does this tell us about the sum of the first 28 odd numbers? That sum is

(2 ⋅ 1 − 1) + (2 ⋅ 2 − 1) + (2 ⋅ 3 − 1) + ⋯ + (2 ⋅ 27 − 1) + (2 ⋅ 28 − 1)

We’re adding 28 numbers, but we already know the sum of the first 27. Plugging that in, we get

27^2 + (2 ⋅ 28 − 1)

Now we substitute 27 + 1 in place of 28 and do a little algebraic rearranging:

27^2 + (2 ⋅ (27 + 1) − 1) = 27^2 + 2 ⋅ 27 + 1 = (27 + 1)^2 = 28^2

In other words, if we know that the sum of the first 27 odd numbers is 27^2, we can use that to prove that the sum of the first 28 odd numbers is 28^2.

Next, notice that the argument just given doesn’t rely on anything special about the numbers 27 and 28. If we substitute 43 and 44 instead, we’ll have a proof that if the first 43 odd numbers sum to 43^2 then the first 44 odd numbers sum to 44^2.

Let’s put the argument in general terms. Suppose we somehow know that the first 𝑘 odd numbers sum to 𝑘^2. What does this tell us about the sum of the first 𝑘 + 1 odd numbers? That sum is

(2 ⋅ 1 − 1) + (2 ⋅ 2 − 1) + (2 ⋅ 3 − 1) + ⋯ + (2 ⋅ 𝑘 − 1) + (2 ⋅ (𝑘 + 1) − 1)

We’re adding 𝑘 + 1 numbers, but we already know the sum of the first 𝑘. Plugging that in, we get

𝑘^2 + (2 ⋅ (𝑘 + 1) − 1)

Then, doing a little algebra we have:

𝑘^2 + (2 ⋅ (𝑘 + 1) − 1) = 𝑘^2 + 2𝑘 + 1 = (𝑘 + 1)^2

In other words, if we know that the sum of the first 𝑘 odd numbers is 𝑘^2, we can use that to prove that the sum of the first 𝑘 + 1 odd numbers is (𝑘 + 1)^2.

Now how can we use this result?

What we’re trying to prove is a general statement: for any 𝑛, the sum of the first 𝑛 odd numbers is 𝑛^2. What we’ve shown so far is that if the statement is true when 𝑛 = 27 then it’s also true when 𝑛 = 28; and that, if the statement is true when 𝑛 = 43, it’s also true when 𝑛 = 44; and, in general, if the statement is true when 𝑛 = 𝑘, for any particular number 𝑘, then it’s also true when 𝑛 = 𝑘 + 1.

So far, though, we don’t actually know that the statement is ever true—only that if it’s true in some
cases, it will be true in others. Luckily, in addition to these conditional if results, we can also easily verify
one simple unconditional fact: the statement is true when 𝑛 = 1. We know this is true, because we can check it directly. In this case, all we’re saying is that the first odd number is equal to 1^2, and this is clearly correct.

But now our conditionals come into play. One of them tells us that if the statement is true for 𝑛 = 1
then it is also true for 𝑛 = 2. Since we’ve just checked that it is true for 𝑛 = 1, we can conclude it’s true for 𝑛 = 2. The next tells us that if it’s true for 𝑛 = 2 then it’s also true for 𝑛 = 3. Since we’ve already
seen that the statement is true for 𝑛 = 2, it must be true for 𝑛 = 3 as well.

Clearly there’s no limit to how far we can go. Thus, we may conclude that the statement—the sum of the first 𝑛 odd numbers is 𝑛^2—is true for all values of 𝑛 in the infinite set {1, 2, 3, … }.

This kind of argument, where we

• Show that a statement is true for one particular value of 𝑛
• Show that if it’s true for any particular value then it’s also true for the next higher value and finally
• Conclude that it’s true for all values greater than or equal to the first one

is called a proof by induction.

A few points are worth noting. First, proof by induction is much more natural than it may first appear.
Imagine a long line of people waiting at a checkout counter. Suppose there’s a special discount and that
anyone in line who hears about the discount will tell the person waiting just behind him or her. What
can we conclude about who will hear about the discount? Nothing, so far. If anyone hears, so will all
the people farther back in the line. But from the information given so far, it’s perfectly possible that no
one will hear anything. On the other hand, if we add just one fact—that the first person in line hears
about the discount—we can immediately conclude that everyone will hear.

The second point is that there is a standard formal terminology used in presenting induction proofs. The single hard unconditional fact is called the base case. The supposition¹ that the statement we want to prove is true for some particular value of 𝑛 is called the induction hypothesis; very typically, a particular value that we want to consider is denoted 𝑘. The crux of the proof—that if the statement is true for any particular 𝑘 it is also true for the next higher value 𝑘 + 1—is called the induction step.

The standard formal presentation of the argument we’ve just seen would then look like this:

Theorem: The sum of the first 𝑛 odd numbers is 𝑛^2.

Base case: Let 𝑛 = 1. Then the sum of the first 𝑛 odd numbers is 1 = 1^2, as the theorem claims.

Induction step: Assume that the theorem is true for 𝑛 = 𝑘; this is our induction hypothesis. Consider the sum of the first 𝑘 + 1 odd numbers:

(2 ⋅ 1 − 1) + (2 ⋅ 2 − 1) + ⋯ + (2 ⋅ 𝑘 − 1) + (2 ⋅ (𝑘 + 1) − 1)

By the induction hypothesis, we can replace the first 𝑘 terms with 𝑘^2 to get

𝑘^2 + (2 ⋅ (𝑘 + 1) − 1)

But this simplifies to (𝑘 + 1)^2, as we wished to prove.

¹ By ‘supposition’ we mean the thing we’re supposing is true. We showed that if the sum of the first 27 odd numbers is 27^2 then the sum of the first 28 odd numbers is 28^2. In making this argument we temporarily supposed that we somehow knew that the first 27 did sum to 27^2. We might also have started by saying “Assume that the sum of the first 27 odd numbers is 27^2” or even just “If the sum of the first 27 odd numbers is 27^2.”

The third point, though, is that induction proofs are very rarely presented in this formal way. Other than
in a textbook teaching about proofs by induction, something like the following would be much more
common:

Theorem: The sum of the first 𝑛 odd numbers is 𝑛^2.

We proceed by induction on 𝑛. The theorem is clearly true for 𝑛 = 1. If it’s true for the first 𝑘 odd numbers, then the sum of the first 𝑘 + 1 odd numbers would be 𝑘^2 + (2 ⋅ (𝑘 + 1) − 1) = (𝑘 + 1)^2, proving the theorem.

It’s important to be able to recognize and understand the basic checkout-line argument being employed
regardless of how the proof is presented.

We’ve given two proofs that the sum of the first 𝑛 odd numbers is 𝑛^2. Here’s a third, much nicer one:

1 3 5 7
3 3 5 7
5 5 5 7
7 7 7 7

Doesn’t this make it immediately obvious that 1 + 3 + 5 + 7 = 4^2 and that the same pattern will hold for any value of 𝑛?

Of course, we really didn’t need any of the proofs just given. The sum we’re discussing is arithmetic and
can be handled by the reversal technique we used in Item 1. Twice the sum we want is
1 + 3 + ⋯ + (2𝑛 − 1) +
(2𝑛 − 1) + (2𝑛 − 3) + ⋯ + 1

Adding vertical pairs, we get 𝑛 terms, each equal to 2𝑛, for a total of 2𝑛^2. Divide by two to get 𝑛^2.

30. Compute ∑_{𝒊=𝟏}^{𝒏} 𝒊^𝟐.

Our plan is to get a formula using guesswork and then prove it correct using induction. If we add up the
first 𝑛 numbers to the zero power, each is equal to 1 and the answer is 𝑛:

1^0 + 2^0 + 3^0 + ⋯ + 𝑛^0 = 𝑛

We’ve already seen that if the power is one in each term, we get an answer involving 𝑛^2:

1^1 + 2^1 + 3^1 + ⋯ + 𝑛^1 = 𝑛(𝑛 + 1)/2 = (1/2)𝑛^2 + (1/2)𝑛
Let’s see what happens if we guess that using the power two will give us an answer involving 𝑛^3, that is

1^2 + 2^2 + 3^2 + ⋯ + 𝑛^2 = 𝑎𝑛^3 + 𝑏𝑛^2 + 𝑐𝑛 + 𝑑

for some numbers 𝑎, 𝑏, 𝑐 and 𝑑. In order for the formula to be correct for 𝑛 = 1, we must have

1^2 = 𝑎 ⋅ 1^3 + 𝑏 ⋅ 1^2 + 𝑐 ⋅ 1 + 𝑑

or 𝑎 + 𝑏 + 𝑐 + 𝑑 = 1. Likewise, for the formula to be correct for 𝑛 = 2, we must have

1^2 + 2^2 = 𝑎 ⋅ 2^3 + 𝑏 ⋅ 2^2 + 𝑐 ⋅ 2 + 𝑑

or 8𝑎 + 4𝑏 + 2𝑐 + 𝑑 = 5. We get two additional equations for the cases 𝑛 = 3 and 𝑛 = 4. This gives us four equations in four variables and—with substantial but routine effort—we can solve to find that 𝑎 = 1/3, 𝑏 = 1/2, 𝑐 = 1/6, 𝑑 = 0. That is, if our idea is to work, the formula must be

∑_{𝑖=1}^{𝑛} 𝑖^2 = (1/3)𝑛^3 + (1/2)𝑛^2 + (1/6)𝑛

We can factor out 1/6 from each term to get (1/6)(2𝑛^3 + 3𝑛^2 + 𝑛) and then factor the polynomial to get (1/6)𝑛(𝑛 + 1)(2𝑛 + 1).
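
Before proving the formula, we can at least check it numerically well past the four cases it was designed around:

for n in range(1, 100):
    assert sum(i ** 2 for i in range(1, n + 1)) == n * (n + 1) * (2 * n + 1) // 6
print("formula holds for n = 1, ..., 99")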

This formula works by design for the first four cases. But is it always true? Here is the induction proof that it is.

Plugging in 𝑛 = 1, we get (1/6) ⋅ 1 ⋅ (1 + 1) ⋅ (2 ⋅ 1 + 1) = 1, which is correct if our sum has just one term, 1^2. Assume then that the formula is correct for the sum of the first 𝑘 squares. What if we add up the first 𝑘 + 1? Based on our assumption, this will be

(1/6)𝑘(𝑘 + 1)(2𝑘 + 1) + (𝑘 + 1)^2

Factor out 1/6 from both terms:

(1/6)[𝑘(𝑘 + 1)(2𝑘 + 1) + 6(𝑘 + 1)^2]

Factor out 𝑘 + 1:

(1/6)(𝑘 + 1)[𝑘(2𝑘 + 1) + 6(𝑘 + 1)]

Inside the square brackets we have 𝑘(2𝑘 + 1) + 6(𝑘 + 1) = 2𝑘^2 + 7𝑘 + 6, which factors into (𝑘 + 2)(2𝑘 + 3). So now we have

(1/6)(𝑘 + 1)(𝑘 + 2)(2𝑘 + 3)

If we rewrite the last two factors in terms of 𝑘 + 1, we get

(1/6)(𝑘 + 1)((𝑘 + 1) + 1)(2(𝑘 + 1) + 1)

But this is exactly the formula we’re trying to prove, with 𝑘 + 1 taken as the value of 𝑛—if we assume the formula is correct for 𝑘 then we’ve shown it’s correct for 𝑘 + 1 as well, completing the induction proof.

31. Compute ∑_{𝒊=𝟏}^{𝒏} 𝒊^𝟑.

Exactly the approach we’ve just applied in computing the sum of squares will work here as well. But
here’s a much nicer idea. When we added up the entries in an 𝑛 × 𝑛 multiplication table, we first formed the column sums and then added them together. What happens if instead we add by “layers,”
according to the following picture:

     1   2   3   4
 1   1   2   3   4
 2   2   4   6   8
 3   3   6   9  12
 4   4   8  12  16

When we add the 𝑛th layer, the numbers along the bottom are 𝑛 ⋅ 1, 𝑛 ⋅ 2, 𝑛 ⋅ 3, … , 𝑛 ⋅ 𝑛. The additional
numbers along the right—not counting the last one, which we’ve already accounted for—are 𝑛 ⋅ 1, 𝑛 ⋅
2, 𝑛 ⋅ 3, … 𝑛 ⋅ (𝑛 − 1). If we reverse the second list and add it to the first, we get the sum of all numbers
in the layer:
𝑛 ⋅ 1 + 𝑛 ⋅ 2 + ⋯ + 𝑛 ⋅ (𝑛 − 1) + 𝑛 ⋅ 𝑛
𝑛 ⋅ (𝑛 − 1) + 𝑛 ⋅ (𝑛 − 2) + ⋯ + 𝑛 ⋅ 1

Summing these up vertically, we find that every pair is 𝑛 ⋅ 𝑛. We have 𝑛 pairs, so the final total is 𝑛^3. That is, when we add up the 𝑛 layers in an 𝑛 × 𝑛 multiplication table, we’re adding 1^3 + 2^3 + 3^3 + ⋯ + 𝑛^3, exactly the sum we want. And whether we add by rows or columns or layers, the answer is the same, so the result must be what we computed in Item 24:
∑_{𝑖=1}^{𝑛} 𝑖^3 = 𝑛^2(𝑛 + 1)^2/4
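
A quick check:

for n in range(1, 100):
    assert sum(i ** 3 for i in range(1, n + 1)) == n ** 2 * (n + 1) ** 2 // 4
print("formula holds for n = 1, ..., 99")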

32. Prove the correctness of a recursive algorithm for calculating binomial coefficients.

Here is a function for calculating binomial coefficients. It’s written in Python, but it can be read as
pseudocode to be implemented in any familiar programming language.
def binomial(n, k):
    # Base cases: (n choose 0) = (n choose n) = 1.
    if k == 0 or k == n:
        return 1
    # Otherwise apply the formula of Item 25:
    # (n choose k) = (n-1 choose k) + (n-1 choose k-1).
    return binomial(n - 1, k) + binomial(n - 1, k - 1)

A key feature of this function is that it calls itself. If you’ve studied functions with this property, you
know that they’re called recursive. If you haven’t, you may find the idea odd at first; it may even seem
that the definition of the function is circular, like defining a word in terms of itself.

Here’s how we can check that the definition isn’t circular—that it always returns an answer rather than
getting stuck in a never-ending computation—and that it is correct—that the answer it returns is always
what it should be.

First, consider what happens when the function is called with 𝑛 = 1. In this case, the value of 𝑘 must be either 0 or 1 (we aren’t considering use of the function to calculate quantities like (1 choose 7); if we’re worried about those, we could add a condition that catches them and returns zero). For both values of 𝑘, the conditional correctly causes the function to return 1.

Next, consider what happens when the function is called with 𝑛 = 2. If 𝑘 = 0 or 2, the conditional
correctly causes the function to return 1. In the remaining case, the function applies the formula
(𝑛 choose 𝑘) = (𝑛 − 1 choose 𝑘) + (𝑛 − 1 choose 𝑘 − 1)
and it will be correct so long as the two recursive calls produce the correct answers. But in each of these
calls we have 𝑛 = 1 and we have already checked that the function works correctly in this case. So now
we know that the function is correct when 𝑛 = 2.

Exactly the same argument will now apply when 𝑛 = 3. If 𝑘 = 0 or 3, the conditional ensures that the
correct answer is returned; in every other case, the function applies the formula, making two calls with
𝑛 = 2 and we have already checked that it works correctly for this value of 𝑛.

So far we have proved that the function is correct for values of 𝑛 up to 3, but it’s clear we could
continue as far as necessary. That is, the function is correct for all values of 𝑛.

Although we haven’t used any of the standard formal terminology, it should be clear that this argument
is an induction proof. We argue directly that the program is correct when 𝑛 = 1. Then we argue that if
it is correct for any particular value of 𝑛, it will also be correct for the next higher value (since that
mainly depends on two calls with the original value).

Here’s how the induction proof might typically be presented:


Theorem: The given function correctly computes (𝑛 choose 𝑘).
We proceed by induction on 𝑛. For 𝑛 = 1, the only valid values of 𝑘 are 0 and 1; both cases are
handled correctly by the conditional.

For higher values of 𝑛, the values 𝑘 = 0 and 𝑘 = 𝑛 are again handled correctly by the
conditional. The correctness for any other value of 𝑘 depends on calls involving 𝑛 − 1, which
are correct by the induction hypothesis.
33. How many function calls will the recursive algorithm make in computing (𝒏 choose 𝒌)?
If we use the function of Item 32 to calculate, say, (7 choose 3), it will call itself twice, once to calculate (6 choose 3) and once to calculate (6 choose 2). Then each of these will generate more calls. If we use the notation 𝑁_{𝑛,𝑘} to mean the total number of calls when we use the function to calculate (𝑛 choose 𝑘), then what we have so far is

𝑁_{7,3} = 2 + 𝑁_{6,3} + 𝑁_{6,2}
We can go a step further, substituting in for 𝑁_{6,3} and 𝑁_{6,2}. In calculating (6 choose 3), for example, the function will call itself twice, and each of these calls will in turn generate further calls. After substituting, we get

𝑁_{7,3} = 2 + (2 + 𝑁_{5,3} + 𝑁_{5,2}) + (2 + 𝑁_{5,2} + 𝑁_{5,1})

This process of substitution is not nearly complete, but it will stop eventually. Sooner or later the
subscripts will either be equal or the second one (the 𝑘 value) will be zero. In either case, the function will return its answer using the conditional and will make no recursive calls. In other words, for example, 𝑁_{4,0} = 0 and 𝑁_{3,3} = 0.

So if we go on substituting (and leaving out zeroes), eventually we’ll get our answer in the form

𝑁_{7,3} = 2 + 2 + ⋯ + 2

As with Pascal’s triangle, though, it’s more efficient to build up. As we’ve seen, we have
𝑁_{𝑛,0} = 𝑁_{𝑛,𝑛} = 0
𝑁_{𝑛,𝑘} = 2 + 𝑁_{𝑛−1,𝑘} + 𝑁_{𝑛−1,𝑘−1}   for 0 < 𝑘 < 𝑛

This way of specifying values based on previous ones (with a base case to get things started) is called an inductive definition. The second line (the recursive part) is called a recurrence. Using this definition, we can quickly build up a triangular table of values for 𝑁, with the values on the 𝑛th row being 𝑁_{𝑛,0}, 𝑁_{𝑛,1}, 𝑁_{𝑛,2}, … , 𝑁_{𝑛,𝑛}. The base case tells us that the outside diagonal values are all zero. The recurrence tells us that any other value is two more than the sum of the values on either side in the row just above.

0
0 0
0 2 0
0 4 4 0
0 6 10 6 0
0 8 18 18 8 0
0 10 28 38 28 10 0
0 12 40 𝟔𝟖 68 40 12 0

From the last row we can tell, for example, that 68 function calls are made in calculating (7 choose 3).
Using this approach, we can determine how many function calls are made in calculating (𝑛 choose 𝑘) for any values of 𝑛 and 𝑘. But the answer is a bit unsatisfying. What we’d really like is a formula based on 𝑛 and 𝑘 that tells us this number. Is such a formula possible?

34. Prove that the number of odd-degree vertices in a graph is always even.

Here’s a picture of dots connected by lines; it shows five dots, labeled 𝑎 through 𝑒, with lines joining some pairs (the details are spelled out below).

It might represent five cities and the roads between them. Or it might be a computer network, the dots
being servers and the lines being communications links. Or the dots might be people, with lines
connecting each pair who know each other.

In fact, there are so many useful things that can be represented with pictures of this kind, that there’s an
extensive terminology for talking about them. The dots are called nodes or vertices (each one is a
vertex) and the lines are called edges. The picture as a whole is a graph and the subfield of mathematics
concerned with graphs is called graph theory.
If we have 𝑛 vertices, there are (𝑛 choose 2) possible edges, one for each pair of vertices. If a graph includes all
2
of them—that is, if every pair of vertices is connected by an edge—we say the graph is complete. Note,
by the way, that we’re allowing only one edge between any pair of vertices—there can’t be two roads
from 𝑎 to 𝑏—and that we’re not allowing an edge from one vertex to that same vertex—no road goes
from 𝑎 to 𝑎. A graph of this kind is called simple.

With roads in mind, we can think of travelling from vertex to vertex in a graph along the edges. The
series of edges traversed (or the vertices visited) is then called a path. If there’s a path from every
vertex to every other vertex in a graph, we say the graph is connected. If there’s a path from a vertex
back to itself—with no edge used twice—the graph contains a kind of loop; the technical term is cycle.

In our sample graph, vertex 𝑏 is the endpoint of just one edge. If this is a map of cities, 𝑏 will be cut off
if just one road is washed out. If the graph is a network diagram, server 𝑏 won’t be able to
communicate at all if a single connection goes down. By contrast, 𝑑 is highly connected, being the
endpoint of three edges. If an edge ends at a vertex, we say it is incident to that vertex. We call the
number of edges incident to a vertex its degree. In our graph, vertices 𝑎, 𝑐 and 𝑒 all have degree 2; 𝑏
has degree 1 and 𝑑 has degree 3.

Note that in our graph there are two vertices of odd degree and that two is an even number. In fact, the
number of odd-degree vertices is always even. Here’s a proof by induction on the number of edges.

If we have no edges at all, then every vertex is of degree 0, so there are no odd-degree vertices—and
since zero is an even number, the claim is correct.

Suppose the claim is correct after we’ve added some number of edges. What happens if we add one
more?

 If the new edge goes between two odd-degree nodes, it adds one to the degree of each, making
the degree of each even. This means the number of odd-degree nodes goes down by 2. But by
the induction hypothesis, this number was even before we added the new edge. If it goes down
by 2, it will still be even.
 If the new edge goes between two even-degree nodes, it makes them both odd-degree. The
number of odd-degree nodes goes up by 2, so if it used to be even, the number remains even.
 If the new edge goes between an odd-degree node and an even-degree node, their parities—
whether they’re even or odd—will be reversed and the number of odd-degree nodes will be
unchanged. If the number was even, it remains even.

35. Prove that at any party there will be two people who know the same number of guests.

We can represent the people at the party as vertices in a graph, with edges connecting each pair who
are acquainted. The claim is then one of graph theory: Every graph has two nodes of the same degree.

If there are 𝑛 vertices, then any particular vertex can have edges connecting it directly to any of the 𝑛 −
1 others; that is, its degree may be anything from 0 to 𝑛 − 1. On the other hand, it’s impossible for a
graph to have both a vertex of degree 0 and one of degree 𝑛 − 1. If one vertex has degree 𝑛 − 1,
there’s an edge from it to all the other vertices, meaning that each of those vertices has at least one
incident edge—i.e. that its degree is at least 1 and hence it can’t be a node of degree 0.

Looking at the graph, then, we’ll either see all degrees belonging to the set {0, 1, 2, … , 𝑛 − 2} or all
degrees belonging to the set {1, 2, 3, … , 𝑛 − 1}. In either case we have 𝑛 numbers drawn from a set of
size 𝑛 − 1. One of these numbers must be used twice.

At the end of this proof, we relied implicitly on a commonsense notion—you can’t pick 𝑛 times from a
list of 𝑛 − 1 options without repeating yourself. If a restaurant only serves four things and you go five
times, you’ll have to eat something twice. Mathematicians call this commonsense notion the
pigeonhole principle. If you have 𝑛 pigeons and 𝑛 − 1 pigeonholes for them to live in, some hole must
house at least two birds.

By the pigeonhole principle, there must be two people in New York City who have exactly the same
number of hairs on their heads. It’s a biological fact that no one has more than 500,000 head hairs. So
the number associated with each person is one from the set {0, 1, 2, … , 500,000}. But there are more
than 8 million New Yorkers. If we make a list of 8 million numbers from the set {0, 1, 2, … , 500,000},
some number will have to be listed twice.

36. In how many ways can we tile an 𝒏 × 𝟐 area with 𝟏 × 𝟐 tiles?

Here are two sample tilings of a 7 × 2 area:

Let 𝑁𝑛 denote the number of tilings of an 𝑛 × 2 area. If we’re tiling such an area, working from the left,
there are two possible ways to start. Either we first lay down a vertical tile; in this case, we have 𝑁𝑛−1
ways to finish the job. Or we lay down two horizontal tiles; in this case, we have 𝑁𝑛−2 ways to finish.

If we note that 𝑁1 and 𝑁2 are easy, then we can write a complete inductive definition:
𝑁1 = 1
𝑁2 = 2
𝑁𝑛 = 𝑁𝑛−1 + 𝑁𝑛−2 for 𝑛 > 2
This allows us to efficiently compile a table of values of 𝑁 as high as we like:

𝑛 1 2 3 4 5 6 7
𝑁𝑛 1 2 3 5 8 13 21

According to the recurrence, each value after the base cases is just the sum of the preceding two.
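Here is a small Python sketch (ours, not from the text) that compiles the same table directly from the inductive definition:

    def tilings(n):
        # N(1) = 1, N(2) = 2, N(n) = N(n-1) + N(n-2)
        a, b = 1, 2
        for _ in range(n - 1):
            a, b = b, a + b
        return a

    print([tilings(n) for n in range(1, 8)])  # [1, 2, 3, 5, 8, 13, 21]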

Again, it would be nice to have an exact formula for 𝑁𝑛 . In lieu of that, though, let’s use induction to
prove that 𝑁𝑛 grows very large very quickly, in particular that it grows at least exponentially.

Theorem: $N_{2n} \ge 2^n$.

For $n = 1$, the claim is that $N_2 \ge 2^1$, which is true. For larger values of $n$, we have $N_{2n} = N_{2n-1} + N_{2n-2}$, according to the recurrence. Applying the recurrence again, to $N_{2n-1}$, we get

$$N_{2n} = (N_{2n-2} + N_{2n-3}) + N_{2n-2} = 2N_{2(n-1)} + N_{2n-3}$$

By the induction hypothesis, $N_{2(n-1)} \ge 2^{n-1}$. So we have

$$N_{2n} \ge 2 \cdot 2^{n-1} + N_{2n-3}$$

In which case, certainly $N_{2n} \ge 2^n$.

37. How many moves does it take to solve an 𝒏-disk Tower of Hanoi puzzle?

Here’s a 4-disk Tower of Hanoi puzzle.

The goal is to move the tower of 4 disks from its current position on Peg 3 to a different one, say Peg 1.
Each move consists of transferring the top disk on one peg (we can’t get to any of the other disks) to
another peg. Also, it is illegal to put a disk on top of a smaller one.

To solve the 4-disk puzzle, we will eventually have to move the largest disk to Peg 1. The only way to do
that will be to have all the other disks out of the way on Peg 2. By the rule prohibiting large disks on top
of small disks, the three disks will have to be on Peg 2 in size order. In other words, before we can move
the largest disk, we’ll have to solve a 3-disk puzzle, moving a tower of three disks from Peg 3 to Peg 2.

Let 𝑁𝑛 be the number of moves necessary to solve an 𝑛-disk puzzle. Then to solve the 4-disk puzzle
we’ll have to

 Move the top three disks out of the way (𝑁3 moves)
 Move the single bottom disk into position (1 move)
 Move the top three disks onto the bottom one (𝑁3 moves)

The total is 2𝑁3 + 1 moves.

Of course, there’s nothing special about the number 4. In general, we have 𝑁𝑛 = 2𝑁𝑛−1 + 1. Also, if
𝑛 = 1, the puzzle is trivial. A 1-disk tower is just a single disk and it takes one move to transfer it. Given
this, we have a complete inductive definition:
𝑁1 = 1
𝑁𝑛 = 2𝑁𝑛−1 + 1 for 𝑛 > 1

Luckily, in this case we can easily get a formula for 𝑁𝑛 , instead of just compiling a table. The table will
give us a hint about the formula though:

𝑛 1 2 3 4 5
𝑁𝑛 1 3 7 15 31

Each of the numbers in the bottom row is one less than a power of 2; it certainly looks as if $N_n = 2^n - 1$. Here’s an induction proof that this is in fact the case.

Theorem: Solving an $n$-disk Tower of Hanoi puzzle requires $2^n - 1$ moves.

The formula checks for $n = 1$. For larger values of $n$, the recurrence tells us that $N_n = 2N_{n-1} + 1$. By the induction hypothesis, $N_{n-1} = 2^{n-1} - 1$. Substituting we have

$$N_n = 2N_{n-1} + 1 = 2(2^{n-1} - 1) + 1 = 2^n - 2 + 1 = 2^n - 1$$

A formula like this one, $2^n - 1$, that allows us to calculate values for any $n$ without repeatedly applying the recurrence is said to be in closed form. When we find a closed-form formula for a quantity defined inductively, we say that we are solving the recurrence which is the core of the definition.
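The recurrence translates directly into code. A minimal Python sketch (our addition, not from the original notes) counts the moves recursively and checks the closed form:

    def hanoi_moves(n):
        # N(1) = 1; N(n) = 2*N(n-1) + 1
        if n == 1:
            return 1
        return 2 * hanoi_moves(n - 1) + 1

    for n in range(1, 11):
        assert hanoi_moves(n) == 2**n - 1  # closed form agrees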

38. Solve the recurrence 𝑵𝟏 = 𝟏, 𝑵𝒏 = 𝑵𝒏−𝟏 + 𝒏.

Note first that what we are given is really an inductive definition. The recurrence is the second part.
This informal way of talking is common though.

Now consider 𝑁4 . By the recurrence, we have 𝑁4 = 𝑁3 + 4. Applying the recurrence again, to 𝑁3 , we


have 𝑁4 = 𝑁2 + 3 + 4. Applying it once more—and noting that we are given 𝑁1 = 1—we have 𝑁4 =
1 + 2 + 3 + 4.
It should be clear now that we have the closed form $N_n = n(n+1)/2$. We could prove this using induction, but that would be overkill for a problem this simple.

39. Solve the recurrence 𝑵𝟏 = 𝟏, 𝑵𝒏 = 𝟑𝑵𝒏−𝟏 + 𝟓.

Note first that this recurrence is of exactly the same form as the one we solved for the Tower of Hanoi problem in Item 37. There we had
𝑁1 = 1
𝑁𝑛 = 2𝑁𝑛−1 + 1 for 𝑛 > 1
Here we’ve changed two constants to get
𝑁1 = 1
𝑁𝑛 = 3𝑁𝑛−1 + 5 for 𝑛 > 1
With the first recurrence we just compiled a table of values, noticed the obvious pattern and then
proved it by induction. Does the same approach work here? Here’s a table of values:

𝑛 1 2 3 4 5
𝑁𝑛 1 8 29 92 281

If there’s a pattern here, it’s fair to say it isn’t obvious. We can find it, though, if we keep track of how
the values of 𝑁𝑛 are calculated, rather than just their values. Here’s a table that shows this information
for the first five:

𝑛   𝑁𝑛

1   $1$

2   $3 \cdot 1 + 5$

3   $3(3 \cdot 1 + 5) + 5 = 3^2 + 3 \cdot 5 + 5$

4   $3(3^2 + 3 \cdot 5 + 5) + 5 = 3^3 + 3^2 \cdot 5 + 3 \cdot 5 + 5$

5   $3(3^3 + 3^2 \cdot 5 + 3 \cdot 5 + 5) + 5 = 3^4 + 3^3 \cdot 5 + 3^2 \cdot 5 + 3 \cdot 5 + 5$

From the right-hand column of the table we can see that $N_n = 3^{n-1} + (3^{n-2} + 3^{n-3} + \cdots + 3^2 + 3 + 1) \cdot 5$. The expression inside the parentheses is a geometric series. Applying the formula we derived in Item 21, we get

$$N_n = 3^{n-1} + \frac{3^{n-1} - 1}{3 - 1} \cdot 5 = \frac{7}{2}\,3^{n-1} - \frac{5}{2}$$

To prove this formally, we could use induction, but the presentation we’ve just given is convincing
enough that we won’t take the trouble.

Instead, let’s take the same approach and solve a whole class of recurrences, one that includes both the
Tower of Hanoi example and the one we’ve just finished. Here’s a recurrence that includes constants
𝑎, 𝑏 and 𝑐 that can represent any numerical value:
𝑁1 = 𝑎
𝑁𝑛 = 𝑏𝑁𝑛−1 + 𝑐 for 𝑛 > 1

Again, we draw up a table showing how the first five values of 𝑁𝑛 will be calculated:

𝑛   𝑁𝑛

1   $a$

2   $b \cdot a + c$

3   $b(ba + c) + c = b^2 a + bc + c$

4   $b(b^2 a + bc + c) + c = b^3 a + b^2 c + bc + c$

5   $b(b^3 a + b^2 c + bc + c) + c = b^4 a + b^3 c + b^2 c + bc + c$

From this table we see that, in general, $N_n = b^{n-1} a + (b^{n-2} + b^{n-3} + \cdots + b^2 + b + 1)c$. Substituting for the geometric series in parentheses and then simplifying, we get

$$N_n = b^{n-1} a + \frac{b^{n-1} - 1}{b - 1}\, c = \left(a + \frac{c}{b-1}\right) b^{n-1} - \frac{c}{b-1}$$

With $a = 1$, $b = 2$ and $c = 1$, this yields $N_n = 2 \cdot 2^{n-1} - 1 = 2^n - 1$, our solution for the Tower of Hanoi recurrence. With $a = 1$, $b = 3$ and $c = 5$, it yields $N_n = \frac{7}{2}\,3^{n-1} - \frac{5}{2}$, the formula we derived earlier in this same discussion.
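A few lines of Python (a sketch of ours, using the two parameter sets just mentioned) confirm the general solution against direct iteration:

    def iterate(a, b, c, n):
        N = a                      # N(1) = a
        for _ in range(n - 1):
            N = b * N + c          # N(n) = b*N(n-1) + c
        return N

    def closed(a, b, c, n):
        return (a + c / (b - 1)) * b**(n - 1) - c / (b - 1)

    for n in range(1, 8):
        assert iterate(1, 2, 1, n) == closed(1, 2, 1, n)  # Tower of Hanoi
        assert iterate(1, 3, 5, n) == closed(1, 3, 5, n)  # this item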

40. Solve the recurrence 𝑵𝟏 = 𝟏, 𝑵𝟐 = 𝟐, 𝑵𝒏 = 𝑵𝒏−𝟏 + 𝟔𝑵𝒏−𝟐.

Start by compiling a table of values:

𝑛 1 2 3 4 5 6 7
𝑁𝑛 1 2 8 20 68 188 596

Next, look at the factors by which the numbers along the bottom are increasing: $\frac{2}{1} = 2$, $\frac{8}{2} = 4$, $\frac{20}{8} = 2.5$, $\frac{68}{20} = 3.4$, $\frac{188}{68} \approx 2.8$, $\frac{596}{188} \approx 3.2$. These factors seem perhaps to be converging to a number near 3.

Is there a simple, geometric sequence of numbers—that is, one increasing by a fixed factor like 3—that
satisfies the recurrence $N_n = N_{n-1} + 6N_{n-2}$? A geometric sequence with the factor $r$ would be of the form $N_n = ar^n$. Plugging this into the recurrence gives us

$$ar^n = ar^{n-1} + 6ar^{n-2}$$

Dividing both sides by $ar^{n-2}$ we get

$$r^2 = r + 6 \quad\text{or}\quad r^2 - r - 6 = 0 \quad\text{or}\quad (r + 2)(r - 3) = 0$$

Surprisingly then, we get two geometric sequences that work: $N_n = a(-2)^n$ and $N_n = b3^n$. And we are free to choose $a$ or $b$ any way we like.

Unfortunately, no matter how we choose, neither of these sequences gives us the base case values we want: $N_1 = 1$ and $N_2 = 2$. For example, if we have $N_n = a(-2)^n$, then we have $N_1 = -2a$ and $N_2 = 4a$. The value of $a$ would have to be $-\frac{1}{2}$ in the first case and $\frac{1}{2}$ in the second case; no single value works.

Actually, of course, this is obvious. The terms in one of our geometric sequences increase by factors of
−2 and terms in the other increase by factors of 3. But in our actual sequence, terms don’t increase by
either of these factors; for example, the ratio between the first two—the base cases—is 2.

Neither of the geometric sequences that satisfy our recurrence can be tweaked to match the base cases.
Note though that, if we have two sequences that satisfy our recurrence, then their sum works as well. If
𝐴𝑛 = 𝐴𝑛−1 + 6𝐴𝑛−2 and 𝐵𝑛 = 𝐵𝑛−1 + 6𝐵𝑛−2 then if we define 𝐶𝑛 = 𝐴𝑛 + 𝐵𝑛 we have
𝐶𝑛 = 𝐴𝑛 + 𝐵𝑛
= 𝐴𝑛−1 + 6𝐴𝑛−2 + 𝐵𝑛−1 + 6𝐵𝑛−2
= 𝐴𝑛−1 + 𝐵𝑛−1 + 6(𝐴𝑛−2 + 𝐵𝑛−2 )
= 𝐶𝑛−1 + 6𝐶𝑛−2
So now we can set $N_n = a(-2)^n + b3^n$ and choose $a$ and $b$ to match our base case values. We have

$$N_1 = 1 = a(-2)^1 + b3^1 = -2a + 3b$$

$$N_2 = 2 = a(-2)^2 + b3^2 = 4a + 9b$$

We have two equations and two variables; the solution is $a = -\frac{1}{10}$, $b = \frac{4}{15}$. That is, our closed-form solution is

$$N_n = -\frac{1}{10}(-2)^n + \frac{4}{15}\,3^n$$
Since the formula matches the base cases—by design—and also the recurrence, it must match every
term in the sequence.
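A quick Python check (our sketch, using exact fractions to avoid rounding error) confirms both the base cases and the recurrence:

    from fractions import Fraction

    def closed(n):
        return -Fraction(1, 10) * (-2)**n + Fraction(4, 15) * 3**n

    vals = [closed(n) for n in range(1, 8)]
    print([int(v) for v in vals])  # [1, 2, 8, 20, 68, 188, 596]
    assert all(vals[i] == vals[i - 1] + 6 * vals[i - 2] for i in range(2, 7))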

41. Get a closed-form formula for the number of ways of tiling an 𝒏 × 𝟐 area with 𝟏 × 𝟐 tiles.

The approach of Item 40 applies. We have an inductive definition of the number we want from Item 36:
𝑁1 = 1
𝑁2 = 2
𝑁𝑛 = 𝑁𝑛−1 + 𝑁𝑛−2 for 𝑛 > 2
Assuming a geometric solution for the recurrence, i.e. one of the form $N_n = ar^n$, leads to the equation

$$ar^n = ar^{n-1} + ar^{n-2}$$

Dividing through by $ar^{n-2}$ gives $r^2 = r + 1$ or $r^2 - r - 1 = 0$. We can solve this quadratic equation using the standard formula, yielding two possible values for $r$: $r_1 = \frac{1+\sqrt{5}}{2}$ and $r_2 = \frac{1-\sqrt{5}}{2}$.² Neither one of these allows us to match both base case values, but we can add the two geometric solutions together to get the following:

$$N_n = ar_1^n + br_2^n$$

² Surprisingly, the number $\frac{1+\sqrt{5}}{2}$ is important enough that it gets its own symbol ($\varphi$), putting it in company with other VIP numbers like $\pi$ and $e$. It’s called the golden ratio and has its own Wikipedia article too.

This still satisfies the recurrence and now we can solve for 𝑎 and 𝑏 to make the solution match the two
base cases. We have

$$N_1 = 1 = ar_1^1 + br_2^1 = ar_1 + br_2$$

$$N_2 = 2 = ar_1^2 + br_2^2$$

Solving, we get

$$a = \frac{1 + 1/\sqrt{5}}{2}, \qquad b = \frac{1 - 1/\sqrt{5}}{2}$$

So our closed-form formula is

$$N_n = \frac{1 + 1/\sqrt{5}}{2}\left(\frac{1+\sqrt{5}}{2}\right)^n + \frac{1 - 1/\sqrt{5}}{2}\left(\frac{1-\sqrt{5}}{2}\right)^n$$
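Despite the square roots, the formula returns whole numbers. A short Python sketch (ours) checks it against the table from Item 36:

    from math import sqrt

    r1, r2 = (1 + sqrt(5)) / 2, (1 - sqrt(5)) / 2
    a, b = (1 + 1 / sqrt(5)) / 2, (1 - 1 / sqrt(5)) / 2

    print([round(a * r1**n + b * r2**n) for n in range(1, 8)])
    # [1, 2, 3, 5, 8, 13, 21]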
42. Get a closed-form formula for $\sum_{i=0}^{n} 2^i \binom{n}{i}$.
We want a formula for $\binom{n}{0} + 2\binom{n}{1} + 4\binom{n}{2} + \cdots + 2^n\binom{n}{n}$. We’ll first argue that this is the number of ways of making up an $n$-digit number in base 3, i.e. one composed of the digits 0, 1 and 2. Suppose we’re making up a 7-digit number and we decide to have 5 digits which are not zeroes. We still have some choices to make. There are $\binom{7}{5}$ ways to decide which positions will be non-zeroes. And then for each such position we have two choices: fill it with 1 or fill it with 2. So the total number of ways of making a number of precisely this kind is $2^5\binom{7}{5}$.

Of course, there are other kinds of 7-digit numbers. For example, following an analogous argument, there are $2^3\binom{7}{3}$ ways to get numbers with exactly three digits that are not zeroes.
Adding together all the different kinds—ones with no non-zero digits, one non-zero digit, etc. up to 7
non-zero digits—we get exactly the formula we’re analyzing. So the formula does give the total number
of 𝑛-digit base-3 numbers.

But we can get this number directly using the multiplication principle. We have 3 choices for the first digit, 3 for the second digit and so on—$3^n$ in all. We’ve counted the same thing two ways; the answers must be the same. Therefore

$$\sum_{i=0}^{n} 2^i \binom{n}{i} = 3^n$$

Here’s another approach that gets the same result: Apply the binomial theorem $(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i$ with $x = 1$ and $y = 2$. This yields

$$3^n = (1 + 2)^n = \sum_{i=0}^{n} \binom{n}{i} 1^{n-i} 2^i = \sum_{i=0}^{n} 2^i \binom{n}{i}$$
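The identity is easy to spot-check in Python (our sketch):

    from math import comb

    for n in range(10):
        assert sum(2**i * comb(n, i) for i in range(n + 1)) == 3**n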

43. In how many ways can the digits in the base-𝟒 number 0012013031 be rearranged?

We have 10 digits and we know that they can be placed in 10! orders. Many of these will be the same,
however. For example, swapping the first two digits in the original order has no effect.

Temporarily label all the digits so that they’re distinguishable: $0_1 0_2 0_3 0_4 1_1 1_2 1_3 2_1 3_1 3_2$. Now we do have 10! different orders. Suppose we make a complete list. How many times will any actual item like
0012013031 be listed? For any such item, we have 4! ways of adding subscripts to the 0’s, 3! ways of
adding subscripts to the 1’s, 1! way of subscripting the 2 and 2! ways of adding subscripts to the 3’s. In
all, we have 4! 3! 1! 2! ways to subscript, each leading to one entry in the list.

The total number of actual arrangements is therefore

$$\frac{10!}{4!\,3!\,1!\,2!}$$

We can get the same answer using the multiplication principle to count ways of producing orderings. In the first stage, we decide where to put the 0’s. There are ten positions and we have to choose four, so there are $\binom{10}{4}$ possibilities. Then we choose in which of the six remaining positions to put the three 1’s—there are $\binom{6}{3}$ possibilities. Continuing in the same way, there are $\binom{3}{1}$ possibilities for where to put the 2 and then $\binom{2}{2}$ possibilities for the 3’s. In all, we have

$$\binom{10}{4}\binom{6}{3}\binom{3}{1}\binom{2}{2}$$

Substituting using our factorial formula for binomial coefficients, we get

$$\frac{10!}{4!\,6!} \cdot \frac{6!}{3!\,3!} \cdot \frac{3!}{1!\,2!} \cdot \frac{2!}{2!\,0!}$$

or, after cancelling,

$$\frac{10!}{4!\,3!\,1!\,2!}$$
just as before.
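Both counts, and a brute-force enumeration, agree; here is a Python sketch (ours):

    from math import comb, factorial
    from itertools import permutations

    formula = factorial(10) // (factorial(4) * factorial(3) * factorial(1) * factorial(2))
    staged = comb(10, 4) * comb(6, 3) * comb(3, 1) * comb(2, 2)
    brute = len(set(permutations("0012013031")))  # distinct orders only
    print(formula, staged, brute)  # 12600 12600 12600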

Note that if we apply the original approach to binary numbers we can actually derive our factorial
formula for binomial coefficients. As we saw in Item 20, every 𝑛-digit binary number with 𝑘 1’s
corresponds to one way of picking a 𝑘-element subset from a set of 𝑛. For example, 10110 ⋯ would
correspond to the subset in which we include the first element, do not include the second, do include
the third and fourth and so on.
The number of $k$-element subsets of a set of size $n$—i.e. $\binom{n}{k}$—is thus the number of ways of rearranging $k$ ones and $n - k$ zeroes. If the digits had subscripts to distinguish them, we’d have $n!$ orders. But since we have $k!$ ways of assigning subscripts to the ones and $(n-k)!$ ways of assigning subscripts to the zeroes, we have to divide by $k!\,(n-k)!$ to get the actual number of orders. That is

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

44. In how many ways can five people be assigned ten tasks, if the tasks are all different? What if
they are all the same?

For the first question, it’s natural to take the people one by one and see which tasks they’re assigned,
but this leads to trouble. The first person can be assigned any subset of the ten tasks, so there are $2^{10}$ possible assignments. But the number of possibilities for the second person will depend on the number
assigned to the first person. We’ll have to look at 11 different cases: first person assigned 0 tasks, first
person assigned 1 task, …, first person assigned 10 tasks. Then for each of these, there will be different
cases for the third person and so on.

A much better idea is to focus on tasks. The first may be assigned to any of the people—that’s 5
possibilities. The second may be assigned to any of the people—another 5 possibilities. With this
approach, the multiplication principle gives us the answer $5^{10}$.

For the second question, note that if all tasks are the same, the assignment just specifies how many
tasks each person is given. A sample assignment might look like this: 4, 0, 1, 3, 2, meaning that the first
person got four tasks, the second none and so on. Any series of five numbers that sum to 10 is a valid
assignment; the question is how many of these are possible.

Use stars to stand for tasks and bars to separate people. Then the sample assignment 4, 0, 1, 3, 2 looks
like this:
****||*|***|**

Note that we have 10 stars, one for each task but only 4 bars, which is all that are needed to divide the
stars into five groups. In all, there are 14 symbols and each assignment of tasks corresponds to one
arrangement of these symbols. To form an arrangement, just decide which of the 14 positions will be
the bars—there are $\binom{14}{4}$ possibilities. Therefore, this is the number of ways of assigning ten tasks to five people.

In general, the number of ways of assigning $k$ identical tasks to $n$ people is the same as the number of ways of arranging $k$ stars and $n - 1$ bars, $k + n - 1$ symbols in all. This is just the number of ways of deciding where to put the bars—there are $\binom{k+n-1}{n-1}$ possibilities. Of course, we could just as well count the number of ways of deciding where to put the stars, getting the equivalent answer $\binom{k+n-1}{k}$.
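A brute-force Python check (our sketch) confirms the stars-and-bars count for the ten-task example:

    from math import comb
    from itertools import product

    # Five people, each given 0..10 identical tasks, totals summing to 10.
    brute = sum(1 for t in product(range(11), repeat=5) if sum(t) == 10)
    print(brute, comb(14, 4))  # 1001 1001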
45. How many five-digit numbers have at least one repeated digit?

There are many different kinds of numbers that have at least one repeated digit—ones with exactly one
repetition (47403), ones with two repetitions (02320), ones with a single digit repeated three times
(31334), even ones with all digits the same (88888). Rather than dealing with all these cases, we can
get our answer by counting all five-digit numbers and subtracting the number that do not have any
repeated digits.

The multiplication principle gives us the total number of five-digit numbers—there are 10 ways of
choosing the first digit, then 10 ways of choosing the second digit and so on. The total is $10^5$.

Picking a number with no repeated digits just limits our choice. We still have 10 ways of choosing the
first digit, but then there are only 9 ways of choosing the second digit (since it can’t be the same as the
first). There are 8 ways of choosing a third digit that isn’t the same as the first two, and so on. The total
number of numbers with no repeated digits is 10 ⋅ 9 ⋅ 8 ⋅ 7 ⋅ 6. This part of the question is an example
of the general one we answered in Item 16.

The number of five-digit numbers with at least one repeated digit is thus $10^5 - 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6$.

There’s a general principle at work here: To count items that we want, it’s sometimes easier to count
the ones we don’t want and subtract the answer from the total.
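In Python (our sketch), the complement count and a direct scan agree:

    formula = 10**5 - 10 * 9 * 8 * 7 * 6
    brute = sum(1 for n in range(10**5) if len(set(f"{n:05d}")) < 5)  # some digit repeats
    print(formula, brute)  # 69760 69760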

46. If all subsets are equally likely, what is the chance that a random subset chosen from a set of 𝟏𝟎
will have 𝟒 elements?

When all possibilities are equally likely, probabilities are just a matter of counting. For example, if we
roll a fair die, there are six equally likely outcomes. If we ask the chance of rolling at least a 5, two out of
the six possibilities are the ones we mean, so the chance of getting one of them is $\frac{2}{6} = \frac{1}{3}$.

A set of 10 elements has $2^{10}$ subsets. Of these, $\binom{10}{4}$ have 4 elements. So, if all subsets are equally likely, the chance of getting one with 4 elements is $\binom{10}{4}/2^{10}$. That is, $\frac{10 \cdot 9 \cdot 8 \cdot 7}{4 \cdot 3 \cdot 2 \cdot 1}\Big/1{,}024 \approx .205$; there is about a 20 percent chance.
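One line of Python (our sketch) confirms the arithmetic:

    from math import comb
    print(comb(10, 4) / 2**10)  # 0.205078125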

47. What is the chance of being dealt a one-pair hand in poker? What about a flush?

A standard American deck of 52 playing cards is divided into four suits: hearts, clubs, diamonds and
spades. In each suit, there is a card for each of 13 face values: ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, jack, queen
and king. We abbreviate the suits as 𝐻, 𝐶, 𝐷 and 𝑆 and the face values as
𝐴, 2, 3, 4, 5, 6, 7, 8, 9, 10, 𝐽, 𝑄, 𝐾; for example the 3 of spades would be written 3𝑆.

Poker is a card game with many variations, but here we assume that a player is dealt a five-card hand at
random. A one-pair hand includes two cards of the same face value (but no other cards that match
these or each other in face value, since that would be a different kind of hand). A sample one-pair hand
is 3𝑆, 𝐴𝐻, 10𝐷, 3𝐶, 𝑄𝐻.
The total number of possible hands is $\binom{52}{5}$. Assuming these are equally likely, all we need to answer the first question is the number of possible one-pair hands. We can construct one in stages and apply the multiplication principle: in the first stage, we decide the face value of the pair (13 choices); in the second, we decide which two of the four cards with that face value to use ($\binom{4}{2}$ choices); in the third, we pick three more face values for the remaining cards ($\binom{12}{3}$ choices); finally, we pick a suit for each of these (4 choices for each, $4^3$ choices in all). The chance of getting a one-pair hand is then

$$\frac{13 \cdot \binom{4}{2} \cdot \binom{12}{3} \cdot 4^3}{\binom{52}{5}} \approx .423$$

That is, there is slightly more than a 42 percent chance of a one-pair hand.

A flush is a hand in which all cards are of the same suit; e.g. 𝐴𝐻, 7𝐻, 3𝐻, 𝐽𝐻, 10𝐻. A straight is a hand in
which the face values can be arranged in strictly sequential order; e.g. 3𝐻, 4𝐷, 7𝐻, 5𝐶, 6𝑆 (which
contains the sequence 3, 4, 5, 6, 7). Note that a hand satisfying both conditions (e.g.
3𝐻, 4𝐻, 7𝐻, 5𝐻, 6𝐻) counts as a straight flush, which is a third kind of hand and is neither a flush nor a
straight.

Hands with all cards of the same suit can be constructed in stages and counted using the multiplication
principle: in the first stage we pick a suit (4 choices); in the second we choose five cards from that suit
($\binom{13}{5}$ choices). This gives us $4\binom{13}{5}$ possible hands, but we need to subtract out the ones which are straights. For each of the four suits, there are ten such hands (one with the sequence starting with 𝐴, one with the sequence starting with 2 and so on up to one with the sequence starting with 10).³ So there are 40 hands altogether that need to be subtracted. The chance of a flush is thus

$$\frac{4\binom{13}{5} - 40}{\binom{52}{5}} \approx .002$$
about 0.2 percent.
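Both probabilities are easy to sanity-check numerically in Python (our sketch):

    from math import comb

    hands = comb(52, 5)
    one_pair = 13 * comb(4, 2) * comb(12, 3) * 4**3
    flush = 4 * comb(13, 5) - 40
    print(one_pair / hands, flush / hands)  # 0.42256... 0.00196...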

48. What is the chance of being dealt a sequential hand or a hand all of one suit (or both) in poker?

The key difficulty here is that the categories overlap and we need to be careful not to double count. If
we count the number of sequential hands and then the number which have all cards of the same suit,
we’ll have accounted for all the hands we want, but we will have counted twice hands which have both
properties. To correct, we calculate

sequential hands + one-suit hands − one-suit, sequential hands


There is a general principle here. Let 𝐴 and 𝐵 be sets and let 𝐴 ∪ 𝐵 denote the union, the set containing
all elements in either of the sets. Also, let |𝑆| denote the number of elements in set 𝑆. Then, in general,
|𝐴 ∪ 𝐵| ≠ |𝐴| + |𝐵|, since the right hand side double counts elements which are in both sets. This
overlap is denoted 𝐴 ∩ 𝐵, the intersection. To correct, we subtract out the elements in it:

|𝐴 ∪ 𝐵| = |𝐴| + |𝐵| − |𝐴 ∩ 𝐵|

Getting back to our question, we construct sequential hands in stages; first we choose a sequence (10
choices, see Item 47); then we choose a suit for the lowest face value (4 choices), for the next face value
(4 choices) and so on. The total number of possibilities is $10 \cdot 4^5$.

To construct a hand of cards from one suit: choose a suit (4 choices) and then choose five cards from
that suit ($\binom{13}{5}$ choices)—$4\binom{13}{5}$ possibilities in all.

³ The ace may count as either the lowest face value or the highest, so the sequences run from 𝐴, 2, 3, 4, 5 to 10, 𝐽, 𝑄, 𝐾, 𝐴.

In Item 47, we determined that there are 40 hands which are sequential and of just one suit. The
chance of getting a hand which is either sequential or all of one suit or both is thus
$$\frac{10 \cdot 4^5 + 4\binom{13}{5} - 40}{\binom{52}{5}} \approx .006$$
In poker terms, we’d say there’s about a .6 percent chance of getting a straight, a flush or a straight
flush.
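Reusing the counts from Item 47, a quick Python check (our sketch):

    from math import comb

    union = 10 * 4**5 + 4 * comb(13, 5) - 40
    print(union / comb(52, 5))  # about .0059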

49. What is the chance that ten flips of a fair coin will produce a run of at least length 𝟑?

When we flip a coin ten times, the result may be summarized like this (meaning tails on the first two
flips, then heads on the third flip and so on):

𝑇𝑇𝐻𝑇𝑇𝑇𝐻𝐻𝐻𝐻

There are two possibilities for each of the ten flips, so the total number of possibilities is $2^{10}$. For a fair
coin, these are equally likely.

In this example, there are three tails in a row and then four heads in a row. We call these runs of length
three and four. To answer the question, we just need to count how many of the $2^{10}$ possible outcomes contain a run of at least length 3.

A direct approach runs into trouble, though, because there are different kinds of outcomes that have a
run of at least 3—ones with only a run of 3, ones with two runs of 3, ones with a run of 7, ones with a
run of 3 and also a run of 5 and so on—and we would have to consider each of these case by case.

Instead, we’ll count the outcomes that do not have a run of at least 3 and subtract this from the total of
$2^{10}$ to get the number we want.

Let’s call a pattern of 𝐻’s and 𝑇’s entirely made up of runs of length 1 and 2 a disappointing one, since it
doesn’t have a long run of the desired kind. If we flip a coin 𝑛 times, we’ll let 𝑁𝑛 stand for the number
of possible disappointing patterns. Our goal is simply to calculate 𝑁10 . If we can get it, then the number
of patterns of length 10 that do have a run of at least 3 will be the number of all patterns minus the number of disappointing ones, $2^{10} - N_{10}$, and the probability of getting such a pattern will be

$$\frac{2^{10} - N_{10}}{2^{10}}$$
A disappointing pattern must either start with a run of length 1 or one of length 2. In the first case,
there are two possibilities for what the initial run will look like: 𝐻 or 𝑇. In either case, the remainder
must be a disappointing pattern of length 𝑛 − 1 and there are 𝑁𝑛−1 of these. But half of them are ruled
out. If the run of length 1 is 𝐻, then the disappointing pattern of length 𝑛 − 1 that follows must start
with a 𝑇—if not, we would have begun with 𝐻𝐻, a run of length 2. Likewise, if the run of length 1 is 𝑇,
the disappointing pattern that follows must start with an 𝐻. Putting together what we’ve said—and
applying the multiplication principle—if we start with a run of length 1, we have two possibilities for the run and $\frac{1}{2}N_{n-1}$ ways to complete the pattern: $2 \cdot \frac{1}{2}N_{n-1} = N_{n-1}$ possibilities.

In the second case, the initial run is of length 2 and must be either 𝐻𝐻 or 𝑇𝑇: two possibilities. In either
case, the remainder must be a disappointing pattern of length 𝑛 − 2 starting with the other letter and
there are $\frac{1}{2}N_{n-2}$ of these. Applying the multiplication principle, we get a total of $2 \cdot \frac{1}{2}N_{n-2} = N_{n-2}$ possibilities.

Putting the two cases together, we have a recurrence:

𝑁𝑛 = 𝑁𝑛−1 + 𝑁𝑛−2
To complete an inductive definition for 𝑁𝑛 , we just need two base case values. These are easy: 𝑁1 = 2
and 𝑁2 = 4, since every pattern of length 1 or 2 is disappointing—it certainly can’t contain a run of
length 3—and we know that there are 2 patterns of length 1 and 4 patterns of length 2.

Our inductive definition is exactly the same as the one for which we found a closed form back in Item
41, except that the base cases are twice as big. If we want, then, we could use the formula we derived
and multiply our final answer by two. For the purpose of computing 𝑁10 , though, it’s easier just to
compile a table:

𝑛 1 2 3 4 5 6 7 8 9 10
𝑁𝑛 2 4 6 10 16 26 42 68 110 178

The probability we are trying to calculate is thus $\frac{2^{10} - 178}{2^{10}} \approx .826$, about 83 percent.
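A brute-force Python scan (our sketch) over all $2^{10}$ patterns agrees with the recurrence:

    from itertools import product

    def has_run(pattern, k=3):
        run = 1
        for a, b in zip(pattern, pattern[1:]):
            run = run + 1 if a == b else 1  # extend or restart the current run
            if run >= k:
                return True
        return False

    count = sum(has_run(p) for p in product("HT", repeat=10))
    print(count, count / 2**10)  # 846 0.826171875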

50. On average, how many times will we flip a fair coin if we stop at the first heads?

It’s possible that the very first flip will come up heads and, in this case, we’ll stop immediately, having
flipped just one time. In fact, this happens with probability $\frac{1}{2}$. Of course, it’s also possible that the flips
will look like this: 𝑇𝑇𝑇𝑇𝐻. This is much less likely, but it is certainly possible; in this case we will have
flipped 5 times.

What does it mean to ask for the average number of flips?

Suppose we ask for the average number that will be rolled with a fair die. If we roll the die any fixed
number of times and record the outcomes, the answer is obvious. For example, if four rolls come out
5, 1, 3 and 2, then we’d say the average (so far) is $\frac{5+1+3+2}{4} = 2.75$. But what about if we roll the die a
much larger number of times? In this case, we’d expect about a sixth of the rolls to result in each of the
six possible outcomes—in fact, this is what we mean when we say that the die is fair.

Of course, we wouldn’t expect the distribution to be exactly equal, just reasonably close. For example, if
we roll the die 600 times, we’d expect a typical result to look like the following:

Outcome 1 2 3 4 5 6
# of rolls 97 105 102 93 99 104

In this case, the average would be

$$\frac{\overbrace{1 + 1 + \cdots + 1}^{97\text{ times}} + \overbrace{2 + 2 + \cdots + 2}^{105\text{ times}} + \cdots + \overbrace{6 + 6 + \cdots + 6}^{104\text{ times}}}{600} = \frac{97 \cdot 1 + 105 \cdot 2 + \cdots + 104 \cdot 6}{600}$$

$$= \frac{97}{600} \cdot 1 + \frac{105}{600} \cdot 2 + \cdots + \frac{104}{600} \cdot 6$$
Note that each of the fractions is a number close to $\frac{1}{6}$. The larger the number of rolls, the closer we’d expect the fractions to get to $\frac{1}{6}$; again, this is what we mean when we say that the die is fair. If we increase the number of rolls to 6,000 or 600,000, it’s clear that the formula would be essentially the same, except that the fractions would be even closer to $\frac{1}{6}$. In the long run, then, the average is

$$\frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = 3.5$$
Of course, this is exactly what we would have gotten if we’d proceeded to calculate the average in the
familiar way: (1 + 2 + 3 + 4 + 5 + 6)/6. The advantage of the argument we’ve just made is that it tells
us how to calculate an average when probabilities are not equal.

For example, suppose a die is weighted so that it comes up six half the time, with the other outcomes splitting the remaining probability equally. Then after a large number of rolls, we’d expect about $\frac{1}{10}$ ones, $\frac{1}{10}$ twos and so on up to $\frac{1}{10}$ fives and, of course, $\frac{1}{2}$ sixes. In this case, the average is

$$\frac{1}{10} \cdot 1 + \frac{1}{10} \cdot 2 + \frac{1}{10} \cdot 3 + \frac{1}{10} \cdot 4 + \frac{1}{10} \cdot 5 + \frac{1}{2} \cdot 6 = 4.5$$
This kind of average, with each number weighted by a fraction and the fractions summing to one, is
called a weighted average. In the case of numerical outcomes weighted by the chance of their
occurring, the weighted average is called the expected value. We say that the expected value of a roll of
the unfair die we just studied is 4.5.

The question of this item is: What is the expected value of the number of flips of a fair coin if we stop at
the first heads? The answer is the weighted average

𝑃(1) ⋅ 1 + 𝑃(2) ⋅ 2 + 𝑃(3) ⋅ 3 + ⋯


Here 𝑃(𝑛) stands for the probability that heads will first appear on the 𝑛th flip. Note that the sum goes
on forever, since there is no number of flips that is strictly impossible.
We’ve already pointed out that $P(1) = \frac{1}{2}$; that is, half the time the very first flip is heads. Of the half of the time that we’re unsuccessful on the first flip, exactly half of the time we’ll be successful on the second flip—that’s half of half or one quarter, i.e. $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$, of the time. So $P(2) = \frac{1}{4}$. Of the one quarter of the time that we’re unsuccessful on the first two flips, exactly half the time we’ll succeed on the third—that’s half of one quarter or one eighth, i.e. $\frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8}$, of the time.

By now, the pattern should be clear: $P(n) = \left(\frac{1}{2}\right)^n$. The average we want is


$$\sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^n n$$

51. Compute $\sum_{n=1}^{\infty} \left(\frac{1}{2}\right)^n n$.

The sum we want is


$$S = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \cdots$$

We can take nearly the same approach that worked with the geometric series in Item 21. First multiply both sides of the equation by $\frac{1}{2}$ to get

$$S = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \cdots$$
$$\frac{1}{2}S = \frac{1}{4} \cdot 1 + \frac{1}{8} \cdot 2 + \cdots$$

Subtracting the bottom equation from the top gives us

$$\frac{1}{2}S = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 1 + \frac{1}{8} \cdot 1 + \cdots = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots$$

Multiplying by 2 on both sides then yields

$$S = 1 + \frac{1}{2} + \frac{1}{4} + \cdots$$

It’s probably obvious now that $S = 2$, but, if it’s not, we can sum up the infinite geometric series using the last formula we derived in Item 21 ($\sum_{i=0}^{\infty} r^i = \frac{1}{1-r}$) with $r = \frac{1}{2}$.
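A numeric partial sum in Python (our sketch) converges quickly to 2:

    print(sum((1 / 2)**n * n for n in range(1, 60)))  # ≈ 2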

By the way, our answer, 2, is the average number of flips it takes to get heads on a fair coin, and this
ought to be obvious as well. A fair coin comes up heads, on average, once every two flips. Likewise, if
we have an unfair coin that has only a $\frac{1}{10}$ chance of coming up heads, we’d expect to see heads once every 10 flips, so the average number of flips to get heads ought to be 10. Let’s see if the mathematics
confirms our intuition. The average we want is still

𝑃(1) ⋅ 1 + 𝑃(2) ⋅ 2 + 𝑃(3) ⋅ 3 + ⋯


but now the probabilities are different. For example, $P(4) = P(TTTH)$. It happens that we are unsuccessful three times and then successful $\frac{1}{10}$ of $\frac{9}{10}$ of $\frac{9}{10}$ of $\frac{9}{10}$ of the time, that is, with probability $\frac{9}{10} \cdot \frac{9}{10} \cdot \frac{9}{10} \cdot \frac{1}{10} = \left(\frac{9}{10}\right)^3 \frac{1}{10}$. In general, we have $P(n) = \left(\frac{9}{10}\right)^{n-1} \frac{1}{10}$ and the average we want is

$$S = \left(\frac{9}{10}\right)^0 \frac{1}{10} \cdot 1 + \left(\frac{9}{10}\right)^1 \frac{1}{10} \cdot 2 + \left(\frac{9}{10}\right)^2 \frac{1}{10} \cdot 3 + \cdots$$

Before solving, let’s note that the formula for the average would be of exactly the same form if there was, say, a $\frac{1}{7}$ chance of heads. In this case, we’d just substitute $\frac{1}{7}$ in place of $\frac{1}{10}$ and $\frac{6}{7}$ in place of $\frac{9}{10}$. Let’s

make the formula more general by using 𝑝 to represent the chance of heads and 𝑞 to represent 1 − 𝑝,
the chance of tails. So the average is then

$$S = q^0 p \cdot 1 + q^1 p \cdot 2 + q^2 p \cdot 3 + \cdots$$

Multiply on both sides by the geometric factor $q$:

$$S = q^0 p \cdot 1 + q^1 p \cdot 2 + q^2 p \cdot 3 + \cdots$$
$$qS = q^1 p \cdot 1 + q^2 p \cdot 2 + \cdots$$

Subtract the bottom from the top, noting that, on the left-hand side, we have $S - qS = (1 - q)S = pS$:

$$pS = q^0 p + q^1 p + q^2 p + \cdots = p(1 + q + q^2 + \cdots)$$

Divide by $p$ on both sides to get

$$S = 1 + q + q^2 + \cdots$$

and then apply the formula for an infinite series from Item 21:

$$S = \frac{1}{1 - q} = \frac{1}{p}$$
Sure enough, if $p = \frac{1}{10}$, the expected number of flips is $\frac{1}{1/10} = 10$.
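Partial sums in Python (our sketch) agree with $S = 1/p$:

    p = 1 / 10
    q = 1 - p
    print(sum(q**(n - 1) * p * n for n in range(1, 500)))  # ≈ 10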

52. Prove that $\sum_{i=0}^{n} i\binom{n}{i} = n2^{n-1}$ both combinatorially and by induction.
Suppose we have a set of 𝑛 elements, {1, 2, 3, … , 𝑛} and we make up a list of all possible pairs of (a) a
subset of this set and (b) an element of the subset. Here are a few sample pairs:
{3, 7, 18}, 7
{3, 7, 18}, 18
{3, 5, 7, 18, 25}, 7
We’ll count the number of items in the list in two ways; since the number of items is the same no matter
how we count, the two answers must be the same, which will establish the equality we want.

First, consider ordering the list so that all items with subsets of the same size are grouped together.
How many items will be in the group with subsets of size, say, 3? There are $\binom{n}{3}$ such subsets, and each will be listed 3 times, once with each of its elements. The total number of items in the group is $3\binom{n}{3}$. In general, the number of items in the group with subsets of any particular size $i$ will be $i\binom{n}{i}$. Adding together the number for each group, we get the total number of items in the list: $\sum_{i=0}^{n} i\binom{n}{i}$.
Second, consider ordering the list so that all items that choose a particular element are grouped
together, i.e. all items of the form {⋯ }, 1 in the first group, all items of the form {⋯ }, 2 in the second
group and so on. How many items will be in the group where, say, 3 is chosen? There will be one for
every subset containing 3. To make a subset containing 3, we just add 3 to any subset of the remaining
$n - 1$ elements; since there are $2^{n-1}$ such subsets, there are $2^{n-1}$ subsets containing 3 and $2^{n-1}$ items

in our list with 3 as the chosen element. But the same argument applies for any other chosen element.
So the list contains $n2^{n-1}$ items in all.
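The double count behind the identity can be spot-checked in Python (our sketch):

    from math import comb
    from itertools import combinations

    n = 6
    # One list item per (subset, chosen element) pair.
    pairs = sum(len(s) for r in range(n + 1) for s in combinations(range(n), r))
    assert pairs == sum(i * comb(n, i) for i in range(n + 1)) == n * 2**(n - 1)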

Now let’s try an induction proof. For the base case, we just check that both sides of the equation are
zero when 𝑛 = 0. Assume then that the equation holds for some value of 𝑛 and look at the left side for
the next higher value:
$$\sum_{i=0}^{n+1} i\binom{n+1}{i}$$

The very first term in this sum is zero, so we can leave it off. The last term is $(n+1)\binom{n+1}{n+1} = n + 1$; we’ll list this separately from the other terms. After these two adjustments, our sum looks like this:

$$\sum_{i=1}^{n} i\binom{n+1}{i} + n + 1$$

Now we can apply the standard recurrence $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$ to the binomial coefficient:

$$\sum_{i=1}^{n} i\left(\binom{n}{i} + \binom{n}{i-1}\right) + n + 1$$

Breaking the ∑ part into two sums we get


$$\sum_{i=1}^{n} i\binom{n}{i} + \sum_{i=1}^{n} i\binom{n}{i-1} + n + 1$$

By assumption, the first sum is equal to $n2^{n-1}$—we’re missing the $i = 0$ term, but that would be equal to zero anyway and makes no contribution. So now we have

$$n2^{n-1} + \sum_{i=1}^{n} i\binom{n}{i-1} + n + 1 \qquad (*)$$

We’ve marked this with a star so we can refer to it later. For now, let’s look at the one remaining Σ part.
We’ll rewrite 𝑖 as (𝑖 − 1) + 1, distribute and break the sum into two pieces:
$$\sum_{i=1}^{n} i\binom{n}{i-1} = \sum_{i=1}^{n} \bigl((i-1) + 1\bigr)\binom{n}{i-1}$$
$$= \sum_{i=1}^{n} \left((i-1)\binom{n}{i-1} + \binom{n}{i-1}\right)$$
$$= \sum_{i=1}^{n} (i-1)\binom{n}{i-1} + \sum_{i=1}^{n} \binom{n}{i-1}$$

The first sum is almost the one in our assumption—it’s just missing the last term $n\binom{n}{n} = n$. So it comes to $n2^{n-1} - n$. The second is almost the sum of the $n$th row of Pascal’s triangle—it’s just missing the last term $\binom{n}{n} = 1$. So it comes to $2^n - 1$. Putting these together we have $2^n + n2^{n-1} - n - 1$.
Now we substitute this into the formula we marked with a star:

$$n2^{n-1} + (2^n + n2^{n-1} - n - 1) + n + 1$$


Simplifying, we get

$$2 \cdot n2^{n-1} + 2^n = n2^n + 2^n = (n+1)2^n$$


thus confirming that the claimed formula is correct for the next higher value after 𝑛.

53. What’s the chance that a coin with a probability .𝟔 of heads will turn up heads seven times in ten
flips?

We’ve already seen how to calculate the chance of any particular string of heads and tails. For example,
in order for three flips to come up 𝐻𝐻𝑇, assuming any individual flip has a 60 percent chance of heads,
we reason this way:

 The first flip comes up heads 60 percent of the time.


 Of that 60 percent of the time, the second flip will come up heads 60 percent of the time. That is, we’ll see 𝐻𝐻 60 percent of 60 percent of the time: .6 ⋅ .6 = .36, so that’s 36 percent of the time.
 Of this 36 percent of the time, the third flip will come up tails 40 percent of the time. That is, we’ll see 𝐻𝐻𝑇 40 percent of 36 percent of the time: .6 ⋅ .6 ⋅ .4.

Rather than talking about “percent of the time,” we can apply the counting methods we’ve developed.
Let’s think of picking at random from the following set: {𝐻1 , 𝐻2 , 𝐻3 , 𝐻4 , 𝐻5 , 𝐻6 , 𝑇1 , 𝑇2 , 𝑇3 , 𝑇4 }. This clearly
gives us the same probabilities as flipping the 60/40 coin. If we pick three times and then drop the
subscripts, what’s the chance that the result will be $HHT$? With the subscripts, the total number of possibilities for three picks is $10^3$. The number which will be $HHT$ after subscripts are dropped is $6 \cdot 6 \cdot 4$. The probability is thus

$$\frac{6 \cdot 6 \cdot 4}{10^3} = \frac{6}{10} \cdot \frac{6}{10} \cdot \frac{4}{10} = .6 \cdot .6 \cdot .4$$
Of course, the answer will be exactly analogous for a number of flips other than three. For any
particular outcome, we get a factor of .6 for each 𝐻 and one of .4 for each 𝑇. For example, the
outcome 𝐻𝑇𝐻𝐻𝑇𝐻𝐻𝑇𝐻𝐻—one of the ones of interest, since it has seven heads in ten flips—will be
observed with probability $(.6)^7(.4)^3$. Since each of the outcomes of interest has the same probability, all we need to do is count them and this is easy—there’s one for each way to choose 7 positions for the $H$’s out of ten positions available. There are $\binom{10}{7}$ outcomes, each with probability $(.6)^7(.4)^3$, so the total probability is $\binom{10}{7}(.6)^7(.4)^3$.
This answers our question, but there’s more to say. First, there is nothing special about the number 7;
exactly the same argument will tell us that the chance of seeing 𝑖 heads in ten flips is

$\binom{10}{i}(.6)^i(.4)^{10-i}$. And of course, if we add up the chance of seeing 0 heads, the chance of seeing 1 heads, the chance of seeing 2 heads and so on up to the chance of seeing 10 heads, we will have accounted for 100 percent of the probability:

$$\sum_{i=0}^{10} \binom{10}{i}(.6)^i(.4)^{10-i} = 1$$

If you also notice that $1 = 1^{10} = (.6 + .4)^{10}$, then you’ll see that we just have a special case of the
binomial theorem (Item 9), with 𝑥 = .6 and 𝑦 = .4.

Second, there’s also nothing special about the number 10 or the probability .6. If the number of flips is
𝑛 and the probability of heads on each flip is 𝑝 and if we let 𝑞 stand for 1 − 𝑝, the probability of tails,
then the probability of seeing 𝑘 heads is
$$P(k \text{ heads in } n \text{ flips}) = \binom{n}{k} p^k q^{n-k}$$
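A quick Python check (our sketch) of the specific value, and of the fact that the probabilities sum to 1:

    from math import comb

    p, q, n = 0.6, 0.4, 10
    print(comb(n, 7) * p**7 * q**3)  # 0.21499...
    print(sum(comb(n, k) * p**k * q**(n - k) for k in range(n + 1)))  # 1.0 (to rounding)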
54. If a teacher collects quiz papers from 𝟑𝟎 students, shuffles them and hands them back for grading,
what is the chance that at least one student will be grading his or her own paper?

There are 30! ways of handing back the papers. The difficulty is in counting the number of ways in which
some student gets his or her own paper. Let 𝐴1 be the ways in which the first student gets back his or
her own paper, 𝐴2 be the ways in which the second student gets back his or her own paper and so on.
What we want is

|𝐴1 ∪ 𝐴2 ∪ 𝐴3 ∪ ⋯ ∪ 𝐴30 |
Of course, this is not |𝐴1 | + |𝐴2 | + |𝐴3 | + ⋯ + |𝐴30 |, because these subsets overlap. Any way of giving
back the papers so that the first and second students both get their own papers back will be in 𝐴1 and
𝐴2 and would be counted at least twice in this formula—more if it happens to be a way in which other
students also get their papers back.

We addressed a simple example of this kind of overlap problem in Item 48. If we want to know the
number of elements in one or both of two sets, 𝐴 and 𝐵, we add the number in each and then subtract
the number in the overlap, 𝐴 ∩ 𝐵, to correct for the fact that these have been counted twice. We’d like
to extend this idea to more than two sets.

To start, let’s consider three sets, 𝐴1 , 𝐴2 and 𝐴3 and try to count the number of elements in the union
𝐴1 ∪ 𝐴2 ∪ 𝐴3 . Here’s a picture of the situation.

All of the elements we want to count are of three types. Either they are in exactly one of the sets, like
the element 𝑎 in the diagram, in exactly two of the sets, like 𝑏, or in all three, like 𝑐.

Suppose we try to write the total as |𝐴1 | + |𝐴2 | + |𝐴3 |. This correctly counts elements like 𝑎 once each.
But, as we’ve seen, it double counts elements like 𝑏 which are in two sets. To correct, we can subtract
off the size of each of the intersections; now our formula is

|𝐴1 | + |𝐴2 | + |𝐴3 | − |𝐴1 ∩ 𝐴2 | − |𝐴2 ∩ 𝐴3 | − |𝐴1 ∩ 𝐴3 |

The only problem now is with elements like 𝑐 which are in all three sets. An element like this is counted
once in each of the first three terms of the formula and subtracted in each of the last three terms. The
result is that such an element adds nothing to the total. We correct for this final problem by adding a
term that counts precisely elements of this kind:

|𝐴1 | + |𝐴2 | + |𝐴3 | − |𝐴1 ∩ 𝐴2 | − |𝐴2 ∩ 𝐴3 | − |𝐴1 ∩ 𝐴3 | + |𝐴1 ∩ 𝐴2 ∩ 𝐴3 |

This is a correct formula for the size of the union of three sets |𝐴1 ∪ 𝐴2 ∪ 𝐴3 |. But it also suggests a
more general rule. Here we added the sizes of the individual sets, subtracted the sizes of the
intersections of two sets and added the sizes of the intersection of three. What if we had more sets?
Would the correct formula continue by subtracting the sizes of all intersections of four sets, adding the
sizes of the intersections of five sets and so on?

In fact, we can prove that such a formula is correct for the union of any number of sets, by picking any
arbitrary element in the union and showing that it contributes exactly one to the sum. If each element
contributes one, then the total will clearly be the number of elements.

Suppose then that we have an element in a union of 𝑛 sets. This element belongs to some of the sets;
let’s say it belongs to 𝑘 of them. Then in the first part of the proposed formula

|𝐴1 | + |𝐴2 | + |𝐴3 | + ⋯ + |𝐴𝑛 |

the element will contribute 𝑘, one for each set of which it’s a member.

In the second part of the sum

−|𝐴1 ∩ 𝐴2 | − |𝐴1 ∩ 𝐴3 | − |𝐴1 ∩ 𝐴4 | − ⋯ − |𝐴𝑛−1 ∩ 𝐴𝑛 |

we’ll subtract one for every pair of subsets of which our element is a member—that’s $\binom{k}{2}$ of them. So, so far we have a contribution of $k - \binom{k}{2}$.

In the third part of the sum, we’ll add one for every group of three subsets of which our element is a member—that’s $\binom{k}{3}$ of them. Now we have a contribution of $k - \binom{k}{2} + \binom{k}{3}$ or, noting that $k$ can be written $\binom{k}{1}$, a contribution of $\binom{k}{1} - \binom{k}{2} + \binom{k}{3}$. If we continue in the same way up through groups of $k$ subsets, we’ll get the total contribution of the element on which we’re focusing:

$$\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots \pm \binom{k}{k}$$

Note that we don’t need to worry about intersections involving more than 𝑘 sets. Since our element is
in just 𝑘 sets, it can’t be a member of any such intersection.

Now, if we subtract our formula for the contribution of our element from $\binom{k}{0}$, we get something familiar:

$$\binom{k}{0} - \left[\binom{k}{1} - \binom{k}{2} + \binom{k}{3} - \cdots \pm \binom{k}{k}\right] = \binom{k}{0} - \binom{k}{1} + \binom{k}{2} - \binom{k}{3} + \cdots \mp \binom{k}{k}$$

This is just the alternating sum of numbers in a row of Pascal’s triangle. In Item 26, we showed that it
comes to zero. But $\binom{k}{0}$ is one, and if something subtracted from one is zero, that something must also
be one. In other words, the contribution of the element we’re following is one, exactly as we hoped.

The formula we’ve just confirmed—giving the number of elements in a union as the number in each of
the sets minus the number in each intersection of two sets plus the number in each intersection of three
sets and so on—is called the inclusion-exclusion principle. In general, it looks like this:

|𝐴1 ∪ 𝐴2 ∪ ⋯ ∪ 𝐴𝑛 | = +|𝐴1 | + |𝐴2 | + ⋯ + |𝐴𝑛 |


−|𝐴1 ∩ 𝐴2 | − |𝐴1 ∩ 𝐴3 | − ⋯ − |𝐴𝑛−1 ∩ 𝐴𝑛 |
+|𝐴1 ∩ 𝐴2 ∩ 𝐴3 | + |𝐴1 ∩ 𝐴2 ∩ 𝐴4 | + ⋯ + |𝐴𝑛−2 ∩ 𝐴𝑛−1 ∩ 𝐴𝑛 |

±|𝐴1 ∩ 𝐴2 ∩ ⋯ ∩ 𝐴𝑛 |

Rows alternate, positive terms on one row and negative terms on the next. We’ve shown the sign as ±
on the last row because it will depend on whether 𝑛 is even or odd.

Now that we have the formula, our question about student papers is straightforward. Again, we want

|𝐴1 ∪ 𝐴2 ∪ 𝐴3 ∪ ⋯ ∪ 𝐴30 |

We first calculate |𝐴1 | + |𝐴2 | + ⋯ + |𝐴30 |. Each of these sets is of the same size, so we just need to
determine the size of any one, say |𝐴1 |, and then multiply by 30. The set 𝐴1 contains all the ways of
giving out papers so that the first student gets his or her own paper. We have no choice for the first
student, but then the remaining 29 papers can be returned in any order. So there are 29! possibilities.
Multiplying by 30 we get 30 ⋅ 29! for the first part of the inclusion-exclusion formula. We’ll rewrite the
30 as $\binom{30}{1}$—why will be clear in a moment—and give our answer as $\binom{30}{1}29!$.

Next we calculate

−|𝐴1 ∩ 𝐴2 | − |𝐴1 ∩ 𝐴3 | − |𝐴1 ∩ 𝐴4 | − ⋯ − |𝐴29 ∩ 𝐴30 |

Again, each of these is the same size, so we just need to determine the size of any one and multiply by
$\binom{30}{2}$. The number of elements in, say, $A_1 \cap A_2$, is the number of ways of giving the first and second students back their own papers and then returning the rest of the papers in any order. We have 28! choices for the latter; hence the total for this part of the inclusion-exclusion formula is $\binom{30}{2}28!$.

The same argument will yield $\binom{30}{3}27!$ for the third part of the formula, $\binom{30}{4}26!$ for the fourth part and
so on. The pattern is clear, so we can jump to the full formula, remembering to divide by the total of all
ways of handing out the paper so that we get what we want—the probability of someone getting his or
her own paper:

$$\frac{\binom{30}{1}29! - \binom{30}{2}28! + \binom{30}{3}27! - \cdots - \binom{30}{30}0!}{30!} \approx .632$$

That is, there is about a 63 percent chance. Note that the very last term in the sum in the numerator
includes the factor 0!, which is conventionally defined to be 1. If we don’t want to rely on this
convention, we can simply use the fact that this is meant to count the number of ways of giving back
papers in which all of the 30 students get their own papers—clearly there is just one way.

The remaining probability, ≈ .368, is the chance that no one will get his or her own paper. A
permutation in which no item remains in place is called a derangement. The probability of a
derangement of 30 items is thus about .368. Two things are notable about this result. First, it remains almost exactly the same no matter how many items we have; the probability of a derangement of 300 or 3,000 items is also about .368. Second, this recurring probability is very close to $1/e$; in fact, it
becomes exactly 1/𝑒 as the number of items increases to infinity. We’ll see why before long.
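The inclusion-exclusion sum is easy to evaluate exactly in Python (our sketch), and it shows the closeness to $1 - 1/e$:

    from math import comb, factorial, e

    def p_some_match(n):
        total = sum((-1)**(j + 1) * comb(n, j) * factorial(n - j)
                    for j in range(1, n + 1))
        return total / factorial(n)

    print(p_some_match(30), 1 - 1 / e)  # 0.63212... 0.63212...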

55. Prove that $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$ and that $\overline{A \cup B} = \overline{A} \cap \overline{B}$.

The first of these has the same form as the familiar distributive law: 𝑥 ⋅ (𝑦 + 𝑧) = 𝑥 ⋅ 𝑦 + 𝑥 ⋅ 𝑧. But this
law applies to numbers, whereas what’s stated here is a distributive law for sets.

The statement is an equality; how do we prove that two sets are equal? In order for them not to be
equal, they have to differ somehow in the elements they contain. That is, there has to be an element in
one of them that’s not in the other. To show that they’re equal, we show that this is not the case, i.e.
that every element in one is in the other and vice versa.

Start by picking an element—call it 𝑥—in 𝐴 ∪ (𝐵 ∩ 𝐶). Our goal is to show that it is in (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶). Since 𝑥 is in 𝐴 ∪ (𝐵 ∩ 𝐶), then by the definition of union it must be in either 𝐴 or 𝐵 ∩ 𝐶. In the first
case, 𝑥 ∈ 𝐴, we immediately have that 𝑥 ∈ 𝐴 ∪ 𝐵 and 𝑥 ∈ 𝐴 ∪ 𝐶, which in turn implies that 𝑥 ∈
(𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶). In the second case, 𝑥 ∈ 𝐵 ∩ 𝐶, the definition of intersection tells us that 𝑥 ∈ 𝐵 and
𝑥 ∈ 𝐶. But then 𝑥 ∈ 𝐴 ∪ 𝐵 and 𝑥 ∈ 𝐴 ∪ 𝐶. So, again, 𝑥 ∈ (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶). Summarizing, any
element in the set on the left hand side of the equation must be in the one on the right.

Now we give the other half of the argument…that an element in the right hand set must also be in the
left hand one. Let 𝑥 be an element in (𝐴 ∪ 𝐵) ∩ (𝐴 ∪ 𝐶). Then by the definition of intersection, it must

be in 𝐴 ∪ 𝐵 and also in 𝐴 ∪ 𝐶. To be in 𝐴 ∪ 𝐵, it must be in either 𝐴 or 𝐵. In the first case, it is in 𝐴 ∪
(𝐵 ∩ 𝐶) simply by virtue of being in 𝐴. Thus we need only consider the second case, where 𝑥 is in 𝐵 but not in 𝐴. Since 𝑥 is in 𝐴 ∪ 𝐶 and not in 𝐴, it must be in 𝐶. But then 𝑥 ∈ 𝐵 and 𝑥 ∈ 𝐶;
hence 𝑥 ∈ 𝐵 ∩ 𝐶 meaning, in turn, that it is in 𝐴 ∪ (𝐵 ∩ 𝐶). In either case, then, an element in the right
hand set must also be in the left hand one.

This completes the proof, because we’ve now shown it’s impossible for either set to have an element
that’s not in the other, which is what it means for them to be unequal.

The second statement uses a standard notation for the complement of a set: 𝐴̅ is the set of all things
not in 𝐴. Note that this assumes some context—a larger set or universe—of which 𝐴 is a subset. For
example, if 𝐴 = {0, 2, 4, 6, 8, … }, we probably have in mind that 𝐴 is a subset of the set ℕ =
{0, 1, 2, 3, … }, the natural numbers. In this case, 5 ∈ 𝐴̅ but −5 ∉ 𝐴̅, 3/4 ∉ 𝐴̅ and √2 ∉ 𝐴̅. Of course, we could also consider 𝐴 to be a subset of the set of all integers, ℤ = {… , −3, −2, −1, 0, 1, 2, 3, … }, in which case −5 ∈ 𝐴̅. Or we could consider 𝐴 to be a subset of ℚ = {all rational numbers (positive and negative fractions)}, in which case 3/4 ∈ 𝐴̅. Or we could consider 𝐴
to be a subset of the real numbers, ℝ—all numbers on the number line—in which case √2 ∈ 𝐴̅ . If the
context is ever unclear, we need to state it explicitly to avoid ambiguity.

We’d like to prove that $\overline{A \cup B} = \overline{A} \cap \overline{B}$. Again, we proceed by showing that an element in either of these sets must also be in the other. Suppose $x \in \overline{A \cup B}$. Then $x \notin A \cup B$. This means it can’t be in $A$ and it also can’t be in $B$—otherwise it would be in $A \cup B$. But then $x \in \overline{A}$ and $x \in \overline{B}$, hence $x \in \overline{A} \cap \overline{B}$.

Working in the other direction, suppose $x \in \overline{A} \cap \overline{B}$. Then $x \in \overline{A}$ and hence $x \notin A$. Likewise, $x \notin B$. But if $x$ is neither in $A$ nor in $B$, then it can’t be in $A \cup B$. Hence $x \in \overline{A \cup B}$, as we wanted to show.

Again, the proof is complete, because we’ve shown it’s impossible for the left hand and right hand sets
to differ by even a single element.

The distributive law for numbers “distributes multiplication over addition”: 𝑥 ⋅ (𝑦 + 𝑧) = 𝑥 ⋅ 𝑦 + 𝑥 ⋅ 𝑧.


We can’t swap the operation symbols ⋅ and + and distribute addition over multiplication—that is, the
following is false for most numbers: 𝑥 + (𝑦 ⋅ 𝑧) = (𝑥 + 𝑦) ⋅ (𝑥 + 𝑧).

By contrast, there are two distributive laws for sets, the one we’ve just seen and the analogous one with
the set operations swapped: 𝐴 ∩ (𝐵 ∪ 𝐶) = (𝐴 ∩ 𝐵) ∪ (𝐴 ∩ 𝐶).

Likewise, the relation we just proved for set complements continues to be true if we swap the
operators. The two relations are known as De Morgan's laws:

$$\overline{A \cup B} = \overline{A} \cap \overline{B}$$
$$\overline{A \cap B} = \overline{A} \cup \overline{B}$$

The second distributive law for sets and the second of De Morgan’s laws can be proved with exactly the
same sort of straightforward (and tedious) reasoning we used here.

Note that, with De Morgan’s laws, we can compute an intersection using only unions (and vice versa).
We rely on the fact that the complement of the complement of a set is that same set—things that are
not not in a set are in it:

𝐴 ∩ 𝐵 = ((𝐴 ∩ 𝐵)̅)̅ = (𝐴̅ ∪ 𝐵̅)̅

The second equality follows by applying the second of De Morgan’s laws.
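
Here is a quick concrete check of these identities in Python (a sketch; the universe 𝑈 and the sample sets are arbitrary choices, and Python's set operators |, & and − stand in for union, intersection and set difference):

U = set(range(20))          # the universe for this example
A = {0, 2, 4, 6, 8, 10}
B = {3, 6, 9, 12}

def complement(S):
    # complement relative to the universe U
    return U - S

# Both of De Morgan's laws:
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)

# An intersection computed using only unions and complements:
assert A & B == complement(complement(A) | complement(B))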

56. Prove that 𝑷 ∨ (𝑸 ∧ 𝑹) = (𝑷 ∨ 𝑸) ∧ (𝑷 ∨ 𝑹) and that ¬(𝑷 ∨ 𝑸) = ¬𝑷 ∧ ¬𝑸.

De Morgan’s laws actually have nothing to do with sets; they’re statements about logic. In a moment,
we’ll see the connection.

The kind of logic we have in mind is called propositional. It deals with statements—propositions—each
of which is definitely either true or false. We may not know the truth value of a proposition—e.g. It will
rain next Monday—but the statement must be either true or false.

Typically, we use letters like 𝑃, 𝑄 and 𝑅 to stand for such propositions. For example, we might have
𝑃 = It will rain next Monday
𝑄 = You will get an A in this course
𝑅 = I have three children
Once we have some propositions, we can also consider the truth value of more complex expressions
using the logical operators ∧ (and), ∨ (or) and ¬ (not). We define 𝑃 ∧ 𝑄 to be true exactly when 𝑃 and
𝑄 are both true and we define 𝑃 ∨ 𝑄 to be true exactly when either 𝑃 is true or 𝑄 is true or—note
carefully—when both are true. We also define ¬𝑃 to be true when 𝑃 is not true; that is, ¬𝑃’s truth
value is the opposite of 𝑃’s.

We’d like, first, to prove a distributive law for propositional logic: 𝑃 ∨ (𝑄 ∧ 𝑅) = (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅). The
situation here is radically unlike the analogous ones for numbers and sets. For example, with numbers,
the statement 𝑥 ⋅ (𝑦 + 𝑧) = 𝑥 ⋅ 𝑦 + 𝑥 ⋅ 𝑧 holds for an infinite number of different values of 𝑥, 𝑦 and 𝑧.
The distributive law for sets holds for sets of an infinite number of different sizes. By contrast, the
variables in our new law, 𝑃, 𝑄 and 𝑅, can each take on only one of two truth values. The consequence is
that we can verify the law simply by checking all of the 2 × 2 × 2 cases. The result is called a truth table:

𝑷 𝑸 𝑹 𝑷 ∨ (𝑸 ∧ 𝑹) (𝑷 ∨ 𝑸) ∧ (𝑷 ∨ 𝑹)
𝐹 𝐹 𝐹 𝐹 𝐹
𝐹 𝐹 𝑇 𝐹 𝐹
𝐹 𝑇 𝐹 𝐹 𝐹
𝐹 𝑇 𝑇 𝑇 𝑇
𝑇 𝐹 𝐹 𝑇 𝑇
𝑇 𝐹 𝑇 𝑇 𝑇
𝑇 𝑇 𝐹 𝑇 𝑇
𝑇 𝑇 𝑇 𝑇 𝑇

The first three columns show every possible combination of truth values for 𝑃, 𝑄 and 𝑅. The fourth
column shows the corresponding value for 𝑃 ∨ (𝑄 ∧ 𝑅); for example, in the first row, where 𝑃, 𝑄 and 𝑅
are all false, 𝑃 ∨ (𝑄 ∧ 𝑅) also evaluates to false. The fifth column shows the value of (𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅).

Since the fourth and fifth columns are identical, it must always be the case that 𝑃 ∨ (𝑄 ∧ 𝑅) =
(𝑃 ∨ 𝑄) ∧ (𝑃 ∨ 𝑅), as we wanted to prove.
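
This exhaustive check is easy to automate; here's a sketch in Python, where itertools.product enumerates the 2 × 2 × 2 rows of the truth table:

from itertools import product

# Verify P or (Q and R) == (P or Q) and (P or R) on all eight truth assignments.
for P, Q, R in product([False, True], repeat=3):
    assert (P or (Q and R)) == ((P or Q) and (P or R))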

With two propositions, there are only 2² = 4 cases to check. Here's a truth table that proves our second law ¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄:

𝑷 𝑸 ¬(𝑷 ∨ 𝑸) ¬𝑷 ∧ ¬𝑸
𝐹 𝐹 𝑇 𝑇
𝐹 𝑇 𝐹 𝐹
𝑇 𝐹 𝐹 𝐹
𝑇 𝑇 𝐹 𝐹

As with sets, there's a matching distributive law with the operators swapped: 𝑃 ∧ (𝑄 ∨ 𝑅) = (𝑃 ∧ 𝑄) ∨ (𝑃 ∧ 𝑅). The negation law we just proved is likewise one of a pair, the actual laws of De Morgan:
¬(𝑃 ∨ 𝑄) = ¬𝑃 ∧ ¬𝑄
¬(𝑃 ∧ 𝑄) = ¬𝑃 ∨ ¬𝑄
Now that we have De Morgan’s laws for propositional logic, it’s easy to see how they imply the ones for
sets. For any given element, let
𝑃 = the element is in the set 𝐴
𝑄 = the element is in the set 𝐵
Then to say the element is in 𝐴 ∪ 𝐵 is to assert 𝑃 ∨ 𝑄; and to say it is in (𝐴 ∪ 𝐵)̅ is to assert ¬(𝑃 ∨ 𝑄). By De Morgan's laws, this is the same as to assert ¬𝑃 ∧ ¬𝑄, that is, that the element is not in 𝐴 and also that it is not in 𝐵. But then it is in 𝐴̅ ∩ 𝐵̅. Since we can reverse the argument to show that every element in 𝐴̅ ∩ 𝐵̅ is in (𝐴 ∪ 𝐵)̅, these two sets must be equal.
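
The same brute-force style of verification works for De Morgan's laws themselves; a sketch:

from itertools import product

# Verify both of De Morgan's laws on all four truth assignments.
for P, Q in product([False, True], repeat=2):
    assert (not (P or Q)) == ((not P) and (not Q))
    assert (not (P and Q)) == ((not P) or (not Q))
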
57. Prove that ∑_{𝒊=𝟎}^{𝒏} 𝒊 C(𝒏, 𝒊) = 𝒏 ⋅ 𝟐^{𝒏−𝟏} by considering the average size of a subset of a set of 𝒏 items.

We can pair every subset with its complement, the subset containing all the other elements of the set.
Together, these have a total of 𝑛 elements; therefore their average size is 𝑛/2. The average of a
collection of paired numbers, with an average of 𝑛/2 for each pair, is again 𝑛/2, so this must be the
average size of all subsets.

We can also calculate the average size directly; that is, we can add up the sizes of all the subsets and then divide by the number of subsets. The sum of the sizes begins this way: 0 + (1 + 1 + ⋯ + 1) + 2 + 2 + ⋯, where the parenthesized group contains 𝑛 1's. In this sum, the number 2 will appear C(𝑛, 2) times, since there are C(𝑛, 2) subsets containing two elements (we write C(𝑛, 𝑘) for the binomial coefficient, the number of 𝑘-element subsets of an 𝑛-element set); the number 3 will appear C(𝑛, 3) times, since there are C(𝑛, 3) subsets containing 3 elements. The number 0 appears just once, because the empty set is the only subset of size 0, that is, because C(𝑛, 0) = 1; and the number 1 appears 𝑛 times, because the number of single-element subsets is C(𝑛, 1) = 𝑛. The upshot is that the whole sum can be written more compactly as 0 ⋅ C(𝑛, 0) + 1 ⋅ C(𝑛, 1) + 2 ⋅ C(𝑛, 2) + 3 ⋅ C(𝑛, 3) + ⋯ + 𝑛 ⋅ C(𝑛, 𝑛), or ∑_{𝑖=0}^{𝑛} 𝑖 C(𝑛, 𝑖).

Dividing the sum of all the sizes of the subsets by the number of subsets, 2^𝑛, we find that the average is

(1/2^𝑛) ∑_{𝑖=0}^{𝑛} 𝑖 C(𝑛, 𝑖)

Setting this equal to our other answer for the average gives us

(1/2^𝑛) ∑_{𝑖=0}^{𝑛} 𝑖 C(𝑛, 𝑖) = 𝑛/2

Multiplying on both sides by 2𝑛 then yields the formula we want, the same one we proved by induction
and combinatorially in Item 52.
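
For small 𝑛, the identity is easy to confirm by brute force; here's a sketch using math.comb and direct enumeration of subsets:

from itertools import combinations
from math import comb

for n in range(1, 10):
    # the identity itself
    total = sum(i * comb(n, i) for i in range(n + 1))
    assert total == n * 2 ** (n - 1)
    # and the claim that this total really is the sum of all subset sizes
    direct = sum(len(s) for k in range(n + 1)
                 for s in combinations(range(n), k))
    assert direct == total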

58. Find a recurrence relation for the number of derangements 𝑫𝒏 of 𝒏 objects.

The first object, call it 𝑜1 , can’t end up in its own position, so we have 𝑛 − 1 choices for where to put it.
Suppose we put it in position 𝑘. Now consider two possibilities for where the object 𝑜𝑘 in position 𝑘
goes.

One possibility is that 𝑜𝑘 is placed in the first position. In this case, these two objects have switched
positions and we are now left with 𝑛 − 2 objects that must be deranged; that is, we have 𝐷𝑛−2 choices
to complete the derangement.

The other possibility is that 𝑜𝑘 is not to be placed in the first position. Start by placing it there. Now we
have 𝑛 − 1 objects—all of them except 𝑜1 —which must be arranged in 𝑛 − 1 positions, with none
ending up in its current position. This is exactly what we mean by a derangement, so there are 𝐷𝑛−1
choices.

Taking the two possibilities together, we have 𝑛 − 1 choices for where to put 𝑜1 and then 𝐷𝑛−1 + 𝐷𝑛−2
choices for what to do with the rest. By the multiplication principle,

𝐷𝑛 = (𝑛 − 1)(𝐷𝑛−1 + 𝐷𝑛−2 )

Note that 𝐷1 = 0 and 𝐷2 = 1. Along with the recurrence, this allows us to calculate the number of
derangements for any 𝑛.
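
The recurrence translates directly into code; here's a sketch in Python (the function name is ours):

def derangements(n):
    # D_1 = 0, D_2 = 1 and D_n = (n - 1)(D_{n-1} + D_{n-2})
    if n == 1:
        return 0
    if n == 2:
        return 1
    d2, d1 = 0, 1  # D_1 and D_2
    for k in range(3, n + 1):
        d2, d1 = d1, (k - 1) * (d1 + d2)
    return d1

print([derangements(n) for n in range(1, 8)])  # [0, 1, 2, 9, 44, 265, 1854]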

We can’t solve this recurrence, in the sense of finding a closed form for it, but we can get a formula that
will provide insight. As a first step, we manipulate the recurrence algebraically:

𝐷𝑛 − 𝑛𝐷𝑛−1 = −(𝐷𝑛−1 − (𝑛 − 1)𝐷𝑛−2 )

Next, note that we can apply this form of the recurrence iteratively. For example, applying it once tells
us that

𝐷10 − 10𝐷9 = −(𝐷9 − 9𝐷8 )

But we can then apply the recurrence again to the expression in parentheses, since it’s of the right form.
This yields

𝐷10 − 10𝐷9 = −(−(𝐷8 − 8𝐷7 )) = (−1)²(𝐷8 − 8𝐷7 )

Applying it a third time, we get

𝐷10 − 10𝐷9 = (−1)³(𝐷7 − 7𝐷6 )

If we keep going, we’ll eventually get 𝐷2 and 𝐷1 on the right and we know the actual values of these.
When we do this, what will the power of −1 be? Look back and you’ll see that this power and the
subscript of the first 𝐷 on the right always sum to 10—no surprise since they start out summing to 10
and one goes up 1 with each iteration while the other goes down 1. This means that when the subscript
on the first 𝐷 is 2, as we want, the power of −1 will be 8. That is

𝐷10 − 10𝐷9 = (−1)⁸(𝐷2 − 2𝐷1 ) = (−1)⁸(1 − 2 ⋅ 0) = (−1)⁸

More generally, applying the recurrence iteratively yields

𝐷𝑛 − 𝑛𝐷𝑛−1 = (−1)¹(𝐷𝑛−1 − (𝑛 − 1)𝐷𝑛−2 )
           = (−1)²(𝐷𝑛−2 − (𝑛 − 2)𝐷𝑛−3 )
           = (−1)³(𝐷𝑛−3 − (𝑛 − 3)𝐷𝑛−4 )

Note that the power of −1 and the subscript on the first 𝐷 on the right always sum to 𝑛. If we keep
going until this subscript is 2, the power of −1 will thus be 𝑛 − 2. So we have

𝐷𝑛 − 𝑛𝐷𝑛−1 = (−1)^(𝑛−2)(𝐷2 − 2𝐷1 ) = (−1)^(𝑛−2) = (−1)^(𝑛−2)(−1)(−1) = (−1)^𝑛

At the very end, we added two factors of −1; these don’t change the value but do simplify the form.

We now have 𝐷𝑛 − 𝑛𝐷𝑛−1 = (−1)^𝑛. Divide on both sides by 𝑛! and we get

𝐷𝑛/𝑛! − 𝐷𝑛−1/(𝑛 − 1)! = (−1)^𝑛/𝑛!

Suppose, again, that we want to get 𝐷10 . Substitute 𝑛 = 2, 3, 4, ⋯ , 10 to get a series of equations and
then add them together. On the right, we’ll get

∑_{𝑖=2}^{10} (−1)^𝑖/𝑖! = ∑_{𝑖=0}^{10} (−1)^𝑖/𝑖!

The equality holds because the two extra 𝑖 values that we include have the effect of adding one and
subtracting one, that is, they have no effect.

On the left-hand side we get

(𝐷2/2! − 𝐷1/1!) + (𝐷3/3! − 𝐷2/2!) + (𝐷4/4! − 𝐷3/3!) + ⋯ + (𝐷10/10! − 𝐷9/9!)

Here, everything cancels except the terms with the lowest and highest subscripts and we're left with 𝐷10/10! − 𝐷1/1!. But 𝐷1 = 0, so even this simplifies, to just 𝐷10/10!.

The same argument would apply if we’d chosen a value other than 10, so, equating the right- and left-
hand sides, we get a general formula:

𝐷𝑛/𝑛! = 1 − 1/1! + 1/2! − 1/3! + ⋯ + (−1)^𝑛/𝑛!

We can use this to calculate 𝐷𝑛 . But it also clears up a mystery from the very end of Item 54. The left-
hand side is the chance of a derangement, if all permutations are equally likely, since it’s the number of
derangements divided by the total of all possible permutations. And the right-hand side—as you have
learned or will learn in Calculus class—is a very close approximation of 1/𝑒. In fact, the right-hand side
becomes 1/𝑒 in the limit, as we let 𝑛 increase without bound.
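
A quick numeric check (a sketch) confirms that the series matches the recurrence values and is already very close to 1/𝑒 at 𝑛 = 10:

from math import factorial, e

n = 10
d2, d1 = 0, 1                 # D_1 and D_2, as in the recurrence above
for k in range(3, n + 1):
    d2, d1 = d1, (k - 1) * (d1 + d2)
series = sum((-1) ** i / factorial(i) for i in range(n + 1))
print(d1 / factorial(n))      # 0.36787918871252206
print(series)                 # the same value
print(1 / e)                  # 0.36787944117144233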

In fact, now that we know what we're looking for, we can get the same result from the formula we developed in Item 54. Here's what we got for the chance of not getting a derangement of 30 exam papers:

(C(30, 1) ⋅ 29! − C(30, 2) ⋅ 28! + C(30, 3) ⋅ 27! − ⋯ − C(30, 30) ⋅ 0!) / 30!

Dividing through by the 30! and substituting fractions with descending factors for the binomial coefficients, this becomes

(30/1)(29!/30!) − ((30 ⋅ 29)/(2 ⋅ 1))(28!/30!) + ((30 ⋅ 29 ⋅ 28)/(3 ⋅ 2 ⋅ 1))(27!/30!) − ⋯ = 1/1! − 1/2! + 1/3! − ⋯ ≈ 1 − 1/𝑒
59. If a professor in a large class collects exam papers and hands them back out at random, what’s the
relative likelihood that exactly 𝟎, 𝟏, 𝟐 or 𝟑 students get their own papers back? On average, how
many students get their own papers back?

Let’s call the probabilities 𝑃0 , 𝑃1 , 𝑃2 and 𝑃3 and the large number of students 𝑛. The first probability, 𝑃0 ,
is just the chance of a derangement: 𝐷𝑛 /𝑛!. But, as we saw in Item 58, for any reasonably large value of
𝑛 this is almost exactly 1/𝑒.

To count the ways in which exactly one student gets his or her own paper, we think of first choosing the
student and then picking a derangement of the 𝑛 − 1 remaining papers. So 𝑃1 = 𝑛𝐷𝑛−1 /𝑛! =
𝐷𝑛−1 /(𝑛 − 1)!. But this is again almost exactly 1/𝑒. So we have almost exactly 𝑃0 = 𝑃1 .

For exactly two students to get their own papers back, we first choose the two (there are C(𝑛, 2) = 𝑛(𝑛 − 1)/(2 ⋅ 1) ways) and then pick a derangement of the 𝑛 − 2 remaining papers. So 𝑃2 = (𝑛(𝑛 − 1)/2) ⋅ 𝐷𝑛−2/𝑛! = (1/2!) ⋅ 𝐷𝑛−2/(𝑛 − 2)!. But 𝐷𝑛−2/(𝑛 − 2)! is again almost exactly 1/𝑒. So we have almost exactly 𝑃2 = (1/2!)(1/𝑒) = (1/2!)𝑃0. The same argument produces 𝑃3 = (1/3!)𝑃0 and, in fact, 𝑃𝑘 = (1/𝑘!)𝑃0, for any 𝑘.⁴

To calculate the average number of students who get their own papers back, we just take a weighted
sum:
∑_{𝑘=0}^{𝑛} 𝑘𝑃𝑘 = ∑_{𝑘=1}^{𝑛} 𝑘 ⋅ 𝑃0/𝑘! = 𝑃0 ∑_{𝑘=1}^{𝑛} 1/(𝑘 − 1)! ≈ (1/𝑒) ⋅ 𝑒 = 1

In the first equality, we dropped the value 0 for the index 𝑘, since the first term is zero and contributes nothing. In the next-to-last step, we rely on the Calculus fact that 1 + 1/1! + 1/2! + 1/3! + ⋯ very quickly approaches 𝑒.⁵

A much easier way to get the average number of students who get their own papers back is to take the
perspective of an individual student. Clearly, such a student has exactly a 1/𝑛 chance of getting his or
her paper back. So, if papers are redistributed 𝑛 times, he or she will expect to get his or her own paper
back once. The same is true of all the other students as well. So, if the papers are handed out 𝑛 times,
we’d expect a paper coincidence to occur once for each student, 𝑛 times in all. If there are 𝑛
coincidences in 𝑛 times, then the average is one coincidence per time.
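
A short simulation makes all of this concrete (a sketch; the class size and number of trials are arbitrary choices):

import random
from collections import Counter

n, trials = 30, 100_000
counts = Counter()
total_matches = 0
for _ in range(trials):
    papers = list(range(n))
    random.shuffle(papers)               # hand the papers back at random
    matches = sum(1 for student, paper in enumerate(papers)
                  if student == paper)   # students who got their own paper
    counts[matches] += 1
    total_matches += matches

for k in range(4):
    print(k, counts[k] / trials)   # roughly 1/e, 1/e, 1/(2e), 1/(6e)
print(total_matches / trials)      # roughly 1 on average
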
60. Prove that ∑_{𝒊=𝟎}^{𝒏} C(𝒏, 𝒊)² = C(𝟐𝒏, 𝒏), both combinatorially and by applying the binomial theorem to (𝒙 + 𝒚)^{𝟐𝒏}.

The combinatorial approach to proving this appeared in a previous homework problem, but here it is
officially.

The right-hand side of the equality has a natural interpretation—it’s the number of 𝑛-element subsets of
a set of 2𝑛 elements. Suppose such a set looks like this: {𝑎1 , 𝑎2 , 𝑎3 , … , 𝑎𝑛 , 𝑏1 , 𝑏2 , 𝑏3 , … , 𝑏𝑛 }. Then to get
a subset of size 𝑛, we can decide first which of the 𝑎's to take and then which of the 𝑏's. Suppose we decide

⁴ Well, not quite any. We're relying on the fact that 𝐷𝑛−𝑘/(𝑛 − 𝑘)! is almost exactly 1/𝑒, and this is true because it's the sum 1 − 1/1! + 1/2! − 1/3! + ⋯ + (−1)^(𝑛−𝑘)/(𝑛 − 𝑘)!. So long as the sum continues for more than a few terms, it is extremely close to 1/𝑒. But if 𝑘 is nearly as large as 𝑛 then we only get a few terms and the approximation is bad. In particular, if 𝑘 = 𝑛 then 𝑃𝑛 is the chance that everyone gets his or her own paper back. This chance is clearly 1/𝑛!, not the (1/𝑛!)(1/𝑒) that our formula 𝑃𝑘 = (1/𝑘!)𝑃0 would suggest.

⁵ As observed in the previous footnote, the approximation 𝑃𝑘 = 𝑃0/𝑘! actually fails for the very last few terms. The effect is minuscule, since the probabilities in these terms are vanishingly small (both the true ones and our faulty approximations). In any case, the next paragraph shows, using a different line of reasoning, that the conclusion is not only extremely accurate but, in fact, exact.

to use six of the 𝑎's. Then naturally we'll have to use 𝑛 − 6 of the 𝑏's. So in this case we have C(𝑛, 6) C(𝑛, 𝑛 − 6) choices. But C(𝑛, 𝑛 − 6) = C(𝑛, 6). So the number of choices simplifies to C(𝑛, 6)².

Of course, we would get an analogous answer if we decided to use a different number of 𝑎's. We can use any number from 0 to 𝑛 of them. Summing, we get a complete count of the number of ways of forming an 𝑛-element subset:

∑_{𝑖=0}^{𝑛} C(𝑛, 𝑖)²

Now let's get the same result by applying the binomial theorem. The theorem tells us that the right-hand side of our equality is the coefficient of 𝑥^𝑛 𝑦^𝑛 in the expansion of (𝑥 + 𝑦)^{2𝑛}.

Note, though, that (𝑥 + 𝑦)^{2𝑛} = (𝑥 + 𝑦)^𝑛 (𝑥 + 𝑦)^𝑛. How can a term of the form 𝑥^𝑛 𝑦^𝑛 arise on the right-hand side? Taking 𝑛 = 10 as an example, how can we get the term of the form 𝑥^{10} 𝑦^{10} in the product (𝑥 + 𝑦)^{10} (𝑥 + 𝑦)^{10}? One way is to multiply the term C(10, 7) 𝑥³𝑦⁷ from the first factor by the term C(10, 3) 𝑥⁷𝑦³ from the second, giving C(10, 3) C(10, 7) 𝑥^{10} 𝑦^{10}. Another is to multiply the term C(10, 6) 𝑥⁴𝑦⁶ from the first factor by the term C(10, 4) 𝑥⁶𝑦⁴ from the second, giving C(10, 4) C(10, 6) 𝑥^{10} 𝑦^{10}. Or we might multiply C(10, 10) 𝑥⁰𝑦^{10} by C(10, 0) 𝑥^{10} 𝑦⁰, giving C(10, 10) C(10, 0) 𝑥^{10} 𝑦^{10}. In all, there are 11 possibilities and we need to combine them to get the correct total. The coefficient of 𝑥^{10} 𝑦^{10} will be

∑_{𝑖=0}^{10} C(10, 10 − 𝑖) C(10, 𝑖)

Generalizing from 10 to any number 𝑛, we find that the coefficient of 𝑥^𝑛 𝑦^𝑛 in (𝑥 + 𝑦)^𝑛 (𝑥 + 𝑦)^𝑛 is

∑_{𝑖=0}^{𝑛} C(𝑛, 𝑛 − 𝑖) C(𝑛, 𝑖)

But since C(𝑛, 𝑛 − 𝑖) = C(𝑛, 𝑖), this simplifies to just

∑_{𝑖=0}^{𝑛} C(𝑛, 𝑖)²
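
Like the other binomial identities, this one is easy to spot-check (a sketch using math.comb):

from math import comb

for n in range(12):
    assert sum(comb(n, i) ** 2 for i in range(n + 1)) == comb(2 * n, n)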

61. In how many ways can you climb a staircase with 𝒏 stairs, if you take them either 𝟏 or 𝟐 at a time?

Call the number of ways 𝑁𝑛 . If your first step is to climb one stair, then you have 𝑁𝑛−1 ways to
continue. The only other possibility is that your first step is to climb two stairs; in this case you have
𝑁𝑛−2 ways to continue. In all

𝑁𝑛 = 𝑁𝑛−1 + 𝑁𝑛−2

We’ve run into this recurrence before, in Item 36 when we were counting tilings and in Item 49 when
we were counting runs in coin flips; it also helped with Sanskrit poetry in a homework problem. Of
course, an inductive definition depends not only on the recurrence, but also on the base cases. Here we
have 𝑁1 = 1 and 𝑁2 = 2. Since this exactly matches the base case for the tiling problem, the closed
form we derived for that problem in Item 41 works just as well here.

The sequence of numbers that we get—1, 2, 3, 5, 8, 13, …, each number being the sum of the previous
two—is named for the Italian mathematician Fibonacci who studied it back in the 1200s. Fibonacci was
counting pairs of rabbits (each pair being one male and one female). Suppose we start with one mature
pair and make the following assumptions:

• It takes one month for rabbits to mature.
• In each month, every pair of mature rabbits produces one new pair of immature rabbits.
• Rabbits do not die (during the relatively short time we're observing them).

In this case, the number of rabbits, 𝑁𝑛 , in any month, 𝑛, will be the sum of all rabbits alive in the
previous month, 𝑁𝑛−1 , plus any new ones that were born. The number born is equal to the number of
mature rabbits last month, which in turn is equal to the number of all rabbits two months ago, 𝑁𝑛−2 ;
certainly any rabbit alive two months ago will be mature as of a month ago, and any other rabbit alive
last month must just have been born. The upshot is that 𝑁𝑛 = 𝑁𝑛−1 + 𝑁𝑛−2 . Assuming, again, that we
start with one mature pair, we have 𝑁1 = 1. And since this pair will produce a second pair by the time
𝑛 = 2, we have 𝑁2 = 2.

Of course, we’ll get a slightly different sequence with different starting assumptions. For example,
Fibonacci numbers are often given as 1, 1, 2, 3, 5, 8, …. This is the result of starting with one immature
pair of rabbits.

To avoid confusion, we’ll define the Fibonacci numbers this second way, but with subscripts starting at
0: 𝐹0 = 1, 𝐹1 = 1, 𝐹2 = 2, 𝐹3 = 3, 𝐹4 = 5, 𝐹5 = 8, …. That way, 𝐹𝑛 is still the solution to the problems
we’ve analyzed—the number of ways of tiling an 𝑛 × 2 path, climbing an 𝑛-step staircase or composing
a line of Sanskrit poetry of length 𝑛.
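
In code, the whole definition is a couple of lines; here's a sketch using the document's indexing, 𝐹0 = 𝐹1 = 1:

def fib(n):
    # F_0 = F_1 = 1 and F_n = F_{n-1} + F_{n-2}
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print([fib(n) for n in range(8)])  # [1, 1, 2, 3, 5, 8, 13, 21]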

62. Prove that ∑𝒏𝒊=𝟎 𝑭𝒊 = 𝑭𝒏+𝟐 − 𝟏.

The proof is by induction on 𝑛. When 𝑛 = 0, the claim is that 𝐹0 = 𝐹2 − 1, which is true, since 𝐹0 = 1
and 𝐹2 = 2. Suppose the claim is true for some value of 𝑛. Then we have

∑_{𝑖=0}^{𝑛+1} 𝐹𝑖 = (∑_{𝑖=0}^{𝑛} 𝐹𝑖 ) + 𝐹𝑛+1 = 𝐹𝑛+2 − 1 + 𝐹𝑛+1

But the sum of any two consecutive Fibonacci numbers is the next Fibonacci number. So this is 𝐹𝑛+3 −
1, proving that the claim is true for the next higher value of 𝑛.

63. Prove that 𝑭𝒏² − 𝑭𝒏−𝟏 𝑭𝒏+𝟏 = (−𝟏)^𝒏.

Start by rewriting the claim this way:

𝐹𝑛² = 𝐹𝑛−1 𝐹𝑛+1 + (−1)^𝑛

We can check to see that this is true for 𝑛 = 1. Now suppose it's true for all values up to some particular value of 𝑛 and consider (𝐹𝑛+1)². Substituting 𝐹𝑛+1 = 𝐹𝑛 + 𝐹𝑛−1, we have

(𝐹𝑛+1)² = (𝐹𝑛 + 𝐹𝑛−1)² = 𝐹𝑛² + 2𝐹𝑛 𝐹𝑛−1 + (𝐹𝑛−1)²

Applying the induction hypothesis to the last term, we have

(𝐹𝑛+1)² = 𝐹𝑛² + 2𝐹𝑛 𝐹𝑛−1 + 𝐹𝑛−2 𝐹𝑛 + (−1)^(𝑛−1)

Factoring out the 𝐹𝑛 common to all terms except (−1)^(𝑛−1), we get

(𝐹𝑛+1)² = 𝐹𝑛 (𝐹𝑛 + 𝐹𝑛−1 + 𝐹𝑛−1 + 𝐹𝑛−2 ) + (−1)^(𝑛−1)

Applying the recurrence for Fibonacci numbers twice inside the parentheses and then once again to the
result, we get

(𝐹𝑛+1)² = 𝐹𝑛 (𝐹𝑛+1 + 𝐹𝑛 ) + (−1)^(𝑛−1) = 𝐹𝑛 𝐹𝑛+2 + (−1)^(𝑛−1)

If we then note that (−1)^(𝑛−1) = (−1)^(𝑛−1)(−1)(−1) = (−1)^(𝑛+1), since multiplying in a pair of factors of (−1) has no effect, then we see that the claim has been proven to hold for the next higher value of 𝑛.

The induction proof given here is slightly different from the ones we've seen before. Rather than supposing a claim holds for a certain value of 𝑛, we suppose it holds for all values up to 𝑛. This was crucial in the proof, since we applied the claim, not to 𝐹𝑛, but to 𝐹𝑛−1. To see that this kind of proof is
still valid, think of the checkout line again. Suppose a person will hear about a special discount only if
everyone ahead of them on line hears. And, to start things out, suppose the first person knows about
the discount. Then the woman who’s second in line will hear, because everyone ahead of her—only the
person first in line—has heard. And now everyone ahead of the man third in line knows, so he’ll hear as
well, and so on. We can still conclude that everyone in line hears about the discount.

64. Get a formula for 𝑵𝒏 in terms of 𝑭𝒏, assuming that 𝑵𝒏 = 𝑵𝒏−𝟏 + 𝑵𝒏−𝟐 and 𝑵𝟎 = 𝒂 and 𝑵𝟏 = 𝒃.

The question is: What happens to the Fibonacci sequence if we change the base cases? The answer
becomes clear if we just generate the first few elements of the new sequence: 𝑎, 𝑏, 𝑎 + 𝑏, 𝑎 + 2𝑏, 2𝑎 +
3𝑏, 3𝑎 + 5𝑏, 5𝑎 + 8𝑏, …; it seems that 𝑁𝑛 = 𝐹𝑛−2 𝑎 + 𝐹𝑛−1 𝑏.

To prove that this is so, we use induction on 𝑛. The claim checks for 𝑛 = 2 and 𝑛 = 3. If it holds for all values of 𝑛 up to some particular value of at least 3, then we have

𝑁𝑛+1 = 𝑁𝑛 + 𝑁𝑛−1                              by the recurrence defining 𝑁𝑛
     = 𝐹𝑛−2 𝑎 + 𝐹𝑛−1 𝑏 + 𝐹𝑛−3 𝑎 + 𝐹𝑛−2 𝑏      applying the induction hypothesis twice
     = (𝐹𝑛−2 + 𝐹𝑛−3 )𝑎 + (𝐹𝑛−1 + 𝐹𝑛−2 )𝑏      after regrouping
     = 𝐹𝑛−1 𝑎 + 𝐹𝑛 𝑏                           by the recurrence defining 𝐹𝑛

That is, the claim holds for the next higher value of 𝑛.

One interesting choice for 𝑎 and 𝑏 is to let them be a pair of consecutive Fibonacci numbers, say
𝑎 = 𝐹4 = 5 and 𝑏 = 𝐹5 = 8. Then the series will run this way: 5, 8, 13, 21, …. Clearly, we now have
𝑁𝑛 = 𝐹𝑛+4. On the other hand, the formula we just proved tells us that

𝑁𝑛 = 𝐹𝑛−2 𝑎 + 𝐹𝑛−1 𝑏 = 𝐹𝑛−2 𝐹4 + 𝐹𝑛−1 𝐹5

Putting the two results for 𝑁𝑛 together, we have

𝐹𝑛+4 = 𝐹𝑛−2 𝐹4 + 𝐹𝑛−1 𝐹5

This works for values other than 4, of course, so a more general formula is

𝐹𝑛+𝑚 = 𝐹𝑛−2 𝐹𝑚 + 𝐹𝑛−1 𝐹𝑚+1

We can get a much more symmetrical version of this formula if we now define 𝑛 = 𝑘 + 2. Substituting yields

𝐹𝑘+𝑚+2 = 𝐹𝑘 𝐹𝑚 + 𝐹𝑘+1 𝐹𝑚+1

We can immediately put this to use to prove additional identities. For example, let 𝑘 = 𝑚 = 𝑛 − 1 in our new formula. Then the left side of the formula is 𝐹2𝑛 and the right side is (𝐹𝑛−1)² + 𝐹𝑛², that is,

𝐹2𝑛 = (𝐹𝑛−1)² + 𝐹𝑛²

Or, again, let 𝑚 = 𝑘 − 1. Then we get

𝐹2𝑘+1 = 𝐹𝑘 𝐹𝑘−1 + 𝐹𝑘+1 𝐹𝑘 = 𝐹𝑘 (𝐹𝑘−1 + 𝐹𝑘+1 )

These last two together give us a formula for every Fibonacci number—the first for those with even
subscripts, the second for those with odd subscripts—in terms of others with subscripts about half as
big.
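
A numeric spot-check of the symmetrical formula (a sketch, reusing the indexing 𝐹0 = 𝐹1 = 1):

def fib(n):
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# F_{k+m+2} = F_k F_m + F_{k+1} F_{m+1}
for k in range(10):
    for m in range(10):
        assert fib(k + m + 2) == fib(k) * fib(m) + fib(k + 1) * fib(m + 1)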

65. How many bits does it take to write the number 𝒏?

If the number in question is a power of 2, say 𝑛 = 2^𝑚, then the answer is easy. It will be written as a 1 followed by 𝑚 zeroes, 𝑚 + 1 bits in all. We'll get the same answer for every value of 𝑛 up through the next power of two; that is, the number of bits will be 𝑚 + 1 for values of 𝑛 satisfying

2^𝑚 ≤ 𝑛 < 2^(𝑚+1)
We’d like to use this relationship to get a formula relating 𝑛 to the number of bits used to write it. As a
first step, recall that the power of 2 that yields a number, 𝑛, is called the base-two logarithm of 𝑛 and is
written log 2 𝑛. When 𝑛 is a power of two, the meaning of the logarithm is obvious—to say log 2 8 = 3
just means that 23 = 8. But what does it mean to say that log 2 9 ≈ 3.17? Of course, it means that
23.17 ≈ 9, but we’re still left with the question of what 23.17 means.

This is easy enough to understand based on two fundamental facts about exponents:

𝑥^𝑎 𝑥^𝑏 = 𝑥^(𝑎+𝑏) and (𝑥^𝑎)^𝑏 = 𝑥^(𝑎𝑏)

Given these, we can rewrite 2^3.17 as

2^(3 + 1/10 + 7/100) = 2³ ⋅ 2^(1/10) ⋅ 2^(7/100) = 2³ ⋅ 2^(1/10) ⋅ (2^(1/100))⁷

So we can understand what 2^3.17 means if we only know what's meant by 2^(1/10), 2^(1/100) and similar powers. But if we raise 2^(1/10) to the tenth power we get (2^(1/10))^10 = 2^(10 ⋅ 1/10) = 2¹ = 2. In other words, 2^(1/10) must be the tenth root of 2; and 2^(1/100) must be the hundredth root of 2 and so on.

The upshot is that the base-two logarithm is a function that takes in any (positive) number, 𝑛, and returns an exponent, which can be interpreted as we've just explained. The exponent will be an integer if 𝑛 is a power of two and will increase smoothly up to the next integer as 𝑛 increases to the next power of two. (Picture the graph of 𝑦 = log₂ 𝑥: it passes through the points (1, 0), (2, 1), (4, 2), (8, 3), … and rises smoothly in between.)

Now let's go back to the fact that the number 𝑛 will take 𝑚 + 1 bits to write so long as

2^𝑚 ≤ 𝑛 < 2^(𝑚+1)

Based on the graph, it's clear that this is equivalent to saying that

𝑚 ≤ log₂ 𝑛 < 𝑚 + 1

Of course, once we're familiar with logarithms, we can get the same result just by taking the log of all three quantities in the original inequality.

Now suppose we have a number 𝑛 with log₂ 𝑛 = 4.654. The value of 𝑚 that works in our inequality is clearly 4:

4 ≤ 4.654 < 5

Thus, in this case, 𝑚 + 1 is 5 and it takes 5 bits to write 𝑛. In general, for any value of log₂ 𝑛, the value of 𝑚 will be the largest integer not larger than log₂ 𝑛. This concept is useful enough that there's a standard way of denoting it: the floor of 𝑥, written ⌊𝑥⌋, is the largest integer not larger than 𝑥. For example, we have
example, we have

⌊4.654⌋ = 4
⌊4⌋ = 4
⌊−4.654⌋ = −5
Careful—as the last example shows, calculating the floor is not quite the same as dropping the fractional
part; it’s a matter of approaching a number from the left on the number line. By the way, we also define
the ceiling of a number 𝑥 to be the smallest integer not smaller than 𝑥 and write it ⌈𝑥⌉.

Using floor, we can give a formula for the number of bits used in writing 𝑛; it's ⌊log₂ 𝑛⌋ + 1. This gives an exact answer to our question about the number of bits it takes to write a number in binary. Keep in mind, though, that the floor and "+1" in our exact formula affect the answer very little. For many purposes, it's good enough to say that the number of bits it takes to write 𝑛 is about log₂ 𝑛. And, likewise, log₃ 𝑛 is about how many digits it takes to write 𝑛 in base 3; log₇ 𝑛 is about how many digits it takes to write 𝑛 in base 7 and so on.
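
Python's integers know their own bit length, which makes the formula easy to confirm (a sketch):

from math import floor, log2

for n in range(1, 10000):
    assert floor(log2(n)) + 1 == n.bit_length() == len(bin(n)) - 2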

In other words, we can understand logarithms as being, roughly, functions that tell us how many digits a
number will have. Note that, in any base, this is extremely small compared to the size of the number
itself. Even a large number like a million takes only 7 digits to write in base 10 or 20 bits in binary.
Counting up to a million is a huge task; writing it down is easy.

This turns out to be tremendously important in computer science, because we often find that one
approach to a task involving 𝑛 items is as hard as counting to 𝑛 while another is as easy as writing down
the number 𝑛.

Since we'll be using logarithms in what follows, here's a reminder of three important facts about them:

• log_𝑏 𝑥𝑦 = log_𝑏 𝑥 + log_𝑏 𝑦. The claim is that when we put 𝑏 to the power of the value on the right-hand side, we'll get 𝑥𝑦. Checking, we find that 𝑏^(log_𝑏 𝑥 + log_𝑏 𝑦) = 𝑏^(log_𝑏 𝑥) ⋅ 𝑏^(log_𝑏 𝑦) = 𝑥𝑦.
• log_𝑏 (𝑥/𝑦) = log_𝑏 𝑥 − log_𝑏 𝑦. Again, we check the claim by putting 𝑏 to the power of the value on the right-hand side: 𝑏^(log_𝑏 𝑥 − log_𝑏 𝑦) = 𝑏^(log_𝑏 𝑥) ⋅ 𝑏^(−log_𝑏 𝑦) = 𝑥 ⋅ (𝑏^(log_𝑏 𝑦))^(−1) = 𝑥 ⋅ 𝑦^(−1) = 𝑥/𝑦.
• log_𝑏 𝑥^𝑎 = 𝑎 log_𝑏 𝑥. Checking, we have 𝑏^(𝑎 log_𝑏 𝑥) = (𝑏^(log_𝑏 𝑥))^𝑎 = 𝑥^𝑎.

66. Find the number of comparisons in a mergesort of 𝒏 items.

In Item 4, we found that bubblesort uses 𝑛(𝑛 − 1)/2 comparisons to sort a list of 𝑛 items. A better
approach to sorting is mergesort.

Note first that it’s easy to combine two already-sorted lists into one. Create an empty list and then:

Repeatedly compare the top items of the two lists and move the smaller of the two to the next position of the new list.

Each comparison moves one item, except that the very last item to be moved does not need to be
compared. If there are a total of 𝑛 items in the two original lists, then this process of merging will use at
most 𝑛 − 1 comparisons.⁶

To sort a list, apply the following procedure:

1. If the list contains just one item, do nothing.
2. Otherwise
   a. Break it into two lists, as nearly equal in size as possible.
   b. Use this procedure to sort each half.
   c. Merge the two sorted lists.

The approach is recursive. To sort a list of 8 items, Step 2b calls for us to sort lists of 4 items. This will in
turn result in calls for us to sort lists of 2 items. Eventually, however, we bottom out at calls to sort lists
of just one item each and—since a list of one item can’t be out of order—in this case Step 1 tells us that
there’s nothing to do.

We can prove that the procedure is correct using induction. If we apply it to a list of length 1, there’s
nothing to do and the procedure, correctly, does nothing.

If the procedure works correctly for all lists of length up to and including 𝑛, then what will happen if we try it on a list of length 𝑛 + 1? Step 2a will break the list into two halves, each of length about (𝑛 + 1)/2. The exact lengths are not important; all we need to note is that they are certainly no bigger
than 𝑛. The induction hypothesis then tells us that the procedure correctly sorts them. Merging the
sorted lists in Step 2c then produces a sorted list of all the items, completing the proof by showing that
the procedure works correctly for lists of the next higher size.

How many comparisons are used in the process? Let 𝑁𝑛 be the number of comparisons and, temporarily, assume that 𝑛 is a power of 2, say 𝑛 = 2^𝑚. In this case, Step 2a will always be able to break the list into two parts exactly equal in size.

In order to mergesort a list of 𝑛 items, we first break it into two halves. In Step 2b, we use mergesort to
sort the two halves. Each of these mergesorts uses 𝑁_{𝑛/2} comparisons, for a total of 2𝑁_{𝑛/2}. Then, in
Step 2c, we merge the sorted lists. The total number of items merged is 𝑛; hence, as we’ve already said,
we use at most an additional 𝑛 − 1 comparisons. Putting this together, we have

𝑁𝑛 = 2𝑁_{𝑛/2} + 𝑛 − 1

The base case for this recurrence is 𝑁1 = 0, since no comparisons are used in sorting a one-item list.

To get a closed form for 𝑁𝑛, we iterate the recurrence. First, though, we substitute 𝑛 = 2^𝑚, rewriting the recurrence in the form

⁶ We say "at most" because it will use even fewer if one list becomes empty while the other still contains items. In this case, we can just copy the remaining items at the bottom, using no comparisons. We're making a "worst case" analysis.

𝑁_{2^𝑚} = 2𝑁_{2^(𝑚−1)} + 2^𝑚 − 1

Iterating, we then get

𝑁_{2^𝑚} = 2𝑁_{2^(𝑚−1)} + 2^𝑚 − 1
        = 2(2𝑁_{2^(𝑚−2)} + 2^(𝑚−1) − 1) + 2^𝑚 − 1 = 2²𝑁_{2^(𝑚−2)} + 2^𝑚 − 2 + 2^𝑚 − 1 = 2²𝑁_{2^(𝑚−2)} + 2 ⋅ 2^𝑚 − 2 − 1
        = 2²(2𝑁_{2^(𝑚−3)} + 2^(𝑚−2) − 1) + 2 ⋅ 2^𝑚 − 2 − 1 = 2³𝑁_{2^(𝑚−3)} + 3 ⋅ 2^𝑚 − 2² − 2 − 1
        = 2³(2𝑁_{2^(𝑚−4)} + 2^(𝑚−3) − 1) + 3 ⋅ 2^𝑚 − 2² − 2 − 1 = 2⁴𝑁_{2^(𝑚−4)} + 4 ⋅ 2^𝑚 − 2³ − 2² − 2 − 1

The results follow a simple pattern. If we continue until the power on the 2 that starts the expression is
𝑚, then we’ll have

2^𝑚 𝑁_{2^0} + 𝑚 ⋅ 2^𝑚 − (2^(𝑚−1) + 2^(𝑚−2) + ⋯ + 2² + 2 + 1)

The sum of powers of 2 is one we recognize; it comes to 2^𝑚 − 1. Also, 𝑁_{2^0} = 𝑁1 = 0. So the expression simplifies to just

𝑚 ⋅ 2^𝑚 − 2^𝑚 + 1

Of course, we'd like to put this answer back in terms of 𝑛. As we saw in Item 65, saying 𝑛 = 2^𝑚 is exactly the same as saying 𝑚 = log₂ 𝑛. The number of comparisons that mergesort takes to sort a list of 𝑛 items is thus

𝑛 log₂ 𝑛 − 𝑛 + 1

All this was assuming 𝑛 was exactly a power of 2. If it's not, we can increase the size to a power of 2 by adding dummy elements with the value ∞. After sorting, these will end up at the bottom of the list and they can be dropped. Adding the extra elements can't make the list larger than 2𝑛. So the number of comparisons is at most

2𝑛 log₂ 2𝑛 − 2𝑛 + 1

Doubling a number raises the power of 2 by one; that is, log₂ 2𝑛 = log₂ 𝑛 + 1. Our bound on the number of comparisons is thus

2𝑛(log₂ 𝑛 + 1) − 2𝑛 + 1 = 2𝑛 log₂ 𝑛 + 1

For comparison, if 𝑛 = 2^20 ≈ 1,000,000, then bubblesort will use 𝑛(𝑛 − 1)/2 = 2^20(2^20 − 1)/2 ≈ 500,000,000,000 comparisons. Using our exact formula, since 𝑛 is a power of 2, we find that mergesort will use 𝑛 log₂ 𝑛 − 𝑛 + 1 = 2^20 ⋅ 20 − 2^20 + 1 ≈ 19,000,000 comparisons. Counting in millions, that's 19 versus 500,000, a savings of about 99.996 percent.
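
For a concrete check, here's a comparison-counting mergesort in Python (a sketch; the global counter keeps the code short). On a list whose length is a power of 2, the count never exceeds 𝑛 log₂ 𝑛 − 𝑛 + 1:

import random
from math import log2

comparisons = 0

def merge(left, right):
    global comparisons
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        comparisons += 1                  # one comparison moves one item
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])               # leftovers move with no comparisons
    merged.extend(right[j:])
    return merged

def mergesort(lst):
    if len(lst) <= 1:
        return lst
    mid = len(lst) // 2
    return merge(mergesort(lst[:mid]), mergesort(lst[mid:]))

n = 2 ** 10
data = list(range(n))
random.shuffle(data)
assert mergesort(data) == sorted(data)
print(comparisons, int(n * log2(n) - n + 1))  # observed count vs. the worst-case formula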

67. Find the average number of comparisons in a quicksort of 𝒏 items.

Another standard sorting method, quicksort, relies on partitioning list elements into two groups—those
smaller and larger than some particular item. To partition, we

1. Choose an item at random from the list and call it the pivot.
2. Compare each other item to the pivot, placing it before the pivot if it’s smaller and after the
pivot if it’s larger.

If the list contains 𝑛 items, partitioning compares every item other than the pivot to the pivot. Hence it
uses 𝑛 − 1 comparisons. The result looks like this:

(Diagram: the partitioned list, with the items smaller than the pivot on top, then the pivot, then the items larger than the pivot.)

To sort a list, apply the following procedure:

1. If the list is empty or contains just one item, do nothing.
2. Otherwise
   a. Partition it.
   b. Use this procedure to sort the list of items smaller than the pivot.
   c. Use this procedure to sort the list of items larger than the pivot.

Like mergesort, this procedure is recursive and we can prove it correct by induction. It clearly works for
empty lists or lists of size one, since these don’t need to be sorted and the procedure (correctly) does
nothing. If it works for lists up to and including size 𝑛, then a list of 𝑛 + 1 items will be partitioned into
two lists, each of length 𝑛 or less, and these will—by the induction hypothesis—be sorted correctly.

A key feature of quicksort is that its functioning is partly dependent on chance—the random selection of
a pivot in the partition step. Because the approach is random, we can’t say definitely how many
comparisons it will use. Instead, we’ll analyze two cases—the worst possible performance and the
average performance.

The worst-case scenario is that the comparisons used in the partition step help minimally in dividing the
list into small and large items. This occurs when the pivot happens, unfortunately, to be the largest or
smallest item. Suppose it’s the largest. Then after the 𝑛 − 1 comparisons of the partition step, all of the
other items are in the top list, above the partition. There’s nothing to do with the empty bottom list,
but the recursive sort of the top list will begin with a partition requiring 𝑛 − 2 comparisons. Assuming

this and subsequent partitions always pick the largest item as pivot, we’ll then use 𝑛 − 3 comparisons,
𝑛 − 4 comparisons and so on. The total is (𝑛 − 1) + (𝑛 − 2) + ⋯ + 1 = (𝑛 − 1)𝑛/2, exactly the same
as used by the inefficient bubblesort procedure.

The good news is that this worst case is fantastically unlikely. We’re picking one of 𝑛 items, then one of
𝑛 − 1 and so on; the multiplication principle tells us that the total number of choices is 𝑛!. But only one
of all of these involves picking the largest and then the largest of those remaining and so on. So the chance of this happening is 1/𝑛!. Even for a relatively small list of just 100 items, that's 1/100!, or one out of


93,326,215,443,944,152,681,699,238,856,266,700,490,715,968,264,381,621,468,
592,963,895,217,523,999,929,915,608,941,463,976,156,518,286,253,697,
920,827,223,758,251,185,210,916,864,000,000,000,000,000,000,000,000

A much more useful analysis tells us what happens on average. Call the average number of comparisons
used by quicksort in sorting a list of 𝑛 items 𝑁𝑛 . The partition step is not random—it definitely uses 𝑛 −
1 comparisons. If it happens to pick the smallest item, then we’re left with lists of length 0 and 𝑛 − 1
and the recursive calls in Steps 2b and 2c will use 𝑁0 and 𝑁𝑛−1 comparisons, on average. If the chosen
pivot is the second smallest item, then we’ll be left with lists of length 1 and 𝑛 − 2 and the recursive
calls will use 𝑁1 and 𝑁𝑛−2 comparisons, on average. This pattern continues for a pivot which is the third
smallest, fourth smallest and so on.

There are 𝑛 possible choices for the pivot, all equally likely. To get the average over all possible choices,
we add the number of comparisons used in each and divide by 𝑛. This gives us a formula for 𝑁𝑛 :

𝑁𝑛 = ((𝑁0 + 𝑁𝑛−1 ) + (𝑁1 + 𝑁𝑛−2 ) + ⋯ + (𝑁𝑛−1 + 𝑁0 ))/𝑛 + 𝑛 − 1

Inside the parentheses, we have each of the values 𝑁0 through 𝑁𝑛−1 twice. Substituting, we get

𝑁𝑛 = (2 ∑_{𝑖=0}^{𝑛−1} 𝑁𝑖)/𝑛 + 𝑛 − 1

Multiplying through by 𝑛 we get

𝑛𝑁𝑛 = 2 ∑_{𝑖=0}^{𝑛−1} 𝑁𝑖 + 𝑛² − 𝑛

This looks harder than other recurrences we’ve seen because 𝑁𝑛 is based on all previous values, not just
one or two. In fact, though, we can easily reduce it to the more familiar kind of recurrence. The formula
we’ve just given holds for all values of 𝑛; write it out for the next lower one:

(𝑛 − 1)𝑁𝑛−1 = 2 ∑_{𝑖=0}^{𝑛−2} 𝑁𝑖 + (𝑛 − 1)² − (𝑛 − 1)

Then subtract this from the previous equality. Note that all but one of the 𝑁𝑖 ’s on the right disappear.

𝑛𝑁𝑛 − (𝑛 − 1)𝑁𝑛−1 = 2𝑁𝑛−1 + 𝑛² − (𝑛 − 1)² − 𝑛 + (𝑛 − 1)

Simplifying and bringing all the 𝑁𝑛−1 ’s to the right, we have

𝑛𝑁𝑛 = (𝑛 + 1)𝑁𝑛−1 + 2𝑛 − 2

For simplicity, we’ll drop the −2 on the right. This has the effect of very slightly overstating the
difference between successive values of 𝑁𝑛 ; so the answer we get will actually be a tiny overestimate.
Next, divide through by 𝑛(𝑛 + 1):

𝑁𝑛/(𝑛 + 1) = 𝑁𝑛−1/𝑛 + 2/(𝑛 + 1)

This version of the recurrence has the nice property that the fractions involving 𝑁 are of exactly the
same form. Look what happens if we write this equation for a series of values of 𝑛, say 𝑛 = 10, 9, 8, …:

𝑁10/11 = 𝑁9/10 + 2/11
𝑁9/10 = 𝑁8/9 + 2/10
𝑁8/9 = 𝑁7/8 + 2/9

If we add all these equations up, nearly all of the fractions involving 𝑁 will appear on both the left and right sides and, hence, will cancel. In the general case, if we go all the way down until we have 𝑁1/2 on the right, the result of adding up the equations and then cancelling will be

𝑁𝑛/(𝑛 + 1) = 𝑁1/2 + 2/(𝑛 + 1) + 2/𝑛 + 2/(𝑛 − 1) + ⋯ + 2/4 + 2/3

or, noting that 𝑁1 = 0—no comparisons for a list with one item—we have

𝑁𝑛 = 2(𝑛 + 1)(1/3 + 1/4 + ⋯ + 1/(𝑛 + 1))

We can picture the sum in parentheses by drawing the curve 𝑦 = 1/𝑥 and, under it, rectangles of width 1 with heights 1/2, 1/3, … , 1/(𝑛 + 1), starting at 𝑥 = 1.

The sum we want is the total of the areas of all the rectangles, except the first one. You should know
from Calculus—or you will know soon—that the area under the curve 𝑦 = 1/𝑥 between 𝑥 = 1 and any
higher value is the natural logarithm of that value. So the area under the curve from 𝑥 = 1 to 𝑥 = 𝑛 + 1
is ln (𝑛 + 1). The rectangles under that part of the curve have a total area less than the area under the
curve, so we have

1/2 + 1/3 + ⋯ + 1/(𝑛 + 1) < ln(𝑛 + 1)

And if we drop the 1/2 on the left, we make the left side even smaller. That is, it's certainly also true that

1/3 + ⋯ + 1/(𝑛 + 1) < ln(𝑛 + 1)

Since the left side in this inequality is exactly the sum that appears in our formula for 𝑁𝑛 , we can now
substitute to get a closed-form upper bound for the number of comparisons quicksort makes on
average:

𝑁𝑛 < 2(𝑛 + 1)ln(𝑛 + 1)

Dropping the two +1’s in the formula temporarily, since they make very little difference when 𝑛 is large,
we get 2𝑛 ln 𝑛, essentially the same as our answer for mergesort in Item 66. The only difference is that,
with mergesort, the logarithm in our answer was base-two; the natural logarithm here uses 𝑒 as a base.
How much difference does that make?

By the definition of logarithms, 𝑛 = 𝑒^(ln 𝑛). Taking the base-two logarithm of both sides, we have log₂ 𝑛 = log₂ 𝑒^(ln 𝑛) = ln 𝑛 ⋅ log₂ 𝑒. So the base-two logarithm is just log₂ 𝑒 ≈ 1.44 times bigger than the natural logarithm.

Since the base of the logarithm only leads to a percentage difference between mergesort and quicksort, we group them together as 𝑛 log 𝑛 procedures, leaving out the base entirely. By contrast, bubblesort is an 𝑛² approach; the difference between bubblesort and mergesort or quicksort grows with the size of the list 𝑛 like the difference between the functions 𝑛 log 𝑛 and 𝑛². This is not merely a matter of percentages—we can get a difference larger than any fixed percentage just by choosing a big enough value of 𝑛.
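
Here's a sketch of an experiment confirming the average-case bound (our implementation returns the comparison count along with the sorted list; duplicates are avoided so every item is either smaller or larger than the pivot):

import random
from math import log

def quicksort(lst):
    # Returns a sorted copy of lst and the number of comparisons used.
    if len(lst) <= 1:
        return lst, 0
    pivot = random.choice(lst)               # the random pivot of the analysis
    smaller = [x for x in lst if x < pivot]
    larger = [x for x in lst if x > pivot]
    count = len(lst) - 1                     # each other item is compared to the pivot
    s_sorted, s_count = quicksort(smaller)
    l_sorted, l_count = quicksort(larger)
    return s_sorted + [pivot] + l_sorted, count + s_count + l_count

n, trials = 1000, 200
total = 0
for _ in range(trials):
    data = random.sample(range(100 * n), n)  # n distinct items
    result, count = quicksort(data)
    assert result == sorted(data)
    total += count
print(total / trials)              # observed average
print(2 * (n + 1) * log(n + 1))    # the upper bound 2(n+1) ln(n+1)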

68. Find the greatest common divisor of 𝟔𝟔, 𝟒𝟒𝟓 and 𝟕𝟑, 𝟐𝟗𝟓.

When we reduce a fraction like 42/60 to 7/10, we divide the numerator and denominator by the highest number that divides both. In this case, it's obvious that the greatest common divisor is 6. But how can we find it for larger numbers? In particular, what if the fraction we're trying to reduce is

66,445/73,295

We can get the greatest common divisor of any two numbers using Euclid’s algorithm. An algorithm is
simply a precisely specified procedure; this one is named after the Greek mathematician who described
it more than 2,300 years ago:

1. If the smaller number divides the larger one, it is the greatest common divisor.
2. Otherwise:
a. Find the remainder after dividing the larger number by the smaller one.
b. Replace the larger number with this remainder.
3. Go back to Step 1.

In our example, we start with two numbers: 66,445 and 73,295. Step 1 does not apply, so we divide,
getting a remainder of 6,850. Replacing the larger number with this remainder, we now have two
numbers: 66,445 and 6,850. Step 1 does not apply, so we divide, getting a remainder of 4,795.
Replacing the larger number with this, we have two numbers 6,850 and 4,795. Step 1 does not apply,
so we divide, getting a remainder of 2,055. Our numbers are now 4,795 and 2,055. Dividing, we get a
remainder of 685. Our two numbers are 2,055 and 685. But 685 does divide 2,055, so we’re done.
The greatest common divisor is 685.
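
In Python, Euclid's algorithm is only a few lines (a sketch in the usual remainder-swapping form; the standard library's math.gcd does the same job):

def gcd(a, b):
    # Replace the pair (a, b) by (b, a mod b) until the remainder is zero.
    while b != 0:
        a, b = b, a % b
    return a

print(gcd(66445, 73295))  # 685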

The procedure is easy enough, but why should we believe it’s correct?

Suppose our two original numbers are 𝑎0 and 𝑎1 . We’ll call the number we’re looking for—the biggest
one that divides both—𝑑. Dividing, and using 𝑎2 to denote the remainder, we get

𝑎0 = 𝑘1 𝑎1 + 𝑎2
for some integer 𝑘1 , which will be of no importance to us (yet). We can rewrite this equation as 𝑎0 −
𝑘1 𝑎1 = 𝑎2 , and—here’s the key—since 𝑑 divides both 𝑎0 and 𝑎1 , it divides both terms on the left,
meaning that it divides 𝑎2 .

We’ve just argued that, since 𝑑 divides 𝑎0 and 𝑎1 , it must also divide 𝑎2 . But now that we know it
divides 𝑎1 and 𝑎2 , we can argue in exactly the same way that it divides 𝑎3 , the next remainder. And
continuing, we can show that it divides each pair of numbers as the algorithm proceeds.

Eventually we get an exact division, that is, a remainder of zero:

𝑎𝑖 = 𝑘𝑖+1 𝑎𝑖+1
In this case, we can see directly that the largest number that divides the two 𝑎’s is the one on the right.

The argument we’ve just given relies critically on certain facts about division and divisibility. One
example is the fact that, if a number divides each of two other numbers (say, 𝑎0 and −𝑘1 𝑎1 ), then it
divides their sum (in this case 𝑎0 − 𝑘1 𝑎1). This is intuitive enough, but let’s see how to justify it more
carefully.

We start by defining what we mean by divisibility. We'll say 𝑎 divides 𝑏—and we'll denote this 𝑎|𝑏—exactly when we can write 𝑏 = 𝑘𝑎 for some integer 𝑘. Remember that integers can be negative, so that 3|−6 (since −6 = −2 ⋅ 3) and −3|6 (since 6 = −2 ⋅ −3).

Using the definition, we can then state the theorem we want and prove it:

Theorem: Given integers 𝑎, 𝑏 and 𝑐, if 𝑎|𝑏 and 𝑎|𝑐 then 𝑎|(𝑏 + 𝑐).

Since 𝑎|𝑏, we have 𝑏 = 𝑘𝑏 𝑎 for some integer 𝑘𝑏 . Likewise, since 𝑎|𝑐, we have 𝑐 = 𝑘𝑐 𝑎, for
some integer 𝑘𝑐 . But then 𝑏 + 𝑐 = 𝑘𝑏 𝑎 + 𝑘𝑐 𝑎 = (𝑘𝑏 + 𝑘𝑐 )𝑎. We’ve written 𝑏 + 𝑐 as an
integer, 𝑘𝑏 + 𝑘𝑐 , times 𝑎. Hence, by the definition of divisibility, 𝑎|(𝑏 + 𝑐).

The Euclidean algorithm relies on another fundamental fact about division: Given any integer, we can
divide by any positive integer and get a unique answer in the form of a quotient and remainder. Note
carefully that we aren’t limiting ourselves to dividing larger numbers by smaller ones: 3 divided by 7
gives a quotient of 0 with a remainder of 3—if we have 3 pennies to divide equally among 7 people, we
give each person no pennies and have three left over. We also aren’t assuming the number being
divided is positive, but what we mean by the quotient and remainder when we divide −7 by 3 is,
perhaps, not clear. We’ll come back to this in a moment.

First, let’s state the claimed fact precisely as a theorem (the “Division Theorem”):

Theorem: Given any integer 𝑎 and any integer 𝑏 > 0, there is a unique integer 𝑞 (the quotient)
and a unique integer 𝑟 (the remainder), such that 𝑎 = 𝑞𝑏 + 𝑟 and 0 ≤ 𝑟 < 𝑏.

Note that the theorem specifies that the remainder must be smaller than the divisor and that it can’t be
negative. This is part of what we mean by the word ‘remainder’; the statement of the theorem makes
that meaning precise and explicit. Now here’s the proof:

Consider the set 𝑆 of all numbers of the form 𝑎 − 𝑘𝑏, where 𝑘 is an integer. Of the numbers in
𝑆 that are zero or higher, pick the smallest one and call it 𝑟. Since 𝑟 is a member of 𝑆, it must be
𝑎 minus some multiple of 𝑏; let 𝑞 designate the multiplier of 𝑏. So we have 𝑟 = 𝑎 − 𝑞𝑏 or 𝑎 =
𝑞𝑏 + 𝑟, as claimed.

There are two things left to prove: (a) that 𝑟 is in the correct range, 0 ≤ 𝑟 < 𝑏 and (b) that 𝑟 and
𝑞 are unique, i.e. that there is no other pair of numbers that also satisfies the conditions given in
the theorem. The first point is easy: 𝑟 must be greater than or equal to zero, since it was chosen
from among elements of 𝑆 that were zero or higher. If it were 𝑏 or higher, then 𝑟 couldn’t have
been the lowest number chosen, since 𝑟 − 𝑏 = 𝑎 − 𝑞𝑏 − 𝑏 = 𝑎 − (𝑞 + 1)𝑏 would also be a
possible choice (it’s of the right form and it’s zero or higher).

For the second point, suppose we find a pair of integers, 𝑟′ and 𝑞′ such that 0 ≤ 𝑟 ′ < 𝑏 and 𝑎 =
𝑞 ′ 𝑏 + 𝑟′. Then, equating our two formulas for 𝑎, we have

𝑞 ′ 𝑏 + 𝑟 ′ = 𝑞𝑏 + 𝑟
or

(𝑞 ′ − 𝑞)𝑏 = 𝑟 − 𝑟′

Now 𝑏 divides the left side, so it must divide the right side as well. But 𝑟 and 𝑟′ are both in the
set {0, 1, … , 𝑏 − 1} and so their difference is strictly less than 𝑏. The only number less than 𝑏
that’s divisible by 𝑏 is zero. So we have 𝑟 − 𝑟 ′ = 0 or 𝑟 = 𝑟′. Also, now that we’ve shown that
the right side of our equation is zero we know that 𝑞 ′ − 𝑞 times 𝑏 is zero. The only way for a
product of two integers to be zero is for one of them to be zero. And the theorem states that 𝑏
is not zero. Hence we must have 𝑞 ′ − 𝑞 = 0 or 𝑞 ′ = 𝑞. In other words, the new pair 𝑟′ and 𝑞′

must be the same as the old pair. There is only one pair that works, i.e. the 𝑟 and 𝑞 of the
theorem are unique.

If you’re careful, you’ll note that even this proof relies on other facts that we haven’t justified, the fact,
for example, that a product of integers is zero only if at least one of the factors is zero. Continuing
backward in this way, we’ll eventually have to stop with a few facts that are simply assumed—
mathematicians would call these axioms. We won’t go farther backward though.

The careful statement of the theorem answers the question we raised earlier: What are the quotient
and remainder if we divide −7 by 3? The theorem tells us that there is only one choice for 𝑞 and 𝑟 in
the equation −7 = 𝑞 ⋅ 3 + 𝑟, so long as 0 ≤ 𝑟 < 3. The unique values that work are 𝑞 = −3 and 𝑟 = 2,
so these must be the quotient and remainder.

We’ve been justifying Euclid’s algorithm for finding the largest number that divides two numbers. To
express the result of that algorithm—the greatest common divisor of 𝑥 and 𝑦—mathematicians write
either gcd(𝑥, 𝑦) or (𝑥, 𝑦); we’ll use the second notation.

69. Given integers 𝒙 and 𝒚, find integers 𝒎 and 𝒏 such that 𝒎𝒙 + 𝒏𝒚 = (𝒙, 𝒚), the greatest common
divisor of 𝒙 and 𝒚.

Before we attempt to find 𝑚 and 𝑛, how do we even know they exist? Is it always true that the greatest
common divisor of two numbers can be expressed as a sum of multiples of those numbers?

Theorem: The greatest common divisor of any integers 𝑥 and 𝑦 can be expressed as 𝑚𝑥 + 𝑛𝑦
for some integers 𝑚 and 𝑛.

Let 𝑆 be the set of all numbers of the form 𝑎𝑥 + 𝑏𝑦, for integers 𝑎 and 𝑏 and let 𝑑 be the
smallest positive element of 𝑆. Since it’s in 𝑆, this number 𝑑 must be the sum of some multiple
of 𝑥 and some multiple of 𝑦; say it’s 𝑚𝑥 + 𝑛𝑦. We’d like to show that 𝑑 = (𝑥, 𝑦), the greatest
common divisor of 𝑥 and 𝑦; that would certainly prove that the greatest common divisor can be
written in the form we want.

First, though, we’ll argue that every element of 𝑆 is a multiple of 𝑑. Take any element of 𝑆, say
𝑎𝑥 + 𝑏𝑦 and divide it by 𝑑. According to the division theorem we proved in Item 68, we can
write the result this way, with 0 ≤ 𝑟 < 𝑑:

𝑎𝑥 + 𝑏𝑦 = 𝑞𝑑 + 𝑟 = 𝑞(𝑚𝑥 + 𝑛𝑦) + 𝑟
But then we have

𝑎𝑥 + 𝑏𝑦 − 𝑞𝑚𝑥 − 𝑞𝑛𝑦 = (𝑎 − 𝑞𝑚)𝑥 + (𝑏 − 𝑞𝑛)𝑦 = 𝑟


This shows that 𝑟 is in 𝑆, since it’s the sum of a multiple of 𝑥 and a multiple of 𝑦. But it’s less
than 𝑑, which is the smallest positive element of 𝑆. Since 𝑟 thus can’t be positive, it must be
zero. We’ve shown that dividing 𝑎𝑥 + 𝑏𝑦 by 𝑑 gives a remainder of zero; that means 𝑎𝑥 + 𝑏𝑦
must be a multiple of 𝑑.

Now 𝑥 ∈ 𝑆, since we can write it as 1 ⋅ 𝑥 + 0 ⋅ 𝑦 and 𝑦 ∈ 𝑆, since 𝑦 = 0 ⋅ 𝑥 + 1 ⋅ 𝑦. Every element of 𝑆 is a multiple of 𝑑, hence 𝑑 divides 𝑥 and 𝑦; that is, we've shown that 𝑑 is a divisor of both, a common divisor.

But is it the greatest common divisor? Sure, because any common divisor, since it divides both
𝑥 and 𝑦, also divides all numbers of the form 𝑎𝑥 + 𝑏𝑦, i.e. every element of 𝑆. Since 𝑑 is one of
those elements, it must divide 𝑑; but then it can’t be bigger than 𝑑.

This shows us that the numbers 𝑚 and 𝑛 exist, but it doesn’t tell us how to find them. As an example, in
Item 68 we found that the greatest common divisor of 66,445 and 73,295 is 685. But what are 𝑚 and
𝑛 such that

𝑚 ⋅ 66,445 + 𝑛 ⋅ 73,295 = 685


In fact, the Euclidean algorithm from Item 68 will tell us, so long as we keep track of some additional
information while applying it. At any stage of the algorithm, we have

𝑎𝑖 = 𝑘𝑖+1 𝑎𝑖+1 + 𝑎𝑖+2


Rewriting, we have

𝑎𝑖+2 = 𝑎𝑖 − 𝑘𝑖+1 𝑎𝑖+1


That is, we can always replace any 𝑎 by an expression involving multiples of the two previous 𝑎’s.
Continuing, we can eventually get an expression for the last non-zero remainder—the greatest common
divisor—as a sum of multiples of the first two, 𝑎0 and 𝑎1 . In our example, we have:
73,295 = 1 ⋅ 66,445 + 6,850
66,445 = 9 ⋅ 6,850 + 4,795
6,850 = 1 ⋅ 4,795 + 2,055
4,795 = 2 ⋅ 2,055 + 685
Rewrite these, isolating the remainders:
73,295 − 1 ⋅ 66,445 = 6,850
66,445 − 9 ⋅ 6,850 = 4,795
6,850 − 1 ⋅ 4,795 = 2,055
4,795 − 2 ⋅ 2,055 = 685
And then substitute, starting at the bottom:
685 = 4,795 − 2 ⋅ 2,055
= 4,795 − 2(6,850 − 1 ⋅ 4,795) = 3 ⋅ 4,795 − 2 ⋅ 6,850
= 3(66,445 − 9 ⋅ 6,850) − 2 ⋅ 6,850 = 3 ⋅ 66,445 − 29 ⋅ 6,850
= 3 ⋅ 66,445 − 29(73,295 − 1 ⋅ 66,445) = 32 ⋅ 66,445 − 29 ⋅ 73,295
That is, the greatest common divisor, 685, can be expressed as the sum of multiples of our original
numbers, with the multipliers being 32 and −29.
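
The back-substitution bookkeeping can be automated; here's a sketch of the standard extended Euclidean algorithm (the function name is ours):

def extended_gcd(x, y):
    # Returns (g, m, n) with m*x + n*y == g == gcd(x, y).
    if y == 0:
        return x, 1, 0
    g, m, n = extended_gcd(y, x % y)
    # g == m*y + n*(x % y) == n*x + (m - (x // y)*n)*y
    return g, n, m - (x // y) * n

g, m, n = extended_gcd(66445, 73295)
print(g, m, n)                           # 685 32 -29
assert m * 66445 + n * 73295 == g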

70. Prove that there are an infinite number of primes.

A prime number, 𝑝, is a positive number divisible by exactly two positive numbers: 1 and 𝑝. Examples
are: 2, 3, 5, 7 and 11. Note that 1 is divisible by only one positive number, so it isn’t prime. Euclid knew
how to prove that there are an infinite number of primes. He argued this way:

If there were only a finite number of prime numbers, then there would be a largest one—call it
𝑝. Now consider 𝑞 = 𝑝! + 1 and let 𝑟 be any prime number that divides 𝑞. This number 𝑟 can’t

be less than or equal to 𝑝, since dividing 𝑞 by any number less than or equal to 𝑝 gives a
remainder of 1. But then, we’ve discovered a prime number larger than the largest prime
number. This is impossible. We reached this contradiction by assuming that the number of
prime numbers must be finite, so the assumption must be false. That is, it must be that there
are an infinite number of prime numbers.

This type of argument is called a proof by contradiction. We temporarily assume the opposite of what
we want to prove, show that this assumption leads to a contradiction and conclude that the assumption
was false—that is, that what we really wanted to prove is true. Although this kind of argument is heavily
used in mathematics, we also rely on it almost without thinking in everyday life:

You say Fred cooked last night? But you can always tell when Fred cooks, because he leaves the
kitchen a mess. And this morning the kitchen was spotless. So Fred can’t have cooked.

If we like, we can use propositional logic to formalize the argument. Let 𝑃 be the proposition “Fred
cooked last night” and let 𝑄 be the proposition “the kitchen is a mess this morning.” Then the argument
depends on the truth of two assertions:

• 𝑃 → 𝑄
• ¬𝑄
The first of these is read "𝑃 implies 𝑄" and it means that if 𝑃 is true, 𝑄 must also be true—in the example, if Fred did cook last night, the kitchen definitely is a mess this morning. The symbol → is new—it's an addition to ∨, ∧ and ¬, the logical operators we've already defined—but it's easy to draw up the truth table defining it. The only way that 𝑃 → 𝑄 can be false is if 𝑃 is true, but—contrary to what we've said—𝑄 is false. This means the truth table must be

𝑷 𝑸 𝑷→𝑸
𝐹 𝐹 𝑇
𝐹 𝑇 𝑇
𝑇 𝐹 𝐹
𝑇 𝑇 𝑇

Note very carefully that 𝑃 → 𝑄 is true when 𝑃 is false. How can it be a lie to say that if Fred cooked, the kitchen is a mess? Only if Fred did cook and the kitchen is clean; if Fred did not cook, the statement can't be false.

Now the argument about Fred starts from the premises 𝑃 → 𝑄 and ¬𝑄 and concludes ¬𝑃. Is this valid?
Again, we can simply check the truth table. For 𝑃 → 𝑄 to be true, we must have values for 𝑃 and 𝑄
corresponding to the first, second or fourth rows of the truth table just above. For ¬𝑄 to be true, we
must have values corresponding to the first or third rows. The only possible way for both to be true is
for the values to be the ones in the first row, where 𝑃 is false.

The general rule—from 𝑃 → 𝑄 and ¬𝑄 conclude ¬𝑃—goes by the Latin name modus tollens. This is
one of a large number of logically valid arguments. You may also see the term modus ponens used to
refer to one of the most basic valid arguments—from 𝑃 and 𝑃 → 𝑄 conclude 𝑄.

We've just justified the form of the argument used in proving that there are an infinite number of prime numbers. But there is still an important hole in the proof. After defining 𝑞 we said: let 𝑟 be any prime number that divides 𝑞. But how do we know there are any prime numbers dividing some arbitrary number like 𝑞?

Theorem: For every integer 𝑛 > 1, there is a prime 𝑝 such that 𝑝|𝑛.

The proof is by induction on 𝑛. If 𝑛 = 2, then we can let 𝑝 = 2. Suppose then that the claim is
true for all numbers up to some value 𝑛 and consider 𝑛 + 1. If this number is itself prime, then
we can let 𝑝 = 𝑛 + 1. If not, then it is divisible by some number other than 1 and 𝑛 + 1, say 𝑟
where 2 ≤ 𝑟 ≤ 𝑛. By the induction hypothesis, 𝑟 is divisible by some prime 𝑝. But if 𝑛 + 1 = 𝑟𝑘
(for some 𝑘) and 𝑟 = 𝑝𝑘′ (for some 𝑘′) then 𝑛 + 1 = 𝑝𝑘′𝑘, that is, 𝑝|(𝑛 + 1).

One handy consequence of this theorem is that, if two numbers have a factor in common, then they
have a prime factor in common. By the theorem, the common factor must itself be divisible by a prime,
and then this prime divides both of the original numbers.

71. Show that the set of positive integers less than 𝒏 and relatively prime to 𝒏 form a group under
multiplication mod-𝒏.

When we do clock arithmetic, we remove multiples of 12. For example, if it’s 9 o’clock now, what time
is it seven hours later? We get the answer by computing 9 + 7 = 16 and removing 12 to get 4 o’clock.
What time will it be 700 hours later? The answer is 9 + 700 − 59 ⋅ 12 = 1, i.e. it will be 1 o’clock.

This is called modular arithmetic, and it works just as well with numbers other than 12. Clock arithmetic
is called mod-12. If we leave out multiples of, say, 7, it’s mod-7 arithmetic. We write things this way,
for example: 32 ⋅ 6 ≡ 3 (mod 7), since 32 ⋅ 6 − 27 ⋅ 7 = 3. The expression 16 ≡ 2(mod 7) is read “16
is equivalent to 2 mod 7.”

Some interesting things happen if we pick a number 𝑛—say, 𝑛 = 9—and look at the set of all numbers
less than 𝑛 that have no factors in common with it, that is, numbers 𝑥 for which (𝑥, 𝑛) = 1. When
(𝑥, 𝑛) = 1, we say that 𝑥 and 𝑛 are relatively prime. In the case of 𝑛 = 9, the relatively prime numbers
are 1, 2, 4, 5, 7 and 8.

The first interesting thing about the set {1, 2, 4, 5, 7, 8} is that, working mod-9, multiplying these
numbers together always gives us an answer back in the set. For example 4 ⋅ 5 ≡ 2 (mod 9).

The second interesting thing is that every number in the set has a multiplicative inverse in the set—that
is, every number in the set is paired with another one such that the product of the two is 1. For
example, 5 ⋅ 2 ≡ 2 ⋅ 5 ≡ 1 (mod 9). Normally—when we’re not doing modular arithmetic—the inverse
of 5 is the reciprocal, 1/5. What we’re saying is that, in the set we’re discussing, the modular equivalent
of the reciprocal of each member is also a member of the set.

We’ve been claiming that sets like the one we’ve been looking at have certain properties. Let’s prove
the claims.

We start by giving a name, 𝐺𝑛 , to the set of all numbers smaller than 𝑛 that are relatively prime to 𝑛.
When we multiply any two of these, 𝑥 and 𝑦, mod-𝑛 we get 𝑥 ⋅ 𝑦 − 𝑘 ⋅ 𝑛 for some 𝑘. Any chance that

this has a factor in common with 𝑛? If so, then, as we saw at the end of Item 70, there must be a
common factor that’s prime. Since this factor divides 𝑘 ⋅ 𝑛, it would also have to divide 𝑥 ⋅ 𝑦. But a
prime number that divides 𝑥 ⋅ 𝑦 would have to divide either 𝑥 or 𝑦 and that’s not possible, since these
are in 𝐺𝑛 . So we’ve now proved that 𝑥 ⋅ 𝑦 (mod 𝑛) has no factors in common with 𝑛 and hence must be
a member of 𝐺𝑛 .

Another way to say the same thing is that 𝐺𝑛 is closed under multiplication.

By the way, this proof relies on the fact that, if a prime number divides a product 𝑥 ⋅ 𝑦, then it must
divide one of the factors. This follows from a basic fact: every integer greater than 1 can be factored in
just one way into primes. For example, 42 is 2 ⋅ 3 ⋅ 7 and 63 is 3 ⋅ 3 ⋅ 7 and there is no other collection of
primes in either case that gives the same product.⁷ Given this, we argue as follows. To say that 𝑝|𝑥𝑦
means that 𝑝𝑘 = 𝑥𝑦, for some 𝑘. If we factor both sides of this equation into primes (by factoring 𝑘 on
the left and 𝑥 and 𝑦 on the right), the left will include 𝑝, so the right must as well; and since the factors
on the right are precisely the factors of 𝑥 and 𝑦, one of these must have 𝑝 as a factor.

Now we’d like to show that every element of 𝐺𝑛 has a multiplicative inverse in 𝐺𝑛 . Let 𝑥 ∈ 𝐺𝑛 . Since
(𝑥, 𝑛) = 1, we know from Item 69 that there are integers 𝑎 and 𝑏 such that 𝑎𝑥 + 𝑏𝑛 = 1. The number
𝑎 must not have any factors in common with 𝑛; if it did, the whole left side of the equation would be
divisible by this factor and, hence, so would the right side; but 1 can’t be divisible by any such factor.
Since 𝑎 has no factors in common with 𝑛, we have 𝑎 ∈ 𝐺𝑛 . But since 𝑎𝑥 + 𝑏𝑛 = 1, we have 𝑎𝑥 = 1 −
𝑏𝑛 or 𝑎𝑥 ≡ 1 (mod 𝑛); that is, 𝑥 has an inverse in 𝐺𝑛 , namely 𝑎.

The set 𝐺9 = {1,2,4,5,7,8} is interesting, as we’ve said, because (a) it’s closed under multiplication and
(b) every element has an inverse in the set. Taken together, these two properties imply that 1 is in the
set. If we add in the property that multiplication is associative—i.e. that 𝑎 ⋅ (𝑏 ⋅ 𝑐) = (𝑎 ⋅ 𝑏) ⋅ 𝑐 for any
three elements 𝑎, 𝑏 and 𝑐—then we have what mathematicians call a group. Associativity goes without
saying for 𝐺𝑛 , since the multiplication operation is based on ordinary multiplication, which we know is
associative.

72. Prove that the order of a subgroup must divide the order of the group.

In Item 71, we defined a group to be a set 𝐺 with elements that can be combined with some kind of
associative operation that we’ll write and talk about as if it’s multiplication, e.g. we write 𝑔1 ⋅ 𝑔2 or 𝑔1 𝑔2
and say “𝑔1 times 𝑔2 ,” but which can be defined in any way so long as the set is closed under the
operation and each element has an inverse in the set. We’ll write the inverse of an element 𝑔 as 𝑔^(−1)
and the product 𝑔 ⋅ 𝑔^(−1) as 1. Note that this 1 isn’t necessarily the number 1—it’s just something that
has no effect when we multiply another element by it.

The size of a group is called its order and is written this way: |𝐺|. If a subset 𝑆 of the group happens
itself to be a group, we call it a subgroup. We’d like to prove that |𝑆| divides |𝐺|.

As an example, 𝐺15 = {1,2,4,7,8,11,13,14}. A subgroup is 𝑆 = {1,4}—you can easily check that it’s
closed and every element has an inverse. Note that |𝑆| = 2 which does, in fact, divide |𝐺| = 8.

⁷ Of course, the same factors can be arranged in a different order: 42 = 2 ⋅ 3 ⋅ 7 = 7 ⋅ 2 ⋅ 3 = 2 ⋅ 7 ⋅ 3, etc.

To carry out the proof, we define a coset, 𝑔𝑆, of a subgroup 𝑆 to be the elements we get when we
multiply each of the elements of 𝑆 by some element 𝑔 of the group 𝐺. In our example, we might take 𝑔
to be 11. Then 𝑔𝑆 would be {11 ⋅ 1, 11 ⋅ 4} = {11, 14}.

This coset has the same number of elements as the subgroup it’s based on. In fact, this has to be the
case. Suppose we get the same answer twice: 𝑔𝑠1 = 𝑔𝑠2. Then multiplying by 𝑔^(−1) on the left of both
sides yields 𝑠1 = 𝑠2. So conversely, if we multiply 𝑔 by two different elements 𝑠1 and 𝑠2, we must get
different answers. In other words, we must get as many answers in forming the set as there are
elements in 𝑆.

Now suppose we form a coset based on a new group element 𝑔′. Our next claim is that this new coset is
either exactly the same as the first coset or has nothing in common with it. Suppose it has even one
thing in common, i.e. suppose 𝑔𝑠𝑖 = 𝑔′ 𝑠𝑗 for some choice of these four elements. Then consider any
element of the original coset—say 𝑔𝑠𝑘 .

 This element of the original coset is a multiple of 𝑔𝑠𝑖. Specifically, 𝑔𝑠𝑘 = (𝑔𝑠𝑖) ⋅ (𝑠𝑖^(−1) 𝑠𝑘).
 Therefore, the element of the original coset is also a multiple of 𝑔′𝑠𝑗. That is, 𝑔𝑠𝑘 = (𝑔′𝑠𝑗) ⋅ (𝑠𝑖^(−1) 𝑠𝑘).
 But then, since the element of the original coset can be written as 𝑔′ times an element of 𝑆
(namely 𝑠𝑗 𝑠𝑖^(−1) 𝑠𝑘), it is an element of the new coset.

We’ve shown that any element of the original coset must be an element of the new coset. An exactly
analogous argument shows that the reverse is true. Since neither has any elements that aren’t in the
other one, the two cosets are identical.

Now suppose we form all the cosets of the subgroup. Each will have |𝑆| elements and each element of
|𝐺| will appear in exactly one coset; we’ve just shown that an element can’t be in two different cosets
and each element 𝑔 will be in some coset—since 𝑆 contains 1, 𝑔 is a member of 𝑔𝑆. The upshot is that
we’ve partitioned 𝐺 into subsets, each containing |𝑆| elements; that is, finally, |𝑆| divides |𝐺|.

As an example, here are the cosets for the subset 𝑆 = {1,4} of 𝐺15 :
{1,4}, {11,14}, {2,8}, {7,13}

Showing that we can divide 𝐺15 into subsets of size 2 is proof that |𝐺15 | is divisible by 2.
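Here's the same partition computed by brute force, a small sketch with variable names of our own choosing:

    from math import gcd

    G15 = {x for x in range(1, 15) if gcd(x, 15) == 1}
    S = {1, 4}

    # Form the coset gS = {g*s mod 15 : s in S} for every g in the group.
    cosets = {frozenset((g * s) % 15 for s in S) for g in G15}
    print(sorted(sorted(c) for c in cosets))
    # [[1, 4], [2, 8], [7, 13], [11, 14]] -- four disjoint blocks of size 2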

This result we’ve proved—that the size of a subgroup must divide the size of the group—is called
Lagrange’s Theorem.

73. Prove that the order of an element in a finite group divides the order of the group.

One way to form a subgroup is to compute all the powers of any individual element. For example,
suppose we compute powers of 4 in 𝐺15 :

{4^0 = 1, 4^1 = 4, 4^2 = 1, 4^3 = 4, …} = {1, 4}

This is the same subgroup we’ve already seen. No matter how many powers we compute, we only get
two elements, 1 and 4.

As a second example, consider powers of 2:

{2^0 = 1, 2^1 = 2, 2^2 = 4, 2^3 = 8, 2^4 = 1, 2^5 = 2, 2^6 = 4, …} = {1, 2, 4, 8}

Again, the answers begin to repeat. Note that, as expected, the order of the subgroup (four in this case)
divides the size of the group (eight). To see that the set {1, 2, 4, 8} actually is a subgroup, we’d need to
check that it’s closed under multiplication and that all elements have inverses in the set. In a minute,
though, we’ll prove in general that a set of powers like this must be a subgroup.

As a last example, consider powers of 7:

{7^0 = 1, 7^1 = 7, 7^2 = 4, 7^3 = 13, 7^4 = 1, 7^5 = 7, 7^6 = 4, …} = {1, 7, 4, 13}

Again, if we check for closure and inverses, we’ll find that this too is a subgroup.

Okay, but why does the set of powers form a subgroup? Well, it’s easy to see that it’s closed, since a
product of powers is itself a power: 𝑎^𝑛 ⋅ 𝑎^𝑚 = 𝑎^(𝑚+𝑛). Also, so long as the original set is finite, we’re
eventually going to run out of possible answers as we compute powers and we’ll repeat. That is, we’ll
get 𝑎^𝑛 = 𝑎^𝑚 for some 𝑛 < 𝑚. Multiplying by 𝑎^(−1) on both sides enough times, we’ll find that some
power of 𝑎 is 1. The lowest positive power of 𝑎 for which that’s true is called the order of 𝑎, since it’s
the number of elements in the subgroup, i.e. if the order is 𝑘, then the subgroup is:

{1, 𝑎^1, 𝑎^2, …, 𝑎^(𝑘−1)}

After this point, the elements clearly repeat.⁸ Also, this shows every element has an inverse: the inverse
of an element 𝑎^𝑛 is 𝑎^(𝑘−𝑛).

The order of 𝑎 is the number of elements in the subgroup of its powers. From Lagrange’s Theorem in
Item 72, we know that this number divides the order of the group. That is: The order of any element
divides the order of the group.
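As a spot-check, this short sketch computes the order of every element of 𝐺15 by repeated multiplication; every order divides |𝐺15| = 8, just as the result predicts:

    from math import gcd

    n = 15
    Gn = {x for x in range(1, n) if gcd(x, n) == 1}

    def order(a, n):
        """Smallest k >= 1 with a^k = 1 (mod n)."""
        k, power = 1, a % n
        while power != 1:
            power = (power * a) % n
            k += 1
        return k

    for a in sorted(Gn):
        print(a, order(a, n))    # the orders are 1, 2 and 4 -- all divide 8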

74. What is the order of 𝑮𝒏?

The question we’re asking is: How many positive numbers less than 𝑛 have no factors in common with
𝑛? The answer is called the totient of 𝑛 and is written this way: 𝜙(𝑛). For example, 𝜙(60) = 16
because we have the following list of 16 numbers relatively prime to 60:

1, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 49, 53, 59
Let’s look at this example in detail. If a number has a factor in common with 60, then it has a prime
factor in common (Item 70). Since 60 = 2^2 ⋅ 3 ⋅ 5, the only factors we need to consider are 2, 3 and 5.
If we rule out multiples of 2, 3 and 5, everything else is in 𝐺60. Let’s define:
𝑆 = {1, 2, 3, … , 60}
𝑀2 = {2, 4, 6, … , 60}
𝑀3 = {3, 6, 9, … , 60}
𝑀5 = {5, 10, 15, … , 60}

Then the numbers we want to rule out are exactly the elements of 𝑀2 ∪ 𝑀3 ∪ 𝑀5 . And we can write

⁸ If a group 𝐺 has an infinite number of elements, this argument doesn’t work. Instead, we let the set of powers of
𝑎 include negative powers (𝑎^(−1), the inverse of 𝑎, 𝑎^(−2), the inverse squared, and so on). In this case, it’s
automatically clear that the set we form includes inverses. Here, we’re only considering finite groups.

|𝐺60 | = 𝜙(60) = |𝑆| − |𝑀2 ∪ 𝑀3 ∪ 𝑀5 |

Since it’s obvious that |𝑆| = 60, all we need to get a formula for 𝜙(60) is to calculate |𝑀2 ∪ 𝑀3 ∪ 𝑀5 |
and we learned how to do that by the inclusion-exclusion principle back in Item 54. The size of the
union of the three sets is

|𝑀2 | + |𝑀3 | + |𝑀5 | − |𝑀2 ∩ 𝑀3 | − |𝑀2 ∩ 𝑀5 | − |𝑀3 ∩ 𝑀5 | + |𝑀2 ∩ 𝑀3 ∩ 𝑀5 |

that is, include the sizes of the individual sets, exclude the sizes of the intersections of pairs and include
the size of the intersection of all three.
Each of these is easy to calculate: There are 60/2 multiples of 2 in 𝑆, so |𝑀2| = 60/2. Likewise,
|𝑀3| = 60/3 and |𝑀5| = 60/5. An element of 𝑀2 ∩ 𝑀3 is both a multiple of 2 and a multiple of 3, so
it’s a multiple of 6 and there are 60/6 of these; that is, |𝑀2 ∩ 𝑀3| = 60/(2 ⋅ 3). Likewise,
|𝑀2 ∩ 𝑀5| = 60/(2 ⋅ 5) and |𝑀3 ∩ 𝑀5| = 60/(3 ⋅ 5). Finally, |𝑀2 ∩ 𝑀3 ∩ 𝑀5| = 60/(2 ⋅ 3 ⋅ 5).
Plugging these in—and remembering that we’re subtracting |𝑀2 ∪ 𝑀3 ∪ 𝑀5| from |𝑆|—we have

𝜙(60) = 60 − (60/2 + 60/3 + 60/5 − 60/(2 ⋅ 3) − 60/(2 ⋅ 5) − 60/(3 ⋅ 5) + 60/(2 ⋅ 3 ⋅ 5))
      = 60 − 60/2 − 60/3 − 60/5 + 60/(2 ⋅ 3) + 60/(2 ⋅ 5) + 60/(3 ⋅ 5) − 60/(2 ⋅ 3 ⋅ 5)
      = 60 (1 − 1/2 − 1/3 − 1/5 + 1/(2 ⋅ 3) + 1/(2 ⋅ 5) + 1/(3 ⋅ 5) − 1/(2 ⋅ 3 ⋅ 5))
The pattern is clear from this example, and it wouldn’t be hard to write out an analogous formula for
𝜙(𝑛) for a value of 𝑛 with more than three prime factors. But the formula would be very long—we’d
have a term for each of the factors, one for each pair, one for each set of three and so on. In fact, there
would be one term for each subset of factors; for 𝑘 factors, that’s 2^𝑘 terms!

Luckily, we can write the expression inside the parentheses in a much more compact form:
(1 − 1/2) (1 − 1/3) (1 − 1/5)
As we’ve seen, expanding out a product of binomials like this gives us a term for each combination of
one term from the first factor, one term from the second factor and one from the third. If we choose
the 1 from each factor, we get 1 ⋅ 1 ⋅ 1 = 1, the first term of our parenthetical expression. Choosing 1
from two of the factors and a negative fraction from the third, we get the negative fraction—this
accounts for the next three terms in the parenthetical. Choosing 1 from one factor and negative
fractions from two factors, the negatives cancel and we get the next three terms of the parenthetical,
with the proper positive sign. Lastly, choosing all negative fractions, we get the product of three
negatives for the last term. Note that each additional fraction adds one factor of −1, flipping the sign so
that it matches what we want.

In short, if 𝑛 has prime factors 𝑝1 , 𝑝2 , … , 𝑝𝑘 , then we have the following formula:


𝜙(𝑛) = 𝑛 (1 − 1/𝑝1) (1 − 1/𝑝2) ⋯ (1 − 1/𝑝𝑘)
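The formula is easy to put to work. Here's a small sketch (the function names are ours) that computes 𝜙(𝑛) by the product formula and checks the result against direct counting for 𝑛 = 60:

    from math import gcd

    def prime_factors(n):
        """The distinct primes dividing n, by trial division."""
        factors, d = set(), 2
        while d * d <= n:
            while n % d == 0:
                factors.add(d)
                n //= d
            d += 1
        if n > 1:
            factors.add(n)
        return factors

    def phi(n):
        """Euler's totient: n times the product of (1 - 1/p)."""
        result = n
        for p in prime_factors(n):
            result = result // p * (p - 1)    # exact integer arithmetic
        return result

    assert phi(60) == 16 == sum(1 for x in range(1, 60) if gcd(x, 60) == 1)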

75. Prove that 𝒂^𝝓(𝒏) ≡ 𝟏 (mod 𝒏), if (𝒂, 𝒏) = 𝟏.

Consider the group 𝐺𝑛. If 𝑎 ∈ 𝐺𝑛, then its order, 𝑘, divides the order of the group, 𝜙(𝑛); we proved that
this is true for any group in Item 72. So we can write 𝜙(𝑛) = 𝑘𝑚, for some 𝑚. But then
𝑎^𝜙(𝑛) = 𝑎^(𝑘𝑚) = (𝑎^𝑘)^𝑚 ≡ 1^𝑚 ≡ 1 (mod 𝑛). This result is known as Euler’s theorem:

𝑎^𝜙(𝑛) ≡ 1 (mod 𝑛) if 𝑎 ∈ 𝐺𝑛, that is, if (𝑎, 𝑛) = 1

We get another important result by looking at the special case where 𝑛 is prime. In this case, it’s more
suggestive to use the letter 𝑝. Euler’s theorem tells us that 𝑎^𝜙(𝑝) ≡ 1 (mod 𝑝). But since 𝑝 is prime, it
has no factors in common with any smaller number. So 𝜙(𝑝) = 𝑝 − 1. Thus we have Fermat’s “little”
theorem:

𝑎^(𝑝−1) ≡ 1 (mod 𝑝) for any prime 𝑝 and any 𝑎 not divisible by 𝑝


Note that saying 𝑎 is not divisible by 𝑝 is equivalent to saying that (𝑎, 𝑝) = 1, which is what we need for
Euler’s theorem to apply; the only possible factor that 𝑎 can have in common with a prime, 𝑝, is 𝑝 itself.
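Both theorems are easy to check numerically, since Python's built-in pow does modular exponentiation. A quick sketch:

    from math import gcd

    # Euler's theorem with n = 9, where phi(9) = 6:
    n, phi_n = 9, 6
    for a in range(1, n):
        if gcd(a, n) == 1:
            assert pow(a, phi_n, n) == 1

    # Fermat's little theorem with the prime p = 13:
    p = 13
    for a in range(1, p):
        assert pow(a, p - 1, p) == 1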

76. Show how to use Fermat’s little theorem for public-key encryption.

When you go to a website and enter your login and password, these get communicated to the site via a
string of intermediary computers. What keeps the owners of those computers from looking at your
login information?

As you probably know, the login information is encrypted before it gets sent, so that it won’t be
readable by anyone but the intended receiver. But that still leaves an open question. When you and
the site communicate to agree on the way that the information will be encrypted, can’t the
intermediaries listen in on that communication? And if they learn how the information will be
encrypted, can’t they then simply reverse the process and decrypt it?

Oddly enough, much of modern life depends on the fact that—for now—knowing how information is
encrypted doesn’t mean knowing how to decrypt it. A system called RSA, invented in the 1970s, allows
a site to publicly announce how to encrypt messages sent to it. Everyone knows how to encrypt; but
only the receiver knows how to decrypt. Very surprisingly, this public key encryption system depends on
Fermat’s little theorem, which dates from 1640!

Suppose we have a positive number, 𝑎, which is divisible by neither of two primes 𝑝 and 𝑞. Since 𝑎 isn’t
divisible by 𝑝, neither is 𝑎^(𝑞−1). So by Fermat’s little theorem we have

(𝑎^(𝑞−1))^(𝑝−1) ≡ 1 (mod 𝑝)

That is, 𝑎^((𝑞−1)(𝑝−1)) ≡ 1 (mod 𝑝), which in turn means 𝑎^((𝑞−1)(𝑝−1)) − 1 is divisible by 𝑝.

Likewise, if 𝑎 isn’t divisible by 𝑞, then neither is 𝑎^(𝑝−1). So the theorem tells us that

(𝑎^(𝑝−1))^(𝑞−1) ≡ 1 (mod 𝑞)

This means 𝑎^((𝑞−1)(𝑝−1)) ≡ 1 (mod 𝑞), which in turn means 𝑎^((𝑞−1)(𝑝−1)) − 1 is divisible by 𝑞.

But if 𝑎(𝑞−1)(𝑝−1) − 1 is divisible by two different primes, then it must be divisible by the product 𝑝𝑞.
So, we have

𝑎^((𝑝−1)(𝑞−1)) ≡ 1 (mod 𝑝𝑞)

We can put both sides of this to an arbitrary power 𝑠, giving us

𝑎^(𝑠(𝑝−1)(𝑞−1)) ≡ 1 (mod 𝑝𝑞)

and then multiply by 𝑎 on both sides to get

𝑎^(1+𝑠(𝑝−1)(𝑞−1)) ≡ 𝑎 (mod 𝑝𝑞)    (∗)


We’ve marked this with an asterisk so we can refer back to it.

Now, look at 𝐺(𝑝−1)(𝑞−1). Pick any element 𝑐, that is, a number that has no factors in common with
(𝑝 − 1)(𝑞 − 1). Being in a group, it has an inverse, say 𝑑. That is

𝑐𝑑 ≡ 1 (mod (𝑝 − 1)(𝑞 − 1))


Or, putting the same thing another way, for some integer 𝑠 we have

𝑐𝑑 = 1 + 𝑠(𝑝 − 1)(𝑞 − 1)
What the result marked above with an asterisk tells us is that

𝑎^(𝑐𝑑) ≡ 𝑎 (mod 𝑝𝑞)

Now, here’s the application. If we need to receive a secure message—say a user’s password or credit
card number—we start by picking two very large prime numbers 𝑝 and 𝑞. Let’s say each is 200 digits
long. We find a number 𝑐 that has no factors in common with (𝑝 − 1)(𝑞 − 1). Then we publicly
announce 𝑐 and the product 𝑝𝑞, which we’ll call 𝑁. Note that, if 𝑝 and 𝑞 are 200 digits long, then 𝑁 will
be about 400 digits long.

The user’s computer takes the private message and breaks it into 400-digit pieces (turning the message
into digits in the first place is standard—think ASCII or Unicode). Then for each piece, call it 𝑎, the
computer sends us the result of calculating 𝑎^𝑐 (mod 𝑁).

To decode, we find the inverse of 𝑐 (mod (𝑝 − 1)(𝑞 − 1)), call it 𝑑, and calculate
(𝑎^𝑐)^𝑑 ≡ 𝑎^(𝑐𝑑) ≡ 𝑎 (mod 𝑝𝑞). This gives us 𝑎, the original message.
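Here is the whole scheme in miniature. Everything in this sketch is a toy choice of ours (tiny primes, a one-number message), so it illustrates the arithmetic, not realistic parameters:

    from math import gcd

    p, q = 61, 53                      # the two secret primes (toy-sized)
    N = p * q                          # the published modulus: 3233
    m = (p - 1) * (q - 1)              # (p - 1)(q - 1) = 3120, kept secret
    c = 17                             # the published exponent
    assert gcd(c, m) == 1              # c must share no factor with m
    d = pow(c, -1, m)                  # inverse of c mod m (Python 3.8+): 2753

    a = 65                             # one piece of the message, less than N
    encrypted = pow(a, c, N)           # the sender computes a^c (mod N)
    assert pow(encrypted, d, N) == a   # the receiver recovers a via (a^c)^d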

But why can’t other people do the same thing?

The problem is that, to decode, you need to find 𝑑, which is the inverse of 𝑐 (mod (𝑝 − 1)(𝑞 − 1)). In
order to do this, it’s necessary to know 𝑝 and 𝑞, but these numbers are not public. All we’ve published is
the product 𝑁 = 𝑝 ⋅ 𝑞. Of course, we know 𝑝 and 𝑞, but anyone else who wants to know them will have
to factor the 400-digit number 𝑁 and no one knows how to do this efficiently. In fact, it’s widely
believed that it’s impossible to use an ordinary computer—even a fantastically fast one—to solve this
factoring problem efficiently. A new kind of machine—called a quantum computer—could solve the
factoring problem. But full-scale quantum computers can’t yet be built. For the present, RSA is secure.

77. How do the sizes of the sets ℕ, ℤ, 𝟐ℤ, ℚ and ℝ compare?

All of the symbols in this question except 2ℤ were defined in Item 55. The set 2ℤ consists of elements
that we get by multiplying all elements in ℤ by 2, i.e. {… , −6, −4, −2, 0, 2, 4, 6, … }.

In general, we often define a set by saying what form or property the elements have. The standard
notation looks like this:

𝑆 = {𝑥 | 𝑥 has property 𝑃}
We read this: 𝑆 is the set of elements 𝑥 that have property 𝑃. Or, if the elements have a certain form,
the notation would be:

𝑆 = {𝑓(𝑥) | 𝑥 ∈ 𝑇}

We’d read this: 𝑆 is the set of all elements of the form 𝑓(𝑥) where 𝑥 is an element of 𝑇.

Sometimes we combine the two, defining a set as all elements of another set that have a certain
property:

𝑆 = {𝑥 ∈ 𝑇 |𝑥 has property 𝑃}
that is, 𝑆 consists of the elements in 𝑇 that have the property.

To illustrate the notation, we could say

2ℤ = {𝑥 ∈ ℤ | 2 divides 𝑥}

or

2ℤ = {2𝑥 | 𝑥 ∈ ℤ}

Now, what about the relative sizes of ℤ and 2ℤ? It’s tempting just to say that both are infinite and leave
it at that. But mathematicians have a more subtle way of answering the question—one that, as it
happens, has ramifications for computer science.

Let’s say that two sets have the same size—cardinality is the technical term—if their elements can be
matched up one-to-one. This is clearly exactly what we mean when we say that {1, 2, 3} and {2, 4, 6}
are sets of the same size.

But now we can conclude that ℤ and 2ℤ are also the same size, since we can match up each element 𝑥
in ℤ with the element 2𝑥 in 2ℤ. To check that this really is a one-to-one correspondence, we need to
make sure that all elements in 2ℤ are matched and that none is matched twice, but this is quite clear
here, so the conclusion is valid.

What about ℕ and ℤ? Again, a one-to-one correspondence is easy enough to exhibit:

ℕ ℤ
0 0
1 −1
2 1
3 −2
4 2
⋮ ⋮

By the way, it’s not essential that we show a formula that matches elements of one set with those of the
other. A table like the one above makes the point just as well. But we can give a formula in this case, if
we like. We’ve matched every 𝑥 ∈ ℕ with (−1)^𝑥 ⌈𝑥/2⌉.

Since we have a one-to-one correspondence between ℕ and ℤ and another between ℤ and 2ℤ, we can
combine them to get a correspondence between ℕ and 2ℤ—just take an element of ℕ, match it with
one in ℤ and then match this with one in 2ℤ. The upshot is that all three sets are the same size.

Could it be that all infinite sets are the same size? This makes intuitive sense. But then, why would the
mathematicians have invented the one-to-one correspondence way of looking at the question? Let’s
continue.

What about ℕ and ℚ? Here it’s not at all obvious that there’s a one-to-one correspondence. Between
any two fractions—elements of ℚ—there are an infinite number of additional elements; between
elements of ℕ there’s nothing. It certainly seems as if the infinite size of ℚ is larger than the infinite size
of ℕ. By the mathematician’s definition, though, the sizes again turn out to be the same. Here’s a
picture of how we can begin to think about the problem:


(4/1)₆  ⋰
(3/1)₃  (3/2)₇  ⋰
(2/1)₁  (2/2)₄  (2/3)₈  ⋰
(1/1)₀  (1/2)₂  (1/3)₅  (1/4)₉  ⋯

The subscripts show how we assign an element of ℕ to every positive fraction. This isn’t yet a one-to-
one correspondence for several reasons, but each of them is easy to address. First, we’ve left out the
rational number zero; no problem, we’ll match the natural number zero to that and start the subscripts
from one in our diagram. Second, we left out all the negative fractions in ℚ; no problem, as we count
through ℕ assigning subscripts, we’ll match one with a positive fraction, like 2/3, and the next one with
the corresponding negative fraction, e.g. −2/3. Lastly, our picture shows elements of ℚ more than once;
for example, 1/1 is the same element as 2/2. No problem, as we assign subscripts, we’ll skip any
elements we’ve already seen.

As you can see, the only really important thing is that we’ve found a way to list or enumerate the
rational numbers:

 Start with zero.


 Run through the fractions in the picture above in the order given by the subscripts. For each
new one, list it and its negative.
The beginning of the list looks like this: 0, 1, −1, 2, −2, 1/2, −1/2, 3, −3, 1/3, −1/3, 4, −4, 3/2, −3/2,
2/3, −2/3, 1/4, −1/4, …. Matching these with 0, 1, 2, 3, 4, …, we really do get a one-to-one
correspondence with ℕ.
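The enumeration is short enough to program. Here's a sketch of one way to do it (the function name is ours), walking the diagonals where numerator plus denominator is 2, 3, 4, … and skipping duplicates exactly as described:

    from fractions import Fraction
    from itertools import islice
    from math import gcd

    def rationals():
        """Enumerate Q: zero first, then each new fraction with its negative."""
        yield Fraction(0)
        total = 2                      # walk the diagonals p + q = 2, 3, 4, ...
        while True:
            for q in range(1, total):
                p = total - q
                if gcd(p, q) == 1:     # skip repeats like 2/2 = 1/1
                    yield Fraction(p, q)
                    yield Fraction(-p, q)
            total += 1

    print([str(x) for x in islice(rationals(), 11)])
    # ['0', '1', '-1', '2', '-2', '1/2', '-1/2', '3', '-3', '1/3', '-1/3']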

In short, any infinite set that can be enumerated can be put into one-to-one correspondence with ℕ and
this includes ℤ, 2ℤ and ℚ. All four infinite sets are the same size.

Suppose, though, that we could enumerate the numbers in ℝ. Then we could certainly enumerate the
smaller set {𝑥 ∈ ℝ | 0 ≤ 𝑥 ≤ 1}. But we can show that this is impossible. Suppose we had such an
enumeration. It might start like this:

0 . 𝟏9567 ⋯
1 . 2𝟓001 ⋯
2 . 09𝟗34 ⋯
3 . 370𝟗1 ⋯
4 . 8083𝟏 ⋯
⋮ ⋮

Now watch what happens if we create a new number by running down the diagonal and increasing each
digit we see by one (unless it’s 9, in which case we decrease it to 8). In the example, the diagonal,
marked in bold, is . 15991 ⋯; so the new number is . 26882 ⋯. This is certainly a real number between
zero and one, so it should be in the list, but it can’t be, since we’ve constructed it so as to make it
different in at least one digit from every number in the list. This particular list is definitely not a
complete enumeration of the real numbers between 0 and 1.
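The diagonal construction is mechanical enough to code. A sketch, assuming the listed numbers are handed to us as strings of digits:

    def diagonal_escape(listing):
        """Build a number that differs from the i-th entry in its i-th digit."""
        digits = []
        for i, number in enumerate(listing):
            d = int(number[i])
            digits.append('8' if d == 9 else str(d + 1))
        return '.' + ''.join(digits)

    listing = ['19567', '25001', '09934', '37091', '80831']
    print(diagonal_escape(listing))    # .26882 -- differs from every row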

But this attack succeeds on every list, so no enumeration can be complete. That is, it’s not possible to
put ℕ in one-to-one correspondence with ℝ. By the mathematician’s definition, the size of ℝ is greater
than the size of ℕ or even of ℚ.

78. Show that there is no one-to-one correspondence between 𝑺 and 𝟐^𝑺 for any set 𝑺.

The diagonal proof we gave in Item 77 can be modified to produce a much more general result. Take
any set 𝑆 and consider the set consisting of all its subsets. We call this the power set of 𝑆 and write it
2^𝑆. We’ve known for a long time that if 𝑆 is finite, say |𝑆| = 𝑛, then |2^𝑆| = 2^𝑛. But what can we say
about the cardinality of 2^𝑆 when 𝑆 is infinite?

If 𝑆 is infinite, then, as we saw in Item 77, it may be impossible to enumerate the set’s elements.
Temporarily, though, let’s act as if we can; that is, let’s suppose that we can write 𝑆 = {𝑠1, 𝑠2, 𝑠3, …}. In
this case, watch what happens if we try to get a one-to-one correspondence between elements of 𝑆 and
elements of 2^𝑆:

𝑠1 𝑠2 𝑠3 𝑠4 ⋯
𝑠1 𝑌 𝑁 𝑁 𝑌 ⋯
𝑠2 𝑁 𝑁 𝑌 𝑁 ⋯
𝑠3 𝑌 𝑌 𝑌 𝑁 ⋯
𝑠4 𝑁 𝑌 𝑌 𝑁 ⋯

The left-hand column in this table shows the elements of 𝑆. For each, reading across the row, we have
the subset it’s matched with: 𝑌 indicating that the element above is in the subset and 𝑁 indicating that
it’s not. For example, reading the first row, we see that 𝑠1 has been matched with a subset that starts

{𝑠1 , 𝑠4 , … }. The problem is that we can read down the diagonal (𝑌𝑁𝑌𝑁 ⋯), flip the yes’s to no’s and vice
versa (𝑁𝑌𝑁𝑌 ⋯) and create a new subset that’s not in the listing. It differs from the subset matched
with 𝑠1 as to whether or not it contains 𝑠1; it differs from the subset matched with 𝑠2 as to whether or
not it contains 𝑠2 and so on. Since every proposed one-to-one correspondence can be shown to be
incomplete in this way, none is possible.

The argument we’ve just given is not generally valid because, as we’ve pointed out, it’s not always
possible to enumerate the elements of 𝑆. If 𝑆 can’t be enumerated, proposed one-to-one
correspondences can’t be pictured as tables like the one in the example.

In fact, though, we can always make our diagonal argument. The crux was forming a subset that’s
provably different from all the ones matched by elements of 𝑆 and we did it by making sure the subset
matched with an element 𝑠 was different from this new subset with respect to whether or not it
contained 𝑠.

In general, take any proposed correspondence between 𝑆 and 2^𝑆 and form the diagonal subset

𝐷 = {𝑠 ∈ 𝑆 | 𝑠 not in the set matched with 𝑠}

All we need to show is that this subset is not matched with any element; this would mean that the
proposed one-to-one correspondence is incomplete. Suppose, on the contrary, that 𝐷 is matched with
some element 𝑡 ∈ 𝑆. This is impossible, because 𝐷 differs from the subset matched with 𝑡 with respect
to whether it contains 𝑡; if 𝑡 ∈ 𝐷, that must be because it’s not in the subset matched with 𝑡; if 𝑡 ∉ 𝐷,
that must be because it is in the subset matched with 𝑡. Either way, we’ve confirmed that the subset in
question is not 𝐷.
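For a finite set we can watch the diagonal subset defeat every possible matching. A brute-force sketch, representing a proposed matching as a Python dict from elements to subsets:

    from itertools import product

    S = {'a', 'b', 'c'}
    subsets = [frozenset(c) for c in
               [(), ('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'),
                ('b', 'c'), ('a', 'b', 'c')]]

    def diagonal(matching):
        """D: the elements that are not in the subset they're matched with."""
        return frozenset(s for s in S if s not in matching[s])

    # Try every way of matching the 3 elements with 3 of the 8 subsets.
    for choice in product(subsets, repeat=3):
        matching = dict(zip(sorted(S), choice))
        assert diagonal(matching) not in choice    # D is never matched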

79. Show that just recognizing whether a string is in a specified collection of strings is, in general, a
problem too hard for computer programs.

We’re going to prove mathematically that there are limits to computing power. The proof will follow
from analysis of a simple kind of programming problem: We specify a collection of strings, like {apple,
aardvark, amplitude, ax, amiable, ⋯ }, and ask if it’s possible to write a computer program that will tell us
if any arbitrary string is in the collection.

In the case of this example, the answer is yes. The collection we have in mind is the set of all English
words that start with the letter ‘a’. There’s a finite number of these and it’s straightforward to write a
program that checks any string to see if it’s one of them.

In fact, for any finite collection of strings, the same argument shows that such a program is possible.

What if our collection is all strings of letters that start with the letter ‘a’—including ones like
apnoht and azazz? This is an infinite set, but it’s still obvious that a recognizing program is possible. The
program takes a string, checks the first letter and uses an if-else statement to say whether the string is
in the collection or not.

To talk precisely about more complex problems of this kind, we need some terminology. First, we need
to specify a finite set of characters that may be used to form strings; we’ll call this the alphabet. For our
purposes, the only important thing is that the alphabet should include all characters used in the
programming language we want to employ in writing programs. We can assume that it’s something like

ASCII or Unicode. We’ll also feel free to assume that the alphabet has some natural order, extending the
familiar alphabetical order; in the case of ASCII or Unicode character sets, this might as well be the
numerical order of the character codes.

A string is then just one or more characters of the alphabet in a specified order. We don’t include
infinite strings of characters.

We define a language to be any precisely-specified collection of strings. Note that this use of the word
‘language’ has nothing to do with natural languages like Chinese and Spanish and also nothing to do
with programming languages. It’s just a way of referring to a set of strings.

Finally, we need to be precise about what we mean by a program that recognizes what strings are in a
language. We say a program accepts a language 𝐿 if:

 When we run the program, it asks us to type in a string 𝑠.


 The program is then guaranteed to produce output as follows:
o If 𝑠 ∈ 𝐿, the output is ‘Yes’
o If 𝑠 ∉ 𝐿, the output is ‘No’

A program that doesn’t ask for input doesn’t qualify. Neither does one that sometimes runs forever
without giving an answer or one that ever gives a wrong answer.

We’ll refer to a program that asks for a string and then definitely produces output of either ‘Yes’ or ‘No’
as a recognizer. Such a program is of the right form and it accepts some language—just not necessarily
the language 𝐿 that we have in mind.

Given all this terminology, we can now make our argument. Let 𝑆 be the set of all strings. This set can
be enumerated: We first list all one-character strings in alphabetical order, then all two-character strings
in alphabetical order, all three-character strings and so on. This tells us that the cardinality of 𝑆 is the
same as the cardinality of ℕ.
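The enumeration is easy to program. A sketch over a tiny two-letter alphabet, standing in for ASCII or Unicode:

    from itertools import count, islice, product

    ALPHABET = 'ab'

    def all_strings():
        """All strings: length 1, then length 2, ..., in order within a length."""
        for length in count(1):
            for chars in product(ALPHABET, repeat=length):
                yield ''.join(chars)

    print(list(islice(all_strings(), 10)))
    # ['a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab', 'aba', 'abb']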

Let 𝑅 be the set of all recognizers. A key thing to note is that a recognizer, being a program, is just a
string of characters. We’re used to seeing programs displayed over a number of lines, with indentation
showing the form, but this just means that some of the characters are newlines, spaces and tabs. As
stored internally, the program is just a single long sequence of characters. This means that 𝑅 ⊆ 𝑆 and,
as a consequence, that we can enumerate 𝑅: Just run through the enumeration of 𝑆 and list each
recognizer as you come to it. We conclude that |𝑅| = |ℕ|.

As we enumerate 𝑅, each recognizer we come to solves the language-accepting problem for exactly one
language. Of course, there are many different ways of writing a program that accepts a language, so
many elements of 𝑅 will accept the same language. But the only languages accepted by any recognizer
are the ones accepted by one in this list. If we run down the list of recognizers and note each new
accepted language as we come to it, we will have enumerated all languages for which it’s possible to
write a recognizer. The number of such languages—since we’ve shown they can be enumerated—is |ℕ|.

Of course, this is an infinite number, but we know that there are infinite numbers larger than |ℕ|. In
particular, since languages are just subsets of 𝑆, the number of languages is |2^𝑆| = |2^ℕ| and this is larger
than |ℕ|. There are more languages than recognizers, so there must be some language for which no
recognizer is possible.

We can argue this slightly more carefully: If all languages were accepted by some recognizer, then we
could match each language with one. But then we’d have a one-to-one correspondence between 2^𝑆
and a subset of 𝑆, something we proved impossible in Item 78.⁹

In following the formal details of the argument, it’s important not to lose sight of the striking conclusion.
We just proved mathematically that some straightforward programming problems can never be solved.
There are strict limits on what it’s possible to do with a computer. And, if you believe that people are
essentially complex machines, the proof also gives strict limits on the kinds of problems we can solve.

On the other hand, it’s a bit unsatisfying just to know that there exist languages for which it’s impossible
to build a recognizer. We’d like to have an actual example. Is it possible that we can precisely specify a
collection of strings and yet not be able to write a program to recognize it?

The answer is, yes, using the same diagonalization technique we developed in Item 78. Since we can
enumerate recognizers, we can refer to them this way: {𝑟0 , 𝑟1 , 𝑟2 , … }. And then we can define

𝐿 = {𝑟𝑖 ∈ 𝑅 | 𝑟𝑖 is not accepted by 𝑟𝑖 }


As a first step in understanding this definition, remember again that recognizers are programs and
programs are just strings. This language 𝐿 thus consists of certain strings, since its elements are in 𝑅. In
other words, 𝐿 is a bona fide language—a collection of strings. Now take any particular recognizer. To
decide if it’s in 𝐿 we imagine running the recognizer and, when it asks for input, typing in the whole
recognizer program itself as input. If the program says ‘Yes’ then the string we typed in is not in 𝐿; if it
says ‘No’ then the string is in 𝐿. Note that this makes it impossible for this particular recognizer to
accept 𝐿—it gives the wrong answer for the string we just typed. But the same argument makes it
impossible for any other recognizer to accept 𝐿. This language is a specific example of one for which it’s
impossible to write a recognizer. Accepting it is a problem that computers can’t solve.

Two last points. First, the notation obscures the fact that 𝐿 is actually fairly easy to describe in English:
It’s the set of recognizers that don’t accept themselves—all the programs that say ‘No’ when you type
the program itself as input. Note that this is a perfectly precise description—you understand exactly
which strings are being discussed—and yet there’s no way to translate this precise description into an
algorithm.

Second, it’s natural to think of the following apparent loophole. Can’t we build the recognizer we want
this way:

 Ask for an input string.


 Check to see if it’s a recognizer.
o If not, output ‘No’.
o If so, run the recognizer on itself and output the opposite of whatever the recognizer
says.

In fact, it’s possible to use a diagonalization argument to prove that checking to see if a program is a
recognizer is itself impossible. One important part of the definition of a recognizer is that it gives an
answer; a program that runs forever doesn’t qualify. But determining whether or not a program will run

⁹ Actually, we proved that there’s no correspondence between 𝑆 and 2^𝑆, but the diagonal argument works just as
well if we try to match only a subset of 𝑆 with all elements of 2^𝑆.

forever is what computer scientists call the halting problem and, again, diagonalization can be used to
show it can’t be solved.

For now, though, we can rely on a much simpler argument. If we try to turn the design sketch just given
into an actual recognizer program, either the result won’t be a recognizer—in which case it’s not a
correct implementation of the design—or else it will be a recognizer, and then it is already included in our
enumeration of 𝑅. But, since it’s one of the 𝑟𝑖, it must give the wrong answer for this same 𝑟𝑖. Again,
the implementation is incorrect. There’s nothing wrong with the design, per se; it’s just that it can’t be
correctly implemented.
80. Prove that any 𝒏-vertex graph with more than C(𝒏 − 𝟏, 𝟐) edges is connected.

In Item 34, we defined a connected graph to be one in which there’s a path from every vertex to every
other vertex. If we have a complete graph—that is, if we include all C(𝑛, 2) edges—then each vertex is
linked directly to all the others and the graph is certainly connected. If we include no edges, it’s
certainly not connected.

What’s the maximum number of edges we can include and still not have a connected graph? Since the
graph is not connected, any arbitrary vertex, call it 𝑣, must not have a path to all of the others. We’ll call
the set of vertices it can reach (including itself) 𝐾 and the set of those it can’t reach 𝐿. If |𝐾| = 𝑘, then
|𝐿| = 𝑛 − 𝑘.

Now the most edges we can have in this situation is if all vertices in 𝐾 have edges between them and,
likewise for 𝐿. Suppose this is the case and we then add in all edges linking the two sets, that is, all
edges from a vertex in 𝐾 to one in 𝐿. The result will be a complete graph and will thus have C(𝑛, 2)
edges. But the number of edges linking the two sets is 𝑘(𝑛 − 𝑘). Therefore, the maximum number
before we add the extra edges must be C(𝑛, 2) − 𝑘(𝑛 − 𝑘). Again, we’d like to make this as large as
possible. To do this, we need to make 𝑘(𝑛 − 𝑘) as small as possible. A graph of the function 𝑘(𝑛 − 𝑘)
is an upside-down parabola crossing the horizontal axis at 𝑘 = 0 and 𝑘 = 𝑛; it’s lowest at the sides. But
𝐾 contains at least one element, 𝑣, and it can’t contain all 𝑛 elements, since 𝑣 can’t reach everything.
The farthest to the sides that we can get is either 𝑘 = 1 or 𝑘 = 𝑛 − 1. In either case, one of 𝐾 and 𝐿
includes just one element and no edges, and the other includes 𝑛 − 1 elements and C(𝑛 − 1, 2) edges.
The maximum number of edges in an unconnected graph is thus C(𝑛 − 1, 2). Any graph with more
than this number of edges must be connected.
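For small 𝑛 the bound can be checked exhaustively. A brute-force sketch for 𝑛 = 5 (the helper names are ours):

    from itertools import combinations
    from math import comb

    def connected(n, edges):
        """Search outward from vertex 0; is every vertex reachable?"""
        seen, stack = {0}, [0]
        while stack:
            v = stack.pop()
            for a, b in edges:
                for u, w in ((a, b), (b, a)):
                    if u == v and w not in seen:
                        seen.add(w)
                        stack.append(w)
        return len(seen) == n

    n = 5
    all_edges = list(combinations(range(n), 2))    # the C(5, 2) = 10 edges
    threshold = comb(n - 1, 2)                     # C(4, 2) = 6

    # The bound is achieved: a complete graph on {1, 2, 3, 4} plus the
    # isolated vertex 0 has exactly 6 edges and is not connected.
    assert not connected(n, list(combinations(range(1, n), 2)))

    # And every graph with more than 6 edges is connected.
    for m in range(threshold + 1, len(all_edges) + 1):
        for edges in combinations(all_edges, m):
            assert connected(n, edges)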

81. What is the minimum number of edges in a connected graph with 𝒏 vertices?

In Item 34 we defined a cycle as a path from a vertex back to that same vertex. Suppose the path of the
cycle looks like this: 𝑣1 , 𝑣2 , 𝑣3 , … , 𝑣𝑘 , 𝑣1 . Then there are two ways to get from 𝑣1 to 𝑣2 —directly along
the edge connecting them or going in the opposite direction around the cycle: 𝑣1 , 𝑣𝑘 , 𝑣𝑘−1 , … , 𝑣3 , 𝑣2 .

As a consequence, a connected graph with a minimum number of edges must not contain a cycle. If it
did, we could remove one edge—say the one from 𝑣1 to 𝑣2 in our example, which we’ll call 𝑒𝑣1 𝑣2 —and
the graph would remain connected. There’s still a path between every two vertices, because any such
path that used to rely on 𝑒𝑣1 𝑣2 could instead go the other way around.

Now consider a connected graph with no cycles and look for the longest path in it that has no vertex
repeated. This path is of the form 𝑣1 , 𝑣2 , 𝑣3 , …. There can’t be an edge from 𝑣1 to any of the 𝑣𝑖 other
than 𝑣2 , since this would form a cycle. And there can’t be an edge from 𝑣1 to any other vertex, because
that would extend the path, making it longer than the longest. In short, 𝑣1 is connected to only one
other vertex; it’s of degree 1. Every connected graph with no cycles has a vertex of degree 1.

A simple induction argument now shows that the minimum number of edges in a connected graph with
𝑛 vertices is 𝑛 − 1. A graph with one vertex is connected and has no edges, confirming the claim for
𝑛 = 1. If the claim is true for connected graphs with 𝑛 vertices, what about one with 𝑛 + 1? We’ve
seen that it has a vertex 𝑣 of degree 1. Removing 𝑣 and the edge connecting it to the rest of the graph
leaves a connected graph of 𝑛 vertices with a minimum number of edges which, by hypothesis, must be
𝑛 − 1. Adding 𝑣 and its incident edge back in yields a total of 𝑛 edges, proving the claim true for the
next higher number of vertices.

82. Prove that a connected graph of 𝒏 vertices with more than 𝒏 − 𝟏 edges must have a cycle.

Start with a set 𝑆 containing any single vertex 𝑣. Repeatedly find an edge from a vertex in 𝑆 to one not
in 𝑆 and add this vertex to 𝑆.

Note, first, that so long as |𝑆| < 𝑛, it must be possible to continue this process. When |𝑆| < 𝑛, there is
some vertex not in 𝑆, call it 𝑤. Since the graph is connected, there must be a path from 𝑣 to 𝑤. The first
edge along this path from an element in 𝑆 to one outside it is of the kind we want.

Note, second, that there is always a path between any two elements of 𝑆. This follows by induction.
The vertex 𝑣 is connected to itself. And if at any point 𝑆 is connected, then there will be a path from the
next vertex added to any earlier vertex—it starts with the edge connecting the new vertex to one in 𝑆
and the induction hypothesis guarantees that it can continue from there to the target.

Now 𝑆 starts with one vertex, 𝑣, and its size is then incremented each time we find an edge connecting a
vertex in 𝑆 to one outside. When |𝑆| reaches 𝑛, therefore, we will have used just 𝑛 − 1 edges. Since the
graph contains more than 𝑛 − 1 edges, there must be one remaining and this must join two elements in
𝑆. But there is already a path between them. So this edge completes a cycle.

In Item 81, we saw that a connected graph must have at least 𝑛 − 1 edges; now we see that a
connected graph without a cycle can’t have more than 𝑛 − 1 edges. Putting the two together, we
conclude that a connected graph without a cycle has exactly 𝑛 − 1 edges.
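This argument is, in effect, the union-find cycle test that programmers use: grow components one edge at a time, and an edge whose endpoints already lie in the same component completes a cycle. A sketch:

    def has_cycle(n, edges):
        """Union-find: an edge inside an existing component closes a cycle."""
        parent = list(range(n))

        def find(v):                   # walk up to the component's root
            while parent[v] != v:
                v = parent[v]
            return v

        for a, b in edges:
            ra, rb = find(a), find(b)
            if ra == rb:               # already connected: this edge is extra
                return True
            parent[ra] = rb            # otherwise merge the two components
        return False

    assert not has_cycle(4, [(0, 1), (1, 2), (2, 3)])      # a tree
    assert has_cycle(4, [(0, 1), (1, 2), (2, 3), (3, 0)])  # one edge too many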

Graphs of this kind are a major focus in graph theory and of fundamental importance in computer
science. Using the term acyclic to mean “having no cycles,” we define a tree to be an acyclic connected
graph. Here’s a single example, pictured in three ways:

The picture on the left is clearly acyclic and connected, but it’s not obvious why we’d want to call it a
tree. The middle picture, on the other hand, might be a family tree with 𝑑 being the parent of 𝑒, 𝑐, 𝑓 and
𝑔; 𝑐 being the parent of 𝑎 and 𝑏; and 𝑔 being the parent of ℎ. As graphs, though, the left and middle
pictures are identical—each contains the same nodes, {𝑎, 𝑏, 𝑐, … , ℎ}, and the same set of edges
connecting them, {𝑒𝑎𝑐 , 𝑒𝑏𝑐 , 𝑒𝑐𝑑 , … , 𝑒𝑔ℎ }. In fact, the picture on the right is also exactly the same graph—
we’ve just grabbed hold of 𝑔 and shaken the rest down.

The middle and right pictures show why trees are so important—they’re a natural way to represent
hierarchies. A family tree captures a hierarchy of generations; an organization chart shows the
management hierarchy of a company; an evolution tree shows species in a hierarchy of common
ancestors; and if you’ve studied object-oriented programming, you know that programmers create
hierarchies of classes; these too can be represented with trees.

Of course, in a hierarchy, it matters who’s on top. The middle and right pictures are equivalent as
graphs, but we need a way to distinguish them. In a rooted tree, we designate one vertex as the root—
it’s 𝑑 in the middle picture and 𝑔 in the one on the right. We then use family terminology in the natural
way to describe relationships. In the middle tree, 𝑐 is a child of 𝑑 and 𝑑 is 𝑐’s parent; 𝑒 and 𝑔 are
siblings, children of the same parent; 𝑎 is a grandchild of 𝑑; 𝑑 is an ancestor of all other vertices and
these other vertices are 𝑑’s descendants. We revert to botany only at the very bottom: vertices with no
children, like 𝑒 and ℎ, are called leaves of the tree. Stories differ as to why we normally show trees
upside down, with the root at the top and leaves at the bottom.

83. Show that, for any convex polyhedron, the number of corners plus the number of sides is two
more than the number of edges.

A cube has eight corners and six sides; summing, we get 14. This is two more than the number of edges,
12.

A pentagonal prism looks like this:

It has 10 corners and 7 sides, for a total of 17. This is two more than the number of edges, 15.

Here’s a kind of pyramid with an irregular, five-sided base:

It has 6 corners and 6 sides, for a total of 12—two more than the number of edges, 10.

The relationship that these examples illustrate holds for all convex polyhedra, a polyhedron being a solid
three-dimensional shape with polygons as sides and convex meaning that no part of the shape slopes
inward (technically, that no internal angle is greater than 180 degrees).

The first step in the proof is to turn a polyhedron into a two-dimensional graph. The corners will be
vertices; edges remain edges; for now, we ignore the sides. Here’s what a cube looks like, in this case,
while still three-dimensional:

Next, we take an arbitrary side—we’ll pick the front-facing one—and stretch it out in all directions:

As the picture makes clear, we can now flatten the graph. The end result is not only two-dimensional,
but has no intersecting edges. A graph that can be drawn in two dimensions without any intersecting
edges is called planar. Any convex polyhedron can be flattened in the way we’ve just described into a
connected planar graph.

During the flattening process, there’s clearly no change in the number of vertices and edges. All the
original sides become enclosed regions in the graph, except for the one we stretched. If we count the
space outside the graph as one additional region, then the number of regions is the same as the original
number of sides. So what we now need to prove is that the number of vertices plus the number of
regions is two more than the number of edges.

In fact, we can show that this is true for all connected planar graphs, not just ones that result from
flattening polyhedra. Here are two examples of connected planar graphs that definitely were not
produced in this way:

We know they can’t come from polyhedra because they have vertices of degree one and two. These
would correspond to corners with only one or two associated edges, which is impossible for three-
dimensional solids with flat sides.

The graphs are planar though, since we’ve drawn them with no intersecting edges. In the graph on the
left, there are 6 vertices and 1 region—the one outside the graph. The sum, 7, is two more than the
number of edges, 5. In the graph on the right, there are 3 regions and 6 vertices for a total of 9, which is
two more than the 7 edges.

The proof that this is always the case for connected planar graphs is by induction on the number of
regions. Call the number of regions 𝑛𝑟 and the number of vertices and edges, 𝑛𝑣 and 𝑛𝑒 . If there is only
one region, this must be the one outside the graph, implying that the graph does not enclose anything.
That is, it must have no cycles and hence must be a tree. In this case, we know that the number of
vertices is one more than the number of edges: 𝑛𝑣 = 𝑛𝑒 + 1. Given that 𝑛𝑟 = 1, we have 𝑛𝑣 + 𝑛𝑟 =
(𝑛𝑒 + 1) + 1 = 𝑛𝑒 + 2.

Now suppose the formula is correct for any particular number of regions. To increase the number by
one, we must add a path between two existing vertices. If the path adds 𝑘 vertices, then it adds 𝑘 + 1
edges. Before adding the path, we had 𝑛𝑣 + 𝑛𝑟 = 𝑛𝑒 + 2. Adding the path increases 𝑛𝑣 by 𝑘 and 𝑛𝑟 by
1, so it increases the left side of the equality by 𝑘 + 1. And since it increases 𝑛𝑒 by 𝑘 + 1, the same is
true of the right side, maintaining the equality.

The equation

𝑛𝑣 + 𝑛𝑟 = 𝑛𝑒 + 2
is known as Euler’s formula.

84. Show that every connected planar graph has a vertex of degree five or less.

We’ll give a proof by contradiction (see Item 70), starting with the assumption that every vertex has
degree at least six. In this case, the sum of all the degrees is at least 6𝑛𝑣. Each edge contributes exactly
2 to this sum, so the sum of all the degrees is 2𝑛𝑒. That is, 2𝑛𝑒 ≥ 6𝑛𝑣 or

(1/3) 𝑛𝑒 ≥ 𝑛𝑣

Next, let’s call the number of edges that define a region its boundary count. Each region is bounded by
at least three edges, so the sum of the boundary counts is at least 3𝑛𝑟. But each edge contributes at
most 2 to the sum of the boundary counts, so we must have 2𝑛𝑒 ≥ 3𝑛𝑟.¹⁰ That is:

(2/3) 𝑛𝑒 ≥ 𝑛𝑟

Now consider 𝑛𝑣 + 𝑛𝑟. By the two inequalities we’ve derived, this is at most (1/3) 𝑛𝑒 + (2/3) 𝑛𝑒 = 𝑛𝑒. But
Euler’s formula (Item 83) tells us it must come to 𝑛𝑒 + 2. This is a contradiction; hence our assumption
that every vertex has degree at least six must be false.

85. Prove the six-color map theorem.

Map makers like to use different colors for different countries. Of course, the idea is to use different
colors for countries that share a border, so that the change of color makes the border visually clear. Given this
constraint, how many colors are necessary?

Here’s an example that shows at least three colors are necessary.

If we use one color for country 𝑎, we must use a second for country 𝑏, since they have a common
border. But then we’ll need a third color for 𝑐, since it shares a border with both 𝑎 and 𝑏.

Three colors aren’t always enough, however. Here’s an example that shows we need at least four.

We’ve already seen that we must use a different color for each of countries 𝑎, 𝑏 and 𝑐. But 𝑑 has a
common border with each of these and so needs a fourth color.

At this point, it might seem that, no matter how many colors we have, there’s always a map that
requires more. In fact, though, we can prove the six-color map theorem, which states that with just six
colors, we can color every map.

¹⁰ Actually, this sentence and the previous one are both false if there is only one region, i.e. if the graph does not
enclose anything. But in this case, it’s a tree and there’s at least one vertex of degree one; hence our conclusion
that there’s a vertex of degree five or less is still true.

The first step is to convert the map to a planar graph, by placing one vertex in each country and drawing
an edge between countries that have a common border. Here’s how the conversion looks for our
previous example:

An acceptable map coloring now corresponds to a vertex coloring for which no edge connects vertices of
the same color. We’ve transformed the six-color map theorem into one of graph theory:

Theorem: The vertices of every planar graph can be colored using no more than six colors in
such a way that no edge connects vertices of the same color.

The proof is by induction on the number of vertices. If a graph has exactly one vertex, we assign
it a color and, since there are no edges, certainly no edge connects vertices of the same color.

Now assume the claim is true for all planar graphs of 𝑛 vertices and consider a planar graph with
𝑛 + 1 vertices. As we showed in Item 84, one of these must be of degree five or less; call it 𝑣. If
we temporarily remove 𝑣 and all incident edges from the graph, we have a graph of 𝑛 vertices,
which we are assuming can be colored with six colors. Now, what if we add 𝑣 and its edges back
in? Since 𝑣 has no more than five neighbors, no more than five colors are ruled out. But we
have six colors to choose from, so we can certainly pick one to complete the graph coloring.
That is, if we can color graphs with 𝑛 vertices, we can color ones with 𝑛 + 1.
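The induction is really an algorithm: repeatedly peel off a vertex of degree five or less, then add the vertices back in reverse order, coloring greedily. A sketch, with the graph represented (by our choice) as an adjacency dict:

    def six_color(adj):
        """Color a planar graph, given as {vertex: set of neighbors}."""
        adj = {v: set(nbrs) for v, nbrs in adj.items()}
        removed = []
        while adj:                        # a vertex of degree <= 5 must exist
            v = next(u for u in adj if len(adj[u]) <= 5)
            removed.append((v, adj.pop(v)))
            for u in adj:
                adj[u].discard(v)
        color = {}
        for v, nbrs in reversed(removed): # re-add, avoiding neighbors' colors
            used = {color[u] for u in nbrs if u in color}
            color[v] = next(c for c in range(6) if c not in used)
        return color

    # The four-country map above: a, b, c, d are mutually adjacent.
    adj = {'a': {'b', 'c', 'd'}, 'b': {'a', 'c', 'd'},
           'c': {'a', 'b', 'd'}, 'd': {'a', 'b', 'c'}}
    coloring = six_color(adj)
    assert all(coloring[u] != coloring[v] for u in adj for v in adj[u])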

86. Prove the five-color map theorem.

We can improve on the result of Item 85, showing that all maps can be colored using just five colors. Of
course, we’ll actually show the corresponding result for graph colorings, but the result is known as the
five-color map theorem. In the proof, we’ll use the term subgraph for a graph wholly made up of
vertices and edges contained in some larger graph.

Theorem: The vertices of every planar graph can be colored using no more than five colors in
such a way that no edge connects vertices of the same color.

The proof is by induction on the number of vertices. If a graph has only one vertex, the claim is
obvious.

If a graph has 𝑛 + 1 vertices, we find one of degree five or less, calling it 𝑣. We remove 𝑣 and its
associated edges from the graph and, relying on the induction hypothesis, use five colors to
color the remaining vertices. If 𝑣 is of degree less than five, then when we add it back there will
be a color for it that is not ruled out. Even if 𝑣 is of degree five, if its neighbors are not all of
different colors, then no more than four colors are ruled out and, again, there will be a color
left to assign to 𝑣. The only difficult case is if 𝑣 is of degree five and all of its neighbors are of
different colors.

In this case, the picture looks like this, with vertices 𝑣1 , 𝑣2 , 𝑣3 , 𝑣4 and 𝑣5 assigned colors
𝑐1 , 𝑐2 , 𝑐3 , 𝑐4 and 𝑐5 respectively.

Now within the whole graph, consider only vertices colored 𝑐1 and 𝑐3 and edges between them.
Either there is a path in this subgraph from 𝑣1 to 𝑣3 or not. If not, then we can take all vertices
reachable in the subgraph from 𝑣1 and swap the colors 𝑐1 and 𝑐3 for these vertices. The altered
coloring is still acceptable for all vertices except 𝑣. And since 𝑣1 is now assigned 𝑐3 , we are free
to assign 𝑐1 to 𝑣, completing the coloring.

The only difficulty is if there is a path in the subgraph from 𝑣1 to 𝑣3 . In this case, consider the
subgraph of vertices colored 𝑐2 and 𝑐4 and edges between them. Again, if there is no path
between 𝑣2 and 𝑣4 in the subgraph, we can swap the colors of all vertices reachable from 𝑣2 ,
leaving an acceptable coloring and freeing up 𝑐2 for use in coloring 𝑣. The only difficulty is if
there is a path in the 𝑐2-𝑐4 subgraph from 𝑣2 to 𝑣4. But this is impossible. We already have a
path from 𝑣1 to 𝑣3; together with 𝑣 it forms a closed curve with 𝑣2 on one side and 𝑣4 on the
other, so any path from 𝑣2 to 𝑣4 would have to intersect it, contradicting the fact
that the whole graph is planar.

You may be wondering, now, if there is a four-color map theorem. There is, but it took mathematicians
more than 120 years to prove and the validity of the highly complex, computer-assisted proof was
debated for some years after it was published in 1976. As of today, no proof has been given that would
fit in a book of this length, let alone in a few pages of one. Of course, there will never be a three-color
map theorem: We’ve seen an example of a map that requires four colors.

87. In how many ways can five people be assigned ten tasks, if the tasks are all different and each
person must be assigned at least one task?

Without the proviso that each person gets at least one task, this is an easy problem that we solved in
Item 44. Each task may be assigned to any of five people, so the number of assignments is 5^10.

For the new version, we’ll use inclusion-exclusion. Let 𝑆 be the set of all assignments and let 𝐴1 be the
set of assignments in which the first person is assigned no tasks, 𝐴2 be the set of assignments in which
the second person is assigned no tasks and so on. Then the number we want is

|𝑆| − |𝐴1 ∪ 𝐴2 ∪ 𝐴3 ∪ 𝐴4 ∪ 𝐴5 |

We’ve already noted that |𝑆| = 5^10. If any one person is not to be assigned tasks, there are four choices
for each task, so |𝐴1| = |𝐴2| = ⋯ = |𝐴5| = 4^10. Likewise if any two people are not to be assigned
tasks, there are three choices for each task, so |𝐴1 ∩ 𝐴2| = |𝐴1 ∩ 𝐴3| = ⋯ = |𝐴4 ∩ 𝐴5| = 3^10. The
principle of inclusion-exclusion would have us continue until we consider the intersection of all five
subsets, but there are clearly zero ways of assigning tasks if all five people are excluded. So then by
inclusion-exclusion, the number we want is

5^10 − (C(5, 1) 4^10 − C(5, 2) 3^10 + C(5, 3) 2^10 − C(5, 4) 1^10) = 5,103,000

Noting that the first term, 5^10, fits into the pattern if we write it C(5, 0) 5^10, we can give the same answer
more compactly as:

∑_{𝑖=0}^{4} (−1)^𝑖 C(5, 𝑖) (5 − 𝑖)^10

This also makes it clear how the answer generalizes if we have 𝑡 tasks and 𝑝 people. If every person
must be assigned at least one task, the number of assignments is:

∑_{𝑖=0}^{𝑝−1} (−1)^𝑖 C(𝑝, 𝑖) (𝑝 − 𝑖)^𝑡
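A short sketch confirms the specific number and, by brute force on a smaller case, the general formula (exhaustive checking is feasible only for small 𝑡 and 𝑝, of course):

    from itertools import product
    from math import comb

    def surjections(t, p):
        """Inclusion-exclusion count of assignments covering all p people."""
        return sum((-1) ** i * comb(p, i) * (p - i) ** t for i in range(p))

    assert surjections(10, 5) == 5_103_000

    # Direct count for t = 6 tasks, p = 3 people: both ways give 540.
    t, p = 6, 3
    direct = sum(1 for f in product(range(p), repeat=t)
                 if set(f) == set(range(p)))
    assert surjections(t, p) == direct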

As it turns out, we can also state the question in a much more general form. We’re used to writing
functions as formulas, e.g. 𝑓(𝑥) = 3𝑥^2 − 2. But actually a function is just a way of associating each
member of one set with exactly one member of another set. In our example, 3𝑥^2 − 2, both sets are ℝ. But
another perfectly good function associates each element of ℝ with an element of the set {𝑇, 𝐹},
depending on whether it is greater than 𝜋. Still another associates elements in the finite set
{1, 2, … , 1000} with a letter in the set {𝐼, 𝑉, 𝑋, 𝐿, 𝐶, 𝐷, 𝑀}, depending on the first character of its
representation in Roman numerals.

Now suppose we consider all functions from a set 𝑇 of 𝑡 elements to a set 𝑃 of 𝑝 elements. To specify a
function, we just say what each element of 𝑇 is associated with. We have 𝑝 choices for each, so there
are 𝑝^𝑡 possible functions. Note that this is exactly the answer we relied on earlier from Item 44. We
aren’t saying anything about what kind of elements make up 𝑇 and 𝑃—since that has no effect on our
answer—but they may as well be tasks and people.

Now what if we consider only functions in which every element of 𝑃 is associated with at least one
element of 𝑇? A function of this kind that “covers” all of 𝑃 is called surjective. Again, thinking of tasks
and people, we’re now only counting assignments for which each person is assigned at least one task.
The upshot is that the number of surjective functions from a set of 𝑡 elements to one of 𝑝 elements is
∑_{𝑖=0}^{𝑝−1} (−1)^𝑖 C(𝑝, 𝑖) (𝑝 − 𝑖)^𝑡

Nice as it is to have a clear solution to such a general problem, it’s a bit disappointing that we can’t put
the answer in closed form. Oddly enough, though, we’ll soon see that this messy form leads us to a
strikingly clean solution for an apparently unrelated problem.

To apply it, we need an easy consequence. If 𝑡 < 𝑝, it’s clearly impossible to make up any surjective
functions. So in this case we have
∑_{𝑖=0}^{𝑝−1} (−1)^𝑖 C(𝑝, 𝑖) (𝑝 − 𝑖)^𝑡 = 0

Leaving the 𝑖 = 0 term on the left and bringing the rest to the right, we have
𝑝^𝑡 = ∑_{𝑖=1}^{𝑝−1} (−1)^(𝑖−1) C(𝑝, 𝑖) (𝑝 − 𝑖)^𝑡

Again, this works so long as 𝑡 < 𝑝. In our application, we’ll use 𝑡 = 𝑝 − 2, yielding
$$p^{p-2} = \sum_{i=1}^{p-1} (-1)^{i-1} \binom{p}{i} (p-i)^{p-2}$$
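
This identity is easy to test numerically. A quick Python sketch (again with names of our own choosing)
checks that the two sides agree for, say, 𝑝 = 7 and 𝑡 = 𝑝 − 2:

    from math import comb

    def alternating_sum(p, t):
        # The right-hand side of the identity above.
        return sum((-1)**(i - 1) * comb(p, i) * (p - i)**t for i in range(1, p))

    p = 7
    print(p**(p - 2), alternating_sum(p, p - 2))   # both print 16807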

88. Given a set of 𝒏 labelled vertices, in how many ways can we add edges to make a tree?

Let 𝑁𝑛 represent the number of trees. If 𝑛 = 2, the only way to make a tree is to draw an edge between
the two vertices, i.e. 𝑁2 = 1. We’ll use induction starting with this as a base case, but let’s look at the
answer for 𝑛 = 3 as well, just for the sake of clarifying what we’re counting. In any tree, the number of
edges is one less than the number of vertices (Item 82), so if we have 3 vertices, there must be 2 edges.
With 3 vertices there are 3 possible edges, so there are $\binom{3}{2} = 3$ ways to choose 2 of them; picturing
these choices, we can see that each results in a distinct tree.

Thus, 𝑁3 = 3. Note carefully that we are counting unrooted trees; each of these three trees would yield
three different rooted trees, one with vertex 1 as the root, one with vertex 2 as the root and one with
vertex 3 as the root.

We’ll prove that, for 𝑛 ≥ 2, $N_n = n^{n-2}$. This checks for 𝑛 = 2, since $2^0 = 1 = N_2$. Now suppose we have at least 3 vertices
and consider the set of all trees that can be made by adding edges between them. As we saw in Item
81, each tree must have a vertex of degree 1. Let 𝐴1 be the subset of trees in which vertex 1 is of
degree 1, let 𝐴2 be the subset of trees in which vertex 2 is of degree 1 and so on. Then 𝑁𝑛 = |𝐴1 ∪
𝐴2 ∪ ⋯ ∪ 𝐴𝑛 |. We’ll compute this using inclusion-exclusion.

Suppose vertex 1 is of degree 1. Then the other 𝑛 − 1 vertices form a tree and there are 𝑁𝑛−1 ways in
which they may do so. The single edge incident to vertex 1 may lead to any of the other vertices, so we
have |𝐴1 | = (𝑛 − 1)𝑁𝑛−1 . The same is true, of course, for all the other |𝐴𝑖 |.

Likewise, suppose vertices 1 and 2 are of degree 1. Then the other vertices form a tree in one of 𝑁𝑛−2
ways and each of the first two may be connected to any of them. So we have
$|A_1 \cap A_2| = (n-2)^2 N_{n-2}$, and the same is true of all the other $|A_i \cap A_j|$.

The pattern is now clear and we can apply inclusion-exclusion to get


$$N_n = \binom{n}{1}(n-1)N_{n-1} - \binom{n}{2}(n-2)^2 N_{n-2} + \cdots \pm \binom{n}{n}(n-n)^n N_{n-n}$$
We can drop the last term, since it includes a factor of zero: $(n-n)^n = 0$. The rest, written compactly, is

$$N_n = \sum_{i=1}^{n-1} (-1)^{i-1} \binom{n}{i} (n-i)^i N_{n-i}$$

Now, by the induction hypothesis, each $N_{n-i}$ is $(n-i)^{n-i-2}$. Substituting, we get


$$N_n = \sum_{i=1}^{n-1} (-1)^{i-1} \binom{n}{i} (n-i)^i (n-i)^{n-i-2} = \sum_{i=1}^{n-1} (-1)^{i-1} \binom{n}{i} (n-i)^{n-2}$$

But at the very end of Item 87, we saw that the expression on the right is $n^{n-2}$. So we have $N_n = n^{n-2}$
for the next value of 𝑛, completing the induction argument.

The result, that the number of trees on 𝑛 labelled vertices is $n^{n-2}$, is called Cayley’s formula.
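
For small 𝑛, Cayley’s formula can be checked by brute force: try every possible set of 𝑛 − 1 edges and
keep those that connect all 𝑛 vertices (a connected graph with 𝑛 − 1 edges has no room for a cycle, so it
is a tree). Here is a Python sketch of such a check—slow, but fine for 𝑛 up to 6 or so:

    from itertools import combinations

    def count_labeled_trees(n):
        vertices = range(n)
        possible_edges = list(combinations(vertices, 2))
        count = 0
        for edges in combinations(possible_edges, n - 1):
            # Union-find check: do these n-1 edges connect all n vertices?
            parent = list(vertices)
            def find(x):
                while parent[x] != x:
                    x = parent[x]
                return x
            for u, v in edges:
                parent[find(u)] = find(v)
            if len({find(v) for v in vertices}) == 1:
                count += 1
        return count

    for n in range(2, 7):
        print(n, count_labeled_trees(n), n**(n - 2))   # the counts agree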

89. Prove Dijkstra’s algorithm correct.

Diagrams that show dots connected by lines—what we call graphs—are worth studying because they
can represent so many things: cities connected by roads, relationships between people, computer
networks, organization charts, chemical structures, etc. Often, we can make graphs even more useful by
associating a number or weight with each edge. Here’s an example:

If the dots are companies, the weights might be the number of contracts they’ve signed with each other;
if the dots are cities, the weights might be the cost in dollars to send a package from one city to another;
if this is a computer network, the weights might be the time in milliseconds it takes a packet of bits to
get from one server to another.

In the last two cases, the sum of weights along a path has a natural meaning. If we want to send a
package from city 𝑏 to city 𝑐, the total cost depends on the path. The direct route costs 75 dollars, but if
we send it by way of 𝑎 it drops to 30 + 25 = 55; and we can do even better routing the package via 𝑑
and 𝑒: 20 + 5 + 15 = 40. Likewise, if we switch to the computer server interpretation, we can send a
packet directly from 𝑏 to 𝑐 in 75 milliseconds, but by going through intermediate servers, we can cut
that to 55 or even 40.

In general, then, given a weighted graph, it can be useful to find the shortest path between two vertices.
Actually, the method we’ll develop will do more—it finds the shortest path between a given starting
vertex 𝑠 and all of the vertices in the graph that are reachable from 𝑠. Moreover, we’ll be able to prove
that the method is correct, so long as all weights are positive.

We maintain a set, 𝐾, of all vertices for which the shortest path from 𝑠 is known. Of course, for each
vertex in 𝐾, we remember the actual path that was shortest. We start 𝐾 out by placing 𝑠 in it and
noting that the shortest path from 𝑠 to 𝑠 is the one that takes no steps.¹¹

We then repeatedly add a vertex 𝑣 to 𝐾. To pick 𝑣, we take the paths that we already know to be
optimal from 𝑠 to members of 𝐾 and extend each of them in every way possible by a single edge, so
long as that edge leads to a vertex not in 𝐾. Here’s an example:

In this diagram, we already know the shortest path from 𝑠 to 𝑡 and we can extend it by one edge to 𝑣1 .
Likewise, we already know the shortest path from 𝑠 to 𝑢 and can extend it by one edge to either 𝑣1 or
𝑣2 . Of all the extended shortest paths, we choose the shortest one and add the vertex 𝑣 at the end of it
to our known set 𝐾. We’ll show in a moment that, when 𝑣 is added to 𝐾, the extended path that led to
it being selected is the shortest of all possible paths from 𝑠 to 𝑣. So we associate this path with 𝑣.

This process continues—adding vertices one by one to 𝐾—until no path can be extended to bring an
additional vertex into 𝐾. Let 𝑛 denote the number of vertices in 𝐾 when the process is completed.

Claim: When 𝐾 contains 𝑘 nodes, for 𝑘 ∈ {1, 2, … , 𝑛}, the path associated with each vertex in 𝐾
is optimal.

We have already argued that the claim is true when |𝐾| = 1 and 𝑠 is the only element of 𝐾.
Assume, then, that it is true when |𝐾| = 𝑘 − 1, for some 𝑘 ∈ {2, 3, … , 𝑛}. If 𝐾 contains 𝑘
vertices, it must be because it contained 𝑘 − 1 and we added one more, call it 𝑣, using the
procedure described above; by assumption, the path associated with each of the 𝑘 − 1 vertices
in 𝐾 is optimal.

When we add 𝑣, it is because an extended path to 𝑣, call it 𝑃, is the shortest of all extended
paths. Consider any other path, 𝑃′, from 𝑠 to 𝑣. It starts at 𝑠 and, after some number of steps,
takes a first step outside of 𝐾, say from a vertex 𝑢 inside 𝐾 to one on the outside called 𝑤. By
the time it reaches 𝑤, 𝑃′ is already at least as long as the shortest distance from 𝑠 to 𝑢
increased by the weight of the edge from 𝑢 to 𝑤—and since the remaining weights are positive,
the rest of 𝑃′ can only add to its length. That is, 𝑃′ is at least as long as some extended path.
But 𝑃 is the shortest of all extended paths, so 𝑃′ can’t be shorter; that is, no path can be shorter
than 𝑃; 𝑃 is the shortest path from 𝑠 to 𝑣. Since the procedure associates 𝑃 with 𝑣, the claim is
true when |𝐾| = 𝑘.

As it happens, it’s quite easy to maintain a collection of extended paths; in fact, it’s also easy to make
sure that the collection always includes just the best extended path leading to any given vertex.

¹¹ Of course, if some weights were negative, we might be able to find a path of length shorter than one that takes
no steps by traversing some negatively weighted edges. Since we’re assuming positive weights, this can’t happen.

We initialize our collection of extended paths to {} before any vertices are added to 𝐾. Then, when each
vertex is added to 𝐾, we consider all paths extending it by a single edge. If one of these extended paths
leads to a new vertex—one not at the end of an extended path already in our collection—we add it. If it
leads to a vertex already at the end of an extended path in our collection, it replaces that path, but only
if it is shorter.

Pseudocode for the algorithm we’ve been describing is straightforward, once we agree on some
notation. Let 𝑤𝑡(𝑣, 𝑤) be the weight of the edge from 𝑣 to 𝑤 and 𝑉 be the set of vertices at the end of
extended paths. For each vertex 𝑣 in 𝑉, we’ll use 𝑃(𝑣) to denote the shortest extended path to 𝑣 (so
far) and 𝑙𝑒𝑛(𝑣) to denote the length of the path.

initialize 𝐾 to {}, 𝑉 to {𝑠}, 𝑃(𝑠) to the empty path and 𝑙𝑒𝑛(𝑠) to 0


while 𝑉 is not empty
choose the 𝑣 in 𝑉 with the minimum value of 𝑙𝑒𝑛(𝑣); remove 𝑣 from 𝑉 and add it to 𝐾
for each edge from 𝑣 to another vertex 𝑤
let 𝑃′ be 𝑃(𝑣) extended by this edge and let 𝑙 be the length of 𝑃′, 𝑙𝑒𝑛(𝑣) + 𝑤𝑡(𝑣, 𝑤)
if 𝑤 ∉ 𝑉 and 𝑤 ∉ 𝐾
add 𝑤 to 𝑉 with 𝑃(𝑤) = 𝑃′ and 𝑙𝑒𝑛(𝑤) = 𝑙
else if 𝑤 ∉ 𝐾 and 𝑙 < 𝑙𝑒𝑛(𝑤)
set 𝑃(𝑤) to 𝑃′ and 𝑙𝑒𝑛(𝑤) to 𝑙

This is known as Dijkstra’s algorithm (the name is pronounced DIKE-struh).
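
Here is one possible Python rendering of this pseudocode, a sketch rather than a tuned implementation. It
keeps the candidate extended paths in a heap, so choosing the minimum-length one is cheap; instead of
replacing a path in place as the pseudocode does, it allows stale entries in the heap and simply skips
them. The example graph reconstructs the edge costs mentioned earlier (the dictionary format, and the
assumption that the graph is undirected, are ours):

    import heapq

    def dijkstra(graph, s):
        # graph maps each vertex to a dict of {neighbor: positive weight}
        known = {}                  # the set K: vertex -> (length, path)
        extended = [(0, s, [s])]    # extended paths, kept as a min-heap
        while extended:
            length, v, path = heapq.heappop(extended)
            if v in known:          # a shorter path to v already won
                continue
            known[v] = (length, path)
            for w, weight in graph[v].items():
                if w not in known:
                    heapq.heappush(extended, (length + weight, w, path + [w]))
        return known

    # The example from the text: b to c directly costs 75, via a costs
    # 30 + 25 = 55, and via d and e costs 20 + 5 + 15 = 40.
    g = {'a': {'b': 30, 'c': 25},
         'b': {'a': 30, 'c': 75, 'd': 20},
         'c': {'a': 25, 'b': 75, 'e': 15},
         'd': {'b': 20, 'e': 5},
         'e': {'c': 15, 'd': 5}}
    print(dijkstra(g, 'b')['c'])    # (40, ['b', 'd', 'e', 'c'])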

90. Prove the Friends and Strangers theorem: In any group of six people, either there are three who
all know each other or three who are all unacquainted.

Before starting the proof, note that the same abstract claim can be framed in other contexts. Our proof
will also show that, of any six servers on a network, there must either be three that are all directly linked
to each other or three with no direct links to each other at all. Likewise, it will show that any six cities on
a highway map must include either three that can be reached from one another by traveling on just one
highway or three that cannot be reached from one another without using at least two highways.

In each of these cases, we have objects (people, servers, cities), each pair of which is in one of two
relations—people are acquainted or not, servers are directly linked or not, a pair of cities has a highway
running between them or not. We’ll represent this as a graph with vertices representing the objects and
colored edges between them—a blue edge if the pair is in one relation, a red one if it’s in the other.
Note that, since every pair of vertices is in one of the two relations, there is an edge between each pair.
That is, the graph is complete.

With this graph representation, our claim is the same whether it’s about people, servers or cities—a
complete graph with six vertices and edges colored red and blue must contain a triangle with all edges
blue or one with all edges red.

We start the proof by picking any vertex, 𝑣. Since the graph is complete, there are 5 edges connecting 𝑣
to other vertices. Of these edges, at least 3 must be of the same color—if there were no more than 2
blue and no more than 2 red, then there would be no more than 4 edges altogether and we know there
are five. Without loss of generality, we’ll assume at least three of the edges are red. These red edges
connect 𝑣 to three vertices—call them 𝑣1 , 𝑣2 and 𝑣3 . If there is a red edge connecting any two of these,
then this edge completes a red triangle, the other two sides of which are the red edges coming from 𝑣.
But if there is no red edge connecting any two of 𝑣1 , 𝑣2 and 𝑣3 , then these three form a blue triangle.

Note carefully the phrase “without loss of generality” in this proof; it’s a standard one. We knew that at
least 3 of the edges from 𝑣 were of the same color and we gave an argument assuming this color was
red. If, in fact, they were blue, the argument would have been entirely analogous, so we lose nothing by
simply picking arbitrarily. In general we say we are making a choice “without loss of generality” when
each option would be handled in exactly the same way and it’s as good to proceed with one as another.

We’ve just proved that any six-vertex complete graph with edges colored red and blue must contain
either a red triangle or a blue one. Of course, the same is all the more true of such a graph with seven
or more vertices. Any six of them form a complete subgraph and hence have a monochromatic—all-
one-color—triangle. But what about smaller graphs? Is a five-vertex complete graph with red and blue
edges bound to include a monochromatic triangle? Here’s the proof that it is not: arrange five vertices
in a pentagon, color the five outer edges red and the five diagonals blue. No three of the vertices are
pairwise joined by outer edges, so there is no red triangle; and the blue diagonals form a five-cycle of
their own (the pentagram), so the same argument rules out a blue triangle.

It took a paragraph of careful arguing to prove the positive claim about six-vertex graphs, because we
needed to show that something was true of all such graphs. The simple coloring just described, however,
is all that’s necessary to prove the negative claim. To show that something is not true of all five-vertex graphs, we
only need to find one for which it’s false. In exhibiting this one, we say we are giving a proof by counter-
example.
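
Both halves of the argument are small enough to check by machine. The Python sketch below represents a
coloring by its set of red edges; it confirms that the five-cycle coloring just described has no
monochromatic triangle and that every one of the 2¹⁵ colorings of the six-vertex complete graph has one:

    from itertools import combinations, product

    def has_mono_triangle(n, red_edges):
        red = {frozenset(e) for e in red_edges}
        for triple in combinations(range(n), 3):
            colors = {frozenset(pair) in red for pair in combinations(triple, 2)}
            if len(colors) == 1:    # all three edges the same color
                return True
        return False

    # Five-vertex counterexample: outer pentagon red, diagonals blue.
    pentagon = [(i, (i + 1) % 5) for i in range(5)]
    print(has_mono_triangle(5, pentagon))               # False

    # Every red/blue coloring of the 15 edges of K6 has a mono triangle.
    edges = list(combinations(range(6), 2))
    print(all(has_mono_triangle(6, [e for e, bit in zip(edges, bits) if bit])
              for bits in product([0, 1], repeat=15)))  # True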

91. Prove Ramsey’s theorem for graphs with edges of two colors.

Putting together the positive and negative results from Item 90, we conclude that six is the minimum
number of vertices—in complete graphs with edges colored red and blue—to guarantee existence of a
monochromatic triangle.

A triangle is a set of 3 vertices in which every pair is connected, that is, which forms a complete
subgraph. We’ll generalize and define a clique (say either ‘click’ or ‘cleek’) to be a complete subgraph of
any size.

Given this term, we can ask how large a complete graph with red and blue edges needs to be to ensure
that it contains monochromatic cliques of any specified sizes. We’ll write 𝑅(𝑟, 𝑏) for the minimum size
necessary to guarantee either a red clique of 𝑟 vertices or a blue one of 𝑏 vertices. The results of Item
90 can be summed up simply by writing 𝑅(3, 3) = 6.

When we write 𝑅(4, 5), we mean the number of vertices in the smallest complete graph with red and
blue edges that’s guaranteed to include either four vertices, every pair of which are connected with red
edges, or five vertices, every pair of which are connected with blue edges. But is there such a number?

How can we be sure that, with enough vertices, we can find one or the other monochromatic clique?
And, even if we argue this case out, won’t we be left with a similar doubt about 𝑅(𝑟, 𝑏) for every other
pair of values for 𝑟 and 𝑏?

Theorem: For every pair of values 𝑟, 𝑏 ≥ 1, 𝑅(𝑟, 𝑏) is finite.

Note first that 𝑅(𝑟, 1) = 𝑅(1, 𝑏) = 1. A monochromatic one-vertex clique is just a single vertex,
and every non-empty graph contains one.

We’ll now prove by induction something slightly stronger than the claim of the theorem, namely
that 𝑅(𝑟, 𝑏) ≤ 𝑅(𝑟 − 1, 𝑏) + 𝑅(𝑟, 𝑏 − 1). Along with the base case just noted, this certainly
shows that 𝑅(𝑟, 𝑏) is finite.

The induction is on 𝑟 + 𝑏. Consider a complete graph with 𝑅(𝑟 − 1, 𝑏) + 𝑅(𝑟, 𝑏 − 1) vertices
and edges colored red and blue. Pick a vertex 𝑣 and partition the other vertices into two sets: 𝑉𝑟
being those connected to 𝑣 with a red edge and 𝑉𝑏 being those connected to 𝑣 with a blue edge.
We can’t have both |𝑉𝑟 | < 𝑅(𝑟 − 1, 𝑏) and also |𝑉𝑏 | < 𝑅(𝑟, 𝑏 − 1), because in this case the
total number of vertices, |𝑉𝑟 | + |𝑉𝑏 | + 1, would be less than 𝑅(𝑟 − 1, 𝑏) + 𝑅(𝑟, 𝑏 − 1). But if
|𝑉𝑟 | ≥ 𝑅(𝑟 − 1, 𝑏), then 𝑉𝑟 is guaranteed to have one of two kinds of monochromatic cliques. If
it has a 𝑏-vertex blue clique, then we’re done. If not, then it has an (𝑟 − 1)-vertex red clique, to
which we can add 𝑣 to make an 𝑟-vertex red clique. The argument for the other case, |𝑉𝑏 | ≥
𝑅(𝑟, 𝑏 − 1), is strictly analogous.

This is called Ramsey’s theorem for two colors. A very slight extension of the proof just given yields a
theorem for graphs with edges of any finite number of colors and shows that 𝑅(𝑐1 , 𝑐2 , … , 𝑐𝑛 ) is a finite
number. We call these Ramsey numbers; the theorem proves that they are well-defined.

Proving Ramsey numbers are well-defined does not, unfortunately, tell us how to compute them. A
brief analysis showed us that 𝑅(3, 3) = 6. But no one in the world currently knows the value of 𝑅(5, 5)!

92. Prove Cayley’s formula combinatorially.

In a rooted tree on 𝑛 labelled vertices, we can think of each of the 𝑛 − 1 edges as directed from parent
to child and we can consider a list of these edges in some order. Given 𝑛 labelled vertices, let 𝑆 be the
set of all such lists for all rooted trees on these vertices.

We’ll count |𝑆| in two ways.

Let 𝑁𝑛 be the number of unrooted trees on our 𝑛 vertices. Then we can get an edge list in stages:

1. Pick an unrooted tree (𝑁𝑛 choices)
2. Designate one of the vertices as the root (𝑛 choices)
3. Pick an order for the 𝑛 − 1 edges ((𝑛 − 1)! choices)

By the multiplication principle, we have |𝑆| = 𝑁𝑛 ⋅ 𝑛 ⋅ (𝑛 − 1)! = 𝑁𝑛 ⋅ 𝑛!.

Alternatively, we can start with the 𝑛 labelled vertices and add edges one by one, thinking about how
many choices we have for each new edge. For the first edge, we may choose any of the 𝑛 vertices as
parent and any of the remaining 𝑛 − 1 vertices for its child, making 𝑛(𝑛 − 1) choices. Note carefully,
though, that this child can never again be chosen as the child for any subsequent edge—otherwise it
would have two parents, which is impossible in a rooted tree. In choosing a child, we must select a
vertex that does not yet have a parent; and each time we do so—that is, each time we add an edge—the
number of vertices without a parent is decreased by one.

When we choose a second edge, we can still designate any of the 𝑛 vertices as parent. But there are
only 𝑛 − 1 parentless vertices left from which to pick a child and one of these is already ruled out since
it is connected to the vertex we’ve chosen as parent. For our second edge, we therefore have 𝑛(𝑛 − 2)
choices.

For the third edge, we can pick any vertex as parent. There are 𝑛 − 2 parentless vertices left, one of
which is connected to the vertex just selected as parent. This leaves 𝑛 − 3 choices for the child and a
total of 𝑛(𝑛 − 3) choices for the edge.

If we continue in this way, choosing all 𝑛 − 1 edges and applying the multiplication principle, we have

$$|S| = \big(n(n-1)\big) \cdot \big(n(n-2)\big) \cdot \cdots \cdot \big(n \cdot 1\big) = n^{n-1}(n-1)!$$

Since our two counts of |𝑆| must be equal, we have $N_n \cdot n! = n^{n-1}(n-1)!$. Dividing both sides by
$n! = n \cdot (n-1)!$ leaves $N_n = n^{n-1}/n = n^{n-2}$. This is Cayley’s formula, as derived much more
laboriously in Item 88.
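
For small 𝑛 we can confirm the second count directly by machine, enumerating every ordered list of 𝑛 − 1
directed edges and keeping those that form a rooted tree—no vertex with two parents and no cycles. A
Python sketch:

    from itertools import permutations
    from math import factorial

    def is_rooted_tree(n, edge_list):
        children = [c for (p, c) in edge_list]
        if len(set(children)) != n - 1:    # no vertex may have two parents
            return False
        parent = {c: p for (p, c) in edge_list}
        for v in range(n):                 # walking up from any vertex
            seen = set()                   # must never revisit a vertex
            while v in parent:
                if v in seen:
                    return False
                seen.add(v)
                v = parent[v]
        return True

    for n in [3, 4]:
        directed = [(p, c) for p in range(n) for c in range(n) if p != c]
        count = sum(is_rooted_tree(n, es) for es in permutations(directed, n - 1))
        print(count, n**(n - 1) * factorial(n - 1))   # the two counts agree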

93. In any set of 𝒏 + 𝟏 numbers drawn from {𝟏, 𝟐, 𝟑, … , 𝟐𝒏} one is a multiple of another.

Given any number, we can factor out 2’s repeatedly. For example, 28 = 2 ⋅ 14 = 2 ⋅ 2 ⋅ 7 and 72 = 2 ⋅
36 = 2 ⋅ 2 ⋅ 18 = 2 ⋅ 2 ⋅ 2 ⋅ 9. Note that what’s left, after we factor out all 2’s, is an odd number—if it
were even, we could factor out another 2. So every number can be written in the form $2^j \cdot k$, where 𝑘
is odd; for example, $28 = 2^2 \cdot 7$ and $72 = 2^3 \cdot 9$.

If we write each of our 𝑛 + 1 numbers in this way, we get 𝑛 + 1 odd numbers 𝑘. Since each of these is no
larger than 2𝑛, it’s in the set {1, 2, … , 2𝑛}. But this set contains only 𝑛 odd numbers, so by the
pigeonhole principle one of the odd numbers must be repeated. But then our collection of 𝑛 + 1
numbers contains two numbers $2^i \cdot k$ and $2^j \cdot k$ with the same odd part 𝑘, and the one with the
larger power of 2 is a multiple of the other.
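
The pigeonhole argument is mirrored almost line for line in a brute-force check. A Python sketch for small
𝑛: in every (𝑛 + 1)-subset of {1, … , 2𝑛}, some odd part repeats:

    from itertools import combinations

    def odd_part(x):
        # Factor out all 2's, leaving x = 2**j * k with k odd.
        while x % 2 == 0:
            x //= 2
        return x

    def claim_holds(n):
        # Two numbers sharing an odd part means one divides the other.
        return all(len({odd_part(x) for x in s}) < len(s)
                   for s in combinations(range(1, 2 * n + 1), n + 1))

    print(all(claim_holds(n) for n in range(1, 7)))   # True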

