Mathematics
Textbook
CHOO YAN MIN
Since then, only small changes (usually corrections of typos) have been made.
The licensor cannot revoke these freedoms as long as you follow the license terms.
• Attribution — You must give appropriate credit, provide a link to the license, and
indicate if changes were made. You may do so in any reasonable manner, but not in any
way that suggests the licensor endorses you or your use.
• NonCommercial — You may not use the material for commercial purposes.
• ShareAlike — If you remix, transform, or build upon the material, you must distribute
your contributions under the same license as the original.
• No additional restrictions — You may not apply legal terms or technological measures
that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain
or where your use is permitted by an applicable exception or limitation. No warranties are
given. The license may not give you all of the permissions necessary for your intended use.
For example, other rights such as publicity, privacy, or moral rights may limit how you use
the material.
The scientist does not study nature because it is useful to do so. He studies it because
he takes pleasure in it, and he takes pleasure in it because it is beautiful.
- Henri Poincaré (1908 [1914], Science and Method, English trans., p. 22).
This textbook is thus written simply and non-rigorously. For example, there are few formal
definitions or proofs.2
Simple and non-rigorous as this textbook may be, I fully intend that a careful study of this
textbook (complemented by a capable teacher) will easily earn you your A in H1 Maths.
For a comparison of H1 vs H2 maths, check out this brief 3-page document: H1 Maths vs
H2 Maths: What’s the Difference? Which Should I Take?
• FREE! This book is free. But if you paid any money for it, I certainly hope your money
is going to me! This book is free because:
• DONATE! This book may be free, but donations are more than welcome! Donation
methods in footnote.3
It’s irrational for Homo economicus to donate. But please consider donating because:
1. There are any errors in this book. Please let me know even if it’s something as trivial
as a spelling mistake or a grammatical error.
2. You have absolutely any suggestions for improvement.
3. Any part of this book is less than crystal clear.
Here’s an anecdote about Richard Feynman, the great teacher and physicist:
Feynman was once asked by a Caltech faculty member to explain why spin
1/2 particles obey Fermi-Dirac statistics. He gauged his audience perfectly
and said, “I’ll prepare a freshman lecture on it.” But a few days later he
returned and said, “You know, I couldn’t do it. I couldn’t reduce it to
the freshman level. That means we really don’t understand it.”
I agree: If you can’t explain something simply, you don’t understand it well enough.4 And
as a corollary, the best way to gauge whether you understand something is to see if you
can explain it simply to someone else.
If at any point in this textbook, you have read the same passage a few times, tried to reason
it through, and still find things confusing, then it is a failure on MY part. Please let me
know and I will try to rewrite it so that it’s clearer. (There is also the possibility that I
simply messed up! So please let me know if there’s anything confusing!)
I deeply value any feedback, because I’d like to keep improving this textbook
for the benefit of everyone! I am very grateful to all the kind folks who’ve already
written in, allowing me to rid this book of more than a few embarrassing errors.
• LyX rocks!
You’re probably reading this on some device. So I’ve tried to set the font sizes and stuff so
that one can comfortably read this on a device as small as a seven-inch tablet. It should
also be possible to read this on a phone, though somewhat less comfortably. (Please let me
know if you have any feedback about this!)
(I’ll probably be contacting some publishers to see if they want to do a print version of
this, for anyone who prefers it in print.)
4 This quote or some similar variant is often (mis)attributed to Einstein. But as Einstein himself once said, “73% of Einstein quotes are misattributed.”
5 LaTeX is the typesetting program used by most economists and scientists. But LaTeX can be difficult to use. LyX is a user-friendly GUI version of LaTeX. LyX has boosted my productivity by countless hours over the years and you should use LyX too!
Reading maths is not like reading Harry Potter. Most of Harry Potter is fluff. There is
little fluff in maths.
So go slowly. Dwell upon and carefully consider every sentence in this textbook. Make sure
you completely understand what each statement says and why it is true. Reading maths
is very different from reading any other subject matter.
If you don’t quite understand some material, you might be tempted to move forward anyway.
Don’t. In maths, later material usually builds on earlier material. So if you simply move
forward, this will usually cost you more time and frustration in the long run.
Better then to stop right there. Keep working on it until you “get” it. Ask a friend or
a teacher for help. Feel free to even email me! (I’m always interested to know what the
common points of confusion are and how I can better clear them up.)
• Examples and exercises are your best friends. So work through them.
Work through all the examples and exercises. Merely moving your eyeballs is not the same
as working. Working means having pencil and paper by your side and going through each
example/exercise word-by-word, line-by-line.
For example, I might say something like “x^2 − y^2 = 0. Thus, (x − y)(x + y) = 0.” If it’s not
obvious to you why the first sentence implies the second, stop right there and work on it
until you understand why. Don’t just let your eyeballs fly over these sentences and pretend
that your brain is “getting” it.
I will often not bother to explain some steps, especially if they simply involve some simple
algebra.
It’s called List of Formulae MF26. It’s available at this link (MF26). (I cannot guarantee
though that your JC will give you the List during your JC common tests and exams.)
Google is probably the quickest for simple calculations. Type any calculation into your
browser’s Google search bar and the answer will instantly show up:
Wolfram Alpha is somewhat more advanced (but also slower). Enter “sin x” for example
and you’ll get graphs, the derivative, the indefinite integral, the Maclaurin series, and a
bunch of other stuff you neither know nor care about.
The Derivative Calculator and the Integral Calculator are probably unbeatable for the
specific purposes of differentiation and integration. Both give step-by-step solutions for
anything you want to differentiate or integrate.
Here is a collection of spreadsheets I made. These spreadsheets are for doing tedious and
repetitive calculations that H2 Maths students (and hence also H1 Maths students) will
often encounter. As with anything I do, I welcome any feedback you may have about
these spreadsheets. Perhaps in the future I will make a more attractive version of it.
(Instructions: Click “Make a copy” to open up your own independent copy of
this spreadsheet. Enter your input in the yellow cells. Output is produced in
the blue cells. If you mess up anything, simply click the same link and “Make
a copy” again.)
6 Pretty bizarre that in this age of the smartphone, they want you to learn how to use these clunky and now-useless devices
from the ’80s and ’90s. It is the equivalent of learning to program a VCR.
IMHO it’d be much better to teach you some simple programming or Excel (or whatever spreadsheet program). “B-b-but
... how would such learning be tested in an exam format?” Ay, there’s the rub. In the Singapore education system, anything
that cannot be “examified” is not worth learning.
1 Dividing By Zero
2 Functions
3 Graphs: Introduction
4 Graphs: Intercepts
6 Quadratic Equations
7 Graphs: Asymptotes
8 Exponents: Laws
9 Exponents: Graphs
11 Logarithms: Introduction
12 Logarithms: Laws
13 Logarithms: Graphs
14 Logarithmic Growth
19 Quadratic Inequalities
II Calculus
22 Equations of Lines
24 Chain Rule
27 Inflexion Points
55 Sampling
55.1 Population
55.2 Population Mean and Population Variance
55.3 Parameter
55.4 Distribution of a Population
55.5 A Random Sample
55.6 Sample Mean and Sample Variance
55.7 Sample Mean and Sample Variance are Unbiased Estimators
55.8 The Sample Mean is a Random Variable
55.9 The Distribution of the Sample Mean
55.10 Non-Random Samples
Example 1. Find the values of x for which x(x − 1) = (2x − 2)(x − 1).
Here’s the wrong solution: “Divide both sides by x − 1 to get x = 2x − 2. So x = 2.”
Here’s the correct solution: “Case #1. Suppose x − 1 = 0. Then the given equation is
satisfied. So x = 1 is one possible value for which x(x − 1) = (2x − 2)(x − 1). Case #2.
Now suppose x − 1 ≠ 0. So we can divide both sides by x − 1 to get x = 2x − 2. So x = 2.
Conclusion. The two possible values of x for which x(x − 1) = (2x − 2)(x − 1) are x = 1 and
x = 2.”
Moral of the story. Whenever you divide by a certain quantity, make sure it’s non-zero.
If you’re not sure whether it equals 0, then break up your analysis into two cases, as was
done in the above example: Case #1 — the quantity equals 0 (and see what happens
in this case); Case #2 — the quantity is non-zero (in which case you can go ahead and
divide).
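As a quick sanity check of Example 1, here is a minimal Python sketch that tests both candidate values directly against the original equation. The function name `satisfies` is made up for this illustration.

```python
def satisfies(x):
    """Return True if x satisfies x(x - 1) = (2x - 2)(x - 1)."""
    return x * (x - 1) == (2 * x - 2) * (x - 1)

# Case 1: x - 1 = 0, i.e. x = 1. Both sides equal 0, so the equation holds.
# Case 2: x - 1 != 0. Dividing by x - 1 gives x = 2x - 2, i.e. x = 2.
candidates = [1, 2]
solutions = [x for x in candidates if satisfies(x)]
print(solutions)
```

The wrong solution loses x = 1 because it divides by x − 1 without first checking whether that quantity is zero.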
By the way, let’s take this opportunity to clear up another popular misconception — You
may have heard that 1/0 = ∞. This is wrong. 1/0 ≠ ∞. Instead, any non-zero number
divided by 0 is undefined.7 “Undefined” is the mathematician’s way of saying, “You haven’t
told me what you are talking about. So what you are saying is meaningless.”
7 One exception is 0/0, which is indeterminate. This means that 0/0 is sometimes undefined, but can sometimes be defined
under certain circumstances.
Informally, a function is a rule that maps each input to exactly one output.
Example 2. Consider the function f defined by f(x) = x^2 + 5. The input is any real number
x, the corresponding output is the real number x^2 + 5. For example, f(3) = 3^2 + 5 = 14. In
words, we may say either of the following equivalent statements:
• f maps the input 3 to the output 14; or
• the value of f at 3 is 14.
Example 3. Consider the function g defined by g(x) = x/(x^2 + 1). The input is any
real number x, the corresponding output is the real number x/(x^2 + 1). For example,
g(3) = 3/(3^2 + 1) = 0.3. In words, we may say either of the following equivalent statements:
• g maps the input 3 to the output 0.3; or
• the value of g at 3 is 0.3.
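If it helps, the functions f and g from Examples 2 and 3 can be written as Python functions. This is only an illustrative sketch; the textbook itself does not use Python.

```python
def f(x):
    """The function f from Example 2: maps x to x^2 + 5."""
    return x**2 + 5

def g(x):
    """The function g from Example 3: maps x to x / (x^2 + 1)."""
    return x / (x**2 + 1)

# Each input is mapped to exactly one output.
print(f(3))
print(g(3))
```

Calling `f(3)` plays the same role as saying "the value of f at 3".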
We will usually consider only functions whose inputs and outputs are real numbers. But
in general, this need not be the case. To illustrate this point, here are two examples.
Example 4. Consider the function h that maps each person’s name to the first letter of
that name. So for example, h (Lee Kuan Yew) = L. In words, we may say either of the
following equivalent statements:
• h maps the input Lee Kuan Yew to the output L; or
• the value of h at Lee Kuan Yew is L.
Another example: h (Barack Hussein Obama) = B. In words, we may say either of the
following equivalent statements:
• h maps the input Barack Hussein Obama to the output B; or
• the value of h at Barack Hussein Obama is B.
Exercise 2. Let f (x) = 7x − 3. What are f (0), f (1), and f (2)? (Answer on p. 314.)
Exercise 3. Let g be the function that maps each country to its capital. What are
g(France) and g(Japan)? (Answer on p. 314.)
This may seem like an excessively pedantic distinction. But maths is precise and pedantic.
In maths, what we mean is precisely what we say and what we say is precisely what we
mean. There is never any room for ambiguity or alternative interpretations.
Example 6. Consider the equation y = 2x + 3. Its graph is the set of points (x, y) that
satisfy the equation y = 2x + 3. For example, the point (x, y) = (0, 3) is in the graph of the
equation y = 2x + 3, because 3 = 2 ⋅ 0 + 3.
We can illustrate the graph of an equation in what is called the Cartesian plane. The
graph of y = 2x + 3 is drawn below. The point (0, 3) is marked in green.
[Figure: graph of y = 2x + 3, with the point (0, 3) marked in green.]
Example 7. Consider the equation y = x^2 − 1. Its graph is the set of points (x, y) that
satisfy the equation y = x^2 − 1. For example, the point (x, y) = (3, 8) is in the graph of the
equation y = x^2 − 1, because 8 = 3^2 − 1.
The graph of y = x^2 − 1 is drawn below. The point (3, 8) is marked in green.
[Figure: graph of y = x^2 − 1, with the point (3, 8) marked in green.]
Example 9. Consider the function g defined by g(x) = x/(x^2 + 1). The graph of g is
defined to be the graph of the equation y = g(x); equivalently, it is the graph of the
equation y = x/(x^2 + 1).
The graph of g is drawn below. The point (2, 0.4) is marked in green.
[Figure: graph of g, with the point (2, 0.4) marked in green.]
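Since a graph is just the set of points that satisfy an equation, checking whether a point is in a graph amounts to plugging it in. The sketch below does this for the marked points in Examples 6, 7, and 9. The helper name `on_graph` is hypothetical, and a small floating-point tolerance is assumed.

```python
import math

def on_graph(point, rule):
    """Return True if the point (x, y) satisfies y = rule(x), up to rounding."""
    x, y = point
    return math.isclose(y, rule(x), abs_tol=1e-12)

print(on_graph((0, 3), lambda x: 2 * x + 3))         # Example 6
print(on_graph((3, 8), lambda x: x**2 - 1))          # Example 7
print(on_graph((2, 0.4), lambda x: x / (x**2 + 1)))  # Example 9
print(on_graph((1, 1), lambda x: 2 * x + 3))         # a point that is not on y = 2x + 3
```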
Exercise 4. Graph the following equations: (i) y = 2x − 1; (ii) y = 1 − x2 . Mark the point
where x = 2. (Answer on p. 314.)
Exercise 5. Graph the following functions: (i) the function f defined by f (x) = 5 − 3x; (ii)
the function g defined by g(x) = 3x + x2 . Mark the point where x = 2. (Answer on p. 315.)
A graph may also intersect the vertical axis (also known as the y-axis). The y-coordinate
of any such intersection point is called a vertical intercept (or y-intercept).
A graph may intersect the horizontal axis (also known as the x-axis). The x-coordinate of
any such intersection point is called a horizontal intercept (or x-intercept). Horizontal
intercepts are also called zeros or roots (of the corresponding equation or function). (We’ll
use the terms zeros and roots interchangeably in this textbook.)
Example 10. Graphed below is the equation y = 2x − 1. The graph has one horizontal
intercept, 0.5, and one vertical intercept, −1. Equivalently, the graph intersects the x-axis
at (0.5, 0); and the y-axis at (0, −1).
[Figure: graph of y = 2x − 1.]
We also call 0.5 the zero or root of the equation y = 2x − 1, because 2(0.5) − 1 = 0. That
is, x = 0.5 satisfies the equation y = 0 or 2x − 1 = 0.
We also call −1 and 1 the zeros or roots of the function f , because f (−1) = 0 and f (1) = 0.
That is, x = −1 or x = 1 satisfies the equation f (x) = 0.
Example 12. Graphed below are the functions f and g defined by f(x) = (x + 1)^2 and
g(x) = −(x − 1)^2.
This graph has a maximum turning point (−2, 19) and a minimum turning point (2, −13).
Informally, a maximum turning point is where the y-value is greater than all nearby points.
Similarly, a minimum turning point is where the y-value is smaller than all nearby
points.
Section A of the A-level exams will have some questions about quadratic equations.
In theory, you should have completely mastered quadratic equations from your study of
O-Level Mathematics. In practice? Probably not. So this chapter reviews quadratic
equations.
Remark 1. You may have heard of the term parabola (plural: parabolae). Just so you
know, the graph of a quadratic equation is an example of a parabola. But don’t worry, the
word parabola will never show up on the A-level H1 Maths exam.
\[ \left(x + \frac{b}{2a}\right)^2 = x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}. \]
Or rearranging:
\[ x^2 + \frac{b}{a}x = \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}. \tag{1} \]
In a moment we’ll make use of (1). Now let’s consider the quadratic equation y = ax^2 + bx + c.
Assume that a ≠ 0, otherwise the equation simplifies to y = bx + c, which is just a straight
line. We now manipulate the quadratic expression ax^2 + bx + c. First, we divide by a (this
is allowed because of our assumption that a ≠ 0):
\[ ax^2 + bx + c = a\left(x^2 + \frac{b}{a}x + \frac{c}{a}\right). \tag{2} \]
Now plug (1) into (2) to get:
\[ ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2} + \frac{c}{a}\right] = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right]. \]
What we just did above is called completing the square. We can now compute the zeros
of the equation y = ax^2 + bx + c.
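\[
\begin{aligned}
ax^2 + bx + c = 0 &\iff a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right] = 0 \\
&\iff \left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2} = 0 \iff \left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \\
&\iff x + \frac{b}{2a} = \frac{\pm\sqrt{b^2 - 4ac}}{2a} \iff x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\end{aligned}
\]
This last expression solves ax^2 + bx + c = 0. This expression will NOT be printed in the
A-Level List of Formulae! So be sure you remember it!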
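The quadratic formula just derived can be turned into a few lines of Python. This is a minimal sketch: the function name `quadratic_roots` is invented here, and only real roots are returned (complex numbers are not part of H1 Maths).

```python
import math

def quadratic_roots(a, b, c):
    """Return the real roots of ax^2 + bx + c = 0 as a sorted list."""
    if a == 0:
        raise ValueError("a must be non-zero, otherwise the equation is not quadratic")
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real roots
    if discriminant == 0:
        return [-b / (2 * a)]          # one repeated real root
    root = math.sqrt(discriminant)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])

print(quadratic_roots(1, -3, -4))  # two real roots
print(quadratic_roots(1, 2, 1))    # one real root
print(quadratic_roots(1, 0, 1))    # no real roots
```

The three branches correspond to the sign of b^2 − 4ac: positive, zero, or negative.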
The properties of quadratic equations are summarised in the following table and discussed
on the next page.
Category                       Features
1. a > 0, b^2 − 4ac > 0        ∪-shaped. Intersects the x-axis at two points.
2. a > 0, b^2 − 4ac = 0        ∪-shaped. Just touches the x-axis at the minimum point.
3. a > 0, b^2 − 4ac < 0        ∪-shaped. Doesn’t intersect the x-axis.
4. a < 0, b^2 − 4ac > 0        ∩-shaped. Intersects the x-axis at two points.
5. a < 0, b^2 − 4ac = 0        ∩-shaped. Just touches the x-axis at the maximum point.
6. a < 0, b^2 − 4ac < 0        ∩-shaped. Doesn’t intersect the x-axis.
– If a > 0, then the graph is ∪-shaped and has a minimum turning point at x = −b/2a.
– Conversely, if a < 0, then the graph is ∩-shaped and has a maximum turning point at
x = −b/2a.
• The sign of the discriminant b^2 − 4ac. This name makes sense, because the discriminant
helps us discriminate between several possible cases of the equation ax^2 + bx + c = 0:
What we have just done is to factorise the expression ax^2 + bx + c. Factorisation is often
a useful trick to play. Notice that if you plug in either of the roots into the right hand
side (RHS) of the above equation, we do indeed get zero, as expected.
– If b^2 − 4ac = 0, then:
∗ There is only one real root (or zero or horizontal intercept), namely −b/2a.
∗ Moreover, we can write
\[ ax^2 + bx + c = a\left(x - \frac{-b}{2a}\right)^2 = a\left(x + \frac{b}{2a}\right)^2. \]
∗ Notice that if you plug x = −b/2a into the RHS of the above equation, we do indeed
get zero, as expected.
– If b^2 − 4ac < 0, then:
∗ There are no real roots (or zeros or horizontal intercepts).
∗ There is no way to factorise the expression ax^2 + bx + c (at least without using complex
numbers, which are not covered in H1 Maths).
(ii) When is ax^2 + bx + c (a) positive for all possible values of x? (b) negative for all possible
values of x?
Example 16. The graph below has horizontal asymptote y = 2, because as x grows infinitely
large (i.e. towards ∞), y grows ever closer to (but is never equal to) 2.
Example 17. The graph below has horizontal asymptote y = 2, because as x grows infinitely
small (i.e. towards −∞), y grows ever closer to (but is never equal to) 2.
Example 18. The graph below has vertical asymptote x = 3, because as x grows ever closer
to (but is never equal to) 3, y grows infinitely large (i.e. towards ∞).
Example 19. The graph below has vertical asymptote x = 3, because as x grows ever closer
to (but is never equal to) 3, y grows infinitely small (i.e. towards −∞).
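\[
\begin{aligned}
x^a \cdot x^b &= x^{a+b}, & \left(\frac{x}{y}\right)^a &= \frac{x^a}{y^a}, \\
\frac{x^a}{x^b} &= x^{a-b}, & x^{-a} &= \frac{1}{x^a}, \\
(x^a)^b &= x^{ab}, & a^{1/b} &= \sqrt[b]{a}, \\
(xy)^a &= x^a y^a, & a^{c/b} &= \sqrt[b]{a^c} = \left(\sqrt[b]{a}\right)^c.
\end{aligned}
\]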
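The laws of exponents above can be spot-checked numerically. The sketch below is not a proof: it just tries one set of positive values, and the function name `exponent_laws_hold` is hypothetical.

```python
import math

def exponent_laws_hold(x, y, a, b):
    """Check the laws of exponents for positive x, y and real a, b (b non-zero)."""
    checks = [
        math.isclose(x**a * x**b, x**(a + b)),
        math.isclose((x / y)**a, x**a / y**a),
        math.isclose(x**a / x**b, x**(a - b)),
        math.isclose(x**(-a), 1 / x**a),
        math.isclose((x**a)**b, x**(a * b)),
        math.isclose((x * y)**a, x**a * y**a),
        # x^(1/b) is the b-th root of x: raising it to the power b recovers x.
        math.isclose((x**(1 / b))**b, x),
        # x^(a/b) equals both the b-th root of x^a and (b-th root of x)^a.
        math.isclose(x**(a / b), (x**a)**(1 / b)),
        math.isclose(x**(a / b), (x**(1 / b))**a),
    ]
    return all(checks)

print(exponent_laws_hold(2.0, 3.0, 1.5, 4.0))
```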
Exercise 9. (Answer on p. 318.) Is each of the following true? (If true, explain why. If
false, simply give a counterexample.)
8 By convention, 0^0 is usually defined to be equal to 1; this textbook will follow this practice.
Each of these graphs has horizontal asymptote y = 0, because as x grows infinitely small
(i.e. to −∞), y grows ever closer to (but never equals) 0.
Each of these graphs has horizontal asymptote y = 0, because as x grows infinitely small
(i.e. to −∞), y grows ever closer to (but never equals) 0.
Exercise 10. Graph (on the same diagram) the following equation and function: (i) y = 6^x;
(ii) f defined by f(x) = 7^x. (Answer on p. 318.)
Exam Tip
The topic of exponential growth and decay is new to the 8865 syllabus. So there are no
TYS questions covering this topic.
Example 22. Bacteria in a petri dish double in weight every 10 seconds. Let t (seconds) be
time, and let b_t be the total weight (micrograms) of the bacteria in the petri dish at time t.
Initially, there were 7 micrograms of bacteria. The graph below is of b_t against t.
y_t = y_0 ⋅ 2^{t/d},
where y_0 is the quantity at time t = 0, and d is the number of units of time it takes for the
quantity to double.
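To see the doubling formula in action, here is a small Python sketch applied to the bacteria of Example 22 (7 micrograms initially, doubling every 10 seconds). The function name `quantity` is chosen for this illustration only.

```python
def quantity(t, y0, d):
    """Quantity at time t, given initial quantity y0 and doubling time d."""
    return y0 * 2 ** (t / d)

# Example 22: y0 = 7 micrograms, d = 10 seconds.
for t in (0, 10, 20, 30):
    print(t, quantity(t, 7, 10))
```

Every additional 10 seconds multiplies the weight by 2, which is what the exponent t/d encodes.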
9 More precisely, its growth rate is proportional to the current magnitude of the quantity.
Example 24. The number of Singapore citizens originally from the People’s Republic of
China doubles every 5 years. In the year 2000, there were 50,000 such citizens. Let c_y be
the total number of such citizens in the year y. The graph below is of c_y against y.
Even more generally than before, a quantity x_t that grows exponentially takes the following
form: x_t = x_T ⋅ 2^{(t−T)/d}, where x_T is the quantity at time t = T, and d is the number of units
of time it takes for the quantity to double.
Extrapolating, the PM’s salary will be s_2060 = 64 million Singapore dollars in 2060.
Example 26. The world population of panda bears halves every 20 years. In the year
1960, there were 64,000 panda bears. Let p_y be the total number of panda bears in the
year y. So p_y = 64000 ⋅ 0.5^{(y−1960)/20} (graphed below).
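The panda formula of Example 26 has the same shape as the general growth formula, with the factor 2 replaced by 0.5. Here is a hedged Python sketch; the function name `quantity_from` and its `factor` parameter are assumptions made for this illustration, not notation from the textbook.

```python
def quantity_from(t, T, x_T, d, factor=2.0):
    """Quantity at time t, given quantity x_T at time T.

    Every d units of time the quantity is multiplied by `factor`
    (2 for doubling, 0.5 for halving).
    """
    return x_T * factor ** ((t - T) / d)

# Example 26: 64,000 pandas in 1960, halving every 20 years.
for year in (1960, 1980, 2000, 2020):
    print(year, quantity_from(year, 1960, 64000, 20, factor=0.5))
```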
Extrapolating, there will be m_2020 = 100,000 such citizens in the year 2020.
Exercise 11. Let b_y be the number of Singaporean billionaires in the year y. This number
doubles every 7 years. In 1990, there were 4 Singaporean billionaires. (Answer on p. 319.)
(i) Write down an equation that expresses b_y in terms of y.
We define a = log_b c to be the number such that b^a = c. We call b the logarithmic base.
log_3 9 = 2, because 3^2 = 9.
We define lg c = log_10 c.
lg 1 = 0, because 10^0 = 1.
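The definition can be tried out in Python using the standard `math` module. In this sketch the helper names `log_base` and `lg` are invented to mirror the textbook's notation; results are floating-point approximations.

```python
import math

def log_base(b, c):
    """log_b c: the number a such that b^a = c (for b > 0, b != 1, c > 0)."""
    return math.log(c) / math.log(b)

def lg(c):
    """Base-10 logarithm, written lg c in this textbook."""
    return math.log10(c)

a = log_base(3, 9)
print(a)        # approximately 2, because 3^2 = 9
print(3 ** a)   # raising the base to the logarithm recovers 9 (up to rounding)
print(lg(1))    # 0, because 10^0 = 1
```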
Remark 2. The Singapore-Cambridge A-level exams write lg c to mean the base-10 logarithm
of c, so that’s what we’ll stick to. But you should know that some other writers
(including most calculators) simply write log c to mean the same.
(ii) Given the following, find the constants a, b, and c: log_a 16 = 4, log_b 0.25 = −1, and
log_c 5 = 1.
(iv) Rewrite the following equations in exponential form: α = log_4 β and log_γ δ = 17.
For all real numbers x, we have log_x 1 = 0, because x^0 = 1 (this was stated in our discussion
of the laws of exponents). And if c ≤ 0, then log_x c is undefined, because there is no real
number a such that x^a ≤ 0.
(i) log_b b^x = x,
(ii) log_b (xy) = log_b x + log_b y,
(iii) log_b x − log_b y = log_b (x/y),
(iv) log_b x^a = a log_b x,
(v) log_b x = log_a x / log_a b,
(vi) y = ln x ⇐⇒ e^y = x.
(ii) By (i), x = b^{log_b x} and y = b^{log_b y}. Hence, xy = b^{log_b x} b^{log_b y} = b^{log_b x + log_b y}. Apply log_b to both
sides of this equation to get log_b (xy) = log_b x + log_b y.
(iii) x/y = b^{log_b x} / b^{log_b y} = b^{log_b x − log_b y}. Apply log_b to both sides of this equation to get
log_b (x/y) = log_b x − log_b y.
(iv) By (i), x^a = (b^{log_b x})^a = b^{a log_b x}. Apply log_b to both sides of this equation to get
log_b x^a = a log_b x.
(v) By (i), x = b^{log_b x}. Plugging this into RHS and using also (iv), we have
log_a x / log_a b = log_a (b^{log_b x}) / log_a b = (log_b x ⋅ log_a b) / log_a b = log_b x.
(vi) is immediate from (i). (Observe that ln x = log_e x.)
Example 35. log_2 5 = lg 5 / lg 2 = ln 5 / ln 2 = log_3 5 / log_3 2. Indeed, log_2 5 = log_a 5 / log_a 2 for
any positive number a ≠ 1.
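The change-of-base computation in Example 35 can be checked numerically. This is a rough sketch using Python's `math` module; the helper name `log_base` is hypothetical, and the comparison allows for floating-point rounding.

```python
import math

def log_base(b, x):
    """log_b x computed via natural logarithms (b > 0, b != 1, x > 0)."""
    return math.log(x) / math.log(b)

# Four ways of computing log_2 5, each using a different base.
values = [
    math.log2(5),
    math.log10(5) / math.log10(2),     # lg 5 / lg 2
    math.log(5) / math.log(2),         # ln 5 / ln 2
    log_base(3, 5) / log_base(3, 2),   # log_3 5 / log_3 2
]
print(values)
print(all(math.isclose(v, values[0]) for v in values))
```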
(ii) Find x if 2 log_a 7 + 0.25 log_a 81 − log_a 3 = log_a x, where a is a positive constant.
Example 36. The graphs below are of the equations y = log_2 x, y = log_3 x, y = ln x, and
y = lg x.
Each of these graphs crosses the horizontal axis at the point (1, 0).
Moreover, each has vertical asymptote x = 0, because as x grows ever closer to (but never
equals) 0, y grows infinitely small (i.e. to −∞).
Each of these graphs crosses the horizontal axis at the point (1, 0).
Moreover, each of these graphs has vertical asymptote x = 0: as x grows ever closer to (but
never equals) 0, y becomes unboundedly large in magnitude.
Exercise 14. Graph (on the same diagram) the following equation and function: (i) y =
log_7 x; (ii) f defined by f(x) = log_9 x. (Answer on p. 320.)
Exam Tip
The topic of logarithmic growth is new to the 8865 syllabus. So there are no TYS questions
covering this topic.
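Example 38. The nth harmonic number is
\[ H_n = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{n}. \]
For example, the first four harmonic numbers are
\[ H_1 = \frac{1}{1} = 1, \quad H_2 = \frac{1}{1} + \frac{1}{2} = 1.5, \quad H_3 = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} = 1.8333\ldots, \quad H_4 = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = 2.0833\ldots \]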
It turns out that harmonic numbers grow logarithmically. In particular, a graph of the
harmonic numbers looks very similar to the graph of y = ln x (black dotted curve).
The harmonic numbers grow very slowly. For example, the first to exceed 10 is
\[ H_{12367} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \dots + \frac{1}{12367} \approx 10.000043. \]
Nonetheless and remarkably, the harmonic numbers grow forever (towards ∞)! In 1968, it
was shown (source) that the first harmonic number that exceeds 100 is:
\[ H_{15092688622113788323693563264538101449859497}. \]
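The slow growth of the harmonic numbers is easy to explore with a short program. The sketch below uses ordinary floating-point sums, which is accurate enough for the threshold 10 but nowhere near sufficient for the threshold 100. The function names are made up for this illustration.

```python
def harmonic(n):
    """The nth harmonic number H_n = 1/1 + 1/2 + ... + 1/n."""
    return sum(1.0 / k for k in range(1, n + 1))

def first_exceeding(target):
    """Smallest n for which H_n exceeds target (practical for small targets only)."""
    total, n = 0.0, 0
    while total <= target:
        n += 1
        total += 1.0 / n
    return n

print([round(harmonic(n), 4) for n in (1, 2, 3, 4)])
print(first_exceeding(10))
```

Going from a threshold of 10 to a threshold of 100 takes the index from five digits to 44 digits, so the second result cannot be found by simply adding terms one at a time.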
10 A bit more precisely, the growth rate is inversely proportional to the time elapsed.
Example 40. Ah Kow is studying for the H1 Maths Exam. He takes a practice test every
day for 30 days. On Day #1, he gets nearly 0 points. His score improves rapidly initially,
but the rate of improvement slows down.
Example 41. The graph of y = x^2 is symmetric in the line x = 0 (which also happens to
be the vertical axis).
[Figure: graph of y = x^2, with the reflection line x = 0.]
[Figure: graph of y = 1/x, with the lines y = x and y = −x.]
Exercise 15. Draw the graphs of each of the following equations. (a) y = e^x. (b) y = 3x + 2.
(c) y = 2x^2 + 1. Identify any intercepts, turning points, asymptotes, and lines of symmetry.
(Answers on pp. 321, 322, and 323.)
Here are our first examples involving a graphing calculator. As mentioned, all such examples
use a TI84.
Most buttons on the TI84 have three different roles. Simply pressing a button executes the
role printed on the button itself. Pressing the blue 2ND and then a button executes the
role printed in blue above the button. And pressing the green ALPHA and then a button
executes the role printed in green above the button.
3. Press the blue 2ND button and then √ (which corresponds to the x^2 button) to
enter “√(”. Next press X,T,θ,n to enter “X”. (If we’d like, we can also enter the
right parenthesis ) to close the left parenthesis, but this is not necessary: the TI84
understands what you mean, even if you don’t enter the right parenthesis.)
4. Now press GRAPH and the calculator will graph the equation y = √x.
√
Exercise 16. Graph y = ex − x2 + x on your TI84. (Answer on p. 324.)
Example 45. Solve the following pair of simultaneous equations: y = x + 5 (1) and y = x^2 − 2x + 1 (2).
Plug (1) into (2): x + 5 = x^2 − 2x + 1. Rearrange to get x^2 − 3x − 4 = 0. We can factorise
x^2 − 3x − 4 = (x − 4)(x + 1). So x = 4 or x = −1. Correspondingly, y = 9 or y = 4.
So there are two solutions to the given pair of simultaneous equations, namely (x, y) = (4, 9)
and (x, y) = (−1, 4).
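The substitution method of Example 45 can be sketched in Python for any line y = mx + k and any quadratic y = ax^2 + bx + c with a ≠ 0. The function name and parameter names below are assumptions made for this illustration.

```python
import math

def intersect_line_parabola(m, k, a, b, c):
    """Intersection points of y = mx + k and y = ax^2 + bx + c (a != 0).

    Substituting the line into the quadratic gives
    ax^2 + (b - m)x + (c - k) = 0, which is solved with the quadratic formula.
    """
    A, B, C = a, b - m, c - k
    discriminant = B * B - 4 * A * C
    if discriminant < 0:
        return []  # the line and the parabola do not meet
    root = math.sqrt(discriminant)
    xs = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    return [(x, m * x + k) for x in xs]

# Example 45: y = x + 5 and y = x^2 - 2x + 1.
print(intersect_line_parabola(1, 5, 1, -2, 1))
```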
8. Simply press ENTER to confirm that you want y = x + 5 to be your first curve.
It now asks, “Second curve?” Again:
9. Simply press ENTER to confirm that you want y = x2 − 2x + 1 to be your second curve.
It now asks, “Guess?” You can use the arrow keys to move the blinking cursor to close to
where you believe an intersection point will be. Here I won’t bother moving the blinking
cursor at all. Instead, I will simply
10. Press ENTER . The TI84 tells you that the nearest intersection point is (x, y) = (−1, 4).
11. To find the other intersection point, repeat steps #7 through #10, using the arrow keys
as is appropriate.
The TI84 tells you what the other intersection point is — it is (x, y) = (4, 9).
[TI84 screenshots: after Step 8, after Step 9, after Step 10, and after Step 11.]
Example 46. Solve the following pair of simultaneous equations: x + y = 0 (1) and y = 3x^2 + x − 1 (2).
Rearrange (1) to y = −x (3). Plug (3) into (2) to get −x = 3x^2 + x − 1 or 0 = 3x^2 + 2x − 1. Now use
the quadratic formula:
\[ x = \frac{-2 \pm \sqrt{2^2 - 4(3)(-1)}}{2(3)} = \frac{-2 \pm 4}{6} = -1, \frac{1}{3}. \]
Correspondingly, y = 1 or y = −1/3.
So there are two solutions to the given pair of simultaneous equations, namely (x, y) =
(−1, 1) and (x, y) = (1/3, −1/3). TI84 screenshots:
You are required to know how to use a graphing calculator to find the numerical solution
of equations (including systems of linear equations).
Here I’ll use another method: First rewrite the two equations as a third equation y =
x^4 − x^3 − 5 − ln x. Our goal is to find the horizontal intercepts of this equation, which will
in turn also be the solutions to the above set of equations.
Exercise 18. Using your graphing calculator, solve the following systems of equations.
(a) y = 1/(1 + √x), y = x^5 − x^3 + 2. (b) y = 1/(1 − x^2), y = x^3 + sin x. (Answers on p. 326.)
In the TI84:
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n − SIN 0 . 5 . To enter “π”, press the blue 2ND button and then π
(which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will
have entered “x − sin(0.5πx)”.
4. Now press GRAPH and the calculator will graph y = x − sin(0.5πx).
It looks like the horizontal intercepts are close to the origin. Let’s zoom in to see better.
5. Press the (ZOOM) button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Nothing seems to happen. But now press ENTER
and the TI will zoom in a little for you.
It looks like there are 3 horizontal intercepts. To find out what precisely they are, we’ll use
the TI84’s “zero” option.
4. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
5. Press 2 to select the “zero” option. This brings you back to the graph, with a cursor
flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s ZERO function works by you first specifying a “Left Bound” and a “Right Bound”
for x. TI84 will then check to see if there are any horizontal intercepts (i.e. values of x for
which y = 0) within those bounds.
6. Using the < and > arrow keys, move the blinking cursor until it is where you want your
first “Left Bound” to be. For me, I have placed it a little to the left of where I believe
the leftmost horizontal intercept to be.
7. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
8. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is
where you want your first “Right Bound” to be. For me, I have placed it a little to the
right of where I believe the leftmost horizontal intercept is.
9. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the horizontal intercept is. So go ahead and:
10. Press ENTER . TI84 now informs you that there is a “Zero” at “x = −1”, “y = 0” and
places the blinking cursor at precisely that point. This is the first horizontal intercept
we’ve found.
To find each of the other 2 horizontal intercepts, just repeat steps 4 through 10. You
should be able to find that they are at x = 0 and x = 1. Altogether, the 3 intercepts are
x = −1, 0, 1. Based on these and what the graph looks like, we conclude: x > sin (0.5πx)
⇐⇒ x ∈ (−1, 0) ∪ (1, ∞).
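For readers who would like to reproduce the search for horizontal intercepts without a TI84, here is a rough Python sketch. It uses the bisection method between a left bound and a right bound; this is an assumption chosen for simplicity, and the TI84’s own “zero” routine may well work differently. The function names are hypothetical.

```python
import math

def f(x):
    """The function graphed above: x - sin(0.5 * pi * x)."""
    return x - math.sin(0.5 * math.pi * x)

def find_zero(func, left, right, tol=1e-10):
    """Find a zero of func between left and right by repeated halving.

    Requires func(left) and func(right) to have opposite signs (or one to be 0).
    """
    f_left, f_right = func(left), func(right)
    if f_left == 0:
        return left
    if f_right == 0:
        return right
    if f_left * f_right > 0:
        raise ValueError("func must change sign between the two bounds")
    while right - left > tol:
        mid = (left + right) / 2
        f_mid = func(mid)
        if f_mid == 0:
            return mid
        if f_left * f_mid < 0:
            right = mid                  # the zero lies in the left half
        else:
            left, f_left = mid, f_mid    # the zero lies in the right half
    return (left + right) / 2

# One pair of bounds around each of the three horizontal intercepts.
for left, right in [(-1.5, -0.5), (-0.5, 0.4), (0.5, 1.5)]:
    print(round(find_zero(f, left, right), 6))
```

As on the calculator, the bounds must be chosen by looking at the graph first, so that each pair brackets exactly one intercept.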
Look for the values of x for which x − e − ln x = 0. They are x = 0.07083, 4.1387:
Based on these horizontal intercepts and what the graph looks like, we conclude: x > e + ln x
if and only if x ∈ (0, 0.07083) ∪ (4.1387, ∞).
Exercise 20. Use a graphing calculator to find the values of x for which each of the
following inequalities is true. (a) x^3 − x^2 + x − 1 > e^x. (b) √x > cos x. (c) 1/(1 − x^2) > x^3 + sin x.
(Answers on p. 328.)
Exercise 21. (PSLE-style question.) When Apu was 40 years old, Beng was twice as old
as Caleb. Today, Caleb is 28 years old and Apu is twice as old as Beng. What are the ages
of Apu and Beng today? (If necessary, assume that the age of a person is always an integer
and is fixed between January 1st and December 31st of each year.) (Answer on p. 329.)
Exercise 23. The point (−1, 2) satisfies the equation $y = ax^2 + bx + c$. Moreover, the
minimum point of the equation $y = ax^2 + bx + c$ is (0, 0). What are a, b, and c? (Answer on
p. 329.)
Calculus
Example 53. The line with slope 3 and which passes through the point (1, 2) has equation
y−2 = 3(x−1). If desired, we can rearrange this equation into a more familiar form: y = 3x−1.
Example 54. The line with slope −1 and which passes through the point (3, −1) has
equation y − (−1) = −1(x − 3). If desired, we can rearrange this equation into a more
familiar form: y = −x + 2.
Example 56. The line with slope 0 and which passes through the point (1, 1) has equation
y − 1 = 0(x − 1). If desired, we can rearrange this equation into a more familiar form: y = 1.
The problem of finding the derivative is the problem of finding the slope of the tangent to
a graph at a given point.
Graphed below is some function f . Pick some point A = (a, f (a)). Draw the line l which
is tangent to the graph at the point A.
How do we find the slope of l? Unsure of how to proceed, we try a crude approximation.
Pick some point $X_1 = (x_1, f(x_1))$ that is also on the graph. Consider the line $AX_1$. What's
its slope? Slope = Rise ÷ Run and so $AX_1$ has slope
$$\frac{f(x_1) - f(a)}{x_1 - a}.$$
This number serves as our first crude approximation of the slope of l.
How can we improve on this approximation? Simple: just pick some point $X_2 = (x_2, f(x_2))$
that is closer to A. The line $AX_2$ has slope
$$\frac{f(x_2) - f(a)}{x_2 - a}.$$
This number serves as our second, improved approximation of the slope of l.
At least in theory, we can keep repeating this procedure, by picking points that are ever
closer to A. Our estimates of the slope of l will get ever better. Altogether then, we are
motivated to make the following informal definition of the derivative: the derivative of f at a is
$$\frac{f(x) - f(a)}{x - a},$$
when x is “very close to, but not equal to” a.
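To see the approximations improve, here is a small Python sketch. It assumes the function f(x) = x² and the point a = 1.5 purely for illustration. The name `secant_slope` is a hypothetical helper.

```python
def secant_slope(f, a, x):
    """Slope of the line through (a, f(a)) and (x, f(x)): rise divided by run."""
    return (f(x) - f(a)) / (x - a)

def f(x):
    return x ** 2

a = 1.5
# Pick points that are ever closer to a, as in the text.
gaps = (1.0, 0.1, 0.01, 0.001)
slopes = [secant_slope(f, a, a + gap) for gap in gaps]
for gap, slope in zip(gaps, slopes):
    print(f"x = a + {gap}: slope of secant = {slope:.4f}")
```

The printed slopes settle towards 3, which is the slope of the tangent to y = x² at x = 1.5.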
The following proposition summarises the rules of differentiation you need to know. You
don’t need to know why they work; instead, you need only blindly apply them like a monkey.
For example, Rule #1 says that the function h defined by h(x) = k (where k is some
constant) has derivative h′ defined by h′ (x) = 0.
1. $\frac{d}{dx} k = 0$,   2. $\frac{d}{dx} x^k = kx^{k-1}$,   3. $\frac{d}{dx} e^x = e^x$,
4. $\frac{d}{dx} \ln x = \frac{1}{x}$,   5. $\frac{d}{dx} (f \pm g) = f' \pm g'$,   6. $\frac{d}{dx} (kf) = kf'$.
Proof. Omitted.
The statement that the derivative of the function f is the function f′ may be written compactly as:
$$\frac{df}{dx} = f' \quad \text{or} \quad \frac{df(x)}{dx} = f'(x).$$
Example 57. Graphed below (in red) is the function f defined by f (x) = 5x.
This says that the tangent to the graph of g at the point (x, g(x)) has slope 2x.
For example, the tangent at (1.5, 2.25) has slope 2x = 2(1.5) = 3. Its equation is thus
y − 2.25 = 3(x − 1.5) or y = 3x − 2.25.
As another example, the tangent at (−1, 1) has slope 2x = 2(−1) = −2. Its equation is thus
y − 1 = −2 [x − (−1)] or y = −2x − 1.
This says that the tangent to the graph of h at the point (x, h(x)) has slope $3x^2 - 4x + 5$.
For example, the tangent at (−1, −9) has slope $3(-1)^2 - 4(-1) + 5 = 12$. Its equation
is thus y − (−9) = 12 [x − (−1)] or y = 12x + 3.
As another example, the tangent at (1, 3) has slope $3(1)^2 - 4(1) + 5 = 4$. Its equation is thus
y − 3 = 4(x − 1) or y = 4x − 1.
$\frac{dy}{dx}$.
Example 60. Consider the equation y = 5x. The derivative of y with respect to x is
$$\frac{dy}{dx} = 5.$$
$$\frac{dy}{dx} = 2x.$$
$$\frac{dy}{dx} = 3x^2 - 4x + 5.$$
Example 64. Rule #1. The function g defined by g(x) = 31 is another example of a
constant function. Its derivative is the function g ′ defined by g ′ (x) = 0.
Example 65. Rule #2. The function f defined by f (x) = x has derivative f ′ defined by
f ′ (x) = 1.
Example 66. Rule #2. The function g defined by $g(x) = x^2$ has derivative g′ defined by
$g'(x) = 2x$.
Example 67. Rule #2. The function h defined by $h(x) = x^3$ has derivative h′ defined by
$h'(x) = 3x^2$.
Example 68. Rule #2. The function i defined by $i(x) = x^4$ has derivative i′ defined by
$i'(x) = 4x^3$.
Example 69. Rule #3. The function f defined by $f(x) = e^x$ has derivative f′ defined by
$f'(x) = e^x$. That is, interestingly enough, the derivative of f is itself.
Example 70. Rule #4. The function g defined by $g(x) = \ln x$ has derivative g′ defined
by $g'(x) = 1/x$.
Example 71. Rule #5. The function h defined by $h(x) = x^3 + \ln x$ has derivative h′
defined by $h'(x) = 3x^2 + 1/x$.
Example 72. Rule #5. The function h defined by $h(x) = e^x + x^4$ has derivative h′ defined
by $h'(x) = e^x + 4x^3$.
Example 73. Rule #5. The function i defined by $i(x) = 15 + x + x^2$ has derivative i′
defined by $i'(x) = 1 + 2x$.
Example 75. Rule #6. The function f defined by $f(x) = 30x$ has derivative f′ defined
by $f'(x) = 30$.
Example 77. Rule #6. The function h defined by $h(x) = 4e^x$ has derivative h′ defined
by $h'(x) = 4e^x$. (Interestingly, the only functions whose derivatives are themselves must be
of the form $f(x) = ke^x$, for some constant k.)
Exercise 24. For the functions below, (a) compute its derivative; (b) find the equations of
the tangents to the graph at the points where x = 1 and x = 2. (Answer on p. 330.)
Exercise 25. For each of the two equations below, (a) compute the derivative of y with
respect to x; (b) find the equations of the tangents to the graph at the points where x = 1
and x = 2. (Answer on p. 331.)
(i) $y = 13\left(\sqrt{x} - \frac{3}{x^2}\right)$. (ii) $y = 9e^x - x^5$.
The Chain Rule is yet another rule of differentiation. A simple example to illustrate:
Example 78. When I add 1 g of Milo (the x-variable) to a cup of water, the volume of
the water increases by 2 cm³ (the y-variable). We can write this more compactly as
$$\frac{dy}{dx} = 2 \text{ cm}^3\,\text{g}^{-1}.$$
When the volume of the water increases by 1 cm³ (the y-variable), the water level (in the
cup) rises by 0.3 cm (the z-variable). We can write this more compactly as
$$\frac{dz}{dy} = 0.3 \text{ cm}\,\text{cm}^{-3} = 0.3 \text{ cm}^{-2}.$$
Altogether then, when I add 1 g of Milo (the x-variable) to a cup of water, I should expect
the water level to rise by 0.6 cm. That is,
$$\frac{dz}{dx} = 0.6 \text{ cm}\,\text{g}^{-1}.$$
We got the above expression for dz/dx by making the following quick computation:
$$\frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx} = 0.3 \times 2 = 0.6 \text{ cm}\,\text{g}^{-1}.$$
In general, let x, y, and z be variables. Suppose x and z are not directly related. However,
a small change in x causes a small change in y. And in turn, a small change in y causes a
small change in z.
Informally, the Chain Rule addresses the following question: “If there is a small unit
change in x, how does z change?” The answer is this:
The Chain Rule is thus simply this equation: $\frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx}$.
$$f'(x) = \frac{d e^{x^3}}{dx} = \frac{d e^{x^3}}{d x^3} \cdot \frac{d x^3}{dx} = e^{x^3} \cdot 3x^2.$$
Example 80. Let g be defined by $g(x) = \sqrt{4x - 1}$. Its derivative g′ is defined by:
$$g'(x) = \frac{d\sqrt{4x - 1}}{dx} = \frac{d\sqrt{4x - 1}}{d(4x - 1)} \cdot \frac{d(4x - 1)}{dx} = 0.5(4x - 1)^{-0.5} \cdot 4 = 2(4x - 1)^{-0.5}.$$
Here’s a more complicated example, where the Chain Rule is applied twice.
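Example 81. Let h be defined by $h(x) = \left(\ln x^2 + e^{5x+3}\right)^3$. Its derivative h′ is defined by:
$$h'(x) = \frac{d\left(\ln x^2 + e^{5x+3}\right)^3}{dx} = \frac{d\left(\ln x^2 + e^{5x+3}\right)^3}{d\left(\ln x^2 + e^{5x+3}\right)} \cdot \frac{d\left(\ln x^2 + e^{5x+3}\right)}{dx}$$
$$= 3\left(\ln x^2 + e^{5x+3}\right)^2 \left(\frac{1}{x^2} \cdot 2x + e^{5x+3} \cdot 5\right) = 3\left(\ln x^2 + e^{5x+3}\right)^2 \left(\frac{2}{x} + 5e^{5x+3}\right).$$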
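A Chain Rule answer can be sanity-checked by comparing it against a numerical slope. The Python sketch below does this for the square-root example and the twice-applied example above. The helper `numerical_derivative` (a central difference with step 1e-6) and the test point x = 1 are assumptions made for illustration. Neither comes from the syllabus.

```python
import math

def numerical_derivative(f, x, step=1e-6):
    """Estimate f'(x) from two nearby points, one on each side of x."""
    return (f(x + step) - f(x - step)) / (2 * step)

def g(x):
    return math.sqrt(4 * x - 1)

def g_prime(x):
    # Chain Rule result: 0.5 (4x - 1)^(-0.5) * 4
    return 2 * (4 * x - 1) ** -0.5

def h(x):
    return (math.log(x ** 2) + math.exp(5 * x + 3)) ** 3

def h_prime(x):
    # Chain Rule applied twice: outer cube, then the two inner terms.
    inner = math.log(x ** 2) + math.exp(5 * x + 3)
    return 3 * inner ** 2 * (2 / x + 5 * math.exp(5 * x + 3))

x0 = 1.0
g_matches = math.isclose(numerical_derivative(g, x0), g_prime(x0), rel_tol=1e-6)
h_matches = math.isclose(numerical_derivative(h, x0), h_prime(x0), rel_tol=1e-6)
print(g_matches, h_matches)
```

If either comparison failed, that would suggest a slip in the algebra (for instance, forgetting the factor 4 or the factor 5 from the inner function).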
Exercise 26. The functions f , g, and h are defined below. Find the value of the derivative
of each, at x = 0. (Answer on p. 331.)
(a) $f(x) = x^2$.
(b) $g(x) = 1 + [x - \ln(x + 1)]^2$.
(c) $h(x) = \left(1 + [x - \ln(x + 1)]^2\right)^3$.
Example 82. The function f defined by f (x) = x2 has derivative f ′ defined by f ′ (x) = 2x.
For x < 0, f is (strictly) decreasing, i.e. f ′ (x) < 0. For x > 0, f is (strictly) increasing,
i.e. f ′ (x) > 0.
Example 83. Graphed below is the function g defined by $g(x) = 3x^3 - 5x^2 + x - 7$. Its
derivative g′ is defined by $g'(x) = 9x^2 - 10x + 1$.
From what we know about quadratic equations, $g'(x) = 9x^2 - 10x + 1 = (9x - 1)(x - 1)$ is
negative if 1/9 < x < 1, zero if x = 1/9 or x = 1, and positive if x < 1/9 or x > 1.
So for 1/9 < x < 1, the function g is (strictly) decreasing, i.e. g′(x) < 0. And for x < 1/9
or x > 1, the function g is (strictly) increasing, i.e. g′(x) > 0.
Exercise 27. Let f be defined by $f(x) = 3x^2 - 4x + 1$. (i) Sketch the graph of f. (ii) Identify
where f′(x) is negative, zero, and positive (equivalently, where the graph of f
is decreasing, flat, and increasing). (iii) Identify the stationary points. (Answer on p. 332.)
It turns out that every maximum and minimum turning point is a stationary
point. The intuition for this is quite simple:
Example 85. Graphed below is f defined by $f(x) = -(x - 1)^2$. Here’s the intuition for why
f′(1) = 0 (i.e. why there is a stationary point at x = 1):
In order for 1 to be a maximum turning point of f , it must be that to its left, f is increasing;
while to its right, f is decreasing. In other words, to the left of 1, f ′ (x) ≥ 0. While to the
right of 1, f ′ (x) ≤ 0. Altogether then, we must have f ′ (1) = 0 — that is, the maximum
turning point must also be a stationary point.
The next exercise asks you to give a similar piece of intuition for why g ′ (−1) = 0.
Exercise 28. Explain why g ′ (−1) = 0 in the above Example. (Answer on p. 332.)
Every maximum or minimum turning point is a stationary point. However, the converse is
not true: not every stationary point is a turning point.
So to identify all the maximum or minimum turning points of a function, we can follow
this two-step recipe:
For H1 Maths, checking what exactly a stationary point is usually just involves sketching
the graph (either manually or using your graphing calculator).
1. $g'(x) = 28x^6 - 56x^3 + 28 = 28(x^6 - 2x^3 + 1) = 28(x^3 - 1)(x^3 - 1)$. So the only stationary
point is at x = 1.
2. But this is not a turning point, as a quick graph sketch will verify.
Exercise 29. For each of the following functions, identify any maximum and minimum
turning points. (Answers on pp. 333 and 334.)
(i) f defined by f (x) = x.
(0, 0) is an inflexion point because this is where the graph changes from concave downwards
to concave upwards.
The tangent line test says that a point is an inflexion point if and only if the tangent line
at that point is above the graph on one side of the point and below the graph on the other side.
This is illustrated in the above example.
11
The discussion in this chapter here is very brief and informal, because a proper discussion of inflexion points would be
much longer. If you’re really interested in what inflexion points are, please read my H2 Mathematics Textbook. (In the
2006-2015 H1 Maths exams, I can find only one 2-mark question on inflexion points — see Exercise 63.1. So it isn’t terribly
important, if all you care about is getting an A.) By the way, inflection would be the American spelling.
Example 93. Graphed below is the function f defined by $f(x) = x^3 + x$. We have $f'(x) =
3x^2 + 1$. The point (0, 0) is not a stationary point because f′(0) = 1 ≠ 0.
Nonetheless, it is an inflexion point, because to the left of 0, f is concave downwards; and
to the right, f is concave upwards. (We can also verify this using the tangent line test.)
But don’t worry, the A-level exams will ONLY ask about stationary points of inflexion.
And so for the purposes of the A-level exams, the Simple Recipe given in the previous
chapter will detect not only all turning points, but also all inflexion points.
$f'(x) = 5x^4 + 8x^3 + 3x^2 = x^2(5x^2 + 8x + 3) = x^2(5x + 3)(x + 1)$. So the only stationary points
are at x = −1, x = −0.6, and x = 0. These are labelled in the graph below as A, B, and C.
2. Investigate the nature of these points.
A is a maximum turning point; B is a minimum turning point, and C is a stationary
point of inflexion. (The graph of f actually has two other points of inflexion other than C.
However, they are non-stationary and you are not required to find them for the A-levels.)
Example 95. Graphed below is the function g defined by $g(x) = 9x^4 + 2x^3 - 3x^2$.
$g'(x) = 36x^3 + 6x^2 - 6x = 6x(6x^2 + x - 1) = 6x(3x - 1)(2x + 1)$. So the only stationary points
are at x = −1/2, x = 0, and x = 1/3. These are labelled in the graph below as A, B, and C.
2. Investigate the nature of these points.
A and C are both minimum turning points, B is a maximum turning point. There are no
stationary points of inflexion. (There may or may not be non-stationary points of inflexion,
but you’re not required to know how to find these for the A-levels.)
$h'(x) = 12x^5 - 12x^3 = 12x^3(x^2 - 1) = 12x^3(x - 1)(x + 1)$. So the only stationary points are
at x = −1, x = 0, and x = 1. These are labelled in the graph below as A, B, and C.
2. Investigate the nature of these points.
A and C are both minimum turning points, B is a maximum turning point. There are no
stationary points of inflexion. (Again, there may or may not be non-stationary points of
inflexion, but you’re not required to know how to find these for the A-levels.)
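The "investigate the nature" step can be mimicked by checking the sign of the derivative just to the left and just to the right of each stationary point. The Python sketch below is one way to do so. The function name `classify` and the offset 0.001 are illustrative assumptions, and the final fallback label is only valid when the derivative is nonzero on both sides, as it is in the three examples above.

```python
def classify(derivative, x0, offset=1e-3):
    """Classify a stationary point at x0 by the sign of the derivative on each side."""
    left = derivative(x0 - offset)
    right = derivative(x0 + offset)
    if left > 0 and right < 0:
        return "maximum turning point"
    if left < 0 and right > 0:
        return "minimum turning point"
    # Same sign on both sides: the graph flattens out but does not turn.
    return "stationary point of inflexion"

def f_prime(x):
    return x ** 2 * (5 * x + 3) * (x + 1)

def g_prime(x):
    return 6 * x * (3 * x - 1) * (2 * x + 1)

def h_prime(x):
    return 12 * x ** 3 * (x - 1) * (x + 1)

nature_f = [classify(f_prime, x0) for x0 in (-1, -0.6, 0)]
nature_g = [classify(g_prime, x0) for x0 in (-1 / 2, 0, 1 / 3)]
nature_h = [classify(h_prime, x0) for x0 in (-1, 0, 1)]
print(nature_f, nature_g, nature_h, sep="\n")
```

The offset must be small enough that no other stationary point lies between x0 − offset and x0 + offset. A graph sketch remains the method expected in the exam.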
Exercise 30. (Answer on p. 335.) For each of the following functions, find the stationary
points and investigate the nature of each.
(i) f defined by f (x) = x3 − 3x + 1.
Example 97. Define f by f (x) = x − sin (0.5πx). Let’s find the minimum point of f , in
the region where 0 < x < 2.
It looks like starting at x = 0, the function is decreasing, then hits a minimum point, then
keeps increasing. Our goal now is to find out what that minimum point is.
4. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
5. Press 3 to select the “minimum” option. This brings you back to the graph, with a
cursor flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s MINIMUM function works by you first choosing a “Left Bound” and a “Right
Bound” for x. TI84 will then look for the minimum point within your chosen bounds.
6. Using the < and > arrow keys, move the blinking cursor until it is where you want your
first “Left Bound” to be. For me, I have placed it a little to the left of where I believe
the minimum point to be.
7. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
8. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is
where you want your first “Right Bound” to be. For me, I have placed it a little to the
right of where I believe the minimum point to be.
9. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the minimum point is. So go ahead and:
10. Press ENTER . TI84 now informs you that there is a “Minimum” at “X = .56066485”,
“Y = −.2105137” and places the cursor at precisely that point. This is our desired
minimum point.
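The left-bound and right-bound idea can be imitated in a few lines of Python. The sketch below uses a ternary search, which is an assumed stand-in and not necessarily what the TI84 does internally. It relies on f having a single dip between the two bounds, which holds for this f on 0 < x < 2.

```python
import math

def f(x):
    return x - math.sin(0.5 * math.pi * x)

def ternary_min(func, left_bound, right_bound, tol=1e-7):
    """Shrink [left_bound, right_bound] around the minimum of a single-dip function."""
    lo, hi = left_bound, right_bound
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if func(m1) < func(m2):
            hi = m2   # the minimum cannot lie to the right of m2
        else:
            lo = m1   # the minimum cannot lie to the left of m1
    return (lo + hi) / 2

x_min = ternary_min(f, 0.0, 2.0)
y_min = f(x_min)
print(f"X = {x_min:.6f}, Y = {y_min:.6f}")
```

The result should agree with the calculator's answer to several decimal places.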
Example 98. Define f by $f(x) = e^{\sin x}$. Our goal is to find f′(2) and f′(3).
The TI84 now graphs $y = e^{\sin x}$. Now you need only tell the TI84 at which point (x, y) you’d
like it to evaluate dy/dx. So to find f′(2), simply
6. Press 2 .
7. Press ENTER . You’re now told that f ′ (2) = −1.033116.
To find f ′ (3):
8. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button) to again bring up the CALCULATE menu. Again press 6 to select the “dy/dx”
option. The only difference now is that we press 3 . Press ENTER . You’re now told
that f ′ (3) = −1.140038.
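The two values can be checked against a numerical slope computed in Python. This is a rough sketch that assumes a central difference with step 1e-5. The calculator's own method and step size may differ, so agreement is only expected to a few decimal places.

```python
import math

def f(x):
    return math.exp(math.sin(x))

def slope_at(func, x, step=1e-5):
    """Approximate dy/dx at x using one point on each side of x."""
    return (func(x + step) - func(x - step)) / (2 * step)

f_prime_2 = slope_at(f, 2.0)
f_prime_3 = slope_at(f, 3.0)
print(f"f'(2) = {f_prime_2:.6f}, f'(3) = {f_prime_3:.6f}")
```

By the Chain Rule, the exact derivative is $f'(x) = e^{\sin x} \cos x$, which gives the same values.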
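Example 99. We unload sand onto a flat surface at a steady rate of 0.01 m³ s⁻¹. Assume
the unloaded sand always forms a perfect cone whose height and base diameter are always
equal.
Let’s find the rate at which the base area of the cone is increasing, at the instant t = 20 s.
First, recall that a cone with base radius r and height h has volume
$$V = \frac{1}{3}\pi r^2 h.$$
Since the base diameter equals the height (or h = 2r), we can rewrite this as
$$V = \frac{2}{3}\pi r^3.$$
Differentiating with respect to t gives $\frac{dV}{dt} = 2\pi r^2 \frac{dr}{dt}$.
Let $A = \pi r^2$ be the base area. The rate at which the base area is increasing is
$$\frac{dA}{dt} = 2\pi r \frac{dr}{dt} = \frac{dV}{dt} \div r.$$
The volume of the sand is always increasing at a rate 0.01 m³ s⁻¹. That is:
$$\frac{dV}{dt} = 0.01 \text{ m}^3\,\text{s}^{-1}.$$
At t = 20 s, the volume is V = 0.01 × 20 = 0.2 m³, so that $r = \left(\frac{3V}{2\pi}\right)^{1/3} = \left(\frac{0.3}{\pi}\right)^{1/3}$ m. Hence:
$$\left.\frac{dA}{dt}\right|_{t=20} = 0.01 \div \left(\frac{0.3}{\pi}\right)^{1/3} = 0.0219 \text{ m}^2\,\text{s}^{-1}.$$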
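The arithmetic in this example can be reproduced in Python. The sketch below assumes that no sand has been unloaded at t = 0, so that V = 0.01t. The names `radius` and `base_area` are illustrative. It evaluates the formula dA/dt = (dV/dt) ÷ r and, as an independent check, also estimates the same rate directly from the base area at two nearby instants.

```python
import math

RATE = 0.01  # dV/dt in cubic metres per second

def radius(t):
    """Base radius at time t, from V = (2/3) * pi * r^3 with V = RATE * t."""
    volume = RATE * t
    return (3 * volume / (2 * math.pi)) ** (1 / 3)

def base_area(t):
    return math.pi * radius(t) ** 2

t = 20.0
rate_from_formula = RATE / radius(t)  # dA/dt = (dV/dt) / r

# Independent check: slope of the base area between two nearby instants.
dt = 1e-6
rate_from_slope = (base_area(t + dt) - base_area(t - dt)) / (2 * dt)

print(round(rate_from_formula, 4), round(rate_from_slope, 4))
```

Both numbers round to 0.0219 m² s⁻¹.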
(b) Use the Pythagorean Theorem to express l in terms of r and h. Hence express l solely
in terms of h.
(c) Now express the total external surface area A (excludes the base) solely in terms of h.
$$\frac{dA}{dh} = \frac{3}{2} \cdot \frac{\pi - \frac{6}{h^3}}{A}.$$
$$h = \left(\frac{6}{\pi}\right)^{1/3} \approx 1.24 \text{ m}.$$
(e) Using your expression from part (c) and your graphing calculator, graph A as a function
of h. Hence confirm that the stationary point we found in part (d) is indeed the minimum
turning point. That is, the desired height is indeed
$$h = \left(\frac{6}{\pi}\right)^{1/3} \approx 1.24 \text{ m}.$$
If the function g is the derivative of the function f , then we may also say that f is an
indefinite integral of g.
Example 100. Consider the functions f and g defined by $f(x) = x^2$ and $g(x) = 2x$.
The function g is the derivative of the function f. We write:
$$\frac{df}{dx} = g \quad \text{or} \quad \frac{df(x)}{dx} = g(x).$$
The two statements above are equivalent. Each says: “the function g is the derivative of
the function f ”.
∫ g dx = f or ∫ g(x) dx = f (x).
The two statements above are equivalent. Each says: “the function f is an indefinite
integral of the function g”.
Remarks on notation:
One common source of confusion amongst students is a failure to grasp that x is merely a
dummy variable. We can replace x with any other letter. The next example illustrates:
∫ g dx = f or ∫ g(x) dx = f (x).
∫ g da = f or ∫ g(a) da = f (a).
∫ g db = f or ∫ g(b) db = f (b).
∫ g dc = f or ∫ g(c) dc = f (c).
The dummy variable is merely a place-holder for whatever input that goes into the
function f or g. We can use any letter for this dummy variable, be it x or a or b or c.
∫ g dx = f or ∫ g(x) dx = f (x).
$f(x) = e^{x^2}$ and $g(x) = 2x \cdot e^{x^2}$.
The function g is the derivative of the function f . Conversely, the function f is an indefinite
integral of the function g. We may write either
∫ g dx = f or ∫ g(x) dx = f (x).
It turns out that every function has infinitely many indefinite integrals.
Example 104. Consider the function f defined by f (x) = 2x. The following functions are
all indefinite integrals of f :
Indeed, f is the derivative of any function j of the form $j(x) = x^2 + C$ (where C is any
constant). Thus, any such j is an indefinite integral of f.
This is not terribly surprising, given that the derivative of any constant C is 0. We call C
the constant of integration.
Altogether then:
Example 106. Define f by $f(x) = xe^x$. You are given that an indefinite integral of f is
the function g defined by $g(x) = e^x(x - 1)$.
Then you immediately know that:
1. Every function h of the form $h(x) = e^x(x - 1) + C$ is an indefinite integral of f.
2. Moreover, besides such functions, there are no other indefinite integrals of f .
Proposition 2. Let k be any constant. Let f and g be functions with derivatives f ′ and
g ′ . Then
1. $\int k\,dx = kx + C$,
2. $\int x^k\,dx = \frac{x^{k+1}}{k+1} + C$,
3. $\int \frac{1}{x}\,dx = \ln|x| + C$,
4. $\int e^x\,dx = e^x + C$,
5. $\int (ax + b)^k\,dx = \frac{(ax + b)^{k+1}}{a(k + 1)} + C$,
6. $\int e^{ax+b}\,dx = \frac{1}{a}e^{ax+b} + C$,
7. $\int f'(x) \pm g'(x)\,dx = f(x) \pm g(x) + C$,
8. $\int kf'(x)\,dx = kf(x) + C$,
where in each case, C is the constant of integration. (For Rule #2, assume k ≠ −1.
And if k < 0, assume x ≠ 0. For Rule #3, assume x ≠ 0.)
So to prove Rule #3 (i.e. that $\int x^{-1}\,dx = \ln|x| + C$), it suffices to prove that
$\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. This we now do. First note that
$$\ln|x| + C = \begin{cases} \ln x + C, & \text{for } x > 0, \\ \ln(-x) + C, & \text{for } x < 0. \end{cases}$$
Thus,
$$\frac{d}{dx}(\ln|x| + C) = \begin{cases} \dfrac{1}{x}, & \text{for } x > 0, \\ \dfrac{-1}{-x} = \dfrac{1}{x}, & \text{for } x < 0. \end{cases}$$
And so indeed $\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. (Exercise 33 requests that you prove the
remaining rules.)
$f'(x) = (7x + 2)^2$, $g'(x) = (7x + 2)^3$, and $h'(x) = 5(7x + 2)^3$.
and $\int 5(7x + 2)^3\,dx = \frac{5(7x + 2)^4}{4 \cdot 7} + C_3$.
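Any claimed indefinite integral can be checked by differentiating it and seeing whether the original function comes back. The Python sketch below does this numerically for the last integral above, taking the constant of integration to be 0. The helper `central_difference` and the chosen test points are assumptions made for illustration.

```python
def integrand(x):
    return 5 * (7 * x + 2) ** 3

def claimed_integral(x):
    # Rule #5 combined with Rule #8, with the constant of integration set to 0.
    return 5 * (7 * x + 2) ** 4 / (4 * 7)

def central_difference(func, x, step=1e-6):
    return (func(x + step) - func(x - step)) / (2 * step)

test_points = [-1.0, 0.0, 0.3, 2.0]
relative_errors = [
    abs(central_difference(claimed_integral, x) - integrand(x)) / abs(integrand(x))
    for x in test_points
]
print(max(relative_errors))
```

A tiny maximum relative error indicates that the derivative of the claimed integral matches the integrand at every test point.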
Exercise 34. Find each of the following indefinite integrals. (Don’t forget to include the
constant of integration. Answer on p. 337.)
(i) $\int 7x^5 - 8x^4 + 3x^2 + 2\,dx$. (ii) $\int e^{5x+2} - (5x + 2)^2\,dx$. (iii) $\int 16/x + 32x^3\,dx$.
Surprisingly, we can use integration to find the area under the graph of a function.
Example 110. Graphed below is the function f defined by f (x) = 2x. What is the shaded
green area under the graph of f , between the lines x = 2 and x = 5?
We can of course find this area using primary school methods: This is a trapezium with
base 3 and sides 4 and 10. Hence, it has area
$$\frac{1}{2} \times \text{Base} \times (\text{Sum of sides}) = \frac{1}{2} \times 3 \times (4 + 10) = 21.$$
But surprisingly enough, this area can also be found using integration. Pick any indefinite
integral of f, say g defined by $g(x) = x^2$. Then the desired area is simply:
$$g(5) - g(2) = 5^2 - 2^2 = 21.$$
We sometimes also write $[g(x)]_2^5$ as shorthand for g(5) − g(2).
√
Example 111. Let the function f be defined by f (x) = x + 1. The definite integral
$\int_1^3 f\,dx$ (simply the area under f, between 1 and 3) is highlighted in blue. Similarly, the
definite integral $\int_5^8 f\,dx$ (simply the area under f, between 5 and 8) is highlighted in red.
Example 112. Consider the function g defined by $g(x) = 9x^2 + 6x + 1$. What is the area
under the graph of g, between the lines x = 0 and x = 7?
12
But see my H2 Mathematics Textbook if you’re interested.
Example 114. Consider the function i defined by $i(x) = e^x$. What is the shaded green
area under the graph of i, between the lines x = 3 and x = 4?
(ii) Find the area bounded by the x-axis, the lines x = −2 and x = 3, and the graph of
$y = x^2 + 5x + 10$.
(iii) Find the area bounded by the x-axis, the lines x = 1 and x = 2, and the graph of
y = 1/x.
Example 115. Find the exact area bounded by the curve $y = x^2$ and the horizontal lines
y = 1 and y = 2.
It’s always helpful to make a quick sketch (given below). Our desired area is labelled A
below. To find a desired area, there are usually multiple methods, some quicker than others.
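Method #1. The entire rectangle A + B + C + D has area $2 \times 2\sqrt{2} = 4\sqrt{2}$. B has area
$$\int_{-\sqrt{2}}^{-1} x^2\,dx = \left[\frac{x^3}{3}\right]_{-\sqrt{2}}^{-1} = -\frac{1}{3} - \left(-\frac{2\sqrt{2}}{3}\right) = \frac{2\sqrt{2} - 1}{3}.$$
By symmetry, D has the same area as B. C has area 1 × 2. Hence, A has area
$$A + B + C + D - (B + C + D) = 4\sqrt{2} - \left(\frac{2\sqrt{2} - 1}{3} + 2 + \frac{2\sqrt{2} - 1}{3}\right) = \frac{4}{3}\left(2\sqrt{2} - 1\right).$$
Method #2. The right branch of the curve $y = x^2$ has equation $x = \sqrt{y}$. The right half of
the area A is
$$\int_{y=1}^{y=2} x\,dy = \int_{y=1}^{y=2} \sqrt{y}\,dy = \frac{2}{3}\left[y^{3/2}\right]_1^2 = \frac{2}{3}\left(2\sqrt{2} - 1\right).$$
Hence, $A = \frac{4}{3}\left(2\sqrt{2} - 1\right)$.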
Exercise 36. Find the exact area bounded by the curve $y = x^3$, the horizontal lines y = 1
and y = 2, and the vertical axis. (Answer on p. 338.)
Example 116. Find the area A bounded by the curve y = x2 and the line y = x + 1.
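By the quadratic formula, the curve and line intersect at the points $x = \frac{1 \pm \sqrt{5}}{2}$.
$$\int_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2} x + 1 - x^2\,dx = \left[\frac{x^2}{2} + x - \frac{x^3}{3}\right]_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2}$$
$$= \left[\frac{(1+\sqrt{5})^2}{2^3} + \frac{1+\sqrt{5}}{2} - \frac{(1+\sqrt{5})^3}{3 \cdot 2^3}\right] - \left[\frac{(1-\sqrt{5})^2}{2^3} + \frac{1-\sqrt{5}}{2} - \frac{(1-\sqrt{5})^3}{3 \cdot 2^3}\right]$$
$$= \left[\frac{6+2\sqrt{5}}{8} + \frac{1+\sqrt{5}}{2} - \frac{16+8\sqrt{5}}{24}\right] - \left[\frac{6-2\sqrt{5}}{8} + \frac{1-\sqrt{5}}{2} - \frac{16-8\sqrt{5}}{24}\right]$$
$$= \left[\frac{3+\sqrt{5}}{4} + \frac{1+\sqrt{5}}{2} - \frac{2+\sqrt{5}}{3}\right] - \left[\frac{3-\sqrt{5}}{4} + \frac{1-\sqrt{5}}{2} - \frac{2-\sqrt{5}}{3}\right] = \frac{7+5\sqrt{5}}{12} - \frac{7-5\sqrt{5}}{12} = \frac{5\sqrt{5}}{6}.$$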
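Exact areas like these can be double-checked by adding up many thin rectangles. The Python sketch below uses the midpoint rule with 100,000 strips. Both the rule and the strip count are illustrative assumptions. It checks the area between the line y = x + 1 and the curve y = x², and also the earlier area bounded by y = x², y = 1 and y = 2.

```python
import math

def midpoint_rule(func, a, b, strips=100_000):
    """Approximate the area under func between a and b using thin rectangles."""
    width = (b - a) / strips
    return width * sum(func(a + (i + 0.5) * width) for i in range(strips))

# Area between the line y = x + 1 (above) and the curve y = x^2 (below).
left = (1 - math.sqrt(5)) / 2
right = (1 + math.sqrt(5)) / 2
area_line_and_curve = midpoint_rule(lambda x: (x + 1) - x ** 2, left, right)

# Area bounded by y = x^2, y = 1 and y = 2: twice the right half (Method #2).
area_between_horizontals = 2 * midpoint_rule(math.sqrt, 1, 2)

print(area_line_and_curve, 5 * math.sqrt(5) / 6)
print(area_between_horizontals, 4 / 3 * (2 * math.sqrt(2) - 1))
```

Each printed pair should agree to many decimal places, which supports the exact answers found by hand.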
Exercise 37. Find the exact area bounded by the curve $y = e^x$ and the lines y = 2, y = 3,
and x = 0.5. (Answer on p. 339.)
$$= 2\left[x - \frac{x^3}{3} + \frac{x^2}{2}\right]_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} = \frac{5\sqrt{5}}{3},$$
where we’ve simply recycled our tedious calculations from the previous example.
Exercise 38. Find the exact area bounded by the curves $y = 2 - x^2$ and $y = x^2 + 1$. (Answer on
p. 339.)
Example 118. Use your TI84 to find the approximate area bounded by the curve $y = e^{\sin x}$
and the horizontal axis, between x = 1 and x = 2.
$\int_1^2 f(x)\,dx = 2.60466115$.
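The same approximate area can be computed in Python. The sketch below uses Simpson's rule with 1,000 strips, which is an assumed choice of method and is not claimed to be what the TI84 uses internally.

```python
import math

def simpson(func, a, b, strips=1000):
    """Approximate the area under func between a and b using Simpson's rule."""
    if strips % 2:
        raise ValueError("strips must be even")
    width = (b - a) / strips
    total = func(a) + func(b)
    for i in range(1, strips):
        weight = 4 if i % 2 else 2   # interior weights alternate 4, 2, 4, ...
        total += weight * func(a + i * width)
    return total * width / 3

area = simpson(lambda x: math.exp(math.sin(x)), 1.0, 2.0)
print(round(area, 4))
```

The result should agree with the calculator's value to at least four decimal places.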
Probability and Statistics accounts for 60% of the A-Level H1 Maths Exam.
How many arrangements or permutations are there of the three letters in CAT? For
example, one possible permutation of CAT is TCA.
To solve this problem, one possible method is the method of enumeration. That is,
simply list out (enumerate) all the possible permutations.
To help us count more efficiently, we’ll learn about four basic principles of counting:
Example 119. For lunch today, I can either go to the food court or the hawker centre. At
the food court, I have 2 choices: ramen or briyani. At the hawker centre, I have 3 choices:
bak chor mee, nasi lemak, or kway teow.
Altogether then, I have 2 + 3 = 5 choices of what to eat for lunch today.
The Addition Principle (AP). I have to choose a destination, out of two possible areas.
At area #1, there are p possible destinations to choose from. At area #2, there are q possible
destinations to choose from.
The Addition Principle (AP) simply states that I have, in total, p + q different choices.
(Just so you know, the AP is sometimes also called the Second Principle of Counting
or the Rule of Sum or the Disjunctive Rule.)
Of course, the AP generalises to cases where there are more than just 2 “areas”. It may
seem a little silly, but just to illustrate, let’s use the AP to tackle the CAT problem:
Case #1. First letter is an A. Then the next two letters are either CT or TC — 2
possibilities.
Case #2. First letter is a C. Then the next two letters are either AT or TA — 2 possibilities.
Case #3. First letter is a T. Then the next two letters are either AC or CA — 2 possibilities.
Altogether then, by the AP, there are 2 + 2 + 2 = 6 possibilities. That is, there are 6 possible
permutations of the letters in CAT. These are illustrated in the tree diagram below.
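The method of enumeration is easy to hand over to a computer. The Python sketch below lists every permutation of CAT using the standard library's `itertools.permutations` and then groups them by first letter, mirroring the three cases above. The dictionary name `cases` is illustrative.

```python
from itertools import permutations

arrangements = ["".join(p) for p in permutations("CAT")]

# Group by first letter, as in Case #1 (A), Case #2 (C) and Case #3 (T).
cases = {}
for word in arrangements:
    cases.setdefault(word[0], []).append(word)

for first_letter in sorted(cases):
    print(first_letter, cases[first_letter])
print("Total:", sum(len(words) for words in cases.values()))
```

Each case contributes 2 permutations, and the AP adds them up to 6.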
Exercise 40. How many permutations are there of the letters in the word DEED? Illustrate
your answer with a tree diagram similar to that given in the CAT example above. (Answer
on p. 340.)
Example 121. For lunch today, I can either have prata or horfun. For dinner tonight, I
can have McDonald’s, KFC, or Pizza Hut.
Enumeration shows that I have a total of 6 possible choices for my two meals today:
Alternatively, we can use the Multiplication Principle (MP). I have 2 choices for lunch
and 3 choices for dinner. Hence, for my two meals today, I have in total 2 × 3 = 6 possible
choices.
The Multiplication Principle (MP). I have to choose two destinations, one from each
of two possible areas. At area #1, there are p possible destinations to choose from. At area
#2, there are q possible destinations to choose from.
The Multiplication Principle (MP) simply states that I have, in total, p × q different choices.
Of course, the MP generalises to cases where there are more than just 2 “areas”. Here’s an
example where we have to make 3 decisions:
(SF, BPC, A), (SF, BPC, B), (SF, BPC, C), (SF, CF, A),
(SF, CF, B), (SF, CF, C), (BN, BPC, A), (BN, BPC, B),
(BN, BPC, C), (BN, CF, A), (BN, CF, B), (BN, CF, C).
More examples:
_ _ _ _.
1 2 3 4
These 4 blank spaces correspond to 4 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space? Decision #4: What letter to
put in the fourth blank space?
For Decision #1, we can put A, B, C, ..., or Z. So we have 26 choices for Decision #1.
For Decision #2, we can again put A, B, C, ..., or Z. So we again have 26 choices for
Decision #2.
We likewise have 26 choices for Decision #3 and also 26 choices for Decision #4.
Altogether then, by the MP, there are $26 \times 26 \times 26 \times 26 = 26^4 = 456{,}976$ ways to make our
four decisions.
Solution: There are $26^4 = 456{,}976$ possible four-letter words that can be formed using the
26-letter alphabet.
_ _.
1 2
These 2 blank spaces correspond to 2 decisions to be made. Decision #1: What number to
put in the first blank space? Decision #2: What letter to put in the second blank space?
For Decision #1, we can put 1, 2, 3, ..., or 18. So we have 18 choices for Decision #1.
For Decision #2, we can put A, B, C, D, E, or F. So we have 6 choices for Decision #2.
Altogether then, by the MP, there are 18 × 6 = 108 ways to make our two decisions. In other
words, there are 108 possible outcomes from rolling these two dice.
(If necessary, it is tedious but not difficult to enumerate them: 1A, 1B, 1C, 1D, 1E, 1F,
2A, 2B, ..., 17E, 17F, 18A, 18B, 18C, 18D, 18E, and 18F.)
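The tedious enumeration can be delegated to Python. The sketch below assumes, as the example describes, one die showing the numbers 1 to 18 and another showing the letters A to F. It pairs every number with every letter using `itertools.product`.

```python
from itertools import product

numbers = range(1, 19)   # Decision #1: 18 choices
letters = "ABCDEF"       # Decision #2: 6 choices

outcomes = [f"{number}{letter}" for number, letter in product(numbers, letters)]
print(len(outcomes), outcomes[:6], outcomes[-6:])
```

The length of the list equals 18 × 6, as the MP predicts.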
Exercise 41. A club has a shortlist of 3 men for president, 5 animals for vice-president, and
10 women for club mascot. How many possible ways are there to choose the president, the
vice-president, and the mascot? (Answer on p. 340.)
Example 125. For lunch today, I can either go to the food court or the hawker centre. At
the food court, I have 4 choices of cuisine: Chinese, Indian, Malay, and Western. At the
hawker centre, I have 3 choices of cuisine: Chinese, Malay, and Thai.
There are 2 choices of cuisine that are common to both the food court and the hawker
centre (Chinese and Malay).
Why do we subtract 2? If we simply added the 4 choices available at the food court to the
3 available at the hawker centre, then we’d double-count the Chinese and Malay cuisines,
which are available at both the food court and the hawker centre. And so we must subtract
the 2 cuisines that are at both locations.
Hence, by the IEP, there are 10 + 4 − 2 = 12 integers that are divisible by either 2 or 5.
(These are namely 2, 4, 5, 6, 8, 10, 12, 14, 15, 16, 18, and 20.)
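The double-counting correction can be seen directly with Python sets. The sketch below assumes, based on the list of twelve integers given above, that the integers in question run from 1 to 20.

```python
integers = range(1, 21)  # assumed range: 1 to 20 inclusive

divisible_by_2 = {n for n in integers if n % 2 == 0}
divisible_by_5 = {n for n in integers if n % 5 == 0}
divisible_by_both = divisible_by_2 & divisible_by_5   # counted twice if we just add

count_by_iep = len(divisible_by_2) + len(divisible_by_5) - len(divisible_by_both)
count_by_listing = len(divisible_by_2 | divisible_by_5)

print(count_by_iep, count_by_listing, sorted(divisible_by_2 | divisible_by_5))
```

Counting by the IEP and counting the union directly give the same answer, because the union lists 10 and 20 only once.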
Exercise 43. (Answer on p. 342.) The food court has 4 types of cuisine: Chinese,
Indonesian, Korean, and Western. The hawker centre has 3: Chinese, Malay, and Western.
A restaurant has 3: Chinese, Japanese, and Malay.
In total, how many different types of cuisine are there? Illustrate your answer with a Venn
diagram.
Example 127. The food court has 4 types of cuisine: Chinese, Malay, Indian, and Other.
I’m at the food court but don’t feel like eating Malay or Chinese. So by the Complements
Principle (CP), I have 4 − 2 = 2 possible choices of cuisine (Indian and Other).
The Complements Principle (CP). There are p possible destinations. I must choose
one. I rule out q of the possible destinations.
Exercise 44. There are 10 Southeast Asian countries, of which 3 (Brunei, Indonesia, and
the Philippines) are not on the mainland. How many mainland Southeast Asian countries
are there that a European tourist can visit? (Answer on p. 342.)
In this chapter, we’ll use the MP to generate several more methods of counting.
But first, we’ll learn about the factorial notation.
Let’s rephrase this problem in the framework of the MP. Consider three blank spaces:
_ _ _.
1 2 3
These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
For Decision #1, we can put C, A, or T. So we have 3 choices for Decision #1.
Having already used up a letter in Decision #1, we are left with two letters. So we have 2
choices for Decision #2.
Having already used up a letter in Decision #1 and another in Decision #2, we are left
with just one letter. So we have only 1 choice for Decision #3.
Altogether then, by the MP, there are 3×2×1 = 3! = 6 possible ways of making our decisions.
This is also the number of ways there are to arrange the three letters in the word CAT.
Again, let’s rephrase this problem in the framework of the MP. Consider 13 blank spaces:
_ _ _ _ _ _ _ _ _ _ _ _ _.
1 2 3 4 5 6 7 8 9 10 11 12 13
These 13 blank spaces correspond to 13 decisions to be made. Decision #1: What letter
to put in the first blank space? Decision #2: What letter to put in the second blank space?
... Decision #13: What letter to put in the 13th blank space?
For Decision #2, having already used up a letter in Decision #1, we are left with 12 letters.
So we have 12 choices for Decision #2.
For Decision #3, having already used up a letter in Decision #1 and another letter in
Decision #2, we are left with 11 letters. So we have 11 choices for Decision #3.
For Decision #13, having already used up a letter in Decision #1, another in Decision #2,
another in Decision #3, ..., and another in Decision #12, we are left with one letter. So
we have 1 choice for Decision #13.
Altogether then, by the MP, there are 13 × 12 × ⋯ × 2 × 1 = 13! = 6,227,020,800 possible
ways of making our decisions. This is also the number of ways there are to arrange the 13
letters in the word UNPREDICTABLY.
The next fact simply summarises what should already be obvious from the above examples:
_ _ _ . . . _.
1 2 3 n
For space #1, we have n possible choices. For space #2, we have n − 1 possible choices
(because one object was already placed in space #1). ... And finally for space #n, we have
only 1 object left and thus only 1 choice. By the MP then, there are n × (n − 1) × ⋅ ⋅ ⋅ × 1 = n!
possible ways of filling in these n spaces with the n distinct objects.
Example 131. The word COWDUNG has seven distinct letters. Hence, there are 7! = 5040
permutations of the letters in the word COWDUNG.
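As a sanity check on the n! rule, here is a minimal Python sketch that lists every arrangement by brute force and compares the count with the factorial. The helper name count_arrangements is made up for this illustration, and the sketch assumes that all letters in the word are distinct.

```python
from itertools import permutations
from math import factorial


def count_arrangements(word):
    """Count the orderings of the letters of `word` by listing them all.

    Assumes every letter of `word` is distinct.
    """
    return sum(1 for _ in permutations(word))


for word in ["CAT", "COWDUNG"]:
    n = len(word)
    # The brute-force count should agree with n!.
    print(word, count_arrangements(word), factorial(n))
```

For a 13-letter word such as UNPREDICTABLY, listing all the arrangements would take far too long, which is exactly why the formula is useful.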
In the previous section, we saw that there are 3! permutations of the three letters in the
word CAT and 13! permutations of the 13 letters in the word UNPREDICTABLY. We
made an important note: In each of these words, there was no repeated letter.
We now consider permutations of a set where some elements are repeated.
Example 132. How many permutations are there of the three letters in the word SEE?
A naïve application of the MP would suggest that the answer is 3! = 6. This is wrong.
Enumeration shows that there are only 3 possible permutations: SEE, ESE, and EES.
To see why a naïve application of the MP fails, set up the problem in the framework of the
MP. Consider 3 blank spaces:
_ _ _.
1 2 3
These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
For Decision #1, we can put E or S. So we have 2 choices for Decision #1.
But now the number of choices available for Decision #2 depends on what we chose for
Decision #1! (If we chose E in Decision #1, then we again have 2 choices for Decision
#2. But if instead we chose S in Decision #1, then we now have only 1 choice for Decision
#2.) This violates the implicit but important assumption in the MP that the number of
choices available in one decision is independent of the choice made in the other decision.
Hence, the MP does not directly apply.
The reason SEE has only 3 possible permutations (instead of 3! = 6) is that it contains a
repeated element, namely E. But why would this make any difference?
To understand why, let’s rename the second E as Ê, so that the word SEE is now
transformed into a new word SEÊ. From the three letters of this new word, we’d again have
3! = 6 possible permutations: SEÊ, SÊE, ESÊ, ÊSE, EÊS, and ÊES.
Restricting attention to the two letters EÊ, we see that there are 2! = 2 ways to permute
these two letters. Hence, any single permutation (in the case where we do not distinguish
between the two E’s) corresponds to 2 possible permutations (in the case where we do). The
figure below illustrates how the 3 permutations of SEE correspond to the 6 permutations
in SEÊ.
Hence, when we do not distinguish between the two E’s, there are only half as many possible
permutations.
If we distinguish between the three S’s, perhaps by calling them S, Ŝ, and S̄, then we’d
have 4! = 24 possible permutations of the letters in the word SAŜS̄.
But amongst the three S’s themselves, we have 3! = 6 possible permutations: SŜS̄, SS̄Ŝ,
ŜSS̄, S̄SŜ, ŜS̄S, and S̄ŜS. So distinguishing between the three S’s increases by 6-fold the
number of possible permutations. Working backwards, the word SASS thus has one-sixth
as many permutations as SAŜS̄. That is, SASS has 4!/3! = 4 possible permutations.
The figure below illustrates how the 4 possible permutations of SASS correspond to the 24
possible permutations of SAŜS̄.
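The "divide by the repeats" argument above can be checked with a short Python sketch. It counts distinct arrangements in two ways: by brute force (listing all orderings and discarding duplicates) and by dividing n! by the factorial of each letter's repeat count. Both function names are hypothetical helpers written for this illustration.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def distinct_permutations_brute(word):
    """List every ordering, then keep only the distinct ones."""
    return len(set(permutations(word)))


def distinct_permutations_formula(word):
    """n! divided by r! for each letter that appears r times."""
    total = factorial(len(word))
    for repeats in Counter(word).values():
        total //= factorial(repeats)
    return total


for word in ["SEE", "SASS"]:
    print(word, distinct_permutations_brute(word), distinct_permutations_formula(word))
```

For SEE both counts are 3!/2! = 3, and for SASS both are 4!/3! = 4, matching the examples above.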
Exercise 46. There are 3 identical white tiles and 4 identical black tiles. How many ways
are there of arranging these 7 tiles in a row? (Answer on p. 342.)
Example 134. Using the 26-letter alphabet, how many 3-letter words can we form that
have no repeated letters? This, of course, is simply the problem of filling in these 3 empty
spaces using 26 distinct elements. For space #1, we have 26 possible choices. For space
#2, we have 25. And for space #3, we have 24.
___
1 2 3
By the MP then, the number of ways to fill the three spaces is 26 × 25 × 24. This is also the
number of three-letter words with no repeated letters.
Problems like the above example crop up often enough to motivate a new piece of notation:
Definition 2. Let n, k be positive integers with n ≥ k. Then P (n, k), read aloud as n
permute k, is defined by
P(n, k) = n!/(n − k)!.
P (n, k) answers the following question: “Given n distinct objects and k spaces (where
k ≤ n), how many ways are there to fill the k spaces?”
Just so you know, P(n, k) is also variously denoted nPk, P^n_k, _nP_k, etc., but we’ll stick solely
with P(n, k) in this textbook.
Example 134 (continued from above). The number of 3-letter words without repeated
letters is simply P (26, 3) = 26!/23! = 26 × 25 × 24.
Example 135. Problem: Using the 22-letter Phoenician alphabet, how many 4-letter words
can we form that have no repeated letters?
This, of course, is simply the problem of filling in these 4 empty spaces using 22 distinct
elements. So the answer is P(22, 4) = 22!/18! = 22 × 21 × 20 × 19 words.
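Definition 2 translates directly into code. The sketch below is only an illustration: the function name permute is made up here, and the comparison with math.perm assumes Python 3.8 or later, where that built-in is available.

```python
from math import factorial, perm


def permute(n, k):
    """P(n, k) = n! / (n - k)!, following Definition 2."""
    return factorial(n) // factorial(n - k)


# 3-letter words from 26 letters, no repeated letters: 26 * 25 * 24.
print(permute(26, 3), perm(26, 3))

# 4-letter words from 22 letters, no repeated letters: 22 * 21 * 20 * 19.
print(permute(22, 4), perm(22, 4))
```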
Exercise 47. Out of a committee of 11 members, how many ways are there to choose a
president and a vice-president? (Answer on p. 342.)
Example 136. At a dance party, there are 7 heterosexual married couples (and thus 14
people in total). Problem #1. How many ways are there of arranging them in a line, with
the restriction that every person is next to his or her partner?
Think of there as being 7 units (each unit being a couple). There are 7! ways to arrange
these 7 units in a line. Within each unit, there are 2 possible arrangements. Hence, in
total, there are 7! × 2^7 possible arrangements.
Example 137. (I assume you’re familiar with the standard 52-card deck.)
Problem #1. Using a standard 52-card deck, how many ways are there of arranging any
3 cards in a line, with the restriction that no two cards of the same suit are next to each
other?
This is the problem of filling in 3 spaces with 52 distinct objects. For space #1, we have
52 possible choices.
_ _ _.
1 2 3
For space #2, having picked a card of suit X for space #1, we must pick a card from some
other suit Y. And so there are only 39 possible choices (we have three suits available —
that’s 3 × 13 = 39).
For space #3, having picked a card of suit Y for space #2, we must pick a card from some
other suit Z. Note that suit Z can be the same as suit X. And so there are 38 possible choices
(we have three suits available, less the card used for space #1 — that’s 3 × 13 − 1 = 38).
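By the MP, the three counts above multiply to 52 × 39 × 38. As a rough check, the following Python sketch lists every ordered line of 3 distinct cards and keeps only those in which neighbouring cards have different suits. The deck representation (rank number plus a suit letter) and the helper name are assumptions made purely for this illustration.

```python
from itertools import permutations

SUITS = "SHDC"    # spades, hearts, diamonds, clubs
RANKS = range(13)
deck = [(rank, suit) for suit in SUITS for rank in RANKS]


def no_adjacent_same_suit(cards):
    """True if no two neighbouring cards in the line share a suit."""
    return all(a[1] != b[1] for a, b in zip(cards, cards[1:]))


# All ordered lines of 3 distinct cards: 52 * 51 * 50 = 132,600 of them.
count = sum(1 for line in permutations(deck, 3) if no_adjacent_same_suit(line))
print(count, 52 * 39 * 38)
```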
Exercise 48. (Answer on p. 343.) There are 4 brothers and 3 sisters. In how many ways
can they be arranged ...
(a) in a line, without any 2 brothers being next to each other?
(b) in a line, without any 2 sisters being next to each other?
P (n, k) is the number of ways we can fill k (ordered) spaces using n distinct objects.
In contrast, C(n, k) is the number of ways of choosing k out of n distinct objects. Equiva-
lently, it is the same problem of filling k spaces using n distinct objects, except that now
order does not matter.
Example 138. Suppose we have a committee of 13 members and wish to select a president
and a vice-president. This is equivalent to the problem of filling in 2 spaces, given 13
distinct objects.
__
1 2
Suppose instead that we want to choose two co-presidents. How many ways are there of
doing so?
This is simply the same problem as before — again we want to fill in 2 spaces, given 13
distinct objects. The only difference now is that the order of the 2 chosen objects
does not matter. So the answer must be that there are P (13, 2)/2! ways of choosing the
two co-presidents.
Example 139. How many ways are there of choosing 5 cards out of a standard 52-card
deck?
_____
1 2 3 4 5
First, how many ways are there to fill 5 spaces using 52 distinct objects (where order
matters)? Answer: P(52, 5) = 52 × 51 × 50 × 49 × 48 = 311,875,200.
And so if we don’t care about order, we must adjust this number by dividing by 5! to get
P(52, 5)/5! = 2,598,960. So the answer is that there are 2,598,960 ways to choose 5 cards
out of a 52-card deck.
The above examples suggest that, in general, to choose k out of n given distinct objects,
there are P (n, k)/k! possible ways. This motivates the following definition:
C(n, k) = P(n, k)/k! = n!/((n − k)! k!).
It turns out that C(n, k) appears so often in maths that it has many alternative notations.
One of the most common is \binom{n}{k}, that is, n written above k inside a pair of large
parentheses.
“n choose k” also has several names, such as the combination, the combinatorial
number, and even the binomial coefficient. Shortly, we’ll see why the name binomial
coefficient makes sense.
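The relationship C(n, k) = P(n, k)/k! can be checked numerically. In the sketch below, choose is a hypothetical helper that follows the formula literally. It is compared against Python's built-in math.comb (available from Python 3.8) and, for the small committee example, against a brute-force listing.

```python
from itertools import combinations
from math import comb, factorial, perm


def choose(n, k):
    """C(n, k) computed as P(n, k) / k!."""
    return perm(n, k) // factorial(k)


# Two co-presidents out of 13 members: P(13, 2) / 2!.
print(choose(13, 2), len(list(combinations(range(13), 2))))

# 5 cards out of a 52-card deck: P(52, 5) / 5!.
print(choose(52, 5), comb(52, 5))
```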
Exercise 49 gives an alternate expression for C(n, k) which you’ll often find very useful.
Exercise 49. Show that C(n, k) = [n × (n − 1) × (n − 2) × ⋅ ⋅ ⋅ × (n − k + 1)]/k!. (Answer on p.
344.)
Exercise 50. Compute C(4, 2), C(6, 4), and C(7, 3). (Answer on p. 344.)
Exercise 51. We wish to form a basketball team, consisting of 1 centre, 2 forwards, and 2
guards. We have available 3 centres, 7 forwards, and 5 guards. How many ways are there
of forming a team? (Answer on p. 344.)
Intuitively, this property is true because choosing k out of n objects is the same as choosing
which n − k out of n objects to ignore. Let’s jot down this symmetry property as a formal
fact:
C(100, 70) = 100!/(30! 70!).
This is the same as the number of ways to choose the 30 men that will not be used for the
task:
C(100, 30) = 100!/(70! 30!).
Pascal’s Triangle consists of a triangle of numbers. If we adopt the convention that the
topmost row is row 0 and the leftmost term of each row is the 0th term, then the nth row,
kth term is the number C(n, k):
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
⋮
It turns out that beautifully enough, each term is equal to the sum of the two terms above
it. The next exercise asks you to verify several instances of this:
Exercise 52. Verify the following: (a) C(1, 0) + C(1, 1) = C(2, 1); (b) C(4, 2) + C(4, 3) =
C(5, 3); (c) C(17, 2) + C(17, 3) = C(18, 3). (Answer on p. 344.)
Suppose we do choose the last object. Then we have to choose another k − 1 objects, out
of the first n objects. There are C(n, k − 1) ways of doing so.
Altogether then, by the Addition Principle, there are C(n, k) + C(n, k − 1) ways of choosing
k out of n + 1 distinct objects.
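The rule that each term is the sum of the two terms above it gives a simple way to generate Pascal's Triangle without computing any factorials. Here is a minimal Python sketch. The function name pascal_rows is made up for this illustration, and math.comb (Python 3.8 or later) is used only to check each row against C(n, k).

```python
from math import comb


def pascal_rows(num_rows):
    """Build rows 0, 1, ..., num_rows - 1 of Pascal's Triangle."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Each interior term is the sum of the two terms above it.
        middle = [a + b for a, b in zip(prev, prev[1:])]
        rows.append([1] + middle + [1])
    return rows


for n, row in enumerate(pascal_rows(8)):
    # Row n, term k should equal C(n, k).
    assert row == [comb(n, k) for k in range(n + 1)]
    print(row)
```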
Poincaré’s quote is especially true in combinatorics. In this section, we’ll learn why C (n, k)
can be called the combination and also the binomial coefficient.
Verify for yourself that the following equations are true:
(1 + x)^0 = 1,
(1 + x)^1 = 1 + x,
(1 + x)^2 = 1 + 2x + x^2,
(1 + x)^3 = 1 + 3x + 3x^2 + x^3,
(1 + x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4,
(1 + x)^5 = 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5,
(1 + x)^6 = 1 + 6x + 15x^2 + 20x^3 + 15x^4 + 6x^5 + x^6,
(1 + x)^7 = 1 + 7x + 21x^2 + 35x^3 + 35x^4 + 21x^5 + 7x^6 + x^7.
⋮
Each of the expressions on the RHS is called a binomial series. Each can also be called
the binomial expansion of (1 + x)^n.
Notice anything interesting? No? Try this exercise:
It turns out that, somewhat surprisingly, the coefficients of the binomial expansion of
(1 + x)^n are simply C(n, 0), C(n, 1), ..., C(n, n). As an additional exercise, you should verify for
yourself that this is also true for n = 0 through n = 6.
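If you would rather let a computer do the expanding, the sketch below multiplies out (1 + x)^n one factor at a time and compares the resulting coefficients with C(n, k). A polynomial is stored as a list of coefficients with the constant term first. This representation and both function names are assumptions made for this illustration.

```python
from math import comb


def multiply_by_one_plus_x(coeffs):
    """Multiply a polynomial (constant term first) by (1 + x)."""
    padded = coeffs + [0]     # 1 times the polynomial
    shifted = [0] + coeffs    # x times the polynomial
    return [a + b for a, b in zip(padded, shifted)]


def expand_one_plus_x(n):
    """Coefficients of (1 + x)^n, constant term first."""
    coeffs = [1]
    for _ in range(n):
        coeffs = multiply_by_one_plus_x(coeffs)
    return coeffs


for n in range(8):
    coeffs = expand_one_plus_x(n)
    print(n, coeffs, coeffs == [comb(n, k) for k in range(n + 1)])
```

Notice that multiplying by (1 + x) adds each coefficient to its neighbour, which is the same "sum of the two terms above" rule that generates Pascal's Triangle.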
There are several ways to explain why the combinatorial numbers also happen to be the
binomial coefficients. Here we’ll give only the combinatorial explanation:
(1 + x)^2 = (1 + x)(1 + x) = 1 ⋅ 1 + 1 ⋅ x + x ⋅ 1 + x ⋅ x.
For 1 ⋅ x, we “chose” 1 from the first (1 + x) and x from the second (1 + x).
For x ⋅ 1, we “chose” x from the first (1 + x) and 1 from the second (1 + x).
So from the two (1 + x)’s in the product, there are C(2, 1) = 2 ways to choose 1 of the x’s.
Altogether then, the coefficient on x^0 is C(2, 0) (“choose 0 of the x’s”), that on x^1 is C(2, 1)
(“choose 1 of the x’s”), and that on x^2 is C(2, 2) (“choose 2 of the x’s”). That is:
(1 + x)^2 = C(2, 0) + C(2, 1)x + C(2, 2)x^2 = 1 + 2x + x^2.
Exercise 54. (Answer on p. 345.) Mimicking what was just done above, explain why
By plugging x = 1, y = 1 into the last fact, we see that (1 + 1)^n = 2^n is the sum of the terms
in the nth row of Pascal’s triangle:
There’s a nice combinatorial interpretation of the above fact (Poincaré’s quote at work
again).
Consider the set S = {A, B}. S has 2^2 = 4 subsets: ∅ = {}, {A}, {B}, and S = {A, B}.
Now consider the set T = {A, B, C}. T has 2^3 = 8 subsets: ∅ = {}, {A}, {B}, {C}, {A, B},
{A, C}, {B, C}, and T = {A, B, C}.
In general, if a set has n elements, how many subsets does it have? We can couch this in
the framework of the Multiplication Principle — this is really a sequence of n decisions of
whether or not to include each element in the subset. There are 2 choices for each decision.
Thus, there are 2^n choices altogether. In other words, using a set of n elements, we can
form 2^n subsets.
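Both ways of counting subsets can be tried out in Python. The sketch below lists the subsets of a small set size by size (there are C(n, k) subsets of size k) and checks that the total is 2^n. The function name subsets is a hypothetical helper for this illustration.

```python
from itertools import combinations
from math import comb


def subsets(elements):
    """List all subsets of `elements`, grouped by size 0, 1, ..., n."""
    elements = list(elements)
    result = []
    for k in range(len(elements) + 1):
        # There are C(n, k) subsets of size k.
        result.extend(set(c) for c in combinations(elements, k))
    return result


T = ["A", "B", "C"]
all_subsets = subsets(T)
print(all_subsets)
print(len(all_subsets), 2 ** len(T), sum(comb(len(T), k) for k in range(len(T) + 1)))
```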
But of course, this must in turn be equal to the sum of the following:
...
Thus,
Exercise 56. Using what you’ve learnt, write down (3 + x)^4. (Answer on p. 346.)
Exercise 57. (Answer on p. 346.) (a) The Tan family has 4 sons and the Wong family
has 3 daughters. Using the sons and daughters from these two families, how many ways
are there of forming 2 heterosexual couples?
(b) The Lee family has 6 sons and the Ho family has 9 daughters. Using the sons and
daughters from these two families, how many ways are there of forming 5 heterosexual
couples?
Example 141. We want to know how much material to purchase, in order to build a fence
around a field. We might go through these steps:
1. Formulate a mathematical model: Our field is the shape of a rectangle, with length
100 m and breadth 50 m.
2. Analyse: The rectangle has perimeter 100 + 50 + 100 + 50 = 300 m.
3. Apply the results of our analysis: We need to buy enough material to build a
300-metre long fence.
That is, describe the real-world scenario in mathematical language and concepts.
This first step is arguably the most important. It is often subjective — not everyone will
agree that your mathematical model is the most appropriate for the scenario at hand.
To use the above example, the field may not be a perfect rectangle, so some may object
to your description of the field as a rectangle. Nonetheless, you may decide that all things
considered, the rectangle is a good mathematical model.
This involves using maths and the rules of logic. (A-level maths exams tend to be mostly
concerned with this second step.)
In the above example, this second step simply involved computing the perimeter of the
rectangle — 100 + 50 + 100 + 50 = 300 m. Of course, for the A-levels, you can expect the
analysis to be more challenging than this.
Note that this second step, in contrast to the first, is supposed to be completely watertight,
non-subjective, and with no room for disagreement. After all, hardly anyone reasonable
could disagree that a perfect rectangle with length 100 m and breadth 50 m has perimeter
300 m.
We’ve secretly always been using mathematical modelling; we just haven’t always been
terribly explicit about it. The foregoing discussion was placed here, because with probability
and statistical models, we want to be especially clear that we are doing mathematical
modelling.
Real-world scenarios often involve chance. We can model such scenarios mathe-
matically using a mathematical object called the experiment. The experiment can be
formally defined, but we shall not do so in this textbook. Instead, we’ll merely discuss the
experiment informally, with the aid of examples.13
Example 142. A coin flip is an example of an experiment. There are two possible out-
comes: H and T .
Example 143. A die roll is an example of an experiment. There are six possible outcomes:
1, 2, 3, 4, 5, and 6.
Example 144. In the die roll experiment, an example of an event is A = {1, 3, 5}. This is
the event that the die roll is odd. The probability of this event occurring is 0.5. We may
write P(A) = 0.5.
Another example of an event is B = {2, 4, 6}. This is the event that the die roll is even.
The probability of this event occurring is 0.5. We may write P(B) = 0.5.
Another example of an event is C = {1}. This is the event that the die roll is 1. The
probability of this event occurring is 1/6. We may write P(C) = 1/6.
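When all outcomes are equally likely (as we assume for a fair die), the probability of an event is simply the number of outcomes in the event divided by the total number of outcomes. The sketch below applies this to the three events above. The names OUTCOMES and prob are made up for this illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

OUTCOMES = {1, 2, 3, 4, 5, 6}   # fair die: every outcome equally likely


def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))


A = {1, 3, 5}   # the roll is odd
B = {2, 4, 6}   # the roll is even
C = {1}         # the roll is 1

print(prob(A), prob(B), prob(C))
```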
Exercise 58. (Answer on p. 347.) For each of the following experiments, list the possible
outcomes. State the probability of the given event.
(a) You pick, at random, a card from a standard 52-card deck. The event A is the event
that we get a spade.
(b) You flip two fair coins. The event B is the event that both coin-flips are the same.
(c) You roll two fair dice. The event C is the event that the dice sum to 9.
13 See my H2 Mathematics Textbook for a thorough, rigorous, and formal discussion.
To say that two events A and B are mutually exclusive (or disjoint) is to say, informally,
that:
Example 145. Consider the events A = {1, 3, 5}, B = {2, 4, 6}, and C = {1} in the die-roll
experiment.
Example 146. We randomly pick a student from the student population. D is the event
that the student is taller than 1.8 m; E is the event that the student is shorter than 1.6 m;
and F is the event that the student is male.
Example 147. We randomly pick a car in the carpark. G is the event that the car is blue.
H is the event that the car is a Mercedes-Benz. I is the event that the car has only two
seats.
Of the three events given, no two are mutually exclusive.
Exercise 59. We randomly pick a student from the student population. A is the event
that this student has an iPhone. B is the event that this student has exactly one phone. C
is the event that this student has at least two phones. (i) Are A and B mutually exclusive?
(ii) A and C? (iii) B and C? (Answer on p. 347.)
Let A be an event. Its complement, the event A′ (also denoted A^c), is the set of all
outcomes other than those in A.
Example 148. Consider the events A = {1, 2}, B = {2, 3, 5}, and C = {1} in the die-roll
experiment.
Their complements are A′ = {3, 4, 5, 6}, B ′ = {1, 4, 6}, and C ′ = {2, 3, 4, 5, 6}.
Example 149. We randomly pick a student from the student population. D is the event
that the student is taller than 1.8 m.
Its complement is D′ , the event that the student is 1.8 m or shorter.
Example 150. We randomly pick a car in the carpark. G is the event that the car is blue.
Its complement is G′ , the event that the car is not blue.
Exercise 60. We randomly pick a student from the student population. A is the event
that this student has exactly one phone. B is the event that this student has two phones.
What are the complements A′ and B′? (Answer on p. 347.)
Example 151. Flip three fair coins. The possible outcomes are
Let A be the event that there is at least 1 tail, B be the event that there are at least 2
heads, and C be the event that there are at least 3 tails. That is,
C = {T T T } .
A ∪ B is the event that there is at least 1 tail OR there are at least 2 heads. A ∪ C is the
event that there is at least 1 tail. B ∪ C is the event that there are at least 3 tails OR there
are at least 2 heads.
Exercise 61. Roll two dice. Let A be the event that the sum of the rolls is even; B be the
event that it is 11 or 12; and C be the event that it is odd. Write down the probabilities
of the events A, B, C, A ∪ B, A ∪ C, and B ∪ C. (Answer on p. 348.)
Example 152. Flip three fair coins. The possible outcomes are
As before, let A be the event that there is at least 1 tail, B be the event that there are at
least 2 heads, and C be the event that there are at least 3 tails.
A ∩ B is the event that there is at least 1 tail AND there are at least 2 heads. A ∩ C is the
event that there are at least 3 tails. B ∩ C is the event that there are at least 3 tails AND
there are at least 2 heads.
A ∩ B = {HHT, HTH, THH},
A ∩ C = C = {T T T } ,
B ∩ C = {} .
Note that B ∩ C is the empty event. That is, it is the event that contains no outcomes.
Exercise 62. Roll two dice. As before, let A be the event that the sum of the rolls is even;
B be the event that it is 11 or 12; and C be the event that it is odd. Write down the
probabilities of the events A ∩ B, A ∩ C, and B ∩ C. (Answer on p. 348.)
1. Non-negativity: P(A) ≥ 0.
2. Normalisation: P(S) = 1, where S is the set of all possible outcomes.
3. Sum of two mutually exclusive events: P(A ∪ B) = P(A) + P(B).
4. Complements: P(A) = 1 − P(A^c).
5. Monotonicity: If every outcome in B is also in A, then P(B) ≤ P(A).
6. Probabilities are at most 1: P(A) ≤ 1.
7. Inclusion-Exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Venn diagrams are helpful for illustrating probabilities. Those below help to illustrate four
of the above properties.
Example 153. Flip three fair coins. The possible outcomes are
Let A be the event that there is at least 1 tail and B be the event that there are at least 2
heads. That is,
Question: You are told that A occurred; what then is the probability that B also occurred?
P(B∣A) = P(A ∩ B)/P(A) = (3/8)/(7/8) = 3/7.
Explanation: There is probability 7/8 that A occurred. There is probability 3/8 that both
A and B occurred. Thus, given that A occurred, the probability that B also occurred is
3/7.
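The same conditional probability can be found by listing the 8 outcomes and counting. The Python sketch below does this for the three fair coin flips. The helper prob and the way outcomes are written as strings such as "HHT" are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes of three fair coin flips: "HHH", "HHT", ...
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]


def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event), len(outcomes))


A = {s for s in outcomes if s.count("T") >= 1}   # at least 1 tail
B = {s for s in outcomes if s.count("H") >= 2}   # at least 2 heads

p_b_given_a = prob(A & B) / prob(A)
print(sorted(A & B), prob(A & B), prob(A), p_b_given_a)
```

Counting directly gives the same picture: of the 7 outcomes in A, exactly 3 are also in B.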
Hence, given that B has occurred, the probability that A has also occurred is simply
0.2/0.6 = 1/3. (The information that P(A) = 0.5 is irrelevant.) Formally:
P(A∣B) = P(A ∩ B)/P(B) = 0.2/0.6 = 1/3.
Exercise 64. Roll two dice. Given that the sum of the two dice rolls is 8, what is the
probability that we rolled at least one even number? (Answer on p. 349.)
Informally, two events A and B are independent if the probability that both occur is
simply the product of the probabilities that each occurs. Independence is thus analogous
to the MP from counting. Formally:
P(A ∩ B) = P(A)P(B).
Proof. By definition of conditional probabilities, P(A∣B) = P(A ∩ B)/P(B) (call this equation 1).
By definition of independence, P(A ∩ B) = P(A)P(B) (call this equation 2). Plugging equation 2
into equation 1, we have P(A∣B) = P(A)P(B)/P(B) = P(A), as desired.
The intuitive idea of independence is easy to grasp. If we say that the two coin flips are
independent, what we mean is that the following four conditions are true:
1. H1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is heads.)
2. H1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is heads.)
3. T1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is tails.)
4. T1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is tails.)
Formally:
1. P (H1 ∩ H2 ) = P({HH}) = P (H1 ) P (H2 ) = P({HH, HT }) ⋅ P({HH, T H}) = 0.5 × 0.5 =
0.25.
2. P (H1 ∩ T2 ) = P({HT }) = P (H1 ) P (T2 ) = P({HH, HT })⋅P({HT, T T }) = 0.5×0.5 = 0.25.
3. P (T1 ∩ H2 ) = P({T H}) = P (T1 ) P (H2 ) = P({T H, T T })⋅P({HH, T H}) = 0.5×0.5 = 0.25.
4. P (T1 ∩ T2 ) = P({T T }) = P (T1 ) P (T2 ) = P({T H, T T }) ⋅ P({HT, T T }) = 0.5 × 0.5 = 0.25.
Example 156. Flip a fair coin and roll a fair die. Consider the event “Heads”
E1 = {H1, H2, H3, H4, H5, H6}, and the event “Roll an odd number” E2 =
{H1, H3, H5, T 1, T 3, T 5}. These two events E1 and E2 are independent, as we now verify:
P(E1∣E2) = P(E1 ∩ E2)/P(E2) = (3/12)/(6/12) = 1/2 = P(E1).
More broadly, we can even say that the coin flip and die roll are independent. Informally,
this means that the outcome of the coin flip has no influence on the outcome of the die roll,
and vice versa.
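The product rule P(A ∩ B) = P(A)P(B) is easy to test by enumeration. The sketch below rebuilds the coin-and-die experiment with its 12 equally likely outcomes and checks the rule for the events "Heads" and "Roll an odd number". Writing each outcome as a pair such as ("H", 3), and the helper prob, are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import product

# 12 equally likely outcomes: ("H", 1), ..., ("H", 6), ("T", 1), ..., ("T", 6).
outcomes = set(product("HT", range(1, 7)))


def prob(event):
    """Probability of an event, given as a set of outcomes."""
    return Fraction(len(event), len(outcomes))


E1 = {(coin, die) for (coin, die) in outcomes if coin == "H"}      # heads
E2 = {(coin, die) for (coin, die) in outcomes if die % 2 == 1}     # odd roll

independent = prob(E1 & E2) == prob(E1) * prob(E2)
print(prob(E1 & E2), prob(E1) * prob(E2), independent)
```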
The idea of independence is a little tricky to illustrate on a Venn diagram. I’ll try anyway.
We compute
P(A∣B) = P(A ∩ B)/P(B) = 0.02/0.1 = 0.2.
We observe that P(A) = 0.2 = P(A∣B). And so by Fact 7, we conclude that the events A
and B are independent.
Flip two fair coins. Let H1 be the event that the first coin flip is heads, H2 be the event
that the second is heads, and T1 be the event that the first flip is tails. Show that
The idea of independence is intuitively easy to grasp. Indeed, so much so that students
often assume that “everything is independent”. This is a mistake. Unless you’re explicitly
told, NEVER assume that two events are independent.
Example 158. The event “coin-flip #1 is heads” and the event “coin-flip #2 is heads” are
probably independent.
Example 159. The event “die-roll #1 is 3” and the event “die-roll #2 is 6” are probably
independent.
Here are two examples where the assumption of independence is not plausible:
Example 160. The event “Google’s share price rises today” is probably not independent
of the event “Apple’s share price rises today”.
Example 161. The event “it rains in Singapore today” is probably not independent of the
event “it rains in Kuala Lumpur today”.
(1/1,000,000)^4 = 1/1,000,000,000,000,000,000,000,000.
This is equal to the probability of buying a 4D number on six consecutive weeks, and
winning first prize every time. Is the journalist correct?
Informally, a random variable assigns a numerical code to each possible outcome. A bit
more formally, it is a function that maps each outcome to a real number.
Example 162. Flip a fair coin. Let X be the random variable that indicates whether
the coin-flip is heads. So X(H) = 1 and X(T ) = 0.
(A bit more formally, we say that X is the function that maps the outcome H to the number
1 and the outcome T to the number 0.)
We refer to 1 and 0 as the possible observed values of the random variable X. These
correspond to the two possible outcomes of the coin-flip experiment.
Let A be the random variable that indicates whether there are at least 2 heads. So
A(HHH) = A(HHT) = A(HTH) = A(THH) = 1 and A(TTT) = A(TTH) = A(THT) =
A(HTT) = 0.
Example 164. Draw a card from a standard 52-card deck. In bridge, an ace is worth 4
high card points, a king 3, a queen 2, and a jack 1. Any other card is worth 0 points.
So we might let B be the corresponding random variable, where for example B(A♥) = 4,
B(J♣) = 1, and B(7♠) = 0.
Exercise 68. Let X be the random variable that is the sum of two fair die-rolls. What are
the possible observed values of X? (Answer on p. 349.)
Exercise 69. Let C be the random variable that counts the total number of high card
points, in any two randomly-chosen cards from a standard 52-card deck. What are the
possible observed values of C? (Answer on p. 350.)
The notation X = k is shorthand for the event that contains all the outcomes s such
that X(s) = k.
The notations “X ≥ k”, “X > k”, “X ≤ k”, “X < k”, “a ≤ X ≤ b”, etc. are similarly defined.
Example 162 (continued from above). Recall the fair coin-flip. Let A be the event
that the coin-flip is heads and B be the event that the coin-flip is tails. So P(A) = 0.5 and
P(B) = 0.5.
Let X be the random variable that indicates whether the coin-flip is heads. That is,
X(H) = 1 and X(T ) = 0.
By our newly-introduced notation, we can also write P(X = 1) = 0.5 and P(X = 0) = 0.5.
We also have P(X ≤ 1) = P(X = 0) + P(X = 1) = 1.
Example 163 (continued from above). Recall the three fair coin-flips. Let C, D, E,
and F be the events that there are 0, 1, 2, and 3 heads. So P(C) = 1/8, P(D) = 3/8,
P(E) = 3/8, and P(F ) = 1/8.
Let Y be the random variable that counts the number of heads. By our newly-introduced
notation, we can also write P(Y = 0) = 1/8, P(Y = 1) = 3/8, P(Y = 2) = 3/8, and P(Y = 3) =
1/8.
Example 164 (continued from above). Recall the high card point count in bridge.
Randomly choose a card from a standard 52-card deck. Let G be its high card point
count. By our newly-introduced notation, we can write P(G = 0) = 9/13, P(G = 1) = 1/13,
P(G = 2) = 1/13, P(G = 3) = 1/13, and P(G = 4) = 1/13.
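Since a random variable is just a function from outcomes to numbers, its probabilities can be tabulated by applying the function to every outcome and counting. The Python sketch below does this for the head count Y and the high card point count G. The function name distribution and the way cards are represented are assumptions made for this illustration, and all outcomes are assumed to be equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def distribution(outcomes, random_variable):
    """Map each observed value k to P(X = k), for equally likely outcomes."""
    outcomes = list(outcomes)
    counts = Counter(random_variable(s) for s in outcomes)
    return {value: Fraction(count, len(outcomes)) for value, count in counts.items()}


# Y counts the number of heads in three fair coin flips.
three_flips = list(product("HT", repeat=3))
dist_Y = distribution(three_flips, lambda s: s.count("H"))
print(sorted(dist_Y.items()))

# G is the high card point count of one randomly chosen card.
RANKS = ["A", "K", "Q", "J", "10", "9", "8", "7", "6", "5", "4", "3", "2"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
POINTS = {"A": 4, "K": 3, "Q": 2, "J": 1}   # every other rank is worth 0
deck = list(product(RANKS, SUITS))
dist_G = distribution(deck, lambda card: POINTS.get(card[0], 0))
print(sorted(dist_G.items()))
```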
Example 165. Flip two fair coins. The four possible outcomes are HH, HT , T H, and
TT.
Let X indicate whether the two coin flips are the same and Y count the number of heads.
That is,
Y (HH) = 2, Y (HT ) = 1, Y (T H) = 1, Y (T T ) = 0.
Another example:
A♠, K♠, . . . , 2♠, A♥, K♥, . . . , 2♥, A♦, K♦, . . . , 2♦, A♣, K♣, . . . , 2♣.
Let Y indicate whether the picked card is a spade (♠). That is,
P(Y = 0) = 39/52, P(Y = 1) = 13/52.
⎛ ⎞ ⎛ ⎞
X = 7 and X = 5.
⎝ ⎠ ⎝ ⎠
The table below says that P (X = 2) = 1/36, because there is only one way the event X = 2
can occur. And P (X = 3) = 2/36, because there are two ways the event X = 3 can occur.
Exercise 70. (Continuation of the above example.) (Answer on p. 350.) (a) Complete
the above table.
Consider the event E, described in words as “the sum of the two dice is at least 10”.
Informally, two random variables are independent if knowing the value of one does not
tell us anything about the value of the other.
Example 168. Flip a fair coin twice. The four possible outcomes are HH, HT, T H, T T .
When we say that “the two coin-flips are independent”, what exactly do we mean by this?
Let’s rephrase this statement slightly more formally.
Let A indicate whether the first coin-flip was heads and B indicate whether the second was
heads. That is,
Informally, the second statement says that knowing the value of A (whether the first coin-
flip was heads or not) tells us absolutely nothing about the value of B (whether the second
coin-flip was heads or not).
For example, if we know that A = 1, then P(B = 0) = 0.5 and P(B = 1) = 0.5. And if we
know instead that A = 0, then P(B = 0) = 0.5 and P(B = 1) = 0.5. Thus, knowing whether
A = 1 or A = 0 makes absolutely no difference to what we can say about B.
Formally:
Definition 5. Given random variables X and Y , we say that X and Y are independent if
for all x, y,
A and B remain the random variables indicating whether the first and second coin-flips are
heads (respectively).
We now verify that indeed, P (A = a, B = b) = P(A = a)P(B = b) for all possible values of a
and b:
P (A = a, B = b) P(A = a)P(B = b)
P (A = 0, B = 0) = 0.25 P (A = 0) P (B = 0) = 0.5 × 0.5, ✓
P (A = 1, B = 0) = 0.25 P (A = 1) P (B = 0) = 0.5 × 0.5, ✓
P (A = 0, B = 1) = 0.25 P (A = 0) P (B = 1) = 0.5 × 0.5, ✓
P (A = 1, B = 1) = 0.25 P (A = 1) P (B = 1) = 0.5 × 0.5. ✓
The above method for proving that two random variables are independent becomes espe-
cially useful, when it is not immediately “obvious” that they are independent:
Exercise 71. Flip two fair coins. Let X indicate whether the two coin flips were the same
and Y count the number of heads. Are X and Y independent random variables? (Answer
on p. 350.)
Earlier we warned against blithely assuming that any two events are independent. Here we
can repeat this warning: Unless explicitly told (or you have a good reason), do not assume
that two random variables are independent.
The assumption of independence is a strong one. There are many scenarios where it is
plausible. For example, the flips of two coins are probably independent. The rolls of two
dice are probably independent.
There are, however, also many scenarios where it is not plausible. Today’s changes in
the share prices of Google and Apple are probably not independent. Today’s rainfall in
Singapore and in Kuala Lumpur are probably not independent.
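The check used in this section, namely comparing P(X = x, Y = y) with P(X = x)P(Y = y) for every pair of values, is mechanical enough to hand to a computer. Here is a minimal Python sketch for the two-coin-flip example. The names prob and are_independent are hypothetical helpers, and the outcomes are assumed to be equally likely.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes of two fair coin flips.
outcomes = list(product("HT", repeat=2))


def prob(condition):
    """Probability that `condition` holds, for equally likely outcomes."""
    favourable = [s for s in outcomes if condition(s)]
    return Fraction(len(favourable), len(outcomes))


def are_independent(X, Y):
    """Check P(X = x, Y = y) == P(X = x) P(Y = y) for all values x, y."""
    x_values = {X(s) for s in outcomes}
    y_values = {Y(s) for s in outcomes}
    return all(
        prob(lambda s: X(s) == x and Y(s) == y)
        == prob(lambda s: X(s) == x) * prob(lambda s: Y(s) == y)
        for x in x_values
        for y in y_values
    )


def A(s):
    """Indicates whether the first coin-flip was heads."""
    return 1 if s[0] == "H" else 0


def B(s):
    """Indicates whether the second coin-flip was heads."""
    return 1 if s[1] == "H" else 0


print(are_independent(A, B))
```

A single failing pair of values is enough to show that two random variables are not independent, so the check must run over every pair.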
Note that X takes on a value 1 with probability 1/6. Similarly, it takes on a value 2 with
probability 1/6. Etc. Hence, the expected value of X, denoted E [X] is given by:
E[X] = (1/6) ⋅ 1 + (1/6) ⋅ 2 + (1/6) ⋅ 3 + (1/6) ⋅ 4 + (1/6) ⋅ 5 + (1/6) ⋅ 6 = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5.
A bit more formally, the expected value or mean of a random variable X — denoted E[X]
— is simply a weighted average of the possible observed values of X, where the weights are
simply given by the probability that the random variable takes on each possible observed
value.
Given a random variable X, its mean is usually denoted µX . If it’s obvious from the context
that we’re talking about the random variable X, we drop the subscript X and simply use
µ to denote the mean of X.
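\[ \mu_Y = P(Y=2)\cdot 2 + P(Y=3)\cdot 3 + P(Y=4)\cdot 4 + P(Y=5)\cdot 5 + \dots + P(Y=12)\cdot 12 \]
\[ = \frac{1}{36}\cdot 2 + \frac{2}{36}\cdot 3 + \frac{3}{36}\cdot 4 + \frac{4}{36}\cdot 5 + \frac{5}{36}\cdot 6 + \frac{6}{36}\cdot 7 + \frac{5}{36}\cdot 8 + \frac{4}{36}\cdot 9 + \frac{3}{36}\cdot 10 + \frac{2}{36}\cdot 11 + \frac{1}{36}\cdot 12 \]
\[ = \frac{2+6+12+20+30+42+40+36+30+22+12}{36} = \frac{252}{36} = 7. \]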
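The same mean can be obtained by brute-force enumeration of the 36 equally likely outcomes of two dice. This is only a sketch; the names `counts`, `dist_sum` and `mean_sum` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely (die1, die2) outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist_sum = {s: Fraction(c, 36) for s, c in counts.items()}

# Probability-weighted average of the possible sums 2..12.
mean_sum = sum(p * s for s, p in dist_sum.items())
print(mean_sum)  # 7
```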
As it turns out, it is generally true that E[X + Y ] = E[X] + E[Y ] (as we’ll see in the next
section). So if we knew this, then the problem is very easy: $E[X + Y] = E[X] + E[Y] = 1 + \frac{1}{3} = \frac{4}{3}$.
But as an exercise, let’s pretend we don’t know that E[X + Y ] = E[X] + E[Y ]. We thus
have to work out E[X + Y ] the hard way:
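\[ P(X+Y=0) = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{5}{6}\cdot\frac{5}{6} = \frac{25}{144}, \]
\[ P(X+Y=1) = \binom{2}{1}\frac{1}{2}\cdot\frac{1}{2}\cdot\frac{5}{6}\cdot\frac{5}{6} + \frac{1}{2}\cdot\frac{1}{2}\cdot\binom{2}{1}\frac{5}{6}\cdot\frac{1}{6} = \frac{50}{144} + \frac{10}{144} = \frac{60}{144}. \]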
You are asked to complete the rest of this problem in the exercise below.
Exercise 73. In the game of 4D, you pay $1 to pick any four-digit number between 0000
and 9999 (there are thus 10, 000 possible choices). There are two variants of the 4D game
— “big” and “small”. The prize structures are as given below. Let X be the prize received
from a $1 stake in the “big” game and Y be the prize received from a $1 stake in the “small”
game. (Answer on p. 352.)
(a) Write down the possible observed values of X and Y .
(b) Write down the probability distributions of X and Y .
(c) Hence find E[X] and E[Y ].
(d) Which game — “big” or “small” — is expected to lose you less money?
(Source: Singapore Pools, “Rules for the 4-D Game”, Version 1.11, 17/11/15, PDF.)
Example 172. The differentiation operator $\frac{d}{dx}$ is an example of a linear transformation,
because it satisfies the following two conditions:
\[ \frac{d}{dx}\left(f(x) + g(x)\right) = \frac{d}{dx}f(x) + \frac{d}{dx}g(x), \quad\text{and}\quad \frac{d}{dx}\left(kf(x)\right) = k\frac{d}{dx}f(x). \]
Example 173. The square-root operator $\sqrt{\cdot}$ is not a linear transformation, because in
general, we do not have
\[ \sqrt{x+y} = \sqrt{x} + \sqrt{y}, \quad\text{or}\quad \sqrt{kx} = k\sqrt{x}. \]
Example 174. The square operator $(\cdot)^2$ is not a linear transformation, because in general,
we do not have
\[ (x+y)^2 = x^2 + y^2, \quad\text{or}\quad (kx)^2 = kx^2. \]
E[X + Y ] = E [X] + E [Y ] ,
The expectation operator is linear. This is true even if independence is not satisfied,
which makes it an especially powerful property. Example:
Example 175. I stake $100 on each of two different 4D numbers for Saturday’s drawing
(“big” game). (So that’s $200 total.)
Let X and Y be my winnings (excluding my original stake) from the first and second
numbers (respectively). Now, X and Y are certainly not independent because for example,
if my first number wins first prize, then my second number cannot possibly also win first
prize.
Nonetheless, despite X and Y not being independent, the linearity of the expectation
operator tells us that
Example 176. Consider a random variable X that is equally likely to take on one of 5
possible values: 0, 1, 2, 3, 4. Its mean is
\[ \mu_X = \sum_k P(X=k)\cdot k = \frac{1}{5}\cdot 0 + \frac{1}{5}\cdot 1 + \frac{1}{5}\cdot 2 + \frac{1}{5}\cdot 3 + \frac{1}{5}\cdot 4 = 2. \]
Now consider another random variable Y that is equally likely to take on one of 5 possible
values: −8, −3, 2, 7, 12. Coincidentally, its mean is the same:
\[ \mu_Y = \sum_k P(Y=k)\cdot k = \frac{1}{5}\cdot(-8) + \frac{1}{5}\cdot(-3) + \frac{1}{5}\cdot 2 + \frac{1}{5}\cdot 7 + \frac{1}{5}\cdot 12 = 2. \]
The random variables X and Y share the same mean. However, there is an obvious differ-
ence: Y is “more spread out”.
What, precisely, do we mean when we say that one random variable is “more spread out”
than another?
Our goal in this section is to invent a measure of “spread-outness”. We’ll call this the
variance and denote the variance of any random variable X by V [X].
It’s not at all obvious how the variance should be defined. One possibility is to define the
variance as the weighted average of the deviations from the mean.
Hmm. This works out to be 0. Is that just a weird coincidence? Let’s try the same for Y :
\[ V[Y] = \sum_k P(Y=k)\cdot(k-\mu) = \frac{-8-\mu}{5} + \frac{-3-\mu}{5} + \frac{2-\mu}{5} + \frac{7-\mu}{5} + \frac{12-\mu}{5} \]
\[ = \frac{-8-2}{5} + \frac{-3-2}{5} + \frac{2-2}{5} + \frac{7-2}{5} + \frac{12-2}{5} = -2 - 1 + 0 + 1 + 2 = 0. \]
\[ \sum_k P(X=k)\cdot(k-\mu) = \overbrace{\sum_k P(X=k)\cdot k}^{=\mu} - \sum_k P(X=k)\cdot\mu = \mu - \mu\underbrace{\sum_k P(X=k)}_{=1} = 0. \]
So our first proposed definition of the variance — the weighted average of the deviations
from the mean — is always equal to 0. Intuitively, the reason is that the negative deviations
(corresponding to those values below the mean) exactly cancel out the positive deviations
(corresponding to those values above the mean).
This proposed definition is thus quite useless. We cannot use it to say things like Y is
“more spread out” than X.
This suggests a second approach: define the variance to be the weighted average of the
absolute deviations from the mean.
For X, the weighted average of the absolute deviations from the mean is
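\[ V[X] = \sum_k P(X=k)\cdot|k-\mu| = \frac{|0-\mu|}{5} + \frac{|1-\mu|}{5} + \frac{|2-\mu|}{5} + \frac{|3-\mu|}{5} + \frac{|4-\mu|}{5} \]
\[ = \frac{|0-2|}{5} + \frac{|1-2|}{5} + \frac{|2-2|}{5} + \frac{|3-2|}{5} + \frac{|4-2|}{5} = \frac{2}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = \frac{6}{5}. \]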
Wonderful! So we can now use this second proposed definition of the variance to say things
like “Y is more spread out than X”.
This second proposed definition seems perfectly satisfactory. Yet for some bizarre reason,
it will not be our actual definition of variance. Instead, the variance will be defined as the
weighted average of the squared deviations from the mean.
For X, the weighted average of the squared deviations from the mean is
\[ V[X] = \sum_k P(X=k)\cdot(k-\mu)^2 = \frac{(0-\mu)^2}{5} + \frac{(1-\mu)^2}{5} + \frac{(2-\mu)^2}{5} + \frac{(3-\mu)^2}{5} + \frac{(4-\mu)^2}{5} \]
\[ = \frac{(0-2)^2}{5} + \frac{(1-2)^2}{5} + \frac{(2-2)^2}{5} + \frac{(3-2)^2}{5} + \frac{(4-2)^2}{5} = \frac{4}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{4}{5} = 2. \]
\[ V[Y] = \sum_k P(Y=k)\cdot(k-\mu)^2 = \frac{(-8-\mu)^2}{5} + \frac{(-3-\mu)^2}{5} + \frac{(2-\mu)^2}{5} + \frac{(7-\mu)^2}{5} + \frac{(12-\mu)^2}{5} \]
\[ = \frac{(-8-2)^2}{5} + \frac{(-3-2)^2}{5} + \frac{(2-2)^2}{5} + \frac{(7-2)^2}{5} + \frac{(12-2)^2}{5} = 20 + 5 + 0 + 5 + 20 = 50. \]
A bit more formally, if X is a random variable and µ is its expected value, then its variance
is defined to be the expected value of $(X-\mu)^2$.
The variance of X is denoted V[X] or $\sigma_X^2$ or even more simply as $\sigma^2$ (if it is clear from the
context that we’re talking about the variance of X). So we may write
\[ V[X] = \sigma_X^2 = E\left[(X-\mu)^2\right]. \]
So to calculate the variance, we do this: Consider all the possible values that X can take.
Take the difference between these values and the mean of X. Square them. Then take the
probability-weighted average of these squared numbers.
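The recipe just described translates directly into a few lines of Python. This is a minimal sketch under the assumption that a distribution is stored as a dictionary from value to probability; the function names `mean` and `variance` are illustrative.

```python
from fractions import Fraction

def mean(dist):
    """E[X]: probability-weighted average of the possible values."""
    return sum(p * x for x, p in dist.items())

def variance(dist):
    """V[X]: probability-weighted average of the squared deviations from the mean."""
    mu = mean(dist)
    return sum(p * (x - mu) ** 2 for x, p in dist.items())

# The two random variables from Example 176, each equally likely to take 5 values.
X = {k: Fraction(1, 5) for k in (0, 1, 2, 3, 4)}
Y = {k: Fraction(1, 5) for k in (-8, -3, 2, 7, 12)}
print(variance(X), variance(Y))  # 2 50
```

Both variables have mean 2, but the larger variance of Y captures the idea that Y is “more spread out”.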
More examples:
\[ V[X] = E\left[(X-\mu)^2\right] = E\left[(X-3.5)^2\right] \]
\[ = \frac{1}{6}\left(2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2\right) = \frac{35}{12} \approx 2.92. \]
So the variance of the die roll is 35/12 ≈ 2.92. This means that the expected squared
deviation of X from its mean µ = 3.5 is 35/12 ≈ 2.92.
Example 178. Roll two fair dice. Let the random variable Y be the sum of the two dice.
We already know from Example 170 that µ = 7. So, using also our findings from Exercise
70,
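\[ V[Y] = E\left[(Y-\mu)^2\right] = E\left[(Y-7)^2\right] \]
\[ = \frac{1\cdot 5^2 + 2\cdot 4^2 + 3\cdot 3^2 + 4\cdot 2^2 + 5\cdot 1^2 + 6\cdot 0^2 + 5\cdot 1^2 + 4\cdot 2^2 + 3\cdot 3^2 + 2\cdot 4^2 + 1\cdot 5^2}{36} \]
\[ = \frac{2\,(25 + 32 + 27 + 16 + 5)}{36} = \frac{210}{36} = \frac{70}{12} \approx 5.83. \]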
So the variance of the sum of two dice is 70/12 ≈ 5.83. This means that on average, the
square of the deviation of Y from its mean µ = 7 is 70/12 ≈ 5.83.
As the above examples suggest, calculating the variance can be tedious. Fortunately, there
is a shortcut:
Proof. Omitted.
Example 177 (continued from above). Let the random variable X be the outcome of
the roll of a fair die. We already know that µ = 3.5. So compute
\[ E[X^2] = P(X=1)\cdot 1^2 + P(X=2)\cdot 2^2 + \dots + P(X=6)\cdot 6^2 = \frac{1}{6}\left(1^2 + 2^2 + \dots + 6^2\right) = \frac{91}{6}. \]
Hence, $V[X] = E[X^2] - \mu^2 = \frac{91}{6} - 3.5^2 = \frac{182}{12} - \frac{147}{12} = \frac{35}{12}$.
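As a quick sanity check, the shortcut V[X] = E[X²] − µ² can be compared against the definition of the variance for the fair die. This is only an illustrative sketch; the variable names are not from the textbook.

```python
from fractions import Fraction

die = {face: Fraction(1, 6) for face in range(1, 7)}

mu = sum(p * x for x, p in die.items())                       # E[X]
second_moment = sum(p * x**2 for x, p in die.items())         # E[X^2]
by_definition = sum(p * (x - mu) ** 2 for x, p in die.items())  # E[(X - mu)^2]
by_shortcut = second_moment - mu**2                            # E[X^2] - mu^2

print(second_moment, by_definition, by_shortcut)  # 91/6 35/12 35/12
```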
Example 178 (continued from above). Let the random variable Y be the sum of two
rolled dice. We already know from Example 170 that µ = 7. So, using also our findings
from Exercise 70,
\[ E[Y^2] = P(Y=2)\cdot 2^2 + P(Y=3)\cdot 3^2 + \dots + P(Y=12)\cdot 12^2 \]
Exercise 74. Let the random variable B be the high card point count of a randomly-chosen
card from a standard 52-card deck. Find V[B]. (Answer on p. 353.)
Let X be a random variable. Then E [X] has the same unit of measure as X. In contrast,
V [X] uses the squared unit.
Example 179. There are 100 dumbbells in a gym, of which 30 have weight 5 kg and the
remaining 70 have weight 10 kg. Let X be the weight of a randomly-chosen dumbbell.
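Then the mean and variance of X are
\[ E[X] = 0.3 \times 5\ \text{kg} + 0.7 \times 10\ \text{kg} = 8.5\ \text{kg}, \]
\[ V[X] = 0.3 \times (5\ \text{kg} - 8.5\ \text{kg})^2 + 0.7 \times (10\ \text{kg} - 8.5\ \text{kg})^2 = 0.3 \times 12.25\ \text{kg}^2 + 0.7 \times 2.25\ \text{kg}^2 = 5.25\ \text{kg}^2. \]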
To get a measure of “spread” that uses the original unit of measure, we simply take the
square root of the variance. This measure of spread is called the standard deviation.
Definition 6. Let X be a random variable and V[X] be its variance. Then the standard
deviation of X is defined as
\[ \mathrm{SD}[X] = \sqrt{V[X]}. \]
The variance of a random variable X is often denoted $\sigma_X^2$ or even more simply as $\sigma^2$ (if it
is clear from the context that we’re talking about the variance of X).
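For the dumbbell example (30 dumbbells of 5 kg and 70 of 10 kg), the mean, variance and standard deviation can be computed as follows. This is a small sketch using ordinary floats; the variable names are illustrative.

```python
import math

weights = {5.0: 0.3, 10.0: 0.7}  # weight in kg -> probability

mu = sum(p * w for w, p in weights.items())               # mean, in kg
var = sum(p * (w - mu) ** 2 for w, p in weights.items())  # variance, in kg^2
sd = math.sqrt(var)                                       # standard deviation, in kg

print(round(mu, 2), round(var, 2), round(sd, 2))  # 8.5 5.25 2.29
```

Note that the standard deviation (about 2.29 kg) is back in the original unit, whereas the variance is in kg².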
Exercise 75. There are 100 rulers in a bookstore, of which 35 have length 20 cm and
the remaining 65 have length 30 cm. Let Y be the length of a randomly-chosen
ruler. Find the mean, variance, and standard deviation of Y . (Be sure to include the
units of measurement. Answer on p. 353.)
With the above properties, it becomes much easier than before to find the variance of the
sum of 2 dice, 3 dice, or indeed n dice.
Example 180. Let X be the outcome of a fair die-roll. We showed earlier that V[X] =
35/12.
Now roll two fair dice. Let X1 and X2 be the respective outcomes. Let Y be the sum of
the two dice (i.e. Y = X1 + X2 ). Assuming independence, we have
\[ V[Y] = V[X_1 + X_2] = V[X_1] + V[X_2] = \frac{70}{12}. \]
Now roll three fair dice. Let X3 , X4 , and X5 be the respective outcomes. Let Z be the sum
of the three dice (i.e. Z = X3 + X4 + X5 ). Again, assuming independence, we have
\[ V[Z] = V[X_3 + X_4 + X_5] = V[X_3] + V[X_4] + V[X_5] = \frac{105}{12}. \]
Again, compare this quick computation to the work we would have had to do, without this
property!
Now, let A be double the outcome of a die roll (i.e. A = 2X). Note importantly that A ≠ Y .
Y is the sum of two independent die rolls. In contrast, A is double the outcome of a single
die roll. Indeed, we have that
\[ V[A] = V[2X] = 4V[X] = \frac{140}{12} \neq V[Y]. \]
Similarly, let B be triple the outcome of a die roll (i.e. B = 3X). Note importantly that
B ≠ Z. Z is the sum of three independent die rolls. In contrast, B is triple the outcome of
a single die roll. Indeed, we have that
\[ V[B] = V[3X] = 9V[X] = \frac{315}{12} \neq V[Z]. \]
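The difference between the sum of two independent die rolls and double a single die roll can be confirmed by enumeration. This is an illustrative sketch; the helper `variance` below assumes every listed value is equally likely.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Variance of a random variable that is equally likely to be each listed value."""
    n = len(values)
    mu = Fraction(sum(values), n)
    return sum((v - mu) ** 2 for v in values) / n

faces = range(1, 7)
var_sum = variance([a + b for a, b in product(faces, repeat=2)])  # Y = X1 + X2
var_double = variance([2 * a for a in faces])                     # A = 2X
print(var_sum, var_double)  # 35/6 35/3
```

Here 35/6 = 70/12 and 35/3 = 140/12, matching the values obtained from the properties of the variance.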
Exercise 76. The weight of a fish in a pond is a random variable with mean µ kg and
variance σ 2 kg2 . (Include the units of measurement in your answer. Answer on p. 353.)
(a) If two fish are caught and the weights of these fish are independent of each other, what
are the mean and variance of the total weight of the two fish?
(b) If one fish is caught and an exact clone is made of it, what are the mean and variance
of the total weight of the fish and its clone?
(c) If two fish are caught and the weights of these fish are not independent of each other,
what are the mean and variance of the total weight of the two fish?
Example 181. Flip 3 fair coins. Let X be the random variable that counts the number of
heads.
Then X is an example of a binomial random variable with parameters 3 and $\frac{1}{2}$.
Example 182. Flip 4 fair coins. Let Y be the random variable that counts the number of
heads.
Then Y is an example of a binomial random variable with parameters 4 and $\frac{1}{2}$.
Example 183. There are 10 ATMs. On any given day, each has, independently, probability
0.1 of failure. Let Z be the random variable that counts the number of failures on any given
day.
We flip a biased coin n times. On each flip, the coin has probability p of landing on heads.
Let X count the number of heads. Then X is the binomial random variable with parameters
n and p.
What is P(X = k)? In other words, what is the probability that there are k heads and n − k
tails?
First let’s consider instead the probability that the first k coin-flips are heads and the
remaining n − k coin-flips are tails. We know that the probability of a heads is p and the
probability of a tails is 1 − p. Hence, by the Multiplication Principle, this probability is
simply $p^k (1-p)^{n-k}$.
The above is the probability of k heads and n − k tails, but where exactly the first k trials
are successes and exactly the last n − k trials are failures. But we don’t care about where
the successes are. We only care that there are k successes. And there are C(n, k) ways to
have exactly k successes in n trials. Thus, our desired probability is:
\[ P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}. \]
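The formula for P(X = k) is easy to evaluate by computer. Below is a minimal sketch; the function name `binomial_pmf` is an illustrative choice, and `math.comb` computes the number of ways C(n, k).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 8 heads in 10 flips of a fair coin.
print(binomial_pmf(10, Fraction(1, 2), 8))  # 45/1024
```

Passing `p` as a `Fraction` gives an exact answer; passing a float such as `0.5` also works and returns a decimal approximation.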
Example 185. Let X be the number of heads when 10 fair coins are flipped.
Then X ∼ B(10, 0.5). And the probability that exactly 8 coins are heads is:
\[ P(X=8) = \binom{10}{8}\, 0.5^{8}\, 0.5^{2} = \frac{45}{1024}. \]
\[ = \binom{20}{18}\, 0.9^{18}\, 0.1^{2} + \binom{20}{19}\, 0.9^{19}\, 0.1^{1} + \binom{20}{20}\, 0.9^{20}\, 0.1^{0} \approx 0.677. \]
Example 187. Problem: Three machines each have, independently, probability 0.3 of fail-
ure. What is the expected number of failures? What is the variance of the number of
failures?
Solution: Let Z ∼ B(3, 0.3) be the number of failures. Then
Hence, $E[Z] = P(Z=1)\cdot 1 + P(Z=2)\cdot 2 + P(Z=3)\cdot 3 = 0.9$.
Now, $E[Z^2] = P(Z=1)\cdot 1^2 + P(Z=2)\cdot 2^2 + P(Z=3)\cdot 3^2 = 1.44$.
Hence, $V[Z] = E[Z^2] - (E[Z])^2 = 1.44 - 0.9^2 = 0.63$.
It turns out though that there is a much quicker formula for finding the mean and variance
of any binomial random variable.
(You can verify that this formula works for the last example: n = 3, p = 0.3, and thus
E[Z] = np = 0.9.)
Proof. Omitted.
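For the three-machine example, the mean and variance computed term by term from the distribution can be compared with the standard binomial formulas E[Z] = np and V[Z] = np(1 − p). The sketch below is only a numerical check for n = 3 and p = 0.3, not a proof; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 3, Fraction(3, 10)

# Probability distribution of Z ~ B(3, 0.3).
pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

mean = sum(prob * k for k, prob in pmf.items())
var = sum(prob * k**2 for k, prob in pmf.items()) - mean**2

print(mean, var)               # 9/10 63/100
print(n * p, n * p * (1 - p))  # 9/10 63/100
```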
Exercise 77. (Answer on p. 354.) Plane engine #1 contains 20 components, each of which
has probability 0.01 of failure. Plane engine #2 contains 35 components, each of which has
probability 0.005 of failure. The probability that any component fails is independent of
whether any other component has failed.
An engine fails if and only if at least 2 of its components fail. What is the probability that
both engines fail?
The binomial random variable is discrete, because its range of possible observed values is
finite.
We’ll now look instead at continuous random variables. Informally, a random variable Y
is continuous if its range takes on a continuum of values.
For H1 Maths, you need only learn about one continuous random variable: the normal
random variable (subject of the next chapter).
Nonetheless, we’ll first look at another continuous random variable that is not in the syl-
labus. This is the continuous uniform random variable. It is much simpler than
the normal random variable and can thus help build up your intuition of how continuous
random variables work.
A line measuring exactly 1 metre in length is drawn on the floor. It is about to rain. Let
X be the position of the first rain-drop that hits the line. X is measured as the distance
(in metres) from the left-most point of the line.
So for example, if the first rain-drop hits the left-most point of the line, then x = 0. If it
hits the exact midpoint of the line, then x = 0.5. And if it hits the right-most point, then
x = 1.
Assume we can measure X to infinite precision.
Then, assuming the first rain-drop is equally likely to hit any point of the line, we can
model X as a continuous uniform random variable on [0, 1]. This says that
• The range of X is [0, 1] (the first rain-drop can hit any point along the line); and
• X is equally likely to take on any value in the interval [0, 1] (the first rain-drop is equally
likely to hit any point along the line).
Recall that previously with any discrete random variable Y , we could find its probability
distribution. That is, we could find P (Y = k) (the probability that Y takes on the value
k). For example, if Y ∼ B (3, 0.5) modelled the number of heads in three coin-flips, then
the probability that there was one heads was $P(Y=1) = \binom{3}{1}\, 0.5^{1}\, 0.5^{2} = \frac{3}{8}$.
So for any continuous random variable X, it is pointless to try to write down P (X = k) for
different possible values of k, because P (X = k) is always equal to zero (regardless of what
k is). Instead, we shall try to write down P (a ≤ X ≤ b), for different possible values of a
and b.
Now, if X ∼ U [0, 1], then the probability that X takes on values between 0.3 and 0.7 is
simply 0.7 − 0.3 = 0.4. That is,
Similarly, the probability that X takes on values between 0.16 and 0.35 is simply 0.35−0.16 =
0.19. That is,
The above observations suggest that it may be useful to define a new concept, called the
cumulative distribution function.
14
But strangely enough, zero probability is not the same thing as impossible. For example, we’d say that
• There is zero probability, but it is not impossible that X ∼ U [0, 1] takes on the value 0.37.
• There is zero probability and it is impossible that X ∼ U [0, 1] takes on the value 1.2.
(Actually, rather than use the word “impossible”, mathematicians prefer saying “almost never”, which has a precise
definition.)
P (X ≤ k) = P (X < k) .
That is, whether an inequality is strict makes no difference. The reason is that:
Thus, for continuous random variables, it doesn’t matter whether inequalities are strict or
weak.
P (0.2 ≤ X ≤ 0.5) = P (0.2 < X ≤ 0.5) = P (0.2 ≤ X < 0.5) = P (0.2 < X < 0.5) .
Example 189. Let X ∼ U [0, 1]. Let FX be its CDF. Then we have, for example,
Example 190. Let Y ∼ U [3, 5]. This is the continuous uniform distribution on [3, 5]. It
is equally likely to take on any value in the interval [3, 5]. Let FY be the CDF of Y . Then
we have, for example,
\[ f_X = \frac{d}{dk} F_X. \]
Example 191. The PDF of X ∼ U[0, 1] (graphed below) is simply the function fX ∶ R → R
defined by
Recall that the area under the curve (definite integral) can be computed as the reverse
process of differentiation. Hence, for any a ≤ b, the area under the PDF between a and b is
precisely P (a ≤ X ≤ b). For example, there is probability 0.25 (red area) that X takes on
values between 0.5 and 0.75. There is probability 0.1 (blue area) that X takes on values
between 0.2 and 0.3.
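For a continuous uniform random variable, these areas are just differences of CDF values. The following is a minimal sketch; the function names `uniform_cdf` and `uniform_prob`, and the default interval [0, 1], are illustrative assumptions.

```python
def uniform_cdf(k, lo=0.0, hi=1.0):
    """F(k) = P(X <= k) for X ~ U[lo, hi]."""
    if k <= lo:
        return 0.0
    if k >= hi:
        return 1.0
    return (k - lo) / (hi - lo)

def uniform_prob(a, b, lo=0.0, hi=1.0):
    """P(a <= X <= b), computed as F(b) - F(a)."""
    return uniform_cdf(b, lo, hi) - uniform_cdf(a, lo, hi)

print(round(uniform_prob(0.5, 0.75), 4))  # 0.25
print(round(uniform_prob(0.2, 0.3), 4))   # 0.1
```

The optional `lo` and `hi` arguments allow the same functions to be used for a uniform random variable on any other interval.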
Exercise 78. The continuous uniform random variable Y ∼ U[3, 5] is equally likely to take
on values between 3 and 5, inclusive. (a) Write down CDF FY . (b) Write down and graph
its PDF fY . (c) Compute, and also illustrate on your graph, the quantities P (3.1 ≤ Y ≤ 4.6)
and P (4.8 ≤ Y ≤ 4.9). (Answer on p. 354.)
15
Note that although every random variable has a CDF, not every random variable has a PDF. In particular, if the random
variable’s CDF is not differentiable, then by our definition here, the random variable does not have a PDF.
The standard normal (or Gaussian) random variable (SNRV) is very important. In
fact, it is so important that we usually reserve the letter Z for it, and the Greek letters φ
and Φ (lower- and upper-case phi) for its PDF and CDF.
1. Z is a SNRV.
2. Z is a random variable with the standard normal distribution.
3. Z ∼ N (0, 1).
\[ \phi(a) = \frac{1}{\sqrt{2\pi}}\, e^{-0.5a^2}. \]
For the A-levels, you need not remember this complicated-looking PDF. Nor need you
understand where it comes from.
The normal PDF is often also referred to as the bell curve, due to its resemblance to a
bell (kinda).
As with the continuous uniform, for any a ≤ b, the area under the normal PDF between a
and b gives us precisely P (a ≤ Z ≤ b). For example, there is probability 0.0819 (red area)
that Z takes on values between 0.5 and 0.75. There is probability 0.4593 (blue area) that
Z takes on values between −1 and 0.3.
\[ \Phi(a) = P(Z \le a) = \int_{-\infty}^{a} \phi(x)\,dx = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}}\, e^{-0.5x^2}\,dx. \]
Unfortunately, this last integral has no simpler expression (mathematicians would say that
it has no “closed-form expression”). Instead, as we’ll soon see, we have to use the so-called
Z-tables (or a graphing calculator) to look up values of Φ(k).
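Outside the exam hall, software can play the role of the Z-tables. One common approach, sketched below, expresses Φ through the error function `erf` (itself a tabulated special function, available in Python's `math` module) via the identity Φ(a) = (1 + erf(a/√2))/2. The function name `phi_cdf` is an illustrative choice.

```python
import math

def phi_cdf(a):
    """Standard normal CDF, using Phi(a) = (1 + erf(a / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

print(round(phi_cdf(0.0), 4))  # 0.5
print(round(phi_cdf(1.0), 4))  # 0.8413
```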
The next fact summarises the properties of the normal distribution. Some of these proper-
ties are illustrated in the figure that follows.
Fact 10. Let Z ∼ N(0, 1) and let φ and Φ be the PDF and CDF of Z.
1. Φ(∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random
variable.)
2. φ(a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising
implication that no matter how large a is, there is always some non-zero probability that
Z ≥ a.)
3. E [Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and
compute $\phi(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399$.)
5. V [Z] = 1. (The variance of Z is 1.)
6. P (Z ≤ a) = P (Z < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
Proof. Omitted.
1. Press the blue 2ND button and then DISTR (which corresponds to the VARS button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
The TI84 is now asking for your lower and upper bounds. Since Φ(2.51) = Φ(2.51)−Φ(−∞),
your lower bound is −∞ and your upper bound is 2.51.
3. But there’s no way to enter −∞ on your TI84. So instead, you’ll enter $-10^{99}$, which is
simply a very large negative number. To do so, press (-) , the blue 2ND button, EE
(which corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Now to enter your upper bound. First press , (this simply demarcates your lower and
upper bounds). Then enter your upper bound 2.51 by pressing 2 . 5 1 . Then press
ENTER . Your TI84 says that the answer is Φ(2.51) ≈ 0.99396.
Example 194. We’ll find Φ(2.51), Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4) using Z-tables.
Refer to the Z-tables on p. 188. (These are the exact same tables that appear on the List
of Formulae you’ll get during exams.)
• To find Φ(2.51), look at the row labelled 2.5 and the column labelled 1 — read off the
number 0.9940. We thus have Φ(2.51) = 0.9940.
• To find Φ(−2.51), note that the table does not explicitly give values of Φ(z), if z < 0.
But we can exploit the fact that the standard normal is symmetric about the mean µ = 0.
This fact implies that Φ(−z) = 1 − Φ(z). Hence, Φ(−2.51) = 1 − Φ(2.51) = 0.0060.
• To find Φ(1.372), first look at the row labelled 1.3 and the column labelled 7 — read off
the number 0.9147. This tells us that Φ(1.37) = 0.9147. Now look at the right end of the
table (where it says “ADD”). Since the third decimal place of 1.372 is 2, we look under
the column labelled 2, which tells us to ADD 3 to the last digit (i.e. to add 0.0003). Thus, Φ(1.372) = 0.9147 + 0.0003 = 0.9150.
• To find P (−4 ≤ Z ≤ 4), the Z-tables printed are actually useless, because they only go
to 2.99. So you can just write P (−4 ≤ Z ≤ 4) ≈ 1.
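The four table look-ups above can be cross-checked in software. The sketch below uses `NormalDist` from Python's standard `statistics` module as a stand-in for the Z-tables; it is a convenience check, not something available in the exam.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal random variable

print(round(Z.cdf(2.51), 4))           # 0.994
print(round(Z.cdf(-2.51), 4))          # 0.006
print(round(Z.cdf(1.372), 4))          # 0.915
print(round(Z.cdf(4) - Z.cdf(-4), 4))  # 0.9999
```

The second value also illustrates the symmetry rule Φ(−z) = 1 − Φ(z).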
Exercise 79. Using both the Z-tables and your graphing calculator, find the following:
(a) P (Z ≥ 1.8). (b) P (−0.351 < Z < 1.2). (Answer on p. 355.)
z 0 1 2 3 4 5 6 7 8 9 ADD: 1 2 3 4 5 6 7 8 9
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 4 8 12 16 20 24 28 32 36
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 4 8 12 16 20 24 28 32 36
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 4 8 12 15 19 23 27 31 35
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 4 7 11 15 19 22 26 30 34
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 4 7 11 14 18 22 25 29 32
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 3 7 10 14 17 20 24 27 31
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 3 7 10 13 16 19 23 26 29
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 3 6 9 12 15 18 21 24 27
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 3 5 8 11 14 16 19 22 25
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 3 5 8 10 13 15 18 20 23
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 2 5 7 9 12 14 16 19 21
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 2 4 6 8 10 12 14 16 18
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 2 4 6 7 9 11 13 15 17
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 2 3 5 6 8 10 11 13 14
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1 3 4 6 7 8 10 11 13
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1 2 4 5 6 7 8 10 11
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1 2 3 4 5 6 7 8 9
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1 2 3 4 4 5 6 7 8
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1 1 2 3 4 4 5 6 6
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 1 1 2 2 3 4 4 5 5
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 0 1 1 2 2 3 3 4 4
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 0 1 1 2 2 2 3 3 4
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 0 1 1 1 2 2 2 3 3
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 0 1 1 1 1 2 2 2 2
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 0 0 1 1 1 1 1 2 2
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 0 0 0 1 1 1 1 1 1
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 0 0 0 0 1 1 1 1 1
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 0 0 0 0 0 1 1 1 1
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 0 0 0 0 0 0 0 1 1
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 0 0 0 0 0 0 0 0 0
Consider σZ + µ, itself a random variable. Since E [Z] = 0 and V [Z] = 1, it
follows from the properties of the mean and variance that E [σZ + µ] = σE [Z] + µ = µ and $V[\sigma Z + \mu] = \sigma^2 V[Z] = \sigma^2$.
It turns out that σZ + µ is a normal random variable with mean µ and variance σ 2 :
Definition 8. X is called a normal random variable with mean µ and variance σ 2 if its
PDF fX ∶ R → R is defined by:
\[ f_X(a) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-0.5\left(\frac{a-\mu}{\sigma}\right)^2}. \]
Once again, for the A-levels, you need not remember this complicated-looking PDF. Nor
need you understand where it comes from.
Exercise 80. Let X ∼ N(µ, σ 2 ). Verify that if µ = 0 and σ 2 = 1, then for all a ∈ R, we have
fX (a) = φ(a). What can you conclude? (Answer on p. 355.)
Proof. Omitted.
Thus, we can easily transform any normal random variable into the SNRV:
Corollary 1. If $X \sim N(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} = Z \sim N(0, 1)$. Equivalently, X = σZ + µ.
Exercise 81. Using Fact 11, prove that if $X \sim N(\mu, \sigma^2)$, then $\frac{X-\mu}{\sigma} = Z \sim N(0, 1)$. (Answer
on p. 356.)
The above corollary gives us an alternative method for computing probabilities associated
with normal random variables. In general, if X ∼ N (µ, σ 2 ), then
\[ P(X \le c) = P(\sigma Z + \mu \le c) = P\left(Z \le \frac{c-\mu}{\sigma}\right) = \Phi\left(\frac{c-\mu}{\sigma}\right). \]
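The standardisation step can be written as a one-line function. The sketch below compares it with a direct evaluation of the normal CDF; the function name `normal_cdf_via_z` and the example parameters (X ∼ N(1, 2) and c = 2) are hypothetical choices for illustration.

```python
from statistics import NormalDist

def normal_cdf_via_z(c, mu, sigma):
    """P(X <= c) for X ~ N(mu, sigma^2), computed as Phi((c - mu) / sigma)."""
    return NormalDist().cdf((c - mu) / sigma)

# Hypothetical example: X ~ N(1, 2), so sigma = sqrt(2), and c = 2.
mu, sigma, c = 1.0, 2.0 ** 0.5, 2.0
direct = NormalDist(mu, sigma).cdf(c)
print(round(normal_cdf_via_z(c, mu, sigma), 4), round(direct, 4))
```

Note that σ here is the standard deviation, i.e. the square root of the second parameter in N(µ, σ²).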
Fact 12. Let X ∼ N (µ, σ 2 ) and let fX and FX be the PDF and CDF of X.
1. FX (∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random
variable.)
2. fX (a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising
implication that no matter how large a is, there is always some non-zero probability that
X ≥ a.)
3. E [X] = µ. (The mean of X is µ.)
4. The PDF fX reaches a global maximum at the mean µ. (In fact, we can go ahead and
compute $f_X(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \approx \frac{0.399}{\sigma}$.)
5. V [X] = σ 2 . (The variance of X is σ 2 .)
6. P (X ≤ a) = P (X < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(X = a) = 0.)
7. The PDF fX is symmetric about the mean. This has several implications:
(a) P (X ≥ µ + a) = P (X ≤ µ − a) = FX (µ − a).
(b) Since P (X ≥ µ + a) = 1 − P (X ≤ µ + a) = 1 − FX (µ + a), it follows that FX (µ − a) =
1 − FX (µ + a) or, equivalently, FX (µ + a) = 1 − FX (µ − a).
(c) FX (µ) = 1 − FX (µ) = 0.5.
Proof. Omitted.
1. Press the blue 2ND button and then VARS (which corresponds to the DISTR button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
3. Enter the lower bound $-10^{99}$ by pressing (-) , the blue 2ND button, EE (which corresponds
to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Enter the upper bound 2 by pressing , and 2 . (Don’t press ENTER yet!!).
Previously, we didn’t bother telling the TI84 our mean µ and standard deviation σ.
And so by default, if we pressed ENTER at this point, the TI84 simply assumed that we
wanted the SNRV Z ∼ N(0, 1). Now we’ll tell the TI84 what µ and σ are:
Finding P (H < 2), P (I < 2), P (−1 < G < 1), P (−1 < H < 1), and P (−1 < I < 1) is similar:
[Figure: TI84 screenshots for P (H < 2) and P (I < 2), P (−1 < G < 1), P (−1 < H < 1), and P (−1 < I < 1).]
Since I has mean µ = 2, we should have exactly P (I < 2) = 0.5. So here the TI84 has
actually made a small error in reporting instead that P (I < 2) ≈ 0.5000000005.
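\[ P(G < 2) = P\left(Z < \frac{2-\mu_G}{\sigma_G} = \frac{2-(-1)}{\sqrt{0.1}} \approx 9.4868\right) = \Phi(9.4868) \approx 1, \]
\[ P(H < 2) = P\left(Z < \frac{2-\mu_H}{\sigma_H} = \frac{2-1}{\sqrt{2}} \approx 0.7071\right) = \Phi(0.7071) \approx 0.7601, \]
\[ P(I < 2) = P\left(Z < \frac{2-\mu_I}{\sigma_I} = \frac{2-2}{\sqrt{3}} = 0\right) = \Phi(0) = 0.5, \]
\[ P(-1 < G < 1) = P\left(0 = \frac{-1-(-1)}{\sqrt{0.1}} < Z < \frac{1-(-1)}{\sqrt{0.1}} \approx 6.3246\right) = \Phi(6.3246) - \Phi(0) \approx 1 - \Phi(0) = 0.5. \]
\[ P(-1 < H < 1) = P\left(-1.4142 \approx \frac{-1-1}{\sqrt{2}} < Z < \frac{1-1}{\sqrt{2}} = 0\right) = \Phi(0) - \Phi(-1.4142) \approx 0.5 - \left[1 - \Phi(1.4142)\right] \]
\[ = \Phi(1.4142) - 0.5 \approx 0.9213 - 0.5 = 0.4213, \]
\[ P(-1 < I < 1) = P\left(-1.7321 \approx \frac{-1-2}{\sqrt{3}} < Z < \frac{1-2}{\sqrt{3}} \approx -0.5774\right) = \Phi(-0.5774) - \Phi(-1.7321) \]
\[ = 1 - \Phi(0.5774) - \left[1 - \Phi(1.7321)\right] \approx 0.9584 - 0.7182 = 0.2402. \]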
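These six probabilities can also be reproduced in software. The sketch below reads the parameters off the computations above (G with mean −1 and variance 0.1, H with mean 1 and variance 2, I with mean 2 and variance 3) and uses `NormalDist` from Python's standard `statistics` module in place of the calculator.

```python
from math import sqrt
from statistics import NormalDist

# The second parameter of N(mu, sigma^2) is a variance, but NormalDist
# expects the standard deviation sigma, so we take square roots.
G = NormalDist(-1, sqrt(0.1))
H = NormalDist(1, sqrt(2))
I = NormalDist(2, sqrt(3))

for name, rv in (("G", G), ("H", H), ("I", I)):
    below_2 = rv.cdf(2)               # P(. < 2)
    between = rv.cdf(1) - rv.cdf(-1)  # P(-1 < . < 1)
    print(name, round(below_2, 4), round(between, 4))
```

Small differences in the fourth decimal place relative to the Z-table answers are to be expected, since the tables themselves are rounded.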
Exercise 82. Let X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2). Using both the Z-tables and your
graphing calculator, find the following: (a) P (X ≥ 1) and P (Y ≥ 1). (b) P (−2 ≤ X ≤ −1.5)
and P (−2 ≤ Y ≤ −1.5). (Answer on p. 356.)
Proof. Omitted.
Corollary 2. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent and a, b ∈ R
be constants. Then $X + Y \sim N(\mu_X + \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$ and more generally,
$aX + bY \sim N(a\mu_X + b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2)$.
Moreover, $X - Y \sim N(\mu_X - \mu_Y,\ \sigma_X^2 + \sigma_Y^2)$ and more generally,
$aX - bY \sim N(a\mu_X - b\mu_Y,\ a^2\sigma_X^2 + b^2\sigma_Y^2)$.
Examples:
(a) What is the probability that their total weight is greater than 405 kg?
(b) What is the probability that one is more than 10% heavier than the other?
(a) Let X1 ∼ N (200, 50) and X2 ∼ N (200, 50) be the weight of the first and second sumo
wrestler. Then X1 + X2 ∼ N (400, 100). Thus,
\[ P(X_1 + X_2 > 405) = P\left(Z > \frac{405-400}{\sqrt{100}}\right) = P(Z > 0.5) = 1 - \Phi(0.5) \approx 1 - 0.6915 = 0.3085. \]
(b) Our goal is to find p = P (X1 > 1.1X2 ) + P (X2 > 1.1X1 ). This is the probability that
the first sumo wrestler is more than 10% heavier than the second, plus the probability that
the second is more than 10% heavier than the first. Of course, by symmetry, these two
probabilities are equal. Thus, p = 2 × P (X1 > 1.1X2 ). Now,
But $X_1 - 1.1X_2 \sim N(200 - 1.1\cdot 200,\ 50 + 1.1^2\cdot 50) = N(-20, 110.5)$. Thus,
\[ P(X_1 > 1.1X_2) = P(X_1 - 1.1X_2 > 0) = P\left(Z > \frac{0-(-20)}{\sqrt{110.5}}\right) \]
(b) What is the probability that a caught fish weighs more than 9 times as much as a
caught shrimp?
(a) Let S be the total weight of 4 caught fish and 50 caught shrimp. Note, importantly,
that it would be wrong to write S = 4X + 50Y , because 4X + 50Y would be 4 times the
weight of a single caught fish, plus 50 times the weight of a single caught shrimp.
In contrast, we want S to be the sum of the weights of 4 independent fish and 50 independent
shrimp. Thus, we should instead write S = X1 + X2 + X3 + X4 + Y1 + Y2 + ⋅ ⋅ ⋅ + Y50 , where
• X1 ∼ N (1, 0.4), X2 ∼ N (1, 0.4), X3 ∼ N (1, 0.4), and X4 ∼ N (1, 0.4) are the weights of
each caught fish.
• Y1 ∼ N (0.1, 0.1), Y2 ∼ N (0.1, 0.1), . . . , and Y50 ∼ N (0.1, 0.1) are the weights of each
caught shrimp.
(Note by the way that in contrast, $4X + 50Y \sim N(9,\ 4^2 \times 0.4 + 50^2 \times 0.1) = N(9, 256.4)$, which
has a rather different variance!)
(b) P (X > 9Y ) = P (X − 9Y > 0). But $X - 9Y \sim N(1 - 9 \times 0.1,\ 0.4 + 9^2 \times 0.1) = N(0.1, 8.5)$.
Thus, P (X − 9Y > 0) ≈ 0.5137 (calculator).
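The distinction between a sum of independent weights and a scaled single weight, as well as the probability in part (b), can be checked numerically. This is a sketch that takes the fish weight as X ∼ N(1, 0.4) and the shrimp weight as Y ∼ N(0.1, 0.1), as stated in the solution above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_x, var_x = 1.0, 0.4  # fish weight X ~ N(1, 0.4)
mu_y, var_y = 0.1, 0.1  # shrimp weight Y ~ N(0.1, 0.1)

# Sum of 4 independent fish and 50 independent shrimp: the variances simply add.
var_S = 4 * var_x + 50 * var_y
# 4X + 50Y scales single weights: each variance is multiplied by the squared constant.
var_scaled = 4**2 * var_x + 50**2 * var_y
print(round(var_S, 1), round(var_scaled, 1))  # 6.6 256.4

# Part (b): X - 9Y ~ N(mu_x - 9*mu_y, var_x + 9^2 * var_y) = N(0.1, 8.5).
D = NormalDist(mu_x - 9 * mu_y, sqrt(var_x + 9**2 * var_y))
print(round(1 - D.cdf(0), 4))  # 0.5137
```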
(a) Find the probability that their total water and electricity utility bill in any given month
exceeds $100.
(b) Find the probability that their total water and electricity utility bill in any given year
exceeds $1, 000.
(c) What, then, is the maximum value of x for which the probability that the total utility
bill in a given month exceeds $100 is 0.1 or less?
How large is “large enough”? The most common rule-of-thumb is that n ≥ 30 is “large
enough”, so that’s what we’ll use in this book, even though this is somewhat arbitrary.
The CLT says that since n = 100 ≥ 30 is large enough and the distribution is “nice enough”
(we are assuming this), the random variable X can be approximated by the normal random
variable Y ∼ N (100 × 3.5, 100 × 35/12) = N (350, 3500/12).
P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360).
Note however that X is a discrete random variable, so that P(X ≥ 360) ≠ P(X > 360).
More specifically, P(X ≥ 360) = P(X > 360) + P(X = 360), where P(X = 360) > 0.
In contrast, Y is a continuous random variable, so that P(Y ≥ 360) = P(Y > 360). Hence, if
we simply use the approximations P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360),
then implicitly we’d be saying that P(X = 360) = 0, which is blatantly false.
To correct for this, we perform the so-called continuity correction. This says that we’ll
instead use the approximations
P(X ≥ 360) ≈ P(Y ≥ 359.5) and P(X > 360) ≈ P(Y ≥ 360.5).
Thus, P(X ≥ 360) ≈ P(Y ≥ 359.5) ≈ 0.2890 (calculator) and P(X > 360) ≈ P(Y ≥ 360.5) ≈
0.2693.
Note that if the random variable to be approximated is itself continuous, then there is no
need to perform the continuity correction. This is illustrated in Exercise 85 below.
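The continuity correction above can be sketched in a few lines of Python. This is merely an illustration: the helper normal_sf is our own, and we assume (judging by the per-roll mean 3.5 and variance 35/12) that X is the sum of 100 rolls of a fair die.

```python
from math import erf, sqrt

def normal_sf(x, mean, var):
    """P(Y > x) for Y ~ N(mean, var), via the error function."""
    return 0.5 * (1 - erf((x - mean) / sqrt(2 * var)))

n = 100                      # number of die rolls (assumed)
mean = n * 3.5               # mean of the sum: 350
var = n * 35 / 12            # variance of the sum: 3500/12

# Continuity correction: shift the cut-off by 0.5 towards the included value.
p_at_least_360 = normal_sf(359.5, mean, var)   # approximates P(X >= 360)
p_more_than_360 = normal_sf(360.5, mean, var)  # approximates P(X > 360)

print(round(p_at_least_360, 4), round(p_more_than_360, 4))
```

The gap between the two printed numbers is the approximation's stand-in for P(X = 360), which the uncorrected approximation would wrongly treat as zero.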
Exercise 84. Let X be the random variable that is the sum of 30 rolls of a fair die. Find
P(100 ≤ X ≤ 110). (Answer on p. 358.)
55.1 Population
Example 203. The two candidates for the 2016 Bukit Batok SMC By-Election are Dr.
Chee Soon Juan and PAP Guy. It is the night of the election and voting has just closed.
Our objects-of-interest are the 23,570 valid ballots cast. (A ballot is simply a piece of paper
on which a vote is recorded. The words ballot and vote are often used interchangeably.)
Arrange the ballots in any arbitrary order. Let v1 = 1 if the first ballot is in favour of Dr.
Chee and v1 = 0 otherwise. Similarly and more generally, for any i = 2, 3, . . . , 23570, let
vi = 1 if the ith ballot is in favour of Dr. Chee and vi = 0 otherwise.
Our population here is simply the ordered set P = (v1 , v2 , . . . , v23570 ). So in this example,
the population is simply an ordered set of 1s and 0s.
The population mean µ is simply the average across all population values. The population
variance σ² is a measure of the variation across all population values. Formally:16
\[ \mu = \frac{\sum_{i=1}^{k} v_i}{k} = \frac{v_1 + v_2 + \dots + v_k}{k} \quad\text{and}\quad \sigma^2 = \frac{\sum_{i=1}^{k} (v_i - \mu)^2}{k} = \frac{(v_1 - \mu)^2 + (v_2 - \mu)^2 + \dots + (v_k - \mu)^2}{k}. \]
Example 205 (continued from above). Suppose that of the 23,570 votes, 9,142 were
for Dr. Chee and the remaining against. So the vector (v1 , v2 , . . . , v23570 ) contains 9,142 1s
and 14,428 0s.
In this particular example, the population values are binary (either 0 or 1). And so we have
a nice alternative interpretation: the population mean is also the population proportion.
In this case, it is the proportion of the population who voted for Dr. Chee. So here the
proportion of votes for Dr. Chee is about 0.3879.
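The population variance is
\[ \sigma^2 = \frac{(v_1 - \mu)^2 + (v_2 - \mu)^2 + \dots + (v_{23570} - \mu)^2}{23570} = \frac{9142 \cdot \left(1 - \frac{9142}{23570}\right)^2 + 14428 \cdot \left(0 - \frac{9142}{23570}\right)^2}{23570} \approx 0.2374. \]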
As usual, the variance tells us about the degree to which the vi ’s vary. Of course, in this
example, we already know that the vi ’s can take on only two values — 0 and 1. So the
variance isn’t terribly interesting or informative in this example. In particular, it doesn’t
tell us anything more that the population mean didn’t already tell us (indeed, it can be
shown that in this example, σ² = µ − µ²).
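The population mean and variance in this example are easy to verify by computer. Here is a minimal sketch using exact fractions; the variable names are our own, and the vote counts are those supposed above.

```python
from fractions import Fraction

# Population from the example: 9,142 ones (votes for Dr. Chee) and 14,428 zeros.
votes_for, votes_against = 9142, 14428
k = votes_for + votes_against            # population size: 23,570

# Population mean: the average of all the 1s and 0s (exact, as a fraction).
mu = Fraction(votes_for, k)

# Population variance: average squared deviation from mu (denominator k, not k - 1).
sigma2 = (votes_for * (1 - mu) ** 2 + votes_against * (0 - mu) ** 2) / k

print(float(mu), float(sigma2))
print(sigma2 == mu - mu ** 2)   # the identity for a population of 0s and 1s
```

Because the arithmetic is done with exact fractions, the final comparison is a genuine check of the identity in this example rather than a floating-point coincidence.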
16 In the case of an infinite population, the definitions of µ and σ² must be adjusted slightly, but the intuition is the same.
Informally, a parameter is some number we’re interested in and which may be calculated
based on the population.
Voting has just closed. In a few hours’ time (after the vote-counting is done), we will know
what exactly µ is. But right now, we still don’t know what µ is.
Suppose we are impatient and want to know right away what µ might be. In other words,
suppose we want to get an estimate of the true value of µ. What are some possible
methods of getting a quick estimate of µ?
One possibility is to observe a random sample of 100 votes and count the proportion of
these 100 votes that are in favour of Dr. Chee. So for example, say we do this and observe
that 39 out of the 100 votes are for Dr. Chee. That is, we find that the observed sample
mean (which in this context can also be called the observed sample proportion) is
0.39. Then we might conclude:
Based on this observed random sample of 100 votes, we estimate that µ is 0.39.
The layperson might be content with this. But the statistician digs a little deeper and asks
questions such as:
• How do we know if this estimate is “good”?
• What are the criteria to determine whether an estimate is “good”?
We’ll now try to address, if only to a limited extent, these questions. But to do so, we must
first precisely define terms like sample and estimate.
1. The range of possible values taken on by the objects in the population; and
2. The proportion of the population that takes on each possible value.
Example 205 (continued from above). The population is P = (v1 , v2 , . . . , v23570 ), the
ordered set of 23,570 ballots. Suppose that of these, 9,142 are votes for Dr. Chee (hence
recorded as 1s) and the remaining 14,428 are for PAP Guy (hence recorded as 0s).
Then the distribution of the population can informally be described in words as:
• A proportion 9142/23570 of the population are 1s, and
• A proportion 14428/23570 of the population are 0s.
Then the distribution of the population can informally be described in words as:
• A proportion 1/6 of the population are 2s;
• A proportion 2/6 of the population are 3s;
• A proportion 1/6 of the population are 4s; and
• A proportion 2/6 of the population are 7s.
17
Formally, we’d define the population distribution as a function. Indeed, some writers define the population itself as the
distribution function.
Informally, to observe a random sample of size n, we follow this procedure: Imagine the
23,570 ballots are in a single big bag.
1. Randomly pull out one ballot. Record the vote (either we write x1 = 1, if the vote was
for Dr. Chee, or we write x1 = 0, if it wasn’t).
2. Put this ballot back in (this second step is why we call it sampling with replacement).
3. Repeat the above n times in total, so as to record the values of x1 , x2 , . . . , xn .
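The three-step procedure above can be mimicked on a computer. This is only a sketch under our own assumptions: the function observe_random_sample, the seed, and the make-up of the bag (9,142 ones and 14,428 zeros, as supposed in the running example) are all for illustration.

```python
import random

def observe_random_sample(population, n, seed=None):
    """Return an observed random sample of size n, drawn WITH replacement."""
    rng = random.Random(seed)   # the seed only makes the sketch reproducible
    sample = []
    for _ in range(n):
        ballot = rng.choice(population)  # step 1: pull out one ballot and record it
        sample.append(ballot)            # step 2: the ballot stays in the bag
    return sample                        # step 3: the above is done n times in total

# Hypothetical bag: 9,142 ones and 14,428 zeros.
bag = [1] * 9142 + [0] * 14428
x = observe_random_sample(bag, n=100, seed=2016)
print(sum(x) / len(x))   # observed sample proportion (changes with the seed)
```

Since rng.choice never removes anything from the bag, the same ballot can be drawn more than once, which is exactly what sampling with replacement means.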
Definition 10. Let P be a population. Then the random vector (i.e. ordered set of random
variables) (X1 , X2 , . . . , Xn ) is a random sample of size n from the population P if
An example to illustrate:
Let X1 , X2 , and X3 be independent random variables, each with the same distribution as
the population. That is, for each i = 1, 2, 3,
\[ P(X_i = 0) = \frac{14428}{23570} \quad\text{and}\quad P(X_i = 1) = \frac{9142}{23570}. \]
In this textbook, we’ll be very careful to distinguish between a random sample (which is
a vector of random variables) and an observed random sample (which is a vector of real
numbers).
This may be contrary to the practice of your teachers or indeed even the A-level exams.
Definition 11. Let S = (X1 , X2 , . . . , Xn ) be a random sample of size n. Then the corresponding
sample mean X̄ and sample variance S² of S are the random variables defined
by:
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}, \qquad S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}. \]
(The List of Formulae you get during exams will contain the observed sample variance.)
Note that strangely enough, the denominator of S² is n − 1, rather than n as one might
expect. As we’ll see later, there is a good reason for this.
By the way, there are two other formulae for calculating the sample variance:
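Fact 13. Let S = (X1 , X2 , . . . , Xn ) be a random sample of size n. Let X̄ be the sample
mean and S² be the sample variance. Let a ∈ R be a constant. Then
\[ \text{(a)}\quad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[\sum_{i=1}^{n} X_i\right]^2}{n}}{n - 1} \qquad\text{and}\qquad \text{(b)}\quad S^2 = \frac{\sum_{i=1}^{n} (X_i - a)^2 - \frac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2}{n}}{n - 1}. \]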
Proof. Omitted.
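Although the proof is omitted, it is easy to convince yourself numerically that the two formulae agree with the definition. The sketch below uses exact fractions; the function names and the shift a = 170 are our own choices, and the data are four illustrative heights (in cm).

```python
from fractions import Fraction

def s2_definition(xs):
    """Sample variance from the definition: sum of squared deviations over n - 1."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def s2_formula_a(xs):
    """Fact 13(a): uses only the sum and the sum of squares."""
    n = len(xs)
    return (sum(x * x for x in xs) - Fraction(sum(xs) ** 2, n)) / (n - 1)

def s2_formula_b(xs, a):
    """Fact 13(b): the same as (a), but applied to the shifted values x - a."""
    return s2_formula_a([x - a for x in xs])

heights = [178, 165, 182, 175]   # illustrative integer data
print(s2_definition(heights), s2_formula_a(heights), s2_formula_b(heights, 170))
```

All three printed values coincide. Formula (b) is handy in practice because subtracting a constant close to the data keeps the sums small.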
• The sample mean X̄ (a random variable) vs. the observed sample mean x̄ (a real
number).
• The sample variance S² (a random variable) vs. the observed sample variance s²
(a real number).
Example 205 (continued from above). Let (X1 , X2 , X3 ) be a random sample of size 3.
The corresponding sample mean X̄ and sample variance S² are these random variables:
\[ \bar{X} = \frac{X_1 + X_2 + X_3}{3}, \qquad S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2}{3 - 1}. \]
Suppose our observed random sample of size 3 is (1, 0, 0). Then the corresponding observed
sample mean x̄ and observed sample variance s² are these real numbers:
\[ \bar{x} = \frac{x_1 + x_2 + x_3}{n} = \frac{1 + 0 + 0}{3} = \frac{1}{3}, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{n - 1} = \frac{\left(1 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2}{3 - 1} = \frac{1}{3}. \]
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}, \qquad S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_5 - \bar{X})^2}{5 - 1}. \]
Suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the corresponding
observed sample mean x̄ and observed sample variance s² are these real numbers:
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{n} = \frac{0 + 1 + 0 + 0 + 1}{5} = \frac{2}{5} = 0.4, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2 + (x_5 - \bar{x})^2}{n - 1} = \frac{\left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2}{5 - 1} = 0.3. \]
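To stress that an observed sample mean and an observed sample variance are just real numbers computed from data, here is a small sketch that evaluates both for the two observed samples above. The function names are our own, and exact fractions are used so that no rounding creeps in.

```python
from fractions import Fraction

def observed_sample_mean(xs):
    """x-bar: the plain average of the observed values."""
    return Fraction(sum(xs), len(xs))

def observed_sample_variance(xs):
    """s^2: squared deviations from x-bar, divided by n - 1 (not n)."""
    xbar = observed_sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

for sample in [(1, 0, 0), (0, 1, 0, 0, 1)]:
    print(sample, observed_sample_mean(sample), observed_sample_variance(sample))
```

For (1, 0, 0) this gives a mean of 1/3 and a variance of 1/3. For (0, 1, 0, 0, 1) it gives a mean of 2/5 and a variance of 3/10.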
Example 205 (continued from above). It is the night of the election and polling has
just closed. We still do not know the true proportion µ that voted for Dr. Chee.
We decide to get a random sample of size 3: (X1 , X2 , X3 ). The corresponding sample mean
X̄3 = (X1 + X2 + X3 ) /3 shall be an estimator for µ. (Informally, an estimator is a method
for generating “guesses” for some unknown parameter, in this case µ.)
This estimator is used to generate estimates (“guesses”) for µ. For every observed
random sample, the estimator generates an estimate.
Suppose our observed random sample of size 3 is (1, 0, 0). We calculate the corresponding
observed sample mean to be x̄ = 1/3. We say that x̄ = 1/3 is an estimate for µ.
(By the way, unless we are extremely lucky, it is highly unlikely that the true value of the
unknown parameter µ is precisely 1/3. After all, 1/3 is merely an estimate obtained from
a single observed random sample of size 3.)
Suppose instead that our observed random sample of size 3 were (0, 1, 1). Then the corresponding
observed sample mean would be x̄ = 2/3. We’d instead say that x̄ = 2/3 is our
estimate for µ.
There is also more than one estimator we can use. For example, suppose instead that we
decide to get a random sample of size 5: (X1 , X2 , X3 , X4 , X5 ). We shall instead use the
corresponding sample mean X̄ = (X1 + X2 + X3 + X4 + X5 ) /5 as our estimator for µ. And
so for example, suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the
corresponding observed sample mean is x̄ = 0.4, and x̄ = 0.4 would be our estimate for µ.
Now, are these estimators and estimates “good” or “reliable”? How much should we
trust them? These are questions that we’ll address in the next section.
A different example:
Suppose our observed random sample is (h1 , h2 , h3 , h4 ) = (178, 165, 182, 175).
Thus, h̄ = 175 serves as an estimate (or “guess”) of the true average male height µ.
Again, are the estimator H̄ and estimate h̄ = 175 “good” or “reliable”? How much should
we trust them? These are questions that we’ll address in the next section.
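\[ \sum_{i=1}^{8} x_i = 1{,}320 \quad\text{and}\quad \sum_{i=1}^{8} x_i^2 = 218{,}360. \]
Then the observed sample mean x̄ and the observed sample variance s² are
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1320}{8} = 165, \]
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{218360 - \frac{1320^2}{8}}{7} = 80. \]
And our estimates for µ and σ² are, respectively, 165 cm and 80 cm².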
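\[ \sum_{i=1}^{8} (x_i - 160) = 72 \quad\text{and}\quad \sum_{i=1}^{8} (x_i - 160)^2 = 1{,}560. \]
Then the observed sample mean x̄ and the observed sample variance s² are
\[ \bar{x} = 160 + \frac{\sum_{i=1}^{n} (x_i - 160)}{n} = 160 + \frac{72}{8} = 169, \]
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - 160)^2 - \frac{\left[\sum_{i=1}^{n} (x_i - 160)\right]^2}{n}}{n - 1} = \frac{1560 - \frac{72^2}{8}}{7} \approx 130.3. \]
And our estimates for µ and σ² are, respectively, 169 cm and 130.3 cm².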
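Both calculations above follow the same recipe, which can be sketched as one small function. The name sample_stats_from_sums and its parameters are our own invention. The shift parameter plays the role of the constant a in Fact 13(b).

```python
def sample_stats_from_sums(n, total, total_sq, shift=0.0):
    """Observed sample mean and variance from n, sum(x - shift) and sum((x - shift)^2)."""
    mean = shift + total / n                           # undo the shift for the mean
    variance = (total_sq - total ** 2 / n) / (n - 1)   # a shift leaves the variance unchanged
    return mean, variance

print(sample_stats_from_sums(8, 1320, 218360))           # raw sums
print(sample_stats_from_sums(8, 72, 1560, shift=160))    # sums of (x - 160)
```

Note that the shift matters for the mean (we must add 160 back) but not for the variance, since moving every observation by the same amount does not change how spread out they are.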
Exercise 87. (Answer on p. 358.) Let X be the random variable that is the weight (in
kg) of an American. Suppose we are interested in estimating the true population mean µ
and variance σ² of X. We get an observed random sample of size 10: (x1 , x2 , . . . , x10 ).
(a) Suppose you are told that \(\sum_{i=1}^{10} x_i = 1{,}885\) and \(\sum_{i=1}^{10} x_i^2 = 378{,}265\). Find the observed sample
mean x̄ and observed sample variance s².
(b) Suppose you are instead told that \(\sum_{i=1}^{10} (x_i - 50) = 1{,}885\) and \(\sum_{i=1}^{10} (x_i - 50)^2 = 378{,}265\). Find
the observed sample mean x̄ and observed sample variance s².
Earlier we asked: How do we decide if an estimator and the estimates it generates are
“good”? How do we know whether to trust any given estimate?
For H1 Maths, we’ll learn only about one (important) criterion for deciding whether an
estimator is “good”. This is unbiasedness. Informally, an estimator is unbiased if on
average, the estimator “gets it right”. Formally:
Definition 12. Let X be a random variable and θ ∈ R be a parameter (i.e. just some real
number). We say that X is an unbiased estimator for θ if
E [X] = θ.
The next proposition says that the sample mean X̄ is an unbiased estimator for the
population mean µ; and the sample variance S² is an unbiased estimator for the
population variance σ².
Proof. You are asked to prove (a) in Exercise 89. The proof of (b) is omitted.
Proposition 3(b) is the reason why, strangely enough, we define the sample variance with
n − 1 in the denominator:
\[ S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}. \]
As defined, S² is an unbiased estimator for the population variance σ². This, then, is the
reason why we define it like this.
Some writers call S² the unbiased sample variance, but we shall not bother doing so. We’ll
simply call S² the sample variance.
Suppose two observed random samples of size 3 are (x1 , x2 , x3 ) = (1, 0, 0) and (x1 , x2 , x3 ) =
(1, 0, 1). The corresponding observed sample means are x̄1 = 1/3 and x̄2 = 2/3. These are
two possible estimates (“guesses”) of the true population proportion µ.
Unless we’re extremely lucky, it’s unlikely that either of these two estimates is exactly
correct. Nonetheless, what the above unbiasedness proposition tells us is this:
Suppose the unknown population mean is µ = 0.39. We draw the following 10 observed
random samples of size 3 (table below). For each sample i, we calculate the corresponding
observed sample mean x̄i .
Sample i   x1   x2   x3   x̄i
1          1    0    1    2/3
2          0    0    0    0
3          0    1    0    1/3
4          1    0    0    1/3
5          0    1    1    2/3
6          1    0    0    1/3
7          0    0    0    0
8          0    0    0    0
9          0    0    1    1/3
10         1    1    0    2/3
Note that every estimate x̄i is wrong. Indeed, since the sample mean X̄i can only take on
values 0, 1/3, 2/3, or 1, the estimates can never possibly be equal to the true µ = 0.39.
Nonetheless, what the above proposition says informally is that on average, the estimate
gets it correct. Formally, E [X̄] = µ = 0.39.
For a demonstration that you can play around with, try this Google spreadsheet.
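With samples of size 3 from a population of 0s and 1s, there are only 2³ = 8 possible observed samples, so the expected values can be computed exactly by listing them all. The sketch below does this under the assumption µ = 0.39 used above; the variable names are our own. It also checks the corresponding claim for the sample variance.

```python
from fractions import Fraction
from itertools import product

mu = Fraction(39, 100)        # population proportion assumed in the example
sigma2 = mu - mu ** 2         # population variance of a population of 0s and 1s
n = 3

expected_mean = Fraction(0)
expected_s2 = Fraction(0)
for sample in product((0, 1), repeat=n):          # all 2^3 possible observed samples
    prob = Fraction(1)
    for x in sample:                              # the draws are independent
        prob *= mu if x == 1 else 1 - mu
    xbar = Fraction(sum(sample), n)
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    expected_mean += prob * xbar                  # builds up the expected sample mean
    expected_s2 += prob * s2                      # builds up the expected sample variance

print(expected_mean == mu, expected_s2 == sigma2)
```

Both comparisons come out true: no single estimate equals 0.39, yet the probability-weighted average of all possible estimates is exactly 0.39.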
Exercise 89. Prove that E [X̄] = µ. (This is part (a) of Proposition 3). (Answer on p.
359.)
Exercise 90. Suppose we flip a coin 10 times. The first 7 flips are heads and the next 3
are tails. Let 1 denote heads and 0 denote tails. (Answer on p. 360.)
(a) Write down, in formal notation, our observed random sample, the observed sample
mean, and observed sample variance.
(b) Are these observed sample mean and variance unbiased estimates for the true population
mean and variance?
(c) Can we conclude that this is a biased coin (i.e. the true population mean is not 0.5)?
This section is just to repeat, stress, and emphasise that the sample mean X̄ is itself a
random variable. This is an important point.
Indeed, the sample mean X̄ is both (i) a random variable; and (ii) an estimator. In
contrast, an observed sample mean x̄ is both (i) a real number; and (ii) an estimate.
We’ve shown that E [X̄] = µ. This equation can be interpreted in two equivalent ways:
• The expected value of the sample mean equals the population mean µ.
• The sample mean is an unbiased estimator for the population mean µ.
We now give the variance of the sample mean. It turns out to be equal to the population
variance σ², divided by the sample size n.
Fact 14. V [X̄] = σ²/n.
Exercise 91. Prove Fact 14. (Hint: Note that X̄ = (1/n) (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) and X1 , X2 , . . . ,
Xn are independent.) (Answer on p. 360.)
Exercise 92. For each of the following terms, give a formal definition and an intuitive
explanation. (State whether each term is a random variable or a real number.) For simplicity,
you may assume that the finite population is given by P = (x1 , x2 , . . . , xk ). (Answer on p.
361.)
(a) The population mean.
(b) The population variance.
(c) The sample mean.
(d) The sample variance.
(e) The mean of the sample mean.
(f) The variance of the sample mean.
(g) The mean of the sample variance.
(h) The observed sample mean.
(i) The observed sample variance.
\[ \bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} \sim N\left(\mu, \frac{\sigma^2}{n}\right). \]
Proof. Corollary 2 tells us that the sum of normal random variables is itself a normal
random variable. So X1 + X2 + ⋅ ⋅ ⋅ + Xn is a normal random variable.
Fact 11 tells us that a linear transformation of a normal random variable is itself a normal
random variable. So X̄n = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is a normal random variable.
In the previous sections, we already showed that X̄n has mean µ and variance σ²/n.
Altogether then, X̄n ∼ N (µ, σ²/n).
In the next chapter, we’ll make greater use of the two results just given in this section.
Example 207. Suppose we’re interested in the average height of a Singaporean. The only
way to know this for sure is to survey every single Singaporean. This, however, is not
practical.
Instead, we have only the resources to survey 100 individuals. We decide to go to a basketball
court and measure the heights of 100 people there. We thereby gather an observed
sample of size 100: (x1 , x2 , . . . , x100 ). We find that the average individual’s height is
x̄ = ∑ xi /100 = 179 cm.
Is x̄ = 179 cm an unbiased estimate of the average height of a Singaporean? Intuitively, we
know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked a basketball
court, where the individuals are overwhelmingly (i) male; and (ii) taller than average. Our
estimate x̄ = 179 cm is thus probably biased upwards.
Example 208. Suppose we’re interested in what the average Singaporean family spends
on food each month. The only way to know this for sure is to survey every single family in
Singapore. This, however, is not practical.
Instead, we have only the resources to survey 100 families. We decide to go to Sixth
Avenue and randomly ask 100 families living there what they reckon they spend on food
each month. We thereby gather an observed sample of size 100: (x1 , x2 , . . . , x100 ). We find
that the average family spends x̄ = ∑ xi /100 = $2,700 on food each month.
Is x̄ = $2,700 an unbiased estimate of the average monthly spending on food by a Singaporean
family? Intuitively, we know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked an unusually
affluent neighbourhood. Our estimate x̄ = $2,700 is thus probably biased upwards.
Here’s a quick sketch of how Null Hypothesis Significance Testing (NHST) works:
Example 209. A piece of equipment has probability θ of breaking down. We have many
pieces of the same type of equipment. Assume the rates of breakdown across the pieces of
equipment are identical and independent.
4. Write down a test statistic. In this case, an obvious test statistic is the sample number
of failures T = X1 + X2 + X3 + X4 + X5 . Our observed test statistic is thus t = x1 + x2 +
x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
5. Now ask, how likely is it that — if H0 were true — our test statistic would have been
“at least as extreme as” that actually observed? That is, what is the probability
In this case, the p-value is the probability of observing a random sample where 1 or fewer
pieces of equipment broke down, assuming H0 ∶ θ = 0.6 were true. That is,
p = P (T ≤ t = 1∣H0 ) .
Now, remember that T is a random variable. In fact, it’s a binomial random variable.
Assuming H0 to be true, we have T ∼ B (n, θ) = B (5, 0.6). Thus,
\[ p = P(T \le 1 \mid H_0) = P(T = 0 \mid H_0) + P(T = 1 \mid H_0) = \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 = 0.08704. \]
This says that if H0 were true, then the probability of observing a test statistic as extreme
as the one we actually observed is only 0.08704. We might interpret this relatively small
p-value as casting doubt on or providing evidence against H0 .
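The p-value calculation above is short enough to check by computer. Here is a minimal sketch, in which the helper binomial_pmf and the variable names are our own:

```python
from math import comb

def binomial_pmf(k, n, theta):
    """P(T = k) for T ~ B(n, theta)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

n, theta0 = 5, 0.6     # sample size and the value of theta under H0
t = 1                  # observed number of breakdowns

# One-tailed p-value: probability of t or fewer breakdowns, assuming H0 is true.
p_value = sum(binomial_pmf(k, n, theta0) for k in range(t + 1))
print(round(p_value, 5))
```

Changing t or theta0 lets you see how quickly the p-value grows or shrinks. For instance, t = 2 would give a much larger p-value and hence much weaker evidence against H0.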
1. Null hypothesis H0 (e.g. “this equipment has probability 0.6 of breaking down”).
2. Alternative hypothesis HA (e.g. “this equipment has probability less than 0.6 of
breaking down”). The test is either one-tailed or two-tailed, depending on HA .
4. A test statistic T (which simply maps each observed random sample to a real number).
5. The p-value of the observed sample. This is the probability that — assuming H0 were
true — T takes on values that are at least “as extreme as” the actual observed test
statistic t.
In particular, if p < α, then we say that we reject H0 at the significance level α. And
if p ≥ α, then we say that we fail to reject H0 at the significance level α.
Note importantly that to reject H0 (at some significance level α) does NOT mean that H0
is false and HA is true. Similarly, failure to reject H0 does NOT mean that H0 is true and
HA is false. More on this below.
Another example of NHST, now slightly more formally and carefully presented.
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
We pre-select α = 0.05 as our significance level. This is the arbitrary threshold at which
we’ll say we reject (or fail to reject) H0 .
We gather a random sample of 100 votes: (X1 , X2 , . . . , X100 ). Our test statistic is the
number of votes in favour of Dr. Chee, given by
T = X1 + X2 + ⋅ ⋅ ⋅ + X100 .
Suppose that in our observed random sample (x1 , x2 , . . . , x100 ), we find that 39 are in favour
of Dr. Chee. Our observed test statistic is thus t = 39.
We now ask: What is the probability that — assuming H0 were true — T takes on values
that are at least “as extreme as” the actual observed test statistic t? That is, what is the
p-value of the observed sample?
Now, assuming H0 were true, T is a binomial random variable with parameters 100 and
0.3. That is, T ∼ B (n, p) = B (100, 0.3). So the p-value is P (T ≥ 39∣H0 ) ≈ 0.03398 (calculator).
And since p ≈ 0.03398 < α = 0.05, we can also say that we reject H0 at the α = 0.05
significance level.
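If you don't have a graphing calculator at hand, the p-value in this example can be computed by summing binomial probabilities directly. The following sketch assumes, as in the example, that T ∼ B (100, 0.3) under H0 and that "at least as extreme" means T ≥ 39. The function name is our own.

```python
from math import comb

def binomial_upper_tail(t, n, p0):
    """P(T >= t) for T ~ B(n, p0), summed exactly term by term."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(t, n + 1))

alpha = 0.05
p_value = binomial_upper_tail(39, 100, 0.3)   # H0: mu = 0.3, observed t = 39
print(round(p_value, 5), "reject H0" if p_value < alpha else "fail to reject H0")
```

Remember that the printed verdict means only that p < α (or p ≥ α). It says nothing about whether H0 is actually true.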
Let θ be the parameter we’re interested in. Under the objectivist interpretation, the
value of θ may be unknown, but it is fixed.
This has two consequences:
When performing NHST, we will assiduously avoid saying things like “H0 is true”, “H0 is
false”, “HA is true”, or “HA is false”. Instead, we will stick strictly to saying either “we
reject H0 at the significance level α” or “we fail to reject H0 at the significance level α”.
Each of these two statements has a very precise meaning. The first says that p < α. The
second says that p ≥ α. Nothing more and nothing less.
Exercise 93. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level,
whether the coin is biased towards heads. (Answer on p. 362.)
In the previous section, all the NHST we did were one-tailed tests.18 For example, in the
NHST done for Dr. Chee, we had
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
This was a one-tailed test because the alternative hypothesis HA was that µ was to the
right of 0.3.
Suppose instead that we changed the alternative hypothesis, so that we have:
H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.
This would be called a two-tailed test, because the alternative hypothesis HA is that
µ is either to the left or to the right of 0.3.
We now repeat the examples done in the previous section, but with HA tweaked so that we
instead have two-tailed tests. The difference is that the p-value is calculated differently.
18 By the way, the more common convention is to say “one-tailed” and “two-tailed” tests, rather than “one-tail” and
“two-tail” tests, as is the norm in Singapore (similar to those “Close for break” signs you sometimes see). But after some
consultation with my grammatical experts, I have been told that both are equally correct.
H0 ∶ θ = 0.6,
HA ∶ θ ≠ 0.6.
Say we observe the same random sample as before: (x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 1, 0).
The difference now is how the p-value (of the observed sample) is calculated. In words, the
p-value gives the likelihood that our test statistic is “at least as extreme as” that actually
observed — assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≤ t = 1.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean both
the event T ≤ t = 1 and the event that T is as far away on the other side of E [T ∣H0 ] = 3.
The second event is, specifically, T ≥ 5. Altogether then, the p-value is given by
p = P (T ≤ 1 or T ≥ 5∣H0 ) = P (T ≤ 1∣H0 ) + P (T ≥ 5∣H0 ) = 0.08704 + 0.07776 = 0.1648.
Since p = 0.1648 ≥ α = 0.1, we say that we fail to reject H0 at the α = 0.1 significance
level.
Observe that previously, under the one-tailed test, we could reject H0 at the α = 0.1
significance level, because there p = 0.08704. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
In general, all else equal, the p-value for an observed random sample is greater under a
two-tailed test than under a one-tailed test. Thus, under a two-tailed test, we are less
likely to reject H0 .
H0 ∶µ = 0.3,
HA ∶µ ≠ 0.3.
Say we observe the same random sample as before: (x1 , x2 , . . . , x100 ), in which 39 votes were
in favour of Dr. Chee. So again our observed test statistic is t = x1 + x2 + ⋅ ⋅ ⋅ + x100 = 39.
The difference now is how the p-value (of the observed sample) is calculated. In words, the
p-value gives the likelihood that our test statistic is “at least as extreme as” that actually
observed — assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≥ t = 39.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean both
the event T ≥ t = 39 and the event that T is as far away on the other side of E [T ∣H0 ] = 30.
The second event is, specifically, T ≤ 21. Altogether then, the p-value is given by
p = P (T ≥ 39 or T ≤ 21∣H0 ) ≈ 0.06281 (calculator).
Since p = 0.06281 ≥ α = 0.05, we say that we fail to reject H0 at the α = 0.05 significance
level.
Again observe that previously, under the one-tailed test, we could reject H0 at the α = 0.05
significance level, because there p = 0.03398. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
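Both two-tailed p-values in this section can be reproduced with one small function. The sketch below follows the interpretation used above: we add up the probabilities of every value of T that is at least as far from E [T ∣H0 ] as the observed t. The function names are our own, and other textbooks sometimes define two-tailed p-values differently.

```python
from math import comb

def pmf(k, n, p0):
    """P(T = k) for T ~ B(n, p0)."""
    return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)

def two_tailed_p_value(t, n, p0):
    """Sum P(T = k) over every k at least as far from E[T] = n * p0 as t is."""
    centre = n * p0
    distance = abs(t - centre)
    # The small tolerance guards against floating-point rounding in n * p0.
    return sum(pmf(k, n, p0) for k in range(n + 1) if abs(k - centre) >= distance - 1e-9)

p_equipment = two_tailed_p_value(1, 5, 0.6)     # tails: T <= 1 and T >= 5
p_votes = two_tailed_p_value(39, 100, 0.3)      # tails: T <= 21 and T >= 39
print(round(p_equipment, 4), round(p_votes, 5))
```

In both cases the two-tailed p-value is larger than the one-tailed p-value, because a second tail has been added on.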
Exercise 94. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level,
whether the coin is biased. (Answer on p. 362.)
However, NHST is widely misunderstood, misinterpreted, and misused even within scientific
communities. It has long been heavily criticised. In March 2016, the American Statistical
Association even issued an official policy statement on how NHST should be used!
p = P (D∣H0 ) ,
where D stands for the observed data and H0 stands for the null hypothesis. The p-value
answers the following question: assuming H0 were true, what’s the probability that we’d
get data “at least as extreme” as those actually observed (D)?
Say we get a p-value of 0.03. We should then say simply that
However, instead of merely saying the above, some researchers may instead conclude that:
Do you see the error here? The researcher has gone from the finding that p = P (D∣H0 ) = 0.03
to the conclusion that P (H0 ∣D) = 0.03.
The error is the same as leaping from “A lottery ticket buyer who doesn’t cheat has a small
probability q of winning” to “Jane bought a lottery ticket and won. Therefore, there is only
probability q that she didn’t cheat.”
The p-value is NOT the probability that H0 is true.19 Instead, it is the probability that
— assuming H0 were true — we would have gotten data “at least as extreme” as those
actually observed. This is an important difference. But it is also a subtle one, which is why
even researchers get confused.
19
Indeed, under the objectivist view, such a statement is nonsensical anyway, because H0 is either true or not true; it makes
no sense to talk probabilistically about whether H0 is true.
Example 210. On the night of the 2016 Bukit Batok SMC By-Election, the Elections
Department announced* that based on a sample count of 900 ballots,
What does the above gobbledygook mean? Let µ be the true proportion of votes won by
Dr. Chee. Let X̄ be the sample proportion and x̄ be the observed sample proportion.
It’s clear enough what the 39% means — they randomly counted 900 ballots and found
(after accounting for any spoilt votes) that x̄ = 39% were in favour of Dr. Chee.
What’s less clear is what the 95% confidence level and ±4% margin of error mean.
Here are three possible interpretations of what is meant. Only one is correct.
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between 0.35 and 0.43.
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between µ − 0.04 and µ + 0.04.
Take a moment to understand what each of the above interpretations says. Then decide
which you think is the correct interpretation, before turning to the next page.
Unfortunately, the correct interpretation is also the one that says the least. It is
Interpretation #3: “with probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04)”.
This interpretation says merely that if we were somehow able to repeatedly observe random
samples of size 900, then we’d find that 0.95 of the corresponding observed sample means
will be in (µ − 0.04, µ + 0.04). Which isn’t saying much, because first of all, we have only one
observed random sample; we do not get to repeatedly observe random samples. Secondly,
this still doesn’t tell us much about µ, which is what we’re really interested in.
The correct interpretation (Interpretation #3) is the least interesting interpretation.
Perhaps this explains why journalists often prefer to give an incorrect interpretation.
*E.g. the article “Margin of Ignorance” (backup) begins by reporting poll results that Kerry-Edwards was supported by 51%
of voters, while Bush-Cheney was supported by 45%. The author then ridicules other journalists for their misinterpretation
of these data. (He also claims, incorrectly, that polling is based on the Central Limit Theorem.) He then triumphantly
gives the “correct” explanation: “95 times out of 100 the true Kerry-Edwards number will fall between 47 and 55 and the
Bush-Cheney number will fall between 41 and 49.” This, of course, is what we called incorrect Interpretation #1 above.
For a discussion of where the Elections Department’s ±4% margin of error comes from,
please see the Appendices of my H2 Mathematics Textbook.
Example 211. On the night of the 2016 Bukit Batok SMC By-Election, a website called
Mothership.sg wrote:
“Based on the sample count of 100 votes,* it was revealed at 9.26pm that the SDP Sec-Gen
received 39 percent of votes. In other words, Chee would score 35 per cent in the worst
case scenario and 43 per cent in the best case scenario.”
This is the most absurd misinterpretation of the margin of error I have ever seen.**
Let’s see what the correct worst- and best-case scenarios are.
Suppose that in the observed random sample of 900 votes, exactly 39% or 0.39 × 900 = 351
were votes for Dr. Chee and the remaining 549 were for PAP Guy. Then:
• Worst-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of Dr. Chee. That is, Dr. Chee won only 351 votes
and PAP Guy won the remaining 23, 570 − 351 = 23, 219 votes. So the correct worst-case
scenario is that Dr. Chee won ≈ 1.5% of the votes.
• Best-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of PAP Guy. That is, PAP Guy won only 549 votes
and Dr. Chee won the remaining 23,570 − 549 = 23,021 votes. So the correct best-case
scenario is that Dr. Chee won ≈ 97.7% of the votes.
These worst- and best-case scenarios are admittedly unlikely. Nonetheless, they are pos-
sible scenarios all the same. The journalist’s purported worst- and best-case scenarios are
completely wrong.
*By the way, even this basic fact was wrong. The sample count was not 100 votes. Instead, it was 900 votes, consisting of
100 votes from each of 9 polling stations.
Moreover, the Mothership.sg journalist failed to report the confidence level of 95%, either because he didn’t know what it
meant or because he didn’t think it important. But it is important. It is pointless to inform the reader about the margin of
error without also specifying the confidence level.
**You can find several misinterpretations of the margin of error collected in this academic paper: “Erring in the Margin of
Error”. None is as absurdly bad as the one here.
Informally, the critical region is the set of values of the observed test statistic t for which
we would reject the null hypothesis. The critical region is thus sometimes also called the
rejection region.
And the critical value(s) is (are) the exact value(s) of the observed test statistic t at
which we are just able to reject the null hypothesis.
Say that as before, we have a one-tailed test where the two competing hypotheses are:
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
Say that as before, in our observed random sample of 100 votes, 39 are in favour of Dr.
Chee, so that our observed test statistic is t = 39.
We calculated that the corresponding p-value is 0.03398 and so we were able to reject H0
at the α = 0.05 significance level.
We now calculate the critical region and the critical value. We can calculate that if
t = 38, then the corresponding p-value is ≈ 0.053 (you should verify this for yourself). And
so we would be unable to reject H0 .
We thus conclude that the critical value is 39, because this is the value of t at which we
are just able to reject H0 .
And the critical region is the set {39, 40, 41, . . . , 100}. These are the values at which we’d
be able to reject H0 at the α = 0.05 significance level.
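The search for the critical value can be sketched in a few lines of Python. This is only an illustration: it assumes that the test statistic T is binomially distributed with n = 100 and success probability 0.3 under H0, and the function name `upper_tail` is made up for this sketch.

```python
from math import comb

def upper_tail(t, n=100, p0=0.3):
    """One-tailed p-value P(T >= t), assuming T ~ Binomial(n, p0) under H0."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(t, n + 1))

alpha = 0.05

# The critical region is every value of t whose p-value is at most alpha.
critical_region = [t for t in range(101) if upper_tail(t) <= alpha]

# The critical value is the smallest such t.
critical_value = min(critical_region)

print(critical_value)
print(round(upper_tail(39), 5))  # p-value at t = 39
print(round(upper_tail(38), 3))  # p-value at t = 38
```

Because the upper-tail probability only shrinks as t grows, the critical region is always a block of the form {c, c + 1, ..., 100}.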
Say that as before, we have a two-tailed test where the two competing hypotheses are:
H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.
The significance level is again α = 0.05. Again, the observed random sample of 100 votes
contains 39 in favour of Dr. Chee, so that our observed test statistic is t = 39.
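We calculate that if t = 40, then the corresponding p-value is ≈ 0.03745, whereas if
t = 39, the p-value exceeds 0.05 (you should verify this for yourself). By symmetry about
30 (the expected number of votes if H0 were true), the same holds for t = 20 and t = 21.
Thus, the critical values are 20 and 40, because these are the values of t
at which we are just able to reject H0 .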
The critical region is the set {0, 1, . . . , 20, 40, 41, . . . , 100}. These are the values at which
we’d be able to reject H0 at the α = 0.05 significance level.
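The two-tailed critical values can be checked the same way. The sketch below assumes (this is an interpretation, not something stated explicitly above) that "at least as extreme" means "at least as far from the expected count of 30", and that T is binomial with n = 100 and success probability 0.3 under H0. All names are illustrative.

```python
from math import comb

N, P0, ALPHA = 100, 0.3, 0.05
MEAN = 30  # expected number of votes under H0 (N * P0)

def pmf(k):
    """P(T = k) when T ~ Binomial(N, P0)."""
    return comb(N, k) * P0**k * (1 - P0)**(N - k)

def two_tailed_p(t):
    """Probability of a count at least as far from MEAN as t is."""
    distance = abs(t - MEAN)
    return sum(pmf(k) for k in range(N + 1) if abs(k - MEAN) >= distance)

critical_region = [t for t in range(N + 1) if two_tailed_p(t) <= ALPHA]
lower_critical = max(t for t in critical_region if t < MEAN)
upper_critical = min(t for t in critical_region if t > MEAN)

print(lower_critical, upper_critical)
print(round(two_tailed_p(40), 5))  # p-value at t = 40
```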
Exercise 95. (Answer on p. 363.) We flip a coin 20 times. What are the critical region
and critical value(s) in
(a) A test, at the 5% significance level, of whether the coin is biased towards heads.
(b) A test, at the 5% significance level, of whether the coin is biased.
Example 212. The weight (in mg) of a grain of sand is X ∼ N (µ, 9). Our unknown
parameter of interest is the true population mean µ (i.e. the true average weight of a grain
of sand). Our “guess” is that µ = 5. We thus write down two competing hypotheses:
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
We take a random sample of size 4 — (X1 , X2 , X3 , X4 ). Our test statistic is the sample
mean X̄ = (X1 + X2 + X3 + X4 ) /4.
Our observed random sample is (x1 , x2 , x3 , x4 ) = (3, 9, 11, 7). That is, we randomly pick
four grains of sand that happen to have weights 3, 9, 11, and 7 mg. Then the observed test
statistic is
$$\bar{x} = \frac{3 + 9 + 11 + 7}{4} = 7.5.$$
The p-value is the probability that the test statistic X̄ takes on values “at least as extreme
as” our observed test statistic x̄ = 7.5, assuming H0 ∶ µ = 5 were true. Note that if H0 were
true, then X̄ ∼ N (µ, σ 2 /n) = N (5, 9/4). Thus, the p-value is given by
$$p = P(\bar{X} \ge 7.5) + P(\bar{X} \le 2.5) = P\left(Z \ge \frac{7.5 - 5}{\sqrt{9/4}}\right) + P\left(Z \le \frac{2.5 - 5}{\sqrt{9/4}}\right) \approx 0.04779 + 0.04779 = 0.09558.$$
Thus, we reject H0 at the α = 0.1 significance level. However, we would fail to reject H0 at
the α = 0.05 significance level.
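The same p-value can be computed with Python's standard library. This is a minimal sketch: the function name `two_tailed_z_test` is made up here, and the population variance 9 is taken as known, as in the example.

```python
from math import sqrt
from statistics import NormalDist, mean

def two_tailed_z_test(sample_mean, mu0, variance, n):
    """Two-tailed p-value for H0: mu = mu0, with known population variance."""
    z = (sample_mean - mu0) / sqrt(variance / n)
    # Probability that a standard normal is at least |z| away from 0.
    return 2 * (1 - NormalDist().cdf(abs(z)))

weights = [3, 9, 11, 7]  # observed weights in mg
p_value = two_tailed_z_test(mean(weights), mu0=5, variance=9, n=len(weights))

print(round(p_value, 5))
print(p_value <= 0.1, p_value <= 0.05)  # reject at 10%? at 5%?
```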
Sample size   Distribution of X   Variance σ²   Test
Any           Normal              Known         Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large         Any                 Known         Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large         Any                 Unknown       Z-test: (X̄ − µ)/(s/√n) ∼ N(0, 1).
Exercise 96. The Singapore daily high temperature (in °C) can be modelled by X ∼
N (µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the
true average daily high temperature). Your friend guesses that µ = 34. You gather
the following data on daily high temperatures, of 10 randomly-chosen days in 2015:
(35, 35, 31, 32, 33, 34, 31, 34, 35, 34). Test your friend’s hypothesis, at the α = 0.05 signifi-
cance level. (Be sure to write down your null and alternative hypotheses.) (Answer on p.
364.)
We’ll recycle the same example from the previous section. Before, we knew that X was
normally distributed. Now the big difference is that we have absolutely no idea what
distribution X comes from!
To compensate, we require also that our random sample is “large enough”, so that the
CLT-approximation can be used.
Example 213. The weight (in mg) of a grain of sand is X ∼ (µ, 9). (This says simply that
X is distributed with mean µ and variance 9.) Our unknown parameter of interest is the
true population mean µ (i.e. the true average weight of a grain of sand). Again, we “guess”
that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
This time, we’ll take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test
statistic is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Recall the magic of the CLT. Even if we have absolutely no idea what distribution X
is drawn from, then provided n is sufficiently large, X̄ is approximately normally distributed.
So here, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately,
the normal distribution N (µ, σ²/n). So, if H0 were true, then we have, approximately,
X̄ ∼ N (µ, σ²/n) = N (5, 9/100).
Say the observed sample mean is
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6.$$
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by
$$p = P(\bar{X} \ge 5.6) + P(\bar{X} \le 4.4) = P(Z \ge 2) + P(Z \le -2) \approx 0.0455.$$
Exercise 97. The Singapore daily high temperature (in °C) can be modelled by X ∼ (µ, 8).
Our unknown parameter of interest is the true population mean µ (i.e. the true average
daily high temperature). Your friend guesses that µ = 34. You gather the data on daily high
temperatures, of 100 randomly-chosen days in 2015 and find the observed sample average
temperature to be 33.4 °C. Test your friend’s hypothesis, at the α = 0.05 significance level.
(Be sure to write down your null and alternative hypotheses. Also, clearly state where you
use the CLT.) (Answer on p. 364.)
We’ll recycle the same example from the previous section. Again, we have absolutely no
idea what distribution X comes from. And again, the random sample is large enough, so
that the CLT can be used.
But now, σ 2 is unknown. This turns out to be no big deal. We can simply replace σ 2
with the observed unbiased sample variance s2 , and do the same thing as before.
Example 214. The weight (in mg) of a grain of sand is X ∼ (µ, σ 2 ). (This says simply
that X is distributed with mean µ and variance σ 2 .) Our unknown parameter of interest
is the true population mean µ (i.e. the true average weight of a grain of sand). Again, we
“guess” that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
Again, we take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test statistic
is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Again, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately,
the normal distribution N (µ, σ²/n). So, if H0 were true, then we have, approximately,
X̄ ∼ N (µ, σ²/n) = N (5, σ²/100). Since the sample variance S² is an unbiased estimator for
σ², it is plausible that we also have, approximately, X̄ ∼ N (5, s²/100), where
s² is the observed sample variance.
Say the observed sample mean and observed sample variance we get are:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6 \quad\text{and}\quad s^2 = \frac{\sum_{i=1}^{100} (x_i - \bar{x})^2}{n - 1} = 8.$$
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by
$$p = P(\bar{X} \ge 5.6) + P(\bar{X} \le 4.4) = P\left(Z \ge \frac{5.6 - 5}{\sqrt{8/100}}\right) + P\left(Z \le \frac{4.4 - 5}{\sqrt{8/100}}\right) \approx 0.0339.$$
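To see that the unknown-variance case really is "the same thing as before", here is a short sketch that runs the large-sample calculation twice: once with the known variance 9 from the previous example, and once with the observed sample variance 8 from this one. The helper name `large_sample_p_value` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_p_value(x_bar, mu0, variance, n):
    """Two-tailed p-value using the CLT approximation X-bar ~ N(mu0, variance/n)."""
    z = (x_bar - mu0) / sqrt(variance / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Known variance (sigma^2 = 9), as in the previous example.
p_known = large_sample_p_value(5.6, mu0=5, variance=9, n=100)

# Unknown variance: plug in the observed sample variance s^2 = 8 instead.
p_unknown = large_sample_p_value(5.6, mu0=5, variance=8, n=100)

print(round(p_known, 4), round(p_unknown, 4))
```

The only change between the two calls is which variance is plugged in.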
Exercise 98. The Singapore daily high temperature (in °C) can be modelled by X ∼
(µ, σ 2 ). Our unknown parameter of interest is the true population mean µ (i.e. the true
average daily high temperature). Your friend guesses that µ = 34. You gather the data
on daily high temperatures, of 100 randomly-chosen days in 2015. Your observed sample
mean temperature is 33.4 °C and your observed sample variance is 11.2 °C². Test your
friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and
alternative hypotheses. Also, clearly state where you use the CLT.) (Answer on p. 365.)
Example 215. We flip a coin 100 times. We get 100 heads. What can we say about the
coin?
This is an open-ended question, to which there can be many different answers. Here’s the
answer we’re taught to give for H2 Maths:
H0 ∶ µ = 0.5,
HA ∶ µ ≠ 0.5.
Our test statistic T is the number of heads (out of 100 coin-flips). Our observed test
statistic t is 100. The corresponding p-value (note that this is a two-tailed test) is
$$p = P(T \ge 100) + P(T \le 0) = 0.5^{100} + 0.5^{100} \approx 1.58 \times 10^{-30}.$$
We note also that we can easily reject H0 at any of the conventional significance levels
(α = 0.1, α = 0.05, or α = 0.01).
Exercise 99. (Answer on p. 365.) We observe the weights (in kg) of a random sample of
50 Singaporeans: (x1 , x2 , . . . , x50 ). We observe that ∑ xi /50 = 68 and ∑ xi² /50 = 5000.
A friend claims that the average American is heavier than the average Singaporean. It
is known that the average American weighs 75 kg. Is your friend correct? If you make
any assumptions or approximations, make clear exactly where you do so. (Hint: Use Fact
13(a)).
In this chapter, we’ll be interested in the relationship between two sets of data.
Example 216. We measure the heights and weights of 10 adult male Singaporeans. Their
heights (in cm) and weights (in kg) are given in this table:
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
We call (hi , wi ) observation i. So for example, observation 5 is (178, 72) and observation
9 is (150, 44).
We can plot a scatter diagram of these 10 persons’ weights (vertical axis) against their
heights (horizontal).
[Figure: scatter diagram of weight (kg) against height (cm), with the line of best fit drawn as a black dotted line.]
The black dotted line is called a line of best fit. Shortly (section 57.4), we’ll learn how
to construct this line of best fit.
The more closely the data points in the above scatter diagram lie to a straight line, the more
strongly linearly-correlated are weight and height. So here with these particular data,
the linear correlation between weight and height seems strong. In the next section, we’ll
learn about the product moment correlation coefficient, which is a way to precisely
quantify the degree to which two sets of data are linearly-correlated.
Because the line of best fit is upward-sloping, we can also say that the linear correlation is
positive.
i         1     2     3     4    ...  361
ti (°C)   27.3  29.5  31.1  32   ...  30.2
pi (mm)   0     0.2   0     0    ...  12.4
[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with the line of best fit drawn as a black dotted line.]
Again, the black dotted line is a line of best fit. The data points do not seem close to this
line. Thus, it seems that the linear correlation between temperature and rainfall is weak.
The line of best fit is downward-sloping and so we say that the linear correlation is negative.
Exercise 100. (Answer on p. 366.) The table below shows the prices charged (p) and the
number of haircuts (q) given by 5 different barbers, during June 2016.
Draw a scatter diagram with price on the horizontal axis. Plot also what you think looks
like a line of best fit.
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
In the previous section, we used a scatter diagram to determine if there was a plausible
linear relationship between two sets of data. This, though, was a very crude method.
A more precise measure of the degree to which two sets of data are linearly correlated is
called the product moment correlation coefficient (PMCC). Formally:
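Definition 13. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of real numbers.
Then their product moment correlation coefficient (PMCC), denoted r, is the real number
defined by
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$
The PMCC has the following properties: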
1. −1 ≤ r ≤ 1.
2. We say the linear correlation is positive if r > 0 and negative if r < 0.
3. If r = 1, then the data points lie exactly on an upward-sloping line (and we say the linear
correlation is perfect).
5. If r is close to 1, then the data points lie close to an upward-sloping line (and we say the
linear correlation is very strong).
6. If r is close to −1, then the data points lie close to a downward-sloping line (and we say
the linear correlation is very strong).
8. r is merely a measure of linear correlation and nothing else. Two variables may be very
closely related but not linearly-correlated. For example, data generated by the quadratic
model yi = xi² may have a very low r.
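To make the last property concrete, here is a small worked example. The five data points are made up for illustration: take x = (−2, −1, 0, 1, 2) and y_i = x_i², so that y is completely determined by x.

```latex
\begin{align*}
x &= (-2, -1, 0, 1, 2), \qquad y = (4, 1, 0, 1, 4), \qquad \bar{x} = 0, \qquad \bar{y} = 2, \\
\sum_{i=1}^{5} (x_i - \bar{x})(y_i - \bar{y})
  &= (-2)(2) + (-1)(-1) + (0)(-2) + (1)(-1) + (2)(2) \\
  &= -4 + 1 + 0 - 1 + 4 = 0.
\end{align*}
```

The numerator of r is zero, so r = 0, even though the relationship between x and y is exact. The PMCC only picks up the linear part of a relationship.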
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
[Figure: scatter diagram of weight (kg) against height (cm), with the line of best fit.]
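$$\bar{h} = \frac{182 + 165 + 173 + 155 + 178 + 174 + 169 + 160 + 150 + 190}{10} = 169.6,$$
$$\bar{w} = \frac{81 + 70 + 71 + 53 + 72 + 75 + 69 + 60 + 44 + 80}{10} = 67.5,$$
$$\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) = (182 - \bar{h})(81 - \bar{w}) + \dots + (190 - \bar{h})(80 - \bar{w}) = 1237,$$
$$\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2} = \sqrt{(182 - 169.6)^2 + \dots + (190 - 169.6)^2} \approx 37.180640,$$
$$\sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2} = \sqrt{(81 - 67.5)^2 + \dots + (80 - 67.5)^2} \approx 35.418922.$$
Thus,
$$r \approx \frac{1237}{37.180640 \times 35.418922} \approx 0.9393.$$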
As expected, r > 0 (the linear correlation is positive or, equivalently, the line of best fit is
upward-sloping). Moreover, r is close to 1 (the linear correlation is very strong).
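The calculation above is easy to reproduce by computer. The following is a minimal sketch that applies the PMCC formula directly to the height and weight data. The function name `pmcc` is just a label chosen here.

```python
from math import sqrt

heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

r = pmcc(heights, weights)
print(round(r, 4))
```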
i         1     2     3     4    ...  361
ti (°C)   27.3  29.5  31.1  32   ...  30.2
pi (mm)   0     0.2   0     0    ...  12.4
[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with the line of best fit.]
The PMCC is r ≈ −0.1623.
As expected, r < 0 (the linear correlation is negative or, equivalently, the line of best fit is
downward-sloping). Moreover, r is fairly close to 0 (the linear correlation is weak).
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
Correlation does not imply causation. This saying has now become a cliché. Doesn’t make
it any less true.
[Figure: chart comparing spending on science (in billions of dollars) with the number of hanging suicides.]
The PMCC is r ≈ 0.99789126. So the two sets of data are almost perfectly linearly-
correlated. But of course, this doesn’t mean that spending on science causes suicides
or that suicides cause spending on science. More likely, the correlation is simply spurious.
Example 216 (continued from above). We suspect that the heights and weights of adult
male Singaporeans are linearly-correlated. We thus write down this linear model:
w = a + bh.
Recall the quote: “All models are wrong, but some are useful.” The model w = a + bh is
unlikely to be exactly correct. But hopefully it will be useful.
We recycle the data from earlier. These, along with the scatter diagram, are reproduced
for convenience.
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
[Figure: scatter diagram of weight (kg) against height (cm), with three candidate lines of best fit.]
The basic idea of linear regression is this: Find the line that “best fits” the given data.
Drawn in the figure above are three plausible candidates for the “line of best fit”. But there
can only be one line of best fit. Which is it?
At the end of the day, we’ll choose the black dotted line as “the” line of best fit. But why?
This will be answered in the next section.
p = a + bt.
Again, our goal is to get estimates for the unknown parameters a and b (do you expect b
to be positive or negative?).
i         1     2     3     4    ...  361
ti (°C)   27.3  29.5  31.1  32   ...  30.2
pi (mm)   0     0.2   0     0    ...  12.4
[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with several candidate lines of best fit.]
Again, drawn in the figure above are several plausible candidates for the “line of best fit”.
It turns out that the black dotted line will be “the” line of best fit.
There are different methods for determining “the” line of best fit. Different methods can
give different lines of best fit.
The method we’ll learn in H2 Maths is the most basic and most standard method. It is
called the method of ordinary least squares (OLS).
Let’s assume there is some true linear model, which may be written as y = a+bx. As always,
we stick to the objectivist interpretation. The parameters a and b have some true, fixed
values. However, they are unknown (and may forever be unknown).
Nonetheless, we’ll try to do our best and get estimates for a and b. These estimates will be
denoted â and b̂. And our line of best fit will then be y = â + b̂x.
How do we find this line of best fit? Intuitively, this will be the line to which the data
points are “as close as possible”. But there are many ways to define the term “as close
as possible”. For example, we could try to minimise the sum of the distances between the
points and the line. But we shall not do this.
Instead, we’ll use the method of OLS:
1. Measure the (signed) vertical distance of each data point (xi , yi ) from the line, that is,
yi − (â + b̂xi ). This is called the residual and is denoted ûi .
2. Our goal is to find the line y = â + b̂x that minimises ∑ ûi². This quantity is called the
Sum of Squared Residuals (SSR).
Example:
[Figure: scatter diagram of weight (kg) against height (cm), with a candidate line of best fit and the residuals marked.]
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 65 65 65 65 65 65 65 65 65 65
ûi = wi − ŵi (kg) 16 5 6 −12 7 10 4 −5 −21 15
The second last row of the above table gives, for each person with height hi , the correspond-
ing predicted weight ŵi (as per our candidate line of best fit). The residual ûi (last row) is
then defined as the vertical distance between the data point and the weight predicted by
the candidate line of best fit.
The SSR is
$$\sum_{i=1}^{10} \hat{u}_i^2 = 16^2 + 5^2 + 6^2 + (-12)^2 + 7^2 + 10^2 + 4^2 + (-5)^2 + (-21)^2 + 15^2 = 1317.$$
Can we do better than this? That is, can we find another candidate line of best fit whose SSR
is smaller than 1317?
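The SSR of any candidate line can be computed mechanically. The sketch below does this for the horizontal line ŵ = 65 used in the table, and for one other candidate chosen purely for illustration: the horizontal line through the mean weight, ŵ = 67.5. The function name `ssr` is made up here.

```python
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

def ssr(a, b, xs, ys):
    """Sum of squared residuals for the candidate line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

ssr_65 = ssr(65, 0, heights, weights)      # the horizontal line w = 65
ssr_mean = ssr(67.5, 0, heights, weights)  # illustrative candidate: w = 67.5

print(ssr_65, ssr_mean)
```

Even this second horizontal line has a smaller SSR than the first, so the answer to the question above is yes.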
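Fact 16. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of data. The OLS
regression line of y on x is y − ȳ = b̂ (x − x̄), where
$$\text{(i)}\quad \hat{b} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \qquad\text{or, equivalently,}\qquad \text{(ii)}\quad \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.$$
Moreover, the regression line can also be written in the form y = â + b̂x, where b̂ is as given
above and â = ȳ − b̂x̄.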
Proof. We want to find â and b̂ such that the line y = â + b̂x has the smallest SSR possible.
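The residual ûi is defined as the vertical distance between (xi , yi ) and the line y = â + b̂x.
That is, ûi = yi − (â + b̂xi ), so that the SSR is
$$\sum \hat{u}_i^2 = \sum \left[ y_i - (\hat{a} + \hat{b} x_i) \right]^2.$$
We wish to minimise the SSR, by choosing appropriate values of â and b̂. This involves the
following pair of first order conditions:20
$$\frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2 = 0, \qquad \frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2 = 0.$$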
The remainder of the proof simply involves taking derivatives and doing the algebra — it
can be found in the Appendices of my H2 Mathematics Textbook.
Remark 3. Whenever we simply say regression line or line of best fit, it may safely be
assumed that we are talking about the OLS regression line.
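For readers who would like to see the algebra, here is a sketch of how the two first order conditions lead to the formulas in Fact 16. This is the standard derivation and is not necessarily laid out the same way as in the Appendices.

```latex
\begin{align*}
\frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2
  &= -2 \sum \left( y_i - \hat{a} - \hat{b} x_i \right) = 0
  &&\implies \hat{a} = \bar{y} - \hat{b}\bar{x}, \\
\frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2
  &= -2 \sum x_i \left( y_i - \hat{a} - \hat{b} x_i \right) = 0
  &&\implies \sum x_i y_i - \hat{a}\, n\bar{x} - \hat{b} \sum x_i^2 = 0.
\end{align*}
% Substitute \hat{a} = \bar{y} - \hat{b}\bar{x} into the second condition:
\begin{align*}
\sum x_i y_i - n\bar{x}\bar{y} + \hat{b}\, n\bar{x}^2 - \hat{b} \sum x_i^2 = 0
  \implies
  \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.
\end{align*}
```

The first condition says that the regression line passes through the point (x̄, ȳ), which is why the line can be written as y − ȳ = b̂ (x − x̄).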
20 There’s a bit of hand-waving here.
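$$\bar{h} = 169.6, \qquad \bar{w} = 67.5, \qquad \sum_{i=1}^{n} (h_i - \bar{h})^2 = 1382.4, \qquad \sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) = 1237.$$
So b̂ = 1237/1382.4 ≈ 0.8948 and â = 67.5 − 0.8948 × 169.6 ≈ −84.26.
Thus, the regression line is w − 67.5 = 0.8948 (h − 169.6) or w = â + b̂h = −84.26 + 0.8948h.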
[Figure: scatter diagram of weight (kg) against height (cm), with the regression line and the residuals marked.]
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 78.6 63.4 70.5 54.4 75.0 71.4 67.0 58.9 50.0 85.8
ûi = wi − ŵi (kg) 2.4 6.6 0.5 −1.4 −3.0 3.6 2.0 1.1 −6.0 −5.8
The SSR for the actual line of best fit is
$$\sum_{i=1}^{10} \hat{u}_i^2 = 2.4^2 + \dots + (-5.8)^2 \approx 147.6.$$
This is much better than the SSR of 1317 that we found for the previous candidate line of
best fit, which was simply a horizontal line.
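The regression line and its SSR can be reproduced with a few lines of Python. This sketch uses the second formula for b̂ from Fact 16 and the formula â = ȳ − b̂x̄. The function name `ols` is made up for this illustration.

```python
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

def ols(xs, ys):
    """Return (a_hat, b_hat) for the OLS regression line of y on x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    numerator = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
    denominator = sum(x * x for x in xs) - n * x_bar ** 2
    b_hat = numerator / denominator
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

a_hat, b_hat = ols(heights, weights)

# Sum of squared residuals for the fitted line.
ssr_fit = sum((w - (a_hat + b_hat * h)) ** 2 for h, w in zip(heights, weights))

print(round(a_hat, 2), round(b_hat, 4), round(ssr_fit, 1))
```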
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
q̂i
ûi = qi − q̂i
Example 218. We’ll find the PMCC and the regression line for these data:
i 1 2 3 4 5
xi 1 7 3 11 8
yi 14 5 6 4 4
2. Press the blue 2ND button and then CATALOG (which corresponds to the 0 button).
This brings up the CATALOG menu.
3. Using the down arrow key ∨ , scroll down until the cursor is on DiagnosticOn.
4. Press ENTER once. And press ENTER a second time. The TI84 now says “DONE”,
telling you that the Diagnostic option has been turned on.
The above steps need only be performed once. Unless of course you’ve just reset your
calculator (as is required before each exam). In which case you have to go through the
above steps again.
(Be careful to note that the TI84 uses the symbol “a” for the coefficient for x, whereas in
the A-level List of Formulae, they use b instead. Don’t get these mixed up!)
[Screenshots: TI84 display after Steps 9, 10, 11, and 12.]
Exercise 103. Using your TI84, find the PMCC between q and p, and also find the regres-
sion line of q on p (see data below). Verify that your answer for this exercise is the same
as those in the last two exercises. (Answer on p. 368.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
Given any value of x, we call the corresponding ŷ = b̂ (x − x̄) + ȳ the fitted value or the
predicted value. One use of the regression line is that it can help us predict (or “guess”)
the value of y, even for x for which we have no data.
Example 216 (height and weight example revisited). Say we want to guess the weight
of an adult male Singaporean who is 185 cm tall. Using our regression line, we predict that
his weight is ŵh=185 = 0.8948 × 185 − 84.26 ≈ 81.3 kg. This is called interpolation, because
we are predicting the weight of a person whose height is between two of our observations.
Say instead we want to guess the weight of an adult male Singaporean who is 210 cm tall.
Using our regression line, we predict that his weight is ŵh=210 = 0.8948 × 210 − 84.26 ≈ 103.6
kg. This is called extrapolation, because we are predicting the weight of a person whose
height is beyond our rightmost observation.
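The two predictions can be sketched in code, together with a simple check of whether a given height lies inside the range of observed heights (150 cm to 190 cm). The coefficients are the rounded values from the regression line above. The function name and the "inside the observed range" rule are illustrative choices made here.

```python
A_HAT, B_HAT = -84.26, 0.8948  # rounded regression coefficients
H_MIN, H_MAX = 150, 190        # shortest and tallest observed heights (cm)

def predict_weight(height_cm):
    """Return the predicted weight (kg) and the kind of prediction."""
    inside = H_MIN <= height_cm <= H_MAX
    kind = "interpolation" if inside else "extrapolation"
    return A_HAT + B_HAT * height_cm, kind

w185, kind185 = predict_weight(185)
w210, kind210 = predict_weight(210)

print(round(w185, 1), kind185)
print(round(w210, 1), kind210)
```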
i         1     2     3     4     5     6     7     8     9     10    -     -
hi (cm)   182   165   173   155   178   174   169   160   150   190   185   210
wi (kg)   81    70    71    53    72    75    69    60    44    80    -     -
ŵi (kg)   78.6  63.4  70.5  54.4  75.0  71.4  67.0  58.9  50.0  85.8  81.3  103.6
[Figure: scatter diagram of weight (kg) against height (cm), with the horizontal axis extended to 215 cm.]
This, though, is not a very satisfying explanation for why extrapolation is “less reliable”
than interpolation. It merely leads to another question: “Why should a prediction be more
reliable if done between two known observations, than if done to the right of the right-most
observation (or to the left of the left-most observation)?”
We won’t give an adequate answer to this latter question. Instead, we’ll simply give a
bunch of examples to illustrate the dangers of extrapolation:
Example 219. A man on a diet weighs 115 kg in Week #1. Here’s a chart of his weight
loss.
The OLS line of best fit suggests that he has been losing about 0.5 kg a week.
He forgot to record his weight on Week #6. By interpolation, we “predict” that his weight
that week was 112.5 kg. This is probably a reliable guess.
By extrapolation, we predict that his weight on Week #201 will be 15 kg. This guess is
obviously absurd. It requires that he keeps losing 0.5 kg a week for nearly 4 years.
The OLS line of best fit suggests that he has been growing by about 1 cm a month.
He forgot to record his height in Month #6. By interpolation, we “predict” that his height
that month was 165 cm. This is probably a reliable guess.
By extrapolation, we predict that his height in Month #101 will be 260 cm. This guess is
obviously absurd. It requires that he keeps growing by 1 cm a month for 8-plus years.
Example 221. Russell’s Chicken (Problems of Philosophy, 1912, Google Books link).
The man who has fed the chicken every day throughout its life at last wrings its neck instead,
showing that more refined views as to the uniformity of nature would have been useful to the
chicken. ... The mere fact that something has happened a certain number of times causes
animals and men to expect that it will happen again. Thus our instincts certainly cause
us to believe the sun will rise to-morrow, but we may be in no better a position than the
chicken which unexpectedly has its neck wrung.
$$F_0 = 2^{2^0} + 1 = 3,$$
$$F_1 = 2^{2^1} + 1 = 5,$$
$$F_2 = 2^{2^2} + 1 = 17,$$
$$F_3 = 2^{2^3} + 1 = 257,$$
$$F_4 = 2^{2^4} + 1 = 65537.$$
Remarkably, the first five Fermat numbers are all prime. This observation led Fermat to
conjecture (guess) in the 17th century that all Fermat numbers are prime. This was an act
of extrapolation.
Unfortunately, Fermat’s act of extrapolation was wrong. About a century later, Euler
showed that $F_5 = 2^{2^5} + 1 = 4294967297 = 641 \times 6700417$ is composite (not prime).
Today, the Fermat numbers F5 , F6 , . . . , F32 are all known to be composite. Indeed, it was
shown in 1964 that F32 is composite. Over half a century later, it is not yet known if
$F_{33} = 2^{2^{33}} + 1$ is prime or composite. F33 is an unimaginably huge number, with
2,585,827,973 digits.
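The claims about the Fermat numbers are easy to check by computer. The sketch below uses simple trial division (fine for numbers as small as 65537) and counts the digits of F33 with logarithms rather than by writing the number out. The helper names are made up for this illustration.

```python
from math import floor, log10

def fermat(n):
    """The n-th Fermat number, 2^(2^n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(n):
    """Trial division; only suitable for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

first_five = [fermat(n) for n in range(5)]
all_prime = all(is_prime(f) for f in first_five)

# Euler's factorisation of F5.
f5_is_composite = fermat(5) == 641 * 6700417

# Adding 1 to 2^(2^33) does not change its number of digits,
# so the digit count is floor(2^33 * log10(2)) + 1.
digits_f33 = floor(2 ** 33 * log10(2)) + 1

print(first_five, all_prime, f5_is_composite, digits_f33)
```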
On his second day at school, he learns that the Chinese character for the number 2 is
written as two horizontal strokes.
On his third day at school, he learns that the Chinese character for the number 3 is written
as three horizontal strokes.
After his third day at school, Ah Beng decides he’ll skip at least the next few Chinese
classes, because he thinks he knows how to write the Chinese characters for the numbers 4
and above. 4 simply consists of four horizontal strokes; 5 simply consists of five horizontal
strokes; etc. Unfortunately, Ah Beng’s act of extrapolation is wrong.
The characters for the numbers 4 through 10 look instead like this:
4   5   6   7   8   9   10
四  五  六  七  八  九  十
Example 224. Moore’s Law. In 1965, Gordon Moore observed that the number of
components that could be crammed onto each integrated circuit doubled every year. He
predicted that this rate of progress would continue at least through 1975.
In 1975, he adjusted his prediction to a more modest rate of doubling every two years. Thus
far, this latter prediction has held up remarkably well, as the following graph (taken from
Nature) shows.
Unfortunately, as stated in the same Nature article, it “has become increasingly obvious to
everyone involved” that “Moore’s law ... is nearing its end”.
This is considerably quicker than the rate at which the annual US defense budget and US
Gross National Product (GNP) grows. Extrapolating, he concluded:
• In 2054, the entire annual US defense budget will be spent on a single aircraft.
• Early in the 22nd century, the entire US GNP will be spent on a single aircraft.
Except so far they have been right on track. In a 2010 Economist article, Augustine was
quoted as saying, “We are right on target. Unfortunately nothing has changed.” That article
also presented an updated version of Augustine’s Law.
The latest F-35 fighter program is estimated to cost the US Department of Defense US$1.124
trillion. To be fair, that estimate is the cost of the entire program over its projected
60-year lifespan (through 2070); this includes R&D, the purchase of over 2,000 F-35s, and
operating costs. But still, US$1.124 trillion is a mind-blowing figure.*
*Figure quoted from an April 2016 Defense News story. Note though that the estimate keeps changing.
Exercise 104. Using the data below, “predict” how many haircuts were sold in June 2016
by (a) a barber who charged $7 per haircut; and (b) a barber who charged $200 per haircut.
Which prediction is an act of interpolation and which is an act of extrapolation? Which
prediction do you think is more reliable? (Answer on p. 368.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
It’s much more interesting to live not knowing than to have answers which might be wrong.
- Richard Feynman (1981, YouTube).
The A-level examiners21 want you to say, mindlessly and formulaically, that
Regurgitating the above sentence will earn you your full mark. But in fact, without the
“all else equal” clause, it is nonsense. And since it is almost never true that “all else is
equal”, it is almost always nonsense.
In every introductory course or text on statistics, one is told that the PMCC is merely
a relatively-unimportant consideration, in deciding between models. Yet somehow, the
A-level examiners seem to consider the PMCC an all-important consideration.
Here’s a quick example to illustrate.
Example 226. (From the 2015 H2 Maths exam.) In an experiment the following informa-
tion was gathered about air pressure P , measured in inches of mercury, at different heights
above sea-level h, measured in feet.
h 2000 5000 10000 15000 20000 25000 30000 35000 40000 45000
P 27.8 24.9 20.6 16.9 13.8 11.1 8.89 7.04 5.52 4.28
The exam first asks us to find the PMCCs between (a) h and P ; (b) ln h and P ; and (c)
√h and P . The answers are (a) ra ≈ −0.980731; (b) rb ≈ −0.974800; and (c) rc ≈ −0.998638.
The A-level exam then says, “Using the most appropriate case ..., find the equation which
best models air pressure at different heights.” The “correct” answer is that (c) P = a + b√h
is the “most appropriate” model, simply because the PMCC there is the largest in magnitude.
21 See H2 Maths 9740 N2015/II/10(iii), N2014/II/8(b)(ii), N2012/II/8(v), N2011/II/8(iii), N2010/II/10(iii), and
N2008/II/8(i).
But this is utter nonsense. One does not conclude that one model is “more appropriate”
than another simply because its PMCC is 0.018 larger. Small measurement errors or plain
bad luck could easily explain these tiny differences in PMCCs.
Moreover, even if one model has r = 0.9 and another has r = 0.4, it does not automatically
follow that the first model is “more appropriate” than the second. In deciding which
statistical model to use, there are very many considerations, of which the PMCC is a
relatively-unimportant one.
Sadly, in the Singapore education system, what I consider to be the correct answer would
not have gotten you any marks. Instead, one is taught that there must always be one
single, simplistic, formulaic, definitive, “correct” answer. This is a convenient substitute
for thinking.
As it turns out, the “most correct” linear model — based on the actual barometric formula
(see the last page of the Appendices in my H2 Mathematics Textbook) — is actually the
following:
$$\ln P = a + b \ln\left(1 + \frac{L}{T} h\right).$$
The constants L = −0.0065 kelvin per metre (K m⁻¹) and T = 288.15 kelvin (K) are,
respectively, the standard temperature lapse rate (up to 11,000 m above sea level) and the
standard temperature (at sea level).
The PMCC for the above model is rd ≈ 0.999998, which is “better” than the cases examined
above. (See this Google spreadsheet for the data and calculations.)
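The three PMCCs asked for in the exam question can be reproduced as follows. This is only a sketch: it recomputes cases (a), (b) and (c) from the table of data above, and does not attempt the barometric model, which would need the heights converted into metres. The function name `pmcc` is a label chosen here.

```python
from math import log, sqrt

h = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]  # feet
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]          # inches of mercury

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / (sqrt(s_xx) * sqrt(s_yy))

r_a = pmcc(h, P)                         # (a) h and P
r_b = pmcc([log(x) for x in h], P)       # (b) ln h and P
r_c = pmcc([sqrt(x) for x in h], P)      # (c) sqrt(h) and P

print(round(r_a, 6), round(r_b, 6), round(r_c, 6))
```

All three values come out very close to −1, which is precisely why picking a "most appropriate" model on the PMCC alone is so unconvincing.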
Ten-Year Series
This part lists all the questions from 2006-2015 A-Level exams, sorted into the two sections
of the exam (Pure Mathematics and Statistics), and in reverse chronological order.
In the older exams, they had the habit of not distinctly numbering different parts within
the same question as parts (i), (ii), etc. So I have sometimes taken the liberty of adding or
modifying such numbers.
Exam Tip
Unless explicitly instructed, you are always allowed to use your graphing calculator, so use
it wherever possible.
Examples of explicit instructions to avoid using your calculator include (but are not limited
to):
• “Without using your calculator ...”
• “Use a non-calculator method ...”
• “Find the exact value of ...”
• “Express your answer in terms of √3 or π.”
Exercise 105. (8864 N2015/I/1. Answer on p. 369.) Show that there are no real values
of k for which 2k + (k − 4)x − 2x² is always negative. [4]
Exercise 106. (8864 N2015/I/2. Answer on p. 369.) (i) Differentiate $\frac{3}{(2x - 1)^4}$. [2]
(ii) Use a non-calculator method to find $\int_{0.5}^{1} (x + 2/x)^2 \, dx$. [5]
Exercise 107. (8864 N2015/I/3. Answer on p. 369.) The diagram shows the curve C
with equation y = 12x + 8e^(−2x).
[Diagram omitted: the curve C, with the origin and the x-axis marked.]
(i) Use differentiation to find the exact x-coordinate of the stationary point of C. [4]
(ii) Find the area of the region bounded by C, the x-axis and the lines x = 0 and x = a,
where a is a positive constant. Give your answer in terms of a. [3]
[Diagram omitted: points D, E, F, P, Q, R, S, T and U, with lengths marked x and y.]
(i) Show that the area, A cm², of PQRSTU is given by A = (0.5√3)(50 + 10x − x²). [5]
(ii) Without using a calculator, find the maximum value of A as x varies, justifying that
this value is a maximum. [3]
Exercise 109. (8864 N2015/I/5. Answer on p. 370.) The curve C has equation y =
0.5x − ln(x + 1).
(i) Sketch the graph of C, stating the coordinates of any points of intersection with the
axes and the equations of any asymptotes parallel to the y-axis. [3]
(ii) Find the numerical value of the gradient of C at the point P where x = 0.5, giving your
answer correct to 3 decimal places. [1]
(iii) The normal to C at P meets the x-axis at A and the y-axis at B. Find the length of
AB. [5]
∫_{1}^{6} 1/√(1 + 4x) dx. [4]
Exercise 111. (8864 N2014/I/2. Answer on p. 371.) (i) Differentiate ln(x² + 4). [2]
(ii) The curve C has equation y = ln(x² + 4). Show that the values of x for which the
gradient of C is equal to the constant k satisfy the equation kx² − 2x + 4k = 0. [1]
(iii) Find the values of k for which this equation has equal roots. [2]
Exercise 112. (8864 N2014/I/3. Answer on p. 371.) The curve C has equation y =
1 − e^(1−2x).
(i) Sketch the graph of C, stating the exact coordinates of any points of intersection with
the axes and the equation of the asymptote. [3]
(ii) Without using a calculator, find the equation of the tangent to C at the point where
x = 1, giving your answer in the form y = mx + c, where m and c are exact constants. [4]
Exercise 115. (8864 N2013/I/1. Answer on p. 372.) Find the set of values of k for which
the equation x² − (k − 2)x + (2k + 1) = 0 has no real roots. [4]
Exercise 116. (8864 N2013/I/2. Answer on p. 372.) (i) Differentiate ln(1 + 2x²). [2]
(ii) Use a non-calculator method to find the exact value of
∫_{−1}^{0} 1/(1 − 3x)⁴ dx. [4]
Exercise 117. (8864 N2013/I/3. Answer on p. 372.) A piece of card has the shape of a
trapezium ABCE. The point D on CE is such that ABCD is a rectangle. It is given that
AB = y cm, BC = 4x cm and DE = 3x cm (see diagram). The area of the card is S cm².
(iii) For the smaller value of a, find the coordinates of the point of intersection of the normal
at P and the line y = x. [2]
Exercise 119. (8864 N2013/I/5. Answer on p. 373.) (i) By taking logarithms, find the
exact root of the equation e^(2−2x) = 2e^(−x). [3]
(ii) Use differentiation to show that the curve C with equation y = e^(2−2x) − 2e^(−x) has a
stationary point at (2, −e^(−2)). [3]
(iii) Sketch C, stating the exact value of the x-coordinate of its point of intersection with
the x-axis. [2]
(iv) Use your calculator to find the area of the region bounded by C, the x-axis and the
lines x = 0 and x = 1. [1]
Exercise 120. (8864 N2012/I/1. Answer on p. 373.) Given that 3e^(2x) = 4(e^(−2x) − 1), use
the substitution u = e^(2x) to find the exact value of x. [4]
(i) Find, in terms of k, the x-coordinates of the points where C and L intersect. [2]
(ii) Hence find, in terms of k, the area of the finite region between C and L. [4]
Exercise 124. (8864 N2012/I/5. Answer on p. 375.) The curve C has equation y = 2x −x2 .
(i) Sketch C, stating the coordinates of the points of intersection with the axes. [3]
(ii) Find the numerical value of the gradient of C at the point where x = 1.5. Give your
answer correct to 4 decimal places. [1]
(iii) Hence find the equation of the tangent to C at the point where x = 1.5. Give your
answer in the form y = mx + c, with m and c correct to 4 decimal places. [2]
(iv) This tangent meets the y-axis at A and the line y = x at B. Find the length of AB. [4]
Exercise 126. (8864 N2011/I/2. Answer on p. 375.) (i) On a single diagram, sketch
the graphs of y = 2 − 0.6x and y = x2 − 1, stating clearly the coordinates of any points of
intersection with the y-axis. [2]
(ii) Find the x-coordinates of the points of intersection of y = 2 − 0.6x and y = x2 − 1, giving
your answers correct to 4 decimal places. [2]
(iii) Write down as an integral an expression for the area of the region bounded by y = 2−0.6x
and y = x2 − 1 and the lines x = 2 and x = 3. Evaluate this integral, giving your answer
correct to 3 decimal places. [2]
Exercise 127. (8864 N2011/I/3. Answer on p. 376.) (i) Find ∫ e^(3x+2) dx. [2]
(ii) Without using a calculator, find ∫_{4}^{9} 3(√x − 1/√x) dx. [4]
Exercise 128. (8864 N2011/I/4. Answer on p. 376.) The diagram shows a square piece
of cardboard ABCD of side 2 m. A square of side x m is removed from each corner of
ABCD. The remaining shape is now folded along PQ, QR, RS and SP to form an open
rectangular box of height x m.
(i) Show that the volume, V m³, of the box is given by V = 4x³ − 8x² + 4x. [3]
(ii) Without using a calculator, find the maximum value of V as x varies. [5]
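As a check on the algebra in this exercise (not a substitute for the non-calculator working the question demands), the following Python sketch uses exact fractions. It assumes the box has height x and a square base of side 2 − 2x, which is what removing a corner square of side x from a square of side 2 gives. The function names are my own.

```python
from fractions import Fraction

def volume(x):
    """Volume of the open box: height x, square base of side 2 - 2x."""
    return x * (2 - 2 * x) ** 2

def cubic(x):
    """The expanded form quoted in part (i)."""
    return 4 * x**3 - 8 * x**2 + 4 * x

def dV_dx(x):
    """Derivative of the cubic: 12x^2 - 16x + 4 = 4(3x - 1)(x - 1)."""
    return 12 * x**2 - 16 * x + 4

def d2V_dx2(x):
    """Second derivative of the cubic: 24x - 16."""
    return 24 * x - 16

# Two cubics that agree at more than three points are identical.
samples = [Fraction(k, 10) for k in range(1, 10)]
forms_agree = all(volume(x) == cubic(x) for x in samples)

x_star = Fraction(1, 3)      # the stationary point inside 0 < x < 1
v_max = cubic(x_star)        # exact value, kept as a fraction
is_maximum = dV_dx(x_star) == 0 and d2V_dx2(x_star) < 0
```

Working in Fraction rather than float keeps every value exact, so the comparison with the stated cubic is a true equality check rather than an approximate one.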
(ii) find the exact coordinates of A and B and hence show that the exact area of triangle
OAB is (p − q ln 5)²/30, where p and q are integers to be found. [8]
Exercise 130. (8864 N2010/I/1. Answer on p. 376.) Find the set of values of k for which
the equation 4x² − 2kx + 9 = 0 has two real distinct roots. [3]
Exercise 133. (8864 N2010/I/4. Answer on p. 377.) A window in a new building has the
shape of a rectangle ABCD joined to an isosceles triangle ABE, as shown in the diagram.
It is given that AB = 2x m and AE = (5/8)AB. The total perimeter AEBCDA of the window
is 6 m.
(iii) Hence use a non-calculator method to find the maximum value of this area. [4]
(i) Use a non-calculator method to find the coordinates of the stationary points of C. [4]
(ii) Sketch C. Mark the point of inflexion with a cross. [2]
(iii) Find, correct to 2 decimal places, the x-coordinates of the points where C cuts the
x-axis. [2]
(iv) Find ∫ (6 − 4x³ − 3x⁴) dx. Hence find the exact area of the region bounded by C, the
x-axis and the lines x = −1 and x = 1/2. [3]
Exercise 135. (8863 N2009/I/1. Answer on p. 378.) Without using a calculator, solve
the simultaneous equations
x + 2y = 3,
x² + xy = 2. [4]
Exercise 136. (8863 N2009/I/2. Answer on p. 378.) (i) Sketch the graphs of y = √x and
y = 0.5x on a single diagram and write down the coordinates of the points where y = √x
and y = 0.5x intersect. [2]
(ii) Find ∫ √x dx and ∫ 0.5x dx. [2]
(iii) Without using a calculator, find the area of the region between the two graphs. [2]
Exercise 137. (8863 N2009/I/4. Answer on p. 379.) (i) Sketch the curve y = x − 1/x,
stating clearly the coordinates of all points of intersection with the axes. [1]
(ii) Find the gradient of the normal at the point P on the curve where x = 2. [2]
(iii) Find the equation of the normal at P in the form ax + by + c = 0, where a and b are
integers. [3]
(iv) The normal at P meets the y-axis at N and the tangent at P meets the y-axis at T.
Find the area of triangle PTN. [5]
(ii) Sketch the curve, stating clearly the coordinates of all points of intersection with the
axes. [3]
(iii) Solve the inequality 2x³ − 5x² − 4x + 3 > 0. Hence find the exact solutions of the inequality
2e^(3x) − 5e^(2x) − 4e^x + 3 > 0. [5]
Exercise 139. (8863 N2008/I/1. Answer on p. 380.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Sketch the graph of y = sin x for 0 ≤ x ≤ 4π. [1]
It is given that α is an acute angle, and sin α = c. State, in terms of c, the value of
Exercise 140. (8863 N2008/I/2. Answer on p. 380.) The sum of two numbers x & y is
20 and the sum of their squares is 300. Given that x > y, find the exact value of x & y. [5]
Exercise 141. (8863 N2008/I/3. Answer on p. 380.) The diagram shows the graphs of
C₁: y = 2x² and C₂: y = x² + k², where k is a positive constant. The graphs intersect at P
and Q, as shown.
(i) Show that the x-coordinates of P and Q are k and −k respectively. [1]
(ii) Find the exact value of the area of the shaded region between C₁ and C₂. [5]
(i) Find the set of values of k for which x is an increasing function of t. [5]
It is now given that k = 36.
Exercise 143. (8863 N2008/I/6. Answer on p. 381.) The diagram shows the curve C
with equation y = ln(2x + 4). The point P on C has coordinates (1, ln 6). The tangent to
C at P meets the x-axis at T .
Exercise 145. (8863 N2007/I/3. Answer on p. 382.) (i) Sketch, for x ≥ 0, the graphs of
y = 20/(x + 2) and y = 10 − x² on the same axes. [2]
(ii) The graphs intersect on the y-axis. Find, correct to 3 decimal places, the x-coordinate
of the point of intersection for which x > 0. [1]
(iv) Use your answers to parts (ii) and (iii) to find the area of the region, in the first
quadrant, between the two graphs. [2]
Exercise 146. (8863 N2007/I/4. Answer on p. 382.) The diagram shows a large
rectangular field surrounded by a wall. The broken lines represent fences. The corner shapes
are an isosceles triangle and a square. The length of the fence bordering the triangle is x
metres.
(iii) Hence, using a sketch of the graph of x = cos θ, solve the inequality 2 cos² θ + 3 cos θ + 2 ≥
2 cos θ + 3, for 0° ≤ θ ≤ 540°. [6]
Exercise 148. (8174 N2006/I/6. Answer on p. 383.) Solve (i) e^(5x+2) = 23, [2]
Exercise 150. (8174 N2006/I/9. Answer on p. 384.) (i) Find ∫ (5x² − 8x) dx. [2]
(ii) Evaluate ∫_{0}^{1} e^(−2x) dx. [4]
Exercise 151. (8174 N2006/I/16. Answer on p. 384.) The diagram shows the line
y = −4x + 19 intersecting the curve y = −2x² + 6x + 11 at the points A and B. Find
Exercise 152. (8864 N2015/I/6. Answer on p. 385.) The masses of peaches sold by a
shop have a normal distribution. Over a long period of time, it is found that 20% of peaches
have a mass less than 40 grams and 25% of peaches have a mass greater than 60 grams.
Find the mean and variance of the distribution. [4]
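One way to check an answer to a question like Exercise 152 is to standardise the two given percentiles and solve the resulting pair of linear equations. Here is a minimal Python sketch using the standard library's NormalDist. The variable names are my own, and the exam itself expects a graphing calculator rather than code.

```python
from statistics import NormalDist

std_normal = NormalDist()            # mean 0, standard deviation 1
z_low = std_normal.inv_cdf(0.20)     # P(Z < z_low) = 0.20
z_high = std_normal.inv_cdf(0.75)    # P(Z > z_high) = 0.25

# Solve 40 = mu + z_low * sigma and 60 = mu + z_high * sigma.
sigma = (60 - 40) / (z_high - z_low)
mu = 40 - z_low * sigma
variance = sigma ** 2

# Sanity check: the fitted distribution should reproduce the given 20%.
check_low = NormalDist(mu, sigma).cdf(40)
```

Subtracting one equation from the other eliminates µ, which is why σ can be found first.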
Exercise 153. (8864 N2015/I/7. Answer on p. 385.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. A college has 1200 students. Of these students,
500 are in Year One, 400 are in Year Two and 300 are in Year Three. A list of the names
of all 1200 students, with the names arranged in alphabetical order, is available. A survey
is to be carried out to investigate how many hours students spend playing computer games
each week.
(i) Describe how to obtain a systematic sample of 100 students from the list to take part
in the survey. [2]
(ii) State one disadvantage of using a systematic sample in this context. [1]
(iii) What type of sample might it be more appropriate to use? You do not need to describe
how you would obtain this sample. [1]
Exercise 154. (8864 N2015/I/8. Answer on p. 385.) Two events A and B are such that
P(A) = p, P(B) = 2p, P(A ∪ B) = 0.42 and P(A ∩ B) = 0.03.
Exercise 155. (8864 N2015/I/9. Answer on p. 385.) Kai throws a fair die 8 times. Find
the probability that he obtains a six
(iii) Using a suitable approximation, estimate the probability that the number of times
he obtains a six is between 90 and 100 inclusive. State the mean and variance of the
distribution that you use. [4]
Rower A B C D E F G H I J
h 1.75 1.90 1.81 1.82 1.81 1.60 1.88 1.71 1.95 1.76
w 95 102 96 98 99 90 106 92 110 93
(i) Draw a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of w on h and sketch this line on your scatter
diagram. [2]
(iv) Use the equation of your regression line to calculate an estimate of the weight of a
rower whose height is 1.66 metres. Give two reasons why you would expect this estimate
to be reliable. [3]
Exercise 157. (8864 N2015/I/11. Answer on p. 386.) Men and women staying at a large
hotel have masses, in kg, that are normally distributed with means and standard deviations
as shown in the following table.
(i) Find the probability that the mass of a man chosen at random is within ±2 kg of the
mean mass of men. [2]
(ii) Find the probability that the total mass of three men chosen at random is greater
than the total mass of four women chosen at random. State the mean and variance of the
distribution that you use. [4]
The lift in the hotel has a safety limit of 460 kg. Three men and four women are chosen at
random.
(iii) Find the probability that they can safely travel in the lift together. State the mean
and variance of the distribution that you use. [3]
Exercise 159. (8864 N2015/I/13. Answer on p. 387.) A scientist claims that the mean
length of fish in a particular lake is 15.2 cm. The lengths of fish are known to have a normal
distribution with standard deviation 2.1 cm. A random sample of 30 fish is selected and
found to have a sample mean length of 14.5 cm.
(i) Test, at the 5% significance level, whether the scientist’s claim should be rejected. [4]
The lengths of a random sample of 40 fish from a second lake are summarised as follows,
where x cm denotes the length of a fish in this lake.
(ii) Find unbiased estimates of the population mean and variance. [3]
(iii) What do you understand by the term ‘unbiased estimate’? [1]
The population mean length of fish from this second lake is µ cm. Using the sample data,
a significance test of the null hypothesis µ = 18 against the alternative hypothesis µ < 18 is
carried out at the α% significance level.
(iv) Find the set of values of α for which the null hypothesis will be rejected. [3]
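For part (i) of Exercise 159, the test statistic and p-value can be sketched in Python as follows. I assume a two-tailed test here, on the grounds that the claim is that the mean equals 15.2 cm. The function name is my own.

```python
import math
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 against H1: mu != mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# Figures from part (i): sample mean 14.5, claimed mean 15.2, sigma 2.1, n = 30.
z, p = z_test_two_tailed(sample_mean=14.5, mu0=15.2, sigma=2.1, n=30)
reject_at_5_percent = p < 0.05
```

The null hypothesis is rejected at the 5% level only if the p-value falls below 0.05, which is what the final line records.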
Exercise 160. (8864 N2014/I/6. Answer on p. 387.) The heights of girls in a school
have a normal distribution with mean 142.2 cm and standard deviation 6 cm. Find the
probability that a girl chosen at random from this school has height
Supermarket Online
Under 25 years 500 1000
25 − 60 years 900 1600
Over 60 years 800 200
A researcher carries out a survey to investigate the amount spent on food per week. She
decides to use a sample of size 100 from these households.
(i) Describe how she might obtain a systematic sample. [2]
(ii) Describe how she might obtain a stratified sample, identifying the strata and finding
the size of the sample taken from each of the strata. [2]
(iii) State, with a reason, whether a systematic sample or a stratified sample would be more
appropriate in this context. [1]
Exercise 162. (8864 N2014/I/8. Answer on p. 388.) In a certain large city, the number
of hours, x, spent travelling to and from work and the number of hours, y, spent watching
television were recorded for a random sample of 8 people, for one particular week. The
results are given in the following table.
A B C D E F G H
x 12.8 8.4 4.4 9.0 7.2 2.2 9.2 6.3
y 4.5 8.3 14.8 8.0 9.2 12.5 7.8 10.4
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the
values of m and c correct to 4 significant figures. Sketch this line on your scatter diagram.
[2]
(iv) Use the equation of your regression line to estimate the number of hours of television
watched by a person who spends 13.2 hours a week travelling to and from work. Comment
on the reliability of your estimate. [3]
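The PMCC and regression line in Exercise 162 are meant to be read off a graphing calculator, but the same quantities can be computed directly from their definitions. A minimal Python sketch, using the data in the table above (variable names are my own):

```python
import math

x = [12.8, 8.4, 4.4, 9.0, 7.2, 2.2, 9.2, 6.3]      # hours travelling
y = [4.5, 8.3, 14.8, 8.0, 9.2, 12.5, 7.8, 10.4]    # hours of television

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

r = s_xy / math.sqrt(s_xx * s_yy)   # product moment correlation coefficient
m = s_xy / s_xx                     # gradient of the regression line of y on x
c = mean_y - m * mean_x             # intercept

estimate = m * 13.2 + c             # estimate asked for in part (iv)
# True if 13.2 lies outside the observed x-values, i.e. this is an extrapolation.
outside_data_range = not (min(x) <= 13.2 <= max(x))
```

The last line bears on the reliability comment in part (iv): an estimate made outside the range of the data is an extrapolation.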
Exercise 164. (8864 N2014/I/10. Answer on p. 388.) It is known that the lengths
of leaves from beech trees in a particular forest have a population variance of 4.4 cm².
Scientists believe that the mean length of leaves from beech trees in this forest is 7 cm. A
random sample of 50 of these leaves has a mean length of 6.5 cm.
(i) Test, at the 5% significance level, whether the population mean length of leaves from
beech trees in this forest is less than 7 cm. [4]
The lengths, x cm, of a random sample of 50 leaves from beech trees in another forest are
summarised by ∑x = 310.4 and ∑x² = 2209.2.
(ii) Calculate unbiased estimates of the population mean and variance. [3]
A test, at the α% significance level, shows that there is sufficient evidence to suggest that
the population mean length of leaves from beech trees in this second forest differs from 7
cm.
(iii) Find the set of possible values of α. [4]
(i) Write down expressions for P(L) and P(G) in terms of x. Given that L and G are
independent, show that x = 10. [4]
Using this value of x, find (ii) P(L ∪ T), (iii) P(T ∩ G′), and (iv) P(L | G). [1 mark each.]
Two students from the whole group are chosen at random.
(v) Find the probability that both of these students each owns exactly two out of the three
items (laptop, tablet, games machine). [3]
Exercise 166. (8864 N2014/I/12. Answer on p. 389.) The outputs of a certain metal,
in tonnes, extracted each day from two mines, A and B, have independent normal
distributions. The mean of the distribution of the daily output from A is 50 tonnes. The
probability that the daily output from A is more than 75 tonnes is 0.0189.
(i) Show that the variance of this distribution is 145 tonnes², correct to 3 s.f. [3]
The mean and variance of the distribution of the daily output from B are 75 tonnes and
64 tonnes² respectively. B operates for seven days each week.
(ii) Find the probability that in a 7-day week B produces less than 500 tonnes. [3]
(iii) A operates for five days each week. Find the probability that in any particular week
the output from B is more than twice the output from A. You should state the mean and
variance of any distribution that you use. [5]
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) This type of device has previously been considered capable of retaining information for
75 hours, on average, after power is switched off, but the manufacturers now claim that
information is retained for longer than this. Test at the 2.5% significance level whether the
claim is justified. [4]
Exercise 169. (8864 N2013/I/8. Answer on p. 390.) A shop sells batteries in packs
of 10. An advertiser claims that individual batteries each have a lifetime of at least 100
hours. The probability that an individual battery has a lifetime less than 100 hours is 0.2,
independently of all other batteries.
(i) Find the probability that, in a randomly chosen pack of 10 batteries, each of the batteries
satisfies the advertiser’s claim. [1]
Customers are satisfied if at least 8 of the batteries in a pack have a lifetime of at least 100
hours.
(ii) Find the probability that a randomly chosen pack will satisfy customers. [3]
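Exercise 169 is a standard binomial calculation: the number of batteries in a pack lasting at least 100 hours is binomial with n = 10 and p = 1 − 0.2 = 0.8. A minimal Python sketch of both parts (the helper binom_pmf is my own, not a library function):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p_good = 10, 0.8   # p_good = P(a battery lasts at least 100 hours)

# (i) All ten batteries satisfy the claim.
p_all_good = binom_pmf(10, n, p_good)

# (ii) At least 8 of the 10 batteries satisfy the claim.
p_satisfied = sum(binom_pmf(k, n, p_good) for k in range(8, n + 1))
```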
Boy A B C D E F G H I J
x 8.2 10.1 6.6 13.5 6.8 11.4 7.8 6.9 12.8 7.5
y 123 135 119 141 112 151 122 116 141 123
(i) Give a sketch for the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the
values of m and c correct to 2 decimal places. Sketch this line on your scatter diagram. [2]
(iv) Use the equation of your regression line to calculate an estimate of the height of a boy
whose age is 13.2 years and comment on the reliability of your estimate. [3]
(ii) Find the probability that a randomly selected packet of Type B has a mass of food less
than 1 kg. State the mean and variance of any distribution that you use. [4]
(iii) Find the probability that the mass of food in a randomly selected packet of Type B is
more than the mass of food in a randomly selected packet of Type A. State the mean and
variance of any distribution that you use. [4]
Exercise 173. (8864 N2013/I/12. Answer on p. 391.) Jai is playing a game which involves
throwing a fair six-sided die. If the result is a 3, 4, 5 or 6, his score is the number shown.
If the result is a 1 or a 2, he throws the die a second time and his score is the sum of the
two numbers from his two throws.
(i) Draw a tree diagram to represent the possible outcomes. [3]
Events A and B are defined as follows:
Event A: Jai’s score is 5 or 6, Event B: Jai has two throws.
(ii) Show that P(A) = 4/9. [2]
Find (iii) P(A ∩ B), [1] (iv) P(A ∪ B), [2] and (v) P(B∣A′ ). [4]
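The tree diagram in Exercise 173 can be mirrored by listing every branch with its probability and adding up the branches that belong to each event. A minimal Python sketch with exact fractions (all names are my own):

```python
from fractions import Fraction

sixth = Fraction(1, 6)

# Each outcome is (score, used_two_throws, probability).
outcomes = []
for first in range(1, 7):
    if first >= 3:
        outcomes.append((first, False, sixth))            # score is the face shown
    else:
        for second in range(1, 7):                        # throw again on a 1 or 2
            outcomes.append((first + second, True, sixth * sixth))

def prob(event):
    """Total probability of the outcomes satisfying event(score, two_throws)."""
    return sum(p for score, two, p in outcomes if event(score, two))

p_a = prob(lambda s, two: s in (5, 6))                    # event A
p_b = prob(lambda s, two: two)                            # event B
p_a_and_b = prob(lambda s, two: s in (5, 6) and two)
p_a_or_b = p_a + p_b - p_a_and_b
p_b_given_not_a = prob(lambda s, two: two and s not in (5, 6)) / (1 - p_a)
```

Here p_a evaluates to exactly Fraction(4, 9), in agreement with part (ii).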
Exercise 174. (8864 N2012/I/6. Answer on p. 392.) This question is no longer in the
8865 (revised) syllabus, so you can skip it.
(i) Given that A and B are independent, find a quadratic equation satisfied by p. [3]
(ii) Hence find the value of p and the value of P(A ∩ B). [2]
Exercise 176. (8864 N2012/I/8. Answer on p. 392.) An election was held to choose the
leader of a political party.
• Candidate A received 50% of all the votes, and 60% of A’s votes were cast by males.
• Candidate B received 35% of all the votes, and 40% of B’s votes were cast by males.
• Candidate C received 15% of all the votes, and 20% of C’s votes were cast by males.
A person V , who voted in the election, is selected at random. Find the probability that V
(i) voted for A and is male, [1]
(ii) is female, [2]
(iii) voted for C, given that V is male. [2]
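Exercise 176 combines the multiplication rule, the law of total probability and conditional probability. A minimal Python sketch with exact fractions (the dictionary and variable names are my own):

```python
from fractions import Fraction

# Share of all votes received by each candidate.
vote_share = {"A": Fraction(50, 100), "B": Fraction(35, 100), "C": Fraction(15, 100)}
# Proportion of each candidate's votes that were cast by males.
male_share = {"A": Fraction(60, 100), "B": Fraction(40, 100), "C": Fraction(20, 100)}

# (i) P(voted for A and is male)
p_a_and_male = vote_share["A"] * male_share["A"]

# (ii) P(female) = 1 - P(male), where P(male) sums over the three candidates.
p_male = sum(vote_share[c] * male_share[c] for c in vote_share)
p_female = 1 - p_male

# (iii) P(voted for C | male)
p_c_given_male = vote_share["C"] * male_share["C"] / p_male
```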
Exercise 177. (8864 N2012/I/9. Answer on p. 392.) A company is selling ‘Pluto’ cars.
The age x, in years, and the advertised price y, in hundreds of dollars, for ten Pluto cars
are given in the following table.
Car 1 2 3 4 5 6 7 8 9 10
x 5.0 4.5 6.0 5.2 5.6 6.0 3.0 2.0 7.1 7.5
y 85 90 65 72 75 70 130 150 42 42
(i) Draw a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the
values of m and c correct to 2 decimal places. Sketch this line on your scatter diagram. [2]
(iv) Calculate an estimate of the advertised price of a Pluto car which is
(a) 4 years old, [2]
(b) 9 years old. [2]
(v) Comment on the reliability of each of your estimates in part (iv). [2]
Exercise 179. (8864 N2012/I/11. Answer on p. 393.) A company sells balls of string. A
manager claims that the average length of string in a ball is at least 300 m. To test this
claim, a random sample of 100 balls of string is checked and the lengths of string per ball,
x m, are summarised by ∑(x − 300) = −60 and ∑(x − 300)² = 1240.
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) Test at the 5% significance level whether the manager’s claim is valid. [5]
The manufacturing process is improved and the new population variance is known to be
12.1 m². A new random sample of 100 balls of string is chosen and the mean of this sample
is k m. A test at the 10% significance level indicates that the manager’s claim is valid for
this improved process.
(iii) Find the least possible value of k, giving your answer correct to 2 decimal places. [3]
Stating clearly the mean and variance of all distributions that you use, find the probability
that (i) the total mass of 10 randomly chosen grapefruit of type A is less than 2.4 kg, [3]
(ii) the total mass of 6 randomly chosen grapefruit of type A is within 0.2 kg of the total
mass of 5 randomly chosen grapefruit of type B. [4]
(iii) Mrs Woo buys 3 grapefruit of type A and 3 grapefruit of type B. Mr Tan buys 10
grapefruit of type A. Stating clearly the mean and variance of the distribution that you
use, find the probability that Mrs Woo pays more than Mr Tan. [6]
Exercise 181. (8864 N2011/I/6. Answer on p. 394.) Independent events A and B are
such that P(A) = a and P(B) = b. Given that P(A ∪ B) = 0.46 and P(A ∩ B) = 0.04, find a
quadratic equation satisfied by a and hence find the possible values of P(A). [5]
Exercise 182. (8864 N2011/I/7. Answer on p. 394.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Two thousand students travel to college either
by car, by bicycle or on foot. Any given student travels by the same method each day.
The numbers in each of two year-groups using each method of travel are summarised in the
table below.
Researcher A carries out a survey to investigate the length of students’ journey times to
college, using a random sample of 100 students.
(i) Explain what is meant in this context by the term ‘a random sample’. [2]
Researcher B decides to use stratified sampling with three strata from the combined year-
groups, also using 100 students.
(ii) Identify the three strata and find the size of the sample taken from each stratum. [2]
(iii) State one advantage that stratified sampling would have compared to random sampling
in this context, and state how a better stratified sample of size 100 could have been achieved,
using the data in the above table. [2]
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of this question. [2]
(iii) Find the equation of the regression line of T on H. Sketch this line on your scatter
diagram. [2]
(iv) Calculate an estimate of the air temperature at noon at a place in the region with
altitude 1000 metres. Comment on the reliability of this estimate. [2]
Exercise 184. (8864 N2011/I/9. Answer on p. 395.) A certain type of light bulb is
designed to have a mean lifetime of 12,000 hours. The standard deviation of the lifetimes
is 1,400 hours. Tests on a random sample of 50 bulbs from a certain batch give a mean
lifetime of 11,500 hours.
(i) Test at the 1% level of significance whether this particular batch is substandard (that
is, the mean lifetime of bulbs in the batch is less than 12,000 hours). [4]
Tests on a random sample of 50 bulbs from another batch give a mean lifetime of T hours.
A test at the 5% level of significance does not indicate that this batch is substandard.
(ii) Obtain an equation for the least possible value of T , and solve it. [4]
Exercise 185. (8864 N2011/I/10. Answer on p. 395.) Jon attempts a puzzle in his daily
newspaper each day. The probability that he will complete the puzzle on any given day is
0.8, independently of any other day.
(i) Find the probability that, in a given week of 7 days, Jon will complete the puzzle
(a) exactly 3 times, [1]
(b) at least 5 times. [2]
(ii) Find the probability that, over a period of 10 weeks, Jon completes the puzzle at least
5 times each week. [2]
(iii) Using a suitable approximation, find the probability that, over a period of 10 weeks,
Jon completes the puzzle at least 50 times in total. State the mean and variance of the
approximation. [4]
One of the boxes is selected by tossing two fair coins. If both coins show heads, box A is
selected and otherwise box B is selected.
(i) One ball is chosen at random from the selected box and the colour of the ball is noted.
(a) Draw a tree diagram to represent this situation. [3]
(b) Find the probability that a red ball is chosen. [2]
(c) Given that a red ball is chosen, find the probability that it comes from box A. [2]
(ii) Instead, two balls are chosen at random, without replacement, from the selected box.
Find the probability that both balls are the same colour. [4]
Exercise 187. (8864 N2011/I/12. Answer on p. 396.) Boys and girls visiting a theme
park have masses, in kg, that are independent and are normally distributed with means
and standard deviations as shown in the following table.
(i) Find the probability that the mass of a boy chosen at random is between 50 kg and 70
kg. [2]
(ii) A boy and a girl are chosen at random. Find the probability that the mass of the boy is
greater than the mass of the girl, stating clearly the mean and variance of the distribution
that you use. [4]
(iii) On a ride at the theme park, trains carrying up to 5 people travel around a track. The
total mass of the people on the train must not exceed the safety limit of 300 kg. Three
boys and two girls are chosen at random. Find the probability that their total mass is less
than 300 kg, stating clearly the mean and variance of the distribution that you use. [4]
(iv) The track is improved and new trains carrying up to 6 people are designed. The new
safety limit is L kg. Obtain the equation for L, given that it is 95% certain that 6 boys
chosen at random have a total mass not exceeding L kg. Hence find L. [3]
Exercise 188. (8864 N2010/I/6. Answer on p. 396.) The events A and B are such that
P(A) = 0.6, P(B) = 0.3 and P(A∣B) = 0.2. Find the probability that (i) both A and B
occur, [1] (ii) at least one of A and B occurs, [2] (iii) exactly one of A and B occurs. [2]
(i) Find the probability that a randomly chosen student fails the examination at both
attempts. [1]
(ii) Given that a student passes the examination, find the probability that it is at the second
attempt. [3]
(iii) Three students taking the examination are chosen at random. Find the probability
that two of them pass at the first attempt and the other passes at the second attempt. [3]
Exercise 190. (8864 N2010/I/8. Answer on p. 397.) A college has 1,400 students in Year
One, 900 students in Year Two and 700 students in Year Three. It is intended to carry out
a survey to investigate how much students spend on new clothes each year.
(i) Describe how to obtain a stratified random sample of 60 students to take part in the
survey. [2]
(ii) Describe, in this context, one advantage that stratified sampling has compared to simple
random sampling. [1]
The amount of money spent by a student is denoted by $X. The values for a (non-stratified)
random sample of 50 students are summarised by ∑x = 10,450, ∑x² = 2,235,000. The
population mean and variance of X are denoted by µ and σ² respectively.
(iii) Calculate unbiased estimates of µ and σ². [3]
A significance test of the null hypothesis µ = 200 against the alternative hypothesis µ > 200
is carried out at the 10% level of significance.
(iv) Without doing any further calculations, state two assumptions or approximations that
are involved when carrying out the significance test using the above sample data. [2]
Exercise 192. (8864 N2010/I/10. Answer on p. 397.) A factory produces components for
an electrical product. The masses of the components are normally distributed with standard
deviation 1.2 grams. The factory owner claims that the mean mass of the components is
15 grams. A random sample of 80 components was taken and found to have a mean mass
of 15.25 grams. (i) Test the owner’s claim at the 5% level of significance. [4]
The owner purchases new machinery to produce the components, and the standard
deviation remains unchanged. The owner claims the mean mass is now less than 15 grams. A
new random sample of 80 components is taken.
(ii) Find the set of values within which the mean mass of this sample must lie for the
owner’s new claim to be accepted at the 5% level of significance. [5]
Exercise 193. (8864 N2010/I/11. Answer on p. 398.) (a) Eight pairs of values of variables
x and y are measured. Draw a sketch of a possible scatter diagram of the data for each of
the following cases: the product moment correlation coefficient is approximately
x 18 20 22 27 35 45 55
y 2.55 2.65 2.85 3.15 4.76 5.45 6.26
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient. [1]
(iii) Find the equation of the regression line of y on x in the form y = mx + c, giving the
values of m and c correct to 4 decimal places. [1]
(iv) Calculate an estimate of the monthly earnings of a 40-year-old worker. State why you
would expect this to be a reliable estimate. [2]
(v) All workers are given an increase of N thousand dollars per month. Without any further
calculations state any change you would expect in the values of your constants m and c
found in part (iii). [2]
(i) Find the probability that an individual unwrapped sweet has mass less than 36 grams.
[1]
(ii) State the mean and variance of the mass of an individual wrapped sweet. Find the
probability that a wrapped sweet has mass between 42 grams and 46 grams. [3]
Twelve wrapped sweets are packed together in a cardboard tube. The mass of an empty
tube is normally distributed with mean 50 grams and standard deviation 5 grams. The
masses of all sweets and tubes are independent.
(iii) Find the probability that the total mass of a tube containing 12 wrapped sweets is
more than 600 grams, stating clearly the mean and variance of the distribution that you
use. [4]
A rival company produces similar tubes of sweets. The masses of these tubes of sweets
have a normal distribution. Over a long period of time, it is found that 5% of them have a
mass less than 450 grams and 8% have a mass more than 550 grams.
(iv) Find the mean and variance of this distribution. [5]
The probabilities of each researcher being in the office when the phone rings are as follows.
All the probabilities are independent. Find the probability that, when the phone rings,
(i) the call is for A and A is in the office, [1]
(ii) the researcher being called is in the office, [2]
(iii) the call is for C, given that the researcher being called is not in the office. [2]
(i) Find the probability that the lifetime of a component is more than 144 days. [2]
(ii) Two components are chosen at random. Find the probability that one has a lifetime of
more than 144 days and one has a lifetime of less than 144 days. [2]
A company develops a new design for the component. The standard deviation of the
lifetimes remains 18 days, but the company claims that the mean lifetime is longer than
for the old components. From a random sample of 50 components of the new design, the
sample mean is 124 days.
(iii) Test at the 5% level of significance whether there is sufficient evidence to support the
company’s claim. [4]
Exercise 198. (8864 N2009/I/9. Answer on p. 400.) A liquid nutrient is added to the
soil around the fruit trees in an orchard, with the aim of increasing the total weight of fruit
produced by the trees. For each of 8 trees, the volume of liquid nutrient, x cm3 , and the
corresponding weight, y kg, of fruit per tree is given in the table below.
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator.
(ii) Calculate the product moment correlation coefficient and comment on its value in the
context of the data. [2]
(iii) Calculate the equation of the regression line of y on x. Sketch this line on your scatter
diagram. [2]
(iv) Estimate the weight of fruit on a tree when 135 cm³ of liquid nutrient is added to its soil.
[1]
(v) Explain why it might be unsuitable to use the equation in part (iii) to estimate how
much liquid nutrient would be needed for a tree to yield 20 kg of fruit. [1]
(i) Find the probability that, in a group of 10 randomly chosen candidates who take the
examination, exactly 2 will fail. [2]
(ii) It is given that 15% of the candidates who pass the piano examination are awarded a
distinction. Find the probability that, in a randomly chosen group of 10 candidates who
take the examination, fewer than 2 will be awarded a distinction. [3]
(iii) Use a suitable approximation to estimate the probability that, in a group of 50 randomly
chosen candidates who take the examination, at most 12 will fail. You should state the
mean and variance of the distribution used in the approximation. [4]
Exercise 200. (8864 N2009/I/11. Answer on p. 401.) (a) An insurance company receives
a large number of claims for flood damage. On a particular day the company receives 72
such claims. Because of staff shortages, it is only possible to process 8 of these claims.
Parts (a)(i) and (a)(ii) are no longer in the 8865 (revised) syllabus, so you can skip them.
(i) Describe how you would choose a systematic random sample of size 8 from the received
claims. [2]
(ii) Comment on whether this method of sampling gives a better indication of the value of
the 72 claims as compared to simply choosing as the sample the first 8 claims received. [1]
(b) From the claims received by the company, over a long period of time, a random sample
of 120 is taken. The values of the claims, $x, are summarised by
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) What do you understand by the term ‘unbiased estimate’? [1]
(iii) The population mean is denoted by $µ. Using the sample data, a significance test of
the null hypothesis µ = 1000 against the alternative hypothesis µ ≠ 1000 is carried out at
the α% level of significance. Find the set of values of α for which the null hypothesis will
be rejected. [5]
(b) The masses, in kilograms, of apples and nectarines sold by the supermarket have independent
normal distributions with means and standard deviations as shown in the following
table.
(i) Two apples and four nectarines are chosen at random. Find the probability that the
total mass of the two apples is greater than the total mass of the four nectarines. [4]
(ii) Apples cost $9 per kilogram and nectarines cost $12 per kilogram. Find the mean
and the variance of the total cost of two apples and four nectarines and hence find the
probability that the total cost is between $5 and $6. [5]
(i) Give a reason why a normal distribution, with this mean and standard deviation, would
not give a good approximation to the distribution of marks. [1]
(ii) A random sample of 50 of the candidates is taken. Calculate the probability that the
mean mark of this sample lies between 70.0 and 75.0. [3]
Exercise 203. (8864 N2008/I/8. Answer on p. 402.) A baker makes loaves of bread. 60%
of the loaves that he makes are ‘crusty’.
(i) A customer buys six randomly chosen loaves. Find the probability that exactly three of
them are crusty. [2]
(ii) A market trader buys 40 randomly chosen loaves. Use a suitable approximation to find
the probability that at least 20 of them are crusty. [4]
(iii) The mass of a loaf has a normal distribution with mean 1.24 kg and standard deviation
σ kg. The probability that a randomly chosen loaf has mass less than 1 kg is 0.04. Find
the value of σ. [3]
∑ x = 10317, ∑ x² = 1540231.
The population mean lifetime is denoted by µ hours. The null hypothesis µ = 150 is to be
tested against the alternative hypothesis µ < 150.
(i) Find the p-value of the test and state the meaning of this p-value in the context of the
question. [5]
A second random sample of 50 batteries of this type is tested and the lifetime, y hours, of
each battery is measured, with results summarised by
∑ y = 7331, ∑ y² = 1100565.
(ii) Combining the two samples into a single sample, carry out a test, at the 10% significance
level, of the same null and alternative hypotheses. [6]
x 15 17 13 21 16 22 14 18
y 290 350 270 430 340 410 300 360
(i) Give a sketch of the scatter diagram for the data as shown on your calculator. [2]
(ii) Find x̄ and ȳ, and mark the point (x̄, ȳ) on your scatter diagram. [2]
(iii) Calculate the equation of the regression line of y on x, and draw this line on your
scatter diagram. [2]
(iv) Calculate the product moment correlation coefficient, and comment on its value in
relation to your scatter diagram. [2]
(v) For the next three-month period, the sales target is 20 cranes. Estimate the corresponding
profit. [2]
(vi) The company’s sales director uses the regression line in part (iii) to predict the profit
if 40 cranes were to be sold in a three-month period. Comment on the validity of this
prediction. [2]
(i) Find the probability that a randomly chosen small bag has a mass exceeding 1.2 kg. [4]
(ii) Find the probability that the total mass of two randomly chosen small bags is within
±0.2 kg of the mass of a randomly chosen large bag. [4]
Lee buys two small bags at $1.50 per kg, and Foo buys one large bag at $1.20 per kg.
(iii) Find the probability that Lee pays at least $0.50 more than Foo. [6]
(i) Find the proportion of packets which contain less than 500 g of margarine. [2]
The manufacturer increases the mean amount of margarine in a packet to µ g. The standard
deviation remains unchanged. Only 1 packet in 1000, on average, now contains less than
500 g.
(ii) Find µ, correct to 1 decimal place. [3]
The headteacher wants to find out what students think of the lunches provided in the
canteen. On one particular day she selects a sample of students to interview from those
buying their lunch by
(ii) State one advantage and one disadvantage of the sampling method used in this context.
[2]
(iii) Describe an alternative sampling method which would be better in this case. [2]
Exercise 210. (8864 N2007/I/8. Answer on p. 404.) Seven cities in a certain country are
linked by rail to the capital city. The table below shows the distance of each city from the
capital and the rail fare from the city to the capital.
City A B C D E F G
Distance, x km 124 44 76 148 16 180 104
Rail fare, $y 156 53 99 169 23 177 138
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Calculate the product moment correlation coefficient. [1]
You are given that the regression line of y on x has equation y = 16.7 + 1.01x, where the
coefficients are given correct to 3 significant figures.
(iii) Calculate the equation of the regression line of x on y, giving your answer in the form
x = a + by. [1]
(iv) Use the appropriate regression line to estimate
(a) the rail fare from a city that is 28 km from the capital, [2]
(b) the distance of a city from the capital if the rail fare is $198. [2]
(v) Comment briefly on the reliability of the estimates in part (iv). [2]
(iii) The mean and standard deviation of X are denoted by µ and σ respectively. Find
P(µ − σ < X < µ + σ), correct to 2 decimal places. [5]
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) Assuming a normal distribution, test at the 5% significance level whether the population
mean volume is less than 500 ml. [4]
(iii) State, giving a reason, whether it is necessary to assume a normal distribution for the
test to be valid. [1]
Exercise 213. (8864 N2007/I/11. Answer on p. 405.) The table below shows the results
of a survey of the 120 cars in a car park, in which the colour of each car and the gender of
the driver were recorded.
Male Female
Green 18 12
Blue 48 22
Red 6 14
One of the cars is selected at random. M is the event that the car selected has a male
owner. G is the event that the car selected is green. B is the event that the car selected is
blue. R is the event that the car selected is red.
(i) Find the following probabilities: (a) P(M ), (b) P(M ∩ G), (c) P(M ∪ B), (d) P(M ∣R′ ).
[1 mark each.]
(ii) Determine whether the events M and G are independent, justifying your answer. [2]
It is given that bicycle racks are fitted to 20% of the green cars, 30% of the blue cars and
5% of the red cars. One of the cars is selected at random and found to have a bicycle rack
fitted. (iii) What is the probability that it is a blue car? [5]
(i) Two men are chosen at random. Find the probability that one of the men has mass
more than 90 kg and the other has mass less than 90 kg. [4]
(ii) One man and one woman are chosen at random. Find the probability that the woman’s
mass is greater than the man’s. [4]
The safety limit for a hotel elevator is 530 kg. (iii) Six men are chosen at random. Find
the probability that their total mass is greater than 530 kg. [4]
(iv) Six male hotel guests enter the elevator, at a time when a large number of sumo
wrestlers are staying at the hotel. Give two reasons why the probability that their total
mass exceeds 530 kg may be different from the value calculated in part (iii). [2]
Exercise 215. (8174 N2006/II/8. Answer on p. 406.) A and B are independent events
such that P(A) = 0.6 and P(A ∪ B) = 0.7. Find P(A ∩ B ′ ). [6]
Exercise 216. (8174 N2006/II/9. Answer on p. 406.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Some students are conducting a survey at a
sports club. They each question a sample of the club members.
(i) Anil decides to choose the first 20 men and the first 20 women he sees. What name is
given to this type of sampling? [1]
(ii) Betty decides to choose every tenth person on the membership list. What name is given
to this type of sampling? [1]
(iii) Calvin decides to use random sampling. Describe briefly one way in which he could
select his sample. [2]
The club has 240 members and 3 sections — badminton, squash and tennis. The table
shows the number of men and women in each section.
(iv) Dennis decides to take a stratified sample of size 60 from the total membership. (a)
How many women does he select? (b) How many men from the squash section does he
select? [3]
(i) Find the probability that exactly 4 out of 12 residents watch the programme. [3]
(ii) Use a suitable approximation to find the probability that, out of 80 residents, more
than 20 but less than 30 watch the programme. [7]
Exercise 218. (8174 N2006/II/14. Answer on p. 407.) A team either wins or loses each
of their matches. If the team wins a match, the probability that it wins the next match
is 0.8. If the team loses a match, the probability that it wins the next match is 0.4. The
team plays 4 matches in total. The team wins the first match. Calculate the probability
that the team wins
Exercise 219. (8174 N2006/II/14-OR. Answer on p. 407.) The heights of male students
in a college can be modelled using a normal distribution with mean 176 cm and standard
deviation 4 cm.
(i) Calculate the probability that one of these students, chosen at random, is less than 170
cm tall. [2]
(ii) Find the height that is exceeded by 10% of these students. [2]
In another college there are 1000 female students. Of these, 6 are less than 150 cm tall and
883 of them are less than 175 cm tall.
(iii) Assuming the heights of these students can be modelled using a normal distribution
with mean m and standard deviation s, find the value of m and of s. [6]
Answers to Exercises
My answers here are often more verbose than what would be necessary for you to get the
full credit on an exam. The reason is to help you understand my answers better.
Answer to Exercise 2. Given f (x) = 7x−3, we have f (0) = 7⋅0−3 = −3, f (1) = 7⋅1−3 = 4,
and f (2) = 7 ⋅ 2 − 3 = 11.
Answer to Exercise 3. Given g (the function that maps each country to its capital), we
have g(France) = Paris and g(Japan) = Tokyo.
[Graph omitted.]
(ii) [Graph omitted.]
[Graph omitted.]
(ii) [Graph omitted.]
(ii) (a) \(ax^2 + bx + c\) is positive for all possible values of x if and only if a > 0 (so ∪-shaped)
and \(b^2 - 4ac < 0\) (so doesn’t touch x-axis).
(b) \(ax^2 + bx + c\) is negative for all possible values of x if and only if a < 0 (so ∩-shaped) and
\(b^2 - 4ac < 0\) (so doesn’t touch x-axis).
Answer to Exercise 8.
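\[
\begin{aligned}
&= \frac{5^{2+x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})} && \text{(Add the exponents)} \\
&= \frac{5^{2+x}}{5^{2x+1} + 3(5^{2x}) + 17(5^{2x})} && (\because 25 = 5^2) \\
&= \frac{5^{2+x}}{5^{2x}(5^1 + 3 + 17)} && \text{(Factorise out } 5^{2x}\text{)} \\
&= \frac{5^{2+x}}{5^{2x}(25)} = \frac{5^x}{5^{2x}} = \frac{1}{5^x} = 5^{-x}.
\end{aligned}
\]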
b = 4. Then \(x^{(a^b)} = 2^{(3^4)} = 2^{81}\), but \(x^{ab} = 2^{3 \times 4} = 2^{12}\) – the two are clearly not equal.
(ii) \((x^a)^b = x^{ab}\) is true, as we now prove:
\[
(x^a)^b = \overbrace{(x^a) \cdot (x^a) \cdots (x^a)}^{b \text{ times}}
= \overbrace{\Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big) \cdot \Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big) \cdots \Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big)}^{b \text{ times}}
= x^{ab}.
\]
(ii) [Graph omitted.]
(iii) In 2025, there’ll be b2025 = 128 Singaporean billionaires.
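Answer to Exercise 12. (i) \(\ln(1/e^2) = -2\), \(\log_5 0.008 = -3\) (because \(0.008 = 1/125 = 5^{-3}\)), \(\lg 100000 = 5\).
(ii) \(\log_a 16 = 4 \implies a = 2\).
\(\log_b 0.25 = -1 \implies b = 4\).
\(\log_c 5 = 1 \implies c = 5\).
(iii) \(y = 3^x \implies \log_3 y = x\).
\(5 = p^q \implies \log_p 5 = q\).
(iv) \(\alpha = \log_4 \beta \implies 4^\alpha = \beta\).
\(\log_\gamma \delta = 17 \implies \gamma^{17} = \delta\).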
(ii) \(2\log_a 7 + 0.25\log_a 81 - \log_a 3 = \log_a 49 + \log_a 81^{1/4} - \log_a 3 = \log_a 49 + \log_a 3 - \log_a 3 = \log_a 49 = \log_a x\). So x = 49.
\[
y = \frac{1 \pm \sqrt{(-1)^2 - 4(1)(-e^2)}}{2(1)} = \frac{1 \pm \sqrt{1 + 4e^2}}{2}.
\]
We know that y must be positive, so it must be that \(y = (1 + \sqrt{1 + 4e^2})/2\).
[Graph of \(y = e^x\) omitted.]
There are infinitely many lines of symmetry — specifically, every line that is perpendicular
to the graph is a line of symmetry.
[Graph of \(y = 3x + 2\) omitted.]
[Graph of \(y = 2x^2 + 1\) omitted.]
\[
x = \frac{-24 \pm \sqrt{24^2 - 4(5)(16)}}{2(5)} = \frac{-24 \pm \sqrt{256}}{10} = -4, \; -0.8.
\]
So there are two solutions to the given pair of simultaneous equations, namely (x, y) =
(−4, −0.4) and (x, y) = (−0.8, −0.24). TI84 screenshots:
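(ii) Rearrange \(\overset{1}{=}\) into \(y \overset{3}{=} 1 - 4x\). Rearrange \(\overset{2}{=}\) into \(y \overset{4}{=} -2x^2 + 5x - 3\).
Now plug \(\overset{3}{=}\) into \(\overset{4}{=}\) to get \(1 - 4x = -2x^2 + 5x - 3\) or \(2x^2 - 9x + 4 = 0\) or \((2x - 1)(x - 4) = 0\). So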
x = 0.5 or x = 4.
Correspondingly, y = −1 or y = −15.
So there are two solutions to the given pair of simultaneous equations, namely (x, y) =
(0.5, −1) and (x, y) = (4, −15). TI84 screenshots:
It looks like there are no horizontal intercepts. Conclusion: This system of equations has
no solutions.
[Screenshot after Step 1 omitted.]
Answer to Exercise 18 (b). The system of equations is \(y = \dfrac{1}{1 - x^2}\), \(y = x^3 + \sin x\).
Rewrite the two equations into a new equation \(y = \dfrac{1}{1 - x^2} - x^3 - \sin x\).
Our goal is to find the horizontal intercepts of this equation. These horizontal intercepts
will give us the solutions to the above system of equations.
1. Graph the equation \(y = \dfrac{1}{1 - x^2} - x^3 - \sin x\).
It is −1.1790. Conclusion: This system of equations has one solution and its x-coordinate is
−1.1790. To find the corresponding y-coordinate, we need merely plug this value of x into
either of the equations in the original system of equations:
\[
y = \frac{1}{1 - x^2} = \frac{1}{1 - (-1.1790)^2} \approx -2.5633.
\]
Altogether, this system of equations has one solution: (−1.1790, −2.5633).
\[
x = \frac{-3 \pm \sqrt{3^2 - 4(3)(-11)}}{2(3)} = \frac{-3 \pm \sqrt{141}}{6}.
\]
Thus, the inequality holds if \(x < (-3 - \sqrt{141})/6\) or \(x > (-3 + \sqrt{141})/6\).
(ii) Rearrange the inequality \((x - 3)(x + 5) < 1\) into \(x^2 + 2x - 16 < 0\). The expression \(x^2 + 2x - 16\)
is a ∪-shaped quadratic that equals 0 when
\[
x = \frac{-2 \pm \sqrt{2^2 - 4(1)(-16)}}{2(1)} = \frac{-2 \pm \sqrt{68}}{2} = -1 \pm \sqrt{17}.
\]
Thus, the inequality holds if \(-1 - \sqrt{17} < x < -1 + \sqrt{17}\).
Answer to Exercise 20 (b) Rewrite the inequality as \(\sqrt{x} - \cos x > 0\). Graph \(y = \sqrt{x} - \cos x\).
\(\sqrt{x} - \cos x = 0 \iff x \approx 0.6417\). Thus, \(\sqrt{x} > \cos x \iff x > 0.6417\).
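If you do not have a graphing calculator at hand, the root of \(\sqrt{x} - \cos x = 0\) can also be located by bisection. The sketch below is only an illustration (the function name `bisect` and the starting interval [0, 1] are my own choices, not part of the exercise); it relies on the expression being negative at x = 0 and positive at x = 1.

```python
import math

def bisect(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    if f(lo) * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval on which the sign change occurs.
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# Assumed bracket: sqrt(0) - cos(0) = -1 < 0 and sqrt(1) - cos(1) > 0.
root = bisect(lambda x: math.sqrt(x) - math.cos(x), 0.0, 1.0)
print(round(root, 4))
```

The printed value should agree with the calculator answer to 4 decimal places.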
Beng is 32 years old today. And from \(\overset{3}{=}\), Apu is 64 years old today.
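Answer to Exercise 22. The given information provides this system of equations:
\[
a(1)^2 + b(1) + c \overset{1}{=} 2, \qquad a(3)^2 + b(3) + c \overset{2}{=} 5, \qquad a(6)^2 + b(6) + c \overset{3}{=} 9.
\]
You can solve this system of equations either by calculator or by hand, as I do now:
Take \(\overset{2}{=}\) minus \(\overset{1}{=}\) to get \(8a + 2b = 3\) or \(b \overset{4}{=} 0.5(3 - 8a) = 1.5 - 4a\).
Plug \(\overset{4}{=}\) into \(\overset{1}{=}\) to get \(a + 1.5 - 4a + c = 2\) or \(c \overset{5}{=} 0.5 + 3a\).
Plug \(\overset{4}{=}\) and \(\overset{5}{=}\) into \(\overset{3}{=}\) to get \(36a + 6(1.5 - 4a) + (0.5 + 3a) = 9\), so \(15a = -0.5\) and \(a = -1/30\).
Now from \(\overset{4}{=}\), b = 49/30 and from \(\overset{5}{=}\), c = 0.4.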
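To double-check the hand calculation, here is a minimal sketch that solves the same three equations by Gaussian elimination with exact fractions. The helper name `solve_quadratic_through` is hypothetical (my own, for illustration); the three points (1, 2), (3, 5), (6, 9) are read off the system above.

```python
from fractions import Fraction

# Points the quadratic y = a*x**2 + b*x + c passes through (from the system above).
points = [(1, 2), (3, 5), (6, 9)]

def solve_quadratic_through(points):
    """Gauss-Jordan elimination on the 3x3 system, using exact fractions."""
    rows = [[Fraction(x * x), Fraction(x), Fraction(1), Fraction(y)] for x, y in points]
    n = 3
    for i in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(i, n) if rows[r][i] != 0)
        rows[i], rows[pivot] = rows[pivot], rows[i]
        # Scale the pivot row so that the pivot equals 1.
        pivot_value = rows[i][i]
        rows[i] = [v / pivot_value for v in rows[i]]
        # Eliminate column i from every other row.
        for r in range(n):
            if r != i:
                factor = rows[r][i]
                rows[r] = [vr - factor * vi for vr, vi in zip(rows[r], rows[i])]
    return tuple(row[3] for row in rows)

a, b, c = solve_quadratic_through(points)
print(a, b, c)
```

Using `Fraction` instead of floats keeps the answers in the same exact form as the hand solution (c = 2/5 is the same as 0.4).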
Answer to Exercise 23. The turning point (which is a minimum turning point if a is
positive) of the equation is at
\[
x = -\frac{b}{2a} \quad \text{and} \quad y = a\left(-\frac{b}{2a}\right)^2 + b\left(-\frac{b}{2a}\right) + c = \frac{b^2}{4a} - \frac{b^2}{2a} + c = c - \frac{b^2}{4a}.
\]
We know that at the minimum point, x = 0 and y = 0. So b = 0 and c = 0. Since (−1, 2)
satisfies the equation \(y = ax^2 + bx + c\), we also have \(a(-1)^2 + b(-1) + c = 2\). Thus, a = 2.
y − (e + 1) = (3 + e) (x − 1) .
Or rearranging: y = (3 + e)x − 2.
y − (ln 2 + e2 + 4) = (4.5 + e2 ) (x − 2) .
Or rearranging: y = (4.5 + e2 ) x − 5 − e2 + ln 2.
y − (2 + 7e) = (2 + 7e) (x − 1) .
Or rearranging: y = (2 + 7e)x.
\(g(2) = 1/2 + 2^3 + 7e^2 = 8.5 + 7e^2\) and \(g'(2) = -1/2^2 + 3 \cdot 2^2 + 7e^2 = 11.75 + 7e^2\).
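So the equation of the tangent at the point (1, −26) is y − (−26) = 84.5 (x − 1). Or rearranging:
y = 84.5x − 110.5. And at x = 2:
\[
y = 13\left(\sqrt{2} - \frac{3}{2^2}\right) = 13\left(\sqrt{2} - \frac{3}{4}\right) \quad \text{and} \quad \frac{dy}{dx} = 13\left(0.5 \cdot 2^{-0.5} + 2 \cdot \frac{3}{2^3}\right) = 13\left(\frac{1}{2\sqrt{2}} + \frac{3}{4}\right).
\]
So the equation of the tangent at the point \((2, 13(\sqrt{2} - 3/4))\) is
\[
y - \left[13\left(\sqrt{2} - \frac{3}{4}\right)\right] = 13\left(\frac{1}{2\sqrt{2}} + \frac{3}{4}\right)(x - 2) \quad \text{or} \quad y = 13\left(\frac{1}{2\sqrt{2}} + \frac{3}{4}\right)x + 13\left(\sqrt{2} - \frac{9}{4} - \frac{1}{\sqrt{2}}\right).
\]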
(ii) f ′ (x) = 6x − 4 is negative for x < 2/3, equal to 0 at x = 2/3, and positive for x > 2/3.
From a graph sketch, we see that there are minimum turning points at x = ±1 and a
maximum turning point at x = 0.
(v) \(j'(x) = 3x^2 + 2x - 1 = (3x - 1)(x + 1)\). So the only two stationary points are at x = 1/3
and x = −1.
From a graph sketch, we see that there is a maximum turning point at x = −1 and a
minimum turning point at x = 1/3.
(ii) \(g'(x) = 3x^2 - 6x + 3 = 3(x^2 - 2x + 1) = 3(x - 1)^2\). The only stationary point is at x = 1.
From a graph sketch, it is a point of inflexion.
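(b) By the Pythagorean Theorem, \(l = \sqrt{r^2 + h^2} = \sqrt{\dfrac{3}{\pi h} + h^2}\).
(c) The total external surface area of the cone (including the base) is
\[
A = \pi r l = \pi\sqrt{\frac{3}{\pi h}}\sqrt{\frac{3}{\pi h} + h^2} = \pi\sqrt{\frac{9}{\pi^2 h^2} + \frac{3h}{\pi}} = \sqrt{\frac{9}{h^2} + 3\pi h}.
\]
(d) Compute
\[
\frac{dA}{dh} = \frac{\dfrac{-18}{h^3} + 3\pi}{2\sqrt{\dfrac{9}{h^2} + 3\pi h}} = \frac{3}{2} \cdot \frac{\pi - \dfrac{6}{h^3}}{\sqrt{\dfrac{9}{h^2} + 3\pi h}} = \frac{3}{2} \cdot \frac{\pi - \dfrac{6}{h^3}}{A}.
\]
So \(\dfrac{dA}{dh} = 0 \iff h = \left(\dfrac{6}{\pi}\right)^{1/3} \approx 1.24\) m.
(e) Graph \(A = \sqrt{9/h^2 + 3\pi h}\) on your graphing calculator. (This is simply the expression we
found in part (c).)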
(Ignore the region where h < 0 since the height of the cone cannot be negative.)
Zoom in to verify that the stationary point we found in part (d) is indeed a minimum
turning point.
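The same check can be sketched numerically without a graphing calculator. The snippet below is only an illustration: it evaluates the expression for A from part (c) at the stationary point from part (d) and at a few nearby heights (the offsets ±0.01 and ±0.1 are arbitrary choices of mine).

```python
import math

def surface_area(h):
    """A(h) = sqrt(9/h**2 + 3*pi*h), the expression from part (c)."""
    return math.sqrt(9 / h**2 + 3 * math.pi * h)

h_star = (6 / math.pi) ** (1 / 3)  # stationary point from part (d)

# Compare A at h_star with A at nearby heights on both sides.
nearby = [h_star + d for d in (-0.1, -0.01, 0.01, 0.1)]
is_local_min = all(surface_area(h_star) < surface_area(h) for h in nearby)

print(round(h_star, 2), is_local_min)
```

If A is smaller at the stationary point than at every nearby height, that is consistent with a minimum turning point.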
Answer to Exercise 32. The function f defined by f (x) = 2x has indefinite integrals
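1. \(\dfrac{d}{dx}(kx + C) = k \implies \int k \, dx = kx + C\), ✓
2. \(\dfrac{d}{dx}\left(\dfrac{x^{k+1}}{k+1} + C\right) = x^k \implies \int x^k \, dx = \dfrac{x^{k+1}}{k+1} + C\), ✓
4. \(\dfrac{d}{dx}(e^x + C) = e^x \implies \int e^x \, dx = e^x + C\), ✓
6. \(\dfrac{d}{dx}\left[\dfrac{1}{a}e^{ax+b} + C\right] = e^{ax+b} \implies \int e^{ax+b} \, dx = \dfrac{1}{a}e^{ax+b} + C\), ✓
7. \(\dfrac{d}{dx}[f(x) \pm g(x) + C] = f'(x) \pm g'(x) \implies \int f'(x) \pm g'(x) \, dx = f(x) \pm g(x) + C\), ✓
8. \(\dfrac{d}{dx}[kf(x) + C] = kf'(x) \implies \int kf'(x) \, dx = kf(x) + C\). ✓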
Answer to Exercise 34. (i) \(\int 7x^5 - 8x^4 + 3x^2 + 2 \, dx = 7x^6/6 - 8x^5/5 + x^3 + 2x + C\), where
C is the constant of integration.
Answer to Exercise 35. (i) \(\displaystyle\int_1^2 y \, dx = \int_1^2 6 \, dx = [6x]_1^2 = 12 - 6 = 6\).
(ii) \(\displaystyle\int_1^2 y \, dx = \int_1^2 x^2 + 5x + 10 \, dx = \left[\frac{x^3}{3} + \frac{5x^2}{2} + 10x\right]_1^2 = \frac{8}{3} + 10 + 20 - \left(\frac{1}{3} + \frac{5}{2} + 10\right) = 19\frac{5}{6}\).
(iii) \(\displaystyle\int_1^2 y \, dx = \int_1^2 1/x \, dx = [\ln|x|]_1^2 = \ln 2 - \ln 1 = \ln 2\).
\[
A = A + B + C + D - (B + C + D) = 2^{4/3} - \left(1 + \frac{2^{4/3} - 1}{4}\right) = \frac{3}{4}\left(2^{4/3} - 1\right).
\]
[Figure omitted: regions A, B, C and D, together with the lines y = 1 and y = 2.]
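Answer to Exercise 38. The two curves intersect at \(x = \pm\sqrt{2}/2\) (quadratic formula). So
\[
A = \int_{-\sqrt{2}/2}^{\sqrt{2}/2} 2 - x^2 - (x^2 + 1) \, dx = \left[x - \frac{2x^3}{3}\right]_{-\sqrt{2}/2}^{\sqrt{2}/2} = \left[\frac{\sqrt{2}}{2} - \frac{2\sqrt{2}}{12}\right] - \left[-\frac{\sqrt{2}}{2} + \frac{2\sqrt{2}}{12}\right] = \frac{2\sqrt{2}}{3}.
\]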
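As a rough numerical cross-check of this area, one could integrate the gap between the two curves with Simpson's rule. This is a sketch under my own assumptions: the helper `simpson` and the choice of 100 subintervals are illustrative, and the two curves \(y = 2 - x^2\) and \(y = x^2 + 1\) are read off the integrand above.

```python
import math

def simpson(f, lo, hi, n=100):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2.
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def upper(x):
    return 2 - x**2

def lower(x):
    return x**2 + 1

r = math.sqrt(2) / 2  # the curves intersect at x = -r and x = r

area = simpson(lambda x: upper(x) - lower(x), -r, r)
print(area, 2 * math.sqrt(2) / 3)
```

The two printed numbers should agree up to floating-point error, since Simpson's rule is exact for quadratic integrands.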
Answer to Exercise 39. Taking the green path, there are 3 ways. Taking the red path,
there are 2 ways. Hence, there are 3 + 2 = 5 ways to get from the Starting Point to the
River.
Case #2(ii). Second letter is a D. Then the last two letters must be either DE or ED. (2
permutations.)
Altogether then, there are 1 + 2 + 1 + 2 = 6 possible permutations of the letters in DEED.
____
1 2 3 4
Thus, by the MP, there are 10 × 9 × 8 × 7 = 5040 ways to choose the first 4D number.
If we ignored the fact that we already chose the first 4D number, then there’d similarly be
5040 ways to choose the second 4D number (given the condition that this second 4D number
does not have any repeated digits). However, there is an additional condition — namely,
the second 4D number cannot be the same as the first. Thus, there are 5040 − 1 = 5039
ways to choose the second 4D number.
By similar reasoning, we see that there are 5040 − 2 = 5038 ways to choose the third 4D
number.
Altogether then, by the MP, there are 5040 × 5039 × 5038 = 127,947,869,280 ways to choose
the three 4D numbers.
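The arithmetic in this answer is easy to reproduce. The short sketch below (variable names are mine) uses `math.perm`, available in Python 3.8 and later, to count the 4-digit numbers with no repeated digits, then multiplies as in the argument above.

```python
import math

# Number of ordered choices of 4 distinct digits out of 10: 10 * 9 * 8 * 7.
first = math.perm(10, 4)

# The second and third 4D numbers must each differ from those already chosen.
total = first * (first - 1) * (first - 2)

print(first, total)
```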
1. The food court and hawker centre share 2 types of cuisine (Chinese and Western) in
common. And so together, the food court and the hawker centre have 4 + 3 − 2 = 5
different types of cuisine.
2. Combine together the food court and the hawker centre (call this the “Low-Class Place”).
The Low-Class Place has 5 types of cuisine and shares 2 types of cuisine (Chinese and
Malay) with the restaurant. And so together, the Low-Class Place and restaurant have
5 + 3 − 2 = 6 different types of cuisine (namely Chinese, Indonesian, Japanese, Korean,
Malay, and Western).
There are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 1 × 4!3! = 144 possible ways to arrange the siblings in a line, so
that no two brothers are next to each other.
(b) First consider the problem of permuting the seven letters in BBBBSSS, without any
two S’s next to each other. We’ll use the AP.
1. B in position #1.
(a) B in position #2. Then the only way to fill the remaining five positions is SBSBS.
Total: 1 possible arrangement.
(b) S in position #2. Then we must have B in position #3.
i. B in position #4. Then the only way to fill the remaining three positions is
SBS. Total: 1 possible arrangement.
ii. S in position #4. Then we must have B in position #5. And there are two
ways to fill the remaining two positions: either BS or SB. Total: 2 possible
arrangements.
2. S in position #1. Then we must have B in position #2.
(a) B in position #3. Then, like in 1(b), we are left with two B’s and two S’s to fill
the remaining four positions. Hence, Total: 3 possible arrangements.
(b) S in position #3. Then we must have B in position #4. There are three ways
to fill the remaining three positions: SBB, BSB, and BBS. Total: 3 possible
arrangements.
Hence, there are in total 10 × 4!3! = 1440 possible ways to arrange the siblings in a line, so
that no two sisters are next to each other.
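Both counts can be confirmed by brute force. The following is a small sketch, not part of the intended method: it lists every distinct arrangement of the letters BBBBSSS, keeps those with no two adjacent copies of a given letter, and then multiplies by 4!3! as in the answers above. The function name `count_patterns` is hypothetical.

```python
from itertools import permutations
from math import factorial

def count_patterns(letters, banned):
    """Count distinct arrangements of `letters` with no two adjacent `banned` letters."""
    patterns = set(permutations(letters))  # the set removes duplicate arrangements
    return sum(
        1
        for p in patterns
        if all(not (p[i] == banned and p[i + 1] == banned) for i in range(len(p) - 1))
    )

no_adjacent_b = count_patterns("BBBBSSS", "B")  # no two brothers together
no_adjacent_s = count_patterns("BBBBSSS", "S")  # no two sisters together

ways = factorial(4) * factorial(3)  # orderings of the 4 brothers and 3 sisters
print(no_adjacent_b * ways, no_adjacent_s * ways)
```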
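\[
\begin{aligned}
\binom{n}{k} &= \frac{n!}{k!(n-k)!} \\
&= \frac{n \times (n-1) \times \cdots \times (n-k+1) \times (n-k) \times (n-k-1) \times \cdots \times 1}{k!\,(n-k) \times (n-k-1) \times \cdots \times 1} \\
&= \frac{n \times (n-1) \times (n-2) \times \cdots \times (n-k+1)}{k!} \quad \text{(mass cancellation)}.
\end{aligned}
\]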
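\[
C(4, 2) = \frac{4!}{2!(4-2)!} = \frac{4!}{2!2!} = \frac{4 \times 3}{2 \times 1} = 6,
\]
\[
C(6, 4) = \frac{6!}{4!(6-4)!} = \frac{6!}{4!2!} = \frac{6 \times 5}{2 \times 1} = 15,
\]
\[
C(7, 3) = \frac{7!}{3!(7-3)!} = \frac{7!}{3!4!} = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 35.
\]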
Answer to Exercise 51. \(\dbinom{3}{1}\dbinom{7}{2}\dbinom{5}{2} = 630\).
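\[
\begin{aligned}
\text{(c) } C(17, 2) + C(17, 3) &= \frac{17!}{2!15!} + \frac{17!}{3!14!} = \frac{17 \times 16}{2 \times 1} + \frac{17 \times 16 \times 15}{3 \times 2 \times 1} \\
&= 17 \times 8 + 17 \times 8 \times 5 = 17 \times 8 \times 6 = \frac{18 \times 17 \times 16}{3 \times 2 \times 1}.
\end{aligned}
\]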
Consider the 8 terms on the right. There is C(3, 0) = 1 way to choose 0 of the x’s. Hence,
the coefficient on \(x^0\) is C(3, 0); this corresponds to the term 1 ⋅ 1 ⋅ 1 above.
There are C(3, 1) = 3 ways to choose 1 of the x’s. Hence, the coefficient on \(x^1\) is C(3, 1);
this corresponds to the terms 1 ⋅ 1 ⋅ x, 1 ⋅ x ⋅ 1, and x ⋅ 1 ⋅ 1 above.
There are C(3, 2) = 3 ways to choose 2 of the x’s. Hence, the coefficient on \(x^2\) is C(3, 2);
this corresponds to the terms 1 ⋅ x ⋅ x, x ⋅ 1 ⋅ x, and x ⋅ x ⋅ 1 above.
There is C(3, 3) = 1 way to choose 3 of the x’s. Hence, the coefficient on \(x^3\) is C(3, 3);
this corresponds to the term x ⋅ x ⋅ x above.
Altogether then,
Answer to Exercise 57. (a) There are \(\dbinom{4}{2} = 6\) ways of choosing the two Tan sons
and \(\dbinom{3}{2} = 3\) ways of choosing the two Wong daughters.
Having chosen these sons and daughters, there are only 2! = 2 × 1 possible ways of matching
them up. This is because for the first chosen Tan son, we have 2 possible choices of brides
for him. And then for the second chosen Tan son, there is only 1 possible choice of bride
left for him.
Altogether then, there are \(\dbinom{4}{2}\dbinom{3}{2} \cdot 2 = 6 \cdot 3 \cdot 2 = 36\) ways of forming the two couples.
(b) There are \(\dbinom{6}{5} = 6\) ways of choosing the five Lee sons and \(\dbinom{9}{5} = 126\) ways of choosing
the five Ho daughters.
Having chosen these sons and daughters, there are 5! = 5 × 4 × 3 × 2 × 1 possible ways of
matching them up. This is because for the first chosen Lee son, we have 5 possible choices
of brides for him. And then for the second chosen Lee son, there are 4 possible choices of
brides left for him. Etc.
Altogether then, there are \(\dbinom{6}{5}\dbinom{9}{5} \cdot 5! = 6 \cdot 126 \cdot 5! = 90{,}720\) ways of forming the five
couples.
A♠, K♠, Q♠, . . . , 2♠, A♥, K♥, Q♥, . . . , 2♥, A♦, K♦, Q♦, . . . , 2♦, A♣, K♣, Q♣, . . . , 2♣.
P(A) = 1/4.
Answer to Exercise 59. A and B are not mutually exclusive. A and C are not mutually
exclusive. But B and C are mutually exclusive.
Answer to Exercise 60. B ′ is the event that the student has at least two phones. C ′
is the event that the student has zero, one, or at least three phones.
(b)
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/36}{5/36} = \frac{3}{5}.
\]
Answer to Exercise 66. First, note that P (H1 ) = P (T1 ) = P (H2 ) = 0.5.
(a) P (H1 ∩ H2 ) = 0.25 = 0.5 × 0.5 = P (H1 ) P (H2 ), so that indeed H1 and H2 are independent.
(b) P (H2 ∩ T1 ) = 0.25 = 0.5×0.5 = P (H2 ) P (T1 ), so that indeed H2 and T1 are independent.
(c) Observe that H1 ∩ T1 = ∅ (it is impossible that “the first coin flip is heads” AND also
“the first coin flip is tails”).
Hence, P (H1 ∩ T1 ) = P (∅) = 0 ≠ 0.25 = 0.5 × 0.5 = P (H1 ) P (T1 ), so that indeed H1 and T1
are not independent.
Answer to Exercise 67. No, the journalist is incorrectly assuming that the probability
of one family member making the NBA is independent of another family member making
the NBA. But such an assumption is almost certainly false.
The same excellent genes that made Rick Barry a great basketball player, probably also
helped his three sons. Not to mention that having an NBA player as your father probably
helps a lot too.
The two events “family member #1 in NBA” and “family member #2 in NBA” are probably
not independent. So we cannot simply multiply probabilities together.
Answer to Exercise 68. The possible observed values of X are 2, 3, 4, . . . , and 12.
(c) \(P(E) = P(X \geq 10) = P(X = 10) + P(X = 11) + P(X = 12) = \dfrac{3}{36} + \dfrac{2}{36} + \dfrac{1}{36} = \dfrac{6}{36} = \dfrac{1}{6}\).
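\[
\begin{aligned}
P(X + Y = 2) &= \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} + \binom{2}{1}\frac{1}{2} \cdot \frac{1}{2}\binom{2}{1}\frac{5}{6} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} \\
&= \frac{25}{144} + \frac{20}{144} + \frac{1}{144} = \frac{46}{144}.
\end{aligned}
\]
(b) P (X + Y = 3) is simply the probability of 2 heads and 1 six OR 1 head and 2 sixes. So
\[
P(X + Y = 3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \binom{2}{1}\frac{5}{6} \cdot \frac{1}{6} + \binom{2}{1}\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{10}{144} + \frac{2}{144} = \frac{12}{144}.
\]
\[
P(X + Y = 4) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{144}.
\]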
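\[
\begin{aligned}
\text{(d) } E[X + Y] &= P(X + Y = 0) \cdot 0 + P(X + Y = 1) \cdot 1 + P(X + Y = 2) \cdot 2 \\
&\quad + P(X + Y = 3) \cdot 3 + P(X + Y = 4) \cdot 4 \\
&= \frac{25}{144} \cdot 0 + \frac{60}{144} \cdot 1 + \frac{46}{144} \cdot 2 + \frac{12}{144} \cdot 3 + \frac{1}{144} \cdot 4 = \frac{60 + 92 + 36 + 4}{144} = \frac{192}{144} = \frac{4}{3}.
\end{aligned}
\]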
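The whole distribution of X + Y can also be built by enumeration. The sketch below assumes, as the calculations above suggest, that X is the number of heads in two fair coin flips and Y is the number of sixes in two fair die rolls; the variable names are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Each entry is (contribution to the count, probability).
coin = [(1, Fraction(1, 2)), (0, Fraction(1, 2))]  # head or tail
die = [(1, Fraction(1, 6)), (0, Fraction(5, 6))]   # six or not a six

# Assumption: two independent coin flips and two independent die rolls.
dist = Counter()
for (c1, p1), (c2, p2), (d1, q1), (d2, q2) in product(coin, coin, die, die):
    dist[c1 + c2 + d1 + d2] += p1 * p2 * q1 * q2

expected = sum(k * p for k, p in dist.items())
print(dict(dist), expected)
```

The fractions are printed in lowest terms, so 46/144 appears as 23/72 and 12/144 as 1/12.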
(b) \(P(X = 2000) = P(X = 1000) = P(X = 490) = \dfrac{1}{10000}\),
\(P(X = 250) = P(X = 60) = \dfrac{10}{10000}\),
\(P(X = 0) = \dfrac{9977}{10000}\),
\(P(Y = 3000) = P(Y = 2000) = P(Y = 800) = \dfrac{1}{10000}\),
\(P(Y = 0) = \dfrac{9997}{10000}\).
\[
E[Y] = \frac{1}{10000} \cdot 3000 + \frac{1}{10000} \cdot 2000 + \frac{1}{10000} \cdot 800 + \frac{9997}{10000} \cdot 0 = 0.3 + 0.2 + 0.08 + 0 = 0.58.
\]
(d) For every $1 staked, the “big” game is expected to lose you $0.341 and the “small”
game is expected to lose you $0.42. Thus, the “big” game is expected to lose you less
money.
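The expected losses quoted in part (d) follow from the probabilities listed in part (b). Here is a minimal sketch of that arithmetic, assuming (as the figures in part (d) suggest) that X is the prize in the "big" game and Y the prize in the "small" game, each for a $1 stake; the dictionary names are mine.

```python
from fractions import Fraction

# Prize -> probability, taken from part (b). Assumed: X is "big", Y is "small".
big = {
    2000: Fraction(1, 10000), 1000: Fraction(1, 10000), 490: Fraction(1, 10000),
    250: Fraction(10, 10000), 60: Fraction(10, 10000), 0: Fraction(9977, 10000),
}
small = {
    3000: Fraction(1, 10000), 2000: Fraction(1, 10000), 800: Fraction(1, 10000),
    0: Fraction(9997, 10000),
}

def expected_prize(game):
    """Sum of prize times probability over all outcomes."""
    return sum(prize * prob for prize, prob in game.items())

# Expected loss per $1 staked is the stake minus the expected prize.
loss_big = 1 - expected_prize(big)
loss_small = 1 - expected_prize(small)
print(float(loss_big), float(loss_small))
```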
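\[
E[B] = \frac{1 \cdot 4 + 2 \cdot 4 + 3 \cdot 4 + 4 \cdot 4 + 36 \cdot 0}{52} = \frac{10}{13}.
\]
\[
E[B^2] = \frac{1^2 \cdot 4 + 2^2 \cdot 4 + 3^2 \cdot 4 + 4^2 \cdot 4 + 36 \cdot 0}{52} = \frac{30}{13}.
\]
Hence, \(V[B] = E[B^2] - (E[B])^2 = \dfrac{30}{13} - \left(\dfrac{10}{13}\right)^2 = \dfrac{290}{169}\).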
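\[
E[Y] = \frac{35}{100} \times 20 \text{ cm} + \frac{65}{100} \times 30 \text{ cm} = 26.5 \text{ cm}.
\]
\[
V[Y] = \frac{35}{100} \times (20 \text{ cm} - 26.5 \text{ cm})^2 + \frac{65}{100} \times (30 \text{ cm} - 26.5 \text{ cm})^2 = 22.75 \text{ cm}^2.
\]
\[
SD[Y] = \sqrt{V[Y]} \approx 4.77 \text{ cm}.
\]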
(c) The mean of the total weight of the two fish is 2µ kg. However, we do not know the
variance, since the weights of the two fish are not independent.
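\[
\begin{aligned}
P(X \geq 2) &= 1 - P(X \leq 1) = 1 - P(X = 0) - P(X = 1) \\
&= 1 - \binom{20}{0} 0.01^0 \, 0.99^{20} - \binom{20}{1} 0.01^1 \, 0.99^{19} \approx 0.0169.
\end{aligned}
\]
\[
\begin{aligned}
P(Y \geq 2) &= 1 - P(Y \leq 1) = 1 - P(Y = 0) - P(Y = 1) \\
&= 1 - \binom{35}{0} 0.005^0 \, 0.995^{35} - \binom{35}{1} 0.005^1 \, 0.995^{34} \approx 0.0133.
\end{aligned}
\]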
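On an exam you would use your calculator's binomial functions, but the two probabilities above can also be sketched directly from the binomial formula. The function names below are hypothetical, chosen for illustration.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least_two(n, p):
    """P(at least 2 successes) = 1 - P(0 successes) - P(1 success)."""
    return 1 - binom_pmf(n, p, 0) - binom_pmf(n, p, 1)

p_x = prob_at_least_two(20, 0.01)   # X ~ B(20, 0.01)
p_y = prob_at_least_two(35, 0.005)  # Y ~ B(35, 0.005)
print(round(p_x, 4), round(p_y, 4))
```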
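\[
\text{(a) } F_Y(k) = \begin{cases} 0, & \text{if } k < 3, \\ 0.5(k - 3), & \text{if } k \in [3, 5], \\ 1, & \text{if } k > 5. \end{cases}
\qquad
\text{(b) } f_Y(k) = \begin{cases} 0.5, & \text{if } k \in [3, 5], \\ 0, & \text{otherwise.} \end{cases}
\]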
(c) P (3.1 ≤ Y ≤ 4.6) = 0.75 is in blue and P (4.8 ≤ Y ≤ 4.9) = 0.05 is in red.
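\[
f_X(a) = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\left(\frac{a - \mu}{\sigma}\right)^2} = \frac{1}{1\sqrt{2\pi}} e^{-0.5\left(\frac{a - 0}{1}\right)^2} = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} = \varphi(a).
\]
We’ve just shown that the PDF of \(X \sim N(\mu, \sigma^2)\), when \(\mu = 0\) and \(\sigma^2 = 1\), is the same as the PDF
of the SNRV Z ∼ N(0, 1). Hence, the SNRV is indeed simply a normal random variable
with mean \(\mu = 0\) and variance \(\sigma^2 = 1\).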
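\[
\frac{X - \mu}{\sigma} = \frac{X}{\sigma} + \frac{-\mu}{\sigma} \sim N\left(\frac{\mu}{\sigma} + \frac{-\mu}{\sigma}, \; \frac{1}{\sigma^2}\sigma^2\right) = N(0, 1).
\]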
Answer to Exercise 82. We are given that X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2).
(a) $P(X \ge 1) = P\left(Z \ge \frac{1 - 2.14}{\sqrt{5}}\right) \approx P(Z \ge -0.5098) = P(Z \le 0.5098) = \Phi(0.5098) \approx 0.6949.$
$P(Y \ge 1) = P\left(Z \ge \frac{1 - (-0.33)}{\sqrt{2}}\right) \approx P(Z \ge 0.9405) = 1 - P(Z \le 0.9405) = 1 - \Phi(0.9405) \approx 0.1735.$
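The standardisation in part (a) can be cross-checked numerically. A minimal sketch with Python's standard-library `statistics.NormalDist`, which takes the standard deviation (not the variance) as its second argument:

```python
from math import sqrt
from statistics import NormalDist

# X ~ N(2.14, 5) and Y ~ N(-0.33, 2); the second parameter in the text is the variance.
X = NormalDist(mu=2.14, sigma=sqrt(5))
Y = NormalDist(mu=-0.33, sigma=sqrt(2))

p_x = 1 - X.cdf(1)   # P(X >= 1)
p_y = 1 - Y.cdf(1)   # P(Y >= 1)
print(round(p_x, 4), round(p_y, 4))
```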
≈ P (−1.8515 ≤ Z ≤ −1.6279)
(b) Let B1 ∼ N (110, 1156), B2 ∼ N (110, 1156), . . . , B12 ∼ N (110, 1156) be the bills in each
of the 12 months.
Then the total bill in a year is T = B1 +B2 +⋅ ⋅ ⋅+B12 ∼ N (12 × 110, 12 × 1156) = N (1320, 13872).
Thus, P (T > 1000) ≈ 0.9967 (calculator).
Our goal is to find the value of x for which P (B > 100) = 0.1. We have
$\Phi\left(\frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 0.9 \iff \frac{50 - 200x}{\sqrt{256 + 10000x^2}} \approx 1.2815.$
One can rearrange, do the algebra (square both sides), and use the quadratic formula.
Alternatively, one can simply use a graphing calculator to find that x ≈ 0.084. We
conclude that x can be at most approximately 0.084 if the probability that the total
utility bill in a given month exceeds $100 is to be 0.1 or less.
Answer to Exercise 85. Let X be the random variable that is the sum of the weights of
the 5,000 Coco-Pops.
The CLT says that since n = 5000 ≥ 30 is large enough and the distribution is “nice
enough” (we are assuming this), X can be approximated by the normal random variable
Y ∼ N (5000 × 0.1, 5000 × 0.004) = N (500, 20). Thus, P (X ≤ 499) ≈ P (Y ≤ 499) ≈ 0.4115
(calculator).
Answer to Exercise 86. $\bar{x} = \frac{3 + 14 + 2 + 8 + 8 + 6 + 0}{7} = \frac{41}{7}$ and
Answer to Exercise 87. (a) The sample mean x̄ and variance s2 are
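$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1885}{10} = 188.5,$
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{378265 - \frac{1885^2}{10}}{9} \approx 2550.$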
$s^2 = \frac{\sum_{i=1}^{n} (x_i - 50)^2 - \frac{\left[\sum_{i=1}^{n} (x_i - 50)\right]^2}{n}}{n-1} = \frac{378265 - \frac{1885^2}{10}}{9} \approx 2550.$
$\bar{x} = \frac{\sum x_i}{n} = \frac{32 + 88 + 67 + 75 + 56}{5} = 63.6,$
$s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n-1} = \frac{32^2 + 88^2 + 67^2 + 75^2 + 56^2 - 5 \times 63.6^2}{4} = 448.3.$
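As a check on the arithmetic above, the same sample mean and unbiased sample variance can be computed in two ways: with the standard-library functions and with the shortcut formula. This is a sketch; the variable names are illustrative.

```python
from statistics import mean, variance

weights = [32, 88, 67, 75, 56]   # the five observed weights (kg)
n = len(weights)

x_bar = mean(weights)            # observed sample mean
s2 = variance(weights)           # observed sample variance, divides by n - 1

# Shortcut formula: (sum of squares - n * mean^2) / (n - 1)
s2_shortcut = (sum(x * x for x in weights) - n * x_bar**2) / (n - 1)
print(x_bar, s2, s2_shortcut)
```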
(b) We don’t know! And unless we literally gather and weigh every single Singaporean, we
will never know what exactly the average weight of a Singaporean is.
All we’ve found in part (a) is an estimate (63.6 kg) for the average weight of a Singaporean.
We know that on average, the estimator we use “gets it right”.
However, it could well be that we’re unlucky (and got 5 unusually heavy or unusually light
persons) and the estimate of 63.6 kg is thus way off.
$E[\bar{X}] = E\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{E[X_1 + X_2 + \dots + X_n]}{n} = \frac{E[X_1] + E[X_2] + \dots + E[X_n]}{n} = \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu.$
We have just shown that E [X̄] = µ. In other words, we’ve just shown that X̄ is an unbiased
estimator for µ.
$\bar{x} = \frac{x_1 + x_2 + \dots + x_{10}}{n} = 0.7,$
$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{10} - \bar{x})^2}{n-1} = \frac{7\cdot 0.3^2 + 3\cdot 0.7^2}{9} = 0.2\dot{3}.$
(b) Yes, the observed sample mean x̄ = 0.7 is an unbiased estimate for the true population
mean µ (i.e. the true proportion of coin flips that are heads).
And yes, the observed sample variance $s^2 = 0.2\dot{3}$ is an unbiased estimate for the true
population variance $\sigma^2$.
(c) No, this is merely one observed random sample, from which we generated a single
estimate (“guess”) — namely x̄ = 0.7 — of the true population mean µ.
All we know is that the sample mean X̄ is an unbiased estimator for the true population
mean µ. That is, the average estimate generated by X̄ will equal µ.
However, any particular estimate x̄ may or may not be equal to µ. Indeed, if we’re unlucky,
our particular estimate may be very far from the true µ.
Answer to Exercise 91. $V[\bar{X}] = V\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2}V[X_1 + X_2 + \dots + X_n] = \frac{1}{n^2}\left(V[X_1] + V[X_2] + \dots + V[X_n]\right) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.$
(b) The population variance $\sigma^2$ is the number defined by $\sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 / k$. It measures
the dispersion across the population values.
(c) The sample mean $\bar{X}$ is a random variable defined by $\bar{X} = \sum_{i=1}^{n} X_i / n$. It is the average
of all values in a random sample.
(d) The sample variance $S^2$ is a random variable defined by $S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)$.
It measures the dispersion across the values in a random sample.
(e) The mean of the sample mean, also called the expected value of the sample mean, is the
number E [X̄]. The interpretation is that if we have infinitely-many observed samples
of size n, calculate the observed sample mean for each, then E [X̄] is equal to the average
across the observed sample means. It can be shown that E [X̄] = µ and hence that the
sample mean X̄ is an unbiased estimator for the population mean µ.
(f) The variance of the sample mean is the number V [X̄]. The interpretation is that if
we have infinitely-many observed random samples of size n, calculate the observed sample
mean for each, then V [X̄] measures the dispersion across the observed sample means.
(g) The mean of the sample variance, also called the expected value of the sample variance,
is the number E [S 2 ]. The interpretation is that if we have infinitely-many observed
random samples of size n, calculate the observed sample variance for each, then E [S 2 ] is
equal to the average across the observed sample variances. It can be shown that E [S 2 ] = σ 2
and hence that the sample variance S 2 is an unbiased estimator for the population variance
σ2 .
(h) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample mean as
$\bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{1 + 1 + 0}{3} = \frac{2}{3}.$
The observed sample mean is the average of all values in an observed random sample.
(i) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample variance as
$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{3 - 1} = \frac{1/9 + 1/9 + 4/9}{2} = \frac{1}{3}.$
The observed sample variance measures the dispersion across the values in an observed random sample.
Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is
Answer to Exercise 94. Let µ be the true long-run proportion of coin-flips that are
heads. The null and alternative hypotheses are
Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is
Thus, the critical value is 15 (this is the value of t at which we are just able to reject H0 at
the α = 0.05 significance level).
And the critical region is {15, 16, . . . , 20} (this is the set of values of t at which we’d be able
to reject H0 at the α = 0.05 significance level).
(b) The competing hypotheses are H0 ∶ µ = 0.5, HA ∶ µ ≠ 0.5.
The test statistic T is the number of heads (out of the 20 coin-flips).
For t = 14, the corresponding p-value is
Thus, the critical value is 15 and the critical region is {15, 16, . . . , 20}.
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
$\bar{x} = \frac{35 + 35 + 31 + 32 + 33 + 34 + 31 + 34 + 35 + 34}{10} = 33.4.$
$= P\left(Z \le \frac{33.4 - 34}{\sqrt{9/10}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{9/10}}\right) \approx 0.5271.$
The large p-value does not cast doubt on or provide evidence against H0 . We fail to reject
H0 at the α = 0.05 significance level.
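The two-tailed p-value of about 0.5271 can be reproduced as follows. This is a sketch that assumes, as the expression √(9/10) above suggests, a known population variance of 9 and a sample of size 10; the function name `two_tailed_p` is illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

sample = [35, 35, 31, 32, 33, 34, 31, 34, 35, 34]
mu0 = 34       # value of the mean under H0
sigma2 = 9     # population variance (assumed known)

def two_tailed_p(x_bar, mu0, sigma2, n):
    """P(sample mean is at least as far from mu0 as x_bar), under H0."""
    z = (x_bar - mu0) / sqrt(sigma2 / n)
    return 2 * NormalDist().cdf(-abs(z))

x_bar = mean(sample)
p_value = two_tailed_p(x_bar, mu0, sigma2, len(sample))
reject_at_5_percent = p_value < 0.05
print(x_bar, round(p_value, 4), reject_at_5_percent)
```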
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The small p-value casts doubt on or provides evidence against H0 . We reject H0 at the
α = 0.05 significance level.
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The observed sample mean is x̄ = 33.4. And the observed sample variance is s2 = 11.2.
The fairly small p-value casts some doubt on or provides some evidence against H0 . But
we fail to reject H0 at the α = 0.05 significance level.
Answer to Exercise 99. The observed sample mean is x̄ = 68 and the observed sample
variance (use Fact 13(a)) is
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left[\sum_{i=1}^{n} x_i\right]^2}{n}}{n-1} = \frac{50 \times 5000 - \frac{(68 \times 50)^2}{50}}{49} \approx 383.7.$
Let µ be the true average weight of a Singaporean. The competing hypotheses are H0 ∶ µ =
75 and HA ∶ µ < 75.
(This is a one-tailed test, because your friend’s claim is that the average American is heavier
than the average Singaporean. If the claim were instead that the average American’s weight
is different from the average Singaporean’s, then we’d have a two-tailed test.)
Since the sample size n = 50 is “large enough”, we can appeal to the CLT. The p-value is
$p = P(\bar{X} \le 68 \mid H_0) \overset{\text{CLT}}{\approx} P\left(Z \le \frac{68 - 75}{\sqrt{383.7/50}}\right) \approx 0.0058.$
The small p-value casts doubt on or provides evidence against H0 . We can reject H0 at any
conventional significance level (α = 0.1, α = 0.05, or α = 0.01).
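The steps of this one-tailed test (summary statistics, then the CLT approximation) can be sketched in a few lines. The summary figures n = 50, Σx = 68 × 50, and Σx² = 50 × 5000 are those used in the working above; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n = 50
sum_x = 68 * 50      # sum of the observations
sum_x2 = 50 * 5000   # sum of the squared observations
mu0 = 75             # mean under H0

x_bar = sum_x / n
s2 = (sum_x2 - sum_x**2 / n) / (n - 1)   # unbiased sample variance

# Under H0 and by the CLT, the sample mean is approximately N(mu0, s2 / n).
z = (x_bar - mu0) / sqrt(s2 / n)
p_value = NormalDist().cdf(z)            # one-tailed (lower tail): HA is mu < 75
print(round(s2, 1), round(p_value, 4))
```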
[Figure: scatter plot of q against p ($).]
$\sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q}) = (8 - \bar{p})(300 - \bar{q}) + (9 - \bar{p})(250 - \bar{q}) + \dots + (8 - \bar{p})(400 - \bar{q})$
$= (8 - 7.8)(300 - 470) + (9 - 7.8)(250 - 470) + \dots + (8 - 7.8)(400 - 470) = -2480,$
$\sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2} = \sqrt{(8 - \bar{p})^2 + (9 - \bar{p})^2 + (4 - \bar{p})^2 + (10 - \bar{p})^2 + (8 - \bar{p})^2}$
$= \sqrt{(8 - 7.8)^2 + (9 - 7.8)^2 + (4 - 7.8)^2 + (10 - 7.8)^2 + (8 - 7.8)^2} = \sqrt{20.8} \approx 4.56070170,$
$\sqrt{\sum_{i=1}^{n} (q_i - \bar{q})^2} = \sqrt{(300 - \bar{q})^2 + (250 - \bar{q})^2 + \dots + (400 - \bar{q})^2}$
$= \sqrt{(300 - 470)^2 + (250 - 470)^2 + \dots + (400 - 470)^2} = \sqrt{368000} \approx 606.63003552.$
(b)
i                     1      2      3      4      5
pi ($)                8      9      4     10      8
qi                  300    250   1000    400    400
q̂i                  446    327    923    208    406
ûi = qi − q̂i       −146    −77     77    192    −46
(c) [Figure: plot of q against p ($).]
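(d) The SSR is $\sum_{i=1}^{5} \hat{u}_i^2 \approx 72308$, using the unrounded residuals. (The rounded residuals in (b) give $(-146)^2 + (-77)^2 + 77^2 + 192^2 + (-46)^2 = 72154$.)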
[Figure: TI84 screenshots after Steps 9, 10, 11, and 12.]
The TI84 tells us that r = −0.8963881445 and the regression line is y = ax + b = −119.2307692x +
1400. This is indeed consistent with the answers from the previous exercises.
Answer to Exercise 104. In the previous exercises, we already calculated that the OLS
line of best fit is q = 1400 − 119.2p. Thus,
(a) By interpolation, a barber who charged $7 per haircut sold 1400 − 119.2 × 7 ≈ 566
haircuts.
(b) By extrapolation, a barber who charged $200 per haircut sold 1400−119.2×200 = −22440
haircuts. This is plainly absurd.
The second prediction is obviously absurd and thus obviously less reliable than the first.
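For readers without a TI84, the correlation coefficient, the OLS line, and the SSR from the preceding exercises can be recomputed from the five data points. This is a sketch using the usual textbook formulas; the function name `predict` and the variable names are illustrative.

```python
from math import sqrt

p = [8, 9, 4, 10, 8]               # price per haircut ($)
q = [300, 250, 1000, 400, 400]     # haircuts sold
n = len(p)

p_bar = sum(p) / n
q_bar = sum(q) / n
s_pq = sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q))
s_pp = sum((pi - p_bar) ** 2 for pi in p)
s_qq = sum((qi - q_bar) ** 2 for qi in q)

r = s_pq / sqrt(s_pp * s_qq)          # sample correlation coefficient
slope = s_pq / s_pp                   # OLS slope
intercept = q_bar - slope * p_bar     # OLS intercept

def predict(price):
    """Predicted number of haircuts on the OLS line at the given price."""
    return intercept + slope * price

ssr = sum((qi - predict(pi)) ** 2 for pi, qi in zip(p, q))
print(r, slope, intercept, ssr)
print(predict(7), predict(200))       # interpolation vs. extrapolation
```

Note that `predict(200)` is negative, which is the absurdity the extrapolation answer points out.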
$\frac{d}{dx}\,\frac{3}{(2x - 1)^4} = -12(2x - 1)^{-5}(2) = \frac{-24}{(2x - 1)^5}.$
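(ii) $\int_{0.5}^{1} \left(x + \frac{2}{x}\right)^2 dx = \int_{0.5}^{1} x^2 + 4 + \frac{4}{x^2}\,dx = \left[\frac{x^3}{3} + 4x - \frac{4}{x}\right]_{0.5}^{1} = \left(\frac{1}{3} + 4 - 4\right) - \left(\frac{1}{24} + 2 - 8\right) = 6\frac{7}{24}.$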
(ii) $\frac{dA}{dx} = \frac{\sqrt{3}}{2}(10 - 2x) = 0 \iff x = 5 \implies A_{\max} = \frac{75\sqrt{3}}{2}.$
(ii) $\frac{dy}{dx} = 0.5^x \ln 0.5 - \frac{1}{x + 1}$
$\implies \left.\frac{dy}{dx}\right|_{x=0.5} = 0.5^{0.5} \ln 0.5 - \frac{1}{0.5 + 1} = -\sqrt{0.5}\,\ln 2 - \frac{2}{3} \approx -1.157.$
(iii) The point P is $(0.5, \sqrt{0.5} - \ln 1.5)$. The normal to C at P has equation
$y - (\sqrt{0.5} - \ln 1.5) = \frac{-1}{-\sqrt{0.5}\,\ln 2 - \frac{2}{3}}(x - 0.5) = \frac{1}{\sqrt{0.5}\,\ln 2 + \frac{2}{3}}(x - 0.5)$
or y = 0.8644568499x − 0.13058675187.
Its y- and x-intercepts are $y_0 = 0.8644568499(0) - 0.13058675187 \approx -0.13058675187$ and
$x_0 = (0 + 0.13058675187)/0.8644568499 \approx 0.15106219805$. The length of AB is thus
$\sqrt{x_0^2 + y_0^2} \approx 0.200$.
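$\int_{1}^{6} \frac{1}{\sqrt{1 + 4x}}\,dx = \left[\frac{1}{4}\cdot 2\sqrt{1 + 4x}\right]_{1}^{6} = \frac{1}{2}\left(\sqrt{25} - \sqrt{5}\right) = 0.5\left(5 - \sqrt{5}\right).$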
1 + 4x 4 1 2
(ii) $dy/dx = 2e^{1-2x}$. The point is $(1, 1 - e^{-1})$. $dy/dx|_{x=1} = 2e^{-1}$. So the equation is
$y - (1 - e^{-1}) = 2e^{-1}(x - 1)$ or $y = 2e^{-1}x + 1 - 3e^{-1}$.
Correspondingly, y = 18, 6. We reject the latter because y > x. Thus, (x, y) = (4, 18).
$x = \frac{10 \pm \sqrt{(-10)^2 - 4(3)(7)}}{2(3)} = \frac{5 \pm \sqrt{25 - 21}}{3} = 1, \frac{7}{3}.$
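(iv) $\int_{1}^{2} x^3 - 5x^2 + 7x - 1\,dx = \left[\frac{x^4}{4} - \frac{5x^3}{3} + \frac{7x^2}{2} - x\right]_{1}^{2} = \left(4 - \frac{40}{3} + 14 - 2\right) - \left(\frac{1}{4} - \frac{5}{3} + \frac{7}{2} - 1\right) = \frac{19}{12}.$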
Answer to Exercise 115 (8864 N2013/I/1). A quadratic equation has no real roots if
and only if its discriminant is negative. In this case, the discriminant is
D = [−(k − 2)]² − 4(1)(2k + 1) = k² − 4k + 4 − 8k − 4 = k² − 12k = k(k − 12).
D < 0 if and only if k ∈ (0, 12).
(ii) $\int_{-1}^{0} \frac{1}{(1 - 3x)^4}\,dx = \left[\frac{1}{9}(1 - 3x)^{-3}\right]_{-1}^{0} = \frac{1}{9} - \frac{1}{9}\cdot 4^{-3} = \frac{7}{64}.$
$10 - a - 3 = \frac{1 + 5}{2a - 6} \iff 2(7 - a)(a - 3) = 6 \iff -a^2 + 10a - 21 = 3 \iff a^2 - 10a + 24 = 0.$
a2 − 10a + 24 = (a − 6)(a − 4) = 0 ⇐⇒ a = 4, 6.
(iii) y − 3 = (y + 5)/(2 ⋅ 4 − 6) ⇐⇒ 2(y − 3) = (y + 5) ⇐⇒ y = 11. It is (11, 11).
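(iv) Exact: $\int_{0}^{1} e^{2-2x} - 2e^{-x}\,dx = \left[-0.5e^{2-2x} + 2e^{-x}\right]_{0}^{1} = \left(-0.5e^{0} + 2e^{-1}\right) - \left(-0.5e^{2} + 2e^{0}\right) = 2e^{-1} - 2.5 + 0.5e^{2} \approx 1.930.$
TI84: [screenshot not reproduced]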
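In place of a calculator check, the exact value above can be compared with a numerical integral. The sketch below uses composite Simpson's rule, which is a choice made here for illustration and not necessarily the method the TI84 uses; `simpson` and `integrand` are illustrative names.

```python
from math import exp

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1                      # Simpson's rule needs an even number of subintervals
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    return exp(2 - 2 * x) - 2 * exp(-x)

numeric = simpson(integrand, 0, 1)
exact = 2 * exp(-1) - 2.5 + 0.5 * exp(2)
print(numeric, exact)
```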
Answer to Exercise 120 (8864 N2012/I/1). Let $u = e^{2x}$. Then $3e^{2x} = 4(e^{-2x} - 1)$
$\iff 3u = 4(u^{-1} - 1) \iff 3u^2 = 4(1 - u) \iff 3u^2 + 4u - 4 = 0 \iff (3u - 2)(u + 2) = 0$
$\iff u = 2/3, -2$.
Assuming that x is real, it cannot be that $e^{2x} = -2$. Hence, $e^{2x} = 2/3$ or $x = 0.5\ln(2/3)$.
$4x + \frac{120x}{x + 20} + 20 = 100 \iff 4x^2 + 100x + 400 + 120x = 100x + 2000 \iff 4x^2 + 120x - 1600 = 0.$
(ii) x² + 30x − 400 = (x − 10)(x + 40) = 0 ⇐⇒ x = 10, −40. (Reject the negative value.)
HF = x + y = 10 + 60 ⋅ 10/(10 + 20) = 30.
(ii) $\int_{2}^{4} \left(\sqrt{x} - \frac{1}{\sqrt{x}}\right)^2 dx = \int_{2}^{4} x + \frac{1}{x} - 2\,dx = \left[\frac{x^2}{2} + \ln|x| - 2x\right]_{2}^{4}$
$= (8 + \ln 4 - 8) - (2 + \ln 2 - 4) = \ln 2 + 2.$
(iv) The points A and B are (2.1377, 0) and (1.048, 1.048). So the length AB is
$\sqrt{(2.1377 - 1.048)^2 + 1.048^2} \approx 1.51.$
Answer to Exercise 128 (8864 N2011/I/4). (i) V = (2 − 2x)²x = 4x³ − 8x² + 4x.
(ii) dV/dx = 12x² − 16x + 4 = 0 ⇐⇒ 3x² − 4x + 1 = (3x − 1)(x − 1) = 0 ⇐⇒ x = 1/3, 1. (These
are the two stationary points.)
dV /dx is decreasing at x = 1/3 and increasing at x = 1. Hence, the former is a maximum
turning point.
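$y - (2 - \ln 5) = -\frac{1}{0.6}(x - 2)$ or $y = -\frac{5}{3}x + \frac{16}{3} - \ln 5.$
So $A = \left(\frac{3}{5}\left(\frac{16}{3} - \ln 5\right), 0\right)$ and $B = \left(0, \frac{16}{3} - \ln 5\right)$. So the area of △OAB is
$0.5 \cdot \frac{3}{5}\left(\frac{16}{3} - \ln 5\right)\left(\frac{16}{3} - \ln 5\right) = \frac{3}{10}\left(\frac{16}{3} - \ln 5\right)^2 = \frac{1}{30}(16 - 3\ln 5)^2.$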
Answer to Exercise 130 (8864 N2010/I/1). A quadratic equation has two real roots
if and only if its discriminant is positive. In this case, the discriminant is
D = (−2k)² − 4(4)(9) = 4k² − 144 = 4(k − 6)(k + 6).
(iii) dy/dx∣x=3 = 2/3. So at the point (3, ln 3), the normal has equation y − ln 3 = −1.5(x − 3)
or y + 1.5x = 4.5 + ln 3 or 2y + 3x = 9 + 2 ln 3.
$\int_{-1}^{0.5} 6 - 4x^3 - 3x^4\,dx = \left[6x - x^4 - 0.6x^5\right]_{-1}^{0.5} = \left(3 - \frac{1}{16} - \frac{0.6}{32}\right) - (-6 - 1 + 0.6) = 9\frac{51}{160}.$
Answer to Exercise 136 (8863 N2009/I/2). (i) $\sqrt{x} = 0.5x \iff x = 0.25x^2 \iff x(1 - 0.25x) = 0 \iff x = 0, 4$. So the points of intersection are (0, 0) and (4, 2).
(ii) $\int \sqrt{x}\,dx = 2x^{1.5}/3 + C$ and $\int 0.5x\,dx = x^2/4 + D$, where C and D are constants of
integration.
(iii) $\int_{0}^{4} \sqrt{x} - 0.5x\,dx = \left[2x^{1.5}/3 - x^2/4\right]_{0}^{4} = 16/3 - 4 = 4/3.$
(ii) dy/dx = 1 + 1/x². dy/dx∣x=2 = 5/4. So the gradient of the normal at P is −0.8.
(iii) The point P is (2, 1.5). So the equation of the normal is y − 1.5 = −0.8(x − 2) or
4x + 5y − 15.5 = 0.
(iv) N is (0, 3.1). The equation of the tangent at P is y − 1.5 = (5/4)(x − 2). So the point
T is (0, −1).
(iii) From the graph, 2x³ − 5x² − 4x + 3 > 0 ⇐⇒ x ∈ (−1, 0.5) ∪ (3, ∞).
$2e^{3x} - 5e^{2x} - 4e^{x} + 3 > 0 \iff e^{x} \in (-1, 0.5) \cup (3, \infty) \iff x \in (-\infty, \ln 0.5) \cup (\ln 3, \infty).$
(ii) sin(3π + α) = sin 3π cos α + sin α cos 3π = 0 ⋅ cos α + sin α ⋅ (−1) = − sin α = −c.
sin(π + α) = sin π cos α + sin α cos π = 0 ⋅ cos α + sin α ⋅ (−1) = − sin α = −c.
Answer to Exercise 140 (8863 N2008/I/2). We are given that $x + y \overset{1}{=} 20$ and $x^2 + y^2 \overset{2}{=} 300$.
From $\overset{1}{=}$, we have $y \overset{3}{=} 20 - x$. Plug $\overset{3}{=}$ into $\overset{2}{=}$ to get $x^2 + (20 - x)^2 = 300$ or $2x^2 - 40x + 100 = 0$
or $x^2 - 20x + 50 = 0$. Solving the quadratic:
$x = \frac{20 \pm \sqrt{(-20)^2 - 4(1)(50)}}{2(1)} = 10 \pm \sqrt{100 - 50} = 10 \pm \sqrt{50}.$
Correspondingly, $y = 10 \mp \sqrt{50}$. So the two solutions are
$(x, y) = \left(10 + \sqrt{50}, 10 - \sqrt{50}\right)$ and $(x, y) = \left(10 - \sqrt{50}, 10 + \sqrt{50}\right).$
Answer to Exercise 143 (8863 N2008/I/6). (i) dy/dx = 2/(2x + 4) = 1/(x + 2).
dy/dx∣x=1 = 1/3. The equation of the tangent at P is y − ln 6 = (1/3)(x − 1). So the
x-coordinate of T is 1 − 3 ln 6.
(ii) The equation of the normal at P is y − ln 6 = −3(x − 1). So the x-coordinate of N is
1 + (ln 6)/3.
Answer to Exercise 146 (8863 N2007/I/4). (i) Let a be the length of each side of
the isosceles triangle. By the Pythagorean Theorem, a² + a² = x². Thus, the area of the
triangle is 0.5a² = x²/4.
(ii) Let b be the length of a side of the square. Then 2b + x = 100. And A = x²/4 + b² =
x²/4 + [0.5(100 − x)]² = 2500 − 50x + 0.5x².
(iii) dA/dx = −50 + x = 0 ⇐⇒ x = 50. So Amin = 2500 − 50 ⋅ 50 + 0.5 ⋅ 50² = 1250. This is
the minimum because A is a ∪-shaped quadratic function of x.
(iv) A is a ∪-shaped quadratic function of x. We know the minimum is at x = 50. So the
maximum is at either corner (i.e. x = 10 or x = 80).
A(10) = 2500 − 50 ⋅ 10 + 0.5 ⋅ 10² = 2050. A(80) = 2500 − 50 ⋅ 80 + 0.5 ⋅ 80² = 1700.
So Amax = A(10) = 2050.
Answer to Exercise 149 (8174 N2006/I/7). Plug the equation of the line $y \overset{1}{=} 1 - 3x$
into the equation of the curve $x^2 + y^2 + kx + 2y + 7 \overset{2}{=} 0$ to get
$x^2 + (1 - 3x)^2 + kx + 2(1 - 3x) + 7 = 0 \iff 10x^2 + (k - 12)x + 10 \overset{3}{=} 0.$
$\frac{d}{dx}\left(x^2 + y^2 + kx + 2y + 7\right) = \frac{d}{dx}0 \iff 2x + 2y\frac{dy}{dx} + k + 2\frac{dy}{dx} = 0.$
The tangent line has slope −3. So at the point at which the line touches the curve, we
have dy/dx = −3. Plugging this into the above, we have $2x + 2y(-3) + k + 2(-3) = 0$ or
$2x - 6y + k - 6 \overset{4}{=} 0$. Now plug $\overset{1}{=}$ into $\overset{4}{=}$ to get $20x - 12 + k = 0$ or $k \overset{5}{=} 12 - 20x$. Now plug $\overset{5}{=}$ into
$\overset{3}{=}$ to get $10x^2 - 20x^2 + 10 = 0$ or $10x^2 = 10$ or $x^2 = 1$ or $x = \pm 1$. Correspondingly, $k = -8, 32$.
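As a cross-check by a different route (not the implicit-differentiation method used above): a line is tangent to the curve exactly when the quadratic 10x² + (k − 12)x + 10 = 0 has a repeated root, i.e. when its discriminant is zero. The sketch below solves that condition; the names `tangent_ks` and `touch_point` are illustrative.

```python
from math import sqrt

# Quadratic obtained by substituting the line into the curve:
# 10x^2 + (k - 12)x + 10 = 0.
a, c = 10, 10

# Tangency <=> discriminant (k - 12)^2 - 4ac = 0 <=> k - 12 = ±sqrt(4ac).
root = sqrt(4 * a * c)
tangent_ks = sorted([12 - root, 12 + root])

def touch_point(k):
    """x-coordinate of the repeated root of 10x^2 + (k - 12)x + 10 = 0."""
    return -(k - 12) / (2 * a)

print(tangent_ks, [touch_point(k) for k in tangent_ks])
```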
(ii) $\int_{1}^{4} -2x^2 + 6x + 11 - (-4x + 19)\,dx = \int_{1}^{4} -2x^2 + 10x - 8\,dx = \left[-2x^3/3 + 5x^2 - 8x\right]_{1}^{4}$
$= (-2/3)\cdot 63 + 5\cdot 15 - 8\cdot 3 = -42 + 75 - 24 = 9.$
Answer to Exercise 152 (8864 N2015/I/6). Let X be the mass of a peach. We are
given that $P(X < 40) = P(Z < (40 - \mu)/\sigma) \overset{1}{=} 0.2$ and $P(X > 60) = P(Z > (60 - \mu)/\sigma) \overset{2}{=} 0.25$.
From $\overset{1}{=}$ and $\overset{2}{=}$, we have $(40 - \mu)/\sigma \overset{3}{\approx} -0.841621234$ and $(60 - \mu)/\sigma \overset{4}{\approx} 0.67448975$.
Answer to Exercise 153 (8864 N2015/I/7). (i) Take the 12th, 24th, . . . , 1200th
students.
(ii) There might be some strange period-12 pattern in the list of names, thus introducing
bias to the sample.
(iii) Stratified.
Answer to Exercise 154 (8864 N2015/I/8). (i) 0.42 = P(A ∪ B) = P(A) + P(B) −
P(A ∩ B) = 3p − 0.03, so p = 0.15.
(ii) P(A ∪ B ′ ) = P(A) + P(B ′ ) − P(A ∩ B ′ ) = p + (1 − 2p) − 0.12 = 1 − p − 0.12 = 0.73.
(iii) P(A)P(B ′ ) = p(1 − 2p) = 0.15(0.7) = 0.105. P(A ∩ B ′ ) = 0.12. Since P(A)P(B ′ ) ≠
P(A ∩ B ′ ), A and B ′ are not independent.
Answer to Exercise 155 (8864 N2015/I/9). (i) Let X ∼ B(8, 1/6) be the number of
sixes. P(X = 3) = C(8, 3)(1/6)³(5/6)⁵ = 56 ⋅ 5⁵/6⁸ ≈ 0.104.
≈ 0.969.
(iii) Let Y ∼ B(600, 1/6). Since the sample size is large, Y can be approximated by A ∼
N (100, 500/6).
(ii) r ≈ 0.922. There is a fairly strong, positive linear correlation between h and w.
(iii) w − 98.1 = 58.0(h − 1.799).
(iv) ŵh=1.66 = 58.0(1.66−1.799)+98.1 ≈ 90.0. (1) Linear interpolation is somehow magically
reliable. (2) Our linear model seems to fit the data pretty well.
Answer to Exercise 157 (8864 N2015/I/11). (i) Let M be the mass of a randomly
chosen man. P(75 ≤ M ≤ 79) ≈ 0.162.
(ii) Let W be the mass of a randomly chosen woman.
(ii) (3/4) × (2/5) + (1/4) × (3/4) × (2/5) = 3/8. (iii) (3/4) × (2/5) = 0.3.
(iv) Let X ∼ B(5, 3/8) be the number of successes. P(X ≥ 2) ≈ 0.619.
$\bar{x} = \frac{-32}{40} + 18 = 17.2, \quad s^2 = \frac{325 - \frac{(-32)^2}{40}}{39} \approx 7.677.$
Answer to Exercise 160 (8864 N2014/I/6). (i) P(H < 146) ≈ 0.737.
Answer to Exercise 161 (8864 N2014/I/7). (i) Order the 5000 households by name.
Take the 50th, 100th, . . . , 5000th households.
(ii) The six strata are “under-25, supermarket”, “under-25, online”, “25−60, supermarket”,
“25 − 60, online”, “over-60, supermarket”, “over-60, online”. From each, randomly pick,
respectively, 10, 20, 18, 32, 16, and 4 households.
(iii) Stratified sampling, because it usually results in a smaller sample variance.
$\bar{x} = \frac{310.4}{50} = 6.208, \quad s^2 = \frac{2209.2 - \frac{310.4^2}{50}}{49} \approx 5.76.$
(iii) The competing hypotheses are H0 ∶ µ = 7 and HA ∶ µ ≠ 7. The sample mean X̄50 is, by
the CLT, well-approximated by N(µ, 5.76/50). The p-value is
To reject H0 , α ? 1.97.
$P(L) = \frac{48 + 12 + 10 + 20}{290 + x} = \frac{90}{290 + x},$
$P(G) = \frac{55 + 15 + 10 + 20}{290 + x} = \frac{100}{290 + x}.$
$P(L)P(G) = \frac{90}{290 + x}\cdot\frac{100}{290 + x} = P(L \cap G) = \frac{30}{290 + x}.$
Answer to Exercise 166 (8864 N2014/I/12). (i) P(A > 75) = P(Z > (75 − 50)/σ) =
0.0189 ⇐⇒ (75 − 50)/σ ≈ 2.077016894 ⇐⇒ σ ≈ 12.03649333 ⇐⇒ σ² ≈ 145.
(ii) WB = B1 + B2 + ⋅ ⋅ ⋅ + B7 ∼ N (7 ⋅ 75, 7 ⋅ 64) = N (525, 448). P(WB < 500) ≈ 0.119.
Answer to Exercise 167 (8864 N2013/I/6). (i) Randomly pick 25, 50, and 75 people
who bought the $X, $Y , $Z tickets, respectively.
(ii) Results in lower sample variance (as compared to simple random sampling).
$\bar{t} = 305/250 + 75 = 76.22, \quad s^2 = \frac{29555 - \frac{305^2}{250}}{250 - 1} \approx 117.2.$
(ii) Let T ∼ (µ, σ 2 ) be the retention time. The competing hypotheses are H0 ∶ µ = 75 and
HA ∶ µ > 75.
The sample mean retention time is T̄250 ∼ (µ, σ 2 /250). By the CLT, T̄250 is well-approximated
by T̄250 ∼ N (µ, s2 /250).
The p-value is
$P\left(\bar{T}_{250} \ge 76.22 \mid H_0\right) = P\left(Z \ge \frac{76.22 - 75}{\sqrt{117.2/250}}\right) \approx 0.03738856 > 0.025.$
[Figure: scatter plot of height against age.]
(ii) r ≈ 0.9032560806 is positive and fairly large, suggesting a fairly strong linear correlation
between age and height.
(iii) y = 4.46x + 87.43.
(iv) ŷx=13.2 = 4.46(13.2) + 87.43 ≈ 146. We are supposed to say that this estimate is reliable
because it involves interpolation.
Answer to Exercise 172 (8864 N2013/I/11). (i) Let A ∼ N (1000, σ 2 ) be the mass
(in g) of a Type A packet of animal food. P(A < 990) = P(Z < (990 − 1000) /σ) = 0.2 ⇐⇒
(990 − 1000) /σ = −0.841621234 ⇐⇒ σ ≈ 11.9.
(ii) Let P ∼ N (240, 10²) and Q ∼ N (145, 8²) be the masses (in g) of a scoop of P and a
scoop of Q, respectively. Then
(ii) r ≈ −0.9840253445 is very large and negative. This suggests a very strong, negative
linear correlation between x and y.
(iii) y = −19.21x + 183.12.
(iv) (a) ŷx=4 = −19.21(4) + 183.12 ≈ 106.26.
(iv) (b) ŷx=9 = −19.21(9) + 183.12 ≈ 10.23.
(v) We are supposed to say that the estimate ŷx=4 is reliable because it involves interpolation
and the estimate ŷx=9 is not because it involves extrapolation.
(iii) Let B ∼ B(96, 0.8) be the number that flower. By the CLT, B is well-approximated
by C ∼ N(96 ⋅ 0.8, 96 ⋅ 0.8 ⋅ 0.2) = N(76.8, 15.36). So,
[This is fairly close to the exact probability of P(B > 75) ≈ 0.638 (calculator).]
(iv) Using the approximation in (iii), the answer is C(3, 2) ⋅ 0.630² ⋅ (1 − 0.630) + 0.630³ ≈ 0.691.
[Using instead the exact probability, it is C(3, 2) ⋅ 0.638² ⋅ (1 − 0.638) + 0.638³ ≈ 0.702.]
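The comparison between the normal approximation and the exact binomial probability in (iii) and (iv) can be sketched as follows. The continuity correction at 75.5 is an assumption about how the approximate value 0.630 was obtained; the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 96, 0.8   # B ~ B(96, 0.8), the number that flower

# Exact: P(B > 75) = P(B >= 76)
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(76, n + 1))

# Normal approximation C ~ N(np, np(1 - p)) with continuity correction
C = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = 1 - C.cdf(75.5)

def at_least_two_of_three(prob):
    """P(at least 2 successes in 3 independent trials, each with success probability prob)."""
    return comb(3, 2) * prob**2 * (1 - prob) + prob**3

print(round(exact, 3), round(approx, 3))
print(round(at_least_two_of_three(approx), 3), round(at_least_two_of_three(exact), 3))
```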
Answer to Exercise 179 (8864 N2012/I/11). (i) Unbiased estimates of the population
mean and variance are
$\bar{x} = \frac{-60}{100} + 300 = 299.4, \quad s^2 = \frac{1240 - \frac{(-60)^2}{100}}{100 - 1} = \frac{1204}{99}.$
(ii) Let X̄100 ∼ (µ, σ 2 /100) be the sample mean. By the CLT, it is approximately the case
that X̄100 ∼ N (µ, s2 /100). The competing hypotheses are H0 ∶ µ = 300 and HA ∶ µ ≥ 300.
The p-value is
(iii) Let X̄100 ∼ (µ, 12.1/100) be the sample mean. By the CLT, it is approximately the
case that X̄100 ∼ N (µ, 0.121). The competing hypotheses are H0 ∶ µ = 300 and HA ∶ µ ≥ 300.
The minimum kmin at which we’d be able to reject H0 at the 10% significance level is given
by
Answer to Exercise 182 (8864 N2011/I/7). (i) Every student is equally likely to be
chosen.
(ii) The three strata are “car”, “bicycle”, and “on foot”. The totals for each stratum are
440, 760, and 800, for a grand total of 2000 students. So from each stratum, take 22, 38,
and 40 students.
(iii) Stratified sampling usually results in lower sample variance (than simple random
sampling).
A better stratified sample of size 100 could have been achieved by using six strata instead
of just three: namely “Year 1 car”, “Year 1 bicycle”, “Year 1 on foot”, “Year 2 car”, “Year
2 bicycle”, and “Year 2 on foot”.
(ii) r ≈ −0.9670056283 is large and negative, which suggests there is a strong, negative
linear correlation between H and T .
(iii) T = −0.01472090021H + 27.00297934.
(iv) T̂H=1000 = −0.01472090021(1000) + 27.00297934 ≈ 12.28. We are supposed to say it’s
reliable because it involves interpolation.
Answer to Exercise 184 (8864 N2011/I/9). (i) The lifetime of a light bulb in this
batch is L ∼ (µ, 1400²). The sample mean lifetime is L̄50 ∼ (µ, 1400²/50). By the CLT, we
have approximately L̄50 ∼ N (µ, 1400²/50).
The competing hypotheses are H0 ∶ µ = 12000 and HA ∶ µ < 12000. The p-value is
so we can reject H0 . This is evidence in favour of believing that this particular batch is
substandard.
(ii) P (L̄50 ≤ Tmin ∣H0 ) = 0.05 ⇐⇒ Tmin ≈ 11674.3356 (calculator).
Answer to Exercise 185 (8864 N2011/I/10). (i) (a) Let X ∼ B(7, 0.8) be the number
of times Jon completes the puzzle. P(X = 3) = 0.028672 (calculator).
(i) (b) P(X ≥ 5) = 0.851968 (calculator).
(ii) 0.851968⁵ ≈ 0.449.
(iii) Let Y ∼ B(70, 0.8) be the number of times Jon completes the puzzle. By the CLT, Y
is well-approximated by A ∼ N(70 ⋅ 0.8, 70 ⋅ 0.8 ⋅ 0.2) = N(56, 11.2). So
(i) (b) P(Red ball) = P(A ∩ Red ball) + P(B ∩ Red ball)
= P(A)P(Red ball∣A) + P(B)P(Red ball∣B)
= (1/4)(5/10) + (3/4)(6/8) = 11/16.
(i) (c) P(A∣Red ball) = P(A ∩ Red ball)/P(Red ball) = (1/8)/(11/16) = 2/11.
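The total-probability and Bayes steps in (i)(b) and (i)(c) can be verified with exact fractions. This is a minimal sketch using the probabilities that appear in the working above; the variable names are illustrative.

```python
from fractions import Fraction as F

p_A, p_B = F(1, 4), F(3, 4)                    # P(A), P(B)
p_red_given_A, p_red_given_B = F(5, 10), F(6, 8)

# Law of total probability
p_red = p_A * p_red_given_A + p_B * p_red_given_B

# Bayes' rule: P(A | Red ball) = P(A and Red ball) / P(Red ball)
p_A_given_red = (p_A * p_red_given_A) / p_red
print(p_red, p_A_given_red)
```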
Answer to Exercise 187 (8864 N2011/I/12). Let B ∼ N (60, 12²) and G ∼ N (50, 10²)
be the masses of a boy and a girl.
(i) P (50 ≤ B ≤ 70) ≈ 0.595 (calculator).
(ii) B − G ∼ N (60 − 50, 12² + 10²) = N (10, 244). So P (B > G) = P (B − G > 0) ≈ 0.739
(calculator).
(iii) B1 + B2 + B3 + G1 + G2 ∼ N (3 ⋅ 60 + 2 ⋅ 50, 3 ⋅ 12² + 2 ⋅ 10²) = N (280, 632).
So P (B1 + B2 + B3 + G1 + G2 < 300) ≈ 0.787.
(iv) B1 + B2 + ⋅ ⋅ ⋅ + B6 ∼ N (6 ⋅ 60, 6 ⋅ 12²) = N (360, 864).
P (B1 + B2 + ⋅ ⋅ ⋅ + B6 < L) = P (Z < (L − 360)/√864) = 0.95
⇐⇒ (L − 360)/√864 ≈ 1.644853627 ⇐⇒ L ≈ 408.349.
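The four parts above all rely on the same rule: for independent normal random variables, means add and variances add. A sketch with `statistics.NormalDist`, assuming (as the answer does) that all the masses are independent; the helper name `normal` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def normal(mean, var):
    """NormalDist from a mean and a VARIANCE (NormalDist itself takes the standard deviation)."""
    return NormalDist(mu=mean, sigma=sqrt(var))

B = normal(60, 12**2)   # mass of a boy
G = normal(50, 10**2)   # mass of a girl

p_i = B.cdf(70) - B.cdf(50)                                  # (i)   P(50 <= B <= 70)
p_ii = 1 - normal(60 - 50, 12**2 + 10**2).cdf(0)             # (ii)  P(B - G > 0)
p_iii = normal(3*60 + 2*50, 3*12**2 + 2*10**2).cdf(300)      # (iii) 3 boys + 2 girls < 300
L = normal(6*60, 6*12**2).inv_cdf(0.95)                      # (iv)  95th percentile of 6 boys
print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3), round(L, 3))
```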
Answer to Exercise 188 (8864 N2010/I/6). (i) P (A ∩ B) = P(A∣B)P (B) = 0.2 ⋅ 0.3 =
0.06.
(ii) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.6 + 0.3 − 0.06 = 0.84.
(iii) P (A ∪ B) − P (A ∩ B) = 0.84 − 0.06 = 0.78.
Answer to Exercise 190 (8864 N2010/I/8). There are 3,000 students total.
(i) Randomly pick 28, 18, and 14 students from Years One, Two, and Three respectively.
(ii) Stratified sampling usually results in a lower sample variance.
$\bar{x} = \frac{10450}{50} = 209, \quad s^2 = \frac{2235000 - \frac{10450^2}{50}}{50 - 1} \approx 1039.79591837.$
(iv) (1) The large sample size lets us use the CLT approximation. (2) What each student
spends is independent of what any other student spends (this assumption is actually already
implicit in the definition of a random sample).
Answer to Exercise 191 (8864 N2010/I/9). Let X ∼ B(8, 0.7) be the number that
germinate. (i) P(X = 6) ≈ 0.296 (calculator). (ii) P(X ≥ 6) ≈ 0.552 (calculator).
(iii) Let Y ∼ B(60, 0.7) be the number that germinate. By the CLT, Y is well-approximated
by A ∼ N(42, 12.6). So P(Y < 40) ≈ P(A < 39.5) ≈ 0.241.
[This is fairly close to the exact probability of P(Y < 40) ≈ 0.238 (calculator).]
Answer to Exercise 192 (8864 N2010/I/10). Let X ∼ N (µ, 1.2²) be the mass of a
component. (i) The sample mean is X̄80 ∼ N (µ, 1.2²/80). The competing hypotheses are
H0 ∶ µ = 15 and HA ∶ µ ≠ 15. The p-value P (X̄80 ≥ 15.25 or X̄80 ≤ 14.75∣H0 ) ≈ 0.06240742 is
more than 5%, so we fail to reject H0 . This fails to cast doubt or provide evidence against
the factory owner’s claim.
(ii) The sample mean is X̄80 ∼ N (µ, 1.2²/80). The competing hypotheses are H0 ∶ µ = 15
and HA ∶ µ < 15. The maximum observed sample mean kmax at which we’d reject H0 (in
favour of the owner’s new claim) is given by: P (X̄80 ≤ kmax ∣H0 ) = 0.05. So by calculator,
kmax ≈ 14.77931973. So the set of values of x̄80 for which we’d reject H0 (in favour of the
owner’s new claim) is [0, 14.77931973).
(a) (ii)
(b) (i)
Answer to Exercise 196 (8864 N2009/I/7). (i) P(A ∩ B) = P(A) + P(B) − P(A ∪ B) =
1/3 + 2/5 − 17/30 = 1/6.
(ii) P(A)P(B) = 2/15 is not equal to P(A ∩ B) = 1/6, so A and B are not independent.
(iii) P(A′ ∪ B) = 1 − [P(A) − P(A ∩ B)] = 1 − (1/3 − 1/6) = 5/6.
Answer to Exercise 197 (8864 N2009/I/8). Let X ∼ N (120, 18²) be the lifetime of a
component.
(i) P(X > 144) ≈ 0.09121122 (calculator).
(ii) P (X1 < 144) P (X2 > 144) + P (X1 > 144) P (X2 < 144) ≈ 0.16578346669.
(iii) Let X ∼ N (µ, 18²) be the new lifetime of a component. The sample mean is X̄50 ∼
N (µ, 18²/50). The competing hypotheses are H0 ∶ µ = 120 and HA ∶ µ > 120. The p-value is
so we fail to reject H0 . This fails to provide evidence in favour of the company’s claim.
(ii) r ≈ 0.9306540721 is fairly large and positive, suggesting a fairly strong positive linear
correlation between x and y.
(iii) y = 0.01232906764x + 15.48661792.
(ii) Let Y ∼ B(10, 0.8 ⋅ 0.15) be the number (out of ten) who get a distinction. P(Y < 2) ≈
0.658 (calculator).
(iii) Let A ∼ B(50, 0.2) be the number (out of 50) who fail. By the CLT, A is well-
approximated by B ∼ N(10, 8). So
[This is fairly close to the exact probability P(A ≤ 12) ≈ 0.814 (calculator).]
$\bar{x} = \frac{5320}{120} + 1000 = 1044\frac{1}{3}, \quad s^2 = \frac{8282000 - \frac{5320^2}{120}}{120 - 1} \approx 67614.6778711.$
$P\left(\bar{X}_{72} > 1044\tfrac{1}{3} \text{ or } \bar{X}_{72} < 955\tfrac{2}{3} \mid H_0\right) = 2P\left(\bar{X}_{72} < 955\tfrac{2}{3} \mid H_0\right) \approx 0.06180786.$
Answer to Exercise 201 (8864 N2009/I/12). (a) Let X ∼ N (µ, σ²) be the mass of a
plum.
$P(X < 22) = P(Z < (22 - \mu)/\sigma) = 0.3 \iff (22 - \mu)/\sigma \overset{1}{\approx} -0.524400513.$
$P(X > 29) = P(Z > (29 - \mu)/\sigma) = 0.2 \iff (29 - \mu)/\sigma \overset{2}{\approx} 0.841621234.$
$(29 - \mu) - (22 - \mu) = 7 = 0.841621234\sigma - (-0.524400513)\sigma = 1.366021747\sigma \iff \sigma \approx 5.124.$
And µ ≈ 24.687.
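The two-equation method above (convert each probability to a z-value, then solve the pair of linear equations in µ and σ) can be sketched with the standard library's inverse normal CDF. The variable names are illustrative.

```python
from statistics import NormalDist

std = NormalDist()           # standard normal Z ~ N(0, 1)

# P(X < 22) = 0.3  =>  (22 - mu) / sigma = z1
# P(X > 29) = 0.2  =>  P(X < 29) = 0.8  =>  (29 - mu) / sigma = z2
z1 = std.inv_cdf(0.3)
z2 = std.inv_cdf(0.8)

# Subtracting the two equations eliminates mu: 29 - 22 = (z2 - z1) * sigma
sigma = (29 - 22) / (z2 - z1)
mu = 22 - z1 * sigma

# Sanity check: the fitted distribution reproduces the given probabilities.
X = NormalDist(mu, sigma)
print(round(sigma, 3), round(mu, 3), round(X.cdf(22), 3), round(1 - X.cdf(29), 3))
```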
(b) (i) Let A ∼ N (0.15, 0.03²) and N ∼ N (0.07, 0.02²) be the masses of an apple and a
nectarine.
A1 + A2 − (N1 + N2 + N3 + N4 ) ∼ N (2 ⋅ 0.15 − 4 ⋅ 0.07, 2 ⋅ 0.03² + 4 ⋅ 0.02²) = N (0.02, 0.0034).
P (A1 + A2 > (N1 + N2 + N3 + N4 )) = P (A1 + A2 − (N1 + N2 + N3 + N4 ) > 0) ≈ 0.634 (calculator).
(b) (ii) 9 (A1 + A2 ) + 12 (N1 + N2 + N3 + N4 ) is the random variable with distribution
Answer to Exercise 203 (8864 N2008/I/8). (i) C(6, 3) ⋅ 0.6³ ⋅ 0.4³ = 0.27648.
(ii) Let X ∼ B(40, 0.6) be the number that are crusty. By the CLT, X is well-approximated
by Y ∼ N(24, 9.6). So
[This is fairly close to the exact probability P(X ≥ 20) ≈ 0.926 (calculator).]
(iii) Let M ∼ N (1.24, σ 2 ) be the mass of a loaf. P(M < 1) = P(Z < (1 − 1.24)/σ) = 0.04
⇐⇒ (1 − 1.24)/σ = −1.750686071 ⇐⇒ σ ≈ 0.137.
(ii) If Tan’s pen is red, then there are 2 red pens, 5 blue pens, and 1 green pen in the box
when Mui gets a randomly-chosen pen. So the probability that Mui’s pen is blue is 5/8.
(iii) If Tan’s pen is red, then there are 2 red pens, 5 blue pens, and 1 green pen in the box
when Mui gets a randomly-chosen pen; and Mui gets a red pen with probability 2/8. If
Tan’s pen is blue, then there are 3 red pens, 4 blue pens, and 1 green pen in the box when
Mui gets a randomly-chosen pen; and Mui gets a red pen with probability 3/8. Altogether
then, her probability of getting a red pen is (3/8)(2/8) + (5/8)(3/8) = 21/64.
(iv) Mui’s pen is blue with probability 1 − 21/64 − 1/8 = 35/64.
Tan’s pen is red and Mui’s pen is blue with probability (3/8)(5/8) = 15/64.
Thus, the desired conditional probability is (15/64)/(35/64) = 3/7.
$\bar{x}_{70} = \frac{10317}{70} \approx 147.385714286, \quad s^2 = \frac{1540231 - \frac{10317^2}{70}}{70 - 1} \approx 284.820082816.$
The p-value is the probability of getting a test statistic that is at least as extreme as that
actually observed. It is: P (X̄70 < x̄70 ∣H0 ) ≈ 0.09748170.
(ii) The sample mean is W̄120 ∼ (µ, σ 2 /120). By the CLT, we have approximately W̄120 ∼
N (µ, s2w /120). The observed sample mean and observed sample variance are, respectively,
$\bar{w}_{120} = \frac{10317 + 7331}{70 + 50} = 147\frac{1}{15}, \quad s_w^2 = \frac{1540231 + 1100565 - \frac{(10317 + 7331)^2}{70 + 50}}{70 + 50 - 1} \approx 381.205602241.$
The p-value P (W̄120 < w̄120 ∣H0 ) ≈ 0.04990429 is less than 10%, so we are able to reject H0 .
(17, 343.75)
Answer to Exercise 208 (8864 N2007/I/6). Let M ∼ N (502, 0.8²) be the mass of
margarine in a packet. (i) P(M < 500) ≈ 0.00621 (calculator).
(ii) The new mass of margarine in a packet is M ∼ N (µ, 0.8²). P(M < 500) = P(Z <
(500 − µ)/0.8) = 0.001 ⇐⇒ (500 − µ)/0.8 ≈ −3.090232306 ⇐⇒ µ ≈ 502.4721858.
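$P(\mu - \sigma < X < \mu + \sigma) = P\left(\frac{3}{2} - \sqrt{\frac{9}{8}} < X < \frac{3}{2} + \sqrt{\frac{9}{8}}\right) = P(X = 1 \text{ or } X = 2)$
$= C(6, 1)\left(\frac{1}{4}\right)^{1}\left(\frac{3}{4}\right)^{5} + C(6, 2)\left(\frac{1}{4}\right)^{2}\left(\frac{3}{4}\right)^{4} = \frac{6\cdot 243 + 15\cdot 81}{4^6} = \frac{2673}{4096} \approx 0.65.$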
Answer to Exercise 212 (8864 N2007/I/10). (i) Unbiased estimates of the population
mean and variance are
$\bar{x} = \frac{-35.8}{50} + 500 = 499.284, \quad s^2 = \frac{150.5 - \frac{(-35.8)^2}{50}}{50 - 1} \approx 2.54831020408.$
(ii) The sample mean is X̄50 ∼ N (µ, σ 2 /50). We can use s2 as an unbiased estimate for σ 2 .
The competing hypotheses are H0 ∶ µ = 500 and HA ∶ µ < 500. And so the p-value is
so we can reject H0 .
(iii) No, the sample size was large enough that we could have used the CLT.
Answer to Exercise 213 (8864 N2007/I/11). (i) (a) P(M ) = (18 + 48 + 6)/120 = 3/5.
(i) (b) P(M ∩ G) = 18/120 = 3/20.
(i) (c) P(M ∪ B) = (18 + 48 + 6 + 22)/120 = 47/60.
(i) (d) P(M ∣R′ ) = (18 + 48)/(18 + 48 + 12 + 22) = 66/100 = 0.66.
(ii) P(M )P(G) = (3/5) (30/120) = 3/20 is equal to P(M ∩ G) = 3/20; thus M and G are indeed
independent.
(iii) The number of blue cars with bicycle racks is 0.3 ⋅ 70 = 21.
The number of cars with bicycle racks is 0.2 ⋅ 30 + 0.3 ⋅ 70 + 0.05 ⋅ 20 = 6 + 21 + 1 = 28.
So the desired probability is 21/28 = 3/4.
(iii) M1 + M2 + ⋅ ⋅ ⋅ + M6 ∼ N (6 ⋅ 75, 6 ⋅ 12.5²) = N (450, 937.5). So P (M1 + M2 + ⋅ ⋅ ⋅ + M6 > 530) ≈
0.00449034 (calculator).
(iv) The weights of the hotel guests are probably not independent.
The distribution of weights of the hotel guests may differ from that of the population.
(ii) Systematic.
(iii) Use a computer random number generator to generate, for each member, a number
between 0 and 1. Take the x members with the largest numbers, where x is his desired
sample size.
(iv) (a) 25 women.
(iv) (b) 15 men from squash.
Answer to Exercise 217 (8174 N2006/II/13). (i) Let X ∼ B(12, 0.3) be the number
of residents (out of 12) who watch the programme. P(X = 4) ≈ 0.231.
(ii) Let Y ∼ B(80, 0.3) be the number of residents (out of 80) who watch the programme.
By the CLT, Y is well-approximated by A ∼ N(24, 16.8). So
[This is fairly close to the exact probability P(20 < Y < 30) ≈ 0.711 (calculator).]
www.EconsPhDTutor.com
Or simply email:
DrChooYanMin@gmail.com