H1 Mathematics Textbook (Choo Yan Min) PDF

H1 Mathematics Textbook
CHOO YAN MIN
Includes TYS & Answers.

Covers 8865 (revised) syllabus.
This version: 1st April 2017.
The latest version will always be at this link.
This textbook was first completed in August 2016.
Since then, only small changes (usually corrections of typos) have been made.
Page 2, Table of Contents www.EconsPhDTutor.com

Errors? Feedback? Email me!
With your help, I plan to keep improving this textbook.

This book is licensed under the Creative Commons license CC-BY-NC-SA 4.0.
You are free to:
• Share — copy and redistribute the material in any medium or format

• Adapt — remix, transform, and build upon the material
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
• Attribution — You must give appropriate credit, provide a link to the license, and
indicate if changes were made. You may do so in any reasonable manner, but not in any
way that suggests the licensor endorses you or your use.
• NonCommercial — You may not use the material for commercial purposes.
• ShareAlike — If you remix, transform, or build upon the material, you must distribute
your contributions under the same license as the original.
• No additional restrictions — You may not apply legal terms or technological measures
that legally restrict others from doing anything the license permits.
Notices:
You do not have to comply with the license for elements of the material in the public domain
or where your use is permitted by an applicable exception or limitation. No warranties are
given. The license may not give you all of the permissions necessary for your intended use.
For example, other rights such as publicity, privacy, or moral rights may limit how you use
the material.
Author: Choo, Yan Min.

Title: H1 Mathematics Textbook.
ISBN: 978-981-11-0755-9 (e-book).

The first thing to understand is that mathematics is an art.
- Paul Lockhart (2009, A Mathematician’s Lament, p. 22).
A mathematician, like a painter or a poet, is a maker of patterns. If his patterns are
more permanent than theirs, it is because they are made with ideas. ... Beauty is the
first test: there is no permanent place in the world for ugly mathematics.
- G.H. Hardy (1940 [1967], A Mathematician’s Apology, pp. 84-85).
The scientist does not study nature because it is useful to do so. He studies it because
he takes pleasure in it, and he takes pleasure in it because it is beautiful.
- Henri Poincaré (1908 [1914], Science and Method, English trans., p. 22).

About This Book
This textbook is for Singaporean H1 Maths students. It is based exactly on the revised
(8865) syllabus, which will be examined for the first time only in 2017.¹
I assume that if you’re an H1 Maths student, you
• have passed O-Level Mathematics;

• may or may not have taken O-Level Additional Mathematics;
• are somewhat weaker or less interested in maths than the average H2 Maths student;
• want to learn or do the minimum amount of maths necessary to get an A;
• won’t be studying such subjects as mathematics, physics, engineering, or economics at
university.
This textbook is thus written simply and non-rigorously. For example, there are few formal
definitions or proofs.²
Simple and non-rigorous as this textbook may be, I fully intend that a careful study of this
textbook (complemented by a capable teacher) will easily earn you your A in H1 Maths.
For a comparison of H1 vs H2 maths, check out this brief 3-page document: H1 Maths vs
H2 Maths: What’s the Difference? Which Should I Take?
• FREE! This book is free. But if you paid any money for it, I certainly hope your money
is going to me! This book is free because:
1. It is a shameless advertising vehicle for my awesome tutoring services.

2. The marginal cost of reproducing this book is zero.
• DONATE! This book may be free, but donations are more than welcome! Donation
methods in footnote.³
It’s irrational for Homo economicus to donate. But please consider donating because:
1. You’re a nice human being [*emotional_manipulation*].

2. Your donations will encourage me and others to continue producing awesome free content
for the world.
¹ The old syllabus is 8864, to be examined for the last time in 2017. It is not very different from 8865.
² This is in contrast to my H2 Mathematics Textbook, which is an authoritative reference that the interested H1 Maths
student should look at. That H2 Mathematics Textbook covers the same topics (and more) as this textbook, but much
more rigorously and thoroughly. Indeed a good deal of this H1 Mathematics Textbook is simply a diluted version of that
H2 Mathematics Textbook.
³ Singapore. POSB Savings Account 174052271 or OCBC Savings Account 5523016383 (Name: Choo Yan Min).
International. Bitcoin wallet: 1GDGNAdGZhEq9pz2SaoAdLb1uu34LFwViz. PayPal ychoo@umich.edu (Name: Yan Min Choo,
USD preferred because this account was set up in the US). USA. Venmo link (Name: Yanmin Choo).

• HELP ME IMPROVE THIS BOOK! Feel free to email me if:
1. There are any errors in this book. Please let me know even if it’s something as trivial
as a spelling mistake or a grammatical error.
2. You have absolutely any suggestions for improvement.
3. Any part of this book is less than crystal clear.
Here’s an anecdote about Richard Feynman, the great teacher and physicist:
Feynman was once asked by a Caltech faculty member to explain why spin
1/2 particles obey Fermi-Dirac statistics. He gauged his audience perfectly
and said, “I’ll prepare a freshman lecture on it.” But a few days later he
returned and said, “You know, I couldn’t do it. I couldn’t reduce it to
the freshman level. That means we really don’t understand it.”
I agree: If you can’t explain something simply, you don’t understand it well enough.⁴ And
as a corollary, the best way to gauge whether you understand something is to see if you
can explain it simply to someone else.
If at any point in this textbook, you have read the same passage a few times, tried to reason
it through, and still find things confusing, then it is a failure on MY part. Please let me
know and I will try to rewrite it so that it’s clearer. (There is also the possibility that I
simply messed up! So please let me know if there’s anything confusing!)
I deeply value any feedback, because I’d like to keep improving this textbook
for the benefit of everyone! I am very grateful to all the kind folks who’ve already
written in, allowing me to rid this book of more than a few embarrassing errors.
• LyX rocks!
This book was written using LyX.⁵
• Is the font size big enough?
You’re probably reading this on some device. So I’ve tried to set the font sizes and stuff so
that one can comfortably read this on a device as small as a seven-inch tablet. It should
also be possible to read this on a phone, though somewhat less comfortably. (Please let me
know if you have any feedback about this!)
(I’ll probably be contacting some publishers to see if they want to do a print version of
this, for anyone who prefers it in print.)
⁴ This quote or some similar variant is often (mis)attributed to Einstein. But as Einstein himself once said, “73% of Einstein
quotes are misattributed.”
⁵ LaTeX is the typesetting program used by most economists and scientists. But LaTeX can be difficult to use. LyX is a
user-friendly GUI version of LaTeX. LyX has boosted my productivity by countless hours over the years and you should use
LyX too!

Tips for the Student
• Read maths slowly.
Reading maths is not like reading Harry Potter. Most of Harry Potter is fluff. There is
little fluff in maths.
So go slowly. Dwell upon and carefully consider every sentence in this textbook. Make sure
you completely understand what each statement says and why it is true. Reading maths
is very different from reading any other subject matter.
If you don’t quite understand some material, you might be tempted to move forward anyway.
Don’t. In maths, later material usually builds on earlier material. So if you simply move
forward, this will usually cost you more time and frustration in the long run.
Better then to stop right there. Keep working on it until you “get” it. Ask a friend or
a teacher for help. Feel free to even email me! (I’m always interested to know what the
common points of confusion are and how I can better clear them up.)
• Examples and exercises are your best friends. So work through them.
A good stock of examples, as large as possible, is indispensable for a
thorough understanding of any concept, and when I want to learn something
new, I make it my first job to build one.
- Paul Halmos (1983, Google Books).
Work through all the examples and exercises. Merely moving your eyeballs is not the same
as working. Working means having pencil and paper by your side and going through each
example/exercise word-by-word, line-by-line.
For example, I might say something like “x² − y² = 0. Thus, (x − y)(x + y) = 0.” If it’s not
obvious to you why the first sentence implies the second, stop right there and work on it
until you understand why. Don’t just let your eyeballs fly over these sentences and pretend
that your brain is “getting” it.
I will often not bother to explain some steps, especially if they simply involve some simple
algebra.
• You get a List of Formulae during the A-level exam.
It’s called List of Formulae MF26. It’s available at this link (MF26). (I cannot guarantee
though that your JC will give you the List during your JC common tests and exams.)

• Online Calculators
Google is probably the quickest for simple calculations. Type a calculation into your
browser’s Google search bar and the answer will instantly show up.
Wolfram Alpha is somewhat more advanced (but also slower). Enter “sin x” for example
and you’ll get graphs, the derivative, the indefinite integral, the Maclaurin series, and a
bunch of other stuff you neither know nor care about.
The Derivative Calculator and the Integral Calculator are probably unbeatable for the
specific purposes of differentiation and integration. Both give step-by-step solutions for
anything you want to differentiate or integrate.
Here is a collection of spreadsheets I made. These spreadsheets are for doing tedious and
repetitive calculations that H2 Maths students (and hence also H1 Maths students) will
often encounter. As with anything I do, I welcome any feedback you may have about
these spreadsheets. Perhaps in the future I will make a more attractive version of them.
(Instructions: Click “Make a copy” to open up your own independent copy of
this spreadsheet. Enter your input in the yellow cells. Output is produced in
the blue cells. If you mess up anything, simply click the same link and “Make
a copy” again.)

Use of Graphing Calculators
You are required to know how to use a graphing calculator.⁶
This textbook will give only a very few examples involving graphing calculators.
There is no better way of learning to use it than to play around with it yourself. By the
time you sit down for your A-level exams, you should have had plenty of practice with it.
You can also use any of the seven calculators in the list below (last updated by SEAB
on March 1st, 2016, PDF). But this textbook will stick with the TI-84 PLUS Silver Edition
(which I’ll simply call the TI84). (My understanding is that most students use a TI
calculator and that the five approved TI calculators are pretty similar.)
I’ll always start each example with the calculator freshly reset.
⁶ Pretty bizarre that in this age of the smartphone, they want you to learn how to use these clunky and now-useless devices
from the ’80s and ’90s. It is the equivalent of learning to program a VCR.
IMHO it’d be much better to teach you some simple programming or Excel (or whatever spreadsheet program). “B-b-but
... how would such learning be tested in an exam format?” Ay, there’s the rub. In the Singapore education system, anything
that cannot be “examified” is not worth learning.

Contents
About This Book 6
Tips for the Student 8
Use of Graphing Calculators 10
I Functions and Graphs 17
1 Dividing By Zero 18
2 Functions 19
3 Graphs: Introduction 21
4 Graphs: Intercepts 23
5 Graphs: Turning Points 25
6 Quadratic Equations 27
7 Graphs: Asymptotes 32
8 Exponents: Laws 35
9 Exponents: Graphs 36
10 Exponential Growth and Decay 38
11 Logarithms: Introduction 42
12 Logarithms: Laws 43
13 Logarithms: Graphs 45
14 Logarithmic Growth 47

15 Graphs: Symmetry 49
16 Graphing with the TI84 51
17 Simultaneous Equations: One Linear and One Quadratic 53
18 Solving Equations Using Your TI84 56
19 Quadratic Inequalities 57
20 Solving Inequalities Using Your TI84 58
21 Formulating an Equation or a System of Linear Equations from a Problem Situation 61
II Calculus 62
22 Equations of Lines 63
23 The Derivative as Slope of the Tangent 65
24 Chain Rule 72
25 Increasing, Decreasing, and f ′ 74
26 Finding Turning Points (the First Derivative Test) 76
27 Inflexion Points 80
28 Finding Max/Min Points on the TI84 85
29 Finding the Derivative at a Point on the TI84 87
30 Connected Rates of Change Problems 88
31 Integration as the Reverse of Differentiation 90

32 The Constant of Integration 93
33 Basic Rules of Integration 94
34 The Definite Integral as the Area Under a Graph 97
35 Area between a Curve and Lines Parallel to Axes 101
36 Area between a Curve and a Line 102
37 Area between Two Curves 103
38 Finding Definite Integrals on your TI84 104
III Probability and Statistics 105
39 How to Count: Four Principles 106

39.1 How to Count: The Addition Principle . . . . . . . . . . . . . . . . . . . . . . . 107
39.2 How to Count: The Multiplication Principle . . . . . . . . . . . . . . . . . . . 110
39.3 How to Count: The Inclusion-Exclusion Principle . . . . . . . . . . . . . . . . 114

39.4 How to Count: The Complements Principle . . . . . . . . . . . . . . . . . . . . 116
40 How to Count: Permutations 117
40.1 Permutations with Repeated Elements . . . . . . . . . . . . . . . . . . . . . . . 121

40.2 Partial Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
40.3 Permutations with Restrictions . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
41 How to Count: Combinations 127

41.1 Pascal’s Triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
41.2 The Combination as Binomial Coefficient . . . . . . . . . . . . . . . . . . . . . 131

41.3 The Number of Subsets of a Set is 2ⁿ . . . . . . . . . . . . . . . . . . . . . . . . 134

42 Probability: Introduction 136
42.1 Mathematical Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
42.2 The Experiment as a Model of Scenarios Involving Chance . . . . . . . . . . . 138

42.3 Mutually Exclusive Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
42.4 Complementary Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
42.5 The Union of Two Events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

42.6 The Intersection of Two Events . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
42.7 Properties of Probabilities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
43 Probability: Conditional Probability 145
44 Probability: Independence 147
45 Probability: Not Everything is Independent 151
46 Random Variables: Introduction 153
47 Random Variables: Probability Distribution 154
48 Random Variables: Independence 158
49 Random Variables: Expectation 160

49.1 The Expectation Operator is Linear . . . . . . . . . . . . . . . . . . . . . . . . 163
50 Random Variables: Variance 165

50.1 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
50.2 Properties of the Variance Operator . . . . . . . . . . . . . . . . . . . . . . . . 172
51 The Binomial Distribution 174

51.1 Probability Distribution of the Binomial R.V. . . . . . . . . . . . . . . . . . . 175
51.2 The Mean and Variance of the Binomial Random Variable . . . . . . . . . . 176

52 The Continuous Uniform Distribution 178
52.1 The Continuous Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . 178
52.2 Important Digression: P (X ≤ k) = P (X < k) . . . . . . . . . . . . . . . . . . . 180
52.3 The Cumulative Distribution Function (CDF) . . . . . . . . . . . . . . . . . . 181
52.4 The Probability Density Function (PDF) . . . . . . . . . . . . . . . . . . . . . 182
53 The Normal Distribution 183

53.1 The Normal Distribution, in General . . . . . . . . . . . . . . . . . . . . . . . . 189
53.2 Sum of Independent Normal Random Variables . . . . . . . . . . . . . . . . . 198
54 The Central Limit Theorem and The Normal Approximation 202
55 Sampling 205
55.1 Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
55.2 Population Mean and Population Variance . . . . . . . . . . . . . . . . . . . . 206
55.3 Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
55.4 Distribution of a Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
55.5 A Random Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
55.6 Sample Mean and Sample Variance . . . . . . . . . . . . . . . . . . . . . . . . . 211
55.7 Sample Mean and Sample Variance are Unbiased Estimators . . . . . . . . . 217
55.8 The Sample Mean is a Random Variable . . . . . . . . . . . . . . . . . . . . . 220
55.9 The Distribution of the Sample Mean . . . . . . . . . . . . . . . . . . . . . . . 221
55.10 Non-Random Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
56 Null Hypothesis Significance Testing (NHST) 223

56.1 One-Tailed vs Two-Tailed Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
56.2 The Abuse of NHST (Optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . 230
56.3 Common Misinterpretations of the Margin of Error (Optional) . . . . . . . . 231
56.4 Critical Region and Critical Value . . . . . . . . . . . . . . . . . . . . . . . . . . 234
56.5 Testing of a Population Mean (Small Sample, Normal Distribution, σ² Known) . . . . . . . 236
56.6 Testing of a Population Mean (Large Sample, Any Distribution, σ² Known) . . . . . . . . . 238
56.7 Testing of a Population Mean (Large Sample, Any Distribution, σ² Unknown) . . . . . . . . 240
56.8 Formulation of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
57 Correlation and Linear Regression 243

57.1 Bivariate Data and Scatter Diagrams . . . . . . . . . . . . . . . . . . . . . . . . 243
57.2 Product Moment Correlation Coefficient (PMCC) . . . . . . . . . . . . . . . . 245
57.3 Correlation Does Not Imply Causation (Optional) . . . . . . . . . . . . . . . . 251
57.4 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
57.5 Ordinary Least Squares (OLS) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
57.6 TI84 to Calculate the PMCC and the OLS Estimates . . . . . . . . . . . . . . 259
57.7 Interpolation and Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
57.8 The Higher the PMCC, the Better the Model? . . . . . . . . . . . . . . . . . . 269
IV Ten-Year Series 271
58 Past-Year Questions for Section A: Pure Mathematics 272
59 Past-Year Questions for Section B: Prob. & Stats 287
V Answers to Exercises 313
60 Answers to Exercises in Part I: Functions and Graphs 314
61 Answers to Exercises in Part II: Calculus 330
62 Answers to Exercises in Part III: Probability and Statistics 340
63 Answers to Exercises in Part IV (2006-2015 A-Level Exams) 369

63.1 Answers for Ch. 58: Pure Mathematics . . . . . . . . . . . . . . . . . . . . . . 369
63.2 Answers for Ch. 59: Probability and Statistics . . . . . . . . . . . . . . . . . . 385

Part I
Functions and Graphs

1 Dividing By Zero
This chapter is a brief warning against a common mistake: dividing by 0. Students
have little trouble avoiding this mistake if the divisor is obviously a big fat 0. Instead,
students usually make this mistake when the divisor is an unknown constant or variable
that might be 0.
Example 1. Find the values of x for which x(x − 1) = (2x − 2)(x − 1).
Here’s the wrong solution: “Divide both sides by x − 1 to get x = 2x − 2. So x = 2.”
Here’s the correct solution: “Case #1. Suppose x − 1 = 0. Then the given equation is
satisfied. So x = 1 is one possible value for which x(x − 1) = (2x − 2)(x − 1). Case #2.
Now suppose x − 1 ≠ 0. So we can divide both sides by x − 1 to get x = 2x − 2. So x = 2.
Conclusion. The two possible values of x for which x(x − 1) = (2x − 2)(x − 1) are x = 1 and
x = 2.”
Moral of the story. Whenever you divide by a certain quantity, make sure it’s non-zero.
If you’re not sure whether it equals 0, then break up your analysis into two cases, as was
done in the above example: Case #1 — the quantity equals 0 (and see what happens
in this case); Case #2 — the quantity is non-zero (in which case you can go ahead and
divide).
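As a quick sanity check of Example 1, here is a minimal Python sketch that plugs a few candidate values into both sides of the equation. The helper names lhs and rhs are hypothetical, chosen only for this illustration. It suggests that dividing by x − 1 without checking loses the root x = 1.

```python
# Example 1: x(x - 1) = (2x - 2)(x - 1).
def lhs(x):
    return x * (x - 1)

def rhs(x):
    return (2 * x - 2) * (x - 1)

# Try a few integer candidates and keep those that satisfy the equation.
solutions = [x for x in range(-5, 6) if lhs(x) == rhs(x)]
print(solutions)  # [1, 2]
```

Trying integers only works here because both roots happen to be integers; in general you would still need the two-case algebra above.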
By the way, let’s take this opportunity to clear up another popular misconception — You
may have heard that 1/0 = ∞. This is wrong. 1/0 ≠ ∞. Instead, any non-zero number
divided by 0 is undefined.⁷ “Undefined” is the mathematician’s way of saying, “You haven’t
told me what you are talking about. So what you are saying is meaningless.”
Exercise 1. What’s wrong with this “proof” that 1 = 0? (Answer on p. 314.)
1. Let x, y be positive numbers such that x = y.

2. Square both sides: x² = y².
3. Rearrange: x² − y² = 0.
4. Factorise: (x − y)(x + y) = 0.
5. Divide both sides by x − y to get x + y = 0.
6. Since x = y, sub y = x into the above equation to get 2x = 0.
7. Divide both sides by 2x to get 1 = 0.
⁷ One exception is 0/0, which is indeterminate. This means that 0/0 is sometimes undefined, but can sometimes be defined
under certain circumstances.

2 Functions
Undoubtedly the most important concept in all of mathematics is that of a function
— in almost every branch of modern mathematics functions turn out to be the central
objects of investigation.
- Michael Spivak (1994 [2006], Calculus, p. 39).
Informally, a function is a rule that maps each input to exactly one output.
Example 2. Consider the function f defined by f(x) = x² + 5. The input is any real number
x, the corresponding output is the real number x² + 5. For example, f(3) = 3² + 5 = 14. In
words, we may say either of the following equivalent statements:
• f maps the input 3 to the output 14; or

• the value of f at 3 is 14.
Example 3. Consider the function g defined by g(x) = x/(x² + 1). The input is any
real number x, the corresponding output is the real number x/(x² + 1). For example,
g(3) = 3/(3² + 1) = 0.3. In words, we may say either of the following equivalent statements:
• g maps the input 3 to the output 0.3; or

• the value of g at 3 is 0.3.
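If it helps, the functions in Examples 2 and 3 can be sketched in Python, where a function is likewise a rule that returns exactly one output for each input. This is only an illustration, not part of the syllabus.

```python
def f(x):
    # Example 2: f maps x to x^2 + 5.
    return x**2 + 5

def g(x):
    # Example 3: g maps x to x / (x^2 + 1).
    return x / (x**2 + 1)

print(f(3))  # 14
print(g(3))  # 0.3
```

Note that the name f refers to the function itself, while f(3) is the value of f at 3.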
We will usually consider only functions whose inputs and outputs are real numbers. But
in general, this need not be the case. To illustrate this point, here are two examples.
Example 4. Consider the function h that maps each person’s name to the first letter of
that name. So for example, h(Lee Kuan Yew) = L. In words, we may say either of the
following equivalent statements:
• h maps the input Lee Kuan Yew to the output L; or

• the value of h at Lee Kuan Yew is L.
Another example: h(Barack Hussein Obama) = B. In words, we may say either of the
following equivalent statements:
• h maps the input Barack Hussein Obama to the output B; or
• the value of h at Barack Hussein Obama is B.

Example 5. Consider the function i that maps each building to the country of its location.
So for example, i(Burj Khalifa) = United Arab Emirates and i(Petronas Towers) =
Malaysia.
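Functions whose inputs and outputs are not numbers can be sketched the same way. In the Python illustration below, h is written as a rule and i as a lookup table. The table lists only the two buildings from Example 5 and is an assumption for illustration, not a complete definition of i.

```python
def h(name):
    # Example 4: map a person's name to the first letter of that name.
    return name[0]

# Example 5 as a lookup table: each building (input) has exactly one country (output).
i = {
    "Burj Khalifa": "United Arab Emirates",
    "Petronas Towers": "Malaysia",
}

print(h("Lee Kuan Yew"))          # L
print(h("Barack Hussein Obama"))  # B
print(i["Petronas Towers"])       # Malaysia
```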
Exercise 2. Let f(x) = 7x − 3. What are f(0), f(1), and f(2)? (Answer on p. 314.)
Exercise 3. Let g be the function that maps each country to its capital. What are
g(France) and g(Japan)? (Answer on p. 314.)
Students frequently believe that f(x) denotes a function. This is wrong.
f and f(x) refer to two different things.

f denotes a function. f(x) denotes the value of f at x.
This may seem like an excessively pedantic distinction. But maths is precise and pedantic.
In maths, what we mean is precisely what we say and what we say is precisely what we
mean. There is never any room for ambiguity or alternative interpretations.

3 Graphs: Introduction
A point is any ordered pair (x, y) of real numbers.

The graph of an equation is the set of points (x, y) that satisfy the equation.
Example 6. Consider the equation y = 2x + 3. Its graph is the set of points (x, y) that
satisfy the equation y = 2x + 3. For example, the point (x, y) = (0, 3) is in the graph of the
equation y = 2x + 3, because 3 = 2 ⋅ 0 + 3.
We can illustrate the graph of an equation in what is called the Cartesian plane. The
graph of y = 2x + 3 is drawn below. The point (0, 3) is marked in green.
[Figure: graph of y = 2x + 3.]
Example 7. Consider the equation y = x² − 1. Its graph is the set of points (x, y) that
satisfy the equation y = x² − 1. For example, the point (x, y) = (3, 8) is in the graph of the
equation y = x² − 1, because 8 = 3² − 1.
The graph of y = x² − 1 is drawn below. The point (3, 8) is marked in green.
[Figure: graph of y = x² − 1.]
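Since a graph is just a set of points, checking whether a point is in the graph amounts to checking whether it satisfies the equation. Here is a minimal Python sketch of that check for Examples 6 and 7. The helper name on_graph is hypothetical.

```python
def line(x):
    # Example 6: y = 2x + 3.
    return 2 * x + 3

def parabola(x):
    # Example 7: y = x^2 - 1.
    return x**2 - 1

def on_graph(equation, point):
    """Return True if the point (x, y) satisfies y = equation(x)."""
    x, y = point
    return y == equation(x)

print(on_graph(line, (0, 3)))      # True
print(on_graph(parabola, (3, 8)))  # True
print(on_graph(parabola, (3, 9)))  # False
```

The exact comparison with == is fine for these whole-number points; with decimals you would want to allow a small tolerance.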

The graph of a function f is the graph of the equation y = f(x).
Example 8. Consider the function f defined by f(x) = x² + 5. The graph of f is defined to
be the graph of the equation y = f(x); equivalently, it is the graph of the equation y = x² + 5.
The graph of f is drawn below. The point (2, 9) is marked in green.
[Figure: graph of f, i.e. of y = x² + 5.]
Example 9. Consider the function g defined by g(x) = x/(x² + 1). The graph of g is
defined to be the graph of the equation y = g(x); equivalently, it is the graph of the
equation y = x/(x² + 1).
The graph of g is drawn below. The point (2, 0.4) is marked in green.
[Figure: graph of g, i.e. of y = x/(x² + 1).]
Exercise 4. Graph the following equations: (i) y = 2x − 1; (ii) y = 1 − x². Mark the point
where x = 2. (Answer on p. 314.)
Exercise 5. Graph the following functions: (i) the function f defined by f(x) = 5 − 3x; (ii)
the function g defined by g(x) = 3x + x². Mark the point where x = 2. (Answer on p. 315.)

4 Graphs: Intercepts
A graph may intersect the vertical axis (also known as the y-axis). The y-coordinate
of any such intersection point is called a vertical intercept (or y-intercept).
A graph may also intersect the horizontal axis (also known as the x-axis). The x-coordinate of
any such intersection point is called a horizontal intercept (or x-intercept). Horizontal
intercepts are also called zeros or roots (of the corresponding equation or function). (We’ll
use the terms zeros and roots interchangeably in this textbook.)
Example 10. Graphed below is the equation y = 2x − 1. The graph has one horizontal
intercept, 0.5, and one vertical intercept, −1. Equivalently, the graph intersects the x-axis
at (0.5, 0); and the y-axis at (0, −1).
[Figure: graph of y = 2x − 1.]
We also call 0.5 the zero or root of the equation y = 2x − 1, because 2(0.5) − 1 = 0. That
is, x = 0.5 satisfies the equation y = 0 or 2x − 1 = 0.

Example 11. Graphed below is the function f defined by f(x) = x² − 1. The graph has
two horizontal intercepts, −1 and 1, and one vertical intercept, −1. Equivalently, the graph
intersects the x-axis at (−1, 0) and (1, 0); and the y-axis at (0, −1).
[Figure: graph of f, i.e. of y = x² − 1.]
We also call −1 and 1 the zeros or roots of the function f, because f(−1) = 0 and f(1) = 0.
That is, x = −1 or x = 1 satisfies the equation f(x) = 0.
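The intercepts in Examples 10 and 11 can be double-checked with a few lines of Python: the vertical intercept is the y-value at x = 0, and each horizontal intercept is an x-value at which y = 0. This is only a verification sketch; it does not find the intercepts for you.

```python
def line(x):
    # Example 10: y = 2x - 1.
    return 2 * x - 1

def f(x):
    # Example 11: f(x) = x^2 - 1.
    return x**2 - 1

# Vertical intercepts: the y-value where x = 0.
print(line(0), f(0))  # -1 -1

# Horizontal intercepts: x-values where y = 0.
print(line(0.5))    # 0.0
print(f(-1), f(1))  # 0 0
```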

5 Graphs: Turning Points
A turning point of a graph is either a maximum turning point or a minimum
turning point.
Example 12. Graphed below are the functions f and g defined by f(x) = (x + 1)² and
g(x) = −(x − 1)².

The graph of f has a minimum turning point, namely (−1, 0).
The graph of g has a maximum turning point, namely (1, 0).
Example 13. Graphed below is the equation y = x³ − 12x + 3.
This graph has a maximum turning point (−2, 19) and a minimum turning point (2, −13).

Example 14. Graphed below is the equation y = x + 1.
This graph has neither maximum nor minimum turning points.
Informally, a maximum turning point is a point where the y-value is greater than at all nearby points.
Similarly, a minimum turning point is a point where the y-value is smaller than at all nearby
points.
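The informal description above can be turned into a crude numerical check. The Python sketch below compares the y-value at a point with the y-values a small step to either side, using the equation from Example 13. The helper names and the step size of 0.01 are assumptions for illustration, and a check like this is evidence, not a proof.

```python
def y(x):
    # Example 13: y = x^3 - 12x + 3.
    return x**3 - 12 * x + 3

def is_local_max(func, x, step=0.01):
    # The y-value is greater than at points a small step to either side.
    return func(x) > func(x - step) and func(x) > func(x + step)

def is_local_min(func, x, step=0.01):
    # The y-value is smaller than at points a small step to either side.
    return func(x) < func(x - step) and func(x) < func(x + step)

print(y(-2), is_local_max(y, -2))  # 19 True
print(y(2), is_local_min(y, 2))    # -13 True
```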

6 Quadratic Equations
Section A of the A-level exams will have some questions about quadratic equations.
In theory, you should have completely mastered quadratic equations from your study of
O-Level Mathematics. In practice? Probably not. So this chapter reviews quadratic
equations.
Example 15. Below are the graphs of the equations y = x² + 3x + 1 (red), y = x² + 2x + 1
(blue), y = x² + x + 1 (green), y = −x² + x + 1 (red dotted), y = −x² − 2x − 1 (blue dotted),
and y = −x² − x − 1 (green dotted).
Remark 1. You may have heard of the term parabola (plural: parabolae). Just so you
know, the graph of a quadratic equation is an example of a parabola. But don’t worry, the
word parabola will never show up on the A-level H1 Maths exam.

We now learn how to complete the square.
In general, (x + k)² = x² + 2kx + k². Thus,
$$\left(x + \frac{b}{2a}\right)^2 = x^2 + \frac{b}{a}x + \frac{b^2}{4a^2}.$$
Or rearranging:
$$x^2 + \frac{b}{a}x \overset{1}{=} \left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2}.$$
In a moment we’ll make use of $\overset{1}{=}$. Now let’s consider the quadratic equation y = ax² + bx + c.
Assume that a ≠ 0, otherwise the equation simplifies to y = bx + c, which is just a straight
line. We now manipulate the quadratic expression ax² + bx + c. First, we divide by a (this
is allowed because of our assumption that a ≠ 0):
$$ax^2 + bx + c \overset{2}{=} a\left(x^2 + \frac{b}{a}x + \frac{c}{a}\right).$$
Now plug $\overset{1}{=}$ into $\overset{2}{=}$ to get:
$$ax^2 + bx + c = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2}{4a^2} + \frac{c}{a}\right] = a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right].$$
What we just did above is called completing the square. We can now compute the zeros
of the equation y = ax² + bx + c.
$$\begin{aligned}
ax^2 + bx + c = 0 &\iff a\left[\left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2}\right] = 0 \\
&\iff \left(x + \frac{b}{2a}\right)^2 - \frac{b^2 - 4ac}{4a^2} = 0 \\
&\iff \left(x + \frac{b}{2a}\right)^2 = \frac{b^2 - 4ac}{4a^2} \\
&\iff x + \frac{b}{2a} = \frac{\pm\sqrt{b^2 - 4ac}}{2a} \\
&\iff x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.
\end{aligned}$$
This last expression solves ax² + bx + c = 0. This expression will NOT be printed in the
A-Level List of Formulae! So be sure you remember it!
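To see the formula in action, here is a minimal Python sketch that applies it directly. The function name solve_quadratic is hypothetical, and returning an empty list when b² − 4ac is negative is a design choice made for this illustration (there are then no real roots).

```python
import math

def solve_quadratic(a, b, c):
    """Return the real roots of ax^2 + bx + c = 0 as a sorted list."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    disc = b**2 - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    # A set removes the duplicate when both roots coincide.
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

print(solve_quadratic(1, 2, 1))   # [-1.0]
print(solve_quadratic(1, 0, -1))  # [-1.0, 1.0]
print(solve_quadratic(1, 1, 1))   # []
```

The three calls correspond to x² + 2x + 1 = 0, x² − 1 = 0, and x² + x + 1 = 0.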

We can distinguish between six categories of quadratic equations, based on the signs of a
(the coefficient of x²) and b² − 4ac (the discriminant). Each of these six categories is
illustrated in the figure below (reproduced from above).
The properties of quadratic equations are summarised in the following table and discussed
on the next page.
Category                   Features
1. a > 0, b² − 4ac > 0     ∪-shaped. Intersects the x-axis at two points.
2. a > 0, b² − 4ac = 0     ∪-shaped. Just touches the x-axis at the minimum point.
3. a > 0, b² − 4ac < 0     ∪-shaped. Doesn’t intersect the x-axis.
4. a < 0, b² − 4ac > 0     ∩-shaped. Intersects the x-axis at two points.
5. a < 0, b² − 4ac = 0     ∩-shaped. Just touches the x-axis at the maximum point.
6. a < 0, b² − 4ac < 0     ∩-shaped. Doesn’t intersect the x-axis.

• The vertical intercept of the graph of a quadratic equation is always simply c. This
is because plugging x = 0 into ax² + bx + c yields c.
• The sign of a.
– If a > 0, then the graph is ∪-shaped and has a minimum turning point at x = −b/(2a).
– Conversely, if a < 0, then the graph is ∩-shaped and has a maximum turning point at
x = −b/(2a).
• The sign of the discriminant b2 −4ac. This name makes sense, because the discriminant
helps us discriminate between several possible cases of the equation ax2 + bx + c = 0:
– If b2 − 4ac > 0, then:

∗ There are two real roots (or zeros or horizontal intercepts), namely
√
−b ± b2 − 4ac
.
2a
∗ Moreover, we can write
√ √
−b + b2 − 4ac −b + b2 − 4ac
ax2 + bx + c = (x − ) (x + ).
2a 2a
What we have just done is to factorise the expression ax2 + bx + c. Factorisation is often
a useful trick to play. Notice that if you plug in either of the roots into the right hand
side (RHS) of the above equation, we do indeed get zero, as expected.
– If $b^2 - 4ac = 0$, then:
∗ There is only one real root (or zero or horizontal intercept), namely $-b/(2a)$.
∗ Moreover, we can write
\[ ax^2 + bx + c = a\left(x - \frac{-b}{2a}\right)^2 = a\left(x + \frac{b}{2a}\right)^2. \]
∗ Notice that if we plug $x = -b/(2a)$ into the RHS of the above equation, we do indeed
get zero, as expected.
– If $b^2 - 4ac < 0$, then:
∗ There are no real roots (or zeros or horizontal intercepts).
∗ There is no way to factorise the expression $ax^2 + bx + c$ (at least without using complex
numbers, which are not covered in H1 Maths).
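The three discriminant cases above can be sketched in a few lines of Python. This is only an illustration: the function name solve_quadratic is a made-up helper, not something from the syllabus or the TI84.

```python
import math

def solve_quadratic(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 as a sorted list.

    Hypothetical helper for illustration; a must be non-zero.
    """
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    disc = b * b - 4 * a * c  # the discriminant b^2 - 4ac
    if disc > 0:
        # Two real roots, given by the quadratic formula.
        root = math.sqrt(disc)
        return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])
    if disc == 0:
        # One real root, at the turning point x = -b/(2a).
        return [-b / (2 * a)]
    # Negative discriminant: no real roots.
    return []

print(solve_quadratic(1, -3, -4))  # positive discriminant
print(solve_quadratic(1, 6, 9))    # zero discriminant
print(solve_quadratic(2, 1, 1))    # negative discriminant
```

The three calls use one equation from each case, so the printed lists have two, one, and zero entries respectively.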

Exercise 6. For each of the following equations, sketch its graph and identify its intercepts
and turning points (if these exist). (a) $y = 2x^2 + x + 1$. (b) $y = -2x^2 + x + 1$. (c) $y = x^2 + 6x + 9$.
(Answer on p. 316.)
Exercise 7. (Answer on p. 317.)

(i) When does the quadratic equation $y = ax^2 + bx + c$ have (a) two real roots? (b) two equal
roots? (c) no real roots?
(ii) When is $ax^2 + bx + c$ (a) positive for all possible values of x? (b) negative for all possible
values of x?

7 Graphs: Asymptotes
A horizontal asymptote is a horizontal line of the form y = a.
Example 16. The graph below has horizontal asymptote y = 2, because as x grows infinitely
large (i.e. towards ∞), y grows ever closer to (but is never equal to) 2.
Example 17. The graph below has horizontal asymptote y = 2, because as x grows infinitely
small (i.e. towards −∞), y grows ever closer to (but is never equal to) 2.

A vertical asymptote is a vertical line of the form x = b.
Example 18. The graph below has vertical asymptote x = 3, because as x grows ever closer
to (but is never equal to) 3, y grows infinitely large (i.e. towards ∞).
Example 19. The graph below has vertical asymptote x = 3, because as x grows ever closer
to (but is never equal to) 3, y grows infinitely small (i.e. towards −∞).

Here are the informal definitions. A graph has a
• Horizontal asymptote y = a if:
As x grows ever larger or smaller (towards ∞ or −∞),

y grows ever closer to (but never equals) a.
• Vertical asymptote x = b if:
As x grows ever closer to (but never equals) b,

y grows ever larger or smaller (towards ∞ or −∞).

8 Exponents: Laws
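For all real numbers x, we have $x^1 = x$ and $x^0 = 1$. (See footnote 8.)

For all real numbers x, y, a, and b (provided any denominators are non-zero):
\[ x^a \cdot x^b = x^{a+b}, \qquad \left(\frac{x}{y}\right)^a = \frac{x^a}{y^a}, \]
\[ \frac{x^a}{x^b} = x^{a-b}, \qquad x^{-a} = \frac{1}{x^a}, \]
\[ (x^a)^b = x^{ab}, \qquad a^{1/b} = \sqrt[b]{a}, \]
\[ (xy)^a = x^a y^a, \qquad a^{c/b} = \sqrt[b]{a^c} = \left(\sqrt[b]{a}\right)^c. \]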
Exercise 8. (Answer on p. 317.) Simplify the two expressions below.
\[ \frac{5^{3x} \cdot 25^{1-x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})}, \qquad \frac{8^{x+2} - 34(2^{3x})}{(\sqrt{8})^{2x+1}}. \]
Exercise 9. (Answer on p. 318.) Is each of the following true? (If true, explain why. If
false, simply give a counterexample.)
(i) $x^{(a^b)} = x^{ab}$;
(ii) $(x^a)^b = x^{ab}$.
Footnote 8: By convention, $0^0$ is usually defined to be equal to 1; this textbook will follow this practice.

9 Exponents: Graphs
$e = 2.7182818\ldots$ is the constant known as Euler’s number. The significance of Euler’s
number will be revealed only later on, when we study calculus.
Example 20. The graphs below are of the equations $y = 2^x$, $y = 3^x$, $y = 3 \cdot 2^x$, $y = 2 \cdot 3^x$, and
$y = e^x$.
Each of these graphs has horizontal asymptote y = 0, because as x grows infinitely small
(i.e. to −∞), y grows ever closer to (but never equals) 0.

Example 21. The graphs below are of the functions f, g, h, and i defined by $f(x) = 4^x$,
$g(x) = 5^x$, $h(x) = 5 \cdot 4^x$, and $i(x) = 4 \cdot 5^x$.
Each of these graphs has horizontal asymptote y = 0, because as x grows infinitely small
(i.e. to −∞), y grows ever closer to (but never equals) 0.
Exercise 10. Graph (on the same diagram) the following equation and function: (i) $y = 6^x$;
(ii) f defined by $f(x) = 7^x$. (Answer on p. 318.)

10 Exponential Growth and Decay
Exam Tip
The topic of exponential growth and decay is new to the 8865 syllabus. So there are no
TYS questions covering this topic.
Informally, a quantity exhibits exponential growth if its growth rate is increasing. (See footnote 9.)
Example 22. Bacteria in a petri dish double in weight every 10 seconds. Let t be the time
(in seconds) and let $b_t$ be the total weight (in micrograms) of the bacteria in the petri dish at time t.
Initially, there were 7 micrograms of bacteria. The graph below is of $b_t$ against t.
In general, a quantity $y_t$ that grows exponentially takes the following form:
\[ y_t = y_0 \cdot 2^{t/d}, \]
where $y_0$ is the quantity at time t = 0, and d is the number of units of time it takes for the
quantity to double.
So in this case, we have $b_t = 7 \cdot 2^{t/10}$.
Footnote 9: More precisely, its growth rate is proportional to the current magnitude of the quantity.

Example 23. Rabbits in a forest double in population every 6 years. Let $r_t$ be the number
of rabbits at time t (years). Initially, there were 20 rabbits. So $r_t = 20 \cdot 2^{t/6}$ (graphed below).
Example 24. The number of Singapore citizens originally from the People’s Republic of
China doubles every 5 years. In the year 2000, there were 50,000 such citizens. Let $c_y$ be
the total number of such citizens in the year y. The graph below is of $c_y$ against y.
Even more generally than before, a quantity $x_t$ that grows exponentially takes the following
form: $x_t = x_T \cdot 2^{(t-T)/d}$, where $x_T$ is the quantity at time t = T, and d is the number of units
of time it takes for the quantity to double.
So in this case, we have $c_y = 50000 \cdot 2^{(y-2000)/5}$. Extrapolating, there’ll be $c_{2030} = 3,200,000$
such citizens in the year 2030.

Example 25. The Prime Minister’s salary doubles every 12 years. In the year 2000,
the PM’s annual salary was $2 million. Let $s_y$ be the PM’s annual salary (in millions of
Singapore dollars) in the year y. So $s_y = 2 \cdot 2^{(y-2000)/12}$ (graphed below).
Extrapolating, the PM’s salary will be $s_{2060} = 64$ million Singapore dollars in 2060.
Exponential decay is simply negative exponential growth.
Example 26. The world population of panda bears halves every 20 years. In the year
1960, there were 64,000 panda bears. Let $p_y$ be the total number of panda bears in the
year y. So $p_y = 64000 \cdot 0.5^{(y-1960)/20}$ (graphed below).
Extrapolating, there will be $p_{2020} = 8,000$ pandas in 2020.

Example 27. The number of ethnic-Malay Singapore citizens halves every 10 years. In the
year 1990, there were 800,000 such citizens. Let $m_y$ be the total number of such citizens
in the year y. So $m_y = 800000 \cdot 0.5^{(y-1990)/10}$ (graphed below).
Extrapolating, there will be $m_{2020} = 100,000$ such citizens in the year 2020.
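As a quick check on the extrapolations in the examples above, here is a minimal Python sketch of the general form, in which the quantity is multiplied by a fixed factor every d units of time. The function name quantity_at and its parameter names are assumptions made for this illustration only.

```python
def quantity_at(t, x_T, T, d, factor=2.0):
    """Quantity at time t, given the quantity x_T at time T.

    The quantity is multiplied by `factor` every d units of time:
    factor=2.0 models doubling (growth), factor=0.5 models halving (decay).
    Hypothetical helper for illustration.
    """
    return x_T * factor ** ((t - T) / d)

# Example 24: doubles every 5 years, 50,000 in the year 2000.
print(quantity_at(2030, 50000, 2000, 5))
# Example 25: doubles every 12 years, 2 (million dollars) in the year 2000.
print(quantity_at(2060, 2, 2000, 12))
# Example 26: halves every 20 years, 64,000 in the year 1960.
print(quantity_at(2020, 64000, 1960, 20, factor=0.5))
```

Treating decay as growth with factor 0.5 mirrors the remark that exponential decay is simply negative exponential growth.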
Exercise 11. Let $b_y$ be the number of Singaporean billionaires in the year y. This number
doubles every 7 years. In 1990, there were 4 Singaporean billionaires. (Answer on p. 319.)
(i) Write down an equation that expresses $b_y$ in terms of y.
(ii) Graph this equation.
(iii) Extrapolating, how many Singaporean billionaires will there be in 2025?

11 Logarithms: Introduction
We define $a = \log_b c$ to be the number such that $b^a = c$. We call b the logarithmic base.
Example 28. $\log_2 8 = 3$, because $2^3 = 8$.
$\log_2 16 = 4$, because $2^4 = 16$.
$\log_3 9 = 2$, because $3^2 = 9$.
$\log_4 2 = 0.5$, because $4^{0.5} = 2$.
We define $\lg c = \log_{10} c$.
Example 29. $\lg 100 = 2$, because $10^2 = 100$.
$\lg 1000 = 3$, because $10^3 = 1000$.
$\lg 10 = 1$, because $10^1 = 10$.
$\lg 1 = 0$, because $10^0 = 1$.
Remark 2. The Singapore-Cambridge A-level exams write lg c to mean the base-10
logarithm of c, so that’s what we’ll stick to. But you should know that some other writers
(including most calculators) simply write log c to mean the same.
We define $\ln c = \log_e c$, where $e = 2.7182818\ldots$ is Euler’s number.
Example 30. $\ln e = 1$, because $e^1 = e$.
$\ln e^2 = 2$, because $e^2 = e^2$.

(i) Compute each of the following: $\ln (1/e^2)$, $\log_5 0.008$, $\lg 100000$.
(ii) Given the following, find the constants a, b, and c: $\log_a 16 = 4$, $\log_b 0.25 = -1$, and
$\log_c 5 = 1$.
(iii) Rewrite the following equations in log form: $y = 3^x$ and $5 = p^q$.
(iv) Rewrite the following equations in exponential form: $\alpha = \log_4 \beta$ and $\log_\gamma \delta = 17$.

12 Logarithms: Laws
For every valid base x (that is, x > 0 and x ≠ 1), we have $\log_x 1 = 0$, because $x^0 = 1$ (this was
stated in our discussion of the laws of exponents). And if c ≤ 0, then $\log_x c$ is undefined,
because there is no real number a such that $x^a \le 0$.
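Fact 1. Let a, b, x, y be positive numbers (with any number used as a logarithmic base not equal to 1). Then
(i) $\log_b b^x = x$,
(ii) $\log_b x + \log_b y = \log_b (xy)$,
(iii) $\log_b x - \log_b y = \log_b \frac{x}{y}$,
(iv) $\log_b x^a = a \log_b x$,
(v) $\log_b x = \frac{\log_a x}{\log_a b}$,
(vi) $y = \ln x \iff e^y = x$.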
Proof. (Optional.) (i) is immediate from the definition of logarithms.
(ii) By (i), $x = b^{\log_b x}$ and $y = b^{\log_b y}$. Hence, $xy = b^{\log_b x} b^{\log_b y} = b^{\log_b x + \log_b y}$. Apply $\log_b$ to both
sides of this equation to get $\log_b (xy) = \log_b x + \log_b y$.
(iii) $\frac{x}{y} = \frac{b^{\log_b x}}{b^{\log_b y}} = b^{\log_b x - \log_b y}$. Apply $\log_b$ to both sides of this equation to get
$\log_b \frac{x}{y} = \log_b x - \log_b y$.
(iv) By (i) and the laws of exponents, $x^a = \left(b^{\log_b x}\right)^a = b^{a \log_b x}$. Apply $\log_b$ to both sides of this equation to get
$\log_b x^a = a \log_b x$.
(v) By (i), $x = b^{\log_b x}$. Plugging this into the RHS and using also (iv), we have
\[ \frac{\log_a x}{\log_a b} = \frac{\log_a b^{\log_b x}}{\log_a b} = \frac{\log_b x \cdot \log_a b}{\log_a b} = \log_b x. \]
(vi) is immediate from (i). (Observe that $\ln x = \log_e x$.)

Examples to illustrate each of the above laws:
Example 31. $\log_2 2^3 = 3$ and $\log_5 5^{20} = 20$.
Example 32. $\ln 3 + \ln 4 = \ln 12$, $\log_2 5 + \log_2 7 = \log_2 35$, and $\lg 5 + \lg 3 = \lg 15$.
Example 33. $\ln 3 - \ln 4 = \ln (3/4)$, $\log_2 5 - \log_2 7 = \log_2 (5/7)$, and $\lg 5 - \lg 3 = \lg (5/3)$.
Example 34. $\ln 3^4 = 4 \ln 3$, $\log_2 11^7 = 7 \log_2 11$, and $\lg 5^3 = 3 \lg 5$.
Example 35. $\log_2 5 = \lg 5 / \lg 2 = \ln 5 / \ln 2 = \log_3 5 / \log_3 2$. Indeed, $\log_2 5 = \log_a 5 / \log_a 2$ for
any positive number a other than 1.
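The laws of logarithms can also be spot-checked numerically. The sketch below uses Python's standard math module; the helper log_base and the particular test values are arbitrary choices for illustration, and a numerical check of this kind is no substitute for the proof.

```python
import math

def log_base(b, x):
    """Logarithm of x to base b, computed via the change-of-base law (v)."""
    return math.log(x) / math.log(b)

# Arbitrary positive test values (assumptions for this illustration).
x, y, a, b = 5.0, 7.0, 3.0, 2.0

checks = {
    "(i)   log_b(b^x) = x": math.isclose(log_base(b, b ** x), x),
    "(ii)  log_b x + log_b y = log_b(xy)": math.isclose(
        log_base(b, x) + log_base(b, y), log_base(b, x * y)),
    "(iii) log_b x - log_b y = log_b(x/y)": math.isclose(
        log_base(b, x) - log_base(b, y), log_base(b, x / y)),
    "(iv)  log_b(x^a) = a log_b x": math.isclose(
        log_base(b, x ** a), a * log_base(b, x)),
    "(v)   log_b x = lg x / lg b": math.isclose(
        log_base(b, x), math.log10(x) / math.log10(b)),
}

for law, holds in checks.items():
    print(law, holds)
```

Comparisons use math.isclose rather than == because floating-point logarithms are only approximate.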

(i) Simplify $\log_3 3^x$.
(ii) Find x if $2 \log_a 7 + 0.25 \log_a 81 - \log_a 3 = \log_a x$, where a is a positive constant.
(iii) Find y if $\ln(y - 1) + \ln y = 2$.

13 Logarithms: Graphs
Example 36. The graphs below are of the equations $y = \log_2 x$, $y = \log_3 x$, $y = \ln x$, and
$y = \lg x$.
Each of these graphs crosses the horizontal axis at the point (1, 0).
Moreover, each has vertical asymptote x = 0, because as x grows ever closer to (but never
equals) 0 from the right, y grows infinitely small (i.e. towards −∞).

Example 37. The graphs below are of the functions f, g, and h defined by $f(x) = \log_4 x$,
$g(x) = \log_5 x$, and $h(x) = \log_6 x$.
Each of these graphs crosses the horizontal axis at the point (1, 0).
Moreover, each of these graphs has vertical asymptote x = 0, because as x grows ever closer
to (but never equals) 0 from the right, y grows infinitely small (i.e. towards −∞).
Exercise 14. Graph (on the same diagram) the following equation and function: (i) $y = \log_7 x$;
(ii) f defined by $f(x) = \log_9 x$. (Answer on p. 320.)

14 Logarithmic Growth
Exam Tip
The topic of logarithmic growth is new to the 8865 syllabus. So there are no TYS questions
covering this topic.
Informally, a quantity $y_t$ exhibits logarithmic growth if its growth rate is decreasing. (See footnote 10.)

It can be written as $y_t = y_0 \ln t$, where $y_0$ is a constant, namely the quantity at time t = e (since $\ln e = 1$).
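Example 38. The nth harmonic number is
\[ H_n = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}. \]
For example, the first four harmonic numbers are
\[ H_1 = \frac{1}{1} = 1, \quad H_2 = \frac{1}{1} + \frac{1}{2} = 1.5, \quad H_3 = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} = 1.8333\ldots, \quad H_4 = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} = 2.0833\ldots \]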
It turns out that harmonic numbers grow logarithmically. In particular, a graph of the
harmonic numbers looks very similar to the graph of y = ln x (black dotted curve).
The harmonic numbers grow very slowly. For example, the first to exceed 10 is
\[ H_{12367} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{12367} \approx 10.000043. \]
Nonetheless and remarkably, the harmonic numbers grow forever (towards ∞)! In 1968, it
was shown (source) that the first harmonic number that exceeds 100 is:
\[ H_{15092688622113788323693563264538101449859497}. \]
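The claim about the first harmonic number to exceed 10 is small enough to check by brute force. The sketch below simply keeps adding terms until the running total passes the target; the function name is a made-up helper, and this approach is hopeless for a target of 100, where the index has 44 digits.

```python
def first_harmonic_exceeding(target):
    """Return (n, H_n) for the smallest n whose harmonic number H_n exceeds target.

    Brute-force sketch for illustration; only practical for small targets.
    """
    total = 0.0
    n = 0
    while total <= target:
        n += 1
        total += 1.0 / n  # add the next term 1/n
    return n, total

n, h_n = first_harmonic_exceeding(10)
print(n, h_n)
```

The running total is a floating-point approximation of the exact sum, which is accurate enough here because the margin by which the sum exceeds 10 is far larger than the rounding error.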
Footnote 10: A bit more precisely, the growth rate is inversely proportional to the time elapsed.

Example 39. A brain tumour initially grows rapidly, then its rate of growth slows down.
Its weight bt (grams) is graphed against time t (days).
The growth of the brain tumour appears logarithmic.
Example 40. Ah Kow is studying for the H1 Maths Exam. He takes a practice test every
day for 30 days. On Day #1, he gets nearly 0 points. His score improves rapidly initially,
but the rate of improvement slows down.
The growth in Ah Kow’s test scores appears to be roughly logarithmic.

15 Graphs: Symmetry
Informally, a graph is symmetric in a line if it is unchanged even after being reflected in
that line.
Example 41. The graph of y = x2 is symmetric in the line x = 0 (which also happens to
be the vertical axis).
[Figure: graph of $y = x^2$, with the reflection line x = 0 marked.]

Example 42. The graph of $y = \frac{1}{x}$ is symmetric in the lines y = x and y = −x.
[Figure: graph of $y = 1/x$, with the reflection lines y = x and y = −x marked.]
Exercise 15. Draw the graphs of each of the following equations. (a) $y = e^x$. (b) $y = 3x + 2$.
(c) $y = 2x^2 + 1$. Identify any intercepts, turning points, asymptotes, and lines of symmetry.
(Answers on pp. 321, 322, and 323.)

16 Graphing with the TI84
Here are our first examples involving a graphing calculator. As mentioned, all such examples
use a TI84.
Example 43. Graph the function f defined by f (x) = x2 .
1. Press ON to turn on your calculator.

2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n to enter “X”; then the x^2 button to enter the squared symbol.
4. Now press GRAPH and the calculator will graph the equation $y = x^2$.

Example 44. Graph the function g defined by $g(x) = \sqrt{x}$.

Most buttons on the TI84 have three different roles. Simply pressing a button executes the
role printed on the button itself. Pressing the blue 2ND and then a button executes the
role printed in blue above the button. And pressing the green ALPHA and then a button
executes the role printed in green above the button.
3. Press the blue 2ND button and then √ (which corresponds to the x^2 button) to
enter “√(”. Next press X,T,θ,n to enter “X”. (If we’d like, we can also enter the
right parenthesis ) to close the left parenthesis, but this is not necessary: the TI84
understands what you mean, even if you don’t enter the right parenthesis.)
4. Now press GRAPH and the calculator will graph the equation $y = \sqrt{x}$.
Exercise 16. Graph $y = e^x - x^2 + \sqrt{x}$ on your TI84. (Answer on p. 324.)

17 Simultaneous Equations: One Linear and One Quadratic
Example 45. Solve the following pair of simultaneous equations: $y = x + 5$ (equation 1) and $y = x^2 - 2x + 1$ (equation 2).
Plug equation 1 into equation 2: $x + 5 = x^2 - 2x + 1$. Rearrange to get $x^2 - 3x - 4 = 0$. We can factorise
$x^2 - 3x - 4 = (x - 4)(x + 1)$. So x = 4 or x = −1. Correspondingly, y = 9 or y = 4.
So there are two solutions to the given pair of simultaneous equations, namely (x, y) = (4, 9)
and (x, y) = (−1, 4).
We can also solve this using our TI84:

3. Press X,T,θ,n + 5 to enter “x + 5”.
4. Now press ENTER to go to the second line.
5. Press X,T,θ,n x^2 − 2 X,T,θ,n + 1 to enter “$x^2 - 2x + 1$”.
6. Now press blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
7. Press 5 to select the “intersect” option.
The TI84 now graphs both equations for you. It will now find the intersection points of
the two graphs. You entered only two equations, but it was possible that you entered more
than two. So just to be clear, the TI84 is now asking you, which are the two curves whose
intersection points you want? It first asks, “First curve?”
8. Simply press ENTER to confirm that you want y = x + 5 to be your first curve.
It now asks, “Second curve?” Again:
9. Simply press ENTER to confirm that you want $y = x^2 - 2x + 1$ to be your second curve.
It now asks, “Guess?” You can use the arrow keys to move the blinking cursor to close to
where you believe an intersection point will be. Here I won’t bother moving the blinking
cursor at all. Instead, I will simply
10. Press ENTER . The TI84 tells you that the nearest intersection point is (x, y) = (−1, 4).
11. To find the other intersection point, repeat steps #7 through #10, using the arrow keys
as is appropriate.
The TI84 tells you what the other intersection point is — it is (x, y) = (4, 9).
Example 46. Solve the following pair of simultaneous equations: $x + y = 0$ (equation 1) and $y = 3x^2 + x - 1$ (equation 2).
Rearrange equation 1 to $y = -x$ (equation 3). Plug equation 3 into equation 2 to get $-x = 3x^2 + x - 1$ or $0 = 3x^2 + 2x - 1$. Now use
the quadratic formula:
\[ x = \frac{-2 \pm \sqrt{2^2 - 4(3)(-1)}}{2(3)} = \frac{-2 \pm 4}{6} = -1, \frac{1}{3}. \]
Correspondingly, y = 1 or y = −1/3.
So there are two solutions to the given pair of simultaneous equations, namely (x, y) =
(−1, 1) and (x, y) = (1/3, −1/3). TI84 screenshots:
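The substitution method used in the two examples above can be sketched in Python as follows. The function name and its parameters (m and k for the line y = mx + k; a, b, c for the quadratic) are assumptions made for this illustration, and the sketch assumes the linear equation has first been rearranged into the form y = mx + k.

```python
import math

def intersect_line_parabola(m, k, a, b, c):
    """Solve y = m*x + k and y = a*x**2 + b*x + c simultaneously.

    Hypothetical helper for illustration. Returns a list of (x, y) pairs,
    sorted by x. Requires a != 0.
    """
    # Substituting the line into the quadratic gives
    # a*x^2 + (b - m)*x + (c - k) = 0.
    A, B, C = a, b - m, c - k
    disc = B * B - 4 * A * C
    if disc < 0:
        return []  # the line and the parabola do not meet
    root = math.sqrt(disc)
    xs = sorted({(-B - root) / (2 * A), (-B + root) / (2 * A)})
    # Recover each y from the (simpler) linear equation.
    return [(x, m * x + k) for x in xs]

print(intersect_line_parabola(1, 5, 1, -2, 1))   # y = x + 5 and y = x^2 - 2x + 1
print(intersect_line_parabola(-1, 0, 3, 1, -1))  # y = -x and y = 3x^2 + x - 1
```

Using a set for the x-values means a line that just touches the parabola yields a single solution rather than the same point twice.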

Exercise 17. (Answer on p. 325.) Solve each of the following pairs of simultaneous
equations, both with and without a graphing calculator:
(i) $x = 5y - 2$ (equation 1) and $x^2 = y - 5x + 3.6$ (equation 2).
(ii) $4x = 1 - y$ (equation 1) and $2x^2 + 3 = 5x - y$ (equation 2).

18 Solving Equations Using Your TI84
You are required to know how to use a graphing calculator to find the numerical solution
of equations (including systems of linear equations).
Example 47. Solve the system of equations $y = x^4 - x^3 - 5$, $y = \ln x$.

The method we learnt above was to graph both equations and then find their intersection
points.
Here I’ll use another method: First rewrite the two equations as a third equation
$y = x^4 - x^3 - 5 - \ln x$. Our goal is to find the horizontal intercepts of this equation, which will
in turn also be the x-coordinates of the solutions to the above set of equations.
Briefly, in the TI84:

1. Graph the equation $y = x^4 - x^3 - 5 - \ln x$.
It looks like there is only one horizontal intercept.
2. Zoom in.
3. Find the horizontal intercept using the “zero” option.
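Conclusion: The horizontal intercept visible on the graph is at x ≈ 1.8658. To
find the y-coordinate, we need merely plug this value of x into either of the equations in
the original set of equations: y = ln x = ln 1.8658 ≈ 0.6237. This gives the solution
(1.8658, 0.6237). (There is also a second solution very close to the vertical axis, at about
x ≈ 0.00674, y ≈ −5.0000, which is easy to miss on the graph: as x approaches 0, −ln x grows
without bound, so $y = x^4 - x^3 - 5 - \ln x$ must cross the horizontal axis once more between x = 0 and x = 1.)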
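For readers curious about what a “zero” search might look like, here is a minimal bisection sketch in Python. It is not a description of how the TI84 works internally (that is not stated here); the bracket from x = 1 to x = 3 is an assumption read off the graph, and the function names are made up for this illustration.

```python
import math

def f(x):
    # The "third equation" from Example 47: y = x^4 - x^3 - 5 - ln x.
    return x ** 4 - x ** 3 - 5 - math.log(x)

def bisect(func, lo, hi, tol=1e-10):
    """Find a zero of func between lo and hi by repeated halving.

    Requires func(lo) and func(hi) to have opposite signs.
    """
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign between lo and hi")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid  # the sign change is in the left half
        else:
            lo = mid  # the sign change is in the right half
    return (lo + hi) / 2

x = bisect(f, 1.0, 3.0)  # assumed bracket: f(1) < 0 and f(3) > 0
y = math.log(x)          # plug x back into y = ln x
print(round(x, 4), round(y, 4))
```

The left and right ends of the bracket play the same role as the “Left Bound” and “Right Bound” that the calculator asks for.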
Exercise 18. Using your graphing calculator, solve the following systems of equations.
(a) $y = \frac{1}{1 + \sqrt{x}}$, $y = x^5 - x^3 + 2$. (b) $y = \frac{1}{1 - x^2}$, $y = x^3 + \sin x$. (Answers on p. 326.)

19 Quadratic Inequalities
Example 48. Solve $x^2 + 3x - 1 > 0$.

$x^2 + 3x - 1$ is a ∪-shaped expression. By the quadratic formula, it equals 0 if and only if
\[ x = \frac{-3 \pm \sqrt{3^2 - 4(1)(-1)}}{2(1)} = \frac{-3 \pm \sqrt{13}}{2}. \]
Hence, it is positive if $x < (-3 - \sqrt{13})/2$ or $x > (-3 + \sqrt{13})/2$.
Example 49. Solve $2x^2 + 5x - 1 < 0$.

$2x^2 + 5x - 1$ is a ∪-shaped expression. By the quadratic formula, it equals 0 if and only if
\[ x = \frac{-5 \pm \sqrt{5^2 - 4(2)(-1)}}{2(2)} = \frac{-5 \pm \sqrt{33}}{4}. \]
Hence, it is negative if $(-5 - \sqrt{33})/4 < x < (-5 + \sqrt{33})/4$.
Example 50. Solve $x^2 + 3 > 0$.

$x^2 + 3$ is a ∪-shaped expression. Moreover, its discriminant $b^2 - 4ac = 0^2 - 4(1)(3) = -12$ is
negative. Hence it is always positive, for all values of x.
Exercise 19. (Answer on p. 327.) Solve the following inequalities:

(i) $x^2 + 3x - 5 > 6 - 2x^2$.
(ii) $(x - 3)(x + 5) < 1$.

20 Solving Inequalities Using Your TI84
Example 51. For what values of x is x > sin (0.5πx)?

Rewrite the inequality as x − sin(0.5πx) > 0. Graph y = x − sin(0.5πx) on your graphing
calculator. Our goal is to first find the horizontal intercepts of this equation; this will let
us solve for x > sin (0.5πx).
In the TI84:
3. Press X,T,θ,n − SIN 0 . 5 . To enter “π”, press the blue 2ND button and then π
(which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will
have entered “x − sin(0.5πx)”.
4. Now press GRAPH and the calculator will graph y = x − sin(0.5πx).
It looks like the horizontal intercepts are close to the origin. Let’s zoom in to see better.
5. Press the (ZOOM) button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Nothing seems to happen. But now press ENTER
and the TI will zoom in a little for you.
It looks like there are 3 horizontal intercepts. To find out what precisely they are, we’ll use
the TI84’s “zero” option.

7. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
8. Press 2 to select the “zero” option. This brings you back to the graph, with a cursor
flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s ZERO function works by you first specifying a “Left Bound” and a “Right Bound”
for x. TI84 will then check to see if there are any horizontal intercepts (i.e. values of x for
which y = 0) within those bounds.
9. Using the < and > arrow keys, move the blinking cursor until it is where you want your
first “Left Bound” to be. For me, I have placed it a little to the left of where I believe
the leftmost horizontal intercept to be.
10. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
11. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is
where you want your first “Right Bound” to be. For me, I have placed it a little to the
right of where I believe the leftmost horizontal intercept is.
12. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the horizontal intercept is. So go ahead and:
13. Press ENTER . TI84 now informs you that there is a “Zero” at “x = −1”, “y = 0” and
places the blinking cursor at precisely that point. This is the first horizontal intercept
we’ve found.
To find each of the other 2 horizontal intercepts, just repeat steps 7 through 13. You
should be able to find that they are at x = 0 and x = 1. Altogether, the 3 intercepts are
x = −1, 0, 1. Based on these and what the graph looks like, we conclude: x > sin (0.5πx)
⇐⇒ x ∈ (−1, 0) ∪ (1, ∞).
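The final step, going from the three intercepts to the solution set, amounts to checking the sign of x − sin(0.5πx) at one point in each region between the intercepts. A minimal Python sketch, in which the test points −2, −0.5, 0.5, and 2 are arbitrary choices (one per region):

```python
import math

def h(x):
    # Left-hand side of the rewritten inequality x - sin(0.5*pi*x) > 0.
    return x - math.sin(0.5 * math.pi * x)

intercepts = [-1.0, 0.0, 1.0]         # the zeros found with the calculator
test_points = [-2.0, -0.5, 0.5, 2.0]  # one arbitrary point in each region

for x in test_points:
    # True means the inequality x > sin(0.5*pi*x) holds at this test point.
    print(x, h(x) > 0)
```

Checking one point per region is only justified because the intercepts are the only places where the expression can change sign, which here is read off the graph rather than proved.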

Example 52. For what values of x is x > e + ln x?
For this example, I won’t give the full detailed instructions of what to do on the TI84; I’ll
only show a few screenshots. First, rewrite the inequality as x − e − ln x > 0 and so graph
y = x − e − ln x on your graphing calculator:
After Graphing. Zoom In, Adjust Window.
Look for the values of x for which x − e − ln x = 0. They are x = 0.0708, 4.1387:
Leftmost horizontal intercept. Rightmost horizontal intercept.
Based on these horizontal intercepts and what the graph looks like, we conclude: x > e + ln x
if and only if x ∈ (0, 0.0708) ∪ (4.1387, ∞).
Exercise 20. Use a graphing calculator to find the values of x for which each of the
following inequalities is true. (a) $x^3 - x^2 + x - 1 > e^x$. (b) $\sqrt{x} > \cos x$. (c) $\frac{1}{1 - x^2} > x^3 + \sin x$.
(Answers on p. 328.)

21 Formulating an Equation or a System of Linear
Equations from a Problem Situation
Exercise 21. (PSLE-style question.) When Apu was 40 years old, Beng was twice as old
as Caleb. Today, Caleb is 28 years old and Apu is twice as old as Beng. What are the ages
of Apu and Beng today? (If necessary, assume that the age of a person is always an integer
and is fixed between January 1st and December 31st of each year.) (Answer on p. 329.)
Solve the next two problems without using a calculator.

Exercise 22. The points (1, 2), (3, 5), and (6, 9) satisfy the equation $y = ax^2 + bx + c$. What
are a, b, and c? (Answer on p. 329.)
Exercise 23. The point (−1, 2) satisfies the equation $y = ax^2 + bx + c$. Moreover, the
minimum point of the equation $y = ax^2 + bx + c$ is (0, 0). What are a, b, and c? (Answer on
p. 329.)

Part II
Calculus

22 Equations of Lines
Recall that Slope = Rise / Run = “Change in y” / “Change in x”.

Moreover, the line with slope m and which passes through the point (a, b) has equation
y − b = m(x − a).
Example 53. The line with slope 3 and which passes through the point (1, 2) has equation
y−2 = 3(x−1). If desired, we can rearrange this equation into a more familiar form: y = 3x−1.
Example 54. The line with slope −1 and which passes through the point (3, −1) has
equation y − (−1) = −1(x − 3). If desired, we can rearrange this equation into a more
familiar form: y = −x + 2.
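The rearrangement from y − b = m(x − a) into the form y = mx + (intercept) is mechanical, as this short Python sketch shows. The function name line_through is a made-up helper for illustration.

```python
def line_through(point, slope):
    """Return (slope, intercept) of the line with the given slope through point.

    Starts from y - b = m(x - a) and rearranges to y = m*x + (b - m*a).
    Hypothetical helper for illustration.
    """
    a, b = point
    intercept = b - slope * a
    return slope, intercept

print(line_through((1, 2), 3))    # Example 53
print(line_through((3, -1), -1))  # Example 54
```

Each printed pair is the slope followed by the vertical intercept of the line.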

y − 0 = 2(x − 0). If desired, we can rearrange this equation into a more familiar form: y = 2x.
y − 1 = 0(x − 1). If desired, we can rearrange this equation into a more familiar form: y = 1.

23 The Derivative as Slope of the Tangent
The problem of finding the derivative is the problem of finding the slope of the tangent to
a graph at a given point.
Graphed below is some function f . Pick some point A = (a, f (a)). Draw the line l which
is tangent to the graph at the point A.
How do we find the slope of l? Unsure of how to proceed, we try a crude approximation.
Pick some point $X_1 = (x_1, f(x_1))$ that is also on the graph. Consider the line $AX_1$. What’s
its slope? Slope = Rise ÷ Run and so $AX_1$ has slope
\[ \frac{f(x_1) - f(a)}{x_1 - a}. \]
This number serves as our first crude approximation of the slope of l.
How can we improve on this approximation? Simple: just pick some point $X_2 = (x_2, f(x_2))$
that is closer to A. The line $AX_2$ has slope
\[ \frac{f(x_2) - f(a)}{x_2 - a}. \]
This number serves as our second, improved approximation of the slope of l.
At least in theory, we can keep repeating this procedure, by picking points that are ever
closer to A. Our estimates of the slope of l will get ever better. Altogether then, we are
motivated to make the following informal definition of the derivative:

The derivative of the function f at the point a is the value of the following expression
\[ \frac{f(x) - f(a)}{x - a}, \]
when x is “very close to, but not equal to” a.
The following proposition summarises the rules of differentiation you need to know. You
don’t need to know why they work; instead, you need only blindly apply them like a monkey.
For example, Rule #1 says that the function h defined by h(x) = k (where k is some
constant) has derivative h′ defined by h′ (x) = 0.
Proposition 1. If k is a constant, f and g are functions with derivatives f′ and g′, then:
1. $\frac{d}{dx} k = 0$,
2. $\frac{d}{dx} x^k = kx^{k-1}$,
3. $\frac{d}{dx} e^x = e^x$,
4. $\frac{d}{dx} \ln x = \frac{1}{x}$,
5. $\frac{d}{dx} (f \pm g) = f' \pm g'$,
6. $\frac{d}{dx} (kf) = kf'$.
Proof. Omitted.
The statement that the derivative of the function f is the function f′ may be written compactly as:
\[ \frac{df}{dx} = f' \quad \text{or} \quad \frac{df(x)}{dx} = f'(x). \]
Example 57. Graphed below (in red) is the function f defined by f (x) = 5x.
Also graphed is the derivative of f . The derivative of f is itself a function, namely:

the function f ′ defined by f ′ (x) = 5.
This says that the graph of f has constant slope 5 everywhere.

Example 58. Graphed below (in red) is the function g defined by $g(x) = x^2$.
Also graphed is the derivative of g. The derivative of g is itself a function, namely:
the function g′ defined by g′(x) = 2x.
This says that the tangent to the graph of g at the point (x, g(x)) has slope 2x.
For example, the tangent at (1.5, 2.25) has slope 2x = 2(1.5) = 3. Its equation is thus
y − 2.25 = 3(x − 1.5) or y = 3x − 2.25.
As another example, the tangent at (−1, 1) has slope 2x = 2(−1) = −2. Its equation is thus
y − 1 = −2 [x − (−1)] or y = −2x − 1.

Example 59. Graphed below (in red) is the function h defined by $h(x) = x^3 - 2x^2 + 5x - 1$.
Also graphed is the derivative of h. The derivative of h is itself a function, namely:
the function h′ defined by $h'(x) = 3x^2 - 4x + 5$.
This says that the tangent to the graph of h at the point (x, h(x)) has slope $3x^2 - 4x + 5$.
For example, the tangent at (−1, −9) has slope $3(-1)^2 - 4(-1) + 5 = 12$. Its equation
is thus y − (−9) = 12 [x − (−1)] or y = 12x + 3.
As another example, the tangent at (1, 3) has slope $3(1)^2 - 4(1) + 5 = 4$. Its equation is thus
y − 3 = 4(x − 1) or y = 4x − 1.
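For polynomials, Rules #2, #5, and #6 together make differentiation a matter of bookkeeping on the coefficients. The Python sketch below stores a polynomial as a list of coefficients (lowest power first, a representation chosen for this illustration) and recomputes the two tangents from the example above; all function names are made up.

```python
def poly_derivative(coeffs):
    """Differentiate a polynomial given as [c0, c1, c2, ...] for c0 + c1*x + c2*x^2 + ...

    Applies the power rule term by term: c_i * x^i becomes i * c_i * x^(i-1).
    """
    return [i * c for i, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    """Evaluate the polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))

def tangent_line(coeffs, x0):
    """Return (slope, intercept) of the tangent at the point where x = x0."""
    slope = poly_eval(poly_derivative(coeffs), x0)
    y0 = poly_eval(coeffs, x0)
    return slope, y0 - slope * x0

h = [-1, 5, -2, 1]          # h(x) = x^3 - 2x^2 + 5x - 1
print(poly_derivative(h))   # coefficients of h'(x), lowest power first
print(tangent_line(h, -1))  # tangent at (-1, -9)
print(tangent_line(h, 1))   # tangent at (1, 3)
```

Each tangent is reported as a (slope, intercept) pair, i.e. the m and c of y = mx + c.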

Given an equation y = f(x), the derivative of y with respect to x is simply the function
f′. In this context, this function may also be denoted $\frac{dy}{dx}$.
Example 60. Consider the equation y = 5x. The derivative of y with respect to x is
$\frac{dy}{dx} = 5$.
Example 61. Consider the equation $y = x^2$. The derivative of y with respect to x is
$\frac{dy}{dx} = 2x$.
Example 62. Consider the equation $y = x^3 - 2x^2 + 5x - 1$. The derivative of y with
respect to x is $\frac{dy}{dx} = 3x^2 - 4x + 5$.
A whole load more examples of differentiation on the next two pages:

Example 63. Rule #1. The function f defined by f (x) = 7 is an example of a constant
function.
Its derivative is the function f ′ defined by f ′ (x) = 0. What Rule #1 says is that the
derivative of any constant function is simply the zero function (i.e. the function that maps
every input to the number 0). Intuitively and graphically, this is obvious.
Example 64. Rule #1. The function g defined by g(x) = 31 is another example of a
constant function. Its derivative is the function g ′ defined by g ′ (x) = 0.
Example 65. Rule #2. The function f defined by f(x) = x has derivative f′ defined by
f′(x) = 1.
Example 66. Rule #2. The function g defined by $g(x) = x^2$ has derivative g′ defined by
g′(x) = 2x.
Example 67. Rule #2. The function h defined by $h(x) = x^3$ has derivative h′ defined by
$h'(x) = 3x^2$.
Example 68. Rule #2. The function i defined by $i(x) = x^4$ has derivative i′ defined by
$i'(x) = 4x^3$.
Example 69. Rule #3. The function f defined by $f(x) = e^x$ has derivative f′ defined by
$f'(x) = e^x$. That is, interestingly enough, the derivative of f is itself.
Example 70. Rule #4. The function g defined by g(x) = ln x has derivative g′ defined
by g′(x) = 1/x.
Example 71. Rule #5. The function h defined by $h(x) = x^3 + \ln x$ has derivative h′
defined by $h'(x) = 3x^2 + 1/x$.
Example 72. Rule #5. The function h defined by $h(x) = e^x + x^4$ has derivative h′ defined
by $h'(x) = e^x + 4x^3$.

Of course, Rule #5 generalises to where we’re summing up more than two functions.
Example 73. Rule #5. The function i defined by $i(x) = 15 + x + x^2$ has derivative i′
defined by i′(x) = 1 + 2x.
Example 74. Rule #5. The function f defined by $f(x) = 1 + x + x^2 + x^3 + x^4 + \cdots + x^{100}$
has derivative f′ defined by $f'(x) = 1 + 2x + 3x^2 + 4x^3 + \cdots + 100x^{99}$.
Example 75. Rule #6. The function f defined by f(x) = 30x has derivative f′ defined
by f′(x) = 30.
Example 76. Rule #6. The function g defined by $g(x) = 7(1 + x + x^2 + x^3 + \cdots + x^{100})$
has derivative g′ defined by $g'(x) = 7(1 + 2x + 3x^2 + \cdots + 100x^{99})$.
Example 77. Rule #6. The function h defined by $h(x) = 4e^x$ has derivative h′ defined
by $h'(x) = 4e^x$. (Interestingly, the only functions whose derivatives are themselves must be
of the form $f(x) = ke^x$, for some constant k.)
Exercise 24. For each of the functions below, (a) compute its derivative; (b) find the equations of
the tangents to the graph at the points where x = 1 and x = 2. (Answer on p. 330.)
(i) f defined by $f(x) = \ln x + e^x + x^2$. (ii) g defined by $g(x) = 1/x + x^3 + 7e^x$.
Exercise 25. For each of the two equations below, (a) compute the derivative of y with
respect to x; (b) find the equations of the tangents to the graph at the points where x = 1
and x = 2. (Answer on p. 331.)
(i) $y = 13\left(\sqrt{x} - \frac{3}{x^2}\right)$. (ii) $y = 9e^x - x^5$.

24 Chain Rule
The Chain Rule is yet another rule of differentiation. A simple example to illustrate:
Example 78. When I add 1 g of Milo (the x-variable) to a cup of water, the volume of
the water increases by 2 cm³ (the y-variable). We can write this more compactly as
\[ \frac{dy}{dx} = 2 \text{ cm}^3 \text{ g}^{-1}. \]
When the volume of the water increases by 1 cm³ (the y-variable), the water level (in the
cup) rises by 0.3 cm (the z-variable). We can write this more compactly as
\[ \frac{dz}{dy} = 0.3 \text{ cm cm}^{-3} = 0.3 \text{ cm}^{-2}. \]
Altogether then, when I add 1 g of Milo (the x-variable) to a cup of water, I should expect
the water level to rise by 0.6 cm. That is,
\[ \frac{dz}{dx} = 0.6 \text{ cm g}^{-1}. \]
We got the above expression for dz/dx by making the following quick computation:
\[ \frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = 0.3 \times 2 = 0.6 \text{ cm g}^{-1}. \]
In general, let x, y, and z be variables. Suppose x and z are not directly related. However,
a small change in x causes a small change in y. And in turn, a small change in y causes a
small change in z.
Informally, the Chain Rule addresses the following question: “If there is a small unit
change in x, how does z change?” The answer is this:
\[ \underbrace{\text{The change in } z \text{ caused by a small unit change in } x}_{dz/dx} = \underbrace{\text{The change in } z \text{ caused by a small unit change in } y}_{dz/dy} \times \underbrace{\text{The change in } y \text{ caused by a small unit change in } x}_{dy/dx}. \]
The Chain Rule is thus simply this equation:
\[ \frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx}. \]

Examples to illustrate how the Chain Rule is applied:
Example 79. Let f be defined by $f(x) = e^{x^3}$. Its derivative f′ is defined by:
$$f'(x) = \frac{d e^{x^3}}{dx} = \frac{d e^{x^3}}{d x^3}\,\frac{d x^3}{dx} = e^{x^3} \cdot 3x^2.$$
Another simple example:
Example 80. Let g be defined by $g(x) = \sqrt{4x - 1}$. Its derivative g′ is defined by:
$$g'(x) = \frac{d\sqrt{4x - 1}}{dx} = \frac{d\sqrt{4x - 1}}{d(4x - 1)}\,\frac{d(4x - 1)}{dx} = 0.5(4x - 1)^{-0.5} \cdot 4 = 2(4x - 1)^{-0.5}.$$
Here’s a more complicated example, where the Chain Rule is applied twice.
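Example 81. Let h be defined by $h(x) = (\ln x^2 + e^{5x+3})^3$. Its derivative h′ is defined by:
$$\begin{aligned}
h'(x) &= \frac{d(\ln x^2 + e^{5x+3})^3}{dx} = \frac{d(\ln x^2 + e^{5x+3})^3}{d(\ln x^2 + e^{5x+3})}\,\frac{d(\ln x^2 + e^{5x+3})}{dx} \\
&= 3(\ln x^2 + e^{5x+3})^2 \left[\frac{d\ln x^2}{dx^2}\,\frac{dx^2}{dx} + \frac{de^{5x+3}}{d(5x+3)}\,\frac{d(5x+3)}{dx}\right] \\
&= 3(\ln x^2 + e^{5x+3})^2 \left(\frac{1}{x^2} \cdot 2x + e^{5x+3} \cdot 5\right) = 3(\ln x^2 + e^{5x+3})^2 \left(\frac{2}{x} + 5e^{5x+3}\right).
\end{aligned}$$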
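The three examples above can also be checked mechanically. The sketch below is not part of the H1 syllabus; it is a minimal illustration, under the assumption that we track each quantity together with its derivative (a so-called dual number). The class name `Dual` and the helpers `exp`, `ln`, and `derivative` are names made up for this illustration. Every operation applies the Chain Rule once, which is exactly what we did by hand.

```python
import math


class Dual:
    """A value paired with its derivative with respect to x (illustrative helper)."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(other):
        # Treat plain numbers as constants, whose derivative is 0.
        return other if isinstance(other, Dual) else Dual(float(other), 0.0)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __pow__(self, k):
        # Power rule plus Chain Rule: d(u^k)/dx = k * u^(k-1) * du/dx.
        return Dual(self.val ** k, k * self.val ** (k - 1) * self.der)


def exp(u):
    # Chain Rule: d(e^u)/dx = e^u * du/dx.
    return Dual(math.exp(u.val), math.exp(u.val) * u.der)


def ln(u):
    # Chain Rule: d(ln u)/dx = (1/u) * du/dx.
    return Dual(math.log(u.val), u.der / u.val)


def derivative(func, x):
    # Seed dx/dx = 1 and read off the derivative that was carried along.
    return func(Dual(x, 1.0)).der


f = lambda x: exp(x ** 3)                          # Example 79
g = lambda x: (4 * x - 1) ** 0.5                   # Example 80
h = lambda x: (ln(x ** 2) + exp(5 * x + 3)) ** 3   # Example 81

print(derivative(f, 1.0))    # compare with e^(x^3) * 3x^2 at x = 1
print(derivative(g, 1.0))    # compare with 2(4x - 1)^(-0.5) at x = 1
print(derivative(h, -0.5))   # compare with the final expression for h'(x)
```

The evaluation points x = 1 and x = −0.5 are arbitrary choices; any point in the domain of each function would do.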
Exercise 26. The functions f , g, and h are defined below. Find the value of the derivative
of each, at x = 0. (Answer on p. 331.)
(a) $f(x) = x^2$.
(b) $g(x) = 1 + [x - \ln(x + 1)]^2$.
(c) $h(x) = (1 + [x - \ln(x + 1)]^2)^3$.

25 Increasing, Decreasing, and f ′
Example 82. The function f defined by $f(x) = x^2$ has derivative f′ defined by f′(x) = 2x.
For x < 0, f is (strictly) decreasing, i.e. f′(x) < 0. For x > 0, f is (strictly) increasing,
i.e. f′(x) > 0.
At x = 0, f is neither strictly decreasing nor strictly increasing (f is flat), i.e. f′(0) = 0.
A stationary point (x, f (x)) of a function f is a point at which f ′ (x) = 0. So in this

example, (0, 0) is a stationary point.
Example 83. Graphed below is the function g defined by $g(x) = 3x^3 - 5x^2 + x - 7$. Its
derivative g′ is defined by $g'(x) = 9x^2 - 10x + 1$.
From what we know about quadratic equations, $g'(x) = 9x^2 - 10x + 1 = (9x - 1)(x - 1)$ is
negative if 1/9 < x < 1, zero if x = 1/9 or x = 1, and positive if x < 1/9 or x > 1.
So for 1/9 < x < 1, the function g is (strictly) decreasing, i.e. g′(x) < 0. And for x < 1/9
or x > 1, the function g is (strictly) increasing, i.e. g′(x) > 0.
At x = 1/9 or x = 1, g is neither strictly decreasing nor strictly increasing, i.e. g′(1/9) = 0
and g′(1) = 0. Those are also the stationary points of g.

Example 84. Graphed below is the function h defined by h(x) = ln x. Its derivative h′ is
defined by h′ (x) = 1/x.
The derivative h′ is always positive. So the slope of the graph of h is positive everywhere.
(There are no stationary points.)
Exercise 27. Let f be defined by $f(x) = 3x^2 - 4x + 1$. (i) Sketch the graph of f. (ii) Identify
where f′(x) is negative, zero, and positive (equivalently, where the graph of f
is decreasing, flat, and increasing). (iii) Identify the stationary points. (Answer on p. 332.)

26 Finding Turning Points (the First Derivative Test)
It turns out that every maximum and minimum turning point is a stationary
point. The intuition for this is quite simple:
Example 85. Graphed below is f defined by $f(x) = -(x - 1)^2$. Here’s the intuition for why
f′(1) = 0 (i.e. why there is a stationary point at x = 1):
In order for 1 to be a maximum turning point of f , it must be that to its left, f is increasing;
while to its right, f is decreasing. In other words, to the left of 1, f ′ (x) ≥ 0. While to the
right of 1, f ′ (x) ≤ 0. Altogether then, we must have f ′ (1) = 0 — that is, the maximum
turning point must also be a stationary point.
The next exercise asks you to give a similar piece of intuition for why g ′ (−1) = 0.
Exercise 28. Explain why g ′ (−1) = 0 in the above Example. (Answer on p. 332.)
Every maximum or minimum turning point is a stationary point. However, the converse is
not true: not every stationary point is a turning point.
So to identify all the maximum or minimum turning points of a function, we can follow
this two-step recipe:

The Simple Recipe for Finding Maximum and Minimum Turning Points.
1. Find all stationary points (i.e. where the derivative is zero).

Since every turning point is a stationary point, this ensures that we do not miss out on any
turning points.
2. Investigate the nature of these points.

Some stationary points may be maximum or minimum turning points. However, some
stationary points may be neither (for example, they may be inflexion points, to be discussed
in the next chapter). So we really do need to carefully check the nature of the stationary
points we’ve found.
For H1 Maths, checking what exactly a stationary point is usually just involves sketching
the graph (either manually or using your graphing calculator).
Examples on how to use the above Simple Recipe:
Example 86. Consider f defined by $f(x) = x^2$.
1. f′(x) = 2x. So the only stationary point is at x = 0.
2. This is a minimum turning point, as a quick graph sketch will verify.

Example 87. Consider g defined by $g(x) = 4x^7 - 14x^4 + 28x$.
1. $g'(x) = 28x^6 - 56x^3 + 28 = 28(x^6 - 2x^3 + 1) = 28(x^3 - 1)(x^3 - 1)$. So the only stationary
point is at x = 1.
2. But this is not a turning point, as a quick graph sketch will verify.
Example 88. Consider h defined by h(x) = 3.
1. h′(x) = 0 everywhere. So every point is a stationary point.
2. However, no point is a turning point. Indeed, the graph of h is simply a horizontal line.

Example 89. Consider i defined by $i(x) = x^3 + x^2 + x + 1$.
1. $i'(x) = 3x^2 + 2x + 1$. This is a ∪-shaped quadratic expression whose discriminant is
negative. So, it is never the case that i′(x) = 0 and there are no stationary points.
2. Hence, there are no turning points either.
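Step 2 of the Simple Recipe is usually done by sketching the graph. As a rough alternative, the sketch below checks the sign of the derivative a little to the left and a little to the right of a stationary point. The function name `classify_stationary_point` and the step size `h = 0.001` are assumptions made for this illustration; a very wiggly function could fool such a crude check.

```python
def classify_stationary_point(fprime, x0, h=1e-3):
    """Classify the stationary point x0 using the sign of f' on either side of it."""
    left, right = fprime(x0 - h), fprime(x0 + h)
    if left > 0 and right < 0:
        # Increasing then decreasing.
        return "maximum turning point"
    if left < 0 and right > 0:
        # Decreasing then increasing.
        return "minimum turning point"
    return "not a turning point"


f_prime = lambda x: 2 * x                              # Example 86
g_prime = lambda x: 28 * x ** 6 - 56 * x ** 3 + 28     # Example 87
h_prime = lambda x: 0.0                                # Example 88

print(classify_stationary_point(f_prime, 0.0))
print(classify_stationary_point(g_prime, 1.0))
print(classify_stationary_point(h_prime, 5.0))
```

In Example 87 the derivative is positive on both sides of x = 1, which is why that stationary point is not a turning point.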
Exercise 29. For each of the following functions, identify any maximum and minimum
turning points. (Answers on pp. 333 and 334.)
(i) f defined by f (x) = x.
(ii) g defined by g(x) = 100.
(iii) h defined by $h(x) = x^4 - 2x^2$.
(iv) i defined by $i(x) = x^3$.
(v) j defined by $j(x) = x^3 + x^2 - x + 1$.

27 Inflexion Points
• A graph is concave downwards (or simply concave) in a region if the line segment
connecting any two points of the graph (in that region) is below the graph.
• A graph is concave upwards (or simply convex) in a region if the line segment
connecting any two points of the graph (in that region) is above the graph.
• An inflexion point is any point where the concavity of the graph changes (either from
concave downwards to concave upwards OR concave upwards to concave downwards).11
Example 90. Graphed below is f defined by $f(x) = x^3$.

For x < 0, the graph is concave downwards (the line segment connecting any two points is
below the graph). For x > 0, it is concave upwards (the line segment connecting any two
points is above the graph).
(0, 0) is an inflexion point because this is where the graph changes from concave downwards
to concave upwards.
The tangent line test says that a point is an inflexion point if and only if the tangent
line at that point is above the graph on one side of the point and below the graph on the other side.
This is illustrated in the above example.
11
The discussion in this chapter is very brief and informal, because a proper discussion of inflexion points would be
much longer. If you’re really interested in what inflexion points are, please read my H2 Mathematics Textbook. (In the
2006-2015 H1 Maths exams, I can find only one 2-mark question on inflexion points; see Exercise 63.1. So it isn’t terribly
important, if all you care about is getting an A.) By the way, inflection would be the American spelling.

Example 91. Graphed below is g defined by $g(x) = x^5 + 5x^4 + 10x^3 + 10x^2 + 5x + 1$.
For x < −1, the graph is concave downwards. And for x > −1, it is concave upwards. So the
graph of g has an inflexion point at x = −1.
Example 92. Graphed below is h defined by $h(x) = x^3 - 2x^2 + 4x + 1$.

For x < 2/3, the graph is concave downwards. And for x > 2/3, it is concave upwards. So the
graph of h has an inflexion point at x = 2/3.

The A-level syllabuses explicitly exclude non-stationary points of inflexion. Nonetheless,
there is the temptation to believe that “every inflexion point must also be a stationary
point”. Here’s a quick counter-example to dispel this false belief:
Example 93. Graphed below is the function f defined by $f(x) = x^3 + x$. We have $f'(x) = 3x^2 + 1$.
The point (0, 0) is not a stationary point because f′(0) = 1 ≠ 0.
Nonetheless, it is an inflexion point, because to the left of 0, f is concave downwards; and
to the right, f is concave upwards. (We can also verify this using the tangent line test.)
The point (0, 0) is thus an example of a non-stationary point of inflexion.
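The definitions of concave downwards and concave upwards can be tried out numerically on Example 93. The sketch below draws a single line segment (chord) between two points of the graph and checks whether it lies above or below the graph in between. The function name `chord_test` and the chosen intervals are assumptions made for this illustration, and testing one chord is only suggestive, since the definition requires the property for every pair of points in the region.

```python
def chord_test(f, a, b, samples=50):
    """Compare the chord from (a, f(a)) to (b, f(b)) with the graph of f between a and b."""
    chord_above = chord_below = 0
    for i in range(1, samples):
        t = i / samples
        x = a + t * (b - a)
        chord_y = f(a) + t * (f(b) - f(a))
        if chord_y > f(x):
            chord_above += 1
        elif chord_y < f(x):
            chord_below += 1
    if chord_above and not chord_below:
        return "concave upwards"      # line segment is above the graph
    if chord_below and not chord_above:
        return "concave downwards"    # line segment is below the graph
    return "mixed"


f = lambda x: x ** 3 + x   # Example 93

print(chord_test(f, -2.0, -0.1))   # a region to the left of 0
print(chord_test(f, 0.1, 2.0))     # a region to the right of 0
print(chord_test(f, -1.0, 1.0))    # a region straddling the inflexion point
```

A chord that straddles x = 0 is below the graph on one part and above it on another, which reflects the change of concavity at the inflexion point.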
But don’t worry, the A-level exams will ONLY ask about stationary points of inflexion.
And so for the purposes of the A-level exams, the Simple Recipe given in the previous
chapter will detect not only all turning points, but also all inflexion points.

Example 94. Graphed below is the function f defined by $f(x) = x^5 + 2x^4 + x^3$.
$f'(x) = 5x^4 + 8x^3 + 3x^2 = x^2(5x^2 + 8x + 3) = x^2(5x + 3)(x + 1)$. So the only stationary points
are at x = −1, x = −0.6, and x = 0. These are labelled in the graph below as A, B, and C.
A is a maximum turning point; B is a minimum turning point, and C is a stationary
point of inflexion. (The graph of f actually has two other points of inflexion other than C.
However, they are non-stationary and you are not required to find them for the A-levels.)
Example 95. Graphed below is the function g defined by $g(x) = 9x^4 + 2x^3 - 3x^2$.
$g'(x) = 36x^3 + 6x^2 - 6x = 6x(6x^2 + x - 1) = 6x(3x - 1)(2x + 1)$. So the only stationary points
are at x = −1/2, x = 0, and x = 1/3. These are labelled in the graph below as A, B, and C.
A and C are both minimum turning points, B is a maximum turning point. There are no
stationary points of inflexion. (There may or may not be non-stationary points of inflexion,
but you’re not required to know how to find these for the A-levels.)

Example 96. Graphed below is the function h defined by $h(x) = 2x^6 - 3x^4 - 1$.
$h'(x) = 12x^5 - 12x^3 = 12x^3(x^2 - 1) = 12x^3(x - 1)(x + 1)$. So the only stationary points are
at x = −1, x = 0, and x = 1. These are labelled in the graph below as A, B, and C.
A and C are both minimum turning points, B is a maximum turning point. There are no
stationary points of inflexion. (Again, there may or may not be non-stationary points of
inflexion, but you’re not required to know how to find these for the A-levels.)
Exercise 30. (Answer on p. 335.) For each of the following functions, find the stationary
points and investigate the nature of each.
(i) f defined by $f(x) = x^3 - 3x + 1$.
(ii) g defined by $g(x) = x^3 - 3x^2 + 3x + 5$.

28 Finding Max/Min Points on the TI84
Example 97. Define f by f (x) = x − sin (0.5πx). Let’s find the minimum point of f , in
the region where 0 < x < 2.

3. Press X,T,θ,n − SIN 0 . 5 . To enter “π”, press the blue 2ND button and then π
(which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will
have entered “x − sin(0.5πx)”.
4. Now press GRAPH and the calculator will graph y = x − sin(0.5πx).
Let’s zoom in to the region where 0 ≤ x ≤ 2.

5. Press the (ZOOM) button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Using the < and > arrow keys, move the cursor
to where X = 1.0638298, Y = 0. Now press ENTER and the TI will zoom in a little,
centred on the point X = 1.0638298, Y = 0.
It looks like starting at x = 0, the function is decreasing, then hits a minimum point, then
keeps increasing. Our goal now is to find out what that minimum point is.

5. Press 3 to select the “minimum” option. This brings you back to the graph, with a
cursor flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s MINIMUM function works by you first choosing a “Left Bound” and a “Right
Bound” for x. TI84 will then look for the minimum point within your chosen bounds.
6. Using the < and > arrow keys, move the blinking cursor until it is where you want your
first “Left Bound” to be. For me, I have placed it a little to the left of where I believe
the minimum point to be.
7. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
8. So now just repeat. Using the < and > arrow keys, move the blinking cursor until it is
where you want your first “Right Bound” to be. For me, I have placed it a little to the
right of where I believe the minimum point to be.
9. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the minimum point is. So go ahead and:
10. Press ENTER . TI84 now informs you that there is a “Minimum” at “X = .56066485”,
“Y = −.2105137” and places the cursor at precisely that point. This is our desired
minimum point.
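The “Left Bound” and “Right Bound” idea can be imitated in a few lines of code. The sketch below uses golden-section search, which repeatedly shrinks the interval between the two bounds. This is only an assumption about one reasonable way to search; it is not a claim about the method the TI84 itself uses. The function name `golden_section_min` and the tolerance are made up for this illustration.

```python
import math


def f(x):
    return x - math.sin(0.5 * math.pi * x)   # the function in Example 97


def golden_section_min(func, left, right, tol=1e-7):
    """Shrink [left, right] around a minimum of func (assumes a single dip in the interval)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    c = right - inv_phi * (right - left)
    d = left + inv_phi * (right - left)
    while right - left > tol:
        if func(c) < func(d):
            right = d   # the minimum lies in [left, d]
        else:
            left = c    # the minimum lies in [c, right]
        c = right - inv_phi * (right - left)
        d = left + inv_phi * (right - left)
    return (left + right) / 2


x_min = golden_section_min(f, 0.0, 2.0)
print(x_min, f(x_min))   # compare with the X and Y values reported by the TI84
```

Here the bounds 0 and 2 play the role of the Left Bound and Right Bound chosen on the calculator.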

29 Finding the Derivative at a Point on the TI84
Example 98. Define f by $f(x) = e^{\sin x}$. Our goal is to find f′(2) and f′(3).

3. Press X,T,θ,n , blue 2ND button and then e^x (which corresponds to the LN button)
∧ SIN X,T,θ,n ) to enter “$e^{\sin x}$”.
5. Press 6 to select the “dy/dx” option.
The TI84 now graphs $y = e^{\sin x}$. Now you need only tell the TI84 at which point (x, y) you’d
like it to evaluate dy/dx. So to find f′(2), simply
6. Press 2 .
7. Press ENTER . You’re now told that f ′ (2) = −1.033116.
To find f ′ (3):
button) to again bring up the CALCULATE menu. Again press 6 to select the “dy/dx”
option. The only difference now is that we press 3 . Press ENTER . You’re now told
that f ′ (3) = −1.140038.
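The values above can be cross-checked without a calculator. The sketch below estimates the slope from two nearby points, one on each side of x, and compares it with the exact derivative cos x · e^(sin x) obtained from the Chain Rule. The function name `symmetric_derivative` and the step size `h = 0.001` are assumptions made for this illustration.

```python
import math


def symmetric_derivative(func, x, h=1e-3):
    """Estimate f'(x) as the slope between the points at x - h and x + h."""
    return (func(x + h) - func(x - h)) / (2 * h)


f = lambda x: math.exp(math.sin(x))                    # Example 98
f_prime = lambda x: math.cos(x) * math.exp(math.sin(x))  # exact, by the Chain Rule

for x in (2, 3):
    print(x, symmetric_derivative(f, x), f_prime(x))
```

Both columns should agree with the calculator's readings to about five decimal places.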

30 Connected Rates of Change Problems
Example 99. We unload sand onto a flat surface at a steady rate of 0.01 m$^3$ s$^{-1}$. Assume
the unloaded sand always forms a perfect cone whose height and base diameter are always
equal.
Let’s find the rate at which the base area of the cone is increasing, at the instant t = 20 s.
First, recall that a cone with base radius r and height h has volume
$$V = \frac{1}{3}\pi r^2 h.$$
Since the base diameter equals the height (or h = 2r), we can rewrite this as
$$V = \frac{2}{3}\pi r^3.$$
Now differentiate the above equation with respect to t, to get
$$\frac{dV}{dt} = 2\pi r^2 \frac{dr}{dt}.$$
Let $A = \pi r^2$ be the base area. The rate at which the base area is increasing is
$$\frac{dA}{dt} = 2\pi r \frac{dr}{dt} = \frac{dV}{dt} \div r.$$
The volume of the sand is always increasing at a rate 0.01 m$^3$ s$^{-1}$. That is:
$$\frac{dV}{dt} = 0.01 \text{ m}^3\,\text{s}^{-1}.$$
$V|_{t=20} = 20 \times 0.01 = 0.2 \text{ m}^3$. Hence,
$$r|_{t=20} = \left.\left(\frac{3V}{2\pi}\right)^{1/3}\right|_{t=20} = \left(\frac{0.3}{\pi}\right)^{1/3} \text{ m}.$$
Altogether then,
$$\left.\frac{dA}{dt}\right|_{t=20} = 0.01 \div \left(\frac{0.3}{\pi}\right)^{1/3} = 0.0219 \text{ m}^2\,\text{s}^{-1}.$$
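The arithmetic in Example 99 can be retraced step by step. The sketch below follows the same chain of equations, assuming (as the example implicitly does) that there is no sand on the surface at t = 0. The variable names are made up for this illustration.

```python
import math

dV_dt = 0.01   # m^3 per second, the steady unloading rate
t = 20.0       # seconds

V = dV_dt * t                              # volume at t = 20, assuming V = 0 at t = 0
r = (3 * V / (2 * math.pi)) ** (1 / 3)     # from V = (2/3) * pi * r^3
dr_dt = dV_dt / (2 * math.pi * r ** 2)     # from dV/dt = 2 * pi * r^2 * dr/dt
dA_dt = 2 * math.pi * r * dr_dt            # from A = pi * r^2

print(round(dA_dt, 4))   # rate of increase of the base area, in m^2 per second
```

Note that `dA_dt` equals `dV_dt / r`, matching the shortcut used in the example.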

Exercise 31. (Answer on p. 336.) Illustrated below is a cone with slant height l, base radius
r, and height h. You are given that such a cone has total external surface area (excluding
the base) $\pi r l$ and volume $\pi r^2 h/3$.
A manufacturer wishes to manufacture a cone whose volume is fixed at 1 m$^3$ and whose
total external surface area (excluding the base) is minimised. Let’s find out what its height
should be, by following these steps:
(a) Express r in terms of h.
(b) Use the Pythagorean Theorem to express l in terms of r and h. Hence express l solely
in terms of h.
(c) Now express the total external surface area A (excludes the base) solely in terms of h.
(d) Show that
$$\frac{dA}{dh} = \frac{3}{2} \cdot \frac{\pi - \dfrac{6}{h^3}}{A}.$$
Hence conclude that the only stationary point is
$$h = \left(\frac{6}{\pi}\right)^{1/3} \approx 1.24 \text{ m}.$$
(e) Using your expression from part (c) and your graphing calculator, graph A as a function
of h. Hence confirm that the stationary point we found in part (d) is indeed the minimum
turning point. That is, the desired height is indeed
$$h = \left(\frac{6}{\pi}\right)^{1/3} \approx 1.24 \text{ m}.$$

31 Integration as the Reverse of Differentiation
If the function g is the derivative of the function f , then we may also say that f is an
indefinite integral of g.
Example 100. Consider the functions f and g defined by $f(x) = x^2$ and $g(x) = 2x$.
The function g is the derivative of the function f. We write:
$$\frac{df}{dx} = g \quad\text{or}\quad \frac{df(x)}{dx} = g(x).$$
The two statements above are equivalent. Each says: “the function g is the derivative of
the function f ”.
Conversely, the function f is an indefinite integral of the function g. We write:
∫ g dx = f or ∫ g(x) dx = f (x).
The two statements above are equivalent. Each says: “the function f is an indefinite
integral of the function g”.
Remarks on notation:
• The symbol ∫ is called the integration sign — it is an elongated S.

• The symbol dx is called the differential of the variable x — it says that the variable
of integration is x.
• The function g to be integrated is called the integrand.
One common source of confusion amongst students is a failure to grasp that x is merely a
dummy variable. We can replace x with any other letter. The next example illustrates:

Example 101. Consider again the functions f and g defined by $f(x) = x^2$ and $g(x) = 2x$.
The function f is an indefinite integral of the function g. We can write either
∫ g dx = f or ∫ g(x) dx = f (x).
But we can equally well write any of the following:
∫ g da = f or ∫ g(a) da = f (a).
∫ g db = f or ∫ g(b) db = f (b).
∫ g dc = f or ∫ g(c) dc = f (c).
The dummy variable is merely a place-holder for whatever input that goes into the
function f or g. We can use any letter for this dummy variable, be it x or a or b or c.
More examples to illustrate that integration is the reverse of differentiation:
Example 102. Consider the functions f and g defined by
$$f(x) = \ln x + x \quad\text{and}\quad g(x) = \frac{1}{x} + 1.$$
The function g is the derivative of the function f . Conversely, the function f is an indefinite
integral of the function g. We may write either
∫ g dx = f or ∫ g(x) dx = f (x).
Example 103. Consider the functions f and g defined by
$$f(x) = e^{x^2} \quad\text{and}\quad g(x) = 2x \cdot e^{x^2}.$$
The function g is the derivative of the function f . Conversely, the function f is an indefinite
integral of the function g. We may write either
∫ g dx = f or ∫ g(x) dx = f (x).

Exercise 32. The following is a list of functions. State which, if any, function is an
indefinite integral of another function. (Answer on p. 336.)
f defined by $f(x) = 2x$, g defined by $g(x) = 3x^2$, h defined by $h(x) = x^3$,
i defined by $i(x) = x^2 + 2$, j defined by $j(x) = x^3 + 1$, k defined by $k(x) = x^2$.

32 The Constant of Integration
It turns out that every function has infinitely many indefinite integrals.
Example 104. Consider the function f defined by f (x) = 2x. The following functions are
all indefinite integrals of f :
g defined by $g(x) = x^2$, h defined by $h(x) = x^2 + 7$, i defined by $i(x) = x^2 - 11$.
Indeed, f is the derivative of any function j of the form $j(x) = x^2 + C$ (where C is any
constant). Thus, any such j is an indefinite integral of f.
This is not terribly surprising, given that the derivative of any constant C is 0. We call C
the constant of integration.
Moreover, the indefinite integral is unique up to the constant of integration. That

is, if g and h are both indefinite integrals of f , then g and h must differ by only a constant.
Example 105. Define f by $f(x) = 2x$. An indefinite integral of f is g defined by $g(x) = x^2$.

Given that the function h is also an indefinite integral of f, then we know immediately
that g and h differ by at most some constant C. That is, it must be that h takes the form
$h(x) = x^2 + C$ (where C is some constant).
Altogether then:
1. Every function has infinitely many indefinite integrals.

2. Moreover, each of these indefinite integrals differs from each of the others by at most some
constant term.
Example 106. Define f by $f(x) = xe^x$. You are given that an indefinite integral of f is
the function g defined by $g(x) = e^x(x - 1)$.
Then you immediately know that:
1. Every function h of the form $h(x) = e^x(x - 1) + C$ is an indefinite integral of f.
2. Moreover, besides such functions, there are no other indefinite integrals of f .

33 Basic Rules of Integration
Below are the basic rules of integration.

For example, Rule #1 says that if h is the function defined by h(x) = k (where k is some
constant), then every function i defined by i(x) = kx + C is an indefinite integral of h.
Moreover, there are no other indefinite integrals of h.
Proposition 2. Let k be any constant. Let f and g be functions with derivatives f′ and
g′. Then
1. $\int k \, dx = kx + C$,
2. $\int x^k \, dx = \dfrac{x^{k+1}}{k+1} + C$,
3. $\int \dfrac{1}{x} \, dx = \ln|x| + C$,
4. $\int e^x \, dx = e^x + C$,
5. $\int (ax + b)^k \, dx = \dfrac{(ax + b)^{k+1}}{a(k+1)} + C$,
6. $\int e^{ax+b} \, dx = \dfrac{1}{a} e^{ax+b} + C$,
7. $\int f'(x) \pm g'(x) \, dx = f(x) \pm g(x) + C$,
8. $\int k f'(x) \, dx = k f(x) + C$,
where in each case, C is the constant of integration. (For Rule #2, assume k ≠ −1.
And if k < 0, assume x ≠ 0. For Rule #3, assume x ≠ 0. For Rules #5 and #6, a and b are
constants with a ≠ 0. For Rule #5, also assume k ≠ −1.)
Proof. To prove ∫ f ′ (x) dx = f (x), it suffices to prove that the derivative of f is f ′ .
So to prove Rule #3 (i.e. that $\int x^{-1} \, dx = \ln|x| + C$), it suffices to prove that
$\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. This we now do. First note that
$$\ln|x| + C = \begin{cases} \ln x + C, & \text{for } x > 0, \\ \ln(-x) + C, & \text{for } x < 0. \end{cases}$$
Thus,
$$\frac{d}{dx}(\ln|x| + C) = \begin{cases} \dfrac{1}{x}, & \text{for } x > 0, \\[2ex] \dfrac{-1}{-x} = \dfrac{1}{x}, & \text{for } x < 0. \end{cases}$$
And so indeed $\frac{d}{dx}(\ln|x| + C) = x^{-1}$ for all x ≠ 0. (Exercise 33 requests that you prove the
remaining rules.)

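Example 107. Define the functions f′, g′, and h′ by
$$f'(x) = x, \quad g'(x) = x^2, \quad\text{and}\quad h'(x) = x^3.$$
They have indefinite integrals f, g, and h, defined by
$$f(x) = \frac{x^2}{2} + C_1, \quad g(x) = \frac{x^3}{3} + C_2, \quad\text{and}\quad h(x) = \frac{x^4}{4} + C_3,$$
where $C_1$, $C_2$, and $C_3$ are constants of integration. We may also simply write:
$$\int x \, dx = \frac{x^2}{2} + C_1, \quad \int x^2 \, dx = \frac{x^3}{3} + C_2, \quad\text{and}\quad \int x^3 \, dx = \frac{x^4}{4} + C_3.$$

Example 108. Define the functions f′, g′, and h′ by
$$f'(x) = e^x, \quad g'(x) = 3e^x, \quad\text{and}\quad h'(x) = 3e^x + x^2.$$
They have indefinite integrals f, g, and h, defined by
$$f(x) = e^x + C_1, \quad g(x) = 3e^x + C_2, \quad\text{and}\quad h(x) = 3e^x + \frac{x^3}{3} + C_3.$$
We may also simply write:
$$\int e^x \, dx = e^x + C_1, \quad \int 3e^x \, dx = 3e^x + C_2, \quad\text{and}\quad \int 3e^x + x^2 \, dx = 3e^x + \frac{x^3}{3} + C_3.$$

Example 109. Define the functions f′, g′, and h′ by
$$f'(x) = (7x + 2)^2, \quad g'(x) = (7x + 2)^3, \quad\text{and}\quad h'(x) = 5(7x + 2)^3.$$
They have indefinite integrals f, g, and h, defined by
$$f(x) = \frac{(7x + 2)^3}{3 \cdot 7}, \quad g(x) = \frac{(7x + 2)^4}{4 \cdot 7}, \quad\text{and}\quad h(x) = 5\,\frac{(7x + 2)^4}{4 \cdot 7}.$$
We may also simply write:
$$\int (7x + 2)^2 \, dx = \frac{(7x + 2)^3}{3 \cdot 7} + C_1, \quad \int (7x + 2)^3 \, dx = \frac{(7x + 2)^4}{4 \cdot 7} + C_2, \quad\text{and}\quad \int 5(7x + 2)^3 \, dx = \frac{5(7x + 2)^4}{4 \cdot 7} + C_3.$$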
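Rules #1, #2, #7, and #8 together are enough to integrate any polynomial term by term. The sketch below does exactly that on a list of coefficients, using exact fractions. The function name `integrate_polynomial` and the coefficient-list representation are assumptions made for this illustration, and the constant of integration is simply taken to be 0.

```python
from fractions import Fraction


def integrate_polynomial(coeffs):
    """coeffs[k] is the coefficient of x^k.

    Returns the coefficients of one indefinite integral (the one with C = 0).
    """
    result = [Fraction(0)]   # constant term, i.e. C = 0
    for k, c in enumerate(coeffs):
        # Rule #2 with Rule #8: the integral of c * x^k is c * x^(k+1) / (k+1).
        result.append(Fraction(c) / (k + 1))
    return result


# The integral of x^3 (as in Example 107) is x^4 / 4.
print(integrate_polynomial([0, 0, 0, 1]))

# A made-up polynomial, 3x^2 + 2x + 5, integrates to x^3 + x^2 + 5x.
print(integrate_polynomial([5, 2, 3]))
```

Every other indefinite integral is obtained by changing the first entry of the returned list, which plays the role of C.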

Exercise 33. Complete the proof of Proposition 2. (Answer on p. 337.)
Exercise 34. Find each of the following indefinite integrals. (Don’t forget to include the
constant of integration. Answer on p. 337.)
(i) $\int 7x^5 - 8x^4 + 3x^2 + 2 \, dx$. (ii) $\int e^{5x+2} - (5x + 2)^2 \, dx$. (iii) $\int 16/x + 32x^3 \, dx$.

34 The Definite Integral as the Area Under a Graph
Surprisingly, we can use integration to find the area under the graph of a function.
Example 110. Graphed below is the function f defined by f (x) = 2x. What is the shaded
green area under the graph of f , between the lines x = 2 and x = 5?
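We can of course find this area using primary school methods: This is a trapezium whose
parallel sides have lengths 4 and 10 and whose base (the distance between them) is 3. Hence, it has area
$$\frac{1}{2} \times \text{Base} \times (\text{Sum of sides}) = \frac{1}{2} \times 3 \times (4 + 10) = 21.$$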
But surprisingly enough, this area can also be found using integration. Pick any indefinite
integral of f, say g defined by $g(x) = x^2$. Then the desired area is simply:
$$g(5) - g(2) = 5^2 - 2^2 = 21.$$
We can also write this area as
$$\int_2^5 f(x) \, dx.$$
The above expression is called a definite integral, where
$$\int_2^5 f(x) \, dx = g(5) - g(2) = 21.$$
We sometimes also write $[g(x)]_2^5$ as shorthand for g(5) − g(2).

This is an amazing “trick” for finding the area under a graph. Why it works involves
something called the Fundamental Theorems of Calculus, which are beyond the scope of
H1 Maths.12 For H1 Maths, all you need know is that this amazing “trick” works and all
you need do is perform this “trick” like a monkey. More examples:
√
Example 111. Let the function f be defined by f (x) = x + 1. The definite integral
$\int_1^3 f \, dx$ (simply the area under f, between 1 and 3) is highlighted in blue. Similarly, the
definite integral $\int_5^8 f \, dx$ (simply the area under f, between 5 and 8) is highlighted in red.
Example 112. Consider the function g defined by $g(x) = 9x^2 + 6x + 1$. What is the area
under the graph of g, between the lines x = 0 and x = 7?
By the above amazing “trick”, the desired area is simply:
$$\int_0^7 9x^2 + 6x + 1 \, dx = \left[3x^3 + 3x^2 + x\right]_0^7 = 3 \cdot 7^3 + 3 \cdot 7^2 + 7 - (3 \cdot 0^3 + 3 \cdot 0^2 + 0) = 1183.$$
12
But see my H2 Mathematics Textbook if you’re interested.

Example 113. Consider the function h defined by $h(x) = (5x + 2)^2$. What is the area
under the graph of h, between the lines x = −1 and x = 1?
$$\int_{-1}^{1} (5x + 2)^2 \, dx = \left[\frac{1}{3 \cdot 5}(5x + 2)^3\right]_{-1}^{1} = \frac{1}{3 \cdot 5}(5 \cdot 1 + 2)^3 - \frac{1}{3 \cdot 5}\left[5 \cdot (-1) + 2\right]^3 = \frac{74}{3}.$$
Example 114. Consider the function i defined by $i(x) = e^x$. What is the shaded green
area under the graph of i, between the lines x = 3 and x = 4?
$$\int_3^4 i(x) \, dx = \int_3^4 e^x \, dx = \left[e^x\right]_3^4 = e^4 - e^3 \approx 34.5.$$
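To see that the “trick” really does give an area, we can compare it with a brute-force estimate: chop the region into many thin rectangles and add up their areas. The sketch below does this for Examples 110, 112, and 114. The function name `midpoint_riemann_sum` and the number of rectangles are assumptions made for this illustration.

```python
import math


def midpoint_riemann_sum(func, a, b, n=100_000):
    """Approximate the area under func between a and b using n thin rectangles."""
    width = (b - a) / n
    # Each rectangle's height is the value of func at the middle of its base.
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width


area_110 = midpoint_riemann_sum(lambda x: 2 * x, 2, 5)
area_112 = midpoint_riemann_sum(lambda x: 9 * x ** 2 + 6 * x + 1, 0, 7)
area_114 = midpoint_riemann_sum(math.exp, 3, 4)

print(area_110, 5 ** 2 - 2 ** 2)                  # Example 110: g(5) - g(2)
print(area_112, 3 * 7 ** 3 + 3 * 7 ** 2 + 7)      # Example 112
print(area_114, math.exp(4) - math.exp(3))        # Example 114
```

In each line, the rectangle estimate and the value from the indefinite integral should agree to several decimal places.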

Exercise 35. (i) Find the area bounded by the x-axis, the lines x = 1 and x = 2, and the graph of y = 6.
(ii) Find the area bounded by the x-axis, the lines x = −2 and x = 3, and the graph of
y = x2 + 5x + 10.
(iii) Find the area bounded by the x-axis, the lines x = 1 and x = 2, and the graph of
y = 1/x.

35 Area between a Curve and Lines Parallel to Axes
Example 115. Find the exact area bounded by the curve $y = x^2$ and the horizontal lines
y = 1 and y = 2.
It’s always helpful to make a quick sketch (given below). Our desired area is labelled A
below. To find a desired area, there are usually multiple methods, some quicker than others.
Method #1. The entire rectangle A + B + C + D has area $2 \times 2\sqrt{2} = 4\sqrt{2}$. B has area
$$\int_{-\sqrt{2}}^{-1} x^2 \, dx = \left[\frac{x^3}{3}\right]_{-\sqrt{2}}^{-1} = -\frac{1}{3} - \left(-\frac{2\sqrt{2}}{3}\right) = \frac{2\sqrt{2} - 1}{3}.$$
By symmetry, D has the same area as B. C has area 1 × 2. Hence, A has area
$$A + B + C + D - (B + C + D) = 4\sqrt{2} - \left(\frac{2\sqrt{2} - 1}{3} + 2 + \frac{2\sqrt{2} - 1}{3}\right) = \frac{4}{3}\left(2\sqrt{2} - 1\right).$$
Method #2. The right branch of the curve $y = x^2$ has equation $x = \sqrt{y}$. The right half of
the area A is
$$\int_{y=1}^{y=2} x \, dy = \int_{y=1}^{y=2} \sqrt{y} \, dy = \frac{2}{3}\left[y^{3/2}\right]_1^2 = \frac{2}{3}\left(2\sqrt{2} - 1\right).$$
Hence, $A = \frac{4}{3}\left(2\sqrt{2} - 1\right)$.
Exercise 36. Find the exact area bounded by the curve $y = x^3$, the horizontal lines y = 1
and y = 2, and the vertical axis. (Answer on p. 338.)

36 Area between a Curve and a Line
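Example 116. Find the area A bounded by the curve $y = x^2$ and the line y = x + 1.
By the quadratic formula, the curve and line intersect at the points $x = \dfrac{1 \pm \sqrt{5}}{2}$.
$$\begin{aligned}
\int_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2} x + 1 - x^2 \, dx &= \left[\frac{x^2}{2} + x - \frac{x^3}{3}\right]_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2} \\
&= \left[\frac{(1+\sqrt{5})^2}{2^3} + \frac{1+\sqrt{5}}{2} - \frac{(1+\sqrt{5})^3}{3 \cdot 2^3}\right] - \left[\frac{(1-\sqrt{5})^2}{2^3} + \frac{1-\sqrt{5}}{2} - \frac{(1-\sqrt{5})^3}{3 \cdot 2^3}\right] \\
&= \left[\frac{6+2\sqrt{5}}{8} + \frac{1+\sqrt{5}}{2} - \frac{16+8\sqrt{5}}{24}\right] - \left[\frac{6-2\sqrt{5}}{8} + \frac{1-\sqrt{5}}{2} - \frac{16-8\sqrt{5}}{24}\right] \\
&= \left[\frac{3+\sqrt{5}}{4} + \frac{1+\sqrt{5}}{2} - \frac{2+\sqrt{5}}{3}\right] - \left[\frac{3-\sqrt{5}}{4} + \frac{1-\sqrt{5}}{2} - \frac{2-\sqrt{5}}{3}\right] \\
&= \frac{7+5\sqrt{5}}{12} - \frac{7-5\sqrt{5}}{12} = \frac{5\sqrt{5}}{6}.
\end{aligned}$$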
Exercise 37. Find the exact area bounded by the curve $y = e^x$ and the lines y = 2, y = 3,
and x = 0.5. (Answer on p. 339.)

37 Area between Two Curves
Example 117. Find the area A bounded by the curves $y = x^2 - 2x - 1$ and $y = 1 - x^2$.

By the quadratic formula, the curves intersect at $x = \dfrac{1 \pm \sqrt{5}}{2}$. So
$$\begin{aligned}
A &= \int_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} 1 - x^2 - (x^2 - 2x - 1) \, dx = 2\int_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} 1 - x^2 + x \, dx \\
&= 2\left[x - \frac{x^3}{3} + \frac{x^2}{2}\right]_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} = \frac{5\sqrt{5}}{3},
\end{aligned}$$
where we’ve simply recycled our tedious calculations from the previous example.
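The long surd calculations in Examples 116 and 117 are easy to get wrong, so a quick numerical check is worthwhile. The sketch below evaluates the same indefinite integral at the two intersection points and compares the results with the exact answers. The names `antiderivative`, `lower`, and `upper` are made up for this illustration.

```python
import math


def antiderivative(x):
    # One indefinite integral of (x + 1) - x^2, the line minus the curve in Example 116.
    return x ** 2 / 2 + x - x ** 3 / 3


# Intersection points, from the quadratic formula applied to x^2 - x - 1 = 0.
lower = (1 - math.sqrt(5)) / 2
upper = (1 + math.sqrt(5)) / 2

area_116 = antiderivative(upper) - antiderivative(lower)
area_117 = 2 * area_116   # the integrand in Example 117 is twice that of Example 116

print(area_116, 5 * math.sqrt(5) / 6)
print(area_117, 5 * math.sqrt(5) / 3)
```

The two numbers printed on each line should match.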
Exercise 38. Find the exact area bounded by the curves $y = 2 - x^2$ and $y = x^2 + 1$. (Answer on
p. 339.)

38 Finding Definite Integrals on your TI84
Example 118. Use your TI84 to find the approximate area bounded by the curve $y = e^{\sin x}$
and the horizontal axis, between x = 1 and x = 2.

2. Press Y= .
3. Press blue 2ND button and then e^x (which corresponds to the LN button). Then
press SIN X,T,θ,n ) ) and altogether you will have entered $e^{\sin x}$.
4. Now press GRAPH and the calculator will graph the given equation.
button), to bring up the CALCULATE menu.
6. Press 7 to select the “∫ f (x) dx” option. This brings you back to the graph.
7. The TI84 is now prompting you for “Lower Limit?” Simply press 1 .
8. Now press ENTER and you will have told the TI84 that your lower limit is x = 1.
9. The TI84 is now similarly prompting you for “Upper Limit?” Simply press 2 .
10. Now press ENTER and you will have told the TI84 that your upper limit is x = 2. The
TI84 now tells you that our desired area (now shaded in black) is
∫ f (x) dx = 2.60466115.
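There is no simple formula for an indefinite integral of $e^{\sin x}$, so here the area has to be approximated numerically. The sketch below uses Simpson's rule, a standard approximation method; this is an assumption for illustration and not a claim about the method the TI84 itself uses. The function name `simpson` and the number of strips are made up.

```python
import math


def simpson(func, a, b, n=1000):
    """Approximate the definite integral of func from a to b using Simpson's rule."""
    if n % 2:
        n += 1   # Simpson's rule needs an even number of strips
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and weight 2 (even i).
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


area = simpson(lambda x: math.exp(math.sin(x)), 1, 2)
print(area)   # compare with the value reported by the TI84
```

As a sanity check, the same function applied to f(x) = 2x between 2 and 5 should return the area 21 found in Example 110.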

Part III
Probability and Statistics
Probability and Statistics accounts for 60% of the A-Level H1 Maths Exam.

39 How to Count: Four Principles
How many arrangements or permutations are there of the three letters in CAT? For
example, one possible permutation of CAT is TCA.
To solve this problem, one possible method is the method of enumeration. That is,
simply list out (enumerate) all the possible permutations.
ACT, ATC, CAT, CTA, TAC, TCA.
We see that there are 6 possible permutations.

Enumeration works well enough when we have just three letters, as in CAT. Indeed,
enumeration is sometimes the quickest method.
In contrast, the 13 letters in the word UNPREDICTABLY have 6,227,020,800 possible
permutations. So enumeration is probably not practical.
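Both claims are easy to check by computer. The sketch below enumerates the permutations of CAT using Python's standard `itertools.permutations`, and then counts (without listing) the permutations of UNPREDICTABLY, relying on the fact that its 13 letters are all different. The variable names are made up for this illustration.

```python
from itertools import permutations
import math

# Enumerate every permutation of the three letters in CAT.
cat_permutations = sorted("".join(p) for p in permutations("CAT"))
print(cat_permutations)
print(len(cat_permutations))

# UNPREDICTABLY has 13 distinct letters, so it has 13! permutations.
word = "UNPREDICTABLY"
print(len(set(word)))
print(math.factorial(len(word)))
```

Listing all of the permutations of the longer word would take far too long, which is why we count rather than enumerate.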
To help us count more efficiently, we’ll learn about four basic principles of counting:
1. The Addition Principle (AP);

2. The Multiplication Principle (MP);
3. The Inclusion-Exclusion Principle (IEP); and
4. The Complements Principle (CP).

39.1 How to Count: The Addition Principle
The addition principle (AP) is very simple.
Example 119. For lunch today, I can either go to the food court or the hawker centre. At
the food court, I have 2 choices: ramen or briyani. At the hawker centre, I have 3 choices:
bak chor mee, nasi lemak, or kway teow.
Altogether then, I have 2 + 3 = 5 choices of what to eat for lunch today.
Here’s an informal statement of the AP:
The Addition Principle (AP). I have to choose a destination, out of two possible areas.
At area #1, there are p possible destinations to choose from. At area #2, there are q possible
destinations to choose from.
The Addition Principle (AP) simply states that I have, in total, p + q different choices.
(Just so you know, the AP is sometimes also called the Second Principle of Counting
or the Rule of Sum or the Disjunctive Rule.)
Of course, the AP generalises to cases where there are more than just 2 “areas”. It may
seem a little silly, but just to illustrate, let’s use the AP to tackle the CAT problem:

Example 120. Problem: How many permutations are there of the letters in the word CAT?
We can divide the possibilities into three cases:
Case #1. First letter is an A. Then the next two letters are either CT or TC — 2
possibilities.
Case #2. First letter is a C. Then the next two letters are either AT or TA — 2 possibilities.
Case #3. First letter is a T. Then the next two letters are either AC or CA — 2 possibilities.
Altogether then, by the AP, there are 2 + 2 + 2 = 6 possibilities. That is, there are 6 possible
permutations of the letters in CAT. These are illustrated in the tree diagram below.

The next exercise is very simple and just to illustrate again the AP.
Exercise 39. Without retracing your steps, how many ways are there to get from the
Starting Point to the River (see figure below)? (Answer on p. 340.)
Exercise 40. How many permutations are there of the letters in the word DEED? Illustrate
your answer with a tree diagram similar to that given in the CAT example above. (Answer
on p. 340.)

39.2 How to Count: The Multiplication Principle
Example 121. For lunch today, I can either have prata or horfun. For dinner tonight, I
can have McDonald’s, KFC, or Pizza Hut.
Enumeration shows that I have a total of 6 possible choices for my two meals today:
(Prata, McDonald’s), (Prata, KFC), (Prata, Pizza Hut),
(Horfun, McDonald’s), (Horfun, KFC), (Horfun, Pizza Hut).
Alternatively, we can use the Multiplication Principle (MP). I have 2 choices for lunch
and 3 choices for dinner. Hence, for my two meals today, I have in total 2 × 3 = 6 possible
choices.
Here’s an informal statement of the MP:
The Multiplication Principle (MP). I have to choose two destinations, one from each
of two possible areas. At area #1, there are p possible destinations to choose from. At area
#2, there are q possible destinations to choose from.
The Multiplication Principle (MP) simply states that I have, in total, p × q different choices.
(The MP is sometimes also called the Fundamental or First Principle of Counting

or the Rule of Product or the Sequential Rule.)
Of course, the MP generalises to cases where there are more than just 2 “areas”. Here’s an
example where we have to make 3 decisions:

Example 122. For breakfast tomorrow, I can have shark’s fin or bird’s nest (2 choices).
For lunch tomorrow, I can have black pepper crab or curry fishhead (2 choices). For dinner
tomorrow, I can have an apple, a banana, or a carrot (3 choices). By the MP, for tomorrow’s
meals, I have a total of 2 × 2 × 3 = 12 possible choices. We can enumerate these (I’ll use
abbreviations):
(SF, BPC, A), (SF, BPC, B), (SF, BPC, C), (SF, CF, A),
(SF, CF, B), (SF, CF, C), (BN, BPC, A), (BN, BPC, B),
(BN, BPC, C), (BN, CF, A), (BN, CF, B), (BN, CF, C).
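As a quick check of the enumeration in this example, here is a small Python sketch (the list names are illustrative and not from the text). It builds every (breakfast, lunch, dinner) triple with `itertools.product` and compares the number of triples with the product 2 × 2 × 3 given by the MP.

```python
from itertools import product

breakfast = ["SF", "BN"]     # shark's fin, bird's nest
lunch = ["BPC", "CF"]        # black pepper crab, curry fishhead
dinner = ["A", "B", "C"]     # apple, banana, carrot

# One triple per way of making the three decisions.
menus = list(product(breakfast, lunch, dinner))

# Multiplication Principle: multiply the number of choices per decision.
count_by_mp = len(breakfast) * len(lunch) * len(dinner)

for menu in menus:
    print(menu)
print(len(menus), count_by_mp)
```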
More examples:

Example 123. Problem: How many four-letter words can be formed using the letters in
the 26-letter alphabet?
Let’s rephrase this problem so that it is clearly in the framework of the MP. We have 4
blank spaces to be filled:
_ _ _ _.
1 2 3 4
These 4 blank spaces correspond to 4 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space? Decision #4: What letter to
put in the fourth blank space?
How many choices have we for each decision?
For Decision #1, we can put A, B, C, ..., or Z. So we have 26 choices for Decision #1.
For Decision #2, we can again put A, B, C, ..., or Z. So we again have 26 choices for
Decision #2.
We likewise have 26 choices for Decision #3 and also 26 choices for Decision #4.
Altogether then, by the MP, there are 26 × 26 × 26 × 26 = $26^4$ = 456,976 ways to make our
four decisions.
Solution: There are $26^4$ = 456,976 possible four-letter words that can be formed using the
26-letter alphabet.

Example 124. One 18-sided die has the numbers 1 through 18 printed on each of its sides.
Another six-sided die has the letters A, B, C, D, E, and F printed on each of its sides. We
roll the two dice. How many distinct possible outcomes are there?
Again, let’s rephrase this problem in the framework of the MP. Consider 2 blank spaces:
_ _.
1 2
These 2 blank spaces correspond to 2 decisions to be made. Decision #1: What number to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Again we ask: How many choices have we for each decision?
For Decision #1, we can put 1, 2, 3, ..., or 18. So we have 18 choices for Decision #1.
For Decision #2, we can put A, B, C, D, E, or F. So we have 6 choices for Decision #2.
Altogether then, by the MP, there are 18 × 6 = 108 ways to make our two decisions. In other
words, there are 108 possible outcomes from rolling these two dice.
(If necessary, it is tedious but not difficult to enumerate them: 1A, 1B, 1C, 1D, 1E, 1F,
2A, 2B, ..., 17E, 17F, 18A, 18B, 18C, 18D, 18E, and 18F.)
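The two "blank spaces" examples above can be reproduced in a few lines of Python. This is only a sketch with illustrative names: the four-letter-word count is computed directly from the MP, while the dice outcomes are small enough to enumerate in full.

```python
from itertools import product
from string import ascii_uppercase

# Four blanks, 26 choices for each blank.
four_letter_words = len(ascii_uppercase) ** 4

# Two blanks: a number from the 18-sided die, then a letter from the six-sided die.
numbers = range(1, 19)
letters = "ABCDEF"
dice_outcomes = [f"{n}{c}" for n, c in product(numbers, letters)]

print(four_letter_words)
print(len(dice_outcomes), dice_outcomes[:6], dice_outcomes[-6:])
```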
Exercise 41. A club has a shortlist of 3 men for president, 5 animals for vice-president, and
10 women for club mascot. How many possible ways are there to choose the president, the
vice-president, and the mascot? (Answer on p. 340.)
Exercise 42. (Answer on p. 341.) The highly-stimulating game of 4D consists of selecting
a four-digit number, between 0000 and 9999 (so there are 10,000 possible numbers).
Your mother tells you to go to the nearest gambling den (also known as a Singapore Pools
outlet) to buy any three numbers, subject to these two conditions:
• The four digits in each number are distinct.
• Each four-digit number is distinct.
How many possible ways are there to fulfil your mother’s request?

39.3 How to Count: The Inclusion-Exclusion Principle
The Inclusion-Exclusion Principle (IEP) is another very simple principle.
Example 125. For lunch today, I can either go to the food court or the hawker centre. At
the food court, I have 4 choices of cuisine: Chinese, Indian, Malay, and Western. At the
hawker centre, I have 3 choices of cuisine: Chinese, Malay, and Thai.
There are 2 choices of cuisine that are common to both the food court and the hawker
centre (Chinese and Malay).
And so by the Inclusion-Exclusion Principle (IEP), I have in total 4 + 3 − 2 = 5 choices of
cuisine. The Venn diagram below illustrates.
Why do we subtract 2? If we simply added the 4 choices available at the food court to the
3 available at the hawker centre, then we’d double-count the Chinese and Malay cuisines,
which are available at both the food court and the hawker centre. And so we must subtract
the 2 cuisines that are at both locations.

Example 126. Problem: How many integers between 1 and 20 are divisible by 2 or 5?
There are 10 integers divisible by 2, namely 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20.
There are 4 integers divisible by 5, namely 5, 10, 15, and 20.
There are 2 integers divisible by BOTH 2 and 5, namely 10 and 20.
Hence, by the IEP, there are 10 + 4 − 2 = 12 integers that are divisible by either 2 or 5.
(These are namely 2, 4, 5, 6, 8, 10, 12, 14, 15, 16, 18, and 20.)
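The divisibility example maps directly onto set operations. The sketch below (Python, with illustrative variable names) builds the two sets, takes their intersection and union, and confirms that the IEP count agrees with the size of the union.

```python
# Integers from 1 to 20 that are divisible by 2, and those divisible by 5.
div_by_2 = {n for n in range(1, 21) if n % 2 == 0}
div_by_5 = {n for n in range(1, 21) if n % 5 == 0}

both = div_by_2 & div_by_5      # divisible by 2 AND 5
either = div_by_2 | div_by_5    # divisible by 2 OR 5

# Inclusion-Exclusion: add the two counts, then subtract the overlap once.
iep_count = len(div_by_2) + len(div_by_5) - len(both)

print(sorted(either))
print(iep_count, len(either))
```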
Here’s an informal statement of the IEP:
The Inclusion-Exclusion Principle (IEP). I have to choose a destination, out of two
possible areas. At area #1, there are p possible destinations to choose from. At area #2,
there are q possible destinations to choose from. Areas #1 and #2 overlap — they have r
destinations in common.
The IEP simply states that I have, in total, p + q − r different choices.
Exercise 43. (Answer on p. 342.) The food court has 4 types of cuisine: Chinese,
Indonesian, Korean, and Western. The hawker centre has 3: Chinese, Malay, and Western.
A restaurant has 3: Chinese, Japanese, and Malay.
In total, how many different types of cuisine are there? Illustrate your answer with a Venn
diagram.

39.4 How to Count: The Complements Principle
The Complements Principle (CP) is another very simple principle.
Example 127. The food court has 4 types of cuisine: Chinese, Malay, Indian, and Other.
I’m at the food court but don’t feel like eating Malay or Chinese. So by the Complements
Principle (CP), I have 4 − 2 = 2 possible choices of cuisine (Indian and Other).
Here’s an informal statement of the CP:
The Complements Principle (CP). There are p possible destinations. I must choose
one. I rule out q of the possible destinations.
The Complements Principle says that I am left with p − q possible choices.
Exercise 44. There are 10 Southeast Asian countries, of which 3 (Brunei, Indonesia, and
the Philippines) are not on the mainland. How many mainland Southeast Asian countries
are there that a European tourist can visit? (Answer on p. 342.)

40 How to Count: Permutations
In this chapter, we’ll use the MP to generate several more methods of counting.
But first, we’ll learn about the factorial notation.
Definition 1. Let $n \in \mathbb{Z}_0^+$. Then n-factorial, denoted n!, is defined by n! = n × (n − 1) × ⋅⋅⋅ × 1
for n ≥ 1 and 0! = 1.
Example 128. 0! = 1, 1! = 1, 2! = 2 × 1 = 2, 3! = 3 × 2 × 1 = 6, 4! = 4 × 3 × 2 × 1 = 24,
5! = 5 × 4 × 3 × 2 × 1 = 120.
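The definition of the factorial translates almost word for word into code. Below is a minimal Python sketch; the function `factorial` is written out by hand purely for illustration (Python's standard library already provides `math.factorial`, which is used here only as a cross-check).

```python
import math

def factorial(n):
    """Return n! for a non-negative integer n, with 0! = 1 by definition."""
    result = 1
    for i in range(2, n + 1):   # the loop body never runs when n is 0 or 1
        result *= i
    return result

values = [factorial(n) for n in range(6)]
print(values)

# Cross-check against the standard library.
print(all(factorial(n) == math.factorial(n) for n in range(15)))
```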
Exercise 45. Compute 6!, 7!, and 8!. (Answer on p. 342.)
We now revisit the CAT problem, using the MP:

Example 129. Problem: How many permutations (or arrangements) are there of the three
letters in the word CAT?
Let’s rephrase this problem in the framework of the MP. Consider three blank spaces:
_ _ _.
1 2 3
These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
For Decision #1, we can put C, A, or T. So we have 3 choices for Decision #1.
Having already used up a letter in Decision #1, we are left with two letters. So we have 2
choices for Decision #2.
Having already used up a letter in Decision #1 and another in Decision #2, we are left
with just one letter. So we have only 1 choice for Decision #3.
Altogether then, by the MP, there are 3×2×1 = 3! = 6 possible ways of making our decisions.
This is also the number of ways there are to arrange the three letters in the word CAT.
Let’s now try the UNPREDICTABLY problem.

Example 130. Problem: How many permutations are there of the 13 letters in the
word UNPREDICTABLY?
Again, let’s rephrase this problem in the framework of the MP. Consider 13 blank spaces:
_ _ _ _ _ _ _ _ _ _ _ _ _.
1 2 3 4 5 6 7 8 9 10 11 12 13
These 13 blank spaces correspond to 13 decisions to be made. Decision #1: What letter
to put in the first blank space? Decision #2: What letter to put in the second blank space?
... Decision #13: What letter to put in the 13th blank space?
First an important note: In the word UNPREDICTABLY, no letter is repeated. (Indeed,
UNPREDICTABLY is the longest “common” English word without any repeated letters.)
For Decision #1, we can put U, N, P, R, E, D, I, C, T, A, B, L, or Y. So we have 13 choices
for Decision #1.
For Decision #2, having already used up a letter in Decision #1, we are left with 12 letters.
So we have 12 choices for Decision #2.
For Decision #3, having already used up a letter in Decision #1 and another letter in
Decision #2, we are left with 11 letters. So we have 11 choices for Decision #3.
For Decision #13, having already used up a letter in Decision #1, another in Decision #2,
another in Decision #3, ..., and another in Decision #12, we are left with one letter. So
we have 1 choice for Decision #13.
Altogether then, by the MP, there are 13 × 12 × ⋅⋅⋅ × 2 × 1 = 13! = 6,227,020,800 possible
ways of making our decisions. This is also the number of ways there are to arrange the 13
letters in the word UNPREDICTABLY.
The next fact simply summarises what should already be obvious from the above examples:
Fact 2. There are n! possible permutations of n distinct objects.
Here is an informal proof of the above fact.

Consider n empty spaces. We are to fill them with the n distinct objects.
_ _ _ . . . _.
1 2 3 n
For space #1, we have n possible choices. For space #2, we have n − 1 possible choices
(because one object was already placed in space #1). ... And finally for space #n, we have
only 1 object left and thus only 1 choice. By the MP then, there are n × (n − 1) × ⋅ ⋅ ⋅ × 1 = n!
possible ways of filling in these n spaces with the n distinct objects.
Example 131. The word COWDUNG has seven distinct letters. Hence, there are 7! = 5040
permutations of the letters in the word COWDUNG.
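For short words, Fact 2 can be confirmed by listing every permutation and counting. The Python sketch below does this for CAT and COWDUNG; the helper name `count_permutations` is illustrative. For UNPREDICTABLY, brute force is impractical (there are over six billion orderings), so only the value of 13! is computed.

```python
from itertools import permutations
from math import factorial

def count_permutations(word):
    # Brute force: walk through every ordering of the letters and count them.
    # This matches n! only when all the letters are distinct.
    return sum(1 for _ in permutations(word))

cat_count = count_permutations("CAT")
cowdung_count = count_permutations("COWDUNG")

print(cat_count, factorial(3))
print(cowdung_count, factorial(7))
print(factorial(13))
```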

40.1 Permutations with Repeated Elements
In the previous section, we saw that there are 3! permutations of the three letters in the
word CAT and 13! permutations of the 13 letters in the word UNPREDICTABLY. We
made an important note: In each of these words, there was no repeated letter.
We now consider permutations of a set where some elements are repeated.
Example 132. How many permutations are there of the three letters in the word SEE?
A naïve application of the MP would suggest that the answer is 3! = 6. This is wrong.
Enumeration shows that there are only 3 possible permutations:
EES, ESE, SEE.
To see why a naïve application of the MP fails, set up the problem in the framework of the
MP. Consider 3 blank spaces:
_ _ _.
1 2 3
These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
For Decision #1, we can put E or S. So we have 2 choices for Decision #1.
But now the number of choices available for Decision #2 depends on what we chose for
Decision #1! (If we chose E in Decision #1, then we again have 2 choices for Decision
#2. But if instead we chose S in Decision #1, then we now have only 1 choice for Decision
#2.) This violates the implicit but important assumption in the MP that the number of
choices available in one decision is independent of the choice made in the other decision.
Hence, the MP does not directly apply.

The reason SEE has only 3 possible permutations (instead of 3! = 6) is that it contains a
repeated element, namely E. But why would this make any difference?
To understand why, let’s rename the second E as Ê, so that the word SEE is now trans-
formed into a new word SEÊ. From the three letters of this new word, we’d again have
3! = 6 possible permutations:
EÊS, ÊES, ESÊ, ÊSE, SEÊ, SÊE.
Restricting attention to the two letters EÊ, we see that there are 2! = 2 ways to permute
these two letters. Hence, any single permutation (in the case where we do not distinguish
between the two E’s) corresponds to 2 possible permutations (in the case where we do). The
figure below illustrates how the 3 permutations of SEE correspond to the 6 permutations
in SEÊ.
Hence, when we do not distinguish between the two E’s, there are only half as many possible
permutations.
We next consider permutations of SASS.

Example 133. How many permutations are there of the four letters in the word SASS?
The answer is 4!/3! = 4. Let’s see why.
If we distinguish between the three S’s, perhaps by calling them S, Ŝ, and S̄, then we’d
have 4! = 24 possible permutations of the letters in the word SAŜS̄.
But amongst the three S’s themselves, we have 3! = 6 possible permutations: SŜS̄, SS̄Ŝ,
ŜSS̄, S̄SŜ, ŜS̄S, and S̄ŜS. So distinguishing between the three S’s increases by 6-fold the
number of possible permutations. Working backwards, the word SASS thus has one-sixth
as many permutations as SAŜS̄. That is, SASS has 4!/3! = 4 possible permutations.
The figure below illustrates how the 4 possible permutations of SASS correspond to the 24
possible permutations of SAŜS̄.
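The SEE and SASS examples can be checked in two ways, sketched below in Python with illustrative function names. The first function lists all orderings and lets a set discard duplicate spellings. The second starts from n! and divides by r! for each letter that appears r times; applying the division once per repeated letter is the same argument as in the examples above, carried out letter by letter.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def distinct_arrangements(word):
    # Brute force: a set keeps only one copy of each spelling.
    return {"".join(p) for p in permutations(word)}

def count_by_formula(word):
    # Start from n!, then divide by r! for every letter that occurs r times.
    count = factorial(len(word))
    for repeats in Counter(word).values():
        count //= factorial(repeats)
    return count

for word in ("SEE", "SASS"):
    print(word, sorted(distinct_arrangements(word)), count_by_formula(word))
```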
Exercise 46. There are 3 identical white tiles and 4 identical black tiles. How many ways
are there of arranging these 7 tiles in a row? (Answer on p. 342.)

40.2 Partial Permutations
Example 134. Using the 26-letter alphabet, how many 3-letter words can we form that
have no repeated letters? This, of course, is simply the problem of filling in these 3 empty
spaces using 26 distinct elements. For space #1, we have 26 possible choices. For space
#2, we have 25. And for space #3, we have 24.
_ _ _.
1 2 3
By the MP then, the number of ways to fill the three spaces is 26 × 25 × 24. This is also the
number of three-letter words with no repeated letters.
Problems like the above example crop up often enough to motivate a new piece of notation:
Definition 2. Let n, k be positive integers with n ≥ k. Then P(n, k), read aloud as n
permute k, is defined by
$$P(n, k) = \frac{n!}{(n - k)!}.$$
P(n, k) answers the following question: “Given n distinct objects and k spaces (where
k ≤ n), how many ways are there to fill the k spaces?”
Just so you know, P(n, k) is also variously denoted nPk, $P_k^n$, ${}_nP_k$, etc., but we’ll stick solely
with the P(n, k) in this textbook.
Example 134 (continued from above). The number of 3-letter words without repeated
letters is simply P(26, 3) = 26!/23! = 26 × 25 × 24.
Example 135. Problem: Using the 22-letter Phoenician alphabet, how many 4-letter words
can we form that have no repeated letters?
This, of course, is simply the problem of filling in these 4 empty spaces using 22 distinct
elements. So the answer is P(22, 4) = 22!/18! = 22 × 21 × 20 × 19 words.
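Here is a short Python sketch of P(n, k), assuming Python 3.8 or later so that `math.perm` is available as a cross-check. The function name `P` mirrors the notation of this section; the small brute-force check with 6 objects and 2 spaces is an added illustration, not an example from the text.

```python
from itertools import permutations
from math import factorial, perm

def P(n, k):
    # n! / (n - k)!, i.e. n x (n - 1) x ... x (n - k + 1).
    return factorial(n) // factorial(n - k)

three_letter = P(26, 3)   # 3-letter words, 26-letter alphabet, no repeats
phoenician = P(22, 4)     # 4-letter words, 22-letter alphabet, no repeats

# Brute-force check on a small case: fill 2 ordered spaces from 6 objects.
brute = sum(1 for _ in permutations(range(6), 2))

print(three_letter, perm(26, 3))
print(phoenician, perm(22, 4))
print(brute, P(6, 2))
```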
Exercise 47. Out of a committee of 11 members, how many ways are there to choose a
president and a vice-president? (Answer on p. 342.)

40.3 Permutations with Restrictions
Example 136. At a dance party, there are 7 heterosexual married couples (and thus 14
people in total). Problem #1. How many ways are there of arranging them in a line, with
the restriction that every person is next to his or her partner?
Think of there as being 7 units (each unit being a couple). There are 7! ways to arrange
these 7 units in a line. Within each unit, there are 2 possible arrangements. Hence, in
total, there are 7! × $2^7$ possible arrangements.
Example 137. (I assume you’re familiar with the standard 52-card deck.)

Problem #1. Using a standard 52-card deck, how many ways are there of arranging any
3 cards in a line, with the restriction that no two cards of the same suit are next to each
other?
This is the problem of filling in 3 spaces with 52 distinct objects. For space #1, we have
52 possible choices.
_ _ _.
1 2 3
For space #2, having picked a card of suit X for space #1, we must pick a card from some
other suit Y. And so there are only 39 possible choices (we have three suits available —
that’s 3 × 13 = 39).
For space #3, having picked a card of suit Y for space #2, we must pick a card from some
other suit Z. Note that suit Z can be the same as suit X. And so there are 38 possible choices
(we have three suits available, less the card used for space #1 — that’s 3 × 13 − 1 = 38).
Altogether then, there are 52 × 39 × 38 possible arrangements.
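The card count is small enough to verify by brute force. The sketch below (Python; the representation of a card as a `(rank, suit)` pair is an assumption made for illustration) runs through all ordered triples of distinct cards and keeps those in which neighbouring cards have different suits.

```python
from itertools import permutations, product

ranks = range(13)
suits = "SHDC"
deck = list(product(ranks, suits))   # 52 distinct (rank, suit) pairs

# Count ordered triples where cards in adjacent spaces differ in suit.
# Spaces #1 and #3 are not adjacent, so they may share a suit.
valid = sum(
    1
    for a, b, c in permutations(deck, 3)
    if a[1] != b[1] and b[1] != c[1]
)

print(valid, 52 * 39 * 38)
```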
Exercise 48. (Answer on p. 343.) There are 4 brothers and 3 sisters. In how many ways
can they be arranged ...
(a) in a line, without any 2 brothers being next to each other?
(b) in a line, without any 2 sisters being next to each other?

41 How to Count: Combinations
P (n, k) is the number of ways we can fill k (ordered) spaces using n distinct objects.
In contrast, C(n, k) is the number of ways of choosing k out of n distinct objects. Equiva-
lently, it is the same problem of filling k spaces using n distinct objects, except that now
order does not matter.
Example 138. Suppose we have a committee of 13 members and wish to select a president
and a vice-president. This is equivalent to the problem of filling in 2 spaces, given 13
distinct objects.
_ _.
1 2
The answer is thus simply P (13, 2) = 13 × 12.
Suppose instead that we want to choose two co-presidents. How many ways are there of
doing so?
This is simply the same problem as before — again we want to fill in 2 spaces, given 13
distinct objects. The only difference now is that the order of the 2 chosen objects
does not matter. So the answer must be that there are P (13, 2)/2! ways of choosing the
two co-presidents.
Example 139. How many ways are there of choosing 5 cards out of a standard 52-card
deck?
_ _ _ _ _.
1 2 3 4 5
First, how many ways are there to fill 5 spaces using 52 distinct objects (where order
matters)? Answer: P(52, 5) = 52 × 51 × 50 × 49 × 48 = 311,875,200.
And so if we don’t care about order, we must adjust this number by dividing by 5! to get
P(52, 5)/5! = 2,598,960. So the answer is that to choose 5 cards out of a 52-card deck,
there are 2,598,960 ways.
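The "count with order, then divide by k!" step in the two examples above can be sketched in Python as follows (Python 3.8 or later is assumed for `math.perm` and `math.comb`; the variable names are illustrative). The co-president count is also confirmed by listing the unordered pairs directly.

```python
from itertools import combinations
from math import comb, factorial, perm

# Five cards: count ordered selections, then divide out the 5! orderings.
ordered = perm(52, 5)
unordered = ordered // factorial(5)

# Two co-presidents out of 13 members.
co_presidents = perm(13, 2) // factorial(2)
pairs = list(combinations(range(13), 2))   # every unordered pair, listed once

print(ordered, unordered, comb(52, 5))
print(co_presidents, len(pairs))
```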
The above examples suggest that, in general, to choose k out of n given distinct objects,
there are P (n, k)/k! possible ways. This motivates the following definition:

Definition 3. Let n, k be positive integers with n ≥ k. Then C(n, k), read aloud as n
choose k, is defined by
$$C(n, k) = \frac{P(n, k)}{k!} = \frac{n!}{(n - k)!\,k!}.$$
It turns out that C(n, k) appears so often in maths that it has many alternative notations;
one of the most common is $\binom{n}{k}$.
“n choose k” also has several names, such as the combination, the combinatorial
number, and even the binomial coefficient. Shortly, we’ll see why the name binomial
coefficient makes sense.
Exercise 49 gives an alternate expression for C(n, k) which you’ll often find very useful.
Exercise 49. Show that
$$C(n, k) = \frac{n \times (n - 1) \times (n - 2) \times \cdots \times (n - k + 1)}{k!}.$$
(Answer on p. 344.)
Exercise 50. Compute C(4, 2), C(6, 4), and C(7, 3). (Answer on p. 344.)
Exercise 51. We wish to form a basketball team, consisting of 1 centre, 2 forwards, and 2
guards. We have available 3 centres, 7 forwards, and 5 guards. How many ways are there
of forming a team? (Answer on p. 344.)
Here’s a nice symmetry property:
Ways to choose k out of n distinct objects = Ways to choose n − k out of n distinct objects.
Intuitively, this property is true because choosing k out of n objects, is the same as choosing
which n − k out of n objects to ignore. Let’s jot down this symmetry property as a formal
fact:
Fact 3. (Symmetry.) C(n, k) = C(n, n − k).

Example 140. We have a group of 100 men. 70 are needed for a task. The number of
ways to choose these 70 men is:
$$C(100, 70) = \frac{100!}{30!\,70!}.$$
This is the same as the number of ways to choose the 30 men that will not be used for the
task:
$$C(100, 30) = \frac{100!}{70!\,30!}.$$

41.1 Pascal’s Triangle
Pascal’s Triangle consists of a triangle of numbers. If we adopt the convention that the
topmost row is row 0 and the leftmost term of each row is the 0th term, then the nth row,
kth term is the number C(n, k):
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
⋮
It turns out that beautifully enough, each term is equal to the sum of the two terms above
it. The next exercise asks you to verify several instances of this:
Exercise 52. Verify the following: (a) C(1, 0) + C(1, 1) = C(2, 1); (b) C(4, 2) + C(4, 3) =
C(5, 3); (c) C(17, 2) + C(17, 3) = C(18, 3). (Answer on p. 344.)
Fact 4. (Pascal’s Rule/Identity/Relation.) C(n + 1, k) = C(n, k) + C(n, k − 1).
Proof. C(n + 1, k) is the number of ways of choosing k out of n + 1 distinct objects.

Suppose we do not choose the last object, i.e. the (n + 1)th object. Then we have to choose
our k objects out of the first n objects. There are C(n, k) ways of doing so.
Suppose we do choose the last object. Then we have to choose another k − 1 objects, out
of the first n objects. There are C(n, k − 1) ways of doing so.
Altogether then, by the Addition Principle, there are C(n, k) + C(n, k − 1) ways of choosing
k out of n + 1 distinct objects.
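Pascal's Rule gives a way to build the triangle using additions only. The Python sketch below (the function name `pascal_rows` is illustrative; `math.comb` needs Python 3.8 or later) generates rows 0 to 7 and compares each entry with C(n, k). It also checks the symmetry property of Fact 3 on each row.

```python
from math import comb

def pascal_rows(n_rows):
    """Return the first n_rows rows of Pascal's Triangle, starting at row 0."""
    rows = [[1]]
    for _ in range(n_rows - 1):
        prev = rows[-1]
        # Each inner term is the sum of the two terms above it (Pascal's Rule).
        inner = [prev[i] + prev[i + 1] for i in range(len(prev) - 1)]
        rows.append([1] + inner + [1])
    return rows

rows = pascal_rows(8)
for row in rows:
    print(row)

# Every entry equals C(n, k), and every row reads the same backwards (Fact 3).
print(all(rows[n][k] == comb(n, k) for n in range(8) for k in range(n + 1)))
print(all(row == row[::-1] for row in rows))
```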

41.2 The Combination as Binomial Coefficient
Mathematics is the art of giving the same name to different things.

- Henri Poincaré, p. 34 in Science and Method.
Poincaré’s quote is especially true in combinatorics. In this section, we’ll learn why C (n, k)
can be called the combination and also the binomial coefficient.
Verify for yourself that the following equations are true:
$(1 + x)^0 = 1$,
$(1 + x)^1 = 1 + x$,
$(1 + x)^2 = 1 + 2x + x^2$,
$(1 + x)^3 = 1 + 3x + 3x^2 + x^3$,
$(1 + x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4$,
$(1 + x)^5 = 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5$,
$(1 + x)^6 = 1 + 6x + 15x^2 + 20x^3 + 15x^4 + 6x^5 + x^6$,
$(1 + x)^7 = 1 + 7x + 21x^2 + 35x^3 + 35x^4 + 21x^5 + 7x^6 + x^7$.
⋮
Each of the expressions on the RHS is called a binomial series. Each can also be called
the binomial expansion of $(1 + x)^n$.
Notice anything interesting? No? Try this exercise:
Exercise 53. Compute $\binom{7}{0}$, $\binom{7}{1}$, $\binom{7}{2}$, $\binom{7}{3}$, $\binom{7}{4}$, $\binom{7}{5}$, $\binom{7}{6}$, $\binom{7}{7}$. Compare
these to the coefficients of the binomial expansion of $(1 + x)^7$. What do you notice? (Answer
on p. 345.)
It turns out that somewhat surprisingly, the coefficients of the binomial expansions of
$(1 + x)^n$ are simply $\binom{n}{0}$, $\binom{n}{1}$, ..., $\binom{n}{n}$. As an additional exercise, you should verify for
yourself that this is also true for n = 0 through n = 6.
There are several ways to explain why the combinatorial numbers also happen to be the
binomial coefficients. Here we’ll give only the combinatorial explanation:

Consider $(1 + x)^2$. Expanding, we have
$$(1 + x)^2 = (1 + x)(1 + x) = 1 \cdot 1 + 1 \cdot x + x \cdot 1 + x \cdot x.$$
Consider the 4 terms on the right.
• For 1 ⋅ 1, we “chose” 1 from the first (1 + x) and 1 from the second (1 + x). From the two
(1 + x)’s in the product, there is C(2, 0) = 1 way to choose 0 of the x’s.
• For 1 ⋅ x, we “chose” 1 from the first (1 + x) and x from the second (1 + x). For x ⋅ 1, we
“chose” x from the first (1 + x) and 1 from the second (1 + x). From the two (1 + x)’s in the
product, there are C(2, 1) = 2 ways to choose 1 of the x’s.
• Finally, for x ⋅ x, we “chose” x from the first (1 + x) and x from the second (1 + x). From
the two (1 + x)’s in the product, there is C(2, 2) = 1 way to choose 2 of the x’s.
Altogether then, the coefficient on $x^0$ is C(2, 0) (“choose 0 of the x’s”), that on $x^1$ is C(2, 1)
(“choose 1 of the x’s”), and that on $x^2$ is C(2, 2) (“choose 2 of the x’s”). That is:
$$(1 + x)^2 = \binom{2}{0} x^0 + \binom{2}{1} x^1 + \binom{2}{2} x^2 = 1 + 2x + x^2.$$
Exercise 54. (Answer on p. 345.) Mimicking what was just done above, explain why
$$(1 + x)^3 = \binom{3}{0} x^0 + \binom{3}{1} x^1 + \binom{3}{2} x^2 + \binom{3}{3} x^3.$$
More generally, we have

Fact 5. Let $n \in \mathbb{Z}^+$. Then
$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2 + \cdots + \binom{n}{n} x^0 y^n.$$
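Fact 5 can be spot-checked numerically. The Python sketch below uses two illustrative helpers that are not from the text: `expand_one_plus_x` multiplies out (1 + x) repeatedly and returns the coefficients, and `binomial_sum` evaluates the right-hand side of Fact 5 for given numbers x, y, and n. This is a check on particular values, not a proof.

```python
from math import comb

def expand_one_plus_x(n):
    """Coefficients of (1 + x)^n, lowest power first, by repeated multiplication."""
    coeffs = [1]
    for _ in range(n):
        # Multiplying by (1 + x): the 1 contributes old[i], the x contributes old[i - 1].
        coeffs = [
            (coeffs[i] if i < len(coeffs) else 0) + (coeffs[i - 1] if i > 0 else 0)
            for i in range(len(coeffs) + 1)
        ]
    return coeffs

def binomial_sum(x, y, n):
    # Right-hand side of Fact 5: sum of C(n, i) * x^(n - i) * y^i for i = 0, ..., n.
    return sum(comb(n, i) * x ** (n - i) * y ** i for i in range(n + 1))

print(expand_one_plus_x(7))
print([comb(7, k) for k in range(8)])
print(binomial_sum(2, 3, 5), (2 + 3) ** 5)
```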

41.3 The Number of Subsets of a Set is $2^n$
By plugging x = 1, y = 1 into the last fact, we see that $(1 + 1)^n = 2^n$ is the sum of the terms
in the nth row of Pascal’s triangle:
Fact 6. Let $n \in \mathbb{Z}^+$. Then
$$2^n = \sum_{i=0}^{n} \binom{n}{i} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$$
There’s a nice combinatorial interpretation of the above fact (Poincaré’s quote at work
again).
Consider the set S = {A, B}. S has $2^2 = 4$ subsets: ∅ = {}, {A}, {B}, and S = {A, B}.
Now consider the set T = {A, B, C}. T has $2^3 = 8$ subsets: ∅ = {}, {A}, {B}, {C}, {A, B},
{A, C}, {B, C}, and T = {A, B, C}.
In general, if a set has n elements, how many subsets does it have? We can couch this in
the framework of the Multiplication Principle: this is really a sequence of n decisions of
whether or not to include each element in the subset. There are 2 choices for each decision.
Thus, there are $2^n$ choices altogether. In other words, using a set of n elements, we can
form $2^n$ subsets.
But of course, this must in turn be equal to the sum of the following:
• C (n, 0) ways to form subsets with 0 elements;

• C (n, 1) ways to form subsets with 1 element;
• C (n, 2) ways to form subsets with 2 elements;
...
• C (n, n) ways to form subsets with n elements.
Thus,
$$2^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n}.$$
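The two ways of counting subsets can be compared directly. In the Python sketch below, the helper `subsets` (an illustrative name) lists the subsets size by size, which mirrors the bulleted sum above, and the total is then compared with the power of 2 given by the Multiplication Principle.

```python
from itertools import combinations
from math import comb

def subsets(elements):
    """List every subset of the given elements, grouped by size 0, 1, ..., n."""
    elements = list(elements)
    result = []
    for size in range(len(elements) + 1):
        # There are C(n, size) subsets with exactly `size` elements.
        result.extend(set(c) for c in combinations(elements, size))
    return result

subsets_of_T = subsets("ABC")
print(subsets_of_T)
print(len(subsets_of_T), 2 ** 3)

# The same comparison for a 5-element set.
print(len(subsets(range(5))), sum(comb(5, k) for k in range(6)), 2 ** 5)
```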

Exercise 55. Verify that $2^7 = \binom{7}{0} + \binom{7}{1} + \binom{7}{2} + \cdots + \binom{7}{7}$. (Answer on p. 345.)
Exercise 56. Using what you’ve learnt, write down $(3 + x)^4$. (Answer on p. 346.)
Exercise 57. (Answer on p. 346.) (a) The Tan family has 4 sons and the Wong family
has 3 daughters. Using the sons and daughters from these two families, how many ways
are there of forming 2 heterosexual couples?
(b) The Lee family has 6 sons and the Ho family has 9 daughters. Using the sons and
daughters from these two families, how many ways are there of forming 5 heterosexual
couples?

42 Probability: Introduction
42.1 Mathematical Modelling
All models are wrong, but some are useful.

- G.E.P. Box, p. 202 in Robustness in Statistics.
Whenever we use maths in a real-world scenario, we have some mathematical model in
mind. Here’s a very simple example just to illustrate:
Example 141. We want to know how much material to purchase, in order to build a fence
around a field. We might go through these steps:
1. Formulate a mathematical model: Our field is the shape of a rectangle, with length
100 m and breadth 50 m.
2. Analyse: The rectangle has perimeter 100 + 50 + 100 + 50 = 300 m.
3. Apply the results of our analysis: We need to buy enough material to build a
300-metre long fence.
The figure below depicts how mathematical modelling works.
Starting with some real-world scenario, we go through these steps:
1. Formulate a mathematical model.
That is, describe the real-world scenario in mathematical language and concepts.
This first step is arguably the most important. It is often subjective — not everyone will
agree that your mathematical model is the most appropriate for the scenario at hand.
To use the above example, the field may not be a perfect rectangle, so some may object
to your description of the field as a rectangle. Nonetheless, you may decide that all things
considered, the rectangle is a good mathematical model.

2. Analyse the model.
This involves using maths and the rules of logic. (A-level maths exams tend to be mostly
concerned with this second step.)
In the above example, this second step simply involved computing the perimeter of the
rectangle — 100 + 50 + 100 + 50 = 300 m. Of course, for the A-levels, you can expect the
analysis to be more challenging than this.
Note that this second step, in contrast to the first, is supposed to be completely watertight,
non-subjective, and with no room for disagreement. After all, hardly anyone reasonable
could disagree that a perfect rectangle with length 100 m and breadth 50 m has perimeter
300 m.
3. Apply your results.
Now apply the results of your analysis to the real-world scenario.

In the above example, pretend you’re a mathematical consultant hired by the fence-builder.
Then your final report might simply say, “We recommend the purchase of 300 m worth of
fence material.”
This third and last step is, like the first, subjective and open to debate. It involves your
interpretation of what the results of your analysis mean (in the real world) and your rec-
ommendation of what actions to take.
For example, you find that the fence will have perimeter 300 m and thus recommend that
300 m of fence material be purchased. However, someone else, looking at the same result,
might point out that the corners of the fence require additional or special material; she
might thus make a slightly different recommendation.
We’ve secretly always been using mathematical modelling; we just haven’t always been
terribly explicit about it. The foregoing discussion was placed here because, with probability
and statistical models, we want to be especially clear that we are doing mathematical
modelling.

42.2 The Experiment as a Model of Scenarios Involving Chance
Real-world scenarios often involve chance. We can model such scenarios mathe-
matically using a mathematical object called the experiment. The experiment can be
formally defined, but we shall not do so in this textbook. Instead, we’ll merely discuss the
experiment informally, with the aid of examples.[13]
Example 142. A coin flip is an example of an experiment. There are two possible outcomes:
H and T.
Example 143. A die roll is an example of an experiment. There are six possible outcomes:
1, 2, 3, 4, 5, and 6.
An event is simply any set of possible outcomes.
Example 144. In the die roll experiment, an example of an event is A = {1, 3, 5}. This is
the event that the die roll is odd. The probability of this event occurring is 0.5. We may
write P(A) = 0.5.
Another example of an event is B = {2, 4, 6}. This is the event that the die roll is even.
The probability of this event occurring is 0.5. We may write P(B) = 0.5.
Another example of an event is C = {1}. This is the event that the die roll is 1. The
probability of this event occurring is 1/6. We may write P(C) = 1/6.
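The probabilities in this example follow from counting outcomes, under the assumption (implicit above) that the die is fair, so that all six outcomes are equally likely. Here is a minimal Python sketch of that calculation; the helper `prob` is an illustrative name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}   # possible outcomes of one die roll

def prob(event):
    # Assumes a fair die: probability = (outcomes in the event) / (all outcomes).
    return Fraction(len(event & outcomes), len(outcomes))

A = {1, 3, 5}   # the roll is odd
B = {2, 4, 6}   # the roll is even
C = {1}         # the roll is 1

print(prob(A), prob(B), prob(C))
```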
Exercise 58. (Answer on p. 347.) For each of the following experiments, list the possible
outcomes. State the probability of the given event.
(a) You pick, at random, a card from a standard 52-card deck. The event A is the event
that we get a spade.
(b) You flip two fair coins. The event B is the event that both coin-flips are the same.
(c) You roll two fair dice. The event C is the event that the dice sum to 9.
[13] See my H2 Mathematics Textbook for a thorough, rigorous, and formal discussion.

42.3 Mutually Exclusive Events
To say that two events A and B are mutually exclusive (or disjoint) is to say, informally,
that:
If A occurs, this means that B cannot possibly have occurred.

And if B occurs, this means that A cannot possibly have occurred.
Example 145. Consider the events A = {1, 3, 5}, B = {2, 4, 6}, and C = {1} in the die-roll
experiment.
• The events A and B are mutually exclusive.

• The events B and C are mutually exclusive.
• But the events A and C are not mutually exclusive.
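Since events are sets of outcomes, two events are mutually exclusive exactly when the sets share no outcome. The following Python sketch checks the three pairs from the die-roll example; the function name `mutually_exclusive` is illustrative.

```python
A = {1, 3, 5}
B = {2, 4, 6}
C = {1}

def mutually_exclusive(event1, event2):
    # Two events are mutually exclusive when they have no outcome in common.
    return event1.isdisjoint(event2)

print(mutually_exclusive(A, B))
print(mutually_exclusive(B, C))
print(mutually_exclusive(A, C))
```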
Example 146. We randomly pick a student from the student population. D is the event
that the student is taller than 1.8 m; E is the event that the student is shorter than 1.6 m;
and F is the event that the student is male.
• The events D and E are mutually exclusive.

• But the events E and F are not mutually exclusive.
• Nor are the events D and F .
Example 147. We randomly pick a car in the carpark. G is the event that the car is blue.
H is the event that the car is a Mercedes-Benz. I is the event that the car has only two
seats.
Of the three events given, no two are mutually exclusive.
Exercise 59. We randomly pick a student from the student population. A is the event
that this student has an iPhone. B is the event that this student has exactly one phone. C
is the event that this student has at least two phones. (i) Are A and B mutually exclusive?
(ii) A and C? (iii) B and C? (Answer on p. 347.)

42.4 Complementary Events
Let A be an event. Its complement, the event A′ (also denoted $A^c$), is the set of all
outcomes other than those in A.
Example 148. Consider the events A = {1, 2}, B = {2, 3, 5}, and C = {1} in the die-roll
experiment.
Their complements are A′ = {3, 4, 5, 6}, B ′ = {1, 4, 6}, and C ′ = {2, 3, 4, 5, 6}.
Example 149. We randomly pick a student from the student population. D is the event
that the student is taller than 1.8 m.
Its complement is D′ , the event that the student is 1.8 m or shorter.
Example 150. We randomly pick a car in the carpark. G is the event that the car is blue.
Its complement is G′ , the event that the car is not blue.
Exercise 60. We randomly pick a student from the student population. A is the event
that this student has exactly one phone. B is the event that this student has two phones.
What are the complements A′ and B′? (Answer on p. 347.)

42.5 The Union of Two Events
Example 151. Flip three fair coins. The possible outcomes are
HHH, HHT, HTH, HTT, THH, THT, TTH, TTT.
Let A be the event that there is at least 1 tail, B be the event that there are at least 2
heads, and C be the event that there are at least 3 tails. That is,
A = {HHT, HTH, HTT, THH, THT, TTH, TTT},
B = {HHH, HHT, HTH, THH},
C = {TTT}.
A ∪ B is the event that there is at least 1 tail OR there are at least 2 heads. A ∪ C is the
event that there is at least 1 tail. B ∪ C is the event that there are at least 3 tails OR there
are at least 2 heads.
A ∪ B = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT},
A ∪ C = A = {HHT, HTH, HTT, THH, THT, TTH, TTT},
B ∪ C = {HHH, HHT, HTH, THH, TTT}.
Exercise 61. Roll two dice. Let A be the event that the sum of the rolls is even; B be the
event that it is 11 or 12; and C be the event that it is odd. Write down the probabilities
of the events A, B, C, A ∪ B, A ∪ C, and B ∪ C. (Answer on p. 348.)

42.6 The Intersection of Two Events
As before, let A be the event that there is at least 1 tail, B be the event that there are at
least 2 heads, and C be the event that there are at least 3 tails.
A ∩ B is the event that there is at least 1 tail AND there are at least 2 heads. A ∩ C is the
event that there are at least 3 tails. B ∩ C is the event that there are at least 3 tails AND
there are at least 2 heads.
A ∩ B = {HHT, HTH, THH},
A ∩ C = C = {TTT},
B ∩ C = {}.
Note that B ∩ C is the empty event. That is, it is the event that contains no outcomes.
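The unions and intersections above can be checked by listing the outcomes mechanically. The following is a minimal sketch, assuming we represent each outcome as a three-letter string and each event as a Python set; the variable names are our own choices, not notation from the text.

```python
from itertools import product

# Sample space for three coin flips: 'HHH', 'HHT', ..., 'TTT'.
outcomes = {"".join(flips) for flips in product("HT", repeat=3)}

A = {s for s in outcomes if s.count("T") >= 1}  # at least 1 tail
B = {s for s in outcomes if s.count("H") >= 2}  # at least 2 heads
C = {s for s in outcomes if s.count("T") >= 3}  # at least 3 tails

print(sorted(A | B))  # union: every outcome in A OR in B
print(sorted(A & B))  # intersection: every outcome in A AND in B
print(sorted(B & C))  # the empty event
```

Here `|` is set union and `&` is set intersection, mirroring ∪ and ∩.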
Exercise 62. Roll two dice. As before, let A be the event that the sum of the rolls is even;
B be the event that it is 11 or 12; and C be the event that it is odd. Write down the
probabilities of the events A ∩ B, A ∩ C, and B ∩ C. (Answer on p. 348.)

42.7 Properties of Probabilities
Let A and B be events. Probabilities must satisfy the following properties.
1. Non-negativity: P(A) ≥ 0.
2. Normalisation: P(S) = 1, where S is the set of all possible outcomes.
3. Sum of two mutually exclusive events: If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
4. Complements: P(A) = 1 − P(A^c).
5. Monotonicity: If every outcome in B is also in A, then P(B) ≤ P(A).
6. Probabilities are at most 1: P(A) ≤ 1.
7. Inclusion-Exclusion: P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
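As a quick sanity check, the Complements and Inclusion-Exclusion properties can be verified numerically on the three-coin example. This is only a sketch under the assumption that all 8 outcomes are equally likely; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = {"".join(f) for f in product("HT", repeat=3)}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

A = {s for s in outcomes if s.count("T") >= 1}  # at least 1 tail
B = {s for s in outcomes if s.count("H") >= 2}  # at least 2 heads

# Inclusion-Exclusion: P(A or B) = P(A) + P(B) - P(A and B).
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)

# Complements: P(A) = 1 - P(A^c).
print(prob(A), 1 - prob(outcomes - A))
```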
Venn diagrams are helpful for illustrating probabilities. Those below help to illustrate four
of the above properties.

Exercise 63. Illustrate each of the following two properties with a Venn diagram: (a) “If
two events A and B are mutually exclusive, then P(A ∩ B) = 0.” (b) “Let A, B, and C be
events. Then P(A ∪ B ∪ C) = P(A) + P(A^c ∩ B) + P(A^c ∩ B^c ∩ C).” (Answer on p. 348.)

43 Probability: Conditional Probability
Let A be the event that there is at least 1 tail and B be the event that there are at least 2
heads. That is,
A = {HHT, HTH, HTT, THH, THT, TTH, TTT},
B = {HHH, HHT, HTH, THH}.
Question: You are told that A occurred; what then is the probability that B also occurred?
The desired probability is called the conditional probability of B given A and is denoted P(B∣A). It is given by
\[ P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{3/8}{7/8} = \frac{3}{7}. \]
Explanation: There is probability 7/8 that A occurred. There is probability 3/8 that both
A and B occurred. Thus, given that A occurred, the probability that B also occurred is
3/7.
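The same number can be obtained by counting outcomes. Below is a minimal sketch, assuming equally likely outcomes; `cond_prob` is a hypothetical helper name, not something defined in the text.

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(f) for f in product("HT", repeat=3)]
A = {s for s in outcomes if s.count("T") >= 1}  # at least 1 tail
B = {s for s in outcomes if s.count("H") >= 2}  # at least 2 heads

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

def cond_prob(target, given):
    """P(target | given) = P(target and given) / P(given)."""
    return prob(target & given) / prob(given)

print(cond_prob(B, A))  # 3/7
```

Using exact fractions rather than decimals keeps the answer in the same form as the hand computation.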

Example 154. A and B are events, with P(A) = 0.5, P(B) = 0.6, and P(A ∩ B) = 0.2.
Hence, given that B has occurred, the probability that A has also occurred is simply
0.2/0.6 = 1/3. (The information that P(A) = 0.5 is irrelevant.) Formally:
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.6} = \frac{1}{3}. \]
Exercise 64. Roll two dice. Given that the sum of the two dice rolls is 8, what is the
probability that we rolled at least one even number? (Answer on p. 349.)

44 Probability: Independence
Informally, two events A and B are independent if the probability that both occur is
simply the product of the probabilities that each occurs. Independence is thus analogous
to the MP from counting. Formally:
Definition 4. Two events A, B ∈ Σ are independent if
P(A ∩ B) = P(A)P(B).
There is a second, equivalent perspective of independence. Informally, two events A and B

are independent if the probability that A occurs is independent of whether B has occurred.
Formally:
Fact 7. Suppose P(B) ≠ 0. Then A, B are independent events ⇐⇒ P(A∣B) = P(A).
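Proof. By definition of conditional probabilities, (1) P(A∣B) = P(A ∩ B)/P(B). By definition of independence, (2) P(A ∩ B) = P(A)P(B). Plugging (2) into (1), we have P(A∣B) = P(A)P(B)/P(B) = P(A), as desired. Conversely, if P(A∣B) = P(A), then multiplying (1) through by P(B) gives P(A ∩ B) = P(A)P(B), so A and B are independent.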

Example 155. Flip two fair coins. Let H1 be the event that the first coin flip is Heads —
that is, H1 = {HH, HT}. Analogously define T1, H2, and T2.
The intuitive idea of independence is easy to grasp. If we say that the two coin flips are
independent, what we mean is that the following four conditions are true:
1. H1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is heads.)
2. H1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is heads.)
3. T1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is tails.)
4. T1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is tails.)
Formally:
1. P(H1 ∩ H2) = P({HH}) = P(H1)P(H2) = P({HH, HT}) ⋅ P({HH, TH}) = 0.5 × 0.5 = 0.25.
2. P(H1 ∩ T2) = P({HT}) = P(H1)P(T2) = P({HH, HT}) ⋅ P({HT, TT}) = 0.5 × 0.5 = 0.25.
3. P(T1 ∩ H2) = P({TH}) = P(T1)P(H2) = P({TH, TT}) ⋅ P({HH, TH}) = 0.5 × 0.5 = 0.25.
4. P(T1 ∩ T2) = P({TT}) = P(T1)P(T2) = P({TH, TT}) ⋅ P({HT, TT}) = 0.5 × 0.5 = 0.25.
Example 156. Flip a fair coin and roll a fair die. Consider the event “Heads”
E1 = {H1, H2, H3, H4, H5, H6}, and the event “Roll an odd number” E2 = {H1, H3, H5, T1, T3, T5}. These two events E1 and E2 are independent, as we now verify:
\[ P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)} = \frac{3/12}{6/12} = \frac{1}{2} = P(E_1). \]
More broadly, we can even say that the coin flip and die roll are independent. Informally,
this means that the outcome of the coin flip has no influence on the outcome of the die roll,
and vice versa.
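Both perspectives on independence (Definition 4 and Fact 7) can be checked for the coin-and-die example by enumeration. This is a sketch assuming the 12 outcomes are equally likely and are labelled as in the example ('H1', ..., 'T6').

```python
from fractions import Fraction
from itertools import product

# The 12 outcomes 'H1', ..., 'H6', 'T1', ..., 'T6'.
outcomes = {coin + str(face) for coin, face in product("HT", range(1, 7))}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

E1 = {s for s in outcomes if s[0] == "H"}          # heads
E2 = {s for s in outcomes if int(s[1]) % 2 == 1}   # odd die roll

# Definition 4: P(E1 and E2) = P(E1) P(E2).
print(prob(E1 & E2) == prob(E1) * prob(E2))
# Fact 7: P(E1 | E2) = P(E1).
print(prob(E1 & E2) / prob(E2) == prob(E1))
```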
The idea of independence is a little tricky to illustrate on a Venn diagram. I’ll try anyway.

Example 157. The Venn diagram below illustrates a sample space with 100 equally likely
outcomes (represented by 100 small squares). The event A is highlighted in red. The event
B is highlighted in blue.
P(A) = 0.2 (A is made of 20 small squares). P(B) = 0.1 (B is made of 10 small squares).
The event A ∩ B, coloured in green, is made of 2 small squares, so P(A ∩ B) = 0.02.
We compute
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.02}{0.1} = 0.2. \]
We observe that P(A) = 0.2 = P(A∣B). And so by Fact 7, we conclude that the events A
and B are independent.

Exercise 65. Symmetry of Independence. In Fact 7, we showed that “A, B inde-
pendent ⇐⇒ P(A∣B) = P(A)”. Now prove that “A, B are independent events ⇐⇒
P(B∣A) = P(B).” (Answer on p. 349.)
Exercise 66. (Answer on p. 349.) An example of a transitive relation is equality: If

A = B and B = C, then A = C. Another example is ≤: If A ≤ B and B ≤ C, then A ≤ C.
In contrast, independence is not transitive, as this exercise will demonstrate. That is,
even if A and B are independent, and B and C are independent, it may not be that A and
C are also independent.
Flip two fair coins. Let H1 be the event that the first coin flip is heads, H2 be the event
that the second is heads, and T1 be the event that the first flip is tails. Show that
(a) H1 and H2 are independent.

(b) H2 and T1 are independent.
(c) H1 and T1 are not independent.

45 Probability: Not Everything is Independent
The idea of independence is intuitively easy to grasp. Indeed, so much so that students
often assume that “everything is independent”. This is a mistake. Unless you’re explicitly
told, NEVER assume that two events are independent.
Here are two examples where the assumption of independence is plausible:
Example 158. The event “coin-flip #1 is heads” and the event “coin-flip #2 is heads” are
probably independent.
Example 159. The event “die-roll #1 is 3” and the event “die-roll #2 is 6” are probably
independent.
Here are two examples where the assumption of independence is not plausible:
Example 160. The event “Google’s share price rises today” is probably not independent
of the event “Apple’s share price rises today”.
Example 161. The event “it rains in Singapore today” is probably not independent of the
event “it rains in Kuala Lumpur today”.
Nonetheless, the assumption of independence is frequently — and incorrectly — made even

when it is implausible. One reason is that the maths is easy if we assume independence —
we can simply multiply probabilities together.

Exercise 67. (Answer on p. 349.) Say the probability that a randomly-chosen person is
or was an NBA player is one in a million. (This is probably about right, since there’ve only
ever been 4,000 or so NBA players, since the late 1940s.)
The Barry family had four players in the NBA — the father Rick Barry and three of his
four sons Jon, Brent, and Drew. (The oldest son Scooter didn’t make the NBA but was
still good enough to play professionally in other basketball leagues around the world.)
A journalist concludes that the probability of a Barry family ever occurring is
\[ \left( \frac{1}{1{,}000{,}000} \right)^4 = \frac{1}{1{,}000{,}000{,}000{,}000{,}000{,}000{,}000{,}000}. \]
This is equal to the probability of buying a 4D number on six consecutive weeks, and
winning first prize every time. Is the journalist correct?

46 Random Variables: Introduction
Informally, a random variable assigns a numerical code to each possible outcome. A bit
more formally, it is a function that maps each outcome to a real number.
Example 162. Flip a fair coin. Let X be the random variable that indicates whether
the coin-flip is heads. So X(H) = 1 and X(T ) = 0.
(A bit more formally, we say that X is the function that maps the outcome H to the number
1 and the outcome T to the number 0.)
We refer to 1 and 0 as the possible observed values of the random variable X. These
correspond to the two possible outcomes of the coin-flip experiment.
Example 163. Flip three fair coins.

Let Y be the random variable that counts the number of heads. So Y(TTT) = 0, Y(HTT) = Y(THT) = Y(TTH) = 1, Y(HHT) = Y(HTH) = Y(THH) = 2, and Y(HHH) = 3.
We refer to 0, 1, 2, and 3 as the possible observed values of the random variable Y.
Let A be the random variable that indicates whether there are at least 2 heads. So A(HHH) = A(HHT) = A(HTH) = A(THH) = 1 and A(TTT) = A(TTH) = A(THT) = A(HTT) = 0.
We refer to 1 and 0 as the possible observed values of the random variable A.
Example 164. Draw a card from a standard 52-card deck. In bridge, an ace is worth 4
high card points, a king 3, a queen 2, and a jack 1. Any other card is worth 0 points.
So we might let B be the corresponding random variable, where for example B(A♥) = 4, B(J♣) = 1, and B(7♠) = 0.
Exercise 68. Let X be the random variable that is the sum of two fair die-rolls. What are
the possible observed values of X? (Answer on p. 349.)
Exercise 69. Let C be the random variable that counts the total number of high card
points, in any two randomly-chosen cards from a standard 52-card deck. What are the
possible observed values of C? (Answer on p. 350.)

47 Random Variables: Probability Distribution
The notation X = k is shorthand for the event that contains all the outcomes s such
that X(s) = k.
The notations “X ≥ k”, “X > k”, “X ≤ k”, “X < k”, “a ≤ X ≤ b”, etc. are similarly defined.
Example 162 (continued from above). Recall the fair coin-flip. Let A be the event
that the coin-flip is heads and B be the event that the coin-flip is tails. So P(A) = 0.5 and
P(B) = 0.5.
Let X be the random variable that indicates whether the coin-flip is heads. That is,
X(H) = 1 and X(T ) = 0.
By our newly-introduced notation, we can also write P(X = 1) = 0.5 and P(X = 0) = 0.5.
We also have P(X ≤ 1) = P(X = 0) + P(X = 1) = 1.
Example 163 (continued from above). Recall the three fair coin-flips. Let C, D, E,
and F be the events that there are 0, 1, 2, and 3 heads. So P(C) = 1/8, P(D) = 3/8,
P(E) = 3/8, and P(F ) = 1/8.
Let Y be the random variable that counts the number of heads. By our newly-introduced
notation, we can also write P(Y = 0) = 1/8, P(Y = 1) = 3/8, P(Y = 2) = 3/8, and P(Y = 3) =
1/8.
We also have P(Y ≤ 2) = P(Y = 0) + P(Y = 1) + P(Y = 2) = 7/8.
Example 164 (continued from above). Recall the high card point count in bridge.
Randomly choose a card from a standard 52-card deck. Let G be its high card point
count. By our newly-introduced notation, we can write P(G = 0) = 9/13, P(G = 1) = 1/13,
P(G = 2) = 1/13, P(G = 3) = 1/13, and P(G = 4) = 1/13.
We also have P(G > 2) = P(G = 3) + P(G = 4) = 2/13.

The probability distribution (or probability law or probability mass function) of a
random variable X is a complete specification of P (X = k), for all possible observed values
k (of the random variable X) . In the above examples, we gave the probability distributions
of several random variables.
More examples of random variables and their probability distributions:
Example 165. Flip two fair coins. The four possible outcomes are HH, HT, TH, and TT.
Let X indicate whether the two coin flips are the same and Y count the number of heads.
That is,
X(HH) = 1, X(HT) = 0, X(TH) = 0, X(TT) = 1,
Y(HH) = 2, Y(HT) = 1, Y(TH) = 1, Y(TT) = 0.
And so the probability distribution of X is
P(X = 0) = 0.5, P(X = 1) = 0.5.
And the probability distribution of Y is
P(Y = 0) = 0.25, P(Y = 1) = 0.5, P(Y = 2) = 0.25.
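A probability distribution of this kind can be tabulated by applying the random variable to every outcome and counting. The sketch below assumes equally likely outcomes; the function `distribution` is a hypothetical helper of our own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = ["".join(f) for f in product("HT", repeat=2)]  # HH, HT, TH, TT

def X(s):
    """Indicates whether the two coin flips are the same."""
    return 1 if s[0] == s[1] else 0

def Y(s):
    """Counts the number of heads."""
    return s.count("H")

def distribution(rv):
    """Map each possible observed value k to P(rv = k)."""
    counts = Counter(rv(s) for s in outcomes)
    return {k: Fraction(n, len(outcomes)) for k, n in sorted(counts.items())}

print(distribution(X))
print(distribution(Y))
```

A random variable is modelled here as an ordinary function from outcomes to numbers, which matches the informal definition given in the previous chapter.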
Another example:

Example 166. Pick a random card from the standard 52-card deck. The 52 possible
outcomes are
A♠, K♠, . . . , 2♠, A♥, K♥, . . . , 2♥, A♦, K♦, . . . , 2♦, A♣, K♣, . . . , 2♣.
Let Y indicate whether the picked card is a spade (♠). That is,
Y(Any ♠) = 1, Y(Any other card) = 0.
So the probability distribution of Y is:
P(Y = 0) = 39/52, P(Y = 1) = 13/52.

Example 167. Roll two fair dice. The 36 possible outcomes are
(1, 1), (1, 2), . . . , (1, 6), (2, 1), . . . , (6, 6),
where (a, b) denotes the outcome in which the first die shows a and the second die shows b.
Let X be the sum of the two dice. And so for example, an outcome such as (3, 4) is mapped to 7 and an outcome such as (1, 4) is mapped to 5.
The table below says that P(X = 2) = 1/36, because there is only one way the event X = 2
can occur. And P(X = 3) = 2/36, because there are two ways the event X = 3 can occur.
You are asked to complete the table in the next exercise.

k     s such that X(s) = k     P(X = k)
2     (1, 1)                   1/36
3     (1, 2), (2, 1)           2/36
4
5
6
7
8
9
10
11
12
Exercise 70. (Continuation of the above example.) (Answer on p. 350.) (a) Complete
the above table.
Consider the event E, described in words as “the sum of the two dice is at least 10”.
(b) Write down the event E in terms of X.

(c) Calculate P(E).

48 Random Variables: Independence
Informally, two random variables are independent if knowing the value of one does not
tell us anything about the value of the other.
Example 168. Flip a fair coin twice. The four possible outcomes are HH, HT, T H, T T .
When we say that “the two coin-flips are independent”, what exactly do we mean by this?
Let’s rephrase this statement slightly more formally.
Let A indicate whether the first coin-flip was heads and B indicate whether the second was
heads. That is,
A(HH) = 1, A(HT ) = 1, A(T H) = 0, A(T T ) = 0,
B(HH) = 1, B(HT ) = 0, B(T H) = 1, B(T T ) = 0.
The following two statements are equivalent:

1. “The two coin-flips are independent.”
2. “The random variables A and B are independent.”
Informally, the second statement says that knowing the value of A (whether the first coin-
flip was heads or not) tells us absolutely nothing about the value of B (whether the second
coin-flip was heads or not).
For example, if we know that A = 1, then P(B = 0) = 0.5 and P(B = 1) = 0.5. And if we
know instead that A = 0, then P(B = 0) = 0.5 and P(B = 1) = 0.5. Thus, knowing whether
A = 1 or A = 0 makes absolutely no difference to what we can say about B.
Formally:
Definition 5. Given random variables X and Y , we say that X and Y are independent if
for all x, y,
P (X = x, Y = y) = P(X = x)P(Y = y).

Example 168 (continued from above). It may be “obvious”, even without proof, that
“the two coin-flips are independent”. But as an exercise, let’s formally prove that this is so,
using the above formal definition.
A and B remain the random variables indicating whether the first and second coin-flips are
heads (respectively).
We now verify that indeed, P (A = a, B = b) = P(A = a)P(B = b) for all possible values of a
and b:
P(A = a, B = b)            P(A = a)P(B = b)
P(A = 0, B = 0) = 0.25     P(A = 0)P(B = 0) = 0.5 × 0.5   ✓
P(A = 1, B = 0) = 0.25     P(A = 1)P(B = 0) = 0.5 × 0.5   ✓
P(A = 0, B = 1) = 0.25     P(A = 0)P(B = 1) = 0.5 × 0.5   ✓
P(A = 1, B = 1) = 0.25     P(A = 1)P(B = 1) = 0.5 × 0.5   ✓
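The row-by-row check in the table can be automated: loop over every pair of possible observed values and compare the joint probability with the product. This is a sketch assuming equally likely outcomes; `independent` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

outcomes = ["".join(f) for f in product("HT", repeat=2)]  # HH, HT, TH, TT

def prob(predicate):
    """Probability that predicate(s) holds, with equally likely outcomes."""
    return Fraction(sum(1 for s in outcomes if predicate(s)), len(outcomes))

def A(s):
    """Indicates whether the first coin-flip was heads."""
    return 1 if s[0] == "H" else 0

def B(s):
    """Indicates whether the second coin-flip was heads."""
    return 1 if s[1] == "H" else 0

def independent(X, Y):
    """Check P(X = x, Y = y) = P(X = x) P(Y = y) for all observed x, y."""
    xs = {X(s) for s in outcomes}
    ys = {Y(s) for s in outcomes}
    return all(
        prob(lambda s: X(s) == x and Y(s) == y)
        == prob(lambda s: X(s) == x) * prob(lambda s: Y(s) == y)
        for x in xs
        for y in ys
    )

print(independent(A, B))  # True
```

As a contrast, `independent(A, A)` returns False, since a random variable always tells us everything about itself.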
The above method for proving that two random variables are independent becomes espe-
cially useful, when it is not immediately “obvious” that they are independent:
Exercise 71. Flip two fair coins. Let X indicate whether the two coin flips were the same
and Y count the number of heads. Are X and Y independent random variables? (Answer
on p. 350.)
Earlier we warned against blithely assuming that any two events are independent. Here we
can repeat this warning: Unless explicitly told (or you have a good reason), do not assume
that two random variables are independent.
The assumption of independence is a strong one. There are many scenarios where it is
plausible. For example, the flips of two coins are probably independent. The rolls of two
dice are probably independent.
There are, however, also many scenarios where it is not plausible. Today’s changes in
the share prices of Google and Apple are probably not independent. Today’s rainfall in
Singapore and in Kuala Lumpur are probably not independent.
Nonetheless, the assumption of independence is frequently — and incorrectly — made even

when it is implausible. The reason is that the maths is easy if we assume independence —
we can simply multiply probabilities together. Unfortunately, incorrectly assuming inde-
pendence can sometimes have tragic consequences.

49 Random Variables: Expectation
Example 169. Let X be the outcome of a fair die roll.

Informally, the expected value (or the mean) of X is the average expected outcome of
a fair die roll.
Note that X takes on a value 1 with probability 1/6. Similarly, it takes on a value 2 with
probability 1/6. Etc. Hence, the expected value of X, denoted E [X] is given by:
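\[ E[X] = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5. \]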
On average, we expect the outcome of a fair die roll to be 3.5.
A bit more formally, the expected value or mean of a random variable X — denoted E[X]
— is simply a weighted average of the possible observed values of X, where the weights are
simply given by the probability that the random variable takes on each possible observed
value.
Given a random variable X, its mean is usually denoted µX . If it’s obvious from the context
that we’re talking about the random variable X, we drop the subscript X and simply use
µ to denote the mean of X.
Example 170. Let Y be the sum of two fair die-rolls.

In Exercise 70, we worked out that P (Y = 2) = 1/36, P (Y = 3) = 2/36, etc. Thus:
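\[
\begin{aligned}
\mu_Y &= P(Y = 2) \cdot 2 + P(Y = 3) \cdot 3 + P(Y = 4) \cdot 4 + P(Y = 5) \cdot 5 + \dots + P(Y = 12) \cdot 12 \\
&= \frac{1}{36} \cdot 2 + \frac{2}{36} \cdot 3 + \frac{3}{36} \cdot 4 + \frac{4}{36} \cdot 5 + \frac{5}{36} \cdot 6 + \frac{6}{36} \cdot 7 + \frac{5}{36} \cdot 8 + \frac{4}{36} \cdot 9 + \frac{3}{36} \cdot 10 + \frac{2}{36} \cdot 11 + \frac{1}{36} \cdot 12 \\
&= \frac{2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12}{36} = \frac{252}{36} = 7.
\end{aligned}
\]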
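The weighted averages in the last two examples can be reproduced in a few lines. This is a minimal sketch in which a probability distribution is stored as a dictionary mapping each possible observed value to its probability (our own representation, chosen for illustration).

```python
from fractions import Fraction
from itertools import product

def expectation(dist):
    """E[X] = sum of k * P(X = k) over the possible observed values k."""
    return sum(k * p for k, p in dist.items())

# One fair die: each face has probability 1/6.
die = {k: Fraction(1, 6) for k in range(1, 7)}

# Sum of two fair dice: add 1/36 for each of the 36 outcomes.
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

print(expectation(die))       # 7/2
print(expectation(two_dice))  # 7
```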

Example 171. Flip two fair coins and roll two fair dice. Let X be the number of heads
and Y be the number of sixes.
Problem: What is E[X + Y ]?
As it turns out, it is generally true that E[X + Y] = E[X] + E[Y] (as we’ll see in the next
section). So if we knew this, then the problem is very easy: E[X + Y] = E[X] + E[Y] = 1 + 1/3 = 4/3.
But as an exercise, let’s pretend we don’t know that E[X + Y ] = E[X] + E[Y ]. We thus
have to work out E[X + Y ] the hard way:
First, note that the possible observed values of X + Y are 0, 1, 2, 3, 4. P(X + Y = 0) is the
probability of 0 heads and 0 sixes. And P(X + Y = 1) is the probability of 1 head and 0
sixes OR 0 heads and 1 six. We can compute:
\[ P(X + Y = 0) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} = \frac{25}{144}, \]
\[ P(X + Y = 1) = \binom{2}{1} \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} + \frac{1}{2} \cdot \frac{1}{2} \cdot \binom{2}{1} \frac{5}{6} \cdot \frac{1}{6} = \frac{50}{144} + \frac{10}{144} = \frac{60}{144}. \]
You are asked to complete the rest of this problem in the exercise below.

Exercise 72. Complete the above example by following these steps: (a) Compute
P (X + Y = 2). (b) Compute P (X + Y = 3). (c) Compute P (X + Y = 4). (d) Now com-
pute E[X + Y ]. (Answer on p. 351.)
Exercise 73. In the game of 4D, you pay $1 to pick any four-digit number between 0000
and 9999 (there are thus 10,000 possible choices). There are two variants of the 4D game
— “big” and “small”. The prize structures are as given below. Let X be the prize received
from a $1 stake in the “big” game and Y be the prize received from a $1 stake in the “small”
game. (Answer on p. 352.)
(a) Write down the possible observed values of X and Y .
(b) Write down the probability distributions of X and Y .
(c) Hence find E[X] and E[Y ].
(d) Which game — “big” or “small” — is expected to lose you less money?
(Source: Singapore Pools, “Rules for the 4-D Game”, Version 1.11, 17/11/15, PDF.)

49.1 The Expectation Operator is Linear
Example 172. The differentiation operator d/dx is an example of a linear transformation, because it satisfies the following two conditions:
\[ \frac{d}{dx} \left( f(x) + g(x) \right) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x), \quad \text{and} \quad \frac{d}{dx} \left( k f(x) \right) = k \frac{d}{dx} f(x). \]
A common mistake made by students is to believe that “everything is linear”.

Here are two examples of operators that are not linear transformations.
Example 173. The square-root operator √⋅ is not a linear transformation, because in
general, we do not have
\[ \sqrt{x + y} = \sqrt{x} + \sqrt{y}, \quad \text{or} \quad \sqrt{kx} = k \sqrt{x}. \]
Example 174. The square operator (⋅)² is not a linear transformation, because in general,
we do not have
\[ (x + y)^2 = x^2 + y^2, \quad \text{or} \quad (kx)^2 = k x^2. \]

It turns out that the expectation operator E is a linear transformation. That is, if X and
Y are random variables and c is a constant, then
E[X + Y] = E[X] + E[Y],
and E[cX] = cE[X].
The expectation operator is linear. This is true even if independence is not satisfied,
which makes it an especially powerful property. Example:
Example 175. I stake $100 on each of two different 4D numbers for Saturday’s drawing
(“big” game). (So that’s $200 total.)
Let X and Y be my winnings (excluding my original stake) from the first and second
numbers (respectively). Now, X and Y are certainly not independent because for example,
if my first number wins first prize, then my second number cannot possibly also win first
prize.
Nonetheless, despite X and Y not being independent, the linearity of the expectation
operator tells us that
E [X + Y ] = E [X] + E [Y ] = $65.90 + $65.90 = $131.80.
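To see linearity at work without independence in a setting small enough to enumerate, here is a sketch using a hypothetical pair of random variables that is not taken from the text: roll one fair die, let X be the face shown and let Y = 7 − X (the face on the opposite side). Knowing X pins down Y exactly, so the two are as dependent as can be.

```python
from fractions import Fraction

# Hypothetical illustration (not from the text): one fair die,
# X = face shown, Y = 7 - X = face on the opposite side.
faces = range(1, 7)
p = Fraction(1, 6)  # each face is equally likely

E_X = sum(p * x for x in faces)
E_Y = sum(p * (7 - x) for x in faces)
E_sum = sum(p * (x + (7 - x)) for x in faces)  # E[X + Y], computed directly

print(E_X, E_Y, E_sum)  # 7/2 7/2 7
```

The direct computation of E[X + Y] agrees with E[X] + E[Y], even though X and Y are dependent.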

50 Random Variables: Variance
Example 176. Consider a random variable X that is equally likely to take on one of 5
possible values: 0, 1, 2, 3, 4. Its mean is
\[ \mu_X = \sum_k P(X = k) \cdot k = \frac{1}{5} \cdot 0 + \frac{1}{5} \cdot 1 + \frac{1}{5} \cdot 2 + \frac{1}{5} \cdot 3 + \frac{1}{5} \cdot 4 = 2. \]
Now consider another random variable Y that is equally likely to take on one of 5 possible
values: −8, −3, 2, 7, 12. Coincidentally, its mean is the same:
\[ \mu_Y = \sum_k P(Y = k) \cdot k = \frac{1}{5} \cdot (-8) + \frac{1}{5} \cdot (-3) + \frac{1}{5} \cdot 2 + \frac{1}{5} \cdot 7 + \frac{1}{5} \cdot 12 = 2. \]
The random variables X and Y share the same mean. However, there is an obvious differ-
ence: Y is “more spread out”.
What, precisely, do we mean when we say that one random variable is “more spread out”
than another?
Our goal in this section is to invent a measure of “spread-outness”. We’ll call this the
variance and denote the variance of any random variable X by V [X].
It’s not at all obvious how the variance should be defined. One possibility is to define the
variance as the weighted average of the deviations from the mean.

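Example 176 (continued from above). (Our first proposed definition of variance.)
For X, the weighted average of the deviations from the mean is
\[
\begin{aligned}
V[X] &= \sum_k P(X = k) \cdot (k - \mu) = \frac{0 - \mu}{5} + \frac{1 - \mu}{5} + \frac{2 - \mu}{5} + \frac{3 - \mu}{5} + \frac{4 - \mu}{5} \\
&= \frac{0 - 2}{5} + \frac{1 - 2}{5} + \frac{2 - 2}{5} + \frac{3 - 2}{5} + \frac{4 - 2}{5} = -\frac{2}{5} - \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = 0.
\end{aligned}
\]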
Hmm. This works out to be 0. Is that just a weird coincidence? Let’s try the same for Y :
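\[
\begin{aligned}
V[Y] &= \sum_k P(Y = k) \cdot (k - \mu) = \frac{-8 - \mu}{5} + \frac{-3 - \mu}{5} + \frac{2 - \mu}{5} + \frac{7 - \mu}{5} + \frac{12 - \mu}{5} \\
&= \frac{-8 - 2}{5} + \frac{-3 - 2}{5} + \frac{2 - 2}{5} + \frac{7 - 2}{5} + \frac{12 - 2}{5} = -2 - 1 + 0 + 1 + 2 = 0.
\end{aligned}
\]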
Hmm. Again it works out to be 0.
This is no mere coincidence. It turns out that ∑_k P(X = k) ⋅ (k − µ) is always equal to 0.
This is because
\[
\begin{aligned}
\sum_k P(X = k) \cdot (k - \mu) &= \underbrace{\sum_k P(X = k) \cdot k}_{= \mu} - \sum_k P(X = k) \cdot \mu \\
&= \mu - \mu \underbrace{\sum_k P(X = k)}_{= 1} = 0.
\end{aligned}
\]
So our first proposed definition of the variance — the weighted average of the deviations
from the mean — is always equal to 0. Intuitively, the reason is that the negative deviations
(corresponding to those values below the mean) exactly cancel out the positive deviations
(corresponding to those values above the mean).
This proposed definition is thus quite useless. We cannot use it to say things like Y is
“more spread out” than X.
This suggests a second approach: define the variance to be the weighted average of the
absolute deviations from the mean.

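Example 176 (continued from above). (Our second proposed definition of variance.)
For X, the weighted average of the absolute deviations from the mean is
\[
\begin{aligned}
V[X] &= \sum_k P(X = k) \cdot |k - \mu| = \frac{|0 - \mu|}{5} + \frac{|1 - \mu|}{5} + \frac{|2 - \mu|}{5} + \frac{|3 - \mu|}{5} + \frac{|4 - \mu|}{5} \\
&= \frac{|0 - 2|}{5} + \frac{|1 - 2|}{5} + \frac{|2 - 2|}{5} + \frac{|3 - 2|}{5} + \frac{|4 - 2|}{5} = \frac{2}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = \frac{6}{5}.
\end{aligned}
\]
And now let’s work out the same for Y:
\[
\begin{aligned}
V[Y] &= \sum_k P(Y = k) \cdot |k - \mu| = \frac{|-8 - \mu|}{5} + \frac{|-3 - \mu|}{5} + \frac{|2 - \mu|}{5} + \frac{|7 - \mu|}{5} + \frac{|12 - \mu|}{5} \\
&= \frac{|-8 - 2|}{5} + \frac{|-3 - 2|}{5} + \frac{|2 - 2|}{5} + \frac{|7 - 2|}{5} + \frac{|12 - 2|}{5} = 2 + 1 + 0 + 1 + 2 = 6.
\end{aligned}
\]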
Wonderful! So we can now use this second proposed definition of the variance to say things
like “Y is more spread out than X”.
This second proposed definition seems perfectly satisfactory. Yet for some bizarre reason,
it will not be our actual definition of variance. Instead, the variance will be defined as the
weighted average of the squared deviations from the mean.

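Example 176 (continued from above). (The actual definition of variance.)
For X, the weighted average of the squared deviations from the mean is
\[
\begin{aligned}
V[X] &= \sum_k P(X = k) \cdot (k - \mu)^2 = \frac{(0 - \mu)^2}{5} + \frac{(1 - \mu)^2}{5} + \frac{(2 - \mu)^2}{5} + \frac{(3 - \mu)^2}{5} + \frac{(4 - \mu)^2}{5} \\
&= \frac{(0 - 2)^2}{5} + \frac{(1 - 2)^2}{5} + \frac{(2 - 2)^2}{5} + \frac{(3 - 2)^2}{5} + \frac{(4 - 2)^2}{5} = \frac{4}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{4}{5} = 2.
\end{aligned}
\]
And now let’s work out the same for Y:
\[
\begin{aligned}
V[Y] &= \sum_k P(Y = k) \cdot (k - \mu)^2 = \frac{(-8 - \mu)^2}{5} + \frac{(-3 - \mu)^2}{5} + \frac{(2 - \mu)^2}{5} + \frac{(7 - \mu)^2}{5} + \frac{(12 - \mu)^2}{5} \\
&= \frac{(-8 - 2)^2}{5} + \frac{(-3 - 2)^2}{5} + \frac{(2 - 2)^2}{5} + \frac{(7 - 2)^2}{5} + \frac{(12 - 2)^2}{5} = 20 + 5 + 0 + 5 + 20 = 50.
\end{aligned}
\]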
A bit more formally, if X is a random variable and µ is its expected value, then its variance
is defined to be the expected value of (X − µ)².
The variance of X is denoted V[X] or σ_X² or even more simply as σ² (if it is clear from the
context that we’re talking about the variance of X). So we may write
\[ V[X] = \sigma_X^2 = E\left[ (X - \mu)^2 \right]. \]
So to calculate the variance, we do this: Consider all the possible values that X can take.
Take the difference between these values and the mean of X. Square them. Then take the
probability-weighted average of these squared numbers.
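The three candidate measures of spread discussed above can be compared side by side. The following is a sketch for the two five-valued random variables of Example 176; the function names are our own, and each distribution is stored as a dictionary from value to probability.

```python
from fractions import Fraction

def mean(dist):
    """Probability-weighted average of the possible observed values."""
    return sum(p * k for k, p in dist.items())

def mean_deviation(dist):
    """First proposal: weighted average of deviations (always 0)."""
    mu = mean(dist)
    return sum(p * (k - mu) for k, p in dist.items())

def mean_abs_deviation(dist):
    """Second proposal: weighted average of absolute deviations."""
    mu = mean(dist)
    return sum(p * abs(k - mu) for k, p in dist.items())

def variance(dist):
    """Actual definition: weighted average of squared deviations."""
    mu = mean(dist)
    return sum(p * (k - mu) ** 2 for k, p in dist.items())

X = {k: Fraction(1, 5) for k in (0, 1, 2, 3, 4)}
Y = {k: Fraction(1, 5) for k in (-8, -3, 2, 7, 12)}

for name, dist in (("X", X), ("Y", Y)):
    print(name, mean_deviation(dist), mean_abs_deviation(dist), variance(dist))
```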
More examples:

Example 177. Let the random variable X be the outcome of the roll of a fair die. We
already know that µ = 3.5. Hence,
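\[
\begin{aligned}
V[X] &= E\left[ (X - \mu)^2 \right] = E\left[ (X - 3.5)^2 \right] \\
&= P(X = 1) \cdot (1 - 3.5)^2 + P(X = 2) \cdot (2 - 3.5)^2 + \dots + P(X = 6) \cdot (6 - 3.5)^2 \\
&= \frac{1}{6} \left( 2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2 \right) = \frac{35}{12} \approx 2.92.
\end{aligned}
\]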
So the variance of the die roll is 35/12 ≈ 2.92. This means that the expected squared
deviation of X from its mean µ = 3.5 is 35/12 ≈ 2.92.
Example 178. Roll two fair dice. Let the random variable Y be the sum of the two dice.
We already know from Example 170 that µ = 7. So, using also our findings from Exercise
70,
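\[
\begin{aligned}
V[Y] &= E\left[ (Y - \mu)^2 \right] = E\left[ (Y - 7)^2 \right] \\
&= P(Y = 2) \cdot (2 - 7)^2 + P(Y = 3) \cdot (3 - 7)^2 + \dots + P(Y = 12) \cdot (12 - 7)^2 \\
&= \frac{1 \cdot 5^2 + 2 \cdot 4^2 + 3 \cdot 3^2 + 4 \cdot 2^2 + 5 \cdot 1^2 + 6 \cdot 0^2 + 5 \cdot 1^2 + 4 \cdot 2^2 + 3 \cdot 3^2 + 2 \cdot 4^2 + 1 \cdot 5^2}{36} \\
&= \frac{2 (25 + 32 + 27 + 16 + 5)}{36} = \frac{210}{36} = \frac{70}{12} \approx 5.83.
\end{aligned}
\]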
So the variance of the sum of two dice is 70/12 ≈ 5.83. This means that on average, the
square of the deviation of Y from its mean µ = 7 is 70/12 ≈ 5.83.
As the above examples suggest, calculating the variance can be tedious. Fortunately, there
is a shortcut:

Fact 8. Let X be a random variable with mean µ. Then V[X] = E[X²] − µ².
Proof. Omitted.
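For the curious reader, here is one plausible outline of why the shortcut holds. It is offered only as a sketch, not as the omitted proof itself, and it assumes two things: the linearity of the expectation operator (Section 49.1) and the fact that the expected value of a constant is that constant.
\[
\begin{aligned}
V[X] &= E\left[ (X - \mu)^2 \right] = E\left[ X^2 - 2\mu X + \mu^2 \right] \\
&= E\left[ X^2 \right] - 2\mu E[X] + \mu^2 \\
&= E\left[ X^2 \right] - 2\mu^2 + \mu^2 = E\left[ X^2 \right] - \mu^2.
\end{aligned}
\]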
We now redo the previous two examples using this shortcut:
Example 177 (continued from above). Let the random variable X be the outcome of
the roll of a fair die. We already know that µ = 3.5. So compute
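\[ E\left[ X^2 \right] = P(X = 1) \cdot 1^2 + P(X = 2) \cdot 2^2 + \dots + P(X = 6) \cdot 6^2 = \frac{1}{6} \left( 1^2 + 2^2 + \dots + 6^2 \right) = \frac{91}{6}. \]
Hence,
\[ V[X] = E\left[ X^2 \right] - \mu^2 = \frac{91}{6} - 3.5^2 = \frac{182}{12} - \frac{147}{12} = \frac{35}{12}. \]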
Example 178 (continued from above). Let the random variable Y be the sum of two
rolled dice. We already know from Example 170 that µ = 7. So, using also our findings
from Exercise 70,
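\[
\begin{aligned}
E\left[ Y^2 \right] &= P(Y = 2) \cdot 2^2 + P(Y = 3) \cdot 3^2 + \dots + P(Y = 12) \cdot 12^2 \\
&= \frac{1 \cdot 2^2 + 2 \cdot 3^2 + 3 \cdot 4^2 + 4 \cdot 5^2 + 5 \cdot 6^2 + 6 \cdot 7^2 + 5 \cdot 8^2 + 4 \cdot 9^2 + 3 \cdot 10^2 + 2 \cdot 11^2 + 1 \cdot 12^2}{36} \\
&= \frac{4 + 18 + 48 + 100 + 180 + 294 + 320 + 324 + 300 + 242 + 144}{36} = \frac{1974}{36} = \frac{658}{12}.
\end{aligned}
\]
Hence,
\[ V[Y] = E\left[ Y^2 \right] - \mu^2 = \frac{658}{12} - 7^2 = \frac{658}{12} - \frac{588}{12} = \frac{70}{12}. \]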
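The agreement between the definition and the shortcut can also be checked mechanically. This is a sketch with helper names of our own choosing; distributions are again stored as dictionaries from value to probability. Note that Python prints 70/12 in lowest terms, as 35/6.

```python
from fractions import Fraction
from itertools import product

def expectation(dist, g=lambda k: k):
    """E[g(X)] for a distribution given as {value: probability}."""
    return sum(p * g(k) for k, p in dist.items())

def variance_definition(dist):
    """V[X] = E[(X - mu)^2]."""
    mu = expectation(dist)
    return expectation(dist, lambda k: (k - mu) ** 2)

def variance_shortcut(dist):
    """V[X] = E[X^2] - mu^2 (Fact 8)."""
    mu = expectation(dist)
    return expectation(dist, lambda k: k ** 2) - mu ** 2

die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = {}
for a, b in product(range(1, 7), repeat=2):
    two_dice[a + b] = two_dice.get(a + b, 0) + Fraction(1, 36)

print(variance_definition(die), variance_shortcut(die))            # 35/12 35/12
print(variance_definition(two_dice), variance_shortcut(two_dice))  # 35/6 35/6
```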
Exercise 74. Let the random variable B be the high card point count of a randomly-chosen
card from a standard 52-card deck. Find V[B]. (Answer on p. 353.)

50.1 Standard Deviation
Let X be a random variable. Then E [X] has the same unit of measure as X. In contrast,
V [X] uses the squared unit.
Example 179. There are 100 dumbbells in a gym, of which 30 have weight 5 kg and the
remaining 70 have weight 10 kg. Let X be the weight of a randomly-chosen dumbbell.
Then the mean and variance of X are
\[ E[X] = \mu = 0.3 \times 5 \text{ kg} + 0.7 \times 10 \text{ kg} = 8.5 \text{ kg}, \]
\[
\begin{aligned}
V[X] &= 0.3 \times (5 \text{ kg} - 8.5 \text{ kg})^2 + 0.7 \times (10 \text{ kg} - 8.5 \text{ kg})^2 \\
&= 0.3 \times 12.25 \text{ kg}^2 + 0.7 \times 2.25 \text{ kg}^2 = 5.25 \text{ kg}^2.
\end{aligned}
\]
To get a measure of “spread” that uses the original unit of measure, we simply take the
square root of the variance. This is called the standard deviation.
Definition 6. Let X be a random variable and V[X] be its variance. Then the standard
deviation of X is defined as
\[ SD[X] = \sqrt{V[X]}. \]
The variance of a random variable X is often denoted σ_X² or even more simply as σ² (if it
is clear from the context that we’re talking about the variance of X).
Correspondingly, the standard deviation of X is often denoted σ_X or σ.
Example 179 (continued from above). We calculated the variance of X to be V[X] = σ² = 5.25 kg².
Hence, the standard deviation of X is simply σ = √5.25 kg ≈ 2.29 kg.
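The dumbbell figures can be reproduced with a few lines of arithmetic. This is a minimal sketch using ordinary floating-point numbers, so the results are rounded before printing; the dictionary layout is our own choice.

```python
import math

# Weight in kg -> probability of picking a dumbbell of that weight.
weights = {5: 0.3, 10: 0.7}

mu = sum(p * w for w, p in weights.items())               # mean, in kg
var = sum(p * (w - mu) ** 2 for w, p in weights.items())  # variance, in kg^2
sd = math.sqrt(var)                                       # standard deviation, in kg

print(round(mu, 2), round(var, 2), round(sd, 2))  # 8.5 5.25 2.29
```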
Exercise 75. There are 100 rulers in a bookstore, of which 35 have length 20 cm and
the remaining 65 have length 30 cm. Let Y be the length of a randomly-chosen
ruler. Find the mean, variance, and standard deviation of Y. (Be sure to include the
units of measurement. Answer on p. 353.)

50.2 Properties of the Variance Operator
1. If X is a random variable and c is a constant, then $V[cX] = c^2 V[X]$.
2. If X and Y are independent random variables, then $V[X + Y] = V[X] + V[Y]$.
With the above properties, it becomes much easier than before to find the variance of the
sum of 2 dice, 3 dice, or indeed n dice.
Example 180. Let X be the outcome of a fair die-roll. We showed earlier that V[X] =
35/12.
Now roll two fair dice. Let X1 and X2 be the respective outcomes. Let Y be the sum of
the two dice (i.e. Y = X1 + X2 ). Assuming independence, we have
\[
V[Y] = V[X_1 + X_2] = V[X_1] + V[X_2] = \frac{70}{12}.
\]
Compare this quick computation to the work we did in Example 178!
Now roll three fair dice. Let X3 , X4 , and X5 be the respective outcomes. Let Z be the sum
of the three dice (i.e. Z = X3 + X4 + X5 ). Again, assuming independence, we have
\[
V[Z] = V[X_3 + X_4 + X_5] = V[X_3] + V[X_4] + V[X_5] = \frac{105}{12}.
\]
Again, compare this quick computation to the work we would have had to do, without this
property!
Now, let A be double the outcome of a die roll (i.e. A = 2X). Note importantly that A ≠ Y .
Y is the sum of two independent die rolls. In contrast, A is double the outcome of a single
die roll. Indeed, we have that
\[
V[A] = V[2X] = 4V[X] = \frac{140}{12} \neq V[Y].
\]
Similarly, let B be triple the outcome of a die roll (i.e. B = 3X). Note importantly that
B ≠ Z. Z is the sum of three independent die rolls. In contrast, B is triple the outcome of
a single die roll. Indeed, we have that
\[
V[B] = V[3X] = 9V[X] = \frac{315}{12} \neq V[Z].
\]

It is important to remember that the second property does not hold if X and Y are not
independent.
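The contrast between the sum of two independent dice and double a single die can be confirmed by enumeration. This is a minimal Python sketch; the helper `variance` is a name introduced here for illustration and assumes every listed value is equally likely.

```python
from fractions import Fraction
from itertools import product

def variance(values):
    """Variance of a random variable equally likely to take each listed value."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((v - mean) ** 2 for v in values) / n

faces = range(1, 7)
var_single = variance(list(faces))                                # V[X]
var_sum = variance([a + b for a, b in product(faces, repeat=2)])  # V[X1 + X2]
var_double = variance([2 * a for a in faces])                     # V[2X]

print(var_single, var_sum, var_double)
```

The sum of two independent rolls has twice the variance of one roll, while doubling one roll gives four times the variance.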
Exercise 76. The weight of a fish in a pond is a random variable with mean µ kg and
variance σ² kg². (Include the units of measurement in your answer. Answer on p. 353.)
(a) If two fish are caught and the weights of these fish are independent of each other, what
are the mean and variance of the total weight of the two fish?
(b) If one fish is caught and an exact clone is made of it, what are the mean and variance
of the total weight of the fish and its clone?
(c) If two fish are caught and the weights of these fish are not independent of each other,
what are the mean and variance of the total weight of the two fish?

51 The Binomial Distribution
Example 181. Flip 3 fair coins. Let X be the random variable that counts the number of
heads.
Then X is an example of a binomial random variable with parameters 3 and $\frac{1}{2}$.
Example 182. Flip 4 fair coins. Let Y be the random variable that counts the number of
heads.
Then Y is an example of a binomial random variable with parameters 4 and $\frac{1}{2}$.
Example 183. There are 10 ATMs. On any given day, each has, independently, probability
0.1 of failure. Let Z be the random variable that counts the number of failures on any given
day.
Then Z is an example of a binomial random variable with parameters 10 and 0.1.
Example 184. 90% of H2 Maths students pass their A-level exams.

Let A be the number of passes among 2 randomly-chosen students. Then A is a binomial
random variable with parameters 2 and 0.9.
Let B be the number of passes among 3 randomly-chosen students. Then B is a binomial

random variable with parameters 3 and 0.9.
The following three statements are entirely equivalent:
1. X is a binomial random variable with parameters n and p.

2. The random variable X has the binomial distribution with parameters n and p.
3. X ∼ B(n, p).

51.1 Probability Distribution of the Binomial R.V.
We flip a biased coin n times. On each flip, the coin has probability p of landing on heads.
Let X count the number of heads. Then X is the binomial random variable with parameters
n and p.
What is P(X = k)? In other words, what is the probability that there are k heads and n − k
tails?
First let’s consider instead the probability that the first k coin-flips are heads and the
remaining n − k coin-flips are tails. We know that the probability of a heads is p and the
probability of a tails is 1 − p. Hence, by the Multiplication Principle, this probability is
simply $p^k (1 - p)^{n-k}$.
The above is the probability of k heads and n − k tails, but where exactly the first k trials
are successes and exactly the last n − k trials are failures. But we don’t care about where
the successes are. We only care that there are k successes. And there are C(n, k) ways to
have exactly k successes in n trials. Thus, our desired probability is:
\[
P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}.
\]
Example 185. Let X be the number of heads when 10 fair coins are flipped.
Then X ∼ B(10, 0.5). And the probability that exactly 8 coins are heads is:
\[
P(X = 8) = \binom{10}{8} \, 0.5^8 \, 0.5^2 = \frac{45}{1024}.
\]
Example 186. 90% of H2 Maths students pass their A-level exams.

Let Y be the number of passes among 20 randomly-chosen students. Then Y ∼ B(20, 0.9).
And the probability that at least 18 pass is
\[
\begin{aligned}
P(Y \geq 18) &= P(Y = 18) + P(Y = 19) + P(Y = 20) \\
&= \binom{20}{18} 0.9^{18} \, 0.1^{2} + \binom{20}{19} 0.9^{19} \, 0.1^{1} + \binom{20}{20} 0.9^{20} \, 0.1^{0} \approx 0.677.
\end{aligned}
\]
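The binomial formula translates directly into code. The sketch below is illustrative only: `binomial_pmf` is a name chosen here, and `math.comb` from Python's standard library plays the role of C(n, k). It reproduces the two probabilities from Examples 185 and 186.

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p): choose which k trials succeed, then multiply."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Example 185: exactly 8 heads in 10 fair coin flips.
p_eight_heads = binomial_pmf(10, 0.5, 8)

# Example 186: at least 18 passes among 20 students, each passing with probability 0.9.
p_at_least_18 = sum(binomial_pmf(20, 0.9, k) for k in range(18, 21))

print(p_eight_heads, round(p_at_least_18, 3))
```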

51.2 The Mean and Variance of the Binomial Random Variable
Example 187. Problem: Three machines each have, independently, probability 0.3 of failure.
What is the expected number of failures? What is the variance of the number of
failures?
Solution: Let Z ∼ B(3, 0.3) be the number of failures. Then
\[
P(Z = 1) = \binom{3}{1} 0.3^1 \, 0.7^2, \quad P(Z = 2) = \binom{3}{2} 0.3^2 \, 0.7^1, \quad P(Z = 3) = \binom{3}{3} 0.3^3 \, 0.7^0.
\]
Hence,
\[
\begin{aligned}
E[Z] &= P(Z = 1) \cdot 1 + P(Z = 2) \cdot 2 + P(Z = 3) \cdot 3 \\
&= \binom{3}{1} 0.3^1 \, 0.7^2 \cdot 1 + \binom{3}{2} 0.3^2 \, 0.7^1 \cdot 2 + \binom{3}{3} 0.3^3 \, 0.7^0 \cdot 3 \\
&= 0.441 + 0.378 + 0.081 = 0.9.
\end{aligned}
\]
That is, the expected number of failures is 0.9.
Now,
\[
\begin{aligned}
E[Z^2] &= P(Z = 1) \cdot 1^2 + P(Z = 2) \cdot 2^2 + P(Z = 3) \cdot 3^2 \\
&= \binom{3}{1} 0.3^1 \, 0.7^2 \cdot 1^2 + \binom{3}{2} 0.3^2 \, 0.7^1 \cdot 2^2 + \binom{3}{3} 0.3^3 \, 0.7^0 \cdot 3^2 \\
&= 0.441 + 0.756 + 0.243 = 1.44.
\end{aligned}
\]
Hence, $V[Z] = E[Z^2] - (E[Z])^2 = 1.44 - 0.9^2 = 0.63$.
That is, the variance of the number of failures is 0.63.
It turns out though that there is a much quicker formula for finding the mean and variance
of any binomial random variable.

Fact 9. If X ∼ B(n, p), then E[X] = np and V[X] = np(1 − p).
(You can verify that these formulas work for the last example: n = 3, p = 0.3, and thus
E[Z] = np = 0.9 and V[Z] = np(1 − p) = 0.63.)
Proof. Omitted.
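Although the proof is omitted, Fact 9 can at least be spot-checked numerically. The sketch below (the function name is an illustrative choice) computes the mean and variance of B(n, p) the long way, directly from the probability distribution, and prints them next to np and np(1 − p). It is a sanity check for a few parameter values, not a proof.

```python
from math import comb

def binomial_mean_and_variance(n, p):
    """Mean and variance of B(n, p), computed from the definition rather than Fact 9."""
    pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))        # E[X]
    mean_sq = sum(k**2 * q for k, q in enumerate(pmf))  # E[X^2]
    return mean, mean_sq - mean**2                      # V[X] = E[X^2] - (E[X])^2

for n, p in [(3, 0.3), (10, 0.1), (20, 0.9)]:
    mean, var = binomial_mean_and_variance(n, p)
    print(n, p, round(mean, 6), round(n * p, 6), round(var, 6), round(n * p * (1 - p), 6))
```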
Exercise 77. (Answer on p. 354.) Plane engine #1 contains 20 components, each of which
has probability 0.01 of failure. Plane engine #2 contains 35 components, each of which has
probability 0.005 of failure. The probability that any component fails is independent of
whether any other component has failed.
An engine fails if and only if at least 2 of its components fail. What is the probability that
both engines fail?

52 The Continuous Uniform Distribution
The binomial random variable is discrete, because its range of possible observed values is
finite.
We’ll now look instead at continuous random variables. Informally, a random variable Y
is continuous if its range takes on a continuum of values.
For H1 Maths, you need only learn about one continuous random variable: the normal
random variable (subject of the next chapter).
Nonetheless, we’ll first look at another continuous random variable that is not in the syl-
labus. This is the continuous uniform random variable. It is much simpler than
the normal random variable and can thus help build up your intuition of how continuous
random variables work.
52.1 The Continuous Uniform Distribution
A line measuring exactly 1 metre in length is drawn on the floor. It is about to rain. Let
X be the position of the first rain-drop that hits the line. X is measured as the distance
(in metres) from the left-most point of the line.
So for example, if the first rain-drop hits the left-most point of the line, then x = 0. If it
hits the exact midpoint of the line, then x = 0.5. And if it hits the right-most point, then
x = 1.
Assume we can measure X to infinite precision.
Then, assuming the first rain-drop is equally likely to hit any point of the line, we can
model X as a continuous uniform random variable on [0, 1]. This says that
• The range of X is [0, 1] (the first rain-drop can hit any point along the line); and
• X is equally likely to take on any value in the interval [0, 1] (the first rain-drop is equally
likely to hit any point along the line).
The following three statements are entirely equivalent:
1. X is a continuous uniform random variable on [0, 1].
2. X is a random variable with the continuous uniform distribution on [0, 1].
3. X ∼ U [0, 1].
Recall that previously with any discrete random variable Y , we could find its probability
distribution. That is, we could find P (Y = k) (the probability that Y takes on the value
k). For example, if Y ∼ B (3, 0.5) modelled the number of heads in three coin-flips, then
the probability that there was one heads was $P(Y = 1) = \binom{3}{1} \, 0.5^1 \, 0.5^2 = \frac{3}{8}$.

Now, in contrast, for any continuous random variable X, strangely enough, there is
zero probability that X takes on any particular value! For example, if X ∼ U [0, 1], then
P (X = 0.37) = 0. That is, there is zero probability that X takes on the value of 0.37!
At first glance, this may seem strange.

But remember: There are infinitely-many real numbers in the interval [0, 1]. So it makes
sense to say that the probability of X taking on any particular value is zero.¹⁴
So for any continuous random variable X, it is pointless to try to write down P (X = k) for
different possible values of k, because P (X = k) is always equal to zero (regardless of what
k is). Instead, we shall try to write down P (a ≤ X ≤ b), for different possible values of a
and b.
Now, if X ∼ U [0, 1], then the probability that X takes on values between 0.3 and 0.7 is
simply 0.7 − 0.3 = 0.4. That is,
P (0.3 ≤ X ≤ 0.7) = 0.7 − 0.3 = 0.4.
Similarly, the probability that X takes on values between 0.16 and 0.35 is simply 0.35−0.16 =
0.19. That is,
P (0.16 ≤ X ≤ 0.35) = 0.35 − 0.16 = 0.19.
The above observations suggest that it may be useful to define a new concept, called the
cumulative distribution function.
¹⁴ But strangely enough, zero probability is not the same thing as impossible. For example, we’d say that
• There is zero probability, but it is not impossible that X ∼ U [0, 1] takes on the value 0.37.
• There is zero probability and it is impossible that X ∼ U [0, 1] takes on the value 1.2.
(Actually, rather than use the word “impossible”, mathematicians prefer saying “almost never”, which has a precise
definition.)

52.2 Important Digression: P (X ≤ k) = P (X < k)
For any continuous random variable X, we have
P (X ≤ k) = P (X < k) .
That is, whether an inequality is strict makes no difference. The reason is that:
P (X ≤ k) = P (X < k) + P (X = k) = P (X < k) + 0 = P (X < k) .
Thus, for continuous random variables, it doesn’t matter whether inequalities are strict or
weak.
Example 188. Let X ∼ U [0, 1]. Then
P (0.2 ≤ X ≤ 0.5) = P (0.2 < X ≤ 0.5) = P (0.2 ≤ X < 0.5) = P (0.2 < X < 0.5) .

52.3 The Cumulative Distribution Function (CDF)
Let X be a random variable. Its cumulative distribution function (CDF), denoted
$F_X$, simply tells us the probability that X takes on values less than or equal to k, for
every k ∈ R. That is, $F_X(k) = P(X \leq k)$.
Example 189. Let X ∼ U [0, 1]. Let FX be its CDF. Then we have, for example,
FX (0.7) = P (X ≤ 0.7) = 0.7, and FX (0.2) = P (X ≤ 0.2) = 0.2.
Example 190. Let Y ∼ U [3, 5]. This is the continuous uniform distribution on [3, 5]. It
is equally likely to take on any value in the interval [3, 5]. Let FY be the CDF of Y . Then
we have, for example,
FY (3.1) = P (Y ≤ 3.1) = 0.05, and FY (4.4) = P (Y ≤ 4.4) = 0.7.
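The pattern in the last two examples is that, inside the interval, the CDF is the fraction of the interval lying to the left of k. The Python sketch below writes this out; the function name and the general formula (k − a)/(b − a) are spelled out here as an illustration rather than quoted from the text.

```python
def uniform_cdf(k, a, b):
    """CDF of U[a, b] at k, i.e. P(X <= k)."""
    if k <= a:
        return 0.0            # nothing lies to the left of the interval
    if k >= b:
        return 1.0            # the whole interval lies to the left of k
    return (k - a) / (b - a)  # fraction of [a, b] that is at or below k

print(uniform_cdf(0.7, 0, 1), uniform_cdf(0.2, 0, 1))
print(uniform_cdf(3.1, 3, 5), uniform_cdf(4.4, 3, 5))

# Interval probabilities are differences of CDF values.
print(uniform_cdf(0.7, 0, 1) - uniform_cdf(0.3, 0, 1))
```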

52.4 The Probability Density Function (PDF)
Given a random variable X, its probability density function (PDF), denoted $f_X$,
is simply defined as the derivative of its CDF $F_X$.¹⁵ That is,
\[
f_X = \frac{d}{dk} F_X.
\]
Example 191. The PDF of X ∼ U[0, 1] (graphed below) is simply the function fX ∶ R → R
defined by
fX (k) = 1, if k ∈ [0, 1], and fX (k) = 0, otherwise.
Recall that the area under the curve (definite integral) can be computed as the reverse
process of differentiation. Hence, for any a ≤ b, the area under the PDF between a and b is
precisely P (a ≤ X ≤ b). For example, there is probability 0.25 (red area) that X takes on
values between 0.5 and 0.75. There is probability 0.1 (blue area) that X takes on values
between 0.2 and 0.3.
Exercise 78. The continuous uniform random variable Y ∼ U[3, 5] is equally likely to take
on values between 3 and 5, inclusive. (a) Write down CDF FY . (b) Write down and graph
its PDF fY . (c) Compute, and also illustrate on your graph, the quantities P (3.1 ≤ Y ≤ 4.6)
and P (4.8 ≤ Y ≤ 4.9). (Answer on p. 354.)
¹⁵ Note that although every random variable has a CDF, not every random variable has a PDF. In particular, if the random
variable’s CDF is not differentiable, then by our definition here, the random variable does not have a PDF.

53 The Normal Distribution
The standard normal (or Gaussian) random variable (SNRV) is very important. In
fact, it is so important that we usually reserve the letter Z for it, and the Greek letters φ
and Φ (lower- and upper-case phi) for its PDF and CDF.
The following three statements are entirely equivalent:
1. Z is a SNRV.
2. Z is a random variable with the standard normal distribution.
3. Z ∼ N (0, 1).
Here’s the formal definition:
Definition 7. Z is called a standard normal random variable (SNRV) if its PDF φ ∶ R → R
is defined by:
\[
\phi(a) = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2}.
\]
For the A-levels, you need not remember this complicated-looking PDF. Nor need you
understand where it comes from.
The normal PDF is often also referred to as the bell curve, due to its resemblance to a
bell (kinda).
As with the continuous uniform, for any a ≤ b, the area under the normal PDF between a
and b gives us precisely P (a ≤ Z ≤ b). For example, there is probability 0.0819 (red area)
that Z takes on values between 0.5 and 0.75. There is probability 0.4593 (blue area) that
Z takes on values between −1 and 0.3.

As usual, the CDF Φ ∶ R → R is defined by:
\[
\Phi(a) = P(Z \leq a) = \int_{-\infty}^{a} \phi(x) \, dx = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} e^{-0.5x^2} \, dx.
\]
Unfortunately, this last integral has no simpler expression (mathematicians would say that
it has no “closed-form expression”). Instead, as we’ll soon see, we have to use the so-called
Z-tables (or a graphing calculator) to look up values of Φ(k).
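In software, Φ is usually evaluated through the error function erf. The identity Φ(a) = 0.5 (1 + erf(a/√2)) is a standard one, but it is not part of this textbook or the syllabus, so treat the following Python sketch as an optional aside. The names `phi` and `Phi` are chosen here to mirror the notation above.

```python
import math

def phi(a):
    """Standard normal PDF."""
    return math.exp(-0.5 * a * a) / math.sqrt(2 * math.pi)

def Phi(a):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

# The two shaded areas mentioned above.
print(round(Phi(0.75) - Phi(0.5), 4))
print(round(Phi(0.3) - Phi(-1), 4))

# Height of the bell curve at its centre.
print(round(phi(0), 3))
```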
The next fact summarises the properties of the normal distribution. Some of these proper-
ties are illustrated in the figure that follows.
Fact 10. Let Z ∼ N(0, 1) and let φ and Φ be the PDF and CDF of Z.
1. Φ(∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random
variable.)
2. φ(a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising
implication that no matter how large a is, there is always some non-zero probability that
Z ≥ a.)
3. E [Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and
compute $\phi(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399$.)
5. V [Z] = 1. (The variance of Z is 1.)
6. P (Z ≤ a) = P (Z < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
(a) P (Z ≥ a) = P (Z ≤ −a) = Φ(−a).

(b) Since P (Z ≥ a) = 1 − P (Z ≤ a) = 1 − Φ(a), it follows that Φ(−a) = 1 − Φ(a) or,
equivalently, Φ(a) = 1 − Φ(−a).
(c) Φ(0) = 1 − Φ(0) = 0.5.
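8. P (−1 ≤ Z ≤ 1) = Φ (1) − Φ (−1) ≈ 0.6827. (There is probability 0.6827 that Z takes on
values within 1 standard deviation of the mean.)
9. P (−2 ≤ Z ≤ 2) = Φ (2) − Φ (−2) ≈ 0.9545. (There is probability 0.9545 that Z takes on
values within 2 standard deviations of the mean.)
10. P (−3 ≤ Z ≤ 3) = Φ (3) − Φ (−3) ≈ 0.9973. (There is probability 0.9973 that Z takes on
values within 3 standard deviations of the mean.)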
11. The PDF φ has two points of inflexion, namely at ±1. (The points of inflexion are one
standard deviation away from the mean.)
Proof. Omitted.

[Figure: graph of the standard normal PDF φ, horizontal axis from −4 to 4.]

Example 192. Let’s use the TI84 to find Φ(2.51).
1. Press the blue 2ND button and then DISTR (which corresponds to the VARS button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
The TI84 is now asking for your lower and upper bounds. Since Φ(2.51) = Φ(2.51)−Φ(−∞),
your lower bound is −∞ and your upper bound is 2.51.
3. But there’s no way to enter −∞ on your TI84. So instead, you’ll enter $-10^{99}$, which is
simply a very large negative number. To do so, press (-) , the blue 2ND button, EE
(which corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Now to enter your upper bound. First press , (this simply demarcates your lower and
upper bounds). Then enter your upper bound 2.51 by pressing 2 . 5 1 . Then press
ENTER . Your TI84 says that the answer is Φ(2.51) ≈ 0.99396.

Example 193. To find Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4), the steps are very similar.
So for each, I’ll simply give the screenshot from the TI84:
[Screenshots: Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4).]
Example 194. We’ll find Φ(2.51), Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4) using Z-tables.
Refer to the Z-tables on p. 188. (These are the exact same tables that appear on the List
of Formulae you’ll get during exams.)
• To find Φ(2.51), look at the row labelled 2.5 and the column labelled 1 — read off the
number 0.9940. We thus have Φ(2.51) = 0.9940.
• To find Φ(−2.51), note that the table does not explicitly give values of Φ(z), if z < 0.
But we can exploit the fact that the standard normal is symmetric about the mean µ = 0.
This fact implies that Φ(−z) = 1 − Φ(z). Hence, Φ(−2.51) = 1 − Φ(2.51) = 0.0060.
• To find Φ(1.372), first look at the row labelled 1.3 and the column labelled 7 — read off
the number 0.9147. This tells us that Φ(1.37) = 0.9147. Now look at the right end of the
table (where it says “ADD”). Since the third decimal place of 1.372 is 2, we look under
the column labelled 2, which tells us to ADD 3 (that is, to add 0.0003). Thus,
Φ(1.372) = 0.9147 + 0.0003 = 0.9150.
• To find P (−4 ≤ Z ≤ 4), the Z-tables printed are actually useless, because they only go
to 2.99. So you can just write P (−4 ≤ Z ≤ 4) ≈ 1.
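If you have a computer handy, the same four quantities can be cross-checked in Python. The sketch below assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`; this stands in for the graphing calculator and is not something you can use in the exam.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard normal random variable

values = {
    "Phi(2.51)": Z.cdf(2.51),
    "Phi(-2.51)": Z.cdf(-2.51),
    "Phi(1.372)": Z.cdf(1.372),
    "P(-4 <= Z <= 4)": Z.cdf(4) - Z.cdf(-4),
}

for label, value in values.items():
    print(label, round(value, 5))
```

Small differences in the fourth decimal place between these values and the Z-tables come from the rounding built into the printed tables.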
Exercise 79. Using both the Z-tables and your graphing calculator, find the following:
(a) P (Z ≥ 1.8). (b) P (−0.351 < Z < 1.2). (Answer on p. 355.)

THE NORMAL DISTRIBUTION FUNCTION
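If Z has a normal distribution with mean 0 and variance 1 then, for each value of z, the table gives
the value of Φ(z), where
Φ(z) = P(Z ≤ z).
For negative values of z use Φ(−z) = 1 − Φ(z).
Columns: z (to one decimal place), then Φ(z) for second decimal place 0 to 9, then the amount to ADD (in units of 0.0001) for third decimal place 1 to 9.
z 0 1 2 3 4 5 6 7 8 9 | ADD: 1 2 3 4 5 6 7 8 9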
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 4 8 12 16 20 24 28 32 36
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 4 8 12 16 20 24 28 32 36
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 4 8 12 15 19 23 27 31 35
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 4 7 11 15 19 22 26 30 34
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 4 7 11 14 18 22 25 29 32
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 3 7 10 14 17 20 24 27 31
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 3 7 10 13 16 19 23 26 29
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 3 6 9 12 15 18 21 24 27
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 3 5 8 11 14 16 19 22 25
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 3 5 8 10 13 15 18 20 23
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 2 5 7 9 12 14 16 19 21
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 2 4 6 8 10 12 14 16 18
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 2 4 6 7 9 11 13 15 17
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 2 3 5 6 8 10 11 13 14
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1 3 4 6 7 8 10 11 13
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1 2 4 5 6 7 8 10 11
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1 2 3 4 5 6 7 8 9
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1 2 3 4 4 5 6 7 8
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1 1 2 3 4 4 5 6 6
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 1 1 2 2 3 4 4 5 5
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 0 1 1 2 2 3 3 4 4
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 0 1 1 2 2 2 3 3 4
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 0 1 1 1 2 2 2 3 3
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 0 1 1 1 1 2 2 2 2
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 0 0 1 1 1 1 1 2 2
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 0 0 0 1 1 1 1 1 1
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 0 0 0 0 1 1 1 1 1
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 0 0 0 0 0 1 1 1 1
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 0 0 0 0 0 0 0 1 1
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 0 0 0 0 0 0 0 0 0
Critical values for the normal distribution
If Z has a normal distribution with mean 0 and variance 1 then, for each value of p, the table
gives the value of z such that
P(Z ≤ z) = p.
p 0.75 0.90 0.95 0.975 0.99 0.995 0.9975 0.999 0.9995
z 0.674 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291
53.1 The Normal Distribution, in General
Let Z ∼ N(0, 1) be the SNRV and σ, µ ∈ R be constants.
Consider σZ + µ, itself a random variable. We know that since E [Z] = 0 and V [Z] = 1, it
follows from the properties of the mean and variance that
\[
E[\sigma Z + \mu] = \sigma E[Z] + \mu = \mu \quad \text{and} \quad V[\sigma Z + \mu] = \sigma^2 V[Z] = \sigma^2.
\]
It turns out that σZ + µ is a normal random variable with mean µ and variance $\sigma^2$:
Definition 8. X is called a normal random variable with mean µ and variance $\sigma^2$ if its
PDF $f_X$ ∶ R → R is defined by:
\[
f_X(a) = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\left(\frac{a - \mu}{\sigma}\right)^2}.
\]
Once again, for the A-levels, you need not remember this complicated-looking PDF. Nor
need you understand where it comes from.
The following three statements are entirely equivalent:
1. X is a normal random variable with mean µ and variance $\sigma^2$.
2. X is a random variable with normal distribution of mean µ and variance $\sigma^2$.
3. $X \sim N(\mu, \sigma^2)$.

Example 195. The normal random variables A ∼ N(−1, 1), B ∼ N(1, 1), and C ∼ N(2, 1)
have variance 1 (just like the SNRV), but non-zero means. Their PDFs are graphed below.
(Included for reference is the standard normal PDF in black.)
We see that the effect of increasing the mean µ is to move the graph of the PDF rightwards.
And decreasing the mean moves it leftwards.

Example 196. The normal random variables D ∼ N(0, 0.1), E ∼ N(0, 2), and F ∼ N(0, 3)
have mean 0 (just like the SNRV), but non-unit variances. Their PDFs are graphed below.
(Included for reference is the standard normal PDF in black.)
The effect of changing the variance σ 2 is this:
• The larger the variance, the “fatter” the “tails” of the PDF and the shorter the peak.
• Conversely, the smaller the variance, the “thinner” the “tails” of the PDF and the taller
the peak.

Example 197. The normal random variables G ∼ N(−1, 0.1), H ∼ N(1, 2), and I ∼ N(2, 3)
have non-zero means and non-unit variances. Their PDFs are graphed below. (Included
for reference is the standard normal PDF in black.)
Exercise 80. Let X ∼ N(µ, σ 2 ). Verify that if µ = 0 and σ 2 = 1, then for all a ∈ R, we have
fX (a) = φ(a). What can you conclude? (Answer on p. 355.)

In general, normality is preserved under linear transformations:
Fact 11. Let $X \sim N(\mu, \sigma^2)$ and a, b ∈ R be constants. Then $aX + b \sim N(a\mu + b, a^2\sigma^2)$.
Proof. Omitted.
Thus, we can easily transform any normal random variable into the SNRV:
Corollary 1. If $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} = Z \sim N(0, 1)$. Equivalently, $X = \sigma Z + \mu$.
Proof. The next exercise asks you to prove this corollary.
Exercise 81. Using Fact 11, prove that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} = Z \sim N(0, 1)$. (Answer
on p. 356.)
The above corollary gives us an alternative method for computing probabilities associated
with normal random variables. In general, if $X \sim N(\mu, \sigma^2)$, then
\[
P(X \leq c) = P(\sigma Z + \mu \leq c) = P\left(Z \leq \frac{c - \mu}{\sigma}\right) = \Phi\left(\frac{c - \mu}{\sigma}\right).
\]

The properties that we listed for the SNRV also apply, with only a few modifications, to
any NRV. I highlight any differences in red. The figure that follows illustrates.
Fact 12. Let $X \sim N(\mu, \sigma^2)$ and let $f_X$ and $F_X$ be the PDF and CDF of X.
1. $F_X(\infty) = 1$. (The area under the entire PDF is 1. This, of course, is true of any random
variable.)
2. $f_X(a) > 0$, for all a ∈ R. (The PDF is positive everywhere. This has the surprising
implication that no matter how large a is, there is always some non-zero probability that
X ≥ a.)
3. E [X] = µ. (The mean of X is µ.)
4. The PDF $f_X$ reaches a global maximum at the mean µ. (In fact, we can go ahead and
compute $f_X(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \approx \frac{0.399}{\sigma}$.)
5. $V[X] = \sigma^2$. (The variance of X is $\sigma^2$.)
6. P (X ≤ a) = P (X < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(X = a) = 0.)
7. The PDF $f_X$ is symmetric about the mean. This has several implications:
(a) $P(X \geq \mu + a) = P(X \leq \mu - a) = F_X(\mu - a)$.
(b) Since $P(X \geq \mu + a) = 1 - P(X \leq \mu + a) = 1 - F_X(\mu + a)$, it follows that $F_X(\mu - a) = 1 - F_X(\mu + a)$ or, equivalently, $F_X(\mu + a) = 1 - F_X(\mu - a)$.
(c) $F_X(\mu) = 1 - F_X(\mu) = 0.5$.
8. P (µ − σ ≤ X ≤ µ + σ) = Φ (1) − Φ (−1) ≈ 0.6827. (There is probability 0.6827 that X takes
on values within 1 standard deviation of the mean.)
9. P (µ − 2σ ≤ X ≤ µ + 2σ) = Φ (2) − Φ (−2) ≈ 0.9545. (There is probability 0.9545 that X takes
on values within 2 standard deviations of the mean.)
10. P (µ − 3σ ≤ X ≤ µ + 3σ) = Φ (3) − Φ (−3) ≈ 0.9973. (There is probability 0.9973 that X takes
on values within 3 standard deviations of the mean.)
11. The PDF $f_X$ has two points of inflexion, namely at µ ± σ. (The points of inflexion are one
standard deviation away from the mean.)
Proof. Omitted.

Example 198. Let G ∼ N(−1, 0.1), H ∼ N(1, 2), and I ∼ N(2, 3). We’ll find P (G < 2) using
our TI84. The first few steps are similar to before:
1. Press the blue 2ND button and then VARS (which corresponds to the DISTR button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
3. Enter the lower bound $-10^{99}$ by pressing (-) , the blue 2ND button, EE (which
corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Enter the upper bound 2 by pressing , and 2 . (Don’t press ENTER yet!!).
Previously, we didn’t bother telling the TI84 our mean µ and standard deviation σ.
And so by default, if we pressed ENTER at this point, the TI84 simply assumed that we
wanted the SNRV Z ∼ N(0, 1). Now we’ll tell the TI84 what µ and σ are:
5. First enter the mean µ = −1. Press , (-) 1 .

6. Now enter the standard deviation $\sigma = \sqrt{0.1}$ (and not the variance). Press , √ 0
. 1 ) . Finally, press ENTER . The TI84 says that P (G < 2) ≈ 1.
Finding P (H < 2), P (I < 2), P (−1 < G < 1), P (−1 < H < 1), and P (−1 < I < 1) is similar:
[Screenshots: P (H < 2) and P (I < 2); P (−1 < G < 1); P (−1 < H < 1); P (−1 < I < 1).]
Since I has mean µ = 2, we should have exactly P (I < 2) = 0.5. So here the TI84 has
actually made a small error in reporting instead that P (I < 2) ≈ 0.5000000005.

Example 199. We now redo the previous two examples, but use Z-tables:
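\[
\begin{aligned}
P(G < 2) &= P\left(Z < \frac{2 - \mu_G}{\sigma_G} = \frac{2 - (-1)}{\sqrt{0.1}} \approx 9.4868\right) = \Phi(9.4868) \approx 1, \\
P(H < 2) &= P\left(Z < \frac{2 - \mu_H}{\sigma_H} = \frac{2 - 1}{\sqrt{2}} \approx 0.7071\right) = \Phi(0.7071) \approx 0.7601, \\
P(I < 2) &= P\left(Z < \frac{2 - \mu_I}{\sigma_I} = \frac{2 - 2}{\sqrt{3}} = 0\right) = \Phi(0) = 0.5, \\
P(-1 < G < 1) &= P\left(0 = \frac{-1 - (-1)}{\sqrt{0.1}} < Z < \frac{1 - (-1)}{\sqrt{0.1}} \approx 6.3246\right) \\
&= \Phi(6.3246) - \Phi(0) \approx 1 - \Phi(0) = 0.5, \\
P(-1 < H < 1) &= P\left(-1.4142 \approx \frac{-1 - 1}{\sqrt{2}} < Z < \frac{1 - 1}{\sqrt{2}} = 0\right) \\
&= \Phi(0) - \Phi(-1.4142) \approx 0.5 - [1 - \Phi(1.4142)] \\
&= \Phi(1.4142) - 0.5 \approx 0.9213 - 0.5 = 0.4213, \\
P(-1 < I < 1) &= P\left(-1.7321 \approx \frac{-1 - 2}{\sqrt{3}} < Z < \frac{1 - 2}{\sqrt{3}} \approx -0.5774\right) \\
&= \Phi(-0.5774) - \Phi(-1.7321) = 1 - \Phi(0.5774) - [1 - \Phi(1.7321)] \\
&\approx 0.9584 - 0.7182 = 0.2402.
\end{aligned}
\]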
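The six probabilities above can also be checked in Python. This sketch assumes Python 3.8 or later for `statistics.NormalDist`; `prob_between` is a helper name introduced here. Note that `NormalDist`, like the TI84, expects the standard deviation rather than the variance, hence the square roots.

```python
from math import sqrt
from statistics import NormalDist

# NormalDist takes the mean and the STANDARD DEVIATION (not the variance).
G = NormalDist(-1, sqrt(0.1))
H = NormalDist(1, sqrt(2))
I = NormalDist(2, sqrt(3))

def prob_between(dist, lo, hi):
    """P(lo < X < hi) as a difference of two CDF values."""
    return dist.cdf(hi) - dist.cdf(lo)

for name, dist in [("G", G), ("H", H), ("I", I)]:
    print(name, round(dist.cdf(2), 4), round(prob_between(dist, -1, 1), 4))
```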
Exercise 82. Let X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2). Using both the Z-tables and your
graphing calculator, find the following: (a) P (X ≥ 1) and P (Y ≥ 1). (b) P (−2 ≤ X ≤ −1.5)
and P (−2 ≤ Y ≤ −1.5). (Answer on p. 356.)

53.2 Sum of Independent Normal Random Variables
Theorem 1. If X and Y are independent normal random variables, then X + Y is also a

normal random variable. Moreover, X − Y is also a normal random variable.
Proof. Omitted.
We already knew from before that E [X ± Y ] = E [X] ± E [Y ]. Moreover, if X and Y are

independent, then V [X ± Y ] = V [X] + V [Y ]. Thus, the above theorem implies:
Corollary 2. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent and a, b ∈ R
be constants. Then $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ and more generally,
\[
aX + bY \sim N(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2).
\]
Moreover, $X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$ and more generally,
\[
aX - bY \sim N(a\mu_X - b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2).
\]
Examples:

Example 200. The weight (in kg) of a sumo wrestler is modelled by X ∼ N (200, 50).
Assume that the weight of each sumo wrestler is independent of the weight of any other
sumo wrestler.
We randomly choose two sumo wrestlers.
(a) What is the probability that their total weight is greater than 405 kg?
(b) What is the probability that one is more than 10% heavier than the other?
(a) Let X1 ∼ N (200, 50) and X2 ∼ N (200, 50) be the weight of the first and second sumo
wrestler. Then X1 + X2 ∼ N (400, 100). Thus,
\[
P(X_1 + X_2 > 405) = P\left(Z > \frac{405 - 400}{\sqrt{100}}\right) = P(Z > 0.5) = 1 - \Phi(0.5) \approx 1 - 0.6915 = 0.3085.
\]
(b) Our goal is to find p = P (X1 > 1.1X2 ) + P (X2 > 1.1X1 ). This is the probability that
the first sumo wrestler is more than 10% heavier than the second, plus the probability that
the second is more than 10% heavier than the first. Of course, by symmetry, these two
probabilities are equal. Thus, p = 2 × P (X1 > 1.1X2 ). Now,
P (X1 > 1.1X2 ) = P (X1 − 1.1X2 > 0) .
But $X_1 - 1.1X_2 \sim N(200 - 1.1 \cdot 200, 50 + 1.1^2 \cdot 50) = N(-20, 110.5)$. Thus,
\[
\begin{aligned}
P(X_1 > 1.1X_2) = P(X_1 - 1.1X_2 > 0) &= P\left(Z > \frac{0 - (-20)}{\sqrt{110.5}}\right) \\
&\approx P(Z > 1.9026) = 1 - \Phi(1.9026) \approx 1 - 0.9714 = 0.0286.
\end{aligned}
\]
Altogether then, p = 2P (X1 > 1.1X2 ) = 2 × 0.0286 = 0.0572.
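Both parts of the sumo example follow the same recipe: use Corollary 2 to find the mean and variance of the relevant linear combination, then evaluate a normal CDF. Here is a minimal Python sketch of that recipe (assuming Python 3.8 or later for `statistics.NormalDist`; the variable names are illustrative).

```python
from math import sqrt
from statistics import NormalDist

mu, var = 200.0, 50.0  # each wrestler's weight is N(200, 50), in kg and kg^2

# (a) X1 + X2 ~ N(400, 100): means add, and so do variances (independence).
total = NormalDist(2 * mu, sqrt(2 * var))
p_total_over_405 = 1 - total.cdf(405)

# (b) X1 - 1.1*X2 ~ N(200 - 1.1*200, 50 + 1.1^2 * 50) = N(-20, 110.5).
diff = NormalDist(mu - 1.1 * mu, sqrt(var + 1.1**2 * var))
p_one_heavier = 2 * (1 - diff.cdf(0))  # doubled, by symmetry between the two wrestlers

print(round(p_total_over_405, 4), round(p_one_heavier, 4))
```

The answer to (b) may differ from the worked value in the last decimal place, because the worked solution rounds Φ(1.9026) to four decimal places before doubling.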

Example 201. The weight (in kg) of a caught fish is modelled by X ∼ N (1, 0.4). The
weight (in kg) of a caught shrimp is modelled by Y ∼ N (0.1, 0.1). Assume that the weights
of any caught fish and shrimp are independent.
(a) What is the probability that the total weight of 4 caught fish and 50 caught shrimp is
greater than 10 kg?
(b) What is the probability that a caught fish weighs more than 9 times as much as a
caught shrimp?
(a) Let S be the total weight of 4 caught fish and 50 caught shrimp. Note, importantly,
that it would be wrong to write S = 4X + 50Y , because 4X + 50Y would be 4 times the
weight of a single caught fish, plus 50 times the weight of a single caught shrimp.
In contrast, we want S to be the sum of the weights of 4 independent fish and 50 independent
shrimp. Thus, we should instead write $S = X_1 + X_2 + X_3 + X_4 + Y_1 + Y_2 + \dots + Y_{50}$, where
• X1 ∼ N (1, 0.4), X2 ∼ N (1, 0.4), X3 ∼ N (1, 0.4), and X4 ∼ N (1, 0.4) are the weights of
each caught fish.
• Y1 ∼ N (0.1, 0.1), Y2 ∼ N (0.1, 0.1), . . . , and Y50 ∼ N (0.1, 0.1) are the weights of each
caught shrimp.
Now, S ∼ N (4 × 1 + 50 × 0.1, 4 × 0.4 + 50 × 0.1) = N (9, 6.6).
(Note by the way that in contrast, $4X + 50Y \sim N(9, 4^2 \times 0.4 + 50^2 \times 0.1) = N(9, 256.4)$, which
has a rather different variance!)
Thus, P (S > 10) ≈ 0.3485 (calculator).
(b) P (X > 9Y ) = P (X − 9Y > 0). But $X - 9Y \sim N(1 - 9 \times 0.1, 0.4 + 9^2 \times 0.1) = N(0.1, 8.5)$.
Thus, P (X − 9Y > 0) ≈ 0.5137 (calculator).
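The two “calculator” answers, and the difference between the sum of independent catches and a multiple of a single catch, can be reproduced as follows. This is a Python sketch assuming `statistics.NormalDist` (Python 3.8 or later); the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

fish_mean, fish_var = 1.0, 0.4      # X ~ N(1, 0.4)
shrimp_mean, shrimp_var = 0.1, 0.1  # Y ~ N(0.1, 0.1)

# (a) Sum of 4 independent fish and 50 independent shrimp: variances simply add up.
S = NormalDist(4 * fish_mean + 50 * shrimp_mean,
               sqrt(4 * fish_var + 50 * shrimp_var))
p_S_over_10 = 1 - S.cdf(10)

# The WRONG model 4X + 50Y scales single catches, so the coefficients get squared.
wrong_var = 4**2 * fish_var + 50**2 * shrimp_var

# (b) X - 9Y ~ N(1 - 9*0.1, 0.4 + 9^2 * 0.1) = N(0.1, 8.5).
D = NormalDist(fish_mean - 9 * shrimp_mean, sqrt(fish_var + 9**2 * shrimp_var))
p_fish_over_9_shrimp = 1 - D.cdf(0)

print(round(S.variance, 1), round(wrong_var, 1))
print(round(p_S_over_10, 4), round(p_fish_over_9_shrimp, 4))
```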

Exercise 83. (Answer on p. 357.) Water and electricity usage are billed, respectively, at
$2 per 1,000 litres (l) and $0.30 per kilowatt-hour (kWh). Assume that each month, the
amount of water used by Ahmad (and his family) at their HDB flat is normally distributed
with mean 25,000 l and variance 64,000,000 l². Similarly, the amount of electricity they
use is normally distributed with mean 200 kWh and variance 10,000 kWh².
Assume that monthly water usage and electricity usage are independent.
(a) Find the probability that their total water and electricity utility bill in any given month
exceeds $100.
(b) Find the probability that their total water and electricity utility bill in any given year
exceeds $1, 000.
Suppose instead that electricity usage is billed at $x per kWh.
(c) Then what is the maximum value of x for which the probability that the total utility
bill in a given month exceeds $100 is 0.1 or less?

54 The Central Limit Theorem and The Normal
Approximation
Suppose we have n independent random variables, each identically-distributed with mean
µ ∈ R and variance σ² ∈ R. Then informally, the Central Limit Theorem (CLT) says:
If n is “large enough”, then the sum of n independent,
identically-distributed random variables is
well-approximated by a normal distribution.
How large is “large enough”? The most common rule-of-thumb is that n ≥ 30 is “large
enough”, so that’s what we’ll use in this book, even though this is somewhat arbitrary.

Example 202. Let X be the random variable that is the sum of 100 rolls of a fair die. From
our earlier work, we know that each die roll has mean 3.5 and variance 35/12. Problem:
Find P(X ≥ 360) and P(X > 360).
The CLT says that since n = 100 ≥ 30 is large enough and the distribution is “nice enough”
(we are assuming this), the random variable X can be approximated by the normal random
variable Y ∼ N (100 × 3.5, 100 × 35/12) = N (350, 3500/12).
Now, in using Y as an approximation for X, we might be tempted to simply write
P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360).
Note however that X is a discrete random variable, so that P(X ≥ 360) ≠ P(X > 360).
More specifically,
P(X ≥ 360) = P(X = 360) + P(X > 360).
In contrast, Y is a continuous random variable, so that P(Y ≥ 360) = P(Y > 360). Hence, if
we simply use the approximations P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360),
then implicitly we’d be saying that P(X = 360) = 0, which is blatantly false.
To correct for this, we perform the so-called continuity correction. This says that we’ll
instead use the approximations
P(X ≥ 360) ≈ P(Y ≥ 359.5) and P(X > 360) ≈ P(Y ≥ 360.5).
Thus, P(X ≥ 360) ≈ P(Y ≥ 359.5) ≈ 0.2890 (calculator) and P(X > 360) ≈ P(Y ≥ 360.5) ≈
0.2693.
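To see how well the corrected approximation does, one can compare it with the exact distribution of the sum. The following is a minimal sketch, assuming Python's standard-library statistics.NormalDist; the exact distribution is built by adding one die at a time, and all names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

N_ROLLS = 100

# Exact distribution of the sum of N_ROLLS fair dice, built one roll at a time.
# dist[s] = P(sum of the rolls so far equals s).
dist = {0: 1.0}
for _ in range(N_ROLLS):
    new_dist = {}
    for total, prob in dist.items():
        for face in range(1, 7):
            new_dist[total + face] = new_dist.get(total + face, 0.0) + prob / 6
    dist = new_dist

exact_ge = sum(p for s, p in dist.items() if s >= 360)
exact_gt = sum(p for s, p in dist.items() if s > 360)

# Normal approximation Y ~ N(350, 3500/12), with the continuity correction.
y = NormalDist(N_ROLLS * 3.5, sqrt(N_ROLLS * 35 / 12))
approx_ge = 1 - y.cdf(359.5)   # P(X >= 360) is approximated by P(Y >= 359.5)
approx_gt = 1 - y.cdf(360.5)   # P(X > 360)  is approximated by P(Y >= 360.5)

print(f"P(X >= 360): exact {exact_ge:.4f}, approx {approx_ge:.4f}")
print(f"P(X > 360):  exact {exact_gt:.4f}, approx {approx_gt:.4f}")
```

The gap between exact_ge and exact_gt is exactly P(X = 360), which is the quantity the uncorrected approximation would implicitly set to zero.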

Continuity Correction. If X is a discrete random variable that is to be approximated
by a continuous random variable Y , then
• P (X ≥ k) ≈ P (Y ≥ k − 0.5),
• P (X ≤ k) ≈ P (Y ≤ k + 0.5),
• P (X > k) ≈ P (Y > k + 0.5),
• P (X < k) ≈ P (Y < k − 0.5).
Note that if the random variable to be approximated is itself continuous, then there is no
need to perform the continuity correction. This is illustrated in Exercise 85 below.
Exercise 84. Let X be the random variable that is the sum of 30 rolls of a fair die. Find
P(100 ≤ X ≤ 110). (Answer on p. 358.)
Exercise 85. The weight of each Coco-Pop is independently- and identically-distributed
with mean 0.1 g and variance 0.004 g². A box of Coco-Pops has exactly 5,000 Coco-Pops.
It is labelled as having a net weight of 500 g. Find the probability that the actual net
weight of the Coco-Pops in this box is less than or equal to 499 g. (Answer on p. 358.)

55 Sampling
55.1 Population
A population is simply any ordered set of objects we’re interested in.
Example 203. The two candidates for the 2016 Bukit Batok SMC By-Election are Dr.
Chee Soon Juan and PAP Guy. It is the night of the election and voting has just closed.
Our objects-of-interest are the 23, 570 valid ballots cast. (A ballot is simply a piece of paper
on which a vote is recorded. The words ballot and vote are often used interchangeably.)
Arrange the ballots in any arbitrary order. Let v1 = 1 if the first ballot is in favour of Dr.
Chee and v1 = 0 otherwise. Similarly and more generally, for any i = 2, 3, . . . , 23570, let
vi = 1 if the ith ballot is in favour of Dr. Chee and vi = 0 otherwise.
Our population here is simply the ordered set P = (v1 , v2 , . . . , v23570 ). So in this example,
the population is simply an ordered set of 1s and 0s.

55.2 Population Mean and Population Variance
The population mean µ is simply the average across all population values. The population
variance σ² is a measure of the variation across all population values. Formally:16
Definition 9. Given a finite population P = (v1 , v2 , . . . , vk ), the population mean µ and
population variance σ² are defined by
\[
\mu = \frac{\sum_{i=1}^{k} v_i}{k} = \frac{v_1 + v_2 + \dots + v_k}{k}
\quad\text{and}\quad
\sigma^2 = \frac{\sum_{i=1}^{k} (v_i - \mu)^2}{k} = \frac{(v_1 - \mu)^2 + (v_2 - \mu)^2 + \dots + (v_k - \mu)^2}{k}.
\]
Example 203 (continued from above). Suppose that of the 23,570 votes, 9,142 were
for Dr. Chee and the remaining against. So the vector (v1 , v2 , . . . , v23570 ) contains 9, 142 1s
and 14, 428 0s.
Then the population mean is
\[
\mu = \frac{v_1 + v_2 + \dots + v_n}{n} = \frac{9142 \times 1 + 14428 \times 0}{23570} = \frac{9142}{23570} \approx 0.3879.
\]
In this particular example, the population values are binary (either 0 or 1). And so we have
a nice alternative interpretation: the population mean is also the population proportion.
In this case, it is the proportion of the population who voted for Dr. Chee. So here the
proportion of votes for Dr. Chee is about 0.3879.
The population variance is
\[
\sigma^2 = \frac{(v_1 - \mu)^2 + (v_2 - \mu)^2 + \dots + (v_n - \mu)^2}{n}
= \frac{9142 \cdot \left(1 - \frac{9142}{23570}\right)^2 + 14428 \cdot \left(0 - \frac{9142}{23570}\right)^2}{23570} \approx 0.2374.
\]
As usual, the variance tells us about the degree to which the vi ’s vary. Of course, in this
example, we already know that the vi ’s can take on only two values — 0 and 1. So the
variance isn’t terribly interesting or informative in this example. In particular, it doesn’t
tell us anything more that the population mean didn’t already tell us (indeed, it can be
shown that in this example, σ² = µ − µ²).
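The arithmetic in this example, including the identity relating the variance to the mean for a population of 0s and 1s, can be checked exactly. This is a sketch using Python's standard-library fractions module; the variable names are ours.

```python
from fractions import Fraction

votes_for = 9142       # ballots recorded as 1
votes_against = 14428  # ballots recorded as 0
k = votes_for + votes_against  # population size

# Population mean: the average of all the v_i.
mu = Fraction(votes_for * 1 + votes_against * 0, k)

# Population variance: the average squared deviation from mu (denominator k).
sigma2 = (votes_for * (1 - mu) ** 2 + votes_against * (0 - mu) ** 2) / k

print(float(mu), float(sigma2))
# For a population of 0s and 1s, the variance is determined by the mean.
print(sigma2 == mu - mu**2)
```

Exact fractions are used so that the final comparison is not affected by floating-point rounding.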
16 In the case of an infinite population, the definitions of µ and σ² must be adjusted slightly, but the intuition is the same.

55.3 Parameter
Informally, a parameter is some number we’re interested in and which may be calculated
based on the population.
Example 203 (continued from above). A parameter we might be interested in is the
population mean µ, which is also the proportion of votes in favour of Dr. Chee. (Another
parameter we might be interested in is the population variance σ², but let’s ignore that for
now.)
Voting has just closed. In a few hours’ time (after the vote-counting is done), we will know
what exactly µ is. But right now, we still don’t know what µ is.
Suppose we are impatient and want to know right away what µ might be. In other words,
suppose we want to get an estimate of the true value of µ. What are some possible
methods of getting a quick estimate of µ?
One possibility is to observe a random sample of 100 votes and count the proportion of
these 100 votes that are in favour of Dr. Chee. So for example, say we do this and observe
that 39 out of the 100 votes are for Dr. Chee. That is, we find that the observed sample
mean (which in this context can also be called the observed sample proportion) is
0.39. Then we might conclude:
Based on this observed random sample of 100 votes, we estimate that µ is 0.39.
The layperson might be content with this. But the statistician digs a little deeper and asks
questions such as:
• How do we know if this estimate is “good”?
• What are the criteria to determine whether an estimate is “good”?
We’ll now try to address, if only to a limited extent, these questions. But to do so, we must
first precisely define terms like sample and estimate.

55.4 Distribution of a Population
Informally,17 the distribution of a population tells us
1. The range of possible values taken on by the objects in the population; and
2. The proportion of the population that takes on each possible value.
Example 203 (continued from above). The population is P = (v1 , v2 , . . . , v23570 ), the
ordered set of 23570 ballots. Suppose that of these, 9, 142 are votes for Dr. Chee (hence
recorded as 1s) and the remaining 14, 428 are for PAP Guy (hence recorded as 0s).
Then the distribution of the population can informally be described in words as:
• A proportion 9142/23570 of the population are 1s, and
• A proportion 14428/23570 of the population are 0s.
Example 204. The population is P = (3, 4, 7, 7, 2, 3).
Then the distribution of the population can informally be described in words as:
• A proportion 1/6 of the population are 2s;
• A proportion 2/6 of the population are 3s;
• A proportion 1/6 of the population are 4s; and
• A proportion 2/6 of the population are 7s.
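The informal description above amounts to a table of values and proportions, which can be sketched as follows. This assumes only Python's standard library; note that Fraction reduces 2/6 to 1/3.

```python
from collections import Counter
from fractions import Fraction

population = (3, 4, 7, 7, 2, 3)

# Map each distinct value to the proportion of the population taking that value.
counts = Counter(population)
distribution = {value: Fraction(count, len(population))
                for value, count in sorted(counts.items())}

for value, proportion in distribution.items():
    print(value, proportion)
```

The proportions necessarily sum to 1, since every object in the population takes exactly one value.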
17
Formally, we’d define the population distribution as a function. Indeed, some writers define the population itself as the
distribution function.

55.5 A Random Sample
Informally, to observe a random sample of size n, we follow this procedure: Imagine the
23, 570 ballots are in a single big bag.
1. Randomly pull out one ballot. Record the vote (either we write x1 = 1, if the vote was
for Dr. Chee, or we write x1 = 0, if it wasn’t).
2. Put this ballot back in (this second step is why we call it sampling with replacement).
3. Repeat the above n times in total, so as to record down the values of x1 , x2 , . . . , xn .
We call (x1 , x2 , . . . , xn ) an observed random sample of size n. Note that this is an

ordered set (or vector) of numbers. Formally:
Definition 10. Let P be a population. Then the random vector (i.e. ordered set of random
variables) (X1 , X2 , . . . , Xn ) is a random sample of size n from the population P if
• X1 , X2 , . . . , Xn are independent; and

• X1 , X2 , . . . , Xn are identically-distributed, with the same distribution as P .
As always, we must be careful to distinguish between a function and a value taken on by
the function. This table summarises.

Function                                  | Value taken by the function
------------------------------------------|----------------------------------------------------------
f is a function                           | f (x) is a possible value taken on by the function
X is a random variable                    | x is a possible observed value of the random variable
(X1 , X2 , . . . , Xn ) is a random sample | (x1 , x2 , . . . , xn ) is a possible observed random sample

An example to illustrate:

Example 203 (continued from above). To repeat, the distribution of the population
P = (v1 , v2 , . . . , v23570 ) can informally be described in words as:
• 9142/23570 of the population were 1s; and
• 14428/23570 of the population were 0s.
Let X1 , X2 , and X3 be independent random variables, each with the same distribution as
the population. That is, for each i = 1, 2, 3,
\[
P(X_i = 0) = \frac{14428}{23570} \quad\text{and}\quad P(X_i = 1) = \frac{9142}{23570}.
\]
The ordered set (or vector) (X1 , X2 , X3 ) is a random sample of size 3.
An example of an observed random sample of size 3 might be (x1 , x2 , x3 ) = (1, 1, 0)

— this would be where we randomly sample 3 ballots (with replacement) and find that the
first two are votes for Dr. Chee but the third is not.
Another example of an observed random sample of size 3 might be (x1 , x2 , x3 ) = (0, 0, 0) —

this would be where we randomly sample 3 ballots (with replacement) and find that none
of the three are for Dr. Chee.
As another example, (X1 , X2 , X3 , X4 , X5 ) is a random sample of size 5.
An example of an observed random sample of size 5 might be (x1 , x2 , x3 , x4 , x5 ) =

(0, 1, 0, 1, 0) — this would be where we randomly sample 5 ballots (with replacement) and
find that only the second and fourth are votes for Dr. Chee.
Another example of an observed random sample of size 5 might be (x1 , x2 , x3 , x4 , x5 ) =

(1, 1, 0, 1, 1) — this would be where we randomly sample 5 ballots (with replacement) and
find that only the third is not a vote for Dr. Chee.
In this textbook, we’ll be very careful to distinguish between a random sample (which is
a vector of random variables) and an observed random sample (which is a vector of real
numbers).
This may be contrary to the practice of your teachers or indeed even the A-level exams.
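The bag-of-ballots procedure (pull one out, record it, put it back) can be sketched in a few lines. This assumes Python's standard-library random module; the function name observe_random_sample and the seed are hypothetical choices made for illustration only.

```python
import random

# The population: 9142 ones (votes for Dr. Chee) and 14428 zeros.
population = [1] * 9142 + [0] * 14428

def observe_random_sample(population, n, rng):
    """Draw n values with replacement: pull a ballot, record it, put it back."""
    return tuple(rng.choice(population) for _ in range(n))

rng = random.Random(2016)  # arbitrary fixed seed, so that reruns give the same draws
x3 = observe_random_sample(population, 3, rng)
x5 = observe_random_sample(population, 5, rng)
print(x3, x5)
```

Here the function plays the role of the random sample (X1 , X2 , . . . , Xn ), while each tuple it returns is one observed random sample (x1 , x2 , . . . , xn ).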

55.6 Sample Mean and Sample Variance
Definition 11. Let S = (X1 , X2 , . . . , Xn ) be a random sample of size n. Then the corresponding
sample mean X̄ and the sample variance of S are the random variables defined
by:
\[
\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n},
\]
\[
S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}.
\]
(The List of Formulae you get during exams will contain the observed sample variance.)
Note that strangely enough, the denominator of S² is n − 1, rather than n as one might
expect. As we’ll see later, there is a good reason for this.
By the way, there are two other formulae for calculating the sample variance:
Fact 13. Let S = (X1 , X2 , . . . , Xn ) be a random sample of size n. Let X̄ be the sample
mean and S² be the sample variance. Let a ∈ R be a constant. Then
\[
\text{(a)}\quad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \dfrac{\left[\sum_{i=1}^{n} X_i\right]^2}{n}}{n - 1}
\quad\text{and}\quad
\text{(b)}\quad S^2 = \frac{\sum_{i=1}^{n} (X_i - a)^2 - \dfrac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2}{n}}{n - 1}.
\]
(The List of Formulae has (a) but not (b).)
Proof. Omitted.
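Although the proof is omitted, it is easy to check numerically that the definition and the two shortcut formulae agree on a given observed sample. The sketch below uses exact fractions; the function names are ours, and agreement on a few samples is of course not a proof.

```python
from fractions import Fraction

def s2_definition(xs):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def s2_shortcut(xs, a=0):
    """Fact 13(b); with a = 0 this reduces to Fact 13(a)."""
    n = len(xs)
    shifted = [x - a for x in xs]
    return (sum(d * d for d in shifted) - Fraction(sum(shifted) ** 2, n)) / (n - 1)

xs = (1, 0, 0)  # an observed random sample of size 3
print(s2_definition(xs), s2_shortcut(xs), s2_shortcut(xs, a=7))
```

The constant a can be anything; in practice it is chosen close to the data so that the sums involve small numbers.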

Once again, it is important to distinguish between
• The sample mean X̄ (a random variable) vs. the observed sample mean x̄ (a real
number).
• The sample variance S² (a random variable) vs. the observed sample variance s²
(a real number).
Example 203 (continued from above). Let (X1 , X2 , X3 ) be a random sample of size 3.
The corresponding sample mean X̄ and sample variance S² are these random variables:
\[
\bar{X} = \frac{X_1 + X_2 + X_3}{3}, \qquad
S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2}{3 - 1}.
\]
Suppose our observed random sample of size 3 is (1, 0, 0). Then the corresponding observed
sample mean x̄ and observed sample variance s² are these real numbers:
\[
\bar{x} = \frac{x_1 + x_2 + x_3}{n} = \frac{1 + 0 + 0}{3} = \frac{1}{3},
\]
\[
s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{n - 1}
= \frac{\left(1 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2}{3 - 1} = \frac{1}{3}.
\]
Let (X1 , X2 , X3 , X4 , X5 ) be a random sample of size 5. The corresponding sample mean X̄
and sample variance S² are these random variables:
\[
\bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}, \qquad
S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_5 - \bar{X})^2}{5 - 1}.
\]
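Suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the corresponding
observed sample mean x̄ and observed sample variance s² are these real numbers:
\[
\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{n} = \frac{0 + 1 + 0 + 0 + 1}{5} = \frac{2}{5} = 0.4,
\]
\[
s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2 + (x_5 - \bar{x})^2}{n - 1}
= \frac{\left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2}{5 - 1}
= \frac{1.2}{4} = 0.3.
\]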

We call a random variable an estimator if it is used to generate estimates (“guesses”)
for some parameter. Example:
Example 203 (continued from above). It is the night of the election and polling has
just closed. We still do not know the true proportion µ that voted for Dr. Chee.
We decide to get a random sample of size 3: (X1 , X2 , X3 ). The corresponding sample mean
X̄3 = (X1 + X2 + X3 ) /3 shall be an estimator for µ. (Informally, an estimator is a method
for generating “guesses” for some unknown parameter, in this case µ.)
This estimator is used to generate estimates (“guesses”) for µ. For every observed
random sample, the estimator generates an estimate.
Suppose our observed random sample of size 3 is (1, 0, 0). We calculate the corresponding
observed sample mean to be x̄ = 1/3. We say that x̄ = 1/3 is an estimate for µ.
(By the way, unless we are extremely lucky, it is highly unlikely that the true value of the
unknown parameter µ is precisely 1/3. After all, 1/3 is merely an estimate obtained from
a single observed random sample of size 3.)
Suppose instead that our observed random sample of size 3 were (0, 1, 1). Then the cor-
responding observed sample mean would be x̄ = 2/3. We’d instead say that x̄ = 2/3 is our
estimate for µ.
There is also more than one estimator we can use. For example, suppose instead that we
decide to get a random sample of size 5: (X1 , X2 , X3 , X4 , X5 ). We shall instead use the
corresponding sample mean X̄ = (X1 + X2 + X3 + X4 + X5 ) /5 as our estimator for µ. And
so for example suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the
corresponding observed sample mean is x̄ = 0.4, and this would be our estimate for µ.
Now, are these estimators and estimates “good” or “reliable”? How much should we
trust them? These are questions that we’ll address in the next section.
A different example:

Example 205. Suppose we wish to find the average height µ (in cm) of an adult male.
As a practical matter, it would be quite difficult to locate and record the height of every
adult male in the world. So instead, what we might do is to randomly pick 4 adult males
and record their heights. This gives us a random sample (H1 , H2 , H3 , H4 ) of heights. The
corresponding sample mean is the random variable H̄ = (H1 + H2 + H3 + H4 ) /4. H̄ shall
serve as our estimator for µ.
Suppose our observed random sample is (h1 , h2 , h3 , h4 ) = (178, 165, 182, 175).
Then the corresponding observed sample mean is
\[
\bar{h} = \frac{h_1 + h_2 + h_3 + h_4}{n} = \frac{178 + 165 + 182 + 175}{4} = 175.
\]
Thus, h̄ = 175 serves as an estimate (or “guess”) of the true average male height µ.
Again, are the estimator H̄ and estimate h̄ = 175 “good” or “reliable”? How much should
we trust them? These are questions that we’ll address in the next section.

Example 206. Let X be the random variable that is the height (in cm) of an adult
female Singaporean. Our parameters-of-interest are the true population mean µ and true
population variance σ² of X. We wish to generate estimates for µ and σ².
To this end, we get a random sample of size 8: (X1 , X2 , . . . , X8 ). The corresponding sample
mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X8 ) /8 will serve as our estimator for µ. And the corresponding
sample variance $S^2 = \sum_{i=1}^{8} (X_i - \bar{X})^2 / (8 - 1)$ will serve as our estimator for σ².
(a) Suppose our observed random sample is such that
\[
\sum_{i=1}^{8} x_i = 1320 \quad\text{and}\quad \sum_{i=1}^{8} x_i^2 = 218360.
\]
Then the observed sample mean x̄ and the observed sample variance s² are
\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1320}{8} = 165,
\]
\[
s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}
= \frac{218360 - \dfrac{1320^2}{8}}{7} = 80.
\]
And our estimates for µ and σ² are, respectively, 165 cm and 80 cm².
(b) Suppose instead our observed random sample is such that
\[
\sum_{i=1}^{8} (x_i - 160) = 72 \quad\text{and}\quad \sum_{i=1}^{8} (x_i - 160)^2 = 1560.
\]
Then the observed sample mean x̄ and the observed sample variance s² are
\[
\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{n} (x_i - 160 + 160)}{n}
= \frac{\sum_{i=1}^{n} (x_i - 160)}{n} + 160 = \frac{72}{8} + 160 = 169,
\]
\[
s^2 = \frac{\sum_{i=1}^{n} (x_i - 160)^2 - \dfrac{\left[\sum_{i=1}^{n} (x_i - 160)\right]^2}{n}}{n - 1}
= \frac{1560 - \dfrac{72^2}{8}}{7} \approx 130.3.
\]
And our estimates for µ and σ² are, respectively, 169 cm and 130.3 cm².
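Both parts of this example follow the same recipe, which can be sketched as one small function. This is an illustration only: estimates_from_sums is a hypothetical helper name, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def estimates_from_sums(n, sum_d, sum_d2, a=0):
    """Return (x_bar, s2) given sum(x_i - a) and sum((x_i - a)^2)."""
    x_bar = Fraction(sum_d, n) + a
    s2 = (sum_d2 - Fraction(sum_d**2, n)) / (n - 1)
    return x_bar, s2

# (a) Raw sums, i.e. a = 0.
x_bar_a, s2_a = estimates_from_sums(8, 1320, 218360)
print(x_bar_a, s2_a)

# (b) Sums of deviations from a = 160.
x_bar_b, s2_b = estimates_from_sums(8, 72, 1560, a=160)
print(x_bar_b, s2_b, round(float(s2_b), 1))
```

Note that the shift a affects only the mean (it is added back at the end); the variance formula is unchanged by it.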

Exercise 86. Calculate the observed sample mean and variance for the following observed
random sample of size 7: (3, 14, 2, 8, 8, 6, 0). (Answer on p. 358.)
Exercise 87. (Answer on p. 358.) Let X be the random variable that is the weight (in
kg) of an American. Suppose we are interested in estimating the true population mean µ
and variance σ² of X. We get an observed random sample of size 10: (x1 , x2 , . . . , x10 ).
(a) Suppose you are told that $\sum_{i=1}^{10} x_i = 1885$ and $\sum_{i=1}^{10} x_i^2 = 378265$. Find the observed sample
mean x̄ and observed sample variance s².
(b) Suppose you are instead told that $\sum_{i=1}^{10} (x_i - 50) = 1885$ and $\sum_{i=1}^{10} (x_i - 50)^2 = 378265$. Find
the observed sample mean x̄ and observed sample variance s².

55.7 Sample Mean and Sample Variance are Unbiased Estimators
Earlier we asked: How do we decide if an estimator and the estimates it generates are
“good”? How do we know whether to trust any given estimate?
For H1 Maths, we’ll learn only about one (important) criterion for deciding whether an
estimator is “good”. This is unbiasedness. Informally, an estimator is unbiased if on
average, the estimator “gets it right”. Formally:
Definition 12. Let X be a random variable and θ ∈ R be a parameter (i.e. just some real
number). We say that X is an unbiased estimator for θ if
E [X] = θ.
If x is an estimate generated by an unbiased estimator X, then we call x an unbiased

estimate.
The next proposition says that the sample mean X̄ is an unbiased estimator for the
population mean µ; and the sample variance S² is an unbiased estimator for the
population variance σ².
Proposition 3. Let (X1 , X2 , . . . , Xn ) be a random sample of size n drawn from a distribution
with population mean µ and population variance σ². Let X̄ be the sample mean and
S² be the sample variance. Then
(a) E [X̄] = µ. And
(b) E [S²] = σ².
Proof. You are asked to prove (a) in Exercise 89. The proof of (b) is omitted.
Proposition 3(b) is the reason why, strangely enough, we define the sample variance with
n − 1 in the denominator:
\[
S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}.
\]
As defined, S² is an unbiased estimator for the population variance σ². This, then, is the
reason why we define it like this.
Some writers call S² the unbiased sample variance, but we shall not bother doing so. We’ll
simply call S² the sample variance.

Example 203 (continued from above). (Chee Soon Juan election.)
Suppose two observed random samples of size 3 are (x1 , x2 , x3 ) = (1, 0, 0) and (x1 , x2 , x3 ) =
(1, 0, 1). The corresponding observed sample means are x̄1 = 1/3 and x̄2 = 2/3. These are
two possible estimates (“guesses”) of the true population proportion µ.
Unless we’re extremely lucky, it’s unlikely that either of these two estimates is exactly
correct. Nonetheless, what the above unbiasedness proposition tells us is this:
Suppose the unknown population mean is µ = 0.39. We draw the following 10 observed
random samples of size 3 (table below). For each sample i, we calculate the corresponding
observed sample mean x̄i .
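Sample i | x1 | x2 | x3 | x̄i
---------|----|----|----|----
1        | 1  | 0  | 1  | 2/3
2        | 0  | 0  | 0  | 0
3        | 0  | 1  | 0  | 1/3
4        | 1  | 0  | 0  | 1/3
5        | 0  | 1  | 1  | 2/3
6        | 1  | 0  | 0  | 1/3
7        | 0  | 0  | 0  | 0
8        | 0  | 0  | 0  | 0
9        | 0  | 0  | 1  | 1/3
10       | 1  | 1  | 0  | 2/3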
Note that every estimate x̄i is wrong. Indeed, since the sample mean X̄i can only take on
values 0, 1/3, 2/3, or 1, the estimates can never possibly be equal to the true µ = 0.39.
Nonetheless, what the above proposition says informally is that on average, the estimate
gets it correct. Formally, E [X̄] = µ = 0.39.
For a demonstration that you can play around with, try this Google spreadsheet.
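For samples of size 3 from a population of 0s and 1s, the expectations in Proposition 3 can also be computed exactly by listing all 8 possible observed samples. The sketch below assumes µ = 0.39 as in the example and uses exact fractions; it also shows what would happen if the sample variance were defined with n instead of n − 1 in the denominator.

```python
from fractions import Fraction
from itertools import product

mu = Fraction(39, 100)     # assumed true proportion, as in the example
sigma2 = mu * (1 - mu)     # population variance of a population of 0s and 1s
n = 3

e_xbar = e_s2 = e_div_n = Fraction(0)
for sample in product((0, 1), repeat=n):
    # Probability of this particular observed sample, by independence.
    ones = sum(sample)
    prob = mu**ones * (1 - mu) ** (n - ones)
    xbar = Fraction(ones, n)
    ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
    e_xbar += prob * xbar
    e_s2 += prob * ss / (n - 1)   # the sample variance S^2
    e_div_n += prob * ss / n      # the same sum, but divided by n

print(e_xbar == mu, e_s2 == sigma2, e_div_n == sigma2 * (n - 1) / n)
```

All three comparisons hold exactly, so dividing by n would on average underestimate the population variance by the factor (n − 1)/n.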

Exercise 88. (Answer on p. 359.) We are interested in the weight (in kg) of Singaporeans.
We have an observed random sample of size 5: (32, 88, 67, 75, 56).
(a) Find unbiased estimates for the population mean µ and variance σ 2 of the weights of
Singaporeans. (State any assumptions you make.)
(b) What is the average weight of a Singaporean?
Exercise 89. Prove that E [X̄] = µ. (This is part (a) of Proposition 3). (Answer on p.
359.)
Exercise 90. Suppose we flip a coin 10 times. The first 7 flips are heads and the next 3
are tails. Let 1 denote heads and 0 denote tails. (Answer on p. 360.)
(a) Write down, in formal notation, our observed random sample, the observed sample
mean, and observed sample variance.
(b) Are these observed sample mean and variance unbiased estimates for the true population
mean and variance?
(c) Can we conclude that this is a biased coin (i.e. the true population mean is not 0.5)?

55.8 The Sample Mean is a Random Variable
This section is just to repeat, stress, and emphasise that the sample mean X̄ is itself a
random variable. This is an important point.
Indeed, the sample mean X̄ is both (i) a random variable; and (ii) an estimator. In
contrast, an observed sample mean x̄ is both (i) a real number; and (ii) an estimate.
We’ve shown that E [X̄] = µ. This equation can be interpreted in two equivalent ways:
• The expected value of the sample mean equals the population mean µ.
• The sample mean is an unbiased estimator for the population mean µ.
We now give the variance of the sample mean. It turns out to be equal to the population
variance σ², divided by the sample size n.
Fact 14. $V[\bar{X}] = \dfrac{\sigma^2}{n}$.
Proof. You are asked to prove this fact in Exercise 91 .
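Before attempting the proof, it may help to see the fact hold in a small case. The sketch below takes the population to be a fair die and lists all equally likely outcomes of n rolls to compute the variance of the sample mean exactly. The function name is ours, and a numerical check like this is not a substitute for the proof.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                      # 7/2
sigma2 = sum((f - mu) ** 2 for f in faces) / 6    # 35/12

def variance_of_sample_mean(n):
    """Exact V[X-bar] for n rolls, by listing all 6**n equally likely outcomes."""
    means = [Fraction(sum(rolls), n) for rolls in product(faces, repeat=n)]
    e_mean = sum(means) / len(means)
    return sum((m - e_mean) ** 2 for m in means) / len(means)

for n in (1, 2, 3, 4):
    print(n, variance_of_sample_mean(n), sigma2 / n)
```

Each printed row shows the same fraction twice, which is what the fact predicts for these values of n.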
Exercise 91. Prove Fact 14. (Hint: Note that X̄ = (1/n) (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) and X1 , X2 , . . . ,
Xn are independent.) (Answer on p. 360.)
Exercise 92. For each of the following terms, give a formal definition and an intuitive ex-
planation. (State whether each term is a random variable or a real number.) For simplicity,
you may assume that the finite population is given by P = (x1 , x2 , . . . , xk ). (Answer on p.
361.)
(a) The population mean.
(b) The population variance.
(c) The sample mean.
(d) The sample variance.
(e) The mean of the sample mean.
(f) The variance of the sample mean.
(g) The mean of the sample variance.
(h) The observed sample mean.
(i) The observed sample variance.

55.9 The Distribution of the Sample Mean
Fact 15. Let X1 , X2 , . . . , Xn ∼ N (µ, σ²) be independent random variables. Then
\[
\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} \sim N\left(\mu, \frac{\sigma^2}{n}\right).
\]
Proof. Corollary 2 tells us that the sum of normal random variables is itself a normal
random variable. So X1 + X2 + ⋅ ⋅ ⋅ + Xn is a normal random variable.
Fact 11 tells us that a linear transformation of a normal random variable is itself a normal
random variable. So X̄n = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is a normal random variable.
In the previous sections, we already showed that X̄n has mean µ and variance σ²/n.
Altogether then, $\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
Now, suppose instead X1 , X2 , . . . , Xn are not normally-distributed. Surprisingly, a similar
result still holds, thanks to the CLT. Informally, draw X1 , X2 , . . . , Xn from any distribution.
Then thanks to the CLT:
If n is “large enough”, then X̄n = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is
well-approximated by a normal distribution with mean µ
and variance σ²/n.
In the next chapter, we’ll make greater use of the two results just given in this section.

55.10 Non-Random Samples
Some examples to illustrate the concept of a non-random sample:
Example 207. Suppose we’re interested in the average height of a Singaporean. The only
way to know this for sure is to survey every single Singaporean. This, however, is not
practical.
Instead, we have only the resources to survey 100 individuals. We decide to go to a bas-
ketball court and measure the heights of 100 people there. We thereby gather an ob-
served sample of size 100: (x1 , x2 , . . . , x100 ). We find that the average individual’s height is
x̄ = ∑ xi /100 = 179 cm.
Is x̄ = 179 cm an unbiased estimate of the average Singaporean’s height? Intuitively, we

know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked a basketball
court, where the individuals are overwhelmingly (i) male; and (ii) taller than average. Our
estimate x̄ = 179 cm is thus probably biased upwards.
Example 208. Suppose we’re interested in what the average Singaporean family spends
on food each month. The only way to know this for sure is to survey every single family in
Singapore. This, however, is not practical.
Instead, we have only the resources to survey 100 families. We decide to go to Sixth
Avenue and randomly ask 100 families living there what they reckon they spend on food
each month. We thereby gather an observed sample of size 100: (x1 , x2 , . . . , x100 ). We find
that the average family spends x̄ = ∑ xi /100 = $2, 700 on food each month.
Is x̄ = $2, 700 an unbiased estimate of the average monthly spending on food by a Singa-
porean family? Intuitively, we know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked an unusually
affluent neighbourhood. Our estimate x̄ = $2, 700 is thus probably biased upwards.

56 Null Hypothesis Significance Testing (NHST)
Here’s a quick sketch of how Null Hypothesis Significance Testing (NHST) works:
Example 209. A piece of equipment has probability θ of breaking down. We have many
pieces of the same type of equipment. Assume the rates of breakdown across the pieces of
equipment are identical and independent.
1. Write down a null hypothesis H0 . In this case, it might be “H0 : θ = 0.6”.

2. Write down an alternative hypothesis HA . In this case, it might be “HA : θ < 0.6”.
(This is a one-tailed test — to be explained shortly.)

3. Observe a random sample. For example, we might have an observed random sample
of size 5, where only the fourth piece of equipment breaks down. And so we’d write
(x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 1, 0).
4. Write down a test statistic. In this case, an obvious test statistic is the sample number
of failures T = X1 + X2 + X3 + X4 + X5 . Our observed test statistic is thus t = x1 + x2 +
x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
5. Now ask, how likely is it that — if H0 were true — our test statistic would have been
“at least as extreme as” that actually observed? That is, what is the probability
P (Observe data as extreme as that observed∣H0 )?
The above probability is called the p-value of the observed sample.
In this case, the p-value is the probability of observing a random sample where 1 or fewer
pieces of equipment broke down, assuming H0 ∶ θ = 0.6 were true. That is,
p = P (T ≤ t = 1∣H0 ) .
Now, remember that T is a random variable. In fact, it’s a binomial random variable.
Assuming H0 to be true, we have T ∼ B (n, θ) = B (5, 0.6). Thus,
\[
p = P(T \le 1 \mid H_0) = P(T = 0 \mid H_0) + P(T = 1 \mid H_0)
= \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 = 0.08704.
\]
This says that if H0 were true, then the probability of observing a test statistic as extreme
as the one we actually observed is only 0.08704. We might interpret this relatively small
p-value as casting doubt on or providing evidence against H0 .
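The p-value in this example is just a sum of two binomial probabilities, which can be sketched as follows. This assumes Python's standard-library math.comb (available from Python 3.8); binom_pmf is our own helper, not a library function.

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(T = k) for T ~ B(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta) ** (n - k)

n, theta0 = 5, 0.6   # sample size, and the value of theta under H0
t = 1                # observed number of breakdowns

# Lower-tail p-value, because HA is theta < 0.6.
p_value = sum(binom_pmf(k, n, theta0) for k in range(0, t + 1))
print(round(p_value, 5))
```

The sum runs over k = 0, 1 because "at least as extreme as" the observed t = 1 means, for this HA, one breakdown or fewer.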

Here is the full list of the ingredients that go into NHST.
Null Hypothesis Significance Testing (NHST)
1. Null hypothesis H0 (e.g. “this equipment has probability 0.6 of breaking down”).
2. Alternative hypothesis HA (e.g. “this equipment has probability less than 0.6 of
breaking down”). The test is either one-tailed or two-tailed, depending on HA .
3. A random sample of size n: (X1 , X2 , . . . , Xn ).
4. A test statistic T (which simply maps each observed random sample to a real number).
5. The p-value of the observed sample. This is the probability that — assuming H0 were
true — T takes on values that are at least “as extreme as” the actual observed test
statistic t.
6. The significance level α. This is a pre-selected threshold, usually chosen to be some

small value. The conventional significance levels are α = 0.1, α = 0.05, or α = 0.01.
We then conclude qualitatively that:

• A small p-value casts doubt on or provides evidence against H0 .
• A large p-value fails to cast doubt on or provide evidence against H0 .
In particular, if p < α, then we say that we reject H0 at the significance level α. And
if p ≥ α, then we say that we fail to reject H0 at the significance level α.
Note importantly that to reject H0 (at some significance level α) does NOT mean that H0
is false and HA is true. Similarly, failure to reject H0 does NOT mean that H0 is true and
HA is false. More on this below.
Another example of NHST, now slightly more formally and carefully presented.

Example 203 (continued from above). (Dr. Chee election example.) Our parameter of interest is µ, the
proportion of votes for Dr. Chee. We guess that Dr. Chee won only 30% of the votes. We
might thus write down two competing hypotheses:
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
We call H0 the null hypothesis and HA the alternative hypothesis.
We pre-select α = 0.05 as our significance level. This is the arbitrary threshold at which
we’ll say we reject (or fail to reject) H0 .
We gather a random sample of 100 votes: (X1 , X2 , . . . , X100 ). Our test statistic is the
number of votes in favour of Dr. Chee, given by
T = X1 + X2 + ⋅ ⋅ ⋅ + X100 .
Suppose that in our observed random sample (x1 , x2 , . . . , x100 ), we find that 39 are in favour
of Dr. Chee. Our observed test statistic is thus t = 39.
We now ask: What is the probability that — assuming H0 were true — T takes on values
that are at least “as extreme as” the actual observed test statistic t? That is, what is the
p-value of the observed sample?
Now, assuming H0 were true, T is a binomial random variable with parameters 100 and
0.3. That is, T ∼ B (n, p) = B (100, 0.3). So:
\[
\begin{aligned}
p &= P(T \ge 39 \mid H_0) = P(T = 39 \mid H_0) + P(T = 40 \mid H_0) + \dots + P(T = 100 \mid H_0) \\
  &= \binom{100}{39} 0.3^{39} \, 0.7^{61} + \binom{100}{40} 0.3^{40} \, 0.7^{60} + \dots + \binom{100}{100} 0.3^{100} \, 0.7^{0} \approx 0.03398.
\end{aligned}
\]
The small p-value casts doubt on or provides evidence against H0 .
And since p ≈ 0.03398 < α = 0.05, we can also say that we reject H0 at the α = 0.05
significance level.
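The tail sum above is tedious by hand but easy to check numerically. Here is a minimal Python sketch that uses only the standard library. The function names `binomial_pmf` and `binomial_upper_tail` are our own labels for illustration, not standard terminology.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(T = k) when T ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_upper_tail(t, n, p):
    """P(T >= t) when T ~ B(n, p): the one-tailed p-value when HA says 'greater'."""
    return sum(binomial_pmf(k, n, p) for k in range(t, n + 1))


alpha = 0.05                                   # pre-selected significance level
p_value = binomial_upper_tail(39, 100, 0.3)    # observed t = 39, H0: mu = 0.3
print(round(p_value, 5))                       # compare with the p-value found above
print(p_value < alpha)                         # True means "reject H0 at level alpha"
```

Note that the code only reports whether p < α. As stressed above, that is all that "reject H0" means.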

A wee bit of philosophy. The interpretation of probability and statistics used in the A-levels
(and thus also in this textbook) is called the objectivist interpretation. You needn’t
know much about this, but what you should know is this:
Let θ be the parameter we’re interested in. Under the objectivist interpretation, the
value of θ may be unknown, but it is fixed.
This has two consequences:
1. We never speak probabilistically about θ, because θ is a fixed number. For example, we

never say “θ is probably less than 0.6” or “θ has probability 0.8 of being between 0.4
and 0.7”. Such statements are nonsensical.
2. The null hypothesis, which is always written as an equality (e.g. “H0 ∶ θ = 0.6”), is
almost certainly false. After all, θ can (usually) take on a continuum of values. So do
NOT interpret “we fail to reject H0 ” to mean “H0 is true”. This is because H0
is almost certainly false.
When performing NHST, we will assiduously avoid saying things like “H0 is true”, “H0 is
false”, “HA is true”, or “HA is false”. Instead, we will stick strictly to saying either “we
reject H0 at the significance level α” or “we fail to reject H0 at the significance level α”.
Each of these two statements has a very precise meaning. The first says that p < α. The
second says that p ≥ α. Nothing more and nothing less.
Exercise 93. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level,
whether the coin is biased towards heads. (Answer on p. 362.)

56.1 One-Tailed vs Two-Tailed Tests
In the previous section, all the NHST we did were one-tailed tests.18 For example, in the
NHST done for Dr. Chee, we had
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
This was a one-tailed test because the alternative hypothesis HA was that µ was to the
right of 0.3.
Suppose instead that we changed the alternative hypothesis to:
H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.
Then this would be called a two-tailed test, because the alternative hypothesis HA is that
µ is either to the left or to the right of 0.3.
We now repeat the examples done in the previous section, but with HA tweaked so that we
instead have two-tailed tests. The difference is that the p-value is calculated differently.
18. By the way, the more common convention is to say “one-tailed” and “two-tailed” tests, rather than “one-tail” and “two-tail” tests, as is the norm in Singapore (similar to those “Close for break” signs you sometimes see). But after some consultation with my grammatical experts, I have been told that both are equally correct.

Example 209 (equipment breakdown). Everything is as before, except that we now
change the alternative hypothesis:
H0 ∶ θ = 0.6,
HA ∶ θ ≠ 0.6.
Say we observe the same random sample as before: (x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 1, 0).
Again our test statistic is the sample number of failures T = X1 + X2 + X3 + X4 + X5 . And

so again our observed test statistic is t = x1 + x2 + x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
The difference now is how the p-value (of the observed sample) is calculated. In words, the p-value gives the probability that our test statistic is “at least as extreme as” that actually observed, assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≤ t = 1.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean that either of two events occurs: the event T ≤ t = 1, or the event that T is at least as far away on the other side of E [T ∣H0 ] = 3. The second event is, specifically, T ≥ 5. Altogether then, the p-value is given by
$$
\begin{aligned}
p &= P(T \le 1 \text{ or } T \ge 5 \mid H_0) \\
&= P(T = 0 \mid H_0) + P(T = 1 \mid H_0) + P(T = 5 \mid H_0) \\
&= \binom{5}{0} 0.6^{0}\, 0.4^{5} + \binom{5}{1} 0.6^{1}\, 0.4^{4} + \binom{5}{5} 0.6^{5}\, 0.4^{0} = 0.1648.
\end{aligned}
$$
Since p = 0.1648 ≥ α = 0.1, we say that we fail to reject H0 at the α = 0.1 significance
level.
Observe that previously, under the one-tailed test, we could reject H0 at the α = 0.1
significance level, because there p = 0.08704. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
In general, all else equal, the p-value for an observed random sample is greater under a
two-tailed test than under a one-tailed test. Thus, under a two-tailed test, we are less
likely to reject H0 .
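The "at least as far away from E[T∣H0] on either side" rule can be sketched in a few lines of Python. This is only an illustration of the rule as described in this section. The function names and the small tolerance `tol` (which guards against floating-point round-off) are our own choices.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(T = k) when T ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def one_tailed_lower(t, n, p):
    """P(T <= t | H0): one-tailed p-value when HA says 'less than'."""
    return sum(binomial_pmf(k, n, p) for k in range(0, t + 1))


def two_tailed(t, n, p, tol=1e-9):
    """Sum P(T = k | H0) over all k at least as far from E[T | H0] = n*p as t is."""
    mean = n * p
    distance = abs(t - mean)
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if abs(k - mean) >= distance - tol
    )


# Equipment breakdown example: n = 5, H0: theta = 0.6, observed t = 1.
print(round(one_tailed_lower(1, 5, 0.6), 5))   # one-tailed p-value
print(round(two_tailed(1, 5, 0.6), 5))         # two-tailed p-value
```

For these inputs the two-tailed sum picks out exactly the values k = 0, 1, 5, matching the three terms in the calculation above.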

Example 205 (Dr. Chee election). We change the alternative hypothesis:
H0 ∶µ = 0.3,
HA ∶µ ≠ 0.3.
Say we observe the same random sample as before: (x1 , x2 , . . . , x100 ), in which 39 votes were
in favour of Dr. Chee. So again our observed test statistic is t = x1 + x2 + ⋅ ⋅ ⋅ + x100 = 39.
The difference now is how the p-value (of the observed sample) is calculated. In words, the p-value gives the probability that our test statistic is “at least as extreme as” that actually observed, assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≥ t = 39.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean that either of two events occurs: the event T ≥ t = 39, or the event that T is at least as far away on the other side of E [T ∣H0 ] = 30. The second event is, specifically, T ≤ 21. Altogether then, the p-value is given by
$$
\begin{aligned}
p &= P(T \le 21 \text{ or } T \ge 39 \mid H_0) = 1 - P(22 \le T \le 38 \mid H_0) \\
&= 1 - \left[P(T = 22 \mid H_0) + P(T = 23 \mid H_0) + \dots + P(T = 38 \mid H_0)\right] \\
&= 1 - \left[\binom{100}{22} 0.3^{22}\, 0.7^{78} + \binom{100}{23} 0.3^{23}\, 0.7^{77} + \dots + \binom{100}{38} 0.3^{38}\, 0.7^{62}\right] \approx 0.06281.
\end{aligned}
$$
Since p = 0.06281 ≥ α = 0.05, we say that we fail to reject H0 at the α = 0.05 significance
level.
Again observe that previously, under the one-tailed test, we could reject H0 at the α = 0.05
significance level, because there p = 0.03398. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
Exercise 94. We flip a coin 20 times and get 17 heads. Test, at the 5% significance level,
whether the coin is biased. (Answer on p. 362.)

56.2 The Abuse of NHST (Optional)
NHST is popular because it gives a simplistic, formulaic cookbook procedure. Moreover,

its conclusion appears to be binary: either we reject H0 or we fail to reject H0 .
However, NHST is widely misunderstood, misinterpreted, and misused even within scientific
communities. It has long been heavily criticised. In March 2016, the American Statistical
Association even issued an official policy statement on how NHST should be used!
Here I discuss only the most important, commonly-made error.

We may write the p-value as
p = P (D∣H0 ) ,
where D stands for the observed data and H0 stands for the null hypothesis. The p-value answers the following question: assuming H0 were true, what’s the probability that we’d get data “at least as extreme” as those actually observed (D)?
Say we get a p-value of 0.03. We should then say simply that
• The small p-value casts doubt on or provides evidence against H0 .

• If the pre-selected significance level was α = 0.05, then we may say that we reject H0
at the 5% significance level.
However, instead of merely saying the above, some researchers may instead conclude that:
H0 is true with probability 0.03.
Do you see the error here? The researcher has gone from the finding that p = P (D∣H0 ) = 0.03
to the conclusion that P (H0 ∣D) = 0.03.
The error is the same as leaping from “A lottery ticket buyer who doesn’t cheat has a small
probability q of winning” to “Jane bought a lottery ticket and won. Therefore, there is only
probability q that she didn’t cheat.”
The p-value is NOT the probability that H0 is true.19 Instead, it is the probability that
— assuming H0 were true — we would have gotten data “at least as extreme” as those
actually observed. This is an important difference. But it is also a subtle one, which is why
even researchers get confused.
19. Indeed, under the objectivist view, such a statement is nonsensical anyway, because H0 is either true or not true; it makes no sense to talk probabilistically about whether H0 is true.

56.3 Common Misinterpretations of the Margin of Error (Optional)
The sampling error or margin of error is often misinterpreted by laypersons (and

journalists).
Example 210. On the night of the 2016 Bukit Batok SMC By-Election, the Elections
Department announced* that based on a sample count of 900 ballots,
• Dr. Chee had won 39% of the votes.

• These sample counts have a confidence level of 95%, with a ±4% margin of error.
What does the above gobbledygook mean? Let µ be the true proportion of votes won by
Dr. Chee. Let X̄ be the sample proportion and x̄ be the observed sample proportion.
It’s clear enough what the 39% means — they randomly counted 900 ballots and found
(after accounting for any spoilt votes) that x̄ = 39% were in favour of Dr. Chee.
What’s less clear is what the 95% confidence level and ±4% margin of error mean.
Here are three possible interpretations of what is meant. Only one is correct.
1. “With probability 0.95, µ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43).”
2. “With probability 0.95, X̄ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43).”
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between 0.35 and 0.43.
3. “With probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04).”

We have no idea what µ is. All we can say is that with probability 0.95, the sample mean
X̄ of votes for Dr. Chee is between µ − 0.04 and µ + 0.04.
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between µ − 0.04 and µ + 0.04.
Take a moment to understand what each of the above interpretations says. Then decide
which you think is the correct interpretation, before turning to the next page.

*Sources: The Straits Times [backup], TodayOnline [backup].

(... Example from the previous page ...)
Interpretation #1 — “with probability 0.95, µ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43)” — is

perhaps the one most commonly made by laypersons.* It makes two errors:
1. It is nonsensical to speak probabilistically about the proportion µ of votes
won by Dr. Chee. µ is some fixed number. So either µ is in the interval (0.35, 0.43), or
it isn’t. It makes no sense to speak probabilistically about whether µ is in that interval.
2. The margin of error is applicable to the true proportion µ and not to the observed
sample proportion x̄ = 0.39.
Some “authorities” often attempt** to correct Interpretation #1 by offering Interpretation #2: “with probability 0.95, X̄ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43)”. However, Interpretation #2 is still wrong, because it still makes the second of the above two errors.
Unfortunately, the correct interpretation is also the one that says the least. It is Interpretation #3: “with probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04)”.
This interpretation says merely that if we were somehow able to repeatedly observe random
samples of size 900, then we’d find that 0.95 of the corresponding observed sample means
will be in (µ − 0.04, µ + 0.04). Which isn’t saying much, because first of all, we have only one
observed random sample; we do not get to repeatedly observe random samples. Secondly,
this still doesn’t tell us much about µ, which is what we’re really interested in.
The correct interpretation (Interpretation #3) is the least interesting interpretation. Perhaps this explains why journalists often prefer to give an incorrect interpretation.
*E.g. the article “Margin of Ignorance” (backup) begins by reporting poll results that Kerry-Edwards was supported by 51%
of voters, while Bush-Cheney was supported by 45%. The author then ridicules other journalists for their misinterpretation
of these data. (He also claims, incorrectly, that polling is based on the Central Limit Theorem.) He then triumphantly
gives the “correct” explanation: “95 times out of 100 the true Kerry-Edwards number will fall between 47 and 55 and the
Bush-Cheney number will fall between 41 and 49.” This, of course, is what we called incorrect Interpretation #1 above.
**Section 3 of “Erring on the Margin of Error” lists some such mistakes.
For a discussion of where the Elections Department’s ±4% margin of error comes from,
please see the Appendices of my H2 Mathematics Textbook.

Journalists often try to explain what the confidence level and margin of error mean — they
almost always get it wrong.
Example 211. On the night of the 2016 Bukit Batok SMC By-Election, a website called
Mothership.sg wrote:
“Based on the sample count of 100 votes,* it was revealed at 9.26pm that the SDP Sec-Gen
received 39 percent of votes. In other words, Chee would score 35 per cent in the worst
case scenario and 43 per cent in the best case scenario.”
This is the most absurd misinterpretation of the margin of error I have ever seen.**
Let’s see what the correct worst- and best-case scenarios are.
Suppose that in the observed random sample of 900 votes, exactly 39% or 0.39 × 900 = 351
were votes for Dr. Chee and the remaining 549 were for PAP Guy. Then:
• Worst-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of Dr. Chee. That is, Dr. Chee won only 351 votes
and PAP Guy won the remaining 23,570 − 351 = 23,219 votes. So the correct worst-case
scenario is that Dr. Chee won ≈ 1.5% of the votes.
• Best-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of PAP Guy. That is, PAP Guy won only 549 votes
and Dr. Chee won the remaining 23,570 − 549 = 23,021 votes. So the correct best-case
scenario is that Dr. Chee won ≈ 97.7% of the votes.
These worst- and best-case scenarios are admittedly unlikely. Nonetheless, they are possible scenarios all the same. The journalist’s purported worst- and best-case scenarios are completely wrong.
*By the way, even this basic fact was wrong. The sample count was not 100 votes. Instead, it was 900 votes, consisting of
100 votes from each of 9 polling stations.
Moreover, the Mothership.sg journalist failed to report the confidence level of 95%, either because he didn’t know what it
meant or because he didn’t think it important. But it is important. It is pointless to inform the reader about the margin of
error without also specifying the confidence level.
**You can find several misinterpretations of the margin of error collected in this academic paper: “Erring in the Margin of
Error”. None is as absurdly bad as the one here.

56.4 Critical Region and Critical Value
Informally, the critical region is the set of values of the observed test statistic t for which
we would reject the null hypothesis. The critical region is thus sometimes also called the
rejection region.
And the critical value(s) is (are) the exact value(s) of the observed test statistic t at
which we are just able to reject the null hypothesis.
Example 205. (Dr. Chee election.)
Say that as before, we have a one-tailed test where the two competing hypotheses are:
H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.
Say that as before, we choose α = 0.05 as our significance level.
Say that as before, in our observed random sample of 100 votes, 39 are in favour of Dr.
Chee, so that our observed test statistic is t = 39.
We calculated that the corresponding p-value is 0.03398 and so we were able to reject H0
at the α = 0.05 significance level.
We now calculate the critical region and the critical value. We can calculate that if
t = 38, then the corresponding p-value is ≈ 0.053 (you should verify this for yourself). And
so we would be unable to reject H0 .
We thus conclude that the critical value is 39, because this is the value of t at which we
are just able to reject H0 .
And the critical region is the set {39, 40, 41, . . . , 100}. These are the values at which we’d
be able to reject H0 at the α = 0.05 significance level.
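One way to find the critical value without trial and error is to scan t upwards until the one-tailed p-value first drops below α. Below is a minimal Python sketch of that search. The helper names are our own, and the scan relies on the fact that P(T ≥ t) can only shrink as t grows.

```python
from math import comb


def upper_tail(t, n, p):
    """P(T >= t) when T ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))


def critical_value_upper(n, p, alpha):
    """Smallest t whose one-tailed p-value P(T >= t | H0) is below alpha."""
    for t in range(n + 1):
        if upper_tail(t, n, p) < alpha:
            return t
    return None  # no observable t would lead to rejection


c = critical_value_upper(100, 0.3, 0.05)
critical_region = list(range(c, 101))          # every t from c up to n = 100
print(c)
print(round(upper_tail(c - 1, 100, 0.3), 3))   # p-value just outside the region
print(round(upper_tail(c, 100, 0.3), 5))       # p-value at the critical value
```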

Same example as above, but now two-tailed:
Example 205. (Dr. Chee election.)
Say that as before, we have a two-tailed test where the two competing hypotheses are:
H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.
The significance level is again α = 0.05. Again, the observed random sample of 100 votes
contains 39 in favour of Dr. Chee, so that our observed test statistic is t = 39.
We calculated that the corresponding p-value is 0.06281 and so we failed to reject H0 at

the α = 0.05 significance level.
We calculate that if t = 40, then the corresponding p-value is ≈ 0.03745 (you should verify
this for yourself). Thus, the critical values are 20 and 40, because these are the values of t
at which we are just able to reject H0 .
The critical region is the set {0, 1, . . . , 20, 40, 41, . . . , 100}. These are the values at which
we’d be able to reject H0 at the α = 0.05 significance level.
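For the two-tailed case, the critical region can be found by brute force: compute the two-tailed p-value for every possible t and keep those below α. The sketch below assumes the same "at least as far from E[T∣H0]" rule used earlier in this chapter. The function name and the tolerance `tol` are our own choices.

```python
from math import comb


def two_tailed_p(t, n, p, tol=1e-9):
    """Sum P(T = k | H0) over all k at least as far from n*p as t is."""
    mean = n * p
    distance = abs(t - mean)
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k - mean) >= distance - tol
    )


n, p0, alpha = 100, 0.3, 0.05
region = [t for t in range(n + 1) if two_tailed_p(t, n, p0) < alpha]

# The critical values are the two edges of the region nearest to E[T | H0] = 30.
lower = max(t for t in region if t < n * p0)
upper = min(t for t in region if t > n * p0)
print(lower, upper)
```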
Exercise 95. (Answer on p. 363.) We flip a coin 20 times. What are the critical region
and critical value(s) in
(a) A test, at the 5% significance level, of whether the coin is biased towards heads.
(b) A test, at the 5% significance level, of whether the coin is biased.

(Small Sample, Normal Distribution, σ² Known)
Example 212. The weight (in mg) of a grain of sand is X ∼ N (µ, 9). Our unknown
parameter of interest is the true population mean µ (i.e. the true average weight of a grain
of sand). Our “guess” is that µ = 5. We thus write down two competing hypotheses:
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
(Note that this is a two-sided test.)
We take a random sample of size 4 — (X1 , X2 , X3 , X4 ). Our test statistic is the sample
mean X̄ = (X1 + X2 + X3 + X4 ) /4.
Our observed random sample is (x1 , x2 , x3 , x4 ) = (3, 9, 11, 7). That is, we randomly pick
four grains of sand that happen to have weights 3, 9, 11, and 7 mg. Then the observed test
statistic is
$$\bar{x} = \frac{3 + 9 + 11 + 7}{4} = 7.5.$$
The p-value is the probability that the test statistic X̄ takes on values “at least as extreme
as” our observed test statistic x̄ = 7.5, assuming H0 ∶ µ = 5 were true. Note that if H0 were
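true, then X̄ ∼ N (µ, σ²/n) = N (5, 9/4). Thus, the p-value is given by
$$
\begin{aligned}
p &= P(\bar{X} \ge 7.5 \text{ or } \bar{X} \le 2.5 \mid H_0) = P(\bar{X} \ge 7.5 \mid H_0) + P(\bar{X} \le 2.5 \mid H_0) \\
&= P\left(Z \ge \frac{7.5 - 5}{\sqrt{9/4}}\right) + P\left(Z \le \frac{2.5 - 5}{\sqrt{9/4}}\right) \approx 0.04779 + 0.04779 = 0.09558.
\end{aligned}
$$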
Thus, we reject H0 at the α = 0.1 significance level. However, we would fail to reject H0 at
the α = 0.05 significance level.
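The same calculation can be sketched in Python with the standard library's `statistics.NormalDist`, which gives the standard normal CDF. The function name `two_tailed_z_test` is our own. It assumes, as in the example, that the population variance is known.

```python
from math import sqrt
from statistics import NormalDist, mean


def two_tailed_z_test(x_bar, mu0, variance, n):
    """Return (z, p) for H0: mu = mu0 against HA: mu != mu0, with known variance."""
    z = (x_bar - mu0) / sqrt(variance / n)
    p = 2 * (1 - NormalDist().cdf(abs(z)))   # both tails, by symmetry
    return z, p


weights = [3, 9, 11, 7]                      # observed sample, in mg
z, p = two_tailed_z_test(mean(weights), mu0=5, variance=9, n=len(weights))
print(round(z, 4), round(p, 5))
print(p < 0.1, p < 0.05)                     # reject at 0.1? reject at 0.05?
```

Doubling one tail works here only because the normal distribution is symmetric about its mean. It would not be valid for the binomial examples earlier in this chapter.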

The table below summarises the tests to use for the population mean, in different circumstances. In this section, we learnt how to handle the first case (any sample size, normal distribution, σ² known). The following sections will deal with the other two cases.
Sample size   Distribution   σ²        Test
Any           Normal         Known     Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large         Any            Known     Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large         Any            Unknown   Z-test: (X̄ − µ)/(s/√n) ∼ N(0, 1).
Small         Normal         Unknown   Not in A-levels.
Small         Non-normal     Either    Not in A-levels.
Exercise 96. The Singapore daily high temperature (in °C) can be modelled by X ∼
N (µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the
true average daily high temperature). Your friend guesses that µ = 34. You gather
the following data on daily high temperatures, of 10 randomly-chosen days in 2015:
(35, 35, 31, 32, 33, 34, 31, 34, 35, 34). Test your friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and alternative hypotheses.) (Answer on p. 364.)

(Large Sample, Any Distribution, σ² Known)
We’ll recycle the same example from the previous section. Before, we knew that X was
normally distributed. Now the big difference is that we have absolutely no idea what
distribution X comes from!
To compensate, we require also that our random sample is “large enough”, so that the
CLT-approximation can be used.
Example 213. The weight (in mg) of a grain of sand is X ∼ (µ, 9). (This says simply that
X is distributed with mean µ and variance 9.) Our unknown parameter of interest is the
true population mean µ (i.e. the true average weight of a grain of sand). Again, we “guess”
that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
(Note that this is, again, a two-sided test.)
This time, we’ll take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test
statistic is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Recall the magic of the CLT. Even if we have absolutely no idea what distribution X is drawn from, provided n is sufficiently large, X̄ is approximately normally distributed. So here, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has, approximately, the normal distribution N (µ, σ²/n). So, if H0 were true, then we have, approximately, X̄ ∼ N (µ, σ²/n) = N (5, 9/100).
Say the observed test statistic we get is:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6.$$

Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by
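$$
\begin{aligned}
p &= P(\bar{X} \ge 5.6 \text{ or } \bar{X} \le 4.4 \mid H_0) = P(\bar{X} \ge 5.6 \mid H_0) + P(\bar{X} \le 4.4 \mid H_0) \\
&\overset{\text{CLT}}{\approx} P\left(Z \ge \frac{5.6 - \mu}{\sigma/\sqrt{n}}\right) + P\left(Z \le \frac{4.4 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z \ge \frac{5.6 - 5}{\sqrt{9/100}}\right) + P\left(Z \le \frac{4.4 - 5}{\sqrt{9/100}}\right) \\
&= P(Z \ge 2) + P(Z \le -2) \approx 0.0455.
\end{aligned}
$$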
Thus, we reject H0 at the α = 0.05 significance level.
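To see the CLT approximation at work, here is a small simulation sketch in Python. The example does not say what distribution X has, so the population below (2 plus an exponential random variable with mean 3, giving mean 5 and variance 9) is purely our own assumption, chosen because it is clearly non-normal. The seed and the number of repetitions are likewise arbitrary.

```python
import random
from statistics import NormalDist

random.seed(0)   # arbitrary seed, only so that the run is repeatable

N = 100          # sample size, as in the example
REPS = 10_000    # number of simulated samples (an arbitrary choice)


def sample_mean():
    # Assumed population: 2 + Exponential(mean 3), so mean 5 and variance 9.
    return sum(2 + random.expovariate(1 / 3) for _ in range(N)) / N


# Fraction of simulated samples whose mean is at least 0.6 away from 5.
extreme = sum(abs(sample_mean() - 5) >= 0.6 for _ in range(REPS))
simulated = extreme / REPS

clt_value = 2 * (1 - NormalDist().cdf(2))   # P(Z >= 2) + P(Z <= -2)
print(simulated, round(clt_value, 4))
```

If H0 were true and the CLT approximation is good, the simulated fraction should land close to the CLT value, even though the assumed population is heavily skewed.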
Exercise 97. The Singapore daily high temperature (in °C) can be modelled by X ∼ (µ, 8).
Our unknown parameter of interest is the true population mean µ (i.e. the true average
daily high temperature). Your friend guesses that µ = 34. You gather the data on daily high
temperatures, of 100 randomly-chosen days in 2015 and find the observed sample average
temperature to be 33.4 °C. Test your friend’s hypothesis, at the α = 0.05 significance level.
(Be sure to write down your null and alternative hypotheses. Also, clearly state where you
use the CLT.) (Answer on p. 364.)

(Large Sample, Any Distribution, σ² Unknown)
We’ll recycle the same example from the previous section. Again, we have absolutely no
idea what distribution X comes from. And again, the random sample is large enough, so
that the CLT can be used.
But now, σ² is unknown. This turns out to be no big deal. We can simply replace σ²
with the observed unbiased sample variance s², and do the same thing as before.
Example 214. The weight (in mg) of a grain of sand is X ∼ (µ, σ²). (This says simply
that X is distributed with mean µ and variance σ².) Our unknown parameter of interest
is the true population mean µ (i.e. the true average weight of a grain of sand). Again, we
“guess” that µ = 5. Again, we write down
H0 ∶ µ = 5,
HA ∶ µ ≠ 5.
(Note that this is, again, a two-sided test.)
Again, we take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test statistic
is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Again, since the sample is large (n = 100 ≥ 20), by the CLT, X̄ has, approximately, the normal distribution N (µ, σ²/n). So, if H0 were true, then we have, approximately, X̄ ∼ N (µ, σ²/n) = N (5, σ²/100). Since the sample variance S² is an unbiased estimator for σ², it is plausible that we also have, approximately, X̄ ∼ N (5, s²/100), where s² is the observed sample variance.
Say the observed sample mean and observed sample variance we get are:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6 \quad\text{and}\quad s^2 = \frac{\sum_{i=1}^{100} (x_i - \bar{x})^2}{n - 1} = 8.$$

Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by
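$$
\begin{aligned}
p &= P(\bar{X} \ge 5.6 \text{ or } \bar{X} \le 4.4 \mid H_0) = P(\bar{X} \ge 5.6 \mid H_0) + P(\bar{X} \le 4.4 \mid H_0) \\
&\overset{\text{CLT}}{\approx} P\left(Z \ge \frac{5.6 - \mu}{s/\sqrt{n}}\right) + P\left(Z \le \frac{4.4 - \mu}{s/\sqrt{n}}\right) = P\left(Z \ge \frac{5.6 - 5}{\sqrt{8/100}}\right) + P\left(Z \le \frac{4.4 - 5}{\sqrt{8/100}}\right) \\
&\approx P(Z \ge 2.1213) + P(Z \le -2.1213) \approx 0.03389.
\end{aligned}
$$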
Thus, we reject H0 at the α = 0.05 significance level.
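Here is a short Python sketch of the calculation above, working from the summary statistics x̄ = 5.6 and s² = 8. The last two lines use a tiny made-up data set (ours, purely for illustration) to show that `statistics.variance` divides by n − 1, which is the unbiased sample variance used here.

```python
from math import sqrt
from statistics import NormalDist, variance

n, x_bar, s2, mu0 = 100, 5.6, 8, 5       # summary statistics from the example

z = (x_bar - mu0) / sqrt(s2 / n)         # s^2 stands in for the unknown sigma^2
p = 2 * (1 - NormalDist().cdf(abs(z)))   # two-tailed p-value, via the CLT
print(round(z, 4), round(p, 5))

toy = [1, 2, 3, 4]                       # hypothetical data, not from the example
print(variance(toy))                     # sum of squared deviations divided by n - 1
```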
Exercise 98. The Singapore daily high temperature (in °C) can be modelled by X ∼ (µ, σ²). Our unknown parameter of interest is the true population mean µ (i.e. the true average daily high temperature). Your friend guesses that µ = 34. You gather the data on daily high temperatures, of 100 randomly-chosen days in 2015. Your observed sample mean temperature is 33.4 °C and your observed sample variance is 11.2 °C². Test your friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null and alternative hypotheses. Also, clearly state where you use the CLT.) (Answer on p. 365.)

56.8 Formulation of Hypotheses
Example 215. We flip a coin 100 times. We get 100 heads. What can we say about the
coin?
This is an open-ended question, to which there can be many different answers. Here’s the
answer we’re taught to give for H2 Maths:
Let µ be the probability that a coin-flip is heads. We formulate a pair of competing

hypotheses:
H0 ∶ µ = 0.5,
HA ∶ µ ≠ 0.5.
Our test statistic T is the number of heads (out of 100 coin-flips). Our observed test
statistic t is 100. The corresponding p-value (note that this is a two-tailed test) is
$$
\begin{aligned}
P(T \ge 100 \text{ or } T \le 0 \mid H_0) &= P(T = 0 \mid H_0) + P(T = 100 \mid H_0) \\
&= \binom{100}{0} 0.5^{0}\, 0.5^{100} + \binom{100}{100} 0.5^{100}\, 0.5^{0} \approx 1.578 \times 10^{-30}.
\end{aligned}
$$
The tiny p-value may be interpreted as casting doubt on or providing evidence against H0 .
We note also that we can easily reject H0 at any of the conventional significance levels
(α = 0.1, α = 0.05, or α = 0.01).
Exercise 99. (Answer on p. 365.) We observe the weights (in kg) of a random sample of
50 Singaporeans: (x1 , x2 , . . . , x50 ). We observe that ∑ xi /50 = 68 and ∑ xi² /50 = 5000.
A friend claims that the average American is heavier than the average Singaporean. It
is known that the average American weighs 75 kg. Is your friend correct? If you make
any assumptions or approximations, make clear exactly where you do so. (Hint: Use Fact
13(a)).

57 Correlation and Linear Regression
57.1 Bivariate Data and Scatter Diagrams
In this chapter, we’ll be interested in the relationship between two sets of data.
Example 216. We measure the heights and weights of 10 adult male Singaporeans. Their
heights (in cm) and weights (in kg) are given in this table:
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
We call (hi , wi ) observation i. So for example, observation 5 is (178, 72) and observation
9 is (150, 44).
We can plot a scatter diagram of these 10 persons’ weights (vertical axis) against their
heights (horizontal).
[Figure: scatter diagram of weight (kg, vertical axis) against height (cm, horizontal axis), with a black dotted line of best fit.]
The black dotted line is called a line of best fit. Shortly (section 57.4), we’ll learn how
to construct this line of best fit.
The more closely the data points in the above scatter diagram lie to a straight line, the more
strongly linearly-correlated are weight and height. So here with these particular data,
the linear correlation between weight and height seems strong. In the next section, we’ll
learn about the product moment correlation coefficient, which is a way to precisely
quantify the degree to which two sets of data are linearly-correlated.
Because the line of best fit is upward-sloping, we can also say that the linear correlation is
positive.

Example 217. We have data from the Clementi weather station for the daily high temper-
ature (in °C) and daily rainfall (in mm) on 361 days in 2015. (Strangely, data were missing
for four days, namely Feb 10-13.)
i          1      2      3      4      ...   361
ti (°C)    27.3   29.5   31.1   32     ...   30.2
pi (mm)    0      0.2    0      0      ...   12.4
We can again plot a scatter diagram of rainfall against temperature.
[Figure: scatter diagram of rainfall (mm, vertical axis) against temperature (degrees Celsius, horizontal axis), with a black dotted line of best fit.]
Again, the black dotted line is a line of best fit. The data points do not seem close to this
line. Thus, it seems that the linear correlation between temperature and rainfall is weak.
The line of best fit is downward-sloping and so we say that the linear correlation is negative.
Exercise 100. (Answer on p. 366.) The table below shows the prices charged (p) and the
number of haircuts (q) given by 5 different barbers, during June 2016.
Draw a scatter diagram with price on the horizontal axis. Plot also what you think looks
like a line of best fit.
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400

57.2 Product Moment Correlation Coefficient (PMCC)
In the previous section, we used a scatter diagram to determine if there was a plausible
linear relationship between two sets of data. This, though, was a very crude method.
A more precise measure of the degree to which two sets of data are linearly correlated is
called the product moment correlation coefficient (PMCC). Formally:
Definition 13. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of real numbers.
Then their product moment correlation coefficient (PMCC), denoted r, is the real number
defined by
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$
Properties of the PMCC.
1. −1 ≤ r ≤ 1.
2. We say the linear correlation is positive if r > 0 and negative if r < 0.
3. If r = 1, then the data points lie exactly on an upward-sloping line (and we say the linear
correlation is perfect).

4. If r = −1, then the data points lie exactly on a downward-sloping line (and we say the
linear correlation is perfect).
5. If r is close to 1, then the data points lie close to an upward-sloping line (and we say the
linear correlation is very strong).
6. If r is close to −1, then the data points lie close to a downward-sloping line (and we say
the linear correlation is very strong).

7. If r is close to 0, then we say that the linear correlation is very weak.
8. r is merely a measure of linear correlation and nothing else. Two variables may be very
closely related but not linearly-correlated. For example, data generated by the quadratic
model yi = xi² may have a very low r.
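A few of these properties can be checked with a direct Python translation of Definition 13. The function name `pmcc` and the small data sets below are our own, made up purely for illustration.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sqrt(sxx) * sqrt(syy))


xs = [-2, -1, 0, 1, 2]                       # hypothetical x-values
print(pmcc(xs, [2 * x + 1 for x in xs]))     # points on an upward-sloping line
print(pmcc(xs, [-3 * x for x in xs]))        # points on a downward-sloping line
print(pmcc(xs, [x**2 for x in xs]))          # exact quadratic relationship
```

In the third case y is completely determined by x, yet for these symmetric x-values the positive and negative products cancel in the numerator. This illustrates Property 8: r measures linear correlation only.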

Example 216 (continued from above). This is the height and weight example revisited.
For convenience, we reproduce the data and scatter diagram:
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
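[Figure: scatter diagram of weight (kg, vertical axis) against height (cm, horizontal axis), with a black dotted line of best fit.]
$$
\begin{aligned}
\bar{h} &= \frac{182 + 165 + 173 + 155 + 178 + 174 + 169 + 160 + 150 + 190}{10} = 169.6, \\
\bar{w} &= \frac{81 + 70 + 71 + 53 + 72 + 75 + 69 + 60 + 44 + 80}{10} = 67.5, \\
\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) &= (182 - \bar{h})(81 - \bar{w}) + \dots + (190 - \bar{h})(80 - \bar{w}) = 1237, \\
\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2} &= \sqrt{(182 - 169.6)^2 + \dots + (190 - 169.6)^2} \approx 37.180640, \\
\sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2} &= \sqrt{(81 - 67.5)^2 + \dots + (80 - 67.5)^2} \approx 35.418922, \\
\implies r &= \frac{\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w})}{\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2}\, \sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2}} \approx 0.9393.
\end{aligned}
$$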
As expected, r > 0 (the linear correlation is positive or, equivalently, the line of best fit is
upward-sloping). Moreover, r is close to 1 (the linear correlation is very strong).
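The intermediate quantities in this example can be reproduced with a few lines of Python. This sketch simply follows the definition step by step. The variable names are our own.

```python
from math import sqrt

heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

n = len(heights)
h_bar = sum(heights) / n
w_bar = sum(weights) / n

# Numerator: sum of products of deviations from the two means.
s_hw = sum((h - h_bar) * (w - w_bar) for h, w in zip(heights, weights))
# Denominator: product of the square roots of the sums of squared deviations.
root_s_hh = sqrt(sum((h - h_bar) ** 2 for h in heights))
root_s_ww = sqrt(sum((w - w_bar) ** 2 for w in weights))

r = s_hw / (root_s_hh * root_s_ww)
print(h_bar, w_bar)
print(round(s_hw, 4), round(root_s_hh, 6), round(root_s_ww, 6), round(r, 4))
```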

Example 217 (continued from above). This is the temperature and rainfall example
revisited. For convenience, we reproduce the data and scatter diagram:
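i          1      2      3      4      ...   361
ti (°C)    27.3   29.5   31.1   32     ...   30.2
pi (mm)    0      0.2    0      0      ...   12.4
[Figure: scatter diagram of rainfall (mm, vertical axis) against temperature (degrees Celsius, horizontal axis), with a black dotted line of best fit.]
$$
\bar{t} = \frac{27.3 + 29.5 + 31.1 + 32 + \dots + 30.2}{361} \approx 31.5, \qquad \bar{p} = \frac{0 + 0.2 + 0 + 0 + \dots + 12.4}{361} \approx 5.0.
$$
$$
\begin{aligned}
\implies r &= \frac{\sum_{i=1}^{n} (t_i - \bar{t})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n} (t_i - \bar{t})^2}\, \sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2}} \\
&= \frac{(27.3 - 31.5)(0 - 5.0) + \dots + (30.2 - 31.5)(12.4 - 5.0)}{\sqrt{(27.3 - 31.5)^2 + \dots + (30.2 - 31.5)^2}\, \sqrt{(0 - 5.0)^2 + \dots + (12.4 - 5.0)^2}} \\
&\approx -0.1623.
\end{aligned}
$$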
As expected, r < 0 (the linear correlation is negative or, equivalently, the line of best fit is
downward-sloping). Moreover, r is fairly close to 0 (the linear correlation is weak).

Exercise 101. Compute the PMCC between p and q, using the data below. (Answer on
p. 366.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400

57.3 Correlation Does Not Imply Causation (Optional)
Correlation does not imply causation. This saying has now become a cliché. Doesn’t make
it any less true.
Below is an amusing but spurious correlation (source).
[Figure (tylervigen.com): line chart for 1999 to 2009 showing that US spending on science, space, and technology correlates with suicides by hanging, strangulation and suffocation.]
The PMCC is r ≈ 0.99789126. So the two sets of data are almost perfectly linearly-
correlated. But of course, this doesn’t mean that spending on science causes suicides
or that suicides cause spending on science. More likely, the correlation is simply spurious.
A comic from xkcd:

57.4 Linear Regression
Example 216 (continued from above). We suspect that the heights and weights of adult
male Singaporeans are linearly-correlated. We thus write down this linear model:
w = a + bh.
Recall the quote: “All models are wrong, but some are useful.” The model w = a + bh is
unlikely to be exactly correct. But hopefully it will be useful.
We treat a and b as unknown parameters (do you expect b to be positive or negative?).

Our goal is to try to get estimates for a and b, from an observed random sample of height
and weight data.
We recycle the data from earlier. These, along with the scatter diagram, are reproduced
for convenience.
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
[Figure: scatter diagram of weight (kg) against height (cm), with three candidate lines of best fit.]
The basic idea of linear regression is this: Find the line that “best fits” the given data.
Drawn in the figure above are three plausible candidates for the “line of best fit”. But there
can only be one line of best fit. Which is it?
At the end of the day, we’ll choose the black dotted line as “the” line of best fit. But why?
This will be answered in the next section.

Example 217 (continued from above). We suspect that daily rainfall and daily high
temperatures for 2015 were linearly-correlated. We thus write down this linear model:
p = a + bt.
Again, our goal is to get estimates for the unknown parameters a and b (do you expect b
to be positive or negative?).
We gather the following data (recycled from before):
i         1     2     3     4     ...   361
ti (°C)   27.3  29.5  31.1  32    ...   30.2
pi (mm)   0     0.2   0     0     ...   12.4
[Figure: scatter diagram of daily rainfall (mm) against daily high temperature (°C), with several candidate lines of best fit.]
Again, drawn in the figure above are several plausible candidates for the “line of best fit”.
It turns out that the black dotted line will be “the” line of best fit.

57.5 Ordinary Least Squares (OLS)
There are different methods for determining “the” line of best fit. Different methods will
generally give different lines of best fit.
The method we’ll learn in H1 Maths is the most basic and most standard method. It is
called the method of ordinary least squares (OLS).
Let’s assume there is some true linear model, which may be written as y = a+bx. As always,
we stick to the objectivist interpretation. The parameters a and b have some true, fixed
values. However, they are unknown (and may forever be unknown).
Nonetheless, we’ll try to do our best and get estimates for a and b. These estimates will be
denoted â and b̂. And our line of best fit will then be y = â + b̂x.
How do we find this line of best fit? Intuitively, this will be the line to which the data
points are “as close as possible”. But there are many ways to define the term “as close
as possible”. For example, we could try to minimise the sum of the distances between the
points and the line. But we shall not do this.
Instead, we’ll use the method of OLS:
1. Measure the vertical distance of each data point $(x_i, y_i)$ from the line. This is called the
residual and is denoted $\hat{u}_i$.
2. Our goal is to find the line $y = \hat{a} + \hat{b}x$ that minimises $\sum \hat{u}_i^2$. This quantity is called the
Sum of Squared Residuals (SSR).
Example:

Example 216 (height and weight example revisited). Our candidate line of best fit
is w = â+ b̂h = 65+0h = 65. This is a horizontal line, which simply “predicts” that everyone’s
weight is always 65 kg, regardless of their height. (This is a somewhat silly candidate line
of best fit. Not surprisingly, this is not the actual line of best fit.)
[Figure: scatter diagram of weight (kg) against height (cm), with the horizontal candidate line w = 65.]
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 65 65 65 65 65 65 65 65 65 65
ûi = wi − ŵi (kg) 16 5 6 −12 7 10 4 −5 −21 15
The second last row of the above table gives, for each person with height hi , the correspond-
ing predicted weight ŵi (as per our candidate line of best fit). The residual ûi (last row) is
then defined as the vertical distance between the data point and the weight predicted by
the candidate line of best fit.
The SSR is
\[ \sum_{i=1}^{10} \hat{u}_i^2 = 16^2 + 5^2 + 6^2 + (-12)^2 + 7^2 + 10^2 + 4^2 + (-5)^2 + (-21)^2 + 15^2 = 1317. \]
Can we do better than this? That is, can we find another candidate line of best fit whose SSR
is smaller than 1317?
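To experiment with other candidate lines, the SSR calculation can be sketched in Python as follows. The helper name `ssr` is hypothetical; it simply adds up the squared vertical distances described above.

```python
def ssr(xs, ys, a, b):
    """Sum of squared residuals for the candidate line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

# The horizontal candidate line w = 65 + 0h from the example above.
candidate_ssr = ssr(heights, weights, a=65, b=0)
print(candidate_ssr)  # 1317
```

Trying other values of `a` and `b` shows how the SSR changes from one candidate line to another.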

The following fact gives two formulae for b̂, the slope of the line of best fit. Formula (i) is
printed in the List of Formulae you get during exams, but formula (ii) is not.
Fact 16. Let $(x_1, x_2, \dots, x_n)$ and $(y_1, y_2, \dots, y_n)$ be two ordered sets of data. The OLS
regression line of y on x is $y - \bar{y} = \hat{b}(x - \bar{x})$, where
\[ \text{(i)} \quad \hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \]
\[ \text{(ii)} \quad \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}. \]
Moreover, the regression line can also be written in the form $y = \hat{a} + \hat{b}x$, where $\hat{b}$ is as given
above and $\hat{a} = \bar{y} - \hat{b}\bar{x}$.
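Proof. We want to find $\hat{a}$ and $\hat{b}$ such that the line $y = \hat{a} + \hat{b}x$ has the smallest SSR possible.
The residual $\hat{u}_i$ is defined as the vertical distance between $(x_i, y_i)$ and the line $y = \hat{a} + \hat{b}x$.
That is,
\[ \hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{a} + \hat{b}x_i). \]
Thus, the SSR is
\[ \sum \hat{u}_i^2 = \sum \left[ y_i - (\hat{a} + \hat{b}x_i) \right]^2. \]
We wish to minimise the SSR, by choosing appropriate values of $\hat{a}$ and $\hat{b}$. This involves the
following pair of first-order conditions (see footnote 20):
\[ \frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2 = 0, \qquad \frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2 = 0. \]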
The remainder of the proof simply involves taking derivatives and doing the algebra — it
can be found in the Appendices of my H2 Mathematics Textbook.
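For readers who want to see where the two formulae come from, here is a brief sketch of the omitted algebra. It is one standard route and not necessarily the author's own write-up; like the proof above, it takes for granted that the first-order conditions locate the minimum.

```latex
\begin{align*}
\frac{\partial}{\partial \hat{a}} \sum \left[ y_i - (\hat{a} + \hat{b}x_i) \right]^2
  &= -2 \sum \left( y_i - \hat{a} - \hat{b}x_i \right) = 0
  \implies \hat{a} = \bar{y} - \hat{b}\bar{x}, \\
\frac{\partial}{\partial \hat{b}} \sum \left[ y_i - (\hat{a} + \hat{b}x_i) \right]^2
  &= -2 \sum x_i \left( y_i - \hat{a} - \hat{b}x_i \right) = 0
  \implies \sum x_i y_i - \hat{a}\, n\bar{x} - \hat{b} \sum x_i^2 = 0.
\end{align*}
% Substituting \hat{a} = \bar{y} - \hat{b}\bar{x} into the second condition gives formula (ii):
\[
\hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.
\]
% Formula (i) is the same quantity, because
\[
\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - n\bar{x}\bar{y},
\qquad
\sum (x_i - \bar{x})^2 = \sum x_i^2 - n\bar{x}^2.
\]
```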
Remark 3. Whenever we simply say regression line or line of best fit, it may safely be
assumed that we are talking about the OLS regression line.
Footnote 20: There’s a bit of hand-waving here.

Example 216 (height and weight example revisited). We already calculated
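\[ \bar{h} = 169.6, \quad \bar{w} = 67.5, \quad \sum_{i=1}^{n} (h_i - \bar{h})^2 = 1382.4, \quad \sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) = 1237. \]
So,
\[ \hat{b} = \frac{\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w})}{\sum_{i=1}^{n} (h_i - \bar{h})^2} = \frac{1237}{1382.4} \approx 0.8948. \]
Thus, the regression line is $w - 67.5 = 0.8948(h - 169.6)$ or $w = \hat{a} + \hat{b}h = -84.26 + 0.8948h$.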
[Figure: scatter diagram of weight (kg) against height (cm), with the OLS regression line.]
i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 78.6 63.4 70.5 54.4 75.0 71.4 67.0 58.9 50.0 85.8
ûi = wi − ŵi (kg) 2.4 6.6 0.5 −1.4 −3.0 3.6 2.0 1.1 −6.0 −5.8
The SSR for the actual line of best fit is
\[ \sum_{i=1}^{10} \hat{u}_i^2 = 2.4^2 + \dots + (-5.8)^2 \approx 147.6. \]
This is much better than the SSR of 1317 that we found for the previous candidate line of best fit, which
was simply a horizontal line.
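The whole example can be checked with a short Python sketch. The function name `ols_line` is made up for illustration. It computes the slope using both formulae from Fact 16, confirms that they agree, and then recomputes the SSR.

```python
def ols_line(xs, ys):
    """Return (a_hat, b_hat) for the OLS regression line y = a_hat + b_hat*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Formula (i): sums of deviations from the means.
    b_i = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
           / sum((x - x_bar) ** 2 for x in xs))
    # Formula (ii): raw sums. Algebraically the same quantity as (i).
    b_ii = ((sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar)
            / (sum(x * x for x in xs) - n * x_bar ** 2))
    assert abs(b_i - b_ii) < 1e-9  # the two formulae agree up to rounding
    return y_bar - b_i * x_bar, b_i

heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

a_hat, b_hat = ols_line(heights, weights)
residuals = [w - (a_hat + b_hat * h) for h, w in zip(heights, weights)]
ssr = sum(u ** 2 for u in residuals)

print(round(b_hat, 4), round(a_hat, 2), round(ssr, 1))  # 0.8948 -84.26 147.6
```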

Exercise 102. (a) Find the regression line of q on p, using the data below. (b) Complete
the table. (c) Draw the scatter diagram, including the regression line and the corresponding
residuals. (d) Compute the SSR. (Answer on p. 367.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
q̂i
ûi = qi − q̂i

57.6 TI84 to Calculate the PMCC and the OLS Estimates
Example 218. We’ll find the PMCC and the regression line for these data:
i 1 2 3 4 5
xi 1 7 3 11 8
yi 14 5 6 4 4
2. Press the blue 2ND button and then CATALOG (which corresponds to the 0 button).
This brings up the CATALOG menu.
3. Using the down arrow key ∨ , scroll down until the cursor is on DiagnosticOn.
4. Press ENTER once. And press ENTER a second time. The TI84 now says “DONE”,
telling you that the Diagnostic option has been turned on.
The above steps need only be performed once. Unless of course you’ve just reset your
calculator (as is required before each exam). In which case you have to go through the
above steps again.
5. Press STAT to bring up the STAT menu.

6. Press 1 to select the “1:Edit” option.
7. The TI84 now prompts you to enter data under the column titled “L1”. This is where
you should enter the data for x, using the numeric pad and the ENTER key as is
appropriate. (I omit from this step the exact buttons you should press.)
8. After entering the last entry, press the right arrow key > to go to column L2. Then enter
the data for y, again using the numeric pad and the ENTER key as is appropriate.

9. Now press STAT to again bring up the STAT menu.

10. Press the right arrow key > to go to the CALC submenu.
11. Press 4 to select the “4:LinReg(ax+b)” option.
12. To tell the TI84 to go ahead and do the calculations, simply press ENTER .
The TI84 tells you that the PMCC is r = −.8147656398. The equation of the regression line
of y on x is y = ax + b = −.859375x + 11.75625.
(Be careful to note that the TI84 uses the symbol “a” for the coefficient for x, whereas in
the A-level List of Formulae, they use b instead. Don’t get these mixed up!)
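If you want to double-check the TI84 output without the calculator, the same numbers can be reproduced in Python. This sketch is not part of the TI84 procedure; the variable names `slope` and `intercept` are chosen here to correspond to the TI84's "a" and "b".

```python
import math

xs = [1, 7, 3, 11, 8]
ys = [14, 5, 6, 4, 4]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

slope = s_xy / s_xx                  # the TI84's "a" (b-hat in the List of Formulae)
intercept = y_bar - slope * x_bar    # the TI84's "b" (a-hat in the List of Formulae)
r = s_xy / math.sqrt(s_xx * s_yy)    # the PMCC

print(round(slope, 6), round(intercept, 5), round(r, 6))  # -0.859375 11.75625 -0.814766
```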
Exercise 103. Using your TI84, find the PMCC between q and p, and also find the regression
line of q on p (see data below). Verify that your answers for this exercise are the same
as those in the last two exercises. (Answer on p. 368.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400

57.7 Interpolation and Extrapolation
Given any value of x, we call the corresponding ŷ = b̂ (x − x̄) + ȳ the fitted value or the
predicted value. One use of the regression line is that it can help us predict (or “guess”)
the value of y, even for x for which we have no data.
Example 216 (height and weight example revisited). Say we want to guess the weight
of an adult male Singaporean who is 185 cm tall. Using our regression line, we predict that
his weight is $\hat{w}_{h=185} = 0.8948 \times 185 - 84.26 \approx 81.3$ kg. This is called interpolation, because
we are predicting the weight of a person whose height is between two of our observations.
Say instead we want to guess the weight of an adult male Singaporean who is 210 cm tall.
Using our regression line, we predict that his weight is $\hat{w}_{h=210} = 0.8948 \times 210 - 84.26 \approx 103.6$
kg. This is called extrapolation, because we are predicting the weight of a person whose
height is beyond our rightmost observation.
i          1     2     3     4     5     6     7     8     9     10    -     -
hi (cm)    182   165   173   155   178   174   169   160   150   190   185   210
wi (kg)    81    70    71    53    72    75    69    60    44    80    -     -
ŵi (kg)    78.6  63.4  70.5  54.4  75.0  71.4  67.0  58.9  50.0  85.8  81.3  103.6
[Figure: weight (kg) against height (cm), for heights from 145 cm to 215 cm.]
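The two predictions above, and the distinction between them, can be sketched in Python. The function names are hypothetical, and the coefficients are the rounded values −84.26 and 0.8948 used in the text (unrounded coefficients would give a slightly different second prediction).

```python
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # observed heights (cm)

def predict_weight(height_cm, a_hat=-84.26, b_hat=0.8948):
    """Fitted weight (kg) from the regression line w = a_hat + b_hat*h."""
    return a_hat + b_hat * height_cm

def kind_of_prediction(x, observed_xs):
    """'interpolation' if x lies within the observed range, else 'extrapolation'."""
    if min(observed_xs) <= x <= max(observed_xs):
        return "interpolation"
    return "extrapolation"

for h in (185, 210):
    print(h, round(predict_weight(h), 1), kind_of_prediction(h, heights))
# 185 81.3 interpolation
# 210 103.6 extrapolation
```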

For the A-level exams, you are supposed to mindlessly and formulaically say that “Extrapolation
is less reliable than interpolation”, because
The former predicts what’s beyond the known observations; the latter predicts what’s between two known observations.
This, though, is not a very satisfying explanation for why extrapolation is “less reliable”
than interpolation. It merely leads to another question: “Why should a prediction be more
reliable if done between two known observations, than if done to the right of the right-most
observation (or to the left of the left-most observation)?”
We won’t give an adequate answer to this latter question. Instead, we’ll simply give a
bunch of examples to illustrate the dangers of extrapolation:
Example 219. A man on a diet weighs 115 kg in Week #1. Here’s a chart of his weight
loss.
The OLS line of best fit suggests that he has been losing about 0.5 kg a week.
He forgot to record his weight on Week #6. By interpolation, we “predict” that his weight
that week was 112.5 kg. This is probably a reliable guess.
By extrapolation, we predict that his weight on Week #201 will be 15 kg. This guess is
obviously absurd. It requires that he keep losing 0.5 kg a week for nearly 4 years.

Example 220. A growing boy is 160 cm tall in Month #1. Here’s a chart of his growth.
The OLS line of best fit suggests that he has been growing by about 1 cm a month.
He forgot to record his height in Month #6. By interpolation, we “predict” that his height
that month was 165 cm. This is probably a reliable guess.
By extrapolation, we predict that his height in Month #101 will be 260 cm. This guess is
obviously absurd. It requires that he keep growing by 1 cm a month for the next 8-plus years.
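The arithmetic behind the last two examples is the same straight-line calculation. The sketch below assumes, for simplicity, that the fitted line passes exactly through the first recorded value and has exactly the stated slope (the text only says "about"); the function name is made up for illustration.

```python
def linear_prediction(first_value, change_per_period, period):
    """Value in the given period, on a line through the period-1 value."""
    return first_value + change_per_period * (period - 1)

# Example 219: 115 kg in Week 1, losing 0.5 kg a week.
print(linear_prediction(115, -0.5, 6))    # 112.5
print(linear_prediction(115, -0.5, 201))  # 15.0

# Example 220: 160 cm in Month 1, growing 1 cm a month.
print(linear_prediction(160, 1, 6))       # 165
print(linear_prediction(160, 1, 101))     # 260
```

The formula is equally happy to produce the sensible interpolated values and the absurd extrapolated ones; nothing in the calculation itself warns us which is which.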

Here are three colourful examples of the dangers of extrapolation from other contexts.
Example 221. Russell’s Chicken (Problems of Philosophy, 1912, Google Books link).
The man who has fed the chicken every day throughout its life at last wrings its neck instead,
showing that more refined views as to the uniformity of nature would have been useful to the
chicken. ... The mere fact that something has happened a certain number of times causes
animals and men to expect that it will happen again. Thus our instincts certainly cause
us to believe the sun will rise to-morrow, but we may be in no better a position than the
chicken which unexpectedly has its neck wrung.
Example 222. The Fermat numbers are
\[ F_0 = 2^{2^0} + 1 = 3, \]
\[ F_1 = 2^{2^1} + 1 = 5, \]
\[ F_2 = 2^{2^2} + 1 = 17, \]
\[ F_3 = 2^{2^3} + 1 = 257, \]
\[ F_4 = 2^{2^4} + 1 = 65537. \]
Remarkably, the first five Fermat numbers are all prime. This observation led Fermat to
conjecture (guess) in the 17th century that all Fermat numbers are prime. This was an act
of extrapolation.
Unfortunately, Fermat’s act of extrapolation was wrong. About a century later, Euler
showed that $F_5 = 2^{2^5} + 1 = 4294967297 = 641 \times 6700417$ is composite (not prime).
Today, the Fermat numbers $F_5, F_6, \dots, F_{32}$ are all known to be composite. Indeed, it was
shown in 1964 that $F_{32}$ is composite. Over half a century later, it is not yet known if
$F_{33} = 2^{2^{33}} + 1$ is prime or composite. $F_{33}$ is an unimaginably huge number, with 2,585,827,973
digits.
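These claims are easy to check by computer. The sketch below uses simple trial division, which is fine for numbers this small but hopeless for the larger Fermat numbers; the function names are made up for illustration.

```python
import math

def is_prime(n):
    """Primality by trial division (adequate only for small n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    d = 3
    while d * d <= n:
        if n % d == 0:
            return False
        d += 2
    return True

def fermat(k):
    """The k-th Fermat number, 2^(2^k) + 1."""
    return 2 ** (2 ** k) + 1

first_five = [fermat(k) for k in range(5)]
print(first_five)                            # [3, 5, 17, 257, 65537]
print(all(is_prime(f) for f in first_five))  # True
print(fermat(5) == 641 * 6700417)            # True

# Number of digits of F33, computed with logarithms rather than by building the number.
# F33 = 2^(2^33) + 1 has the same number of digits as 2^(2^33).
digits_f33 = math.floor(2 ** 33 * math.log10(2)) + 1
print(digits_f33)                            # 2585827973
```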

Example 223. On Ah Beng’s first day at school, he learns in Chinese class that the Chinese
character for the number 1 is written as a single horizontal stroke.
On his second day at school, he learns that the Chinese character for the number 2 is
written as two horizontal strokes.
On his third day at school, he learns that the Chinese character for the number 3 is written
as three horizontal strokes.
一 (the Chinese character for 1), 二 (the Chinese character for 2), 三 (the Chinese character for 3)
After his third day at school, Ah Beng decides he’ll skip at least the next few Chinese
classes, because he thinks he knows how to write the Chinese characters for the numbers 4
and above. 4 simply consists of four horizontal strokes; 5 simply consists of five horizontal
strokes; etc. Unfortunately, Ah Beng’s act of extrapolation is wrong.
The characters for the numbers 4 through 10 look instead like this:
四 (4), 五 (5), 六 (6), 七 (7), 八 (8), 九 (9), 十 (10)

On the other hand, here are two historical examples of extrapolation that, to everyone’s
surprise, have held up remarkably well (at least to date).
Example 224. Moore’s Law. In 1965, Gordon Moore observed that the number of
components that could be crammed onto each integrated circuit doubled every year. He
predicted that this rate of progress would continue at least through 1975.
In 1975, he adjusted his prediction to a more modest rate of doubling every two years. Thus
far, this latter prediction has held up remarkably well, as the following graph (taken from
Nature) shows.
Unfortunately, as stated in the same Nature article, it “has become increasingly obvious to
everyone involved” that “Moore’s law ... is nearing its end”.

Example 225. Augustine’s Law. In 1983, Norman Augustine observed that the cost of
a tactical aircraft grows four-fold every ten years. (Google Books.)
This is considerably quicker than the rate at which the annual US defense budget and US
Gross National Product (GNP) grow. Extrapolating, he concluded:
• In 2054, the entire annual US defense budget will be spent on a single aircraft.
• Early in the 22nd century, the entire US GNP will be spent on a single aircraft.

These seemingly-absurd conclusions were written at least partly in jest.
Except so far they have been right on track. In a 2010 Economist article, Augustine was
quoted as saying, “We are right on target. Unfortunately nothing has changed.” That article
also presented an updated version of Augustine’s Law.
The latest F-35 fighter program is estimated to cost the US Department of Defense US$1.124
trillion. To be fair, that estimate is the cost of the entire program over its projected 60-year
lifespan (through 2070), including R&D, the purchase of over 2,000 F-35s, and
operating costs. But still, US$1.124 trillion is a mind-blowing figure.*
*Figure quoted from an April 2016 Defense News story. Note though that the estimate keeps changing.
Exercise 104. Using the data below, “predict” how many haircuts were sold in June 2016
by (a) a barber who charged $7 per haircut; and (b) a barber who charged $200 per haircut.
Which prediction is an act of interpolation and which is an act of extrapolation? Which
prediction do you think is more reliable? (Answer on p. 368.)
i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400

57.8 The Higher the PMCC, the Better the Model?
There are no routine statistical questions, only questionable statistical routines.

- Usually attributed to David Cox.
It’s much more interesting to live not knowing than to have answers which might be wrong.
- Richard Feynman (1981, YouTube).
The A-level examiners (see footnote 21) want you to say, mindlessly and formulaically, that
All else equal, a model with a higher PMCC is better than a model with a lower PMCC.
Regurgitating the above sentence will earn you your full mark. But in fact, without the
“all else equal” clause, it is nonsense. And since it is almost never true that “all else is
equal”, it is almost always nonsense.
In every introductory course or text on statistics, one is told that the PMCC is merely
a relatively-unimportant consideration, in deciding between models. Yet somehow, the
A-level examiners seem to consider the PMCC an all-important consideration.
Here’s a quick example to illustrate.
Example 226. (From the 2015 H2 Maths exam.) In an experiment the following informa-
tion was gathered about air pressure P , measured in inches of mercury, at different heights
above sea-level h, measured in feet.
h 2000 5000 10000 15000 20000 25000 30000 35000 40000 45000
P 27.8 24.9 20.6 16.9 13.8 11.1 8.89 7.04 5.52 4.28
The exam first asks us to find the PMCCs between (a) h and P; (b) ln h and P; and (c)
$\sqrt{h}$ and P. The answers are (a) $r_a \approx -0.980731$; (b) $r_b \approx -0.974800$; and (c) $r_c \approx -0.998638$.
The A-level exam then says, “Using the most appropriate case ..., find the equation which
best models air pressure at different heights.” The “correct” answer is that (c) $P = a + b\sqrt{h}$
is the “most appropriate” model, simply because the PMCC there is the largest in magnitude.
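The three PMCCs can be recomputed from the table with the Python sketch below. The helper `pmcc` is written out here for illustration; it is the same formula as in section 57.2, applied after transforming h.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

h = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]  # feet
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]          # inches of mercury

r_a = pmcc(h, P)                            # (a) h and P
r_b = pmcc([math.log(v) for v in h], P)     # (b) ln h and P
r_c = pmcc([math.sqrt(v) for v in h], P)    # (c) sqrt(h) and P

for label, r in (("a", r_a), ("b", r_b), ("c", r_c)):
    print(label, round(r, 6))
```

All three values are negative, so "largest" here has to mean closest to −1.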
Footnote 21: See H2 Maths 9740 N2015/II/10(iii), N2014/II/8(b)(ii), N2012/II/8(v), N2011/II/8(iii), N2010/II/10(iii), and
N2008/II/8(i).

But this is utter nonsense. One does not conclude that one model is “more appropriate”
than another simply because its PMCC is 0.018 larger in magnitude. Small measurement errors or plain
bad luck could easily explain these tiny differences in PMCCs.
Moreover, even if one model has r = 0.9 and another has r = 0.4, it does not automatically
follow that the first model is “more appropriate” than the second. In deciding which
statistical model to use, there are very many considerations, of which the PMCC is a
relatively-unimportant one.
In my view, the correct answer should have been this:
We have far too little information to make any conclusions.
Sadly, in the Singapore education system, what I consider to be the correct answer would
not have gotten you any marks. Instead, one is taught that there must always be one
single, simplistic, formulaic, definitive, “correct” answer. This is a convenient substitute
for thinking.
As it turns out, the “most correct” linear model — based on the actual barometric formula
(see the last page of the Appendices in my H2 Mathematics Textbook) — is actually the
following:
\[ \ln P = a + b \ln\left(1 + \frac{L}{T} h\right). \]
The constants L = −0.0065 kelvin per metre (K m⁻¹) and T = 288.15 kelvin (K) are, respectively,
the standard temperature lapse rate (up to 11,000 m above sea level) and the
standard temperature (at sea level).
The PMCC for the above model is rd ≈ 0.999998, which is “better” than the cases examined
above. (See this Google spreadsheet for the data and calculations.)
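As a rough check on this last PMCC, here is a Python sketch. It rests on an assumption not spelled out above: since L is given per metre, the heights are converted from feet to metres (1 foot = 0.3048 m) before being plugged into the model. The author's spreadsheet may organise the calculation differently.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

FT_TO_M = 0.3048  # metres per foot (assumed unit conversion, see lead-in)
L = -0.0065       # standard temperature lapse rate, kelvin per metre
T = 288.15        # standard temperature at sea level, kelvin

h_ft = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]

x = [math.log(1 + (L / T) * (v * FT_TO_M)) for v in h_ft]  # ln(1 + (L/T) h)
y = [math.log(p) for p in P]                               # ln P

r_d = pmcc(x, y)
print(round(r_d, 6))
```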
But again, the PMCC is merely one relatively-unimportant consideration. Our
conclusion that this last model is superior to the model $P = a + b\sqrt{h}$ is based not on the
fact that $r_d$ is about 0.001 larger in magnitude than $r_c$.
Instead, we are confident in this model because it was derived from physical theories. In
contrast, the model $P = a + b\sqrt{h}$ (or indeed any of the other models suggested above) is
completely arbitrary and has no theoretical justification. Hence, even if the model
$P = a + b\sqrt{h}$ had a PMCC of 1, we’d still prefer this last model.

Part IV
Ten-Year Series
This part lists all the questions from 2006-2015 A-Level exams, sorted into the two sections
of the exam (Pure Mathematics and Statistics), and in reverse chronological order.
In the older exams, they had the habit of not distinctly numbering different parts within
the same question as parts (i), (ii), etc. So I have sometimes taken the liberty of adding or
modifying such numbers.
Exam Tip
Unless explicitly instructed, you are always allowed to use your graphing calculator, so use
it wherever possible.
Examples of explicit instructions to avoid using your calculator include (but are not limited
to):
• “Without using your calculator ...”
• “Use a non-calculator method ...”
• “Find the exact value of ...”
• “Express your answer in terms of $\sqrt{3}$ or $\pi$.”

58 Past-Year Questions for Section A: Pure Mathematics
Exercise 105. (8864 N2015/I/1. Answer on p. 369.) Show that there are no real values
of k for which $2k + (k - 4)x - 2x^2$ is always negative. [4]
Exercise 106. (8864 N2015/I/2. Answer on p. 369.) (i) Differentiate $\dfrac{3}{(2x - 1)^4}$. [2]
(ii) Use a non-calculator method to find $\int_{0.5}^{1} (x + 2/x)^2 \, dx$. [5]
Exercise 107. (8864 N2015/I/3. Answer on p. 369.) The diagram shows the curve C
with equation $y = 12x + 8e^{-2x}$.
[Figure: the curve C.]
(i) Use differentiation to find the exact x-coordinate of the stationary point of C. [4]
(ii) Find the area of the region bounded by C, the x-axis and the lines x = 0 and x = a,
where a is a positive constant. Give your answer in terms of a. [3]

Exercise 108. (8864 N2015/I/4. Answer on p. 369.) The diagram shows a piece of paper
DEF in the shape of an equilateral triangle of side y cm. An equilateral triangle of side x
cm is removed from each corner of DEF. The perimeter of the remaining shape PQRSTU
is 30 cm.
[Figure: equilateral triangle DEF with an equilateral triangle of side x cm removed from each corner, leaving the hexagon PQRSTU.]
(i) Show that the area, A cm², of PQRSTU is given by $A = (0.5\sqrt{3})(50 + 10x - x^2)$. [5]
(ii) Without using a calculator, find the maximum value of A as x varies, justifying that
this value is a maximum. [3]
Exercise 109. (8864 N2015/I/5. Answer on p. 370.) The curve C has equation y =
0.5x − ln(x + 1).
(i) Sketch the graph of C, stating the coordinates of any points of intersection with the
axes and the equations of any asymptotes parallel to the y-axis. [3]
(ii) Find the numerical value of the gradient of C at the point P where x = 0.5, giving your
answer correct to 3 decimal places. [1]
(iii) The normal to C at P meets the x-axis at A and the y-axis at B. Find the length of
AB. [5]

Exercise 110. (8864 N2014/I/1. Answer on p. 370.) Use a non-calculator method to find
the exact value of $\int_{1}^{6} \dfrac{1}{\sqrt{1 + 4x}} \, dx$. [4]
Exercise 111. (8864 N2014/I/2. Answer on p. 371.) (i) Differentiate $\ln(x^2 + 4)$. [2]
(ii) The curve C has equation $y = \ln(x^2 + 4)$. Show that the values of x for which the
gradient of C is equal to the constant k satisfy the equation $kx^2 - 2x + 4k = 0$. [1]
(iii) Find the values of k for which this equation has equal roots. [2]
$1 - e^{1-2x}$.
(i) Sketch the graph of C, stating the exact coordinates of any points of intersection with
the axes and the equation of the asymptote. [3]
(ii) Without using a calculator, find the equation of the tangent to C at the point where
x = 1, giving your answer in the form y = mx + c, where m and c are exact constants. [4]
Exercise 113. (8864 N2014/I/4. Answer on p. 371.) ABCD is a rectangle in which
AB = y cm and BC = 3x cm. The point E is on DA and the point G is on DC such that
DEFG is a square of side x cm (see diagram). The length of BF is $2\sqrt{65}$ cm.
(i) Show that $5x^2 + y^2 - 2xy = 260$. [2]

(ii) Given that the perimeter of the rectangle ABCD is 60 cm, find the values of x and y.
[5]

$x^3 + kx^2 + 7x + c$, where k and c are constants. The stationary points of C are at A and B.
(i) Given that A has coordinates (1, 2), show that k = −5 and find the value of c. [4]
(ii) Hence find the exact values of the coordinates of B. [3]
(iii) Sketch the graph of C, stating the coordinates of any points where the curve crosses
the x-axis. [2]
(iv) Use a non-calculator method to find the exact area of the region bounded by C, the
x-axis and the lines x = 1 and x = 2. [3]
Exercise 115. (8864 N2013/I/1. Answer on p. 372.) Find the set of values of k for which
the equation $x^2 - (k - 2)x + (2k + 1) = 0$ has no real roots. [4]
Exercise 116. (8864 N2013/I/2. Answer on p. 372.) (i) Differentiate $\ln(1 + 2x^2)$. [2]
(ii) Use a non-calculator method to find the exact value of $\int_{-1}^{0} \dfrac{1}{(1 - 3x)^4} \, dx$. [4]
Exercise 117. (8864 N2013/I/3. Answer on p. 372.) A piece of card has the shape of a
trapezium ABCE. The point D on CE is such that ABCD is a rectangle. It is given that
AB = y cm, BC = 4x cm and DE = 3x cm (see diagram). The area of the card is S cm².
Given that the perimeter of the card is 20 cm,

(i) find an expression for S in terms of x, [3]
(ii) find the maximum value of S, justifying that this value is a maximum. [3]

$x^3 - ax^2 + 3x + 6$, where a is a constant.
(i) Find, in terms of a, the gradient of the normal to C at the point P where x = 1. [3]
The normal at P passes through the point (−5, 3).

(ii) Show that a satisfies the equation $a^2 - 10a + 24 = 0$ and hence find the two possible
values of a. [5]
(iii) For the smaller value of a, find the coordinates of the point of intersection of the normal
at P and the line y = x. [2]
Exercise 119. (8864 N2013/I/5. Answer on p. 373.) (i) By taking logarithms, find the
exact root of the equation $e^{2-2x} = 2e^{-x}$. [3]
(ii) Use differentiation to show that the curve C with equation $y = e^{2-2x} - 2e^{-x}$ has a
stationary point at $(2, -e^{-2})$. [3]
(iii) Sketch C, stating the exact value of the x-coordinate of its point of intersection with
the x-axis. [2]
(iv) Use your calculator to find the area of the region bounded by C, the x-axis and the
lines x = 0 and x = 1. [1]
Exercise 120. (8864 N2012/I/1. Answer on p. 373.) Given that $3e^{2x} = 4(e^{-2x} - 1)$, use
the substitution $u = e^{2x}$ to find the exact value of x. [4]

Exercise 121. (8864 N2012/I/2. Answer on p. 374.) The diagram shows a garden which
is enclosed by a wall AH and fencing along the rest of the boundary ABCDEF GH. The
angles at B, C, D, E, F and H are each right angles and EF = 20 m, BC = y m and
AB = DC = DE = x m. It is given that the total length of the fencing is 100 m, and that
the area of the rectangle HBCG is three times the area of the rectangle DEF G.
(i) Show that $x^2 + 30x - 400 = 0$. [4]

(ii) Find the length of HF . [2]

with equation $y = k^2 - x^2$ and the line L with equation $y = 3k^2/4$, where k is a positive
constant.
(i) Find, in terms of k, the x-coordinates of the points where C and L intersect. [2]
(ii) Hence find, in terms of k, the area of the finite region between C and L. [4]
Exercise 123. (8864 N2012/I/4. Answer on p. 374.) (i) Differentiate
(a) $2\ln(3x + 2)$, [2]
(b) $4/(2x + 1)$. [2]
(ii) Without using a calculator, find the exact value of $\int_{2}^{4} (\sqrt{x} - 1/\sqrt{x})^2 \, dx$, simplifying
your answer. [5]
Exercise 124. (8864 N2012/I/5. Answer on p. 375.) The curve C has equation $y = 2^x - x^2$.
(i) Sketch C, stating the coordinates of the points of intersection with the axes. [3]
(ii) Find the numerical value of the gradient of C at the point where x = 1.5. Give your
answer correct to 4 decimal places. [1]
(iii) Hence find the equation of the tangent to C at the point where x = 1.5. Give your
answer in the form y = mx + c, with m and c correct to 4 decimal places. [2]
(iv) This tangent meets the y-axis at A and the line y = x at B. Find the length of AB. [4]

Exercise 125. (8864 N2011/I/1. Answer on p. 375.) Find, algebraically, the set of values
of k for which $x^2 + (k - 2)x + (k + 1) > 0$ for all real values of x. [4]
Exercise 126. (8864 N2011/I/2. Answer on p. 375.) (i) On a single diagram, sketch
the graphs of $y = 2 - 0.6x$ and $y = x^2 - 1$, stating clearly the coordinates of any points of
intersection with the y-axis. [2]
(ii) Find the x-coordinates of the points of intersection of $y = 2 - 0.6x$ and $y = x^2 - 1$, giving
your answers correct to 4 decimal places. [2]
(iii) Write down as an integral an expression for the area of the region bounded by $y = 2 - 0.6x$
and $y = x^2 - 1$ and the lines x = 2 and x = 3. Evaluate this integral, giving your answer
correct to 3 decimal places. [2]
Exercise 127. (8864 N2011/I/3. Answer on p. 376.) (i) Find $\int e^{3x+2} \, dx$. [2]
(ii) Without using a calculator, find $\int_{4}^{9} 3(\sqrt{x} - 1/\sqrt{x}) \, dx$. [4]
Exercise 128. (8864 N2011/I/4. Answer on p. 376.) The diagram shows a square piece
of cardboard ABCD of side 2 m. A square of side x m is removed from each corner of
ABCD. The remaining shape is now folded along P Q, QR, RS and SP to form an open
rectangular box of height x m.
(i) Show that the volume, V m³, of the box is given by $V = 4x^3 - 8x^2 + 4x$. [3]
(ii) Without using a calculator, find the maximum value of V as x varies. [5]

x − ln(2x + 1). O is the origin and P is the point on C for which x = 2. The normal to C
at the point P meets the x-axis at A and the y-axis at B (see diagram).
Without using a calculator,

(i) find the exact coordinates of the minimum point on C, [3]
(ii) find the exact coordinates of A and B and hence show that the exact area of triangle
OAB is $(p - q\ln 5)^2/30$, where p and q are integers to be found. [8]
Exercise 130. (8864 N2010/I/1. Answer on p. 376.) Find the set of values of k for which
the equation $4x^2 - 2kx + 9 = 0$ has two real distinct roots. [3]
Exercise 131. (8864 N2010/I/2. Answer on p. 376.) Find
(i) $\int e^{1-2x} \, dx$, [2]
(ii) $\int 2/(x + 1)^3 \, dx$. [3]

Exercise 132. (8864 N2010/I/3. Answer on p. 377.) The equation of a curve is y =
ln(2x − 3).
(i) Sketch the curve, stating the exact equations of any asymptotes and the exact coordi-
nates of any intersections with the axes. [2]
(ii) Find dy/dx. [1]
(iii) Hence find the equation of the normal to the curve at the point where x = 3, giving
your answer in the form ax + by = c, where a and b are integers. [4]
Exercise 133. (8864 N2010/I/4. Answer on p. 377.) A window in a new building has the
shape of a rectangle ABCD joined to an isosceles triangle ABE, as shown in the diagram.
It is given that AB = 2x m and AE = (5/8)AB. The total perimeter AEBCDA of the window
is 6 m.
(i) Show that AD = (3 − 9x/4) m. [2]
The area of the window is to be as large as possible.

(ii) Show that the area of the window is equal to $(6x - 15x^2/4)$ m². [3]
(iii) Hence use a non-calculator method to find the maximum value of this area. [4]

Exercise 134. (8864 N2010/I/5. Answer on p. 377.) A curve C has equation
$y = 6 - 4x^3 - 3x^4$.
(i) Use a non-calculator method to find the coordinates of the stationary points of C. [4]
(ii) Sketch C. Mark the point of inflexion with a cross. [2]
(iii) Find, correct to 2 decimal places, the x-coordinates of the points where C cuts the
x-axis. [2]
(iv) Find $\int (6 - 4x^3 - 3x^4) \, dx$. Hence find the exact area of the region bounded by C, the
x-axis and the lines x = −1 and x = 1/2. [3]
Exercise 135. (8863 N2009/I/1. Answer on p. 378.) Without using a calculator, solve
the simultaneous equations
x + 2y = 3,
$x^2 + xy = 2$. [4]
Exercise 136. (8863 N2009/I/2. Answer on p. 378.) (i) Sketch the graphs of $y = \sqrt{x}$ and
$y = 0.5x$ on a single diagram and write down the coordinates of the points where $y = \sqrt{x}$
and $y = 0.5x$ intersect. [2]
(ii) Find $\int \sqrt{x} \, dx$ and $\int 0.5x \, dx$. [2]
(iii) Without using a calculator, find the area of the region between the two graphs. [2]
Exercise 137. (8863 N2009/I/4. Answer on p. 379.) (i) Sketch the curve y = x − 1/x,
stating clearly the coordinates of all points of intersection with the axes. [1]
(ii) Find the gradient of the normal at the point P on the curve where x = 2. [2]
(iii) Find the equation of the normal at P in the form ax + by + c = 0, where a and b are
integers. [3]
(iv) The normal at P meets the y-axis at N and the tangent at P meets the y-axis at T .
Find the area of triangle P T N . [5]

Exercise 138. (8863 N2009/I/5. Answer on p. 379.) A curve has equation
$y = 2x^3 - 5x^2 - 4x + 3$.
(i) Find dy/dx. Hence find the exact coordinates of the stationary points on the curve. [4]
(ii) Sketch the curve, stating clearly the coordinates of all points of intersection with the
axes. [3]
(iii) Solve the inequality $2x^3 - 5x^2 - 4x + 3 > 0$. Hence find the exact solutions of the inequality
$2e^{3x} - 5e^{2x} - 4e^x + 3 > 0$. [5]
Exercise 139. (8863 N2008/I/1. Answer on p. 380.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Sketch the graph of y = sin x for 0 ≤ x ≤ 4π. [1]
It is given that α is an acute angle, and sin α = c. State, in terms of c, the value of
(i) sin(2π + α), [1] (ii) sin(3π + α). [1]

State, in terms of α and π, one value of x between π and 2π for which sin x = −c. [1]
Exercise 140. (8863 N2008/I/2. Answer on p. 380.) The sum of two numbers x & y is
20 and the sum of their squares is 300. Given that x > y, find the exact value of x & y. [5]
Exercise 141. (8863 N2008/I/3. Answer on p. 380.) The diagram shows the graphs of
$C_1: y = 2x^2$ and $C_2: y = x^2 + k^2$, where k is a positive constant. The graphs intersect at P
and Q, as shown.
(i) Show that the x-coordinates of P and Q are k and −k respectively. [1]
(ii) Find the exact value of the area of the shaded region between $C_1$ and $C_2$. [5]

Exercise 142. (8863 N2008/I/5. Answer on p. 381.) A spot of light on a computer screen
moves in a horizontal line across the screen. At time t seconds, its distance, x mm, from
the left-hand edge of the screen is given, for t ≥ 0, by $x = t^3 - 12t^2 + kt$, where k is a positive
constant.
(i) Find the set of values of k for which x is an increasing function of t. [5]
It is now given that k = 36.
(ii) Sketch the graph of x against t. [1]

(iii) The screen has width 375 mm. Find the time in seconds at which the spot reaches the
right-hand edge of the screen, giving your answer correct to 1 decimal place. [2]
with equation y = ln(2x + 4). The point P on C has coordinates (1, ln 6). The tangent to
C at P meets the x-axis at T .
(i) Show that the exact x-coordinate of T is 1 − 3 ln 6. [4]
The normal to C at P meets the x-axis at N .

(ii) Find the exact x-coordinate of N . [2]
(iii) Find the exact area of triangle P T N . [4]

Exercise 144. (8863 N2007/I/1. Answer on p. 381.) (i) Find the numerical value of the
derivative of $3^x$ when x = 2. [1]
(ii) Hence find the equation of the tangent to the graph of $y = 3^x$ at the point where x = 2,
giving your answer in the form y = mx + c. [2]
Exercise 145. (8863 N2007/I/3. Answer on p. 382.) (i) Sketch, for x ≥ 0, the graphs of
$y = 20/(x + 2)$ and $y = 10 - x^2$ on the same axes. [2]
(ii) The graphs intersect on the y-axis. Find, correct to 3 decimal places, the x-coordinate
of the point of intersection for which x > 0. [1]
(iii) Find $\int \frac{20}{x+2}\,dx$ and $\int (10 - x^2)\,dx$. [3]
(iv) Use your answers to parts (ii) and (iii) to find the area of the region, in the first
quadrant, between the two graphs. [2]
Exercise 146. (8863 N2007/I/4. Answer on p. 382.) The diagram shows a large rectangular
field surrounded by a wall. The broken lines represent fences. The corner shapes
are an isosceles triangle and a square. The length of the fence bordering the triangle is x
metres.
(i) Explain why the area of the triangle is $0.25x^2$ m$^2$. [2]

The total length of the fences is 100 m. The total area of the triangle and the square is A
m$^2$.
(ii) Show that $A = 2500 - 50x + 0.5x^2$. [3]
(iii) Use differentiation to find the value of x for which A is a minimum. State the
corresponding minimum value of A and explain briefly how you can tell that it is a minimum
rather than a maximum. [4]
(iv) Find the largest value that A can take, given that 10 ≤ x ≤ 80. Show clearly how you
obtain your answer. [2]

Exercise 147. (8863 N2007/I/5. Answer on p. 383.) (i) Without using a calculator, solve
the simultaneous equations $y = 2x^2 + 3x + 2$ and $y = 2x + 3$. [3]
(ii) Hence solve the inequality $2x^2 + 3x + 2 \ge 2x + 3$. [2]

Part (iii) of this question is no longer in the 8865 (revised) syllabus, so you can skip it.
(iii) Hence, using a sketch of the graph of x = cos θ, solve the inequality
$2\cos^2\theta + 3\cos\theta + 2 \ge 2\cos\theta + 3$, for $0^\circ \le \theta \le 540^\circ$. [6]
Exercise 148. (8174 N2006/I/6. Answer on p. 383.) Solve (i) $e^{5x+2} = 23$, [2]
(ii) $\lg(40 + y^2) = 2.5$. [3]
Exercise 149. (8174 N2006/I/7. Answer on p. 383.) The line y = 1 − 3x is a tangent to
the curve $x^2 + y^2 + kx + 2y + 7 = 0$. Find the possible values of the constant k. [5]
Exercise 150. (8174 N2006/I/9. Answer on p. 384.) (i) Find $\int (5x^2 - 8x)\,dx$. [2]
(ii) Evaluate $\int_0^1 e^{-2x}\,dx$. [4]
Exercise 151. (8174 N2006/I/16. Answer on p. 384.) The diagram shows the line
$y = -4x + 19$ intersecting the curve $y = -2x^2 + 6x + 11$ at the points A and B. Find
(i) the coordinates of the points A and B, [4]
(ii) the area of the shaded region. [7]

59 Past-Year Questions for Section B: Prob. & Stats
Exercise 152. (8864 N2015/I/6. Answer on p. 385.) The masses of peaches sold by a
shop have a normal distribution. Over a long period of time, it is found that 20% of peaches
have a mass less than 40 grams and 25% of peaches have a mass greater than 60 grams.
Find the mean and variance of the distribution. [4]
Exercise 153. (8864 N2015/I/7. Answer on p. 385.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. A college has 1200 students. Of these students,
500 are in Year One, 400 are in Year Two and 300 are in Year Three. A list of the names
of all 1200 students, with the names arranged in alphabetical order, is available. A survey
is to be carried out to investigate how many hours students spend playing computer games
each week.
(i) Describe how to obtain a systematic sample of 100 students from the list to take part
in the survey. [2]
(ii) State one disadvantage of using a systematic sample in this context. [1]
(iii) What type of sample might it be more appropriate to use? You do not need to describe
how you would obtain this sample. [1]
Exercise 154. (8864 N2015/I/8. Answer on p. 385.) Two events A and B are such that
P(A) = p, P(B) = 2p, P(A ∪ B) = 0.42 and P(A ∩ B) = 0.03.
(i) Show that p = 0.15. [1]
(ii) Find P(A ∪ B′). [3]
(iii) Determine whether the events A and B′ are independent. [2]
Exercise 155. (8864 N2015/I/9. Answer on p. 385.) Kai throws a fair die 8 times. Find
the probability that he obtains a six
(i) exactly three times, [1]

(ii) fewer than four times. [2]
Lam throws a fair die 600 times.
(iii) Using a suitable approximation, estimate the probability that the number of times
he obtains a six is between 90 and 100 inclusive. State the mean and variance of the
distribution that you use. [4]
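The binomial probabilities and the normal approximation in this exercise can be checked with a short Python sketch. It assumes a continuity correction is applied in part (iii), which the question itself does not spell out, and the helper names are hypothetical. The exact binomial value is computed only for comparison.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_pmf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(n: int, p: float, k: int) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))


p_six = 1 / 6

# Kai: X ~ B(8, 1/6)
p_exactly_three = binom_pmf(8, p_six, 3)
p_fewer_than_four = binom_cdf(8, p_six, 3)

# Lam: Y ~ B(600, 1/6), approximated by N(np, np(1 - p))
n = 600
mean = n * p_six
variance = n * p_six * (1 - p_six)
approx = NormalDist(mean, sqrt(variance))
# Assumed continuity correction: P(90 <= Y <= 100) is taken as P(89.5 < Y* < 100.5)
p_normal = approx.cdf(100.5) - approx.cdf(89.5)
# Exact binomial value, for comparison only
p_exact = binom_cdf(n, p_six, 100) - binom_cdf(n, p_six, 89)

print(p_exactly_three, p_fewer_than_four)
print(mean, variance, p_normal, p_exact)
```

Comparing `p_normal` with `p_exact` gives a feel for how good the approximation is for n = 600.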

Exercise 156. (8864 N2015/I/10. Answer on p. 386.) The height, h metres, and the
weight, w kg, were recorded for a random sample of 10 members of a rowing club. The
results are given in the following table.
Rower  A     B     C     D     E     F     G     H     I     J
h      1.75  1.90  1.81  1.82  1.81  1.60  1.88  1.71  1.95  1.76
w      95    102   96    98    99    90    106   92    110   93
(i) Draw a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of w on h and sketch this line on your scatter
diagram. [2]
(iv) Use the equation of your regression line to calculate an estimate of the weight of a
rower whose height is 1.66 metres. Give two reasons why you would expect this estimate
to be reliable. [3]
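In the exam these values come from a graphing calculator, but the same quantities can be reproduced from the standard formulas. The Python sketch below is an illustration using the table above; the variable and function names are hypothetical.

```python
from math import sqrt

# Data from the table: heights (m) and weights (kg) of the 10 rowers
h = [1.75, 1.90, 1.81, 1.82, 1.81, 1.60, 1.88, 1.71, 1.95, 1.76]
w = [95, 102, 96, 98, 99, 90, 106, 92, 110, 93]

n = len(h)
h_bar = sum(h) / n
w_bar = sum(w) / n

# Sums of squares and cross-products about the means
s_hh = sum((x - h_bar) ** 2 for x in h)
s_ww = sum((y - w_bar) ** 2 for y in w)
s_hw = sum((x - h_bar) * (y - w_bar) for x, y in zip(h, w))

# Product moment correlation coefficient
r = s_hw / sqrt(s_hh * s_ww)

# Regression line of w on h: w = intercept + slope * h
slope = s_hw / s_hh
intercept = w_bar - slope * h_bar


def predict_weight(height: float) -> float:
    """Estimate weight (kg) from height (m) using the line of w on h."""
    return intercept + slope * height


print(r, slope, intercept, predict_weight(1.66))
```

Note that 1.66 m lies inside the range of recorded heights, which is one of the things to look at when judging reliability.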
Exercise 157. (8864 N2015/I/11. Answer on p. 386.) Men and women staying at a large
hotel have masses, in kg, that are normally distributed with means and standard deviations
as shown in the following table.
        Mean mass   Standard deviation
Men     77          9.8
Women   62          10.6
(i) Find the probability that the mass of a man chosen at random is within ±2 kg of the
mean mass of men. [2]
(ii) Find the probability that the total mass of three men chosen at random is greater
than the total mass of four women chosen at random. State the mean and variance of the
distribution that you use. [4]
The lift in the hotel has a safety limit of 460 kg. Three men and four women are chosen at
random.
(iii) Find the probability that they can safely travel in the lift together. State the mean
and variance of the distribution that you use. [3]
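Sums and differences of independent normal variables are again normal, with means combined by the same sum or difference and variances added. The Python sketch below applies this to the exercise, assuming the masses of the chosen people are independent; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

MEN_MEAN, MEN_SD = 77.0, 9.8
WOMEN_MEAN, WOMEN_SD = 62.0, 10.6

# (i) One man within 2 kg of the mean mass of men
man = NormalDist(MEN_MEAN, MEN_SD)
p_within_2 = man.cdf(MEN_MEAN + 2) - man.cdf(MEN_MEAN - 2)

# Variance of (3 men) + (4 women) and of (3 men) - (4 women) is the same:
# variances add for both sums and differences of independent variables.
total_var = 3 * MEN_SD**2 + 4 * WOMEN_SD**2

# (ii) D = (total of 3 men) - (total of 4 women); we want P(D > 0)
diff_mean = 3 * MEN_MEAN - 4 * WOMEN_MEAN
p_men_heavier = 1 - NormalDist(diff_mean, sqrt(total_var)).cdf(0)

# (iii) T = (total of 3 men) + (total of 4 women); we want P(T <= 460)
total_mean = 3 * MEN_MEAN + 4 * WOMEN_MEAN
p_safe = NormalDist(total_mean, sqrt(total_var)).cdf(460)

print(p_within_2)
print(diff_mean, total_var, p_men_heavier)
print(total_mean, total_var, p_safe)
```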

Exercise 158. (8864 N2015/I/12. Answer on p. 386.) An accountancy qualification
involves two separate examinations, Part I and Part II. To be successful, a student must
first pass Part I and after passing Part I must then pass Part II. Students who fail Part
I at the first attempt always make a second attempt. Students are allowed at most two
attempts at Part I but only one attempt at Part II. The probability that a student will
pass Part I, on either attempt, is 3/4. The probability that a student will pass Part II is
2/5.
(i) Draw a tree diagram to represent this information. [3]

(ii) Find the probability that a student chosen at random will succeed in the accountancy
qualification. [2]
(iii) Find the probability that a student chosen at random will succeed in the accountancy
qualification, given that the student fails Part I at the first attempt. [2]
Five randomly chosen students take the qualification.
(iv) Find the probability that at least two of them will be successful. [3]
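The branches of the tree diagram can be multiplied out exactly with fractions. The Python sketch below is one way to check the working; it assumes the five students succeed or fail independently of one another, and the variable names are hypothetical.

```python
from fractions import Fraction
from math import comb

p_pass_part1 = Fraction(3, 4)  # on either attempt
p_pass_part2 = Fraction(2, 5)

# Ways of getting through Part I: pass first time, or fail then pass
p_part1_first = p_pass_part1
p_part1_second = (1 - p_pass_part1) * p_pass_part1

# (ii) Succeed overall: get through Part I, then pass Part II
p_success = (p_part1_first + p_part1_second) * p_pass_part2

# (iii) Given a fail at the first attempt: pass the second attempt, then Part II
p_success_given_fail_first = p_pass_part1 * p_pass_part2

# (iv) At least two successes among five students: 1 - P(0) - P(1)
n = 5
p_at_least_two = 1 - sum(
    comb(n, k) * p_success**k * (1 - p_success) ** (n - k) for k in (0, 1)
)

print(p_success, p_success_given_fail_first, p_at_least_two)
```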
Exercise 159. (8864 N2015/I/13. Answer on p. 387.) A scientist claims that the mean
length of fish in a particular lake is 15.2 cm. The lengths of fish are known to have a normal
distribution with standard deviation 2.1 cm. A random sample of 30 fish is selected and
found to have a sample mean length of 14.5 cm.
(i) Test, at the 5% significance level, whether the scientist’s claim should be rejected. [4]
The lengths of a random sample of 40 fish from a second lake are summarised as follows,
where x cm denotes the length of a fish in this lake.
$\sum (x - 18) = -32$, $\sum (x - 18)^2 = 325$.
(ii) Find unbiased estimates of the population mean and variance. [3]
(iii) What do you understand by the term ‘unbiased estimate’? [1]
The population mean length of fish from this second lake is µ cm. Using the sample data,
a significance test of the null hypothesis µ = 18 against the alternative hypothesis µ < 18 is
carried out at the α% significance level.
(iv) Find the set of values of α for which the null hypothesis will be rejected. [3]
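For part (i), the test statistic and p-value of a z-test with known standard deviation can be sketched in Python as follows. The sketch assumes a two-tailed alternative (the mean differs from 15.2 cm), which is one reading of "whether the claim should be rejected"; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean: float, mu0: float, sigma: float, n: int):
    """Return (z, p_value) for H0: mu = mu0 against H1: mu != mu0, sigma known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Values from part (i): claimed mean 15.2, sigma 2.1, n = 30, sample mean 14.5
z, p_value = z_test_two_tailed(14.5, 15.2, 2.1, 30)
reject_at_5_percent = p_value < 0.05

print(z, p_value, reject_at_5_percent)
```

The same p-value idea underlies part (iv): the null hypothesis is rejected exactly when the significance level is at least the p-value of the test.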
Exercise 160. (8864 N2014/I/6. Answer on p. 387.) The heights of girls in a school
have a normal distribution with mean 142.2 cm and standard deviation 6 cm. Find the
probability that a girl chosen at random from this school has height
(i) less than 146 cm, [2]

(ii) within 5 cm of the mean. [2]

Exercise 161. (8864 N2014/I/7.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. There are 5000 households in a particular town.
For each household the weekly food shopping is done either by going to the supermarket
or by ordering online and having the order delivered. The numbers using each method
of shopping are recorded, according to the age, in years, of the person responsible for the
shopping. The data is summarised in the following table.
                 Supermarket   Online
Under 25 years   500           1000
25 − 60 years    900           1600
Over 60 years    800           200
A researcher carries out a survey to investigate the amount spent on food per week. She
decides to use a sample of size 100 from these households.
(i) Describe how she might obtain a systematic sample. [2]
(ii) Describe how she might obtain a stratified sample, identifying the strata and finding
the size of the sample taken from each of the strata. [2]
(iii) State, with a reason, whether a systematic sample or a stratified sample would be more
appropriate in this context. [1]
Exercise 162. (8864 N2014/I/8. Answer on p. 388.) In a certain large city, the number
of hours, x, spent travelling to and from work and the number of hours, y, spent watching
television were recorded for a random sample of 8 people, for one particular week. The
results are given in the following table.
     A     B     C     D     E     F     G     H
x    12.8  8.4   4.4   9.0   7.2   2.2   9.2   6.3
y    4.5   8.3   14.8  8.0   9.2   12.5  7.8   10.4
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the
values of m and c correct to 4 significant figures. Sketch this line on your scatter diagram.
[2]
(iv) Use the equation of your regression line to estimate the number of hours of television
watched by a person who spends 13.2 hours a week travelling to and from work. Comment
on the reliability of your estimate. [3]

Exercise 163. (8864 N2014/I/9. Answer on p. 388.) A bakery produces two kinds of
cake. One kind of cake contains fruit, and the other kind contains no fruit. There is a
constant probability that a cake contains fruit. The cakes are sold in packs of 6. Each pack
has a random selection of cakes. For these packs, the mean number of cakes containing
fruit is 2.4.
(i) Find the probability that a pack chosen at random has

(a) no cakes containing fruit, [2]
(b) at most two cakes containing fruit. [1]

A customer buys 8 packs of cakes for a party.
(ii) Find the probability that at least 4 of these packs have at most two cakes containing
fruit. [3]
A supermarket stocks 150 of these packs of cakes.
(iii) Using a suitable approximation, estimate the probability that more than half of the
packs have at most two cakes containing fruit. You should state the mean and variance of
any distribution that you use. [4]
Exercise 164. (8864 N2014/I/10. Answer on p. 388.) It is known that the lengths
of leaves from beech trees in a particular forest have a population variance of 4.4 cm$^2$.
Scientists believe that the mean length of leaves from beech trees in this forest is 7 cm. A
random sample of 50 of these leaves has a mean length of 6.5 cm.
(i) Test, at the 5% significance level, whether the population mean length of leaves from
beech trees in this forest is less than 7 cm. [4]
The lengths, x cm, of a random sample of 50 leaves from beech trees in another forest are
summarised by $\sum x = 310.4$ and $\sum x^2 = 2209.2$.
(ii) Calculate unbiased estimates of the population mean and variance. [3]
A test, at the α% significance level, shows that there is sufficient evidence to suggest that
the population mean length of leaves from beech trees in this second forest differs from 7
cm.
(iii) Find the set of possible values of α. [4]

Exercise 165. (8864 N2014/I/11. Answer on p. 389.) A group of students are asked
whether they own any of a laptop, a tablet and a games machine. The numbers owning
different combinations are shown in the Venn diagram. The number of students owning
none of these is x. One of the students is chosen at random.
L is the event that the student owns a laptop.
T is the event that the student owns a tablet.
G is the event that the student owns a games machine.
(i) Write down expressions for P(L) and P(G) in terms of x. Given that L and G are
independent, show that x = 10. [4]
Using this value of x, find (ii) P(L ∪ T), (iii) P(T ∩ G′), and (iv) P(L | G). [1 mark each.]
Two students from the whole group are chosen at random.
(v) Find the probability that both of these students each owns exactly two out of the three
items (laptop, tablet, games machine). [3]
Exercise 166. (8864 N2014/I/12. Answer on p. 389.) The outputs of a certain metal,
in tonnes, extracted each day from two mines, A and B, have independent normal
distributions. The mean of the distribution of the daily output from A is 50 tonnes. The
probability that the daily output from A is more than 75 tonnes is 0.0189.
(i) Show that the variance of this distribution is 145 tonnes$^2$, correct to 3 s.f. [3]
The mean and variance of the distribution of the daily output from B are 75 tonnes and
64 tonnes$^2$ respectively. B operates for seven days each week.
(ii) Find the probability that in a 7-day week B produces less than 500 tonnes. [3]
(iii) A operates for five days each week. Find the probability that in any particular week
the output from B is more than twice the output from A. You should state the mean and
variance of any distribution that you use. [5]
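Parts (i) and (ii) can be checked numerically with the inverse normal function. The Python sketch below is an illustration only; it assumes, as the question implies, that the daily outputs of B are independent from day to day, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# (i) A ~ N(50, sigma^2) with P(A > 75) = 0.0189
mean_a = 50.0
threshold = 75.0
tail_prob = 0.0189

z = NormalDist().inv_cdf(1 - tail_prob)  # standardised value of 75 tonnes
sigma_a = (threshold - mean_a) / z
variance_a = sigma_a**2

# (ii) Weekly output of B: sum of 7 independent N(75, 64) daily outputs
weekly_b = NormalDist(7 * 75, sqrt(7 * 64))
p_b_below_500 = weekly_b.cdf(500)

print(variance_a, p_b_below_500)
```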

Exercise 167. (8864 N2013/I/6.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Suky is organising a pop concert. She sells 5000
tickets at $X each, 10000 tickets at $Y each and 15000 tickets at $Z each. Suky wants to
find out whether those who bought the tickets thought that the price they paid was good
value for money. She decides to do this by choosing a stratified random sample of size 150.
(i) Describe how Suky might choose her sample. [3]

(ii) State one reason for using stratified random sampling in this context. [1]
Exercise 168. (8864 N2013/I/7. Answer on p. 390.) A particular type of electronic
device is being tested to determine for how long information stored in it is retained after
power has been switched off. A random sample of 250 such devices is chosen and the time,
T hours, for which information is retained is measured for each one. The results obtained
are summarised as follows.
$\sum (t - 75) = 305$, $\sum (t - 75)^2 = 29555$.
(i) Find unbiased estimates of the population mean and variance. [3]
(ii) This type of device has previously been considered capable of retaining information for
75 hours, on average, after power is switched off, but the manufacturers now claim that
information is retained for longer than this. Test at the 2.5% significance level whether the
claim is justified. [4]
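Unbiased estimates from coded sums of the form ∑(t − a) and ∑(t − a)² follow a fixed recipe, sketched below in Python for part (i). The function name and parameter names are hypothetical.

```python
def unbiased_estimates(n: int, sum_d: float, sum_d2: float, shift: float):
    """Unbiased estimates of the population mean and variance from coded sums.

    sum_d  is the sum of (t - shift) over the sample,
    sum_d2 is the sum of (t - shift)^2 over the sample.
    """
    mean = shift + sum_d / n
    # Shifting every value by a constant does not change the variance.
    variance = (sum_d2 - sum_d**2 / n) / (n - 1)
    return mean, variance


# Values from the exercise: n = 250, coded about 75
mean_t, var_t = unbiased_estimates(250, 305, 29555, 75)
print(mean_t, var_t)
```

The same function applies to other exercises in this chapter that summarise data in this coded form.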
Exercise 169. (8864 N2013/I/8. Answer on p. 390.) A shop sells batteries in packs
of 10. An advertiser claims that individual batteries each have a lifetime of at least 100
hours. The probability that an individual battery has a lifetime less than 100 hours is 0.2,
independently of all other batteries.
(i) Find the probability that, in a randomly chosen pack of 10 batteries, each of the batteries
satisfies the advertiser’s claim. [1]
Customers are satisfied if at least 8 of the batteries in a pack have a lifetime of at least 100
hours.
(ii) Find the probability that a randomly chosen pack will satisfy customers. [3]
A customer buys a batch of 80 packs of these batteries.

(iii) Using a suitable approximation, estimate the probability that at least 75% of packs in
the batch will satisfy the customer. State the mean and variance of the distribution that
you use. [4]

Exercise 170. (8864 N2013/I/9. Answer on p. 390.) The ages x, in years, and the heights
y, in cm, for 10 boys are given in the following table.
Boy  A     B     C     D     E     F     G     H     I     J
x    8.2   10.1  6.6   13.5  6.8   11.4  7.8   6.9   12.8  7.5
y    123   135   119   141   112   151   122   116   141   123
(i) Give a sketch for the scatter diagram for the data, as shown on your calculator. [2]
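(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the
values of m and c correct to 2 decimal places. Sketch this line on your scatter diagram. [2]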
(iv) Use the equation of your regression line to calculate an estimate of the height of a boy
whose age is 13.2 years and comment on the reliability of your estimate. [3]
Exercise 171. (8864 N2013/I/10. Answer on p. 391.) A company producing barbecue
sauce claims that the mass of salt in a bottle of the sauce has a mean of 12 g. The mass
of salt is known to have a normal distribution with standard deviation 0.8 g. A random
sample of 20 bottles is selected. The sample mean is m g. A test at the 5% significance
level is carried out on this sample, and the company’s claim is accepted.
(i) Find the set of possible values of m. [5]

The company launches a new variety of the sauce and claims that the mean salt content
per bottle has been reduced. The mass of salt in a random sample of 40 bottles of the
new variety has a mean of 11.75 g. The mass of salt still has a normal distribution with
standard deviation 0.8 g.
(ii) Test the company’s claim about the new variety of sauce, using a 5% significance level.
[4]

Exercise 172. (8864 N2013/I/11. Answer on p. 391.) A pet shop sells two types of animal
food. Type A is supplied by a manufacturer and sold in packets with the food content having
a mean mass of 1 kg. The masses of the food content are normally distributed. It is known
that 20% of the packets contain less than 990 g of food.
(i) Find the standard deviation of the distribution. [3]

Type B animal food is mixed by the shop owner from two ingredients P and Q. One packet
contains 3 scoops of ingredient P and 2 scoops of ingredient Q. The masses, in grams, of the
food in scoops of ingredients P and Q have independent normal distributions with means
and standard deviations as shown in the following table.
               Mean   Standard deviation
Ingredient P   240    10
Ingredient Q   145    8
(ii) Find the probability that a randomly selected packet of Type B has a mass of food less
than 1 kg. State the mean and variance of any distribution that you use. [4]
(iii) Find the probability that the mass of food in a randomly selected packet of Type B is
more than the mass of food in a randomly selected packet of Type A. State the mean and
variance of any distribution that you use. [4]
Exercise 173. (8864 N2013/I/12. Answer on p. 391.) Jai is playing a game which involves
throwing a fair six-sided die. If the result is a 3, 4, 5 or 6, his score is the number shown.
If the result is a 1 or a 2, he throws the die a second time and his score is the sum of the
two numbers from his two throws.
(i) Draw a tree diagram to represent the possible outcomes. [3]
Events A and B are defined as follows:
Event A: Jai’s score is 5 or 6, Event B: Jai has two throws.
(ii) Show that P(A) = 4/9. [2]
Find (iii) P(A ∩ B), [1] (iv) P(A ∪ B), [2] and (v) P(B | A′). [4]
Exercise 174. (8864 N2012/I/6.) This question is no longer in the
8865 (revised) syllabus, so you can skip it.
(i) Describe what is meant by ‘systematic sampling’. [2]

A researcher is conducting a survey in a particular town to find out how many hours adults
spend on their computers each day. He decides to survey a sample of 100 adults by standing
outside the main supermarket at midday and using systematic sampling.
(ii) State, in this context, one advantage and one disadvantage of this procedure. [2]
(iii) Describe briefly how, in this case, the researcher might choose a more appropriate
systematic sample. [1]

Exercise 175. (8864 N2012/I/7. Answer on p. 392.) Events A and B are such that
P(A) = P(B) = p and P(A ∪ B) = 5/9.
(i) Given that A and B are independent, find a quadratic equation satisfied by p. [3]
(ii) Hence find the value of p and the value of P(A ∩ B). [2]
Exercise 176. (8864 N2012/I/8. Answer on p. 392.) An election was held to choose the
leader of a political party.
• Candidate A received 50% of all the votes, and 60% of A’s votes were cast by males.
• Candidate B received 35% of all the votes, and 40% of B’s votes were cast by males.
• Candidate C received 15% of all the votes, and 20% of C’s votes were cast by males.
A person V , who voted in the election, is selected at random. Find the probability that V
(i) voted for A and is male, [1]
(ii) is female, [2]
(iii) voted for C, given that V is male. [2]
Exercise 177. (8864 N2012/I/9. Answer on p. 392.) A company is selling ‘Pluto’ cars.
The age x, in years, and the advertised price y, in hundreds of dollars, for ten Pluto cars
are given in the following table.
Car  1    2    3    4    5    6    7    8    9    10
x    5.0  4.5  6.0  5.2  5.6  6.0  3.0  2.0  7.1  7.5
y    85   90   65   72   75   70   130  150  42   42
(i) Draw a sketch of the scatter diagram for the data, as shown on your calculator. [2]
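(ii) Find the product moment correlation coefficient and comment on its value in the context
of the data. [2]
(iii) Find the equation of the regression line of y on x, in the form y = mx + c, giving the
values of m and c correct to 2 decimal places. Sketch this line on your scatter diagram. [2]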
(iv) Calculate an estimate of the advertised price of a Pluto car which is
(a) 4 years old, [2]
(b) 9 years old. [2]
(v) Comment on the reliability of each of your estimates in part (iv). [2]

Exercise 178. (8864 N2012/I/10. Answer on p. 393.) ‘Sunbrite’ plants are sold in trays
of 12 plants. For any Sunbrite plant, the probability that it flowers is 0.8, independently of
all other Sunbrite plants. Find the probability that from one tray of Sunbrite plants
(i) exactly 10 will flower, [2]
(ii) fewer than 8 will flower. [2]

A gardener A buys 8 trays of Sunbrite plants.
(iii) Use a suitable approximation to estimate the probability that more than 75 plants will
flower. State the mean and variance of the distribution that you use. [4]
Two other gardeners, B and C, each buy 8 trays of Sunbrite plants.
(iv) Find the probability that, for at least two of the three gardeners A, B, and C, more
than 75 of their plants will flower. [3]
Exercise 179. (8864 N2012/I/11. Answer on p. 393.) A company sells balls of string. A
manager claims that the average length of string in a ball is at least 300 m. To test this
claim, a random sample of 100 balls of string is checked and the lengths of string per ball,
x m, are summarised by $\sum (x - 300) = -60$ and $\sum (x - 300)^2 = 1240$.
(ii) Test at the 5% significance level whether the manager’s claim is valid. [5]
The manufacturing process is improved and the new population variance is known to be
12.1 m$^2$. A new random sample of 100 balls of string is chosen and the mean of this sample
is k m. A test at the 10% significance level indicates that the manager’s claim is valid for
this improved process.
(iii) Find the least possible value of k, giving your answer correct to 2 decimal places. [3]

Exercise 180. (8864 N2012/I/12. Answer on p. 394.) A supermarket sells two types of
grapefruit, A and B. The masses, in kilograms, of the grapefruit of each type have
independent normal distributions. The means and standard deviations of these distributions,
and the selling prices, in $ per kilogram, are shown in the following table.
         Mean (kg)   Standard deviation (kg)   Selling price ($ per kg)
Type A   0.25        0.02                      1.50
Type B   0.35        0.03                      2.40
Stating clearly the mean and variance of all distributions that you use, find the probability
that (i) the total mass of 10 randomly chosen grapefruit of type A is less than 2.4 kg, [3]
(ii) the total mass of 6 randomly chosen grapefruit of type A is within 0.2 kg of the total
mass of 5 randomly chosen grapefruit of type B. [4]
(iii) Mrs Woo buys 3 grapefruit of type A and 3 grapefruit of type B. Mr Tan buys 10
grapefruit of type A. Stating clearly the mean and variance of the distribution that you
use, find the probability that Mrs Woo pays more than Mr Tan. [6]
Exercise 181. (8864 N2011/I/6. Answer on p. 394.) Independent events A and B are
such that P(A) = a and P(B) = b. Given that P(A ∪ B) = 0.46 and P(A ∩ B) = 0.04, find a
quadratic equation satisfied by a and hence find the possible values of P(A). [5]
Exercise 182. (8864 N2011/I/7.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Two thousand students travel to college either
by car, by bicycle or on foot. Any given student travels by the same method each day.
The numbers in each of two year-groups using each method of travel are summarised in the
table below.
         Car   Bicycle   On foot
Year 1   200   400       500
Year 2   240   360       300
Researcher A carries out a survey to investigate the length of students’ journey times to
college, using a random sample of 100 students.
(i) Explain what is meant in this context by the term ‘a random sample’. [2]
Researcher B decides to use stratified sampling with three strata from the combined year-
groups, also using 100 students.
(ii) Identify the three strata and find the size of the sample taken from each stratum. [2]
(iii) State one advantage that stratified sampling would have compared to random sampling
in this context, and state how a better stratified sample of size 100 could have been achieved,
using the data in the above table. [2]

Exercise 183. (8864 N2011/I/8. Answer on p. 395.) The air temperature T , in °C, and
the altitude H, in metres, were recorded at noon on a certain day at each of 8 locations in
a mountainous region. The results are summarised in the table below.
H   200  285  335  450  581  878  1225  1550
T   27   23   22   20   15   14   8     6
of this question. [2]
(iii) Find the equation of the regression line of T on H. Sketch this line on your scatter
diagram. [2]
(iv) Calculate an estimate of the air temperature at noon at a place in the region with
altitude 1000 metres. Comment on the reliability of this estimate. [2]
Exercise 184. (8864 N2011/I/9. Answer on p. 395.) A certain type of light bulb is
designed to have a mean lifetime of 12,000 hours. The standard deviation of the lifetimes
is 1,400 hours. Tests on a random sample of 50 bulbs from a certain batch give a mean
lifetime of 11,500 hours.
(i) Test at the 1% level of significance whether this particular batch is substandard (that
is, the mean lifetime of bulbs in the batch is less than 12,000 hours). [4]
Tests on a random sample of 50 bulbs from another batch give a mean lifetime of T hours.
A test at the 5% level of significance does not indicate that this batch is substandard.
(ii) Obtain an equation for the least possible value of T , and solve it. [4]
Exercise 185. (8864 N2011/I/10. Answer on p. 395.) Jon attempts a puzzle in his daily
newspaper each day. The probability that he will complete the puzzle on any given day is
0.8, independently of any other day.
(i) Find the probability that, in a given week of 7 days, Jon will complete the puzzle
(a) exactly 3 times, [1]
(b) at least 5 times. [2]
(ii) Find the probability that, over a period of 10 weeks, Jon completes the puzzle at least
5 times each week. [2]
(iii) Using a suitable approximation, find the probability that, over a period of 10 weeks,
Jon completes the puzzle at least 50 times in total. State the mean and variance of your
approximation. [4]

Exercise 186. (8864 N2011/I/11. Answer on p. 396.) Box A contains 5 red balls, 4 green
balls and 1 yellow ball. Box B contains 6 red balls and 2 green balls.
One of the boxes is selected by tossing two fair coins. If both coins show heads, box A is
selected and otherwise box B is selected.
(i) One ball is chosen at random from the selected box and the colour of the ball is noted.
(a) Draw a tree diagram to represent this situation. [3]
(b) Find the probability that a red ball is chosen. [2]
(c) Given that a red ball is chosen, find the probability that it comes from box A. [2]
(ii) Instead, two balls are chosen at random, without replacement, from the selected box.
Find the probability that both balls are the same colour. [4]
Exercise 187. (8864 N2011/I/12. Answer on p. 396.) Boys and girls visiting a theme
park have masses, in kg, that are independent and are normally distributed with means
and standard deviations as shown in the following table.

        Mean   Standard deviation
Boys    60     12
Girls   50     10
(i) Find the probability that the mass of a boy chosen at random is between 50 kg and 70
kg. [2]
(ii) A boy and a girl are chosen at random. Find the probability that the mass of the boy is
greater than the mass of the girl, stating clearly the mean and variance of the distribution
that you use. [4]
(iii) On a ride at the theme park, trains carrying up to 5 people travel around a track. The
total mass of the people on the train must not exceed the safety limit of 300 kg. Three
boys and two girls are chosen at random. Find the probability that their total mass is less
than 300 kg, stating clearly the mean and variance of the distribution that you use. [4]
(iv) The track is improved and new trains carrying up to 6 people are designed. The new
safety limit is L kg. Obtain the equation for L, given that it is 95% certain that 6 boys
chosen at random have a total mass not exceeding L kg. Hence find L. [3]
Exercise 188. (8864 N2010/I/6. Answer on p. 396.) The events A and B are such that
P(A) = 0.6, P(B) = 0.3 and P(A∣B) = 0.2. Find the probability that (i) both A and B
occur, [1] (ii) at least one of A and B occurs, [2] (iii) exactly one of A and B occurs. [2]

Exercise 189. (8864 N2010/I/7. Answer on p. 397.) A group of students take an
examination in Chemistry. A student who fails the examination at the first attempt is
allowed one further attempt. For a randomly chosen student, the probability of passing the
examination at the first attempt is 0.7 and the probability of passing at the second attempt
is 0.9. The information is shown in the tree diagram below.
(i) Find the probability that a randomly chosen student fails the examination at both
attempts. [1]
(ii) Given that a student passes the examination, find the probability that it is at the second
attempt. [3]
(iii) Three students taking the examination are chosen at random. Find the probability
that two of them pass at the first attempt and the other passes at the second attempt. [3]
Exercise 190. (8864 N2010/I/8. Answer on p. 397.) A college has 1,400 students in Year
One, 900 students in Year Two and 700 students in Year Three. It is intended to carry out
a survey to investigate how much students spend on new clothes each year.
(i) Describe how to obtain a stratified random sample of 60 students to take part in the
survey. [2]
(ii) Describe, in this context, one advantage that stratified sampling has compared to simple
random sampling. [1]
The amount of money spent by a student is denoted by $X. The values for a (non-stratified)
random sample of 50 students are summarised by ∑x = 10,450, ∑x² = 2,235,000. The
population mean and variance of X are denoted by µ and σ² respectively.
(iii) Calculate unbiased estimates of µ and σ². [3]
A significance test of the null hypothesis µ = 200 against the alternative hypothesis µ > 200
is carried out at the 10% level of significance.
(iv) Without doing any further calculations, state two assumptions or approximations that
are involved when carrying out the significance test using the above sample data. [2]

Exercise 191. (8864 N2010/I/9. Answer on p. 397.) The probability of any sunflower
seed germinating when it is sown is 0.7, independently of all other sunflower seeds. Find
the probability that, when 8 seeds are sown, (i) exactly 6 will germinate, [2]
(ii) at least 6 will germinate. [2]

(iii) 60 sunflower seeds are sown. Use a suitable approximation to estimate the probability
that fewer than 40 will germinate. You should state the mean and variance of the
approximation. [4]
Exercise 192. (8864 N2010/I/10. Answer on p. 397.) A factory produces components for
an electrical product. The masses of the components are normally distributed with standard
deviation 1.2 grams. The factory owner claims that the mean mass of the components is
15 grams. A random sample of 80 components was taken and found to have a mean mass
of 15.25 grams. (i) Test the owner’s claim at the 5% level of significance. [4]
The owner purchases new machinery to produce the components, and the standard deviation
remains unchanged. The owner claims the mean mass is now less than 15 grams. A
new random sample of 80 components is taken.
(ii) Find the set of values within which the mean mass of this sample must lie for the
owner’s new claim to be accepted at the 5% level of significance. [5]
Exercise 193. (8864 N2010/I/11. Answer on p. 398.) (a) Eight pairs of values of variables
x and y are measured. Draw a sketch of a possible scatter diagram of the data for each of
the following cases: the product moment correlation coefficient is approximately
(i) 0, [1] (ii) −0.8. [1]

(b) The monthly earnings, y thousand dollars, of 7 workers of different ages, x years, in a
particular company are given in the table below.
x 18 20 22 27 35 45 55
y 2.55 2.65 2.85 3.15 4.76 5.45 6.26
(ii) Find the product moment correlation coefficient. [1]
(iii) Find the equation of the regression line of y on x in the form y = mx + c, giving the
values of m and c correct to 4 decimal places. [1]
(iv) Calculate an estimate of the monthly earnings of a 40-year-old worker. State why you
would expect this to be a reliable estimate. [2]
(v) All workers are given an increase of N thousand dollars per month. Without any further
calculations state any change you would expect in the values of your constants m and c
found in part (iii). [2]

Exercise 194. (8864 N2010/I/12. Answer on p. 399.) Sweets of a certain brand are
individually wrapped. The masses, in grams, of the unwrapped sweets and the wrappers
have independent normal distributions with means and standard deviations as shown in
the table below.
                   Mean  Standard deviation
Unwrapped sweets   40    3
Wrappers           4     0.5
(i) Find the probability that an individual unwrapped sweet has mass less than 36 grams.
[1]
(ii) State the mean and variance of the mass of an individual wrapped sweet. Find the
probability that a wrapped sweet has mass between 42 grams and 46 grams. [3]
Twelve wrapped sweets are packed together in a cardboard tube. The mass of an empty
tube is normally distributed with mean 50 grams and standard deviation 5 grams. The
masses of all sweets and tubes are independent.
(iii) Find the probability that the total mass of a tube containing 12 wrapped sweets is
more than 600 grams, stating clearly the mean and variance of the distribution that you
use. [4]
A rival company produces similar tubes of sweets. The masses of these tubes of sweets
have a normal distribution. Over a long period of time, it is found that 5% of them have a
mass less than 450 grams and 8% have a mass more than 550 grams.
(iv) Find the mean and variance of this distribution. [5]
Exercise 195. (8864 N2009/I/6. Answer on p. 399.) Three researchers, A, B, and C,
share an office. When the office phone rings, the probabilities of the call being for each of
them are as follows.
A ∶ 0.2, B ∶ 0.3, C ∶ 0.5.
The probabilities of each researcher being in the office when the phone rings are as follows.
A ∶ 0.7, B ∶ 0.6, C ∶ 0.8.
All the probabilities are independent. Find the probability that, when the phone rings,
(i) the call is for A and A is in the office, [1]
(ii) the researcher being called is in the office, [2]
(iii) the call is for C, given that the researcher being called is not in the office. [2]

Exercise 196. (8864 N2009/I/7. Answer on p. 399.) A and B are two events such that
P(A) = 1/3, P(B) = 2/5 and P(A ∪ B) = 17/30.
(i) Find P(A ∩ B). [1]
(ii) Show that A and B are not independent. [1]
(iii) Using a Venn diagram, or otherwise, find P(A′ ∪ B). [3]
Exercise 197. (8864 N2009/I/8. Answer on p. 399.) Components in machines used in
a factory wear out and need to be replaced. The lifetime of a component has a normal
distribution with mean 120 days and standard deviation 18 days.
(i) Find the probability that the lifetime of a component is more than 144 days. [2]
(ii) Two components are chosen at random. Find the probability that one has a lifetime of
more than 144 days and one has a lifetime of less than 144 days. [2]
A company develops a new design for the component. The standard deviation of the
lifetimes remains 18 days, but the company claims that the mean lifetime is longer than
for the old components. From a random sample of 50 components of the new design, the
sample mean is 124 days.
(iii) Test at the 5% level of significance whether there is sufficient evidence to support the
company’s claim. [4]
Exercise 198. (8864 N2009/I/9. Answer on p. 400.) A liquid nutrient is added to the
soil around the fruit trees in an orchard, with the aim of increasing the total weight of fruit
produced by the trees. For each of 8 trees, the volume of liquid nutrient, x cm³, and the
corresponding weight, y kg, of fruit per tree is given in the table below.
x 0 20 40 60 90 120 160 200
y 15.1 15.7 16.2 16.8 16.7 16.5 17.3 18.1
(i) Give a sketch of the scatter diagram for the data, as shown on your calculator.
(ii) Calculate the product moment correlation coefficient and comment on its value in the
context of the data. [2]
(iii) Calculate the equation of the regression line of y on x. Sketch this line on your scatter
diagram. [2]
(iv) Estimate the weight of fruit on a tree when 135 cm³ of liquid nutrient is added to its soil.
[1]
(v) Explain why it might be unsuitable to use the equation in part (iii) to estimate how
much liquid nutrient would be needed for a tree to yield 20 kg of fruit. [1]

Exercise 199. (8864 N2009/I/10. Answer on p. 400.) Over a long period of time, it is
found that 20% of candidates who take a particular piano examination fail the examination.
(i) Find the probability that, in a group of 10 randomly chosen candidates who take the
examination, exactly 2 will fail. [2]
(ii) It is given that 15% of the candidates who pass the piano examination are awarded a
distinction. Find the probability that, in a randomly chosen group of 10 candidates who
take the examination, fewer than 2 will be awarded a distinction. [3]
(iii) Use a suitable approximation to estimate the probability that, in a group of 50 randomly
chosen candidates who take the examination, at most 12 will fail. You should state the
mean and variance of the distribution used in the approximation. [4]
Exercise 200. (8864 N2009/I/11. Answer on p. 401.) (a) An insurance company receives
a large number of claims for flood damage. On a particular day the company receives 72
such claims. Because of staff shortages, it is only possible to process 8 of these claims.
Parts (a)(i) and (a)(ii) are no longer in the 8865 (revised) syllabus, so you can skip them.
(i) Describe how you would choose a systematic random sample of size 8 from the received
claims. [2]
(ii) Comment on whether this method of sampling gives a better indication of the value of
the 72 claims as compared to simply choosing as the sample the first 8 claims received. [1]
(b) From the claims received by the company, over a long period of time, a random sample
of 120 is taken. The values of the claims, $x, are summarised by
∑(x − 1000) = 5320, ∑(x − 1000)² = 8282000.
(ii) What do you understand by the term ‘unbiased estimate’? [1]
(iii) The population mean is denoted by $µ. Using the sample data, a significance test of
the null hypothesis µ = 1000 against the alternative hypothesis µ ≠ 1000 is carried out at
the α% level of significance. Find the set of values of α for which the null hypothesis will
be rejected. [5]

Exercise 201. (8864 N2009/I/12. Answer on p. 401.) (a) The plums sold by a supermarket
are graded ‘small’, ‘medium’ or ‘large’. The masses of the plums have a normal
distribution. Plums with a mass less than 22 grams are graded as small, plums with a mass
greater than 29 grams are graded as large and the rest are graded as medium. Given that
30% of plums are small and 20% are large, find the mean and standard deviation of the
distribution. [4]
(b) The masses, in kilograms, of apples and nectarines sold by the supermarket have independent
normal distributions with means and standard deviations as shown in the following
table.
            Mean  Standard deviation
Apples      0.15  0.03
Nectarines  0.07  0.02
(i) Two apples and four nectarines are chosen at random. Find the probability that the
total mass of the two apples is greater than the total mass of the four nectarines. [4]
(ii) Apples cost $9 per kilogram and nectarines cost $12 per kilogram. Find the mean
and the variance of the total cost of two apples and four nectarines and hence find the
probability that the total cost is between $5 and $6. [5]
Exercise 202. (8864 N2008/I/7. Answer on p. 402.) An examination is marked out of
100. It is taken by a large number of candidates. The mean mark, for all candidates, is
72.1, and the standard deviation is 15.2.
(i) Give a reason why a normal distribution, with this mean and standard deviation, would
not give a good approximation to the distribution of marks. [1]
(ii) A random sample of 50 of the candidates is taken. Calculate the probability that the
mean mark of this sample lies between 70.0 and 75.0. [3]
Exercise 203. (8864 N2008/I/8. Answer on p. 402.) A baker makes loaves of bread. 60%
of the loaves that he makes are ‘crusty’.
(i) A customer buys six randomly chosen loaves. Find the probability that exactly three of
them are crusty. [2]
(ii) A market trader buys 40 randomly chosen loaves. Use a suitable approximation to find
the probability that at least 20 of them are crusty. [4]
(iii) The mass of a loaf has a normal distribution with mean 1.24 kg and standard deviation
σ kg. The probability that a randomly chosen loaf has mass less than 1 kg is 0.04. Find
the value of σ. [3]

Exercise 204. (8864 N2008/I/9. Answer on p. 402.) Two children, Tan and Mui, are
each to be given a pen from a box containing 3 red pens and 5 blue pens. One pen is chosen
at random and given to Tan. A green pen is then put in the box. A second pen is chosen
at random from the box and given to Mui.
(i) Draw a tree diagram to represent the possible outcomes. [2]

(ii) Write down the conditional probability that Mui’s pen is blue, given that Tan’s pen is
red. [1]
(iii) Find the probability that Mui’s pen is red. [2]

(iv) Find the conditional probability that Tan’s pen is red, given that Mui’s pen is blue.
[5]
Exercise 205. (8864 N2008/I/10. Answer on p. 403.) A consumer association is testing
the lifetime of a particular type of battery that is claimed to have a lifetime
of 150 hours. A random sample of 70 batteries of this type is tested and the lifetime, x
hours, of each battery is measured. The results are summarised by
∑x = 10317, ∑x² = 1540231.
The population mean lifetime is denoted by µ hours. The null hypothesis µ = 150 is to be
tested against the alternative hypothesis µ < 150.
(i) Find the p-value of the test and state the meaning of this p-value in the context of the
question. [5]
A second random sample of 50 batteries of this type is tested and the lifetime, y hours, of
each battery is measured, with results summarised by
∑y = 7331, ∑y² = 1100565.
(ii) Combining the two samples into a single sample, carry out a test, at the 10% significance
level, of the same null and alternative hypotheses. [6]

Exercise 206. (8864 N2008/I/11. Answer on p. 403.) An engineering company makes
cranes. The numbers, x, sold in each three-month period for two years together with the
profits, y thousand dollars, on the sale of these cranes are given in the following table.
x 15 17 13 21 16 22 14 18
y 290 350 270 430 340 410 300 360
(i) Give a sketch of the scatter diagram for the data as shown on your calculator. [2]
(ii) Find x̄ and ȳ, and mark the point (x̄, ȳ) on your scatter diagram. [2]
(iii) Calculate the equation of the regression line of y on x, and draw this line on your
scatter diagram. [2]
(iv) Calculate the product moment correlation coefficient, and comment on its value in
relation to your scatter diagram. [2]
(v) For the next three-month period, the sales target is 20 cranes. Estimate the corresponding
profit. [2]
(vi) The company’s sales director uses the regression line in part (iii) to predict the profit
if 40 cranes were to be sold in a three-month period. Comment on the validity of this
prediction. [2]
Exercise 207. (8864 N2008/I/12. Answer on p. 404.) A supermarket obtains a large
supply of apples of a single variety. The mass of an apple has a normal distribution with
mean 0.234 kg and standard deviation 0.025 kg. Some of the apples are packed, at random,
into ‘small’ bags, each containing 5 apples, and others are packed, at random, into ‘large’
bags, each containing 10 apples.
(i) Find the probability that a randomly chosen small bag has a mass exceeding 1.2 kg. [4]
(ii) Find the probability that the total mass of two randomly chosen small bags is within
±0.2 kg of the mass of a randomly chosen large bag. [4]
Lee buys two small bags at $1.50 per kg, and Foo buys one large bag at $1.20 per kg.
(iii) Find the probability that Lee pays at least $0.50 more than Foo. [6]
Exercise 208. (8864 N2007/I/6. Answer on p. 404.) A manufacturer produces packets
of margarine. The mass of margarine in a packet has a normal distribution with mean 502
g and standard deviation 0.8 g.
(i) Find the proportion of packets which contain less than 500 g of margarine. [2]
The manufacturer increases the mean amount of margarine in a packet to µ g. The standard
deviation remains unchanged. Only 1 packet in 1000, on average, now contains less than
500 g.
(ii) Find µ, correct to 1 decimal place. [3]

Exercise 209. (8864 N2007/I/7. Answer on p. 404.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. A school has a canteen where students can buy
their lunch. Each day most, but not all, students buy their lunch in the canteen.
The headteacher wants to find out what students think of the lunches provided in the
canteen. On one particular day she selects a sample of students to interview from those
buying their lunch by
• choosing at random one of the first 10 students to buy their lunch,

• then choosing every 10th student after the first student chosen.
(i) What is this type of sampling method called? [1]
(ii) State one advantage and one disadvantage of the sampling method used in this context.
[2]
(iii) Describe an alternative sampling method which would be better in this case. [2]
Exercise 210. (8864 N2007/I/8. Answer on p. 404.) Seven cities in a certain country are
linked by rail to the capital city. The table below shows the distance of each city from the
capital and the rail fare from the city to the capital.
City A B C D E F G
Distance, x km 124 44 76 148 16 180 104
Rail fare, $y 156 53 99 169 23 177 138
(ii) Calculate the product moment correlation coefficient. [1]
You are given that the regression line of y on x has equation y = 16.7 + 1.01x, where the
coefficients are given correct to 3 significant figures.
(iii) Calculate the equation of the regression line of x on y, giving your answer in the form
x = a + by. [1]
(iv) Use the appropriate regression line to estimate
(a) the rail fare from a city that is 28 km from the capital, [2]
(b) the distance of a city from the capital if the rail fare is $198. [2]
(v) Comment briefly on the reliability of the estimates in part (iv). [2]

Exercise 211. (8864 N2007/I/9. Answer on p. 405.) A random variable X has a binomial
distribution with n = 6 and probability of success p.
(i) Write down an expression, in terms of p, for P(X = 4). [1]

It is given that p = 1/4. (ii) Find P(X = 4), giving your answer as a fraction. [1]
(iii) The mean and standard deviation of X are denoted by µ and σ respectively. Find
P(µ − σ < X < µ + σ), correct to 2 decimal places. [5]
Exercise 212. (8864 N2007/I/10. Answer on p. 405.) Bottles of a particular brand of
washing-up liquid are said to contain 500 ml. A random sample of 50 bottles is taken and
the volumes of liquid in the bottles are measured. The volumes, x ml, are summarised by
∑(x − 500) = −35.8 and ∑(x − 500)² = 150.5.
(ii) Assuming a normal distribution, test at the 5% significance level whether the population
mean volume is less than 500 ml. [4]
(iii) State, giving a reason, whether it is necessary to assume a normal distribution for the
test to be valid. [1]
Exercise 213. (8864 N2007/I/11. Answer on p. 405.) The table below shows the results
of a survey of the 120 cars in a car park, in which the colour of each car and the gender of
the driver were recorded.
Male Female
Green 18 12
Blue 48 22
Red 6 14
One of the cars is selected at random. M is the event that the car selected has a male
owner. G is the event that the car selected is green. B is the event that the car selected is
blue. R is the event that the car selected is red.
(i) Find the following probabilities: (a) P(M), (b) P(M ∩ G), (c) P(M ∪ B), (d) P(M ∣ R′).
[1 mark each.]
(ii) Determine whether the events M and G are independent, justifying your answer. [2]
It is given that bicycle racks are fitted to 20% of the green cars, 30% of the blue cars and
5% of the red cars. One of the cars is selected at random and found to have a bicycle rack
fitted. (iii) What is the probability that it is a blue car? [5]

Exercise 214. (8864 N2007/I/12. Answer on p. 406.) Men and women have masses,
in kg, that are normally distributed with means and standard deviations as shown in the
following table.
Mean mass Standard deviation

Men 75 12.5
Women 55 10.5
(i) Two men are chosen at random. Find the probability that one of the men has mass
more than 90 kg and the other has mass less than 90 kg. [4]
(ii) One man and one woman are chosen at random. Find the probability that the woman’s
mass is greater than the man’s. [4]
The safety limit for a hotel elevator is 530 kg. (iii) Six men are chosen at random. Find
the probability that their total mass is greater than 530 kg. [4]
(iv) Six male hotel guests enter the elevator, at a time when a large number of sumo
wrestlers are staying at the hotel. Give two reasons why the probability that their total
mass exceeds 530 kg may be different from the value calculated in part (iii). [2]
Exercise 215. (8174 N2006/II/8. Answer on p. 406.) A and B are independent events
such that P(A) = 0.6 and P(A ∪ B) = 0.7. Find P(A ∩ B′). [6]
Exercise 216. (8174 N2006/II/9. Answer on p. 406.) This question is no longer in the
8865 (revised) syllabus, so you can skip it. Some students are conducting a survey at a
sports club. They each question a sample of the club members.
(i) Anil decides to choose the first 20 men and the first 20 women he sees. What name is
given to this type of sampling? [1]
(ii) Betty decides to choose every tenth person on the membership list. What name is given
to this type of sampling? [1]
(iii) Calvin decides to use random sampling. Describe briefly one way in which he could
select his sample. [2]
The club has 240 members and 3 sections — badminton, squash and tennis. The table
shows the number of men and women in each section.
Male Female TOTAL

Badminton 32 12 44
Squash 60 40 100
Tennis 48 48 96
TOTAL 140 100 240
(iv) Dennis decides to take a stratified sample of size 60 from the total membership. (a)
How many women does he select? (b) How many men from the squash section does he
select? [3]

Exercise 217. (8174 N2006/II/13. Answer on p. 406.) The probability that a resident of
a certain town watches a particular television programme is 0.3.
(i) Find the probability that exactly 4 out of 12 residents watch the programme. [3]
(ii) Use a suitable approximation to find the probability that, out of 80 residents, more
than 20 but less than 30 watch the programme. [7]
Exercise 218. (8174 N2006/II/14. Answer on p. 407.) A team either wins or loses each
of their matches. If the team wins a match, the probability that it wins the next match
is 0.8. If the team loses a match, the probability that it wins the next match is 0.4. The
team plays 4 matches in total. The team wins the first match. Calculate the probability
that the team wins
(i) both the second and third matches, [2]

(ii) the fourth match, [5]
(iii) at least 3 of the 4 matches played. [3]
Exercise 219. (8174 N2006/II/14-OR. Answer on p. 407.) The heights of male students
in a college can be modelled using a normal distribution with mean 176 cm and standard
deviation 4 cm.
(i) Calculate the probability that one of these students, chosen at random, is less than 170
cm tall. [2]
(ii) Find the height that is exceeded by 10% of these students. [2]
In another college there are 1000 female students. Of these, 6 are less than 150 cm tall and
883 of them are less than 175 cm tall.
(iii) Assuming the heights of these students can be modelled using a normal distribution
with mean m and standard deviation s, find the value of m and of s. [6]

Part V
Answers to Exercises
My answers here are often more verbose than what would be necessary for you to get the
full credit on an exam. The reason is to help you understand my answers better.

60 Answers to Exercises in Part I: Functions and Graphs
Answer to Exercise 1. The error is in Step #5. Since x = y, we have x − y = 0. Hence,
we cannot divide both sides by x − y.
Answer to Exercise 2. Given f (x) = 7x−3, we have f (0) = 7⋅0−3 = −3, f (1) = 7⋅1−3 = 4,
and f (2) = 7 ⋅ 2 − 3 = 11.
Answer to Exercise 3. Given g (the function that maps each country to its capital), we
have g(France) = Paris and g(Japan) = Tokyo.
Answer to Exercise 4. (i) [Graph not reproduced here.]
(ii) [Graph not reproduced here.]

Answer to Exercise 5. (i) [Graph not reproduced here.]
(ii) [Graph not reproduced here.]


Answer to Exercise 6. The graphs of all three equations are below: (a) $y = 2x^2 + x + 1$
(red). (b) $y = -2x^2 + x + 1$ (blue). (c) $y = x^2 + 6x + 9$ (green).
(a) Since $b^2 - 4ac = 1^2 - 4(2)(1) = 1 - 8 = -7 < 0$, there are no horizontal intercepts. The
vertical intercept is $c = 1$. The turning point is at $x = -\frac{b}{2a} = -\frac{1}{4} = -0.25$.
(b) Since $b^2 - 4ac = 1^2 - 4(-2)(1) = 1 + 8 = 9 > 0$, there are two horizontal intercepts,
namely $\frac{-b \pm \sqrt{9}}{2a} = \frac{-1 \pm \sqrt{9}}{-4} = 1, -0.5$. The vertical intercept is $c = 1$. The turning point is
at $x = -\frac{b}{2a} = \frac{-1}{-4} = 0.25$.
(c) Since $b^2 - 4ac = 6^2 - 4(1)(9) = 36 - 36 = 0$, there is one horizontal intercept, namely
$-\frac{b}{2a} = -\frac{6}{2} = -3$. The vertical intercept is $c = 9$. The turning point is at $x = -\frac{b}{2a} = -3$.

Answer to Exercise 7. (i) The quadratic equation $y = ax^2 + bx + c$ has (a) two distinct real
roots if and only if $b^2 - 4ac > 0$; (b) two equal roots if and only if $b^2 - 4ac = 0$; and (c) no
real roots if and only if $b^2 - 4ac < 0$.
(ii) (a) $ax^2 + bx + c$ is positive for all possible values of x if and only if $a > 0$ (so ∪-shaped)
and $b^2 - 4ac < 0$ (so doesn’t touch x-axis).
(b) $ax^2 + bx + c$ is negative for all possible values of x if and only if $a < 0$ (so ∩-shaped) and
$b^2 - 4ac < 0$ (so doesn’t touch x-axis).
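Answer to Exercise 8.
$$\begin{aligned}
\frac{5^{3x} \cdot 25^{1-x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})}
&= \frac{5^{3x} \cdot 5^{2(1-x)}}{5^{2x+1} + 3(25^x) + 17(5^{2x})} && \because 25 = 5^2 \\
&= \frac{5^{2+x}}{5^{2x+1} + 3(25^x) + 17(5^{2x})} && \text{Add the exponents} \\
&= \frac{5^{2+x}}{5^{2x+1} + 3(5^{2x}) + 17(5^{2x})} && \because 25 = 5^2 \\
&= \frac{5^{2+x}}{5^{2x}(5^1 + 3 + 17)} && \text{Factorise out } 5^{2x} \\
&= \frac{5^{2+x}}{5^{2x}(25)} = \frac{5^x}{5^{2x}} = \frac{1}{5^x} = 5^{-x}.
\end{aligned}$$
$$\begin{aligned}
\frac{8^{x+2} - 34(2^{3x})}{(\sqrt{8})^{2x+1}}
&= \frac{8^{x+2} - 34(2^{3x})}{(\sqrt{8})^{2x}(\sqrt{8})^{1}} && \text{Splitting out the exponents} \\
&= \frac{8^{x+2} - 34(2^{3x})}{8^x \sqrt{8}} = \frac{8^{x+2} - 34(8^x)}{8^x \sqrt{8}} \\
&= \frac{8^x (8^2 - 34)}{8^x \sqrt{8}} && \text{Factorise out the } 8^x \\
&= \frac{8^2 - 34}{\sqrt{8}} = \frac{64 - 34}{\sqrt{8}} \\
&= \frac{30}{\sqrt{8}} = \frac{30}{2\sqrt{2}} = \frac{15}{\sqrt{2}}.
\end{aligned}$$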
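
A quick numerical spot-check of both simplifications is sketched below. The helper names `expr1` and `expr2` are illustrative only, and agreement at a handful of sample values of x is evidence, not a proof.

```python
import math

def expr1(x):
    # Left-hand side of the first simplification.
    return (5 ** (3 * x) * 25 ** (1 - x)) / (5 ** (2 * x + 1) + 3 * 25 ** x + 17 * 5 ** (2 * x))

def expr2(x):
    # Left-hand side of the second simplification.
    return (8 ** (x + 2) - 34 * 2 ** (3 * x)) / math.sqrt(8) ** (2 * x + 1)

for x in [-1.5, 0.0, 0.7, 2.0]:
    print(x, expr1(x), 5 ** (-x), expr2(x), 15 / math.sqrt(2))
```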

Answer to Exercise 9. (i) $x^{(a^b)} = x^{ab}$ is false. Here’s a counter-example. Let $x = 2$, $a = 3$,
$b = 4$. Then $x^{(a^b)} = 2^{(3^4)} = 2^{81}$, but $x^{ab} = 2^{3 \times 4} = 2^{12}$; the two are clearly not equal.
(ii) $(x^a)^b = x^{ab}$ is true, as we now prove:
$$\begin{aligned}
(x^a)^b &= \overbrace{(x^a) \cdot (x^a) \cdots (x^a)}^{b \text{ times}} \\
&= \overbrace{\Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big) \cdot \Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big) \cdots \Big(\overbrace{x \cdot x \cdots x}^{a \text{ times}}\Big)}^{b \text{ times}} \\
&= x^{ab}.
\end{aligned}$$
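
The counter-example in (i) and one instance of the identity in (ii) can be confirmed with exact integer arithmetic. This is only an illustrative check with the specific values x = 2, a = 3, b = 4; the variable names are made up here.

```python
x, a, b = 2, 3, 4

tower = x ** (a ** b)            # x^(a^b) = 2^(3^4) = 2^81
power_of_power = (x ** a) ** b   # (x^a)^b = (2^3)^4
product_exponent = x ** (a * b)  # x^(ab) = 2^12

print(tower == product_exponent)           # statement (i) fails for these values
print(power_of_power == product_exponent)  # statement (ii) holds for these values
```

Note that Python's `**` is right-associative, so `x ** a ** b` without parentheses means `x ** (a ** b)`.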

Answer to Exercise 11. (i) $b_y = 4 \cdot 2^{(y-1990)/7}$.
(ii)
(iii) In 2025, there’ll be $b_{2025} = 4 \cdot 2^{35/7} = 4 \cdot 2^5 = 128$ Singaporean billionaires.
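
The formula in (i) can be evaluated directly. A minimal sketch follows; the function name `billionaires` is hypothetical, chosen only for this illustration.

```python
def billionaires(year):
    """Model from part (i): 4 billionaires in 1990, doubling every 7 years."""
    return 4 * 2 ** ((year - 1990) / 7)

for year in (1990, 1997, 2025):
    print(year, billionaires(year))
```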
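Answer to Exercise 12. (i) $\ln(1/e^2) = -2$, $\log_5 0.008 = -3$ (since $5^{-3} = 1/125 = 0.008$), $\lg 100000 = 5$.
(ii) $\log_a 16 = 4 \implies a = 2$.
$\log_b 0.25 = -1 \implies b = 4$.
$\log_c 5 = 1 \implies c = 5$.
(iii) $y = 3^x \implies \log_3 y = x$.
$5 = p^q \implies \log_p 5 = q$.
(iv) $\alpha = \log_4 \beta \implies 4^\alpha = \beta$.
$\log_\gamma \delta = 17 \implies \gamma^{17} = \delta$.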
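
The numerical values in (i) and (ii) can be double-checked with Python's standard `math` module. This is a small sketch; the `checks` dictionary is just an illustrative way to organise the comparisons, and floating-point results are compared approximately.

```python
import math

# Each entry: description -> (value computed by Python, value expected by hand)
checks = {
    "ln(1/e^2)":   (math.log(1 / math.e ** 2), -2),
    "log_5 0.008": (math.log(0.008, 5), -3),
    "lg 100000":   (math.log10(100000), 5),
    "log_2 16":    (math.log(16, 2), 4),
    "log_4 0.25":  (math.log(0.25, 4), -1),
    "log_5 5":     (math.log(5, 5), 1),
}

for name, (computed, expected) in checks.items():
    print(f"{name}: computed {computed:.6f}, expected {expected}")
```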

Answer to Exercise 13. (i) $\log_3 3^x = x$.
(ii) $2\log_a 7 + 0.25\log_a 81 - \log_a 3 = \log_a 49 + \log_a 81^{1/4} - \log_a 3 = \log_a 49 + \log_a 3 - \log_a 3 = \log_a 49 = \log_a x$. So $x = 49$.
(iii) $\ln(y - 1) + \ln y = \ln[y(y - 1)] = 2 \iff y(y - 1) = e^2 \iff y^2 - y - e^2 = 0$. By the
quadratic formula,
$$y = \frac{1 \pm \sqrt{(-1)^2 - 4(1)(-e^2)}}{2(1)} = \frac{1 \pm \sqrt{1 + 4e^2}}{2}.$$
For $\ln(y - 1)$ to be defined, we need $y > 1$. This rules out the negative root, so it must be that $y = \left(1 + \sqrt{1 + 4e^2}\right)/2$.
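
As a sanity check on part (iii), the sketch below substitutes the chosen root back into the original equation and confirms that the other root is negative (so the logarithms would be undefined there). The variable names are illustrative only.

```python
import math

disc = 1 + 4 * math.e ** 2
y_plus = (1 + math.sqrt(disc)) / 2    # the root that is kept
y_minus = (1 - math.sqrt(disc)) / 2   # the root that is rejected

# Substituting y_plus back in should give 2 (up to rounding error).
print(y_plus, math.log(y_plus - 1) + math.log(y_plus))
print(y_minus)  # negative, so ln(y) and ln(y - 1) are undefined
```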

Answer to Exercise 15 (a). Graphed below is the equation $y = e^x$.
The vertical intercept is 1, i.e. the graph crosses the y-axis at the point (0, 1).
There are no turning points.
The horizontal asymptote is $y = 0$: as x decreases without bound (i.e. $x \to -\infty$), y grows ever
closer to (but never equals) 0.
There are no lines of symmetry.
[Graph of $y = e^x$ not reproduced here.]

Answer to Exercise 15 (b). Graphed below is the equation $y = 3x + 2$.
The vertical intercept is 2, i.e. the graph crosses the y-axis at the point (0, 2). The
horizontal intercept is −2/3, i.e. the graph crosses the x-axis at the point (−2/3, 0).
There are no turning points.
There are no asymptotes.
There are infinitely many lines of symmetry: specifically, every line that is perpendicular
to the graph is a line of symmetry.
[Graph of $y = 3x + 2$ not reproduced here.]

Answer to Exercise 15 (c). Graphed below is the equation $y = 2x^2 + 1$.
The vertical intercept is 1, i.e. the graph crosses the y-axis at the point (0, 1).
The turning point is (0, 1).
There are no asymptotes.
There is one line of symmetry, namely $x = 0$ (this is also the vertical axis).
[Graph of $y = 2x^2 + 1$ not reproduced here.]

3. Press the blue 2ND button and then e^x (which corresponds to the LN button) to
enter “e^”. Next press X,T,θ,n to enter “X”. Then press −, X,T,θ,n, x², + to enter “−X² +”.
Now press the blue 2ND button and then √ (which corresponds to the x² button), then X,T,θ,n to
enter “√(X”.
√
4. Now press GRAPH and the calculator will graph the equation y = x.

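Answer to Exercise 17. (i) Label the two given equations (1) and (2). Rearrange (1) into $y = 0.2x + 0.4$, which we call (3). Rearrange (2) into $y = x^2 + 5x + 3.6$, which we call (4).
Now plug (3) into (4) to get $0.2x + 0.4 = x^2 + 5x + 3.6$ or $x^2 + 4.8x + 3.2 = 0$ or $5x^2 + 24x + 16 = 0$.
Now use the quadratic formula:
$$x = \frac{-24 \pm \sqrt{24^2 - 4(5)(16)}}{2(5)} = \frac{-24 \pm \sqrt{256}}{10} = -4, -0.8.$$
Correspondingly, $y = -0.4$ or $y = 0.24$.
So the solutions are $(x, y) = (-4, -0.4)$ and $(x, y) = (-0.8, 0.24)$. [TI84 screenshots not reproduced here.]
(ii) Again label the two given equations (1) and (2). Rearrange (1) into $y = 1 - 4x$, which we call (3). Rearrange (2) into $y = -2x^2 + 5x - 3$, which we call (4).
Now plug (3) into (4) to get $1 - 4x = -2x^2 + 5x - 3$ or $2x^2 - 9x + 4 = 0$ or $(2x - 1)(x - 4) = 0$. So
$x = 0.5$ or $x = 4$.
Correspondingly, $y = -1$ or $y = -15$.
So the solutions are $(x, y) = (0.5, -1)$ and $(x, y) = (4, -15)$. [TI84 screenshots not reproduced here.]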
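
The substitution method used in both parts (equate the two expressions for y, solve the resulting quadratic, then recover y from the line) can be sketched as follows. The function name `intersect_line_parabola` and its parameter names are assumptions made for this illustration, with the line written as y = mx + k and the parabola as y = ax² + bx + c.

```python
import math

def intersect_line_parabola(m, k, a, b, c):
    """Solve y = m*x + k and y = a*x^2 + b*x + c simultaneously."""
    # Equating the two right-hand sides gives a*x^2 + (b - m)*x + (c - k) = 0.
    B, C = b - m, c - k
    disc = B * B - 4 * a * C
    if disc < 0:
        return []                       # the line misses the parabola
    r = math.sqrt(disc)
    xs = sorted({(-B - r) / (2 * a), (-B + r) / (2 * a)})
    return [(x, m * x + k) for x in xs]  # recover y from the line

print(intersect_line_parabola(0.2, 0.4, 1, 5, 3.6))   # part (i)
print(intersect_line_parabola(-4, 1, -2, 5, -3))      # part (ii)
```

Because part (i) involves decimals such as 0.2 and 3.6, its printed values may differ from the exact answers in the last few digits.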

Answer to Exercise 18 (a). The system of equations is $y = \frac{1}{1 + \sqrt{x}}$, $y = x^5 - x^3 + 2$.
Rewrite the two equations into a new equation $y = x^5 - x^3 + 2 - \frac{1}{1 + \sqrt{x}}$.
Our goal is to find the horizontal intercepts of this equation. These horizontal intercepts
will give us the solutions to the above system of equations.
1. Graph the equation $y = x^5 - x^3 + 2 - \frac{1}{1 + \sqrt{x}}$.
It looks like there are no horizontal intercepts. Conclusion: This system of equations has
no solutions.
[TI84 screenshot (after Step 1) not reproduced here.]
Answer to Exercise 18 (b). The system of equations is $y = \frac{1}{1 - x^2}$, $y = x^3 + \sin x$.
Rewrite the two equations into a new equation $y = \frac{1}{1 - x^2} - x^3 - \sin x$.
Our goal is to find the horizontal intercepts of this equation. These horizontal intercepts
will give us the solutions to the above system of equations.
1. Graph the equation $y = \frac{1}{1 - x^2} - x^3 - \sin x$.
It looks like there is only one horizontal intercept.
2. Find the horizontal intercept.
It is −1.1790. Conclusion: This system of equations has one solution and its x-coordinate is
−1.1790. To find the corresponding y-coordinate, we need merely plug this value of x into
either of the equations in the original system of equations:
$$y = \frac{1}{1 - x^2} = \frac{1}{1 - (-1.1790)^2} \approx -2.5633.$$
Altogether, this system of equations has one solution: (−1.1790, −2.5633).
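
For readers without a graphing calculator, the horizontal intercept in part (b) can also be located by bisection. This is a different technique from the calculator's built-in zero finder, offered only as a sketch. The bracketing interval [−1.5, −1.05] is an assumption chosen by inspecting the function's sign at those two points, and the names `f` and `bisect` are illustrative.

```python
import math

def f(x):
    # Difference of the two right-hand sides in Exercise 18 (b).
    return 1 / (1 - x * x) - x ** 3 - math.sin(x)

def bisect(func, lo, hi, tol=1e-10):
    """Bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # root lies in the right half
        else:
            hi = mid                # root lies in the left half
    return (lo + hi) / 2

root = bisect(f, -1.5, -1.05)
print(round(root, 4), round(1 / (1 - root * root), 4))
```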

Answer to Exercise 19. (i) Rearrange the inequality $x^2 + 3x - 5 > 6 - 2x^2$ into
$3x^2 + 3x - 11 > 0$. The expression $3x^2 + 3x - 11$ is a ∪-shaped quadratic that equals 0 when
$$x = \frac{-3 \pm \sqrt{3^2 - 4(3)(-11)}}{2(3)} = \frac{-3 \pm \sqrt{141}}{6}.$$
Thus, the inequality holds if $x < \left(-3 - \sqrt{141}\right)/6$ or $x > \left(-3 + \sqrt{141}\right)/6$.
(ii) Rearrange the inequality $(x - 3)(x + 5) < 1$ into $x^2 + 2x - 16 < 0$. The expression $x^2 + 2x - 16$
is a ∪-shaped quadratic that equals 0 when
$$x = \frac{-2 \pm \sqrt{2^2 - 4(1)(-16)}}{2(1)} = \frac{-2 \pm \sqrt{68}}{2} = -1 \pm \sqrt{17}.$$
Thus, the inequality holds if $-1 - \sqrt{17} < x < -1 + \sqrt{17}$.
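
One way to sanity-check solution sets like these is to test the original inequalities just either side of each boundary point. The sketch below does this; the helper names and the offset `eps` are illustrative choices, not part of the textbook's method.

```python
import math

def holds_i(x):
    return x ** 2 + 3 * x - 5 > 6 - 2 * x ** 2

def holds_ii(x):
    return (x - 3) * (x + 5) < 1

lo_i, hi_i = (-3 - math.sqrt(141)) / 6, (-3 + math.sqrt(141)) / 6
lo_ii, hi_ii = -1 - math.sqrt(17), -1 + math.sqrt(17)
eps = 1e-6  # small step to either side of each boundary

# (i) should hold outside [lo_i, hi_i]; (ii) should hold inside (lo_ii, hi_ii).
print([holds_i(lo_i - eps), holds_i(lo_i + eps), holds_i(hi_i - eps), holds_i(hi_i + eps)])
print([holds_ii(lo_ii - eps), holds_ii(lo_ii + eps), holds_ii(hi_ii - eps), holds_ii(hi_ii + eps)])
```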

Answer to Exercise 20 (a) Rewrite the inequality as \( x^3 - x^2 + x - 1 - e^x > 0 \). Graph
\( y = x^3 - x^2 + x - 1 - e^x \). We find \( x^3 - x^2 + x - 1 - e^x = 0 \iff x = 3.0472, 3.5040 \). Thus,
\( x^3 - x^2 + x - 1 > e^x \iff 3.0472 < x < 3.5040 \).
[Screenshots: after graphing; zoom, adjust; left intercept; right intercept.]
Answer to Exercise 20 (b) Rewrite the inequality as \( \sqrt{x} - \cos x > 0 \). Graph \( y = \sqrt{x} - \cos x \).
\( \sqrt{x} - \cos x = 0 \iff x = 0.6417 \). Thus, \( \sqrt{x} > \cos x \iff x > 0.6417 \).
[Screenshots: after graphing; zoom in once; the only horizontal intercept.]
Answer to Exercise 20 (c) Rewrite the inequality as \( 1/(1 - x^2) - x^3 - \sin x > 0 \). Graph
\( y = 1/(1 - x^2) - x^3 - \sin x \). By observation,
• \( 1/(1 - x^2) - x^3 - \sin x > 0 \) if −1 < x < 1;
• \( 1/(1 - x^2) - x^3 - \sin x < 0 \) if x > 1; and
• \( 1/(1 - x^2) - x^3 - \sin x > 0 \) if x is to the left of the only horizontal intercept.
[Screenshots: after graphing; zoom in once; the only horizontal intercept.]
\( 1/(1 - x^2) - x^3 - \sin x = 0 \iff x = -1.179 \). We conclude that
\( 1/(1 - x^2) - x^3 - \sin x > 0 \iff x < -1.179 \) or \( -1 < x < 1 \).

Answer to Exercise 21. Let A, B, and C be the present-day ages of Apu, Beng, and
Caleb.
Let k be the number of years ago when Apu was 40 years old. From the first sentence, we
know that
\[ A - k = 40 \quad (1) \qquad \text{and} \qquad B - k = 2(C - k) \quad (2). \]
From the second sentence:
\[ A = 2B \quad (3) \qquad \text{and} \qquad C = 28 \quad (4). \]
Sub (3) into (1) and (4) into (2) to get
\[ 2B - k = 40 \quad (5) \qquad \text{and} \qquad B - k = 2(28 - k) \quad (6). \]
From (5), k = 2B − 40 (7). Sub (7) into (6) to get:
\[ \begin{aligned} B - (2B - 40) &= 2\,[28 - (2B - 40)] \\ 40 - B &= 2\,[68 - 2B] = 136 - 4B \\ 3B &= 96 \implies B = 32. \end{aligned} \]
Beng is 32 years old today. And from (3), Apu is 64 years old today.
Answer to Exercise 22. The given information provides this system of equations:
\[ a(1)^2 + b(1) + c = 2 \quad (1), \qquad a(3)^2 + b(3) + c = 5 \quad (2), \qquad a(6)^2 + b(6) + c = 9 \quad (3). \]
You can solve this system of equations either by calculator or by hand, as I do now:
Take (2) minus (1) to get 8a + 2b = 3 or b = 0.5(3 − 8a) = 1.5 − 4a. (4)
Plug (4) into (1) to get a + 1.5 − 4a + c = 2 or c = 0.5 + 3a. (5)
Plug (4) and (5) into (3) to get
36a + 6 (1.5 − 4a) + 0.5 + 3a = 9 ⇐⇒ 15a + 9.5 = 9 ⇐⇒ 15a = −0.5 ⇐⇒ a = −1/30.
Now from (4), b = 49/30 and from (5), c = 0.4.
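The hand computation can be cross-checked with exact arithmetic. Below is a minimal sketch that fits y = ax² + bx + c through the three points used above, namely (1, 2), (3, 5), and (6, 9); the helper name solve_linear_system is hypothetical, and Gauss-Jordan elimination is merely one convenient method (it is not the substitution method used in the answer).

```python
from fractions import Fraction

def solve_linear_system(rows):
    """Solve a square linear system by Gauss-Jordan elimination.

    Each row is [coefficients..., right-hand side]. Exact Fractions avoid rounding.
    """
    m = [[Fraction(v) for v in row] for row in rows]
    n = len(m)
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot equals 1.
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[-1] for row in m]

# Each point (x, y) contributes the equation a*x^2 + b*x + c = y.
points = [(1, 2), (3, 5), (6, 9)]
a, b, c = solve_linear_system([[x * x, x, 1, y] for x, y in points])
print(a, b, c)
```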
Answer to Exercise 23. The turning point (which is a minimum turning point if a is
positive) of the equation is at
\[ x = -\frac{b}{2a} \quad \text{and} \quad y = a\left(-\frac{b}{2a}\right)^2 + b\left(-\frac{b}{2a}\right) + c = \frac{b^2}{4a} - \frac{b^2}{2a} + c = c - \frac{b^2}{4a}. \]
We know that at the minimum point, x = 0 and y = 0. So b = 0 and c = 0. Since (−1, 2)
satisfies the equation \( y = ax^2 + bx + c \), we also have \( a(-1)^2 + b(-1) + c = 2 \). Thus, a = 2.
Altogether then, a = 2, b = 0, and c = 0.

61 Answers to Exercises in Part II: Calculus
Answer to Exercise 24. (i) (a) \( f'(x) = 1/x + e^x + 2x \).
(i) (b) \( f(1) = \ln 1 + e^1 + 1^2 = e + 1 \) and \( f'(1) = 1/1 + e^1 + 2(1) = 3 + e \).
So the equation of the tangent at the point (1, e + 1) is
\[ y - (e + 1) = (3 + e)(x - 1). \]
Or rearranging: \( y = (3 + e)x - 2 \).
\( f(2) = \ln 2 + e^2 + 2^2 = \ln 2 + e^2 + 4 \) and \( f'(2) = 1/2 + e^2 + 2(2) = 4.5 + e^2 \).
So the equation of the tangent at the point \( (2, \ln 2 + e^2 + 4) \) is
\[ y - (\ln 2 + e^2 + 4) = (4.5 + e^2)(x - 2). \]
Or rearranging: \( y = (4.5 + e^2)x - 5 - e^2 + \ln 2 \).
(ii) (a) \( g'(x) = -1/x^2 + 3x^2 + 7e^x \).
(ii) (b) \( g(1) = 1/1 + 1^3 + 7e^1 = 2 + 7e \) and \( g'(1) = -1/1^2 + 3 \cdot 1^2 + 7e = 2 + 7e \).
So the equation of the tangent at the point (1, 2 + 7e) is
\[ y - (2 + 7e) = (2 + 7e)(x - 1). \]
Or rearranging: \( y = (2 + 7e)x \).
\( g(2) = 1/2 + 2^3 + 7e^2 = 8.5 + 7e^2 \) and \( g'(2) = -1/2^2 + 3 \cdot 2^2 + 7e^2 = 11.75 + 7e^2 \).
So the equation of the tangent at the point \( (2, 8.5 + 7e^2) \) is
\[ y - (8.5 + 7e^2) = (11.75 + 7e^2)(x - 2). \]
Or rearranging: \( y = (11.75 + 7e^2)x - 15 - 7e^2 \).
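A quick numerical check of the tangent lines for f can be sketched as follows. The definition of f used here, f(x) = ln x + eˣ + x², is inferred from the values of f(1) and f(2) computed in the answer; the helper tangent_line is a hypothetical name.

```python
import math

def tangent_line(f, df, x0):
    """Return (slope, intercept) of the tangent to y = f(x) at x = x0."""
    slope = df(x0)
    intercept = f(x0) - slope * x0
    return slope, intercept

# f and its derivative, as inferred from the worked answer.
f = lambda x: math.log(x) + math.exp(x) + x**2
df = lambda x: 1 / x + math.exp(x) + 2 * x

slope1, intercept1 = tangent_line(f, df, 1.0)  # compare with y = (3 + e)x - 2
slope2, intercept2 = tangent_line(f, df, 2.0)  # compare with y = (4.5 + e^2)x - 5 - e^2 + ln 2
print(slope1, intercept1)
print(slope2, intercept2)
```

The same helper works for g once g and g′ are supplied.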

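Answer to Exercise 25. (i) (a) \( \dfrac{dy}{dx} = 13\left(0.5x^{-0.5} + 2 \cdot \dfrac{3}{x^3}\right) \). (i) (b) At x = 1,
\[ y = 13\left(\sqrt{1} - \frac{3}{1^2}\right) = -26 \quad \text{and} \quad \frac{dy}{dx} = 13\left(0.5 \cdot 1^{-0.5} + 2 \cdot \frac{3}{1^3}\right) = 84.5. \]
So the equation of the tangent at the point (1, −26) is \( y - (-26) = 84.5(x - 1) \). Or rearranging:
\( y = 84.5x - 110.5 \). And at x = 2:
\[ y = 13\left(\sqrt{2} - \frac{3}{2^2}\right) = 13\left(\sqrt{2} - \frac{3}{4}\right) \quad \text{and} \quad \frac{dy}{dx} = 13\left(0.5 \cdot 2^{-0.5} + 2 \cdot \frac{3}{2^3}\right) = 13\left(\frac{1}{2\sqrt{2}} + \frac{3}{4}\right). \]
So the equation of the tangent at the point \( (2, 13(\sqrt{2} - 3/4)) \) is
\[ y - 13\left(\sqrt{2} - \frac{3}{4}\right) = 13\left(\frac{1}{2\sqrt{2}} + \frac{3}{4}\right)(x - 2) \quad \text{or} \quad y = 13\left(\frac{1}{2\sqrt{2}} + \frac{3}{4}\right)x + 13\left(\sqrt{2} - \frac{9}{4} - \frac{1}{\sqrt{2}}\right). \]
(ii) (a) \( dy/dx = 9e^x - 5x^4 \).
(ii) (b) At x = 1, \( y = 9e^1 - 1^5 = 9e - 1 \) and \( dy/dx = 9e^1 - 5 \cdot 1^4 = 9e - 5 \).
So the equation of the tangent at the point (1, 9e − 1) is
\[ y - (9e - 1) = (9e - 5)(x - 1) \quad \text{or} \quad y = (9e - 5)x + 4. \]
At x = 2, \( y = 9e^2 - 2^5 = 9e^2 - 32 \) and \( dy/dx = 9e^2 - 5 \cdot 2^4 = 9e^2 - 80 \).
So the equation of the tangent at the point \( (2, 9e^2 - 32) \) is
\[ y - (9e^2 - 32) = (9e^2 - 80)(x - 2) \quad \text{or} \quad y = (9e^2 - 80)x - 9e^2 + 128. \]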
Answer to Exercise 26. (a) \( f'(x) = 2x \), so \( f'(0) = 0 \).
(b) \( g'(x) = 2\,[x - \ln(x + 1)]\left(1 - \dfrac{1}{x+1}\right) \). So \( g'(0) = 0 \).
(c) Observe that \( h(x) = [g(x)]^3 \). So \( h'(x) = 3\,[g(x)]^2 g'(x) \).
Since g(0) = 1 and \( g'(0) = 0 \), we have \( h'(0) = 3\,[g(0)]^2 g'(0) = 0 \).

(ii) f ′ (x) = 6x − 4 is negative for x < 2/3, equal to 0 at x = 2/3, and positive for x > 2/3.
(iii) There is one stationary point: (2/3, −1/3).
Answer to Exercise 28. In order for −1 to be a minimum turning point of g, it must

be that to its left, g is decreasing; while to its right, g is increasing. In other words, to the
left of −1, g ′ (x) ≤ 0. While to the right of −1, g ′ (x) ≥ 0. Altogether then, we must have
g ′ (−1) = 0 — at the minimum turning point, the slope of the function must be 0.

Answer to Exercise 29. (i) f ′ (x) = 1. So f has no stationary points. Hence, it has no
turning points.
(ii) g ′ (x) = 0. So every point of g is a stationary point. However, no point of g is a turning
point. Indeed, the graph of g is simply a horizontal line.
(iii) \( h'(x) = 4x^3 - 4x = 4x(x^2 - 1) = 4x(x - 1)(x + 1) \). So the stationary points of h are
where x = 0, x = 1, or x = −1.
From a graph sketch, we see that there are minimum turning points at x = ±1 and a
maximum turning point at x = 0.

Answer to Exercise 29. (iv) \( i'(x) = 3x^2 \). So the only stationary point of i is at x = 0.
But this is not a turning point. (Indeed, as we’ll discover in the next chapter, this is an
example of an inflexion point.)
(v) \( j'(x) = 3x^2 + 2x - 1 = (3x - 1)(x + 1) \). So the only two stationary points are at x = 1/3
and x = −1.
From a graph sketch, we see that there is a maximum turning point at x = −1 (where j′ changes
from positive to negative) and a minimum turning point at x = 1/3 (where j′ changes from
negative to positive).

Answer to Exercise 30. (i) \( f'(x) = 3x^2 - 3 = 3(x^2 - 1) = 3(x - 1)(x + 1) \). So the only
stationary points are at x = −1 and x = 1. From a graph sketch, the former (labelled A
below) is a maximum turning point and the latter (labelled C below) is a minimum turning
point.
There are no stationary points of inflexion. (However, there is a non-stationary point of
inflexion, namely B. But you need not know how to identify this for the A-Levels.)
(ii) \( g'(x) = 3x^2 - 6x + 3 = 3(x^2 - 2x + 1) = 3(x - 1)^2 \). The only stationary point is at x = 1.
From a graph sketch, it is a point of inflexion.

Answer to Exercise 31. (a) The volume is fixed as \( 1 = \frac{1}{3}\pi r^2 h \). So \( r = \sqrt{\dfrac{3}{\pi h}} \).
(b) By the Pythagorean Theorem, \( l = \sqrt{r^2 + h^2} = \sqrt{\dfrac{3}{\pi h} + h^2} \).
(c) The total external surface area of the cone (including the base) is
\[ A = \pi r l = \pi \sqrt{\frac{3}{\pi h}} \sqrt{\frac{3}{\pi h} + h^2} = \pi \sqrt{\frac{9}{\pi^2 h^2} + \frac{3h}{\pi}} = \sqrt{\frac{9}{h^2} + 3\pi h}. \]
(d) Compute
\[ \frac{dA}{dh} = \frac{-\frac{18}{h^3} + 3\pi}{2\sqrt{\frac{9}{h^2} + 3\pi h}} = \frac{3}{2} \cdot \frac{\pi - \frac{6}{h^3}}{\sqrt{\frac{9}{h^2} + 3\pi h}} = \frac{3}{2} \cdot \frac{\pi - \frac{6}{h^3}}{A}. \]
So \( \dfrac{dA}{dh} = 0 \iff h = \left(\dfrac{6}{\pi}\right)^{1/3} \approx 1.24 \) m.
(e) Graph \( A = \sqrt{9/h^2 + 3\pi h} \) on your graphing calculator. (This is simply the expression we
found in part (c).)
(Ignore the region where h < 0 since the height of the cone cannot be negative.)
Zoom in to verify that the stationary point we found in part (d) is indeed a minimum
turning point.
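In place of zooming in on a calculator, one can scan the expression from part (c) over a grid of heights. The sketch below does this; the grid range (0.01 m to 10 m) and step (0.01 m) are arbitrary assumptions, and the names area, h_star, and h_best are illustrative.

```python
import math

def area(h):
    """Surface-area expression from part (c): sqrt(9/h^2 + 3*pi*h)."""
    return math.sqrt(9 / h**2 + 3 * math.pi * h)

# Stationary point from part (d).
h_star = (6 / math.pi) ** (1 / 3)

# Crude grid scan over positive heights only (assumed range and step).
grid = [0.01 * i for i in range(1, 1001)]
h_best = min(grid, key=area)

print(round(h_star, 4), round(h_best, 2))
# The area at h_star should be no larger than at nearby heights.
print(area(h_star) <= area(h_star - 0.1), area(h_star) <= area(h_star + 0.1))
```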
Answer to Exercise 32. The function f defined by f (x) = 2x has indefinite integrals
i defined by i(x) = x2 + 2, and k defined by k(x) = x2 .
The function g defined by g(x) = 3x2 has indefinite integrals
h defined by h(x) = x3 , and j defined by j(x) = x3 + 1.

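Answer to Exercise 33.
1. \( \dfrac{d}{dx}(kx + C) = k \implies \displaystyle\int k \, dx = kx + C \), ✓
2. \( \dfrac{d}{dx}\left(\dfrac{x^{k+1}}{k+1} + C\right) = x^k \implies \displaystyle\int x^k \, dx = \dfrac{x^{k+1}}{k+1} + C \), ✓
4. \( \dfrac{d}{dx}(e^x + C) = e^x \implies \displaystyle\int e^x \, dx = e^x + C \), ✓
5. \( \dfrac{d}{dx}\left[\dfrac{(ax+b)^{k+1}}{a(k+1)} + C\right] = (ax+b)^k \implies \displaystyle\int (ax+b)^k \, dx = \dfrac{(ax+b)^{k+1}}{a(k+1)} + C \), ✓
6. \( \dfrac{d}{dx}\left[\dfrac{1}{a}e^{ax+b} + C\right] = e^{ax+b} \implies \displaystyle\int e^{ax+b} \, dx = \dfrac{1}{a}e^{ax+b} + C \), ✓
7. \( \dfrac{d}{dx}[f(x) \pm g(x) + C] = f'(x) \pm g'(x) \implies \displaystyle\int f'(x) \pm g'(x) \, dx = f(x) \pm g(x) + C \), ✓
8. \( \dfrac{d}{dx}[kf(x) + C] = kf'(x) \implies \displaystyle\int kf'(x) \, dx = kf(x) + C \). ✓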
Answer to Exercise 34. (i) \( \displaystyle\int 7x^5 - 8x^4 + 3x^2 + 2 \, dx = 7x^6/6 - 8x^5/5 + x^3 + 2x + C \), where
C is the constant of integration.
(ii) \( \displaystyle\int e^{5x+2} - (5x + 2)^2 \, dx = e^{5x+2}/5 - \dfrac{(5x+2)^3}{3 \cdot 5} + C \), where C is the constant of integration.
(iii) \( \displaystyle\int 16/x + 32x^3 \, dx = 16 \ln|x| + 8x^4 + C \), where C is the constant of integration.
Answer to Exercise 35. (i) \( \displaystyle\int_1^2 y \, dx = \int_1^2 6 \, dx = [6x]_1^2 = 12 - 6 = 6 \).
(ii) \( \displaystyle\int_1^2 y \, dx = \int_1^2 x^2 + 5x + 10 \, dx = \left[\frac{x^3}{3} + \frac{5x^2}{2} + 10x\right]_1^2 = \frac{8}{3} + 10 + 20 - \left(\frac{1}{3} + \frac{5}{2} + 10\right) = 19\tfrac{5}{6} \).
(iii) \( \displaystyle\int_1^2 y \, dx = \int_1^2 1/x \, dx = [\ln|x|]_1^2 = \ln 2 - \ln 1 = \ln 2 \).

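Answer to Exercise 36. Our desired area is labelled A below.
Method #1. The entire rectangle A + B + C + D has area \( 2^{1/3} \times 2 = 2^{4/3} \). The rectangle B + C
has area 1 × 1 = 1. The region D has area
\[ \int_1^{2^{1/3}} x^3 \, dx = \left[\frac{x^4}{4}\right]_1^{2^{1/3}} = \frac{2^{4/3} - 1}{4}. \]
Hence,
\[ A = (A + B + C + D) - (B + C + D) = 2^{4/3} - \left(1 + \frac{2^{4/3} - 1}{4}\right) = \frac{3}{4}\left(2^{4/3} - 1\right). \]
Method #2. \( y = x^3 \iff x = y^{1/3} \). So
\[ A = \int_{y=1}^{y=2} x \, dy = \int_1^2 y^{1/3} \, dy = \left[\frac{3}{4} y^{4/3}\right]_1^2 = \frac{3}{4}\left(2^{4/3} - 1\right). \]
[Figure: regions A, B, C, and D, with the horizontal lines y = 1 and y = 2; axes x and y.]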

Answer to Exercise 37. The desired area is labelled A below.
The area A + B + C + D equals 3 (ln 3 − 0.5).

The area C + D equals \( \displaystyle\int_{\ln 2}^{\ln 3} y \, dx = \int_{\ln 2}^{\ln 3} e^x \, dx = [e^x]_{\ln 2}^{\ln 3} = e^{\ln 3} - e^{\ln 2} = 3 - 2 = 1 \).
ln 2 ln 2
The area B equals 2 (ln 2 − 0.5).

Hence, the desired area A = (A + B + C + D)−B −(C + D) = 3 (ln 3 − 0.5)−2 (ln 2 − 0.5)−1 =
3 ln 3 − 2 ln 2 − 1.5.
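Answer to Exercise 38. The two curves intersect at \( x = \pm\sqrt{2}/2 \) (quadratic formula). So
\[ A = \int_{-\sqrt{2}/2}^{\sqrt{2}/2} 2 - x^2 - (x^2 + 1) \, dx = \left[x - \frac{2x^3}{3}\right]_{-\sqrt{2}/2}^{\sqrt{2}/2} = \left[\frac{\sqrt{2}}{2} - \frac{2\sqrt{2}}{12}\right] - \left[-\frac{\sqrt{2}}{2} + \frac{2\sqrt{2}}{12}\right] = \frac{2\sqrt{2}}{3}. \]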
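The area between the two curves can be double-checked by numerical integration. This is a minimal sketch using composite Simpson's rule; the curve definitions y = 2 − x² and y = x² + 1 are read off the integrand above, and the helper simpson and the strip count n = 100 are assumptions made for illustration.

```python
import math

def simpson(func, a, b, n=100):
    """Composite Simpson's rule on [a, b] with an even number n of strips."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Interior points alternate between weights 4 (odd) and 2 (even).
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3

upper = lambda x: 2 - x**2   # upper curve, inferred from the integrand
lower = lambda x: x**2 + 1   # lower curve, inferred from the integrand

r = math.sqrt(2) / 2         # the curves meet at x = -r and x = r
area = simpson(lambda x: upper(x) - lower(x), -r, r)
print(area, 2 * math.sqrt(2) / 3)
```

Simpson's rule is exact for polynomials up to degree three, so here the two printed values should agree up to floating-point rounding.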

62 Answers to Exercises in Part III: Probability and Statistics
Answer to Exercise 39. Taking the green path, there are 3 ways. Taking the red path,
there are 2 ways. Hence, there are 3 + 2 = 5 ways to get from the Starting Point to the
River.
Answer to Exercise 40. The tree diagram below illustrates.

Case #1. First letter is a D.
Case #1(i). Second letter is a D. Then the last two letters must both be E’s. (1 permuta-
tion.)
Case #1(ii). Second letter is an E. Then the last two letters must be either DE or ED. (2
permutations.)
Case #2. First letter is an E.

Case #2(i). Second letter is an E. Then the last two letters must both be D’s. (1 permu-
tation.)
Case #2(ii). Second letter is a D. Then the last two letters must be either DE or ED. (2
permutations.)
Altogether then, there are 1 + 2 + 1 + 2 = 6 possible permutations of the letters in DEED.
Answer to Exercise 41. 3 × 5 × 10 = 150.

Answer to Exercise 42. We must choose three 4D numbers. Choosing the first 4D
number involves four decisions — what to put as the first, second, third and fourth digits,
with the condition that no digit is repeated.
[Diagram: four blank slots, numbered 1 to 4, one for each digit.]
Thus, by the MP, there are 10 × 9 × 8 × 7 = 5040 ways to choose the first 4D number.
If we ignored the fact that we already chose the first 4D number, then there’d similarly be
5040 ways to choose the second 4D number (given the condition that this second 4D number
does not have any repeated digits). However, there is an additional condition — namely,
the second 4D number cannot be the same as the first. Thus, there are 5040 − 1 = 5039
ways to choose the second 4D number.
By similar reasoning, we see that there are 5040 − 2 = 5038 ways to choose the third 4D
number.
Altogether then, by the MP, there are 5040 × 5039 × 5038 = 127, 947, 869, 280 ways to choose
the three 4D numbers.
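The arithmetic is easy to verify in code. The following sketch simply recomputes the counts with Python's standard library (math.perm requires Python 3.8 or later); the variable names are illustrative.

```python
import math

# Number of 4-digit strings with no repeated digit: 10 * 9 * 8 * 7.
first = math.perm(10, 4)

# The second number must differ from the first, and the third from both.
total = first * (first - 1) * (first - 2)
print(first, total)
```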

Answer to Exercise 43. Apply the IEP twice.
1. The food court and hawker centre share 2 types of cuisine (Chinese and Western) in
common. And so together, the food court and the hawker centre have 4 + 3 − 2 = 5
different types of cuisine.
2. Combine together the food court and the hawker centre (call this the “Low-Class Place”).
The Low-Class Place has 5 types of cuisine and shares 2 types of cuisine (Chinese and
Malay) with the restaurant. And so together, the Low-Class Place and restaurant have
5 + 3 − 2 = 6 different types of cuisine (namely Chinese, Indonesian, Japanese, Korean,
Malay, and Western).
Answer to Exercise 44. 10 − 3 = 7. (Can you name them?)
Answer to Exercise 45. 6! = 720, 7! = 5040, and 8! = 40320.

Answer to Exercise 46. 7!/ (4!3!) = 35.
Answer to Exercise 47. The problem of choosing a president and vice-president from
a committee of 11 members is equivalent to the problem of filling 2 spaces with 11 distinct
objects. The answer is thus P (11, 2) = 11!/9! = 11 × 10 = 110.

Answer to Exercise 48. Let B and S stand for brother and sister, respectively.
(a) First consider the problem of permuting the seven letters in BBBBSSS, without any
two B’s next to each other. There is only 1 possible arrangement, namely BSBSBSB.
There are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 1 × 4!3! = 144 possible ways to arrange the siblings in a line, so
that no two brothers are next to each other.
(b) First consider the problem of permuting the seven letters in BBBBSSS, without any
two S’s next to each other. We’ll use the AP.
1. B in position #1.
(a) B in position #2. Then the only way to fill the remaining five positions is SBSBS.
Total: 1 possible arrangement.
(b) S in position #2. Then we must have B in position #3.
i. B in position #4. Then the only way to fill the remaining three positions is
SBS. Total: 1 possible arrangement.
ii. S in position #4. Then we must have B in position #5. And there are two
ways to fill the remaining two positions: either BS or SB. Total: 2 possible
arrangements.
2. S in position #1. Then we must have B in position #2.
(a) B in position #3. Then, like in 1(b), we are left with two B’s and two S’s to fill
the remaining four positions. Hence, Total: 3 possible arrangements.
(b) S in position #3. Then we must have B in position #4. There are three ways
to fill the remaining three positions: SBB, BSB, and BBS. Total: 3 possible
arrangements.
By the AP, there are 1 + 1 + 2 + 3 + 3 = 10 possible arrangements.

Again, there are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 10 × 4!3! = 1440 possible ways to arrange the siblings in a line, so
that no two sisters are next to each other.
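Both counts can be confirmed by brute force, since there are only 35 distinct orderings of the letters BBBBSSS. The sketch below enumerates them; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial

# Distinct orderings of the letters BBBBSSS (duplicates collapse in the set).
patterns = {"".join(p) for p in permutations("BBBBSSS")}

no_adjacent_brothers = [p for p in patterns if "BB" not in p]
no_adjacent_sisters = [p for p in patterns if "SS" not in p]

# Each letter pattern can be filled by 4! orderings of brothers and 3! of sisters.
per_pattern = factorial(4) * factorial(3)
ways_a = len(no_adjacent_brothers) * per_pattern
ways_b = len(no_adjacent_sisters) * per_pattern
print(len(no_adjacent_brothers), ways_a)
print(len(no_adjacent_sisters), ways_b)
```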

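\[ \begin{aligned} \binom{n}{k} &= \frac{n!}{k!(n-k)!} \\ &= \frac{n \times (n-1) \times \dots \times (n-k+1) \times (n-k) \times (n-k-1) \times \dots \times 1}{k!\,(n-k) \times (n-k-1) \times \dots \times 1} \\ &= \frac{n \times (n-1) \times (n-2) \times \dots \times (n-k+1)}{k!} \quad \text{(mass cancellation).} \end{aligned} \]
\[ C(4, 2) = \frac{4!}{2!(4-2)!} = \frac{4!}{2!2!} = \frac{4 \times 3}{2 \times 1} = 6, \]
\[ C(6, 4) = \frac{6!}{4!(6-4)!} = \frac{6!}{4!2!} = \frac{6 \times 5}{2 \times 1} = 15, \]
\[ C(7, 3) = \frac{7!}{3!(7-3)!} = \frac{7!}{3!4!} = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 35. \]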
Answer to Exercise 51. \( \dbinom{3}{1}\dbinom{7}{2}\dbinom{5}{2} = 630 \).
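Answer to Exercise 52. (a) C(1, 0) + C(1, 1) = 1 + 1 = 2 = C(2, 1).
(b) C(4, 2) + C(4, 3) = 6 + 4 = 10 = C(5, 3).
(c)
\[ \begin{aligned} C(17, 2) + C(17, 3) &= \frac{17!}{2!15!} + \frac{17!}{3!14!} = \frac{17 \times 16}{2 \times 1} + \frac{17 \times 16 \times 15}{3 \times 2 \times 1} \\ &= 17 \times 8 + 17 \times 8 \times 5 = 17 \times 8 \times 6 = \frac{18 \times 17 \times 16}{3 \times 2 \times 1} = C(18, 3). \end{aligned} \]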

Answer to Exercise 53. \( \binom{7}{0} = 1 \), \( \binom{7}{1} = 7 \), \( \binom{7}{2} = 21 \), \( \binom{7}{3} = 35 \), \( \binom{7}{4} = 35 \),
\( \binom{7}{5} = 21 \), \( \binom{7}{6} = 7 \), \( \binom{7}{7} = 1 \).
Answer to Exercise 54. Expanding, we have
\[ \begin{aligned} (1 + x)^3 &= (1 + x)(1 + x)(1 + x) \\ &= \underbrace{1 \cdot 1 \cdot 1}_{0\ x\text{'s}} + \underbrace{1 \cdot 1 \cdot x + 1 \cdot x \cdot 1 + x \cdot 1 \cdot 1}_{1\ x} + \underbrace{1 \cdot x \cdot x + x \cdot 1 \cdot x + x \cdot x \cdot 1}_{2\ x\text{'s}} + \underbrace{x \cdot x \cdot x}_{3\ x\text{'s}}. \end{aligned} \]
Consider the 8 terms on the right. There is C(3, 0) = 1 way to choose 0 of the x’s. Hence,
the coefficient on \( x^0 \) is C(3, 0); this corresponds to the term 1 ⋅ 1 ⋅ 1 above.
There are C(3, 1) = 3 ways to choose 1 of the x’s. Hence, the coefficient on \( x^1 \) is C(3, 1);
this corresponds to the terms 1 ⋅ 1 ⋅ x, 1 ⋅ x ⋅ 1, and x ⋅ 1 ⋅ 1 above.
There are C(3, 2) = 3 ways to choose 2 of the x’s. Hence, the coefficient on \( x^2 \) is C(3, 2);
this corresponds to the terms 1 ⋅ x ⋅ x, x ⋅ 1 ⋅ x, and x ⋅ x ⋅ 1 above.
There is C(3, 3) = 1 way to choose 3 of the x’s. Hence, the coefficient on \( x^3 \) is C(3, 3);
this corresponds to the term x ⋅ x ⋅ x above.
Altogether then,
\[ (1 + x)^3 = \binom{3}{0}x^0 + \binom{3}{1}x^1 + \binom{3}{2}x^2 + \binom{3}{3}x^3 = 1 + 3x + 3x^2 + x^3. \]
Answer to Exercise 55. \( 2^7 = 128 \).
\[ \binom{7}{0} + \binom{7}{1} + \binom{7}{2} + \dots + \binom{7}{7} = 1 + 7 + 21 + 35 + 35 + 21 + 7 + 1 = 128. \]
So indeed, \( 2^7 = \dbinom{7}{0} + \dbinom{7}{1} + \dbinom{7}{2} + \dots + \dbinom{7}{7} \).

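Answer to Exercise 56.
\[ \begin{aligned} (3 + x)^4 &= \binom{4}{0}3^4x^0 + \binom{4}{1}3^3x^1 + \binom{4}{2}3^2x^2 + \binom{4}{3}3^1x^3 + \binom{4}{4}3^0x^4 \\ &= 81 + 4 \cdot 27x + 6 \cdot 9x^2 + 4 \cdot 3x^3 + x^4 = 81 + 108x + 54x^2 + 12x^3 + x^4. \end{aligned} \]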
Answer to Exercise 57. (a) There are \( \binom{4}{2} = 6 \) ways of choosing the two Tan sons
and \( \binom{3}{2} = 3 \) ways of choosing the two Wong daughters.
Having chosen these sons and daughters, there are only 2! = 2 × 1 possible ways of matching
them up. This is because for the first chosen Tan son, we have 2 possible choices of bride
for him. And then for the second chosen Tan son, there is only 1 possible choice of bride
left for him.
Altogether then, there are \( \binom{4}{2}\binom{3}{2} \cdot 2 = 6 \cdot 3 \cdot 2 = 36 \) ways of forming the two couples.
(b) There are \( \binom{6}{5} = 6 \) ways of choosing the five Lee sons and \( \binom{9}{5} = 126 \) ways of choosing
the five Ho daughters.
Having chosen these sons and daughters, there are 5! = 5 × 4 × 3 × 2 × 1 possible ways of
matching them up. This is because for the first chosen Lee son, we have 5 possible choices
of bride for him. And then for the second chosen Lee son, there are 4 possible choices of
bride left for him. Etc.
Altogether then, there are \( \binom{6}{5}\binom{9}{5} \cdot 5! = 6 \cdot 126 \cdot 5! = 90{,}720 \) ways of forming the five
couples.

Answer to Exercise 58(a). The 52 possible outcomes are:
A♠, K♠, Q♠, . . . , 2♠, A♥, K♥, Q♥, . . . , 2♥, A♦, K♦, Q♦, . . . , 2♦, A♣, K♣, Q♣, . . . , 2♣.
P(A) = 1/4.
(b) The 4 possible outcomes are HH, HT, TH, and TT.
P(B) = 1/2.
(c) The 36 possible outcomes are the ordered pairs of die faces:
(1, 1), (1, 2), . . . , (1, 6), (2, 1), . . . , (6, 6).
There are 4 ways to roll a 9, namely: (3, 6), (4, 5), (5, 4), or (6, 3).
Hence P(C) = 4/36.
Answer to Exercise 59. A and B are not mutually exclusive. A and C are not mutually
exclusive. But B and C are mutually exclusive.
Answer to Exercise 60. B ′ is the event that the student has at least two phones. C ′
is the event that the student has zero, one, or at least three phones.

Answer to Exercise 61. P(A) = 0.5, P(B) = 3/36, P(C) = 0.5, P(A ∪ B) = P(A) +
P ({11}) = 0.5+1/12 = 7/12, P(A∪C) = 1, and P(B∪C) = P(C)+P ({12}) = 0.5+1/12 = 7/12.
Answer to Exercise 62. P(A ∩ B) = P ({12}) = 1/12, P(A ∩ C) = 0, and P(B ∩ C) =

P ({11}) = 1/12.
Answer to Exercise 63. (a)
(b)

Answer to Exercise 64. Let A be the event that we rolled at least one even number
and B be the event that the sum of the two dice was 8. We have P(B) = 5/36 (see Exercise
70).
And A ∩ B can occur if and only if the two dice were (2, 6), (4, 4), or (6, 2). Hence, P(A ∩ B) =
3/36.
Altogether then,
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/36}{5/36} = \frac{3}{5}. \]
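The conditional probability can be confirmed by listing all 36 equally likely outcomes. The sketch below does so with exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs of faces for two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# B: the two dice sum to 8.
B = [o for o in outcomes if sum(o) == 8]
# A and B: additionally, at least one of the two dice shows an even number.
A_and_B = [o for o in B if o[0] % 2 == 0 or o[1] % 2 == 0]

p_b = Fraction(len(B), len(outcomes))
p_a_and_b = Fraction(len(A_and_B), len(outcomes))
p_a_given_b = p_a_and_b / p_b
print(A_and_B, p_a_given_b)
```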
Answer to Exercise 65. By Fact 7, A, B are independent events ⇐⇒ P(A∣B) = P(A).

Rearranging, P(B) = P(A ∩ B)/P(A) = P(B∣A), as desired.
Answer to Exercise 66. First, note that P (H1 ) = P (T1 ) = P (H2 ) = 0.5.
(a) P (H1 ∩ H2 ) = 0.25 = 0.5 × 0.5 = P (H1 ) P (H2 ), so that indeed H1 and H2 are indepen-
dent.
(b) P (H2 ∩ T1 ) = 0.25 = 0.5×0.5 = P (H2 ) P (T1 ), so that indeed H2 and T1 are independent.
(c) Observe that H1 ∩ T1 = ∅ (it is impossible that “the first coin flip is heads” AND also
“the first coin flip is tails”).
Hence, P (H1 ∩ T1 ) = P (∅) = 0 ≠ 0.25 = 0.5 × 0.5 = P (H1 ) P (T1 ), so that indeed H1 and T1
are not independent.
Answer to Exercise 67. No, the journalist is incorrectly assuming that the probability
of one family member making the NBA is independent of another family member making
the NBA. But such an assumption is almost certainly false.
The same excellent genes that made Rick Barry a great basketball player, probably also
helped his three sons. Not to mention that having an NBA player as your father probably
helps a lot too.
The two events “family member #1 in NBA” and “family member #2 in NBA” are probably
not independent. So we cannot simply multiply probabilities together.
Answer to Exercise 68. The possible observed values of X are 2, 3, 4, . . . , and 12.

Answer to Exercise 69. The possible observed values of C are 8 (two aces), 7 (one ace
and one king), 6 (one ace and one queen, or two kings), 5 (one ace and one jack, or one
king and one queen), 4 (one ace, or one king and one jack, or two queens), 3 (one king, or
one queen and one jack) 2 (one queen, or two jacks), 1 (one jack), 0 (no ace, king, queen,
or jack).
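Answer to Exercise 70. (a) Each outcome s is written as an ordered pair of die faces.
k     s such that X(s) = k                            P(X = k)
2     (1,1)                                           1/36
3     (1,2), (2,1)                                    2/36
4     (1,3), (2,2), (3,1)                             3/36
5     (1,4), (2,3), (3,2), (4,1)                      4/36
6     (1,5), (2,4), (3,3), (4,2), (5,1)               5/36
7     (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)        6/36
8     (2,6), (3,5), (4,4), (5,3), (6,2)               5/36
9     (3,6), (4,5), (5,4), (6,3)                      4/36
10    (4,6), (5,5), (6,4)                             3/36
11    (5,6), (6,5)                                    2/36
12    (6,6)                                           1/36
(b) E is the event X ≥ 10.
(c) \( P(E) = P(X \geq 10) = P(X = 10) + P(X = 11) + P(X = 12) = \dfrac{3}{36} + \dfrac{2}{36} + \dfrac{1}{36} = \dfrac{6}{36} = \dfrac{1}{6} \).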
Answer to Exercise 71. No. For example, P (X = 0, Y = 0) = 0, but P (X = 0) P (Y = 0) =

0.5 × 0.25 = 0.125.

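Answer to Exercise 72. (a) P(X + Y = 2) is simply the probability of 2 heads and 0
sixes OR 1 head and 1 six OR 0 heads and 2 sixes. So
\[ \begin{aligned} P(X + Y = 2) &= \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{5}{6} \cdot \frac{5}{6} + \binom{2}{1}\frac{1}{2} \cdot \frac{1}{2}\binom{2}{1}\frac{5}{6} \cdot \frac{1}{6} + \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} \\ &= \frac{25}{144} + \frac{20}{144} + \frac{1}{144} = \frac{46}{144}. \end{aligned} \]
(b) P(X + Y = 3) is simply the probability of 2 heads and 1 six OR 1 head and 2 sixes. So
\[ P(X + Y = 3) = \frac{1}{2} \cdot \frac{1}{2} \cdot \binom{2}{1}\frac{5}{6} \cdot \frac{1}{6} + \binom{2}{1}\frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{10}{144} + \frac{2}{144} = \frac{12}{144}. \]
(c) P(X + Y = 4) is simply the probability of 2 heads and 2 sixes. So
\[ P(X + Y = 4) = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{144}. \]
(d)
\[ \begin{aligned} E[X + Y] &= P(X + Y = 0) \cdot 0 + P(X + Y = 1) \cdot 1 + P(X + Y = 2) \cdot 2 + P(X + Y = 3) \cdot 3 + P(X + Y = 4) \cdot 4 \\ &= \frac{25}{144} \cdot 0 + \frac{60}{144} \cdot 1 + \frac{46}{144} \cdot 2 + \frac{12}{144} \cdot 3 + \frac{1}{144} \cdot 4 = \frac{60 + 92 + 36 + 4}{144} = \frac{192}{144} = \frac{4}{3}. \end{aligned} \]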
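The whole distribution of X + Y can be rebuilt with exact fractions as a check. This sketch assumes, as the calculations above imply, that X counts heads in two fair coin flips, Y counts sixes in two fair die rolls, and that X and Y are independent; the dictionary names are illustrative.

```python
from fractions import Fraction
from itertools import product

half, sixth = Fraction(1, 2), Fraction(1, 6)

# X counts heads in two fair coin flips; Y counts sixes in two fair die rolls.
pmf_x = {0: half**2, 1: 2 * half**2, 2: half**2}
pmf_y = {0: (1 - sixth) ** 2, 1: 2 * sixth * (1 - sixth), 2: sixth**2}

# Distribution of the sum, assuming X and Y are independent.
pmf_sum = {}
for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
    pmf_sum[x + y] = pmf_sum.get(x + y, 0) + px * py

expected = sum(k * p for k, p in pmf_sum.items())
for k in sorted(pmf_sum):
    print(k, pmf_sum[k])
print(expected)
```

Note that Fraction reduces automatically, so 46/144 is displayed as 23/72.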

Answer to Exercise 73(a). The possible observed values of X are 2000, 1000, 490, 250, 60, 0.
(Don’t forget to include 0.)
Similarly, the possible observed values of Y are 3000, 2000, 800, 0.
(b)
\[ P(X = 2000) = P(X = 1000) = P(X = 490) = \frac{1}{10000}, \qquad P(X = 250) = P(X = 60) = \frac{10}{10000}, \qquad P(X = 0) = \frac{9977}{10000}, \]
\[ P(Y = 3000) = P(Y = 2000) = P(Y = 800) = \frac{1}{10000}, \qquad P(Y = 0) = \frac{9997}{10000}. \]
(c)
\[ \begin{aligned} E[X] &= 2000 P(X = 2000) + 1000 P(X = 1000) + \dots + 0 P(X = 0) \\ &= \frac{2000}{10000} + \frac{1000}{10000} + \frac{490}{10000} + \frac{250 \cdot 10}{10000} + \frac{60 \cdot 10}{10000} + \frac{9977 \cdot 0}{10000} = 0.659, \end{aligned} \]
\[ \begin{aligned} E[Y] &= P(Y = 3000) \cdot 3000 + P(Y = 2000) \cdot 2000 + P(Y = 800) \cdot 800 + P(Y = 0) \cdot 0 \\ &= \frac{1}{10000} \cdot 3000 + \frac{1}{10000} \cdot 2000 + \frac{1}{10000} \cdot 800 + \frac{9997}{10000} \cdot 0 = 0.3 + 0.2 + 0.08 + 0 = 0.58. \end{aligned} \]
(d) For every $1 staked, the “big” game is expected to lose you $0.341 and the “small”
game is expected to lose you $0.42. Thus, the “big” game is expected to lose you less
money.

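Answer to Exercise 74. From our work earlier, we know that P(B = 1) = P(B = 2) =
P(B = 3) = P(B = 4) = 4/52 and P(B = 0) = 36/52. Compute
\[ E[B] = \frac{1 \cdot 4 + 2 \cdot 4 + 3 \cdot 4 + 4 \cdot 4 + 36 \cdot 0}{52} = \frac{10}{13}, \]
\[ E[B^2] = \frac{1^2 \cdot 4 + 2^2 \cdot 4 + 3^2 \cdot 4 + 4^2 \cdot 4 + 36 \cdot 0}{52} = \frac{30}{13}. \]
Hence, \( V[B] = E[B^2] - (E[B])^2 = \dfrac{30}{13} - \left(\dfrac{10}{13}\right)^2 = \dfrac{290}{169} \).
Answer to Exercise 75.
\[ E[Y] = \frac{35}{100} \times 20 \text{ cm} + \frac{65}{100} \times 30 \text{ cm} = 26.5 \text{ cm}. \]
\[ V[Y] = \frac{35}{100} \times (20 \text{ cm} - 26.5 \text{ cm})^2 + \frac{65}{100} \times (30 \text{ cm} - 26.5 \text{ cm})^2 = 22.75 \text{ cm}^2. \]
\[ SD[Y] = \sqrt{V[Y]} \approx 4.77 \text{ cm}. \]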
Answer to Exercise 76. (a) 2µ kg, 2σ 2 kg2 .

(b) 2µ kg, 4σ 2 kg2 .
(c) The mean of the total weight of the two fish is 2µ kg. However, we do not know the
variance, since the weights of the two fish are not independent.

Answer to Exercise 77. Let X ∼ B (20, 0.01) be the number of components in engine
#1 that fail. Let Y ∼ B (35, 0.005) be the number of components in engine #2 that fail.
The probability that engine #1 fails is
\[ \begin{aligned} P(X \geq 2) &= 1 - P(X \leq 1) = 1 - P(X = 0) - P(X = 1) \\ &= 1 - \binom{20}{0} 0.01^0\, 0.99^{20} - \binom{20}{1} 0.01^1\, 0.99^{19} \approx 0.0169. \end{aligned} \]
The probability that engine #2 fails is
\[ \begin{aligned} P(Y \geq 2) &= 1 - P(Y \leq 1) = 1 - P(Y = 0) - P(Y = 1) \\ &= 1 - \binom{35}{0} 0.005^0\, 0.995^{35} - \binom{35}{1} 0.005^1\, 0.995^{34} \approx 0.0133. \end{aligned} \]
Hence, the probability that both engines fail is P (X ≥ 2) P (Y ≥ 2) ≈ 0.00022.
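These binomial probabilities are straightforward to reproduce in code. The sketch below uses only the standard library; the helper names binom_pmf and prob_at_least_two are hypothetical, and the final product assumes (as the answer does) that the two engines fail independently.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(K = k) for K ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_at_least_two(n, p):
    """P(K >= 2) = 1 - P(K = 0) - P(K = 1)."""
    return 1 - binom_pmf(n, p, 0) - binom_pmf(n, p, 1)

p_engine1 = prob_at_least_two(20, 0.01)
p_engine2 = prob_at_least_two(35, 0.005)
# Assumes the two engines fail independently of each other.
p_both = p_engine1 * p_engine2
print(round(p_engine1, 4), round(p_engine2, 4), round(p_both, 5))
```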
Answer to Exercise 78.
(a) \( F_Y(k) = \begin{cases} 0, & \text{if } k < 3, \\ 0.5(k - 3), & \text{if } k \in [3, 5], \\ 1, & \text{if } k > 5. \end{cases} \)
(b) \( f_Y(k) = \begin{cases} 0.5, & \text{if } k \in [3, 5], \\ 0, & \text{otherwise.} \end{cases} \)
(c) P (3.1 ≤ Y ≤ 4.6) = 0.75 is in blue and P (4.8 ≤ Y ≤ 4.9) = 0.05 is in red.

Answer to Exercise 79. (a) From Z-tables,
P (Z ≥ 1.8) = 1 − P (Z ≤ 1.8) = 1 − Φ(1.8) ≈ 1 − 0.9641 = 0.0359.
[Graphing calculator screenshot.]
(b) From Z-tables,
P (−0.351 < Z < 1.2) = Φ(1.2) − Φ(−0.351) = Φ(1.2) − [1 − Φ(0.351)]

≈ 0.8849 − (1 − 0.6372) = 0.8849 − 0.3628 = 0.5221.
Graphing calculator screenshot:
Answer to Exercise 80. If µ = 0 and σ² = 1, then
\[ f_X(a) = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\left(\frac{a - \mu}{\sigma}\right)^2} = \frac{1}{1\sqrt{2\pi}} e^{-0.5\left(\frac{a - 0}{1}\right)^2} = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} = \phi(a). \]
We’ve just shown that the PDF of X ∼ N(µ, σ²), when µ = 0 and σ² = 1, is the same as the PDF
of the SNRV Z ∼ N(0, 1). Hence, the SNRV is indeed simply a normal random variable
with mean µ = 0 and variance σ² = 1.

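Answer to Exercise 81. First observe that \( \dfrac{X - \mu}{\sigma} = \dfrac{X}{\sigma} + \dfrac{-\mu}{\sigma} \). Now simply use Fact 11,
with \( a = \dfrac{1}{\sigma} \) and \( b = \dfrac{-\mu}{\sigma} \):
\[ \frac{X - \mu}{\sigma} = \frac{X}{\sigma} + \frac{-\mu}{\sigma} \sim N\left(\frac{\mu}{\sigma} + \frac{-\mu}{\sigma}, \frac{1}{\sigma^2}\sigma^2\right) = N(0, 1). \]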
Answer to Exercise 82. We are given that X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2).
(a)
\[ P(X \geq 1) = P\left(Z \geq \frac{1 - 2.14}{\sqrt{5}}\right) \approx P(Z \geq -0.5098) = P(Z \leq 0.5098) = \Phi(0.5098) \approx 0.6949. \]
\[ P(Y \geq 1) = P\left(Z \geq \frac{1 - (-0.33)}{\sqrt{2}}\right) \approx P(Z \geq 0.9405) = 1 - P(Z \leq 0.9405) = 1 - \Phi(0.9405) \approx 0.1735. \]
[Figures: P (X ≥ 1) and P (Y ≥ 1); P (−2 ≤ X ≤ −1.5) and P (−2 ≤ Y ≤ −1.5).]
(b)
\[ \begin{aligned} P(-2 \leq X \leq -1.5) &= P\left(\frac{(-2) - 2.14}{\sqrt{5}} \leq Z \leq \frac{(-1.5) - 2.14}{\sqrt{5}}\right) \approx P(-1.8515 \leq Z \leq -1.6279) \\ &= P(1.6279 \leq Z \leq 1.8515) = \Phi(1.8515) - \Phi(1.6279) \approx 0.9679 - 0.9482 = 0.0197. \end{aligned} \]
\[ \begin{aligned} P(-2 \leq Y \leq -1.5) &= P\left(\frac{(-2) - (-0.33)}{\sqrt{2}} \leq Z \leq \frac{(-1.5) - (-0.33)}{\sqrt{2}}\right) \approx P(-1.1809 \leq Z \leq -0.8273) \\ &= P(0.8273 \leq Z \leq 1.1809) = \Phi(1.1809) - \Phi(0.8273) \approx 0.8812 - 0.7959 = 0.0853. \end{aligned} \]
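The four probabilities above can be reproduced without Z-tables using the standard library's NormalDist class (Python 3.8 or later). This is only a sketch; note that NormalDist takes the standard deviation, not the variance, so the variances 5 and 2 are square-rooted first.

```python
from math import sqrt
from statistics import NormalDist

# NormalDist expects the standard deviation, so take square roots of the variances.
X = NormalDist(mu=2.14, sigma=sqrt(5))
Y = NormalDist(mu=-0.33, sigma=sqrt(2))

p_x_ge_1 = 1 - X.cdf(1)
p_y_ge_1 = 1 - Y.cdf(1)
p_x_band = X.cdf(-1.5) - X.cdf(-2)
p_y_band = Y.cdf(-1.5) - Y.cdf(-2)
print(round(p_x_ge_1, 4), round(p_y_ge_1, 4))
print(round(p_x_band, 4), round(p_y_band, 4))
```

Small differences in the last decimal place relative to table lookups are to be expected, because the tables round the z-values.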

Answer to Exercise 83. (a) Let W ∼ N (25000, 64000000) and E ∼ N (200, 10000). Let
B = 0.002W + 0.3E be the total bill in a given month. Then
\[ B \sim N\left(0.002 \times 25000 + 0.3 \times 200,\ 0.002^2 \times 64000000 + 0.3^2 \times 10000\right) = N(50 + 60, 256 + 900) = N(110, 1156). \]
Thus, P (B > 100) ≈ 0.6157 (calculator).
(b) Let B1 ∼ N (110, 1156), B2 ∼ N (110, 1156), . . . , B12 ∼ N (110, 1156) be the bills in each
of the 12 months.
Then the total bill in a year is T = B1 +B2 +⋅ ⋅ ⋅+B12 ∼ N (12 × 110, 12 × 1156) = N (1320, 13872).
Thus, P (T > 1000) ≈ 0.9967 (calculator).
(c) The total bill in a given month is B = 0.002W + xE and
\[ B \sim N\left(50 + 200x,\ 256 + 10000x^2\right). \]
Our goal is to find the value of x for which P (B > 100) = 0.1. We have
\[ P(B > 100) = P\left(Z > \frac{100 - (50 + 200x)}{\sqrt{256 + 10000x^2}}\right) = P\left(Z > \frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 1 - \Phi\left(\frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 0.1. \]
From the Z-tables,
\[ \Phi\left(\frac{50 - 200x}{\sqrt{256 + 10000x^2}}\right) = 0.9 \iff \frac{50 - 200x}{\sqrt{256 + 10000x^2}} \approx 1.2815. \]
One can rearrange, do the algebra (square both sides), and use the quadratic formula:
squaring gives \( (50 - 200x)^2 \approx 1.2815^2 \left(256 + 10000x^2\right) \), and the root that keeps 50 − 200x
positive is x ≈ 0.121. Alternatively, one can simply use one’s graphing calculator to find
that x ≈ 0.121. We conclude that x can be at most approximately 0.121 if the probability
that the total utility bill in a given month exceeds $100 is to be 0.1 or less.

Answer to Exercise 84. From our earlier work, we know that each die roll has mean
3.5 and variance 35/12.
The CLT says that since n = 30 ≥ 30 is large enough and the distribution is “nice enough”
(we are assuming this), X can be approximated by the normal random variable Y ∼
N (30 × 3.5, 30 × 35/12) = N (105, 1050/12). Thus, using also the continuity correction, we
have P(100 ≤ X ≤ 110) ≈ P(99.5 ≤ Y ≤ 110.5) ≈ 0.4435 (calculator).
Answer to Exercise 85. Let X be the random variable that is the sum of the weights of
the 5, 000 Coco-Pops.
The CLT says that since n = 5000 ≥ 30 is large enough and the distribution is “nice
enough” (we are assuming this), X can be approximated by the normal random variable
Y ∼ N (5000 × 0.1, 5000 × 0.004) = N (500, 20). Thus, P (X ≤ 499) ≈ P (Y ≤ 499) ≈ 0.4115
(calculator).
Answer to Exercise 86. \( \bar{x} = \dfrac{3 + 14 + 2 + 8 + 8 + 6 + 0}{7} = \dfrac{41}{7} \) and
\[ s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{9 + 196 + 4 + 64 + 64 + 36 - 41^2/7}{6} = \frac{155}{7}. \]
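Answer to Exercise 87. (a) The sample mean x̄ and variance s² are
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1885}{10} = 188.5, \]
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{378{,}265 - \frac{1885^2}{10}}{9} \approx 2550. \]
(b) The sample mean x̄ and variance s² are
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{n} (x_i - 50 + 50)}{n} = \frac{\sum_{i=1}^{n} (x_i - 50) + \sum_{i=1}^{n} 50}{n} = \frac{1885 + 50n}{n} = \frac{1885}{n} + 50 = 238.5, \]
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - 50)^2 - \frac{\left[\sum_{i=1}^{n} (x_i - 50)\right]^2}{n}}{n - 1} = \frac{378{,}265 - \frac{1885^2}{10}}{9} \approx 2550. \]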

Answer to Exercise 88. (a) Assume that the weights of the five Singaporeans sampled
are independently- and identically-distributed. Then unbiased estimates for the population
mean µ and variance σ 2 of the weights of Singaporeans are, respectively, the observed
sample mean x̄ and observed sample variance s2 :
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{32 + 88 + 67 + 75 + 56}{5} = 63.6, \]
\[ s^2 = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1} = \frac{32^2 + 88^2 + 67^2 + 75^2 + 56^2 - 5 \times 63.6^2}{4} = 448.3. \]
(b) We don’t know! And unless we literally gather and weigh every single Singaporean, we
will never know what exactly the average weight of a Singaporean is.
All we’ve found in part (a) is an estimate (63.6 kg) for the average weight of a Singaporean.
We know that on average, the estimator we use “gets it right”.
However, it could well be that we’re unlucky (and got 5 unusually heavy or unusually light
persons) and the estimate of 63.6 kg is thus way off.
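The two estimates in part (a) can be checked with Python's statistics module. In this sketch, statistics.variance divides by n − 1, so it matches the unbiased sample variance s² used above; the variable names are illustrative.

```python
from statistics import mean, variance

# The five sampled weights from part (a), in kg.
weights_kg = [32, 88, 67, 75, 56]

x_bar = mean(weights_kg)
s_squared = variance(weights_kg)  # sample variance: divides by n - 1, not n
print(x_bar, s_squared)
```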
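Answer to Exercise 89.
\[ \begin{aligned} E[\bar{X}] &= E\left[\frac{X_1 + X_2 + \dots + X_n}{n}\right] = \frac{E[X_1 + X_2 + \dots + X_n]}{n} \\ &= \frac{E[X_1] + E[X_2] + \dots + E[X_n]}{n} = \frac{\mu + \mu + \dots + \mu}{n} = \frac{n\mu}{n} = \mu. \end{aligned} \]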
We have just shown that E [X̄] = µ. In other words, we’ve just shown that X̄ is an unbiased
estimator for µ.

Answer to Exercise 90. (a) The observed random sample is \( (x_1, x_2, \dots, x_{10}) = (1, 1, 1, 1, 1, 1, 1, 0, 0, 0) \).
The observed sample mean and observed sample variance are
\[ \bar{x} = \frac{x_1 + x_2 + \dots + x_{10}}{n} = 0.7, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_{10} - \bar{x})^2}{n - 1} = \frac{7 \cdot 0.3^2 + 3 \cdot 0.7^2}{9} = 0.2\dot{3}. \]
(b) Yes, the observed sample mean x̄ = 0.7 is an unbiased estimate for the true population
mean µ (i.e. the true proportion of coin flips that are heads).
And yes, the observed sample variance \( s^2 = 0.2\dot{3} \) is an unbiased estimate for the true
population variance σ².
(c) No, this is merely one observed random sample, from which we generated a single
estimate (“guess”) — namely x̄ = 0.7 — of the true population mean µ.
All we know is that the sample mean X̄ is an unbiased estimator for the true population
mean µ. That is, the average estimate generated by X̄ will equal µ.
However, any particular estimate x̄ may or may not be equal to µ. Indeed, if we’re unlucky,
our particular estimate may be very far from the true µ.
Answer to Exercise 91.
\[ V[\bar{X}] = V\left[\frac{1}{n}(X_1 + X_2 + \dots + X_n)\right] = \frac{1}{n^2} V[X_1 + X_2 + \dots + X_n] = \frac{1}{n^2}\left(V[X_1] + V[X_2] + \dots + V[X_n]\right) = \frac{1}{n^2}\left(n\sigma^2\right) = \frac{\sigma^2}{n}. \]

Answer to Exercise 92. (a) The population mean µ is the number defined by
\( \mu = \sum_{i=1}^{k} x_i / k \). It is the average across all population values.
(b) The population variance σ² is the number defined by \( \sigma^2 = \sum_{i=1}^{k} (x_i - \mu)^2 / k \). It measures
the dispersion across the population values.
(c) The sample mean X̄ is a random variable defined by \( \bar{X} = \sum_{i=1}^{n} X_i / n \). It is the average
of all values in a random sample.
(d) The sample variance S² is a random variable defined by \( S^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1) \).
It measures the dispersion across the values in a random sample.
(e) The mean of the sample mean, also called the expected value of the sample mean, is the
number E [X̄]. The interpretation is that if we have infinitely-many observed samples
of size n, calculate the observed sample mean for each, then E [X̄] is equal to the average
across the observed sample means. It can be shown that E [X̄] = µ and hence that the
sample mean X̄ is an unbiased estimator for the population mean µ.
(f) The variance of the sample mean is the number V [X̄]. The interpretation is that if
we have infinitely-many observed random samples of size n, calculate the observed sample
mean for each, then V [X̄] measures the dispersion across the observed sample means.
(g) The mean of the sample variance, also called the expected value of the sample variance,
is the number E [S 2 ]. The interpretation is that if we have infinitely-many observed
random samples of size n, calculate the observed sample variance for each, then E [S 2 ] is
equal to the average across the observed sample variances. It can be shown that E [S 2 ] = σ 2
and hence that the sample variance S 2 is an unbiased estimator for the population variance
σ2 .
(h) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample mean as
$$\bar{x} = \frac{x_1 + x_2 + x_3}{3} = \frac{1 + 1 + 0}{3} = \frac{2}{3}.$$
The observed sample mean is the average of all values in an observed random sample.
(i) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample variance as
$$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{3 - 1} = \frac{1/9 + 1/9 + 4/9}{2} = \frac{1}{3}.$$
The observed sample variance measures the dispersion across the values in an observed random sample.

Answer to Exercise 93. Let µ be the probability that a coin-flip is heads. The null
and alternative hypotheses are
H0 ∶ µ = 0.5 and HA ∶ µ > 0.5.
Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is
$$P(T \ge 17 \mid H_0) = P(T = 17 \mid H_0) + P(T = 18 \mid H_0) + P(T = 19 \mid H_0) + P(T = 20 \mid H_0)$$
$$= \binom{20}{17} 0.5^{17}\, 0.5^{3} + \binom{20}{18} 0.5^{18}\, 0.5^{2} + \binom{20}{19} 0.5^{19}\, 0.5^{1} + \binom{20}{20} 0.5^{20}\, 0.5^{0} \approx 0.0013.$$
Since p ≈ 0.0013 < α = 0.05, we reject H0 at the 5% significance level.
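As a numerical check of this p-value, here is a minimal sketch that sums the binomial probabilities directly. The helper name `binom_upper_tail` is introduced here for illustration only.

```python
from math import comb

def binom_upper_tail(n, p, t):
    """Return P(T >= t) for T ~ B(n, p) by summing the probability mass."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

# One-tailed p-value for 17 heads in 20 flips under H0: mu = 0.5.
p_value = binom_upper_tail(20, 0.5, 17)
print(round(p_value, 4))  # 0.0013
```

Because the distribution B(20, 0.5) is symmetric, doubling this value gives the two-tailed p-value used in the next exercise.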
Answer to Exercise 94. Let µ be the true long-run proportion of coin-flips that are
heads. The null and alternative hypotheses are
H0 ∶ µ = 0.5 and HA ∶ µ ≠ 0.5.
Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is
$$P(T \ge 17 \text{ or } T \le 3 \mid H_0) = P(T = 0 \mid H_0) + \dots + P(T = 3 \mid H_0) + P(T = 17 \mid H_0) + \dots + P(T = 20 \mid H_0)$$
$$= \binom{20}{0} 0.5^{0}\, 0.5^{20} + \binom{20}{1} 0.5^{1}\, 0.5^{19} + \dots + \binom{20}{17} 0.5^{17}\, 0.5^{3} + \dots + \binom{20}{20} 0.5^{20}\, 0.5^{0} \approx 0.0026.$$
Since p ≈ 0.0026 < α = 0.05, we reject H0 at the 5% significance level.

Answer to Exercise 95. Let µ be the probability that a coin-flip is heads.
(a) The competing hypotheses are H0 ∶ µ = 0.5, HA ∶ µ > 0.5.
The test statistic T is the number of heads (out of the 20 coin-flips).
For t = 14, the corresponding p-value is
$$P(T \ge 14 \mid H_0) = P(T = 14 \mid H_0) + P(T = 15 \mid H_0) + \dots + P(T = 20 \mid H_0)$$
$$= \binom{20}{14} 0.5^{14}\, 0.5^{6} + \binom{20}{15} 0.5^{15}\, 0.5^{5} + \dots + \binom{20}{20} 0.5^{20}\, 0.5^{0} \approx 0.05766.$$
For t = 15, the corresponding p-value is
$$P(T \ge 15 \mid H_0) = P(T = 15 \mid H_0) + P(T = 16 \mid H_0) + \dots + P(T = 20 \mid H_0)$$
$$= \binom{20}{15} 0.5^{15}\, 0.5^{5} + \binom{20}{16} 0.5^{16}\, 0.5^{4} + \dots + \binom{20}{20} 0.5^{20}\, 0.5^{0} \approx 0.02069.$$
Thus, the critical value is 15 (this is the value of t at which we are just able to reject H0 at
the α = 0.05 significance level).
And the critical region is {15, 16, . . . , 20} (this is the set of values of t at which we’d be able
to reject H0 at the α = 0.05 significance level).
(b) The competing hypotheses are H0 ∶ µ = 0.5, HA ∶ µ ≠ 0.5.
The test statistic T is the number of heads (out of the 20 coin-flips).
$$P(T \ge 14 \text{ or } T \le 6 \mid H_0) = 1 - P(7 \le T \le 13 \mid H_0)$$
$$= 1 - \left[P(T = 7 \mid H_0) + P(T = 8 \mid H_0) + \dots + P(T = 13 \mid H_0)\right]$$
$$= 1 - \left[\binom{20}{7} 0.5^{7}\, 0.5^{13} + \binom{20}{8} 0.5^{8}\, 0.5^{12} + \dots + \binom{20}{13} 0.5^{13}\, 0.5^{7}\right] \approx 0.1153.$$
$$P(T \ge 15 \text{ or } T \le 5 \mid H_0) = 1 - P(6 \le T \le 14 \mid H_0)$$
$$= 1 - \left[P(T = 6 \mid H_0) + P(T = 7 \mid H_0) + \dots + P(T = 14 \mid H_0)\right]$$
$$= 1 - \left[\binom{20}{6} 0.5^{6}\, 0.5^{14} + \binom{20}{7} 0.5^{7}\, 0.5^{13} + \dots + \binom{20}{14} 0.5^{14}\, 0.5^{6}\right] \approx 0.04139.$$
Thus, the critical values are 5 and 15, and the critical region is {0, 1, . . . , 5} ∪ {15, 16, . . . , 20}.
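The search for the one-tailed critical value in part (a) can be sketched as a short loop: find the smallest t whose upper-tail probability under H0 falls below α. The helper name `upper_tail` is an assumption made for this illustration.

```python
from math import comb

def upper_tail(n, p, t):
    """Return P(T >= t) for T ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(t, n + 1))

alpha = 0.05
# Smallest t at which H0: mu = 0.5 is rejected in favour of HA: mu > 0.5.
critical_value = next(t for t in range(21) if upper_tail(20, 0.5, t) < alpha)
critical_region = list(range(critical_value, 21))

print(critical_value, critical_region)
```

The loop reproduces the comparison made by hand above: the tail probability at t = 14 is still above 0.05, while at t = 15 it is below.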

Answer to Exercise 96. The competing hypotheses are:
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The observed sample mean is
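$$\bar{x} = \frac{35 + 35 + 31 + 32 + 33 + 34 + 31 + 34 + 35 + 34}{10} = 33.4.$$
The corresponding p-value is
$$p = P(\bar{X} \le 33.4 \text{ or } \bar{X} \ge 34.6 \mid H_0) = P(\bar{X} \le 33.4 \mid H_0) + P(\bar{X} \ge 34.6 \mid H_0)$$
$$= P\left(Z \le \frac{33.4 - 34}{\sqrt{9/10}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{9/10}}\right) \approx 0.5271.$$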
The large p-value does not cast doubt on or provide evidence against H0 . We fail to reject
H0 at the α = 0.05 significance level.
H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The observed sample mean is x̄ = 33.4.

$$p = P(\bar{X} \le 33.4 \text{ or } \bar{X} \ge 34.6 \mid H_0) = P(\bar{X} \le 33.4 \mid H_0) + P(\bar{X} \ge 34.6 \mid H_0)$$
$$\overset{\text{CLT}}{\approx} P\left(Z \le \frac{33.4 - 34}{\sqrt{9/100}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{9/100}}\right) \approx 0.04550.$$
The small p-value casts doubt on or provides evidence against H0 . We reject H0 at the
α = 0.05 significance level.

H0 ∶ µ = 34,
HA ∶ µ ≠ 34.
The observed sample mean is x̄ = 33.4. And the observed sample variance is s2 = 11.2.
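$$p = P(\bar{X} \le 33.4 \text{ or } \bar{X} \ge 34.6 \mid H_0) = P(\bar{X} \le 33.4 \mid H_0) + P(\bar{X} \ge 34.6 \mid H_0)$$
$$\overset{\text{CLT}}{\approx} P\left(Z \le \frac{33.4 - 34}{\sqrt{11.2/100}}\right) + P\left(Z \ge \frac{34.6 - 34}{\sqrt{11.2/100}}\right) \approx 0.07300.$$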
The fairly small p-value casts some doubt on or provides some evidence against H0 . But
we fail to reject H0 at the α = 0.05 significance level.
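The three two-tailed p-values above (variance 9 with n = 10, variance 9 with n = 100, and sample variance 11.2 with n = 100) follow the same recipe, so they can be checked with one small function. This is a sketch using the standard library's `NormalDist`; the function name `two_tailed_p` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(x_bar, mu0, variance, n):
    """Two-tailed p-value for a sample mean, using a normal distribution for X-bar."""
    z = abs(x_bar - mu0) / sqrt(variance / n)
    return 2 * (1 - NormalDist().cdf(z))

p_small_sample = two_tailed_p(33.4, 34, 9, 10)       # about 0.5271
p_large_sample = two_tailed_p(33.4, 34, 9, 100)      # about 0.0455
p_estimated_var = two_tailed_p(33.4, 34, 11.2, 100)  # about 0.0730

print(p_small_sample, p_large_sample, p_estimated_var)
```

Only the second p-value falls below α = 0.05, which matches the three conclusions drawn above.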
Answer to Exercise 99. The observed sample mean is $\bar{x} = 68$ and the observed sample
variance (use Fact 13(a)) is
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left[\sum_{i=1}^{n} x_i\right]^2 / n}{n - 1} = \frac{50 \times 5000 - (68 \times 50)^2 / 50}{49} \approx 383.7.$$
Let µ be the true average weight of a Singaporean. The competing hypotheses are H0 ∶ µ =
75 and HA ∶ µ < 75.
(This is a one-tailed test, because your friend’s claim is that the average American is heavier
than the average Singaporean. If the claim were instead that the average American’s weight
is different from the average Singaporean’s, then we’d have a two-tailed test.)
Since the sample size n = 50 is “large enough”, we can appeal to the CLT. The p-value is
$$p = P(\bar{X} \le 68 \mid H_0) \overset{\text{CLT}}{\approx} P\left(Z \le \frac{68 - 75}{\sqrt{383.7/50}}\right) \approx 0.0058.$$
The small p-value casts doubt on or provides evidence against H0 . We can reject H0 at any
conventional significance level (α = 0.1, α = 0.05, or α = 0.01).

[Figure: scatter plot of q against p ($).]
Answer to Exercise 101. Compute $\bar{p} = (8 + 9 + 4 + 10 + 8)/5 = 7.8$ and
$\bar{q} = (300 + 250 + 1000 + 400 + 400)/5 = 470$. Also,
$$\sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q}) = (8 - \bar{p})(300 - \bar{q}) + (9 - \bar{p})(250 - \bar{q}) + \dots + (8 - \bar{p})(400 - \bar{q})$$
$$= (8 - 7.8)(300 - 470) + (9 - 7.8)(250 - 470) + \dots + (8 - 7.8)(400 - 470) = -2480,$$
$$\sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2} = \sqrt{(8 - 7.8)^2 + (9 - 7.8)^2 + (4 - 7.8)^2 + (10 - 7.8)^2 + (8 - 7.8)^2} = \sqrt{20.8} \approx 4.56070170,$$
$$\sqrt{\sum_{i=1}^{n} (q_i - \bar{q})^2} = \sqrt{(300 - 470)^2 + (250 - 470)^2 + \dots + (400 - 470)^2} = \sqrt{368000} \approx 606.63003552.$$
Thus,
$$r = \frac{\sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q})}{\sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2}\,\sqrt{\sum_{i=1}^{n} (q_i - \bar{q})^2}} \approx \frac{-2480}{4.56070170 \times 606.63003552} \approx -0.8964.$$
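The computation of r can be reproduced in a few lines. The sketch below uses the five (p, q) pairs from this exercise; the variable names `s_pq`, `s_pp`, and `s_qq` are shorthand introduced here for the three sums.

```python
from math import sqrt

p = [8, 9, 4, 10, 8]             # prices ($)
q = [300, 250, 1000, 400, 400]   # quantities

n = len(p)
p_bar = sum(p) / n
q_bar = sum(q) / n

# The three sums that appear in the formula for r.
s_pq = sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q))
s_pp = sum((pi - p_bar) ** 2 for pi in p)
s_qq = sum((qi - q_bar) ** 2 for qi in q)

r = s_pq / sqrt(s_pp * s_qq)
print(round(r, 4))  # -0.8964
```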

Answer to Exercise 102. (a) We already computed (in the previous exercise) that
$\bar{p} = 7.8$, $\bar{q} = 470$, $\sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q}) = -2480$ and $\sum_{i=1}^{n} (p_i - \bar{p})^2 = 20.8$. So,
$$\hat{b} = \frac{\sum_{i=1}^{n} (p_i - \bar{p})(q_i - \bar{q})}{\sum_{i=1}^{n} (p_i - \bar{p})^2} = \frac{-2480}{20.8} \approx -119.2.$$
Thus, the regression line of q on p is $q - \bar{q} = \hat{b}(p - \bar{p})$ or $q - 470 = -119.2(p - 7.8)$ or
$q = 1400 - 119.2p$.
(b)
i                    1      2      3      4      5
pi ($)               8      9      4     10      8
qi                 300    250   1000    400    400
q̂i                 446    327    923    208    446
ûi = qi − q̂i      −146    −77     77    192    −46
(c) [Figure: graph of q against p ($).]
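(d) The SSR is $\sum_{i=1}^{5} \hat{u}_i^2 \approx 72308$, computed from the unrounded residuals. (The rounded residuals in the table give $(-146)^2 + (-77)^2 + 77^2 + 192^2 + (-46)^2 = 72154$.)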

The TI84 tells us that r = −0.8963881445 and the regression line is y = ax + b = −119.2307692x +
1400. This is indeed consistent with the answers from the previous exercises.
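The calculator output can also be cross-checked without a TI84. The following sketch fits the OLS line of q on p from the same five data points and computes the sum of squared residuals; it is an illustration of the formulas used in these answers, not a reproduction of the calculator's internal method.

```python
p = [8, 9, 4, 10, 8]             # prices ($)
q = [300, 250, 1000, 400, 400]   # quantities

n = len(p)
p_bar = sum(p) / n
q_bar = sum(q) / n

# OLS slope and intercept for the regression line of q on p.
slope = sum((pi - p_bar) * (qi - q_bar) for pi, qi in zip(p, q)) / sum(
    (pi - p_bar) ** 2 for pi in p
)
intercept = q_bar - slope * p_bar

# Fitted values, residuals, and the sum of squared residuals (SSR).
fitted = [intercept + slope * pi for pi in p]
residuals = [qi - fi for qi, fi in zip(q, fitted)]
ssr = sum(u ** 2 for u in residuals)

print(slope, intercept, ssr)
```

The residuals always sum to zero (up to rounding) for an OLS line with an intercept, which is a quick sanity check on any table of fitted values.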
Answer to Exercise 104. In the previous exercises, we already calculated that the OLS
line of best fit is q = 1400 − 119.2p. Thus,
(a) By interpolation, a barber who charges $7 per haircut is predicted to sell 1400 − 119.2 × 7 ≈ 566
haircuts.
(b) By extrapolation, a barber who charges $200 per haircut is predicted to sell 1400 − 119.2 × 200 = −22440
haircuts.
The second prediction is plainly absurd and thus obviously less reliable than the first.

63 Answers to Exercises in Part IV (2006-2015 A-Level Exams)
63.1 Answers for Ch. 58: Pure Mathematics
Answer to Exercise 105 (8864 N2015/I/1). The given expression is a ∩-shaped
quadratic. It is thus always negative if and only if its discriminant is negative. But this
is impossible, because its discriminant is $D = (k - 4)^2 - 4(-2)(2k) = k^2 + 8k + 16 = (k + 4)^2$,
which is never negative.
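Answer to Exercise 106 (8864 N2015/I/2). (i)
$$\frac{d}{dx} \frac{3}{(2x - 1)^4} = -12(2x - 1)^{-5}(2) = \frac{-24}{(2x - 1)^5}.$$
(ii)
$$\int_{0.5}^{1} \left(x + \frac{2}{x}\right)^2 dx = \int_{0.5}^{1} x^2 + 4 + \frac{4}{x^2}\, dx = \left[\frac{x^3}{3} + 4x - \frac{4}{x}\right]_{0.5}^{1} = \left(\frac{1}{3} + 4 - 4\right) - \left(\frac{1}{24} + 2 - 8\right) = 6\tfrac{7}{24}.$$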
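The definite integral in part (ii) can be checked exactly with rational arithmetic. This is a sketch that evaluates the antiderivative found above at the two limits; the function name `antiderivative` is chosen here for illustration.

```python
from fractions import Fraction

def antiderivative(x):
    """F(x) = x^3/3 + 4x - 4/x, an antiderivative of (x + 2/x)^2 = x^2 + 4 + 4/x^2."""
    return x**3 / 3 + 4 * x - 4 / x

# Evaluate F(1) - F(1/2) exactly.
value = antiderivative(Fraction(1)) - antiderivative(Fraction(1, 2))
print(value)  # 151/24, i.e. 6 and 7/24
```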
Answer to Exercise 107 (8864 N2015/I/3). (i) $dy/dx = 12 - 16e^{-2x} = 0 \iff x = -0.5 \ln 0.75$.
(ii) $\int_0^a 12x + 8e^{-2x}\, dx = \left[6x^2 - 4e^{-2x}\right]_0^a = (6a^2 - 4e^{-2a}) - (6 \cdot 0^2 - 4e^{-2 \cdot 0}) = 6a^2 - 4e^{-2a} + 4$.
Answer to Exercise 108 (8864 N2015/I/4). (i) The perimeter of PQRSTU is $3x + 3(y - 2x) = 3(y - x) = 30$, so $y = 10 + x$.
△DEF has area $0.5y\sqrt{y^2 - (0.5y)^2} = \sqrt{3}y^2/4 = \sqrt{3}(10 + x)^2/4$.
Each of the three small cut-out triangles has area $\sqrt{3}x^2/4$. So
$$A = \frac{\sqrt{3}}{4}(10 + x)^2 - \frac{\sqrt{3}}{4} \cdot 3x^2 = \frac{\sqrt{3}}{4}(100 + x^2 + 20x - 3x^2) = \frac{\sqrt{3}}{4}(100 + 20x - 2x^2) = \frac{\sqrt{3}}{2}(50 + 10x - x^2).$$
(ii) $\dfrac{dA}{dx} = \dfrac{\sqrt{3}}{2}(10 - 2x) = 0 \iff x = 5 \implies A_{\max} = \dfrac{\sqrt{3}}{2} \cdot 75$.

(ii) $\dfrac{dy}{dx} = 0.5^x \ln 0.5 - \dfrac{1}{x + 1}$
$$\implies \left.\frac{dy}{dx}\right|_{x = 0.5} = 0.5^{0.5} \ln 0.5 - \frac{1}{0.5 + 1} = -\sqrt{0.5} \ln 2 - \frac{2}{3} \approx -1.157.$$
(iii) The point P is $(0.5, \sqrt{0.5} - \ln 1.5)$. The normal to C at P has equation
$$y - (\sqrt{0.5} - \ln 1.5) = \frac{-1}{-\sqrt{0.5} \ln 2 - \frac{2}{3}}(x - 0.5) = \frac{1}{\sqrt{0.5} \ln 2 + \frac{2}{3}}(x - 0.5)$$
or y = 0.8644568499x − 0.13058675187.
Its y- and x-intercepts are $y_0 = 0.8644568499(0) - 0.13058675187 \approx -0.13058675187$ and $x_0 = (0 + 0.13058675187)/0.8644568499 \approx 0.15106219805$. The length of AB is thus $\sqrt{x_0^2 + y_0^2} \approx 0.200$.
Answer to Exercise 110 (8864 N2014/I/1).
$$\int_1^6 \frac{1}{\sqrt{1 + 4x}}\, dx = \left[\frac{2}{4}\sqrt{1 + 4x}\right]_1^6 = \frac{1}{2}\left(\sqrt{25} - \sqrt{5}\right) = 0.5\left(5 - \sqrt{5}\right).$$

Answer to Exercise 111 (8864 N2014/I/2). (i) $2x(x^2 + 4)^{-1}$.
(ii) $2x(x^2 + 4)^{-1} = k \iff 2x = k(x^2 + 4) \iff kx^2 - 2x + 4k = 0$.
(iii) This quadratic equation has equal roots if and only if its discriminant is zero. $D = (-2)^2 - 4(k)(4k) = 4 - 16k^2 = 0 \iff k = \pm 0.5$.
(ii) $dy/dx = 2e^{1 - 2x}$. The point is $(1, 1 - e^{-1})$. $dy/dx|_{x=1} = 2e^{-1}$. So the equation is $y - (1 - e^{-1}) = 2e^{-1}(x - 1)$ or $y = 2e^{-1}x + 1 - 3e^{-1}$.
Answer to Exercise 113 (8864 N2014/I/4). (i) By the Pythagorean Theorem, $(y - x)^2 + (2x)^2 = 4(65)$ or $5x^2 + y^2 - 2xy = 260$.
(ii) The perimeter is $2(3x + y) = 6x + 2y = 60$, so $y = 30 - 3x$. Plug this into the equation
from (i) to get
$$5x^2 + (30 - 3x)^2 - 2x(30 - 3x) = 260$$
$$\iff 5x^2 + 900 + 9x^2 - 180x - 60x + 6x^2 = 260$$
$$\iff 20x^2 - 240x + 640 = 0 \iff x^2 - 12x + 32 = (x - 8)(x - 4) = 0 \iff x = 4, 8.$$
Correspondingly, y = 18, 6. We reject the latter because y > x. Thus, (x, y) = (4, 18).

Answer to Exercise 114 (8864 N2014/I/5). (i) $dy/dx = 3x^2 + 2kx + 7 = 0$. If A is a
stationary point, then $3(1)^2 + 2k(1) + 7 = 0 \iff k = -5$. And $2 = 1^3 - 5 \cdot 1^2 + 7 \cdot 1 + c \iff c = -1$.
(ii) $3x^2 - 10x + 7 = 0 \iff$
$$x = \frac{10 \pm \sqrt{(-10)^2 - 4(3)(7)}}{2(3)} = \frac{5 \pm \sqrt{25 - 21}}{3} = 1, \frac{7}{3}.$$
$y = (7/3)^3 - 5(7/3)^2 + 7(7/3) - 1 = 343/27 - 245/9 + 49/3 - 1 = 22/27$. So B is (7/3, 22/27).

(iii)
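(iv) $$\int_1^2 x^3 - 5x^2 + 7x - 1\, dx = \left[\frac{x^4}{4} - \frac{5x^3}{3} + \frac{7x^2}{2} - x\right]_1^2 = \left(4 - \frac{40}{3} + 14 - 2\right) - \left(\frac{1}{4} - \frac{5}{3} + \frac{7}{2} - 1\right) = \frac{19}{12}.$$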
Answer to Exercise 115 (8864 N2013/I/1). A quadratic equation has no real roots if
and only if its discriminant is negative. In this case, the discriminant is $D = [-(k - 2)]^2 - 4(1)(2k + 1) = k^2 - 4k + 4 - 8k - 4 = k^2 - 12k = k(k - 12)$.
D < 0 if and only if k ∈ (0, 12).
Answer to Exercise 116 (8864 N2013/I/2). (i) $4x/(1 + 2x^2)$.
(ii) $$\int_{-1}^{0} \frac{1}{(1 - 3x)^4}\, dx = \left[\frac{1}{9}(1 - 3x)^{-3}\right]_{-1}^{0} = \frac{1}{9} - \frac{1}{9} \cdot 4^{-3} = \frac{7}{64}.$$
Answer to Exercise 117 (8864 N2013/I/3). (i) The perimeter is $4x + 2y + 3x + 5x = 12x + 2y = 20$, so $y = 10 - 6x$. The area is $S = 4xy + 6x^2 = 4x(10 - 6x) + 6x^2 = 40x - 18x^2$.
(ii) $dS/dx = 40 - 36x = 0 \iff x = 10/9$. So $S_{\max} = 40(10/9) - 18(10/9)^2 = 200/9$.

Answer to Exercise 118 (8864 N2013/I/4). (i) $dy/dx = 3x^2 - 2ax + 3$. $dy/dx|_{x=1} = 3 - 2a + 3 = 6 - 2a$. So the gradient of the normal is $1/(2a - 6)$.
(ii) The normal passes through the point P(1, 10 − a) and has equation $y - 3 = (x + 5)/(2a - 6)$.
Plugging the point into the equation,
$$10 - a - 3 = \frac{1 + 5}{2a - 6} \iff 2(7 - a)(a - 3) = 6 \iff -a^2 + 10a - 21 = 3 \iff a^2 - 10a + 24 = 0.$$
$a^2 - 10a + 24 = (a - 6)(a - 4) = 0 \iff a = 4, 6$.
(iii) $y - 3 = (y + 5)/(2 \cdot 4 - 6) \iff 2(y - 3) = (y + 5) \iff y = 11$. It is (11, 11).
Answer to Exercise 119 (8864 N2013/I/5). (i) $2 - 2x = \ln 2 - x \iff x = 2 - \ln 2$.
(ii) $dy/dx = -2e^{2 - 2x} + 2e^{-x} = 0 \iff e^x = e^{2x - 2} \iff e^2 = e^x \iff x = 2$. And $y = e^{2 - 2(2)} - 2e^{-2} = e^{-2} - 2e^{-2} = -e^{-2}$. So the stationary point is $(2, -e^{-2})$.
(iii) $e^{2 - 2x} - 2e^{-x} = 0 \iff e^{2 - x} = 2 \iff x = 2 - \ln 2$.
(iv) Exact: $\int_0^1 e^{2 - 2x} - 2e^{-x}\, dx = \left[-0.5e^{2 - 2x} + 2e^{-x}\right]_0^1 = (-0.5e^0 + 2e^{-1}) - (-0.5e^2 + 2e^0) = 2e^{-1} - 2.5 + 0.5e^2 \approx 1.930$.
Answer to Exercise 120 (8864 N2012/I/1). Let $u = e^{2x}$. Then $3e^{2x} = 4(e^{-2x} - 1)$
$\iff 3u = 4(u^{-1} - 1) \iff 3u^2 = 4(1 - u) \iff 3u^2 + 4u - 4 = 0 \iff (3u - 2)(u + 2) = 0$
$\iff u = 2/3, -2$.
Assuming that x is real, it cannot be that $e^{2x} = -2$. Hence, $e^{2x} = 2/3$ or $x = 0.5 \ln(2/3)$.

Answer to Exercise 121 (8864 N2012/I/2). (i) The areas of HBCG and DEFG are
$(x + 20)y$ and $20x$. The former is thrice the latter: $(x + 20)y = 60x \iff y = 60x/(x + 20)$. (1)
The total fencing is $x + y + x + x + 20 + x + y = 4x + 2y + 20 = 100$. (2)
Plug (1) into (2) to get
$$4x + \frac{120x}{x + 20} + 20 = 100 \iff 4x^2 + 100x + 400 + 120x = 100x + 2000 \iff 4x^2 + 120x - 1600 = 0.$$
Dividing by 4 yields the desired result.
(ii) $x^2 + 30x - 400 = (x - 10)(x + 40) = 0 \iff x = 10, -40$. (Reject the negative value.)
$HF = x + y = 10 + 60 \cdot 10/(10 + 20) = 30$.
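Answer to Exercise 122 (8864 N2012/I/3). (i) $3k^2/4 = k^2 - x^2 \iff x = \pm 0.5k$.
(ii) $$\int_{-0.5k}^{0.5k} k^2 - x^2 - \frac{3}{4}k^2\, dx = \left[\frac{k^2}{4}x - \frac{x^3}{3}\right]_{-0.5k}^{0.5k} = 2\left[\frac{k^2}{4}x - \frac{x^3}{3}\right]_0^{0.5k} = 2\left[\frac{k^3}{8} - \frac{k^3}{24}\right] = \frac{k^3}{6}.$$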
Answer to Exercise 123 (8864 N2012/I/4). (i) (a) $6/(3x + 2)$.
(b) $-8/(2x + 1)^2$.
(ii) $$\int_2^4 \left(\sqrt{x} - \frac{1}{\sqrt{x}}\right)^2 dx = \int_2^4 x + \frac{1}{x} - 2\, dx = \left[\frac{x^2}{2} + \ln|x| - 2x\right]_2^4 = (8 + \ln 4 - 8) - (2 + \ln 2 - 4) = \ln 2 + 2.$$

Answer to Exercise 124 (8864 N2012/I/5). (i) $2^x - x^2 = 0 \iff x = 2, 4$.
(ii) $dy/dx = 2^x \ln 2 - 2x$. $dy/dx|_{x=1.5} = 2^{1.5} \ln 2 - 3 \approx -1.0395$.
(iii) The point is $(1.5, 2^{1.5} - 1.5^2)$. The equation of the tangent at the point is
$y - (2^{1.5} - 1.5^2) = (2^{1.5} \ln 2 - 3)(x - 1.5) \iff y \approx -1.0395x + 2.1377$.
(iv) The points A and B are (0, 2.1377) and (1.048, 1.048). So the length AB is
$\sqrt{(2.1377 - 1.048)^2 + 1.048^2} \approx 1.51$.
Answer to Exercise 125 (8864 N2011/I/1). This is a ∪-shaped quadratic which
is entirely above the x-axis if and only if its discriminant is negative. In this case, the
discriminant is $D = (k - 2)^2 - 4(1)(k + 1) = k^2 - 4k + 4 - 4k - 4 = k^2 - 8k = k(k - 8)$.
D < 0 if and only if k ∈ (0, 8).
(ii) $2 - 0.6^x = x^2 - 1 \iff x \approx -1.1116, 1.5995$.
(iii) $\int_2^3 x^2 - 1 - (2 - 0.6^x)\, dx \approx 3.615$.

Answer to Exercise 127 (8864 N2011/I/3). (i) $\int e^{3x + 2}\, dx = e^{3x + 2}/3 + C$, where C is the constant of integration.
(ii) $\int_4^9 3(\sqrt{x} - 1/\sqrt{x})\, dx = 3\left[2x^{1.5}/3 - 2\sqrt{x}\right]_4^9 = 3[2(27 - 8)/3 - 2 \cdot (3 - 2)] = 32$.
Answer to Exercise 128 (8864 N2011/I/4). (i) $V = (2 - 2x)^2 x = 4x^3 - 8x^2 + 4x$.
(ii) $dV/dx = 12x^2 - 16x + 4 = 0 \iff 3x^2 - 4x + 1 = (3x - 1)(x - 1) = 0 \iff x = 1/3, 1$. (These
are the two stationary points.)
dV/dx is decreasing at x = 1/3 and increasing at x = 1. Hence, the former is a maximum
turning point.
Thus, $V_{\max} = 4(1/3)^3 - 8(1/3)^2 + 4(1/3) = 4/27 - 8/9 + 4/3 = 16/27$.
Answer to Exercise 129 (8864 N2011/I/5). (i) $dy/dx = 1 - 2/(2x + 1) = 0 \iff x = 0.5$.
$y = 0.5 - \ln 2$. The minimum point is $(0.5, 0.5 - \ln 2)$.
(ii) Point P is $(2, 2 - \ln 5)$. $dy/dx|_{x=2} = 0.6$. So the normal to C at P has equation
$$y - (2 - \ln 5) = -\frac{1}{0.6}(x - 2) \quad \text{or} \quad y = -\frac{5}{3}x + \frac{16}{3} - \ln 5.$$
So $A = \left(\frac{3}{5}\left(\frac{16}{3} - \ln 5\right), 0\right)$ and $B = \left(0, \frac{16}{3} - \ln 5\right)$. So the area of △OAB is
$$0.5 \cdot \frac{3}{5}\left(\frac{16}{3} - \ln 5\right)\left(\frac{16}{3} - \ln 5\right) = \frac{3}{10}\left(\frac{16}{3} - \ln 5\right)^2 = \frac{1}{30}(16 - 3\ln 5)^2.$$
Answer to Exercise 130 (8864 N2010/I/1). A quadratic equation has two real roots
if and only if its discriminant is positive. In this case, the discriminant is $D = (-2k)^2 - 4(4)(9) = 4k^2 - 144 = 4(k - 6)(k + 6)$.
D > 0 if and only if k < −6 or k > 6.
Answer to Exercise 131 (8864 N2010/I/2). (i) $\int e^{1 - 2x}\, dx = -0.5e^{1 - 2x} + C$, where C is the constant of integration.
(ii) $\int 2/(x + 1)^3\, dx = -(x + 1)^{-2} + C$, where C is the constant of integration.

Answer to Exercise 132 (8864 N2010/I/3). (i) Below. (ii) dy/dx = 2/(2x − 3).
(iii) dy/dx∣x=3 = 2/3. So at the point (3, ln 3), the normal has equation y − ln 3 = −1.5(x − 3)
or y + 1.5x = 4.5 + ln 3 or 2y + 3x = 9 + 2 ln 3.
Answer to Exercise 133 (8864 N2010/I/4). (i) The perimeter is $(5/4)2x + 2x + 2AD = 6$. Rearranging, $AD = 3 - 9x/4$.
(ii) The area is $A = 2x(3 - 9x/4) + x\sqrt{(1.25x)^2 - x^2} = 6x - 9x^2/2 + 3x^2/4 = 6x - 15x^2/4$.
(iii) $dA/dx = 6 - 15x/2 = 0 \iff x = 0.8$. $A_{\max} = 6 \cdot 0.8 - 15 \cdot 0.8^2/4 = 4.8 - 2.4 = 2.4$.
Answer to Exercise 134 (8864 N2010/I/5). (i) $dy/dx = -12x^2 - 12x^3 = 0 \iff x = 0, -1$. So the stationary points are (x, y) = (0, 6), (−1, 7). (ii) Below.
(iii) $6 - 4x^3 - 3x^4 = 0 \iff x = -1.72, 0.96$ (calculator).
(iv) $\int 6 - 4x^3 - 3x^4\, dx = 6x - x^4 - 0.6x^5 + C$, where C is a constant of integration.
$$\int_{-1}^{0.5} 6 - 4x^3 - 3x^4\, dx = \left[6x - x^4 - 0.6x^5\right]_{-1}^{0.5} = \left(3 - \frac{1}{16} - \frac{0.6}{32}\right) - (-6 - 1 + 0.6) = 9\tfrac{51}{160}.$$

Answer to Exercise 135 (8863 N2009/I/1). We're given $x + 2y = 3$ (1) and $x^2 + xy = 2$ (2).
From (1), $x = 3 - 2y$ (3). Plug (3) into (2) to get $(3 - 2y)^2 + (3 - 2y)y = 2$ or $2y^2 - 9y + 7 = 0$ or
$(2y - 7)(y - 1) = 0$. Thus, y = 7/2, 1. Correspondingly, x = −4, 1. The solutions are thus
(x, y) = (−4, 7/2), (1, 1).
Answer to Exercise 136 (8863 N2009/I/2). (i) $\sqrt{x} = 0.5x \iff x = 0.25x^2 \iff x(1 - 0.25x) = 0 \iff x = 0, 4$. So the points of intersection are (0, 0) and (4, 2).
(ii) $\int \sqrt{x}\, dx = 2x^{1.5}/3 + C$ and $\int 0.5x\, dx = x^2/4 + D$, where C and D are constants of
integration.
(iii) $\int_0^4 \sqrt{x} - 0.5x\, dx = \left[2x^{1.5}/3 - x^2/4\right]_0^4 = 16/3 - 4 = 4/3$.

Answer to Exercise 137 (8863 N2009/I/4). (i) The curve intersects the x-axis at
(1, 0) and does not intersect the y-axis.
(ii) $dy/dx = 1 + 1/x^2$. $dy/dx|_{x=2} = 5/4$. So the gradient of the normal at P is −0.8.
(iii) The point P is (2, 1.5). So the equation of the normal is y − 1.5 = −0.8(x − 2) or
4x + 5y − 15.5 = 0.
(iv) N is (0, 3.1). The equation of the tangent at P is y − 1.5 = (5/4)(x − 2). So the point
T is (0, −1).
So the area of △P T N is 0.5(4.1)(2) = 4.1.
Answer to Exercise 138 (8863 N2009/I/5). (i) $dy/dx = 6x^2 - 10x - 4 = 0 \iff 3x^2 - 5x - 2 = (3x + 1)(x - 2) = 0 \iff x = -1/3, 2$. So the stationary points are (−1/3, 100/27)
and (2, −9). (ii) Below.
(iii) From the graph, $2x^3 - 5x^2 - 4x + 3 > 0 \iff x \in (-1, 0.5) \cup (3, \infty)$.
$2e^{3x} - 5e^{2x} - 4e^x + 3 > 0 \iff e^x \in (-1, 0.5) \cup (3, \infty) \iff x \in (-\infty, \ln 0.5) \cup (\ln 3, \infty)$.

(i) sin(2π + α) = sin 2π cos α + sin α cos 2π = 0 ⋅ cos α + sin α ⋅ 1 = sin α = c.
(ii) sin(3π + α) = sin 3π cos α + sin α cos 3π = 0 ⋅ cos α + sin α ⋅ (−1) = − sin α = −c.
sin(π + α) = sin π cos α + sin α cos π = 0 ⋅ cos α + sin α ⋅ (−1) = − sin α = −c.
Answer to Exercise 140 (8863 N2008/I/2). We are given that $x + y = 20$ (1) and $x^2 + y^2 = 300$ (2).
From (1), we have $y = 20 - x$ (3). Plug (3) into (2) to get $x^2 + (20 - x)^2 = 300$ or $2x^2 - 40x + 100 = 0$
or $x^2 - 20x + 50 = 0$. Solving the quadratic:
$$x = \frac{20 \pm \sqrt{(-20)^2 - 4(1)(50)}}{2(1)} = 10 \pm \sqrt{100 - 50} = 10 \pm \sqrt{50}.$$
Correspondingly, $y = 10 \mp \sqrt{50}$. So the two solutions are
$$(x, y) = \left(10 + \sqrt{50}, 10 - \sqrt{50}\right), \left(10 - \sqrt{50}, 10 + \sqrt{50}\right).$$
Answer to Exercise 141 (8863 N2008/I/3). (i) $2x^2 = x^2 + k^2 \iff x = \pm k$.
(ii) $\int_{-k}^{k} x^2 + k^2 - 2x^2\, dx = \left[-x^3/3 + k^2 x\right]_{-k}^{k} = 2\left[-x^3/3 + k^2 x\right]_0^k = 2(-k^3/3 + k^3) = 4k^3/3$.

Answer to Exercise 142 (8863 N2008/I/5). (i) $dx/dt = 3t^2 - 24t + k > 0 \iff D = (-24)^2 - 4(3)(k) < 0 \iff k > 48$. (ii) Below.
(iii) $t^3 - 12t^2 + 36t = 375 \iff t \approx 11.7$ (calculator).
Answer to Exercise 143 (8863 N2008/I/6). (i) dy/dx = 2/(2x + 4) = 1/(x + 2).
dy/dx∣x=1 = 1/3. The equation of the tangent at P is y − ln 6 = (1/3)(x − 1). So the
x-coordinate of T is 1 − 3 ln 6.
(ii) The equation of the normal at P is y − ln 6 = −3(x − 1). So the x-coordinate of N is
1 + (ln 6)/3.
(iii) 0.5 [(ln 6) /3 + 3 ln 6] ln 6 = (5/3)(ln 6)2 .
Answer to Exercise 144 (8863 N2007/I/1). $d(3^x)/dx = 3^x \ln 3$. At x = 2, this equals
$9 \ln 3$.
(ii) The point is (2, 9). The equation is $y - 9 = 9 \ln 3\,(x - 2)$ or $y = (9 \ln 3)x + 9(1 - 2 \ln 3)$.

Answer to Exercise 145 (8863 N2007/I/3). (i) Below.
(ii) $20/(x + 2) = 10 - x^2 \iff x \approx 2.317$ (calculator).
(iii) $\int 20/(x + 2)\, dx = 20 \ln|x + 2| + C$ and $\int (10 - x^2)\, dx = 10x - x^3/3 + D$, where C and
D are constants of integration.
(iv) $\int_0^{2.317} 10 - x^2 - 20/(x + 2)\, dx = \left[10x - x^3/3 - 20 \ln|x + 2|\right]_0^{2.317} \approx 3.635$.
Answer to Exercise 146 (8863 N2007/I/4). (i) Let a be the length of each side of
the isosceles triangle. By the Pythagorean Theorem, $a^2 + a^2 = x^2$. Thus, the area of the
triangle is $0.5a^2 = x^2/4$.
(ii) Let b be the length of a side of the square. Then $2b + x = 100$. And $A = x^2/4 + b^2 = x^2/4 + [0.5(100 - x)]^2 = 2500 - 50x + 0.5x^2$.
(iii) $dA/dx = -50 + x = 0 \iff x = 50$. So $A_{\min} = 2500 - 50 \cdot 50 + 0.5 \cdot 50^2 = 1250$. This is
the minimum because A is a ∪-shaped quadratic function of x.
(iv) A is a ∪-shaped quadratic function of x. We know the minimum is at x = 50. So the
maximum is at one of the two endpoints (i.e. x = 10 or x = 80).
$A(10) = 2500 - 50 \cdot 10 + 0.5 \cdot 10^2 = 2050$. $A(80) = 2500 - 50 \cdot 80 + 0.5 \cdot 80^2 = 1700$.
So $A_{\max} = A(10) = 2050$.

Answer to Exercise 147 (8863 N2007/I/5). (i) We are given $y = 2x^2 + 3x + 2$ (1) and
$y = 2x + 3$ (2). Plug (2) into (1) to get $2x + 3 = 2x^2 + 3x + 2$ or $2x^2 + x - 1 = 0$ or $(2x - 1)(x + 1) = 0$.
Thus, x = 0.5, −1. Correspondingly, y = 4, 1. The two solutions to the given simultaneous
equations are thus (x, y) = (0.5, 4), (−1, 1).
(ii) The inequality 2x2 + 3x + 2 ≥ 2x + 3 is equivalent to (2x − 1)(x + 1) ≥ 0, which is true if
and only if x ≤ −1 or x ≥ 0.5.
(iii) From our work above, we know that the given inequality holds if and only if cos θ ≤ −1
or cos θ ≥ 0.5. Since cos θ ∈ [−1, 1], this is equivalent to cos θ = −1 or cos θ ∈ [0.5, 1]. Which
in turn is true if and only if θ ∈ [0○ , 60○ ] ∪ {180○ } ∪ [300○ , 420○ ] ∪ {540○ }.
Answer to Exercise 148 (8174 N2006/I/6). (i) $x = (\ln 23 - 2)/5 \approx 0.227$.
(ii) $y = \pm\sqrt{10^{2.5} - 40} \approx \pm 16.620$.
Answer to Exercise 149 (8174 N2006/I/7). Plug the equation of the line $y = 1 - 3x$ (1)
into the equation of the curve $x^2 + y^2 + kx + 2y + 7 = 0$ (2) to get
$$x^2 + (1 - 3x)^2 + kx + 2(1 - 3x) + 7 = 0 \iff 10x^2 + (k - 12)x + 10 = 0. \quad (3)$$
Apply d/dx to the equation of the curve:
$$\frac{d}{dx}\left(x^2 + y^2 + kx + 2y + 7\right) = \frac{d}{dx}\, 0 \iff 2x + 2y\frac{dy}{dx} + k + 2\frac{dy}{dx} = 0.$$
The tangent line has slope −3. So at the point at which the line touches the curve, we
have dy/dx = −3. Plugging this into the above, we have $2x + 2y(-3) + k + 2(-3) = 0$ or
$2x - 6y + k - 6 = 0$ (4). Now plug (1) into (4) to get $20x - 12 + k = 0$ or $k = 12 - 20x$ (5). Now plug (5) into
(3) to get $10x^2 - 20x^2 + 10 = 0$ or $10x^2 = 10$ or $x^2 = 1$ or $x = \pm 1$. Correspondingly, k = −8, 32.

Answer to Exercise 150 (8174 N2006/I/9). (i) $\int (5x^2 - 8x)\, dx = 5x^3/3 - 4x^2 + C$,
where C is the constant of integration.
(ii) $\int_0^1 e^{-2x}\, dx = \left[-0.5e^{-2x}\right]_0^1 = -0.5(e^{-2} - 1) = 0.5(1 - e^{-2})$.
Answer to Exercise 151 (8174 N2006/I/16). (i) $-4x + 19 = -2x^2 + 6x + 11 \iff 2x^2 - 10x + 8 = 0 \iff x^2 - 5x + 4 = (x - 1)(x - 4) = 0 \iff x = 1, 4$. So A = (1, 15) and
B = (4, 3).
(ii) $$\int_1^4 -2x^2 + 6x + 11 - (-4x + 19)\, dx = \int_1^4 -2x^2 + 10x - 8\, dx = \left[-2x^3/3 + 5x^2 - 8x\right]_1^4$$
$$= (-2/3) \cdot 63 + 5 \cdot 15 - 8 \cdot 3 = -42 + 75 - 24 = 9.$$

63.2 Answers for Ch. 59: Probability and Statistics
Answer to Exercise 152 (8864 N2015/I/6). Let X be the mass of a peach. We are
given that $P(X < 40) = P(Z < (40 - \mu)/\sigma) = 0.2$ (1) and $P(X > 60) = P(Z > (60 - \mu)/\sigma) = 0.25$ (2).
From (1) and (2), we have $(40 - \mu)/\sigma \approx -0.841621234$ (3) and $(60 - \mu)/\sigma \approx 0.67448975$ (4).
So $(60 - \mu) - (40 - \mu) = 20 \approx 0.67448975\sigma + 0.841621234\sigma \implies \sigma \approx 13.192$. And $\mu \approx 51.102$.
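The two standardised values and the resulting µ and σ can be recovered with the inverse normal CDF. This is a sketch using the standard library's `NormalDist.inv_cdf`; the variable names are chosen here for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# z-values for P(Z < z_low) = 0.2 and P(Z > z_high) = 0.25.
z_low = standard_normal.inv_cdf(0.2)
z_high = standard_normal.inv_cdf(1 - 0.25)

# Subtract the two equations (60 - mu)/sigma = z_high and (40 - mu)/sigma = z_low.
sigma = (60 - 40) / (z_high - z_low)
mu = 40 - z_low * sigma

print(round(sigma, 3), round(mu, 3))
```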
Answer to Exercise 153 (8864 N2015/I/7). (i) Take the 12th, 24th, . . . , 1200th
students.
(ii) There might be some strange period-12 pattern in the list of names, thus introducing
bias to the sample.
(iii) Stratified.
Answer to Exercise 154 (8864 N2015/I/8). (i) 0.03 = P(A ∪ B) = P(A) + P(B) −
P(A ∩ B) = 3p − 0.42, so p = 0.15.
(ii) P(A ∪ B ′ ) = P(A) + P(B ′ ) − P(A ∩ B ′ ) = p + (1 − 2p) − 0.12 = 1 − p − 0.12 = 0.73.
(iii) P(A)P(B ′ ) = p(1 − 2p) = 0.15(0.7) = 0.105. P(A ∩ B ′ ) = 0.12. Since P(A)P(B ′ ) ≠
P(A ∩ B ′ ), A and B ′ are not independent.
Answer to Exercise 155 (8864 N2015/I/9). (i) Let X ∼ B(8, 1/6) be the number of
sixes. $P(X = 3) = C(8, 3)(1/6)^3(5/6)^5 = 56 \cdot 5^5/6^8 \approx 0.104$.
(ii) $$P(X \le 3) = \binom{8}{0}\left(\frac{1}{6}\right)^0\left(\frac{5}{6}\right)^8 + \binom{8}{1}\left(\frac{1}{6}\right)^1\left(\frac{5}{6}\right)^7 + \binom{8}{2}\left(\frac{1}{6}\right)^2\left(\frac{5}{6}\right)^6 + \binom{8}{3}\left(\frac{1}{6}\right)^3\left(\frac{5}{6}\right)^5 \approx 0.969.$$
(iii) Let Y ∼ B(600, 1/6). Since the sample size is large, Y can be approximated by A ∼
N(100, 500/6).
$$P(90 \le Y \le 100) \approx P(89.5 < A < 100.5) = P\left(\frac{89.5 - 100}{\sqrt{500/6}} < Z < \frac{100.5 - 100}{\sqrt{500/6}}\right) \approx 0.52184005 - 0.12502718 \approx 0.397.$$
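To see how good the normal approximation in part (iii) is, one can compare it with the exact binomial sum. The sketch below does both; it is an illustration added here, and the comparison with the exact value is not part of the original answer.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 600, 1 / 6

# Exact probability P(90 <= Y <= 100) for Y ~ B(600, 1/6).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(90, 101))

# Normal approximation with continuity correction: A ~ N(np, np(1 - p)).
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(100.5) - approx_dist.cdf(89.5)

print(exact, approx)
```

The continuity correction (using 89.5 and 100.5 rather than 90 and 100) matters here because Y takes only integer values.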

(ii) r ≈ 0.922. There is a fairly strong, positive linear correlation between h and w.
(iii) w − 98.1 = 58.0(h − 1.799).
(iv) ŵh=1.66 = 58.0(1.66−1.799)+98.1 ≈ 90.0. (1) Linear interpolation is somehow magically
reliable. (2) Our linear model seems to fit the data pretty well.
Answer to Exercise 157 (8864 N2015/I/11). (i) Let M be the mass of a randomly
chosen man. P(75 ≤ M ≤ 79) ≈ 0.162.
(ii) Let W be the mass of a randomly chosen woman.
$M_1 + M_2 + M_3 - (W_1 + W_2 + W_3 + W_4) \sim N(3 \cdot 77 - 4 \cdot 62,\ 3 \cdot 9.8^2 + 4 \cdot 10.6^2) = N(-17, 737.56)$.
The desired probability is $P(M_1 + M_2 + M_3 - (W_1 + W_2 + W_3 + W_4) > 0) \approx 0.266$.
(iii) $M_1 + M_2 + M_3 + W_1 + W_2 + W_3 + W_4 \sim N(3 \cdot 77 + 4 \cdot 62,\ 3 \cdot 9.8^2 + 4 \cdot 10.6^2) = N(479, 737.56)$.
The desired probability is $P(M_1 + M_2 + M_3 + W_1 + W_2 + W_3 + W_4 \le 460) \approx 0.242$.
(ii) (3/4) × (2/5) + (1/4) × (3/4) × (2/5) = 3/8. (iii) (3/4) × (2/5) = 0.3.
(iv) Let X ∼ B(5, 3/8) be the number of successes. P(X ≥ 2) ≈ 0.619.

Answer to Exercise 159 (8864 N2015/I/13). (i) Let X be the length of a fish. We are
given that $X \sim N(\mu, 2.1^2)$. The competing hypotheses are H0 ∶ µ = 15.2 and HA ∶ µ ≠ 15.2.
The sample mean is $\bar{X}_{30} \sim N(\mu, 2.1^2/30)$. The p-value is
$$P(\bar{X}_{30} \le 14.5 \text{ or } \bar{X}_{30} \ge 15.9 \mid H_0) \approx 0.068 > 0.05.$$
We fail to reject H0 , which is the scientist’s claim.

(ii) Unbiased estimates of the population mean and variance are, respectively,
$$\bar{x} = \frac{-32}{40} + 18 = 17.2, \qquad s^2 = \frac{325 - (-32)^2/40}{39} \approx 7.677.$$
(iii) An ‘unbiased estimate’ is generated by an unbiased estimator, which is a random
variable whose expected value is equal to the parameter of interest.
(iv) The p-value is
$$P(\bar{X}_{40} \le 17.2 \mid H_0) \approx 0.03391771.$$
The null hypothesis would be rejected if α ≥ 3.40%.
Answer to Exercise 160 (8864 N2014/I/6). (i) P(H < 146) ≈ 0.737.
(ii) P(137.2 < H < 147.2) ≈ 0.595.
Answer to Exercise 161 (8864 N2014/I/7). (i) Order the 5000 households by name.
Take the 50th, 100th, . . . , 5000th households.
(ii) The six strata are “under-25, supermarket”, “under-25, online”, “25−60, supermarket”,
“25 − 60, online”, “over-60, supermarket”, “over-60, online”. From each, randomly pick,
respectively, 10, 20, 18, 32, 16, and 4 households.
(iii) Stratified sampling, because it usually results in a smaller sample variance.

(ii) r ≈ −0.926. There is a fairly strong, negative correlation between x and y.

(iii) y = −0.9021x + 16.15.
(iv) ŷx=13.2 = −0.9021(13.2) + 16.15 ≈ 4.2. We are supposed to say that this predicted value
is unreliable because it involves extrapolation.
Answer to Exercise 163 (8864 N2014/I/9). (i) (a) Let X ∼ B(6, 0.4) be the number
of cakes with fruit. P(X = 0) = 0.046656.
(b) P(X ≤ 2) = 0.54432.
(ii) Let Y ∼ B(8, 0.54432) be the number of packs with at most two cakes containing fruit.
P(Y ≥ 4) ≈ 0.729.
(iii) Let A ∼ B(150, 0.54432) be the number of packs with at most two cakes containing
fruit. Since the sample size is large, A is well-approximated by B ∼ N(81.648, 37.205). The
desired probability is P(A > 75) ≈ P(B > 75.5) ≈ 0.843.
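Parts (i)(b) and (ii) chain two binomial calculations: the probability that a pack has at most two fruit cakes becomes the success probability for the count of such packs. A minimal sketch of that chaining follows, with the helper name `binom_pmf` introduced here for illustration.

```python
from math import comb

def binom_pmf(n, p, k):
    """Return P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# (i)(b): probability that a pack of 6 cakes has at most two with fruit.
p_pack = sum(binom_pmf(6, 0.4, k) for k in range(0, 3))

# (ii): probability that at least 4 of 8 packs have at most two cakes with fruit.
p_at_least_four = sum(binom_pmf(8, p_pack, k) for k in range(4, 9))

print(p_pack, p_at_least_four)
```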
Answer to Exercise 164 (8864 N2014/I/10). (i) Let X ∼ (µ, 4.4) be the length of a
leaf. The competing hypotheses are H0 ∶ µ = 7 and HA ∶ µ < 7. The sample mean $\bar{X}_{50}$ is, by
the CLT, well-approximated by N(µ, 4.4/50). The p-value $P(\bar{X}_{50} < 6.5 \mid H_0) \approx 0.0459$ is less
than 5%, so we can reject the null hypothesis.
(ii) Unbiased estimates of the population mean and variance are
$$\bar{x} = \frac{310.4}{50} = 6.208, \qquad s^2 = \frac{2209.2 - 310.4^2/50}{49} \approx 5.76.$$
(iii) The competing hypotheses are H0 ∶ µ = 7 and HA ∶ µ ≠ 7. The sample mean $\bar{X}_{50}$ is, by
the CLT, well-approximated by N(µ, 5.76/50). The p-value is
$$P(\bar{X}_{50} < 6.208 \text{ or } \bar{X}_{50} > 7.792 \mid H_0) \approx 0.01962372.$$
To reject H0 , α ≥ 1.97%.

Answer to Exercise 165 (8864 N2014/I/11). (i) The total number of students is
48 + 12 + 10 + 20 + 55 + 15 + 130 + x = 290 + x. So
$$P(L) = \frac{48 + 12 + 10 + 20}{290 + x} = \frac{90}{290 + x},$$
$$P(G) = \frac{55 + 15 + 10 + 20}{290 + x} = \frac{100}{290 + x}.$$
Compute P(L ∩ G) = 30/(290 + x). If L and G are independent, then
$$P(L)P(G) = \frac{90}{290 + x} \cdot \frac{100}{290 + x} = P(L \cap G) = \frac{30}{290 + x}.$$
So 300 = 290 + x or x = 10.

(ii) P(L ∪ T ) = (48 + 12 + 10 + 20 + 130 + 15)/300 = 235/300 = 47/60.
(iii) P(T ∩ G′ ) = (130 + 12)/300 = 142/300 = 71/150.

(iv) P(L∣G) = (10 + 20)/(10 + 20 + 55 + 15) = 30/100 = 0.3.
(v) There are 10 + 12 + 15 = 37 students with exactly two items. So the probability that
two randomly chosen students have exactly two items is (37/300) × (36/299) ≈ 0.0148.
Answer to Exercise 166 (8864 N2014/I/12). (i) $P(A > 75) = P(Z > (75 - 50)/\sigma) = 0.0189 \iff (75 - 50)/\sigma \approx 2.077016894 \iff \sigma \approx 12.03649333 \iff \sigma^2 \approx 145$.
(ii) $W_B = B_1 + B_2 + \dots + B_7 \sim N(7 \cdot 75, 7 \cdot 64) = N(525, 448)$. $P(W_B < 500) \approx 0.119$.
(iii) $W_A = A_1 + A_2 + \dots + A_5 \sim N(5 \cdot 50, 5 \cdot 145) = N(250, 725)$,
$W_B - 2W_A \sim N(525 - 2 \cdot 250,\ 448 + 2^2 \cdot 725) = N(25, 3348)$,
$P(W_B > 2W_A) = P(W_B - 2W_A > 0) \approx 0.667$.
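Part (iii) relies on the rule that a linear combination of independent normal random variables is normal, with variances scaled by the squares of the coefficients. The sketch below applies that rule numerically; the variable names are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist

# W_B is the sum of 7 independent N(75, 64) variables;
# W_A is the sum of 5 independent N(50, 145) variables.
mean_wb, var_wb = 7 * 75, 7 * 64
mean_wa, var_wa = 5 * 50, 5 * 145

# W_B - 2 W_A: means combine linearly, variances with squared coefficients.
mean_diff = mean_wb - 2 * mean_wa
var_diff = var_wb + 2**2 * var_wa

prob = 1 - NormalDist(mu=mean_diff, sigma=sqrt(var_diff)).cdf(0)
print(mean_diff, var_diff, round(prob, 3))  # 25 3348 0.667
```

Note the coefficient 2 is squared in the variance: a common slip is to add 2 × 725 instead of 4 × 725.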
Answer to Exercise 167 (8864 N2013/I/6). (i) Randomly pick 25, 50, and 75 people
who bought the $X, $Y , $Z tickets, respectively.
(ii) Results in lower sample variance (as compared to simple random sampling).

Answer to Exercise 168 (8864 N2013/I/7). (i) Unbiased estimates of the population
mean and variance are
$$\bar{t} = 305/250 + 75 = 76.22, \qquad s^2 = \frac{29555 - 305^2/250}{250 - 1} \approx 117.2.$$
(ii) Let $T \sim (\mu, \sigma^2)$ be the retention time. The competing hypotheses are H0 ∶ µ = 75 and
HA ∶ µ > 75.
The sample mean retention time is $\bar{T}_{250} \sim (\mu, \sigma^2/250)$. By the CLT, $\bar{T}_{250}$ is well-approximated
by $N(\mu, s^2/250)$.
The p-value is
$$P(\bar{T}_{250} \ge 76.22 \mid H_0) = P\left(Z \ge \frac{76.22 - 75}{\sqrt{117.2/250}}\right) \approx 0.03738856 > 0.025.$$
We fail to reject the null hypothesis.

Answer to Exercise 169 (8864 N2013/I/8). (i) Let X ∼ B(10, 0.2) be the number of
batteries that have a lifetime less than 100 hours. $P(X = 0) = 0.8^{10} \approx 0.107$.
(ii) P(X ≤ 2) ≈ 0.678 (calculator).
(iii) Let Y ∼ B(80, 0.678) be the number of packs that are satisfactory. By the CLT, Y is
well-approximated by A ∼ N(80 ⋅ 0.678, 80 ⋅ 0.678 ⋅ 0.322) ≈ N(54.224, 17.471). So
P(Y ≥ 60) ≈ P(A ≥ 59.5) ≈ 0.103.
[Figure: scatter diagram of height against age.]
(ii) r ≈ 0.9032560806 is positive and fairly large, suggesting a fairly strong linear correlation
between age and height.
(iii) y = 4.46x + 87.43.
(iv) ŷx=13.2 = 4.46(13.2) + 87.43 ≈ 146. We are supposed to say that this estimate is reliable
because it involves interpolation.

Answer to Exercise 171 (8864 N2013/I/10). (i) Let M ∼ N (µ, 0.8²) be the mass of
salt in a bottle. The sample mean is M̄20 ∼ N (µ, 0.8²/20). The competing hypotheses are
H0 ∶ µ = 12 and HA ∶ µ ≠ 12. Let [mL , mU ] be the range of possible values of m, in order for
H0 to not be rejected at the 5% significance level. So P (mL ≤ M̄20 ≤ mU ∣H0 ) = 0.95. So by
calculator, mU ≈ 12.35060902 and mL ≈ 11.64939098. So the set of possible values of m is
[11.64939098, 12.35060902].
(ii) The sample mean is M̄40 ∼ N (µ, 0.8²/40). The competing hypotheses are H0 ∶ µ = 12
and HA ∶ µ < 12. The p-value P (M̄40 ≤ 11.75∣H0 ) = 0.02405341 is less than 0.05, so we can
reject H0. This is evidence in favour of the company’s claim.
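The two calculator steps in Exercise 171 amount to an inverse-normal lookup and a lower-tail probability. Here is a minimal sketch with `statistics.NormalDist`, assuming the sample sizes 20 and 40 and the known σ = 0.8 stated in the answer; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# (i) Two-sided test at the 5% level, n = 20, sigma = 0.8, H0: mu = 12.
null_20 = NormalDist(mu=12, sigma=0.8 / sqrt(20))
m_low = null_20.inv_cdf(0.025)
m_high = null_20.inv_cdf(0.975)

# (ii) One-sided test, n = 40, observed sample mean 11.75.
null_40 = NormalDist(mu=12, sigma=0.8 / sqrt(40))
p_value = null_40.cdf(11.75)

print(round(m_low, 4), round(m_high, 4), round(p_value, 4))
```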
Answer to Exercise 172 (8864 N2013/I/11). (i) Let A ∼ N (1000, σ²) be the mass
(in g) of a Type A packet of animal food. P(A < 990) = P(Z < (990 − 1000) /σ) = 0.2 ⇐⇒
(990 − 1000) /σ = −0.841621234 ⇐⇒ σ ≈ 11.9.
(ii) Let P ∼ N (240, 10²) and Q ∼ N (145, 8²) be the masses (in g) of a scoop of P and a
scoop of Q, respectively. Then
B = P1 + P2 + P3 + Q1 + Q2 ∼ N (3 ⋅ 240 + 2 ⋅ 145, 3 ⋅ 10² + 2 ⋅ 8²) = N (1010, 428) .
And so P(B < 1000) ≈ 0.314.

(iii) B − A ∼ N(1010 − 1000, 428 + 11.9²) ≈ N(10, 569). So P(B > A) = P(B − A > 0) ≈ 0.662.

(ii) P(A) = P(5 or 6) + P(1)P(4 or 5) + P(2)P(3 or 4) = 1/3 + (1/6)(1/3) + (1/6)(1/3) = 4/9.
(iii) P(A ∩ B) = P(1)P(4 or 5) + P(2)P(3 or 4) = 1/9.
(iv) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 4/9 + 1/3 − 1/9 = 2/3.
(v) P(A′ ) = 1 − P(A) = 1 − 4/9 = 5/9.
P(B ∩ A′ ) = P(1)P(1 or 2 or 3) + P(2)P(1 or 2) = (1/6)(1/2) + (1/6)(1/3) = 5/36.
So P(B∣A′ ) = P(B ∩ A′ )/P(A′ ) = (5/36)/(5/9) = 1/4.

Answer to Exercise 174 (8864 N2012/I/6). (i) Take an ordered list of the population.
Select every kth object in the list, until we get our desired sample size.
(ii) Advantage: He can expect the supermarket to be busy and get all the respondents he
needs. Disadvantage: The adults who go to the main supermarket at midday may not be
representative of the adult population.
(iii) Get an ordered list of the adult population. Select every 91st person on the list, until
he has his 100 respondents.
Answer to Exercise 175 (8864 N2012/I/7). (i) A, B independent ⇐⇒ P(A)P(B) =
p² = P(A ∩ B). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 2p − p². So p² − 2p + 5/9 = 0.
(ii) p² − 2p + 5/9 = (p − 5/3)(p − 1/3) = 0 ⇐⇒ p = 1/3 (reject p = 5/3). P(A ∩ B) = p² = 1/9.
Answer to Exercise 176 (8864 N2012/I/8). (i) 50% × 60% = 0.3.

(ii) Proportion of votes cast by males: 50%×60%+35%×40%+15%×20% = 0.3+0.14+0.03 =
0.47. Thus P(Female) = 0.53.
(iii) P(Male) = 0.47. P(C ∩ Male) = 0.03. So P(C∣Male) = P(C ∩ Male)/P(Male) = 3/47.
(ii) r ≈ −0.9840253445 is negative and very large in magnitude. This suggests a very strong, negative
linear correlation between x and y.
(iii) y = −19.21x + 183.12.
(iv) (a) ŷx=4 = −19.21(4) + 183.12 ≈ 106.26.
(iv) (b) ŷx=9 = −19.21(9) + 183.12 ≈ 10.23.
(v) We are supposed to say that the estimate ŷx=4 is reliable because it involves interpolation
and the estimate ŷx=9 is not because it involves extrapolation.

Answer to Exercise 178 (8864 N2012/I/10). (i) Let A ∼ B(12, 0.8) be the number
that flower. P(A = 10) ≈ 0.283 (calculator).
(ii) P(A < 8) ≈ 0.0726 (calculator).
(iii) Let B ∼ B(96, 0.8) be the number that flower. By the CLT, B is well-approximated
by C ∼ N(96 ⋅ 0.8, 96 ⋅ 0.8 ⋅ 0.2) = N(76.8, 15.36). So,
P(B > 75) ≈ P(C > 75.5) ≈ 0.630.
[This is fairly close to the exact probability of P(B > 75) ≈ 0.638 (calculator).]
(iv) Using the approximation in (iii), the answer is C(3, 2) ⋅ 0.630² ⋅ (1 − 0.630) + 0.630³ ≈ 0.691.
[Using instead the exact probability, it is C(3, 2) ⋅ 0.638² ⋅ (1 − 0.638) + 0.638³ ≈ 0.702.]
x̄ = −60/100 + 300 = 299.4, s² = (1240 − (−60)²/100)/(100 − 1) = 1204/99.
(ii) Let X̄100 ∼ (µ, σ²/100) be the sample mean. By the CLT, it is approximately the case
that X̄100 ∼ N (µ, s²/100). The competing hypotheses are H0 ∶ µ = 300 and HA ∶ µ ≥ 300.
The p-value is
P (X̄100 ≥ 299.4∣H0 ) ≈ 0.957.
We are unable to reject H0 . This is evidence against the manager’s claim.
(iii) Let X̄100 ∼ (µ, 12.1/100) be the sample mean. By the CLT, it is approximately the
case that X̄100 ∼ N (µ, 0.121). The competing hypotheses are H0 ∶ µ = 300 and HA ∶ µ ≥ 300.
The minimum kmin at which we’d be able to reject H0 at the 10% significance level is given
by
P (X̄100 ≥ kmin ∣H0 ) = 0.1.
So by calculator, kmin ≈ 300.4457884.

Answer to Exercise 180 (8864 N2012/I/12). (i) A1 + A2 + ⋅ ⋅ ⋅ + A10 ∼ N (10 ⋅ 0.25, 10 ⋅ 0.02²).
So P (A1 + A2 + ⋅ ⋅ ⋅ + A10 < 2.4) ≈ 0.0569.
(ii) A1 + A2 + ⋅ ⋅ ⋅ + A6 ∼ N (6 ⋅ 0.25, 6 ⋅ 0.02²). B1 + B2 + ⋅ ⋅ ⋅ + B5 ∼ N (5 ⋅ 0.35, 5 ⋅ 0.03²). So
A1 + A2 + ⋅ ⋅ ⋅ + A6 − (B1 + B2 + ⋅ ⋅ ⋅ + B5 ) ∼ N (6 ⋅ 0.25 − 5 ⋅ 0.35, 6 ⋅ 0.02² + 5 ⋅ 0.03²)
⟹ P (−0.2 < A1 + A2 + ⋅ ⋅ ⋅ + A6 − (B1 + B2 + ⋅ ⋅ ⋅ + B5 ) < 0.2) ≈ 0.274.
(iii) Mrs Woo and Mr Tan pay, respectively,
W = 1.5 (A1 + A2 + A3 ) + 2.4 (B1 + B2 + B3 )
∼ N (1.5 ⋅ 3 ⋅ 0.25 + 2.4 ⋅ 3 ⋅ 0.35, 1.5² ⋅ 3 ⋅ 0.02² + 2.4² ⋅ 3 ⋅ 0.03²)
= N (3.645, 0.018252),
T = 1.5 (A1 + A2 + ⋅ ⋅ ⋅ + A10 ) ∼ N (1.5 ⋅ 10 ⋅ 0.25, 1.5² ⋅ 10 ⋅ 0.02²) = N(3.75, 0.009).
So W − T ∼ N (−0.105, 0.027252). And P(W − T > 0) ≈ 0.262.
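Exercise 180 repeatedly uses the rule that a linear combination of independent normal variables is normal, with means adding and variances scaling by the squared coefficients. The sketch below wraps that rule in a hypothetical helper, `linear_combination`, written for this illustration only; it assumes all the individual items are independent.

```python
from math import sqrt
from statistics import NormalDist

def linear_combination(terms):
    """Distribution of the sum of c * (X_1 + ... + X_k) over independent terms.

    Each term is (c, k, mu, sigma): coefficient c applied to the sum of k
    independent N(mu, sigma^2) variables.
    """
    mean = sum(c * k * mu for c, k, mu, sigma in terms)
    var = sum(c**2 * k * sigma**2 for c, k, mu, sigma in terms)
    return NormalDist(mean, sqrt(var))

A = (0.25, 0.02)   # mean and standard deviation of one A
B = (0.35, 0.03)   # mean and standard deviation of one B

# (i) A1 + ... + A10 < 2.4
p_i = linear_combination([(1, 10, *A)]).cdf(2.4)

# (ii) the two totals differ by less than 0.2
d = linear_combination([(1, 6, *A), (-1, 5, *B)])
p_ii = d.cdf(0.2) - d.cdf(-0.2)

# (iii) W - T > 0
w_minus_t = linear_combination([(1.5, 3, *A), (2.4, 3, *B), (-1.5, 10, *A)])
p_iii = 1 - w_minus_t.cdf(0)

print(round(p_i, 4), round(p_ii, 3), round(p_iii, 3))
```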
Answer to Exercise 181 (8864 N2011/I/6). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) or
0.46 = a + b − ab. (1)
Moreover, A, B independent ⇐⇒ P(A)P(B) = P(A ∩ B) or ab = 0.04. (2)
Plug (2) into (1) to get: 0.46 = a + 0.04/a − 0.04 or a² − 0.5a + 0.04 = 0.
a² − 0.5a + 0.04 = (a − 0.1)(a − 0.4) = 0 ⇐⇒ a = 0.1, 0.4.
Answer to Exercise 182 (8864 N2011/I/7). (i) Every student is equally likely to be
chosen.
(ii) The three strata are “car”, “bicycle”, and “on foot”. The totals for each stratum are
440, 760, and 800, for a grand total of 2000 students. So from each stratum, take 22, 38,
and 40 students.
(iii) Stratified sampling usually results in lower sample variance (than simple random
sampling).
A better stratified sample of size 100 could have been achieved by using six strata instead
of just three: namely “Year 1 car”, “Year 1 bicycle”, “Year 1 on foot”, “Year 2 car”, “Year
2 bicycle”, and “Year 2 on foot”.

(ii) r ≈ −0.9670056283 is large and negative, which suggests there is a strong, negative
linear correlation between H and T .
(iii) T = −0.01472090021H + 27.00297934.
(iv) T̂H=1000 = −0.01472090021(1000) + 27.00297934 ≈ 12.28. We are supposed to say it’s
reliable because it involves interpolation.
Answer to Exercise 184 (8864 N2011/I/9). (i) The lifetime of a light bulb in this
batch is L ∼ (µ, 1400²). The sample mean lifetime is L̄50 ∼ (µ, 1400²/50). By the CLT, we
have approximately L̄50 ∼ N (µ, 1400²/50).
The competing hypotheses are H0 ∶ µ = 12000 and HA ∶ µ < 12000. The p-value is
P (L̄50 ≤ 11500∣H0 ) ≈ 0.00577864 < 0.01,
so we can reject H0 . This is evidence in favour of believing that this particular batch is
substandard.
(ii) P (L̄50 ≤ Tmin ∣H0 ) = 0.05 ⇐⇒ Tmin ≈ 11674.3356 (calculator).
Answer to Exercise 185 (8864 N2011/I/10). (i) (a) Let X ∼ B(7, 0.8) be the number
of times Jon completes the puzzle. P(X = 3) = 0.028672 (calculator).
(i) (b) P(X ≥ 5) = 0.851968 (calculator).
(ii) 0.851968⁵ ≈ 0.449.
(iii) Let Y ∼ B(70, 0.8) be the number of times Jon completes the puzzle. By the CLT, Y
is well-approximated by A ∼ N(70 ⋅ 0.8, 70 ⋅ 0.8 ⋅ 0.2) = N(56, 11.2). So
P(Y ≥ 50) ≈ P(A ≥ 49.5) ≈ 0.974.
[This is fairly close to the exact probability P(Y ≥ 50) ≈ 0.970.]

Answer to Exercise 186 (8864 N2011/I/11). (i) (a)
(i) (b) P(Red ball) = P(A ∩ Red ball) + P(B ∩ Red ball)
= P(A)P(Red ball∣A) + P(B)P(Red ball∣B)
= (1/4)(5/10) + (3/4)(6/8) = 11/16.
(i) (c) P(A∣Red ball) = P(A ∩ Red ball)/P(Red ball) = (1/8)/(11/16) = 2/11.
(ii) P(Same) = P(A ∩ Same) + P(B ∩ Same) = P(A)P(Same∣A) + P(B)P(Same∣B)
= (1/4)[(5/10)(4/9) + (4/10)(3/9)] + (3/4)[(6/8)(5/7) + (2/8)(1/7)] = 8/90 + 3/7 = 163/315.
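The fractions in Exercise 186 are easy to get wrong by hand, so it may help to redo them with exact arithmetic. This sketch uses Python's `fractions.Fraction`; the probabilities are taken from the working above, and the variable names are made up for illustration.

```python
from fractions import Fraction as F

p_a, p_b = F(1, 4), F(3, 4)            # P(bag A), P(bag B)
p_red_a, p_red_b = F(5, 10), F(6, 8)   # P(red | A), P(red | B)

# (i)(b) Law of total probability.
p_red = p_a * p_red_a + p_b * p_red_b

# (i)(c) Bayes' rule.
p_a_given_red = p_a * p_red_a / p_red

# (ii) Two balls of the same colour, drawn without replacement,
# using the conditional fractions that appear in the working.
p_same_a = F(5, 10) * F(4, 9) + F(4, 10) * F(3, 9)
p_same_b = F(6, 8) * F(5, 7) + F(2, 8) * F(1, 7)
p_same = p_a * p_same_a + p_b * p_same_b

print(p_red, p_a_given_red, p_same)
```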
Answer to Exercise 187 (8864 N2011/I/12). Let B ∼ N (60, 12²) and G ∼ N (50, 10²)
be the masses of a boy and a girl.
(i) P (50 ≤ B ≤ 70) ≈ 0.595 (calculator).
(ii) B − G ∼ N (60 − 50, 12² + 10²) = N (10, 244). So P (B > G) = P (B − G > 0) ≈ 0.739
(calculator).
(iii) B1 + B2 + B3 + G1 + G2 ∼ N (3 ⋅ 60 + 2 ⋅ 50, 3 ⋅ 12² + 2 ⋅ 10²) = N (280, 632).
So P (B1 + B2 + B3 + G1 + G2 < 300) ≈ 0.787.
(iv) B1 + B2 + ⋅ ⋅ ⋅ + B6 ∼ N (6 ⋅ 60, 6 ⋅ 12²) = N (360, 864).
P (B1 + B2 + ⋅ ⋅ ⋅ + B6 < L) = P (Z < (L − 360)/√864) = 0.95
⇐⇒ (L − 360)/√864 ≈ 1.644853627 ⇐⇒ L ≈ 408.349.
Answer to Exercise 188 (8864 N2010/I/6). (i) P (A ∩ B) = P(A∣B)P (B) = 0.2 ⋅ 0.3 =
0.06.
(ii) P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.6 + 0.3 − 0.06 = 0.84.
(iii) P (A ∪ B) − P (A ∩ B) = 0.84 − 0.06 = 0.78.

Answer to Exercise 189 (8864 N2010/I/7). (i) 0.3 × 0.1 = 0.03.
(ii) 0.3 × 0.9/(0.7 + 0.3 × 0.9) = 27/97.
(iii) C(3, 2) ⋅ 0.7² ⋅ (0.3 × 0.9) = 3 × 0.49 × 0.27 = 0.3969.
Answer to Exercise 190 (8864 N2010/I/8). There are 3,000 students in total.
(i) Randomly pick 28, 18, and 14 students from Years One, Two, and Three respectively.
(ii) Stratified sampling usually results in a lower sample variance.
(iii) Unbiased estimates of µ and σ² are
x̄ = 10450/50 = 209, s² = (2235000 − 10450²/50)/(50 − 1) ≈ 1039.79591837.
(iv) (1) The large sample size lets us use the CLT approximation. (2) What each student
spends is independent of what any other student spends (this assumption is actually already
implicit in the definition of a random sample).
Answer to Exercise 191 (8864 N2010/I/9). Let X ∼ B(8, 0.7) be the number that
germinate. (i) P(X = 6) ≈ 0.296 (calculator). (ii) P(X ≥ 6) ≈ 0.552 (calculator).
(iii) Let Y ∼ B(60, 0.7) be the number that germinate. By the CLT, Y is well-approximated
by A ∼ N(42, 12.6). So P(Y < 40) ≈ P(A < 39.5) ≈ 0.241.
[This is fairly close to the exact probability of P(Y < 40) ≈ 0.238 (calculator).]
Answer to Exercise 192 (8864 N2010/I/10). Let X ∼ N (µ, 1.2²) be the mass of a
component. (i) The sample mean is X̄80 ∼ N (µ, 1.2²/80). The competing hypotheses are
H0 ∶ µ = 15 and HA ∶ µ ≠ 15. The p-value P (X̄80 ≥ 15.25 or X̄80 ≤ 14.75∣H0 ) ≈ 0.06240742 is
more than 5%, so we fail to reject H0 . This fails to cast doubt on, or provide evidence against,
the factory owner’s claim.
(ii) The sample mean is X̄80 ∼ N (µ, 1.2²/80). The competing hypotheses are H0 ∶ µ = 15
and HA ∶ µ < 15. The maximum observed sample mean kmax at which we’d reject H0 (in
favour of the owner’s new claim) is given by: P (X̄80 ≤ kmax ∣H0 ) = 0.05. So by calculator,
kmax ≈ 14.77931973. So the set of values of x̄80 for which we’d reject H0 (in favour of the
owner’s new claim) is [0, 14.77931973).

Answer to Exercise 193 (8864 N2010/I/11). (a) (i)
(a) (ii)
(b) (i)
(b) (ii) r ≈ 0.9872953317.
(b) (iii) y = 0.1069x + 0.5615.

(b) (iv) ŷx=40 = 0.1069(40) + 0.5615 = 4.8375. We are supposed to say that this is a reliable
estimate because it involves interpolation.
(b) (v) m is unchanged and c increases by N .

Answer to Exercise 194 (8864 N2010/I/12). Let U ∼ N (40, 3²) be the mass of an
unwrapped sweet.
(i) P(U < 36) ≈ 0.0912 (calculator).
(ii) Let W ∼ N (44, 3² + 0.5²) = N (44, 9.25) be the mass of a wrapped sweet. P(42 < W <
46) ≈ 0.489 (calculator).
(iii) Let T ∼ N (12 ⋅ 44 + 50, 12 ⋅ 9.25 + 5²) = N (578, 136) be the mass of a tube. P(T > 600) ≈
0.0296 (calculator).
(iv) Let X ∼ N (µ, σ²) be the mass of a tube produced by the rival company.
P(X < 450) = P (Z < (450 − µ)/σ) = 0.05 ⇐⇒ (450 − µ)/σ = −1.644853627. (1)
P(X > 550) = P (Z > (550 − µ)/σ) = 0.08 ⇐⇒ (550 − µ)/σ = 1.40507156. (2)
Subtracting (1) from (2): (550 − µ) − (450 − µ) = 100 = 1.40507156σ − (−1.644853627) σ = 3.049925187σ. So σ ≈ 32.788
and σ² ≈ 1075.033. And µ ≈ 503.931.
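Part (iv) of Exercise 194 solves two quantile equations for µ and σ. A minimal sketch of that step, assuming `statistics.NormalDist` from the Python standard library and illustrative variable names:

```python
from statistics import NormalDist

std = NormalDist()                 # standard normal Z
z_low = std.inv_cdf(0.05)          # (450 - mu)/sigma, since P(X < 450) = 0.05
z_high = std.inv_cdf(1 - 0.08)     # (550 - mu)/sigma, since P(X > 550) = 0.08

# Subtracting the two equations eliminates mu.
sigma = (550 - 450) / (z_high - z_low)
mu = 450 - z_low * sigma

print(round(sigma, 3), round(sigma**2, 3), round(mu, 3))
```

The same two-line pattern applies to any question that gives two tail probabilities of one normal distribution.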
Answer to Exercise 195 (8864 N2009/I/6). (i) 0.2 ⋅ 0.7 = 0.14.

(ii) 0.2 ⋅ 0.7 + 0.3 ⋅ 0.6 + 0.5 ⋅ 0.8 = 0.14 + 0.18 + 0.4 = 0.72.
(iii) (0.5 ⋅ 0.2)/(1 − 0.72) = 5/14.
Answer to Exercise 196 (8864 N2009/I/7). (i) P(A ∩ B) = P(A) + P(B) − P(A ∪ B) =
1/3 + 2/5 − 17/30 = 1/6.
(ii) P(A)P(B) = 2/15 is not equal to P(A ∩ B) = 1/6, so A and B are not independent.
(iii) P(A′ ∪ B) = 1 − [P(A) − P(A ∩ B)] = 1 − (1/3 − 1/6) = 5/6.
Answer to Exercise 197 (8864 N2009/I/8). Let X ∼ N (120, 18²) be the lifetime of a
component.
(i) P(X > 144) ≈ 0.09121122 (calculator).
(ii) P (X1 < 144) P (X2 > 144) + P (X1 > 144) P (X2 < 144) ≈ 0.16578346669.
(iii) Let X ∼ N (µ, 18²) be the new lifetime of a component. The sample mean is X̄50 ∼
N (µ, 18²/50). The competing hypotheses are H0 ∶ µ = 120 and HA ∶ µ > 120. The p-value is
P (X̄50 ≥ 124∣H0 ) ≈ 0.05805087 > 0.05,
so we fail to reject H0 . This fails to provide evidence in favour of the company’s claim.

(ii) r ≈ 0.9306540721 is fairly large and positive, suggesting a fairly strong positive linear
correlation between x and y.
(iii) y = 0.01232906764x + 15.48661792.
(iv) ŷx=135 = 0.01232906764(135) + 15.48661792 ≈ 17.15.

(v) We are supposed to say that this involves extrapolation and is thus unreliable/unsuitable.

(i) Let X ∼ B(10, 0.2) be the number (out of ten) who fail. P(X = 2) ≈ 0.302 (calculator).
(ii) Let Y ∼ B(10, 0.8 ⋅ 0.15) be the number (out of ten) who get a distinction. P(Y < 2) ≈
0.658 (calculator).
(iii) Let A ∼ B(50, 0.2) be the number (out of 50) who fail. By the CLT, A is well-
approximated by B ∼ N(10, 8). So
P(A ≤ 12) ≈ P(B ≤ 12.5) ≈ 0.812.
[This is fairly close to the exact probability P(A ≤ 12) ≈ 0.814 (calculator).]

Answer to Exercise 200 (8864 N2009/I/11). (a) (i) Sort the claims in alphabetical
order. Then take the 9th, 18th, . . . , and 72nd claims in the list.
(a) (ii) Probably. The first 8 submissions might not be representative of the 72 received
that day. For example, it might be that those who wake up early and submit their insurance
claims early are also the ones who make the most outrageous claims.
(b) (i) Unbiased estimates of the population mean and variance are
x̄ = 5320/120 + 1000 = 1044 1/3, s² = (8282000 − 5320²/120)/(120 − 1) ≈ 67614.6778711.
(b) (ii) An ‘unbiased estimate’ is generated by an unbiased estimator, which is a random
variable whose expected value is equal to the parameter of interest.
(b) (iii) The sample mean is X̄120 ∼ (µ, σ²/120). By the CLT, we have approximately
X̄120 ∼ N (µ, s²/120). The p-value is
P (X̄120 > 1044 1/3 or X̄120 < 955 2/3 ∣H0 ) = 2P (X̄120 < 955 2/3 ∣H0 ) ≈ 0.06180786.
So we’d reject H0 if α ≥ 0.06180786.
Answer to Exercise 201 (8864 N2009/I/12). (a) Let X ∼ N (µ, σ²) be the mass of a
plum.
P(X < 22) = P(Z < (22 − µ)/σ) = 0.3 ⇐⇒ (22 − µ)/σ ≈ −0.524400513. (1)
P(X > 29) = P(Z > (29 − µ)/σ) = 0.2 ⇐⇒ (29 − µ)/σ ≈ 0.841621234. (2)
Subtracting (1) from (2): (29 − µ) − (22 − µ) = 7 = 0.841621234σ − (−0.524400513) σ = 1.366021747σ ⇐⇒ σ ≈ 5.124.
And µ ≈ 24.687.
(b) (i) Let A ∼ N (0.15, 0.03²) and N ∼ N (0.07, 0.02²) be the masses of an apple and a
nectarine.
A1 + A2 − (N1 + N2 + N3 + N4 ) ∼ N (2 ⋅ 0.15 − 4 ⋅ 0.07, 2 ⋅ 0.03² + 4 ⋅ 0.02²) = N (0.02, 0.0034).
P (A1 + A2 > N1 + N2 + N3 + N4 ) = P (A1 + A2 − (N1 + N2 + N3 + N4 ) > 0) ≈ 0.634 (calculator).
(b) (ii) 9 (A1 + A2 ) + 12 (N1 + N2 + N3 + N4 ) is the random variable with distribution
N (9 ⋅ 2 ⋅ 0.15 + 12 ⋅ 4 ⋅ 0.07, 9² ⋅ 2 ⋅ 0.03² + 12² ⋅ 4 ⋅ 0.02²) = N (6.06, 0.3762) .
P (5 < 9 (A1 + A2 ) + 12 (N1 + N2 + N3 + N4 ) < 6) ≈ 0.419 (calculator).

Answer to Exercise 202 (8864 N2008/I/7). (i) The normal distribution would suggest
that a non-trivial percentage of students get more than 100.
(ii) The sample mean is X̄50 ∼ (72.1, 15.2²/50). By the CLT, we have approximately
X̄50 ∼ N (72.1, 15.2²/50). So by calculator, P (70.0 ≤ X̄50 ≤ 75.0) ≈ 0.74704179.
Answer to Exercise 203 (8864 N2008/I/8). (i) C(6, 3) ⋅ 0.6³ ⋅ 0.4³ = 0.27648.
(ii) Let X ∼ B(40, 0.6) be the number that are crusty. By the CLT, X is well-approximated
by Y ∼ N(24, 9.6). So
P(X ≥ 20) ≈ P(Y ≥ 19.5) ≈ 0.927.
[This is fairly close to the exact probability P(X ≥ 20) ≈ 0.926 (calculator).]
(iii) Let M ∼ N (1.24, σ²) be the mass of a loaf. P(M < 1) = P(Z < (1 − 1.24)/σ) = 0.04
⇐⇒ (1 − 1.24)/σ = −1.750686071 ⇐⇒ σ ≈ 0.137.
(ii) If Tan’s pen is red, then there are 2 red pens, 5 blue pens, and 1 green pen in the box
when Mui gets a randomly-chosen pen. So the probability that Mui’s pen is blue is 5/8.
(iii) If Tan’s pen is red, then there are 2 red pens, 5 blue pens, and 1 green pen in the box
when Mui gets a randomly-chosen pen; and Mui gets a red pen with probability 2/8. If
Tan’s pen is blue, then there are 3 red pens, 4 blue pens, and 1 green pen in the box when
Mui gets a randomly-chosen pen; and Mui gets a red pen with probability 3/8. Altogether
then, her probability of getting a red pen is (3/8)(2/8) + (5/8)(3/8) = 21/64.
(iv) Mui’s pen is blue with probability 1 − 21/64 − 1/8 = 35/64.
Tan’s pen is red and Mui’s pen is blue with probability (3/8)(5/8) = 15/64.
Thus, the desired conditional probability is (15/64)/(35/64) = 3/7.

Answer to Exercise 205 (8864 N2008/I/10). (i) The sample mean is X̄70 ∼ (µ, σ²/70).
By the CLT, we have approximately X̄70 ∼ N (µ, s²/70). The competing hypotheses are
H0 ∶ µ = 150 and HA ∶ µ < 150.
The observed sample mean and observed sample variance are, respectively,
x̄70 = 10317/70 ≈ 147.385714286, s² = (1540231 − 10317²/70)/(70 − 1) ≈ 284.820082816.
The p-value is the probability of getting a test statistic that is at least as extreme as that
actually observed. It is: P (X̄70 < x̄70 ∣H0 ) ≈ 0.09748170.
(ii) The sample mean is W̄120 ∼ (µ, σ²/120). By the CLT, we have approximately W̄120 ∼
N (µ, sw²/120). The observed sample mean and observed sample variance are, respectively,
w̄120 = (10317 + 7331)/(70 + 50) = 147 1/15, sw² = (1540231 + 1100565 − (10317 + 7331)²/(70 + 50))/(70 + 50 − 1) ≈ 381.205602241.
The p-value P (W̄120 < w̄120 ∣H0 ) ≈ 0.04990429 is less than 10%, so we are able to reject H0 .
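Exercise 205 pools two samples by adding their sums and sums of squares before re-estimating. The sketch below assumes, from the formulas above, that the first sample has n = 70, Σx = 10317, Σx² = 1540231 and the second has n = 50, Σx = 7331, Σx² = 1100565. The two helper functions are hypothetical names written for this illustration.

```python
from math import sqrt
from statistics import NormalDist

def unbiased_estimates(n, total, total_sq):
    """Sample mean and unbiased variance estimate from n, sum x, sum x^2."""
    mean = total / n
    var = (total_sq - total**2 / n) / (n - 1)
    return mean, var

def lower_tail_p_value(mean, var, n, mu0):
    """P(sample mean <= observed mean | H0: mu = mu0), CLT approximation."""
    return NormalDist(mu0, sqrt(var / n)).cdf(mean)

first = (70, 10317, 1540231)     # assumed summary of the first sample
second = (50, 7331, 1100565)     # assumed summary of the second sample

# (i) First sample only.
x_bar, s2 = unbiased_estimates(*first)
p_first = lower_tail_p_value(x_bar, s2, first[0], 150)

# (ii) Pooled sample: add counts, sums, and sums of squares component-wise.
pooled = tuple(a + b for a, b in zip(first, second))
w_bar, s2_w = unbiased_estimates(*pooled)
p_pooled = lower_tail_p_value(w_bar, s2_w, pooled[0], 150)

print(round(p_first, 4), round(p_pooled, 4))
```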
(ii) (x̄, ȳ) ≈ (17, 343.75) is indicated in blue.

(iii) y = 17.083̇x + 53.3̇.
(iv) r ≈ 0.9688043135 is very large and positive, suggesting a strong positive linear correlation
between x and y.
(v) ŷx=20 = 17.083̇(20) + 53.3̇ = 395. The estimated corresponding profit is $395,000.
(vi) ŷx=40 = 17.083̇(40) + 53.3̇ = 736 2/3. The predicted corresponding profit is $736,667. We
are supposed to say that this prediction is unreliable.

Answer to Exercise 207 (8864 N2008/I/12). (i) Let S ∼ N (5 ⋅ 0.234, 5 ⋅ 0.025²) =
N(1.17, 0.003125). P(S > 1.2) ≈ 0.296 (calculator).
(ii) S1 + S2 ∼ N(2 ⋅ 1.17, 2 ⋅ 0.003125) = N(2.34, 0.00625). L ∼ N(10 ⋅ 0.234, 10 ⋅ 0.025²) =
N(2.34, 0.00625). So S1 + S2 − L ∼ N(0, 0.0125).
P(L − 0.2 < S1 + S2 < L + 0.2) = P(−0.2 < S1 + S2 − L < 0.2) ≈ 0.926.
(iii) Lee pays 1.5 (S1 + S2 ) ∼ N (1.5 ⋅ 2.34, 1.5² ⋅ 0.00625) = N (3.51, 0.0140625).
Foo pays 1.2L ∼ N (1.2 ⋅ 2.34, 1.2² ⋅ 0.00625) = N (2.808, 0.009).
So 1.5 (S1 + S2 ) − 1.2L ∼ N (0.702, 0.0230625). And P (1.5 (S1 + S2 ) − 1.2L ≥ 0.5) ≈ 0.908
(calculator).
Answer to Exercise 208 (8864 N2007/I/6). Let M ∼ N (502, 0.8²) be the mass of
margarine in a packet. (i) P(M < 500) ≈ 0.00621 (calculator).
(ii) The new mass of margarine in a packet is M ∼ N (µ, 0.8²). P(M < 500) = P(Z <
(500 − µ)/0.8) = 0.001 ⇐⇒ (500 − µ)/0.8 ≈ −3.090232306 ⇐⇒ µ ≈ 502.4721858.
Answer to Exercise 209 (8864 N2007/I/7). (i) Systematic.

(ii) Advantage: Simple to implement. Disadvantage: The students who do not buy lunch
have no possibility of being included in her sampling method.
(iii) Take an alphabetical list of all students. Select every kth student on the list.
(ii) r ≈ 0.9734793616. (iii) x = 0.9397628752y − 10.55810619.

(iv) (a) ŷx=28 = 16.7 + 1.01(28) = 44.98.
(iv) (b) x̂y=198 = 0.9397628752(198) − 10.55810619 ≈ 175.5.
(v) We are supposed to say that the estimate ŷx=28 is reliable because it involves interpolation,
but the estimate x̂y=198 is not because it involves extrapolation.

Answer to Exercise 211 (8864 N2007/I/9). (i) P(X = 4) = C(6, 4)p⁴(1 − p)².
(ii) P(X = 4) = C(6, 4)p⁴(1 − p)² = 15(1/4)⁴(3/4)² = 15 ⋅ 9/4⁶ = 135/4096.
(iii) µ = np = 6(1/4) = 3/2 and σ = √(np(1 − p)) = √(6(1/4)(3/4)) = √(9/8). So
P(µ − σ < X < µ + σ) = P (3/2 − √(9/8) < X < 3/2 + √(9/8)) = P(X = 1 or X = 2)
= C(6, 1)(1/4)¹(3/4)⁵ + C(6, 2)(1/4)²(3/4)⁴ = (6 ⋅ 243 + 15 ⋅ 81)/4⁶ = 2673/4096 ≈ 0.65.
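The exact fractions in Exercise 211 can be confirmed with rational arithmetic. This is a sketch assuming X ∼ B(6, 1/4) as in parts (ii) and (iii); `pmf` is a small helper defined here for illustration.

```python
from fractions import Fraction
from math import comb, sqrt

n, p = 6, Fraction(1, 4)

def pmf(k):
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# (ii) P(X = 4) as an exact fraction.
p_four = pmf(4)

# (iii) Integer values of X strictly within one standard deviation of the mean.
mu = n * p
sigma = sqrt(n * p * (1 - p))
inside = [k for k in range(n + 1) if mu - sigma < k < mu + sigma]
prob = sum(pmf(k) for k in inside)

print(p_four, inside, prob, round(float(prob), 4))
```

Listing the integers inside the interval first makes it explicit that only X = 1 and X = 2 contribute to the probability.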
x̄ = −35.8/50 + 500 = 499.284, s² = (150.5 − (−35.8)²/50)/(50 − 1) ≈ 2.54831020408.
(ii) The sample mean is X̄50 ∼ N (µ, σ²/50). We can use s² as an unbiased estimate for σ².
The competing hypotheses are H0 ∶ µ = 500 and HA ∶ µ < 500. And so the p-value is
P (X̄50 ≤ 499.284∣H0 ) ≈ 0.00075813 < 0.05,
so we can reject H0 .
(iii) No, the sample size was large enough that we could have used the CLT.
Answer to Exercise 213 (8864 N2007/I/11). (i) (a) P(M ) = (18 + 48 + 6)/120 = 3/5.
(i) (b) P(M ∩ G) = 18/120 = 3/20.
(i) (c) P(M ∪ B) = (18 + 48 + 6 + 22)/120 = 47/60.
(i) (d) P(M ∣R′ ) = (18 + 48)/(18 + 48 + 12 + 22) = 66/100 = 0.66.
(ii) P(M )P(G) = (3/5) (30/120) = 3/20 is equal to P(M ∩ G) = 3/20; thus M and G are indeed
independent.
(iii) The number of blue cars with bicycle racks is 0.3 ⋅ 70 = 21.
The number of cars with bicycle racks is 0.2 ⋅ 30 + 0.3 ⋅ 70 + 0.05 ⋅ 20 = 6 + 21 + 1 = 28.
So the desired probability is 21/28 = 3/4.

Answer to Exercise 214 (8864 N2007/I/12). Let M ∼ N (75, 12.5²) and W ∼
N (55, 10.5²) be the masses of a man and a woman.
(i) P (M1 > 90) P (M2 < 90) + P (M2 > 90) P (M1 < 90) = 2 (0.11506967) (0.88493033) ≈ 0.204
(calculator).
(ii) W − M ∼ N (−20, 10.5² + 12.5²). So P(W > M ) = P(W − M > 0) ≈ 0.110 (calculator).
(iii) M1 + M2 + ⋅ ⋅ ⋅ + M6 ∼ N (6 ⋅ 75, 6 ⋅ 12.5²) = N (450, 937.5). So P (M1 + M2 + ⋅ ⋅ ⋅ + M6 > 530) ≈
0.00449034 (calculator).
(iv) The weights of the hotel guests are probably not independent.
The distribution of weights of the hotel guests may differ from that of the population.
Answer to Exercise 215 (8174 N2006/II/8). P(A ∪ B) = P(A) + P(B) − P(A ∩ B) =
P(A) + P(B) − P(A)P(B) or 0.7 = 0.6 + b − 0.6b. So b = 0.25 and P(A ∩ B) = 0.15.
P(A ∩ B ′ ) = P(A) − P(A ∩ B) = 0.6 − 0.15 = 0.45.
Answer to Exercise 216 (8174 N2006/II/9). (i) Quota.
(ii) Systematic.
(iii) Use a computer random number generator to generate, for each member, a number
between 0 and 1. Take the x members with the largest numbers, where x is his desired
sample size.
(iv) (a) 25 women.
(iv) (b) 15 men from squash.
Answer to Exercise 217 (8174 N2006/II/13). (i) Let X ∼ B(12, 0.3) be the number
of residents (out of 12) who watch the programme. P(X = 4) ≈ 0.231.
(ii) Let Y ∼ B(80, 0.3) be the number of residents (out of 80) who watch the programme.
By the CLT, Y is well-approximated by A ∼ N(24, 16.8). So
P(20 < Y < 30) ≈ P(20.5 < A < 29.5) ≈ 0.714.
[This is fairly close to the exact probability P(20 < Y < 30) ≈ 0.711 (calculator).]

Answer to Exercise 218 (8174 N2006/II/14). (i) P(W W ) = 0.8² = 0.64.
(ii) P(W W W ) + P(W LW ) + P(LW W ) + P(LLW ) = 0.8³ + 0.8 ⋅ 0.2 ⋅ 0.4 + 0.2 ⋅ 0.4 ⋅ 0.8 + 0.2 ⋅ 0.6 ⋅ 0.4 =
0.512 + 0.128 + 0.048 = 0.688.
(iii) P(W W W ) + P(W LW ) + P(LW W ) + P(W W L) = 0.512 + 0.128 + 0.8² ⋅ 0.2 = 0.64 + 0.128 =
0.768.
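The tree in Exercise 218 can be checked by enumerating sequences of results. The sketch below assumes, from the products in the working, that the first game is won with probability 0.8, a win follows a win with probability 0.8, and a win follows a loss with probability 0.4. The names are illustrative.

```python
from itertools import product

P_FIRST_WIN = 0.8                     # assumed from P(WW) = 0.8^2
P_WIN_AFTER = {"W": 0.8, "L": 0.4}    # P(win | previous result)

def sequence_probability(seq):
    """Probability of a given sequence of results such as 'WLW'."""
    prob = 1.0
    previous = None
    for result in seq:
        p_win = P_FIRST_WIN if previous is None else P_WIN_AFTER[previous]
        prob *= p_win if result == "W" else 1 - p_win
        previous = result
    return prob

# (i) Wins both of the first two games.
p_i = sequence_probability("WW")

# (ii) All three-game sequences whose third game is a win.
three_games = ["".join(s) for s in product("WL", repeat=3)]
p_ii = sum(sequence_probability(s) for s in three_games if s[2] == "W")

# (iii) The four sequences listed in part (iii):
# 0.512 + 0.064 + 0.064 + 0.128
p_iii = sum(sequence_probability(s) for s in ("WWW", "WLW", "LWW", "WWL"))

print(round(p_i, 3), round(p_ii, 3), round(p_iii, 3))
```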
Answer to Exercise 219 (8174 N2006/II/14-OR). Let X ∼ N (176, 4²) be the height
of a male student.
(i) P(X < 170) ≈ 0.0668 (calculator).
(ii) P (X > k) = 0.1 ⇐⇒ k ≈ 181.1262063 (calculator).
(iii) Let Y ∼ N (m, s²) be the height of a female student.
P (Y < 150) = P (Z < (150 − m) /s) = 0.006 ⇐⇒ (150 − m) /s ≈ −2.512144328. (1)
P (Y < 175) = P (Z < (175 − m) /s) = 0.883 ⇐⇒ (175 − m) /s ≈ 1.190118042. (2)
Subtracting (1) from (2): (175 − m) − (150 − m) = 25 ≈ 1.190118042s − (−2.512144328s) = 3.70226237s ⇐⇒ s ≈ 6.753.
And m ≈ 166.964.
(This is the last page of this textbook.)

I make educational YouTube videos too! Mostly on
economics. Do me a favour by checking them out! I’m
a newbie at this, so please feel free to leave me a
comment if you have any feedback or suggestions.
YouTube.com/EconCow EconCow.com
Tuition Ad
I give tuition for any of the following subjects:
• Economics
• Mathematics
• Writing, English, General Paper.
I have a PhD in economics (University of Michigan, 2015) and have been teaching and
tutoring since 2010.
For more information, please visit:
www.EconsPhDTutor.com
Or simply email:
DrChooYanMin@gmail.com
