Statistics Notes On Random Variables

Random Variables
Topic 04 ST1232 Statistics for Life Sciences 1 / 92

1 Random Variables
2 Discrete Random Variables
    Parameter of a Discrete Probability Distribution: Mean
    Parameter of a Discrete Probability Distribution: Variance
3 Continuous Random Variables
    Parameters of a Continuous Probability Distribution: Mean and Variance
    Parameter of a Continuous Probability Distribution: Quantiles
4 Random Variables And Data Types
5 The Binomial Distribution
    Combinations
    Assumptions
    Example
6 The Poisson Distribution
7 The Normal Distribution
    Z-Scores
    Examples
8 Index of Definitions and Examples
9 Reading

Random Variables
Definition 1 (Random Variables)

A random variable is a numerical measurement of the outcome of an experiment.
Quite often, the randomness arises from the use of random sampling or a
randomised experiment to gather the data.
The values that a random variable takes correspond to outcomes of the
experiment.
If we sample NUS students and measure their heights, we do not know
beforehand precisely what values we will get. The random variable is height.

Examples of Random Variables
Example 1 (Rolling Two Dice)

Suppose we roll two dice and take the sum, X.

Outcome in S    X
(1,1)           2
(1,2)           3
(2,1)           3
(1,3)           4
(2,2)           4
...             ...
(6,6)           12

Example 2 (Flipping A Coin)

Suppose we flip a coin and record the result as Z.

Outcome in S    Z
H               0
T               1

Random Variables and Events
The values that a random variable takes are defined by events on the sample
space S.
For instance, in Example 1, X = 2 corresponds to the event {(1, 1)} and
X = 3 corresponds to the event {(1, 2), (2, 1)}.
It follows that we can now write things such as P(X = 2) and P(X = 3)
without ambiguity.
Each possible outcome in S has a specific probability of occurring.
The probability distribution of a random variable specifies its possible
values and their probabilities.

Sampling From the Population
The probability distribution applies for selecting a subject at random from a
population.
Recall that numerical summaries of the population are called parameters.
Numerical summaries of probability distributions are also called parameters
since they pertain to the population.
We shall learn about the following summaries of probability distributions:
- mean or proportion,
- variance, sd and
- percentiles.

How Will We Use Probability Distributions?
In a population, suppose we know the prevalence of a disease to be 0.00001.

Then, if we were running a clinic, we may not want to stock up on the
medication for this disease too much, especially if it is expensive.
Consider two populations - people who smoke and those who do not. If we
know that the probability of contracting a particular form of cancer is 0.8 for
smokers and 0.4 for non-smokers, then we can run a campaign to educate the
public in order to reduce the occurrence of this cancer.
The walking time (for me) from LT32 to Kent Ridge MRT is a constant 9.5
minutes. Taking the bus also has an average time of 9.5 minutes, but the
probability that it is longer than 15 minutes is 0.2. What should I do?

Discrete Random Variables
Definition 2 (Discrete Random Variables)

A discrete random variable X takes on a set of separate values, such as {0, 1, 2, 3, ...}. Its
probability distribution assigns a probability $p_x = P(X = x)$ to each possible value x of X.
For each possible x, the probability $p_x$ is between 0 and 1.
The sum of the probabilities for all the possible values equals 1.
We typically use uppercase letters to denote the random variable, and lowercase
letters to denote the values that it takes on.

Probability Distribution when Tossing Two Coins
Example 3 (Two Coin Tosses)

Suppose we toss a fair coin twice. Let X represent the total number of Heads that
turn up in the two tosses. Then $p_x$ is given by:

Outcome(s)        x    p_x
(T, T)            0    0.25
(H, T), (T, H)    1    0.50
(H, H)            2    0.25

Probability Distribution when Rolling Two Dice
Example 4 (Rolling Two Dice)

Suppose we roll two dice. Let Y represent the sum of the values on these two dice. Then
the probability distribution of Y is given by:

Outcome(s)                 y     p_y
(1,1)                      2     1/36
(1,2),(2,1)                3     2/36
(1,3),(2,2),(3,1)          4     3/36
(1,4),(2,3),(3,2),(4,1)    5     4/36
...                        ...   ...
(5,6),(6,5)                11    2/36
(6,6)                      12    1/36
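
The full table for Example 4 can be rebuilt by enumerating the outcomes. The sketch below assumes two fair six-sided dice, which is what the 1/36 entries above imply; the names `counts` and `p_y` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice and
# count how many outcomes give each sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each sum y: (number of outcomes giving y) / 36.
p_y = {y: Fraction(n, 36) for y, n in sorted(counts.items())}

for y, p in p_y.items():
    print(y, p)  # Fraction prints in lowest terms, e.g. 2/36 appears as 1/18
```

Exact fractions are used here so that the probabilities sum to exactly 1, with no rounding error.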

Using a Bar Plot To Visualise a Probability Distribution
For Discrete Random Variables
A bar plot uses a rectangle for each possible value that X can take on.
The width of each rectangle is identical, but the height is proportional to $p_x$.

When An Infinite Number of Values Are Possible
Example 5 (Infinite Number of Outcomes)
Suppose that we are interested in the number of dengue cases that arise in
Punggol in August 2014.
We are told this random variable Z, which can take on values 0, 1, 2, 3, etc.,
follows this probability distribution:
$$p_z = \frac{e^{-20}\, 20^z}{z!}$$
Visualising with a bar plot is still possible, but not for the full range of z.

Utilising the Probability Distribution
In Example 3, what is the probability that we observe at least one Heads?
- P(X ≥ 1) = P(X = 1) + P(X = 2) = 0.50 + 0.25 = 0.75.
In Example 4, what is the probability that we observe a 2 or an 11 for the
sum?
- P({Y = 2} ∪ {Y = 11}) = P(Y = 2) + P(Y = 11) = 1/36 + 2/36 = 3/36 = 1/12.
In Example 5, what is the probability that we observe exactly two dengue
cases, given that there was at least one?
$$
\begin{aligned}
P(Z = 2 \mid Z \ge 1) &= \frac{P(\{Z = 2\} \cap \{Z \ge 1\})}{P(Z \ge 1)} \\
&= \frac{P(Z = 2)}{P(Z \ge 1)} \\
&= \frac{P(Z = 2)}{1 - P(Z < 1)} \\
&= \frac{P(Z = 2)}{1 - P(Z = 0)} = 4.12 \times 10^{-7}
\end{aligned}
$$
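
The conditional probability for Example 5 can be checked numerically. This is a minimal sketch that uses only the formula for the distribution of Z given in Example 5; the helper name `poisson_pmf` is an illustrative choice, not something from the notes.

```python
import math

def poisson_pmf(k, mu):
    """P(Z = k) when p_z = exp(-mu) * mu**z / z!."""
    return math.exp(-mu) * mu**k / math.factorial(k)

mu = 20  # the constant appearing in the formula of Example 5

p_two = poisson_pmf(2, mu)               # P(Z = 2)
p_at_least_one = 1 - poisson_pmf(0, mu)  # P(Z >= 1) = 1 - P(Z = 0)
p_cond = p_two / p_at_least_one          # P(Z = 2 | Z >= 1)

print(f"{p_cond:.3g}")
```

Because P(Z = 0) is tiny here, conditioning on Z ≥ 1 barely changes the answer from P(Z = 2).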

Mean of a Discrete Random Variable
Definition 3 (Mean of Discrete Random Variable)

The mean of a discrete random variable X is denoted by the Greek letter µ, and
is defined to be
$$\mu = \sum_x x\, p_x$$
Think of it as the sum of Probabilities multiplied by Possibilities.
µ is the average of the values in the sample space, weighted by their
probabilities. Values that are more likely (i.e. have higher probability) have
more weight.

Computing The Mean
Example 6 (Computing the Mean)

Consider the following random variable T, which takes on only two possible
values (0 or 1):

t    p_t
0    0.27
1    0.73

The mean of T is
µ = 0 × 0.27 + 1 × 0.73 = 0.73
Notice that the mean of T is not one of the values in the sample space!

Computing The Mean
Let us return to the example on the two-coin toss in Example 3, where X
represents the number of Heads that turn up.
In this case, the mean is given by
µ = 0 × 0.25 + 1 × 0.50 + 2 × 0.25 = 1

Mean Number of Goals Scored
Example 7 (Cristiano Ronaldo)

Suppose that the number of goals that Cristiano Ronaldo scores in a game is a
random variable R that follows this probability distribution:

r    p_r
0    0.43
1    0.30
2    0.10
3    0.10
4    0.07

The mean number of goals that he scores in a game is
µ = 0(0.43) + 1(0.30) + 2(0.10) + 3(0.10) + 4(0.07) = 1.08

About The Mean
The mean of the probability distribution of a random variable X is also
referred to as the expected value of X, and written as E(X).
It is not what we expect to see for a single observation.
If we obtain a large number of observations from a population that follows
this probability distribution, the sample mean of those observations would be
close to the mean of the probability distribution.
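
The long-run interpretation can be illustrated by simulation. The sketch below draws many observations from the distribution of R in Example 7 and compares the sample mean with µ = 1.08; the sample size and the seed are arbitrary choices made for this illustration.

```python
import random

random.seed(0)  # fixed seed so that repeated runs give the same draws

values = [0, 1, 2, 3, 4]
probs = [0.43, 0.30, 0.10, 0.10, 0.07]  # distribution of R in Example 7

n = 100_000  # number of simulated games (arbitrary, but large)
sample = random.choices(values, weights=probs, k=n)
sample_mean = sum(sample) / n

print(sample_mean)  # should land close to the theoretical mean 1.08
```

No single simulated game can produce 1.08 goals, yet the average over many games settles near that value.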

Properties of The Mean
Recall that in Topic 01 (EDA) we considered the sample mean of linear
transformations of the observed data. The identical property holds when
considering the mean of a linear transformation of a random variable X:
- Let X be a random variable with E(X) = µ. Let Y = bX + a be a linear
  transformation of X, where b and a are known.
- Then E(Y) = bE(X) + a = bµ + a.
If $a_1, a_2, \ldots, a_n$ are known values, and $X_1, X_2, \ldots, X_n$ are n random variables
with means known to be $\mu_1, \mu_2, \ldots, \mu_n$, then the mean of the linear
combination of the n variables can be obtained as follows:
$$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_n \mu_n$$
In particular, if n random variables $X_1, X_2, \ldots, X_n$ are identically distributed
with common mean µ, then their average (denoted by $\bar{X}$) has the same mean
as each individual random variable:
$$E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \mu$$

Risk-Taking or Risk-Averse?
Example 8 (Sure Win Strategy Or Not?)

You have $1000 to invest. Consider the following two investment options
presented to you:
A sure win of $500.
A 0.50 chance of a gain of $1000 and a 0.50 chance of gaining nothing.
Example 9 (Sure Lose Strategy Or Not?)

You have $1000 to invest. Consider the following two investment options
presented to you:
A sure loss of $500.
A 0.50 chance of a loss of $1000 and a 0.50 chance of losing nothing.

Expected Gain / Loss
In Example 8, the expected gain is $500 in both cases. So what is the
difference?
In Example 9, the expected loss is $500 in both cases. So what is the
difference?
The random variables in the second strategy in both cases involve more
variability, or risk.
Most people choose the sure gain strategy in the 1st scenario and the risky
strategy in the 2nd scenario.

Variance of Discrete Random Variable
Definition 4 (Variance of Discrete Random Variable)

The variance of a discrete random variable X is denoted by $\sigma^2$,
and is defined to be
$$\sigma^2 = \sum_x (x - \mu)^2\, p_x$$
The standard deviation of a discrete random variable is σ.
σ measures the variability of a random variable from the mean.
When comparing two random variables, the one with the larger standard
deviation has more variability.
An equivalent expression to refer to the variance of a random variable X is
Var(X).

Computing σ²
The variance σ² for the random variable R in Example 7 is
$$
\begin{aligned}
\sigma^2 &= 0.43(0 - 1.08)^2 + 0.30(1 - 1.08)^2 + 0.10(2 - 1.08)^2 \\
&\quad + 0.10(3 - 1.08)^2 + 0.07(4 - 1.08)^2 = 1.5536.
\end{aligned}
$$
The standard deviation is given by $\sigma = \sqrt{1.5536} = 1.246$.
We will try to arrive at more intuition about σ when we discuss the Normal
distribution later, but for now, think of σ as a typical deviation of a
random variable from its mean.
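
The mean and variance of R can be recomputed directly from the definitions. This is a small sketch using the probabilities from Example 7; the dictionary name `p_r` is illustrative.

```python
import math

# Probability distribution of R from Example 7: value -> probability.
p_r = {0: 0.43, 1: 0.30, 2: 0.10, 3: 0.10, 4: 0.07}

mu = sum(r * p for r, p in p_r.items())               # sum of x * p_x
var = sum((r - mu) ** 2 * p for r, p in p_r.items())  # sum of (x - mu)^2 * p_x
sd = math.sqrt(var)

print(round(mu, 2), round(var, 4), round(sd, 3))
```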

Properties of The Variance
Recall that in Topic 01 (EDA) we considered the sample variance of linear
transformations of the observed data. The identical property holds when
considering the variance of a linear transformation of a random variable X:
- Let X be a random variable with E(X) = µ and variance $\sigma^2$. Let Y = bX + a
  be a linear transformation of X, where b and a are known.
- Then $\mathrm{Var}(Y) = b^2\,\mathrm{Var}(X) = b^2 \sigma^2$.
If $a_1, a_2, \ldots, a_n$ are known values, and $X_1, X_2, \ldots, X_n$ are independent random
variables with respective variances $\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2$, then the variance of the
linear combination of the n variables can be obtained as follows:
$$\mathrm{Var}(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_n^2 \sigma_n^2$$
In particular, if we take the variance of the mean of n independent and
identically distributed random variables, each with variance $\sigma^2$, then
$$\mathrm{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n}$$

Continuous Random Variables
Definition 5 (Continuous Random Variables)

A continuous random variable X has possible values that form an interval.
Its probability distribution is specified by a curve that helps determine
probabilities of intervals. This curve is referred to as a probability density
function, or pdf.
Each interval will have probability between 0 and 1. This is the area under
the curve, above that interval.
The total area under the curve is equal to 1.

Visualising a Probability Distribution: From Bar Plots to A
Curve
The general idea is this:
Remember how we could use a bar plot to represent the probability
distribution for a discrete random variable?
Now, we have so many possible values that the individual bars cannot be
separated from their “neighbours” and so we do not see their distinct borders.

Systolic Blood Pressure
Example 10 (Systolic Blood Pressure of Males)
Consider randomly selecting a Singaporean male, aged 35 to 44 years old,
and measuring his Systolic Blood Pressure.
Let X be the random variable representing the outcome.
Suppose that the pdf of X is given by the curve below.

Computing Probabilities of Intervals
If we were interested in the probability that the measurement of the
individual selected falls between 60 and 73 mm Hg, we would have to find the
following area under the curve:
This area is 0.232. Hence the required probability
P(60 ≤ X ≤ 73) = 0.232

If we were interested in the probability that the measurement of the
individual selected is greater than 90 mm Hg, we would have to find the
following area under the curve:
This area is 0.202. Hence the required probability
P(X ≥ 90) = 0.202
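
The two areas quoted for Example 10 can be reproduced with software. The notes do not give the formula of the pdf, so the sketch below assumes that X is Normal with mean 80 and standard deviation 12; this assumption matches the mean and variance quoted for Example 10 later in these notes and reproduces both areas, but it remains an assumption.

```python
from statistics import NormalDist

# Assumption: the pdf in Example 10 is Normal with mean 80 and sd 12.
X = NormalDist(mu=80, sigma=12)

p_between = X.cdf(73) - X.cdf(60)  # P(60 <= X <= 73): area between 60 and 73
p_above = 1 - X.cdf(90)            # P(X >= 90): area to the right of 90

print(round(p_between, 3), round(p_above, 3))
```

`cdf(x)` returns the area under the curve to the left of x, so an interval probability is a difference of two such areas.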

Bus Arrival Times
Example 11 (Bus Arrival Times)
Suppose that the time between bus arrivals is a random variable Y with the
following pdf.
What is the probability that you have to wait more than 3 minutes?
5 minutes pass without any bus. What is the probability that you have to
wait a further 3 minutes?

The first question is asking for P(Y ≥ 3), which corresponds to the area
below.
This area is found to be 0.687. Hence
P(Y ≥ 3) = 0.687.

For the second question, we have to work a little first to identify the intervals
whose probabilities we need.
$$P(Y \ge 8 \mid Y \ge 5) = \frac{P(\{Y \ge 8\} \cap \{Y \ge 5\})}{P(Y \ge 5)} = \frac{P(Y \ge 8)}{P(Y \ge 5)}$$
The respective probabilities are P(Y ≥ 8) = 0.3679 and P(Y ≥ 5) = 0.5353. Hence the desired
probability is just
P(Y ≥ 8 | Y ≥ 5) = 0.3679/0.5353 = 0.687
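
The bus probabilities can also be checked numerically. The formula of the pdf is not given in the notes, so this sketch assumes an exponential waiting time with mean 8 minutes, for which P(Y ≥ t) = exp(−t/8); that choice reproduces the quoted values 0.687, 0.3679 and 0.5353 but is an assumption. The names `MEAN_WAIT` and `survival` are illustrative.

```python
import math

MEAN_WAIT = 8.0  # minutes; assumed mean of an exponential waiting time

def survival(t):
    """P(Y >= t) for an exponential waiting time with mean MEAN_WAIT."""
    return math.exp(-t / MEAN_WAIT)

p_wait_3 = survival(3)               # P(Y >= 3)
p_cond = survival(8) / survival(5)   # P(Y >= 8 | Y >= 5)

print(round(p_wait_3, 3), round(p_cond, 3))
```

Under this assumption the two answers coincide: having already waited 5 minutes does not change the chance of waiting 3 more.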
Computing the Area Under a Curve
All the areas in the previous examples were computed using tables or software.
We will avoid using integration to obtain the area under a curve for this class.
In Section 7, on the Normal distribution, we shall introduce the use of
tables.

Mean of a Continuous Random Variable
Definition 6 (Mean of Continuous Random Variable)
The mean of a continuous random variable X, which has pdf f(x), is denoted by
the Greek letter µ, and is defined to be
$$\mu = \int x f(x)\, dx$$
Identical to the discrete case, it is also referred to as the expectation of a
random variable, E(X).
The interpretation of the mean is the same as for the discrete case (see
"About The Mean").
The properties of the mean are the same as for the discrete case (see
"Properties of The Mean").
In Example 10, if the pdf formula were given and if we were to carry out the
integration, we would find that E(X) = 80.
In Example 11, if the pdf formula were given and if we were to carry out the
integration, we would find that E(Y) = 8.

Variance of a Continuous Random Variable
Definition 7 (Variance of Continuous Random Variable)

The variance of a continuous random variable X, which has pdf f(x), is denoted
by $\sigma^2$, and is defined to be
$$\sigma^2 = \int (x - \mu)^2 f(x)\, dx$$
Identical to the discrete case, it is also referred to as Var(X).
The properties of the variance are the same as for the discrete case (see
"Properties of The Variance").
In Example 10, if we were to carry out the integration, we would find that
Var(X) = 144.
In Example 11, if we were to carry out the integration, we would find that
Var(Y) = 64.

Quantiles
Let p be a value between 0 and 1.
Definition 8 (Quantile or Percentile)

For a continuous random variable X, the p-th quantile, $q_p$, is a value such that
$$P(X \le q_p) = p$$
Quantiles are also known as percentiles.
If we were to try to explain it in terms of the pdf curve, then $q_p$ is the point
on the x-axis such that the area under the curve and to the left of $q_p$ is
equal to p.

Quantiles of A Bell-Shaped Distribution
Consider the pdf in Example 10.
The blue area in the top diagram is equal to 0.0478, and the blue area ends
at 60 mm Hg. Hence
$q_{0.0478} = 60$
The blue area in the middle diagram is equal to 0.280, and it ends
at 73 mm Hg. Hence
$q_{0.280} = 73$
The blue area in the bottom diagram is equal to 0.798, and it ends
at 90 mm Hg. Hence
$q_{0.798} = 90$

Quantiles In The Bus Arrivals Example
Consider the pdf in Example 11.
On top, the blue area is equal to 0.313, and it ends at 3 mins. Hence
$q_{0.313} = 3$
In the middle, the blue area is equal to 0.465, and it ends at 5 mins. Hence
$q_{0.465} = 5$
At the bottom, the blue area is equal to 0.632, and it ends at 8 mins. Hence
$q_{0.632} = 8$

Using Quantiles
In this course, we must be comfortable using probability tables (not integration)
to do the following:
- Given a value x, what is the area under the curve to the left of it?
- Given a probability p, what is the point x such that the area to the left of x
  (under the curve) is p?

Random Variables and Data Types

Combinations
Example 12 (Drawing Cards from a Deck)

In how many ways can we draw 2 cards from a well-shuffled deck of 52?
How many different combinations of 2 Aces can be drawn from a deck of 52?
This involves counting combinations of cards selected from a group of 52. The
order in which the cards were selected does not matter.

Combinations
In Example 12, regarding the first question:
- There are 52 ways of choosing the first card.
- Having done so, there are 51 ways of choosing the second card.
- Hence, using the multiplicative rule for counting (slide 24 of topic
  03-probability),
  52 × 51 = 2652
- However, this way double-counts each set of 2 cards selected. Hence the total
  number of combinations of 2 cards from 52 is
  2652/2 = 1326
- There are 1326 ways in which 2 cards can be drawn from a deck of 52.
Regarding the number of ways in which 2 Aces can be chosen:
- There are 4 ways of choosing the first card, and then 3 ways of choosing the
  second.
- Hence there are 4 × 3 = 12 ways of choosing 2 Aces, if we were to
  differentiate the order in which these were drawn.
- Again this double-counts. Thus there are only 12/2 = 6 combinations of 2
  cards from 4.

Combinations
Definition 9 (Combinations)
The number of combinations of n things, taken k at a time, is
$$C^n_k = \frac{n!}{k!\,(n-k)!} = \frac{n(n-1)(n-2) \times \cdots \times (n-k+1)}{k!}$$
It represents the number of ways of selecting k items out of n, when the
order of selection does not matter.
It is useful to remember the following corner cases:
- $C^n_1 = n$
- $C^n_n = 1$

Counting Combinations
Example 13 (Selecting Patients for a Trial)

Suppose we have a new developmental drug for schizophrenia.
There are 6 eligible patients at NUH, but we only have permission to
administer the drug to 3 patients. Hence we have to randomly select three of
them. How many such selections are there?
Suppose that the 6 patients consist of 4 males and 2 females. How many
selections are possible if both females must be included in the sample?

Combinations of Patients
The number of possible outcomes is a combination of k = 3 patients from
n = 6; hence the total number of combinations is
$$C^6_3 = \frac{6!}{3!\,3!} = 20$$
If both females are already in the sample, then we are left to choose only one
male from the group of 4.
Thus we only have to pick a combination of k = 1 male from the n = 4 male
patients. This value is
$$C^4_1 = 4$$
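
The counts in Examples 12 and 13 can be verified with Python's standard library, where `math.comb(n, k)` computes the number of combinations of n things taken k at a time. The variable names below are illustrative.

```python
import math

# Example 12: drawing cards.
two_cards = math.comb(52, 2)  # 2 cards from a deck of 52
two_aces = math.comb(4, 2)    # 2 Aces from the 4 Aces

# Example 13: selecting patients.
three_patients = math.comb(6, 3)  # 3 patients from 6
one_male = math.comb(4, 1)        # 1 male from 4, both females already chosen

print(two_cards, two_aces, three_patients, one_male)
```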

Probability Distribution for Counts with Binary Data
In many applications, each observation is binary: it has only two possible
outcomes.
For instance, a person may:
- accept or decline a credit card offer from a bank.
- have, or not have, health insurance.
- vote for or against PAP.
Under certain conditions, a Binomial distribution describes the total number of
cases with the outcome of interest.

The Binomial Distribution
Definition 10 (Binomial Distribution)

Suppose we have
n trials, each of which has two possible outcomes. The outcome of interest is
called a success and the other outcome is called a failure.
Each trial has the same probability of success p.
The n trials are independent.
The binomial random variable X is the total number of successes in the n trials.
X can take on values 0, 1, 2, . . . , n.
We shall denote this distribution as Bin(n, p).
A Bin(1, p) distribution is also referred to as a Bernoulli trial or a Bernoulli
distribution with success probability p.

Examples of Binomial Distribution
Suppose we plant 3 seeds, and each of them will germinate independently
with probability 0.6. Then Z, a random variable for the total number of seeds
that do germinate, follows a Bin(3, 0.6) distribution.
Suppose we sample White Blood Cells (WBC) from an individual, and test if
each one is a lymphocyte or not. For this individual, this will be true only 20% of the
time. If we were to obtain 10 WBC and set W to be the total number of
lymphocytes obtained, then W ∼ Bin(10, 0.2).
The probability of a woman developing breast cancer over a lifetime is 1/9.
Suppose we were to sample 50 women from Singapore independently and
follow them over their lifetime. If we set Y to be the total number of women
who develop breast cancer, then Y ∼ Bin(50, 1/9).

Binomial Formula
Suppose that X follows a Bin(n, p) distribution.
Then the probability of x successes in these n trials is
$$P(X = x) = C^n_x\, p^x (1 - p)^{n - x}$$
for x = 0, 1, 2, ..., n.
Note the presence of the combinations term.
The mean of X is E(X) = np.
The variance of X is Var(X) = np(1 − p).
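
As a check on the formula, the sketch below tabulates the Bin(3, 0.6) distribution from the seed-germination example and recomputes its mean and variance from the definitions for a discrete random variable. The helper name `binom_pmf` is illustrative.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p), using the binomial formula."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 3, 0.6  # seed-germination example: Bin(3, 0.6)

pmf = {x: binom_pmf(x, n, p) for x in range(n + 1)}
mean = sum(x * px for x, px in pmf.items())
var = sum((x - mean) ** 2 * px for x, px in pmf.items())

print(pmf)
print(mean, n * p)           # both should agree with np
print(var, n * p * (1 - p))  # both should agree with np(1 - p)
```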

Using the Binomial Probability Distribution
Example 14 (DNA Sequence Alignment)

The DNA of an organism consists of very long sequences of nucleotides.
There are four different nucleotides, which are represented by the letters A,
T, G and C.
Within a species, DNA sequences change over the course of many
generations. Hence it is possible that two sequences, separately obtained, are in
truth derived from the same ancestor.
Consider the following two short sequences of length 10:
G G A G A C T G T A   (reference)
|   |     |     | |
G A A C G C C C T A   (query)
We wish to gauge if the two sequences show significant similarity.
What is the probability of the above outcome (i.e., 5 matches), if the query
sequence was unrelated to (i.e., random w.r.t.) the reference sequence?

Probability of Five Matches
Picture this as 10 trials, one at each nucleotide position.
The outcome of interest is a match, which would happen with probability
0.25 if the two organisms were unrelated (taking each of the four nucleotides
to be equally likely).
In essence, we need to find the probability of 5 matches out of 10.
Hence the probability that a random sequence of nucleotides gives rise to a
match at exactly 5 positions is
$$C^{10}_5 \left(\frac{1}{4}\right)^5 \left(\frac{3}{4}\right)^5 = 0.0584$$
What would you conclude about the two sequences? Related or unrelated?
- In practice, we use the probability of 5 or more matches instead of just the
  probability of 5 matches to make our conclusion.
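
Both quantities mentioned for Example 14 can be computed from the binomial formula with n = 10 and p = 0.25. This is a minimal sketch; the helper name `binom_pmf` is illustrative.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.25  # 10 positions, match probability 0.25 if unrelated

p_exactly_5 = binom_pmf(5, n, p)
# "5 or more matches": add up the probabilities for x = 5, 6, ..., 10.
p_at_least_5 = sum(binom_pmf(x, n, p) for x in range(5, n + 1))

print(round(p_exactly_5, 4), round(p_at_least_5, 4))
```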

The Poisson Distribution
This is the second most frequently used discrete distribution, after the Binomial
distribution. It is usually associated with rare events.
Definition 11 (Poisson Distribution)

A random variable X follows a Poisson distribution with parameter µ if
$$P(X = k) = \frac{e^{-\mu} \mu^k}{k!}, \quad k = 0, 1, 2, \ldots$$
where e is approximately 2.71828, λ is the expected number of events per unit of time,
and µ = λt is the expected number of events over a time period of length t.

Example of Poisson distribution
Example 15 (Infectious Disease)

Consider a typhoid-fever example. Suppose the number of deaths from typhoid
fever over a 1-year period is Poisson distributed with parameter µ = 4.6.
What is the probability distribution of the number of deaths over a 6-month
period? Over a 3-month period?
Let X be the number of deaths in 6 months. Find the probability distribution
of X.
Let Y be the number of deaths in 3 months. Find the probability distribution
of Y.

Example of Poisson distribution (cont)
For X, because µ = 4.6 when t = 1 year, it follows that λ = 4.6 per year. For a 6-month
period, we have µ = 4.6 × 0.5 = 2.3. Therefore,
$$
\begin{aligned}
P(X = 0) &= e^{-2.3} = 0.100 \\
P(X = 1) &= \frac{2.3}{1!} e^{-2.3} = 0.231 \\
P(X = 2) &= \frac{2.3^2}{2!} e^{-2.3} = 0.265 \\
P(X = 3) &= \frac{2.3^3}{3!} e^{-2.3} = 0.203 \\
P(X = 4) &= \frac{2.3^4}{4!} e^{-2.3} = 0.117 \\
P(X = 5) &= \frac{2.3^5}{5!} e^{-2.3} = 0.054 \\
P(X \ge 6) &= 1 - (0.100 + 0.231 + 0.265 + 0.203 + 0.117 + 0.054) = 0.03
\end{aligned}
$$

Example of Poisson distribution (cont)
For Y, because µ = 4.6 when t = 1 year, it follows that λ = 4.6 per year. For a 3-month
period, we have µ = 4.6 × 0.25 = 1.15. Therefore,
$$
\begin{aligned}
P(Y = 0) &= e^{-1.15} = 0.317 \\
P(Y = 1) &= \frac{1.15}{1!} e^{-1.15} = 0.364 \\
P(Y = 2) &= \frac{1.15^2}{2!} e^{-1.15} = 0.209 \\
P(Y = 3) &= \frac{1.15^3}{3!} e^{-1.15} = 0.08 \\
P(Y \ge 4) &= 1 - (0.317 + 0.364 + 0.209 + 0.08) = 0.03
\end{aligned}
$$
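
The two tables for Example 15 follow the same recipe: scale the yearly rate by the length of the period, then apply the Poisson formula. The sketch below does this for both periods; the names `poisson_pmf`, `RATE_PER_YEAR` and `table` are illustrative helpers, not part of the notes.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for a Poisson random variable with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

RATE_PER_YEAR = 4.6  # expected deaths per year, from Example 15

def table(t_years, k_max):
    """Return ({k: P(X = k)} for k = 0..k_max, P(X > k_max)) for a period of t_years."""
    mu = RATE_PER_YEAR * t_years
    pmf = {k: poisson_pmf(k, mu) for k in range(k_max + 1)}
    tail = 1 - sum(pmf.values())
    return pmf, tail

pmf_x, tail_x = table(0.5, 5)   # X: deaths in 6 months, mu = 2.3
pmf_y, tail_y = table(0.25, 3)  # Y: deaths in 3 months, mu = 1.15

for k, p in pmf_x.items():
    print(f"P(X = {k}) = {p:.3f}")
print(f"P(X >= 6) = {tail_x:.3f}")
for k, p in pmf_y.items():
    print(f"P(Y = {k}) = {p:.3f}")
print(f"P(Y >= 4) = {tail_y:.3f}")
```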

Mean and Variance of the Poisson distribution
For a Poisson distribution with parameter µ, the mean and the variance are both equal
to µ. This can help to identify whether a distribution is a Poisson distribution.
Example 16 (Occupational Health)

A public health issue arose concerning the possible carcinogenic potential of food
ingredients containing ethylene dibromide (EDB). In some instances foods were
removed from public consumption if they were shown to have excessive quantities
of EDB. A previous study had looked at the mortality of 161 white male employees of
two plants in Texas and Michigan who were exposed to EDB over the time period
1940-1975. Seven deaths from cancer were observed among these employees. For
this time period, 5.8 cancer deaths were expected as calculated from overall
mortality rates for U.S. white men. Was the observed number of cancer deaths
excessive in this group?

Example 16
The expected number of cancer deaths from U.S. white male mortality rates
is µ = 5.8.
Let X be the number of deaths from cancer among the employees in the
study; then X is a Poisson random variable with µ = 5.8.
We need to find P(X ≥ 7).
P(X ≥ 7) = 1 − P(X ≤ 6), where $P(X = k) = e^{-5.8}\, 5.8^k / k!$
P(X ≥ 7) = 1 − 0.638 = 0.362
Seven or more deaths would occur about 36% of the time by chance alone, so
the observed number of cancer deaths in the given study is not
excessive in this group.
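
The tail probability in Example 16 can be checked by summing the Poisson probabilities for k = 0 to 6. A minimal sketch, with illustrative variable names:

```python
import math

mu = 5.8  # expected number of cancer deaths, from Example 16

# P(X <= 6): add up e^{-mu} mu^k / k! for k = 0, 1, ..., 6.
p_at_most_6 = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(7))
p_at_least_7 = 1 - p_at_most_6

print(f"{p_at_most_6:.3f} {p_at_least_7:.3f}")
```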

Poisson Approximation to the Binomial Distribution
A Binomial distribution with large n and small p can be accurately approximated by a
Poisson distribution with parameter µ = np.
The mean of the Binomial distribution is np and its variance is np(1 − p). For small p,
(1 − p) is approximately equal to 1, and thus np(1 − p) ≈ np;
that is, the mean and variance are almost equal, as they are for a Poisson distribution.
The approximation is convenient because the Binomial formula involves the
expressions $C^n_k$ and $(1 - p)^{n-k}$, which are cumbersome for large n.
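
To see how close the approximation can be, the sketch below compares the two distributions for one hypothetical case, n = 1000 and p = 0.002 (so µ = np = 2). These numbers are chosen only for illustration and do not come from the notes.

```python
import math

def binom_pmf(k, n, p):
    """Exact Binomial probability P(X = k) for X ~ Bin(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """Poisson probability P(X = k) with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

n, p = 1000, 0.002  # hypothetical: large n, small p
mu = n * p          # Poisson parameter used for the approximation

for k in range(6):
    print(k, round(binom_pmf(k, n, p), 5), round(poisson_pmf(k, mu), 5))

# Largest gap between the two probabilities over k = 0, ..., 10.
max_diff = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(11))
print(max_diff)
```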

The Normal Distribution
Definition 12 (Normal Distribution)

The Normal distribution is symmetric, bell-shaped and characterised by its
mean µ and its variance σ².
It is also known as the Gaussian distribution.
If X is a random variable that follows a Normal distribution with mean µ and
variance σ², we write X ∼ N(µ, σ²).

Properties of the Normal Distribution
The highest point of the Normal distribution curve is at x = µ.
The Normal distribution is symmetric about µ. This implies two things:
- If x > 0, the area to the left of µ − x is the same as the area to the right of
  µ + x.
- $q_{1-p} = 2\mu - q_p$

Defining the N(µ, σ²) Distribution
Normal distributions with a larger σ² naturally have a larger spread of values.

An Empirical Guide to Normal Probabilities
A useful rule of thumb for observations from a N(µ, σ²) distribution is that
approximately 68% of them fall within one standard deviation of µ, and
approximately 95% within two standard deviations.
For instance, approximately 68% of the observations from a N(0, 1)
distribution would fall between -1 and 1.
Similarly, the area under a N(3, 4) distribution between -1 and 7 would be
approximately 0.95.
We can use this rule of thumb to gauge the width of an interval that contains
95% of the data, or to compare two graphs in terms of their variability.

Linear Operations on Normal Random Variables
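If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$, and X and Y are independent
random variables, then
- $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$
- $X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$
- The addition could be of more than just two terms. In particular, if
  $X_1, X_2, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ random variables, then
  $$\bar{X} = \frac{X_1 + \cdots + X_n}{n} \sim N(\mu, \sigma^2/n)$$
For any real numbers a and b, if $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent, then
$$aX + bY \sim N(a\mu_X + b\mu_Y,\; a^2\sigma_X^2 + b^2\sigma_Y^2)$$
A linear transformation aX + b of a single Normal random variable is also Normal.
In particular, if X ∼ N(µ, σ²) and we take a = 1/σ and b = −µ/σ, then
$$Z = aX + b = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
Whenever we compute Z = (X − µ)/σ, we refer to Z as the Z-score of X.
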
Normal(0,1) Distribution Tables
Use the Normal Table to compute the values of the following, where Z ∼ N(0, 1).
P(Z > 1)
P(Z ≤ 2.3)
P(Z > 1.18)
P(Z > 1.18|Z > 0)
P({Z > 1.96} ∪ {Z < −1.96})
q0.32
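
Table lookups for these exercises can be cross-checked with software. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the printed table; small differences from table values are due to rounding in the table. The dictionary name `answers` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard Normal, N(0, 1)

answers = {
    "P(Z > 1)": 1 - Z.cdf(1),
    "P(Z <= 2.3)": Z.cdf(2.3),
    "P(Z > 1.18)": 1 - Z.cdf(1.18),
    # Conditional probability: P(Z > 1.18 and Z > 0) / P(Z > 0).
    "P(Z > 1.18 | Z > 0)": (1 - Z.cdf(1.18)) / (1 - Z.cdf(0)),
    # The two tails are disjoint and, by symmetry, have equal area.
    "P(Z > 1.96 or Z < -1.96)": 2 * (1 - Z.cdf(1.96)),
    # Quantile: the point with area 0.32 to its left.
    "q_0.32": Z.inv_cdf(0.32),
}

for name, value in answers.items():
    print(f"{name} = {value:.4f}")
```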

Using Z-scores
To summarise:
If we are given a value x of X and wish to find a probability, convert it to
z = (x − µ)/σ and use the table to obtain the probability.
If we are given a probability p and asked to find $q_p$ for an X ∼ N(µ, σ²), find
the corresponding quantile $q^*$ for a Z ∼ N(0, 1) and then compute
$q_p = q^* \sigma + \mu$.

Using Z-scores to Find Outliers
If the Z-score of an observation from X ∼ N(µ, σ²) is greater than 3 or less
than -3, it suggests that the observation is very different from the rest of the data.
For bell-shaped histograms, this is another way of identifying outliers.

SAT Scores
Example 17 (SAT Scores)

The SAT is an entrance exam used in the US. Suppose that scores (X) from this
test follow a N(µ = 1500, σ² = 90000) distribution, so that σ = 300. Find the following:
P(X ≤ 1800).
P(X ≥ 1630).

SAT Scores (1)
Convert to Z -score:
P(X ≤ 1800) = P(Z ≤ (1800 − 1500)/300) = P(Z ≤ 1)
From the table, this value is 0.841.

SAT Scores (2)
Convert to Z -score:
P(X ≥ 1630) = P(Z ≥ (1630 − 1500)/300) = P(Z ≥ 0.43)
From the table, this value is 0.3336.
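
Both SAT probabilities can be checked without converting to Z-scores by hand. A minimal sketch using `statistics.NormalDist`, with σ = 300 taken as the square root of the stated variance:

```python
from statistics import NormalDist

X = NormalDist(mu=1500, sigma=300)  # sigma = sqrt(90000)

p_le_1800 = X.cdf(1800)      # P(X <= 1800)
p_ge_1630 = 1 - X.cdf(1630)  # P(X >= 1630)

print(round(p_le_1800, 3), round(p_ge_1630, 3))
```

The second value comes out near 0.332 rather than 0.3336, because the table lookup rounds the Z-score 0.4333... down to 0.43.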

Screening for Hypertension
Example 18 (Hypertension Screening)

Suppose that you run a clinic. Patients who come in have their blood pressure
measured. The random variable X of this measurement for hypertensive
patients follows a Normal distribution with µ = 95 and σ = 12.
You wish to develop a screening test for hypertension, which has sensitivity
0.90. What is the cut-off pressure you should use?
It takes 5 minutes to serve a normal patient (with low or normal blood
pressure), but it takes 15 minutes if a patient has been detected to have high
blood pressure. What is the mean time that a nurse will spend taking the
blood pressure of a hypertensive patient, when the above screening test is
implemented?

Screening for Hypertension
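A sensitivity of 0.90 means that 90% of hypertensive patients must measure
above the cut-off, so 10% fall below it.
We need to find $q_{0.10}$ for X ∼ N(95, 144). This is equal to 79.6 mm Hg.
If we let Y be the time that a nurse spends on a hypertensive patient, then
P(Y = 5) = 0.1 and P(Y = 15) = 0.9
Hence E(Y) = 5(0.1) + 15(0.9) = 14 minutes.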
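
The cut-off and the mean time in Example 18 can be reproduced as follows. This is a sketch under the stated model X ∼ N(95, 144); the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=95, sigma=12)  # blood pressure of hypertensive patients
sensitivity = 0.90

# The cut-off must leave 90% of hypertensive patients above it,
# i.e. it is the 0.10 quantile of X.
cutoff = X.inv_cdf(1 - sensitivity)

# A hypertensive patient takes 15 minutes if detected (probability 0.9)
# and 5 minutes if missed (probability 0.1).
expected_minutes = 5 * (1 - sensitivity) + 15 * sensitivity

print(round(cutoff, 1), round(expected_minutes, 1))
```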

Normal Approximation to the Binomial Distribution
When n is large, the Bin(n, p) distribution is difficult to work with, and an
approximation is easier to use than the exact binomial distribution.
If n is moderately large and p is either near 0 or 1, then the Bin(n, p) will be
very positively or negatively skewed, respectively.
If n is moderately large and p is not too extreme, then the Bin(n, p) tends to
be symmetric and is well approximated by a normal distribution
N(np, np(1 − p)).
The normal distribution with mean np and variance np(1 − p) can be used to
approximate a binomial distribution with parameters n and p when
np(1 − p) ≥ 5.

Index
Def 01: Random Variable
Def 02: Discrete Random Variable
Def 03: Mean, Discrete
Def 04: Variance, Discrete
Def 05: Continuous Random Variable
Def 06: Mean, Continuous
Def 07: Variance, Continuous
Def 08: Quantile
Def 09: Combinations
Def 10: Binomial Distribution
Def 11: Poisson Distribution
Def 12: Normal Distribution

Eg 01: Two Dice
Eg 02: Coin Flip
Eg 03: Coin Toss
Eg 04: Two Dice
Eg 05: Infinite Number of Outcomes
Eg 06: Computing Mean
Eg 07: Ronaldo
Eg 08: Sure Win or Not?
Eg 09: Sure Loss or Not?
Eg 10: Systolic Blood Pressure
Eg 11: Bus Arrival Times
Eg 12: Drawing Cards
Eg 13: Patient Selection
Eg 14: DNA Sequence Alignment
Eg 15: Infectious Disease
Eg 16: Occupational Health
Eg 17: SAT Scores
Eg 18: Hypertension Screening

Further Reading
Statistics: The Art and Science of Learning from Data, 3rd edition
Alan Agresti and Christine Franklin.
Read: Chapter 6
