
Lecture 8: Estimation

Matt Golder & Sona Golder


Pennsylvania State University
Introduction
Populations are characterized by numerical descriptive measures called
parameters. These parameters that describe a population are fixed constants.
Important population parameters that we might be interested in are the
population mean μ and variance σ². A parameter of interest is often called a
target parameter.
Methods for making inferences about parameters fall into one of two categories:
1. We will estimate (predict) the value of the target parameter of interest.
What is the value of the population parameter?
2. We will test a hypothesis about the value of the target parameter.
Is the parameter value equal to this specific value?
We'll use Greek letters for population parameters (like μ), and letters with
hats (like μ̂) for specific data-based estimates of those parameters.


Introduction
Suppose we are interested in estimating the mean waiting time in a
supermarket. We can give our estimate in two forms.
1. Point estimate: A single value or point (say, 3 minutes) that we think
is close to the unknown population mean μ.
2. Interval estimate: Two values that correspond to an interval (say, 2 and
4 minutes) that is intended to enclose the parameter of interest μ.
Estimation is accomplished by using an estimator for the target parameter.
Introduction
An estimator is a rule, often expressed as a formula, that tells us how to
calculate the value of an estimate based on the measurements contained in a
sample.
For example, the sample mean

    X̄ = (1/N) Σᵢ₌₁ᴺ Xᵢ

is one possible point estimator of the population mean μ.
Recall that any estimate we make is itself a random variable.
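As a concrete sketch of applying this estimator rule (the population mean of 3.0 minutes and the sample size are assumed values, used only for illustration):

```python
import random

random.seed(42)

# Hypothetical population: waiting times with true mean mu = 3.0 minutes
# (an assumed value, so we can compare the estimate against it).
mu = 3.0
sample = [random.gauss(mu, 1.0) for _ in range(200)]

# The estimator rule: X-bar = (1/N) * sum of the X_i
x_bar = sum(sample) / len(sample)
```

Running this on different simulated samples gives different values of `x_bar`, which previews the point that an estimate is itself a random variable.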
Random Variables
One way to think of a random variable is that it is made up of two parts: a
systematic component and a random part.
    Xᵢ = μ + uᵢ

This implies something about u:

    uᵢ = Xᵢ − μ
Random Variables
What is the expected value of u?

    E(u) = E(X − μ)
         = E(X) − E(μ)
         = E(X) − μ
         = μ − μ
         = 0

The mean μ is a number such that, if it is subtracted from each value of X in
the sample, the sum of those differences will be zero.
Random Variables
What about the variance of X and u?

    Var(X) = E[(X − μ)²]
           = E[u²]

    Var(u) = E[(u − E(u))²]
           = E[(u − 0)²]
           = E[u²]

If we define a random variable as composed of a fixed part and a random part,
then:
the variable will have a population mean (i.e. E(X)) equal to μ, and
the variance of X is equal to the variance of u.
Estimates are Random Variables
Suppose that:
We want to know μ for the population, but
we only have data on a sample of N observations from the population, so
we use these data to estimate the mean:

    X̄ = (1/N) Σᵢ₌₁ᴺ Xᵢ
Estimates are Random Variables
Recalling that each Xᵢ = μ + uᵢ, we can write:

    X̄ = (1/N) Σᵢ₌₁ᴺ (μ + uᵢ)
       = (1/N) Σᵢ₌₁ᴺ μ + (1/N) Σᵢ₌₁ᴺ uᵢ
       = (1/N)(Nμ) + (1/N) Σᵢ₌₁ᴺ uᵢ
       = μ + ū
Estimates are Random Variables
This means that:
The estimate of the mean is itself a random variable.
For different samples, we'll get different values of ū (the sample-based
average of the stochastic component of X), and correspondingly
different estimates of the mean.
All of this (stochastic) variation is due to the random component of X.
We could show in analogous fashion that the usual estimate of the variance
σ² (namely s²) is also a random variable.
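A quick simulation makes the point concrete (μ = 5, σ = 2, and N = 50 are assumed values, not from the lecture): each sample yields a different ū and hence a different estimate, while the average of many estimates sits near μ.

```python
import random

random.seed(1)

mu, sigma, N = 5.0, 2.0, 50  # assumed population values for illustration

def sample_mean():
    # Each X_i = mu + u_i, so X-bar = mu + u-bar differs from sample to sample.
    return sum(random.gauss(mu, sigma) for _ in range(N)) / N

estimates = [sample_mean() for _ in range(1000)]
mean_of_estimates = sum(estimates) / len(estimates)  # close to mu: unbiased
spread = max(estimates) - min(estimates)             # nonzero: X-bar is random
```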
Properties of Estimators
In theory, there are many different estimators. The one(s) we choose will
depend on their properties.
There are two general types of properties of estimators:
Small-Sample Properties
These properties hold irrespective of the size of the sample on which the
estimate is based.
In other words, in order for an estimator to have these properties, they
must hold for all possible sample sizes.
Properties of Estimators
Large-Sample (Asymptotic) Properties
These are properties which hold only as the sample size increases to
infinity.
In practical terms, this means that to receive the benefits of these
properties, more is better (at least as far as sample size goes).
In what follows, we'll consider an abstract population parameter θ (a mean, a
correlation, etc.). We'll assume that we estimate it with a sample of N
observations. And we'll call this generic estimator θ̂.
Unbiased Point Estimators
We generally prefer that estimators be accurate, i.e., that they reflect the
population parameter as closely as possible:

    E(θ̂) = θ

If this property holds, then we say an estimator is unbiased.
An unbiased estimator is one whose expected value is the
population parameter.
Unbiased Point Estimators
Definition: Let θ̂ be a point estimator for parameter θ. Then θ̂ is an unbiased
estimator if E(θ̂) = θ. If E(θ̂) ≠ θ, then θ̂ is said to be biased. The bias of a
point estimator θ̂ is given by B(θ̂) = E(θ̂) − θ.
Figure: Sampling Distribution for an Unbiased and Biased Estimator
[Figure: two sampling densities f(θ̂₁) and f(θ̂₂); the first is centered on θ, so E(θ̂₁) = θ, while the second is centered away from θ, with the distance between E(θ̂₂) and θ labeled as the bias.]
Unbiased Point Estimators
We've already seen that the sample mean is an unbiased estimate of the
population mean:

    E(X̄) = E(μ + ū)
          = E(μ) + E(ū)
          = μ + 0
          = μ

But only if we have random sampling.
Unbiased Point Estimators and Random Sampling
Example: Suppose each of 200,000 people in a city under study has eaten X
number of fast-food meals in the last week. However, a residential phone
survey on a week-day afternoon misses those who are working, the very people
most likely to eat fast food.

Table: Target Population and Biased Subpopulation

                    Whole Target Population       Subpopulation Responding
X = Meals      Frequency   Relative Frequency   Frequency   Relative Frequency
0              100,000     0.50                  38,000     0.76
1               40,000     0.20                   6,000     0.12
2               40,000     0.20                   4,000     0.08
3               20,000     0.10                   2,000     0.04
Total          200,000     1.00                  50,000     1.00
Unbiased Point Estimators and Random Sampling
Population mean: μ = 0(0.5) + 1(0.2) + 2(0.2) + 3(0.1) = 0.9.
Subpopulation mean: μ_R = 0(0.76) + 1(0.12) + 2(0.08) + 3(0.04) = 0.4.
A random sample of 200 phone calls during the week will bring a response rate
of about 50, whose average R̄ will be used to estimate μ. What is the bias?
The sample mean R̄ has an obvious non-response bias.

    Bias = E(R̄) − μ
         = μ_R − μ
         = 0.4 − 0.9 = −0.5

The bias is large and will lead researchers to underestimate the number of
fast-food meals eaten in a week by 0.5.
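The two means and the resulting bias can be checked directly from the relative-frequency tables above:

```python
# Relative frequencies from the fast-food tables above.
population    = {0: 0.50, 1: 0.20, 2: 0.20, 3: 0.10}
subpopulation = {0: 0.76, 1: 0.12, 2: 0.08, 3: 0.04}

mu   = sum(x * p for x, p in population.items())     # population mean, 0.9
mu_R = sum(x * p for x, p in subpopulation.items())  # mean of respondents, 0.4
bias = mu_R - mu                                     # non-response bias, -0.5
```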
Unbiased Point Estimators
So, how do we know if an estimator is unbiased?
As the example of the sample mean illustrates, we can sometimes prove it.
Other times, it can be difficult or impossible to show that an estimator is
unbiased.
Moreover, there may be many, many unbiased estimators for a particular
population parameter.
Unbiased Point Estimators
Example: Consider a sample of two observations, X₁ and X₂, and a generalized
estimator for the mean:

    Z = a₁X₁ + a₂X₂

    E(Z) = E(a₁X₁ + a₂X₂)
         = E(a₁X₁) + E(a₂X₂)
         = a₁E(X₁) + a₂E(X₂)
         = a₁μ + a₂μ
         = (a₁ + a₂)μ
Unbiased Point Estimators
So long as (a₁ + a₂) = 1.0, then E(Z) = μ and the estimator is unbiased.
This means that there are, in principle, an infinite number of unbiased
estimators.
We could extend this to N observations: so long as the weights sum
to 1.0, the estimator is unbiased.
So how do we choose which estimator to use?
Relative Eciency of Point Estimators
If θ̂₁ and θ̂₂ denote two unbiased estimators for the same parameter θ, we
prefer to use the estimator with the smaller variance, i.e. the estimator whose
sampling distribution is more concentrated around the target parameter. This
is the notion of efficiency.
Figure: Comparing the Efficiency of Estimators
[Figure: two sampling densities f(θ̂₁) and f(θ̂₂) centered on θ, with f(θ̂₁) the more concentrated of the two.]
Relative Eciency of Point Estimators
To compare the relative efficiency of two estimators, we examine the ratio of
their variances.
Definition: Given two unbiased estimators θ̂₁ and θ̂₂ of a parameter θ, with
variances Var(θ̂₁) and Var(θ̂₂), respectively, the efficiency of θ̂₁ relative to
θ̂₂, denoted e(θ̂₁, θ̂₂), is defined by the ratio

    e(θ̂₁, θ̂₂) = Var(θ̂₂) / Var(θ̂₁)

If θ̂₁ and θ̂₂ are unbiased estimators for θ, then e(θ̂₁, θ̂₂) is greater than 1 only
if Var(θ̂₂) > Var(θ̂₁). In this case, θ̂₁ is a better unbiased estimator than θ̂₂.
If e(θ̂₁, θ̂₂) < 1, then θ̂₂ is preferred to θ̂₁.
Relative Eciency of Point Estimators
Example: We saw a long time ago that when the population being sampled is
exactly symmetric, its center can be estimated without bias by the sample
mean X̄ and by the sample median X_Med. But which is more efficient?
When we sample from a normal population, it can be shown that

    Var(X_Med) ≈ 1.57 σ²/N.

And we already know that for a normal population, Var(X̄) = σ²/N.
Relative Eciency of Point Estimators
This means that:

    e(X̄, X_Med) = Var(X_Med) / Var(X̄)
                ≈ (1.57 σ²/N) / (σ²/N)
                = 1.57 = 157%

The sample mean X̄ is 57% more efficient than the sample median X_Med.
The sample median will yield as accurate an estimate as the sample mean only
if we take a 57% larger sample.
Relative Eciency of Point Estimators
Example: A distribution with thicker tails than the normal distribution is called
the Laplace distribution. What is the efficiency of the sample mean relative to
the sample median now?
Figure: Comparing the Standard Normal Distribution and Standard Laplace
Distribution
Relative Eciency of Point Estimators
In sampling from a Laplace distribution, Var(X_Med) ≈ 0.5 σ²/N.
And so we have:

    e(X̄, X_Med) = Var(X_Med) / Var(X̄)
                ≈ (0.5 σ²/N) / (σ²/N)
                = 0.5 = 50%

The sample mean is less efficient than the sample median in this case.
If a symmetric population has thick tails, so that outlying observations are
likely to occur, then the sample mean has a larger variance. This is because it
takes into account all observations, even the distant outliers that the sample
median ignores.
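Both results can be sketched by simulation (the sample size N = 25 and replication counts are arbitrary choices, and the Laplace draw uses a standard inverse-CDF trick):

```python
import math
import random
import statistics

random.seed(7)

def sampling_variance(draw, estimator, N=25, reps=4000):
    # Monte Carlo estimate of an estimator's sampling variance at sample size N.
    return statistics.pvariance(
        [estimator([draw() for _ in range(N)]) for _ in range(reps)]
    )

normal = lambda: random.gauss(0.0, 1.0)

def laplace():
    # Inverse-CDF draw from a standard Laplace distribution.
    u = random.random() - 0.5
    return -math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

# e(X-bar, X_Med) = Var(X_Med) / Var(X-bar)
eff_normal = (sampling_variance(normal, statistics.median)
              / sampling_variance(normal, statistics.fmean))   # mean wins
eff_laplace = (sampling_variance(laplace, statistics.median)
               / sampling_variance(laplace, statistics.fmean)) # median wins
```

The ratio comes out above 1 for the normal population and below 1 for the thick-tailed Laplace population, matching the two calculations in the text.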
Relative Eciency of Point Estimators
Example: Consider our generalized estimator Z = a₁X₁ + a₂X₂. Consider the
variance of Z:

    Var(Z) = Var(a₁X₁ + a₂X₂)
           = (a₁² + a₂²)σ²

We want to know what combination of weights minimizes this variance. Since
we know that a₁ + a₂ = 1.0, we can rewrite:

    a₁² + a₂² = a₁² + (1 − a₁)²
              = a₁² + (1 − 2a₁ + a₁²)
              = 2a₁² − 2a₁ + 1
Relative Eciency of Point Estimators
We then minimize this by taking the derivative with respect to a₁ and setting it
equal to zero:

    4a₁ − 2 = 0
    a₁ = 0.5

So the (equally-weighted) sample average has the smallest variance of all the
possible unbiased estimators for the population mean.
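A quick numeric check of the minimization (a purely illustrative grid search over the weight a₁):

```python
# Var(Z) is proportional to 2*a1^2 - 2*a1 + 1 once we impose a1 + a2 = 1.
def variance_factor(a1):
    return 2 * a1 ** 2 - 2 * a1 + 1

# Search a fine grid of weights; the minimum sits at a1 = 0.5,
# where the factor equals 0.5 (i.e. Var(Z) = sigma^2 / 2 for N = 2).
grid = [i / 1000 for i in range(1001)]
best_a1 = min(grid, key=variance_factor)
```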
Ecient Point Estimators
So far, we have talked about the relative efficiency of point estimators.
However, we sometimes talk about an efficient estimator in absolute terms.
An efficient estimator is an unbiased estimator whose variance is equal to
what is known as the Cramer-Rao lower bound, I(θ).
If we can show that an estimator's variance is equal to the Cramer-Rao lower
bound, then we know that we have the most efficient estimator.
Note that an efficient estimator must be unbiased. It is possible that a biased
estimator has a smaller variance than I(θ), but that does not make it efficient.
Ecient Point Estimators
Let X₁, X₂, . . . , X_N denote a random sample from a probability density
function f(x), which has unknown parameter θ. If θ̂ is an unbiased estimator
of θ, then under very general conditions

    Var(θ̂) ≥ I(θ)

where

    I(θ) = { N · E[ −∂² ln f(x) / ∂θ² ] }⁻¹

This is known as the Cramer-Rao inequality.
If Var(θ̂) = I(θ), then the estimator θ̂ is said to be efficient.
Mean Squared Error
Clearly one would like to have unbiased and efficient estimators. If one had the
choice of two unbiased estimators, we would choose the more efficient one.
But what if we are comparing both biased and unbiased estimators? It turns
out that it may no longer be appropriate to select the estimator with the least
variance or the estimator with the least bias.
Maybe we will want to trade off some bias in favor of gains in efficiency, or
vice versa.
Mean Squared Error
How do we decide which estimator is closest to the target parameter overall?
Figure: Mean Squared Error
[Figure: three hypothetical estimators with differing bias and variance.]
It turns out that we use something called the mean squared error (MSE).
Mean Squared Error
Definition: The mean squared error (MSE) of a point estimator θ̂ is

    MSE(θ̂) = E[(θ̂ − θ)²]

The mean squared error is, therefore, the expected squared error of the estimator.
The MSE can be re-written as

    MSE(θ̂) = Var(θ̂) + [B(θ̂)]²

This shows that the MSE reduces to the variance for unbiased estimators.
Mean Squared Error
The MSE can be regarded as a general kind of variance that applies to either
unbiased or biased estimators.
This leads to the general definition of the relative efficiency of two estimators.
Definition: For any two estimators, whether biased or unbiased,

    e(θ̂₁, θ̂₂) = MSE(θ̂₂) / MSE(θ̂₁)

With regard to our three hypothetical estimators shown earlier, θ̂₂ has the
least mean squared error and is therefore the more efficient estimator.
If we choose this estimator, we'd be trading off slightly more bias for greater
efficiency.
Mean Squared Error
Example: We want to estimate μ with a sample size of N. One estimator is
the sample mean X̄, which has:

    B(X̄) = 0 (because the mean is an unbiased estimator of μ).
    Var(X̄) = σ²/N, where σ² is the variance of X.
    MSE(X̄) = σ²/N + (0)² = σ²/N.

An alternative estimator, μ̃, might be:

    μ̃ = 6

In other words, this estimator says that its guess of the expectation of X is
always equal to six.
Mean Squared Error
The bias of μ̃, B(μ̃), is:

    B(μ̃) = E(μ̃) − μ = E(6) − μ = 6 − μ

The variance of μ̃ is:

    Var(μ̃) = Var(6) = 0

And the MSE of μ̃ is:

    MSE(μ̃) = Var(μ̃) + [B(μ̃)]²
            = 0 + (6 − μ)²
            = 36 − 12μ + μ²
Mean Squared Error
Figure: Mean Squared Error
[Figure: MSE curves plotted against the true population mean μ.]
The black line is the MSE of μ̃ as a function of the true population mean μ.
The red lines are the MSEs for X̄, under the assumption that σ² = 10 and
N = {20, 100, 1000}, respectively.
Mean Squared Error
There are several things to note:
The MSE of μ̃ is quite good if μ ≈ 6. In some circumstances, the MSE
for μ̃ will be smaller than that for X̄, even though X̄ is both unbiased
and a better estimator.
But the MSE of μ̃ gets much worse as μ gets further away from six.
Since we don't know whether μ = 6 or not, this is not a desirable
property.
Relatedly, our estimator μ̃ doesn't improve in MSE terms if we add
more data to our sample (that is, as N → ∞).
In contrast, the MSE of X̄ drops considerably as N increases, and does
so irrespective of the true value of μ.
This example illustrates that while MSE can be a good way to choose among
estimators, it shouldn't be applied uncritically.
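The comparison can be reproduced numerically from the formulas above (σ² = 10 as in the figure; the constant-six estimator is written here as `mse_mu_tilde`):

```python
def mse_xbar(sigma2, N):
    # MSE(X-bar) = Var + Bias^2 = sigma^2/N + 0
    return sigma2 / N

def mse_mu_tilde(mu):
    # MSE(mu-tilde) = Var + Bias^2 = 0 + (6 - mu)^2
    return (6 - mu) ** 2

sigma2 = 10.0
# Near mu = 6 the constant estimator beats X-bar at N = 20 ...
wins_near_6 = mse_mu_tilde(6.1) < mse_xbar(sigma2, 20)
# ... but far from 6 it loses badly, and more data never helps it,
# while the MSE of X-bar keeps falling as N grows.
loses_far = mse_mu_tilde(3.0) > mse_xbar(sigma2, 1000)
```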
Mean Squared Error
Example: Recall the phone survey of 50 responses from 200 calls that had a
serious non-response bias. In addition, the average response R̄ has variability
too. Calculate the MSE of R̄.

Table: Biased Subpopulation

r    f(r)   r·f(r)   r − μ_R   (r − μ_R)²   (r − μ_R)²·f(r)
0    0.76   0.00     −0.4      0.16         0.1216
1    0.12   0.12      0.6      0.36         0.0432
2    0.08   0.16      1.6      2.56         0.2048
3    0.04   0.12      2.6      6.76         0.2704
            μ_R = 0.4                       σ²_R = 0.64
Mean Squared Error
We saw from earlier that the bias was −0.5.

    Var(R̄) = σ²_R / N = 0.64/50 ≈ 0.013

    MSE(R̄) = Var(R̄) + [Bias(R̄)]²
            = 0.013 + 0.25 = 0.263
Mean Squared Error
If we increase the sample size fivefold, how much would the MSE be reduced?

    Var(R̄) = σ²_R / N = 0.64/(5 × 50) ≈ 0.003

The increase in sample size would not affect the bias, and so

    MSE(R̄) = Var(R̄) + [Bias(R̄)]²
            = 0.003 + 0.25 = 0.253

Given that the main term in the MSE is the bias, and this has not been
reduced, an increase in sample size does not affect the MSE that much.
Mean Squared Error
A second statistician takes a sample survey of only N = 20 phone calls, with
persistent follow-up until he gets a response. Let this small but unbiased
sample have a sample mean denoted by X̄. What is the MSE?

Table: Whole Population

x    f(x)   x·f(x)   x − μ    (x − μ)²   (x − μ)²·f(x)
0    0.50   0.00     −0.90    0.81       0.405
1    0.20   0.20      0.10    0.01       0.002
2    0.20   0.40      1.10    1.21       0.242
3    0.10   0.30      2.10    4.41       0.441
            μ = 0.9                      σ² = 1.09
Mean Squared Error
    Var(X̄) = σ²_X / N = 1.09/20 ≈ 0.055

    MSE(X̄) = Var(X̄) + [Bias(X̄)]²
            = 0.055 + 0 = 0.055

The variance is larger due to the smaller sample size, but the mean squared
error is much smaller.
In publishing his results, the second statistician is criticized for using a sample
only 1/10 the size of the first statistician's. What defense might he offer?

    MSE(X̄) = 0.055
    MSE(R̄) = 0.253
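His defense in code form, using the variances and biases computed above (values are carried at full precision here rather than the rounded 0.013 and 0.055 shown in the slides):

```python
# First statistician: biased phone survey, N = 50 responses.
var_R = 0.64 / 50           # sampling variance of R-bar
bias_R = -0.5               # non-response bias
mse_R = var_R + bias_R ** 2

# Second statistician: unbiased follow-up survey, N = 20.
var_X = 1.09 / 20
bias_X = 0.0
mse_X = var_X + bias_X ** 2

# The small unbiased sample wins decisively on MSE.
```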
Large Sample Properties
Unbiasedness, relative efficiency, and efficiency are small-sample properties of
estimators that hold irrespective of sample size.
In contrast, large-sample properties are properties of estimators that hold only
as the sample size increases without limit.
Note that this is dependent on sample size, not on the number of samples
drawn.
Intuitively: what would you expect to happen as sample size gets larger?
The variance around the true value decreases (less possibility of
drawing a "bad" sample).
Eventually the sample size equals the population size, and the estimate
collapses on the true value.
Consistent Estimators
In an informal sense, a consistent estimator is one that concentrates in a
narrower and narrower band around its target as sample size N increases
indefinitely.
Figure: Consistency
[Figure: sampling distributions for N = 5, 10, 50, and 200, each more tightly concentrated around the target than the last.]
One of the conditions that makes an estimator consistent is if its bias and
variance both approach zero as the sample size increases.
Consistent Estimators
Definition: The estimator θ̂_N is said to be a consistent estimator of θ if it
converges in probability to its population value as N goes to infinity. We write
this as:

    lim_{N→∞} Pr(|θ̂_N − θ| ≤ ε) = 1

or, equivalently,

    lim_{N→∞} Pr(|θ̂_N − θ| > ε) = 0

for an arbitrarily small ε > 0.
If we consider an estimator whose properties vary by sample size (say θ̂_N),
then θ̂_N is consistent if E(θ̂_N) → θ as N → ∞ and its variance goes to zero.
Consistent Estimators
Example: Is the sample mean X̄ a consistent estimator of the population mean μ?
We know that X̄ is unbiased and that Var(X̄) = σ²/N approaches zero as N
increases.
As a result, X̄ is both an unbiased and consistent estimator of μ.
Consistent Estimators
The fact that the sample mean is consistent for the population mean, or
converges in probability to the population mean, is sometimes referred to as a
law of large numbers.
This provides the theoretical justification for the averaging process that many
employ to obtain precision in measurements. For example, an experimenter
may take the average of the weights of many animals to obtain a precise
estimate of the average weight of animals in a species.
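A sketch of the law of large numbers using the fast-food population from earlier (the replication count is an arbitrary choice): the average absolute error of X̄ shrinks as N grows.

```python
import random

random.seed(3)

values, probs = [0, 1, 2, 3], [0.5, 0.2, 0.2, 0.1]  # fast-food population
mu = 0.9                                            # its true mean

def mean_abs_error(N, reps=200):
    # Average distance of X-bar from mu over many samples of size N.
    total = 0.0
    for _ in range(reps):
        draws = random.choices(values, weights=probs, k=N)
        total += abs(sum(draws) / N - mu)
    return total / reps

errors = [mean_abs_error(N) for N in (10, 100, 1000)]  # decreasing in N
```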
Consistent Estimators
Is P a consistent estimator of π? Is the average response in our fast-food
example, R̄, a consistent estimator of μ?
Because proportions are just disguised means, it follows that P is also an
unbiased and consistent estimator of π.
In terms of our fast-food example, we saw that the estimator R̄ concentrated
around μ_R = 0.40, which is far below the target μ = 0.90. Thus, R̄ is
inconsistent.
Asymptotically Unbiased Estimators
An asymptotically unbiased estimator has a bias that tends to zero as sample
size N increases. If its variance also tends to zero, then the estimator is
consistent.
Although the MSD estimator is a biased estimator of the population variance
σ², is it asymptotically unbiased?

    Mean Squared Deviation (MSD) = (1/N) Σᵢ₌₁ᴺ (Xᵢ − X̄)²

Recall from last time that the sample variance is an unbiased estimator of the
population variance:

    s² = (1/(N − 1)) Σᵢ₌₁ᴺ (Xᵢ − X̄)²
Asymptotically Unbiased Estimators
We can write the MSD in terms of the unbiased s²:

    MSD = ((N − 1)/N) s² = (1 − 1/N) s²

    E(MSD) = (1 − 1/N) E(s²)
           = (1 − 1/N) σ²
           = σ² − (1/N) σ²

Since 1/N tends to zero as N increases, the bias tends to zero. As a result, the
MSD is biased but asymptotically unbiased.
It can also be shown that the variance of the MSD approaches zero as the
sample size increases. As a result, the MSD is a consistent estimator of the
population variance.
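A simulation sketch of the vanishing bias (a normal population with σ² = 4 is an assumed choice; the MSD is the N-divisor variance, which `statistics.pvariance` computes):

```python
import random
import statistics

random.seed(11)

sigma2 = 4.0  # assumed true population variance

def expected_msd(N, reps=3000):
    # Monte Carlo estimate of E(MSD) at sample size N.
    total = 0.0
    for _ in range(reps):
        x = [random.gauss(0.0, 2.0) for _ in range(N)]
        total += statistics.pvariance(x)  # divides by N: the MSD
    return total / reps

bias_small = expected_msd(5) - sigma2    # roughly -sigma2/5 = -0.8
bias_large = expected_msd(200) - sigma2  # roughly -sigma2/200: nearly gone
```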
Asymptotic Eciency
Asymptotic efficiency can be thought of as efficiency as N → ∞.
It is intuitive to think of this as the speed with which θ̂ collapses on θ.
All else equal, we prefer an estimator that does so faster (i.e. for smaller
sample sizes) rather than more slowly.
General Issues
We prefer estimators that have desirable small-sample properties:
We prefer unbiased to consistent estimators, and
we prefer efficient to asymptotically efficient ones.
But...
We can't always figure out the small-sample properties of certain
estimators, and/or
our estimators with desirable small-sample properties may have other
problems (e.g. computational cost).
As a result, we often have to choose among estimators that differ in their
degree of desirable properties.
Properties of Point Estimators
To go to the Properties of Point Estimators applet, click here
Some Common Unbiased Point Estimators
Table: Expected Values and Standard Errors of Common Point Estimators

Target Parameter θ   Sample Size(s)   Point Estimator θ̂   E(θ̂)      Standard Error σ_θ̂
μ                    N                X̄                   μ          σ/√N
π                    N                P = X/N             π          √(π(1 − π)/N)
μ₁ − μ₂              N₁ and N₂        X̄₁ − X̄₂             μ₁ − μ₂    √(σ₁²/N₁ + σ₂²/N₂)
π₁ − π₂              N₁ and N₂        P₁ − P₂             π₁ − π₂    √(π₁(1 − π₁)/N₁ + π₂(1 − π₂)/N₂)

The difference in means and difference in proportions assume that the random
samples are independent.
All four estimators in the table possess sampling distributions that are
approximately normal for large samples.
Interval Estimators and Condence Intervals
An interval estimator is a rule specifying the method for using the sample
measurements to calculate two numbers that form the endpoints of an interval.
Ideally, the resulting interval will have two properties.
1. It should contain the target parameter θ.
2. It should be as narrow as possible.
The length and location of the interval are random variables, and we cannot be
certain that a (fixed) target parameter will fall in the interval calculated from a
single sample.
We want to find an interval estimator capable of generating narrow intervals
that have a high probability of enclosing θ.
Interval Estimators and Condence Intervals
Interval estimators are more commonly called confidence intervals.
The upper and lower end points of a confidence interval are called the upper
and lower confidence limits (bounds).
The probability that a (random) confidence interval will enclose θ (a fixed
quantity) is called the confidence coefficient.
The confidence coefficient identifies the fraction of the time, in repeated
sampling, that the intervals constructed will contain the target parameter θ.
If the confidence coefficient associated with our estimator is high, then we can
be highly confident that any confidence interval, constructed by using the
results from a single sample, will enclose θ.
Interval Estimators and Condence Intervals
Suppose that θ̂_L and θ̂_U are the (random) lower and upper confidence limits,
respectively, for a parameter θ. Then if

    Pr(θ̂_L ≤ θ ≤ θ̂_U) = 1 − α

the probability (1 − α) is the confidence coefficient (or level of confidence).
The resulting random interval defined by [θ̂_L, θ̂_U] is called a two-sided
confidence interval.
The value of 1 − α is something that is determined by the researcher, and is
usually set with an eye to whether she is more concerned with the parameter
being in the confidence interval, or with the relative precision of the interval
estimate.
Interval Estimators and Condence Intervals
It is also possible to form a lower one-sided confidence interval such that

    Pr(θ̂_L ≤ θ) = 1 − α

The implied confidence interval here is [θ̂_L, ∞).
Similarly, we could have what is called an upper one-sided confidence interval
such that

    Pr(θ ≤ θ̂_U) = 1 − α

The implied confidence interval here is (−∞, θ̂_U].
Interval Estimators and Condence Intervals
One method for finding confidence intervals is called the pivotal method.
To use this method, we must have a pivotal quantity that possesses two
characteristics:
1. It is a function of the sample measurements and the unknown parameter
θ, where θ is the only unknown quantity.
2. Its probability distribution does not depend on the parameter θ.
If an estimator has these characteristics, then (as we'll discuss below) we can
use simple linear transformations to construct confidence intervals.
Large-Sample Condence Intervals
As we noted previously, the sampling distribution of a mean (or any sum of a
sufficiently large number of independent random variables) follows a normal
distribution.
Our typical estimator of μ, denoted X̄, can be thought of as being normally
distributed:

    X̄ ~ N(μ, σ²_X̄)

where we define σ²_X̄ = σ²/N, and σ² is just the variance of X.
Large-Sample Condence Intervals
To use the pivotal method, we must have a pivotal quantity that possesses two
characteristics:
1. It is a function of the sample measurements and the unknown parameter
μ, where μ is the only unknown quantity.
2. Its probability distribution does not depend on the parameter μ.
With respect to these two criteria:
1. The sample mean X̄ depends only on the values of X in the sample, and
on the value of μ.
2. The shape of its sampling distribution does not depend on μ, but only on
other things (like the size of the sample).
Large-Sample Condence Intervals
To construct a confidence interval, we can start with the sample statistic X̄.
Since we know that E(X̄) = μ, it makes sense to use the sample value X̄ as
the center or pivot of our confidence interval.
Next, we choose a level of confidence. Tradition suggests that we set
1 − α = 0.95 (a 95 percent level of confidence), though there's nothing
special about this number.
This means that we want to create a confidence interval such that

    Pr(X̄_L ≤ μ ≤ X̄_U) = 0.95
Large-Sample Condence Intervals
One way of calculating the bounds of the confidence interval is to choose X̄_L
and X̄_U so that

    Pr(μ < X̄_L) = ∫_{−∞}^{X̄_L} f_X̄(u) du = 0.025

and

    Pr(μ > X̄_U) = ∫_{X̄_U}^{∞} f_X̄(u) du = 0.025.

Since we know the parameters of X̄ (that is, the distribution is N(μ, σ²_X̄)),
calculating values for the upper and lower limits of a confidence interval is
straightforward.
Large-Sample Condence Intervals
More generally, for any sample statistic θ̂ which is an estimator of θ (where θ
might be μ, π, μ₁ − μ₂, or π₁ − π₂) and whose sampling distribution is normal
(in large samples), the statistic

    Z = (θ̂ − θ) / σ_θ̂

is distributed according to a standard normal distribution.
As a result, Z forms (at least approximately) a pivotal quantity: it is a
function of the sample measurements θ̂ and a single unknown parameter θ, and
the standard normal distribution does not depend on θ.
Large-Sample Condence Intervals
We can consider two values in the tails of that standard normal distribution,
−z_{α/2} and z_{α/2}, such that

    Pr(−z_{α/2} ≤ Z ≤ z_{α/2}) = 1 − α.

We can rewrite this as

    1 − α = Pr(−z_{α/2} ≤ (θ̂ − θ)/σ_θ̂ ≤ z_{α/2})
          = Pr(−z_{α/2} σ_θ̂ ≤ θ̂ − θ ≤ z_{α/2} σ_θ̂)
          = Pr(θ̂ − z_{α/2} σ_θ̂ ≤ θ ≤ θ̂ + z_{α/2} σ_θ̂)
Large-Sample Condence Intervals
Figure: Location of −z_{α/2} and z_{α/2}
[Figure: standard normal density with area α/2 in each tail beyond −z_{α/2} and z_{α/2}, and area 1 − α between them.]
This means that a (1 − α) × 100-percent confidence interval for θ is given by

    [θ̂_L, θ̂_U] = [θ̂ − z_{α/2} σ_θ̂, θ̂ + z_{α/2} σ_θ̂]
Large-Sample Condence Intervals
Thus, constructing a confidence interval for a variable whose (asymptotic)
sampling distribution is normal consists of five steps:
1. Select your level of confidence 1 − α.
2. Calculate the sample statistic θ̂.
3. Calculate the z-value associated with the 1 − α level of confidence.
4. Multiply that z-value by σ_θ̂, the standard error of the sampling statistic.
5. Construct the confidence interval according to

    [θ̂_L, θ̂_U] = [θ̂ − z_{α/2} σ_θ̂, θ̂ + z_{α/2} σ_θ̂].
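The five steps can be sketched as a small helper (the z-values are the standard table entries used in the examples that follow):

```python
# z_{alpha/2} for common confidence levels, from the standard normal table.
Z_TABLE = {0.90: 1.645, 0.95: 1.96, 0.99: 2.58}

def confidence_interval(theta_hat, se, level=0.95):
    # Steps 3-5: look up z, scale by the standard error, build the interval.
    z = Z_TABLE[level]
    return (theta_hat - z * se, theta_hat + z * se)

# Steps 1-2 for the supermarket example that follows:
# X-bar = 33, s = 16, N = 64, so the estimated standard error is s/sqrt(N).
lo, hi = confidence_interval(33.0, 16 / 64 ** 0.5, level=0.90)
```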
Large-Sample Condence Intervals: Mean
Example (Mean): The shopping times of N = 64 randomly selected customers
at a supermarket were recorded. The average and variance of the 64 shopping
times were 33 and 256, respectively. Estimate μ, the true average shopping
time per customer, with a confidence coefficient of 1 − α = 0.90, i.e. a 90%
confidence interval.
In this case, we are interested in target parameter θ = μ. Thus, θ̂ = X̄ = 33
and s² = 256 for a sample of N = 64. The population variance σ² is unknown,
so we will use s² as its estimated value.
The confidence interval θ̂ ± z_{α/2} σ_θ̂ has the form

    X̄ ± z_{α/2} (σ/√N) ≈ X̄ ± z_{α/2} (s/√N)
Large-Sample Condence Intervals: Mean
If we use a standard normal distribution table, we can find that
z_{α/2} = z_{0.05} = 1.645.
Thus, the confidence limits are

    X̄ − z_{α/2} (s/√N) = 33 − 1.645 (16/8) = 29.71
    X̄ + z_{α/2} (s/√N) = 33 + 1.645 (16/8) = 36.29

In other words, our confidence interval for μ is [29.71, 36.29].
Interpreting Condence Intervals
Our confidence interval for μ is [29.71, 36.29]. What does this mean?
It is very important to remember that this 90% confidence interval does NOT
mean that there is a 90% chance that the true population mean μ is in this
interval.
The population mean μ is a fixed constant and is either in the confidence
interval or it is not.
Interpreting Condence Intervals
The correct interpretation is that over a large number of repeated samples,
approximately 90% of all intervals of the form X̄ ± 1.645 (s/√N) will include μ,
the true population mean.
Although we do not know whether the particular interval [29.71, 36.29] that we
have calculated from our sample contains μ, the procedure that generated it
yields intervals that do capture the true mean in approximately 90% of all
instances where the procedure is used.
This is why we sometimes say that we are "90% confident" that the interval
contains the target parameter.
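The repeated-sampling interpretation can be checked by simulation (the population is assumed normal with μ = 33 and σ = 16 to match the example; the replication count is arbitrary):

```python
import random

random.seed(5)

mu, sigma, N, z = 33.0, 16.0, 64, 1.645  # 90% level

def interval_covers_mu():
    # Draw one sample, build the 90% interval from it, and check coverage.
    x = [random.gauss(mu, sigma) for _ in range(N)]
    xbar = sum(x) / N
    s = (sum((xi - xbar) ** 2 for xi in x) / (N - 1)) ** 0.5
    half = z * s / N ** 0.5
    return xbar - half <= mu <= xbar + half

coverage = sum(interval_covers_mu() for _ in range(5000)) / 5000  # near 0.90
```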
Large-Sample Condence Intervals: Mean
What if we wanted a confidence coefficient of 1 − α = 0.95, i.e. a 95%
confidence interval?
If we use a standard normal distribution table, we can find that
z_{α/2} = z_{0.025} = 1.96.
Thus, the confidence limits are

    X̄ − z_{α/2} (s/√N) = 33 − 1.96 (16/8) = 29.08
    X̄ + z_{α/2} (s/√N) = 33 + 1.96 (16/8) = 36.92

In other words, our 95% confidence interval for μ is [29.08, 36.92].
Large-Sample Condence Intervals: Mean
What if we wanted a confidence coefficient of 1 − α = 0.99, i.e. a 99%
confidence interval?
If we use a standard normal distribution table, we can find that
z_{α/2} = z_{0.005} = 2.58.
Thus, the confidence limits are

    X̄ − z_{α/2} (s/√N) = 33 − 2.58 (16/8) = 27.84
    X̄ + z_{α/2} (s/√N) = 33 + 2.58 (16/8) = 38.16

In other words, our 99% confidence interval for μ is [27.84, 38.16].
Large-Sample Condence Intervals: Proportions
As we noted previously, for π sufficiently different from either zero or one, and
for N sufficiently large, the sampling distribution of P is N(π, σ²_P).
That means that we can calculate confidence intervals for an estimated
proportion as

    P_L = P − z_{α/2} √(P(1 − P)/N)

and

    P_U = P + z_{α/2} √(P(1 − P)/N)
Large-Sample Condence Intervals: Proportions
Example: Suppose that we have a sample of size 20, and P = 0.390. The
lower bound of the associated 95% confidence interval is

    P_L = 0.390 − 1.96 √(0.39 × 0.61 / 20)
        = 0.390 − 0.214
        = 0.176

while the upper bound is

    P_U = 0.390 + 1.96 √(0.39 × 0.61 / 20)
        = 0.390 + 0.214
        = 0.604
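The same arithmetic as a small helper (a sketch of the large-sample interval used above):

```python
def proportion_ci(p, n, z=1.96):
    # P +/- z * sqrt(P(1 - P)/N), the large-sample interval from the text.
    half = z * (p * (1 - p) / n) ** 0.5
    return (p - half, p + half)

lo, hi = proportion_ci(0.390, 20)  # the N = 20 example above
```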
Large-Sample Condence Intervals: Proportions
Figure: Confidence Intervals for P = π̂ for N = 20 (black dashes), N = 100
(red dashes), and N = 400 (green dashes)
[Figure: interval bounds plotted against π̂ over [0, 1] for the three sample sizes, with the intervals narrowing as N grows.]
The confidence interval for a proportion is a straightforward function of two
quantities: the estimated proportion P = π̂, and the sample size N.
Large-Sample Confidence Intervals: Proportions
To go to ConfidenceIntervalP under Estimation to illustrate how confidence
intervals work, click here.
Difference in Proportions
Example (Difference in Proportions): Two brands of refrigerators, A and B, are
each guaranteed for 1 year. In a random sample of 50 refrigerators of brand A,
12 were observed to fail before the guarantee period ended. An independent
random sample of 60 brand B refrigerators also revealed 12 failures during the
guarantee period. Estimate the true difference (π1 − π2) between proportions
of failures during the guarantee period with a confidence coefficient of
approximately 0.98.
The confidence interval θ̂ ± z_{α/2}σ_θ̂ has the form
(P1 − P2) ± z_{α/2}√(P1(1 − P1)/N1 + P2(1 − P2)/N2)
Difference in Proportions
We have P1 = 0.24, 1 − P1 = 0.76, P2 = 0.20, and 1 − P2 = 0.80, and
z_{0.01} = 2.33.
Thus, the desired 98% confidence interval is
(0.24 − 0.20) ± 2.33√((0.24)(0.76)/50 + (0.20)(0.80)/60)
0.04 ± 0.1851, or [−0.1451, 0.2251]
Difference in Proportions
The 98% confidence interval is [−0.1451, 0.2251].
Notice that the confidence interval contains 0. Thus, a zero value for the
difference in proportions is believable (at approximately the 98% level) on
the basis of the observed data.
But of course the interval also contains the value 0.1, and so 0.1 represents
another value for the difference in proportions that is believable, etc.
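A Python sketch of the refrigerator calculation (illustrative; P1 = 12/50 and P2 = 12/60 are taken from the example):

```python
from math import sqrt

def diff_prop_ci(p1, n1, p2, n2, z):
    # CI for pi1 - pi2: (P1 - P2) +/- z * sqrt(P1(1-P1)/N1 + P2(1-P2)/N2)
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - half, diff + half

# Refrigerator example: P1 = 12/50 = 0.24, P2 = 12/60 = 0.20, z_{0.01} = 2.33
lo, hi = diff_prop_ci(0.24, 50, 0.20, 60, 2.33)
print(round(lo, 3), round(hi, 3))  # -0.145 0.225
```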
Selecting the Sample Size
There are two considerations in choosing the appropriate sample size for
estimating μ using a confidence interval.
1 The tolerable error. This establishes the desired width of the confidence
interval.
2 The confidence level that should be selected.
A wide confidence interval would not be very informative, but the cost of
obtaining a narrow confidence interval could be quite large.
Similarly, too low a confidence level would mean that the stated confidence
interval is likely to be in error, but obtaining a higher level of confidence might
be quite expensive.
Selecting the Sample Size
Suppose we wish to estimate the average daily yield μ of a chemical, and we
wish the error of estimation to be less than 5 tons with probability 0.95.
Because approximately 95% of the sample means will lie within 2σ_X̄ (really
1.96σ_X̄) of μ in repeated sampling, we are asking that 2σ_X̄ equal 5 tons.
2σ/√N = 5
N = 4σ²/25
Selecting the Sample Size
We cannot obtain an exact numerical value of N unless the population
standard deviation σ is known.
We could use an estimate s obtained from a previous sample. Let's say that
σ = 21.
N = (4)(21)²/25 = 70.56 ≈ 71
Thus, using a sample size N = 71, we can be 95% confident that our
estimate will lie within 5 tons of the true average daily yield.
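The rounding-up step matters: N must be an integer at least as large as 4σ²/25. A quick Python sketch (illustrative; it uses the slides' approximation of 1.96 by 2):

```python
from math import ceil

def sample_size(sigma, error, z=2.0):
    # Smallest N with z * sigma / sqrt(N) <= error; slides round 1.96 up to 2
    return ceil((z * sigma / error) ** 2)

print(sample_size(21, 5))  # 71
```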
Small-Sample Confidence Intervals
The formula for calculating large-sample confidence intervals is
θ̂ ± z_{α/2}σ_θ̂
When θ = μ is the target parameter, then θ̂ = X̄ and σ²_θ̂ = σ²/N, where σ²
is the population variance.
If the true value of σ² is known, then this value should be used when
calculating the confidence interval.
However, if σ² is unknown (as will almost always be the case) and N is large,
then there is no real loss of accuracy if s² is substituted for σ² (recall that s²
converges to σ² as N increases).
As a result, we can use the standard normal distribution in these circumstances
as well.
Small-Sample Confidence Intervals
Problems only arise if σ² is unknown AND N is small.
In this case, we will need to calculate small-sample confidence intervals.
In effect, using s instead of σ introduces an additional source of unreliability
into our calculations and we must, therefore, widen the confidence intervals.
Small-Sample Confidence Intervals: Mean
In terms of the population mean, we have already seen that
Z = (X̄ − μ)/(σ/√N)
possesses approximately a standard normal distribution.
Well, if we substitute in s for σ, we have
T = (X̄ − μ)/(s/√N)
which has a t distribution with (N − 1) degrees of freedom.
Small-Sample Confidence Intervals: Mean
The quantity T now serves as a pivotal quantity that we will use to form
confidence intervals for μ.
We can use a t distribution table to find values −t_{N−1,α/2} and t_{N−1,α/2} so that
P(−t_{N−1,α/2} ≤ T ≤ t_{N−1,α/2}) = 1 − α
Thus, we will now construct our confidence intervals according to:
[X̄_L, X̄_U] = X̄ ± t_{N−1,α/2}(s/√N)
Student's t-Distribution
Figure: Standard Normal and Student's t-Distributions
The confidence intervals constructed using the z distribution and the t
distribution are effectively the same when the degrees of freedom (N − 1) are
greater than 120; they are also very close as soon as the degrees of freedom
(N − 1) are greater than 30.
Student's t-Distribution
To go to Comparison of Student's t and Normal Distributions under
Distributions Related to the Normal, click here.
Small-Sample Confidence Intervals: Mean
Technically, the small-sample confidence intervals for the mean are based on
the assumption that the sample is randomly drawn from a normal population.
However, experimental evidence has shown that the interval for a single mean
is quite robust in relation to moderate departures from normality.
Small-Sample Confidence Intervals: Mean
Example: A manufacturer of gunpowder has developed a new powder, which
was tested in eight shells. The resulting muzzle velocities were: 3005, 2925,
2935, 2965, 2995, 3005, 2937, 2905. Find a 95% confidence interval for the
true average velocity μ for shells of this type. Assume that muzzle velocities
are approximately normally distributed.
The confidence interval for μ is
X̄ ± t_{N−1,α/2}(s/√N)
Small-Sample Confidence Intervals: Mean
For the given data, X̄ = 2959 and s = 39.1.
Using the table for the t-distribution, we have t_{7,0.025} = 2.365.
Thus, we have
2959 ± 2.365(39.1/√8), or 2959 ± 32.7
as the observed confidence interval for μ.
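The Stata output below verifies this; the same numbers can also be reproduced in Python (an illustrative sketch that plugs in the table value t_{7,0.025} = 2.365 rather than computing it):

```python
from math import sqrt

velocities = [3005, 2925, 2935, 2965, 2995, 3005, 2937, 2905]
n = len(velocities)
xbar = sum(velocities) / n
# Sample standard deviation (divide the sum of squares by N - 1)
s = sqrt(sum((x - xbar) ** 2 for x in velocities) / (n - 1))

t_crit = 2.365  # t_{7, 0.025} from a t table
half = t_crit * s / sqrt(n)
lo, hi = xbar - half, xbar + half
print(round(xbar, 1), round(s, 1))  # 2959.0 39.1
print(round(lo, 2), round(hi, 2))   # 2926.32 2991.68
```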
Small-Sample Confidence Intervals: Mean
. sum muzzle_velocity
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
muzzle_vel~y | 8 2959 39.08964 2905 3005
. ci muzzle_velocity, level(95)
Variable | Obs Mean Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
muzzle_vel~y | 8 2959 13.82027 2926.32 2991.68
. ci muzzle_velocity, level(99)
Variable | Obs Mean Std. Err. [99% Conf. Interval]
-------------+---------------------------------------------------------------
muzzle_vel~y | 8 2959 13.82027 2910.636 3007.364
Small-Sample Confidence Intervals: Mean
Example: From a large class, a random sample of 4 grades was drawn: 64, 66,
89, and 77. Calculate a 95% confidence interval for the whole class mean μ.
Assume that the class grades are approximately normally distributed.
Table: Small-Sample Confidence Interval for a Mean
X                  (X − X̄)    (X − X̄)²
64                 −10         100
66                 −8          64
89                 15          225
77                 3           9
X̄ = 296/4 = 74    0           s² = 398/3 = 132.7
Small-Sample Confidence Intervals: Mean
The confidence interval for μ is
X̄ ± t_{N−1,α/2}(s/√N)
For the given data, X̄ = 74 and s = √132.7.
In this example, we have N − 1 = 3 degrees of freedom.
Using the table for the t distribution, we have t_{3,0.025} = 3.18.
Small-Sample Confidence Intervals: Mean
Thus, we have
74 ± 3.18(√132.7/√4), or 74 ± 18
as the observed confidence interval for μ.
That is, with 95% confidence, we can conclude that the mean grade of the
whole class is between 56 and 92.
Difference in Means
Suppose we are interested in comparing the means of two normal populations,
one with mean μ1 and variance σ1² and the other with mean μ2 and
variance σ2².
If the samples are independent, then confidence intervals for μ1 − μ2 based on
a t-distributed random variable can be constructed if we assume that the two
populations have a common but unknown variance, σ1² = σ2² = σ².
If X̄1 and X̄2 are the two sample means, then the large-sample confidence
interval for (μ1 − μ2) is developed by using
Z = [(X̄1 − X̄2) − (μ1 − μ2)] / √(σ1²/N1 + σ2²/N2)
as a pivotal quantity.
Small-Sample Confidence Intervals: Difference in Means
Using the assumption σ1² = σ2² = σ²,
Z = [(X̄1 − X̄2) − (μ1 − μ2)] / [σ√(1/N1 + 1/N2)]
Because σ is unknown, though, we need to find an estimator for the common
variance σ² so that we can construct a quantity with a t distribution.
Small-Sample Confidence Intervals: Difference in Means
Let X11, X12, . . . , X1N1 denote the random sample of size N1 from the first
population and let X21, X22, . . . , X2N2 denote an independent random sample
of size N2 from the second population. Then we have
X̄1 = (1/N1) Σ_{i=1}^{N1} X1i
and
X̄2 = (1/N2) Σ_{i=1}^{N2} X2i
Difference in Means
The usual unbiased estimator of the common variance σ² is obtained by
pooling the sample data to obtain the pooled estimator sp²:
sp² = [Σ_{i=1}^{N1}(X1i − X̄1)² + Σ_{i=1}^{N2}(X2i − X̄2)²] / [(N1 − 1) + (N2 − 1)]
    = [(N1 − 1)s1² + (N2 − 1)s2²] / (N1 + N2 − 2)
where si² is the sample variance from the i-th sample, i = 1, 2.
Notice that if N1 = N2, then sp² is just the average of s1² and s2².
If N1 ≠ N2, then sp² is the weighted average of s1² and s2², with larger weight
given to the sample variance associated with the larger sample size.
Difference in Means
From all of this we can calculate the following pivotal quantity:
T = [(X̄1 − X̄2) − (μ1 − μ2)] / [sp√(1/N1 + 1/N2)]
This quantity has a t distribution with (N1 + N2 − 2) degrees of freedom.
If we use the pivotal method, we find that the small-sample confidence interval
for (μ1 − μ2) is just
(X̄1 − X̄2) ± t_{N1+N2−2,α/2} sp√(1/N1 + 1/N2)
Difference in Means
Technically, the small-sample confidence intervals for the difference in two
means are based on the assumptions that the samples are randomly drawn from
two independent and normal populations with equal variances.
Experimental evidence has shown that these intervals are robust to moderate
departures from normality and to the assumption of equal population variances
if N1 ≈ N2.
As N1 and N2 become dissimilar, the assumption of equal population variances
becomes more crucial.
Difference in Means
Example: Suppose we want to compare two methods for training people. At
the end of the training, two groups of nine employees are timed at some task.
The nine people who had the standard training had times of 32, 37, 35, 28, 41,
44, 35, 31, and 34. The nine people who had the new training had times of 35,
31, 29, 25, 34, 40, 27, 32, and 31. Estimate the true mean difference (μ1 − μ2)
with confidence coefficient 0.95.
Assume that the assembly times are approximately normally distributed, that
the variances of the assembly times are approximately equal for the two
methods, and that the samples are independent.
Difference in Means
For the standard training method, we have sample mean X̄1 = 35.22 and
sample variance s1² = Σ_{i=1}^{9}(X1i − X̄1)²/(N1 − 1) = 195.56/8 = 24.445.
For the new training method, we have sample mean X̄2 = 31.56 and sample
variance s2² = Σ_{i=1}^{9}(X2i − X̄2)²/(N2 − 1) = 160.22/8 = 20.027.
As a result, we have
sp² = [8(24.445) + 8(20.027)] / (9 + 9 − 2) = (195.56 + 160.22)/16 = 22.236
sp = 4.716
Difference in Means
Since t_{16,0.025} = 2.120, the observed confidence interval is
(X̄1 − X̄2) ± t_{N1+N2−2,α/2} sp√(1/N1 + 1/N2)
(35.22 − 31.56) ± (2.120)(4.716)√(1/9 + 1/9)
3.66 ± 4.71
This confidence interval can be written as [−1.05, 8.37].
Since the interval contains both positive and negative numbers, we cannot say
that the new training method differs from the other at our given level of
confidence.
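A Python sketch of the pooled-variance interval (illustrative; the slides' [−1.05, 8.37] comes from rounding 3.66 and 4.71 before adding, so the unrounded upper bound is 8.38):

```python
from math import sqrt

def pooled_t_ci(x1, x2, t_crit):
    # (xbar1 - xbar2) +/- t * sp * sqrt(1/N1 + 1/N2), with pooled variance sp^2
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((v - m1) ** 2 for v in x1)
    ss2 = sum((v - m2) ** 2 for v in x2)
    sp = sqrt((ss1 + ss2) / (n1 + n2 - 2))
    half = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = m1 - m2
    return diff - half, diff + half

standard = [32, 37, 35, 28, 41, 44, 35, 31, 34]
new      = [35, 31, 29, 25, 34, 40, 27, 32, 31]
lo, hi = pooled_t_ci(standard, new, 2.120)  # t_{16, 0.025} = 2.120
print(round(lo, 2), round(hi, 2))  # -1.05 8.38
```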
Difference in Means
Example: From a large class, a sample of 4 grades was drawn: 64, 66, 89, and
77. From a second large class, an independent sample of 3 grades was drawn:
56, 71, and 53. Calculate the 95% confidence interval for the difference
between the two class means, μ1 − μ2. Assume that the grades from both
classes are approximately normally distributed and that the variances of the
grades are approximately equal for the two classes.
Difference in Means
Table: Difference in Two Means (Independent Samples): Class 1
X1                  (X1 − X̄1)   (X1 − X̄1)²
64                  −10          100
66                  −8           64
89                  15           225
77                  3            9
X̄1 = 296/4 = 74    0            s1² = 398/3 = 132.7
Table: Difference in Two Means (Independent Samples): Class 2
X2                  (X2 − X̄2)   (X2 − X̄2)²
56                  −4           16
71                  11           121
53                  −7           49
X̄2 = 180/3 = 60    0            s2² = 186/2 = 93
Difference in Means
Class 1: The sample mean is X̄1 = 74 and the sample variance is
s1² = Σ_{i=1}^{4}(X1i − X̄1)²/(N1 − 1) = 398/3 = 132.7.
Class 2: The sample mean is X̄2 = 60 and the sample variance is
s2² = Σ_{i=1}^{3}(X2i − X̄2)²/(N2 − 1) = 186/2 = 93.
sp² = [3(132.7) + 2(93)] / (4 + 3 − 2) = (398 + 186)/5 = 116.8 ≈ 117
sp = √117
Difference in Means
We can find that t_{5,0.025} = 2.57. The observed confidence interval is therefore
(X̄1 − X̄2) ± t_{N1+N2−2,α/2} sp√(1/N1 + 1/N2)
(74 − 60) ± (2.57)√117 √(1/4 + 1/3)
14 ± 21
This confidence interval can be written as [−7, 35].
Since the interval contains both positive and negative numbers, we cannot say
that the mean grades differ from one class to the other at our given level of
confidence.
Difference in Means (Dependent or Matched Samples)
We might also want to compare means across dependent samples. Dependent
samples are sometimes called matched or paired samples.
Suppose that we want to compare the fall grades and spring grades for the
same students.
Table: Difference in Two Means (Dependent Samples)
           Observed Grades    Difference
Name       X1     X2          D = X1 − X2    D − D̄    (D − D̄)²
Trimble    64     57          7              −4        16
Wilde      66     57          9              −2        4
Giannos    89     73          16             5         25
Ames       77     65          12             1         1
                              D̄ = 44/4 = 11  0        s²_D = 46/3 = 15.3
Difference in Means (Dependent or Matched Samples)
We can use the sample mean difference D̄ to construct a confidence interval
for the average population difference μ_D.
The confidence interval for μ_D in a matched-pair sample is
μ_D = D̄ ± t_{N−1,α/2}(s_D/√N)
Suppose we want to construct a 95% confidence interval.
Difference in Means (Dependent or Matched Samples)
D̄ = 11 and s_D = √15.3.
Using the table for the t distribution, we have t_{3,0.025} = 3.18.
Thus, we have
11 ± 3.18(√15.3/√4), or 11 ± 6
as the observed confidence interval for μ_D.
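A Python sketch of the matched-pair interval (illustrative; the slides round s²_D to 15.3 and the half-width to 6, so the exact endpoints are about [4.8, 17.2] rather than [5, 17]):

```python
from math import sqrt

fall   = [64, 66, 89, 77]
spring = [57, 57, 73, 65]
diffs = [a - b for a, b in zip(fall, spring)]  # [7, 9, 16, 12]
n = len(diffs)
dbar = sum(diffs) / n
sd = sqrt(sum((d - dbar) ** 2 for d in diffs) / (n - 1))
half = 3.18 * sd / sqrt(n)  # t_{3, 0.025} = 3.18
print(round(dbar, 1), round(dbar - half, 1), round(dbar + half, 1))  # 11.0 4.8 17.2
```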
Difference in Means (Dependent or Matched Samples)
We are estimating the same parameter (the difference in two population
means) with the dependent samples as we did with the independent samples.
The matched-pair approach is much better because it has a smaller confidence
interval. Why?
Independent samples confidence interval was [−7, 35].
Dependent samples confidence interval for the same data was just [5, 17].
Essentially, pairing achieves a match that keeps many of the extraneous
variables that might affect our results constant.
Overview
Example: To measure the effect of a fitness campaign, a ski club randomly
sampled five members before the campaign and another five afterwards. The
weights were as follows:
Before: JH 168, KL 195, MM 155, TR 183, MT 169
After: LW 183, VG 177, EP 148, JC 162, MW 180
Calculate a 95% confidence interval for (i) the mean weight before the
campaign, (ii) the mean weight after the campaign, and (iii) the mean weight
loss during the campaign.
Overview
Table: Small-Sample Confidence Interval for Difference in Two Means
(Independent Samples)
Before                                         After
X1    (X1 − X̄1)   (X1 − X̄1)²                X2    (X2 − X̄2)   (X2 − X̄2)²
168   −6           36                          183   13           169
195   21           441                         177   7            49
155   −19          361                         148   −22          484
183   9            81                          162   −8           64
169   −5           25                          180   10           100
X̄1 = 870/5 = 174   0    944                  X̄2 = 850/5 = 170   0    866
Overview
μ1 = 174 ± 2.78(√(944/4)/√5) = 174 ± 19
μ2 = 170 ± 2.78(√(866/4)/√5) = 170 ± 18
μ1 − μ2 = (174 − 170) ± 2.31√((944 + 866)/(4 + 4))√(1/5 + 1/5) = 4 ± 22
Overview
It was then decided that a better sampling design would be to measure the
same people after as before.
KL 194, MT 160, TR 177, MM 147, JH 157
Table: Difference in Two Means (Dependent Samples)
        Weights        Difference
Name    X1     X2      D = X1 − X2    D − D̄    (D − D̄)²
JH      168    157     11             4         16
KL      195    194     1              −6        36
MM      155    147     8              1         1
TR      183    177     6              −1        1
MT      169    160     9              2         4
                       D̄ = 35/5 = 7   0        s²_D = 58/4 = 14.5
μ_D = 7 ± 2.78(√14.5/√5), or 7 ± 5
Confidence Interval for Population Variance σ²
As we've seen before, s² = Σ_{i=1}^{N}(Xi − X̄)²/(N − 1) is an unbiased
estimator of σ².
Given that it is distributed according to a gamma distribution, it can be
difficult to determine the probability of it lying in a specific interval.
But we can transform it (as we did before) into a quantity that has a χ²
distribution with N − 1 degrees of freedom:
χ² = (N − 1)s²/σ²
Confidence Interval for Population Variance σ²
As we saw before, this can be written as:
χ²_{N−1} = (N − 1)s²/σ² = Σ_{i=1}^{N}(Xi − X̄)²/σ²
This quantity becomes the pivotal quantity that allows us to calculate
confidence intervals for the population variance σ².
In effect, we want to find two numbers χ²_L and χ²_U such that
P(χ²_L ≤ Σ_{i=1}^{N}(Xi − X̄)²/σ² ≤ χ²_U) = 1 − α
for any confidence coefficient (1 − α).
Confidence Interval for Population Variance σ²
Figure: χ² Distribution with (N − 1) = 3 Degrees of Freedom
We would like to find the shortest interval that includes σ² with probability
(1 − α). This is difficult and requires trial and error.
Typically, we compromise by choosing points that cut off equal tail areas.
Confidence Interval for Population Variance σ²
Given the choice to cut off equal tail areas, we obtain
P(χ²_{N−1,1−(α/2)} ≤ Σ_{i=1}^{N}(Xi − X̄)²/σ² ≤ χ²_{N−1,α/2}) = 1 − α
When we reorder the inequality, we get
P((N − 1)s²/χ²_{N−1,α/2} ≤ σ² ≤ (N − 1)s²/χ²_{N−1,1−(α/2)}) = 1 − α
Thus, the 100(1 − α)% confidence interval for σ² is
[(N − 1)s²/χ²_{N−1,α/2}, (N − 1)s²/χ²_{N−1,1−(α/2)}]
Confidence Interval for Population Variance σ²
Technically, the small-sample confidence intervals for the population variance
σ² assume that the sampled population is normally distributed.
Unlike with small-sample confidence intervals for the population mean or
difference in population means, which are reasonably robust to deviations from
normality, experimental evidence suggests that the small-sample confidence
intervals for the population variance can be quite misleading if the sampled
population is not normally distributed.
Confidence Interval for Population Variance σ²
Example: Suppose we have a sample of observations with values 4.1, 5.2, and
10.2. Estimate σ² with confidence coefficient 0.90. Assume normality.
For the data, we have s² = 10.57.
We can see from the χ² distribution table that χ²_{2,0.95} = 0.103 and
χ²_{2,0.05} = 5.991.
Confidence Interval for Population Variance σ²
Thus, the 90% confidence interval for σ² is
[(N − 1)s²/χ²_{2,0.05}, (N − 1)s²/χ²_{2,0.95}]
[(2)(10.57)/5.991, (2)(10.57)/0.103]
[3.53, 205.24]
This confidence interval is very wide. Why?
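A Python sketch of the variance interval (illustrative; the χ² table values are taken from the slides rather than computed):

```python
data = [4.1, 5.2, 10.2]
n = len(data)
m = sum(data) / n
s2 = sum((x - m) ** 2 for x in data) / (n - 1)  # unbiased sample variance

# chi-square table values for 2 d.f., as given in the slides
chi_hi = 5.991  # chi2_{2, 0.05}
chi_lo = 0.103  # chi2_{2, 0.95}
lo, hi = (n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo
print(round(s2, 2))                # 10.57
print(round(lo, 2), round(hi, 2))  # 3.53 205.24
```

The interval is enormous because N = 3: with only two degrees of freedom, the χ² quantiles in the denominators are far apart.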
Comparing σ² in Two Populations
What if we want to compare σ² in two populations?
Rather than look at s1² − s2², which has a complicated sampling distribution,
we look at s1²/s2².
Let two independent random samples of sizes N1 and N2 be drawn from two
normal populations with variances σ1² and σ2².
Let the variances of the random samples be s1² and s2².
Comparing σ² in Two Populations
Notice that (N1 − 1)s1²/σ1² and (N2 − 1)s2²/σ2² are both independent χ²
random variables.
The ratio of two independent χ² random variables, each divided by its degrees
of freedom, has an F distribution. Here, that gives
F_{N1−1,N2−1} = [(N1 − 1)s1² / ((N1 − 1)σ1²)] / [(N2 − 1)s2² / ((N2 − 1)σ2²)]
             = (s1² σ2²)/(s2² σ1²)
which has an F distribution with (N1 − 1) numerator degrees of freedom and
(N2 − 1) denominator degrees of freedom.
This quantity acts as a pivotal quantity.
Comparing σ² in Two Populations
And so, we want to find:
P(F_{N1−1,N2−1,α/2} ≤ (s1² σ2²)/(s2² σ1²) ≤ F_{N1−1,N2−1,1−(α/2)}) = 1 − α
When we reorder the inequalities, we have
(1/F_{N1−1,N2−1,1−(α/2)})(s1²/s2²) ≤ σ1²/σ2² ≤ (1/F_{N1−1,N2−1,α/2})(s1²/s2²)
Thus, if we were to construct a 90% confidence interval for the ratio of two
population variances based on two sample variances where N1 = 10 and
N2 = 8, we would have
(1/F_{9,7,0.95})(s1²/s2²) ≤ σ1²/σ2² ≤ (1/F_{9,7,0.05})(s1²/s2²)
Comparing σ² in Two Populations
But how do you find 1/F_{9,7,0.95} and 1/F_{9,7,0.05}?
Most F distribution tables will only give you information related to the
right-hand tail of the distribution.
Thus, it is relatively straightforward to find that 1/F_{9,7,0.95} = 1/3.68.
But how do we find what 1/F_{9,7,0.05} is?
Comparing σ² in Two Populations
Recall from our discussion of probability distributions that if W1 and W2 are
independent and distributed χ²_k and χ²_ℓ, respectively, then
(W1/k)/(W2/ℓ) ~ F_{k,ℓ}
In other words, the ratio of two chi-squared variables, each divided by its
degrees of freedom, is distributed as F with d.f. equal to the numbers of d.f. in
the numerator and denominator variables.
We saw that this implied that:
If X ~ F(k, ℓ), then 1/X ~ F(ℓ, k) (because 1/X = (W2/ℓ)/(W1/k)).
Comparing σ² in Two Populations
Well, it follows from this that:
F_{N2−1,N1−1,1−(α/2)} = 1/F_{N1−1,N2−1,α/2}
Given this, we have
F_{9,7,0.05} = 1/F_{7,9,0.95} = 1/3.29 = 0.3
As a result, we can write the 90% confidence interval as
(1/3.68)(s1²/s2²) ≤ σ1²/σ2² ≤ (1/0.3)(s1²/s2²)
Comparing σ² in Two Populations
Example: Two samples of sizes 16 and 10 are drawn at random from two
normal populations. Suppose their sample variances are 25.2 and 20,
respectively. Find the (i) 98% and (ii) 90% confidence limits for the ratio of
the variances.
We need to calculate 1/F_{15,9,0.99} and 1/F_{15,9,0.01}.
Looking at the back of the book, we find that F_{15,9,0.99} = 4.96.
We also know that F_{15,9,0.01} = 1/F_{9,15,0.99} = 1/3.89.
Comparing σ² in Two Populations
Given this, we find for the 98% confidence interval that we have
(1/4.96)(25.2/20.0) ≤ σ1²/σ2² ≤ 3.89(25.2/20.0)
0.254 ≤ σ1²/σ2² ≤ 4.90
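A Python sketch of the 98% interval (illustrative; the F-table values are taken from the slides, and the reciprocal relationship F_{15,9,0.01} = 1/F_{9,15,0.99} supplies the lower-tail value):

```python
s1_sq, s2_sq = 25.2, 20.0
ratio = s1_sq / s2_sq  # 1.26

# F-table values given in the slides: F_{15,9,0.99} = 4.96 and F_{9,15,0.99} = 3.89
lo = ratio / 4.96  # divide by the upper-tail critical value
hi = ratio * 3.89  # multiply by F_{9,15,0.99}, since 1/F_{15,9,0.01} = F_{9,15,0.99}
print(round(lo, 3), round(hi, 2))  # 0.254 4.9
```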
Comparing σ² in Two Populations
We now need to find 1/F_{15,9,0.95} and 1/F_{15,9,0.05}.
Looking at the back of the book, we find that F_{15,9,0.95} = 3.01.
We also know that F_{15,9,0.05} = 1/F_{9,15,0.95} = 1/2.59.
Given this, we find for the 90% confidence interval that we have
(1/3.01)(25.2/20.0) ≤ σ1²/σ2² ≤ 2.59(25.2/20.0)
0.4186 ≤ σ1²/σ2² ≤ 3.263
Comparing σ² in Two Populations
Example: Find the 98% and 90% confidence limits for the ratio of the standard
deviations in the previous example.
By taking square roots of the inequalities in the previous example, we find the
98% confidence limits are
√0.254 ≤ σ1/σ2 ≤ √4.90
0.50 ≤ σ1/σ2 ≤ 2.21
and that the 90% confidence limits are
√0.4186 ≤ σ1/σ2 ≤ √3.263
0.65 ≤ σ1/σ2 ≤ 1.81
Stata
Let's return to Zorn's Warren & Burger Court data from last time.
One variable, constit, was coded one if the case was decided on
constitutional grounds, and zero otherwise.
The true population proportion is π = 0.2536; we'll use this as an example of
how we can learn about that parameter through the use of confidence intervals.
We'll begin by considering a rather small random sample of cases (N = 20),
and calculating the confidence interval for π based on that sample.
Stata
. use WarrenBurger
. su
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
us | 0
id | 7161 3581 2067.347 1 7161
amrev | 7161 .4319229 1.342633 0 33
amaff | 7161 .4099986 1.302139 0 37
sumam | 7161 .8419215 2.189712 0 39
-------------+--------------------------------------------------------
fedpet | 7161 .173998 .3791343 0 1
constit | 7161 .2535959 .4350993 0 1
sgam | 7161 .0786203 .269164 0 1
. sample 20, count
(7141 observations deleted)
Stata
. ci constit, level(95)
Variable | Obs Mean Std. Err. [95% Conf. Interval]
-------------+---------------------------------------------------------------
constit | 20 .2 .0917663 .0079309 .3920691
The confidence interval for this sample is [0.008, 0.392], which means that in
repeated random samples from this population, we would expect the true
population parameter to be contained in an interval constructed in this way
95% of the time.
Stata
To illustrate the idea of a confidence interval, we can do what we just did, say,
100 times, and then see how many of the resulting confidence intervals contain
the true population value 0.2536.
program define CI20, rclass
version 10
use WarrenBurger, clear
sample 20, count
tempvar z
gen z = constit
summarize z
return scalar mean=r(mean)
return scalar ub=r(mean) + 1.96*sqrt((r(mean) * (1-r(mean)))/20)
return scalar lb=r(mean) - 1.96*sqrt((r(mean) * (1-r(mean)))/20)
end
. set seed 11101968
. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI20, nodots
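The same coverage experiment can be approximated in Python (an illustrative analogue of the Stata program, not the slides' code; it draws Bernoulli samples with π = 0.2536 and counts how often the Wald 95% interval covers π):

```python
import random

def coverage(pi=0.2536, n=20, reps=100, z=1.96, seed=11101968):
    # Fraction of simulated 95% CIs, p +/- z*sqrt(p(1-p)/n), that contain pi
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p = sum(rng.random() < pi for _ in range(n)) / n
        half = z * (p * (1 - p) / n) ** 0.5
        if p - half <= pi <= p + half:
            hits += 1
    return hits / reps

print(coverage(n=20), coverage(n=400))  # coverage should improve with n
```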
Stata
. su
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
pihat | 100 .2335 .0901892 .05 .5
ub | 100 .4125235 .1171615 .1455186 .7191347
lb | 100 .0544765 .0641941 -.0455186 .2808653
. tab pihat
r(mean) | Freq. Percent Cum.
------------+-----------------------------------
.05 | 4 4.00 4.00
.1 | 7 7.00 11.00
.15 | 14 14.00 25.00
.2 | 24 24.00 49.00
.25 | 21 21.00 70.00
.3 | 10 10.00 80.00
.35 | 16 16.00 96.00
.4 | 3 3.00 99.00
.5 | 1 1.00 100.00
------------+-----------------------------------
Total | 100 100.00
Stata
Figure: 100 Confidence Intervals for constit for N = 20 (CI ranges plotted
against π̂, with a density of the estimates overlaid)
Note that we have four observations with π̂ = 0.05, seven with π̂ = 0.10, and
one with π̂ = 0.50, all of which have calculated confidence intervals that do not
include the true value π = 0.2536. That's 12/100, or α = 0.12, which is
quite different from α = 0.05.
Stata
We can modify the code slightly to do the same thing with 100 samples of
N = 100:
program define CI100, rclass
version 10
use WarrenBurger, clear
sample 100, count
tempvar z
gen z = constit
summarize z
return scalar mean=r(mean)
return scalar ub=r(mean) + 1.96*sqrt((r(mean) * (1-r(mean)))/100)
return scalar lb=r(mean) - 1.96*sqrt((r(mean) * (1-r(mean)))/100)
end
. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI100, nodots
Stata
. tab pihat
r(mean) | Freq. Percent Cum.
------------+-----------------------------------
.16 | 3 3.00 3.00
.17 | 1 1.00 4.00
.18 | 2 2.00 6.00
.19 | 3 3.00 9.00
.2 | 5 5.00 14.00
.21 | 7 7.00 21.00
.22 | 6 6.00 27.00
.23 | 10 10.00 37.00
.24 | 13 13.00 50.00
.25 | 14 14.00 64.00
.26 | 8 8.00 72.00
.27 | 8 8.00 80.00
.28 | 4 4.00 84.00
.29 | 1 1.00 85.00
.3 | 7 7.00 92.00
.31 | 3 3.00 95.00
.32 | 2 2.00 97.00
.33 | 1 1.00 98.00
.35 | 1 1.00 99.00
.37 | 1 1.00 100.00
------------+-----------------------------------
Total | 100 100.00
Stata
Figure: 100 Confidence Intervals for constit for N = 100 (CI ranges plotted
against π̂, with a density of the estimates overlaid)
Only six samples (three with π̂ = 0.16, one with π̂ = 0.17, one with π̂ = 0.35,
and one with π̂ = 0.37) out of 100 have confidence intervals that do not
include π now.
That's much closer to α = 0.05, as we expect it to be. What we say here is that the
coverage probabilities are getting better as the size of the sample increases.
Stata
If we do the same for 100 samples each with N = 400, the coverage gets even
better:
. simulate pihat=r(mean) ub=r(ub) lb=r(lb), reps(100): CI400, nodots
command: CI400, nodots
pihat: r(mean)
ub: r(ub)
lb: r(lb)
Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
.................................................. 100
. su
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
pihat | 100 .24975 .0213984 .2 .3025
ub | 100 .2921026 .0226069 .2392 .3475154
lb | 100 .2073974 .0201902 .1608 .2574846
Stata
Figure: 100 Confidence Intervals for constit for N = 400 (CI ranges plotted
against π̂, with a density of the estimates overlaid)
While they are not shown here, the coverage probability is more-or-less perfect
(96/100). In each figure, the density plot overlaid shows the distribution of estimated
means (the π̂s). Note that they look increasingly Normal, and that their range and
standard deviation decline, as the sample sizes increase.
Stata
This is the code for the figures.
. twoway (scatter pihat pihat, mcolor(black) msymbol(circle))
(rcap ub lb pihat, lcolor(black) lwidth(vthin) msize(small))
(kdensity pihat, yaxis(2) lcolor(gs8) lpattern(dash)),
ytitle(CI Range) yline(.2536, lpattern(longdash) lcolor(cranberry))
ytitle((Density of Estimates of pi), axis(2))
xtitle(Value of Pi) legend(off)
aspectratio(1, placement(center))