
Confidence intervals

Confidence Interval
Confidence Limits
Confidence Level
Confidence Interval for a Mean
Confidence Interval for the Difference Between Two Means

Confidence Interval
A confidence interval gives an estimated range of values which is likely to include
an unknown population parameter, the estimated range being calculated from a
given set of sample data.
If independent samples are taken repeatedly from the same population, and a
confidence interval calculated for each sample, then a certain percentage
(confidence level) of the intervals will include the unknown population parameter.
Confidence intervals are usually calculated so that this percentage is 95%, but
we can produce 90%, 99%, 99.9% (or whatever) confidence intervals for the
unknown parameter.
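The repeated-sampling interpretation above can be checked with a small simulation. The sketch below is illustrative only: it assumes a normal population with a known standard deviation, and every value in it (`mu`, `sigma`, `n`, `trials`, `seed`) is a made-up assumption rather than something taken from the text.

```python
import random
from statistics import NormalDist, mean


def coverage(mu=0.0, sigma=1.0, n=10, level=0.95, trials=10_000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu.

    Assumes a normal population with KNOWN sigma (an assumption made
    only to keep the sketch short); all default values are hypothetical.
    """
    rng = random.Random(seed)
    # Two-sided critical value of the standard normal, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample_mean = mean(rng.gauss(mu, sigma) for _ in range(n))
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(coverage())            # should be close to 0.95
print(coverage(level=0.99))  # should be close to 0.99
```

Any single interval either contains the parameter or does not; only the long-run proportion is tied to the confidence level.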
The width of the confidence interval gives us some idea about how uncertain we
are about the unknown parameter (see precision). A very wide interval may
indicate that more data should be collected before anything very definite can be
said about the parameter.
Confidence intervals are more informative than the simple results of hypothesis
tests (where we decide "reject H0" or "don't reject H0") since they provide a range
of plausible values for the unknown parameter.

See also confidence limits.

Confidence Limits
Confidence limits are the lower and upper boundaries / values of a confidence
interval, that is, the values which define the range of a confidence interval.
The upper and lower bounds of a 95% confidence interval are the 95%
confidence limits. These limits may be taken for other confidence levels, for
example, 90%, 99%, 99.9%.

Confidence Level
The confidence level is the probability value $(1 - \alpha)$ associated with a confidence
interval.
It is often expressed as a percentage. For example, say $\alpha = 0.05 = 5\%$, then the
confidence level is equal to (1-0.05) = 0.95, i.e. a 95% confidence level.
Example
Suppose an opinion poll predicted that, if the election were held today, the
Conservative party would win 60% of the vote. The pollster might attach a 95%
confidence level to the interval 60% plus or minus 3%. That is, he thinks it very
likely that the Conservative party would get between 57% and 63% of the total
vote.

Confidence Interval for a Mean


A confidence interval for a mean specifies a range of values within which the
unknown population parameter, in this case the mean, may lie. These intervals
may be calculated by, for example, a producer who wishes to estimate his mean
daily output; a medical researcher who wishes to estimate the mean response by
patients to a new drug; etc.
The (two-sided) confidence interval for a mean contains all the values of $\mu_0$ (the
true population mean) which would not be rejected in the two-sided hypothesis
test of:
H0: $\mu = \mu_0$
against
H1: $\mu \neq \mu_0$
The width of the confidence interval gives us some idea about how uncertain we
are about the unknown population parameter, in this case the mean. A very wide
interval may indicate that more data should be collected before anything very
definite can be said about the parameter.
We calculate these intervals for different confidence levels, depending on how
precise we want to be. We interpret an interval calculated at a 95% level as, we
are 95% confident that the interval contains the true population mean. We could
also say that 95% of all confidence intervals formed in this manner (from different
samples of the population) will include the true population mean.
Compare one sample t-test.

Confidence Interval for the Difference Between Two Means


A confidence interval for the difference between two means specifies a range of
values within which the difference between the means of the two populations may
lie. These intervals may be calculated by, for example, a producer who wishes to
estimate the difference in mean daily output from two machines; a medical
researcher who wishes to estimate the difference in mean response by patients
who are receiving two different drugs; etc.
The confidence interval for the difference between two means contains all the
values of $\mu_1 - \mu_2$ (the difference between the two population means) which would
not be rejected in the two-sided hypothesis test of:
H0: $\mu_1 = \mu_2$
against
H1: $\mu_1 \neq \mu_2$
i.e.
H0: $\mu_1 - \mu_2 = 0$
against
H1: $\mu_1 - \mu_2 \neq 0$
If the confidence interval includes 0 we can say that there is no significant
difference between the means of the two populations, at a given level of
confidence.
The width of the confidence interval gives us some idea about how uncertain we
are about the difference in the means. A very wide interval may indicate that
more data should be collected before anything definite can be said.
We calculate these intervals for different confidence levels, depending on how
precise we want to be. We interpret an interval calculated at a 95% level as, we
are 95% confident that the interval contains the true difference between the two
population means. We could also say that 95% of all confidence intervals formed
in this manner (from different samples of the population) will include the true
difference.
Compare two sample t-test.


Confidence Limits for the Mean


Purpose: Interval Estimate for Mean
Confidence limits for the mean (Snedecor and Cochran, 1989) are an interval estimate for the mean.
Interval estimates are often desirable because the estimate of the mean varies from sample to
sample. Instead of a single estimate for the mean, a confidence interval generates a lower and upper
limit for the mean. The interval estimate gives an indication of how much uncertainty there is in our
estimate of the true mean. The narrower the interval, the more precise is our estimate.
Confidence limits are expressed in terms of a confidence coefficient. Although the choice of
confidence coefficient is somewhat arbitrary, in practice 90 %, 95 %, and 99 % intervals are often
used, with 95 % being the most commonly used.

As a technical note, a 95 % confidence interval does not mean that there is a 95 % probability that
the interval contains the true mean. The interval computed from a given sample either contains the
true mean or it does not. Instead, the level of confidence is associated with the method of
calculating the interval. The confidence coefficient is simply the proportion of samples of a given
size that may be expected to contain the true mean. That is, for a 95 % confidence interval, if many
samples are collected and the confidence interval computed, in the long run about 95 % of these
intervals would contain the true mean.

Definition: Confidence Interval
Confidence limits are defined as:
$$\bar{Y} \pm t_{1-\alpha/2,\,N-1}\,\frac{s}{\sqrt{N}}$$
where $\bar{Y}$ is the sample mean, s is the sample standard deviation, N is the sample size, $\alpha$ is the
desired significance level, and $t_{1-\alpha/2,\,N-1}$ is the $100(1-\alpha/2)$ percentile of the t distribution with N - 1
degrees of freedom. Note that the confidence coefficient is $1 - \alpha$.
From the formula, it is clear that the width of the interval is controlled by two factors:
1. As N increases, the interval gets narrower from the $\sqrt{N}$ term.
   That is, one way to obtain more precise estimates for the mean is to increase the sample size.
2. The larger the sample standard deviation, the larger the confidence interval. This simply
   means that noisy data, i.e., data with a large standard deviation, are going to generate wider
   intervals than data with a smaller standard deviation.
Definition: Hypothesis Test
To test whether the population mean has a specific value, $\mu_0$, against the two-sided alternative that it
does not have a value $\mu_0$, the confidence interval is converted to hypothesis-test form. The test is a
one-sample t-test, and it is defined as:

H0: $\mu = \mu_0$
Ha: $\mu \neq \mu_0$
Test Statistic: $T = (\bar{Y} - \mu_0)/(s/\sqrt{N})$
where $\bar{Y}$, N, and s are defined as above.
Significance Level: $\alpha$. The most commonly used value for $\alpha$ is 0.05.
Critical Region: Reject the null hypothesis that the mean is a specified value, $\mu_0$, if
$T < t_{\alpha/2,\,N-1}$
or
$T > t_{1-\alpha/2,\,N-1}$

Confidence Interval Example
We generated a 95 %, two-sided confidence interval for the ZARR13.DAT data set based on the
following information.

N                  = 195
MEAN               = 9.261460
STANDARD DEVIATION = 0.022789
t(1-0.025, N-1)    = 1.9723

LOWER LIMIT = 9.261460 - 1.9723*0.022789/sqrt(195)
UPPER LIMIT = 9.261460 + 1.9723*0.022789/sqrt(195)

Thus, a 95 % confidence interval for the mean is (9.258242, 9.264679).
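
The arithmetic of this example can be reproduced in a few lines. This is a minimal sketch, not the software used to produce the figures above: the function name `t_confidence_limits` is hypothetical, and the critical value 1.9723 is taken from the text rather than computed.

```python
import math


def t_confidence_limits(mean, sd, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * sd / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary values quoted in the text for ZARR13.DAT. t_crit is the tabulated
# t(0.975, 194) value quoted in the text, not computed by this sketch.
lower, upper = t_confidence_limits(mean=9.261460, sd=0.022789, n=195, t_crit=1.9723)
print(f"({lower:.6f}, {upper:.6f})")
```

Because the inputs are rounded summary statistics, the last printed digit may differ slightly from the published limits.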


t-Test Example
We performed a two-sided, one-sample t-test using the ZARR13.DAT data set to test the null
hypothesis that the population mean is equal to 5.

H0: $\mu = 5$
Ha: $\mu \neq 5$

Test statistic: T = 2611.284
Degrees of freedom: $\nu$ = 194
Significance level: $\alpha$ = 0.05
Critical value: $t_{1-\alpha/2,\,\nu}$ = 1.9723
Critical region: Reject H0 if |T| > 1.9723

We reject the null hypothesis for our two-tailed t-test because the absolute value of the test statistic
is greater than the critical value. If we were to perform an upper, one-tailed test, the critical value
would be $t_{1-\alpha,\,\nu}$ = 1.6527, and we would still reject the null hypothesis.

The confidence interval provides an alternative to the hypothesis test. If the confidence interval
contains 5, then H0 cannot be rejected. In our example, the confidence interval (9.258242,
9.264679) does not contain 5, indicating that the population mean does not equal 5 at the 0.05 level
of significance.
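
The test statistic and the decision rule can be sketched the same way. The helper name `one_sample_t` is hypothetical, and the critical value is again copied from the text. Since the summary statistics are rounded, the computed statistic only approximately matches the reported 2611.284.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """One-sample t statistic: (mean - mu0) / (sd / sqrt(n))."""
    return (mean - mu0) / (sd / math.sqrt(n))


# Rounded summary values from the text; mu0 = 5 is the hypothesized mean.
t_stat = one_sample_t(mean=9.261460, sd=0.022789, n=195, mu0=5.0)
critical = 1.9723  # t(0.975, 194) as quoted in the text, not computed here
reject = abs(t_stat) > critical  # two-sided rejection rule |T| > critical
print(round(t_stat, 1), reject)
```

The same conclusion follows from the interval: 5 lies outside (9.258242, 9.264679), so the two-sided test at the 0.05 level rejects H0.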

In general, there are three possible alternative hypotheses and rejection regions for the one-sample
t-test:

Alternative Hypothesis     Rejection Region
Ha: $\mu \neq \mu_0$       $|T| > t_{1-\alpha/2,\,\nu}$
Ha: $\mu > \mu_0$          $T > t_{1-\alpha,\,\nu}$
Ha: $\mu < \mu_0$          $T < t_{\alpha,\,\nu}$

The rejection regions for three possible alternative hypotheses using our example data are shown in
the following graphs.

Questions
Confidence limits for the mean can be used to answer the following questions:
1. What is a reasonable estimate for the mean?
2. How much variability is there in the estimate of the mean?
3. Does a given target value fall within the confidence limits?

Related Techniques
Two-Sample t-Test
Confidence intervals for other location estimators such as the median or mid-mean tend to be
mathematically difficult or intractable. For these cases, confidence intervals can be obtained using
the bootstrap.

Case Study
Heat flow meter data.
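
For location estimators such as the median, the bootstrap mentioned above can be sketched as follows. This is one simple variant (the percentile bootstrap), chosen here for brevity; the text does not say which variant is intended. The function name, the sample values, the number of resamples, and the seed are all illustrative assumptions.

```python
import random
from statistics import median


def bootstrap_percentile_ci(data, stat=median, level=0.95, n_boot=5000, seed=0):
    """Percentile-bootstrap interval for stat(data).

    Resamples the data with replacement n_boot times, computes the
    statistic on each resample, and returns the empirical
    (1 - level)/2 and 1 - (1 - level)/2 quantiles of those values.
    """
    rng = random.Random(seed)
    n = len(data)
    stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = 1 - level
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return stats[lo_idx], stats[hi_idx]


# Hypothetical sample, made up for illustration only.
sample = [9.21, 9.25, 9.26, 9.26, 9.27, 9.28, 9.29, 9.30, 9.33, 9.41]
lo, hi = bootstrap_percentile_ci(sample)
print(lo, hi)
```

With very small samples the percentile interval can be noticeably too narrow, so this should be read as a starting point rather than a recommended procedure.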

Confidence Interval on the Mean


Author(s)
David M. Lane
Prerequisites
Areas Under Normal Distributions, Sampling Distribution of the Mean, Introduction to
Estimation, Introduction to Confidence Intervals
Learning Objectives
1. Use the inverse normal distribution calculator to find the value of z to use for a
confidence interval
2. Compute a confidence interval on the mean when $\sigma$ is known
3. Determine whether to use a t distribution or a normal distribution
4. Compute a confidence interval on the mean when $\sigma$ is estimated
When you compute a confidence interval on the mean, you compute the mean of a
sample in order to estimate the mean of the population. Clearly, if you already knew the
population mean, there would be no need for a confidence interval. However, to explain
how confidence intervals are constructed, we are going to work backwards and begin by
assuming characteristics of the population. Then we will show how sample data can be
used to construct a confidence interval.
Assume that the weights of 10-year-old children are normally distributed with a mean of
90 and a standard deviation of 36. What is the sampling distribution of the mean for a
sample size of 9? Recall from the section on the sampling distribution of the mean that
the mean of the sampling distribution is $\mu$ and the standard error of the mean is
$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$
For the present example, the sampling distribution of the mean has a mean of 90 and a
standard deviation of 36/3 = 12. Note that the standard deviation of a sampling
distribution is its standard error. Figure 1 shows this distribution. The shaded area
represents the middle 95% of the distribution and stretches from 66.48 to 113.52. These
limits were computed by adding and subtracting 1.96 standard deviations to/from the
mean of 90 as follows:

90 - (1.96)(12) = 66.48
90 + (1.96)(12) = 113.52

The value of 1.96 is based on the fact that 95% of the area of a normal distribution is
within 1.96 standard deviations of the mean; 12 is the standard error of the mean.
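
The two limits can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`, which is a tooling choice made here and not something the text prescribes; the population values are the ones assumed in the example.

```python
from statistics import NormalDist

mu, sigma, n = 90.0, 36.0, 9       # population values assumed in the example
se = sigma / n ** 0.5              # standard error of the mean: 36/3 = 12

# Value of z that leaves 2.5% in each tail of the standard normal (about 1.96).
z = NormalDist().inv_cdf(0.975)

lower = mu - z * se
upper = mu + z * se
print(round(z, 2), round(lower, 2), round(upper, 2))
```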

Figure 1. The sampling distribution of the mean for N=9. The middle 95% of the
distribution is shaded.
Figure 1 shows that 95% of the means are no more than 23.52 units (1.96 standard
deviations) from the mean of 90. Now consider the probability that a sample mean
computed in a random sample is within 23.52 units of the population mean of 90. Since
95% of the distribution is within 23.52 of 90, the probability that the mean from any
given sample will be within 23.52 of 90 is 0.95. This means that if we repeatedly
compute the mean (M) from a sample, and create an interval ranging from M - 23.52 to
M + 23.52, this interval will contain the population mean 95% of the time. In general,
you compute the 95% confidence interval for the mean with the following formula:
Lower limit = $M - Z_{.95}\,\sigma_M$
Upper limit = $M + Z_{.95}\,\sigma_M$
where $Z_{.95}$ is the number of standard deviations extending from the mean of a normal
distribution required to contain 0.95 of the area and $\sigma_M$ is the standard error of the mean.
If you look closely at this formula for a confidence interval, you will notice that you need
to know the standard deviation ($\sigma$) in order to estimate the mean. This may sound
unrealistic, and it is. However, computing a confidence interval when $\sigma$ is known is easier
than when $\sigma$ has to be estimated, and serves a pedagogical purpose. Later in this section
we will show how to compute a confidence interval for the mean when $\sigma$ has to be
estimated.
Suppose the following five numbers were sampled from a normal distribution with a
standard deviation of 2.5: 2, 3, 5, 6, and 9. To compute the 95% confidence interval, start
by computing the mean and standard error:
M = (2 + 3 + 5 + 6 + 9)/5 = 5.
$\sigma_M = 2.5/\sqrt{5} = 1.118$.

Z.95 can be found using the normal distribution calculator and specifying that the shaded
area is 0.95 and indicating that you want the area to be between the cutoff points. As
shown in Figure 2, the value is 1.96. If you had wanted to compute the 99% confidence
interval, you would have set the shaded area to 0.99 and the result would have been 2.58.

Figure 2. 95% of the area is between -1.96 and 1.96.


The confidence interval can then be computed as follows:
Lower limit = 5 - (1.96)(1.118)= 2.81
Upper limit = 5 + (1.96)(1.118)= 7.19
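
As a sketch of the same calculation in code, the function below wraps the known-$\sigma$ interval. The name `z_interval` is hypothetical, and `statistics.NormalDist` stands in for the normal distribution calculator referred to in the text.

```python
from statistics import NormalDist, mean


def z_interval(data, sigma, level=0.95):
    """Confidence interval on the mean when the population sigma is known."""
    m = mean(data)
    se = sigma / len(data) ** 0.5              # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%, 2.58 for 99%
    return m - z * se, m + z * se


lower, upper = z_interval([2, 3, 5, 6, 9], sigma=2.5)
print(round(lower, 2), round(upper, 2))
```

Passing `level=0.99` widens the interval, because the critical value grows from about 1.96 to about 2.58.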
You should use the t distribution rather than the normal distribution when the variance is
not known and has to be estimated from sample data. When the sample size is large, say
100 or above, the t distribution is very similar to the standard normal distribution.
However, with smaller sample sizes, the t distribution is leptokurtic, which means it has
relatively more scores in its tails than does the normal distribution. As a result, you have
to extend farther from the mean to contain a given proportion of the area. Recall that with
a normal distribution, 95% of the distribution is within 1.96 standard deviations of the
mean. Using the t distribution, if you have a sample size of only 5, 95% of the area is
within 2.78 standard deviations of the mean. Therefore, the standard error of the mean
would be multiplied by 2.78 rather than 1.96.
The values of t to be used in a confidence interval can be looked up in a table of the t
distribution. A small version of such a table is shown in Table 1. The first column, df,
stands for degrees of freedom, and for confidence intervals on the mean, df is equal to
N - 1, where N is the sample size.
Table 1. Abbreviated t table.

df     0.95    0.99
2      4.303   9.925
3      3.182   5.841
4      2.776   4.604
5      2.571   4.032
8      2.306   3.355
10     2.228   3.169
20     2.086   2.845
50     2.009   2.678
100    1.984   2.626

You can also use the "inverse t distribution" calculator to find the t values to use in
confidence intervals. You will learn more about the t distribution in the next section.
Assume that the following five numbers are sampled from a normal distribution: 2, 3, 5,
6, and 9 and that the standard deviation is not known. The first steps are to compute the
sample mean and variance:
M = 5
$s^2$ = 7.5
The next step is to estimate the standard error of the mean. If we knew the population
variance, we could use the following formula:
$$\sigma_M = \frac{\sigma}{\sqrt{N}}$$
Instead we compute an estimate of the standard error ($s_M$):
$$s_M = \frac{s}{\sqrt{N}} = 1.225$$
The next step is to find the value of t. As you can see from Table 1, the value for the 95%
interval for df = N - 1 = 4 is 2.776. The confidence interval is then computed just as it is
when $\sigma_M$ is known. The only differences are that $s_M$ and t rather than $\sigma_M$ and Z are used.

Lower limit = 5 - (2.776)(1.225) = 1.60


Upper limit = 5 + (2.776)(1.225) = 8.40
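
The estimated-$\sigma$ case can be sketched in code as well. To stay within the standard library, the example looks the critical value up in a copy of the 0.95 column of Table 1 instead of computing it; the names `T_TABLE_95` and `t_interval_95` are hypothetical, and the lookup only works for the degrees of freedom listed in that table.

```python
from statistics import mean, stdev

# 0.95 column of Table 1, keyed by degrees of freedom (values copied from the text).
T_TABLE_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 8: 2.306,
              10: 2.228, 20: 2.086, 50: 2.009, 100: 1.984}


def t_interval_95(data):
    """95% confidence interval on the mean when sigma is estimated from the data."""
    n = len(data)
    m = mean(data)
    s_m = stdev(data) / n ** 0.5   # estimated standard error (stdev uses N - 1)
    t = T_TABLE_95[n - 1]          # raises KeyError if df is not in the small table
    return m - t * s_m, m + t * s_m


lower, upper = t_interval_95([2, 3, 5, 6, 9])
print(round(lower, 2), round(upper, 2))
```

The interval is wider than the one obtained with a known $\sigma$ of 2.5, both because t exceeds Z and because the estimated standard error here happens to be larger.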

More generally, the formula for the confidence interval on the mean is:
Lower limit = $M - (t_{CL})(s_M)$
Upper limit = $M + (t_{CL})(s_M)$
where M is the sample mean, $t_{CL}$ is the t for the confidence level desired (0.95 in the
above example), and $s_M$ is the estimated standard error of the mean.
We will finish with an analysis of the Stroop Data. Specifically, we will compute a
confidence interval on the mean difference score. Recall that 47 subjects named the color
of ink that words were written in. The names conflicted so that, for example, they would
name the ink color of the word "blue" written in red ink. The correct response is to say
"red" and ignore the fact that the word is "blue." In a second condition, subjects named
the ink color of colored rectangles.

Confidence Intervals
In statistical inference, one wishes to estimate population parameters using observed
sample data.
A confidence interval gives an estimated range of values which is likely to include an
unknown population parameter, the estimated range being calculated from a given set of
sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics
Glossary v1.1)
The common notation for the parameter in question is $\theta$. Often, this parameter is the
population mean $\mu$, which is estimated through the sample mean $\bar{x}$.
The level C of a confidence interval gives the probability that the interval produced by
the method employed includes the true value of the parameter $\theta$.
Example
Suppose a student measuring the boiling temperature of a certain liquid observes the
readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different
samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the
standard deviation for this procedure is 1.2 degrees, what is the confidence interval for
the population mean at a 95% confidence level?
In other words, the student wishes to estimate the true mean boiling temperature of the
liquid using the results of his measurements. If the measurements follow a normal
distribution, then the sample mean will have the distribution $N(\mu, \sigma/\sqrt{n})$. Since
the sample size is 6, the standard deviation of the sample mean is equal to
1.2/sqrt(6) = 0.49.
The selection of a confidence level for an interval determines the probability that the
confidence interval produced will contain the true parameter value. Common choices for
the confidence level C are 0.90, 0.95, and 0.99. These levels correspond to percentages of
the area of the normal density curve. For example, a 95% confidence interval covers 95%
of the normal curve -- the probability of observing a value outside of this area is less than
0.05. Because the normal curve is symmetric, half of the area is in the left tail of the
curve, and the other half of the area is in the right tail of the curve. As shown in the
diagram to the right, for a confidence interval with level C, the area in each tail of the
curve is equal to (1-C)/2. For a 95% confidence interval, the area in each tail is equal to
0.05/2 = 0.025.
The value z* representing the point on the standard normal density curve such that the
probability of observing a value greater than z* is equal to p is known as the upper p
critical value of the standard normal distribution. For example, if p = 0.025, the value z*
such that P(Z > z*) = 0.025, or P(Z < z*) = 0.975, is equal to 1.96. For a confidence
interval with level C, the value p is equal to (1-C)/2. A 95% confidence interval for the
standard normal distribution, then, is the interval (-1.96, 1.96), since 95% of the area
under the curve falls within this interval.

Confidence Intervals for Unknown Mean and Known Standard Deviation
For a population with unknown mean $\mu$ and known standard deviation $\sigma$, a
confidence interval for the population mean, based on a simple random sample
(SRS) of size n, is $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$, where z* is the upper (1-C)/2 critical value for the
standard normal distribution.


Note: This interval is only exact when the population distribution is normal. For large
samples from other population distributions, the interval is approximately correct by
the Central Limit Theorem.
In the example above, the student calculated the sample mean of the boiling temperatures
to be 101.82, and the standard deviation of the sample mean to be 0.49. The critical value
for a 95% confidence interval is 1.96, where (1-0.95)/2 = 0.025. A 95% confidence interval
for the unknown mean $\mu$ is ((101.82 - (1.96*0.49)), (101.82 + (1.96*0.49))) =
(101.82 - 0.96, 101.82 + 0.96) = (100.86, 102.78).
As the level of confidence decreases, the size of the corresponding interval will decrease.
Suppose the student was interested in a 90% confidence interval for the boiling
temperature. In this case, C = 0.90, and (1-C)/2 = 0.05. The critical value z* for this level
is equal to 1.645, so the 90% confidence interval is ((101.82 - (1.645*0.49)), (101.82 +
(1.645*0.49))) = (101.82 - 0.81, 101.82 + 0.81) = (101.01, 102.63)
An increase in sample size will decrease the length of the confidence interval without
reducing the level of confidence. This is because the standard deviation of the sample mean
decreases as n increases. The margin of error m of a confidence interval is defined to be
the value added or subtracted from the sample mean which determines the length of the
interval: $m = z^* \frac{\sigma}{\sqrt{n}}$.


Suppose in the example above, the student wishes to have a margin of error equal to 0.5
with 95% confidence. Substituting the appropriate values into the expression for m and
solving for n gives the calculation n = (1.96*1.2/0.5)^2 = (2.35/0.5)^2 = 4.7^2 = 22.09. To
achieve a 95% confidence interval for the mean boiling point with total length less than 1
degree, the student will have to take 23 measurements.
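
The margin-of-error formula and its inversion for n can be sketched as follows. The function names are hypothetical, and the critical value is computed with `statistics.NormalDist` rather than rounded to 1.96, so intermediate figures differ slightly from the hand calculation while the required sample size is the same.

```python
import math
from statistics import NormalDist


def margin_of_error(sigma, n, level=0.95):
    """m = z* . sigma / sqrt(n) for a known population sigma."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / math.sqrt(n)


def required_sample_size(sigma, m, level=0.95):
    """Smallest whole n with margin of error at most m: n >= (z* . sigma / m)^2."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sigma / m) ** 2)


print(round(margin_of_error(1.2, 6), 2))  # margin for the six boiling-point readings
print(required_sample_size(1.2, 0.5))     # measurements needed for m = 0.5
```

Rounding up is needed because n must be a whole number and rounding down would leave the margin slightly above the target.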

Confidence Intervals for Unknown Mean and Unknown Standard Deviation
In most practical research, the standard deviation for the population of interest is not
known. In this case, the standard deviation $\sigma$ is replaced by the estimated standard
deviation s, and the standard deviation of the sample mean, $\sigma/\sqrt{n}$, is replaced by its
estimate $s/\sqrt{n}$, known as the standard error. Since the standard error is only an estimate
of the true value, the standardized sample mean $(\bar{x} - \mu)/(s/\sqrt{n})$ is no longer normally
distributed. Instead, it follows the t distribution. The t distribution is
also described by its degrees of freedom. For a sample of size n, the t distribution will
have n-1 degrees of freedom. The notation for a t distribution with k degrees of freedom
is t(k). As the sample size n increases, the t distribution becomes closer to the normal
distribution, since the estimate s approaches the true standard deviation $\sigma$ for large
n.
For a population with unknown mean $\mu$ and unknown standard deviation, a
confidence interval for the population mean, based on a simple random sample
(SRS) of size n, is $\bar{x} \pm t^* \frac{s}{\sqrt{n}}$, where t* is the upper (1-C)/2 critical value for the t
distribution with n-1 degrees of freedom, t(n-1).
Example
The dataset "Normal Body Temperature, Gender, and Heart Rate" contains 130
observations of body temperature, along with the gender of each individual and his or her
heart rate. Using the MINITAB "DESCRIBE" command provides the following
information:
Descriptive Statistics

Variable       N      Mean    Median   Tr Mean     StDev   SE Mean
TEMP         130    98.249    98.300    98.253     0.733     0.064

Variable     Min       Max        Q1        Q3
TEMP      96.300   100.800    97.800    98.700

To find a 95% confidence interval for the mean based on the sample mean 98.249 and
sample standard deviation 0.733, first find the 0.025 critical value t* for 129 degrees of
freedom. This value is approximately 1.984, the critical value for 100 degrees of freedom
(found in Table E in Moore and McCabe). The estimated standard deviation for the
sample mean is 0.733/sqrt(130) = 0.064, the value provided in the SE MEAN column of
the MINITAB descriptive statistics. A 95% confidence interval, then, is approximately
((98.249 - 1.984*0.064), (98.249 + 1.984*0.064)) = (98.249 - 0.127, 98.249 + 0.127) =
(98.122, 98.376).

For a more precise (and more simply achieved) result, the MINITAB "TINTERVAL"
command, written as follows, gives an exact 95% confidence interval for 129 degrees of
freedom:
MTB > tinterval 95 c1

Confidence Intervals

Variable       N      Mean     StDev   SE Mean         95.0 % CI
TEMP         130   98.2492    0.7332    0.0643   ( 98.1220, 98.3765)

According to these results, the usual assumed normal body temperature of 98.6 degrees
Fahrenheit is not within a 95% confidence interval for the mean.
Data source: Data presented in Mackowiak, P.A., Wasserman, S.S., and Levine, M.M.
(1992), "A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body
Temperature, and Other Legacies of Carl Reinhold August Wunderlich," Journal of the
American Medical Association, 268, 1578-1580. Dataset available through the JSE
Dataset Archive.

For some more definitions and examples, see the confidence interval index in Valerie J.
Easton and John H. McColl's Statistics Glossary v1.1.
