Confidence Interval
Confidence Limits
Confidence Level
Confidence Interval for a Mean
Confidence Interval for the Difference Between Two Means
Confidence Interval
A confidence interval gives an estimated range of values which is likely to include
an unknown population parameter, the estimated range being calculated from a
given set of sample data.
If independent samples are taken repeatedly from the same population, and a
confidence interval calculated for each sample, then a certain percentage
(confidence level) of the intervals will include the unknown population parameter.
Confidence intervals are usually calculated so that this percentage is 95%, but
we can produce 90%, 99%, 99.9% (or any other level) confidence intervals for the
unknown parameter.
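The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: the population mean, standard deviation, sample size, and number of trials are made-up values, and it assumes a normal population with known standard deviation so that a simple z-based interval applies.

```python
import random
from statistics import NormalDist, mean

def coverage(level=0.95, mu=50.0, sigma=10.0, n=25, trials=4000, seed=1):
    """Fraction of z-intervals (known sigma) that contain the true mean mu.

    All default values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for level=0.95
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / trials

print(coverage(0.95))   # should be close to 0.95
print(coverage(0.90))   # should be close to 0.90
```

The observed fraction will not equal the confidence level exactly, but it should settle near it as the number of trials grows.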
The width of the confidence interval gives us some idea about how uncertain we
are about the unknown parameter (see precision). A very wide interval may
indicate that more data should be collected before anything very definite can be
said about the parameter.
Confidence intervals are more informative than the simple results of hypothesis
tests (where we decide "reject H0" or "don't reject H0") since they provide a range
of plausible values for the unknown parameter.
Confidence Limits
Confidence limits are the lower and upper boundaries / values of a confidence
interval, that is, the values which define the range of a confidence interval.
The upper and lower bounds of a 95% confidence interval are the 95%
confidence limits. These limits may be taken for other confidence levels, for
example, 90%, 99%, 99.9%.
Confidence Level
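The confidence level is the probability value $(1 - \alpha)$ associated with a confidence
interval.
Confidence Interval for a Mean
Such intervals may be calculated by, for example, a producer who wishes to estimate mean
daily output; a medical researcher who wishes to estimate the mean response by
patients to a new drug; etc.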
The (two sided) confidence interval for a mean contains all the values of $\mu_0$ (the
true population mean) which would not be rejected in the two-sided hypothesis
test of:
H0: $\mu = \mu_0$
against
H1: $\mu \neq \mu_0$
The width of the confidence interval gives us some idea about how uncertain we
are about the unknown population parameter, in this case the mean. A very wide
interval may indicate that more data should be collected before anything very
definite can be said about the parameter.
We calculate these intervals for different confidence levels, depending on how
precise we want to be. We interpret an interval calculated at a 95% level as, we
are 95% confident that the interval contains the true population mean. We could
also say that 95% of all confidence intervals formed in this manner (from different
samples of the population) will include the true population mean.
Compare one sample t-test.
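Confidence Interval for the Difference Between Two Means
The confidence interval for the difference between two means contains all the values of
$\mu_1 - \mu_2$ (the difference between the two population means) which would not be
rejected in the two-sided hypothesis test of:
H0: $\mu_1 = \mu_2$
against
H1: $\mu_1 \neq \mu_2$
i.e.
H0: $\mu_1 - \mu_2 = 0$
against
H1: $\mu_1 - \mu_2 \neq 0$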
If the confidence interval includes 0 we can say that there is no significant
difference between the means of the two populations, at a given level of
confidence.
The width of the confidence interval gives us some idea about how uncertain we
are about the difference in the means. A very wide interval may indicate that
more data should be collected before anything definite can be said.
We calculate these intervals for different confidence levels, depending on how
precise we want to be. We interpret an interval calculated at a 95% level as, we
are 95% confident that the interval contains the true difference between the two
population means. We could also say that 95% of all confidence intervals formed
in this manner (from different samples of the population) will include the true
difference.
Compare two sample t-test.
Confidence limits for the mean (Snedecor and Cochran, 1989) are an interval estimate for the mean.
Interval estimates are often desirable because the estimate of the mean varies from sample to
sample. Instead of a single estimate for the mean, a confidence interval generates a lower and upper
limit for the mean. The interval estimate gives an indication of how much uncertainty there is in our
estimate of the true mean. The narrower the interval, the more precise is our estimate.
Confidence limits are expressed in terms of a confidence coefficient.
Definition: Confidence Interval
As a technical note, a 95 % confidence interval does not mean that there is a 95 % probability that
the interval contains the true mean. The interval computed from a given sample either contains the
true mean or it does not. Instead, the level of confidence is associated with the method of
calculating the interval. The confidence coefficient is simply the proportion of samples of a given
size that may be expected to contain the true mean. That is, for a 95 % confidence interval, if many
samples are collected and the confidence interval computed, in the long run about 95 % of these
intervals would contain the true mean.
Confidence limits are defined as:
$$\bar{Y} \pm t_{1-\alpha/2,\,N-1}\,\frac{s}{\sqrt{N}}$$
where $\bar{Y}$ is the sample mean, s is the sample standard deviation, N is the sample size, $\alpha$ is the
desired significance level, and $t_{1-\alpha/2,\,N-1}$ is the 100(1-$\alpha$/2) percentile of the t distribution with N - 1
degrees of freedom. Note that the confidence coefficient is 1 - $\alpha$.
From the formula, it is clear that the width of the interval is controlled by two factors:
1. As N increases, the interval gets narrower from the $\sqrt{N}$ term.
That is, one way to obtain more precise estimates for the mean is to increase the sample size.
2. The larger the sample standard deviation, the larger the confidence interval. This simply
means that noisy data, i.e., data with a large standard deviation, are going to generate wider
intervals than data with a smaller standard deviation.
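The formula for the confidence limits can be sketched in a few lines of Python. This is a minimal illustration, not the handbook's own code: the function name `t_confidence_limits` is hypothetical, and the t percentile is passed in as an argument (taken from a table) because the Python standard library has no t distribution. The inputs are the ZARR13.DAT summary values quoted in the example further on.

```python
from math import sqrt

def t_confidence_limits(mean, s, n, t_crit):
    """Return (lower, upper) = mean -/+ t_crit * s / sqrt(n).

    t_crit is the percentile t_{1-alpha/2, n-1}, supplied by the caller
    (for example from a t table).
    """
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width

# Summary values quoted in the text for the ZARR13.DAT example.
lower, upper = t_confidence_limits(mean=9.261460, s=0.022789, n=195, t_crit=1.9723)
print(f"95 % limits: ({lower:.6f}, {upper:.6f})")

# Factor 1: quadrupling N halves the width (t_crit held fixed for simplicity).
lower_4n, upper_4n = t_confidence_limits(9.261460, 0.022789, 4 * 195, 1.9723)

# Factor 2: doubling the standard deviation doubles the width.
lower_2s, upper_2s = t_confidence_limits(9.261460, 2 * 0.022789, 195, 1.9723)
```

Because the summary values are rounded, the printed limits agree with those reported in the text to about six decimal places rather than exactly.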
Definition: Hypothesis Test
To test whether the population mean has a specific value, $\mu_0$, against the two-sided alternative that
it does not have a value $\mu_0$, the confidence interval is converted to hypothesis-test form. The test is a
one-sample t-test, and it is defined as:
H0: $\mu = \mu_0$
Ha: $\mu \neq \mu_0$
Test Statistic: $T = (\bar{Y} - \mu_0)/(s/\sqrt{N})$
where $\bar{Y}$, N, and s are defined as above.
Reject H0 if $|T| > t_{1-\alpha/2,\,N-1}$
Example
We generated a 95 %, two-sided confidence interval for the ZARR13.DAT data set based on the
following information.
N                     = 195
MEAN                  = 9.261460
STANDARD DEVIATION    = 0.022789
$t_{1-0.025,\,N-1}$   = 1.9723
H0: $\mu = 5$
Ha: $\mu \neq 5$
We reject the null hypothesis for our two-tailed t-test because the absolute value of the test statistic
is greater than the critical value. If we were to perform an upper, one-tailed test, the critical value
would be $t_{1-\alpha,\,\nu}$ = 1.6527, and we would still reject the null hypothesis.
The confidence interval provides an alternative to the hypothesis test. If the confidence interval
contains 5, then H0 cannot be rejected. In our example, the confidence interval (9.258242,
9.264679) does not contain 5, indicating that the population mean does not equal 5 at the 0.05 level
of significance.
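The link between the test and the interval can be illustrated with the numbers quoted above. The following is a sketch under the assumption that the rounded summary values from the example are adequate inputs; the helper name `one_sample_t` is hypothetical, and the critical values are copied from the text rather than computed.

```python
from math import sqrt

def one_sample_t(mean, s, n, mu0):
    """Test statistic T = (mean - mu0) / (s / sqrt(n))."""
    return (mean - mu0) / (s / sqrt(n))

# Values quoted in the text: ZARR13.DAT summary, hypothesized mean 5,
# critical values 1.9723 (two-sided) and 1.6527 (upper one-tailed).
T = one_sample_t(mean=9.261460, s=0.022789, n=195, mu0=5.0)
reject_two_sided = abs(T) > 1.9723
reject_upper = T > 1.6527

# Equivalent check through the confidence interval reported in the text.
ci = (9.258242, 9.264679)
mu0_in_ci = ci[0] <= 5.0 <= ci[1]

print(T, reject_two_sided, reject_upper, mu0_in_ci)
```

The two-sided test rejects exactly when the hypothesized value falls outside the interval, which is why the two checks agree.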
In general, there are three possible alternative hypotheses and rejection regions for the one-sample
t-test:
Alternative Hypothesis      Rejection Region
Ha: $\mu \neq \mu_0$        $|T| > t_{1-\alpha/2,\,\nu}$
Ha: $\mu > \mu_0$           $T > t_{1-\alpha,\,\nu}$
Ha: $\mu < \mu_0$           $T < t_{\alpha,\,\nu}$
The rejection regions for three possible alternative hypotheses using our example data are shown in
the following graphs.
Questions
Confidence limits for the mean can be used to answer the following questions:
1. What is a reasonable estimate for the mean?
2. How much variability is there in the estimate of the mean?
3. Does a given target value fall within the confidence limits?
Related Techniques
Two-Sample t-Test
Confidence intervals for other location estimators such as the median or mid-mean tend to be
mathematically difficult or intractable. For these cases, confidence intervals can be obtained using
the bootstrap.
Case Study
Heat flow meter data.
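As a rough illustration of the bootstrap approach mentioned above for estimators such as the median, the sketch below uses the percentile method, which is one common variant and not necessarily the one the source has in mind. The function name, the number of resamples, and the data values are all hypothetical.

```python
import random
from statistics import median

def bootstrap_percentile_ci(data, stat=median, level=0.95, reps=5000, seed=0):
    """Percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on resamples drawn with replacement.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(reps))
    tail = (1 - level) / 2
    lower = estimates[int(tail * reps)]
    upper = estimates[int((1 - tail) * reps) - 1]
    return lower, upper

# Hypothetical readings, made up for illustration only.
readings = [9.21, 9.26, 9.24, 9.30, 9.27, 9.25, 9.23, 9.28, 9.26, 9.29, 9.22, 9.27]
low, high = bootstrap_percentile_ci(readings)
print(low, high)
```

Passing a different `stat` function (for example a trimmed mean) gives an interval for that estimator without any new derivation.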
For the present example, the sampling distribution of the mean has a mean of 90 and a
standard deviation of 36/3 = 12. Note that the standard deviation of a sampling
distribution is its standard error. Figure 1 shows this distribution. The shaded area
represents the middle 95% of the distribution and stretches from 66.48 to 113.52. These
limits were computed by adding and subtracting 1.96 standard deviations to/from the
mean of 90 as follows:
90 - (1.96)(12) = 66.48
90 + (1.96)(12) = 113.52
The value of 1.96 is based on the fact that 95% of the area of a normal distribution is
within 1.96 standard deviations of the mean; 12 is the standard error of the mean.
Figure 1. The sampling distribution of the mean for N=9. The middle 95% of the
distribution is shaded.
Figure 1 shows that 95% of the means are no more than 23.52 units (1.96 standard
deviations) from the mean of 90. Now consider the probability that a sample mean
computed in a random sample is within 23.52 units of the population mean of 90. Since
95% of the distribution is within 23.52 of 90, the probability that the mean from any
given sample will be within 23.52 of 90 is 0.95. This means that if we repeatedly
compute the mean (M) from a sample, and create an interval ranging from M - 23.52 to
M + 23.52, this interval will contain the population mean 95% of the time. In general,
you compute the 95% confidence interval for the mean with the following formula:
Lower limit = $M - Z_{.95}\,\sigma_M$
Upper limit = $M + Z_{.95}\,\sigma_M$
where $Z_{.95}$ is the number of standard deviations extending from the mean of a normal
distribution required to contain 0.95 of the area and $\sigma_M$ is the standard error of the mean.
If you look closely at this formula for a confidence interval, you will notice that you need
to know the standard deviation ($\sigma$) in order to estimate the mean. This may sound
unrealistic, and it is. However, computing a confidence interval when $\sigma$ is known is easier
than when $\sigma$ has to be estimated, and serves a pedagogical purpose. Later in this section
we will show how to compute a confidence interval for the mean when $\sigma$ has to be
estimated.
Suppose the following five numbers were sampled from a normal distribution with a
standard deviation of 2.5: 2, 3, 5, 6, and 9. To compute the 95% confidence interval, start
by computing the mean and standard error:
$M = (2 + 3 + 5 + 6 + 9)/5 = 5$.
$\sigma_M = \sigma/\sqrt{N} = 2.5/\sqrt{5} = 1.118$.
Z.95 can be found using the normal distribution calculator and specifying that the shaded
area is 0.95 and indicating that you want the area to be between the cutoff points. As
shown in Figure 2, the value is 1.96. If you had wanted to compute the 99% confidence
interval, you would have set the shaded area to 0.99 and the result would have been 2.58.
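The same calculation can be sketched in Python, with the standard library's `NormalDist` standing in for the normal distribution calculator. The function name `z_interval` is a hypothetical choice for this illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    m = mean(sample)
    se = sigma / sqrt(len(sample))               # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for 0.95, 2.58 for 0.99
    return m - z * se, m + z * se

sample = [2, 3, 5, 6, 9]
low95, high95 = z_interval(sample, sigma=2.5, level=0.95)
low99, high99 = z_interval(sample, sigma=2.5, level=0.99)
print(f"95%: ({low95:.2f}, {high95:.2f})")
print(f"99%: ({low99:.2f}, {high99:.2f})")
```

For these five numbers the 95% interval comes out at roughly 2.81 to 7.19, and the 99% interval is wider, as expected from the larger multiplier.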
within 2.78 standard deviations of the mean. Therefore, the standard error of the mean
would be multiplied by 2.78 rather than 1.96.
The values of t to be used in a confidence interval can be looked up in a table of the t
distribution. A small version of such a table is shown in Table 1. The first column, df,
stands for degrees of freedom, and for confidence intervals on the mean, df is equal to N - 1, where N is the sample size.
Table 1. Abbreviated t table.
df     0.95     0.99
2      4.303    9.925
3      3.182    5.841
4      2.776    4.604
5      2.571    4.032
8      2.306    3.355
10     2.228    3.169
20     2.086    2.845
50     2.009    2.678
100    1.984    2.626
You can also use the "inverse t distribution" calculator to find the t values to use in
confidence intervals. You will learn more about the t distribution in the next section.
Assume that the following five numbers are sampled from a normal distribution: 2, 3, 5,
6, and 9 and that the standard deviation is not known. The first steps are to compute the
sample mean and variance:
M = 5
$s^2 = 7.5$
The next step is to estimate the standard error of the mean. If we knew the population
variance, we could use the following formula:
$\sigma_M = \sigma/\sqrt{N}$
More generally, the formula for the 95% confidence interval on the mean is:
Lower limit = $M - (t_{CL})(s_M)$
Upper limit = $M + (t_{CL})(s_M)$
where M is the sample mean, $t_{CL}$ is the t for the confidence level desired (0.95 in the
above example), and $s_M$ is the estimated standard error of the mean.
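Applied to the five numbers above, the formula can be worked through as follows. This is a sketch that assumes the estimated standard error is computed as s divided by the square root of N and that looks up t in the 0.95 column of the abbreviated t table; the names `T_95` and `t_interval_95` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# 0.95 column of the abbreviated t table in the text (df -> t).
T_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 8: 2.306,
        10: 2.228, 20: 2.086, 50: 2.009, 100: 1.984}

def t_interval_95(sample):
    """95% confidence interval on the mean when sigma must be estimated."""
    n = len(sample)
    m = mean(sample)
    s_m = sqrt(variance(sample) / n)   # estimated standard error, s / sqrt(N)
    t_cl = T_95[n - 1]                 # raises KeyError if df is not in the short table
    return m - t_cl * s_m, m + t_cl * s_m

low, high = t_interval_95([2, 3, 5, 6, 9])
print(f"({low:.2f}, {high:.2f})")
```

With M = 5, an estimated standard error of about 1.225, and t = 2.776 for 4 degrees of freedom, the interval runs from about 1.60 to 8.40, noticeably wider than the interval obtained when the standard deviation is known.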
We will finish with an analysis of the Stroop Data. Specifically, we will compute a
confidence interval on the mean difference score. Recall that 47 subjects named the color
of ink that words were written in. The names conflicted so that, for example, they would
name the ink color of the word "blue" written in red ink. The correct response is to say
"red" and ignore the fact that the word is "blue." In a second condition, subjects named
the ink color of colored rectangles.
Confidence Intervals
In statistical inference, one wishes to estimate population parameters using observed
sample data.
A confidence interval gives an estimated range of values which is likely to include an
unknown population parameter, the estimated range being calculated from a given set of
sample data. (Definition taken from Valerie J. Easton and John H. McColl's Statistics
Glossary v1.1)
The common notation for the parameter in question is $\theta$. Often, this parameter is the
population mean $\mu$.
The level C of a confidence interval gives the probability that the interval produced by
the method employed includes the true value of the parameter $\theta$.
Example
Suppose a student measuring the boiling temperature of a certain liquid observes the
readings (in degrees Celsius) 102.5, 101.7, 103.1, 100.9, 100.5, and 102.2 on 6 different
samples of the liquid. He calculates the sample mean to be 101.82. If he knows that the
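standard deviation for this procedure is 1.2 degrees, what is the confidence interval for
the population mean?
For a population with unknown mean $\mu$ and known standard deviation $\sigma$, a confidence
interval for the population mean, based on a simple random sample (SRS) of size n, is
$\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$, where $z^*$ is the upper (1-C)/2 critical value for the
standard normal distribution.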
Note: This interval is only exact when the population distribution is normal. For large
samples from other population distributions, the interval is approximately correct by
the Central Limit Theorem.
In the example above, the student calculated the sample mean of the boiling temperatures
to be 101.82. The standard deviation of the sample mean is $\sigma/\sqrt{n} = 1.2/\sqrt{6} = 0.49$.
The critical value for a 95% confidence interval is 1.96, where (1-0.95)/2 = 0.025. A 95%
confidence interval for the unknown mean $\mu$ is
((101.82 - (1.96*0.49)), (101.82 + (1.96*0.49))) = (101.82 - 0.96, 101.82 +
0.96) = (100.86, 102.78).
As the level of confidence decreases, the size of the corresponding interval will decrease.
Suppose the student was interested in a 90% confidence interval for the boiling
temperature. In this case, C = 0.90, and (1-C)/2 = 0.05. The critical value z* for this level
is equal to 1.645, so the 90% confidence interval is ((101.82 - (1.645*0.49)), (101.82 +
(1.645*0.49))) = (101.82 - 0.81, 101.82 + 0.81) = (101.01, 102.63)
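The effect of the confidence level on the interval can be reproduced with a short Python sketch. It uses the six readings and the known standard deviation of 1.2 from the example; the function name `boiling_interval` is hypothetical, and `NormalDist` from the standard library supplies the critical values.

```python
from math import sqrt
from statistics import NormalDist, mean

READINGS = [102.5, 101.7, 103.1, 100.9, 100.5, 102.2]  # degrees Celsius, from the text
SIGMA = 1.2                                             # known standard deviation

def boiling_interval(C):
    """Level-C interval x_bar +/- z* * sigma / sqrt(n) for the readings above."""
    x_bar = mean(READINGS)
    z_star = NormalDist().inv_cdf(1 - (1 - C) / 2)  # upper (1-C)/2 critical value
    margin = z_star * SIGMA / sqrt(len(READINGS))
    return x_bar - margin, x_bar + margin

for C in (0.90, 0.95, 0.99):
    low, high = boiling_interval(C)
    print(f"C = {C:.2f}: ({low:.2f}, {high:.2f})")
```

Because this sketch carries unrounded intermediate values, its limits can differ from the hand calculation in the last decimal place.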
An increase in sample size will decrease the length of the confidence interval without
reducing the level of confidence. This is because the standard deviation of the sample
mean decreases as n increases. The margin of error m of a confidence interval is defined
to be the value added or subtracted from the sample mean which determines the length of
the interval: $m = z^* \frac{\sigma}{\sqrt{n}}$.
Suppose in the example above, the student wishes to have a margin of error equal to 0.5
with 95% confidence. Substituting the appropriate values into the expression for m and
solving for n gives the calculation $n = (1.96 \cdot 1.2/0.5)^2 = (2.35/0.5)^2 = 4.7^2 = 22.09$. To
achieve a 95% confidence interval for the mean boiling point with total length less than 1
degree, the student will have to take 23 measurements.
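Solving the margin-of-error expression for n can be sketched as a small helper. The function name `required_sample_size` is hypothetical, and rounding up with `ceil` reflects that the number of measurements must be a whole number.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, level=0.95):
    """Smallest n with z* * sigma / sqrt(n) <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((z_star * sigma / margin) ** 2)

n = required_sample_size(sigma=1.2, margin=0.5, level=0.95)
print(n)
```

For sigma = 1.2 and a margin of 0.5 at 95% confidence this gives 23, matching the text.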
deviation s, also known as the standard error. Since the standard error is an estimate for
the true value of the standard deviation, the distribution of the sample mean
is no
N      Mean     Median   Tr Mean   StDev   SE Mean
130    98.249   98.300   98.253    0.733   0.064

Min      Max       Q1       Q3
96.300   100.800   97.800   98.700
To find a 95% confidence interval for the mean based on the sample mean 98.249 and
sample standard deviation 0.733, first find the 0.025 critical value t* for 129 degrees of
freedom. This value is approximately 1.984, the critical value for 100 degrees of freedom
(found in Table E in Moore and McCabe). The estimated standard deviation for the
sample mean is 0.733/sqrt(130) = 0.064, the value provided in the SE MEAN column of
the MINITAB descriptive statistics. A 95% confidence interval, then, is approximately
((98.249 - 1.984*0.064), (98.249 + 1.984*0.064)) = (98.249 - 0.127, 98.249 + 0.127) =
(98.122, 98.376).
For a more precise (and more simply achieved) result, the MINITAB "TINTERVAL"
command, written as follows, gives an exact 95% confidence interval for 129 degrees of
freedom:
MTB > tinterval 95 c1
Confidence Intervals

Variable     N      Mean    StDev  SE Mean       95.0 % CI
TEMP       130   98.2492   0.7332   0.0643  ( 98.1220, 98.3765)
According to these results, the usual assumed normal body temperature of 98.6 degrees
Fahrenheit is not within a 95% confidence interval for the mean.
Data source: Data presented in Mackowiak, P.A., Wasserman, S.S., and Levine, M.M.
(1992), "A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body
Temperature, and Other Legacies of Carl Reinhold August Wunderlich," Journal of the
American Medical Association, 268, 1578-1580. Dataset available through the JSE
Dataset Archive.
For some more definitions and examples, see the confidence interval index in Valerie J.
Easton and John H. McColl's Statistics Glossary v1.1.