
BASIC STATISTICS:

A USER ORIENTED APPROACH


(Manuscript)

Spyros MAKRIDAKIS

and
Robert L. WINKLER

CHAPTER 11
HYPOTHESIS TESTING: MEANS AND PROPORTIONS
11.1 Introduction

In Chapters 8 and 9 we discussed problems of estimation, where we want to come up with an estimate of a parameter and to have some idea of the accuracy of estimation. No specific values of the parameter are singled out in advance for special attention. We simply take a sample and calculate the desired point or interval estimates.

Another form of statistical inference is called hypothesis testing. In hypothesis testing, a specific value or set of values of a parameter is singled out in advance. For example, if a car manufacturer claims that the mean gasoline mileage obtained by a certain model in city driving is 32 miles per gallon, the value μ = 32 is being singled out. Claims such as the statement that μ = 32 are called hypotheses.

Suppose that the members of a consumer group are suspicious about the manufacturer's claim that μ = 32. In particular, they suspect that the mean gasoline mileage in city driving for this model is in fact less than 32 miles per gallon. This is another hypothesis: the hypothesis that μ < 32. The manufacturer's claim that μ = 32 and the consumer group's suspicion that μ < 32 can be viewed as competing hypotheses. Instead of simply asking what the mean gasoline mileage is equal to, as in an estimation problem, we are focusing upon the value 32 and asking whether μ = 32 or μ < 32.

In order to test the manufacturer's claim (or hypothesis) that μ = 32 against their own hypothesis that μ < 32, the members of the consumer group could design an experiment to gather information about the fuel economy of the type of car in question. For example, they could take a sample of 40 cars of the same make and model and record the gasoline mileage obtained by each car under city driving conditions over a period of one month. Suppose that they do so and find that the sample mean turns out to be x = 30.21 miles per gallon. This is less than the manufacturer's claim that μ = 32. However, we know that because of sampling fluctuations, the sample mean x is not expected to equal the population mean exactly. Is it likely that a sample mean of 30.21 would occur even though the population mean is actually 32? Is 30.21 far enough below 32 to cast serious doubt on the manufacturer's claim, or is the difference well within the range of what might be expected due to chance fluctuations?

The sampling distribution of the sample mean x tells us how much x is likely to fluctuate if μ is really 32. Thus, the sampling distribution of x helps us to answer the above questions. From the sampling distribution, the consumer group can decide how much below 32 the sample mean must be in order for them to reject the manufacturer's claim. If x is low enough, they will reject the hypothesis that μ = 32. If x is not low enough, on the other hand, they will be unable to reject the hypothesis that μ = 32.

In this chapter, we will discuss hypothesis testing involving means and proportions. First, some general concepts and terminology of hypothesis testing are introduced in Section 11.2. Hypothesis testing for means is then covered in Sections 11.3 and 11.4, followed by hypothesis testing for proportions in Section 11.5. In Section 11.6 the idea of a significance level is presented.

11.2 The Nature of Hypothesis Testing

Hypotheses are simply statements, or claims, and hypothesis testing is a set of procedures to help
us decide whether or not to accept such claims or decide which of two competing claims to
believe. For example, think about a courtroom trial. The prosecution makes the claim that a
defendant is guilty of violating a law, while the defense states that the defendant is innocent.
These are two competing hypotheses, and the objective of a courtroom trial is to decide whether
or not to accept the prosecution's hypothesis that the defendant is guilty. Thus, a trial is a familiar
situation which provides a good analogy to statistical hypothesis testing.

In the system of justice used in the U.S., the guiding principle is supposed to be that the
defendant is "presumed innocent until proven guilty". That is, we start out with the notion that
the defendant is innocent. In the terminology of hypothesis testing, we call the claim of

innocence a null hypothesis. This can often be thought of as representing the status quo. The
prosecution offers a competing claim: the defendant is guilty. This is an alternative to our null
hypothesis of innocence, and we call the claim of guilt an alternative hypothesis. A shorthand
way of summarizing this is to say that we are testing

HO : the defendant is innocent


against

HA: the defendant is guilty.

The notation HO stands for "null hypothesis". In numerical terms, null means zero; hence the "O"
subscript in HO. Similarly, HA stands for "alternative hypothesis".

The outcome of a courtroom trial is a decision. The defendant is either acquitted or convicted of the charges brought by the prosecution. Conviction means that the jury (in the case of a jury trial) or the judge has decided to reject the null hypothesis HO in favor of the alternative hypothesis HA. Acquittal means that the judge or jury has decided that the evidence presented by the prosecution is not sufficient to warrant rejection of the null hypothesis of innocence. In this case, some would say that we accept the null hypothesis. Others would say that acceptance is too strong. The judge or the members of the jury may not be fully convinced that the defendant is innocent, but there is not sufficient evidence to reject the hypothesis of innocence. As a result, some would prefer to say "we fail to reject" the null hypothesis.

We will generally use the terminology "accept HO" and "reject HO" to represent the two choices in hypothesis-testing situations. "Fail to reject HO" is a somewhat more awkward expression than "accept HO". However, you should keep in mind that "accept HO" does not necessarily mean that we literally believe that HO is true. Sometimes it means that we do not have enough evidence to reject HO or that we are reserving judgment.

It would be nice if judges and juries never made mistakes - that is, if all guilty parties were convicted and all innocent parties were acquitted. Unfortunately, when decisions must be made in the face of uncertainty, mistakes are always possible. The four possible situations are shown in Table 11.1. In two of the situations, the decision turns out to be correct:

- acquitting an innocent person (accepting HO when HO is in fact true)


- convicting a guilty person (rejecting HO when HO is false and HA is true).

In the other two situations, the decision is not correct:

- convicting an innocent person (rejecting HO when HO is really true)


- acquitting a guilty person (accepting HO when HO is really false and HA is true).
Rejecting HO when HO is true is called a Type I error; accepting HO when HO is false is called a Type II error. The probability of a Type I error is denoted by α (the Greek letter alpha), and the probability of a Type II error is denoted by β (the Greek letter beta). Ideally, if we could determine guilt or innocence for certain, with no doubt whatsoever, we could guarantee correct decisions. Then we would have α = 0 and β = 0. This is, of course, seldom possible. However, we can try to make α and β small. In a courtroom trial, this is done by listening to evidence that has a bearing on the case while attempting to bar certain types of evidence that may be misleading. In statistical hypothesis testing, we gather evidence by taking a sample, and the larger the sample size, the better able we are to reduce the error probabilities α and β.
The two types of error have different implications, and one may be viewed as more serious than the other. In the U.S. justice system, it is generally acknowledged that convicting an innocent person is a more serious error than acquitting a guilty person. Thus, with a Type I error more serious than a Type II error, we are especially concerned about keeping α small. This does not mean that the size of β is unimportant, but it means that more attention is given to α. The notion that guilt must be proven beyond a reasonable doubt in order to convict a person reflects the importance of keeping α small.

Table 11.1: Correct Decisions and Errors in Hypothesis Testing.

                                 ALTERNATIVES
                       HO is true           HO is false (HA is true)
DECISIONS
   Accept HO           Correct decision     Type II error (probability β)
   Reject HO           Type I error         Correct decision
                       (probability α)

How is the decision to accept or reject actually made in hypothesis testing? The courtroom trial
analogy does not help us here, because the decision is made in a relatively informal manner by
the judge or the jury, taking into account the law and the evidence presented during the trial. In
statistical hypothesis testing, we develop a more formal decision rule that tells us when to accept
and when to reject the null hypothesis HO. This decision rule is based on a test statistic which is
computed from the sample. In the remainder of this section, we will take the concepts developed
in the courtroom trial example and discuss them in terms of statistical hypothesis testing.
Specific procedures for means and proportions will then be developed in more detail in Sections
11.3 to 11.5.

The first step in hypothesis testing is the formulation of the null and alternative hypotheses. In
the courtroom trial situation, innocence is taken as the null hypothesis because of the "innocent
until proven guilty" dictum. Under a different justice system with the burden of proof placed on
the defendant instead of on the prosecution, the hypothesis that the defendant is guilty might be
taken as the null hypothesis.

In statistical hypothesis testing, hypotheses usually involve population parameters. For instance, consider the example given in Section 11.1. The hypotheses are

HO: μ = 32

and

HA: μ < 32,

where μ is the mean gasoline mileage (in miles per gallon) obtained by a certain model of car in city driving. The manufacturer's claim that μ = 32 is taken as the null hypothesis, and the consumer group's suspicion that μ < 32 is HA. Suppose that we turned the tables, with the consumer group claiming that μ = 32 and the manufacturer feeling that μ > 32. The manufacturer could test

HO: μ = 32
versus HA: μ > 32.

Rejection of HO would provide the manufacturer with ammunition to dispute the consumer group's claim.

Suppose that you are given a coin and you suspect that it is "loaded" to make one side more likely to come up than the other when the coin is tossed. The null hypothesis is that the coin is fair. This can be represented as p = 1/2, where p is the probability that heads comes up. The alternative is that the coin is loaded, or not fair. But the coin is loaded if heads is more likely than tails (p > 1/2) or if heads is less likely than tails (p < 1/2). That is, the coin is loaded if p ≠ 1/2. Thus, the hypotheses are

HO: p = 1/2

and

HA: p ≠ 1/2.

Certain achievement tests are given at regular intervals in many school systems. Suppose that the standard deviation of scores is ten in a test given to high school juniors in many schools across the country. The superintendent of a particular school system might be interested in whether or not the variability of scores in that school system differs from the national norms. The null hypothesis is provided by the national standard deviation of ten, and the alternative includes both the possibility of less variability (σ < 10) and the possibility of greater variability (σ > 10). Thus, we have

HO: σ = 10
versus HA: σ ≠ 10.
In all of these examples, the null hypothesis specifies a single value of the parameter (μ = 32, p = 1/2, σ = 10). The alternative hypotheses differ in form, however. For the first example, the consumer group is particularly interested in the possibility that the manufacturer has overstated the mean gasoline mileage. Hence, the alternative hypothesis is μ < 32, and the test is called a one-sided, or one-tailed, test to the left. On a graph, values of μ less than 32 are to the left of μ = 32; see Figure 11.1.

For the second example involving the mean gasoline mileage, the null hypothesis is once again μ = 32, but the alternative hypothesis is μ > 32. Now the test is called a one-sided, or one-tailed, test to the right. The values in HA are to the right of the null hypothesis that μ = 32, as shown in Figure 11.2.

If we just wanted to test whether μ was equal to 32 or different from 32, then we would have

HO: μ = 32
versus HA: μ ≠ 32.

This is called a two-sided, or two-tailed, test, because the alternative hypothesis HA contains values to the left of μ = 32 and values to the right of μ = 32 (see Figure 11.3). The examples involving the coin and the variability of test scores are also two-tailed tests, because the alternative hypotheses are p ≠ 1/2 and σ ≠ 10.

Should a test be one-tailed to the left, one-tailed to the right, or two-tailed? That depends on HA, which, in turn, depends on the situation. Sometimes we are particularly interested in values in just one direction from the null hypothesis. For instance, we might be interested in whether a modification in working conditions leads to an increase in average productivity; whether the proportion of voters intending to vote for a certain candidate is greater than one half; or whether a new drug decreases the death rate from a rare form of cancer. One-tailed tests are indicated in these situations. Other times we may just want to know if a difference exists, without caring in which direction the difference lies. For instance, we might ask whether the average productivity in a particular task is different for French workers than for American workers; whether the proportion of male respondents in a given survey is different from one half; or whether a new drug causes a change in body temperature. Two-tailed tests seem appropriate for these cases.

Once HO and HA have been formulated, we need to determine the appropriate test statistic to be computed from the data. In testing hypotheses about μ, the test statistic will involve the sample mean x, as you would expect. If the test is about a proportion p, the sample proportion x/n will be used in computing the relevant test statistic. More details will be given when tests for specific parameters are discussed in the following sections of this chapter and in later chapters.

The next step is to specify a decision rule, which involves a rejection region. A rejection region is a set of values of the test statistic that will cause you to reject the null hypothesis HO. Consider the gas mileage example, with

HO: μ = 32
versus HA: μ < 32.

Here we will reject HO if x is sufficiently low, since low values favor HA and high values favor HO. An x of 24 would seem to favor HA, while x = 35 makes HO look better than HA. This sounds reasonable, but what is "sufficiently low"? Should we reject whenever x ≤ 31.5, whenever x ≤ 30.0, only when x ≤ 25.0, or for yet another set of values of x? In other words, what rejection region should be chosen?

Different rejection regions yield different error probabilities, and our choice of a rejection region should be based on these probabilities. In the gas mileage example, "x ≤ 25" is a more stringent rejection region than "x ≤ 30" and provides a smaller α but a greater chance of a Type II error. A common practice is to choose a small value of α, such as 0.01 or 0.05, and to find a rejection region that yields the desired α. This approach will be illustrated in detail when tests for specific parameters are discussed.

The form of the rejection region depends on HA. In the above example, we have HA: μ < 32, and we reject HO if x is sufficiently low. With a one-tailed test to the right, such as

HO: μ = 32
versus HA: μ > 32,

we reject HO if x is sufficiently high. With a two-tailed test, we reject HO if x is sufficiently different from 32 (that is, if x is sufficiently high or sufficiently low). The direction of the rejection region (to the left, to the right, or both sides) is the same as the direction of the alternative hypothesis HA.

The specification of a test statistic and rejection region completes the setting up of the test. All that remains is to calculate the test statistic from the data. If it falls in the rejection region, HO should be rejected; if it does not fall in the rejection region, HO should be accepted. Remember that accepting HO does not necessarily mean that we believe that HO is true. It may simply imply that we do not have enough evidence to reject HO or that we are reserving judgment.

In this section we have attempted to give you some idea of the nature of hypothesis testing. Much of the terminology of hypothesis testing has been introduced, although more terms will be defined and discussed in later sections. Now we are ready to apply the ideas of hypothesis testing to tests involving a mean μ (Sections 11.3 and 11.4) and a proportion p (Section 11.5). If you gain an appreciation of the general idea of hypothesis testing from this chapter and you can follow the specific hypothesis-testing procedures for means and proportions, then other tests developed in later chapters will seem like "variations on a theme" and should not be difficult to understand.

11.3 Hypothesis Testing: Means (Large Samples)

Armed with some general ideas about hypothesis testing from the previous section, you are now ready to consider specific methods for dealing with hypotheses about a mean μ. We will begin by analyzing the gasoline mileage example with hypotheses

HO: μ = 32

and

HA: μ < 32.

Recall that we use the sample mean x as a point estimate of μ. Moreover, from the Central Limit Theorem, which was covered in Section 7.4, the sampling distribution of x is approximately normal if the sample size is not too small. Also, the standard error of the mean is σ/√n. Therefore,

z = (x − μ)/(σ/√n)

has approximately a standard normal distribution. All we are doing here is standardizing x (subtracting its mean μ and dividing by its standard error σ/√n) and invoking the Central Limit Theorem to justify the claim of normality.

In the gasoline mileage example, suppose that the sample consists of the gasoline mileage for each of 40 cars and that the standard deviation of gasoline mileage for city driving with this type of car is σ = 4.5 miles per gallon. Therefore, the standard error of the mean is

σ/√n = 4.5/√40 = 0.71 miles per gallon.

Now, if HO is true (that is, if μ = 32), then

z = (x − μ)/(σ/√n) = (x − 32)/(4.5/√40) = (x − 32)/0.71.

What if we reject whenever x ≤ 31? The rejection region x ≤ 31 corresponds to

z ≤ (31 − 32)/0.71 = −1/0.71 = −1.41.

From the cumulative normal probabilities given in Table 3, the probability that z ≤ −1.41 is

P(z ≤ −1.41) = P(z ≥ 1.41) = 1 − P(z ≤ 1.41) = 1 − 0.92 = 0.08.

But this is the probability that x ≤ 31 given that HO is true (since μ = 32 was used to find z = −1.41). If the rejection region is x ≤ 31, this is the probability of rejecting HO when HO is true, which is the error probability α. We can represent α as an area in the left tail of the normal distribution, as shown in Figure 11.4.

What does this result mean? It means that when we use the rejection region x ≤ 31 (or equivalently, z ≤ −1.41), if HO is true then the probability of rejecting HO is 0.08. Even though the true mean is μ = 32, sampling fluctuations are such that the chance of a sample mean of 31 or less is 0.08. If we were to take repeated samples of size 40 from a population with μ = 32 and σ = 4.5, 8 percent of the samples would yield sample means of 31 or less.
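The α calculation above can be checked numerically. The following sketch (Python is not part of the text; the variable names are illustrative) uses the standard library's normal distribution to reproduce the error probability for the rejection region x ≤ 31:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 32, 4.5, 40        # hypothesized mean, known sd, sample size
se = sigma / sqrt(n)               # standard error of the mean, about 0.71
z_cutoff = (31 - mu0) / se         # rejection region x <= 31 in z units, about -1.41
alpha = NormalDist().cdf(z_cutoff) # P(z <= -1.41), about 0.08
print(round(se, 2), round(z_cutoff, 2), round(alpha, 2))
```

The computed α agrees with the 0.08 obtained from Table 3.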

Perhaps α = 0.08 is judged to be too great a risk of a Type I error in this case. How can we change the rejection region to make α = 0.01? Since this is a one-tailed test to the left, α is a left-hand tail area under the normal curve, as in Figure 11.4. To make the area smaller, the critical value, which is the dividing line between acceptance and rejection, needs to be shifted to the left. The critical value of z is −1.41 in Figure 11.4. If we want α = 0.01, the critical value of z should be the first percentile of z. From Table 3, the first percentile of z is −2.33, since P(z > −2.33) = 0.99. Thus,

P(z ≤ −2.33) = 0.01,

and a rejection region of z ≤ −2.33 gives us an α of 0.01, as shown in Figure 11.5. Now we have a decision rule with α = 0.01:

reject HO if z ≤ −2.33,
accept HO if z > −2.33.

Suppose that the sample mean for the sample of 40 cars turns out to be x = 30.21 miles per gallon. The computed value of the test statistic z is then

z = (30.21 − 32)/0.71 = −1.79/0.71 = −2.52.

Since this is less than the critical value of z, −2.33, we reject the manufacturer's claim that μ = 32 in favor of the alternative hypothesis that μ < 32. Alternatively, the rejection region can be expressed in terms of x; z ≤ −2.33 corresponds to

x ≤ 32 − 2.33(0.71),

or

x ≤ 30.35.

The observed sample mean, 30.21, is in the rejection region.

How can we interpret the results of the test? With α = 0.01, the observed sample mean of 30.21 miles per gallon is far enough below 32 to make us reject HO. The difference between 30.21 and 32 is not within the range of what might be expected due to chance fluctuations.
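The complete one-tailed test to the left can be sketched in a few lines (again an illustrative Python sketch, not from the text), computing both the test statistic and the critical value at α = 0.01:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 32, 4.5, 40, 0.01
xbar = 30.21                          # observed sample mean
z = (xbar - mu0) / (sigma / sqrt(n))  # test statistic, about -2.52
z_crit = NormalDist().inv_cdf(alpha)  # left-tail critical value, about -2.33
reject = z <= z_crit                  # True: reject HO in favor of mu < 32
print(round(z, 2), round(z_crit, 2), reject)
```

The result matches the decision reached above: z = −2.52 falls in the rejection region z ≤ −2.33.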

Let us consider another example. Suppose that a study indicates that the mean IQ score among college students in the U.S. is 115 and the standard deviation is 10. The president of a large state university feels that the mean IQ at that university is higher than 115. Unfortunately, IQ tests are not administered routinely and are not included in students' files at the university. Therefore, the president decides to gather some data to test

HO: μ = 115

and

HA: μ > 115,

where μ is the mean IQ score among students at the university. A random sample of 50 students is selected, and these 50 students take an IQ test.

The test statistic in this example is

z = (x − 115)/(10/√50) = (x − 115)/1.41,

since σ = 10, n = 50, and the hypothesized mean in HO is 115. Furthermore, since this is a one-tailed test to the right (because HA is μ > 115), we will reject for large values of z. If α = 0.05, the critical value of z is the 95th percentile of the standard normal distribution, which is 1.64 (from Table 3). The rejection region is z ≥ 1.64, as shown in Figure 11.6.


The IQ tests are administered, and the average IQ score for the 50 students is 116.72. The test statistic is

z = (116.72 − 115)/1.41 = 1.72/1.41 = 1.22.

But the rejection region is z ≥ 1.64. The observed x of 116.72 is not large enough to enable the president to reject HO. With α = 0.05, this x is within the bounds of what might be expected just by chance even if μ = 115. In order to reject HO when α = 0.05, we need to have z ≥ 1.64, which, when converted to x, corresponds to

x ≥ 115 + 1.64(1.41) = 117.31.

For this test, the sample mean IQ score must be at least 2.31 points above the hypothesized mean of 115 in order to reject HO.

The gasoline mileage example and the IQ example illustrate one-tailed tests to the left and right, respectively. How about a two-tailed test involving a mean μ? Consider a small firm in the French province of Brittany. The firm bottles and sells the region's famous apple cider. In the process, a machine that automatically fills bottles with apple cider is used. The bottles are supposed to hold one liter (100 centiliters) each, but the actual amount varies slightly from bottle to bottle because of chance variation. Extensive measurements of such variation in the past indicate that the standard deviation is about 1.5 centiliters, or 0.015 liters. The manager of the firm is concerned about μ, the mean amount of cider (in centiliters) per bottle. If μ > 100, then the bottles are being overfilled on the average, and such overfilling is costly to the firm. On the other hand, if μ < 100, then the bottles tend not to hold as much as the label claims, which is also an undesirable state of affairs since it is not fair to the consumer. The firm wants μ to equal 100 centiliters and would want to stop the bottling process and adjust the machine to eliminate any deviations from μ = 100.

The hypotheses in this example are

HO: μ = 100

and

HA: μ ≠ 100.

Suppose that the firm takes a sample of 30 bottles and measures the amount of cider in each bottle. The measurement process is very accurate, providing accuracy to the nearest hundredth of a centiliter. Thus, any variation in the measurement process can be ignored for all practical purposes.

The test statistic is

z = (x − 100)/(1.5/√30) = (x − 100)/0.27.

Let α = 0.05. In the previous examples, HO was to be rejected only for low values of z (the gasoline mileage example) or only for high values of z (the IQ example). This is a two-tailed test, however, and we should reject HO if z deviates too much from zero in either direction. Now α is not represented by the area in one tail of the standard normal distribution, as in Figures 11.4, 11.5, and 11.6. Instead, the α of 0.05 is split into an area of 0.025 in the left tail and an area of 0.025 in the right tail. From Table 3, the value z = 1.96 cuts off an area of 0.025 in the right tail. By the symmetry of the normal distribution, z = −1.96 cuts off an area of 0.025 in the left tail. Thus, the decision rule is as follows:

reject HO if z ≤ −1.96 or z ≥ 1.96,

accept HO otherwise (that is, if −1.96 < z < 1.96).

This is illustrated in Figure 11.7. Using the notation introduced in the discussion of interval estimation in Chapter 8, zα/2 = 1.96.

The sample of 30 bottles is taken and the amount of cider in each bottle is measured. The sample mean of the 30 measurements is x = 100.93. On average, the machine appears to be overfilling the bottles by almost a centiliter per bottle.

Is this much overfilling likely to occur by chance in a sample of n = 30, or is it an indication that the machine needs adjusting? Let's compute the test statistic z:

z = (100.93 − 100)/0.27 = 3.44.

But z = 3.44 is much larger than the critical value of z = 1.96 in Figure 11.7. This much overfilling is highly unlikely to occur by chance in a sample of n = 30, and HO should be rejected.
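The two-tailed version of the test differs only in how the rejection region is formed. A sketch of the cider example (illustrative Python, not from the text; note that the 3.44 above uses the rounded standard error 0.27, while the exact standard error gives z of about 3.40, leading to the same decision):

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 100, 1.5, 30, 0.05
xbar = 100.93
se = sigma / sqrt(n)                           # exact standard error, about 0.274
z = (xbar - mu0) / se                          # about 3.40 (text's 3.44 uses se rounded to 0.27)
z_half = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value, about 1.96
reject = abs(z) >= z_half                      # True: reject HO, adjust the machine
print(round(z, 1), round(z_half, 2), reject)
```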

A review of the procedures discussed in this section is in order.

Step 1. Formulate the hypotheses. The null hypothesis is of the form

HO: μ = μ0,

where μ0 stands for the specific value given in HO (μ0 = 115 in the IQ example, for instance). We have considered three types of alternative hypotheses:

HA: μ < μ0 (one-tailed test to the left),

HA: μ > μ0 (one-tailed test to the right),

HA: μ ≠ μ0 (two-tailed test).

Step 2. Determine the appropriate test statistic. The test statistic used in this section is

z = (x − μ0)/(σ/√n).

Step 3. Specify a rejection region. For a given value of α, the decision rule is:

reject HO if z ≤ −zα for a one-tailed test to the left;

reject HO if z ≥ zα for a one-tailed test to the right;

reject HO if z ≤ −zα/2 or if z ≥ zα/2 for a two-tailed test.

Here zα represents the value of z cutting off the area α in the right tail of the normal curve (and zα/2, of course, cuts off α/2 in the right tail of the normal curve). When α = 0.05, for example, zα = 1.64 and zα/2 = 1.96; when α = 0.01, zα = 2.33 and zα/2 = 2.58.

If you prefer, the rejection region can be expressed in terms of x:

reject HO if x ≤ μ0 − zα σ/√n for a one-tailed test to the left;

reject HO if x ≥ μ0 + zα σ/√n for a one-tailed test to the right;

reject HO if x ≤ μ0 − zα/2 σ/√n or if x ≥ μ0 + zα/2 σ/√n for a two-tailed test.


Step 4. Compute the values of the sample mean x and the test statistic z from the data. If the
observed z falls in the rejection region, reject HO. Otherwise, accept HO.
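The four steps can be collected into one small helper covering all three forms of HA. This is an illustrative Python sketch (the function name and arguments are not from the text), applied to the three examples of this section:

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha=0.05, tail="two"):
    """Large-sample z test for a mean; tail is 'left', 'right', or 'two'.

    Returns the rounded test statistic and whether HO should be rejected."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    nd = NormalDist()
    if tail == "left":
        reject = z <= nd.inv_cdf(alpha)          # z <= -z_alpha
    elif tail == "right":
        reject = z >= nd.inv_cdf(1 - alpha)      # z >= z_alpha
    else:
        reject = abs(z) >= nd.inv_cdf(1 - alpha / 2)  # |z| >= z_{alpha/2}
    return round(z, 2), reject

# The three worked examples of this section:
gas = z_test(30.21, 32, 4.5, 40, alpha=0.01, tail="left")    # gasoline mileage
iq = z_test(116.72, 115, 10, 50, alpha=0.05, tail="right")   # IQ scores
cider = z_test(100.93, 100, 1.5, 30, alpha=0.05, tail="two") # cider bottles
print(gas, iq, cider)
```

Each call reproduces the section's decision: reject for the mileage and cider examples, accept for the IQ example.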

It is important to keep in mind the fact that this test is based on the normality of z. This is justified by the Central Limit Theorem, which holds for large samples; n ≥ 30 is often used as a rule of thumb, although smaller values of n may work if the population distribution is symmetric and "mound-shaped" like the normal distribution. As a result, the test given in this section is called a large-sample test. Tests for μ when the sample is small will be discussed in the next section, Section 11.4.

Another point worth noting is the fact that we need to know the population standard deviation σ in order to compute the test statistic z. Given that we do not know μ (we are testing hypotheses about μ), are we likely to be able to specify σ? In some cases, past data may provide reliable information about σ. However, often we have no more than a rough guess about σ. Since the test is a large-sample test, we are saved by the fact that the sample standard deviation s, which can be computed from the formula

s = √[ Σ (xi − x)² / (n − 1) ]  (the sum running from i = 1 to n),

provides a good estimate of σ for large samples. Therefore, s can be used in place of σ when we calculate the test statistic z for large samples.

In the gasoline-mileage example, suppose that the standard deviation σ is not known. From the data (the miles per gallon obtained from each of the 40 cars), the standard deviation is found to be s = 4.16. (We already computed x = 30.21.) With s used in place of σ, the test statistic is

z = (30.21 − 32)/(4.16/√40) = −1.79/0.66 = −2.71.

This is less than the critical value of z = −2.33 (see Figure 11.5), and HO should therefore be rejected.

11.4 Hypothesis Testing: Means (Small Samples)

Next we consider tests of hypotheses about μ for small samples from normally distributed populations. Suppose that the US Environmental Protection Agency (EPA), as part of a review of national air quality standards, is interested in the level of ozone in the atmosphere in a particular region. In particular, the hypotheses

HO: μ = 0.10
and HA: μ > 0.10

are of interest, where μ represents the mean level of ozone in parts per million (ppm). Twelve measurements of ozone level are taken at randomly selected points in the region. The sample mean is 0.117 ppm, and the sample standard deviation is 0.016 ppm.

The test statistic is

t = (x − 0.10)/(s/√n),

and the number of degrees of freedom is 12 − 1 = 11. If α = 0.05, the rejection region for this one-tailed test to the right consists of the values of t in the upper 5 percent of the t distribution; from Table 4, the rejection region is t ≥ 1.796.
From the sample results,

t = (0.117 − 0.10)/(0.016/√12) = 0.017/0.0046 = 3.70.

Since this value is larger than 1.796, we reject the null hypothesis that the mean level of ozone is 0.10 ppm in favor of the alternative hypothesis that the mean level is greater than 0.10 ppm.
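The small-sample computation differs from the large-sample case only in using s and a t critical value from Table 4 (the Python standard library has no t distribution, so the tabled value 1.796 for 11 degrees of freedom is hardcoded here; the sketch is illustrative, not from the text):

```python
from math import sqrt

# EPA ozone example: HO: mu = 0.10 versus HA: mu > 0.10
xbar, s, n, mu0 = 0.117, 0.016, 12, 0.10
t = (xbar - mu0) / (s / sqrt(n))  # about 3.68 (the text's 3.70 rounds the denominator to 0.0046)
t_crit = 1.796                    # upper 5% point of t with n - 1 = 11 df, from Table 4
reject = t >= t_crit              # True: reject HO
print(round(t, 2), reject)
```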

The procedure used in testing hypotheses concerning μ for small samples from normally distributed populations can be summarized as follows.

Step 1. Formulate the hypotheses. The null hypothesis is of the form

HO: μ = μ0,

and the alternative hypothesis is either

HA: μ < μ0 (one-tailed test to the left),

HA: μ > μ0 (one-tailed test to the right),

or HA: μ ≠ μ0 (two-tailed test).

Step 2. Determine the appropriate test statistic. The test statistic for small-sample tests involving μ is

t = (x − μ0)/(s/√n),

with n − 1 degrees of freedom.

Step 3. Specify a rejection region. For a given value of α, the decision rule is:

reject HO if t ≤ −tα,n−1 for a one-tailed test to the left;

reject HO if t ≥ tα,n−1 for a one-tailed test to the right;

reject HO if t ≤ −tα/2,n−1 or if t ≥ tα/2,n−1 for a two-tailed test.

Here tα represents the value of t cutting off the area α in the right tail of the t curve (and tα/2, of course, cuts off α/2 in the right tail).

In terms of x, the rejection region is:

reject HO if x ≤ μ0 − tα,n−1 s/√n for a one-tailed test to the left;

reject HO if x ≥ μ0 + tα,n−1 s/√n for a one-tailed test to the right;

reject HO if x ≤ μ0 − tα/2,n−1 s/√n or if x ≥ μ0 + tα/2,n−1 s/√n for a two-tailed test.
Step 4. Compute the values of the sample mean x and the sample standard deviation s from the data. Then compute t. Reject HO if the observed t falls in the rejection region; accept HO otherwise.

11.5 Hypothesis Testing: Proportions

When only two categories are of interest (such as a question that can only be answered yes or no, a part that is good or defective, a person who has a disease or does not have it, or a coin that can come up heads or tails), the parameter we deal with is a proportion (the proportion of yes answers, the proportion of defective parts, the proportion of people with the disease, the proportion of times heads comes up). In Sections 8.4 and 8.5 we discussed point and interval estimation for proportions. Now we will consider the testing of hypotheses about proportions. As in the discussion of hypothesis testing for μ, we will give some examples and then provide a summary of the procedure.

Suppose that for a particular form of cancer, the cure rate using the standard treatment has been
0.60. That is, 60 percent of the patients with the disease have been cured in the sense that they
are free of the disease for at least five years following the treatment. A group of medical
researchers have developed a new treatment for the disease, and they claim that their treatment
leads to a cure rate higher than 0.60. The hypotheses, then, are

HO: p=0.60

and HA: p > 0.60,

where p represents the cure rate with the new treatment.

The researchers report that the new treatment has been tried on a number of patients at various
clinics. Among those patients treated at least five years ago, 47 out of 61 have been cured. The
cure rate for these patients is x/n = 47/61 = 0.7705. This is encouraging, but is it enough larger
than 0.60 to make us reject HO and conclude that the new treatment is indeed more effective at
curing the disease?

The number of patients cured, x, has a binomial distribution, and if n is not too small we can use
a normal approximation to the binomial distribution. As in Sections 8.4 and 8.5, we will work
with the sample proportion x/n instead of x. The mean of x/n is p and the standard deviation of
the sampling distribution (i.e., the standard error) of x/n is √(p(1 − p)/n). Therefore, when we
standardize x/n by subtracting its mean and dividing by its standard error, the z statistic we wind
up with is

z = (x/n − p) / √(p(1 − p)/n).

Just as in testing hypotheses about μ, our test statistic is a standard normal z-statistic.
In the cancer example, n is large enough for the normal approximation to be used. The value of p
under the null hypothesis is 0.60, which means that np = 61(0.60) = 36.6 and n(1-p) = 61(0.40) =
24.4. In Section 5.4 we suggested as a rule of thumb that the normal distribution provides a
reasonable approximation to the binomial distribution when np ≥ 5 and n(1 − p) ≥ 5. For the
cancer example, np and n(1 − p) are both considerably larger than 5.

Let α = 0.05. As in the previous section, α can be represented as an area under the normal curve.

Since this is a one-tailed test to the right (HA consists of values of p greater than 0.60), the area
must be in the right tail of the distribution. If we want α = 0.05, the critical value of z should be
the 95th percentile of z (so that 95 percent of the area under the curve will be to the left of the
critical value and 5 percent to the right). From Table 3, the 95th percentile of z is 1.64, since
P(z ≤ 1.64) = 0.95. The rejection region is z ≥ 1.64, as shown in Figure 11.8.
Next we calculate the test statistic z. With 47 cured patients out of 61, we have

z = (47/61 − 0.60) / √(0.60(0.40)/61) = 0.1705/0.0627 = 2.72.

This is in the rejection region, since 2.72 > 1.64. The evidence is strong enough to make us
believe that the new treatment yields a cure rate better than 0.60.

How large would x/n have to be to make us reject H0 in this example? The rejection region of
z ≥ 1.64 corresponds to

x/n ≥ 0.60 + 1.64 √(0.60(0.40)/61),

or x/n ≥ 0.7029.

A cure rate in the sample of at least 70.29 percent of the patients leads to rejection of H0 when n
= 61, α = 0.05, and the null hypothesis indicates we should expect only 60 percent to be cured.
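The cancer-example calculation can be reproduced with a short Python sketch using only the standard library (the variable names are ours):

```python
import math

x, n = 47, 61              # patients cured, patients treated
p0, z_crit = 0.60, 1.64    # null value and 95th percentile of z (Table 3)

se = math.sqrt(p0 * (1 - p0) / n)   # standard error of x/n under H0
z = (x / n - p0) / se               # test statistic

# Smallest sample proportion that leads to rejection of H0
threshold = p0 + z_crit * se

print(round(z, 2), round(threshold, 4))   # 2.72 0.7029
```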
The hypotheses for a test of whether a coin is fair or loaded were set up in Section 11.2. They are

H0: p = 0.50

versus HA: p ≠ 0.50,

where p represents the proportion of times heads comes up. To test H0 versus HA for a particular
coin, the coin is tossed 100 times.

The normal approximation can be used, since np = n(1-p) = 100(0.50) = 50 for n = 100 and p =
0.50. Suppose that α = 0.10. For a two-tailed test, that means that the rejection region can be
represented by an area of 0.05 in the left tail and another area of 0.05 in the right tail. The critical

values of z are thus the 5th and 95th percentiles, which are -1.64 and 1.64 (see Figure 11.9). The
decision rule is:

reject H0 if z ≤ −1.64 or if z ≥ 1.64,

accept H0 otherwise (that is, if −1.64 < z < 1.64).
The 100 tosses of the coin result in 46 heads and 54 tails. The test statistic can be computed as
follows when x = 46 and n = 100:

z = (46/100 − 0.50) / √(0.50(0.50)/100) = −0.04/0.05 = −0.80.

This value is well within the acceptance region. Although we expect 50 heads in 100 tosses if the
coin is fair, 46 heads is not unusual enough to make us conclude that the coin is loaded.

For a third example, consider a union which is preparing for a strike vote. The union rules
indicate that a strike will be called only if at least 80 percent of the members vote in favor of
striking. The union's leaders, who recommend a strike, claim that the members support this
recommendation and that the required number of pro-strike votes will be obtained. A polling
organization decides to take a survey of 140 randomly chosen union members in order to test

H0: p = 0.80

versus HA: p < 0.80,

where p represents the proportion of union members who favor a strike. A value of 0.05 is
chosen for α, which means that H0 will be rejected if z ≤ −1.64, as shown in Figure 11.10.
Of the 140 members polled, 103 favor the strike and the remaining 37 are against the strike. A
point estimate of p is 103/140 = 0.7357. This is less than 0.80, but is it sufficiently low to cause
us to reject the claim of the union's leaders? The test statistic is

z = (103/140 − 0.80) / √(0.80(0.20)/140) = −0.0643/0.0338 = −1.90.

This is less than −1.64, and H0 should be rejected.

The procedures used to test hypotheses about p can be summarized as follows.

Step 1. Formulate the hypotheses. The null hypothesis is of the form

H0: p = p0,

where p0 stands for the specific value given in H0 (p0 is 0.60 in the cancer example, 0.50 in the
coin example, and 0.80 in the strike vote example).

We have considered three types of alternative hypotheses:

HA: p < p0 (one-tailed test to the left);

HA: p > p0 (one-tailed test to the right);

and HA: p ≠ p0 (two-tailed test).

Step 2. Determine the appropriate test statistic. The test statistic used in this section is

z = (x/n − p0) / √(p0(1 − p0)/n).

Step 3. Specify a rejection region. For a given value of α, the decision rule is:

reject H0 if z ≤ −zα for a one-tailed test to the left;

reject H0 if z ≥ zα for a one-tailed test to the right;

reject H0 if z ≤ −zα/2 or if z ≥ zα/2 for a two-tailed test.

Here zα represents the value of z cutting off the area α in the right tail of the normal curve (and
zα/2, of course, cuts off α/2 in the right tail of the normal curve).

If you prefer, the rejection region can be expressed in terms of x/n:

reject H0 if x/n ≤ p0 − zα √(p0(1 − p0)/n) for a one-tailed test to the left;

reject H0 if x/n ≥ p0 + zα √(p0(1 − p0)/n) for a one-tailed test to the right;

reject H0 if x/n ≤ p0 − zα/2 √(p0(1 − p0)/n) or if x/n ≥ p0 + zα/2 √(p0(1 − p0)/n)
for a two-tailed test.

Step 4. Compute the values of the sample proportion x/n and the test statistic z from the data. If
the observed z falls in the rejection region, reject H0. Otherwise, accept H0.
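The four steps translate directly into code. This Python sketch wraps the test statistic in a small helper (the function name is ours, not a standard one) and reruns the coin example:

```python
import math

def proportion_z(x, n, p0):
    """z statistic for testing H0: p = p0, given x successes in n trials."""
    return (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)

# Coin example: 46 heads in 100 tosses, two-tailed test with alpha = 0.10
z = proportion_z(46, 100, 0.50)
reject = z <= -1.64 or z >= 1.64
print(round(z, 2), reject)   # -0.8 False
```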

11.6 Significance Levels

In some instances everyone would agree that a particular hypothesis should be rejected. In the
coin example from Section 11.5, suppose that the number of times heads comes up in 100 tosses
was 6 instead of 46. With this result, it is hard to imagine anyone continuing to support the

notion of the coin being fair. In fact, anyone who still thought the coin was fair might be a good
person to bet against with bets on future tosses of the coin. When x = 6, the test statistic is

z = (6/100 − 0.50) / √(0.50(0.50)/100) = −0.44/0.05 = −8.80.

To say the least, this is an extreme value of z.

At the other extreme, if x = 50, surely no one would consider this as evidence against the coin
being fair. This value should lead to general agreement that H0 should not be rejected. Here we
have

z = (50/100 − 0.50) / √(0.50(0.50)/100) = 0/0.05 = 0,

which is right in the middle of the standard normal distribution (the mean of the distribution, to
be exact).

The decision to accept or reject is not always so easy. In the strike vote example of Section 11.5,
z = −1.90 for a one-tailed test to the left. With α = 0.05, as used in the example, the rejection
region is z ≤ −1.64, and the decision is therefore to reject H0. However, what if α = 0.01? With
this smaller value of α, the rejection region becomes z ≤ −2.33, since −2.33 is the first percentile
of the distribution of z. But the calculated test statistic, z = −1.90, is greater than −2.33, as you can
see from Figure 11.11. Having changed α to 0.01, we now accept H0.
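This sensitivity to α is easy to verify numerically. A Python sketch using the standard library's NormalDist, with the strike-vote statistic z = −1.90 from the text:

```python
from statistics import NormalDist

z = -1.90   # observed test statistic in the strike vote example

for alpha in (0.05, 0.01):
    z_crit = NormalDist().inv_cdf(alpha)   # left-tail critical value
    print(alpha, round(z_crit, 2), z <= z_crit)
```

The first output line shows rejection at α = 0.05; the second shows acceptance at α = 0.01.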
This example illustrates how the choice between accepting and rejecting hypotheses may depend
on the value of α that is selected. Specifying a value of α enables us to determine a rejection
region. The conventional justification for concentrating on α is that

1. the hypotheses are set up so that a Type I error is more serious than a Type II error, and

2. if "accept H0" is interpreted as "fail to reject H0", then we never literally accept H0 and
thus, in theory, we never make a Type II error.

But the choice of α is often made in a somewhat arbitrary fashion. Over the years, α = 0.05 and
α = 0.01 have been used most often. Unfortunately, these values are generally used more out of
tradition than out of a careful consideration of the real-world problem at hand.

One way to avoid a choice of α is not to view hypothesis testing in decision-making terms. The
approach of hypothesis testing presented in the preceding sections has been decision-oriented,
emphasizing the choice between rejecting and accepting hypotheses. This requires a formal
decision rule, or rejection region, that should be chosen before the data are analyzed.

But what are we trying to accomplish in hypothesis testing? Often we are trying to see whether a
particular result seems "real" or whether it might be just due to chance. Is the sample proportion
103/140 = 0.7357 in the strike vote example an indication that the true population proportion p in
favor of the strike is less than 0.80? Or could it be that 80 percent of the members favor the strike
and the sample proportion is 0.7357 just because the sample, by the luck of the draw, happened
to include an unusually high proportion of members against the strike? The null hypothesis that p
= 0.80 is a claim that the result (the difference between 0.7357 and 0.80) is due to chance. The
alternative hypothesis that p < 0.80 is an alternative claim that the difference is real.

How strong is the evidence against p = 0.80? The observed x/n of 0.7357 corresponds to z =
−1.90. From the sampling distribution of z shown in Figure 11.12, you can see that −1.90 is in the
left tail of the distribution. How surprised should we be to get a z of −1.90? The chance of getting
a z this far to the left of zero or farther (remember, this is a one-tailed test to the left) is
represented by the shaded area in Figure 11.12. But this is P(z ≤ −1.90), which can be found by
using Table 3; its value is approximately 0.03. This means that if p were exactly 0.80 and we
took repeated samples of size n = 140, about 3 percent of the samples would result in a sample
proportion x/n less than or equal to 0.7357 (that is, less than or equal to x = 103 people in favor
of the strike). The probability P(z ≤ −1.90) = 0.03 is called the observed significance level for the
test of H0: p = 0.80 versus HA: p < 0.80 in the strike vote example.

Observed Significance Level: The chance of getting a test statistic as extreme as or more
extreme than the value of the test statistic that is actually obtained, given that the null
hypothesis is true. Another name for an observed significance level is a P-value.

The lower the observed significance level (or P-value) is, the more surprised someone who
believes the null hypothesis should be. A low P-value suggests that if H0 is true, we have
witnessed an unusual sample. Thus, the lower the P-value is, the stronger the evidence is against
H0. Consistent with the widespread use of 0.05 and 0.01 as values of α is the interpretation of an
observed significance level of 0.05 or less as a "statistically significant result" and of 0.01 or less
as a "highly statistically significant result". Alternative expressions are "a result significant at the
0.05 level", "a result significant at the 0.01 level", or "a result significant at the (insert the
P-value here) level". The term "significant" comes from the initial question of whether the
difference between the sample results and the null hypothesis (for example, between 0.7357 and
0.80 in the strike vote example) is "real", or "significant". In fact, hypothesis tests are often
referred to as tests of significance.

In the gasoline mileage example from Section 11.3, the test is one-tailed to the left and z = −1.41.
Thus,

P-value = P(z ≤ −1.41) = 0.08.


The observed significance level is 0.08, which provides some evidence against H0 but not
enough to satisfy someone who insists upon α = 0.05.
In the IQ example from Section 11.3, the test is one-tailed to the right and z = 1.22. Therefore,

P-value = P(z ≥ 1.22) = 0.11.

Because the test is one-tailed to the right, the P-value is a right-tail area, not a left-tail area.

In the cancer example of Section 11.5, z = 2.72. The observed significance level is

P-value = P(z ≥ 2.72) = 0.003


for this one-tailed test to the right. This provides very strong evidence against the null hypothesis
that the cure rate for the new treatment is 0.60. Since the P-value is less than 0.01, the result
might be called highly statistically significant.

How about the coin example of Section 11.5? The results of 46 heads in 100 tosses yield z =
-0.80, and the corresponding left-tail area is

P(z ≤ −0.80) = 0.21.


But the test is two-tailed (p = 0.50 versus p ≠ 0.50). Therefore, a z of +0.80 would be just as
extreme as a z of −0.80. As a result, we double the one-tail area to allow for the fact that this is a
two-tailed test. The observed significance level is

P-value = 2P(z ≤ −0.80) = 2(0.21) = 0.42.


A significance level this high indicates a result that is not at all surprising under the null
hypothesis.

Consider the bottle-filling example from Section 11.3, with z = 3.44. The one-tail area is

P(z ≥ 3.44) = 0.0003,

which means that the P-value for the two-tailed test (μ = 100 versus μ ≠ 100) is

P-value = 2(0.0003) = 0.0006.

Finally, consider the ozone-level example in Section 11.4. The test statistic is t = 3.70 and the
number of degrees of freedom is 11. The test is one-tailed to the right. Therefore, the observed
significance level (P-value) is P(t ≥ 3.70). From Table 4, t0.995 = 3.106 and t0.999 = 4.025 with 11
degrees of freedom. Thus, the area to the right of 3.106 is 0.005, and the area to the right of
4.025 is only 0.001. Since the observed t of 3.70 is between 3.106 and 4.025, the observed
significance level is between 0.001 and 0.005. We would say that the results are significant at the
0.005 level but not at the 0.001 level. Of course, the 0.005 level is quite low, indicating that the
results would be highly unlikely if H0 were really true.

In summary,

P-value = area to the left of the observed test statistic (z or t, whichever is used) for a
one-tailed test to the left;

P-value = area to the right of the observed test statistic (z or t, whichever is used) for a
one-tailed test to the right;

P-value = twice the one-tailed P-value for a two-tailed test.
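For z tests, these three rules amount to one call each to the standard normal CDF. A Python sketch using the standard library's NormalDist, checking the examples above:

```python
from statistics import NormalDist

phi = NormalDist().cdf   # standard normal CDF

# One-tailed to the left (strike vote example, z = -1.90)
p_left = phi(-1.90)

# One-tailed to the right (cancer example, z = 2.72)
p_right = 1 - phi(2.72)

# Two-tailed (coin example, z = -0.80): double the one-tail area
p_two = 2 * phi(-0.80)

print(round(p_left, 2), round(p_right, 3), round(p_two, 2))   # 0.03 0.003 0.42
```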

We must emphasize the importance of looking at more than just the observed significance level
in order to evaluate the practical importance of the results. A low significance level does not
guarantee practical importance. For instance, in the gasoline mileage example, suppose that
1,600 cars were used instead of 40, and the average mileage for the 1,600 cars was 31.68. The
test statistic would be

z = (31.68 − 32) / (4.5/√1600) = −0.32/0.1125 = −2.84.

For this one-tailed test to the left (μ = 32 versus μ < 32), the observed significance level would be

P-value = P(z ≤ −2.84) = 0.002.


Thus, the results appear to be highly statistically significant. But notice that x̄ = 31.68, which is
only about one-third of a mile per gallon less than the manufacturer's claim of 32 miles per
gallon. If the consumer group uses this difference of one-third of a mile per gallon to dispute the
manufacturer's claim, they will be laughed at. It is unlikely that anyone will care about such a
small difference, statistically significant or not.

If the difference of one-third of a mile per gallon is not important, why does it lead to such a low
observed significance level? Notice that the sample is very large (1,600 cars). Such a large
sample can provide a high degree of accuracy, as you learned in Chapter 8. In hypothesis testing,
this great accuracy means that the test becomes sensitive to very small differences from the null
hypothesis. The standard error of 0.1125, or just over a tenth of a mile per gallon, is quite small.
The sample tells us that the evidence is strongly against μ being exactly 32. That does not mean
μ cannot be close to 32, and it appears that it is in the neighborhood of 31.68.

The moral of this example for the consumer group is that a large sample size gives them more
accuracy than they need. They might as well save some money by reducing their sample size. On
the other hand, in some situations a high degree of accuracy, hence a large sample, may be
desirable. In the bottle-filling example, a deviation of only 0.01 centiliters may be very costly to
the firm when the number of bottles being filled and shipped out is considered. (If it is important
to the firm to detect small deviations from the null hypothesis of μ = 100, then a larger sample
size may be needed.)

The moral of this example for anyone evaluating the results of their own or someone else's tests
of hypotheses is that it is important to look beyond the statistical significance of such results to
consider their practical significance. Of course, there are many well-designed studies which yield
both. As you can see by now, interpreting hypothesis tests can be trickier than interpreting point
and interval estimates. If you keep the underlying real-world situation in mind while evaluating
the statistical results (and look carefully at statistics such as x̄ as well as at significance levels),
it's not that hard to figure out what is happening. And in reporting results of tests, you should
report some statistics (for example, x̄ when the test involves a mean μ) in addition to an observed
significance level, so that others will be able to evaluate the results.

11.7 Hypothesis Testing Using SPSS

Go to Analyze > Compare Means > One-Sample T Test

Take the variable that you are testing to the box Test Variables

Example: A car-cleaning service firm, FastCarClean, is trying to reduce the total service time
during the peak periods from the current 30 minutes. A new procedure is implemented. If they
are successful, the entire chain will change to the new process. To test the effectiveness of the
new process, a random sample of 100 cars is surveyed. The data is provided in
CarCleanWaitingTime.xls. Are the new procedures effective? Test at α = .05.

Step 1: Set up the hypotheses

H0: μ = 30
HA: μ < 30

This is a one-tail test.

Step 2: Determine the test statistic
α = .05.

Steps 3 and 4
In SPSS, enter the null value.
The output from SPSS is:

One-Sample Statistics

                N      Mean      Std. Deviation    Std. Error Mean
WaitingTime     100    27.350    12.3525           1.2352

Observe from the top panel that the mean waiting time after the process is 27.35 minutes. The
bottom panel reveals that the two-tailed p-value is .034. Since this is a one-tail test, the p-value
for our one-tail test is .034 /2 = .017.
Since the p-value of .017 is less than α (= .05), we reject the null hypothesis. The firm has
successfully reduced waiting time.

11.8 Two-Sample t Tests

So far we have looked at one-sample hypothesis testing. We call it one-sample since only
one set of sample observations was gathered.

Very often, we are interested in making comparisons. For example, we may be interested in
whether behaviour differs between two sets of people: do Asian consumers spend more on
entertainment than Caucasian consumers? Are younger people more likely to try new products
than older people? Is a new medication better than the existing one? Since we are interested in
differences or comparisons between people, events, or strategies, we very often have to conduct
hypothesis testing using more than one sample. To test for differences between Asian and
Caucasian consumers, we will have to gather data from a sample of Asian consumers and
another set of observations from a sample of Caucasian consumers. Now we are dealing with
two samples. The basic process of hypothesis testing using two samples is the same as in the
one-sample case. We have to formulate the hypotheses, set up the rejection region, and calculate
the test statistic. Two-sample hypothesis testing can be of two types: independent-samples t tests
and paired-sample t tests. We discuss each type next.

11.8.1 Independent-Sample t-test

An independent-samples t test is used when the data for the two samples are gathered
independently of each other. For example, if we are interested in differences in spending patterns
between Asians and Caucasians, we can get spending data by going to Galleria and soliciting
consumers. We ask a random sample of 100 Asians (e.g., by asking every 10th Asian we see)
how much they spend on entertainment. We simultaneously also get a random sample of 100
Caucasians (also by contacting every 10th Caucasian we see) and ask them how much they
spend on entertainment. The crucial notion is that these two samples of 100 Asians and 100
Caucasians are not connected in any way. The data for one sample (Asians) is collected
independently of the data for the second sample (Caucasians); that is why the test is called
independent-samples.

A manager of a restaurant is considering offering a free drink to customers as soon as they sit
down to order. Some of her colleagues think it is a good idea that will encourage customers to
order more expensive dishes or buy more drinks. Other colleagues believe it will be bad for
business, as people who would otherwise have ordered drinks will no longer do so and will also
spend less on food. They suggest offering a free appetizer instead. She therefore decides to
conduct an experiment one night to see which promotion will be more effective. We will call the
free-drink promotion strategy X and the appetizer approach strategy Y. The restaurant runs the
experiment one night, giving 12 (nx) randomly selected customers a free drink while 12 (ny)
other randomly selected customers get an appetizer. The manager will then see how much money
customers have spent in order to decide which strategy to adopt. Before the data collection
begins, the firm needs to set up the hypothesis testing process as discussed in the earlier sections.

Step 1:

First, the null hypothesis. The null hypothesis is that both promotions are equally effective,
which is denoted by:

H0: μx = μy

This says that mean sales under the drink promotion (μx) are the same as mean sales under the
appetizer promotion (μy).

Depending on the firm's point of view, three alternative hypotheses are possible. First, the firm
may believe that promotion X will be more effective than promotion Y. This might be the case if
promotion Y is what the firm currently uses and a management consultant has recommended that
the firm try out promotion X. In this case the firm will set up the alternative hypothesis to be:

HA: μx > μy

This is a one-tail test.

Alternatively, the firm's management might believe that promotion Y is more effective than
promotion X. In this case, the alternative hypothesis will be:

HA: μx < μy

This is also a one-tail test.

Finally, the management might be equivocal about the two promotions. In this case, the
alternative hypothesis will be:

HA: μx ≠ μy

This is a two-tailed test.

Step 2: Determine the appropriate test statistic

There are two possible test statistics for the independent (two)-sample test. The formulas for the
test statistics are quite cumbersome. Which one should be used depends on the nature of the data
gathered. First, the standard deviations of the two random samples need to be examined. If the
standard deviations of the two samples are not significantly different (equal variances), then one
formula is used; if the standard deviations are significantly different (unequal variances), another
formula is used. You will not solve these problems by hand; SPSS will do this for you. But in
case you are curious, the formulas for the test statistics are given below.

t-statistic when sample variances are equal

t = [(x̄ − ȳ) − (μx − μy)] / √(sp²/nx + sp²/ny)

where

sp² = [(nx − 1)sx² + (ny − 1)sy²] / (nx + ny − 2)

and t has (nx + ny − 2) degrees of freedom.
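As a sketch of the equal-variances formula under H0: μx = μy (a helper we wrote directly from the equations above; in practice SPSS does this for you):

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic assuming equal variances, H0: mu_x = mu_y."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    sx2 = sum((v - mx) ** 2 for v in x) / (nx - 1)   # sample variance of x
    sy2 = sum((v - my) ** 2 for v in y) / (ny - 1)   # sample variance of y
    sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)  # pooled variance
    t = (mx - my) / math.sqrt(sp2 / nx + sp2 / ny)
    return t, nx + ny - 2    # t statistic and its degrees of freedom

# Hypothetical data for two small groups
t, df = pooled_t([1, 2, 3, 4], [2, 3, 4, 5])
print(round(t, 3), df)   # -1.095 6
```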

t-statistic when sample variances are unequal

t = [(x̄ − ȳ) − D0] / √(sx²/nx + sy²/ny)

where D0 is the hypothesized difference between μx and μy, and t has approximately

v = (sx²/nx + sy²/ny)² / [(sx²/nx)²/(nx − 1) + (sy²/ny)²/(ny − 1)]

degrees of freedom.
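The only tricky part of the unequal-variances case is the degrees-of-freedom approximation v (often called Welch's approximation). A Python sketch of just that formula (our own helper, not an SPSS function):

```python
def welch_df(sx2, nx, sy2, ny):
    """Welch's approximate degrees of freedom for the unequal-variances t test."""
    a, b = sx2 / nx, sy2 / ny
    return (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))

# When the variances and sample sizes are equal, v reduces to 2(n - 1)
print(round(welch_df(1.0, 10, 1.0, 10), 1))   # 18.0
```

If scipy is available, the whole test can be run with scipy.stats.ttest_ind(x, y, equal_var=False).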


As mentioned, SPSS computes these formulas for you; in fact, it reports the results of both. The
output looks like this:

Back to the promotions example, let us suppose that the manager is equivocal about the two
promotions and sets up the hypotheses as:

Step 1:
H0: μx = μy (The effectiveness of the two promotions will be the same.)
HA: μx ≠ μy (The effectiveness of the two promotions will not be the same.)

The output from the two samples is given below:

From the top panel, we can see that the average amount spent when free drinks were offered is
$118.75, while the average amount spent when a free appetizer was offered is $101.57. Is this a
significant difference?

First, notice that there are two rows in the second panel. The first row is output that begins with
"Equal variances assumed." The second row is "Equal variances not assumed." SPSS computes
the formulas for both possibilities. Look at the t-values (4th column). In both rows the t-values
are the same (2.095). The p-values are also the same, at .048. We therefore reject the null
hypothesis and conclude that the drink promotion is significantly more effective than the
appetizer promotion.

Sometimes the t-values and p-values in the "Equal variances assumed" and "Equal variances not
assumed" rows will be different, as in the output below (this is an entirely different example).

See the 5th column, which contains the p-values. In this example, the p-value derived from the
t-statistic formula for equal variances is .021, while the p-value when the t-statistic formula for
unequal variances is used is .037. So which p-value should we use? We should use the more
conservative (larger) p-value so that we are consistent with our philosophy of reducing Type I
error. We therefore should use the p-value of .037.

11.8.2 Paired-Sample Hypothesis Test


When the sample data consist of matched pairs, the paired-sample test is required. If you are
interested in seeing whether a weight reduction plan is effective, then weight measures for
individuals before starting the plan and after the plan are taken. The difference in weight of
individual patients will indicate whether the plan significantly reduces weight. Similarly, if a
regulatory agency wants to check whether gas prices differ between two chains, it can record the
prices charged by both chains in a number of zip codes and then measure the difference in price
within each zip code.
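Under the hood, the paired test is just a one-sample t test on the within-pair differences di = xi − yi. A minimal Python sketch with made-up numbers (not data from the text):

```python
import math

def paired_t(x, y):
    """Paired-sample t statistic for H0: mean difference = 0."""
    d = [a - b for a, b in zip(x, y)]          # within-pair differences
    n = len(d)
    d_bar = sum(d) / n
    s_d = math.sqrt(sum((v - d_bar) ** 2 for v in d) / (n - 1))
    return d_bar / (s_d / math.sqrt(n)), n - 1   # t and degrees of freedom

# Hypothetical before/after weights (kg) for three people on a diet plan
t, df = paired_t([80, 92, 75], [79, 90, 72])
print(round(t, 3), df)   # 3.464 2
```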

Example
A CPG firm wants to decide which of two package designs to use. They randomly select 29
customers and show them both designs. Each customer rates both designs on a 1-10 point scale
where 10 indicates "Excellent" and 1 means "Very Bad". The data is in PackageDesign.xls. Is
either design significantly preferred over the other? Test at α = 0.05.

Step 1:
First, the null hypothesis. The null hypothesis is that both designs are equally preferred, which is
denoted by:
H0: μx = μy
Step 2: Decision Rule: α = 0.05
Step 3:
In SPSS, go to
Analyze > Compare Means > Paired-Samples T Test
Move the variables so that they are side-by-side as shown below.

From the top panel, we can see that the mean preference for Package design A is 7.07, while it is
6.34 for Package design B. Is this a significant difference?

The bottom panel indicates a t-value of -2.470 and a p-value of .02. Since the p-value is less than
α = 0.05, we reject the null hypothesis and conclude that Package A is significantly preferred to
Package B.
