
Chapter 10

Categorical Data Analysis


10.1

a.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection region is $\chi^2 > 5.99147$.

b.

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.10} = 7.77944$. The rejection region is $\chi^2 > 7.77944$.

c.

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.01} = 11.3449$. The rejection region is $\chi^2 > 11.3449$.
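
The tabled critical values above can be cross-checked numerically. The following is a minimal sketch that assumes only the Python standard library. The helper names `chi2_sf` and `chi2_critical` are hypothetical (they are not from the text), and the upper-tail area is computed from the standard finite-sum expressions that hold for whole-number degrees of freedom. It illustrates the lookup; it is not how Table IV was built.

```python
import math

def chi2_sf(x, df):
    """Upper-tail area P(chi-square with df degrees of freedom > x), df a positive integer."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)  # standard normal density at sqrt(x)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2.0)) + 2.0 * phi * total

def chi2_critical(alpha, df):
    """Value c with P(chi-square > c) = alpha, found by bisection."""
    lo, hi = 0.0, 1000.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid   # tail area still too large, so move right
        else:
            hi = mid
    return (lo + hi) / 2.0

# The three (alpha, k) pairs from parts a, b, and c; df = k - 1.
for alpha, k in [(0.05, 3), (0.10, 5), (0.01, 4)]:
    print(alpha, k - 1, round(chi2_critical(alpha, k - 1), 4))
```

The computed values should agree with the Table IV entries quoted above to within rounding in the last printed digit.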

10.2

The characteristics of the multinomial experiment are:


1. The experiment consists of n identical trials.
2. There are k possible outcomes to each trial.
3. The probabilities of the k outcomes, denoted $p_1, p_2, \ldots, p_k$, remain the same from trial to trial, where $p_1 + p_2 + \cdots + p_k = 1$.
4. The trials are independent.
5. The random variables of interest are the counts $n_1, n_2, \ldots, n_k$ in each of the k cells.

The characteristics of the binomial are the same as those for the multinomial with $k = 2$.
10.3

The sample size n will be large enough so that, for every cell, the expected cell count, Ei, will be equal to 5
or more.

10.4

The hypotheses of interest are:

$H_0: p_1 = .25,\ p_2 = .25,\ p_3 = .50$
$H_a$: At least one of the probabilities differs from the hypothesized value

$E_1 = np_{1,0} = 320(.25) = 80$
$E_2 = np_{2,0} = 320(.25) = 80$
$E_3 = np_{3,0} = 320(.50) = 160$

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(78 - 80)^2}{80} + \frac{(60 - 80)^2}{80} + \frac{(182 - 160)^2}{160} = 8.075$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection region is $\chi^2 > 5.99147$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 8.075 > 5.99147$), $H_0$ is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at $\alpha = .05$.
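
The arithmetic in this solution can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the function name `chi_square_gof` is hypothetical, and the critical value is simply copied from the Table IV figure quoted above. The p-value line relies on the fact that, for 2 degrees of freedom only, the upper-tail area of the chi-square distribution is exp(-x/2).

```python
import math

def chi_square_gof(observed, null_probs):
    """Return (expected counts, chi-square statistic) for a one-way table."""
    n = sum(observed)
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, statistic

observed = [78, 60, 182]          # cell counts used in the solution
null_probs = [0.25, 0.25, 0.50]   # probabilities stated in H0

expected, statistic = chi_square_gof(observed, null_probs)

# Closed form for the upper-tail area, valid only because df = 2 here.
p_value = math.exp(-statistic / 2)

critical_value = 5.99147          # Table IV value for alpha = .05, df = 2
reject_h0 = statistic > critical_value

print(expected, round(statistic, 3), round(p_value, 4), reject_h0)
```

The p-value falls below .05, which matches the rejection-region decision.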

Copyright 2014 Pearson Education, Inc.



10.5

Some preliminary calculations are:

If the probabilities are the same, $p_{1,0} = p_{2,0} = p_{3,0} = p_{4,0} = .25$

$E_1 = np_{1,0} = 205(.25) = 51.25 = E_2 = E_3 = E_4$

a.

To determine if the multinomial probabilities differ, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least one of the probabilities differs from .25

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(43 - 51.25)^2}{51.25} + \frac{(56 - 51.25)^2}{51.25} + \frac{(59 - 51.25)^2}{51.25} + \frac{(47 - 51.25)^2}{51.25} = 3.293$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 3.293 \not> 7.81473$), $H_0$ is not rejected. There is insufficient evidence to indicate the multinomial probabilities differ at $\alpha = .05$.
b.

The Type I error is concluding the multinomial probabilities differ when, in fact, they do not.
The Type II error is concluding the multinomial probabilities are equal, when, in fact, they are not.

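c.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$\hat{p}_3 = 59/205 = .288$

The confidence interval is:

$\hat{p}_3 \pm z_{.025}\sqrt{\frac{\hat{p}_3\hat{q}_3}{n}} \Rightarrow .288 \pm 1.96\sqrt{\frac{.288(.712)}{205}} \Rightarrow .288 \pm .062 \Rightarrow (.226, .350)$

10.6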

a.

The data are categorical because they are measured using categories, not meaningful numbers. The
possible categories are legs only, wheels only, both legs and wheels, and neither legs nor wheels.

b.

Let $p_1$ = proportion of social robots with legs only, $p_2$ = proportion of social robots with wheels only, $p_3$ = proportion of social robots with both legs and wheels, and $p_4$ = proportion of social robots with neither legs nor wheels. To determine if the design engineer's claim is incorrect, we test:

$H_0: p_1 = .50,\ p_2 = .30,\ p_3 = .10,\ \text{and}\ p_4 = .10$
$H_a$: At least one of the probabilities differs from the hypothesized value

c.

If the claim is true, $E_1 = np_{1,0} = 106(.50) = 53$, $E_2 = np_{2,0} = 106(.30) = 31.8$, $E_3 = np_{3,0} = 106(.10) = 10.6$, and $E_4 = np_{4,0} = 106(.10) = 10.6$.

d.

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(63 - 53)^2}{53} + \frac{(20 - 31.8)^2}{31.8} + \frac{(8 - 10.6)^2}{10.6} + \frac{(15 - 10.6)^2}{10.6} = 8.730$

e.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 8.730 > 7.81473$), $H_0$ is rejected. There is sufficient evidence to indicate that at least one of the probabilities differs from its hypothesized value at $\alpha = .05$.
10.7

a.

Let $p_1$ = proportion using total visitors, $p_2$ = proportion using paying visitors, $p_3$ = proportion using big shows, $p_4$ = proportion using funds raised, and $p_5$ = proportion using members.

To determine if one performance measure is used more often than any of the others, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = .20$
$H_a$: At least one of the probabilities differs from the hypothesized value

From the printout, the test statistic is $\chi^2 = 1.66667$ and the p-value is $p = 0.797$. Since the p-value is not less than $\alpha$ ($p = .797 \not< .10$), $H_0$ is not rejected. There is insufficient evidence to indicate that one performance measure is used more often than any of the others at $\alpha = .10$.

b.

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$.

$\hat{p}_1 = 8/30 = .267$

The confidence interval is:

$\hat{p}_1 \pm z_{.05}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n}} \Rightarrow .267 \pm 1.645\sqrt{\frac{.267(.733)}{30}} \Rightarrow .267 \pm .133 \Rightarrow (.134, .400)$

We are 90% confident that the proportion of museums worldwide that use total visitors as their performance measure is between .134 and .400.
10.8

a.

The categorical variable is the rating of the student exposure to social and environmental issues. It
has 5 levels: 1-star, 2-stars, 3-stars, 4-stars, and 5-stars.

b.

If there were no difference in the category proportions, then each proportion should be $p_i = 1/5 = .20$. There were a total of n = 30 business schools sampled. The expected number would be: $E_1 = E_2 = E_3 = E_4 = E_5 = np_{i,0} = 30(.20) = 6$

c.

To determine if there are differences in the star rating category proportions of all MBA programs, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = .20$
$H_a$: At least one of the probabilities differs from the hypothesized value

d.

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(2 - 6)^2}{6} + \frac{(9 - 6)^2}{6} + \frac{(14 - 6)^2}{6} + \frac{(5 - 6)^2}{6} + \frac{(0 - 6)^2}{6} = 21$

e.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

f.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 21 > 9.48773$), $H_0$ is rejected. There is sufficient evidence to indicate differences in the star rating category proportions of all MBA programs at $\alpha = .05$.

g.

Some preliminary calculations are: $\hat{p}_3 = \frac{x_3}{n} = \frac{14}{30} = .467$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$\hat{p}_3 \pm z_{.025}\sqrt{\frac{\hat{p}_3\hat{q}_3}{n}} \Rightarrow .467 \pm 1.96\sqrt{\frac{.467(.533)}{30}} \Rightarrow .467 \pm .179 \Rightarrow (.288, .646)$

We are 95% confident that the proportion of all MBA programs that are ranked in the 3-star category is between .288 and .646.
10.9

a.

Since there are 10 income groups, we would expect 10%, or $1{,}072(.10) = 107.2$, givers in each of the income categories.

b.

The null hypothesis for testing whether the true proportions of charitable givers in each income group are the same is:

$H_0: p_1 = p_2 = \cdots = p_{10} = .10$

c.

Some preliminary calculations are: $E_1 = E_2 = \cdots = E_{10} = np_{i,0} = 1{,}072(.10) = 107.2$

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(42 - 107.2)^2}{107.2} + \frac{(93 - 107.2)^2}{107.2} + \cdots + \frac{(127 - 107.2)^2}{107.2} = 93.15$

d.

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 10 - 1 = 9$. From Table IV, Appendix D, $\chi^2_{.10} = 14.6837$. The rejection region is $\chi^2 > 14.6837$.

e.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 93.15 > 14.6837$), $H_0$ is rejected. There is sufficient evidence to indicate that the true proportions of charitable givers in each income group are not all the same at $\alpha = .10$.

10.10

a.

The qualitative variable is firm position on off-shoring. There are four levels: currently off-shoring,
not currently off-shoring, but plan to do so, off-shored in the past, but no more, and off-shoring is
not applicable.

b.

Let $p_1$ = proportion of firms currently off-shoring, $p_2$ = proportion of firms not currently off-shoring, but plan to do so, $p_3$ = proportion of firms off-shored in the past, but no more, and $p_4$ = proportion of firms where off-shoring is not applicable.

Some preliminary calculations are: $E_1 = E_2 = E_3 = E_4 = np_{i,0} = 600(.25) = 150$

To determine if the proportions of U.S. firms in the four off-shoring position categories are significantly different, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(126 - 150)^2}{150} + \frac{(72 - 150)^2}{150} + \frac{(30 - 150)^2}{150} + \frac{(372 - 150)^2}{150} = 468.96$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 468.96 > 7.81473$), $H_0$ is rejected. There is sufficient evidence to indicate that at least one of the proportions of U.S. firms in the four off-shoring position categories is significantly different at $\alpha = .05$.

c.

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$.

$\hat{p}_1 = 126/600 = .21$

The confidence interval is:

$\hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n}} \Rightarrow .21 \pm 1.96\sqrt{\frac{.21(.79)}{600}} \Rightarrow .21 \pm .033 \Rightarrow (.177, .243)$

We are 95% confident that the proportion of U.S. firms who are currently off-shoring is between .177 and .243.
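
The large-sample confidence interval for a single multinomial proportion, used in part c, can be sketched as follows. This is an illustration only, assuming the Python standard library; the function name `proportion_ci` is hypothetical, and the z value is the Table II figure quoted in the solution rather than something the code derives.

```python
import math

def proportion_ci(successes, n, z):
    """Large-sample interval: p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 126 of the 600 sampled U.S. firms are currently off-shoring; z_.025 = 1.96.
lower, upper = proportion_ci(126, 600, 1.96)
print(round(lower, 3), round(upper, 3))
```

Because the code carries full precision, intervals in other exercises that were computed from already-rounded values (for example, from a proportion rounded to three decimals) may differ from the code's result in the third decimal place.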
10.11

Let $p_1$ = proportion of users using both hands/both thumbs, $p_2$ = proportion of users using right hand/right thumb, $p_3$ = proportion of users using left hand/left thumb, $p_4$ = proportion of users using both hands/right index finger, $p_5$ = proportion of users using left hand/right index finger, and $p_6$ = proportion of users using other. Some preliminary calculations: $E_1 = E_2 = E_3 = E_4 = E_5 = E_6 = np_{i,0} = 859(1/6) = 143.167$.

To determine if the proportions of mobile device users in the six texting style categories differ, we test:

$H_0: p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = 1/6$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(396 - 143.167)^2}{143.167} + \frac{(311 - 143.167)^2}{143.167} + \frac{(70 - 143.167)^2}{143.167} + \frac{(39 - 143.167)^2}{143.167} + \frac{(18 - 143.167)^2}{143.167} + \frac{(25 - 143.167)^2}{143.167} = 963.400$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 6 - 1 = 5$. From Table IV, Appendix D, $\chi^2_{.10} = 9.23635$. The rejection region is $\chi^2 > 9.23635$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 963.400 > 9.23635$), $H_0$ is rejected. There is sufficient evidence to indicate that the proportions of mobile device users in the six texting style categories differ at $\alpha = .10$.
10.12

Let $p_1$ = proportion of anchor tenants, $p_2$ = proportion of major space users, $p_3$ = proportion of large standard tenants, $p_4$ = proportion of small standard tenants, and $p_5$ = proportion of small tenants. Some preliminary calculations:

$E_1 = np_{1,0} = 1{,}821(.01) = 18.21$
$E_2 = np_{2,0} = 1{,}821(.05) = 91.05$
$E_3 = np_{3,0} = 1{,}821(.10) = 182.1$
$E_4 = np_{4,0} = 1{,}821(.40) = 728.4$
$E_5 = np_{5,0} = 1{,}821(.44) = 801.24$

To determine if the mall developer's belief is correct, we test:

$H_0: p_1 = .01,\ p_2 = .05,\ p_3 = .10,\ p_4 = .40,\ p_5 = .44$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(14 - 18.21)^2}{18.21} + \frac{(61 - 91.05)^2}{91.05} + \frac{(216 - 182.1)^2}{182.1} + \frac{(711 - 728.4)^2}{728.4} + \frac{(819 - 801.24)^2}{801.24} = 18.011$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$. From Table IV, Appendix D, $\chi^2_{.01} = 13.2767$. The rejection region is $\chi^2 > 13.2767$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 18.011 > 13.2767$), $H_0$ is rejected. There is sufficient evidence to indicate that the proportions of tenants in the five categories differ from the developer's belief at $\alpha = .01$.

10.13

a.

The data come from a multinomial experiment because there are several possible categorical
responses to the question.

b.

To determine if the multinomial probabilities agree with the theory, we test:

$H_0: p_1 = .50,\ p_2 = p_3 = p_4 = p_5 = .10,\ p_6 = p_7 = .05$

c.

Using MINITAB, the results are:

Chi-Square Goodness-of-Fit Test for Observed Counts in Variable: C1

                        Test                Contribution
Category  Observed  Proportion  Expected      to Chi-Sq
1              869        0.50   1059.50         34.252
2              339        0.10    211.90         76.236
3              338        0.10    211.90         75.041
4              127        0.10    211.90         34.016
5               85        0.10    211.90         75.996
6              128        0.05    105.95          4.589
7              233        0.05    105.95        152.352

   N  DF   Chi-Sq  P-Value
2119   6  452.483    0.000

To determine if the multinomial probabilities agree with the theory, we test:

$H_0: p_1 = .50,\ p_2 = p_3 = p_4 = p_5 = .10,\ p_6 = p_7 = .05$
$H_a$: At least one of the probabilities differs from its hypothesized value

The test statistic is $\chi^2 = 452.483$ and the p-value is $p = 0.000$. Since the p-value is less than $\alpha = .01$, $H_0$ is rejected. There is sufficient evidence to indicate that at least one of the proportions differs from its hypothesized value at $\alpha = .01$.
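
The Expected and Contribution to Chi-Sq columns of the printout can be recomputed directly from the observed counts and test proportions. The following is a minimal sketch in plain Python (no MINITAB involved; the variable names are illustrative only), intended as a cross-check of the printed figures.

```python
observed = [869, 339, 338, 127, 85, 128, 233]
test_props = [0.50, 0.10, 0.10, 0.10, 0.10, 0.05, 0.05]

n = sum(observed)                                  # N on the printout
expected = [n * p for p in test_props]             # Expected column
contributions = [(o - e) ** 2 / e                  # Contribution to Chi-Sq column
                 for o, e in zip(observed, expected)]
chi_sq = sum(contributions)
df = len(observed) - 1

for category, (o, e, c) in enumerate(zip(observed, expected, contributions), start=1):
    print(category, o, round(e, 2), round(c, 3))
print(n, df, round(chi_sq, 3))
```

Listing the contributions separately shows which categories drive the statistic; here category 7 contributes the most.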
10.14

Some preliminary calculations are:

$E_1 = np_{1,0} = 2{,}023(.45) = 910.35$
$E_2 = np_{2,0} = 2{,}023(.35) = 708.05$
$E_3 = np_{3,0} = 2{,}023(.15) = 303.45$
$E_4 = np_{4,0} = 2{,}023(.05) = 101.15$

To determine if the percentages of all adults falling into the four response categories changed after the Enron scandal, we test:

$H_0: p_1 = .45,\ p_2 = .35,\ p_3 = .15,\ \text{and}\ p_4 = .05$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(1{,}173 - 910.35)^2}{910.35} + \frac{(587 - 708.05)^2}{708.05} + \frac{(182 - 303.45)^2}{303.45} + \frac{(81 - 101.15)^2}{101.15} = 149.096$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.01} = 11.3449$. The rejection region is $\chi^2 > 11.3449$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 149.096 > 11.3449$), $H_0$ is rejected. There is sufficient evidence to indicate the percentages of all adults falling into the four response categories changed after the Enron scandal at $\alpha = .01$.
10.15

Let $p_1$ = proportion of mail only users, $p_2$ = proportion of Internet only users, and $p_3$ = proportion of both mail and Internet users. Some preliminary calculations:

$E_1 = E_2 = E_3 = np_{i,0} = 440(1/3) = 146.667$

To determine if the professor's beliefs are correct, we test:

$H_0: p_1 = p_2 = p_3 = 1/3$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(262 - 146.667)^2}{146.667} + \frac{(43 - 146.667)^2}{146.667} + \frac{(135 - 146.667)^2}{146.667} = 164.895$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 3 - 1 = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 164.895 > 9.21034$), $H_0$ is rejected. There is sufficient evidence to indicate that the proportions of mail only, Internet only, and both mail and Internet users differ at $\alpha = .01$.
10.16

Some preliminary calculations are:

$E_1 = np_{1,0} = 943(.51) = 480.93$
$E_2 = np_{2,0} = 943(.37) = 348.91$
$E_3 = np_{3,0} = 943(.09) = 84.87$
$E_4 = np_{4,0} = 943(.03) = 28.29$

To determine if the data from the independent survey contradict the percentages reported by the CPS Cell Phone Supplement, we test:

$H_0: p_1 = .51,\ p_2 = .37,\ p_3 = .09,\ \text{and}\ p_4 = .03$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(473 - 480.93)^2}{480.93} + \frac{(334 - 348.91)^2}{348.91} + \frac{(106 - 84.87)^2}{84.87} + \frac{(30 - 28.29)^2}{28.29} = 6.132$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.10} = 6.25139$. The rejection region is $\chi^2 > 6.25139$.

Since the test statistic does not fall in the rejection region ($\chi^2 = 6.132 \not> 6.25139$), $H_0$ is not rejected. There is insufficient evidence to indicate the data from the independent survey contradict the percentages reported by the CPS Cell Phone Supplement at $\alpha = .10$.

10.17


To determine if the number of overweight trucks per week is distributed over the 7 days of the week in direct proportion to the volume of truck traffic, we test:

$H_0: p_1 = .191,\ p_2 = .198,\ p_3 = .187,\ p_4 = .180,\ p_5 = .155,\ p_6 = .043,\ p_7 = .046$
$H_a$: At least one of the probabilities differs from the hypothesized value

$E_1 = np_{1,0} = 414(.191) = 79.074$
$E_2 = np_{2,0} = 414(.198) = 81.972$
$E_3 = np_{3,0} = 414(.187) = 77.418$
$E_4 = np_{4,0} = 414(.180) = 74.520$
$E_5 = np_{5,0} = 414(.155) = 64.170$
$E_6 = np_{6,0} = 414(.043) = 17.802$
$E_7 = np_{7,0} = 414(.046) = 19.044$

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(90 - 79.074)^2}{79.074} + \frac{(82 - 81.972)^2}{81.972} + \frac{(72 - 77.418)^2}{77.418} + \frac{(70 - 74.520)^2}{74.520} + \frac{(51 - 64.170)^2}{64.170} + \frac{(18 - 17.802)^2}{17.802} + \frac{(31 - 19.044)^2}{19.044} = 12.374$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 7 - 1 = 6$. From Table IV, Appendix D, $\chi^2_{.05} = 12.5916$. The rejection region is $\chi^2 > 12.5916$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 12.374 \not> 12.5916$), $H_0$ is not rejected. There is insufficient evidence to indicate that the number of overweight trucks per week is not distributed over the 7 days of the week in direct proportion to the volume of truck traffic at $\alpha = .05$.
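
A goodness-of-fit test with unequal hypothesized probabilities follows the same steps as the equal-probability case, and it is a convenient place to confirm the sample-size condition from Exercise 10.3 (every expected count at least 5). The sketch below is illustrative only; the variable names are not from the text, and the critical value is copied from the Table IV figure quoted in the solution.

```python
observed = [90, 82, 72, 70, 51, 18, 31]     # overweight trucks, one count per day
null_probs = [0.191, 0.198, 0.187, 0.180, 0.155, 0.043, 0.046]

n = sum(observed)
expected = [n * p for p in null_probs]

# Large-sample condition: every expected cell count should be 5 or more.
large_sample_ok = all(e >= 5 for e in expected)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical_value = 12.5916                    # Table IV, alpha = .05, df = 6
reject_h0 = statistic > critical_value

print(n, large_sample_ok, round(statistic, 3), reject_h0)
```

The statistic falls just short of the critical value, so the decision here is sensitive to the choice of $\alpha$.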
10.18

Some preliminary calculations are:

$E_1 = np_{1,0} = 435(.28) = 121.8$
$E_2 = np_{2,0} = 435(.04) = 17.4$
$E_3 = np_{3,0} = 435(.02) = 8.7$
$E_4 = np_{4,0} = 435(.66) = 287.1$

To determine if the House of Representatives is not statistically representative of the religious affiliations of their constituents, we test:

$H_0: p_1 = .28,\ p_2 = .04,\ p_3 = .02,\ \text{and}\ p_4 = .66$
$H_a$: At least one of the probabilities differs from the hypothesized value

The test statistic is

$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(117 - 121.8)^2}{121.8} + \frac{(61 - 17.4)^2}{17.4} + \frac{(30 - 8.7)^2}{8.7} + \frac{(227 - 287.1)^2}{287.1} = 174.169$

Since no value of $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the test statistic falls in the rejection region ($\chi^2 = 174.169 > 7.81473$), $H_0$ is rejected. There is sufficient evidence to indicate the House of Representatives is not statistically representative of the religious affiliations of their constituents at $\alpha = .05$.

10.19

a.

$df = (r - 1)(c - 1) = (5 - 1)(5 - 1) = 16$. From Table IV, Appendix D, $\chi^2_{.05} = 26.2962$. The rejection region is $\chi^2 > 26.2962$.

b.

$df = (r - 1)(c - 1) = (3 - 1)(6 - 1) = 10$. From Table IV, Appendix D, $\chi^2_{.10} = 15.9871$. The rejection region is $\chi^2 > 15.9871$.

c.

$df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.
10.20

a.

$H_0$: The row and column classifications are independent
$H_a$: The row and column classifications are dependent

b.

The test statistic is $\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}}$.

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is $\chi^2 > 9.21034$.

c.

The expected cell counts are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{96(25)}{167} = 14.37$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{96(64)}{167} = 36.79$
$\hat{E}_{13} = \frac{R_1 C_3}{n} = \frac{96(78)}{167} = 44.84$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{71(25)}{167} = 10.63$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{71(64)}{167} = 27.21$
$\hat{E}_{23} = \frac{R_2 C_3}{n} = \frac{71(78)}{167} = 33.16$

d.

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(9 - 14.37)^2}{14.37} + \frac{(34 - 36.79)^2}{36.79} + \frac{(53 - 44.84)^2}{44.84} + \frac{(16 - 10.63)^2}{10.63} + \frac{(30 - 27.21)^2}{27.21} + \frac{(25 - 33.16)^2}{33.16} = 8.71$

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 8.71 \not> 9.21034$), $H_0$ is not rejected. There is insufficient evidence to indicate the row and column classifications are dependent at $\alpha = .01$.
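
The expected cell counts and the test statistic for a two-way table follow mechanically from the row and column totals. The sketch below is a minimal illustration in plain Python; the function name `chi_square_independence` is hypothetical. The p-value line uses the closed form exp(-x/2), which is valid only because this table has 2 degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, df) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count for cell (i, j) is R_i * C_j / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, df

table = [[9, 34, 53],
         [16, 30, 25]]

expected, statistic, df = chi_square_independence(table)
p_value = math.exp(-statistic / 2)   # upper-tail area, valid only for df = 2

print(df, round(statistic, 2), round(p_value, 4))
```

The p-value is slightly above .01, consistent with the statistic falling just below the critical value 9.21034.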


10.21


a.

To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100. The column totals are 25, 64, and 78, while the row totals are 96 and 71. The overall sample size is 167. The table of percentages is:

         Column 1              Column 2               Column 3               Totals
Row 1    (9/25)100 = 36%       (34/64)100 = 53.1%     (53/78)100 = 67.9%     (96/167)100 = 57.5%
Row 2    (16/25)100 = 64%      (30/64)100 = 46.9%     (25/78)100 = 32.1%     (71/167)100 = 42.5%

b.

Using MINITAB, the graph is:

[Bar graph: Percent (vertical axis) by Column (horizontal axis), with the Row 1 total percentage, 57.5, marked on the Percent axis]

c.

If the rows and columns are independent, the row percentages in each column would be close to the row total percentages. This pattern is not evident in the plot, implying the rows and columns are not independent. In Exercise 10.20, we did not have enough evidence to say the rows and columns were not independent. If the sample sizes were bigger, we would have been able to reject $H_0$.

10.22

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{154(134)}{439} = 47.007$
$\hat{E}_{12} = \frac{154(163)}{439} = 57.180$
$\hat{E}_{13} = \frac{154(142)}{439} = 49.813$
$\hat{E}_{21} = \frac{186(134)}{439} = 56.774$
$\hat{E}_{22} = \frac{186(163)}{439} = 69.062$
$\hat{E}_{23} = \frac{186(142)}{439} = 60.164$
$\hat{E}_{31} = \frac{99(134)}{439} = 30.219$
$\hat{E}_{32} = \frac{99(163)}{439} = 36.759$
$\hat{E}_{33} = \frac{99(142)}{439} = 32.023$

To determine if the row and column classifications are dependent, we test:

$H_0$: The row and column classifications are independent
$H_a$: The row and column classifications are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(40 - 47.007)^2}{47.007} + \frac{(72 - 57.180)^2}{57.180} + \frac{(42 - 49.813)^2}{49.813} + \frac{(63 - 56.774)^2}{56.774} + \frac{(53 - 69.062)^2}{69.062} + \frac{(70 - 60.164)^2}{60.164} + \frac{(31 - 30.219)^2}{30.219} + \frac{(38 - 36.759)^2}{36.759} + \frac{(30 - 32.023)^2}{32.023} = 12.33$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 12.33 > 9.48773$), $H_0$ is rejected. There is sufficient evidence to indicate the row and column classifications are dependent at $\alpha = .05$.
10.23

a-b.

To convert the frequencies to percentages, divide the numbers in each column by the column total and multiply by 100. Also, divide the row totals by the overall total and multiply by 100.

                                       B
Row    B1                      B2                      B3                      Totals
A1     (40/134)100 = 29.9%     (72/163)100 = 44.2%     (42/142)100 = 29.6%     (154/439)100 = 35.1%
A2     (63/134)100 = 47.0%     (53/163)100 = 32.5%     (70/142)100 = 49.3%     (186/439)100 = 42.4%
A3     (31/134)100 = 23.1%     (38/163)100 = 23.3%     (30/142)100 = 21.1%     (99/439)100 = 22.6%

c.

Using MINITAB, the graph of A1 is:

[Bar graph: Percent (vertical axis) by level of B (horizontal axis), with the A1 total percentage, 35.1, marked on the Percent axis]

The graph supports the conclusion that the rows and columns are not independent. If they were, then the height of all the bars would be essentially the same.


d.

Using MINITAB, the graph of A2 is:

[Bar graph: Percent (vertical axis) by level of B (horizontal axis), with the A2 total percentage, 42.4, marked on the Percent axis]

The graph supports the conclusion that the rows and columns are not independent. If they were, then the height of all the bars would be essentially the same.
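e.

Using MINITAB, the graph of A3 is:

[Bar graph: Percent (vertical axis) by level of B (horizontal axis), with the A3 total percentage, 22.6, marked on the Percent axis]

The graph does not support the conclusion that the rows and columns are not independent. All the bars are essentially the same height.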
10.24

a.

The two qualitative variables are model of Accord and injury (yes or no).

b.

The contingency table is:

                 Injury    No Injury     Total
Conventional      5,364       44,768    50,132
Hybrid              137        1,368     1,505
Total             5,501       46,136    51,637


c.

To determine if the injury rate for collision claims depends on Accord model, we test:

$H_0$: Model and Injury rate are independent
$H_a$: Model and Injury rate are dependent

d.

The expected cell counts are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{50{,}132(5{,}501)}{51{,}637} = 5{,}340.67$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{50{,}132(46{,}136)}{51{,}637} = 44{,}791.33$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{1{,}505(5{,}501)}{51{,}637} = 160.33$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{1{,}505(46{,}136)}{51{,}637} = 1{,}344.67$

e.

The test statistic is:

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(5{,}364 - 5{,}340.67)^2}{5{,}340.67} + \frac{(44{,}768 - 44{,}791.33)^2}{44{,}791.33} + \frac{(137 - 160.33)^2}{160.33} + \frac{(1{,}368 - 1{,}344.67)^2}{1{,}344.67} = 3.91$

This agrees with the test statistic found on the XLSTAT printout.

f.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection region is $\chi^2 > 3.84146$. This is the same critical value found on the XLSTAT printout.

g.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 3.91 > 3.84146$), $H_0$ is rejected. There is sufficient evidence to indicate the injury rate for collision claims depends on Accord model at $\alpha = .05$.

Since the p-value is less than $\alpha$ ($p = .0479 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate the injury rate for collision claims depends on Accord model at $\alpha = .05$.
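
Both the test statistic and the p-value cited from the printout can be approximated from the four cell counts alone. The sketch below is an independent plain-Python check, not XLSTAT's procedure; the variable names are illustrative. With 1 degree of freedom, a chi-square variable is the square of a standard normal variable, so its upper-tail area is erfc(sqrt(x/2)).

```python
import math

# Rows: Conventional, Hybrid. Columns: Injury, No Injury.
table = [[5364, 44768],
         [137, 1368]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

statistic = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n   # R_i * C_j / n
        statistic += (observed - expected) ** 2 / expected

# Upper-tail area for df = 1 only.
p_value = math.erfc(math.sqrt(statistic / 2))

print(round(statistic, 2), round(p_value, 4))
```

Because the p-value is only slightly below .05, the rejection-region and p-value approaches agree here, but the decision would change at $\alpha = .01$.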

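h.

Some preliminary calculations are:

$\hat{p}_1 = \frac{x_1}{n_1} = \frac{5{,}364}{50{,}132} = .107$
$\hat{p}_2 = \frac{x_2}{n_2} = \frac{137}{1{,}505} = .091$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\hat{p}_1 - \hat{p}_2) \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}} \Rightarrow (.107 - .091) \pm 1.96\sqrt{\frac{.107(.893)}{50{,}132} + \frac{.091(.909)}{1{,}505}} \Rightarrow .016 \pm .015 \Rightarrow (.001, .031)$

Since the interval contains only positive numbers, the injury rate for hybrid Accords is less than the injury rate for conventional Accords.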
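
The interval for the difference between two proportions can be sketched in the same way as the one-sample interval. This is an illustration only, assuming the Python standard library; the function name `two_proportion_ci` is hypothetical, and the z value is the Table II figure quoted above.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z):
    """Large-sample interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * std_error, diff + z * std_error

# Conventional: 5,364 injuries in 50,132 claims; hybrid: 137 in 1,505 claims.
lower, upper = two_proportion_ci(5364, 50132, 137, 1505, 1.96)
print(round(lower, 3), round(upper, 3))
```

The lower endpoint is only barely above 0, which mirrors the borderline p-value reported for the chi-square test on the same table.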

10.25


a.

Yes, it appears that the male and female tourists differ in their responses to purchasing photographs,
postcards, and paintings. The values in the Always and Rarely or Never categories are quite
different. The percentages are insufficient to draw a conclusion because the sample sizes must be
taken into account.

b.

The counts are found by changing the percentages to proportions and multiplying the proportions by the sample sizes in each gender. The counts are:

                   Male Tourist   Female Tourist   Total
Always                      240              476     716
Often                       405              527     932
Occasionally                525              493    1018
Rarely or Never             330              204     534
Total                      1500             1700    3200

c.

To determine whether male and female tourists differ in their responses to purchasing photographs, postcards, or paintings, we test:

$H_0$: Gender and purchasing are independent
$H_a$: Gender and purchasing are dependent

d.

The test statistic is $\chi^2 = 112.433$ and the p-value is $p = .000$.

e.

Since the p-value is less than $\alpha$ ($p = .000 < .01$), $H_0$ is rejected. There is sufficient evidence to indicate male and female tourists differ in their responses to purchasing photographs, postcards, or paintings at $\alpha = .01$.

10.26

a.

The sample proportion of negative tone news stories that are deceptive is $111/170 = .653$.

b.

The sample proportion of neutral tone news stories that are deceptive is $61/110 = .555$.

c.

The sample proportion of positive tone news stories that are deceptive is $11/31 = .355$.

d.

Yes, it appears that the proportion of news stories that are deceptive depends on the story tone. The proportion that is deceptive for negative tone stories is .653, while the proportion that is deceptive for positive tone stories is only .355. These proportions look much different.

e.

To determine if the authenticity of a news story depends on tone, we test:

$H_0$: Authenticity and tone are independent
$H_a$: Authenticity and tone are dependent

f.

The test statistic is $\chi^2 = 10.427$ and the p-value is $p = .005$.

Since the p-value is less than $\alpha$ ($p = .005 < .05$), $H_0$ is rejected. There is sufficient evidence to indicate authenticity of a news story depends on tone at $\alpha = .05$.

10.27

a.

To compare the two proportions, we could use either a test of hypothesis or a confidence interval. I will use a 95% confidence interval.

Some preliminary calculations are:

$\hat{p}_{M1} = \frac{x_{M1}}{n_M} = \frac{29}{103} = .282$
$\hat{p}_{F1} = \frac{x_{F1}}{n_F} = \frac{89}{174} = .511$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\hat{p}_{M1} - \hat{p}_{F1}) \pm z_{.025}\sqrt{\frac{\hat{p}_{M1}\hat{q}_{M1}}{n_M} + \frac{\hat{p}_{F1}\hat{q}_{F1}}{n_F}} \Rightarrow (.282 - .511) \pm 1.96\sqrt{\frac{.282(.718)}{103} + \frac{.511(.489)}{174}} \Rightarrow -.229 \pm .114 \Rightarrow (-.343, -.115)$

We are 95% confident that the difference in the proportions of male and female professionals who believe their salaries are too low is between -.343 and -.115. Since 0 is not in this interval, there is evidence that the two proportions are different.
b.

Some preliminary calculations are:

$\hat{p}_{M2} = \frac{x_{M2}}{n_M} = \frac{58}{103} = .563$
$\hat{p}_{F2} = \frac{x_{F2}}{n_F} = \frac{64}{174} = .368$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\hat{p}_{M2} - \hat{p}_{F2}) \pm z_{.025}\sqrt{\frac{\hat{p}_{M2}\hat{q}_{M2}}{n_M} + \frac{\hat{p}_{F2}\hat{q}_{F2}}{n_F}} \Rightarrow (.563 - .368) \pm 1.96\sqrt{\frac{.563(.437)}{103} + \frac{.368(.632)}{174}} \Rightarrow .195 \pm .120 \Rightarrow (.075, .315)$

We are 95% confident that the difference in the proportions of male and female professionals who believe their salaries are equitable/fair is between .075 and .315. Since 0 is not in this interval, there is evidence that the two proportions are different.
c.

Some preliminary calculations are:

$\hat{p}_{M3} = \frac{x_{M3}}{n_M} = \frac{16}{103} = .155$
$\hat{p}_{F3} = \frac{x_{F3}}{n_F} = \frac{21}{174} = .121$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D, $z_{.025} = 1.96$. The 95% confidence interval is:

$(\hat{p}_{M3} - \hat{p}_{F3}) \pm z_{.025}\sqrt{\frac{\hat{p}_{M3}\hat{q}_{M3}}{n_M} + \frac{\hat{p}_{F3}\hat{q}_{F3}}{n_F}} \Rightarrow (.155 - .121) \pm 1.96\sqrt{\frac{.155(.845)}{103} + \frac{.121(.879)}{174}} \Rightarrow .034 \pm .085 \Rightarrow (-.051, .119)$

We are 95% confident that the difference in the proportions of male and female professionals who believe they are well paid is between -.051 and .119. Since 0 is in this interval, there is no evidence that the two proportions are different.


d.

Yes. Since there were differences between the proportions of males and females on 2 of the 3 levels, there is evidence that the opinions of males and females are different.

e.

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{118(103)}{277} = 43.877$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{118(174)}{277} = 74.123$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{122(103)}{277} = 45.365$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{122(174)}{277} = 76.635$
$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{37(103)}{277} = 13.758$
$\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{37(174)}{277} = 23.242$

To determine if the opinions on the fairness of a travel professional's salary differ for males and females, we test:

$H_0$: Opinion and Gender are independent
$H_a$: Opinion and Gender are dependent

The test statistic is

$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(29 - 43.877)^2}{43.877} + \frac{(89 - 74.123)^2}{74.123} + \frac{(58 - 45.365)^2}{45.365} + \frac{(64 - 76.635)^2}{76.635} + \frac{(16 - 13.758)^2}{13.758} + \frac{(21 - 23.242)^2}{23.242} = 14.214$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is $\chi^2 > 4.60517$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 14.214 > 4.60517$), $H_0$ is rejected. There is sufficient evidence to indicate that the opinions on the fairness of a travel professional's salary differ for males and females at $\alpha = .10$.
f.

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D, $z_{.05} = 1.645$. The 90% confidence interval is:

$(\hat{p}_{M1} - \hat{p}_{F1}) \pm z_{.05}\sqrt{\frac{\hat{p}_{M1}\hat{q}_{M1}}{n_M} + \frac{\hat{p}_{F1}\hat{q}_{F1}}{n_F}} \Rightarrow (.282 - .511) \pm 1.645\sqrt{\frac{.282(.718)}{103} + \frac{.511(.489)}{174}} \Rightarrow -.229 \pm .096 \Rightarrow (-.325, -.133)$

We are 90% confident that the difference in the proportions of male and female professionals who believe their salaries are too low is between -.325 and -.133. Since 0 is not in this interval, there is evidence that the two proportions are different.
10.28

a.

Let $p_3$ = proportion of the 3-photos per page group who selected the target mugshot, $p_6$ = proportion of the 6-photos per page group who selected the target mugshot, and $p_{12}$ = proportion of the 12-photos per page group who selected the target mugshot.

$\hat{p}_3 = \frac{19}{32} = .594$, $\hat{p}_6 = \frac{19}{32} = .594$, $\hat{p}_{12} = \frac{15}{32} = .469$

The 12-photos per page group had the lowest proportion.


b.

The contingency table is:

                       Target Mugshot   Target Mugshot
                          selected       not selected    Total
3-photos per page               19                13        32
6-photos per page               19                13        32
12-photos per page              15                17        32
Total                           53                43        96

c.

Some preliminary calculations are:


R C 32(53)
R C 32(53)
17.667
17.667
E11 1 1
E 21 2 1
96
96
n
n

R C 32(53)
17.667
E 31 3 1
n
96

RC
32(43)
14.333
E12 1 2
n
96

RC
32(43)
E 32 3 2
14.333
n
96

RC
32(43)
14.333
E 22 2 2
n
96

To determine if there are differences in the proportions who selected the target mugshot among the
three photo groups, we test:
H 0 : Photo group and Mugshot selection are independent
H a : Photo group and Mugshot selection are dependent

The test statistic is:

$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(19 - 17.667)^2}{17.667} + \frac{(13 - 14.333)^2}{14.333} + \frac{(19 - 17.667)^2}{17.667} + \frac{(13 - 14.333)^2}{14.333} + \frac{(15 - 17.667)^2}{17.667} + \frac{(17 - 14.333)^2}{14.333} = 1.348$$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection
region is $\chi^2 > 4.60517$.
Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = 1.348 \not> 4.60517$), H0 is not rejected. There is insufficient evidence to indicate that there are
differences in the proportions who selected the target mugshot among the three photo groups at
$\alpha = .10$.

10.29

Using MINITAB, the contingency table analysis is:


Tabulated statistics: Position, Nationality
Using frequencies in Fr

Rows: Position   Columns: Nationality

                                     All
1       126     75     35     93     329
2        72     36     10     27     145
3        30      9      4      6      49
4       372    180     51    174     777
All     600    300    100    300    1300

Cell Contents:      Count

Pearson Chi-Square = 21.242, DF = 9, P-Value = 0.012
Likelihood Ratio Chi-Square = 21.327, DF = 9, P-Value = 0.011

To determine if a firm's position on off-shoring depends on the firm's nationality, we test:

H0: Position and Nationality are independent
Ha: Position and Nationality are dependent

From the printout, the test statistic is $\chi^2 = 21.242$ and the p-value is $p = .012$. Since the p-value is less than
$\alpha$ ($p = .012 < .05$), H0 is rejected. There is sufficient evidence to indicate a firm's position on off-shoring
depends on the firm's nationality at $\alpha = .05$.
10.30

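Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{57(60)}{171} = 20$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{58(60)}{171} = 20.35$
$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{56(60)}{171} = 19.65$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{57(111)}{171} = 37$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{58(111)}{171} = 37.65$
$\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{56(111)}{171} = 36.35$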

To determine if the option choice depends on emotion state, we test:


H0: Option choice and emotion state are independent
Ha: Option choice and emotion state are dependent
The test statistic is
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(45 - 20)^2}{20} + \frac{(12 - 37)^2}{37} + \frac{(8 - 20.35)^2}{20.35} + \frac{(50 - 37.65)^2}{37.65} + \frac{(7 - 19.65)^2}{19.65} + \frac{(49 - 36.35)^2}{36.35} = 72.234$$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.10} = 4.60517$. The rejection region is
$\chi^2 > 4.60517$.
Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 72.234 > 4.60517$), H0 is
rejected. There is sufficient evidence to indicate that the option choice depends on emotion state at $\alpha = .10$.
10.31

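Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{396(335)}{859} = 154.435$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{311(335)}{859} = 121.286$
$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{70(335)}{859} = 27.299$
$\hat{E}_{41} = \frac{R_4 C_1}{n} = \frac{39(335)}{859} = 15.210$
$\hat{E}_{51} = \frac{R_5 C_1}{n} = \frac{18(335)}{859} = 7.020$
$\hat{E}_{61} = \frac{R_6 C_1}{n} = \frac{25(335)}{859} = 9.750$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{396(524)}{859} = 241.565$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{311(524)}{859} = 189.714$
$\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{70(524)}{859} = 42.701$
$\hat{E}_{42} = \frac{R_4 C_2}{n} = \frac{39(524)}{859} = 23.790$
$\hat{E}_{52} = \frac{R_5 C_2}{n} = \frac{18(524)}{859} = 10.980$
$\hat{E}_{62} = \frac{R_6 C_2}{n} = \frac{25(524)}{859} = 15.250$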

To determine if the proportions of mobile device users in the six texting style categories depend on whether
a male or a female is texting, we test:
H 0 : Texting style and sex are independent
H a : Texting style and sex are dependent

The test statistic is:

$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(161 - 154.435)^2}{154.435} + \frac{(235 - 241.565)^2}{241.565} + \cdots + \frac{(14 - 15.250)^2}{15.250} = 4.209$$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (6 - 1)(2 - 1) = 5$. From Table IV, Appendix D, $\chi^2_{.10} = 9.23635$. The rejection region is
$\chi^2 > 9.23635$.
Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 4.209 \not> 9.23635$), H0
is not rejected. There is insufficient evidence to indicate the proportions of mobile device users in the six
texting style categories depend on whether a male or a female is texting at $\alpha = .10$.
10.32

Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{234(40)}{437} = 21.419$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{234(397)}{437} = 212.581$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{203(40)}{437} = 18.581$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{203(397)}{437} = 184.419$

To determine if the response rate of air traffic controllers to mid-air collision alarms differs for true and
false alerts, we test:
H 0 : Responses and alerts are independent
H a : Responses and alerts are dependent

The test statistic is:

$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(3 - 21.419)^2}{21.419} + \frac{(231 - 212.581)^2}{212.581} + \frac{(37 - 18.581)^2}{18.581} + \frac{(166 - 184.419)^2}{184.419} = 37.533$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection region is
$\chi^2 > 3.84146$.
Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 37.533 > 3.84146$), H0 is
rejected. There is sufficient evidence to indicate the response rate of air traffic controllers to mid-air
collision alarms differs for true and false alerts at $\alpha = .05$.
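
For a 2 x 2 table (df = 1), the upper-tail chi-square area can also be computed directly instead of read from Table IV, because a chi-square variable with 1 df is the square of a standard normal variable. The sketch below relies on that identity and on Python's `math.erfc`; the function name `chi2_upper_tail_df1` is hypothetical.

```python
import math

def chi2_upper_tail_df1(x):
    """P(chi-square with 1 df > x), using chi2_1 = Z^2 (hypothetical helper).

    P(Z^2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

# The tabled critical value should leave about .05 in the upper tail
area_at_critical = chi2_upper_tail_df1(3.84146)

# Upper-tail area beyond the observed statistic from this exercise
p_value = chi2_upper_tail_df1(37.533)
print(round(area_at_critical, 4), p_value < 0.05)
```

The identity holds only for df = 1; tables with more rows or columns need the general chi-square distribution.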
10.33

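Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{32(32)}{96} = 10.667$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{32(32)}{96} = 10.667$
$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{32(32)}{96} = 10.667$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{32(64)}{96} = 21.333$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{32(64)}{96} = 21.333$
$\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{32(64)}{96} = 21.333$
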
To determine if the proportion of subjects who selected menus consistent with the theory depends on goal
condition, we test:

H0: Goal condition and Consistent with theory are independent


Ha: Goal condition and Consistent with theory are dependent
The test statistic is
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(15 - 10.667)^2}{10.667} + \frac{(17 - 21.333)^2}{21.333} + \frac{(14 - 10.667)^2}{10.667} + \frac{(18 - 21.333)^2}{21.333} + \frac{(3 - 10.667)^2}{10.667} + \frac{(29 - 21.333)^2}{21.333} = 12.469$$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection region is
$\chi^2 > 9.21034$.
Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 12.469 > 9.21034$), H0 is
rejected. There is sufficient evidence to indicate that the proportion of subjects who selected menus
consistent with the theory depends on goal condition at $\alpha = .01$.
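
The hand calculation above follows the same recipe as every contingency-table exercise in this chapter: estimate each expected count as (row total)(column total)/n, then sum the squared deviations divided by the expected counts. A minimal sketch of that recipe is given below. The function name and the row/column layout of the observed table (three goal conditions by two outcomes) are assumptions read off the terms of the test statistic.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, df) for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Estimated expected count under independence
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

# Observed counts taken from the numerators of the test statistic above
goal_table = [[15, 17], [14, 18], [3, 29]]
stat, df = chi_square_independence(goal_table)
print(round(stat, 3), df)
```

Comparing `stat` with the tabled critical value for `df` degrees of freedom reproduces the rejection-region decision.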


10.34

Using MINITAB, the results of the table comparing type of coupon user and gender are:
Tabulated statistics: USER, GENDER

Rows: USER   Columns: GENDER

        Female   Male   All
both       104     31   135
mail       178     84   262
net         36      7    43
All        318    122   440

Cell Contents:      Count

Pearson Chi-Square = 6.797, DF = 2, P-Value = 0.033
Likelihood Ratio Chi-Square = 7.105, DF = 2, P-Value = 0.029

To determine if type of coupon user depends on gender, we test:


H 0 : Type of coupon user and gender are independent
H a : Type of coupon user and gender are dependent

The test statistic is $\chi^2 = 6.797$ and the p-value is $p = .033$. Since the p-value is not less than
$\alpha$ ($p = .033 \not< .01$), H0 is not rejected. There is insufficient evidence to indicate type of coupon user
depends on gender at $\alpha = .01$.
Using MINITAB, the results of the table comparing type of coupon user and coupon usage satisfaction
level are:
Tabulated statistics: USER, SATISF

Rows: USER   Columns: SATISF

         No   Some   Yes   All
both      3      9   123   135
mail     28     62   172   262
net       4      9    30    43
All      35     80   325   440

Cell Contents:      Count

Pearson Chi-Square = 30.418, DF = 4, P-Value = 0.000
Likelihood Ratio Chi-Square = 34.934, DF = 4, P-Value = 0.000

To determine if type of coupon user depends on coupon usage satisfaction level, we test:
H 0 : Type of coupon user and coupon usage satisfaction level are independent
H a : Type of coupon user and coupon usage satisfaction level are dependent

The test statistic is $\chi^2 = 30.418$ and the p-value is $p = .000$. Since the p-value is less than
$\alpha$ ($p = .000 < .01$), H0 is rejected. There is sufficient evidence to indicate type of coupon user depends on
coupon usage satisfaction level at $\alpha = .01$.

10.35

Using MINITAB, the results are:


Tabulated statistics: Instruction, Strategy

Rows: Instruction   Columns: Strategy

            Guess    Other     TTBC      All

Cue             5        6       13       24
            20.83    25.00    54.17   100.00
            35.71    35.29    76.47    50.00

Pattern         9       11        4       24
            37.50    45.83    16.67   100.00
            64.29    64.71    23.53    50.00

All            14       17       17       48
            29.17    35.42    35.42   100.00
           100.00   100.00   100.00   100.00

Cell Contents:      Count
                    % of Row
                    % of Column

Pearson Chi-Square = 7.378, DF = 2, P-Value = 0.025
Likelihood Ratio Chi-Square = 7.668, DF = 2, P-Value = 0.022

To determine if the choice of heuristic strategy depends on type of instruction, we test:


H0: Heuristic strategy and type of instruction are independent
Ha: Heuristic strategy and type of instruction are dependent
From the printout, the test statistic is $\chi^2 = 7.378$ and the p-value is $p = .025$.
Since the p-value is less than $\alpha$ ($p = .025 < .05$), H0 is rejected. There is sufficient evidence to indicate the
choice of heuristic strategy depends on type of instruction at $\alpha = .05$.
Since the p-value is not less than $\alpha$ ($p = .025 \not< .01$), H0 is not rejected. There is insufficient evidence to
indicate the choice of heuristic strategy depends on type of instruction at $\alpha = .01$.

10.36

a.

Using MINITAB, the results for the First Trial are:


Chi-Square Test: Switch Boxes, No Switch

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Switch
          Boxes   No Switch   Total
             10          17      27
           6.50       20.50
          1.885       0.598

              3          24      27
           6.50       20.50
          1.885       0.598

              5          22      27
           6.50       20.50
          0.346       0.110

              8          19      27
           6.50       20.50
          0.346       0.110

Total        26          82     108

Chi-Sq = 5.876, DF = 3, P-Value = 0.118

To determine if the likelihood of switching boxes depends on condition for the first trial, we test:
H0: Likelihood of switching boxes and condition are independent
Ha: Likelihood of switching boxes and condition are dependent
From the printout above, the test statistic is $\chi^2 = 5.876$ and the p-value is $p = 0.118$. Since the p-value
is not small, H0 is not rejected. There is insufficient evidence to indicate that the likelihood of
switching boxes depends on condition for the first trial for any value of $\alpha < .118$.
Using MINITAB, the results for the Last Trial are:
Chi-Square Test: Switch Boxes, No Switch

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Switch
          Boxes   No Switch   Total
             23           4      27
          18.75        8.25
          0.963       2.189

             12          15      27
          18.75        8.25
          2.430       5.523

             21           6      27
          18.75        8.25
          0.270       0.614

             19           8      27
          18.75        8.25
          0.003       0.008

Total        75          33     108

Chi-Sq = 12.000, DF = 3, P-Value = 0.007


To determine if the likelihood of switching boxes depends on condition for the last trial, we test:
H0: Likelihood of switching boxes and condition are independent
Ha: Likelihood of switching boxes and condition are dependent
From the printout above, the test statistic is $\chi^2 = 12.00$ and the p-value is $p = 0.007$. Since the p-value
is small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes
depends on condition for the last trial for any value of $\alpha > .007$.
b.

Using MINITAB, the results from the Empty condition are:


Chi-Square Test: Switch Boxes, No Switch

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Switch
          Boxes   No Switch   Total
             10          17      27
          16.50       10.50
          2.561       4.024

             23           4      27
          16.50       10.50
          2.561       4.024

Total        33          21      54

Chi-Sq = 13.169, DF = 1, P-Value = 0.000

To determine if the likelihood of switching boxes depends on trial number for the Empty condition, we
test:
H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent
From the printout above, the test statistic is $\chi^2 = 13.169$ and the p-value is $p = 0.000$. Since the
p-value is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of
switching boxes depends on trial number for the Empty condition for any value of $\alpha > .000$.
Using MINITAB, the results from the Vanish condition are:
Chi-Square Test: Switch Boxes, No Switch

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Switch
          Boxes   No Switch   Total
              3          24      27
           7.50       19.50
          2.700       1.038

             12          15      27
           7.50       19.50
          2.700       1.038

Total        15          39      54

Chi-Sq = 7.477, DF = 1, P-Value = 0.006


To determine if the likelihood of switching boxes depends on trial number for the Vanish condition,
we test:
H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent
From the printout above, the test statistic is $\chi^2 = 7.477$ and the p-value is $p = 0.006$. Since the p-value
is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching
boxes depends on trial number for the Vanish condition for any value of $\alpha > .006$.
Using MINITAB, the results from the Steroids condition are:
Chi-Square Test: Switch Boxes, No Switch

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Switch
          Boxes   No Switch   Total
              5          22      27
          13.00       14.00
          4.923       4.571

             21           6      27
          13.00       14.00
          4.923       4.571

Total        26          28      54

Chi-Sq = 18.989, DF = 1, P-Value = 0.000

To determine if the likelihood of switching boxes depends on trial number for the Steroids condition,
we test:
H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent
From the printout above, the test statistic is $\chi^2 = 18.989$ and the p-value is $p = .000$. Since the p-value
is so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching
boxes depends on trial number for the Steroids condition for any value of $\alpha > .000$.
Using MINITAB, the results from the Steroids2 condition are:
Chi-Square Test: Switch Boxes, No Switch

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

         Switch
          Boxes   No Switch   Total
              8          19      27
          13.50       13.50
          2.241       2.241

             19           8      27
          13.50       13.50
          2.241       2.241

Total        27          27      54

Chi-Sq = 8.963, DF = 1, P-Value = 0.003


To determine if the likelihood of switching boxes depends on trial number for the Steroids2 condition,
we test:
H0: Likelihood of switching boxes and trial number are independent
Ha: Likelihood of switching boxes and trial number are dependent
From the printout above, the test statistic is $\chi^2 = 8.963$ and the p-value is $p = .003$. Since the p-value is
so small, H0 is rejected. There is sufficient evidence to indicate that the likelihood of switching boxes
depends on trial number for the Steroids2 condition for any value of $\alpha > .003$.

c.

Of all the tests performed, only one was not significant. There was no evidence that the likelihood of
switching boxes depended on condition for the first trial. All other tests indicated that the variables
were dependent. Thus, both condition and trial number influence a subject to switch.

10.37

a.

To determine if the vaccine is effective in treating the MN strain of HIV, we test:
H0: Vaccine status and MN strain are independent
Ha: Vaccine status and MN strain are dependent
From the printout, the test statistic is $\chi^2 = 4.411$ and the p-value is $p = 0.036$. Since the p-value is
less than $\alpha$ ($p = .036 < .05$), H0 is rejected. There is sufficient evidence to indicate that the vaccine is
effective in treating the MN strain of HIV at $\alpha = .05$.

b.

We must assume that we have a random sample from the population of interest. We cannot really
check this assumption. The second assumption is that all expected cell counts will be 5 or more. In
this case, since there are only 7 observations in the second row, there is no way that the expected cell
counts in that row will both be 5 or more (the expected cell counts in the row must sum to
the observed row total).

c.

$$\frac{\binom{7}{2}\binom{31}{22}}{\binom{38}{24}} = \frac{\dfrac{7!}{2!(7-2)!} \cdot \dfrac{31!}{22!(31-22)!}}{\dfrac{38!}{24!(38-24)!}} = .04378$$

d.

If vaccine status and MN strain are independent, then the proportion of those in each group that are positive
should be very similar. In these two additional tables, the proportion of positive results for the
unvaccinated group is increasing and the proportion of positive results for the vaccinated group is
decreasing.

Table 1:
$$\frac{\binom{7}{1}\binom{31}{23}}{\binom{38}{24}} = \frac{\dfrac{7!}{1!(7-1)!} \cdot \dfrac{31!}{23!(31-23)!}}{\dfrac{38!}{24!(38-24)!}} = .00571$$

Table 2:
$$\frac{\binom{7}{0}\binom{31}{24}}{\binom{38}{24}} = \frac{\dfrac{7!}{0!(7-0)!} \cdot \dfrac{31!}{24!(31-24)!}}{\dfrac{38!}{24!(38-24)!}} = .00027$$
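
The three hypergeometric probabilities in parts c and d can be reproduced with exact integer binomial coefficients. The sketch below uses `math.comb` from the Python standard library; the function name and argument names are illustrative, and the margins (groups of 7 and 31, column total of 24) are taken from the formulas above.

```python
from math import comb

def table_probability(x, small_group, large_group, column_total):
    """Hypergeometric probability of a 2 x 2 table with fixed margins.

    x is the count from the group of size small_group that falls in the
    column whose total is column_total (hypothetical argument names).
    """
    ways = comb(small_group, x) * comb(large_group, column_total - x)
    return ways / comb(small_group + large_group, column_total)

# Observed table (x = 2) and the two more extreme tables (x = 1, x = 0)
probabilities = [table_probability(x, 7, 31, 24) for x in (2, 1, 0)]
exact_p_value = sum(probabilities)
print([round(p, 5) for p in probabilities], round(exact_p_value, 5))
```

Summing the observed table and the more extreme tables gives a one-tailed exact p-value of about .0498, in line with the rounded hand calculation.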

e.

The p-value is $.04378 + .00571 + .00027 = .04976$. Since the p-value is less than $\alpha$ ($p = .04976 < .05$),
H0 is rejected. There is sufficient evidence to indicate that the vaccine is effective in treating the MN
strain of HIV at $\alpha = .05$.

10.38

a.

Some preliminary calculations are:


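$\hat{E}_{11} = \frac{50(50)}{250} = 10$, $\hat{E}_{12} = \frac{50(90)}{250} = 18$, $\hat{E}_{13} = \frac{50(110)}{250} = 22$
$\hat{E}_{21} = \frac{100(50)}{250} = 20$, $\hat{E}_{22} = \frac{100(90)}{250} = 36$, $\hat{E}_{23} = \frac{100(110)}{250} = 44$
$\hat{E}_{31} = \frac{100(50)}{250} = 20$, $\hat{E}_{32} = \frac{100(90)}{250} = 36$, $\hat{E}_{33} = \frac{100(110)}{250} = 44$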

To determine if the rows and columns are dependent, we test:

H0: Rows and columns are independent
Ha: Rows and columns are dependent

The test statistic is
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(20 - 10)^2}{10} + \cdots + \frac{(30 - 44)^2}{44} = 54.14$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$. From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection
region is $\chi^2 > 9.48773$.
Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 54.14 > 9.48773$), H0
is rejected. There is sufficient evidence to indicate a dependence between rows and columns at
$\alpha = .05$.
b.

No, the analysis remains identical.

c.

Yes, the assumptions differ. If the row and column totals are not fixed, then we assume that we take
a random sample from a multinomial distribution. If the row totals are fixed, then we assume that we
are taking k random samples from k multinomial populations.

d.

The percentages are in the table below.

                                        Column
Row      1                       2                       3                        Totals
1        (20/50)(100%) = 40%     (20/90)(100%) = 22.2%   (10/110)(100%) = 9.1%    (50/250)(100%) = 20%
2        (10/50)(100%) = 20%     (20/90)(100%) = 22.2%   (70/110)(100%) = 63.6%   (100/250)(100%) = 40%
3        (20/50)(100%) = 40%     (50/90)(100%) = 55.6%   (30/110)(100%) = 27.3%   (100/250)(100%) = 40%

e.

Using MINITAB, the bar graph is:

[Bar graph: Percent (vertical axis) versus Column (horizontal axis)]

The graph supports the decision in part a. In part a, we rejected the null hypothesis and concluded
that the rows and columns were dependent. If they were independent, then we would expect the three
bars to be the same height. In this graph, they are not the same height.
10.39

a.

If all the categories are equally likely, then $p_{1,0} = p_{2,0} = p_{3,0} = p_{4,0} = p_{5,0} = .2$.
$E_1 = E_2 = E_3 = E_4 = E_5 = np_{i,0} = 150(.20) = 30$

To determine if the categories are not equally likely, we test:

H0: $p_1 = p_2 = p_3 = p_4 = p_5 = .2$
Ha: At least one of the probabilities differs from .2

The test statistic is
$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(28 - 30)^2}{30} + \frac{(35 - 30)^2}{30} + \frac{(33 - 30)^2}{30} + \frac{(25 - 30)^2}{30} + \frac{(29 - 30)^2}{30} = 2.133$$

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$.
From Table IV, Appendix D, $\chi^2_{.10} = 7.77944$. The rejection region is $\chi^2 > 7.77944$.

Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = 2.133 \not> 7.77944$), H0 is not rejected. There is insufficient evidence to indicate the categories
are not equally likely at $\alpha = .10$.
b.

$$\hat{p}_2 = \frac{35}{150} = .233$$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D,
$z_{.05} = 1.645$. The confidence interval is:
$$\hat{p}_2 \pm z_{.05}\sqrt{\frac{\hat{p}_2\hat{q}_2}{n}} \Rightarrow .233 \pm 1.645\sqrt{\frac{.233(.767)}{150}} \Rightarrow .233 \pm .057 \Rightarrow (.176, .290)$$


10.40

a.

The qualitative variable of interest is the location of professional sports stadiums and ballparks.
There are 3 levels or categories of this variable: downtown, central city, and suburban.

b.

Let $p_1$ = proportion of major sports facilities located in downtown areas, $p_2$ = proportion of major
sports facilities located in central city areas, and $p_3$ = proportion of major sports facilities located in
suburban areas in 1997.
To determine if the proportions of major sports facilities in downtown, central city, and
suburban areas in 1997 are different from those in 1985, we test:
H0: $p_1 = .40$, $p_2 = .30$, $p_3 = .30$
Ha: At least one of the probabilities differs from its hypothesized value

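c.

$E_1 = np_{1,0} = 113(.40) = 45.2$; $E_2 = np_{2,0} = 113(.30) = 33.9$; $E_3 = np_{3,0} = 113(.30) = 33.9$

d.

The test statistic is
$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(58 - 45.2)^2}{45.2} + \frac{(26 - 33.9)^2}{33.9} + \frac{(29 - 33.9)^2}{33.9} = 6.174$$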

e.

The degrees of freedom for the test statistic are $df = k - 1 = 3 - 1 = 2$. The p-value is $p = P(\chi^2 \ge 6.174)$.
Using MINITAB,

Cumulative Distribution Function
Chi-Square with 2 DF

    x   P( X <= x )
6.174      0.954361

The p-value is $p = 1 - .954361 = .045639$.

Since the p-value is smaller than $\alpha$ ($p = .0456 < .05$), H0 is rejected. There is sufficient evidence to
indicate the proportions of major sports facilities in downtown, central city, and suburban areas in
1997 are different from those in 1985.
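
The MINITAB lookup in part e can be cross-checked by hand because the chi-square distribution with 2 degrees of freedom is an exponential distribution with mean 2, so its upper-tail area has the closed form exp(-x/2). The sketch below uses that fact; the function name is illustrative and the shortcut applies only when df = 2.

```python
import math

def chi2_upper_tail_df2(x):
    """P(chi-square with 2 df > x) = exp(-x / 2); valid only for df = 2."""
    return math.exp(-x / 2)

# Observed test statistic from part d
p_value = chi2_upper_tail_df2(6.174)
print(round(p_value, 6))
```

The result agrees with 1 minus the cumulative probability reported in the printout.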
10.41

a.

The qualitative variable in this exercise is what "Made in the USA" means. There are 4 levels or
categories for this variable: 100% of labor and materials are produced in the US, 75-99% of labor
and materials are produced in the US, 50-74% of labor and materials are produced in the US, and less
than 50% of labor and materials are produced in the US.

b.

The consumer advocate group hypothesized that $p_1 = 1/2 = .5$, $p_2 = 1/4 = .25$, $p_3 = 1/5 = .20$, and
$p_4 = .05$.

c.

To determine if the consumer advocate group's claim is correct, we test:
H0: $p_1 = .5$, $p_2 = .25$, $p_3 = .20$, and $p_4 = .05$
Ha: At least one of the probabilities differs from its hypothesized value


d.

Some preliminary calculations are:

$n = 64 + 20 + 18 + 4 = 106$.
$E_1 = np_{1,0} = 106(.50) = 53$; $E_2 = np_{2,0} = 106(.25) = 26.5$;
$E_3 = np_{3,0} = 106(.20) = 21.2$; $E_4 = np_{4,0} = 106(.05) = 5.3$

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(64 - 53)^2}{53} + \frac{(20 - 26.5)^2}{26.5} + \frac{(18 - 21.2)^2}{21.2} + \frac{(4 - 5.3)^2}{5.3} = 4.68$$

e.

The rejection region requires $\alpha = .10$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$.
From Table IV, Appendix D, $\chi^2_{.10} = 6.25139$. The rejection region is $\chi^2 > 6.25139$.

f.

Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = 4.68 \not> 6.25139$), H0 is not rejected. There is insufficient evidence to indicate the consumer
advocate group's claim is incorrect at $\alpha = .10$.

g.

$$\hat{p}_1 = \frac{n_1}{n} = \frac{64}{106} = .604$$

For confidence coefficient .90, $\alpha = .10$ and $\alpha/2 = .10/2 = .05$. From Table II, Appendix D,
$z_{.05} = 1.645$. The 90% confidence interval is:
$$\hat{p}_1 \pm z_{.05}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n}} \Rightarrow .604 \pm 1.645\sqrt{\frac{.604(.396)}{106}} \Rightarrow .604 \pm .078 \Rightarrow (.526, .682)$$

We are 90% confident that the proportion of all consumers who believe "Made in the USA" means
100% of labor and materials are produced in the US is between .526 and .682.
10.42

a.

The contingency table would be:

                       Itemize Deductions
Tax-motivation         Yes        No      Total
Yes                    691       381      1,072
No                     794       899      1,693
Total                1,485     1,280      2,765

b.

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{1{,}072(1{,}485)}{2{,}765} = 575.7$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{1{,}072(1{,}280)}{2{,}765} = 496.3$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{1{,}693(1{,}485)}{2{,}765} = 909.3$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{1{,}693(1{,}280)}{2{,}765} = 783.7$


c.

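The test statistic is:

$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{[691 - 575.7]^2}{575.7} + \frac{[381 - 496.3]^2}{496.3} + \frac{[794 - 909.3]^2}{909.3} + \frac{[899 - 783.7]^2}{783.7} = 81.46$$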

d.

To determine if tax-motivation and itemize-deduction are related for charitable givers, we test:
H0: Tax-motivation and itemize-deduction are independent
Ha: Tax-motivation and itemize-deduction are dependent
The test statistic is $\chi^2 = 81.46$.
The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection
region is $\chi^2 > 3.84146$.
Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 81.46 > 3.84146$), H0 is
rejected. There is sufficient evidence to indicate that tax-motivation and itemize-deduction are
related for charitable givers at $\alpha = .05$.

e.

To construct the bar graph, we first convert frequencies to percentages by dividing the numbers in
each column by the column total and multiplying by 100%. Also, divide the row totals by the overall
total and multiply by 100%.

                                          Itemize Deductions
Tax-motivation   Yes                           No                            Total
Yes              (691/1,485)(100%) = 46.5%     (381/1,280)(100%) = 29.8%     (1,072/2,765)(100%) = 38.8%
No               (794/1,485)(100%) = 53.5%     (899/1,280)(100%) = 70.2%     (1,693/2,765)(100%) = 61.2%
Total            1,485                         1,280                         2,765

Using MINITAB, the bar graph is:

[Bar graph: Percent (vertical axis) versus Itemize (Yes, No), with 38.8% marked]

10.43

a.

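Some preliminary calculations are:

$\hat{p}_{C1} = \frac{x_{C1}}{n_1} = \frac{175}{6{,}222} = .028$
$\hat{p}_{C2} = \frac{x_{C2}}{n_2} = \frac{236}{4{,}692} = .050$
$\hat{p}_{C3} = \frac{x_{C3}}{n_3} = \frac{319}{7{,}140} = .045$
$\hat{p}_{C4} = \frac{x_{C4}}{n_4} = \frac{231}{6{,}120} = .038$
$\hat{p}_{C5} = \frac{x_{C5}}{n_5} = \frac{480}{10{,}353} = .046$
$\hat{p}_{C6} = \frac{x_{C6}}{n_6} = \frac{187}{4{,}794} = .039$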

The proportions range from .028 to .050. Since .050 is about twice as big as .028, there may be
evidence to conclude some of the proportions are different.
b.

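Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{6{,}222(37{,}693)}{39{,}321} = 5{,}964.39$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{6{,}222(1{,}628)}{39{,}321} = 257.61$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{4{,}692(37{,}693)}{39{,}321} = 4{,}497.74$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{4{,}692(1{,}628)}{39{,}321} = 194.26$
$\hat{E}_{31} = \frac{R_3 C_1}{n} = \frac{7{,}140(37{,}693)}{39{,}321} = 6{,}844.38$
$\hat{E}_{32} = \frac{R_3 C_2}{n} = \frac{7{,}140(1{,}628)}{39{,}321} = 295.62$
$\hat{E}_{41} = \frac{R_4 C_1}{n} = \frac{6{,}120(37{,}693)}{39{,}321} = 5{,}866.61$
$\hat{E}_{42} = \frac{R_4 C_2}{n} = \frac{6{,}120(1{,}628)}{39{,}321} = 253.39$
$\hat{E}_{51} = \frac{R_5 C_1}{n} = \frac{10{,}353(37{,}693)}{39{,}321} = 9{,}924.36$
$\hat{E}_{52} = \frac{R_5 C_2}{n} = \frac{10{,}353(1{,}628)}{39{,}321} = 428.64$
$\hat{E}_{61} = \frac{R_6 C_1}{n} = \frac{4{,}794(37{,}693)}{39{,}321} = 4{,}595.51$
$\hat{E}_{62} = \frac{R_6 C_2}{n} = \frac{4{,}794(1{,}628)}{39{,}321} = 198.49$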

To determine if the proportions of censored measurements differ for the six tractor lines, we test:
H0: Tractor lines and Censored measurements are independent
Ha: Tractor lines and Censored measurements are dependent
The test statistic is
$$\chi^2 = \sum \frac{[n_{ij} - \hat{E}_{ij}]^2}{\hat{E}_{ij}} = \frac{(6047 - 5964.39)^2}{5964.39} + \frac{(175 - 257.61)^2}{257.61} + \frac{(4456 - 4497.74)^2}{4497.74} + \cdots + \frac{(187 - 198.49)^2}{198.49} = 48.0978$$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (6 - 1)(2 - 1) = 5$. From Table IV, Appendix D, $\chi^2_{.01} = 15.0863$. The rejection
region is $\chi^2 > 15.0863$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 48.0978 > 15.0863$),
H0 is rejected. There is sufficient evidence to indicate that the proportions of censored measurements
differ for the six tractor lines at $\alpha = .01$.

c.

Even though there are differences in the proportions of censored data among the 6 tractor lines, these
proportions range only from .028 to .050. In practice, there is very little difference between .028 and .050.

10.44

a.

Let $p_1$ = proportion of abstainers with congestive heart failure. Then $\hat{p}_1 = \frac{n_1}{n} = \frac{146}{896} = .163$.

b.

Let $p_2$ = proportion of moderate drinkers with congestive heart failure. Then $\hat{p}_2 = \frac{n_2}{n} = \frac{106}{696} = .152$.

c.

Let $p_3$ = proportion of heavy drinkers with congestive heart failure. Then $\hat{p}_3 = \frac{n_3}{n} = \frac{29}{321} = .090$.

d.

The three sample proportions found in parts a, b, and c appear to be different. It appears that the
proportion of AMI patients with congestive heart failure depends on alcohol consumption.

e.

To determine if the proportion of AMI patients with congestive heart failure depends on alcohol
consumption, we test:

H0: The proportion of AMI patients with congestive heart failure is independent of alcohol
consumption
Ha: The proportion of AMI patients with congestive heart failure depends on alcohol consumption

f.

Using MINITAB, the results are:


Chi-Square Test: Abstain, Less 7, 7 or more

Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

        Abstain   Less 7   7 or more   Total
            146      106          29     281
         131.61   102.24       47.15
          1.573    0.139       6.988

            750      590         292    1632
         764.39   593.76      273.85
          0.271    0.024       1.203

Total       896      696         321    1913

Chi-Sq = 10.197, DF = 2, P-Value = 0.006

The test statistic is $\chi^2 = 10.197$, and the p-value is $p = 0.006$.

Since the p-value is less than $\alpha$ ($p = .006 < .05$), H0 is rejected. There is sufficient evidence to
indicate that the proportion of AMI patients with congestive heart failure depends on alcohol
consumption at $\alpha = .05$.

10.45

a.

Some preliminary calculations are:

$E_1 = np_{1,0} = 400(.30) = 120$
$E_2 = np_{2,0} = 400(.20) = 80$
$E_3 = np_{3,0} = 400(.20) = 80$
$E_4 = np_{4,0} = 400(.10) = 40$
$E_5 = np_{5,0} = 400(.10) = 40$
$E_6 = np_{6,0} = 400(.10) = 40$

b.

The test statistic is
$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(100 - 120)^2}{120} + \frac{(75 - 80)^2}{80} + \frac{(85 - 80)^2}{80} + \frac{(50 - 40)^2}{40} + \frac{(40 - 40)^2}{40} + \frac{(50 - 40)^2}{40} = 8.958$$

c.

To determine if the true percentages of the colors produced differ from the manufacturer's stated
percentages, we test:
H0: $p_1 = .30$, $p_2 = .20$, $p_3 = .20$, $p_4 = .10$, $p_5 = .10$, and $p_6 = .10$
Ha: At least one of the probabilities differs from the hypothesized value

The test statistic is $\chi^2 = 8.958$.

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 6 - 1 = 5$.
From Table IV, Appendix D, $\chi^2_{.05} = 11.0705$. The rejection region is $\chi^2 > 11.0705$.

Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = 8.958 \not> 11.0705$), H0 is not rejected. There is insufficient evidence to indicate the true
percentages of the colors produced differ from the manufacturer's stated percentages at $\alpha = .05$.
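
The one-way (goodness-of-fit) calculation in parts a and b can be sketched in a few lines. The function name `chi_square_gof` is illustrative; the observed counts and hypothesized probabilities are the ones used in the solution above.

```python
def chi_square_gof(observed, null_probs):
    """Return (statistic, df, expected counts) for a one-way chi-square test."""
    n = sum(observed)
    # Expected count in each cell under the null hypothesis: E_i = n * p_i0
    expected = [n * p for p in null_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1, expected

observed_colors = [100, 75, 85, 50, 40, 50]
stated_probs = [0.30, 0.20, 0.20, 0.10, 0.10, 0.10]
stat, df, expected = chi_square_gof(observed_colors, stated_probs)
print(round(stat, 3), df, [round(e, 1) for e in expected])
```

Every expected count is at least 5 here, so the large-sample requirement for the chi-square approximation is met.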
10.46

a.

Some preliminary calculations are:

$E_1 = np_{1,0} = 1000(.50) = 500$
$E_2 = np_{2,0} = 1000(.22) = 220$
$E_3 = np_{3,0} = 1000(.11) = 110$
$E_4 = np_{4,0} = 1000(.17) = 170$

To determine if the percentages disagree with the percentages reported by Nielson/NetRatings,
we test:
H0: $p_1 = .50$, $p_2 = .22$, $p_3 = .11$, and $p_4 = .17$
Ha: At least one of the probabilities differs from its hypothesized value

The test statistic is
$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(487 - 500)^2}{500} + \frac{(245 - 220)^2}{220} + \frac{(121 - 110)^2}{110} + \frac{(147 - 170)^2}{170} = 7.391$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = k - 1 = 4 - 1 = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is
$\chi^2 > 7.81473$.
Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = 7.391 \not> 7.81473$), H0 is not rejected. There is insufficient evidence to indicate the percentages
disagree with the percentages reported by Nielson/NetRatings at $\alpha = .05$.
b.

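Some preliminary calculations are:

$$\hat{p}_1 = \frac{x_1}{n} = \frac{487}{1000} = .487$$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D,
$z_{.025} = 1.96$. The 95% confidence interval is:
$$\hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n}} \Rightarrow .487 \pm 1.96\sqrt{\frac{.487(.513)}{1000}} \Rightarrow .487 \pm .031 \Rightarrow (.456, .518)$$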

We are 95% confident that the percentage of all Internet searches that use the Google Search Engine
is between 45.6% and 51.8%.
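
As a rough check on part b, the large-sample interval for a single proportion can be computed as follows. The helper name `proportion_ci` is hypothetical, and the z value is the tabled 1.96 rather than a computed quantile.

```python
import math

def proportion_ci(successes, n, z):
    """Large-sample confidence interval for one binomial proportion."""
    p_hat = successes / n
    # Margin of error: z times the estimated standard error of p_hat
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# 487 of the 1,000 sampled searches fell in the first category
lower, upper = proportion_ci(487, 1000, 1.96)
print(round(lower, 3), round(upper, 3))
```

The hypothesized value .50 lies inside this interval, which is consistent with the first category contributing little to the test statistic in part a.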
10.47

a.

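Some preliminary calculations are:

$\hat{E}_{11} = \frac{R_1 C_1}{n} = \frac{53(35)}{70} = 26.5$
$\hat{E}_{12} = \frac{R_1 C_2}{n} = \frac{53(35)}{70} = 26.5$
$\hat{E}_{21} = \frac{R_2 C_1}{n} = \frac{17(35)}{70} = 8.5$
$\hat{E}_{22} = \frac{R_2 C_2}{n} = \frac{17(35)}{70} = 8.5$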

To determine if the severity of the ethical issue influenced whether the issue was identified or not by
the auditors, we test:
H0: Severity of ethical issue and identification are independent
Ha: Severity of ethical issue and identification are dependent
The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(27 - 26.5)^2}{26.5} + \frac{(26 - 26.5)^2}{26.5} + \frac{(8 - 8.5)^2}{8.5} + \frac{(9 - 8.5)^2}{8.5} = .078$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection
region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = .078 \not> 3.84146$), H0 is not rejected. There is insufficient evidence to indicate that the severity
of the ethical issue influenced whether the issue was identified or not by the auditors at $\alpha = .05$.

b.  No. If there were 0 in the bottom cell of the column, then the expected count for that cell would be less
than 5. One of the assumptions necessary for the test statistic to have a $\chi^2$ distribution would not hold.

c.  Suppose we change the numbers in the table to be as follows:

                                  Severity of Ethical Issue
                                  Moderate      Severe
Ethical Issue Identified             32           21
Ethical Issue Not Identified          3           14

Since the row and column totals are the same, the expected cell counts are the same as above.
The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(32 - 26.5)^2}{26.5} + \frac{(21 - 26.5)^2}{26.5} + \frac{(3 - 8.5)^2}{8.5} + \frac{(14 - 8.5)^2}{8.5} = 9.401$$

Now the test statistic would fall in the rejection region.
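A short sketch makes the comparison between the two tables concrete. The function names are illustrative rather than from the text, and the row layout (identified, not identified) is assumed from the table in part c; the statistic itself does not depend on that labeling.

```python
def expected_counts(table):
    """Expected cell counts E_ij = R_i * C_j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_counts(table)
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

original = [[27, 26], [8, 9]]    # counts from part a
modified = [[32, 21], [3, 14]]   # counts from part c, same row and column totals
critical = 3.84146               # table value for alpha = .05, df = 1

stat_a, df_a = chi_square_independence(original)
stat_c, df_c = chi_square_independence(modified)
print(round(stat_a, 3), stat_a > critical)
print(round(stat_c, 3), stat_c > critical)
```

Because both tables share the same margins, `expected_counts` returns identical values for them, which is why only the observed counts drive the change in the statistic.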


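10.48  Some preliminary calculations are:

$E_{11} = \frac{R_1 C_1}{n} = \frac{95(118)}{262} = 42.79$          $E_{12} = \frac{R_1 C_2}{n} = \frac{95(144)}{262} = 52.21$

$E_{21} = \frac{R_2 C_1}{n} = \frac{69(118)}{262} = 31.08$          $E_{22} = \frac{R_2 C_2}{n} = \frac{69(144)}{262} = 37.92$

$E_{31} = \frac{R_3 C_1}{n} = \frac{42(118)}{262} = 18.92$          $E_{32} = \frac{R_3 C_2}{n} = \frac{42(144)}{262} = 23.08$

$E_{41} = \frac{R_4 C_1}{n} = \frac{56(118)}{262} = 25.22$          $E_{42} = \frac{R_4 C_2}{n} = \frac{56(144)}{262} = 30.78$
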

To determine whether a pig farmer's education level has an impact on the size of the pig farm, we test:
H0: Pig farmer's education level and size of pig farm are independent
Ha: Pig farmer's education level and size of pig farm are dependent
The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(42 - 42.79)^2}{42.79} + \frac{(53 - 52.21)^2}{52.21} + \frac{(27 - 31.08)^2}{31.08} + \frac{(42 - 37.92)^2}{37.92} + \frac{(22 - 18.92)^2}{18.92}$$

$$\qquad + \frac{(20 - 23.08)^2}{23.08} + \frac{(27 - 25.22)^2}{25.22} + \frac{(29 - 30.78)^2}{30.78} = 2.14$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3$. From Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is
$\chi^2 > 7.81473$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 2.14 \not> 7.81473$), H0
is not rejected. There is insufficient evidence to indicate that a pig farmer's education level has an impact
on the size of the pig farm at $\alpha = .05$.
To compute the bar graph, we first convert frequencies to percentages by dividing the numbers in each row
by the row total and multiplying by 100%. Also, divide the column totals by the overall total and multiply
by 100%.
                              Education Level
Farm Size            No college                  College                     Total
<1,000 pigs          42/95 × 100% = 44.2%        53/95 × 100% = 55.8%          95
1,000-2,000 pigs     27/69 × 100% = 39.1%        42/69 × 100% = 60.9%          69
2,000-5,000 pigs     22/42 × 100% = 52.4%        20/42 × 100% = 47.6%          42
>5,000 pigs          27/56 × 100% = 48.2%        29/56 × 100% = 51.8%          56
Total                118/262 × 100% = 45.0%      144/262 × 100% = 55.0%       262
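The row percentages in the table can be recomputed as follows. This is a small sketch; the variable and function names are illustrative only.

```python
farm_sizes = ["<1,000", "1,000-2,000", "2,000-5,000", ">5,000"]
counts = [[42, 53], [27, 42], [22, 20], [27, 29]]   # [no college, college] per farm size

def row_percents(table):
    """Convert each row of counts to percentages of that row's total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

row_pcts = row_percents(counts)
for label, pcts in zip(farm_sizes, row_pcts):
    print(label, [round(p, 1) for p in pcts])

# Overall percentages: column totals divided by the grand total.
col_totals = [sum(col) for col in zip(*counts)]
overall = [100 * c / sum(col_totals) for c in col_totals]
print("Total", [round(p, 1) for p in overall])
```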

Using MINITAB, the bar graph is:

[Bar graph: Percent (vertical axis, 0 to 50) by Farm Size (<1,000; 1,000-2,000; 2,000-5,000; >5,000)]

Since the bars are all similar in height, the graph supports the conclusion of the test above.
10.49  Some preliminary calculations are: $E_1 = E_2 = E_3 = E_4 = np_{1,0} = 83(.25) = 20.75$

To determine if there are differences in the percentages of incidents in the four cause categories, we test:
$H_0: p_1 = p_2 = p_3 = p_4 = .25$
$H_a$: At least one of the probabilities differs from its hypothesized value

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(27 - 20.75)^2}{20.75} + \frac{(24 - 20.75)^2}{20.75} + \frac{(22 - 20.75)^2}{20.75} + \frac{(10 - 20.75)^2}{20.75} = 8.036$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 4 - 1 = 3$. From
Table IV, Appendix D, $\chi^2_{.05} = 7.81473$. The rejection region is $\chi^2 > 7.81473$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 8.036 > 7.81473$), H0 is
rejected. There is sufficient evidence to indicate there are differences in the percentages of incidents in the
four cause categories at $\alpha = .05$.
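Breaking the statistic into its four terms shows which category is responsible for the rejection. The sketch below is illustrative and not part of the original solution; the categories are indexed 0 to 3 in the order the counts appear in the test statistic.

```python
observed = [27, 24, 22, 10]            # incident counts, in the order used above
expected = sum(observed) * 0.25        # 83(.25) = 20.75 for every category

# Contribution of each category to the chi-square statistic.
contributions = [(o - expected) ** 2 / expected for o in observed]
stat = sum(contributions)
largest = max(range(len(observed)), key=lambda i: contributions[i])

for i, c in enumerate(contributions):
    print(f"category {i}: observed {observed[i]}, contribution {c:.3f}")
print(f"chi-square = {stat:.3f}; largest contribution from category {largest}")
```

The last category, with only 10 incidents against an expected 20.75, accounts for most of the statistic.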
10.50  a.  The two qualitative variables are years (1990, 1991, . . . , 2000) and acquisition status (yes or no).

b.  To determine if year and acquisition status are dependent, we test:

H0: Year and acquisition status are independent
Ha: Year and acquisition status are dependent

c.  From the printout, the test statistic is $\chi^2 = 297.048$ and the p-value is $p = 0.000$. Since the p-value is
less than $\alpha$ ($p = 0.000 < .05$), H0 is rejected. There is sufficient evidence to indicate that year and
acquisition status are dependent at $\alpha = .05$.

10.51  a.  The contingency table is:

                      Committee
Inspector       Acceptable    Rejected    Totals
Acceptable         101           23         124
Rejected            10           19          29
Totals             111           42         153

b.  Yes. To plot the percentages, first convert frequencies to percentages by dividing the numbers in
each column by the column total and multiplying by 100. Also, divide the row totals by the overall
total and multiply by 100.

                      Committee
Inspector       Acceptable                  Rejected                  Totals
Acceptable      101/111 × 100 = 90.99%      23/42 × 100 = 54.76%      124/153 × 100 = 81.05%
Rejected        10/111 × 100 = 9.01%        19/42 × 100 = 45.24%      29/153 × 100 = 18.95%


Using MINITAB, the graph of the data is:

[Bar graph: Percent (vertical axis, 0 to 90, with 81.1 marked) by Committee classification (Acceptable, Rejected)]

Since the heights of the bars are not similar, it appears there is a relationship.
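c.  Some preliminary calculations are:

$E_{11} = \frac{R_1 C_1}{n} = \frac{124(111)}{153} = 89.961$          $E_{12} = \frac{R_1 C_2}{n} = \frac{124(42)}{153} = 34.039$

$E_{21} = \frac{R_2 C_1}{n} = \frac{29(111)}{153} = 21.039$          $E_{22} = \frac{R_2 C_2}{n} = \frac{29(42)}{153} = 7.961$
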

To determine if the inspector's classifications and the committee's classifications are related, we test:
H0: The inspector's and committee's classifications are independent
Ha: The inspector's and committee's classifications are dependent
The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(101 - 89.961)^2}{89.961} + \frac{(23 - 34.039)^2}{34.039} + \frac{(10 - 21.039)^2}{21.039} + \frac{(19 - 7.961)^2}{7.961} = 26.034$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$. The rejection
region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 26.034 > 3.84146$), H0
is rejected. There is sufficient evidence to indicate the inspector's and committee's classifications are
related at $\alpha = .05$. This indicates that the inspector and committee tend to make the same decisions.
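The chi-square test alone does not say in which direction the classifications are related. A quick look at observed minus expected counts supports the closing remark: the two agreement cells (both acceptable, both rejected) are larger than independence would predict. The sketch below is an illustration added here, not part of the original solution, and the agreement proportions it prints are not quantities used in the text.

```python
def expected_counts(table):
    """Expected cell counts E_ij = R_i * C_j / n under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

table = [[101, 23],   # inspector acceptable: committee acceptable, rejected
         [10, 19]]    # inspector rejected:   committee acceptable, rejected
n = sum(map(sum, table))
expected = expected_counts(table)

# Observed minus expected; positive values on the diagonal mean more agreement than chance.
residuals = [[o - e for o, e in zip(orow, erow)] for orow, erow in zip(table, expected)]

observed_agreement = (table[0][0] + table[1][1]) / n
chance_agreement = (expected[0][0] + expected[1][1]) / n
print(f"agreement observed: {observed_agreement:.3f}, expected under independence: {chance_agreement:.3f}")
```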
10.52  a.  Some preliminary calculations are:

$E_1 = np_{1,0} = 85(.26) = 22.1$          $E_2 = np_{2,0} = 85(.30) = 25.5$          $E_3 = np_{3,0} = 85(.11) = 9.35$

$E_4 = np_{4,0} = 85(.14) = 11.9$          $E_5 = np_{5,0} = 85(.19) = 16.15$

To determine if probabilities differ from the hypothesized values, we test:

$H_0: p_1 = .26,\ p_2 = .30,\ p_3 = .11,\ p_4 = .14,$ and $p_5 = .19$
$H_a$: At least one of the probabilities differs from its hypothesized value

The test statistic is

$$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(32 - 22.1)^2}{22.1} + \frac{(26 - 25.5)^2}{25.5} + \frac{(15 - 9.35)^2}{9.35} + \frac{(6 - 11.9)^2}{11.9} + \frac{(6 - 16.15)^2}{16.15} = 17.16$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with $df = k - 1 = 5 - 1 = 4$.
From Table IV, Appendix D, $\chi^2_{.05} = 9.48773$. The rejection region is $\chi^2 > 9.48773$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 17.16 > 9.48773$),
reject H0. There is sufficient evidence to indicate the probabilities differ from their hypothesized
values at $\alpha = .05$.
b.  $$\hat{p}_1 = \frac{n_1}{n} = \frac{32}{85} = .376$$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D,
$z_{.025} = 1.96$. The 95% confidence interval is:

$$\hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n}} \Rightarrow .376 \pm 1.96\sqrt{\frac{.376(1 - .376)}{85}} \Rightarrow .376 \pm .103 \Rightarrow (.273, .479)$$

c.  We are 95% confident that between 27.3% and 47.9% of the Avonex MS patients are exacerbation-free
during a two-year period. Since this interval is completely above the percentage of placebo patients
(26%), it seems that the Avonex patients are more likely to have no exacerbations than placebo
patients.

10.53  a.  The contingency table is:

                  Flight Response
Altitude        Low     High    Totals
< 300            85      105      190
300-600          77      121      198
≥ 600            17       59       76
Totals          179      285      464

b.  Some preliminary calculations are:


$E_{11} = \frac{R_1 C_1}{n} = \frac{190(179)}{464} = 73.297$          $E_{12} = \frac{R_1 C_2}{n} = \frac{190(285)}{464} = 116.703$

$E_{21} = \frac{R_2 C_1}{n} = \frac{198(179)}{464} = 76.384$          $E_{22} = \frac{R_2 C_2}{n} = \frac{198(285)}{464} = 121.616$

$E_{31} = \frac{R_3 C_1}{n} = \frac{76(179)}{464} = 29.319$          $E_{32} = \frac{R_3 C_2}{n} = \frac{76(285)}{464} = 46.681$

To determine if flight response of the geese depends on the altitude of the helicopter, we test:
H0: Flight response and Altitude of helicopter are independent
Ha: Flight response and Altitude of helicopter are dependent
The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(85 - 73.297)^2}{73.297} + \frac{(105 - 116.703)^2}{116.703} + \frac{(77 - 76.384)^2}{76.384}$$

$$\qquad + \frac{(121 - 121.616)^2}{121.616} + \frac{(17 - 29.319)^2}{29.319} + \frac{(59 - 46.681)^2}{46.681} = 11.477$$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.01} = 9.21034$. The rejection
region is $\chi^2 > 9.21034$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 11.477 > 9.21034$), H0
is rejected. There is sufficient evidence to indicate that the flight response of the geese depends on
the altitude of the helicopter at $\alpha = .01$.
c.  The contingency table is:

                     Flight Response
Lateral Distance    Low     High    Totals
< 1000               37      243      280
1000-2000            68       37      105
2000-3000            44        4       48
≥ 3000               30        1       31
Totals              179      285      464

d.  Some preliminary calculations are:


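$E_{11} = \frac{R_1 C_1}{n} = \frac{280(179)}{464} = 108.017$          $E_{12} = \frac{R_1 C_2}{n} = \frac{280(285)}{464} = 171.983$

$E_{21} = \frac{R_2 C_1}{n} = \frac{105(179)}{464} = 40.506$          $E_{22} = \frac{R_2 C_2}{n} = \frac{105(285)}{464} = 64.494$

$E_{31} = \frac{R_3 C_1}{n} = \frac{48(179)}{464} = 18.517$          $E_{32} = \frac{R_3 C_2}{n} = \frac{48(285)}{464} = 29.483$

$E_{41} = \frac{R_4 C_1}{n} = \frac{31(179)}{464} = 11.959$          $E_{42} = \frac{R_4 C_2}{n} = \frac{31(285)}{464} = 19.041$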

To determine if flight response of the geese depends on the lateral distance of the helicopter, we test:
H0: Flight response and Lateral distance of the helicopter are independent
Ha: Flight response and Lateral distance of the helicopter are dependent
The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(37 - 108.017)^2}{108.017} + \frac{(243 - 171.983)^2}{171.983} + \frac{(68 - 40.506)^2}{40.506} + \frac{(37 - 64.494)^2}{64.494}$$

$$\qquad + \frac{(44 - 18.517)^2}{18.517} + \frac{(4 - 29.483)^2}{29.483} + \frac{(30 - 11.959)^2}{11.959} + \frac{(1 - 19.041)^2}{19.041} = 207.80$$

The rejection region requires $\alpha = .01$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (4 - 1)(2 - 1) = 3$. From Table IV, Appendix D, $\chi^2_{.01} = 11.3449$. The rejection
region is $\chi^2 > 11.3449$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 207.80 > 11.3449$), H0
is rejected. There is sufficient evidence to indicate that the flight response of the geese depends on
the lateral distance of the helicopter at $\alpha = .01$.
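Hand calculations with rounded expected counts are easy to get slightly wrong, so it can help to cross-check with a form that uses only the observed counts and the margins. The identity $\chi^2 = n\left(\sum n_{ij}^2/(R_i C_j) - 1\right)$ is algebraically equivalent to the usual sum; it is offered here as a checking device and is not the method used in the text. The function name is illustrative.

```python
def chi_square_shortcut(table):
    """Chi-square for an r x c table via n * (sum of n_ij^2 / (R_i * C_j) - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = sum(cell ** 2 / (r * c)
                for row, r in zip(table, row_totals)
                for cell, c in zip(row, col_totals))
    return n * (total - 1)

altitude = [[85, 105], [77, 121], [17, 59]]            # rows: <300, 300-600, 600+
lateral = [[37, 243], [68, 37], [44, 4], [30, 1]]      # rows: <1000, ..., 3000+

chi_altitude = chi_square_shortcut(altitude)
chi_lateral = chi_square_shortcut(lateral)
print(round(chi_altitude, 3), round(chi_lateral, 2))
```

Both values exceed their respective $\alpha = .01$ critical values (9.21034 and 11.3449), so the conclusions of parts b and d are unchanged.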
e.  Using SAS, the contingency table for altitude by response with the column percents is:

Table of ALTGRP by RESPONSE

ALTGRP      RESPONSE

Frequency|
Percent  |
Row Pct  |
Col Pct  |LOW     |HIGH    |  Total
---------+--------+--------+
<300     |     85 |    105 |    190
         |  18.32 |  22.63 |  40.95
         |  44.74 |  55.26 |
         |  47.49 |  36.84 |
---------+--------+--------+
300-600  |     77 |    121 |    198
         |  16.59 |  26.08 |  42.67
         |  38.89 |  61.11 |
         |  43.02 |  42.46 |
---------+--------+--------+
600+     |     17 |     59 |     76
         |   3.66 |  12.72 |  16.38
         |  22.37 |  77.63 |
         |   9.50 |  20.70 |
---------+--------+--------+
Total         179      285      464
            38.58    61.42   100.00

Statistics for Table of ALTGRP by RESPONSE

Statistic                      DF       Value      Prob
-------------------------------------------------------
Chi-Square                      2     11.4770    0.0032
Likelihood Ratio Chi-Square     2     12.1040    0.0024
Mantel-Haenszel Chi-Square      1     10.2104    0.0014
Phi Coefficient                        0.1573
Contingency Coefficient                0.1554
Cramer's V                             0.1573

Sample Size = 464

From the row percents, it appears that the lower the plane, the lower the response. For altitude
<300m, 55.26% of the geese had a high response. For altitude 300-600m, 61.11% of the geese had a
high response. For altitude 600+m, 77.63% of the geese had a high response. Thus, instead of
setting a minimum altitude for the planes, we need to set a maximum altitude. For this data, the
lowest response is at an altitude of < 300 meters.
Using SAS, the contingency table for lateral distance by response with the column percents is:

The FREQ Procedure

Table of LATGRP by RESPONSE

LATGRP       RESPONSE

Frequency |
Percent   |
Row Pct   |
Col Pct   |LOW     |HIGH    |  Total
----------+--------+--------+
<1000     |     37 |    243 |    280
          |   7.97 |  52.37 |  60.34
          |  13.21 |  86.79 |
          |  20.67 |  85.26 |
----------+--------+--------+
1000-2000 |     68 |     37 |    105
          |  14.66 |   7.97 |  22.63
          |  64.76 |  35.24 |
          |  37.99 |  12.98 |
----------+--------+--------+
2000-3000 |     44 |      4 |     48
          |   9.48 |   0.86 |  10.34
          |  91.67 |   8.33 |
          |  24.58 |   1.40 |
----------+--------+--------+
3000+     |     30 |      1 |     31
          |   6.47 |   0.22 |   6.68
          |  96.77 |   3.23 |
          |  16.76 |   0.35 |
----------+--------+--------+
Total          179      285      464
             38.58    61.42   100.00

Statistics for Table of LATGRP by RESPONSE

Statistic                      DF       Value      Prob
-------------------------------------------------------
Chi-Square                      3    207.8012    <.0001
Likelihood Ratio Chi-Square     3    227.5212    <.0001
Mantel-Haenszel Chi-Square      1    189.2843    <.0001
Phi Coefficient                        0.6692
Contingency Coefficient                0.5562
Cramer's V                             0.6692

Sample Size = 464

From the row percents, it appears that the greater the lateral distance, the lower the response. For a
lateral distance of 3000+m only 3.23% of the geese had a high response. Thus, the further away the
plane is laterally, the lower the response. For this data, the lowest response is when the plane is
further than 3000 meters.
Thus, the recommendation would be a maximum height of 300 m and a minimum lateral distance of
3000 m.

10.54  a.  Some preliminary calculations are:

The contingency table is:

Shift     Defectives    Non-Defectives    Total
1             25             175           200
2             35             165           200
3             80             120           200
Total        140             460           600

$E_{11} = \frac{R_1 C_1}{n} = \frac{200(140)}{600} = 46.667$

$E_{21} = E_{31} = \frac{200(140)}{600} = 46.667$

$E_{12} = E_{22} = E_{32} = \frac{200(460)}{600} = 153.333$

To determine if quality of the filters is related to shift, we test:

H0: Quality of filters and shift are independent
Ha: Quality of filters and shift are dependent

The test statistic is

$$\chi^2 = \sum \frac{[n_{ij} - E_{ij}]^2}{E_{ij}} = \frac{(25 - 46.667)^2}{46.667} + \frac{(35 - 46.667)^2}{46.667} + \frac{(80 - 46.667)^2}{46.667} + \frac{(175 - 153.333)^2}{153.333}$$

$$\qquad + \frac{(165 - 153.333)^2}{153.333} + \frac{(120 - 153.333)^2}{153.333} = 47.98$$

The rejection region requires $\alpha = .05$ in the upper tail of the $\chi^2$ distribution with
$df = (r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$. From Table IV, Appendix D, $\chi^2_{.05} = 5.99147$. The rejection
region is $\chi^2 > 5.99147$.

Since the observed value of the test statistic falls in the rejection region ($\chi^2 = 47.98 > 5.99147$), H0
is rejected. There is sufficient evidence to indicate quality of filters and shift are related at $\alpha = .05$.
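b.  $$\hat{p}_1 = \frac{25}{200} = .125$$

For confidence coefficient .95, $\alpha = .05$ and $\alpha/2 = .05/2 = .025$. From Table II, Appendix D,
$z_{.025} = 1.96$. The 95% confidence interval is:

$$\hat{p}_1 \pm z_{.025}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n}} \Rightarrow .125 \pm 1.96\sqrt{\frac{.125(.875)}{200}} \Rightarrow .125 \pm .046 \Rightarrow (.079, .171)$$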

10.55  a.  $$\chi^2 = \sum \frac{[n_i - E_i]^2}{E_i} = \frac{(26 - 23)^2}{23} + \frac{(146 - 136)^2}{136} + \frac{(361 - 341)^2}{341} + \frac{(311 - 341)^2}{341} + \frac{(143 - 136)^2}{136} + \frac{(13 - 23)^2}{23} = 9.647$$

b.  From Table IV, Appendix D, with $df = 5$, $\chi^2_{.05} = 11.0705$.

c.  No. Since the observed value of the test statistic does not fall in the rejection region
($\chi^2 = 9.647 \not> 11.0705$), H0 is not rejected. There is insufficient evidence to indicate the salary
distribution is non-normal for $\alpha = .05$.

d.  The p-value is $p = P(\chi^2 \ge 9.647)$. Using MINITAB,

Cumulative Distribution Function

Chi-Square with 5 DF

    x    P( X <= x )
9.647       0.914122

The p-value is $p = P(\chi^2 \ge 9.647) = 1 - .914122 = .085878$.
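Where MINITAB is not available, the same cumulative probability can be approximated in plain Python. The sketch below uses the standard series expansion of the regularized lower incomplete gamma function, with $a = df/2$ and $x/2$ as its arguments; this is a swapped-in numerical technique for illustration, not the method MINITAB is known to use, and the function name and the number of series terms are assumptions.

```python
import math

def chi2_cdf(x, df, terms=500):
    """P(X <= x) for a chi-square variable, via the incomplete-gamma series."""
    if x <= 0:
        return 0.0
    a, h = df / 2.0, x / 2.0
    term = 1.0 / a                 # first term of the series: 1 / a
    total = term
    for k in range(1, terms):
        term *= h / (a + k)        # next term: previous * h / (a + k)
        total += term
    # Multiply by h^a * e^(-h) / Gamma(a), computed on the log scale.
    return total * math.exp(a * math.log(h) - h - math.lgamma(a))

cdf = chi2_cdf(9.647, 5)
p_value = 1.0 - cdf
print(f"P(X <= 9.647) = {cdf:.6f}, p-value = {p_value:.6f}")
```

The series converges quickly for moderate values of the statistic; for very large statistics a continued-fraction form of the upper tail would be the more stable choice.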


10.56  Using MINITAB, the results are:

Tabulated statistics: Defect, PredEVG

Using frequencies in Fr

Rows: Defect   Columns: PredEVG

          1      2    All

1       441      8    449
2        47      2     49
All     488     10    498

Cell Contents:      Count

Pearson Chi-Square = 1.188, DF = 1
Likelihood Ratio Chi-Square = 0.948, DF = 1

To determine if Defect and Pred_EVG are dependent, we test:

H0: Defect and Pred_EVG are independent
Ha: Defect and Pred_EVG are dependent

The test statistic is $\chi^2 = 1.188$.

Since no $\alpha$ level was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail of
the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1$. From Table IV, Appendix D, $\chi^2_{.05} = 3.84146$.
The rejection region is $\chi^2 > 3.84146$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 1.188 \not> 3.84146$), H0
is not rejected. There is insufficient evidence to indicate that Defect and Pred_EVG are dependent at
$\alpha = .05$. If Defect and Pred_EVG are independent, then Pred_EVG is no better at predicting defects than
just guessing. I would not recommend the essential complexity algorithm be used as a predictor of
defective software modules.
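For a 2 x 2 table the Pearson statistic has a well-known closed form, $\chi^2 = n(ad - bc)^2 / (R_1 R_2 C_1 C_2)$, which gives a quick check on the printout. The sketch below is illustrative and not part of the original solution; the assignment of the four counts to cells is assumed, but the statistic is the same if the table is transposed.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

stat = chi_square_2x2(441, 8, 47, 2)   # the four cell counts from the printout
critical = 3.84146                     # table value for alpha = .05, df = 1
print(f"chi-square = {stat:.3f}, reject H0: {stat > critical}")
```

Note that one cell count is only 2, so the expected count in that cell is small; the large-sample assumption behind the chi-square approximation deserves a check before relying on the result.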
10.57  Using SAS, the output is:

The FREQ Procedure

Table of CANDIDATE by TIME

CANDIDATE     TIME

Frequency|
Col Pct  |       1|       2|       3|       4|       5|       6|  Total
---------+--------+--------+--------+--------+--------+--------+
SMITH    |    208 |    208 |    451 |    392 |    351 |    410 |   2020
         |  52.53 |  55.32 |  55.34 |  55.92 |  56.16 |  55.33 |
---------+--------+--------+--------+--------+--------+--------+
COPPIN   |     55 |     51 |    109 |     98 |     88 |    104 |    505
         |  13.89 |  13.56 |  13.37 |  13.98 |  14.08 |  14.04 |
---------+--------+--------+--------+--------+--------+--------+
MONTES   |    133 |    117 |    255 |    211 |    186 |    227 |   1129
         |  33.59 |  31.12 |  31.29 |  30.10 |  29.76 |  30.63 |
---------+--------+--------+--------+--------+--------+--------+
Total         396      376      815      701      625      741     3654

Statistics for Table of CANDIDATE by TIME

Statistic                      DF       Value      Prob
-------------------------------------------------------
Chi-Square                     10      2.2839    0.9937
Likelihood Ratio Chi-Square    10      2.2722    0.9938
Mantel-Haenszel Chi-Square      1      0.9851    0.3209
Phi Coefficient                        0.0250
Contingency Coefficient                0.0250
Cramer's V                             0.0177

Sample Size = 3654

To determine if candidates received votes independent of time period, we test:

H0: Voting and Time period are independent
Ha: Voting and Time period are dependent
The test statistic is $\chi^2 = 2.2839$.
Since no value of $\alpha$ was given, we will use $\alpha = .05$. The rejection region requires $\alpha = .05$ in the upper tail
of the $\chi^2$ distribution with $df = (r - 1)(c - 1) = (3 - 1)(6 - 1) = 10$. From Table IV, Appendix D,
$\chi^2_{.05} = 18.3070$. The rejection region is $\chi^2 > 18.3070$.

Since the observed value of the test statistic does not fall in the rejection region ($\chi^2 = 2.2839 \not> 18.3070$),
H0 is not rejected. There is insufficient evidence to indicate Voting and Time period are dependent at
$\alpha = .05$. Thus, we can conclude that voting and time period are independent. This means that regardless
of time period, the percentage of votes received by each candidate is the same. In the table created by SAS,
the bottom number in each cell is the column percent. This is the percent of votes received by the
candidate in each time period. An inspection of these percents indicates that candidate Smith received
approximately 55.3% of the votes each time period, candidate Coppin received approximately 13.8% of the
vote, and candidate Montes received approximately 30.9% of the vote. All of this indicates that the
election was rigged.
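The column percents in the SAS table, and how little they vary across time periods, can be recomputed directly from the counts. This is a small illustrative sketch; the variable names and the use of the range (largest minus smallest percent) as a summary are assumptions, not part of the original solution.

```python
votes = {
    "SMITH":  [208, 208, 451, 392, 351, 410],
    "COPPIN": [55, 51, 109, 98, 88, 104],
    "MONTES": [133, 117, 255, 211, 186, 227],
}

# Total votes cast in each of the six time periods.
period_totals = [sum(col) for col in zip(*votes.values())]

# Column percent: share of each period's votes going to each candidate.
column_pct = {name: [100 * v / t for v, t in zip(counts, period_totals)]
              for name, counts in votes.items()}

# Range of each candidate's share across the six periods.
spread = {name: max(p) - min(p) for name, p in column_pct.items()}

for name, pcts in column_pct.items():
    print(name, [round(p, 2) for p in pcts], f"range = {spread[name]:.2f}")
```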
