CHAPTER 8
THE COMPARISON OF TWO POPULATIONS
8-2.  n = 40   D̄ = 5   s_D = 2.3
H0: μ_D = 0    H1: μ_D ≠ 0
t(39) = (5 − 0)/(2.3/√40) = 13.75
Strongly reject H0. 95% C.I. for μ_D: 5 ± 2.023(2.3/√40) = [4.26, 5.74].
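The paired-difference arithmetic above can be checked with a few lines of Python (the helper name `paired_t` is ours, not from the text; 2.023 is the t table value for t_{.025,39}):

```python
import math

# Paired-difference t test for Problem 8-2 (values from the text);
# the helper name paired_t is ours, not from the manual.
def paired_t(d_bar, s_d, n):
    se = s_d / math.sqrt(n)          # standard error of the mean difference
    return d_bar / se, se

t, se = paired_t(5, 2.3, 40)
ci = (5 - 2.023 * se, 5 + 2.023 * se)   # t_{.025,39} = 2.023 from the t table
print(round(t, 2), [round(x, 2) for x in ci])   # 13.75 [4.26, 5.74]
```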
Chapter 08 - The Comparison of Two Populations
At α = 0.05, we reject H0. There are more viewers for movies than for commercials.
8-4.  n = 60   D̄ = 0.2   s_D = 1
H0: μ_D ≤ 0    H1: μ_D > 0
t(59) = (0.2 − 0)/(1/√60) = 1.549. At α = 0.05, we cannot reject H0.
Reject H0. There is strong evidence that hotels in Spain are cheaper than those in France,
based on this small sample. p-value = 0.0139
t(19) = (1.25 − 0)/(42.89/√20) = 0.13
Do not reject H0; no evidence of a difference.
8-10.  n1 = n2 = 30
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
Nikon (1): x̄1 = 8.5, s1 = 2.1    Minolta (2): x̄2 = 7.8, s2 = 1.8
z = (8.5 − 7.8)/√(2.1²/30 + 1.8²/30) = 1.386
Do not reject H0. There is no evidence of a difference in the average ratings of the two cameras.
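A quick sketch of the large-sample z statistic used here, with the Problem 8-10 values; the function name is our own:

```python
import math

# Two-sample z statistic for Problem 8-10 (large independent samples,
# values from the text); the helper name two_sample_z is ours.
def two_sample_z(x1, s1, n1, x2, s2, n2):
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    return (x1 - x2) / se

z = two_sample_z(8.5, 2.1, 30, 7.8, 1.8, 30)
print(round(z, 3))   # 1.386, below 1.96, so do not reject H0
```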
Evidence:     Sample 1: n = 32, x̄ = 2.5, s = 0.41;  Sample 2: n = 35, x̄ = 4.32, s = 0.87
Assumptions:  Populations normal. H0: population variances equal (F ratio = 4.50268, p-value = 0.0001)
Reject H0. There is evidence that the average Bel Air price is lower.
Reject the null hypothesis. Global equities outperform the U.S. market.
Evidence:  Sample 1: n = 128, x̄ = 23.5, population σ = 12.2
           Sample 2: n = 212, x̄ = 18, population σ = 10.5
Hypothesis Testing
S_p² = [(13 − 1)(7.622)² + (13 − 1)(4.292)²] / (13 + 13 − 2) = 38.2581
t(24) = (20.385 − 10.385) / √( 38.2581(1/13 + 1/13) ) = 4.1219
df = 24. Use a critical value of 2.064 for a two-tailed test. Reject H0. The two methods do differ.
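As a check on the pooled-variance arithmetic, a short Python sketch; the sample standard deviations 7.622 and 4.292 are read off the reconstructed computation above, and `pooled_t` is our name:

```python
import math

# Pooled-variance (equal-variance) two-sample t statistic, using the sample
# values shown above; the helper name pooled_t is ours.
def pooled_t(x1, s1, n1, x2, s2, n2):
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se, sp2

t, sp2 = pooled_t(20.385, 7.622, 13, 10.385, 4.292, 13)
print(round(sp2, 3), round(t, 3))   # 38.258 4.122
```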
e.
S_p² = [(10 − 1)(1002.5)² + (11 − 1)(876.05)²] / (10 + 11 − 2) = 879983.804
t(19) = (4238 − 3888.72) / √( 879983.804(1/10 + 1/11) ) = 0.8522
df = 19
Evidence:     Sample 1: n = 28, x̄ = 0.19, s = 5.72;  Sample 2: n = 28, x̄ = 0.72, s = 5.1
Assumptions:  Populations normal. H0: population variances equal (F ratio = 1.25792, p-value = 0.5552)
= 2.54 ± 1.96 √( (.64)²/255 + (.85)²/300 ) = [2.416, 2.664] percent.
Evidence:  Sample 1: n = 25, x̄ = 87, s = 12;  Sample 2: n = 20, x̄ = 64, s = 23
Reject the null hypothesis: the average cost of beer is lower in Prague. Londoners save
between $3.74 and $6.26.
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
t-Test for Difference in Population Means
Evidence:     US: n = 15, x̄ = 3.8, s = 2.2;  China: n = 18, x̄ = 6.1, s = 5.3
Assumptions:  Populations normal. H0: population variances equal (F ratio = 5.80372, p-value = 0.0018)
Do not reject the null hypothesis (p-value = 0.1073): there is no evidence that investment
returns differ between China and the US.
8-23. Take the proposed route as population 1 and the alternate route as population 2. Assume equal
variances for both populations.
H0: μ1 − μ2 ≤ 0
H1: μ1 − μ2 > 0
p-value from the template = 0.8674; cannot reject H0.
Evidence:     Sample 1: n = 20, x̄ = 3.56, s = 2.8;  Sample 2: n = 20, x̄ = 4.84, s = 3.2
Assumptions:  Populations normal. H0: population variances equal (F ratio = 1.30612, p-value = 0.5662)
Evidence:  Sample 1: n = 25, x̄ = 12, s = 2.5;  Sample 2: n = 25, x̄ = 13.5, s = 1
Evidence:     Sample 1: n = 8, x̄ = 3, s = 2;  Sample 2: n = 10, x̄ = 2.3, s = 2.1
Assumptions:  Populations normal. H0: population variances equal (F ratio = 1.1025, p-value = 0.9186)
Do not reject the null hypothesis (p-value = 0.2417). The new advertising firm has not resulted
in significantly higher sales.
(x̄2 − x̄1) ± 2.011 √( [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) ) √( 1/n1 + 1/n2 )
= (13.5 − 12) ± 2.011 √( [24(2.5)² + 24(1)²] / 48 ) √( 1/25 + 1/25 )
= [0.4170, 2.5830] percent.
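The same pooled-variance confidence interval can be verified numerically; the sample values and the t_{.025,48} = 2.011 critical value are taken from the text:

```python
import math

# 95% C.I. for mu2 - mu1 with a pooled variance estimate; sample values and
# the critical value t_{.025,48} = 2.011 come from the text.
n1 = n2 = 25
s1, s2 = 2.5, 1.0
diff = 13.5 - 12.0
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
hw = 2.011 * math.sqrt(sp2 * (1 / n1 + 1 / n2))   # half-width of the interval
lo, hi = diff - hw, diff + hw
print(round(lo, 3), round(hi, 3))   # 0.417 2.583
```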
Evidence:  Sample 1: n = 100, x = 85 successes, p̂ = 0.8500
           Sample 2: n = 100, x = 68 successes, p̂ = 0.6800
Hypothesis Testing: hypothesized difference = zero
8-31.  n1 = 31, x1 = 11    n2 = 50, x2 = 19
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) ) = −0.228
Do not reject H0. There is no evidence that one corporate raider is more successful than the other.
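The pooled two-proportion z statistic can be sketched as follows, using the Problem 8-31 counts; the function name is our own:

```python
import math

# Pooled two-proportion z statistic for Problem 8-31 (counts from the text);
# the helper name two_prop_z is ours.
def two_prop_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(11, 31, 19, 50)
print(round(z, 3))   # -0.228, far inside +/-1.96, so do not reject H0
```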
8-33. 95% C.I. for p2 − p1:  (p̂2 − p̂1) ± 1.96 √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )
= .06 ± 1.96 √( (.13)(.87)/2,060 + (.19)(.81)/5,000 ) = [0.0419, 0.0781]
We are 95% confident that the increase in the proportion of the population preferring California
wines is anywhere from 4.19% to 7.81%.
Confidence Interval
95%:  0.0600 ± 0.0181 = [0.0419, 0.0782]
8-34. The statement to be tested must be hypothesized before looking at the data:
Chase Man. (1): n1 = 650, x1 = 48
Manuf. Han. (2): n2 = 480, x2 = 20
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) ) = 2.248
Reject H0. p-value = 0.0122.
z = (.283 − .205) / √( (.234)(1 − .234)(1/120 + 1/200) ) = 1.601
At α = 0.05, there is no evidence to conclude that the proportion of American executives who
prefer the A380 is greater than that of European executives. (p-value = 0.0547.)
Do not reject the null hypothesis: the proportions are not significantly different.
8-40. Old method (1): n1 = 40, s1² = 1,288
New method (2): n2 = 15, s2² = 1,112
H0: σ1² ≤ σ2²    H1: σ1² > σ2²    use α = .05
F(39,14) = s1²/s2² = 1,288/1,112 = 1.158
The critical point at α = .05 is F(39,14) = 2.27 (using approximate df in the table). Do not reject
H0. There is no evidence that the variance of the new production method is smaller.
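The variance-ratio computation is a one-liner; the sample variances are from the text:

```python
# Variance-ratio F statistic for Problem 8-40 (sample variances from the text).
s1_sq, s2_sq = 1288, 1112   # old method (n1 = 40), new method (n2 = 15)
F = s1_sq / s2_sq           # F(39, 14) under H0: equal variances
print(round(F, 3))          # 1.158, below the 2.27 critical value
```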
Sample 1: n = 40, variance = 1288;  Sample 2: n = 15, variance = 1112
Null hypothesis              p-value (at an α of 5%)
H0: σ1² − σ2² = 0            0.7977
H0: σ1² − σ2² ≥ 0            0.6012
H0: σ1² − σ2² ≤ 0            0.3988
F = 1.1025
Assumptions: Populations normal. H0: population variances equal (F ratio = 1.1025, p-value = 0.9186)
F(24,24) = s1²/s2² = (2.5)²/(1)² = 6.25
Chapter 09 - Analysis of Variance
CHAPTER 9
ANALYSIS OF VARIANCE
9-1. H0: μ1 = μ2 = μ3 = μ4
H1: Not all four means are equal. The possible configurations are:
all 4 different;
2 equal, 2 different;
3 equal, 1 different;
2 equal, the other 2 equal but different from the first 2.
9-2. ANOVA assumptions: normal populations with equal variance. Independent random sampling
from the r populations.
9-3. Series of paired t-test are dependent on each other. There is no control over the probability of a
Type I error for the joint series of tests.
9-4. r = 5 n1 = n2 = . . . = n5 = 21 n =105
df’s of F are 4 and 100. Computed F = 3.6. The p-value is close to 0.01. Reject H0. There is
evidence that not all 5 plants have equal average output.
F Distribution, 1-tail critical values:
10%: 2.0019    5%: 2.4626    1%: 3.5127    0.50%: 3.9634
9-5. r = 4 n1 = 52 n2 = 38 n3 = 43 n4 = 47
Computed F = 12.53. Reject H0. The average price per lot is not equal at all 4 cities. Feel very
strongly about rejecting the null hypothesis as the critical point of F (3,176) for = .01 is
approximately 3.8.
F Distribution, 1-tail critical values:
10%: 2.1152    5%: 2.6559    1%: 3.8948    0.50%: 4.4264
9-6. Originally, treatments referred to the different types of agricultural experiments being performed
on a crop; today the term is used interchangeably to refer to the different populations in the study.
Errors are the differences between the data points and their sample means.
9-7. Because the sum of all the deviations from a mean is equal to 0.
9-8. Total deviation = x_ij − x̄ = (x̄_i − x̄) + (x_ij − x̄_i)
= treatment deviation + error deviation.
9-9. The sum of squares principle says that the sum of the squared total deviations of all the data
points is equal to the sum of the squared treatment deviations plus the sum of all squared error
deviations in the data.
9-10. An error is any deviation from a sample mean that is not explained by differences among
populations. An error may be due to a host of factors not studied in the experiment.
9-11. Both MSTR and MSE are sample statistics subject to natural variation about their own means.
(If x̄ > 0 we cannot immediately reject H0 in a single-sample case either.)
9-12. The main principle of ANOVA is that if the r population means are not all equal then it is likely
that the variation of the data points about their sample means will be small compared to the
variation of the sample means about the grand mean.
9-13. Distances among populations means manifest themselves in treatment deviations that are large
relative to error deviations. When these deviations are squared, added, and then divided by df’s,
they give two variances. When the treatment variance is (significantly) greater than the error
variance, population mean differences are likely to exist.
9-15. SST = SSTR + SSE, but MST does not equal MSTR + MSE. A counterexample:
Let n = 21, r = 6, SST = 100, SSTR = 85, SSE = 15.
Then SST = SSTR + SSE = 85 + 15 = 100.
But MSTR + MSE = SSTR/(r − 1) + SSE/(n − r) = 85/5 + 15/15 = 18, while
MST = SST/(n − 1) = 100/20 = 5.
9-16. When the null hypothesis of ANOVA is false, the ratio MSTR/MSE is not the ratio of two
independent, unbiased estimators of the common population variance 2 , hence this ratio does
not follow an F distribution.
Now sum this over all observations (all treatments i = 1, . . . , r; and within treatment i, all
observations j = 1, . . . , n_i):

Σ_i Σ_j (x_ij − x̄)² = Σ_i Σ_j (x̄_i − x̄)² + Σ_i Σ_j 2(x̄_i − x̄)(x_ij − x̄_i) + Σ_i Σ_j (x_ij − x̄_i)²

Notice that the first sum on the R.H.S. equals Σ_i n_i(x̄_i − x̄)², since for each i the
summand doesn't vary over the n_i values of j. Similarly, the second sum is
2 Σ_i [(x̄_i − x̄) Σ_j (x_ij − x̄_i)]. But for each fixed i, Σ_j (x_ij − x̄_i) = 0, since this is just the sum
of all deviations from the mean within treatment i. Thus the whole second sum in the long R.H.S.
above is 0, and the equation is now

Σ_i Σ_j (x_ij − x̄)² = Σ_i n_i(x̄_i − x̄)² + Σ_i Σ_j (x_ij − x̄_i)²

that is, SST = SSTR + SSE.
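The decomposition can be confirmed numerically; the three groups below are made-up illustration data, not data from the text:

```python
from statistics import mean

# Numerical check of SST = SSTR + SSE on a small made-up data set
# (three hypothetical treatment groups; not data from the text).
groups = [[4, 5, 7, 8], [10, 12, 13, 11], [1, 2, 3]]
all_obs = [x for g in groups for x in g]
grand = mean(all_obs)                                    # grand mean

sst = sum((x - grand) ** 2 for x in all_obs)             # total SS
sstr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # treatment SS
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)     # error SS
print(abs(sst - (sstr + sse)) < 1e-9)   # True
```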
ANOVA Table (5%):
Source    SS      df  MS         F        F-critical  p-value
Between   381127   2  190563.33  20.7084  3.3541      0.0000  Reject
Within    248460  27    9202.22
Total     629587  29
MINITAB output
One-way ANOVA: UK, Mex, UAE, Oman
Source DF SS MS F P
Factor 3 187.70 62.57 11.49 0.000
Error 28 152.41 5.44
Total 31 340.11
Critical point F (3,28) for = 0.05 is 2.9467. Therefore we reject H0. There is evidence of
differences in the average price per barrel of oil from the four sources. The Rotterdam oil market
may not be efficient. The conclusion is valid only for Rotterdam, and only for Arabian Light. We
need to assume independent random samples from these populations, normal populations with
equal population variance. Observations are time-dependent (days during February), thus the
assumptions could be violated. This is a limitation of the study. Another limitation is that
February may be different from other months.
9-20. An F(.05,2,101) = 3.61 result, relative to a critical value of 3.08637, indicates a significant difference
in their perceptions on the roles played by African American models in commercials.
p-value = .0001. Critical point for F(2,38) at α = .05 is 3.245. Therefore, reject H0. There is a
difference in the length of time it takes to make a decision.
ANOVA Table (5%):
Source    SS        df  MS       F        F-critical  p-value
Between    91.0426   2  45.5213  12.3093  3.2448      0.0001  Reject
Within    140.529   38   3.6981
Total     231.571   40
9-22. An F(.05,2,55) = 52.787 result, relative to a critical value of 3.165, indicates a significant difference
in the monetary-economic reaction to the three inflation fighting policies.
9-23. The test results exceed the critical value of F(.01,3,236) = 3.866. The results indicate that the
performances of the four different portfolios are significantly different.
9-25. Where do differences exist in the circle-square-triangle populations from Table 9-1, using
Tukey? From the text: MSE = 2.125
triangles: n1 = 4, x̄1 = 6
squares: n2 = 4, x̄2 = 11.5
circles: n3 = 3, x̄3 = 2
For α = .01, q_α(r, n − r) = q_.01(3, 8) = 5.63. Smallest n_i is 3:
T = q √(MSE/3) = 5.63 √(2.125/3) = 4.738
|x̄1 − x̄2| = 5.5 > 4.738  significant
|x̄2 − x̄3| = 9.5 > 4.738  significant
|x̄1 − x̄3| = 4.0 < 4.738  n.s.
Thus: "μ1 = μ3"; "μ2 > μ1"; "μ2 > μ3"
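The Tukey criterion and pairwise comparisons can be sketched directly; MSE, the q value, and the means come from the text:

```python
import math

# Tukey T criterion for Problem 9-25 (MSE, q value, and means from the text).
q = 5.63                     # q_.01(3, 8) from the studentized-range table
mse, n_min = 2.125, 3        # smallest group size is 3
T = q * math.sqrt(mse / n_min)
means = {"triangles": 6, "squares": 11.5, "circles": 2}
print(round(T, 3))           # 4.738
pairs = [("triangles", "squares"), ("squares", "circles"), ("triangles", "circles")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    print(a, b, "sig." if diff > T else "n.s.")
```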
Chapter 10 - Simple Linear Regression and Correlation
CHAPTER 10
SIMPLE LINEAR REGRESSION AND CORRELATION
10-1. A statistical model is a set of mathematical formulas and assumptions that describe some real-
world situation.
10-2. Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model
parameters; 3) Test the validity of the model; and 4) Use the model.
10-3. Assumptions of the simple linear regression model: 1) A straight-line relationship between X and
Y; 2) The values of X are fixed; 3) The regression errors, ε, are identically normally distributed
random variables, uncorrelated with each other through time.
10-4. β0 is the Y-intercept of the regression line, and β1 is the slope of the line.
10-5. The conditional mean of Y, E(Y | X), is the population regression line.
10-6. The regression model is used for understanding the relationship between the two variables, X and
Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the
variable X.
10-7. The error term captures the randomness in the process. Since X is assumed nonrandom, the
addition of ε makes the result (Y) a random variable. The error term captures the effects on Y of a
host of unknown random components not accounted for by the simple linear regression model.
10-8. The equation represents a simple linear regression model without an intercept (constant) term.
10-9. The least-squares procedure produces the best estimated regression line in the sense that the line
lies "inside" the data set. The line is the best unbiased linear estimator of the true regression line,
as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line
parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of
the data points from the line.
10-10. Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the
determination of the estimators of the line parameters because the procedure is based on
minimizing the squared distances from the line. Since outliers have large squared distances they
exert undue influence on the line. A more robust procedure may be appropriate when outliers
exist.
Simple Regression: Income (X), Wealth (Y)
    X   Y      Error   Quantile   Z
1   1   17.3    0.80    0.667     0.431
2   2   23.6   −3.02    0.167    −0.967
3   3   40.2    3.46    0.833     0.967
4   4   45.8   −1.06    0.333    −0.431
5   5   56.8   −0.18    0.500     0.000
95% C.I. for β1: 10.12 ± 2.77974
95% C.I. for β0: 6.38 ± 9.21937

r² = 0.9217 (coefficient of determination);  r = 0.9601 (coefficient of correlation)
95% C.I. for β1: 0.18663 ± 0.03609;  s(b1) = 0.0164 (standard error of slope)
ANOVA Table:
Source   SS        df  MS         F        F-critical  p-value
Regn.    128.332    1  128.332    129.525  4.84434     0.0000
Error     10.8987  11    0.99079
Total    139.231   12
10-15.
Simple Regression: Inflation (X), Return (Y)
    X       Y    Error
1    1      −3   −20.0642
2    2      36    17.9677
3   12.6    12   −16.294
4  −10.3    −8   −14.1247
5    0.51   53    36.4102
6    2.03   −2   −20.0613
7   −1.8    18     3.64648
8    5.79   32    10.2987
9    5.87   24     2.22121
r² = 0.0873 (coefficient of determination);  r = 0.2955 (coefficient of correlation)
95% C.I. for β1: 0.96809 ± 2.7972;  s(b1) = 1.18294 (standard error of slope)
ANOVA Table:
Source   SS        df  MS        F        F-critical  p-value
Regn.     291.134   1  291.134   0.66974  5.59146     0.4401
Error    3042.87    7  434.695
Total    3334       8
[Scatter plot of Return (Y) against Inflation (X), with fitted line y = 0.9681x + 16.096]
There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
10-16.
Simple Regression: Year (X), Value (Y)
    X      Y       Error
1  1960  180000    84000
2  1970   40000   −72000
3  1980   60000   −68000
4  1990  160000    16000
5  2000  200000    40000
r² = 0.1203 (coefficient of determination);  r = 0.3468 (coefficient of correlation)
95% C.I. for β1: 1600 ± 7949.76;  s(b1) = 2498 (standard error of slope)
ANOVA Table:
Source   SS       df  MS       F        F-critical  p-value
Regn.    2.6E+09   1  2.6E+09  0.41026  10.128      0.5674
Error    1.9E+10   3  6.2E+09
Total    2.1E+10   4
[Scatter plot of Value (Y) against Year (X)]
There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
Limitations: sample size is very small.
Hidden variables: the 70s and 80s models have a different valuation than other decades possibly
due to a different model or style.
r² = 0.9624 (coefficient of determination);  r = 0.9810 (coefficient of correlation)
95% C.I. for β1: 0.6202 ± 0.17018;  s(b1) = 0.06129 (standard error of slope)
ANOVA Table:
Source   SS        df  MS       F        F-critical  p-value
Regn.    332366     1  332366   102.389  7.70865     0.0005
Error     12984.5   4    3246.12
Total    345351     5
There is no implication for causality. A third-variable influence could be "increases in per capita
income" or "GDP growth".
∂/∂b0 [ Σ(y − b0 − b1x)² ] = −2 Σ(y − b0 − b1x)
∂/∂b1 [ Σ(y − b0 − b1x)² ] = −2 Σ x(y − b0 − b1x)
Setting these two derivatives equal to zero and solving the resulting equations simultaneously for
b0 and b1 gives the required results.
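The closed-form solution that falls out of the two normal equations can be checked numerically; the x and y values below are made-up illustration data, not data from the text:

```python
from statistics import mean

# Closed-form least-squares solution from the normal equations above,
# checked on a small made-up data set (not data from the text).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

sxy = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y))
sxx = sum((xi - mean(x)) ** 2 for xi in x)
b1 = sxy / sxx                      # slope
b0 = mean(y) - b1 * mean(x)         # intercept
# The first normal equation requires the residuals to sum to (numerically) zero:
resid_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
print(round(b1, 2), abs(resid_sum) < 1e-9)   # 1.96 True
```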
10-23. s(b0) = 0.971, s(b1) = 0.016; the estimate of the error variance is MSE = 0.991. 95% C.I. for β1:
0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.
10-25. s 2 gives us information about the variation of the data points about the computed regression line.
10-26. In correlation analysis, the two variables, X and Y, are viewed in a symmetric way, where neither
of them is "dependent" while the other is "independent," as is the case in regression analysis. In
correlation analysis we are interested in the relation between two random variables, both
assumed normally distributed.
10-28. r = 0.960
10-29. t(3) = 0.3468 / √( (1 − .1203)/3 ) = 0.640
10-34. n = 65, r = 0.37
t(63) = .37 / √( (1 − .37²)/63 ) = 3.16
Yes. Significant. There is a correlation between the two variables.
10-35. z′ = ½ ln[(1 + r)/(1 − r)] = ½ ln(1.37/0.63) = 0.3884
ζ = ½ ln[(1 + ρ0)/(1 − ρ0)] = ½ ln(1.22/0.78) = 0.2237
σ_z′ = 1/√(n − 3) = 1/√62 = 0.127
z = (z′ − ζ)/σ_z′ = (0.3884 − 0.2237)/0.127 = 1.297. Cannot reject H0.
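The Fisher z-transformation test above can be sketched in Python (r = 0.37, ρ0 = 0.22, and n = 65 are from the text; the helper name `fisher_z` is ours):

```python
import math

# Fisher z-transformation test for Problem 10-35 (r = 0.37, rho0 = 0.22,
# n = 65, all from the text); the helper name fisher_z is ours.
def fisher_z(r):
    return 0.5 * math.log((1 + r) / (1 - r))

n, r, rho0 = 65, 0.37, 0.22
z_stat = (fisher_z(r) - fisher_z(rho0)) / (1 / math.sqrt(n - 3))
print(round(z_stat, 3))   # 1.297, inside +/-1.96, so cannot reject H0
```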
10-36. Using the "TINV(α, df)" function in Excel, where df = n − 2 = 52: =TINV(0.05, 52) = 2.006645,
and TINV(0.01, 52) = 2.6737.
Reject H0 at 0.05 but not at 0.01. There is evidence of a linear relationship at α = 0.05 only.
10-42. Using the Excel function TDIST(x, df, #tails) to estimate the p-value for the t-test results, where
x = 1.51, df = 585692 − 2 = 585690, and #tails = 2 for a two-tailed test:
TDIST(1.51, 585690, 2) = 0.131.
The corresponding p-value for the results is 0.131. The regression is not significant even at the
0.10 level of significance.
10-45. The coefficient of determination indicates that 9% of the variation in customer satisfaction can
be explained by the changes in a customer’s materialism measurement.
10-46 a. The model should not be used for prediction purposes because only 2.0% of the
variation in pension funding is explained by its relationship with firm profitability.
b. The model explains virtually nothing.
c. Probably not. The model explains too little.
10-47. In the Problem 10-11 regression results, r² = 0.9781. Thus, 97.8% of the variation in wealth growth
is explained by the income quantile.
r² = 0.9781 (coefficient of determination)
10-48. In Problem 10-13, r² = 0.922. Thus, 92.2% of the variation in the dependent variable is
explained by the regression relationship.
r² = 0.9624 (coefficient of determination)
10-51. Based on the coefficient of determination values for the five countries, the UK model explains
31.7% of the variation in long-term bond yields relative to the yield spread. This is the best
predictive model of the five. The next best model is the one for Germany, which explains 13.3%
of the variation. The regression models for Canada, Japan, and the US do not predict long-term
yields very well.
10-52. From the information provided, the slope coefficient of the equation is equal to -14.6. Since its
value is not close to zero (which would indicate that a change in bond ratings has no impact on
yields), it would indicate that a linear relationship exists between bond ratings and bond yields.
This is in line with the reported coefficient of determination of 61.56%.
r² = 0.8348 (coefficient of determination)
= Σ(ŷ − ȳ)² + 2 Σ(ŷ − ȳ)(y − ŷ) + Σ(y − ŷ)²
But: 2 Σ(ŷ − ȳ)(y − ŷ) = 2 Σ ŷ(y − ŷ) − 2 ȳ Σ(y − ŷ) = 0
because the first term on the right is the sum of the weighted regression residuals, which sum to
zero. The second term is the sum of the residuals, which is also zero. This establishes the result:
Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)².
10-57. F(1,11) = 129.525.  t(11) = 11.381, and t² = 11.381² = 129.53, the F-statistic value already calculated.
F = 129.525    F-critical = 4.84434    p-value = 0.0000
F = 102.389    F-critical = 7.70865    p-value = 0.0005
10-60. F(1,102) = MSR/MSE = (87,691/1) / (12,745/102) = 701.8
There is extremely strong evidence of a linear relationship between the two variables.
10-62. t²(k) = [b1/s(b1)]² = [ (SS_XY/SS_X) / ( s/√SS_X ) ]²
[using Equations (10-10) and (10-15) for b1 and s(b1), respectively]
= (SS_XY/SS_X)² / (MSE/SS_X) = (SS_XY²/SS_X) / MSE = (SSR/1) / MSE = MSR/MSE = F(1,k)
[because SS_XY²/SS_X = SSR by Equations (10-31) and (10-10)]
10-63. a. Heteroscedasticity.
b. No apparent inadequacy.
c. Data display curvature, not a straight-line relationship.
[Residual plot: Error against X]
Residual variance fluctuates; with only 5 data points the residuals appear to be normally
distributed.
[Normal probability plot: Corresponding Normal Z against Residuals]
[MINITAB residual plot against Quality, 30 to 80]
No apparent inadequacy.
10-68.
10-69. In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35
P.I. = 6,551.35 ± (2.069)(318.16) √( 1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84 )
= [5,854.4, 7,248.3]
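The prediction-interval formula can be evaluated directly; every number (t value, standard error of estimate, x̄, and SS_X) comes from the text:

```python
import math

# 95% prediction interval for the American Express example (Problem 10-69);
# all numbers (t value, s, x-bar, SS_X) come from the text.
b0, b1 = 274.85, 1.2553
t_crit, s = 2.069, 318.16          # t_{.025,23} and the standard error of estimate
n, x_bar, ss_x = 25, 3177.92, 40947557.84
x0 = 5000

y_hat = b0 + b1 * x0
hw = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
print(round(y_hat, 2), round(y_hat - hw, 1), round(y_hat + hw, 1))
# 6551.35 5854.4 7248.3
```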
10-70. Given that the slope of the equation for 10-52 is –14.6, if the rating falls by 3 the yield should
increase by 43.8 basis points.
10-77.
a) Simple regression equation: Y = 2.779337X − 0.284157; when X = 10, Y = 27.5092.
Intercept b0 = −0.284157;  Slope b1 = 2.779337
With the intercept forced to zero: b0 = 0;  b1 = 2.741537
Chapter 11 - Multiple Regression
CHAPTER 11
MULTIPLE REGRESSION
11-1. The assumptions of the multiple regression model are that the errors are normally and
independently distributed with mean zero and common variance σ². We also assume that the X_i
are fixed quantities rather than random variables; at any rate, they are independent of the error
terms. The assumption of normality of the errors is needed for conducting tests about the
regression model.
11-2. Holding advertising expenditures constant, sales volume increases by 1.34 units, on average, per
increase of 1 unit in promotional expenditures.
11-3. In a correlational analysis, we are interested in the relationships among the variables. On the
other hand, in a regression analysis with k independent variables, we are interested in the effects
of the k variables (considered fixed quantities) on the dependent variable only (and not on one
another).
11-4. A response surface is a generalization to higher dimensions of the regression line of simple linear
regression. For example, when 2 independent variables are used, each in the first order only, the
response surface is a plane in 3-dimensional Euclidean space. When 7 independent
variables are used, each in the first order, the response surface is a 7-dimensional hyperplane in
8-dimensional Euclidean space.
11-5. 8 equations.
11-6. The least-squares estimators of the parameters of the multiple regression model, obtained as
solutions of the normal equations.
11-7. ΣY = nb0 + b1ΣX1 + b2ΣX2
ΣX1Y = b0ΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = b0ΣX2 + b1ΣX1X2 + b2ΣX2²
           Intercept   Size      Distance
b          −9.7997     0.17331   31.094
s(b)       80.7627     0.0399    14.132
t          −0.1213     4.34343    2.2002
p-value     0.9074     0.0049     0.0701
ANOVA Table:
Source   SS        df  MS       F      F-critical  p-value
Regn.    101033     2  50516    14.28  5.1432      0.0052    s = 59.477
Error     21225.1   6   3537.5
Total    122258     8  15282    R² = 0.8264    Adjusted R² = 0.7685
11-9. With no advertising and no spending on in-store displays, sales are b0 = 47.165 (thousand) on
average. For each unit (thousand) increase in advertising expenditure, keeping in-store
display expenditure constant, there is an average increase in sales of b1 = 1.599 (thousand).
Similarly, for each unit (thousand) increase in in-store display expenditure, keeping advertising
constant, there is an average increase in sales of b2 = 1.149 (thousand).
11-10. We test whether there is a linear relationship between Y and any of the X variables (that is, with
at least one of the X_i). If the null hypothesis is not rejected, there is nothing more to do since
there is no evidence of a regression relationship. If H0 is rejected, we need to conduct further
analyses to determine which of the variables have a linear relationship with Y and which do not,
and we need to develop the regression model.
11-13. F(4,40) = MSR/MSE = (7,768/4) / ((15,673 − 7,768)/40) = 1,942/197.625 = 9.827
Yes, there is evidence of a linear regression relationship between Y and at least one of the
independent variables.
11-14. Source      SS       df  MS        F
       Regression  7,474.0   3  2,491.33  48.16
       Error         672.5  13     51.73
       Total       8,146.5  16
Since the F-ratio is highly significant, there is evidence of a linear regression relationship
between overall appeal score and at least one of the three variables prestige, comfort, and
economy.
11-15. When the sample size is small; when the degrees of freedom for error are relatively small, so
that adding a variable, and thus losing a degree of freedom for error, is a substantial loss.
11-16. R² = SSR/SST. As we add a variable, SSR cannot decrease. Since SST is constant, R² cannot
decrease.
11-17. No. The adjusted coefficient is used in evaluating the importance of new variables in the
presence of old ones. It does not apply in the case where all we consider is a single independent
variable.
11-19. The mean square error gives a good indication of the variation of the errors in regression.
However, other measures such as the coefficient of multiple determination and the adjusted
coefficient of multiple determination are useful in evaluating the proportion of the variation in
the dependent variable explained by the regression, thus giving us a more meaningful measure
of the regression fit.
11-20. Given an adjusted R² = 0.021, only 2.1% of the variation in the stock return is explained by the
four independent variables.
Use the Excel function FDIST(F, dfN, dfD) to return the p-value, where F is the F-test result and
the df's refer to the degrees of freedom in the numerator and denominator, respectively.
11-23. R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.918)(16/12) = 0.8907
Since R̄² has decreased, do not include the new variable.
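The adjusted R² formula is easy to wrap in a helper; n = 17 and k = 4 are our reading of the 16/12 factor in the text, and the function name is ours:

```python
# Adjusted R-squared, as used in Problem 11-23. R² = 0.918 is from the text;
# n = 17 and k = 4 are inferred from the text's (n-1)/(n-(k+1)) = 16/12 factor.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

print(round(adjusted_r2(0.918, 17, 4), 4))   # 0.8907
```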
11-25. a. The regression expresses stock returns as a plane in space, with firm size ranking and
stock price ranking as the two horizontal axes:
RETURN = 0.484 − 0.030(SIZRNK) − 0.017(PRCRNK)
The t-test for a linear relationship between returns and firm size ranking is highly significant,
but not for returns against stock price ranking.
c. The adjusted R 2 is quite low, indicating that the regression on both variables is not a good
model. They should try regressing on size alone.
11-26. R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.72)(712/710) = 0.719
Based solely on this information, this is not a bad regression model.
Use the Excel function FDIST(F, dfN, dfD) to return the p-value, where F is the F-test result and
the df's refer to the degrees of freedom in the numerator and denominator, respectively.
11-28. A joint confidence region for both parameters is a set of pairs of likely values of 1 , and 2 at
95%. This region accounts for the mutual dependency of the estimators and hence is elliptical
rather than rectangular. This is why the region may not contain a bivariate point included in the
separate univariate confidence intervals for the two parameters.
11-29. Assuming a very large sample size, we use the formula z = b_i/s(b_i) for testing the significance
of each of the slope parameters, with α = 0.05; critical value |z| = 1.96.
For firm size: z = 0.06/0.005 = 12.00 (significant)
For firm profitability: z = −5.533 (significant)
For fixed-asset ratio: z = −0.08 (not significant)
For growth opportunities: z = −0.72 (not significant)
For nondebt tax shield: z = 4.29 (significant)
The slope estimates with respect to "firm size", "firm profitability" and "nondebt tax shield" are
not zero. The adjusted R-square indicates that 16.5% of the variation in governance level is
explained by the five independent variables. Next step: exclude "fixed-asset ratio" and "growth
opportunities" from the regression and see what happens to the adjusted R-square.
11-32. Use the formula z = b_i/s(b_i) for testing the significance of each of the slope parameters, with
α = 0.05; critical value |z| = 1.96.
11-33. Yes. Considering the joint confidence region for both slope parameters is equivalent to
conducting an F test for the existence of a linear regression relationship. Since (0,0) is not in the
joint 95% region, this is equivalent to rejecting the null hypothesis of the F test at = 0.05.
11-34. Prestige is not significant (or at least appears so, pending further analysis). Comfort and
Economy are significant (Comfort only at the 0.05 level). The regression should be rerun with
variables deleted.
11-36. a. As Price is dropped, Lend becomes significant: there is, apparently, a collinearity between
Lend and Price.
b.,c. The best model so far is the one in Table 11-9, with M1 and Price only. The adjusted R 2 for
that model is higher than for the other regressions.
d. For the model in this problem, MINITAB reports F = 114.09. Highly significant. For the
model in Table 11-9: F = 150.67. Highly significant.
e. s = 0.3697. For Problem 11-35: s = 0.3332. As a variable is deleted, s (and its square, MSE)
increases.
f. In Problem 11-35: MSE = s 2 = (0.3332)2 = 0.111.
11-38. Use the formula z = b_i/s(b_i) for testing the significance of each of the slope parameters, with
α = 0.05; critical value |z| = 1.96.
For new technological process: z = -0.014 / 0.004 = -3.50 (significant)
For organizational innovation: z = 0.25
For commercial innovation: z = 3.2 (significant)
For R&D: z = 4.50 (significant)
Multiple Regression
ANOVA Table:
Source   SS            df  MS           F      F-critical  p-value
Regn.     4507008.861   2  2253504.43   2.166  4.737       0.1852    s = 1019.925
Error     7281731.539   7  1040247.363
Total    11788740.4     9  1309860.044  R² = 0.3823    Adjusted R² = 0.2058
Correlation matrix:
             Employees  Revenues
Employees    1.0000
Revenues     0.9831     1.0000
Regression Equation:
Profits = 834.95 + 0.009 Employees - 0.174 Revenues
The regression equation is not significant (F value), and there is a large amount of
multicollinearity present between the two independent variables (0.9831). There is so much
multicollinearity present that the negative partial correlations between the independent variables
and profits are not maintained in the regression results (both of the parameters of the independent
variables should be negative). None of the values of the parameters are significant.
11-40. The residual plot exhibits both heteroscedasticity and a curvature apparently not accounted for in
the model.