
CHAPTER 8
THE COMPARISON OF TWO POPULATIONS

8-1. n = 25   D̄ = 19.08   sD = 30.67
H0: μD = 0   H1: μD ≠ 0
t(24) = (D̄ − μD0)/(sD/√n) = 19.08/(30.67/√25) = 3.11
Reject H0 at α = 0.01.

Paired Difference Test


Evidence
Size 25 n Assumption
Average Difference 19.08 D Populations Normal
Stdev. of Difference 30.67 sD
Note: Difference has been defined as
Test Statistic 3.1105 t
df 24
Hypothesis Testing At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.0048 Reject

8-2. n = 40   D̄ = 5   sD = 2.3
H0: μD = 0   H1: μD ≠ 0
t(39) = (5 − 0)/(2.3/√40) = 13.75
Strongly reject H0. 95% C.I. for μD: 5 ± 2.023(2.3/√40) = [4.26, 5.74].
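The confidence interval can be verified the same way (a sketch; the critical value 2.023 = t.025 with 39 df is taken from the solution above):

```python
from math import sqrt

def paired_ci(d_bar, s_d, n, t_crit):
    """(1 - alpha) confidence interval for mu_D: d_bar +/- t_crit * s_d / sqrt(n)."""
    half = t_crit * s_d / sqrt(n)
    return d_bar - half, d_bar + half

lo, hi = paired_ci(d_bar=5, s_d=2.3, n=40, t_crit=2.023)  # Problem 8-2
print(round(lo, 2), round(hi, 2))  # 4.26 5.74
```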

8-3. n = 9   D̄ = 3.67   sD = 2.45   (D = Movie − Commercial)
H0: μD = 0   H1: μD ≠ 0


(template: Testing Paired Difference.xls, sheet: Sample Data)


Paired Difference Test
Data
Current Previous Evidence
M C Size 9 n Assumption
1 15 10 Average Difference 3.66667 D Populations Normal
2 17 9 Stdev. of Difference 2.44949 sD
3 25 21 Note: Difference has been defined as
4 17 16 Test Statistic 4.4907 t
5 14 11 df 8
6 18 12 Hypothesis Testing At an  of
7 17 13 Null Hypothesis p-value 5%
8 16 15 H0: 1 2 =0 0.0020 Reject
9 14 13 H0: 1 2 >=0 0.9990
H0: 1 2 <=0 0.0010 Reject

At  = 0.05, we reject H0. There are more viewers for movies than commercials.

8-4. n = 60   D̄ = 0.2   sD = 1
H0: μD ≤ 0   H1: μD > 0
t(59) = (0.2 − 0)/(1/√60) = 1.549. At α = 0.05, we cannot reject H0.

Paired Difference Test


Evidence
Size 60 n Assumption
Average Difference 0.2 D Populations Normal
Stdev. of Difference 1 sD
Note: Difference has been defined as
Test Statistic 1.5492 t
df 59
Hypothesis Testing At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.1267
H0: 1 2 >=0 0.9367
H0: 1 2 <=0 0.0633

8-5. n = 15   D̄ = 3.2   sD = 8.436   (D = After − Before)
H0: μD ≤ 0   H1: μD > 0
t(14) = (3.2 − 0)/(8.436/√15) = 1.469


There is no evidence that the shelf facings are effective.

8-6. n = 12   D̄ = 37.08   sD = 43.99
H0: μD = 0   H1: μD ≠ 0
(template: Testing Paired Difference.xls, sheet: Sample Data)

Paired Difference Test


Data
Current Previous Evidence
France Spain Size 12 n Assumption
1 258 214 Average Difference 37.0833 D Populations Normal
2 289 250 Stdev. of Difference 43.9927 sD
3 228 190 Note: Difference has been defined as
4 200 185 Test Statistic 2.9200 t
5 190 114 df 11
6 350 285 Hypothesis Testing At an  of
7 310 378 Null Hypothesis p-value 5%
8 212 230 H0: 1 2 =0 0.0139 Reject
9 195 160 H 0: 1 2 >= 0 0.9930
10 175 120 H0: 1 2 <=0 0.0070 Reject
11 299 220
12 190 105

Reject H0. There is strong evidence that hotels in Spain are cheaper than those in France,
based on this small sample. p-value = 0.0139

8-7. Power at μD = 0.1:   n = 60   σD = 1.0   α = 0.01
H0: μD ≤ 0   H1: μD > 0
C = μ0 + 2.326(σ/√n) = 0 + 2.326(1/√60) = 0.30029. We need:
P(D̄ > C | μD = 0.1)
= P(D̄ > 0.30029 | μD = 0.1)
= P(Z > (0.30029 − 0.1)/(1/√60))
= P(Z > 1.551) = 0.0604
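The power calculation above can be reproduced with the standard normal CDF (a sketch using math.erf; the 2.326 critical value for α = 0.01 comes from the solution above):

```python
from math import sqrt, erf

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

n, sigma, mu1 = 60, 1.0, 0.1
c = 0 + 2.326 * sigma / sqrt(n)                   # rejection threshold C
power = 1.0 - phi((c - mu1) / (sigma / sqrt(n)))  # P(D-bar > C | mu_D = 0.1)
print(round(c, 5), round(power, 4))  # 0.30029 0.0604
```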

8-8. n = 20   D̄ = 1.25   sD = 42.89
H0: μD = 0   H1: μD ≠ 0
t(19) = (1.25 − 0)/(42.89/√20) = 0.13
Do not reject H0; no evidence of a difference.

Paired Difference Test


Evidence
Size 20 n Assumption
Average Difference 1.25 D Populations Normal
Stdev. of Difference 42.89 sD
Note: Difference has been defined as
Test Statistic 0.1303 t
df 19
Hypothesis Testing At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.8977

8-9. n1 = 100   n2 = 100   x̄1 = 76.5   x̄2 = 88.1   s1 = 38   s2 = 40
H0: μ2 − μ1 ≤ 0   H1: μ2 − μ1 > 0

(Template: Testing Population Means.xls, sheet: Z-test from Stats)


(need to use the t-test since the population std. dev. is unknown)
Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 100 100 n H0: Population Variances Equal
Mean 76.5 88.1 x-bar F ratio 1.10803
Std. Deviation 38 40 s p-value 0.6108

Assuming Population Variances are Equal


Pooled Variance 1522 s2p
Test Statistic -2.1025 t
df 198
At an  of Confidence Interval for difference in Population Means
Confidence
Null Hypothesis p-value 5%  Interval
H0: 1 2 =0 0.0368 Reject 95% -11.6 ± 10.8801 = [ -22.48, -0.7199 ]
H0: 1 2 >=0 0.0184 Reject
H0: 1 2 <=0 0.9816
Reject H0. There is evidence that gasoline outperforms ethanol.
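The pooled-variance t statistic in the template above can be checked directly from the summary statistics (a sketch):

```python
from math import sqrt

def pooled_t(x1, s1, n1, x2, s2, n2):
    """Two-sample t statistic assuming equal population variances."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (x1 - x2) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t

sp2, t = pooled_t(x1=76.5, s1=38, n1=100, x2=88.1, s2=40, n2=100)  # Problem 8-9
print(sp2, round(t, 4))  # 1522.0 -2.1025
```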

8-10. n1 = n 2 = 30
H0: 1   2 = 0 H1: 1   2  0
Nikon (1): x1 = 8.5 s1 = 2.1 Minolta (2): x 2 = 7.8 s 2 = 1.8


8.5  7.8
z= = 1.386
2 2
(2.1 / 30)  (1.8 / 30)
Do not reject H0. There is no evidence of a difference in the average ratings of the two cameras.

8-11. Bel Air (1): n1 = 32 x1 = 2.5M s1 = 0.41M


Marin (2): n 2 = 35 x 2 = 4.32M s 2 = 0.87M
H0: 1   2 = 0 H1: 1   2  0

(Template: Testing Population Means.xls, sheet: t-test from Stats)


(need to use the t-test since the population std. dev. is unknown)
The equal-variance assumption is questionable.

t-Test for Difference in Population Means

Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 32 35 n H0: Population Variances Equal
Mean 2.5 4.32 x-bar F ratio 4.50268
Std. Deviation 0.41 0.87 s p-value 0.0001

Assuming Population Variances are Equal


Pooled Variance 0.47609 s2p Warning: Equal variance assumption is questionable.
Test Statistic -10.7845 t
df 65
At an  of Confidence Interval for difference in Population Means
Null Hypothesis p-value 5%   Confidence Interval
H0: 1 2 =0 0.0000 Reject 95% -1.82 ± 0.33704 = [ -2.157, -1.48296 ]
H0: 1 2 >=0 0.0000 Reject
H0: 1 2 <=0 1.0000

Assuming Population Variances are Unequal


Test Statistic -11.101 t
df 49
At an  of Confidence Interval for difference in Population Means
Null Hypothesis p-value 5%  Confidence Interval
H0: 1 2 =0 0.0000 Reject 95% -1.82 ± 0.32946 = [ -2.1495, -1.49054 ]
H0: 1 2 >=0 0.0000 Reject
H0: 1 2 <=0 1.0000


Reject H0. There is evidence that the average Bel Air price is lower.
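Since the equal-variance assumption is questionable here, the template's unequal-variance result can be verified with the Welch statistic (a sketch; the Satterthwaite df is rounded down, as the template does):

```python
from math import sqrt, floor

def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch t statistic and Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, floor(df)

t, df = welch_t(x1=2.5, s1=0.41, n1=32, x2=4.32, s2=0.87, n2=35)  # Problem 8-11
print(round(t, 3), df)  # -11.101 49
```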

8-12. (Template: Testing Population Means.xls, sheet: t-test from Stats)


(need to use the t-test since the population std. dev. is unknown)

H0: μJ – μSP = 0 H1: μJ – μSP ≠ 0

t-Test for Difference in Population Means


Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 40 40 n H0: Population Variances Equal
Mean 15 6.2 x-bar F ratio 1.36111
Std. Deviation 3 3.5 s p-value 0.3398

Assuming Population Variances are Equal


Pooled Variance 10.625 s2p
Test Statistic 12.0735 t
df 78
At an  of Confidence Interval for difference in Population Means
Confidence
Null Hypothesis p-value 5%  Interval
H0: 1 2 =0 0.0000 Reject 95% 8.8 ± 1.45107 = [ 7.34893, 10.2511 ]
H0: 1 2 >=0 1.0000
H0: 1 2 <=0 0.0000 Reject

Reject the null hypothesis. The global equities outperform the U.S. market.

8-13. Music: n1 = 128   x̄1 = 23.5   s1 = 12.2
Verbal: n2 = 212   x̄2 = 18.0   s2 = 10.5
H0: μ1 − μ2 = 0   H1: μ1 − μ2 ≠ 0
z = (23.5 − 18.0)/√(12.2²/128 + 10.5²/212) = 4.24
Reject H0. Music is probably more effective.


Evidence
Sample1 Sample2
Size 128 212 n
Mean 23.5 18 x-bar
Popn. 1 Popn. 2
Popn. Std. Devn. 12.2 10.5 σ
Hypothesis Testing

Test Statistic 4.2397 z


At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.0000 Reject

8-14. n1 = 13   n2 = 13   x̄1 = 20.385   x̄2 = 10.385
s1 = 7.622   s2 = 4.292   α = .05
H0: μ1 = μ2   H1: μ1 ≠ μ2
sp² = [(13 − 1)(7.622)² + (13 − 1)(4.292)²]/(13 + 13 − 2) = 38.2581
t(24) = (20.385 − 10.385)/√(38.2581(1/13 + 1/13)) = 4.1219
df = 24.
Use a critical value of 2.064 for a two-tailed test. Reject H0. The two methods do differ.

8-15. Liz (1): n1 = 32   x̄1 = 4,238   s1 = 1,002.5
Calvin (2): n2 = 37   x̄2 = 3,888.72   s2 = 876.05
a. one-tailed: H0: μ1 − μ2 ≤ 0   H1: μ1 − μ2 > 0
b. z = (4,238 − 3,888.72 − 0)/√(1,002.5²/32 + 876.05²/37) = 1.53
c. At α = 0.05, the critical point is 1.645. Do not reject H0 that Liz Claiborne models do not get
more money, on the average.
d. p-value = .5 − .437 = .063 (It is the probability of committing a Type I error if we choose
to reject and H0 happens to be true.)


e. sp² = [(10 − 1)(1,002.5)² + (11 − 1)(876.05)²]/(10 + 11 − 2) = 879,983.804
t(19) = (4,238 − 3,888.72)/√(879,983.804(1/10 + 1/11)) = 0.8522
df = 19

8-16. (Template: Testing Population Means.xls, sheet: t-test from Stats)


(need to use the t-test since the population std. dev. is unknown)
H0: 1   2 = 0 H1: 1   2  0
t-Test for Difference in Population Means

Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 28 28 n H0: Population Variances Equal
Mean 0.19 0.72 x-bar F ratio 1.25792
Std. Deviation 5.72 5.1 s p-value 0.5552

Assuming Population Variances are Equal


Pooled Variance 29.3642 s2p
Test Statistic -0.3660 t
df 54
At an  of Confidence Interval for difference in Population Means
Confidence
Null Hypothesis p-value 1%  Interval
H0: 1 2 =0 0.7158 99% -0.53 ± 3.86682 = [ -4.3968, 3.33682 ]
H0: 1 2 >=0 0.3579
H0: 1 2 <=0 0.6421
Do not reject the null hypothesis. Pre-earnings announcements have no impact on earnings on
stock investments.

8-17. Non-research (1): n1 = 255   s1 = 0.64
Research (2): n2 = 300   s2 = 0.85
x̄2 − x̄1 = 2.54
95% C.I. for μ2 − μ1: (x̄2 − x̄1) ± zα/2 √(s1²/n1 + s2²/n2)
= 2.54 ± 1.96 √(.64²/255 + .85²/300) = [2.416, 2.664] percent.


8-18. Audio (1): n1 = 25   x̄1 = 87   s1 = 12
Video (2): n2 = 20   x̄2 = 64   s2 = 23
H0: μ1 − μ2 = 0   H1: μ1 − μ2 ≠ 0
t(43) = (x̄1 − x̄2 − 0)/√{[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)](1/n1 + 1/n2)} = 4.326
Reject H0. Audio is probably better (higher average purchase intent). Waldenbooks should
concentrate on audio.

Evidence
Sample1 Sample2
Size 25 20 n
Mean 87 64 x-bar
Std. Deviation 12 23 s

Assuming Population Variances are Equal


Pooled Variance 314.116 s2p
Test Statistic 4.3257 t
df 43
At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.0001 Reject

8-19. With training (1): n1 = 13   x̄1 = 55   s1 = 8
Without training (2): n2 = 15   x̄2 = 48   s2 = 6
(figures in thousands of dollars)
H0: μ1 − μ2 ≤ 4   H1: μ1 − μ2 > 4
t(26) = [(55 − 48) − 4]/√{[(12(8²) + 14(6²))/26](1/13 + 1/15)} = 1.132
The critical value at α = .05 for t(26) in a right-tailed test is 1.706. Since 1.132 < 1.706,
there is no evidence at α = .05 that the program executives get an average of $4,000 per year
more than other executives of comparable levels.

8-20. (Use template: “testing difference in means.xls”)


(need to use the t-test since the population std. dev. is unknown)

H0: μP - μL= 0 H1: μP - μL  0


t-Test for Difference in Population Means


Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 20 20 n H0: Population Variances Equal
Mean 1 6 x-bar F ratio 5.16529
Std. Deviation 1.1 2.5 s p-value 0.0008
The variances are not equal.

Assuming Population Variances are Unequal


Test Statistic -8.1868 t
df 26
At an  Confidence Interval for difference in Population
of Means
Confidence
Null Hypothesis p-value 5%  Interval
H0: 1 2 =0 0.0000 Reject 95% -5 ± 1.25539 = [ -6.2554, -3.74461 ]
H0: 1 2 >=0 0.0000 Reject
H0: 1 2 <=0 1.0000

Reject the null hypothesis: the average cost of beer is cheaper in Prague. Londoners save
between $3.74 and $6.26.

8-21. (Use template: “testing difference in means.xls”)


(need to use the t-test since the population std. dev. is unknown)

H0: 1   2 = 0 H1: 1   2  0
t-Test for Difference in Population Means
Evidence Assumptions
US China Populations Normal
Size 15 18 n H0: Population Variances Equal
Mean 3.8 6.1 x-bar F ratio 5.80372
Std. Deviation 2.2 5.3 s p-value 0.0018


Equal variance assumption is violated.


Assuming Population Variances are Unequal
Test Statistic -1.676 t
df 23
At an α of   Confidence Interval for difference in Population Means
Confidence
Null Hypothesis p-value 1%  Interval
H0: 1 2 =0 0.1073 99% -2.3 ± 3.85252 = [ -6.1525, 1.55252 ]
H0: 1 2 >=0 0.0536
H0: 1 2 <=0 0.9464

Do not reject the null hypothesis (p-value = 0.1073), investment returns are the same in China
and the US.

8-22. Old (1): n1 = 19   x̄1 = 8.26   s1 = 1.43
New (2): n2 = 23   x̄2 = 9.11   s2 = 1.56
H0: μ2 − μ1 ≤ 0   H1: μ2 − μ1 > 0
t(40) = (9.11 − 8.26 − 0)/√{[(18(1.43²) + 22(1.56²))/40](1/19 + 1/23)} = 1.82
Some evidence to reject H0 (p-value = 0.038) for the t distribution with df = 40, in a one-tailed
test.

8-23. Take proposed route as population 1 and alternate route as 2. Assume equal variance for both
populations.
H0: 1   2  0
H1: 1   2 > 0
p-value from the template = 0.8674
cannot reject H0

8-24. (Use template: “testing difference in means.xls”)


(need to use the t-test since the population std. dev. is unknown)
H0: 1   2 = 0 H1: 1   2  0


Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 20 20 n H0: Population Variances Equal
Mean 3.56 4.84 x-bar F ratio 1.30612
Std. Deviation 2.8 3.2 s p-value 0.5662

Assuming Population Variances are Equal


Pooled Variance 9.04 s2p
Test Statistic -1.3463 t
df 38
At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.1862
H0: 1 2 >=0 0.0931
H0: 1 2 <=0 0.9069
Do not reject the null hypothesis. Neither investment outperforms the other.

8-25. “Yes” (1): n1 = 25   x̄1 = 12   s1 = 2.5
“No” (2): n2 = 25   x̄2 = 13.5   s2 = 1
Assume independent random sampling from normal populations with equal population variances.
H0: μ2 − μ1 ≤ 0   H1: μ2 − μ1 > 0
t(48) = (13.5 − 12)/√{[(24(2.5²) + 24(1²))/48](1/25 + 1/25)} = 2.785
At α = 0.05, reject H0. Also reject at α = 0.01. p-value = 0.0038.

Evidence
Sample1 Sample2
Size 25 25 n
Mean 12 13.5 x-bar
Std. Deviation 2.5 1 s

Assuming Population Variances are Equal


Pooled Variance 3.625 s2p
Test Statistic -2.7854 t
df 48
At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.0076 Reject
H0: 1 2 >=0 0.0038 Reject


8-26. H0: μ1 − μ2 = 0   H1: μ1 − μ2 ≠ 0
t(47) = (.1331 − .105 − 0)/√{[(20(.09²) + 27(.122²))/47](1/21 + 1/28)} = 0.8887
Do not reject H0. There is no evidence of a difference in average stock returns for the two
periods.

8-27. (Use template: “testing difference in means.xls”)


(need to use the t-test since the population std. dev. is unknown)
H0: μN - μO ≤ 0 H1: μN - μO > 0

Evidence Assumptions
Sample1 Sample2 Populations Normal
Size 8 10 n H0: Population Variances Equal
Mean 3 2.3 x-bar F ratio 1.1025
Std. Deviation 2 2.1 s p-value 0.9186

Assuming Population Variances are Equal


Pooled Variance 4.23063 s2p
Test Statistic 0.7175 t
df 16
At an  of
Null Hypothesis p-value 5%
H0: 1 2 =0 0.4834
H0: 1 2 >=0 0.7583
H0: 1 2 <=0 0.2417

Do not reject the null hypothesis. (p-value = 0.2417) The new advertising firm has not resulted
in significantly higher sales.

8-28. From Problem 8-25:
n1 = n2 = 25   x̄1 = 12   x̄2 = 13.5   s1 = 2.5   s2 = 1
We want a 95% C.I. for μ2 − μ1:
(x̄2 − x̄1) ± 2.011 √{[((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2)](1/n1 + 1/n2)}
= (13.5 − 12) ± 2.011 √{[(24(2.5²) + 24(1²))/48](1/25 + 1/25)}
= [0.4170, 2.5830] percent.


8-29. Before (1): x1 = 85   n1 = 100
After (2): x2 = 68   n2 = 100
H0: p1 − p2 ≤ 0   H1: p1 − p2 > 0
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = (.85 − .68)/√[(.765)(.235)(1/100 + 1/100)] = 2.835
Reject H0. On-time departure percentage has probably declined after NW’s merger with
Republic. p-value = 0.0023.

Sample Sample
Evidence 1 2
Size 100 100 n
#Successes 85 68 x
Proportion 0.8500 0.6800 p-hat

Hypothesis Testing
Hypothesized Difference Zero

Pooled p-hat 0.7650


Test Statistic 2.8351 z
At an  of
Null Hypothesis p-value 5%
H0: p1 - p2 = 0 0.0046 Reject
H0: p1 - p2 >= 0 0.9977
H0: p1 - p2 <= 0 0.0023 Reject
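The two-proportion z statistic with a pooled p̂, as used in the template above, can be verified numerically (a sketch):

```python
from math import sqrt

def two_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    return (p1 - p2) / sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))

z = two_prop_z(x1=85, n1=100, x2=68, n2=100)  # Problem 8-29
print(round(z, 4))  # 2.8351
```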

8-30. Small towns (1): n1 = 1,000   x1 = 850
Big cities (2): n2 = 2,500   x2 = 1,950
H0: p1 − p2 ≤ 0   H1: p1 − p2 > 0
z = (850/1,000 − 1,950/2,500)/√[(2,800/3,500)(1 − 2,800/3,500)(1/1,000 + 1/2,500)] = 4.677
Reject H0. There is strong evidence that the percentage of word-of-mouth recommendations in
small towns is greater than it is in large metropolitan areas.

8-31. n1 = 31   x1 = 11   n2 = 50   x2 = 19
H0: p1 − p2 = 0   H1: p1 − p2 ≠ 0


pˆ 1  pˆ 2
z= = 0.228
1 1 
pˆ (1  pˆ )  
 1
n n 2 

Do not reject H0. There is no evidence that one corporate raider is more successful than the other.

8-32. Before campaign (1): n1 = 2,060   p̂1 = 0.13
After campaign (2): n2 = 5,000   p̂2 = 0.19
H0: p2 − p1 ≤ .05   H1: p2 − p1 > .05
z = (p̂2 − p̂1 − D)/√[p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2]
= (0.19 − 0.13 − .05)/√[(.13)(.87)/2,060 + (.19)(.81)/5,000] = 1.08
No evidence to reject H0; cannot conclude that the campaign has increased the proportion of
people who prefer California wines by over 0.05.

pˆ 1 (1  pˆ 1 ) pˆ 2 (1  pˆ 2 )
8-33. 95% C.I. for p2  p1: ( p̂ 2  p̂1 )  1.96 
n1 n2

(.13)(.87) (.19)(.81)
= .06  1.96  = [0.0419, 0.0781]
2,060 5,000
We are 95% confident that the increase in the proportion of the population preferring California
wines is anywhere from 4.19% to 7.81%.

Confidence Interval
 Confidence Interval
95% 0.0600 ± 0.0181 = [ 0.0419 , 0.0782 ]
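The interval can be reproduced with the unpooled standard error used above (a sketch):

```python
from math import sqrt

def prop_diff_ci(p1, n1, p2, n2, z=1.96):
    """95% confidence interval for p2 - p1 (unpooled standard error)."""
    half = z * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p2 - p1 - half, p2 - p1 + half

lo, hi = prop_diff_ci(p1=0.13, n1=2060, p2=0.19, n2=5000)  # Problem 8-33
print(round(lo, 4), round(hi, 4))  # 0.0419 0.0781
```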

8-34. The statement to be tested must be hypothesized before looking at the data:
Chase Man. (1): n1 = 650   x1 = 48
Manuf. Han. (2): n2 = 480   x2 = 20
H0: p1 − p2 ≤ 0   H1: p1 − p2 > 0
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = 2.248
Reject H0. p-value = 0.0122.

8-35. American execs (1): n1 = 120   x1 = 34
European execs (2): n2 = 200   x2 = 41
H0: p1 − p2 ≤ 0   H1: p1 − p2 > 0


.283  .205
z= = 1.601
 1 1 
(.234)(1  .234)  
 120 200 
At  = 0.05, there is no evidence to conclude that the proportion of American executives who
prefer the A380 is greater than that of European executives. (p-value = 0.0547.)

Evidence Sample 1 Sample 2


Size 120 200 n
#Successes 34 41 x
Proportion 0.2833 0.2050 p-hat

Hypothesis Testing
Hypothesized Difference Zero

Pooled p-hat 0.2344


Test Statistic 1.6015 z
At an  of
Null Hypothesis p-value 5%
H0: p1 - p2 = 0 0.1093
H0: p1 - p2 >= 0 0.9454
H0: p1 - p2 <= 0 0.0546

8-36. Cleveland (1): n1 = 1,000   x1 = 75   p̂1 = .075
Chicago (2): n2 = 1,000   x2 = 72   p̂2 = .072
H0: p1 − p2 = 0   H1: p1 − p2 ≠ 0   p̂ = (75 + 72)/2,000 = .0735
z = (p̂1 − p̂2)/√[p̂(1 − p̂)(1/n1 + 1/n2)] = 0.257
We cannot reject H0. p-value = 0.7971.

8-37. (Use template: “testing difference in proportions.xls”)


H0: pQ – pN = 0 H1: pQ – pN ≠ 0


Comparing Two Population Proportions


Evidence Sample 1 Sample 2
Size 100 100 n
#Successes 18 6 x
Proportion 0.1800 0.0600 p-hat

Hypothesis Testing
Hypothesized Difference Zero

Pooled p-hat 0.1200


Test Statistic 2.6112 z
At an  of
Null Hypothesis p-value 5%
H0: p1 - p2 = 0 0.0090 Reject
Reject the null hypothesis, the new accounting method is more effective.

8-38. (Use template: “testing difference in proportions.xls”)


H0: pC – pD = 0 H1: pC – pD ≠ 0

Comparing Two Population Proportions


Evidence Sample 1 Sample 2
Size 100 100 n
#Successes 32 19 x
Proportion 0.3200 0.1900 p-hat

Hypothesis Testing
Hypothesized Difference Zero

Pooled p-hat 0.2550


Test Statistic 2.1090 z
At an  of
Null Hypothesis p-value 1%
H0: p1 - p2 = 0 0.0349

Do not reject the null hypothesis: the proportions are not significantly different.

8-39. Motorola (1): n1 = 120   x1 = 101   p̂1 = .842
Blaupunkt (2): n2 = 200   x2 = 110   p̂2 = .550
H0: p1 ≤ p2   H1: p1 > p2   p̂ = (101 + 110)/320 = .659
z = (.842 − .550)/√[(.659)(1 − .659)(1/120 + 1/200)] = 5.33


Strongly reject H0; Motorola’s system is superior (p-value is very small).

8-40. Old method (1): n1 = 40   s1² = 1,288
New method (2): n2 = 15   s2² = 1,112
H0: σ1² ≤ σ2²   H1: σ1² > σ2²   Use α = .05.
F(39,14) = s1²/s2² = 1,288/1,112 = 1.158
The critical point at α = .05 is F(39,14) = 2.27 (using approximate df in the table). Do not reject
H0. There is no evidence that the variance of the new production method is smaller.

F-Test for Equality of Variances

Sample 1 Sample 2
Size 40 15
Variance 1288 1112

Test Statistic 1.158273 F


df1 39
df2 14

At an  of
Null Hypothesis p-value 5%
H0:  -
2 2
1 2 = 0 0.7977
H0:  -
2 2
1 2 >= 0 0.6012
H0:  1 2
2 2
- <= 0 0.3988
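The F ratio itself is just the ratio of the two sample variances (a sketch; critical values and p-values still come from an F table):

```python
def f_ratio(s2_num, s2_den):
    """F statistic for comparing two variances (larger variance on top)."""
    return s2_num / s2_den

F = f_ratio(1288, 1112)  # Problem 8-40, df = (39, 14)
print(round(F, 3))  # 1.158
```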

8-41. Test the equal-variance assumption of Problem 8-27:
H0: σ1² = σ2²   H1: σ1² ≠ σ2²

F = 1.1025
Assumptions
Populations Normal
H0: Population Variances Equal
F ratio 1.1025
p-value 0.9186

Do not reject H0. Variances are equal.

8-42. “Yes” (1): n1 = 25   s1 = 2.5
“No” (2): n2 = 25   s2 = 1
H0: σ1² = σ2²   H1: σ1² ≠ σ2²
Put the larger s² in the numerator and use 2α:
F(24,24) = s1²/s2² = (2.5)²/(1)² = 6.25


CHAPTER 9
ANALYSIS OF VARIANCE

9-1. H0: μ1 = μ2 = μ3 = μ4
H1: not all four means are equal. The alternative covers several configurations:
all 4 different; 2 equal and 2 different; 3 equal and 1 different; or 2 equal and the
other 2 equal but different from the first 2.

9-2. ANOVA assumptions: normal populations with equal variance. Independent random sampling
from the r populations.

9-3. Series of paired t-test are dependent on each other. There is no control over the probability of a
Type I error for the joint series of tests.

9-4. r = 5   n1 = n2 = . . . = n5 = 21   n = 105
df’s of F are 4 and 100. Computed F = 3.6. The p-value is close to 0.01. Reject H0. There is
evidence that not all 5 plants have equal average output.

F Distribution

 10% 5% 1% 0.50%
(1-Tail) F-Critical 2.0019 2.4626 3.5127 3.9634

9-5. r = 4   n1 = 52   n2 = 38   n3 = 43   n4 = 47
Computed F = 12.53. Reject H0. The average price per lot is not equal in all 4 cities. We feel
very strongly about rejecting the null hypothesis, as the critical point of F(3,176) for α = .01 is
approximately 3.8.

F Distribution

 10% 5% 1% 0.50%
(1-Tail) F-Critical 2.1152 2.6559 3.8948 4.4264

9-6. Originally, treatments referred to the different types of agricultural experiments being performed
on a crop; today the term is used interchangeably to refer to the different populations in the study.
Errors are the differences between the data points and their sample means.

9-7. Because the sum of all the deviations from a mean is equal to 0.


 
9-8. Total deviation = xij − x̿ = (x̄i − x̿) + (xij − x̄i)
= treatment deviation + error deviation.
9-9. The sum of squares principle says that the sum of the squared total deviations of all the data
points is equal to the sum of the squared treatment deviations plus the sum of all squared error
deviations in the data.

9-10. An error is any deviation from a sample mean that is not explained by differences among
populations. An error may be due to a host of factors not studied in the experiment.

9-11. Both MSTR and MSE are sample statistics subject to natural variation about their own means.
(If x̄ > μ0 we cannot immediately reject H0 in a single-sample case either.)
9-12. The main principle of ANOVA is that if the r population means are not all equal then it is likely
that the variation of the data points about their sample means will be small compared to the
variation of the sample means about the grand mean.

9-13. Distances among population means manifest themselves in treatment deviations that are large
relative to error deviations. When these deviations are squared, added, and then divided by df’s,
they give two variances. When the treatment variance is (significantly) greater than the error
variance, population mean differences are likely to exist.

9-14. a) degrees of freedom for Factor: 4 – 1 = 3


b) degrees of freedom for Error: 80 – 4 = 76
c) degrees of freedom for Total: 80 – 1 = 79

9-15. SST = SSTR + SSE, but this does not equal MSTR + MSE. A counterexample:
Let n = 21   r = 6   SST = 100   SSTR = 85   SSE = 15
Then SST = SSTR + SSE = 85 + 15 = 100.
But MSTR + MSE = SSTR/(r − 1) + SSE/(n − r) = 85/5 + 15/15 = 18 ≠ SST/(n − 1) = 100/20 = 5.

9-16. When the null hypothesis of ANOVA is false, the ratio MSTR/MSE is not the ratio of two
independent, unbiased estimators of the common population variance  2 , hence this ratio does
not follow an F distribution.

9-17. For each observation xij, we know that
(tot.) = (treat.) + (error):   xij − x̿ = (x̄i − x̿) + (xij − x̄i)
Squaring both sides of the equation:
(xij − x̿)² = (x̄i − x̿)² + 2(x̄i − x̿)(xij − x̄i) + (xij − x̄i)²
Now sum this over all observations (all treatments i = 1, . . . , r; and within treatment i, all
observations j = 1, . . . , ni):
Σi Σj (xij − x̿)² = Σi Σj (x̄i − x̿)² + Σi Σj 2(x̄i − x̿)(xij − x̄i) + Σi Σj (xij − x̄i)²
Notice that the first sum on the R.H.S. here equals Σi ni(x̄i − x̿)², since for each i the summand
doesn’t vary over the ni values of j. Similarly, the second sum is 2 Σi [(x̄i − x̿) Σj (xij − x̄i)].
But for each fixed i, Σj (xij − x̄i) = 0, since this is just the sum of all deviations from the mean
within treatment i. Thus the whole second sum on the R.H.S. above is 0, and the equation is now
Σi Σj (xij − x̿)² = Σi ni(x̄i − x̿)² + Σi Σj (xij − x̄i)²
which is precisely Equation (9-12).

9-18. (From Minitab):


Source df SS MS F
Treatment 2 381127 190563 20.71
Error 27 248460 9202
Total 29 629587
The critical point for F (2,27) at  = 0.01 is 5.49. Therefore, reject H0. The average range of the 3
prototype planes is probably not equal.


ANOVA Table 5%
Source SS df MS F Fcritical p-value
Between 381127 2 190563.33 20.7084038 3.3541312 0.0000 Reject
Within 248460 27 9202.2222
Total 629587 29
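The mean squares and F ratio in the ANOVA table can be recomputed from the sums of squares (a sketch):

```python
def anova_f(sstr, df_tr, sse, df_err):
    """Mean squares and F ratio from an ANOVA table's sums of squares."""
    mstr, mse = sstr / df_tr, sse / df_err
    return mstr, mse, mstr / mse

mstr, mse, F = anova_f(sstr=381127, df_tr=2, sse=248460, df_err=27)  # Problem 9-18
print(mstr, round(mse, 4), round(F, 2))  # 190563.5 9202.2222 20.71
```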

9-19. (Template: Anova.xls, sheet: 1-way):


ANOVA Table 5%
Source SS df MS F Fcritical p-value
Between 187.696 3 62.565 11.494 2.9467 0.0000 Reject
Within 152.413 28 5.4433
Total 340.108 31


MINITAB output
One-way ANOVA: UK, Mex, UAE, Oman

Source DF SS MS F P
Factor 3 187.70 62.57 11.49 0.000
Error 28 152.41 5.44
Total 31 340.11

S = 2.333 R-Sq = 55.19% R-Sq(adj) = 50.39%

Individual 95% CIs For Mean Based on


Pooled
StDev
Level N Mean StDev +---------+---------+---------+--------
-
UK 8 60.160 2.535 (------*-----)
Mex 8 58.390 2.405 (------*-----)
UAE 8 55.190 2.224 (------*------)
Oman 8 54.124 2.149 (-----*------)
+---------+---------+---------+--------
-
52.5 55.0 57.5 60.0

Pooled StDev = 2.333

Critical point F (3,28) for  = 0.05 is 2.9467. Therefore we reject H0. There is evidence of
differences in the average price per barrel of oil from the four sources. The Rotterdam oil market
may not be efficient. The conclusion is valid only for Rotterdam, and only for Arabian Light. We
need to assume independent random samples from these populations, normal populations with
equal population variance. Observations are time-dependent (days during February), thus the
assumptions could be violated. This is a limitation of the study. Another limitation is that
February may be different from other months.

9-20. An F(.05,2,101) = 3.61 result, relative to a critical value of 3.08637, indicates a significant difference
in their perceptions on the roles played by African American models in commercials.

9-21. (From Minitab):


Source df SS MS F
Treatment 2 91.0426 45.5213 12.31
Error 38 140.529 3.69812
Total 40 231.571


p-value = .0001. Critical point for F (2,38) at  = .05 is 3.245. Therefore, reject H0. There is a
difference in the length of time it takes to make a decision.


ANOVA Table 5%
Source SS df MS F Fcritical p-value
Between 91.0426 2 45.521302 12.3093042 3.2448213 0.0001 Reject
Within 140.529 38 3.6981215
Total 231.571 40

9-22. An F(.05,2,55) = 52.787 result, relative to a critical value of 3.165, indicates a significant difference
in the monetary-economic reaction to the three inflation fighting policies.

9-23. The test results exceed the critical value of F(.01,3,236) = 3.866. The results indicate that the
performances of the four different portfolios are significantly different.

9-24. 95% C.I.s for the mean responses:
Martinique: x̄2 ± tα/2 √(MSE/n2) = 75 ± 1.96 √(504.4/40) = [68.04, 81.96]
Eleuthera: 73 ± 1.96 √(MSE/n3) = [66.04, 79.96]
Paradise Island: 91 ± 1.96 √(MSE/n4) = [84.04, 97.96]
St. Lucia: 85 ± 1.96 √(MSE/n5) = [78.04, 91.96]

9-25. Where do differences exist in the circle-square-triangle populations from Table 9-1, using
the Tukey method? From the text: MSE = 2.125
triangles: n1 = 4   x̄1 = 6
squares: n2 = 4   x̄2 = 11.5
circles: n3 = 3   x̄3 = 2
For α = .01, qα(r, n − r) = q.01(3, 8) = 5.63. The smallest ni is 3:
T = q √(MSE/3) = 5.63 √(2.125/3) = 4.738
|x̄1 − x̄2| = 5.5 > 4.738   sig.
|x̄2 − x̄3| = 9.5 > 4.738   sig.
|x̄1 − x̄3| = 4.0 < 4.738   n.s.
Thus: “μ1 = μ3”; “μ2 > μ1”; “μ2 > μ3”
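The Tukey criterion and the pairwise comparisons can be verified as follows (a sketch; q = 5.63 is read from the studentized-range table as above):

```python
from math import sqrt

def tukey_T(q, mse, n_min):
    """Tukey HSD criterion T = q * sqrt(MSE / smallest n_i)."""
    return q * sqrt(mse / n_min)

T = tukey_T(q=5.63, mse=2.125, n_min=3)  # Problem 9-25
for name, diff in [("1 vs 2", 5.5), ("2 vs 3", 9.5), ("1 vs 3", 4.0)]:
    print(name, "significant" if diff > T else "n.s.")
print(round(T, 3))  # 4.738
```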

9-26. Find which prototype planes are different in Problem 9-18:
MSE = 9,202   ni = 10 for all i   x̄A = 4,407   x̄B = 4,230   x̄C = 4,135
For α = .05, qα(3, 27) ≈ 3.51. T = 3.51 √(9,202/10) = 106.475


CHAPTER 10
SIMPLE LINEAR REGRESSION AND CORRELATION

(The template for this chapter is: Simple Regression.xls.)

10-1. A statistical model is a set of mathematical formulas and assumptions that describe some real-
world situation.

10-2. Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model
parameters; 3) Test the validity of the model; and 4) Use the model.

10-3. Assumptions of the simple linear regression model: 1) A straight-line relationship between X and
Y; 2) The values of X are fixed; 3) The regression errors, ε, are identically normally distributed
random variables, uncorrelated with each other through time.

10-4. β0 is the Y-intercept of the regression line, and β1 is the slope of the line.

10-5. The conditional mean of Y, E(Y | X), is the population regression line.

10-6. The regression model is used for understanding the relationship between the two variables, X and
Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the
variable X.

10-7. The error term captures the randomness in the process. Since X is assumed nonrandom, the
addition of  makes the result (Y) a random variable. The error term captures the effects on Y of a
host of unknown random components not accounted for by the simple linear regression model.

10-8. The equation represents a simple linear regression model without an intercept (constant) term.

10-9. The least-squares procedure produces the best estimated regression line in the sense that the line
lies “inside” the data set. The line is the best unbiased linear estimator of the true regression line,
as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line
parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of the
data points from the line.

10-10. Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the
determination of the estimators of the line parameters because the procedure is based on
minimizing the squared distances from the line. Since outliers have large squared distances they
exert undue influence on the line. A more robust procedure may be appropriate when outliers
exist.


10-11. (Template: Simple Regression.xls, sheet: Regression)


Simple Regression

Income Wealth
X Y Error Quantile Z Confidence Interval for Slope
1 1 17.3 0.8 0.667 0.431    α (1-α) C.I. for β1
2 2 23.6 -3.02 0.167 -0.967    95% 10.12 ± 2.77974
3 3 40.2 3.46 0.833 0.967
4 4 45.8 -1.06 0.333 -0.431    Confidence Interval for Intercept
5 5 56.8 -0.18 0.500 0.000    α (1-α) C.I. for β0
95% 6.38 ± 9.21937

Regression Equation: Wealth Growth = 6.38 + 10.12 Income Quantile

10-12. b1 = SSXY /SSX = 934.49/765.98 = 1.22

10-13. (Template: Simple Regression.xls, sheet: Regression)

Thus, b0 = 3.057 b1 = 0.187

r²   0.9217   Coefficient of Determination
r    0.9601   Coefficient of Correlation
Confidence Interval for Slope
 (1-) C.I. for  1
95% 0.18663 + or - 0.03609 s(b1) 0.0164Standard Error of Slope

Confidence Interval for Intercept


 (1-) C.I. for  0
95% -3.05658 + or - 2.1372 s(b0) 0.97102Standard Error of Intercept

Prediction Interval for Y


 X (1-) P.I. for Y given X
95% 10 -1.19025 + or - 2.8317 s 0.99538Standard Error of prediction

Prediction Interval for E[Y|X]


 X (1-) P.I. for E[Y | X]
+ or -

ANOVA Table
Source SS df MS F Fcritical p-value
Regn. 128.332 1 128.332 129.525 4.84434 0.0000
Error 10.8987 11 0.99079
Total 139.231 12


10-14. b1 = SSXY /SSX = 2.11

b0 = ȳ - b1x̄ = 165.3 - (2.11)(88.9) = -22.279

10-15.
Simple Regression

Inflation Return
X Y Error
1 1 -3 -20.0642
2 2 36 17.9677
3 12.6 12 -16.294
4 -10.3 -8 -14.1247
5 0.51 53 36.4102
6 2.03 -2 -20.0613
7 -1.8 18 3.64648
8 5.79 32 10.2987
9 5.87 24 2.22121

Inflation & return on stocks

r²   0.0873   Coefficient of Determination
r    0.2955   Coefficient of Correlation
Confidence Interval for Slope
 (1-) C.I. for  1
95% 0.96809 + or - 2.7972 s(b1) 1.18294Standard Error of Slope

Confidence Interval for Intercept


 (1-) C.I. for  0
95% 16.0961 + or - 17.3299 s(b0) 7.32883Standard Error of Intercept

s 20.8493Standard Error of prediction

ANOVA Table
Source SS df MS F Fcritical p-value
Regn. 291.134 1 291.134 0.66974 5.59146 0.4401
Error 3042.87 7 434.695
Total 3334 8


[Scatter plot of Return (Y) vs. Inflation (X) with fitted line: y = 0.9681x + 16.096]

There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).

10-16.
Simple Regression

Year Value
X Y Error
1 1960 180000 84000
2 1970 40000 -72000
3 1980 60000 -68000
4 1990 160000 16000
5 2000 200000 40000

Average value of Aston Martin

r²   0.1203   Coefficient of Determination
r    0.3468   Coefficient of Correlation
Confidence Interval for Slope
 (1-) C.I. for  1
95% 1600 + or - 7949.76 s(b1) 2498Standard Error of Slope

Confidence Interval for Intercept


 (1-) C.I. for  0
95% -3040000 + or - 1.6E+07 s(b0) 4946165Standard Error of Intercept

s 78993.7Standard Error of prediction

ANOVA Table
Source SS df MS F Fcritical p-value
Regn. 2.6E+09 1 2.6E+09 0.41026 10.128 0.5674
Error 1.9E+10 3 6.2E+09
Total 2.1E+10 4


[Scatter plot of Value (Y) vs. Year (X) with fitted line: y = 1600x - 3E+06]

There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
Limitations: sample size is very small.
Hidden variables: the 70s and 80s models have a different valuation than other decades possibly
due to a different model or style.

10-17. Regression equation is:


Credit Card Transactions = 177.641 + 0.6202 Debit Card Transactions

r²   0.9624   Coefficient of Determination
r    0.9810   Coefficient of Correlation
Confidence Interval for Slope
 (1-) C.I. for  1
95% 0.6202 + or - 0.17018 s(b1) 0.06129Standard Error of Slope

Confidence Interval for Intercept


 (1-) C.I. for  0
95% 177.641 + or - 110.147 s(b0) 39.6717Standard Error of Intercept

Prediction Interval for Y


 X (1-) P.I. for Y given X
+ or - s 56.9747Standard Error of prediction

Prediction Interval for E[Y|X]


 X (1-) P.I. for E[Y | X]
+ or -

ANOVA Table
Source SS df MS F Fcritical p-value
Regn. 332366 1 332366 102.389 7.70865 0.0005
Error 12984.5 4 3246.12
Total 345351 5

There is no implication for causality. A third-variable influence could be “increases in per capita
income” or “GDP growth”.


10-18. SSE = Σ(y - b0 - b1x)². Take partial derivatives with respect to b0 and b1:

∂/∂b0 [Σ(y - b0 - b1x)²] = -2Σ(y - b0 - b1x)

∂/∂b1 [Σ(y - b0 - b1x)²] = -2Σx(y - b0 - b1x)

Setting the two partial derivatives to zero and simplifying, we get:

Σ(y - b0 - b1x) = 0 and Σx(y - b0 - b1x) = 0. Expanding, we get:

Σy - nb0 - b1Σx = 0 and Σxy - b0Σx - b1Σx² = 0

Solving the above two equations simultaneously for b0 and b1 gives the required results.
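The derivation can be checked numerically: fit a line using the closed-form solutions of the normal equations and confirm that both Σe and Σxe vanish at the optimum. A minimal sketch in Python (the data set is made up purely for illustration):

```python
# Least-squares fit from the normal equations; illustrative data only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

ss_x = sum((xi - xbar) ** 2 for xi in x)                        # SS_X
ss_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))  # SS_XY

b1 = ss_xy / ss_x        # slope from the normal equations
b0 = ybar - b1 * xbar    # intercept

# The two normal equations say these sums are zero at the minimum of SSE:
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sum_e = sum(residuals)
sum_xe = sum(xi * ei for xi, ei in zip(x, residuals))
```

Any data set will do; the two zero conditions are exactly the first-order conditions derived above.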

10-19. 99% C.I. for β1: 1.25533 ± 2.807(0.04972) = [1.1158, 1.3949].


The confidence interval does not contain zero.

10-20. MSE = 7.629


From the ANOVA table for Problem 10-11:
ANOVA Table
Source SS df MS
Regn. 1024.14 1 1024.14
Error 22.888 3 7.62933
Total 1047.03 4

10-21. From the regression results for problem 10-11


s(b0) = 2.897 s(b1) = 0.873
s(b1) 0.87346Standard Error of Slope

s(b0) 2.89694Standard Error of Intercept

10-22. From the regression results for problem 10-11

Confidence Interval for Slope


 (1-) C.I. for  1
95% 10.12 + or - 2.77974

Confidence Interval for Intercept


 (1-) C.I. for  0
95% 6.38 + or - 9.21937

95% C.I. for the slope: 10.12 ± 2.77974 = [7.34026, 12.89974]

95% C.I. for the intercept: 6.38 ± 9.21937 = [-2.83937, 15.59937]


10-23. s(b0) = 0.971, s(b1) = 0.016; the estimate of the error variance is MSE = 0.991. 95% C.I. for β1:
0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.

Confidence Interval for Slope


 (1-) C.I. for  1
95% 0.18663 + or - 0.03609 s(b1) 0.0164Standard Error of Slope

Confidence Interval for Intercept


 (1-) C.I. for  0
95% -3.05658 + or - 2.1372 s(b0) 0.97102Standard Error of Intercept

10-24. s(b0) = 85.44, s(b1) = 0.1534
The estimate of the regression variance is MSE = 8,122.
95% C.I. for β1: 1.5518 ± 2.776(0.1534) = [1.126, 1.978]
Zero is not in the range.

Confidence Interval for Slope


 (1-) C.I. for  1
95% 1.55176 + or - 0.42578 s(b1) 0.15336Standard Error of Slope

Confidence Interval for Intercept


 (1-) C.I. for  0
95% -255.943 + or - 237.219 s(b0) 85.4395Standard Error of Intercept

10-25. s 2 gives us information about the variation of the data points about the computed regression line.

10-26. In correlation analysis the two variables, X and Y, are viewed symmetrically: neither one is
“dependent” and the other “independent,” as is the case in regression analysis. In correlation
analysis we are interested in the relation between two random variables, both assumed normally
distributed.

10-27. From the regression results for problem 10-11:


r 0.9890 Coefficient of Correlation

10-28. r = 0.960

r 0.9601 Coefficient of Correlation


10-29. t(3) = 0.3468 / √[(1 - 0.1203)/3] = 0.640

Accept H0. The two variables are not linearly correlated.

10-30. Yes. For example, suppose n = 5 and r = 0.51; then:

t = r / √[(1 - r²)/(n - 2)] = 1.02, and we do not reject H0. But if we take n = 10,000 and
r = 0.04, the same statistic gives t = 4.00, which leads to strong rejection of H0.

10-31. We have r = 0.875 and n = 10. Conducting the test:

t(8) = r / √[(1 - r²)/(n - 2)] = 0.875 / √[(1 - 0.875²)/8] = 5.11
There is statistical evidence of a correlation between the prices of gold and of copper.
Limitations: data are time-series data, hence not dependent random samples. Also, data set
contains only 10 points.
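The t statistic used in Problems 10-29 through 10-31 is a one-line computation. A sketch in Python (pure standard library; the numbers plugged in are those of Problem 10-31):

```python
import math

def corr_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

t_gold_copper = corr_t(0.875, 10)   # Problem 10-31: t(8) = 5.11
```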

10-34. n = 65, r = 0.37: t(63) = 0.37 / √[(1 - 0.37²)/63] = 3.16
Yes. Significant. There is a correlation between the two variables.

10-35. z′ = ½ ln[(1 + r)/(1 - r)] = ½ ln(1.37/0.63) = 0.3884
μ(z′) = ½ ln[(1 + ρ0)/(1 - ρ0)] = ½ ln(1.22/0.78) = 0.2237
σ(z′) = 1/√(n - 3) = 1/√62 = 0.127
z = (z′ - μ(z′))/σ(z′) = (0.3884 - 0.2237)/0.127 = 1.297. Cannot reject H0.
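The Fisher z′ computation maps directly onto `math.atanh`, since ½ ln[(1 + r)/(1 - r)] = atanh(r). A sketch with the values of Problem 10-35:

```python
import math

def fisher_z_stat(r: float, rho0: float, n: int) -> float:
    """z statistic for H0: rho = rho0, via the Fisher z' transformation."""
    z_prime = math.atanh(r)        # transformed sample correlation
    mu = math.atanh(rho0)          # transformed null value
    sigma = 1.0 / math.sqrt(n - 3)
    return (z_prime - mu) / sigma

z_stat = fisher_z_stat(0.37, 0.22, 65)   # approx. 1.297: cannot reject H0
```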

10-36. Using the “TINV(α, df)” function in Excel, where df = n - 2 = 52: =TINV(0.05,52) = 2.006645
And TINV(0.01, 52) = 2.6737
Reject H0 at 0.05 but not at 0.01. There is evidence of a linear relationship at  = 0.05 only.

10-37. t (16) = b1/s(b1) = 3.1/2.89 = 1.0727.


Do not reject H0. There is no evidence of a linear relationship at any α.

10-38. Using the regression results for problem 10-11:


critical value of t is: t( 0.05, 3) = 3.182
computed value of t is: t = b1/s(b1) = 10.12 / 0.87346 = 11.586
Reject H0. There is strong evidence of a linear relationship.


10-39. t (11) = b1/s(b1) = 0.187/0.016 = 11.69


Reject H0. There is strong evidence of a linear relationship between the two variables.

10-40. b1/ s(b1) = 1600/2498 = 0.641


Do not reject H0. There is no evidence of a linear relationship.

10-41. t (58) = b1/s(b1) = 1.24/0.21 = 5.90


Yes, there is evidence of a linear relationship.

10-42. Using the Excel function, TDIST(x,df,#tails) to estimate the p-value for the t-test results, where
x = 1.51, df = 585692 – 2 = 585690, #tails = 2 for a 2-tail test:
TDIST(1.51, 585690,2) = 0.131.
The corresponding p-value for the results is 0.131. The regression is not significant even at the
0.10 level of significance.
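With df = 585,690 the t distribution is indistinguishable from the standard normal, so the same p-value can be reproduced without Excel using only the standard library:

```python
import math

def two_tail_p_normal(t: float) -> float:
    """Two-tailed p-value from the normal approximation to t (huge df)."""
    phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))   # standard normal CDF
    return 2.0 * (1.0 - phi)

p = two_tail_p_normal(1.51)   # approx. 0.131, matching TDIST(1.51, 585690, 2)
```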

10-43. t (211) = z = b1/s(b1) = 0.68/12.03 = 0.0565


Do not reject H0. There is no evidence of a linear relationship at any α. (Why report such
results?)

10-44. b1 = 5.49 s(b1) = 1.21 t (26) = 4.537


Yes, there is evidence of a linear relationship.

10-45. The coefficient of determination indicates that 9% of the variation in customer satisfaction can
be explained by the changes in a customer’s materialism measurement.

10-46 a. The model should not be used for prediction purposes because only 2.0% of the
variation in pension funding is explained by its relationship with firm profitability.
b. The model explains virtually nothing.
c. Probably not. The model explains too little.

10-47. In Problem 10-11 regression results, r 2 = 0.9781. Thus, 97.8% of the variation in wealth growth
is explained by the income quantile.
r²   0.9781   Coefficient of Determination

10-48. In Problem 10-13, r 2 = 0.922. Thus, 92.2% of the variation in the dependent variable is
explained by the regression relationship.

10-49. r 2 in Problem 10-16: r 2 = 0.1203

10-50. Reading directly from the MINITAB output: r 2 = 0.962


r²   0.9624   Coefficient of Determination

10-51. Based on the coefficient of determination values for the five countries, the UK model explains
31.7% of the variation in long-term bond yields relative to the yield spread. This is the best
predictive model of the five. The next best model is the one for Germany, which explains 13.3%
of the variation. The regression models for Canada, Japan, and the US do not predict long-term
yields very well.

10-52. From the information provided, the slope coefficient of the equation is equal to -14.6. Since its
value is not close to zero (which would indicate that a change in bond ratings has no impact on
yields), it would indicate that a linear relationship exists between bond ratings and bond yields.
This is in line with the reported coefficient of determination of 61.56%.

10-53. r² in Problem 10-15: r² = 0.0873

10-54. Σ(y - ȳ)² = Σ[(ŷ - ȳ) + (y - ŷ)]² = Σ[(ŷ - ȳ)² + 2(ŷ - ȳ)(y - ŷ) + (y - ŷ)²]

= Σ(ŷ - ȳ)² + 2Σ(ŷ - ȳ)(y - ŷ) + Σ(y - ŷ)²

But: 2Σ(ŷ - ȳ)(y - ŷ) = 2Σŷ(y - ŷ) - 2ȳΣ(y - ŷ) = 0
because the first term on the right is the sum of the weighted regression residuals, which sum to
zero. The second term is the sum of the residuals, which is also zero. This establishes the result:
Σ(y - ȳ)² = Σ(ŷ - ȳ)² + Σ(y - ŷ)².

10-55. From Equation (10-10): b1 = SSXY /SSX. From Equation (10-31): SSR = b1SSXY.
Hence, SSR = (SSXY /SSX)SSXY = (SSXY)²/SSX.

10-56. Using the results for problem 10-11:


F = 134.238; critical value F(1,3) = 10.128. Reject H0.
F Fcritical p-value
134.238 10.128 0.0014

10-57. F(1,11) = 129.525; t(11) = 11.381; t² = 11.381² = 129.53, the F-statistic value already calculated.

F Fcritical p-value
129.525 4.84434 0.0000

10-58. F(1,4) = 102.39; t(4) = 10.119; t² = (10.119)² = 102.39 = F


F Fcritical p-value
102.389 7.70865 0.0005

10-59. F (1,7) = 0.66974 Do not reject H0.

10-60. F(1,102) = MSR/MSE = (87,691/1) / (12,745/102) = 701.8
There is extremely strong evidence of a linear relationship between the two variables.

10-61. t²(k) = F(1,k). Thus, F(1,20) = [b1/s(b1)]² = (2.556/4.122)² = 0.3845

Do not reject H0. There is no evidence of a linear relationship.

10-62. t²(k) = [b1/s(b1)]² = [(SSXY /SSX) / (s/√SSX)]²
[using Equations (10-10) and (10-15) for b1 and s(b1), respectively]
= (SSXY /SSX)² / (MSE/SSX) = (SS²XY /SSX) / MSE = (SSR/1) / MSE = MSR/MSE = F(1,k)
[because SS²XY /SSX = SSR, by Equations (10-31) and (10-10)]

10-63. a. Heteroscedasticity.
b. No apparent inadequacy.
c. Data display curvature, not a straight-line relationship.

10-64. a. No apparent inadequacy.


b. A pattern of increase with time.

10-65. a. No serious inadequacy.


b. Yes. A deviation from the normal-distribution assumption is apparent.


10-66. Using the results for problem 10-11:


Residual Analysis Durbin-Watson statistic
d 3.39862

[Residual plot: Error vs. X]

Residual variance fluctuates; with only 5 data points the residuals appear to be normally
distributed.

[Normal probability plot of residuals: corresponding normal Z vs. residuals]


10-67. Residuals plotted against the independent variable of Problem 10-14:


*
resids

1.2+
* *
*

0.0+ * *
* *
*
* *

-1.2+ * *

Quality
30 40 50 60 70 80

No apparent inadequacy.

Residual Analysis Durbin-Watson statistic


d 2.0846

10-68.


Residual Analysis Durbin-Watson statistic


d 1.70855

Plot shows some curvature.

10-69. In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35.

P.I. = 6,551.35 ± (2.069)(318.16)√[1 + 1/25 + (5,000 - 3,177.92)²/40,947,557.84]
= [5,854.4, 7,248.3]
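The interval can be reproduced in a few lines; all inputs below are the summary quantities quoted in the worked solution (critical t, s, x̄, and SSX):

```python
import math

n = 25
t_crit = 2.069              # t(0.025, 23)
s = 318.16                  # standard error of estimate
xbar = 3177.92
ss_x = 40947557.84
x0 = 5000.0
y_hat = 274.85 + 1.2553 * x0     # point prediction, 6551.35

half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / ss_x)
low, high = y_hat - half, y_hat + half   # approx. [5854.4, 7248.3]
```

Swapping in t.005(23) = 2.807 for `t_crit` gives the 99% interval of Problem 10-71.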

10-70. Given that the slope of the equation for 10-52 is –14.6, if the rating falls by 3 the yield should
increase by 43.8 basis points.

10-71. For a 99% P.I.: t.005(23) = 2.807

6,551.35 ± (2.807)(318.16)√[1 + 1/25 + (5,000 - 3,177.92)²/40,947,557.84]
= [5,605.75, 7,496.95]

10-72. Point prediction: ŷ = 6.38 + 10.12(4) = 46.86


The 99% P.I.: [28.465, 65.255]
Prediction Interval for Y
 X (1-) P.I. for Y given X
99% 4 46.86 + or - 18.3946


10-73. The 99% P.I.: [36.573, 77.387]


Prediction Interval for Y
 X (1-) P.I. for Y given X
99% 5 56.98 + or - 20.407

10-74. The 95% P.I.: [-142633, 430633]


Prediction Interval for Y
 X (1-) P.I. for Y given X
95% 1990 144000 + or - 286633

10-75. The 95% P.I.: [-157990, 477990]


Prediction Interval for Y
 X (1-) P.I. for Y given X
95% 2000 160000 + or - 317990

10-76. Point prediction: ŷ = 16.0961 + 0.96809(5) = 20.9365

10-77.
a) simple regression equation: Y = 2.779337 X – 0.284157
when X = 10, Y = 27.5092
Intercept Slope
b0 b1
-0.284157 2.779337

b) forcing through the origin: regression equation: Y = 2.741537 X.

Intercept Slope
b0 b1
0 2.741537

When X = 10, Y = 27.41537


Prediction
X Y
10 27.41537

c) forcing through (5, 13): regression equation: Y = 2.825566 X – 1.12783

Intercept Slope Prediction


b0 b1 X Y
-1.12783 2.825566 5 13


CHAPTER 11
MULTIPLE REGRESSION

(The template for this chapter is: Multiple Regression.xls.)

11-1. The assumptions of the multiple regression model are that the errors are normally and
independently distributed with mean zero and common variance σ². We also assume that the Xi
are fixed quantities rather than random variables; at any rate, they are independent of the error
terms. The assumption of normality of the errors is needed for conducting tests about the
regression model.

11-2. Holding advertising expenditures constant, sales volume increases by 1.34 units, on average, per
increase of 1 unit in promotional experiences.

11-3. In a correlational analysis, we are interested in the relationships among the variables. On the
other hand, in a regression analysis with k independent variables, we are interested in the effects
of the k variables (considered fixed quantities) on the dependent variable only (and not on one
another).

11-4. A response surface is a generalization to higher dimensions of the regression line of simple linear
regression. For example, when 2 independent variables are used, each in the first order only, the
response surface is a plane in 3-dimensional Euclidean space. When 7 independent variables are
used, each in the first order, the response surface is a 7-dimensional hyperplane in 8-dimensional
Euclidean space.

11-5. 8 equations.

11-6. The least-squares estimators of the parameters of the multiple regression model, obtained as
solutions of the normal equations.

11-7. ΣY = nb0 + b1ΣX1 + b2ΣX2
ΣX1Y = b0ΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = b0ΣX2 + b1ΣX1X2 + b2ΣX2²

852 = 100b0 + 155b1 + 88b2


11,423 = 155b0 + 2,125b1 + 1,055b2
8,320 = 88b0 + 1,055b1 + 768b2

b0 = (852 – 155b1 – 88b2)/100


11,423 = 155(852 – 155b1 – 88b2)/100 + 2,125b1 + 1,055b2
8,320 = 88(852 – 155b1 – 88b2)/100 + 1,055b1 + 768b2


Continue solving the equations to obtain the solutions:


b0 = 1.1454469 b1 = 0.0487011 b2 = 10.897682

11-8. Using SYSTAT:


DEP VAR: VALUE N: 9 MULTIPLE R: .909 SQUARED MULTIPLE R: .826
ADJUSTED SQUARED MULTIPLE R: .769
STANDARD ERROR OF ESTIMATE: 59.477

VARIABLE COEFFICIENT STD ERROR STD COEF TOLERANCE T P(2TAIL)

CONSTANT 9.800 80.763 0.000 0.121 0.907


SIZE 0.173 0.040 0.753 0.9614430 4.343 0.005
DISTANCE 31.094 14.132 0.382 0.9614430 2.200 0.070

ANALYSIS OF VARIANCE

SOURCE SUM-OF-SQUARES DF MEAN-SQUARE F-RATIO P

REGRESSION 101032.867 2 50516.433 14.280 0.005


RESIDUAL 21225.133 6 3537.522

Multiple Regression Results Value

0 1 2 3 4 5 6 7 8
Intercept Size Distance
b -9.7997 0.17331 31.094
s(b) 80.7627 0.0399 14.132
t -0.1213 4.34343 2.2002
p-value 0.9074 0.0049 0.0701

VIF 1.0401 1.0401

ANOVA Table
Source SS df MS F FCritical p-value
Regn. 101033 2 50516 14.28 5.1432 0.0052 s 59.477
Error 21225.1 6 3537.5
Total 122258 8 15282 R² 0.8264 Adjusted R² 0.7685


11-9. With no advertising and no spending on in-store displays, sales are b0  47.165 (thousands) on
the average. Per each unit (thousand) increase in advertising expenditure, keeping in-store
display expenditure constant, there is an average increase in sales of b1 = 1.599 (thousand).
Similarly, for each unit (thousand) increase in in-store display expenditure, keeping advertising
constant, there is an average increase in sales of b2 = 1.149 (thousand).

11-10. We test whether there is a linear relationship between Y and any of the Xi variables (that is, with
at least one of the Xi). If the null hypothesis is not rejected, there is nothing more to do since
there is no evidence of a regression relationship. If H0 is rejected, we need to conduct further
analyses to determine which of the variables have a linear relationship with Y and which do not,
and we need to develop the regression model.

11-11. Degrees of freedom for error = n - 13.

11-12. k = 2 n = 82 SSE = 8,650 SSR = 988


MSR = SSR / k = 988 / 2 = 494
SST = SSR + SSE = 988 + 8650 = 9638
MSE = SSE / [n - (k+1)] = 8650 / 79 = 109.4937
F = MSR / MSE = 494 / 109.4937 = 4.5116
Using Excel function to return the p-value, FDIST(F, dfN, dfD), where F is the F-test result and
the df’s refer to the degrees of freedom in the numerator and denominator, respectively.

FDIST(4.5116, 2, 79) = 0.013953


Yes, there is evidence of a linear regression relationship at  = 0.05, but not at 0.01.
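The same ANOVA arithmetic can be scripted directly from the summary quantities given in the problem:

```python
# ANOVA quantities for Problem 11-12 from k, n, SSR, SSE.
k, n = 2, 82
ssr, sse = 988.0, 8650.0

msr = ssr / k                # mean square regression
mse = sse / (n - (k + 1))    # mean square error, df = 79
f_stat = msr / mse           # approx. 4.51
sst = ssr + sse              # 9638
```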

11-13. F(4,40) = MSR/MSE = (7,768/4) / [(15,673 - 7,768)/40] = 1,942/197.625 = 9.827
Yes, there is evidence of a linear regression relationship between Y and at least one of the
independent variables.

11-14. Source SS df MS F
Regression 7,474.0 3 2,491.33 48.16
Error 672.5 13 51.73
Total 8,146.5 16

Since the F-ratio is highly significant, there is evidence of a linear regression relationship
between overall appeal score and at least one of the three variables prestige, comfort, and
economy.

11-15. When the sample size is small and the degrees of freedom for error are relatively small, so that
adding a variable, and thus losing a degree of freedom for error, is a substantial cost.


11-16. R 2 = SSR/SST. As we add a variable, SSR cannot decrease. Since SST is constant, R 2 cannot
decrease.

11-17. No. The adjusted coefficient is used in evaluating the importance of new variables in the
presence of old ones. It does not apply in the case where all we consider is a single independent
variable.

11-18. By the definition of the adjusted coefficient of determination, Equation (11-13):

R̄² = 1 - [SSE/(n - k - 1)] / [SST/(n - 1)] = 1 - (SSE/SST)·(n - 1)/(n - k - 1)

But SSE/SST = 1 - R², so the above is equal to:

R̄² = 1 - (1 - R²)·(n - 1)/[n - (k + 1)], which is Equation (11-14).

11-19. The mean square error gives a good indication of the variation of the errors in regression.
However, other measures such as the coefficient of multiple determination and the adjusted
coefficient of multiple determination are useful in evaluating the proportion of the variation in
the dependent variable explained by the regressionthus giving us a more meaningful measure
of the regression fit.

11-20. Given an adjusted R 2 = 0.021, only 2.1% of the variation in the stock return is explained by the
four independent variables.
Using Excel function to return the p-value, FDIST(F, dfN, dfD), where F is the F-test result and
the df’s refer to the degrees of freedom in the numerator and denominator, respectively.

FDIST(2.27, 4, 433) = 0.06093


There is evidence of a linear regression relationship at  = 0.10 only.

11-21. R² = 7,474.0/8,146.5 = 0.9174. A good regression.

R̄² = 1 - (1 - 0.9174)(16/13) = 0.8983    s = √MSE = √51.73 = 7.192

11-22. Given R² = 0.94, k = 2, and n = 383, the adjusted R² is:

R̄² = 1 - (1 - R²)·(n - 1)/[n - (k + 1)] = 1 - (1 - 0.94)(382/380) = 0.9397

Therefore, security and time effects characterize 93.97% of the variation in market price. Given
the value of the adjusted R², the model is a reliable predictor of market price.
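Equation (11-14) is a one-liner worth scripting; the same function reproduces the adjusted R² values of Problems 11-22 and 11-23 (for 11-23, n = 17 and k = 4 are inferred from the 16/12 ratio in the worked solution):

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted coefficient of determination, Equation (11-14)."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

r2_bar_22 = adjusted_r2(0.94, 383, 2)    # Problem 11-22, approx. 0.9397
r2_bar_23 = adjusted_r2(0.918, 17, 4)    # Problem 11-23, approx. 0.8907
```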

n 1
11-23. R 2 = 1  (1  R 2) = 1  (1  0.918)(16/12) = 0.8907
n  (k  1)
Since R 2 has decreased, do not include the new variable.


11-24. Given R 2 = 0.769, k = 6 and n = 242


n 1
R 2 = 1  (1  R 2) = 1  (1  0.769)(241/235) = 0.7631
n  (k  1)
Since R 2 =76.31%, approximately 76% of the variation in the information price is characterized
by the 6 independent marketing variables.
Using Excel function to return the p-value, FDIST(F, dfN, dfD), where F is the F-test result and
the df’s refer to the degrees of freedom in the numerator and denominator, respectively.

FDIST(44.8, 6, 235) = 2.48855E-36

There is evidence of a linear regression relationship at all α’s.

11-25. a. The regression expresses stock returns as a plane in space, with firm size ranking and
stock price ranking as the two horizontal axes:
RETURN = 0.484 - 0.030(SIZRNK)  0.017(PRCRNK)

The t-test for a linear relationship between returns and firm size ranking is highly significant,
but not for returns against stock price ranking.

b. We know that R̄² = 0.093 and n = 50, k = 2. Using Equation (11-14) we calculate:

(1 - R²)·(n - 1)/[n - (k + 1)] = 1 - R̄²
R² = 1 - (1 - R̄²)·[n - (k + 1)]/(n - 1) = 1 - (1 - 0.093)(47/49) = 0.130

Thus, 13% of the variation is due to the two independent variables.

c. The adjusted R 2 is quite low, indicating that the regression on both variables is not a good
model. They should try regressing on size alone.

n 1
11-26. R 2 = 1 – (1 -– R 2) = 1 – (1 – 0.72)(712/710) = 0.719
n  (k  1)
Based solely on this information, this is not a bad regression model.

11-27. k = 8 n = 500 SSE = 6179 SST = 23108


Source SS df MS F
Regn. 16929 8 2116.125 168.153
Error 6179 491 12.5845
Total 23108 499


Using Excel function to return the p-value, FDIST(F, dfN, dfD), where F is the F-test result and
the df’s refer to the degrees of freedom in the numerator and denominator, respectively.

FDIST(168.153, 8, 491) = 0.00 approximately

There is evidence of a linear regression relationship at all α’s.


R² = SSR/SST = 0.7326    R̄² = 1 - [SSE/(n - (k + 1))] / [SST/(n - 1)] = 0.7282    MSE = 12.5845

11-28. A joint confidence region for both parameters is a set of pairs of likely values of β1 and β2 at
95%. This region accounts for the mutual dependency of the estimators and hence is elliptical
rather than rectangular. This is why the region may not contain a bivariate point included in the
separate univariate confidence intervals for the two parameters.

11-29. Assuming a very large sample size, we use the following formula for testing the significance of
each of the slope parameters: z = bi /s(bi), and use α = 0.05. Critical value of |z| = 1.96.
For firm size: z = 0.06/0.005 = 12.00 (significant)
For firm profitability: z = -5.533 (significant)
For fixed-asset ratio: z = -0.08
For growth opportunities: z = -0.72
For nondebt tax shield: z = 4.29 (significant)
The slope estimates with respect to “firm size”, “firm profitability” and “nondebt tax shield” are
not zero. The adjusted R-square indicates that 16.5% of the variation in governance level is
explained by the five independent variables. Next step: exclude “fixed-asset ratio” and “growth
opportunities” from the regression and see what happens to the adjusted R-square.
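The large-sample z screening above is easy to automate. Only the firm-size pair (estimate 0.06, standard error 0.005) is quoted explicitly in the solution, so that is the one checked below; the other coefficients would follow the same pattern:

```python
def slope_z(b: float, se: float) -> float:
    """Large-sample z statistic for H0: beta_i = 0."""
    return b / se

Z_CRIT = 1.96   # two-tailed critical value at alpha = 0.05

z_size = slope_z(0.06, 0.005)            # firm size: z = 12.0
size_significant = abs(z_size) > Z_CRIT  # reject H0 for firm size
```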

11-30. 1. The usual caution about the possibility of a Type 1 error.


2. Multicollinearity may make the tests unreliable.
3. Autocorrelation in the errors may make the tests unreliable.

11-31. 95% C.I.’s for β2 through β5:

β2: 5.6 ± 1.96(1.3) = [3.052, 8.148]
β3: 10.35 ± 1.96(6.88) = [-3.135, 23.835]
β4: 3.45 ± 1.96(2.7) = [-1.842, 8.742]
β5: -4.25 ± 1.96(0.38) = [-4.995, -3.505]
β3 & β4: contains the point (0,0)

11-32. Use the following formula for testing the significance of each of the slope parameters:
z = bi /s(bi), and use α = 0.05. Critical value of |z| = 1.96.


For unexpected accruals: z = -2.0775 / 0.4111 = -5.054 (significant)


For auditor quality: z = 0.5176
For return on investment: z = 1.7785
For expenditure on R&D: z = 2.1161 (significant)
The R-square indicates that 36.5% of the variation in a firm’s reputation can be explained by the
four independent variables listed.

11-33. Yes. Considering the joint confidence region for both slope parameters is equivalent to
conducting an F test for the existence of a linear regression relationship. Since (0,0) is not in the
joint 95% region, this is equivalent to rejecting the null hypothesis of the F test at  = 0.05.

11-34. Prestige is not significant (or at least appears so, pending further analysis). Comfort and
Economy are significant (Comfort only at the 0.05 level). The regression should be rerun with
variables deleted.

11-35. Variable Lend seems insignificant because of collinearity with M1 or Price.

11-36. a. As Price is dropped, Lend becomes significant: there is, apparently, a collinearity between
Lend and Price.
b.,c. The best model so far is the one in Table 11-9, with M1 and Price only. The adjusted R 2 for
that model is higher than for the other regressions.
d. For the model in this problem, MINITAB reports F = 114.09. Highly significant. For the
model in Table 11-9: F = 150.67. Highly significant.
e. s = 0.3697. For Problem 11-35: s = 0.3332. As a variable is deleted, s (and its square, MSE)
increases.
f. In Problem 11-35: MSE = s 2 = (0.3332)2 = 0.111.

11-37. Autocorrelation of the regression error may cause this.

11-38. Use the following formula for testing the significance of each of the slope parameters:
z = bi /s(bi), and use α = 0.05. Critical value of |z| = 1.96.
For new technological process: z = -0.014 / 0.004 = -3.50 (significant)
For organizational innovation: z = 0.25
For commercial innovation: z = 3.2 (significant)
For R&D: z = 4.50 (significant)

All but “organizational innovation” are important independent variables in explaining
employment growth. The R-square indicates that 74.3% of the variation in employment growth
is explained by the four independent variables in the equation.


11-39. Regress Profits on Employees and Revenues

Multiple Regression

Y 1 X1 X2 Multiple Regression Results


Sl.No. Profits Ones Employees Revenues
1 -1221 1 96400 17440 0 1 2
2 -2808 1 63000 13724 Intercept Employees Revenues
3 -773 1 70600 13303 b 834.9510193 0.0085493 -0.174148688
4 248 1 39100 9510 s(b) 621.1993315 0.064416986 0.340929503
t 1.344095167 0.132718098 -0.510805567
5 38 1 37680 8870
p-value 0.2208 0.8982 0.6252
6 1461 1 31700 6846
7 442 1 32847 5937
VIF 29.8304 29.8304
8 14 1 12867 2445
9 57 1 11475 2254
10 108 1 6000 1311

ANOVA Table
Source SS df MS F FCritical p-value
Regn. 4507008.861 2 2253504.43 2.166 4.737 0.1852 s 1019.925
Error 7281731.539 7 1040247.363
Total 11788740.4 9 1309860.044 R² 0.3823 Adjusted R² 0.2058

Correlation matrix

1 2
Employees Revenues
1 Employees 1.0000
2 Revenues 0.9831 1.0000

Y Profits -0.5994 -0.6171

Regression Equation:
Profits = 834.95 + 0.009 Employees - 0.174 Revenues
The regression equation is not significant (F value), and there is a large amount of
multicollinearity present between the two independent variables (0.9831). There is so much
multicollinearity present that the negative partial correlations between the independent variables
and profits are not maintained in the regression results (both of the parameters of the independent
variables should be negative). None of the values of the parameters are significant.

11-40. The residual plot exhibits both heteroscedasticity and a curvature apparently not accounted for in
the model.
