CHAPTER 8
THE COMPARISON OF TWO POPULATIONS
8-2.  n = 40   D̄ = 5   s_D = 2.3
H0: μ_D = 0    H1: μ_D ≠ 0
t(39) = (5 − 0)/(2.3/√40) = 13.75
Strongly reject H0. 95% C.I. for μ_D: 5 ± 2.023(2.3/√40) = [4.26, 5.74].
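The paired-difference arithmetic above can be checked with a few lines of Python (the helper name `paired_t` is ours, not from the text; 2.023 is the t table value for t_{.025,39}):

```python
import math

# Paired-difference t test for Problem 8-2 (values from the text);
# the helper name paired_t is ours, not from the manual.
def paired_t(d_bar, s_d, n):
    se = s_d / math.sqrt(n)          # standard error of the mean difference
    return d_bar / se, se

t, se = paired_t(5, 2.3, 40)
ci = (5 - 2.023 * se, 5 + 2.023 * se)   # t_{.025,39} = 2.023 from the t table
print(round(t, 2), [round(x, 2) for x in ci])   # 13.75 [4.26, 5.74]
```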
Chapter 08 - The Comparison of Two Populations
At α = 0.05, we reject H0. There are more viewers for movies than for commercials.
8-4.  n = 60   D̄ = 0.2   s_D = 1
H0: μ_D ≤ 0    H1: μ_D > 0
t(59) = (0.2 − 0)/(1/√60) = 1.549. At α = 0.05, we cannot reject H0.
Reject H0. There is strong evidence that hotels in Spain are cheaper than those in France,
based on this small sample. p-value = 0.0139
t(19) = (1.25 − 0)/(42.89/√20) = 0.13
Do not reject H0; no evidence of a difference.
8-10.  n1 = n2 = 30
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
Nikon (1): x̄1 = 8.5, s1 = 2.1    Minolta (2): x̄2 = 7.8, s2 = 1.8
z = (8.5 − 7.8)/√(2.1²/30 + 1.8²/30) = 1.386
Do not reject H0. There is no evidence of a difference in the average ratings of the two cameras.
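A quick sketch of the large-sample z statistic used here, with the Problem 8-10 values; the function name is our own:

```python
import math

# Two-sample z statistic for Problem 8-10 (large independent samples,
# values from the text); the helper name two_sample_z is ours.
def two_sample_z(x1, s1, n1, x2, s2, n2):
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    return (x1 - x2) / se

z = two_sample_z(8.5, 2.1, 30, 7.8, 1.8, 30)
print(round(z, 3))   # 1.386, below 1.96, so do not reject H0
```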
Evidence:     Sample 1: n = 32, x̄ = 2.5, s = 0.41;  Sample 2: n = 35, x̄ = 4.32, s = 0.87
Assumptions:  Populations normal. H0: population variances equal (F ratio = 4.50268, p-value = 0.0001)
Reject H0. There is evidence that the average Bel Air price is lower.
Reject the null hypothesis. Global equities outperform the U.S. market.
Evidence:  Sample 1: n = 128, x̄ = 23.5, population σ = 12.2
           Sample 2: n = 212, x̄ = 18, population σ = 10.5
Hypothesis Testing
S_p² = [(13 − 1)(7.622)² + (13 − 1)(4.292)²] / (13 + 13 − 2) = 38.2581
t(24) = (20.385 − 10.385) / √( 38.2581(1/13 + 1/13) ) = 4.1219
df = 24. Use a critical value of 2.064 for a two-tailed test. Reject H0. The two methods do differ.
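As a check on the pooled-variance arithmetic, a short Python sketch; the sample standard deviations 7.622 and 4.292 are read off the reconstructed computation above, and `pooled_t` is our name:

```python
import math

# Pooled-variance (equal-variance) two-sample t statistic, using the sample
# values shown above; the helper name pooled_t is ours.
def pooled_t(x1, s1, n1, x2, s2, n2):
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se, sp2

t, sp2 = pooled_t(20.385, 7.622, 13, 10.385, 4.292, 13)
print(round(sp2, 3), round(t, 3))   # 38.258 4.122
```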
e.
S_p² = [(10 − 1)(1002.5)² + (11 − 1)(876.05)²] / (10 + 11 − 2) = 879983.804
t(19) = (4238 − 3888.72) / √( 879983.804(1/10 + 1/11) ) = 0.8522
df = 19
Evidence:     Sample 1: n = 28, x̄ = 0.19, s = 5.72;  Sample 2: n = 28, x̄ = 0.72, s = 5.1
Assumptions:  Populations normal. H0: population variances equal (F ratio = 1.25792, p-value = 0.5552)
= 2.54 ± 1.96 √( (.64)²/255 + (.85)²/300 ) = [2.416, 2.664] percent.
Evidence:  Sample 1: n = 25, x̄ = 87, s = 12;  Sample 2: n = 20, x̄ = 64, s = 23
Reject the null hypothesis: the average cost of beer is lower in Prague. Londoners save
between $3.74 and $6.26.
H0: μ1 − μ2 = 0    H1: μ1 − μ2 ≠ 0
t-Test for Difference in Population Means
Evidence:     US: n = 15, x̄ = 3.8, s = 2.2;  China: n = 18, x̄ = 6.1, s = 5.3
Assumptions:  Populations normal. H0: population variances equal (F ratio = 5.80372, p-value = 0.0018)
Do not reject the null hypothesis (p-value = 0.1073): there is no evidence that investment
returns differ between China and the US.
8-23. Take the proposed route as population 1 and the alternate route as population 2. Assume equal
variances for both populations.
H0: μ1 − μ2 ≤ 0
H1: μ1 − μ2 > 0
p-value from the template = 0.8674; cannot reject H0.
Evidence:     Sample 1: n = 20, x̄ = 3.56, s = 2.8;  Sample 2: n = 20, x̄ = 4.84, s = 3.2
Assumptions:  Populations normal. H0: population variances equal (F ratio = 1.30612, p-value = 0.5662)
Evidence:  Sample 1: n = 25, x̄ = 12, s = 2.5;  Sample 2: n = 25, x̄ = 13.5, s = 1
Evidence:     Sample 1: n = 8, x̄ = 3, s = 2;  Sample 2: n = 10, x̄ = 2.3, s = 2.1
Assumptions:  Populations normal. H0: population variances equal (F ratio = 1.1025, p-value = 0.9186)
Do not reject the null hypothesis (p-value = 0.2417). The new advertising firm has not resulted
in significantly higher sales.
(x̄2 − x̄1) ± 2.011 √( [(n1 − 1)s1² + (n2 − 1)s2²] / (n1 + n2 − 2) ) √( 1/n1 + 1/n2 )
= (13.5 − 12) ± 2.011 √( [24(2.5)² + 24(1)²] / 48 ) √( 1/25 + 1/25 )
= [0.4170, 2.5830] percent.
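The same pooled-variance confidence interval can be verified numerically; the sample values and the t_{.025,48} = 2.011 critical value are taken from the text:

```python
import math

# 95% C.I. for mu2 - mu1 with a pooled variance estimate; sample values and
# the critical value t_{.025,48} = 2.011 come from the text.
n1 = n2 = 25
s1, s2 = 2.5, 1.0
diff = 13.5 - 12.0
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
hw = 2.011 * math.sqrt(sp2 * (1 / n1 + 1 / n2))   # half-width of the interval
lo, hi = diff - hw, diff + hw
print(round(lo, 3), round(hi, 3))   # 0.417 2.583
```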
Evidence:  Sample 1: n = 100, x = 85 successes, p̂ = 0.8500
           Sample 2: n = 100, x = 68 successes, p̂ = 0.6800
Hypothesis Testing: hypothesized difference = zero
8-31.  n1 = 31, x1 = 11    n2 = 50, x2 = 19
H0: p1 − p2 = 0    H1: p1 − p2 ≠ 0
z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) ) = −0.228
Do not reject H0. There is no evidence that one corporate raider is more successful than the other.
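The pooled two-proportion z statistic can be sketched as follows, using the Problem 8-31 counts; the function name is our own:

```python
import math

# Pooled two-proportion z statistic for Problem 8-31 (counts from the text);
# the helper name two_prop_z is ours.
def two_prop_z(x1, n1, x2, n2):
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                 # pooled proportion under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = two_prop_z(11, 31, 19, 50)
print(round(z, 3))   # -0.228, far inside +/-1.96, so do not reject H0
```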
8-33. 95% C.I. for p2 − p1:  (p̂2 − p̂1) ± 1.96 √( p̂1(1 − p̂1)/n1 + p̂2(1 − p̂2)/n2 )
= .06 ± 1.96 √( (.13)(.87)/2,060 + (.19)(.81)/5,000 ) = [0.0419, 0.0781]
We are 95% confident that the increase in the proportion of the population preferring California
wines is anywhere from 4.19% to 7.81%.
Confidence Interval
95%:  0.0600 ± 0.0181 = [0.0419, 0.0782]
8-34. The statement to be tested must be hypothesized before looking at the data:
Chase Man. (1): n1 = 650, x1 = 48
Manuf. Han. (2): n2 = 480, x2 = 20
H0: p1 − p2 ≤ 0    H1: p1 − p2 > 0
z = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/n1 + 1/n2) ) = 2.248
Reject H0. p-value = 0.0122.
z = (.283 − .205) / √( (.234)(1 − .234)(1/120 + 1/200) ) = 1.601
At α = 0.05, there is no evidence to conclude that the proportion of American executives who
prefer the A380 is greater than that of European executives. (p-value = 0.0547.)
Do not reject the null hypothesis: the proportions are not significantly different.
8-40. Old method (1): n1 = 40, s1² = 1,288
New method (2): n2 = 15, s2² = 1,112
H0: σ1² ≤ σ2²    H1: σ1² > σ2²    use α = .05
F(39,14) = s1²/s2² = 1,288/1,112 = 1.158
The critical point at α = .05 is F(39,14) = 2.27 (using approximate df in the table). Do not reject
H0. There is no evidence that the variance of the new production method is smaller.
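The variance-ratio computation is a one-liner; the sample variances are from the text:

```python
# Variance-ratio F statistic for Problem 8-40 (sample variances from the text).
s1_sq, s2_sq = 1288, 1112   # old method (n1 = 40), new method (n2 = 15)
F = s1_sq / s2_sq           # F(39, 14) under H0: equal variances
print(round(F, 3))          # 1.158, below the 2.27 critical value
```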
Sample 1: n = 40, variance = 1288;  Sample 2: n = 15, variance = 1112
Null hypothesis              p-value (at an α of 5%)
H0: σ1² − σ2² = 0            0.7977
H0: σ1² − σ2² ≥ 0            0.6012
H0: σ1² − σ2² ≤ 0            0.3988
F = 1.1025
Assumptions: Populations normal. H0: population variances equal (F ratio = 1.1025, p-value = 0.9186)
F(24,24) = s1²/s2² = (2.5)²/(1)² = 6.25
Chapter 09 - Analysis of Variance
CHAPTER 9
ANALYSIS OF VARIANCE
9-1. H0: μ1 = μ2 = μ3 = μ4
H1: Not all four means are equal. The possible configurations are:
all 4 different;
2 equal, 2 different;
3 equal, 1 different;
2 equal, the other 2 equal but different from the first 2.
9-2. ANOVA assumptions: normal populations with equal variance. Independent random sampling
from the r populations.
9-3. Series of paired t-test are dependent on each other. There is no control over the probability of a
Type I error for the joint series of tests.
9-4. r = 5 n1 = n2 = . . . = n5 = 21 n =105
df’s of F are 4 and 100. Computed F = 3.6. The p-value is close to 0.01. Reject H0. There is
evidence that not all 5 plants have equal average output.
F Distribution, 1-tail critical values:
10%: 2.0019    5%: 2.4626    1%: 3.5127    0.50%: 3.9634
9-5. r = 4 n1 = 52 n2 = 38 n3 = 43 n4 = 47
Computed F = 12.53. Reject H0. The average price per lot is not equal at all 4 cities. Feel very
strongly about rejecting the null hypothesis as the critical point of F (3,176) for = .01 is
approximately 3.8.
F Distribution, 1-tail critical values:
10%: 2.1152    5%: 2.6559    1%: 3.8948    0.50%: 4.4264
9-6. Originally, treatments referred to the different types of agricultural experiments being performed
on a crop; today the term is used interchangeably to refer to the different populations in the study.
Errors are the differences between the data points and their sample means.
9-7. Because the sum of all the deviations from a mean is equal to 0.
9-8. Total deviation = x_ij − x̄ = (x̄_i − x̄) + (x_ij − x̄_i)
= treatment deviation + error deviation.
9-9. The sum of squares principle says that the sum of the squared total deviations of all the data
points is equal to the sum of the squared treatment deviations plus the sum of all squared error
deviations in the data.
9-10. An error is any deviation from a sample mean that is not explained by differences among
populations. An error may be due to a host of factors not studied in the experiment.
9-11. Both MSTR and MSE are sample statistics subject to natural variation about their own means.
(If x̄ > 0 we cannot immediately reject H0 in a single-sample case either.)
9-12. The main principle of ANOVA is that if the r population means are not all equal then it is likely
that the variation of the data points about their sample means will be small compared to the
variation of the sample means about the grand mean.
9-13. Distances among populations means manifest themselves in treatment deviations that are large
relative to error deviations. When these deviations are squared, added, and then divided by df’s,
they give two variances. When the treatment variance is (significantly) greater than the error
variance, population mean differences are likely to exist.
9-15. SST = SSTR + SSE, but MST does not equal MSTR + MSE. A counterexample:
Let n = 21, r = 6, SST = 100, SSTR = 85, SSE = 15.
Then SST = SSTR + SSE = 85 + 15 = 100.
But MSTR + MSE = SSTR/(r − 1) + SSE/(n − r) = 85/5 + 15/15 = 18, while
MST = SST/(n − 1) = 100/20 = 5.
9-16. When the null hypothesis of ANOVA is false, the ratio MSTR/MSE is not the ratio of two
independent, unbiased estimators of the common population variance 2 , hence this ratio does
not follow an F distribution.
Now sum this over all observations (all treatments i = 1, . . . , r; and within treatment i, all
observations j = 1, . . . , n_i):

Σ_i Σ_j (x_ij − x̄)² = Σ_i Σ_j (x̄_i − x̄)² + Σ_i Σ_j 2(x̄_i − x̄)(x_ij − x̄_i) + Σ_i Σ_j (x_ij − x̄_i)²

Notice that the first sum on the R.H.S. equals Σ_i n_i(x̄_i − x̄)², since for each i the
summand doesn't vary over the n_i values of j. Similarly, the second sum is
2 Σ_i [(x̄_i − x̄) Σ_j (x_ij − x̄_i)]. But for each fixed i, Σ_j (x_ij − x̄_i) = 0, since this is just the sum
of all deviations from the mean within treatment i. Thus the whole second sum in the long R.H.S.
above is 0, and the equation is now

Σ_i Σ_j (x_ij − x̄)² = Σ_i n_i(x̄_i − x̄)² + Σ_i Σ_j (x_ij − x̄_i)²

that is, SST = SSTR + SSE.
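The decomposition can be confirmed numerically; the three groups below are made-up illustration data, not data from the text:

```python
from statistics import mean

# Numerical check of SST = SSTR + SSE on a small made-up data set
# (three hypothetical treatment groups; not data from the text).
groups = [[4, 5, 7, 8], [10, 12, 13, 11], [1, 2, 3]]
all_obs = [x for g in groups for x in g]
grand = mean(all_obs)                                    # grand mean

sst = sum((x - grand) ** 2 for x in all_obs)             # total SS
sstr = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)  # treatment SS
sse = sum((x - mean(g)) ** 2 for g in groups for x in g)     # error SS
print(abs(sst - (sstr + sse)) < 1e-9)   # True
```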
ANOVA Table (5%):
Source    SS      df  MS         F        F-critical  p-value
Between   381127   2  190563.33  20.7084  3.3541      0.0000  Reject
Within    248460  27    9202.22
Total     629587  29
MINITAB output
One-way ANOVA: UK, Mex, UAE, Oman
Source DF SS MS F P
Factor 3 187.70 62.57 11.49 0.000
Error 28 152.41 5.44
Total 31 340.11
Critical point F (3,28) for = 0.05 is 2.9467. Therefore we reject H0. There is evidence of
differences in the average price per barrel of oil from the four sources. The Rotterdam oil market
may not be efficient. The conclusion is valid only for Rotterdam, and only for Arabian Light. We
need to assume independent random samples from these populations, normal populations with
equal population variance. Observations are time-dependent (days during February), thus the
assumptions could be violated. This is a limitation of the study. Another limitation is that
February may be different from other months.
9-20. An F(.05,2,101) = 3.61 result, relative to a critical value of 3.08637, indicates a significant difference
in their perceptions on the roles played by African American models in commercials.
p-value = .0001. Critical point for F(2,38) at α = .05 is 3.245. Therefore, reject H0. There is a
difference in the length of time it takes to make a decision.
ANOVA Table (5%):
Source    SS        df  MS       F        F-critical  p-value
Between    91.0426   2  45.5213  12.3093  3.2448      0.0001  Reject
Within    140.529   38   3.6981
Total     231.571   40
9-22. An F(.05,2,55) = 52.787 result, relative to a critical value of 3.165, indicates a significant difference
in the monetary-economic reaction to the three inflation fighting policies.
9-23. The test results exceed the critical value of F(.01,3,236) = 3.866. The results indicate that the
performances of the four different portfolios are significantly different.
9-25. Where do differences exist in the circle-square-triangle populations from Table 9-1, using
Tukey? From the text: MSE = 2.125
triangles: n1 = 4, x̄1 = 6
squares: n2 = 4, x̄2 = 11.5
circles: n3 = 3, x̄3 = 2
For α = .01, q_α(r, n − r) = q_.01(3, 8) = 5.63. Smallest n_i is 3:
T = q √(MSE/3) = 5.63 √(2.125/3) = 4.738
|x̄1 − x̄2| = 5.5 > 4.738  significant
|x̄2 − x̄3| = 9.5 > 4.738  significant
|x̄1 − x̄3| = 4.0 < 4.738  n.s.
Thus: "μ1 = μ3"; "μ2 > μ1"; "μ2 > μ3"
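The Tukey criterion and pairwise comparisons can be sketched directly; MSE, the q value, and the means come from the text:

```python
import math

# Tukey T criterion for Problem 9-25 (MSE, q value, and means from the text).
q = 5.63                     # q_.01(3, 8) from the studentized-range table
mse, n_min = 2.125, 3        # smallest group size is 3
T = q * math.sqrt(mse / n_min)
means = {"triangles": 6, "squares": 11.5, "circles": 2}
print(round(T, 3))           # 4.738
pairs = [("triangles", "squares"), ("squares", "circles"), ("triangles", "circles")]
for a, b in pairs:
    diff = abs(means[a] - means[b])
    print(a, b, "sig." if diff > T else "n.s.")
```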
Chapter 10 - Simple Linear Regression and Correlation
CHAPTER 10
SIMPLE LINEAR REGRESSION AND CORRELATION
10-1. A statistical model is a set of mathematical formulas and assumptions that describe some real-
world situation.
10-2. Steps in statistical model building: 1) Hypothesize a statistical model; 2) Estimate the model
parameters; 3) Test the validity of the model; and 4) Use the model.
10-3. Assumptions of the simple linear regression model: 1) A straight-line relationship between X and
Y; 2) The values of X are fixed; 3) The regression errors, ε, are identically normally distributed
random variables, uncorrelated with each other through time.
10-4. β0 is the Y-intercept of the regression line, and β1 is the slope of the line.
10-5. The conditional mean of Y, E(Y | X), is the population regression line.
10-6. The regression model is used for understanding the relationship between the two variables, X and
Y; for prediction of Y for given values of X; and for possible control of the variable Y, using the
variable X.
10-7. The error term captures the randomness in the process. Since X is assumed nonrandom, the
addition of ε makes the result (Y) a random variable. The error term captures the effects on Y of a
host of unknown random components not accounted for by the simple linear regression model.
10-8. The equation represents a simple linear regression model without an intercept (constant) term.
10-9. The least-squares procedure produces the best estimated regression line in the sense that the line
lies "inside" the data set. The line is the best unbiased linear estimator of the true regression line,
as the estimators b0 and b1 have the smallest variance of all linear unbiased estimators of the line
parameters. The least-squares line is obtained by minimizing the sum of the squared deviations of
the data points from the line.
10-10. Least squares is less useful when outliers exist. Outliers tend to have a greater influence on the
determination of the estimators of the line parameters because the procedure is based on
minimizing the squared distances from the line. Since outliers have large squared distances they
exert undue influence on the line. A more robust procedure may be appropriate when outliers
exist.
Simple Regression: Income (X), Wealth (Y)
    X   Y      Error   Quantile   Z
1   1   17.3    0.80    0.667     0.431
2   2   23.6   −3.02    0.167    −0.967
3   3   40.2    3.46    0.833     0.967
4   4   45.8   −1.06    0.333    −0.431
5   5   56.8   −0.18    0.500     0.000
95% C.I. for β1: 10.12 ± 2.77974
95% C.I. for β0: 6.38 ± 9.21937

r² = 0.9217 (coefficient of determination);  r = 0.9601 (coefficient of correlation)
95% C.I. for β1: 0.18663 ± 0.03609;  s(b1) = 0.0164 (standard error of slope)
ANOVA Table:
Source   SS        df  MS         F        F-critical  p-value
Regn.    128.332    1  128.332    129.525  4.84434     0.0000
Error     10.8987  11    0.99079
Total    139.231   12
10-15.
Simple Regression: Inflation (X), Return (Y)
    X       Y    Error
1    1      −3   −20.0642
2    2      36    17.9677
3   12.6    12   −16.294
4  −10.3    −8   −14.1247
5    0.51   53    36.4102
6    2.03   −2   −20.0613
7   −1.8    18     3.64648
8    5.79   32    10.2987
9    5.87   24     2.22121
r² = 0.0873 (coefficient of determination);  r = 0.2955 (coefficient of correlation)
95% C.I. for β1: 0.96809 ± 2.7972;  s(b1) = 1.18294 (standard error of slope)
ANOVA Table:
Source   SS        df  MS        F        F-critical  p-value
Regn.     291.134   1  291.134   0.66974  5.59146     0.4401
Error    3042.87    7  434.695
Total    3334       8
[Scatter plot of Return (Y) against Inflation (X), with fitted line y = 0.9681x + 16.096]
There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
10-16.
Simple Regression: Year (X), Value (Y)
    X      Y       Error
1  1960  180000    84000
2  1970   40000   −72000
3  1980   60000   −68000
4  1990  160000    16000
5  2000  200000    40000
r² = 0.1203 (coefficient of determination);  r = 0.3468 (coefficient of correlation)
95% C.I. for β1: 1600 ± 7949.76;  s(b1) = 2498 (standard error of slope)
ANOVA Table:
Source   SS       df  MS       F        F-critical  p-value
Regn.    2.6E+09   1  2.6E+09  0.41026  10.128      0.5674
Error    1.9E+10   3  6.2E+09
Total    2.1E+10   4
[Scatter plot of Value (Y) against Year (X)]
There is a weak linear relationship (r) and the regression is not significant (r², F, p-value).
Limitations: sample size is very small.
Hidden variables: the 70s and 80s models have a different valuation than other decades possibly
due to a different model or style.
r² = 0.9624 (coefficient of determination);  r = 0.9810 (coefficient of correlation)
95% C.I. for β1: 0.6202 ± 0.17018;  s(b1) = 0.06129 (standard error of slope)
ANOVA Table:
Source   SS        df  MS       F        F-critical  p-value
Regn.    332366     1  332366   102.389  7.70865     0.0005
Error     12984.5   4    3246.12
Total    345351     5
There is no implication for causality. A third-variable influence could be "increases in per capita
income" or "GDP growth".
∂/∂b0 [ Σ(y − b0 − b1x)² ] = −2 Σ(y − b0 − b1x)
∂/∂b1 [ Σ(y − b0 − b1x)² ] = −2 Σ x(y − b0 − b1x)
Setting these two derivatives equal to zero and solving the resulting equations simultaneously for
b0 and b1 gives the required results.
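The closed-form solution that falls out of the two normal equations can be checked numerically; the x and y values below are made-up illustration data, not data from the text:

```python
from statistics import mean

# Closed-form least-squares solution from the normal equations above,
# checked on a small made-up data set (not data from the text).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

sxy = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y))
sxx = sum((xi - mean(x)) ** 2 for xi in x)
b1 = sxy / sxx                      # slope
b0 = mean(y) - b1 * mean(x)         # intercept
# The first normal equation requires the residuals to sum to (numerically) zero:
resid_sum = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))
print(round(b1, 2), abs(resid_sum) < 1e-9)   # 1.96 True
```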
10-23. s(b0) = 0.971, s(b1) = 0.016; the estimate of the error variance is MSE = 0.991. 95% C.I. for β1:
0.187 ± 2.201(0.016) = [0.1518, 0.2222]. Zero is not a plausible value at α = 0.05.
10-25. s 2 gives us information about the variation of the data points about the computed regression line.
10-26. In correlation analysis, the two variables, X and Y, are viewed in a symmetric way, where neither
of them is "dependent" while the other is "independent," as is the case in regression analysis. In
correlation analysis we are interested in the relation between two random variables, both
assumed normally distributed.
10-28. r = 0.960
10-29. t(3) = 0.3468 / √( (1 − .1203)/3 ) = 0.640
10-34. n = 65, r = 0.37
t(63) = .37 / √( (1 − .37²)/63 ) = 3.16
Yes. Significant. There is a correlation between the two variables.
10-35. z′ = ½ ln[(1 + r)/(1 − r)] = ½ ln(1.37/0.63) = 0.3884
ζ = ½ ln[(1 + ρ0)/(1 − ρ0)] = ½ ln(1.22/0.78) = 0.2237
σ_z′ = 1/√(n − 3) = 1/√62 = 0.127
z = (z′ − ζ)/σ_z′ = (0.3884 − 0.2237)/0.127 = 1.297. Cannot reject H0.
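The Fisher z-transformation test above can be sketched in Python (r = 0.37, ρ0 = 0.22, and n = 65 are from the text; the helper name `fisher_z` is ours):

```python
import math

# Fisher z-transformation test for Problem 10-35 (r = 0.37, rho0 = 0.22,
# n = 65, all from the text); the helper name fisher_z is ours.
def fisher_z(r):
    return 0.5 * math.log((1 + r) / (1 - r))

n, r, rho0 = 65, 0.37, 0.22
z_stat = (fisher_z(r) - fisher_z(rho0)) / (1 / math.sqrt(n - 3))
print(round(z_stat, 3))   # 1.297, inside +/-1.96, so cannot reject H0
```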
10-36. Using the "TINV(α, df)" function in Excel, where df = n − 2 = 52: =TINV(0.05, 52) = 2.006645,
and TINV(0.01, 52) = 2.6737.
Reject H0 at 0.05 but not at 0.01. There is evidence of a linear relationship at α = 0.05 only.
10-42. Using the Excel function TDIST(x, df, #tails) to estimate the p-value for the t-test results, where
x = 1.51, df = 585692 − 2 = 585690, and #tails = 2 for a two-tailed test:
TDIST(1.51, 585690, 2) = 0.131.
The corresponding p-value for the results is 0.131. The regression is not significant even at the
0.10 level of significance.
10-45. The coefficient of determination indicates that 9% of the variation in customer satisfaction can
be explained by the changes in a customer’s materialism measurement.
10-46 a. The model should not be used for prediction purposes because only 2.0% of the
variation in pension funding is explained by its relationship with firm profitability.
b. The model explains virtually nothing.
c. Probably not. The model explains too little.
10-47. In the Problem 10-11 regression results, r² = 0.9781. Thus, 97.8% of the variation in wealth growth
is explained by the income quantile.
r² = 0.9781 (coefficient of determination)
10-48. In Problem 10-13, r² = 0.922. Thus, 92.2% of the variation in the dependent variable is
explained by the regression relationship.
r² = 0.9624 (coefficient of determination)
10-51. Based on the coefficient of determination values for the five countries, the UK model explains
31.7% of the variation in long-term bond yields relative to the yield spread. This is the best
predictive model of the five. The next best model is the one for Germany, which explains 13.3%
of the variation. The regression models for Canada, Japan, and the US do not predict long-term
yields very well.
10-52. From the information provided, the slope coefficient of the equation is equal to -14.6. Since its
value is not close to zero (which would indicate that a change in bond ratings has no impact on
yields), it would indicate that a linear relationship exists between bond ratings and bond yields.
This is in line with the reported coefficient of determination of 61.56%.
r² = 0.8348 (coefficient of determination)
= Σ(ŷ − ȳ)² + 2 Σ(ŷ − ȳ)(y − ŷ) + Σ(y − ŷ)²
But: 2 Σ(ŷ − ȳ)(y − ŷ) = 2 Σ ŷ(y − ŷ) − 2 ȳ Σ(y − ŷ) = 0
because the first term on the right is the sum of the weighted regression residuals, which sum to
zero. The second term is the sum of the residuals, which is also zero. This establishes the result:
Σ(y − ȳ)² = Σ(ŷ − ȳ)² + Σ(y − ŷ)².
10-57. F(1,11) = 129.525.  t(11) = 11.381, and t² = 11.381² = 129.53, the F-statistic value already calculated.
F = 129.525    F-critical = 4.84434    p-value = 0.0000
F = 102.389    F-critical = 7.70865    p-value = 0.0005
10-60. F(1,102) = MSR/MSE = (87,691/1) / (12,745/102) = 701.8
There is extremely strong evidence of a linear relationship between the two variables.
10-62. t²(k) = [b1/s(b1)]² = [ (SS_XY/SS_X) / ( s/√SS_X ) ]²
[using Equations (10-10) and (10-15) for b1 and s(b1), respectively]
= (SS_XY/SS_X)² / (MSE/SS_X) = (SS_XY²/SS_X) / MSE = (SSR/1) / MSE = MSR/MSE = F(1,k)
[because SS_XY²/SS_X = SSR by Equations (10-31) and (10-10)]
10-63. a. Heteroscedasticity.
b. No apparent inadequacy.
c. Data display curvature, not a straight-line relationship.
[Residual plot: Error against X]
Residual variance fluctuates; with only 5 data points the residuals appear to be normally
distributed.
[Normal probability plot: Corresponding Normal Z against Residuals]
[MINITAB residual plot against Quality, 30 to 80]
No apparent inadequacy.
10-68.
10-69. In the American Express example, give a 95% prediction interval for x = 5,000:
ŷ = 274.85 + 1.2553(5,000) = 6,551.35
P.I. = 6,551.35 ± (2.069)(318.16) √( 1 + 1/25 + (5,000 − 3,177.92)²/40,947,557.84 )
= [5,854.4, 7,248.3]
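The prediction-interval formula can be evaluated directly; every number (t value, standard error of estimate, x̄, and SS_X) comes from the text:

```python
import math

# 95% prediction interval for the American Express example (Problem 10-69);
# all numbers (t value, s, x-bar, SS_X) come from the text.
b0, b1 = 274.85, 1.2553
t_crit, s = 2.069, 318.16          # t_{.025,23} and the standard error of estimate
n, x_bar, ss_x = 25, 3177.92, 40947557.84
x0 = 5000

y_hat = b0 + b1 * x0
hw = t_crit * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / ss_x)
print(round(y_hat, 2), round(y_hat - hw, 1), round(y_hat + hw, 1))
# 6551.35 5854.4 7248.3
```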
10-70. Given that the slope of the equation for 10-52 is –14.6, if the rating falls by 3 the yield should
increase by 43.8 basis points.
10-77.
a) Simple regression equation: Y = 2.779337X − 0.284157; when X = 10, Y = 27.5092.
Intercept b0 = −0.284157;  Slope b1 = 2.779337
With the intercept forced to zero: b0 = 0;  b1 = 2.741537
Chapter 11 - Multiple Regression
CHAPTER 11
MULTIPLE REGRESSION
11-1. The assumptions of the multiple regression model are that the errors are normally and
independently distributed with mean zero and common variance σ². We also assume that the X_i
are fixed quantities rather than random variables; at any rate, they are independent of the error
terms. The assumption of normality of the errors is needed for conducting tests about the
regression model.
11-2. Holding advertising expenditures constant, sales volume increases by 1.34 units, on average, per
increase of 1 unit in promotional expenditures.
11-3. In a correlational analysis, we are interested in the relationships among the variables. On the
other hand, in a regression analysis with k independent variables, we are interested in the effects
of the k variables (considered fixed quantities) on the dependent variable only (and not on one
another).
11-4. A response surface is a generalization to higher dimensions of the regression line of simple linear
regression. For example, when 2 independent variables are used, each in the first order only, the
response surface is a plane in 3-dimensional Euclidean space. When 7 independent
variables are used, each in the first order, the response surface is a 7-dimensional hyperplane in
8-dimensional Euclidean space.
11-5. 8 equations.
11-6. The least-squares estimators of the parameters of the multiple regression model, obtained as
solutions of the normal equations.
11-7. ΣY = nb0 + b1ΣX1 + b2ΣX2
ΣX1Y = b0ΣX1 + b1ΣX1² + b2ΣX1X2
ΣX2Y = b0ΣX2 + b1ΣX1X2 + b2ΣX2²
           Intercept   Size      Distance
b          −9.7997     0.17331   31.094
s(b)       80.7627     0.0399    14.132
t          −0.1213     4.34343    2.2002
p-value     0.9074     0.0049     0.0701
ANOVA Table:
Source   SS        df  MS       F      F-critical  p-value
Regn.    101033     2  50516    14.28  5.1432      0.0052    s = 59.477
Error     21225.1   6   3537.5
Total    122258     8  15282    R² = 0.8264    Adjusted R² = 0.7685
11-9. With no advertising and no spending on in-store displays, sales are b0 = 47.165 (thousand) on
average. For each unit (thousand) increase in advertising expenditure, keeping in-store
display expenditure constant, there is an average increase in sales of b1 = 1.599 (thousand).
Similarly, for each unit (thousand) increase in in-store display expenditure, keeping advertising
constant, there is an average increase in sales of b2 = 1.149 (thousand).
11-10. We test whether there is a linear relationship between Y and any of the X variables (that is, with
at least one of the X_i). If the null hypothesis is not rejected, there is nothing more to do since
there is no evidence of a regression relationship. If H0 is rejected, we need to conduct further
analyses to determine which of the variables have a linear relationship with Y and which do not,
and we need to develop the regression model.
11-13. F(4,40) = MSR/MSE = (7,768/4) / ((15,673 − 7,768)/40) = 1,942/197.625 = 9.827
Yes, there is evidence of a linear regression relationship between Y and at least one of the
independent variables.
11-14. Source      SS       df  MS        F
       Regression  7,474.0   3  2,491.33  48.16
       Error         672.5  13     51.73
       Total       8,146.5  16
Since the F-ratio is highly significant, there is evidence of a linear regression relationship
between overall appeal score and at least one of the three variables prestige, comfort, and
economy.
11-15. When the sample size is small; when the degrees of freedom for error are relatively small, so
that adding a variable, and thus losing a degree of freedom for error, is a substantial loss.
11-16. R² = SSR/SST. As we add a variable, SSR cannot decrease. Since SST is constant, R² cannot
decrease.
11-17. No. The adjusted coefficient is used in evaluating the importance of new variables in the
presence of old ones. It does not apply in the case where all we consider is a single independent
variable.
11-19. The mean square error gives a good indication of the variation of the errors in regression.
However, other measures such as the coefficient of multiple determination and the adjusted
coefficient of multiple determination are useful in evaluating the proportion of the variation in
the dependent variable explained by the regression, thus giving us a more meaningful measure
of the regression fit.
11-20. Given an adjusted R² = 0.021, only 2.1% of the variation in the stock return is explained by the
four independent variables.
Use the Excel function FDIST(F, dfN, dfD) to return the p-value, where F is the F-test result and
the df's refer to the degrees of freedom in the numerator and denominator, respectively.
11-23. R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.918)(16/12) = 0.8907
Since R̄² has decreased, do not include the new variable.
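The adjusted R² formula is easy to wrap in a helper; n = 17 and k = 4 are our reading of the 16/12 factor in the text, and the function name is ours:

```python
# Adjusted R-squared, as used in Problem 11-23. R² = 0.918 is from the text;
# n = 17 and k = 4 are inferred from the text's (n-1)/(n-(k+1)) = 16/12 factor.
def adjusted_r2(r2, n, k):
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))

print(round(adjusted_r2(0.918, 17, 4), 4))   # 0.8907
```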
11-25. a. The regression expresses stock returns as a plane in space, with firm size ranking and
stock price ranking as the two horizontal axes:
RETURN = 0.484 − 0.030(SIZRNK) − 0.017(PRCRNK)
The t-test for a linear relationship between returns and firm size ranking is highly significant,
but not for returns against stock price ranking.
c. The adjusted R 2 is quite low, indicating that the regression on both variables is not a good
model. They should try regressing on size alone.
11-26. R̄² = 1 − (1 − R²)(n − 1)/(n − (k + 1)) = 1 − (1 − 0.72)(712/710) = 0.719
Based solely on this information, this is not a bad regression model.
Use the Excel function FDIST(F, dfN, dfD) to return the p-value, where F is the F-test result and
the df's refer to the degrees of freedom in the numerator and denominator, respectively.
11-28. A joint confidence region for both parameters is a set of pairs of likely values of 1 , and 2 at
95%. This region accounts for the mutual dependency of the estimators and hence is elliptical
rather than rectangular. This is why the region may not contain a bivariate point included in the
separate univariate confidence intervals for the two parameters.
11-29. Assuming a very large sample size, we use the formula z = b_i/s(b_i) for testing the significance
of each of the slope parameters, with α = 0.05; critical value |z| = 1.96.
For firm size: z = 0.06/0.005 = 12.00 (significant)
For firm profitability: z = −5.533 (significant)
For fixed-asset ratio: z = −0.08 (not significant)
For growth opportunities: z = −0.72 (not significant)
For nondebt tax shield: z = 4.29 (significant)
The slope estimates with respect to "firm size", "firm profitability" and "nondebt tax shield" are
not zero. The adjusted R-square indicates that 16.5% of the variation in governance level is
explained by the five independent variables. Next step: exclude "fixed-asset ratio" and "growth
opportunities" from the regression and see what happens to the adjusted R-square.
11-32. Use the formula z = b_i/s(b_i) for testing the significance of each of the slope parameters, with
α = 0.05; critical value |z| = 1.96.
11-33. Yes. Considering the joint confidence region for both slope parameters is equivalent to
conducting an F test for the existence of a linear regression relationship. Since (0,0) is not in the
joint 95% region, this is equivalent to rejecting the null hypothesis of the F test at = 0.05.
11-34. Prestige is not significant (or at least appears so, pending further analysis). Comfort and
Economy are significant (Comfort only at the 0.05 level). The regression should be rerun with
variables deleted.
11-36. a. As Price is dropped, Lend becomes significant: there is, apparently, a collinearity between
Lend and Price.
b.,c. The best model so far is the one in Table 11-9, with M1 and Price only. The adjusted R 2 for
that model is higher than for the other regressions.
d. For the model in this problem, MINITAB reports F = 114.09. Highly significant. For the
model in Table 11-9: F = 150.67. Highly significant.
e. s = 0.3697. For Problem 11-35: s = 0.3332. As a variable is deleted, s (and its square, MSE)
increases.
f. In Problem 11-35: MSE = s 2 = (0.3332)2 = 0.111.
11-38. Use the formula z = b_i/s(b_i) for testing the significance of each of the slope parameters, with
α = 0.05; critical value |z| = 1.96.
For new technological process: z = -0.014 / 0.004 = -3.50 (significant)
For organizational innovation: z = 0.25
For commercial innovation: z = 3.2 (significant)
For R&D: z = 4.50 (significant)
Multiple Regression
ANOVA Table:
Source   SS            df  MS           F      F-critical  p-value
Regn.     4507008.861   2  2253504.43   2.166  4.737       0.1852    s = 1019.925
Error     7281731.539   7  1040247.363
Total    11788740.4     9  1309860.044  R² = 0.3823    Adjusted R² = 0.2058
Correlation matrix:
             Employees  Revenues
Employees    1.0000
Revenues     0.9831     1.0000
Regression Equation:
Profits = 834.95 + 0.009 Employees - 0.174 Revenues
The regression equation is not significant (F value), and there is a large amount of
multicollinearity present between the two independent variables (0.9831). There is so much
multicollinearity present that the negative partial correlations between the independent variables
and profits are not maintained in the regression results (both of the parameters of the independent
variables should be negative). None of the values of the parameters are significant.
11-40. The residual plot exhibits both heteroscedasticity and a curvature apparently not accounted for in
the model.