
Chapter 13 - Multiple Regression

13.1 a. ŷ = 4.306 − 0.082ShipCost + 2.265PrintAds + 2.498WebAds + 16.697Rebate%
b. The coefficient of ShipCost says that each additional $1 of shipping cost reduces net revenue by about 0.082 thousand, or $82.
The coefficient of PrintAds says that each additional $1000 of print ads adds about $2,265 to net revenue.
The coefficient of WebAds says that each additional $1000 of web ads adds about $2,498 to net revenue.
The coefficient of Rebate% says that each additional percentage point in the rebate rate adds about $16,700 to net revenue.
c. The intercept is meaningless. You have to supply some product, so shipping cost can't be zero. You don't have to have a rebate or ads; those can be zero.
d. NetRevenue = 4.306 − 0.082(10) + 2.265(50) + 2.498(40) + 16.697(15) = 467.111 thousand, or $467,111.
Learning Objective: 13-1
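The fitted-value arithmetic in part d can be double-checked with a short script; this is only a sketch that re-evaluates the estimated equation (coefficients taken from part a, response in $ thousands):

```python
# Coefficients from the fitted model in part a (net revenue in $ thousands).
coeffs = {"Intercept": 4.306, "ShipCost": -0.082, "PrintAds": 2.265,
          "WebAds": 2.498, "Rebate%": 16.697}
# Predictor values given in part d.
x = {"ShipCost": 10, "PrintAds": 50, "WebAds": 40, "Rebate%": 15}

net_revenue = coeffs["Intercept"] + sum(coeffs[k] * v for k, v in x.items())
print(round(net_revenue, 3))  # about 467.111 thousand, i.e. $467,111
```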

13.2 a. ŷ = 1225.44 + 11.52FloorSpace − 6.935CompetingAds − 0.1496Price
b. The coefficient of FloorSpace says that each additional square foot of floor space adds about $11,520 to average sales.
The coefficient of CompetingAds says that each additional $1000 of competing ads reduces sales by $6,935.
The coefficient of Price says that each additional $1 of advertised price reduces sales by $149.60.
c. No. If all of these variables are zero, you wouldn't sell a bike (no one will advertise a bike for zero).
d. Sales = 1225.44 + 11.52(80) − 6.935(100) − 0.1496(1200) = 1274.02 thousand, or $1,274,020.
Learning Objective: 13-1

13.3 a. ŷ = 2.8931 + 0.1542LiftWait + 0.2495AmountGroomed + 0.0539SkiPatrolVisibility − 0.1196FriendlinessHosts
b. Overall satisfaction increases with an increase in satisfaction for each individual predictor except for friendliness of hosts. This counterintuitive result could be due to an interaction effect. Interaction effects will be explored later in the chapter.
c. No. Satisfaction scores of zero for the individual predictors are outside the range of the variable values. It is unwise to extrapolate.
d. Overall satisfaction score = 2.8931 + 0.1542(5) + 0.2495(5) + 0.0539(5) − 0.1196(5) = 4.5831
Learning Objective: 13-1


13.4 a. ŷ = 4198.5808 − 27.3540AgeMed + 17.4893Bankrupt − 0.0124FedSpend − 29.0314HSGrad%
b. The 2005 state by state crime rate per 100,000 decreases by about 27 as the state median age increases, increases by about 17 for every 1000 new bankruptcies filed, decreases by 0.0124 for each dollar increase in federal funding per person, and decreases by about 29 for each 1% increase in high school graduations.
c. No, a state would not have 0 median age or 0 values for any of the other predictor variables.
d. Burglary Rate = 4198.5808 − 27.3540(35) + 17.4893(7) − 0.0124(6000) − 29.0314(80) = 966.7039
Learning Objective: 13-1

13.5 a. df1 = 4 and df2 = 45


b. From Appendix F: F.05 = 2.61, using df1 = 4 and df2 = 40. From Excel with df1 = 4 and df2 = 45: F.05 = F.INV.RT(.05,4,45) = 2.5787.
c. Fcalc = 64,853/4,990 = 12.997. Yes, the overall regression is significant.
H0: All the coefficients are zero (β1 = β2 = β3 = β4 = 0)
H1: At least one coefficient is not zero
d. R2 = 259,412/483,951 = .536. R2adj = 1 − [(224,539/45)/(483,951/49)] = .4948
Learning Objective: 13-2
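The ANOVA arithmetic above can be reproduced directly from the sums of squares; a sketch using scipy.stats (assumed available), whose f.isf mirrors Excel's =F.INV.RT:

```python
from scipy import stats

# ANOVA quantities from 13.5: SSR = 259,412, SSE = 224,539, SST = 483,951,
# with df1 = 4 (predictors) and df2 = 45 (error degrees of freedom).
SSR, SSE, SST = 259412, 224539, 483951
df1, df2 = 4, 45

F_calc = (SSR / df1) / (SSE / df2)      # MSR / MSE, about 12.997
F_crit = stats.f.isf(0.05, df1, df2)    # same as Excel =F.INV.RT(.05,4,45)
R2 = SSR / SST                          # about .536
R2_adj = 1 - (SSE / df2) / (SST / (df1 + df2))  # n - 1 = 49 here

print(round(F_calc, 3), round(F_crit, 4), round(R2, 3), round(R2_adj, 4))
```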

13.6 a. df1 = 3 and df2 = 26


b. From Appendix F: F.05 = 2.98. From Excel: =F.INV.RT(.05,3,26) = 2.975.
c. Fcalc = 398,802/14,590 = 27.334. Yes, the overall regression is significant.
H0: All the coefficients are zero (β1 = β2 = β3 = 0)
H1: At least one coefficient is not zero
d. R2 = 1,196,410/1,575,742 = .7593. R2adj = 1 − [(379,332/26)/(1,575,742/29)] = .7315
Learning Objective: 13-2

13.7 a. df1 = 4 and df2 = 497


b. From Appendix F: F.05 = 2.42, using df1 = 4 and df2 = 200. From Excel with df1 = 4 and df2 = 497: F.05 = F.INV.RT(.05,4,497) = 2.39.
c. Fcalc = 8.2682/0.6398 = 12.9231. Yes, the overall regression is significant.
H0: All the coefficients are zero (β1 = β2 = β3 = β4 = 0)
H1: At least one coefficient is not zero
d. R2 = 33.0730/351.0598 = .0942. R2adj = 1 − [(317.9868/497)/(351.0598/501)] = .0869
Learning Objective: 13-2

13.8 a. df1 = 4 and df2 = 45


b. From Appendix F: F.05 = 2.61, using df1 = 4 and df2 = 40. From Excel with df1 = 4 and df2 = 45: F.05 = F.INV.RT(.05,4,45) = 2.5787.
c. Fcalc = 295,683.3212/35,221.1519 = 8.3951. Yes, the overall regression is significant.
H0: All the coefficients are zero (β1 = β2 = β3 = β4 = 0)
H1: At least one coefficient is not zero
d. R2 = 1,182,733.285/2,767,685.12 = .4273
R2adj = 1 − [(1,584,951.835/45)/(2,767,685.12/49)] = .3764


Learning Objective: 13-2

13.9 a. tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 45)

Predictor    tcalc         p-value
Intercept    0.0608585     0.9517414
ShipCost     -0.0175289    0.9860922
PrintAds     2.1571429     0.0363725
WebAds       2.9537661     0.0049772
Rebate%      4.6770308     < .0001

b. t.005 = T.INV(.005,45) = ±2.69. WebAds and Rebate% differ significantly from zero (p-value < .01 and |tcalc| > 2.69).
c. See table in part a.
Learning Objective: 13-3
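Excel's T.DIST.2T is simply twice the upper-tail t probability; a sketch using scipy.stats (assumed available), checked against the WebAds row above:

```python
from scipy import stats

def two_tail_p(t_calc, df):
    """Two-tailed p-value, equivalent to Excel's =T.DIST.2T(|t|, df)."""
    return 2 * stats.t.sf(abs(t_calc), df)

# WebAds row from the table above: t_calc = 2.9538 with 45 error d.f.
p = two_tail_p(2.9537661, 45)
print(round(p, 7))  # about .0049772, matching the table
```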

13.10 a. tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 26)

Predictor      Coef, bj    tcalc
Intercept      1225.44     3.0843192
FloorSpace     11.52       8.6631579
CompetingAds   -6.935      -1.7759283
Price          -0.14955    -1.6752548

b. t.005 = T.INV(.005,26) = ±2.779. Only FloorSpace differs significantly from zero (p-value < .01 and |tcalc| > 2.779).
c. See table in part a.
Learning Objective: 13-3

13.11 a. tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 497)

Predictor Coef, bj sj tcalc p-value


Intercept 2.8931 0.3680 7.8617 2.37E-14
LiftWait 0.1542 0.0440 3.5045 .0005
AmountGroomed 0.2495 0.0529 4.7164 3.07E-06
SkiPatrolVisibility 0.0539 0.0443 1.2167 .2245
FriendlinessHost 0.1196 0.0623 1.9197 .0557


b. t.005 = T.INV(.005,497) = ±2.586. Coefficients on LiftWait and AmountGroomed differ significantly from zero.
c. See table in part a.
Learning Objective: 13-3

13.12 a. tcalc = (bj − 0)/sj, p-value = T.DIST.2T(|tcalc|, 45)

Predictor Coef, bj sj tcalc p-value


Intercept 4,198.5808 799.3395 5.2526 3.95E-06
AgeMed -27.3540 12.5687 -2.1764 0.0348
Bankrupt 17.4893 12.4033 1.4101 0.1654
FedSpend -0.0124 0.0176 -0.7037 0.4853
HSGrad% -29.0314 7.1268 -4.0736 0.0002

b. t.005 = T.INV(.005,45) = ±2.69. Only the coefficient on HSGrad% differs significantly from zero at a significance level of .01.
c. See table in part a.
Learning Objective: 13-3

13.13 Use ŷi ± tα/2se with 34 df, t.025 = T.INV(.025,34) = 2.032. (Use the positive value.)
Half width of 95% prediction interval = 2.032(3620) = 7355.84.
Using the quick rule the half width = 2se = 2(3620) = 7240.
Yes, the quick rule gives similar results.
Learning Objective: 12-9
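The exact half-width and the quick rule can be compared in a few lines; a sketch using scipy.stats (assumed available) for the t critical value:

```python
from scipy import stats

# From 13.13: se = 3620 with 34 degrees of freedom. The exact 95%
# half-width uses t.025; the quick rule simply approximates t.025 by 2.
se, df = 3620, 34
t_crit = stats.t.isf(0.025, df)   # same magnitude as T.INV(.025, 34), about 2.032
exact = t_crit * se               # roughly 7356
quick = 2 * se                    # 7240

print(round(exact, 1), quick)
```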

13.14 Use ŷi ± tα/2se with 20 df, t.025 = T.INV(.025,20) = 2.086. (Use the positive value.)
Half width of 95% prediction interval = 2.086(1.17) = 2.4406.
Using the quick rule the half width = 2se = 2(1.17) = 2.34.
Yes, the quick rule gives similar results.
Learning Objective: 12-9

13.15 a. Number of nights (NumNight) needed and number of bedrooms (NumBedrooms) are
both discrete.
b. Two: SwimPool = 1 if there is a swimming pool and ParkGarage = 1 if there is a
parking garage
c. CondoPrice = β0 + β1NumNights + β2NumBedrooms + β3SwimPool + β4ParkGarage
Learning Objective: 13-3
Learning Objective: 13-5

13.16 a. Weight of stone (continuous)


b. Nine: There are 6 different values for Color rating, so the model would need 5 (6 − 1) indicator variables for color rating. There are 5 different Clarity rating values, so the model would need 4 (5 − 1) indicator variables for clarity rating.


c. Price = β0 + β1Weight + β2ColorD + β3ColorE + β4ColorF + β5ColorG + β6ColorH + β7ClarityIF + β8ClarityVVS1 + β9ClarityVVS2 + β10ClarityVS1
Learning Objective: 13-3
Learning Objective: 13-5
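The c − 1 counting rule in part b can be sketched as a tiny helper (a minimal illustration, not tied to any particular software):

```python
# A categorical factor with c levels needs c - 1 indicator (0/1) variables;
# the omitted level serves as the baseline.
def n_dummies(*level_counts):
    return sum(c - 1 for c in level_counts)

binaries = n_dummies(6, 5)        # 6 color grades, 5 clarity grades
print(binaries)                   # 9 indicator variables
total_predictors = 1 + binaries   # plus the quantitative Weight
print(total_predictors)           # 10 predictors, matching part c
```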

13.17 a. ln(Price) = 5.4841 − 0.0733SalePrice + 1.1196Sub-Zero + 0.0696Capacity + 0.0466(2DoorFzBot) − 0.3432(2DoorFzTop) − 0.7096(1DoorFz) − 0.8820(1DoorNoFz)
b. Use p-value = T.DIST.2T(|tcalc|, 319). SalePrice: p-value = .0019, Sub-Zero: p-value = 2.24E-14, Capacity: p-value = 2.71E-31, 2DoorFzBot: p-value = .5650, 2DoorFzTop: p-value = 3.68E-19, 1DoorFz: p-value = 1.19E-07, 1DoorNoFz: p-value = 8.59E-09
The only variable that is not a significant predictor is 2DoorFzBot. This is an indicator variable, which means that there is not a significant difference in price between two-door refrigerators that have the freezer compartment on the side or on the bottom.
c. The coefficient on two door, top freezer is −0.3432 so the natural log of the price
decreases by 0.3432.
d. The side freezer model demands a higher price because there is a negative coefficient
on the 1DoorFz model indicator variable.
Learning Objective: 13-3
Learning Objective: 13-5

13.18 a. SentenceLength = 3.2563 + 0.5219Age + 7.7412Convictions − 6.0852Married − 14.3402Employed
b. Use p-value = T.DIST.2T(|tcalc|, 45). Age: p-value = 9.54E-06, Convictions: p-value = 2.03E-09, Married: p-value = .0228, Employed: p-value = 1.00E-06
c. A married male convicted of assault will receive a sentence that is about 6 years shorter than an unmarried male assault convict.
d. About 14 years.
e. SentenceLength = 3.2563 + 0.5219(25) + 7.7412(1) − 6.0852(0) − 14.3402(0) = 24.045.
Learning Objective: 13-3
Learning Objective: 13-5


13.19 a. The scatter plot shows an obvious increasing trend but it is nonlinear rather than
linear. The increase in salary is much steeper in the earlier years than in the later
years. A nonlinear model would be appropriate.

b. MegaStat Output is below: R2 = .915, Fcalc =194.99, p-value = .0000. Yes, the model is
significant.
Regression Analysis

R² 0.915
Adjusted R² 0.911 n 39
R 0.957 k 2
Std. Error 8.757 Dep. Var. Salary ($1000)

ANOVA table
Source       SS            df   MS            F        p-value
Regression   29,901.9728    2   14,950.9864   194.99   4.84E-20
Residual      2,760.3861   36       76.6774
Total        32,662.3590   38

Regression output confidence interval


std. 95% 95%
variables coefficients error t (df=36) p-value lower upper
Intercept 45.3322 3.0644 14.793 7.13E-17 39.1172 51.5472
Years 5.6218 0.4742 11.856 5.45E-14 4.6601 6.5835
YearsSq -0.0945 0.0139 -6.789 6.21E-08 -0.1227 -0.0663


c. Years: p-value = .0000, YearsSq: p-value = .0000. Both of these predictors are
significant.
Learning Objective: 13-3

13.20 MegaStat Output is below: Male: p-value = .5009, YearsxMale: p-value = .0505. The binary variable Male is not significant. The interaction variable YearsxMale is significant because the p-value is less than .10. The coefficient on the interaction term is positive, which means that as men gain more years of experience their salaries tend to grow faster than women's.

Regression Analysis

R² 0.945
Adjusted R² 0.939 n 39
R 0.972 k 4
Std. Error 7.269 Dep. Var. Salary ($1000)

ANOVA table
Source       SS            df   MS           F        p-value
Regression   30,865.7542    4   7,716.4386   146.03   6.59E-21
Residual      1,796.6048   34      52.8413
Total        32,662.3590   38

Regression output confidence interval


std. 95% 95%
variables coefficients error t (df=34) p-value lower upper
Intercept 44.6252 3.7277 11.971 9.62E-14 37.0495 52.2008
Years 4.7166 0.5098 9.251 8.23E-11 3.6805 5.7527
YearsSq -0.1033 0.0129 -8.019 2.40E-09 -0.1295 -0.0771
Male 3.2301 4.7476 0.680 .5009 -6.4182 12.8783
YearsxMale 1.0938 0.5395 2.027 .0505 -0.0026 2.1902

Learning Objective: 13-3


Learning Objective: 13-5
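The fitted model in the output implies different salary paths for men and women; a sketch that evaluates the equation at a hypothetical 10 years of experience (coefficients from the table above, salary in $1000s; the choice of 10 years is purely illustrative):

```python
# Salary = 44.6252 + 4.7166*Years - 0.1033*Years^2 + 3.2301*Male
#          + 1.0938*Years*Male   (from the 13.20 MegaStat output)
def salary(years, male):
    return (44.6252 + 4.7166 * years - 0.1033 * years**2
            + 3.2301 * male + 1.0938 * years * male)

# At 10 years, the gap equals the Male shift plus 10 times the
# interaction coefficient: 3.2301 + 10(1.0938), about 14.17 thousand.
female, male_ = salary(10, 0), salary(10, 1)
print(round(female, 2), round(male_, 2), round(male_ - female, 2))
```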


13.21 a. All but one pair of variables are significantly correlated at α = .01 (the exception is Hosts and LiftWait, r = .046). See the matrix below. The strongest correlations are LiftOps and Scanners (r = .635), Crowds and LiftWait (r = .577), AmountGr and TrailGr (r = .531), and SkiSafe and SpSeen (r = .488).
Correlation Matrix

           scanners liftops liftwait trailv snosurf crowds amountgr trailgr skisafe spseen hosts
scanners   1.000
liftops    .635     1.000
liftwait   .146     .180    1.000
trailv     .115     .206    .128     1.000
snosurf    .190     .242    .227     .373   1.000
crowds     .245     .299    .577     .235   .348    1.000
amountgr   .245     .271    .251     .221   .299    .372   1.000
trailgr    .266     .337    .205     .360   .358    .362   .531     1.000
skisafe    .200     .306    .196     .172   .200    .332   .274     .323    1.000
spseen     .145     .190    .207     .172   .184    .230   .149     .172    .488    1.000
hosts      .245     .278    .046     .140   .119    .133   .128     .156    .212    .350   1.000

502 sample size
± .088 critical value .05 (two-tail)
± .115 critical value .01 (two-tail)

b. All VIFs are less than 2. No cause for concern.


variables VIF
Intercept
scanners 1.718
liftops 1.887
liftwait 1.531
trailv 1.270
snosurf 1.337
crowds 1.838
amountgr 1.504
trailgr 1.676
skisafe 1.518
spseen 1.477
hosts 1.224

Learning Objective: 13-6


13.22 a. Al and Si (r = .456), Cr and Zn (r = .529) are significantly correlated at α = .01. Al and Ti (r = .389), Si and Zn (r = .365), and Ti and Pb (r = .345) are significantly correlated at α = .05.
Correlation Matrix

Al Si Cr Ti Zn Pb
Al 1.000
Si .456 1.000
Cr .133 -.073 1.000
Ti .389 .278 .011 1.000
Zn .286 .365 .529 -.083 1.000
Pb -.202 -.053 -.114 -.345 .180 1.000

33 sample size

± .344 critical value .05 (two-tail)


± .442 critical value .01 (two-tail)

b. All VIFs are low in value. No cause for concern.

variables VIF
Intercept
Al 1.500
Si 1.674
Cr 1.761
Ti 1.390
Zn 2.197
Pb 1.281

Learning Objective: 13-6

13.23 If hi > 2(k + 1)/n then the observation is considered to be a high leverage observation.
a. 2(5 + 1)/72 = .1667. hi = .15 < .1667, therefore this is not a high leverage observation.
b. 2(4 + 1)/100 = .10. hi = .18 > .10, therefore this is a high leverage observation.
c. 2(7 + 1)/240 = .0667. hi = .08 > .0667, therefore this is a high leverage observation.
Learning Objective: 13-8
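The leverage rule above can be wrapped in a small helper that reproduces parts a through c:

```python
# Flag an observation as high leverage when its hat value h_i
# exceeds the rule-of-thumb threshold 2(k + 1)/n.
def high_leverage(h_i, k, n):
    threshold = 2 * (k + 1) / n
    return h_i > threshold, round(threshold, 4)

print(high_leverage(0.15, 5, 72))   # (False, 0.1667) -- part a
print(high_leverage(0.18, 4, 100))  # (True, 0.1)     -- part b
print(high_leverage(0.08, 7, 240))  # (True, 0.0667)  -- part c
```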

13.24 Assumption #1: Residuals are normally distributed. It appears this assumption has been
violated because the histogram shows a skewed left distribution. Because the data set
is fairly small the normplot is not as useful for detecting non-normality.


Assumption #2: Residuals have constant variance. It appears this assumption has been
violated. The residuals plotted against the predicted Y values show a fan out pattern
which indicates heteroscedasticity or non-constant variance.
Assumption #3: Residuals are independent. It appears this assumption has been violated. Applying the runs test (see Section 12.8) to the runs plot, we see there are 10 crossing points. If autocorrelation did not exist we would expect approximately 15/2, or 7 to 8, crossings. We have more than 8 crossings, so negative autocorrelation is a concern.

Questions 13.25 through 13.41 refer to 10 different data sets labeled A-J. The answers to each question
are listed for each data set in turn.

DATA SET A Response Variable: Vehicle City Mileage

13.25 Cross-sectional. Unit of observation: vehicle model.


Learning Objective: 02-3

13.26 The variable magnitudes are not too different. Weight is approximately 100 times the
magnitude of the other variables but this should not cause problems in the analysis.
Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical to have a car with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable are listed in the table below.
Predictor   Relationship with Response   Reason?
Length      Negative    Bigger size, lower mileage
Width       Negative    Bigger size, lower mileage
Weight      Negative    Bigger size, lower mileage
Japan       Positive    Japanese cars have a reputation for better mileage
Learning Objective: 12-2

13.28 43/4 = 10.75 > 10. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = 43.9932 − 0.0039length − 0.1064width −


0.0041weight − 1.3228Japan. The signs on the coefficients match our a priori
reasoning except for Japanese vehicles.


MegaStat Output:
Regression Analysis
R² 0.703
Adjusted R² 0.671 n 43
R 0.838 k 4
Std. Error 2.505 Dep. Var. City
ANOVA table
Source SS df MS F p-value
Regression 563.9264 4 140.9816 22.46 1.40E-09
Residual 238.5387 38 6.2773
Total 802.4651 42
Regression output confidence interval
variables coefficients std. error t (df=38) p-value 95% lower 95% upper
Intercept 43.9932 8.4767 5.190 7.33E-06 26.8330 61.1534
Length -0.0039 0.0445 -0.087 .9311 -0.0939 0.0862
Width -0.1064 0.1395 -0.763 .4501 -0.3888 0.1759
Weight -0.0041 0.0008 -4.955 1.53E-05 -0.0058 -0.0024
Japan -1.3228 0.8146 -1.624 .1127 -2.9718 0.3262
Learning Objective: 13-1
Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero except for the variable Weight. This means that Weight is the only significant predictor in the model (at a significance level of .05).
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 38, t.025 = T.INV(.025, 38) = ±2.024. Only Weight has a significant result, with tcalc = −4.955 < −2.024.
Learning Objective: 13-3

13.32 a. Weight: p-value = 1.53E-05 < .05.


b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3

13.33 Fcalc = 22.46 with a p-value = 1.40E-09. R2 = .703 and R2adj = .671. The model provides
significant fit with a fairly strong prediction of city mileage.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.024(2.505) = ŷi ± 5.07. Yes, this model does have practical value.
Learning Objective: 12-9


13.35 a.
Correlation Matrix

Length Width Weight Japan


Length 1.000
Width .720 1.000
Weight .753 .739 1.000
Japan -.160 -.267 -.093 1.000

43 sample size

± .301 critical value .05 (two-tail)


± .389 critical value .01 (two-tail)

b. Both Length and Width are significantly correlated with Weight. Collinearity could be a problem, but according to Klein's Rule we shouldn't be overly concerned. (Both .720 and .753 are less than √.703 = .838.)
Learning Objective: 13-6
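The ±.301 and ±.389 critical values in the matrix come from r = t/√(df + t²) with df = n − 2; a sketch using scipy.stats (assumed available):

```python
from math import sqrt
from scipy import stats

def critical_r(n, alpha):
    """Two-tail critical value for a sample correlation coefficient:
    r = t / sqrt(df + t^2), where df = n - 2 and t is the t critical value."""
    df = n - 2
    t = stats.t.isf(alpha / 2, df)
    return t / sqrt(df + t**2)

print(round(critical_r(43, 0.05), 3))  # 0.301, matching the matrix footer
print(round(critical_r(43, 0.01), 3))  # 0.389
```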

13.36 a.
variables VIF
Intercept
Length 2.672
Width 2.746
Weight 2.907
Japan 1.106

b. The VIF values are all under 3, which suggests that multicollinearity has not caused instability. In fact, both Length and Width turned out to be insignificant in the model, which is what one would expect.
Learning Objective: 13-6
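Each VIF is 1/(1 − Rj²), where Rj² comes from regressing predictor j on the other predictors; a sketch that back-solves Rj² from Weight's reported VIF of 2.907:

```python
# Variance inflation factor for predictor j: VIF_j = 1 / (1 - R_j^2).
def vif(r2_j):
    return 1 / (1 - r2_j)

def r2_from_vif(v):
    return 1 - 1 / v

# Weight's reported VIF of 2.907 implies the other predictors explain
# roughly 66% of its variance: noticeable, but below common alarm levels.
r2_weight = r2_from_vif(2.907)
print(round(r2_weight, 3))       # about 0.656
print(round(vif(r2_weight), 3))  # 2.907 (round trip)
```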

13.37 Vehicle 42, the Jetta, was the only observation that had an outlier residual.
Learning Objective: 13-8

13.38 Observation 2, 8, 13, and 21 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8


13.39 Assuming normally distributed residuals appears reasonable with a high outlier. Although
running a normality test of the residuals (A-D and moment tests) results in a failure;
more than likely because of the outlier.

Learning Objective: 13-7


13.40 Assuming homoscedastic residuals appears reasonable. We see the one high outlier on the
plot.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7


DATA SET B Response Variable: Noodles & Company Sales/SqFt

13.25 Cross-sectional. Unit of observation: restaurant.


Learning Objective: 02-3

13.26 The variable magnitudes are all similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical to have a restaurant with
zero values for any of the predictors. A priori reasoning for the relationship between
each predictor and the response variable are listed in the table below.
Predictor      Relationship with Response   Reason?
Seats-Inside   Positive   The larger the size of the restaurant, the greater the sales
Seats-Patio    Positive   The larger the size of the restaurant, the greater the sales
MedIncome      Positive   The higher the income of the potential customers, the higher the sales
MedAge         Positive   The older the potential customers, the higher the sales
BachDeg%       Positive   More education is positively correlated with higher income and therefore higher sales
Learning Objective: 12-2

13.28 74/5 = 14.8 > 10. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = 429.5114 − 1.8149Seats-Inside + 1.2719Seats-Patio − 2.1021MedIncome − 0.0158MedAge + 8.6604BachDeg%.
The signs on the coefficients do not match our a priori reasoning for Seats-Inside, MedIncome, and MedAge.
MegaStat Output:
Regression Analysis

R² 0.233
Adjusted R² 0.177 n 74
R 0.483 k 5
Std. Error 124.529 Dep. Var. Sales/SqFt

ANOVA table
Source SS df MS F p-value
Regression 320,276.8169 5 64,055.3634 4.13 .0025
Residual 1,054,515.7777 68 15,507.5850
Total 1,374,792.5946 73


Regression output confidence interval


variables coefficients std. error t (df=68) p-value 95% lower 95% upper
Intercept 429.5114 182.1907 2.357 .0213 65.9556 793.0672
Seats-Inside -1.8149 0.9975 -1.819 .0733 -3.8054 0.1757
Seats-Patio 1.2719 1.0614 1.198 .2350 -0.8462 3.3900
MedIncome -2.1021 1.0941 -1.921 .0589 -4.2853 0.0811
MedAge -0.0158 4.4891 -0.004 .9972 -8.9737 8.9420
BachDeg% 8.6604 2.6187 3.307 .0015 3.4348 13.8860

Learning Objective: 13-1


Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals for the
variables Seats-Inside, Seats-Patio, MedIncome, and MedAge all contain zero. Only
BachDeg% is a significant predictor in the model at a significance level of .05.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 68, t.025 = T.INV(.025,68) = ±1.995. Only BachDeg% has a significant result at α = .05, with tcalc = 3.307 > 1.995.
Learning Objective: 13-3

13.32 a. BachDeg%: p-value = .0015 < .05. Note that MedIncome and Seats-Inside are both significant at α = .10.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3

13.33 Fcalc = 4.13 with a p-value = .0025. R2 = .233 and R2adj = .177. The model provides
significant fit but does not provide strong prediction of restaurant sales/sqft.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.995(124.529) = ŷi ± 248.4354. No, this model does not have practical value because the interval is so wide.
Learning Objective: 12-9


13.35 a.

Seats-Inside Seats-Patio MedIncome MedAge BachDeg%


Seats-Inside 1.000
Seats-Patio .007 1.000
MedIncome -.047 -.009 1.000
MedAge -.102 -.065 .416 1.000
BachDeg% -.158 .151 .552 .097 1.000

74 sample size

± .229 critical value .05 (two-tail)


± .298 critical value .01 (two-tail)

b. Both MedAge and BachDeg% are significantly correlated with MedIncome. Collinearity could be a problem according to Klein's Rule because .552 > √.233 = .483.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
Seats-Inside 1.045
Seats-Patio 1.039
MedIncome 1.807
MedAge 1.267
BachDeg% 1.584

b. The VIF values are all under 2 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6

13.37 Observations 6, 19, 22, 46, and 69 would be considered unusual or outlier residuals.
Learning Objective: 13-8

13.38 Observation 14, 19, 23, and 69 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8


13.39 Assuming normally distributed residuals appears reasonable.

Learning Objective: 13-7


13.40 Assuming homoscedastic residuals appears reasonable. There is no obvious fan out or
funnel pattern in the plot below.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7


DATA SET C Response Variable: Medical Office Building Assessed Value

13.25 Cross-sectional. Unit of observation: office building.


Learning Objective: 02-3

13.26 The variable magnitudes are not too different. Floor space is approximately 1000 times the
magnitude of the other predictor variables but is similar to the response variable.
Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical to have a building with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable are listed in the table below.
Predictor   Relationship with Response   Reason?
Floor       Positive    Bigger size, higher value
Offices     Positive    More offices, higher value
Entrances   Positive    More entrances, higher value
Age         Negative    Increase in age means more maintenance, lower value
Freeway     Positive    Closer access to freeway, higher value
Learning Objective: 12-2

13.28 32/5 = 6.4. The data set meets Doane's Rule but not Evans'.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = −59.3894 + 0.2509Floor + 97.7927Offices +


72.8405Entrances − 0.4570Age + 116.1786Freeway. The signs on the coefficients
match our a priori reasoning.
MegaStat Output:
Regression Analysis

R² 0.967
Adjusted R² 0.961 n 32
R 0.983 k 5
Std. Error 90.189 Dep. Var. Assessed

ANOVA table
Source SS df MS F p-value
Regression 6,225,261.2561 5 1,245,052.2512 153.07 2.01E-18
Residual 211,486.6189 26 8,134.1007
Total 6,436,747.8750 31

Regression output confidence interval


variables coefficients std. error t (df=26) p-value 95% lower 95% upper
Intercept -59.3894 71.9826 -0.825 .4168 -207.3519 88.5730
Floor 0.2509 0.0218 11.494 1.08E-11 0.2060 0.2957


Offices 97.7927 30.8056 3.175 .0038 34.4709 161.1145


Entrances 72.8405 38.7501 1.880 .0714 -6.8114 152.4924
Age -0.4570 1.2011 -0.380 .7067 -2.9258 2.0118
Freeway 116.1786 34.7721 3.341 .0025 44.7036 187.6535
Learning Objective: 13-1
Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero for
the variables Entrances and Age. This means that Floor, Offices, and Freeway are
significant predictors in the model.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 26, t.025 = T.INV(.025, 26) = ±2.056. Floor, Offices, and Freeway all have significant results, with tcalc = 11.494, 3.175, and 3.341, respectively (all are greater than 2.056).
Learning Objective: 13-3

13.32 a. Floor: p-value = 1.08E-11, Offices: p-value = .0038, and Freeway: p-value = .0025. All p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3

13.33 Fcalc = 153.07 with a p-value = 2.01E-18. R2 = .967 and R2adj = .961. The model provides
significant fit with a very strong prediction of building assessed value.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.056(90.189) = ŷi ± 185.429. Yes, this model does have practical value. The prediction interval width should provide valuable information.
Learning Objective: 12-9


13.35 a.

Floor Offices Entrances Age Freeway


Floor 1.000
Offices .823 1.000
Entrances .567 .444 1.000
Age -.189 -.241 .136 1.000
Freeway -.331 -.368 -.082 .175 1.000

32 sample size

± .349 critical value .05 (two-tail)


± .449 critical value .01 (two-tail)

b. Both Offices and Entrances are significantly correlated with Floor. Collinearity is most likely not a problem according to Klein's Rule: the correlation coefficient values are less than √.967 = .983.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
Floor 3.757
Offices 3.267
Entrances 1.638
Age 1.169
Freeway 1.185

b. The VIF values are all under 4 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6

13.37 Building 5 was the only observation that had an unusual residual.
Learning Objective: 13-8

13.38 There were no high leverage values.


Learning Objective: 13-8

13.39 The histogram shows a slight left skewed distribution but is unimodal with no obvious
outliers. The normplot is a fairly straight line on the diagonal. Assuming normally
distributed residuals appears reasonable.


Learning Objective: 13-7


13.40 Assuming homoscedastic residuals appears reasonable. The residual plot does not show a
fan out or funnel pattern.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7


DATA SET D Response Variable: Percent Change in Consumer Price Index

13.25 Time-series. Unit of observation: one year.


Learning Objective: 02-3

13.26 The variable magnitudes are similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. While it might be possible to have 0% change in
currency demands and deposits, it would not be logical to have a zero unemployment
rate or zero utilization of manufacturing capacity. A priori reasoning for the
relationship between each predictor and the response variable are listed in the table
below.
Predictor   Relationship with Response   Reason?
CapUtil     Positive    Greater utilization, increase in CPI
ChgM1       Negative    Increase in deposits, CPI stays stable
ChgM2       Positive    Increase in small deposits, CPI increases
Unem        Positive    Unemployment increases, CPI increases
Learning Objective: 12-2

13.28 41/4 = 10.25. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = −25.2195 + 0.2806CapUtil − 0.0847ChgM1 +


0.2205ChgM2 + 1.0511Unem. The signs on the coefficients match our a priori
reasoning.
MegaStat Output:
Regression Analysis

R² 0.225
Adjusted R² 0.139 n 41
R 0.474 k 4
Std. Error 2.623 Dep. Var. ChgCPI

ANOVA table
Source SS df MS F p-value
Regression 71.8691 4 17.9673 2.61 .0514
Residual 247.5957 36 6.8777
Total 319.4649 40


Regression output confidence interval


variables coefficients std. error t (df=36) p-value 95% lower 95% upper
Intercept -25.2195 11.7919 -2.139 .0393 -49.1346 -1.3044
CapUtil 0.2806 0.1258 2.231 .0320 0.0255 0.5357
ChgM1 -0.0847 0.1117 -0.758 .4531 -0.3112 0.1418
ChgM2 0.2205 0.1383 1.594 .1197 -0.0601 0.5011
Unem 1.0511 0.4086 2.572 .0144 0.2224 1.8798
Learning Objective: 13-1
Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals for ChgM1 and ChgM2 both contain zero. This means that neither of these predictors is significant at α = .05. The confidence intervals for CapUtil and Unem do not contain zero, so both predictors are significant at the .05 level.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 36, t.025 = T.INV(.025, 36) = ±2.028. CapUtil has tcalc = 2.231 and Unem has tcalc = 2.572; both are greater than 2.028, which indicates significance.
Learning Objective: 13-3

13.32 a. CapUtil: p-value = .0320 and Unem: p-value = .0144. Both are significant at α = .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it tells the strength of the predictor significance.
Learning Objective: 13-3

13.33 Fcalc = 2.61 with a p-value = .0514. R2 = .225 and R2adj = .139. The model does not provide
significant fit at α = .05.
Learning Objective: 13-2
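The F statistic quoted above can be reconstructed from the ANOVA table: each mean square is SS/df, and Fcalc is the ratio of the regression mean square to the residual mean square.

```python
# Rebuild F_calc from the ANOVA sums of squares (values from the output).
ss_reg, df_reg = 71.8691, 4
ss_res, df_res = 247.5957, 36
ms_reg = ss_reg / df_reg      # mean square for regression
ms_res = ss_res / df_res      # mean square error
f_calc = ms_reg / ms_res
print(round(f_calc, 2))  # 2.61, matching the reported F statistic
```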

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.028(2.623) = ŷi ± 5.319. Based on the response to
question 13.33 and the wide prediction interval, no, this model does not have practical
value.
Learning Objective: 12-9
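The quick interval used throughout these answers is ŷi ± t.025·se. As a hedged sketch (this ignores the extra term for an x-vector far from the means, exactly as the textbook's quick rule does):

```python
# Approximate 95% prediction interval half-width: t.025 * standard error.
t_crit = 2.028    # t.025 with df = 36
std_err = 2.623   # model standard error from the MegaStat output
half_width = t_crit * std_err
print(round(half_width, 3))  # 5.319
```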

13.35 a.

CapUtil ChgM1 ChgM2 Unem


CapUtil 1.000
ChgM1 -.257 1.000
ChgM2 -.284 .316 1.000
Unem -.649 .504 .303 1.000

41 sample size
± .308 critical value .05 (two-tail)
± .398 critical value .01 (two-tail)

b. Unemployment is significantly correlated with manufacturing capacity utilization and
with ChgM1. Collinearity could be a problem according to Klein's Rule: both
.649 and .504 are greater than √.225 = .474.
Learning Objective: 13-6
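Klein's Rule as applied here compares each pairwise correlation against the multiple correlation R = √R². A minimal sketch using the values from this output:

```python
# Klein's Rule: a pairwise |r| above R = sqrt(R^2) flags possible collinearity.
import math

r_squared = 0.225
multiple_r = math.sqrt(r_squared)  # about 0.474
pairwise = {("Unem", "CapUtil"): -0.649, ("Unem", "ChgM1"): 0.504}
flagged = [pair for pair, r in pairwise.items() if abs(r) > multiple_r]
print(round(multiple_r, 3), flagged)  # 0.474; both pairs exceed it
```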

13.36 a.
variables VIF
Intercept
CapUtil 1.785
ChgM1 1.420
ChgM2 1.171
Unem 2.192

b. The VIF values are all under 3 which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6
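A VIF comes from regressing predictor j on the other predictors: VIF_j = 1/(1 − R_j²). Inverting that relation shows what R_j² a reported VIF implies (the helper names below are ours):

```python
# VIF and the auxiliary-regression R^2 are two views of the same quantity.
def vif_from_r2(r2_j):
    return 1.0 / (1.0 - r2_j)

def r2_from_vif(vif_j):
    return 1.0 - 1.0 / vif_j

# The largest VIF here (Unem, 2.192) implies an auxiliary R^2 of about .544,
# i.e. mild but not severe overlap with the other predictors.
print(round(r2_from_vif(2.192), 3))
```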

13.37 1974, 1979, and 1980 were years that had unusual or outlier residuals.
Learning Objective: 13-8
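A common screening rule for residuals (a hedged sketch, not necessarily MegaStat's exact computation) calls a standardized residual "unusual" when |e/se| > 2 and an "outlier" when |e/se| > 3. The residual values below are hypothetical, chosen only to exercise the rule:

```python
# Flag observations by standardized residual magnitude.
def flag_residuals(residuals, s_e):
    flags = {}
    for label, e in residuals.items():
        z = e / s_e  # rough standardization by the model standard error
        if abs(z) > 3:
            flags[label] = "outlier"
        elif abs(z) > 2:
            flags[label] = "unusual"
    return flags

# Hypothetical residuals (not the actual data) with s_e = 2.623:
print(flag_residuals({1974: 6.1, 1979: -5.7, 1980: 8.2, 1985: 1.0}, 2.623))
```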

13.38 1992 and 2001 were high leverage years. To determine if any of the observations were
influential, remove them from the data set and rerun the regression. If the regression
statistics change significantly then the observation could be considered influential.
Learning Objective: 13-8
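For leverage, a common rule of thumb (one of several; texts vary) flags observation i when its hat value h_i exceeds 2(k + 1)/n, or 3(k + 1)/n for a stricter cutoff:

```python
# Leverage cutoff: multiplier * (k + 1) / n, with multiplier 2 or 3.
def leverage_threshold(n, k, multiplier=2):
    return multiplier * (k + 1) / n

# For this model, n = 41 observations and k = 4 predictors:
print(round(leverage_threshold(41, 4), 3))  # 0.244
```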

13.39 The normal probability plot is not as straight a line as one would like to see. The
histogram of residuals is right skewed with several possible outliers. Assuming
normally distributed residuals is questionable.

Learning Objective: 13-7

13.40 Assuming homoscedastic residuals appears reasonable although one might question the
slight increase in residual magnitude for the predictions of greater positive change.

Learning Objective: 13-7

13.41 A test for autocorrelation is warranted. DW = 0.75 which suggests positive autocorrelation.

Learning Objective: 13-7
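The Durbin-Watson statistic quoted above can be computed directly from a residual series: DW = Σ(e_t − e_{t−1})² / Σe_t², with values well below 2 suggesting positive autocorrelation. A sketch on toy residuals (not the actual series):

```python
# Durbin-Watson statistic for a time-ordered residual list.
def durbin_watson(e):
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    den = sum(x * x for x in e)
    return num / den

# A smoothly drifting (positively autocorrelated) toy series gives a small DW;
# a sign-alternating series gives a large one.
print(round(durbin_watson([1.0, 1.1, 1.2, 1.1, 1.0, 0.9]), 3))
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```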

DATA SET E Response Variable: College Graduation Rate by State

13.25 Cross-sectional. Unit of observation: state.


Learning Objective: 02-3

13.26 The variable magnitudes are not too different. Education spending by state is
approximately 100 times the magnitude of the other variables but this should not
cause a problem.
Learning Objective: 13-9

13.27 The intercept would not have meaning. None of the quantitative predictor variables would
logically be zero. A priori reasoning for the relationship between each predictor and
the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Dropout     Negative   Higher dropout rate, lower college graduation rate
EdSpend     Positive   Higher spending, higher college graduation rate
Metro%      Positive   Greater urban population, higher college graduation rate
Age         Negative   Older population, fewer attending college
LPRFem      Positive   More women in workforce, more college graduates
Neast       Positive   With Midwest as the base, more college graduates in the northeast
Seast       Positive   With Midwest as the base, more college graduates in the southeast
West        Positive   With Midwest as the base, more college graduates in the west
Learning Objective: 12-2

13.28 50/8 = 6.25 > 5. The data set meets Doane's Rule but does not meet Evans'.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = −21.7629 − 0.2579Dropout + 0.0025EdSpend +
0.2036Metro% − 0.1458Age + 0.5477LPRFem + 5.26Neast + 2.0008Seast +
2.6378West. The signs on the coefficients match our a priori reasoning.
MegaStat Output:
Regression Analysis

R² 0.692
Adjusted R² 0.632 n 50
R 0.832 k 8
Std. Error 3.099 Dep. Var. ColGrad%

ANOVA table
Source SS df MS F p-value
Regression 885.4526 8 110.6816 11.53 2.16E-08
Residual 393.7026 41 9.6025
Total 1,279.1552 49

Regression output confidence interval
variables coefficients std. error t (df=41) p-value 95% lower 95% upper
Intercept -21.7629 21.9424 -0.992 .3271 -66.0766 22.5507
Dropout -0.2579 0.1846 -1.398 .1697 -0.6307 0.1148
EdSpend 0.0025 0.0018 1.396 .1703 -0.0011 0.0062
Metro% 0.2036 0.0562 3.621 .0008 0.0900 0.3171
AgeMed -0.1458 0.2730 -0.534 .5963 -0.6971 0.4056
LPRFem 0.5477 0.1880 2.913 .0058 0.1680 0.9274
Neast 5.2600 1.5626 3.366 .0017 2.1042 8.4157
Seast 2.0008 1.7252 1.160 .2529 -1.4834 5.4849
West 2.6378 1.3417 1.966 .0561 -0.0718 5.3474

Learning Objective: 13-1


Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables Metro%, LPRFem, and Neast. These three variables are the
only predictors significant at α = .05.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 41,
t.025 = T.INV(.025, 41) = ±2.020. The t statistics for Metro%, LPRFem, and Neast are
3.621, 2.913, and 3.366, respectively. Each value is greater than 2.02.
Learning Objective: 13-3

13.32 a. Metro%: p-value =.0008, LPRFem: p-value = .0058, and Neast: p-value = .0017.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 11.53 with a p-value = 2.16E-08. R2 = .692 and R2adj = .632. The model provides
significant fit with a fairly strong prediction of state college graduation rates.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.020(3.099) = ŷi ± 6.26. Yes, this model does have
practical value.

Learning Objective: 12-9

13.35 a.

          Dropout  EdSpend  Metro%  AgeMed  LPRFem
Dropout    1.000
EdSpend    -.323    1.000
Metro%      .156     .183   1.000
AgeMed     -.101     .119   -.274   1.000
LPRFem     -.741     .228   -.291   -.002   1.000

50 sample size
± .279 critical value .05 (two-tail)
± .361 critical value .01 (two-tail)

b. The only two variables that have an r value that might raise a red flag are LPRFem
and Dropout. However, this should not be a problem according to Klein's Rule
because .741 < √.692 = .832.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
Dropout 2.637
EdSpend 1.447
Metro% 1.684
AgeMed 1.886
LPRFem 3.174
Neast 2.182
Seast 2.827
West 1.890

b. The VIF values are all 3 or less (except for LPRFem, which is slightly greater than 3),
which suggests that multicollinearity has not caused instability.
Learning Objective: 13-6

13.37 Delaware and Wyoming are the only two states that showed unusual or outlier residuals.
Learning Objective: 13-8

13.38 Utah and West Virginia had high leverage values. To determine if any of the observations
were influential, remove them from the data set and rerun the regression. If the
regression statistics change significantly then the observation could be considered
influential.
Learning Objective: 13-8

13.39 Although the normal probability plot is not as straight a line as one would like and the
histogram is slightly skewed to the left, the distribution is unimodal with tapering
tails and there are no obvious outliers. Assuming normally distributed residuals
appears reasonable.

Learning Objective: 13-7

13.40 Assuming homoscedastic residuals is not reasonable. There is a clear fan out pattern. It
appears that the residual variation increases as the percentage of a state’s population
living in metropolitan areas increases. Taking a log transform of the dependent
variable would not be a good solution here because the magnitudes of the variables
are similar. There may be lurking variables we could add to the model to correct this
problem.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

DATA SET F Response Variable: Cruise Speed of Piston Aircraft

13.25 Cross-sectional. Unit of observation: aircraft model.


Learning Objective: 02-3

13.26 The variable magnitudes are all similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical to have an aircraft with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below. Note that the
variable Age is calculated by subtracting the year of manufacture from 2010.
Predictor   Relationship with Response   Reason
Age         Negative   Older engine, lower speed
TotalHP     Positive   Bigger engine, higher speed
NumBlades   Positive   More blades, higher speed
Turbo       Positive   Stronger engine, higher speed
Learning Objective: 12-2

13.28 55/4 = 13.75. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = 92.4431 + 0.1787TotalHP + 8.8269NumBlades +
15.9752Turbo − 0.3927Age. The signs on the coefficients match our a priori
reasoning.
MegaStat Output:
Regression Analysis
R² 0.768
Adjusted R² 0.750 n 55
R 0.876 k 4
Std. Error 18.097 Dep. Var. Cruise
ANOVA table
Source SS df MS F p-value
Regression 54,232.9050 4 13,558.2262 41.40 2.75E-15
Residual 16,375.2041 50 327.5041
Total 70,608.1091 54
Regression output confidence interval
variables coefficients std. error t (df=50) p-value 95% lower 95% upper
Intercept 92.4431 13.2196 6.993 6.16E-09 65.8907 118.9955
TotalHP 0.1787 0.0195 9.167 2.76E-12 0.1396 0.2179
NumBlades 8.8269 5.7530 1.534 .1313 -2.7284 20.3823
Turbo 15.9752 6.2959 2.537 .0143 3.3296 28.6208
Age -0.3927 0.1991 -1.972 .0541 -0.7927 0.0073

Learning Objective: 13-1


Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables TotalHP and Turbo. Only TotalHP and Turbo are significant
predictors at α = .05.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 50,
t.025 = T.INV(.025, 50) = ±2.009. The predictor TotalHP has tcalc = 9.167 and Turbo
has tcalc = 2.537; both are greater than 2.009, so we reject the null hypotheses and
conclude their coefficients are not equal to zero.
Learning Objective: 13-3

13.32 a. TotalHP: p-value = 2.76E-12 and Turbo: p-value = .0143. Both p-values are less than
.05. Note that the variable Age has a p-value = .0541, which is significant at α = .10.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 41.40 with a p-value = 2.75E-15. R2 = .768 and R2adj = .750. The model provides
significant fit with a fairly strong prediction of aircraft cruising speed.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.009(18.097) = ŷi ± 36.357. Yes, this model does
have practical value.
Learning Objective: 12-9

13.35 a.

            TotalHP  NumBlades  Turbo   Age
TotalHP      1.000
NumBlades     .491     1.000
Turbo         .096      .388    1.000
Age           .154     -.180    -.030   1.000

55 sample size
± .266 critical value .05 (two-tail)
± .345 critical value .01 (two-tail)

b. Collinearity should not be a problem according to Klein's Rule. The correlation
coefficient values are all less than √R² = .876.
Learning Objective: 13-6
13.36 a.
variables VIF
Intercept
Age 1.131
TotalHP 1.459
NumBlades 1.716
Turbo 1.201

b. The VIF values are all under 2, which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6

13.37 Aircraft 23, 39 and 46 have unusual residual values.


Learning Objective: 13-8

13.38 Observations 3, 8, 43, and 46 have high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8

13.39 Assuming normally distributed residuals appears reasonable.

Learning Objective: 13-7

13.40 Assuming homoscedastic residuals appears reasonable.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

DATA SET G Response Variable: Chromatographic Retention Time

13.25 Cross-sectional. Unit of observation: chemical compound.


Learning Objective: 02-3

13.26 The variable magnitudes are all similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical for a particular compound
to have zero values for any of the predictors. A priori hypotheses for the relationship
between each predictor and the response variable are listed in the table below.
Predictor   Relationship with Response
MW          Negative
BP          Positive
RI          Negative
H1          Negative
H2          Negative
H3          Negative
H4          Negative
Learning Objective: 12-2

13.28 35/7 = 5. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = 51.3827 − 0.1772MW + 1.4901BP − 13.1620RI −
13.8067H1 − 6.4334H2 − 12.2297H3 − 0.5823H4. The signs on the coefficients
match our a priori reasoning.

MegaStat Output:
Regression Analysis

R² 0.987
Adjusted R² 0.983 n 35
R 0.993 k 7
Std. Error 8.571 Dep. Var. Ret

ANOVA table
Source SS df MS F p-value
Regression 146,878.2005 7 20,982.6001 285.64 1.27E-23
Residual 1,983.3648 27 73.4580
Total 148,861.5653 34

Regression output confidence interval


variables coefficients std. error t (df=27) p-value 95% lower 95% upper
Intercept 51.3827 162.7418 0.316 .7546 -282.5359 385.3012
MW -0.1772 0.3083 -0.575 .5701 -0.8097 0.4553
BP 1.4901 0.1831 8.139 9.64E-09 1.1144 1.8657
RI -13.1620 107.2293 -0.123 .9032 -233.1784 206.8544
H1 -13.8067 9.7452 -1.417 .1680 -33.8022 6.1888
H2 -6.4334 8.6848 -0.741 .4652 -24.2531 11.3864
H3 -12.2297 8.1138 -1.507 .1434 -28.8779 4.4184
H4 -0.5823 4.8499 -0.120 .9053 -10.5335 9.3689
Learning Objective: 13-1
Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variable BP (Boiling Point). Only BP is a significant predictor at
α = .05.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 27,
t.025 = T.INV(.025, 27) = ±2.052. The predictor BP has tcalc = 8.139, which exceeds 2.052.
Learning Objective: 13-3

13.32 a. BP: p-value = 9.64E-09.


b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 285.64 with a p-value = 1.27E-23. R2 = .987 and R2adj = .983. The model provides
significant fit with a fairly strong prediction of compound retention time. However,
residual assumptions should be verified.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.052(8.571) = ŷi ± 17.588. Yes, this model does have
practical value, provided the residual assumptions are verified.
Learning Objective: 12-9

13.35 a.

MW BP RI H1 H2 H3 H4
MW 1.000
BP .906 1.000
RI .240 .580 1.000
H1 .065 -.218 -.747 1.000
H2 -.233 -.336 -.202 -.194 1.000
H3 -.214 -.185 -.145 -.194 -.094 1.000
H4 .117 .167 .202 -.316 -.153 -.153 1.000

35 sample size
± .334 critical value .05 (two-tail)
± .430 critical value .01 (two-tail)

b. Collinearity could be a problem. The correlation coefficient for BP and MW (.906) is
close to √.987 = .9935.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
MW 21.409
BP 31.113
RI 13.115
H1 9.235
H2 2.816
H3 2.458
H4 1.793

b. The VIF values for MW, BP, RI, and H1 are all high. It is possible that they are causing
variance inflation which could cause some instability. If the regression analysis is
rerun using only BP as the predictor variable, very little changes in the fit statistics
and the scatterplot shows a very strong linear relationship.
Learning Objective: 13-6

13.37 Observations 15, 17, and 25 have unusual residual values.


Learning Objective: 13-8

13.38 Observation 24 has a high leverage value. To determine if any of the observations were
influential, remove them from the data set and rerun the regression. If the regression
statistics change significantly then the observation could be considered influential.
Learning Objective: 13-8

13.39 While the normplot and histogram are not a perfect representation of a normal distribution,
there are no strong departures from normality. Assuming normally distributed
residuals appears reasonable.

Learning Objective: 13-7

13.40 The model appears to underestimate retention at the low end and the high end of the range
of values. It might be prudent to explore a nonlinear relationship with retention and
BP.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

DATA SET H Response Variable: 2007 State Foreclosure Rate

13.25 Cross-sectional. Unit of observation: state.


Learning Objective: 02-3

13.26 The variable magnitudes are all similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical for a particular state to
have zero values for any of the predictors. A priori hypotheses for the relationship
between each predictor and the response variable are listed in the table below.
Predictor          Relationship with Response
MassLayoff         Positive
SubprimeShare      Positive
PriceIncomeRatio   Positive
Homeownership      Negative
5YrApp             Positive
UnempChange        Positive
%HousMoved         Positive
Learning Objective: 12-2

13.28 50/7 = 7.14. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = 51.2829 + 0.0751MassLayoff +
18.5385SubprimeShare − 0.6965PriceIncomeRatio − 0.0587Homeownership +
0.0441(5YrApp) + 16.4618UnempChange − 99.2433(%HousMoved). The signs on
the coefficients match our a priori reasoning except for PriceIncomeRatio and
%HousMoved.

MegaStat Output:
Regression Analysis

R² 0.739
Adjusted R² 0.696 n 50
R 0.860 k 7
Std. Error 3.732 Dep. Var. Foreclosure

ANOVA table
Source SS df MS F p-value
Regression 1,657.8345 7 236.8335 17.01 2.01E-10
Residual 584.9417 42 13.9272
Total 2,242.7762 49

Regression output confidence interval


variables coefficients std. error t (df=42) p-value 95% lower 95% upper
Intercept 51.2829 13.8099 3.713 .0006 23.4134 79.1524
MassLayoff 0.0751 0.2056 0.365 .7167 -0.3398 0.4900
SubprimeShare 18.5385 17.7785 1.043 .3030 -17.3400 54.4169
PriceIncomeRatio -0.6965 0.7438 -0.936 .3544 -2.1976 0.8045
Homeownership -0.0587 0.1309 -0.448 .6561 -0.3228 0.2054
5YrApp 0.0441 0.0343 1.284 .2062 -0.0252 0.1134
UnempChange 16.4618 7.7223 2.132 .0389 0.8775 32.0461
%HousMoved -99.2433 14.3588 -6.912 1.94E-08 -128.2206 -70.2661
Learning Objective: 13-1
Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The coefficient confidence intervals contain zero
except for the variables UnempChange and %HousMoved. These variables were the
only two significant at α = .05.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 42,
t.025 =T.INV(.025,42) =±2.018. The predictors UnempChange and %HousMoved have
tstatistics equal to 2.132 > 2.018 and 6.912<2.018, respectively.
Learning Objective: 13-3

13.32 a. UnempChange: p-value = .0389 and %HousMoved: p-value = 1.94E-08.


b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 17.01 with a p-value = 2.01E-10. R2 = .739 and R2adj = .696. The model provides
significant fit with a fairly strong prediction of foreclosure rates. However, residual
assumptions should be verified.

Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.018(3.732) = ŷi ± 7.531. Yes, this model does have
practical value, provided the residual assumptions are verified.
Learning Objective: 12-9

13.35 a.
                   MassLayoff  SubprimeShare  PriceIncomeRatio  Homeownership  5YrApp  UnempChange  %HousMoved
MassLayoff           1.000
SubprimeShare        -.022       1.000
PriceIncomeRatio      .073       -.119          1.000
Homeownership        -.045        .130          -.501            1.000
5YrApp                .045       -.105           .786            -.434         1.000
UnempChange           .280        .124           .200            -.066          .211    1.000
%HousMoved           -.144       -.507          -.314             .229         -.119    -.143        1.000

50 sample size
± .279 critical value .05 (two-tail)
± .361 critical value .01 (two-tail)

b. Collinearity might be a problem. There are several pairs of variables that have
significant correlation. In particular, 5YrApp and PriceIncomeRatio have an r (.786)
close in value to √.739 = .860.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
MassLayoff 1.123
SubprimeShare 1.653
PriceIncomeRatio 3.447
Homeownership 1.405
5YrApp 2.868
UnempChange 1.170
%HousMoved 1.901

b. The VIF values are all 4 or less. Concern about multicollinearity is not high.
Learning Objective: 13-6

13.37 Colorado, Nevada, and Vermont have outlier/unusual residual values.


Learning Objective: 13-8

13.38 California, Mississippi, Nevada, and Vermont have high leverage values. To determine if
any of the observations were influential, remove them from the data set and rerun the
regression. If the regression statistics change significantly then the observation could
be considered influential.
Learning Objective: 13-8

13.39 Residuals do not appear to be normally distributed. A histogram shows a right skewed
distribution.

Learning Objective: 13-7

13.40 The residual plot shows some indication that the model is overestimating foreclosure rates
in the middle range of values. A plot of residuals against unemployment shows
heteroscedasticity.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

DATA SET I Response Variable: Body Fat %

13.25 Cross-sectional. Unit of observation: an individual male.


Learning Objective: 02-3

13.26 The variable magnitudes are similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. A priori logic for the relationship between each
predictor and the response variable is listed in the table below.
Predictor   Relationship with Response
Age         Positive
Weight      Positive
Height      Neutral
Neck        Positive
Chest       Positive
Abdomen     Positive
Hip         Positive
Thigh       Positive

13.28 50/8 = 6.25. The data set meets Doane’s Rule but not Evans’.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = −35.4309 + 0.0905Age − 0.1928Weight −
0.0642Height − 0.3348Neck + 0.0239Chest + 0.9132Abdomen − 0.3107Hip +
0.7787Thigh. The signs on the coefficients match our a priori logic for Age, Chest,
Abdomen, and Thigh only.
MegaStat Output:
Regression Analysis

R² 0.841
Adjusted R² 0.810 n 50
R 0.917 k 8
Std. Error 3.957 Dep. Var. Fat%

ANOVA table
Source SS df MS F p-value
Regression 3,399.1446 8 424.8931 27.14 4.82E-14
Residual 641.8882 41 15.6558
Total 4,041.0328 49

Regression output confidence interval


variables coefficients std. error t (df=41) p-value 95% lower 95% upper
Intercept -35.4309 24.9040 -1.423 .1624 -85.7256 14.8638
Age 0.0905 0.0880 1.028 .3099 -0.0872 0.2682
Weight -0.1928 0.0783 -2.462 .0181 -0.3510 -0.0346
Height -0.0642 0.1160 -0.554 .5827 -0.2984 0.1700
Neck -0.3348 0.4023 -0.832 .4100 -1.1472 0.4776
Chest 0.0239 0.1788 0.133 .8945 -0.3373 0.3850
Abdomen 0.9132 0.1640 5.570 1.77E-06 0.5821 1.2444
Hip -0.3107 0.2749 -1.130 .2649 -0.8658 0.2445
Thigh 0.7787 0.2907 2.678 .0106 0.1915 1.3658
Learning Objective: 13-1
Learning Objective: 13-3

13.30 Refer to the output in question 13.29. The 95% coefficient confidence intervals contain
zero except for the variables Weight, Abdomen, and Thigh. This means that these three
variables are the only significant predictors in the model.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 41,
t.025 = T.INV(.025, 41) = ±2.020. Weight, Abdomen, and Thigh have t statistics of
−2.462, 5.570, and 2.678; each |t| exceeds 2.02, so all three are significant predictors.
Learning Objective: 13-3

13.32 a. Weight: p-value = .0181, Abdomen: p-value = 1.77E-06, and Thigh: p-value = .0106.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 27.14 with a p-value = 4.82E-14. R2 = .841 and R2adj = .810. The model provides
significant fit with a fairly strong prediction of percent body fat.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 2.020(3.957) = ŷi ± 7.993. Yes, this model does have
practical value.
Learning Objective: 12-9

13.35 a.

Age Weight Height Neck Chest Abdomen Hip


Age 1.000
Weight .265 1.000
Height -.276 .109 1.000
Neck .176 .882 .201 1.000
Chest .376 .912 .014 .820 1.000
Abdomen .442 .915 -.052 .781 .942 1.000
Hip .314 .959 -.045 .804 .911 .942 1.000

50 sample size
± .279 critical value .05 (two-tail)
± .361 critical value .01 (two-tail)

b. According to Klein's Rule there are many pairs of variables that raise concern about
collinearity. Several correlation coefficient values are greater than √.841 = .917.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
Age 1.712
Weight 31.111
Height 1.689
Neck 5.472
Chest 11.275
Abdomen 17.714
Hip 25.899
Thigh 11.931

b. The VIF values are high except for Age, Height, and Neck. Multicollinearity is a
concern.
Learning Objective: 13-6

13.37 There are no unusual standardized residuals.


Learning Objective: 13-8

13.38 Observation 5, 15, 36, 39, 42 had high leverage values. To determine if any of the
observations were influential, remove them from the data set and rerun the regression.
If the regression statistics change significantly then the observation could be
considered influential.
Learning Objective: 13-8

13.39 Assuming normally distributed residuals appears reasonable although the histogram
appears slightly right skewed.

Learning Objective: 13-7

13.40 Assuming homoscedastic residuals appears reasonable.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

DATA SET J Response Variable: Used Vehicle Price

13.25 Cross-sectional. Unit of observation: vehicle model.


Learning Objective: 02-3

13.26 The variable magnitudes are different. The response variable magnitude is in tens of
thousands and the predictor variables are integers between 0 and 50.
Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical to have a car with zero
values for any of the predictors. A priori reasoning for the relationship between each
predictor and the response variable is listed in the table below.
Predictor   Relationship with Response   Reason
Age         Negative   Older car, lower price
Car         Negative   Cars are less expensive than Vans, the base indicator variable
Truck       Positive   Trucks are more expensive than Vans, the base indicator variable
SUV         Positive   SUVs are more expensive than Vans, the base indicator variable
Learning Objective: 12-2

13.28 637/4 = 159.25. The data set meets both Evans’ and Doane’s Rules.
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = 15,340.7233 − 693.9768Age − 533.5731Car +
5,748.1799Truck + 3,897.5375SUV. The signs on the coefficients match our a priori
reasoning.
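As an illustration, the fitted equation can be evaluated for a hypothetical vehicle; the function name and the example inputs are ours, not from the data set.

```python
# Fitted used-vehicle price from the estimated equation (coefficients from
# the MegaStat output; Car, Truck, SUV are 0/1 indicators with Van as base).
def predicted_price(age, car, truck, suv):
    return (15340.7233 - 693.9768 * age - 533.5731 * car
            + 5748.1799 * truck + 3897.5375 * suv)

# Hypothetical 5-year-old SUV (Car = 0, Truck = 0, SUV = 1):
print(round(predicted_price(5, 0, 0, 1), 2))  # 15768.38
```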

MegaStat Output:
Regression Analysis

R² 0.139
Adjusted R² 0.134 n 637
R 0.373 k 4
Std. Error 8573.178 Dep. Var. Price

ANOVA table
Source SS df MS F p-value
Regression 7,512,691,865.5390 4 1,878,172,966.3848 25.55 1.20E-19
Residual 46,451,606,047.1737 632 73,499,376.6569
Total 53,964,297,912.7127 636

Regression output confidence interval


variables coefficients std. error t (df=632) p-value 95% lower 95% upper
Intercept 15,340.7233 1,239.0560 12.381 1.12E-31 12,907.5585 17,773.8881
Age -693.9768 117.6801 -5.897 6.02E-09 -925.0680 -462.8855
Car -533.5731 1,225.8598 -0.435 .6635 -2,940.8241 1,873.6780
Truck 5,748.1799 1,318.6111 4.359 1.52E-05 3,158.7909 8,337.5689
SUV 3,897.5375 1,315.4861 2.963 .0032 1,314.2852 6,480.7899
Learning Objective: 13-1
Learning Objective: 13-3
Learning Objective: 13-5

13.30 Refer to the output in question 13.29. The 95% coefficient confidence interval for the
indicator variable Car contains zero. The other three variables are significant
predictors in the model.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 632,
t.025 = T.INV(.025, 632) = ±1.964. Age, Truck, and SUV have t statistics of −5.897,
4.359, and 2.963, respectively; each |t| exceeds 1.964.
Learning Objective: 13-3

13.32 a. Age: p-value = 6.02E-09, Truck: p-value = 1.52E-05, and SUV: p-value = .0032. All
three p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests conclude the same thing. Most analysts prefer the p-value approach because
it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 25.55 with a p-value = 1.20E-19. R2 = .139 and R2adj = .134. The model provides
significant fit but the model is not a strong predictive equation.
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.964(8573.178) = ŷi ± 16,837.722. No, this model
does not have practical value.
Learning Objective: 12-9

13.35 a.
Correlation Matrix

Age Car Truck SUV


Age 1.000
Car .003 1.000
Truck .017 -.478 1.000
SUV -.092 -.495 -.308 1.000

637 sample size
± .078 critical value .05 (two-tail)
± .102 critical value .01 (two-tail)

b. There is no concern for collinearity. While there is significant correlation between the
indicator variables, this is to be expected because of the way they are defined. We
would expect to see correlation between indicator variables defined on the same
characteristic.
Learning Objective: 13-6

13.36 a.
variables VIF
Intercept
Age 1.017
Car 3.201
Truck 2.662
SUV 2.749

b. The VIF values are all 4 or less which suggests that multicollinearity has not caused
instability.
Learning Objective: 13-6

13.37 Vehicles 212, 342, and 502 had extremely high outlier residuals. Vehicles 246, 397, 631,
and 632 had unusual/outlier residuals. The next step would be to investigate the
three high outlier residuals for possible exclusion from the data set. It is possible their
prices were mistyped or the vehicles do not fit the profile of a vehicle for which the
model is being developed.
Learning Objective: 13-8

13.38 There were many observations with high leverage, too many to list here. To determine if
any of the observations were influential, remove them from the data set and rerun the
regression. If the regression statistics change significantly then the observation could
be considered influential.

Learning Objective: 13-8

13.39 Residuals do not appear to be normally distributed. Outliers should be investigated for
possible removal from data set.

Learning Objective: 13-7

13.40 Assuming homoscedastic residuals is not reasonable. There are obviously outliers in the
data set. Removing the outliers and rerunning the analysis will most likely show
heteroscedasticity in the residual plot.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

DATA SET K Response Variable: 500 yard freestyle time

13.25 Cross-sectional. Unit of observation: individual swimmer.


Learning Objective: 02-3

13.26 The variable magnitudes are similar.


Learning Objective: 13-9

13.27 The intercept would not have meaning. It would not be logical to have a freestyle time
when the age of the swimmer is zero. The a priori reasoning for the relationship
between each predictor and the response variable is listed in the table below.

Predictor   Relationship with Response   Reason
Seed        Positive                     The lowest seeded swimmers should have the lowest times, and vice versa.
Gender      Positive                     If Gender = 1 indicates female, it is possible the women have slower times than men.
Age         Positive                     The older a swimmer is, the slower their times will be.
Learning Objective: 12-2

13.28 198/3 = 66. The data set meets both Evans’ and Doane’s Rules.
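The ratio check behind these rules of thumb (Evans' Rule: n/k ≥ 10; Doane's Rule: n/k ≥ 5) can be sketched as:

```python
# Sample-size rules of thumb for multiple regression:
# Evans' Rule suggests n/k >= 10; Doane's Rule suggests n/k >= 5.
n, k = 198, 3
ratio = n / k
print(ratio)         # 66.0
print(ratio >= 10)   # Evans' Rule met -> True
print(ratio >= 5)    # Doane's Rule met -> True
```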
Learning Objective: 13-1

13.29 The estimated regression equation is ŷ = −35.3401 + 0.9286Seed + 13.4401Gender +
0.8105Age. The signs on the coefficients match our a priori reasoning.
MegaStat Output:
Regression Analysis
R²             0.942    n            198
Adjusted R²    0.941    k              3
R              0.970    Dep. Var.   Time
Std. Error    39.015

ANOVA table
Source          SS              df    MS              F         p-value
Regression      4,755,790.7205    3   1,585,263.5735  1041.45   2.64E-119
Residual          295,300.5055  194       1,522.1676
Total           5,051,091.2260  197

Regression output                                              confidence interval
variables    coefficients   std. error   t (df=194)   p-value   95% lower   95% upper
Intercept -35.3401 21.4123 -1.650 .1005 -77.5708 6.8907
Seed 0.9286 0.0251 36.985 8.29E-90 0.8791 0.9781
Gender 13.4401 6.6282 2.028 .0440 0.3675 26.5126
Age 0.8105 0.4042 2.005 .0463 0.0134 1.6076
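A prediction sketch using the fitted coefficients from the output above; the example swimmer (seed time 350, female, age 40) is purely illustrative:

```python
# Fitted coefficients from the regression output above.
b0, b_seed, b_gender, b_age = -35.3401, 0.9286, 13.4401, 0.8105

def predict_time(seed, gender, age):
    # yhat = b0 + b1*Seed + b2*Gender + b3*Age
    return b0 + b_seed * seed + b_gender * gender + b_age * age

# Illustrative swimmer: seed 350, Gender = 1 (female), age 40
print(round(predict_time(350, 1, 40), 2))  # 335.53
```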

Learning Objective: 13-1


Learning Objective: 13-3
Learning Objective: 13-5

13.30 Refer to the output in question 13.29. None of the 95% confidence intervals for the three
coefficients contains zero. All three variables are significant predictors in the
model.
Learning Objective: 13-4

13.31 For each coefficient test the following hypotheses: H0: βj = 0 vs. H1: βj ≠ 0. Using df = 194,
t.025 = T.INV.2T(.05, 194) = 1.972. All variables have |t| statistics greater than
1.972; therefore all predictors are significant.
Learning Objective: 13-3

13.32 a. Seed: p-value = 8.29E-90, Gender: p-value = .0440, and Age: p-value = .0463. All
three p-values are less than .05.
b. This is consistent with the answer in 13.31.
c. The tests reach the same conclusion. Most analysts prefer the p-value approach
because it conveys the strength of each predictor's significance.
Learning Objective: 13-3

13.33 Fcalc = 1041.45 with a p-value = 2.64E-119. R² = .942 and R²adj = .941. The model provides
a significant fit and is a strong predictor of finishing times.
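The F statistic follows directly from the ANOVA table in 13.29 (F = MSR/MSE):

```python
# F statistic from the ANOVA table: F = MSR / MSE
msr = 1_585_263.5735   # regression mean square (SSR / k)
mse = 1_522.1676       # residual mean square (SSE / (n - k - 1))
print(round(msr / mse, 2))  # 1041.45
```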
Learning Objective: 13-2

13.34 Prediction interval: ŷi ± t.025se = ŷi ± 1.972(39.015) = ŷi ± 76.938. Yes, this model does
have practical value.
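The contrast with the earlier 13.34 (Data Set J) is easy to see numerically:

```python
# Prediction-interval half-widths (t.025 * se) for the two models:
earlier = 1.964 * 8573.178   # Data Set J: roughly +/- 16,838 (little practical value)
current = 1.972 * 39.015     # Data Set K: roughly +/- 76.9 seconds
print(round(earlier, 1), round(current, 1))  # 16837.7 76.9
```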
Learning Objective: 12-9

13.35 a.
Correlation Matrix

           Seed   Gender     Age
Seed      1.000
Gender     .347    1.000
Age        .580    -.136   1.000

198 sample size
± .139 critical value .05 (two-tail)
± .183 critical value .01 (two-tail)

b. There is no concern for collinearity. While some of the correlations among the
predictors are statistically significant, they are not high enough to cause concern.
Learning Objective: 13-6

13.36 a.
variables VIF
Seed 2.086
Gender 1.411
Age 1.870

b. The VIF values are all 3 or less which suggests that multicollinearity has not caused
instability.
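Each VIF above comes from regressing that predictor on the others: VIF_j = 1/(1 − R²_j). A sketch; the R²_j value below is back-solved from Seed's reported VIF purely for illustration:

```python
# VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing predictor j on the
# remaining predictors. The 0.5206 is back-solved from Seed's reported VIF
# of 2.086, for illustration only.
def vif(r_squared_j):
    return 1.0 / (1.0 - r_squared_j)

print(round(vif(0.5206), 3))  # 2.086
```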
Learning Objective: 13-6

13.37 Seven swimmers had residuals that were either unusual (greater than 2 but less than 3) or
outliers (greater than 3). This means the model underestimated their finishing times.
Six swimmers had residuals that were either unusual (less than −2 but greater than
−3) or outliers (less than −3). This means the model overestimated their finishing
times.
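The screening rule applied here (on standardized residuals) can be sketched as:

```python
# Classify a standardized residual using the thresholds in the text:
# |r| > 3 is an outlier, 2 < |r| <= 3 is unusual, otherwise ordinary.
def classify(std_resid):
    a = abs(std_resid)
    if a > 3:
        return "outlier"
    if a > 2:
        return "unusual"
    return "ordinary"

print(classify(2.5), classify(-3.4), classify(0.7))  # unusual outlier ordinary
```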
Learning Objective: 13-8

13.38 There were nine observations with high leverage. To determine if any of the observations
were influential, remove them from the data set and rerun the regression. If the
regression statistics change significantly then the observation could be considered
influential.
Learning Objective: 13-8

13.39 Residuals do not appear to be normally distributed. Outliers should be investigated for
possible removal from data set. The histogram shows outliers on both the low and
high end.

Learning Objective: 13-7

13.40 Assuming homoscedastic residuals is not reasonable. There is a clear fan out pattern. It
appears that as the seed time increases, the variation in residuals increases.

Learning Objective: 13-7

13.41 This is cross-sectional data. A test for autocorrelation is not warranted.


Learning Objective: 13-7

13.42 a. Each coefficient measures the additional revenue earned by selling one more unit
(one more car, truck, or SUV, respectively).
b. The intercept is not meaningful. Ford has to sell at least one car, truck, or SUV to
earn revenue. No sales means no revenue.
c. The error term might consist of factors such as the price of fuel, which heavily
influences vehicle sales, and the state of the economy. Sales are lower when the
economy is unpredictable because people hold onto their cars longer. In addition, the
predictor variables are highly correlated to each other (multicollinearity problem), as
well as related to “missing variables” that influence their sales as well as revenue.
Learning Objective: 12-2
Learning Objective: 13-6

13.43 There are no quantitative predictor variables. A better approach would be to use an ANOVA
procedure that compares means within groups. In addition, the sample size is too
small relative to the number of predictors. There would have to be 6 binary variables
to cover the suppliers and substrate categories. With only 11 observations this would
violate both Evans’ Rule and Doane’s Rule.
Learning Objective: 13-5

13.44 a. One binary must be omitted to prevent perfect multicollinearity.


b. Same reasoning as in (a). The effect of the missing category is captured in the model
intercept.
c. Monday is the busiest day. The coefficient on Monday is positive meaning the
occupancy rates go up that day relative to the base day of Sunday. All other days had
negative coefficients.
d. Shift 3 is captured in the intercept. Both Shift 1 and Shift 2 have lower
AvgOccupancy given that they have negative coefficients.
e. The intercept represents the AvgOccupancy on Sundays during Shift 3.
f. The fit is poor because R² = .094.
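The dummy-variable trap in (a) and (b) can be illustrated with a small sketch (the day names and helper function are illustrative, not from the data set):

```python
# With all 7 day dummies plus an intercept, the dummy columns sum to the
# intercept column (perfect multicollinearity). Dropping one category
# (here Sunday) makes it the base, absorbed into the intercept.
days = ["Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]

def dummies(day, base="Sun"):
    return [1 if day == d else 0 for d in days if d != base]

print(dummies("Mon"))  # [1, 0, 0, 0, 0, 0] -> Monday indicator; Sunday is the base
print(dummies("Sun"))  # [0, 0, 0, 0, 0, 0] -> base category: all zeros
```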
Learning Objective: 13-2
Learning Objective: 13-5

13.45 Main points:


1. The regression as a whole is not significant based on the Fcalc p-value = .3710.
2. R2 = 0.117 indicating a very poor fit.
3. Examination of the individual regression coefficients indicates that the two binary
variables are not significantly different from zero, p-values >.10.
4. Conclusion: cost per average load does not differ based on whether or not it is a
top-load washer or whether or not powder was used. No apparent cost savings based
on washer type or detergent type.
Learning Objective: 13-2
Learning Objective: 13-5

13.46 Main points:


1. The best model in terms of fit as measured by s, R², and R²adj is the model with three
variables (InfMort, GDPCap, and Literate). Note that the three variable model is only
marginally better in terms of fit statistics than the one or two variable models. No
gain in fit is achieved by adding LifeExp and Density.
2. Examination of the individual regression coefficients indicates that the InfMort and
Literate have p-values < .01 and GDPCap has a p-value < 0.05.
3. Conclusion: Infant mortality and literacy rate have the greatest impact on birth
rates.
Learning Objective: 13-2

13.47 a. Yes, the coefficients make sense, except for TrnOvr. One would think that turnovers
would actually reduce the number of wins, not increase them.
b. No. It is negative and the number of games won is limited to zero or greater.
c. One needs either 5 observations or 10 observations per predictor. There are 6
predictor variables in the model, so we need a minimum of 30 observations to meet
Doane’s rule. The fact there were only 23 teams and therefore only 23 observations
means the sample size was probably too small to make the model reliable.
d. Rebounds and points are highly correlated. We don't need both of them in the model.
This could be inflating the variance of the coefficient estimates, causing the
predictors to appear insignificant.
Learning Objective: 13-3
Learning Objective: 13-6

13.48 Main points:


1. The regression as a whole indicates a very strong fit.
2. R² = .811. The predictor variables as a group explain 81.1% of the variation in
Salary.
3. Examination of the individual regression coefficients indicates that all of the
variables are significantly different from zero, p-values <0.01
4. Conclusion: The ethnicity of a professor does matter. A professor who is African-
American earns on average $2,093 less than one who is not. Assistant professors earn
on average $6,438 less than higher-ranking professors. New hires earn less than those
who have been there for some time. To stay competitive, universities often have to
offer high salaries to the top candidates, so this seems counterintuitive.
Learning Objective: 13-3
Learning Objective: 13-5

13.49 a. Both men and women who had prior marathon experience had lower times on
average than those who were running for the first time.
b. No the intercept does not have any meaning. If all predictor/binary variables were 0
then you wouldn’t have an individual racer.
c. It is suspected that nonlinearity is present among age, weight, and height. In this
model we see that increases in age decrease times but at an increasing rate, increases
in weight decrease times but at an increasing rate, and increases in height increase
times but at a decreasing rate.
d. The model predicted that I would run the marathon in about 12½ hours, and that
could be right: I can walk 4 mph, so it would take at least 6 to 7 hours minimum!
Learning Objective: 13-3

Learning Objective: 13-5

13.50 The three predictors of CityMPG are most likely strongly correlated with each other. The
VIF values do not show any concern (all less than 3), but we see that the variables
Length and Width are not significant predictors of gas mileage. The variable Weight is
a significant predictor (p-value = .0000), and according to the R² value of .682 the
model explains approximately 68% of the variation in CityMPG.
Learning Objective: 13-2
Learning Objective: 13-3

13.51 While the four predictor model gives the highest R² (.474) and lowest standard error
(143.362), the predictor Divorce is not significant. The decrease in R² (to .454) after
removing Divorce is quite small, so the three predictor model would be the best
choice.
Learning Objective: 13-2
Learning Objective: 13-3
