
MULTIPLE REGRESSION ANALYSIS

Problem 1: Jennifer Dahl, supervisor of the Circle O discount chain, would like to forecast
the time it takes to check out a customer. She decides to use the following independent
variables: number of purchased items and total amount of the purchase. She collects data for
a sample of 18 customers, shown in Table P-8.

Customer   Checkout Time (minutes) Y   Amount ($) X1   Number of Items X2
1 3.0 36 9
2 1.3 13 5
3 0.5 3 2
4 7.4 81 14
5 5.9 78 13
6 8.4 103 16
7 5.0 64 12
8 8.1 67 11
9 1.9 25 7
10 6.2 55 11
11 0.7 13 3
12 1.4 21 8
13 9.1 121 21
14 0.9 10 6
15 5.4 60 13
16 3.3 32 11
17 4.5 51 15
18 2.4 28 10

a. Determine the best regression equation.


b. When an additional item is purchased, what is the average increase in the checkout time?
c. Compute the residual for customer 18.
d. Compute the standard error of the estimate.
e. Interpret part d in terms of the variables used in this problem.
f. Compute a forecast of the checkout time if a customer purchases 14 items that amount to
$70.
g. Compute a 95% interval forecast for your prediction in part f.
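
As a starting point for parts b through g, the full two-predictor model can be fitted by ordinary least squares. The following is a minimal sketch using NumPy, not a worked solution: the helper name `fit_ols` is hypothetical, the value 2.131 is the t-table entry for 15 degrees of freedom at the 95% level, and the sketch does not decide whether the two-predictor equation is the "best" one asked for in part a.

```python
import numpy as np

# Table P-8: checkout time Y (minutes), amount X1 ($), number of items X2
y = np.array([3.0, 1.3, 0.5, 7.4, 5.9, 8.4, 5.0, 8.1, 1.9,
              6.2, 0.7, 1.4, 9.1, 0.9, 5.4, 3.3, 4.5, 2.4])
x1 = np.array([36, 13, 3, 81, 78, 103, 64, 67, 25,
               55, 13, 21, 121, 10, 60, 32, 51, 28], dtype=float)
x2 = np.array([9, 5, 2, 14, 13, 16, 12, 11, 7,
               11, 3, 8, 21, 6, 13, 11, 15, 10], dtype=float)


def fit_ols(A, y):
    """Hypothetical helper: least squares coefficients, residuals, and
    the standard error of the estimate for design matrix A."""
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    dof = len(y) - A.shape[1]          # n - k - 1
    s = np.sqrt(resid @ resid / dof)
    return b, resid, s


A = np.column_stack([np.ones(len(y)), x1, x2])   # intercept, X1, X2
b, resid, s = fit_ols(A, y)

print("b0, b1, b2:", b)                          # b2: change in time per extra item
print("residual for customer 18:", resid[17])
print("standard error of the estimate:", s)

# Point forecast for 14 items amounting to $70
x_new = np.array([1.0, 70.0, 14.0])
forecast = float(x_new @ b)

# 95% prediction interval: forecast +/- t * s * sqrt(1 + x0'(A'A)^-1 x0)
t_crit = 2.131                                   # t table, 15 df (assumed lookup)
s_f = s * np.sqrt(1.0 + x_new @ np.linalg.inv(A.T @ A) @ x_new)
lower, upper = forecast - t_crit * s_f, forecast + t_crit * s_f
print("forecast:", forecast, "interval:", (lower, upper))
```

Comparing this fit with the two single-predictor fits (same helper, one column dropped) is one way to approach part a.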

Problem 2: Table P-9 contains data on food expenditures, annual income, and family size for
a sample of 10 families.

Family   Annual Food Expenditures ($100s) Y   Annual Income ($1,000s) X1   Family Size X2
A 24 11 6
B 8 3 2
C 16 4 1
D 18 7 3
E 24 9 5
F 23 8 4
G 11 5 2
H 15 7 2
I 21 8 3
J 20 7 2
a. Construct the correlation matrix for the three variables in Table P-9. Interpret the
correlations in the matrix.
b. Fit a multiple regression model relating food expenditures to income and family size.
Interpret the partial regression coefficients of income and family size. Do they make sense?
c. Compute the variance inflation factors (VIFs) for the independent variables.
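
The correlation matrix in part a and the VIFs in part c can be sketched with NumPy as follows. This is an illustration under the usual definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors; the variable names are assumptions made for the example.

```python
import numpy as np

# Table P-9: food expenditures Y ($100s), income X1 ($1,000s), family size X2
y = np.array([24, 8, 16, 18, 24, 23, 11, 15, 21, 20], dtype=float)
x1 = np.array([11, 3, 4, 7, 9, 8, 5, 7, 8, 7], dtype=float)
x2 = np.array([6, 2, 1, 3, 5, 4, 2, 2, 3, 2], dtype=float)

# Rows are variables, so corr[0, 1] is r(Y, X1), corr[1, 2] is r(X1, X2), etc.
corr = np.corrcoef([y, x1, x2])
print(np.round(corr, 3))

# With only two predictors, R_j^2 equals r(X1, X2)^2 for both of them,
# so both predictors share the same VIF.
r12 = corr[1, 2]
vif_two_predictors = 1.0 / (1.0 - r12 ** 2)

# General form: the VIFs are the diagonal of the inverse of the
# predictor-only correlation matrix.
vifs = np.diag(np.linalg.inv(corr[1:, 1:]))
print("VIFs:", vifs)
```

The second form extends unchanged to three or more predictors.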

Problem 3: A taxi company is interested in the relationship between mileage, measured in
miles per gallon, and the age of cars in its fleet. The 12 fleet cars are the same make and size
and are in good operating condition as a result of regular maintenance. The company employs
both male and female drivers, and it is believed that some of the variability in mileage may be
due to differences in driving techniques between the groups of drivers of opposite gender. In
fact, other things being equal, women tend to get better mileage than men. Data are generated
by randomly assigning the 12 cars to five female and seven male drivers and computing miles
per gallon after 300 miles. The data appear in Table P-11.

Miles per Gallon Y   Age of Car (years) X1   Gender (0 = male, 1 = female) X2
22.3 3 0
22.0 4 1
23.7 3 1
24.2 2 0
25.5 1 1
21.1 5 0
20.6 4 0
24.0 1 0
26.0 1 1
23.1 2 0
24.8 2 1
20.2 5 0
a. Construct a scatter diagram with Y as the vertical axis and X1 as the horizontal axis. Identify
the points corresponding to male and female drivers, respectively.

b. Fit the regression model Y = β0 + β1X1 + β2X2 + ε and interpret the least squares coefficient, b2.

c. Compute the fitted values for each of the (X1, X2) pairs, and plot the fitted values on the scatter
diagram. Draw straight lines through the fitted values for male drivers and female drivers,
respectively. Specify the equations for these two straight lines.

d. Suppose gender is ignored. Fit the simple linear regression model, Y = β0 + β1X1 + ε,
and plot the fitted straight line on the scatter diagram.
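
The dummy-variable model in parts b and c produces two parallel lines, one per group of drivers, and the sketch below shows how they arise from a single fit. It assumes the model with an intercept, age, and the 0/1 gender indicator; the function names `male_line` and `female_line` are hypothetical.

```python
import numpy as np

# Table P-11: miles per gallon Y, age of car X1 (years), gender X2 (1 = female)
mpg = np.array([22.3, 22.0, 23.7, 24.2, 25.5, 21.1,
                20.6, 24.0, 26.0, 23.1, 24.8, 20.2])
age = np.array([3, 4, 3, 2, 1, 5, 4, 1, 1, 2, 2, 5], dtype=float)
female = np.array([0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0], dtype=float)

# Model with the dummy variable: intercept, age, gender indicator
A = np.column_stack([np.ones(len(mpg)), age, female])
coef, *_ = np.linalg.lstsq(A, mpg, rcond=None)
b0, b1, b2 = coef


def male_line(x):
    """Fitted mileage for male drivers (X2 = 0)."""
    return b0 + b1 * x


def female_line(x):
    """Fitted mileage for female drivers (X2 = 1): same slope, intercept b0 + b2."""
    return (b0 + b2) + b1 * x


fitted = A @ coef
print("b0, b1, b2:", b0, b1, b2)

# Part d: simple regression on age alone, ignoring gender
A_simple = np.column_stack([np.ones(len(mpg)), age])
coef_simple, *_ = np.linalg.lstsq(A_simple, mpg, rcond=None)
print("simple regression intercept and slope:", coef_simple)
```

Because the dummy only shifts the intercept, the vertical gap between the two lines is b2 at every age.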


Problem 4: The scores for two within-term examinations, X1 and X2; the current grade
point average (GPA), X3; and the final exam score, Y, for 20 students in a business statistics
class are listed in Table P-18.

X1 X2 X3 Y

87 85 2.7 91
100 84 3.3 90
91 82 3.5 83
85 60 3.7 93
56 64 2.8 43
81 48 3.1 75
77 67 3.1 63
86 73 3.0 78
79 90 3.8 98
96 69 3.7 99
93 60 3.2 54
92 69 3.1 63
100 86 3.6 96
80 87 3.5 89
100 96 3.8 97
69 51 2.8 50
80 75 3.6 74
74 70 3.1 58
79 66 2.9 87
95 83 3.3 57

a. Fit a multiple regression model to predict the final exam score from the scores on the
within-term exams and GPA. Is the regression significant? Explain.
b. Predict the final exam score for a student with within-term exam scores of 86 and 77 and a
GPA of 3.4.
c. Compute the VIFs and examine the t statistics for checking the significance of the
individual predictor variables. Is multicollinearity a problem? Explain.
d. Compute the mean leverage. Are any of the observations high leverage points?
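
For part d, the leverages are the diagonal entries of the hat matrix, and their mean is always (k + 1)/n for a model with k predictors and an intercept. The sketch below computes them through a QR factorisation; the cutoff of three times the mean is a common rule of thumb and is an assumption here, not something stated in the problem.

```python
import numpy as np

# Table P-18 columns: X1, X2 (within-term exams), X3 (GPA), Y (final exam)
data = np.array([
    [87, 85, 2.7, 91], [100, 84, 3.3, 90], [91, 82, 3.5, 83],
    [85, 60, 3.7, 93], [56, 64, 2.8, 43], [81, 48, 3.1, 75],
    [77, 67, 3.1, 63], [86, 73, 3.0, 78], [79, 90, 3.8, 98],
    [96, 69, 3.7, 99], [93, 60, 3.2, 54], [92, 69, 3.1, 63],
    [100, 86, 3.6, 96], [80, 87, 3.5, 89], [100, 96, 3.8, 97],
    [69, 51, 2.8, 50], [80, 75, 3.6, 74], [74, 70, 3.1, 58],
    [79, 66, 2.9, 87], [95, 83, 3.3, 57],
])
X, y = data[:, :3], data[:, 3]
n, k = X.shape

A = np.column_stack([np.ones(n), X])    # design matrix with intercept

# Hat matrix H = Q Q' for the reduced QR factorisation A = Q R,
# so the leverages h_ii are the row sums of Q squared.
Q, _ = np.linalg.qr(A)
h = (Q ** 2).sum(axis=1)

mean_leverage = h.mean()                # equals (k + 1) / n
cutoff = 3 * (k + 1) / n                # assumed rule of thumb
high = np.flatnonzero(h > cutoff) + 1   # 1-based observation numbers
print("mean leverage:", mean_leverage)
print("observations above cutoff:", high)
```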

Problem 5: Table P-21 contains the number of accounts (in thousands) and the assets (in
billions of dollars) for 10 online stock brokerages. Plot the assets versus the number of
accounts. Investigate the possibility the relationship is curved by running a multiple
regression to forecast assets using the number of accounts and the square of the number of
accounts as independent variables.

Assets ($ billions) Y   Number of Accounts (1,000s) X
219.0 2,500
21.1 909
38.8 615
5.5 205
160.0 2,300
19.5 428
11.2 590
5.9 134
1.3 130
6.8 125

a. Give the fitted regression function. Is the regression significant? Explain.


b. Test for the significance of the coefficient of the squared term. Summarize your
conclusion.
c. Rerun the analysis without the quadratic (squared) term. Explain why the coefficient of the
number of accounts is not the same as the one you found for part a.
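
The quadratic and linear fits can be compared with the sketch below. It assumes assets is the response and the number of accounts the predictor, as the problem statement describes; the helper `ols` is hypothetical, and the coefficient standard errors are taken from s^2 (A'A)^-1, computed through the pseudoinverse for numerical stability.

```python
import numpy as np

# Table P-21: assets ($ billions) and number of accounts (1,000s)
assets = np.array([219.0, 21.1, 38.8, 5.5, 160.0, 19.5, 11.2, 5.9, 1.3, 6.8])
accounts = np.array([2500, 909, 615, 205, 2300, 428, 590, 134, 130, 125],
                    dtype=float)


def ols(A, y):
    """Hypothetical helper: coefficients, SSE, and coefficient standard errors."""
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    sse = float(resid @ resid)
    s2 = sse / (len(y) - A.shape[1])
    A_pinv = np.linalg.pinv(A)
    cov = s2 * (A_pinv @ A_pinv.T)      # s^2 (A'A)^-1
    return b, sse, np.sqrt(np.diag(cov))


ones = np.ones(len(assets))
A_quad = np.column_stack([ones, accounts, accounts ** 2])
A_lin = np.column_stack([ones, accounts])

b_quad, sse_quad, se_quad = ols(A_quad, assets)
b_lin, sse_lin, se_lin = ols(A_lin, assets)

# t statistic for the squared term; compare with the t table at n - 3 = 7 df
t_squared = b_quad[2] / se_quad[2]
print("quadratic fit:", b_quad, "t for squared term:", t_squared)
print("linear fit:", b_lin)

# X and X^2 are strongly correlated, which is why the coefficient of X
# changes when the squared term is dropped (part c).
r_x_x2 = np.corrcoef(accounts, accounts ** 2)[0, 1]
print("correlation between X and X^2:", r_x_x2)
```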

Problem 6: The sales manager of a large automotive parts distributor, Hartman Auto
Supplies, wants to develop a model to forecast as early as May the total annual sales of a
region. If regional sales can be forecast, then the total sales for the company can be forecast.
The number of retail outlets in the region stocking the company’s parts and the number of
automobiles registered for each region as of May 1 are the two independent variables
investigated. The data appear in Table P-12.

a. Analyse the correlation matrix.

b. How much error is involved in the prediction for region 1?

c. Forecast the annual sales for region 12, given 2,500 retail outlets and 20.2 million
automobiles registered.

d. Discuss the accuracy of the forecast made in part c.

e. Show how the standard error of the estimate was computed.

Region   Annual Sales ($ millions) Y   Number of Retail Outlets X1   Number of Automobiles Registered (millions) X2
1 52.3 2011 24.6
2 26 2850 22.1
3 20.2 650 7.9
4 16 480 12.5
5 30 1694 9
6 46.2 2302 11.5
7 35 2214 20.5
8 3.5 125 4.1
9 33.1 1840 8.9
10 25.2 1233 6.1
11 38.2 1699 9.5
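
Parts b, c, and e for the Hartman data can be traced step by step with the sketch below, which assumes a two-predictor least squares fit on the 11 regions in Table P-12. It shows the arithmetic behind the standard error of the estimate, s = sqrt(SSE / (n - k - 1)), and is meant as an illustration only.

```python
import numpy as np

# Table P-12: annual sales Y ($ millions), retail outlets X1,
# automobiles registered X2 (millions)
sales = np.array([52.3, 26.0, 20.2, 16.0, 30.0, 46.2,
                  35.0, 3.5, 33.1, 25.2, 38.2])
outlets = np.array([2011, 2850, 650, 480, 1694, 2302,
                    2214, 125, 1840, 1233, 1699], dtype=float)
autos = np.array([24.6, 22.1, 7.9, 12.5, 9.0, 11.5,
                  20.5, 4.1, 8.9, 6.1, 9.5])

# Part a: correlation matrix (row order: Y, X1, X2)
corr = np.corrcoef([sales, outlets, autos])
print(np.round(corr, 3))

# Least squares fit with an intercept
A = np.column_stack([np.ones(len(sales)), outlets, autos])
b, *_ = np.linalg.lstsq(A, sales, rcond=None)
resid = sales - A @ b

# Part b: prediction error (residual) for region 1
print("residual for region 1:", resid[0])

# Part c: forecast for region 12 (2,500 outlets, 20.2 million automobiles)
forecast_12 = float(np.array([1.0, 2500.0, 20.2]) @ b)
print("forecast for region 12:", forecast_12)

# Part e: standard error of the estimate from the sum of squared residuals
n, k = len(sales), 2
sse = float(resid @ resid)
s = np.sqrt(sse / (n - k - 1))
print("SSE:", sse, "standard error of the estimate:", s)
```
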
Problem 7: Ms. Haight, a real estate broker, wishes to forecast the importance of four factors
in determining the prices of lots. She accumulates data on price, area, elevation, and slope
and rates the view for 50 lots. She runs the data on a correlation program and obtains the
correlation matrix given in Table P-17. Ms. Haight then runs the data on a stepwise multiple
regression program.

a. Determine which variable will enter the model first, second, third, and last.

b. Which variable or variables will be included in the best prediction equation?

Variable    Price   Area    Elevation   Slope   View
Price       1.00    0.59    0.66        0.68    0.88
Area                1.00    0.40        0.64    0.41
Elevation                   1.00        0.13    0.76
Slope                                   1.00    0.63
View                                            1.00
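
The entry order in part a can be explored directly from the correlation matrix. The sketch below ranks candidates by the absolute partial correlation with price given the variables already entered, which is the ordering criterion of forward selection. It is a simplified illustration: an actual stepwise program also applies F-to-enter and F-to-remove thresholds and may stop before every variable has entered, so the ranking alone does not settle part b. The function name `partial_corr` is hypothetical.

```python
import numpy as np

names = ["Price", "Area", "Elevation", "Slope", "View"]

# Upper triangle of the correlation matrix in Table P-17
upper = np.array([
    [1.0, 0.59, 0.66, 0.68, 0.88],
    [0.0, 1.0, 0.40, 0.64, 0.41],
    [0.0, 0.0, 1.0, 0.13, 0.76],
    [0.0, 0.0, 0.0, 1.0, 0.63],
    [0.0, 0.0, 0.0, 0.0, 1.0],
])
R = upper + upper.T - np.eye(5)   # mirror to a full symmetric matrix


def partial_corr(R, i, j, given):
    """Partial correlation of variables i and j, controlling for `given`,
    read off the inverse of the relevant correlation submatrix."""
    idx = [i, j] + list(given)
    P = np.linalg.inv(R[np.ix_(idx, idx)])
    return -P[0, 1] / np.sqrt(P[0, 0] * P[1, 1])


entered = []                      # indices of predictors already in the model
remaining = [1, 2, 3, 4]          # Area, Elevation, Slope, View
while remaining:
    best = max(remaining, key=lambda j: abs(partial_corr(R, 0, j, entered)))
    entered.append(best)
    remaining.remove(best)

order = [names[j] for j in entered]
print("entry order:", order)
```

With no variables entered yet, the partial correlation reduces to the simple correlation, so the first variable chosen is the one most highly correlated with price.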
