15-2
The Multiple Regression Model
The linear regression model relating y to x1,
x2,…, xk is y = β0 + β1x1 + β2x2 +…+ βkxk + ε
µy = β0 + β1x1 + β2x2 +…+ βkxk is the mean
value of the dependent variable y when the
values of the independent variables are
x1, x2,…, xk
β0, β1, β2,…, βk are the unknown regression
parameters relating the mean value of y to x1,
x2,…, xk
ε is an error term that describes the effects on y
of all factors other than the independent
variables x1, x2,…, xk
15-3
The Least Squares Estimates and Point
Estimation and Prediction
Estimation/prediction equation
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
is the point estimate of the mean value of the
dependent variable when the values of the
independent variables are x01, x02,…, x0k
It is also the point prediction of an individual value of
the dependent variable when the values of the
independent variables are x01, x02,…, x0k
b0, b1, b2,…, bk are the least squares point estimates of
the parameters β0, β1, β2,…, βk
x01, x02,…, x0k are specified values of the independent
predictor variables x1, x2,…, xk
We will use software to find the least squares point estimates
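As a rough illustration of what such software does, the sketch below solves the normal equations in plain Python and then evaluates the prediction equation. The function names `fit_least_squares` and `predict` and the data are hypothetical, made up for this example; real packages use more numerically robust methods.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    # Design matrix with a leading 1 for the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("independent variables are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(b, x0):
    """Evaluate y-hat = b0 + b1*x01 + ... + bk*x0k."""
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x0))


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2 (no error term).
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
b = fit_least_squares(rows, y)   # approximately [1.0, 2.0, 3.0]
y_hat = predict(b, (2, 2))       # approximately 11.0
```

Because the made-up data contain no error term, the estimates recover the generating coefficients up to rounding; with real data they would only approximate the unknown parameters.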
15-4
LO15-1
EXAMPLE 15.1 The Tasty Sub Shop
Case
15-7
LO15-2
Sum of Squares
Sum of squared errors
$SSE = \sum e_i^2 = \sum (y_i - \hat{y}_i)^2$
15-8
LO15-3: Calculate and interpret the multiple and adjusted multiple coefficients of determination.
15.3 R² and Adjusted R²
1. Total variation is given by the formula
Σ(yi - ȳ)²
2. Explained variation is given by the formula
Σ(ŷi - ȳ)²
3. Unexplained variation is given by the formula
Σ(yi - ŷi)²
4. Total variation is the sum of explained and
unexplained variation
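The three sums of squares above can be checked numerically. The sketch below uses made-up data and the fitted values from the least squares line ŷ = 0.5 + 1.6x (one independent variable, to keep the arithmetic visible). It also computes R² as explained variation divided by total variation, which is the usual definition and is assumed here. The function name is hypothetical.

```python
def variation_decomposition(y, y_hat):
    """Return (total, explained, unexplained) variation."""
    y_bar = sum(y) / len(y)
    total = sum((yi - y_bar) ** 2 for yi in y)
    explained = sum((fi - y_bar) ** 2 for fi in y_hat)
    unexplained = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return total, explained, unexplained


# Made-up observations and their least squares fitted values.
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
y_hat = [0.5 + 1.6 * xi for xi in x]

total, explained, unexplained = variation_decomposition(y, y_hat)
r_squared = explained / total  # assumed definition of R^2
```

The identity in item 4 holds only when the fitted values come from a least squares fit that includes an intercept; for arbitrary predictions the two parts need not add up to the total.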
15-11
LO15-3
The Adjusted R²
Adding an independent variable to a multiple
regression model will never lower R²
R² will typically rise slightly even if the new
variable has no relationship to y
The adjusted R² corrects this tendency in R²
As a result, it gives a better estimate of the
importance of the independent variables
The adjusted multiple coefficient of
determination is
$\bar{R}^2 = \left(R^2 - \dfrac{k}{n - 1}\right)\left(\dfrac{n - 1}{n - (k + 1)}\right)$
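A minimal sketch of the adjusted R² calculation, where n is the number of observations and k the number of independent variables. The function name and the input values are made up for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = (R^2 - k/(n-1)) * ((n-1)/(n-(k+1)))."""
    if n <= k + 1:
        raise ValueError("need more observations than parameters")
    return (r2 - k / (n - 1)) * ((n - 1) / (n - (k + 1)))


# Hypothetical values: R^2 = 0.90 from n = 25 observations and k = 5 variables.
adj = adjusted_r_squared(0.90, n=25, k=5)

# Algebraically equivalent form often shown in software documentation.
alt = 1 - (1 - 0.90) * (25 - 1) / (25 - (5 + 1))
```

Both forms give the same number; the adjusted value is always below R² when k is at least 1 and R² is below 1.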
15-12
LO15-4: Test the significance of a multiple regression model by using an F test.
15.4 The Overall F Test
To test
H0: β1= β2 = …= βk = 0 versus
Ha: At least one of β1, β2,…, βk ≠ 0
The test statistic is
$F(\text{model}) = \dfrac{(\text{Explained variation})/k}{(\text{Unexplained variation})/[n - (k + 1)]}$
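The F(model) statistic is a simple ratio once the two variations are known. The sketch below uses made-up values; the function name is hypothetical, and the p-value is left to software because it requires the F distribution with k and n - (k + 1) degrees of freedom.

```python
def f_model(explained, unexplained, n, k):
    """F(model) = (explained / k) / (unexplained / (n - (k + 1)))."""
    return (explained / k) / (unexplained / (n - (k + 1)))


# Made-up values: explained = 12.8, unexplained = 0.2, n = 4, k = 1.
f_stat = f_model(12.8, 0.2, n=4, k=1)
```

A large F(model) relative to the appropriate F critical value leads to rejecting H0, that is, to concluding that at least one independent variable is significantly related to y.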
15-13
LO15-5: Test the
significance of a single
independent variable.
15.5 Testing the Significance of an
Independent Variable
A variable in a multiple regression model is
not likely to be useful unless there is a
significant relationship between it and y
To test significance, we use the null
hypothesis H0: βj = 0
Versus the alternative hypothesis
Ha: βj ≠ 0
15-14
LO15-5
Testing Significance of an Independent
Variable #2
[Table: Alternative Hypothesis | Reject H0 If | p-Value]
15-15
LO15-5
Testing Significance of an Independent
Variable #3
Test Statistic
$t = \dfrac{b_j}{s_{b_j}}$
100(1 - α)% Confidence Interval for βj
[bj ± tα/2 sbj]
tα, tα/2 and p-values are based on
n - (k + 1) degrees of freedom
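The t statistic and the confidence interval can be sketched as follows. The estimate, its standard error, and the sample size are made up; the critical value is passed in rather than computed, on the assumption that it is read from a t table or from software. The two-sided rule |t| > tα/2 is the usual rejection rule for Ha: βj ≠ 0 and is assumed here.

```python
def t_statistic(b_j, s_bj):
    """t = b_j / s_bj."""
    return b_j / s_bj


def confidence_interval(b_j, s_bj, t_crit):
    """[b_j - t_crit * s_bj, b_j + t_crit * s_bj]."""
    half_width = t_crit * s_bj
    return b_j - half_width, b_j + half_width


# Hypothetical estimate and standard error, with n = 21 and k = 2,
# so n - (k + 1) = 18 degrees of freedom.
b_j, s_bj = 2.0, 0.5
t_crit = 2.101  # t_{0.025} with 18 degrees of freedom, from a t table

t = t_statistic(b_j, s_bj)
low, high = confidence_interval(b_j, s_bj, t_crit)
reject_h0 = abs(t) > t_crit  # two-sided test at the 0.05 level
```

When the 95% interval excludes zero, as it does here, the two-sided test at the 0.05 level also rejects H0: βj = 0.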
15-16
LO15-5
Testing Significance of an Independent
Variable #4
It is customary to test the significance of every
independent variable in a regression model
If we can reject H0: βj = 0 at the 0.05 level of
significance, we have strong evidence the
independent variable xj is significantly related to
y
If we can reject H0: βj = 0 at the 0.01 level of
significance, we have very strong evidence that
the independent variable xj is significantly
related to y
The smaller the significance level at which H0
can be rejected, the stronger the evidence that xj
is significantly related to y
15-17
LO15-5
A Confidence Interval for the
Regression Parameter βj
If the regression assumptions hold, a
100(1 - α)% confidence interval for βj
is [bj ± tα/2 sbj]
tα/2 is based on n - (k + 1) degrees of
freedom
15-18
LO15-6: Find and interpret a confidence interval for a mean value and a prediction interval for an individual value.
15.6 Confidence and Prediction Intervals
The point estimate corresponding to particular
values x01, x02,…, x0k of the
independent variables is
ŷ = b0 + b1x01 + b2x02 + … + bkx0k
It is unlikely that this value will equal the mean
value of y for these x values
Therefore, we need to place bounds on how far the
predicted value might be from the actual value
We can do this by calculating a confidence interval
for the mean value of y and a prediction interval for
an individual value of y
15-19
LO15-6
Distance Value
Both the confidence interval for the mean
value of y and the prediction interval for an
individual value of y employ a quantity
called the distance value
With simple regression, we were able to
calculate the distance value fairly easily
However, for multiple regression, calculating
the distance value requires matrix algebra
For that reason, we use software
15-20
LO15-6
A Confidence Interval for a Mean
Value of y
Assume the regression assumptions hold
The formula for a 100(1 - α)% confidence
interval for the mean value of y is as follows:
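The formula itself is not reproduced here. In its standard textbook form, which is assumed in this sketch, the interval is [ŷ ± tα/2 · s · √(distance value)], where s is the standard error of the regression, √(SSE / (n - (k + 1))). The distance value and the critical value are taken as inputs from software; all numbers below are made up and the function name is hypothetical.

```python
import math


def mean_confidence_interval(y_hat, s, distance_value, t_crit):
    """Assumed form: y_hat +/- t_crit * s * sqrt(distance_value)."""
    half_width = t_crit * s * math.sqrt(distance_value)
    return y_hat - half_width, y_hat + half_width


# Made-up inputs: point estimate 10, standard error 2,
# distance value 0.25, critical value 2.0.
ci_low, ci_high = mean_confidence_interval(10.0, 2.0, 0.25, 2.0)
```

The interval narrows as the distance value shrinks, which happens when the specified x values lie close to the center of the observed data.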
15-21
LO15-6
A Prediction Interval for an Individual
Value of y
Assume the regression assumptions hold
The formula for a 100(1 - α)% prediction
interval for an individual value of y is as
follows:
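The formula itself is not reproduced here. In its standard textbook form, which is assumed in this sketch, the interval is [ŷ ± tα/2 · s · √(1 + distance value)], where s is the standard error of the regression. The extra 1 under the square root accounts for the error term of the individual observation. All numbers below are made up and the function name is hypothetical.

```python
import math


def prediction_interval(y_hat, s, distance_value, t_crit):
    """Assumed form: y_hat +/- t_crit * s * sqrt(1 + distance_value)."""
    half_width = t_crit * s * math.sqrt(1.0 + distance_value)
    return y_hat - half_width, y_hat + half_width


# Made-up inputs: point prediction 10, standard error 2,
# distance value 0.25, critical value 2.0.
pi_low, pi_high = prediction_interval(10.0, 2.0, 0.25, 2.0)
```

For the same inputs, a prediction interval is always wider than the corresponding confidence interval for the mean value of y.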
15-22
15.7 The Sales Representative Case:
Evaluating Employee Performance
y = Yearly sales of the company’s product
x1 = Number of months the representative has been
employed
x2 = Sales of products in the sales territory
x3 = Dollar advertising expenditure in the territory
x4 = Weighted average of the company’s market share
in the territory for the previous four years
x5 = Change in the company’s market share in the
territory over the previous four years
15-23
Excel Output of a Regression Analysis of
the Sales Representative Performance Data
15-25
LO15-10
Residual Plots
Residuals versus each independent variable
Residuals versus predicted y’s
Residuals in time order (if the response is a
time series)
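The quantities behind these plots are easy to compute; the sketch below builds the residuals and pairs them with the predicted values, leaving the plotting itself to whatever software is at hand. The data and the fitted line ŷ = 0.5 + 1.6x are made up for illustration.

```python
def residuals(y, y_hat):
    """Residual e_i = y_i - y_hat_i for each observation."""
    return [yi - fi for yi, fi in zip(y, y_hat)]


# Made-up data and least squares fitted values.
x = [1, 2, 3, 4]
y = [2, 4, 5, 7]
y_hat = [0.5 + 1.6 * xi for xi in x]

e = residuals(y, y_hat)

# Points for a "residuals versus predicted y" plot.
resid_vs_pred = list(zip(y_hat, e))
# Points for a "residuals versus independent variable" plot.
resid_vs_x = list(zip(x, e))
```

With an intercept in the model, least squares residuals sum to zero up to rounding, so a plot that shows residuals centered well away from zero suggests a computational error rather than a model problem.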