
Multivariate Regression

Introduction
• The population regression model of a dependent variable, Y, on a
set of k independent variables, X1, X2, ..., Xk, is given by:

Y = β0 + β1X1 + β2X2 + β3X3 + ... + βkXk + ε

Y = the value of the dependent (response) variable
β0 = the regression constant
β1 = the partial regression coefficient of independent variable 1
β2 = the partial regression coefficient of independent variable 2
βk = the partial regression coefficient of independent variable k
k = the number of independent variables
ε = the error of prediction
Model Assumptions

1. ε ~ N(0, σ²), independent of the other errors.
2. The variables Xi are uncorrelated with the error term.
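As an illustration of the model and its assumptions, the sketch below simulates data from a two-predictor version of the population model; the coefficient values, sample size, and error variance are arbitrary choices, not taken from the text.

```python
import numpy as np

# Hypothetical illustration: simulate n observations from the population model
# Y = beta0 + beta1*X1 + beta2*X2 + eps, with eps ~ N(0, sigma^2) independent
# of the X's (the two assumptions above). All numeric choices are arbitrary.
rng = np.random.default_rng(0)
n = 100
beta0, beta1, beta2, sigma = 2.0, 0.5, -1.5, 1.0

X1 = rng.uniform(0, 10, size=n)
X2 = rng.uniform(0, 10, size=n)
eps = rng.normal(0, sigma, size=n)      # errors: mean 0, constant variance
Y = beta0 + beta1 * X1 + beta2 * X2 + eps
```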
Simple and Multiple Least-Squares
Regression
[Figures: a fitted line ŷ = b0 + b1x in simple regression, and a fitted plane
ŷ = b0 + b1x1 + b2x2 in a two-predictor multiple regression]

In a simple regression model, the least-squares estimators minimize the sum of
squared errors from the estimated regression line. In a multiple regression
model, the least-squares estimators minimize the sum of squared errors from
the estimated regression plane.
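A minimal sketch of the least-squares fit, continuing the simulated data above: the estimates are the values of b0, b1, b2 that minimize the sum of squared errors, here obtained with numpy's least-squares solver.

```python
# Continuing the simulation above: the least-squares estimates b0, b1, b2
# minimize the sum of squared errors  sum((Y - (b0 + b1*X1 + b2*X2))**2).
X = np.column_stack([np.ones(n), X1, X2])   # design matrix with intercept column
b, _, _, _ = np.linalg.lstsq(X, Y, rcond=None)
y_hat = X @ b                               # fitted values on the regression plane
print("estimates (b0, b1, b2):", b)
print("sum of squared errors:", np.sum((Y - y_hat) ** 2))
```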
Example 11-2
• Data given in Table 11-6 (pages 512-513)
• Dependent Variable:
– Exports: US exports to Singapore in billions of Singapore
dollars
• Independent variables:
– M1: Money supply figures in billions of Singapore dollars
– Lend: minimum Singapore bank lending rate in %
– Price: An index of local prices where the base year is 1974
– Exchange: The exchange rate of Singapore dollars per US
dollar.
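A sketch of how Example 11-2 could be fitted in Python, assuming the Table 11-6 data have been saved to a CSV file with columns named Exports, M1, Lend, Price, and Exchange; the file name and column names are assumptions, not from the text.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Assumed file and column names; the actual data are in Table 11-6 of the text.
data = pd.read_csv("singapore_exports.csv")
model = smf.ols("Exports ~ M1 + Lend + Price + Exchange", data=data).fit()
print(model.summary())   # coefficients, t tests, F test, R-squared, etc.
```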
Decomposition of the Total Deviation
in a Multiple Regression Model
[Figure: decomposition of the deviation of a point from the mean of Y]

Total deviation:      Y − Ȳ
Regression deviation: Ŷ − Ȳ
Error deviation:      Y − Ŷ
Total Deviation = Regression Deviation + Error Deviation
SST = SSR + SSE
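Continuing the simulated example above, the three sums of squares can be computed directly and checked against the identity SST = SSR + SSE; a numerical sketch, not part of the text's example.

```python
# Sums of squares for the simulated fit above; SST should equal SSR + SSE.
y_bar = Y.mean()
SST = np.sum((Y - y_bar) ** 2)        # total sum of squares
SSR = np.sum((y_hat - y_bar) ** 2)    # regression sum of squares
SSE = np.sum((Y - y_hat) ** 2)        # error sum of squares
print(SST, SSR + SSE)                 # agree up to floating-point rounding
```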
11-3 The F Test of a Multiple
Regression Model
A statistical test for the existence of a linear relationship between Y and any or
all of the independent variables X1, X2, ..., Xk:
H0: 1 = 2 = ...= k= 0
H1: Not all the i (i=1,2,...,k) are equal to 0

Source of Sum of Degrees of


Variation Squares Freedom Mean Square F Ratio

Regressio SSR k SSR


n MSR 
k
Error SSE n - (k+1) SSE
MSE 
( n  ( k  1))
Total SST n-1 SST
MST 
( n  1)
Analysis of Variance Table

ANOVA Table
Source   SS        df   MS       F        F Critical   p-value
Regn.    32.9463    4   8.2366   73.059   2.5201       0.0000
Error     6.98978  62   0.1127
Total    39.9361   66   0.6051

s = 0.3358    R² = 0.8250    Adjusted R² = 0.8137

[Figure: F distribution with 2 and 7 degrees of freedom, α = 0.01,
critical point F0.01 = 9.55, test statistic F = 86.34]

The test statistic, F = 86.34, is greater than the critical point of F(2, 7) for
any common level of significance (p-value ≈ 0), so the null hypothesis is
rejected, and we might conclude that the dependent variable is related to one
or more of the independent variables.
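A quick check of these critical values and p-values with scipy, using the F statistics and degrees of freedom quoted above.

```python
from scipy import stats

# Critical value and p-value for the F(2, 7) example quoted above.
F_stat, df1, df2 = 86.34, 2, 7
print(stats.f.ppf(0.99, df1, df2))    # critical point F0.01, about 9.55
print(stats.f.sf(F_stat, df1, df2))   # p-value, effectively 0

# The same check for the ANOVA table above: F = 73.059 with (4, 62) df.
print(stats.f.ppf(0.95, 4, 62))       # about 2.52, matching "F Critical"
print(stats.f.sf(73.059, 4, 62))      # p-value ~ 0
```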
How Good is the Regression
[Figure: regression plane with errors y − ŷ]

The mean square error is an unbiased estimator of the variance of the
population errors ε, denoted by σ²:

MSE = SSE / (n − (k + 1)) = Σ(y − ŷ)² / (n − (k + 1))

Standard error of estimate: s = √MSE

The multiple coefficient of determination, R², measures the proportion of
the variation in the dependent variable that is explained by the combination
of the independent variables in the multiple regression model:

R² = SSR / SST = 1 − SSE / SST
Decomposition of the Sum of Squares and the
Adjusted Coefficient of Determination

SST

SSR SSE
2 SSR SSE
R = = 1-
SST SST

The adjusted multiple coefficient of determination, R 2, is the coefficient of


determination with the SSE and SST divided by their respective degrees of freedom:
SSE
R 2 =1- (n-(k+1))
SST
(n-1)
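Using the Example 11-2 ANOVA figures above (SSE = 6.98978, SST = 39.9361, n = 67, k = 4), these definitions reproduce the reported s, R², and adjusted R²; a small verification sketch.

```python
import math

# Quantities from the Example 11-2 ANOVA table above.
SSE, SST, n, k = 6.98978, 39.9361, 67, 4

MSE = SSE / (n - (k + 1))
s = math.sqrt(MSE)                                      # about 0.3358
R2 = 1 - SSE / SST                                      # about 0.8250
R2_adj = 1 - (SSE / (n - (k + 1))) / (SST / (n - 1))    # about 0.8137
print(s, R2, R2_adj)
```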
Tests of the Significance of Individual
Regression Parameters
Hypothesis tests about individual regression slope parameters:
(1) H0: 1= 0
H1: 1  0
(2) H0: 2 = 0
H1: 2  0
.
.
.
(k) H0: k = 0
H1: k  0

bi  0
Test statistic for test i : t( n ( k 1) 
s (bi )
Regression Results for Individual
Parameters
Variable    Coefficient   b           s(b)        t           p-value
Intercept   β0            -4.01546    2.766401    -1.45151    0.151679
M1          β1             0.368456   0.063848     5.7708     2.71E-07
Lend        β2             0.004702   0.049222     0.095531   0.924201
Price       β3             0.036511   0.009326     3.914914   0.000228
Exch.       β4             0.267896   1.17544      0.227911   0.820465
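For example, the t statistic for M1 is its coefficient divided by its standard error, compared against a t distribution with n − (k + 1) = 62 degrees of freedom; a quick sketch using the figures above.

```python
from scipy import stats

# t test for the M1 coefficient using the figures in the table above
# (Example 11-2 has n - (k + 1) = 62 error degrees of freedom).
b_M1, s_b_M1, df = 0.368456, 0.063848, 62
t = (b_M1 - 0) / s_b_M1                 # about 5.77
p = 2 * stats.t.sf(abs(t), df)          # two-sided p-value, about 2.7e-07
print(t, p)
```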


Residual Analysis and Checking
for Model Inadequacies
[Figures: four residual plots]

• Homoscedasticity: residuals appear completely random. No indication of
  model inadequacy.
• Heteroscedasticity: the variance of the residuals increases when x changes.
• Residuals exhibit a linear trend with time.
• Curved pattern in the residuals, resulting from an underlying nonlinear
  relationship.
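A minimal sketch of how such a residual plot can be produced for the simulated fit above, assuming matplotlib is available.

```python
import matplotlib.pyplot as plt

# Residuals versus fitted values for the simulated fit above; a random,
# constant-spread scatter around zero suggests no obvious model inadequacy.
residuals = Y - y_hat
plt.scatter(y_hat, residuals)
plt.axhline(0, color="black", linewidth=1)
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```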
Normal Probability Plot of the Residuals

[Figures: normal probability plots of the residuals under four departures
from normality]

• Flatter than normal
• More peaked than normal
• Positively skewed
• Negatively skewed
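A sketch of how a normal probability plot of the residuals can be drawn with scipy, continuing the simulated example above.

```python
from scipy import stats
import matplotlib.pyplot as plt

# Normal probability (Q-Q) plot of the residuals from the fit above; systematic
# curvature away from the straight line points to one of the departures listed.
stats.probplot(residuals, dist="norm", plot=plt)
plt.show()
```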
Multicollinearity
[Figures: scatter of the x1, x2 values under different degrees of collinearity]

• Orthogonal X variables provide information from independent sources.
  No multicollinearity.
• Perfectly collinear X variables provide identical information content.
  No regression.
• Some degree of collinearity: problems with regression depend on the degree
  of collinearity.
• A high degree of negative collinearity also causes problems with regression.
Effects of Multicollinearity

• Variances of regression coefficients are inflated (see the variance
  inflation factor sketch after this list).
• Magnitudes of regression coefficients may be different from what is expected.
• Signs of regression coefficients may not be as expected.
• Adding or removing variables produces large changes in coefficients.
• Removing a data point may cause large changes in coefficient estimates or signs.
• In some cases, the F ratio may be significant while the individual t ratios are not.
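One common way to quantify this inflation is the variance inflation factor (VIF) of each predictor; the sketch below computes VIFs for the simulated design matrix built earlier using statsmodels. VIF is not discussed in these slides, so treat it as a supplementary diagnostic.

```python
from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF for each predictor column of the design matrix X built earlier
# (column 0 is the intercept; columns 1..k are X1, X2, ...).
for j in range(1, X.shape[1]):
    print(f"VIF for predictor column {j}:", variance_inflation_factor(X, j))
# As a common rule of thumb, VIFs well above 10 signal troublesome collinearity.
```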
Solutions to the Multicollinearity
Problem
• Drop a collinear variable from the regression
• Change the sampling plan to include elements outside the multicollinearity range
• Transformations of variables
• Ridge regression
