4.5.1 Seemingly Unrelated Regression

Let's continue using the hsb2 data file to illustrate the use of seemingly unrelated regression. This time let's look at two regression models.
science = math female
write = read female

It is the case that the errors (residuals) from these two models would be correlated. This would be true even if the predictor female were not found in both models, because all of the values of the variables are collected on the same set of observations. This is a situation tailor-made for seemingly unrelated regression using proc syslin with the sur option. With proc syslin we can estimate both models simultaneously while accounting for the correlated errors, leading to efficient estimates of the coefficients and standard errors. The syntax is as follows.
proc syslin data = "c:\sasreg\hsb2" sur;
  science: model science = math female;
  write:   model write   = read female;
run;

The first part of the output consists of the OLS estimate for each model. Here is the OLS estimate for the first model.
The SYSLIN Procedure
Ordinary Least Squares Estimation

Model                     SCIENCE
Dependent Variable        science

                         Analysis of Variance

                                  Sum of         Mean
Source             DF            Squares       Square    F Value    Pr > F
Model               2           7993.550     3996.775      68.38    <.0001
Error             197           11513.95     58.44645
Corrected Total   199           19507.50

Root MSE            7.64503    R-Square    0.40977
Dependent Mean     51.85000    Adj R-Sq    0.40378
Coeff Var          14.74451

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       18.11813    3.167133       5.72      <.0001
math             1       0.663190    0.057872      11.46      <.0001
female           1       -2.16840    1.086043      -2.00      0.0472

And here is the OLS estimate for the second model.

The SYSLIN Procedure
Ordinary Least Squares Estimation

Model                     WRITE
Dependent Variable        write

                         Analysis of Variance

                                  Sum of         Mean
Source             DF            Squares       Square    F Value    Pr > F
Model               2           7856.321     3928.161      77.21    <.0001
Error             197           10022.55     50.87591
Corrected Total   199           17878.88

Root MSE            7.13273    R-Square    0.43942
Dependent Mean     52.77500    Adj R-Sq    0.43373
Coeff Var          13.51537

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       20.22837    2.713756       7.45      <.0001
read             1       0.565887    0.049385      11.46      <.0001
female           1       5.486894    1.014261       5.41      <.0001

Proc syslin with the sur option also gives an estimate of the correlation between the errors of the two models. Here is the corresponding output.
The SYSLIN Procedure
Seemingly Unrelated Regression Estimation

          Cross Model Covariance
                   SCIENCE       WRITE
SCIENCE            58.4464      7.8908
WRITE               7.8908     50.8759

          Cross Model Correlation
                   SCIENCE       WRITE
SCIENCE            1.00000     0.14471
WRITE              0.14471     1.00000

          Cross Model Inverse Correlation
                   SCIENCE       WRITE
SCIENCE            1.02139    -0.14780
WRITE             -0.14780     1.02139

          Cross Model Inverse Covariance
                   SCIENCE       WRITE
SCIENCE           0.017476    -.002710
WRITE             -.002710    0.020076

System Weighted MSE               0.9981
Degrees of freedom                   394
System Weighted R-Square          0.3875
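If you want to see where the cross-model correlation comes from, here is a minimal sketch (our addition, not part of the original example) that fits each model separately, saves the residuals, and correlates them. The data set names res1, res2, and resids and the residual variable names r_science and r_write are arbitrary names chosen here; proc corr should give a value very close to the 0.14471 shown above, up to the degrees-of-freedom adjustment proc syslin uses.

* Sketch: verify the cross-model residual correlation by hand;
proc reg data = "c:\sasreg\hsb2" noprint;
  model science = math female;
  output out = res1 r = r_science;   * save residuals from model 1;
run;
proc reg data = "c:\sasreg\hsb2" noprint;
  model write = read female;
  output out = res2 r = r_write;     * save residuals from model 2;
run;
data resids;                         * same observations, same order, so no by variable needed;
  merge res1(keep = r_science) res2(keep = r_write);
run;
proc corr data = resids;             * should be close to 0.14471;
  var r_science r_write;
run;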

Finally, we have the seemingly unrelated regression estimation for our models. Note that both the estimates of the coefficients and their standard errors are different from the OLS model estimates shown above.
The SYSLIN Procedure
Seemingly Unrelated Regression Estimation

Model                     SCIENCE
Dependent Variable        science

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       20.13265    3.149485       6.39      <.0001
math             1       0.625141    0.057528      10.87      <.0001
female           1       -2.18934    1.086038      -2.02      0.0452

Model                     WRITE
Dependent Variable        write

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       21.83439    2.698827       8.09      <.0001
read             1       0.535484    0.049091      10.91      <.0001
female           1       5.453748    1.014244       5.38      <.0001

Now that we have estimated our models, let's test the predictor variables. The test for female combines information from both models. The tests for math and read are actually equivalent to the t-tests above, except that the results are displayed as F-tests (see the check after the output below).
proc syslin data = "c:\sasreg\hsb2" sur ; science: model science = math female ; write: model write = read female ; female: stest science.female = write.female =0; math: stest science.math = 0; read: stest write.read = 0; run; Test Results for Variable FEAMLE Num DF 2 Den DF 394 F Value 18.48 Pr > F 0.0001

Test Results for Variable MATH

Num DF    Den DF    F Value    Pr > F
     1       394     118.31    0.0001

Test Results for Variable READ

Num DF    Den DF    F Value    Pr > F
     1       394     119.21    0.0001
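As a quick sanity check on the F-test/t-test equivalence (our arithmetic, not part of the original output): a single linear restriction tested with stest yields an F statistic equal to the square of the corresponding SUR t statistic, and indeed 10.87^2 = 118.16, close to 118.31 for math, and 10.91^2 = 119.03, close to 119.21 for read, with the small gaps due to rounding of the displayed t values.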

Now let's estimate three models that use the same predictors in each model, as shown below.
read = female prog1 prog3
write = female prog1 prog3
math = female prog1 prog3

Here the variables prog1 and prog3 are dummy variables for the variable prog. Let's generate these variables before estimating our three models using proc syslin.
data hsb2;
  set "c:\sasreg\hsb2";
  prog1 = (prog = 1);
  prog3 = (prog = 3);
run;

proc syslin data = hsb2 sur;
  model1: model read  = female prog1 prog3;
  model2: model write = female prog1 prog3;
  model3: model math  = female prog1 prog3;
run;
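As a quick check that the dummies were created as intended (our addition, assuming prog takes the values 1, 2, and 3), you can cross-tabulate prog against the new variables; each level of prog should line up with exactly one pattern of (prog1, prog3), with prog = 2 as the omitted category.

proc freq data = hsb2;
  tables prog * prog1 * prog3 / list;   * one row per observed combination;
run;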

The OLS regression estimates of our three models are as follows.


<some output omitted> The SYSLIN Procedure Ordinary Least Squares Estimation Model Dependent Variable MODEL1 read Parameter Estimates Variable Intercept female prog1 prog3 Model Dependent Variable DF 1 1 1 1 Parameter Estimate 56.82950 -1.20858 -6.42937 -9.97687 MODEL2 write Standard Error 1.170562 1.327672 1.665893 1.606428 t Value 48.55 -0.91 -3.86 -6.21 Pr > |t| <.0001 0.3638 0.0002 <.0001

Parameter Estimates Variable Intercept female prog1 prog3 Model Dependent Variable DF 1 1 1 1 Parameter Estimate 53.62162 4.771211 -4.83293 -9.43807 MODEL3 math Parameter Estimates Variable Intercept female prog1 prog3 DF 1 1 1 1 Parameter Estimate 57.10551 -0.67377 -6.72394 -10.3217 Standard Error 1.036890 1.176059 1.475657 1.422983 t Value 55.07 -0.57 -4.56 -7.25 Pr > |t| <.0001 0.5674 <.0001 <.0001 Standard Error 1.042019 1.181876 1.482956 1.430021 t Value 51.46 4.04 -3.26 -6.60 Pr > |t| <.0001 <.0001 0.0013 <.0001

These regressions provide fine estimates of the coefficients and standard errors, but they assume that the residuals of each analysis are completely independent of the others. Also, if we wish to test female, we have to do it three times and cannot combine the information from all three tests into a single overall test. Now let's see the output of the estimation using seemingly unrelated regression.
The SYSLIN Procedure
Seemingly Unrelated Regression Estimation

Model                     MODEL1
Dependent Variable        read

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       56.82950    1.170562      48.55      <.0001
female           1       -1.20858    1.327672      -0.91      0.3638
prog1            1       -6.42937    1.665893      -3.86      0.0002
prog3            1       -9.97687    1.606428      -6.21      <.0001

Model                     MODEL2
Dependent Variable        write

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       53.62162    1.042019      51.46      <.0001
female           1       4.771211    1.181876       4.04      <.0001
prog1            1       -4.83293    1.482956      -3.26      0.0013
prog3            1       -9.43807    1.430021      -6.60      <.0001

Model                     MODEL3
Dependent Variable        math

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       57.10551    1.036890      55.07      <.0001
female           1       -0.67377    1.176059      -0.57      0.5674
prog1            1       -6.72394    1.475657      -4.56      <.0001
prog3            1       -10.3217    1.422983      -7.25      <.0001

Note that the coefficients and standard errors are identical in the OLS results and in the seemingly unrelated regression estimation: when every equation contains exactly the same set of predictors, SUR reduces to equation-by-equation OLS. The gain here is not different estimates but the ability to test the effects of the predictors across the equations. For example, we can test the hypothesis that the coefficient for female is 0 for all three outcome variables, as shown below.
proc syslin data = hsb2 sur;
  model1: model read  = female prog1 prog3;
  model2: model write = female prog1 prog3;
  model3: model math  = female prog1 prog3;
  female: stest model1.female = model2.female = model3.female = 0;
run;

Test Results for Variable FEMALE

Num DF    Den DF    F Value    Pr > F
     3       588      11.63    0.0001

We can also test the hypothesis that the coefficient for female is 0 for just read and math.
proc syslin data = hsb2 sur;
  model1: model read  = female prog1 prog3;
  model2: model write = female prog1 prog3;
  model3: model math  = female prog1 prog3;
  f1: stest model1.female = model3.female = 0;
run;

Test Results for Variable F1

Num DF    Den DF    F Value    Pr > F
     2       588       0.42    0.6599

We can also test the hypothesis that the coefficients for prog1 and prog3 are 0 for all three outcome variables, as shown below.
proc syslin data = hsb2 sur;
  model1: model read  = female prog1 prog3;
  model2: model write = female prog1 prog3;
  model3: model math  = female prog1 prog3;
  progs: stest model1.prog1 = model2.prog1 = model3.prog1 = 0,
         model1.prog3 = model2.prog3 = model3.prog3 = 0;
run;

Test Results for Variable PROGS

Num DF    Den DF    F Value    Pr > F
     6       588      11.83    0.0001

4.5.2 Multivariate Regression

Let's now use multivariate regression with proc reg to look at the same analysis that we saw in the proc syslin example above, estimating the following three models.
read = female prog1 prog3
write = female prog1 prog3
math = female prog1 prog3

Below we use proc reg to predict read, write, and math from female, prog1, and prog3. The top part of the output is similar to the proc syslin output in that it gives an overall summary of the model for each outcome variable. The lower part of the output reports the same parameter estimates and standard errors we saw above: these are OLS estimates, which do not adjust for the correlations among the residuals, and because every equation here uses the same predictors they coincide with the seemingly unrelated regression results. What proc reg adds is the ability to perform traditional multivariate tests of the predictors, shown further below.
proc reg data = hsb2;
  model read write math = female prog1 prog3;
run;

The REG Procedure

[Some output omitted]

Dependent Variable: read

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       56.82950     1.17056      48.55      <.0001
female           1       -1.20858     1.32767      -0.91      0.3638
prog1            1       -6.42937     1.66589      -3.86      0.0002
prog3            1       -9.97687     1.60643      -6.21      <.0001

Dependent Variable: write

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       53.62162     1.04202      51.46      <.0001
female           1        4.77121     1.18188       4.04      <.0001
prog1            1       -4.83293     1.48296      -3.26      0.0013
prog3            1       -9.43807     1.43002      -6.60      <.0001

Dependent Variable: math

                         Parameter Estimates

                        Parameter    Standard
Variable        DF       Estimate       Error    t Value    Pr > |t|
Intercept        1       57.10551     1.03689      55.07      <.0001
female           1       -0.67377     1.17606      -0.57      0.5674
prog1            1       -6.72394     1.47566      -4.56      <.0001
prog3            1      -10.32168     1.42298      -7.25      <.0001

Now, let's test female. Note that female was statistically significant in only one of the three equations. Using the mtest statement after proc reg allows us to test female across all three equations simultaneously. And, guess what? It is significant. This is consistent with what we found using seemingly unrelated regression estimation.
female: mtest female = 0;
run;

Multivariate Test: female

        Multivariate Statistics and Exact F Statistics
                    S=1    M=0.5    N=96

Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.84892448      11.51         3       194    <.0001
Pillai's Trace              0.15107552      11.51         3       194    <.0001
Hotelling-Lawley Trace      0.17796108      11.51         3       194    <.0001
Roy's Greatest Root         0.17796108      11.51         3       194    <.0001
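A detail worth noting (our addition, a standard property of these tests rather than something stated in the output): when S=1, that is, when the hypothesis matrix has a single degree of freedom, Wilks' Lambda, Pillai's Trace, the Hotelling-Lawley Trace, and Roy's Greatest Root are all equivalent and yield the same exact F, which is why all four rows above show F = 11.51 on (3, 194) degrees of freedom.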

We can also test prog1 and prog3, both separately and combined. Remember these are multivariate tests.
prog1: mtest prog1 = 0;
run;

Multivariate Test: prog1

        Multivariate Statistics and Exact F Statistics
                    S=1    M=0.5    N=96

Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.89429287       7.64         3       194    <.0001
Pillai's Trace              0.10570713       7.64         3       194    <.0001
Hotelling-Lawley Trace      0.11820192       7.64         3       194    <.0001
Roy's Greatest Root         0.11820192       7.64         3       194    <.0001

prog3: mtest prog3 = 0;
run;

Multivariate Test: prog3

        Multivariate Statistics and Exact F Statistics
                    S=1    M=0.5    N=96

Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.75267026      21.25         3       194    <.0001
Pillai's Trace              0.24732974      21.25         3       194    <.0001
Hotelling-Lawley Trace      0.32860304      21.25         3       194    <.0001
Roy's Greatest Root         0.32860304      21.25         3       194    <.0001

prog: mtest prog1 = prog3 = 0;
run;
quit;

Multivariate Test: prog

        Multivariate Statistics and F Approximations
                     S=2    M=0    N=96

Statistic                        Value    F Value    Num DF    Den DF    Pr > F
Wilks' Lambda               0.73294667      10.87         6       388    <.0001
Pillai's Trace              0.26859190      10.08         6       390    <.0001
Hotelling-Lawley Trace      0.36225660      11.68         6     256.9    <.0001
Roy's Greatest Root         0.35636617      23.16         3       195    <.0001

NOTE: F Statistic for Roy's Greatest Root is an upper bound.
NOTE: F Statistic for Wilks' Lambda is exact.

Proc syslin with the sur option and proc reg both allow you to test multi-equation models while taking into account the fact that the equations are not independent. Proc syslin with the sur option allows you to get estimates for each equation that adjust for the non-independence of the equations, and it allows you to estimate equations that don't necessarily have the same predictors. By contrast, proc reg is restricted to equations that have the same set of predictors, and the estimates it provides for the individual equations are the same as the OLS estimates. However, proc reg allows you to perform more traditional multivariate tests of the predictors.
