
5.2.2 Regression Model Adequacy

Table 7 shows the different regression models for all VIs and nonresponse rates (NRRs)
that were checked for adequacy. The columns are as follows: (a) VI, (b) NRR, (c) IC,
(d) the prediction model, (e) the coefficient of determination (R²), and (f) the
F-statistic and its p-value. For each imputation class, the variables in the regression
model are defined as follows: y is the dependent variable, the second-visit VI
(TOTEX2 or TOTIN2), and LN(FV) is the independent variable, based on the first-visit VI
(TOTEX1 or TOTIN1).

To obtain regression models that better satisfy the model assumptions, a natural
logarithm (LN) transformation was applied to both the dependent and the independent
variable. In the models for all ICs under the varying NRRs, the independent variable is
the first-visit VI (TOTEX1 or TOTIN1) and the dependent variable is the second-visit VI
(TOTEX2 or TOTIN2). After applying the transformation, the following results were obtained:
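The transformed model described above can be sketched as a simple least-squares fit of ln(second visit) on ln(first visit). This is a minimal illustration, not the procedure or software used in the study: the function name `fit_log_log` and the data values are hypothetical, and the made-up data are constructed so that the true relationship is known.

```python
import math

def fit_log_log(first_visit, second_visit):
    """Fit ln(second) = b0 + b1 * ln(first) by ordinary least squares."""
    x = [math.log(v) for v in first_visit]
    y = [math.log(v) for v in second_visit]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope on the log scale
    b0 = y_bar - b1 * x_bar   # intercept on the log scale
    return b0, b1

# Hypothetical values (not survey data), built so that second = 2 * first ** 0.9
first = [100.0, 250.0, 400.0, 800.0, 1500.0]
second = [2.0 * v ** 0.9 for v in first]
b0, b1 = fit_log_log(first, second)
print(b0, b1)
```

Because both variables are logged, the slope `b1` is read as the approximate percentage change in the second-visit value for a one percent change in the first-visit value.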

First, to determine how well the first-visit VI explains the second-visit VI, the
coefficient of determination, R², was obtained. A large value of R² indicates that the
model fits the data well. The highest R² in Table 7 is 93.2%, for the third imputation
class of the TOTEX2 variable under the highest NRR, while the lowest is 70.3%, for the
first imputation class of the TOTIN2 variable under the 20% NRR. For all NRRs and VIs,
the third IC generated the highest R² while the first IC produced the lowest.

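For reference, R² compares the residual sum of squares with the total sum of squares of the dependent variable. The sketch below uses small illustrative numbers, not values from Table 7, and the function name `r_squared` is hypothetical.

```python
def r_squared(observed, fitted):
    """Return 1 - SS_res / SS_tot for observed and fitted values."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Illustrative values on the log scale (not survey data)
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.5, 1.5, 3.5, 3.5]
print(r_squared(observed, fitted))  # SS_res = 1.0, SS_tot = 5.0
```

Note that in the transformed models R² describes the fit on the log scale, so it measures how much of the variation in ln(second visit) is explained.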
Second, the ANOVA tables presented in Section C of the Appendix were used to check
whether the models satisfy the linearity assumption. The results show that all models
satisfy it: the p-values for all the models were less than 0.0001, an indication that
the linear relationship in each model is highly significant.
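The overall F-statistic in an ANOVA table for a regression can be written in terms of R², the sample size n, and the number of predictors k. The sketch below assumes a single predictor (k = 1), as in the models above; the sample size of 50 is hypothetical, since the class sizes are not given here, and the function name `f_statistic` is illustrative. The p-value would come from the F distribution with (k, n - k - 1) degrees of freedom, which is not computed here.

```python
def f_statistic(r2, n, k=1):
    """Overall regression F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))

# R^2 values quoted in the text, combined with a hypothetical n of 50
n_assumed = 50
print(f_statistic(0.932, n_assumed))
print(f_statistic(0.703, n_assumed))
```

For a fixed sample size, a larger R² always gives a larger F, which is why models with a high R² also show very small p-values.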

Third, the Durbin-Watson test was used to test the assumption of independence. Results
in Section C of the Appendix show that all of the models satisfy this assumption.
However, because the data in this paper are not time series data, for which the
independence of the error terms is particularly important, the assumption of
independence was not pursued further.
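The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The sketch below is a minimal illustration with made-up residuals; the function name is hypothetical.

```python
def durbin_watson(residuals):
    """d = sum((e_t - e_{t-1})^2) / sum(e_t^2), with residuals in their given order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals: alternating signs push d above 2, constant residuals give d = 0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```

The statistic depends on the order of the residuals, which is why it is most informative for time-ordered data and less so for cross-sectional data such as these.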

Fourth, to determine whether the models meet the assumption of normality, the normal
probability plot (NPP) was obtained. In all models the NPP moderately follows an
S-shaped pattern, which indicates that the residuals are not normal but rather
lognormal. The shape of the NPP improved after the LN transformation was applied,
although it is still not linear. Because the data used are complex, the models were
retained even though the normality of the residuals is not perfectly achieved.
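A normal probability plot pairs each ordered residual with the standard normal quantile expected at its rank; points from normal residuals fall close to a straight line. The sketch below computes the plot coordinates only. The plotting position (i - 0.5) / n is one common convention and is an assumption here, as are the function name and the made-up residuals.

```python
from statistics import NormalDist

def npp_points(residuals):
    """Pair each ordered residual with its theoretical standard normal quantile."""
    ordered = sorted(residuals)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))

# Made-up residuals (not from the study)
points = npp_points([0.3, -1.2, 0.1, 2.5, -0.4])
for quantile, residual in points:
    print(f"{quantile:7.3f} {residual:7.3f}")
```

Plotting the second column against the first gives the NPP; systematic curvature, such as an S shape, signals a departure from normality.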

Lastly, to check whether the residuals satisfy homoscedasticity, or equality of
variances, a scatter plot of the residuals against the predicted values was obtained.
No patterns were evident in the scatter plot, which indicates that the logarithmic
transformation resolved the problem of heteroscedasticity.
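The check described above is visual. As a rough numeric companion, which is not the method used in the study, one can compare the residual variance in the upper half of the predicted values with that in the lower half; a ratio far from 1 would correspond to a fan-shaped scatter plot. The function names and data below are hypothetical.

```python
def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def spread_ratio(predicted, residuals):
    """Residual variance in the upper half of predictions over the lower half."""
    pairs = sorted(zip(predicted, residuals))  # order residuals by predicted value
    half = len(pairs) // 2
    low = [r for _, r in pairs[:half]]
    high = [r for _, r in pairs[-half:]]
    return sample_variance(high) / sample_variance(low)

predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
even_spread = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
fan_shaped = [0.1, -0.1, 0.1, -0.1, 1.0, -1.0, 1.0, -1.0]
print(spread_ratio(predicted, even_spread))
print(spread_ratio(predicted, fan_shaped))
```

A ratio near 1 is consistent with a scatter plot that shows no pattern, while the fan-shaped example yields a much larger ratio.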
