
Regression in the Toolbar of Minitab's Help

1. Example of simple linear regression

You are a manufacturer who wants to obtain a quality measure on a product, but the procedure to obtain the measure is expensive. There is an indirect approach, which uses a different product score (Score 1) in place of the actual quality measure (Score 2). This approach is less costly but also less precise. You can use regression to see whether Score 1 explains a significant amount of the variance in Score 2, to determine whether Score 1 is an acceptable substitute for Score 2.

1. Open the worksheet EXH_REGR.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter Score2.
4. In Predictors, enter Score1.
5. Click OK.

Session window output

Interpreting the results

Minitab displays the results in the Session window by default. The p-value in the Analysis of Variance table (0.000) indicates that the relationship between Score 1 and Score 2 is statistically significant at an α-level of 0.05. This is also shown by the p-value for the estimated coefficient of Score 1, which is 0.000. The R² value shows that Score 1 explains 95.7% of the variance in Score 2, indicating that the model fits the data extremely well. Observation 9 is identified as an unusual observation because its standardized residual is less than -2, which could indicate that it is an outlier. See Identifying outliers. Because the model is significant and explains a large part of the variance in Score 2, the manufacturer decides to use Score 1 in place of Score 2 as a quality measure for the product.

2. Example of multiple regression

As part of a test of solar thermal energy, you measure the total heat flux from homes. You wish to examine whether total heat flux (HeatFlux)

can be predicted by the position of the focal points in the east, south, and north directions. Data are from [27]. Using best subsets regression, you found that the best two-predictor model included the variables North and South, and that the best three-predictor model added East. You evaluate the three-predictor model using multiple regression.

1. Open the worksheet EXH_REGR.MTW.
2. Choose Stat > Regression > Regression.
3. In Response, enter HeatFlux.
4. In Predictors, enter East South North.
5. Click Graphs.
6. Under Residuals for Plots, choose Standardized.
7. Under Residual Plots, choose Individual plots, then check Histogram of residuals, Normal plot of residuals, and Residuals versus fits. Click OK.
8. Click Options. Under Display, check PRESS and predicted R-square. Click OK in each dialog box.

Session window output

Interpreting the results

Session window output: The p-value in the Analysis of Variance table (0.000) shows that the model estimated by the regression procedure is significant at an α-level of 0.05. This indicates that at least one coefficient is different from zero. The p-values for the estimated coefficients of North and South are both 0.000, indicating that they are significantly related to HeatFlux. The p-value for East is 0.092, indicating that East is not related to HeatFlux at an α-level of 0.05. Additionally, the sequential sum of squares indicates that the predictor East doesn't explain a substantial amount of unique variance. This suggests that a model with only North and South may be more appropriate. The R² value indicates that the predictors explain 87.4% of the variance in HeatFlux. The adjusted R² is 85.9%, which accounts for the number of predictors in the model. Both values indicate that the model fits the data well. The predicted R² value is 78.96%. Because the predicted R² value is close to the R² and adjusted R² values, the model does not appear to be overfit and has adequate predictive ability. Observations 4 and 22 are identified as unusual because the absolute values of their standardized residuals are greater than 2. This may

indicate that they are outliers. See Checking your model, Identifying outliers, and Choosing a residual type.

Graph window output: The histogram indicates that outliers may exist in the data, shown by the two bars on the far right side of the plot. The normal probability plot shows an approximately linear pattern, consistent with a normal distribution. The two points in the upper-right corner of the plot may be outliers. Brushing the graph identifies these points as 4 and 22, the same points that are labeled unusual observations in the output. See Checking your model and Identifying outliers. The plot of residuals versus the fitted values shows that the residuals get smaller (closer to the reference line) as the fitted values increase, which may indicate that the residuals have non-constant variance. See [9] for information on non-constant variance.

3. Example of a fitted regression line

You are studying the relationship between a particular machine setting and the amount of energy consumed. This relationship is known to have considerable curvature, and you believe that a log transformation of the response variable will produce a more symmetric error distribution. You choose to model the relationship between the machine setting and the amount of energy consumed with a quadratic model.

1. Open the worksheet EXH_REGR.MTW.
2. Choose Stat > Regression > Fitted Line Plot.
3. In Response (Y), enter EnergyConsumption.
4. In Predictor (X), enter MachineSetting.
5. Under Type of Regression Model, choose Quadratic.
6. Click Options. Under Transformations, check Logten of Y and Display logscale for Y variable. Under Display Options, check Display confidence interval and Display prediction interval. Click OK in each dialog box.

Session window output

Interpreting the results

The quadratic model (p-value = 0.000, or actually p-value < 0.0005) appears to provide a good fit to the data. The R² value indicates that machine setting accounts for 93.1% of the variation in log10 of the energy consumed. A visual inspection of the plot reveals that the data are randomly spread about the regression line, implying no systematic lack of fit. The red dashed lines are the 95% confidence limits for the log10 of energy consumed, and the green dashed lines are the 95% prediction limits for new observations.

References

[1] D.A. Belsley, E. Kuh, and R.E. Welsch (1980). Regression Diagnostics. John Wiley & Sons, Inc.
[2] A. Bhargava (1989). "Missing Observations and the Use of the Durbin-Watson Statistic," Biometrika, 76, 828-831.
[3] D.A. Burn and T.A. Ryan, Jr. (1983). "A Diagnostic Test for Lack of Fit in Regression Models," ASA 1983 Proceedings of the Statistical Computing Section, 286-290.
[4] R.D. Cook (1977). "Detection of Influential Observations in Linear Regression," Technometrics, 19, 15-18.
[5] R.D. Cook and S. Weisberg (1982). Residuals and Influence in Regression. Chapman and Hall.
[6] N.R. Draper and H. Smith (1981). Applied Regression Analysis, Second Edition. John Wiley & Sons, Inc.
[7] I.E. Frank and J.H. Friedman (1993). "A Statistical View of Some Chemometrics Regression Tools," Technometrics, 35, 109-135.
[8] D.C. Hoaglin and R.E. Welsch (1978). "The Hat Matrix in Regression and ANOVA," The American Statistician, 32, 17-22.
[9] D.C. Montgomery and E.A. Peck (1982). Introduction to Linear Regression Analysis. John Wiley & Sons.
[10] J. Neter, W. Wasserman, and M. Kutner (1985). Applied Linear Statistical Models. Richard D. Irwin, Inc.
[11] M. Schatzoff, R. Tsao, and S. Fienberg (1968). "Efficient Calculation of All Possible Regressions," Technometrics, 10, 769-779.
[12] P. Velleman and R. Welsch (1981). "Efficient Computation of Regression Diagnostics," The American Statistician, 35, 234-242.
[13] P.F. Velleman, J. Seaman, and I.E. Allen (1977). "Evaluating Package Regression Routines," ASA 1977 Proceedings of the Statistical Computing Section.
[14] S. Weisberg (1980). Applied Linear Regression. John Wiley & Sons, Inc.

http://statsdata.blogspot.com/
