Sunteți pe pagina 1din 21

Chapter 5: Statistical Aspects of Regression

and

are only estimates of and

Key question: How accurate are these estimates? Statistical procedures allow us to formally address this question.

What Factors Affect Accuracy of OLS Estimates?


Graphical Intuition: Figure 5.1 (small number of data points) Figure 5.2 (large number of data points but very scattered) Figure 5.3 (large number of data points but not very scattered) Figure 5.4 (large number of data points, but clustered near one value for X)

What Factors Affect Accuracy of OLS Estimates?

Consider fitting a line through the XY-plots in Figures 5.1-5.4. You would be most confident in the line you fit in Figure 5.3 Larger number of data points + less scattering (i.e. less variability in errors) + more variability in X = more accurate estimates.

Note: Figures 5.1, 5.2, 5.3 and 5.4 all contain artificially generated data with =0, =1.

Figure 5.1: Very Sm Sam Size all ple

Y 1 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 -1 -2 X

Figure 5.2: Large Sample Size, Large Error Variance


8

0 0 1 2 3 4 5

-2

-4 X

Figure 5.3: Large Sample Size, Small Error Variance


6

Y 2 1 0 0 1 2 3 4 5 6 -1 X

Figure 5.4: Lim ited Range of X Values


10

Y 2 0 0 0.5 1 1.5 2 2.5 3 3.5 -2 -4 X

A Confidence Interval for

Uncertainty about accuracy of the estimate can be summarised in a confidence interval

95% confidence interval for is given by:


[ t s , + t s ] b b b b

tb is a critical value from the Student tdistribution --- calculated automatically in computer packages
sb = standard error of is a measure of the accuracy of

SR S ( N 2 ) ( X i X )2

A Confidence Interval for

(cont.)

tb controls the confidence level (e.g. tb is bigger for 95% confidence than 90%). sb varies directly with SSR (i.e. how variable the residuals are) sb varies inversely with N, the number of data points
( sb varies inversely with X i X ) 2 , which is related to the variance/variability of X.

Note: Different computer packages label confidence intervals in different ways. E.g. Excel labels bounds of confidence interval as Lower 95% and Upper 95%

Intuition of Confidence Interval

Useful (but formally incorrect) intuition: There is a 95% probability that the true value of lies in the confidence interval. Correct intuition: If you repeatedly use the above formula for calculating a confidence interval, 95% of the intervals you construct will contain the true value for .

Can choose any level of confidence you want (e.g. 90%, 99%).

10

Example: Confidence Intervals for the Data sets in Figures 5.1-5.4

Data Set Figure 5.1 Figure 5.2 Figure 5.3 Figure 5.4

.91 1.04 1.00 1.52

90% 95% Confid. Confid. Interval Interval [-.92,2.75] [-1.57,3.39] [.75,1.32] [.70,1.38] [.99,1.01] [.99,1.02] [-1.33,4.36] [-1.88,4.91]

99% Confid. Interval [-3.64,5.47] [.59,1.49] [.98,1.03] [-2.98,6.02]

11

Example: The Regression of Executive Compensation on Profits


Y = executive compensation X = profits
=

.000842

95% Confidence interval: [.00061,.001075]

12

Example: The Regression of Lot Size on House Price


OLS results:

Y = 34 ,136 + 6.59 X ,

The OLS estimate of the marginal effect of X on Y is 6.59. Increasing lot size by an extra square foot is associated with a $6.59 increase in house price. The 95% confidence interval for [5.72,7.47]. is

We are 95% confident that the effect of lot size on house is at least $5.72 and at most $7.47.

13

Hypothesis Testing

Test whether =0 (i.e. whether X has any explanatory power) One way of doing it: look at confidence interval, check whether it contains zero. If no, then you are confident 0.

Alternative (equivalent) way is to use tstatistic (often called t-ratio)


t = s b

Big values for t indicate 0. Small values for t indicate =0.

14

Hypothesis Testing (cont.)


Q: What do we mean by big and small? A: Look at P-value. If P-value .05 then t is big and conclude 0. If P-value >.05 then t is small and conclude =0.

Useful (but formally incorrect) intuition: P-value measures the probability that = 0. .05 = 5% = level of significance Other levels of significance (e.g. 1% or 10%) occasionally used

15

Example: The Regression of Executive Compensation on Profits (cont.)


95% Confidence interval: [.00061,.001075] Confidence interval does not contain zero, so conclude that 0. Alternatively: t-ratio is 7.227937. Is this big? Yes, the P-value is 5.5 10-10 which is much less than .05. Hence, we conclude again that 0.

16

Jargon

The coefficient on profits is significantly different from zero.

Profits have statistically significant explanatory power for executive compensation.

The hypothesis that = 0 can be rejected at the 5% significance level.

17

Testing on R2: The F-statistic

Test whether R2=0 (i.e. whether X has any explanatory power) Note: In simple regression testing R2=0 and =0 are the same, but in multiple regression they will be different. F-statistic is a test statistic analogous to tstatistic (e.g. small values of it indicate R2=0).
( N 2) R 2 F = (1 R 2 )

18

Testing on R2: The F-statistic (cont.)

For test with 5% level of significance:

If P-value is > .05 conclude R2=0.

If P-value is .05 conclude R2 0.

Excel calls the P-value for this test Significance F

19

Example: The Regression of Executive Compensation on Profits (cont.)

P-value = Significance F = 5.5 10-10. Since P-value < .05 conclude R2 0. Profits do have explanatory power for Y.

20

Chapter Summary
1.

Accuracy of OLS estimates depends on number of data points, variability of the explanatory variable and variability of the errors. The confidence interval provides an interval in which you can be confident lies. The width of the confidence interval depends on the same factors as affect the accuracy of OLS estimates. In addition, the width of the confidence interval depends on the confidence level. A hypothesis test of whether =0 can be used to find out whether the explanatory variable belongs in the regression. The P-value is a measure of how plausible the hypothesis is. If the P-value for the hypothesis test of whether =0 is less than .05 then you can reject the hypothesis at the 5% level of significance. If the P-value for the hypothesis test of whether =0 is greater than .05 then you cannot reject the hypothesis at the 5% level of significance. A hypothesis test of whether R2=0 can be used to investigate whether the regression helps explain the dependent variable.

2.

3.

4.

5.

6.

7.

21