
Two Variable Regression model: The problem of estimation

Sample and population regression line

Worked example, n = 20 observations. The totals give the sample means Ȳ = 2246.07/20 = 112.3035 and X̄ = 10898.70/20 = 544.935; yᵢ and xᵢ are deviations from these means, so each deviation column sums to zero.

  i       Yᵢ        Xᵢ     yᵢ = Yᵢ - Ȳ   xᵢ = Xᵢ - X̄      xᵢyᵢ         xᵢ²
  1     52.25    258.30      -60.0535      -286.635     17213.43    82159.62
  2     58.32    343.10      -53.9835      -201.835     10895.76    40737.37
  3     81.79    425.00      -30.5135      -119.935      3659.64    14384.40
  4    119.90    467.50        7.5965       -77.435      -588.23     5996.18
  5    125.80    482.90       13.4965       -62.035      -837.26     3848.34
  6    100.46    487.70      -11.8435       -57.235       677.86     3275.85
  7    121.51    496.50        9.2065       -48.435      -445.92     2345.95
  8    100.08    519.40      -12.2235       -25.535       312.13      652.04
  9    127.75    543.30       15.4465        -1.635       -25.26        2.67
 10    104.94    548.70       -7.3635         3.765       -27.72       14.18
 11    107.48    564.60       -4.8235        19.665       -94.85      386.71
 12     98.48    588.30      -13.8235        43.365      -599.46     1880.52
 13    181.21    591.30       68.9065        46.365      3194.85     2149.71
 14    122.23    607.30        9.9265        62.365       619.07     3889.39
 15    129.57    611.20       17.2665        66.265      1144.16     4391.05
 16     92.84    631.00      -19.4635        86.065     -1675.13     7407.18
 17    117.92    659.60        5.6165       114.665       644.02    13148.06
 18     82.13    664.00      -30.1735       119.065     -3592.61    14176.47
 19    182.28    704.20       69.9765       159.265     11144.81    25365.34
 20    139.13    704.80       26.8265       159.865      4288.62    25556.82
Sum  2246.07  10898.70        0.0000         0.000      45907.91   251767.87
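With these totals, the OLS estimates follow from β̂₂ = Σxᵢyᵢ / Σxᵢ² and β̂₁ = Ȳ - β̂₂X̄. A minimal Python sketch (variable names are my own) that reproduces the table's sums and the resulting estimates:

```python
# The 20 observations listed in the table above.
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75,
     104.94, 107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13,
     182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30,
     548.70, 564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00,
     704.20, 704.80]

n = len(Y)
Y_bar = sum(Y) / n                # 112.3035
X_bar = sum(X) / n                # 544.935

# Deviation-form sums, matching the last two table columns.
S_xy = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y))   # ~45907.91
S_xx = sum((x - X_bar) ** 2 for x in X)                       # ~251767.87

# OLS slope and intercept.
b2 = S_xy / S_xx                  # slope, ~0.1823
b1 = Y_bar - b2 * X_bar           # intercept, ~12.94
print(b1, b2)
```

The particular values (slope ≈ 0.18, intercept ≈ 12.9) are just what this data set yields; the deviation-form formulas are the point.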

Properties of SRF

1. It passes through the sample means (X̄, Ȳ).
2. The mean value of the estimated Y (Ŷᵢ) is equal to the mean value of the actual Y.
3. The mean value of the residuals ûᵢ is zero.
4. The residuals ûᵢ are uncorrelated with the predicted values Ŷᵢ.
5. The residuals ûᵢ are uncorrelated with Xᵢ.
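These five properties hold mechanically for any OLS fit, which makes them easy to check numerically. A sketch on a small synthetic data set (the numbers are invented, any data would do):

```python
# Illustrative numerical check of the five SRF properties.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 15.8, 18.1, 19.9]

n = len(X)
X_bar, Y_bar = sum(X) / n, sum(Y) / n
b2 = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / \
     sum((x - X_bar) ** 2 for x in X)
b1 = Y_bar - b2 * X_bar

Y_hat = [b1 + b2 * x for x in X]               # fitted values
u_hat = [y - yh for y, yh in zip(Y, Y_hat)]    # residuals

# 1. the line passes through (X_bar, Y_bar)
assert abs((b1 + b2 * X_bar) - Y_bar) < 1e-9
# 2. mean of fitted Y equals mean of actual Y
assert abs(sum(Y_hat) / n - Y_bar) < 1e-9
# 3. residuals sum (and hence average) to zero
assert abs(sum(u_hat)) < 1e-9
# 4. residuals uncorrelated with fitted values
assert abs(sum(u * yh for u, yh in zip(u_hat, Y_hat))) < 1e-9
# 5. residuals uncorrelated with X
assert abs(sum(u * x for u, x in zip(u_hat, X))) < 1e-9
print("all five SRF properties hold")
```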

The Classical Linear Regression Model

Assumptions (they pertain to the PRF):

1. The model is linear in the parameters.
2. Fixed X values in repeated samples (fixed regressor), or X values independent of the error term (for a stochastic regressor).
3. Zero mean value of the disturbance uᵢ: E(uᵢ|Xᵢ) = 0.

4. Constant variance of uᵢ (homoscedasticity): var(uᵢ|Xᵢ) = σ². If the variance differs across observations, the errors are heteroscedastic.
5. No autocorrelation between the disturbances: cov(uᵢ, uⱼ|Xᵢ, Xⱼ) = 0 for i ≠ j.
6. The number of observations n must be greater than the number of parameters to be estimated.
7. There must be variation in the values of the X variables.
8. No exact collinearity between the X variables.
9. There is no specification bias: the model is correctly specified.

Precision or standard errors of least-squares estimates

Standard errors of the estimates:

var(β̂₂) = σ² / Σxᵢ²,  se(β̂₂) = σ / √Σxᵢ²
var(β̂₁) = σ² ΣXᵢ² / (n Σxᵢ²),  se(β̂₁) = √var(β̂₁)

where σ² is the constant variance of uᵢ, estimated by σ̂² = Σûᵢ² / (n - 2).

Standard error: the standard deviation of the sampling distribution of the estimator, that is, of the set of values of the estimator obtained from all possible samples of the same size from a given population.

σ̂ is called the standard error of estimate or the standard error of the regression (the standard deviation of the Y values about the estimated regression line).

Important features of these variances:
The variance of β̂₂ is directly proportional to σ² but inversely proportional to Σxᵢ².
-> Given σ², the larger the variation in the X values, the smaller the variance of β̂₂, implying greater precision.
-> Given Σxᵢ², the larger σ², the larger the variance of β̂₂.
-> As the sample size n increases, the number of terms in Σxᵢ² increases, and the variance of β̂₂ decreases.
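For the tabulated example, σ̂² and the standard errors can be computed directly from the residuals. A hedged sketch (variable names are my own; the rounded outputs noted in comments follow from this particular data set):

```python
# Standard errors of the OLS estimates for the tabulated data.
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75,
     104.94, 107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13,
     182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30,
     548.70, 564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00,
     704.20, 704.80]

n = len(Y)
Y_bar, X_bar = sum(Y) / n, sum(X) / n
S_xx = sum((x - X_bar) ** 2 for x in X)
S_xy = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y))
b2 = S_xy / S_xx
b1 = Y_bar - b2 * X_bar

# sigma^2 is estimated from the residual sum of squares with n - 2 df.
rss = sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(X, Y))
sigma_hat2 = rss / (n - 2)                                # ~682
se_b2 = (sigma_hat2 / S_xx) ** 0.5                        # ~0.052
se_b1 = (sigma_hat2 * sum(x * x for x in X) / (n * S_xx)) ** 0.5
print(sigma_hat2, se_b1, se_b2)
```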

Properties of Least-Squares Estimators

The Gauss-Markov Theorem
Given the assumptions of the classical linear regression model, the least-squares estimators, in the class of unbiased linear estimators, have minimum variance; that is, they are BLUE (best linear unbiased estimators):
1. Linear: a linear function of a random variable, such as the dependent variable Y in the regression model.
2. Unbiased: its average or expected value is equal to the true value.
3. Minimum variance in the class of all such linear unbiased estimators -> an efficient estimator.

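Unbiasedness is easy to see in a Monte Carlo experiment: with X held fixed across repeated samples, the average of the OLS slope estimates settles near the true slope. A sketch with invented true parameters:

```python
import random

# Monte Carlo check of unbiasedness (one ingredient of BLUE).
random.seed(42)
beta1, beta2, sigma = 5.0, 2.0, 1.0     # made-up "true" population values
X = list(range(1, 21))                  # fixed regressor (assumption 2)
X_bar = sum(X) / len(X)
S_xx = sum((x - X_bar) ** 2 for x in X)

slopes = []
for _ in range(2000):
    # draw a fresh sample of Y with the same fixed X values
    Y = [beta1 + beta2 * x + random.gauss(0, sigma) for x in X]
    Y_bar = sum(Y) / len(Y)
    b2 = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / S_xx
    slopes.append(b2)

mean_b2 = sum(slopes) / len(slopes)
print(mean_b2)   # close to the true beta2 = 2.0
```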

Coefficient of determination

How well does the sample regression line fit the data?

The goodness of fit of the fitted regression line to the data set is measured by the coefficient of determination, r².

In the Venn-diagram (Ballentine) view:
Circle Y: variation in the dependent variable Y.
Circle X: variation in the independent variable X.
Shaded area: the extent to which the variation in Y is explained by the variation in X.
r² is the numerical measure of this overlap; it lies between 0 and 1.

In deviation form:

Σyᵢ² = β̂₂² Σxᵢ² + Σûᵢ², i.e. TSS = ESS + RSS

Total sum of squares (TSS) = Σ(Yᵢ - Ȳ)²: the total variation of the actual Y values about their sample mean.
Explained sum of squares (ESS) = Σ(Ŷᵢ - Ȳ)²: the variation of the estimated Y values about their sample mean; the sum of squares due to the regression (the explanatory variable).
Residual sum of squares (RSS) = Σûᵢ²: the unexplained variation of the Y values about the regression line.

Coefficient of determination (r²):

r² = ESS/TSS = 1 - RSS/TSS

What is the value of r² from exercise 1?

It measures the proportion or percentage of the total variation in Y explained by the regression model.
i. It is a nonnegative quantity.
ii. It lies between 0 and 1.
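For the data tabulated earlier, r² can be computed from the deviation sums. A hedged sketch (variable names are my own):

```python
# r^2 for the tabulated data via r2 = S_xy^2 / (S_xx * S_yy).
Y = [52.25, 58.32, 81.79, 119.90, 125.80, 100.46, 121.51, 100.08, 127.75,
     104.94, 107.48, 98.48, 181.21, 122.23, 129.57, 92.84, 117.92, 82.13,
     182.28, 139.13]
X = [258.30, 343.10, 425.00, 467.50, 482.90, 487.70, 496.50, 519.40, 543.30,
     548.70, 564.60, 588.30, 591.30, 607.30, 611.20, 631.00, 659.60, 664.00,
     704.20, 704.80]

n = len(Y)
Y_bar, X_bar = sum(Y) / n, sum(X) / n
S_xx = sum((x - X_bar) ** 2 for x in X)
S_yy = sum((y - Y_bar) ** 2 for y in Y)
S_xy = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y))

r2 = S_xy ** 2 / (S_xx * S_yy)   # proportion of variation in Y explained by X
print(r2)
assert 0 <= r2 <= 1              # property ii above
```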

Alternatively:

r² = β̂₂² Σxᵢ² / Σyᵢ² = (Σxᵢyᵢ)² / (Σxᵢ² Σyᵢ²)

Coefficient of correlation

r = ±√r² = Σxᵢyᵢ / √(Σxᵢ² Σyᵢ²)

Properties of r:

1. Can be positive or negative.
2. Lies in the interval [-1, 1].
3. Symmetrical: r(X, Y) = r(Y, X).
4. Independent of origin and scale.
5. If X and Y are statistically independent, r = 0; but zero correlation does not necessarily imply independence.
6. A measure of linear association or linear dependence only.
7. Does not necessarily imply any cause-and-effect relationship.

Classical Normal Linear Regression Model

The values of the OLS estimators change from sample to sample: they are random variables, and we need to find their probability distributions.
The normality assumption for uᵢ: the classical normal linear regression model assumes that each uᵢ is distributed normally with
mean E(uᵢ) = 0, variance var(uᵢ) = σ², and cov(uᵢ, uⱼ) = 0 for i ≠ j,
that is, uᵢ ~ N(0, σ²).

Theoretical justification:
Central limit theorem (CLT): given random and independent samples of N observations each, the distribution of sample means approaches normality as N increases, regardless of the shape of the population distribution.
The disturbances uᵢ represent the combined influence (on the dependent variable) of a large number of independent variables that are not explicitly introduced in the regression model. By the CLT, if there is a large number of independent and identically distributed random variables, the distribution of their sum approaches the normal distribution as the number of variables increases indefinitely.
By a variant of the CLT, even if the number of variables is not very large, or if they are not strictly independent, their sum may still be normally distributed.
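The CLT argument above can be illustrated numerically: model uᵢ as the sum of many small, independent, decidedly non-normal shocks and look at the resulting mean and variance. A sketch with invented parameters (48 uniform shocks, so the theoretical variance is 48/12 = 4):

```python
import random

# Sketch of the CLT justification: a disturbance built as the sum of
# many small independent uniform shocks behaves like a normal variable.
random.seed(0)

def disturbance(k=48):
    # sum of k independent uniform shocks on (-0.5, 0.5); each shock has
    # variance 1/12, so the sum has mean 0 and variance k/12
    return sum(random.uniform(-0.5, 0.5) for _ in range(k))

draws = [disturbance() for _ in range(5000)]
m = sum(draws) / len(draws)
s2 = sum((d - m) ** 2 for d in draws) / len(draws)
print(m, s2)   # mean near 0, variance near 4
```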

One property of the normal distribution:

Any linear function of normally distributed variables is itself normally distributed.
-> If the uᵢ are normally distributed, β̂₁ and β̂₂ (which are linear functions of the uᵢ) are also normally distributed.

Properties of Least-Squares Estimators under the Normality Assumption

1. Unbiased.
2. Minimum variance.
3. Consistent: as the sample size increases indefinitely, the estimators converge to their true population values.
4. β̂₁ is normally distributed with mean β₁ and variance σ² ΣXᵢ² / (n Σxᵢ²).
5. β̂₂ is normally distributed with mean β₂ and variance σ² / Σxᵢ².


6. The variable (n - 2) σ̂² / σ² follows the chi-square distribution with (n - 2) df.
7. β̂₁ and β̂₂ are distributed independently of σ̂².
8. β̂₁ and β̂₂ have minimum variance in the entire class of unbiased estimators, whether linear or not.

The normality assumption on uᵢ enables us to derive the probability distributions of β̂₁, β̂₂ and σ̂²
-> estimation and hypothesis testing.

Note that since Yᵢ is a linear function of uᵢ, Yᵢ is itself normally distributed: Yᵢ ~ N(β₁ + β₂Xᵢ, σ²).

Interval Estimation
Example: β̂₂ = 0.7240 is a single (point) estimate of the unknown β₂.
Point estimator -> construct an interval around the point estimator.
The probability that the random interval contains the true β₂ is
Pr(β̂₂ - δ ≤ β₂ ≤ β̂₂ + δ) = 1 - α, 0 < α < 1.
Such an interval is a confidence interval;
1 - α is the confidence coefficient;
α is the level of significance.
The endpoints of the CI are the confidence limits (lower / upper).

Confidence intervals for β₁ and β₂

The OLS estimators β̂₁ and β̂₂ are normally distributed.
Standardized normal variable:
Z = (β̂₂ - β₂) / se(β̂₂) = (β̂₂ - β₂) √Σxᵢ² / σ ~ N(0, 1).
However, σ is unknown -> it is replaced by σ̂, and the interval is determined by
t = (β̂₂ - β₂) / sê(β̂₂),
which follows the t distribution with n - 2 df.

t_{α/2}: the critical t value at the α level of significance; the value of the t variable obtained from the t distribution for significance level α (α/2 in each tail) and n - 2 df.

100(1 - α)% confidence interval for β₂:

Pr[β̂₂ - t_{α/2} se(β̂₂) ≤ β₂ ≤ β̂₂ + t_{α/2} se(β̂₂)] = 1 - α, i.e. β̂₂ ± t_{α/2} se(β̂₂).

The larger the standard error, the larger the width of the CI.
The larger the standard error, the greater the uncertainty in estimating the true value of the unknown parameter.
The standard error of an estimator is thus a measure of the precision of the estimator: how precisely it measures the true population value.
Example: β̂₂ = 0.7240
se(β̂₂) = 0.07
n = 13
What is the critical value from the t table assuming α = 0.05?
What is the 95 percent confidence interval for β₂?
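The example above can be worked through in a few lines. With n - 2 = 11 df and α = 0.05, the two-tailed critical value read from a t table is about 2.201:

```python
# Worked confidence interval for the example: b2_hat = 0.7240, se = 0.07, n = 13.
b2_hat, se, n, alpha = 0.7240, 0.07, 13, 0.05
t_crit = 2.201                 # t critical value for alpha/2 = 0.025, 11 df

# CI endpoints: b2_hat +/- t_crit * se
lower = b2_hat - t_crit * se
upper = b2_hat + t_crit * se
print(lower, upper)            # roughly (0.5699, 0.8781)
```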


100(1 - α)% confidence interval for β₂ — interpretation:

Given the confidence coefficient of 95 percent, in 95 out of 100 cases intervals constructed in this way will contain the true β₂.
However, for any one specified fixed interval, the probability that it includes the true β₂ is either 1 or 0.
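The "95 out of 100 cases" statement can be checked by simulation: draw many samples, build an interval from each, and count how often the intervals cover the true slope. A sketch with invented parameters (σ is treated as known so the normal critical value 1.96 applies):

```python
import random

# Monte Carlo illustration of confidence-interval coverage.
random.seed(1)
beta1, beta2, sigma = 1.0, 0.5, 2.0     # made-up "true" population values
X = list(range(1, 31))                  # fixed regressor
X_bar = sum(X) / len(X)
S_xx = sum((x - X_bar) ** 2 for x in X)
se_b2 = sigma / S_xx ** 0.5             # known-sigma standard error of b2

covered, reps = 0, 2000
for _ in range(reps):
    Y = [beta1 + beta2 * x + random.gauss(0, sigma) for x in X]
    Y_bar = sum(Y) / len(Y)
    b2 = sum((x - X_bar) * (y - Y_bar) for x, y in zip(X, Y)) / S_xx
    # each sample gives a different interval; does this one cover beta2?
    if b2 - 1.96 * se_b2 <= beta2 <= b2 + 1.96 * se_b2:
        covered += 1

print(covered / reps)   # close to 0.95
```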

Simple Regression in EViews:


Step 1: Open Eviews
Step 2: Click on File/New/Workfile in order to create a new file
Step 3: Choose the frequency of the data in the case of time series data or Undated or
Irregular in the case of cross-sectional data, and specify the start and end of your data set.
Eviews will open a new window which automatically contains a constant (c) and a residual
(resid) series.
Step 4: On the command line type:
genr x=0 (press enter)
genr y=0 (press enter)
which creates two new series named x and y that contain zeros for every observation.
Open x and y as a group by selecting them and double clicking with the mouse.
Step 5: Either type the data or copy/paste from Excel. To be able to type (edit) the data of
your series or to paste anything into the Eviews cells, the edit +/- button must be pressed.
After editing the series press the edit +/- button again to lock or secure the data.
Step 6: Once the data have been entered into Eviews, the regression line may be estimated
either by typing
ls y c x
(press enter)
on the command line, or by clicking on Quick/Estimate equation and then writing your equation
(y c x) in the new window.
Do exercise 3.20.
