Linear regression
Idea;
Estimating using LSE (& BLUE estimator & relation to MLE);
Partition of the variability of the variable;
Testing:
i) Slope;
ii) Intercept;
iii) Regression line;
iv) Correlation coefficient.
Matrix notation;
LSE estimates;
Tests;
R-squared and adjusted R-squared.
Model selection
Reduction of number of explanatory variables
Model validation
Linear regression measures the effect of explanatory variables X₁, …, Xₙ on the dependent variable Y. The assumptions are:
- Effects of the covariates (explanatory variables) must be additive;
- Homoskedastic (constant) variance;
- Errors must be independent of the explanatory variables, with mean zero (weak assumptions);
- Errors must be Normally distributed, and hence symmetric (strong assumptions).
Confounding effects
C is a confounder of the relation between X and Y if C influences X and C influences Y, but X does not (directly) influence Y.
Collinearity
Multicollinearity occurs when one explanatory variable is a (nearly) linear combination of the other explanatory variables.
If an explanatory variable is collinear, it is redundant: it provides little or no additional information.
Example: a perfect fit for y = −87 + x₁ + 18x₂, but also for y = −7 + 9x₁ + 2x₂:

  i    |  1   2   3   4
  yᵢ   | 23  83  63  103
  xᵢ₁  |  2   8   6   10
  xᵢ₂  |  6   9   8   10

(Here xᵢ₂ = 5 + xᵢ₁/2, so the two explanatory variables are perfectly collinear.)
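The non-uniqueness is easy to verify numerically. A minimal check of the two fits above (exact integer data, so both reproduce y without error):

```python
import numpy as np

x1 = np.array([2, 8, 6, 10])
x2 = np.array([6, 9, 8, 10])
y = np.array([23, 83, 63, 103])

# x2 is an exact linear function of x1, so the design matrix is singular
assert np.allclose(x2, 5 + 0.5 * x1)

# Two different coefficient vectors give the same perfect fit
fit_a = -87 + 1 * x1 + 18 * x2
fit_b = -7 + 9 * x1 + 2 * x2
print(np.allclose(fit_a, y), np.allclose(fit_b, y))  # True True
```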
Collinearity:
- Does not influence the fit, nor predictions.
- Estimates of the error variance, and thus of model adequacy, are still reliable.
- Standard errors of individual regression coefficients are higher, leading to small t-ratios.

Detecting collinearity:
i) Regress xⱼ on the other explanatory variables;
ii) Determine the coefficient of determination Rⱼ²;
iii) Calculate the Variance Inflation Factor VIFⱼ = (1 − Rⱼ²)⁻¹. If it is large (> 10), severe collinearity exists.
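Steps i)–iii) can be sketched directly with least squares; the data below are synthetic and the variable names illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.1 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)                      # independent regressor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R_j^2) from regressing column j on the others."""
    xj = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(xj)), others])  # add intercept
    beta, *_ = np.linalg.lstsq(A, xj, rcond=None)
    resid = xj - A @ beta
    r2 = 1 - resid.var() / xj.var()
    return 1 / (1 - r2)

for j in range(X.shape[1]):
    print(f"VIF_{j+1} = {vif(X, j):.1f}")   # VIF_1 and VIF_2 well above 10
```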
Heteroscedasticity
[Figures: scatter plots of yᵢ against x₁ᵢ and x₂ᵢ, and of the transformed response yᵢ* = log(yᵢ), illustrating heteroscedastic versus stabilized residual spread.]
Detecting heteroscedasticity
- F-test (using two groups of data), White (1980) test;
- Breusch and Pagan (1980) test:
  Test H₀: homoscedastic residuals vs. H₁: Var(yᵢ) = σ² + zᵢᵀγ, where zᵢ is a known vector of variables and γ is a p-dimensional vector of parameters.
  Test procedure:
  1. Fit the regression model and determine the residuals ε̂ᵢ.
  2. Calculate the squared standardized residuals ε̂ᵢ*² = ε̂ᵢ²/s².
GLS
Write Ω⁻¹ = PᵀP and transform the model:

  ỹ = Py,  X̃ = PX,  ε̃ = Pε,  so that  ỹ = X̃β + ε̃.

Applying OLS to the transformed model gives

  β̂ = (X̃ᵀX̃)⁻¹ X̃ᵀỹ
    = (Xᵀ PᵀP X)⁻¹ Xᵀ PᵀP y
    = (Xᵀ Ω⁻¹ X)⁻¹ Xᵀ Ω⁻¹ y,

with

  Var(β̂) = σ² (X̃ᵀX̃)⁻¹ = σ² (Xᵀ Ω⁻¹ X)⁻¹.

For heteroscedastic, uncorrelated errors,

  E[εεᵀ] = σ² Ω = σ² diag(ω₁, …, ωₙ),

and taking

  P = diag(1/√ω₁, …, 1/√ωₙ)  gives  Ω⁻¹ = PᵀP.

This (weighted least squares) is only applicable when you know the relative variances ω₁, …, ωₙ.
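A small numerical check of the derivation, assuming the relative variances ωᵢ are known: OLS on the P-transformed data must equal the direct GLS formula.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
omega = rng.uniform(0.5, 4.0, size=n)          # known relative variances
eps = rng.normal(scale=np.sqrt(omega))
y = X @ [1.0, 2.0] + eps

# Direct GLS formula: beta = (X' Omega^-1 X)^-1 X' Omega^-1 y
Oinv = np.diag(1 / omega)
beta_gls = np.linalg.solve(X.T @ Oinv @ X, X.T @ Oinv @ y)

# Equivalent: OLS on the transformed model Py = PX beta + P eps,
# with P = diag(1/sqrt(omega)) so that Omega^-1 = P'P
P = np.diag(1 / np.sqrt(omega))
beta_ols, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)

print(np.allclose(beta_gls, beta_ols))  # True
```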
EGLS/FGLS (OPTIONAL)
Feasible GLS (or Estimated GLS) does not impose the structure of the heteroskedasticity, but estimates it from the data.
Estimation procedure:
1. Estimate the regression using OLS.
2. Regress the squared residuals on the explanatory variables.
3. Determine the expected squared residuals ω̂ᵢ and set Ω̂ = diag(ω̂₁, …, ω̂ₙ).
4. Use WLS with weights ω̂ᵢ to find the EGLS/FGLS estimate.
Interaction terms

  yᵢ = (β₀ + β₁x₁ᵢ) + (β₂ + β₃x₁ᵢ)·x₂ᵢ + εᵢ,

where the intercept (β₀ + β₁x₁ᵢ) depends on x₁ and the slope (β₂ + β₃x₁ᵢ) of x₂ depends on x₁.
Estimates:

              Main effects    Full model     Excluding X₂    Centered
              Coef  t-stat    Coef  t-stat   Coef  t-stat    Coef  t-stat
  intercept   0.71  0.94      0.61  1.51     0.46  1.55      1.90  6.08
  X₁          3.88  4.29      0.83  1.24     1.02  1.83      2.73  5.30
  X₂          0.46  0.71      0.18  0.53                     1.08  3.01
  X₁X₂                        1.11  6.57     1.12  6.83      1.11  6.57

Correlations (where x̃ᵢ = xᵢ − E[Xᵢ]):

  non-centered   x₁     x₂     x₁x₂
  y              0.92   0.82   0.97
  x₁                    0.86   0.8
  x₂                           0.8

  centered       x̃₁     x̃₂     x̃₁x̃₂
  y              0.92   0.82   0.51
  x̃₁                    0.86   0.07
  x̃₂                           0.07
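The drop in correlation between the regressors and the interaction term after centering can be reproduced on synthetic data (positively valued, correlated regressors, as in the table above; the data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x1 = rng.normal(5, 1, size=n)                  # far from zero, like the slides
x2 = 0.8 * x1 + rng.normal(1, 0.5, size=n)     # correlated with x1

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# non-centered: the interaction x1*x2 is almost a rescaled copy of x1
print(f"corr(x1, x1*x2)   = {corr(x1, x1 * x2):.2f}")

# centered: the interaction is nearly uncorrelated with the main effects
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
print(f"corr(x1c, x1c*x2c) = {corr(x1c, x1c * x2c):.2f}")
```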
[Figures: residuals against x₁ᵢ and against x₁ᵢx₂ᵢ; the variance of εᵢ is a decreasing function of x₁ᵢx₂ᵢ.]
Example: dummy variables

  LnFaceᵢ = β₀ + β₁·LnIncomeᵢ + β₂·Singleᵢ + εᵢ
          = 0.42 + 1.12·LnIncomeᵢ − 0.51·Singleᵢ
            (0.56)  (0.05)          (0.16)         s = 0.57

(standard errors in parentheses, s the residual standard deviation).
[Figures: LnFace against LnIncome for singles and nonsingles, with the fitted model, and the residuals against LnIncome.]
Adding an interaction between Singleᵢ (Sᵢ) and LnIncomeᵢ (LnIᵢ):

  LnFaceᵢ = β₀ + β₁·LnIᵢ + β₂·Sᵢ + β₃·Sᵢ·LnIᵢ + εᵢ
          = 0.11 + 1.07·LnIᵢ − 3.28·Sᵢ + 0.27·Sᵢ·LnIᵢ
            (0.61) (0.06)      (1.41)    (0.14)        s = 0.56
Interpretation of the coefficients:
- β₀: intercept for non-singles;
- β₁: marginal effect of LnIncome for non-singles;
- β₂: difference in intercept between singles and non-singles, i.e., β₀ + β₂ is the intercept for singles;
- β₃: difference in the marginal effect of LnIncome between singles and non-singles, i.e., β₁ + β₃ is the marginal effect of LnIncome for singles.
[Figures: LnFace against LnIncome with the interaction model's fit for singles and nonsingles, and the residuals against LnIncome.]
Example: hourly wage and years of education

  Wageᵢ = β₀ + β₁·YEᵢ + εᵢ
        = −406 + 33·YEᵢ
          (8.97)  (0.65)   s = 12
[Figures: hourly wage against years of education with the fitted line, and the residuals against years of education.]
Allowing the slope to change at YE = 14, with Dᵢ a dummy indicating YEᵢ above 14:

  Wageᵢ = β₀ + β₁·YEᵢ + β₂·(YEᵢ − 14)·Dᵢ + εᵢ
        = −151 + 13.7·YEᵢ + 35·(YEᵢ − 14)·Dᵢ
          (3.91)  (0.30)     (0.47)            s = 2.37
[Figures: hourly wage against years of education with the piecewise-linear fit, and the residuals against years of education.]
Treating education level Cᵢ as a numerical variable:

  Incomeᵢ = β₀ + β₁·Cᵢ + εᵢ
          = 33.0 + 8.00·Cᵢ
            (1.36) (0.91)   s = 15.7
[Figures: empirical yearly income and residuals by education level, first for the categorical (numerical) coding, then for the dummy coding.]
  Incomeᵢ = β₀ + β₁·D₁,ᵢ + β₂·D₂,ᵢ + β₃·D₃,ᵢ + εᵢ
          = 33.4 + 5.32·D₁,ᵢ + 18.07·D₂,ᵢ + 14.50·D₃,ᵢ
            (1.53) (2.12)      (2.02)        (4.05)      s = 15.5
Interpretation:
- β₀: average income of high school;
- β₁: average additional income of college relative to HS;
- β₂: average additional income of university relative to HS;
- β₃: average additional income of PhD relative to HS.
With k candidate explanatory variables there are 2ᵏ − 1 possible non-empty subsets to consider:

  number of X:   1  2  3  4   5   6   7    8    9    10
  combinations:  1  3  7  15  31  63  127  255  511  1023
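The count 2ᵏ − 1 is just the number of non-empty subsets, which can be confirmed by enumeration:

```python
from itertools import combinations

k = 5
# all non-empty subsets of k candidate explanatory variables
models = [c for r in range(1, k + 1) for c in combinations(range(k), r)]
print(len(models), 2**k - 1)  # 31 31
```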
Comparing models
How to compare two regression models with the same number of explanatory variables:
- the F-statistic;
- the variability of the residuals (s);
- R-squared.
Validation on a hold-out sample (observations n₁+1, …, n):

  ∑_{i=n₁+1}^{n} (yᵢ − ŷᵢ)²

and, using leave-one-out predictions ŷ₍ᵢ₎, the PRESS statistic:

  ∑_{i=1}^{n} (yᵢ − ŷ₍ᵢ₎)²
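The leave-one-out predictions do not require n refits: for linear regression the identity yᵢ − ŷ₍ᵢ₎ = eᵢ/(1 − hᵢᵢ), with hᵢᵢ the leverage, gives PRESS from a single fit. A sketch with a brute-force check:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ [1.0, 2.0] + rng.normal(size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)       # hat matrix
e = y - H @ y                                # ordinary residuals
h = np.diag(H)                               # leverages
press = np.sum((e / (1 - h)) ** 2)           # PRESS via the leverage identity

# brute-force leave-one-out check
loo = 0.0
for i in range(n):
    mask = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo += (y[i] - X[i] @ b) ** 2
print(np.isclose(press, loo))  # True
```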