$$\text{SSR (sum of squared residuals)} \equiv \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$$
$$\text{ESS (explained sum of squares)} \equiv \sum_{i=1}^{n}(\hat{Y}_i - \bar{Y})^2$$
$$SER \equiv \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}(\hat{u}_i - \bar{\hat{u}})^2} = \sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2}$$
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$$
Therefore, the larger 𝜎𝑋 is, the smaller 𝑉𝑎𝑟(𝛽̂1 ) is, since more spread in X
means more information about 𝛽1.
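As a quick illustration of this point, here is a minimal simulation sketch (all parameter values are illustrative): the fitted OLS slope varies less across repeated samples when X has more spread.

```python
# Minimal sketch (illustrative values): the OLS slope varies less across
# repeated samples when X has more spread.
import numpy as np

rng = np.random.default_rng(0)
n, beta0, beta1 = 200, 1.0, 2.0

def slope_sd(x_sd, reps=2000):
    """Standard deviation of the fitted OLS slope across simulated samples."""
    slopes = []
    for _ in range(reps):
        x = rng.normal(0.0, x_sd, n)
        y = beta0 + beta1 * x + rng.normal(0.0, 1.0, n)
        slopes.append(np.polyfit(x, y, 1)[0])  # fitted slope
    return np.std(slopes)

print(slope_sd(x_sd=1.0))  # larger sampling variation of beta_1-hat
print(slope_sd(x_sd=3.0))  # smaller: more spread in X, more information
```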
Hypothesis Testing
First, we state the null hypothesis and the two-sided alternative:
𝐻0 : 𝛽1 = 𝛽1,0 𝑣𝑠. 𝐻1 : 𝛽1 ≠ 𝛽1,0
In general,
$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{S.E. of the estimator}}$$
where the S.E. of the estimator is the square root of an estimator of the variance of the
estimator.
Applied to a hypothesis about 𝛽1:
$$t = \frac{\hat{\beta}_1 - \beta_{1,0}}{SE(\hat{\beta}_1)}$$
where $\beta_{1,0}$ is the hypothesized value of $\beta_1$, and $SE(\hat{\beta}_1)$ is given by
$$SE(\hat{\beta}_1) = \sqrt{\widehat{Var}(\hat{\beta}_1)} = \sqrt{\hat{\sigma}_{\hat{\beta}_1}^2} = \sqrt{\frac{\widehat{Var}(v)}{n(\sigma_X^2)^2}} = \sqrt{\frac{1}{n}\times\frac{\frac{1}{n-2}\sum_{i=1}^{n}(X_i-\bar{X})^2\hat{u}_i^2}{\left[\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2\right]^2}}$$
If we let $Var(u_i|X_i = x) = \sigma_u^2$ (homoskedasticity), then $Var(\hat{\beta}_1) = \sigma_u^2/(n\sigma_X^2)$.
There is no need to memorize this formula, since the software will calculate it for us. The
testing procedure then proceeds as usual.
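As a minimal sketch of this procedure, the snippet below (simulated, illustrative data; the null value $\beta_{1,0} = 0$ is an assumption for the example) computes the t-statistic with heteroskedasticity-robust (HC1) standard errors from statsmodels:

```python
# Sketch: t-statistic for H0: beta_1 = beta_{1,0} with HC1 robust SEs.
# The data and the null value beta_{1,0} = 0 are illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 1.0 + 0.5 * x + rng.normal(size=500)

res = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC1")
beta1_hat, se_beta1 = res.params[1], res.bse[1]

beta1_null = 0.0                        # hypothesized value beta_{1,0}
t = (beta1_hat - beta1_null) / se_beta1
print(t, res.tvalues[1])                # identical when beta_{1,0} = 0
```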
Confidence Intervals
For example, a 95% confidence interval for 𝛽1 is constructed as (𝛽̂1 ± 1.96 × 𝑆𝐸(𝛽̂1 )).
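A minimal sketch of computing this interval by hand and via statsmodels (simulated, illustrative data); statsmodels' conf_int() uses t critical values rather than 1.96, so the two intervals agree only approximately, and nearly exactly in large samples:

```python
# Sketch: 95% confidence interval for beta_1, by hand and via statsmodels.
# Data are simulated and illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=400)
y = 1.0 + 0.5 * x + rng.normal(size=400)

res = sm.OLS(y, sm.add_constant(x)).fit()
b1, se = res.params[1], res.bse[1]
print(b1 - 1.96 * se, b1 + 1.96 * se)   # normal-approximation 95% CI
print(res.conf_int()[1])                # statsmodels CI (t critical values,
                                        # nearly identical in large samples)
```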
Regression when X is Binary
A binary variable is sometimes called a dummy variable or an indicator variable. How
do we interpret a regression with a binary regressor? For example, if 𝑌𝑖 = 𝛽0 +
𝛽1 𝑋𝑖 + 𝑢𝑖 and 𝑋𝑖 = 0 𝑜𝑟 1, then we obtain
𝛽1 = 𝐸(𝑌𝑖 |𝑋𝑖 = 1) − 𝐸(𝑌𝑖 |𝑋𝑖 = 0)
which is the population difference in group means.
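A quick numerical check of this interpretation on simulated, illustrative data: the OLS slope on a 0/1 regressor equals the difference in sample group means.

```python
# Check: with a binary regressor, the OLS slope equals the difference in
# sample group means. Simulated, illustrative data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.integers(0, 2, 1000)                 # binary (dummy) regressor
y = 3.0 + 1.5 * x + rng.normal(size=1000)

res = sm.OLS(y, sm.add_constant(x)).fit()
print(res.params[1])                         # beta_1-hat
print(y[x == 1].mean() - y[x == 0].mean())   # exactly the same number
```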
Heteroskedasticity and Homoskedasticity
The meaning of these two terms:
(1) If 𝑉𝑎𝑟(𝑢|𝑋 = 𝑥) is constant, that is, if the variance of the conditional distribution
of u given X does not depend on X, then u is said to be homoskedastic (uniform variance).
(2) Otherwise, u is heteroskedastic.
So far we have implicitly allowed u to be heteroskedastic: recalling the three least
squares assumptions, none of them restricts the conditional variance, so u is allowed
to be heteroskedastic.
What if the errors are in fact homoskedastic? We can prove that OLS has the lowest
variance among estimators that are linear in Y, a result called the Gauss-Markov
theorem.
Gauss-Markov conditions:
(i) 𝐸(𝑢𝑖 |𝑋1 , … , 𝑋𝑛 ) = 0
(ii) 𝑉𝑎𝑟(𝑢𝑖 |𝑋1 , … , 𝑋𝑛 ) = 𝜎𝑢 2 , 0 < 𝜎𝑢 2 < ∞, 𝑓𝑜𝑟 𝑖 = 1, … , 𝑛
(iii) 𝐸(𝑢𝑖 𝑢𝑗 |𝑋1 , … , 𝑋𝑛 ) = 0, 𝑖 = 1, … , 𝑛, 𝑖 ≠ 𝑗
Gauss-Markov Theorem
Under the Gauss-Markov conditions, the OLS estimator 𝛽̂1 is BLUE (Best Linear
Unbiased Estimator). That is, 𝑉𝑎𝑟(𝛽̂1|𝑋1 , … , 𝑋𝑛 ) ≤ 𝑉𝑎𝑟(𝛽̃1|𝑋1 , … , 𝑋𝑛 ) for all linear
conditionally unbiased estimators 𝛽̃1.
The special case of the variance estimator for $\hat{\beta}_1$ under homoskedasticity is
$$\hat{\sigma}_{\hat{\beta}_1}^2 = \frac{1}{n}\times\frac{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2}{\frac{1}{n}\sum_{i=1}^{n}(X_i-\bar{X})^2}$$
However, we will use the heteroskedasticity-robust formula, since it is valid in both
the heteroskedastic and the homoskedastic case.
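A minimal sketch of this contrast in statsmodels (simulated data with heteroskedastic errors by construction): the default OLS fit reports homoskedasticity-only standard errors, while cov_type="HC1" gives the robust ones.

```python
# Sketch: homoskedasticity-only vs. heteroskedasticity-robust SEs.
# The errors below are heteroskedastic by construction (illustrative).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=1000)
u = rng.normal(size=1000) * (1.0 + np.abs(x))   # variance depends on x
y = 2.0 + 1.0 * x + u

X = sm.add_constant(x)
print(sm.OLS(y, X).fit().bse[1])                 # homoskedasticity-only SE
print(sm.OLS(y, X).fit(cov_type="HC1").bse[1])   # robust SE, valid either way
```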
Weighted Least Squares (WLS)
Since OLS is efficient under homoskedasticity, the traditional approach is to
transform a heteroskedastic model into a homoskedastic one.
Suppose the conditional variance of ui is known as a function of Xi, namely
𝑉𝑎𝑟(𝑢𝑖 |𝑋𝑖 ) = 𝜆ℎ(𝑋𝑖 )
Then we can divide both sides of the single-variable regression model by √ℎ(𝑋𝑖 ) to
obtain 𝑌̃𝑖 = 𝛽0 𝑋̃0𝑖 + 𝛽1 𝑋̃1𝑖 + 𝑢̃𝑖 , where
𝑌̃𝑖 = 𝑌𝑖 ⁄√ℎ(𝑋𝑖 ) , 𝑋̃0𝑖 = 1⁄√ℎ(𝑋𝑖 ) , 𝑋̃1𝑖 = 𝑋𝑖 ⁄√ℎ(𝑋𝑖 ) , 𝑢̃𝑖 = 𝑢𝑖 ⁄√ℎ(𝑋𝑖 )
𝑉𝑎𝑟(𝑢̃𝑖 |𝑋𝑖 ) = 𝑉𝑎𝑟(𝑢𝑖 |𝑋𝑖 )⁄ℎ(𝑋𝑖 ) = 𝜆
The WLS estimator is the OLS estimator obtained by regressing 𝑌̃𝑖 on
𝑋̃0𝑖 and 𝑋̃1𝑖 . However, ℎ(𝑋𝑖 ) is usually unknown, so we must estimate it
first and then replace ℎ(𝑋𝑖 ) with the estimate ℎ̂(𝑋𝑖 ). This is called feasible WLS.
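A minimal WLS sketch, under the illustrative assumption that the skedastic function is known to be h(X_i) = X_i^2 (feasible WLS would estimate it first); statsmodels' WLS takes weights inversely proportional to the error variance:

```python
# WLS sketch assuming h(X_i) = X_i**2 is known (for illustration only);
# statsmodels' WLS expects weights proportional to 1 / Var(u_i | X_i).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
x = rng.uniform(1.0, 5.0, 500)
u = rng.normal(size=500) * x            # sd proportional to x, variance to x^2
y = 1.0 + 2.0 * x + u

h = x ** 2                              # assumed skedastic function h(X_i)
res = sm.WLS(y, sm.add_constant(x), weights=1.0 / h).fit()
print(res.params)                       # WLS estimates of beta_0, beta_1
```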
If the error term is correlated with the regressor, with $corr(X_i, u_i) = \rho_{Xu} \neq 0$, then OLS is inconsistent:
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\frac{\sigma_u}{\sigma_X}$$
Now we can see that the omitted factor Z indeed creates the bias between 𝛽̂1 and 𝛽1.
The direction of the bias in 𝛽̂1 depends on whether X and u are positively or
negatively correlated. To be more specific, suppose that the true model is
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝛽2 𝑍𝑖 + 𝑢𝑖 , 𝐶𝑜𝑣(𝑋𝑖 , 𝑢𝑖 ) = 0
The estimated model when Z is omitted is
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝜖𝑖 , 𝜖𝑖 ≡ 𝛽2 𝑍𝑖 + 𝑢𝑖
Then 𝐶𝑜𝑣(𝑋𝑖 , 𝜖𝑖 ) = 𝐶𝑜𝑣(𝑋𝑖 , 𝛽2 𝑍𝑖 + 𝑢𝑖 ) = 𝛽2 𝐶𝑜𝑣(𝑋𝑖 , 𝑍𝑖 ), therefore
$$\hat{\beta}_1 \xrightarrow{p} \beta_1 + \beta_2\frac{Cov(X_i, Z_i)}{Var(X_i)}$$
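A simulation sketch of this formula (all parameter values illustrative): the slope from the short regression that omits Z lands near $\beta_1 + \beta_2 Cov(X_i, Z_i)/Var(X_i)$.

```python
# Simulation sketch of the omitted-variable-bias formula: omitting Z shifts
# the short-regression slope by about beta_2 * Cov(X, Z) / Var(X).
# All parameter values are illustrative.
import numpy as np

rng = np.random.default_rng(6)
n, beta1, beta2 = 100_000, 1.0, 2.0

z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)        # X is correlated with the omitted Z
y = beta1 * x + beta2 * z + rng.normal(size=n)

slope = np.polyfit(x, y, 1)[0]          # short regression with Z omitted
implied = beta1 + beta2 * np.cov(x, z)[0, 1] / np.var(x)
print(slope, implied)                   # both near 1 + 2 * 0.5 / 1.25 = 1.8
```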
Sometimes we run regressions in order to figure out the causal effect of a certain
event. What, then, is the definition of a causal effect here? A causal effect is
defined to be the effect measured in an ideal randomized controlled experiment.
The Multiple Regression Model
Consider the case of two regressors:
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋1𝑖 + 𝛽2 𝑋2𝑖 + 𝑢𝑖 , 𝑖 = 1, … , 𝑛
𝑋1, 𝑋2 are the two independent variables (regressors).
(𝑌𝑖 , 𝑋1𝑖 , 𝑋2𝑖 ) denote the i-th observation on Y, X1, and X2.
𝛽0: unknown population intercept.
𝛽1: effect on Y of a change in X1, holding X2 constant.
𝛽2: effect on Y of a change in X2, holding X1 constant.
𝑢𝑖 : “error term” (omitted factors).
The OLS estimator minimizes the sum of squared difference between the actual
values of Yi and the prediction (predicted value) based on the estimated line. This
minimization problem yields the OLS estimators of 𝛽0, 𝛽1 and 𝛽2.
Measures of Fit
Actual = predicted + residual: 𝑌𝑖 = 𝑌̂𝑖 + 𝑢̂𝑖
SER (standard error of the regression) = standard error of $\hat{u}_i$ (with degrees-of-freedom correction) $= \sqrt{\frac{1}{n-k-1}\sum_{i=1}^{n}\hat{u}_i^2}$
RMSE = standard error of $\hat{u}_i$ (without degrees-of-freedom correction) $= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\hat{u}_i^2}$
$R^2$ = fraction of the sample variance of Y explained by X $= \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$; $R^2$ always increases when we add another regressor (this will be fixed by defining $\bar{R}^2$).
$\bar{R}^2$ = "adjusted $R^2$" $= 1 - \frac{n-1}{n-k-1}\cdot\frac{SSR}{TSS}$, and it can be negative.
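A quick numerical check of these measures of fit on a simulated two-regressor model (k = 2; all data-generating values are illustrative):

```python
# Check of SER, RMSE, R^2 and adjusted R^2 on simulated data (k = 2).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n, k = 200, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([0.5, -0.3]) + rng.normal(size=n)

res = sm.OLS(y, sm.add_constant(X)).fit()
ssr = res.resid @ res.resid                  # sum of squared residuals
print(np.sqrt(ssr / (n - k - 1)))            # SER
print(np.sqrt(ssr / n))                      # RMSE
print(res.rsquared, res.rsquared_adj)        # R^2 and adjusted R^2
```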
Each hypothesis on a single coefficient can be tested using the usual t-statistic, and
confidence intervals are constructed as (𝛽̂1 ± 1.96 × 𝑆𝐸(𝛽̂1 )), and likewise for
𝛽2 , … , 𝛽𝑘 . (Note that 𝛽̂1 and 𝛽̂2 are generally correlated.) In the special case
where the two t-statistics are uncorrelated, the F-statistic for testing q = 2 restrictions
becomes
$$F \cong \frac{1}{2}\left(t_1^2 + t_2^2\right)$$
In large samples, the F-statistic is distributed as $\chi_q^2/q$. The p-value here is the
tail probability of the $\chi_q^2/q$ distribution beyond the F-statistic actually
computed.
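A minimal sketch of a joint F-test in statsmodels (simulated, illustrative data; x1 and x2 are the default names statsmodels assigns to unnamed regressors):

```python
# Joint F-test sketch: H0: beta_1 = 0 and beta_2 = 0 (q = 2 restrictions),
# with HC1 robust covariance. Simulated, illustrative data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 300
X = rng.normal(size=(n, 2))
y = 1.0 + 0.4 * X[:, 0] + rng.normal(size=n)   # second slope is truly zero

res = sm.OLS(y, sm.add_constant(X)).fit(cov_type="HC1")
print(res.f_test("x1 = 0, x2 = 0"))            # F-statistic and its p-value
```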
Single Restriction Test
Consider the null and alternative hypothesis,
𝐻0 : 𝛽1 = 𝛽2 𝑣𝑠. 𝐻1 : 𝛽1 ≠ 𝛽2
This null imposes a single restriction (q=1) on multiple coefficients. Two methods for
testing single restrictions on multiple coefficients:
(1) Rearrange (“transform”) the regression. Rearrange the regressors so that the
restriction becomes a restriction on a single coefficient in an equivalent
regression.
(2) Perform the test directly.
Let us see how method (1) works: let
𝑌𝑖 = 𝛽0 + 𝛾1 𝑋1𝑖 + 𝛽2 𝑊𝑖 + 𝑢𝑖
where 𝛾1 = 𝛽1 − 𝛽2 , 𝑊𝑖 = 𝑋1𝑖 + 𝑋2𝑖 . So now
𝐻0 : 𝛾1 = 0 𝑣𝑠. 𝐻1 : 𝛾1 ≠ 0
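A sketch of method (1) on simulated data where $\beta_1 = \beta_2 = 0.5$, so the null is true by construction; a usual t-test on $\gamma_1$ in the transformed regression then carries out the single-restriction test:

```python
# Method (1) sketch: the restriction beta_1 = beta_2 becomes gamma_1 = 0 in
# the transformed regression of Y on X_1 and W = X_1 + X_2. The simulated
# data impose beta_1 = beta_2 = 0.5, so H0 is true here (illustrative).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

w = x1 + x2                                     # W_i = X_{1i} + X_{2i}
res = sm.OLS(y, sm.add_constant(np.column_stack([x1, w]))).fit()
print(res.tvalues[1], res.pvalues[1])           # usual t-test of gamma_1 = 0
```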
Model Specification
The job of determining which variables to include in multiple regression, that is, the
problem of choosing a regression specification, can be quite challenging, and no
single rule applies in all situations. The starting point for choosing a regression
specification is thinking through the possible sources of omitted variable bias. It is
important to rely on our expert knowledge of the empirical problem and to focus on
obtaining an unbiased estimate of the causal effect of interest. Do not rely solely on
purely statistical measures of fit.
A control variable is not the object of interest; rather, it is a regressor included to hold
constant factors that, if neglected, could lead the estimated causal effect of interest to
suffer from omitted variable bias. The mechanism by which adding control variable(s)
makes the estimate for the variable of interest unbiased will be introduced later.
Three interchangeable statements about what makes an effective control variable:
(i) An effective control variable is one which, when included in the regression,
makes the error term uncorrelated with the variable of interest.
(ii) Holding constant the control variable(s), the variable of interest is “as if”
randomly assigned.
(iii) Among individuals (entities) with the same value of the control variable(s), the
variable of interest is uncorrelated with the omitted determinants of Y.
We need a mathematical statement of what makes an effective control variable. This
condition is conditional mean independence: given the control variable, the mean of ui
doesn’t depend on the variable of interest.
Conditional mean independence: let Xi denote the variable of interest and Wi denote
the control variable(s). Wi is an effective control variable if conditional mean
independence holds: 𝐸(𝑢𝑖 |𝑋𝑖 , 𝑊𝑖 ) = 𝐸(𝑢𝑖 |𝑊𝑖 ).
Consider the regression model,
𝑌 = 𝛽0 + 𝛽1 𝑋 + 𝛽2 𝑊 + 𝑢
where X is the variable of interest and W is an effective control variable. In addition,
suppose that LSA #2, #3, and #4 hold. Then:
(1) 𝛽1 has a causal interpretation.
(2) 𝛽̂1 is unbiased.
(3) The coefficient on the control variable, 𝛽̂2, is in general biased. This bias stems
from the fact that the control variable is correlated with omitted variables in the
error term, so that it is subject to omitted variable bias.
6 Instrumental Variable Regression
One Regressor and One Instrument
Three important threats to internal validity are:
(1) Omitted variable bias from a variable that is correlated with X but is unobserved,
and so cannot be included in the regression;
(2) Simultaneous causality bias (X causes Y, Y causes X);
(3) Errors-in-variables bias (X is measured with error).
Instrumental variables regression can eliminate the bias that arises when 𝐸(𝑢|𝑋) ≠ 0
by using an instrumental variable, Z.
Instrumental variable (IV) regression is a general way to obtain a consistent estimator
of the unknown coefficients of the population regression function when the regressor,
X, is correlated with the error term, u.
The information about the movements in X that are uncorrelated with u is gleaned
from one or more additional variables, called instrumental variables or simply
instruments.
IV regression uses these additional variables as tools or “instruments” to isolate the
movements in X that are uncorrelated with u, which in turn permit consistent
estimation of the regression coefficients.
The IV Model and Assumptions
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋𝑖 + 𝑢𝑖
If Xi and ui are correlated, the OLS estimator is inconsistent.
IV estimation uses an additional, “instrumental” variable Z to isolate that part of Xi
that is uncorrelated with ui.
Terminology: an “endogenous” variable is one that is correlated with u; an
“exogenous” variable is one that is uncorrelated with u.
Two conditions for a valid IV, Z:
(1) Instrument relevance: 𝐶𝑜𝑣(𝑍𝑖 , 𝑋𝑖 ) ≠ 0
(2) Instrument exogeneity: 𝐶𝑜𝑣(𝑍𝑖 , 𝑢𝑖 ) = 0
The Two Stage Least Squares (TSLS) Estimator
As the name suggests, TSLS has two stages, that is, two regressions:
(1) The first stage isolates the part of X that is uncorrelated with u: regress X on Z using OLS.
𝑋𝑖 = 𝜋0 + 𝜋1 𝑍𝑖 + 𝑣𝑖
Because 𝑍𝑖 is uncorrelated with 𝑢𝑖 , 𝜋0 + 𝜋1 𝑍𝑖 is uncorrelated with 𝑢𝑖 . We
do not know 𝜋0 or 𝜋1 , but we can estimate them.
Compute the predicted values of 𝑋𝑖 : 𝑋̂𝑖 = 𝜋̂0 + 𝜋̂1 𝑍𝑖 . (And thus 𝑋̂𝑖 is
uncorrelated with 𝑢𝑖 .)
(2) Replace Xi by 𝑋̂𝑖 in the regression of interest: regress Y on 𝑋̂𝑖 using OLS:
𝑌𝑖 = 𝛽0 + 𝛽1 𝑋̂𝑖 + 𝑢𝑖
Because 𝑋̂𝑖 is uncorrelated with 𝑢𝑖 in large samples, the first least squares
assumption holds.
Thus 𝛽1 can be estimated by OLS using regression (2).
This argument relies on large samples (so 𝜋0 and 𝜋1 are well estimated using
regression (1)).
The resulting estimator is called the TSLS estimator, $\hat{\beta}_1^{TSLS}$.
Theoretically, we can show that
$$\beta_1 = \frac{Cov(Y_i, Z_i)}{Cov(X_i, Z_i)}$$
Similarly, the IV estimator is
$$\hat{\beta}_1^{TSLS} = \frac{s_{YZ}}{s_{XZ}}$$
We can also show the consistency of the TSLS estimator: $\hat{\beta}_1^{TSLS} \xrightarrow{p} \beta_1$.
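A final sketch on simulated, illustrative data: the IV estimator computed directly as $s_{YZ}/s_{XZ}$ coincides with the two-stage OLS procedure in this just-identified case, and both recover the true $\beta_1$ where plain OLS does not.

```python
# IV/TSLS sketch: beta_1-hat as s_YZ / s_XZ, and equivalently via the two
# OLS stages (they coincide in this just-identified case). The instrument z
# and all parameter values are illustrative; true beta_1 = 1.
import numpy as np

rng = np.random.default_rng(10)
n = 50_000
z = rng.normal(size=n)                        # instrument: relevant, exogenous
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # X is endogenous: Cov(X, u) > 0
y = 2.0 + 1.0 * x + u

beta_iv = np.cov(y, z)[0, 1] / np.cov(x, z)[0, 1]   # s_YZ / s_XZ

x_hat = np.polyval(np.polyfit(z, x, 1), z)          # stage 1: fitted X
beta_tsls = np.polyfit(x_hat, y, 1)[0]              # stage 2: regress Y on X-hat
print(beta_iv, beta_tsls)                           # both near 1
print(np.polyfit(x, y, 1)[0])                       # plain OLS: biased upward
```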