Econometrics UC3M

Lesson 1: Multiple Regression Analysis
(Wooldridge Ch. 3.1,3.2)
Course: Econometrics For Finance

Professor: Ricardo Mora
Master in Finance, UC3M
Econometrics for Finance, week 1
Introduction
The Multiple Regression Model
Two Examples
OLS
Simple and Multiple Regression
Goodness of fit
Introduction
The simple linear regression model

$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \dots, n$
A sample with n observations $(X_i, Y_i)$, $i = 1, \dots, n$.

X is the independent variable or regressor, Y is the dependent variable
$u_i$ = the error (changes in $Y_i$ due to other factors)
Assumptions
1. $Y = \beta_0 + \beta_1 X + u$ (linearity)
2. $E(u \mid X = x) = 0$ (conditional expectation equal to zero)
3. i.i.d. sample $(X_i, Y_i)$, $i = 1, \dots, n$ (random sampling)
4. $\mathrm{var}_n(X_i) \neq 0$
OLS is unbiased
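$\hat{\beta}_1 = \dfrac{\mathrm{cov}_n(X_i, Y_i)}{\mathrm{var}_n(X_i)}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$
For any sample size n, if Assumptions 1-4 hold:
1. $E(\hat{\beta}_0) = \beta_0$
2. $E(\hat{\beta}_1) = \beta_1$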
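The unbiasedness result can be illustrated with a small Monte Carlo sketch. The true parameters, sample size and number of replications below are hypothetical choices, not values from the course; the estimator is the covariance-over-variance formula above.

```python
import numpy as np

def ols_simple(x, y):
    """Simple-regression OLS: b1 = cov_n(x, y) / var_n(x), b0 = ybar - b1 * xbar."""
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # both use the 1/n convention
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Hypothetical data-generating process that satisfies Assumptions 1-4
rng = np.random.default_rng(0)
beta0, beta1, n, reps = 1.0, 2.0, 50, 2000

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    u = rng.normal(size=n)          # drawn independently of x, so E(u | X) = 0
    y = beta0 + beta1 * x + u
    estimates[r] = ols_simple(x, y)

mean_b0, mean_b1 = estimates.mean(axis=0)
print(mean_b0, mean_b1)             # averages over replications, close to (1, 2)
```

Each single estimate differs from the true value, but the average across replications should sit close to it, which is what unbiasedness means.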
Omitted variable bias

$u_i$ contains changes in $Y_i$ from variables that are not included in the regression
There are always omitted variables
When the omitted variables are correlated with the regressors, OLS is biased.
Example: $\mathit{harvest}_i = \beta_0 + \beta_1 \mathit{fertilizer}_i + u_i$

Consider the omitted variable land quality:
Land quality plausibly affects the harvest (higher land quality implies higher u)
Farmers presumably apply more fertilizer to fields of lower quality: $E(u \mid \mathit{fertilizer})$ decreases with fertilizer
OLS is biased: it imposes $E_n(\hat{u}_i \, \mathit{fertilizer}_i) = 0$, which is only reasonable if land quality is unrelated to fertilizer
What is the direction of this bias?
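One way to explore the question is to simulate the fertilizer example. This is only a sketch: every coefficient below is a hypothetical choice, with land quality raising the harvest and lowering fertilizer use, as the slide assumes.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

# Hypothetical population: better land -> less fertilizer, larger harvest
land_quality = rng.normal(size=n)
fertilizer = 5.0 - 0.8 * land_quality + rng.normal(size=n)
true_effect = 2.0
harvest = 1.0 + true_effect * fertilizer + 3.0 * land_quality + rng.normal(size=n)

# Regression of harvest on fertilizer only: land quality ends up in the error
short_slope = np.cov(fertilizer, harvest, bias=True)[0, 1] / np.var(fertilizer)

# Under these assumed values the population slope is 2 + 3 * (-0.8) / 1.64, about 0.54
print(short_slope)
```

With these assumptions the estimated slope falls well below the true effect of 2: the omitted variable has a positive effect on the harvest and a negative correlation with fertilizer, so the bias is negative.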
Solutions to omitted variable bias

We can make sure that regressor $X_i$ is unrelated to omitted variables: force $X_i$ to vary randomly in your sample
This is impossible with observational data
We can try to restrict the sample so that omitted factors are not correlated with $X_i$
Samples will be very small and we will not have a general answer to the problem
We can add the omitted variable as a second regressor (we see this today)
We could use another estimator that does not require conditional mean independence (2SLS, beyond the scope of this course)
The model with multiple regressors
If we have an omitted variable bias, we can add the omitted variable as a second regressor:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \dots, n$
In this example, we have two regressors: $X_1$, $X_2$

$\beta_1$ = effect on Y of a one-unit change in $X_1$, keeping everything else constant
$\beta_2$ = effect on Y of a one-unit change in $X_2$, keeping everything else constant
$u_i$ = error: changes in Y from factors unrelated to $X_1$ and $X_2$
Interpretation
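Change $X_1$ by $\Delta X_1$ units, keeping $X_2$ constant:

before: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$
after: $Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + u$
difference: $\Delta Y = \beta_1 \Delta X_1$
If $\Delta X_1 = 1$ and $\Delta X_2 = 0$, then $\Delta Y = \beta_1$
Similarly, if $\Delta X_1 = 0$ and $\Delta X_2 = 1$, then $\Delta Y = \beta_2$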
Two Examples
Two Interesting Examples of the Multiple Regression Model
Polynomials: $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u$
$\beta_1$ does not represent the effect on Y of a one-unit change in X keeping everything else constant, because when X changes, $X^2$ also changes
$\dfrac{\partial E(Y \mid X)}{\partial X} = \beta_1 + 2 \beta_2 X$
$\log(\mathit{wages}) = 6.7 + 0.03 \, \mathit{experience} - 0.0005 \, \mathit{experience}^2 + u$

$\dfrac{\partial E(\log(\mathit{wages}) \mid \mathit{experience})}{\partial \mathit{experience}} = 0.03 - 0.001 \, \mathit{experience}$
If you have no experience, your wage is around $e^{6.7} = 812.4$ euros, and the expected effect of an additional year of experience is a 3 percent increase in your wage
If you have 10 years of experience, then an additional year gives you an expected increase of 2 percent ($0.03 - 0.001 \times 10 = 0.02$)
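The arithmetic of the wage example can be reproduced in a few lines. The coefficients are the ones on the slide; the function name and the turning-point calculation are additions made here for illustration.

```python
import math

# Coefficients from the estimated equation on the slide
B0, B1, B2 = 6.7, 0.03, -0.0005

def marginal_effect(experience):
    """Derivative of E[log(wages) | experience]: B1 + 2 * B2 * experience."""
    return B1 + 2 * B2 * experience

baseline_wage = math.exp(B0)        # predicted wage level at zero experience
me_0 = marginal_effect(0)           # 0.03, i.e. about 3 percent per year
me_10 = marginal_effect(10)         # 0.02, i.e. about 2 percent per year
turning_point = -B1 / (2 * B2)      # experience at which the marginal effect is zero

print(round(baseline_wage, 1), round(me_0, 3), round(me_10, 3), round(turning_point, 1))
# -> 812.4 0.03 0.02 30.0
```

Under these estimates the marginal return to experience shrinks by 0.1 percentage points per year and reaches zero at 30 years.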
Interactions
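$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + u_i, \quad i = 1, \dots, n$
Marginal effects depend on the values of other regressors

$\dfrac{\partial E(Y \mid X_1, X_2)}{\partial X_1} = \beta_1 + \beta_3 X_2$
$\log(\mathit{wages}) = 7 - 0.02 \, \mathit{female} + 0.07 \, \mathit{educ} - 0.01 \, \mathit{educ} \times \mathit{female} + u$

$\dfrac{\partial E(\log(\mathit{wages}) \mid \mathit{educ}, \mathit{female})}{\partial \mathit{educ}} = 0.07 - 0.01 \, \mathit{female}$
What are the returns to an additional year of education?

What is the expected wage differential between males and females?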
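Both questions can be worked out directly from the estimated equation. The sketch below uses the slide's coefficients; the function names and the choice of 12 years of education for the evaluation are hypothetical.

```python
def return_to_education(female):
    """Effect of one more year of education on expected log wages: 0.07 - 0.01 * female."""
    return 0.07 - 0.01 * female

def gender_gap(educ):
    """Expected log-wage difference, women minus men, at a given education level.

    Terms that do not involve female cancel, leaving -0.02 - 0.01 * educ.
    """
    return -0.02 - 0.01 * educ

print(round(return_to_education(0), 2))   # men:   0.07
print(round(return_to_education(1), 2))   # women: 0.06
print(round(gender_gap(12), 2))           # gap at 12 years of education: -0.14
```

The gap is not a single number: because of the interaction term it widens by 0.01 log points with each additional year of education.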
OLS
Ordinary Least Squares
OLS with k regressors
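$(\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k) = \arg\min_{b_0, b_1, \dots, b_k} \sum_{i=1}^{n} \left[ Y_i - (b_0 + b_1 X_{1i} + \dots + b_k X_{ki}) \right]^2$
First order conditions:

$b_0$: $(-2) \sum_{i=1}^{n} \left[ Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}) \right] = 0$
$b_1$: $(-2) \sum_{i=1}^{n} X_{1i} \left[ Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}) \right] = 0$
$\vdots$
$b_k$: $(-2) \sum_{i=1}^{n} X_{ki} \left[ Y_i - (\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki}) \right] = 0$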
FOCs in terms of residuals
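$\sum_{i=1}^{n} \hat{u}_i = 0$
$\sum_{i=1}^{n} \hat{u}_i X_{1i} = 0$
$\vdots$
$\sum_{i=1}^{n} \hat{u}_i X_{ki} = 0$
As with the simple regression model:

$\sum_{i=1}^{n} \hat{u}_i = 0 \Rightarrow \bar{\hat{u}} = 0$ (residuals have zero average)
$\sum_{i=1}^{n} X_{ji} \hat{u}_i = 0 \Rightarrow \mathrm{cov}_n(X_{ji}, \hat{u}_i) = 0$ (residuals have zero covariance with all regressors)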
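These residual properties hold in every sample by construction, which is easy to check numerically. The data below are simulated with hypothetical coefficients; `np.linalg.lstsq` is used as the least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 3

# Hypothetical regressors and outcome; the first column of X is the constant
regressors = rng.normal(size=(n, k))
X = np.column_stack([np.ones(n), regressors])
y = X @ np.array([1.0, 0.5, -2.0, 0.3]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# The k + 1 first order conditions: sum of u_hat and sums of X_j * u_hat
moments = X.T @ residuals

# Sample covariance between each regressor and the residuals
sample_cov = [np.cov(X[:, j], residuals, bias=True)[0, 1] for j in range(1, k + 1)]

print(moments)      # all zero up to floating-point error
print(sample_cov)   # all zero up to floating-point error
```

The zeros say nothing about whether the true errors are uncorrelated with the regressors; OLS imposes them on the residuals regardless.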
Matrix Form
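Let
$X = \begin{pmatrix} 1 & X_{11} & \dots & X_{k1} \\ 1 & X_{12} & \dots & X_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & \dots & X_{kn} \end{pmatrix}$ and $Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$, then from the FOCs:
$\begin{pmatrix} n & \sum_{i=1}^{n} X_{1i} & \dots & \sum_{i=1}^{n} X_{ki} \\ \sum_{i=1}^{n} X_{1i} & \sum_{i=1}^{n} X_{1i}^2 & \dots & \sum_{i=1}^{n} X_{1i} X_{ki} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} X_{ki} & \sum_{i=1}^{n} X_{1i} X_{ki} & \dots & \sum_{i=1}^{n} X_{ki}^2 \end{pmatrix} \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} Y_i X_{1i} \\ \vdots \\ \sum_{i=1}^{n} Y_i X_{ki} \end{pmatrix}$

$X'X \hat{\beta} = X'Y$
This system of k + 1 equations with k + 1 unknowns has a unique solution if $\mathrm{rank}(X'X) = k + 1$:
$\hat{\beta} = (X'X)^{-1} X'Y$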
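A minimal sketch of the matrix solution, assuming simulated data with hypothetical coefficients. The function name is made up for illustration; it solves the normal equations with `np.linalg.solve` rather than forming the inverse explicitly, which is the usual numerical practice.

```python
import numpy as np

def ols_normal_equations(X, y):
    """Solve X'X b = X'y for b. X must already contain the column of ones."""
    # rank(X'X) equals rank(X), so the rank condition is checked on X directly
    if np.linalg.matrix_rank(X) < X.shape[1]:
        raise ValueError("X'X is singular: no unique OLS solution")
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical sample with k = 2 regressors plus a constant
rng = np.random.default_rng(6)
n = 300
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

beta_hat = ols_normal_equations(X, y)

# Cross-check against NumPy's least-squares routine
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(beta_hat, beta_lstsq))
```

If one regressor is an exact linear combination of the others, the rank condition fails and the function raises an error instead of returning arbitrary coefficients.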
Determinants of the Grade Point Average
Simple regression: $\widehat{\mathit{colGPA}} = 2.40 + 0.027 \, \mathit{ACT}$
Adding hsGPA: $\widehat{\mathit{colGPA}} = 1.29 + 0.009 \, \mathit{ACT} + 0.453 \, \mathit{hsGPA}$
What happens to the estimate of the coefficient on ACT?
Why? ($\mathrm{cov}_n(\mathit{ACT}, \mathit{hsGPA}) > 0$)
The issue
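Structural (long) model

$\mathit{wages} = \beta_0 + \beta_1 \mathit{educ} + \beta_2 \mathit{IQ} + \varepsilon, \quad \mathrm{cov}(\mathit{educ}, \varepsilon) = \mathrm{cov}(\mathit{IQ}, \varepsilon) = 0$
You do not have information on IQ: reduced form (short) model

$\mathit{wages} = \gamma_0 + \gamma_1 \mathit{educ} + u, \quad \mathrm{cov}(\mathit{educ}, u) = 0$
This is the best linear predictor (BLP) of wages having only information on education
Is there any relation between $\beta_1$ and $\gamma_1$?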
Orthogonality conditions in the long model
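$\mathrm{cov}(\mathit{educ}, \varepsilon) = 0$
$\mathrm{cov}(\mathit{IQ}, \varepsilon) = 0$
Since $\varepsilon = \mathit{wages} - \beta_0 - \beta_1 \mathit{educ} - \beta_2 \mathit{IQ}$,
$\mathrm{cov}(\mathit{educ}, \mathit{wages}) - \beta_1 \mathrm{var}(\mathit{educ}) - \beta_2 \mathrm{cov}(\mathit{educ}, \mathit{IQ}) = 0$
$\mathrm{cov}(\mathit{IQ}, \mathit{wages}) - \beta_1 \mathrm{cov}(\mathit{IQ}, \mathit{educ}) - \beta_2 \mathrm{var}(\mathit{IQ}) = 0$
This is a system of two linear equations in two unknowns ($\beta_1$ and $\beta_2$)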
Uncorrelated regressors
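If educ and IQ are uncorrelated, $\mathrm{cov}(\mathit{educ}, \mathit{IQ}) = 0$ and the system becomes
$\mathrm{cov}(\mathit{educ}, \mathit{wages}) - \beta_1 \mathrm{var}(\mathit{educ}) = 0$
$\mathrm{cov}(\mathit{IQ}, \mathit{wages}) - \beta_2 \mathrm{var}(\mathit{IQ}) = 0$
The first condition coincides with the orthogonality condition in the BLP (short) model
$\mathrm{cov}(\mathit{educ}, \mathit{wages}) - \gamma_1 \mathrm{var}(\mathit{educ}) = 0$
Hence, $\gamma_1 = \beta_1 = \dfrac{\mathrm{cov}(\mathit{educ}, \mathit{wages})}{\mathrm{var}(\mathit{educ})}$ if $\mathrm{cov}(\mathit{educ}, \mathit{IQ}) = 0$
If regressors are uncorrelated, the parameters of the multiple regression model are equal to the parameters of the simple regression models for each control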
The short model with correlated controls (1/2)

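With correlated controls, the long model's FOCs do not simplify:
$\mathrm{cov}(\mathit{educ}, \mathit{wages}) - \beta_1 \mathrm{var}(\mathit{educ}) - \beta_2 \mathrm{cov}(\mathit{educ}, \mathit{IQ}) = 0$
$\mathrm{cov}(\mathit{IQ}, \mathit{wages}) - \beta_1 \mathrm{cov}(\mathit{IQ}, \mathit{educ}) - \beta_2 \mathrm{var}(\mathit{IQ}) = 0$
Dividing the first condition by $\mathrm{var}(\mathit{educ})$:
$\dfrac{\mathrm{cov}(\mathit{educ}, \mathit{wages})}{\mathrm{var}(\mathit{educ})} = \beta_1 + \beta_2 \dfrac{\mathrm{cov}(\mathit{educ}, \mathit{IQ})}{\mathrm{var}(\mathit{educ})}$
But $\gamma_1 = \dfrac{\mathrm{cov}(\mathit{educ}, \mathit{wages})}{\mathrm{var}(\mathit{educ})}$, so that
$\gamma_1 = \beta_1 + \beta_2 \dfrac{\mathrm{cov}(\mathit{educ}, \mathit{IQ})}{\mathrm{var}(\mathit{educ})}$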
The short model with correlated controls (2/2)

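$\gamma_1 = \beta_1 + \beta_2 \dfrac{\mathrm{cov}(\mathit{educ}, \mathit{IQ})}{\mathrm{var}(\mathit{educ})}$
The coefficient $\gamma_1$ in the short model captures two effects on wages:

1. the ceteris paribus effect of educ for a given IQ level: $\beta_1$
2. the effect of changes in IQ that accompany changes in educ: $\beta_2 \dfrac{\mathrm{cov}(\mathit{educ}, \mathit{IQ})}{\mathrm{var}(\mathit{educ})}$
Note that $\dfrac{\mathrm{cov}(\mathit{educ}, \mathit{IQ})}{\mathrm{var}(\mathit{educ})}$ captures the change in IQ that is linearly associated with a one-unit change in educ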
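The same relation holds exactly between the OLS estimates of the short and long regressions in any sample, which can be checked numerically. The wage, education and IQ data below are simulated with hypothetical parameters; only the identity itself comes from the derivation above.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5_000

# Hypothetical population in which educ and IQ are positively correlated
iq = rng.normal(100, 15, size=n)
educ = 4 + 0.08 * iq + rng.normal(0, 2, size=n)
wages = 200 + 50 * educ + 5 * iq + rng.normal(0, 100, size=n)

# Long regression: wages on a constant, educ and IQ
X_long = np.column_stack([np.ones(n), educ, iq])
b_long, *_ = np.linalg.lstsq(X_long, wages, rcond=None)

# Short regression: wages on a constant and educ only
X_short = np.column_stack([np.ones(n), educ])
b_short, *_ = np.linalg.lstsq(X_short, wages, rcond=None)

# Sample analogue of cov(educ, IQ) / var(educ)
delta = np.cov(educ, iq, bias=True)[0, 1] / np.var(educ)

implied_short = b_long[1] + b_long[2] * delta
print(b_short[1], implied_short)   # the two numbers coincide
```

Because IQ raises wages and is positively correlated with education in this setup, the short-regression coefficient on educ exceeds the long-regression one.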
Goodness of fit
Goodness of fit measures

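$SER = \sqrt{\dfrac{1}{n - k - 1} \sum_{i=1}^{n} \hat{u}_i^2}$
$R^2 = \dfrac{ESS}{TSS} = \dfrac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{\hat{Y}})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$
Note that $R^2$ never decreases as k increases. The adjusted $R^2$ penalizes additional regressors: $\bar{R}^2 = 1 - \dfrac{n - 1}{n - k - 1} \dfrac{RSS}{TSS}$
GPA data:
1. Simple regression: $R^2 = 0.0427$, $SER = 0.3656$
2. Adding hsGPA: $R^2 = 0.1764$, $\bar{R}^2 = 0.1645$, $SER = 0.3403$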
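The three measures can be computed from any fitted regression as follows. This is a sketch on simulated data with hypothetical coefficients, not the GPA sample; the function name and the returned dictionary layout are choices made here.

```python
import numpy as np

def goodness_of_fit(X, y):
    """SER, R2 and adjusted R2 for an OLS fit. X must include the constant column."""
    n, p = X.shape                      # p = k + 1 (k regressors plus the constant)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_hat
    resid = y - fitted
    rss = np.sum(resid ** 2)                        # residual sum of squares
    ess = np.sum((fitted - fitted.mean()) ** 2)     # explained sum of squares
    tss = np.sum((y - y.mean()) ** 2)               # total sum of squares
    return {
        "SER": np.sqrt(rss / (n - p)),              # divides by n - k - 1
        "R2": ess / tss,
        "adj_R2": 1 - (n - 1) / (n - p) * rss / tss,
        "TSS": tss, "ESS": ess, "RSS": rss,
    }

# Hypothetical sample with k = 2 regressors
rng = np.random.default_rng(4)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])

fit = goodness_of_fit(X, y)
print(fit["SER"], fit["R2"], fit["adj_R2"])
```

The SER is expressed in the units of Y, while both versions of the $R^2$ are unit-free.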
Properties of the $R^2$
It represents the proportion of the sample variance of the dependent variable explained by the model
It never decreases when adding new regressors
Hence, it cannot be used to decide whether to add another regressor to the equation
If the model has a constant, $TSS = ESS + RSS$ and $0 \leq R^2 \leq 1$
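The second property can be seen by adding a regressor that is pure noise. In this sketch the data are simulated and the extra variable is unrelated to Y by construction; all names and values are hypothetical.

```python
import numpy as np

def r_squared(X, y):
    """R2 = 1 - RSS / TSS for an OLS fit. X must include the constant column."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
y = 1 + 0.5 * x1 + rng.normal(size=n)
noise = rng.normal(size=n)              # irrelevant regressor, independent of y

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([X_small, noise])

r2_small = r_squared(X_small, y)
r2_big = r_squared(X_big, y)
print(r2_small, r2_big)                 # r2_big is at least as large as r2_small
```

OLS can always set the coefficient on the new variable to zero and reproduce the smaller model, so the residual sum of squares cannot rise.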
Example: Crime data (I)

Sample: 2,725 men born in California in 1960 or 1961 who had been arrested at least once before 1986
We want to model the number of arrests these men had in 1986
$\widehat{\mathit{narr86}} = 0.712 - 0.150 \, \mathit{pcnv} - 0.034 \, \mathit{ptime86} - 0.104 \, \mathit{qemp86}, \quad R^2 = 0.0413$
The expected number of arrests is lower
the higher the proportion of past arrests that led to a conviction, pcnv
the more months they spent in prison in 1986, ptime86
the more quarters they were employed in 1986, qemp86
All these factors account for about 4 percent of the total variation in arrests
Example: Crime data (II)
$\widehat{\mathit{narr86}} = 0.712 - 0.150 \, \mathit{pcnv} - 0.034 \, \mathit{ptime86} - 0.104 \, \mathit{qemp86}, \quad R^2 = 0.0413$
We now add as a new factor the average sentence length in months, avgsen
$\widehat{\mathit{narr86}} = 0.707 - 0.151 \, \mathit{pcnv} + 0.0074 \, \mathit{avgsen} - 0.037 \, \mathit{ptime86} - 0.103 \, \mathit{qemp86}, \quad R^2 = 0.0422$
There is very little improvement in the $R^2$
The estimates of the other coefficients hardly change
The sign of the coefficient estimate for avgsen is unexpected: longer average sentences are associated with more arrests
Should we keep avgsen in the regression?
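One input to that decision is the adjusted $R^2$, which can be computed from the numbers on the slides (n = 2,725 and the two reported $R^2$ values). The sketch assumes both models include a constant, so that RSS/TSS = 1 - $R^2$; the function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2 = 1 - (n - 1) / (n - k - 1) * (1 - R2), for a model with a constant."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

n = 2725
adj_without = adjusted_r2(0.0413, n, k=3)   # pcnv, ptime86, qemp86
adj_with = adjusted_r2(0.0422, n, k=4)      # same regressors plus avgsen

print(round(adj_without, 4), round(adj_with, 4))   # -> 0.0402 0.0408
```

The adjusted $R^2$ rises only marginally, so this criterion gives at best weak support for keeping avgsen; it does not address the unexpected sign.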
Summary
Sometimes, the omission of variables can lead to bias in the OLS estimator
The best solution to avoid OV bias is to include the omitted variable in the regression
Slope coefficients in the multiple regression model capture ceteris paribus effects for variables that enter the model linearly
OLS still exploits sample orthogonality conditions