
Lesson 1: Multiple Regression Analysis

(Wooldridge, Ch. 3.1, 3.2)

Course: Econometrics for Finance


Professor: Ricardo Mora

Master in Finance, UC3M

Econometrics for Finance, week 1


Introduction

The Multiple Regression Model

Two Examples

OLS

Simple and Multiple Regression

Goodness of fit


Introduction


The simple linear regression model


$Y_i = \beta_0 + \beta_1 X_i + u_i, \quad i = 1, \ldots, n$

A sample with n observations $(X_i, Y_i)$, $i = 1, \ldots, n$.


X is the independent variable or regressor; Y is the dependent variable.
$u_i$ is the error term (changes in $Y_i$ due to other factors).
Assumptions
1. $Y = \beta_0 + \beta_1 X + u$ (linearity)
2. $E(u \mid X = x) = 0$ (conditional expectation equal to zero)
3. $(X_i, Y_i)$, $i = 1, \ldots, n$, is an i.i.d. sample (random sampling)
4. $\mathrm{var}_n(X_i) \neq 0$


OLS is unbiased

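$\hat\beta_1 = \frac{\mathrm{cov}_n(X_i, Y_i)}{\mathrm{var}_n(X_i)}, \qquad \hat\beta_0 = \bar Y - \hat\beta_1 \bar X$

For any sample size n, if Assumptions 1-4 hold:
1. $E(\hat\beta_0) = \beta_0$
2. $E(\hat\beta_1) = \beta_1$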
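Unbiasedness is a statement about the average of the estimator over repeated samples, so it can be illustrated with a small Monte Carlo experiment. The sketch below assumes a hypothetical data-generating process (normal X and u, true parameters β0 = 1 and β1 = 2, n = 50) that satisfies Assumptions 1-4, applies the cov_n/var_n formulas to many simulated samples, and averages the estimates. Each individual estimate varies, but the averages should sit close to the true values.

```python
import numpy as np

def ols_simple(x, y):
    """Simple-regression OLS: slope = cov_n(X, Y) / var_n(X), intercept = mean(Y) - slope * mean(X)."""
    b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

# Hypothetical data-generating process satisfying Assumptions 1-4 (all values assumed)
BETA0, BETA1 = 1.0, 2.0
n, reps = 50, 5000          # small sample size, many Monte Carlo replications
rng = np.random.default_rng(0)

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)          # i.i.d. regressor with nonzero variance
    u = rng.normal(size=n)          # drawn independently of x, so E(u | X) = 0
    y = BETA0 + BETA1 * x + u       # linear model
    estimates[r] = ols_simple(x, y)

# Averages across replications approximate E(b0_hat) and E(b1_hat)
print(estimates.mean(axis=0))
```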
2 E


Omitted variable bias


$u_i$ contains changes in $Y_i$ due to variables that are not included in the regression; there are always omitted variables.
When the omitted variables are correlated with the regressors, OLS is biased.

Example: $harvest_i = \beta_0 + \beta_1 fertilizer_i + u_i$


Consider the omitted variable land quality:
Land quality plausibly affects the harvest (higher land quality implies higher u).
Farmers presumably use more fertilizer on fields of lower quality: $E(u \mid fertilizer)$ decreases with fertilizer.

OLS is biased: it imposes $E_n(\hat u_i \, fertilizer_i) = 0$, which is only reasonable if land quality is unrelated to fertilizer.
What is the direction of this bias?
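One way to explore the direction of the bias is to simulate it. In the sketch below every number is an assumption: land quality raises the harvest (coefficient 2), and farmers use more fertilizer on worse land (fertilizer falls by 0.5 per unit of land quality). Under these assumptions the error term, which absorbs land quality, is negatively correlated with fertilizer, and the short-regression slope falls below the true fertilizer effect of 1 (to about 1 + 2 × (-0.5 / 1.25) = 0.2).

```python
import numpy as np

# Hypothetical data-generating process (all numbers are assumptions):
# land quality raises harvest, and farmers use more fertilizer on worse land.
rng = np.random.default_rng(1)
n = 100_000
BETA_FERT, BETA_LAND = 1.0, 2.0

land = rng.normal(size=n)                              # omitted variable
fertilizer = 5.0 - 0.5 * land + rng.normal(size=n)     # more fertilizer on low-quality land
harvest = 10.0 + BETA_FERT * fertilizer + BETA_LAND * land + rng.normal(size=n)

# Short regression of harvest on fertilizer only (land quality left in the error)
slope_short = np.cov(fertilizer, harvest, bias=True)[0, 1] / np.var(fertilizer)

print(f"true effect: {BETA_FERT}, short-regression slope: {slope_short:.3f}")
```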

The Multiple Regression Model


Solutions to omitted variable bias


We can make sure that the regressor $X_i$ is unrelated to omitted variables by forcing $X_i$ to vary randomly in the sample; this is impossible with observational data.

We can try to restrict the sample so that omitted factors are not correlated with $X_i$; the samples will then be very small, and we will not obtain a general answer to the problem.

We can add the omitted variable as a second regressor (covered today).
We could use another estimator that does not require conditional mean independence (2SLS, beyond the scope of this course).


The model with multiple regressors

If we have omitted variable bias, we can add the omitted variable as a second regressor:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i, \quad i = 1, \ldots, n$

In this example, we have two regressors: $X_1$ and $X_2$.

$\beta_1$ = effect on Y of a one-unit change in $X_1$, keeping everything else constant
$\beta_2$ = effect on Y of a one-unit change in $X_2$, keeping everything else constant
$u_i$ = error: changes in Y from factors unrelated to $X_1$ and $X_2$
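A minimal sketch of this remedy, assuming a hypothetical data-generating process in which $X_2$ is correlated with $X_1$ and also affects Y. The long regression (intercept, $X_1$, $X_2$), fitted with NumPy's least-squares solver, recovers coefficients close to the assumed values, while the short regression that omits $X_2$ gives a slope on $X_1$ far from $\beta_1$.

```python
import numpy as np

# Hypothetical data: x2 is correlated with x1 and also affects y (values assumed)
rng = np.random.default_rng(2)
n = 10_000
BETA = np.array([1.0, 0.5, -1.5])            # assumed (beta0, beta1, beta2)

x2 = rng.normal(size=n)
x1 = 0.7 * x2 + rng.normal(size=n)           # correlated regressors
u = rng.normal(size=n)
y = BETA[0] + BETA[1] * x1 + BETA[2] * x2 + u

# Long regression: intercept column plus both regressors
X_long = np.column_stack([np.ones(n), x1, x2])
b_long, *_ = np.linalg.lstsq(X_long, y, rcond=None)

# Short regression omitting x2, for comparison
X_short = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

print("long :", b_long.round(3))
print("short:", b_short.round(3))
```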


Interpretation

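Change $X_1$ by $\Delta X_1$ units keeping $X_2$ constant:

before: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$
after: $Y + \Delta Y = \beta_0 + \beta_1 (X_1 + \Delta X_1) + \beta_2 X_2 + u$
difference: $\Delta Y = \beta_1 \Delta X_1$

If $\Delta X_1 = 1$ and $\Delta X_2 = 0$, then $\Delta Y = \beta_1$.
Similarly, if $\Delta X_1 = 0$ and $\Delta X_2 = 1$, then $\Delta Y = \beta_2$.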


Two Interesting Examples of the Multiple Regression Model


Polynomials: $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + u$
$\beta_1$ does not represent the effect on Y of a one-unit change in X keeping everything else constant, because when X changes, $X^2$ also changes:
$\frac{\partial E(Y \mid X)}{\partial X} = \beta_1 + 2\beta_2 X$

$\log(wages) = 6.7 + 0.03\, experience - 0.0005\, experience^2 + u$

$\frac{\partial E(\log(wages) \mid experience)}{\partial experience} = 0.03 - 0.001\, experience$

If you have no experience, your wage is around $e^{6.7} \approx 812.4$ euros, and the expected effect of an additional year of experience is a 3 percent increase in your wage.

If you have 10 years of experience, an additional year gives you an expected increase of 2 percent ($0.03 - 0.001 \times 10 = 0.02$).
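The arithmetic in this example can be checked with a few lines of Python. The sketch hard-codes the coefficients of the wage equation; the turning point $-\beta_1 / (2\beta_2)$, where the marginal effect of experience reaches zero, is an extra quantity computed here for illustration from the same derivative.

```python
import math

# Coefficients from the example: log(wages) = 6.7 + 0.03 exper - 0.0005 exper^2
B0, B1, B2 = 6.7, 0.03, -0.0005

def marginal_effect(exper):
    """d E(log wage | exper) / d exper = B1 + 2 * B2 * exper (approximate proportional change)."""
    return B1 + 2 * B2 * exper

# Wage level implied by the intercept (no experience)
print(round(math.exp(B0), 1))

# Marginal effect of one more year at several experience levels
for exper in (0, 10, 20):
    print(exper, round(marginal_effect(exper), 4))

# Experience level at which the marginal effect reaches zero: -B1 / (2 * B2)
print(-B1 / (2 * B2))
```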

Interactions

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + u_i, \quad i = 1, \ldots, n$

Marginal effects depend on the values of other regressors:

$\frac{\partial E(Y \mid X_1, X_2)}{\partial X_1} = \beta_1 + \beta_3 X_2$

$\log(wages) = 7 - 0.02\, female + 0.07\, educ - 0.01\, educ \times female + u$

$\frac{\partial E(\log(wages) \mid educ, female)}{\partial educ} = 0.07 - 0.01\, female$

What are the returns to an additional year of education?

What is the expected wage differential between males and females?
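A short sketch that evaluates both questions with the fitted coefficients of the interaction example. The function names are hypothetical, and `female` is assumed to be a 0/1 dummy. Under this specification the return to education is 0.07 for men and 0.06 for women, and the expected log-wage gap (female minus male) is $-0.02 - 0.01\, educ$, so the answer to the second question depends on the education level.

```python
# Coefficients from the example:
# log(wages) = 7 - 0.02 female + 0.07 educ - 0.01 educ*female
B0, B_FEMALE, B_EDUC, B_INTER = 7.0, -0.02, 0.07, -0.01

def expected_log_wage(educ, female):
    """E(log wage | educ, female) implied by the fitted coefficients (female is 0 or 1)."""
    return B0 + B_FEMALE * female + B_EDUC * educ + B_INTER * educ * female

def return_to_educ(female):
    """Marginal effect of one more year of education: B_EDUC + B_INTER * female."""
    return B_EDUC + B_INTER * female

def gender_gap(educ):
    """Expected log-wage difference, female minus male, at a given education level."""
    return expected_log_wage(educ, 1) - expected_log_wage(educ, 0)

print(return_to_educ(0), return_to_educ(1))            # men vs. women
print([round(gender_gap(e), 3) for e in (0, 12, 16)])  # gap at several education levels
```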


Ordinary Least Squares


OLS with k regressors

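$(\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k) = \arg\min_{b_0, b_1, \ldots, b_k} \sum_{i=1}^{n} [Y_i - (b_0 + b_1 X_{1i} + \cdots + b_k X_{ki})]^2$

First order conditions:

$b_0: \; (-2) \sum_{i=1}^{n} [Y_i - (\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k X_{ki})] = 0$
$b_1: \; (-2) \sum_{i=1}^{n} X_{1i} [Y_i - (\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k X_{ki})] = 0$
$\vdots$
$b_k: \; (-2) \sum_{i=1}^{n} X_{ki} [Y_i - (\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k X_{ki})] = 0$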


FOCs in terms of residuals

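Writing $\hat u_i = Y_i - (\hat\beta_0 + \hat\beta_1 X_{1i} + \cdots + \hat\beta_k X_{ki})$, the FOCs become:

$\sum_{i=1}^{n} \hat u_i = 0$
$\sum_{i=1}^{n} \hat u_i X_{1i} = 0$
$\vdots$
$\sum_{i=1}^{n} \hat u_i X_{ki} = 0$

As with the simple regression model:

$\sum_{i=1}^{n} \hat u_i = 0 \Rightarrow \bar{\hat u} = 0$ (residuals have zero average)
$\sum_{i=1}^{n} X_{ji} \hat u_i = 0 \Rightarrow \mathrm{cov}_n(X_{ji}, \hat u_i) = 0$ (residuals have zero covariance with all regressors)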
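Both residual properties can be verified numerically. This sketch uses simulated data (the coefficients and sample size are arbitrary assumptions), fits OLS with `numpy.linalg.lstsq`, and prints the residual mean and the cross-products between residuals and each regressor. Both should be zero up to floating-point error.

```python
import numpy as np

# Simulated data with k = 3 regressors (hypothetical values)
rng = np.random.default_rng(3)
n, k = 200, 3
Xk = rng.normal(size=(n, k))
y = 1.0 + Xk @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), Xk])        # add the intercept column
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat                     # OLS residuals u_hat

print(resid.mean())            # residual average (zero up to rounding)
print(X[:, 1:].T @ resid)      # sum of X_ji * u_hat_i for each regressor j
```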


Matrix Form

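Let $X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{pmatrix}$ and $Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$; then from the FOCs:

$$\begin{pmatrix} n & \sum_{i=1}^{n} X_{1i} & \cdots & \sum_{i=1}^{n} X_{ki} \\ \sum_{i=1}^{n} X_{1i} & \sum_{i=1}^{n} X_{1i}^2 & \cdots & \sum_{i=1}^{n} X_{1i} X_{ki} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{i=1}^{n} X_{ki} & \sum_{i=1}^{n} X_{1i} X_{ki} & \cdots & \sum_{i=1}^{n} X_{ki}^2 \end{pmatrix} \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} Y_i X_{1i} \\ \vdots \\ \sum_{i=1}^{n} Y_i X_{ki} \end{pmatrix}$$

that is,

$X'X \hat\beta = X'Y$

This system of k + 1 equations in k + 1 unknowns has a unique solution if $\mathrm{rank}(X'X) = k + 1$:

$\hat\beta = (X'X)^{-1} X'Y$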
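A sketch of solving the normal equations directly, on hypothetical simulated data with k = 2. It builds the design matrix with a leading column of ones, checks the rank condition, solves $X'X\hat\beta = X'Y$ with `numpy.linalg.solve`, and compares the result with `numpy.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 2.0 + 1.0 * x1 - 0.5 * x2 + rng.normal(size=n)   # hypothetical DGP

X = np.column_stack([np.ones(n), x1, x2])   # n x (k+1) design matrix, k = 2
XtX, XtY = X.T @ X, X.T @ y                 # left- and right-hand sides of the normal equations

assert np.linalg.matrix_rank(XtX) == X.shape[1]   # rank(X'X) = k + 1
beta_hat = np.linalg.solve(XtX, XtY)              # solves X'X b = X'Y

# Same answer from a least-squares solver that works on X directly
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat, np.allclose(beta_hat, beta_lstsq))
```

Solving the linear system rather than forming $(X'X)^{-1}$ explicitly is the usual numerical choice; the two are algebraically equivalent whenever the rank condition holds.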

Determinants of the Grade Point Average

Simple regression: $\widehat{colGPA} = 2.40 + 0.027\, ACT$
Adding hsGPA: $\widehat{colGPA} = 1.29 + 0.009\, ACT + 0.453\, hsGPA$
What happens to the estimate of the coefficient associated with ACT?
Why? ($\mathrm{cov}_n(ACT, hsGPA) > 0$)


Simple and Multiple Regression


The issue

Structural (long) model:

$wages = \beta_0 + \beta_1 educ + \beta_2 IQ + \varepsilon, \quad \mathrm{cov}(educ, \varepsilon) = \mathrm{cov}(IQ, \varepsilon) = 0$

You do not have information on IQ: reduced-form model

$wages = \gamma_0 + \gamma_1 educ + u, \quad \mathrm{cov}(educ, u) = 0$
This is the best linear predictor given only information on education.

Is there any relation between $\gamma_1$ and $\beta_1$?


Orthogonality conditions in the long model

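$\mathrm{cov}(educ, \varepsilon) = 0$
$\mathrm{cov}(IQ, \varepsilon) = 0$
Since $\varepsilon = wages - \beta_0 - \beta_1 educ - \beta_2 IQ$,
$\mathrm{cov}(educ, wages) - \beta_1 \mathrm{var}(educ) - \beta_2 \mathrm{cov}(educ, IQ) = 0$
$\mathrm{cov}(IQ, wages) - \beta_1 \mathrm{cov}(IQ, educ) - \beta_2 \mathrm{var}(IQ) = 0$
This is a system of two linear equations in two unknowns ($\beta_1$ and $\beta_2$).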
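The two orthogonality conditions can be solved numerically once population moments are replaced by sample moments. The data below are simulated under assumed coefficients (they are not estimates from any real wage data). Solving the 2 × 2 system gives the same slopes as OLS with an intercept.

```python
import numpy as np

# Hypothetical simulated data: education, IQ and wages (coefficients assumed)
rng = np.random.default_rng(5)
n = 5_000
iq = rng.normal(100, 15, size=n)
educ = 12 + 0.05 * (iq - 100) + rng.normal(0, 2, size=n)
wages = 5 + 0.8 * educ + 0.04 * iq + rng.normal(0, 3, size=n)

C = np.cov(np.vstack([educ, iq, wages]), bias=True)   # 3 x 3 sample covariance matrix

# The two orthogonality conditions written as A @ (beta1, beta2) = c
A = np.array([[C[0, 0], C[0, 1]],
              [C[1, 0], C[1, 1]]])
c = np.array([C[0, 2], C[1, 2]])
beta1, beta2 = np.linalg.solve(A, c)

# Cross-check against OLS with an intercept
X = np.column_stack([np.ones(n), educ, iq])
b, *_ = np.linalg.lstsq(X, wages, rcond=None)
print(beta1, beta2, b[1:])
```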


Uncorrelated regressors
If educ and IQ are uncorrelated, $\mathrm{cov}(educ, IQ) = 0$ and the system becomes
$\mathrm{cov}(educ, wages) - \beta_1 \mathrm{var}(educ) = 0$
$\mathrm{cov}(IQ, wages) - \beta_2 \mathrm{var}(IQ) = 0$
The first condition coincides with the orthogonality condition in the BLP (short) model:
$\mathrm{cov}(educ, wages) - \gamma_1 \mathrm{var}(educ) = 0$
Hence, $\beta_1 = \gamma_1 = \frac{\mathrm{cov}(educ, wages)}{\mathrm{var}(educ)}$ if $\mathrm{cov}(educ, IQ) = 0$.

If the regressors are uncorrelated, the parameters of the multiple regression model are equal to the parameters of the simple regression models for each control.
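To illustrate the uncorrelated case, the sketch below constructs a hypothetical `educ` variable whose sample covariance with `iq` is exactly zero, by removing from a random draw its linear projection on IQ. Under that construction, the long-regression slopes coincide with the two simple-regression slopes up to rounding. All coefficients are assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000
iq = rng.normal(100, 15, size=n)
raw = rng.normal(12, 2, size=n)

# Remove from `raw` its sample linear projection on iq, so cov_n(educ, iq) = 0 exactly
d = iq - iq.mean()
educ = raw - d * (d @ (raw - raw.mean())) / (d @ d)
wages = 5 + 0.8 * educ + 0.04 * iq + rng.normal(0, 3, size=n)   # hypothetical DGP

def slope(x, y):
    """Simple-regression slope cov_n(x, y) / var_n(x)."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

X = np.column_stack([np.ones(n), educ, iq])
b_long, *_ = np.linalg.lstsq(X, wages, rcond=None)

print(np.cov(educ, iq, bias=True)[0, 1])    # zero up to rounding, by construction
print(b_long[1], slope(educ, wages))        # long vs. short educ slope
print(b_long[2], slope(iq, wages))          # long vs. short IQ slope
```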

The short model with correlated controls (1/2)


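With correlated controls, the long model's FOCs do not simplify:
$\mathrm{cov}(educ, wages) - \beta_1 \mathrm{var}(educ) - \beta_2 \mathrm{cov}(educ, IQ) = 0$
$\mathrm{cov}(IQ, wages) - \beta_1 \mathrm{cov}(IQ, educ) - \beta_2 \mathrm{var}(IQ) = 0$
Dividing the first condition by $\mathrm{var}(educ)$:
$\frac{\mathrm{cov}(educ, wages)}{\mathrm{var}(educ)} = \beta_1 + \beta_2 \frac{\mathrm{cov}(educ, IQ)}{\mathrm{var}(educ)}$
But $\gamma_1 = \frac{\mathrm{cov}(educ, wages)}{\mathrm{var}(educ)}$, so that
$\gamma_1 = \beta_1 + \beta_2 \frac{\mathrm{cov}(educ, IQ)}{\mathrm{var}(educ)}$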


The short model with correlated controls (2/2)


$\gamma_1 = \beta_1 + \beta_2 \frac{\mathrm{cov}(educ, IQ)}{\mathrm{var}(educ)}$

The coefficient $\gamma_1$ in the short model captures two effects on wages:

1. the ceteris paribus effect of educ for a given IQ level: $\beta_1$
2. the effect of changes in IQ that occur simultaneously with changes in educ: $\beta_2 \frac{\mathrm{cov}(educ, IQ)}{\mathrm{var}(educ)}$

Note that $\frac{\mathrm{cov}(educ, IQ)}{\mathrm{var}(educ)}$ captures the change in IQ associated with an independent change in educ!
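The same decomposition holds exactly in any sample once population moments are replaced by sample moments: the short OLS slope equals the long OLS slope on educ plus the long slope on IQ times the slope from regressing IQ on educ. A sketch with simulated data, where all coefficients are assumptions:

```python
import numpy as np

# Hypothetical simulated data in which educ and IQ are positively correlated
rng = np.random.default_rng(7)
n = 2_000
iq = rng.normal(100, 15, size=n)
educ = 12 + 0.05 * (iq - 100) + rng.normal(0, 2, size=n)
wages = 5 + 0.8 * educ + 0.04 * iq + rng.normal(0, 3, size=n)

def slope(x, y):
    """Simple-regression slope cov_n(x, y) / var_n(x)."""
    return np.cov(x, y, bias=True)[0, 1] / np.var(x)

# Long model: wages on a constant, educ and IQ
X = np.column_stack([np.ones(n), educ, iq])
_, beta1_hat, beta2_hat = np.linalg.lstsq(X, wages, rcond=None)[0]

gamma1_hat = slope(educ, wages)     # short-model slope
delta_hat = slope(educ, iq)         # cov_n(educ, IQ) / var_n(educ)

print(gamma1_hat, beta1_hat + beta2_hat * delta_hat)
```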


Goodness of fit


Goodness of fit measures


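$SER = \sqrt{\frac{1}{n - k - 1} \sum_{i=1}^{n} \hat u_i^2}$

$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2}{\sum_{i=1}^{n} (Y_i - \bar Y)^2}$

Note that $R^2$ never decreases as k increases. The adjusted $R^2$ penalizes additional regressors: $\bar R^2 = 1 - \frac{n - 1}{n - k - 1} \frac{RSS}{TSS}$

GPA data:
1. Simple regression: $R^2 = .0427$, SER = 0.3656
2. Adding hsGPA: $R^2 = .1764$, $\bar R^2 = .1645$, SER = 0.3403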
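A sketch of the three measures as Python functions, assuming the model includes an intercept (so that RSS/TSS = 1 - $R^2$). The function names are hypothetical, and the usage lines fit simulated data with arbitrary coefficients; a real data set would be passed the same way.

```python
import numpy as np

def adjusted_r2(r2, n, k):
    """Adjusted R^2 = 1 - (n - 1)/(n - k - 1) * RSS/TSS, with RSS/TSS = 1 - R^2 (intercept assumed)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

def fit_measures(y, y_hat, k):
    """SER, R^2 and adjusted R^2 for an OLS fit with an intercept and k slope regressors."""
    n = len(y)
    resid = y - y_hat
    rss = np.sum(resid ** 2)                  # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)         # total sum of squares
    ess = np.sum((y_hat - y.mean()) ** 2)     # explained sum of squares
    ser = np.sqrt(rss / (n - k - 1))
    r2 = ess / tss
    return ser, r2, adjusted_r2(r2, n, k)

# Usage on hypothetical simulated data with k = 2 regressors
rng = np.random.default_rng(8)
n, k = 300, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.3, 0.2]) + rng.normal(size=n)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(fit_measures(y, X @ b, k))
```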


Properties of the $R^2$

It represents the proportion of the sample variance of the dependent variable explained by the model.
It never decreases when new regressors are added; hence, it cannot be used to decide whether we should add another regressor to the equation.

If the model has a constant, TSS = ESS + RSS and $0 \le R^2 \le 1$.
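The "never decreases" property can be seen by adding a regressor that is pure noise. In this sketch (simulated data, hypothetical coefficients), the $R^2$ of the long model is at least as large as that of the short model, even though the added variable is unrelated to y by construction.

```python
import numpy as np

def r_squared(X, y):
    """R^2 = 1 - RSS/TSS from an OLS fit of y on the columns of X (X includes a constant)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(9)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
noise = rng.normal(size=n)          # hypothetical regressor unrelated to y

X_short = np.column_stack([np.ones(n), x])
X_long = np.column_stack([X_short, noise])
print(r_squared(X_short, y), r_squared(X_long, y))
```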


Example: Crime data (I)


Sample: 2,725 men born in California in 1960 or 1961 who had been arrested at least once before 1986.
We want to model the number of arrests these men had in 1986:

$\widehat{narr86} = 0.712 - 0.150\, pcnv - 0.034\, ptime86 - 0.104\, qemp86, \quad R^2 = 0.0413$

The expected number of arrests depends negatively on:
the proportion of prior arrests that led to a conviction, pcnv
the number of months spent in prison in 1986, ptime86
the number of quarters they had been employed, qemp86

All these factors together account for about 4 percent of the total variation in arrests.
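A sketch that turns the fitted equation into predictions, using only the coefficients reported for the crime data. The function name and the input values are hypothetical. The printed differences are ceteris paribus changes in predicted arrests: -0.075 for a 0.5 increase in pcnv, and -0.416 for being employed in all four quarters instead of none.

```python
# Fitted equation from the slide:
# narr86_hat = 0.712 - 0.150 pcnv - 0.034 ptime86 - 0.104 qemp86
COEF = {"const": 0.712, "pcnv": -0.150, "ptime86": -0.034, "qemp86": -0.104}

def predicted_arrests(pcnv, ptime86, qemp86):
    """Predicted number of arrests in 1986 for one man."""
    return (COEF["const"] + COEF["pcnv"] * pcnv
            + COEF["ptime86"] * ptime86 + COEF["qemp86"] * qemp86)

# Raising pcnv by 0.5 while holding ptime86 and qemp86 fixed
print(predicted_arrests(0.5, 0, 0) - predicted_arrests(0.0, 0, 0))
# Employed in all four quarters instead of none, other factors fixed
print(predicted_arrests(0, 0, 4) - predicted_arrests(0, 0, 0))
```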


Example: Crime data (II)

$\widehat{narr86} = 0.712 - 0.150\, pcnv - 0.034\, ptime86 - 0.104\, qemp86, \quad R^2 = 0.0413$

We now add as a new factor the average sentence length in months, avgsen:

$\widehat{narr86} = 0.707 - 0.151\, pcnv + 0.0074\, avgsen - 0.037\, ptime86 - 0.103\, qemp86, \quad R^2 = 0.0422$

There is very little improvement in the $R^2$.
The estimates of the other coefficients hardly change.
The sign of the coefficient estimate for avgsen is unusual: a longer average sentence is associated with more expected arrests.
Should we keep avgsen in the regression?
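The adjusted $R^2$ of both crime regressions can be computed from the rounded figures reported on these slides, with n = 2,725 and k = 3 or 4 regressors. This sketch assumes both models include an intercept. It gives roughly 0.0402 without avgsen and 0.0408 with it, so the gain that survives the penalty for the extra regressor is very small; whether to keep avgsen remains the question the slide poses.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a model with an intercept and k slope regressors."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

n = 2_725                                      # sample size reported for the crime data
without_avgsen = adjusted_r2(0.0413, n, k=3)   # pcnv, ptime86, qemp86
with_avgsen = adjusted_r2(0.0422, n, k=4)      # adds avgsen
print(round(without_avgsen, 4), round(with_avgsen, 4))
```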


Summary

Sometimes, the omission of variables can lead to bias in the OLS estimator.
The best solution to avoid omitted variable (OV) bias is to include the omitted variable in the regression.
Slope coefficients in the multiple regression model capture ceteris paribus effects for variables that enter the model linearly.
OLS still exploits sample orthogonality conditions.
