
QUAN201 Introductory Econometrics Dean Hyslop

Lecture 1 (Ref: Wooldridge, Chapter 1)



1. What is Econometrics?

It is the statistical or empirical analysis of economic
relationships.

This may involve:

Using economic theory to guide statistical modelling

Using statistical methods to:

1. Estimate economic relationships
E.g. the effect of education on wages

2. Test economic theories
E.g. does raising minimum wage reduce
employment?

3. Evaluate / implement government / business policy
E.g. how effective are job training programmes at raising
low-skilled employment & wages?

4. Forecast / predict economic variables
E.g. GDP growth, inflation, interest rates, etc.


2. How does Econometrics differ from
(Classical) Statistics?

Classical Statistics generally deals with experimental data
collected in laboratory environments in the natural sciences:
So the effects of control/treatment variables on
outcome/response variables can be examined
directly by changing the level of the control
variable(s), holding all else equal, and measuring
the effect on the outcome variable(s).

In contrast, most economic (and other social science) data
are non-experimental in nature:
So there is often no reason to believe that, if two
observations of a control variable differ, all else
is the same between these observations.
E.g. can we assume that 2 randomly selected people with
different education levels are the same in other respects?

Therefore, many non-experimental issues need to be
addressed in econometrics in order to confidently infer
causality between a control variable and an outcome
variable.

This makes econometrics both difficult and interesting!

3. Stages of Econometric Analysis
1. Careful formulation of the question of interest
E.g. what is the effect of an additional year of education
on wages?

2. Either use economic theory or an informal/intuitive
approach to develop an economic framework for estimation
& testing
E.g. Mincer's (1962) Human Capital Theory

3. Translate the economic model into an econometric
model to be estimated statistically. This requires:
functional form of the relationship(s) to be estimated
between the observed variables of interest;
assumptions on the effects of unobserved factors
E.g. linear relationship between log(wages) and years of
education, and assume other factors that affect wages are
uncorrelated with education

4. Formulate hypotheses of interest in terms of the
(unknown) model parameters.
E.g. $H_0$: returns to education = 0 vs $H_1$: not so

Empirical analysis requires data!

5. Given data, proceed to estimation, hypothesis testing
and general model-specification evaluation

Note: Generally econometrics begins at stage-3, the
econometric specification of the model
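
As a sketch of what stages 3 and 4 might look like for the returns-to-education example, the specification and hypotheses can be written out as below. The symbols $wage_i$, $educ_i$ and $u_i$ (the unobserved factors) are notation assumed here for illustration; the lecture only describes the model in words.

```latex
\log(wage_i) = a + b \, educ_i + u_i, \qquad \mathrm{Cov}(educ_i, u_i) = 0

H_0 : b = 0 \quad \text{vs} \quad H_1 : b \neq 0
```

Here $b$ plays the role of the "returns to education", so the stage-4 hypothesis is a statement about that single unknown parameter.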

QUAN201 Introductory Econometrics Dean Hyslop
Lecture 2 (Ref: Wooldridge, Chapter 2)


The starting point for any statistical analysis should be a
description of the data to:
i. understand what the data are supposed to measure
ii. check whether there are values that look like they were
mis-measured e.g. outliers
This can be done using:
i. simple graphical methods e.g. histograms or
scatterplots, etc; and/or
ii. common descriptive statistics e.g. mean, median,
standard deviation, etc.
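
A minimal sketch of such a descriptive check, using only Python's standard library. The wage figures are made up for illustration, and the "2 standard deviations" screening rule is an assumption of this example, not something the lecture prescribes.

```python
from statistics import mean, median, stdev

# Hypothetical hourly wages (dollars per hour); the last entry is a
# deliberately implausible value of the kind a data check should catch.
wages = [18.5, 21.0, 19.75, 22.4, 20.1, 17.9, 23.3, 210.0]

mean_wage = mean(wages)
median_wage = median(wages)
sd_wage = stdev(wages)  # sample standard deviation (divides by N-1)

print(f"mean={mean_wage:.2f}  median={median_wage:.2f}  sd={sd_wage:.2f}")

# Illustrative screening rule (an assumption of this sketch):
# flag values more than 2 standard deviations away from the mean.
suspects = [w for w in wages if abs(w - mean_wage) > 2 * sd_wage]
print("values to inspect:", suspects)
```

A large gap between the mean and the median, as here, is itself a hint that a few extreme values are present.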

After some such descriptive analysis, regression is the most
important building block in econometrics

1. What is Regression?
possible answers:

i. Fitting a line through data e.g. scatterplot of two
variables X and Y.

ii. Estimating the relationship between two or more
variables e.g. how are variables X and Y related?

iii. Ultimately, used as a basis for causal interpretation in
econometrics e.g. to answer the question: what is the
effect of a change in variable-X on outcome-Y?

2. Regression as line fitting
Consider the following scatterplot of data for two
variables X and Y :

Our interest is to fit a (simple) linear regression,
represented by
$$Y_i = a + bX_i + e_i,$$
to the $(X_i, Y_i)$ scatterplot of data, where:
$Y_i$ is called the outcome or dependent variable,
$X_i$ is the explanatory or independent variable,
$e_i$ is the residual (or error),
$(a, b)$ are regression parameters or coefficients: $a$ is
called the intercept, and $b$ is called the slope, and
the subscript $i$ denotes observation-$i$.


Three classic Examples:

1. Galton (1886, "Regression towards Mediocrity"):
$X_i$ = father-i's height (in metres),
$Y_i$ = son-i's height (in metres)
What is the intergenerational relationship within families,
as measured by fathers' and sons' heights?


2. Returns to education (e.g. Mincer, 1962):
$X_i$ = person-i's schooling level (years of education),
$Y_i$ = person-i's wages (dollars per hour worked)
What is the average effect of an additional year of
education on a worker's wage?


3. Phillips-curve (Phillips, 1958, Economica):
$X_t$ = year-t unemployment rate (percent unemployed),
$Y_t$ = year-t wage inflation rate (percent)
What is the relationship between unemployment and
wage inflation?

Consider three possible lines fit to these data, labelled A, B,
and C as follows.
First, consider line-A

Let
$$\hat{Y}_i = a + bX_i$$
be the fitted (or predicted) value of Y, given the value $X_i$,
i.e. the point on the fitted line corresponding to $X_i$.
Then the residual is the difference between the actual
and predicted Y-values: i.e.
$$e_i = Y_i - \hat{Y}_i = Y_i - (a + bX_i).$$
Intuitively, line-A doesn't fit the data, i.e. it doesn't go
through the scatterplot!
More formally, the residuals ($e_i$) are all negative, so the
average residual is also negative:
$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i < 0.$$
This suggests:

Property 1: one desirable property of a regression line fit
is that the average residual is 0: i.e.
$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} e_i = 0.$$

Next, consider line-B

Line-B satisfies the zero average residual condition but
still doesn't look like a good fit, because there are mostly
negative residuals for low Xs and positive residuals for
high Xs.
More formally, the problem is that the residuals are correlated
with the $X_i$'s, whereas their covariance should be zero:
$$\mathrm{cov}(e_i, X_i) = \frac{1}{N}\sum_{i=1}^{N} (e_i - \bar{e})(X_i - \bar{X}) = 0.$$
This suggests:
This suggests:

Property 2: a second desirable property of a regression
line fit is that the residuals are uncorrelated with the Xs.
Note: if $\bar{e} = 0$, then this implies:
$$\mathrm{cov}(e_i, X_i) = \frac{1}{N}\sum_{i=1}^{N} e_i X_i = 0.$$
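
The two properties can be checked numerically for any candidate line. The sketch below uses made-up data (constructed so that the line Y = 1 + X satisfies both properties) and three hypothetical candidate lines standing in for lines A, B and C; none of these numbers come from the lecture's scatterplot.

```python
from statistics import mean

# Hypothetical data: Y = 1 + X plus small deviations that average to
# zero and are uncorrelated with X by construction.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 2.8, 4.1, 5.1, 5.8, 7.1]

def check_line(a, b):
    """Return (average residual, covariance of residuals with X)
    for the candidate line Y = a + b*X."""
    e = [y - (a + b * x) for x, y in zip(X, Y)]
    e_bar = mean(e)
    x_bar = mean(X)
    cov_ex = mean([(ei - e_bar) * (xi - x_bar) for ei, xi in zip(e, X)])
    return e_bar, cov_ex

# "Line A": a flat line above every point, so all residuals are negative.
line_a = check_line(10.0, 0.0)
# "Line B": a flat line through the mean of Y, so the average residual
# is zero but the residuals rise with X.
line_b = check_line(mean(Y), 0.0)
# "Line C": the line the data were built around.
line_c = check_line(1.0, 1.0)

for name, (e_bar, cov_ex) in [("A", line_a), ("B", line_b), ("C", line_c)]:
    print(f"line {name}: average residual={e_bar:+.3f}, cov(e, X)={cov_ex:+.3f}")
```

Only the third line passes both checks (up to floating-point rounding), mirroring the A, B, C discussion.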


Finally, consider line-C

Line-C looks like a good-fitting line, and satisfies both
properties 1 and 2.

This very intuitive sense of a good-fitting regression line
is based on a method of moments approach.
It turns out that this approach leads (essentially) to the
most commonly used estimators in regression analysis.
These are generally referred to as the Ordinary Least
Squares (OLS) estimators; the OLS name comes from an
alternative approach to the one intuited here.

But, let's see what this method of moments approach
implies about the regression coefficient estimates.

3. Summary / Implications
This very intuitive discussion of fitting a good line to a
scatterplot relied on three aspects:
1. The assumed functional form of the relationship
between Y and X, i.e. it is linear;
2. The resulting residuals should have zero average; and
3. The residuals should be uncorrelated with the $X_i$'s.

To estimate the coefficients (a & b) of the good-fitting line,
we use these three points.

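Property 1: Zero average residual
$$\bar{e} = \frac{1}{N}\sum_{i=1}^{N} (Y_i - \hat{Y}_i) = 0 \quad\Rightarrow\quad \bar{Y} = \bar{\hat{Y}}$$
i.e. avg actual-Y = avg predicted-Y.
And, using the linear functional form assumption,
$$\frac{1}{N}\sum_{i=1}^{N} (Y_i - (a + bX_i)) = 0$$
which implies
$$\bar{Y} = a + b\bar{X},$$
and solving for $a$ gives:
$$a = \bar{Y} - b\bar{X}$$
i.e. the intercept = avg-Y - b*avg-X.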
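
One consequence worth seeing numerically: the intercept formula forces the average residual to zero whatever slope is used, which is why Property 1 alone cannot pin down the line. The sketch below uses made-up education and log-wage figures and three arbitrary slopes; all of these values are assumptions for illustration.

```python
from statistics import mean

# Hypothetical data: years of education and log hourly wages.
X = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0]
Y = [2.2, 2.5, 2.7, 2.9, 3.0, 3.3]

def intercept_for(b):
    """Intercept implied by Property 1 for a given slope b."""
    return mean(Y) - b * mean(X)

avg_residuals = []
for b in (0.0, 0.05, 0.2):  # arbitrary candidate slopes
    a = intercept_for(b)
    e = [y - (a + b * x) for x, y in zip(X, Y)]
    avg_residuals.append(mean(e))
    print(f"b={b:.2f}  a={a:.3f}  average residual={mean(e):.1e}")
```

Every slope gives an average residual of zero up to rounding, so a second condition (Property 2) is needed to determine $b$.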

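Property 2: zero correlation between residuals and Xs
$$\mathrm{cov}(e_i, X_i) = \frac{1}{N-1}\sum_{i=1}^{N} e_i X_i = 0$$
Substituting $e_i = Y_i - (a + bX_i)$ and $a = \bar{Y} - b\bar{X}$
(the normalising constant does not affect the zero condition):
$$\Rightarrow \sum_{i=1}^{N} \left(Y_i - (\bar{Y} - b\bar{X}) - bX_i\right) X_i = 0$$
$$\Rightarrow \sum_{i=1}^{N} (Y_i - \bar{Y}) X_i - b\sum_{i=1}^{N} (X_i - \bar{X}) X_i = 0$$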

Solving for $b$ gives
$$b = \frac{\sum_{i=1}^{N} (Y_i - \bar{Y}) X_i}{\sum_{i=1}^{N} (X_i - \bar{X}) X_i}.$$
Since
$$\sum_{i=1}^{N} (Y_i - \bar{Y})\bar{X} = 0 = \sum_{i=1}^{N} (X_i - \bar{X})\bar{X},$$
we can rewrite this to solve for $b$,
$$b = \frac{\frac{1}{N-1}\sum_{i=1}^{N} (Y_i - \bar{Y})(X_i - \bar{X})}{\frac{1}{N-1}\sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})},$$
which is simply
$$b = \frac{\mathrm{Cov}(X_i, Y_i)}{\mathrm{Var}(X_i)}$$
i.e. the slope parameter is the covariance between $X_i$ and $Y_i$
divided (i.e. normalised) by the variance of $X_i$.
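
The two formulas can be applied directly. The sketch below computes the slope as covariance over variance and the intercept as avg-Y minus b times avg-X, then confirms that the resulting residuals satisfy Properties 1 and 2. The education and log-wage figures are made up for illustration.

```python
from statistics import mean

# Hypothetical data: years of education and log hourly wages.
X = [8.0, 10.0, 12.0, 12.0, 14.0, 16.0]
Y = [2.2, 2.5, 2.7, 2.9, 3.0, 3.3]

n = len(X)
x_bar, y_bar = mean(X), mean(Y)

# Sample covariance and variance, both with the 1/(N-1) normalisation.
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in X) / (n - 1)

b = cov_xy / var_x       # slope = Cov(X, Y) / Var(X)
a = y_bar - b * x_bar    # intercept = avg-Y - b * avg-X

residuals = [y - (a + b * x) for x, y in zip(X, Y)]
avg_residual = mean(residuals)                             # Property 1
sum_ex = sum(e * x for e, x in zip(residuals, X))          # Property 2

print(f"slope b = {b:.4f}, intercept a = {a:.4f}")
print(f"average residual = {avg_residual:.1e}, sum of e*X = {sum_ex:.1e}")
```

Because the 1/(N-1) factors cancel in the ratio, the same slope results from using 1/N in both the covariance and the variance.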
