
Econometrics and Quantitative Analysis
Using Econometrics: A Practical Guide
A.H. Studenmund
6th Edition. Addison Wesley Longman

Instructor: Dr. Samir Safi
Associate Professor of Statistics
Fall 2011

© 2011 Pearson Addison-Wesley. All rights reserved.

Chapter 1
An Overview of Regression Analysis

Slides by Niels-Hugo Blunch
Washington and Lee University

What is Econometrics?
Econometrics literally means "economic measurement"
It is the quantitative measurement and analysis of actual economic and business phenomena, and so involves:
economic theory
statistics
math
observation/data collection

What is Econometrics? (cont.)
Three major uses of econometrics:
Describing economic reality
Testing hypotheses about economic theory
Forecasting future economic activity

So econometrics is all about questions: the researcher (YOU!) first asks questions and then uses econometrics to answer them

Example
Consider the general and purely theoretical relationship:
Q = f(P, Ps, Yd)                                  (1.1)

Econometrics allows this general and purely theoretical relationship to become explicit:
Q = 27.7 − 0.11P + 0.03Ps + 0.23Yd                (1.2)

What is Regression Analysis?
Economic theory can give us the direction of a change, e.g. the change in the demand for DVDs following a price decrease (or price increase)
But what if we want to know not just "how?" but also "how much?"
Then we need:
A sample of data
A way to estimate such a relationship
one of the most frequently used is regression analysis

What is Regression Analysis? (cont.)
Formally, regression analysis is a statistical technique that attempts to explain movements in one variable, the dependent variable, as a function of movements in a set of other variables, the independent (or explanatory) variables, through the quantification of a single equation

Example
Return to the example from before:
Q = f(P, Ps, Yd)                                  (1.1)

Here, Q is the dependent variable and P, Ps, Yd are the independent variables
Don't be deceived by the words "dependent" and "independent," however
A statistically significant regression result does not necessarily imply causality
We also need:
Economic theory
Common sense

Single-Equation Linear Models
The simplest example is:
Y = β0 + β1X                                      (1.3)

The β's are denoted "coefficients":
β0 is the constant or intercept term
β1 is the slope coefficient: the amount that Y will change when X increases by one unit; for a linear model, β1 is constant over the entire function

Figure 1.1
Graphical Representation of the
Coefficients of the Regression Line


Single-Equation Linear Models (cont.)

Application of linear regression techniques requires that the equation be linear, such as (1.3)

By contrast, the equation
Y = β0 + β1X²                                     (1.4)
is not linear

What to do? First define
Z = X²                                            (1.5)

Substituting into (1.4) yields:
Y = β0 + β1Z                                      (1.6)

This redefined equation is now linear (in the coefficients β0 and β1 and in the variables Y and Z)
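The substitution trick in (1.5) and (1.6) can be sketched in a few lines of numpy (the data below are hypothetical, not from the text): defining Z = X² turns a model that is nonlinear in X into one that is linear in Z, so ordinary least squares recovers the coefficients.

```python
import numpy as np

# Hypothetical data following Y = beta0 + beta1*X^2 exactly (no noise),
# to make the substitution Z = X^2 from (1.5) easy to see.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = 2.0 + 3.0 * X**2

Z = X**2                               # redefine the regressor, as in (1.5)
A = np.column_stack([np.ones_like(Z), Z])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)   # fit Y = beta0 + beta1*Z
print(beta)  # recovers [2.0, 3.0]
```

Because the model is linear in the coefficients once Z is defined, standard OLS machinery applies unchanged.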

Single-Equation Linear Models (cont.)

Is (1.3) a complete description of the origins of variation in Y?
No; there are at least four sources of variation in Y other than the variation in the included Xs:
Other potentially important explanatory variables may be missing (e.g., X2 and X3)
Measurement error
Incorrect functional form
Purely random and totally unpredictable occurrences

Inclusion of a stochastic error term (ε) effectively takes care of all these other sources of variation in Y that are NOT captured by X, so that (1.3) becomes:
Y = β0 + β1X + ε                                  (1.7)
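Equation (1.7) is easy to simulate; the sketch below (hypothetical coefficients, not from the text) generates Y as a deterministic part plus a mean-zero stochastic error.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 5.0, 2.0              # hypothetical "true" coefficients
X = np.linspace(0, 10, 200)
eps = rng.normal(0.0, 1.0, X.size)   # stochastic error term, mean zero
Y = beta0 + beta1 * X + eps          # equation (1.7)

# The deterministic component beta0 + beta1*X plays the role of E(Y|X);
# the observed Y scatters around it because of eps.
print(eps.mean())   # close to zero in a large sample
```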

Single-Equation Linear Models (cont.)
Two components in (1.7):
deterministic component (β0 + β1X)
stochastic/random component (ε)

Why "deterministic"?
Indicates the value of Y that is determined by a given value of X (which is assumed to be non-stochastic)
Alternatively, the deterministic component can be thought of as the expected value of Y given X, namely E(Y|X), i.e. the mean (or average) value of the Ys associated with a particular value of X
This is also denoted the conditional expectation (that is, the expectation of Y conditional on X)

Example: Aggregate Consumption Function
Aggregate consumption as a function of aggregate income may be lower (or higher) than it would otherwise have been due to:
consumer uncertainty, which is hard (impossible?) to measure, i.e. is an omitted variable
Observed consumption may be different from actual consumption due to measurement error
The true consumption function may be nonlinear but a linear one is estimated (see Figure 1.2 for a graphical illustration)
Human behavior always contains some element(s) of pure chance; unpredictable, i.e. random, events may increase or decrease consumption at any given time

Whenever one or more of these factors are at play, the observed Y will differ from the Y predicted from the deterministic part, β0 + β1X

Figure 1.2
Errors Caused by Using a Linear Functional
Form to Model a Nonlinear Relationship


Extending the Notation
Include reference to the number of observations
Single-equation linear case:
Yi = β0 + β1Xi + εi    (i = 1, 2, ..., N)         (1.10)

So there are really N equations, one for each observation
the coefficients, β0 and β1, are the same
the values of Y, X, and ε differ across observations

Extending the Notation (cont.)
The general case: multivariate regression
Yi = β0 + β1X1i + β2X2i + β3X3i + εi    (i = 1, 2, ..., N)    (1.11)
Each of the slope coefficients gives the impact of a one-unit increase in the corresponding X variable on Y, holding the other included independent variables constant (i.e., ceteris paribus)
As an (implicit) consequence of this, the impact of variables that are not included in the regression is not held constant (we return to this in Ch. 6)

Example: Wage Regression
Let wages (WAGE) depend on:
years of work experience (EXP)
years of education (EDU)
gender of the worker (GEND: 1 if male, 0 if female)

Substituting into equation (1.11) yields:
WAGEi = β0 + β1EXPi + β2EDUi + β3GENDi + εi       (1.12)

Indexing Conventions
Subscript "i" for data on individuals (so-called "cross-section" data)
Subscript "t" for time-series data (e.g., series of years, months, or days; daily exchange rates, for example)
Subscript "it" when we have both (for example, "panel" data)

The Estimated Regression Equation

The regression equation considered so far is the "true" but unknown theoretical regression equation
Instead of "true," we might think of this as the population regression vs. the sample/estimated regression
How do we obtain the empirical counterpart of the theoretical regression model (1.14)?
It has to be estimated
The empirical counterpart to (1.14) is:
Ŷi = β̂0 + β̂1Xi                                    (1.16)

The signs on top of the estimates are denoted "hat," so that we have "Y-hat," for example

The Estimated Regression Equation (cont.)
For each sample we get a different set of estimated regression coefficients
Ŷi is the estimated value of Yi (i.e. the dependent variable for observation i); similarly, it is the prediction of E(Yi|Xi) from the regression equation
The closer Ŷi is to the observed value of Yi, the better is the fit of the equation
Similarly, the smaller is the estimated error term, ei, often denoted the "residual," the better is the fit

The Estimated Regression Equation (cont.)
This can also be seen from the fact that
ei = Yi − Ŷi                                      (1.17)

Note the difference with the error term, εi, given as
εi = Yi − E(Yi|Xi)                                (1.18)

This all comes together in Figure 1.3

Figure 1.3
True and Estimated Regression Lines

Example: Using Regression to Explain Housing Prices
Houses are not homogenous products, like corn or gold, that have generally known market prices
So, how to appraise a house against a given asking price?
Yes, it's true: many real estate appraisers actually use regression analysis for this!
Consider a specific case: Suppose the asking price was $230,000

Example: Using Regression to Explain Housing Prices (cont.)
Is this fair / too much / too little?
Depends on the size of the house (higher size, higher price)
So, collect cross-sectional data on prices (in thousands of $) and sizes (in square feet) for, say, 43 houses
Then say this yields the following estimated regression line:
P̂RICEi = 40.0 + 0.138 SIZEi                       (1.23)
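As a quick check, the estimated line (1.23) can be evaluated in code; the function below simply encodes the two estimated coefficients from the text (price in thousands of dollars, size in square feet).

```python
# Predicted price from the estimated regression line (1.23):
# PRICE-hat = 40.0 + 0.138 * SIZE
def predicted_price(size_sqft: float) -> float:
    """Return the predicted price (in thousands of $) for a house of
    the given size, using the estimated coefficients from (1.23)."""
    return 40.0 + 0.138 * size_sqft

print(predicted_price(1600))  # 260.8, i.e. $260,800
```

This is the same calculation the text performs for the 1,600-square-foot house.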

Figure 1.5
A Cross-Sectional Model of Housing Prices

Example: Using Regression to Explain Housing Prices (cont.)
Note that the interpretation of the intercept term is problematic in this case (we'll get back to this later, in Section 7.1.2)
The literal interpretation of the intercept here is the price of a house with a size of zero square feet

Example: Using Regression to Explain Housing Prices (cont.)
How to use the estimated regression line / estimated regression coefficients to answer the question?
Just plug the particular size of the house you are interested in (here, 1,600 square feet) into (1.23)
Alternatively, read off the estimated price using Figure 1.5

Either way, we get an estimated price of $260.8 (thousand, remember!)
So, in terms of our original question, it's a good deal: go ahead and purchase!!
Note that we simplified a lot in this example by assuming that only size matters for housing prices

Table 1.1a
Data for and Results of the Weight-Guessing Equation

Table 1.1b
Data for and Results of the Weight-Guessing Equation

Figure 1.4
A Weight-Guessing Equation

Key Terms from Chapter 1
Regression analysis
Dependent variable
Independent (or explanatory) variable(s)
Causality
Stochastic error term
Linear
Intercept term
Slope coefficient
Multivariate regression model
Expected value
Residual
Time series
Cross-sectional data set

Chapter 2
Ordinary Least Squares

Estimating Single-Independent-Variable Models with OLS

Recall that the objective of regression analysis is to start from:
Yi = β0 + β1Xi + εi                               (2.1)
And, through the use of data, to get to:
Ŷi = β̂0 + β̂1Xi                                    (2.2)
Recall that equation (2.1) is purely theoretical, while equation (2.2) is its empirical counterpart
How to move from (2.1) to (2.2)?

Estimating Single-Independent-Variable Models with OLS (cont.)

One of the most widely used methods is Ordinary Least Squares (OLS)
OLS minimizes
Σ ei²    (i = 1, 2, ..., N)                       (2.3)

Or, the sum of the squared vertical distances between the observed Yi and the estimated regression line (these vertical distances are the residuals, i.e. the estimated error terms)
We also denote this term the Residual Sum of Squares (RSS)

Estimating Single-Independent-Variable Models with OLS (cont.)

Similarly, OLS minimizes Σ(Yi − Ŷi)²
Why use OLS?
Relatively easy to use
The goal of minimizing RSS is intuitively / theoretically appealing
This basically says we want the estimated regression equation to be as close as possible to the observed data
OLS estimates have a number of useful characteristics

Estimating Single-Independent-Variable Models with OLS (cont.)

OLS estimates have at least two useful characteristics:
The sum of the residuals is exactly zero
OLS can be shown to be the "best" estimator when certain specific conditions hold (we'll get back to this in Chapter 4)
Ordinary Least Squares (OLS) is an estimator
A given β̂ produced by OLS is an estimate

Estimating Single-Independent-Variable Models with OLS (cont.)

How does OLS work?
First recall from (2.3) that OLS minimizes the sum of the squared residuals
Next, it can be shown (see Exercise 12) that the coefficients that ensure this, for the case of just one independent variable, are:
β̂1 = Σ[(Xi − X̄)(Yi − Ȳ)] / Σ(Xi − X̄)²            (2.4)
β̂0 = Ȳ − β̂1X̄                                      (2.5)
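Equations (2.4) and (2.5) are easy to verify by hand with a small data set (the numbers below are illustrative, not from the book):

```python
import numpy as np

# Hand-computed OLS estimates from equations (2.4) and (2.5),
# using small hypothetical data.
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.0, 4.5, 6.0, 8.5])

xbar, ybar = X.mean(), Y.mean()
beta1_hat = np.sum((X - xbar) * (Y - ybar)) / np.sum((X - xbar) ** 2)  # (2.4)
beta0_hat = ybar - beta1_hat * xbar                                    # (2.5)
print(beta0_hat, beta1_hat)  # slope 2.1, intercept essentially 0
```

The same estimates come out of any regression package; the formulas just make the mechanics visible.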

Estimating Multivariate Regression Models with OLS
In the real world one explanatory variable is not enough
The general multivariate regression model with K independent variables is:
Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi    (i = 1, 2, ..., N)    (1.13)

The biggest difference with the single-explanatory-variable regression model is in the interpretation of the slope coefficients
Now a slope coefficient indicates the change in the dependent variable associated with a one-unit increase in the explanatory variable, holding the other explanatory variables constant

Estimating Multivariate Regression Models with OLS (cont.)
Omitted (and relevant!) variables are therefore not held constant
The intercept term, β0, is the value of Y when all the Xs and the error term equal zero
Nevertheless, the underlying principle of minimizing the summed squared residuals remains the same

Example: financial aid awards at a liberal arts college
Dependent variable:
FINAIDi: financial aid (measured in dollars of grant) awarded to the ith applicant

Example: financial aid awards at a liberal arts college (cont.)
Theoretical Model:
FINAIDi = f(PARENTi, HSRANKi)                     (2.9)
FINAIDi = β0 + β1PARENTi + β2HSRANKi + εi         (2.10)

where:
PARENTi: The amount (in dollars) that the parents of the ith student are judged able to contribute to college expenses
HSRANKi: The ith student's GPA rank in high school, measured as a percentage (i.e. between 0 and 100)

Example: financial aid awards at a liberal arts college (cont.)
Estimate the model using the data in Table 2.2 to get the estimated equation (2.11)
Interpretation of the slope coefficients?
Graphical interpretation in Figures 2.1 and 2.2

Figure 2.1
Financial Aid as a Function of Parents' Ability to Pay

Figure 2.2
Financial Aid as a Function of High School Rank

Total, Explained, and Residual Sums of Squares

TSS = Σ(Yi − Ȳ)²                                  (2.12)
Σ(Yi − Ȳ)² = Σ(Ŷi − Ȳ)² + Σ ei²                   (2.13)

TSS = ESS + RSS
This is usually called the "decomposition of variance"
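The decomposition TSS = ESS + RSS can be checked numerically; the sketch below (hypothetical data) fits an OLS line with an intercept and confirms the identity.

```python
import numpy as np

# Numerical check of TSS = ESS + RSS. The identity holds exactly for an
# OLS fit that includes an intercept. Data are hypothetical.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, 50)
Y = 3.0 + 1.5 * X + rng.normal(size=50)

A = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
Yhat = A @ beta

TSS = np.sum((Y - Y.mean()) ** 2)      # total sum of squares (2.12)
ESS = np.sum((Yhat - Y.mean()) ** 2)   # explained sum of squares
RSS = np.sum((Y - Yhat) ** 2)          # residual sum of squares
print(np.isclose(TSS, ESS + RSS))  # True
```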

Figure 2.3
Decomposition of the Variance in Y

Evaluating the Quality of a Regression Equation
Checkpoints here include the following:
1. Is the equation supported by sound theory?
2. How well does the estimated regression fit the data?
3. Is the data set reasonably large and accurate?
4. Is OLS the best estimator to be used for this equation?
5. How well do the estimated coefficients correspond to the expectations developed by the researcher before the data were collected?
6. Are all the obviously important variables included in the equation?
7. Has the most theoretically logical functional form been used?
8. Does the regression appear to be free of major econometric problems?
*These numbers roughly correspond to the relevant chapters in the book

Describing the Overall Fit of the Estimated Model
The simplest commonly used measure of overall fit is the coefficient of determination, R²:
R² = ESS/TSS = 1 − RSS/TSS                        (2.14)
Since OLS selects the coefficient estimates that minimize RSS, OLS provides the largest possible R² (within the class of linear models)
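R² from (2.14) takes only a few lines (hypothetical data, not from the text):

```python
import numpy as np

# Computing R^2 = 1 - RSS/TSS for a simple OLS fit; hypothetical data.
rng = np.random.default_rng(3)
X = rng.uniform(0, 10, 100)
Y = 2.0 + 0.5 * X + rng.normal(size=100)

A = np.column_stack([np.ones_like(X), X])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
RSS = np.sum((Y - A @ beta) ** 2)
TSS = np.sum((Y - Y.mean()) ** 2)
r2 = 1.0 - RSS / TSS
print(round(r2, 3))  # between 0 and 1; fairly high here, since the fit is good
```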

Figure 2.4
Illustration of Case Where R² = 0

Figure 2.5
Illustration of Case Where R² = .95

Figure 2.6
Illustration of Case Where R² = 1

The Simple Correlation Coefficient, r
This is a measure related to R²
r measures the strength and direction of the linear relationship between two variables:
r = +1: the two variables are perfectly positively correlated
r = −1: the two variables are perfectly negatively correlated
r = 0: the two variables are totally uncorrelated
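The three benchmark cases for r can be reproduced with numpy's corrcoef (the vectors below are illustrative, not from the text):

```python
import numpy as np

# The simple correlation coefficient r for the three benchmark cases.
x = np.array([1.0, 2.0, 3.0, 4.0])

r_pos = np.corrcoef(x, 2 * x + 1)[0, 1]    # exact linear, positive slope
r_neg = np.corrcoef(x, -3 * x + 5)[0, 1]   # exact linear, negative slope
r_zero = np.corrcoef(x, np.array([1.0, -1.0, -1.0, 1.0]))[0, 1]  # uncorrelated
print(r_pos, r_neg, r_zero)  # 1.0, -1.0, 0.0 (up to rounding)
```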

The adjusted coefficient of determination
A major problem with R² is that it can never decrease if another independent variable is added
An alternative to R² that addresses this issue is the adjusted R², or R̄²:
R̄² = 1 − [RSS/(N − K − 1)] / [TSS/(N − 1)]        (2.15)
where N − K − 1 = degrees of freedom
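Equation (2.15) in code (a hypothetical two-regressor fit); with at least one regressor and a nonzero residual, R̄² always sits below R²:

```python
import numpy as np

# Adjusted R^2, equation (2.15). Data and coefficients are hypothetical;
# K = 2 regressors, N = 60 observations.
rng = np.random.default_rng(4)
N, K = 60, 2
X1 = rng.normal(size=N)
X2 = rng.normal(size=N)
Y = 1.0 + X1 + 0.5 * X2 + rng.normal(size=N)

A = np.column_stack([np.ones(N), X1, X2])
beta, *_ = np.linalg.lstsq(A, Y, rcond=None)
RSS = np.sum((Y - A @ beta) ** 2)
TSS = np.sum((Y - Y.mean()) ** 2)

r2 = 1 - RSS / TSS
r2_adj = 1 - (RSS / (N - K - 1)) / (TSS / (N - 1))   # (2.15)
print(r2_adj < r2)  # True: the adjustment penalizes extra regressors
```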

The adjusted coefficient of determination (cont.)
So, R̄² measures the share of the variation of Y around its mean that is explained by the regression equation, adjusted for degrees of freedom
R̄² can be used to compare the fits of regressions with the same dependent variable and different numbers of independent variables
As a result, most researchers automatically use R̄² instead of R² when evaluating the fit of their estimated regression equations

Table 2.1a
The Calculation of Estimated Regression Coefficients for the Weight/Height Example

Table 2.1b
The Calculation of Estimated Regression Coefficients for the Weight/Height Example

Table 2.2a
Data for the Financial Aid Example

Table 2.2b
Data for the Financial Aid Example

Table 2.2c
Data for the Financial Aid Example

Table 2.2d
Data for the Financial Aid Example

Key Terms from Chapter 2
Ordinary Least Squares (OLS)
Interpretation of a multivariate regression coefficient
Total sum of squares
Explained sum of squares
Residual sum of squares
Coefficient of determination, R²
Simple correlation coefficient, r
Degrees of freedom
Adjusted coefficient of determination, R̄²

Chapter 3
Learning to Use Regression Analysis

Steps in Applied Regression Analysis
The first step is choosing the dependent variable; this step is determined by the purpose of the research (see Chapter 11 for details)
After choosing the dependent variable, it's logical to follow this sequence:
1. Review the literature and develop the theoretical model
2. Specify the model: Select the independent variables and the functional form
3. Hypothesize the expected signs of the coefficients
4. Collect the data. Inspect and clean the data
5. Estimate and evaluate the equation
6. Document the results

Step 1: Review the Literature and Develop the Theoretical Model
Perhaps counterintuitively, a strong theoretical foundation is the best start for any empirical project
Reason: the main econometric decisions are determined by the underlying theoretical model
Useful starting points:
Journal of Economic Literature or a business-oriented publication of abstracts
Internet search, including Google Scholar
EconLit, an electronic bibliography of economics literature (for more details, go to www.EconLit.org)

Step 2: Specify the Model: Independent Variables and Functional Form
After selecting the dependent variable, the specification of a model involves choosing the following components:
1. the independent variables and how they should be measured,
2. the functional (mathematical) form of the variables, and
3. the properties of the stochastic error term

Step 2: Specify the Model: Independent Variables and Functional Form (cont.)

A mistake in any of the three elements results in a specification error
For example, only theoretically relevant explanatory variables should be included
Even so, researchers frequently have to make choices, also denoted "imposing their priors"
Example:
when estimating a demand equation, theory informs us that prices of complements and substitutes of the good in question are important explanatory variables
But which complements, and which substitutes?

Step 3: Hypothesize the Expected Signs of the Coefficients
Once the variables are selected, it's important to hypothesize the expected signs of the regression coefficients
Example: demand equation for a final consumption good
First, state the demand equation as a general function:
(3.2)
The signs above the variables indicate the hypothesized sign of the respective regression coefficient in a linear model

Step 4: Collect the Data & Inspect and Clean the Data
A general rule regarding sample size is "the more observations the better"
as long as the observations are from the same general population!
The reason for this goes back to the notion of degrees of freedom (mentioned first in Section 2.4)
When there are more degrees of freedom:
Every positive error is likely to be balanced by a negative error (see Figure 3.2)
The estimated regression coefficients are estimated with a greater degree of precision

Figure 3.1
Mathematical Fit of a Line to Two Points

Figure 3.2
Statistical Fit of a Line to Three Points

Step 4: Collect the Data & Inspect and Clean the Data (cont.)
Inspecting the data: obtain a printout or plot (graph) of the data
Reason: to look for outliers
An outlier is an observation that lies outside the range of the rest of the observations
Examples:
Does a student have a 7.0 GPA on a 4.0 scale?
Is consumption negative?

Step 5: Estimate and Evaluate the Equation
Once steps 1 through 4 have been completed, the estimation part is quick
using Eviews or Stata to estimate an OLS regression takes less than a second!
The evaluation part is more tricky, however, involving answering the following questions:
How well did the equation fit the data?
Were the signs and magnitudes of the estimated coefficients as expected?
Afterwards one may add sensitivity analysis (see Section 6.4 for details)

Step 6: Document the Results
A standard format usually is used to present estimated regression results:
(3.3)
The number in parentheses under the estimated coefficient is the estimated standard error of the estimated coefficient, and the t-value is the one used to test the hypothesis that the true value of the coefficient is different from zero (more on this later!)

Case Study: Using Regression Analysis to Pick Restaurant Locations
Background:
You have been hired to determine the best location for the next Woody's restaurant (a moderately priced, 24-hour, family restaurant chain)
Objective:
How to decide location using the six basic steps of applied regression analysis, discussed earlier?

Step 1: Review the Literature and Develop the Theoretical Model
Background reading about the restaurant industry
Talking to various experts within the firm
All the chain's restaurants are identical and located in suburban, retail, or residential environments
So, lack of variation in potential explanatory variables to help determine location
Number of customers most important for the locational decision
Dependent variable: number of customers (measured by the number of checks or bills)

Step 2: Specify the Model: Independent Variables and Functional Form

More discussions with in-house experts reveal three major determinants of sales:
Number of people living near the location
General income level of the location
Number of direct competitors near the location

Step 2: Specify the Model: Independent Variables and Functional Form (cont.)
Based on this, the exact definitions of the independent variables you decide to include are:
N = Competition: the number of direct competitors within a two-mile radius of the Woody's location
P = Population: the number of people living within a three-mile radius of the location
I = Income: the average household income of the population measured in variable P

With no reason to suspect anything other than a linear functional form and a typical stochastic error term, that's what you decide to use

Step 3: Hypothesize the Expected Signs of the Coefficients

After talking some more with the in-house experts and thinking some more, you come up with the following:
(3.4)

Step 4: Collect the Data & Inspect and Clean the Data
You manage to obtain data on the dependent and independent variables for all 33 Woody's restaurants
Next, you inspect the data
The data quality is judged as excellent because:
Each manager measures each variable identically
All restaurants are included in the sample
All information is from the same year

The resulting data are as given in Tables 3.1 and 3.3 in the book (using Eviews and Stata, respectively)

Step 5: Estimate and Evaluate the Equation

You take the data set and enter it into the computer
You then run an OLS regression (after thinking the model over one last time!)
The resulting model is:
(3.5)
Estimated coefficients are as expected and the fit is reasonable
Values for N, P, and I for each potential new location are then obtained and plugged into (3.5) to predict Y

Step 6: Document the Results
The results summarized in Equation 3.5 meet our documentation requirements
Hence, you decide that there's no need to take this step any further

Table 3.1a
Data for the Woody's Restaurants Example (Using the Eviews Program)

Table 3.1b
Data for the Woody's Restaurants Example (Using the Eviews Program)

Table 3.1c
Data for the Woody's Restaurants Example (Using the Eviews Program)

Table 3.2a
Actual Computer Output (Using the Eviews Program)

Table 3.2b
Actual Computer Output (Using the Eviews Program)

Table 3.3
Data for the Woody's Restaurants Example (Using the Stata Program)

Table 3.3b
Data for the Woody's Restaurants Example (Using the Stata Program)

Table 3.4a
Actual Computer Output (Using the Stata Program)

Table 3.4b
Actual Computer Output (Using the Stata Program)

Key Terms from Chapter 3
The six steps in applied regression analysis
Dummy variable
Cross-sectional data set
Specification error
Degrees of freedom

Chapter 4
The Classical Model

The Classical Assumptions

The classical assumptions must be met in order for OLS estimators to be the best available
The seven classical assumptions are:
I. The regression model is linear, is correctly specified, and has an additive error term
II. The error term has a zero population mean
III. All explanatory variables are uncorrelated with the error term
IV. Observations of the error term are uncorrelated with each other (no serial correlation)
V. The error term has a constant variance (no heteroskedasticity)
VI. No explanatory variable is a perfect linear function of any other explanatory variable(s) (no perfect multicollinearity)
VII. The error term is normally distributed (this assumption is optional but usually is invoked)

I: linear, correctly specified, additive error term

Consider the following regression model:
Yi = β0 + β1X1i + β2X2i + ... + βKXKi + εi        (4.1)

This model:
is linear (in the coefficients)
has an additive error term
If we also assume that all the relevant explanatory variables are included in (4.1), then the model is also correctly specified

II: Error term has a zero


population mean
As was pointed out in Section 1.2, econometricians add a
stochastic (random) error term to regression equations
Reason: to account for variation in the dependent
variable that is not explained by the model
The specific value of the error term for each observation
is determined purely by chance
This can be illustrated by Figure 4.1


Figure 4.1
An Error Term Distribution with a Mean of Zero

III: All explanatory variables are


uncorrelated with the error term
If not, the OLS estimates would be likely to attribute to
the X some of the variation in Y that actually came from
the error term
For example, if the error term and X were positively
correlated, then the estimated coefficient would probably
be higher than it would otherwise have been (biased upward)
This assumption is violated most frequently when a
researcher omits an important independent variable from
an equation

IV: No serial correlation of error term
If a systematic correlation does exist between one observation of
the error term and another, then it will be more difficult for OLS to
get accurate estimates of the standard errors of the coefficients
This assumption is most likely to be violated in time-series
models:
An increase in the error term in one time period (a random shock,
for example) is likely to be followed by an increase in the next
period, also
Example: Hurricane Katrina
If, over all the observations of the sample, εt+1 is correlated with εt, then
the error term is said to be serially correlated (or auto-correlated),
and Assumption IV is violated
Violations of this assumption are considered in more detail in Chapter 9
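A violation of Assumption IV is easy to simulate. This hypothetical sketch (rho = 0.7 is an arbitrary choice, not from the text) builds AR(1) errors in which each period's shock carries over into the next, then measures the correlation between successive errors.

```python
# Hypothetical example: serially correlated (AR(1)) errors in a time series.
import numpy as np

rng = np.random.default_rng(1)
T, rho = 2000, 0.7
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho * e[t - 1] + rng.normal()     # 70% of last period's error persists

autocorr = np.corrcoef(e[:-1], e[1:])[0, 1]  # corr(e_t, e_t+1) across the sample
print(round(autocorr, 2))                    # near rho = 0.7, far from zero
```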


V: Constant variance / No
heteroskedasticity in error term
The error term must have a constant variance
That is, the variance of the error term cannot
change for each observation or range of
observations
If it does, there is heteroskedasticity present in the
error term
An example of this can be seen in Figure 4.2
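Heteroskedasticity of the kind shown in Figure 4.2 can be generated directly. In this hypothetical sketch (all numbers invented) the error's standard deviation is proportional to Z, so the variance among high-Z observations dwarfs that among low-Z ones.

```python
# Hypothetical example: an error term whose variance grows with Z.
import numpy as np

rng = np.random.default_rng(2)
Z = np.linspace(1, 10, 5000)
e = rng.normal(scale=Z)                    # std dev proportional to Z

var_low = e[Z < 3].var()                   # variance where Z is small
var_high = e[Z > 8].var()                  # variance where Z is large
print(var_low < var_high)                  # True: the variance is not constant
```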


Figure 4.2 An Error Term Whose Variance Increases as Z Increases (Heteroskedasticity)


VI: No perfect multicollinearity

Perfect collinearity between two independent variables
implies that:
they are really the same variable, or
one is a multiple of the other, and/or
that a constant has been added to one of the variables

Example:
Including both annual sales (in dollars) and the annual sales tax
paid in a regression at the level of an individual store, all in the
same city
Since the stores are all in the same city, there is no variation in the
percentage sales tax
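The sales/sales-tax example can be reproduced numerically. In this hypothetical sketch (the 8 percent tax rate is invented) the tax column is an exact multiple of the sales column, so X'X is singular and the OLS normal equations cannot be solved.

```python
# Hypothetical example: perfect multicollinearity makes X'X non-invertible.
import numpy as np

rng = np.random.default_rng(3)
sales = rng.uniform(100, 1000, size=50)
sales_tax = 0.08 * sales                   # exact multiple: one city, one tax rate

X = np.column_stack([np.ones(50), sales, sales_tax])
rank = np.linalg.matrix_rank(X.T @ X)
print(rank)                                # 2, not 3: (X'X) cannot be inverted
```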

VII: The error term is normally
distributed
Basically implies that the error term follows a
bell shape (see Figure 4.3)
Strictly speaking not required for OLS estimation
(related to the Gauss-Markov Theorem: more on
this in Section 4.3)
Its major application is in hypothesis testing,
which uses the estimated regression coefficient to
investigate hypotheses about economic behavior
(see Chapter 5)

Figure 4.3
Normal Distributions


The Sampling
Distribution of β̂
We saw earlier that the error term follows a
probability distribution (Classical Assumption VII)
But so do the estimates of β!
The probability distribution of these values across
different samples is called the sampling distribution
of β̂
We will now look at the properties of the mean, the
variance, and the standard error of this sampling
distribution

Properties of the Mean

A desirable property of a distribution of estimates is that its mean
equals the true mean of the variables being estimated
Formally, an estimator β̂ is an unbiased estimator if its sampling
distribution has as its expected value the true value of β.
We also write this as follows:
E(β̂) = β   (4.9)
Similarly, if this is not the case, we say that the estimator is
biased
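Unbiasedness in the sense of Equation 4.9 can be checked by simulation. This hypothetical sketch (true slope 1.5 and sample size 30 are invented) draws many independent samples and averages the OLS estimates.

```python
# Hypothetical example: the average of many OLS estimates recovers the true beta.
import numpy as np

rng = np.random.default_rng(5)

def one_estimate(n=30, beta=1.5):
    """One OLS slope estimate (no constant) from a fresh sample of size n."""
    x = rng.normal(size=n)
    y = beta * x + rng.normal(size=n)
    return (x @ y) / (x @ x)

mean_est = np.mean([one_estimate() for _ in range(5000)])
print(round(mean_est, 2))                  # close to 1.5: the estimator is unbiased
```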


Properties of the Variance

Just as we wanted the mean of the sampling distribution to be
centered around the true population β, so too it is desirable for
the sampling distribution to be as narrow (or precise) as possible.
Centering around the truth but with high variability might be of very
little use.

One way of narrowing the sampling distribution is to increase the
sample size (which therefore also increases the degrees of
freedom)
These points are illustrated in Figures 4.4 and 4.5
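The narrowing of the sampling distribution in Figure 4.5 can be reproduced by Monte Carlo. In this hypothetical sketch (true slope 2.0 and the sample sizes 10 and 250 are arbitrary choices) the spread of the estimates shrinks as N grows.

```python
# Hypothetical example: larger samples give a tighter sampling distribution.
import numpy as np

rng = np.random.default_rng(4)

def beta_hat(n):
    """One OLS slope estimate (no constant) from a sample of size n; true slope 2."""
    x = rng.normal(size=n)
    y = 2.0 * x + rng.normal(size=n)
    return (x @ y) / (x @ x)

small_n_sd = np.std([beta_hat(10) for _ in range(2000)])
large_n_sd = np.std([beta_hat(250) for _ in range(2000)])
print(small_n_sd > large_n_sd)             # True: more observations, less spread
```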


Figure 4.4 Distributions of β̂


Figure 4.5 Sampling Distribution of β̂ for Various Observations (N)


Properties of the
Standard Error
The standard error of the estimated coefficient, SE(β̂),
is the square root of the estimated variance of the
estimated coefficients.
Hence, it is similarly affected by the sample size and
the other factors discussed previously
For example, an increase in the sample size will decrease the
standard error
Similarly, the larger the sample, the more precise the
coefficient estimates will be


The Gauss-Markov Theorem and
the Properties of OLS Estimators
The Gauss-Markov Theorem states that:
Given Classical Assumptions I through VI (Assumption VII,
normality, is not needed for this theorem), the Ordinary Least
Squares estimator of βk is the minimum variance estimator
from among the set of all linear unbiased estimators of βk,
for k = 0, 1, 2, ..., K

We also say that OLS is BLUE: Best (meaning
minimum variance) Linear Unbiased Estimator


The Gauss-Markov Theorem and the
Properties of OLS Estimators (cont.)

The Gauss-Markov Theorem only requires the first six classical
assumptions

If we add the seventh assumption, normality, the OLS coefficient
estimators can be shown to have the following properties:
Unbiased: the OLS estimated coefficients are centered around the true
population values
Minimum variance: no other unbiased estimator has a lower variance for
each estimated coefficient than OLS
Consistent: as the sample size gets larger, the variance gets smaller, and
each estimate approaches the true value of the coefficient being estimated
Normally distributed: when the error term is normally distributed, so are
the estimated coefficients, which enables various statistical tests requiring
normality to be applied (we'll get back to this in Chapter 5)


Table 4.1a
Notation Conventions


Table 4.1b
Notation Conventions


Key Terms from Chapter 4

The classical assumptions
Classical error term
Standard normal distribution
SE(β̂)
Unbiased estimator
BLUE
Sampling distribution

Chapter 5

Hypothesis Testing


What Is Hypothesis Testing?

Hypothesis testing is used in a variety of settings
The Food and Drug Administration (FDA), for example, tests new
products before allowing their sale
If the sample of people exposed to the new product shows some side effect
significantly more frequently than would be expected to occur by chance,
the FDA is likely to withhold approval of marketing that product

Similarly, economists have been statistically testing various
relationships, for example that between consumption and income

Note here that while we cannot prove a given hypothesis (for
example the existence of a given relationship), we often can reject a
given hypothesis (again, for example, rejecting the existence of a
given relationship)


Classical Null and Alternative
Hypotheses

The researcher first states the hypotheses to be tested

Here, we distinguish between the null and the alternative
hypothesis:
Null hypothesis (H0): the outcome that the researcher does not
expect (almost always includes an equality sign)
Alternative hypothesis (HA): the outcome the researcher does
expect

Example:
H0: β ≤ 0 (the values you do not expect)
HA: β > 0 (the values you do expect)


Type I and Type II Errors

Two types of errors possible in hypothesis testing:
Type I: Rejecting a true null hypothesis
Type II: Not rejecting a false null hypothesis

Example: Suppose we have the following null and alternative
hypotheses:
H0: β ≤ 0
HA: β > 0
Even if the true β really is not positive, in any one sample we might
still observe an estimate of β that is sufficiently positive to lead to
the rejection of the null hypothesis

This can be illustrated by Figure 5.1


Figure 5.1 Rejecting a True Null Hypothesis Is a Type I Error


Type I and Type II Errors (cont.)

Alternatively, it's possible to obtain an estimate of β that
is close enough to zero (or negative) to be considered
not significantly positive
Such a result may lead the researcher to accept the
null hypothesis that β ≤ 0 when in truth β > 0
This is a Type II Error; we have failed to reject a false
null hypothesis!
This can be illustrated by Figure 5.2


Figure 5.2 Failure to Reject a False Null Hypothesis Is a Type II Error

2011 Pearson Addison-Wesley. All rights reserved.

1-

Decision Rules of
Hypothesis Testing

To test a hypothesis, we calculate a sample statistic that determines
when the null hypothesis can be rejected depending on the magnitude
of that sample statistic relative to a preselected critical value (which is
found in a statistical table)

This procedure is referred to as a decision rule

The decision rule is formulated before regression estimates are
obtained

The range of possible values of the estimates is divided into two
regions, an acceptance (really, non-rejection) region and a rejection
region

The critical value effectively separates the acceptance/non-rejection
region from the rejection region when testing a null hypothesis

Graphs of these acceptance and rejection regions are given in
Figures 5.3 and 5.4


Figure 5.3 Acceptance and Rejection Regions for a One-Sided Test of β̂


Figure 5.4 Acceptance and Rejection Regions for a Two-Sided Test of β̂


The t-Test
The t-test is the test that econometricians usually use to test
hypotheses about individual regression slope coefficients
Tests of more than one coefficient at a time (joint hypotheses)
are typically done with the F-test, presented in Section 5.6
The t-test is the appropriate test to use when the stochastic error term is
normally distributed and when the variance of that distribution
must be estimated
Since these usually are the case, the use of the t-test for
hypothesis testing has become standard practice in
econometrics


The t-Statistic
For a typical multiple regression equation:
Yi = β0 + β1X1i + β2X2i + εi   (5.1)
we can calculate t-values for each of the estimated
coefficients
Usually these are only calculated for the slope coefficients, though
(see Section 7.1)

Specifically, the t-statistic for the kth coefficient is:
tk = (β̂k - βH0) / SE(β̂k)   (5.2)
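Equation 5.2 is simple enough to compute by hand. A minimal sketch (the numbers are invented for illustration): the t-statistic divides the distance between the estimate and the border value of the null hypothesis by the coefficient's standard error.

```python
# Hypothetical example of Equation 5.2: t_k = (beta_hat_k - beta_H0) / SE(beta_hat_k).
def t_statistic(beta_hat, se, beta_h0=0.0):
    """t-score for testing H0 at the border value beta_h0 (usually zero)."""
    return (beta_hat - beta_h0) / se

print(t_statistic(0.042, 0.020))           # about 2.1: estimate is 2.1 SEs above zero
```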


The Critical t-Value and the
t-Test Decision Rule
To decide whether to reject or not to reject a null hypothesis based
on a calculated t-value, we use a critical t-value
A critical t-value is the value that distinguishes the acceptance
region from the rejection region
The critical t-value, tc, is selected from a t-table (see Statistical
Table B-1 in the back of the book) depending on:
whether the test is one-sided or two-sided,
the level of Type I Error specified, and
the degrees of freedom (defined as the number of observations
minus the number of coefficients estimated (including the constant),
or N - K - 1)


The Critical t-Value and the
t-Test Decision Rule (cont.)
The rule to apply when testing a single
regression coefficient ends up being that you
should:
Reject H0 if |tk| > tc and if tk also has the
sign implied by HA
Do not reject H0 otherwise


The Critical t-Value and the t-Test
Decision Rule (cont.)
Note that this decision rule works both for
calculated t-values and critical t-values for
one-sided hypotheses around zero (or another
hypothesized value, S):

H0: βk ≤ 0, HA: βk > 0
H0: βk ≤ S, HA: βk > S
H0: βk ≥ 0, HA: βk < 0
H0: βk ≥ S, HA: βk < S


The Critical t-Value and the t-Test
Decision Rule (cont.)
As well as for two-sided hypotheses around zero
(or another hypothesized value, S):

H0: βk = 0, HA: βk ≠ 0
H0: βk = S, HA: βk ≠ S

From Statistical Table B-1, the critical t-value
for a one-tailed test at a given level of
significance is exactly equal to the critical
t-value for a two-tailed test at twice the level
of significance of the one-tailed test, as also
illustrated by Figure 5.5

Figure 5.5 One-Sided and


Two-Sided t-Tests


Choosing a Level of
Significance
The level of significance must be chosen before a critical
value can be found, using Statistical Table B
The level of significance indicates the probability of
observing an estimated t-value greater than the critical
t-value if the null hypothesis were correct
It also measures the amount of Type I Error implied by a
particular critical t-value
Which level of significance is chosen?
5 percent is recommended, unless you know something
unusual about the relative costs of making Type I and
Type II Errors

Confidence Intervals
A confidence interval is a range that contains the true value of an
item a specified percentage of the time
It is calculated using the estimated regression coefficient, the
two-sided critical t-value, and the standard error of the estimated
coefficient as follows:
Confidence interval = β̂k ± tc · SE(β̂k)   (5.5)
What's the relationship between confidence intervals and two-sided hypothesis testing?
If a hypothesized value falls within the confidence interval, then we
cannot reject the null hypothesis
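Equation 5.5 translates directly into code. A minimal sketch, with an invented estimate and standard error, but using the critical value 2.045 that appears in the text (5 percent, two-sided, 29 degrees of freedom):

```python
# Hypothetical example of Equation 5.5: beta_hat +/- t_c * SE(beta_hat).
def confidence_interval(beta_hat, se, t_c):
    return (beta_hat - t_c * se, beta_hat + t_c * se)

lower, upper = confidence_interval(0.35, 0.07, 2.045)
# Zero lies outside the interval, so H0: beta = 0 would be rejected:
print(lower <= 0.0 <= upper)               # False
```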

p-Values

This is an alternative to the t-test

A p-value, or marginal significance level, is the probability of observing
a t-score that size or larger (in absolute value) if the null hypothesis
were true
Graphically, it's two times the area under the curve of the t-distribution
between the absolute value of the actual t-score and infinity.
In theory, we could find this by combing through pages and
pages of statistical tables
But we don't have to, since we have EViews and Stata: these
(and other) statistical software packages automatically give the
p-values as part of the standard output!
In light of all this, the p-value decision rule therefore is:
Reject H0 if p-valueK < the level of significance and if β̂K has the sign
implied by HA
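The p-value the software reports can be recomputed by hand. A minimal sketch using SciPy's t-distribution; the t-score 2.37 and the 29 degrees of freedom echo the Woody's example later in the chapter, and reusing them here is an assumption for illustration:

```python
# Hypothetical example: two-sided p-value = 2 * P(T > |t|) for the t-distribution.
from scipy import stats

t_score, df = 2.37, 29
p_value = 2 * (1 - stats.t.cdf(abs(t_score), df))
print(p_value < 0.05)                      # True: reject H0 at the 5% level
```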


Examples of t-Tests:
One-Sided
The most common use of the one-sided t-test is to determine whether
a regression coefficient is significantly different from zero (in the
direction predicted by theory!)
This involves four steps:
1. Set up the null and alternative hypotheses
2. Choose a level of significance and therefore a critical t-value
3. Run the regression and obtain an estimated t-value (or t-score)
4. Apply the decision rule by comparing the calculated t-value with the
critical t-value in order to reject or not reject the null hypothesis
Let's look at each step in more detail for a specific example:


Examples of t-Tests:
One-Sided (cont.)
Consider the following simple model of the aggregate retail sales
of new cars:
Yt = β0 + β1X1t + β2X2t + β3X3t + εt   (5.6)
Where:
Y = sales of new cars
X1 = real disposable income
X2 = average retail price of a new car adjusted by the consumer
price index
X3 = number of sport utility vehicles sold
The four steps for this example then are as follows:


Step 1: Set up the null and
alternative hypotheses
From Equation 5.6, the one-sided hypotheses are set up
as:

1. H0: β1 ≤ 0
   HA: β1 > 0

2. H0: β2 ≥ 0
   HA: β2 < 0

3. H0: β3 ≥ 0
   HA: β3 < 0

Remember that a t-test typically is not run on the
estimate of the constant term β0


Step 2: Choose a level of significance
and therefore a critical t-value
Assume that you have considered the various costs
involved in making Type I and Type II Errors and have
chosen 5 percent as the level of significance
There are 10 observations in the data set, and so
there are 10 - 3 - 1 = 6 degrees of freedom
At a 5-percent level of significance, the critical
t-value, tc, can be found in Statistical Table B-1
to be 1.943


Step 3: Run the regression and
obtain an estimated t-value
Use the data (annual from 2000 to 2009) to run
the regression on your OLS computer package
Again, most statistical software packages
automatically report the t-values
Assume that in this case the t-values were 2.1,
5.6, and 0.1 for β1, β2, and β3, respectively


Step 4: Apply the t-test
decision rule
As stated in Section 5.2, the decision rule for the t-test is to:
Reject H0 if |tk| > tc and if tk also has the sign implied by HA
In this example, this amounts to the following three
conditions:
For β1: Reject H0 if |2.1| > 1.943 and if 2.1 is positive.
For β2: Reject H0 if |5.6| > 1.943 and if 5.6 is negative.
For β3: Reject H0 if |0.1| > 1.943 and if 0.1 is negative.
Figure 5.6 illustrates all three of these outcomes
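The decision rule in Step 4 can be written as a small function. This sketch is hypothetical (the function name and the sample t-scores are invented for illustration); tc = 1.943 comes from Step 2:

```python
# Hypothetical example: one-sided t-test decision rule.
def one_sided_reject(t_k, t_c, expected_sign):
    """Reject H0 iff |t_k| > t_c and t_k has the sign implied by HA."""
    return abs(t_k) > t_c and (t_k > 0) == (expected_sign > 0)

t_c = 1.943                                # 5% one-sided, 6 degrees of freedom
print(one_sided_reject(2.1, t_c, +1))      # True: significant, expected sign
print(one_sided_reject(2.1, t_c, -1))      # False: significant but wrong sign
print(one_sided_reject(0.1, t_c, -1))      # False: not significant
```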

Figure 5.6a One-Sided t-Tests of the Coefficients of the New Car Sales Model


Figure 5.6b One-Sided t-Tests of the Coefficients of the New Car Sales Model


Examples of t-Tests:
Two-Sided
The two-sided test is used when the hypotheses should be
rejected if estimated coefficients are significantly different from
zero, or a specific nonzero value, in either direction
So, there are two cases:
1. Two-sided tests of whether an estimated coefficient is
significantly different from zero, and
2. Two-sided tests of whether an estimated coefficient is
significantly different from a specific nonzero value

Let's take an example to illustrate the first of these (the
second case is merely a generalized case of this, see the
textbook for details), using the Woody's restaurant
example in Chapter 3:

Examples of t-Tests:
Two-Sided (cont.)
Again, in the Woody's restaurant equation of Section 3.2, the
impact of the average income of an area on the expected number
of Woody's customers in that area is ambiguous:
A high-income neighborhood might have more total customers
going out to dinner (positive sign), but those customers might
decide to eat at a more formal restaurant than Woody's (negative
sign)
The appropriate (two-sided) t-test therefore is:

Figure 5.7 Two-Sided t-Test of the Coefficient of Income in the Woody's Model


Examples of t-Tests:
Two-Sided (cont.)

The four steps are the same as in the one-sided case:

1. Set up the null and alternative hypotheses
H0: βk = 0
HA: βk ≠ 0
2. Choose a level of significance and therefore a critical t-value
Keep the level of significance at 5 percent, but this now must be
distributed between two rejection regions for 29 degrees of freedom;
hence the correct critical t-value is 2.045 (found in Statistical Table B-1
for 29 degrees of freedom and a 5-percent, two-sided test)
3. Run the regression and obtain an estimated t-value:
The t-value remains at 2.37 (from Equation 5.4)
4. Apply the decision rule:
For the two-sided case, this simplifies to:
Reject H0 if |2.37| > 2.045; so, reject H0


Limitations of the t-Test

With the t-values being automatically printed out by computer
regression packages, there is reason to caution against
potential improper use of the t-test:
1. The t-Test Does Not Test Theoretical Validity:
If you regress the consumer price index on rainfall in
a time-series regression and find strong statistical
significance, does that also mean that the
underlying theory is valid? Of course not!


Limitations of the t-Test (cont.)

2. The t-Test Does Not Test Importance:
The fact that one coefficient is more statistically
significant than another does not mean that it is
also more important in explaining the dependent
variable, but merely that we have more evidence
of the sign of the coefficient in question

3. The t-Test Is Not Intended for Tests of the Entire
Population:
From the definition of the t-score, given by Equation
5.2, it is seen that as the sample size approaches
the population (whereby the standard error will
approach zero, since the standard error decreases
as N increases), the t-score will approach infinity!


The F-Test of Overall Significance

We can test for the predictive power of the entire model using
the F-statistic
Generally these compare two sources of variation:
F = V1/V2, and F has two df parameters
Here V1 = ESS/K has K df
And V2 = RSS/(n - K - 1) has n - K - 1 df


F Tables
Usually you will see several pages of these: one or two pages at
each specific level of significance (.10, .05, .01).
Each table is indexed by numerator d.f. (columns) and denominator
d.f. (rows); each entry is the value of F at that specific
significance level.

F-Test Hypotheses
H0: β1 = β2 = ... = βK = 0 (None of the Xs help explain Y)
HA: Not all βs are 0 (At least one X is useful)
H0: R2 = 0 is an equivalent hypothesis

Reject H0 if F ≥ Fc
Do not reject H0 if F < Fc

The critical F-value, Fc, is determined from Statistical Tables
B-2 or B-3 depending on the level of significance, and on the
degrees of freedom, df1 = K (K, the number of independent
variables) and df2 = n - K - 1

Example: The Woody's restaurant

Since there are 3 independent variables, the null and alternative
hypotheses are:
H0: βN = βP = βI = 0
HA: Not all βs are 0
From EViews output, F = 15.65; Fc(0.05; 3, 29) = 2.93
Fc is well below the calculated F-value of 15.65, so we can reject
the null hypothesis and conclude that the Woody's equation does
indeed have a significant overall fit.
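The F-statistic itself is a one-line formula. A minimal sketch; the ESS and RSS values below are invented, chosen only so that the ratio reproduces the F = 15.65 reported for the Woody's equation (n = 33 observations, K = 3 regressors):

```python
# Hypothetical example: F = (ESS / K) / (RSS / (n - K - 1)).
def f_statistic(ess, rss, n, k):
    return (ess / k) / (rss / (n - k - 1))

n, k = 33, 3
ess, rss = 46.95, 29.0                        # invented values with the right ratio
print(round(f_statistic(ess, rss, n, k), 2))  # 15.65, well above Fc = 2.93
```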

Key Terms from Chapter 5

Null hypothesis
Alternative hypothesis
Type I Error
Level of significance
Two-sided test
Decision rule
Critical value
t-statistic
Confidence interval
p-value


Chapter 6

Model Specification: Choosing the Independent Variables


Specifying an Econometric
Equation and Specification Error
Before any equation can be estimated, it must be completely
specified
Specifying an econometric equation consists of three parts,
namely choosing the correct:
independent variables
functional form
form of the stochastic error term
Again, this is part of the first classical assumption from Chapter 4
A specification error results when one of these choices is made
incorrectly
This chapter will deal with the first of these choices (the two other
choices will be discussed in subsequent chapters)


Omitted Variables
Two reasons why an important explanatory variable
might have been left out:
we forgot
it is not available in the dataset we are examining

Either way, this may lead to omitted variable bias
(or, more generally, specification bias)
The reason for this is that when a variable is not
included, it cannot be held constant
Omitting a relevant variable usually is evidence that the
entire equation is suspect, because of the likely bias of
the coefficients.

The Consequences of an
Omitted Variable

Suppose the true regression model is:
Yi = β0 + β1X1i + β2X2i + εi   (6.1)
Where εi is a classical error term

If X2 is omitted, the equation becomes instead:
Yi = β0 + β1X1i + εi*   (6.2)
Where:
εi* = εi + β2X2i   (6.3)

Hence, the explanatory variables in the estimated regression (6.2) are not
independent of the error term (unless the omitted variable is uncorrelated
with all the included variables, something which is very unlikely)

But this violates Classical Assumption III!


The Consequences of an Omitted
Variable (cont.)
What happens if we estimate Equation 6.2 when Equation 6.1 is the truth?
We get bias!
What this means is that:
E(β̂1) ≠ β1   (6.4)
The amount of bias is a function of the impact of the omitted variable on the
dependent variable times a function of the correlation between the included
and the omitted variable
Or, more formally:
Expected bias in β̂1 = β2 · α1   (6.7)
where α1 is the slope of a regression of the omitted variable on the included one
So, the bias exists unless:
1. the true coefficient equals zero, or
2. the included and omitted variables are uncorrelated
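The direction and size of the bias can be verified by simulation. In this hypothetical sketch (all numbers invented) the true model has beta1 = 1.0 and beta2 = 2.0, the omitted X2 loads on the included X1 with slope alpha1 = 0.5, and the short regression's slope converges to beta1 + beta2 * alpha1 = 2.0:

```python
# Hypothetical example: omitted variable bias equals beta2 * alpha1.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
X1 = rng.normal(size=n)
X2 = 0.5 * X1 + rng.normal(size=n)             # alpha1 = 0.5
Y = 1.0 * X1 + 2.0 * X2 + rng.normal(size=n)   # beta1 = 1.0, beta2 = 2.0

def slope(x, y):
    """OLS slope of y on x (with intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

beta1_short = slope(X1, Y)                 # regression that omits X2
print(round(beta1_short, 2))               # near 1.0 + 2.0 * 0.5 = 2.0, not 1.0
```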


Correcting for an Omitted
Variable

In theory, the solution to a problem of specification bias seems easy:
add the omitted variable to the equation!

Unfortunately, that's easier said than done, for a couple of reasons:
1. Omitted variable bias is hard to detect: the amount of bias introduced can
be small and not immediately detectable
2. Even if it has been decided that a given equation is suffering from omitted
variable bias, how to decide exactly which variable to include?

Note here that dropping a variable is not a viable strategy to help cure
omitted variable bias:
If anything, you'll just generate even more omitted variable bias on the
remaining coefficients!

Correcting for an Omitted
Variable (cont.)

What if:
You have an unexpected result, which leads you to believe that you have
an omitted variable
You have two or more theoretically sound explanatory variables as
potential candidates for inclusion as the omitted variable in the equation

How do you choose between these variables?

One possibility is to use expected bias analysis
Expected bias: the likely bias that omitting a particular variable would have
caused in the estimated coefficient of one of the included variables


Correcting for an Omitted
Variable (cont.)
Expected bias can be estimated with Equation 6.7:
Expected bias = β2 · α1   (6.7)
When do we have a viable candidate?
When the sign of the expected bias is the same as the sign
of the unexpected result

Similarly, when these signs differ, the variable is
extremely unlikely to have caused the unexpected
result


Irrelevant Variables

This refers to the case of including a variable in an equation when it
does not belong there

This is the opposite of the omitted variables case, and so the impact
can be illustrated using the same model

Assume that the true regression specification is:
Yi = β0 + β1X1i + εi   (6.10)

But the researcher for some reason includes an extra variable:
Yi = β0 + β1X1i + β2X2i + εi**   (6.11)

The misspecified equation's error term then becomes:
εi** = εi - β2X2i   (6.12)


Irrelevant Variables (cont.)
So, the inclusion of an irrelevant variable will not cause bias
(since the true coefficient of the irrelevant variable is zero, and so
the second term will drop out of Equation 6.12)
However, the inclusion of an irrelevant variable will:
Increase the variance of the estimated coefficients, and this
increased variance will tend to decrease the absolute
magnitude of their t-scores
Decrease the adjusted R2 (but not the R2)
Table 6.1 summarizes the consequences of the omitted variable
and the included irrelevant variable cases (unless r12 = 0)

Table 6.1 Effect of Omitted Variables and Irrelevant Variables on the Coefficient Estimates


Four Important Specification
Criteria
We can summarize the previous discussion into four criteria to help
decide whether a given variable belongs in the equation:
1. Theory: Is the variable's place in the equation unambiguous and theoretically
sound?
2. t-Test: Is the variable's estimated coefficient significant in the expected direction?
3. Adjusted R2: Does the overall fit of the equation (adjusted for degrees of freedom)
improve when the variable is added to the equation?
4. Bias: Do other variables' coefficients change significantly when the variable is
added to the equation?

If all these conditions hold, the variable belongs in the equation
If none of them hold, it does not belong
The tricky part is the intermediate cases: use sound judgment!


Specification Searches
Almost any result can be obtained from a given
dataset, by simply specifying different regressions until
estimates with the desired properties are obtained
Hence, the integrity of all empirical work is open to
question
To counter this, the following three points of Best
Practices in Specification Searches are suggested:
1. Rely on theory rather than statistical fit as much as possible when
choosing variables, functional forms, and the like
2. Minimize the number of equations estimated (except for
sensitivity analysis, to be discussed later in this section)
3. Reveal, in a footnote or appendix, all alternative
specifications estimated


Sequential Specification
Searches

The sequential specification search technique allows a researcher to:
Estimate an undisclosed number of regressions
Subsequently present a final choice (which is based upon an unspecified
set of expectations about the signs and significance of the coefficients) as if
it were the only specification estimated

Such a method misstates the statistical validity of the regression
results for two reasons:
1. The statistical significance of the results is overestimated because the
estimations of the previous regressions are ignored
2. The expectations used by the researcher to choose between various
regression results rarely, if ever, are disclosed


Bias Caused by Relying on the
t-Test to Choose Variables

Dropping variables solely based on low t-statistics may lead to two
different types of errors:
1. An irrelevant explanatory variable may sometimes be included in the
equation (i.e., when it does not belong there)
2. A relevant explanatory variable may sometimes be dropped from the
equation (i.e., when it does belong)

In the first case, there is no bias, but in the second case there is bias
Hence, the estimated coefficients will be biased every time an excluded
variable belongs in the equation, and that excluded variable will be left out
every time its estimated coefficient is not statistically significantly different
from zero
So, we will have systematic bias in our equation!


Sensitivity Analysis

Contrary to the advice of estimating as few equations as possible (and basing them on theory, rather than fit!), sometimes we see journal article authors listing results from five or more specifications
What's going on here?
In almost every case, these authors have employed a technique called sensitivity analysis
This essentially consists of purposely running a number of alternative specifications to determine whether particular results are robust (not statistical flukes) to a change in specification
Why is this useful? Because the true specification isn't known!
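A sensitivity analysis can be sketched in a few lines. The demand-style data below are simulated (all coefficient values and variable names are hypothetical): the same price coefficient is estimated under several specifications, and robustness means it keeps roughly the same sign and size in each.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
price = rng.normal(10, 2, n)
income = rng.normal(50, 5, n)
ads = rng.normal(3, 1, n)
# hypothetical "true" demand relationship used only to simulate the data
qty = 100 - 4.0 * price + 0.8 * income + 0.5 * ads + rng.normal(0, 2, n)

const = np.ones(n)
specs = {
    "price only":           np.column_stack([const, price]),
    "price + income":       np.column_stack([const, price, income]),
    "price + income + ads": np.column_stack([const, price, income, ads]),
}
price_coefs = {}
for name, X in specs.items():
    b = np.linalg.lstsq(X, qty, rcond=None)[0]
    price_coefs[name] = b[1]
    print(f"{name:22s} price coefficient: {b[1]: .2f}")
# robustness: the price coefficient stays near -4 under every specification
```

If the key coefficient flipped sign or changed size dramatically across specifications, it would look like a statistical fluke rather than a robust result.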


Data Mining
Data mining involves exploring a data set to try to uncover empirical regularities that can inform economic theory
That is, the role of data mining is opposite that of traditional econometrics, which instead tests the economic theory on a data set
Be careful, however!
A hypothesis developed using data mining techniques must be tested on a different data set (or in a different context) than the one used to develop the hypothesis
Not doing so would be highly unethical: after all, the researcher already knows ahead of time what the results will be!


Key Terms from Chapter 6


Omitted variable
Irrelevant variable
Specification bias
Sequential specification search
Specification error
The four specification criteria
Expected bias
Sensitivity analysis

Chapter 7

Model Specification: Choosing


a Functional Form


The Use and Interpretation of the Constant Term
An estimate of β0 has at least three components:
1. the true β0
2. the constant impact of any specification errors (an omitted variable, for example)
3. the mean of ε for the correctly specified equation (if not equal to zero)
Unfortunately, these components can't be distinguished from one another because we can observe only their sum, the estimate of β0
As a result of this, we usually don't interpret the constant term
On the other hand, we should not suppress the constant term, either, as illustrated by Figure 7.1


Figure 7.1 The Harmful Effect of


Suppressing the Constant Term


Alternative Functional Forms

An equation is linear in the variables if plotting the function in terms of X and Y generates a straight line

For example, Equation 7.1:
Y = β0 + β1X + ε    (7.1)
is linear in the variables, but Equation 7.2:
Y = β0 + β1X² + ε    (7.2)
is not linear in the variables

Similarly, an equation is linear in the coefficients only if the coefficients appear in their simplest form; they:
are not raised to any powers (other than one)
are not multiplied or divided by other coefficients
do not themselves include some sort of function (like logs or exponents)


Alternative Functional Forms (cont.)
For example, Equations 7.1 and 7.2 are linear in the coefficients, while Equation 7.3:
(7.3)
is not linear in the coefficients
In fact, of all possible equations for a single explanatory variable, only functions of the general form:
(7.4)
are linear in the coefficients β0 and β1

Linear Form
This is based on the assumption that the slope of the relationship between the independent variable and the dependent variable is constant:
ΔY/ΔXk = βk
For the linear case, the elasticity of Y with respect to X (the percentage change in the dependent variable caused by a 1-percent increase in the independent variable, holding the other variables in the equation constant) is:
Elasticity of Y with respect to Xk = (ΔY/ΔXk)(Xk/Y) = βk(Xk/Y)


What Is a Log?

If e (a constant equal to 2.71828) to the bth power produces x, then b is the log of x:
b is the log of x to the base e if: e^b = x
Thus, a log (or logarithm) is the exponent to which a given base must be taken in order to produce a specific number
While logs come in more than one variety, we'll use only natural logs (logs to the base e) in this text
The symbol for a natural log is ln, so ln(x) = b means that (2.71828)^b = x or, more simply, ln(x) = b means that e^b = x
For example, since e² = (2.71828)² = 7.389, we can state that:
ln(7.389) = 2
Thus, the natural log of 7.389 is 2! Again, why? Two is the power of e that produces 7.389


What Is a Log? (cont.)

Let's look at some other natural log calculations:
ln(100) = 4.605
ln(1,000) = 6.908
ln(10,000) = 9.210
ln(100,000) = 11.513
ln(1,000,000) = 13.816

Note that as a number goes from 100 to 1,000,000, its natural log goes from 4.605 to only 13.816! As a result, logs can be used in econometrics if a researcher wants to reduce the absolute size of the numbers associated with the same actual meaning

One useful property of natural logs in econometrics is that they make it easier to figure out impacts in percentage terms (we'll see this when we get to the double-log specification)
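These values are easy to verify with any language's math library; a quick Python check (not part of the text):

```python
import math

# ln(x) = b  <=>  e**b = x
for x in (100, 1_000, 10_000, 100_000, 1_000_000):
    print(f"ln({x:>9,}) = {math.log(x):.3f}")

print(f"e**2 = {math.exp(2):.3f}")   # about 7.389, so ln(7.389) is about 2
```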


Double-Log Form

Here, the natural log of Y is the dependent variable and the natural log of X is the independent variable:
lnY = β0 + β1lnX1 + β2lnX2 + ε    (7.5)

In a double-log equation, an individual regression coefficient can be interpreted as an elasticity because:
β1 = Δ(lnY)/Δ(lnX1) = %ΔY/%ΔX1    (7.6)

Note that the elasticities of the model are constant and the slopes are not
This is in contrast to the linear model, in which the slopes are constant but the elasticities are not
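The constant-elasticity property can be seen in a small simulation (a sketch, not the book's example; the elasticity of 0.8 and every other number here are made up). Regressing ln Y on ln X recovers the elasticity directly as the slope:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
X1 = rng.uniform(1, 100, n)
true_elasticity = 0.8                               # hypothetical value
Y = 3.0 * X1 ** true_elasticity * np.exp(rng.normal(0, 0.05, n))

# double-log regression: ln Y = b0 + b1 * ln X1
A = np.column_stack([np.ones(n), np.log(X1)])
b0, b1 = np.linalg.lstsq(A, np.log(Y), rcond=None)[0]
print(f"estimated elasticity: {b1:.3f}")            # near 0.8 at every X1 level
```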


Figure 7.2
Double-Log Functions


Semilog Form
The semilog functional form is a variant of the double-log equation in which some but not all of the variables (dependent and independent) are expressed in terms of their natural logs.
It can be on the right-hand side, as in:
Yi = β0 + β1lnX1i + β2X2i + εi    (7.7)

Or it can be on the left-hand side, as in:
lnY = β0 + β1X1 + β2X2 + ε    (7.9)

Figure 7.3 illustrates these two different cases



Figure 7.3
Semilog Functions


Polynomial Form

Polynomial functional forms express Y as a function of independent variables, some of which are raised to powers other than 1

For example, in a second-degree polynomial (also called a quadratic) equation, at least one independent variable is squared:
Yi = β0 + β1X1i + β2(X1i)² + β3X2i + εi    (7.10)

The slope of Y with respect to X1 in Equation 7.10 is:
ΔY/ΔX1 = β1 + 2β2X1    (7.11)

Note that the slope depends on the level of X1
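Equation 7.11 is simple arithmetic once estimates are in hand. A sketch with hypothetical coefficient values (β1 = 5, β2 = -0.25):

```python
# Slope of Y with respect to X1 in Equation 7.10: slope = b1 + 2*b2*X1
b1, b2 = 5.0, -0.25        # hypothetical estimated coefficients

def slope(x1):
    return b1 + 2 * b2 * x1

for x1 in (0, 4, 10, 12):
    print(f"X1 = {x1:2d}: slope = {slope(x1):+.1f}")
# the slope falls as X1 rises and turns negative past X1 = -b1/(2*b2) = 10
```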


Figure 7.4
Polynomial Functions


Inverse Form
The inverse functional form expresses Y as a function of the reciprocal (or inverse) of one or more of the independent variables (in this case, X1):
Yi = β0 + β1(1/X1i) + β2X2i + εi    (7.13)

So X1 cannot equal zero

This functional form is relevant when the impact of a particular independent variable is expected to approach zero as that independent variable approaches infinity
The slope with respect to X1 is:
ΔY/ΔX1 = -β1/(X1)²    (7.14)
The slopes for X1 fall into two categories, depending on the sign of β1 (illustrated in Figure 7.5)

Figure 7.5 Inverse Functions


Table 7.1 Summary of


Alternative Functional Forms


Lagged Independent Variables
Virtually all the regressions we've studied so far have been instantaneous in nature
In other words, they have included independent and dependent variables from the same time period, as in:
Yt = β0 + β1X1t + β2X2t + εt    (7.15)

Many econometric equations include one or more lagged independent variables like X1t-1, where t-1 indicates that the observation of X1 is from the time period previous to time period t, as in the following equation:
Yt = β0 + β1X1t-1 + β2X2t + εt    (7.16)

Using Dummy Variables

A dummy variable is a variable that takes on the values of 0 or 1, depending on whether a condition for a qualitative attribute (such as gender) is met
These conditions take the general form:
(7.18)
This is an example of an intercept dummy (as opposed to a slope dummy, which is discussed in Section 7.5)
Figure 7.6 illustrates the consequences of including an intercept dummy in a linear regression model

Figure 7.6
An Intercept Dummy


Slope Dummy Variables

Contrary to the intercept dummy, which changed only the intercept (and not the slope), the slope dummy changes both the intercept and the slope
The general form of a slope dummy equation is:
Yi = β0 + β1Xi + β2Di + β3XiDi + εi    (7.20)

The slope depends on the value of D:
When D = 0, ΔY/ΔX = β1
When D = 1, ΔY/ΔX = (β1 + β3)
Graphical illustration of how this works in Figure 7.7
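A slope-dummy regression can be verified on simulated data (a sketch; the coefficients 2, 1.5, 4, and 0.5 are hypothetical). The fitted slope should be near β1 for the D = 0 group and near β1 + β3 for the D = 1 group:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
X = rng.uniform(0, 10, n)
D = (rng.uniform(size=n) > 0.5).astype(float)        # 0/1 dummy
# hypothetical true model: Y = 2 + 1.5X + 4D + 0.5(X*D) + noise
Y = 2 + 1.5 * X + 4 * D + 0.5 * X * D + rng.normal(0, 0.3, n)

# regressors: constant, X, D, and the interaction term X*D
A = np.column_stack([np.ones(n), X, D, X * D])
b = np.linalg.lstsq(A, Y, rcond=None)[0]
print(f"slope when D = 0: {b[1]:.2f}")               # estimates beta1 (1.5)
print(f"slope when D = 1: {b[1] + b[3]:.2f}")        # estimates beta1 + beta3 (2.0)
```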


Figure 7.7 Slope and


Intercept Dummies


Problems with Incorrect Functional Forms
If functional forms are similar, and if theory does not specify exactly which form to use, there are at least two reasons why we should avoid using goodness of fit over the sample to determine which equation to use:
1. Fits are difficult to compare if the dependent variable is transformed
2. An incorrect functional form may provide a reasonable fit within the sample but have the potential to make large forecast errors when used outside the range of the sample

The first of these is essentially due to the fact that when the dependent variable is transformed, the total sum of squares (TSS) changes as well
The second is essentially due to the fact that using an incorrect functional form amounts to a specification error similar to the omitted variables bias discussed in Section 6.1
This second case is illustrated in Figure 7.8

Figure 7.8a Incorrect Functional


Forms Outside the Sample Range


Figure 7.8b Incorrect Functional


Forms Outside the Sample Range


Key Terms from Chapter 7

Elasticity
Double-log functional form
Natural log
Semilog functional form
Interaction term
Polynomial functional form
Inverse functional form
Slope dummy
Omitted condition
Linear in the variables
Linear in the coefficients

Chapter 8

Multicollinearity


Introduction and Overview


The next three chapters deal with violations of the Classical
Assumptions and remedies for those violations
This chapter addresses multicollinearity; the next two chapters are
on serial correlation and heteroskedasticity
For each of these three problems, we will attempt to answer the following questions:
1. What is the nature of the problem?
2. What are the consequences of the problem?
3. How is the problem diagnosed?
4. What remedies for the problem are available?


Perfect Multicollinearity

Perfect multicollinearity violates Classical Assumption VI, which specifies that no explanatory variable is a perfect linear function of any other explanatory variables

The word perfect in this context implies that the variation in one explanatory variable can be completely explained by movements in another explanatory variable
A special case is that of a dominant variable: an explanatory variable that is definitionally related to the dependent variable

An example would be (notice: no error term!):
X1i = α0 + α1X2i    (8.1)
where the αs are constants and the Xs are independent variables in:
Yi = β0 + β1X1i + β2X2i + εi    (8.2)

Figure 8.1 illustrates this case


Figure 8.1
Perfect Multicollinearity

2011 Pearson Addison-Wesley. All rights reserved.

1-

Perfect Multicollinearity (cont.)

What happens to the estimation of an econometric equation where there is perfect multicollinearity?
OLS is incapable of generating estimates of the regression coefficients
Most OLS computer programs will print out an error message in such a situation

What is going on?
Essentially, perfect multicollinearity ruins our ability to estimate the coefficients because the perfectly collinear variables cannot be distinguished from each other:
You cannot hold all the other independent variables in the equation constant if every time one variable changes, another changes in an identical manner!

Solution: one of the collinear variables must be dropped (they are essentially identical, anyway)


Imperfect Multicollinearity
Imperfect multicollinearity occurs when two (or more) explanatory variables are imperfectly linearly related, as in:
X1i = α0 + α1X2i + ui    (8.7)

Compare Equation 8.7 to Equation 8.1
Notice that Equation 8.7 includes ui, a stochastic error term

This case is illustrated in Figure 8.2

Figure 8.2
Imperfect Multicollinearity


The Consequences of Multicollinearity
There are five major consequences of multicollinearity:
1. Estimates will remain unbiased
2. The variances and standard errors of the estimates will increase:
a. Harder to distinguish the effect of one variable from the effect of another, so much more likely to make large errors in estimating the βs than without multicollinearity
b. As a result, the estimated coefficients, although still unbiased, now come from distributions with much larger variances and, therefore, larger standard errors (this point is illustrated in Figure 8.3)


Figure 8.3 Severe Multicollinearity Increases the Variances of the Estimated βs


The Consequences of Multicollinearity (cont.)
3. The computed t-scores will fall:
a. Recalling Equation 5.2, this is a direct consequence of 2. above

4. Estimates will become very sensitive to changes in specification:
a. The addition or deletion of an explanatory variable or of a few observations will often cause major changes in the values of the βs when significant multicollinearity exists
b. For example, if you drop a variable, even one that appears to be statistically insignificant, the coefficients of the remaining variables in the equation sometimes will change dramatically
c. This is again because with multicollinearity, it is much harder to distinguish the effect of one variable from the effect of another

5. The overall fit of the equation and the estimation of the coefficients of nonmulticollinear variables will be largely unaffected


The Detection of Multicollinearity
First, realize that some multicollinearity exists in every equation: all variables are correlated to some degree (even if completely at random)
So it's really a question of how much multicollinearity exists in an equation, rather than whether any multicollinearity exists
There are basically two characteristics that help detect the degree of multicollinearity for a given application:
1. High simple correlation coefficients
2. High Variance Inflation Factors (VIFs)
We will now go through each of these in turn:

High Simple Correlation Coefficients

If a simple correlation coefficient, r, between any two explanatory variables is high in absolute value, these two particular Xs are highly correlated and multicollinearity is a potential problem

How high is high?
Some researchers pick an arbitrary number, such as 0.80
A better answer might be that r is high if it causes unacceptably large variances in the coefficient estimates in which we're interested

Caution in case of more than two explanatory variables:
Groups of independent variables, acting together, may cause multicollinearity without any single simple correlation coefficient being high enough to indicate that multicollinearity is present
As a result, simple correlation coefficients must be considered to be sufficient but not necessary tests for multicollinearity


High Variance Inflation Factors (VIFs)
The variance inflation factor (VIF) is calculated in two steps:
1. Run an OLS regression that has Xi as a function of all the other explanatory variables in the equation. For i = 1, this equation would be:
X1 = α1 + α2X2 + α3X3 + ... + αKXK + v    (8.15)
where v is a classical stochastic error term
2. Calculate the variance inflation factor for the estimate of βi:
VIF(βi) = 1/(1 - R²i)    (8.16)
where R²i is the unadjusted R² from step one
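The two steps above translate directly into code. A minimal NumPy sketch on synthetic data (in practice, a library routine such as statsmodels' `variance_inflation_factor` does the same job): x2 is built to be nearly collinear with x1, so both should have large VIFs, while the unrelated x3 should have a VIF near 1.

```python
import numpy as np

def vif(X, i):
    """VIF for column i: regress X_i on the other columns (plus a constant)
    and return 1 / (1 - R^2), mirroring Equations 8.15-8.16."""
    y = X[:, i]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.2, n)       # nearly collinear with x1
x3 = rng.normal(size=n)               # unrelated
X = np.column_stack([x1, x2, x3])
for i in range(3):
    print(f"VIF(x{i + 1}) = {vif(X, i):.1f}")   # large for x1, x2; near 1 for x3
```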

High Variance Inflation Factors (VIFs) (cont.)

From Equation 8.16, the higher the VIF, the more severe the effects of multicollinearity

How high is high?
While there is no table of formal critical VIF values, a common rule of thumb is that if a given VIF is greater than 5, the multicollinearity is severe
As the number of independent variables increases, it makes sense to increase this number slightly

Note that some authors replace the VIF with its reciprocal, called tolerance, or TOL

Problems with VIF:
No hard and fast VIF decision rule
There can still be severe multicollinearity even with small VIFs
VIF is a sufficient, not necessary, test for multicollinearity


Remedies for Multicollinearity
Essentially three remedies for multicollinearity:
1. Do nothing:
a. Multicollinearity will not necessarily reduce the t-scores enough to make them statistically insignificant and/or change the estimated coefficients to make them differ from expectations
b. The deletion of a multicollinear variable that belongs in an equation will cause specification bias
2. Drop a redundant variable:
a. Viable strategy when two variables measure essentially the same thing
b. Always use theory as the basis for this decision!

Remedies for Multicollinearity (cont.)
3. Increase the sample size:
a. This is frequently impossible but a useful alternative to be considered if feasible
b. The idea is that the larger sample normally will reduce the variance of the estimated coefficients, diminishing the impact of the multicollinearity


Table 8.1a
Table 8.2a
Table 8.2b
Table 8.2c
Table 8.2d
Table 8.3a
Table 8.3b

Key Terms from Chapter 8


Perfect multicollinearity
Severe imperfect multicollinearity
Dominant variable
Auxiliary (or secondary) equation
Variance inflation factor
Redundant variable


Chapter 9

Serial Correlation


Pure Serial Correlation

Pure serial correlation occurs when Classical Assumption IV, which assumes uncorrelated observations of the error term, is violated (in a correctly specified equation!)
The most commonly assumed kind of serial correlation is first-order serial correlation, in which the current value of the error term is a function of the previous value of the error term:
εt = ρεt-1 + ut    (9.1)

where: ε = the error term of the equation in question
ρ = the first-order autocorrelation coefficient
u = a classical (not serially correlated) error term


Pure Serial Correlation (cont.)
The magnitude of ρ indicates the strength of the serial correlation:
If ρ is zero, there is no serial correlation
As ρ approaches one in absolute value, the previous observation of the error term becomes more important in determining the current value of εt, and a high degree of serial correlation exists
For ρ to exceed one is unreasonable, since the error term effectively would explode

As a result of this, we can state that:
-1 < ρ < +1    (9.2)

Pure Serial Correlation (cont.)
The sign of ρ indicates the nature of the serial correlation in an equation:
Positive:
implies that the error term tends to have the same sign from one time period to the next
this is called positive serial correlation
Negative:
implies that the error term has a tendency to switch signs from negative to positive and back again in consecutive observations
this is called negative serial correlation

Figures 9.1 through 9.3 illustrate several different scenarios



Figure 9.1a
Positive Serial Correlation


Figure 9.1b
Positive Serial Correlation


Figure 9.2
No Serial Correlation


Figure 9.3a
Negative Serial Correlation


Figure 9.3b
Negative Serial Correlation


Impure Serial Correlation

Impure serial correlation is serial correlation that is caused by a specification error such as:
an omitted variable and/or
an incorrect functional form

How does this happen?
As an example, suppose that the true equation is:
Yt = β0 + β1X1t + β2X2t + εt    (9.3)
where εt is a classical error term. As shown in Section 6.1, if X2 is accidentally omitted from the equation (or if data for X2 are unavailable), then:
Yt = β0 + β1X1t + εt*  where  εt* = εt + β2X2t    (9.4)

The error term is therefore not a classical error term


Impure Serial Correlation (cont.)
Instead, the error term is also a function of one of the explanatory variables, X2
As a result, the new error term, εt*, can be serially correlated even if the true error term, εt, is not
In particular, the new error term will tend to be serially correlated when:
1. X2 itself is serially correlated (this is quite likely in a time series) and
2. the size of εt is small compared to the size of β2X2t

Figure 9.4 illustrates 1., for the case of U.S. disposable income


Figure 9.4 U.S. Disposable


Income as a Function of Time


Impure Serial Correlation (cont.)

Turn now to the case of impure serial correlation caused by an incorrect functional form

Suppose that the true equation is polynomial in nature:
Yt = β0 + β1X1t + β2(X1t)² + εt    (9.7)
but that instead a linear regression is run:
Yt = α0 + α1X1t + εt*    (9.8)

The new error term εt* is now a function of the true error term εt and of the differences between the linear and the polynomial functional forms

Figure 9.5 illustrates how these differences often follow fairly autoregressive patterns


Figure 9.5a Incorrect Functional Form as a


Source of Impure Serial Correlation


Figure 9.5b Incorrect Functional Form as a


Source of Impure Serial Correlation


The Consequences of Serial Correlation

The existence of serial correlation in the error term of an equation violates Classical Assumption IV, and the estimation of the equation with OLS has at least three consequences:
1. Pure serial correlation does not cause bias in the coefficient estimates
2. Serial correlation causes OLS to no longer be the minimum variance estimator (of all the linear unbiased estimators)
3. Serial correlation causes the OLS estimates of the SEs to be biased, leading to unreliable hypothesis testing. Typically the bias in the SE estimate is negative, meaning that OLS underestimates the standard errors of the coefficients (and thus overestimates the t-scores)


The Durbin-Watson d Test

Two main ways to detect serial correlation:
Informal: observing a pattern in the residuals like that in Figure 9.1
Formal: testing for serial correlation using the Durbin-Watson d test
We will now go through the second of these in detail

First, it is important to note that the Durbin-Watson d test is only applicable if the following three assumptions are met:
1. The regression model includes an intercept term
2. The serial correlation is first-order in nature:
εt = ρεt-1 + ut
where ρ is the autocorrelation coefficient and u is a classical (normally distributed) error term
3. The regression model does not include a lagged dependent variable (discussed in Chapter 12) as an independent variable


The Durbin-Watson d Test (cont.)
The equation for the Durbin-Watson d statistic for T observations is:
d = Σ(t=2 to T) (et - et-1)² / Σ(t=1 to T) et²    (9.10)
where the et's are the OLS residuals
There are three main cases:
1. Extreme positive serial correlation: d = 0
2. Extreme negative serial correlation: d ≈ 4
3. No serial correlation: d ≈ 2
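Equation 9.10 is a one-liner in code; the three cases can be checked on stylized residual series (a sketch, not part of the text):

```python
import numpy as np

def durbin_watson(e):
    """d = sum_{t=2..T} (e_t - e_{t-1})^2 / sum_{t=1..T} e_t^2 (Equation 9.10)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

same_sign = np.ones(50)                                    # extreme positive case
alternating = np.array([(-1.0) ** t for t in range(50)])   # extreme negative case
white = np.random.default_rng(5).normal(size=5000)         # no serial correlation

d_pos = durbin_watson(same_sign)
d_neg = durbin_watson(alternating)
d_none = durbin_watson(white)
print(f"d (extreme positive): {d_pos:.2f}")   # 0
print(f"d (extreme negative): {d_neg:.2f}")   # near 4
print(f"d (none):             {d_none:.2f}")  # near 2
```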

The Durbin-Watson d Test (cont.)
To test for positive (note that we rarely, if ever, test for negative!) serial correlation, the following steps are required:
1. Obtain the OLS residuals from the equation to be tested and calculate the d statistic by using Equation 9.10
2. Determine the sample size and the number of explanatory variables and then consult Statistical Tables B-4, B-5, or B-6 in Appendix B to find the upper critical d value, dU, and the lower critical d value, dL, respectively (instructions for the use of these tables are also in that appendix)


The Durbin-Watson d Test (cont.)
3. Set up the test hypotheses and decision rule:
H0: ρ ≤ 0    (no positive serial correlation)
HA: ρ > 0    (positive serial correlation)

Reject H0 if d < dL
Do not reject H0 if d > dU
Inconclusive if dL ≤ d ≤ dU

In rare circumstances, perhaps first-differenced equations, a two-sided d test might be appropriate
In such a case, steps 1 and 2 are still used, but step 3 is now:

The Durbin-Watson d Test (cont.)
3. Set up the test hypotheses and decision rule:
H0: ρ = 0    (no serial correlation)
HA: ρ ≠ 0    (serial correlation)

Reject H0 if d < dL
Reject H0 if d > 4 - dL
Do not reject H0 if 4 - dU > d > dU
Inconclusive otherwise

Figure 9.6 gives an example of a one-sided Durbin-Watson d test



Figure 9.6 An Example of a One-Sided Durbin-Watson d Test


Remedies for Serial Correlation

The place to start in correcting a serial correlation problem is to look carefully at the specification of the equation for possible errors that might be causing impure serial correlation:
Is the functional form correct?
Are you sure that there are no omitted variables?
Only after the specification of the equation has been reviewed carefully should the possibility of an adjustment for pure serial correlation be considered

There are two main remedies for pure serial correlation:
1. Generalized Least Squares
2. Newey-West standard errors
We will now discuss each of these in turn


Generalized Least Squares

Start with an equation that has first-order serial correlation:
Yt = β0 + β1X1t + εt    (9.15)

Which, if εt = ρεt-1 + ut (due to pure serial correlation), also equals:
Yt = β0 + β1X1t + ρεt-1 + ut    (9.16)

Multiply Equation 9.15 by ρ and then lag the new equation by one period, obtaining:
ρYt-1 = ρβ0 + ρβ1X1t-1 + ρεt-1    (9.17)


Generalized Least Squares (cont.)

Next, subtract Equation 9.17 from Equation 9.16, obtaining:
Yt - ρYt-1 = β0(1 - ρ) + β1(X1t - ρX1t-1) + ut    (9.18)

Finally, rewrite Equation 9.18 as:
Yt* = β0* + β1X1t* + ut    (9.19)
where Yt* = Yt - ρYt-1, X1t* = X1t - ρX1t-1, and β0* = β0(1 - ρ)    (9.20)


Generalized Least Squares (cont.)
Equation 9.19 is called a Generalized Least Squares (or quasi-differenced) version of Equation 9.16. Notice that:
1. The error term is not serially correlated
a. As a result, OLS estimation of Equation 9.19 will be minimum variance
b. This is true if we know ρ or if we accurately estimate ρ
2. The slope coefficient β1 is the same as the slope coefficient of the original serially correlated equation, Equation 9.16. Thus coefficients estimated with GLS have the same meaning as those estimated with OLS.

Generalized Least Squares (cont.)
3. The dependent variable has changed compared to that in Equation 9.16. This means that the GLS measure of overall fit is not directly comparable to the OLS measure.
4. To forecast with GLS, adjustments like those discussed in Section 15.2 are required
Unfortunately, we cannot use OLS to estimate a GLS model because GLS equations are inherently nonlinear in the coefficients
Fortunately, there are at least two other methods available:


The Cochrane-Orcutt Method

Perhaps the best-known GLS method
This is a two-step iterative technique that first produces an estimate of ρ and then estimates the GLS equation using that estimate.
The two steps are:
1. Estimate ρ by running a regression based on the residuals of the equation suspected of having serial correlation:
et = ρet-1 + ut    (9.21)
where the et's are the OLS residuals from the equation suspected of having pure serial correlation and ut is a classical error term
2. Use this estimated ρ to estimate the GLS equation by substituting it into Equation 9.18 and using OLS to estimate Equation 9.18 with the adjusted data

These two steps are repeated (iterated) until further iteration results in little change in the estimated ρ
Once the estimate of ρ has converged (usually in just a few iterations), the last estimate of step 2 is used as a final estimate of Equation 9.18

2011 Pearson Addison-Wesley. All rights reserved.

1-

The AR(1) Method


Perhaps a better alternative than CochraneOrcutt for GLS
models
The AR(1) method estimates a GLS equation like Equation 9.18
by estimating 0, 1 and simultaneously with iterative
nonlinear regression techniques (that are well beyond the
scope off this
thi chapter!)
h t !)
The AR(1) method tends to produce the same coefficient
estimates as CochraneOrcutt
However, the estimated standard errors are smaller
This is whyy the AR(1)
( ) approach is recommended as long
g as yyour
software can support such nonlinear regression

2011 Pearson Addison-Wesley. All rights reserved.

1-

NeweyWest Standard Errors


Again, not all corrections for pure serial correlation involve
Generalized Least Squares
NeweyWest standard errors take account of serial
correlation by correcting the standard errors without
changing the estimated coefficients
The logic begin Newey
NeweyWest
West standard errors is powerful:
If serial correlation does not cause bias in the estimated
coefficients but does impact the standard errors,
errors then it makes
sense to adjust the estimated equation in a way that changes the
standard errors but not the coefficients

2011 Pearson Addison-Wesley. All rights reserved.

1-

NeweyWest Standard Errors


(cont.)
The NeweyWest SEs are biased but generally more
accurate than uncorrected standard errors for large
samples in the face of serial correlation
As a result
result, Newey
NeweyWest
West standard errors can be used for
t-tests and other hypothesis tests in most samples without
the errors of inference p
potentially
y caused by
y serial
correlation
Typically, NeweyWest
Newey West SEs are larger than OLS SEs, thus
producing lower t-scores

2011 Pearson Addison-Wesley. All rights reserved.

1-

Key Terms from Chapter 9


Impure serial correlation
First-order serial correlation
First-order autocorrelation coefficient
DurbinWatson d statistic
Generalized Least Squares (GLS)
Positive serial correlation
NeweyWest standard errors

2011 Pearson Addison-Wesley. All rights reserved.

1-

Chapter 10

Heteroskedasticity

2011 Pearson Addison-Wesley. All rights reserved.

1-

Pure Heteroskedasticity
Pure heteroskedasticity occurs when Classical
Assumption V,
V which assumes constant variance of the
error term, is violated (in a correctly specified equation!)
Classical Assumption V assumes that:
(10.1)
With heteroskedasticity, this error term variance is not
constant

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Pure Heteroskedasticity
(cont.)
Instead, the variance of the distribution of the error term
depends on exactly which observation is being
discussed:
(10 2)
(10.2)
The simplest case is that of discrete heteroskedasticity,
where the observations of the error term can be grouped
into just two different distributions, wide and narrow
This
Thi case is
i illustrated
ill t t d iin Fi
Figure 10.1
10 1

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.1a Homoskedasticity


versus Discrete Heteroskedasticity

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.1b Homoskedasticity


versus Discrete Heteroskedasticity

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Pure Heteroskedasticity
(cont.)

Heteroskedasticity takes on many more complex forms, however,


than the discrete heteroskedasticity case

Perhaps the most frequently specified model of pure


heteroskedasticity relates the variance of the error term to an
exogenous variable
i bl Zi as follows:
f ll
(10.3)
(10.4)
where Z, the proportionality factor, may or may not be in the
equation

This is illustrated in Figures 10.2 and 10.3

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.2 A Homoskedastic


Error Term with Respect to Zi

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.3 A Heteroskedastic


Error Term with Respect to Zi

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Impure Heteroskedasticity

Similar to impure serial correlation, impure heteroskedasticity is


heteroskedasticityy that is caused byy a specification error

Contrary to that case, however, impure heteroskedasticity almost always


originates from an omitted variable (rather than an incorrect functional
form)

How does this happen?


The p
portion of the omitted effect not represented
p
by
y one of the included
explanatory variables must be absorbed by the error term.
So, if this effect has a heteroskedastic component, the error term of the
p
equation
q
might
g be heteroskedastic even if the error term of the true
misspecified
equation is not!

This highlights, again, the importance of first checking that the


specification is correct before trying to fix
fix things
things

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

The Consequences of
Heteroskedasticity
The existence of heteroskedasticity in the error term of an
equation violates Classical Assumption V, and the estimation of
th equation
the
ti with
ith OLS h
has att least
l
t three
th
consequences:
1. Pure heteroskedasticity does not cause bias in the coefficient
estimates
2. Heteroskedasticity typically causes OLS to no longer be the
minimum variance estimator ((of all the linear unbiased
estimators)
3. Heteroskedasticity causes the OLS estimates of the SE to be
biased, leading to unreliable hypothesis testing. Typically
the bias in the SE estimate is negative, meaning that OLS
underestimates the standard errors (and thus overestimates the
t-scores)
2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Testing for
Heteroskedasticity

Econometricians do not all use the same test for heteroskedasticity because
heteroskedasticity takes a number of different forms, and its precise
manifestation in a g
given equation
q
is almost never known

Before using any test for heteroskedasticity, however, ask the following:
1. Are there any obvious specification errors?
Fix
Fi th
those before
b f
t ti !
testing!

2. Is the subject of the research likely to be afflicted with heteroskedasticity?


Not only are cross-sectional studies the most frequent source of
heteroskedasticity but cross-sectional
heteroskedasticity,
cross sectional studies with large variations in the size of
the dependent variable are particularly susceptible to heteroskedasticity

3. Does a graph of the residuals show any evidence of heteroskedasticity?


Specifically
Specifically, plot the residuals against a potential Z proportionality factor
In such cases, the graph alone can often show that heteroskedasticity is or is
not likely
Figure 10.4 shows an example of what to look for: an expanding (or contracting)
range off the
th residuals
id l
2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.4 Eyeballing Residuals


for Possible Heteroskedasticity

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

The Park Test


The Park test has three basic steps:
1. Obtain the residuals of the estimated regression equation:
(
(10.6)
)
2. Use these residuals to form the dependent variable in a
second regression:
(10.7)
where: ei = the residual from the ith observation from Equation 10.6
Zi = your best choice as to the possible proportionality factor (Z)
ui = a classical (homoskedastic) error term
2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

The Park Test


3. Test the significance of the coefficient of Z in
Equation 10.7
10 7 with a tt-test:
test:
If the coefficient of Z is statistically significantly different from
zero this is evidence of heteroskedastic patterns in the
zero,
residuals with respect to Z
Potential issue: How do we choose Z in the first place?

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

The White Test


The White test also has three basic steps:
1 Obtain
1.
Obt i the
th residuals
id l off the
th estimated
ti t d regression
i equation:
ti
This is identical to the first step in the Park test

2 U
2.
Use th
these residuals
id l ((squared)
d) as th
the dependent
d
d t variable
i bl in
i a
second equation that includes as explanatory variables each X
from the original equation, the square of each X, and the product of
each X times every other Xfor example, in the case of three
explanatory variables:
(10.9)

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

The White Test (cont.)


(cont )
3. Test the overall significance of Equation 10.9 with the
chi-square test
The appropriate test statistic here is NR2, or the sample size (N) times
the coefficient of determination (the unadjusted R2) of Equation 10.9
This test statistic has a chi-square distribution with degrees of freedom
equal to the number of slope coefficients in Equation 10.9
If NR2 is larger than the critical chi-square
chi square value found in Statistical
Table B-8, then we reject the null hypothesis and conclude that it's likely
that we have heteroskedasticity
If NR2 is less than the critical chi
chi-square
square value
value, then we cannot reject
the null hypothesis of homoskedasticity

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Remedies for
Heteroskedasticity

The place to start in correcting a heteroskedasticity problem is to look


carefully at the specification of the equation for possible errors that
might be causing impure heteroskedasticity :
Are you sure that there are no omitted variables?
Only after the specification of the equation has been reviewed carefully
should the possibility of an adjustment for pure heteroskedasticity be
considered

There are two main remedies for pure heteroskedasticit1


1. Heteroskedasticity-corrected standard errors
2. Redefining the variables

We will now discuss each of these in turn:

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Heteroskedasticity-Corrected
Standard Errors
Heteroskedasticity-corrected errors take account of
heteroskedasticity correcting the standard errors without
changing the estimated coefficients
The logic behind heteroskedasticity-corrected
heteroskedasticity corrected standard
errors is power
If heteroskedasticity does not cause bias in the estimated
coefficients but does impact the standard errors, then it makes
sense to adjust the estimated equation in a way that changes the
standard errors but
b t not the coefficients

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Heteroskedasticity-Corrected
Standard Errors (cont.)
The heteroskedasticity-corrected SEs are biased but
generally more accurate than uncorrected standard
errors for large samples in the face of heteroskedasticity
As a result
result, heteroskedasticity-corrected
heteroskedasticity corrected standard errors
can be used for t-tests and other hypothesis tests in most
samples
p
without the errors of inference p
potentially
y caused
by heteroskedasticity
Typically heteroskedasticity-corrected
heteroskedasticity corrected SEs are larger than
OLS SEs, thus producing lower t-scores

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Redefining the Variables


Sometimes its possible to redefine the variables in a way
that avoids heteroskedasticity
Be careful, however:
Redefining your variables is a functional form specification
change that can dramatically change your equation!

IIn some cases, the


th only
l redefinition
d fi iti th
that's
t' needed
d d tto rid
id an
equation of heteroskedasticity is to switch from a linear
functional form to a double-log
double log functional form:
The double-log form has inherently less variation than the linear
form,, so it's less likelyy to encounter heteroskedasticityy
2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Redefining the Variables


(cont.)
In other situations, it might be necessary to completely
rethink the research project in terms of its underlying
theory
For example,
example a cross-sectional model of the total
expenditures by the governments of different cities may
generate heteroskedasticity
g
y byy containing
g both large
g and
small cities in the estimation sample
Why?
Because of the proportionality factor (Z) the size of the cities

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Redefining the Variables


(cont.)
This is illustrated in Figure 10.5
In this case, per capita expenditures would be a logical
dependent variable
Such a transformation is shown in Figure 10.6
Aside: Note that Weighted
g
Least Squares
q
(WLS),
(
), that
some authors suggest as a remedy for heteroskedasticity,
has some serious potential drawbacks and can therefore
generally
ll iis nott be
b recommended
d d ((see F
Footnote
t t 14,
14 p. 355,
355
for details)

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.5 An Aggregate


City Expenditures Function

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Figure 10.6 A Per Capita City


Expenditures Function

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Table 10.1a
10 1a

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Table 10.1b
10 1b

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Table 10.1c
10 1c

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Key Terms from Chapter 10


Impure heteroskedasticity
Pure heteroskedasticity
Proportionality factor Z
The Park test
The White test
Heteroskedasticity-corrected standard errors

2011 Pearson Addison-Wesley. All rights reserved.

10-
1-

Chapter 11

Running Your Own Regression


Project
j

2011 Pearson Addison-Wesley. All rights reserved.

1-

Choosing Your Topic


There are at least three keys to choosing a topic:
1 Try
1.
T to
t pick
i k a field
fi ld th
thatt you fifind
d iinteresting
t
ti and/or
d/ that
th t you know
k
something about
2 Make sure that data are readily available with a reasonable
2.
sample (we suggest at least 25 observations)
3 Make sure that there is some substance to your topic
3.
Avoid topics that are purely descriptive or virtually tautological in nature
Instead,, look for topics
p
that address an inherently
y interesting
g economic or
behavioral question or choice

2011 Pearson Addison-Wesley. All rights reserved.

1-

Choosing Your Topic (cont.)


(cont )
Places to look:
your textbooks and notes from previous economics classes
economics journals
For example, Table 11.1 contains a list of the journals cited so far in this
textbook (in order of the frequency of citation)

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.1a
Sources of Potential Topic Ideas

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.1b
Sources of Potential Topic Ideas

2011 Pearson Addison-Wesley. All rights reserved.

1-

Collecting Your Data

Before any quantitative analysis can be done, the data must be:
collected
organized
entered into a computer

Usually, this is a time-consuming and frustrating task because of:


the difficulty of finding data
the existence of definitional differences between theoretical variables
and their empirical counterparts
and the high probability of data entry errors or data transmission errors

But time spent thinking about and collecting the data is well spent, since a
researcher who knows the data sources and definitions is much less likely
to make mistakes using or interpreting regressions run on that data

We will now discuss three data collection issues in a bit more detail

2011 Pearson Addison-Wesley. All rights reserved.

1-

What Data to Look For

Checking for data availability means deciding what specific variables you
want to study:
y
dependent variable
all relevant independent variables

At least 5 issues to consider here:


1. Time periods:
If the dependent variable is measured annually, the explanatory variables
should also be measured annually and not, say, monthly

2. Measuring
gq
quantity:
y
If the market and/or quality of a given variable has changed over time, it makes
little sense to use quantity in units
Example: TVs have changed so much over time that it makes more sense to use
quantity in terms of monetary equivalent: more comparable across time
2011 Pearson Addison-Wesley. All rights reserved.

1-

What Data to Look For (cont.)


(cont )
3. Nominal or real terms?
Depends on theor
theory essentially:
essentiall do we
e want
ant to clean for inflation?
TVs, again: probably use real terms

4 Appropriate variable definitions depend on whether data are cross


4.
crosssectional or time-series
TVs,, again:
g
national advertising
g would be a g
good candidate for an
explanatory variable in a time-series model, while advertising in or near
each state (or city) would make sense in a cross-sectional model

5 Be careful
5.
caref l when
hen reading (and creating!) descriptions of data
data:
Where did the data originate?
Are prices and/or income measured in nominal or real terms?
Are prices retail or wholesale?
2011 Pearson Addison-Wesley. All rights reserved.

1-

Where to Look for


Economic Data
Although some researchers generate their own data through
surveys
y or other techniques
q
((see Section 11.3),
), the vast majority
j y
of regressions are run on publicly available data
Good sources here include:
1. Government publications:
Statistical Abstract of the U
U.S.
S
the annual Economic Report of the President
the Handbook of Labor Statistics
Historical Statistics of the U.S. (published in 1975)
Census Catalog and Guide
2011 Pearson Addison-Wesley. All rights reserved.

1-

Where to Look for


Economic Data (cont.)
2. International data sources:
U.N.
U N St
Statistical
ti ti l Y
Yearbook
b k
U.N. Yearbook of National Account Statistics
3. Internet resources:
Resources for Economists on the Internet
Economagic
WebEC
EconLit (www.econlit.org)
Dialog
Links to these sites and other good sources of data are on the
1-
texts Web
site:
www.pearsonhighered.com/studenmund
2011 Pearson Addison-Wesley.
All rights
reserved.

Missing Data
Suppose the data arent there?
What happens if you choose the perfect variable and
look in all the right sources and cant find the data?
The answer to this question depends on how much
data is missing:

1. A few observations:
in a cross-section study:
Can usually afford to drop these observations from the
sample

in a time-series study:
May interpolate value (taking the mean of adjacent values)
2011 Pearson Addison-Wesley. All rights reserved.

1-

Missing Data (cont.)


(cont )
2. No data at all available (for a theoretically relevant
variable!):
From Chapter 6, we know that this is likely to cause
omitted variables bias
A possible solution here is to use a proxy variable
For example, the value of net investment is a variable
that is not measured directly in a number of countries
Instead, might use the value of gross investment as a
proxy the assumption being that the value of gross
proxy,
investment is directly proportional to the value of net
investment
2011 Pearson Addison-Wesley. All rights reserved.

1-

Advanced Data Sources


So far, all the data sets have been:
1. cross-sectional or time-series in nature
2. been collected by observing the world around us, instead being
created

It turns out,
out however,
however that:
1. time-series and cross-sectional data can be pooled to form panel
data
2. data can be generated through surveys

We will now briefly introduce these more advanced data


sources and explain why it probably doesn't
doesn t make sense to
use these data sources on your first regression project:

2011 Pearson Addison-Wesley. All rights reserved.

1-

Surveys
Surveys are everywhere in our society and are
used for many different purposes
purposesexamples
examples
include:
marketing firms using surveys to learn more about
products and competition
political candidates using surveys to finetune their
campaign advertising or strategies
go
governments
e
e ts us
using
g surveys
su eys for
o a
all so
sorts
ts o
of pu
purposes,
poses,
including keeping track of their citizens with instruments
like the U.S. Census
2011 Pearson Addison-Wesley. All rights reserved.

1-

Surveys (cont.)
(cont )
While running your own survey might be tempting as a
way of obtaining data for your own project,
project running a survey
is not as easy as it might seem surveys:
must be carefully thought through; its virtually impossible to go
back to the respondents and add another question later
must be worded precisely (and pretested) to avoid confusing the
respondent or "leading"
leading the respondent to a particular answer
must have samples that are random and avoid the selection,
survivor, and nonresponse biases explained in Section 17.2

As a result, we don't encourage beginning researchers to


run their own surveys...
2011 Pearson Addison-Wesley. All rights reserved.

1-

Panel Data
Again, panel data are formed when cross-sectional and
time-series
time
series data sets are pooled to create a single data
set
Two main reasons for using
gp
panel data:
To increase the sample size
To provide an insight into an analytical question that can't be
obtained by using time-series or cross-sectional data alone

2011 Pearson Addison-Wesley. All rights reserved.

1-

Panel Data (cont.)


(cont )
Example: suppose were interested in the relationship
between budget
g deficits and interest rates but only
y have 10
years of annual data to study
But ten observations is too small a sample for a reasonable
regression!
However, if we can find time-series data on the same economic
variables-interest rates and budget deficitsfor the same ten years
for six different countries
countries, we
wellll end up with a sample of 10
10*6
6 = 60
observations, which is more than enough
The result is a pooled cross-section time-series data seta
panel data set!
Panel data estimation methods are treated in Chapter 16

2011 Pearson Addison-Wesley. All rights reserved.

1-

Practical Advice for


Your Project
We now move to a discussion of practical advice
about actually doing applied econometric work
This discussion is structured in three parts:
1 The 10 Commandments of Applied Econometrics
1.
(by Peter Kennedy)
2 What to check if you get an unexpected sign
2.
3. A collection of a dozen practical tips, brought
g
from other sections of this text that are worth
together
reiterating specifically in the context of actually doing
applied econometric work
2011 Pearson Addison-Wesley. All rights reserved.

1-

Practical Advice for


Your Project
We now move to a discussion of practical advice
about actually doing applied econometric work
This discussion is structured in three parts:
1 The 10 Commandments of Applied Econometrics
1.
(by Peter Kennedy)
2 What to check if you get an unexpected sign
2.
3. A collection of a dozen practical tips, brought
g
from other sections of this text that are worth
together
reiterating specifically in the context of actually doing
applied econometric work
2011 Pearson Addison-Wesley. All rights reserved.

1-

The 10 Commandments of
Applied Econometrics
1. Use common sense and economic theory:
Example:
p match p
per capita
p variables with p
per capita
p variables,, use real exchange
g rates to
explain real imports or exports, etc

2. Ask the right questions:


Ask
A
k plenty
l t of,
f perhaps,
h
seemingly
i l silly
ill questions
ti
tto ensure th
thatt you ffully
ll understand
d t d th
the
goal of the research

3. Know the context:


Be sure to be familiar with the history, institutions, operating constraints, measurement
peculiarities, cultural customs, etc, underlying the object under study

4 Inspect the data:


4.
a. This includes calculating summary statistics, graphs, and data cleaning (including
checking filters)
b. The objective is to get to know the data well
2011 Pearson Addison-Wesley. All rights reserved.

1-

The 10 Commandments of
Applied Econometrics (cont.)
5. Keep it sensibly simple:
a Begin with a simple model and only complicate it if it fails
a.
b. This both goes for the specifications, functional forms, etc and for the
estimation method
6. Look long and hard at your results:
a. Check that the results make sense, including signs and magnitudes
b. Apply the laugh test
7. Understand the costs and benefits of data mining:
a. Bad data mining: deliberately searching for a specification that works
(i.e. torturing the data)
b Good
b.
Good data mining: experimenting with the data to discover empirical
regularities that can inform economic theory and be tested on a second data
set
1-
2011 Pearson Addison-Wesley. All rights reserved.

The 10 Commandments of
Applied Econometrics (cont.)
8. Be prepared to compromise:
a. The Classical Assumptions are only rarely are satisfied
b. Applied econometricians are therefore forced to compromise and adopt
suboptimal solutions, the characteristics and consequences of which are
not always known
c. Applied econometrics is necessarily ad hoc: we develop our analysis,
including responses to potential problems, as we go along
9. Do not confuse statistical significance with meaningful magnitude:
a. If the sample size is large enough, any (two-sided) hypothesis can be
rejected (when large enough to make the SEs small enough)
b. Substantive significancei.e. how large?is also important, not just
statistical significance
2011 Pearson Addison-Wesley. All rights reserved.

1-

The 10 Commandments of
Applied Econometrics (cont.)
10. Report a sensitivity analysis:

a Dimensions to examine:
a.
i. sample period
ii the functional form
ii.
iii. the set of explanatory variables
i th
iv.
the choice
h i off proxies
i
b. If results are not robust across the examined dimensions, then
this casts doubt on the conclusions of the research

2011 Pearson Addison-Wesley. All rights reserved.

1-

What to Check If You Get an


Unexpected Sign
1. Recheck the expected sign
Were dummy variables computed upside
upside down
down, for example?
2. Check your data for input errors and/or outliers
3 Check
3.
Ch k ffor an omitted
itt d variable
i bl
The most frequent source of significant unexpected signs
4. Check for an irrelevant variable
Frequent source of insignificant unexpected signs
5. Check for multicollinearity
Multicollinearity increases the variances and standard errors of the
estimated
ti t d coefficients,
ffi i t iincreasing
i th
the chance
h
th
thatt a coefficient
ffi i t could
ld
have an unexpected sign
2011 Pearson Addison-Wesley. All rights reserved.

1-

What to Check If You Get an


Unexpected Sign
6. Check for sample selection bias
An unexpected sign sometimes can be due to the fact that the
observations included in the data were not obtained randomly

7. Check your sample size


The smaller the sample size, the higher the variance on SEs

8. Check your theory


If nothing else is apparently wrong, only two possibilities remain:
the theory is wrong or the data is bad

2011 Pearson Addison-Wesley. All rights reserved.

1-

A Dozen Practical Tips Worth


Reiterating
1. Dont attempt to maximize R2 (Chapter 2)

2. Always review the literature and hypothesize the signs


of your coefficients before estimating a model (Chapter 3)
3. Inspect and clean your data before estimating a model.
Know that outliers should not be automatically omitted;
instead, they should be investigated to make sure that
they belong in the sample (Chapter 3)
4. Know the Classical Assumptions cold! (Chapter 4)
5. In g
general,, use a one-sided t-test unless the expected
p
sign of the coefficient actually is in doubt (Chapter 5)
2011 Pearson Addison-Wesley. All rights reserved.

1-

A Dozen Practical Tips Worth


Reiterating (cont.)
6. Dont automatically discard a variable with an
insignificant t-score.
t score In general,
general be willing to live with a
variable with a t-score lower than the critical value in order
to decrease the chance of omitting
g a relevant variable
(Chapter 6)
7 Know how to analyze the size and direction of the bias
7.
caused by an omitted variable (Chapter 6)
8 Understand all the different functional form options and
8.
their common uses, and remember to choose your
functional form p
primarily
y on the basis of theory
y, not fit
(Chapter 7)
2011 Pearson Addison-Wesley. All rights reserved.

1-

A Dozen Practical Tips Worth


Reiterating (cont.)
9. Multicollinearity doesnt create bias; the estimated
variances are large
large, but the estimated coefficients
themselves are unbiased: So, the most-used remedy for
multicollinearity is to do nothing (Chapter 8)
10. If you get a significant DurbinWatson, Park, or White
test,, remember to consider the possibility
p
y that a
specification error might be causing impure serial
correlation or heteroskedasticity. Dont change your
estimation
ti ti ttechnique
h i
ffrom OLS tto GLS or use adjusted
dj t d
standard errors until you have the best possible
specification (Chapters 9 and 10)
specification.
2011 Pearson Addison-Wesley. All rights reserved.

1-

A Dozen Practical Tips Worth


Reiterating (cont.)
11. Adjusted standard errors like NeweyWest standard
errors or HC standard errors use the OLS coefficient
estimates. Its the standard errors of the estimated
coefficients that change, not the estimated coefficients
themselves. (Chapters 9 and 10)
12. Finally,
y, if in doubt,, rely
y on common sense and
economic theory, not on statistical tests

2011 Pearson Addison-Wesley. All rights reserved.

1-

The Ethical Econometrician


We think that there are two reasonable goals for
econometricians when estimating models:
1. Run as few different specifications as possible while
still attempting to avoid the major econometric problems
The only exception is sensitivity analysis, described in
Section 6.4

2. Report honestly the number and type of different


specifications estimated so that readers of the
research
h can evaluate
l t h
how much
h weight
i ht tto give
i tto your
results

2011 Pearson Addison-Wesley. All rights reserved.

1-

Writing Your Research Report


Most good research reports have a number of elements in
common:
A brief introduction that defines the dependent variable and states
the goals of the research
A short
h t review
i
off relevant
l
t previous
i
literature
lit
t
and
d research
h
An explanation of the specification of the equation (model):
Independent
I d
d t variables
i bl
functional forms
expected signs of (or other hypotheses about) the slope coefficients

A description of the data:


ge
generated
e a ed variables
a ab es
data sources
data irregularities (if any)

2011 Pearson Addison-Wesley. All rights reserved.

1-

Writing Your Research Report


(cont.)

A presentation of each estimated specification, using our standard


documentation format
If you estimate
i
more than
h one specification,
ifi i
b
be sure to explain
l i which
hi h one iis
best (and why!)

A careful analysis of the regression results:


discussion of any econometric problems encountered
complete documentation of all:
equations estimated
tests run

A short summary/conclusion that includes any policy


recommendations
d ti
or suggestions
ti
f further
for
f th research
h
A bibliography
pp
that includes all data,, all regression
g
runs,, and all relevant
An appendix
computer output

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.2a
Regression Users Checklist

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.2b
Regression Users Checklist

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.2c
Regression Users Checklist

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.2d
Regression Users Checklist

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.3a
Regression Users Guide

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.3b
Regression Users Guide

2011 Pearson Addison-Wesley. All rights reserved.

1-

Table 11.3c
Regression Users Guide

2011 Pearson Addison-Wesley. All rights reserved.

1-

Key Terms from Chapter 11
- Choosing a research topic
- Data collection
- Missing data
- Surveys
- Panel data
- The 10 Commandments of Applied Econometrics
- What to Check If You Get an Unexpected Sign
- A Dozen Practical Tips Worth Reiterating
- The Ethical Econometrician
- Writing your research report
- A Regression User's Checklist
- A Regression User's Guide

Chapter 12
Time-Series Models

Dynamic Models: Distributed Lag Models
- An (ad hoc) distributed lag model explains the current value of Y as a function of current and past values of X, thus distributing the impact of X over a number of time periods
- For example, we might be interested in the impact of a change in the money supply (X) on GDP (Y) and model this as:

Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt   (12.2)

- Potential issues from estimating Equation 12.2 with OLS:
1. The various lagged values of X are likely to be severely multicollinear, making coefficient estimates imprecise
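The ad hoc distributed lag in Equation 12.2 can be estimated with plain OLS once the lagged regressors are built. A minimal sketch on simulated data (the series, the lag length p = 2, and the "true" weights are all hypothetical, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regressor and a response built with true lag weights
# [0.6, 0.3, 0.1] -- every number here is illustrative.
T, p = 200, 2
x = rng.normal(size=T + p)
y = 1.0 + 0.6 * x[2:T + 2] + 0.3 * x[1:T + 1] + 0.1 * x[0:T]
y = y + rng.normal(scale=0.1, size=T)

# Regressor matrix [1, X_t, X_t-1, X_t-2]; observation t maps to x[t + p].
X = np.column_stack([np.ones(T)] + [x[p - i:T + p - i] for i in range(p + 1)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef.round(2))  # intercept, then estimates of beta_0, beta_1, beta_2
```

With a longer lag length and a persistent X, the lagged columns become highly collinear and the individual estimates turn imprecise, which is exactly the first problem listed above.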

Dynamic Models: Distributed Lag Models (cont.)
2. In large part because of this multicollinearity, there is no guarantee that the estimated coefficients will follow the smoothly declining pattern that economic theory would suggest. Instead, it's quite typical to get a jagged pattern that does not decline smoothly
3. The degrees of freedom tend to decrease, sometimes substantially, since we have to:
a. estimate a coefficient for each lagged X, thus increasing K and lowering the degrees of freedom (N − K − 1)
b. decrease the sample size by one for each lagged X, thus lowering the number of observations, N, and therefore the degrees of freedom (unless data for lagged Xs outside the sample are available)

What Is a Dynamic Model?
- The simplest dynamic model is:

Yt = α0 + β0Xt + λYt-1 + ut   (12.3)

- Note that Y appears on the left-hand side as Yt and on the right-hand side as Yt-1. It's this difference in time period that makes the equation dynamic
- Note that there is an important connection between a dynamic model such as Equation 12.3 and a distributed lag model such as Equation 12.2

What Is a Dynamic Model? (cont.)

Yt = α0 + β0Xt + β1Xt-1 + β2Xt-2 + ... + βpXt-p + εt   (12.2)

where:
β1 = λβ0
β2 = λ²β0
β3 = λ³β0
...
βp = λ^p β0   (12.8)

As long as λ is between 0 and 1, these coefficients will indeed smoothly decline, as shown in Figure 12.1
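Under Equation 12.8 the implied lag weights form a geometric series, which is easy to verify numerically (β0 and λ below are illustrative values, not estimates from any data):

```python
import numpy as np

beta0, lam = 0.6, 0.5                  # illustrative values with 0 < lambda < 1
weights = beta0 * lam ** np.arange(6)  # beta_i = lambda**i * beta_0
print(weights)  # 0.6, 0.3, 0.15, 0.075, 0.0375, 0.01875 -- a smooth decline
```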

Figure 12.1 Geometric Weighting Schemes for Various Dynamic Models

Serial Correlation and Dynamic Models
- The consequences of serial correlation depend crucially on the type of model in question:
1. Ad hoc distributed lag models: serial correlation has the effects outlined in Section 9.2:
  - causes no bias in the OLS coefficients themselves
  - causes OLS to no longer be the minimum variance unbiased estimator
  - causes the standard errors to be biased
2. Dynamic models: now serial correlation causes bias in the coefficients produced by OLS
- Compounding all this is the fact that the consequences, detection, and remedies for serial correlation that we discussed in Chapter 9 are all either incorrect or need to be modified in the presence of a lagged dependent variable
- We will now discuss the issues of testing and correcting for serial correlation in dynamic models in a bit more detail

Testing for Serial Correlation in Dynamic Models
Using the Lagrange Multiplier to test for serial correlation in a typical dynamic model involves three steps:
1. Obtain the residuals of the estimated equation:

et = Yt − Ŷt = Yt − α̂0 − β̂0Xt − λ̂Yt-1

2. Use these residuals as the dependent variable in an auxiliary regression that includes as independent variables all those on the right-hand side of the original equation as well as the lagged residuals

Testing for Serial Correlation in Dynamic Models (cont.)
3. Estimate Equation 12.18 using OLS and then test the null hypothesis that a3 = 0 with the following test statistic:

LM = N·R²   (12.19)

where N is the sample size and R² is the unadjusted coefficient of determination, both of the auxiliary equation, Equation 12.18
- For large samples, LM has a chi-square distribution with degrees of freedom equal to the number of restrictions in the null hypothesis (in this case, one)
- If LM is greater than the critical chi-square value from Statistical Table B-8, then we reject the null hypothesis that a3 = 0 and conclude that there is indeed serial correlation in the original equation
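The three steps of the Lagrange Multiplier test can be sketched in a few lines. This is a bare-bones illustration on simulated data (one lagged residual, hence one restriction), not a full testing routine:

```python
import numpy as np

def lm_test(y, X):
    # Step 1: OLS residuals of the original equation.
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    # Step 2: auxiliary regression of e_t on the regressors and e_{t-1}.
    Z = np.column_stack([X[1:], e[:-1]])
    a, *_ = np.linalg.lstsq(Z, e[1:], rcond=None)
    u = e[1:] - Z @ a
    dev = e[1:] - e[1:].mean()
    r2 = 1.0 - (u @ u) / (dev @ dev)
    # Step 3: LM = N * R-squared; compare with chi-square(1), 3.84 at 5%.
    return len(u) * r2

rng = np.random.default_rng(1)
T = 300
x = rng.normal(size=T)
X = np.column_stack([np.ones(T), x])
y = 1.0 + 2.0 * x + rng.normal(size=T)   # errors with no serial correlation
print(lm_test(y, X))                     # expect a value well below 3.84
```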

Correcting for Serial Correlation in Dynamic Models
There are essentially three strategies for attempting to rid a dynamic model of serial correlation:
- Improving the specification: only relevant if the serial correlation is impure
- Instrumental variables: substituting an instrument (a variable that is highly correlated with Yt-1 but is uncorrelated with ut) for Yt-1 in the original equation effectively eliminates the correlation between Yt-1 and ut. Problem: good instruments are hard to come by (also see Section 14.3)
- Modified GLS: a technique similar to the GLS procedure outlined in Section 9.4. Potential issues: the sample must be large, and the standard

Granger Causality
- Granger causality, or precedence, is a circumstance in which one time-series variable consistently and predictably changes before another variable
- A word of caution: even if one variable precedes (Granger-causes) another, this does not mean that the first variable causes the other to change
- There are several tests for Granger causality. They all involve distributed lag models in one form or another, however
- We'll discuss an expanded version of a test originally developed by Granger

Granger Causality (cont.)
Granger suggested that to see if A Granger-caused Y, we should run:

Yt = β0 + β1Yt-1 + ... + βpYt-p + α1At-1 + ... + αpAt-p + εt   (12.20)

and test the null hypothesis that the coefficients of the lagged As (the αs) jointly equal zero
- If we can reject this null hypothesis using the F-test, then we have evidence that A Granger-causes Y
- Note that if p = 1, Equation 12.20 is similar to the dynamic model, Equation 12.3
- Applications of this test involve running two Granger tests, one in each direction

Granger Causality (cont.)
That is, run Equation 12.20 and also run:

At = β0 + β1At-1 + ... + βpAt-p + α1Yt-1 + ... + αpYt-p + εt   (12.21)

testing for Granger causality in both directions by testing the null hypothesis that the coefficients of the lagged Ys (again, the αs) jointly equal zero
If the F-test is significant for Equation 12.20 but not for Equation 12.21, then we can conclude that A Granger-causes Y
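A stripped-down version of the F-test behind Equations 12.20 and 12.21, on simulated series where A leads Y by construction (the lag length p and all coefficients are illustrative):

```python
import numpy as np
from scipy import stats

def granger_f(y, a, p=2):
    """F-test of H0: the lagged A's jointly add nothing to an AR(p) of Y."""
    T = len(y)
    lag = lambda s, k: s[p - k:T - k]
    # Restricted model: lags of Y only.  Unrestricted: add lags of A.
    Xr = np.column_stack([np.ones(T - p)] + [lag(y, k) for k in range(1, p + 1)])
    Xu = np.column_stack([Xr] + [lag(a, k) for k in range(1, p + 1)])

    def ssr(X):
        b, *_ = np.linalg.lstsq(X, lag(y, 0), rcond=None)
        u = lag(y, 0) - X @ b
        return u @ u

    df = (T - p) - Xu.shape[1]
    f = ((ssr(Xr) - ssr(Xu)) / p) / (ssr(Xu) / df)
    return f, stats.f.sf(f, p, df)

rng = np.random.default_rng(2)
T = 400
a = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.4 * a[t - 1] + 0.5 * rng.normal()
f, pval = granger_f(y, a)
print(f, pval)  # a small p-value is evidence that A Granger-causes Y
```

As the slide prescribes, run it in both directions (swap y and a) before concluding anything.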

Spurious Correlation and Nonstationarity
- Independent variables can appear to be more significant than they actually are if they have the same underlying trend as the dependent variable
- Example: in a country with rampant inflation, almost any nominal variable will appear to be highly correlated with all other nominal variables. Why? Nominal variables are unadjusted for inflation, so every nominal variable will have a powerful inflationary component
- Such a problem is an example of spurious correlation: a strong relationship between two or more variables that is not caused by a real underlying causal relationship
- If you run a regression in which the dependent variable and one or more independent variables are spuriously correlated, the result is a spurious regression, and the t-scores and overall fit of such spurious regressions are likely to be overstated and untrustworthy

Stationary and Nonstationary Time Series
A time-series variable, Xt, is stationary if:
1. the mean of Xt is constant over time,
2. the variance of Xt is constant over time, and
3. the simple correlation coefficient between Xt and Xt-k depends on the length of the lag (k) but on no other variable (for all k)
- If one or more of these properties is not met, then Xt is nonstationary
- If a series is nonstationary, that problem is often referred to as nonstationarity

Stationary and Nonstationary Time Series (cont.)
- To get a better understanding of these issues, consider the case where Yt is generated by an equation that includes only past values of itself (an autoregressive equation):

Yt = γYt-1 + vt   (12.22)

where vt is a classical error term
- Can you see that if |γ| < 1, then the expected value of Yt will eventually approach 0 (and therefore be stationary) as the sample size gets bigger and bigger? (Remember, since vt is a classical error term, its expected value = 0)
- Similarly, can you see that if |γ| > 1, then the expected value of Yt will continuously increase, making Yt nonstationary?
- This is nonstationarity due to a trend, but it still can cause spurious regression results

Stationary and Nonstationary Time Series (cont.)
Most importantly, what about if |γ| = 1? In this case:

Yt = Yt-1 + vt   (12.23)

- This is a random walk: the expected value of Yt does not converge on any value, meaning that it is nonstationary
- This circumstance, where γ = 1 in Equation 12.23 (or similar equations), is called a unit root
- If a variable has a unit root, then Equation 12.23 holds, and the variable follows a random walk and is nonstationary
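The contrast between |γ| < 1 and γ = 1 in Equation 12.22 is easy to see by simulation (the series length and γ values below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1000
v = rng.normal(size=T)       # classical error term, shared by both series

def ar1(gamma):
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = gamma * y[t - 1] + v[t]
    return y

stationary = ar1(0.7)        # |gamma| < 1: hovers around its mean of zero
walk = ar1(1.0)              # gamma = 1: a random walk, wanders without limit
print(stationary.var(), walk.var())  # the walk's dispersion is far larger
```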

The Dickey-Fuller Test
From the previous discussion of stationarity and unit roots, it makes sense to estimate Equation 12.22:

Yt = γYt-1 + vt   (12.22)

and then determine if |γ| < 1 to see if Y is stationary
This is almost exactly how the Dickey-Fuller test works:
1. Subtract Yt-1 from both sides of Equation 12.22, yielding:

(Yt − Yt-1) = (γ − 1)Yt-1 + vt   (12.26)

The Dickey-Fuller Test (cont.)
If we define ΔYt = Yt − Yt-1, then we have the simplest form of the Dickey-Fuller test:

ΔYt = β1Yt-1 + vt   (12.27)

where β1 = γ − 1
Note: alternative Dickey-Fuller tests additionally include a constant and/or a constant and a trend term
2. Set up the test hypotheses:
H0: β1 = 0 (unit root)
HA: β1 < 0 (stationary)

The Dickey-Fuller Test (cont.)
3. Set up the decision rule:
- If β̂1 is statistically significantly less than 0, then we can reject the null hypothesis of nonstationarity
- If β̂1 is not statistically significantly less than 0, then we cannot reject the null hypothesis of nonstationarity
Note that the standard t-table does not apply to Dickey-Fuller tests
For the case of no constant and no trend (Equation 12.27), the large-sample critical values for tc are listed in Table 12.1
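The no-constant, no-trend Dickey-Fuller regression of Equation 12.27 is a one-regressor OLS; the only subtlety is comparing the t-statistic against the Dickey-Fuller critical values (about −1.95 at the 5% level for this variant) rather than the standard t-table. A sketch on simulated series:

```python
import numpy as np

def df_t_stat(y):
    """t-statistic on beta_1 in  delta-Y_t = beta_1 * Y_{t-1} + v_t."""
    dy, ylag = np.diff(y), y[:-1]
    b1 = (ylag @ dy) / (ylag @ ylag)
    u = dy - b1 * ylag
    se = np.sqrt((u @ u) / (len(dy) - 1) / (ylag @ ylag))
    return b1 / se

rng = np.random.default_rng(4)
walk = np.cumsum(rng.normal(size=500))      # has a unit root
ar = np.zeros(500)
for t in range(1, 500):
    ar[t] = 0.5 * ar[t - 1] + rng.normal()  # stationary, |gamma| < 1
print(df_t_stat(walk), df_t_stat(ar))  # only the AR series should reject
```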

Table 12.1 Large-Sample Critical Values for the Dickey-Fuller Test

Cointegration
- If the Dickey-Fuller test reveals nonstationarity, what should we do?
- The traditional approach has been to take first differences (ΔY = Yt − Yt-1 and ΔX = Xt − Xt-1) and use them in place of Yt and Xt in the regressions
- Issue: first-differencing basically throws away information about the possible equilibrium relationships between the variables
- Alternatively, one might want to test whether the time series are cointegrated, which means that even though individual variables might be nonstationary, it's possible for linear combinations of nonstationary variables to be stationary

Cointegration (cont.)
- To see how this works, consider Equation 12.24:

Yt = β0 + β1Xt + ut   (12.24)

- Assume that both Yt and Xt have a unit root
- Solving Equation 12.24 for ut, we get:

ut = Yt − β0 − β1Xt   (12.30)

- In Equation 12.24, ut is a function of two nonstationary variables, so ut might be expected also to be nonstationary
- Cointegration refers to the case where this is not so: Yt and Xt are both nonstationary, yet a linear combination of them, as given by Equation 12.24, is stationary
- How does this happen? This could happen if economic theory supports Equation 12.24 as an equilibrium

Cointegration (cont.)
- We thus see that if Xt and Yt are cointegrated, then OLS estimation of the coefficients in Equation 12.24 can avoid spurious results
- To determine if Xt and Yt are cointegrated, we begin with OLS estimation of Equation 12.24 and calculate the OLS residuals:

êt = Yt − β̂0 − β̂1Xt   (12.31)

- Next, perform a Dickey-Fuller test on the residuals. Remember to use the critical values from the Dickey-Fuller table!
- If we are able to reject the null hypothesis of a unit root in the residuals, we can conclude that Xt and Yt are cointegrated and our OLS estimates are not spurious
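The two-step cointegration check described above (OLS, then a Dickey-Fuller test on the residuals) can be sketched as follows, on simulated series that are cointegrated by construction:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
x = np.cumsum(rng.normal(size=T))        # X is a random walk (unit root)
y = 2.0 + 0.5 * x + rng.normal(size=T)   # Y shares X's trend: cointegrated

# Step 1: OLS of Y on X; keep the residuals (as in Equation 12.31).
Z = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(Z, y, rcond=None)
e = y - Z @ b

# Step 2: Dickey-Fuller regression on the residuals.  In practice the
# t-statistic is compared with the Dickey-Fuller critical values.
de, elag = np.diff(e), e[:-1]
g = (elag @ de) / (elag @ elag)
u = de - g * elag
t_stat = g / np.sqrt((u @ u) / (len(de) - 1) / (elag @ elag))
print(t_stat)  # strongly negative here: the residuals look stationary
```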

A Standard Sequence of Steps for Dealing with Nonstationary Time Series
1. Specify the model (lags vs. no lags, etc.)
2. Test all variables for nonstationarity (technically, unit roots) using the appropriate version of the Dickey-Fuller test
3. If the variables don't have unit roots, estimate the equation in its original units (Y and X)
4. If the variables have unit roots, test the residuals of the equation for cointegration using the Dickey-Fuller test
5. If the variables have unit roots but are not cointegrated, then change the functional form of the model to first differences (ΔX and ΔY) and estimate the equation
6. If the variables have unit roots and also are cointegrated, then estimate the equation in its original units

Key Terms from Chapter 12
- Dynamic model
- Ad hoc distributed lag model
- Lagrange Multiplier Serial Correlation test
- Granger causality
- Nonstationary series
- Dickey-Fuller test
- Unit root
- Random walk
- Cointegration

Chapter 13
Dummy Dependent Variable Techniques

The Linear Probability Model
The linear probability model is simply running OLS for a regression where the dependent variable is a dummy (i.e. binary) variable:

Di = β0 + β1X1i + β2X2i + εi   (13.1)

where Di is a dummy variable, and the Xs, βs, and ε are typical independent variables, regression coefficients, and an error term, respectively
The term linear probability model comes from the fact that the right side of the equation is linear while the expected value of the left side measures the probability that Di = 1
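A linear probability model is literally just OLS with a 0/1 dependent variable, which also makes its unboundedness easy to see. A sketch on simulated data (the sample size and data-generating process are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n)
d = (x + rng.logistic(size=n) > 0).astype(float)   # observed 0/1 choices

# The LPM: ordinary OLS with the dummy as the dependent variable.
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, d, rcond=None)
d_hat = X @ b
print(b.round(2))
print(d_hat.min(), d_hat.max())  # fitted values need not stay inside [0, 1]
```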

Problems with the Linear Probability Model
1. R² is not an accurate measure of overall fit:
- Di can equal only 1 or 0, but D̂i must move in a continuous fashion from one extreme to the other (as also illustrated in Figure 13.1)
- Hence, D̂i is likely to be quite different from Di for some range of Xi
- Thus, R² is likely to be much lower than 1 even if the model actually does an exceptional job of explaining the choices involved
- As an alternative, one can instead use R²p, a measure based on the percentage of the observations in the sample that a particular estimated equation explains correctly
- To use this approach, consider a D̂i > .5 to predict that Di = 1 and a D̂i < .5 to predict that Di = 0, and then simply compare these predictions with the actual Di
2. D̂i is not bounded by 0 and 1:
- The alternative binomial logit model, presented in Section 13.2, will address this issue

Figure 13.1 A Linear Probability Model

The Binomial Logit Model
- The binomial logit is an estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of the linear probability model
- It does so by using a variant of the cumulative logistic function:

ln[Di/(1 − Di)] = β0 + β1X1i + β2X2i + εi   (13.7)

- Logits cannot be estimated using OLS but are instead estimated by maximum likelihood (ML), an iterative estimation technique that is especially useful for equations that are nonlinear in the coefficients
- Again, for the logit model, D̂i is bounded by 1 and 0. This is illustrated by Figure 13.2
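Maximum-likelihood estimation of the logit can be sketched with a few Newton-Raphson iterations on the log-likelihood. This is a bare-bones illustration on simulated data, not a production estimator (no convergence checks or standard errors):

```python
import numpy as np

def fit_logit(X, d, iters=25):
    """ML logit via Newton-Raphson iterations."""
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))          # logistic probabilities
        grad = X.T @ (d - p)                        # score vector
        hess = X.T @ (X * (p * (1 - p))[:, None])   # information matrix
        b = b + np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(7)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_p = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))     # true coefficients (0.5, 1.5)
d = (rng.random(n) < true_p).astype(float)
print(fit_logit(X, d).round(2))  # close to the true (0.5, 1.5)
```

Note that the fitted probabilities 1/(1 + exp(−Xb)) are bounded by 0 and 1 by construction, which is the whole point of the logit.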

Figure 13.2 D̂i Is Bounded by 0 and 1 in a Binomial Logit Model

Interpreting Estimated Logit Coefficients
- The signs of the coefficients in the logit model have the same meaning as in the linear probability (i.e. OLS) model
- The interpretation of the magnitude of the coefficients differs, though: the dependent variable has changed dramatically
- That the marginal effects are not constant can be seen from Figure 13.2: the slope (i.e. the change in probability) of the graph of the logit changes as D̂i moves from 0 to 1!
- We'll consider three ways of interpreting logit coefficients meaningfully:

Interpreting Estimated Logit Coefficients (cont.)
1. Change an average observation:
- Create an average observation by plugging the means of all the independent variables into the estimated logit equation and calculating an average D̂i
- Then increase the independent variable of interest by one unit and recalculate the D̂i
- The difference between the two D̂is then gives the marginal effect
2. Use a partial derivative:
- Taking a derivative of the logit yields the result that the change in the expected value of Di caused by a one-unit increase in Xki, holding constant the other independent variables in the equation, equals β̂k·D̂i(1 − D̂i)
- To use this formula, simply plug in your estimates of β̂k and D̂i
- From this, again, the marginal impact of X does indeed depend on the value of D̂i
3. Use a rough estimate of 0.25:
- Plugging D̂i = 0.5 into the previous equation, we get the (more handy!) result that multiplying a logit coefficient by 0.25 (or dividing by 4) yields an equivalent linear probability model coefficient
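Methods 2 and 3 above reduce to simple arithmetic. The coefficient and the predicted probability below are purely illustrative numbers, not estimates from any real equation:

```python
beta_k = 1.5   # hypothetical logit slope estimate
d_hat = 0.69   # hypothetical predicted probability at an "average" observation

# Method 2: partial derivative, beta_k * D-hat * (1 - D-hat).
marginal = beta_k * d_hat * (1 - d_hat)
print(round(marginal, 3))   # 0.321

# Method 3: the 0.25 rule of thumb (exact only when D-hat = 0.5).
rough = beta_k * 0.25
print(rough)                # 0.375
```

The two answers differ because D̂i here is 0.69 rather than 0.5; the further D̂i is from 0.5, the cruder the 0.25 rule becomes.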

Other Dummy Dependent Variable Techniques
The Binomial Probit Model:
- Similar to the logit model, this is an estimation technique for equations with dummy dependent variables that avoids the unboundedness problem of the linear probability model
- However, rather than the logistic function, this model uses a variant of the cumulative normal distribution
The Multinomial Logit Model:
- Sometimes there are more than two qualitative choices available
- The sequential binary model estimates such choices as a series of binary decisions
- If the choice is made simultaneously, however, this is not appropriate
- The multinomial logit is developed specifically for the case with more than two qualitative choices where the choice is made simultaneously

2011 Pearson Addison-Wesley. All rights reserved.

Key Terms from Chapter 13
- Linear probability model
- R²p
- Binomial logit model
- The interpretation of an estimated logit coefficient
- Binomial probit model
- Sequential binary model
- Multinomial logit model

Chapter 14
Simultaneous Equations

The Nature of Simultaneous Equations Systems
In a typical econometric equation:

Yt = β0 + β1X1t + β2X2t + εt   (14.1)

a simultaneous system is one in which Y has an effect on at least one of the Xs in addition to the effect that the Xs have on Y
- Jargon here involves feedback effects, dual causality, and X and Y being jointly determined
- Such systems are usually modeled by distinguishing between variables that are simultaneously determined (the Ys, called endogenous variables) and those that are not (the Xs, called exogenous variables):

Y1t = α0 + α1Y2t + α2X1t + α3X2t + ε1t   (14.2)
Y2t = β0 + β1Y1t + β2X3t + β3X2t + ε2t   (14.3)

The Nature of Simultaneous Equations Systems (cont.)
- Equations 14.2 and 14.3 are examples of structural equations
- Structural equations characterize the underlying economic theory behind each endogenous variable by expressing it in terms of both endogenous and exogenous variables
- For example, Equations 14.2 and 14.3 could be a demand and a supply equation, respectively

The Nature of Simultaneous Equations Systems (cont.)
- The term predetermined variable includes all exogenous variables and lagged endogenous variables
- Predetermined implies that exogenous and lagged endogenous variables are determined outside the system of specified equations or prior to the current period
- The main problem with simultaneous systems is that they violate Classical Assumption III (the error term and each explanatory variable should be uncorrelated)

Reduced-Form Equations
- An alternative way of expressing a simultaneous equations system is through the use of reduced-form equations
- Reduced-form equations express a particular endogenous variable solely in terms of an error term and all the predetermined (exogenous plus lagged endogenous) variables in the simultaneous system

Reduced-Form Equations (cont.)
The reduced-form equations for the structural Equations 14.2 and 14.3 would thus be:

Y1t = π0 + π1X1t + π2X2t + π3X3t + v1t   (14.6)
Y2t = π4 + π5X1t + π6X2t + π7X3t + v2t   (14.7)

where the vs are stochastic error terms and the πs are called reduced-form coefficients

Reduced-Form Equations (cont.)
There are at least three reasons for using reduced-form equations:
1. Since the reduced-form equations have no inherent simultaneity, they do not violate Classical Assumption III. Therefore, they can be estimated with OLS without encountering the problems discussed in this chapter
2. The interpretation of the reduced-form coefficients as impact multipliers means that they have economic meaning and useful applications of their own
3. Reduced-form equations play a crucial role in Two-Stage Least Squares, the estimation technique most frequently used for simultaneous equations (discussed in Section 14.3)

The Bias of Ordinary Least Squares (OLS)
- Simultaneity bias refers to the fact that in a simultaneous system, the expected values of the OLS-estimated structural coefficients are not equal to the true βs, that is:

E(β̂) ≠ β   (14.10)

- The reason for this is that the two error terms of Equations 14.11 and 14.12 are correlated with the endogenous variables when they appear as explanatory variables
- As an example of how the application of OLS to simultaneous equations estimation causes bias, a Monte Carlo experiment was conducted for a supply and demand model
- As Figure 14.2 illustrates, the sampling distributions differed greatly from the true distributions defined in the Monte Carlo experiment

Figure 14.2 Sampling Distributions Showing Simultaneity Bias of OLS Estimates

What Is Two-Stage Least Squares?
- Two-Stage Least Squares (2SLS) helps mitigate simultaneity bias in simultaneous equation systems
- 2SLS requires a variable that is:
1. a good proxy for the endogenous variable
2. uncorrelated with the error term
- Such a variable is called an instrumental variable
- 2SLS essentially consists of the following two steps:

What Is Two-Stage Least Squares? (cont.)
STAGE ONE:
Run OLS on the reduced-form equations for each of the endogenous variables that appear as explanatory variables in the structural equations in the system
That is, estimate the reduced forms (using OLS):

Ŷ1t = π̂0 + π̂1X1t + π̂2X2t + π̂3X3t   (14.18)
Ŷ2t = π̂4 + π̂5X1t + π̂6X2t + π̂7X3t   (14.19)

What Is Two-Stage Least Squares? (cont.)
STAGE TWO:
Substitute the Ŷs from the reduced form for the Ys that appear on the right side (only) of the structural equations, and then estimate these revised structural equations with OLS
That is, estimate (using OLS):

Y1t = α0 + α1Ŷ2t + α2X1t + α3X2t + u1t   (14.20)
Y2t = β0 + β1Ŷ1t + β2X3t + β3X2t + u2t   (14.21)
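The two stages can be sketched directly. The system, its coefficients, and the sample size below are all made up for illustration; a real application would also need corrected stage-two standard errors:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
x1, x3 = rng.normal(size=(2, n))
e1, e2 = rng.normal(size=(2, n))

# A toy simultaneous system (all coefficients illustrative):
#   Y1 = 1 + 0.5*Y2 + 1.0*X1 + e1
#   Y2 = 2 - 0.4*Y1 + 1.0*X3 + e2
# Solve the system for the observed (jointly determined) Y1 and Y2.
y1 = (1 + 0.5 * (2 + x3 + e2) + x1 + e1) / (1 + 0.5 * 0.4)
y2 = 2 - 0.4 * y1 + x3 + e2

# Stage 1: OLS of Y2 on all predetermined variables; keep fitted values.
Z = np.column_stack([np.ones(n), x1, x3])
y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

# Stage 2: OLS of the structural equation with Y2-hat replacing Y2.
W = np.column_stack([np.ones(n), y2_hat, x1])
b_2sls, *_ = np.linalg.lstsq(W, y1, rcond=None)
print(b_2sls.round(2))  # close to the true (1, 0.5, 1)
```

Naive OLS of y1 on [1, y2, x1] would be biased here because y2 is correlated with e1; the fitted y2_hat is purged of that correlation.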

The Properties of Two-Stage Least Squares
1. 2SLS estimates are still biased in small samples, but consistent in large samples (they get closer to the true βs as N increases)
2. The bias in 2SLS for small samples typically is of the opposite sign of the bias in OLS
3. If the fit of the reduced-form equation is poor, then 2SLS will not rid the equation of bias even in a large sample
4. 2SLS estimates have increased variances and standard errors relative to OLS
- Note that Two-Stage Least Squares cannot be applied to an equation unless that equation is identified, however
- We therefore now turn to the issue of identification

What Is the Identification Problem?
- Identification is a precondition for the application of 2SLS to equations in simultaneous systems
- A structural equation is identified only when enough of the system's predetermined variables are omitted from the equation in question to allow that equation to be distinguished from all the others in the system
- Note that one equation in a simultaneous system might be identified and another might not
- Most simultaneous systems are fairly complicated, so econometricians need a general method by which to determine whether equations are identified
- The method typically used is the order condition of identification, to which we now turn

The Order Condition of Identification
- This is a systematic method of determining whether a particular equation in a simultaneous system has the potential to be identified
- If an equation can meet the order condition, then it is almost always identified
- We thus say that the order condition is a necessary but not sufficient condition of identification

The Order Condition of Identification (cont.)
THE ORDER CONDITION:
A necessary condition for an equation to be identified is that the number of predetermined (exogenous plus lagged endogenous) variables in the system be greater than or equal to the number of slope coefficients in the equation of interest
Or, in equation form, a structural equation meets the order condition if:

# predetermined variables (in the simultaneous system) ≥ # slope coefficients (in the equation)

Figure 14.1 Supply and Demand Simultaneous Equations
Figure 14.3 A Shifting Supply Curve
Figure 14.4 When Both Curves Shift
Table 14.1a Data for a Small Macromodel
Table 14.1b Data for a Small Macromodel

Key Terms from Chapter 14
- Endogenous variable
- Predetermined variable
- Structural equation
- Reduced-form equation
- Simultaneity bias
- Two-Stage Least Squares
- Identification
- Order condition for identification

Chapter 15
Forecasting

What Is Forecasting?
- In general, forecasting is the act of predicting the future
- In econometrics, forecasting is the estimation of the expected value of a dependent variable for observations that are not part of the same data set
- In most forecasts, the values being predicted are for time periods in the future, but cross-sectional predictions of values for countries or people not in the sample are also common
- To simplify terminology, the words prediction and forecast will be used interchangeably in this chapter. Some authors limit the use of the word forecast to out-of-sample prediction for a time series

What Is Forecasting? (cont.)
Econometric forecasting generally uses a single linear equation to predict or forecast
Our use of such an equation to make a forecast can be summarized into two steps:
1. Specify and estimate an equation that has as its dependent variable the item that we wish to forecast, obtaining:

Ŷt = β̂0 + β̂1Xt   (15.2)

What Is Forecasting? (cont.)
2. Obtain values for each of the independent variables for the observations for which we want a forecast and substitute them into our forecasting equation:

ŶT+1 = β̂0 + β̂1XT+1   (15.3)

Figure 15.1 illustrates two examples
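The two-step forecasting recipe above, in miniature (the data are simulated and the out-of-sample X value is simply assumed known):

```python
import numpy as np

rng = np.random.default_rng(9)
T = 100
x = rng.normal(size=T)
y = 3.0 + 2.0 * x + rng.normal(size=T)

# Step 1: estimate the forecasting equation over the sample period.
X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 2: substitute the forecast-period X into the fitted equation.
x_future = 1.5                      # assumed known with certainty
y_forecast = b[0] + b[1] * x_future
print(y_forecast)  # near 3 + 2*1.5 = 6
```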

Figure 15.1a Forecasting Examples
Figure 15.1b Forecasting Examples

More Complex Forecasting Problems
The forecasts generated in the previous section are quite simple, however, and most actual forecasting involves one or more additional questions, for example:
1. Unknown Xs: it is unrealistic to expect to know the values for the independent variables outside the sample. What happens when we don't know the values of the independent variables for the forecast period?
2. Serial Correlation: if there is serial correlation involved, the forecasting equation may be estimated with GLS. How should predictions be adjusted when forecasting equations are estimated with GLS?

More Complex Forecasting Problems (cont.)
3. Confidence Intervals: all the previous forecasts were single values, but such single values are almost never exactly right, so maybe it would be more helpful if we forecasted a confidence interval instead. How can we develop these confidence intervals?
4. Simultaneous Equations Models: as we saw in Chapter 14, many economic and business equations are part of simultaneous models. How can we use an independent variable to forecast a dependent variable when we know that a change in the value of the dependent variable will change, in turn, the value of the independent variable that we used to make the forecast?

Conditional Forecasting (Unknown X Values for the Forecast Period)
Unconditional forecast: all values of the independent variables are known with certainty
This is rare in practice
Conditional forecast: actual values of one or more of the independent variables are not known
This is the more common type of forecast

Conditional Forecasting (Unknown X Values for the Forecast Period) (cont.)
The careful selection of independent variables can sometimes help avoid the need for conditional forecasting
This opportunity can arise when the dependent variable can be expressed as a function of leading indicators:
A leading indicator is an independent variable the movements of which anticipate movements in the dependent variable
The best-known leading indicator, the Index of Leading Economic Indicators, is produced each month

Forecasting with Serially Correlated Error Terms
Recall from Chapter 9 that when serial correlation is severe, one remedy is to run Generalized Least Squares (GLS) as noted in Equation 9.18:
Y*t = β0(1 − ρ) + β1X*t + ut  (9.18)
If Equation 9.18 is estimated, the dependent variable will be:
Y*t = Yt − ρYt−1  (15.7)
Thus, if a GLS equation is used for forecasting, it will produce predictions of Y*T+1 rather than of YT+1
Such predictions thus will be of the wrong variable!


Forecasting with Serially Correlated Error Terms (cont.)
If forecasts are to be made with a GLS equation, Equation 9.18 should first be solved for YT before forecasting is attempted:
Yt = ρYt−1 + β0(1 − ρ) + β1(Xt − ρXt−1) + ut  (15.8)
Next, substitute T+1 for t (to forecast time period T+1) and insert estimates for the coefficients, ρs, and Xs into the equation to get:
ŶT+1 = ρ̂YT + β̂0(1 − ρ̂) + β̂1(XT+1 − ρ̂XT)  (15.9)
Equation 15.9 thus should be used for forecasting when an equation has been estimated with GLS to correct for serial correlation

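The substitution in Equation 15.9 can be sketched in a few lines of Python; every number below (the estimates ρ̂, β̂0, β̂1 and the X and Y values) is invented purely for illustration:

```python
# Hypothetical GLS estimates and data (illustrative values only)
rho_hat = 0.6      # estimated serial correlation coefficient
beta0_hat = 10.0   # estimated intercept
beta1_hat = 2.0    # estimated slope
Y_T = 50.0         # last observed value of the dependent variable
X_T = 18.0         # last observed value of the independent variable
X_T1 = 20.0        # assumed value of X in the forecast period T+1

# Equation 15.9: forecast Y (not Y*) one period ahead
Y_forecast = (rho_hat * Y_T
              + beta0_hat * (1 - rho_hat)
              + beta1_hat * (X_T1 - rho_hat * X_T))
print(Y_forecast)  # approximately 52.4
```

Note that with ρ̂ = 0 this collapses back to the simple forecast ŶT+1 = β̂0 + β̂1XT+1.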

Forecasting Confidence Intervals
The techniques we use to test hypotheses can also be adapted to create forecasting confidence intervals
Given a point forecast, ŶT+1, all we need to generate a confidence interval around that forecast are tc, the critical t-value (for the desired level of confidence), and SF, the estimated standard error of the forecast:
Confidence interval = ŶT+1 ± SF · tc  (15.11)
The critical t-value, tc, can be found in Statistical Table B-1 (for a two-tailed test with T − K − 1 degrees of freedom)


Forecasting Confidence Intervals (cont.)
Lastly, the standard error of the forecast, SF, for an equation with just one independent variable, equals the square root of the forecast error variance:
SF = √{s2 [1 + 1/T + (X̂T+1 − X̄)2 / Σ(Xt − X̄)2]}  (15.13)
where:
s2 = the estimated variance of the error term
T = the number of observations in the sample
X̂T+1 = the forecasted value of the single independent variable
X̄ = the arithmetic mean of the observed Xs in the sample
Figure 15.2 illustrates an example of a forecast confidence interval

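Equations 15.11 and 15.13 can be sketched as follows; every numeric input is a made-up illustration (including the critical t-value, taken as the 95 percent two-tailed value for 23 degrees of freedom):

```python
import math

# Illustrative inputs (all made-up numbers)
s2 = 4.0                              # estimated error variance
X = [float(i) for i in range(1, 26)]  # the 25 observed Xs in the sample
T = len(X)                            # number of observations
X_hat = 30.0                          # forecasted X for period T+1
Y_hat = 100.0                         # point forecast of Y for period T+1
t_c = 2.069                           # 95% two-tailed t-value, 23 d.f.

X_bar = sum(X) / T
ss_x = sum((x - X_bar) ** 2 for x in X)

# Equation 15.13: the standard error of the forecast
S_F = math.sqrt(s2 * (1 + 1 / T + (X_hat - X_bar) ** 2 / ss_x))

# Equation 15.11: the 95% forecast confidence interval
lower, upper = Y_hat - t_c * S_F, Y_hat + t_c * S_F
print(S_F, lower, upper)
```

Note how SF grows as X̂T+1 moves away from X̄, which produces the funnel shape seen in Figure 15.2.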

Figure 15.2
A Confidence Interval for


Forecasting with Simultaneous Equations Systems
How should forecasting be done in the context of a simultaneous model?
There are two approaches to answering this question, depending on whether there are lagged endogenous variables on the right-hand side of any of the equations in the system:


Forecasting with Simultaneous Equations Systems (cont.)
1. No lagged endogenous variables in the system:
the reduced-form equation for the particular endogenous variable can be used for forecasting because it represents the simultaneous solution of the system for the endogenous variable being forecasted
2. Lagged endogenous variables in the system:
then the approach must be altered to take into account the dynamic interaction caused by the lagged endogenous variables
For simple models, this sometimes can be done by substituting for the lagged endogenous variables where they appear in the reduced-form equations
If such a manipulation is difficult, however, then a technique called simulation analysis can be used


ARIMA Models
ARIMA is a highly refined curve-fitting device that uses current and past values of the dependent variable to produce often accurate short-term forecasts of that variable
Examples of such forecasts are stock market price predictions created by brokerage analysts (called "chartists" or "technicians") based entirely on past patterns of movement of the stock prices
If ARIMA models thus essentially ignore economic theory (by ignoring traditional explanatory variables), why use them?
The use of ARIMA is appropriate when:
little or nothing is known about the dependent variable being forecasted,
the independent variables known to be important cannot be forecasted effectively, or
all that is needed is a one- or two-period forecast


ARIMA Models (cont.)
The ARIMA approach combines two different specifications (called processes) into one equation:
1. An autoregressive process (AR):
expresses a dependent variable as a function of past values of the dependent variable
This is similar to the serial correlation error term function of Chapter 9 and to the dynamic model of Chapter 12
2. A moving-average process (MA):
expresses a dependent variable as a function of past values of the error term
Such a function is a moving average of past error term observations that can be added to the mean of Y to obtain a moving average of past values of Y

ARIMA Models (cont.)
To create an ARIMA model, we begin with an econometric equation with no independent variables:
Yt = β0 + εt
and then add to it both the autoregressive and moving-average processes:
Yt = β0 + θ1Yt−1 + ... + θpYt−p + εt + φ1εt−1 + ... + φqεt−q  (15.17)
where the θs and the φs are the coefficients of the autoregressive and moving-average processes, respectively, and p and q are the number of past values used of Y and ε, respectively


ARIMA Models (cont.)
Before this equation can be applied to a time series, however, it must be ensured that the time series is stationary, as defined in Section 12.4
For example, a non-stationary series can often be converted into a stationary one by taking the first difference:
Y*t = ΔYt = Yt − Yt−1  (15.18)
If the first differences do not produce a stationary series, then first differences of this first-differenced series can be taken, i.e. a second-difference transformation:
Y**t = ΔY*t = Y*t − Y*t−1  (15.19)
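The differencing transformations, and the conversion of a differenced forecast back into Y terms, can be sketched on a made-up series (the forecasted first difference is an arbitrary illustrative number):

```python
# A short made-up series with an upward trend
Y = [10.0, 12.0, 15.0, 19.0, 24.0]

# Equation 15.18: first differences
Y_star = [Y[t] - Y[t - 1] for t in range(1, len(Y))]

# Equation 15.19: second differences
Y_star2 = [Y_star[t] - Y_star[t - 1] for t in range(1, len(Y_star))]

# Equation 15.20 with d = 1: convert a forecast of Y* back into Y terms.
# Suppose some model forecasts the next first difference to be 6.0:
Y_star_forecast = 6.0
Y_forecast = Y_star_forecast + Y[-1]
print(Y_star, Y_star2, Y_forecast)
```

Here the second differences are constant, so one more difference of this series would be stationary around zero.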

ARIMA Models (cont.)
If a forecast of Y* or Y** is made, then it must be converted back into Y terms
For example, if d = 1 (where d is the number of differences taken to make Y stationary), then:
ŶT+1 = Ŷ*T+1 + YT  (15.20)
This conversion process is similar to integration in mathematics, so the "I" in ARIMA stands for "integrated"
ARIMA thus stands for Auto-Regressive Integrated Moving Average
An ARIMA model with p, d, and q specified is usually denoted as ARIMA(p,d,q), with the specific integers chosen inserted for p, d, and q
If the original series is stationary and d therefore equals 0, this is sometimes shortened to ARMA


Key Terms from Chapter 15


Unconditional forecast
Conditional forecast
Leading indicator
Confidence interval (of forecast)
Autoregressive process
Moving-average process
ARIMA(p,d,q)


Chapter 16

Experimental and Panel Data


Random Assignment Experiments
When medical researchers want to examine the effect of a new drug, they use an experimental design called a random assignment experiment
In such experiments, two groups are chosen randomly:
1. Treatment group: receives the treatment (a specific medicine, say)
2. Control group: receives a harmless, ineffective placebo
The resulting equation is:
OUTCOMEi = β0 + β1TREATMENTi + εi  (16.1)
where:
OUTCOMEi = a measure of the desired outcome in the ith individual
TREATMENTi = a dummy variable equal to 1 for individuals in the treatment group and 0 for individuals in the control group

Random Assignment Experiments (cont.)
But random assignment can't always control for all possible other factors, though sometimes we may be able to identify some of these factors and add them to our equation
Let's say that the treatment is job training:
Suppose that random assignment, by chance, results in one group having more males and being slightly older than the other group
If gender and age matter in determining earnings, then we can control for the different composition of the two groups by including gender and age in our regression equation:
OUTCOMEi = β0 + β1TREATMENTi + β2X1i + β3X2i + εi  (16.2)
where: X1 = dummy variable for the individual's gender
X2 = the individual's age

Random Assignment Experiments (cont.)
Unfortunately, random assignment experiments are not common in economics because they are subject to problems that typically do not plague medical experiments, e.g.:
1. Non-Random Samples:
Most subjects in economic experiments are volunteers, and samples of volunteers often aren't random and therefore may not be representative of the overall population
As a result, our conclusions may not apply to everyone
2. Unobservable Heterogeneity:
In Equation 16.2, we added observable factors to the equation to avoid omitted variable bias, but not all omitted factors in economics are observable
This unobservable omitted variable problem is called unobserved heterogeneity

Random Assignment Experiments (cont.)
3. The Hawthorne Effect:
Human subjects typically know that they're being studied, and they usually know whether they're in the treatment group or the control group
The fact that human subjects know that they're being observed sometimes can change their behavior, and this change in behavior could clearly change the results of the experiment
4. Impossible Experiments:
It's often impossible (or unethical) to run a random assignment experiment in economics
Think about how difficult it would be to use a random assignment experiment to study the impact of marriage on earnings!


Natural Experiments
Natural experiments (or quasi-experiments) are similar to random assignment experiments, except:
observations fall into treatment and control groups "naturally" (because of an exogenous event) instead of being randomly assigned by the researcher
By "exogenous event" is meant that the natural event must not be under the control of either of the two groups


Natural Experiments (cont.)
The appropriate regression equation for such a natural experiment is:
ΔOUTCOMEi = β0 + β1TREATMENTi + β2X1i + β3X2i + εi  (16.3)
where:
ΔOUTCOMEi is defined as the outcome after the treatment minus the outcome before the treatment for the ith observation
β1 is called the difference-in-differences estimator, and it measures the difference between the change in the treatment group and the change in the control group, holding constant X1 and X2
Figure 16.1 illustrates an example of a natural experiment

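The difference-in-differences idea can be computed by hand; the before/after outcomes below are hypothetical, and the sketch omits the X1 and X2 controls (so it corresponds to Equation 16.3 with no Xs):

```python
# Hypothetical before/after outcomes for the two groups (made-up data)
treatment_before = [10.0, 12.0, 11.0]
treatment_after = [15.0, 18.0, 16.0]
control_before = [9.0, 11.0, 10.0]
control_after = [11.0, 13.0, 12.0]

def mean(xs):
    return sum(xs) / len(xs)

# Average change in each group
delta_treatment = mean(treatment_after) - mean(treatment_before)
delta_control = mean(control_after) - mean(control_before)

# Difference in differences: the treatment group's change net of the
# change the control group experienced anyway
did = delta_treatment - delta_control
print(did)
```

The control group's change serves as the counterfactual: whatever would have happened to the treatment group without the treatment.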

Figure 16.1: Treatment and Control Groups for Los Angeles


What Are Panel Data?
Panel (or longitudinal) data combine time-series and cross-sectional data such that observations on the same variables from the same cross-sectional sample are followed over two or more different time periods
Why use panel data? At least three reasons; using panel data:
1. certainly will increase sample sizes!
2. can help provide insights into analytical questions that can't be answered by using time-series or cross-sectional data alone:
Allows determining whether the same people are unemployed year after year or whether different individuals are unemployed in different years
3. often allows researchers to avoid omitted variable problems that otherwise would cause bias in cross-sectional studies

What Are Panel Data? (cont.)
There are four different kinds of variables that we encounter when we use panel data:
1. Variables that can differ between individuals but don't change over time:
e.g., gender, ethnicity, and race
2. Variables that change over time but are the same for all individuals in a given time period:
e.g., the retail price index and the national unemployment rate
3. Variables that vary both over time and between individuals:
e.g., income and marital status
4. Trend variables that vary in predictable ways:
e.g., an individual's age

The Fixed Effects Model
There are several alternative panel data estimation procedures
Most researchers use the fixed effects model, which allows each cross-sectional unit to have a different intercept:
Yit = β0 + β1Xit + β2D2i + ... + βNDNi + vit  (16.4)
where:
D2 = intercept dummy equal to 1 for the second cross-sectional entity and 0 otherwise
DN = intercept dummy equal to 1 for the Nth cross-sectional entity and 0 otherwise
Note that Y, X, and v have two subscripts!


The Fixed Effects Model (cont.)
One major advantage of the fixed effects model is that it avoids bias due to omitted variables that don't change over time
e.g., race or gender
Such time-invariant omitted variables often are referred to as unobserved heterogeneity or a fixed effect
To understand how this works, consider what Equation 16.4 would look like with only two years' worth of data:
Yit = β0 + β1Xit + β2D2i + vit  (16.5)
Let's decompose the error term, vit, into two components, a classical error term (εit) and the unobserved impact of the time-invariant omitted variables (ai):
vit = εit + ai  (16.6)

The Fixed Effects Model (cont.)
If we substitute Equation 16.6 into Equation 16.5, we get:
Yit = β0 + β1Xit + β2D2i + εit + ai  (16.7)
Next, average Equation 16.7 over time for each observation i, thus producing:
Ȳi = β0 + β1X̄i + β2D2i + ε̄i + ai  (16.8)
where the bar over a variable indicates the mean of that variable across time
Note that ai, β2D2i, and β0 don't have bars over them because they're constant over time

The Fixed Effects Model (cont.)
If we now subtract Equation 16.8 from Equation 16.7, we get:
(Yit − Ȳi) = β1(Xit − X̄i) + (εit − ε̄i)
Note that ai, β2D2i, and β0 are subtracted out because they're in both equations
We've therefore shown that estimating panel data with the fixed effects model does indeed drop the ai out of the equation
Hence, the fixed effects model will not experience bias due to time-invariant omitted variables!
Example: The death penalty and the murder rate:
Figures 16.2 and 16.3 illustrate the importance of the fixed effects model: the unlikely (positive) result from the cross-section model is reversed by the fixed effects model!

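The demeaning (within) transformation derived above can be verified numerically; the two-unit panel below is invented, with different intercepts (fixed effects) but a common slope of 2 on X:

```python
# An invented two-unit, three-period panel. Each unit has its own
# intercept (fixed effect a_i) but the same slope of 2 on X.
data = {
    1: {"X": [1.0, 2.0, 3.0], "Y": [5.0, 7.0, 9.0]},
    2: {"X": [1.0, 2.0, 3.0], "Y": [12.0, 14.0, 16.0]},
}

def mean(xs):
    return sum(xs) / len(xs)

# Within transformation: subtract each unit's time mean, which wipes
# out a_i exactly as the subtraction of Equation 16.8 from 16.7 shows
x_dm, y_dm = [], []
for unit in data.values():
    xb, yb = mean(unit["X"]), mean(unit["Y"])
    x_dm += [x - xb for x in unit["X"]]
    y_dm += [y - yb for y in unit["Y"]]

# OLS on the demeaned data recovers the slope despite differing intercepts
beta1 = sum(x * y for x, y in zip(x_dm, y_dm)) / sum(x * x for x in x_dm)
print(beta1)  # 2.0
```

A naive regression that pooled the raw data without the intercept dummies would mix the between-unit intercept differences into the slope estimate.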

Figure 16.2: In a Single-Year Cross-Sectional Model, the Murder Rate Appears to Increase with Executions


Figure 16.3: In a Panel Data Model, the Murder Rate Decreases with Executions


The Random Effects Model
Recall that the fixed effects model is based on the assumption that each cross-sectional unit has its own intercept
The random effects model instead is based on the assumption that the intercept for each cross-sectional unit is drawn from a distribution (that is centered around a mean intercept)
Thus each intercept is a random draw from an intercept distribution and therefore is independent of the error term for any particular observation
Hence the term random effects model


The Random Effects Model (cont.)
Advantages of the random effects model:
1. More degrees of freedom than a fixed effects model
This is because rather than estimating an intercept for virtually every cross-sectional unit, all we need to do is to estimate the parameters that describe the distribution of the intercepts
2. Can now also estimate time-invariant explanatory variables (like race or gender)
Disadvantages of the random effects model:
1. Most importantly, the random effects estimator requires us to assume that ai is uncorrelated with the independent variables, the Xs, if we're going to avoid omitted variable bias
This may be an overly strong assumption in many cases


Choosing Between Fixed and Random Effects
One key is the nature of the relationship between ai and the Xs:
If they're likely to be correlated, then it makes sense to use the fixed effects model
If not, then it makes sense to use the random effects model
Can also use the Hausman test to examine whether there is correlation between ai and the Xs
Essentially, this procedure tests to see whether the regression coefficients under the fixed effects and random effects models are statistically different from each other
If they are different, then the fixed effects model is preferred
If they are not different, then the random effects model is preferred (or estimates of both the fixed effects and random effects models are provided)

Table 16.1a


Table 16.1b


Table 16.1c


Table 16.1d


Table 16.1e


Key Terms from Chapter 16


Treatment group
Control group
Differences estimator
Difference in differences
Unobserved heterogeneity
The Hawthorne effect
Panel data
The fixed effects model
The random effects model
Hausman test

Chapter 17

Statistical Principles


Probability

A random variable X is a variable whose numerical value is


determined by chance, the outcome of a random phenomenon
A discrete random variable has a countable number of possible values,
such as 0, 1, and 2
A continuous random variable, such as time and distance, can take on any
value in an interval

A probability distribution P[Xi] for a discrete random variable X


assigns probabilities to the possible values X1, X2, and so on

For example, when a fair six-sided die is rolled, there are six equally likely outcomes, each with a 1/6 probability of occurring
Figure 17.1 shows this probability distribution


Figure 17.1 Probability


Distribution for a Six-Sided Die


Mean, Variance, and Standard Deviation
The expected value (or mean) of a discrete random variable X is a weighted average of all possible values of X, using the probability of each X value as weights:
μ = E[X] = Σ Xi P[Xi]  (17.1)
The variance of a discrete random variable X is a weighted average, for all possible values of X, of the squared difference between X and its expected value, using the probability of each X value as weights:
σ2 = E[(X − μ)2] = Σ (Xi − μ)2 P[Xi]  (17.2)
The standard deviation σ is the square root of the variance

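Equations 17.1 and 17.2 applied to the fair six-sided die, in Python:

```python
# Fair six-sided die
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6

# Equation 17.1: expected value (mean)
mu = sum(x * p for x, p in zip(values, probs))

# Equation 17.2: variance, and its square root, the standard deviation
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
sigma = var ** 0.5
print(mu, var, sigma)
```

The die's mean is 3.5 and its variance is 35/12, so σ is roughly 1.71.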

Continuous Random Variables
Our examples to this point have involved discrete random variables, for which we can count the number of possible outcomes:
The coin can be heads or tails; the die can be 1, 2, 3, 4, 5, or 6
For continuous random variables, however, the outcome can be any value in a given interval
For example, Figure 17.2 shows a spinner for randomly selecting a point on a circle
A continuous probability density curve shows the probability that the outcome is in a specified interval as the corresponding area under the curve
This is illustrated for the case of the spinner in Figure 17.3


Figure 17.2
Pick a Number, Any Number


Figure 17.3 A Continuous Probability


Distribution for the Spinner


Standardized Variables
To standardize a random variable X, we subtract its mean μ and then divide by its standard deviation σ:
Z = (X − μ) / σ  (17.3)
No matter what the initial units of X, the standardized random variable Z has a mean of 0 and a standard deviation of 1
The standardized variable Z measures how many standard deviations X is above or below its mean:
If X is equal to its mean, Z is equal to 0
If X is one standard deviation above its mean, Z is equal to 1
If X is two standard deviations below its mean, Z is equal to −2
Figures 17.4 and 17.5 illustrate this for the case of dice and fair coin flips, respectively

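A short sketch of Equation 17.3, using the die's mean of 3.5 and standard deviation √(35/12) computed earlier:

```python
# Standardizing the die roll, using mu and sigma from Equation 17.2
mu = 3.5
sigma = (35 / 12) ** 0.5

# Equation 17.3: Z = (X - mu) / sigma, applied to each face
Z = [(x - mu) / sigma for x in range(1, 7)]

# The standardized variable has mean 0 and standard deviation 1
z_mean = sum(Z) / len(Z)
z_var = sum(z ** 2 for z in Z) / len(Z)
print(z_mean, z_var)
```

Whatever the original units, this transformation always yields mean 0 and variance 1, which is what makes the Z scale comparable across variables.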

Figure 17.4a Probability Distribution for


Six-Sided Dice, Using Standardized Z


Figure 17.4b Probability Distribution for


Six-Sided Dice, Using Standardized Z


Figure 17.4c Probability Distribution for


Six-Sided Dice, Using Standardized Z


Figure 17.5a Probability Distribution for


Fair Coin Flips, Using Standardized Z


Figure 17.5b Probability Distribution for


Fair Coin Flips, Using Standardized Z


Figure 17.5c Probability Distribution for


Fair Coin Flips, Using Standardized Z


The Normal Distribution
The density curve for the normal distribution is graphed in Figure 17.6
The probability that the value of Z will be in a specified interval is given by the corresponding area under this curve
These areas can be determined by consulting statistical software or a table, such as Table B-7 in Appendix B
Many things follow the normal distribution (at least approximately):
the weights of humans, dogs, and tomatoes
the lengths of thumbs, widths of shoulders, and breadths of skulls
scores on IQ, SAT, and GRE tests
the number of kernels on ears of corn, ridges on scallop shells, hairs on cats, and leaves on trees


Figure 17.6
The Normal Distribution


The Normal Distribution (cont.)
The central limit theorem is a very strong result for empirical analysis that builds on the normal distribution
The central limit theorem states that:
if Z is a standardized sum of N independent, identically distributed (discrete or continuous) random variables with a finite, nonzero standard deviation, then the probability distribution of Z approaches the normal distribution as N increases


Sampling
First, let's define some key terms:
Population: the entire group of items that interests us
Sample: the part of this population that we actually observe
Statistical inference involves using the sample to draw conclusions about the characteristics of the population from which the sample came


Selection Bias
Any sample that differs systematically from the population that it is intended to represent is called a biased sample
One of the most common causes of biased samples is selection bias, which occurs when the selection of the sample systematically excludes or underrepresents certain groups
Selection bias often happens when we use a convenience sample consisting of data that are readily available
Self-selection bias can occur when we examine data for a group of people who have chosen to be in that group


Survivor and Nonresponse Bias
A retrospective study looks at past data for a contemporaneously selected sample
for example, an examination of the lifetime medical records of 65-year-olds
A prospective study, in contrast, selects a sample and then tracks the members over time
By their very design, retrospective studies suffer from survivor bias: we necessarily exclude members of the past population who are no longer around!
Nonresponse bias: the systematic refusal of some groups to participate in an experiment or to respond to a poll


The Power of Random Selection
In a simple random sample of size N from a given population:
each member of the population is equally likely to be included in the sample
every possible sample of size N from this population has an equal chance of being selected
How do we actually make random selections?
We would like a procedure that is equivalent to the following:
put the name of each member of the population on its own slip of paper
drop these slips into a box
mix thoroughly
pick members out randomly
In practice, random sampling is usually done through some sort of numerical identification combined with a computerized random selection of numbers
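In Python, this computerized selection is what `random.sample` provides; the population of 1,000 numbered members below is hypothetical:

```python
import random

random.seed(42)  # arbitrary seed so the sketch is reproducible

# A hypothetical population of 1,000 numbered members
population = list(range(1, 1001))

# random.sample draws a simple random sample without replacement:
# every member (and every possible sample of size N) is equally likely
N = 10
sample = random.sample(population, N)
print(sample)
```

In a real survey the numbers would be identifiers linked back to the actual members of the population.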

Estimation
First, some terminology:
Parameter: a characteristic of the population whose value is unknown, but can be estimated
Estimator: a sample statistic that will be used to estimate the value of the population parameter
Estimate: the specific value of the estimator that is obtained in one particular sample
Sampling variation: the notion that because samples are chosen randomly, the sample average will vary from sample to sample, sometimes being larger than the population mean and sometimes lower

Sampling Distributions
The sampling distribution of a statistic is the probability distribution or density curve that describes the population of all possible values of this statistic
For example, it can be shown mathematically that if the individual observations are drawn from a normal distribution, then the sampling distribution for the sample mean is also normal
Even if the population does not have a normal distribution, the sampling distribution of the sample mean will approach a normal distribution as the sample size increases
It can be shown mathematically that the sampling distribution for the sample mean has the following mean and standard deviation:
Mean of X̄ = μ  (17.5)
Standard deviation of X̄ = σ/√N

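Equation 17.5 can be checked by simulation: draw many samples of size N from the die population and compare the spread of the sample means with σ/√N (the seed and the number of repeated samples are arbitrary choices):

```python
import random
import statistics

random.seed(0)  # arbitrary seed so the sketch is reproducible

N = 25        # sample size
draws = 2000  # number of repeated samples

# Draw many samples of size N from the die population, recording each mean
sample_means = [
    statistics.mean(random.randint(1, 6) for _ in range(N))
    for _ in range(draws)
]

# Equation 17.5: the sampling distribution of X-bar should be centered
# near mu = 3.5 with standard deviation near sigma / sqrt(N)
sigma = (35 / 12) ** 0.5
print(statistics.mean(sample_means), statistics.stdev(sample_means), sigma / N ** 0.5)
```

The simulated standard deviation of the sample means comes out close to σ/√N ≈ 0.34, even though the die population itself is far from normal.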

The Mean of the Sampling Distribution
A sample statistic is an unbiased estimator of a population parameter if the mean of the sampling distribution of this statistic is equal to the value of the population parameter
Because the mean of the sampling distribution of X̄ is μ, X̄ is an unbiased estimator of μ


The Standard Deviation of the Sampling Distribution
One way of gauging the accuracy of an estimator is with its standard deviation:
If an estimator has a large standard deviation, there is a substantial probability that an estimate will be far from its mean
If an estimator has a small standard deviation, there is a high probability that an estimate will be close to its mean


The t-Distribution
When the mean of a sample from a normal distribution is standardized by subtracting the mean of its sampling distribution and dividing by the standard deviation of its sampling distribution, the resulting Z variable has a normal distribution
W.S. Gosset determined (in 1908) the sampling distribution of the t variable that is created when the mean of a sample from a normal distribution is standardized by subtracting its mean and dividing by its standard error (the standard deviation of an estimator):
t = (X̄ − μ) / (s/√N)

The t-Distribution (cont.)
The exact distribution of t depends on the sample size:
as the sample size increases, we are increasingly confident of the accuracy of the estimated standard deviation
Table B-1 at the end of the textbook shows some probabilities for various t-distributions that are identified by the number of degrees of freedom:
degrees of freedom = # observations − # estimated parameters


Confidence Intervals
A confidence interval measures the reliability of a given statistic such as X̄
The general procedure for determining a confidence interval for a population mean can be summarized as:
1. Calculate the sample average X̄
2. Calculate the standard error of X̄ by dividing the sample standard deviation s by the square root of the sample size N
3. Select a confidence level (such as 95 percent) and look in Table B-1 with N − 1 degrees of freedom to determine the t-value t* that corresponds to this probability
4. A confidence interval for the population mean is then given by:
X̄ ± t*s/√N
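The four-step procedure can be sketched in Python with a made-up sample; the t-value 2.262 is the 95 percent two-tailed value for 9 degrees of freedom from a standard t table:

```python
import statistics

# A made-up sample of N = 10 measurements
sample = [12.1, 11.8, 12.5, 12.0, 11.9, 12.3, 12.2, 11.7, 12.4, 12.1]
N = len(sample)

x_bar = statistics.mean(sample)   # step 1: sample average
s = statistics.stdev(sample)      # sample standard deviation
se = s / N ** 0.5                 # step 2: standard error of X-bar
t_star = 2.262                    # step 3: 95% t-value for N-1 = 9 d.f.

# Step 4: the 95% confidence interval for the population mean
lower, upper = x_bar - t_star * se, x_bar + t_star * se
print(lower, upper)
```

In repeated sampling, about 95 percent of intervals built this way would contain the true population mean.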

Sampling from Finite Populations
Notably, a confidence interval does not depend on the size of the population
This may first seem surprising: if we are trying to estimate a characteristic of a large population, then wouldn't we also need a large sample?
The reason why the size of the population doesn't matter is that the chances that the luck of the draw will yield a sample whose mean differs substantially from the population mean depend on the size of the sample and the chances of selecting items that are far from the population mean
That is, not on how many items there are in the population


Key Terms from Chapter 17
Random variable
Probability distribution
Expected value
Mean
Variance
Standard deviation
Standardized random variable
Population
Sample
Selection, survivor, and nonresponse bias
Sampling distribution
Population mean
Sample mean
Population standard deviation
Sample standard deviation
Degrees of freedom
Confidence interval
