
Martin Luther University of Halle-Wittenberg

Department of Economics
Chair of Econometrics

Econometrics
Lecture
6. Applications

Summer 2015

Key questions and objectives

This chapter focuses on the following key questions:
- How does changing the units of measurement of variables affect the OLS regression results (OLS intercept, slope estimates, standard errors, t statistics, F statistics, and confidence intervals)?
- How can we specify an appropriate functional form relationship between the explained and explanatory variables?
- How can we obtain confidence intervals for a prediction from the OLS regression line?

Applications

6 Applications
6.1 Effects of data scaling on OLS statistics
6.2 Functional form specification
    6.2.1 Using logarithmic functional forms
    6.2.2 Models with quadratics
    6.2.3 Models with interaction terms
6.3 Goodness-of-fit and selection of regressors
    6.3.1 Adjusted R-squared
    6.3.2 Selection of regressors
6.4 Prediction
    6.4.1 Confidence intervals for predictions
    6.4.2 Predicting y when ln y is the dependent variable


Applications
Effects of data scaling on OLS statistics

6.1 Effects of data scaling on OLS statistics

- In general, the coefficients, standard errors, confidence intervals, t statistics, and F statistics change in ways that preserve all measured effects and testing outcomes when variables are rescaled.
- Data scaling is often used to reduce the number of zeros after a decimal point in an estimated coefficient.
- Example: birth weight and cigarette smoking. Regression model:

    bwght^ = β̂0 + β̂1 cigs + β̂2 faminc,   (6.1)

  where
    bwght  = child birth weight, in ounces,
    cigs   = number of cigarettes smoked by the pregnant mother, per day,
    faminc = annual family income, in thousands of dollars.

Applications
Effects of data scaling on OLS statistics

The estimates of this equation, obtained using the data in BWGHT.RAW, are given in Table 6.1.

Table 6.1: Effects of Data Scaling

                              Dependent Variable
  Independent        (1) bwght      (2) bwghtlbs     (3) bwght
  Variables
  cigs                 -.4634          -.0289            ---
                       (.0916)         (.0057)
  packs                  ---             ---           -9.268
                                                       (1.832)
  faminc                .0927           .0058           .0927
                       (.0292)         (.0018)         (.0292)
  intercept           116.974          7.3109         116.974
                       (1.049)         (.0656)         (1.049)
  Observations         1,388           1,388           1,388
  R-Squared             .0298           .0298           .0298
  SSR               557,485.51      2,177.6778      557,485.51
  SER                  20.063          1.2539          20.063

Source: Wooldridge (2013), Table 6.1
Applications
Effects of data scaling on OLS statistics

Conversion of the dependent variable:
- All OLS estimates change. But once the effects are transformed into the same units, we get exactly the same answer, regardless of how the dependent variable is measured.
- Standard errors and confidence intervals change.
- Residuals and SSR change.
- Statistical significance is not affected: t and p values remain unchanged.
- R-squared is not affected.

Conversion of an explanatory variable affects only its coefficient and standard error.

Question: in the birth weight equation, suppose that faminc is measured in dollars rather than in thousands of dollars. Thus, define the variable fincdol = 1,000 · faminc. How will the OLS statistics change when fincdol is substituted for faminc? Do you think it is better to measure income in dollars or in thousands of dollars?
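The scaling claims above can be checked numerically. A minimal Python sketch (not part of the lecture, which uses Stata; simulated data and a hand-rolled OLS helper):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
faminc = rng.uniform(10, 100, n)              # income in thousands of dollars
cigs = rng.poisson(3, n).astype(float)
bwght = 117 - 0.5 * cigs + 0.1 * faminc + rng.normal(0, 20, n)

def ols(y, X):
    """Coefficients and standard errors for a regression of y on X (with intercept)."""
    Z = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ b
    sigma2 = resid @ resid / (len(y) - Z.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Z.T @ Z)))
    return b, se

# original regression vs. income measured in dollars (fincdol = 1,000 * faminc)
b1, se1 = ols(bwght, np.column_stack([cigs, faminc]))
b2, se2 = ols(bwght, np.column_stack([cigs, 1000 * faminc]))

# the income coefficient and its standard error are both divided by 1,000 ...
assert np.isclose(b2[2] * 1000, b1[2])
assert np.isclose(se2[2] * 1000, se1[2])
# ... so the t statistic (and hence significance) is unchanged
assert np.isclose(b2[2] / se2[2], b1[2] / se1[2])
```

This is exactly what happens in the fincdol question: the coefficient and standard error shrink by a factor of 1,000, while t statistics, R-squared, and fitted values stay the same.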

Applications
Effects of data scaling on OLS statistics

- If the dependent variable appears in logarithmic form, changing its unit of measurement does not affect the slope coefficients:
    Conversion: ln(c·yi) = ln c + ln yi, c > 0
    New intercept: β̂0,new = β̂0,old + ln c
- Similarly, changing the unit of measurement of any explanatory variable xj, where ln(xj) appears in the regression, only affects the intercept:
    Conversion: ln(c·xij) = ln c + ln xij, c > 0
    New intercept: β̂0,new = β̂0,old − β̂j ln c


Applications
Functional form specification

6.2.1 Using logarithmic functional forms

Example: housing prices and air pollution. Estimated equation:

    ln(price)^ = 9.23 − .718 ln(nox) + .306 rooms   (6.7)
                (0.19)  (.066)         (.019)

- The coefficient β̂1 is the elasticity of price with respect to nox: if nox increases by 1%, price is predicted to fall by .718%, ceteris paribus.
- The coefficient β̂2 is the semi-elasticity of price with respect to rooms. It is the change in ln(price) when Δrooms = 1. When multiplied by 100, this is the approximate percentage change in price: one more room increases price by about 30.6%.
- The approximation error occurs because, as the change in ln y becomes larger and larger, the approximation %Δy ≈ 100·Δln(y) becomes more and more inaccurate.

Applications
Functional form specification

For the exact interpretation, consider the general estimated model:

    ln(y)^ = β̂0 + β̂1 ln(x1) + β̂2 x2.

Holding x1 fixed, we have Δln(y)^ = β̂2·Δx2. The exact percentage change is

    %Δŷ = 100·[exp(β̂2·Δx2) − 1],   (6.8)

where the multiplication by 100 turns the proportionate change into a percentage change. When Δx2 = 1,

    %Δŷ = 100·[exp(β̂2) − 1].   (6.9)

In the housing price example, %Δprice^ = 100·[exp(.306) − 1] = 35.8%, which is notably larger than the approximate percentage change, 30.6%.

Applications
Functional form specification

The adjustment in 6.8 is not as crucial for small percentage changes.

    β̂2      Approximate: 100·β̂2      Exact: 100·[exp(β̂2) − 1]
    0.05             5                         5.13
    0.10            10                        10.52
    0.15            15                        16.18
    0.20            20                        22.14
    0.30            30                        34.99
    0.50            50                        64.87
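The approximate and exact columns above follow directly from equations 6.8 and 6.9; a small illustrative sketch:

```python
import math

def approx_pct(b):
    """Approximate percentage change for a one-unit change: 100 * beta_hat."""
    return 100 * b

def exact_pct(b):
    """Exact percentage change (equation 6.9): 100 * (exp(beta_hat) - 1)."""
    return 100 * (math.exp(b) - 1)

for b in (0.05, 0.10, 0.15, 0.20, 0.30, 0.50):
    print(f"{b:.2f}: approx {approx_pct(b):5.1f}, exact {exact_pct(b):5.2f}")

# the housing example: approximate 30.6% vs. exact percentage change
print(round(exact_pct(0.306), 1))  # 35.8
```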
Advantages of using logarithmic variables:
- Appealing interpretations.
- When y > 0, models using ln(y) as the dependent variable often satisfy the CLM assumptions more closely than models using the level of y.
- Taking the log of a variable often narrows its range (e.g. monetary values, such as firms' annual sales). Narrowing the range of the dependent and independent variables can make OLS estimates less sensitive to outliers.

Applications
Functional form specification

Using explanatory variables that are measured as percentages:

    ln(wage)^ = 0.3 − 0.05 unemployment_rate
    ln(wage)^ = 0.3 − 0.05 ln(unemployment_rate)

- The first equation says that an increase in the unemployment rate by one percentage point (e.g. a change from 8 to 9) decreases wages by about 5%.
- The second equation says that an increase in the unemployment rate by one percent (e.g. a change from 8 to 8.08) decreases wages by about 0.05%.

Limitations of logarithms: logs cannot be used if a variable takes on zero or negative values. Sometimes, ln(1 + y) is used. However, this approach is acceptable only when the data on y contain relatively few zeros. Alternatives are Tobit and Poisson models.

Applications
Functional form specification

6.2.2 Models with quadratics

Quadratic functions are also used often to capture decreasing or increasing marginal effects. Example:

    y = β0 + β1 x + β2 x²,   (6.10)

where y = wage and x = exper.

Interpretation: the effect of x on y depends on the value of x:

    Δŷ ≈ (β̂1 + 2β̂2 x)·Δx,  so  Δŷ/Δx ≈ β̂1 + 2β̂2 x.   (6.11)

Typically, we might plug in the average value of x in the sample, or some other interesting values, such as the median or the lower and upper quartile values.

Applications
Functional form specification

Example: wage regression. Estimated equation:

    wage^ = 3.73 + .298 exper − .0061 exper²   (6.12)

Equation 6.12 implies that exper has a diminishing effect on wage:
- The first year of experience is worth about $.298 (roughly 30 cents) per hour.
- The second year of experience is worth less: .298 − 2(.0061)(1) ≈ .286.
- In going from 10 to 11 years of experience, wage is predicted to increase by about .298 − 2(.0061)(10) = .176.

The turning point (here, the maximum of the function) is achieved at the coefficient on x over twice the absolute value of the coefficient on x²:

    x* = β̂1 / (2|β̂2|) = .298 / (2 · .0061) ≈ 24.4.   (6.13)
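Equations 6.11 and 6.13 are easy to evaluate with the estimates from 6.12. A small sketch (illustrative only):

```python
def marginal_effect(b1, b2, x):
    """Approximate effect of one more unit of x at level x: b1 + 2*b2*x (eq. 6.11)."""
    return b1 + 2 * b2 * x

def turning_point(b1, b2):
    """Turning point of the quadratic: b1 / (2*|b2|) (eq. 6.13)."""
    return b1 / (2 * abs(b2))

b1, b2 = 0.298, -0.0061  # estimates from equation 6.12

print(round(marginal_effect(b1, b2, 0), 3))   # 0.298: value of the first year
print(round(marginal_effect(b1, b2, 1), 3))   # 0.286: value of the second year
print(round(marginal_effect(b1, b2, 10), 3))  # 0.176: going from 10 to 11 years
print(round(turning_point(b1, b2), 1))        # 24.4: experience level where wage peaks
```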

Applications
Functional form specification

Figure 6.1: Quadratic relationship between wage and exper. The fitted curve starts at wage = 3.73 when exper = 0 and reaches its maximum, wage ≈ 7.37, at exper = 24.4. Source: Wooldridge (2013), Figure 6.1.

Applications
Functional form specification

Example: effects of pollution on housing prices.

    ln(price)^ = β̂0 + β̂1 ln(nox) + β̂2 ln(dist) + β̂3 rooms + β̂4 rooms² + β̂5 stratio

. reg lprice lnox ldist c.rooms##c.rooms stratio

      Source |       SS       df       MS              Number of obs =     506
-------------+------------------------------           F(  5,   500) =  151.77
       Model |  50.9872385     5  10.1974477           Prob > F      =  0.0000
    Residual |  33.5949865   500  .067189973           R-squared     =  0.6028
-------------+------------------------------           Adj R-squared =  0.5988
       Total |   84.582225   505  .167489554           Root MSE      =  .25921

---------------------------------------------------------------------------------
         lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           lnox |   -.901682   .1146869    -7.86   0.000     -1.12701   -.6763544
          ldist |  -.0867814   .0432807    -2.01   0.045    -.1718159    -.001747
          rooms |   -.545113   .1654541    -3.29   0.001     -.870184   -.2200419
c.rooms#c.rooms |   .0622612    .012805     4.86   0.000      .037103    .0874194
        stratio |  -.0475902   .0058542    -8.13   0.000     -.059092   -.0360884
          _cons |   13.38548   .5664731    23.63   0.000     12.27252    14.49844
---------------------------------------------------------------------------------

Applications
Functional form specification

Interpretation: what is the effect of rooms on ln(price)?
- Because the coefficient on rooms is negative and the coefficient on rooms² is positive, this equation implies that, at low values of rooms, an additional room has a negative effect on ln(price).
- At some point, the effect becomes positive, and the quadratic shape means that the semi-elasticity of price with respect to rooms is increasing as rooms increases.
- Turnaround value of rooms:

    rooms* = |−.5451| / (2 · .0623) ≈ 4.4

Applications
Functional form specification

Figure 6.2: log(price) as a quadratic function of rooms, with a minimum at rooms = 4.4. Source: Wooldridge (2013), Figure 6.2.

Applications
Functional form specification

- Only five of the 506 communities in the sample have houses averaging 4.4 rooms or less, about 1% of the sample. Hence, the quadratic to the left of 4.4 can, for practical purposes, be ignored.
- To the right of 4.4, we see that adding another room has an increasing effect on the percentage change in price:

    %Δprice^ ≈ 100·[−.545 + 2(.062)·rooms]·Δrooms = (−54.5 + 12.4·rooms)·Δrooms

- An increase in rooms from, say, five to six increases price by about −54.5 + 12.4(5) = 7.5%.
- An increase from six to seven increases price by −54.5 + 12.4(6) = 19.9%.
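The two computations above can be reproduced with the slide's formula (illustrative sketch):

```python
def pct_change_price(rooms):
    """Approximate % change in price from one additional room (slide formula)."""
    return -54.5 + 12.4 * rooms

print(round(pct_change_price(5), 1))  # 7.5  (going from five to six rooms)
print(round(pct_change_price(6), 1))  # 19.9 (going from six to seven rooms)
print(round(54.5 / 12.4, 1))          # 4.4  (turnaround value of rooms)
```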

Applications
Functional form specification

- If the coefficients on the level and squared terms have the same sign (either both positive or both negative) and the explanatory variable is nonnegative, then there is no turning point for values x > 0.
- Quadratic functions may also be used to allow for a nonconstant elasticity. Example:

    ln(price) = β0 + β1 ln(nox) + β2 (ln nox)² + ... + u.   (6.15)

  The elasticity depends on the level of nox:

    %Δprice ≈ [β1 + 2β2 ln(nox)]·%Δnox.   (6.16)

- Further (higher-order) polynomial terms can be included in regression models:

    y = β0 + β1 x + β2 x² + β3 x³ + β4 x⁴ + u.

Applications
Functional form specification

6.2.3 Models with interaction terms

Sometimes, the partial effect, elasticity, or semi-elasticity of the dependent variable with respect to an explanatory variable depends on the magnitude of another explanatory variable. Example: in the model

    price = β0 + β1 sqrft + β2 bdrms + β3 sqrft·bdrms + β4 bthrms + u

the partial effect of bdrms on price is

    Δprice/Δbdrms = β2 + β3 sqrft.   (6.17)

Interaction effect between square footage and number of bedrooms: if β3 > 0, then an additional bedroom yields a higher increase in housing price for larger houses.

Applications
Functional form specification

Example: did returns to education change between 1978 and 1985? Consider the following wage regression:

    ln(wage) = β1 + β2 y85 + β3 educ + β4 y85·educ + ... + u.

Returns to education are:

    Δln(wage)/Δeduc = β3 + β4 y85 = β3 if y85 = 0, and β3 + β4 if y85 = 1.

Applications
Functional form specification

      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
             |  [output omitted]
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

- Returns to education in 1978: 7.47%.
- Returns to education in 1985: (.0747 + .0185) · 100 = 9.32%.
- Returns to education increased between 1978 and 1985 by β̂4 = .0185, i.e. by 1.85 percentage points.


Applications
Goodness-of-fit and selection of regressors

6.3.1 Adjusted R-squared

- R-squared is the proportion of the total sample variation in y that is explained by x1, x2, ..., xk.
- The size of R-squared does not affect unbiasedness.
- R-squared never decreases when additional explanatory variables are added to the model, because SSR never goes up (and usually falls) as more variables are added:

    R² = 1 − SSR/SST.

- The adjusted R-squared imposes a penalty for adding additional independent variables to a model:

    R̄² = 1 − [SSR/(n − k − 1)] / [SST/(n − 1)] = 1 − σ̂² / [SST/(n − 1)].   (6.21)

Applications
Goodness-of-fit and selection of regressors

- SSR/(n − k − 1) can go up or down when a new independent variable is added to a regression.
- If we add a new independent variable to a regression equation, R̄² increases if, and only if, the t statistic on the new variable is greater than one in absolute value.
- It holds that

    R̄² = 1 − (1 − R²)(n − 1)/(n − k − 1).   (6.22)

- R̄² can be negative, indicating a very poor model fit relative to the number of degrees of freedom.
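Equation 6.22 can be checked against earlier output. A sketch, using the R-squared, n, and k from the pollution regression shown earlier in this chapter:

```python
def adj_r2(r2, n, k):
    """Adjusted R-squared via equation 6.22."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# pollution/housing regression: R-squared .6028 with n = 506 and k = 5 regressors
print(round(adj_r2(0.6028, 506, 5), 4))  # 0.5988, matching Stata's "Adj R-squared"

# with a weak fit and many regressors, the adjusted R-squared can be negative
print(adj_r2(0.02, 30, 10) < 0)  # True
```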

Applications
Goodness-of-fit and selection of regressors

- Adjusted R-squared can be used to choose between nonnested models. (Two equations are nonnested models when neither equation is a special case of the other.)
- Example: explaining major league baseball players' salaries.

    Model 1: ln(salary) = β0 + β1 yrs + β2 games + β3 bavg + β4 hrunsyr + u,  R̄² = .6211
    Model 2: ln(salary) = β0 + β1 yrs + β2 games + β3 bavg + β4 rbisyr + u,   R̄² = .6226

- Based on the adjusted R-squared, there is a very slight preference for the model with rbisyr.

Applications
Goodness-of-fit and selection of regressors

Example: explaining R&D intensity.

    Model 1: rdintens = β0 + β1 ln(sales) + u,           R² = .061, R̄² = .030
    Model 2: rdintens = β0 + β1 sales + β2 sales² + u,   R² = .148, R̄² = .090

- The first model captures a diminishing return by including sales in logarithmic form; the second model does this by using a quadratic. Thus, the second model contains one more parameter than the first.
- Neither R² nor R̄² can be used to choose between different functional forms for the dependent variable.

Applications
Goodness-of-fit and selection of regressors

6.3.2 Selection of regressors

- A long regression (i.e. one with many explanatory variables) is more likely to have a ceteris paribus interpretation than a short regression.
- Furthermore, a long regression generates more precise estimates of the coefficients on the variables included in a short regression, because the additional covariates lead to a smaller residual variance.
- However, it is also possible to control for too many variables in a regression analysis (over controlling).

Applications
Goodness-of-fit and selection of regressors

Example: impact of state beer taxes on traffic fatalities.
- Idea: a higher tax on beer will reduce alcohol consumption, and likewise drunk driving, resulting in fewer traffic fatalities.
- Model to measure the ceteris paribus effect of taxes on fatalities:

    fatalities = β0 + β1 tax + β2 miles + β3 percmale + β4 perc16_21 + ...,

  where
    miles = total miles driven,
    percmale = percentage of the state population that is male,
    perc16_21 = percentage of the population between ages 16 and 21.
- The model does not include a variable measuring per capita beer consumption. Are we committing an omitted variables error?
- No, because controlling for beer consumption would imply that we measure the difference in fatalities due to a one percentage point increase in tax, holding beer consumption fixed. This is not interesting.


Applications
Prediction

6.4.1 Confidence intervals for predictions

(a) CI for E(y | x1, ..., xk) (for the average value of y for the subpopulation with a given set of covariates):
- Predictions are subject to sampling variation because they are obtained using the OLS estimators.
- Estimated equation:

    ŷ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk.   (6.27)

- Plugging in particular values of the independent variables, we obtain a prediction for y. The parameter we would like to estimate is:

    θ0 = β0 + β1 c1 + β2 c2 + ... + βk ck = E(y | x1 = c1, x2 = c2, ..., xk = ck).   (6.28)

- The estimator of θ0 is

    θ̂0 = β̂0 + β̂1 c1 + β̂2 c2 + ... + β̂k ck.   (6.29)

Applications
Prediction

- The uncertainty in this prediction is represented by a confidence interval for θ0.
- With large df, we can construct a 95% confidence interval for θ0 using the rule of thumb θ̂0 ± 2·se(θ̂0).
- How do we obtain the standard error of θ̂0? Trick:
    Write β0 = θ0 − β1 c1 − β2 c2 − ... − βk ck.
    Plug this into y = β0 + β1 x1 + β2 x2 + ... + βk xk + u.
    This gives

    y = θ0 + β1(x1 − c1) + β2(x2 − c2) + ... + βk(xk − ck) + u.   (6.30)

- That is, we run a regression where we subtract the value cj from each observation on xj. The predicted value and its standard error are obtained from the intercept in regression 6.30.
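The centering trick can be demonstrated on simulated data. A minimal numpy sketch (not from the lecture; the data and variable names are made up): the intercept of the shifted regression equals the prediction at (c1, c2), and its standard error is se(θ̂0).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(scale=0.5, size=n)

c1, c2 = 1.2, -0.7  # point at which we want E(y | x1 = c1, x2 = c2)

# ordinary fit, then prediction by plugging in (c1, c2)
X = np.column_stack([np.ones(n), x1, x2])
b = np.linalg.lstsq(X, y, rcond=None)[0]
pred = b[0] + b[1] * c1 + b[2] * c2

# regression 6.30: subtract c_j from each observation on x_j
Xc = np.column_stack([np.ones(n), x1 - c1, x2 - c2])
theta = np.linalg.lstsq(Xc, y, rcond=None)[0]

# the intercept of the shifted regression IS the prediction
assert np.isclose(theta[0], pred)

# and its standard error is the standard error of the prediction
resid = y - Xc @ theta
sigma2 = resid @ resid / (n - 3)
se_theta0 = np.sqrt(sigma2 * np.linalg.inv(Xc.T @ Xc)[0, 0])
print(round(float(theta[0]), 3), round(float(se_theta0), 3))
```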

Applications
Prediction

Example: confidence interval for predicted college GPA. Estimation results:

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030504     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
      hsperc |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
       hsize |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
     hsizesq |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   1.492652   .0753414    19.81   0.000     1.344942    1.640362
------------------------------------------------------------------------------

Note: colgpa = GPA after the fall semester, sat = combined SAT score, hsperc = high school percentile (from the top), hsize = size of the graduating class (in 100s).

Applications
Prediction

What is the predicted college GPA when sat = 1,200, hsperc = 30, and hsize = 5 (which means 500)? Define a new set of independent variables: sat0 = sat − 1,200, hsperc0 = hsperc − 30, hsize0 = hsize − 5, and hsizesq0 = hsize² − 25.

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030503     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sat0 |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
     hsperc0 |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
      hsize0 |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
    hsizesq0 |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   2.700075   .0198778   135.83   0.000     2.661104    2.739047
------------------------------------------------------------------------------

Applications
Prediction

- The variance of the prediction is smallest at the mean values of the xj (because the variance of the intercept estimator is smallest when each explanatory variable has zero sample mean).

(b) CI for a particular unit from the population: prediction interval.
- In forming a confidence interval for an unknown outcome on y, we must account for the variance of the unobserved error.
- Let y⁰ be the value for an individual not in our original sample, let x1⁰, x2⁰, ..., xk⁰ be the new values of the independent variables, and let u⁰ be the unobserved error.
- Model for observation (y⁰, x1⁰, ..., xk⁰):

    y⁰ = β0 + β1 x1⁰ + β2 x2⁰ + ... + βk xk⁰ + u⁰.   (6.33)

- Prediction:

    ŷ⁰ = β̂0 + β̂1 x1⁰ + β̂2 x2⁰ + ... + β̂k xk⁰.

- Prediction error:

    ê⁰ = y⁰ − ŷ⁰ = (β0 + β1 x1⁰ + ... + βk xk⁰) + u⁰ − ŷ⁰.   (6.34)

Applications
Prediction

- The expected prediction error is zero, E(ê⁰) = 0, because the β̂j are unbiased — so E(ŷ⁰) = β0 + β1 x1⁰ + ... + βk xk⁰ — and u⁰ has zero mean.
- The variance of the prediction error is the sum of the variances, because u⁰ and ŷ⁰ are uncorrelated:

    Var(ê⁰) = Var(ŷ⁰) + Var(u⁰) = Var(ŷ⁰) + σ².   (6.35)

- There are two sources of variation in ê⁰:
    1. Sampling error in ŷ⁰, which arises because we have estimated the βj; it decreases with the sample size.
    2. σ², the variance of the error in the population; it does not change with the sample size.
- Standard error of ê⁰:

    se(ê⁰) = {[se(ŷ⁰)]² + σ̂²}^(1/2).   (6.36)

Applications
Prediction

It holds that ê⁰/se(ê⁰) has a t distribution with n − k − 1 degrees of freedom. Therefore,

    P(−t_{α/2} ≤ ê⁰/se(ê⁰) ≤ t_{α/2}) = 1 − α,
    P(−t_{α/2} ≤ (y⁰ − ŷ⁰)/se(ê⁰) ≤ t_{α/2}) = 1 − α,
    P(ŷ⁰ − t_{α/2}·se(ê⁰) ≤ y⁰ ≤ ŷ⁰ + t_{α/2}·se(ê⁰)) = 1 − α.

Applications
Prediction

Example: prediction interval (for GPA) for any particular student.

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030503     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sat0 |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
     hsperc0 |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
      hsize0 |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
    hsizesq0 |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   2.700075   .0198778   135.83   0.000     2.661104    2.739047
------------------------------------------------------------------------------

    se(ê⁰) = [(.020)² + (.560)²]^(1/2) ≈ .560.
    95% prediction interval: 2.70 ± 1.96 · .560 = [1.60, 3.80].
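The interval arithmetic on this slide is a one-liner; a quick sketch using the rounded values from the output (se(θ̂0) ≈ .020, σ̂ ≈ .560, ŷ⁰ = 2.70):

```python
import math

se_pred, sigma_hat, y_hat = 0.020, 0.560, 2.70

se_e0 = math.sqrt(se_pred**2 + sigma_hat**2)  # equation 6.36
lo, hi = y_hat - 1.96 * se_e0, y_hat + 1.96 * se_e0

print(round(se_e0, 3))             # 0.56: the error variance dominates
print(round(lo, 2), round(hi, 2))  # 1.6 3.8
```

Almost all of the interval width comes from σ̂², not from the sampling error in ŷ⁰, which is why the 95% prediction interval is so wide.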

Applications
Prediction

6.4.2 Predicting y when ln y is the dependent variable

- Given the OLS estimates, we can predict ln(y) for any value of the explanatory variables:

    ln(y)^ = β̂0 + β̂1 x1 + β̂2 x2 + ... + β̂k xk.   (6.39)

- How can we predict y? N.B.: ŷ ≠ exp(ln(y)^). Hence, simply exponentiating the predicted value of ln(y) does not work; in fact, it systematically underestimates the expected value of y.
- It can be shown that

    E(y | x) = exp(σ²/2) · exp(β0 + β1 x1 + β2 x2 + ... + βk xk),

  where σ² is the variance of u.

Applications
Prediction

- Hence, the prediction of y is:

    ŷ = exp(σ̂²/2) · exp(ln(y)^),   (6.40)

  where σ̂² is the unbiased estimator of σ².
- The prediction in 6.40 relies on the normality of the error term, u. How can we obtain a prediction that does not rely on normality?
- General model:

    E(y | x) = α0 · exp(β0 + β1 x1 + β2 x2 + ... + βk xk),   (6.41)

  where α0 is the expected value of exp(u).
- Given an estimate α̂0, we can predict y as

    ŷ = α̂0 · exp(ln(y)^).   (6.42)

Applications
Prediction

First approach to estimating α0: a consistent, but not unbiased, smearing estimate is

    α̂0 = (1/n) Σ_{i=1}^{n} exp(û_i).   (6.43)

Second approach to estimating α0:
- Define m_i = exp(β0 + β1 x_i1 + β2 x_i2 + ... + βk x_ik).
- Replace the βj with their OLS estimates and obtain m̂_i = exp(ln(y_i)^).
- Estimate a simple regression of y_i on m̂_i without an intercept. The slope estimate is a consistent, but not unbiased, estimate of α0.

With a consistent estimate of α0, the prediction for y can be calculated as α̂0 · exp(ln(y)^).
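Both estimators of α0 can be illustrated on simulated data. A sketch (not from the lecture; it assumes a one-regressor log model with normal errors, so the true α0 = exp(σ²/2) = exp(0.125) ≈ 1.13):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
lny = 1.0 + 0.3 * x + u
y = np.exp(lny)

# OLS of ln(y) on x, then residuals
X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, lny, rcond=None)[0]
uhat = lny - X @ b

# first approach (eq. 6.43): smearing estimate
alpha_smear = np.mean(np.exp(uhat))

# second approach: regress y on m_hat = exp(fitted ln y), no intercept
m_hat = np.exp(X @ b)
alpha_reg = (m_hat @ y) / (m_hat @ m_hat)

print(round(float(alpha_smear), 3), round(float(alpha_reg), 3))  # both near 1.13
```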

Applications
Prediction

Example: predicting CEO salaries. Model:

    ln(salary) = β0 + β1 ln(sales) + β2 ln(mktval) + β3 ceoten + u.

Estimation results:

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  3,   173) =   26.91
       Model |  20.5672434     3  6.85574779           Prob > F      =  0.0000
    Residual |  44.0789697   173  .254791732           R-squared     =  0.3182
-------------+------------------------------           Adj R-squared =  0.3063
       Total |  64.6462131   176  .367308029           Root MSE      =  .50477

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   .1628545   .0392421     4.15   0.000     .0853995    .2403094
     lmktval |    .109243   .0495947     2.20   0.029     .0113545    .2071315
      ceoten |   .0117054   .0053261     2.20   0.029      .001193    .0222178
       _cons |   4.503795   .2572344    17.51   0.000     3.996073    5.011517
------------------------------------------------------------------------------

Applications
Prediction

The smearing estimate of α0 is:

. predict uhat, res
. gen euhat = exp(uhat)
. su euhat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       euhat |       177    1.135661    .6970541   .0823372   6.378018

Applications
Prediction

The regression estimate of α0 is:

. predict lsalary_hat
(option xb assumed; fitted values)
. gen m_hat = exp(lsalary_hat)
. reg salary m_hat, nocons

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  1,   176) =  562.39
       Model |  147352711      1   147352711           Prob > F      =  0.0000
    Residual |   46113901    176  262010.801           R-squared     =  0.7616
-------------+------------------------------           Adj R-squared =  0.7603
       Total |  193466612    177  1093031.71           Root MSE      =  511.87

------------------------------------------------------------------------------
      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       m_hat |   1.116857   .0470953    23.71   0.000     1.023912    1.209801
------------------------------------------------------------------------------

Applications
Prediction

Prediction for sales = 5,000 (which means $5 billion, because sales is in millions), mktval = 10,000 (or $10 billion), and ceoten = 10:

    ln(salary)^ = 4.503 + 0.163·ln(5000) + 0.109·ln(10000) + 0.012·10 = 7.013.

- Naive prediction: exp(7.013) = 1110.983.
- Prediction using the smearing estimate: 1.136 · exp(7.013) = 1262.076.
- Prediction using the regression estimate: 1.117 · exp(7.013) = 1240.967.
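The three predictions can be reproduced from the estimation output. A sketch (using the full-precision coefficients, so ln(salary)^ comes out as 7.014 rather than the slide's rounded 7.013):

```python
import math

# coefficients from the CEO salary output above
cons, b_lsales, b_lmktval, b_ceoten = 4.503795, 0.1628545, 0.109243, 0.0117054

lnsal = cons + b_lsales * math.log(5000) + b_lmktval * math.log(10000) + b_ceoten * 10

naive = math.exp(lnsal)     # systematically underestimates E(salary | x)
smear = 1.135661 * naive    # corrected with the smearing estimate
regr = 1.116857 * naive     # corrected with the regression estimate

print(round(lnsal, 3))  # 7.014
print(round(naive), round(regr), round(smear))
```

Both corrections scale the naive prediction up, with the smearing estimate giving the largest predicted salary.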

Key terms

adjusted R-squared
interaction effect
nonnested models
over controlling
prediction error
prediction interval
predictions
quadratic functions
smearing estimate
variance of the prediction error


References
Textbook: Chapter 6 in Wooldridge (2013).
Further readings: Chapters 8 and 9 in Stock and Watson (2012); Chapters 6 and 10 in Hill et al. (2001).

Hill, R. C., Griffiths, W. E., and Judge, G. G. (2001). Undergraduate Econometrics. John Wiley & Sons, New York.
Stock, J. H. and Watson, M. W. (2012). Introduction to Econometrics. Pearson, Boston.
Wooldridge, J. M. (2013). Introductory Econometrics: A Modern Approach. Cengage Learning, Mason, OH.
