Martin Luther University of Halle-Wittenberg
Department of Economics
Chair of Econometrics
Econometrics
Lecture
6. Applications
Summer 2015
1 / 49
Key questions and objectives
This chapter focuses on the following key questions:
How does changing the units of measurement of variables affect the
OLS regression results (OLS intercept, slope estimates, standard errors,
t statistics, F statistics, and confidence intervals)?
How can we specify an appropriate functional form relationship between
the explained and explanatory variables?
How can we obtain confidence intervals for a prediction from the OLS
regression line?
2 / 49
Applications
6 Applications
6.1 Effects of data scaling on OLS statistics
6.2 Functional form specification
6.2.1 Using logarithmic functional forms
6.2.2 Models with quadratics
6.2.3 Models with interaction terms
6.3 Goodness-of-fit and selection of regressors
6.3.1 Adjusted R-squared
6.3.2 Selection of regressors
6.4 Prediction
6.4.1 Confidence intervals for predictions
6.4.2 Predicting y when ln y is the dependent variable
3 / 49
Applications
Effects of data scaling on OLS statistics
6.1 Effects of data scaling on OLS statistics
In general, the coefficients, standard errors, confidence intervals, t
statistics, and F statistics change in ways that preserve all measured
effects and testing outcomes when variables are rescaled.
Data scaling is often used to reduce the number of zeros after a
decimal point in an estimated coefficient.
Example: birth weight and cigarette smoking
Regression model:
$\widehat{bwght} = \hat\beta_0 + \hat\beta_1 cigs + \hat\beta_2 faminc,$  (6.1)
where
bwght = child birth weight, in ounces;
cigs = number of cigarettes smoked by the pregnant mother per day;
faminc = annual family income, in thousands of dollars.
5 / 49
Applications
Effects of data scaling on OLS statistics

Table 6.1: Effects of Data Scaling

                        Dependent Variable
                   (1) bwght   (2) bwghtlbs   (3) bwght
Independent Variables
cigs                 -.4634       -.0289          ---
                     (.0916)      (.0057)
packs                  ---          ---         -9.268
                                                (1.832)
faminc                .0927        .0058         .0927
                     (.0292)      (.0018)       (.0292)
intercept           116.974       7.3109       116.974
                     (1.049)      (.0656)       (1.049)
Observations          1,388        1,388         1,388
R-squared             .0298        .0298         .0298
SSR              557,485.51   2,177.6778    557,485.51
SER                  20.063       1.2539        20.063

Source: Wooldridge (2013), Table 6.1
6 / 49
Applications
Effects of data scaling on OLS statistics
Conversion of the dependent variable:
All OLS estimates change. But once the effects are transformed into
the same units, we get exactly the same answer, regardless of how the
dependent variable is measured.
Standard errors and confidence intervals change.
Residuals and SSR change.
Statistical significance is not affected. t and p values remain
unchanged.
Rsquared is not affected.
Conversion of an explanatory variable affects only its coefficient and
standard error.
Question: in the birth weight equation, suppose that faminc is
measured in dollars rather than in thousands of dollars. Thus, define
the variable fincdol = 1,000 · faminc. How will the OLS statistics
change when fincdol is substituted for faminc? Do you think it is better
to measure income in dollars or in thousands of dollars?
7 / 49
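The rescaling claims above can be checked numerically. Below is a minimal Python/numpy sketch on simulated data (the data-generating numbers are invented for illustration, not the BWGHT.RAW estimates): multiplying faminc by 1,000 divides its coefficient and standard error by 1,000, while the t statistic and R-squared are unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
cigs = rng.poisson(2, n).astype(float)
faminc = rng.uniform(5, 65, n)  # income in $1,000s (simulated)
bwght = 117 - 0.46 * cigs + 0.09 * faminc + rng.normal(0, 20, n)

def ols(y, cols):
    """OLS with intercept: returns coefficients, standard errors, R-squared."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return beta, se, r2

b1, se1, r2_1 = ols(bwght, [cigs, faminc])          # income in $1,000s
b2, se2, r2_2 = ols(bwght, [cigs, 1000 * faminc])   # income in dollars (fincdol)

assert np.isclose(b2[2], b1[2] / 1000)             # coefficient rescaled by 1/1000
assert np.isclose(b2[2] / se2[2], b1[2] / se1[2])  # t statistic unchanged
assert np.isclose(r2_1, r2_2)                      # R-squared unchanged
```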
Applications
Effects of data scaling on OLS statistics
If the dependent variable appears in logarithmic form, changing its unit of measurement does not affect the slope coefficients:
Conversion: $\ln(c y_i) = \ln c + \ln y_i$, $c > 0$.
New intercept: $\hat\beta_0^{new} = \hat\beta_0^{old} + \ln c$.
Similarly, changing the unit of measurement of any explanatory variable $x_j$, where $\ln(x_j)$ appears in the regression, affects only the intercept:
Conversion: $\ln(c x_{ij}) = \ln c + \ln x_{ij}$, $c > 0$.
New intercept: $\hat\beta_0^{new} = \hat\beta_0^{old} - \hat\beta_j \ln c$.
8 / 49
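The two conversion results can be verified on simulated data. A small numpy sketch (c = 16 is an arbitrary scale factor; all numbers are invented): adding ln c to the dependent variable shifts only the intercept, by exactly ln c.

```python
import numpy as np

rng = np.random.default_rng(1)
n, c = 400, 16.0
x = rng.uniform(0, 10, n)
lny = 1.5 + 0.2 * x + rng.normal(0, 0.3, n)   # ln(y) in original units

X = np.column_stack([np.ones(n), x])
b_old, *_ = np.linalg.lstsq(X, lny, rcond=None)
# ln(c*y) = ln c + ln y: regress the rescaled log outcome on the same x
b_new, *_ = np.linalg.lstsq(X, lny + np.log(c), rcond=None)

assert np.isclose(b_new[1], b_old[1])              # slope unchanged
assert np.isclose(b_new[0], b_old[0] + np.log(c))  # intercept shifts by ln c
```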
Applications
Functional form specification
6.2.1 Using logarithmic functional forms
Example: housing prices and air pollution
Estimated equation:
$\widehat{\ln price} = 9.23 - .718 \ln nox + .306\, rooms$  (6.7)
(se)       (0.19)  (.066)          (.019)

The coefficient $\hat\beta_1$ is the elasticity of price with respect to nox: if nox increases by 1%, price is predicted to fall by .718%, ceteris paribus.
The coefficient $\hat\beta_2$ is the semielasticity of price with respect to rooms. It is the change in $\ln price$ when $\Delta rooms = 1$. When multiplied by 100, this is the approximate percentage change in price: one more room increases price by about 30.6%.
The approximation error occurs because, as the change in $\ln y$ becomes larger, the approximation $\%\Delta y \approx 100 \cdot \Delta \ln y$ becomes more and more inaccurate.
10 / 49
Applications
Functional form specification
For the exact interpretation, consider the general estimated model:
$\widehat{\ln y} = \hat\beta_0 + \hat\beta_1 \ln x_1 + \hat\beta_2 x_2.$
Holding $x_1$ fixed, we have $\Delta\widehat{\ln y} = \hat\beta_2 \Delta x_2$.
Exact percentage change:
$\%\Delta\hat y = 100\,[\exp(\hat\beta_2 \Delta x_2) - 1],$  (6.8)
where the multiplication by 100 turns the proportionate change into a percentage change.
When $\Delta x_2 = 1$,
$\%\Delta\hat y = 100\,[\exp(\hat\beta_2) - 1].$  (6.9)
In the housing price example,
$\%\Delta\widehat{price} = 100\,[\exp(.306) - 1] \approx 35.8\%$, which is notably larger than the approximate percentage change, 30.6%.
11 / 49
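The approximate and exact percentage changes above can be reproduced in a few lines (coefficients taken from the slide):

```python
import math

def pct_approx(b):
    return 100 * b                  # approximate: %change ~ 100 * change in ln(y)

def pct_exact(b):
    return 100 * (math.exp(b) - 1)  # exact, equation 6.9 (change in x = 1)

assert round(pct_approx(0.306), 1) == 30.6  # one more room, approximate
assert round(pct_exact(0.306), 1) == 35.8   # one more room, exact
assert round(pct_exact(0.05), 2) == 5.13    # gap is small for small coefficients
```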
Applications
Functional form specification
The adjustment in 6.8 is not as crucial for small percentage changes:

beta2_hat    Approximate: 100·beta2_hat    Exact: 100[exp(beta2_hat) − 1]
0.05               5                              5.13
0.10              10                             10.52
0.15              15                             16.18
0.20              20                             22.14
0.30              30                             34.99
0.50              50                             64.87
Advantages of using logarithmic variables:
Appealing interpretations
When y > 0, models using ln y as the dependent variable often satisfy
the CLM assumptions more closely than models using the level of y .
Taking the log of a variable often narrows its range (e.g. monetary
values, such as firms' annual sales). Narrowing the range of the
dependent and independent variables can make OLS estimates less
sensitive to outliers.
12 / 49
Applications
Functional form specification
Using explanatory variables that are measured as percentages:
$\widehat{\ln(wage)} = 0.3 - 0.05\, \mathit{unemployment\ rate}$
$\widehat{\ln(wage)} = 0.3 - 0.05\, \ln(\mathit{unemployment\ rate})$
The first equation says that an increase in the unemployment rate by one percentage point (e.g. a change from 8 to 9) decreases wages by about 5%.
The second equation says that an increase in the unemployment rate by one percent (e.g. a change from 8 to 8.08) decreases wages by about 0.05%.
Limitations of logarithms: logs cannot be used if a variable takes on
zero or negative values. Sometimes, ln(1 + y ) is used. However, this
approach is acceptable only when the data on y contain relatively few
zeros. Alternatives are Tobit and Poisson models.
13 / 49
Applications
Functional form specification
6.2.2 Models with quadratics
Quadratic functions are often used to capture decreasing or increasing marginal effects.
Example:
$\hat y = \hat\beta_0 + \hat\beta_1 x + \hat\beta_2 x^2,$  (6.10)
where y = wage and x = exper.
Interpretation: the effect of x on y depends on the value of x:
$\Delta \hat y \approx (\hat\beta_1 + 2\hat\beta_2 x)\,\Delta x,$ so $\Delta \hat y / \Delta x \approx \hat\beta_1 + 2\hat\beta_2 x.$  (6.11)
Typically, we might plug in the average value of x in the sample, or some other interesting values, such as the median or the lower and upper quartile values.
14 / 49
Applications
Functional form specification
Example: wage regression
Estimated equation:
$\widehat{wage} = 3.73 + .298\, exper - .0061\, exper^2$  (6.12)
Equation 6.12 implies that exper has a diminishing effect on wage:
The first year of experience is worth about 30 cents per hour (.298 dollars).
The second year of experience is worth less: $.298 - 2(.0061)(1) = .286$ dollars per hour.
In going from 10 to 11 years of experience, wage is predicted to increase by about $.298 - 2(.0061)(10) = .176$ dollars per hour.
The turning point (the maximum of the function) is reached at the coefficient on x over twice the absolute value of the coefficient on $x^2$:
$x^* = \frac{\hat\beta_1}{2|\hat\beta_2|} = \frac{.298}{2(.0061)} \approx 24.4.$  (6.13)
15 / 49
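The diminishing marginal effect and the turning point in 6.12/6.13 can be checked directly (coefficients from the estimated equation):

```python
b1, b2 = 0.298, -0.0061  # exper and exper^2 coefficients from equation 6.12

def marginal_effect(x):
    # approximate change in wage per extra year: b1 + 2*b2*x (equation 6.11)
    return b1 + 2 * b2 * x

turning_point = -b1 / (2 * b2)  # equals b1 / (2*|b2|) since b2 < 0

assert round(marginal_effect(0), 3) == 0.298   # first year of experience
assert round(marginal_effect(1), 3) == 0.286   # second year
assert round(marginal_effect(10), 3) == 0.176  # 10th -> 11th year
assert round(turning_point, 1) == 24.4         # equation 6.13
```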
Applications
Functional form specification

Figure 6.1: Quadratic relationship between wage and exper (the fitted wage starts at 3.73 at exper = 0 and peaks at 7.37 at exper = 24.4).
Source: Wooldridge (2013), Figure 6.1
16 / 49
Applications
Functional form specification
Example: effects of pollution on housing prices
$\widehat{\ln price} = \hat\beta_0 + \hat\beta_1 \ln nox + \hat\beta_2 \ln dist + \hat\beta_3 rooms + \hat\beta_4 rooms^2 + \hat\beta_5 stratio$

. reg lprice lnox ldist c.rooms##c.rooms stratio

      Source |       SS       df       MS              Number of obs =     506
-------------+------------------------------           F(  5,   500) =  151.77
       Model |  50.9872385     5  10.1974477           Prob > F      =  0.0000
    Residual |  33.5949865   500  .067189973           R-squared     =  0.6028
-------------+------------------------------           Adj R-squared =  0.5988
       Total |   84.582225   505  .167489554           Root MSE      =  .25921

---------------------------------------------------------------------------------
         lprice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
           lnox |   -.901682   .1146869    -7.86   0.000     -1.12701   -.6763544
          ldist |  -.0867814   .0432807    -2.01   0.045    -.1718159    -.001747
          rooms |   -.545113   .1654541    -3.29   0.001     -.870184   -.2200419
c.rooms#c.rooms |   .0622612    .012805     4.86   0.000      .037103    .0874194
        stratio |  -.0475902   .0058542    -8.13   0.000     -.059092   -.0360884
          _cons |   13.38548   .5664731    23.63   0.000     12.27252    14.49844
---------------------------------------------------------------------------------
17 / 49
Applications
Functional form specification
Interpretation: what is the effect of rooms on ln price?
Because the coefficient on rooms is negative and the coefficient on
rooms2 is positive, this equation implies that, at low values of rooms,
an additional room has a negative effect on ln price.
At some point, the effect becomes positive, and the quadratic shape
means that the semielasticity of price with respect to rooms is
increasing as rooms increases.
Turnaround value of rooms:
$x^* = \frac{.5451}{2(.0623)} \approx 4.4$
18 / 49
Applications
Functional form specification

Figure 6.2: log(price) as a quadratic function of rooms (the fitted curve decreases up to rooms = 4.4 and increases thereafter).
Source: Wooldridge (2013), Figure 6.2
19 / 49
Applications
Functional form specification
Only five of the 506 communities in the sample have houses averaging
4.4 rooms or less, about 1% of the sample. Hence, the quadratic to the
left of 4.4 can, for practical purposes, be ignored.
To the right of 4.4, adding another room has an increasing effect on the percentage change in price:
$\%\Delta\widehat{price} \approx 100\,[-.545 + 2(.062)\, rooms]\,\Delta rooms = (-54.5 + 12.4\, rooms)\,\Delta rooms$
An increase in rooms from, say, five to six increases price by about $-54.5 + 12.4(5) = 7.5\%$.
An increase from six to seven increases price by $-54.5 + 12.4(6) = 19.9\%$.
20 / 49
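The turnaround value and the room-by-room effects can be reproduced with the rounded coefficients from the slide:

```python
b_rooms, b_rooms2 = -0.545, 0.062  # rooms and rooms^2 coefficients (rounded)

def pct_per_room(rooms):
    # %change in price for one additional room: 100*(b3 + 2*b4*rooms)
    return 100 * (b_rooms + 2 * b_rooms2 * rooms)

turnaround = -b_rooms / (2 * b_rooms2)

assert round(turnaround, 1) == 4.4
assert round(pct_per_room(5), 1) == 7.5    # five -> six rooms
assert round(pct_per_room(6), 1) == 19.9   # six -> seven rooms
```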
Applications
Functional form specification
If the coefficients on the level and squared terms have the same sign
(either both positive or both negative) and the explanatory variable is
nonnegative, then there is no turning point for values x > 0.
Quadratic functions may also be used to allow for a nonconstant
elasticity.
Example:
$\ln price = \beta_0 + \beta_1 \ln nox + \beta_2 (\ln nox)^2 + ... + u.$  (6.15)
The elasticity depends on the level of nox:
$\%\Delta price \approx [\beta_1 + 2\beta_2 \ln nox]\,\%\Delta nox.$  (6.16)
Further (higher-order) polynomial terms can be included in regression models:
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \beta_4 x^4 + u.$
21 / 49
Applications
Functional form specification
6.2.3 Models with interaction terms
Sometimes, the partial effect, elasticity, or semielasticity of the
dependent variable with respect to an explanatory variable depends on
the magnitude of another explanatory variable.
Example: in the model
$price = \beta_0 + \beta_1 sqrft + \beta_2 bdrms + \beta_3 sqrft \cdot bdrms + \beta_4 bthrms + u$
the partial effect of bdrms on price is
$\frac{\Delta price}{\Delta bdrms} = \beta_2 + \beta_3 sqrft.$  (6.17)
Interaction effect between square footage and number of bedrooms: if $\beta_3 > 0$, then an additional bedroom yields a higher increase in housing price for larger houses.
22 / 49
Applications
Functional form specification
Example: did returns to education change between 1978 and 1985?
Consider the following wage regression:
$\ln wage = \beta_1 + \beta_2 y85 + \beta_3 educ + \beta_4 y85 \cdot educ + ... + u.$
Returns to education are:
$\frac{\Delta \ln wage}{\Delta educ} = \beta_3 + \beta_4 y85 = \begin{cases} \beta_3, & \text{if } y85 = 0; \\ \beta_3 + \beta_4, & \text{if } y85 = 1. \end{cases}$
23 / 49
Applications
Functional form specification
      Source |       SS       df       MS              Number of obs =    1084
-------------+------------------------------           F(  8,  1075) =   99.80
       Model |  135.992074     8  16.9990092           Prob > F      =  0.0000
    Residual |  183.099094  1075  .170324738           R-squared     =  0.4262
-------------+------------------------------           Adj R-squared =  0.4219
       Total |  319.091167  1083   .29463635           Root MSE      =   .4127

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         y85 |   .1178062   .1237817     0.95   0.341     -.125075    .3606874
        educ |   .0747209   .0066764    11.19   0.000     .0616206    .0878212
     y85educ |   .0184605   .0093542     1.97   0.049      .000106     .036815
             |  [output omitted]
       _cons |   .4589329   .0934485     4.91   0.000     .2755707     .642295
------------------------------------------------------------------------------

Returns to education in 1978: 7.47%
Returns to education in 1985: (.0747 + .0185) · 100 = 9.32%
Returns to education increased between 1978 and 1985 by $\hat\beta_4 = 0.0185$, i.e. by 1.85 percentage points.
24 / 49
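The returns-to-education arithmetic can be reproduced from the coefficients in the output above:

```python
b_educ, b_y85educ = 0.0747209, 0.0184605  # educ and y85*educ coefficients

ret_1978 = b_educ               # y85 = 0
ret_1985 = b_educ + b_y85educ   # y85 = 1

assert round(100 * ret_1978, 2) == 7.47
assert round(100 * ret_1985, 2) == 9.32
assert round(100 * (ret_1985 - ret_1978), 2) == 1.85  # percentage points
```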
Applications
Goodness-of-fit and selection of regressors
6.3.1 Adjusted R-squared

R-squared is the proportion of the total sample variation in y that is explained by $x_1, x_2, ..., x_k$:
$R^2 = 1 - \frac{SSR}{SST}.$
The size of R-squared does not affect unbiasedness.
R-squared never decreases when additional explanatory variables are added to the model, because SSR never goes up (and usually falls) as more variables are added.
The adjusted R-squared imposes a penalty for adding additional independent variables to a model:
$\bar R^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)} = 1 - \frac{\hat\sigma^2}{SST/(n-1)}.$  (6.21)
26 / 49
Applications
Goodness-of-fit and selection of regressors
$SSR/(n-k-1)$ can go up or down when a new independent variable is added to a regression.
If we add a new independent variable to a regression equation, $\bar R^2$ increases if, and only if, the t statistic on the new variable is greater than one in absolute value.
It holds that
$\bar R^2 = 1 - \frac{(1-R^2)(n-1)}{n-k-1}.$  (6.22)
$\bar R^2$ can be negative, indicating a very poor model fit relative to the number of degrees of freedom.
27 / 49
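Equation 6.22 makes both properties easy to see numerically. A sketch with illustrative (invented) values of R-squared, n, and k:

```python
def adj_r2(r2, n, k):
    # Equation 6.22: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# A regressor that barely raises R-squared can lower adjusted R-squared:
assert adj_r2(0.301, 100, 4) < adj_r2(0.300, 100, 3)
# Adjusted R-squared can be negative for a poor fit with many regressors:
assert adj_r2(0.05, 30, 10) < 0
```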
Applications
Goodness-of-fit and selection of regressors
Adjusted Rsquared can be used to choose between nonnested
models. (Two equations are nonnested models when neither equation
is a special case of the other.)
Example: explaining major league baseball players' salaries
Model 1: $\ln salary = \beta_0 + \beta_1 yrs + \beta_2 games + \beta_3 bavg + \beta_4 hrunsyr + u$, with $\bar R^2 = .6211$
Model 2: $\ln salary = \beta_0 + \beta_1 yrs + \beta_2 games + \beta_3 bavg + \beta_4 rbisyr + u$, with $\bar R^2 = .6226$
Based on the adjusted R-squared, there is a very slight preference for the model with rbisyr.
28 / 49
Applications
Goodness-of-fit and selection of regressors
Example: explaining R&D intensity
Model 1: $rdintens = \beta_0 + \beta_1 \ln sales + u$, with $R^2 = .061$, $\bar R^2 = .030$
Model 2: $rdintens = \beta_0 + \beta_1 sales + \beta_2 sales^2 + u$, with $R^2 = .148$, $\bar R^2 = .090$
The first model captures a diminishing return by including sales in logarithmic form; the second model does this by using a quadratic. Thus, the second model contains one more parameter than the first.
Neither $R^2$ nor $\bar R^2$ can be used to choose between different functional forms for the dependent variable.
29 / 49
Applications
Goodness-of-fit and selection of regressors
6.3.2 Selection of regressors
A long regression (i.e. one with many explanatory variables) is more likely to have a ceteris paribus interpretation than a short regression.
Furthermore, a long regression generates more precise estimates of the coefficients on the variables included in a short regression, because the additional covariates lead to a smaller residual variance.
However, it is also possible to control for too many variables in a regression analysis (overcontrolling).
30 / 49
Applications
Goodness-of-fit and selection of regressors
Example: impact of state beer taxes on traffic fatalities
Idea: a higher tax on beer will reduce alcohol consumption, and
likewise drunk driving, resulting in fewer traffic fatalities.
Model to measure the ceteris paribus effect of taxes on fatalities:
$fatalities = \beta_0 + \beta_1 tax + \beta_2 miles + \beta_3 percmale + \beta_4 perc16\_21 + ...,$
where
miles = total miles driven,
percmale = percentage of the state population that is male,
perc16_21 = percentage of the population between ages 16 and 21.
The model does not include a variable measuring per capita beer consumption. Are we committing an omitted-variables error?
No, because controlling for beer consumption would imply that we measure the difference in fatalities due to a tax increase, holding beer consumption fixed. This is not interesting.
31 / 49
Applications
Prediction
6.4.1 Confidence intervals for predictions

(a) CI for $E(y|x_1, ..., x_k)$ (for the average value of y for the subpopulation with a given set of covariates)
Predictions are subject to sampling variation because they are obtained using the OLS estimators.
Estimated equation:
$\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + ... + \hat\beta_k x_k.$  (6.27)
Plugging in particular values of the independent variables, we obtain a prediction for y. The parameter we would like to estimate is:
$\theta_0 = \beta_0 + \beta_1 c_1 + \beta_2 c_2 + ... + \beta_k c_k = E(y\,|\,x_1 = c_1, x_2 = c_2, ..., x_k = c_k).$  (6.28)
The estimator of $\theta_0$ is
$\hat\theta_0 = \hat\beta_0 + \hat\beta_1 c_1 + \hat\beta_2 c_2 + ... + \hat\beta_k c_k.$  (6.29)
33 / 49
Applications
Prediction
The uncertainty in this prediction is represented by a confidence interval for $\theta_0$.
With large df, we can construct a 95% confidence interval for $\theta_0$ using the rule of thumb $\hat\theta_0 \pm 2\, se(\hat\theta_0)$.
How do we obtain the standard error of $\hat\theta_0$? Trick:
Write $\beta_0 = \theta_0 - \beta_1 c_1 - \beta_2 c_2 - ... - \beta_k c_k$.
Plug this into
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k + u.$
This gives
$y = \theta_0 + \beta_1 (x_1 - c_1) + \beta_2 (x_2 - c_2) + ... + \beta_k (x_k - c_k) + u.$  (6.30)
That is, we run a regression where we subtract the value $c_j$ from each observation on $x_j$.
The predicted value and its standard error are obtained from the intercept in regression 6.30.
34 / 49
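The recentering trick in 6.30 can be verified on simulated data: the intercept of the recentered regression equals the plug-in prediction, and its standard error is the standard error of that prediction. A numpy sketch (all values simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(0, 1, n)
c1, c2 = 0.8, -1.2  # covariate values at which we want the prediction

def fit(cols, y):
    """OLS with intercept: coefficients and standard errors."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, se

b, _ = fit([x1, x2], y)
theta_hat = b[0] + b[1] * c1 + b[2] * c2  # plug-in prediction, eq. 6.29

b_c, se_c = fit([x1 - c1, x2 - c2], y)    # recentered regression, eq. 6.30

assert np.isclose(b_c[0], theta_hat)  # intercept equals the predicted value
se_theta = se_c[0]                    # its se is se(theta_hat), for the CI
```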
Applications
Prediction
Example: confidence interval for predicted college GPA
Estimation results for predicting college GPA:

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030504     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         sat |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
      hsperc |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
       hsize |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
     hsizesq |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   1.492652   .0753414    19.81   0.000     1.344942    1.640362
------------------------------------------------------------------------------

Note: definition of variables is colgpa = GPA after fall semester, sat = combined SAT score, hsperc = high school percentile (from top), hsize = size of graduating class (in 100s).
35 / 49
Applications
Prediction
What is predicted college GPA when sat = 1,200, hsperc = 30, and hsize = 5 (which means 500)?
Define a new set of independent variables: sat0 = sat − 1,200, hsperc0 = hsperc − 30, hsize0 = hsize − 5, and hsizesq0 = hsize² − 25.

      Source |       SS       df       MS              Number of obs =    4137
-------------+------------------------------           F(  4,  4132) =  398.02
       Model |  499.030503     4  124.757626           Prob > F      =  0.0000
    Residual |  1295.16517  4132  .313447524           R-squared     =  0.2781
-------------+------------------------------           Adj R-squared =  0.2774
       Total |  1794.19567  4136  .433799728           Root MSE      =  .55986

------------------------------------------------------------------------------
      colgpa |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        sat0 |   .0014925   .0000652    22.89   0.000     .0013646    .0016204
     hsperc0 |  -.0138558    .000561   -24.70   0.000    -.0149557   -.0127559
      hsize0 |  -.0608815   .0165012    -3.69   0.000    -.0932328   -.0285302
    hsizesq0 |   .0054603   .0022698     2.41   0.016     .0010102    .0099104
       _cons |   2.700075   .0198778   135.83   0.000     2.661104    2.739047
------------------------------------------------------------------------------
36 / 49
Applications
Prediction
The variance of the prediction is smallest at the mean values of the $x_j$ (because the variance of the intercept estimator is smallest when each explanatory variable has zero sample mean).

(b) CI for a particular unit from the population: prediction interval
In forming a confidence interval for an unknown outcome on y, we must account for the variance of the unobserved error.
Let $y^0$ be the value for an individual not in our original sample.
Let $x_1^0, x_2^0, ..., x_k^0$ be the new values of the independent variables.
Let $u^0$ be the unobserved error.
Model for observation $(y^0, x_1^0, ..., x_k^0)$:
$y^0 = \beta_0 + \beta_1 x_1^0 + \beta_2 x_2^0 + ... + \beta_k x_k^0 + u^0.$  (6.33)
Prediction:
$\hat y^0 = \hat\beta_0 + \hat\beta_1 x_1^0 + \hat\beta_2 x_2^0 + ... + \hat\beta_k x_k^0.$
Prediction error:
$\hat e^0 = y^0 - \hat y^0 = (\beta_0 + \beta_1 x_1^0 + \beta_2 x_2^0 + ... + \beta_k x_k^0) + u^0 - \hat y^0.$  (6.34)
37 / 49
Applications
Prediction
The expected prediction error is zero, $E(\hat e^0) = 0$, because $E(\hat y^0) = E(y^0)$ (as the $\hat\beta_j$ are unbiased) and $u^0$ has zero mean.
The variance of the prediction error is the sum of the variances, because $u^0$ and $\hat y^0$ are uncorrelated:
$Var(\hat e^0) = Var(\hat y^0) + Var(u^0) = Var(\hat y^0) + \sigma^2.$  (6.35)
There are two sources of variation in $\hat e^0$:
1. Sampling error in $\hat y^0$, which arises because we have estimated the $\beta_j$; it decreases with the sample size.
2. $\sigma^2$, the variance of the error in the population; it does not change with the sample size.
Standard error of $\hat e^0$:
$se(\hat e^0) = \{[se(\hat y^0)]^2 + \hat\sigma^2\}^{1/2}.$  (6.36)
38 / 49
Applications
Prediction
It holds that $\hat e^0 / se(\hat e^0)$ has a t distribution with $n - k - 1$ degrees of freedom.
Therefore,
$P\left(-t_{\alpha/2} \le \frac{\hat e^0}{se(\hat e^0)} \le t_{\alpha/2}\right) = 1 - \alpha$
$P\left(-t_{\alpha/2} \le \frac{y^0 - \hat y^0}{se(\hat e^0)} \le t_{\alpha/2}\right) = 1 - \alpha$
$P\left(\hat y^0 - t_{\alpha/2}\, se(\hat e^0) \le y^0 \le \hat y^0 + t_{\alpha/2}\, se(\hat e^0)\right) = 1 - \alpha$
39 / 49
Applications
Prediction
Example: prediction interval (for GPA) for any particular student
Estimation results: the same recentered regression as on the previous slide (intercept $\hat\theta_0 = 2.700075$ with se .0198778; Root MSE = .55986).
$se(\hat e^0) = [(.020)^2 + (.560)^2]^{1/2} \approx .560.$
Prediction interval: $2.70 \pm 1.96 \cdot .560 = [1.60,\ 3.80].$
40 / 49
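The interval arithmetic for the GPA example, reproduced in code (values rounded as on the slide):

```python
import math

se_theta = 0.020    # se of the predicted mean (intercept of the recentered reg.)
sigma_hat = 0.560   # Root MSE
y_hat = 2.70

se_e0 = math.sqrt(se_theta**2 + sigma_hat**2)  # equation 6.36
lo = y_hat - 1.96 * se_e0
hi = y_hat + 1.96 * se_e0

assert round(se_e0, 3) == 0.560  # dominated by the error variance
assert round(lo, 2) == 1.60
assert round(hi, 2) == 3.80
```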
Applications
Prediction
6.4.2 Predicting y when ln y is the dependent variable

Given the OLS estimators, we can predict ln y for any value of the explanatory variables:
$\widehat{\ln y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + ... + \hat\beta_k x_k.$  (6.39)
How to predict y?
N.B.: $\hat y \ne \exp(\widehat{\ln y})$. Hence, simply exponentiating the predicted value of ln y does not work: it systematically underestimates the expected value of y.
It can be shown that
$E(y|x) = \exp(\sigma^2/2) \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k),$
where $\sigma^2$ is the variance of u.
41 / 49
Applications
Prediction
Hence, the prediction of y is:
$\hat y = \exp(\hat\sigma^2/2) \exp(\widehat{\ln y}),$  (6.40)
where $\hat\sigma^2$ is the unbiased estimator of $\sigma^2$.
The prediction in 6.40 relies on the normality of the error term, u.
How to obtain a prediction that does not rely on normality?
General model:
$E(y|x) = \alpha_0 \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_k x_k),$  (6.41)
where $\alpha_0$ is the expected value of $\exp(u)$.
Given an estimate $\hat\alpha_0$, we can predict y as
$\hat y = \hat\alpha_0 \exp(\widehat{\ln y}).$  (6.42)
42 / 49
Applications
Prediction
First approach to estimating $\alpha_0$: a consistent, but not unbiased, smearing estimate is
$\hat\alpha_0 = n^{-1} \sum_{i=1}^{n} \exp(\hat u_i).$  (6.43)
Second approach to estimating $\alpha_0$:
Define $m_i = \exp(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + ... + \beta_k x_{ik})$.
Replace the $\beta_j$ with their OLS estimates and obtain $\hat m_i = \exp(\widehat{\ln y_i})$.
Estimate a simple regression of $y_i$ on $\hat m_i$ without an intercept. The slope estimate is a consistent, but not unbiased, estimate of $\alpha_0$.
With a consistent estimate of $\alpha_0$, the prediction for y can be calculated as $\hat\alpha_0 \exp(\widehat{\ln y})$.
43 / 49
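Both estimators of $\alpha_0$ can be illustrated on simulated data with a known lognormal error (all numbers invented, not the CEO data). With $u \sim N(0, .25)$, both should land near $\exp(\sigma^2/2) = \exp(.125) \approx 1.133$, and the corrected prediction tracks the mean of y better than the naive one:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0, 2, n)
u = rng.normal(0, 0.5, n)        # sigma^2 = .25, so E[exp(u)] = exp(.125)
y = np.exp(1.0 + 0.8 * x + u)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
uhat = np.log(y) - X @ b         # OLS residuals from the log regression

alpha0_smear = np.exp(uhat).mean()           # smearing estimate, eq. 6.43

m_hat = np.exp(X @ b)                        # m_i hat = exp(predicted ln y_i)
alpha0_reg = (m_hat @ y) / (m_hat @ m_hat)   # slope of y on m_hat, no intercept

naive = np.exp(X @ b)                        # biased downward on average
corrected = alpha0_smear * naive

assert abs(alpha0_smear - np.exp(0.125)) < 0.08
assert abs(alpha0_reg - np.exp(0.125)) < 0.08
assert abs(corrected.mean() - y.mean()) < abs(naive.mean() - y.mean())
```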
Applications
Prediction
Example: predicting CEO salaries
Model:
$\ln salary = \beta_0 + \beta_1 \ln sales + \beta_2 \ln mktval + \beta_3 ceoten + u$
Estimation results:

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  3,   173) =   26.91
       Model |  20.5672434     3  6.85574779           Prob > F      =  0.0000
    Residual |  44.0789697   173  .254791732           R-squared     =  0.3182
-------------+------------------------------           Adj R-squared =  0.3063
       Total |  64.6462131   176  .367308029           Root MSE      =  .50477

------------------------------------------------------------------------------
     lsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      lsales |   .1628545   .0392421     4.15   0.000     .0853995    .2403094
     lmktval |    .109243   .0495947     2.20   0.029     .0113545    .2071315
      ceoten |   .0117054   .0053261     2.20   0.029      .001193    .0222178
       _cons |   4.503795   .2572344    17.51   0.000     3.996073    5.011517
------------------------------------------------------------------------------
44 / 49
Applications
Prediction
The smearing estimate of $\alpha_0$ is:

. predict uhat, res
. gen euhat = exp(uhat)
. su euhat

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       euhat |       177    1.135661    .6970541   .0823372   6.378018
45 / 49
Applications
Prediction
The regression estimate of $\alpha_0$ is:

. predict lsalary_hat
(option xb assumed; fitted values)
. gen m_hat = exp(lsalary_hat)
. reg salary m_hat, nocons

      Source |       SS       df       MS              Number of obs =     177
-------------+------------------------------           F(  1,   176) =  562.39
       Model |   147352711     1   147352711           Prob > F      =  0.0000
    Residual |    46113901   176  262010.801           R-squared     =  0.7616
-------------+------------------------------           Adj R-squared =  0.7603
       Total |   193466612   177  1093031.71           Root MSE      =  511.87

------------------------------------------------------------------------------
      salary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       m_hat |   1.116857   .0470953    23.71   0.000     1.023912    1.209801
------------------------------------------------------------------------------
46 / 49
Applications
Prediction
Prediction for sales = 5,000 (which means $5 billion, because sales is in millions), mktval = 10,000 (or $10 billion), and ceoten = 10:
$\widehat{\ln salary} = 4.504 + 0.163 \ln(5000) + 0.109 \ln(10000) + 0.0117 \cdot 10 = 7.013.$
Naive prediction: $\exp(7.013) = 1110.983$.
Prediction using the smearing estimate: $1.136 \cdot \exp(7.013) = 1262.076$.
Prediction using the regression estimate: $1.117 \cdot \exp(7.013) = 1240.967$.
47 / 49
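The three predictions can be reproduced from the rounded coefficients (salary is measured in $1,000s):

```python
import math

lsalary_hat = (4.504 + 0.163 * math.log(5000)
               + 0.109 * math.log(10000) + 0.0117 * 10)

naive = math.exp(lsalary_hat)  # ignores the retransformation bias
smear = 1.136 * naive          # smearing estimate of alpha0
regr = 1.117 * naive           # regression estimate of alpha0

assert round(lsalary_hat, 3) == 7.013
assert round(naive) == 1111    # about $1,111,000
assert round(smear) == 1262
assert round(regr) == 1241
```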
Key terms

adjusted R-squared
interaction effect
nonnested models
overcontrolling
prediction error
prediction interval
predictions
quadratic functions
smearing estimate
variance of the prediction error
48 / 49
References

Textbook: Chapter 6 in Wooldridge (2013).
Further readings: Chapter 8 and Chapter 9 in Stock and Watson (2012); Chapter 6 and Chapter 10 in Hill et al. (2001).
Hill, R. C., Griffiths, W. E., and Judge, G. G. (2001). Undergraduate
Econometrics. John Wiley & Sons, New York.
Stock, J. H. and Watson, M. W. (2012). Introduction to Econometrics.
Pearson, Boston.
Wooldridge, J. M. (2013). Introductory Econometrics: A Modern Approach.
Cengage Learning, Mason, OH.
49 / 49