Mark Schaffer
version of 21.10.2016
Introduction
In this lab, you reproduce and extend the results discussed in the lecture using Nerlove's data
on 145 electricity generation firms. The data and estimations are discussed extensively in
Hayashi, Chapter 1. The assignment that follows the lab is also based on this dataset.
Reminders from the lecture notes/Hayashi: the key variables are output (electricity in kWh),
total costs (labour, capital, fuel), and factor prices (price of labour, capital, fuel). We assume
firms are cost-minimizers (output is set exogenously) and price-takers.
Cobb-Douglas production function:

Q = A x1^α1 x2^α2 x3^α3

where x1, x2, x3 are the inputs of labour, capital and fuel and A is the firm's productivity
level. Cost minimization implies a log cost function of the form

ln TC = μ + (1/r) ln Q + (α1/r) ln p1 + (α2/r) ln p2 + (α3/r) ln p3 + ε

where p1 is the wage (price of labour), p2 is the price of capital, p3 is the price of fuel, and
the degree of returns to scale r is defined as the sum of the labour, capital and fuel
elasticities: r = α1 + α2 + α3.
Estimating equation:

ln TC = β1 + β2 ln Q + β3 ln p1 + β4 ln p2 + β5 ln p3 + ε

The error ε can be interpreted as the inverse of the firm's productivity level (in logs) relative
to the industry average, so a positive ε means below-average productivity and above-average
costs. The coefficient on log output, β2, can be interpreted as the inverse of the degree of
returns to scale r.
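Matching coefficients between the estimating equation and the cost function derived from the Cobb-Douglas technology (a sketch; the full derivation is in Hayashi, Chapter 1) gives

```latex
\beta_2 = \frac{1}{r}, \qquad
\beta_3 = \frac{\alpha_1}{r}, \qquad
\beta_4 = \frac{\alpha_2}{r}, \qquad
\beta_5 = \frac{\alpha_3}{r}
\quad\Longrightarrow\quad
\beta_3 + \beta_4 + \beta_5 = \frac{\alpha_1+\alpha_2+\alpha_3}{r} = 1
```

so linear homogeneity of the cost function in factor prices implies the testable restriction β3 + β4 + β5 = 1.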
Preparation
Run the lab5_prep.do file. The do file loads the dataset (stored at Boston College) and
creates log versions of the various variables: lnTC, lnQ, lnp1, lnp2, lnp3.
The do file also creates an identifier gid for 5 size categories based on output. The code is:
sort output
gen gid=ceil(_n/29)
label var gid "group identifier"
tab gid
The special built-in variable _n is the number of the current observation. The ceil(.)
function rounds up. The result is a variable gid that is =1 for the first 29 firms, =2 for the
second 29 firms, ..., =5 for the last 29 firms.
The last part of the do file puts the variables in a specific order (for convenience only) and
then lists and summarizes the data.
Discuss the results. What do you conclude? Is there strong evidence of homogeneity? Why
or why not?
What is the connection between the two ways of testing homogeneity? If you test the lower
and upper limits of the 95% confidence interval from lincom, what do you get as p-values?
(Remember: Stata's "test" command reports 2-sided tests by default.) Discuss. In what sense
are the two methods different? Which method of testing homogeneity is more informative?
test lnp1+lnp2+lnp3 = -0.2889876
test lnp1+lnp2+lnp3 = 1.574927
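The endpoint tests above exploit the duality between two-sided tests and confidence intervals: a 95% confidence interval for θ = β3 + β4 + β5 collects exactly those hypothesized values that a 5%-level two-sided test fails to reject,

```latex
\theta_0 \in \mathrm{CI}_{95\%}
\;\Longleftrightarrow\;
\text{p-value of } H_0 : \theta = \theta_0 \;\geq\; 0.05
```

so testing the two interval endpoints themselves should return p-values of (approximately, up to rounding of the reported limits) 0.05.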
The restricted equation has only 4 coefficients, which is the number of underlying parameters
in the model (the 3 factor elasticities in the Cobb-Douglas production function plus the
measure of average productivity μ):

ln(TC/p3) = β1 + β2 ln Q + β3 ln(p1/p3) + β4 ln(p2/p3) + ε
Open the lab5_task2.do file. Run the code at the top that creates the new dependent
variable and the two new regressors. Then estimate the restricted model using this trick, and
recover the coefficient on the fuel price from the homogeneity assumption β3 + β4 + β5 = 1:
gen double lnTCp3 = lnTC - lnp3
gen double lnp1p3 = lnp1 - lnp3
gen double lnp2p3 = lnp2 - lnp3
reg lnTCp3 lnQ lnp1p3 lnp2p3
lincom 1 - lnp1p3 - lnp2p3
Alternate method: use Stata's facility for linear constraints and the cnsreg (constrained
regression) command:
constraint 1 lnp1+lnp2+lnp3=1
cnsreg lnTC lnQ lnp1 lnp2 lnp3, constraint(1)
Discuss the results. Has much changed between the unrestricted and restricted version?
Now test homogeneity again, but using the LR Principle (contrast the values of the minimized
objective functions of the unrestricted and restricted estimations). To do this, estimate both
versions and save the residual sums of squares as global macros. Also save the estimated
error variance from the unrestricted estimation. Note that the prefix qui (quietly) suppresses
the output from the estimation:
qui reg lnTC lnQ lnp1 lnp2 lnp3
global RSSU=e(rss)
global sigma2=e(rmse)^2
qui reg lnTCp3 lnQ lnp1p3 lnp2p3
global RSSR=e(rss)
Two ways of calculating the F statistic based on the LR Principle are below; the second uses
the estimated error variance from the unrestricted estimation $sigma2. How many numerator
and denominator degrees of freedom do we have?
di (($RSSR-$RSSU)/1) / ($RSSU/(145-5))
di (($RSSR-$RSSU)/1) / $sigma2
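Both di lines implement the same LR-principle statistic (written here as a sketch, using SSR_R and SSR_U for the restricted and unrestricted residual sums of squares, #r for the number of restrictions, and K for the number of unrestricted coefficients):

```latex
F \;=\; \frac{(SSR_R - SSR_U)/\#\mathbf{r}}{SSR_U/(n-K)}
```

The second line simply substitutes the unrestricted s² = RMSE², saved as $sigma2, for the denominator SSR_U/(n−K), which is the same number.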
Confirm these are identical to the Wald test statistic from Task 1. Display the saved macro
r(F) to see more decimal places.
qui reg lnTC lnQ lnp1 lnp2 lnp3
test lnp1+lnp2+lnp3=1
di r(F)
The confidence interval for β2 is in effect a confidence interval for 1/r. This implies a
confidence interval for r. What is this implied confidence interval for the degree of returns to
scale? Do the calculation by hand and discuss your results.
di 1/0.7549262
di 1/0.685862
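The same by-hand calculation, sketched outside Stata in Python (the numbers 0.685862 and 0.7549262 are taken to be the lower and upper 95% confidence limits for β2 from the regression output, matching the di commands above):

```python
# Implied 95% CI for the degree of returns to scale, r = 1/beta2.
# Since g(b) = 1/b is decreasing for b > 0, the interval endpoints
# for beta2 map to the endpoints for r in reverse order.
lo_b2, hi_b2 = 0.685862, 0.7549262      # 95% CI for beta2
lo_r, hi_r = 1 / hi_b2, 1 / lo_b2       # implied 95% CI for r

print(f"r in [{lo_r:.4f}, {hi_r:.4f}]")  # r in [1.3246, 1.4580]
```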
Which method of testing for constant returns is more informative and why? What do you
conclude about returns to scale in electricity generation?
We can also use the delta method, implemented in Stata's nlcom command, to obtain directly
a confidence interval for r:

qui reg lnTCp3 lnQ lnp1p3 lnp2p3
nlcom 1/_b[lnQ]
The two confidence intervals are close but not exactly the same. Why not?
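As background for this comparison: after a regression, nlcom builds its interval with the delta method, approximating the variance of a nonlinear function of the estimate by a first-order expansion,

```latex
\widehat{\operatorname{Var}}\!\left(\frac{1}{\hat\beta_2}\right)
\;\approx\;
\left(-\frac{1}{\hat\beta_2^{\,2}}\right)^{\!2}
\widehat{\operatorname{Var}}\!\left(\hat\beta_2\right)
```

and then reports an interval symmetric around 1/β̂2, whereas inverting the endpoints of the confidence interval for β2 produces an interval that is not symmetric around 1/β̂2.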
We can also recover the underlying production function elasticities from the estimation.
Since β2 = 1/r, β3 = α1/r, β4 = α2/r and β5 = α3/r, therefore α1 = β3/β2, α2 = β4/β2, and
α3 = β5/β2:

di _b[lnp1p3]/_b[lnQ]
di _b[lnp2p3]/_b[lnQ]
di (1-_b[lnp1p3]-_b[lnp2p3])/_b[lnQ]
To obtain standard errors and confidence intervals for the estimated elasticities, we need to
use the delta method and Stata's nlcom (why?):
nlcom _b[lnp1p3]/_b[lnQ]
nlcom _b[lnp2p3]/_b[lnQ]
nlcom (1-_b[lnp1p3]-_b[lnp2p3])/_b[lnQ]
[Figure: residuals plotted against log output]
Nerlove explored this by splitting his sample of 145 firms according to size (output) into 5
groups of 29 firms each. We labelled firms earlier by creating a variable gid (group id),
which is =1 for the smallest group, ..., =5 for the largest group.
In this task we estimate the cost function equation for each group separately and test whether
certain restrictions across groups can be rejected or not.
Summarize output and log output by group:
tabstat output lnQ, by(gid) stats(mean median min max)
Estimate the restricted model group by group, smallest to largest. What do you see
happening to the estimated β2 and σ̂² as firms get larger? (Reminders: r = 1/β2;
Root MSE² = σ̂².)

reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==1
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==2
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==3
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==4
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==5
Use a Stata loop to display just the estimated β2 and σ̂². This illustrates the use of a local
macro: the macro `i' is local because it disappears after the loop is done running. The
_col(.) option controls the column where the number appears; %6.3f determines the format
of the number.

forvalues i=1/5 {
	qui reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==`i'
	di "Group=`i'"                           ///
		_col(20) "beta2="   %6.3f _b[lnQ]    ///
		_col(40) "sigma^2=" %6.3f e(rmse)^2
}
To estimate these 5 equations as a single system, we need to interact dummy variables for the
5 groups with the 3 explanatory variables. We will also use the dummies as intercepts. We
can do this using Stata's xi prefix command and the factor-variable prefix i.
The following line creates 5 dummies called Dgid_1, Dgid_2, ..., Dgid_5 based on our group
id variable gid. The noomit option says to create all possible dummies and not omit the base
category.
xi i.gid, prefix(D) noomit
To create interactions, we use the same command. The | operator says to create only the
interactions between the indicator variable and the continuous variable (if we used the *
operator instead, it would also create the level dummies, but we already have those from the
previous command).
xi i.gid|lnQ i.gid|lnp1p3 i.gid|lnp2p3, prefix(D) noomit
Reorder the variables so that the gid=1 interactions are all together, the gid=2 interactions,
etc. The very first variable is the group id. In each group, put the dummy for group id last.
This is to be consistent with Stata's convention of putting the intercept last in the OLS output.
order gid                                       ///
	DgidXlnQ_1 DgidXlnp1p_1 DgidXlnp2p_1 Dgid_1 ///
	DgidXlnQ_2 DgidXlnp1p_2 DgidXlnp2p_2 Dgid_2 ///
	DgidXlnQ_3 DgidXlnp1p_3 DgidXlnp2p_3 Dgid_3 ///
	DgidXlnQ_4 DgidXlnp1p_4 DgidXlnp2p_4 Dgid_4 ///
	DgidXlnQ_5 DgidXlnp1p_5 DgidXlnp2p_5 Dgid_5
DO NOT SKIP THIS STEP. Click on the data browser icon in the main Stata window.
Look for the interacted variables - they all start with D. Note the block-diagonal structure of
the regressors.
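With the data ordered by group, the regressor matrix of the stacked system has the block-diagonal form (a sketch, writing X_g for the 29×4 block [lnQ, lnp1p3, lnp2p3, 1] belonging to group g):

```latex
X \;=\;
\begin{pmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & X_5
\end{pmatrix}
```

This block-diagonal structure is why a single OLS regression on the full set of interactions reproduces the five group-by-group regressions exactly.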
Estimate the 5 equations in two ways: (1) One at a time, using if. (2) As a single
stacked system of 5 equations.

reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==1
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==2
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==3
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==4
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==5

reg lnTCp3                                      ///
	DgidXlnQ_1 DgidXlnp1p_1 DgidXlnp2p_1 Dgid_1 ///
	DgidXlnQ_2 DgidXlnp1p_2 DgidXlnp2p_2 Dgid_2 ///
	DgidXlnQ_3 DgidXlnp1p_3 DgidXlnp2p_3 Dgid_3 ///
	DgidXlnQ_4 DgidXlnp1p_4 DgidXlnp2p_4 Dgid_4 ///
	DgidXlnQ_5 DgidXlnp1p_5 DgidXlnp2p_5 Dgid_5 ///
	, nocons
The single system can also be estimated using Stata's factor variables:
reg lnTCp3 ibn.gid ibn.gid#c.(lnQ lnp1p3 lnp2p3), nocons