Mark Schaffer
version of 21.10.2016
Introduction
In this lab, you reproduce and extend the results discussed in the lecture using Nerlove's data
on 145 electricity generation firms. The data and estimations are discussed extensively in
Hayashi, Chapter 1. The assignment that follows the lab is also based on this dataset.
Reminders from the lecture notes/Hayashi: the key variables are output (electricity in kWh),
total costs (labour, capital, fuel), and factor prices (price of labour, capital, fuel). We assume
firms are cost-minimizers (output is set exogenously) and price-takers.
Cobb-Douglas production function:

Q = A x1^α1 x2^α2 x3^α3

where x1, x2, x3 are the inputs of labour, capital and fuel and A is the firm's productivity
level. Cost minimization implies a log cost function of the form

ln TC = μ + (1/r) ln Q + (α1/r) ln p1 + (α2/r) ln p2 + (α3/r) ln p3 + ε

where p1 is the wage (price of labour), p2 is the price of capital, p3 is the price of fuel, and
the degree of returns to scale r is defined as the sum of the labour, capital and fuel
elasticities: r = α1 + α2 + α3.
Estimating equation:

ln TC = β1 + β2 ln Q + β3 ln p1 + β4 ln p2 + β5 ln p3 + ε

The error ε can be interpreted as the inverse of the firm's productivity level (in logs) relative
to the industry average, so a positive ε means below-average productivity and above-average
costs. The coefficient on log output, β2, can be interpreted as the inverse of the degree of
returns to scale r.
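Matching coefficients between the estimating equation and the cost function derived from the Cobb-Douglas technology (a sketch; the full derivation is in Hayashi, Chapter 1) gives

```latex
\beta_2 = \frac{1}{r}, \qquad
\beta_3 = \frac{\alpha_1}{r}, \qquad
\beta_4 = \frac{\alpha_2}{r}, \qquad
\beta_5 = \frac{\alpha_3}{r}
\quad\Longrightarrow\quad
\beta_3 + \beta_4 + \beta_5 = \frac{\alpha_1+\alpha_2+\alpha_3}{r} = 1
```

so linear homogeneity of the cost function in factor prices implies the testable restriction β3 + β4 + β5 = 1.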
Preparation
Run the lab5_prep.do file. The do file loads the dataset (stored at Boston College) and
creates log versions of the various variables: lnTC, lnQ, lnp1, lnp2, lnp3.
The do file also creates an identifier gid for 5 size categories based on output. The code is:
sort output
gen gid=ceil(_n/29)
label var gid "group identifier"
tab gid
The special built-in variable _n is the number of the current observation. The ceil(.)
function rounds up. The result is a variable gid that is =1 for the first 29 firms, =2 for the
second 29 firms, ..., =5 for the last 29 firms.
The last part of the do file puts the variables in a specific order (for convenience only) and
then lists and summarizes the data.
Discuss the results. What do you conclude? Is there strong evidence of homogeneity? Why
or why not?
What is the connection between the two ways of testing homogeneity? If you test the lower
and upper limits of the 95% confidence interval from lincom, what do you get as p-values?
(Remember: Stata's "test" command reports 2-sided tests by default.) Discuss. In what sense
are the two methods different? Which method of testing homogeneity is more informative?
test lnp1+lnp2+lnp3 = -0.2889876
test lnp1+lnp2+lnp3 = 1.574927
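The endpoint tests above exploit the duality between two-sided tests and confidence intervals: a 95% confidence interval for θ = β3 + β4 + β5 collects exactly those hypothesized values that a 5%-level two-sided test fails to reject,

```latex
\theta_0 \in \mathrm{CI}_{95\%}
\;\Longleftrightarrow\;
\text{p-value of } H_0 : \theta = \theta_0 \;\geq\; 0.05
```

so testing the two interval endpoints themselves should return p-values of (approximately, up to rounding of the reported limits) 0.05.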
The restricted equation has only 4 coefficients, which is the number of underlying parameters
in the model (the 3 factor elasticities in the Cobb-Douglas production function plus the
measure of average productivity μ):

ln(TC/p3) = β1 + β2 ln Q + β3 ln(p1/p3) + β4 ln(p2/p3) + ε
Open the lab5_task2.do file. Run the code at the top that creates the new dependent
variable and the two new regressors. Then estimate the restricted model using this trick, and
recover the coefficient on the fuel price from the homogeneity assumption β3 + β4 + β5 = 1:
gen double lnTCp3 = lnTC - lnp3
gen double lnp1p3 = lnp1 - lnp3
gen double lnp2p3 = lnp2 - lnp3
reg lnTCp3 lnQ lnp1p3 lnp2p3
lincom 1 - lnp1p3 - lnp2p3
Alternate method: use Stata's facility for linear constraints and the cnsreg (constrained
regression) command:
constraint 1 lnp1+lnp2+lnp3=1
cnsreg lnTC lnQ lnp1 lnp2 lnp3, constraint(1)
Discuss the results. Has much changed between the unrestricted and restricted version?
Now test homogeneity again, but using the LR Principle (contrast the values of the minimized
objective functions of the unrestricted and restricted estimations). To do this, estimate both
versions and save the residual sums of squares as global macros. Also save the estimated
error variance from the unrestricted estimation. Note that the prefix qui (quietly) suppresses
the output from the estimation:
qui reg lnTC lnQ lnp1 lnp2 lnp3
global RSSU=e(rss)
global sigma2=e(rmse)^2
qui reg lnTCp3 lnQ lnp1p3 lnp2p3
global RSSR=e(rss)
Two ways of calculating the F statistic based on the LR Principle are below; the second uses
the estimated error variance from the unrestricted estimation $sigma2. How many numerator
and denominator degrees of freedom do we have?
di (($RSSR-$RSSU)/1) / ($RSSU/(145-5))
di (($RSSR-$RSSU)/1) / $sigma2
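Both di lines implement the same LR-principle statistic (written here as a sketch, using SSR_R and SSR_U for the restricted and unrestricted residual sums of squares, #r for the number of restrictions, and K for the number of unrestricted coefficients):

```latex
F \;=\; \frac{(SSR_R - SSR_U)/\#\mathbf{r}}{SSR_U/(n-K)}
```

The second line simply substitutes the unrestricted s² = RMSE², saved as $sigma2, for the denominator SSR_U/(n−K), which is the same number.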
Confirm these are identical to the Wald test statistic from Task 1. Display the saved macro
r(F) to see more decimal places.
qui reg lnTC lnQ lnp1 lnp2 lnp3
test lnp1+lnp2+lnp3=1
di r(F)
The confidence interval for β2 is in effect a confidence interval for 1/r. This implies a
confidence interval for r. What is this implied confidence interval for the degree of returns to
scale? Do the calculation by hand and discuss your results.
di 1/0.7549262
di 1/0.685862
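The same by-hand calculation, sketched outside Stata in Python (the numbers 0.685862 and 0.7549262 are taken to be the lower and upper 95% confidence limits for β2 from the regression output, matching the di commands above):

```python
# Implied 95% CI for the degree of returns to scale, r = 1/beta2.
# Since g(b) = 1/b is decreasing for b > 0, the interval endpoints
# for beta2 map to the endpoints for r in reverse order.
lo_b2, hi_b2 = 0.685862, 0.7549262      # 95% CI for beta2
lo_r, hi_r = 1 / hi_b2, 1 / lo_b2       # implied 95% CI for r

print(f"r in [{lo_r:.4f}, {hi_r:.4f}]")  # r in [1.3246, 1.4580]
```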
Which method of testing for constant returns is more informative and why? What do you
conclude about returns to scale in electricity generation?
We can also use the delta method, implemented in Stata's nlcom command, to obtain directly
a confidence interval for r:

qui reg lnTCp3 lnQ lnp1p3 lnp2p3
nlcom 1/_b[lnQ]
The two confidence intervals are close but not exactly the same. Why not?
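As background for this comparison: after a regression, nlcom builds its interval with the delta method, approximating the variance of a nonlinear function of the estimate by a first-order expansion,

```latex
\widehat{\operatorname{Var}}\!\left(\frac{1}{\hat\beta_2}\right)
\;\approx\;
\left(-\frac{1}{\hat\beta_2^{\,2}}\right)^{\!2}
\widehat{\operatorname{Var}}\!\left(\hat\beta_2\right)
```

and then reports an interval symmetric around 1/β̂2, whereas inverting the endpoints of the confidence interval for β2 produces an interval that is not symmetric around 1/β̂2.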
We can also recover the underlying production function elasticities from the estimation.
Since β2 = 1/r, β3 = α1/r, β4 = α2/r and β5 = α3/r, therefore α1 = β3/β2, α2 = β4/β2, and
α3 = β5/β2:

di _b[lnp1p3]/_b[lnQ]
di _b[lnp2p3]/_b[lnQ]
di (1-_b[lnp1p3]-_b[lnp2p3])/_b[lnQ]
To obtain standard errors and confidence intervals for the estimated elasticities, we need to
use the delta method and Stata's nlcom (why?):
nlcom _b[lnp1p3]/_b[lnQ]
nlcom _b[lnp2p3]/_b[lnQ]
nlcom (1-_b[lnp1p3]-_b[lnp2p3])/_b[lnQ]
[Figure: residuals plotted against log output]
Nerlove explored this by splitting his sample of 145 firms according to size (output) into 5
groups of 29 firms each. We labelled firms earlier by creating a variable gid (group id),
which is =1 for the smallest group, ..., =5 for the largest group.
In this task we estimate the cost function equation for each group separately and test whether
certain restrictions across groups can be rejected or not.
Summarize output and log output by group:
tabstat output lnQ, by(gid) stats(mean median min max)
Estimate the restricted model group by group, smallest to largest. What do you see
happening to the estimated β2 and σ̂² as firms get larger? (Reminders: r = 1/β2;
Root MSE² = σ̂².)

reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==1
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==2
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==3
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==4
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==5
Use a Stata loop to display just the estimated β2 and σ̂². This illustrates the use of a local
macro: the macro `i' is local because it disappears after the loop is done running. The
_col(.) option controls the column where the number appears; %6.3f determines the format
of the number.

forvalues i=1/5 {
	qui reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==`i'
	di "Group=`i'"                           ///
		_col(20) "beta2="   %6.3f _b[lnQ]    ///
		_col(40) "sigma^2=" %6.3f e(rmse)^2
}
To estimate these 5 equations as a single system, we need to interact dummy variables for the
5 groups with the 3 explanatory variables. We will also use the dummies as intercepts. We
can do this using Stata's xi prefix command and the factor-variable prefix i.
The following line creates 5 dummies called Dgid_1, Dgid_2, ..., Dgid_5 based on our group
id variable gid. The noomit option says to create all possible dummies and not omit the base
category.
xi i.gid, prefix(D) noomit
To create interactions, we use the same command. The | operator says to create only the
interactions between the indicator variable and the continuous variable (if we used the *
operator instead, it would also create the level dummies, but we already have those from the
previous command).
xi i.gid|lnQ i.gid|lnp1p3 i.gid|lnp2p3, prefix(D) noomit
Reorder the variables so that the gid=1 interactions are all together, the gid=2 interactions,
etc. The very first variable is the group id. In each group, put the dummy for group id last.
This is to be consistent with Stata's convention of putting the intercept last in the OLS output.
order gid                                       ///
	DgidXlnQ_1 DgidXlnp1p_1 DgidXlnp2p_1 Dgid_1 ///
	DgidXlnQ_2 DgidXlnp1p_2 DgidXlnp2p_2 Dgid_2 ///
	DgidXlnQ_3 DgidXlnp1p_3 DgidXlnp2p_3 Dgid_3 ///
	DgidXlnQ_4 DgidXlnp1p_4 DgidXlnp2p_4 Dgid_4 ///
	DgidXlnQ_5 DgidXlnp1p_5 DgidXlnp2p_5 Dgid_5
DO NOT SKIP THIS STEP. Click on the data browser icon in the main Stata window.
Look for the interacted variables - they all start with D. Note the block-diagonal structure of
the regressors.
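With the data ordered by group, the regressor matrix of the stacked system has the block-diagonal form (a sketch, writing X_g for the 29×4 block [lnQ, lnp1p3, lnp2p3, 1] belonging to group g):

```latex
X \;=\;
\begin{pmatrix}
X_1 & 0 & \cdots & 0 \\
0 & X_2 & & \vdots \\
\vdots & & \ddots & 0 \\
0 & \cdots & 0 & X_5
\end{pmatrix}
```

This block-diagonal structure is why a single OLS regression on the full set of interactions reproduces the five group-by-group regressions exactly.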
Estimate the 5 equations in two ways: (1) One at a time, using if. (2) As a single
stacked system of 5 equations.

reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==1
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==2
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==3
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==4
reg lnTCp3 lnQ lnp1p3 lnp2p3 if gid==5

reg lnTCp3                                      ///
	DgidXlnQ_1 DgidXlnp1p_1 DgidXlnp2p_1 Dgid_1 ///
	DgidXlnQ_2 DgidXlnp1p_2 DgidXlnp2p_2 Dgid_2 ///
	DgidXlnQ_3 DgidXlnp1p_3 DgidXlnp2p_3 Dgid_3 ///
	DgidXlnQ_4 DgidXlnp1p_4 DgidXlnp2p_4 Dgid_4 ///
	DgidXlnQ_5 DgidXlnp1p_5 DgidXlnp2p_5 Dgid_5 ///
	, nocons
The single system can also be estimated using Stata's factor variables:
reg lnTCp3 ibn.gid ibn.gid#c.(lnQ lnp1p3 lnp2p3), nocons