# 6305: Applied Econometrics for Policy Analysis

Homework 3, April 16, 2013. Due on 26 April 2013. Maximum Marks: 100.

Oaxaca-Blinder Decomposition

For this set of questions, use the dataset called Q1.dta. This is the dataset that DiNardo and Pischke used for their paper on returns to pencils. We will use it to look at male-female wage differentials.

1. First, start by computing the mean hourly wages of males and females for 1979 and 1985. Comment on the wage differential. Whom does it favour, and how does it change between 1979 and 1985?
2. To what extent do the above figures indicate statistical discrimination against one group? Justify your answer using insights from what you know from this course.
3. Perform the Oaxaca-Blinder decomposition separately for 1979 and 1985. In your regression, include the following variables: exp, expsq, school, schoolsq, married, sit, computer, pencil, teleph, calc, hammer, city, civser. Make sure to compute clustered standard errors using occupation. Show the output of the regressions and the decomposition.
4. Write a short paragraph contrasting the results for females and males within each year (1979 and 1985). What insights can you offer to explain these patterns?
5. Write a short paragraph on the trends, i.e. comparing the signs on the covariates within each group, for males and females, between 1979 and 1985. Provide hypotheses on why we see what we see.
6. Using the results on the decomposition into endowment effects and coefficients and your insights above, offer your comments on the nature of discrimination or unexplained male-female wage differentials.
7. What are the assumptions underlying your inferences? How would you persuade yourself that these are reasonable assumptions to make?
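As a reference point for the decomposition in question 3, here is a minimal sketch in Python of a twofold Oaxaca-Blinder decomposition. Everything in it is an assumption made for illustration: the data are simulated rather than read from Q1.dta, only two covariates (schooling and experience) are used, the male coefficients serve as the reference structure, and no clustered standard errors are computed. The helper names `solve`, `ols`, `col_means` and `simulate` are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def col_means(X):
    """Mean of each column of the design matrix."""
    return [sum(col) / len(X) for col in zip(*X)]

def simulate(n, intercept, b_school, b_exp, school_mean, rng):
    """Simulated log wages; columns are constant, schooling, experience."""
    X, y = [], []
    for _ in range(n):
        school = rng.gauss(school_mean, 2.0)
        exper = rng.uniform(0, 30)
        X.append([1.0, school, exper])
        y.append(intercept + b_school * school + b_exp * exper + rng.gauss(0, 0.3))
    return X, y

rng = random.Random(0)
# Made-up parameters for the two groups (not estimates from Q1.dta).
X_m, y_m = simulate(500, 1.0, 0.08, 0.020, 12.0, rng)
X_f, y_f = simulate(500, 0.9, 0.07, 0.015, 11.5, rng)

b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
xbar_m, xbar_f = col_means(X_m), col_means(X_f)

# Raw gap in mean log wages (male minus female).
gap = sum(y_m) / len(y_m) - sum(y_f) / len(y_f)
# Part due to differences in average characteristics, priced at male coefficients.
endowments = sum((xm - xf) * bm for xm, xf, bm in zip(xbar_m, xbar_f, b_m))
# Part due to differences in coefficients, evaluated at female averages.
coefficients = sum(xf * (bm - bf) for xf, bm, bf in zip(xbar_f, b_m, b_f))

print(f"gap={gap:.4f} endowments={endowments:.4f} coefficients={coefficients:.4f}")
```

Because each regression includes a constant, the two components add up to the raw gap exactly. Choosing the female coefficients as the reference, or using a threefold decomposition with an interaction term, would split the same gap differently.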

Difference-in-Differences Estimate

For this set of questions, use the dataset called Q2.dta. This is a dataset of housing prices in North Andover, Massachusetts. We will look at the impact that a new garbage incinerator had (or did not have) on housing prices. The construction of the incinerator was announced in 1979; it was constructed in 1981 and became operational in 1985. Housing values represent prices of houses sold in 1978 and 1981, before the announcement and during construction respectively.

1. Here are some preliminary questions to set up the problem.

(a) What hypothesis would you hold about the direction of the impact on housing values of the coming of the incinerator?

(b) What would you use as the outcome variable of interest (i.e., Y)?

(c) What would you use as the proxy that represents the cause (or the treatment variable, i.e., X)? Justify your choice. There are many options here; use your imagination (but not too much imagination).

2. For this part of the problem, follow the instructions carefully.

(a) Using the variable rprice (price in real terms), compute the following four quantities: (i) the mean housing value of the control units in 1978, (ii) the mean housing value of the treated units in 1978, (iii) the mean housing value of the control units in 1981, and (iv) the mean housing value of the treated units in 1981. For this part of the analysis, use nearinc as the treatment variable.

(b) Compute the difference-in-differences estimate of the impact of being close to the incinerator. Express your finding in a single complete sentence that a layperson can understand easily.

(c) What is the key assumption under which the above findings hold?

(d) Now run two regressions, where you regress rprice on nearinc, separately for 1978 and 1981. Write out these two regression models using the coefficients you have estimated.

$$\text{rprice}_{78} = \alpha_{78} + \beta_{78}\,\text{nearinc}_{78} + \varepsilon_{78} \tag{1}$$

$$\text{rprice}_{81} = \alpha_{81} + \beta_{81}\,\text{nearinc}_{81} + \varepsilon_{81} \tag{2}$$

(e) To test whether this difference-in-differences estimate is statistically significant, use a pooled regression model. For this, pool all the data and run the model

$$\text{rprice} = \alpha + \beta\,\text{nearinc} + \delta\,(\text{y81} \cdot \text{nearinc}) + \gamma\,\text{y81} + \varepsilon \tag{3}$$

Determine the difference-in-differences estimate from this model and comment on its statistical significance.

(f) Compare this estimate to the one computed without the regression model. Is the regression-based estimate smaller or larger? Why? Show this algebraically.

(g) Now, expand this model to control for covariates that are also time invariant. In particular, run the model including the following covariates: distance from the interstate highway (intst), the number of rooms (rooms), the number of bathrooms (baths), house area (area), and land size (land). Distances are in feet and areas are in square feet. What happens to the size of the DD estimate? What happens to the standard error of the DD estimate? Is your result now stronger or weaker after adding the covariates? What happens to the coefficient on nearinc? What does this say about the drivers of housing value?

(h) As a final exercise, estimate the following model

$$\log(\text{price}) = \alpha + \beta\,\text{nearinc} + \delta\,(\text{y81} \cdot \text{nearinc}) + \gamma\,\text{y81} + \varepsilon \tag{4}$$

using log(price) as the relevant outcome variable. Write out the results, substituting the coefficients you find into the above model. Can you express the average treatment effect, as represented by the difference-in-differences estimate, in percentage terms?

(i) Explain whether, with this dataset, you would contemplate using individual fixed effects to control for unobserved heterogeneity. Justify your answer.

(j) What, if any, are the potential problems with the treatment variable nearinc?
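To make the mechanics of question 2 concrete, the following is a minimal sketch in Python that computes a difference-in-differences estimate in two ways: from the four cell means, and as the coefficient on the interaction term in a pooled regression with a constant, a treatment dummy, a period dummy and their interaction. The data are simulated with made-up parameters and are not taken from Q2.dta; the variable names `near` and `post` merely mirror nearinc and y81, and the helpers `solve`, `ols` and `cell_mean` are hypothetical.

```python
import random
from statistics import fmean

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

rng = random.Random(1)
rows = []  # each row is (price, near, post)
for near in (0, 1):
    for post in (0, 1):
        for _ in range(50):
            # Made-up data-generating process, for illustration only.
            price = (80000 - 15000 * near + 18000 * post
                     - 10000 * near * post + rng.gauss(0, 5000))
            rows.append((price, near, post))

def cell_mean(near, post):
    """Mean price in one treatment-by-period cell."""
    return fmean([p for p, n, t in rows if n == near and t == post])

# Change for the treated group minus change for the control group.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

# Pooled regression: constant, near, post, near * post.
X = [[1.0, n, t, n * t] for _, n, t in rows]
y = [p for p, _, _ in rows]
alpha, beta, gamma, delta = ols(X, y)

print(f"DiD from cell means:         {did_means:.2f}")
print(f"Interaction coefficient OLS: {delta:.2f}")
```

With only these four regressors the model is saturated, so the fitted values reproduce the four cell means and the two numbers coincide up to rounding error. The regression route has the advantage of delivering a standard error, which this sketch does not compute.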

Quantile Regression

For this section, use the same dataset, Q2.dta. In keeping with the spirit of the findings on the impact of an incinerator on house values, we might believe that, despite the shared features of the neighbourhood, the impact might be different at different housing values, so that the distribution of housing values might be affected rather than merely the average. In this section, you will perform a quantile regression to assess heterogeneous impacts of the incinerator. We will not push this logic too far because we have not really covered quantile regression in the context of pooled data that combine cross sections from two time periods. We will therefore keep it simple.

(a) Estimate a quantile regression model of rprice on nearinc, y81, y81nrinc, area, land, rooms, cbd, intst, baths, dist, wind, age, agesq. Compute these for the quintiles, using 100 repetitions.

(b) Let us focus on the value of the coefficient on the interaction of y81 and nearinc, and the coefficients on y81 and nearinc. Perform a test of equality of these across the quintiles. Feel free to choose those quintiles that you find interesting.

(c) Write a brief paragraph summarizing your findings, focussing mainly on the three covariates above. But feel free to comment on any others that you find interesting.

(d) Focussing on the three key covariates identified above, are these quantile coefficients statistically significantly different from the least squares estimates?

(e) Using sqreg, please provide figures for the quantile regression you have just performed.
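For intuition about what a quantile regression coefficient measures, here is a minimal sketch in Python of the simplest possible case: a model with a constant and a single treatment dummy. In that case, minimizing the check (pinball) loss yields an intercept equal to the control-group quantile and a slope equal to the difference between the treated and control quantiles. The data are simulated with made-up parameters, not drawn from Q2.dta, and the function names `check_loss` and `fit_quantile` are hypothetical. The sketch does not include covariates or bootstrap standard errors.

```python
import random

def check_loss(u, tau):
    """Check (pinball) loss for residual u at quantile tau."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def fit_quantile(ys, tau):
    """Constant minimizing total check loss; a minimizer lies among the data points."""
    return min(sorted(ys), key=lambda c: sum(check_loss(y - c, tau) for y in ys))

rng = random.Random(2)
# Made-up house values for illustration: the treated group has a
# lower centre and a thinner upper tail than the control group.
control = [rng.lognormvariate(11.0, 0.4) for _ in range(201)]
treated = [rng.lognormvariate(10.9, 0.3) for _ in range(201)]

results = {}
for tau in (0.2, 0.4, 0.6, 0.8):
    intercept = fit_quantile(control, tau)          # control-group quantile
    slope = fit_quantile(treated, tau) - intercept  # dummy coefficient at tau
    results[tau] = (intercept, slope)
    print(f"tau={tau:.1f} intercept={intercept:10.0f} slope={slope:10.0f}")
```

A slope that varies across the quintiles is the kind of heterogeneity the exercise asks you to test for; a least squares regression on the same dummy would report only the difference in means.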