100 points
README: The following two problems will require a lot of calculations in STATA and will
generate many pages of output. Here is how you should organize it. The first pages should
contain your answers to all the questions, along with any key algebraic equations
or explanations you need to use along the way. After that, include a printout of the output
from the regressions you executed in support of your answers. Highlight any numbers in this
output that you used in the first section. (To save paper, you may print this section double-sided
and/or in 2-up format.) Last, include a copy of the DO file that contains the commands you
asked STATA to execute. Be sure you organize these in a way that will be clear to the reader.
Open this dataset within STATA. Before you begin answering the following, it's not a
bad idea to ask STATA to summarize the data using the command summarize. You
should also start a log file to store your results.
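A minimal sketch of the setup described above, written as the top of a DO file. The file names ps_output.log and housing.dta are placeholders (assumptions); the actual dataset name is not given here.

```stata
* Hypothetical file names; substitute the real log and dataset names.
log using ps_output.log, text replace
use housing.dta, clear

* Quick look at every variable before running any regressions
summarize

* ... regressions for the problem set go here ...

log close
```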
a. Run the following regression:
$Price = \beta_0 + \beta_1 \cdot beds$
b. Hypothesize the sign of the bias, if any, resulting from excluding sqft from the
regression. Explain your reasoning.
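Excluding sqft from the regression, I hypothesize that the bias in the coefficient on beds
will be positive. Square footage raises price and is positively correlated with the number
of bedrooms, so when sqft is omitted the coefficient on beds picks up part of its effect
and is overstated.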
c. Use the data to verify (or not) your claim from b). Break down the bias into the
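component pieces as we did in class.
The data confirm that the number of bedrooms is related to the price. In the regression
without sqft, the constant is 11025.06 and the coefficient on beds is 32104.46. Once sqft
is added, the coefficient on beds turns negative, so the bias from omitting sqft is positive.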
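The bias decomposition asked for in c can be sketched as follows. The standard omitted-variable result is that the short-regression coefficient on beds equals the long-regression coefficient plus (coefficient on sqft) times (slope from regressing sqft on beds). The scalar names are illustrative, and the sketch assumes the housing data with price, beds, and sqft are already in memory.

```stata
* Short regression: sqft omitted
regress price beds
scalar b_short = _b[beds]

* Long regression: sqft included
regress price beds sqft
scalar b_long = _b[beds]
scalar b_sqft = _b[sqft]

* Auxiliary regression: how sqft moves with beds
regress sqft beds
scalar delta = _b[beds]

* The two displayed numbers should agree
display "bias = b_sqft * delta   = " b_sqft*delta
display "check: b_short - b_long = " b_short - b_long
```

A positive b_sqft together with a positive delta gives a positive bias, which is the sign hypothesized in b.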
d. You will see from c. that the effect of beds is negative, once we control for square
footage. Does this make sense?
Yes. If we hold square footage fixed and keep adding bedrooms, the same space is divided
into more and smaller rooms. Who would want that?
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beds |   -17474.4    1983.609    -8.81   0.000    -21367.59    -13581.2
        sqft |   98.13947    2.992471    32.80   0.000     92.26621    104.0127
       baths |  -1332.919    2688.311    -0.50   0.620    -6609.219    3943.381
         age |   -299.074     53.9454    -5.54   0.000    -404.9516   -193.1963
     stories |  -6708.066    3246.125    -2.07   0.039    -13079.18   -336.9543
       _cons |   28193.49    5344.806     5.27   0.000     17703.34    38683.64
------------------------------------------------------------------------------
f. At a level of α = .05, for which, if any, values of i would you reject the null
hypothesis that β_i = 0?
We reject the null hypothesis for every coefficient except baths, whose p-value (.620)
is greater than .05.
= 107,622.32
According to this model, how much will my house change in value five years from
today?
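= 106,126.90, a decrease of about 1,495 (5 × 299.074), since each additional year of age lowers the predicted price by 299.074.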
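The five-year change can be checked directly from the age coefficient. This is a small sketch that assumes the five-variable regression on price has just been run, so that _b[age] holds the estimate -299.074.

```stata
* Change in predicted price when age rises by 5 years, all else fixed
display 5*_b[age]

* Same quantity with a standard error and confidence interval
lincom 5*age
```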
. gen price_thous=price/1000
.
. reg price_thous beds sqft baths age stories
------------------------------------------------------------------------------
 price_thous |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        beds |   -17.4744    1.983609    -8.81   0.000    -21.36759    -13.5812
        sqft |   .0981395    .0029925    32.80   0.000     .0922662    .1040127
       baths |  -1.332919    2.688311    -0.50   0.620    -6.609219    3.943381
         age |   -.299074    .0539454    -5.54   0.000    -.4049516   -.1931963
     stories |  -6.708066    3.246125    -2.07   0.039    -13.07918   -.3369542
       _cons |   28.19349    5.344806     5.27   0.000     17.70334    38.68364
------------------------------------------------------------------------------
i. Compare the coefficients, standard error, and t-statistics for the independent variables.
Briefly interpret the difference between this model and the version from part e.
The coefficients are unchanged in substance: each one, along with its standard error
and confidence interval, is simply divided by 1000 because price is now measured in
thousands. The t-statistics and p-values are unchanged.
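The reason the t-statistics do not move can be written out in one line. The tilde notation for the rescaled regression is introduced here for illustration only.

```latex
\tilde{\beta}_j = \frac{\hat{\beta}_j}{1000}, \qquad
\operatorname{se}(\tilde{\beta}_j) = \frac{\operatorname{se}(\hat{\beta}_j)}{1000}, \qquad
\tilde{t}_j = \frac{\hat{\beta}_j / 1000}{\operatorname{se}(\hat{\beta}_j) / 1000} = t_j
```

For beds, for example, the coefficient goes from -17474.4 to -17.4744 and the standard error from 1983.609 to 1.983609, while t stays at -8.81.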
j. Create a new age variable by converting age from years to days (365 days in a year).
Rerun the regression from e with the new age variable in place of the original age.
gen age_days=age*365
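A sketch of the rerun requested in j, assuming age_days has been created as above and the housing data are in memory. Rescaling a regressor by 365 divides its coefficient and standard error by 365 and leaves its t-statistic and every other coefficient unchanged, so the coefficient on age_days should come out at about -0.819.

```stata
* Regression from part e with age measured in days instead of years
regress price beds sqft baths age_days stories

* Coefficient implied by the part e estimate of -299.074 per year
display -299.074/365
```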
According to the economic model of crime rates, lower crime rates are associated with
better labor markets (higher wages), more police presence and tougher sentences, and
lower population density. We will use this data set to examine these hypotheses. Use a
significance level of α = .05 for all hypothesis tests. All of the following regressions
will utilize the following subset of variables from this dataset.
crmrte=crime rate
prbarr=probability of arrest
prbconv=probability of conviction
prbpris=probability of a prison sentence
avgsen=average sentence in days
polpc=number of police per capita
density=population density
pctmin=percent minority
taxpc=tax revenue per capita
wmfg=average weekly wage in manufacturing
wcon=average weekly wage in construction
wtuc=average weekly wage in transportation, utilities, and communications
wtrd=average weekly wage in wholesale and retail trade
wfir=average weekly wage in finance, insurance, and real estate
wser=average weekly wage in services
wfed=average weekly wage in federal government
wsta=average weekly wage in state government
wloc=average weekly wage in local government
a. Run a regression of crmrte on the variables listed above. Call this Model 1.
//model 1
.
. reg crmrte prbarr prbconv prbpris avgsen polpc density pctmin taxpc wmfg wcon wtuc wtrd
wfir wser wfed wsta wloc
Probability of arrest, population density, and percent minority are all statistically
significant, because their p-values all fall below our chosen α of .05.
Our F statistic is calculated as the restricted SSR minus the unrestricted SSR, divided
by the number of restrictions, all over the unrestricted SSR divided by the number of
observations minus the number of independent variables in the unrestricted regression
minus one. Our model gives us F(17, 72): the numerator (model) degrees of freedom are 17
and the denominator (residual) degrees of freedom are 72 = 90 - 17 - 1. It tests the joint
hypothesis that the coefficients on all of our variables are zero. The p-value of 0.000
tells us that we reject this null hypothesis at the .05 level.
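Written as a formula, the verbal description above is the usual F statistic for q restrictions, where r and ur mark the restricted and unrestricted models. For the overall-significance test the restricted model contains only a constant, so its SSR equals the total sum of squares. The numeric substitution is a sketch that uses the total sum of squares (.025732774) and the unrestricted residual sum of squares (.005661278) reported in this problem.

```latex
F = \frac{(SSR_{r} - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}
  = \frac{(.025732774 - .005661278)/17}{.005661278/72}
  \approx 15.0
```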
d. Test the hypothesis that the coefficients on wsta and wloc are equal to each other.
Use the t-test method described in the lectures. What transformation do you need to
do here? Be specific.
We use a reparameterized t-test. Define θ = β_wsta - β_wloc, so the null hypothesis is
θ = 0. We generate a new variable, wstawloc = wsta + wloc, and rerun the Model 1 regression
with wstawloc in place of wloc while keeping wsta, so that the coefficient on wsta is θ.
We find a t-value of -.25 and a p-value of .804, so we fail to reject the null at α = .05.
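One way to carry out this test is sketched below, assuming the crime data are in memory. Substituting θ = β_wsta - β_wloc gives β_wsta·wsta + β_wloc·wloc = θ·wsta + β_wloc·(wsta + wloc), so the t-statistic on wsta in the transformed regression tests equality of the two coefficients. The lincom line is a cross-check that should give the same t-statistic.

```stata
* Transformed regressor: sum of the two wage variables
generate wstawloc = wsta + wloc

* wsta stays in, wstawloc replaces wloc; t on wsta tests b[wsta] = b[wloc]
regress crmrte prbarr prbconv prbpris avgsen polpc density pctmin taxpc ///
    wmfg wcon wtuc wtrd wfir wser wfed wsta wstawloc

* Cross-check from the original Model 1
regress crmrte prbarr prbconv prbpris avgsen polpc density pctmin taxpc ///
    wmfg wcon wtuc wtrd wfir wser wfed wsta wloc
lincom wsta - wloc
```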
e. Test the hypothesis that the coefficients on wfed, wsta and wloc are all equal to
each other. Do this by writing down the formula for the relevant F-statistic.
Calculate it (by running the appropriate restricted regression) and test the hypothesis.
Report these results. This restricted version of the regression will be called Model 2.
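For this, we use the same substitution idea as in d, but with an F-test. We generate a
variable labeled Qe = (wsta + wfed + wloc) and regress crmrte on the Model 1 variables
with Qe in place of wsta, wfed, and wloc. We then compute the F-statistic from the
restricted and unrestricted residual sums of squares:
$$F = \frac{(.005698845 - .005661278)/2}{.005661278/72} = .238887688$$
We fail to reject the null hypothesis, because F = .239 is far below the 5% critical
value of F(2, 72), which is about 3.12.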
Restricted:
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 15,    74) =   17.34
       Model |  .020033929    15  .001335595           Prob > F      =  0.0000
    Residual |  .005698845    74  .000077011           R-squared     =  0.7785
-------------+------------------------------           Adj R-squared =  0.7336
       Total |  .025732774    89  .000289132           Root MSE      =  .00878
Unrestricted is Model 1
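The arithmetic for this F-test can be reproduced with Stata's display calculator. This sketch uses the residual sums of squares reported here (.005698845 restricted, .005661278 unrestricted), 2 restrictions, and 72 unrestricted residual degrees of freedom; the scalar name F_e is illustrative.

```stata
* F statistic from restricted and unrestricted SSR
scalar F_e = ((.005698845 - .005661278)/2) / (.005661278/72)
display "F                 = " F_e

* 5% critical value and p-value for F(2, 72)
display "5% critical value = " invFtail(2, 72, .05)
display "p-value           = " Ftail(2, 72, F_e)
```

The same hypothesis can also be tested directly after running Model 1 with `test (wfed = wsta) (wsta = wloc)`.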
f. Return to Model 1. Now test the hypothesis that all 9 of the wage variables have a
coefficient of zero. Do this by writing down the formula for the relevant F-statistic.
Calculate it (by running the appropriate restricted regression) and test the hypothesis.
Report these results.
Same process as the one above, but all wage variables are replaced by Qf. With the
unrestricted model's 72 residual degrees of freedom in the denominator:
$$F = \frac{(.005938334 - .005661278)/8}{.005661278/72} \approx .440$$
Therefore we fail to reject H0.
Unrestricted is Model 1.
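For the hypothesis as stated in f (all nine wage coefficients equal to zero), the textbook restricted regression omits the wage variables entirely, which gives q = 9 restrictions. A minimal sketch, assuming the crime data are in memory; the scalar names are illustrative.

```stata
* Unrestricted: Model 1
regress crmrte prbarr prbconv prbpris avgsen polpc density pctmin taxpc ///
    wmfg wcon wtuc wtrd wfir wser wfed wsta wloc
scalar ssr_ur = e(rss)
scalar df_ur  = e(df_r)

* Built-in joint test that all nine wage coefficients are zero
test wmfg wcon wtuc wtrd wfir wser wfed wsta wloc

* Restricted: wage variables dropped
regress crmrte prbarr prbconv prbpris avgsen polpc density pctmin taxpc
scalar ssr_r = e(rss)

* Manual F statistic and p-value; should match the test command above
scalar F_f = ((ssr_r - ssr_ur)/9) / (ssr_ur/df_ur)
display "F = " F_f "   p-value = " Ftail(9, df_ur, F_f)
```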
g. If a crime is committed, the probability of arrest is prbarr. If a person is arrested for
the crime, the probability of conviction is prbconv. If the person is convicted, the
probability of prison is prbpris. Assume all these probabilities are independent.
What is the formula for calculating the probability that someone who commits a
crime will a) get arrested AND b) get convicted AND c) get a prison sentence? That
is, how would you calculate the probability of this intersection of statistically
independent events? [Note: the probabilities produced by the researchers are
derived from the arrest data, and thus may not follow the usual rules of probability.
In particular, some probabilities are greater than one. Don't worry about that here.]
Call this variable prjail_ifcrime and create it in STATA.
h. Given this result prjail_ifcrime, how would you use the variable avgsen to calculate
the expected time in jail if committing a crime? Call this variable jailtime_ifcrime and
create it in STATA.
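A sketch of how the two variables in g and h could be created, assuming the crime data are in memory. For independent events the probability of the intersection is the product of the individual probabilities, and the expected jail time is that probability times the average sentence.

```stata
* g: P(arrest and conviction and prison) under independence
generate prjail_ifcrime = prbarr * prbconv * prbpris

* h: expected sentence = probability of prison given a crime * average sentence
generate jailtime_ifcrime = prjail_ifcrime * avgsen

summarize prjail_ifcrime jailtime_ifcrime
```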
i. Return to regression Model 1. Replace the variables prbarr, prbconv, prbpris and
avgsen with your new variable jailtime_ifcrime. This is Model 4. Write a paragraph
in which you discuss how Model 4 compares with Model 1.
. reg crmrte jailtime_ifcrime polpc density pctmin taxpc wmfg wcon wtuc wtrd wfir
wser wfed wsta wloc
      Source |       SS       df       MS              Number of obs =      90
-------------+------------------------------           F( 14,    75) =   14.31
       Model |  .018723082    14  .001337363           Prob > F      =  0.0000
    Residual |  .007009691    75  .000093463           R-squared     =  0.7276
-------------+------------------------------           Adj R-squared =  0.6767
       Total |  .025732774    89  .000289132           Root MSE      =  .00967
----------------------------------------------------------------------------------
          crmrte |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
jailtime_ifcrime |  -.0001843     .000319    -0.58   0.565    -.0008197     .000451
           polpc |  -1.135305    .7543213    -1.51   0.137    -2.637991    .3673797
         density |   .0081923    .0010578     7.74   0.000      .006085    .0102996
        pctmin80 |   .0002044    .0000719     2.84   0.006     .0000611    .0003476
           taxpc |   .0002442      .00017     1.44   0.155    -.0000944    .0005828
            wmfg |   .0000139    .0000238     0.58   0.561    -.0000335    .0000614
            wcon |   .0000297    .0000481     0.62   0.539    -.0000662    .0001256
            wtuc |  -.0000255    .0000233    -1.09   0.278     -.000072     .000021
            wtrd |  -.0000212     .000066    -0.32   0.749    -.0001527    .0001103
            wfir |   .0000352     .000057     0.62   0.538    -.0000782    .0001487
            wser |   2.04e-06     .000055     0.04   0.970    -.0001075    .0001116
            wfed |   .0000485    .0000305     1.59   0.116    -.0000123    .0001093
            wsta |    .000016    .0000424     0.38   0.707    -.0000685    .0001006
            wloc |  -7.90e-06    .0000932    -0.08   0.933    -.0001935    .0001777
           _cons |  -.0116069    .0185137    -0.63   0.533    -.0484881    .0252743
----------------------------------------------------------------------------------
Comparing Model 4 with Model 1, we find several differences. The coefficients on
police per capita, tax revenue per capita, and the average manufacturing wage have
increased. The coefficients on the remaining wage variables appear to have fallen,
while most of their t-values have increased. More surprising, our R2 has decreased
measurably, from about 0.780 in Model 1 to 0.7276 in Model 4, so I would say that
Model 4 is not as good at predicting as Model 1.
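The fit comparison can be reproduced from the sums of squares reported in this problem. This sketch assumes .005661278 is the Model 1 residual sum of squares (72 residual degrees of freedom) and uses the Model 4 values from the output above. Adjusted R-squared is included because the two models have different numbers of regressors.

```stata
* R-squared = 1 - SSR/SST
display "Model 1 R-squared     = " 1 - .005661278/.025732774
display "Model 4 R-squared     = " 1 - .007009691/.025732774

* Adjusted R-squared = 1 - (SSR/df_resid)/(SST/(n-1))
display "Model 1 adj R-squared = " 1 - (.005661278/72)/(.025732774/89)
display "Model 4 adj R-squared = " 1 - (.007009691/75)/(.025732774/89)
```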