Answer: A
11) In a randomized controlled experiment
A) there is a control group and a treatment group.
B) you control for the effect that random numbers are not truly randomly generated
C) you control for random answers
D) the control group receives treatment on even days only.
Answer: A
12) Economists do not use experimental data more frequently for all of the following reasons,
except that real-world experiments
A) cannot be executed in economics.
B) with humans are difficult to administer.
C) are often unethical.
D) have flaws relative to ideal randomized controlled experiments.
Answer: A
13) The most frequently used experimental or observational data in econometrics are of the following type:
A) cross-sectional data.
B) randomly generated data.
C) time series data.
D) panel data.
Answer: A
14) In the graph below, the vertical axis represents average real GDP growth for 65 countries over the period
1960-1995, and the horizontal axis shows the average trade share within these countries.
This is an example of
A) cross-sectional data.
B) experimental data.
C) a time series.
D) longitudinal data.
Answer: A
Is an example of
A) cross-sectional data.
B) experimental data.
C) a time series.
D) longitudinal data.
Answer: A
is an example of
A) experimental data.
B) cross-sectional data.
C) a time series.
D) longitudinal data.
Answer: C
1.2 Essays
1) Give at least three examples from economics where each of the following types of data can be used:
cross-sectional data, time series data, and panel data.
Answer: Answers will vary by student. At this level of economics, students most likely have heard of the
following uses of cross-sectional data: earnings functions, growth equations, the effect of class size
reduction on student performance (in this chapter), demand functions (in this chapter: cigarette
consumption); time series: the Phillips curve (in this chapter), consumption functions, Okun's law; panel
data: various U.S. state panel studies on road fatalities (in this book), unemployment rate and
unemployment benefits variations, growth regressions (across states and countries), and crime and
abortion (Freakonomics).
D) $\frac{E[(Y - \mu_Y)^3]}{\sigma_Y^3}$
Answer: D
7) The skewness is most likely positive for one of the following distributions:
A) The grade distribution at your college or university.
B) The U.S. income distribution.
C) SAT scores in English.
D) The height of 18 year old females in the U.S.
Answer: B
8) The kurtosis of a distribution is defined as follows:
A) $\frac{E[(Y - \mu_Y)^4]}{\sigma_Y^4}$
B) $\frac{E(Y^4 - \mu_Y^4)}{\sigma_Y^2}$
C) $\frac{\text{skewness}}{\operatorname{var}(Y)}$
D) $E[(Y - \mu_Y)^4]$
Answer: A
9) For a normal distribution, the skewness and kurtosis measures are as follows:
A) 1.96 and 4
B) 0 and 0
C) 0 and 3
D) 1 and 2
Answer: C
B) $\sum_{i=1}^{l} \Pr(X = x_i, Y = y)$.
C) $\frac{\Pr(X = x, Y = y)}{\Pr(Y = y)}$.
D) $\frac{\Pr(X = x, Y = y)}{\Pr(X = x)}$.
Answer: D
11) The conditional expectation of Y given X, E(Y | X = x), is calculated as follows:
A) $\sum_{i=1}^{k} y_i \Pr(X = x_i \mid Y = y)$
B) $E[E(Y \mid X)]$
C) $\sum_{i=1}^{k} y_i \Pr(Y = y_i \mid X = x)$
D) $\sum_{i=1}^{l} E(Y \mid X = x_i) \Pr(X = x_i)$
Answer: C
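The formula in option C can be illustrated with a small numerical sketch. The joint distribution below is hypothetical (it is not taken from the test bank); the code computes E(Y | X = x) by weighting each outcome of Y with its conditional probability, and then checks the law of iterated expectations from option D.

```python
# Hypothetical joint distribution Pr(X = x, Y = y); the numbers are illustrative only.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.40, (1, 1): 0.20,
}

def marginal_x(joint, x):
    """Marginal probability Pr(X = x), summing the joint over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def conditional_expectation(joint, x):
    """E(Y | X = x) = sum over y of y * Pr(Y = y | X = x)."""
    pr_x = marginal_x(joint, x)
    return sum(y * p / pr_x for (xi, y), p in joint.items() if xi == x)

e_y_given_x0 = conditional_expectation(joint, 0)
e_y_given_x1 = conditional_expectation(joint, 1)

# E(Y) directly from the joint distribution.
e_y = sum(y * p for (_, y), p in joint.items())

# Law of iterated expectations: E(Y) = sum over x of E(Y | X = x) * Pr(X = x).
lie = sum(conditional_expectation(joint, x) * marginal_x(joint, x) for x in (0, 1))

print(e_y_given_x0, e_y_given_x1, e_y, lie)
```

In this sketch the two conditional means differ, so Y is not independent of X, yet the weighted average of the conditional means still reproduces E(Y).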
12) Two random variables X and Y are independently distributed if all of the following conditions hold, with the
exception of
A) Pr(Y = y | X = x) = Pr(Y = y).
B) knowing the value of one of the variables provides no information about the other.
C) if the conditional distribution of Y given X equals the marginal distribution of Y.
D) E(Y) = E[E(Y | X)].
Answer: D
13) The correlation between X and Y
A) cannot be negative since variances are always positive.
B) is the covariance squared.
C) can be calculated by dividing the covariance between X and Y by the product of the two standard
deviations.
D) is given by $\operatorname{corr}(X, Y) = \frac{\operatorname{cov}(X, Y)}{\operatorname{var}(X)\operatorname{var}(Y)}$.
Answer: C
14) Two variables are uncorrelated in all of the cases below, with the exception of
A) being independent.
B) having a zero covariance.
C) $|\sigma_{XY}| \le \sqrt{\sigma_X^2 \sigma_Y^2}$.
D) E(Y | X) = 0.
Answer: C
A) $a^2\sigma_X^2 + b^2\sigma_Y^2$.
B) $a^2\sigma_X^2 + 2ab\sigma_{XY} + b^2\sigma_Y^2$.
C) $\sigma_{XY} + \mu_X\mu_Y$.
D) $a\sigma_X^2 + b\sigma_Y^2$.
Answer: B
16) To standardize a variable you
A) subtract its mean and divide by its standard deviation.
B) integrate the area below two points under the normal distribution.
C) add and subtract 1.96 times the standard deviation to the variable.
D) divide it by its standard deviation, as long as its mean is 1.
Answer: A
17) Assume that Y is normally distributed $N(\mu, \sigma^2)$. Moving from the mean ($\mu$) 1.96 standard deviations to the left
and 1.96 standard deviations to the right, then the area under the normal p.d.f. is
A) 0.67
B) 0.05
C) 0.95
D) 0.33
Answer: C
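18) Assume that Y is normally distributed $N(\mu, \sigma^2)$. To find $\Pr(c_1 \le Y \le c_2)$, where $c_1 < c_2$ and $d_i = \frac{c_i - \mu}{\sigma}$, you need
to calculate $\Pr(d_1 \le Z \le d_2) =$
A) $\Phi(d_2) - \Phi(d_1)$
B) $\Phi(1.96) - \Phi(1.96)$
C) $\Phi(d_2) - (1 - \Phi(d_1))$
D) $1 - (\Phi(d_2) - \Phi(d_1))$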
Answer: A
19) If variables with a multivariate normal distribution have covariances that equal zero, then
A) the correlation will most often be zero, but does not have to be.
B) the variables are independent.
C) you should use the $\chi^2$ distribution to calculate probabilities.
D) the marginal distribution of each of the variables is no longer normal.
Answer: B
20) The Student t distribution is
A) the distribution of the sum of m squared independent standard normal random variables.
B) the distribution of a random variable with a chi-squared distribution with m degrees of freedom, divided
by m.
C) always well approximated by the standard normal distribution.
D) the distribution of the ratio of a standard normal random variable, divided by the square root of an
independently distributed chi-squared random variable with m degrees of freedom divided by m.
Answer: D
distribution.
Answer: B
22) The sample average is a random variable and
A) is a single number and as a result cannot have a distribution.
B) has a probability distribution called its sampling distribution.
C) has a probability distribution called the standard normal distribution.
D) has a probability distribution that is the same as for the Y1 ,..., Yn i.i.d. variables.
Answer: B
23) To infer the political tendencies of the students at your college/university, you sample 150 of them. Only one of
the following is a simple random sample: You
A) make sure that the proportion of minorities are the same in your sample as in the
entire student body.
B) call every fiftieth person in the student directory at 9 a.m. If the person does not answer the phone, you
pick the next name listed, and so on.
C) go to the main dining hall on campus and interview students randomly there.
D) have your statistical package generate 150 random numbers in the range from 1 to the total number of
students in your academic institution, and then choose the corresponding names in the student telephone
directory.
Answer: D
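24) The variance of $\bar{Y}$, $\sigma_{\bar{Y}}^2$, is given by the following formula:
A) $\sigma_Y^2$.
B) $\frac{\sigma_Y}{\sqrt{n}}$.
C) $\frac{\sigma_Y^2}{n}$.
D) $\frac{\sigma_Y^2}{\sqrt{n}}$.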
Answer: C
C)
D)
Y
Answer: B
26) In econometrics, we typically do not rely on exact or finite sample distributions because
A) we have approximately an infinite number of observations (think of re-sampling).
B) variables typically are normally distributed.
C) the covariances of Yi, Yj are typically not zero.
D) asymptotic distributions can be counted on to provide good approximations to the exact sampling
distribution (given the number of observations available in most cases).
Answer: D
27) Consistency for the sample average $\bar{Y}$ can be defined as follows, with the exception of
A) $\bar{Y}$ converges in probability to $\mu_Y$.
B) $\bar{Y}$ has the smallest variance of all estimators.
C) $\bar{Y} \xrightarrow{p} \mu_Y$.
D) the probability of $\bar{Y}$ being in the range $\mu_Y \pm c$ becomes arbitrarily close to one as n increases for any
constant c > 0.
Answer: B
28) The central limit theorem states that
A) the sampling distribution of $\frac{\bar{Y} - \mu_Y}{\sigma_{\bar{Y}}}$ is approximately normal.
B) $\bar{Y} \xrightarrow{p} \mu_Y$.
C) the probability that $\bar{Y}$ is in the range $\mu_Y \pm c$ becomes arbitrarily close to one as n increases for any constant
c > 0.
D) the t distribution converges to the F distribution for approximately n > 30.
Answer: A
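The statement in option A of question 28 can be checked with a small simulation. The sketch below is only an illustration under assumptions that are not part of the test bank: it draws from a Uniform(0, 1) population, uses samples of size 50, and repeats the experiment 4,000 times with a fixed seed. If the standardized sample average is approximately standard normal, roughly 95 percent of the draws should fall within 1.96 of zero.

```python
import math
import random

def standardized_mean(n, rng):
    """Draw n i.i.d. Uniform(0, 1) values and return (Ybar - mu_Y) / sigma_Ybar."""
    mu_y = 0.5                       # population mean of Uniform(0, 1)
    sigma_y = math.sqrt(1.0 / 12.0)  # population standard deviation of Uniform(0, 1)
    ybar = sum(rng.random() for _ in range(n)) / n
    return (ybar - mu_y) / (sigma_y / math.sqrt(n))

rng = random.Random(12345)  # fixed seed (an arbitrary choice) for reproducibility
draws = [standardized_mean(50, rng) for _ in range(4000)]

# Share of standardized sample means inside the central 95 percent band of N(0, 1).
share_within = sum(abs(z) <= 1.96 for z in draws) / len(draws)
print(f"share within +/- 1.96: {share_within:.3f}")
```

The population here is far from normal, which is the point: the approximation concerns the sampling distribution of the standardized average, not the distribution of the individual observations.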
29) The central limit theorem
A) states conditions under which a variable involving the sum of Y1 ,..., Yn i.i.d. variables becomes the
standard normal distribution.
B) postulates that the sample mean $\bar{Y}$ is a consistent estimator of the population mean $\mu_Y$.
C) only holds in the presence of the law of large numbers.
D) states conditions under which a variable involving the sum of Y1 ,..., Yn i.i.d. variables becomes the
Student t distribution.
Answer: A
2
XY
C)
2
2
XY
X
D)
2 2
.
X Y
2
.
Y
2
X
2
XY
2
Y
Answer: B
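31) $\sum_{i=1}^{n} (a x_i + b y_i + c) =$
A) $a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i + n c$
B) $a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i + c$
C) $a\bar{x} + b\bar{y} + nc$
D) $a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} y_i$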
Answer: A
32) $\sum_{i=1}^{n} (a x_i + b) =$
A) $n a \bar{x} + n b$
B) $n(a + b)$
C)
D)
Answer: A
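The two summation rules in questions 31 and 32 can be verified numerically. The data and constants below are hypothetical and chosen only for illustration; the sketch compares each sum computed term by term with the closed form given in the correct option.

```python
import math

# Hypothetical data and constants, for illustration only.
x = [1.0, 2.0, 4.0]
y = [3.0, 5.0, 8.0]
a, b, c = 2.0, -1.0, 0.5
n = len(x)

# Question 31: sum of (a*x_i + b*y_i + c) versus a*sum(x) + b*sum(y) + n*c.
lhs_31 = sum(a * xi + b * yi + c for xi, yi in zip(x, y))
rhs_31 = a * sum(x) + b * sum(y) + n * c

# Question 32: sum of (a*x_i + b) versus n*a*x_bar + n*b.
x_bar = sum(x) / n
lhs_32 = sum(a * xi + b for xi in x)
rhs_32 = n * a * x_bar + n * b

print(lhs_31, rhs_31, lhs_32, rhs_32)
```

The constant c contributes n times because it is added once for every observation, which is what distinguishes option A from option B in question 31.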
33) Assume that you assign the following subjective probabilities for your final grade in your econometrics course
(the standard GPA scale of 4 = A to 0 = F applies):
Grade        A     B     C     D     F
Probability  0.20  0.50  0.20  0.08  0.02
$\frac{x - \mu_x}{\sigma_x}$
is the standard deviation. Then the expected value and the standard deviation of Y are given as
A) 0 and 1
B) 1 and 1
C) Cannot be computed because Y is not a linear function of X
D) $\mu_x$ and $\sigma_x$
Answer: A
$\frac{6}{36} = \frac{1}{6}$;
(b) 0.111 or $\frac{4}{36} = \frac{1}{9}$;
(c) 1;
(d) 0;
(e) 0.583;
(f) 0.222 or $\frac{8}{36} = \frac{2}{9}$.
3) Probabilities and relative frequencies are related in that the probability of an outcome is the proportion of the
time that the outcome occurs in the long run. Hence concepts of joint, marginal, and conditional probability
distributions stem from related concepts of frequency distributions.
You are interested in investigating the relationship between the age of heads of households and weekly
earnings of households. The accompanying data gives the number of occurrences grouped by age and income.
You collect data from 1,744 individuals and think of these individuals as a population that you want to
describe, rather than a sample from which you want to infer behavior of a larger population. After sorting the
data, you generate the accompanying table:
Joint Absolute Frequencies of Age and Income, 1,744 Households
Household Income
Y1 $0-under $200
Y2 $200-under $ 400
90
346
140
Y3 $400-under $600
19
251
101
Y4 $600-under $800
11
110
55
108
84
Answer: (a) The joint relative frequencies and marginal relative frequencies are given in the accompanying table.
5.2 percent of the individuals are between the age of 20 and 24, and make between $200 and under $400.
21.6 percent of the individuals earn between $400 and under $600.
Joint Relative and Marginal Frequencies of Age and Income, 1,744 Households

                                          Age of head of household
                      X1            X2            X3            X4            X5
Household Income      16-under 20   20-under 25   25-under 45   45-under 65   65 and >   Total
Y1 $0-under $200      0.046         0.044         0.075         0.049         0.014      0.227
Y2 $200-under $400    0.007         0.052         0.198         0.080         0.005      0.342
Y3 $400-under $600    0.000         0.011         0.144         0.058         0.003      0.216
Y4 $600-under $800    0.001         0.006         0.063         0.032         0.001      0.102
Y5 $800 and >         0.001         0.001         0.062         0.048         0.001      0.112
(b) The mean household income for the 16-under 20 age category is roughly $144. It is approximately
$489 for the 45-under 65 age category.
Conditional Relative Frequencies of Income and
Age 16-under 20, and 45-under 65, 1,744 Households

                      Age of head of household
                      X1            X4
Household Income      16-under 20   45-under 65
Y1 $0-under $200      0.842         0.185
Y2 $200-under $400    0.137         0.300
Y3 $400-under $600    0.000         0.217
Y4 $600-under $800    0.001         0.118
Y5 $800 and >         0.001         0.180
(c) They would have to be identical, which they clearly are not.
(d) Pr(Y = y, X = x) = Pr(Y = y) Pr(X = x). We can check this by multiplying two marginal probabilities to
see if this results in the joint probability. For example, Pr(Y = Y3 ) = 0.216 and Pr(X = X3 ) = 0.542,
resulting in a product of 0.117, which does not equal the joint probability of 0.144. Given that we are
looking at the data as a population, not a sample, we do not have to test how close 0.117 is to 0.144.
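The check described in answer (d) can be sketched in a few lines. The joint relative frequencies are those listed in answer (a), with rows as income groups and columns as age groups; the fifth row is the last income group in that table. The code recovers the marginal frequencies by summing rows and columns and compares the product of two marginals with the corresponding joint frequency.

```python
# Joint relative frequencies from answer (a): rows are income groups Y1..Y5,
# columns are age groups X1..X5.
joint = [
    [0.046, 0.044, 0.075, 0.049, 0.014],
    [0.007, 0.052, 0.198, 0.080, 0.005],
    [0.000, 0.011, 0.144, 0.058, 0.003],
    [0.001, 0.006, 0.063, 0.032, 0.001],
    [0.001, 0.001, 0.062, 0.048, 0.001],
]

pr_y = [sum(row) for row in joint]        # marginal frequencies of income
pr_x = [sum(col) for col in zip(*joint)]  # marginal frequencies of age

# Under independence, Pr(Y = Y3, X = X3) would equal Pr(Y = Y3) * Pr(X = X3).
product_y3_x3 = pr_y[2] * pr_x[2]
joint_y3_x3 = joint[2][2]

# Largest gap between any joint frequency and the product of its marginals.
max_gap = max(
    abs(joint[i][j] - pr_y[i] * pr_x[j])
    for i in range(len(pr_y))
    for j in range(len(pr_x))
)

print(round(pr_y[2], 3), round(pr_x[2], 3), round(product_y3_x3, 3), joint_y3_x3)
```

Because the data are treated as a population, any gap between the product and the joint frequency that exceeds rounding error is enough to rule out independence.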
4) Math and verbal SAT scores are each distributed normally with N (500,10000).
(a) What fraction of students scores above 750? Above 600? Between 420 and 530? Below 480? Above 530?
(b) If the math and verbal scores were independently distributed, which is not the case, then what would be the
distribution of the overall SAT score? Find its mean and variance.
(c) Next, assume that the correlation coefficient between the math and verbal scores is 0.75. Find the mean and
variance of the resulting distribution.
(d) Finally, assume that you had chosen 25 students at random who had taken the SAT exam. Derive the
distribution for their average math SAT score. What is the probability that this average is above 530? Why is
this so much smaller than your answer in (a)?
Answer: (a) Pr(Y>750) = 0.0062; Pr(Y>600) = 0.1587; Pr(420<Y<530) = 0.4061; Pr(Y<480) = 0.4207; Pr(Y>530) =
0.3821.
(b) The distribution would be N(1000, 20000), using equations (2.29) and (2.31) in the textbook. Note that
the standard deviation is now roughly 141 rather than 200.
(c) Given the correlation coefficient, the distribution is now N(1000, 35000), which has a standard
deviation of approximately 187.
(d) The distribution for the average math SAT score is N(500, 400). Pr($\bar{Y}$ > 530) = 0.0668. This probability
is smaller because the sample mean has a smaller standard deviation (20 rather than 100).
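The probabilities and variances in this answer can be reproduced with the standard normal c.d.f. The sketch below builds the c.d.f. from `math.erf`; the helper names `phi` and `normal_cdf` are illustrative choices, not notation from the textbook.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x, mean, sd):
    """Pr(Y <= x) for Y distributed N(mean, sd**2)."""
    return phi((x - mean) / sd)

MEAN, SD = 500.0, 100.0  # N(500, 10000) means a standard deviation of 100

# Part (a)
p_above_750 = 1.0 - normal_cdf(750, MEAN, SD)
p_above_600 = 1.0 - normal_cdf(600, MEAN, SD)
p_between_420_530 = normal_cdf(530, MEAN, SD) - normal_cdf(420, MEAN, SD)
p_below_480 = normal_cdf(480, MEAN, SD)
p_above_530 = 1.0 - normal_cdf(530, MEAN, SD)

# Parts (b) and (c): variance of the sum of math and verbal scores.
var_independent = SD**2 + SD**2
var_correlated = SD**2 + SD**2 + 2 * 0.75 * SD * SD

# Part (d): the average of 25 scores has standard deviation 100 / sqrt(25) = 20.
sd_mean_25 = SD / math.sqrt(25)
p_mean_above_530 = 1.0 - normal_cdf(530, MEAN, sd_mean_25)

print(round(p_below_480, 4), var_independent, var_correlated, round(p_mean_above_530, 4))
```

The same cutoff of 530 is 0.3 standard deviations above the mean for a single score but 1.5 standard deviations above the mean for the average of 25 scores, which is why the second probability is so much smaller.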
5) The following problem is frequently encountered in the case of a rare disease, say AIDS, when determining the
probability of actually having the disease after testing positively for HIV. (This is often known as the accuracy
of the test given that you have the disease.) Let us set up the problem as follows: Y = 0 if you tested negative
using the ELISA test for HIV, Y = 1 if you tested positive; X = 1 if you have HIV, X = 0 if you do not have HIV.
Assume that 0.1 percent of the population has HIV and that the accuracy of the test is 0.95 in both cases of (i)
testing positive when you have HIV, and (ii) testing negative when you do not have HIV. (The actual ELISA
test is actually 99.7 percent accurate when you have HIV, and 98.5 percent accurate when you do not have
HIV.)
(a) Assuming arbitrarily a population of 10,000,000 people, use the accompanying table to first enter the
column totals.
                       HIV (X=1)   No HIV (X=0)   Total
Test Positive (Y=1)
Test Negative (Y=0)
Total                                             10,000,000
(b) Use the conditional probabilities to fill in the joint absolute frequencies.
(c) Fill in the marginal absolute frequencies for testing positive and negative. Determine the conditional
probability of having HIV when you have tested positive. Explain this surprising result.
(d) The previous problem is an application of Bayes' theorem, which converts Pr(Y = y | X = x) into Pr(X = x | Y =
y). Can you think of other examples where Pr(Y = y | X = x) ≠ Pr(X = x | Y = y)?
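Answer: (a)
                       HIV (X=1)   No HIV (X=0)   Total
Test Positive (Y=1)
Test Negative (Y=0)
Total                  10,000      9,990,000      10,000,000
(b)
                       HIV (X=1)   No HIV (X=0)   Total
Test Positive (Y=1)    9,500       499,500
Test Negative (Y=0)    500         9,490,500
Total                  10,000      9,990,000      10,000,000
(c)
                       HIV (X=1)   No HIV (X=0)   Total
Test Positive (Y=1)    9,500       499,500        509,000
Test Negative (Y=0)    500         9,490,500      9,491,000
Total                  10,000      9,990,000      10,000,000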
Pr(X=1 | Y=1) = 0.0187. Although the test is quite accurate, there are very few people who have HIV
(10,000), and many who do not have HIV (9,990,000). A small percentage of that large number
(499,500/9,990,000) is large when compared to the higher percentage of the smaller number
(9,500/10,000).
(d) Answers will vary by student. Perhaps a nice illustration is the probability to be a male given that you
play on the college/university men's varsity team, versus the probability to play on the college/university
men's varsity team given that you are a male student.
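The counts behind this answer follow directly from the assumptions stated in the problem (a population of 10,000,000, a prevalence of 0.1 percent, and 95 percent accuracy in both directions). The sketch below works in integers to avoid rounding; the variable names are illustrative.

```python
population = 10_000_000

# Column totals: 0.1 percent of the population has HIV.
hiv = population // 1000
no_hiv = population - hiv

# Joint absolute frequencies from the 95 percent accuracy in both cases.
true_positive = hiv * 95 // 100           # HIV and tested positive
false_negative = hiv - true_positive      # HIV and tested negative
true_negative = no_hiv * 95 // 100        # no HIV and tested negative
false_positive = no_hiv - true_negative   # no HIV and tested positive

# Marginal absolute frequencies of the test outcome.
tested_positive = true_positive + false_positive
tested_negative = false_negative + true_negative

# Bayes' theorem in frequency form: Pr(X = 1 | Y = 1).
pr_hiv_given_positive = true_positive / tested_positive

print(true_positive, false_positive, tested_positive, round(pr_hiv_given_positive, 4))
```

The false positives outnumber the true positives by more than fifty to one, so a positive test still leaves the probability of having HIV below 2 percent.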
6) You have read about the so-called catch-up theory by economic historians, whereby nations that are further
behind in per capita income grow faster subsequently. If this is true systematically, then eventually laggards
will reach the leader. To put the theory to the test, you collect data on relative (to the United States) per capita
income for two years, 1960 and 1990, for 24 OECD countries. You think of these countries as a population you
want to describe, rather than a sample from which you want to infer behavior of a larger population. The
relevant data for this question is as follows:
Y        X1       X2       Y×X1     Y^2       X1^2     X2^2
0.023    0.770    1.030    0.018    0.00053   0.593    1.0609
0.014    1.000    1.000    0.014    0.00020   1.000    1.0000
.        .        .        .        .         .        .
0.041    0.200    0.450    0.008    0.00168   0.040    0.2025
0.033    0.130    0.230    0.004    0.00109   0.017    0.0529
0.625    13.220   17.800   0.294    0.01877   8.529    13.9164
where X1 and X2 are per capita income relative to the United States in 1960 and 1990 respectively, and Y is the
average annual growth rate in X over the 1960-1990 period. Numbers in the last row represent sums of the
columns above.
(a) Calculate the variance and standard deviation of X1 and X2 . For a catch-up effect to be present, what
relationship must the two standard deviations show? Is this the case here?
(b) Calculate the correlation between Y and $X_1$. What sign must the correlation coefficient have for there to be
evidence of a catch-up effect? Explain.
Answer: (a) The variances of X1 and X2 are 0.0520 and 0.0298 respectively, with standard deviations of 0.2279
and 0.1726. For the catch-up effect to be present, the standard deviation would have to shrink over time.
This is the case here.
(b) The correlation coefficient is -0.88. It has to be negative for there to be evidence of a catch-up effect. If
countries that were relatively ahead in the initial period in terms of per capita income grow by
relatively less over time, then eventually the laggards will catch up.
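The variances in answer (a) can be recomputed from the column sums in the data table, treating the 24 countries as a population (so dividing by n rather than n - 1). The sketch below uses only the sums printed in the last row. Because those sums are rounded, the correlation it produces is close to, but not exactly, the reported value, so the code only confirms that it is strongly negative.

```python
import math

n = 24  # number of OECD countries

# Column sums from the last row of the data table.
sum_y, sum_x1, sum_x2 = 0.625, 13.220, 17.800
sum_y_x1 = 0.294
sum_y_sq, sum_x1_sq, sum_x2_sq = 0.01877, 8.529, 13.9164

def population_variance(sum_of_squares, total, n):
    """Population variance as E(Z^2) - [E(Z)]^2, computed from sums."""
    mean = total / n
    return sum_of_squares / n - mean ** 2

var_x1 = population_variance(sum_x1_sq, sum_x1, n)
var_x2 = population_variance(sum_x2_sq, sum_x2, n)
sd_x1, sd_x2 = math.sqrt(var_x1), math.sqrt(var_x2)

# Covariance and correlation between growth (Y) and initial relative income (X1).
var_y = population_variance(sum_y_sq, sum_y, n)
cov_y_x1 = sum_y_x1 / n - (sum_y / n) * (sum_x1 / n)
corr_y_x1 = cov_y_x1 / math.sqrt(var_y * var_x1)

print(round(var_x1, 4), round(sd_x1, 4), round(var_x2, 4), round(sd_x2, 4))
print(round(corr_y_x1, 2))
```

The standard deviation of relative income shrinks between 1960 and 1990, and growth is negatively related to initial income; both pieces point in the direction of catch-up.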
7) Following Alfred Nobel's will, there are five Nobel Prizes awarded each year. These are for outstanding
achievements in Chemistry, Physics, Physiology or Medicine, Literature, and Peace. In 1968, the Bank of
Sweden added a prize in Economic Sciences in memory of Alfred Nobel. You think of the data as describing a
population, rather than a sample from which you want to infer behavior of a larger population. The
accompanying table lists the joint probability distribution between recipients in economics and the other five
prizes, and the citizenship of the recipients, based on the 1969-2001 period.
Joint Distribution of Nobel Prize Winners in Economics and Non-Economics
Disciplines, and Citizenship, 1969-2001

                                            U.S. Citizen   Non-U.S. Citizen
                                            (Y = 0)        (Y = 1)            Total
Economics Nobel Prize (X = 0)               0.118          0.049              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)               0.345          0.488              0.833
Total                                       0.463          0.537              1.00
(c) A randomly selected Nobel Prize winner reports that he is a non-U.S. citizen. What is the probability that
this genius has won the Economics Nobel Prize? A Nobel Prize in the other five disciplines?
(d) Show what the joint distribution would look like if the two categories were independent.
Answer: (a) E(Y) = 0.537. 53.7 percent of Nobel Prize winners were non-U.S. citizens.
(b) E(Y | X=1) = 0.586. 58.6 percent of Nobel Prize winners in non-economics disciplines were non-U.S.
citizens. E(Y | X=0) = 0.293. 29.3 percent of the Economics Nobel Prize winners were non-U.S. citizens.
(c) There is a 9.1 percent chance that he has won the Economics Nobel Prize, and a 90.9 percent chance
that he has won a Nobel Prize in one of the other five disciplines.
(d)
Joint Distribution of Nobel Prize Winners in Economics and Non-Economics Disciplines,
and Citizenship, 1969-2001, under assumption of independence

                                            U.S. Citizen   Non-U.S. Citizen
                                            (Y = 0)        (Y = 1)            Total
Economics Nobel Prize (X = 0)               0.077          0.090              0.167
Physics, Chemistry, Medicine, Literature,
and Peace Nobel Prize (X = 1)               0.386          0.447              0.833
Total                                       0.463          0.537              1.00
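Under independence every joint probability is the product of the two marginal probabilities, so the table for part (d) can be generated from the row and column totals alone. The sketch below uses the marginal probabilities given in the problem (0.167 and 0.833 for the prize category, 0.463 and 0.537 for citizenship); the dictionary layout is an illustrative choice.

```python
# Marginal distributions from the problem.
pr_x = {0: 0.167, 1: 0.833}   # X = 0: Economics prize, X = 1: the other five prizes
pr_y = {0: 0.463, 1: 0.537}   # Y = 0: U.S. citizen, Y = 1: non-U.S. citizen

# Joint distribution under independence: Pr(X = x, Y = y) = Pr(X = x) * Pr(Y = y).
independent_joint = {
    (x, y): round(px * py, 3)
    for x, px in pr_x.items()
    for y, py in pr_y.items()
}

for (x, y), p in sorted(independent_joint.items()):
    print(f"Pr(X = {x}, Y = {y}) = {p:.3f}")
```

Comparing these products with the observed joint probabilities shows how far the actual distribution is from independence: the observed share of U.S. citizens among Economics winners is well above the 0.077 implied here.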
8) A few years ago the news magazine The Economist listed some of the stranger explanations used in the past to
predict presidential election outcomes. These included whether or not the hemlines of women's skirts went up
or down, stock market performances, baseball World Series wins by an American League team, etc. Thinking
about this problem more seriously, you decide to analyze whether or not the presidential candidate for a
certain party did better if his party controlled the house. Accordingly you collect data for the last 34
presidential elections. You think of this data as comprising a population which you want to describe, rather
than a sample from which you want to infer behavior of a larger population. You generate the accompanying
table:
Joint Distribution of Presidential Party Affiliation and Party Control
of House of Representatives, 1860-1996

                                Democratic Control   Republican Control
                                of House (Y = 0)     of House (Y = 1)     Total
Democratic President (X = 0)    0.412                0.029                0.441
Republican President (X = 1)    0.176                0.382                0.559
Total                           0.588                0.412                1.00
(a) Interpret one of the joint probabilities and one of the marginal probabilities.
(b) Compute E(X). How does this differ from E(X | Y = 0)? Explain.
(c) If you picked one of the Republican presidents at random, what is the probability that during his term the
Democrats had control of the House?
(d) What would the joint distribution look like under independence? Check your results by calculating the two
conditional distributions and compare these to the marginal distribution.
Answer: (a) 38.2 percent of the presidents were Republicans and were in the White House while Republicans
controlled the House of Representatives. 44.1 percent of all presidents were Democrats.
(b) E(X) = 0.559. 55.9 percent of the presidents were Republicans. E(X | Y = 0) = 0.299. 29.9 percent of
those presidents who were in office while Democrats had control of the House of Representatives were
Republicans. E(X) gives you the unconditional expected value, while E(X | Y = 0) is the conditional
expected value: it conditions on those periods during which Democrats had control of the House
of Representatives, and ignores the other periods.
(c) Pr(Y = 0 | X = 1) = 0.176/0.559 = 0.315. There is a 31.5 percent probability that the Democrats had
control of the House during the term of a randomly picked Republican president.
(d)
Joint Distribution of Presidential Party Affiliation and Party Control of House of
Representatives, 1860-1996, under the Assumption of Independence

                                Democratic Control   Republican Control
                                of House (Y = 0)     of House (Y = 1)     Total
Democratic President (X = 0)    0.259                0.182                0.441
Republican President (X = 1)    0.329                0.230                0.559
Total                           0.588                0.412                1.00

Pr(X = 0 | Y = 0) = 0.259/0.588 = 0.440 (there is a small rounding error).
Pr(Y = 1 | X = 1) = 0.230/0.559 = 0.411 (there is a small rounding error).
$f(u - \bar{u})$,
where $\Delta p$ is the actual inflation rate, $\pi$ is the expected inflation rate, and $u$ is the unemployment rate, with the bar
indicating equilibrium (the NAIRU, the Non-Accelerating Inflation Rate of Unemployment). Under the
assumption of static expectations ($\pi = \Delta p_{-1}$), i.e., that you expect this period's inflation rate to hold for the next
period ("the sun shines today, it will shine tomorrow"), then the prediction is that inflation will accelerate if the
unemployment rate is below its equilibrium level. The accompanying table below displays information on
accelerating annual inflation and unemployment rate differences from the equilibrium rate (cyclical
unemployment), where the latter is approximated by a five-year moving average. You think of this data as a
population which you want to describe, rather than a sample from which you want to infer behavior of a larger
population. The data is collected from United States quarterly data for the period 1964:1 to 1995:4.
Joint Distribution of Accelerating Inflation and Cyclical Unemployment,
1964:1-1995:4

                                 $\Delta p - \Delta p_{-1} > 0$   $\Delta p - \Delta p_{-1} \le 0$
                                 (X = 0)                          (X = 1)                            Total
$(u - \bar{u}) > 0$ (Y = 0)      0.156                            0.297                              0.453
$(u - \bar{u}) \le 0$ (Y = 1)    0.383                            0.164                              0.547
Total                            0.539                            0.461                              1.00
be? Given that the two means are different, is this sufficient to assume that the two variables are independent?
(c) What is the probability of inflation to increase if there is positive cyclical unemployment? Negative cyclical
unemployment?
(d) You randomly select one of the 59 quarters when there was positive cyclical unemployment ($(u - \bar{u}) > 0$).
What is the probability there was decelerating inflation during that quarter?
Answer: (a) E(Y) = 0.547. 54.7 percent of the quarters saw negative cyclical unemployment.
E(X) = 0.461. 46.1 percent of the quarters saw decreasing inflation rates.
(b) E(Y | X = 1) = 0.356; E(Y | X = 0) = 0.711. You would expect the two conditional expectations to be the
same. In general, independence in means does not imply statistical independence, although the reverse
is true.
(c) There is a 34.4 percent probability of inflation to increase if there is positive cyclical unemployment.
There is a 70 percent probability of inflation to increase if there is negative cyclical unemployment.
(d) There is a 65.6 percent probability of inflation to decelerate when there is positive cyclical
unemployment.
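The conditional probabilities in answers (b) through (d) are ratios of a joint probability to a marginal probability. The sketch below assumes the cell assignment that is consistent with the stated totals and answers: Pr(X=0, Y=0) = 0.156, Pr(X=1, Y=0) = 0.297, Pr(X=0, Y=1) = 0.383, and Pr(X=1, Y=1) = 0.164, where X = 0 denotes accelerating inflation and Y = 0 denotes positive cyclical unemployment.

```python
# Joint distribution keyed by (x, y); the cell assignment is an assumption that
# matches the marginal totals and the answers given in the text.
joint = {
    (0, 0): 0.156, (1, 0): 0.297,
    (0, 1): 0.383, (1, 1): 0.164,
}

def pr_y(y):
    """Marginal probability Pr(Y = y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def pr_x(x):
    """Marginal probability Pr(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def pr_x_given_y(x, y):
    """Conditional probability Pr(X = x | Y = y)."""
    return joint[(x, y)] / pr_y(y)

def pr_y_given_x(y, x):
    """Conditional probability Pr(Y = y | X = x)."""
    return joint[(x, y)] / pr_x(x)

accelerate_given_positive = pr_x_given_y(0, 0)   # part (c), first question
accelerate_given_negative = pr_x_given_y(0, 1)   # part (c), second question
decelerate_given_positive = pr_x_given_y(1, 0)   # part (d)
e_y_given_x1 = pr_y_given_x(1, 1)                # part (b)
e_y_given_x0 = pr_y_given_x(1, 0)                # part (b)

print(round(accelerate_given_positive, 3), round(accelerate_given_negative, 3))
```

Because Y takes only the values 0 and 1, the conditional expectation E(Y | X = x) is simply the conditional probability Pr(Y = 1 | X = x).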
10) The accompanying table shows the joint distribution between the change of the unemployment rate in an
election year and the share of the candidate of the incumbent party since 1928. You think of this data as a
population which you want to describe, rather than a sample from which you want to infer behavior of a larger
population.
Joint Distribution of Unemployment Rate Change and Incumbent Party's Vote
Share in Total Vote Cast for the Two Major-Party Candidates,
1928-2000
u > 0 (X = 0)
u 0 (X = 1)
Total
Total
0.264
0.736
1.00
u > 0 (X = 0)
u 0 (X = 1)
Total
Total
0.264
0.736
1.00
11) The accompanying table lists the joint distribution of unemployment in the United States in 2001 by
demographic characteristics (race and gender).
Joint Distribution of Unemployment by Demographic Characteristics,
United States, 2001
Age 16-19
(X = 0)
Age 20 and above
(X = 1)
Total
White
(Y = 0)
0.13
Total
0.60
0.22
0.82
0.73
0.27
1.00
0.18
Age 16-19
(X = 0)
Age 20 and above
(X = 1)
Total
White
(Y = 0)
0.18
0.82
0.81
1.00
1.00
(c) The original table showed the joint probability distribution, while the table in (b) presented the
conditional probability distribution.
12) Download the chapter 8 CPS data set (ch8_cps.xls) from the Stock and Watson website
(http://www.pearsonhighered.com/stock_watson) and load it into a spreadsheet program such as Excel. For the
exercise, use the first 500 observations only. Using data for average hourly earnings only (ahe), describe the
earnings distribution. Use summary statistics, such as the mean, median, variance, and skewness. Produce a
frequency distribution (histogram) using reasonable earnings class sizes.
Answer:
ahe
Mean                 19.79
Standard Error       0.51
Median               16.83
Mode                 19.23
Standard Deviation   11.49
Sample Variance      131.98
Kurtosis             0.23
Skewness             0.96
Range                58.44
Minimum              2.14
Maximum              60.58
Sum                  9897.45
Count                500.0
Stock/Watson 2e -- CVC2 8/23/06 -- Page 24
The mean is $19.79. The median ($16.83) is lower than the mean, suggesting that the mean is
being pulled up by individuals with fairly high average hourly earnings. This is confirmed by
the skewness measure, which is positive, and therefore suggests a distribution with a long tail to
the right. The variance is 131.98 (in squared dollars), while the standard deviation is $11.49.
To generate the frequency distribution in Excel, you first have to settle on the number of class
intervals. Once you have decided on these, then the minimum and maximum in the data
suggest the class width. In Excel, you then define bins (the upper limits of the class intervals).
Sturges' formula can be used to suggest the number of class intervals (1 + 3.322 log10(n)), which
would suggest about 10 intervals here. Instead I settled for 8 intervals with a class width of $8;
minimum wages in California are currently $8 and approximately the same in other U.S. states.
The table produces the absolute frequencies, and relative frequencies can be calculated in a
straightforward way.
bins    Frequency   rel. freq.
8       50          0.1
16      187         0.374
24      115         0.23
32      68          0.136
40      38          0.076
48      33          0.066
56      8           0.016
66      1           0.002
More    0
Substitution of the relative frequencies into the histogram table then produces the following
graph (after eliminating the gaps between the bars).
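The relative frequencies and the suggested number of class intervals can be reproduced outside a spreadsheet. The sketch below takes the absolute frequencies from the frequency table as given (the raw data are not reproduced here) and evaluates Sturges' rule in its base-2 form, 1 + log2(n), which is the same as 1 + 3.322 log10(n).

```python
import math

# Upper class limits and absolute frequencies as listed in the frequency table.
bins = [8, 16, 24, 32, 40, 48, 56, 66]
frequency = [50, 187, 115, 68, 38, 33, 8, 1]

n = sum(frequency)  # number of observations

# Relative frequency of each class interval.
relative_frequency = [round(f / n, 3) for f in frequency]

# Sturges' rule for the number of class intervals.
sturges_intervals = 1 + math.log2(n)

for upper, f, rel in zip(bins, frequency, relative_frequency):
    print(f"up to {upper:>2}: {f:>3}  {rel:.3f}")
print(f"Sturges' rule suggests about {sturges_intervals:.1f} intervals")
```

The rule is only a starting point; choosing a round class width that is easy to interpret, as done in the answer, is a common reason to deviate from it.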
4) Using the fact that the standardized variable Z is a linear transformation of the normally distributed random
variable Y, derive the expected value and variance of Z.
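Answer: $Z = \frac{Y - \mu_Y}{\sigma_Y} = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y} Y = a + bY$, with $a = -\frac{\mu_Y}{\sigma_Y}$ and $b = \frac{1}{\sigma_Y}$.
Then $\mu_Z = a + b\mu_Y = -\frac{\mu_Y}{\sigma_Y} + \frac{1}{\sigma_Y}\mu_Y = 0$, and $\sigma_Z^2 = b^2\sigma_Y^2 = \frac{1}{\sigma_Y^2}\sigma_Y^2 = 1$.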
5) Show in a scatterplot what the relationship between two variables X and Y would look like if there was
(a) a strong negative correlation.
(b) a strong positive correlation.
(c) no correlation.
Answer: [Scatterplots for (a), (b), and (c) not reproduced.]
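6) What would the correlation coefficient be if all observations for the two variables were on a curve described by
$Y = X^2$?
Answer: The correlation coefficient would be zero in this case, provided the observations of X are distributed
symmetrically around zero: the correlation coefficient measures only linear association, and the relationship
here is non-linear.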
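A short numerical sketch makes the role of symmetry concrete. The two sets of X values below are hypothetical; in both cases every observation lies exactly on the curve Y = X^2, but only the set that is symmetric around zero produces a correlation of zero.

```python
import math

def correlation(x, y):
    """Population correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    var_y = sum((yi - mean_y) ** 2 for yi in y) / n
    return cov / math.sqrt(var_x * var_y)

# X values symmetric around zero (hypothetical data).
x_symmetric = [-3, -2, -1, 0, 1, 2, 3]
corr_symmetric = correlation(x_symmetric, [xi ** 2 for xi in x_symmetric])

# X values that are all non-negative (hypothetical data).
x_positive = [0, 1, 2, 3, 4, 5, 6]
corr_positive = correlation(x_positive, [xi ** 2 for xi in x_positive])

print(corr_symmetric, round(corr_positive, 3))
```

In the second case the curve is increasing over the whole range of X, so a straight line captures most of the relationship and the correlation is close to one even though the relationship is not linear.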
7) Find the following probabilities:
(a) Y is distributed $\chi^2_4$. Find Pr(Y > 9.49).
8) In considering the purchase of a certain stock, you attach the following probabilities to possible changes in the
stock price over the next year.
Stock Price Change During
Next Twelve Months (%)      Probability
+15                         0.2
+5                          0.3
0                           0.4
-5                          0.05
-15                         0.05
What is the expected value, the variance, and the standard deviation? Which is the most likely outcome? Sketch
the cumulative distribution function.
Answer: E(Y) = 3.5; $\sigma_Y^2$ = 52.75; $\sigma_Y$ = 7.26; most likely: 0.
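The moments of this discrete distribution can be computed directly from their definitions. The sketch below assumes the five outcomes are +15, +5, 0, -5, and -15 percent (the last two being price declines) with the probabilities listed in the table; with these values the expected value is 3.5, the variance is 52.75, and the standard deviation is about 7.26.

```python
import math

# Outcomes (percentage price change) and their probabilities; the signs of the
# last two outcomes are assumed to be negative.
outcomes = [15, 5, 0, -5, -15]
probabilities = [0.2, 0.3, 0.4, 0.05, 0.05]

# Expected value: probability-weighted average of the outcomes.
expected_value = sum(y * p for y, p in zip(outcomes, probabilities))

# Variance: probability-weighted average of squared deviations from the mean.
variance = sum((y - expected_value) ** 2 * p for y, p in zip(outcomes, probabilities))
std_dev = math.sqrt(variance)

# Most likely outcome: the one with the highest probability.
most_likely = outcomes[probabilities.index(max(probabilities))]

# Cumulative distribution function, ordered from the smallest outcome upward.
cdf = []
running_total = 0.0
for y, p in sorted(zip(outcomes, probabilities)):
    running_total += p
    cdf.append((y, round(running_total, 2)))

print(round(expected_value, 2), round(variance, 2), round(std_dev, 2), most_likely)
print(cdf)
```

The list `cdf` gives the heights of the steps in the cumulative distribution function that the question asks you to sketch.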
9) You consider visiting Montreal during the break between terms in January. You go to the relevant Web site of
the official tourist office to figure out the type of clothes you should take on the trip. The site lists that the
average high during January is -7°C, with a standard deviation of 4°C. Unfortunately you are more familiar
with Fahrenheit than with Celsius, but find that the two are related by the following linear function:
$C = \frac{5}{9}(F - 32)$.
Find the mean and standard deviation for the January temperature in Montreal in Fahrenheit.
Answer: Using equations (2.29) and (2.30) from the textbook, the result is 19.4°F and 7.2°F.
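The conversion is a linear transformation, so only the intercept and slope are needed. Solving the given relationship for F gives F = 32 + (9/5)C. The sketch below assumes a mean of -7°C (the value that reproduces the stated answer) and applies the rules for the mean and standard deviation of a + bC.

```python
# Celsius moments; the mean is assumed to be -7 degrees, i.e. below freezing.
mean_c = -7.0
sd_c = 4.0

# F = a + b * C, obtained by solving C = (5/9)(F - 32) for F.
a = 32.0
b = 9.0 / 5.0

# Mean of a linear transformation: a + b * mean.
mean_f = a + b * mean_c

# Standard deviation of a linear transformation: |b| * sd (the intercept drops out).
sd_f = abs(b) * sd_c

print(round(mean_f, 1), round(sd_f, 1))
```

The intercept shifts the mean but has no effect on the spread, which is why the standard deviation is simply scaled by 9/5.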
10) Two random variables are independently distributed if their joint distribution is the product of their marginal
distributions. It is intuitively easier to understand that two random variables are independently distributed if
all conditional distributions of Y given X are equal. Derive one of the two conditions from the other.
Answer: If all conditional distributions of Y given X are equal, then
Pr(Y = y | X = 1) = Pr(Y = y | X = 2) = ... = Pr(Y = y | X = l).
But if all conditional distributions are equal, then they must also equal the marginal distribution, i.e.,
Pr(Y = y | X = x) = Pr(Y = y).
Given the definition of the conditional distribution of Y given X = x, you then get
$\Pr(Y = y \mid X = x) = \frac{\Pr(Y = y, X = x)}{\Pr(X = x)} = \Pr(Y = y)$,
which gives you Pr(Y = y, X = x) = Pr(Y = y) Pr(X = x), i.e., the joint distribution is the product of the
marginal distributions.
Pr(X = x Y = y)
,
Pr(X = x)
$\sum_{i=1}^{n} y_i^2 = 94{,}228.8$, $\sum_{i=1}^{n} x_i^2 = 1{,}248.9$, $\sum_{i=1}^{n} x_i y_i = 7{,}625.9$
13) Use the definition for the conditional distribution of Y given X = x and the marginal distribution of X to derive
the formula for Pr(X = x, Y = y). This is called the multiplication rule. Use it to derive the probability for
drawing two aces randomly from a deck of cards (no joker), where you do not replace the card after the first
draw. Next, generalizing the multiplication rule and assuming independence, find the probability of having
four girls in a family with four children.
Answer: $\frac{4}{52} \times \frac{3}{51} = 0.0045$; 0.0625 or $\left(\frac{1}{2}\right)^4 = \frac{1}{16}$.
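Both results follow from the multiplication rule, Pr(X = x, Y = y) = Pr(Y = y | X = x) Pr(X = x). The sketch below uses exact fractions to show the two steps; it assumes a standard 52-card deck with four aces and a probability of one half for a girl at each birth.

```python
from fractions import Fraction

# Two aces without replacement: Pr(first ace) * Pr(second ace | first ace).
pr_first_ace = Fraction(4, 52)
pr_second_ace_given_first = Fraction(3, 51)
pr_two_aces = pr_first_ace * pr_second_ace_given_first

# Four girls in four births: under independence the conditional probabilities
# equal the marginal ones, so the rule reduces to a product of marginals.
pr_girl = Fraction(1, 2)
pr_four_girls = pr_girl ** 4

print(pr_two_aces, round(float(pr_two_aces), 4))
print(pr_four_girls, float(pr_four_girls))
```

The card example needs the conditional probability because the first draw changes the composition of the deck, while the births are assumed independent.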
14) The systolic blood pressure of females in their 20s is normally distributed with a mean of 120 with a standard
deviation of 9. What is the probability of finding a female with a blood pressure of less than 100? More than
135? Between 105 and 123? You visit the women's soccer team on campus, and find that the average blood
pressure of the 25 members is 114. Is it likely that this group of women came from the same population?
Answer: Pr(Y<100) = 0.0131; Pr(Y>135) = 0.0478; Pr(105<Y<123) = 0.5828; Pr($\bar{Y}$ < 114) = Pr(Z < -3.33) = 0.0004.
(The smallest z-value listed in the table in the textbook is -2.99, which generates a probability value of
0.0014.) It is unlikely that this group of women came from the same population.
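These probabilities can be checked without a printed table using `statistics.NormalDist` from the Python standard library. The sketch below evaluates the three probabilities for an individual and the probability for the team average; in particular, the interval probability is the difference of two c.d.f. values, which comes out at roughly 0.58.

```python
import math
from statistics import NormalDist

population = NormalDist(mu=120, sigma=9)

# Probabilities for a single randomly chosen female.
p_below_100 = population.cdf(100)
p_above_135 = 1.0 - population.cdf(135)
p_between_105_123 = population.cdf(123) - population.cdf(105)

# The average of 25 independent draws has standard deviation 9 / sqrt(25) = 1.8.
team_average = NormalDist(mu=120, sigma=9 / math.sqrt(25))
p_average_below_114 = team_average.cdf(114)

print(round(p_below_100, 4), round(p_above_135, 4),
      round(p_between_105_123, 4), round(p_average_below_114, 4))
```

An average of 114 lies more than three standard errors below the population mean, which is why it is so unlikely for a random sample from this population.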
15) Show that the correlation coefficient between Y and X is unaffected if you use a linear transformation in both
variables. That is, show that corr(X,Y) = corr(X*, Y*), where X* = a + bX and Y* = c + dY, and where a, b, c, and d
are arbitrary nonzero constants.
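Answer: $\operatorname{corr}(X^*, Y^*) = \frac{\operatorname{cov}(X^*, Y^*)}{\sqrt{\operatorname{var}(X^*)\operatorname{var}(Y^*)}} = \frac{bd\,\operatorname{cov}(X, Y)}{\sqrt{b^2\operatorname{var}(X)\,d^2\operatorname{var}(Y)}} = \operatorname{corr}(X, Y)$,
where the last equality uses $bd > 0$, i.e., that b and d have the same sign. If b and d have opposite signs, the
correlation keeps its magnitude but changes sign.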
16) The textbook formula for the variance of the discrete random variable Y is given as
$\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i.$
Another commonly used formulation is
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i - \mu_Y^2.$
Prove that the two formulas are the same.
Answer: $\sigma_Y^2 = \sum_{i=1}^{k} (y_i - \mu_Y)^2 p_i = \sum_{i=1}^{k} (y_i^2 + \mu_Y^2 - 2\mu_Y y_i) p_i = \sum_{i=1}^{k} (y_i^2 p_i + \mu_Y^2 p_i - 2\mu_Y y_i p_i).$
Moving the summation sign through results in
$\sigma_Y^2 = \sum_{i=1}^{k} y_i^2 p_i + \mu_Y^2 \sum_{i=1}^{k} p_i - 2\mu_Y \sum_{i=1}^{k} y_i p_i.$
But $\sum_{i=1}^{k} p_i = 1$ and $\mu_Y = \sum_{i=1}^{k} y_i p_i$, giving you the second
expression after simplification.
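The equivalence of the two formulas can also be checked numerically. This is a minimal sketch in Python; the distribution in `values` and `probs` is a made-up example, not one taken from the textbook.

```python
def mean(values, probs):
    """Expected value of a discrete random variable."""
    return sum(y * p for y, p in zip(values, probs))

def variance_definition(values, probs):
    """sum of (y_i - mu)^2 * p_i"""
    mu = mean(values, probs)
    return sum((y - mu) ** 2 * p for y, p in zip(values, probs))

def variance_shortcut(values, probs):
    """sum of y_i^2 * p_i, minus mu^2"""
    mu = mean(values, probs)
    return sum(y ** 2 * p for y, p in zip(values, probs)) - mu ** 2

# Hypothetical distribution used only to illustrate the identity.
values = [0, 1, 2, 3]
probs = [0.1, 0.2, 0.3, 0.4]

v_def = variance_definition(values, probs)
v_short = variance_shortcut(values, probs)
print(v_def, v_short)
```

The two results agree up to floating-point rounding, which is why a tolerance rather than exact equality is the appropriate comparison.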
17) The Economic Report of the President gives the following age distribution of the United States population for the
year 2000:
United States Population By Age Group, 2000
Outcome (age category)   Under 5   5-15   16-19   20-24   25-44   45-64   65 and over
Percentage               0.06      0.16   0.06    0.07    0.30    0.22    0.13
Imagine that every person was assigned a unique number between 1 and 275,372,000 (the total population in
2000). If you generated a random number, what would be the probability that you had drawn someone older
than 65 or under 16? Treating the percentages as probabilities, write down the cumulative probability
distribution. What is the probability of drawing someone who is 24 years or younger?
Answer: Pr(Y < 16 or Y > 65) = 0.35;
Outcome (age category)                Under 5   5-15   16-19   20-24   25-44   45-64   65 and over
Cumulative probability distribution   0.06      0.22   0.28    0.35    0.65    0.87    1.00
$\Pr(Y \le 24) = 0.35.$
18) The accompanying table gives the outcomes and probability distribution of the number of times a student
checks her e-mail daily:
Probability of Checking E-Mail
Outcome (number of email checks)   0      1      2      3      4      5      6
Probability distribution           0.05   0.15   0.30   0.25   0.15   0.08   0.02
Sketch the probability distribution. Next, calculate the c.d.f. for the above table. What is the probability of her
checking her e-mail between 1 and 3 times a day? Of checking it more than 3 times a day?
Answer:
Outcome (number of email checks)      0      1      2      3      4      5      6
Cumulative probability distribution   0.05   0.20   0.50   0.75   0.90   0.98   1.00
$\Pr(1 \le Y \le 3) = 0.70$; $\Pr(Y > 3) = 0.25$.
19) The accompanying table lists the outcomes and the probability distribution for a student renting
videos during the week while on campus.
Video Rentals per Week during Semester
Outcome (number of weekly video rentals)   0      1      2      3      4      5      6
Probability distribution                   0.05   0.55   0.25   0.05   0.07   0.02   0.01
Sketch the probability distribution. Next, calculate the cumulative probability distribution for the above table.
What is the probability of the student renting between 2 and 4 a week? Of less than 3 a week?
Answer: The cumulative probability distribution is given below. The probability of renting between two and four
videos a week is 0.37. The probability of renting less than three a week is 0.85.
Outcome (number of weekly video rentals)   0      1      2      3      4      5      6
Cumulative probability distribution        0.05   0.60   0.85   0.90   0.97   0.99   1.00
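The cumulative distribution is just the running sum of the probability distribution. The sketch below illustrates this in Python for the video rental table; the outcome labels 0 through 6 are an assumption (they are consistent with the stated answers of 0.37 and 0.85), and `prob_between` is an illustrative helper.

```python
from itertools import accumulate

# Assumed outcome labels 0..6; probabilities are taken from the table.
outcomes = [0, 1, 2, 3, 4, 5, 6]
pmf = [0.05, 0.55, 0.25, 0.05, 0.07, 0.02, 0.01]

# Running sum of the probabilities, rounded to remove floating-point noise.
cdf = [round(c, 2) for c in accumulate(pmf)]

def prob_between(lo, hi):
    """Pr(lo <= Y <= hi), summed directly from the probability distribution."""
    return round(sum(p for y, p in zip(outcomes, pmf) if lo <= y <= hi), 2)

p_two_to_four = prob_between(2, 4)       # between 2 and 4 rentals
p_less_than_three = prob_between(0, 2)   # fewer than 3 rentals
print(cdf, p_two_to_four, p_less_than_three)
```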
20) The textbook mentioned that the mean of Y, E(Y) is called the first moment of Y, and that the expected value of
the square of Y, $E(Y^2)$, is called the second moment of Y, and so on. These are also referred to as moments about
the origin. A related concept is moments about the mean, which are defined as $E[(Y - \mu_Y)^r]$. What do you call
the second moment about the mean? What do you think the third moment, referred to as skewness,
measures? Do you believe that it would be positive or negative for an earnings distribution? What measure of
the third moment around the mean do you get for a normal distribution?
Answer: The second moment about the mean is the variance. Skewness measures the departure from symmetry.
For the typical earnings distribution, it will be positive. For the normal distribution, it will be zero.
21) Explain why the two probabilities are identical for the standard normal distribution: $\Pr(-1.96 \le X \le 1.96)$ and
$\Pr(-1.96 < X < 1.96)$.
Answer: For a continuous distribution, the probability of a point is zero.
22) SAT scores in Mathematics are normally distributed with a mean of 500 and a standard deviation of 100. The
formula for the normal distribution is $f(Y) = \frac{1}{\sqrt{2\pi\sigma_Y^2}}\, e^{-\frac{1}{2}\left(\frac{Y - \mu_Y}{\sigma_Y}\right)^2}$. Use the scatter plot option in a standard
spreadsheet program, such as Excel, to plot the Mathematics SAT distribution using this formula. Start by
entering 300 as the first SAT score in the first column (the lowest score you can get in the mathematics section
as long as you fill in your name correctly), and then increment the scores by 10 until you reach 800. In the
second column, use the formula for the normal distribution and calculate f(Y). Then use the scatter plot option,
where you eventually remove markers and substitute these with the solid line option.
Answer:
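The two spreadsheet columns described in the question can be sketched in Python as follows. This is only an illustration of the calculation, not the plot itself; `normal_pdf`, `scores`, and `density` are illustrative names.

```python
import math

def normal_pdf(y, mu=500.0, sigma=100.0):
    """Normal density f(Y) for the stated mean and standard deviation."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi * sigma ** 2)

# First column: SAT scores from 300 to 800 in steps of 10.
scores = list(range(300, 801, 10))
# Second column: the density evaluated at each score.
density = [normal_pdf(s) for s in scores]

for s, f in zip(scores, density):
    print(s, f"{f:.6f}")
```

Plotting `density` against `scores` as a connected line gives the familiar bell shape centered at 500.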
23) Use a standard spreadsheet program, such as Excel, to find the following probabilities from various
distributions analyzed in the current chapter:
a. If Y is distributed N(1,4), find $\Pr(Y \le 3)$
b. If Y is distributed N(3,9), find $\Pr(Y > 0)$
c. If Y is distributed N(50,25), find $\Pr(40 \le Y \le 52)$
d. If Y is distributed N(5,2), find $\Pr(6 \le Y \le 8)$
Answer: The answers here are given together with the relevant Excel commands.
a. =NORMDIST(3,1,2,TRUE) = 0.8413
b. =1-NORMDIST(0,3,3,TRUE) = 0.8413
c. =NORMDIST(52,50,5,TRUE)-NORMDIST(40,50,5,TRUE) = 0.6326
d. =NORMDIST(8,5,SQRT(2),TRUE)-NORMDIST(6,5,SQRT(2),TRUE) = 0.2229
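For readers without Excel, the same four probabilities can be reproduced with the Python standard library. In this sketch `normal_cdf` is an illustrative wrapper; as in the Excel commands, the second parameter of N(mean, variance) is converted to a standard deviation.

```python
from math import sqrt
from statistics import NormalDist

def normal_cdf(x, mean, variance):
    """Pr(Y <= x) for Y distributed N(mean, variance)."""
    return NormalDist(mu=mean, sigma=sqrt(variance)).cdf(x)

answer_a = normal_cdf(3, 1, 4)                                # Pr(Y <= 3)
answer_b = 1 - normal_cdf(0, 3, 9)                            # Pr(Y > 0)
answer_c = normal_cdf(52, 50, 25) - normal_cdf(40, 50, 25)    # Pr(40 <= Y <= 52)
answer_d = normal_cdf(8, 5, 2) - normal_cdf(6, 5, 2)          # Pr(6 <= Y <= 8)

for name, value in zip("abcd", [answer_a, answer_b, answer_c, answer_d]):
    print(name, f"{value:.4f}")
```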
24) Looking at a large CPS data set with over 60,000 observations for the United States and the year 2004, you find
that the average number of years of education is approximately 13.6. However, a surprisingly large number of
individuals (approximately 800) have quite a low value for this variable, namely 6 years or less. You decide to
drop these observations, since none of your relatives or friends have that few years of education. In addition,
you are concerned that if these individuals cannot report the years of education correctly, then the observations
on other variables, such as average hourly earnings, can also not be trusted. As a matter of fact you have found
several of these to be below minimum wages in your state. Discuss if dropping the observations is reasonable.
Answer: While it is always a good idea to check the data carefully before conducting a quantitative analysis, you
should never drop data before carefully thinking about the problem at hand. While it is not plausible to
find many individuals in the U.S. who were raised here with that few years of education, there will be
immigrants in the survey. Average years of education can be quite low in other countries. For example,
Brazil's average years of schooling is less than 6 years. The point of the exercise is to think hard whether
or not observations are outliers generated by faulty data entry or if there is a reason for observing values
which may appear strange at first.
25) Use a standard spreadsheet program, such as Excel, to find the following probabilities from various
distributions analyzed in the current chapter:
a. If Y is distributed $\chi^2_4$, find $\Pr(Y \le 7.78)$
b. If Y is distributed $\chi^2_{10}$, find $\Pr(Y > 18.31)$
c.
d.
e.
f.
g.
h.
Answer: The answers here are given together with the relevant Excel commands.
a. =1-CHIDIST(7.78,4) = 0.90
b. =CHIDIST(18.31,10) = 0.05
c. =FDIST(1.83,10,1000000) = 0.05
d. =TDIST(1.75,15,1) = 0.05
e. =1-TDIST(1.99,90,2) = 0.95
f. =NORMDIST(1.99,0,1,1)-NORMDIST(-1.99,0,1,1) = 0.953
g. =FDIST(4.12,7,4) = 0.10
h. =FDIST(2.79,7,120) = 0.01
A) Y =
.
Y
B) Y has the smallest variance of all estimators.
p
C) Y
Y.
^
D) E( Y) = Y.
Answer: D
^
Y.
B) its mean square error is the smallest possible.
C) Y is normally distributed.
p
D) Y
0.
Answer: A
5) An estimator $\hat{\mu}_Y$ of the population value $\mu_Y$ is more efficient when compared to another estimator $\tilde{\mu}_Y$, if
A) $E(\hat{\mu}_Y) > E(\tilde{\mu}_Y)$.
B) it has a smaller variance.
C) its c.d.f. is flatter than that of the other estimator.
^
2
Y /n.
C) SY.
SY
D)
.
n
Answer: D
8) The critical value of a two-sided t-test computed from a large sample
A) is 1.64 if the significance level of the test is 5%.
B) cannot be calculated unless you know the degrees of freedom.
C) is 1.96 if the significance level of the test is 5%.
D) is the same as the p-value.
Answer: C
9) A type I error is
A) always the same as (1-type II) error.
B) the error you make when rejecting the null hypothesis when it is true.
C) the error you make when rejecting the alternative hypothesis when it is true.
D) always 5%.
Answer: B
10) A type II error
A) is typically smaller than the type I error.
B) is the error you make when choosing type II or type I.
C) is the error you make when not rejecting the null hypothesis when it is false.
D) cannot be calculated when the alternative hypothesis contains an "=".
Answer: C
11) The size of the test
A) is the probability of committing a type I error.
B) is the same as the sample size.
C) is always equal to (1-the power of test).
D) can be greater than 1 in extreme examples.
Answer: A
12) The power of the test is
A) dependent on whether you calculate a t or a $t^2$ statistic.
B) one minus the probability of committing a type I error.
C) a subjective view taken by the econometrician dependent on the situation.
D) one minus the probability of committing a type II error.
Answer: D
13) When you are testing a hypothesis against a two-sided alternative, then the alternative is written as
A) $E(Y) > \mu_{Y,0}$.
B) $E(Y) = \mu_{Y,0}$.
C) $\bar{Y} \ne \mu_{Y,0}$.
D) $E(Y) \ne \mu_{Y,0}$.
Answer: D
14) A scatterplot
A) shows how Y and X are related when their relationship is scattered all over the place.
B) relates the covariance of X and Y to the correlation coefficient.
C) is a plot of n observations on Xi and Yi, where each observation is represented by the point (Xi, Yi).
D) shows n observations of Y over time.
Answer: C
15) The following types of statistical inference are used throughout econometrics, with the exception of
A) confidence intervals.
B) hypothesis testing.
C) calibration.
D) estimation.
Answer: C
16) Among all unbiased estimators that are weighted averages of $Y_1, \ldots, Y_n$, $\bar{Y}$ is
A) the only consistent estimator of $\mu_Y$.
B) the most efficient estimator of $\mu_Y$.
C) a number which, by definition, cannot have a variance.
D) the most unbiased estimator of $\mu_Y$.
Answer: B
17) To derive the least squares estimator $\bar{Y}$, you find the estimator m which minimizes
A) $\sum_{i=1}^{n} (Y_i - m)^2$.
n
B)
(Yi m) .
i=1
n
2
C)
mY i .
i=1
n
D)
(Yi m) .
i=1
Answer: A
18) If the null hypothesis states $H_0: E(Y) = \mu_{Y,0}$, then a two-sided alternative hypothesis is
A) $H_1: E(Y) \ne \mu_{Y,0}$.
B) H1 : E(Y)
Y,0.
C) $H_1: \mu_Y < \mu_{Y,0}$.
D) $H_1: E(Y) > \mu_{Y,0}$.
Answer: A
Stock/Watson 2e -- CVC2 8/23/06 -- Page 39
2
Y.
Answer: A
Y Y,0
2
Y
n
B) t =
C) t =
Y Y,0
SE(Y)
(Y Y,0)2
SE(Y)
D) 1.96.
Answer: A
24) The power of the test
A) is the probability that the test actually incorrectly rejects the null hypothesis when the null is true.
B) depends on whether you use $\bar{Y}$ or $\bar{Y}^2$ for the t-statistic.
C) is one minus the size of the test.
D) is the probability that the test correctly rejects the null when the alternative is true.
Answer: D
25) The sample covariance can be calculated in any of the following ways, with the exception of:
A) $\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$.
B) $\frac{1}{n-1}\sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1}\bar{X}\bar{Y}$.
C)
1
n
(Xi X)(Yi
Y).
i=1
D) $r_{XY} S_X S_Y$, where $r_{XY}$ is the correlation coefficient.
Answer: C
26) When the sample size n is large, the 90% confidence interval for $\mu_Y$ is
A) $\bar{Y} \pm 1.96\,SE(\bar{Y})$.
B) $\bar{Y} \pm 1.64\,SE(\bar{Y})$.
C) $\bar{Y} \pm 1.64\,\sigma_Y$.
D) $\bar{Y} \pm 1.96$.
Answer: B
27) The standard error for the difference in means of two random variables M and W, when the two population
variances are different, is
2
2
S M+ S W
A)
B)
SM SW
.
+
nM n
W
2
SM
C)
2
SW
1
(
).
+
2 nM
nW
2
SM
D)
nM + n
W
nM
2
SW
.
nW
Answer: D
28) The t-statistic has the following distribution:
A) standard normal distribution for n < 15
B) Student t distribution with n - 1 degrees of freedom regardless of the distribution of the Y.
C) Student t distribution with n - 1 degrees of freedom if the Y is normally distributed.
D) a standard normal distribution if the sample standard deviation goes to zero.
Answer: C
29) The following statement about the sample correlation coefficient is true.
A) $-1 \le r_{XY} \le 1$.
B) $r_{XY}^2 \xrightarrow{p} \mathrm{corr}(X_i, Y_i)$.
C) $r_{XY} < 1$.
D) $r_{XY} = \frac{S_{XY}^2}{S_X^2 S_Y^2}$.
Answer: A
30) The correlation coefficient
A) lies between zero and one.
B) is a measure of linear association.
C) is close to one if X causes Y.
D) takes on a high value if you have a strong nonlinear relationship.
Answer: B
$\frac{\bar{Y}_m - \bar{Y}_w}{SE(\bar{Y}_m - \bar{Y}_w)}$, where $SE(\bar{Y}_m - \bar{Y}_w) = \sqrt{\frac{s_m^2}{n_m} + \frac{s_w^2}{n_w}}$,
has
2) Adult males are taller, on average, than adult females. Visiting two recent American Youth Soccer Organization
(AYSO) under 12 year old (U12) soccer matches on a Saturday, you do not observe an obvious difference in the
height of boys and girls of that age. You suggest to your little sister that she collect data on height and gender
of children in 4th to 6th grade as part of her science project. The accompanying table shows her findings.
Height of Young Boys and Girls, Grades 4-6, in inches
Ȳ_Boys   S_Boys   n_Boys   Ȳ_Girls   S_Girls   n_Girls
57.8     3.9      55       58.4      4.2       57
(a) Let your null hypothesis be that there is no difference in the height of females and males at this age level.
Specify the alternative hypothesis.
(b) Find the difference in height and the standard error of the difference.
(c) Generate a 95% confidence interval for the difference in height.
(d) Calculate the t-statistic for comparing the two means. Is the difference statistically significant at the 1%
level? Which critical value did you use? Why would this number be smaller if you had assumed a one -sided
alternative hypothesis? What is the intuition behind this?
Answer: (a) $H_0: \mu_{Boys} - \mu_{Girls}$ …
… $\sqrt{\frac{3.9^2}{55} + \frac{4.2^2}{57}} = 0.77$.
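Parts (b) through (d) all follow from the difference in means and its standard error. The sketch below works them through in Python using the values in the table; it assumes the difference is taken as boys minus girls and uses the large-sample critical value 1.96, and `diff_in_means` is an illustrative helper rather than anything from the textbook.

```python
from math import sqrt

def diff_in_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Difference in means, its standard error (unequal variances),
    a large-sample confidence interval, and the t-statistic."""
    diff = mean_a - mean_b
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    ci = (diff - z * se, diff + z * se)
    t = diff / se
    return diff, se, ci, t

# Boys: mean 57.8, s = 3.9, n = 55; girls: mean 58.4, s = 4.2, n = 57.
diff, se, ci, t_stat = diff_in_means(57.8, 3.9, 55, 58.4, 4.2, 57)
print(f"difference = {diff:.2f}, SE = {se:.2f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), t = {t_stat:.2f}")
```

Because the interval contains zero and the t-statistic is small in absolute value, the calculation does not point to a statistically significant difference in height.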
3) Math SAT scores (Y) are normally distributed with a mean of 500 and a standard deviation of 100. An evening
school advertises that it can improve students' scores by roughly a third of a standard deviation, or 30 points, if
they attend a course which runs over several weeks. (A similar claim is made for attending a verbal SAT
course.) The statistician for a consumer protection agency suspects that the courses are not effective. She views
the situation as follows: $H_0: \mu_Y = 500$ vs. $H_1: \mu_Y = 530$.
(a) Sketch the two distributions under the null hypothesis and the alternative hypothesis.
(b) The consumer protection agency wants to evaluate this claim by sending 50 students to attend classes. One
of the students becomes sick during the course and drops out. What is the distribution of the average score of
the remaining 49 students under the null, and under the alternative hypothesis?
(c) Assume that after graduating from the course, the 49 participants take the SAT test and score an average of
520. Is this convincing evidence that the school has fallen short of its claim? What is the p-value for such a score
under the null hypothesis?
(d) What would be the critical value under the null hypothesis if the size of your test were 5%?
(e) Given this critical value, what is the power of the test? What options does the statistician have for increasing
the power in this situation?
Answer: (a)
(b) $\bar{Y}$ of the 49 participants is normally distributed, with a mean of 500 and a standard deviation of
14.286 under the null hypothesis. Under the alternative hypothesis, it is normally distributed with a
mean of 530 and a standard deviation of 14.286.
(c) It is possible that the consumer protection agency had chosen a group of 49 students whose average
score would have been 490 without attending the course. The crucial question is how likely it is that 49
students, chosen randomly from a population with a mean of 500 and a standard deviation of 100, will
score an average of 520. The p-value for this score is 0.081, meaning that if the agency rejected the null
hypothesis based on this evidence, it would make a mistake, on average, roughly 1 out of 12 times.
Hence the average score of 520 would allow rejection of the null hypothesis that the school has had no
effect on the SAT score of students at the 10% level.
(d) The critical value would be 523.
(e) $\Pr(\bar{Y} < 523 \mid H_1 \text{ is true}) = 0.312$. Hence the power of the test is 0.688. She could increase the power by
increasing the size of the test. Alternatively, she could try to convince the agency to hire more test
subjects, i.e., she could increase the sample size.
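The p-value, critical value, and power in parts (c) through (e) can be recomputed as follows. This is a sketch under the assumptions stated in the question (a one-sided test, a known standard deviation of 100, and n = 49); the variable names are illustrative. The answer key rounds the critical value down to 523, so the power is computed at that rounded value as well.

```python
from math import sqrt
from statistics import NormalDist

n = 49
se = 100 / sqrt(n)                      # standard deviation of the sample mean

under_null = NormalDist(mu=500, sigma=se)
under_alt = NormalDist(mu=530, sigma=se)

# (c) one-sided p-value of an average score of 520 under the null
p_value = 1 - under_null.cdf(520)

# (d) critical value for a one-sided test of size 5%
critical_value = under_null.inv_cdf(0.95)

# (e) power: probability of exceeding the critical value when H1 is true
power_exact = 1 - under_alt.cdf(critical_value)
power_rounded = 1 - under_alt.cdf(523)  # critical value rounded as in the answer

print(f"SE = {se:.3f}, p-value = {p_value:.3f}")
print(f"critical value = {critical_value:.1f}")
print(f"power = {power_exact:.3f} (exact), {power_rounded:.3f} (at 523)")
```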
4) Your packaging company fills various types of flour into bags. Recently there have been complaints from one
chain of stores: a customer returned one opened 5 pound bag which weighed significantly less than the label
indicated. You view the weight of the bag as a random variable which is normally distributed with a mean of 5
pounds, and, after studying the machine specifications, a standard deviation of 0.05 pounds.
(a) You take a sample of 20 bags and weigh them. Sketch below what the average pattern of individual weights
might look like. Let the horizontal axis indicate the sampled bag number (1, 2, ..., 20). On the vertical axis,
mark the expected value of the weight under the null hypothesis, and two ($\approx 1.96$) standard deviations above
and below the expected value. Draw a line through the graph for $E(Y) + 2\sigma_Y$, $E(Y)$, and $E(Y) - 2\sigma_Y$. How many
of the bags in a sample of 20 will you expect to weigh either less than 4.9 pounds or more than 5.1 pounds?
(b) You sample 25 bags of flour and calculate the average weight. What is the distribution of the average
weight of these 25 bags? Repeating the same exercise 20 times, sketch what the distribution of the average
weights would look like in a graph similar to the one you drew in (a), where you have adjusted the standard
error of $\bar{Y}$ accordingly.
(c) For each of the twenty observations in (b) a 95% confidence interval is constructed. Draw these confidence
intervals, using the same graph as in (b). How many of these 20 confidence intervals would you expect to
contain 5 pounds under the null hypothesis?
Answer: (a) On average, there should be one bag in every sample of 20 which weighs less than 4.9 pounds or
more than 5.1 pounds.
(b) The average weight of 25 bags will be normally distributed, with a mean of 5 pounds and a standard
deviation of 0.01 pounds. (Same graph as in (a), but with the following lower and upper bounds.)
5) Assume that two presidential candidates, call them Bush and Gore, receive 50% of the votes in the population.
You can model this situation as a Bernoulli trial, where Y is a random variable with success probability
$\Pr(Y = 1) = p$, and where Y = 1 if a person votes for Bush and Y = 0 otherwise. Furthermore, let $\hat{p}$ be the fraction of
successes (1s) in a sample, which is distributed $N\left(p, \frac{p(1-p)}{n}\right)$ in reasonably large samples, say for $n \ge 40$.
(a) Given your knowledge about the population, find the probability that in a random sample of 40, Bush
would receive a share of 40% or less.
(b) How would this situation change with a random sample of 100?
(c) Given your answers in (a) and (b), would you be comfortable to predict what the voting intentions for the
entire population are if you did not know p but had polled 10,000 individuals at random and calculated $\hat{p}$?
Explain.
(d) This result seems to hold whether you poll 10,000 people at random in the Netherlands or the United States,
where the former has a population of less than 20 million people, while the United States is 15 times as
populous. Why does the population size not come into play?
Answer: (a) $\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/40}}\right) = \Pr(Z < -1.26)$ …
Bush would receive a vote of less than 40%, although in truth, his share is 50%.
(b) $\Pr(\hat{p} < 0.40) = \Pr\left(Z < \frac{0.40 - 0.50}{\sqrt{0.25/100}}\right) = \Pr(Z < -2.00) \approx 0.023$. With this sample size, you would expect
this to happen only every 50th sample.
(c) The answers in (a) and (b) suggest that for even moderate increases in the sample size, the estimator
does not vary too much from the population mean. Polling 10,000 individuals, the probability of finding
a $\hat{p}$ of 0.48, for example, would be 0.00003. Unless the election was extremely close, which the 2000
election was, polls are quite accurate even for sample sizes of 2,500.
(d) The distribution of sample means shrinks very quickly depending on the sample size, not the
population size. Although at first this does not seem intuitive, the standard error of an estimator is a
value which indicates by how much the estimator varies around the population value. For large sample
sizes, the sample mean typically is very close to the population mean.
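The effect of the sample size on these probabilities can be made concrete with a short calculation. The sketch below uses the normal approximation given in the question; `prob_share_at_most` is an illustrative name, and note that the population size never enters the formula.

```python
from math import sqrt
from statistics import NormalDist

def prob_share_at_most(threshold, n, p=0.5):
    """Pr(p_hat <= threshold) when p_hat is approximately N(p, p(1-p)/n)."""
    se = sqrt(p * (1 - p) / n)
    return NormalDist(mu=p, sigma=se).cdf(threshold)

p_n40 = prob_share_at_most(0.40, 40)         # part (a)
p_n100 = prob_share_at_most(0.40, 100)       # part (b)
p_n10000 = prob_share_at_most(0.48, 10_000)  # part (c), share of 0.48 or less

print(f"n = 40:     {p_n40:.3f}")
print(f"n = 100:    {p_n100:.3f}")
print(f"n = 10,000: {p_n10000:.5f}")
```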
6) You have collected weekly earnings and age data from a sub-sample of 1,744 individuals using the Current
Population Survey in a given year.
(a) Given the overall mean of $434.49 and a standard deviation of $294.67, construct a 99% confidence interval
for average earnings in the entire population. State the meaning of this interval in words, rather than just in
numbers. If you constructed a 90% confidence interval instead, would it be smaller or larger? What is the
intuition?
(b) When dividing your sample into people 45 years and older, and younger than 45, the information shown in
the table is found.
Age Category   Average Earnings (Ȳ)   Standard Deviation (S_Y)   n
Age ≥ 45       $488.87                $328.64                    507
Age < 45       $412.20                $276.63                    1237
Test whether or not the difference in average earnings is statistically significant. Given your knowledge of
age-earning profiles, does this result make sense?
Answer: (a) The confidence interval for mean weekly earnings is
$434.49 \pm 2.58 \times \frac{294.67}{\sqrt{1744}} = 434.49 \pm 18.20 = (416.29, 452.69)$.
Based on the sample at hand, the best guess for the population mean is $434.49. However,
because of random sampling error, this guess is likely to be wrong. Instead, the interval estimate for the
average earnings lies between $416.29 and $452.69. Committing to such an interval repeatedly implies
that the resulting statement is incorrect 1 out of 100 times. For a 90% confidence interval, the only change
in the calculation of the confidence interval is to replace 2.58 by 1.64. Hence the confidence interval is
smaller. A smaller interval implies, given the same average earnings and the standard deviation, that the
statement will be false more often. The larger the confidence interval, the more likely it is to contain the
population value.
(b) Assuming unequal population variances, $t = \frac{488.87 - 412.20}{\sqrt{\frac{328.64^2}{507} + \frac{276.63^2}{1237}}} = 4.62$, which is statistically
significant at conventional levels whether you use a two-sided or one-sided alternative. Hence the null
hypothesis of equal average earnings in the two groups is rejected. Age-earning profiles typically take
on an inverted U-shape. Maximum earnings occur in the 40s, depending on some other factors such as
years of education, which are not considered here. Hence it is not clear if the alternative hypothesis
should be one-sided or two-sided. In such a situation, it is best to assume a two-sided alternative
hypothesis.
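Both calculations in this answer can be reproduced in a few lines. The following Python sketch uses the critical value 2.58 from the answer; `confidence_interval` and `unequal_variance_t` are illustrative helper names.

```python
from math import sqrt

def confidence_interval(mean, sd, n, z):
    """Large-sample confidence interval: mean +/- z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return mean - margin, mean + margin, margin

def unequal_variance_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic for a difference in means with unequal population variances."""
    se = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se

# (a) 99% confidence interval for mean weekly earnings
lower, upper, margin = confidence_interval(434.49, 294.67, 1744, z=2.58)

# (b) earnings of people 45 and older versus people younger than 45
t_stat = unequal_variance_t(488.87, 328.64, 507, 412.20, 276.63, 1237)

print(f"99% CI: ({lower:.2f}, {upper:.2f}), margin = {margin:.2f}")
print(f"t = {t_stat:.2f}")
```

Replacing `z=2.58` with `z=1.64` gives the narrower 90% interval discussed in part (a).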
7) A manufacturer claims that a certain brand of VCR player has an average life expectancy of 5 years and 6
months with a standard deviation of 1 year and 6 months. Assume that the life expectancy is normally
distributed.
(a) Selecting one VCR player from this brand at random, calculate the probability of its life expectancy
exceeding 7 years.
(b) The Critical Consumer magazine decides to test fifty VCRs of this brand. The average life in this sample is 6
years and the sample standard deviation is 2 years. Calculate a 99% confidence interval for the average life.
(c) How many more VCRs would the magazine have to test in order to halve the width of the confidence
interval?
Answer: (a) Pr (Y > 7) = Pr(Z > 1) = 0.1587.
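(b) $6 \pm 2.58 \times \frac{2}{\sqrt{50}} = 6 \pm 0.73 = (5.27, 6.73)$.
(c) $\frac{1}{2}\left(2.58 \times \frac{2}{\sqrt{50}}\right) = 2.58 \times \frac{2}{2\sqrt{50}} = 2.58 \times \frac{2}{\sqrt{4 \times 50}}$, or n = 200, so the magazine would have to test 150 more VCRs.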
8) U.S. News and World Report ranks colleges and universities annually. You randomly sample 100 of the national
universities and liberal arts colleges from the year 2000 issue. The average cost, which includes tuition, fees,
and room and board, is $23,571.49 with a standard deviation of $7,015.52.
(a) Based on this sample, construct a 95% confidence interval of the average cost of attending a
university/college in the United States.
(b) Cost varies by quite a bit. One of the reasons may be that some universities/colleges have a better reputation
than others. U.S. News and World Report tries to measure this factor by asking university presidents and chief
academic officers about the reputation of institutions. The ranking is from 1 ("marginal") to 5 ("distinguished").
You decide to split the sample according to whether the academic institution has a reputation of greater than
3.5 or not. For comparison, in 2000, Caltech had a reputation ranking of 4.7, Smith College had 4.5, and Auburn
University had 3.1. This gives you the statistics shown in the accompanying table.
Reputation Category   Average Cost   Standard deviation of Cost (S_Y)   n
Ranking > 3.5         $29,311.31     $5,649.21                          29
Ranking ≤ 3.5         $21,227.06     $6,133.38                          71
Test the hypothesis that the average cost for all universities/colleges is the same independent of the reputation.
What alternative hypothesis did you use?
(c) What other factors should you consider before making a decision based on the data in (b)?
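Answer: (a) $23{,}571.49 \pm 1.96 \times \frac{7{,}015.52}{\sqrt{100}} = 23{,}571.49 \pm 1{,}375.04 = (22{,}196.45, 24{,}946.53)$.
(b) Assuming unequal population variances, $t = \frac{29{,}311.31 - 21{,}227.06}{\sqrt{\frac{5{,}649.21^2}{29} + \frac{6{,}133.38^2}{71}}} = 6.33$, which is statistically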
significant whether or not you use a one-sided or two-sided hypothesis test. Your prior expectation is
that academic institutions with a higher reputation will charge more for attending, and hence a
one-sided alternative would have been appropriate here.
(c) There may be other variables which potentially have an effect on the cost of attending the academic
institution. Some of these factors might be whether or not the college/university is private or public, its
size, whether or not it has a religious affiliation, etc. It is only after controlling for these factors that the
pure relationship between reputation and cost can be identified.
9) The development office and the registrar have provided you with anonymous matches of starting salaries and
GPAs for 108 graduating economics majors. Your sample contains a variety of jobs, from church pastor to
stockbroker.
(a) The average starting salary for the 108 students was $38,644.86 with a standard deviation of $7,541.40.
Construct a 95% confidence interval for the starting salary of all economics majors at your university/college.
(b) A similar sample for psychology majors indicates a significantly lower starting salary. Given that these
students had the same number of years of education, does this indicate discrimination in the job market against
psychology majors?
(c) You wonder if it pays (no pun intended) to get good grades by calculating the average salary for economics
majors who graduated with a cumulative GPA of B+ or better, and those who had a B or worse. The data is as
shown in the accompanying table.
Cumulative GPA   Average Earnings   Standard deviation (S_Y)   n
B+ or better     $39,915.25         $8,330.21                  59
B or worse       $37,083.33         $6,174.86                  49
Conduct a t-test for the hypothesis that the two starting salaries are the same in the population. Given that this
data was collected in 1999, do you think that your results will hold for other years, such as 2002?
Answer: (a) $38{,}644.86 \pm 1.96 \times \frac{7{,}541.40}{\sqrt{108}} = 38{,}644.86 \pm 1{,}422.32 = (37{,}222.54, 40{,}067.18)$.
(b) It suggests that the market values certain qualifications more highly than others. Comparing means
and identifying that one is significantly lower than others does not indicate discrimination.
(c) Assuming unequal population variances, $t = \frac{39{,}915.25 - 37{,}083.33}{\sqrt{\frac{8{,}330.21^2}{59} + \frac{6{,}174.86^2}{49}}} = 2.03$. The critical value for a
one-sided test is 1.64, for a two-sided test 1.96, both at the 5% level. Hence you can reject the null
hypothesis that the two starting salaries are equal. Presumably you would have chosen as an alternative
that better students receive better starting salaries, so that this becomes your new working hypothesis.
1999 was a boom year. If better students receive better starting offers during a boom year, when the
labor market for graduates is tight, then it is very likely that they receive a better offer during a recession
year, assuming that they receive an offer at all.
10) During the last few days before a presidential election, there is a frenzy of voting intention surveys. On a given
day, quite often there are conflicting results from three major polls.
(a) Think of each of these polls as reporting the fraction of successes (1s) of a Bernoulli random variable Y,
where the probability of success is $\Pr(Y = 1) = p$. Let $\hat{p}$ be the fraction of successes in the sample and assume that
this estimator is normally distributed with a mean of p and a variance of $\frac{p(1-p)}{n}$. Why are the results for all
polls different, even though they are taken on the same day?
(b) … $\hat{p}(1 - \hat{p})$ … of $\hat{p}$ is the standard deviation the largest? What value does it take in the case of a maximum $\hat{p}$?
(c) When the results from the polls are reported, you are told, typically in the small print, that the margin of
error is plus or minus two percentage points. Using the approximation of $1.96 \approx 2$, and assuming,
conservatively, the maximum standard deviation derived in (b), what sample size is required to add and
subtract (margin of error) two percentage points from the point estimate?
(d) What sample size would you need to halve the margin of error?
Answer: (a) Since all polls are only samples, there is random sampling error. As a result, $\hat{p}$ will differ from sample
to sample, and most likely also from p.
(b) $\hat{p} \pm 1.96 \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$. A bit of thought or calculus will show that the standard deviation will be largest
for $\hat{p} = 0.5$, in which case it becomes $\frac{0.5}{\sqrt{n}}$.
(c) n = 2,500.
(d) n = 10,000.
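The sample sizes in (c) and (d) follow from setting the margin of error equal to $2 \times \frac{0.5}{\sqrt{n}}$ and solving for n. A minimal Python sketch, assuming the approximation of 2 for 1.96 and the maximum standard deviation of 0.5 (the function name `required_sample_size` is illustrative):

```python
def required_sample_size(margin_of_error, z=2.0, max_sd=0.5):
    """Solve z * max_sd / sqrt(n) = margin_of_error for n."""
    return (z * max_sd / margin_of_error) ** 2

n_two_points = required_sample_size(0.02)   # plus or minus 2 percentage points
n_one_point = required_sample_size(0.01)    # margin of error halved

print(round(n_two_points), round(n_one_point))
```

Halving the margin of error quadruples the required sample size, because the standard error shrinks only with the square root of n.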
11) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson ) website go to Student Resources
and select the option Datasets for Replicating Empirical Results. Then select the CPS Data Used in Chapter 8
(ch8_cps.xls) and open it in Excel. This is a rather large data set to work with, so just copy the first 500
observations into a new Worksheet (these are rows 1 to 501).
In the newly created Worksheet, mark A1 to A501, then select the Data tab and click on sort. A dialog box
will open. First select Add level from one of the options on the left. Then select sort by and choose
Northeast and Largest to Smallest. Repeat the same for the South as a second option. Finally press ok.
This should give you 209 observations for average hourly earnings for the Northeast region, followed by 205
observations for the South.
a. For each of the 209 average hourly earnings observations for the Northeast region and separately for
the South region, calculate the mean and sample standard deviation.
b. Use the appropriate test to determine whether or not average hourly earnings in the Northeast region
are the same as in the South region.
c. Find the confidence intervals for the difference between the two population means that correspond to
the 1%, 5%, and 10% significance levels. Is your conclusion consistent with the test in part (b)?
d. In all three cases of using the confidence interval in (c), the power of the test is quite low (5%). What
can you do to increase the power of the test without reducing the size of the test?
b. t = …
regions at the 1% level, but you are able to reject it at the 10% and 5% significance level.
c.
For the 10% significance level, the confidence interval is ($0.46,$4.18). For the 5% significance
level, the interval becomes larger and is ($0.10,$4.54). In either one of the cases you can reject
the null hypothesis, since $0 is not contained in the confidence interval. It is only for the 1%
significance level that the null hypothesis cannot be rejected. In that case, the confidence
interval is (-$0.60, $5.24).
d. You would have to increase the sample size, since that would shrink the standard error (assuming
that the sample mean and variance will not change).
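$\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i Y_i - \bar{X}\sum_{i=1}^{n} Y_i - \bar{Y}\sum_{i=1}^{n} X_i + n\bar{X}\bar{Y}\right) = \frac{1}{n-1}\sum_{i=1}^{n} X_i Y_i - \frac{n}{n-1}\bar{X}\bar{Y}.$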
2) For each of the accompanying scatterplots for several pairs of variables, indicate whether you expect a positive
or negative correlation coefficient between the two variables, and the likely magnitude of it (you can use a
small range).
(a)
(b)
(c)
(d)
Answer: (a)
(b)
(c)
(d)
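$r = \frac{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\,\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2}}$
$= \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}}{\sqrt{\sum_{i=1}^{n} (Y_i^2 - 2\bar{Y}Y_i + \bar{Y}^2)}\,\sqrt{\sum_{i=1}^{n} (X_i^2 - 2\bar{X}X_i + \bar{X}^2)}}$
$= \frac{\sum_{i=1}^{n} Y_i X_i - n\bar{Y}\bar{X}}{\sqrt{\sum_{i=1}^{n} Y_i^2 - n\bar{Y}^2}\,\sqrt{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}}$
$= \frac{n\sum_{i=1}^{n} Y_i X_i - \left(\sum_{i=1}^{n} Y_i\right)\left(\sum_{i=1}^{n} X_i\right)}{\sqrt{n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2}\,\sqrt{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}}.$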
4) IQs of individuals are normally distributed with a mean of 100 and a standard deviation of 16. If you sampled
students at your college and assumed, as the null hypothesis, that they had the same IQ as the population, then
in a random sample of size
(a) n = 25, find \(\Pr(\bar{Y} < 105)\).
(b) n = 100, find \(\Pr(\bar{Y} > 97)\).
(c) n = 144, find \(\Pr(101 < \bar{Y} < 103)\).
Answer: (a) 0.94
(b) 0.97
(c) 0.21
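The three probabilities can be checked numerically. The following is a minimal sketch, assuming the sample mean is normally distributed with mean 100 and standard deviation 16/sqrt(n); the helper name `normal_cdf` is introduced here for illustration only.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU, SIGMA = 100.0, 16.0  # population mean and standard deviation from the question

def standard_error(n):
    """Standard deviation of the sample mean for a sample of size n."""
    return SIGMA / math.sqrt(n)

# (a) Pr(Ybar < 105) with n = 25
prob_a = normal_cdf((105 - MU) / standard_error(25))
# (b) Pr(Ybar > 97) with n = 100
prob_b = 1.0 - normal_cdf((97 - MU) / standard_error(100))
# (c) Pr(101 < Ybar < 103) with n = 144
prob_c = normal_cdf((103 - MU) / standard_error(144)) - normal_cdf((101 - MU) / standard_error(144))

print(round(prob_a, 2), round(prob_b, 2), round(prob_c, 2))
```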
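\[
\tilde{Y} = \frac{1}{n}\left(\frac{1}{4}Y_1 + \frac{7}{4}Y_2 + \frac{1}{4}Y_3 + \frac{7}{4}Y_4 + \dots + \frac{1}{4}Y_{n-1} + \frac{7}{4}Y_n\right)
\]
Prove that \(\tilde{Y}\) is unbiased and consistent, but not efficient when compared to \(\bar{Y}\).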
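Answer:
\[
E(\tilde{Y}) = \frac{1}{n}\left(\frac{1}{4}E(Y_1) + \frac{7}{4}E(Y_2) + \frac{1}{4}E(Y_3) + \frac{7}{4}E(Y_4) + \dots + \frac{1}{4}E(Y_{n-1}) + \frac{7}{4}E(Y_n)\right)
= \frac{1}{n}\left(\frac{n}{2}\cdot\frac{1}{4} + \frac{n}{2}\cdot\frac{7}{4}\right)\mu_Y = \frac{1}{n}\,n\,\mu_Y = \mu_Y.
\]
Hence \(\tilde{Y}\) is unbiased.
\[
\mathrm{var}(\tilde{Y}) = E(\tilde{Y} - \mu_Y)^2
= E\left[\frac{1}{n}\left(\frac{1}{4}Y_1 + \frac{7}{4}Y_2 + \dots + \frac{1}{4}Y_{n-1} + \frac{7}{4}Y_n\right) - \mu_Y\right]^2
\]
\[
= \frac{1}{n^2}E\left[\frac{1}{4}(Y_1 - \mu_Y) + \frac{7}{4}(Y_2 - \mu_Y) + \dots + \frac{1}{4}(Y_{n-1} - \mu_Y) + \frac{7}{4}(Y_n - \mu_Y)\right]^2
\]
\[
= \frac{1}{n^2}\left[\frac{1}{16}E(Y_1 - \mu_Y)^2 + \frac{49}{16}E(Y_2 - \mu_Y)^2 + \dots + \frac{1}{16}E(Y_{n-1} - \mu_Y)^2 + \frac{49}{16}E(Y_n - \mu_Y)^2\right]
\]
\[
= \frac{1}{n^2}\left[\frac{1}{16}\sigma_Y^2 + \frac{49}{16}\sigma_Y^2 + \dots + \frac{1}{16}\sigma_Y^2 + \frac{49}{16}\sigma_Y^2\right]
= \frac{\sigma_Y^2}{n^2}\left[\frac{n}{2}\left(\frac{1}{16} + \frac{49}{16}\right)\right] = 1.5625\,\frac{\sigma_Y^2}{n}.
\]
Since \(\mathrm{var}(\tilde{Y}) \to 0\) as \(n \to \infty\), \(\tilde{Y}\) is consistent. \(\tilde{Y}\) has a larger variance than \(\bar{Y}\) and is therefore not as
efficient.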
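The two key factors in this answer (a mean factor of 1 and a variance factor of 1.5625/n) can be verified with exact arithmetic. This is a small sketch under the assumption that n is even, so that the weights 1/4 and 7/4 alternate in full pairs; the function names are hypothetical.

```python
from fractions import Fraction

def weights(n):
    """Alternating weights 1/4, 7/4, ..., assuming an even sample size n."""
    if n % 2 != 0:
        raise ValueError("the alternating weighting scheme needs an even n")
    return [Fraction(1, 4), Fraction(7, 4)] * (n // 2)

def mean_factor(n):
    """E(Y~) equals this factor times mu_Y; a value of 1 means unbiased."""
    return sum(weights(n)) / n

def variance_factor(n):
    """var(Y~) equals this factor times sigma_Y^2 for i.i.d. draws."""
    return sum(w * w for w in weights(n)) / n ** 2

for n in (2, 10, 100):
    # n * variance_factor(n) should be the same constant for every even n
    print(n, mean_factor(n), float(n * variance_factor(n)))
```

The sample mean has variance factor 1/n, so the ratio of the two variances is 1.5625 regardless of n, which is why the weighted estimator is consistent but less efficient.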
6) Imagine that you had sampled 1,000,000 females and 1,000,000 males to test whether or not females have a
higher IQ than males. IQs are normally distributed with a mean of 100 and a standard deviation of 16. You are
excited to find that females have an average IQ of 101 in your sample, while males have an IQ of 99. Does this
difference seem important? Do you really need to carry out a t-test for differences in means to determine
whether or not this difference is statistically significant? What does this result tell you about testing hypotheses
when sample sizes are very large?
Answer: The difference seems very small, both in terms of absolute values and, more importantly, in terms of
standard deviations. With a sample size as large as n=1,000,000, the standard error becomes extremely
small. This implies that the distribution of means, or differences in means, has almost turned into a
spike. In essence, you are (very close to) observing the population. It is therefore unnecessary to test
whether or not the difference is statistically significant. After all, if in the population, the male IQ were
99.99 and the female IQ were 100.01, they would be different. In general, when sample sizes become very
large, it is very easy to reject null hypotheses about population means, which involve sample means as
an estimator, even if hypothesized differences are very small. This is the result of the distribution of
sample means collapsing fairly rapidly as sample sizes increase.
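To see how quickly the standard error collapses, the t-statistic for this example can be worked out directly. The sketch below assumes, for illustration, that both sample standard deviations equal the population value of 16.

```python
import math

N_PER_GROUP = 1_000_000   # observations per group, from the question
SIGMA = 16.0              # assumed standard deviation in both groups
difference = 101.0 - 99.0 # female minus male sample mean

# Standard error of the difference in means for two independent samples
se_difference = math.sqrt(SIGMA ** 2 / N_PER_GROUP + SIGMA ** 2 / N_PER_GROUP)
t_statistic = difference / se_difference

print(round(se_difference, 4), round(t_statistic, 1))
```

A two-point gap, only one eighth of a standard deviation, therefore produces a t-statistic far beyond any conventional critical value.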
7) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y1 ,..., Yn be i.i.d. draws
from this distribution. Let \(\hat{p}\) be the fraction of successes (1s) in this sample. In large samples, the distribution of
\(\hat{p}\) will be approximately normal, i.e., \(\hat{p}\) is approximately distributed \(N\left(p, \frac{p(1-p)}{n}\right)\). Now let X be the number of
successes and n the sample size. In a sample of 10 voters (n = 10), if there are six who vote for candidate A, then X
= 6. Relate X, the number of successes, to \(\hat{p}\), the success proportion, or fraction of successes. Next, using your
knowledge of linear transformations, derive the distribution of X.
Answer: \(X = n\hat{p}\). Since \(\hat{p}\) is approximately distributed \(N\left(p, \frac{p(1-p)}{n}\right)\), then, given that X is a linear transformation of \(\hat{p}\), X is
approximately distributed \(N(np,\, np(1-p))\).
                    Truth (Population)
                    H0 is true      H1 is true
                    I               II
indicates a correct decision, and I and II indicate that an error has been made. In probability terms, state
the mistakes that have been made in situation I and II, and relate these to the Size of the test and the Power of
the test (or transformations of these).
Answer: I: Pr(reject H0 | H0 is correct) = Size of the test.
II: Pr(reject H1 | H1 is correct) = (1 - Power of the test).
9) Assume that under the null hypothesis, Y has an expected value of 500 and a standard deviation of 20. Under
the alternative hypothesis, the expected value is 550. Sketch the probability density function for the null and the
alternative hypothesis in the same figure. Pick a critical value such that the p-value is approximately 5%. Mark
the areas, which show the size and the power of the test. What happens to the power of the test if the
alternative hypothesis moves closer to the null hypothesis, i.e., \(\mu_Y\) = 540, 530, 520, etc.?
Answer: For a given size of the test, the power of the test is lower.
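The loss of power can be made concrete with a short calculation. This is a sketch under stated assumptions: Y is normally distributed with standard deviation 20 under both hypotheses, the test is one-sided, and the critical value uses the 5% normal quantile of 1.645.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

MU_NULL = 500.0   # expected value under the null hypothesis
SIGMA = 20.0      # standard deviation, assumed equal under both hypotheses
Z_5_PERCENT = 1.645

# Reject the null when the observed value exceeds this threshold (size of about 5%)
critical_value = MU_NULL + Z_5_PERCENT * SIGMA

def power(mu_alternative):
    """Probability of rejecting the null when the true mean is mu_alternative."""
    return 1.0 - normal_cdf((critical_value - mu_alternative) / SIGMA)

powers = {mu: power(mu) for mu in (550, 540, 530, 520)}
for mu, value in powers.items():
    print(mu, round(value, 2))
```

With the size held fixed, the computed power falls steadily as the alternative moves toward 500.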
10) The net weight of a bag of flour is guaranteed to be 5 pounds with a standard deviation of 0.05 pounds. You are
concerned that the actual weight is less. To test for this, you sample 25 bags. Carefully state the null and
alternative hypothesis in this situation. Determine a critical value such that the size of the test does not exceed
5%. Finding the average weight of the 25 bags to be 4.7 pounds, can you reject the null hypothesis? What is the
power of the test here? Why is it so low?
Answer: Let Y be the net weight of the bag of flour. Then H0 : E(Y) = 5 and H1 : E(Y) < 5. Under the null
hypothesis, the sample mean \(\bar{Y}\) is distributed normally, with a mean of 5 pounds and a standard deviation of
\(0.05/\sqrt{25} = 0.01\) pounds.
The critical value is approximately 4.98 pounds. Since 4.7 pounds falls in the rejection region, the null
hypothesis is rejected. The power of the test is low here, since there is no simple alternative. In the
extreme case, where the alternative hypothesis would place the net weight marginally below five
pounds, the power of the test would approximately equal its size, or 5% in this case.
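The critical value in this answer follows from the standard error of the sample mean. A minimal sketch, assuming a one-sided test with the 5% normal quantile of 1.645:

```python
import math

SIGMA = 0.05       # guaranteed standard deviation of one bag, in pounds
N_BAGS = 25
MU_NULL = 5.0
Z_5_PERCENT = 1.645

standard_error = SIGMA / math.sqrt(N_BAGS)              # standard deviation of the sample mean
critical_value = MU_NULL - Z_5_PERCENT * standard_error  # reject H0 below this average weight

observed_mean = 4.7
reject_null = observed_mean < critical_value

print(round(standard_error, 2), round(critical_value, 2), reject_null)
```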
11) Some policy advisors have argued that education should be subsidized in developing countries to reduce
fertility rates. To investigate whether or not education and fertility are correlated, you collect data on
population growth rates (Y) and education (X) for 86 countries. Given the sums below, compute the sample
correlation:
\[
\sum_{i=1}^{n} Y_i = 1.594;\quad \sum_{i=1}^{n} X_i = 449.6;\quad \sum_{i=1}^{n} Y_iX_i = 6.4697;\quad \sum_{i=1}^{n} Y_i^2 = 0.03982;\quad \sum_{i=1}^{n} X_i^2 = 3{,}022.76
\]
Answer: r = -0.716.
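The correlation can be computed directly from the five sums using the computational formula for r. This is only a sketch: with the rounded sums exactly as printed, the result is negative and comes out at roughly -0.71, so small differences in the last digits relative to a published answer are to be expected.

```python
import math

n = 86                      # number of countries
sum_y, sum_x = 1.594, 449.6
sum_yx = 6.4697
sum_y2, sum_x2 = 0.03982, 3022.76

# r = [n*Sum(YX) - Sum(Y)*Sum(X)] / sqrt{[n*Sum(Y^2) - Sum(Y)^2] * [n*Sum(X^2) - Sum(X)^2]}
numerator = n * sum_yx - sum_y * sum_x
denominator = math.sqrt(n * sum_y2 - sum_y ** 2) * math.sqrt(n * sum_x2 - sum_x ** 2)
r = numerator / denominator

print(round(r, 3))
```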
12) (Advanced) Unbiasedness and small variance are desirable properties of estimators. However, you can imagine
situations where a trade-off exists between the two: one estimator may have a small bias but a much smaller
variance than another, unbiased estimator. The concept of the mean square error of an estimator combines the two
concepts. Let \(\hat{\mu}\) be an estimator of \(\mu\). Then the mean square error (MSE) is defined as follows:
\(\mathrm{MSE}(\hat{\mu}) = E(\hat{\mu} - \mu)^2\). Prove that \(\mathrm{MSE}(\hat{\mu}) = \mathrm{bias}^2 + \mathrm{var}(\hat{\mu})\). (Hint: subtract and add in \(E(\hat{\mu})\) in \(E(\hat{\mu} - \mu)^2\).)
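Following the hint, the decomposition can be sketched as below. The symbols \(\hat{\mu}\) and \(\mu\) are assumed notation for the estimator and the parameter, and bias is taken to mean \(E(\hat{\mu}) - \mu\).

```latex
\begin{align*}
\mathrm{MSE}(\hat{\mu})
  &= E\big[(\hat{\mu} - \mu)^2\big] \\
  &= E\big[(\hat{\mu} - E(\hat{\mu}) + E(\hat{\mu}) - \mu)^2\big] \\
  &= E\big[(\hat{\mu} - E(\hat{\mu}))^2\big]
     + 2\,\big(E(\hat{\mu}) - \mu\big)\,E\big[\hat{\mu} - E(\hat{\mu})\big]
     + \big(E(\hat{\mu}) - \mu\big)^2 \\
  &= \mathrm{var}(\hat{\mu}) + 0 + \mathrm{bias}^2 .
\end{align*}
```

The cross term vanishes because \(E(\hat{\mu}) - \mu\) is a constant and \(E[\hat{\mu} - E(\hat{\mu})] = 0\).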
13) Your textbook states that when you test for differences in means and you assume that the two population
variances are equal, then an estimator of the population variance is the following pooled estimator:
\[
s^2_{pooled} = \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m}(Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w}(Y_i - \bar{Y}_w)^2\right]
\]
Explain why this pooled estimator can be looked at as the weighted average of the two variances.
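Answer:
\[
s^2_{pooled} = \frac{1}{n_m + n_w - 2}\left[\sum_{i=1}^{n_m}(Y_i - \bar{Y}_m)^2 + \sum_{i=1}^{n_w}(Y_i - \bar{Y}_w)^2\right]
= \frac{1}{n_m + n_w - 2}\left[(n_m - 1)s_m^2 + (n_w - 1)s_w^2\right]
= \frac{n_m - 1}{n_m + n_w - 2}\,s_m^2 + \frac{n_w - 1}{n_m + n_w - 2}\,s_w^2 .
\]
The two weights are non-negative and sum to one, so the pooled estimator is a weighted average of the two sample variances.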
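The equivalence of the two expressions can be checked on a small example. The data below are hypothetical values invented purely for illustration, and the function names are not from the textbook.

```python
import math
import statistics

# Hypothetical samples, invented for illustration only
men = [12.0, 15.5, 9.0, 20.0, 14.5]
women = [11.0, 13.0, 16.5, 10.5]

def pooled_variance(a, b):
    """Pooled estimator: summed squared deviations over (n_a + n_b - 2)."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    sum_squares = sum((y - mean_a) ** 2 for y in a) + sum((y - mean_b) ** 2 for y in b)
    return sum_squares / (len(a) + len(b) - 2)

def weighted_average_of_variances(a, b):
    """Weighted average of the two sample variances with degrees-of-freedom weights."""
    dof = len(a) + len(b) - 2
    weight_a = (len(a) - 1) / dof
    weight_b = (len(b) - 1) / dof
    # statistics.variance divides by n - 1, i.e. it is the sample variance
    return weight_a * statistics.variance(a) + weight_b * statistics.variance(b)

print(pooled_variance(men, women), weighted_average_of_variances(men, women))
```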
14) Your textbook suggests using the first observation from a sample of n as an estimator of the population mean.
It is shown that this estimator is unbiased but has a variance of \(\sigma_Y^2\), which makes it less efficient than the
sample mean. Explain why this estimator is not consistent. You develop another estimator, which is the simple
average of the first and last observation in your sample. Show that this estimator is also unbiased and show
that it is more efficient than the estimator which only uses the first observation. Is this estimator consistent?
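Answer: The estimator is not consistent because its variance does not vanish as n goes to infinity, i.e., \(\mathrm{var}(Y_1) \to 0\)
as \(n \to \infty\) does not hold.
\(\tilde{Y} = \frac{1}{2}(Y_1 + Y_n)\). \(E(\tilde{Y}) = \frac{1}{2}(E(Y_1) + E(Y_n)) = \frac{1}{2}(\mu_Y + \mu_Y) = \mu_Y\). Hence \(\tilde{Y}\) is unbiased.
\[
\mathrm{var}(\tilde{Y}) = E(\tilde{Y} - \mu_Y)^2 = E\left[\tfrac{1}{2}(Y_1 + Y_n) - \mu_Y\right]^2
= E\left[\tfrac{1}{2}(Y_1 - \mu_Y) + \tfrac{1}{2}(Y_n - \mu_Y)\right]^2
= \tfrac{1}{4}\left[E(Y_1 - \mu_Y)^2 + E(Y_n - \mu_Y)^2\right]
= \tfrac{1}{4}\left[\sigma_Y^2 + \sigma_Y^2\right] = \frac{\sigma_Y^2}{2}.
\]
Since \(\mathrm{var}(\tilde{Y}) \to 0\) as \(n \to \infty\) does not hold, \(\tilde{Y}\) is not consistent.
\(\mathrm{var}(\tilde{Y}) < \mathrm{var}(Y_1)\), and \(\tilde{Y}\) is therefore more efficient than the estimator which only uses the first observation.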
15) Let p be the success probability of a Bernoulli random variable Y, i.e., p = Pr(Y = 1). It can be shown that \(\hat{p}\), the
fraction of successes in a sample, is asymptotically distributed \(N\left(p, \frac{p(1-p)}{n}\right)\). Using the estimator of the variance
of \(\hat{p}\), \(\frac{\hat{p}(1-\hat{p})}{n}\), construct a 95% confidence interval for p. Show that the margin for sampling error simplifies to
\(1/\sqrt{n}\) if you used 2 instead of 1.96 assuming, conservatively, that the standard error is at its maximum.
Construct a table indicating the sample size needed to generate a margin of sampling error of 1%, 2%, 5% and
10%. What do you notice about the increase in sample size needed to halve the margin of error? (The margin of
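sampling error is \(1.96 \times SE(\hat{p})\).)
Answer: The 95% confidence interval for p is \(\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\). The term \(\hat{p}(1-\hat{p})\) is at a maximum for \(\hat{p} = 0.5\), in which
case the confidence interval, using 2 instead of 1.96, reduces to \(\hat{p} \pm 2\sqrt{\frac{0.25}{n}} = \hat{p} \pm \frac{1}{\sqrt{n}}\), and the margin of sampling error is \(\frac{1}{\sqrt{n}}\).

Margin of sampling error (1/sqrt(n))     n
0.01                                     10,000
0.02                                     2,500
0.05                                     400
0.10                                     100

To halve the margin of error, the sample size has to increase fourfold.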
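The table follows from inverting the conservative margin 1/sqrt(n), which gives n = 1/margin^2. A minimal sketch, working in percentage points to keep the arithmetic exact; the function name is hypothetical.

```python
def required_sample_size(margin_percent):
    """Sample size n such that 1/sqrt(n) equals the margin, given in percent."""
    return int((100 / margin_percent) ** 2)

margins_percent = [1, 2, 5, 10]
table = [(m / 100, required_sample_size(m)) for m in margins_percent]

for margin, n in table:
    print(f"{margin:.2f}  {n:,}")
```

Because n depends on the inverse square of the margin, cutting the margin in half multiplies the required sample size by four.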
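16) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p, and let Y1 ,..., Yn be i.i.d. draws
from this distribution. Let \(\hat{p}\) be the fraction of successes (1s) in this sample. Given the following statement
Pr(-1.96 < z < 1.96) = 0.95
and the fact that \(\hat{p}\) is approximately distributed \(N\left(p, \frac{p(1-p)}{n}\right)\), derive the 95% confidence interval for p by
rearranging the inequalities.
Answer: \(\Pr\left(-1.96 < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < 1.96\right) = 0.95\). Multiplying through by the standard deviation results in
\(\Pr\left(-1.96\sqrt{\frac{p(1-p)}{n}} < \hat{p} - p < 1.96\sqrt{\frac{p(1-p)}{n}}\right) = 0.95\). Subtraction of \(\hat{p}\) then yields, after multiplying both sides by
(-1), \(\Pr\left(\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}\right) = 0.95\). The 95% confidence interval for p then is
\(\hat{p} \pm 1.96\sqrt{\frac{p(1-p)}{n}}\).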
17) Your textbook mentions that dividing the sample variance by n - 1 instead of n is called a degrees of freedom
correction. The meaning of the term stems from the fact that one degree of freedom is used up when the mean
is estimated. Hence degrees of freedom can be viewed as the number of independent observations remaining
after estimating the sample mean.
Consider an example where initially you have 20 independent observations on the height of students. After
calculating the average height, your instructor claims that you can figure out the height of the 20th student if
she provides you with the height of the other 19 students and the sample mean. Hence you have lost one
degree of freedom, or there are only 19 independent bits of information. Explain how you can find the height of
the 20th student.
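Answer: Since \(\bar{Y} = \frac{1}{20}\sum_{i=1}^{20} Y_i\), \(20\bar{Y} = \sum_{i=1}^{20} Y_i = Y_{20} + \sum_{i=1}^{19} Y_i\), so that \(Y_{20} = 20\bar{Y} - \sum_{i=1}^{19} Y_i\). Hence knowledge of the sample mean and the
height of the other 19 students is sufficient for finding the height of the 20th student.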
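A quick numerical illustration of the lost degree of freedom, using hypothetical heights in inches that were made up for this sketch:

```python
# Hypothetical heights (inches) of 20 students, invented for illustration
heights = [64, 70, 68, 72, 66, 69, 71, 65, 67, 73,
           70, 68, 66, 74, 69, 71, 67, 65, 72, 70]

sample_mean = sum(heights) / len(heights)
first_nineteen = heights[:19]

# Y_20 = 20 * Ybar - (sum of the other 19 heights)
recovered_height = len(heights) * sample_mean - sum(first_nineteen)

print(recovered_height, heights[19])
```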
18) The accompanying table lists the height (STUDHGHT) in inches and weight (WEIGHT) in pounds of five
college students. Calculate the correlation coefficient.
STUDHGHT    WEIGHT
74          165
73          165
72          145
68          155
66          140
Answer: r = 0.72.
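The correlation can be reproduced in a few lines. This sketch assumes the heights and weights pair up in the order in which they are listed.

```python
import math

# Values from the table; pairing by listed order is an assumption
height = [74, 73, 72, 68, 66]        # inches
weight = [165, 165, 145, 155, 140]   # pounds

def correlation(x, y):
    """Sample correlation coefficient; the 1/(n-1) factors cancel."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)

r = correlation(height, weight)
print(round(r, 2))
```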
19) (Requires calculus.) Let Y be a Bernoulli random variable with success probability Pr(Y = 1) = p. It can be
shown that the variance of the estimated success probability \(\hat{p}\) is \(\frac{p(1-p)}{n}\). Use calculus to show that this variance is
maximized for p = 0.5.
Answer: \(\frac{\partial}{\partial p}\left[\frac{p(1-p)}{n}\right] = \frac{1-p}{n} - \frac{p}{n} = 0\). Hence 1 - 2p = 0 or \(p = \frac{1}{2}\).
20) Consider two estimators: one which is biased and has a smaller variance, the other which is unbiased and has a
larger variance. Sketch the sampling distributions and the location of the population parameter for this
situation. Discuss conditions under which you may prefer to use the first estimator over the second one.
Answer: The bias indicates how far away, on average, the estimator is from the population value. Although this
average is zero for an unbiased estimator, there may be quite some variation around the population
mean. In a single draw, there is therefore a high probability of being some distance away from the
population mean. On the other hand, if the variance is very small and the estimator is biased by a small
amount, then the probability of being closer to the population value may be higher. (The biased
estimator may have a smaller mean square error than the unbiased estimator.)
Answer:
Without the trendline added, there does not seem to be much of a linear relationship between average
hourly earnings and years of education. Perhaps a linear relationship is not plausible since it would
imply that the returns to education would become smaller as further years of education are added.
However, and regardless of the linearity issues, there is a positive relationship in the data between the
two variables, which becomes visible when the trend line is added. The correlation coefficient is positive
and has a value of 46.9%, which is reasonably high (the correlation between height and weight for
college students is approximately 50% by comparison).
22) IQ scores are normally distributed with an average of 100 and a standard deviation of 16. Some research
suggests that left-handed individuals have a higher IQ score than right-handed individuals. To test this
hypothesis, a researcher randomly selects 132 individuals and finds that their average IQ is 103.2 with a sample
standard deviation of 14.6. Using the results from the sample, can you reject the null hypothesis that
left-handed people have an IQ of 100 vs. the alternative that they have a higher IQ? What critical value should
you choose if the size of the test is 5%?
Answer: The hypothesis is \(H_0: \mu = 100\) vs. \(H_1: \mu > 100\). The t-statistic is \(t = \frac{103.2 - 100}{14.6/\sqrt{132}} = 2.52\).
Since the critical value for the one-sided alternative is 1.645 at the 5% significance level, the researcher
should reject the null hypothesis that left-handed individuals have an IQ of 100.
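The test statistic can be recomputed as follows; a minimal sketch using the sample standard deviation from the question and the one-sided 5% normal critical value.

```python
import math

sample_mean = 103.2
mu_null = 100.0
sample_sd = 14.6
n = 132
CRITICAL_VALUE = 1.645  # one-sided test, 5% size

standard_error = sample_sd / math.sqrt(n)
t_statistic = (sample_mean - mu_null) / standard_error
reject_null = t_statistic > CRITICAL_VALUE

print(round(t_statistic, 2), reject_null)
```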
Answer:
The scatterplot suggests that, on average, schools which perform highly on the reading score will also
perform highly on the mathematics score. The sample correlation between the two series is 92.3%,
suggesting a high positive correlation between the two variables.
24) In 2007, a study of close to 250,000 18-19 year-old Norwegian males found that first-borns have an IQ that is
2.3 points higher than those who are second-born. To see if you can find similar evidence at your university,
you collect data from 250 students, of which 140 are first-borns. After subjecting each of these individuals to an
IQ test, you find that the first-borns score 108.3 with a standard deviation of 13.2, while the second-borns
achieve 107.1 with a standard deviation of 11.6. You hypothesize that first-borns and second-borns in a
university population have identical IQs against the one-sided alternative hypothesis that first-borns have
higher IQs. Using a size of the test of 5%, what is your conclusion?
Answer: Given that your null hypothesis states \(H_0: \mu_{first} = \mu_{second}\), your test statistic is
\(t = \frac{108.3 - 107.1}{\sqrt{\frac{13.2^2}{140} + \frac{11.6^2}{110}}} = 0.76\). Since the critical value for the one-sided alternative test is 1.64, you cannot reject the null
hypothesis.
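For reference, the difference-in-means statistic can be computed like this; a short sketch with the numbers from the question, where the 110 second-borns are the 250 students minus the 140 first-borns.

```python
import math

mean_first, sd_first, n_first = 108.3, 13.2, 140
mean_second, sd_second, n_second = 107.1, 11.6, 250 - 140

# Standard error of the difference, allowing unequal variances
se_difference = math.sqrt(sd_first ** 2 / n_first + sd_second ** 2 / n_second)
t_statistic = (mean_first - mean_second) / se_difference
reject_null = t_statistic > 1.64  # one-sided critical value used in the answer

print(round(t_statistic, 2), reject_null)
```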
1) When the estimated slope coefficient in the simple regression model, \(\hat{\beta}_1\), is zero, then
A) R2 = \(\bar{Y}\).
B) 0 < R2 < 1.
C) R2 = 0.
D) R2 > (SSR/TSS).
Answer: C
2) The regression R2 is defined as follows:
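A) \(\frac{ESS}{TSS}\)
B) \(\frac{RSS}{TSS}\)
C) \(\frac{\sum_{i=1}^{n}(Y_i - \bar{Y})(X_i - \bar{X})}{\sum_{i=1}^{n}(Y_i - \bar{Y})^2 \sum_{i=1}^{n}(X_i - \bar{X})^2}\)
D) \(\frac{SSR}{n-2}\)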
Answer: A
3) The standard error of the regression (SER) is defined as follows
A) \(\sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\hat{u}_i^2}\)
B) SSR
C) 1-R2
D) \(\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\hat{u}_i^2}\)
Answer: A
4) (Requires Appendix material) Which of the following statements is correct?
A) TSS = ESS + SSR
B) ESS = SSR + TSS
C) ESS > TSS
D) R2 = 1 - (ESS/TSS)
Answer: A
5) Binary variables
A) are generally used to control for outliers in your sample.
B) can take on more than two values.
C) exclude certain individuals from your sample.
D) can take on only two values.
Answer: D
6) The following are all least squares assumptions with the exception of:
A) The conditional distribution of ui given Xi has a mean of zero.
B) The explanatory variable in regression model is normally distributed.
C) (Xi, Yi), i = 1,..., n are independently and identically distributed.
D) Large outliers are unlikely.
Answer: B
7) The reason why estimators have a sampling distribution is that
A) economics is not a precise science.
B) individuals respond differently to incentives.
C) in real life you typically get to sample many times.
D) the values of the explanatory variable and the error term differ across samples.
Answer: D
8) In the simple linear regression model, the regression slope
A) indicates by how many percent Y increases, given a one percent increase in X.
B) when multiplied with the explanatory variable will give you the predicted Y.
C) indicates by how many units Y increases, given a one unit increase in X.
D) represents the elasticity of Y on X.
Answer: C
9) The OLS estimator is derived by
A) connecting the Yi corresponding to the lowest Xi observation with the Yi corresponding to the highest Xi
observation.
B) making sure that the standard error of the regression equals the standard error of the slope estimator.
C) minimizing the sum of absolute residuals.
D) minimizing the sum of squared residuals.
Answer: D
10) Interpreting the intercept in a sample regression function is
A) not reasonable because you never observe values of the explanatory variables around the origin.
B) reasonable because under certain conditions the estimator is BLUE.
C) reasonable if your sample contains values of Xi around the origin.
D) not reasonable because economists are interested in the effect of a change in X on the change in Y.
Answer: C
11) The variance of Yi is given by
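A) \(\beta_0^2 + \beta_1^2\,\mathrm{var}(X_i) + \mathrm{var}(u_i)\).
\(\beta_1^2\,\mathrm{var}(X_i) + \mathrm{var}(u_i)\).
A) \(Y_i - \beta_0 - \beta_1 X_i\)
B) \(Y_i - \beta_0 - \beta_1 X_i\)
C) \(Y_i - \hat{Y}_i\)
D) \((Y_i - \bar{Y})^2\)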
Answer: C
14) The slope estimator, \(\hat{\beta}_1\), has a smaller standard error, other things equal, if
A) there is more variation in the explanatory variable, X.
B) there is a large variance of the error term, u.
C) the sample size is smaller.
D) the intercept, \(\beta_0\), is small.
Answer: A
15) The regression R2 is a measure of
A) whether or not X causes Y.
B) the goodness of fit of your regression line.
C) whether or not ESS > TSS.
D) the square of the determinant of R.
Answer: B
16) (Requires Appendix) The sample regression line estimated by OLS
A) will always have a slope smaller than the intercept.
B) is exactly the same as the population regression line.
C) cannot have a slope of zero.
D) will always run through the point (X, Y).
Answer: D
17) The OLS residuals
A) can be calculated using the errors from the regression function.
B) can be calculated by subtracting the fitted values from the actual values.
C) are unknown since we do not know the population regression function.
D) should not be used in practice since they indicate that your regression does not run through all your
observations.
Answer: B
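19) If the three least squares assumptions hold, then the large sample normal distribution of \(\hat{\beta}_1\) is
A) \(N\left(0, \frac{1}{n}\frac{\mathrm{var}[(X_i - \mu_X)u_i]}{[\mathrm{var}(X_i)]^2}\right)\).
B) \(N\left(\beta_1, \frac{1}{n}\frac{\mathrm{var}[(X_i - \mu_X)u_i]}{[\mathrm{var}(X_i)]^2}\right)\).
C) \(N\left(\beta_1, \frac{\sigma_u^2}{\sum_{i=1}^{n}(X_i - \bar{X})^2}\right)\).
D) \(N\left(\beta_1, \frac{1}{n}\frac{\mathrm{var}(u_i)}{[\mathrm{var}(X_i)]^2}\right)\).
Answer: B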
20) In the simple linear regression model \(Y_i = \beta_0 + \beta_1 X_i + u_i\),
A) the intercept is typically small and unimportant.
B) \(\beta_0 + \beta_1 X_i\) represents the population regression function.
C) the absolute value of the slope is typically between 0 and 1.
D) \(\beta_0 + \beta_1 X_i\) represents the sample regression function.
Answer: B
21) To obtain the slope estimator using the least squares principle, you divide the
A) sample variance of X by the sample variance of Y.
B) sample covariance of X and Y by the sample variance of Y.
C) sample covariance of X and Y by the sample variance of X.
D) sample variance of X by the sample covariance of X and Y.
Answer: C
22) To decide whether or not the slope coefficient is large or small,
A) you should analyze the economic importance of a given increase in X.
B) the slope coefficient must be larger than one.
C) the slope coefficient must be statistically significant.
D) you should change the scale of the X variable if the coefficient appears to be too small.
Answer: A
23) E(ui | Xi) = 0 says that
A) dividing the error by the explanatory variable results in a zero (on average).
B) the sample regression function residuals are unrelated to the explanatory variable.
C) the sample mean of the Xs is much larger than the sample mean of the errors.
D) the conditional distribution of the error given the explanatory variable has a zero mean.
Answer: D
24) In the linear regression model, \(Y_i = \beta_0 + \beta_1 X_i + u_i\), \(\beta_0 + \beta_1 X_i\) is referred to as
A) the population regression function.
B) the sample regression function.
C) exogenous variation.
D) the right-hand variable or regressor.
Answer: A
25) Multiplying the dependent variable by 100 and the explanatory variable by 100,000 leaves the
A) OLS estimate of the slope the same.
B) OLS estimate of the intercept the same.
C) regression R2 the same.
D) variance of the OLS estimators the same.
Answer: C
26) Assume that you have collected a sample of observations from over 100 households and their consumption and
income patterns. Using these observations, you estimate the following regression \(C_i = \beta_0 + \beta_1 Y_i + u_i\) where C is
consumption and Y is disposable income. The estimate of \(\beta_1\) will tell you
A) \(\frac{\Delta Income}{\Delta Consumption}\)
B) The amount you need to consume to survive
C) \(\frac{Income}{Consumption}\)
D) \(\frac{\Delta Consumption}{\Delta Income}\)
Answer: D
27) In which of the following relationships does the intercept have a real-world interpretation?
A) the relationship between the change in the unemployment rate and the growth rate of real GDP
(Okun's Law)
B) the demand for coffee and its price
C) test scores and class-size
D) weight and height of individuals
Answer: A
2) (Requires Appendix material) At a recent county fair, you observed that at one stand people's weight was
forecasted, and were surprised by the accuracy (within a range). Thinking about how the person could have
predicted your weight fairly accurately (despite the fact that she did not know about your heavy bones), you
think about how this could have been accomplished. You remember that medical charts for children contain
5%, 25%, 50%, 75% and 95% lines for a weight/height relationship and decide to conduct an experiment with
110 of your peers. You collect the data and calculate the following sums:
\[
\sum_{i=1}^{n} Y_i = 17{,}375,\quad \sum_{i=1}^{n} X_i = 7{,}665.5,\quad \sum_{i=1}^{n} y_i^2 = 94{,}228.8,\quad \sum_{i=1}^{n} x_i^2 = 1{,}248.9,\quad \sum_{i=1}^{n} x_i y_i = 7{,}625.9
\]
where the height is measured in inches and weight in pounds. (Small letters refer to deviations from means as
in \(z_i = Z_i - \bar{Z}\).)
(a) Calculate the slope and intercept of the regression and interpret these.
(b) Find the regression R2 and explain its meaning. What other factors can you think of that might have an
influence on the weight of an individual?
Answer: (a) \(\hat{\beta}_1 = \frac{7{,}625.9}{1{,}248.9} = 6.11\), \(\hat{\beta}_0 = 157.95 - 6.11 \times 69.69 = -267.86\). For every additional inch in height, students
3) You have obtained a sub-sample of 1744 individuals from the Current Population Survey (CPS) and are
interested in the relationship between weekly earnings and age. The regression, using
heteroskedasticity-robust standard errors, yielded the following result:
\(\widehat{Earn} = 239.16 + 5.20 \times Age\), R2 = 0.05, SER = 287.21,
where Earn and Age are measured in dollars and years respectively.
(a) Interpret the results.
(b) Is the effect of age on earnings large?
(c) Why should age matter in the determination of earnings? Do the results suggest that there is a guarantee for
earnings to rise for everyone as they become older? Do you think that the relationship between age and
earnings is linear?
(d) The average age in this sample is 37.5 years. What is annual income in the sample?
(e) Interpret the measures of fit.
Answer: (a) A person who is one year older increases her weekly earnings by $5.20. There is no meaning attached
to the intercept. The regression explains 5 percent of the variation in earnings.
(b) Assuming that people worked 52 weeks a year, the effect of being one year older translates into an
additional $270.40 a year. This does not seem particularly large in 2002 dollars, but may have been
earlier.
(c) In general, age-earnings profiles take on an inverted U-shape. Hence it is not linear and the linear
approximation may not be good at all. Age may be a proxy for experience, which in itself can
approximate on the job training. Hence the positive effect between age and earnings. The results do
not suggest that there is a guarantee for earnings to rise for everyone as they become older since the
regression R2 does not equal 1. Instead the result holds on average.
(d) Since \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X}\), it follows that \(\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1\bar{X}\). Substituting the estimates for the slope and the intercept then
results in average weekly earnings of $434.16 or annual average earnings of $22,576.32.
(e) The regression R2 indicates that five percent of the variation in earnings is explained by the model.
The typical error is $287.21.
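The arithmetic behind parts (b) and (d) can be laid out as a short sketch. It assumes 52 working weeks per year, as the answer does, and uses the fact that the OLS line passes through the sample means.

```python
INTERCEPT = 239.16     # estimated intercept, dollars per week
SLOPE = 5.20           # estimated slope, dollars per week per year of age
WEEKS_PER_YEAR = 52    # assumption used in the answer

def predicted_weekly_earnings(age):
    """Fitted weekly earnings from the estimated regression line."""
    return INTERCEPT + SLOPE * age

# (b) annual effect of being one year older
annual_effect = SLOPE * WEEKS_PER_YEAR

# (d) the regression line runs through the sample means, so evaluating it
# at the average age gives average weekly earnings
average_weekly = predicted_weekly_earnings(37.5)
average_annual = average_weekly * WEEKS_PER_YEAR

print(round(annual_effect, 2), round(average_weekly, 2), round(average_annual, 2))
```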
4) The baseball team nearest to your home town is, once again, not doing well. Given that your knowledge of
what it takes to win in baseball is vastly superior to that of management, you want to find out what it takes to
win in Major League Baseball (MLB). You therefore collect the winning percentage of all 30 baseball teams in
MLB for 1999 and regress the winning percentage on what you consider the primary determinant for wins,
which is quality pitching (team earned run average). You find the following information on team performance:
Summary of the Distribution of Winning Percentage and Team Earned Run Average for MLB in 1999

                                 Standard    Percentile
                      Average    deviation   10%    25%    40%    50% (median)   60%    75%    90%
Team ERA              4.71       0.53        3.84   4.35   4.72   4.78           4.91   5.06   5.25
Winning Percentage    0.50       0.08        0.40   0.43   0.46   0.48           0.49   0.59   0.60
(a) What is your expected sign for the regression slope? Will it make sense to interpret the intercept? If not,
should you omit it from your regression and force the regression line through the origin?
(b) OLS estimation of the relationship between the winning percentage and the team ERA yield the following:
\(\widehat{Winpct} = 0.9 - 0.10 \times teamera\), R2 = 0.49, SER = 0.06,
where winpct is measured as wins divided by games played, so for example a team that won half of its games
Stock/Watson 2e -- CVC2 8/23/06 -- Page 73
5) You have learned in one of your economics courses that one of the determinants of per capita income (the
"Wealth of Nations") is the population growth rate. Furthermore you also found out that the Penn World
Tables contain income and population data for 104 countries of the world. To test this theory, you regress the
GDP per worker (relative to the United States) in 1990 (RelPersInc) on the difference between the average
population growth rate of that country (n) and the U.S. average population growth rate (n_us) for the years 1980
to 1990. This results in the following regression output:
\(\widehat{RelPersInc} = 0.518 - 18.831 \times (n - n_{us})\), R2 = 0.522, SER = 0.197
(a) Interpret the results carefully. Is this relationship economically important?
(b) What would happen to the slope, intercept, and regression R2 if you ran another regression where the
above explanatory variable was replaced by n only, i.e., the average population growth rate of the country?
(The population growth rate of the United States from 1980 to 1990 was 0.009.) Should this have any effect on
the t-statistic of the slope?
(c) 31 of the 104 countries have a dependent variable of less than 0.10. Does it therefore make sense to interpret
the intercept?
Answer: (a) A relative increase in the population rate of one percentage point, from 0.01 to 0.02, say, lowers
relative per-capita income by almost 20 percentage points (0.188). This is a quantitatively important and
large effect. Nations which have the same population growth rate as the United States have, on average,
roughly half as much per capita income.
(b) The interpretation of the partial derivative is unaffected, in that the slope still indicates the effect of a
one percentage point increase in the population growth rate. The regression R2 will remain the same
since only a constant was removed from the explanatory variable. The intercept will change as a result of
the change in X.
(c) To interpret the intercept, you must observe values of X close to zero, not Y.
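Part (b) can be illustrated by rewriting the fitted line in terms of n alone. This sketch assumes a negative slope of -18.831, consistent with the answer's statement that higher population growth lowers relative income, and uses the U.S. growth rate of 0.009 given in the question.

```python
INTERCEPT_DIFF = 0.518   # intercept when the regressor is (n - n_us)
SLOPE = -18.831          # assumed negative, in line with the interpretation in (a)
N_US = 0.009             # U.S. population growth rate, 1980 to 1990

# Substituting X = n - n_us: Y = a + b*(n - n_us) = (a - b*n_us) + b*n
intercept_level = INTERCEPT_DIFF - SLOPE * N_US

def predict_with_difference(n):
    """Fitted value using the original regressor (n - n_us)."""
    return INTERCEPT_DIFF + SLOPE * (n - N_US)

def predict_with_level(n):
    """Fitted value using n alone; same slope, shifted intercept."""
    return intercept_level + SLOPE * n

print(round(intercept_level, 4))
print(predict_with_difference(0.02), predict_with_level(0.02))
```

Because the fitted values are identical under both parameterizations, the residuals, the regression R2, and the slope's t-statistic are unchanged; only the intercept moves.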
6) The neoclassical growth model predicts that for identical savings rates and population growth rates, countries
should converge to the same per capita income level. This is referred to as the convergence hypothesis. One way to
test for the presence of convergence is to compare the growth rates over time to the initial starting level.
(a) If you regressed the average growth rate over a time period (1960-1990) on the initial level of per capita
income, what would the sign of the slope have to be to indicate this type of convergence? Explain. Would this
result confirm or reject the prediction of the neoclassical growth model?
(b) The results of the regression for 104 countries were as follows:
\(\widehat{g6090} = 0.019 - 0.0006 \times RelProd60\), R2 = 0.00007, SER = 0.016,
where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and
RelProd60 is GDP per worker relative to the United States in 1960.
Interpret the results. Is there any evidence of unconditional convergence between the countries of the world? Is
this result surprising? What other concept could you think about to test for convergence between countries?
(c) You decide to restrict yourself to the 24 OECD countries in the sample. This changes your regression output
as follows:
\(\widehat{g6090} = 0.048 - 0.0404 \times RelProd60\), R2 = 0.82, SER = 0.0046
How does this result affect your conclusions from above?
Answer: (a) You would require a negative sign. Countries that are far ahead of others at the beginning of the
period would have to grow relatively slower for the others to catch up. This represents unconditional
convergence, whereas the neoclassical growth model predicts conditional convergence, i.e., there will
only be convergence if countries have identical savings, population growth rates, and production
technology.
(b) An increase in 10 percentage points in RelProd60 results in a decrease of 0.00006 in the growth rate
from 1960 to 1990, i.e., countries that were further ahead in 1960 do grow by less. There are some
countries in the sample that have a value of RelProd60 close to zero (China, Uganda, Togo, Guinea) and
you would expect these countries to grow roughly by 2 percent per year over the sample period. The
regression R2 indicates that the regression has virtually no explanatory power. The result is not
surprising given that there are not many theories that predict unconditional convergence between the
countries of the world.
(c) Judging by the size of the slope coefficient, there is strong evidence of unconditional convergence for
the OECD countries. The regression R2 is quite high, given that there is only a single explanatory
variable in the regression. However, since we do not know the sampling distribution of the estimator in
this case, we cannot conduct inference.
7) In 2001, the Arizona Diamondbacks defeated the New York Yankees in the Baseball World Series in 7 games.
Some players, such as Bautista and Finley for the Diamondbacks, had a substantially higher batting average
during the World Series than during the regular season. Others, such as Brosius and Jeter for the Yankees, did
substantially poorer. You set out to investigate whether or not the regular season batting average is a good
indicator for the World Series batting average. The results for 11 players who had the most at bats for the two
teams are:
AZWsavg = 0.347 + 2.290 AZSeasavg , R2 =0.11, SER = 0.145,
NYWsavg = 0.134 + 0.136 NYSeasavg , R2 =0.001, SER = 0.092,
where Wsavg and Seasavg indicate the batting average during the World Series and the regular season
respectively.
(a) Focusing on the coefficients first, what is your interpretation?
(b) What can you say about the explanatory power of your equation? What do you conclude from this?
Answer: (a) The two regressions are quite different. For the Diamondbacks, players who had a 10 point higher
batting average during the regular season had roughly a 23 point higher batting average during the
World Series. Hence top performers did relatively better. The opposite holds for the Yankees.
(b) Both regressions have little explanatory power as seen from the regression R2 . Hence performance
during the season is a poor forecast of World Series performance.
8) For the simple regression model of Chapter 4, you have been given the following data:
$\sum_{i=1}^{420} Y_i = 274,745.75$; $\sum_{i=1}^{420} X_i = 8,248.979$; $\sum_{i=1}^{420} X_i Y_i = 5,392,705$;
$\sum_{i=1}^{420} X_i^2 = 163,513.03$; $\sum_{i=1}^{420} Y_i^2 = 179,878,841.13$
Answer: (a) $\hat\beta_1 = \frac{5,392,705 - 420 \times 19.64 \times 654.16}{163,513.03 - 420 \times 19.64^2} = -2.28$; $\hat\beta_0 = 654.2 - (-2.28) \times 19.6 = 698.9$.
Country          Currency    Big Mac price (local currency)    Actual exchange rate per U.S. dollar
Indonesia        Rupiah
Italy            Lira
South Korea      Won
Chile            Peso
Spain            Peseta
Hungary          Forint
Japan            Yen
Taiwan           Dollar
Thailand         Baht
Czech Rep.       Crown
Russia           Ruble
Denmark          Crown
Sweden           Crown
Mexico           Peso
France           Franc
Israel           Shekel
China            Yuan
South Africa     Rand
Switzerland      Franc
Poland           Zloty
Germany          Mark
Malaysia         Dollar
New Zealand      Dollar
Singapore        Dollar
Brazil           Real
Canada           Dollar      2.85                              1.47
Australia        Dollar      2.59                              1.68
Argentina        Peso        2.50                              1.00
Britain          Pound       1.90                              0.63
United States    Dollar      2.51
The concept of purchasing power parity or PPP (the idea that similar foreign and domestic goods should
have the same price in terms of the same currency, Abel, A. and B. Bernanke, Macroeconomics, 4th edition,
Boston: Addison Wesley, 476) suggests that the ratio of the Big Mac priced in the local currency to the U.S.
dollar price should equal the exchange rate between the two countries.
(a) Enter the data into your regression analysis program (EViews, Stata, Excel, SAS, etc.). Calculate the
predicted exchange rate per U.S. dollar by dividing the price of a Big Mac in local currency by the U.S. price of
a Big Mac ($2.51).
(b) Run a regression of the actual exchange rate on the predicted exchange rate. If purchasing power parity
held, what would you expect the slope and the intercept of the regression to be? Is the value of the slope and
the intercept far from the values you would expect to hold under PPP?
(c) Plot the actual exchange rate against the predicted exchange rate. Include the 45 degree line in your graph.
Which observations might cause the slope and the intercept to differ from zero and one?
Answer: (a)
Country          Predicted exchange rate per U.S. dollar
Indonesia        5777
Italy            1793
South Korea      1195
Chile            502
Spain            149
Hungary          135
Japan            117
Taiwan           27.9
Thailand         21.9
Czech Rep.       21.7
Russia           15.7
Denmark          9.86
Sweden           9.56
Mexico           8.33
France           7.37
Israel           5.78
China            3.94
South Africa     3.59
Switzerland      2.35
Poland           2.19
Germany          1.99
Malaysia         1.80
New Zealand      1.35
Singapore        1.27
Brazil           1.18
Canada           1.14
Australia        1.03
Argentina        1.00
Britain          0.76
11)
a. Use equation (4.7) and the sums of columns (v) and (vi) to generate the slope of the regression.
b. Use equation (4.8) to generate the intercept.
c. Display the regression line (4.9) and interpret the coefficients.
d. Use equation (4.16) and the sum of column (vii) to calculate the regression $R^2$.
e. Use equation (4.19) to calculate the SER.
f. Use the Regression function in Excel to verify the results.
Answer: a. $\hat\beta_1 = \frac{-3418.76}{1499.58} = -2.27981$
b. $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X = \frac{274745.75}{420} - (-2.27981) \times \frac{8248.979}{420} = 698.933$
c. $\hat Y_i = 698.9 - 2.28 X_i$. A decrease in the student-teacher ratio of one results in an increase in test
scores of 2.28. It is best not to interpret the intercept; it simply determines the height of the
regression line.
d. To calculate the regression $R^2$, you need the TSS given from the sum in column (vii) and either the
ESS or SSR. In principle, you could use equation (4.10) to generate the residuals, square these and sum
them up to get SSR. However, the textbook suggests a shortcut at the bottom of p. 142:
$\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (Y_i - \bar Y)^2 - \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$
(the cross-product vanishes due to the orthogonality conditions (4.32) and (4.36)). The various terms on
the RHS of the equation have been calculated, and equation (4.35) implies that
$\hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2 = ESS = 7794.11$. Hence the regression $R^2 = \frac{7794.11}{152109.6} = 0.051$.
e. The answer in (d) can be used to calculate the SSR, which is 152109.6 - 7794.11 = 144315.5. Hence the SER must be 18.6.
f.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.226
R Square             0.051
Adjusted R Square    0.049
Standard Error       18.581
Observations         420

ANOVA
              df     SS
Regression    1      7794.11
Residual      418    144315.5
Total         419    152109.6

              Coefficients
Intercept     698.93
str           -2.28

Stock/Watson 2e -- CVC2 8/23/06 -- Page 81
12) You have obtained a sample of 14,925 individuals from the Current Population Survey (CPS) and are interested
in the relationship between average hourly earnings and years of education. The regression yields the
following result:
b.
c. Why should education matter in the determination of earnings? Do the results suggest that there is
a guarantee for average hourly earnings to rise for everyone as they receive an additional year of
education? Do you think that the relationship between education and average hourly earnings is
linear?
d. The average years of education in this sample is 13.5 years. What is the mean of average hourly
earnings in the sample?
e.
Answer: a. A person with one more year of education increases her earnings by $1.71. There is no meaning
attached to the intercept, it just determines the height of the regression. The model explains 5 percent of
the variation in average hourly earnings.
b. The difference between a high school graduate and a college graduate is four years of education.
Hence a college graduate will earn almost $7 more per hour, on average ($6.84 to be precise). If you
assume that there are 2,000 working hours per year, then the average salary difference would be close to
$14,000 (actually $13,680). Depending on how much you have spent for an additional year of education
and how much income you have forgone, this does not seem particularly large.
c. In general, you would expect to find a positive relationship between years of education and average
hourly earnings. Education is considered investment in human capital. If this were not the case, then it
would be a puzzle as to why there are students in the econometrics course; surely they are not there to
just find themselves (which would be quite expensive in most cases). However, if you consider
education as an investment and you wanted to see a return on it, then the relationship will most likely
not be linear. For example, a constant percent return would imply an exponential relationship whereby
the additional year of education would bring a larger increase in average hourly earnings at higher
levels of education. The results do not suggest that there is a guarantee for earnings to rise for everyone
as they become more educated since the regression R2 does not equal 1. Instead the result holds on
average.
d. Since $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X \Rightarrow \bar Y = \hat\beta_0 + \hat\beta_1 \bar X$. Substituting the estimates for the slope and the intercept then
results in a mean of average hourly earnings of roughly $18.50.
e. The typical prediction error is $9.30. Since the measure is related to the deviation of the actual and
fitted values, the unit of measurement must be the same as that of the dependent variable, which is in
dollars here.
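The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$. But $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ and $\bar Y = \hat\beta_0 + \hat\beta_1 \bar X$.
Hence $(\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2$ and therefore $ESS = \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$. Using small letters to indicate
deviations from mean, the regression $R^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. The square of the correlation coefficient is
$r^2 = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{(\sum_{i=1}^{n} x_i^2)^2} \cdot \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$.
Hence the regression $R^2$ and the squared correlation coefficient
are the same. Correlation does not imply causation. Income is a regressor in the consumption function,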
yet consumption enters on the right-hand side of the GDP identity. Regressing the weight of individuals
on the height is a situation where causality is without doubt, since the author of this test bank should be
seven feet tall otherwise. The authors of the textbook use weather data to forecast orange juice prices
later in the text.
2) You have analyzed the relationship between the weight and height of individuals. Although you are quite
confident about the accuracy of your measurements, you feel that some of the observations are extreme, say,
two standard deviations above and below the mean. You therefore decide to disregard these individuals.
What consequence will this have on the standard deviation of the OLS estimator of the slope?
Answer: Other things being equal, the standard error of the slope coefficient will decrease the larger the variation
in X. Hence you prefer more variation rather than less. This can be seen from formula (4.20) in the text.
Intuitively it is easier for OLS to detect a response to a unit change in X if the data varies more.
3) In order to calculate the regression R2 you need the TSS and either the SSR or the ESS. The TSS is fairly
straightforward to calculate, being just the variation of Y. However, if you had to calculate the SSR or ESS by
hand (or in a spreadsheet), you would need all fitted values from the regression function and their deviations
from the sample mean, or the residuals. Can you think of a quicker way to calculate the ESS simply using
terms you have already used to calculate the slope coefficient?
Answer: The ESS is given by $\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$. But $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ and $\bar Y = \hat\beta_0 + \hat\beta_1 \bar X$.
Hence $(\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2$, and therefore $ESS = \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$.
The right-hand side contains the estimated slope squared and the
denominator of the slope, i.e., all values that have already been calculated.
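The shortcut can be checked numerically. The following is a minimal sketch, assuming a small made-up data set (the X and Y values are hypothetical and not taken from the text); it computes the ESS the long way from the fitted values and compares it with the slope squared times the variation in X.

```python
# Hypothetical toy data, used only to illustrate the ESS shortcut.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Ingredients of the slope: cross-product and variation in X (deviation form).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Long way: fitted values, then their squared deviations from the mean of Y.
ess_long = sum((intercept + slope * x - y_bar) ** 2 for x in X)

# Shortcut: slope squared times the denominator of the slope.
ess_short = slope ** 2 * sxx

print(ess_long, ess_short)
```

Both numbers should agree up to floating-point rounding, which is why no fitted values are needed once the slope has been computed.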
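4) (Requires Appendix material) In deriving the OLS estimator, you minimize the sum of squared residuals with
respect to the two parameters $\hat\beta_0$ and $\hat\beta_1$. The resulting two equations imply two restrictions that OLS places on
the data, namely that $\sum_{i=1}^{n} \hat u_i = 0$ and $\sum_{i=1}^{n} \hat u_i X_i = 0$. Show that you get the same formula for the regression slope
and the intercept if you impose these two conditions on the sample regression function.
Answer: The sample regression function is $Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$. Summing both sides results in
$\sum_{i=1}^{n} Y_i = n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} X_i + \sum_{i=1}^{n} \hat u_i$. Imposing the first restriction, namely that the sum of the residuals is zero, dividing
both sides by n, and solving for the intercept gives $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$.
For the second restriction, multiply both sides of the sample regression function by $X_i$ and then sum
both sides to get $\sum_{i=1}^{n} Y_i X_i = \hat\beta_0 \sum_{i=1}^{n} X_i + \hat\beta_1 \sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \hat u_i X_i$. After imposing the restriction $\sum_{i=1}^{n} \hat u_i X_i = 0$
and substituting the formula for the intercept, you get
$\sum_{i=1}^{n} Y_i X_i = (\bar Y - \hat\beta_1 \bar X) n \bar X + \hat\beta_1 \sum_{i=1}^{n} X_i^2$ or $\sum_{i=1}^{n} Y_i X_i - n \bar Y \bar X = \hat\beta_1 \sum_{i=1}^{n} X_i^2 - \hat\beta_1 n \bar X^2$, which, after isolating $\hat\beta_1$
and dividing by the variation in X, results in the OLS estimator for the slope.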
5) (Requires Appendix material) Show that the two alternative formulae for the slope given in your textbook are
identical.
$\frac{\frac{1}{n} \sum_{i=1}^{n} X_i Y_i - \bar X \bar Y}{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar X^2} = \frac{\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y)}{\sum_{i=1}^{n} (X_i - \bar X)^2}$
Answer: Let's start with the first equality. The numerator of the right-hand side expression can be written as
follows:
$\sum_{i=1}^{n} (X_i - \bar X)(Y_i - \bar Y) = \sum_{i=1}^{n} X_i Y_i - \bar X \sum_{i=1}^{n} Y_i - \bar Y \sum_{i=1}^{n} X_i + n \bar X \bar Y$
$= \sum_{i=1}^{n} Y_i X_i - n \bar X \bar Y - n \bar X \bar Y + n \bar X \bar Y = \sum_{i=1}^{n} Y_i X_i - n \bar X \bar Y$. (Note that $\sum_{i=1}^{n} X_i = n \bar X$.)
Multiplying out the terms in the denominator and moving the summation sign into the expression in
parentheses similarly yields $\sum_{i=1}^{n} X_i^2 - n \bar X^2$. Dividing both of these expressions by n then results in the
left-hand side fraction.
6) (Requires Calculus) Consider the following model:
$Y_i = \beta_0 + u_i$.
Derive the OLS estimator for $\beta_0$.
Answer: To derive the OLS estimator, minimize the sum of squared prediction mistakes $\sum_{i=1}^{n} (Y_i - b_0)^2$. Taking
the derivative with respect to $b_0$ results in
$\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0)^2 = \sum_{i=1}^{n} 2(Y_i - b_0)(-1) = (-2) \sum_{i=1}^{n} (Y_i - b_0) = (-2) (\sum_{i=1}^{n} Y_i - n b_0)$.
Setting the derivative to zero then results in the OLS estimator:
$(-2) (\sum_{i=1}^{n} Y_i - n \hat\beta_0) = 0 \Rightarrow \hat\beta_0 = \bar Y$.
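$\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_1 X_i)^2 = \sum_{i=1}^{n} 2(Y_i - b_1 X_i)(-X_i) = (-2) \sum_{i=1}^{n} (Y_i X_i - b_1 X_i^2)$. Setting the derivative to zero then results in the
OLS estimator:
$(-2) (\sum_{i=1}^{n} Y_i X_i - \hat\beta_1 \sum_{i=1}^{n} X_i^2) = 0 \Rightarrow \hat\beta_1 = \frac{\sum_{i=1}^{n} Y_i X_i}{\sum_{i=1}^{n} X_i^2}$.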
8) Show first that the regression R2 is the square of the sample correlation coefficient. Next, show that the slope of
a simple regression of Y on X is only identical to the inverse of the regression slope of X on Y if the regression
R2 equals one.
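Answer: The regression $R^2 = \frac{ESS}{TSS}$, where ESS is given by $\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$. But $\hat Y_i = \hat\beta_0 + \hat\beta_1 X_i$ and $\bar Y = \hat\beta_0 + \hat\beta_1 \bar X$.
Hence $(\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2$, and therefore $ESS = \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$. Using small letters to indicate
deviations from mean, i.e., $z_i = Z_i - \bar Z$, we get that the regression $R^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. The square of the
correlation coefficient is
$r^2 = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{(\sum_{i=1}^{n} x_i^2)^2} \cdot \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$.
Hence the two are the same.
Now $1 = r^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} \Rightarrow \hat\beta_1^2 = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i^2}$. But $\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$
and therefore $\hat\beta_1 = \hat\beta_1^2 \cdot \frac{1}{\hat\beta_1} = \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i y_i}$,
which is the inverse of the slope from the regression of X on Y, $\frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2}$.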
$Y_i = \hat\beta_0 + \hat\beta_1 X_i + \hat u_i$.
First, take averages on both sides of the equation. Second, subtract the resulting equation from the above
equation to write the sample regression function in deviations from means. (For simplicity, you may want to
use small letters to indicate deviations from the mean, i.e., $z_i = Z_i - \bar Z$.) Finally, illustrate in a two-dimensional
diagram with SSR on the vertical axis and the regression slope on the horizontal axis how you could find the
least squares estimator for the slope by varying its values through trial and error.
Answer: Taking averages results in the following equation: $\bar Y = \hat\beta_0 + \hat\beta_1 \bar X$. Subtracting this equation from the
sample regression function gives $y_i = \hat\beta_1 x_i + \hat u_i$. Hence
SSR = $\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (y_i - \hat\beta_1 x_i)^2$ is a quadratic which takes on different values for different choices of $\hat\beta_1$
(the y and x are given in this case, i.e., different from the usual calculus problems, they cannot vary
here). You could choose a starting value of the slope and calculate SSR. Next you could choose a
different value for the slope and calculate the new SSR. There are two choices for the new slope value for
you to make: first, in which direction you want to move, and second, how large a distance you want to
choose the new slope value from the old one. (In essence, this is what sophisticated search algorithms
do.) You continue with this procedure until you find the smallest SSR. The slope coefficient which has
generated this SSR is the OLS estimator.
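The trial-and-error procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are made up, and the function names `ssr` and `search_slope`, the starting value, and the step-halving rule are hypothetical choices rather than anything prescribed by the textbook.

```python
# Hypothetical toy data; deviations from means are computed first.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.1, 5.9, 8.2, 9.8]
x_bar = sum(X) / len(X)
y_bar = sum(Y) / len(Y)
x_dev = [x - x_bar for x in X]
y_dev = [y - y_bar for y in Y]

def ssr(slope):
    """Sum of squared residuals for a candidate slope (deviation form)."""
    return sum((y - slope * x) ** 2 for x, y in zip(x_dev, y_dev))

def search_slope(start=0.0, step=1.0, tol=1e-8):
    """Move in whichever direction lowers SSR; halve the step when neither does."""
    b, best = start, ssr(start)
    while step > tol:
        moved = False
        for candidate in (b + step, b - step):
            value = ssr(candidate)
            if value < best:
                b, best, moved = candidate, value, True
                break
        if not moved:
            step /= 2.0
    return b

b_search = search_slope()
# Closed-form OLS slope for comparison.
b_ols = sum(x * y for x, y in zip(x_dev, y_dev)) / sum(x * x for x in x_dev)
print(b_search, b_ols)
```

Because SSR is a quadratic in the slope, the search settles on the same value as the closed-form estimator, up to the chosen tolerance.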
10) Given the amount of money and effort that you have spent on your education, you wonder if it was (is) all
worth it. You therefore collect data from the Current Population Survey (CPS) and estimate a linear
relationship between earnings and the years of education of individuals. What would be the effect on your
regression slope and intercept if you measured earnings in thousands of dollars rather than in dollars? Would
the regression R2 be affected? Should statistical inference be dependent on the scale of variables? Discuss.
Answer: It should be clear that interpretation of estimated relationships and statistical inference should not
depend on the units of measurement. Otherwise whim could dictate conclusions. Hence the regression
$R^2$ and statistical inference cannot be affected. It is easy but tedious to show this mathematically. Next,
earnings is the dependent variable here, so measuring it in thousands of dollars divides every value of Y
by 1,000. The intercept, which indicates the value of Y when X is zero, and the slope, which indicates the
change in Y for a one-unit change in X, are both measured in units of Y. Hence both coefficients are
divided by 1,000 as well, i.e., in each case the decimal point will
move 3 digits to the left.
where * indicates that the variable has been standardized. What are the units of measurement for the
dependent and explanatory variable? Why would you want to transform both variables in this way? Show that
the OLS estimator for the intercept equals zero. Next prove that the OLS estimator for the slope in this case is
identical to the formula for the least squares estimator where the variables have not been standardized, times
the ratio of the sample standard deviation of X and Y, i.e., $\hat\beta_1^* = \hat\beta_1 \times \frac{S_X}{S_Y}$.
Answer: The units of measurement are in standard deviations. Standardizing the variables allows conversion into
common units and allows comparison of the size of coefficients. The mean of standardized variables is
zero, and hence the OLS intercept must also be zero. The slope coefficient is given by the formula
$\hat\beta_1^* = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sum_{i=1}^{n} x_i^{*2}}$, where small letters indicate deviations from mean, i.e., $z = Z - \bar Z$.
Note that means of standardized variables are zero, and hence we get $\hat\beta_1^* = \frac{\sum_{i=1}^{n} X_i^* Y_i^*}{\sum_{i=1}^{n} X_i^{*2}}$. Writing this
out in terms of the original variables gives $\hat\beta_1^* = \frac{\frac{1}{S_X} \frac{1}{S_Y} \sum_{i=1}^{n} x_i y_i}{\frac{1}{S_X^2} \sum_{i=1}^{n} x_i^2}$, which is the same
as the sought after expression after simplification.
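A quick numerical check of this result, as a sketch only: the data below are invented for illustration, and the helper names `mean`, `sample_sd`, and `ols` are hypothetical. The code standardizes both variables, reruns the regression, and compares the new slope with the original slope times the ratio of the sample standard deviations.

```python
import math

# Hypothetical toy data (not from the text).
X = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
Y = [9.5, 11.0, 15.5, 16.0, 21.0, 23.5]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def ols(xs, ys):
    """Return (intercept, slope) of a simple regression of ys on xs."""
    xb, yb = mean(xs), mean(ys)
    slope = (sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
             / sum((x - xb) ** 2 for x in xs))
    return yb - slope * xb, slope

s_x, s_y = sample_sd(X), sample_sd(Y)
X_std = [(x - mean(X)) / s_x for x in X]
Y_std = [(y - mean(Y)) / s_y for y in Y]

b0, b1 = ols(X, Y)
b0_std, b1_std = ols(X_std, Y_std)
print(b0_std, b1_std, b1 * s_x / s_y)
```

The standardized intercept should be zero up to rounding, and the standardized slope should match the rescaled original slope.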
12) The OLS slope estimator is not defined if there is no variation in the data for the explanatory variable. You are
interested in estimating a regression relating earnings to years of schooling. Imagine that you had collected
data on earnings for different individuals, but that all these individuals had completed a college education (16
years of education). Sketch what the data would look like and explain intuitively why the OLS coefficient does
not exist in this situation.
Answer: There is no variation in X in this case, and it is therefore unreasonable to ask by how much Y would
change if X changed by one unit. Regression analysis cannot figure out the answer to this question,
because a change in X never happens in the sample.
13) Indicate in a scatterplot what the data for your dependent variable and your explanatory variable would look
like in a regression with an R2 equal to zero. How would this change if the regression R2 was equal to one?
Answer: For the zero regression R2 , the data would look something like this:
In the case of the regression R2 being one, all observations would lie on a straight line.
14) Imagine that you had discovered a relationship that would generate a scatterplot very similar to the
relationship $Y_i = X_i^2$, and that you would try to fit a linear regression through your data points. What do you
expect the slope coefficient to be? What do you think the value of your regression $R^2$ is in this situation? What
are the implications from your answers in terms of fitting a linear regression through a non-linear relationship?
Answer: You would expect the fitted regression line to be flat (slope = 0) and the regression $R^2$ to be zero in this
situation, as long as the values of X are spread symmetrically around zero.
The implication is that although there may be a relationship between two variables, you may not detect
it if you use the wrong functional form.
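The point can be illustrated with a short sketch. It assumes a hypothetical set of X values placed symmetrically around zero and sets Y exactly equal to X squared; neither the data nor the variable names come from the text.

```python
# Hypothetical X values, symmetric around zero; Y is an exact function of X.
X = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
Y = [x ** 2 for x in X]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
sxx = sum((x - x_bar) ** 2 for x in X)
syy = sum((y - y_bar) ** 2 for y in Y)

slope = sxy / sxx
intercept = y_bar - slope * x_bar
r_squared = slope ** 2 * sxx / syy  # ESS / TSS

print(slope, intercept, r_squared)
```

Even though Y is perfectly determined by X, the linear fit has a zero slope and a zero regression R2, because the positive and negative cross-products cancel.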
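15) (Requires Appendix material) A necessary and sufficient condition to derive the OLS estimator is that the
following two conditions hold: $\sum_{i=1}^{n} \hat u_i = 0$ and $\sum_{i=1}^{n} \hat u_i X_i = 0$. Show that these conditions imply that $\sum_{i=1}^{n} \hat u_i \hat Y_i = 0$.
Answer: $\sum_{i=1}^{n} \hat u_i \hat Y_i = \sum_{i=1}^{n} \hat u_i (\hat\beta_0 + \hat\beta_1 X_i) = \hat\beta_0 \sum_{i=1}^{n} \hat u_i + \hat\beta_1 \sum_{i=1}^{n} \hat u_i X_i = 0$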
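These orthogonality conditions are easy to confirm on any data set. The sketch below uses invented numbers (an assumption for illustration only) and checks that the OLS residuals sum to zero and are uncorrelated with both the regressor and the fitted values.

```python
# Hypothetical toy data (not from the text).
X = [2.0, 3.0, 5.0, 7.0, 11.0, 13.0]
Y = [4.0, 7.5, 9.0, 15.0, 20.5, 27.0]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
         / sum((x - x_bar) ** 2 for x in X))
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * x for x in X]
resid = [y - f for y, f in zip(Y, fitted)]

sum_u = sum(resid)                                        # first condition
sum_ux = sum(u * x for u, x in zip(resid, X))             # second condition
sum_u_fitted = sum(u * f for u, f in zip(resid, fitted))  # implied result

print(sum_u, sum_ux, sum_u_fitted)
```

All three sums should be zero apart from floating-point rounding.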
16) The help function for a commonly used spreadsheet program gives the following definition for the regression
slope it estimates:
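$\frac{n \sum_{i=1}^{n} X_i Y_i - (\sum_{i=1}^{n} X_i)(\sum_{i=1}^{n} Y_i)}{n \sum_{i=1}^{n} X_i^2 - (\sum_{i=1}^{n} X_i)^2}$
Prove that this formula is the same as the one given in the textbook.
Answer: $\frac{n \sum_{i=1}^{n} X_i Y_i - (\sum_{i=1}^{n} X_i)(\sum_{i=1}^{n} Y_i)}{n \sum_{i=1}^{n} X_i^2 - (\sum_{i=1}^{n} X_i)^2} = \frac{n \sum_{i=1}^{n} X_i Y_i - n \bar X n \bar Y}{n \sum_{i=1}^{n} X_i^2 - (n \bar X)^2} = \frac{n (\sum_{i=1}^{n} X_i Y_i - n \bar X \bar Y)}{n (\sum_{i=1}^{n} X_i^2 - n \bar X^2)}$.
Dividing both numerator and denominator by n then gives you the desired result.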
17) In order to calculate the slope, the intercept, and the regression R2 for a simple sample regression function, list
the five sums of data that you need.
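Answer: Depending on whether the data is in deviations from means or not ($z_i = Z_i - \bar Z$ or $Z_i$, say), you need
the following sums:
$\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} X_i$, $\sum_{i=1}^{n} x_i y_i$, $\sum_{i=1}^{n} y_i^2$, $\sum_{i=1}^{n} x_i^2$ (data in deviation form) or
$\sum_{i=1}^{n} Y_i$, $\sum_{i=1}^{n} X_i$, $\sum_{i=1}^{n} X_i Y_i$, $\sum_{i=1}^{n} Y_i^2$, $\sum_{i=1}^{n} X_i^2$. Using these five columns, you can calculate the slope
$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$, the intercept $\hat\beta_0 = \bar Y - \hat\beta_1 \bar X$, and the regression
$R^2 = \frac{\hat\beta_1 \sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} y_i^2} = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. Alternatively,
if the data is not given in deviation form, the formulae are as follows:
$\hat\beta_1 = \frac{\sum_{i=1}^{n} Y_i X_i - n \bar X \bar Y}{\sum_{i=1}^{n} X_i^2 - n \bar X^2}$, and for the
regression $R^2 = \frac{\hat\beta_1 (\sum_{i=1}^{n} X_i Y_i - n \bar X \bar Y)}{\sum_{i=1}^{n} Y_i^2 - n \bar Y^2} = \frac{\hat\beta_1^2 (\sum_{i=1}^{n} X_i^2 - n \bar X^2)}{\sum_{i=1}^{n} Y_i^2 - n \bar Y^2}$.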
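As an illustration of these formulae, the sketch below computes the slope, intercept, and regression R2 from five raw sums. The function name `ols_from_sums` is hypothetical. The inputs are the n = 420 test score sums quoted earlier in this chapter; the cross-product sum of 5,392,705 is the rounded value that appears in the answer to that question, so the results are approximate.

```python
def ols_from_sums(n, sum_y, sum_x, sum_xy, sum_y2, sum_x2):
    """Slope, intercept, and R-squared of a simple regression from five raw sums."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    sxy = sum_xy - n * x_bar * y_bar   # cross-product in deviation form
    sxx = sum_x2 - n * x_bar ** 2      # variation in X
    syy = sum_y2 - n * y_bar ** 2      # variation in Y (TSS)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = slope * sxy / syy
    return slope, intercept, r_squared

# Sums for the test score / student-teacher ratio data (cross-product rounded).
slope, intercept, r_squared = ols_from_sums(
    n=420,
    sum_y=274745.75,
    sum_x=8248.979,
    sum_xy=5392705.0,
    sum_y2=179878841.13,
    sum_x2=163513.03,
)
print(round(slope, 2), round(intercept, 1), round(r_squared, 3))
```

The output should be close to the slope of -2.28, intercept of 698.9, and regression R2 of 0.051 reported for this data set; small differences come from the rounded cross-product.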
18) A peer of yours, who is a major in another social science, says he is not interested in the regression slope and/or
intercept. Instead he only cares about correlations. For example, in the testscore/student -teacher ratio
regression, he claims to get all the information he needs from the negative correlation coefficient
corr(X,Y)=-0.226. What response might you have for your peer?
Answer: First of all, the regression slope is related to the regression $R^2$, and hence its square root, the correlation
coefficient, since
$R^2 = \frac{\hat\beta_1 (\sum_{i=1}^{n} X_i Y_i - n \bar X \bar Y)}{\sum_{i=1}^{n} Y_i^2 - n \bar Y^2} = \frac{\hat\beta_1^2 (\sum_{i=1}^{n} X_i^2 - n \bar X^2)}{\sum_{i=1}^{n} Y_i^2 - n \bar Y^2}$.
However, while the correlation coefficient tells you something about the direction and strength of the
relationship between two variables, it does not inform you about the effect of a one-unit increase in the
explanatory variable. Hence it cannot answer the question whether or not the relationship is important
(although even with the knowledge of the slope coefficient, this requires further information). Your
friend would not be able to answer the questions that policy makers and researchers are typically
interested in, such as, what would be the effect on test scores of a reduction in the student-teacher ratio
by one?
19) Assume that there is a change in the units of measurement on both Y and X. The new variables are Y*= aY and
X* = bX. What effect will this change have on the regression slope?
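Answer: We now have the following sample regression function $\hat Y^* = \hat\beta_0^* + \hat\beta_1^* X^*$. The formula for the slope will
be
$\hat\beta_1^* = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sum_{i=1}^{n} x_i^{*2}} = \frac{\sum_{i=1}^{n} (b x_i)(a y_i)}{\sum_{i=1}^{n} (b x_i)^2} = \frac{ab}{b^2} \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{a}{b} \hat\beta_1$.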
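The effect of rescaling can also be seen numerically. The following sketch uses invented earnings and education figures and arbitrary scale factors a and b (all of them assumptions for illustration); the helper `ols` is a hypothetical name.

```python
def ols(xs, ys):
    """Return (intercept, slope) of a simple regression of ys on xs."""
    n = len(xs)
    xb = sum(xs) / n
    yb = sum(ys) / n
    slope = (sum((x - xb) * (y - yb) for x, y in zip(xs, ys))
             / sum((x - xb) ** 2 for x in xs))
    return yb - slope * xb, slope

# Hypothetical data: years of education and annual earnings in dollars.
X = [10.0, 12.0, 12.0, 14.0, 16.0, 18.0]
Y = [21000.0, 26000.0, 30000.0, 33000.0, 41000.0, 52000.0]

a = 0.001  # Y* = aY: earnings in thousands of dollars
b = 12.0   # X* = bX: education in months

b0, b1 = ols(X, Y)
b0_star, b1_star = ols([b * x for x in X], [a * y for y in Y])

print(b1_star, (a / b) * b1)  # slope scales by a/b
print(b0_star, a * b0)        # intercept scales by a only
```

The rescaled slope equals a/b times the original slope, while the intercept is affected only by the rescaling of Y.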
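20) Assume that there is a change in the units of measurement on X. The new variable is X* = bX. Prove that this
change in the units of measurement on the explanatory variable has no effect on the intercept in the resulting
regression.
Answer: We now have the following sample regression function $\hat Y = \hat\beta_0^* + \hat\beta_1^* X^*$. The formula for the intercept will be
$\hat\beta_0^* = \bar Y - \hat\beta_1^* b \bar X$. But
$\hat\beta_1^* = \frac{\sum_{i=1}^{n} x_i^* y_i}{\sum_{i=1}^{n} x_i^{*2}} = \frac{\sum_{i=1}^{n} (b x_i) y_i}{\sum_{i=1}^{n} (b x_i)^2} = \frac{b}{b^2} \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{1}{b} \hat\beta_1$.
Hence $\hat\beta_0^* = \bar Y - \frac{1}{b} \hat\beta_1 b \bar X = \bar Y - \hat\beta_1 \bar X = \hat\beta_0$.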
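Hence $(\hat Y_i - \bar Y)^2 = \hat\beta_1^2 (X_i - \bar X)^2$ and therefore $ESS = \hat\beta_1^2 \sum_{i=1}^{n} (X_i - \bar X)^2$. Using small letters to indicate
deviations from mean, the regression $R^2 = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. The square of the
correlation coefficient is $r^2 = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{\sum_{i=1}^{n} x_i^2 \sum_{i=1}^{n} y_i^2} = \frac{(\sum_{i=1}^{n} y_i x_i)^2}{(\sum_{i=1}^{n} x_i^2)^2} \cdot \frac{\sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2} = \frac{\hat\beta_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}$. Hence the two are
the same.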
22) At the Stock and Watson (http://www.pearsonhighered.com/stock_watson ) website, go to Student Resources
and select the option Datasets for Replicating Empirical Results. Then select the California Test Score Data
Used in Chapters 4-9 and read the data either into Excel or STATA (or another statistical program).
Run a regression of the average reading score (read_scr) on the average math score (math_scr). What values for
the slope and the intercept would you expect? Interpret the coefficients in the resulting regression output and
the regression R2 .
Answer: On average, it would seem plausible, a priori, that schools which score high on the math score would also
do well in the reading score. Perhaps an underlying variable, such as genes, parental interest, or the
quality of teachers, is driving results in both. The relationship is close to the 45 degree line, where the
intercept would be zero and the slope would be one. Interpreted literally, 85 percent of the variation in
the reading score is explained by our model.
23) In a simple regression with an intercept and a single explanatory variable, the variation in Y
(TSS = $\sum_{i=1}^{n} (Y_i - \bar Y)^2$) can be decomposed into the explained sums of squares (ESS = $\sum_{i=1}^{n} (\hat Y_i - \bar Y)^2$) and the sum of squared
residuals (SSR = $\sum_{i=1}^{n} \hat u_i^2 = \sum_{i=1}^{n} (Y_i - \hat Y_i)^2$) (see, for example, equation (4.35) in the textbook).
Consider any regression line, positively or negatively sloped in {X,Y} space. Draw a horizontal line
where, hypothetically, you consider the sample mean of Y ($\bar Y$) to be. Next, add a single actual
observation of Y.
In this graph, indicate where you find the following distances: the
(i) residual
(ii) actual minus the mean of Y
(iii) fitted value minus the mean of Y
Answer:
Chapter 5 Regression with a Single Regressor: Hypothesis Tests and Confidence Intervals
5.1 Multiple Choice
1) Heteroskedasticity means that
A) homogeneity cannot be assumed automatically for the model.
B) the variance of the error term is not constant.
C) the observed units have different preferences.
D) agents are not all rational.
Answer: B
2) With heteroskedastic errors, the weighted least squares estimator is BLUE. You should use OLS with
heteroskedasticity-robust standard errors because
A) this method is simpler.
B) the exact form of the conditional variance is rarely known.
C) the Gauss-Markov theorem holds.
D) your spreadsheet program does not have a command for weighted least squares.
Answer: B
3) When estimating a demand function for a good where quantity demanded is a linear function of the price, you
should
A) not include an intercept because the price of the good is never zero.
B) use a one-sided alternative hypothesis to check the influence of price on quantity.
C) use a two-sided alternative hypothesis to check the influence of price on quantity.
D) reject the idea that price determines demand unless the coefficient is at least 1.96.
Answer: B
4) The t-statistic is calculated by dividing
A) the OLS estimator by its standard error.
B) the slope by the standard deviation of the explanatory variable.
C) the estimator minus its hypothesized value by the standard error of the estimator.
D) the slope by 1.96.
Answer: C
5) The confidence interval for the sample regression function slope
A) can be used to conduct a test about a hypothesized population regression function slope.
B) can be used to compare the value of the slope relative to that of the intercept.
C) adds and subtracts 1.96 from the slope.
D) allows you to make statements about the economic importance of your estimate.
Answer: A
6) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal
distribution, you can
A) reject the null hypothesis.
B) safely assume that your regression results are significant.
C) reject the assumption that the error terms are homoskedastic.
D) conclude that most of the actual values are very close to the regression line.
Answer: A
7) Under the least squares assumptions (zero conditional mean for the error term, Xi and Yi being i.i.d., and Xi
and ui having finite fourth moments), the OLS estimator for the slope and intercept
A) has an exact normal distribution for n > 15.
B) is BLUE.
C) has a normal distribution even in small samples.
D) is unbiased.
Answer: D
8) In general, the t-statistic has the following form:
A) (estimate - hypothesized value) / (standard error of estimate)
B) estimator / (standard error of estimator)
C) (estimator - hypothesized value) / (standard error of estimator)
D) (estimator - hypothesized value) / (standard error of estimator / n)
Answer: C
9) Consider the following regression line: TestScore = 698.9 - 2.28 STR. You are told that the t-statistic on the
slope coefficient is 4.38. What is the standard error of the slope coefficient?
A) 0.52
B) 1.96
C) -1.96
D) 4.38
Answer: A
10) Imagine that you were told that the t-statistic for the slope coefficient of the regression line TestScore = 698.9 -
2.28 STR was 4.38. What are the units of measurement for the t-statistic?
A) points of the test score
B) number of students per teacher
C) TestScore / STR
D) standard deviations
Answer: D
11) The construction of the t-statistic for a one- and a two-sided hypothesis
A) depends on the critical value from the appropriate distribution.
B) is the same.
C) is different since the critical value must be 1.645 for the one-sided hypothesis, but 1.96 for the two-sided
hypothesis (using a 5% probability for the Type I error).
D) uses 1.96 for the two-sided test, but only +1.96 for the one-sided test.
Answer: B
12) The p-value for a one-sided left-tail test is given by
A) Pr(Z - $t^{act}$) = $\Phi(t^{act})$.
B) Pr(Z < $t^{act}$) = $\Phi(t^{act})$.
C) Pr(Z < $t^{act}$) < 1.645.
D) cannot be calculated, since probabilities must always be positive.
Answer: B
D) ($\hat\beta_1$ - 1.96, $\hat\beta_1$ + 1.96).
Answer: C
14) The 95% confidence interval for $\beta_0$ is the interval
A) ($\beta_0$ - 1.96SE($\beta_0$), $\beta_0$ + 1.96SE($\beta_0$)).
D) ($\hat\beta_0$ - 1.96, $\hat\beta_0$ + 1.96).
Answer: C
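15) The 95% confidence interval for the predicted effect of a general change in X is
A) ($\beta_1 \Delta x$ - 1.96SE($\beta_1$) $\times \Delta x$, $\beta_1 \Delta x$ + 1.96SE($\beta_1$) $\times \Delta x$).
B) ($\hat\beta_1 \Delta x$ - 1.645SE($\hat\beta_1$) $\times \Delta x$, $\hat\beta_1 \Delta x$ + 1.645SE($\hat\beta_1$) $\times \Delta x$).
C) ($\hat\beta_1 \Delta x$ - 1.96SE($\hat\beta_1$) $\times \Delta x$, $\hat\beta_1 \Delta x$ + 1.96SE($\hat\beta_1$) $\times \Delta x$).
D) ($\hat\beta_1 \Delta x$ - 1.96, $\hat\beta_1 \Delta x$ + 1.96).
Answer: C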
^
Xi - X 2
i=1
S^
u
B)
Xi - X 2
i=1
2
S^
u
C)
2
Xi -X
i=1
D)
1
n-2
1
n
n
i=1
n
^2
Xi - X 2 u i
Xi - X 2
i=1
Answer: A
17) One of the following steps is not required as a step to test for the null hypothesis:
^
29) Using 143 observations, assume that you had estimated a simple regression function and that your estimate for
the slope was 0.04, with a standard error of 0.01. You want to test whether or not the estimate is statistically
significant. Which of the following possible decisions is the only correct one:
A) you decide that the coefficient is small and hence most likely is zero in the population
B) the slope is statistically significant since it is four standard errors away from zero
C) the response of Y given a change in X must be economically important since it is statistically significant
D) since the slope is very small, so must be the regression R 2 .
Answer: B
30) You extract approximately 5,000 observations from the Current Population Survey (CPS) and estimate the
following regression function:
ahe = 3.32 + 0.45 Age, R2 = 0.02, SER = 8.66
(1.00) (0.04)
where ahe is average hourly earnings, and Age is the individual's age. Given the specification, your 95%
confidence interval for the effect of changing age by 5 years is approximately
A) [$1.96, $2.54]
B) [$2.32, $4.32]
C) [$1.35, $5.30]
D) cannot be determined given the information provided
Answer: A
2) (Requires Appendix) (Continuation from Chapter 4) At a recent county fair, you observed that at one stand
people's weight was forecasted, and were surprised by the accuracy (within a range). Thinking about how the
person could have predicted your weight fairly accurately (despite the fact that she did not know about your
heavy bones), you think about how this could have been accomplished. You remember that medical charts
for children contain 5%, 25%, 50%, 75% and 95% lines for a weight/height relationship and decide to conduct
an experiment with 110 of your peers. You collect the data and calculate the following sums:
$\sum_{i=1}^{n} Y_i = 17,375$, $\sum_{i=1}^{n} X_i = 7,665.5$, $\sum_{i=1}^{n} y_i^2 = 94,228.8$, $\sum_{i=1}^{n} x_i^2 = 1,248.9$, $\sum_{i=1}^{n} x_i y_i = 7,625.9$
where the height is measured in inches and weight in pounds. (Small letters refer to deviations from means as
in $z_i = Z_i - \bar Z$.)
(a) Calculate the homoskedasticity-only standard errors and, using the resulting t-statistic, perform a test on
the null hypothesis that there is no relationship between height and weight in the population of college
students.
(b) What is the alternative hypothesis in the above test, and what level of significance did you choose?
(c) Statistics and econometrics textbooks often ask you to calculate critical values based on some level of
significance, say 1%, 5%, or 10%. What sort of criteria do you think should play a role in determining which
level of significance to choose?
(d) What do you think the relationship is between testing for the significance of the slope and whether or not
the regression R2 is zero?
Answer: (a) The formula for the homoskedasticity-only standard errors requires knowledge of the residual
variance. But $s_{\hat u}^2 = \frac{1}{n-2} SSR$, and SSR = TSS - ESS. Given the result in (2b), SSR = 47,604.7, and hence $s_{\hat u}^2$ =
440.78. The SER is 21.00. Dividing by the square root of the variation in X then results in the
homoskedasticity-only standard error of the slope, which is 0.594. The t-statistic is 10.29, which rejects
the null hypothesis of no relationship.
(b) The alternative hypothesis should be one-sided, since there is strong prior knowledge that taller
people weigh more, on average. Given the size of the t-statistic, the null hypothesis can be rejected at
any reasonable level of significance.
(c) Clearly the levels should not be picked arbitrarily, but should depend on the cost involved with the
size and the power of the test. Consider a person who was accused of murder. In that case, the null
hypothesis is that he is innocent. The size of the test would be the probability of letting an innocent
person go to the electric chair, while (1-power of the test) gives the probability of letting a murderer go
free. There are obviously vastly different costs attached to each error, and these will determine the levels
chosen.
(d) If the slope in a regression function is zero, then there is no relationship between the two variables
involved. Hence testing for the significance of the regression slope is the same as testing whether or not
the regression $R^2$ is zero.
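The arithmetic in 2(a) can be retraced with a short script. This is a minimal sketch that uses only the sums given in the question; the function name `homoskedastic_slope_test` is an illustrative choice, not something from the textbook. Because it recomputes the SSR from the sums instead of reusing the rounded Chapter 4 result, the last digits can differ slightly from the printed figures.

```python
import math

def homoskedastic_slope_test(n, sum_x2, sum_y2, sum_xy):
    """Return (slope, SER, se_slope, t) from sums of deviations from means."""
    slope = sum_xy / sum_x2              # OLS slope = sum(x*y) / sum(x^2)
    ess = slope ** 2 * sum_x2            # explained sum of squares
    ssr = sum_y2 - ess                   # SSR = TSS - ESS
    ser = math.sqrt(ssr / (n - 2))       # standard error of the regression
    se_slope = ser / math.sqrt(sum_x2)   # homoskedasticity-only SE of the slope
    return slope, ser, se_slope, slope / se_slope

slope, ser, se_slope, t = homoskedastic_slope_test(
    n=110, sum_x2=1248.9, sum_y2=94228.8, sum_xy=7625.9
)
print(f"slope={slope:.3f}  SER={ser:.2f}  SE={se_slope:.3f}  t={t:.2f}")
```

The t-statistic is far above any conventional critical value, so the conclusion does not depend on the rounding.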
3) You have obtained measurements of height in inches of 29 female and 81 male students (Studenth) at your
university. A regression of the height on a constant and a binary variable (BFemme), which takes a value of one
for females and is zero otherwise, yields the following result:
$\widehat{Studenth} = 71.0 - 4.84 \times BFemme$, $R^2 = 0.40$, SER = 2.0
(0.3) (0.57)
(a) What is the interpretation of the intercept? What is the interpretation of the slope? How tall are females, on
average?
(b) Test the hypothesis that females, on average, are shorter than males, at the 1% level.
(c) Is it likely that the error term is homoskedastic here?
Answer: (a) The intercept gives you the average height of males, which is 71 inches in this sample. The slope tells
you by how much shorter females are, on average (almost 5 inches). The average height of females is
therefore approximately 66 inches.
(b) The t-statistic for the difference in means is -8.49. For a one-sided test at the 1% level, the critical value is -2.33.
Since -8.49 < -2.33, the difference is statistically significant.
(c) It is safer to assume that the variances for males and females are different. In the underlying sample
the standard deviation for females was smaller.
4) (Continuation from Chapter 4, number 3) You have obtained a subsample of 1744 individuals from the
Current Population Survey (CPS) and are interested in the relationship between weekly earnings and age. The
regression, using heteroskedasticity-robust standard errors, yielded the following result:
$\widehat{Earn} = 239.16 + 5.20 \times Age$, $R^2 = 0.05$, SER = 287.21,
(20.24) (0.57)
where Earn and Age are measured in dollars and years respectively.
(a) Is the relationship between Age and Earn statistically significant?
(b) The variance of the error term and the variance of the dependent variable are related. Given the distribution
of earnings, do you think it is plausible that the distribution of errors is normal?
(c) Construct a 95% confidence interval for both the slope and the intercept.
Answer: (a) The t-statistic on the slope is 9.12, which is above the critical value from the standard normal
distribution for any reasonable level of significance.
(b) Since the earnings distribution is highly skewed, it is not reasonable to assume that the error
distribution is normal.
(c) The confidence interval for the slope is (4.08,6.32). The confidence interval for the intercept is
(199.49,278.83).
5) (Continuation from Chapter 4, number 5) You have learned in one of your economics courses that one of the
determinants of per capita income (the Wealth of Nations) is the population growth rate. Furthermore you
also found out that the Penn World Tables contain income and population data for 104 countries of the world.
To test this theory, you regress the GDP per worker (relative to the United States) in 1990 (RelPersInc) on the
difference between the average population growth rate of that country ($n$) and the U.S. average population
growth rate ($n_{us}$) for the years 1980 to 1990. This results in the following regression output:
$\widehat{RelPersInc} = 0.518 - 18.831 \times (n - n_{us})$, $R^2 = 0.522$, SER = 0.197
(0.056) (3.177)
(a) Is there any reason to believe that the variance of the error terms is homoskedastic?
(b) Is the relationship statistically significant?
Answer: (a) There are vast differences in the size of these countries, both in terms of the population and GDP.
Furthermore, the countries are at different stages of economic and institutional development. Other
factors vary as well. It would therefore be odd to assume that the errors would be homoskedastic.
(b) The t-statistic is 5.93 in absolute value, making the relationship statistically significant, i.e., we can reject the null
hypothesis that the slope is zero.
6) You recall from one of your earlier lectures in macroeconomics that the per capita income depends on the
savings rate of the country: those who save more end up with a higher standard of living. To test this theory,
you collect data from the Penn World Tables on GDP per worker relative to the United States (RelProd) in 1990
and the average investment share of GDP from 1980-1990 (SK), remembering that investment equals saving.
The regression results in the following output:
$\widehat{RelProd} = -0.08 + 2.44 \times SK$, $R^2 = 0.46$, SER = 0.21
(0.04) (0.38)
(a) Interpret the regression results carefully.
(b) Calculate the t-statistics to determine whether the two coefficients are significantly different from zero.
Justify the use of a one-sided or two-sided test.
(c) You accidentally forget to use the heteroskedasticity-robust standard errors option in your regression
package and estimate the equation using homoskedasticity-only standard errors. This changes the results as
follows:
$\widehat{RelProd} = -0.08 + 2.44 \times SK$, $R^2 = 0.46$, SER = 0.21
(0.04) (0.26)
You are delighted to find that the coefficients have not changed at all and that your results have become even
more significant. Why haven't the coefficients changed? Are the results really more significant? Explain.
(d) Upon reflection you think about the advantages of OLS with and without homoskedasticity-only standard
errors. What are these advantages? Is it likely that the error terms would be heteroskedastic in this situation?
Answer: (a) An increase in the saving rate of 0.1, or from 0.15 to 0.25, results in an increase in relative GDP per
worker of 0.244, or from 0.5 to roughly 0.75. (Taiwan had a value of 0.5 for RelProd in 1990, while
Sweden was at 0.77.) There is no interpretation for the intercept. The regression explains 46 percent of
the variation in GDP per worker relative to the United States.
(b) The t-statistics are 2.00 (in absolute value) and 6.42 for the intercept and slope respectively. You should use a two-sided
test for the intercept, since there are no prior expectations on whether it should be positive or negative.
Hence the intercept is statistically significant at the 5 percent level, but not at the 1 percent level. Since
we expect a positive sign on the slope, we should conduct a one-sided test. The t-statistic exceeds the
critical value at any reasonable size of the test.
(c) Whether you use homoskedasticity-only or heteroskedasticity-robust standard errors does not affect
the estimator, only the formula for the standard errors. If the assumption of homoskedasticity was valid,
then the results would be more significant. However, given the lengthy discussion on homoskedasticity
versus heteroskedasticity in the textbook, it is safer to conduct inference under the assumption of
heteroskedasticity.
(d) In the presence of homoskedasticity in addition to the least squares assumptions in the text, OLS is
BLUE (Gauss-Markov theorem). If the errors are heteroskedastic, then the GLS estimator (weighted least
squares) is BLUE if the form of heteroskedasticity is known, which rarely occurs in practice. Since
economic theory does not suggest, in general, that errors are homoskedastic, it is safer to assume that
they are not. This avoids invalid statistical inference.
7) Carefully discuss the advantages of using heteroskedasticity-robust standard errors over standard errors
calculated under the assumption of homoskedasticity. Give at least five examples where it is very plausible to
assume that the errors display heteroskedasticity.
Answer: There are virtually no examples where economic theory suggests that the errors are homoskedastic.
Hence the maintained hypothesis should be that they are heteroskedastic. Using homoskedasticity-only
standard errors when the errors are in truth heteroskedastic results in false
inference. What makes this worse is that homoskedasticity-only standard errors are typically smaller
than heteroskedasticity-robust standard errors, resulting in t-statistics that are too large, and hence
rejection of the null hypothesis too often. There is an alternative GLS estimator, weighted least squares,
which is BLUE, but requires knowledge of how the error variance depends on X, e.g., on $X$ or $X^2$. Answers
will vary by student regarding the examples, but earnings functions, cross-country beta-convergence
regressions, consumption functions, sports regressions involving teams from markets with varying
population size, weight-height relationships for children, etc., are all good candidates.
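A small simulation can make the gap between the two standard-error formulas concrete. The sketch below is an illustration under stated assumptions, not the textbook's own code: the data-generating process (intercept 2, slope 0.5, error standard deviation proportional to $X^2$), the seed, and the function name `ols_standard_errors` are all made up for the example. The robust formula follows the single-regressor form with the $n-2$ degrees-of-freedom adjustment.

```python
import math
import random

def ols_standard_errors(x, y):
    """Return (slope, homoskedasticity-only SE, heteroskedasticity-robust SE)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Homoskedasticity-only: s_u^2 / sum((X - X_bar)^2)
    s2_u = sum(u * u for u in resid) / (n - 2)
    se_homo = math.sqrt(s2_u / sxx)
    # Robust: (1/n) * [(1/(n-2)) * sum(dx^2 * u^2)] / [(1/n) * sum(dx^2)]^2
    meat = sum((d * u) ** 2 for d, u in zip(dx, resid)) / (n - 2)
    se_robust = math.sqrt(meat / n / (sxx / n) ** 2)
    return slope, se_homo, se_robust

random.seed(0)
n = 5000
x = [random.uniform(0, 10) for _ in range(n)]
# Hypothetical design: the error standard deviation grows with X (heteroskedasticity).
y = [2.0 + 0.5 * xi + random.gauss(0, 1) * xi ** 2 / 10 for xi in x]

slope, se_homo, se_robust = ols_standard_errors(x, y)
print(f"slope={slope:.3f}  homoskedasticity-only SE={se_homo:.4f}  robust SE={se_robust:.4f}")
```

In this design the homoskedasticity-only standard error understates the sampling uncertainty of the slope, which is the situation the answer warns about; other designs can produce the opposite ordering.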
8) (Requires Appendix material from Chapters 4 and 5) Shortly before you are making a group presentation on
the testscore/student-teacher ratio results, you realize that one of your peers forgot to type all the relevant
information on one of your slides. Here is what you see:
$\widehat{TestScore} = 698.9 - STR$
(9.47) (0.48)
In addition, your group member explains that he ran the regression in a standard spreadsheet program, and
that, as a result, the standard errors in parentheses are homoskedasticity-only standard errors.
(a) Find the value for the slope coefficient.
(b) Calculate the t-statistic for the slope and the intercept. Test the hypothesis that the intercept and the slope
are different from zero.
(c) Should you be concerned that your group member only gave you the result for the homoskedasticity-only
standard error formula, instead of using the heteroskedasticity-robust standard errors?
Answer: (a) The relationship between the slope coefficient and the regression $R^2$ is
$$R^2 = \frac{ESS}{TSS} = \frac{\hat{\beta}_1^2 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} y_i^2}, \quad \text{so that} \quad \hat{\beta}_1^2 = R^2 \, \frac{\sum_{i=1}^{n} y_i^2}{\sum_{i=1}^{n} x_i^2}.$$
Given the information above, you need to find the TSS ($= \sum_{i=1}^{n} y_i^2$) and $\sum_{i=1}^{n} x_i^2$. The TSS is relatively
easy to find: the SER is 18.6, and hence the SSR is 144,315.5. (Recall that $SER = s_{\hat{u}} = \sqrt{\frac{1}{n-2} SSR}$, so that
$SSR = \sum_{i=1}^{n} \hat{u}_i^2 = (n-2) \, SER^2$.) This allows you to calculate the TSS, which is 152,109.6. (Recall that
$R^2 = 1 - \frac{SSR}{TSS}$, so that $TSS = \frac{SSR}{1 - R^2}$.)
To find $\sum_{i=1}^{n} x_i^2$, note that the homoskedasticity-only standard error for the slope is
$s_{\hat{\beta}_1} = \sqrt{\frac{s_{\hat{u}}^2}{\sum_{i=1}^{n} x_i^2}}$. Hence, $\sum_{i=1}^{n} x_i^2 = \left( \frac{SER}{s_{\hat{\beta}_1}} \right)^2 \approx 38.72^2 \approx 1{,}499.6$.
Inserting these results into the above formula, you get
$\hat{\beta}_1^2 = 0.051 \times \frac{152{,}109.6}{1{,}499.6} = 5.20$, or $\hat{\beta}_1 = -2.28$ (luckily for you, your group member entered the negative
Country         Currency    Price of Big Mac
Indonesia       Rupiah      14,500
Italy           Lira        4,500
South Korea     Won         3,000
Chile           Peso        1,260
Spain           Peseta      375
Hungary         Forint      339
Japan           Yen         294
Taiwan          Dollar      70
Thailand        Baht        55
Czech Rep.      Crown       54.37
Russia          Ruble       39.50
Denmark         Crown       24.75
Sweden          Crown       24.0
Mexico          Peso        20.9
France          Franc       18.5
Israel          Shekel      14.5
China           Yuan        9.90
South Africa    Rand        9.0
Switzerland     Franc       5.90
Poland          Zloty       5.50
Germany         Mark        4.99
Malaysia        Dollar      4.52
New Zealand     Dollar      3.40
Singapore       Dollar      3.20
Brazil          Real        2.95
Canada          Dollar      2.85
Australia       Dollar      2.59
Argentina       Peso        2.50
Britain         Pound       1.90
United States   Dollar      2.51
The concept of purchasing power parity or PPP (the idea that similar foreign and domestic goods should
have the same price in terms of the same currency, Abel, A. and B. Bernanke, Macroeconomics, 4th edition,
Boston: Addison Wesley, 476) suggests that the ratio of the Big Mac priced in the local currency to the U.S.
dollar price should equal the exchange rate between the two countries.
After entering the data into your spreadsheet program, you calculate the predicted exchange rate per U.S.
dollar by dividing the price of a Big Mac in local currency by the U.S. price of a Big Mac ($2.51). To test for PPP,
you regress the actual exchange rate on the predicted exchange rate.
The estimated regression is as follows:
$\widehat{ActualExRate} = -27.05 + 1.35 \times PredExRate$
(23.74) (0.02)
(a) Your spreadsheet program does not allow you to calculate heteroskedasticity-robust standard errors.
Instead, the numbers in parentheses are homoskedasticity-only standard errors. State the two null hypotheses
under which PPP holds. Should you use a one-tailed or two-tailed alternative hypothesis?
(b) Calculate the two t-statistics.
(c) Using a 5% significance level, what is your decision regarding the null hypothesis given the two t-statistics?
What critical values did you use? Are you concerned with the fact that you are testing the two hypotheses
sequentially when they are supposed to hold simultaneously?
(d) What assumptions had to be made for you to use Student's t-distribution?
Answer: (a) Under PPP, $H_0: \beta_0 = 0$ and $H_0: \beta_1 = 1$. Economic theory does not tell you whether the intercept
should be greater or less than zero if PPP does not hold. The same goes for the slope, i.e., you do not
know whether it is less than or greater than unity. As a result, you should use a two-tailed
alternative hypothesis.
(b) The t-statistic for the intercept is $t = \frac{-27.05 - 0}{23.74} = -1.14$. For the slope, it is $t = \frac{1.35 - 1}{0.02} = 17.5$.
(c) Using the Student t-distribution and 27 degrees of freedom, the critical value for a two-sided
alternative is 2.05. Hence you can reject the null hypothesis for the slope but not for the intercept. Under
PPP, both hypotheses are supposed to hold simultaneously, and if either or both are rejected, then PPP is
not supported by the data. As is discussed later in the textbook, testing hypotheses sequentially is not the
same as testing them simultaneously, since p-values change. (At an intuitive level, and heroically assuming
independence here, $\Pr(A \text{ and } B) = \Pr(A) \times \Pr(B)$; hence the rejection probability needs to be adjusted.)
(d) In addition to the standard three least squares assumptions, you had to assume that the regression
errors are homoskedastic, and that the regression errors are normally distributed. That is, you had to
assume that the homoskedastic normal regression assumptions hold.
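The two PPP tests can be checked with a few lines of code. This is a sketch rather than a prescribed method: the helper name `t_statistic` is illustrative, the intercept is taken to be -27.05 as in the answer to (b), and the critical value 2.05 is hard-coded from the t-table (27 degrees of freedom, two-sided 5% level).

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimate - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error

CRITICAL_VALUE = 2.05  # two-sided 5% value for 27 degrees of freedom, from the t-table

tests = {
    "intercept (H0: beta0 = 0)": t_statistic(-27.05, 0.0, 23.74),
    "slope (H0: beta1 = 1)": t_statistic(1.35, 1.0, 0.02),
}
for name, t in tests.items():
    decision = "reject" if abs(t) > CRITICAL_VALUE else "do not reject"
    print(f"{name}: t = {t:.2f} -> {decision}")
```

Note that the slope is tested against 1, not 0, because PPP predicts a one-for-one relationship between the predicted and the actual exchange rate.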
10) (Continuation from Chapter 4, number 6) The neoclassical growth model predicts that for identical savings
rates and population growth rates, countries should converge to the same per capita income level. This is referred to
as the convergence hypothesis. One way to test for the presence of convergence is to compare the growth rates
over time to the initial starting level.
(a) The results of the regression for 104 countries were as follows:
$\widehat{g6090} = 0.019 - 0.0006 \times RelProd60$, $R^2 = 0.00007$, SER = 0.016
(0.004) (0.0073)
where g6090 is the average annual growth rate of GDP per worker for the 1960-1990 sample period, and
RelProd60 is GDP per worker relative to the United States in 1960. Numbers in parentheses are
heteroskedasticity-robust standard errors.
Using the OLS estimator with homoskedasticity-only standard errors, the results changed as follows:
$\widehat{g6090} = 0.019 - 0.0006 \times RelProd60$, $R^2 = 0.00007$, SER = 0.016
(0.002) (0.0068)
Why didn't the estimated coefficients change? Given that the standard error of the slope is now smaller, can
you reject the null hypothesis of no beta convergence? Are the results in the second equation more reliable than
the results in the first equation? Explain.
(b) You decide to restrict yourself to the 24 OECD countries in the sample. This changes your regression output
as follows (numbers in parentheses are heteroskedasticity-robust standard errors):
$\widehat{g6090} = 0.048 - 0.0404 \times RelProd60$, $R^2 = 0.82$, SER = 0.0046
(0.004) (0.0063)
Test for evidence of convergence now. If your conclusion is different than in (a), speculate why this is the case.
(c) The authors of your textbook have informed you that unless you have more than 100 observations, it may
not be plausible to assume that the distribution of your OLS estimators is normal. What are the implications
here for testing the significance of your theory?
Answer: (a) Using homoskedasticity-only standard errors has no effect on the OLS estimator. The t-statistic
remains small and is certainly below the critical value. The results are less reliable since there is no
reason to believe that the error variance is homoskedastic.
(b) The t-statistic for the slope is 6.41 in absolute value. At face value, there is strong evidence for convergence.
Neoclassical growth theory does not predict unconditional convergence. Instead it only predicts
convergence if the savings rates and population growth rates are identical. It stands to reason that these
are much more similar between OECD countries than between the countries of the world.
(c) Since there are fewer than 30 observations, the distribution of the t-statistic is unknown. You should
therefore not conduct statistical inference.
11) You have collected 14,925 observations from the Current Population Survey. There are 6,285 females in the
sample, and 8,640 males. The females report a mean of average hourly earnings of $16.50 with a standard
deviation of $9.06. The males have an average of $20.09 and a standard deviation of $10.85. The overall mean
average hourly earnings is $18.58.
a. Using the t-statistic for testing differences between two means (section 3.4 of your textbook), decide
whether or not there is sufficient evidence to reject the null hypothesis that females and males have
identical average hourly earnings.
b. You decide to run two regressions: first, you simply regress average hourly earnings on an intercept
only. Next, you repeat this regression, but only for the 6,285 females in the sample. What will the
regression coefficients be in each of the two regressions?
c. Finally you run a regression over the entire sample of average hourly earnings on an intercept and a
binary variable DFemme, where this variable takes on a value of 1 if the individual is a female, and is 0
otherwise. What will be the value of the intercept? What will be the value of the coefficient of the
binary variable?
d. What is the standard error on the slope coefficient? What is the t-statistic?
e. Had you used the homoskedasticity-only standard error in (d) and calculated the t-statistic, how
would you have had to change the test-statistic in (a) to get the identical result?
Answer: a. $H_0: \mu_F = \mu_M$; $H_1: \mu_F \neq \mu_M$.
$$t = \frac{20.09 - 16.50}{\sqrt{\frac{10.85^2}{8640} + \frac{9.06^2}{6285}}} = 21.98.$$
As a result, you can comfortably reject the null hypothesis at any reasonable
confidence level.
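The difference-in-means statistic in part a. is easy to verify numerically. The sketch below assumes the unequal-variance formula from section 3.4 of the textbook; the function name `diff_in_means_t` is an illustrative choice.

```python
import math

def diff_in_means_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """t-statistic for H0: the two population means are equal (unequal variances)."""
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se

# Males: mean 20.09, s.d. 10.85, n = 8,640. Females: mean 16.50, s.d. 9.06, n = 6,285.
t = diff_in_means_t(20.09, 10.85, 8640, 16.50, 9.06, 6285)
print(f"t = {t:.2f}")

# Consistency check on the inputs: the weighted group means reproduce the overall mean.
overall = (8640 * 20.09 + 6285 * 16.50) / (8640 + 6285)
print(f"overall mean = {overall:.2f}")
```

The weighted average of the two group means matches the overall mean of $18.58 stated in the question, which confirms that the female mean is $16.50.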
1) In order to decide whether the alternative hypothesis should be one-sided or two-sided, you need some
guidance from economic theory. Choose at least three examples from economics or other fields where you have
a clear idea what the null hypothesis and the alternative hypothesis for the slope coefficient should be. Write a
brief justification for your answer.
Answer: Answers will vary by student. The problem is to find examples where there is only a single explanatory
variable. A student may argue that the price coefficient in a demand function is downward sloping, but
unless you control for other variables, this may not be so. The demand for L.A. Laker tickets and their
price comes to mind. CAPM is a nice example. Perhaps the marginal propensity to consume in a
consumption function is another. Testing for speculative efficiency in exchange rate markets may also
work.
2) For the following estimated slope coefficients and their heteroskedasticity-robust standard errors, find the
t-statistics for the null hypothesis $H_0: \beta_1 = 0$. Assuming that your sample has more than 100 observations,
indicate whether or not you are able to reject the null hypothesis at the 10%, 5%, and 1% level of a one-sided
and two-sided hypothesis.
3) Explain carefully the relationship between a confidence interval, a one-sided hypothesis test, and a two-sided
hypothesis test. What is the unit of measurement of the t-statistic?
Answer: In the case of a two-sided hypothesis test, the relationship between the t-statistic and the confidence
interval is straightforward. The t-statistic calculates the distance between the estimate and the
hypothesized value in standard deviations. If the distance is larger than 1.96 (size of the test: 5%), then
the distance is large enough to reject the null hypothesis. The confidence interval adds and subtracts 1.96
standard deviations in this case, and asks whether or not the hypothesized value is contained within the
confidence interval. Hence the two concepts resemble the two sides of a coin. They are simply different
ways to look at the same problem. In the case of the one-sided test, the relationship is more complex.
Since you are looking at a one-sided alternative, it does not really make sense to construct a confidence
interval. However, the confidence interval results in the same conclusion as the t-test if the critical value
from the standard normal distribution is appropriately adjusted, e.g. to 10% rather than 5%. The unit of
measurement of the t-statistic is standard deviations.
4) The effect of decreasing the student-teacher ratio by one is estimated to result in an improvement of the
districtwide score by 2.28 with a standard error of 0.52. Construct a 90% and 99% confidence interval for the
size of the slope coefficient and the corresponding predicted effect of changing the student-teacher ratio by
one. What is the intuition on why the 99% confidence interval is wider than the 90% confidence interval?
Answer: The 90% confidence interval for the slope is calculated as follows:
(2.28 - 1.645 × 0.52, 2.28 + 1.645 × 0.52) = (1.42, 3.14).
The corresponding predicted effect of a unit change in the student-teacher ratio is the same, since the
change in X is 1.
The 99% confidence interval for the slope coefficient and the unit change in the student-teacher ratio is:
(2.28 - 2.58 × 0.52, 2.28 + 2.58 × 0.52) = (0.94, 3.62).
The 99% confidence interval corresponds to a smaller size of the test. This means that you want to be
more certain that the population parameter is contained in the interval, and that requires a larger
interval.
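The two intervals can be reproduced without a table by taking the critical value from the standard normal distribution. This is a minimal sketch using Python's standard library; the function name `confidence_interval` is illustrative, and the large-sample normal approximation is assumed.

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level):
    """Large-sample (normal) two-sided confidence interval for a coefficient."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for 90%, 2.576 for 99%
    return estimate - z * std_error, estimate + z * std_error

for level in (0.90, 0.99):
    lower, upper = confidence_interval(2.28, 0.52, level)
    print(f"{level:.0%} interval: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level pushes the critical value further into the tail, which is exactly why the 99% interval is wider.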
5) Below you are asked to decide on whether or not to use a one-sided alternative or a two-sided alternative
hypothesis for the slope coefficient. Briefly justify your decision.
(a) $\hat{q}_i^{d} = \hat{\beta}_0 + \hat{\beta}_1 p_i$, where $q^d$ is the quantity demanded for a good, and $p$ is its price.
(b) $\hat{p}_i^{actual} = \hat{\beta}_0 + \hat{\beta}_1 p_i^{assess}$, where $p_i^{actual}$ is the actual house price, and $p_i^{assess}$ is the assessed house price.
Answer: (a) You would use a one-sided alternative hypothesis since economic theory suggests that the quantity
demanded and prices are negatively related.
(b) The alternative hypothesis is $H_1: \beta_1 \neq 1$ since assessments could be too large or too small, on
average. You should also test for $H_1: \beta_0 \neq 0$.
(c) You should use a one-sided alternative hypothesis, since economic theory strongly suggests that the
marginal propensity to consume is positive.
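6) (Requires Appendix material) Your textbook shows that OLS is a linear estimator $\hat{\beta}_1 = \sum_{i=1}^{n} \hat{a}_i Y_i$, where
$\hat{a}_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$. For OLS to be conditionally unbiased, the following two conditions must hold:
$\sum_{i=1}^{n} \hat{a}_i = 0$ and $\sum_{i=1}^{n} \hat{a}_i X_i = 1$. Show that this is the case.
Answer:
$$\sum_{i=1}^{n} \hat{a}_i = \sum_{i=1}^{n} \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = 0,$$
since the deviations from the mean sum to zero.
$$\sum_{i=1}^{n} \hat{a}_i X_i = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) X_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = 1.$$
(Note that $\sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X}) = \sum_{i=1}^{n} (X_i - \bar{X}) X_i - \bar{X} \sum_{i=1}^{n} (X_i - \bar{X})$, where the second
term is zero again because of the definition of a mean.)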
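The two conditions on the OLS weights can also be checked numerically on any data set. The following sketch uses made-up values for X and Y (they are not from the textbook) and an illustrative helper named `ols_weights`.

```python
def ols_weights(x):
    """Weights a_i = (X_i - X_bar) / sum_j (X_j - X_bar)^2."""
    x_bar = sum(x) / len(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / sxx for xi in x]

x = [1.0, 2.0, 4.0, 7.0, 11.0]     # arbitrary illustrative regressor values
y = [3.0, 4.5, 9.0, 15.5, 22.0]    # arbitrary illustrative outcomes
a = ols_weights(x)

sum_a = sum(a)                                   # first condition: should be 0
sum_ax = sum(ai * xi for ai, xi in zip(a, x))    # second condition: should be 1
slope = sum(ai * yi for ai, yi in zip(a, y))     # OLS slope as a weighted sum of the Y values
print(sum_a, sum_ax, slope)
```

Up to floating-point error, the first two printed values are 0 and 1 regardless of which X values are used, and the third equals the usual OLS slope.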
7) (Requires Appendix material and Calculus) Equation (5.36) in your textbook derives the conditional variance
for any old conditionally unbiased estimator $\tilde{\beta}_1$ to be $var(\tilde{\beta}_1 | X_1, ..., X_n) = \sigma_u^2 \sum_{i=1}^{n} a_i^2$, where the conditions for
conditional unbiasedness are $\sum_{i=1}^{n} a_i = 0$ and $\sum_{i=1}^{n} a_i X_i = 1$. As an alternative to the BLUE proof presented in
your textbook, you recall from one of your calculus courses that you could minimize the variance subject to the
two constraints, thereby making the variance as small as possible while the constraints are holding. Show that
in doing so you get the OLS weights $\hat{a}_i$. (You may assume that $X_1, ..., X_n$ are nonrandom (fixed over repeated
samples).)
Answer: Minimizing the Lagrangian
$$L = \sigma_u^2 \sum_{i=1}^{n} a_i^2 - \lambda_1 \sum_{i=1}^{n} a_i - \lambda_2 \left( \sum_{i=1}^{n} a_i X_i - 1 \right)$$
with respect to the $n$ weights and the two Lagrange multipliers
results in (n+2) linear equations in (n+2) unknowns. Solving these for the weights, you get
$a_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \hat{a}_i$.
8) Your textbook states that under certain restrictive conditions, the t-statistic has a Student t-distribution with
n-2 degrees of freedom. The loss of two degrees of freedom is the result of OLS forcing two restrictions onto
the data. What are these two conditions, and when did you impose them onto the data set in your derivation of
the OLS estimator?
Answer: $\sum_{i=1}^{n} \hat{u}_i = 0$ and $\sum_{i=1}^{n} \hat{u}_i X_i = 0$. These were the result of minimizing the sum of the
squared prediction errors, i.e., taking the derivatives of the sum of squared prediction mistakes with respect to
$b_0$ and $b_1$ and setting them to zero.
Answer: In deriving the OLS estimator $\hat{\beta}_1$, you minimize the sum of squared prediction mistakes w.r.t. $b_1$ only, not $b_0$ and $b_1$. As a
result, you are only placing one restriction on the data ($\sum_{i=1}^{n} \hat{u}_i X_i = 0$), not two. Hence there are n-1
independent observations. $\sum_{i=1}^{n} \hat{u}_i = 0$ will no longer hold.
10) In many of the cases discussed in your textbook, you test for the significance of the slope at the 5% level. What
is the size of the test? What is the power of the test? Why is the probability of committing a Type II error so
large here?
Answer: The size of the test is the same as the probability of committing a Type I error. It is therefore 5%. If the
alternative hypothesis is vague, as is the case for $H_1: \beta_1 \neq 0$ or $H_1: \beta_1 < 0$ (or $H_1: \beta_1 > 0$), then the
distribution of the alternative hypothesis is located virtually on top of the distribution of the null
hypothesis (it is just marginally moved to the left or the right). As a result, the probability of the Type II
error must be 1-probability of the Type I error. Hence the power of the test is only 5%, which is low.
11) Assume that the homoskedastic normal regression assumptions hold. Using the Student t-distribution, find the
critical value for the following situation:
(a) n=28, 5% significance level, one-sided test.
(b) n=40, 1% significance level, two-sided test.
(c) n=10, 10% significance level, one-sided test.
(d) n=∞, 5% significance level, two-sided test.
Answer: (a) 1.71
(b) between 2.75 (30 degrees of freedom) and 2.66 (60 degrees of freedom)
(c) 1.40
(d) 1.96
12) Consider the following two models involving binary variables as explanatory variables:
$\widehat{Wage} = \hat{\beta}_0 + \hat{\beta}_1 DFemme$ and $\widehat{Wage} = \hat{\gamma}_1 DFemme + \hat{\gamma}_2 Male$
where Wage is the hourly wage rate, DFemme is a binary variable that is equal to 1 if the person is a female, and
0 if the person is a male. Male = 1 - DFemme. Even though you have not learned about regression functions with
two explanatory variables (or regressions without an intercept), assume that you had estimated both models,
i.e., you obtained the estimates for the regression coefficients.
What is the predicted wage for a male in the two models? What is the predicted wage for a female in the two
models? What is the relationship between the $\beta$s and the $\gamma$s? Why would you prefer one model over the other?
Answer: For DFemme = 1, the models read $\widehat{Wage} = \hat{\beta}_0 + \hat{\beta}_1$ and $\widehat{Wage} = \hat{\gamma}_1$. For DFemme = 0, they read
$\widehat{Wage} = \hat{\beta}_0$ and $\widehat{Wage} = \hat{\gamma}_2$. Hence both models yield the same predicted wages, with
$\hat{\gamma}_1 = \hat{\beta}_0 + \hat{\beta}_1$ and $\hat{\gamma}_2 = \hat{\beta}_0$. Since the wage for females is $\hat{\gamma}_1 = \hat{\beta}_0 + \hat{\beta}_1$, and the wage for males is $\hat{\beta}_0$, then $\hat{\beta}_1$ must be the
difference in the wage between males and females. Hence the first formulation allows you to test directly
whether or not the difference in means (here wages) is statistically significant.
13) Consider the sample regression function $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$. The table below lists estimates for the slope ($\hat{\beta}_1$) and
the variance of the slope estimator ($\hat{\sigma}^2_{\hat{\beta}_1}$). In each case calculate the p-value for the null hypothesis of $\beta_1 = 0$
and a two-tailed alternative hypothesis. Indicate in which case you would reject the null hypothesis at the 5%
significance level.
$\hat{\beta}_1$                          -1.76    0.0025      2.85     -0.00014
$\hat{\sigma}^2_{\hat{\beta}_1}$          0.37    0.000003    117.5     0.0000013
Answer: The t-statistics are -2.89, 1.36, 0.26, and -0.123 respectively, with p-values of 0.004, 0.17, 0.79, and 0.90.
Hence you only reject the null hypothesis for the first case.
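The p-values can be computed directly from the large-sample normal approximation. The sketch below rests on two assumptions that should be stated openly: the first slope estimate is taken to be -1.76, which is the sign implied by the answer's t-statistic of -2.89, and the variances are used exactly as printed, so the second t-statistic comes out near 1.44 instead of 1.36 (presumably because the printed variance is rounded). The function name `two_sided_p_value` is illustrative.

```python
import math
from statistics import NormalDist

def two_sided_p_value(estimate, variance):
    """Return (t, p) for H0: beta1 = 0 against a two-tailed alternative."""
    t = estimate / math.sqrt(variance)       # standard error = sqrt(variance)
    return t, 2 * (1 - NormalDist().cdf(abs(t)))

cases = [(-1.76, 0.37), (0.0025, 0.000003), (2.85, 117.5), (-0.00014, 0.0000013)]
results = [two_sided_p_value(b, v) for b, v in cases]
for (b, v), (t, p) in zip(cases, results):
    print(f"beta1_hat={b}: t={t:.2f}, p={p:.3f}, reject at 5%: {p < 0.05}")
```

The rejection decisions do not depend on the rounding: only the first case has a p-value below 0.05.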
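14) Your textbook discussed the regression model when X is a binary variable
$Y_i = \beta_0 + \beta_1 D_i + u_i$, $i = 1, ..., n$
Let Y represent wages, and let D be one for females, and 0 for males. Using the OLS formula for the slope
coefficient, prove that $\hat{\beta}_1$ is the difference between the average wage for males and the average wage for
females.
Answer: Using the OLS formula for the slope, we have
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} = \frac{\sum_{i=1}^{n_f} wage_i - n_f \overline{wage}}{n_f - n \left( \frac{n_f}{n} \right)^2},$$
where $n_f$ is the number of females and $\overline{wage}$ is the average wage. Dividing both the numerator and the denominator by $n_f$, we get
$$\hat{\beta}_1 = \frac{\frac{1}{n_f} \sum_{i=1}^{n_f} wage_i - \overline{wage}}{1 - \frac{n_f}{n}} = \frac{\overline{wage}_f - \overline{wage}}{\frac{n - n_f}{n}} = \frac{n}{n - n_f} (\overline{wage}_f - \overline{wage}),$$
where $\overline{wage}_f$ is the average wage of
females. But note that $\overline{wage} = \frac{n_f}{n} \overline{wage}_f + \frac{n_m}{n} \overline{wage}_m$, where the m subscript indicates males. Substitution
of this expression for average wages into the previous expression results in
$$\hat{\beta}_1 = \frac{n}{n - n_f} (\overline{wage}_f - \overline{wage}) = \frac{n}{n - n_f} \left( \overline{wage}_f - \frac{n_f}{n} \overline{wage}_f - \frac{n_m}{n} \overline{wage}_m \right) = \frac{n}{n - n_f} \cdot \frac{n_m}{n} \left( \overline{wage}_f - \overline{wage}_m \right) = \overline{wage}_f - \overline{wage}_m,$$
since $n - n_f = n_m$.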
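15) Your textbook discussed the regression model when X is a binary variable
$Y_i = \beta_0 + \beta_1 D_i + u_i$, $i = 1, ..., n$
Let Y represent wages, and let D be one for females, and 0 for males. Using the OLS formula for the intercept
coefficient, prove that $\hat{\beta}_0$ is the average wage for males.
Answer: $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$. It is easy but tedious to show that the formula for the slope reduces to the difference
between the average wage for females and the average wage for males:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2} = \frac{\sum_{i=1}^{n_f} wage_i - n_f \overline{wage}}{n_f - n \left( \frac{n_f}{n} \right)^2} = \overline{wage}_f - \overline{wage}_m.$$
But $\bar{Y} = \overline{wage}$ and $\bar{X} = \frac{n_f}{n}$, and hence
$$\hat{\beta}_0 = \overline{wage} - (\overline{wage}_f - \overline{wage}_m) \frac{n_f}{n}.$$
Substituting the expression $\overline{wage} = \frac{n_f}{n} \overline{wage}_f + \frac{n_m}{n} \overline{wage}_m$ then results in
$\hat{\beta}_0 = \frac{n_f}{n} \overline{wage}_m + \frac{n_m}{n} \overline{wage}_m$, which equals the male average wage.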
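The algebra in the two proofs above can be confirmed on a small example. This is a sketch with hypothetical wage data (seven made-up observations) and an illustrative helper `ols`; it applies the same slope formula used in the answers and compares the estimates with the group means.

```python
def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / (
        sum(xi * xi for xi in x) - n * x_bar ** 2
    )
    return y_bar - slope * x_bar, slope

# Hypothetical hourly wages; d = 1 for females, 0 for males.
wages = [12.0, 15.0, 18.0, 20.0, 22.0, 27.0, 16.0]
d = [1, 1, 1, 0, 0, 0, 0]

intercept, slope = ols(d, wages)
mean_f = sum(w for w, di in zip(wages, d) if di == 1) / d.count(1)
mean_m = sum(w for w, di in zip(wages, d) if di == 0) / d.count(0)
print(intercept, mean_m)          # intercept vs. male average wage
print(slope, mean_f - mean_m)     # slope vs. female-minus-male difference
```

Each printed pair agrees up to floating-point error, so the intercept is the male average and the slope is the female average minus the male average.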
n
n
16) Let $u_i$ be distributed $N(0, \sigma_u^2)$, i.e., the errors are distributed normally with a constant variance
(homoskedasticity). This results in $\hat{\beta}_1$ being distributed $N(\beta_1, \sigma^2_{\hat{\beta}_1})$, where $\sigma^2_{\hat{\beta}_1} = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$. Statistical
inference would be straightforward if $\sigma_u^2$ was known. One way to deal with this problem is to replace $\sigma_u^2$
with an estimator $s_{\hat{u}}^2$. Clearly since this introduces more uncertainty, you cannot expect $\hat{\beta}_1$ to be still normally
distributed. Indeed, the t-statistic now follows Student's t distribution. Look at the table for the Student
t-distribution and focus on the 5% two-sided significance level. List the critical values for 10 degrees of
freedom, 30 degrees of freedom, 60 degrees of freedom, and finally ∞ degrees of freedom. Describe how the
notion of uncertainty about $\sigma_u^2$ can be incorporated into the tails of the t-distribution as the degrees of
freedom increase.
Answer: More uncertainty implies that the tails of the distribution should be stretched further to the left and right
when compared to the normal distribution. Hence the critical values for the 5% significance level should
be greater than 1.96 in absolute levels. However, as the number of observations (degrees of freedom)
increases, $s_{\hat{u}}^2$ will converge towards $\sigma_u^2$, so that the shape of the t-distribution should resemble the
normal distribution more and more. Finally, when there are infinite degrees of freedom, the sample
formula $s_{\hat{u}}^2$ becomes the population variance, and the t-distribution should converge to the normal
distribution.
17) In a Monte Carlo study, econometricians generate multiple sample regression functions from a known
population regression function. For example, the population regression function could be $Y_i = \beta_0 + \beta_1 X_i = 100 - 0.5 X_i$.
The Xs could be generated randomly or, for simplicity, be nonrandom (fixed over repeated samples).
If we had ten of these Xs, say, and generated twenty Ys, we would obviously always have all observations on a
straight line, and the least squares formulae would always return values of 100 and 0.5 numerically. However,
if we added an error term, where the errors would be drawn randomly from a normal distribution, say, then
the OLS formulae would give us estimates that differed from the population regression function values.
Assume you did just that and recorded the values for the slope and the intercept. Then you did the same
experiment again (each one of these is called a replication). And so forth. After 1,000 replications, you plot
the 1,000 intercepts and slopes, and list their summary statistics.
Sample: 1 1000
                  BETA0_HAT    BETA1_HAT
Mean                100.014       -0.500
Median              100.021       -0.500
Maximum             106.348       -0.468
Minimum              93.862       -0.538
Std. Dev.             1.994        0.011
Skewness              0.013        0.042
Kurtosis              3.026        2.986
Jarque-Bera           0.055        0.305
Probability           0.973        0.858
Sum              100014.353     -499.857
Sum Sq. Dev.       3972.403        0.118
Observations       1000.000     1000.000
Using the means listed next to the graphs, you see that the averages are not exactly 100 and -0.5. However,
they are close. Test whether the differences between these averages and the population values are statistically
significant.
Answer: You can use a simple t-statistic to test whether or not -0.499857 and 100.0144 are statistically
different from -0.5 and 100. In the denominator of that statistic you would simply put the standard
deviations (0.0109 and 1.9941) divided by the square root of 1,000. As you can see,
$t = \frac{-0.499857 - (-0.50)}{0.0109/\sqrt{1000}} = 0.41$ and $t = \frac{100.0144 - 100}{1.9941/\sqrt{1000}} = 0.23$.
Neither one of the estimators is more than 1.96 standard errors from the true value, and hence you cannot
reject the null hypothesis that the estimators are unbiased.
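The two test statistics can be evaluated directly from the summary statistics. This is a small Python sketch; the helper name `t_stat` is made up for illustration.

```python
import math

def t_stat(sample_mean, true_value, std_dev, replications):
    """t-statistic for H0: the mean of the estimates equals the true value."""
    return (sample_mean - true_value) / (std_dev / math.sqrt(replications))

t_slope = t_stat(-0.499857, -0.5, 0.0109, 1000)
t_intercept = t_stat(100.0144, 100.0, 1.9941, 1000)

print(round(t_slope, 2), round(t_intercept, 2))
```

Both values are well below 1.96 in absolute value, so unbiasedness is not rejected.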
18) In the regression through the origin model $Y_i = \beta_1 X_i + u_i$, the OLS estimator is
$\hat\beta_1 = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2}$. Prove that the
estimator is a linear function of $Y_1, \dots, Y_n$ and prove that it is conditionally unbiased.
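Answer: Let $w_i = \frac{X_i}{\sum_{j=1}^{n} X_j^2}$, then $\hat\beta_1 = \sum_{i=1}^{n} w_i Y_i$. Hence the OLS estimator is a linear function of
$Y_1, \dots, Y_n$. Next, since $Y_i = \beta_1 X_i + u_i$, we get
$$\hat\beta_1 = \sum_{i=1}^{n} w_i(\beta_1 X_i + u_i) = \beta_1 \sum_{i=1}^{n} w_i X_i + \sum_{i=1}^{n} w_i u_i.$$
The definition of $w_i$ gives $\sum_{i=1}^{n} w_i X_i = \frac{\sum_{i=1}^{n} X_i^2}{\sum_{i=1}^{n} X_i^2} = 1$, which implies
$\hat\beta_1 = \beta_1 + \sum_{i=1}^{n} w_i u_i$. Taking expectations on both sides, we find
$$E(\hat\beta_1) = \beta_1 + E\left(\sum_{i=1}^{n} w_i u_i\right) = \beta_1 + E\left[\frac{\frac{1}{n}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2}\right] = \beta_1 + E\left[\frac{\frac{1}{n}\sum_{i=1}^{n} X_i E(u_i \mid X_1, \dots, X_n)}{\frac{1}{n}\sum_{i=1}^{n} X_i^2}\right] = \beta_1.$$
The last equality follows by using the law of iterated expectations. By least squares assumptions, $u_i$ is
distributed independently of X for all observations other than i, so $E(u_i \mid X_1, \dots, X_n) = E(u_i \mid X_i) = 0$. Hence
$E(\hat\beta_1 \mid X_1, \dots, X_n) = \beta_1$.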
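The two algebraic steps of the proof (linearity in the Ys, and the decomposition into the true slope plus a weighted sum of errors) can be verified on a toy example. The numbers below are hypothetical; the "errors" are fixed by hand so the check is deterministic.

```python
# Hypothetical true slope, regressors, and hand-picked errors
beta1 = 2.0
X = [1.0, 2.0, 3.0, 4.0, 5.0]
u = [0.3, -0.1, 0.4, -0.5, 0.2]
Y = [beta1 * x + e for x, e in zip(X, u)]

sum_x2 = sum(x * x for x in X)
w = [x / sum_x2 for x in X]  # weights w_i = X_i / sum of X_j^2

# OLS estimator for regression through the origin
beta1_hat = sum(x * y for x, y in zip(X, Y)) / sum_x2

# Same estimator written as a linear function of Y_1, ..., Y_n
as_linear_combination = sum(wi * yi for wi, yi in zip(w, Y))

# Decomposition: true slope plus weighted sum of errors
decomposition = beta1 + sum(wi * ui for wi, ui in zip(w, u))

print(beta1_hat, as_linear_combination, decomposition)
```

All three numbers agree up to floating-point error, and the weights satisfy sum(w_i X_i) = 1.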
19) The neoclassical growth model predicts that for identical savings rates and population growth rates, countries
should converge to the same per capita income level. This is referred to as the convergence hypothesis. One way to
test for the presence of convergence is to compare the growth rates over time to the initial starting level, i.e., to
run the regression $g6090 = \beta_0 + \beta_1 RelProd_{60}$, where g6090 is the average annual growth rate of GDP per
worker for the 1960-1990 sample period, and $RelProd_{60}$ is GDP per worker relative to the United States in
1960. Under the null hypothesis of no convergence, $\beta_1 = 0$; $H_1: \beta_1 < 0$, implying ("beta") convergence. Using a
standard regression package, you get the following output:
Dependent Variable: G6090
Method: Least Squares
Date: 07/11/06 Time: 05:46
Sample: 1 104
Included observations: 104
White Heteroskedasticity-Consistent Standard Errors & Covariance

Variable               Coefficient     Prob.
C                         0.018989    0.0000
YL60                      0.000566    0.9111

R-squared                 0.000068    Mean dependent var        0.018846
Adjusted R-squared       -0.009735    S.D. dependent var        0.015915
S.E. of regression        0.015992    Akaike info criterion    -5.414418
Sum squared resid         0.026086    Schwarz criterion        -5.363565
Log likelihood            283.5498    F-statistic               0.006986
Durbin-Watson stat        1.367534    Prob(F-statistic)         0.933550
You are delighted to see that this program has already calculated p-values for you. However, a peer of yours
points out that the correct p-value should be 0.4562. Who is right?
Answer: Statistical packages typically do not know what the alternative hypothesis is. As a result, the packages
calculate t-statistics and p-values for $H_1: \beta_1 \neq 0$. You can tell your fellow student that she is right and
you will still have to calculate p-values (and t-statistics) by hand for cases other than $H_1: \beta_1 \neq 0$.
20) Changing the units of measurement obviously will have an effect on the slope of your regression function. For
example, let $Y^* = aY$ and $X^* = bX$. Then it is easy but tedious to show that
$\hat\beta_1^* = \frac{\sum_{i=1}^{n} x_i^* y_i^*}{\sum_{i=1}^{n} x_i^{*2}} = \frac{a}{b}\hat\beta_1$. Given this
result, how do you think the standard errors and the regression $R^2$ will change?
Answer: Statistical inference should not depend on whim, and hence changes in the units of measurement cannot
have an effect on the regression $R^2$. Also, the t-statistics should not change, and hence $SE(\hat\beta_1^*)$ must
change accordingly ($SE(\hat\beta_1^*) = \frac{a}{b} SE(\hat\beta_1)$).
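A quick numerical check of the rescaling result, as a sketch under stated assumptions: the data and scale factors are hypothetical, a and b are taken to be positive, and the standard error is the homoskedasticity-only formula (chosen only for brevity).

```python
import math

def simple_ols(x, y):
    """Return (slope, homoskedasticity-only SE of the slope, R^2)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return slope, se_slope, 1 - ssr / tss

# Hypothetical data and scale factors
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
a, b = 100.0, 0.5

slope, se, r2 = simple_ols(X, Y)
slope_star, se_star, r2_star = simple_ols([b * x for x in X], [a * y for y in Y])

print("slope ratio:", slope_star / slope, "SE ratio:", se_star / se)
print("R^2 before/after:", r2, r2_star)
print("t before/after:", slope / se, slope_star / se_star)
```

The slope and its standard error both scale by a/b, so the t-statistic and the R^2 are unchanged.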
21) Using the California School data set from your textbook, you run the following regression:
TestScr = 698.9 - 2.28 STR
n = 420, SER = 9.4
where TestScore is the average test score in the district and STR is the student-teacher ratio. The sample
standard deviation of test scores is 19.05, and the sample standard deviation of the student-teacher ratio is
1.89.
a. Find the regression $R^2$ and the correlation coefficient between test scores and the student-teacher ratio.
b.
Answer: a. $R^2 = 1 - \frac{SSR}{TSS} = 1 - \frac{144611.3}{152490.6} = 0.051$
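As a rough check of part a, the sketch below recomputes the R^2 from the SSR and TSS figures quoted in the answer and derives the correlation coefficient. It relies on the fact that, with a single regressor, the correlation equals the square root of the R^2 with the sign of the slope.

```python
import math

ssr, tss = 144611.3, 152490.6   # figures quoted in the answer
slope = -2.28                   # estimated coefficient on STR

r_squared = 1 - ssr / tss
# Single-regressor case: |correlation| = sqrt(R^2), sign taken from the slope
correlation = math.copysign(math.sqrt(r_squared), slope)

print(round(r_squared, 3), round(correlation, 3))
```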
where TestScore is the average test score in the district and STR is the student-teacher ratio. Using
heteroskedasticity robust standard errors, you find
a.
b.
Which of the two t-statistics should you base your inference on?
Answer: a. The respective t-statistics are 4.39 (heteroskedasticity-robust standard error) and 4.75
(homoskedasticity-only standard error).
b. Given the similarity of the two statistics and the fact that both are greater than 4, it will not make
much of a difference which one you will use. However, it is cleaner to use the
heteroskedasticity-robust formula, since, in general, it will result in the correct inference
procedure.
23) Using data from the Current Population Survey, you estimate the following relationship between average
hourly earnings (ahe) and the number of years of education (educ):
ahe = -4.58 + 1.71 educ
The heteroskedasticity-robust standard error on the slope is (0.03). Calculate the 95% confidence interval for
the slope. Repeat the exercise using the 90% and then the 99% confidence interval. Can you reject the null
hypothesis that the slope coefficient is zero in the population?
Answer: The 95% confidence interval for the slope is (1.65,1.77). For the 90% confidence level, you get (1.66,1.76)
while the interval is (1.63,1.79) for the 99% level. Since none of the confidence intervals contains zero,
you can comfortably reject the null hypothesis in all three cases.
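The three intervals follow from estimate ± z × SE with the large-sample normal critical values. A minimal Python sketch (the function name is made up for illustration):

```python
from statistics import NormalDist

def confidence_interval(estimate, std_error, level):
    """Two-sided large-sample confidence interval at the given level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error

for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(1.71, 0.03, level)
    print(f"{level:.0%}: ({lower:.2f}, {upper:.2f})")
```

None of the intervals comes anywhere near zero, in line with the answer's conclusion.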
7) (Requires Calculus) In the multiple regression model you estimate the effect on Yi of a unit change in one of
the Xi while holding all other regressors constant. This
A) makes little sense, because in the real world all other variables change.
B) corresponds to the economic principle of mutatis mutandis.
C) leaves the formula for the coefficient in the single explanatory variable case unaffected.
D) corresponds to taking a partial derivative in mathematics.
Answer: D
8) You have to worry about perfect multicollinearity in the multiple regression model because
A) many economic variables are perfectly correlated.
B) the OLS estimator is no longer BLUE.
C) the OLS estimator cannot be computed in this situation.
D) in real life, economic variables change together all the time.
Answer: C
9) In a two regressor regression model, if you exclude one of the relevant variables then
A) it is no longer reasonable to assume that the errors are homoskedastic.
B) OLS is no longer unbiased, but still consistent.
C) you are no longer controlling for the influence of the other variable.
D) the OLS estimator no longer exists.
Answer: C
10) The intercept in the multiple regression model
A) should be excluded if one explanatory variable has negative values.
B) determines the height of the regression line.
C) should be excluded because the population regression function does not go through the origin.
D) is statistically significant if it is larger than 1.96.
Answer: B
11) In the multiple regression model, the least squares estimator is derived by
A) minimizing the sum of squared prediction mistakes.
B) setting the sum of squared errors equal to zero.
C) minimizing the absolute difference of the residuals.
D) forcing the smallest distance between the actual and fitted values.
Answer: A
12) The sample regression line estimated by OLS
A) has an intercept that is equal to zero.
B) is the same as the population regression line.
C) cannot have negative and positive slopes.
D) is the line that minimizes the sum of squared prediction mistakes.
Answer: D
13) The OLS residuals in the multiple regression model
A) cannot be calculated because there is more than one explanatory variable.
B) can be calculated by subtracting the fitted values from the actual values.
C) are zero because the predicted values are another name for forecasted values.
D) are typically the same as the population regression function errors.
Answer: B
14) Under the least squares assumptions for the multiple regression problem (zero conditional mean for the error
term, all Xi and Yi being i.i.d., all Xi and ui having finite fourth moments, no perfect multicollinearity), the OLS
estimators for the slopes and intercept
A) have an exact normal distribution for n > 25.
B) are BLUE.
C) have a normal distribution in small samples as long as the errors are homoskedastic.
D) are unbiased and consistent.
Answer: D
15) The main advantage of using multiple regression analysis over differences in means testing is that the
regression technique
A) allows you to calculate p-values for the significance of your results.
B) provides you with a measure of your goodness of fit.
C) gives you quantitative estimates of a unit change in X.
D) assumes that the error terms are generated from a normal distribution.
Answer: C
16) In a multiple regression framework, the slope coefficient on the regressor X2i
A) takes into account the scale of the error term.
B) is measured in the units of Yi divided by units of X2i.
C) is usually positive.
D) is larger than the coefficient on X1i.
Answer: B
17) One of the least squares assumptions in the multiple regression model is that you have random variables which
are i.i.d. This stands for
A) initially indeterminate differences.
B) irregularly integrated dichotomies.
C) identically initiated deltas (as in changes).
D) independently and identically distributed.
Answer: D
18) Omitted variable bias
A) will always be present as long as the regression R2 < 1.
B) is always there but is negligible in almost all economic examples.
C) exists if the omitted variable is correlated with the included regressor but is not a determinant of the
dependent variable.
D) exists if the omitted variable is correlated with the included regressor and is a determinant of the
dependent variable.
Answer: D
19) The following OLS assumption is most likely violated by omitted variables bias:
A) $E(u_i \mid X_i) = 0$
B) (Xi, Yi) i=1,..., n are i.i.d draws from their joint distribution
C) there are no outliers for Xi, ui
D) there is heteroskedasticity
Answer: A
20) The population multiple regression model when there are two regressors, X1i and X2i can be written as
follows, with the exception of:
A) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, $i = 1, \dots, n$
B) $Y_i = \beta_0 X_{0i} + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$, $X_{0i} = 1$, $i = 1, \dots, n$
C) $Y_i = \sum_{j=0}^{2} \beta_j X_{ji} + u_i$, $i = 1, \dots, n$
D) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$, $i = 1, \dots, n$
Answer: D
21) In the multiple regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$, $i = 1, \dots, n$, the OLS estimators are
obtained by minimizing the sum of
A) squared mistakes in $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki})^2$
B) squared mistakes in $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki} - u_i)^2$
C) absolute mistakes in $\sum_{i=1}^{n} |Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki}|$
D) squared mistakes in $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$
Answer: A
22) In the multiple regression model, the SER is given by
A) $\frac{1}{n-2} \sum_{i=1}^{n} \hat{u}_i$
B) $\frac{1}{n-k-2} \sum_{i=1}^{n} u_i$
C) $\frac{1}{n-k-2} \sum_{i=1}^{n} \hat{u}_i$
D) $\sqrt{\frac{1}{n-k-1} \sum_{i=1}^{n} \hat{u}_i^2}$
Answer: D
23) In multiple regression, the R2 increases whenever a regressor is
A) added unless the coefficient on the added regressor is exactly zero.
B) added.
C) added unless there is heteroskedasticity.
D) greater than 1.96 in absolute value.
Answer: A
$\frac{n-2}{n-k-1} \frac{ESS}{TSS}$
C) $1 - \frac{n-1}{n-k-1} \frac{SSR}{TSS}$
D) $\frac{ESS}{TSS}$
Answer: C
25) Consider the following multiple regression models (a) to (d) below. DFemme = 1 if the individual is a female,
and is zero otherwise; DMale is a binary variable which takes on the value one if the individual is male, and is
zero otherwise; DMarried is a binary variable which is unity for married individuals and is zero otherwise, and
DSingle is (1-DMarried). Regressing weekly earnings (Earn) on a set of explanatory variables, you will
experience perfect multicollinearity in the following cases unless:
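A) $Earn_i = \beta_0 + \beta_1 DFemme + \beta_2 DMale + \beta_3 X_{3i}$
B) $Earn_i = \beta_0 + \beta_1 DMarried + \beta_2 DSingle + \beta_3 X_{3i}$
C) $Earn_i = \beta_0 + \beta_1 DFemme + \beta_3 X_{3i}$
D) $Earn_i = \beta_1 DFemme + \beta_2 DMale + \beta_3 DMarried + \beta_4 DSingle + \beta_5 X_{3i}$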
Answer: C
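Why option A fails can be seen numerically: DFemme and DMale add up to the constant regressor, so the cross-product matrix X'X is singular and the OLS normal equations have no unique solution. The sketch below uses a hypothetical sample of five individuals and leaves out X3 for brevity.

```python
# Hypothetical sample of five individuals
d_femme = [1, 0, 1, 1, 0]
d_male = [1 - d for d in d_femme]
const = [1] * len(d_femme)

def cross_products(columns):
    """X'X for a list of regressor columns."""
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

# Constant, DFemme and DMale together: perfectly collinear
xtx_trap = cross_products([const, d_femme, d_male])
det_trap = det3(xtx_trap)

# Constant and DFemme only: X'X is invertible
xtx_ok = cross_products([const, d_femme])
det_ok = xtx_ok[0][0] * xtx_ok[1][1] - xtx_ok[0][1] * xtx_ok[1][0]

print("det with both dummies:", det_trap)
print("det with one dummy:", det_ok)
```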
26) Consider the multiple regression model with two regressors X1 and X2 , where both variables are determinants
of the dependent variable. When omitting X2 from the regression, then there will be omitted variable bias for $\hat\beta_1$
A) if X1 and X2 are correlated
B) always
C) if X2 is measured in percentages
D) if X2 is a dummy variable
Answer: A
27) The dummy variable trap is an example of
A) imperfect multicollinearity
B) something that is of theoretical interest only
C) perfect multicollinearity
D) something that does not happen to university or college students
Answer: C
28) Imperfect multicollinearity
A) is not relevant to the field of economics and business administration
B) only occurs in the study of finance
C) means that the least squares estimator of the slope is biased
D) means that two or more of the regressors are highly correlated
Answer: D
29) Consider the multiple regression model with two regressors X1 and X2 , where both variables are determinants
of the dependent variable. You first regress Y on X1 only and find no relationship. However when regressing Y
on X1 and X2 , the slope coefficient $\hat\beta_1$ changes by a large amount. This suggests that your first regression
suffers from
A) heteroskedasticity
B) perfect multicollinearity
C) omitted variable bias
D) dummy variable trap
Answer: C
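The situation described in question 29 can be reproduced with a constructed example. In the hypothetical data below, Y is generated exactly as 10 + 4 X1 - 5 X2 (no error term, so the arithmetic is exact), and X1 and X2 are positively correlated. The short regression of Y on X1 alone then shows no relationship, while the two-regressor formulas recover the true coefficients.

```python
# Hypothetical data: Y = 10 + 4*X1 - 5*X2 exactly
X1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X2 = [2.0, 1.0, 4.0, 3.0, 5.0]
Y = [10 + 4 * a - 5 * b for a, b in zip(X1, X2)]

def deviations(v):
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

x1, x2, y = deviations(X1), deviations(X2), deviations(Y)
s11 = sum(a * a for a in x1)
s22 = sum(b * b for b in x2)
s12 = sum(a * b for a, b in zip(x1, x2))
sy1 = sum(c * a for c, a in zip(y, x1))
sy2 = sum(c * b for c, b in zip(y, x2))

# Short regression: Y on X1 only
short_slope = sy1 / s11

# Long regression: Y on X1 and X2 (deviation-from-means formulas)
den = s11 * s22 - s12 ** 2
b1 = (sy1 * s22 - sy2 * s12) / den
b2 = (sy2 * s11 - sy1 * s12) / den

# Omitted variable bias: short slope = b1 + b2 * (slope of X2 on X1)
delta = s12 / s11
print(short_slope, b1, b2, b1 + b2 * delta)
```

The short-regression slope equals the long-regression slope plus the X2 coefficient times the slope of X2 on X1, which is the usual omitted variable bias decomposition.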
30) Imperfect multicollinearity
A) implies that it will be difficult to estimate precisely one or more of the partial effects using the data at
hand
B) violates one of the four Least Squares assumptions in the multiple regression model
C) means that you cannot estimate the effect of at least one of the Xs on Y
D) suggests that a standard spreadsheet program does not have enough power to estimate the multiple
regression model
Answer: A
1) Females, on average, are shorter and weigh less than males. One of your friends, who is a pre-med student,
tells you that in addition, females will weigh less for a given height. To test this hypothesis, you collect height
and weight of 29 female and 81 male students at your university. A regression of the weight on a constant,
height, and a binary variable, which takes a value of one for females and is zero otherwise, yields the following
result:
Studentw = -229.21 - 6.36 Female + 5.58 Height , R2 =0.50, SER = 20.99
where Studentw is weight measured in pounds and Height is measured in inches.
(a) Interpret the results. Does it make sense to have a negative intercept?
(b) You decide that in order to give an interpretation to the intercept you should rescale the height variable.
One possibility is to subtract 5 ft. or 60 inches from your Height, because the minimum height in your data set is
62 inches. The resulting new intercept is now 105.58. Can you interpret this number now? Do you think that the
regression R2 has changed? What about the standard error of the regression?
(c) You have learned that correlation does not imply causation. Although this is true mathematically, does this
always apply?
Answer: (a) For every additional inch in height, weight increases by roughly 5.5 pounds. Female students weigh
approximately 6.5 pounds less than male students, controlling for height. The regression explains 50
percent of the weight variation among students. It does not make sense to interpret the intercept, since
there are no observations close to the origin, or, put differently, there are no individuals who are zero
inches tall.
(b) There are now observations close to the origin and you can therefore interpret the intercept. A
student who is 5 ft. tall will weigh roughly 105.5 pounds, on average. The two slopes will be unaffected,
as will be the regression R2 . Since the explanatory power of the regression is unaffected by rescaling, and
the dependent variable and the total sums of squares have remained unchanged, the sums of squared
residuals, and hence the SER, must also remain the same.
(c) Although true in general, there are cases where Y cannot cause X, as is the case here. Gaining weight
is not a good way for becoming taller, or put differently, weighing 250 pounds will not make students
over 7 ft. tall.
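The new intercept in (b) follows from rewriting the fitted line as b0 + b2 Height = (b0 + 60 b2) + b2 (Height - 60). A short sketch with the rounded coefficients from the question:

```python
intercept, slope_height = -229.21, 5.58
shift = 60  # inches subtracted from Height

# Intercept after replacing Height with (Height - shift)
new_intercept = intercept + slope_height * shift
print(round(new_intercept, 2))
```

With the rounded coefficients this gives about 105.59, which agrees with the reported 105.58 up to rounding.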
2) The cost of attending your college has once again gone up. Although you have been told that education is
investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not
pleased. One of the administrators at your university/college does not make the situation better by telling you
that you pay more because the reputation of your institution is better than that of others. To investigate this
hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000-2001
U.S. News and World Report annual rankings. Next you perform the following regression
Cost = 7,311.17 + 3,985.20 Reputation - 0.20 Size
+ 8,406.79 Dpriv - 416.38 Dlibart - 2,376.51 Dreligion
R2 =0.72, SER = 3,773.35
where Cost is Tuition, Fees, Room and Board in dollars, Reputation is the index used in U.S. News and World
Report (based on a survey of university presidents and chief academic officers), which ranges from 1 ("marginal")
to 5 ("distinguished"), Size is the number of undergraduate students, and Dpriv, Dlibart, and Dreligion are
binary variables indicating whether the institution is private, a liberal arts college, and has a religious
affiliation.
(a) Interpret the results. Do the coefficients have the expected sign?
(b) What is the forecasted cost for a liberal arts college, which has no religious affiliation, a size of 1,500
students and a reputation level of 4.5? (All liberal arts colleges are private.)
(c) To save money, you are willing to switch from a private university to a public university, which has a
ranking of 0.5 less and 10,000 more students. What is the effect on your cost? Is it substantial?
(d) Eliminating the Size and Dlibart variables from your regression, the estimation regression becomes
Cost = 5,450.35 + 3,538.84 Reputation + 10,935.70 Dpriv - 2,783.31 Dreligion;
R2 =0.72, SER = 3,792.68
Why do you think that the effect of attending a private institution has increased now?
(e) What can you say about causation in the above relationship? Is it possible that Cost affects Reputation rather
than the other way around?
Answer: (a) An increase in reputation by one category, increases the cost by roughly $3,985. The larger the size of
the college/university, the lower the cost. An increase of 10,000 students results in a $2,000 lower cost.
Private schools charge roughly $8,406 more than public schools. A school with a religious affiliation is
approximately $2,376 cheaper, presumably due to subsidies, and a liberal arts college also charges
roughly $416 less. There are no observations close to the origin, so there is no direct interpretation of the
intercept. Other than perhaps the coefficient on liberal arts colleges, all coefficients have the expected
sign.
(b) $32,935.
(c) Roughly $12,400. Since over the four years of education, this implies approximately $50,000, it is a
substantial amount of money for the average household.
(d) Private institutions are smaller, on average, and some of these are liberal arts colleges. Both of these
variables had negative coefficients.
(e) It is very possible that the university president and chief academic officer are influenced by the cost
variable in answering the U.S. News and World Report survey. If this were the case, then the above
equation suffers from simultaneous causality bias, a topic that will be covered in a later chapter.
However, this poses a serious threat to the internal validity of the study.
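The figures in (b) and (c) can be reproduced from the estimated equation. The sketch below assumes negative signs on Size, Dlibart, and Dreligion, as implied by the interpretation in answer (a); the dictionary and function names are made up for illustration.

```python
COEF = {
    "const": 7311.17,
    "Reputation": 3985.20,
    "Size": -0.20,
    "Dpriv": 8406.79,
    "Dlibart": -416.38,
    "Dreligion": -2376.51,
}

def predict_cost(reputation, size, dpriv, dlibart, dreligion):
    """Fitted cost from the first regression."""
    return (COEF["const"] + COEF["Reputation"] * reputation + COEF["Size"] * size
            + COEF["Dpriv"] * dpriv + COEF["Dlibart"] * dlibart
            + COEF["Dreligion"] * dreligion)

# (b) private liberal arts college, no religious affiliation, 1,500 students, reputation 4.5
part_b = predict_cost(4.5, 1500, 1, 1, 0)

# (c) change in cost: private -> public, reputation 0.5 lower, 10,000 more students
part_c = -COEF["Dpriv"] + COEF["Reputation"] * (-0.5) + COEF["Size"] * 10000

print(round(part_b), round(part_c))
```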
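$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$
the OLS estimators for the three parameters are as follows (small letters refer to deviations from means as in
$z_i = Z_i - \bar{Z}$):
$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}_1 - \hat\beta_2 \bar{X}_2$$
$$\hat\beta_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
$$\hat\beta_2 = \frac{\sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$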
You have collected data for 104 countries of the world from the Penn World Tables and want to estimate the
effect of the population growth rate (X1i) and the saving rate (X2i) (average investment share of GDP from
1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. The various sums needed to calculate the OLS
estimates are given below:
$\sum_{i=1}^{n} Y_i = 33.33$; $\sum_{i=1}^{n} X_{1i} = 2.025$; $\sum_{i=1}^{n} X_{2i} = 17.313$
$\sum_{i=1}^{n} y_i^2 = 8.3103$; $\sum_{i=1}^{n} x_{1i}^2 = 0.0122$; $\sum_{i=1}^{n} x_{2i}^2 = 0.6422$
$\sum_{i=1}^{n} y_i x_{1i} = -0.2304$; $\sum_{i=1}^{n} y_i x_{2i} = 1.5676$; $\sum_{i=1}^{n} x_{1i} x_{2i} = -0.0520$
(a) What are your expected signs for the regression coefficient? Calculate the coefficients and see if their signs
correspond to your intuition.
(b) Find the regression $R^2$, and interpret it. What other factors can you think of that might have an influence on
productivity?
Answer: (a) You expect $\hat\beta_1 < 0$ and $\hat\beta_2 > 0$ with no prior expectation on the intercept. Substituting the above
numbers into the equations for the regression coefficients results in $\hat\beta_1 = -12.95$, $\hat\beta_2 = 1.39$, and $\hat\beta_0 = 0.34$.
(b) $R^2 = \frac{\hat\beta_1 \sum_{i=1}^{n} y_i x_{1i} + \hat\beta_2 \sum_{i=1}^{n} y_i x_{2i}}{\sum_{i=1}^{n} y_i^2} = 0.62$. 62 percent of the variation in relative productivity is
explained by the regression. There is a vast literature on the subject and students' answers will obviously
vary. Some may focus on additional economic variables such as the initial level of productivity and the
inflation rate during the sample period. Others may emphasize institutional variables such as whether or
not the country was democratic over the sample period, or had political stability, etc.
Stock/Watson 2e -- CVC2 8/23/06 -- Page 129
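The coefficient and R^2 calculations can be carried out directly from the sums given in the question. This is a minimal sketch using the deviation-from-means formulas for the two-regressor case; the variable names are chosen here for illustration.

```python
n = 104
sum_Y, sum_X1, sum_X2 = 33.33, 2.025, 17.313
s_yy, s_11, s_22 = 8.3103, 0.0122, 0.6422    # sums of squared deviations
s_y1, s_y2, s_12 = -0.2304, 1.5676, -0.0520  # sums of cross-products of deviations

den = s_11 * s_22 - s_12 ** 2
b1 = (s_y1 * s_22 - s_y2 * s_12) / den       # population growth rate
b2 = (s_y2 * s_11 - s_y1 * s_12) / den       # saving rate
b0 = sum_Y / n - b1 * sum_X1 / n - b2 * sum_X2 / n

# Explained sum of squares over total sum of squares
r2 = (b1 * s_y1 + b2 * s_y2) / s_yy

print(round(b1, 2), round(b2, 2), round(b0, 2), round(r2, 2))
```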
4) A subsample from the Current Population Survey is taken, on weekly earnings of individuals, their age, and
their gender. You have read in the news that women make 70 cents to the $1 that men earn. To test this
hypothesis, you first regress earnings on a constant and a binary variable, which takes on a value of 1 for
females and is 0 otherwise. The results were:
Earn = 570.70 - 170.72 Female, R2 =0.084, SER = 282.12.
(a) There are 850 females in your sample and 894 males. What are the mean earnings of males and females in
this sample? What is the percentage of average female income to male income?
(b) You decide to control for age (in years) in your regression results because older people, up to a point, earn
more on average than younger people. This regression output is as follows:
Earn = 323.70 - 169.78 Female + 5.15 Age, R2 =0.135, SER = 274.45.
Interpret these results carefully. How much, on average, does a 40-year-old female make per week in your
sample? What about a 20-year-old male? Does this represent stronger evidence of discrimination against
females?
Answer: (a) Males earn $570.70, females $399.98. Percentage of average female income to male income is 70.1% in
the sample.
(b) As individuals become one year older, they earn $5.15 more, on average. Females earn significantly
less money on average and for a given age. 13.5 percent of the earnings variation is explained by the
regression. A 40-year-old female earns $359.92, while a 20-year-old male makes $426.70. There is
somewhat more evidence here, since age has been added as a regressor. However, many attributes,
which could potentially explain this difference, are still omitted.
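The numbers in the answer follow directly from the two estimated equations, taking the coefficient on Female as negative (as the answer's figures imply). A short Python sketch; the function name is made up for illustration.

```python
# (a) Regression on a constant and the Female dummy only
male_mean = 570.70
female_mean = 570.70 - 170.72
ratio = female_mean / male_mean

# (b) Regression that also controls for age
def predicted_earnings(female, age):
    """Fitted weekly earnings from the second regression."""
    return 323.70 - 169.78 * female + 5.15 * age

print(round(female_mean, 2), round(100 * ratio, 1))
print(round(predicted_earnings(1, 40), 2), round(predicted_earnings(0, 20), 2))
```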
5) You have collected data from Major League Baseball (MLB) to find the determinants of winning. You have a
general idea that both good pitching and strong hitting are needed to do well. However, you do not know how
much each of these contributes separately. To investigate this problem, you collect data for all MLB teams during
the 1999 season. Your strategy is to first regress the winning percentage on pitching quality ("Team ERA"), second
to regress the same variable on some measure of hitting ("OPS", On-base Plus Slugging percentage), and
third to regress the winning percentage on both.
Summary of the Distribution of Winning Percentage, On Base plus Slugging Percentage,
and Team Earned Run Average for MLB in 1999

                     Average  Standard                      Percentile
                              deviation   10%    25%    40%    50%      60%    75%    90%
                                                             (median)
Team ERA              4.71     0.53       3.84   4.35   4.72   4.78     4.91   5.06   5.25
OPS                   0.778    0.034
Winning Percentage    0.50     0.08       0.40   0.43   0.46   0.48     0.49   0.59   0.60
7) You have collected data for 104 countries to address the difficult questions of the determinants for differences
in the standard of living among the countries of the world. You recall from your macroeconomics lectures that
the neoclassical growth model suggests that output per worker (per capita income) levels are determined by,
among others, the saving rate and population growth rate. To test the predictions of this growth model, you
run the following regression:
RelPersInc = 0.339 - 12.894 n + 1.397 SK , R2 =0.621, SER = 0.177
where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate,
1980-1990, and SK is the average investment share of GDP from 1960 to 1990 (remember investment equals
saving).
(a) Interpret the results. Do the signs correspond to what you expected them to be? Explain.
(b) You remember that human capital in addition to physical capital also plays a role in determining the
standard of living of a country. You therefore collect additional data on the average educational attainment in
years for 1985, and add this variable (Educ) to the above regression. This results in the modified regression
output:
RelPersInc = 0.046 - 5.869 n + 0.738 SK + 0.055 Educ, R2 =0.775, SER = 0.1377
How has the inclusion of Educ affected your previous results?
(c) Upon checking the regression output, you realize that there are only 86 observations, since data for Educ is
not available for all 104 countries in your sample. Do you have to modify some of your statements in (b)?
(d) Brazil has the following values in your sample: RelPersInc = 0.30, n = 0.021, SK = 0.169, Educ = 3.5. Does your
equation overpredict or underpredict the relative GDP per worker? What would happen to this result if Brazil
managed to double the average educational attainment?
Answer: (a) The Solow growth model predicts higher productivity with higher saving rates and lower population
growth. The signs therefore correspond to prior expectations. A 10 percentage point increase in the saving
rate results in a roughly 14 percentage point increase in per capita income relative to the United States. Lowering
the population growth rate by 1 percentage point results in a 13 percentage point higher per capita income relative to the
United States. It is best not to interpret the intercept. The regression explains approximately 62 percent of
the variation in per capita income among the 104 countries of the world.
(b) The coefficients on the population growth rate and on the saving rate are both roughly half of what
they were originally. The regression R2 has increased significantly.
(c) When comparing results, you should ensure that the sample is identical, since comparisons are not
valid otherwise.
(d) The predicted value for Brazil is 0.240. Hence the regression underpredicts Brazil's per capita income.
Increasing Educ to 7.0 would result in a predicted per capita income of 0.43, which is a substantial
increase from both its current actual position and the previously predicted value.
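The prediction for Brazil in (d) is a direct substitution into the second equation, with the coefficient on n taken as negative (consistent with the answer to (a)). A brief sketch; the function name is made up for illustration.

```python
def predict_rel_income(n_growth, s_k, educ):
    """Fitted relative GDP per worker from the regression including Educ."""
    return 0.046 - 5.869 * n_growth + 0.738 * s_k + 0.055 * educ

brazil = predict_rel_income(0.021, 0.169, 3.5)
brazil_doubled_educ = predict_rel_income(0.021, 0.169, 7.0)

print(round(brazil, 3), round(brazil_doubled_educ, 2))
print("underpredicts actual value of 0.30:", brazil < 0.30)
```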
8) Attendance at sports events depends on various factors. Teams typically do not change ticket prices from game
to game to attract more spectators to less attractive games. However, there are other marketing tools used, such
as fireworks, free hats, etc., for this purpose. You work as a consultant for a sports team, the Los Angeles
Dodgers, to help them forecast attendance, so that they can potentially devise strategies for price
discrimination. After collecting data over two years for every one of the 162 home games of the 2000 and 2001
season, you run the following regression:
Attend = 15,005 + 201 Temperat + 465 DodgNetWin + 82 OppNetWin
+ 9647 DFSaSu + 1328 Drain + 1609 D150m + 271 DDiv - 978 D2001;
R2 =0.416, SER = 6983
where Attend is announced stadium attendance, Temperat is the average temperature on game day, DodgNetWin
are the net wins of the Dodgers before the game (wins-losses), OppNetWin is the opposing team's net wins at
the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables, taking a
value of 1 if the game was played on a weekend, it rained during that day, the opposing team was within a 150
mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during
2001, respectively.
(a) Interpret the regression results. Do the coefficients have the expected signs?
(b) Excluding the last four binary variables results in the following regression result:
Attend = 14,838 + 202 Temperat + 435 DodgNetWin + 90 OppNetWin
+ 10,472 DFSaSu, R2 =0.410, SER = 6925
According to this regression, what is your forecast of the change in attendance if the temperature increases by
30 degrees? Is it likely that people attend more games if the temperature increases? Is it possible that Temperat
picks up the effect of an omitted variable?
(c) Assuming that ticket sales depend on prices, what would your policy advice be for the Dodgers to increase
attendance?
(d) Dodger stadium is large and is not often sold out. The Boston Red Sox play in a much smaller stadium,
Fenway Park, which often reaches capacity. If you did the same analysis for the Red Sox, what problems would
you foresee in your analysis?
Answer: (a) A 10 degree warmer temperature increases attendance by roughly 2,000. A 10 game net increase in wins
results in approximately 4,600 more spectators. If the opponent's net wins are 10 games higher when
compared to another team, then roughly 800 more people attend. Weekend games attract almost 10,000
more people on average. Rain during the day of the game brings out close to 1,300 more fans. A team
from closer by, such as the Angels or the Diamondbacks, attract a bit more than 1,600 more people, and a
team from the same division results in close to 270 more fans in the stadium. On average, there were
approximately 1,000 fewer spectators per game in 2001 than in 2000, holding all other factors constant.
With the exception of the rain variable, the signs correspond to prior expectation. The regression
explains 41.6 percent of the variation in Dodger attendance.
(b) For an increase in 30 degrees, there will be roughly 6,000 more people in attendance. Although
people prefer 75 degrees over 45 degrees, it is unlikely that they prefer 105 degrees over 75 degrees.
Temperature rises during the baseball season in Los Angeles. There are typically fewer people in
attendance during the earlier parts of the season than during the latter parts. Binary variables for the
month of the year would pick up such an effect.
(c) The only variable that management has limited control over is the performance of the team. The
policy advice would therefore be to assure a superior team performance, which, in turn, increases
attendance. (Stating the obvious is not going to keep the consultant on the payroll much longer.)
(d) If there was a serious capacity constraint, then estimating the equation in the above way would not
yield sensible results. Imagine that Fenway Park was basically sold out and the Red Sox would now
improve their net wins. Since you would not observe an increase in the dependent variable, the
coefficient for net wins would necessarily have to be zero.
9) The administration of your university/college is thinking about implementing a policy of coed floors only in
dormitories. Currently there are only single gender floors. One reason behind such a policy might be to
generate an atmosphere of better understanding between the sexes. The Dean of Students (DoS) has decided
to investigate if such a behavior results in more togetherness by attempting to find the determinants of the
gender composition at the dinner table in your main dining hall, and in that of a neighboring university, which
only allows for coed floors in their dorms. The survey includes 176 students, 63 from your university/college,
and 113 from a neighboring institution.
(a) The Dean's first problem is how to define gender composition. To begin with, the survey excludes single-person
tables, since the study is to focus on group behavior. The Dean also eliminates sports teams from the
analysis, since a large number of single-gender students will sit at the same table. Finally, the Dean decides to
only analyze tables with three or more students, since she worries about couples distorting the results. The
Dean finally settles for the following specification of the dependent variable:
GenderComp = |50% - % of Male Students at Table|
where |Z| stands for the absolute value of Z. The variable can take on values from zero to fifty. Briefly analyze
some of the possible values. What are the implications for gender composition as more female students join a
given number of males at the table? Why would you choose the absolute value here? Discuss some other
possible specifications for the dependent variable.
(b) After considering various explanatory variables, the Dean settles for an initial list of eight, and estimates the
following relationship:
GenderComp = 30.90 - 3.78 Size - 8.81 DCoed + 2.28 DFemme + 2.06 DRoommate
- 0.17 DAthlete + 1.49 DCons - 0.81 SAT + 1.74 SibOther, R2 = 0.24, SER = 15.50
where Size is the number of persons at the table minus 3, DCoed is a binary variable, which takes on the value
of 1 if you live on a coed floor, DFemme is a binary variable, which is 1 for females and zero otherwise,
DRoommate is a binary variable which equals 1 if the person at the table has a roommate and is zero otherwise,
DAthlete is a binary variable which is 1 if the person at the table is a member of an athletic varsity team, DCons
is a variable which measures the political tendency of the person at the table on a seven-point scale, ranging
from 1 being liberal to 7 being conservative, SAT is the SAT score of the person at the table measured on a
seven-point scale, ranging from 1 for the category 900-1000 to 7 for the category 1510 and above, and
increasing by one for 100 point increases, and SibOther is the number of siblings from the opposite gender in
the family the person at the table grew up with.
Interpret the above equation carefully, justifying the inclusion of the explanatory variables along the way. Does
it make sense to interpret the constant in the above regression?
(c) Had the Dean used the number of people sitting at the table instead of Number-3, what effect would that
have had on the above specification?
(d) If you believe that going down the hallway and knocking on doors is one of the major determinants of who
goes to eat with whom, then why would it not be a good idea to survey students at lunch tables?
Answer: (a) 3 females, 0 males: 50; 0 females, 3 males: 50; 2 females, 2 males: 0; 1 female, 3 males: 25; 4 females, 3
males: 7.143. For a given number of males, say 3, the gender composition will first decrease as the
number of females increases from 0 to 3. After that, the gender composition will increase again. You
need to choose the absolute value because having many individuals from one gender relative to the
other is equally bad for a balanced gender composition. Another possibility would be to use the squared
difference.
(b) The larger the size at the table, the more balanced the gender composition. Consider a table of 6,
where you find two more males than females (4 males, 2 females, gender composition = 16.7) versus a
table of 14, where you have two more males than females (gender composition = 7.1). Obviously, if
males and females increased in the same proportion, then gender composition would not change. This
has not happened here. Students from a coed floor are more likely to sit at a more balanced table in terms
of gender composition. This is likely to happen if students knock on neighbors' doors to see who is
willing to join them for lunch. Females are less likely to sit at gender balanced tables, and there is no prior
on the coefficient of this variable. Having a roommate increases the likelihood of gender imbalance.
Roommates are from the same gender, and joining the roommate for a meal results in a more
imbalanced gender composition. Being a member of a varsity team decreases the gender imbalance.
Recall that sports teams sitting together are excluded from the sample. Although there is no strong prior
here, the result suggests that varsity team members have more friends, on average, from the other sex
than does the general student body. Having a more conservative view, holding other factors constant,
results in sitting at meals with more people from the same sex. More intelligent students, or at least those
with a higher SAT score, sit more frequently with students from the other sex. Having had more siblings
from the other gender at home results in a more imbalanced gender composition: the female student
who had four brothers when she grew up has had enough of this sort of experience (although, given the
specification of the dependent variable, it is also possible that she continues to sit with four males). There
are no observations close to the origin, so it is best not to interpret the intercept. 24 percent of
the variation in gender composition is explained by the regression.
(c) The only change would be in the intercept.
(d) Many students attend lectures before lunch, and may ask some of the students attending the same
lecture to join them for lunch.
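The values discussed in part (a) can be checked with a short sketch. The function name `gender_comp` is an assumption made for illustration; it simply evaluates |50 - percent of male students at the table| for a given table.

```python
def gender_comp(males: int, females: int) -> float:
    """Return |50 - percent of male students at the table|."""
    total = males + females
    if total == 0:
        raise ValueError("a table needs at least one student")
    return abs(50.0 - 100.0 * males / total)


# Three males joined by an increasing number of females:
# the measure falls to zero at a balanced table and then rises again.
path = [round(gender_comp(3, females), 3) for females in range(0, 6)]
print(path)
```

The measure is symmetric: a table with 3 males and 1 female receives the same value as a table with 1 male and 3 females, which is the reason for taking the absolute value.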
10) The Solow growth model suggests that countries with identical saving rates and population growth rates
should converge to the same per capita income level. This result has been extended to include investment in
human capital (education) as well as investment in physical capital. This hypothesis is referred to as the
conditional convergence hypothesis, since the convergence is dependent on countries obtaining the same
values in the driving variables. To test the hypothesis, you collect data from the Penn World Tables on the
average annual growth rate of GDP per worker (g6090) for the 1960-1990 sample period, and regress it on the
(i) initial starting level of GDP per worker relative to the United States in 1960 (RelProd60), (ii) average
population growth rate of the country (n), (iii) average investment share of GDP from 1960 to 1990 (SK; remember that investment equals savings), and (iv) educational attainment in years for 1985 (Educ). The results for
close to 100 countries are as follows:
g6090 = 0.004 - 0.172 n + 0.133 SK + 0.002 Educ - 0.044 RelProd60,
R2 = 0.537, SER = 0.011
(a) Interpret the results. Do the coefficients have the expected signs? Why does a negative coefficient on the
initial level of per capita income indicate conditional convergence (beta-convergence)?
(b) Equations of the above type have been labeled determinants of growth equations in the literature. You
recall from your intermediate macroeconomics course that growth in the Solow growth model is determined by
technological progress. Yet the above equation does not contain technological progress. Is that inconsistent?
Answer: (a) All slope coefficients have the expected sign given the economic theory behind the equation. The
negative coefficient implies that countries which were further behind grew relatively faster, or, put
differently, countries which had a higher relative per capita income in 1960 grew relatively slower.
(b) The equation only determines growth relative to a given starting point, namely per capita income in
1960. Compare this to runners placed on a track where the starting blocks are at various points of the
first 100 m. Let the race last for perhaps 10 seconds and let the runners stop at that point on the track. In
essence, you measure where the runners ended up given their starting point, or you can also measure
how far they ran given their starting point. In many ways, the above equation is therefore meant to
predict the per capita income level in 1990 rather than the growth.
11) You have collected a sub-sample from the Current Population Survey for the western region of the United
States. Running a regression of average hourly earnings (ahe) on an intercept only, you get the following result:
\(\widehat{ahe} = \hat{\beta}_0 = 18.58\)
a.
b. You decide to include a single explanatory variable without an intercept. The binary variable DFemme
takes on a value of 1 for females but is 0 otherwise. The regression result changes as follows:
\(\widehat{ahe} = \hat{\beta}_1 DFemme = 16.50\, DFemme\)
What is the interpretation now?
c. You generate a new binary variable DMale by subtracting DFemme from 1, and run the new regression:
\(\widehat{ahe} = \hat{\beta}_2 DMale = 20.09\, DMale\)
What is the interpretation of the coefficient now?
d. After thinking about the above results, you recognize that you could have generated the last two results
either by running a regression on both binary variables, or on an intercept and one of the binary
variables. What would the results have been?
Answer: a. The mean average hourly earnings for the sample is $18.58.
b. The mean average hourly earnings for females is $16.50 in this sample.
c. The mean average hourly earnings for males is $20.09 in this sample.
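d. \(\widehat{ahe} = \hat{\beta}_1 DFemme + \hat{\beta}_2 DMale = 16.50\, DFemme + 20.09\, DMale\); with an intercept and one binary variable, \(\widehat{ahe} = 20.09 - 3.59\, DFemme\) or, equivalently, \(\widehat{ahe} = 16.50 + 3.59\, DMale\).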
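The link between these binary-variable regressions and group means can be illustrated with a minimal sketch. The earnings figures below are hypothetical and are not the CPS sub-sample from the question; the helper names are likewise assumptions.

```python
from statistics import mean

# Hypothetical hourly earnings (NOT the CPS data used in the question).
ahe = [22.0, 15.0, 18.5, 17.0, 21.0, 16.0]
dfemme = [0, 1, 0, 1, 0, 1]
dmale = [1 - d for d in dfemme]


def slope_no_intercept(y, x):
    """OLS slope from regressing y on x alone, without an intercept."""
    return sum(yi * xi for yi, xi in zip(y, x)) / sum(xi * xi for xi in x)


def ols_with_intercept(y, x):
    """OLS intercept and slope for a single regressor."""
    ybar, xbar = mean(y), mean(x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


mean_f = mean([a for a, d in zip(ahe, dfemme) if d == 1])
mean_m = mean([a for a, d in zip(ahe, dfemme) if d == 0])

b_f = slope_no_intercept(ahe, dfemme)    # equals the female mean
b_m = slope_no_intercept(ahe, dmale)     # equals the male mean
b0, b1 = ols_with_intercept(ahe, dfemme) # male mean, female minus male mean
```

Because DFemme and DMale never equal 1 for the same person, a regression on both binary variables without an intercept returns the same two group means.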
Then the effect of X on Y can be measured properly as long as the arrow from Z to Y does not exist, or as
long as changes in X do not cause changes in Z, which in return influence Y.
2) You have obtained data on test scores and student-teacher ratios in region A and region B of your state. Region
B, on average, has lower student-teacher ratios than region A. You decide to run the following regression
\(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i\)
where X1 is the class size in region A, X2 is the difference in class size between region A and B, and X3 is the
class size in region B. Your regression package shows a message indicating that it cannot estimate the above
equation. What is the problem here and how can it be fixed?
Answer: There is perfect multicollinearity present since one of the three explanatory variables can always be
expressed linearly in terms of the other two. Hence there are not really three pieces of independent
information contained in the three explanatory variables. Dropping one of the three will solve the
problem.
3) In the case of perfect multicollinearity, OLS is unable to calculate the coefficients for the explanatory variables,
because it is impossible to change one variable while holding all other variables constant. To see why this is the
case, consider the coefficient for the first explanatory variable in the case of a multiple regression model with
two explanatory variables:
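\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}
\]
Answer: Dividing each of the four terms by \(\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2\) gives
\[
\hat{\beta}_1 = \frac{\dfrac{\sum y_i x_{1i}}{\sum x_{1i}^2} - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}}{1 - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}}
= \frac{\hat{\beta}_{yx_1} - \hat{\beta}_{yx_2}\hat{\beta}_{x_2 x_1}}{1 - \hat{\beta}_{x_2 x_1}\hat{\beta}_{x_1 x_2}},
\]
where \(\hat{\beta}_{ab}\) denotes the slope from a simple regression of a on b. With perfect multicollinearity, the
regression of X1 on X2 has a perfect fit. In a simple regression with a perfect fit,
\[
\frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2} = \frac{1}{\;\dfrac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} y_i^2}\;}
\]
, so that the slope of a simple regression of Y on X is the inverse of the slope of a regression
of X on Y. Hence
\[
\hat{\beta}_1 = \frac{\hat{\beta}_{yx_1} - \hat{\beta}_{yx_2}\hat{\beta}_{x_2 x_1}}{1 - \hat{\beta}_{x_2 x_1}\dfrac{1}{\hat{\beta}_{x_2 x_1}}} = \frac{\hat{\beta}_{yx_1} - \hat{\beta}_{yx_2}\hat{\beta}_{x_2 x_1}}{0}
\]
, which is not defined. The denominator would be zero in this case.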
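The zero denominator under perfect multicollinearity can be verified numerically. The following is a minimal sketch with hypothetical data; `beta1_denominator` is an illustrative name for the denominator of the two-regressor slope formula in this question, computed in deviations from means.

```python
def deviations(values):
    """Return the values as deviations from their sample mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def beta1_denominator(x1, x2):
    """sum(x1^2) * sum(x2^2) - (sum(x1 * x2))^2 in deviations from means."""
    x1d, x2d = deviations(x1), deviations(x2)
    s11 = sum(a * a for a in x1d)
    s22 = sum(b * b for b in x2d)
    s12 = sum(a * b for a, b in zip(x1d, x2d))
    return s11 * s22 - s12 ** 2


# Hypothetical regressors.
x1 = [1.0, 2.0, 4.0, 7.0, 11.0]
x2_collinear = [3.0 * v + 2.0 for v in x1]   # exact linear function of x1
x2_other = [2.0, 1.0, 5.0, 3.0, 8.0]         # not an exact linear function

den_collinear = beta1_denominator(x1, x2_collinear)
den_other = beta1_denominator(x1, x2_other)
print(den_collinear, den_other)
```

With the collinear regressor the denominator vanishes, so the slope formula would require a division by zero; with the second regressor it is strictly positive.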
4) You try to establish that there is a positive relationship between the use of a fertilizer and the growth of a
certain plant. Set up the design of an experiment to establish the relationship, paying particular attention to
relevant control variables. Discuss in this context the effect of omitted variable bias.
Answer: The answer should follow the randomized controlled experiment described in section 1.2 of the
textbook: there should be several plots where the plant is placed, each receiving identical treatment. In
this context, the same amount of water and sunshine should be available to each plant, and the soil
should have the identical quality. Then some of the plots, determined randomly, should receive varying
amounts of the fertilizer. The average yield can then be regressed on the amount of fertilizer received.
The experiment could also allow for different amounts of sunshine and water, as long as this were
recorded meticulously. In this case, failing to record the amount of sunshine received and therefore not
including this variable in the regression would result in omitted variable bias. For obvious reasons, the
effect of the fertilizer on yield would be estimated incorrectly, since plants which receive more fertilizer
but are always in the shade would produce a lower yield.
5) In the multiple regression model with two regressors, the formula for the slope of the first explanatory variable
is
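\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}
\]
Answer: Step 1: \(y_i = \hat{\beta}_{yx_2} x_{2i} + \hat{v}_i\); \(\hat{\beta}_{yx_2} = \dfrac{\sum_{i=1}^{n} y_i x_{2i}}{\sum_{i=1}^{n} x_{2i}^2}\), and \(\hat{v}_i = y_i - \dfrac{\sum_{i=1}^{n} y_i x_{2i}}{\sum_{i=1}^{n} x_{2i}^2}\, x_{2i}\).
Step 2: \(x_{1i} = \hat{\beta}_{x_1 x_2} x_{2i} + \hat{w}_i\); \(\hat{\beta}_{x_1 x_2} = \dfrac{\sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{2i}^2}\), and \(\hat{w}_i = x_{1i} - \dfrac{\sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{2i}^2}\, x_{2i}\).
Step 3: \(\hat{v}_i = \hat{\beta}\, \hat{w}_i\);
\[
\hat{\beta} = \frac{\sum_{i=1}^{n} \left[\left(y_i - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\, x_{2i}\right)\left(x_{1i} - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\, x_{2i}\right)\right]}{\sum_{i=1}^{n} \left(x_{1i} - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}\, x_{2i}\right)^2}
\]
Multiplying out the terms in the numerator and denominator and expanding by \(\sum_{i=1}^{n} x_{2i}^2\) before
moving through the summation sign, results in
\[
\hat{\beta} = \frac{\sum y_i x_{1i} - \dfrac{\sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{2i}^2}}{\sum x_{1i}^2 - \dfrac{\left(\sum x_{1i} x_{2i}\right)^2}{\sum x_{2i}^2}}
= \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2} = \hat{\beta}_1
\]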
6) In the multiple regression problem with k explanatory variables, it would be quite tedious to derive the
formulas for the slope coefficients without knowledge of linear algebra. The formulas certainly do not resemble
the formula for the slope coefficient in the simple linear regression model with a single explanatory variable.
However, it can be shown that the following three step procedure results in the same formula for slope
coefficient of the first explanatory variable, X1 :
Step 1: regress Y on a constant and all other explanatory variables other than X1 , and calculate the residual
(Res1).
Step 2: regress X1 on a constant and all other explanatory variables, and calculate the residual (Res2).
Step 3: regress Res1 on a constant and Res2.
Can you give an intuitive explanation to this procedure?
Answer: Step 1 eliminates the linear influence of all variables other than X1 from Y. Think of pouring a liquid
through a filter: the remaining liquid now contains the purified Y, or that part of Y that could not be
explained by the other Xs. The same happens in Step 2, where X1 is now purified from any correlation
with the other Xs. Step 3 establishes the purified relationship between Y and X1 .
(This procedure is of interest to students if they want to plot the two-dimensional relationship between
Y and X1 .)
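The three step procedure can be checked against the direct two-regressor formula with a small sketch. The data are hypothetical, the function names are assumptions, and the constant is handled by working in deviations from means.

```python
def dev(v):
    """Deviations from the sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def multiple_slope_x1(y, x1, x2):
    """Slope on X1 from regressing Y on a constant, X1 and X2."""
    yd, x1d, x2d = dev(y), dev(x1), dev(x2)
    num = dot(yd, x1d) * dot(x2d, x2d) - dot(yd, x2d) * dot(x1d, x2d)
    den = dot(x1d, x1d) * dot(x2d, x2d) - dot(x1d, x2d) ** 2
    return num / den


def residuals(y, x):
    """Residuals from regressing y on a constant and x."""
    yd, xd = dev(y), dev(x)
    b = dot(yd, xd) / dot(xd, xd)
    return [p - b * q for p, q in zip(yd, xd)]


def three_step_slope(y, x1, x2):
    res1 = residuals(y, x2)    # Step 1: Y purified of X2
    res2 = residuals(x1, x2)   # Step 2: X1 purified of X2
    # Step 3: both residual series have mean zero, so the constant drops out.
    return dot(res1, res2) / dot(res2, res2)


# Hypothetical data.
y = [10.0, 12.0, 9.0, 15.0, 14.0, 18.0, 13.0]
x1 = [1.0, 2.0, 2.0, 4.0, 3.0, 6.0, 3.0]
x2 = [5.0, 3.0, 6.0, 2.0, 4.0, 1.0, 5.0]

direct = multiple_slope_x1(y, x1, x2)
stepwise = three_step_slope(y, x1, x2)
print(direct, stepwise)
```

Plotting `res1` against `res2` would give the two-dimensional picture of the relationship between Y and X1 mentioned in the answer.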
7) Give at least three examples from macroeconomics and three from microeconomics that involve specified
equations in a multiple regression analysis framework. Indicate in each case what the expected signs of the
coefficients would be and if theory gives you an indication about the likely size of the coefficients.
Answer: Answers will vary by student. In my experience, students most frequently will bring up demand
functions (quantity demanded, price, and other variables such as income, price of substitutes, etc.),
supply functions (quantity supplied, price, costs), production functions (output produced, capital, labor,
and other inputs), consumption functions (consumption, income, and the real interest rate or wealth),
money demand functions (real money supply, income, and interest rate), and the Phillips curve
(inflation, unemployment rate, and inflationary expectations).
8) One of your peers wants to analyze whether or not participating in varsity sports lowers or increases the GPA
of students. She decides to collect data from 110 male and female students on their GPA and the number of
hours they spend participating in varsity sports. The coefficient in the simple regression function turns out to
be significantly negative, using the t-statistic and carrying out the appropriate hypothesis test. Upon reflection,
she is concerned that she did not ask the students in her sample whether or not they were female or male. You
point out to her that you are more concerned about the effect of omitted variables in her regression, such as the
incoming SAT score of the students, and whether or not they are in a major from a high/low grading
department. Elaborate on your argument.
Answer: The presence of omitted variables will result in an inconsistent estimator for the included variable
(number of hours spent in varsity sports) if both of the following two conditions hold: the
omitted variable is relevant in affecting the GPA, and the omitted variable is correlated with the
included variable. Incoming SAT scores are clearly relevant in predicting GPAs, at least in the earlier
years. Departmental differences in the general level of grading will even more
obviously have an effect on the GPA. If these variables are also correlated with the hours spent in varsity sports, the relationship suffers from omitted variable bias.
9) (Requires Calculus) For the case of the multiple regression problem with two explanatory variables, show that
minimizing the sum of squared residuals results in three conditions:
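\[
\sum_{i=1}^{n} \hat{u}_i = 0; \quad \sum_{i=1}^{n} \hat{u}_i X_{1i} = 0; \quad \sum_{i=1}^{n} \hat{u}_i X_{2i} = 0
\]
Answer: To minimize the sum of squared prediction mistakes
\[
\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2
\]
you need to take the following three derivatives with respect to b0 , b1 and b2 . This results in
\[
\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})
\]
\[
\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{1i}
\]
\[
\frac{\partial}{\partial b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{2i}
\]
The OLS estimators are those for which the derivatives are zero. Hence we get
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat{u}_i = 0
\]
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) X_{1i} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat{u}_i X_{1i} = 0
\]
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) X_{2i} = 0 \;\Rightarrow\; \sum_{i=1}^{n} \hat{u}_i X_{2i} = 0
\]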
10) The probability limit of the OLS estimator in the case of omitted variables is given in your text by the following
formula:
\[
\hat{\beta}_1 \xrightarrow{\;p\;} \beta_1 + \rho_{Xu} \frac{\sigma_u}{\sigma_X}
\]
Give an intuitive explanation for two conditions under which the bias will be small.
Answer: The bias will be small if there is little correlation between the included variable and the error term. The
error term contains the omitted variable. If the omitted variable is correlated with the included variable,
then the error term is correlated with the included variable. Now consider the case where the correlation
between the included and omitted variable is low, resulting in a low correlation between the error term
and the included variable. In that case, changes in the omitted variable are rarely accompanied by changes in the
included variable, so that little of the omitted variable's effect on Y is mistaken for an effect of the
included variable.
The second condition is the size of the ratio of the two standard deviations. The formula suggests that if
the included variable varies substantially more than the error term, which contains the omitted variable,
then the inconsistency will be small. In that case, the relationship between the included variable and the
dependent variable does not get disturbed much by variations in the omitted variable.
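Both conditions can be seen in a small simulation. This is a sketch under stated assumptions: the data generating process, the parameter values, and the name `simulate` are all hypothetical. In any sample, the simple regression slope equals the true coefficient plus the sample analogue of the correlation term times the ratio of standard deviations, which the code verifies.

```python
import random
from math import sqrt


def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def simulate(a, n=5000, beta1=1.0, beta2=2.0, seed=0):
    """Omit W from Y = beta1*X + beta2*W + e, where X = a*W + noise.

    Returns the simple OLS slope of Y on X and the value
    beta1 + corr(X, u) * sd(u) / sd(X) computed from the same sample,
    with u = beta2*W + e playing the role of the error term.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(n)]
    x = [a * wi + rng.gauss(0, 1) for wi in w]
    e = [rng.gauss(0, 1) for _ in range(n)]
    u = [beta2 * wi + ei for wi, ei in zip(w, e)]
    y = [beta1 * xi + ui for xi, ui in zip(x, u)]
    xd, ud, yd = dev(x), dev(u), dev(y)
    slope = dot(yd, xd) / dot(xd, xd)
    sx, su = sqrt(dot(xd, xd) / n), sqrt(dot(ud, ud) / n)
    rho = dot(xd, ud) / (n * sx * su)
    return slope, beta1 + rho * su / sx


slope_high, formula_high = simulate(a=0.9)   # X strongly related to omitted W
slope_low, formula_low = simulate(a=0.05)    # X almost unrelated to omitted W
print(slope_high, slope_low)
```

With the weak link between X and the omitted variable, the estimated slope stays close to the assumed true value of 1.0, while the strong link pushes it far away.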
11) It is not hard, but tedious, to derive the OLS formulae for the slope coefficient in the multiple regression case
with two explanatory variables. The formula for the first regression slope is
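\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}
\]
Answer: Divide each of the four terms by \(\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2\)
to get
\[
\hat{\beta}_1 = \frac{\dfrac{\sum y_i x_{1i}}{\sum x_{1i}^2} - \dfrac{\sum y_i x_{2i}}{\sum x_{2i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}}{1 - \dfrac{\sum x_{1i} x_{2i}}{\sum x_{1i}^2}\,\dfrac{\sum x_{1i} x_{2i}}{\sum x_{2i}^2}}
= \frac{\hat{\beta}_{yx_1} - \hat{\beta}_{yx_2}\hat{\beta}_{x_2 x_1}}{1 - \hat{\beta}_{x_2 x_1}\hat{\beta}_{x_1 x_2}},
\]
where \(\hat{\beta}_{ab}\) denotes the slope from a simple regression of a on b.
Now if \(\sum_{i=1}^{n} x_{1i} x_{2i} = 0\), so that \(\hat{\beta}_{x_1 x_2} = \hat{\beta}_{x_2 x_1} = 0\), then
\[
\hat{\beta}_1 = \frac{\hat{\beta}_{yx_1}}{1} = \hat{\beta}_{yx_1}.
\]
In that case, omitting the second explanatory variable will
have no effect on the coefficient which indicates the effect of a change in the included variable on the
dependent variable. However, you also do not observe the effect that a change in the omitted variable
has on the dependent variable.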
12) (Requires Statistics background beyond Chapters 2 and 3) One way to establish whether or not there is
independence between two or more variables is to perform a \(\chi^2\) test on independence between two variables.
Explain why multiple regression analysis is a preferable tool to seek a relationship between variables.
Answer: The \(\chi^2\) test can only establish whether or not a relationship between variables exists, but it cannot tell
the researcher anything about the effect of a unit change in X on Y. If the researcher is interested in the
quantitative information, then she must use a multiple regression framework. The textbook example on
student performance can be used here for an explanation.
13) In the multiple regression with two explanatory variables, show that the TSS can still be decomposed into the
ESS and the RSS.
Answer: The proof proceeds along the same line as in the case of a single explanatory variable. The sample
regression function is given by
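\[
Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{u}_i
\]
The average is therefore
\[
\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_1 + \hat{\beta}_2 \bar{X}_2
\]
since the first order condition has \(\sum_{i=1}^{n} \hat{u}_i = 0\). Subtracting the second equation from the first and letting
small letters indicate deviations from mean, results in
\[
y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{u}_i \quad \text{or} \quad y_i = \hat{y}_i + \hat{u}_i.
\]
Squaring both sides and summing over all observations gives
\[
\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} \hat{y}_i^2 + \sum_{i=1}^{n} \hat{u}_i^2 + 2 \sum_{i=1}^{n} \hat{y}_i \hat{u}_i .
\]
The last term is zero, since \(\sum \hat{y}_i \hat{u}_i = \hat{\beta}_1 \sum x_{1i} \hat{u}_i + \hat{\beta}_2 \sum x_{2i} \hat{u}_i\) and the first order conditions imply
\(\sum x_{1i} \hat{u}_i = \sum x_{2i} \hat{u}_i = 0\). Hence TSS = ESS + RSS.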
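The decomposition can be confirmed numerically. The following is a minimal sketch with hypothetical data, using the two-regressor slope formulas in deviations from means; the variable names are assumptions.

```python
def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# Hypothetical data, chosen only for illustration.
Y = [3.0, 5.0, 4.0, 8.0, 9.0, 7.0]
X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0, 2.0]

y, x1, x2 = dev(Y), dev(X1), dev(X2)
den = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
b1 = (dot(y, x1) * dot(x2, x2) - dot(y, x2) * dot(x1, x2)) / den
b2 = (dot(y, x2) * dot(x1, x1) - dot(y, x1) * dot(x1, x2)) / den

y_hat = [b1 * a + b2 * b for a, b in zip(x1, x2)]  # fitted values, in deviations
u_hat = [yi - fi for yi, fi in zip(y, y_hat)]      # residuals

tss = dot(y, y)
ess = dot(y_hat, y_hat)
rss = dot(u_hat, u_hat)
cross = dot(y_hat, u_hat)   # the cross term that should vanish
print(tss, ess + rss, cross)
```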
14) The OLS formulas for the slope coefficients in the multiple regression model become increasingly more
complicated, using the sums expressions, as you add more regressors. For example, in the regression with a
single explanatory variable, the formula is
\[
\frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
\]
whereas this formula for the slope of the first explanatory variable is
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}
\]
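To minimize the sum of squared prediction mistakes \(\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2\),
you need to take the following three derivatives with respect to b0 , b1 and b2 . This results in
\[
\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})
\]
\[
\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{1i}
\]
\[
\frac{\partial}{\partial b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{2i}
\]
The OLS estimators are those for which the derivatives are zero. Hence we get
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0;
\]
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) X_{1i} = 0;
\]
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) X_{2i} = 0.
\]
Solving these gives \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2\) and
\[
\sum_{i=1}^{n} Y_i X_{1i} = \hat{\beta}_0 n \bar{X}_1 + \hat{\beta}_1 \sum_{i=1}^{n} X_{1i}^2 + \hat{\beta}_2 \sum_{i=1}^{n} X_{2i} X_{1i}
\]
\[
\sum_{i=1}^{n} Y_i X_{2i} = \hat{\beta}_0 n \bar{X}_2 + \hat{\beta}_2 \sum_{i=1}^{n} X_{2i}^2 + \hat{\beta}_1 \sum_{i=1}^{n} X_{2i} X_{1i}
\]
After substituting the result for \(\hat{\beta}_0\) into the last two equations, these have only two unknowns remaining,
namely \(\hat{\beta}_1\) and \(\hat{\beta}_2\). Letting small letters indicate deviations from mean, you get
\[
\sum_{i=1}^{n} y_i x_{1i} = \hat{\beta}_1 \sum_{i=1}^{n} x_{1i}^2 + \hat{\beta}_2 \sum_{i=1}^{n} x_{2i} x_{1i}
\]
\[
\sum_{i=1}^{n} y_i x_{2i} = \hat{\beta}_1 \sum_{i=1}^{n} x_{1i} x_{2i} + \hat{\beta}_2 \sum_{i=1}^{n} x_{2i}^2
\]
Solving these two equations for \(\hat{\beta}_1\) and \(\hat{\beta}_2\) results in
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}
\]
\[
\hat{\beta}_2 = \frac{\sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}.
\]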
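The closed-form solution can be checked against the three first order conditions. The sketch below uses hypothetical data; `ols_two_regressors` is an illustrative name, and the formulas are the intercept and slope expressions for the two-regressor model in deviations from means.

```python
def dev(v):
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def ols_two_regressors(Y, X1, X2):
    """Return (b0, b1, b2) for a regression of Y on a constant, X1 and X2."""
    y, x1, x2 = dev(Y), dev(X1), dev(X2)
    den = dot(x1, x1) * dot(x2, x2) - dot(x1, x2) ** 2
    b1 = (dot(y, x1) * dot(x2, x2) - dot(y, x2) * dot(x1, x2)) / den
    b2 = (dot(y, x2) * dot(x1, x1) - dot(y, x1) * dot(x1, x2)) / den
    n = len(Y)
    b0 = sum(Y) / n - b1 * sum(X1) / n - b2 * sum(X2) / n
    return b0, b1, b2


# Hypothetical data.
Y = [4.0, 7.0, 6.0, 10.0, 12.0, 9.0, 15.0]
X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
X2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0]

b0, b1, b2 = ols_two_regressors(Y, X1, X2)
u_hat = [yi - b0 - b1 * a - b2 * b for yi, a, b in zip(Y, X1, X2)]

# The three first order conditions: residuals sum to zero and are
# orthogonal to each regressor.
foc = (sum(u_hat), dot(u_hat, X1), dot(u_hat, X2))
print(foc)
```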
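16) (Requires Calculus) For the simple linear regression model of Chapter 4, \(Y_i = \beta_0 + \beta_1 X_i + u_i\), the OLS estimator
for the intercept was \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\), and \(\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}\). Intuitively, the OLS estimators for the regression
model \(Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i\) might be
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2, \quad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_{1i} Y_i - n \bar{X}_1 \bar{Y}}{\sum_{i=1}^{n} X_{1i}^2 - n \bar{X}_1^2} \quad \text{and} \quad
\hat{\beta}_2 = \frac{\sum_{i=1}^{n} X_{2i} Y_i - n \bar{X}_2 \bar{Y}}{\sum_{i=1}^{n} X_{2i}^2 - n \bar{X}_2^2}.
\]
By minimizing the prediction mistakes of the regression model with two explanatory
variables, show that this cannot be the case.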
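Answer: To minimize the sum of squared prediction mistakes
\[
\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2
\]
you need to take the following three derivatives with respect to b0 , b1 and b2 . This results in
\[
\frac{\partial}{\partial b_0} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})
\]
\[
\frac{\partial}{\partial b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{1i}
\]
\[
\frac{\partial}{\partial b_2} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i})^2 = -2 \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - b_2 X_{2i}) X_{2i}
\]
The OLS estimators are those for which the derivatives are zero. Hence we get
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) = 0;
\]
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) X_{1i} = 0;
\]
\[
-2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_{1i} - \hat{\beta}_2 X_{2i}) X_{2i} = 0,
\]
or
\[
\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2
\]
\[
\sum_{i=1}^{n} Y_i X_{1i} = \hat{\beta}_0 n \bar{X}_1 + \hat{\beta}_1 \sum_{i=1}^{n} X_{1i}^2 + \hat{\beta}_2 \sum_{i=1}^{n} X_{2i} X_{1i}
\]
\[
\sum_{i=1}^{n} Y_i X_{2i} = \hat{\beta}_0 n \bar{X}_2 + \hat{\beta}_2 \sum_{i=1}^{n} X_{2i}^2 + \hat{\beta}_1 \sum_{i=1}^{n} X_{2i} X_{1i}
\]
The first expression gives \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2\). However, the second (third) expression involves terms in X2i (X1i), hence the
formula cannot be simplified to \(\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} X_{1i} Y_i - n \bar{X}_1 \bar{Y}}{\sum_{i=1}^{n} X_{1i}^2 - n \bar{X}_1^2}\) \(\left(\hat{\beta}_2 = \dfrac{\sum_{i=1}^{n} X_{2i} Y_i - n \bar{X}_2 \bar{Y}}{\sum_{i=1}^{n} X_{2i}^2 - n \bar{X}_2^2}\right)\) unless special
conditions hold (such as \(\sum_{i=1}^{n} X_{2i} X_{1i} = 0\)).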
17) Your textbook extends the simple regression analysis of Chapters 4 and 5 by adding an additional explanatory
variable, the percent of English learners in school districts (PctEL). The results are as follows:
TestScore = 698.9 - 2.28 STR
and
TestScore = 698.0 - 1.10 STR - 0.65 PctEL
Explain why you think the coefficient on the student-teacher ratio has changed so dramatically (been more
than halved).
Answer: This is a good example of omitted variable bias. The previously excluded variable, the percent of English
learners, not only seems to matter and to be economically important in the determination of test scores,
but is also correlated with the student-teacher ratio (recall that the student-teacher
ratio and the percent of English learners have a positive correlation coefficient of almost 0.20). As a
result, there will be omitted variable bias if you regress the test scores on the student-teacher ratios only.
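18) (Requires some Calculus) Consider the sample regression function
\(\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}\). Take the total derivative. Next show that the partial derivative
\(\partial Y_i / \partial X_{1i}\) is obtained by
\((\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}) =\)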
19) (Requires Appendix material) Consider the following population regression function model with two
^
explanatory variables: Yi = 0 +
following formula:
^ = 1
1 X1i +
1
n
1-
2
x 1 ,x 2
and X2i.
Answer: The answer should look something like this:
20) For this question, use the California Testscore Data Set and your regression package (a spreadsheet program if
necessary). First perform a multiple regression of testscores on a constant, the student-teacher ratio, and the
percent of English learners. Record the coefficients. Next, do the following three step procedure instead: first,
regress the testscore on a constant and the percent of English learners. Calculate the residuals and store them
under the name resYX2. Second, regress the student-teacher ratio on a constant and the percent of English
learners. Calculate the residuals from this regression and store these under the name resX1X2. Finally regress
resYX2 on resX1X2 (and a constant, if you wish). Explain intuitively why the simple regression coefficient in the
last regression is identical to the regression coefficient on the student-teacher ratio in the multiple regression.
Answer: This three step procedure actually explains how OLS controls for the influence of other variables. In the
first step, OLS removes the linear influence of the percent of English learners from the dependent
variable. The residuals from that regression represent the left-over of the testscores that the percent of
English learners could not explain (purified testscores; think of a filter removing some of the
elements). The same explanation holds for the second regression: the student-teacher ratio is purified (if
the percent of English learners actually has an influence on student-teacher ratios). In the final step,
you regress the two purified variables on each other.
21) Assume that you have collected cross-sectional data for average hourly earnings (ahe), the number of years of
education (educ) and gender of the individuals (you have coded individuals as 1 if they are female and 0 if
they are male; the name of the resulting variable is DFemme).
Having faced recent tuition hikes at your university, you are interested in the return to education, that is, how
much more will you earn extra for an additional year of being at your institution. To investigate this question,
you run the following regression:
ahe = -4.58 + 1.71educ
N = 14,925, R2 = 0.18, SER = 9.30
a.
b.
Being a female, you wonder how these results are affected if you entered a binary variable (DFemme),
which takes on the value of 1 if the individual is a female, and is 0 for males. The result is as
follows:
ahe = -3.44 - 4.09DFemme + 1.76educ
N = 14,925, R2 = 0.22, SER = 9.08
Does it make sense that the standard error of the regression decreased while the regression R2
increased?
c.
Do you think that the regression you estimated first suffered from omitted variable bias?
Answer: a. For every additional year of education, you receive $1.71 in additional hourly earnings, on average. It is best not to interpret
the intercept, since there are no (or extremely few) observations at the origin.
b. The regression R2 cannot decrease if you add an explanatory variable. If the additional variable does
not contribute anything to the fit, then this measure will remain the same. However, in practice, this
does not happen. The standard error of the regression is based on the SSR, which cannot increase
with the addition of an explanatory variable; with a sample this large, the degrees of freedom adjustment is
negligible, so the SER will almost always decrease. As a result, the observed pattern in the two statistics is to be
expected.
c. There are two conditions for omitted variable bias to be present. First, DFemme must be a determinant
of ahe; and second, it must be correlated with educ. Given that you have not learned how to test for
statistical significance in the multiple regression model, the first question is hard to determine at this
point. However, you might argue that the coefficient seems large and that you have read elsewhere that
there is evidence of females earning less using this type of equation. With regard to the second question,
you could argue that the coefficient on educ has changed somewhat, although the increase does not seem
to be large ($0.05). For there to be a correlation between education and the binary female variable, you
would have to argue that males and females receive different years of education. Either way, the omitted variable
bias in the first equation does not appear to be large.
22) You have collected data on individuals and their attributes. Consequently you have generated several binary
variables, which take on a value of 1 if the individual has that characteristic and are 0 otherwise. One
example is the binary variable DMarr which is 1 for married individuals and 0 for non -married variables.
If you run the following regression:
ahei= 0 + 1 educi + 2 DMarri + ui
a.
b.
You are interested in directly observing the effect that being non -married (single) has on
earnings, controlling for years of education. Instead of recoding all observations such that they are
1 for a not married individual and 0 for a married person, how can you generate such a
variable (DSingle) through a simple command in your regression program?
Answer: a. The coefficient will tell you by how much, on average, a married person's average hourly earnings
differ from those of a non-married person, holding years of education constant.
b. gen DSingle = 1 - DMarr (Stata); genr DSingle = 1 - DMarr (EViews)
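The same recoding can be sketched outside Stata or EViews. The Python snippet below is only an illustration: the list d_marr holds made-up observations and is not part of the question.

```python
# Hypothetical marital-status dummy: 1 = married, 0 = not married.
d_marr = [1, 0, 0, 1, 1, 0]

# DSingle is the complement of DMarr, mirroring "gen DSingle = 1 - DMarr".
d_single = [1 - m for m in d_marr]

print(d_single)
```

No observation needs to be recoded by hand; the new variable is a deterministic function of the old one.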
23) Consider the following earnings function:
$ahe_i = \beta_0 + \beta_1 DFemme_i + \beta_2 educ_i + \dots + u_i$
versus the alternative specification
$ahe_i = \gamma_0 DMale_i + \gamma_1 DFemme_i + \gamma_2 educ_i + \dots + u_i$
where ahe is average hourly earnings, DFemme is a binary variable which takes on the value of 1 if
the individual is a female and is 0 otherwise, educ measures the years of education, and DMale is a
binary variable which takes on the value of 1 if the individual is a male and is 0 otherwise. There
may be additional explanatory variables in the equation.
a. How do the $\beta$s and $\gamma$s compare? Putting it differently, having estimated the coefficients in the first
equation, can you derive the coefficients in the second equation without re-estimating the
regression?
b. Will the goodness-of-fit measures, such as the regression $R^2$, differ between the two equations?
c. What is the reason why economists typically prefer the first specification over the second?
Answer: a. $\gamma_0 = \beta_0$; $\gamma_1 = \beta_0 + \beta_1$; $\gamma_2 = \beta_2$
b. The regression $R^2$ will be identical, as will be the standard error of the regression.
c. The first equation allows you to consider the difference between earnings of two sub-groups directly,
through the coefficient on DFemme. Economists are often interested in testing for such differences, rather
than in finding the average level of earnings.
24) You would like to find the effect of gender and marital status on earnings. As a result, you consider running the
following regression:
$ahe_i = \beta_0 + \beta_1 DFemme_i + \beta_2 DMarr_i + \beta_3 DSingle_i + \beta_4 educ_i + \dots + u_i$
where ahe is average hourly earnings, DFemme is a binary variable which takes on the value of 1 if
the individual is a female and is 0 otherwise, DMarr is a binary variable which takes on the value of
1 if the individual is married and is 0 otherwise, DSingle takes on the value of 1 if the individual
is not married and is 0 otherwise. The regression program which you are using either returns a
message that the equation cannot be estimated or drops one of the coefficients. Why do you think that
is?
Answer: There is perfect multicollinearity here (the dummy variable trap): DMarr + DSingle = 1 for every
individual, which duplicates the constant regressor. You need to drop either DMarr or DSingle.
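The dummy variable trap can be made concrete with a small numerical check. The sketch below uses made-up data and, for brevity, only three regressor columns at a time (the helper gram_determinant_3 is hypothetical and handles the 3x3 case only). When the constant, DMarr and DSingle are all included, the determinant of X'X is zero, so the OLS normal equations have no unique solution.

```python
def gram_determinant_3(columns):
    """Return det(X'X) for a design matrix given as exactly three columns."""
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]
    (a, b, c), (d, e, f), (g, h, i) = gram
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Made-up sample of eight individuals.
d_marr = [1, 0, 1, 1, 0, 0, 1, 0]
d_single = [1 - m for m in d_marr]
educ = [12, 16, 12, 14, 18, 12, 16, 13]
const = [1] * len(d_marr)

# Constant + DMarr + DSingle: the columns are linearly dependent.
det_trap = gram_determinant_3([const, d_marr, d_single])

# Constant + DMarr + educ: no exact linear dependence.
det_ok = gram_determinant_3([const, d_marr, educ])

print(det_trap, det_ok)
```

Dropping either dummy (or the constant) removes the exact linear dependence.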
$\beta_3 / \beta_2 = 1$ and $\beta_4 = 0$
C) $H_0: \beta_2 = 0$ and $\beta_3 = 0$
D) $H_0: \beta_1 = -\beta_2$ and $\beta_1 + \beta_2 = 1$
Answer: A
5) When testing a joint hypothesis, you should
A) use t-statistics for each hypothesis and reject the null hypothesis if all of the restrictions fail.
B) use the F-statistic and reject all the hypotheses if the statistic exceeds the critical value.
C) use t-statistics for each hypothesis and reject the null hypothesis once the statistic exceeds the critical
value for a single hypothesis.
D) use the F-statistic and reject at least one of the hypotheses if the statistic exceeds the critical value.
Answer: D
6) The overall regression F-statistic tests the null hypothesis that
A) all slope coefficients are zero.
B) all slope coefficients and the intercept are zero.
C) the intercept in the regression and at least one, but not all, of the slope coefficients is zero.
D) the slope coefficient of the variable of interest is zero, but that the other slope coefficients are not.
Answer: A
$\dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{restricted}/(n - k_{unrestricted} - 1)}$
C) $F = \dfrac{(SSR_{unrestricted} - SSR_{restricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
D) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/(q - 1)}{SSR_{unrestricted}/(n - k_{unrestricted})}$
Answer: A
9) All of the following are correct formulae for the homoskedasticity-only F-statistic, with the exception of
A) $F = \dfrac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
B) $F = \dfrac{(SSR_{unrestricted} - SSR_{restricted})/q}{SSR_{restricted}/(n - k_{restricted} - 1)}$
C) F=
q
SSRunrestricted
D) $F = \left(\dfrac{SSR_{restricted}}{SSR_{unrestricted}} - 1\right) \dfrac{n - k_{unrestricted} - 1}{q}$
Answer: B
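A quick numerical check can help in recognizing equivalent forms of the homoskedasticity-only F-statistic. The Python sketch below compares the textbook SSR form with the algebraically rearranged ratio form; the function names and the input values are hypothetical and chosen only to keep the arithmetic simple.

```python
def f_homoskedasticity_only(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic from restricted and unrestricted SSRs."""
    numerator = (ssr_restricted - ssr_unrestricted) / q
    denominator = ssr_unrestricted / (n - k_unrestricted - 1)
    return numerator / denominator


def f_ratio_form(ssr_restricted, ssr_unrestricted, q, n, k_unrestricted):
    """Equivalent form: (SSR_r / SSR_u - 1) * (n - k_u - 1) / q."""
    return (ssr_restricted / ssr_unrestricted - 1) * (n - k_unrestricted - 1) / q


# Hypothetical inputs: 103 observations, 2 regressors, 2 restrictions.
args = dict(ssr_restricted=1200.0, ssr_unrestricted=1000.0,
            q=2, n=103, k_unrestricted=2)
f_standard = f_homoskedasticity_only(**args)
f_ratio = f_ratio_form(**args)

print(f_standard, f_ratio)
```

Because imposing restrictions can never lower the SSR, any version that subtracts the restricted SSR from the unrestricted SSR in the numerator yields a non-positive number and cannot be a valid F-statistic.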
10) In the multiple regression model, the t-statistic for testing that the slope is significantly different from zero is
calculated
A) by dividing the estimate by its standard error.
B) from the square root of the F-statistic.
C) by multiplying the p-value by 1.96.
D) using the adjusted R2 and the confidence interval.
Answer: A
11) To test joint linear hypotheses in the multiple regression model, you need to
A) compare the sums of squared residuals from the restricted and unrestricted model.
B) use the heteroskedasticity-robust F-statistic.
C) use several t-statistics and perform tests using the standard normal distribution.
D) compare the adjusted R2 for the model which imposes the restrictions, and the unrestricted model.
Answer: B
$\dfrac{(1 - R^2_{unrestricted})/q}{R^2_{unrestricted}/(n - k_{unrestricted} - 1)}$
C) $F = \dfrac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{restricted} - 1)}$
D) $F = \dfrac{(R^2_{unrestricted} - R^2_{unrestricted})/q}{(1 - R^2_{unrestricted})/(n - k_{restricted} - 1)}$
Answer: A
13) Let $R^2_{unrestricted}$ and $R^2_{restricted}$ be 0.4366 and 0.4149 respectively. The difference between the unrestricted
and the restricted model is that you have imposed two restrictions. There are 420 observations. The F-statistic
in this case is
A) 4.61
B) 8.01
C) 10.34
D) 7.71
Answer: B
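The answer can be reproduced with the $R^2$ form of the homoskedasticity-only F-statistic. The question does not state the number of regressors in the unrestricted model; the sketch below assumes $k_{unrestricted} = 3$, which is the value that yields the stated answer. The function name is hypothetical.

```python
def f_from_r_squared(r2_unrestricted, r2_restricted, q, n, k_unrestricted):
    """Homoskedasticity-only F-statistic written in terms of the two R-squareds."""
    numerator = (r2_unrestricted - r2_restricted) / q
    denominator = (1 - r2_unrestricted) / (n - k_unrestricted - 1)
    return numerator / denominator


# k_unrestricted = 3 is an assumption; the question leaves it unstated.
f_stat = f_from_r_squared(0.4366, 0.4149, q=2, n=420, k_unrestricted=3)
print(round(f_stat, 2))
```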
14) If you wanted to test, using a 5% significance level, whether or not a specific slope coefficient is equal to one,
then you should
A) subtract 1 from the estimated coefficient, divide the difference by the standard error, and check if the
resulting ratio is larger than 1.96.
B) add and subtract 1.96 from the slope and check if that interval includes 1.
C) see if the slope coefficient is between 0.95 and 1.05.
D) check if the adjusted R2 is close to 1.
Answer: A
15) If the absolute value of your calculated t-statistic exceeds the critical value from the standard normal
distribution you can
A) safely assume that your regression results are significant.
B) reject the null hypothesis.
C) reject the assumption that the error terms are homoskedastic.
D) conclude that most of the actual values are very close to the regression line.
Answer: B
16) If you reject a joint null hypothesis using the F-test in a multiple hypothesis setting, then
A) a series of t-tests may or may not give you the same conclusion.
B) the regression is always significant.
C) all of the hypotheses are always simultaneously rejected.
D) the F-statistic must be negative.
Answer: A
17) When your multiple regression function includes a single omitted variable regressor, then
A) use a two-sided alternative hypothesis to check the influence of all included variables.
B) the estimator for your included regressors will be biased if at least one of the included variables is
correlated with the omitted variable.
C) the estimator for your included regressors will always be biased.
D) lower the critical value to 1.645 from 1.96 in a two-sided alternative hypothesis to test the significance of
the coefficients of the included variables.
Answer: B
18) A 95% confidence set for two or more coefficients is a set that contains
A) the sample values of these coefficients in 95% of randomly drawn samples.
B) integer values only.
C) the same values as the 95% confidence intervals constructed for the coefficients.
D) the population values of these coefficients in 95% of randomly drawn samples.
Answer: D
19) When there are two coefficients, the resulting confidence sets are
A) rectangles.
B) ellipses.
C) squares.
D) trapezoids.
Answer: B
20) When testing the null hypothesis that two regression slopes are zero simultaneously, then you cannot reject the
null hypothesis at the 5% level, if the ellipse contains the point
A) (-1.96, 1.96).
B) (0, 1.96) .
C) (0,0).
D) ($1.96^2$, $1.96^2$).
Answer: C
21) The OLS estimators of the coefficients in multiple regression will have omitted variable bias
A) only if an omitted determinant of Yi is a continuous variable.
B) if an omitted variable is correlated with at least one of the regressors, even though it is not a determinant
of the dependent variable.
C) only if the omitted variable is not normally distributed.
D) if an omitted determinant of Yi is correlated with at least one of the regressors.
Answer: D
22) At a mathematical level, if the two conditions for omitted variable bias are satisfied, then
A) $E(u_i \mid X_{1i}, X_{2i}, \dots, X_{ki}) \neq 0$.
B) there is perfect multicollinearity.
C) large outliers are likely: $X_{1i}, X_{2i}, \dots, X_{ki}$ and $Y_i$ have infinite fourth moments.
D) $(X_{1i}, X_{2i}, \dots, X_{ki}, Y_i)$, $i = 1, \dots, n$ are not i.i.d. draws from their joint distribution.
Answer: A
23) All of the following are true, with the exception of one condition:
A) a high $R^2$ or $\bar{R}^2$ does not mean that the regressors are a true cause of the dependent variable.
B) a high $R^2$ or $\bar{R}^2$ does not mean that there is no omitted variable bias.
C) a high $R^2$ or $\bar{R}^2$ always means that an added variable is statistically significant.
D) a high $R^2$ or $\bar{R}^2$ does not necessarily mean that you have the most appropriate set of regressors.
Answer: C
Stock/Watson 2e -- CVC2 8/23/06 -- Page 157
24) The general answer to the question of choosing the scale of the variables is
A) dependent on your whim.
B) to make the regression results easy to read and to interpret.
C) to ensure that the regression coefficients always lie between -1 and 1.
D) irrelevant because regardless of the scale of the variable, the regression coefficient is unaffected.
Answer: B
25) If the estimates of the coefficients of interest change substantially across specifications,
A) then this can be expected from sample variation.
B) then you should change the scale of the variables to make the changes appear to be smaller.
C) then this often provides evidence that the original specification had omitted variable bias.
D) then choose the specification for which your coefficient of interest is most significant.
Answer: C
26) You have estimated the relationship between test scores and the student-teacher ratio under the assumption of
homoskedasticity of the error terms. The regression output is as follows: TestScore = 698.9 - 2.28 STR, and the
standard error on the slope is 0.48. The homoskedasticity-only overall regression F-statistic for the
hypothesis that the regression $R^2$ is zero is approximately
A) 0.96
B) 1.96
C) 22.56
D) 4.75
Answer: C
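With a single regressor, the homoskedasticity-only overall F-statistic equals the square of the slope's t-statistic. The short Python sketch below reproduces the arithmetic; the helper name t_statistic is hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """t-statistic for H0: coefficient = hypothesized value."""
    return (estimate - hypothesized) / std_error


t_slope = t_statistic(-2.28, 0.48)   # slope and standard error from the question
f_overall = t_slope ** 2             # single restriction, so F = t^2

print(round(t_slope, 2), round(f_overall, 2))
```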
27) Consider a regression with two variables, in which X1i is the variable of interest and X2i is the control variable.
Conditional mean independence requires
A) E(ui|X1i, X2i) = E(ui|X2i)
B) E(ui|X1i, X2i) = E(ui|X1i)
C) E(ui|X1i) = E(ui|X2i)
D) E(ui) = E(ui|X2i)
Answer: A
28) The homoskedasticity-only F-statistic and the heteroskedasticity-robust F-statistic typically are
A) the same
B) different
C) related by a linear function
D) a multiple of each other (the heteroskedasticity-robust F-statistic is 1.96 times the homoskedasticity-only
F-statistic)
Answer: B
29) Consider the following regression output where the dependent variable is test scores and the two explanatory
variables are the student-teacher ratio and the percent of English learners:
TestScore = 698.9 - 1.10STR - 0.650PctEL. You are told that the t-statistic on the student-teacher ratio
coefficient is 2.56. The standard error therefore is approximately
A) 0.25
B) 1.96
C) 0.650
D) 0.43
Answer: D
1) The F-statistic with q = 2 restrictions when testing for the restrictions $\beta_1 = 0$ and $\beta_2 = 0$ is given by the
following formula:
$F = \dfrac{1}{2} \left( \dfrac{t_1^2 + t_2^2 - 2 \hat{\rho}_{t_1,t_2} t_1 t_2}{1 - \hat{\rho}^2_{t_1,t_2}} \right)$
$\dfrac{1}{2}(t_1^2 + t_2^2)$. The $F_{2,\infty}$ distribution is the
distribution of a random variable with a chi-squared distribution with 2 degrees of freedom, divided by
2. Equivalently, the $F_{2,\infty}$ distribution is the distribution of the average of 2 squared standard normal
random variables. Because the t-statistics are uncorrelated by assumption, they are independent
standard normal random variables under the null hypothesis. If either $\beta_1$ or $\beta_2$ is nonzero (or both),
then either $t_1^2$ or $t_2^2$ or both will be large. This leads to a large F-statistic, and hence a rejection of the
null hypothesis.
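The role of the correlation between the two t-statistics can be seen by evaluating the formula numerically. The sketch below is an illustration only; the function name and the input values are hypothetical.

```python
def f_two_restrictions(t1, t2, rho):
    """F-statistic for q = 2 from two t-statistics and their estimated correlation."""
    if abs(rho) >= 1:
        raise ValueError("rho must lie strictly between -1 and 1")
    return 0.5 * (t1 ** 2 + t2 ** 2 - 2 * rho * t1 * t2) / (1 - rho ** 2)


# Uncorrelated t-statistics: F is simply the average of the squared t-statistics.
f_uncorrelated = f_two_restrictions(2.0, 1.0, rho=0.0)

# Positively correlated t-statistics of the same sign give a smaller F here.
f_correlated = f_two_restrictions(2.0, 1.0, rho=0.5)

print(f_uncorrelated, f_correlated)
```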
2) The cost of attending your college has once again gone up. Although you have been told that education is
investment in human capital, which carries a return of roughly 10% a year, you (and your parents) are not
pleased. One of the administrators at your university/college does not make the situation better by telling you
that you pay more because the reputation of your institution is better than that of others. To investigate this
hypothesis, you collect data randomly for 100 national universities and liberal arts colleges from the 2000 -2001
U.S. News and World Report annual rankings. Next you perform the following regression
(c) You want to test simultaneously the hypotheses that $\beta_{Size} = 0$ and $\beta_{Dlibart} = 0$. Your regression package
returns the F-statistic of 1.23. Can you reject the null hypothesis?
(d) Eliminating the Size and Dlibart variables from your regression, the estimated regression becomes
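$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$

$\hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{1i} x_{2i} \right)^2}$

$\hat{\beta}_2 = \dfrac{\sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i}^2 - \sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left( \sum_{i=1}^{n} x_{1i} x_{2i} \right)^2}$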
You have collected data for 104 countries of the world from the Penn World Tables and want to estimate the
effect of the population growth rate (X1i) and the saving rate (X2i) (average investment share of GDP from
1980 to 1990) on GDP per worker (relative to the U.S.) in 1990. The various sums needed to calculate the OLS
estimates are given below:
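$\sum_{i=1}^{n} Y_i = 33.33$; $\sum_{i=1}^{n} X_{1i} = 2.025$; $\sum_{i=1}^{n} X_{2i} = 17.313$
$\sum_{i=1}^{n} y_i^2 = 8.3103$; $\sum_{i=1}^{n} x_{1i}^2 = 0.0122$; $\sum_{i=1}^{n} x_{2i}^2 = 0.6422$
$\sum_{i=1}^{n} y_i x_{1i} = -0.2304$; $\sum_{i=1}^{n} y_i x_{2i} = 1.5676$; $\sum_{i=1}^{n} x_{1i} x_{2i} = -0.0520$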
The heteroskedasticity-robust standard errors of the two slope coefficients are 1.99 (for population growth)
and 0.23 (for the saving rate). Calculate the 95% confidence interval for both coefficients. How many standard
deviations are the coefficients away from zero?
Answer: The OLS slope estimates are -12.95 (population growth) and 1.39 (saving rate). The 95% confidence interval
for the population growth is (-16.85, -9.05), and the 95% confidence interval
for the saving rate is (0.94, 1.84). The population growth coefficient has a t-statistic of -6.51, and the
saving rate coefficient of 6.04. These t-statistics measure how many standard deviations each coefficient is away from zero.
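The calculation can be checked with a few lines of Python. The sketch below plugs the sums from the question into the two-regressor OLS slope formulas (lowercase sums are in deviations from means) and then builds the 95% confidence intervals from the given robust standard errors. The function names are hypothetical.

```python
def two_regressor_slopes(s_yx1, s_yx2, s_x1x1, s_x2x2, s_x1x2):
    """OLS slopes for a two-regressor model from sums of deviations from means."""
    denom = s_x1x1 * s_x2x2 - s_x1x2 ** 2
    b1 = (s_yx1 * s_x2x2 - s_yx2 * s_x1x2) / denom
    b2 = (s_yx2 * s_x1x1 - s_yx1 * s_x1x2) / denom
    return b1, b2


def confidence_interval(estimate, std_error, critical_value=1.96):
    """Large-sample confidence interval: estimate +/- critical value * SE."""
    half_width = critical_value * std_error
    return estimate - half_width, estimate + half_width


# Sums and standard errors as given in the question.
b_pop, b_sav = two_regressor_slopes(-0.2304, 1.5676, 0.0122, 0.6422, -0.0520)
ci_pop = confidence_interval(b_pop, 1.99)
ci_sav = confidence_interval(b_sav, 0.23)
t_pop, t_sav = b_pop / 1.99, b_sav / 0.23

print(round(b_pop, 2), round(b_sav, 2))
print(ci_pop, ci_sav)
print(round(t_pop, 2), round(t_sav, 2))
```

Computing the t-statistic for the saving rate from the unrounded estimate gives a value slightly above the 6.04 obtained from the rounded coefficient of 1.39.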
4) A subsample from the Current Population Survey is taken, on weekly earnings of individuals, their age, and
their gender. You have read in the news that women make 70 cents to the $1 that men earn. To test this
hypothesis, you first regress earnings on a constant and a binary variable, which takes on a value of 1 for
females and is 0 otherwise. The results were:
Earn = 570.70 - 170.72 Female, $R^2$ = 0.084, SER = 282.12.
(9.44) (13.52)
(a) Perform a difference in means test and indicate whether or not the difference in the mean salaries is
significantly different. Justify your choice of a one-sided or two-sided alternative test. Are these results
evidence enough to argue that there is discrimination against females? Why or why not? Is it likely that the
errors are normally distributed in this case? If not, does that present a problem to your test?
(b) Test for the significance of the age and gender coefficients. Why do you think that age plays a role in
earnings determination?
Answer: (a) The t-statistic is -12.63, while the critical value is 1.64. The difference is therefore statistically
significant. A one-sided alternative was chosen since the claim is that females make less than males. This
represents little evidence of discrimination, since attributes of males and females have not been included.
Given that earnings distributions are not normally distributed, the errors will also not be distributed
normally, and assuming that they are, results in problematic inference.
(b) The t-statistics are 9.36 for the age coefficient, and -13.00 for the gender coefficient. Both of these
values are greater than the (absolute) critical value from the standard normal distribution (1.64). Hence
you can reject the null hypothesis that these coefficients are zero. Age proxies for on-the-job training. A
better proxy that has been used frequently in the past is the Mincer experience variable
(Age-Education-6). Obviously this is a better proxy for some subsample of individuals than for others.
5) You have collected data from Major League Baseball (MLB) to find the determinants of winning. You have a
general idea that both good pitching and strong hitting are needed to do well. However, you do not know how
much each of these contributes separately. To investigate this problem, you collect data for all MLB teams during the
1999 season. Your strategy is to first regress the winning percentage on pitching quality (Team ERA), second
to regress the same variable on some measure of hitting (OPS, On-base Plus Slugging percentage), and
third to regress the winning percentage on both.
Summary of the Distribution of Winning Percentage, On Base plus Slugging Percentage,
and Team Earned Run Average for MLB in 1999

                     Average  Standard   Percentile
                              deviation  10%    25%    40%    50%       60%    75%    90%
                                                              (median)
Team ERA             4.71     0.53       3.84   4.35   4.72   4.78      4.91   5.06   5.25
OPS                  0.778    0.034      0.720  0.754  0.769  0.780     0.790  0.798  0.820
Winning Percentage   0.50     0.08       0.40   0.43   0.46   0.48      0.49   0.59   0.60
6) In the process of collecting weight and height data from 29 female and 81 male students at your university, you
also asked the students for the number of siblings they have. Although it was not quite clear to you initially
what you would use that variable for, you construct a new theory that suggests that children who have more
siblings come from poorer families and will have to share the food on the table. Although a friend tells you that
this theory does not pass the straight-face test, you decide to hypothesize that peers with many siblings will
weigh less, on average, for a given height. In addition, you believe that the muscle/fat tissue composition of
male bodies suggests that females will weigh less, on average, for a given height. To test these theories, you
perform the following regression:
Studentw
7) You have collected data for 104 countries to address the difficult questions of the determinants for differences
in the standard of living among the countries of the world. You recall from your macroeconomics lectures that
the neoclassical growth model suggests that output per worker (per capita income) levels are determined by,
among others, the saving rate and population growth rate. To test the predictions of this growth model, you
run the following regression:
(0.229)
where RelPersInc is GDP per worker relative to the United States, n is the average population growth rate,
1980-1990, and SK is the average investment share of GDP from 1960 to 1990 (remember investment equals
saving). Numbers in parentheses are for heteroskedasticity-robust standard errors.
(a) Calculate the t-statistics and test whether or not each of the population parameters are significantly
different from zero.
(b) The overall F-statistic for the regression is 79.11. What is the critical value at the 5% and 1% level? What is
your decision on the null hypothesis?
(c) You remember that human capital in addition to physical capital also plays a role in determining the
standard of living of a country. You therefore collect additional data on the average educational attainment in
years for 1985, and add this variable (Educ) to the above regression. This results in the modified regression
output:
RelPersInc = 0.046 - 5.869 n + 0.738 SK + 0.055 Educ, $R^2$ = 0.775, SER = 0.1377
(0.079) (2.238) (0.294) (0.010)
8) Attendance at sports events depends on various factors. Teams typically do not change ticket prices from game
to game to attract more spectators to less attractive games. However, there are other marketing tools used, such
as fireworks, free hats, etc., for this purpose. You work as a consultant for a sports team, the Los Angeles
Dodgers, to help them forecast attendance, so that they can potentially devise strategies for price
discrimination. After collecting data over two years for every one of the 162 home games of the 2000 and 2001
season, you run the following regression:
Attend = 15,005 + 201 Temperat + 465 DodgNetWin + 82 OppNetWin
(8,770) (121) (169) (26)
+ 9647 DFSaSu + 1328 Drain + 1609 D150m + 271 DDiv - 978 D2001;
(1505) (3355) (1819) (1,184) (1,143)
$R^2$ = 0.416, SER = 6983
where Attend is announced stadium attendance, Temperat is the average temperature on game day, DodgNetWin
are the net wins of the Dodgers before the game (wins - losses), OppNetWin is the opposing team's net wins at
the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables, taking a
value of 1 if the game was played on a weekend, it rained during that day, the opposing team was within a 150
mile radius, the opposing team plays in the same division as the Dodgers, and the game was played during
2001, respectively. Numbers in parentheses are heteroskedasticity-robust standard errors.
(a) Are the slope coefficients statistically significant?
(b) To test whether the effect of the last four binary variables is significant, you have your regression program
calculate the relevant F-statistic, which is 0.295. What is the critical value? What is your decision about
excluding these variables?
Answer: (a) The coefficients on Temperat, DodgNetWin, OppNetWin, and DFSaSu are all statistically significant at
the 5% level, using a one-sided test. The constant is insignificant using a two-sided test. All the other
coefficients are not statistically significant at the 5% level.
(b) The critical value at the 5% level is 2.37. Hence you cannot reject the null hypothesis that all four
coefficients are simultaneously zero.
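Large-sample critical values such as the 2.37 used here come from the $F_{q,\infty}$ distribution, which is a chi-squared distribution with q degrees of freedom divided by q. The Python sketch below recovers them without a statistics library; it is one possible approach (a series expansion for the chi-squared CDF plus bisection), and the function names are hypothetical.

```python
import math


def chi2_cdf(x, df):
    """Chi-squared CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    # Sum z^n / (a (a+1) ... (a+n)) until the terms are negligible.
    while term > total * 1e-16:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def f_critical_large_sample(q, alpha):
    """Critical value of F(q, infinity): chi-squared(q) quantile divided by q."""
    lo, hi = 0.0, 200.0
    for _ in range(200):          # bisection on the chi-squared quantile
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, q) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0 / q


crit_5 = f_critical_large_sample(4, 0.05)
crit_1 = f_critical_large_sample(4, 0.01)
reject_at_5 = 0.295 > crit_5      # F-statistic reported in part (b)

print(round(crit_5, 2), round(crit_1, 2), reject_at_5)
```

With an F-statistic of 0.295 far below the 5% critical value, the four binary variables are jointly insignificant.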
9) The administration of your university/college is thinking about implementing a policy of coed floors only in
dormitories. Currently there are only single gender floors. One reason behind such a policy might be to
generate an atmosphere of better understanding between the sexes. The Dean of Students (DoS) has decided
to investigate if such a behavior results in more togetherness by attempting to find the determinants of the
gender composition at the dinner table in your main dining hall, and in that of a neighboring university, which
only allows for coed floors in their dorms. The survey includes 176 students, 63 from your university/college,
and 113 from a neighboring institution.
The Dean's first problem is how to define gender composition. To begin with, the survey excludes single-person
tables, since the study is to focus on group behavior. The Dean also eliminates sports teams from the
analysis, since a large number of single-gender students will sit at the same table. Finally, the Dean decides to
only analyze tables with three or more students, since she worries about couples distorting the results. The
Dean finally settles for the following specification of the dependent variable:
GenderComp = |50% - % of Male Students at Table|
where |Z| stands for the absolute value of Z. The variable can take on values from zero to fifty.
After considering various explanatory variables, the Dean settles for an initial list of eight, and estimates the
following relationship, using heteroskedasticity-robust standard errors (this Dean obviously has taken an
econometrics course earlier in her career and/or has an able research assistant):
GenderComp = 30.90 - 3.78 Size - 8.81 DCoed + 2.28 DFemme + 2.06 DRoommate
(7.73) (0.63) (2.66) (2.42) (2.39)
- 0.17 DAthlete + 1.49 DCons - 0.81 SAT + 1.74 SibOther, $R^2$ = 0.24, SER = 15.50
(3.23) (1.10) (1.20) (1.43)
where Size is the number of persons at the table minus 3; DCoed is a binary variable, which takes on the value
of 1 if you live on a coed floor; DFemme is a binary variable, which is 1 for females and zero otherwise;
DRoommate is a binary variable which equals 1 if the person at the table has a roommate and is zero otherwise;
DAthlete is a binary variable which is 1 if the person at the table is a member of an athletic varsity team; DCons
is a variable which measures the political tendency of the person at the table on a seven-point scale, ranging
from 1 being liberal to 7 being conservative; SAT is the SAT score of the person at the table measured on a
seven-point scale, ranging from 1 for the category 900-1000 to 7 for the category 1510 and above; and
increasing by one for 100 point increases; and SibOther is the number of siblings from the opposite gender in
the family the person at the table grew up with.
(a) Indicate which of the coefficients are statistically significant.
(b) Based on the above results, the Dean decides to specify a more parsimonious form by eliminating the least
significant variables. Using the F-statistic for the null hypothesis that there is no relationship between the
gender composition at the table and DFemme, DRoommate, DAthlete, and SAT, the regression package returns a
value of 1.10. What are the degrees of freedom for the statistic? Look up the 1% and 5% critical values from the
F-table and make a decision about the exclusion of these variables based on the critical values.
(c) The Dean decides to estimate the following specification next:
GenderComp = 29.07 - 3.80 Size - 9.75 DCoed + 1.50 DCons + 1.97 SibOther,
(3.75) (0.62) (1.04) (1.04) (1.44)
$R^2$ = 0.22, SER = 15.44
Calculate the t-statistics for the coefficients and discuss whether or not the Dean should attempt to simplify the
specification further. Based on the results, what might some of the comments be that she will write up for the
other senior administrators of your college? What are some of the potential flaws in her analysis? What other
variables do you think she should have considered as explanatory factors?
Answer: (a) Only the constant, Size, and DCoed are statistically significant at the 5% level.
(b) The critical value of $F_{4,\infty}$ is 2.37 at the 5% level, and 3.32 at the 1% level. Hence you cannot reject the null hypothesis
that all four coefficients are zero.
(c) The t-statistics for the five coefficients are as follows: 7.75, -6.13, -9.38, 1.44 and 1.37. The Dean
should leave the specification as is and allow readers to decide if they want to place much weight on the
insignificant coefficients. The variable of interest is DCoed and she will most likely focus on that,
concluding that having coed floors in dormitories will increase the gender balance at dining hall tables.
She will most likely go further in her report and suggest that communication between the sexes will
improve as a result of coed floors.
One of the major flaws in the analysis is that students from one college do not have coed floors in
dormitories while students from the other college do not have single gender floors. Ideally you would
like to survey students from the same college where some of the students lived on single gender floors
while others did not. Answers on omitted variables will obviously vary. Ideally some survey question
should be included which would indicate the student's attitude towards the other sex.
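The t-statistics in part (c) are simply each coefficient divided by its standard error. The Python sketch below reproduces them; the negative signs on Size and DCoed are taken from the t-statistics reported in the answer, and the variable names are hypothetical.

```python
# Coefficients and heteroskedasticity-robust standard errors from specification (c).
coefficients = {"Constant": 29.07, "Size": -3.80, "DCoed": -9.75,
                "DCons": 1.50, "SibOther": 1.97}
std_errors = {"Constant": 3.75, "Size": 0.62, "DCoed": 1.04,
              "DCons": 1.04, "SibOther": 1.44}

# t-statistic for H0: coefficient = 0.
t_stats = {name: coefficients[name] / std_errors[name] for name in coefficients}

# Two-sided test at the 5% level using the standard normal critical value.
significant = [name for name, t in t_stats.items() if abs(t) > 1.96]

for name, t in t_stats.items():
    print(f"{name}: {t:.2f}")
print(significant)
```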
10) The Solow growth model suggests that countries with identical saving rates and population growth rates
should converge to the same per capita income level. This result has been extended to include investment in
human capital (education) as well as investment in physical capital. This hypothesis is referred to as the
conditional convergence hypothesis, since the convergence is dependent on countries obtaining the same
values in the driving variables. To test the hypothesis, you collect data from the Penn World Tables on the
average annual growth rate of GDP per worker (g6090) for the 1960-1990 sample period, and regress it on the
(i) initial starting level of GDP per worker relative to the United States in 1960 (RelProd60), (ii) average
population growth rate of the country (n), (iii) average investment share of GDP from 1960 to 1990 (SK; remember
that investment equals saving), and (iv) educational attainment in years for 1985 (Educ). The results for
close to 100 countries are as follows (numbers in parentheses are heteroskedasticity-robust standard errors):
g6090 = 0.004 - 0.172 n + 0.133 SK + 0.002 Educ - 0.044 RelProd60,
(0.007) (0.209) (0.015) (0.001) (0.008)
$R^2$ = 0.537, SER = 0.011
(a) Is the coefficient on this variable significantly different from zero at the 5% level? At the 1% level?
(b) Test for the significance of the other slope coefficients. Should you use a one-sided alternative hypothesis or
a two-sided test? Will the decision for one or the other influence the decision about the significance of the
parameters? Should you always eliminate variables which carry insignificant coefficients?
Answer: (a) The coefficient has a t-statistic of 5.50 and is therefore statistically significant at both the 5% and the
1% level.
(b) The t-statistics (in absolute value) are 0.82, 8.87, and 2.00. Hence the coefficient on population growth is not statistically
significant. You should use a one-sided alternative hypothesis test since economic theory gives you
information about the expected sign on these variables. In the above case, the decision will not be
influenced by the choice of a one-sided or two-sided test, since the (absolute value of the) critical value
is 1.64 or 1.96 at the 5% significance level. If there is a strong prior on the sign of the coefficient, then the
variable should not be eliminated based on the significance test. Instead it should be left in the equation,
but the high p-value should be flagged to the reader, and the reader should decide herself how
convincing the evidence is in favor of the theory.
11) Using the 420 observations of the California School data set from your textbook, you estimate the following
relationship:
TestScore = 681.44 - 0.61LchPct
n=420, R2 =0.75, SER = 9.45
where TestScore is the test score and LchPct is the percent of students eligible for subsidized lunch
(average = 44.7, max = 100, min = 0).
a.
b.
In your interpretation of the slope coefficient in (a) above, does it matter if you start your explanation
with "for every x percent increase" rather than "for every x percentage point increase"?
c.
The overall regression F-statistic is 1149.57. What are the degrees of freedom for this statistic?
d.
Find the critical value of the F-statistic at the 1% significance level. Test the null hypothesis that the
regression R2 = 0.
e.
The above equation was estimated using heteroskedasticity robust standard errors. What is the
standard error for the slope coefficient?
Answer: a. For every 10 percentage point increase in students eligible for subsidized lunch, average test scores go
down by 6.1 points. If a school has no students eligible for subsidized lunch, then the average test score is
approximately 681 points. 75% of the variation in test scores is explained by our model.
b. Since your RHS variable is measured already in percent, it makes sense to increase that variable by 10
percentage points (say), rather than by 10 percent. If LchPct increases from 20 to 30, then this
represents an increase of 10 percentage points, or an increase of 50 percent.
c. There is 1 degree of freedom in the numerator (one restriction: the single slope equals zero), and 418 (= 420 - 1 - 1) degrees of freedom in the denominator.
d. $F_{1,\infty}$ = 6.63. Hence you can comfortably reject the null hypothesis of no linear relationship between test
scores and the percent of students eligible for subsidized lunch.
e. With a single explanatory variable, the t-statistic is the square root of the F-statistic (in absolute value). Here it is 33.91.
From this result, and given the size of the coefficient, the standard error is approximately 0.018 (= 0.61/33.91).
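The arithmetic in part (e) can be checked directly. The sketch below backs out the standard error of the slope from the overall F-statistic, using only numbers from the question; the variable names are hypothetical.

```python
import math

f_overall = 1149.57      # overall regression F-statistic from part (c)
slope = -0.61            # estimated coefficient on LchPct

# With one regressor, |t| is the square root of the overall F-statistic.
abs_t = math.sqrt(f_overall)

# Since t = slope / SE, the standard error is |slope| / |t|.
std_error = abs(slope) / abs_t

print(round(abs_t, 2), round(std_error, 3))
```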
12) Consider the following regression using the California School data set from your textbook.
TestScore = 681.44 - 0.61LchPct
n=420, R2 =0.75, SER = 9.45
where TestScore is the test score and LchPct is the percent of students eligible for subsidized lunch
(average = 44.7, max = 100, min = 0).
a.
What is the effect of a 20 percentage point increase in the students eligible for subsidized lunch?
b.
2) Set up the null hypothesis and alternative hypothesis carefully for the following cases:
(a) k = 4, test for all coefficients other than the intercept to be zero
(b) k = 3, test for the slope coefficient of X1 to be unity, and the coefficients on the other explanatory variables to
be zero
(c) k = 10, test for the slope coefficient of X1 to be zero, and for the slope coefficients of X2 and X3 to be the
same but of opposite sign.
(d) k = 4, test for the slope coefficients to add up to unity
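Answer: (a) H0 : $\beta_1 = 0$, $\beta_2 = 0$, $\beta_3 = 0$, $\beta_4 = 0$
(b) H0 : $\beta_1 = 1$, $\beta_2 = 0$, $\beta_3 = 0$
(c) H0 : $\beta_1 = 0$, $\beta_2 + \beta_3 = 0$
(d) H0 : $\beta_1 + \beta_2 + \beta_3 + \beta_4 = 1$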
3) Consider a situation where economic theory suggests that you impose certain restrictions on your estimated
multiple regression function. These may involve the equality of parameters, such as the returns to education
and on the job training in earnings functions, or the sum of coefficients, such as constant returns to scale in a
production function. To test the validity of your restrictions, you have your statistical package calculate the
corresponding F-statistic. Find the critical value from the F-distribution at the 5% and 1% level, and comment
whether or not you will reject the null hypothesis in each of the following cases.
(a) number of observations: 152; number of restrictions: 3; F-statistic: 3.21
(b) number of observations: 1,732; number of restrictions:7; F-statistic: 4.92
(c) number of observations: 63; number of restrictions: 1; F-statistic: 2.47
(d) number of observations: 4,000; number of restrictions: 5; F-statistic: 1.82
(e) Explain why you can use the $F_{q,\infty}$ distribution to compute the critical values in (a)-(d).
Answer: (a) $F_{3,\infty}$ = 2.60 (5% level), $F_{3,\infty}$ = 3.78 (1% level). Reject the null hypothesis at the 5% level, but not at the
1% level.
(b) $F_{7,\infty}$ = 2.01 (5% level), $F_{7,\infty}$ = 2.64 (1% level). Reject the null hypothesis at the 5% level and at the 1%
level.
(c) $F_{1,\infty}$ = 3.84 (5% level), $F_{1,\infty}$ = 6.63 (1% level). Cannot reject the null hypothesis at the 5% level or at
the 1% level.
(d) $F_{5,\infty}$ = 2.21 (5% level), $F_{5,\infty}$ = 3.02 (1% level). Cannot reject the null hypothesis at the 5% level or at the
1% level.
(e) The F-statistic is distributed $F_{q,\infty}$ in large samples. Although strictly speaking this only holds for the
limiting case of $n = \infty$, for practical purposes the approximation is close for n > 100. This is therefore
problematic for (c) above, where n = 63.
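The decisions in (a)-(d) follow one mechanical rule: reject when the F-statistic exceeds the critical value. A minimal Python sketch of that rule is given here; the critical values are copied from the answer above, while the dictionary and function names are illustrative assumptions, not part of any statistical package.

```python
# Large-sample critical values quoted in the answer above,
# keyed by the number of restrictions q: (5% level, 1% level).
CRITICAL_VALUES = {
    1: (3.84, 6.63),
    3: (2.60, 3.78),
    5: (2.21, 3.02),
    7: (2.01, 2.64),
}


def f_test_decision(q, f_stat):
    """Return the rejection decision at the 5% and 1% levels (hypothetical helper)."""
    crit_5, crit_1 = CRITICAL_VALUES[q]
    return {"reject_5pct": f_stat > crit_5, "reject_1pct": f_stat > crit_1}


# (number of restrictions, F-statistic) for cases (a)-(d) of the question.
cases = {"a": (3, 3.21), "b": (7, 4.92), "c": (1, 2.47), "d": (5, 1.82)}
decisions = {label: f_test_decision(q, f) for label, (q, f) in cases.items()}

for label, decision in decisions.items():
    print(label, decision)
```

The number of observations does not enter the lookup because the large-sample distribution depends only on q; that is exactly why case (c), with n = 63, is the one where the approximation is questionable.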
4) Females, on average, are shorter and weigh less than males. One of your friends, who is a pre-med student,
tells you that in addition, females will weigh less for a given height. To test this hypothesis, you collect height
and weight of 29 female and 81 male students at your university. A regression of the weight on a constant,
height, and a binary variable, which takes a value of one for females and is zero otherwise, yields the following
result:
Studentw = -229.21 - 6.36 Female + 5.58 Height, R2 = 0.50, SER = 20.99
           (43.39)   (5.74)        (0.62)
where Studentw is weight measured in pounds and Height is measured in inches (heteroskedasticity-robust
standard errors in parentheses).
Calculate t-statistics and carry out the hypothesis test that females weigh the same as males, on average, for a
given height, using a 10% significance level. What is the alternative hypothesis? What is the p-value? What
critical value did you use?
Answer: The t-statistics for the intercept, the gender binary variable, and the height variable are -5.28, -1.11, and
9.00, respectively. For a one-sided alternative hypothesis, $\beta_{Female} < 0$, the critical value from the
standard normal table is -1.28. Since -1.11 > -1.28, you cannot reject the null hypothesis at the 10% level. The
p-value is 13.4%.
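The arithmetic behind this answer can be checked with the Python standard library. The sketch below assumes the signs implied by the reported t-statistics (a negative intercept and a negative Female coefficient) and uses the large-sample normal approximation; the variable names are illustrative.

```python
from statistics import NormalDist

# (coefficient, heteroskedasticity-robust standard error) from the regression above.
estimates = {
    "Intercept": (-229.21, 43.39),
    "Female": (-6.36, 5.74),
    "Height": (5.58, 0.62),
}

# t-statistic for H0: coefficient = 0.
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

standard_normal = NormalDist()

# One-sided test with H1: coefficient on Female < 0, so the p-value is the left tail.
p_value_female = standard_normal.cdf(t_stats["Female"])

# 10% one-sided critical value (left tail).
critical_value = standard_normal.inv_cdf(0.10)

reject_null = t_stats["Female"] < critical_value

print(t_stats, p_value_female, critical_value, reject_null)
```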
5) You are presented with the following output from a regression package, which reproduces the regression
results of testscores on the student-teacher ratio from your textbook
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    698.93         9.47         73.82    0.00
STR                   -2.28         0.48         -4.75    0.00

R-squared                0.05    Mean dependent var       654.16
Adjusted R-squared       0.05    S.D. dependent var        19.05
S.E. of regression      18.58    Akaike info criterion      8.69
Sum squared resid   144315.48    Schwarz criterion          8.71
Log likelihood       -1822.25    F-statistic               22.58
Durbin-Watson stat       0.13    Prob(F-statistic)          0.00
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    649.58        15.21         42.72    0.00
STR                   -0.29         0.48         -0.60    0.55
EL_PCT                -0.66         0.04        -16.78    0.00
EXPN_STU               0.00         0.00          2.74    0.01

R-squared                0.44    Mean dependent var       654.16
Adjusted R-squared       0.43    S.D. dependent var        19.05
S.E. of regression      14.35    Akaike info criterion      8.18
Sum squared resid    85699.71    Schwarz criterion          8.21
Log likelihood       -1712.81    F-statistic              107.45
Durbin-Watson stat       0.74    Prob(F-statistic)          0.00
Answer: (a) The F-statistic tests the null hypothesis that all slope coefficients are zero. In the case of a single
explanatory variable, this is the same as testing for the significance of the explanatory variable
coefficient. In that case, the F-statistic is the same as the square of the t-statistic in the case of a single
restriction (q = 1).
(b) There is no simple relationship between the F-statistic and the three t-statistics now. The F-statistic
tests the null hypothesis that H0 : STR = EL_PCT = EXPN_STU = 0 simultaneously. The t-statistics
test the significance of each slope coefficient separately.
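Part (a) can be verified numerically from the single-regressor output. The short Python sketch below squares the reported t-statistic on STR and compares it with the reported F-statistic; the tolerance is an assumption chosen to allow for the two-decimal rounding in the printed output.

```python
# Values read from the regression of test scores on the student-teacher ratio.
t_stat_str = -4.75       # t-statistic on STR
f_stat_reported = 22.58  # overall regression F-statistic

# With a single restriction (q = 1), the F-statistic is the square of the t-statistic.
f_from_t = t_stat_str ** 2

# Allow for rounding of the printed t-statistic.
matches = abs(f_from_t - f_stat_reported) < 0.05

print(f_from_t, matches)
```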
6) Consider the following multiple regression model
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$
You want to consider certain hypotheses involving more than one parameter, and you know that the regression
error is homoskedastic. You decide to test the joint hypotheses using the homoskedasticity-only F-statistics.
For each of the cases below specify a restricted model and indicate how you would compute the F-statistic to
test for the validity of the restrictions.
(a) $\beta_1 = -\beta_2$; $\beta_3 = 0$
(b) $\beta_1 + \beta_2 + \beta_3 = 1$
(c) $\beta_1 = \beta_2$; $\beta_3 = 0$
Answer: (a) The restricted model is $Y_i = \beta_0 + \beta_2 (X_{2i} - X_{1i}) + u_i$ and the rule-of-thumb F-statistic would be
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/2}{SSR_{unrestricted}/(n - 3 - 1)}$.
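(b) $(Y_i - X_{3i}) =$
$\frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$. Name conditions under which the
$\frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$
$\frac{\left(\frac{ESS_{unrestricted}}{TSS} - \frac{ESS_{restricted}}{TSS}\right)/q}{\left(1 - \frac{ESS_{unrestricted}}{TSS}\right)/(n - k_{unrestricted} - 1)}$. Since $R^2 = \frac{ESS}{TSS}$, this gives us the expression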
9) To calculate the homoskedasticity-only overall regression F-statistic, you need to compare the SSR restricted
with the SSRunrestricted. Consider the following output from a regression package, which reproduces the
regression results of testscores on the student-teacher ratio, the percent of English learners, and the
expenditures per student from your textbook:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/30/06 Time: 17:55
Sample: 1 420
Included observations: 420
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    649.58        15.21         42.72    0.00
STR                   -0.29         0.48         -0.60    0.55
EL_PCT                -0.66         0.04        -16.78    0.00
EXPN_STU               0.00         0.00          2.74    0.01

R-squared                0.44    Mean dependent var       654.16
Adjusted R-squared       0.43    S.D. dependent var        19.05
S.E. of regression      14.35    Akaike info criterion      8.18
Sum squared resid    85699.71    Schwarz criterion          8.21
Log likelihood       -1712.81    F-statistic              107.45
Durbin-Watson stat       0.74    Prob(F-statistic)          0.00
Sum of squared resid corresponds to SSRunrestricted. How are you going to find SSRrestricted?
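Answer: You could simply run a regression of Testscr on a constant. However, for the overall regression F-statistic of
$Testscore_i = \hat{\beta}_0 + \hat{\beta}_{STR} STR_i + \hat{\beta}_{EL\_PCT} EL\_PCT_i + \hat{\beta}_{EXPN\_STU} EXPN\_STU_i + \hat{u}_i$,
the restricted regression contains only the intercept, and for the restricted sum of squared residuals, you get simply
the variation in test scores
$SSR_{restricted} = \sum_{i=1}^{n} (Testscore_i - \overline{Testscore})^2$.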
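As a rough check, the restricted SSR can be recovered from the printed output without running another regression. The Python sketch below assumes that the value 19.05 in the output is the sample standard deviation of the dependent variable computed with an (n - 1) divisor, so that the total variation equals (n - 1) times its square; the variable names are illustrative.

```python
n = 420                      # included observations
k = 3                        # regressors: STR, EL_PCT, EXPN_STU
ssr_unrestricted = 85699.71  # "Sum squared resid" from the output
sd_dependent = 19.05         # assumed: sample standard deviation of TESTSCR

# With only an intercept, the fitted value is the sample mean, so the restricted
# SSR is the total variation in test scores.
ssr_restricted = (n - 1) * sd_dependent ** 2

# Homoskedasticity-only overall F-statistic (q = k restrictions).
f_stat = ((ssr_restricted - ssr_unrestricted) / k) / (ssr_unrestricted / (n - k - 1))

print(round(ssr_restricted, 2), round(f_stat, 2))
```

The result is close to, but not exactly, the F-statistic of 107.45 printed in the output, because the standard deviation is only reported to two decimals.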
10) Adding the Percent of English Speakers (PctEL) to the Student Teacher Ratio (STR) in your textbook reduced
the coefficient for STR from 2.28 to 1.10 with a standard error of 0.43. Construct a 90% and 99% confidence
interval to test the hypothesis that the coefficient of STR is 2.28.
Answer: The 90% confidence interval is (1.10 ± 1.64 × 0.43) = (0.39, 1.81). The 99% confidence interval is
(1.10 ± 2.58 × 0.43) = (-0.01, 2.21). Since 2.28 lies outside both intervals, you can reject the null hypothesis at
both the 10% and the 1% significance level.
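The two intervals can be reproduced in a few lines of Python. The sketch uses the coefficient magnitudes exactly as they are printed in the question (1.10 and 2.28) and the usual two-sided normal critical values 1.64 and 2.58; the function name is an illustrative assumption.

```python
def confidence_interval(estimate, std_error, critical_value):
    """Two-sided large-sample confidence interval (hypothetical helper)."""
    half_width = critical_value * std_error
    return (estimate - half_width, estimate + half_width)


estimate, std_error = 1.10, 0.43
hypothesized_value = 2.28

ci_90 = confidence_interval(estimate, std_error, 1.64)
ci_99 = confidence_interval(estimate, std_error, 2.58)

# The null hypothesis is rejected when the hypothesized value lies outside the interval.
reject_90 = not (ci_90[0] <= hypothesized_value <= ci_90[1])
reject_99 = not (ci_99[0] <= hypothesized_value <= ci_99[1])

print([round(x, 2) for x in ci_90], [round(x, 2) for x in ci_99], reject_90, reject_99)
```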
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
where $SSR_{restricted}$ is the sum of squared residuals from the restricted regression, $SSR_{unrestricted}$ is the sum of
squared residuals from the unrestricted regression, q is the number of restrictions under the null hypothesis,
and $k_{unrestricted}$ is the number of regressors in the unrestricted regression. Prove that this formula is the same
as the following formula based on the regression R2 of the restricted and unrestricted regression:
$F = \frac{(R^2_{unrestricted} - R^2_{restricted})/q}{(1 - R^2_{unrestricted})/(n - k_{unrestricted} - 1)}$
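Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    658.47         7.68         85.73    0.00
STR                   -0.76         0.23         -3.27    0.00
EL_PCT                -0.19         0.03         -5.62    0.00
LOG(AVGINC)           11.69         1.74          6.71    0.00
MEAL_PCT              -0.37         0.04         -9.53    0.00
CALW_PCT              -0.07         0.06         -1.21    0.23

R-squared                0.80    Mean dependent var       654.16
Adjusted R-squared       0.79    S.D. dependent var        19.05
S.E. of regression       8.64    Akaike info criterion      7.16
Sum squared resid    30888.64    Schwarz criterion          7.22
Log likelihood       -1498.51    F-statistic              324.94
Durbin-Watson stat       1.51    Prob(F-statistic)          0.00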
Restricted model:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:37
Sample: 1 420
Included observations: 420
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    593.48         6.96         85.32    0.00
STR                   -0.39         0.27         -1.42    0.16
EL_PCT                -0.43         0.03        -14.34    0.00
LOG(AVGINC)           28.36         1.40         20.32    0.00

R-squared                0.71    Mean dependent var       654.16
Adjusted R-squared       0.71    S.D. dependent var        19.05
S.E. of regression      10.26    Akaike info criterion      7.50
Sum squared resid    43792.42    Schwarz criterion          7.54
Log likelihood       -1571.82    F-statistic              342.98
Durbin-Watson stat       1.30    Prob(F-statistic)          0.00
Calculate the homoskedasticity-only F-statistic and determine whether the null hypothesis can be rejected at
the 5% significance level.
Answer: There are two restrictions, namely H0 : $\beta_{meal\_pct} = 0$, $\beta_{calw\_pct} = 0$. The F-statistic is
$F = \left(\frac{43792.42}{30888.64} - 1\right) \times \frac{420 - 5 - 1}{2} = 86.47$.
The 5% critical value from the $F_{2,\infty}$ distribution is 3.00. Hence we
easily reject the two restrictions at the 5% level of significance.
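The computation in this answer can be sketched in Python as follows. Both the textbook form of the homoskedasticity-only F-statistic and the ratio form used in the answer are evaluated, to show that they agree; the variable names are illustrative.

```python
n = 420                      # observations
k_unrestricted = 5           # regressors in the unrestricted model
q = 2                        # restrictions: MEAL_PCT and CALW_PCT coefficients are zero
ssr_restricted = 43792.42    # from the restricted model output
ssr_unrestricted = 30888.64  # from the unrestricted model output

df_denominator = n - k_unrestricted - 1

# Textbook form of the homoskedasticity-only F-statistic.
f_stat = ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / df_denominator)

# Equivalent ratio form used in the answer above.
f_stat_ratio_form = (ssr_restricted / ssr_unrestricted - 1) * df_denominator / q

critical_value_5pct = 3.00   # large-sample critical value for q = 2
reject_null = f_stat > critical_value_5pct

print(round(f_stat, 2), reject_null)
```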
14) Consider the regression output from the following unrestricted model:
Unrestricted model:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:35
Sample: 1 420
Included observations: 420
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    658.47         7.68         85.73    0.00
STR                   -0.76         0.23         -3.27    0.00
EL_PCT                -0.19         0.03         -5.62    0.00
LOG(AVGINC)           11.69         1.74          6.71    0.00
MEAL_PCT              -0.37         0.04         -9.53    0.00
CALW_PCT              -0.07         0.06         -1.21    0.23

R-squared                0.80    Mean dependent var       654.16
Adjusted R-squared       0.79    S.D. dependent var        19.05
S.E. of regression       8.64    Akaike info criterion      7.16
Sum squared resid    30888.64    Schwarz criterion          7.22
Log likelihood       -1498.51    F-statistic              324.94
Durbin-Watson stat       1.51    Prob(F-statistic)          0.00
To test for the null hypothesis that neither coefficient on the percent eligible for subsidized lunch nor the
coefficient on the percent on public income assistance is statistically significant, you have your statistical
package plot the confidence set. Interpret the graph below and explain what it tells you about the null
hypothesis.
Answer: The dot in the center of the ellipse is the point estimate for the two coefficients (-0.37,-0.07). Since the
(0,0) point is not inside the ellipse, you reject the null hypothesis.
15) Consider the regression model $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$. Use Approach #2 from Section 7.3 to
transform the regression so that you can use a t-statistic to test:
$\beta_1 = \frac{\beta_2}{\beta_3}$
Answer: This is not a linear restriction. Hence you cannot use the F-test to test for its validity.
$Y_i = A K_i^{\beta_1} L_i^{\beta_2} e^{u_i}$ (where Y is output, A is the
level of technology, K is the capital stock, and L is the labor force), which has been linearized here (by using
logarithms) to look as follows:
$y_i = \beta_0^* + \beta_1 k_i + \beta_2 l_i + u_i$
Assuming that the errors are heteroskedastic, you want to test for constant returns to scale. Using a t-statistic
and Approach #2, how would you proceed?
Answer: Under constant returns to scale, $\beta_1 + \beta_2 = 1$. Hence you need to transform the unrestricted model above
by subtracting $l_i$ from both sides, and by adding and subtracting $\beta_1 l_i$. This results in
$(y_i - l_i) = \beta_0^* + \beta_1 (k_i - l_i) + (\beta_1 + \beta_2 - 1) l_i + u_i$.
The left hand side variable is now the (log of the) output-labor ratio, and the
first explanatory variable on the right hand side is the (log of the) capital-labor ratio. If the null
hypothesis of constant returns to scale holds, then the coefficient on l should be zero. This can be directly
tested using a t-statistic.
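The transformation is a pure rearrangement, which a quick numerical check makes visible. In the Python sketch below, all parameter values and data points are hypothetical and chosen only for illustration; the check confirms that the transformed equation reproduces y - l exactly and that the coefficient on l is zero under constant returns to scale.

```python
import random

random.seed(0)

# Hypothetical parameter values, for illustration only.
beta0_star, beta1, beta2 = 1.5, 0.3, 0.6


def log_output(k, l, u):
    """Original linearized model: y = beta0* + beta1*k + beta2*l + u."""
    return beta0_star + beta1 * k + beta2 * l + u


def transformed_rhs(k, l, u):
    """Right-hand side after subtracting l and regrouping terms."""
    gamma = beta1 + beta2 - 1  # coefficient on l; zero under constant returns to scale
    return beta0_star + beta1 * (k - l) + gamma * l + u


max_gap = 0.0
for _ in range(1000):
    k, l, u = random.uniform(0, 10), random.uniform(0, 10), random.gauss(0, 1)
    gap = abs((log_output(k, l, u) - l) - transformed_rhs(k, l, u))
    max_gap = max(max_gap, gap)

# Under the null hypothesis (beta1 + beta2 = 1) the coefficient on l vanishes.
gamma_under_null = 0.4 + 0.6 - 1

print(max_gap, gamma_under_null)
```

In an actual application, the t-statistic on l in the transformed regression would come from the regression package, using heteroskedasticity-robust standard errors as the question requires.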
17) Consider the following two models to explain testscores.
Model 1:
Dependent Variable: TESTSCR
Method: Least Squares
Date: 07/31/06 Time: 17:52
Sample: 1 420
Included observations: 420
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    658.55         7.68         85.70    0.00
STR                   -0.73         0.23         -3.18    0.00
EL_PCT                -0.18         0.03         -5.52    0.00
LOG(AVGINC)           11.57         1.74          6.65    0.00
MEAL_PCT              -0.40         0.02        -13.09    0.00

R-squared                0.80    Mean dependent var       654.16
Adjusted R-squared       0.79    S.D. dependent var        19.05
S.E. of regression       8.64    Akaike info criterion      7.16
Sum squared resid    30998.01    Schwarz criterion          7.21
Log likelihood       -1499.25    F-statistic              405.36
Durbin-Watson stat       1.52    Prob(F-statistic)          0.00
Model 2:
Variable        Coefficient   Std. Error   t-Statistic   Prob.
C                    620.92         7.27         85.41    0.00
STR                   -0.66         0.25         -2.58    0.01
EL_PCT                -0.39         0.03        -14.05    0.00
LOG(AVGINC)           21.87         1.52         14.41    0.00
CALW_PCT              -0.41         0.05         -8.22    0.00

R-squared                0.75    Mean dependent var       654.16
Adjusted R-squared       0.75    S.D. dependent var        19.05
S.E. of regression       9.53    Akaike info criterion      7.36
Sum squared resid    37659.29    Schwarz criterion          7.41
Log likelihood       -1540.13    F-statistic              315.31
Durbin-Watson stat       1.41    Prob(F-statistic)          0.00
Explain why you cannot use the F-test in this situation to discriminate between Model 1 and Model 2.
Answer: Neither model is contained (nested) in the other, in the sense that you cannot place restrictions on
Model 1 to obtain Model 2 (and vice versa). Hence there is no unrestricted and restricted model in this case.
18) Your textbook has emphasized that testing two hypotheses sequentially is not the same as testing them
simultaneously. Consider the following confidence set below, where you are testing the hypothesis that
H0 : $\beta_5 = 0$, $\beta_6 = 0$.
Your statistical package has also generated a dotted area, which corresponds to drawing two confidence
intervals for the respective coefficients. For each case where the ellipse does not coincide in area with the
corresponding rectangle, indicate what your decision would be if you relied on the two confidence intervals vs.
the ellipse generated by the F-statistic.
Answer: The following possible outcomes can be seen in the figure above: (i) both F-statistic and the two
confidence intervals generate the same result; (ii) you do not reject the null hypothesis using the
F-statistic, but you do so by using the confidence intervals (these are the points in the area at the tip of
the ellipse); (iii) you reject the null hypothesis using the confidence intervals but not the F-statistic.
19) You have estimated the following regression to explain hourly wages, using a sample of 250 individuals:
AHE_i = -2.44 - 1.57 DFemme + 0.27 DMarried + 0.59 Educ + 0.04 Exper - 0.60 DNonwhite
        (1.29) (0.33)         (0.36)          (0.09)      (0.01)       (0.49)
22) Looking at formula (7.13) in your textbook for the homoskedasticity-only F-statistic,
$F = \frac{(SSR_{restricted} - SSR_{unrestricted})/q}{SSR_{unrestricted}/(n - k_{unrestricted} - 1)}$
give three conditions under which, ceteris paribus, you would find a large value, and hence would be
likely to reject the null hypothesis.
Answer: The F-statistic will be larger for (i) large percentage changes in the SSR between the restricted and the
unrestricted regression; (ii) smaller number of restrictions (q); (iii) larger sample size (large number of
degrees of freedom).
23) Analyzing a regression using data from a sub-sample of the Current Population Survey with about 4,000
observations, you realize that the regression R2 and the adjusted R2, $\bar{R}^2$, are almost identical. Why is that the
case? In your textbook, you were told that the regression R2 will almost always increase when you add an
explanatory variable, but that the adjusted measure does not have to increase with such an addition. Can this
still be true?
Answer: The difference between the two measures is the adjustment by the degrees of freedom. Once the number
of observations becomes very large, it hardly matters how many explanatory variables you have in your
regression, since the ratio (n-1)/(n-k-1) is then roughly equal to one. As a result, the adjusted measure will
also almost always increase with the addition of another explanatory variable.
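A small numerical illustration of this point, as a Python sketch: the R2 value and the number of regressors below are hypothetical, and only the sample size of roughly 4,000 comes from the question.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R2: 1 - (n - 1)/(n - k - 1) * (1 - R2)."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)


r2 = 0.30  # hypothetical regression R2
k = 10     # hypothetical number of regressors

# Large sample, as in the question: the degrees-of-freedom correction is negligible.
gap_large_sample = r2 - adjusted_r2(r2, n=4000, k=k)

# Small sample for contrast: the same correction is substantial.
gap_small_sample = r2 - adjusted_r2(r2, n=30, k=k)

print(round(gap_large_sample, 4), round(gap_small_sample, 4))
```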
Y = f(X1 +
C)
Y = f(X1 +
D)
Answer: C
2) The interpretation of the slope coefficient in the model $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$ is as follows:
A) a 1% change in X is associated with a $\beta_1$% change in Y.
B) a 1% change in X is associated with a change in Y of $0.01\beta_1$.
C) a change in X by one unit is associated with a $\beta_1 \times 100$% change in Y.
D) a change in X by one unit is associated with a $\beta_1$ change in Y.
Answer: B
3) The interpretation of the slope coefficient in the model $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$ is as follows:
A) a 1% change in X is associated with a $\beta_1$% change in Y.
B) a change in X by one unit is associated with a $100\beta_1$% change in Y.
C) a 1% change in X is associated with a change in Y of $0.01\beta_1$.
D) a change in X by one unit is associated with a $\beta_1$ change in Y.
Answer: B
4) The interpretation of the slope coefficient in the model $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$ is as follows:
A) a 1% change in X is associated with a $\beta_1$% change in Y.
B) a change in X by one unit is associated with a $\beta_1$ change in Y.
C) a change in X by one unit is associated with a $100\beta_1$% change in Y.
D) a 1% change in X is associated with a change in Y of $0.01\beta_1$.
Answer: A
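The three interpretations in questions 2) to 4) are approximations that hold for small changes. The Python sketch below compares each rule of thumb with the exact change; the slope values are hypothetical and chosen only for illustration.

```python
import math

beta1 = 0.5  # hypothetical slope for the linear-log and log-log cases

# Linear-log model: Y = b0 + b1*ln(X).
# A 1% increase in X changes Y by b1*ln(1.01), approximately 0.01*b1.
lin_log_exact = beta1 * math.log(1.01)
lin_log_rule = 0.01 * beta1

# Log-linear model: ln(Y) = b0 + b1*X.
# A one-unit increase in X changes Y by 100*(exp(b1) - 1)%, approximately 100*b1%.
small_beta = 0.05  # hypothetical slope; the rule of thumb needs a small coefficient
log_lin_exact_pct = 100 * (math.exp(small_beta) - 1)
log_lin_rule_pct = 100 * small_beta

# Log-log model: ln(Y) = b0 + b1*ln(X).
# A 1% increase in X changes Y by 100*(1.01**b1 - 1)%, approximately b1%.
log_log_exact_pct = 100 * (1.01 ** beta1 - 1)
log_log_rule_pct = beta1

print(lin_log_exact, lin_log_rule)
print(log_lin_exact_pct, log_lin_rule_pct)
print(log_log_exact_pct, log_log_rule_pct)
```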
5) In the case of regression with interactions, the coefficient of a binary variable should be interpreted as follows:
A) there are really problems in interpreting these, since the ln(0) is not defined.
B) for the case of interacted regressors, the binary variable coefficient represents the various intercepts for
the case when the binary variable equals one.
C) first set all explanatory variables to one, with the exception of the binary variables. Then allow for each of
the binary variables to take on the value of one sequentially. The resulting predicted value indicates the
effect of the binary variable.
D) first compute the expected values of Y for each possible case described by the set of binary variables. Next
compare these expected values. Each coefficient can then be expressed either as an expected value or as
the difference between two or more expected values.
Answer: D
6) The following interactions between binary and continuous variables are possible, with the exception of
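A) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + \beta_3 (X_i \times D_i) + u_i$.
B) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 (X_i \times D_i) + u_i$.
C) $Y_i = (\beta_0 + D_i) + \beta_1 X_i + u_i$.
D) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 D_i + u_i$.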
Answer: C
Stock/Watson 2e -- CVC2 8/23/06 -- Page 183
Answer: A
15) For the polynomial regression model,
A) you need new estimation techniques since the OLS assumptions do not apply any longer.
B) the techniques for estimation and inference developed for multiple regression can be applied.
C) you can still use OLS estimation techniques, but the t-statistics do not have an asymptotic normal
distribution.
D) the critical values from the normal distribution have to be changed to $1.96^2$, $1.96^3$, etc.
Answer: B
16) To test whether or not the population regression function is linear rather than a polynomial of order r,
A) check whether the regression R2 for the polynomial regression is higher than that of the linear regression.
B) compare the TSS from both regressions.
C) look at the pattern of the coefficients: if they change from positive to negative to positive, etc., then the
polynomial regression should be used.
D) use the test of (r-1) restrictions using the F-statistic.
Answer: D
17) The best way to interpret polynomial regressions is to
A) take a derivative of Y with respect to the relevant X.
B) plot the estimated regression function and to calculate the estimated effect on Y associated with a change
in X for one or more values of X.
C) look at the t-statistics for the relevant coefficients.
D) analyze the standard error of estimated effect.
Answer: B
0 = 0,
1 = 0.
3 = 0.
D) F-statistic for the joint hypothesis that $\beta_2 = 0$, $\beta_3 = 0$.
Y
is
X1
26) In the model $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$, the elasticity of E(Y|X) with respect to X is
A) $\beta_1 X$
B) $\beta_1$
C) $\frac{\beta_1 X}{\beta_0 + \beta_1 X}$
28) Consider the polynomial regression model of degree r, $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i$. The
null hypothesis that the regression is linear against the alternative that it is a polynomial of degree r corresponds to
A) H0 : $\beta_r = 0$ vs. $\beta_r \neq 0$
B) H0 : $\beta_r = 0$ vs. $\beta_1 \neq 0$
C) H0 : $\beta_3 = 0, \dots, \beta_r = 0$, vs. H1 : all $\beta_j \neq 0$, j = 3, ..., r
D) H0 : $\beta_2 = 0, \beta_3 = 0, \dots, \beta_r = 0$, vs. H1 : at least one $\beta_j \neq 0$, j = 2, ..., r
Answer: D
29) Consider the following least squares specification between test scores and income:
TestScore = 557.8 + 36.42 ln(Income). According to this equation, a 1% increase in income is associated with an
increase in test scores of
A) 0.36 points
B) 36.42 points
C) 557.8 points
D) cannot be determined from the information given here
Answer: A
30) Consider the population regression of log earnings [Yi, where Yi = ln(Earnings_i)] against two binary variables:
whether a worker is married (D1i, where D1i = 1 if the ith person is married) and the worker's gender (D2i,
where D2i = 1 if the ith person is female), and the product of the two binary variables:
$Y_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \beta_3 (D_{1i} \times D_{2i}) + u_i$. The interaction term
A) allows the population effect on log earnings of being married to depend on gender
B) does not make sense since it could be zero for married males
C) indicates the effect of being married on log earnings
D) cannot be estimated without the presence of a continuous variable
Answer: A
individual's life. Females earn approximately 42.1 percent less than males at a given age. Again, the
intercept should not be interpreted. The regression explains 17 percent of the variation in the log of
earnings. You should not prefer this specification over the linear one on grounds of the higher regression
R2 since these cannot be compared as a result of the difference in the units of measurement of the
dependent variable.
(d) The coefficient on the added variable is statistically significant and has resulted in a substantial
increase in the regression R2 . The increase in the Age coefficient is due to the fact that earnings increase
more initially than later in life or, mathematically speaking, it compensates for the negative coefficient on
$Age^2$, which lowers earnings as individuals become older.
(e) Students' answers will differ, but education, ability, regional differences, race, and professional
choice are often mentioned.
2) An extension of the Solow growth model that includes human capital in addition to physical capital, suggests
that investment in human capital (education) will increase the wealth of a nation (per capita income). To test
this hypothesis, you collect data for 104 countries and perform the following regression:
RelPersInc = 0.046 - 5.869 gpop + 0.738 SK + 0.055 Educ, R2 = 0.775, SER = 0.1377
             (0.079) (2.238)      (0.294)    (0.010)
where RelPersInc is GDP per worker relative to the United States, gpop is the average population growth rate,
1980 to 1990, SK is the average investment share of GDP from 1960 to 1990, and Educ is the average educational
attainment in years for 1985. Numbers in parentheses are for heteroskedasticity-robust standard errors.
(a) Interpret the results and indicate whether or not the coefficients are significantly different from zero. Do the
coefficients have the expected sign?
(b) To test for equality of the coefficients between the OECD and other countries, you introduce a binary
variable (DOECD), which takes on the value of one for the OECD countries and is zero otherwise. To conduct
the test for equality of the coefficients, you estimate the following regression:
RelPersInc = -0.068 - 0.063 gpop + 0.719 SK + 0.044 Educ,
             (0.072)  (2.271)      (0.365)    (0.012)
(5.366)
(0.768)
Answer: (a) A one percentage point decrease in the population growth rate increases GDP per worker relative to
the United States by roughly 0.06. An increase in the investment share of 0.1 results in an increase of
GDP per worker relative to the United States by approximately 0.07. For every additional year of
average educational attainment, the increase is 0.055. The intercept should not be interpreted. The
regression explains 77.5 percent of the variation in relative productivity. All coefficients are significantly
different from zero at conventional levels. All coefficients carry the expected sign.
(b) The regression for the non-OECD countries is
RelPerInc = -0.068 - 0.063 gpop + 0.719 SK + 0.044 Educ.
For the OECD countries we get
RelPerInc = 0.313 - 8.101 gpop + 0.289 SK + 0.047 Educ.
The critical value is 3.32 at the 1% level and hence you can reject the null hypothesis that the coefficients
are equal.
(c) Given the critical value, the coefficient is statistically significant, that is, you can reject
the null hypothesis that the coefficient on DOECD is zero.
(d) Given the critical value of 3.78 at the 1% level, you cannot reject the null hypothesis that the
additional coefficients are all zero. The F-test is the proper procedure to use when testing for
simultaneous restrictions.
(e) There is evidence that the slopes can be set equal. However, there seems to be a level difference
between the two groups of countries.
3) You have been asked by your younger sister to help her with a science fair project. During the previous years
she already studied why objects float and there also was the inevitable volcano project. Having learned
regression techniques recently, you suggest that she investigate the weight-height relationship of 4th to 6th
graders. Her presentation topic will be to explain how people at carnivals predict weight. You collect data for
roughly 100 boys and girls between the ages of nine and twelve and estimate for her the following relationship:
                9-year-old   10-year-old   11-year-old   12-year-old
Boys' Weight        60           70            77            87
Boys' Height        52           54            56            58.5
Girls' Weight       60           70            80            92
Girls' Height       49           52            57            60
Insert two height/weight measures each for boys and girls and see how accurate your predictions are.
(d) The F-statistic for testing that the intercept and slope for boys and girls are identical is 2.92. Find the critical
values at the 5% and 1% level, and make a decision. Allowing for a different intercept with an identical slope
results in a t-statistic for DFY of (0.35). Having identical intercepts but different slopes gives a t-statistic on
(DFY × Height4) of (0.35) also. Does this affect your previous conclusion?
(e) Assume that you also wanted to test if the relationship changes by age. Briefly outline how you would
specify the regression including the gender binary variable and an age binary variable ( Older) that takes on a
value of one for eleven to twelve year olds and is zero otherwise. Indicate in a table of two rows and two
columns how the estimated relationship would vary between younger girls, older girls, younger boys, and
older boys.
Answer: (a) For every inch above 4 feet, children of that age group gain roughly 4 pounds. A student who is 4 feet
tall, weighs approximately 45.5 pounds. The regression explains 55 percent of the weight variation in
children of that age group.
(b) Shorter girls weigh more than boys, and taller boys weigh more than girls on average. Given your
prior expectations, this is somewhat unexpected. The coefficients involving the binary variable are
statistically significant at conventional levels. The regression for boys is
Weight = 36.27 + 5.32 Height4.
For girls it is
Weight = 53.60 + 3.49 Height4.
(c) The XX points mark a female, and the XY a male. The regression line predicts a 9-year-old boy
to weigh 57.6 pounds, an 11-year-old boy to weigh 78.8 pounds, a 10-year-old girl to weigh 67.6 and a
12-year-old girl to weigh 95.5 pounds. Hence the weights are quite close.
(d) The critical value is 3.00 at the 5% level, and 4.61 at the 1% level. Hence you cannot reject equality of
the two coefficients. The previous conclusion is unaffected since the test was for both hypotheses to hold
simultaneously. The t-statistics indicate that imposing the equality and testing for either the slope or the
intercept to be significantly different between boys and girls, does not result in a different coefficient
either.
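(e) $Weight = \beta_0 + \beta_1 DFY + \beta_2 Height4 + \beta_3 (DFY \times Height4) + \beta_4 Older + \beta_5 (Older \times Height4) + u$

         Younger                                                                   Older
Boys     $\hat{\beta}_0 + \hat{\beta}_2 Height4$                                   $(\hat{\beta}_0 + \hat{\beta}_4) + (\hat{\beta}_2 + \hat{\beta}_5) Height4$
Girls    $(\hat{\beta}_0 + \hat{\beta}_1) + (\hat{\beta}_2 + \hat{\beta}_3) Height4$   $(\hat{\beta}_0 + \hat{\beta}_1 + \hat{\beta}_4) + (\hat{\beta}_2 + \hat{\beta}_3 + \hat{\beta}_5) Height4$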
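The predictions discussed in part (c) can be reproduced from the two fitted lines reported in part (b). The Python sketch below assumes that Height4 measures height in inches above 4 feet (48 inches), as the interpretation in part (a) suggests; the function name is illustrative.

```python
def predict_weight(height_inches, girl):
    """Predicted weight in pounds from the boys' and girls' regressions in part (b)."""
    height4 = height_inches - 48  # assumed definition: inches above 4 feet
    if girl:
        return 53.60 + 3.49 * height4
    return 36.27 + 5.32 * height4


# Heights taken from the table of typical measurements in the question.
boy_age_11 = predict_weight(56, girl=False)
girl_age_10 = predict_weight(52, girl=True)
girl_age_12 = predict_weight(60, girl=True)

# Height at which the two lines cross: shorter girls are predicted to be heavier
# than boys of the same height, taller boys heavier than girls.
crossing_height4 = (53.60 - 36.27) / (5.32 - 3.49)

print(round(boy_age_11, 1), round(girl_age_10, 1), round(girl_age_12, 1))
print(round(crossing_height4, 2))
```

Comparing these predictions with the tabulated weights (77, 70, and 92 pounds) shows how close the fitted lines come to the typical values.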
4) You have learned that earnings functions are one of the most investigated relationships in economics. These
typically relate the logarithm of earnings to a series of explanatory variables such as education, work
experience, gender, race, etc.
(a) Why do you think that researchers have preferred a log-linear specification over a linear specification? In
addition to the interpretation of the slope coefficients, also think about the distribution of the error term.
(b) To establish age-earnings profiles, you regress ln(Earn) on Age, where Earn is weekly earnings in dollars,
and Age is in years. Plotting the residuals of the regression against age for 1,744 individuals looks as shown in
the figure:
(d) The critical value from the F-table is 4.61 at the 1% level. Hence the null hypothesis is rejected.
(e) Instead of the inverted V-shape for the above regression, an inverted U -shape would most likely
produce a better fit. This can be generated through the use of a polynomial regression model of degree 2.
5) Sports economics typically looks at winning percentages of sports teams as one of various outputs, and
estimates production functions by analyzing the relationship between the winning percentage and inputs. In
Major League Baseball (MLB), the determinants of winning are quality pitching and batting. The data cover all
30 MLB teams for the 1999 season. Pitching quality is approximated by Team Earned Run Average (ERA), and hitting
quality by On Base Plus Slugging Percentage (OPS).
Summary of the Distribution of Winning Percentage, On Base Plus Slugging Percentage,
and Team Earned Run Average for MLB in 1999

                                                       Percentile
                     Average   Standard    10%     25%     40%     50%        60%     75%     90%
                               deviation                           (median)
Team ERA                       0.53        3.84    4.35    4.72    4.78       4.91    5.06    5.25
OPS                            0.034       0.720   0.754   0.769   0.780      0.790   0.798   0.820
Winning Percentage             0.08        0.40    0.43    0.46    0.48       0.49    0.59    0.60
(0.89)
(0.11)
R2 = 0.31, SER=0.987
The F-statistic for the null hypothesis that both parameters involving the high impact minimum wage variable
are zero, is 42.16. Can you reject the null hypothesis that both coefficients are zero? Sketch the two regression
lines together with the 45° line and interpret the results again.
(c) To check the robustness of these results, you repeat the exercise using a new binary variable for the
so-called mining state (Dmining), i.e., the eleven states that have at least three percent of their total state
earnings derived from oil, gas extraction, and coal mining, in the 1980s. This results in the following output:
$\widehat{Ur}_i^{95} = 4.04 + 0.15\, Ur_i^{85} - 2.92\, Dmining + 0.37\, (Dmining \times Ur_i^{85})$,
(0.65) (0.09) (0.90) (0.10)
R2 = 0.31, SER=0.997
How confident are you that the previously found effect is due to minimum wages?
Answer: (a) A one percentage point increase in the 1985 unemployment rate is associated with an increase in the
1995 unemployment rate of 0.27 percentage points. Put differently, if one state had a one percentage point
higher unemployment rate in 1985 than another state, then this difference would shrink, on average, to 0.27
percentage points in 1995. 21 percent of the
variation in 1995 state unemployment rates is explained by the regression. If the fitted line coincided
with the 45° line, then the unemployment rates in 1995 would remain unchanged when compared to
1985. The estimated regression implies, unrealistically, mean reversion in the unemployment rates.
(b) The critical value for the F-statistic is 4.61 at the 1% level and hence the null hypothesis that both
coefficients are zero in the population is rejected. (The sample size is small, however, so the distribution
of the test statistic is not really known.) The intercept for the high-impact states is smaller and the slope
is steeper. This suggests that for high-impact states there is less of a mean reversion effect present: if
high-impact states had high 1985 unemployment rates, then they are expected to have higher
unemployment rates in 1995 when compared to a low-impact state. High and low unemployment rates
are thereby more persistent for high-impact states.
(c) The results here are similar to those in (b) in that the regression for the mining states is steeper than
the one for the other states. Perhaps omitted variables play a role here, such as relative (oil) price shocks
that affect some states more than others. Oil prices fell considerably over the time period and it is
possible that the high-impact binary variable coefficient picks up the effect of omitted variables.
Including more explanatory variables would be desirable.
7) Labor economists have extensively researched the determinants of earnings. Investment in human capital,
measured in years of education, and on the job training are some of the most important explanatory variables
in this research. You decide to apply earnings functions to the field of sports economics by finding the
determinants for baseball pitcher salaries. You collect data on 455 pitchers for the 1998 baseball season and
estimate the following equation using OLS and heteroskedasticity-robust standard errors:
$\ln(Earn_i) = 12.45 + 0.052\,Years + 0.00089\,Innings + 0.0032\,Saves$
(0.08) (0.026) (0.00020) (0.0018)
(0.0165)
(0.0026)
(0.00000012)
R2 =0.69, SER=0.666
What is her reasoning? Are the coefficients of the quadratic terms statistically significant? Are they meaningful?
(d) Calculate the effect of moving from two to three years, as opposed to from 12 to 13 years.
(e) You also decide to test the specification for stability across leagues (National League and American League)
by including a dummy variable for the National League and allowing the intercept and all slopes to differ. The
resulting F-statistic for restricting all coefficients that involve the National League dummy variable to zero, is
0.40. Compare this to the relevant critical value from the table and decide whether or not these additional
variables should be included.
Answer: (a) For staying an additional year in the league, the pitcher receives a 5.2 percent increase in earnings. On
average, the reliever with 10 more saves ends up with 3.2 percent higher earnings. Pitching 100
additional innings results in 8.9 percent higher earnings, and lowering the ERA by 1.5 increases earnings
by 1.3 percent. ERA, innings pitched, and number of saves are all quality of input indicators and should
therefore have the signs as in the regression above. Years in the major leagues stands as a proxy for on
the job training and should therefore carry a positive sign.
(b) Given that there is prior expectation on the sign of the coefficients, you should conduct a one-sided
hypothesis test. All variables with the exception of ERA carry statistically significant coefficients at the
5% level.
(c) Allowing for the quadratic terms to enter results in an inverted U-shape for the relationship between
the log of earnings, and both years in the league and innings pitched. Both coefficients are highly
significant and have resulted also in a significant ERA coefficient.
(d) Having played for two years and staying for one more year in the league results in an earnings
increase of 7.8 percent, while staying for an additional year after 12 years in the majors results in a
predicted decrease of 25.3 percent.
(e) $F_{7,\infty} = 2.01$ at the 5% level. Hence you cannot reject the null hypothesis of equality of coefficients
across leagues.
8) After analyzing the age-earnings profile for 1,744 workers as shown in the figure, it becomes clear to you that
the relationship cannot be approximately linear.
You estimate the following polynomial regression model, controlling for the effect of gender by using a binary
variable that takes on the value of one for females and is zero otherwise:
$Earn = 795.90 + 82.93\,Age - 1.69\,Age^2 + 0.015\,Age^3 - 0.0005\,Age^4$
(283.11) (29.29) (1.06) (0.016) (0.0009)
There is little difference between the two fits for values between the age of 25 and 60. The inverted
U-shape is well known to exist for age-earnings profiles, and hence the plot makes sense. There is no
interpretation for the intercept, since there is no data close to the origin. Females earn significantly less at
every age level.
(d) Since this is a nonlinear relationship, the effect will depend on the age level. This is described in
section 6.1 of the textbook. In essence, the predicted earnings value for one age level has to be computed
first. Next, the same has to be done for the age level plus one. Finally the two values are differenced to
find the change in earnings associated with the age level.
For the polynomial of degree 3, the first task is to consider the estimated change in earnings associated
with a change in age by one year, say from 30 to 31. This is given by
$\Delta\hat{Y} = \hat{\beta}_1 (31 - 30) + \hat{\beta}_2 (31^2 - 30^2) + \hat{\beta}_3 (31^3 - 30^3)$, or
$\Delta\hat{Y} = \hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3$.
The standard error of the estimated effect is then given from
$SE(\Delta\hat{Y}) = \frac{|\Delta\hat{Y}|}{\sqrt{F}}$, where $F = \left[(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3) / SE(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3)\right]^2$.
A 95% confidence interval for the change in the expected value of earnings is
$(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3) \pm 1.96\,SE(\hat{\beta}_1 + 61\hat{\beta}_2 + 2791\hat{\beta}_3)$.
Obviously these expressions get quite complicated once you go beyond a quadratic.
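The calculation for the cubic case can be sketched numerically. The coefficient vector and covariance matrix below are hypothetical placeholders, not estimates from the text, and the standard error is computed directly as the square root of w'Vw (the variance of a linear combination of coefficients) rather than through the F-statistic route described above; the two give the same number for a single linear combination.

```python
import numpy as np

def effect_and_se(beta, cov, age0, age1):
    """Change in predicted Y for a cubic in age, with its standard error.

    beta: coefficients on (Age, Age^2, Age^3)
    cov:  3x3 covariance matrix of those coefficient estimates
    """
    # Weights on (beta1, beta2, beta3) implied by moving from age0 to age1.
    w = np.array([age1 - age0, age1**2 - age0**2, age1**3 - age0**3], dtype=float)
    delta = float(w @ beta)
    se = float(np.sqrt(w @ cov @ w))
    return w, delta, se

# Hypothetical numbers, for illustration only (not estimates from the text).
beta = np.array([80.0, -1.5, 0.01])
cov = np.diag([25.0, 0.04, 1e-6])

w, delta, se = effect_and_se(beta, cov, 30, 31)
ci = (delta - 1.96 * se, delta + 1.96 * se)
print("weights:", w)
print("effect:", delta, "SE:", se, "95% CI:", ci)
```

The weights reproduce the 1, 61, and 2791 used in the expressions above; with a real regression, `beta` and `cov` would come from the estimation output.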
9) Earnings functions attempt to find the determinants of earnings, using both continuous and binary variables.
One of the central questions analyzed in this relationship is the returns to education.
(a) Collecting data from 253 individuals, you estimate the following relationship
ln(Earni) = 0.54 + 0.083 Educ, R2 = 0.20, SER = 0.445
(0.14) (0.011)
where Earn is average hourly earnings and Educ is years of education.
What is the effect of an additional year of schooling? If you had a strong belief that years of high school
education were different from college education, how would you modify the equation? What if your theory
suggested that there was a diploma effect?
(b) You read in the literature that there should also be returns to on-the-job training. To approximate
on-the-job training, researchers often use the so-called Mincer or potential experience variable, which is
defined as Exper = Age - Educ - 6. Explain the reasoning behind this approximation. Is it likely to resemble
years of employment for various sub-groups of the labor force?
(c) You incorporate the experience variable into your original regression
$\ln(Earn_i) = -0.01 + 0.101\,Educ + 0.033\,Exper - 0.0005\,Exper^2$,
(0.16) (0.012) (0.006) (0.0001)
$R^2 = 0.34$, SER = 0.405
What is the effect of an additional year of experience for a person who is 40 years old and had 12 years of
education? What about for a person who is 60 years old with the same education background?
(d) Test for the significance of each of the coefficients of the added variables. Why has the coefficient on
education changed so little? Sketch the age-(log)earnings profile for workers with 8 years of education and 16
years of education.
(e) You want to find the effect of introducing two variables, gender and marital status. Accordingly you specify
a binary variable that takes on the value of one for females and is zero otherwise (Female), and another binary
variable that is one if the worker is married but is zero otherwise (Married). Adding these variables to the
regressors results in:
$\ln(Earn_i) = 0.21 + 0.093\,Educ + 0.032\,Exper - 0.0005\,Exper^2$
(0.16) (0.012) (0.006) (0.0001)
(e) The coefficient for the female binary variable is statistically significant even at the 1% level. The
coefficient for the married binary variable only has a t-statistic of 1.11 and is not statistically significant
at the 10% level. Both coefficients indicate economic importance, since females make approximately 29
percent less than males and married people earn roughly 6 percent more. A married female earns
roughly 23 percent less than a single male. Married females earn 29 percent less than married males, the
same percentage that single females earn less than single males.
(f) The default is the single male. Single females earn 15.8 percent less. Married males earn 17.3 percent
more. Married females earn 20.3 percent less. Comparing married females with married males now
results in a percentage differential of 37.6 percent in favor of the males.
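The group comparisons in (e) and (f) can be checked with a few lines of arithmetic. This is a minimal sketch that takes the figures quoted in the answer as log-earnings shifts relative to the single male; the interaction coefficient is not reported in the text and is backed out here from the quoted group figures, so it is an implied value only.

```python
def log_earnings_shift(b_female, b_married, b_interaction=0.0):
    """Shift in ln(Earn) relative to the default group (single male)."""
    return {
        "single male": 0.0,
        "single female": b_female,
        "married male": b_married,
        "married female": b_female + b_married + b_interaction,
    }

# Part (e): no interaction; values read off the answer text.
additive = log_earnings_shift(-0.29, 0.06)

# Part (f): the interaction coefficient is implied by the answer's group
# figures (-0.203 = -0.158 + 0.173 + b_inter); it is not reported in the text.
b_inter = -0.203 - (-0.158) - 0.173
interacted = log_earnings_shift(-0.158, 0.173, b_inter)

gap_married = interacted["married female"] - interacted["married male"]
print(additive)
print(interacted)
print("married female vs. married male:", round(gap_married, 3))
```

Without the interaction, the female gap is the same for single and married workers by construction; the interaction term is what lets the married gap differ from the single gap.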
10) Among the most frequently estimated equations in the macroeconomic growth literature are so-called
convergence regressions. In essence the average per capita income growth rate is regressed on the
beginning-of-period per capita income level to see if countries that were further behind initially grew faster.
Some macroeconomic models make this prediction, once other variables are controlled for. To investigate this
matter, you collect data from 104 countries for the sample period 1960-1990 and estimate the following
relationship (numbers in parentheses are for heteroskedasticity-robust standard errors):
$g6090 = 0.020 - 0.360\,gpop + 0.004\,Educ - 0.053\,RelProd60$, $R^2 = 0.332$, SER = 0.013
(0.009) (0.241) (0.001) (0.009)
where g6090 is the growth rate of GDP per worker for the 1960-1990 sample period, RelProd60 is the initial
starting level of GDP per worker relative to the United States in 1960, gpop is the average population growth
rate of the country, and Educ is educational attainment in years for 1985.
(a) What is the effect of an increase of 5 years in educational attainment? What would happen if a country
could implement policies to cut population growth by one percent? Are all coefficients significant at the 5%
level? If one of the coefficients is not significant, should you automatically eliminate its variable from the list of
explanatory variables?
(b) The coefficient on the initial condition has to be significantly negative to suggest conditional convergence.
Furthermore, the larger this coefficient, in absolute terms, the faster the convergence will take place. It has been
suggested to you to interact education with the initial condition to test for additional effects of education on
growth. To test for this possibility, you estimate the following regression:
$g6090 = 0.015 - 0.323\,gpop + 0.005\,Educ - 0.051\,RelProd60$
(0.009) (0.238) (0.001) (0.013)
s Fallacy?
Answer: (a) Increasing educational attainment by 5 years results in an increase of productivity growth of 2
percent. Decreasing the population growth rate by one percent increases productivity growth by 0.4
percent. All coefficients are statistically significant at the 5% level with the exception of population
growth. You should not eliminate a variable simply because it is not statistically significant. It is better to
report the statistics and let the reader decide.
(b) $\frac{\Delta g6090}{\Delta Educ} = 0.005 - 0.0028\,RelProd60$. For West Germany, the effect is 0.3 percent, while for Brazil it is
0.4 percent. These are small gains, but they accumulate over time.
(c) $\frac{\Delta g6090}{\Delta RelProd60} = -0.051 - 0.0028\,Educ$, which therefore depends on educational attainment. Countries
with higher educational attainment will converge faster. The coefficient has a t-statistic of 1.87 and is
therefore statistically significant at the 5% level using a one-sided hypothesis test.
(d) The above regressions generate a mean reversion outcome. Interpreted literally, the implication is
that all countries end up with the same productivity or per capita income, just as all persons would be of
the same height. It can be shown that Galton's Fallacy is the result of errors-in-variables, which biases
the slope coefficient downward. This topic is covered in Chapter 7. The solution is to use instrumental
variable techniques, also discussed in Chapter 10. The literature in this area has done so, and the
convergence result persists.
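The two marginal effects in (b) and (c) are simple linear functions and can be evaluated directly. In this sketch the coefficients are taken from the answer above, while the RelProd60 and Educ values passed in are hypothetical illustrations, since the country data are not reproduced here.

```python
def educ_effect(relprod60):
    """Effect on growth of one more year of education, given initial relative productivity."""
    return 0.005 - 0.0028 * relprod60

def convergence_effect(educ):
    """Effect on growth of a unit change in RelProd60, given years of education."""
    return -0.051 - 0.0028 * educ

# Hypothetical inputs, chosen only to illustrate the direction of the effects.
for relprod in (0.0, 0.35, 0.70):
    print("RelProd60 =", relprod, "-> education effect", round(educ_effect(relprod), 5))
for educ in (2, 6, 10):
    print("Educ =", educ, "-> convergence coefficient", round(convergence_effect(educ), 4))
```

The education effect shrinks as initial relative productivity rises, and the convergence coefficient becomes more negative as educational attainment rises, which matches the verbal interpretation in the answer.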
11) Pages 283-284 in your textbook contain an analysis of the Return to Education and the Gender Gap. Column
(4) in Table 8.1 displays regression results using the 2009 Current Population Survey. The equation below
shows the regression result for the same specification, but using the 2005 Current Population Survey. Interpret
the major results.
$\ln(earnings) = 1.215 + 0.0899\,educ - 0.521\,DFemme + 0.0180\,(DFemme \times educ)$
(0.018) (0.0011) (0.022) (0.0016)
$+\ 0.0232\,exper - 0.000368\,exper^2 - 0.058\,Midwest - 0.0098\,South - 0.030\,West$
(0.0008) (0.000018) (0.006) (0.0078) (0.0030)
Answer: The return to education for males is approximately 9% and its coefficient has a t-statistic of 11.25. For
females, the return is slightly higher, approximately 11%. Since the binary variable for females is
interacted with the number of years of education, the gender gap depends on the number of years of
education. For the typical high school graduate (12 years of education), the gender gap is approximately
27%, while for the typical college graduate (16 years of education) the gender gap narrows to 19%. The
potential experience variable enters in an inverted U-shape, which is to be expected given the shape of
age-earnings profiles and the fact that potential experience depends on the age of the individual. There
is a declining marginal value for each year of potential experience until it eventually becomes negative.
Northeast is the omitted region, and all other regions have lower (log) earnings, ranging from 0.8% in the
South to 5.8% in the Midwest. All coefficients are statistically significant.
2) Suggest a transformation in the variables that will linearize the deterministic part of the population regression
functions below. Write the resulting regression function in a form that can be estimated by using OLS.
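(a) $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2}$
(b) $Y_i = \frac{X_i}{\beta_0 + \beta_1 X_i}$
(c) $Y_i = \frac{e^{\beta_0 + \beta_1 X_i}}{1 + e^{\beta_0 + \beta_1 X_i}}$
(d) $Y_i = \beta_0 X_{1i}^{\beta_1} e^{\beta_2 X_{2i}}$
Answer: (a) $\ln(Y_i) = \ln(\beta_0) + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i})$
(b) $\frac{1}{Y_i} = \beta_0 \frac{1}{X_i} + \beta_1$
(c) $\ln\left(\frac{Y_i}{1 - Y_i}\right) = \beta_0 + \beta_1 X_i$
(d) $\ln(Y_i) = \ln(\beta_0) + \beta_1 \ln(X_{1i}) + \beta_2 X_{2i}$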
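A quick numerical check of transformation (a): generate data from the multiplicative form, take logs, and run OLS on the transformed variables. The parameter values and sample below are made up for illustration, and no error term is added because the question concerns only the deterministic part.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical parameter values for the multiplicative model in (a).
b0, b1, b2 = 2.0, 0.5, -0.3
x1 = rng.uniform(1.0, 10.0, n)
x2 = rng.uniform(1.0, 10.0, n)
y = b0 * x1**b1 * x2**b2  # deterministic part only

# Regress ln(Y) on a constant, ln(X1), and ln(X2).
Z = np.column_stack([np.ones(n), np.log(x1), np.log(x2)])
coef, *_ = np.linalg.lstsq(Z, np.log(y), rcond=None)

print("estimated ln(b0), b1, b2:", coef)
print("implied b0:", np.exp(coef[0]))
```

OLS on the logged variables recovers ln(β0), β1, and β2; β0 itself is obtained by exponentiating the intercept.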
3) Indicate whether or not you can linearize the regression functions below so that OLS estimation methods can
be applied:
(a) $Y_i = e^{\beta_0 + \beta_1 X_i + u_i}$
(b) $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} + u_i$
Answer: (a) The function can be linearized by taking logs on both sides.
(b) The function cannot be linearized due to the additive error term.
4) Choose at least three different nonlinear functional forms of a single independent variable and sketch the
relationship between the dependent and independent variable.
Answer: Answers will vary by student. Most commonly used forms are the quadratic regression, the inverse (in
X) regression, and the log-log model.
5) In the case of perfect multicollinearity, OLS is unable to estimate the slope coefficients of the variables involved.
Assume that you have included both $X_1$ and $X_2$ as explanatory variables, and that $X_2 = X_1^2$, so that there is an
exact relationship between two explanatory variables. Does this pose a problem for estimation?
Answer: There is no problem for estimation, since the second explanatory variable is not linearly related to the
first. This is an example of a polynomial regression model of degree 2, which is frequently estimated in
econometrics.
6) The figure shows a plot and a fitted linear regression line of the age-earnings profile of 1,744 individuals,
taken from the Current Population Survey.
(a) Describe the problems in predicting earnings using the fitted line. What would the pattern of the residuals
look like for the age category under 40?
(b) What alternative functional form might fit the data better?
(c) What other variables might you want to consider in specifying the determinants of earnings?
Answer: (a) There would be many overpredictions for this age category under 40, and hence more negative
residuals.
(b) It would be better to fit a quadratic here, i.e., a polynomial regression model, which would produce
an inverted U-shape.
(c) Answers will vary by students, but education, gender, race, tenure with an employer, professional
choice, and ability are typically present in answers.
7) (Requires Calculus) Show that for the log-log model the slope coefficient is the elasticity.
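Answer: Consider the deterministic part $Y = AX^{\beta_1}$. Then $\ln(Y) = \beta_0 + \beta_1 \ln(X)$, where $\beta_0 = \ln(A)$. Now
$\frac{\partial \ln(Y)}{\partial \ln(X)} = \beta_1 = \frac{\frac{1}{Y}\,\partial Y}{\frac{1}{X}\,\partial X} = \frac{\partial Y}{\partial X}\,\frac{X}{Y}$.
Alternatively you can derive the same result by taking the derivative from $Y = AX^{\beta_1}$.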
8) Assume that you had data for a cross-section of 100 households with data on consumption and personal
disposable income. If you fit a linear regression function regressing consumption on disposable income, what
prior expectations do you have about the slope and the intercept? The slope of this regression function is called
the marginal propensity to consume. If, instead, you fit a log-log model, then what is the interpretation of
the slope? Do you have any prior expectation about its size?
Answer: For the log-log specification, the slope is the elasticity. Since there are many theories that predict a
constant average propensity to consume, the elasticity should equal one.
$\ln(x + \Delta x) - \ln(x) \approx \frac{\Delta x}{x}$. Show that this is equivalent to the following approximation:
$\ln(1 + y) \approx y$ if $y$ is small. You use this idea to estimate a demand for money function, which is of the form
$m = \beta_0 \times GDP^{\beta_1} \times (1 + R)^{\beta_2} \times e^{u}$, where m is the quantity of (real) money, GDP is the value of (real) Gross Domestic
Product, and R is the nominal interest rate. You collect the quarterly data from the Federal Reserve Bank of St.
Louis data bank (FRED), which lists the money supply and GDP in billions of dollars, prices as an index, and
nominal interest rates in percentage points per year.
You generate the variables in your regression program as follows: m = (money supply)/price index; GDP =
(Gross Domestic Product/Price Index), and R = nominal interest rate in percentage points per annum. Next you
perform the log-transformations on the real money supply, real GDP, and on (1+R). Can you foresee a problem
in using this transformation?
Answer: $\ln(x + \Delta x) - \ln(x) = \ln\left(\frac{x + \Delta x}{x}\right) = \ln\left(1 + \frac{\Delta x}{x}\right) = \ln(1 + y)$, where $y = \frac{\Delta x}{x}$.
Let $y = 0.05$; then $\ln(1 + y) = 0.049 \approx 0.05$. Note that this approximation does not hold well for larger
fractions, such as 0.60. The interest rate is listed in percentage points. Entering R as 5, rather than 0.05,
makes $\beta_2$ not equal a semi-elasticity.
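The quality of the approximation ln(1 + y) ≈ y is easy to tabulate. The sketch below uses only the Python standard library; the value y = 5 is included to show what happens when an interest rate is entered in percentage points instead of as a fraction.

```python
import math

def approx_error(y):
    """Difference between y and ln(1 + y)."""
    return y - math.log1p(y)

# 0.05 and 0.60 are the values discussed in the answer; 5 mimics entering
# an interest rate of 5 percent as R = 5 instead of R = 0.05.
for y in (0.01, 0.05, 0.60, 5.0):
    print(f"y = {y:5.2f}  ln(1+y) = {math.log1p(y):.4f}  error = {approx_error(y):.4f}")
```

For y = 0.05 the error is about 0.001, for y = 0.60 it is about 0.13, and for y = 5 the logarithm bears no resemblance to y, which is why the units of R matter for the interpretation of the coefficient.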
10) You have estimated an earnings function, where you regressed the log of earnings on a set of continuous
explanatory variables (in levels) and two binary variables, one for gender and the other for marital status. One
of the explanatory variables is education.
(a) Interpret the education coefficient.
(b) Next, specify the binary variables and an equation, where the default is a single male, without allowing for
interaction between marital status and gender. Indicate the coefficients that measure the effect of a single male,
single female, married male, and married female.
(c) Finally allow for an interaction between the gender and marital status binary variables. Repeat the exercise
of writing down the various effects based on the female/male and single/married status. Why is the latter
approach more general than the former?
Answer: (a) The coefficient on education gives you the return to education, i.e., if education increased by one
year, then by how many percent do earnings increase?
(b) Let DGender equal one if the individual is a female, and be zero otherwise. DMarried takes on a value
of one if the individual is married and is zero otherwise. The regression is
$\widehat{\ln(Earn)} = \beta_0 + \beta_1 DGender + \beta_2 DMarried + \ldots$
(c) $\widehat{\ln(Earn)} = \hat{\beta}_0 + \hat{\beta}_1 DGender + \ldots$
11) You have been told that the money demand function in the United States has been unstable since the late 1970s.
To investigate this problem, you collect data on the real money supply (m=M/P; where M is M1 and P is the
GDP deflator), (real) gross domestic product (GDP) and the nominal interest rate (R). Next you consider
estimating the demand for money using the following alternative functional forms:
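(i) $m = \beta_0 + \beta_1 \times GDP + \beta_2 \times R + u$
(ii) $m = \beta_0 \times GDP^{\beta_1} \times R^{\beta_2} \times e^{u}$
(iii) $m = \beta_0 \times GDP^{\beta_1} \times (1 + R)^{\beta_2} \times e^{u}$
Give an interpretation for $\beta_1$ and $\beta_2$ in each case. How would you calculate the income elasticity in case (i)?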
Answer: In (i), both coefficients show the effect of a unit increase of the respective variables on the demand for
money. In (ii), the two coefficients are elasticities. In (iii), $\beta_1$ is an elasticity, whereas $\beta_2$ is often referred
For this to be a parsimonious presentation of the initial regression, the following two restrictions must
hold: $\beta_1 = -\beta_2$, and $\beta_3 = -\beta_4$. The use of an F-test is required here to test the restrictions simultaneously.
The intercept is still present in the equation, and the assertion therefore cannot be true.
13) Earnings functions attempt to predict the log of earnings from a set of explanatory variables, both binary and
continuous. You have allowed for an interaction between two continuous variables: education and tenure with
the current employer. Your estimated equation is of the following type:
^
ln P) 1 eu
16) Being a competitive female swimmer, you wonder if women will ever be able to beat the time of the male gold
medal winner. To investigate this question, you collect data for the Olympic Games since 1910. At first you
consider including various distances, a binary variable for Mark Spitz, and another binary variable for the
arrival and presence of East German female swimmers, but in the end decide on a simple linear regression.
Your dependent variable is the ratio of the fastest women's time to the fastest men's time in the 100 m
backstroke, and the explanatory variable is the year of the Olympics. The regression result is as follows,
$TFoverM = 4.42 - 0.0017\,Olympics$,
where TFoverM is the relative time of the gold medal winner, and Olympics is the year of the Olympic Games.
What is your prediction when females will catch up to men in this discipline? Does this sound plausible? What
other functional form might you want to consider?
Answer: According to the above regression, women will catch up in the year 2011.76 or 2012. (This happens to be
an Olympics year.) This is not plausible for swimming, and a better functional form would be
$TFoverM = \beta_0 + \beta_1 \frac{1}{Olympics}$.
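The catch-up year follows from setting the fitted ratio equal to one and solving for the year. A minimal sketch, using the intercept and slope reported in the question (the function name is just an illustrative choice):

```python
def catch_up_year(intercept, slope, target=1.0):
    """Year at which the fitted line intercept + slope * year reaches the target ratio."""
    return (target - intercept) / slope

year = catch_up_year(4.42, -0.0017)
print(round(year, 2))
```

Because the linear form keeps falling below one after that date, it implies women eventually swim arbitrarily faster than men, which is one reason the inverse specification with an asymptote is the more sensible choice.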
17) Sketch for the log-log model what the relationship between Y and X looks like for various parameter values of
the slope, i.e., $\beta_1 > 1$; $0 < \beta_1 < 1$; $\beta_1 = -1$.
Answer:
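$\frac{\partial \ln(Y_t)}{\partial t} = \frac{1}{Y_t}\frac{\partial Y_t}{\partial t}$
$\ln(Y_t) = \ln(Y_0) + \ln(1 + g) \times t = \beta_0 + \beta_1 t$, where $\beta_0 = \ln(Y_0)$ and $\beta_1 = \ln(1 + g) \approx g$ for small g. Hence if g is small, then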
regressing the log of a variable on time generates a slope coefficient which is approximately the
proportionate rate of growth for small growth rates.
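This can be verified on a made-up series that grows at a constant rate. The growth rate, starting level, and sample length below are arbitrary assumptions; `numpy.polyfit` is used as a convenient way to run the regression of ln(Y) on time.

```python
import numpy as np

g = 0.02                     # assumed constant growth rate per period
t = np.arange(50)            # time index 0, 1, ..., 49
y = 100.0 * (1.0 + g) ** t   # Y_t = Y_0 (1 + g)^t with Y_0 = 100

# Regress ln(Y_t) on t; polyfit returns the slope first for degree 1.
slope, intercept = np.polyfit(t, np.log(y), 1)

print("slope:", slope, " ln(1+g):", np.log(1.0 + g), " g:", g)
print("exp(intercept):", np.exp(intercept))
```

The slope equals ln(1 + g) exactly and differs from g by about 0.0002 here; the gap widens as g grows.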
19) Your task is to estimate the ice cream sales for a certain chain in New England. The company makes available
to you quarterly ice cream sales (Y) and informs you that the price per gallon has approximately remained
constant over the sample period. You gather information on average daily temperatures (X) during these
quarters and regress Y on X, adding seasonal binary variables for spring, summer, and fall. These variables are
constructed as follows: DSpring takes on a value of 1 during the spring and is zero otherwise, DSummer takes
on a value of 1 during the summer, etc. Specify three regression functions where the following conditions hold:
the relationship between Y and X is (i) forced to be the same for each quarter; (ii) allowed to have different
intercepts each season; (iii) allowed to have varying slopes and intercepts each season. Sketch the difference
between (i) and (ii). How would you test which model fits the data the best?
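Answer: (i) $Y_i = \beta_0 + \beta_1 X_i + u_i$;
(ii) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 DSpring + \beta_3 DSummer + \beta_4 DFall + u_i$;
(iii) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 DSpring + \beta_3 DSummer + \beta_4 DFall$
$+\ \beta_5 (DSpring \times X_i) + \beta_6 (DSummer \times X_i) + \beta_7 (DFall \times X_i) + u_i$;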
(iii) is the most general of the models, the others are nested. Hence you can use the F-test to see if certain
restrictions hold. For example, (i) is a parsimonious representation of (iii) if all coefficients involving the
seasonal binary variables are simultaneously equal to zero.
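The nested test of (i) against (iii) can be sketched with simulated data. Everything in this block is an assumption made for illustration: the data are generated artificially, and the statistic is the homoskedasticity-only F computed from restricted and unrestricted sums of squared residuals, not a heteroskedasticity-robust version.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

rng = np.random.default_rng(1)
n = 80
season = np.arange(n) % 4                 # 0 = winter (omitted), 1 = spring, 2 = summer, 3 = fall
x = rng.normal(50.0, 15.0, n)             # simulated temperatures
d = np.column_stack([(season == k).astype(float) for k in (1, 2, 3)])
y = 10.0 + 0.5 * x + d @ np.array([2.0, 6.0, 1.0]) + rng.normal(0.0, 1.0, n)

X_i = np.column_stack([np.ones(n), x])                      # model (i)
X_iii = np.column_stack([np.ones(n), x, d, d * x[:, None]]) # model (iii)

q = X_iii.shape[1] - X_i.shape[1]          # number of restrictions
ssr_r, ssr_u = ssr(X_i, y), ssr(X_iii, y)
F = ((ssr_r - ssr_u) / q) / (ssr_u / (n - X_iii.shape[1]))
print("restrictions:", q, " F-statistic:", F)
```

A large F relative to the critical value of the F distribution with q numerator degrees of freedom rejects the restriction that all seasonal terms are zero; the same construction with model (ii) as the restricted model tests only the interaction terms.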
20) In estimating the original relationship between money wage growth and the unemployment rate, Phillips used
United Kingdom data from 1861 to 1913 to fit a curve of the following functional form
$\left(\frac{\Delta W}{W} + \beta_0\right) = \beta_1 \times ur^{\beta_2} \times e^{u}$,
where $\frac{\Delta W}{W}$ is the percentage change in money wages and ur is the unemployment rate. Sketch the function.
What role does $\beta_0$ play? Can you find a linear transformation that allows you to estimate the above function
using OLS? If, after taking logarithms on both sides of the equation, you tried to estimate $\beta_1$ and $\beta_2$ using OLS
by choosing different values for $\beta_0$ by a "trial and error procedure" (Phillips's words), what sort of problem
might you run into with the left-hand side variable for some of the observations?
Answer: Given the shape of the Phillips curve, $\beta_2$ will be negative and $\beta_1$ will be positive. Hence for large values
of ur, $\beta_1 ur^{\beta_2}$ will be approximately zero, and $-\beta_0$ is the lower asymptote of $\frac{\Delta W}{W}$. Taking logarithms on
both sides results in $\ln\left(\frac{\Delta W}{W} + \beta_0\right) = \ln(\beta_1) + \beta_2 \ln(ur) + u$, which cannot be estimated by OLS due to the
form of the dependent variable. Choosing different values for $\beta_0$ can result in situations where
$\left(\frac{\Delta W}{W} + \beta_0\right)$ is negative, and hence its logarithm is not defined.
21) Using a spreadsheet program such as Excel, plot the following logistic regression function with a single X,
$\hat{Y}_i = \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 X_i)}}$, where $\hat{\beta}_0 = -4.13$ and $\hat{\beta}_1 = 5.37$. Enter values of X in the first column starting from 0 and then
incrementing these by 0.1 until you reach 2.0. Then enter the logistic function formula in the next column.
Finally produce a scatter plot, connecting the predicted values with a line.
Answer:
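The same table of values can be produced outside a spreadsheet. This sketch uses the coefficients given in the question and only the Python standard library; it prints the two columns and leaves the plotting step to whatever tool is at hand.

```python
import math

B0, B1 = -4.13, 5.37  # coefficients given in the question

def logistic(x):
    """Predicted value of the logistic function at x."""
    return 1.0 / (1.0 + math.exp(-(B0 + B1 * x)))

# X runs from 0 to 2.0 in steps of 0.1 (21 points).
xs = [round(0.1 * i, 1) for i in range(21)]
ys = [logistic(x) for x in xs]

for x, y in zip(xs, ys):
    print(f"{x:3.1f}  {y:.4f}")
```

The predicted values rise in an S-shape from close to 0 at X = 0 toward 1 at X = 2, crossing 0.5 where $\hat{\beta}_0 + \hat{\beta}_1 X = 0$, that is, at X of roughly 0.77.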
22) Table 8.1 on page 284 of your textbook displays the following estimated earnings function in column (4):
$\ln(earnings) = 1.503 + 0.1032\,educ - 0.451\,DFemme + 0.0143\,(DFemme \times educ)$
(0.023) (0.0012) (0.024) (0.0017)
$+\ 0.0232\,exper - 0.000368\,exper^2 - 0.058\,Midwest - 0.0098\,South - 0.030\,West$
(0.0012) (0.000023) (0.006) (0.006) (0.007)
n = 52.790, $R^2 = 0.267$
Given that the potential experience variable (exper) is defined as (Age-Education-6) find the age at
which individuals with a high school degree (12 years of education) and with a college degree (16
years of education) have maximum earnings, holding all other factors constant.
Answer: The answer can be found either by using calculus or graphical/spreadsheet techniques. Maximum
earnings occurs at potential experience of 31.5. Hence with 12 years of education, the maximum earnings
happen at age 49.5, while for a person with 16 years of education these occur at 53.5 years. (Since taking
logarithms results in a monotonic transformation of the original data, the same results hold for the log
of earnings as for earnings).
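The calculus route amounts to setting the derivative of the quadratic in experience to zero. A minimal sketch using the two experience coefficients from the estimated equation and the definition exper = Age - Education - 6 (the function names are illustrative):

```python
B_EXPER = 0.0232       # coefficient on exper
B_EXPER2 = -0.000368   # coefficient on exper^2

def peak_experience(b1, b2):
    """Experience level at which b1*exper + b2*exper^2 is maximized (requires b2 < 0)."""
    return -b1 / (2.0 * b2)

def peak_age(educ, exper_star):
    """Age implied by exper = Age - Education - 6."""
    return exper_star + educ + 6

exper_star = peak_experience(B_EXPER, B_EXPER2)
age_hs = peak_age(12, exper_star)
age_college = peak_age(16, exper_star)
print(round(exper_star, 1), round(age_hs, 1), round(age_college, 1))
```

The education level shifts the peak age one-for-one because it enters only through the definition of potential experience, not through the quadratic itself.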
23) Consider a typical beta convergence regression function from macroeconomics, where the growth of a country's
per capita income is regressed on the initial level of per capita income and various other economic and
socio-economic variables. Assume that two of these variables are the average number of years of education in
the specific country and a binary variable which indicates whether or not the country experienced a significant
number of years of civil war/unrest. Explain why it would make sense to have these two variables enter
separately and also why you should use an interaction term. What signs would you expect on the three
coefficients?
Answer: Simple extensions of the standard neoclassical growth model suggest that the number of years of
education have a positive effect on conditional growth in the wealth of a nation (per capita income). A
civil war would have a negative effect on the investment/output ratio (savings rate) and you would
therefore expect a negative sign on the coefficient. However, it is important to interact the variables
because no matter how much education the average person has, there will be virtually no investment in
a country during a civil war. Hence you would expect a negative sign, which would indicate the effect
that a civil war has on the education effect.
24) Consider the following regression of test scores on an intercept, a binary variable that equals 1 if the
student-teacher ratio is 20 or more (HiSTR) and another binary variable that equals 1 if the percentage of
English learners is 10% or more (HiEL).
$TestScore = 664.1 - 1.9\,HiSTR - 18.2\,HiEL - 3.5\,(HiSTR \times HiEL)$
Using the two by two table below, fill in the expected test scores of a student with various combinations of the
high/low student-teacher ratio and the high/low percent of English learners.
              STR < 20    STR ≥ 20
EL < 10%
EL ≥ 10%
Answer:
              STR < 20    STR ≥ 20
EL < 10%      664.1       662.2
EL ≥ 10%      645.9       640.5
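The four cells follow from plugging the 0/1 combinations into the estimated equation. A short sketch, with the coefficients taken from the regression in the question:

```python
def predicted_score(hi_str, hi_el):
    """Predicted test score for given values (0 or 1) of HiSTR and HiEL."""
    return 664.1 - 1.9 * hi_str - 18.2 * hi_el - 3.5 * hi_str * hi_el

# Rows: EL < 10% (HiEL = 0), EL >= 10% (HiEL = 1); columns: STR < 20, STR >= 20.
table = {(hi_str, hi_el): round(predicted_score(hi_str, hi_el), 1)
         for hi_el in (0, 1) for hi_str in (0, 1)}

for hi_el in (0, 1):
    print([table[(hi_str, hi_el)] for hi_str in (0, 1)])
```

The interaction coefficient shows up as the difference between the two column gaps: moving to a high student-teacher ratio lowers the predicted score by 1.9 when English learners are few and by 5.4 when they are many.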
1+
2
x
2
x +
2
w
14) In the case of a simple regression, where the independent variable is measured with i.i.d. error,
A)
B)
C)
D)
2
X
2
X+
2
X
2
X+
2
w
2
w
2
X+
2
w
1+
2
w
1.
2
X
2
X+
2
w
Answer: A
15) In the case of errors-in-variables bias,
A) maximum likelihood estimation must be used.
B) the OLS estimator is consistent if the variance in the unobservable variable is relatively large compared to
variance in the measurement error.
C) the OLS estimator is consistent, but no longer unbiased in small samples.
D) binary variables should not be used as independent variables.
Answer: B
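The role of the two variances can be illustrated by simulation. The sketch below assumes classical measurement error (the error w is independent of the true regressor and of the regression error), for which the OLS slope converges to $\beta_1 \sigma_X^2 / (\sigma_X^2 + \sigma_w^2)$; all parameter values are made up.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

beta1 = 2.0      # true slope (assumed)
sigma_x = 1.0    # std. dev. of the true regressor (assumed)
sigma_w = 0.5    # std. dev. of the measurement error (assumed)

x = rng.normal(0.0, sigma_x, n)        # true, unobserved regressor
w = rng.normal(0.0, sigma_w, n)        # classical measurement error
u = rng.normal(0.0, 1.0, n)            # regression error
y = 1.0 + beta1 * x + u
x_obs = x + w                          # what the researcher actually observes

# OLS slope from regressing y on the mismeasured regressor.
slope = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
expected = beta1 * sigma_x**2 / (sigma_x**2 + sigma_w**2)

print("OLS slope:", slope, " attenuated limit:", expected, " true slope:", beta1)
```

With these values the slope settles near 1.6 instead of 2.0. Shrinking `sigma_w` relative to `sigma_x` moves the limit toward the true slope, which is the sense in which a relatively large variance of the unobserved variable limits the damage.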
16) Sample selection bias occurs when
A) the choice between two samples is made by the researcher.
B) data are collected from a population by simple random sampling.
C) samples are chosen to be small rather than large.
D) the availability of the data is influenced by a selection process that is related to the value of the dependent
variable.
Answer: D
17) Simultaneous causality
A) means you must run a second regression of X on Y.
B) leads to correlation between the regressor and the error term.
C) means that a third variable affects both Y and X.
D) cannot be established since regression analysis only detects correlation between variables.
Answer: B
18) Correlation of the regression error across observations
A) results in incorrect OLS standard errors.
B) makes the OLS estimator inconsistent, but not unbiased.
C) results in correct OLS standard errors if heteroskedasticity-robust standard errors are used.
D) is not a problem in cross-sections since the data can always be reshuffled.
Answer: A
19) Applying the analysis from the California test scores to another U.S. state is an example of looking for
A) simultaneous causality bias.
B) external validity.
C) sample selection bias.
D) internal validity.
Answer: B
20) Comparing the California test scores to test scores in Massachusetts is appropriate for external validity if
A) Massachusetts also allowed beach walking to be an appropriate P.E. activity.
B) the two income distributions were very similar.
C) the student-to-teacher ratio did not differ by more than five on average.
D) the institutional settings in California and Massachusetts, such as organization in classroom instruction
and curriculum, were similar in the two states.
Answer: D
21) The guidelines for whether or not to include an additional variable include all of the following, with the
exception of
A) providing full disclosure representative tabulations of the results.
B) testing whether additional questionable variables have nonzero coefficients.
C) determining whether it can be measured in the population of interest.
D) being specific about the coefficient or coefficients of interest.
Answer: C
22) Possible solutions to omitted variable bias, when the omitted variable is not observed, include the following
with the exception of
A) panel data estimation.
B) nonlinear least squares estimation.
C) use of instrumental variables regressions.
D) use of randomized controlled experiments.
Answer: B
23) A possible solution to errors-in-variables bias is to
A) use log-log specifications.
B) choose different functional forms.
C) use the square root of that variable since the error becomes smaller.
D) mitigate the problem through instrumental variables regression.
Answer: D
24) You try to explain the number of IBM shares traded in the stock market per day in 2005. As an independent
variable you choose the closing price of the share. This is an example of
A) simultaneous causality.
B) invalid inference due to a small sample size.
C) sample selection bias since you should analyze more than one stock.
D) a situation where homoskedasticity-only standard errors should be used since you only analyze one
company.
Answer: A
25) In the case of errors-in-variables bias, the precise size and direction of the bias depend on
A) the sample size in general.
B) the correlation between the measured variable and the measurement error.
C) the size of the regression R2 .
D) whether the good in question is price elastic.
Answer: B
height of children on the average height of their parents. He found a positive intercept and a slope between
zero and one. Being concerned about the height of the English aristocracy, he interpreted the results as
regression to mediocrity (hence the name regression). Do you see the parallel?
Answer: (a) High (low) reading and maths scores in 1998 will result in high (low) reading and maths scores in
1999. The slope coefficients suggest a high degree of persistence. However, both regression lines cross
the 45 degree line, thereby implausibly implying mean reversion. All coefficients are statistically
significant, and approximately 80 to 90 percent of the variation in the 1999 scores is explained by the
1998 scores.
(b) The biggest threat to internal validity stems from the errors-in-variables problem. Assume that the
test scores in maths in a given year are determined by a given set of factors, such as class size,
socioeconomic variables of the school district, quality of teachers, etc. Let the maths score in the second
year also be determined by the same factors, which are unlikely to change by much between the two
years. Then subtracting the earlier year from the more current year results in a population regression
function with a slope of one and an intercept of zero, and an error term which is correlated with the
previous year's score. Hence the OLS estimator will be biased downward from one and the intercept will
be biased upward from zero, giving the above result.
There are few threats to internal or external validity present through the other factors, although the L.A.
school district may not be typical when compared to a less urban setting.
(c) The coefficients are unaffected by the choice of standard error calculation. However, hypothesis tests
no longer have the desired significance levels, unless the errors are homoskedastic. There is no
suggestion from the institutional setting of the district that this should be the case here. (Indeed,
homoskedasticity is rejected for the above sample.)
(d) In that case the intercept would be zero, and the slope one. This is a simultaneous hypothesis, and
hence the F-test is appropriate here.
(e) The critical value is 4.61 at the 1% level, thereby comfortably rejecting the null hypothesis in each
case.
(f) The situation is similar here. Instead of regressing the outcome in one period on determining factors,
it is regressed on the outcome in a previous period. In each case the outcome in the previous period is an
imperfect measure, or contains a measurement error, of the underlying determinants. This results in problems
with internal validity.
3) Keynes postulated that the marginal propensity to consume ($MPC = \partial C / \partial Y_{pd}$) is between zero and one. He also
hypothesized that the average propensity to consume ($APC = C / Y_{pd}$) would fall as personal disposable income
$Y_{pd}$ increased.
(a) Specify a linear consumption function. Show that the assumption of a falling APC implies the presence of a
positive intercept.
(b) Using annual per capita data, estimation of the consumption function for the United States results in the
following output for the years 1929-1938:
^
national income from 1869 to 1938 and found, using overlapping period averages, that the APC was relatively
constant over this period. To reconcile this finding with the regression results, Milton Friedman, who also won
the Nobel Prize, formulated the permanent income hypothesis. In essence, Friedman hypothesized that both
actual consumption and income are measured with error,
$\tilde{C}_t = C_t + v_t$ and $\tilde{Y}_t = Y_t + w_t$,
where $C_t$ and $Y_t$ were called permanent consumption and income, respectively, and $v_t$ and $w_t$, the two
measurement errors, were labeled transitory consumption and income. Friedman hypothesized that the
transitory components were purely random error terms, uncorrelated with the permanent parts.
Let permanent consumption and income be related as follows:
$C_t = k\,Y_{pd,t} + u_t$
so that the APC and MPC are the same and constant over time. Furthermore, let both transitory and permanent
income be independent of the error term. Show that by regressing actual consumption on actual income, the
MPC will be downward biased, and the intercept will be greater than zero, even in large samples (to simplify
the analysis, assume that permanent income and all of the errors are i.i.d. and mutually independent).
Answer: (a) $C_i = \beta_0 + \beta_1 Y_{pd,i}$, so that
$\frac{C_i}{Y_{pd,i}} = APC = \beta_0 \frac{1}{Y_{pd,i}} + \beta_1$.
Hence, for $\beta_0 > 0$, the APC will fall with increases in personal disposable income.
(b) Assuming that all assumptions required for proper inference are satisfied here, the t-statistic for an
MPC of one is 6.97, thereby rejecting the null hypothesis. You can also reject the null hypothesis that the
slope is zero (t-statistic = 26.32). The sample is very small here and certainly less than the number of
observations required to permit the use of the standard normal distribution. There may also be omitted
variables here, such as wealth, the real interest rate, the inflation rate, etc. The functional form may be
misspecified, and there may be errors in variables (permanent income). Perhaps most seriously, there is
simultaneous causality present, given the GDP identity.
(c) Dividing both sides of the identity by GDP results in $1 = \frac{C_t}{Y_t} + \frac{I_t}{Y_t} + \frac{G_t}{Y_t}$. With the APC falling over
time as income increased, either the investment output ratio or the government output ratio would have
to make up for this fall. The likely candidate was the government-expenditure share.
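(d) This is the standard errors-in-variables problem discussed in the textbook. Following the derivation
there, $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1$, where $X$ is permanent
income, and $w$ is the measurement error in income. Hence the marginal propensity to consume will be
biased towards zero. For the intercept, $\hat\beta_0 = \beta_0 - (\hat\beta_1 - \beta_1)\bar{\tilde X} + \bar v$,
where $\bar{\tilde X}$ is the sample mean of measured income. Therefore
$\hat\beta_0 \xrightarrow{p} \beta_0 + \mu_X \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$, since
$\hat\beta_1 - \beta_1 \xrightarrow{p} -\beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$ and $\bar{\tilde X} \xrightarrow{p} \mu_X$.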
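The large-sample behavior of the slope and intercept under measurement error can be checked numerically. The following is only a simulation sketch, not part of the original answer: the parameter values (k = 0.9, mean permanent income of 100, and the three standard deviations) and the function names are illustrative assumptions, and all errors are drawn as independent normals.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical parameter values, chosen only for illustration.
K, MU_X, SD_X, SD_W, SD_V = 0.9, 100.0, 10.0, 10.0, 5.0

def simulate(n=100_000, seed=0):
    rng = random.Random(seed)
    income, consumption = [], []
    for _ in range(n):
        y_perm = rng.gauss(MU_X, SD_X)                # permanent income
        income.append(y_perm + rng.gauss(0.0, SD_W))  # measured income
        # Measured consumption: k * permanent income plus one combined
        # noise term standing in for the equation error and transitory part.
        consumption.append(K * y_perm + rng.gauss(0.0, SD_V))
    return ols(income, consumption)

noise_share = SD_W**2 / (SD_X**2 + SD_W**2)
predicted_slope = K * (1 - noise_share)          # attenuated MPC
predicted_intercept = MU_X * K * noise_share     # positive although the true intercept is zero
intercept_hat, slope_hat = simulate()
print(slope_hat, predicted_slope)
print(intercept_hat, predicted_intercept)
```

Under these assumed values the fitted slope should settle near half of k and the fitted intercept should be clearly positive, even though the simulated permanent relationship has no intercept.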
4) The Phillips curve is a relationship in macroeconomics between the inflation rate (inf) and the unemployment
rate (ur). Estimating the Phillips curve using quarterly data for the United States from 1962:I to 1995:IV, you
find
$\widehat{Inf}_t = 4.08 + 0.118\,ur_t$, $R^2 = 0.003$, SER = 3.148
(1.11) (0.176)
(a) Explain why, at first glance, this is a surprising result.
(b) Do you think that there is omitted variable bias in the regression?
(c) What other threats to internal validity may be present?
(d) If you could find a proper specification for the Phillips curve using United States data, what external
validity criteria would you suggest?
Answer: (a) There is supposed to be a negative relationship between inflation and unemployment.
(b) The omitted variables are inflationary expectations and the natural rate of unemployment.
(c) There is simultaneous causality in that inflation also causes employment and thereby unemployment
in many models. The functional form is most likely incorrect, since the Phillips curve is typically not
shown as a straight line. There may also be omitted variables in the form of supply side shocks.
(d) The most obvious choice would be to estimate the Phillips curve for other countries. It is also possible
to estimate the Phillips curve for a cross-section of countries. Using state data is more problematic since
state unemployment rates vary, but inflation rates are very similar and only exist for certain cities (using
the CPI).
5) You have decided to analyze the year-to-year variation in temperature data. Specifically you want to use this
year's temperature to predict next year's temperature for certain cities. As a result, you collect the daily high
temperature (Temp) for 100 randomly selected days in a given year for three United States cities: Boston,
Chicago, and Los Angeles. You then repeat the exercise for the following year. The regression results are as
follows (heteroskedasticity-robust standard errors in parentheses):
$\widehat{Temp}_t^{BOS} = 18.19 + 0.75\,Temp_{t-1}^{BOS}$; $R^2 = 0.62$, SER = 12.33
(6.46) (0.10)
$\widehat{Temp}_t^{CHI} = 2.47 + 0.95\,Temp_{t-1}^{CHI}$; $R^2 = 0.93$, SER = 5.85
(3.98) (0.05)
$\widehat{Temp}_t^{LA} = 37.54 + 0.44\,Temp_{t-1}^{LA}$; $R^2 = 0.18$, SER = 7.17
(15.33) (0.22)
(a) What is the prediction of the above regression for Los Angeles if the temperature in the previous year was
75 degrees? What would be the prediction for Boston?
(b) Assume that the previous year's temperature gives accurate predictions, on average, for this year's
temperature. What values would you expect in this case for the intercept and slope? Sketch how each of the
above regressions behaves compared to this line.
(c) After reflecting on the results a bit, you consider the following explanation for the above results. Daily high
temperatures on any given date are measured with error in the following sense: for any given day in any of the
three cities, say January 28, there is a true underlying seasonal temperature ($X$), but each year there are
different temporary weather patterns ($v$, $w$) which result in a temperature $\tilde X$ different from $X$. For the two years
in your data set, the situation can be described as follows:
$\tilde X_{t1} = X + v_t$ and $\tilde X_{t2} = X + w_t$.
Subtracting $\tilde X_{t1}$ from $\tilde X_{t2}$, you get $\tilde X_{t2} = \tilde X_{t1} + w_t - v_t$. Hence the population parameters for the intercept and
slope are zero and one, as expected. Show that the OLS estimator for the slope is inconsistent, where
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$
(d) Use the formula above to explain the differences in the results for the three cities. Is your mathematical
explanation intuitively plausible?
Answer: (a) The prediction for Los Angeles is 70.5 degrees, and for Boston 74.4 degrees.
(b) In that case, the intercept would be zero, and the slope one.
(c) The derivation follows footnote 2 in the textbook with one modification: $\beta_1 = 1$.
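(d) Rewriting $\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$ as
$1 - \frac{1}{1 + \sigma_X^2 / \sigma_v^2}$ shows that the slope in the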
temperature regression will be closer to one, the more variation there is in the underlying true
temperature. Temperatures in Los Angeles vary the least throughout the year, and you would therefore
expect the largest bias. The slope for Chicago suggests that temperatures there have the most variation.
The standard deviation for the Boston temperature is 19.5 and for Chicago 21.0. However, these are
actual temperature standard deviations. To calculate the variance of X in the above example, you could
collect data over a 100-year period on the same dates and form daily averages. It is the standard
deviation of these temperatures that would most resemble the standard deviation in X.
6) A study of United States and Canadian labor markets shows that aggregate unemployment rates between the
two countries behaved very similarly from 1920 to 1982, when a two percentage point gap opened between the
two countries, which has persisted over the last 20 years. To study the causes of this phenomenon, you specify
a regression of Canadian unemployment rates on demographic variables, aggregate demand variables, and
labor market characteristics.
(a) Assume that your analysis is internally valid. What would make it externally valid?
(b) If one of the determinants of Canadian unemployment is aggregate United States economic activity (or
perhaps shocks to it), what variable would you suggest as its replacement if you did a similar study for the
United States?
(c) Certain Canadian geographical areas, such as the prairies and British Columbia, seem particularly sensitive
to commodity price shocks (Edmonton's NHL team is called the Edmonton Oilers). Having collected provincial
data, you establish a relationship between provincial unemployment rates and commodity price changes
(shocks). How would you address external validity now?
Answer: (a) Threats to external validity come from the difference between the population and settings studied
versus the population and settings of interest. Finding, for example, that the variables which characterize
the unemployment insurance system exert an influence on Canadian unemployment, does not
automatically imply that this holds universally. To obtain external validity, the exercise should be
repeated for other geographic units, such as countries or states. If the coefficients are similar, or
differences in coefficients can be explained, then the study is externally valid.
(b) Shocks to world aggregate demand, or the major trading partners for the United States, would be a
possibility.
(c) The task is to find geographical units that are also sensitive to commodity price changes. Texas,
Louisiana, and Oklahoma would be candidates for obtaining external validity.
7) Several authors have tried to measure the persistence in U.S. state unemployment rates by running the
following regression:
$ur_{i,t} = \beta_0 + \beta_1\,ur_{i,t-k} + z_{i,t}$
where $ur$ is the state unemployment rate, $i$ is the index for the i-th state, $t$ indicates a time period, and typically
$k \approx 10$.
(a) Explain why finding a slope estimate of one and an intercept of zero is typically interpreted as evidence of
persistence.
(b) You collect data on the 48 contiguous U.S. states' unemployment rates and find the following estimates:
^
(d) One of your peers points out that this result makes little sense, since it implies that eventually all states
would have identical unemployment rates. Explain the argument.
(e) Imagine that state unemployment rates were determined by their natural rates and some transitory shock.
The natural rates themselves may be functions of the unemployment insurance benefits of the state,
unionization rates of its labor force, demographics, sectoral composition, etc. The transitory components may
include state-specific shocks to its terms of trade such as raw material movements and demand shocks from
the other states. You specify the i-th state unemployment rate accordingly as follows for the two periods when
you observe it,
so that actual unemployment rates are measured with error. You have also assumed that the natural rate is the
same for both periods. Subtracting the second period from the first then results in the following population
regression function:
It is not too hard to show that estimation of the observed unemployment rate in period t on the unemployment
rate in period (t-k) by OLS results in an estimator for the slope coefficient that is biased towards zero. The
formula is
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$
Using this insight, explain over which periods you would expect the slope to be closer to one, and over which
period it should be closer to zero.
(f) Estimating the same regression for a different time period results in
^
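$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$
can be rewritten as $\hat\beta_1 \xrightarrow{p} 1 - \frac{1}{\sigma_X^2 / \sigma_v^2 + 1}$. Hence the slope will be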
closer to one over time periods when natural rate variations dominate the transitory deviation of state
unemployment rates from their natural rates. Therefore if you attempted to predict the unemployment
rates in the mid 1980s from those in the mid 1970s, then the slope coefficient should be further away
from one. (There are several studies that have found virtually no persistence in state unemployment
rates over this period.)
(f) Following the previous argument, the result suggests that there were more transitory deviations from
the natural rate over this period. The large drop in oil prices, particularly in 1986, comes to mind.
8) Sir Francis Galton (1822-1911), an anthropologist and cousin of Charles Darwin, created the term regression. In
his article "Regression towards Mediocrity in Hereditary Stature," Galton compared the height of children to
that of their parents, using a sample of 930 adult children and 205 couples. In essence he found that tall (short)
parents will have tall (short) offspring, but that the children will not be quite as tall (short) as their parents, on
average. Hence there is regression towards the mean, or as Galton referred to it, mediocrity. This result is
obviously a fallacy if you attempted to infer behavior over time since, if true, the variance of height in humans
would shrink over generations. This is not the case.
(a) To research this result, you collect data from 110 college students and estimate the following relationship:
$\widehat{Studenth} = 19.6 + 0.73\,Midparh$, $R^2 = 0.45$, SER = 2.0
(7.2) (0.10)
where Studenth is the height of students in inches and Midparh is the average of the parental heights. Values in
parentheses are heteroskedasticity-robust standard errors. Sketching this regression line together with the 45
degree line, explain why the above results suggest regression to the mean or mean reversion.
(b) Researching the medical literature, you find that height depends, to a large extent, on one gene (phog)
and on environmental influences. Let us assume that parents and offspring have the same invariant (over time)
gene and that actual height is therefore measured with error in the following sense,
where $\tilde X$ is measured height, $X$ is the height given through the gene, $v$ and $w$ are environmental influences, and
the subscripts o and p stand for offspring and parents, respectively. Let the environmental influences be
independent from each other and from the gene.
Subtracting the measured height of offspring from the height of parents, what sort of population regression
function do you expect?
(c) How would you test for the two restrictions implicit in the population regression function in (b)? Can you
tell from the results in (a) whether or not the restrictions hold?
(d) Proceeding in a similar way to the proof in your textbook, you can show that
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_X^2 + \sigma_v^2}$
for the situation in (b). Discuss under what conditions you will find a slope closer to one for the height
comparison. Under what conditions will you find a slope closer to zero?
(e) Can you think of other examples where Galton's Fallacy might apply?
Answer: (a) As can be seen in the accompanying graph, the regression line crosses the 45 degree line. Tall (short)
parents will have tall (short) children, but on average, they will not be as tall (short) as their parents.
Hence they will regress to the mean, or mean revert.
$\Delta_t \ln Y_{i,t} = \ln Y_{i,t} - \ln Y_{i,0}$, and t and o refer to two time periods, i is the i-th country.
Explain why a significantly negative slope implies convergence (hence the name).
(c) The equation in (b) can be rewritten without any change in information as (ignoring the division by T)
$\ln Y_t = \beta_0 + \beta_1 \ln Y_0 + u_t$
In this form, how would you test for unconditional convergence? What would be the implication for
convergence if the slope coefficient were one?
(d) Let's write the equation in (c) as follows:
$\tilde Y_t = \beta_0 + \beta_1 \tilde Y_0 + u_t$
and assume that the ~ variables contain measurement errors of the following type,
$\tilde Y_{i,t} = Y_t^* + v_{i,t}$ and $\tilde Y_{i,0} = Y_0^* + w_{i,0}$,
where the * variables represent true, or permanent, per capita income components, while $v$ and $w$ are
temporary or transitory components. Subtraction of the initial period from the current period then results in
$\tilde Y_{i,t} = (Y_t^* - Y_0^*) + \tilde Y_{i,0} + (v_{i,t} - w_{i,0})$
Ignoring, without loss of generality, the constant in the above equation, and making standard assumptions
about the error term, one can show that by regressing current per capita income on a constant and the initial
period per capita income, the slope behaves as follows:
$\hat\beta_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_{Y^*}^2 + \sigma_v^2}$
10) One of the most frequently used summary statistics for the performance of a baseball hitter is the so-called
batting average. In essence, it calculates the percentage of hits in the number of opportunities to hit
(appearances at the plate). The management of a professional team has hired you to predict next season's
performance of a certain hitter who is up for a contract renegotiation after a particularly great year. To analyze
the situation, you search the literature and find a study which analyzed players who had at least 50 at bats in
1998 and 1997. There were 379 such players.
(a) The reported regression line in the study is
$\widehat{Batavg}_i^{1998} = 0.138 + 0.467\,Batavg_i^{1997}$; $R^2 = 0.17$
and the intercept and slope are both statistically significant. What does the regression imply about the
relationship between past performance and present performance? What values would the slope and intercept
have to take on for the future performance to be as good as the past performance, on average?
(b) Being somewhat puzzled about the results, you call your econometrics professor and describe the results to
her. She says that she is not surprised at all, since this is an example of Galton's Fallacy. She explains that Sir
Francis Galton regressed the height of offspring on the mid-height of their parents and found a positive
intercept and a slope between zero and one. He referred to this result as "regression towards mediocrity." Why
do you think econometricians refer to this result as a fallacy?
(c) Your professor continues by mentioning that this is an example of errors-in-variables bias. What does she
mean by that in general? In this case, why would batting averages be measured with error? Are baseball
statisticians sloppy?
(d) The top three performers in terms of highest batting averages in 1997 were Tony Gwynn (.372), Larry
Walker (.366), and Mike Piazza (.362). Given your answers for the previous questions, what would be your
predictions for the 1998 season?
Answer: (a) The regression implies mean reversion: those players who had a high (low) average in 1997 will have
a high (low) average in 1998, but it will not be as high (low) as before. If the performance was as good or
bad as in the past, then the intercept would have to be zero and the slope one.
(b) If the result were true, then eventually everyone would be of the same height.
(c) Errors-in-variables bias refers to a situation where variables are not measured precisely, but contain
a measurement error. In this situation, the player may have had an extraordinarily good or bad year,
resulting, perhaps, from an injury, adjustments to a new league, a new city, etc. This results in a
measurement error of his underlying ability. It has nothing to do with not measuring the batting average
correctly.
(d) The forecast would be for Tony Gwynn to bat (.312), Larry Walker (.309), and Mike Piazza (.307).
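The forecasts in (d) follow directly from plugging the 1997 averages into the reported regression line. The short sketch below reproduces that arithmetic; the function and variable names are hypothetical. It also computes the average at which the line crosses the 45 degree line, since players above that point are predicted to decline and players below it to improve.

```python
INTERCEPT, SLOPE = 0.138, 0.467  # coefficients reported in part (a)

def forecast_1998(avg_1997):
    """Predicted 1998 batting average from the reported regression line."""
    return INTERCEPT + SLOPE * avg_1997

averages_1997 = {"Tony Gwynn": 0.372, "Larry Walker": 0.366, "Mike Piazza": 0.362}
forecasts = {name: round(forecast_1998(avg), 3) for name, avg in averages_1997.items()}

# Average at which the forecast equals the past value (crossing of the 45 degree line).
fixed_point = INTERCEPT / (1.0 - SLOPE)

print(forecasts)
print(round(fixed_point, 3))
```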
11) Your textbook compares the results of a regression of test scores on the student-teacher ratio using a sample of
school districts from California and from Massachusetts. Before standardizing the test scores for California,
you get the following regression result:
$\widehat{TestScr} = 698.9 - 2.28\,STR$
n = 420, $R^2 = 0.051$, SER = 18.6
In addition, you are given the following information: the sample mean of the student-teacher ratio is 19.64
with a standard deviation of 1.89, and the standard deviation of the test scores is 19.05.
a. After standardizing the test scores variable and running the regression again, what is the value of the
slope? What is the meaning of this new slope here (interpret the result)?
b. What will be the new intercept? Now that test scores have been standardized, should you interpret the
intercept?
c. Does the regression $R^2$ change between the two regressions? What about the t-statistic for the slope
estimator?
The numerical value of the new slope is (-0.11). The interpretation is as follows: if you
decrease the student-teacher ratio by one, then test scores improve by 0.11 of a standard
deviation of test scores or 0.11 × 19.05 = 2.10 (there are some rounding errors here).
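$\hat\beta_1^* = \frac{\sum_{i=1}^n y_i^* x_i}{\sum_{i=1}^n x_i^2} = \frac{\sum_{i=1}^n Y_i^* x_i}{\sum_{i=1}^n x_i^2} = \frac{\sum_{i=1}^n (a + b Y_i) x_i}{\sum_{i=1}^n x_i^2} = b\,\frac{\sum_{i=1}^n Y_i x_i}{\sum_{i=1}^n x_i^2} = b \hat\beta_1$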
Or, in this case, 2.35. Mathematically speaking, the intercept continues to represent the
(standardized) test score when the student-teacher ratio is zero. This does not make sense
and it is best not to interpret the intercept.
c. Performing a linear transformation on the regressand (or the regressor for that matter) does not
change the regression $R^2$. It is easy but tedious to show that it is unaffected. Intuitively this makes sense
since otherwise you could affect the goodness of fit by whim (changing the scale of the data). Similarly,
logic dictates that the t-statistic is unaffected.
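The invariance claimed in (c) can be illustrated on simulated data. The sketch below is an assumption-laden example rather than the textbook data: it draws 420 hypothetical observations that loosely mimic the reported summary statistics, and it uses the homoskedasticity-only standard error for brevity (the scaling argument is the same for robust standard errors). The names `fit`, `raw`, and `std` are hypothetical.

```python
import math
import random

def fit(x, y):
    """OLS of y on x: intercept, slope, R^2, and homoskedasticity-only t-statistic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    se_slope = math.sqrt(ssr / (n - 2) / sxx)
    return {"intercept": intercept, "slope": slope,
            "r2": 1.0 - ssr / tss, "t": slope / se_slope}

# Hypothetical data, generated only to illustrate the invariance.
rng = random.Random(1)
x = [rng.gauss(19.64, 1.89) for _ in range(420)]
y = [698.9 - 2.28 * xi + rng.gauss(0.0, 18.6) for xi in x]

# Standardize the regressand: subtract its mean, divide by its standard deviation.
mean_y = sum(y) / len(y)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (len(y) - 1))
y_std = [(yi - mean_y) / sd_y for yi in y]

raw, std = fit(x, y), fit(x, y_std)
print(raw["slope"] / sd_y, std["slope"])  # slope is rescaled by 1 / sd_y
print(raw["r2"], std["r2"])               # R^2 is unchanged
print(raw["t"], std["t"])                 # t-statistic is unchanged
```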
12) Suppose that you have just read a review of the literature on the effect of beauty on earnings. You were initially
surprised to find a mild effect of beauty even on teaching evaluations at colleges. Intrigued by this effect, you
consider explanations as to why more attractive individuals receive higher salaries. One of the possibilities you
consider is that beauty may be a marker of performance/productivity. As a result, you set out to test whether or
not more attractive individuals receive higher grades (cumulative GPA) at college. You happen to have access
to individuals at two highly selective liberal arts colleges nearby. One of these specializes in Economics and
Government and incoming students have an average SAT of 2,100; the other is known for its engineering
program and has an incoming SAT average of 2,200. Conducting a survey, where you offer students a small
incentive to answer a few questions regarding their academic performance, and taking a picture of these
individuals, you establish that there is no relationship between grades and beauty. Write a short essay using
some of the concepts of internal and external validity to determine if these results are likely to apply to
universities in general.
Answer: Students will consider various points that pose a threat to internal and external validity. Obviously there
is a difference in populations (external validity) between highly selective liberal arts colleges and
universities in general. SAT scores at these colleges are much higher than for the average university. In
addition, the gender composition may be quite different, especially for the engineering school, where males
dominate in terms of student numbers. Even in economics, the ratio of female to male students is
typically 1:2. This is an example of sample selection bias (internal validity). Other potential problems
with this study may include errors-in-variables from students not reporting the correct GPA. However,
this may not be a severe problem since GPA is the dependent variable. There could be a problem if there
are systematic problems in inflating the GPA for lower GPAs. It is also not clear from the setup how
beauty was judged. If judges were chosen who are friends of the individuals, then their judgments may
be biased, which is more severe since beauty is an explanatory variable. The setup also does not indicate
what the control variables are. In the absence of controls, there will be omitted variable bias (internal
validity) since intelligence will clearly be a determining factor of cumulative GPAs.
D
Q i = 0 1 Pi + ui ,
S
Q i = 0 1 Pi + v i ,
where P is the price of the good. In addition, you typically assume that the market clears.
Explain how the simultaneous causality bias applies in this situation. The textbook explained a positive
correlation between Xi and ui for 1 > 0 through an argument that started from imagine that ui is negative.
Repeat this exercise here.
Answer: Although quantities appear on the left-hand side of both equations, this is a system of two equations in
two unknowns, where quantity and price are determined simultaneously by demand and supply.
A negative ui, call it a demand shock, decreases the quantity demanded. Since demand equals supply,
this results in a lower quantity traded, and hence a lower price. (At the old price level, there would now
be excess supply, and hence the price would fall.) The negative ui has therefore resulted in a lower price,
and hence the error term in the demand equation is positively correlated with the price in the same
equation.
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1$
so that the OLS estimator is inconsistent. Give a condition involving the variances of X and w, under which the
bias towards zero becomes small.
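Answer: $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1 = \frac{1}{1 + \sigma_w^2 / \sigma_X^2}\beta_1$. Hence if $\sigma_X^2$ is large relative to $\sigma_w^2$, i.e., if
the variable measured with error is dominated by the unobserved component, then the bias disappears.
Also, if there is no measurement error, then $\sigma_w^2 = 0$, and the bias disappears.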
3) You have been hired as a consultant by a building contractor who has been sued by the owners'
representatives of a large condominium project for shoddy construction work. In order to assess the damages
for the various units, the owners' association sent out a letter to owners and asked if people were willing to
make their units available for destructive testing. Destructive testing was conducted in some of these units as a
result of the responses. Based on the tests, the owners' association inferred the damage over the entire condo
complex. Do you think that the inference is valid in this case? Discuss how proper sampling should proceed in
this situation.
Answer: This is clearly a case of sample selection bias which leads to bias in the OLS estimator in general. It
should be clear that inference cannot be conducted properly, since owners who suspect that their unit is
faulty are much more likely to agree to destructive testing of their unit than those who have not
experienced any problems. The proportion of units assumed to be faulty in the population is bound to be
too large when derived through sampling of this type.
The proper sampling method would be to decide on the units to be tested through random sampling. A
random number generator should be used to determine the sampled units. The owners' association must
guarantee that the randomly selected units are available for destructive testing.
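The contrast between volunteer-based and random sampling can be sketched in a small simulation. Every number here is a hypothetical assumption made for illustration (the number of units, the true share of faulty units, and the response probabilities); none of them comes from the case described in the question.

```python
import random

def simulate_condo(n_units=2000, p_faulty=0.3, p_volunteer_if_faulty=0.8,
                   p_volunteer_if_ok=0.2, sample_size=500, seed=0):
    """Compare the share of faulty units among volunteers and in a random sample."""
    rng = random.Random(seed)
    faulty = [rng.random() < p_faulty for _ in range(n_units)]
    # Owners who suspect a problem are assumed more likely to volunteer.
    volunteers = [f for f in faulty
                  if rng.random() < (p_volunteer_if_faulty if f else p_volunteer_if_ok)]
    # Proper design: units are drawn by a random number generator.
    random_sample = rng.sample(faulty, sample_size)
    true_share = sum(faulty) / n_units
    volunteer_share = sum(volunteers) / len(volunteers)
    random_share = sum(random_sample) / sample_size
    return true_share, volunteer_share, random_share

true_share, volunteer_share, random_share = simulate_condo()
print(true_share, volunteer_share, random_share)
```

Under these assumed response rates the volunteer sample overstates the share of faulty units substantially, while the random sample stays close to the true share.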
4) Assume that a simple economy could be described by the following system of equations,
$C_t = \beta_0 + \beta_1 Y_t + u_t$
$I_t = \bar{I}$,
where C is consumption, Y is income, and I is investment. (This may be a primitive island society which does
not trade with other islands. There is no government, and the only good consumed and invested (saved) is
sunflower seeds.)
Assume the presence of the GDP identity, Y = C + I. If you estimated the consumption function, what sort of
problem involving internal validity may be present?
Answer: There is simultaneous causality present in the system. Income causes consumption, which in turn
causes income (GDP). A negative consumption shock, ut, causes consumption, and hence aggregate
demand, to fall. With lower aggregate demand, not all goods supplied are being sold in the market, and
hence income (Yt) falls. There is therefore a positive correlation between ut and Yt, i.e., the error term
and the regressor are correlated.
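The size of the problem in this particular system can be seen in a short simulation. The sketch assumes that investment is literally constant and uses hypothetical parameter values (a true marginal propensity to consume of 0.6, investment of 20); the function names are illustrative. Solving the two equations with the identity gives the reduced form Y = (beta0 + I + u) / (1 - beta1), so income moves only because of the consumption shock, and C = Y - I holds exactly.

```python
import random

def ols(x, y):
    """Return (intercept, slope) from a least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate_island(n=1000, beta0=10.0, beta1=0.6, investment=20.0, sd_u=2.0, seed=0):
    """Generate (income, consumption) from the two-equation system and fit OLS."""
    rng = random.Random(seed)
    income, consumption = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)
        y = (beta0 + investment + u) / (1.0 - beta1)  # reduced form for income
        income.append(y)
        consumption.append(beta0 + beta1 * y + u)     # structural consumption function
    return ols(income, consumption)

intercept_hat, slope_hat = simulate_island()
print(intercept_hat, slope_hat)
```

Under the constant-investment assumption the fitted slope equals one and the fitted intercept equals minus investment, whatever the true marginal propensity to consume is, which is an extreme instance of the simultaneous causality bias described in the answer.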
5) Your professor wants to measure the class's knowledge of econometrics twice during the semester, once in a
midterm and once in a final. Assume that your performance, and that of your peers, on the day of your
midterm exam only measures knowledge imperfectly and with an error,
$\tilde X_i^1 = X_i^1 + w_i^1$,
where $\tilde X$ is your exam grade, $X$ is underlying econometrics knowledge, and $w$ is a random error with mean zero
and variance $\sigma_w^2$. $w$ may depend on whether you have a headache that day, whether or not the questions you
had prepared for appeared on the exam, your mood, etc. A similar situation holds for the final, which is exam
two: $\tilde X_i^2 = X_i^2 + w_i^2$. What would happen if you ran a regression of grades received by students in the final on
midterm grades?
Answer: This is a typical errors-in-variables problem, which results in a downward biased estimator of the slope.
Subtracting the first equation from the second results in $\tilde X_i^2 = (X_i^2 - X_i^1) + \tilde X_i^1 + (w_i^2 - w_i^1)$. If
underlying econometrics knowledge at each exam did not change, then the regression should have a
slope of one and a zero intercept. (Alternatively, you can allow for an intercept.) The main point here is
that the performance during the first exam is only an imperfect measure of econometric ability, meaning
that there is measurement error. This results in a correlation between the error term and the regressor,
so that $\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1 = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}$, given $\beta_1 = 1$. Hence grades
will display mean reversion: students with high (low) midterm scores will most likely have high (low)
scores in the final, but they will not be quite as high (low) as in the midterm.
6) Consider the one-variable regression model, $Y_i = \beta_0 + \beta_1 X_i + u_i$, where the usual assumptions from Chapter 4
are satisfied. However, suppose that both $Y$ and $X$ are measured with error, $\tilde Y_i = Y_i + z_i$ and $\tilde X_i = X_i + w_i$. Let
both measurement errors be i.i.d. and independent of both $Y$ and $X$ respectively. If you estimated the
regression model $\tilde Y_i = \beta_0 + \beta_1 \tilde X_i + v_i$ using OLS, then show that the slope estimator is not consistent.
Answer: The difference from the example used in section 7.2 of the text is that both the regressor and the
dependent variable are measured with error here. Proceeding along the lines in section 7.2, you can write
the population regression equation Yi = 0 + 1 Xi + ui in terms of the imprecisely measured variables
where v i = zi - 1 wi + ui. Hence the dependent variable being measured with error does not cause
additional problems to the case discussed in the textbook, but the error term continues to be correlated
with the regressor. As a matter of fact, it is easiest to combine the this measurement error with the
*
population regression error term, i.e., u i = zi + ui, in which case the derivation shown in Chapter 7
~
~
*
footnote 2 of the textbook holds after making this small adjustment. Note that cov(Xi, u i ) = cov(Xi, zi) +
~
2
X
2
w as before, and 1
2
X+
2
w
1.
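7) In the simple, one-explanatory variable, errors-in-variables model, the OLS estimator for the slope is
inconsistent. The textbook derived the following result
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1$.
Show that the OLS estimator for the intercept behaves as follows in large samples:
$\hat\beta_0 \xrightarrow{p} \beta_0 + \mu_{\tilde{X}} \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}\beta_1$,
where $\bar{\tilde{X}} \xrightarrow{p} \mu_{\tilde{X}}$.
Answer: $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{\tilde{X}}$, and $\bar{Y} \xrightarrow{p} \beta_0 + \beta_1 \mu_{\tilde{X}}$ because $u_i$ and $w_i$ have mean zero.
Therefore $\hat\beta_0 \xrightarrow{p} \beta_0 + \mu_{\tilde{X}} \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$, since $\hat\beta_1 \xrightarrow{p} \beta_1 - \beta_1 \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}$.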
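The large-sample behavior of the intercept can be checked numerically. The following is a rough sketch with assumed parameter values ($\beta_0 = 1$, $\beta_1 = 2$, mean of X equal to 5, standard deviations 2, 1, and 1 for X, w, and u) and normally distributed draws; none of these values come from the textbook, and the function names are hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def simulate_intercept(n=100_000, beta0=1.0, beta1=2.0, mu_x=5.0,
                       sd_x=2.0, sd_w=1.0, sd_u=1.0, seed=1):
    """Regress Y on the mismeasured regressor X + w (assumed values)."""
    rng = random.Random(seed)
    x_obs, y = [], []
    for _ in range(n):
        x_true = rng.gauss(mu_x, sd_x)
        y.append(beta0 + beta1 * x_true + rng.gauss(0.0, sd_u))
        x_obs.append(x_true + rng.gauss(0.0, sd_w))  # measured with error
    return ols(x_obs, y)


b0_hat, b1_hat = simulate_intercept()
share = 1.0 ** 2 / (2.0 ** 2 + 1.0 ** 2)  # sigma_w^2 / (sigma_X^2 + sigma_w^2)
b0_limit = 1.0 + 5.0 * share * 2.0        # beta0 + mu * share * beta1
b1_limit = (1.0 - share) * 2.0            # attenuated slope
print(f"intercept {b0_hat:.3f} vs limit {b0_limit:.3f}")
print(f"slope     {b1_hat:.3f} vs limit {b1_limit:.3f}")
```

The intercept estimate absorbs what the attenuated slope fails to explain, so it drifts away from $\beta_0$ in the direction given by the formula.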
8) Assume that you had found correlation of the residuals across observations. This may happen because the
regressor is ordered by size. Your regression model could therefore be specified as follows:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$u_i = \rho u_{i-1} + v_i$; $|\rho| < 1$.
Furthermore, assume that you had obtained consistent estimates for $\beta_0$, $\beta_1$, $\rho$. If asked to make a prediction for
Y, given a value of X (= $X_j$) and $\hat{u}_{j-1}$, how would you proceed? Would you use the information on the lagged
residual at all? Why or why not?
Answer: Given that the error term for j is related to the error term in j-1, it seems intuitive to use that information
in prediction, i.e., if $Y_{j-1}$ is larger than $\hat\beta_0 + \hat\beta_1 X_{j-1}$, then $Y_j$ will also be larger than $\hat\beta_0 + \hat\beta_1 X_j$, but not by as much
(given $\rho > 0$). Substitution of the second equation into the first equation results in $Y_i = \beta_0 + \beta_1 X_i + \rho u_{i-1} + v_i$.
Hence the predicted value should be calculated as
$\hat{Y}_j = \hat\beta_0 + \hat\beta_1 X_j + \hat\rho \hat{u}_{j-1}$.
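The prediction rule can be written as a small function. This is a minimal sketch: the function name, argument names, and the numbers in the usage example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict_with_ar1_error(beta0_hat, beta1_hat, rho_hat, x_j, y_prev, x_prev):
    """Predict Y_j using the fitted line plus rho_hat times the lagged residual."""
    u_prev_hat = y_prev - (beta0_hat + beta1_hat * x_prev)  # lagged residual
    return beta0_hat + beta1_hat * x_j + rho_hat * u_prev_hat


# Hypothetical values: the previous observation lay 1.0 above the fitted line,
# so half of that gap (rho_hat = 0.5) carries over into the prediction.
prediction = predict_with_ar1_error(
    beta0_hat=1.0, beta1_hat=2.0, rho_hat=0.5, x_j=3.0, y_prev=6.0, x_prev=2.0
)
print(prediction)
```

Ignoring the lagged residual would give the fitted-line value 7.0 in this example; using it moves the prediction toward where the previous observation suggests the error currently is.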
9) Your textbook only analyzed the case of an error-in-variables bias of the type $\tilde{X}_i = X_i + w_i$. What if the error
were generated in the simple regression model by entering data that always contained the same typographical
error, say $\tilde{X}_i = X_i + a$ or $\tilde{X}_i = bX_i$, where a and b are constants. What effect would this have on your regression
model?
Answer: This would have an effect similar to changing the units of measurement. The measurement error is not
random here, and the bias can be determined exactly.
For the case $\tilde{X}_i = X_i + a$, the slope will be unaffected and the usual properties for the OLS slope estimator
will hold. However, since $\bar{\tilde{X}} = \bar{X} + a$ and $\hat\beta_0 = (\bar{Y} - \hat\beta_1 \bar{X}) - \hat\beta_1 a$, the intercept changes by $-\hat\beta_1 a$, i.e., it is
underestimated by the constant measurement error times the slope (when that product is positive).
For the case $\tilde{X}_i = bX_i$, the intercept is unaffected, but the estimated slope is rescaled: the ratio of the estimated
slope with measurement error to the slope without measurement error is $1/b$.
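Both cases can be verified exactly on a tiny data set, since no randomness is involved. The five data points and the constants a = 3 and b = 2 below are made up for illustration.

```python
import math


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    s_xx = sum((p - mean_x) ** 2 for p in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # correctly entered regressor (made-up data)
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = 3.0, 2.0                 # constant typographical errors

b0, b1 = ols(x, y)
b0_shift, b1_shift = ols([xi + a for xi in x], y)  # regressor entered as X + a
b0_scale, b1_scale = ols([b * xi for xi in x], y)  # regressor entered as b * X

print(f"shift: slope {b1_shift:.4f} (was {b1:.4f}), intercept change {b0_shift - b0:.4f}")
print(f"scale: slope ratio {b1_scale / b1:.4f}, intercept {b0_scale:.4f} (was {b0:.4f})")
```

Adding a constant moves only the intercept, by minus the slope times a, while multiplying by b leaves the intercept alone and divides the slope by b.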
10) Explain why the OLS estimator for the slope in the simple regression model is still unbiased, even if there is
correlation of the error term across observations.
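Answer: The proof for unbiasedness is presented in Appendix 4.3 of the textbook. There
$\hat\beta_1 = \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}$, and
$E(\hat\beta_1) = \beta_1 + E\left[\frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})u_i}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}\right] = \beta_1 + E\left[\frac{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})E(u_i \mid X_1, \dots, X_n)}{\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2}\right]$,
and the second term vanishes due to the least squares assumptions of independence between the error term
and the regressor, which imply $E(u_i \mid X_1, \dots, X_n) = 0$. The assumption of correlation of the error term across observations has not entered
into the proof. However, it will play a role in the derivation of standard errors.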
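A small Monte Carlo experiment can make the unbiasedness claim concrete. This is a sketch under assumptions that go beyond the question: the errors follow an AR(1) process with coefficient 0.8, the regressor takes the fixed values 1 through 20, and the true coefficients are 1 and 0.5. All names and values are illustrative.

```python
import random


def ols_slope(x, y):
    """Return the OLS slope of a simple regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    return s_xy / s_xx


def mean_slope_with_ar1_errors(reps=2000, beta0=1.0, beta1=0.5, rho=0.8, seed=2):
    """Average the OLS slope over many samples with serially correlated errors."""
    rng = random.Random(seed)
    x = [float(i) for i in range(1, 21)]  # fixed regressor, unrelated to the errors
    total = 0.0
    for _ in range(reps):
        # Start the error process from its stationary distribution.
        u = rng.gauss(0.0, 1.0) / (1.0 - rho ** 2) ** 0.5
        y = []
        for xi in x:
            u = rho * u + rng.gauss(0.0, 1.0)  # u_i = rho * u_{i-1} + v_i
            y.append(beta0 + beta1 * xi + u)
        total += ols_slope(x, y)
    return total / reps


average_slope = mean_slope_with_ar1_errors()
print(f"average slope estimate {average_slope:.3f} (true value 0.5)")
```

The average of the estimates should sit close to the true slope despite the strong serial correlation; what the correlation changes is the sampling variability, which is why standard errors need adjusting.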
11) To analyze the situation of simultaneous causality bias, consider the following system of equations:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$
Demonstrate the negative correlation between $X_i$ and $u_i$ for $\gamma_1 < 0$, either through mathematics or by
presenting an argument which starts as follows: "Imagine that $u_i$ is negative."
Answer: The mathematical derivation of the correlation is given in footnote 3 of Chapter 7 in the textbook. Setting
$\gamma_1 < 0$ results in a negative correlation between $X_i$ and $u_i$. A negative shock to the first equation yields a
lower Y. This in turn increases X in the second equation. Hence there is a negative correlation between $X_i$
and $u_i$.
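For readers without the textbook footnote at hand, the sign of the covariance can be worked out in three lines. The sketch below assumes, beyond what the question states, that $cov(u_i, v_i) = 0$ and that $1 - \gamma_1\beta_1 > 0$; $\sigma_u^2$ denotes $var(u_i)$.

```latex
\begin{align*}
X_i &= \gamma_0 + \gamma_1(\beta_0 + \beta_1 X_i + u_i) + v_i \\
(1 - \gamma_1\beta_1)\,X_i &= \gamma_0 + \gamma_1\beta_0 + \gamma_1 u_i + v_i \\
cov(X_i, u_i) &= \frac{\gamma_1 \sigma_u^2}{1 - \gamma_1\beta_1}
\end{align*}
```

Under these assumptions the denominator is positive, so the covariance takes the sign of $\gamma_1$ and is negative when $\gamma_1 < 0$.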
12) Think of three different economic examples where cross-sectional data could be collected. Indicate in each of
these cases how you would check if the analysis is externally valid.
Answer: Answers will differ by student. Results from U.S. state data on the determinants of unemployment, or on the
effect of minimum wages on employment-population ratios, could be compared with results from a sample of Canadian
provinces or other subnational geographical units. Similarly, cross-country comparisons to test
convergence in per capita income could be compared to results within countries. Given the textbook
example, test scores in elementary schools within one state may be validated by using data from another
state.
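13) [...] $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}$ [...] $\frac{\sigma_w^2}{\sigma_w^2 + \sigma_X^2}\beta_1$.
Answer: $\frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1 = \frac{\sigma_X^2 + \sigma_w^2 - \sigma_w^2}{\sigma_X^2 + \sigma_w^2}\beta_1 = \left(1 - \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}\right)\beta_1 = \beta_1 - \frac{\sigma_w^2}{\sigma_X^2 + \sigma_w^2}\beta_1$.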
14) Your textbook has analyzed simultaneous equation systems in the case of two equations,
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$,
where the first equation might be the labor demand equation (with capital stock and technology being held
constant), and the second the labor supply equation (X being the real wage, and the labor market clears). What
if you had a production function as the third equation
$Z_i = \delta_0 + \delta_1 Y_i + w_i$
where Z is output. If the error terms, u, v, and w, were pairwise uncorrelated, explain why there would be no
simultaneous causality bias when estimating the production function using OLS.
Answer: Although the above system represents three equations in three unknowns, it is block-recursive,
meaning that X and Y (the real wage and employment) are completely determined by the first two
equations and independently of the production function (Z). Given the solution for employment (Y), the
third equation solely determines output (Z).
Put differently, if there was a positive shock to the production function, which would result in higher
output, then this would have no effect on employment (Y), and there would therefore be no feedback
into the production function. Hence the error term in the third equation is not correlated with the
regressor.
15) A professor in your microeconomics lectures derived a labor demand curve in the lecture. Given some
reasonable assumptions, she showed that the demand for labor depends negatively on the real wage. You want
to put this hypothesis to the test ("show me") and collect data on employment and real wages for a certain
industry. You try to estimate the labor demand curve but find no relationship between the two variables. Is
economic theory wrong? Explain.
Answer: This is a case of simultaneous causality. Since there is a supply of labor as well, the real wage depends
on employment, which, in a market-clearing model, is determined by the intersection of supply and
demand. In a Keynesian world with wait unemployment, you would expect a negative relationship
between real wages and employment, given the capital stock and productivity.
16) Your textbook uses the following example of simultaneous causality bias of a two equation system:
$Y_i = \beta_0 + \beta_1 X_i + u_i$
$X_i = \gamma_0 + \gamma_1 Y_i + v_i$
To be more specific, think of the first equation as a demand equation for a certain good, where Y is the quantity
demanded and X is the price. The second equation then represents the supply equation, with a third equation
establishing that demand equals supply. Sketch the market outcome over a few periods and explain why it is
impossible to identify the demand and supply curves in such a situation. Next assume that an additional
variable enters the demand equation: income. In a new graph, draw the initial position of the demand and
supply curves and label them D0 and S0 . Now allow for income to take on four different values and sketch
what happens to the two curves. Is there a pattern that you see which suggests that you might be able to
identify one of the two equations with real-life data?
Answer: You only observe market outcomes (the intersection of the demand and supply curves). Fitting a
regression line through these points gives you neither the supply curve nor the demand curve,
and hence neither is identified.
When income takes on four additional values, the demand curve shifts while the supply curve stays in place.
The market outcome now generates five observations at the intersection of the two curves. Fitting a line
through the five points will give an estimate of the supply curve. Hence by shifting the demand curve in
this fashion, you can identify the supply curve.
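The identification argument can be traced with a deterministic example. The linear demand and supply curves, their coefficients, and the five income values below are invented for illustration; following the question, Y is quantity and X is price, and there are no random shocks so that every market outcome lies exactly on the supply curve.

```python
import math


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y))
    s_xx = sum((p - mean_x) ** 2 for p in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def equilibrium(income, a=100.0, b=2.0, c=0.5, d=10.0, e=3.0):
    """Market outcome for demand Q = a - b*P + c*income and supply Q = d + e*P."""
    price = (a - d + c * income) / (b + e)
    quantity = d + e * price
    return price, quantity


incomes = [20.0, 40.0, 60.0, 80.0, 100.0]  # initial position plus four shifts
points = [equilibrium(m) for m in incomes]
prices = [p for p, _ in points]
quantities = [q for _, q in points]

intercept, slope = ols(prices, quantities)
print(f"fitted line: Q = {intercept:.2f} + {slope:.2f} P (supply is Q = 10 + 3 P)")
```

Because only the demand curve moves, the fitted line reproduces the assumed supply curve; the demand slope of -2 cannot be recovered from these points.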
17) Give at least three examples where you could envision errors-in-variables problems. For the case where the
measurement error occurs only for the explanatory variable in the simple regression case, derive
$\hat\beta_1 \xrightarrow{p} \frac{\sigma_X^2}{\sigma_X^2 + \sigma_w^2}\beta_1$.
Answer: Answers will vary by student. Consumption functions are frequently mentioned, where permanent
consumption is proportional to permanent income, both of which differ from actual measures of
consumption and income through transitory components. There are several examples in this chapter of
the test bank where the underlying measure of the regressor is proxied by previous outcomes
(unemployment rates, weather, height, etc.). Students may feel that responses to surveys result in
measurement error, e.g., when people respond to questions regarding their income, their SAT score, and
so forth.
The formula is derived in Chapter 7, footnote 2 of the textbook.
18) Your textbook states that correlation of the error term across observations will not happen if the data are
obtained by sampling at random from the population. However, in one famous study of the electric utility
industry, the observations were listed by the size of the output level, from smallest to largest. The pattern of the
residuals was as shown in the figure.
20) In macroeconomics, you studied the equilibrium in the goods and money market under the assumption of
prices being fixed in the very short run. The goods market equilibrium was described by the so-called IS
equation
$R_i = \beta_0 - \beta_1 Y_i + u_i$
where R represented the nominal interest rate and Y was real GDP. $\beta_0$ contained variables determined outside
the system, such as government expenditures, taxes, and inflationary expectations.
The money market equilibrium was given by the so-called LM equation
$R_i = \gamma_0 + \gamma_1 Y_i + v_i$
and $\gamma_0$ contained the real money supply and the intercept from the money demand equation.
Show that there is simultaneous causality bias in this situation.
Answer: Consider the case of a positive shock to the LM curve. This will increase the interest rate, which, in
turn, will result in lower output through the IS curve. Hence there is negative correlation between the
error in the LM curve and the regressor, resulting in simultaneous causality bias.
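The verbal argument can be backed by a short derivation. The sketch below assumes that $cov(u_i, v_i) = 0$ and that both slope parameters are positive ($\beta_1 > 0$, $\gamma_1 > 0$); these assumptions and the notation $\sigma_u^2$, $\sigma_v^2$ for the error variances are not stated in the question.

```latex
\begin{align*}
\beta_0 - \beta_1 Y_i + u_i &= \gamma_0 + \gamma_1 Y_i + v_i \\
Y_i &= \frac{\beta_0 - \gamma_0 + u_i - v_i}{\beta_1 + \gamma_1} \\
cov(Y_i, v_i) &= -\frac{\sigma_v^2}{\beta_1 + \gamma_1} < 0,
\qquad
cov(Y_i, u_i) = \frac{\sigma_u^2}{\beta_1 + \gamma_1} > 0
\end{align*}
```

In either equation the regressor $Y_i$ is correlated with that equation's error term, so OLS applied to the IS or the LM equation alone is subject to simultaneous causality bias.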
21) Assume the following model of the labor market:
$N^d = \beta_0 + \beta_1 \frac{W}{P} + u$
$N^s = \gamma_0 + \gamma_1 \frac{W}{P} + v$
$N^d = N^s = N$
where N is employment, (W/P) is the real wage in the labor market, and u and v are determinants other
than the real wage which affect labor demand and labor supply (respectively). Let
$E(u) = E(v) = 0$; $var(u) = \sigma_u^2$; $var(v) = \sigma_v^2$; $cov(u,v) = 0$
Assume that you had collected data on employment and the real wage from a random sample of
observations and estimated a regression of employment on the real wage (employment being the
regressand and the real wage being the regressor). It is easy but tedious to show that
$(\hat\beta_1 - \beta_1) \xrightarrow{p} (\gamma_1 - \beta_1)\frac{\sigma_u^2}{\sigma_u^2 + \sigma_v^2} > 0$
since the slope of the labor supply function is positive and the slope of the labor demand function is
negative. Hence, in general, you will not find the correct answer even in large samples.
a.
b. What would the relationship between the variance of the labor supply/demand shift variable have to
be for the bias to disappear?
c. Give an intuitive answer why the bias would disappear in that situation. Draw a graph to illustrate
your argument.
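The large-sample result quoted in the question can be illustrated by simulation. This is a sketch with assumed values: demand $N^d = 100 - 2(W/P) + u$, supply $N^s = 10 + 3(W/P) + v$, standard deviations of 2 and 1 for u and v, and normal draws. None of these numbers come from the test bank.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope


def simulate_labor_market(n=100_000, beta0=100.0, beta1=-2.0, gamma0=10.0,
                          gamma1=3.0, sd_u=2.0, sd_v=1.0, seed=3):
    """Generate market-clearing outcomes and regress employment on the real wage."""
    rng = random.Random(seed)
    wage, employment = [], []
    for _ in range(n):
        u = rng.gauss(0.0, sd_u)  # labor demand shifter
        v = rng.gauss(0.0, sd_v)  # labor supply shifter
        w = (beta0 - gamma0 + u - v) / (gamma1 - beta1)  # wage where N^d = N^s
        wage.append(w)
        employment.append(gamma0 + gamma1 * w + v)
    return ols(wage, employment)


_, slope_hat = simulate_labor_market()
# Limit implied by the formula: beta1 + (gamma1 - beta1) * var(u) / (var(u) + var(v))
slope_limit = -2.0 + (3.0 - (-2.0)) * 2.0 ** 2 / (2.0 ** 2 + 1.0 ** 2)
print(f"OLS slope {slope_hat:.3f}, formula limit {slope_limit:.3f}, demand slope -2.0")
```

With these values the regression slope comes out positive even though the labor demand slope is negative, because the estimate is a variance-weighted mixture of the two slopes.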
22) To compare the slope coefficient from the California School data set with that of the Massachusetts School data
set, you run the following two regressions:
$TestScr_{CA} = 2.35 - 0.123\,STR_{CA}$
(0.54) (0.027)
$n = 420$, $R^2 = 0.051$, $SER = 0.98$
R2
Numbers in parenthesis are heteroskedasticity-robust standard errors, and the LHS variable has been
standardized.
Calculate a t-statistic to test whether or not the two coefficients are the same. State the alternative
hypothesis. Which level of significance did you choose?
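Answer: $H_0: \beta_{1,CA} = \beta_{1,MA}$; $H_1: \beta_{1,CA} \neq \beta_{1,MA}$;
$t = \frac{0.123 - 0.114}{\sqrt{0.027^2 + 0.033^2}} = 0.21$, where 0.033 is the standard error of the Massachusetts slope
implied by the reported t-statistic. Hence you cannot reject the null
hypothesis at any reasonable level of significance. The underlying assumption here is that the two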
samples are independent, which seems reasonable.
23) You have read the analysis in chapter 9 and want to explore the relationship between poverty and test scores.
You decide to start your analysis by running a regression of test scores on the percent of students who are
eligible to receive a free/reduced price lunch both in California and in Massachusetts. The results are as
follows:
$TestScr_{CA} = 681.44 - 0.610\,PctLch_{CA}$
(0.99) (0.018)
$n = 420$, $R^2$ [...]
[...] (0.045)
a. Calculate a t-statistic to test whether or not the two slope coefficients are the same.
b. Your textbook compares the slope coefficients for the student-teacher ratio instead of the percent
eligible for a free lunch. The authors remark: "Because the two standardized tests are different, the
coefficients themselves cannot be compared directly: One point on the Massachusetts test is not the
same as one point on the California test." What solution do they suggest?
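Answer: a. $H_0: \beta_{1,CA} = \beta_{1,MA}$; $H_1: \beta_{1,CA} \neq \beta_{1,MA}$;
$t = \frac{0.788 - 0.610}{\sqrt{0.018^2 + 0.045^2}} = 3.67$. Hence you reject the null
hypothesis.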
b. The authors suggest standardizing the test score variable in both states by subtracting the mean and
by dividing by the standard deviation.
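The t-statistic for comparing slopes from two independent samples is easy to compute directly. The helper name `t_stat_difference` is hypothetical; the inputs are the coefficient magnitudes and standard errors used in the answer to part a.

```python
import math


def t_stat_difference(b1, se1, b2, se2):
    """t-statistic for H0: the two coefficients are equal, assuming independent samples."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)


# Coefficient magnitudes as used in the answer: 0.788 (s.e. 0.045) and 0.610 (s.e. 0.018).
t = t_stat_difference(0.788, 0.045, 0.610, 0.018)
print(round(t, 2))
```

The variances add in the denominator only because the two samples are assumed independent; with correlated estimates a covariance term would be needed.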
1) The notation for panel data is (Xit, Yit), i = 1, ..., n and t = 1, ..., T because
A) we take into account that the entities included in the panel change over time and are replaced by others.
B) the Xs represent the observed effects and the Y the omitted fixed effects.
C) there are n entities and T time periods.
D) n has to be larger than T for the OLS estimator to exist.
Answer: C
13) Consider the regression example from your textbook, which estimates the effect of beer taxes on fatality rates
across the 48 contiguous U.S. states. If beer taxes were set nationally by the federal government rather than by
the states, then
A) it would not make sense to use state fixed effect.
B) you can test state fixed effects using homoskedastic-only standard errors.
C) the OLS estimator will be biased.
D) you should not use time fixed effects since beer taxes are the same at a point in time across states.
Answer: D
14) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48
contiguous U.S. states. To test for the significance of time fixed effects, you should calculate the F-statistic and
compare it to the critical value from your $F_{q,\infty}$ distribution, where q equals
A) 6.
B) 7.
C) 48.
D) 53.
Answer: A
15) When you add state fixed effects to a simple regression model for U.S. states over a certain time period, and the
regression R2 increases significantly, then it is safe to assume that
A) the included explanatory variables, other than the state fixed effects, are unimportant.
B) state fixed effects account for a large amount of the variation in the data.
C) the coefficients on the other included explanatory variables will not change.
D) time fixed effects are unimportant.
Answer: B
16) Time Fixed Effects regression are useful in dealing with omitted variables
A) even if you only have a cross-section of data available.
B) if these omitted variables are constant across entities but vary over time.
C) when there are more than 100 observations.
D) if these omitted variables are constant across entities but not over time.
Answer: B
17) Indicate for which of the following examples you cannot use Entity and Time Fixed Effects: a regression of
A) OECD unemployment rates on unemployment insurance generosity for the period 1980-2006 (annual
data).
B) the (log of) earnings on the number of years of education, using the Current Population Survey of 60,000
households for March 2006.
C) the per capita income level in Canadian Provinces on provincial population growth rates, using decade
averages for 1960, 1970, and 1980.
D) the risk premium of 75 stocks on the market premium for the years 1998-2006.
Answer: B
18) Panel data is also called
A) longitudinal data.
B) cross-sectional data.
C) time series data.
D) experimental data.
Answer: A
19) (Requires Appendix material) When the fifth assumption in the Fixed Effects regression
($cov(u_{it}, u_{is} \mid X_{it}, X_{is}) = 0$ for $t \neq s$) is violated, then
A) using heteroskedastic-robust standard errors is not sufficient for correct statistical inference when using
OLS.
B) the OLS estimator does not exist.
C) you can use the simple homoskedasticity-only standard errors calculated in your regression package.
D) you cannot use fixed time effects in your estimation.
Answer: A
20) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48
contiguous U.S. states. To test for the significance of entity fixed effects, you should calculate the F-statistic and
compare it to the critical value from your $F_{q,\infty}$ distribution, where q equals
A) 48.
B) 54.
C) 7.
D) 47.
Answer: D
21) The main advantage of using panel data over cross sectional data is that it
A) gives you more observations.
B) allows you to analyze behavior across time but not across entities.
C) allows you to control for some types of omitted variables without actually observing them.
D) allows you to look up critical values in the standard normal distribution.
Answer: C
22) One of the following is a regression example for which Entity and Time Fixed Effects could be used: a study of
the effect of
A) minimum wages on teenage employment using annual data from the 48 contiguous states in 2006.
B) various performance statistics on the (log of) salaries of baseball pitchers in the American League and the
National League in 2005 and 2006.
C) inflation and inflationary expectations on unemployment rates in the United States, using quarterly data
from 1960-2006.
D) drinking alcohol on the GPA of 150 students at your university, controlling for incoming SAT scores.
Answer: B
23) Consider a panel regression of unemployment rates for the G7 countries (United States, Canada, France,
Germany, Italy, United Kingdom, Japan) on a set of explanatory variables for the time period 1980-2000
(annual data). If you included entity and time fixed effects, you would need to specify the following number of
binary variables:
A) 21.
B) 6.
C) 28.
D) 26.
Answer: D
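The counting behind questions of this type can be spelled out in a few lines. The function name is hypothetical, and the sketch assumes a regression with an intercept, so one entity and one time period serve as the omitted base categories.

```python
def fixed_effect_dummies(n_entities, n_periods):
    """Return (entity dummies, time dummies, total) for a regression with an intercept."""
    entity = n_entities - 1  # one entity is the omitted base category
    time = n_periods - 1     # one period is the omitted base category
    return entity, time, entity + time


# G7 countries with annual data for 1980-2000 (21 years).
g7 = fixed_effect_dummies(7, 2000 - 1980 + 1)
# 48 contiguous U.S. states with annual data for 1982-1988 (7 years).
beer = fixed_effect_dummies(48, 1988 - 1982 + 1)
print(g7, beer)
```

The same counts give the number of restrictions q when testing the entity or time fixed effects jointly, for example 47 and 6 in the beer tax panel.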
24) A pattern in the coefficients of the time fixed effects binary variables may reveal the following in a study of the
determinants of state unemployment rates using panel data:
A) macroeconomic effects, which affect all states equally in a given year.
B) attitude differences towards unemployment between states.
C) there is no economic information that can be retrieved from these coefficients.
D) regional effects, which affect all states equally, as long as they are a member of that region.
Answer: A
25) In the panel regression analysis of beer taxes on traffic deaths, the estimation period is 1982-1988 for the 48
contiguous U.S. states. To test for the significance of time fixed effects, you should calculate the F-statistic and
compare it to the critical value from your $F_{q,\infty}$ distribution, which equals (at the 5% level)
A) 2.01.
B) 2.10.
C) 2.80.
D) 2.64.
Answer: B
26) Assume that for the T = 2 time periods case, you have estimated a simple regression in changes model and
found a statistically significant positive intercept. This implies
A) a negative mean change in the LHS variable in the absence of a change in the RHS variable since you
subtract the earlier period from the later period
B) that the panel estimation approach is flawed since differencing the data eliminates the constant (intercept)
in a regression
C) a positive mean change in the LHS variable in the absence of a change in the RHS variable
D) that the RHS variable changed between the two subperiods
Answer: C
27) HAC standard errors and clustered standard errors are related as follows:
A) they are the same
B) clustered standard errors are one type of HAC standard error
C) they are the same if the data is differenced
D) clustered standard errors are the square root of HAC standard errors
Answer: B
28) In panel data, the regression error
A) is likely to be correlated over time within an entity
B) should be calculated taking into account heteroskedasticity but not autocorrelation
C) only exists for the case of T > 2
D) fits all of the three descriptions above
Answer: A
29) It is advisable to use clustered standard errors in panel regressions because
A) without clustered standard errors, the OLS estimator is biased
B) hypothesis testing can proceed in a standard way even if there are few entities (n is small)
C) they are easier to calculate than homoskedasticity-only standard errors
D) the fixed effects estimator is asymptotically normally distributed when n is large
Answer: D
30) If Xit is correlated with Xis for different values of s and t, then
A) Xit is said to be autocorrelated
B) the OLS estimator cannot be computed
C) statistical inference cannot proceed in a standard way even if clustered standard errors are used
D) this is not of practical importance since these correlations are typically weak in applications
Answer: A
affected by this?
Answer: (a) Time effects will pick up the effect of omitted variables that are common to all 50 states at a given
point in time. Federal fiscal and monetary variables, exchange rate and U.S. terms of trade movements,
aggregate business cycle developments, etc., are candidates here. State fixed effects will include variables
that are slowly changing over time within a specific state such as attitudes toward employment or labor
force participation, state specific labor market policies, industrial and labor force composition, etc.
(b) The implicit assumption by the author is that the coefficients on the state fixed effects are identical
within a region but differ between regions. Since these coefficients imply linear restrictions, they can be
tested using the F-test.
(c) Consider a ten percent increase in minimum wages, say from $5 to $5.50 with constant average
hourly earnings. This corresponds to a ten percent increase in relative minimum wages. The resulting
decrease in the teenage to population ratio is 1.8 or almost 2 percent. The regression explains roughly 73
percent of the employment to population ratio of teenagers during the period of 1977 to 1989 for the 50
U.S. states.
(d) This choice in effect drops the i subscript from the minimum wage, since there is no variation by
state. The original equation then reads
$\ln(E_{it}) = \beta_0 + \beta_1 \ln(M_t/W_{it}) + \gamma_2 D2_i + \dots + \gamma_n D8_i + \delta_2 B2_t + \dots + \delta_T B13_t + u_{it}$.
Furthermore, since the federal minimum wage is constant across the nine regions at a point in time, it is
absorbed by the time effects. The coefficient on the relative minimum wage therefore reflects regional
variations in average hourly earning in manufacturing. The minimum wage only enters indirectly as
changes in the federal minimum wage since there are different relative levels to average hourly earnings
in each region.
2) You want to find the determinants of suicide rates in the United States. To investigate the issue, you collect
state level data for ten years. Your first idea, suggested to you by one of your peers from Southern California, is
that the annual amount of sunshine must be important. Stacking the data and using no fixed effects, you find
no significant relationship between suicide rates and this variable. (This is good news for the people of Seattle.)
However, sorting the suicide rate data from highest to lowest, you notice that those states with the lowest
population density are dominating in the highest suicide rate category. You run another regression, without
fixed effect, and find a highly significant relationship between the two variables. Even adding some economic
variables, such as state per capita income or the state unemployment rate, does not lower the t-statistic for the
population density by much. Adding fixed entity and time effects, however, results in an insignificant
coefficient for population density.
(a) What do you think is the cause for this change in significance? Which fixed effect is primarily responsible?
Does this result imply that population density does not matter?
(b) Speculate as to what happens to the coefficients of the economic variables when the fixed effects are
included. Use this example to make clear what factors entity and time fixed effects pick up.
(c) What other factors might play a role?
Answer: (a) Population density only changes slowly over time, hence state effects will pick up the influence of
this variable. This does not imply that population density is of no relevance. However, there are other omitted
variables in this regression, such as religious and cultural attitudes towards suicide, that are also
captured by the state effects, and these may also be correlated with population density.
(b) Since there is sufficient variation of state unemployment rates and state per capita income both over
time and across states, the coefficients on these variables are likely to remain statistically significant.
However, there may be multicollinearity between the two variables, and the standard errors may
therefore be large.
(c) Answers will vary by student. Cultural and institutional factors, such as attitudes towards suicide
and religion, and social services, are frequently mentioned.
3) Two authors published a study in 1992 of the effect of minimum wages on teenage employment using a U.S.
state panel. The paper used annual observations for the years 1977-1989 and included all 50 states plus the
District of Columbia. The estimated equation is of the following type
$(E_{it}) = \beta_0 + \beta_1 (M_{it}/W_{it}) + \gamma_2 D2_i + \dots + \gamma_n D51_i + \delta_2 B2_t + \dots + \delta_T B13_t + u_{it}$,
where E is the employment to population ratio of teenagers, M is the nominal minimum wage, and W is
average wage in the state. In addition, other explanatory variables, such as the prime-age male unemployment
rate, and the teenage population share were included.
(a) Briefly discuss the advantage of using panel data in this situation rather than pure cross sections or time
series.
(b) Estimating the model by OLS but including only time fixed effects results in the following output
Compare the two results. Why would the inclusion of state fixed effects change the coefficients in this way?
(d) The significance of each coefficient decreased, yet R2 increased. How is that possible? What does this result
tell you about testing the hypothesis that all of the state fixed effects can be restricted to have the same
coefficient? How would you test for such a hypothesis?
Answer: (a) There are likely to be omitted variables in the above regression. One way to deal with some of these is
to introduce state and time effects. State effects will capture the influence of omitted variables that are
state specific and do not vary over time, while time effects capture those of country wide variables that
are common to all states at a point in time. Furthermore, there are more observations when using panel
data, resulting in more variation.
(b) There is a negative relationship between minimum wages and the employment to population ratio.
Increases in the share of teenagers in the population result in a higher employment to population ratio,
and increases in the prime-age male unemployment rate lower the employment to population ratio. The
regression explains 20 percent of the variation in the teenage employment to population ratio. The
relative minimum wage and the prime-age male unemployment rate are significant using a 1%
significance level, while the proportion of teenagers in the population is not. Elasticities vary with levels
here. One possibility is to report elasticities at sample means.
(c) The parameter of interest here is the coefficient on the relative minimum wage. While it was highly
significant in the previous regression, it now has changed signs and is statistically insignificant. The
explanatory power of the equation has increased substantially. The size of the other two coefficients has
also decreased. The results suggest that omitted variables, which are now captured by state fixed effects,
were correlated with the regressors and caused omitted variable bias.
(d) The influence of the state effects is large. These are bound to be statistically significant and the
hypothesis to restrict these coefficients to zero is bound to fail. Since these are linear hypotheses that are
supposed to hold simultaneously, an F-test is appropriate here.
4) You learned in intermediate macroeconomics that certain macroeconomic growth models predict conditional
convergence or a catch up effect in per capita GDP between the countries of the world. That is, countries which
are further behind initially in per-capita GDP will grow faster than the leader. You gather data from the Penn
World Tables to test this theory.
(a) By limiting your sample to 24 OECD countries, you hope to have a more homogeneous set of countries in
your sample, i.e., countries that are not too different with respect to their institutions. To simplify matters, you
decide to only test for unconditional convergence. In that case, the laggards catch up even without taking into
account differences in some of the driving variables. Your scatter plot and regression for the time period
1975-1989 are as follows:
crimes reported). $\alpha_i$ and $\lambda_t$ are area and year fixed effects, where $\alpha_i$ equals one for area i and is zero otherwise
for all i, and $\lambda_t$ is one in year t and zero for all other years for t = 2, ..., 22. $\lambda_1$ is not included.
(a) What is the purpose of excluding $\lambda_1$? What are the terms $\alpha_i$ and $\lambda_t$ likely to pick up? Discuss the advantages
of using panel data for this type of investigation.
(b) Estimation by OLS using heteroskedasticity- and autocorrelation-consistent standard errors results in the
following output, where the coefficients of the fixed effects are not reported:
$\ln(cmrt)_{it} = 0.063\,unrtm_{it} + 3.739\,proyth_{it} - 0.588\,\ln(pp)_{it}$; $R^2 = 0.904$
(0.109) (0.179) (0.024)
Comment on the results. In particular, what is the effect of a ten percent increase in the probability of
punishment?
(c) To test for the relevance of the area fixed effects, you restrict the regression by dropping all entity fixed
effects and adding a single constant. The relevant F-statistic is 135.28. What are the degrees of freedom?
What is the critical value from your F table?
(d) Although the test rejects the hypothesis of eliminating the fixed effects from the regression, you want to
analyze what happens to the coefficients and their standard errors when the equation is re-estimated without
[...] $\hat\beta_1$ is now 1.340 with a standard error of 0.234. Why do you think that is?
Answer: (a) Since there is no constant in addition to the entity and time fixed effects, setting $\lambda_t$ to one in year t
and zero for all other years for t = 1, ..., 22 would result in perfect multicollinearity. $\alpha_i$ picks up omitted
variables that are specific to police regions and do not vary over time. $\lambda_t$ picks up effects that are
common to all police regions in a given year. Attitudes toward crime may vary between rural regions
and metropolitan areas. These would be hard to capture through measurable variables. Common
macroeconomic shocks that affect all regions equally will be captured by the time fixed effects. Although
some of these variables could be explicitly introduced, the list of possible variables is long. By
introducing time fixed effects, the effect is captured all in one variable.
(b) A higher male unemployment rate and a higher proportion of youths increase the crime rate, while a
higher probability of punishment decreases the crime rate. The coefficients on the probability of
punishment and the proportion of youths are statistically significant, while the coefficient on the male unemployment rate
is not. The regression explains roughly 90 percent of the variation in crime rates in the sample. A ten
percent increase in the number of convictions over the number of crimes reported decreases the crime
rate by roughly six percent.
(c) The coefficients of the three regressors other than the entity coefficients would have been unaffected,
had there been a constant in the regression and (n-1) police region specific entity variables. In this case,
the entity coefficients on the police regions would have indicated deviations from the constant for the
first police region. Hence there are 41 restrictions imposed by eliminating the entity fixed effects and
adding a constant. Since there are over 100 observations (900 degrees of freedom), the critical value for
$F_{41,\infty}$ can be approximated by $F_{30,\infty} = 1.70$ at the 1% level. Hence the restrictions are rejected.
(d) This result would make the male unemployment rate coefficient significant. It suggests that male
unemployment rates change slowly over the years in a given police district and that this effect is picked
up by the entity fixed effects. Of course, there are other slowly changing variables, such as attitudes
towards crime, that are captured by these fixed effects.
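The arithmetic behind answer (b) can be checked with a short script. This is only a sketch: it uses the coefficients and standard errors reported in the question, assumes the coefficient on ln(pp) is negative (as the answer's interpretation implies), and the dictionary keys are illustrative names rather than anything from the original study.

```python
import math

# Reported coefficients and HAC standard errors from part (b).
# The negative sign on ln(pp) is assumed from the answer's interpretation.
estimates = {
    "unrtm": (0.063, 0.109),    # male unemployment rate
    "proyth": (3.739, 0.179),   # proportion of youths
    "ln_pp": (-0.588, 0.024),   # log of the probability of punishment
}

# t-statistic for each coefficient under H0: coefficient = 0
t_stats = {name: coef / se for name, (coef, se) in estimates.items()}

# In a log-log specification the coefficient on ln(pp) is an elasticity.
elasticity = estimates["ln_pp"][0]
approx_change = elasticity * 0.10                          # first-order approximation
exact_change = math.exp(elasticity * math.log(1.10)) - 1   # exact proportional change

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, significant at 5%: {abs(t) > 1.96}")
print(f"approximate change: {approx_change:.1%}, exact change: {exact_change:.1%}")
```

Both versions of the calculation give a fall in the crime rate of a little under six percent, in line with the "roughly six percent" stated in the answer.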
6) You want to investigate the relationship between cumulative GPA scores at graduation and incoming SAT
scores of students. For this purpose, you have collected data from a balanced panel of 120 undergraduate
colleges and universities in the United States over a ten year period. Discuss some of the entity fixed effects
which you potentially capture by allowing for a binary variable for each of the colleges.
Answer: Students will come up with various possible entity fixed effects. These should include differences
between educational institutions that have
and so forth.
7) You want to study the relationship between weight and height of young children (4th grade to 7th grade). You
collect data for more than 400 students and track the progress of these students over the following four years,
where you end up with a balanced panel of 400 students (you discard the observations for the students who
moved away). Discuss some of the entity fixed effects which you potentially capture by allowing for a binary
variable for each of the students. Do you expect significant time fixed effects if you allowed for them?
Answer: Students will come up with various possible entity fixed effects. These will reflect differences between
students potentially depending on
gender
ethnicity
degree of participation in exercises/athletic programs
growth spurts during these years
nutrition
genes
and so forth. It is hard to think of time fixed effects. Potentially there could be an effect if all students
went to a different school in 7th grade (e.g. middle school) and this school had a less/more healthy
lunch diet.
8) You first encountered growth regressions in your intermediate macroeconomics course (beta-convergence
regressions), that is, conditionally on some initial condition in per capita income, different authors tried to find
the determinants of growth. Since growth is a long-run phenomenon, various studies collected data for a panel
of numerous countries using 10-year averages, over a time period stretching from 1960 to 2005. For example, a
balanced panel might consist of 50-odd countries for the time periods 1960-1970, 1971-1980, ...,
2000-2005. Instead of using two-way fixed effects (entity fixed effects and time fixed effects), authors often only
employed time fixed effects. Why do you think that is? What sort of information would be lost if these authors
employed entity fixed effects as well?
Answer: Time fixed effects will eliminate common growth phenomena experienced by all countries during the
same decade (say). These could include productivity slowdowns due to the oil crisis of the 70s, effects
of the Great Moderation of the 90s, etc. However, most of these studies were interested in determining
the effect of institutional differences between countries. These effects, such as the degree of democracy,
law and order, openness of the economy, size of government, civil wars, geography, religion, etc., are
typically slowly changing, and by including entity fixed effects, you would lose the effects you are
interested in studying.
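$\beta_0 + \beta_3 S_t + u_t$
where $\bar{Y}_t = \frac{1}{n}\sum_{i=1}^{n} Y_{it}$, $\bar{X}_t = \frac{1}{n}\sum_{i=1}^{n} X_{it}$, and $\bar{u}_t = \frac{1}{n}\sum_{i=1}^{n} u_{it}$,
and where $\tilde{Y}_{it} = Y_{it} - \bar{Y}_t$, and $\tilde{X}_{it}$ and $\tilde{u}_{it}$ are defined similarly. The time-demeaned regression can then be
estimated by OLS.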
4) Your textbook modifies the four assumptions for the multiple regression model by adding a new assumption.
This represents an extension of the cross-sectional data case, where errors are uncorrelated across entities. The
new assumption requires the errors to be uncorrelated across time, conditional on the regressors as well
($\mathrm{cov}(u_{it}, u_{is} \mid X_{it}, X_{is}) = 0$ for $t \neq s$).
(a) Discuss why there might be correlation over time in the errors when you use U.S. state panel data. Does this
mean that you should not use OLS as an estimator?
(b) Now consider pairs of adjacent states such as Indiana and Michigan, Texas and Arkansas, New York and
Connecticut, etc. Is it likely that the fifth assumption will hold here, even though the contemporaneous errors
are correlated? If not, can you still use OLS for estimation?
Answer: (a) The error term may contain omitted variables. If these change slowly from one period to the next,
then the error term will be correlated over time. In that case, $\mathrm{cov}(u_{it}, u_{is} \mid X_{it}, X_{is}) = 0$ for $t \neq s$ will be
violated. The OLS estimator is still unbiased, but valid statistical inference cannot be conducted, even
when using heteroskedasticity-robust standard errors. However, heteroskedasticity- and
autocorrelation-consistent standard errors can be used in this situation.
(b) The fifth assumption deals with observations that do not occur during the same time period. It does
not address the problems of errors of one entity being affected by errors in another entity during the
same period. While potentially there are more efficient estimators available in such a situation, OLS can
still be used for estimation.
5) In Sports Economics, production functions are often estimated by relating the winning percentage of teams (Y)
to inputs indicating performance in certain aspects of the game. However, this omits the quality of
management. Assume that you could measure the quality of pitching and hitting by a single index L, and that
managerial ability is represented by M, which is assumed to be constant over time. The production function
would then be specified as follows:
$Y_{it} = \beta_0 + \beta_1 L_{it} + \beta_2 M_i + u_{it}$
where i is an index for the baseball team, and t indexes time and all variables are in logs.
(a) Assume that managerial ability is unobservable but is positively related, in a linear way, to L. Explain why
the OLS estimator $\hat{\beta}_1$ is inconsistent in the case of a single cross-section, i.e., if you attempt to estimate the
above regression for a single year. Do you expect this coefficient to over- or under-estimate $\beta_1$?
(b) If you had data for two years, indicate the transformation which allows you to obtain a consistent estimator
for $\beta_1$.
Answer: (a) Regressing Y on L alone will result in omitted variable bias. A higher pitching and hitting
index is associated with higher managerial ability, which in turn increases the winning percentage. Hence $\hat{\beta}_1$ will
be expected to overestimate the effect of pitching and hitting on the winning percentage. Said differently,
OLS will attribute more to pitching and hitting quality than it deserves.
(b) Since managerial ability is assumed to be constant over time, then differencing the data over the two
time-periods will eliminate this effect for all teams. This can be shown as follows:
$Y_{i2} = \beta_0 + \beta_1 L_{i2} + \beta_2 M_i + u_{i2}$
$Y_{i1} = \beta_0 + \beta_1 L_{i1} + \beta_2 M_i + u_{i1}$
Subtracting the second equation from the first results in
$Y_{i2} - Y_{i1} = \beta_1 (L_{i2} - L_{i1}) + u_{i2} - u_{i1}$.
Alternatively, the binary variable specification or the entity-demeaned specification could have been
used with identical estimation results.
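A small numerical sketch may make the differencing argument concrete. All numbers below are hypothetical: the "true" parameters, the four teams, and the rule tying managerial ability M to the first-year index L are assumptions chosen for illustration, and the data are noise-free so that the results are exact.

```python
# Hypothetical noise-free data for four teams over two years.
beta0, beta1, beta2 = 0.5, 1.0, 2.0      # assumed "true" parameters

L = {1: [1.0, 2.0, 3.0, 4.0],            # index L_i1 in year 1
     2: [1.5, 3.5, 3.0, 6.0]}            # index L_i2 in year 2
# Managerial ability: constant over time, positively and linearly related to L_i1
M = [0.5 * value for value in L[1]]

Y = {t: [beta0 + beta1 * l + beta2 * m for l, m in zip(L[t], M)] for t in (1, 2)}

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx

def slope_through_origin(x, y):
    """OLS slope without an intercept, matching the differenced equation."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Single cross-section (year 1), omitting M: the slope absorbs the effect of M.
cross_section_slope = ols_slope(L[1], Y[1])

# First differences: M_i drops out, leaving only beta1.
dL = [b - a for a, b in zip(L[1], L[2])]
dY = [b - a for a, b in zip(Y[1], Y[2])]
first_difference_slope = slope_through_origin(dL, dY)

print(cross_section_slope, first_difference_slope)
```

In this constructed example the cross-section slope is twice the assumed value of the pitching and hitting coefficient, while the first-difference slope recovers it.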
6) A study attempts to investigate the role of the various determinants of regional Canadian unemployment rates
in order to get a better picture of Canadian aggregate unemployment rate behavior. The annual data
(1967-1991) is for five regions (Atlantic region, Quebec, Ontario, Prairies, and British Columbia), and four
age-gender groups (female and male, adult and young). Focusing on young females, the authors find
significant effects for the following variables: the regional relative minimum wage rate (minimum wages
divided by average hourly earnings), the regional share of youth in the labor force, the regional share of adult
females in the labor force, United States activity shocks (deviations of United States GDP from trend), an
indicator of the degree of monetary tightness in Canada, regional union density, and a regional index of
unemployment insurance generosity. Explain why the authors only used region fixed effects. How would their
specification have to change if they also employed time fixed effects?
Answer: Since the study used Canada-wide effects (United States activity shocks, and monetary tightness), these
are identical for all regions at a point in time. Using time fixed effects in addition to these two variables
would have generated perfect multicollinearity among the regressors, and hence the OLS estimator
would not exist. An alternative specification would include time fixed effects, but eliminate the two
variables which are constant across all regions at a given point in time.
7) (Requires Matrix Algebra) Consider the time and entity fixed effect model with a single explanatory variable
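$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$,
For the case of n = 4 and T = 3, write this model in the form $Y = X\beta + U$, where, in general,
\[
Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad
X = \begin{pmatrix} X_1' \\ X_2' \\ \vdots \\ X_n' \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}
\]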
How would the X matrix change if you added two binary variables, D1 and B1? Demonstrate that in this case
the columns of the X matrix are not independent. Finally show that elimination of one of the two variables is
not sufficient to get rid of the multicollinearity problem. In terms of the OLS estimator, $\hat{\beta} = (X'X)^{-1}X'Y$, why
does perfect multicollinearity create a problem?
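Answer: For the case of n = 4 and T = 3, the general model would look as follows:
\[
\begin{pmatrix} Y_{11} \\ Y_{12} \\ Y_{13} \\ Y_{21} \\ Y_{22} \\ Y_{23} \\ Y_{31} \\ Y_{32} \\ Y_{33} \\ Y_{41} \\ Y_{42} \\ Y_{43} \end{pmatrix}
=
\begin{pmatrix}
1 & X_{11} & 0 & 0 & 0 & 0 & 0 \\
1 & X_{12} & 0 & 0 & 0 & 1 & 0 \\
1 & X_{13} & 0 & 0 & 0 & 0 & 1 \\
1 & X_{21} & 1 & 0 & 0 & 0 & 0 \\
1 & X_{22} & 1 & 0 & 0 & 1 & 0 \\
1 & X_{23} & 1 & 0 & 0 & 0 & 1 \\
1 & X_{31} & 0 & 1 & 0 & 0 & 0 \\
1 & X_{32} & 0 & 1 & 0 & 1 & 0 \\
1 & X_{33} & 0 & 1 & 0 & 0 & 1 \\
1 & X_{41} & 0 & 0 & 1 & 0 & 0 \\
1 & X_{42} & 0 & 0 & 1 & 1 & 0 \\
1 & X_{43} & 0 & 0 & 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \gamma_2 \\ \gamma_3 \\ \gamma_4 \\ \delta_2 \\ \delta_3 \end{pmatrix}
+
\begin{pmatrix} u_{11} \\ u_{12} \\ u_{13} \\ u_{21} \\ u_{22} \\ u_{23} \\ u_{31} \\ u_{32} \\ u_{33} \\ u_{41} \\ u_{42} \\ u_{43} \end{pmatrix}
\]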
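Adding the two binary variables would change the X matrix in this way:
\[
X = \begin{pmatrix}
1 & X_{11} & 0 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & X_{12} & 0 & 0 & 0 & 1 & 0 & 1 & 0 \\
1 & X_{13} & 0 & 0 & 0 & 0 & 1 & 1 & 0 \\
1 & X_{21} & 1 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & X_{22} & 1 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & X_{23} & 1 & 0 & 0 & 0 & 1 & 0 & 0 \\
1 & X_{31} & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
1 & X_{32} & 0 & 1 & 0 & 1 & 0 & 0 & 0 \\
1 & X_{33} & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & X_{41} & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & X_{42} & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & X_{43} & 0 & 0 & 1 & 0 & 1 & 0 & 0
\end{pmatrix}
\]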
Adding columns 6, 7, and 9 results in column 1. Also adding columns 3, 4, 5, and 8 results in column 1.
Hence the columns are not linearly independent and there is perfect multicollinearity among the
columns of the matrix. Eliminating column 9, say, is not sufficient to get rid of this problem, since adding
columns 3, 4, 5, and 8 still equals column 1. In case of perfect multicollinearity, the X matrix will not have
full column rank, and hence $X'X$ will also not have full rank (it is singular). In this case, $X'X$ cannot be
inverted, and hence the OLS estimator does not exist.
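The column dependence can also be verified numerically. The following is a minimal sketch, not part of the original answer: the regressor values X_it = i × t are hypothetical placeholders, and `build_X` and `rank` are helper names introduced here. Exact fractions are used so that the rank computation is not affected by rounding.

```python
from fractions import Fraction

n, T = 4, 3

def build_X(include_D1, include_B1):
    """Rows ordered (i=1,t=1), (1,2), ..., (4,3).
    Columns: constant, X_it, D2..Dn, B2..BT, then optionally D1 and B1."""
    rows = []
    for i in range(1, n + 1):
        for t in range(1, T + 1):
            row = [Fraction(1), Fraction(i * t)]   # X_it = i*t is a hypothetical value
            row += [Fraction(int(i == j)) for j in range(2, n + 1)]
            row += [Fraction(int(t == s)) for s in range(2, T + 1)]
            if include_D1:
                row.append(Fraction(int(i == 1)))
            if include_B1:
                row.append(Fraction(int(t == 1)))
            rows.append(row)
    return rows

def rank(matrix):
    """Rank by exact Gaussian elimination on a copy of the matrix."""
    m = [row[:] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((k for k in range(r, len(m)) if m[k][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for k in range(len(m)):
            if k != r and m[k][c] != 0:
                factor = m[k][c] / m[r][c]
                m[k] = [a - factor * b for a, b in zip(m[k], m[r])]
        r += 1
    return r

base = build_X(False, False)     # 7 columns
both = build_X(True, True)       # 9 columns: D1 and B1 added
only_D1 = build_X(True, False)   # 8 columns: B1 dropped again

for name, X in (("base", base), ("D1 and B1 added", both), ("only D1 added", only_D1)):
    print(f"{name}: {len(X[0])} columns, rank {rank(X)}")
```

With these inputs the rank stays at 7 in all three cases, so the 9-column and 8-column matrices both lack full column rank, which is the sense in which dropping only one of the two added variables does not remove the multicollinearity.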
8) Consider the time and entity fixed effect model with a single explanatory variable
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + u_{it}$,
Assume that you had estimated the above equation by OLS. Typically the coefficients for the entity and time
binary variables are not reported. Can you think of situations where the pattern of these coefficients might be
of interest? What could you do, for example, if you had a strong theoretical justification for believing that a few
macroeconomic variables had an effect on $Y_{it}$?
Answer: The coefficients pick up the effects of omitted variables that are common to all entities at a point in time
(time fixed effects), or that are constant across time for entities (entity fixed effect). If data is available on
slowly changing variables across time, say population density or average educational attainment by U.S.
state, or on macroeconomic variables, then you could perform a regression of the binary variable
coefficients on these variables to determine the degree of correlation. Obviously, the correlation will be
less than perfect, and unless these variables bear coefficients of interest, then there is little to be gained
from these auxiliary regressions.
9) Empirical studies of economic growth are flawed because many of the truly important underlying
determinants, such as culture and institutions, are very hard to measure. Discuss this statement paying
particular attention to simple cross-section data and panel data models. Use equations whenever possible to
underscore your argument.
Answer: Although some cultural and institutional variables, such as corruption, black market activity, central
bank independence, trust, etc., are hard to measure, authors have developed such series for the countries
of the world. Still, either these variables are measured with error or not all cultural and institutional
aspects are bound to be captured. Hence you would expect omitted variable bias to be present in
cross-sectional studies. However, if you could argue that these effects are constant across time or at least
slowly changing, then introducing country fixed effects in panel studies goes some way to alleviate the
omitted variable problem. Similarly by using time fixed effects, common world business cycle effects can
be largely eliminated. For an empirical study of economic growth using U.S. states, time fixed effects
would eliminate common effects of monetary policy and inflation.
The above argument can be made using equations along the theoretical arguments presented in sections
8.3 and 8.4 of the textbook.
10) Give at least three examples from macroeconomics and five from microeconomics that involve specified
equations in a panel data analysis framework. Indicate in each case what the role of the entity and time fixed
effects in terms of omitted variables might be.
Answer: Answers will vary by student. Given the textbook example, you can expect a study of fatality rates and
beer taxes to appear. Other examples mentioned may be minimum wage studies using data from U.S.
states or Canadian provinces, panel data in earnings studies, empirical studies of economic growth
across the countries of the world or regions within a country, determinants of unemployment rates using
data from geographical units (countries, regions, states), degree of democratization of the countries of
the world, etc. Students should point out in the various examples how entity and time fixed effects pick
up variables that are constant across entities at a point in time, or constant over time for specific entities.
For geographical units, these typically involve cultural and institutional factors, and common
macroeconomic effects.
11) Your textbook specifies a simple regression problem for two time periods for the years 1982 and 1988 as
follows:
$FatalityRate_{i,1982} = \beta_0 + \beta_1 BeerTax_{i,1982} + u_{i,1982}$
$FatalityRate_{i,1988} = \beta_0 + \beta_1 BeerTax_{i,1988} + u_{i,1988}$
After subtracting the first equation from the second equation, the authors estimate the model and find
a negative intercept.
a. Show how you would have to modify the two equations to allow for the presence of an intercept in
the differenced model.
b. What would the relative magnitude of the modified model have to be for you to find a negative
intercept?
b.
<
12) Your textbook reports the following result from a two-way fixed effects (entity and time fixed effects)
regression model:
FatalityRate = -0.64 BeerTax + StateFixedEffects + TimeFixedEffects
(0.36)
where the number in parentheses is the heteroskedasticity- and autocorrelation-consistent (HAC) standard
error.
a. Calculate the t-statistic. Can you reject the null hypothesis that the slope coefficient is zero in the
population, using a two-sided test and a 5% significance level?
b. Given that economic theory suggests that the population slope is negative under the alternative
hypothesis, is it possible to use a one-sided test here? In that case, does your conclusion change?
c. Using only heteroskedasticity-robust standard errors, but not HAC standard errors, the value in
parentheses becomes 0.25. Repeat the calculations in (a) and report your decision based on a two-sided
test.
d. Since the coefficient becomes more statistically significant in (c), should this influence your choice of
standard errors? Why or why not?
Answer: a. t = -0.64/0.36 = -1.78. Since |t| = 1.78 < 1.96, you cannot reject the null hypothesis that the
coefficient is zero in the population.
b. The beer tax represents part of the cost (price) of alcohol consumption and an increase in price should
reduce the demand for alcohol. Hence economic theory suggests a negative price coefficient. It therefore
seems reasonable to use a one-sided test. Since the critical value is -1.64 in that case, you can reject the
null hypothesis at the 5% significance level.
c. The t-statistic is now -2.56 and you can reject the null hypothesis at the 5% level, and almost at the 1%
level.
d. It is better to use the clustered standard errors, since these are valid whether or not there is
heteroskedasticity, autocorrelation, or both. Using heteroskedasticity-robust standard errors only will
result in invalid statistical inference, since they were derived under the assumption of no serial
correlation in the error term.
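The three test decisions in parts (a) to (c) can be reproduced with a few lines of arithmetic. This sketch uses the coefficient of -0.64 from the answer's calculation and the large-sample normal critical values; the variable names are illustrative.

```python
coef = -0.64          # slope coefficient used in the answer's calculation
se_hac = 0.36         # HAC standard error
se_robust = 0.25      # heteroskedasticity-robust standard error from part (c)

t_hac = coef / se_hac
t_robust = coef / se_robust

two_sided_5pct = 1.96    # reject if |t| exceeds this value
one_sided_5pct = 1.645   # left-tailed test: reject if t is below -1.645

reject_two_sided_hac = abs(t_hac) > two_sided_5pct
reject_one_sided_hac = t_hac < -one_sided_5pct
reject_two_sided_robust = abs(t_robust) > two_sided_5pct

print(f"HAC:    t = {t_hac:.2f}, two-sided reject: {reject_two_sided_hac}, "
      f"one-sided reject: {reject_one_sided_hac}")
print(f"Robust: t = {t_robust:.2f}, two-sided reject: {reject_two_sided_robust}")
```

The comparison shows that the conclusion depends on the standard error chosen, which is why part (d) asks for a justification that does not rest on which choice yields significance.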
7) The following tools from multiple regression analysis carry over in a meaningful manner to the linear
probability model, with the exception of the
A) F-statistic.
B) significance test using the t-statistic.
C) 95% confidence interval using 1.96 times the standard error.
D) regression R2 .
Answer: D
8) (Requires material from Section 11.3 possibly skipped) For the measure of fit in your regression model with a
binary dependent variable, you can meaningfully use the
A) regression R2 .
B) size of the regression coefficients.
C) pseudo R2 .
D) standard error of the regression.
Answer: C
9) The major flaw of the linear probability model is that
A) the actuals can only be 0 and 1, but the predicted are almost always different from that.
B) the regression R2 cannot be used as a measure of fit.
C) people do not always make clear-cut decisions.
D) the predicted values can lie above 1 and below 0.
Answer: D
10) The probit model
A) is the same as the logit model.
B) always gives the same fit for the predicted values as the linear probability model for values between 0.1
and 0.9.
C) forces the predicted values to lie between 0 and 1.
D) should not be used since it is too complicated.
Answer: C
11) The logit model derives its name from
A) the logarithmic model.
B) the probit model.
C) the logistic function.
D) the tobit model.
Answer: C
12) In the probit model $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$,
A) $(\beta_0 + \beta_1 X)$ plays the role of z in the cumulative standard normal distribution function.
B) $\beta_1$ cannot be negative since probabilities have to lie between 0 and 1.
20) (Requires Advanced material) Only one of the following models can be estimated by OLS :
A) $Y = AK^{\alpha}L^{\beta} + u$.
B) $\Pr(Y = 1 \mid X) = \Phi(\beta_0 + \beta_1 X)$
C) $\Pr(Y = 1 \mid X) = F(\beta_0 + \beta_1 X) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X)}}$.
D) $Y = AK^{\alpha}L^{\beta}u$.
Answer: D
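A short derivation may help show why only the specification with a multiplicative error can be estimated by OLS. The exponents are written as $\alpha$ and $\beta$ here, which is an assumed notation for the Cobb-Douglas form in options A and D.

```latex
\[
Y = A K^{\alpha} L^{\beta} u
\quad\Longrightarrow\quad
\ln Y = \ln A + \alpha \ln K + \beta \ln L + \ln u
\]
% The transformed model is linear in the parameters (\ln A, \alpha, \beta),
% with \ln u as the error term, so OLS applies to the logged variables.
\[
Y = A K^{\alpha} L^{\beta} + u
\quad\Longrightarrow\quad
\ln Y = \ln\!\left(A K^{\alpha} L^{\beta} + u\right)
\]
% With an additive error the logarithm does not separate into a sum,
% so the model stays nonlinear in the parameters.
```

The probit and logit specifications in options B and C are likewise nonlinear in the coefficients, which leaves option D as the only one that a transformation makes linear.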
21) (Requires Advanced material) Nonlinear least squares estimators in general are not
A) consistent.
B) normally distributed in large samples.
C) efficient.
D) used in econometrics.
Answer: C
22) (Requires Advanced material) Maximum likelihood estimation yields the values of the coefficients that
A) minimize the sum of squared prediction errors.
B) maximize the likelihood function.
C) come from a probability distribution and hence have to be positive.
D) are typically larger than those from OLS estimation.
Answer: B
23) To measure the fit of the probit model, you should:
A) use the regression R2 .
B) plot the predicted values and see how closely they match the actuals.
C) use the log of the likelihood function and compare it to the value of the likelihood function.
D) use the fraction correctly predicted or the pseudo R2 .
Answer: D
24) When estimating probit and logit models,
A) the t-statistic should still be used for testing a single restriction.
B) you cannot have binary variables as explanatory variables as well.
C) F-statistics should not be used, since the models are nonlinear.
D) it is no longer true that the $\bar{R}^2 < R^2$.
Answer: A
25) The following problems could be analyzed using probit and logit estimation with the exception of whether or
not
A) a college student decides to study abroad for one semester.
B) being a female has an effect on earnings.
C) a college student will attend a certain college after being accepted.
D) applicants will default on a loan.
Answer: B
26) In the probit regression, the coefficient $\beta_1$ indicates
A) the change in the probability of Y = 1 given a unit change in X
B) the change in the probability of Y = 1 given a percent change in X
C) the change in the z-value associated with a unit change in X
D) none of the above
Answer: C
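The distinction between a change in the z-value and a change in the probability can be illustrated with a small sketch. The coefficients below are hypothetical and chosen only for illustration; the standard normal CDF is computed from `math.erf`.

```python
import math

def std_normal_cdf(z):
    """Cumulative standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta0, beta1 = -2.0, 0.5   # hypothetical probit coefficients

def probit_prob(x):
    """Predicted probability Pr(Y = 1 | X = x) under the assumed coefficients."""
    return std_normal_cdf(beta0 + beta1 * x)

# A unit change in X always shifts the z-value by beta1 ...
z_shift = (beta0 + beta1 * 4) - (beta0 + beta1 * 3)

# ... but the implied change in probability depends on the starting point.
change_low = probit_prob(1) - probit_prob(0)   # z moves from -2.0 to -1.5
change_mid = probit_prob(4) - probit_prob(3)   # z moves from -0.5 to  0.0

print(f"shift in z: {z_shift}")
print(f"change in probability from X=0 to X=1: {change_low:.4f}")
print(f"change in probability from X=3 to X=4: {change_mid:.4f}")
```

The same unit change in X moves the probability by different amounts at different starting values, which is why the probit coefficient cannot be read as a change in probability.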
27) Your textbook plots the estimated regression function produced by the probit regression of deny on P/I ratio.
The estimated probit regression function has a stretched S shape given that the coefficient on the P/I ratio is
positive. Consider a probit regression function with a negative coefficient. The shape would
A) resemble an inverted S shape (for low values of X, the predicted probability of Y would approach 1)
B) not exist since probabilities cannot be negative
C) remain the S shape as with a positive slope coefficient
D) would have to be estimated with a logit function
Answer: A
28) Probit coefficients are typically estimated using
A) the OLS method
B) the method of maximum likelihood
C) non-linear least squares (NLLS)
D) by transforming the estimates from the linear probability model
Answer: B
29) F-statistics computed using maximum likelihood estimators
A) cannot be used to test joint hypotheses
B) are not meaningful since the entire regression R2 concept is hard to apply in this situation
C) do not follow the standard F distribution
D) can be used to test joint hypotheses
Answer: D
30) When testing joint hypotheses, you can use
A) the F-statistic
B) the chi-squared statistic
C) either the F-statistic or the chi-squared statistic
D) none of the above
Answer: C
(a) Do you see any relationship between the temperature and the number of O-ring failures? If you fitted a
linear regression line through these seven observations, do you think the slope would be positive or negative?
Significantly different from zero? Do you see any problems other than the sample size in your procedure?
(b) You decide to look at all successful launches before Challenger, even those for which there were no
incidents. Furthermore you simplify the problem by specifying a binary variable, which takes on the value one
if there was some O-ring failure and is zero otherwise. You then fit a linear probability model with the
following result,
$OFail = \underset{(0.496)}{2.858} - \underset{(0.007)}{0.037}\, Temperature$; $R^2 = 0.325$, $SER = 0.390$,
where OFail is the binary variable which is one for launches where O-rings showed some thermal distress, and
Temperature is measured in degrees Fahrenheit. The numbers in parentheses are heteroskedasticity-robust
standard errors.
Interpret the equation. Why do you think that heteroskedasticity-robust standard errors were used? What is
your prediction for some O-ring thermal distress when the temperature is 31, the temperature on January 28,
1986? Above which temperature do you predict values of less than zero? Below which temperature do you
predict values of greater than one?
(c) To fix the problem encountered in (b), you re-estimate the relationship using a logit regression:
$\Pr(OFail = 1 \mid Temperature) = F(\underset{(7.329)}{15.297} - \underset{(0.107)}{0.236}\, Temperature)$; pseudo-$R^2 = 0.297$
What is the meaning of the slope coefficient? Calculate the effect of a decrease in temperature from 80 to 70,
and from 60 to 50. Why is the change in probability not constant? How does this compare to the linear
probability model?
(d) You want to see how sensitive the results are to using the logit, rather than the probit estimation method.
The probit regression is as follows:
Pr(OFail = 1 | Temperature) =
Why is the slope coefficient in the probit so different from the logit coefficient? Calculate the effect of a decrease
in temperature from 80 to 70, and from 60 to 50, and compare the resulting changes in probability to your
results in (c). What is the meaning of the pseudo-$R^2$? What other measures of fit might you want to consider?
(e) Calculate the predicted probability for 80 and 40, using your probit and logit estimates. Based on the
relationship between the probabilities, sketch what the general relationship between the logit and probit
regressions is. Does there seem to be much of a difference for values other than these extreme values?
(f) You decide to run one more regression, where the dependent variable is the actual number of incidents
(NoOFail). You allow for a different functional form by choosing the inverse of the temperature, and estimate
the regression by OLS.
$NoOFail = \underset{(1.516)}{-3.8853} + \underset{(106.541)}{295.545}\, (1/Temperature)$; $R^2 = 0.386$, $SER = 0.622$
What is your prediction for O-ring failures for the 31° temperature which was forecasted for the launch on
January 28, 1986? Sketch the fitted line of the regression above.
Answer: (a) There does not appear to be a linear relationship underlying the few observations where O-ring
failure occurred. If estimated by OLS, you would expect a slightly negative relationship (the slope turns
out to be -0.025). It certainly would not be statistically significant using the t-statistic (although a
standard normal distribution cannot be used given the small sample size). Using a linear function is also
a problem since, even in the presence of a significant slope, the dependent variable cannot be less than
zero.
(b) There is a negative relationship between the temperature and the occurrence of an O-ring failure. At
high temperatures, say above 75 degrees, there is less than a 10 percent chance of O-ring failure.
As was mentioned in the textbook, the errors of the linear probability model are always heteroskedastic.
It is therefore necessary to use heteroskedasticity-robust standard errors for inference. The linear
probability model predicts O-ring failure with certainty for temperatures below 50 degrees. The
prediction for 31 degrees is therefore above one (1.7). The model predicts negative values for
temperatures above 77 degrees Fahrenheit.
(c) The slope coefficient is negative. Hence increases in temperature result in a lowering of the
probability of O-ring failures. Beyond that, neither the slope nor the intercept is easy to interpret. The
decrease in temperature from 80 to 70 results in an increase in the probability of 20.0 percentage points, and
from 60 to 50 in an increase in the probability of 21.3 percentage points. The change in probability is not
constant since this is a nonlinear model. In the linear probability model the change in probability would
remain constant, being 37 percentage points (10 × 0.037) in the above example.
(d) The slope coefficients should not be directly compared, since the functions are different. This does
not imply that the calculated probabilities are not similar between using the logit and probit model. For
example, the decrease in temperature from 80 to 70 results in an increase in the probability of 22.5
percentage points, and from 60 to 50 in an increase in the probability of 22.8 percentage points. The
pseudo-$R^2$ calculates the increase in the likelihood function by using temperature compared to the case
where no explanatory variable is used. An alternative measure of fit is the fraction correctly predicted.
(e) There is little difference between the logit and probit predictions, other than in the extremes. For 80,
the logit and probit predicted values are 2.7 and 2.0 percent respectively, and at 40, they are 99.7
percent and 99.9 percent. Hence the logit is slightly higher at high temperatures and slightly lower at
low temperatures. However, the difference is very small.
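The linear probability and logit calculations in parts (b), (c), and (f) can be reproduced directly from the reported equations. This is a sketch under one assumption: the minus signs on the temperature coefficients are taken from the negative relationship described in the answers. The probit equation is not reproduced in the text, so it is left out, and the function names are illustrative.

```python
import math

def lpm(temp):
    """Linear probability model from part (b)."""
    return 2.858 - 0.037 * temp

def logit(temp):
    """Logit model from part (c): logistic CDF of the linear index."""
    z = 15.297 - 0.236 * temp
    return 1.0 / (1.0 + math.exp(-z))

def number_of_failures(temp):
    """OLS regression on the inverse of temperature from part (f)."""
    return -3.8853 + 295.545 * (1.0 / temp)

# Part (b): prediction at 31 degrees and the temperatures where the LPM leaves [0, 1]
lpm_at_31 = lpm(31)
temp_where_lpm_hits_zero = 2.858 / 0.037
temp_where_lpm_hits_one = (2.858 - 1.0) / 0.037

# Part (c): changes in predicted probability for a 10-degree drop
logit_change_80_to_70 = logit(70) - logit(80)
logit_change_60_to_50 = logit(50) - logit(60)
lpm_change_per_10_degrees = lpm(70) - lpm(80)

# Part (f): predicted number of incidents at 31 degrees
failures_at_31 = number_of_failures(31)

print(f"LPM at 31: {lpm_at_31:.3f}")
print(f"LPM below 0 above {temp_where_lpm_hits_zero:.1f}, above 1 below {temp_where_lpm_hits_one:.1f}")
print(f"logit change 80->70: {logit_change_80_to_70:.3f}, 60->50: {logit_change_60_to_50:.3f}")
print(f"LPM change per 10 degrees: {lpm_change_per_10_degrees:.3f}")
print(f"predicted number of incidents at 31: {failures_at_31:.2f}")
```

The logit changes match the 20.0 and 21.3 percentage points reported in answer (c), while the linear probability model implies the same change for any 10-degree drop.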
3) A study tried to find the determinants of the increase in the number of households headed by a female. Using
1940 and 1960 historical census data, a logit model was estimated to predict whether a woman is the head of a
household (living on her own) or whether she is living within another's household. The limited dependent
variable takes on a value of one if the female lives on her own and is zero if she shares housing. The results for
1960 using 6,051 observations on prime-age whites and 1,294 on nonwhites were as shown in the table:
Regression model              (1) White: Logit       (2) Nonwhite: Logit
Constant                       1.459   (0.685)        -2.874   (1.423)
Age                           -0.275   (0.037)         0.084   (0.068)
age squared                    0.00463 (0.00044)       0.00021 (0.00081)
education                     -0.171   (0.026)        -0.127   (0.038)
farm status                   -0.687   (0.173)        -0.498   (0.346)
South                          0.376   (0.098)        -0.520   (0.180)
expected family earnings       0.0018  (0.00019)       0.0011  (0.00024)
family composition             4.123   (0.294)         2.751   (0.345)
Pseudo-R2                      0.266                   0.189
Percent Correctly Predicted    82.0                    83.4
where age is measured in years, education is years of schooling of the family head, farm status is a binary variable
taking the value of one if the family head lived on a farm, south is a binary variable for living in a certain region
of the country, expected family earnings was generated from a separate OLS regression to predict earnings from a
set of regressors, and family composition refers to the number of family members under the age of 18 divided by
the total number in the family.
The mean values for the variables were as shown in the table.
Variable
age
age squared
education
farm status
south
expected family
earnings
family composition
0.2
0.3
(a) Interpret the results. Do the coefficients have the expected signs? Why do you think age was entered both in
levels and in squares?
(b) Calculate the difference in the predicted probability between whites and nonwhites at the sample mean
values of the explanatory variables. Why do you think the study did not combine the observations and allowed
for a nonwhite binary variable to enter?
(c) What would be the effect on the probability of a nonwhite woman living on her own, if education and family
composition were changed from their current mean to the mean of whites, while all other variables were left
unchanged at the nonwhite mean values?
Answer: (a) Since these are logit estimates, the value of the coefficients cannot be interpreted easily. However,
statements can be made about the direction of the relationship between the dependent variable and the
regressors. The probability of a woman living on her own decreases with an increase in
years of education. Living on a farm also lowers the probability. These results hold both for whites
and nonwhites. In addition, given the reported coefficients, for whites the probability of living on her own
decreases up to a point with age, but then increases. This is the result of age entering both as a level and as a square. This
relationship with regard to age is not statistically significant for nonwhites. In the south, white females
are more likely to live on their own, but nonwhites are not. An increase in expected family earnings and
family composition increase the probability of females living on their own.
(b) For whites, the probability is 0.90, while for nonwhites, it is 0.88. In the above approach, all
coefficients are allowed to vary, whereas in a combined sample, the coefficients on the variables other
than the binary race variable would have to be identical.
(c) The probability would increase to 0.81.
4) A study investigated the impact of house price appreciation on household mobility. The underlying idea was
that if a house were viewed as one part of the household's portfolio, then changes in the value of the house,
relative to other portfolio items, should result in investment decisions altering the current portfolio. Using 5,162
observations, the logit equation was estimated as shown in the table, where the limited dependent variable is
one if the household moved in 1978 and is zero if the household did not move:
Regression model     Logit coefficient (standard error)
constant             -3.323 (0.180)
Male                 -0.567 (0.421)
Black                -0.954 (0.515)
Married78             0.054 (0.412)
marriage change       0.764 (0.416)
A7983                -0.257 (0.921)
PNRN                 -4.545 (3.354)
Pseudo-R2             0.016
where male, black, married78, and marriage change are binary variables. They indicate, respectively, if the entity
was a male-headed household, a black household, was married, and whether a change in marital status
occurred between 1977 and 1978. A7983 is the appreciation rate for each house from 1979 to 1983 minus the
SMSA-wide rate of appreciation for the same time period, and PNRN is a predicted appreciation rate for the
unit minus the national average rate.
(a) Interpret the results. Comment on the statistical significance of the coefficients. Do the slope coefficients
lend themselves to easy interpretation?
(b) The mean values for the regressors are as shown in the accompanying table.
Variable             Mean
male                 0.82
black                0.09
married78            0.78
marriage change      0.03
A7983                0.003
PNRN                 0.007
Taking the coefficients at face value and using the sample means, calculate the probability of a household
moving.
(c) Given this probability, what would be the effect of a decrease in the predicted appreciation rate of 20
percent, that is A7983 = 0.20?
Answer: (a) Since the logit model is nonlinear, the slope coefficients cannot be easily interpreted. However, the
signs of the coefficients indicate the direction of the relationship between the regressors and the binary
dependent variable. Accordingly, being married or having experienced a marriage change increases the
probability of moving. A male-headed household or a black household is less likely to move. If the
predicted appreciation rate relative to the national average increased, then the household is less likely to
move. The same holds for the actual appreciation rate from 1979 to 1983. None of the slope coefficients
are statistically significant with the exception of the black household and marriage change coefficients.
The two t-statistics are 1.85 and 1.84 respectively. These would be statistically significant at the 5%
level of a one-sided hypothesis test.
(b) The probability is 0.021.
(c) The resulting probability would be 0.051, i.e., more than twice the value in the previous result.
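As a rough check on parts (b) and (c), the sketch below evaluates the logistic function at the sample means. It assumes that the garbled A7983 coefficient reads -0.257 and that part (c) lowers PNRN, the predicted appreciation rate, by 0.20; under those assumptions the quoted answers of 0.021 and 0.051 are reproduced. The dictionary keys and function names are illustrative only.

```python
import math

# Logit coefficients and sample means as printed in the tables above.
# Assumption: the A7983 coefficient, printed as "-0257", is -0.257.
coef = {"male": -0.567, "black": -0.954, "married78": 0.054,
        "marriage_change": 0.764, "A7983": -0.257, "PNRN": -4.545}
constant = -3.323
means = {"male": 0.82, "black": 0.09, "married78": 0.78,
         "marriage_change": 0.03, "A7983": 0.003, "PNRN": 0.007}

def logistic(z):
    """Cumulative logistic distribution function F(z)."""
    return 1.0 / (1.0 + math.exp(-z))

def move_probability(x):
    """Predicted probability of moving for regressor values x."""
    z = constant + sum(coef[name] * x[name] for name in coef)
    return logistic(z)

p_mean = move_probability(means)

# Assumption: part (c) is read as PNRN falling by 0.20 from its mean.
lower_pnrn = dict(means, PNRN=means["PNRN"] - 0.20)
p_lower = move_probability(lower_pnrn)

print(round(p_mean, 3), round(p_lower, 3))
```

Because the logit function is nonlinear, the same 0.20 change would produce a different effect at other starting values of the regressors.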
5) A study analyzed the probability of Major League Baseball (MLB) players to survive for another season, or,
in other words, to play one more season. The researchers had a sample of 4,728 hitters and 3,803 pitchers for
the years 1901-1999. All explanatory variables are standardized. The probit estimation yielded the results as shown
in the table:
Regression model            (1) Hitters        (2) Pitchers
                            probit             probit
constant                     2.010 (0.030)      1.625 (0.031)
number of seasons played    -0.058 (0.004)     -0.031 (0.005)
performance                  0.794 (0.025)      0.677 (0.026)
average performance          0.022 (0.033)      0.100 (0.036)
where the limited dependent variable takes on a value of one if the player had one more season (a minimum of
50 at bats or 25 innings pitched), number of seasons played is measured in years, performance is the batting average
for hitters and the earned run average for pitchers, and average performance refers to performance over the
career.
(a) Interpret the two probit equations and calculate survival probabilities for hitters and pitchers at the sample
mean. Why are these so high?
(b) Calculate the change in the survival probability for a player who has a very bad year by performing two
standard deviations below the average (assume also that this player has been in the majors for many years so
that his average performance is hardly affected). How does this change the survival probability when
compared to the answer in (a)?
(c) Since the results seem similar, the researcher could consider combining the two samples. Explain in some
detail how this could be done and how you could test the hypothesis that the coefficients are the same.
Answer: (a) Note that all variables are standardized, so that the mean is zero. This results in a survival probability
of 0.997 for hitters and 0.991 for pitchers. These results are so high because there is a high probability, in
general, for a player to return the following season.
(b) Since the variables are standardized, this implies a change of two for the performance variable. The
result for hitters is a lowering of the survival probability to 0.65, and for pitchers to 0.633.
(c) After combining the sample for hitters and pitchers, you would allow for a different intercept and
slopes by introducing a binary variable for pitchers if hitters are the default. This binary variable would
be introduced by itself and in combination with each of the above variables, thereby allowing all
coefficients to differ. You could then conduct an F-test for the joint hypothesis that all coefficients
involving the binary variables are zero. If the hypothesis cannot be rejected, then there is no difference
between the coefficients for hitters and pitchers.
6) The logit regression (11.10) on page 393 of your textbook reads:
Pr(deny=1|P/Iratio,black) = F(-4.13 + 5.37 P/Iratio + 1.27 black)
(a) Using a spreadsheet program such as Excel, plot the following logistic regression function with a single X,
$\hat{Y}_i = \frac{1}{1 + e^{-(\hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i})}}$,
where $\hat{\beta}_0 = -4.13$, $\hat{\beta}_1 = 5.37$, $\hat{\beta}_2 = 1.27$. Enter values for X1 in the first column
starting from 0 and then increment these by 0.1 until you reach 2.0. Let X2 be 0 at first. Then enter the logistic
function formula in the next column. Next allow X2 to be 1 and calculate the new values for the logistic
function in the third column. Finally produce the predicted probabilities for both blacks and whites, connecting
the predicted values with a line.
(b) Using the same spreadsheet calculations, list how the probability increases for blacks and for whites
b. The deny probability increases by 9.7 percentage points for whites, and by 13.3
percentage points for blacks.
c. At a P/I value of 0.5, the difference is approximately 30%, while it is 20% for the higher value. As the
ratio increases, the probability that everyone gets rejected increases and approaches 1, regardless of race.
d. In that case you would have to hold the other explanatory variables constant. A simple solution
would be to set all of these to zero. A more reasonable approach would be to set them to their sample
average if they are continuous variables, and to set them either to 0 or 1 for binary variables.
7) Equation (11.3) in your textbook presents the regression results for the linear probability model.
a.
Using a spreadsheet program such as Excel, plot the fitted values for whites and blacks in the same
graph, for P/I ratios ranging from 0 to 1 (use 0.05 increments).
b.
Explain some of the strengths and shortcomings of the linear probability model using this graph.
Answer: a. (Graph not reproduced.)
b. The strength is that the regression line is easy to interpret once you realize that the fitted values are
probabilities of being denied a loan: increases in the P/I ratio of 10 percentage points increase the
probability of being denied by roughly 6 percentage points. The role of the binary variable for blacks also
becomes clear: blacks have a roughly 18 percentage point higher probability of being rejected for a loan
when compared to whites, at any given level of a P/I ratio. As for shortcomings, it becomes clear that this
model cannot be used to calculate the probability of rejection for whites with a P/I ratio less than
approximately 20 percent. In that case, the predicted probability would be negative. Similarly, you
would expect the probability increase for a given change in the P/I ratio to change as the P/I ratio
becomes larger; this is not the case for the linear probability model. Furthermore, you will find values
larger than 1 for the P/I ratio in the data set used for Chapter 11. As a result, the predicted probability of
being rejected for a loan would be above 1 for some individuals, which does not make sense.
8) Equation (11.3) in your textbook presents the regression results for the linear probability model, and equation
(11.10) the results for the logit model.
a.
Using a spreadsheet program such as Excel, plot the predicted probabilities for being denied a loan for
both the linear probability model and the logit model if you are black. (Use a range from 0 to 1 for the
P/I Ratio and allow for it to increase by increments of 0.05.)
b.
Given the shortcomings of the linear probability model, do you think that it is a reasonable
approximation to the logit model?
c.
Answer: a. (Graph not reproduced.)
b. The predicted probabilities are actually quite close for P/I Ratio values between 0 and 0.5. Beyond that,
the linear probability model predicts substantially lower rejection probabilities.
c.
Here the shortcomings of the linear probability model become obvious for P/I Ratio values of
less than approximately 0.2: the predicted probabilities become negative. However, for values
of between 0.2 and 0.7, the predicted probabilities of both models are approximately the same,
so that the linear probability model would work well as an approximation.
3) You have a limited dependent variable (Y) and a single explanatory variable (X). You estimate the relationship
using the linear probability model, a probit regression, and a logit regression. The results are as follows:
$\hat{Y} = 2.858 - 0.037 X$ (standard error of the slope: 0.007)
$\Pr(Y = 1|X) = F(15.297 - 0.236 X)$
$\Pr(Y = 1|X) = \Phi(8.900 - 0.137 X)$ (standard error of the slope: 0.058)
(a) Although you cannot compare the coefficients directly, you are told that it can be shown that certain
relationships between the coefficients of these models hold approximately. These are for the slope:
$\hat{\beta}_{probit} \approx 0.625 \times \hat{\beta}_{Logit}$, $\hat{\beta}_{linear} \approx 0.25 \times \hat{\beta}_{Logit}$.
Take the logit result above as a base and calculate the slope coefficients for
the linear probability model and the probit regression. Are these values close?
(b) For the intercept, the same conversion holds for the logit-to-probit transformation. However, for the linear
probability model, there is a different conversion:
$\hat{\beta}_{0,linear} \approx 0.25 \times \hat{\beta}_{0,Logit} + 0.5$
Using the logit regression as the base, calculate a few changes in X (temperature in degrees of Fahrenheit) to
see how good the approximations are.
Answer: (a) $\hat{\beta}_{probit} \approx 0.625 \times (-0.236) = -0.148$, which is quite close to the estimated slope, judging by its standard
deviation. $\hat{\beta}_{linear} \approx 0.25 \times (-0.236) = -0.059$ is close numerically, but not as close when you take into account
the small standard deviation.
(b) The approximation gives a probit intercept of 9.561 and a linear approximation of 4.324.
Temperature X     Probit model: actual     Probit model: approximation
30                1                        1
40                1                        1
50                0.98                     0.98
60                0.75                     0.75
70                0.25                     0.21
80                0.02                     0.01
In terms of calculated probabilities, the approximation is closer for the probit model than for the linear
probability model.
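The probit columns of the table can be recomputed with a few lines of code. The sketch below is illustrative: it takes the approximated probit intercept and slope as quoted in the answer (9.561 and -0.148, with the slope sign taken from the estimated equations) and builds the standard normal c.d.f. from `math.erf`. The names `norm_cdf` and `rows` are not from the text.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Estimated probit coefficients from the question.
probit_b0, probit_b1 = 8.900, -0.137

# Logit-based approximation (0.625 times the logit coefficients),
# rounded as quoted in the answer.
approx_b0, approx_b1 = 9.561, -0.148

rows = []
for temperature in range(30, 90, 10):
    actual = norm_cdf(probit_b0 + probit_b1 * temperature)
    approx = norm_cdf(approx_b0 + approx_b1 * temperature)
    rows.append((temperature, round(actual, 2), round(approx, 2)))

for row in rows:
    print(row)
```

The two columns differ most where the c.d.f. is steepest, here around 70 degrees, since small differences in the index matter most in that range.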
4) The population logit model of the binary dependent variable Y with a single regressor is
$\Pr(Y = 1|X_1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1)}}$
Logistic functions also play a role in econometrics when the dependent variable is not a binary variable. For
example, the demand for televisions sets per household may be a function of income, but there is a saturation
or satiation level per household, so that a linear specification may not be appropriate. Given the regression
model
$Y_i = \frac{\beta_0}{1 + \beta_1 e^{-\beta_2 X_i}} + u_i$,
sketch the regression line. How would you go about estimating the coefficients?
Answer: The equation cannot be estimated using linear methods or transformations that allow linearization.
However, nonlinear least squares estimation is possible as described in section 11.3 of the textbook.
Some students may point out that $\beta_0$ will give an estimate of the satiation level (perhaps 10 TVs per
household), and that the point of inflection is at $X = \frac{1}{\beta_2} \ln \beta_1$.
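To make the idea of nonlinear least squares concrete, the following sketch fits the saturation curve above to hypothetical, noise-free data generated with $\beta_0 = 10$, $\beta_1 = 20$, $\beta_2 = 0.8$. It uses a simple compass (pattern) search rather than Gauss-Newton: try a step in each parameter direction, keep it if the sum of squared prediction mistakes falls, and halve the step length otherwise. The data, starting values, and function names are all assumptions made for illustration.

```python
import math

def curve(x, b0, b1, b2):
    """Logistic saturation curve b0 / (1 + b1 * exp(-b2 * x))."""
    return b0 / (1.0 + b1 * math.exp(-b2 * x))

def ssr(params, xs, ys):
    """Sum of squared prediction mistakes; infinite outside the allowed region."""
    b0, b1, b2 = params
    if b1 <= 0.0:  # keep the denominator positive
        return float("inf")
    try:
        return sum((y - curve(x, b0, b1, b2)) ** 2 for x, y in zip(xs, ys))
    except OverflowError:
        return float("inf")

def compass_search(start, xs, ys, step=0.5, tol=1e-6, max_iter=5000):
    """Move one parameter at a time; shrink the step when nothing improves."""
    params = list(start)
    best = ssr(params, xs, ys)
    for _ in range(max_iter):
        improved = False
        for j in range(len(params)):
            for direction in (1.0, -1.0):
                trial = list(params)
                trial[j] += direction * step
                value = ssr(trial, xs, ys)
                if value < best:
                    params, best = trial, value
                    improved = True
        if not improved:
            step /= 2.0
            if step < tol:
                break
    return params, best

# Hypothetical data: X = 0, 1, ..., 10 with Y on the true curve.
xs = [float(x) for x in range(11)]
ys = [curve(x, 10.0, 20.0, 0.8) for x in xs]

# Two starting points, as a guard against stopping in a local minimum.
starts = [(8.0, 10.0, 0.5), (12.0, 30.0, 1.0)]
results = [compass_search(s, xs, ys) for s in starts]
estimate, final_ssr = min(results, key=lambda r: r[1])
start_ssr = min(ssr(s, xs, ys) for s in starts)

print("estimate:", estimate)
print("implied inflection point:", math.log(estimate[1]) / estimate[2])
```

Statistical packages use faster derivative-based routines, but the logic of comparing the sum of squares before and after each step is the same.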
5) (Requires Appendix material) Briefly describe the difference between the following models: censored and
truncated regression model, count data, ordered responses, and discrete choice data. Try to be specific in terms
of describing the data involved.
Answer: The answer should follow the discussion in Appendix 11.3. Briefly: censored regression models have a
dependent variable that has been censored above or below a certain cutoff, such as in the case where
some individuals actually spend different amounts of money on an item, but others do not spend any
amount. An example is the tobit regression model. The difference to the truncated regression model is that
data is available for both types of individuals, buyers and non-buyers in the case of the censored model,
but only for buyers in the case of the truncated regression model. An example for these types of models
are expenditures by individuals. There are other examples in economics where sample selection bias
occurs, such as in the case of earnings functions (labor economics), industrial organization, and finance.
Count data involves a discrete dependent variable, such as the number of times an activity is performed.
Just as OLS does not perform well in the discrete dependent variable case, the same holds here, and
special methods (Poisson and negative binomial regression models) have been developed to deal with
the special format. Ordered response data resembles the count data situation, in that there is a natural
ordering. The difference is that there are no natural numerical values attached, such as is the case when
activity by individuals happens a discrete number of times during a certain period. The Federal Reserve
may decide to lower the federal funds rate or not, and conditionally on lowering it, it may decide on a
mild cut or a more severe cut. Ordered Probit Models have been developed for such situations. Finally,
discrete choice data also allows for multiple responses, but these are not ordered, such as when the
individual can decide on different modes of transportation. In addition to its use in transportation
economics, multinomial probit and logit regression models have been developed and applied in labor
economics and health economics.
6) (Requires Appendix material and Calculus) The logarithm of the likelihood function (L) for estimating the
population mean and variance for an i.i.d. normal sample is as follows (note that taking the logarithm of the
likelihood function simplifies maximization. It is a monotonic transformation of the likelihood function,
meaning that this transformation does not affect the choice of maximum):
$L = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \mu_Y)^2$
Derive the maximum likelihood estimator for the mean and the variance. How do they differ, if at all, from the
OLS estimator? Given that the OLS estimators are unbiased, what can you say about the maximum likelihood
estimators here? Is the estimator for the variance consistent?
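Answer: Taking the derivative with respect to the two parameters $\mu_Y$ and $\sigma^2$ results in
$\frac{\partial L}{\partial \mu_Y} = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(Y_i - \mu_Y)(-1) = \frac{1}{\sigma^2} \sum_{i=1}^{n} (Y_i - \mu_Y)$
$\frac{\partial L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \mu_Y)^2$.
The maximum likelihood estimator is then the value for $\mu_Y$ and $\sigma^2$ that maximizes the (log) likelihood
function. Setting both equations to zero, and assuming that this results in a maximum rather than a
minimum (second order conditions will not be discussed here), yields
$\hat{\mu}_{Y,MLE} = \frac{1}{n} \sum_{i=1}^{n} Y_i = \bar{Y}$ and
$\hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{\mu}_{Y,MLE})^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.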
The maximum likelihood estimator of the population mean is therefore the sample mean. Since the OLS
estimator is identical, and it is unbiased, the MLE will also be unbiased. However, the MLE for the
population variance differs from the OLS estimator, which divides by n - 1 rather than n, and since the
OLS estimator is unbiased, the MLE must be biased. But the difference between the two estimators
vanishes as n increases, and hence the MLE is consistent.
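A small numerical illustration may help here. The sketch below uses a made-up sample of six observations and compares the maximum likelihood variance estimator (divisor n) with the unbiased estimator (divisor n - 1); the sample values are hypothetical.

```python
import statistics

# Hypothetical i.i.d. sample.
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]
n = len(sample)

# MLE of the mean is the sample average.
mu_mle = sum(sample) / n

# MLE of the variance divides the sum of squared deviations by n.
sum_sq_dev = sum((y - mu_mle) ** 2 for y in sample)
var_mle = sum_sq_dev / n

# The unbiased estimator divides by n - 1 instead.
var_unbiased = sum_sq_dev / (n - 1)

# The ratio (n - 1) / n goes to one as n grows, which is why
# the MLE is biased in small samples but consistent.
ratio = var_mle / var_unbiased

print(mu_mle, var_mle, var_unbiased, ratio)
```

The standard library functions `statistics.pvariance` and `statistics.variance` compute the same two quantities and can serve as a cross-check.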
7) Besides maximum likelihood estimation of the logit and probit model, your textbook mentions that the model
can also be estimated by nonlinear least squares. Construct the sum of squared prediction mistakes and suggest
how computer algorithms go about finding the coefficient values that minimize the function. You may want to
use an analogy where you place yourself into a mountain range at night with a flashlight shining at your feet.
Your task is to find the lowest point in the valley. You have two choices to make: the direction you are walking
in and the step length. Describe how you will proceed to find the bottom of the valley. Once you find the
lowest point, is there any guarantee that this is the lowest point of all valleys? What should you do to assure
this?
Answer: In general, $\sum_{i=1}^{n} [Y_i - f(b_0 + b_1 X_{1i} + \dots + b_k X_{ki})]^2$ is the sum of squared prediction mistakes, whether
the function f() is linear or nonlinear. Nonlinear least squares then uses a sophisticated algorithm of trial
and error to find the minimum of the sum of squared prediction mistakes by changing the values of the
parameters. Some of the routines are called Newton-Raphson, Gauss-Newton, Method of Steepest
Descent, etc. What they have in common is the general principle that they evaluate the sum of squared
prediction mistakes after changing the parameters in a certain direction and by a certain size. In the analogy, the student is
lowered into a mountain range at night and her task is to find the lowest point of the valley. The rule
may be that she will walk in one direction as long as at the end of the step she is at a lower point than at
the beginning of the step. If not, then she should walk in a different direction. She is also allowed to
choose the step length. There is, of course, no guarantee that another point in another valley is not lower
than the one she found in the valley she is in, nor is she guaranteed to find the lowest point if she makes
very large steps. To assure that this is the lowest point, she should ask to be dropped off in a different
location (starting point) and see if she finds the same spot again. Finally, she should be warned that it
is possible to walk along ridges for a long time without much progress visible.
$\Pr(Y = 1|X) = \Phi(8.9 - 0.14 X)$
Calculate the change in probability for X increasing by 10 for X = 40 and X = 60. Why is there such a large
difference in the change in probabilities?
Answer: Pr(Y=1|X=40) = 0.999; Pr(Y=1|X=50) = 0.971; Pr(Y=1|X=60) = 0.691; Pr(Y=1|X=70) = 0.184. The large
differences happen as a result of the non-linearity of the function, and the points at which they are
calculated.
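The probabilities in this answer can be reproduced as follows. This is a minimal sketch that assumes the function in the question is the probit $\Phi(8.9 - 0.14X)$, which matches the quoted values; `norm_cdf` and `prob` are illustrative names.

```python
import math

def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob(x):
    """Predicted probability Pr(Y = 1 | X = x) for the probit index 8.9 - 0.14 x."""
    return norm_cdf(8.9 - 0.14 * x)

# Change in probability when X rises by 10, starting at 40 and at 60.
change_from_40 = prob(50) - prob(40)
change_from_60 = prob(70) - prob(60)

for x in (40, 50, 60, 70):
    print(x, prob(x))
print(change_from_40, change_from_60)
```

The same 10-unit increase in X lowers the probability by roughly 0.03 starting from X = 40 but by roughly 0.51 starting from X = 60, because the second change is evaluated where the c.d.f. is steep.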
9) Earnings equations establish a relationship between an individual's earnings and its determinants such as years
of education, tenure with an employer, IQ of the individual, professional choice, region within the country the
individual is living in, etc. In addition, binary variables are often added to test for discrimination against
certain sub-groups of the labor force such as blacks, females, etc. Compare this approach to the study in the
textbook, which also investigates evidence on discrimination. Explain the fundamental differences in both
approaches using equations and mathematical specifications whenever possible.
Answer: In the former case, the binary variable appears as a regressor. That is, the regression may be
$\ln(Earn_i) = \beta_0 + \beta_1 Educ_i + \beta_2 Exper_i + \beta_3 Binary_i + \dots + u_i$,
where earnings of an individual are explained by a set of attributes. Binary is a shift variable, which is
one for females (or blacks, religion, union members, etc.). The coefficient on the shift variable then
indicates whether or not the individual is treated differently, controlling for all other influences.
However, the dependent variable is continuous.
In the case of a limited dependent variable, it is the left-hand variable that is binary. Here behavior of a
qualitative type is being explained, i.e.,
$Binary_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + u_i$,
although some of the regressors may also be binary variables.
10) (Requires Appendix material and Calculus) The log of the likelihood function (L) for the simple regression
model with i.i.d. normal errors is as follows (note that taking the logarithm of the likelihood function simplifies
maximization. It is a monotonic transformation of the likelihood function, meaning that this transformation
does not affect the choice of maximum):
$L = -\frac{n}{2} \log(2\pi) - \frac{n}{2} \log \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$
Derive the maximum likelihood estimator for the slope and intercept. What general properties do these
estimators have? Explain intuitively why the OLS estimator is identical to the maximum likelihood estimator
here.
Answer: Maximizing the likelihood function with respect to the regression coefficients is the same as making the
third term as small as possible. However, this term will become the sum of squared residuals once the
function is maximized. Hence maximizing the likelihood function is identical to minimizing the sum of
squared residuals, and the two methods of choosing an estimator are therefore identical for the
regression coefficients.
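Taking the derivative of the log of the likelihood with respect to the three parameters
$\beta_0$, $\beta_1$ and $\sigma^2$ results in
$\frac{\partial L}{\partial \beta_0} = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(Y_i - \beta_0 - \beta_1 X_i)(-1)$
$\frac{\partial L}{\partial \beta_1} = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} 2(Y_i - \beta_0 - \beta_1 X_i)(-X_i)$
$\frac{\partial L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$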
Setting the equations to zero and solving for the three parameters then results in the maximum
likelihood estimator (MLE).
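$\sum_{i=1}^{n} (Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i) = 0$, or $\hat{\beta}_{0,MLE} = \bar{Y} - \hat{\beta}_{1,MLE} \bar{X}$.
$\sum_{i=1}^{n} (Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i)(X_i) = 0$, or, after multiplying through by $X_i$ and substituting $\hat{\beta}_{0,MLE}$,
$\hat{\beta}_{1,MLE} = \frac{\sum_{i=1}^{n} Y_i X_i - n \bar{X} \bar{Y}}{\sum_{i=1}^{n} X_i^2 - n \bar{X}^2}$.
$-\frac{n}{2\hat{\sigma}^2_{MLE}} + \frac{1}{2\hat{\sigma}^4_{MLE}} \sum_{i=1}^{n} (Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i)^2 = 0$, or
$\hat{\sigma}^2_{MLE} = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{\beta}_{0,MLE} - \hat{\beta}_{1,MLE} X_i)^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{u}_i^2$.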
The estimator for the regression slope and intercept is therefore identical to the OLS estimator. However,
the estimator for the error variance is different and biased. In general, MLEs are consistent. They are also
normally distributed in large samples.
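The closed-form expressions above can be evaluated directly. The following sketch uses five hypothetical observations to compute the slope and intercept (identical under OLS and maximum likelihood) and then contrasts the MLE of the error variance, which divides by n, with the degrees-of-freedom-adjusted estimator, which divides by n - 2. All data values and variable names are made up for illustration.

```python
# Hypothetical data for a simple regression of Y on X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope from the first-order conditions: (sum YX - n*Xbar*Ybar) / (sum X^2 - n*Xbar^2).
slope = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
    sum(x * x for x in xs) - n * x_bar ** 2
)
intercept = y_bar - slope * x_bar

# Residuals and the two variance estimators.
residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
ssr = sum(u * u for u in residuals)
sigma2_mle = ssr / n            # maximum likelihood: divide by n
sigma2_adjusted = ssr / (n - 2) # degrees-of-freedom adjusted: divide by n - 2

print(slope, intercept, sigma2_mle, sigma2_adjusted)
```

With only five observations the two variance estimates differ noticeably; the gap shrinks as n grows, in line with the consistency of the MLE.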
11) The estimated logit regression in your textbook is
Pr(deny=1|P/Iratio,black) = F(-4.13 + 5.37 P/Iratio + 1.27 black)
Using a spreadsheet program, such as Excel, generate a table with predicted probabilities for both whites and
blacks using P/I Ratio values between 0 and 1 and increments of 0.05.
Answer:
P/I Ratio     whites     blacks
0.00          0.02       0.05
0.05          0.02       0.07
0.10          0.03       0.09
0.15          0.03       0.11
0.20          0.04       0.14
0.25          0.06       0.18
0.30          0.07       0.22
0.35          0.10       0.27
0.40          0.12       0.33
0.45          0.15       0.39
0.50          0.19       0.46
0.55          0.24       0.52
0.60          0.29       0.59
0.65          0.35       0.65
0.70          0.41       0.71
0.75          0.47       0.76
0.80          0.54       0.81
0.85          0.61       0.85
0.90          0.67       0.88
0.95          0.73       0.90
1.00          0.78       0.92
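For readers who prefer code to a spreadsheet, the table can be generated with the short sketch below. It uses only the logit coefficients stated in the question; the function and variable names are illustrative.

```python
import math

def deny_probability(pi_ratio, black):
    """Logit prediction F(-4.13 + 5.37 * P/I ratio + 1.27 * black)."""
    z = -4.13 + 5.37 * pi_ratio + 1.27 * black
    return 1.0 / (1.0 + math.exp(-z))

# P/I ratio from 0 to 1 in steps of 0.05, built from integers
# to avoid accumulating floating-point error.
table = []
for i in range(21):
    pi_ratio = i * 0.05
    whites = round(deny_probability(pi_ratio, 0), 2)
    blacks = round(deny_probability(pi_ratio, 1), 2)
    table.append((round(pi_ratio, 2), whites, blacks))

for row in table:
    print("%.2f  %.2f  %.2f" % row)
```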
6) The J-statistic
A) tells you if the instruments are exogenous.
B) provides you with a test of the hypothesis that the instruments are exogenous for the case of exact
identification.
C) is distributed $\chi^2_{m-k}$, where m-k is the degree of overidentification.
D) is distributed $\chi^2_{m-k}$, where m-k is the number of instruments minus the number of regressors.
Answer: C
7) In the case of the simple regression model $Y_i = \beta_0 + \beta_1 X_i + u_i$, i = 1, ..., n, when X and u are correlated, then
A) the OLS estimator is biased in small samples only.
B) OLS and TSLS produce the same estimate.
C) X is exogenous.
D) the OLS estimator is inconsistent.
Answer: D
8) The following will not cause correlation between X and u in the simple regression model:
A) simultaneous causality.
B) omitted variables.
C) irrelevance of the regressor.
D) errors in variables.
Answer: C
9) The distinction between endogenous and exogenous variables is
A) that exogenous variables are determined inside the model and endogenous variables are determined
outside the model.
B) dependent on the sample size: for n > 100, endogenous variables become exogenous.
C) depends on the distribution of the variables: when they are normally distributed, they are exogenous,
otherwise they are endogenous.
D) whether or not the variables are correlated with the error term.
Answer: D
10) The two conditions for a valid instrument are
A) $corr(Z_i, X_i) = 0$ and $corr(Z_i, u_i) \neq 0$.
B) $corr(Z_i, X_i) = 0$ and $corr(Z_i, u_i) = 0$.
C) $corr(Z_i, X_i) \neq 0$ and $corr(Z_i, u_i) = 0$.
D) $corr(Z_i, X_i) \neq 0$ and $corr(Z_i, u_i) \neq 0$.
Answer: C
11) Instrument relevance
A) means that the instrument is one of the determinants of the dependent variable.
B) is the same as instrument exogeneity.
C) means that some of the variance in the regressor is related to variation in the instrument.
D) is not possible since X and u are correlated and Z and u are not correlated.
Answer: C
12) Consider a competitive market where the demand and the supply depend on the current price of the good.
Then fitting a line through the quantity-price outcomes will
A) give you an estimate of the demand curve.
B) estimate neither a demand curve nor a supply curve.
C) enable you to calculate the price elasticity of supply.
D) give you the exogenous part of the demand in the first stage of TSLS.
Answer: B
13) When there is a single instrument and single regressor, the TSLS estimator for the slope can be calculated as
follows:
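A) $\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_{ZX}}$.
B) $\hat{\beta}_1^{TSLS} = \frac{s_{XY}}{s_X^2}$.
C) $\hat{\beta}_1^{TSLS} = \frac{s_{ZX}}{s_{ZY}}$.
D) $\hat{\beta}_1^{TSLS} = \frac{s_{ZY}}{s_Z^2}$.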
Answer: A
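The ratio in answer A can be checked against an explicit two-stage calculation. The sketch below uses a tiny made-up data set with one instrument Z and one regressor X, and confirms that the ratio of sample covariances equals the slope from regressing Y on the first-stage fitted values. The data and helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sample_cov(a, b):
    """Sample covariance with divisor n - 1."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (len(a) - 1)

# Hypothetical observations on the instrument, the regressor, and the outcome.
z = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [2.0, 3.0, 5.0, 4.0, 6.0]
y = [3.0, 5.0, 7.0, 8.0, 12.0]

# Single instrument, single regressor: ratio of sample covariances.
beta_tsls = sample_cov(z, y) / sample_cov(z, x)

# Explicit two stages: regress X on Z, then Y on the fitted values of X.
pi1 = sample_cov(z, x) / sample_cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]
beta_two_stage = sample_cov(x_hat, y) / sample_cov(x_hat, x_hat)

print(beta_tsls, beta_two_stage)
```

The two numbers agree up to floating-point error, although the second-stage OLS standard errors would not be the correct TSLS standard errors.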
14) The TSLS estimator is
A) consistent and has a normal distribution in large samples.
B) unbiased.
C) efficient in small samples.
D) F-distributed.
Answer: A
15) The reduced form equation for X
A) regresses the endogenous variable X on the smallest possible subset of regressors.
B) relates the endogenous variable X to all the available exogenous variables, both those included in the
regression of interest and the instruments.
C) uses the predicted values of X from the first stage as a regressor in the original equation.
D) uses smaller standard errors, such as homoskedasticity-only standard errors, for inference.
Answer: B
16) When calculating the TSLS standard errors
A) you do not have to worry about heteroskedasticity, since it was eliminated in the first stage
B) you can use the standard errors reported by OLS estimation of the second stage regression.
C) the critical values from the standard normal table should be adjusted for the proper degrees of freedom.
D) you should use heteroskedasticity-robust standard errors.
Answer: D
17) Having more relevant instruments
A) is a problem because instead of being just identified, the regression now becomes overidentified.
B) is like having a larger sample size in that the more information is available for use in the IV regressions.
C) typically results in larger standard errors for the TSLS estimator.
D) is not as important for inference as having the same number of endogenous variables as instruments.
Answer: B
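B) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})}{\frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})}$.
C) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z}) u_i}{\frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})^2}$.
D) $(\hat{\beta}_1^{TSLS} - \beta_1) = \frac{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X}) u_i}{\frac{1}{n} \sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})}$.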
Answer: A
20) If the instruments are not exogenous,
A) you cannot perform the first stage of TSLS.
B) then, in order to conduct proper inference, it is essential that you use heteroskedasticity-robust standard
errors.
C) your model becomes overidentified.
D) then TSLS is inconsistent.
Answer: D
21) In the case of exact identification
A) you can use the J-statistic in a test of overidentifying restrictions.
B) you cannot use TSLS for estimation purposes.
C) you must rely on your personal knowledge of the empirical problem at hand to assess whether the
instruments are exogenous.
D) OLS and TSLS yield the same estimate.
Answer: C
28) For W to be an effective control variable in IV estimation, the following condition must hold
A) $E(u_i) = 0$
B) $E(u_i|Z_i, W_i) = E(u_i|W_i)$
C) $E(u_i u_j) \neq 0$
D) there must be an intercept in the regression
Answer: B
29) The IV estimator can be used to potentially eliminate bias resulting from
A) multicollinearity.
B) serial correlation.
C) errors in variables.
D) heteroskedasticity.
Answer: C
30) Instrumental Variables regression uses instruments to
A) establish the Mozart Effect.
B) increase the regression $R^2$.
C) eliminate serial correlation.
D) isolate movements in X that are uncorrelated with u.
Answer: D
31) Endogenous variables
A) are correlated with the error term.
B) always appear on the LHS of regression functions.
C) cannot be regressors.
D) are uncorrelated with the error term.
Answer: A
32) Consider the following two equations to describe labor markets in various sectors of the economy
$N^d = \beta_0 + \beta_1 \frac{W}{P} + u$
$N^s = \gamma_0 + \gamma_1 \frac{W}{P} + v$
$N^d = N^s = N$
A) W/P is exogenous, N is endogenous
B) Both N and W/P are endogenous
C) N is exogenous, W/P is endogenous
D) the parameters cannot be estimated because it would require two equations to be estimated at the same
time (simultaneously)
Answer: B
$\chi^2_{m-k}$.
Calculating the J-statistic amounts to comparing different IV estimates. In the case of two instruments
and one endogenous regressor, where the degree of overidentification is one, two such estimates exist.
Due to sample variation, these estimates will differ, although they should be similar, or close to each
other. If one or both of the instruments is not exogenous, then the estimates will not be similar, or the
difference between the two will be sufficiently large so as not to be the result of pure sampling variation.
In this situation the null hypothesis will be rejected. This procedure can only be executed when the
coefficients are overidentified, since there is no comparison possible for the case of exactly identified
coefficients. Passing the test is not sufficient for the instruments to be valid since, in addition to being
exogenous, they must also be relevant, i.e., they must be correlated with the endogenous regressor.
2) Using some of the examples from your textbook, describe econometric studies which required instrumental
variable techniques. In each case emphasize why the need for instrumental variables arises and how authors
have approached the problem. Make sure to include a discussion of overidentification, the validity of
instruments, and testing procedures in your essay.
Answer: The textbook mentions several studies which used instrumental variable estimation techniques, starting
with Wright's problem to estimate demand and supply elasticities on animal and vegetable oils and
fats. This is a case of simultaneous causality bias since the price and quantity in the market are
determined by both the supply and demand for the commodity. Wright used the weather, which shifted
the supply curve only and thereby traced out the demand curve. Since there was only a single
instrument, the coefficients are exactly identified, and the validity of the instrument cannot be tested.
Another example mentioned is the effect of class size on test scores. The reason for a correlation between
class size and the error term potentially stems from omitted variable bias here, such as the quality of the
teaching staff and outside opportunities for some of the students. In the hypothetical example of an
earthquake, some schools may receive more students than usual dependent on the closeness to the
epicenter, if the school was unaffected structurally. The increase in class size is related to the closeness to
the epicenter, but this distance should be uncorrelated with the ability of the teaching staff and the
outside opportunities. As in the previous study, there is only a single instrument and hence no
possibility to use the overidentification test.
The primary example of instrumental variable estimation in the chapter involves estimation of the
demand elasticity for cigarettes. Due to simultaneity bias for the demand equation, sales taxes are used
as an instrument first in a cross section of states in a single year and later in a panel. Prices and quantities
are determined simultaneously by supply and demand, and as a result, prices will be correlated with the
error term in the demand equation. Sales taxes are fairly highly correlated with prices, explaining almost
half of the variation in these. It is argued that sales taxes are exogenous because they differ across states
as a result of political choices about public finance. Only one instrument is used in the cross
section and hence there is no degree of overidentification. Later another instrument is introduced,
cigarette-specific taxes. With two instruments and one endogenous regressor, the J-statistic can be
computed for the overidentifying restrictions test.
Further examples discussed in the textbook include the effect of an increase in the prison population on
crime rates, further discussion of class size and test scores, and aggressive treatment of heart attacks and
the potential for saving lives.
3) Describe the consequences of estimating an equation by OLS in the presence of an endogenous regressor. How
can you overcome these obstacles? Present an alternative estimator and state its properties.
Answer: In the case of an endogenous regressor, there is correlation between the variable and the error term. In
this case, the OLS estimator is inconsistent. To get a consistent estimator in this situation, instrumental
variable techniques, such as TSLS, should be used. If one or more valid instruments can be found,
meaning that the instrument must be relevant and exogenous, then a consistent estimator can be
derived. The relevance of instruments can be tested using the rule of thumb (a first-stage F-statistic of
more than 10 in the TSLS estimator). The exogeneity of the instruments can be tested using the J-statistic.
The test requires that there is at least one more instrument than endogenous regressors, i.e., that the
equation is overidentified. In large samples the sampling distribution of the TSLS estimator is
approximately normal, so that statistical inference can proceed as usual using the t-statistic, confidence
intervals, or joint hypothesis tests involving the F-statistic. However, inference based on these statistics
will be misleading in the case where instruments are not valid.
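The contrast between the two estimators can be illustrated with a small simulation. This is only a sketch under assumed values: the data-generating process, the true slope of 2.0, the sample size, and the names `cov` and `simulate` are all hypothetical and not taken from the textbook. With a single instrument, TSLS reduces to the ratio of the sample covariances s_ZY / s_ZX.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def simulate(n=20000, beta1=2.0, seed=0):
    """Draw data with an endogenous regressor; return (OLS, IV) slope estimates."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)            # instrument: relevant and exogenous by construction
        e = rng.gauss(0, 1)             # common shock that makes x endogenous
        xi = zi + e + rng.gauss(0, 1)   # regressor depends on the instrument and the shock
        ui = e + rng.gauss(0, 1)        # error term, correlated with x through e
        z.append(zi)
        x.append(xi)
        y.append(1.0 + beta1 * xi + ui)
    ols = cov(x, y) / cov(x, x)         # converges to beta1 + cov(x, u) / var(x), not beta1
    iv = cov(z, y) / cov(z, x)          # TSLS with one instrument: s_ZY / s_ZX
    return ols, iv

ols_slope, iv_slope = simulate()
print(f"OLS slope: {ols_slope:.3f}   IV slope: {iv_slope:.3f}   (true value 2.0)")
```

In this assumed design cov(x, u) = 1 and var(x) = 3, so the OLS slope settles near 2.33 however large the sample, while the IV slope settles near 2.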
4) Write an essay about where valid instruments come from. Part of your explorations must deal with checking
the validity of instruments and what the consequences of weak instruments are.
Answer: In order for instruments to be valid, they have to be relevant and exogenous. To find valid instruments,
two approaches are typically used. First, economic theory can serve as a guide. In the case of
simultaneous causality in a market, for example, theory predicts shifts in one curve but not the other as a
result of changes in an instrumental variable. The second approach focuses on shifts in the endogenous
regressor that are caused by an exogenous source of variation resulting from a random
phenomenon. The textbook uses the example of an earthquake which changes student teacher ratios as
students in affected areas have to be redistributed.
To check the validity of instruments, there is the rule of thumb to determine whether or not an
instrument is weak. It states that the F-statistic in the first stage of the TSLS procedure should exceed 10.
Instrument exogeneity can be tested only in the case of overidentification. If there are more instruments
than endogenous regressors, then the J-statistic can be calculated. The null hypothesis of exogeneity will
be rejected, in essence, if the TSLS residuals are correlated with the instruments.
If instruments are weak, then the TSLS estimator is biased and statistical inference does not yield reliable
confidence intervals even in large samples.
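The rule of thumb can be made concrete with a first-stage regression on a single instrument. The sketch below is illustrative only: it uses the homoskedasticity-only F-statistic (which equals the squared t-statistic when there is one instrument), and the first-stage coefficients 1.0 and 0.02, the sample size, and the function names are assumptions, not values from the textbook.

```python
import random

def first_stage_f(x, z):
    """Homoskedasticity-only F-statistic for H0: slope = 0 in a regression of x on one instrument z."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    ssr = sum((xi - intercept - slope * zi) ** 2 for xi, zi in zip(x, z))
    se2 = (ssr / (n - 2)) / szz          # squared standard error of the slope
    return slope ** 2 / se2              # one restriction, so F equals t squared

def draw(pi1, n=500, seed=1):
    """Simulate a first stage x = pi1 * z + noise (hypothetical design)."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [pi1 * zi + rng.gauss(0, 1) for zi in z]
    return x, z

f_strong = first_stage_f(*draw(pi1=1.0))    # instrument strongly related to x
f_weak = first_stage_f(*draw(pi1=0.02))     # instrument barely related to x
print(f"strong instrument: F = {f_strong:.1f}")
print(f"weak instrument:   F = {f_weak:.1f}")
```

An F-statistic far above 10 for the first design and a much smaller one for the second is what the rule of thumb is meant to detect. In applied work a heteroskedasticity-robust F-statistic would typically be used instead.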
5) You have estimated a government reaction function, i.e., a multiple regression equation, where a government
instrument, say the federal funds rate, depends on past government target variables, such as inflation and
unemployment rates. In addition, you added the previous period's popularity deficit of the government, e.g.
the (approval rating of the president − 50%), as one of the regressors. Your idea is that the Federal Reserve,
although formally independent, will try to expand the economy if the president is unpopular. One of your
peers, a political science student, points out that approval ratings depend on the state of the economy and
thereby indirectly on government instruments. It is therefore endogenous and should be estimated along with
the reaction function. Initially you want to reply by using a phrase that includes the words "money neutrality"
but are worried about a lengthy debate. Instead you state that as an economist, you are not concerned about
government approval ratings, and that government approval ratings are determined outside your (the
economic) model. Does your whim make the regressor exogenous? Why or why not?
Answer: In general, the question of whether or not a variable is endogenous or exogenous depends on its
correlation with the error term, not on the size of the underlying model. The point to make is that just
because a variable is endogenous does not imply that its determinants have to be modeled. If the
purpose of the exercise is to eventually simulate the model for policy purposes, then the feedback
envisioned by the political science student is potentially important. However, if the aim is simply to
forecast the behavior of the government reaction function, then the issue of endogeneity or exogeneity is
only relevant for questions regarding the type of estimator to be used. Of course, if a regressor is
endogenous, then instrumental variable techniques must be used to ensure desirable properties of the
estimator.
6) You have been hired as a consultant to estimate the demand for various brands of coffee in the market. You are
provided with annual price data for two years by U.S. state and the quantities sold. You want to estimate a
demand function for coffee using this data. What problems do you think you will encounter if you estimated
the demand equation by OLS?
Answer: Answers will differ by student. However, the following points should be mentioned: (i) there will be
simultaneous equation bias because quantity and price are determined simultaneously in the market. (ii)
If this is the case, then the OLS estimator will not be consistent. (iii) In that case, IV estimation should be
used to get a consistent estimator of the demand elasticity or response to a price increase. (iv) This brings
up the question of a valid instrument. It is not clear that students will come up with an easy answer, but
their deliberations should be insightful. One possible instrument is the price (change) from a previous
year, which most likely will be highly correlated with this year's price (change) but not with the error
term in the equation. (v) There should be some discussion on the other factors determining coffee
demand, although some of these can be ignored if there is data for two periods and the data is
differenced (fixed effects).
Stock/Watson 2e -- CVC2 8/23/06 -- Page 298
7) Studies of the effect of minimum wages on teenage employment typically regress the teenage employment to
population ratio on the real minimum wage or the minimum wage relative to average hourly earnings using
OLS. Assume that you have a cross section of U.S. states for two years. Do you think that there are problems
with simultaneous equation bias?
Answer: For OLS not to be consistent, there would have to be omitted variable bias or simultaneous equation
bias. The former can be dealt with by differencing the data, if you assume that most other factors are
being held constant. If the minimum wage does not change between the two periods, i.e. it is constant,
then this will bring further problems with the interpretation, since the variation in the RHS variable only
comes from the denominator. In many ways, the question should come down to the correlation between
minimum wages and the error term in the equation. Students may argue that minimum wages are set by
the legislature or, more recently, by ballot, and are therefore exogenous. A more nuanced discussion may
point out that neither the legislature nor the electorate will raise minimum wages in time periods of low
employment (a recession), although the 2008 and 2009 raises will contradict this statement to some
extent; however, these were decided in 2006/2007 when the economy was booming). There may be
further problems because of the denominator of the minimum wage variable, either the CPI or AHE,
both of which are potentially correlated with teenage employment. The point here is for the student to
think about the problem at hand and to point out various obstacles to getting a good estimate of the
elasticity/response of employment from a minimum wage increase.
each year there are different temporary weather patterns ($v$, $w$) which result in a temperature $\tilde{X}$ different from
$X$. For the two years in your data set, the situation can be described as follows:
$\tilde{X}_{1997} = X + v_{1997}$ and $\tilde{X}_{1998} = X + w_{1998}$.
Subtracting $\tilde{X}_{1997}$ from $\tilde{X}_{1998}$, you get $\tilde{X}_{1998} = \tilde{X}_{1997} + w_{1998} - v_{1997}$. Hence the population parameters for
the intercept and slope are zero and one, as expected. It is not difficult to show that the OLS estimator for the
slope is inconsistent, where
$\hat{\beta}_1 \xrightarrow{p} 1 - \frac{\sigma_v^2}{\sigma_x^2 + \sigma_v^2}$.
As a result you consider estimating the slope and intercept by TSLS. You think about an instrument and
consider the temperature one month ahead of the observation in the previous year. Discuss instrument validity
for this case.
(c) The TSLS estimation result is as follows:
$\widehat{Temp}^{PHX}_{1998} = -6.24 + \underset{(0.06)}{1.07}\, Temp^{PHX}_{1997}$
Perform a t-test on whether or not the slope is now significantly different from one.
Answer: (a) The three predicted temperatures will be 47.6, 78.0, and 95.6 respectively. The initial expectation
should be that the temperature in 1998 is the same as in 1997 for a given date. The regression line and the
45 degree line are sketched in the accompanying figure. The implication is mean reversion: if the
temperature was low (40 degrees), then it will also be low the following year, but not as low.
Alternatively, if the temperature was high (100 degrees), then it will be high again, but not as high. If this
prediction were extrapolated into the future, then eventually all temperatures should be the same for all days.
This obviously does not make sense.
(b) For an instrument to be valid, two conditions have to hold. First, the instrument has to be relevant,
and second, the instrument has to be exogenous. If the temperature one month ahead can predict the
current temperature, as it certainly can in Phoenix, then the instrument is relevant, i.e., correlated with
the current month's temperature. If, in addition, whatever caused the temperature in the current month
to deviate from its long-term value is only a temporary phenomenon, such as a weather system created
by a storm in the Pacific, then next month's temperature should not be correlated with this event. Hence
the instrument would be exogenous.
(c) The t-statistic is 1.17, and hence you cannot reject the null hypothesis that the slope equals one.
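The arithmetic in (c) and the errors-in-variables argument in (b) can be checked numerically. The sketch below is a rough illustration only: the seasonal mean of 75 degrees, the standard deviations of 15 and 8, and the instrument (a third, independent noisy reading of the same seasonal temperature) are hypothetical stand-ins chosen for the simulation, not values from the data set in the question.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

# (c) t-statistic for H0: slope = 1, using the reported TSLS slope and standard error
t_stat = (1.07 - 1.0) / 0.06
print(f"t-statistic: {t_stat:.2f}")      # below 1.96 in absolute value, so H0 is not rejected

# (b) Errors-in-variables with assumed (hypothetical) parameters
SIGMA_X, SIGMA_V = 15.0, 8.0
rng = random.Random(42)
n = 50000
x_true = [rng.gauss(75, SIGMA_X) for _ in range(n)]        # seasonal temperature X
temp97 = [x + rng.gauss(0, SIGMA_V) for x in x_true]       # X + v
temp98 = [x + rng.gauss(0, SIGMA_V) for x in x_true]       # X + w
z = [x + rng.gauss(0, SIGMA_V) for x in x_true]            # hypothetical instrument

ols = cov(temp97, temp98) / cov(temp97, temp97)
plim = 1 - SIGMA_V ** 2 / (SIGMA_X ** 2 + SIGMA_V ** 2)    # attenuation formula
iv = cov(z, temp98) / cov(z, temp97)

print(f"OLS slope: {ols:.3f}   predicted probability limit: {plim:.3f}")
print(f"IV slope:  {iv:.3f}   (population slope is 1)")
```

Under these assumptions the OLS slope sits close to the attenuated limit of about 0.78, while the IV slope is close to one, because the instrument shares the seasonal component but not the temporary weather shocks.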
2) Consider the following population regression model relating the dependent variable Yi and regressor Xi,
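$Y_i = \beta_0 + \beta_1 X_i + u_i$, $i = 1, \ldots, n$,
together with the identity
$X_i = Y_i + Z_i$.
(a) Explain why OLS does not yield a consistent estimator of $\beta_1$.
(b) To generate a consistent estimator for $\beta_1$, what should you do?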
(c) The two equations above make up a system of equations in two unknowns. Specify the two reduced form
equations in terms of the original coefficients. (Hint: substitute the identity into the first equation and solve for
Y. Similarly, substitute Y into the identity and solve for X.)
(d) Do the two reduced form equations satisfy the OLS assumptions? If so, can you find consistent estimators of
the two slopes? What is the ratio of the two estimated slopes? This estimator is called Indirect Least Squares.
How does it compare to the TSLS in this example?
Answer: (a) Substitution of the first equation into the identity shows that X is correlated with the error term.
Hence estimation with OLS results in an inconsistent estimator.
(b) The instrumental variable estimator is consistent and in this case is $\hat{\beta}_1^{2SLS} = \frac{s_{ZY}}{s_{ZX}}$. Adventurous
students will derive this estimator along the lines shown in Appendix 10.2.
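(c)
$Y_i = \beta_0 + \beta_1 (Y_i + Z_i) + u_i$
$X_i = (\beta_0 + \beta_1 X_i + u_i) + Z_i$
or
$(1 - \beta_1) Y_i = \beta_0 + \beta_1 Z_i + u_i$
$(1 - \beta_1) X_i = \beta_0 + Z_i + u_i$
Hence
$Y_i = \pi_0 + \pi_2 Z_i + v_{1i}$
$X_i = \pi_3 + \pi_4 Z_i + v_{2i}$
where $\pi_0 = \pi_3 = \frac{\beta_0}{1 - \beta_1}$, $\pi_2 = \frac{\beta_1}{1 - \beta_1}$, $\pi_4 = \frac{1}{1 - \beta_1}$, and $v_{1i} = v_{2i} = \frac{1}{1 - \beta_1} u_i$.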
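(d) Since Z is a valid instrument by assumption, it must be uncorrelated with the error term and hence
the OLS estimators of the two reduced form slopes are consistent. Their ratio is
$\frac{\hat{\pi}_2}{\hat{\pi}_4} = \frac{s_{YZ} / s_{ZZ}}{s_{XZ} / s_{ZZ}} = \frac{s_{YZ}}{s_{XZ}}$,
which is identical to the TSLS estimator.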
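The equivalence of Indirect Least Squares and TSLS in this system can be verified on simulated data. The following is a minimal sketch: the coefficient values (intercept 1.0, slope 0.5), the standard normal draws for Z and u, and all variable names are assumptions made for the illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

BETA0, BETA1 = 1.0, 0.5                  # assumed illustrative coefficients
rng = random.Random(7)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]  # instrument, independent of u
u = [rng.gauss(0, 1) for _ in range(n)]  # structural error

# Data generated from the reduced forms implied by Y = b0 + b1*X + u and X = Y + Z
y = [(BETA0 + BETA1 * zi + ui) / (1 - BETA1) for zi, ui in zip(z, u)]
x = [yi + zi for yi, zi in zip(y, z)]

slope_y_on_z = cov(y, z) / cov(z, z)     # OLS slope of the reduced form for Y
slope_x_on_z = cov(x, z) / cov(z, z)     # OLS slope of the reduced form for X
ils = slope_y_on_z / slope_x_on_z        # Indirect Least Squares
tsls = cov(z, y) / cov(z, x)             # TSLS with a single instrument
ols = cov(x, y) / cov(x, x)              # OLS on the structural equation

print(f"ILS: {ils:.4f}   TSLS: {tsls:.4f}   OLS: {ols:.4f}   (true slope {BETA1})")
```

ILS and TSLS agree up to floating-point rounding because the s_ZZ terms cancel, while OLS on the structural equation is pulled away from the true slope (towards 0.75 in this design).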
3) Here are some examples of the instrumental variables regression model. In each case you are given the number
of instruments and the J-statistic. Find the relevant value from the $\chi^2_{m-k}$ distribution, using a 1% and 5%
significance level, and make a decision whether or not to reject the null hypothesis.
(a) $Y_i = \beta_0 + \beta_1 X_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}$ are valid instruments, J = 2.58.
(b) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 W_{1i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments,
J = 9.63.
(c) $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 W_{1i} + \beta_3 W_{2i} + \beta_4 W_{3i} + u_i$, $i = 1, \ldots, n$; $Z_{1i}, Z_{2i}, Z_{3i}, Z_{4i}$ are valid instruments, J = 11.86.
Answer: (a) The test statistic is distributed $\chi^2_1$ and the critical values are 6.63 and 3.84 at the 1% and 5%
significance level. Hence you cannot reject the null hypothesis that all the instruments are exogenous.
(b) The test statistic is distributed $\chi^2_2$ and the critical values are 9.21 and 5.99 at the 1% and 5%
significance level. Hence you can reject the null hypothesis that all the instruments are exogenous.
(c) The test statistic is distributed $\chi^2_3$ and the critical values are 11.34 and 7.81 at the 1% and 5%
significance level. Hence you can reject the null hypothesis that all the instruments are exogenous.
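The three decisions can be cross-checked by computing p-values for the J-statistics. The sketch below uses only the standard library; the helper `chi2_sf` is a hypothetical name, and it evaluates the chi-squared upper tail for integer degrees of freedom through a standard recursion rather than a statistics package.

```python
import math

def chi2_sf(x, df):
    """P(chi-squared with integer df exceeds x).

    Uses Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from Q(x; 1) = erfc(sqrt(x/2)) or Q(x; 2) = exp(-x/2).
    """
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

# (label, J-statistic, number of instruments m, number of endogenous regressors k)
cases = [("a", 2.58, 2, 1), ("b", 9.63, 4, 2), ("c", 11.86, 4, 1)]

results = {}
for label, j_stat, m, k in cases:
    df = m - k                            # degree of overidentification
    p_value = chi2_sf(j_stat, df)
    results[label] = (df, p_value)
    print(f"({label}) df = {df}, J = {j_stat}, p-value = {p_value:.4f}, "
          f"reject at 5%: {p_value < 0.05}, reject at 1%: {p_value < 0.01}")
```

A p-value below the significance level corresponds to a J-statistic above the critical value, so the decisions agree with the table-based answers.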
4) To study the determinants of growth between the countries of the world, researchers have used panels of
countries and observations spanning over long periods of time (e.g. 1965-1975, 1975-1985, 1985-1990). Some of
these studies have focused on the effect that inflation has on growth and found that although the effect is small
for a given time period, it accumulates over time and therefore has an important negative effect.
(a) Explain why the OLS estimator may be biased in this case.
(b) Explain how methods using panel data could potentially alleviate the problem.
(c) Some authors have suggested using an index of central bank independence as an instrument. Discuss
whether or not such an index would be a valid instrument.
Answer: (a) The presence of simultaneous causality is highly likely since inflation may respond to growth.
Depending on the list of regressors, omitted variables can also bias the estimator for the effect of the
inflation rate.
(b) Country fixed effects or differencing the data can solve the problem if inflation stays relatively
constant over time from one country to the other. Unfortunately if the effect of inflation on growth is the
focus of the study, then much of the cross-sectional information is lost using this approach.
(c) For this index to be valid, central bank independence has to be relevant and exogenous. If inflation
rates are correlated with the index, then central bank independence is a relevant instrument. Although
there is a high correlation for developed countries, there is little to no correlation when data for all
countries is considered. Whether or not the index is exogenous cannot be tested unless the coefficients of
the equation are overidentified. Otherwise personal judgment is the only guide. An argument that
central bank independence is exogenous would have to rely on it being based on institutional
arrangements which are independent of inflation. Although the independence of central banks in many
countries was initially determined by concerns independent of inflation, there have been many
situations where the institutional arrangements were altered as a result of high inflation.
5) (Requires Matrix Algebra) The population multiple regression model can be written in matrix form as
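$Y = X\beta + U$
where
$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}$, $U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$, $X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} & W_{11} & \cdots & W_{r1} \\ 1 & X_{12} & \cdots & X_{k2} & W_{12} & \cdots & W_{r2} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} & W_{1n} & \cdots & W_{rn} \end{pmatrix}$, and $\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{k+r} \end{pmatrix}$.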
Note that the X matrix contains both k endogenous regressors and (r +1) included exogenous regressors (the
constant is obviously exogenous).
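The instrumental variable estimator for the overidentified case is
$\hat{\beta}^{IV} = [X'Z(Z'Z)^{-1}Z'X]^{-1} X'Z(Z'Z)^{-1}Z'Y$,
where $Z$ contains the constant, the $m$ instruments, and the $r$ included exogenous regressors:
$Z = \begin{pmatrix} 1 & Z_{11} & \cdots & Z_{m1} & W_{11} & \cdots & W_{r1} \\ 1 & Z_{12} & \cdots & Z_{m2} & W_{12} & \cdots & W_{r2} \\ \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ 1 & Z_{1n} & \cdots & Z_{mn} & W_{1n} & \cdots & W_{rn} \end{pmatrix}$.
It is of order $n \times (m + r + 1)$.
For this estimator to exist, both $(Z'Z)$ and $[X'Z(Z'Z)^{-1}Z'X]$ must be invertible. State the conditions under
which this will be the case and relate them to the degree of overidentification.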
Answer: In order for a matrix to be invertible, it must have full rank. Since $Z'Z$ is of order $(m + r + 1) \times (m + r + 1)$,
it must have rank $(m + r + 1)$ to be inverted. For a product such as $Z'Z$, the rank is
at most the rank of $Z'$ or $Z$, whichever is smaller. $Z$ is of order $n \times (m + r + 1)$ and,
assuming that there is no perfect multicollinearity, will have either rank $n$ or rank $(m + r + 1)$, whichever
is the smaller of the two. Hence if there are fewer observations than the number of instrumental
variables plus exogenous variables, then the rank of $Z$ will be $n$ $(< m + r + 1)$, and the rank of $Z'Z$ is also
$n$ $(< m + r + 1)$. Hence $Z'Z$ does not have full rank, and therefore cannot be inverted. The IV estimator
does not exist as a result. In the past, this was considered a strong possibility with large econometric
models, where many predetermined variables entered.
If there are more observations than instruments, then the rank of $Z'Z$ is $(m + r + 1)$. $X'Z$ will be of order
$(k + r + 1) \times (m + r + 1)$, which will have rank $(k + r + 1)$ if $m > k$, i.e., if there is overidentification.
Furthermore $[X'Z(Z'Z)^{-1}Z'X]$ is of order $(k + r + 1) \times (k + r + 1)$ and will have full rank since the rank of
a product of the three matrices involved is at most the smallest of the ranks of the three matrices $X'Z$,
$Z'Z$, and $Z'X$.
6) Consider the following model of demand and supply of coffee:
Demand: $Q_i^{Coffee} = \beta_1 P_i^{Coffee} + \beta_2 P_i^{Tea} + u_i$
Supply: $Q_i^{Coffee} = \beta_3 P_i^{Coffee} + \beta_4 P_i^{Tea} + \beta_5 Weather + v_i$
(variables are measured in deviations from means, so that the constant is omitted).
What are the expected signs of the various coefficients in this model? Assume that the price of tea and Weather are
exogenous variables. Are the coefficients in the supply equation identified? Are the coefficients in the demand
equation identified? Are they overidentified? Is this result surprising given that there are more exogenous
regressors in the second equation?
Answer: Changes in Weather will shift the supply equation and thereby trace out the demand equation. Hence the
coefficients of the demand equation are exactly identified since the number of instruments equals the
number of endogenous regressors. However the coefficients of the supply equation are underidentified
since there is no instrumental variable available for estimation. The result is not surprising, since it is not
the number of exogenous regressors in the equation that matters when determining whether or not the
coefficients are identified. Instead what matters is the number of instruments available relative to the
number of endogenous regressors. It is possible that the regression coefficients can be (over)identified
even if there are no exogenous regressors present in the equation.
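The counting argument in this answer can be written out explicitly. The sketch below encodes the order condition only, which is necessary but not sufficient for identification; the variable labels and the helper name `order_condition` are hypothetical.

```python
def order_condition(n_endogenous, n_excluded_exogenous):
    """Classify an equation by comparing available instruments with endogenous regressors."""
    if n_excluded_exogenous < n_endogenous:
        return "underidentified"
    if n_excluded_exogenous == n_endogenous:
        return "exactly identified"
    return "overidentified"

# Exogenous variables in the whole system
exogenous = {"P_tea", "Weather"}

# For each equation: its endogenous regressors and the exogenous variables it includes
equations = {
    "demand": {"endogenous": {"P_coffee"}, "included_exogenous": {"P_tea"}},
    "supply": {"endogenous": {"P_coffee"}, "included_exogenous": {"P_tea", "Weather"}},
}

status = {}
for name, eq in equations.items():
    # Instruments are the exogenous variables excluded from this equation
    instruments = exogenous - eq["included_exogenous"]
    status[name] = order_condition(len(eq["endogenous"]), len(instruments))
    print(f"{name}: instruments = {sorted(instruments)}, {status[name]}")
```

The demand equation has Weather available as an excluded instrument for the coffee price, whereas the supply equation excludes nothing, which reproduces the conclusion above.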
7) You started your econometrics course by studying the OLS estimator extensively, first for the simple regression
case and then for extensions of it. You have now learned about the instrumental variable estimator. Under what
situation would you prefer one to the other? Be specific in explaining under which situations one estimation
method generates superior results.
Answer: Under the OLS assumptions, the OLS estimator is unbiased and consistent. The sampling distribution of
the estimator is approximately normal in large samples. Hence statistical inference can proceed as usual
using the t-statistic, confidence intervals, or joint hypothesis tests involving the F-statistic.
One major concern throughout the text has been the development of new estimation techniques in the
case where one of the OLS assumptions is violated, specifically that there is correlation between the error
term and at least one of the regressors. This may be the result of omitted variables, errors-in-variables, or
simultaneous causality bias. These make up three of the threats to internal validity. In each of these
cases, OLS becomes biased and an alternative estimator should be used.
Even if the OLS assumptions are violated and the OLS estimator is biased because of omitted variable
bias, simultaneous causality, or errors-in-variables, using TSLS will not improve the situation if the
instruments are not valid. In that case, TSLS will yield inconsistent estimators if the instruments are not
exogenous. It will be biased and statistical inference will not be valid if the instruments are weak.
Furthermore, the estimator will not even be normally distributed in large samples.
If the instruments are valid and the other IV regression assumptions hold, then the TSLS estimator is
consistent and therefore preferable over the OLS estimator. Although its distribution is complicated in
small samples, the sampling distribution of the estimator is approximately normal in large samples.
Hence statistical inference can proceed as usual using the t-statistic, confidence intervals, or joint
hypothesis tests involving the F-statistic.
8) Your textbook gave an example of attempting to estimate the demand for a good in a market, but being unable
to do so because the demand function was not identified. Is this the case for every market? Consider, for
example, the demand for sports events. One of your peers estimated the following demand function after
collecting data over two years for every one of the 162 home games of the 2000 and 2001 season for the Los
Angeles Dodgers.
Attend = 15,005 + 201 × Temperat + 465 × DodgNetWin + 82 × OppNetWin
         (8,770)  (121)            (169)              (26)
       + 9647 × DFSaSu + 1328 × Drain + 1609 × D150m + 271 × DDiv − 978 × D2001;
         (1505)          (3355)         (1819)         (1,184)      (1,143)
R² = 0.416, SER = 6983
Where Attend is announced stadium attendance, Temperat is the average temperature on game day,
DodgNetWin are the net wins of the Dodgers before the game (wins-losses), OppNetWin is the opposing team's
net wins at the end of the previous season, and DFSaSu, Drain, D150m, DDiv, and D2001 are binary variables,
taking a value of 1 if the game was played on a weekend, it rained during that day, the opposing team was
within a 150 mile radius, plays in the same division as the Dodgers, and during 2001, respectively. Numbers in
parentheses are heteroskedasticity-robust standard errors.
Even if there is no identification problem, is it likely that all regressors are uncorrelated with the error term? If
not, what are the consequences?
Answer: In the case of sports events, often price and quantity are not simultaneously determined by supply and
demand. For baseball games, the supply of seats is fixed at the capacity level of the stadium. In addition,
prices for games are also fixed in advance and do not vary with the attractiveness of the opponent.
Therefore the supply curve is infinitely elastic up to the point where the game is sold out. This
situation is complicated by ticket scalping and the fact that teams stage special events (fireworks, etc.).
Taking these considerations into account may result in simultaneous causality bias, or a threat to internal
validity because of the identification problem.
However, assuming that there is no identification problem, there may still be omitted variable bias or
errors-in-variables bias. For example, attendance typically increases the tighter the race for a play-off
spot towards the end of the season. Furthermore, it is not the opposing team's net wins at the end of the
previous season that accounts for the attractiveness of the opponent, but the performance during the
current season. If the opposing team's current performance is related to its performance in the previous
season, then the OLS estimator is biased.
9) Earnings functions, whereby the log of earnings is regressed on years of education, years of on the job training,
and individual characteristics, have been studied for a variety of reasons. Some studies have focused on the
returns to education, others on discrimination, union/non-union differentials, etc. For all these studies, a major
concern has been the fact that ability should enter as a determinant of earnings, but that it is close to impossible
to measure and therefore represents an omitted variable.
Assume that the coefficient on years of education is the parameter of interest. Given that education is positively
correlated to ability, since, for example, more able students attract scholarships and hence receive more years of
education, the OLS estimator for the returns to education could be upward biased. To overcome this problem,
various authors have used instrumental variable estimation techniques. For each of the potential
instruments listed below, briefly discuss instrument validity.
(a) The individual's postal zip code.
(b) The individual's IQ or test score on a work related exam.
(c) Years of education for the individual's mother or father.
(d) Number of siblings the individual has.
Answer: (a) Instrument validity has two components, instrument relevance ($\mathrm{corr}(Z_i, X_i) \neq 0$) and instrument
exogeneity ($\mathrm{corr}(Z_i, u_i) = 0$). The individual's postal zip code will certainly be uncorrelated with the
omitted variable, ability, even though some zip codes may attract more able individuals. However, this
is an example of a weak instrument, since it is also uncorrelated with years of education.
(b) There is instrument relevance in this case, since, on average, individuals who do well in intelligence
scores or other work related test scores, will have more years of education. Unfortunately there is bound
to be a high correlation with the omitted variable ability, since this is what these tests are supposed to
measure.
(c) A non-zero correlation between the mother's or father's years of education and the individual's years
of education can be expected. Hence this is a relevant instrument. However, it is not clear that the parents'
years of education are uncorrelated with the parents' ability, which in turn can be a major determinant of
the individual's ability. If this is the case, then years of education of the mother or father is not a valid
instrument.
(d) There is some evidence that the larger the number of siblings of an individual, the fewer
years of education the individual receives. Hence number of siblings is a relevant instrument. It has been
argued that number of siblings is uncorrelated with an individual's ability. In that case it also represents
an exogenous instrument. However, there is the possibility that ability depends on the attention an
individual receives from parents, and this attention is shared with other siblings.
10) The two conditions for instrument validity are $\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$. The reason for the
inconsistency of OLS is that $\mathrm{corr}(X_i, u_i) \neq 0$. But if X and Z are correlated, and X and u are also correlated, then
how can Z and u not be correlated? Explain.
Answer: The introduction to Chapter 10 on instrumental variables regression and section 10.1 went into a lengthy
explanation of this problem. The major idea is that X has two parts: one that is uncorrelated with the
error term u and a second that is correlated with it. The trick is to isolate the uncorrelated part of X.
For the instrument to be valid, $\mathrm{corr}(Z_i, u_i) = 0$ and $\mathrm{corr}(Z_i, X_i) \neq 0$ must hold. TSLS then generates
predicted values of X in the first stage by using a linear combination of the instruments. As long as
$\mathrm{corr}(Z_i, X_i) \neq 0$ and $\mathrm{corr}(Z_i, u_i) = 0$, the part of X which is uncorrelated with the error term is
extracted through the prediction. In the second stage, this captured exogenous variation in X is then used
to estimate the effect of X on Y.
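The two stages described in this answer can be traced step by step on simulated data. The sketch below is illustrative only: the data-generating process, the true slope of 2.0, and the helper names are assumptions, and the error term u can be inspected here only because the data are simulated.

```python
import random

def mean(a):
    return sum(a) / len(a)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def corr(a, b):
    """Sample correlation coefficient."""
    return cov(a, b) / (cov(a, a) * cov(b, b)) ** 0.5

rng = random.Random(3)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]                 # valid instrument by construction
e = [rng.gauss(0, 1) for _ in range(n)]                 # shock shared by x and u
x = [0.8 * zi + ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
u = [ei + rng.gauss(0, 1) for ei in e]                  # error term, correlated with x
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# First stage: regress x on z and keep the fitted values
pi1 = cov(z, x) / cov(z, z)
pi0 = mean(x) - pi1 * mean(z)
x_hat = [pi0 + pi1 * zi for zi in z]

# Second stage: regress y on the fitted values
tsls_slope = cov(x_hat, y) / cov(x_hat, x_hat)

corr_x_u = corr(x, u)          # clearly non-zero: x is endogenous
corr_xhat_u = corr(x_hat, u)   # close to zero: only the exogenous part of x remains

print(f"corr(x, u)     = {corr_x_u:.3f}")
print(f"corr(x_hat, u) = {corr_xhat_u:.3f}")
print(f"TSLS slope     = {tsls_slope:.3f}   (true value 2.0)")
```

The fitted values are a linear function of Z alone, so they inherit the instrument's lack of correlation with u even though X itself is correlated with u.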
11) Consider a model of the U.S. labor market where the demand for labor depends on the real wage, while the
supply of labor is vertical and does not depend on the real wage. You could argue that the supply of labor by
households (think of hours supplied by two adults and two children) has not changed much over the last 60
years or so in the U.S. while real wages more than doubled over the same time span. At first that seems strange
given the higher participation rate of females over that period, but that increase has been countered by a lower
male participation rate (resulting from earlier retirement), an increase in legal holidays, and an increase in
vacation days.
a. Write down two equations representing the labor supply and labor demand function, allowing for an
error term in each of the demand and supply equation. In addition, assume that the labor market
clears.
b.
c. Assuming that the error terms are mutually independent i.i.d. random variables, both with mean zero,
show that the real wage and the error term of the labor demand equation are correlated.
d. If you find a non-zero correlation, should you estimate the labor demand equation using OLS? If so,
what are the consequences?
e. Estimating the labor demand equation by IV estimation, which instrument suggests itself
immediately?
Answer: a. Students may use different symbols, but will end up with something like the following specification:
$N^d = \beta_0 + \beta_1 \frac{W}{P} + u$
$N^s = \gamma_0 + v$
$N^d = N^s = N$
^
2
u
1
1 Xi + ui.
1iXi + ui.
C) $Y_i = \beta_{0i} + \beta_{1i} X_i + u_i$.
D) $Y_i = \beta_0 + \beta_1 G_i + \beta_2 D_t + u_i$.
Answer: C
22) In the case of heterogeneous causal effects, the following is not true:
A) in the circumstances in which OLS would normally be consistent (when $E(u_i \mid X_i) = 0$), the OLS estimator
continues to be consistent.
B) OLS estimation using heteroskedasticity-robust standard errors is identical to TSLS.
C) the OLS estimator is properly interpreted as a consistent estimator of the average causal effect in the
population being studied.
D) the TSLS estimator in general is not a consistent estimator of the average causal effect if an individual's
decision to receive treatment depends on the effectiveness of the treatment for that individual.
Answer: B
23) One of the major lessons learned in the chapter on experiments and quasi-experiments
A) is that there are almost no true experiments in economics and that quasi-experiments are a poor
substitute.
B) you should always use TSLS when estimating causal effects in quasi-experiments.
C) populations are always homogeneous.
D) is that the insights of experimental methods can be applied to quasi-experiments, in which special
circumstances make it seem as if randomization has occurred.
Answer: D
24) Quasi-experiments
A) provide a bridge between the econometric analysis of observational data sets and the statistical ideal of a
true randomized controlled experiment.
B) are not the same as experiments, and lessons learned from the use of the latter can therefore not be
applied to them.
C) most often use difference-in-difference estimators, which are quite different from OLS and instrumental
variables methods studied in earlier chapters of the book.
D) use the same methods as studied in earlier chapters of the book, and hence the interpretation of these
methods is the same.
Answer: A
25) The major distinction between the experiments and quasi-experiments chapter and earlier chapters is the
A) frequent use of binary variables.
B) type of data analyzed and the special opportunities and challenges posed when analyzing experiments
and quasi-experiments.
C) superiority of TSLS over OLS.
D) use of heteroskedasticity-robust standard errors.
Answer: B
26) A potential outcome
A) is the outcome for an individual under a potential treatment.
B) cannot be observed because most individuals do not achieve their potential.
C) is the same as a causal effect.
D) is none of the above.
Answer: A
27) A causal effect for a single individual
A) can be deduced from the average treatment effect.
B) cannot be measured.
C) depends on observable variables only.
D) is observable since it is used as part of calculating the mean of individual causal effects.
Answer: B
28) Randomization based on covariates is
A) not of practical importance since individuals are hardly ever assigned in this fashion.
B) dependent on the covariances of the error term (serial correlation).
C) a randomization in which the probability of assignment to the treatment group depends on one or more
observable variables W.
D) eliminates the omitted variable bias when using the differences estimator based on $Y_i = \beta_0 + \beta_1 X_i + u_i$,
where Y is the outcome variable and X is the treatment indicator.
Answer: C
29) Testing for the random receipt of treatment
A) is not possible, in general.
B) entails testing the hypothesis that the coefficients on $W_{1i}, \ldots, W_{ri}$ are non-zero in a regression of $X_i$ on
$W_{1i}, \ldots, W_{ri}$.
C) is not meaningful since the LHS variable is binary.
D) entails testing the hypothesis that the coefficients on $W_{1i}, \ldots, W_{ri}$ are zero in a regression of $X_i$ on
$W_{1i}, \ldots, W_{ri}$.
Answer: D
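The test in question 29 can be sketched in code. The following is a minimal illustration, not the textbook's procedure: it uses simulated data, hypothetical helper names (`ols_ssr`, `f_stat_random_receipt`), and the homoskedasticity-only F-statistic for simplicity, whereas a heteroskedasticity-robust F-statistic would normally be preferred when the dependent variable is binary.

```python
import random

def ols_ssr(y, regressors):
    """Sum of squared OLS residuals from regressing y on a constant plus the
    given regressor columns (normal equations solved by Gauss-Jordan)."""
    n = len(y)
    columns = [[1.0] * n] + [list(col) for col in regressors]
    k = len(columns)
    # Augmented matrix [X'X | X'y]
    aug = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
           + [sum(a * b for a, b in zip(ci, y))] for ci in columns]
    for p in range(k):
        best = max(range(p, k), key=lambda r: abs(aug[r][p]))
        aug[p], aug[best] = aug[best], aug[p]
        for r in range(k):
            if r != p:
                factor = aug[r][p] / aug[p][p]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[p])]
    beta = [aug[j][k] / aug[j][j] for j in range(k)]
    fitted = [sum(b * col[t] for b, col in zip(beta, columns)) for t in range(n)]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))

def f_stat_random_receipt(x, w_columns):
    """F-statistic for H0: all coefficients on the Ws are zero in a
    regression of the treatment indicator x on a constant and the Ws."""
    n, r = len(x), len(w_columns)
    ssr_restricted = ols_ssr(x, [])            # constant only
    ssr_unrestricted = ols_ssr(x, w_columns)   # constant plus Ws
    return ((ssr_restricted - ssr_unrestricted) / r) / (ssr_unrestricted / (n - r - 1))

rng = random.Random(0)
n = 1000
w1 = [rng.gauss(0, 1) for _ in range(n)]
w2 = [rng.gauss(0, 1) for _ in range(n)]
# Treatment received by coin flip: unrelated to the Ws.
x_coin = [float(rng.random() < 0.5) for _ in range(n)]
# Treatment received depends on W1: receipt is not random.
x_selected = [float(a + rng.gauss(0, 1) > 0) for a in w1]

f_coin = f_stat_random_receipt(x_coin, [w1, w2])
f_selected = f_stat_random_receipt(x_selected, [w1, w2])
print("F, coin-flip receipt:", round(f_coin, 2))
print("F, receipt depends on W1:", round(f_selected, 2))
```

A large F-statistic, as in the second case, leads to rejecting the hypothesis that treatment was received at random.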
30) Failure to follow the treatment protocol means that
A) the OLS estimator cannot be computed.
B) instrumental variables estimation of the treatment effect should be used where the initial random
assignment is the instrument for the treatment actually received.
C) you should use the TSLS estimator and regress the outcome variable Y on the initial random assignment
in the first stage to get predicted values of the outcome variable.
D) the Hawthorne effect plays a crucial role.
Answer: B
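The idea behind question 30 can be illustrated with a small simulation. This is a sketch under stated assumptions: the data are simulated, the causal effect is homogeneous, and the variable names (`ability`, `complies`) are hypothetical. With a single binary instrument, the IV estimator reduces to the ratio cov(Z, Y) / cov(Z, X).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)
n = 20000
true_effect = 2.0
z, x, y = [], [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                  # unobserved, part of the error term
    assigned = 1.0 if rng.random() < 0.5 else 0.0
    complies = rng.random() < 0.7              # 70% follow the protocol
    if complies:
        received = assigned
    else:
        # Non-compliers choose treatment based on unobserved ability.
        received = 1.0 if ability > 0 else 0.0
    outcome = 1.0 + true_effect * received + ability + rng.gauss(0, 1)
    z.append(assigned)
    x.append(received)
    y.append(outcome)

ols_estimate = cov(x, y) / cov(x, x)   # biased: x is correlated with ability
iv_estimate = cov(z, y) / cov(z, x)    # assignment z instruments for receipt x
print("OLS:", round(ols_estimate, 3))
print("IV :", round(iv_estimate, 3))
```

In this setup the OLS estimate overstates the effect because treatment received is correlated with ability, while the IV estimate stays close to the true value of 2.0.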
2) Canada and the United States had approximately the same aggregate unemployment rates from the 1920s to
1981. In 1982, a two percentage point gap appeared, which has roughly persisted until today, with the Canadian
unemployment rate in the third quarter of 2002 being 7.6 percent while the American rate stood at 5.9 percent
in the same period. Several authors have investigated this phenomenon. One study, published in 1990,
contained the following statement: "It is a cliché that, as compared to analysis in the physical sciences,
economic analysis is hampered by the lack of controlled experiments. In this regard, study of the Canadian
economy can be much facilitated by comparison with the behaviour of the US." Discuss what the authors
may have had in mind. List some potential threats to internal and external validity when comparing aggregate
unemployment rate behavior between countries.
Answer: It should be clear that the authors were not really talking about a controlled experiment, but instead had
in mind a quasi-experiment or natural experiment. In a randomized controlled experiment to study the
effect of unemployment insurance benefits on unemployment, for example, unemployed workers would
be treated with various degrees of unemployment insurance generosity, such as the amount by which
their former wages are replaced by unemployment insurance benefits (replacement rate), the duration
of benefits, the scrutiny of the agency monitoring the job search effort, etc. Instead the authors must have
thought that the two economies were similar in many aspects, and that because of an external event,
either in Canada or in the U.S., one was subjected to a treatment, while the other was not, which resulted
in the aggregate unemployment rate difference. It is the difference in location (living in the U.S. vs. in
Canada) that gives the resemblance to a randomly assigned treatment. The above study is of the first
type of quasi-experiments discussed in the textbook whereby the treatment received is viewed as if
randomly determined.
One threat to external validity is generalizing the results from a U.S.-Canada comparison to other
cultures and to less developed economies. Also, consider unemployment insurance generosity as a
treatment variable (Canada liberalized unemployment benefits considerably in the early 1970s). In that
case $E(u_i \mid X_i) = 0$ is unlikely to hold, and additional regressors and instrumental variable techniques
should be used.
3) Earnings functions provide a measure, among other things, of the returns to education. It has been argued
these regressions contain a serious omitted variable bias due to differences in abilities. Furthermore, ability is
hard to measure and bound to be highly correlated with years of schooling. Hence the standard estimate of
about a 10 percent return to every year of schooling is upward biased. Suggest some ways to address this
problem. One famous study looked at earnings of identical twins. Explain how this can be viewed as a
quasi-experiment, and mention some of the threats to internal and external validity that such a study might
encounter.
Answer: Answers will vary by student. The omitted variable bias should play a central part in the discussion.
$E(u_i \mid X_i, W_{1i}, \ldots, W_{ri}) = 0$ will not hold if one of the Ws is years of education and u contains unobserved
ability. If ability causes individuals to have higher earnings and more years of education, perhaps
by making university scholarships easier to obtain, then the estimated returns to education are biased upward. One
way to circumvent this problem is, as some studies have done in the past, to approximate ability by IQ
scores. If IQ scores measure ability with error, then instrumental variable techniques can be employed.
These were discussed in Chapter 10 of the textbook. Another possibility is to model ability as an omitted
variable that remains constant over time. In that case, panel estimation methods with fixed effects,
presented in Chapter 8 of the textbook, can be used. Data can be differenced to eliminate the entity fixed
effects or binary variables can be added to capture them. At any rate, this approach requires data being
available for more than a single point in time. The use of data from identical twins is fascinating since
these have identical genes and, typically, identical family backgrounds. The suggestion is therefore to
assume that they have identical ability as well. If some twins have different years of schooling while
others do not, then this can be treated as a quasi-experiment since the researcher can view this choice as
if it had been randomly assigned. Obviously it cannot count as a randomized controlled experiment,
since the difference in schooling was not determined by the flip of a coin, say. But it may also run into
problems in providing an "as if" randomization. The text flagged some of the potential problems in Section
11.1: "Initially, one might think that an ideal experiment would take two otherwise identical individuals,
treat one of them, and compare the difference in their outcomes while holding constant all other
influences. This is not, however, a practical experimental design, for it is impossible to find two identical
individuals: even identical twins have different life experiences, so they are not identical in every way."
Finally, if identical twins are different from the general population, then generalizing the results to the
population of all individuals also poses a threat to external validity.
4) Describe the major differences between a randomized controlled experiment and a quasi-experiment.
Answer: Answers will vary by student. Some of the following points should appear.
A randomized controlled experiment relies on the random selection of entities from a population of interest,
and the random assignment of these individuals into either a treatment or control group. To study the
causal effects, a simple regression model with a single regressor can be specified. This regressor can
either be a binary variable or a variable indicating treatment levels. Since $E(u_i \mid X_i) = 0$ is guaranteed if
the assignment and selection were random, the causal or treatment effect can be measured through
$E(Y_i \mid X = x) - E(Y_i \mid X = 0)$. Random selection and assignment ensure that there is no omitted
variable bias, and therefore the OLS estimator is unbiased. Adding additional regressors can result in
increased efficiency. Alternatively a differences-in-differences estimator with or without additional
regressors is also available if the entities have been observed for two periods, one before and one after
the treatment. In the case of more than two observations per entity, panel methods can be employed.
There are various threats to internal and external validity. These include failure to randomize, failure to
follow treatment protocol, attrition, experiment effects, and small samples (threats to internal validity),
and nonrepresentative sample, nonrepresentative program or policy, general equilibrium effects, and
treatment vs. eligibility effects (threats to external validity).
A quasi-experiment is also called a natural experiment since the treatment of some entities resulted
from an external event. The treatment is administered "as if" it was random. The reason for observing
quasi-experiments more often in economics is that they are less expensive and raise fewer ethical
concerns. The "as if" randomly assigned treatment is the result of, as the textbook puts it, "vagaries in
legal institutions, location, timing of policy or program implementation, natural randomness such as
birth dates, rainfall, or other factors that are unrelated to the causal effect under study." There are two
types of quasi-experiments, one whereby treatment is viewed as if randomly determined, the other
whereby the "as if" randomization provides an instrumental variable. Threats to internal and external
validity are the same as for randomized controlled experiments once they are modified. For example,
experimental effects are typically absent since individuals are not aware that they are part of an
experiment. "Small samples" is replaced by "instrument validity" in quasi-experiments.
5) Roughly ten percent of elementary schools in California have a system whereby 4th to 6th graders share a
common classroom and a single teacher (multi-age, multi-grade classroom). Suggest an experimental design
that would allow you to assess the effect of learning in this environment. Mention some of the threats to
internal and external validity and how you would attempt to circumvent these.
Answer: Students should be selected randomly within a school and should be randomly assigned to a treatment
group (multi-age, multi-grade classroom) and a control group (traditional grade assignment; 4th, 5th,
and 6th grade only per room). Alternatively, and depending on the size of the experiment, a subset of
schools could be chosen and some pupils would randomly be assigned to traditional grade assignments
while others would be moved into multi-age, multi-grade classrooms. Another alternative would be to
simply choose some schools randomly which would have multi-age, multi-grade classrooms only. The
causal effect could then be estimated in a simple regression model with a binary regressor. Random
selection and random assignment would ensure $E(u_i \mid X_i) = 0$ and thereby eliminate one threat to internal
validity through omitted variable bias.
Another threat to internal validity would be if the worst or best performing schools were chosen instead
of using a random selection, or if parents in the district were allowed to vote whether or not to have the
school selected for the experiment. This would imply a failure to randomize. If students were allowed to
refuse to participate by transferring to a neighboring school, then this would represent failure to follow
treatment protocol. Double blind experiments are obviously not feasible since both instructors and
students know into which setting they are being placed (experimental effects). There are few threats to
external validity except for the situation whereby students would be allowed to opt in or out of the
experimental group (treatment vs. eligibility effect).
Stock/Watson 2e -- CVC2 8/23/06 -- Page 318
6) Assume for the moment that the student-teacher ratio effect on test scores was large enough that you would
advocate reducing class sizes in elementary schools. In 1996, the State of California reduced class sizes from
K-3 to no more than 20 students across all public elementary schools (Class Size Reduction Act) at a cost of
approximately $2 billion. In a short essay, discuss why the general equilibrium effects might differ from the
results obtained using experiments.
Answer: The general equilibrium effects are the result of the additional demand for teachers. Each elementary
school needed additional teachers in order to reduce the class size to 20 or fewer: think of a school that
had perhaps 3 Kindergarten classes of 25 students each. In that case, one additional classroom had to be
created, typically in some temporary structure. The question arises where the additional teacher came
from. If your school district was a desirable district to teach in, perhaps because of having a reputation of
well behaved children or classrooms that were well equipped, then teachers from other districts, perhaps
less desirable ones, would apply to the better school district. Presumably the desirable school district
would pick the best teacher(s) available, leaving the less desirable school district with a lower level of
teacher quality. The same phenomenon would repeat itself at the lower level school district, and so forth,
until you would get to the least desirable school district, which would have to hire new teachers from a
cohort that could not find a job elsewhere. Given the size of the State of California, the General
Equilibrium effect could be substantial, perhaps even drawing quality teachers from other states.
[...] $\hat{\beta}_1^{\text{diffs-in-diffs}}$, you need the change in the treatment group and the change in the control group.
To do this, the study provides you with the following information:
        FTE Employment before    FTE Employment after
PA      23.33                    21.17
NJ      20.44                    21.03
where FTE is full time equivalent and the numbers are average employment per restaurant.
(a) Calculate the change in the treatment group, the change in the control group, and finally
$\hat{\beta}_1^{\text{diffs-in-diffs}}$. Since minimum wages represent a price floor, did you expect
$\hat{\beta}_1^{\text{diffs-in-diffs}}$ to be positive or negative?
(b) If you look at $\hat{\beta}_1^{\text{diffs-in-diffs}}$, is this number primarily due to a change in the treatment group or the control
group?
(c) [...] $\hat{\beta}_1^{\text{diffs-in-diffs}}$ [...] given that there are 410 observations. If you believed that the benefit from small
minimum wage increases outweighed the cost in terms of employment loss, would finding that this coefficient
was not statistically significant discourage you?
Answer: (a) change in treatment group: +0.59, change in control group: -2.16,
$\hat{\beta}_1^{\text{diffs-in-diffs}}$ [...]
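The arithmetic for part (a) can be checked with a few lines of code. This is a minimal sketch that uses only the four rounded averages from the table above, with New Jersey as the treatment group and Pennsylvania as the control group; the dictionary layout and variable names are illustrative choices.

```python
# Average FTE employment per restaurant, taken from the table in the question.
fte = {
    "NJ": {"before": 20.44, "after": 21.03},   # treatment group
    "PA": {"before": 23.33, "after": 21.17},   # control group
}

change_treatment = fte["NJ"]["after"] - fte["NJ"]["before"]
change_control = fte["PA"]["after"] - fte["PA"]["before"]
diffs_in_diffs = change_treatment - change_control

print("Change in treatment group:", round(change_treatment, 2))
print("Change in control group:  ", round(change_control, 2))
print("Diffs-in-diffs estimate:  ", round(diffs_in_diffs, 2))
```

With the rounded table entries the estimate comes out at 2.75. Most of it stems from the fall in employment in the control group rather than from the rise in the treatment group. Slightly different values quoted elsewhere would presumably reflect rounding of the group averages.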
2) Define the $\hat{\beta}_1^{\text{diffs-in-diffs}}$ in terms of observable differences in the treatment and control group, before and after
the treatment. Explain why this presentation is the equivalent of calculating the coefficient in a regression
framework.
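Answer: $\hat{\beta}_1^{\text{diffs-in-diffs}} = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}}$. In regression notation,
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$
where $\Delta Y_i$ is the value of Y for the ith individual after the experiment is completed, minus the value of Y
before it starts, and X is a randomly assigned binary treatment variable, which takes on the value of one
if treatment was received and is zero otherwise. Then for an individual who did not receive treatment,
$\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}} = \hat{\beta}_0$. If the individual received treatment, then
$\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}} = \hat{\beta}_0 + \hat{\beta}_1$. Hence
$\hat{\beta}_1 = (\bar{Y}^{\text{treatment,after}} - \bar{Y}^{\text{treatment,before}}) - (\bar{Y}^{\text{control,after}} - \bar{Y}^{\text{control,before}})$.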
3) Your textbook gives a graphical example of $\hat{\beta}_1^{\text{diffs-in-diffs}}$ [...] time period appears on the horizontal axis.
There are two time periods entered: t = 1 and t = 2. The former
corresponds to the "before" time period, while the latter represents the "after" period. The assumption is that
the policy occurred sometime between the time periods (call this t = p). Keeping in mind the graphical
example of $\hat{\beta}_1^{\text{diffs-in-diffs}}$, carefully read what a reviewer of the Card and Krueger (CK) study of the minimum [...]
"Two assumptions are implicit throughout the evaluation of the natural experiment: (1) [$\hat{\beta}_1^{\text{diffs-in-diffs}}$]
would be zero if the treatment had not occurred, so a nonzero [$\hat{\beta}_1^{\text{diffs-in-diffs}}$] [...]
treatment (that is, nothing else could have caused the difference in the outcomes to change), and (2) the
intervention occurs after we measure the initial outcomes in the two groups. Three conditions are
particularly relevant in interpreting CK's work: (1) [t = 1] must be sufficiently before [t = p] that [the treatment
group] did not adjust to the treatment before [t = 1]; otherwise [$\bar{Y}^{\text{treatment,before}} - \bar{Y}^{\text{control,before}}$] will reflect the
effect of the treatment; (2) [t = 2] must be sufficiently after [t = p] to allow the treatment's effect to be fully felt;
and (3) we must be sure that the same difference [$\bar{Y}^{\text{treatment,before}} - \bar{Y}^{\text{control,before}}$] would have been observed at
[t = 2] if the treatment had not been imposed, that is, [the control group must be good enough] that there is no
need to adjust the differences for factors other than the treatment that might have caused them to change."
Use a figure similar to the textbook to explain what this reviewer meant.
Answer: See accompanying figures.
[Figures not reproduced; the surviving labels refer to assumptions (1) and (2) of the reviewer's statement.]
4) Consider the simple population regression model where the treatment is the same for the members of the
treatment group, and hence X is a binary variable. Explain why the coefficient on X represents the difference
between two means. How is the test for the statistical significance of the coefficient on X related to the test for
differences in means between two populations, when their variances are different? Write down the null and
alternative hypothesis in each case.
Answer: The answer should proceed along the lines of "Regression When X Is a Binary Variable" (Section 4.7) of
the textbook, where the binary variable now indicates whether or not an individual has received
treatment. In terms of the regression model with a single regressor this is formulated as
$Y_i = \beta_0 + \beta_1 X_i + u_i$,
where $X_i$ is 1 or 0 depending on whether or not the individual received treatment. Then in the case of no
treatment received, $Y_i = \beta_0 + u_i$ and $E(Y_i \mid X_i = 0) = \beta_0$. Alternatively, when treatment was received,
$Y_i = \beta_0 + \beta_1 + u_i$ and $E(Y_i \mid X_i = 1) = \beta_0 + \beta_1$. Hence $\beta_1$ is the difference between the two means. To test
whether or not there is a difference, the hypotheses are
$H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$.
The null hypothesis can be tested using the usual t-statistic and allowing for heteroskedasticity-robust
standard errors. This test corresponds to the test encountered in Section 3.4 of the textbook, where
$H_0: \mu_{\text{treatment}} - \mu_{\text{control}} = 0$ vs. $H_1: \mu_{\text{treatment}} - \mu_{\text{control}} \neq 0$,
and the standard error of the differences in means is calculated under the assumption that the two
population variances are unequal.
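The claim that the OLS coefficient on a binary regressor equals the difference between two group means can be verified numerically. The sketch below uses simulated data with unequal group variances; the sample sizes, means, and variable names are illustrative assumptions, and the t-statistic shown is the differences-in-means statistic with the unequal-variances standard error.

```python
import math
import random

def mean(values):
    return sum(values) / len(values)

def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(2)
control = [rng.gauss(10.0, 1.0) for _ in range(200)]   # X = 0
treated = [rng.gauss(10.5, 3.0) for _ in range(150)]   # X = 1, larger variance

# OLS of Y on a constant and the binary regressor X.
y = control + treated
x = [0.0] * len(control) + [1.0] * len(treated)
x_bar, y_bar = mean(x), mean(y)
beta1_hat = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
beta0_hat = y_bar - beta1_hat * x_bar

# Differences-in-means test allowing the two variances to differ.
difference = mean(treated) - mean(control)
se_difference = math.sqrt(sample_variance(treated) / len(treated)
                          + sample_variance(control) / len(control))
t_statistic = difference / se_difference

print("OLS slope:          ", round(beta1_hat, 6))
print("Difference in means:", round(difference, 6))
print("t-statistic:        ", round(t_statistic, 3))
```

The slope and the difference in means agree up to floating-point error, and the intercept equals the control group mean.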
5) Present alternative estimators for causal effects using experimental data when data is available for a single
period or for two periods. Discuss their advantages and disadvantages.
Answer: There are essentially four estimators discussed in the textbook: two each for a single period randomized
controlled experiment, and two for panel data. For each of these situations, a binary or treatment level
regressor X is used, and additional characteristics can be added, thereby distinguishing the two possible
estimators within the single/panel two periods framework.
The single period estimator of the causal or treatment effect is the OLS estimator $\hat{\beta}_1$ in the regression model
with a single regressor
$Y_i = \beta_0 + \beta_1 X_i + u_i$.
Random selection and assignment ensure that $E(u_i \mid X_i) = 0$. Thus even with omitted variables present,
$E(Y_i \mid X_i) = \beta_0 + \beta_1 X_i$, since X is independently distributed from the omitted variables. The OLS estimator
$\hat{\beta}_1$ [...]
A different estimator, called the differences estimator with additional regressors, is obtained by adding
characteristics for the individual, which are not affected by the treatment. This is done to deal with some
of the threats to validity, but also for efficiency purposes. The multiple regression model in this case is
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$,
and $\hat{\beta}_1$ is the differences estimator with additional regressors. Here $\hat{\beta}_1$ is consistent even if
$E(u_i \mid X_i, W_{1i}, \ldots, W_{ri}) = 0$ does not hold, as long as there is conditional mean independence. In that case, the OLS
estimator is consistent. The inclusion of the characteristics also allows for testing for random receipt of
treatment and random assignment using the usual F-statistic in auxiliary regressions.
The third estimator generalizes the two estimators above to the case of panel data. The idea here is that
data is available for two periods, one before the treatment is administered and one after. The
differences-in-differences estimator is then defined as
$\hat{\beta}_1^{\text{diffs-in-diffs}} = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}}$.
If the treatment is randomly assigned, the estimator is unbiased, consistent, and more efficient than the
differences estimator. In addition, it eliminates pretreatment differences in Y.
Alternatively, it can be viewed in a regression framework
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$
where $\Delta Y_i$ is the value of Y for the ith individual after the experiment is completed, minus the value of Y
before it starts. Then for an individual who did not receive treatment, [...]
6) To analyze the effect of a minimum wage increase, a famous study used a quasi-experiment for two adjacent
states: New Jersey and (Eastern) Pennsylvania. A $\hat{\beta}_1^{\text{diffs-in-diffs}}$ [...]
employment changes per restaurant between the treatment group (New Jersey) and the control group
(Pennsylvania). In addition, the authors provide data on the employment changes between "low wage"
restaurants and "high wage" restaurants in New Jersey only. A restaurant was classified as "low wage" if the
starting wage in the first wave of surveys was at the then prevailing minimum wage of $4.25. A "high wage"
restaurant was a place with a starting wage close to or above the $5.25 minimum wage after the increase.
(a) Explain why employment changes of the high wage and low wage restaurants might constitute a
quasi-experiment. Which is the treatment group and which the control group?
(b) The following information is provided:
             FTE Employment before    FTE Employment after
Low wage     19.56                    20.88
High wage    22.25                    20.21
where FTE is full time equivalent and the numbers are average employment per restaurant.
Calculate the change in the treatment group, the change in the control group, and finally
$\hat{\beta}_1^{\text{diffs-in-diffs}}$. Since minimum wages represent a price floor, did you expect
$\hat{\beta}_1^{\text{diffs-in-diffs}}$ to be positive or negative?
(c) The standard error for $\hat{\beta}_1^{\text{diffs-in-diffs}}$ [...]
7) Specify the multiple regression model that contains the difference-in-difference estimator (with additional
regressors). Explain the circumstances under which this model is preferable to the simple
difference-in-difference estimator. Explain how the Ws can be used to test for randomization. How does the
interpretation of the W variables change compared to the differences estimator with additional regressors?
Answer: The differences-in-differences estimator with additional regressors is
$\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$, i = 1, ..., n.
This is more general than the differences-in-differences estimator
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$
which equals
$\hat{\beta}_1^{\text{diffs-in-diffs}} = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}}$.
Since in some applications the assumption $E(u_i \mid X_i, W_{1i}, \ldots, W_{ri}) = 0$ is not likely to hold, the
differences-in-differences estimator will not be consistent. However, the differences-in-differences
estimator with additional regressors will be consistent under the weaker assumption of conditional mean
independence. Including the additional characteristics (W variables) also can improve efficiency. Furthermore,
adding these variables allows the researcher to perform tests for randomization, since both $X_i$ and the
assignment $Z_i$ should be uncorrelated with the W variables. Regressing $X_i$ on $W_{1i}, \ldots, W_{ri}$ and using an
F-test for the hypothesis that all coefficients on the Ws are zero constitutes a test for the random receipt of
treatment. Performing a similar regression of the assignment $Z_i$ on the Ws with an accompanying F-test
is a test for random assignment. Obviously if treatment and assignment were randomly determined,
then neither should be dependent on characteristics of the entities.
The dependent variable in the case of the differences estimator is a level, while in the case of the
differences-in-differences estimator it is a change. Hence W affects the change in the latter case, not the
level itself.
8) Let the vertical axis of a figure indicate the average employment in fast food restaurants. There are two time
periods, t = 1 and t = 2, where time period is measured on the horizontal axis. The following table presents
average employment levels per restaurant for New Jersey (the treatment group) and Eastern Pennsylvania (the
control group).
        FTE Employment before    FTE Employment after
PA      23.33                    21.17
NJ      20.44                    21.03
Enter the four points in the figure and label them $\bar{Y}^{\text{treatment,before}}$, $\bar{Y}^{\text{treatment,after}}$, $\bar{Y}^{\text{control,before}}$, and
$\bar{Y}^{\text{control,after}}$. Connect the points. Finally calculate and indicate the value for $\hat{\beta}_1^{\text{diffs-in-diffs}}$.
Answer: [Figure not reproduced.] $\hat{\beta}_1^{\text{diffs-in-diffs}} = \Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}}$ [...]
9) (Requires Appendix material) Discuss how the differences-in-differences estimator can be extended to
multiple time periods. In particular, assume that there are n individuals and T time periods. What do the
individual and time effects control for?
Answer: The extension of the differences-in-differences estimator to multiple time periods uses the differences
estimator for a single period, and adds binary variables for entity and time fixed effects. As with the
differences estimator and the differences-in-differences estimator, additional regressors W for
characteristics can be added. Without these characteristics, the population regression model is as follows
$Y_{it} = \beta_0 + \beta_1 X_{it} + \gamma_2 D2_i + \cdots + \gamma_n Dn_i + \delta_2 B2_t + \cdots + \delta_T BT_t + v_{it}$
with i = 1, ..., n entities and t = 1, ..., T time periods.
The entity effects control for unobserved variables that remain constant over time for the same entity,
and the time effects control for unobserved variables that are the same for all individuals at a point in
time. Examples of time fixed effects could be business cycle conditions or macroeconomic conditions in
general. Examples of entity fixed effects might be gender, race, years of previous education, etc. The
model simplifies to the differences-in-differences regression model for two periods (T = 2). If W
variables are added, then these can also be interacted with the time effect binary variables. The major
advantage over the differences-in-differences model is that effects can be traced out over time.
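The role of the entity and time effects can be illustrated with simulated data. The following is a sketch under stated assumptions, not the textbook's computation: the panel is balanced, the numbers are made up, and instead of adding binary variables explicitly it demeans the data by entity and by period, which yields the same coefficient on X as the dummy-variable regression when the panel is balanced.

```python
import random

def two_way_demean(panel):
    """Subtract entity means and time means (adding back the grand mean)
    from a balanced panel stored as panel[i][t]."""
    n, T = len(panel), len(panel[0])
    entity_mean = [sum(row) / T for row in panel]
    time_mean = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    grand_mean = sum(entity_mean) / n
    return [[panel[i][t] - entity_mean[i] - time_mean[t] + grand_mean
             for t in range(T)] for i in range(n)]

def flatten(panel):
    return [v for row in panel for v in row]

def ols_slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    return (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
            / sum((a - x_bar) ** 2 for a in x))

rng = random.Random(3)
n_entities, n_periods, true_effect = 100, 4, 1.5
time_effect = [0.0, 0.3, 0.6, 0.9]       # common to all entities in period t
x_panel, y_panel = [], []
for i in range(n_entities):
    treated = i < n_entities // 2
    # Treated entities have systematically higher unobserved levels.
    entity_effect = rng.gauss(3.0 if treated else 0.0, 1.0)
    x_row = [1.0 if (treated and t >= 2) else 0.0 for t in range(n_periods)]
    y_row = [entity_effect + time_effect[t] + true_effect * x_row[t]
             + rng.gauss(0, 0.5) for t in range(n_periods)]
    x_panel.append(x_row)
    y_panel.append(y_row)

pooled_estimate = ols_slope(flatten(x_panel), flatten(y_panel))
fixed_effects_estimate = ols_slope(flatten(two_way_demean(x_panel)),
                                   flatten(two_way_demean(y_panel)))
print("Pooled OLS, no fixed effects:", round(pooled_estimate, 3))
print("Entity and time effects:     ", round(fixed_effects_estimate, 3))
```

The pooled estimate absorbs the higher level of the treated entities and the common time trend, while the estimate with entity and time effects stays close to the true effect of 1.5.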
10) The New Jersey-Pennsylvania study on the effect of minimum wages on employment mentioned in your
textbook used a comparison in means "before" and "after" analysis. The difference-in-difference estimate
turned out to be 2.76 with a standard error of 1.36.
The authors also used a difference-in-differences estimator with additional regressors of the type
$\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$
where i = 1, ..., 410. X is a binary variable taking on the value one for the 331 observations in New Jersey. Since
the authors looked at Burger King, KFC, Wendy's, and Roy Rogers fast food restaurants and the restaurant
could be company owned, four W-variables were added.
(a) Given that there are four chains and the possibility of company ownership, why did the authors not
include five W-variables?
(b) OLS estimation resulted in $\hat{\beta}_1$ of 2.30 with a standard error of 1.20. Test for statistical significance and
specify the alternative hypothesis.
(c) Why is this estimate different from the number calculated from $\Delta\bar{Y}^{\text{treatment}} - \Delta\bar{Y}^{\text{control}} = 2.76$? What is the
advantage of employing this estimator over the simple difference-in-difference estimator?
Answer: (a) Including a fifth W-variable would have resulted in perfect multicollinearity.
(b) The t-statistic is +1.92. If the alternative hypothesis was $H_1: \beta_1 < 0$, then you cannot reject the null
hypothesis. If the alternative hypothesis was $H_1: \beta_1 \neq 0$, then you cannot reject the null hypothesis at the
5% level, although you can at the 10% level. The choice of alternative hypothesis depends on prior
expectations, and standard economic theory would suggest $H_1: \beta_1 < 0$.
(c) The difference is small in terms of the standard error and may be due to sample variation. Although
the difference-in-difference estimator is consistent, the difference-in-difference estimator with
additional regressors can be more efficient. It is different because it stems from using the multiple
regression model
$\Delta Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_{1i} + \cdots + \beta_{1+r} W_{ri} + u_i$, i = 1, ..., n
rather than the regression with a single regressor
$\Delta Y_i = \beta_0 + \beta_1 X_i + u_i$, i = 1, ..., n
and $E(u_i \mid X_i, W_{1i}, \ldots, W_{ri}) = 0$ may not hold. In that case, $\hat{\beta}_1$ is consistent as long as there is conditional
mean independence. The inclusion of the characteristics also allows for testing for random receipt of
treatment and random assignment using the usual F-statistic in auxiliary regressions.
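The test in part (b) can be reproduced numerically. This sketch assumes the large-sample standard normal approximation for the t-statistic, which seems reasonable with 410 observations; the helper name `normal_cdf` is an illustrative choice.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

beta1_hat = 2.30
standard_error = 1.20
t_statistic = beta1_hat / standard_error

# Two-sided alternative H1: beta1 != 0
p_two_sided = 2.0 * (1.0 - normal_cdf(abs(t_statistic)))
# One-sided alternative H1: beta1 < 0 (rejects only for large negative t)
p_one_sided_negative = normal_cdf(t_statistic)

print("t-statistic:", round(t_statistic, 2))
print("p-value, two-sided:", round(p_two_sided, 4))
print("p-value, H1: beta1 < 0:", round(p_one_sided_negative, 4))
```

The two-sided p-value lies between 0.05 and 0.10, matching the statement that the null hypothesis is rejected at the 10% level but not at the 5% level, while the one-sided test against a negative coefficient cannot reject.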
8) One reason for computing the logarithms (ln), or changes in logarithms, of economic time series is that
A) numbers often get very large.
B) economic variables are hardly ever negative.
C) they often exhibit growth that is approximately exponential.
D) natural logarithms are easier to work with than base 10 logarithms.
Answer: C
9) The jth autocorrelation coefficient is defined as
A) $\frac{cov(Y_t, Y_{t-1})}{\sqrt{var(Y_t)\,var(Y_{t-1})}}$.
B) $\frac{cov(Y_t, Y_{t-j-1})}{\sqrt{var(Y_t)\,var(Y_{t-j})}}$.
C) $\frac{cov(Y_t, u_t)}{\sqrt{var(Y_t)\,var(u_t)}}$.
D) $\frac{cov(Y_t, Y_{t-j})}{\sqrt{var(Y_t)\,var(Y_{t-j})}}$.
Answer: D
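A sample counterpart of the jth autocorrelation can be computed as follows. This is a simplified sketch: it uses the overall sample mean and the sample variance of the whole series in the denominator, which relies on the variance being the same in every period, and the alternating series is a made-up example of a variable whose changes are negatively autocorrelated.

```python
def sample_autocorrelation(y, j):
    """Sample autocovariance at lag j divided by the sample variance."""
    T = len(y)
    y_bar = sum(y) / T
    autocovariance = sum((y[t] - y_bar) * (y[t - j] - y_bar)
                         for t in range(j, T)) / T
    variance = sum((v - y_bar) ** 2 for v in y) / T
    return autocovariance / variance

# Hypothetical series of changes that alternates in sign every period.
changes = [1.0, -1.0] * 10

rho_1 = sample_autocorrelation(changes, 1)
rho_2 = sample_autocorrelation(changes, 2)
print("First autocorrelation: ", rho_1)
print("Second autocorrelation:", rho_2)
```

The first autocorrelation is strongly negative: an increase in one period is followed by a decrease in the next.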
10) Negative autocorrelation in the change of a variable implies that
A) the variable contains only negative values.
B) the series is not stable.
C) an increase in the variable in one period is, on average, associated with a decrease in the next.
D) the data is negatively trended.
Answer: C
11) An autoregression is a regression
A) of a dependent variable on lags of regressors.
B) that allows for the errors to be correlated.
C) model that relates a time series variable to its past values.
D) to predict sales in a certain industry.
Answer: C
12) The root mean squared forecast error (RMSFE) is defined as
A) $E\,\lvert Y_T - \hat{Y}_{T|T-1} \rvert$.
B) $\sqrt{E[(Y_T - \hat{Y}_{T|T-1})^2]}$.
C) $(Y_T - \hat{Y}_{T|T-1})^2$.
D) $E(Y_T - \hat{Y}_{T|T-1})$.
Answer: B
13) One of the sources of error in the RMSFE in the AR(1) model is
A) the error in estimating the coefficients $\beta_0$ and $\beta_1$.
B) due to measuring variables in logarithms.
C) that the value of the explanatory variable is not known with certainty when making a forecast.
D) the model only looks at the previous period's value of Y when the entire history should be taken into
account.
Answer: A
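[...]
B) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{2}{T}$
C) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] - (p+1)\frac{\ln(T)}{T}$
D) $BIC(p) = \ln\left[\frac{SSR(p)}{T}\right] \times (p+1)\frac{\ln(T)}{T}$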
Answer: A
27) The Akaike Information Criterion (AIC) is given by the following formula
A) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{\ln(T)}{T}$
B) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + (p+1)\frac{2}{T}$
C) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] + \frac{p+2}{T}$
D) $AIC(p) = \ln\left[\frac{SSR(p)}{T}\right] \times (p+1)\frac{2}{T}$
Answer: B
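The two information criteria can be compared on a small example. The sketch below implements the AIC formula from the correct option above and the standard BIC formula with penalty (p+1) ln(T)/T; the SSR values and sample size are hypothetical numbers chosen only to show that the heavier BIC penalty can select a shorter lag length.

```python
import math

def bic(ssr, p, T):
    """Bayes information criterion for an autoregression with p lags."""
    return math.log(ssr / T) + (p + 1) * math.log(T) / T

def aic(ssr, p, T):
    """Akaike information criterion for an autoregression with p lags."""
    return math.log(ssr / T) + (p + 1) * 2.0 / T

# Hypothetical sums of squared residuals for AR(p), p = 0, ..., 4.
T = 100
ssr_by_lag = {0: 120.0, 1: 80.0, 2: 78.0, 3: 77.5, 4: 77.3}

bic_values = {p: bic(ssr, p, T) for p, ssr in ssr_by_lag.items()}
aic_values = {p: aic(ssr, p, T) for p, ssr in ssr_by_lag.items()}
lag_chosen_by_bic = min(bic_values, key=bic_values.get)
lag_chosen_by_aic = min(aic_values, key=aic_values.get)

for p in ssr_by_lag:
    print(p, round(bic_values[p], 4), round(aic_values[p], 4))
print("BIC chooses p =", lag_chosen_by_bic)
print("AIC chooses p =", lag_chosen_by_aic)
```

Because ln(T) exceeds 2 for this sample size, each extra lag is penalized more heavily by the BIC, so in this example it settles on one lag while the AIC picks two.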
28) The BIC is a statistic
A) commonly used to test for serial correlation
B) only used in cross-sectional analysis
C) developed by the Bank of England in its river of blood analysis
D) used to help the researcher choose the number of lags in an autoregression
Answer: D
29) The AIC is a statistic
A) that is used as an alternative to the BIC when the sample size is small (T < 50)
B) often used to test for heteroskedasticity
C) used to help a researcher choose the number of lags in a time series with multiple predictors
D) all of the above
Answer: C
30) The formulae for the AIC and the BIC are different. The
A) AIC is preferred because it is easier to calculate
B) BIC is preferred because it is a consistent estimator of the lag length
C) difference is irrelevant in practice since both information criteria lead to the same conclusion
D) AIC will typically underestimate p with non-zero probability
Answer: B
Lag    Unemployment Rate    Change of Unemployment Rate
1      0.97                 0.62
2      0.92                 0.32
3      0.83                 0.12
4      0.75                 -0.07
(b) The accompanying table gives changes in the United States aggregate unemployment rate for the period
1999:I-2000:I and levels of the current and lagged unemployment rates for 1999:I. Fill in the blanks for the
missing unemployment rate levels.
Changes in Unemployment Rates in the United States
First Quarter 1999 to First Quarter 2000

Quarter     Change in Unemployment Rate
1999:I      -0.1
1999:II     0.0
1999:III    -0.1
1999:IV     -0.1
2000:I      -0.1
(c) You decide to estimate an AR(1) in the change in the United States unemployment rate to forecast the
aggregate unemployment rate. The result is as follows:
$\Delta UrateUS_t = \underset{(0.022)}{-0.003} + \underset{(0.106)}{0.621}\,\Delta UrateUS_{t-1}$
The AR(1) coefficient for the change in the inflation rate was 0.211 and the regression R2 was 0.04. What does
the difference in the results suggest here?
(d) The textbook used the change in the log of the price level to approximate the inflation rate, and then
predicted the change in the inflation rate. Why aren't logarithms used here?
(e) If much of the forecast error arises as a result of future error terms dominating the error resulting from
estimating the unknown coefficients, then what is your best guess of the RMSFE here?
(f) The actual unemployment rate during the fourth quarter of 1999 is 4.1 percent, and it decreased from the
third quarter to the fourth quarter by 0.1 percentage points. What is your forecast for the unemployment rate level in the
first quarter of 2000?
(g) You want to see how sensitive your forecast is to changes in the specification. Given that you have
estimated the regression with quarterly data, you consider an AR(4) model. This results in the following output
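$\Delta UrateUS_t = -0.005 + 0.663\,\Delta UrateUS_{t-1} - 0.082\,\Delta UrateUS_{t-2} + 0.106\,\Delta UrateUS_{t-3} - 0.176\,\Delta UrateUS_{t-4}$
(0.022) (0.125) (0.139) (0.091)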
What is your forecast for the unemployment rate level in 2000:I? Compare the forecast error of the AR(4) model
with the forecast error of the AR(1) model.
(h) There does not seem to be much difference in the forecast of the unemployment rate level, whether you use
the AR(1) or the AR(4). Given the various information criteria and the regression R2 below, which model
should you use for forecasting?
p    BIC      AIC       R2
0    0.604    0.624     0.000
1    0.158    0.1181    0.393
2    0.185    0.125     0.397
3    0.217    0.138     0.400
4    0.218    0.1183    0.416
5    0.249    0.130     0.417
6    0.277    0.138     0.420
Answer: (a) There is a very strong positive autocorrelation for the unemployment rate level. The 1st to 4th
autocorrelation coefficients are even higher than for the inflation rate. This suggests that a high (low) level
of the unemployment rate will persist for quite a while. Although the autocorrelations decline, they are
still high even at lag 4. This reflects the long-term trends in unemployment rates. If during a given
quarter in the 1960s or the 1990s the unemployment rate was low, then it was also low in the following
quarter. If the unemployment rate was high in a given quarter, as it was in the early 1980s, then it was
also high in the following quarter. Different from the inflation rate results discussed in the text, the
change in the unemployment rate also shows positive autocorrelations. Furthermore, these are quite
large for the first lag. Eventually, after a year, they turn negative. Hence an increase (decrease) in the
unemployment rate is followed typically by an increase (decrease) in the following quarters, before the
process reverses itself.
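(b) The levels follow from the 1999:IV level of 4.1 percent and the quarterly changes.
Changes in Unemployment Rates in the United States from the
First Quarter 1999 to the First Quarter 2000

Quarter     Unemployment Rate    First Lag    Change in Unemployment Rate
1999:I      4.3                  4.4          -0.1
1999:II     4.3                  4.3          0.0
1999:III    4.2                  4.3          -0.1
1999:IV     4.1                  4.2          -0.1
2000:I      4.0                  4.1          -0.1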
(c) There is a higher persistence in the change of unemployment rate than in the change of the inflation
rate. The higher regression R2 means that almost 40 percent of the variation in the change of the
unemployment rate can be explained by a single regressor, namely its lag. Students may recall Figure
12.1 from the textbook, which shows a much smoother behavior for the levels, and hence the differences,
for the unemployment rate.
(d) The change of the log of the price level was used to convert a level variable (prices) into a change of
its growth rate. Unemployment is already measured as a rate in the above example. Hence differencing
the variable results in a change in the rate.
Stock/Watson 2e -- CVC2 8/23/06 -- Page 338
(e) In this situation, the SER approximates the RMSFE. In the case of the change of the unemployment
rate, it is 0.255 percentage points.
(f) $UrateUS_{1999:IV} = 4.1$ and $\Delta UrateUS_{1999:IV} = -0.1$, so the predicted change in the unemployment rate
from 1999:IV to 2000:I is $-0.003 + 0.621 \times (-0.1) = -0.065$, or $-0.1$ rounded. The forecasted unemployment
rate for 2000:I is $\widehat{UrateUS}_{2000:I|1999:IV} = UrateUS_{1999:IV} + \Delta\widehat{UrateUS}_{2000:I|1999:IV} = 4.1\% - 0.1\% = 4.0\%$.
The model therefore forecasts a slight decrease in the unemployment rate.
(g) $\Delta\widehat{UrateUS}_{2000:I|1999:IV} = -0.005 + 0.663 \times (-0.1) - 0.082 \times (-0.1) + 0.106 \times 0.0 - 0.176 \times (-0.1) = -0.046$.
(Students may suggest a forecast of -0.1 or 0.0. The answer will proceed with 0.0.) The
corresponding forecast for the unemployment rate in 2000:I is then 4.1% + 0.0% = 4.1%. The forecast
error for the AR(4) model is 4.0% - 4.1% = -0.1%, whereas the rounded AR(1) forecast of 4.0% has a forecast
error of 0.0%. Without rounding, the two forecasts are 4.03% and 4.05%, so the forecast errors differ very little.
(h) Close call, but both the BIC and the AIC favor the AR(1) over the AR(4). (The F-test statistic for
restricting the AR(4) to an AR(1) is 1.49 with a p-value of 0.21.)
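The forecasts in parts (f) and (g) can be checked directly from the estimated coefficients and the table of changes. The sketch below is only an arithmetic check under the stated numbers; the function and variable names are hypothetical.

```python
# Quarterly changes in the U.S. unemployment rate, taken from the table in part (b).
changes = {"1999:I": -0.1, "1999:II": 0.0, "1999:III": -0.1, "1999:IV": -0.1}
level_1999q4 = 4.1  # level given in part (f)

# Most recent change first: lag 1, lag 2, lag 3, lag 4.
lags = [changes["1999:IV"], changes["1999:III"], changes["1999:II"], changes["1999:I"]]


def ar_forecast(intercept, coefs, lagged_changes):
    """One-step-ahead forecast of the change from an AR(p) in changes."""
    return intercept + sum(c * x for c, x in zip(coefs, lagged_changes))


# AR(1) from part (c) and AR(4) coefficients from the answer to part (g).
ar1_change = ar_forecast(-0.003, [0.621], lags)
ar4_change = ar_forecast(-0.005, [0.663, -0.082, 0.106, -0.176], lags)

# Forecast of the level is last level plus forecasted change.
ar1_level = level_1999q4 + ar1_change
ar4_level = level_1999q4 + ar4_change
```

Both predicted changes are negative and close to each other, which is why the choice between the AR(1) and the AR(4) matters little for the level forecast.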
2) You have collected quarterly data on Canadian unemployment (UrateC) and inflation (InfC) from 1962 to 1999
with the aim to forecast Canadian inflation.
(a) To get a better feel for the data, you first inspect the plots for the series.
Inspecting the Canadian inflation rate plot and having calculated the first autocorrelation to be 0.79 for the
sample period, do you suspect that the Canadian inflation rate has a stochastic trend? What more formal
methods do you have available to test for a unit root?
(b) You run the following regression, where the numbers in parentheses are homoskedasticity-only standard
errors:
$\Delta InfC_t = \underset{(0.28)}{0.49} - \underset{(0.05)}{0.10}\,InfC_{t-1} - \underset{(0.09)}{0.39}\,\Delta InfC_{t-1} - \underset{(0.09)}{0.33}\,\Delta InfC_{t-2} - \underset{(0.09)}{0.21}\,\Delta InfC_{t-3} + \underset{(0.08)}{0.05}\,\Delta InfC_{t-4}$
Test for the presence of a stochastic trend. Should you have used heteroskedasticity-robust standard errors?
Does the fact that you use quarterly data suggest including four lags in the above regression, or how should
you determine the number of lags?
(c) To forecast the Canadian inflation rate for 2000:I, you estimate an AR(1), AR(4), and an ADL(4,1) model for
the sample period 1962:I to 1999:IV. The results are as follows:
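AR(1): $\Delta InfC_t = \underset{(0.014)}{0.002} - \underset{(0.10)}{0.31}\,\Delta InfC_{t-1}$
AR(4): $\Delta InfC_t = \underset{(0.158)}{0.021} - \underset{(0.10)}{0.46}\,\Delta InfC_{t-1} - \underset{(0.11)}{0.39}\,\Delta InfC_{t-2} - \underset{(0.08)}{0.25}\,\Delta InfC_{t-3} + \underset{(0.07)}{0.03}\,\Delta InfC_{t-4}$
ADL(4,1): $\Delta InfC_t = \underset{(0.57)}{1.279} - \underset{(0.10)}{0.51}\,\Delta InfC_{t-1} - \underset{(0.11)}{0.44}\,\Delta InfC_{t-2} - \underset{(0.09)}{0.30}\,\Delta InfC_{t-3} - \underset{(0.08)}{0.02}\,\Delta InfC_{t-4} - \underset{(0.07)}{0.16}\,UrateC_{t-1}$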
In addition, you have the following information on inflation in Canada during the four quarters of 1999 and the
first quarter of 2000:
Inflation and Unemployment in Canada, First Quarter 1999 to First Quarter 2000

Quarter     Unemployment Rate    Rate of Inflation at an       First Lag          Change in Inflation
            ($UrateC_t$)         Annual Rate ($Inf_t$)         ($Inf_{t-1}$)      ($\Delta Inf_t$)
1999:I      7.7                  0.8                           0.8                0.0
1999:II     7.9                  4.3                           0.8                3.5
1999:III    7.7                  2.9                           4.3                -1.4
1999:IV     7.0                  1.3                           2.9                -1.5
2000:I      6.8                  2.1                           1.3                0.8
For each of the three models, calculate the predicted inflation rate for the period 2000:I and the forecast error.
(d) Perform a test on whether or not Canadian unemployment rates Granger-cause the Canadian inflation rate.
Answer: (a) A small autocorrelation coefficient together with a time series plot which displays no apparent trend
suggest the absence of a stochastic trend. Here the first autocorrelation coefficient is fairly high and the
figure displays long-run swings similar to the U.S. figure discussed in the textbook. To test for a
stochastic trend using more formal methods requires use of the Dickey-Fuller test, or better, the
augmented Dickey-Fuller test.
(b) The t-statistic on the lagged inflation rate level is (-2.00). The critical value for the ADF statistic is
(-2.57) at the 10% level. Hence you cannot reject the null hypothesis of a unit root. The ADF statistic
requires computation using homoskedasticity-only standard errors. Hence heteroskedasticity-robust
standard errors should not be used. The number of lags included should be determined using the AIC
information criterion, rather than the BIC, since it results in a better performance of the ADF statistic
in finite samples. (As with the U.S. data used in the textbook, this results in a chosen lag length of three.
The ADF statistic in that case is (-1.91), which is still smaller in absolute value than the critical value at the 10% level.)
(c) The forecasted change $\Delta\widehat{InfC}_{2000:I|1999:IV}$ for the various models is:
AR(1): $0.002 - 0.31 \times (-1.5) = 0.467 \approx 0.5$;
AR(4): $0.021 - 0.46 \times (-1.5) - 0.39 \times (-1.4) - 0.25 \times 3.5 + 0.03 \times 0.0 = 0.382 \approx 0.4$;
ADL(4,1): $1.279 - 0.51 \times (-1.5) - 0.44 \times (-1.4) - 0.30 \times 3.5 - 0.02 \times 0.0 - 0.16 \times 7.0 = 0.49 \approx 0.5$.
The forecasted level $\widehat{InfC}_{2000:I|1999:IV}$ then is: 1.3 + 0.5 = 1.8 (AR(1)); 1.3 + 0.4 = 1.7 (AR(4)); 1.3 + 0.5 = 1.8 (ADL(4,1)).
The forecast error is: 0.3 (AR(1)); 0.4 (AR(4)); 0.3 (ADL(4,1)).
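The three forecasts can be reproduced from the coefficients in part (c) and the 1999 rows of the table. This is a small arithmetic sketch; the variable names are hypothetical and the inputs are the rounded values printed above.

```python
# Changes in Canadian inflation, most recent first (1999:IV, III, II, I).
d_inf = [-1.5, -1.4, 3.5, 0.0]
urate_1999q4 = 7.0   # lagged unemployment rate used by the ADL(4,1)
inf_1999q4 = 1.3     # inflation in 1999:IV
inf_2000q1 = 2.1     # actual inflation in 2000:I


def predicted_change(intercept, ar_coefs, lagged_changes, extra=0.0):
    """Intercept plus AR terms plus any additional regressor contribution."""
    return intercept + sum(c * x for c, x in zip(ar_coefs, lagged_changes)) + extra


ar1 = predicted_change(0.002, [-0.31], d_inf)
ar4 = predicted_change(0.021, [-0.46, -0.39, -0.25, 0.03], d_inf)
adl41 = predicted_change(1.279, [-0.51, -0.44, -0.30, -0.02], d_inf,
                         extra=-0.16 * urate_1999q4)

# Forecast error = actual level minus forecasted level.
errors = {name: inf_2000q1 - (inf_1999q4 + change)
          for name, change in [("AR(1)", ar1), ("AR(4)", ar4), ("ADL(4,1)", adl41)]}
```

Working with unrounded predicted changes gives forecast errors of roughly 0.3 to 0.4 for all three models, in line with the rounded figures in the answer.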
(d) Since the ADL(4,1) only included the lagged unemployment rate, the t-statistic replaces the
F-statistic typically used for this test. The t-statistic is (-2.256) and the F-statistic is $(-2.256)^2 \approx 5.09$. Both
are statistically significant at the 5% level with a p-value of 0.026. Hence the null hypothesis that the
unemployment rate does not Granger-cause the inflation rate is rejected.
3) There is some evidence that the Phillips curve has been unstable during the 1962 to 1999 period for the United
States, and in particular during the 1990s. You set out to investigate whether or not this instability also
occurred in other places. Canada is a particularly interesting case, due to its proximity to the United States and
the fact that many features of its economy are similar to that of the U.S.
(a) Reading up on some of the comparative economic performance literature, you find that Canadian
unemployment rates were roughly the same as U.S. unemployment rates from the 1920s to the early 1980s. The
accompanying figure shows that a gap opened between the unemployment rates of the two countries in 1982,
which has persisted to this date.
Inspection of the graph and data suggest that the break occurred during the second quarter of 1982. To
investigate whether the Canadian Phillips curve shows a break at that point, you estimate an ADL(4,4) model
for the sample period 1962:I-1999:IV and perform a Chow test. Specifically you postulate that the constant and
coefficients of the unemployment rates changed at that point. The F-statistic is 1.96. Find the critical value from
the F-table and test the null hypothesis that no break occurred at that time. Is there any reason why you should
be skeptical about the result regarding the break and using the Chow test to detect it?
(b) You consider alternative ways to test for a break in the relationship. The accompanying figure shows the
F-statistics testing for a break in the ADL(4,4) equation at different dates.
The QLR-statistic with 15% trimming is 3.11. Comment on the figure and test for the hypothesis of a break in
the ADL(4,4) regression.
(c) To test for the stability of the Canadian Phillips curve in the 1990s, you decide to perform pseudo
out-of-sample forecasting. For the 24 quarters from 1994:I-1999:IV you use the ADL(4,4) model to calculate the
forecasted change in the inflation rate, the resulting forecasted inflation rate, and the forecast error. The
standard error of the ADL(4,4) for the estimation sample period 1962:1-1993:4 is 1.91 and the sample RMSFE is
1.70. The average forecast error for the 24 inflation rates is 0.003 and the sample standard deviation of the
forecast errors is 0.82. Calculate the t-statistic and test the hypothesis that the mean out-of-sample forecast
error is zero. Comment on the result and the accompanying figure of the actual and forecasted inflation rate.
Answer: (a) The critical value from the $F_{5,\infty}$ distribution is 1.85 at the 10% significance level, and 2.21 at the 5%
significance level. (The p-value is actually 0.088.) Hence, at the 10% significance level, you can reject
the null hypothesis that the constant and the four lagged unemployment rate coefficients remained
constant over the entire sample period, which suggests that a break occurred in 1982:2. There is not
sufficient evidence to reject the null hypothesis at the 5% significance level. However, the text
emphasizes that [preliminary] estimation of the break date means that the usual F critical values cannot
be used for the Chow test for a break at that date. This applies to the above example since the series was
analyzed before testing.
(b) The critical value for the QLR(5) statistic with 15% trimming is 3.26 at the 10% level. Hence you
cannot reject the null hypothesis of no break in the regression. Except for the peak at the end of 1982 and
the beginning of 1983, the F-statistic does not really come close to the critical value.
(c) The average forecast error is very small. The t-statistic is
$t = \frac{0.003}{0.82/\sqrt{24}} = 0.018$
and therefore you cannot reject the hypothesis that the mean out-of-sample forecast error is zero. Indeed, you
get the same impression from the graph, which shows that there are very few periods of systematically
too large or small inflation rate forecasts. The conclusion is that the Canadian Phillips curve has done
well as a model for forecasting at the end of the sample. This result is quite different from the results in
the textbook for the U.S. Phillips curve.
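The t-statistic in part (c) is the mean forecast error divided by its standard error, where the standard error is the sample standard deviation divided by the square root of the number of forecasts. A minimal check with the numbers stated in the question (variable names are hypothetical):

```python
import math

mean_error = 0.003   # average pseudo out-of-sample forecast error
sd_error = 0.82      # sample standard deviation of the forecast errors
n_forecasts = 24     # 1994:I to 1999:IV

standard_error = sd_error / math.sqrt(n_forecasts)
t_stat = mean_error / standard_error

# Two-sided test at the 5% level using the normal critical value.
reject_zero_mean = abs(t_stat) > 1.96
```

The statistic is far below any conventional critical value, so the hypothesis of a zero mean forecast error is not rejected.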
4) You collect monthly data on the money supply (M2) for the United States from 1962:1-2002:4 to forecast future
money supply behavior.
where LM2 and DLM2 are the log level and growth rate of M2.
(a) Using quarterly data, when analyzing inflation and unemployment in the United States, the textbook
converted log levels of variables into growth rates by differencing the log levels, and then multiplying these by
400. Given that you have monthly data, how would you proceed here?
(b) How would you go about testing for a stochastic trend in LM2 and DLM2? Be specific about how to decide
the number of lags to be included and whether or not to include a deterministic trend in your test. The textbook
found the (quarterly) inflation rate to have a unit root. Does this have any effect on your expectation about
whether or not the (monthly) money growth rate should be stationary?
(c) You decide to conduct an ADF unit root test for LM2, DLM2, and the change in the growth rate $\Delta$DLM2.
This results in the following t-statistic on the parameter of interest.

Series           Specification    t-statistic
LM2              with trend       -0.505
DLM2             without trend    -4.100
DLM2             with trend       -4.592
$\Delta$DLM2     without trend    -8.897
Find the critical value at the 1%, 5%, and 10% level and decide which of the coefficients is significant. What is
the alternative hypothesis?
(d) In forecasting the money growth rate, you add lags of the monetary base growth rate (DLMB) to see if you
can improve on the forecasting performance of a chosen AR(10) model in DLM2. You perform a Granger
causality test on the 9 lags of DLMB and find an F-statistic of 2.31. Discuss the implications.
(e) Curious about the result in the previous question, you decide to estimate an ADL(10,10) for DLMB and
calculate the F-statistic for the Granger causality test on the 9 lag coefficients of DLM2. This turns out to be 0.66.
Discuss.
(f) Is there any a priori reason for you to be skeptical of the results? What other tests should you perform?
Answer: (a) To annualize monthly growth rates, you would need to multiply them by 1,200. The annualized
growth rate of money would be $1200 \times \Delta LM2_t$, where $LM2_t = \ln(M2_t)$.
(b) The ADF statistic should be calculated to test for the presence of a unit root in each of the series. The
BIC information criterion can be used to determine the lag length, and homoskedasticity-only standard
errors, rather than heteroskedasticity-robust standard errors, should be considered for the regression.
Studies of the finite-sample properties of unit root tests have shown that it is better to use the AIC
criterion although it overestimates the lag length on average. Given that money growth determines the
inflation rate in the long-run, your expectation would be to also find a unit root for money growth.
(c) LM2 contains a time trend, and hence the critical values for an intercept and a time trend are relevant.
These are (-3.96), (-3.41), and (-3.12) for the three significance levels respectively. Hence you cannot
reject the null hypothesis of a unit root for LM2. The growth rate of money does not have a time trend
for the entire sample period, so the intercept-only critical values should be used. These are (-3.43),
(-2.86), and (-2.57) respectively. Hence you are able to reject the null hypothesis of a unit root for money
growth at the 1% significance level. The alternative hypothesis is that there is no unit root. However, failure to
reject the null hypothesis only means that there is insufficient evidence to conclude that it is false.
(d) The critical value for the null hypothesis that monetary base growth rates do not Granger cause money
supply growth rates is $F_{9,\infty}$ = 1.88 at the 5% significance level, and 2.41 at the 1% significance level.
Hence you can reject the null hypothesis at the 5% level, but not at the 1% level.
(e) In this situation, you cannot reject the null hypothesis that the money supply growth does not
Granger cause monetary base growth. This makes sense if the Federal Reserve uses monetary base
growth as an instrument and money supply growth is not a target.
(f) It is somewhat surprising to find money growth not to contain a unit root when the inflation rate does.
It is also possible that the relationship has changed over time, as money markets have been liberalized
during the sample period. Hence it would help to test for breaks using the QLR statistic and pseudo
out-of-sample forecasts.
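The scaling in part (a) generalizes: a log difference is multiplied by 100 times the number of periods per year, so 400 for quarterly data and 1,200 for monthly data. The sketch below uses made-up M2 levels that grow by 0.5 percent per month; the function name is hypothetical.

```python
import math


def annualized_growth(levels, periods_per_year):
    """Annualized percentage growth: 100 * periods_per_year * change in ln(level)."""
    scale = 100 * periods_per_year
    return [scale * (math.log(b) - math.log(a)) for a, b in zip(levels, levels[1:])]


# Hypothetical monthly money supply levels growing 0.5% per month.
m2_monthly = [1000.0, 1005.0, 1010.025]

growth_monthly_data = annualized_growth(m2_monthly, periods_per_year=12)
```

Each entry is about 5.985, that is, roughly a 6 percent annual rate, which is what 0.5 percent per month compounds to approximately.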
5) Having learned in macroeconomics that consumption depends on disposable income, you want to determine
whether or not disposable income helps predict future consumption. You collect data for the sample period
1962:I to 1995:IV and plot the two variables.
(a) To determine whether or not past values of personal disposable income growth rates help to predict
consumption growth rates, you estimate the following relationship.
LnCt = 1.695 + 0.126 LnCt-1 + 0.153 LnCt-2 ,
(0.484) (0.099)
(0.103)
+ 0.294
(0.103)
+ 0.088
(0.076)
LnYt-1 0.031
(0.078)
The F-statistic of the Granger causality test for the exclusion of all four lags of the GDP growth rate is 0.98. Find the critical
value for the 1%, the 5%, and the 10% level from the relevant table and make a decision on whether or not these
additional variables Granger cause the change in the growth rate of consumption.
(b) You are somewhat surprised about the result in the previous question and wonder, how sensitive it is with
regard to the lag length in the ADL(p,q) model. As a result, you calculate BIC and AIC of p and q from 0 to 6.
The results are displayed in the accompanying table:
p,q    BIC      AIC
0      5.061    5.039
1      5.052    4.988
2      5.095    4.989
3      5.110    4.960
4      5.165    4.972
5      5.206    4.973
6      5.270    4.992
Your textbook estimates an AR(1) model (equation 14.7) for the change in the inflation rate using a
sample period 1962:I-2004:IV. Go to the Stock and Watson companion website for the textbook and
download the data "Macroeconomic Data Used in Chapters 14 and 16." Enter the data for the consumer
price index, calculate the inflation rate, the acceleration of the inflation rate, and replicate the result on
page 526 of your textbook. Make sure to use the heteroskedasticity-robust standard error option for the
estimation.
b.
Next find a website with more recent data, such as the Federal Reserve Economic Data (FRED) site at
the Federal Reserve Bank of St. Louis. Locate the data for the CPI, which will be monthly, and convert
the data into quarterly averages. Then, using a sample from 1962:I-2009:IV, re-estimate the above
specification and comment on the changes that have occurred.
c.
Based on the BIC, how many lags should be included in the forecasting equation for the change in the
inflation rate? Use the new data set and sample period to answer the question.
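The data preparation described in parts a. and b. can be sketched as follows: average the monthly CPI within each quarter, compute annualized quarterly inflation as 400 times the change in the log of the price level, and difference once more to obtain the acceleration of inflation. The CPI values below are made up for illustration and the function names are hypothetical; real data would come from the sources named in the question.

```python
import math


def quarterly_averages(monthly):
    """Average consecutive blocks of three monthly observations."""
    usable = len(monthly) - len(monthly) % 3   # drop an incomplete last quarter
    return [sum(monthly[i:i + 3]) / 3 for i in range(0, usable, 3)]


def inflation(cpi_quarterly):
    """Annualized quarterly inflation: 400 * change in ln(CPI)."""
    return [400 * (math.log(b) - math.log(a))
            for a, b in zip(cpi_quarterly, cpi_quarterly[1:])]


def first_difference(x):
    """Change from one period to the next."""
    return [b - a for a, b in zip(x, x[1:])]


# Nine made-up monthly CPI observations, i.e. three quarters.
cpi_monthly = [100.0, 100.2, 100.4, 100.9, 101.1, 101.3, 101.5, 101.8, 102.1]

cpi_q = quarterly_averages(cpi_monthly)
inf = inflation(cpi_q)            # inflation rate
d_inf = first_difference(inf)     # acceleration of the inflation rate
```

Each transformation shortens the series by one observation, which is why the estimation sample has to start after the first available quarters.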
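Answer: a.
Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            0.017          0.127         0.135          0.893
D2LP(-1)     -0.238         0.097         -2.467         0.015

R-squared             0.056       Mean dependent var       0.017
Adjusted R-squared    0.051       S.D. dependent var       1.708
S.E. of regression    1.664       Akaike info criterion    3.868
Sum squared resid     470.691     Schwarz criterion        3.904
Log likelihood        -330.634    Hannan-Quinn criter.     3.883
F-statistic           10.157      Durbin-Watson stat       2.166
Prob(F-statistic)     0.002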
b. Not much has changed. The intercept became smaller, but was statistically insignificant anyway. The
slope coefficient increased somewhat in absolute value (as did the regression R2 with it) and its t-statistic also became
stronger. Some of this is the result of data revisions (even for the old sample period the slope coefficient
increased somewhat) while part of it has changed because of the longer sample period.
Dependent Variable: D2LP
Method: Least Squares
Date: 12/30/10 Time: 21:19
Sample: 1962Q1 2009Q4
Included observations: 192
White Heteroskedasticity-Consistent Standard Errors & Covariance
Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            0.014          0.153         0.089          0.929
D2LP(-1)     -0.290         0.094         -3.070         0.002

R-squared             0.084       Mean dependent var       0.010
Adjusted R-squared    0.079       S.D. dependent var       2.203
S.E. of regression    2.114       Akaike info criterion    4.345
Sum squared resid     849.127     Schwarz criterion        4.379
Log likelihood        -415.161    Hannan-Quinn criter.     4.359
F-statistic           17.428      Durbin-Watson stat       2.203
Prob(F-statistic)     0.000
c. Using the BIC for p = 0, 1, 2, ..., 6, the minimum continues to be at p = 2. Hence the BIC still favors an
AR(2).
7) Statistical inference was a concept that was not too difficult to understand when using cross-sectional data. For
example, it is obvious that a population mean is not the same as a sample mean (take weight of students at
your college/university as an example). With a bit of thought, it also became clear that the sample mean had a
distribution. This meant that there was uncertainty regarding the population mean given the sample
information, and that you had to consider confidence intervals when making statements about the population
mean. The same concept carried over into the two-dimensional analysis of a simple regression: knowing the
height-weight relationship for a sample of students, for example, allowed you to make statements about the
population height-weight relationship. In other words, it was easy to understand the relationship between a
sample and a population in cross-sections. But what about time-series? Why should you be allowed to make
statistical inference about some population, given a sample at hand (using quarterly data from 1962-2010, for
example)? Write an essay explaining the relationship between a sample and a population when using time
series.
Answer: Essays will differ by students. What is crucial here is the emphasis on stationarity or the concept that the
distribution remains constant over time. If the dependent variable and regressors are non-stationary,
then conventional hypothesis tests, confidence intervals, and forecasts can be unreliable. However, if
they are stationary, then it is plausible to argue that a sample will repeat itself again and again and
again, when getting additional data. It is in that sense that inference to a larger population can be made.
There are two concepts crucial to stationarity which are discussed in the textbook: (i) trends, and (ii)
breaks. Students should bring up methods for testing for stationarity and breaks, such as the DF and
ADF statistics, and the QLR test.
8) (Requires Internet access for the test question)
The following question requires you to download data from the internet and to load it into a statistical
package such as STATA or EViews.
a.
Your textbook suggests using two test statistics to test for stationarity: DF and ADF. Test the null
hypothesis that inflation has a stochastic trend against the alternative that it is stationary by
performing the DF and ADF test for a unit autoregressive root. That is, use the equation (14.34) in your
textbook with four lags and without a lag of the change in the inflation rate as a regressor for sample
period 1962:I-2004:IV. Go to the Stock and Watson companion website for the textbook and
download the data "Macroeconomic Data Used in Chapters 14 and 16." Enter the data for the consumer
price index, calculate the inflation rate and the acceleration of the inflation rate, and replicate the result
on page 526 of your textbook. Make sure not to use the heteroskedasticity-robust standard error
option for the estimation.
b.
Next find a website with more recent data, such as the Federal Reserve Economic Data (FRED) site at
the Federal Reserve Bank of St. Louis. Locate the data for the CPI, which will be monthly, and convert
the data into quarterly averages. Then, using a sample from 1962:I-2009:IV, re-estimate the above
specification and comment on the changes that have occurred.
c.
d.
Finally, calculate the ADF statistic, allowing for the lag length of the inflation acceleration term to be
determined by either the AIC or the BIC.
Answer: a. For the sample period 1962:I-2004:IV, the result is as follows:

Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            0.51           0.21          2.37           0.02
DLP(-1)      -0.11          0.04          -2.69          0.01
D2LP(-1)     -0.19          0.08          -2.32          0.02
D2LP(-2)     -0.26          0.08          -3.15          0.00
D2LP(-3)     0.20           0.08          2.51           0.01
D2LP(-4)     0.01           0.08          0.13           0.90

R-squared             0.24       Mean dependent var       0.02
Adjusted R-squared    0.21       S.D. dependent var       1.71
S.E. of regression    1.51       Akaike info criterion    3.70
Sum squared resid     380.61     Schwarz criterion        3.81
Log likelihood        -312.37    Hannan-Quinn criter.     3.75
F-statistic           10.31      Durbin-Watson stat       1.99
Prob(F-statistic)     0.00
Hence the ADF statistic is -2.69. You cannot reject the null hypothesis of non-stationarity at the
5% level (critical value -2.86), but you could at the 10% level (critical value -2.57).
b. Not much has changed. The intercept became smaller, but was statistically insignificant anyway. The
slope coefficient increased somewhat (as did the regression R2 with it) and its t-statistic also became
stronger. Some of this is the result of data revisions (even for the old sample period the slope coefficient
increased somewhat) while part of it has changed because of the longer sample period.
Dependent Variable: D2LP
Method: Least Squares
Date: 12/31/10 Time: 11:20
Sample: 1962Q1 2009Q4
Included observations: 192
Variable     Coefficient    Std. Error    t-Statistic    Prob.
C            0.62           0.26          2.36           0.02
DLP(-1)      -0.15          0.05          -2.75          0.01
D2LP(-1)     -0.29          0.08          -3.54          0.00
D2LP(-2)     -0.30          0.09          -3.45          0.00
D2LP(-3)     0.03           0.08          0.31           0.76
D2LP(-4)     -0.05          0.08          -0.62          0.54

R-squared             0.24       Mean dependent var       0.01
Adjusted R-squared    0.22       S.D. dependent var       2.20
S.E. of regression    1.95       Akaike info criterion    4.20
Sum squared resid     707.46     Schwarz criterion        4.31
Log likelihood        -397.64    Hannan-Quinn criter.     4.25
F-statistic           11.54      Durbin-Watson stat       2.00
Prob(F-statistic)     0.00
c. The DF statistic is obtained by simply regressing the change in the inflation rate on the lagged level of
the inflation rate. The t-statistic on the lagged inflation level is the DF statistic, which is -5.28, rejecting
the null hypothesis of non-stationarity.
d. Both the AIC and the BIC have a minimum for two lags. For that case, the ADF statistic is -2.94 and
the null hypothesis of non-stationarity can therefore be rejected at the 5% level, but not at the 1% level.
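The DF regression described in part c. is an ordinary simple regression of the change on a constant and the lagged level, with a homoskedasticity-only standard error. The sketch below implements it from scratch and applies it to a simulated stationary AR(1), not to the inflation data; the function name, the seed, and the simulated series are all assumptions made for illustration.

```python
import math
import random


def df_statistic(y):
    """t-statistic on delta in dY_t = b0 + delta * Y_{t-1} + u_t.

    Uses the homoskedasticity-only OLS standard error, as the DF test requires.
    """
    x = y[:-1]                                    # lagged level Y_{t-1}
    dy = [b - a for a, b in zip(y, y[1:])]        # change Y_t - Y_{t-1}
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((v - x_bar) ** 2 for v in x)
    sxy = sum((v - x_bar) * (w - dy_bar) for v, w in zip(x, dy))
    delta = sxy / sxx
    b0 = dy_bar - delta * x_bar
    ssr = sum((w - b0 - delta * v) ** 2 for v, w in zip(x, dy))
    se_delta = math.sqrt(ssr / (n - 2) / sxx)
    return delta / se_delta


# Simulated stationary AR(1) with coefficient 0.2 (illustrative data only).
random.seed(0)
series = [0.0]
for _ in range(200):
    series.append(0.2 * series[-1] + random.gauss(0, 1))

stat = df_statistic(series)
reject_unit_root_5pct = stat < -2.86   # intercept-only critical value
```

The statistic must be compared with the Dickey-Fuller critical values (for example -2.86 at the 5% level with an intercept only), not with the usual normal critical values. The ADF version adds lagged changes as regressors, which requires a multiple regression and is omitted here.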
$\Delta_j^i = (1 - L^j)^i$, where i and j are typically omitted when they take the value of 1. Show
the expressions in Y only when applying the difference operator to the following expressions, and give the
resulting expression an economic interpretation, assuming that you are working with quarterly data:
(a) $\Delta_4 Y_t$
(b) $\Delta^2 Y_t$
(c) $\Delta_1 \Delta_4 Y_t$
(d) $\Delta_4^2 Y_t$
Answer: (a) $\Delta_4 Y_t = (1 - L^4) Y_t = Y_t - Y_{t-4}$. With quarterly data, this is the annual change. If Y is in logarithms,
then this is the annual growth rate.
(b) $\Delta^2 Y_t = (1 - L)^2 Y_t = (1 - 2L + L^2) Y_t = Y_t - 2Y_{t-1} + Y_{t-2} = \Delta Y_t - \Delta Y_{t-1}$.
This represents the change of the change in a variable, or the acceleration. If Y is in logarithms, then
this is the quarterly change in the growth rate. A good example would be the acceleration in the
quarterly inflation rate.
(c) $\Delta_1 \Delta_4 Y_t = (1 - L)(1 - L^4) Y_t = (1 - L - L^4 + L^5) Y_t = Y_t - Y_{t-1} - Y_{t-4} + Y_{t-5} = (Y_t - Y_{t-4}) - (Y_{t-1} - Y_{t-5})$.
This is the quarterly change in the annual change. If Y is in logarithms, then this is the quarterly
acceleration or change in the annual growth rate.
(d) $\Delta_4^2 Y_t = (1 - L^4)^2 Y_t = (1 - 2L^4 + L^8) Y_t = Y_t - 2Y_{t-4} + Y_{t-8} = \Delta_4 Y_t - \Delta_4 Y_{t-4}$.
This represents the change in the annual change. If Y is in logarithms, then this is the change in the
annual growth rate.
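The four expansions can be verified numerically by applying (1 - L^j) repeatedly to any series. The sketch below uses the hypothetical test series Y_t = t^2, for which every result is easy to check by hand; the function name `diff` is an assumption for illustration.

```python
def diff(y, lag=1, order=1):
    """Apply the difference operator (1 - L^lag)^order to the list y."""
    for _ in range(order):
        y = [y[t] - y[t - lag] for t in range(lag, len(y))]
    return y


y = [t ** 2 for t in range(12)]          # hypothetical series Y_t = t^2

annual = diff(y, lag=4)                  # (a) Y_t - Y_{t-4}
accel = diff(y, order=2)                 # (b) Y_t - 2Y_{t-1} + Y_{t-2}
mixed = diff(diff(y, lag=4), lag=1)      # (c) (Y_t - Y_{t-4}) - (Y_{t-1} - Y_{t-5})
annual_sq = diff(y, lag=4, order=2)      # (d) Y_t - 2Y_{t-4} + Y_{t-8}

# Direct evaluation of the expanded expressions for comparison.
accel_direct = [y[t] - 2 * y[t - 1] + y[t - 2] for t in range(2, len(y))]
mixed_direct = [y[t] - y[t - 1] - y[t - 4] + y[t - 5] for t in range(5, len(y))]
annual_sq_direct = [y[t] - 2 * y[t - 4] + y[t - 8] for t in range(8, len(y))]
```

Applying the operator in two steps and evaluating the expanded polynomial give the same values, which is what the algebra in the answer asserts.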
2) The textbook displayed the accompanying four economic time series with markedly different patterns. For
each indicate what you think the sample autocorrelations of the level (Y) and change ($\Delta$Y) will be and explain
your reasoning.
(a)
(b)
(c)
(d)
Answer: (a) There is strong positive autocorrelation in the federal funds rate, with sample autocorrelations
declining for higher lags. There are obvious long-term trends in the series in that the federal funds rate
was high during the first quarter of 1982, and high again in the second quarter of 1982. Similarly, it was
low during the first quarter of 1962 and again low in the second quarter of that year. Since inflationary
expectations and therefore the inflation rate itself play a large role in federal funds rate movements, it
should not be surprising to find a similar pattern in the autocorrelations for the inflation rate and the
federal funds rate. (The autocorrelations are 0.90, 0.83, 0.80 and 0.72 for lags one to four.) For the change
in the federal funds rate you would also expect a similar pattern in the autocorrelations as for the
inflation rate, i.e., a negative first autocorrelation. On average, an increase in the federal funds rate in
one quarter is associated with a decrease in the following quarter. (The autocorrelations are 0.14, -0.19,
0.24, -0.12.)
(b) (Different from the textbook, the figure here only displays the exchange rate behavior after the
collapse of the Bretton Woods system of fixed exchange rates.) As in the previous graph, there should be
positive autocorrelations reflecting long-term trends in the exchange rates. Students might point out that
due to purchasing power parity you could expect long-term exchange rate behavior to be similar to the
behavior of inflation rates. However, the inflation rate of the U.K. would also have to be considered. (The
actual autocorrelations are 0.93, 0.85, 0.79, and 0.72 for lags one to four). Students may have difficulty
detecting the positive nature of the sample autocorrelations in the change of the exchange rate: positive
(negative) changes in the exchange rate tend to be followed by positive (negative) changes in the
following period. Perhaps students are able to see that the behavior of the exchange rate is somewhat
smoother than that of the federal funds rate. (The actual autocorrelations are 0.22, 0.14, 0.12, and 0.07 for
lags one to four.)
(c) Students should be able to identify the high autocorrelations in the level: typically a high level of real
GDP will be followed by a high level in the next period. In addition, there is to a large extent, a trend
increase. (The actual autocorrelations are 0.98, 0.96, 0.94, and 0.92 for one to four lags.) Since positive
growth rates in real GDP are typically followed by positive growth rates during the next quarter,
students should be able to see that the autocorrelations for the change in the logarithm of real GDP will
also be positive. (The actual autocorrelations are 0.29, 0.39, 0.40, and 0.36 for lags one to four.)
(d) Students should be able to see that the returns are essentially unpredictable, and that the level
autocorrelations should be very low. There are no long-term trends visible and a high return on a given
day is as likely to be followed by a high return the next day as a low return. (The actual autocorrelations
are 0.07, -0.01, -0.02, and 0.00 for lags one to four.) At the same time students should be able to see a
relatively strong negative first autocorrelation, since there are no long-term trends in the level returns. A
strong positive day-to-day change must therefore be followed, on average, by a strong negative change.
Due to the unpredictability, these autocorrelations should also fall off quite quickly. (The actual
autocorrelations are -0.46, -0.04, -0.02, and 0.03 for lags one to four.)
3) You have decided to use the Dickey-Fuller (DF) test on the United States aggregate unemployment rate (sample
period 1962:I-1995:IV). As a result, you estimate the following AR(1) model
$\Delta UrateUS_t = 0.114 - 0.024\,UrateUS_{t-1}$, $R^2 = 0.0118$, SER = 0.3417
(0.121) (0.019)
You recall that your textbook mentioned that this form of the AR(1) is convenient because it allows you to
test for the presence of a unit root by using the t-statistic of the slope. Being adventurous, you decide to
estimate the original form of the AR(1) instead, which results in the following output
$UrateUS_t = 0.114 + 0.976\,UrateUS_{t-1}$, $R^2 = 0.9510$, SER = 0.3417
(0.121) (0.019)
You are surprised to find the constant, the standard errors of the two coefficients, and the SER unchanged,
while the regression $R^2$ increased substantially. Explain this increase in the regression $R^2$. Why should you
have been able to predict the change in the slope coefficient and the constancy of the standard errors of the two
coefficients and the SER?
Answer: There is no additional information in the second regression, hence the SSR, and therefore the SER, will
not change. The only difference between the two is that in the first regression the lag of the dependent
variable has been subtracted from both sides. Undoing this linear transformation changes the coefficient
on the lagged dependent variable from -0.024 to -0.024 + 1 = 0.976. The regression $R^2$ is defined as
ESS/TSS or 1-(SSR/TSS). The only change here has been in the TSS, which is now calculated from a level
rather than a difference. Since TSS increases and SSR remains unchanged, SSR/TSS must decrease, and
the regression $R^2$ will increase. Finally, the
heteroskedasticity-robust standard errors contain the residuals and other terms involving the regressor,
both of which have not changed between the two specifications. Hence the standard errors should also
remain unchanged.
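The argument above can be checked numerically. The following is a minimal sketch on simulated data, not the unemployment series from the question: the AR(1) parameters, the seed, and the helper name `ols` are assumptions made only for illustration. It fits the Dickey-Fuller form and the level form on the same sample and compares constant, slope, SSR, and $R^2$.

```python
import random

def ols(x, y):
    """OLS of y on a constant and x; returns intercept, slope, SSR, R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, ssr, 1 - ssr / tss

# Simulate a persistent AR(1); these parameter values are illustrative only.
random.seed(0)
y = [5.0]
for _ in range(200):
    y.append(0.15 + 0.97 * y[-1] + random.gauss(0, 0.3))

lagged = y[:-1]
level = y[1:]
change = [a - b for a, b in zip(level, lagged)]

# Dickey-Fuller form: change in Y regressed on lagged Y.
b0_diff, b1_diff, ssr_diff, r2_diff = ols(lagged, change)
# Original AR(1) form: level of Y regressed on lagged Y.
b0_lev, b1_lev, ssr_lev, r2_lev = ols(lagged, level)

print(f"difference form: const={b0_diff:.3f} slope={b1_diff:.3f} SSR={ssr_diff:.3f} R2={r2_diff:.3f}")
print(f"level form:      const={b0_lev:.3f} slope={b1_lev:.3f} SSR={ssr_lev:.3f} R2={r2_lev:.3f}")
```

The two fits share the same residuals, so the constant and the SSR coincide and the slopes differ by exactly one; only the TSS, and with it the $R^2$, depends on whether the dependent variable is a level or a change.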
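4) Consider the standard AR(1) $Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t$, where the usual assumptions hold.
(a) Show that $y_t = \beta_1 y_{t-1} + u_t$, where $y_t$ is $Y_t$ with the mean removed, i.e., $y_t = Y_t - E(Y_t)$. Show that $E(y_t) = 0$.
(b) Show that the r-period ahead forecast is $E(y_{T+r|T}) = \beta_1^r y_T$. If $0 < \beta_1 < 1$, how does the r-period ahead
forecast $E(y_{T+r|T})$ behave for large r?
(c) The median lag is the number of periods it takes a time series with zero mean to halve its current value (in
expectation), i.e., the solution r to $E(y_{T+r|T}) = 0.5\,y_T$. Show that in the present case this is given by
$r = -\frac{\log(2)}{\log(\beta_1)}$.
Answer: (a) Taking expectations on both sides of the AR(1) gives $E(Y_t) = \beta_0 + \beta_1 E(Y_{t-1})$, so that under
stationarity $E(Y_t) = \frac{\beta_0}{1-\beta_1}$. Subtracting the mean from both sides yields
$y_t = \beta_1 y_{t-1} + u_t = \beta_1^2 y_{t-2} + u_t + \beta_1 u_{t-1}$, and $E(y_t) = E(Y_t) - E(Y_t) = 0$.
(b) $E(y_{T+1|T}) = \beta_1 y_T$, $E(y_{T+2|T}) = \beta_1^2 y_T$, and so on until $E(y_{T+r|T}) = \beta_1^r y_T$, and hence
$E(y_{T+r|T}) \to 0$ for large r.
(c) $E(y_{T+r|T}) = \beta_1^r y_T = \frac{1}{2} y_T$, or $\beta_1^r = \frac{1}{2}$. Taking logs and solving for r then results in
$r \log(\beta_1) = -\log(2)$, or $r = -\frac{\log(2)}{\log(\beta_1)}$.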
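The median lag formula can be illustrated with a short calculation. This is a sketch under stated assumptions: the function names `median_lag` and `forecast` and the value 0.9 for the autoregressive coefficient are hypothetical and do not come from the question.

```python
import math

def median_lag(beta1):
    """Number of periods r solving beta1**r = 0.5, i.e. r = -log(2)/log(beta1)."""
    if not 0 < beta1 < 1:
        raise ValueError("beta1 must lie strictly between 0 and 1")
    return -math.log(2) / math.log(beta1)

def forecast(y_T, beta1, r):
    """r-period-ahead forecast of the demeaned AR(1): beta1**r * y_T."""
    return beta1 ** r * y_T

# 0.9 is an illustrative coefficient, not an estimate from the text.
r_half = median_lag(0.9)
print(f"median lag for beta1 = 0.9: {r_half:.2f} periods")
print(f"forecast at that horizon, starting from y_T = 1: {forecast(1.0, 0.9, r_half):.3f}")
```

Because the forecast decays geometrically, a coefficient closer to one gives a longer median lag, which is one way to express how persistent the series is.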
<1
This particular type of expectation formation is called the adaptive expectations hypothesis.
(a) In the above expectation formation hypothesis, expectations are formed at the beginning of the period, say
the 1st of January if you had annual data. Give an intuitive explanation for this process.
(b) Transform the adaptive expectation hypothesis in such a way that the right hand side of the equation only
contains observable variables, i.e., no expectations.
(c) Show that by substituting the resulting equation from the previous question into the original equation, you
get an ADL(0,∞) type equation. How are the coefficients of the regressors related to each other?
(d) Can you think of a transformation of the ADL(0,∞) equation into an ADL(1,1) type equation, if you allowed
< 1, in
which case the expected value can be seen as a linear combination of the previous period's forecast and
the previous period's actual value.
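(b) $X^e_t = (1-\lambda)X^e_{t-1} + \lambda X_{t-1} = (1-\lambda)[(1-\lambda)X^e_{t-2} + \lambda X_{t-2}] + \lambda X_{t-1}$
$= (1-\lambda)^2 X^e_{t-2} + \lambda X_{t-1} + \lambda(1-\lambda)X_{t-2}$.
Repeated substitution results in
$X^e_t = (1-\lambda)^{n+1} X^e_{t-(n+1)} + \lambda\sum_{i=0}^{n}(1-\lambda)^i X_{t-i-1}$, and for $n \to \infty$,
$X^e_t = \lambda\sum_{i=0}^{\infty}(1-\lambda)^i X_{t-i-1}$.
(c) $Y_t = \beta_0 + \beta_1 X^e_t + u_t = \beta_0 + \beta_1\left(\lambda\sum_{i=0}^{\infty}(1-\lambda)^i X_{t-i-1}\right) + u_t$ or
$Y_t = \delta_0 + \delta_1 X_{t-1} + \delta_2 X_{t-2} + \dots + \delta_r X_{t-r} + \dots + u_t$. Here $\delta_0 = \beta_0$, and
$\delta_i = \beta_1\lambda(1-\lambda)^{i-1}$; $i \geq 1$.
(d) Lagging both sides of $Y_t = \beta_0 + \beta_1\left(\lambda\sum_{i=0}^{\infty}(1-\lambda)^i X_{t-i-1}\right) + u_t$ and multiplying by $(1-\lambda)$
results in
$(1-\lambda)Y_{t-1} = \beta_0(1-\lambda) + \beta_1\left(\lambda\sum_{i=0}^{\infty}(1-\lambda)^{i+1} X_{t-i-2}\right) + (1-\lambda)u_{t-1}$. Finally, subtraction of this equation from $Y_t$ gives
$Y_t = \beta_0\lambda + \beta_1\lambda X_{t-1} + (1-\lambda)Y_{t-1} + (u_t - (1-\lambda)u_{t-1})$.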
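The equivalence between the recursive adaptive expectations rule and the geometric distributed lag in (b) can be verified numerically. The sketch below uses simulated data; the adjustment parameter `lam`, the starting value of zero for the expectation, and the helper name `geometric_lag` are assumptions chosen for illustration.

```python
import random

lam = 0.4  # hypothetical adjustment parameter, 0 < lam < 1
random.seed(1)
x = [random.gauss(0, 1) for _ in range(300)]

# Recursive form: X^e_t = (1 - lam) * X^e_{t-1} + lam * X_{t-1}, started at X^e_0 = 0.
xe = [0.0]
for t in range(1, len(x)):
    xe.append((1 - lam) * xe[-1] + lam * x[t - 1])

def geometric_lag(t):
    """Distributed lag form: lam * sum over i of (1 - lam)**i * X_{t-i-1}."""
    return lam * sum((1 - lam) ** i * x[t - i - 1] for i in range(t))

t_check = len(x) - 1
gap = abs(xe[t_check] - geometric_lag(t_check))

# Weights on X_{t-1}, X_{t-2}, ... decline geometrically at rate (1 - lam).
weights = [lam * (1 - lam) ** i for i in range(5)]
print(f"difference between the two forms at t = {t_check}: {gap:.2e}")
print("first five lag weights:", [round(w, 4) for w in weights])
```

The geometric decline of the weights is what allows the infinite lag structure to collapse into a single lagged dependent variable in (d).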
6) The following two graphs give you a plot of the United States aggregate unemployment rate for the sample
period 1962:I to 1999:IV, and the (log) level of real United States GDP for the sample period 1962:I to 1995:IV.
You want to test for stationarity in both cases. Indicate whether or not you should include a time trend in your
Augmented Dickey-Fuller test and why.
Answer: Looking over the entire sample period, there does not appear to be a deterministic trend for the
unemployment rate. There is no need to include a time trend for the ADF test in this case. The log level
of real GDP, on the other hand, is clearly trending upward and a time trend should therefore be included.
7) (Requires Appendix material): Show that the AR(1) process $Y_t = a_1 Y_{t-1} + e_t$; $|a_1| < 1$, can be converted to an
MA(∞) process.
Answer: $Y_t = a_1 Y_{t-1} + e_t = a_1(a_1 Y_{t-2} + e_{t-1}) + e_t = a_1^2 Y_{t-2} + e_t + a_1 e_{t-1}$. Repeated substitution then results in
$Y_t = a_1^{n+1} Y_{t-(n+1)} + e_t + a_1 e_{t-1} + \dots + a_1^n e_{t-n}$, and for $n \to \infty$,
$Y_t = e_t + a_1 e_{t-1} + \dots + a_1^q e_{t-q} + \dots$.
8) (Requires Appendix material) The long-run, stationary state solution of an ADL(p,q) model, which can be
written as $A(L)Y_t = \beta_0 + c(L)X_{t-1} + u_t$, where $a_0 = 1$, $a_j = -\beta_j$, and $c_j = \delta_j$, can be found by setting L = 1 in the two
lag polynomials. Explain. Derive the long-run solution for the estimated ADL(4,4) of the change in the
inflation rate on unemployment:
$\Delta Inf_t = 1.32 - .36\,\Delta Inf_{t-1} - 0.34\,\Delta Inf_{t-2} + 0.7\,\Delta Inf_{t-3} - 0.3\,\Delta Inf_{t-4}$
$- 2.68\,Unemp_{t-1} + 3.43\,Unemp_{t-2} - 1.04\,Unemp_{t-3} + .07\,Unemp_{t-4}$
Assume that the inflation rate is constant in the long-run and calculate the resulting unemployment rate. What
does the solution represent? Is it reasonable to assume that this long-run solution is constant over the
estimation period 1962-1999? If not, how could you detect the instability?
Answer: In a stationary state equilibrium, variables do not change from one period to the next. Hence
$X_{t-1} = X_{t-2} = \dots = X_{t-q}$. This is achieved in the above formulation by setting L = 1. This solution represents the
equilibrium rate of unemployment or NAIRU. In the above example it is 6%: with the change in inflation
equal to zero, $0 = 1.32 + (-2.68 + 3.43 - 1.04 + 0.07)\,Unemp$, so $Unemp = 1.32/0.22 = 6$. The NAIRU does not
remain constant but instead is a function of various determining variables such as demographic
composition of the labor force, the competitiveness of labor and product markets, the generosity of the
unemployment benefits system, etc. One way to detect instability is to test for breaks, using a Chow test,
if the break date is known, or using the QLR statistic, if the break date is unknown.
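The long-run calculation behind the 6% figure can be written out as a small sketch. The coefficient on the third unemployment lag is taken here to be -1.04, an assumption that is consistent with the stated answer; the variable names are illustrative.

```python
# Coefficients taken from the estimated ADL(4,4) in the question.
intercept = 1.32
# Lags 1 to 4 of the unemployment rate; the sign of the third is assumed negative.
unemp_coefs = [-2.68, 3.43, -1.04, 0.07]

# Setting L = 1 collapses the unemployment lag polynomial to the sum of its coefficients.
c_of_one = sum(unemp_coefs)

# With the change in inflation equal to zero at every lag: 0 = intercept + c(1) * u_star.
nairu = -intercept / c_of_one
print(f"c(1) = {c_of_one:.2f}, implied long-run unemployment rate = {nairu:.1f}%")
```

The coefficients on the lagged changes in inflation drop out because every change in inflation is zero in the stationary state, so only the constant and the unemployment lags matter.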
9) You want to determine whether or not the unemployment rate for the United States has a stochastic trend
using the Augmented Dickey-Fuller (ADF) test. The BIC suggests using 3 lags, while the AIC suggests 4 lags.
(a) Which of the two will you use for your choice of the optimal lag length?
(b) After estimating the appropriate equation, the t-statistic on the lag level unemployment rate is (-2.186)
(using a constant, but not a trend). What is your decision regarding the stochastic trend of the unemployment
rate series in the United States?
(c) Having worked in the previous exercise with the unemployment rate level, you repeat the exercise using the
difference in United States unemployment rates. Write down the appropriate equation to conduct the
Augmented Dickey-Fuller test here. The t-statistic on the relevant coefficient turns out to be (-4.791). What is your
conclusion now?
Answer: (a) The BIC is a consistent estimator of the true lag length, while the AIC will overestimate the lag
length. The textbook suggests that if the researcher is concerned about too few lags, then the AIC can be
used as a reasonable alternative.
(b) The large-sample critical value of the ADF statistic is -2.57 at the 10% level. Hence you cannot reject
the null hypothesis of a unit root.
(c) $\Delta^2 UrateUS_t = \beta_0 + \delta\,\Delta UrateUS_{t-1} + \gamma_1 \Delta^2 UrateUS_{t-1}$
$+ \gamma_2 \Delta^2 UrateUS_{t-2} + \gamma_3 \Delta^2 UrateUS_{t-3} + u_t$
The critical value at the 1% level is -3.43, so that you can reject the null hypothesis of a unit root in the
change of the U.S. unemployment rate.
1 < 1..
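$Y_t = \beta_0(1 + \beta_1 + \beta_1^2 + \beta_1^3 + \dots) + \sum_{i=0}^{\infty}\beta_1^i u_{t-i}$. Since the geometric series sums to $\frac{1}{1-\beta_1}$, the resulting
expression is
$Y_t = \frac{\beta_0}{1-\beta_1} + \sum_{i=0}^{\infty}\beta_1^i u_{t-i}$. To find the mean and the variance, first take expectations on both sides:
$E(Y_t) = \frac{\beta_0}{1-\beta_1} + \sum_{i=0}^{\infty}\beta_1^i E(u_{t-i}) = \frac{\beta_0}{1-\beta_1}$, since $E(u_t) = 0$ for all t.
Then $Y_t - E(Y_t) = \sum_{i=0}^{\infty}\beta_1^i u_{t-i}$. Hence the variance is
$E[(Y_t - E(Y_t))^2] = \sum_{i=0}^{\infty}(\beta_1^i)^2 E(u_{t-i}^2) = \sigma_u^2\sum_{i=0}^{\infty}(\beta_1^2)^i = \frac{\sigma_u^2}{1-\beta_1^2}$.
(b) The first two autocovariances are defined as $cov(Y_t, Y_{t-1})$ and $cov(Y_t, Y_{t-2})$. Using the fact that
$Y_t = \frac{\beta_0}{1-\beta_1} + \sum_{i=0}^{\infty}\beta_1^i u_{t-i}$ and that the expected values for both $Y_t$ and $Y_{t-j}$ are $\frac{\beta_0}{1-\beta_1}$, you get
$E[(Y_t - E(Y_t))(Y_{t-1} - E(Y_{t-1}))] = E\left[\left(\sum_{i=0}^{\infty}\beta_1^i u_{t-i}\right)\left(\sum_{i=1}^{\infty}\beta_1^{i-1} u_{t-i}\right)\right]$
$= var(u_t)(\beta_1 + \beta_1^3 + \beta_1^5 + \dots) = var(u_t)\,\beta_1(1 + \beta_1^2 + \beta_1^4 + \dots) = \frac{\beta_1\sigma_u^2}{1-\beta_1^2}$.
Similarly, $cov(Y_t, Y_{t-2}) = \frac{\beta_1^2\sigma_u^2}{1-\beta_1^2}$ (and, more generally, $cov(Y_t, Y_{t-j}) = \frac{\beta_1^j\sigma_u^2}{1-\beta_1^2}$).
Dividing by the variance gives the autocorrelations: $\frac{cov(Y_t, Y_{t-1})}{var(Y_t)} = \beta_1$ (and, in general, $\frac{cov(Y_t, Y_{t-j})}{var(Y_t)} = \beta_1^j$).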
11) Find data for real GDP ($Y_t$) for the United States for the time period 1959:I (first quarter) to 1995:IV. Next
generate two growth rates: the (annualized) quarterly growth rate of real GDP [$(\ln Y_t - \ln Y_{t-1}) \times 400$] and the
annual growth rate of real GDP [$(\ln Y_t - \ln Y_{t-4}) \times 100$]. Which is more volatile? What is the reason for this?
Explain.
Answer:
The quarterly growth rate is more volatile because the annual growth rate is a moving average of the
quarterly growth rate, and hence wild swings are smoothed out:
$(\ln Y_t - \ln Y_{t-4}) = (\ln Y_t - \ln Y_{t-1}) + (\ln Y_{t-1} - \ln Y_{t-2}) + (\ln Y_{t-2} - \ln Y_{t-3}) + (\ln Y_{t-3} - \ln Y_{t-4})$
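The moving-average identity can be checked on simulated data when actual GDP figures are not at hand. The sketch below is illustrative only: the drift, the noise level, and the seed are assumptions, and the simulated series is a random walk with drift rather than actual U.S. real GDP.

```python
import random
import statistics

random.seed(2)
# Simulated log real GDP: drift plus noise. Parameter values are illustrative only.
log_y = [0.0]
for _ in range(160):
    log_y.append(log_y[-1] + 0.008 + random.gauss(0, 0.01))

# Annualized quarterly growth rate and annual (four-quarter) growth rate, in percent.
quarterly = [400 * (log_y[t] - log_y[t - 1]) for t in range(1, len(log_y))]
annual = [100 * (log_y[t] - log_y[t - 4]) for t in range(4, len(log_y))]

# quarterly[k] is the growth rate ending in period k + 1, so the four rates ending in
# periods t-3, ..., t are quarterly[t-4:t]; their average equals the annual rate at t.
t = 50
avg_of_four = sum(quarterly[t - 4:t]) / 4
gap = abs(annual[t - 4] - avg_of_four)

sd_quarterly = statistics.stdev(quarterly)
sd_annual = statistics.stdev(annual)
print(f"identity gap: {gap:.2e}")
print(f"std. dev. quarterly (annualized): {sd_quarterly:.2f}, annual: {sd_annual:.2f}")
```

Averaging four quarterly rates cancels part of the quarter-to-quarter noise, which is why the annual series has the smaller standard deviation.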
12) You have collected data for real GDP (Y) and have estimated the following function:
b.
c. Do you think that, given the regression $R^2$, you should use the equation to forecast real GDP beyond the sample
period?
Answer: a. The slope coefficient indicates the average growth rate per quarter. Since 1896, the U.S. economy has grown at a
rate of approximately 3% per year. As a result, observing a quarterly growth rate of 0.7% is quite plausible.
b. The regression $R^2$ tells you that 98 percent of the variation in the log of real GDP is explained by the model.
Since the model only contains a deterministic time trend, this seems high at face value.
c. The logarithm of real GDP is bound to be non-stationary (using the ADF statistic, you would not be able to reject
the null hypothesis that the log of real GDP has a unit root). Hence this equation should not be used for forecasting
despite the very high regression $R^2$.
22) The distributed lag model assumptions include all of the following with the exception of:
A) There is no perfect multicollinearity.
B) Xt is strictly exogenous.
C) $E(u_t \mid X_t, X_{t-1}, X_{t-2}, \dots) = 0$
D) The random variables Xt and Yt have a stationary distribution.
Answer: B
23) In the distributed lag model, the coefficient on the contemporaneous value of the regressor is called the
A) dynamic effect.
B) cumulative multiplier.
C) autoregressive error.
D) impact effect.
Answer: D
24) In the distributed lag model, the dynamic causal effect
A) is the sequence of coefficients on the current and lagged values of X.
B) is not the same as the dynamic multiplier.
C) is generated by choosing different truncation points for the HAC standard errors.
D) requires estimation of the model by Cochrane-Orcutt method.
Answer: A
25) HAC standard errors should be used because
A) they are convenient simplifications of the heteroskedasticity-robust standard errors.
B) conventional standard errors may result in misleading inference.
C) they are easier to calculate than the heteroskedasticity-robust standard errors and yet still allow you to
perform inference correctly.
D) when there is a structural break, then conventional standard errors result in misleading inference.
Answer: B
26) The interpretation of the coefficients in a distributed lag regression as causal dynamic effects hinges on
A) the assumption that X is exogenous
B) not having more than four lags when using quarterly data
C) using GLS rather than OLS
D) the use of monthly rather than annual data
Answer: A
27) Given the relationship between the two variables, the following is most likely to be exogenous:
A) the inflation rate and the short term interest rate: short-term interest rate is exogenous
B) U.S. rate of inflation and increases in oil prices: oil prices are exogenous
C) Australian exports and U.S. aggregate income: U.S. aggregate income is exogenous
D) change in inflation, lagged changes of inflation, and lags of unemployment: lags of unemployment are
exogenous
Answer: C
28) When Xt is strictly exogenous, the following estimator(s) of dynamic causal effects are available:
A) estimating an ADL model and calculating the dynamic multipliers from the estimated ADL coefficients
B) using GLS to estimate the coefficients of the distributed lag model
C) neither (a) nor (b)
D) (a) and (b)
Answer: D
1 Xt
2 Xt-1
3 Xt-2
++
r+1 Xt-r
Answer: B
2) Your textbook presents as an example of a distributed lag regression the effect of the weather on the price of
orange juice. The authors mention U.S. income and Australian exports, oil prices and inflation, monetary policy
and inflation, and the Phillips curve as other candidates for distributed lag regression. Briefly discuss whether
or not the exogeneity assumption is likely to hold in each of these cases. Explain why it is so hard to come up
with good examples of distributed lag regressions in economics.
Answer: Students' answers should follow the discussion of section 13.7 in the textbook. Although there is some
degree of simultaneity between Australian exports and U.S. income, the Australian economy is too small
relative to the American economy to present much of a feedback from a fall in exports. It is therefore
reasonable to assume that U.S. income is exogenous in a regression of Australian exports on U.S. income.
The situation is different for oil prices and inflation since it is reasonable to assume that members of
OPEC analyze worldwide economic conditions, including inflation rates, when setting oil
prices. If this is the case, then oil prices are not exogenous. Monetary policy and inflation is another
example where it cannot reasonably be assumed that the monetary base or the federal funds rate is
exogenous. The Federal Reserve takes into account current and future inflation rates when setting its
instrument, which is thereby endogenous. Finally, the Phillips curve is another example where it cannot
be assumed that the (lagged) unemployment rate is exogenous, since past values of the unemployment
rate were simultaneously determined with past inflation rates.
3) Money supply is linked to the monetary base by the money multiplier. Macroeconomic textbooks tell you that
the central bank cannot control the money supply, but it can control the monetary base. As a result, you decide
to specify a distributed lag equation of the growth in the money supply on the growth in the monetary base.
One of your peers tells you that this is not a good idea for modeling the relationship between the two variables.
What does she mean?
Answer: Although the monetary base is one of the determinants of the money supply, there are other factors,
such as interest rates, that have an effect on the money multiplier. Hence there is the problem of omitted
variables. If interest rates are correlated with the monetary base, then the OLS estimator will be
inconsistent. Furthermore, it is likely that due to financial innovations, dynamic causal effects have
changed over time. Finally there is the concern of simultaneous causality bias. If the Federal Reserve
changes the monetary base as a result of changes in the money supply, perhaps as a result of targeting,
then the monetary base becomes endogenous.
4) In your intermediate macroeconomics course, government expenditures and the money supply were treated as
exogenous, in the sense that the variables could be changed to conduct economic policy to influence target
variables, but that these variables would not react to changes in the economy as a result of some fixed rule. The
St. Louis Model, proposed by two researchers at the Federal Reserve in St. Louis, used this idea to test whether
monetary policy or fiscal policy was more effective in influencing output behavior. Although there were
various versions of this model, the basic specification was of the following type:
$\ln(Y_t) = \beta_0 + \beta_1 \ln m_t + \dots + \beta_p \ln m_{t-p-1} + \beta_{p+1} \ln G_t + \dots + \beta_{p+q} \ln G_{t-q-1} + u_t$
Assuming that money supply and government expenditures are exogenous, how would you estimate dynamic
causal effects? Why do you think this type of model is no longer used by most to calculate fiscal and monetary
multipliers?
Answer: If the money supply and government expenditures were exogenous, then a distributed lag model could
be used to estimate the dynamic multipliers and cumulative dynamic multipliers using OLS. The
coefficients in the above equation are then the dynamic multipliers. To obtain the h-period cumulative
dynamic multipliers, all coefficients over the h-periods have to be added up. There is an alternative form
for the above equation which allows for statistical testing of the cumulative dynamic multipliers. This
involves differencing the regressors with the exception of the last lag, p and q, in the above equation. The
coefficient on the p and q lagged regressor then represents the long-run cumulative multiplier. The OLS
estimator of the coefficients in the above equation is consistent. However, the errors are likely to be
autocorrelated since omitted variables from the above equation are probably serially correlated
themselves. In that case the OLS standard errors are inconsistent and statistical inference based on these
standard errors will be misleading. To avoid this problem, heteroskedasticity- and
autocorrelation-consistent standard errors can be calculated. The reason why this type of model is no
longer used by most to calculate fiscal and monetary multipliers is that researchers are not willing to
assume that the money supply and government expenditures are exogenous. Both monetary and fiscal
policy makers take into account current and future expected output growth in setting their policy instruments,
which are therefore endogenous.
5) Your textbook mentions heteroskedasticity- and autocorrelation-consistent standard errors. Explain why you
should use this option in your regression package when estimating the distributed lag regression model. What
are the properties of the OLS estimator in the presence of heteroskedasticity and autocorrelation in the error
terms? Explain why it is likely to find autocorrelation in time series data. If the errors are autocorrelated, then
why not simply adjust for autocorrelation by using some non-linear estimation method such as
Cochrane-Orcutt?
Answer: In the presence of heteroskedasticity and/or autocorrelation in the errors, OLS estimation of the
regression coefficients is still consistent. However, the homoskedasticity-only or
heteroskedasticity-robust standard errors are inconsistent and use of these in the presence of serial
correlation results in misleading statistical inference. For example, confidence intervals do not contain
the true value in the postulated fraction of repeated samples. The solution is to adjust the
estimator for the standard errors by incorporating sample autocorrelation estimates. This results in the
heteroskedasticity- and autocorrelation-consistent (HAC) estimator of the variance of the estimator. For
this estimator to be consistent, a certain truncation parameter is introduced, so that not all T-1 sample
autocorrelations are used. Incorporating this idea into the HAC formula results in the Newey-West
variance estimator.
Autocorrelation in the errors is likely if there are omitted variables which are slowly changing over time.
Since the omitted variables are implicitly contained in the error term, this would result in autocorrelation
of the error term. For generalized least squares to have desirable properties, the regressors have to be
strictly (past, present, and future) exogenous, rather than just (past and present) exogenous. There are
very few truly exogenous variables in economics. Furthermore, most of the relationships between
economic time series contain simultaneous causality. As the example in the textbook on orange juice
prices and cold weather showed, it is even more difficult to find strictly exogenous variables.
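The Newey-West idea described in this answer can be sketched for a regression with a single regressor. This is an illustrative implementation under stated assumptions, not the routine of any particular regression package: the function name `newey_west_slope_se` is hypothetical, no degrees-of-freedom correction is applied, the data are simulated, and the truncation rule `0.75 * n ** (1/3)` is used only as a common rule of thumb.

```python
import math
import random

def newey_west_slope_se(x, y, m):
    """Slope of y on a constant and x, with a Newey-West standard error.

    m is the truncation parameter: autocovariances of v_t = (x_t - xbar) * u_t
    at lags 1..m-1 enter with Bartlett weights (m - j) / m.
    With m = 1 this reduces to a heteroskedasticity-robust standard error.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    v = [(xi - mx) * (yi - intercept - slope * xi) for xi, yi in zip(x, y)]
    long_run = sum(vt * vt for vt in v)
    for j in range(1, m):
        gamma_j = sum(v[t] * v[t - j] for t in range(j, n))
        long_run += 2 * (m - j) / m * gamma_j
    return slope, math.sqrt(long_run / sxx ** 2)

# Simulated example with serially correlated regressor and errors (illustrative values).
random.seed(3)
n = 500
x, u = [0.0], [0.0]
for _ in range(n - 1):
    x.append(0.8 * x[-1] + random.gauss(0, 1))
    u.append(0.8 * u[-1] + random.gauss(0, 1))
y = [1.0 + 0.5 * xi + ui for xi, ui in zip(x, u)]

m = max(1, round(0.75 * n ** (1 / 3)))  # rule-of-thumb truncation parameter
slope, se_hac = newey_west_slope_se(x, y, m)
_, se_robust = newey_west_slope_se(x, y, 1)
print(f"slope = {slope:.3f}, HAC s.e. (m = {m}) = {se_hac:.3f}, robust s.e. = {se_robust:.3f}")
```

When both the regressor and the error are positively autocorrelated, as in this simulation, the standard error that ignores serial correlation is too small, which is the source of the misleading inference discussed above.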
6) Your textbook presents as an example of a distributed lag regression the effect of the weather on the price of
orange juice. The authors mention U.S. income and Australian exports, oil prices and inflation, monetary policy
and inflation, and the Phillips curve as other potential candidates for distributed lag regression. You are
considering estimating the effect of minimum wages on teenage employment (employment population ratio)
using a time series of U.S. data. Write a short essay on whether a distributed lag model would be a suitable
tool to figure out dynamic causal effects in this case.
Answer: One of the first questions students must address is whether or not the X variable here is exogenous. In
studies of the labor market, e.g. in microeconomics, students learned that it is real wages that determine
employment, not nominal wages. Some authors have used relative wages as an explanatory variable,
where the denominator is average hourly earnings. Setting aside whether or not minimum wages are
exogenous, the students should then focus on whether the price index used to adjust nominal minimum
wages or average hourly earnings are exogenous. However, most students will focus only on the
numerator (nominal minimum wages) and will argue that minimum wages are typically set by the
legislature following some political process and may therefore be considered exogenous. Some will go
further and argue that the process of setting minimum wages will depend on the state of the business
cycle. For example, recent increases in minimum wages (2007, 2008, 2009) would most likely not have
occurred if legislators had anticipated teenage unemployment rates of over 25%. If
that is the case, then minimum wage legislation depends on the state of the business cycle and hence
teenage employment. As a result, minimum wages should not be considered exogenous.
Dynamic multipliers
Lag number    (1)           (2)           (3)           (4)           (5)
0             -0.9 (1.3)    -1.1 (1.3)    -1.3 (1.7)    -0.2 (1.7)    -2.0 (1.5)
1             3.5 (1.6)     3.2 (1.6)     1.8 (1.6)     0.8 (1.5)     -
2             -1.3 (1.7)    -3.0 (1.6)    -2.2 (1.4)    -             -
3             0.2 (1.7)     1.5 (1.2)     -             -             -
4             -2.0 (1.5)    -             -             -             -
BIC           -234.4        -236.1        -238.5        -240.0        -241.8
variable, which takes on the value of one during the first quarter of the year after the election. Finally, a
friendly political scientist provides you with (i) an events variable, (ii) a Vietnam binary variable, and (iii) a
honeymoon variable, which measures the effect of a higher popularity of a president immediately following
the election. (The coefficients of these variables will not be reported here.)
Assuming that consumer sentiment is exogenous, you estimate the following two specifications (numbers in
parentheses are heteroskedasticity- and autocorrelation-consistent standard errors):
$Approval_t = 26.08 + 0.178\,ICS_t + 0.232\,ICS_{t-1}$; $R^2 = 0.667$, SER = 7.00
(8.83) (0.120) (0.135)
$Approval_t = 26.08 + 0.178\,\Delta ICS_t + 0.411\,ICS_{t-1}$; $R^2 = 0.667$, SER = 7.00
(8.17) (0.120) (0.089)
What is the difference between the two specifications? What is the advantage of estimating the second
equation, if any?
(b) Assuming that the errors follow an AR(1) process, you also estimate the following alternative:
$Approval_t = -4.61 + 0.300\,ICS_t - 0.070\,ICS_{t-1} - 0.054\,ICS_{t-2} + 0.776\,Approval_{t-1}$;
(5.84) (0.083) (0.099) (0.083) (0.057)
$R^2 = 0.868$, SER = 4.45
How is this specification related to the previous ones? What implicit assumptions did you have to make to
allow for desirable properties of the OLS estimator?
(c) You finally estimate the approval equation using the quasi-difference specification and the GLS estimator.
(0.099)
causality in addition. If a variable is not exogenous, then it is also not strictly exogenous.
3) Consider the following distributed lag model $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t$, where $u_t = \phi_1 u_{t-1} + \tilde{u}_t$, $\tilde{u}_t$ is serially
uncorrelated, and X is strictly exogenous.
(a) How many parameters are there to be estimated between the two equations?
(b) Using the two equations of the model above, derive the ADL form of the model.
(c) There are five regressors in the ADL model, namely Yt-1 , Xt, Xt-1 , Xt-2 and the constant. Estimating the
ADL model linearly will give you five coefficients. Can you derive the parameters of the original two equation
model from these five estimates? Why or why not?
(d) What alternative method do you have to retrieve the parameters of the two equation model?
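Answer: (a) There are four parameters to be estimated: $\beta_0$, $\beta_1$, $\beta_2$, and $\phi_1$.
(b) The ADL form of the model is derived by multiplying the first equation by $\phi_1$, lagging it one period,
and subtracting the result from the original equation:
$Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + u_t$
$-[\phi_1 Y_{t-1} = \phi_1\beta_0 + \phi_1\beta_1 X_{t-1} + \phi_1\beta_2 X_{t-2} + \phi_1 u_{t-1}]$
which, after collecting terms, results in
$Y_t = \beta_0(1-\phi_1) + \phi_1 Y_{t-1} + \beta_1 X_t + (\beta_2 - \phi_1\beta_1)X_{t-1} - \phi_1\beta_2 X_{t-2} + (u_t - \phi_1 u_{t-1})$
or
$Y_t = \alpha_0 + \phi_1 Y_{t-1} + \delta_0 X_t + \delta_1 X_{t-1} + \delta_2 X_{t-2} + \tilde{u}_t$.
(c) The original four parameters cannot be derived without restrictions since in essence you have five
equations in four unknowns.
(d) The above model can be specified in quasi-differences, i.e.,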
(0.069)
(0.049)
(0.068)
(0.051)
(0.027)
R2 = 0.346, SER=0.03
where ygrowth is quarterly growth of real GDP, mgrowth is quarterly growth of real money supply (M2), and
ggrowth is quarterly growth of real government expenditures. d in front of ggrowth and mgrowth indicates a
change in the variable.
(a) Assuming that money and government expenditures are exogenous, what do the coefficients represent?
Calculate the h-period cumulative dynamic multipliers from these. How can you test for the statistical
significance of the cumulative dynamic multipliers and the long-run cumulative dynamic multiplier?
(b) Sketch the estimated dynamic and cumulative dynamic fiscal and monetary multipliers.
(c) For these coefficients to represent dynamic multipliers, the money supply and government expenditures
must be exogenous variables. Explain why this is unlikely to be the case. As a result, what importance should
you attach to the above results?
Answer: (a) In that case the coefficients represent dynamic multipliers.
Lag number    Monetary Dynamic    Monetary Cumulative    Fiscal Dynamic    Fiscal Cumulative
              Multiplier          Multiplier             Multiplier        Multiplier
0             0.006               0.006                  0.170             0.170
1             0.235               0.241                  -0.044            0.126
2             0.344               0.585                  -0.003            0.123
3             0.385               0.970                  -0.079            0.044
4             0.425               1.395                  0.018             0.062
To test for the significance of the cumulative dynamic multipliers and the long-run cumulative dynamic
multiplier, the equation must be reestimated with all regressors appearing in differences with the
exception of the longest lag. The coefficients of these regressors then represent cumulative dynamic
multipliers and t-statistics can be used to test for their statistical significance.
(b) See the accompanying figures.
(c) There is little reason to believe that these government instruments are exogenous. Even if the
monetary base and those components of government expenditures which do not respond to business
cycle fluctuations had been chosen rather than the above regressors, these instruments would still respond to
changes in the growth rate of GDP. As a matter of fact, government reaction functions were also
estimated at the time to capture how government instruments respond to changes in target variables. As
a result, the regressors will be correlated with the error term, OLS estimation is inconsistent, and
inference is not dependable. It is hard to imagine how usable information can be retrieved from these
numbers.
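The cumulative multipliers reported in part (a) are running sums of the dynamic multipliers. A minimal sketch, using the dynamic multipliers from the answer and variable names chosen here for illustration:

```python
from itertools import accumulate

# Dynamic multipliers for lags 0 to 4, copied from the answer to part (a).
monetary_dynamic = [0.006, 0.235, 0.344, 0.385, 0.425]
fiscal_dynamic = [0.170, -0.044, -0.003, -0.079, 0.018]

# The h-period cumulative multiplier is the sum of the dynamic multipliers up to lag h.
monetary_cumulative = [round(c, 3) for c in accumulate(monetary_dynamic)]
fiscal_cumulative = [round(c, 3) for c in accumulate(fiscal_dynamic)]

for lag, (mc, fc) in enumerate(zip(monetary_cumulative, fiscal_cumulative)):
    print(f"lag {lag}: monetary cumulative = {mc:.3f}, fiscal cumulative = {fc:.3f}")
```

Summing the point estimates gives the cumulative multipliers but not their standard errors, which is why the answer recommends re-estimating the equation in differences to test their significance.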
5) Your textbook used a distributed lag model with only current and past values of Xt1 coupled with an AR(1)
error model to derive a quasi-difference model, where the error term was uncorrelated.
(a) Instead use a static model $Y_t = \beta_0 + \beta_1 X_t + u_t$ here, where the error term follows an
AR(1). Derive the quasi-difference form. Explain why in the case of the infeasible GLS estimator you could
easily estimate the $\beta$s by OLS.
(b) Since $\phi_1$ (the autocorrelation parameter for $u_t$) is unknown, describe the Cochrane-Orcutt estimation
procedure.
(c) Explain how the iterated Cochrane-Orcutt estimator works in this situation. Iterations stop when there is
convergence in the estimates. What do you think is meant by that?
(d) Your textbook has pointed out that the iterated Cochrane-Orcutt GLS estimator is in fact the nonlinear least
squares estimator of the model. Given that $-1 < \phi_1 < 1$, suggest a grid search or some strategy to nail down
the value of $\hat{\phi}_1$ which minimizes the sum of squared residuals. This is the so-called Hildreth-Lu method.
Answer: (a) The quasi-difference model is derived by multiplying the equation by $\phi_1$, lagging it one period,
and subtracting the result from the original equation:
$Y_t = \beta_0 + \beta_1 X_t + u_t$
$-[\phi_1 Y_{t-1} = \phi_1\beta_0 + \phi_1\beta_1 X_{t-1} + \phi_1 u_{t-1}]$
which results in
$\tilde{Y}_t = \alpha_0 + \beta_1 \tilde{X}_t + \tilde{u}_t$,
where $\tilde{Y}_t = Y_t - \phi_1 Y_{t-1}$, $\tilde{X}_t = X_t - \phi_1 X_{t-1}$, $\alpha_0 = \beta_0(1-\phi_1)$, and $\tilde{u}_t = u_t - \phi_1 u_{t-1}$.
If $\phi_1$ were known, then it would be possible to generate the quasi-difference variables in a statistical
package and then estimate the coefficients by applying OLS to the transformed variables.
(b) In this case, nonlinear least squares has to be used to estimate the three parameters. One possible
feasible GLS estimator in this case is the Cochrane-Orcutt estimator. In the first step, $\phi_1$ is set to zero, in
which case $\beta_0$ and $\beta_1$ can be estimated by OLS. The resulting residuals are then used to calculate the
OLS estimator for $\phi_1$. This, in turn, is used to generate the quasi-differenced variables, and OLS is then
employed to get the estimates of $\beta_0$ and $\beta_1$.
(c) The iterated Cochrane-Orcutt estimator continues the process described in (b). For example, in the
next step, a new set of residuals is used to update the previous estimate of $\phi_1$, which will generate a new
set of quasi-differenced variables and new estimates of $\beta_0$ and $\beta_1$. The iterations stop when the
estimates change from one round to the next by less than a very small number, which
can be chosen by the econometrician. This is then called convergence.
(d) Under the Hildreth-Lu method, the sum of squared residuals is computed for various values of $\phi_1$,
using quasi-differenced variables. For example, initially a coarse grid is chosen of -0.9, -0.8, -0.7, ..., 0.7,
0.8, 0.9. For the value of $\phi_1$ which yields the smallest SSR, say 0.7, a new finer grid is chosen, such as
0.65, 0.66, 0.67, ..., 0.73, 0.74, 0.75, and again the SSR is calculated for each of these values. The value of
$\phi_1$ which has the smallest SSR is retained and yet a finer grid around it is chosen, etc.
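The two procedures described in (b) to (d) can be sketched for the static model with AR(1) errors. This is an illustrative implementation on simulated data, not the routine of any statistical package: the function names, the true parameter values, the tolerance, and the number of grid refinements are all assumptions.

```python
import random

def ols(x, y):
    """OLS of y on a constant and x; returns intercept, slope, SSR."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ssr = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return intercept, slope, ssr

def quasi_difference(z, phi):
    """z_t - phi * z_{t-1}; the first observation is dropped."""
    return [z[t] - phi * z[t - 1] for t in range(1, len(z))]

def ssr_given_phi(x, y, phi):
    """SSR from OLS on the quasi-differenced data for a fixed phi."""
    return ols(quasi_difference(x, phi), quasi_difference(y, phi))[2]

def hildreth_lu(x, y, rounds=3):
    """Grid search over phi in (-1, 1), refining the grid around the best value."""
    center, half_width, step = 0.0, 0.9, 0.1
    for _ in range(rounds):
        k = round(half_width / step)
        grid = [center + i * step for i in range(-k, k + 1)]
        grid = [p for p in grid if -1 < p < 1]
        center = min(grid, key=lambda p: ssr_given_phi(x, y, p))
        half_width, step = step / 2, step / 10
    return center

def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterated Cochrane-Orcutt: alternate between the betas and phi until phi settles."""
    phi = 0.0  # first step: phi set to zero, i.e. plain OLS
    for _ in range(max_iter):
        a, b1, _ = ols(quasi_difference(x, phi), quasi_difference(y, phi))
        b0 = a / (1 - phi)  # intercept of the quasi-differenced model is beta0 * (1 - phi)
        u = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        new_phi = sum(u[t] * u[t - 1] for t in range(1, len(u))) / sum(ut ** 2 for ut in u[:-1])
        if abs(new_phi - phi) < tol:
            return b0, b1, new_phi
        phi = new_phi
    return b0, b1, phi

# Simulated data with known parameters (illustrative values only).
random.seed(4)
n, beta0, beta1, phi_true = 1000, 1.0, 2.0, 0.7
x = [random.gauss(0, 1) for _ in range(n)]
u = [0.0]
for _ in range(n - 1):
    u.append(phi_true * u[-1] + random.gauss(0, 1))
y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]

phi_grid = hildreth_lu(x, y)
b0_co, b1_co, phi_co = cochrane_orcutt(x, y)
print(f"Hildreth-Lu phi = {phi_grid:.3f}; Cochrane-Orcutt phi = {phi_co:.3f}, beta1 = {b1_co:.3f}")
```

Both routines minimize the same sum of squared residuals, so their estimates of the autocorrelation parameter should agree up to the resolution of the final grid.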
6) (Requires Appendix material) Your textbook states that in the distributed lag regression model, the error term
$u_t$ can be correlated with its lagged values. This autocorrelation arises because, in time series data, the omitted
factors that comprise $u_t$ can themselves be serially correlated.
(a) Give an example of what the authors have in mind.
(b) Consider the ADL model, where the Xs are strictly exogenous, and there is no autocorrelation (and/or
heteroskedasticity) in the error term.
$Y_t = \beta_0^* + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 Y_{t-1} + \tilde{u}_t$
How many coefficients are there to be estimated? Show that this model can be respecified using the lag
operator notation:
$\phi(L) Y_t = \beta_0^* + \beta_1 \delta(L) X_t + \tilde{u}_t$
where $\beta_0 = \frac{\beta_0^*}{1 - \beta_3}$ and $u_t = \frac{1}{1 - \beta_3 L} \tilde{u}_t$.
(d) Explain why autocorrelation in this model can be seen as a simplification, not a nuisance. Can you use
the F-test to test the above hypothesis? Why or why not?
Answer: (a) Taking the textbook example of the percentage change in the real price of orange juice and the
number of freezing degree days, the error term potentially contains other variables such as change in
tastes of the population, the price of substitutes, income, etc. Some of these variables may be hard to
measure, but all of these are bound to change slowly over time and are not likely to be correlated with
the weather variable.
(b) $(1 - \beta_3 L) Y_t = \beta_0^* + \beta_1 \left(1 + \frac{\beta_2}{\beta_1} L\right) X_t + \tilde{u}_t$
(c) Dividing both sides by $(1 - \beta_3 L)$ results in the above equation after cancellation.
(d) There is one parameter less to estimate. The restriction is nonlinear, so the F-test does not apply
here.
7) It has been argued that Canada's aggregate output growth and unemployment rates are very sensitive to
United States economic fluctuations, while the opposite is not true.
(a) A researcher uses a distributed lag model to estimate dynamic causal effects of U.S. economic activity on
Canada. The results (HAC standard errors in parentheses) for the sample period 1961:I-1995:IV are:
$urcan_t = -1.42 + 0.717\, urus_t + 0.262\, urus_{t-1} + 0.023\, urus_{t-2} - 0.083\, urus_{t-3}$
(0.83) (0.457) (0.557) (0.398) (0.405) (0.385)
where urcan is the Canadian unemployment rate, and urus is the United States unemployment rate.
Calculate the long-run cumulative dynamic multiplier.
(b) What are some of the omitted variables that could cause autocorrelation in the error terms? Are these
omitted variables likely to be uncorrelated with current and lagged values of the U.S. unemployment rate? Do
you think that the U.S. unemployment rate is exogenous in this distributed lag regression?
Answer: (a) The long-run cumulative dynamic multiplier is 1.460.
(b) Autocorrelation in the error term is the result of omitted variables which are serially correlated.
Canadian unemployment rates depend on Canadian labor market conditions and most likely on
Canadian aggregate demand variables in the short run. Prime candidates for slowly changing omitted
variables would be demographics, indicators of unemployment insurance generosity, changes in the
terms of trade, monetary policy indicators such as the real interest rate, etc. Some of these variables are
highly likely to be correlated with U.S. unemployment rates since demographics are similar between the
two countries and Canadian monetary policy often follows moves made by the Federal Reserve. A case
could be made that the U.S. unemployment rate is exogenous as a result of the relative size of the two
economies. However, due to the size of the trade between the two countries, this is not as easy to
support as if the dependent variable were the unemployment rate in Costa Rica, say.
8) Consider the following model $Y_t = \beta_0 + \beta_1 X_t^e + u_t$, where the superscript e indicates expected values. This may
represent an example where consumption depends on expected, or permanent, income. Furthermore, let
expected income be formed as follows:
$X_t^e = X_{t-1}^e + \lambda (X_t - X_{t-1}^e)$; $0 < \lambda < 1$
(a) In the above expectation formation hypothesis, expectations are formed at the end of the period, say the 31st
of December, if you had annual data. Give an intuitive explanation for this process.
(b) Rewrite the expectations equation in the following form:
$X_t^e = (1 - \lambda) X_{t-1}^e + \lambda X_t$
Next, following the method used in your textbook, lag both sides of the equation and replace $X_{t-1}^e$. Repeat
this process by repeatedly substituting expressions for $X_{t-2}^e$, $X_{t-3}^e$, and so forth. Show that this results in the
following equation:
$X_t^e = \lambda X_t + \lambda (1-\lambda) X_{t-1} + \lambda (1-\lambda)^2 X_{t-2} + ... + \lambda (1-\lambda)^n X_{t-n} + (1-\lambda)^{n+1} X_{t-n-1}^e$
Explain why it is reasonable to drop the last right hand side term as n becomes large.
(c) Substitute the above expression into the original model that related Y to $X_t^e$. Although you now have right
hand side variables that are all observable, what do you perceive as a potential problem here if you wanted to
estimate this distributed lag model without further restrictions?
(d) Lag both sides of the equation, multiply through by $(1-\lambda)$, and subtract this equation from the equation
found in (c). This is called a Koyck transformation. What does the resulting equation look like? What is the
error process? What is the impact effect (zero-period dynamic multiplier) of a unit change in X, and how does
it differ from the long-run cumulative dynamic multiplier?
Answer: (a) If the forecast error for the previous period, $(X_t - X_{t-1}^e)$, was zero, then expectations are not changed
for the next period. If there was a non-zero forecast error, then expectations are changed by a fraction of
that forecast error.
(b) Substitution of $X_{t-1}^e = (1-\lambda) X_{t-2}^e + \lambda X_{t-1}$ into $X_t^e = (1-\lambda) X_{t-1}^e + \lambda X_t$ results in
$X_t^e = (1-\lambda)^2 X_{t-2}^e + \lambda X_t + \lambda (1-\lambda) X_{t-1}$. The process is then repeated for $X_{t-2}^e$, which gives
$X_t^e = (1-\lambda)^3 X_{t-3}^e + \lambda X_t + \lambda (1-\lambda) X_{t-1} + \lambda (1-\lambda)^2 X_{t-2}$, and so on. The last term involving the
unobservable expectation can be dropped for large n since $0 < \lambda < 1$, so that $(1-\lambda)^{n+1}$ approaches zero.
(c) $Y_t = \beta_0 + \beta_1 X_t^e + u_t$
$= \beta_0 + \beta_1 \lambda X_t + \beta_1 \lambda (1-\lambda) X_{t-1} + \beta_1 \lambda (1-\lambda)^2 X_{t-2} + ... + \beta_1 \lambda (1-\lambda)^n X_{t-n} + u_t$.
For large n, this would require estimation of a large number of coefficients, potentially more than there
are observations available on lags of X.
(d) The Koyck transformation works as follows
< 1.
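The repeated substitution in part (b) of question 8 can be checked numerically. The sketch below is illustrative only: the adjustment weight is called `lam`, the income series and starting expectation are made-up numbers, and the function names are hypothetical. It compares the recursive form of the expectation with the geometric-lag form and shows how small the remainder term becomes.

```python
def adaptive_expectation(x, lam, x_e_start):
    """Recursion X^e_t = (1 - lam) * X^e_{t-1} + lam * X_t; returns X^e for every period."""
    x_e = []
    prev = x_e_start
    for x_t in x:
        prev = (1 - lam) * prev + lam * x_t
        x_e.append(prev)
    return x_e

def geometric_lag(x, lam, x_e_start):
    """Expanded form for the last period: geometric weights on X plus the remainder term."""
    t_last = len(x) - 1
    weighted = sum(lam * (1 - lam) ** i * x[t_last - i] for i in range(len(x)))
    remainder = (1 - lam) ** len(x) * x_e_start
    return weighted, remainder

# Hypothetical income series and starting expectation.
income = [100.0, 104.0, 103.0, 108.0, 110.0, 115.0, 113.0, 118.0]
lam = 0.4
recursive = adaptive_expectation(income, lam, x_e_start=95.0)
weighted, remainder = geometric_lag(income, lam, x_e_start=95.0)

print(recursive[-1], weighted + remainder, remainder)
```

With eight observations and a weight of 0.4, the remainder term carries a factor of $0.6^8$, which is why it can be neglected once the number of lags is large.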
9) The distributed lag regression model requires estimation of (r+1) coefficients in the case of a single explanatory
variable. In your textbook example of orange juice prices and cold weather, r = 18. With additional explanatory
variables, this number becomes even larger.
Consider the distributed lag regression model with a single regressor
$Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 X_{t-2} + ... + \beta_{r+1} X_{t-r} + u_t$
(a) Early econometric analysis of distributed lag regression models was interested in reducing the number of
parameters by approximating the coefficients by a polynomial of a suitable degree, i.e., $\beta_{i+1} \approx f(i)$ for i = 0, 1, ...,
r. Let f(i) be a third degree polynomial, with coefficients $\gamma_0, ..., \gamma_3$. Specify the equations for $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, and
$\beta_{r+1}$.
(b) Substitute these equations into the original distributed lag regression, and rearrange terms so that Y appears
as a linear function of $\beta_0$, $\gamma_0$, $\gamma_1$, $\gamma_2$, $\gamma_3$ and a transformation of $X_t, X_{t-1}, X_{t-2}, ..., X_{t-r}$.
(c) Assume that the third-degree polynomial approximation is quite accurate. Then what is the advantage of
this polynomial lag technique?
Answer: (a) For a third degree polynomial, $f(i) = \gamma_0 + \gamma_1 i + \gamma_2 i^2 + \gamma_3 i^3$. Then
$\beta_1 = f(0) = \gamma_0$
$\beta_2 = f(1) = \gamma_0 + \gamma_1 + \gamma_2 + \gamma_3$
$\beta_3 = f(2) = \gamma_0 + 2\gamma_1 + 4\gamma_2 + 8\gamma_3$
$\beta_4 = f(3) = \gamma_0 + 3\gamma_1 + 9\gamma_2 + 27\gamma_3$
...
$\beta_{r+1} = f(r) = \gamma_0 + r\gamma_1 + r^2\gamma_2 + r^3\gamma_3$
(b) Substitution into the original distributed lag regression yields
$Y_t = \beta_0 + \gamma_0 X_t + (\gamma_0 + \gamma_1 + \gamma_2 + \gamma_3) X_{t-1} + (\gamma_0 + 2\gamma_1 + 4\gamma_2 + 8\gamma_3) X_{t-2}$
$+ ... + (\gamma_0 + r\gamma_1 + r^2\gamma_2 + r^3\gamma_3) X_{t-r} + u_t$
and collecting terms in the coefficients results in
$Y_t = \beta_0 + \gamma_0 (X_t + X_{t-1} + X_{t-2} + ... + X_{t-r}) + \gamma_1 (X_{t-1} + 2X_{t-2} + ... + rX_{t-r})$
$+ \gamma_2 (X_{t-1} + 4X_{t-2} + ... + r^2 X_{t-r}) + \gamma_3 (X_{t-1} + 8X_{t-2} + ... + r^3 X_{t-r}) + u_t$.
(c) By placing restrictions on the lag distribution and transforming the regressors, there are fewer
parameters to estimate, in this case five.
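The rearrangement in part (b) of question 9 is an exact identity, which a short Python check can confirm. This is a sketch with made-up polynomial coefficients (`gammas`) and an arbitrary regressor series; the function names are hypothetical.

```python
def polynomial_lag_coefficients(gammas, r):
    """Lag coefficients implied by the polynomial: f(i) = sum_j gammas[j] * i**j, i = 0..r."""
    return [sum(g * i ** j for j, g in enumerate(gammas)) for i in range(r + 1)]

def transformed_regressors(x, t, r, degree):
    """Z_{j,t} = sum_i i**j * X_{t-i}: one constructed regressor per polynomial coefficient."""
    return [sum(i ** j * x[t - i] for i in range(r + 1)) for j in range(degree + 1)]

# Hypothetical values: a cubic polynomial and r = 18 lags, as in the orange juice example.
gammas = [0.5, -0.1, 0.02, -0.001]
r = 18
x = [float((7 * k) % 11) for k in range(30)]
t = 25

betas = polynomial_lag_coefficients(gammas, r)
z = transformed_regressors(x, t, r, degree=3)

# Left: the unrestricted distributed lag with r + 1 coefficients.
distributed_lag = sum(betas[i] * x[t - i] for i in range(r + 1))
# Right: the same quantity written with only four polynomial coefficients.
polynomial_form = sum(g * z_j for g, z_j in zip(gammas, z))

print(len(betas), distributed_lag, polynomial_form)
```

In an estimation setting, the constructed regressors would replace the 19 lags of X, so OLS would estimate the intercept plus four polynomial coefficients and the lag coefficients would then be recovered from the polynomial.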
10) The distributed lag model relating orange juice prices to the Orlando weather reported in the text was of the
form
$\%ChgP_t = \beta_0 + \beta_1 FDD_t + \beta_2 FDD_{t-1} + \beta_3 FDD_{t-2} + ... + \beta_{19} FDD_{t-18} + u_t$
(a) Suppose that an agricultural economist tells you that a freeze in December is more harmful than a freeze in
the other months. How would you modify the regression to incorporate this effect? How would you test for
this December effect?
(b) The same economist tells you that the damage caused by freezes is not well captured by the FDD variable.
She says that a single day with a temperature of 24 degrees is more damaging than 8 days with a
temperature of 31 degrees. How would you modify the regression to incorporate this effect?
Answer: (a) A binary variable can be added to the list of regressors, which takes on the value of one in December
and is zero otherwise. A t-statistic can be computed for the coefficient of the December binary variable,
using HAC standard errors. The t-statistic has a standard normal distribution in large samples.
(b) An additional regressor (TempFreeze) can be introduced, either by itself or interacted with FDD. To
capture the postulated effect, it might be specified as follows:
$TempFreeze_t = DFreeze_t \times \sum_{i=1}^{FDD} (Temp_t - 32)^2$
where DFreeze is a binary variable that takes on the value of one for a month with freezing temperatures, and
Temp is the minimum temperature for any monthly freezing degree day.
11) (Requires some calculus) In the following, assume that $X_t$ is strictly exogenous and that economic theory
suggests that, in equilibrium, the following relationship holds between $Y^*$ and $X_t$, where the * indicates
equilibrium.
$Y^* = kX_t$
An error term could be added here by assuming that even in equilibrium, random variations from strict
proportionality might occur. Next let there be adjustment costs when changing Y, e.g. costs associated with
changes in employment for firms. As a result, an entity might be faced with two types of costs: being out of
equilibrium and the adjustment cost. Assume that these costs can be modeled by the following quadratic loss
function:
$L = \lambda_1 (Y_t - Y^*)^2 + \lambda_2 (Y_t - Y_{t-1})^2$
a. Minimize the loss function w.r.t. the only variable that is under the entity's control, $Y_t$, and solve for $Y_t$.
b. Note that the two weights on $Y^*$ and $Y_{t-1}$ add up to one. To simplify notation, let the first weight be
$\theta$ and the second weight $(1-\theta)$. Substitute the original expression for $Y^*$ into this equation. In terms of the
ADL(p,q) terminology, what are the values for p and q in this model?
Answer: a. $Y_t = \frac{\lambda_1}{\lambda_1 + \lambda_2} Y^* + \frac{\lambda_2}{\lambda_1 + \lambda_2} Y_{t-1}$
b. $Y_t = \theta k X_t + (1-\theta) Y_{t-1}$
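The minimization behind answer a. of question 11 can be written out step by step. The weight symbols $\lambda_1$, $\lambda_2$ and $\theta$ are notation assumed here for the two cost weights and the resulting adjustment weight.

```latex
\begin{align*}
\frac{\partial L}{\partial Y_t}
  &= 2\lambda_1 (Y_t - Y^*) + 2\lambda_2 (Y_t - Y_{t-1}) = 0 \\
(\lambda_1 + \lambda_2)\, Y_t
  &= \lambda_1 Y^* + \lambda_2 Y_{t-1} \\
Y_t
  &= \frac{\lambda_1}{\lambda_1 + \lambda_2}\, Y^*
   + \frac{\lambda_2}{\lambda_1 + \lambda_2}\, Y_{t-1} \\
  &= \theta k X_t + (1-\theta)\, Y_{t-1},
  \qquad \theta = \frac{\lambda_1}{\lambda_1 + \lambda_2}.
\end{align*}
```

The two fractions sum to one by construction, which is the property part b. relies on; the last line uses $Y^* = kX_t$.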
12) Your textbook estimates the initial relationship between the percentage change of real frozen OJ and the
freezing degree days as follows:
$\%ChgP_t = -0.40 + 0.47\, FDD_t$
(0.22) (0.13)
t = 1950:1-2000:12, $R^2$ = 0.09, SER = 4.8
a. Calculate the t-statistic for the slope coefficient. Can you reject the null hypothesis that the coefficient
is zero in the population?
b. The above regression was estimated using HAC standard errors. When you re-estimate the regression
using homoskedasticity-only standard errors, the standard error of the slope coefficient drops to 0.06.
Calculate the t-statistic for the slope coefficient again. Which of the two standard errors should you
use for statistical inference?
Answer: a. The t-statistic is 3.62 (= 0.47/0.13). Hence you can reject the null hypothesis at any reasonable level of significance.
b. The t-statistic has now increased to 7.83 (= 0.47/0.06). In the presence of heteroskedasticity and/or
autocorrelation in the errors, OLS estimation of the regression coefficients is still consistent. However,
the homoskedasticity-only or heteroskedasticity-robust standard errors are inconsistent and use of
these in the presence of serial correlation results in misleading statistical inference. For example,
confidence intervals do not contain the true value in the postulated number of times in repeated
samples. The solution is to adjust the estimator for the standard errors by incorporating sample
autocorrelation estimates. This results in the heteroskedasticity- and autocorrelation-consistent (HAC)
estimator of the variance of the estimator. For this estimator to be consistent, a certain truncation
parameter is introduced, so that not all T-1 sample autocorrelations are used. Incorporating this idea
into the HAC formula results in the Newey-West variance estimator.
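The adjustment described in this answer can be sketched for a regression with a single regressor. The code below is an illustration under assumptions: the data are simulated, the function name `hac_slope_variance` is hypothetical, the weights $(m-j)/m$ are the Newey-West weights for a truncation parameter m, and the choice $m = 0.75\,T^{1/3}$ is a common rule of thumb rather than a requirement.

```python
import random

def hac_slope_variance(x, y, m):
    """Slope of y on a constant and x, with its robust and Newey-West (HAC) variance.

    m is the truncation parameter: sample autocorrelations 1, ..., m-1 are used,
    each weighted by (m - j) / m.
    """
    T = len(x)
    mx, my = sum(x) / T, sum(y) / T
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    v = [(xi - mx) * ui for xi, ui in zip(x, resid)]
    # Heteroskedasticity-robust variance of the slope (no autocorrelation correction).
    v0 = sum(vt ** 2 for vt in v)
    var_robust = (v0 / (T - 2)) / (sxx / T) ** 2 / T
    # Correction factor built from the weighted sample autocorrelations of v.
    f_T = 1.0
    for j in range(1, m):
        rho_j = sum(v[t] * v[t - j] for t in range(j, T)) / v0
        f_T += 2.0 * (m - j) / m * rho_j
    return slope, var_robust, var_robust * f_T

# Simulated data (an assumption): regressor and error are both positively autocorrelated.
random.seed(1)
T = 500
x, y = [], []
x_t, u_t = 0.0, 0.0
for _ in range(T):
    x_t = 0.8 * x_t + random.gauss(0, 1)
    u_t = 0.8 * u_t + random.gauss(0, 1)
    x.append(x_t)
    y.append(1.0 + 0.5 * x_t + u_t)

m = round(0.75 * T ** (1 / 3))  # rule-of-thumb truncation parameter
slope, var_hr, var_hac = hac_slope_variance(x, y, m)
_, _, var_m1 = hac_slope_variance(x, y, 1)  # m = 1 uses no autocorrelations

print(m, slope, var_hr ** 0.5, var_hac ** 0.5)
```

With m = 1 the correction factor equals one and the HAC variance collapses to the heteroskedasticity-robust variance, which makes explicit that the difference between the two comes entirely from the autocorrelation terms.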
13) You are hired to forecast the unemployment rate in a geographical area that is peripheral to a large
metropolitan area in the United States. The area in question is called the Inland Empire (San Bernardino
County and Riverside County) and is situated east of Greater Los Angeles (Los Angeles County and Orange
County). While the area has a large population (it is the 14th largest metropolitan statistical area in the United
States), its economic activity relies heavily on that of the larger area it is attached to. For example, it is estimated
that approximately 20% of its workforce commutes into the Greater Los Angeles area for work and few
workers commute the other way. Furthermore, its logistics industry is heavily dependent on economic activity
in the Greater Los Angeles Area. As a result, you view the unemployment rate of the Greater Los Angeles Area
($ur^{GLA}$) to be exogenous in determining the unemployment rate in the Inland Empire ($ur^{IE}$). You estimate the
following distributed lag model, where numbers in parentheses are HAC standard errors:
$\Delta ur_t^{IE} = 0.00002 + 0.74\, \Delta ur_t^{GLA} - 0.04\, \Delta ur_{t-1}^{GLA} - 0.01\, \Delta ur_{t-2}^{GLA} + 0.07\, \Delta ur_{t-3}^{GLA} + 0.05\, \Delta ur_{t-4}^{GLA}$
(0.00010) (0.06) (0.06) (0.06) (0.06) (0.06)
$+ 0.09\, \Delta ur_{t-5}^{GLA} + 0.10\, \Delta ur_{t-6}^{GLA}$
(0.05) (0.06)
a. What is the impact effect of a one percentage point increase (say from 0.06 to 0.07) of the
unemployment rate in the Greater Los Angeles area?
b.
c. Why do you think the variables above appear in changes rather than in levels?
Answer: a. The unemployment rate in the Inland Empire will increase by 0.0074, or roughly three-quarters of a
percentage point.
b. The unemployment rate in the Inland Empire will increase by roughly one percentage point in the
long run.
c. The implication must be that the unemployment rates are not stationary over the sample period.
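The arithmetic behind answers a. and b. can be laid out explicitly. The coefficient list below is read from the printed regression in question 13; the assignment of the last four coefficients to lags 3 through 6 is an assumption, although it does not affect the long-run sum.

```python
# Coefficients on the current and lagged changes in the Greater Los Angeles
# unemployment rate, lags 0 through 6 (lag assignment for lags 3-6 is assumed).
coefficients = [0.74, -0.04, -0.01, 0.07, 0.05, 0.09, 0.10]

# Cumulative dynamic multipliers: running sums of the distributed lag coefficients.
cumulative = []
running = 0.0
for c in coefficients:
    running += c
    cumulative.append(running)

impact_multiplier = coefficients[0]
long_run_multiplier = cumulative[-1]

# Effect of a one percentage point increase (0.01 when rates are in decimals).
change = 0.01
impact_effect = impact_multiplier * change
long_run_effect = long_run_multiplier * change

print(impact_effect, long_run_effect)
print([round(c, 2) for c in cumulative])
```

The impact effect uses only the coefficient on the contemporaneous change, while the long-run cumulative effect adds all seven coefficients.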
14) There is some economic research which suggests that oil prices play a central role in causing recessions in
developed countries. Some of this work suggests that it is only oil price increases that matter and even more
specifically, that it is the percentage point difference between oil prices at date t and the maximum value over
the previous year. Realizing that energy prices in general can fluctuate quite dramatically in both directions
and that geographic areas also benefit substantially from oil price decreases, you decide to estimate the
following distributed lag model using annual data (numbers in parenthesis are HAC standard errors):
^
a.
b. What is the predicted cumulative change in GDP growth over two years of this effect?
c. The HAC F-statistic is 4.07. Can you reject the null hypothesis that oil price changes have no effect on
real GDP growth? What is the critical value you considered? Is there any reason why you should be
cautious using an F-test in this case, given the sample period?
2 Yt is stationary.
C)
Answer: A
5) The following is not a consequence of Xt and Yt being cointegrated:
A) if $X_t$ and $Y_t$ are both I(1), then for some $\theta$, $Y_t - \theta X_t$ is I(0).
B) $X_t$ and $Y_t$ have the same stochastic trend.
C) in the expression $Y_t - \theta X_t$, $\theta$ is called the cointegrating coefficient.
D) if Xt and Yt are cointegrated then integrating one of the variables gives you the same result as integrating
the other.
Answer: D
6) One advantage of forecasts based on a VAR rather than separately forecasting the variables involved is
A) that VAR forecasts are easier to calculate.
B) you typically have knowledge of future values of at least one of the variables involved.
C) it can help to make the forecasts mutually consistent.
D) that VAR involves panel data.
Answer: C
14) $\Delta^2 Y_t$
A) $= \Delta Y_t - \Delta Y_{t-1}$.
B) $= Y_t^2 - Y_{t-1}^2$.
C) $= \Delta Y_t - \Delta Y_{t-2}$.
D) $= Y_t - Y_{t-2}$.
Answer: A
15) The order of integration
A) can never be zero.
B) is the number of times that the series needs to be differenced for it to be stationary.
C) is the value of $\phi_1$ in the quasi-difference ($Y_t - \phi_1 Y_{t-1}$).
D) depends on the number of lags in the VAR specification.
Answer: B
16) To test the null hypothesis of a unit root, the ADF test
A) has higher power than the so-called DF-GLS test.
B) uses complicated iterative techniques.
C) cannot be calculated if the variable is integrated of order two or higher.
D) uses a t-statistic and a special critical value.
Answer: D
17) Unit root tests
A) use the standard normal distribution since they are based on the t-statistic.
B) cannot use the standard normal distribution for statistical inference. As a result the ADF statistic has its
own special table of critical values.
C) can use the standard normal distribution only when testing that the level variable is stationary, but not
the difference variable.
D) can use the standard normal distribution but only if HAC standard errors were computed.
Answer: B
18) In a VECM,
A) past values of Yt -
21) Assume that you have used the OLS estimator in the cointegrating regression and test the residual for a unit
root using an ADF test. The resulting ADF test statistic has a
A) normal distribution in large samples.
B) non-normal distribution which requires ADF critical values for inference.
C) non-normal distribution which requires EG-ADF critical values for inference.
D) normal distribution when HAC standard errors are used.
Answer: C
22) The DOLS estimator has the following property if Xt and Yt are cointegrated:
A) it is BLUE even in small samples.
B) it is efficient in large samples.
C) it has a standard normal distribution when homoskedasticity-only standard errors are used.
D) it has a non-normal distribution in large samples when HAC standard errors are used.
Answer: B
23) Volatility clustering
A) is evident in most cross-sections.
B) implies that a series is serially correlated.
C) can mostly be found in studies of the labor market.
D) is evident in many financial time series.
Answer: D
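24) Using the ADL(1,1) regression $Y_t = \beta_0 + \beta_1 Y_{t-1} + \gamma_1 X_{t-1} + u_t$, the ARCH model for the regression error
assumes that $u_t$ is normally distributed with mean zero and variance $\sigma_t^2$, where
A) $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + ... + \alpha_p u_{t-p}^2$.
B) $\sigma_t^2 = u_{t-1}^2 + ... + u_{t-p}^2 + \phi_1 \sigma_{t-1}^2 + ... + \phi_q \sigma_{t-q}^2$.
C) $\sigma_t^2 = \phi_1 \sigma_{t-1}^2 + ... + \phi_q \sigma_{t-q}^2$.
D) $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + ... + \alpha_p u_{t-p}^2 + \phi_1 \sigma_{t-1}^2 + ... + \phi_q \sigma_{t-q}^2$.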
Answer: A
25) ARCH and GARCH models are estimated using the
A) OLS estimation method.
B) the method of maximum likelihood.
C) DOLS estimation method.
D) VAR specification.
Answer: B
26) A VAR with k time series variables consists of
A) k equations, one for each of the variables, where the regressors in all equations are lagged values of all the
variables
B) a single equation, where the regressors are lagged values of all the variables
C) k equations, one for each of the variables, where the regressors in all equations are never more than one
lag of all the variables
D) k equations, one for each of the variables, where the regressors in all equations are current values of all the
variables
Answer: A
ln(T)
T
ln(T)
T
ln(T)
T
Answer: C
28) Lag length selection in a VAR using the BIC proceeds as follows: among a set of candidate values of p, the estimated
lag length $\hat{p}$ is the value of p
A) For which the BIC exceeds the AIC
B) That maximizes BIC(p)
C) Cannot be determined here since a VAR is a system of equations, not a single one
D) That minimizes BIC(p)
Answer: D
29) The dynamic OLS (DOLS) estimator of the cointegrating coefficient, if Yt and Xt are cointegrated,
A) is efficient in large samples
B) statistical inference about the cointegrating coefficient is valid
C) the t-statistic constructed using the DOLS estimator with HAC standard errors has a standard normal
distribution in large samples
D) all of the above
Answer: D
30) The EG-ADF test
A) is similar to the DF-GLS test
B) is a test for cointegration
C) has as a limitation that it can only test if two variables, but not more than two, are cointegrated
D) uses the ADF in the second step of its procedure
Answer: B
2) Some macroeconomic theories suggest that there is a short-run relationship between the inflation rate and the
unemployment rate. How would you go about forecasting these two variables? Suggest various alternatives
and discuss their advantages and disadvantages.
Answer: There are various methods available for forecasting the inflation rate and the unemployment rate. One
basic distinction is whether or not the two variables are forecasted separately, or jointly as a system of
two equations. Another distinction involves one period ahead forecasts vs. multiperiod forecasts.
Finally, if multiperiod forecasts are used, then there is a multiperiod forecasting regression method vs.
an iterated forecast method.
Univariate Regression Methods: Here either the change in the inflation rate or the unemployment rate is
modeled as an AR(p) and estimated by OLS. Observed values for the regressors are then substituted to
produce a one period ahead forecast. (The one period ahead forecast for the inflation rate can then also
be derived.) Statistical methods, such as the BIC or AIC can be used for choosing the number of lags.
There are two important properties of the forecasts: the best forecast of either the change of the inflation
rate or the unemployment rate depends only on the most recent p past values, and the errors are serially
uncorrelated. These follow from the OLS assumptions. The multiperiod regression method for making
an h-period ahead forecast of the change in inflation or unemployment rate using the AR(p) involves
regressing these variables on p lags, starting from (t-h), i.e., $Y_t = \delta_0 + \delta_1 Y_{t-h} + ... + \delta_p Y_{t-p-h+1} + u_t$.
Since the error term is serially correlated for the multiperiod regression, HAC standard errors must be
used to have a reliable basis for inference. The iterated AR forecast method for the AR(p) is achieved by
forecasting one period ahead initially, then using the forecasted value for the two period ahead forecast,
and so on. More formally, the two-period ahead forecast is
$\hat{Y}_{t|t-2} = \hat{\beta}_0 + \hat{\beta}_1 \hat{Y}_{t-1|t-2} + \hat{\beta}_2 Y_{t-2} + \hat{\beta}_3 Y_{t-3} + ... + \hat{\beta}_p Y_{t-p}$.
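The iterated forecast method described in this answer can be sketched as follows. The AR(2) coefficients and the two starting observations are made-up numbers, and `iterated_ar_forecast` is a hypothetical helper; in practice the coefficients would come from an OLS regression.

```python
def iterated_ar_forecast(coefs, history, steps):
    """Iterated forecasts from an AR(p).

    coefs = [b0, b1, ..., bp]; history ends with the most recent observation.
    Each forecast is fed back in as if it were data for the next step.
    """
    p = len(coefs) - 1
    values = list(history)
    forecasts = []
    for _ in range(steps):
        lags = values[-1:-p - 1:-1]  # most recent first: Y_{t-1}, ..., Y_{t-p}
        next_value = coefs[0] + sum(b * y for b, y in zip(coefs[1:], lags))
        forecasts.append(next_value)
        values.append(next_value)
    return forecasts

# Hypothetical AR(2): Y_t = 1.0 + 0.5 Y_{t-1} + 0.2 Y_{t-2} + u_t.
coefs = [1.0, 0.5, 0.2]
history = [2.0, 3.0]  # Y_{t-2} = 2.0 and Y_{t-1} = 3.0

forecasts = iterated_ar_forecast(coefs, history, steps=2)
one_step, two_step = forecasts
print(one_step, two_step)
```

The one-step forecast uses only observed values, whereas the two-step forecast replaces the unknown most recent value with the one-step forecast, which is exactly the substitution in the two-period formula.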
3) Think of at least five examples from economics where theory suggests that the variables involved are
cointegrated. For one of these cases, explain how you would test for cointegration between the variables
involved and how you could use this information to improve forecasting.
Answer: Answers will vary by student, but given the textbook example of the three-month and one-year interest
rates, you can expect students to list it. Consumption and income, real money balances, income and the
interest rate (or income velocity and the interest rate), purchasing power parity, inflation rates across
countries, are prime candidates.
I will use the example of real consumption and income to explain how to test for cointegration and how
to potentially incorporate the information into forecasting. Both (the log of) consumption and income
should be plotted over time to check whether they give the appearance of having a common stochastic
trend. Furthermore, economic theory suggests that they are proportional to each other, although the
factor of proportionality may depend on other variables. Under the null hypothesis, $C_t - \theta Y_t$ has a unit
root, where C is the log of consumption and Y is the log of disposable income. If $\theta$ were known, then the
DF or DF-GLS unit root tests could be employed here, but since it is not, the cointegrating coefficient has
to be estimated first by OLS, which is consistent if consumption and disposable income are cointegrated.
The resulting residuals from the regression $C_t = \alpha + \theta Y_t + z_t$ are then subjected to a DF t-test with an
intercept and no time trend. The t-statistic is compared to the critical values for the EG-ADF test, and if it
is more negative than the critical value, then the null hypothesis is rejected in favor of consumption and
disposable income being cointegrated.
The lag of the estimated error correction term $(C_t - \hat{\theta} Y_t)$ can then be used as an additional regressor in a
VAR specification to predict both the growth rate of real consumption and the growth rate of real
disposable income. This specification is known as the vector error correction model (VECM).
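The two-step procedure in this answer can be sketched on simulated data. Everything specific here is an assumption: the series are artificial, the helper names are hypothetical, no lagged differences are included in the second step, and the 5% critical value of about -3.41 for one regressor should be checked against the EG-ADF table in the textbook.

```python
import math
import random

def ols_simple(x, y):
    """OLS of y on a constant and x; returns intercept, slope, residuals, slope std. error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)
    return intercept, slope, resid, math.sqrt(s2 / sxx)

def eg_adf_statistic(c, y):
    """Step 1: regress c on y. Step 2: Dickey-Fuller t-statistic on the residuals."""
    _, theta_hat, z, _ = ols_simple(y, c)
    dz = [z[t] - z[t - 1] for t in range(1, len(z))]
    z_lag = z[:-1]
    # Regression of the change in z on an intercept and lagged z (no lagged differences).
    _, delta_hat, _, se = ols_simple(z_lag, dz)
    return theta_hat, delta_hat / se

# Simulated (log) income as a random walk and consumption tied to it by a stationary error.
random.seed(2)
T = 400
income, consumption = [], []
level = 0.0
for _ in range(T):
    level += random.gauss(0, 1)
    income.append(level)
    consumption.append(0.5 + 0.9 * level + random.gauss(0, 0.5))

theta_hat, eg_adf = eg_adf_statistic(consumption, income)
CRITICAL_5PCT = -3.41  # assumed EG-ADF 5% critical value for one regressor
cointegrated = eg_adf < CRITICAL_5PCT
print(theta_hat, eg_adf, cointegrated)
```

The residual from the first step, lagged once, is the error correction term that would enter each equation of the VECM as an extra regressor.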
4) What role does the concept of cointegration and the order of integration play in modeling the relationship
between variables? Explain how tests of cointegration work.
Answer: Cointegration between two or more variables is a regression analysis concept to potentially reveal
long-run relationships among time series variables. Variables are said to be cointegrated if they have the
same stochastic trend in common. Most economic time series are I(1) variables, which means that they
have a unit autoregressive root and that the first difference in that variable is stationary. Since these
variables are often measured in logs, their first difference approximates growth rates. Cointegration
requires a common stochastic trend. Therefore, variables which are tested for cointegration must have
the same order of integration.
The concept of cointegration is also an effort to bring back long-run relationships between variables into
short-run forecasting techniques, such as VARs. Adding the error correction term from the cointegrating
relationship to the VARs results in the vector error correction model. Here all variables are stationary,
either because they have been differenced or because the common stochastic trend has been removed.
VECMs therefore combine short-run and long-run information. One way to think about the role of the
error correction term is that it provides an anchor which pulls the modeled relationships eventually
back to their long-run behavior, even if it is disturbed by shocks in the short-run.
Cointegration also represents the return of the static regression model, i.e., regressions where no lags are
used. To test for cointegration using the EG-ADF test requires estimating a static regression between the
potentially cointegrated variables by OLS first, and then conducting an ADF test on the residuals from
this regression. If the residuals do not have a unit root, then the variables are said to be cointegrated.
Since this is a two step procedure, critical values for the ADF t-statistic are adjusted and are referred to as
the critical values for the EG-ADF statistic. Although the OLS estimator is consistent, it has a nonnormal
distribution and hence inference should not be conducted based on the t-statistic, even if HAC standard
errors are used. Alternative techniques to circumvent this problem, such as the DOLS estimator, which is
consistent and efficient in large samples, have been developed. The DOLS and another frequently used
technique, called the Johansen method, can be easily extended to multiple cointegrating relationships.
5) Carefully explain the difference between forecasting variables separately versus forecasting a vector of time
series variables. Mention how you choose optimal lag lengths in each case. Part of your essay should deal with
multiperiod forecasts and different methods that can be used in that situation. Finally address the difference
between VARs and VECMs.
Answer: When variables are forecasted separately, then single equations of the AR(p) type are typically
involved. If economic theory and/or institutional knowledge suggest that additional predictors should
be included, then forecasts can be potentially improved by estimating an ADL(p,q) model. For one
period ahead forecasts, these are identical to forecasts based on systems of equations. Lag lengths will be
chosen using the BIC or the AIC criterion.
There are three important reasons why VARs may be preferable for forecasting. One results from the
forecasting horizon. If forecasts are to be made two or more periods ahead, then if future values of the
additional predictors are to be used, these have to be forecasted themselves. This can be avoided by
choosing the multiperiod regression method. Here, in the case of an h period forecast, multiperiod
regressions are estimated where all predictors are lagged h periods or more. Second, using VAR
forecasting methods will make the forecasts for the variables involved mutually consistent. This is the
result of using the iterated VAR forecasts whereby the forecasted values are subsequently used to
forecast further ahead. Finally VAR models allow for restrictions across equations to be tested.
Multiperiod regression methods in general may be preferable over iterated forecasts if the AR(p),
ADL(p,q) or VAR models are incorrectly specified. In practice, the difference in forecasts tends to be very
small between the multiperiod regression and iterated forecast methods.
VAR models can be enhanced by incorporating long-run information in the form of error correction
terms. If some of the variables in the VAR model have a common stochastic trend, then this can be used
to improve the forecasts by including the error correction term, thereby turning the VAR model into a
VECM.
6) You have collected quarterly data for the unemployment rate (Unemp) in the United States, using a sample
period from 1962:I (first quarter) to 2009:IV (the data are collected at a monthly frequency, but you have taken
quarterly averages).
a. Does economic theory suggest that the unemployment rate should be stationary?
b. Testing the unemployment rate for stationarity, you run the following regression (where the lag length
was determined using the BIC; using the AIC instead does not change the outcome of the test, even
though it chooses 9 lags of the LHS variable):
$\Delta Unemp_t = 0.217 - 0.035\, Unemp_{t-1} + 0.689\, \Delta Unemp_{t-1}$
(0.01) (0.0012) (0.054)
Use the ADF statistic with an intercept only to test for stationarity. What is your decision?
c. The standard errors reported above were homoskedasticity-only standard errors. Do you think you
could potentially improve on inference by allowing for HAC standard errors?
d. An alternative test for a unit root, the DF-GLS, produces a test statistic of -2.75. Find the critical value
and decide whether or not to reject the null hypothesis. If the decision is different from (b), is there any
reason why you might prefer the DF-GLS test over the ADF test?
Answer: a. In macroeconomics or labor economics, you have learned about the natural rate of unemployment, or
the Non-Accelerating Inflation Rate of Unemployment (NAIRU). The idea here is that unemployment
rates may deviate from this equilibrium unemployment rate, but that, following a shock, the
unemployment rate will revert towards this equilibrium. Hence you might expect the difference between
the unemployment rate and the NAIRU, referred to by some as the cyclical unemployment rate, to be
stationary. Unfortunately the equilibrium unemployment rate is not a constant over time and may be
affected by demographics, the price of search (unemployment insurance benefits), and other variables. If
the NAIRU is not a constant over time, then the unemployment rate itself may not be stationary.
Furthermore, there is also the idea of hysteresis, which allows for the unemployment rate to move to a
new equilibrium rate once a shock hits the economy. The bottom line is that while there is some
guidance from economic theory, it is an empirical question whether or not the unemployment rate is
stationary.
b. The t-statistic for the ADF test is -2.84. The critical value at the 5% level is -2.86, and at the 10% level
it is -2.57. Hence you can reject
the null hypothesis of a unit root for the unemployment rate at the 10% level, but (just) fail to reject the
null hypothesis at the 5% level. Most economists treat the unemployment rate as stationary.
c. The ADF statistic is computed using non-robust standard errors. It turns out that under the null
hypothesis of a unit root, the homoskedasticity-only standard errors generate a t-statistic that is robust to
heteroskedasticity.
d. The critical value for the DF-GLS test is -2.58 at the 1% level. Hence you can reject the null hypothesis
of a unit root using this test. The DF-GLS has a higher power when compared to the ADF test, and
hence should be preferred.
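The decision rule used in parts b and d can be sketched in a few lines of Python. This is only an illustration: the function name `unit_root_decision` is hypothetical, and the critical values are the intercept-only ADF values quoted in the answers of this chapter (-2.57, -2.86, -3.43).

```python
# ADF critical values (intercept only, no time trend) as quoted in this chapter's answers.
ADF_INTERCEPT_ONLY = {"10%": -2.57, "5%": -2.86, "1%": -3.43}

def unit_root_decision(t_stat, critical_values):
    """Return, for each significance level, whether the unit-root null is rejected.

    The test is one-sided: reject when the statistic lies below the critical value.
    """
    return {level: t_stat < cv for level, cv in critical_values.items()}

# ADF statistic for the unemployment rate from part b.
decision = unit_root_decision(-2.84, ADF_INTERCEPT_ONLY)
print(decision)
```

With the statistic of -2.84, the rule rejects at the 10% level only, which matches the answer to part b.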
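1) Consider the GARCH(1,1) model $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 \sigma_{t-1}^2$. Show that this model can be rewritten as
$\sigma_t^2 = \frac{\alpha_0}{1-\phi_1} + \alpha_1 (u_{t-1}^2 + \phi_1 u_{t-2}^2 + \phi_1^2 u_{t-3}^2 + \cdots)$.
(Hint: write the model for $\sigma_{t-1}^2$; substitute this expression into the original specification, and so on.) Explain intuitively the meaning of
the resulting formulation.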
Answer: $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 \sigma_{t-1}^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \phi_1 (\alpha_0 + \alpha_1 u_{t-2}^2 + \phi_1 \sigma_{t-2}^2)$
$= \alpha_0 (1 + \phi_1) + \alpha_1 (u_{t-1}^2 + \phi_1 u_{t-2}^2) + \phi_1^2 \sigma_{t-2}^2$
$= \alpha_0 (1 + \phi_1 + \phi_1^2) + \alpha_1 (u_{t-1}^2 + \phi_1 u_{t-2}^2 + \phi_1^2 u_{t-3}^2) + \phi_1^3 \sigma_{t-3}^2$.
Continuing with the substitutions infinitely and noting that the sum of the geometric series is
$1 + \phi_1 + \phi_1^2 + \phi_1^3 + \cdots = \frac{1}{1-\phi_1}$ (for $|\phi_1| < 1$), you finally arrive at
$\sigma_t^2 = \frac{\alpha_0}{1-\phi_1} + \alpha_1 (u_{t-1}^2 + \phi_1 u_{t-2}^2 + \phi_1^2 u_{t-3}^2 + \phi_1^3 u_{t-4}^2 + \cdots)$. This expression
states that the variances depend on a weighted average of past squared residuals, where the distant past
receives a smaller weight than more recently observed squared residuals.
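The equivalence of the recursive and the weighted-sum form can be checked numerically. The sketch below is only an illustration: the squared residuals are simulated, the function names are hypothetical, and the parameter values (.86, .27, .53) are borrowed from the GARCH(1,1) estimates in question 4. Since the infinite sum is truncated at the start of the sample, the two forms agree up to a term of order $\phi_1^T$.

```python
import random

def garch_recursive(u_sq, alpha0, alpha1, phi1, sigma_sq_start):
    """Iterate sigma_t^2 = alpha0 + alpha1*u_{t-1}^2 + phi1*sigma_{t-1}^2 through the sample."""
    sigma_sq = sigma_sq_start
    for u2 in u_sq:                      # u_sq is ordered from oldest to most recent
        sigma_sq = alpha0 + alpha1 * u2 + phi1 * sigma_sq
    return sigma_sq

def garch_weighted_sum(u_sq, alpha0, alpha1, phi1):
    """alpha0/(1-phi1) + alpha1 * sum_j phi1^j * u_{t-1-j}^2, truncated at the sample start."""
    weighted = sum(phi1 ** j * u2 for j, u2 in enumerate(reversed(u_sq)))
    return alpha0 / (1 - phi1) + alpha1 * weighted

random.seed(1)
u_sq = [random.gauss(0, 1) ** 2 for _ in range(200)]   # simulated squared residuals

recursive = garch_recursive(u_sq, 0.86, 0.27, 0.53, sigma_sq_start=1.0)
closed_form = garch_weighted_sum(u_sq, 0.86, 0.27, 0.53)
print(recursive, closed_form)
```

The weights $\phi_1^j$ decline geometrically, which is the "distant past receives a smaller weight" statement in the answer.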
2) You have collected quarterly data on inflation and unemployment rates for Canada from 1961:III to 1995:IV to
estimate a VAR(4) model of the change in the rate of inflation and the unemployment rate. The results are
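$\widehat{\Delta Inf}_t = \underset{(.44)}{1.02} - \underset{(.09)}{.54}\,\Delta Inf_{t-1} - \underset{(.09)}{.46}\,\Delta Inf_{t-2} - \underset{(.09)}{.32}\,\Delta Inf_{t-3} - \underset{(.08)}{.01}\,\Delta Inf_{t-4}$
$\quad - \underset{(.43)}{.76}\,Unemp_{t-1} + \underset{(.76)}{.20}\,Unemp_{t-2} - \underset{(.76)}{.16}\,Unemp_{t-3} + \underset{(.44)}{.59}\,Unemp_{t-4}$, $R^2 = .26$.
$\widehat{Unemp}_t = \underset{(.10)}{0.18} - \underset{(.016)}{.003}\,\Delta Inf_{t-1} - \underset{(.018)}{.016}\,\Delta Inf_{t-2} - \underset{(.017)}{.018}\,\Delta Inf_{t-3} - \underset{(.016)}{.010}\,\Delta Inf_{t-4}$
$\quad + \underset{(.08)}{1.47}\,Unemp_{t-1} - \underset{(.14)}{.46}\,Unemp_{t-2} - \underset{(.14)}{.08}\,Unemp_{t-3} + \underset{(.08)}{.05}\,Unemp_{t-4}$, $R^2 = .980$.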
(a) Explain how you would use the above regressions to conduct one period ahead forecasts.
(b) Should you test for cointegration between the change in the inflation rate and the unemployment rate and,
in the case of finding cointegration here, respecify the above model as a VECM?
(c) The Granger causality test yields the following F-statistics: 3.75 for the test that the coefficients on lagged
unemployment rate in the change of inflation equation are all zero; and 0.36 for the test that the coefficients on
lagged changes in the inflation rate are all zero. Based on these results, does unemployment Granger-cause
inflation? Does inflation Granger-cause unemployment?
Answer: (a) One period ahead forecasts are the same as for the ADL(4,4) models of the inflation rate and
unemployment rate. For example, forecasting the change in the inflation rate for 1996:I requires use of
the actual values for unemployment and change in inflation rates through 1995:IV. The unemployment
rate for 1996:I is forecasted in the same way using the second regression.
(b) Most economic theories suggest that there is no long-run relationship between the inflation rate and
the unemployment rate, or, stated differently, that the long-run Phillips curve is vertical. Hence
economic theory does not suggest testing for cointegration or using the error correction term in a VECM
model.
(c) The critical value for the $F_{4,\infty}$ statistic is 3.32 at the 1% significance level, and 1.94 at the 10%
significance level. Based on the calculated F-statistics above you can reject the null hypothesis that
lagged unemployment rates do not Granger-cause the inflation rate, but you cannot reject the null
hypothesis that lagged inflation does not Granger-cause the unemployment rate.
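The one-period-ahead forecast described in (a) amounts to plugging the four most recent values of each variable into the estimated equation. A minimal sketch follows, assuming the coefficients of the change-in-inflation equation from the question; the lagged data values are made up for illustration and are not actual Canadian observations, and the function name is hypothetical.

```python
def one_step_forecast(intercept, own_lag_coefs, other_lag_coefs, own_lags, other_lags):
    """Forecast next period's value from one estimated VAR (or ADL) equation.

    Lag lists are ordered most recent first: index 0 holds the value dated t-1.
    """
    forecast = intercept
    forecast += sum(c * v for c, v in zip(own_lag_coefs, own_lags))
    forecast += sum(c * v for c, v in zip(other_lag_coefs, other_lags))
    return forecast

# Coefficients of the change-in-inflation equation from the question.
d_inf_coefs = [-0.54, -0.46, -0.32, -0.01]
unemp_coefs = [-0.76, 0.20, -0.16, 0.59]

# Hypothetical data through the last sample quarter (illustrative values only).
d_inf_lags = [0.5, -0.2, 0.1, 0.3]
unemp_lags = [9.4, 9.5, 9.6, 9.8]

d_inf_forecast = one_step_forecast(1.02, d_inf_coefs, unemp_coefs, d_inf_lags, unemp_lags)
print(round(d_inf_forecast, 3))
```

The unemployment forecast is obtained the same way from the second equation, swapping the roles of the two coefficient lists.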
3) Purchasing power parity (PPP) postulates that the exchange rate between two countries equals the ratio of the
respective price indexes, or $ExchRate = \frac{P^f}{P}$ (where ExchRate is the foreign exchange rate between the two
countries, and P represents the price index, with f indicating the foreign country). The long-run version of PPP
implies that the exchange rate and the price ratio share a common trend.
(a) You collect monthly foreign exchange rate data from 1974:1 to 2002:4 for the U.S./U.K. exchange rate ($/£)
and you collect data on the Consumer Price Index for both countries. Explain how you would use the
Engle-Granger test statistic to investigate the long-run PPP hypothesis.
(b) One of your peers explains that there may be an easier way to test for the validity of PPP. She suggests
simply testing whether or not the real exchange rate, or competitiveness, is stationary. (The real exchange rate is
given by $ExchRate \times \frac{P}{P^f}$.) Is she correct? Explain. How would you implement her suggestion? Which
alternative test statistic is available?
Answer: (a) Using the Engle-Granger two-step procedure, the log of the exchange rate will be regressed on the
relative price ratio (log difference of the two prices). The residuals from this regression will then be
subjected to a Dickey-Fuller t-test with an intercept but no time trend. This is the EG-ADF procedure.
However, the OLS estimator of the coefficient in this regression is only consistent if the two variables are
cointegrated. Furthermore, inference can be misleading since the OLS estimator does not have a normal
distribution. If a test is performed on whether the coefficient of the price ratio is unity, then the DOLS
estimator should be used with HAC standard errors.
(b) If PPP holds, then the exchange rate and the relative price ratio will have a cointegrating coefficient of
$\theta = 1$. First the real exchange rate should be plotted to inspect visually whether or not the two variables
are cointegrated. To test this more formally, the real exchange rate should be tested for containing a unit
root, using the ADF statistic. If the null hypothesis is rejected, then this would suggest that PPP holds in
the long-run. Since the ADF test is not the most powerful test, the DF-GLS test can be used as an
alternative.
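The two steps of the procedure in (a) can be sketched in plain Python. This is a simplified illustration under stated assumptions: the data are simulated (a random walk for the log price ratio and a log exchange rate that is cointegrated with it by construction), the function names are hypothetical, and the second step is a Dickey-Fuller regression with an intercept but without the lagged differences that an augmented test would add. The value -3.41 is the 5% EG-ADF critical value for one regressor.

```python
import random

def ols(y, x):
    """OLS of y on a constant and x: returns (intercept, slope, residuals, slope_se)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - 2)       # homoskedasticity-only error variance
    return intercept, slope, resid, (s2 / sxx) ** 0.5

def engle_granger(y, x):
    """Step 1: cointegrating regression. Step 2: Dickey-Fuller t-statistic on its residuals."""
    _, theta_hat, z, _ = ols(y, x)
    dz = [z[t] - z[t - 1] for t in range(1, len(z))]
    _, delta_hat, _, se = ols(dz, z[:-1])          # intercept, no trend, no lagged differences
    return theta_hat, delta_hat / se

random.seed(0)
x, level = [], 0.0
for _ in range(400):
    level += random.gauss(0, 1)                    # random walk: the common stochastic trend
    x.append(level)
y = [0.4 + 1.0 * xi + random.gauss(0, 0.5) for xi in x]   # cointegrated with x by construction

theta_hat, eg_t = engle_granger(y, x)
print(theta_hat, eg_t, eg_t < -3.41)
```

The suggestion in (b) corresponds to skipping step 1, imposing a coefficient of one, and applying a unit root test directly to the real exchange rate.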
4) You have collected quarterly Canadian data on the unemployment and the inflation rate from 1962:I to 2001:IV.
You want to re-estimate the ADL(3,1) formulation of the Phillips curve using a GARCH(1,1) specification. The
results are as follows:
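$\widehat{\Delta Inf}_t = \underset{(.48)}{1.17} - \underset{(.08)}{.56}\,\Delta Inf_{t-1} - \underset{(.10)}{.47}\,\Delta Inf_{t-2} - \underset{(.09)}{.31}\,\Delta Inf_{t-3} - \underset{(.06)}{.13}\,Unemp_{t-1}$
$\hat\sigma_t^2 = \underset{(.40)}{.86} + \underset{(.11)}{.27}\,u_{t-1}^2 + \underset{(.15)}{.53}\,\sigma_{t-1}^2$.
(a) Test the two coefficients for $u_{t-1}^2$ and $\sigma_{t-1}^2$ in the GARCH model individually for statistical significance.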
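One way to approach part (a) is to form the ratio of each estimate to its standard error and compare it with the large-sample 5% critical value of 1.96. The sketch below uses the estimates and standard errors from the variance equation above; the helper name `t_statistic` is hypothetical.

```python
def t_statistic(estimate, std_error, hypothesized=0.0):
    """Ratio of (estimate - hypothesized value) to the standard error."""
    return (estimate - hypothesized) / std_error

# (coefficient, standard error) pairs from the estimated variance equation.
garch_estimates = {"u_{t-1}^2": (0.27, 0.11), "sigma_{t-1}^2": (0.53, 0.15)}

t_stats = {name: t_statistic(coef, se) for name, (coef, se) in garch_estimates.items()}
for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}, exceeds 1.96: {abs(t) > 1.96}")
```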
5) Consider the following model $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 Y_{t-1} + u_t$, where $X_t$ is strictly exogenous. Show that
by imposing the restriction $\sum_{i=1}^{3} \beta_i = 1$, you can derive the following so-called Error Correction Mechanism
(ECM) model
$\Delta Y_t = \beta_0 + \beta_1 \Delta X_t - \phi (Y - X)_{t-1} + u_t$,
where $\phi = \beta_1 + \beta_2$. What is the short-run (impact) response of a unit increase in X? What is the long-run
solution? Why do you think the term in parenthesis in the above expression is called ECM?
Answer: Starting with $Y_t = \beta_0 + \beta_1 X_t + \beta_2 X_{t-1} + \beta_3 Y_{t-1} + u_t$, subtracting $Y_{t-1}$ from both sides, and adding and
subtracting $\beta_1 X_{t-1}$ on the right hand side, results in $\Delta Y_t = \beta_0 + \beta_1 \Delta X_t + (\beta_1 + \beta_2) X_{t-1} - (1 - \beta_3) Y_{t-1} + u_t$.
Note that $\sum_{i=1}^{3} \beta_i = 1$ implies $\beta_1 + \beta_2 = 1 - \beta_3$. Since $\phi = \beta_1 + \beta_2$, then
$\Delta Y_t = \beta_0 + \beta_1 \Delta X_t - \phi (Y - X)_{t-1} + u_t$. The impact response is $\beta_1$. The long-run solution is
$Y = \frac{\beta_0 + \beta_1 g_X - g_Y}{\phi} + X$, where $g_Y$ and
$g_X$ are the steady-state growth rates of Y and X respectively (assuming that the model is in logs). $(Y - X)_{t-1}$
represents the amount of disequilibrium in the previous period. The term is sometimes referred to as
Equilibrium Correction Mechanism rather than Error Correction Mechanism. If the relationship is in
equilibrium in the previous period, then there is no additional movement in Y other than from the
short-run response.
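The algebra can be verified numerically: under the restriction that the three slope coefficients sum to one, the levels form and the ECM form give the same value of Y. The sketch below uses hypothetical coefficient and data values chosen only for illustration; the function names and the symbol `phi` for the adjustment coefficient are assumptions.

```python
def adl_step(y_prev, x_now, x_prev, b0, b1, b2, b3, u=0.0):
    """Levels form: Y_t = b0 + b1*X_t + b2*X_{t-1} + b3*Y_{t-1} + u_t."""
    return b0 + b1 * x_now + b2 * x_prev + b3 * y_prev + u

def ecm_step(y_prev, x_now, x_prev, b0, b1, phi, u=0.0):
    """ECM form: dY_t = b0 + b1*dX_t - phi*(Y - X)_{t-1} + u_t, returned as a level."""
    d_y = b0 + b1 * (x_now - x_prev) - phi * (y_prev - x_prev) + u
    return y_prev + d_y

# Hypothetical coefficients satisfying b1 + b2 + b3 = 1.
b0, b1, b2 = 0.2, 0.5, 0.1
b3 = 1.0 - b1 - b2
phi = b1 + b2

y_levels = adl_step(y_prev=3.0, x_now=2.4, x_prev=2.0, b0=b0, b1=b1, b2=b2, b3=b3)
y_ecm = ecm_step(y_prev=3.0, x_now=2.4, x_prev=2.0, b0=b0, b1=b1, phi=phi)
print(y_levels, y_ecm)
```

In this example Y was above X in the previous period, so the correction term pulls the change in Y down even though X increased.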
6) Your textbook states that there are three ways to decide if two variables can plausibly be modeled as
cointegrated: use expert knowledge and economic theory, graph the series and see whether they appear to
have a common stochastic trend, and perform statistical tests for cointegration. All three ways should be used
in practice. Accordingly you set out to check whether (the log of) consumption and (the log of) personal
disposable income are cointegrated. You collect data for the sample period 1962:I to 1995:IV and plot the two
variables.
(a) Using the first two methods to examine the series for cointegration, what do you think the likely answer is?
(b) You begin your numerical analysis by testing for a stochastic trend in the variables, using an Augmented
Dickey-Fuller test. The t-statistic for the coefficient of interest is as follows:
Variable with lag of 1     t-statistic
LnYpd                      -1.93
ΔLnYpd                     -5.24
LnC                        -2.20
ΔLnC                       -4.31
where LnYpd is (the log of) personal disposable income, and LnC is (the log of) real consumption. The estimated
equation included an intercept for the two growth rates, and, in addition, a deterministic trend for the level
variables. For each case make a decision about the stationarity of the variables based on the critical value of the
Augmented Dickey-Fuller test statistic. Why do you think a trend was included for level variables?
(c) Using the first step of the EG-ADF procedure, you get the following result:
$\widehat{\ln C}_t = 0.24 + 1.017 \ln Ypd_t$
Should you interpret this equation? Would you be impressed if you were told that the regression $R^2$ was 0.998
and that the t-statistic for the slope was 266.06? Why or why not?
(d) The Dickey-Fuller test for the residuals from the cointegrating regression results in a t-statistic of (-3.64).
State the null and alternative hypotheses and make a decision based on the result.
(e) You want to investigate if the slope of the cointegrating vector is one. To do so, you use the DOLS estimator
and HAC standard errors. The slope coefficient is 1.024 with a standard error of 0.009. Can you reject the null
hypothesis that the slope equals one?
Answer: (a) There are economic theories which postulate that real consumption and real personal disposable
income are proportional to each other in equilibrium. The above figure also suggests that the (log)
difference between the two series is stationary, or that they appear to have a common stochastic trend.
(b) The graph suggests the presence of a time trend. The critical value is (-3.12) at the 10% significance level
and (-3.96) at the 1% level. Hence you cannot reject the null hypothesis that the log levels of
consumption and disposable income contain a unit root. You are able to reject the null hypothesis for the
difference in both variables. Hence both series are I(1).
(c) The equation is estimated using OLS, which is only consistent if consumption and disposable income
are cointegrated. But even if the null hypothesis of a unit root can be rejected, the t-statistic does not
have a normal distribution, even when using HAC standard errors. As a result, inference can be
misleading. The high regression R2 is not surprising, given that the two variables are I(1). This could be
an example of a spurious regression. However, alternative estimators are available, such as DOLS, which
is consistent and efficient in large samples and statistical inference on the coefficient of disposable
income is valid if HAC standard errors are used. Alternatively, the Johansen procedure can be used.
(d) Under the null hypothesis, the residuals from the above regression will have a unit root. The
critical value for the EG-ADF statistic is (-3.41) at the 5% significance level and (-3.96) at the 1% level.
With a t-statistic of (-3.64), the null hypothesis is rejected at the 5% level (though not at the 1% level)
in favor of the alternative hypothesis that consumption and disposable income are cointegrated
over this period.
(e) The t-statistic on the null hypothesis is 2.67. Hence you can reject the null hypothesis at the 5%
significance level.
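The calculation behind (e) can be written out as follows. This is a small sketch using the DOLS slope and HAC standard error from the question; the function name `slope_test` is hypothetical, and the 95% interval uses the large-sample normal critical value of 1.96.

```python
def slope_test(estimate, std_error, hypothesized, z_crit=1.96):
    """t-statistic and large-sample confidence interval for a single coefficient."""
    t_stat = (estimate - hypothesized) / std_error
    ci = (estimate - z_crit * std_error, estimate + z_crit * std_error)
    return t_stat, ci

# DOLS slope and HAC standard error from part (e); null hypothesis: slope = 1.
t_stat, ci = slope_test(1.024, 0.009, hypothesized=1.0)
print(round(t_stat, 2), ci)
```

The hypothesized value of one lies outside the 95% interval, which is the same conclusion as comparing the t-statistic with 1.96.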
7) Your textbook so far considered variables for cointegration that are integrated of the same order. For example,
the log of consumption and personal disposable income might both be I(1) variables, and the error correction
term would be I(0), if consumption and personal disposable income were cointegrated.
(a) Do you think that it makes sense to test for cointegration between two variables if they are integrated of
different orders? Explain.
(b) Would your answer change if you have three variables, two of which are I(1) while the third is I(0)? Can you
think of an example in this case?
Answer: (a) To test for cointegration requires that the two variables have the same stochastic trend. If one variable
is I(1) while the other is I(0), then obviously they do not have the same stochastic trend and therefore
cannot be cointegrated.
(b) In this case there would possibly be cointegration between the two I(1) variables, but not between all
three variables. This does not imply that the third variable could not enter into the relationship. Think,
for example, about a money demand relationship between the (log of) real money balances, income, and
the nominal interest rate. It may well be that in some samples the nominal interest rate is I(0), while real
money balances and income are I(1). Finding real money balances and income to be cointegrated does
not imply that the nominal interest rate does not enter the money demand function. There is simply no
need for the interest rate to enter the cointegrating relation because it is I(0). The cointegrating relation
only involves zero-frequency relationships between the first differences of real money balances and
income, and the zero-frequency component of the first difference of the interest rate is non-existent.
8) For the United States, there is somewhat conflicting evidence whether or not the inflation rate has a unit
autoregressive root. For example, for the sample period 1962:I to 1999:IV using the ADF statistic, you cannot
reject at the 5% significance level that inflation contains a stochastic trend. However the null hypothesis can be
rejected at the 10% significance level. The DF-GLS test rejects the null hypothesis at the five percent level. This
result turns out to be sensitive to the number of lags chosen and the sample period.
(a) Somewhat intrigued by these findings, you decide to repeat the exercise using Canadian data. Letting the
AIC choose the lag length of the ADF regression, which turns out to be three, the ADF statistic is (-1.91). What
is your decision regarding the null hypothesis?
(b) You also calculate the DF-GLS statistic, which turns out to be (-1.23). Can you reject the null hypothesis in
this case?
(c) Is it possible for the two test statistics to yield different answers and if so, why?
Answer: (a) For the Canadian data, the null hypothesis cannot be rejected even at the 10% significance level.
Hence for the chosen sample period and lag length, the Canadian inflation rate seems to have a
stochastic trend.
(b) The critical value for the DF-GLS statistic is (-1.62) at the 10% significance level. Hence the DF-GLS
test comes to the same conclusion as the test based on the ADF statistic: there is evidence of a stochastic
trend.
(c) The two test statistics can come to different conclusions, although this is not the case with the
Canadian inflation rate. The reason is that the DF-GLS test has more power.
9) You have collected time series for various macroeconomic variables to test if there is a single cointegrating
relationship among multiple variables. Formulate the null hypothesis and compare the EG-ADF statistic to its
critical value.
(a) Canadian unemployment rate, Canadian Inflation Rate, United States unemployment rate, United States
inflation rate; t = (-3.374).
(b) Approval of United States presidents (Gallup poll), cyclical unemployment rate, inflation rate, Michigan
Index of Consumer Sentiment; t = (-3.837).
(c) The log of real GDP, log of real government expenditures, log of real money supply (M2); t = (-2.23).
(d) Briefly explain how you could potentially improve on VAR(p) forecasts by using a cointegrating vector.
Answer: (a) The null hypothesis of a unit root in the error correction term cannot be rejected even at the 10% level.
Hence there is little support of a single cointegrating relationship between these four variables.
(b) The critical value is (-4.20) at the 10% significance level. Hence you cannot reject the null hypothesis
of the error correction term having a unit root.
(c) Since the critical value for three variables is (-3.84) at the 10% significance level, there does not seem
to be a cointegrating relationship between the three variables.
(d) Adding the error correction term from the cointegrating relationship between variables to the
VAR(p) model results in a vector error correction model (VECM). The advantage of this model over a
VAR model is that it incorporates both short-run and long-run information into the forecasting
equation.
10) There has been much talk recently about the convergence of inflation rates between many of the OECD
economies. You want to see if there is evidence of this closer to home by checking whether or not Canada's
inflation rate and the United States inflation rate are cointegrated.
(a) You begin your numerical analysis by testing for a stochastic trend in the variables, using an Augmented
Dickey-Fuller test. The t-statistic for the coefficient of interest is as follows:
Variable with lag of 1     t-statistic
InfCan                     -1.93
ΔInfCan                    -6.38
InfUS                      -2.37
ΔInfUS                     -5.63
where InfCan is the Canadian inflation rate, and InfUS is the United States inflation rate. The estimated
equation included an intercept. For each case make a decision about the stationarity of the variables based on
the critical value of the Augmented Dickey-Fuller test statistic.
(b) Your test for cointegration results in an EG-ADF statistic of (-7.34). Can you reject the null hypothesis of a
unit root for the residuals from the cointegrating regression?
(c) Using a working hypothesis that the two inflation rates are cointegrated, you want to test whether or not the
slope coefficient equals one. To do so you estimate the cointegrating equation using the DOLS estimator with
HAC standard errors. The coefficient on the U.S. inflation rate has a value of 0.45 with a standard error of 0.13.
Can you reject the null hypothesis that the slope equals unity?
(d) Even if you could not reject the null hypothesis of a unit slope, would that have been sufficient evidence to
establish convergence?
Answer: (a) The critical value for the ADF is (-2.57) at the 10% significance level for the sample period. Therefore
you cannot reject the null hypothesis that there is a unit root for both inflation rates. However, given the
critical value for the ADF statistic of (-3.43) you can reject the null hypothesis for the difference or the
acceleration in the inflation rates at the 1% significance level. Both price levels appear to be I(2) variables.
(b) Given the critical value of (-3.96) for the EG-ADF statistic, you can reject the null hypothesis of a unit
root in favor of the two inflation rates being cointegrated.
(c) The DOLS estimator allows for statistical inference on the coefficient using the standard normal
distribution. Since 0.45 is more than two standard errors from unity, you can reject the null
hypothesis of that regression coefficient being one.
(d) Finding a unit slope would not be sufficient for convergence, since it would allow for a constant
difference between the two inflation rates. To have convergence you would need that difference to be
zero.
11) You have re-estimated the two variable VAR model of the change in the inflation rate and the unemployment
rate presented in your textbook using the sample period 1982:I (first quarter) to 2009:IV. To see if the
conclusions regarding Granger causality have changed, you conduct an F-test for this new sample period. The
results are as follows: The F-statistic testing the null hypothesis that the coefficients on $Unemp_{t-1}$, $Unemp_{t-2}$,
$Unemp_{t-3}$, and $Unemp_{t-4}$ are zero in the inflation equation (Equation 16.5 in your textbook) is 6.04. The
F-statistic testing the hypothesis that the coefficients on the four lags of $\Delta Inf_t$ are zero in the unemployment
equation (Equation 16.6 in your textbook) is 0.80.
a.
b.
Do you think that the unemployment rate Granger-causes changes in the inflation rate?
c.
Do you think that the change in the inflation rate Granger-causes the unemployment rate?
Answer: c. In this case, the Granger causality statistic does not exceed the critical value, and hence the conclusion is that
the change in the inflation rate does not Granger-cause the unemployment rate.
12) $\widehat{\Delta Inf}_t = \underset{(0.14)}{0.05} - \underset{(0.07)}{0.31}\,\Delta Inf_{t-1}$
t = 1982:I to 2009:IV, $R^2$ = 0.10, SER = 2.4
a.
Calculate the one-quarter-ahead forecast of both $\Delta Inf_{2010:I}$ and $Inf_{2010:I}$ (the inflation rate in 2009:IV was
2.6 percent, and the change in the inflation rate for that quarter was -1.04).
b.
Calculate the forecast for 2010:II using the iterated multiperiod AR forecast both for the change in the
inflation rate and the inflation rate.
c.
What alternative method could you have used to forecast two quarters ahead? Write down the
equation for the two-period ahead forecast, using parameters instead of numerical coefficients, which
you would have used.
Answer: c. The alternative would have been to use the Direct Multiperiod Forecasts method.
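The iterated forecasts asked for in parts a and b can be sketched as below, assuming the AR(1) estimated for the change in inflation (intercept 0.05, slope -0.31) and the 2009:IV values given in part a. The function name is hypothetical; the level forecast is obtained by adding each forecasted change to the previous level.

```python
def iterated_ar1_forecasts(intercept, slope, last_change, last_level, horizons):
    """Iterate dInf_{t+1} = intercept + slope * dInf_t and cumulate the changes into levels."""
    forecasts = []
    change, level = last_change, last_level
    for _ in range(horizons):
        change = intercept + slope * change    # next forecast uses the previous forecast
        level = level + change
        forecasts.append((change, level))
    return forecasts

forecasts = iterated_ar1_forecasts(0.05, -0.31, last_change=-1.04, last_level=2.6, horizons=2)
for quarter, (change, level) in zip(["2010:I", "2010:II"], forecasts):
    print(f"{quarter}: change = {change:.2f}, inflation = {level:.2f}")
```

A direct two-period forecast would instead regress the change in inflation on its value two quarters earlier and use that single equation, rather than feeding the one-quarter forecast back into the AR(1).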
13) You have collected quarterly data for real GDP (Y) for the United States for the period 1962:I (first quarter) to
2009:IV.
a.
Testing the log of GDP for stationarity, you run the following regression (where the lag length was
determined using the AIC):
$\widehat{\Delta \ln Y}_t = \underset{(0.03)}{0.03} - \underset{(0.0014)}{0.0024}\,\ln Y_{t-1} + \underset{(0.072)}{0.253}\,\Delta \ln Y_{t-1} + \underset{(0.072)}{0.167}\,\Delta \ln Y_{t-2}$
b.
You have decided to test the growth rate of real GDP for stationarity for the same sample period. The
regression is as follows:
$\widehat{\Delta^2 \ln Y}_t = \underset{(0.0009)}{0.0041} - \underset{(0.082)}{0.543}\,\Delta \ln Y_{t-1} - \underset{(0.071)}{0.186}\,\Delta^2 \ln Y_{t-1}$
Use the ADF statistic with an intercept only to test for stationarity. What is your decision?
c.
Using the orders of integration terminology, what order of integration is the log level of real GDP? The
growth rate?
d.
Given that the SER hardly changed in the second equation, why is the regression R2 larger?
Answer: a. The t-statistic for the ADF test is -1.77. The critical value at the 5% level is -2.86. Hence you cannot
reject the null hypothesis of a unit root for the log level of real GDP.
b. The t-statistic for the ADF test is -6.65. The critical value at the 5% level is -2.86. Hence you can reject
the null hypothesis of a unit root for the (quarterly) growth rate of real GDP.
c. The log of real GDP is I(1), the growth rate is I(0); the growth rate is stationary.
d. With the SER, and hence the SSR, essentially unchanged, the TSS must have increased, since $R^2 = 1 - (SSR/TSS)$.
14) Economic theory suggests that the law of one price holds. Applying this concept to foreign and domestic goods
implies that goods will sell for the same price across countries. The consumer price index is the price for a
basket of goods, and is calculated for countries as a whole. Hence in the absence of barriers to trade, and large
transportation costs (and the fact that not all goods are traded) you should observe Purchasing Power Parity
(PPP) between two countries, or $ExchRate \times P = P^f$, where ExchRate is the foreign exchange rate between the two
countries, and P represents the price index, with f indicating the foreign country. Dividing both sides of the
equation by the domestic price level then gives you the standard formulation for PPP: $ExchRate = \frac{P^f}{P}$. If PPP
holds in the long run, then the exchange rate and the price ratio should share a common trend. Since it is a
long-run concept, cointegration provides an interesting way to test for it.
a.
Using monthly data for the U.S./U.K. exchange rate ($/£) and the respective price indexes, you estimate
the following regression:
$\widehat{ExchRate}_t = 0.44 + 0.69\,(\ln P_{US} - \ln P_{UK})$
Collecting the residuals from this regression and using an ADF test for cointegration, you find a
t-statistic of -2.71. Can you reject the null hypothesis of no cointegration? What is the critical value?
b.
Was it good econometric practice to test for cointegration right away? What else should you have done
before proceeding with the EG-ADF test?
Answer: a. The critical value is -3.41 at the 5% significance level and hence the EG-ADF test cannot reject the null hypothesis of no
cointegration.
b. For the regression to establish cointegration, you should test first whether or not the LHS and RHS
variables are of the same order of integration. It is well known that exchange rates follow a random walk
and are therefore I(1) variables, but price indexes are typically of the same order of integration for
countries with similar inflation rates such as the U.K. and the U.S. Hence the RHS variable will likely be
stationary or I(0). (The ADF statistic for the exchange rate is -2.18 while the log price difference has an
ADF statistic of -4.67.)
$\sigma_u^2$.
B) $E(u_i \mid X_i) = 0$.
C) the conditional distribution of $u_i$ given $X_i$ is normal.
D) $\mathrm{var}(u_i \mid X_i) = \sigma_{u,i}^2$.
Answer: D
6) The extended least squares assumptions are of interest, because
A) they will often hold in practice.
B) if they hold, then OLS is consistent.
C) they allow you to study additional theoretical properties of OLS.
D) if they hold, we can no longer calculate confidence intervals.
Answer: C
Y.
C) unbiased.
D) Pr Sn Y
0.
Answer: C
11) Slutsky's theorem combines the Law of Large Numbers
A) with continuous functions.
B) and the normal distribution.
C) and the Central Limit Theorem.
D) with conditions for the unbiasedness of an estimator.
Answer: C
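12) An implication of $\sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ is that
A) $\hat\beta_1$ is unbiased.
B) $\hat\beta_1$ is consistent.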
C) OLS is BLUE.
D) there is heteroskedasticity in the errors.
Answer: B
13) Under the five extended least squares assumptions, the homoskedasticity-only t-statistic in this chapter
A) has a Student t distribution with n-2 degrees of freedom.
B) has a normal distribution.
C) converges in distribution to a $\chi^2_{n-2}$ distribution.
2
u.
B) a consistent estimator of
2
u.
n ^
2
ui
i=1
A) is the expected value of the homoskedasticity only standard errors.
1
n-2
B) =
2
u.
2
u /(n-2).
Answer: B
16) The Gauss-Markov Theorem proves that
A) the OLS estimator is t distributed.
B) the OLS estimator has the smallest mean square error.
C) the OLS estimator is unbiased.
D) with homoskedastic errors, the OLS estimator has the smallest variance in the class of linear and unbiased
estimators, conditional on $X_1, \ldots, X_n$.
Answer: D
17) The following is not one of the Gauss-Markov conditions:
A) $\mathrm{var}(u_i \mid X_1, \ldots, X_n) = \sigma_u^2$, $0 < \sigma_u^2 < \infty$ for $i = 1, \ldots, n$,
D) $E(u_i \mid X_1, \ldots, X_n) = 0$
Answer: B
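$\sum_{i=1}^{n} \hat a_i Y_i$, where $\hat a_i =$
A) $\frac{X_i - \bar X}{\sum_{j=1}^{n} (X_j - \bar X)^2}$.
B) $\frac{1}{n}$.
C) $\frac{X_i - \bar X}{\sum_{j=1}^{n} (X_j - \bar X)}$.
D) $\frac{X_i}{\sum_{j=1}^{n} (X_j - \bar X)^2}$.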
Answer: A
20) If the errors are heteroskedastic, then
A) the OLS estimator is still BLUE as long as the regressors are nonrandom.
B) the usual formula cannot be used for the OLS estimator.
C) your model becomes overidentified.
D) the OLS estimator is not BLUE.
Answer: D
21) Estimation by WLS
A) although harder than OLS, will always produce a smaller variance.
B) does not mean that you should use homoskedasticity-only standard errors on the transformed equation.
C) requires quite a bit of knowledge about the conditional variance function.
D) makes it very hard to interpret the coefficients, since the data is now weighted and not any longer in its
original form.
Answer: C
22) The WLS estimator is called infeasible WLS estimator when
A) the memory required to compute it on your PC is insufficient.
B) the conditional variance function is not known.
C) the numbers used to compute the estimator get too large.
D) calculating the weights requires you to take a square root of a negative number.
Answer: B
2
ui
2
u
C) var(ui|Xi) =
2
u
2
ui
D) var(ui|Xi) =
Answer: C
27) In order to use the t-statistic for hypothesis testing and constructing a 95% confidence interval as ± 1.96
standard errors, the following three assumptions have to hold:
A) the conditional mean of $u_i$, given $X_i$, is zero; $(X_i, Y_i)$, $i = 1, 2, \ldots, n$ are i.i.d. draws from their joint
distribution; $X_i$ and $u_i$ have four moments
B) the conditional mean of $u_i$, given $X_i$, is zero; $(X_i, Y_i)$, $i = 1, 2, \ldots, n$ are i.i.d. draws from their joint
distribution; homoskedasticity
C) the conditional mean of $u_i$, given $X_i$, is zero; $(X_i, Y_i)$, $i = 1, 2, \ldots, n$ are i.i.d. draws from their joint
distribution; the conditional distribution of $u_i$ given $X_i$ is normal
D) none of the above
Answer: A
2
0
B) $\mathrm{var}(u_i \mid X_i) = \theta_0 + \theta_1 X_i^{1/2}$
C) $\mathrm{var}(u_i \mid X_i) = \theta_0 + \theta_1 X_i^2$
D) $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$
Answer: C
29) In practice, you may want to use the OLS estimator instead of the WLS because
A) heteroskedasticity is seldom a realistic problem
B) OLS is easier to calculate
C) heteroskedasticity robust standard errors can be calculated
D) the functional form of the conditional variance function is rarely known
Answer: D
30) If the functional form of the conditional variance function is incorrect, then
A) the standard errors computed by WLS regression routines are invalid
B) the OLS estimator is biased
C) instrumental variable techniques have to be used
D) the regression R2 can no longer be computed
Answer: A
31) Suppose that the conditional variance is $\mathrm{var}(u_i \mid X_i) = \lambda h(X_i)$ where $\lambda$ is a constant and h is a known function.
The WLS estimator is
A) the same as the OLS estimator since the function is known
B) can only be calculated if you have at least 100 observations
C) the estimator obtained by first dividing the dependent variable and regressor by the square root of h and
then regressing this modified dependent variable on the modified regressor using OLS
D) the estimator obtained by first dividing the dependent variable and regressor by h and then regressing
this modified dependent variable on the modified regressor using OLS
Answer: C
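The transformation in option C can be illustrated with a short simulation. The sketch below is only an illustration under stated assumptions: the data are simulated with a hypothetical variance function h(X) = X² and λ = 0.25, the true coefficients (2.0 and 0.5) are made up, and the helper names are not from the textbook. Note that the constant is divided by the square root of h as well, so the transformed regression has two regressors and no separate intercept.

```python
import math
import random

def ols_two_regressors_no_constant(y, x1, x2):
    """OLS of y on x1 and x2 without a constant, via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def wls(y, x, h):
    """Divide Y_i, the constant, and X_i by sqrt(h(X_i)); then apply OLS to the transformed data."""
    w = [1.0 / math.sqrt(h(xi)) for xi in x]
    y_t = [wi * yi for wi, yi in zip(w, y)]
    const_t = w                                   # transformed intercept regressor
    x_t = [wi * xi for wi, xi in zip(w, x)]
    return ols_two_regressors_no_constant(y_t, const_t, x_t)   # (beta0_hat, beta1_hat)

def h(v):
    return v ** 2                                 # hypothetical known variance function

random.seed(42)
lam = 0.25                                        # hypothetical constant lambda
x = [random.uniform(1, 5) for _ in range(2000)]
y = [2.0 + 0.5 * xi + math.sqrt(lam * h(xi)) * random.gauss(0, 1) for xi in x]

beta0_hat, beta1_hat = wls(y, x, h)
print(beta0_hat, beta1_hat)
```

The constant λ does not need to be known for the point estimates, since rescaling all weights by the same factor leaves the WLS coefficients unchanged.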
^
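B) $\sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ where $v_i = u_i$
C) $\sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\mathrm{var}(v_i)}{[\mathrm{var}(X_i)]^2}\right)$ where $v_i = X_i u_i$
D) $\sqrt{n}(\hat\beta_1 - \beta_1) \xrightarrow{d} N\left(0, \frac{\sigma_u^2}{[\mathrm{var}(X_i)]^2}\right)$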
Answer: A
33) (Requires Appendix material) If X and Y are jointly normally distributed and are uncorrelated,
A) then their product is chi-square distributed with n-2 degrees of freedom
B) then they are independently distributed
C) then their ratio is t-distributed
D) none of the above is true
Answer: B
34) Assume that $\mathrm{var}(u_i \mid X_i) = \theta_0 + \theta_1 X_i^2$. One way to estimate $\theta_0$ and $\theta_1$ consistently is to regress
A) $\hat u_i$ on $X_i^2$ using OLS
B) $\hat u_i^2$ on $X_i^2$ using OLS
C) $\hat u_i^2$ on $X_i$ using OLS
D) $\hat u_i^2$ on $X_i^2$ using OLS but suppressing the constant ("restricted least squares")
Answer: B
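The two-step idea in option B can be sketched with simulated data: estimate the regression by OLS, square the residuals, and regress them on the squared regressor. Everything numerical here is hypothetical (the variance parameters 1.0 and 0.5, the regression coefficients, the sample size), and the helper `ols` is a plain single-regressor implementation written for this illustration.

```python
import random

def ols(y, x):
    """OLS of y on a constant and x; returns (intercept, slope)."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
             / sum((a - x_bar) ** 2 for a in x))
    return y_bar - slope * x_bar, slope

random.seed(7)
theta0, theta1 = 1.0, 0.5                       # hypothetical variance-function parameters
x = [random.uniform(0, 3) for _ in range(5000)]
u = [random.gauss(0, (theta0 + theta1 * xi ** 2) ** 0.5) for xi in x]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

# Step 1: OLS on the original regression, keep the squared residuals.
b0, b1 = ols(y, x)
u_hat_sq = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]

# Step 2: regress the squared residuals on X^2 (with a constant).
theta0_hat, theta1_hat = ols(u_hat_sq, [xi ** 2 for xi in x])
print(theta0_hat, theta1_hat)
```

The fitted values from step 2 are what a feasible WLS estimator would use as estimated conditional variances.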
35) Assume that the variance depends on a third variable, $W_i$, which does not appear in the regression function,
and that $\mathrm{var}(u_i \mid X_i, W_i) = \theta_0 + \theta_1 \frac{1}{W_i}$. One way to estimate $\theta_0$ and $\theta_1$ consistently is to regress
A) $\hat u_i$ on $W_i^2$ using OLS
B) $\hat u_i$ on $\frac{1}{W_i}$ using OLS
C) $\hat u_i^2$ on $\frac{X_i}{W_i}$ using OLS
D) $\hat u_i^2$ on $\frac{1}{W_i}$ using OLS
Answer: D
4) "I am an applied econometrician and therefore should not have to deal with econometric theory. There will be
others who I leave that to. I am more interested in interpreting the estimation results." Evaluate.
Answer: Being presented with regression output and interpreting these uncritically does not allow the applied
econometrician to understand the limitations of the tool. As a result, the interpretation may be false as
might be the case in rejecting hypotheses when standard statistical inference does not apply in the
situation at hand. In particular, having knowledge of econometric theory allows the econometrician to
check whether or not the assumptions, which are necessary for statistical properties to hold, apply in a
given situation. Knowing when to apply and when not to apply certain techniques is essential in
conducting statistical inference, such as hypothesis testing and using confidence intervals. If the applied
econometrician understands the limitations of certain estimation techniques, such as OLS, then she will
be able to look for alternative approaches rather than blindly applying techniques by pushing buttons
in econometric software. The above statement therefore seems short-sighted.
5) "One should never bother with WLS. Using OLS with robust standard errors gives correct inference, at least
asymptotically." True, false, or a bit of both? Explain carefully what the quote means and evaluate it critically.
Answer: WLS is a special case of the GLS estimator, and OLS is in turn a special case of the WLS estimator. OLS and
WLS will generally produce different estimates of the intercept and the coefficients of the other regressors, and
different estimates of their standard errors. WLS has the advantage that it is (asymptotically) more
efficient than OLS. However, the efficiency result depends on knowing the conditional variance
function. When this is the case, the parameters can be estimated and the weights can be specified.
Unfortunately in practice, as Stock and Watson put it, the functional form of the conditional variance
function is rarely known. Using an incorrect functional form for the estimation of the parameters results
in incorrect statistical inference. The bottom line is that WLS should be used in those rare instances
where the functional form is known, but not otherwise. Estimation of the parameters using OLS with
heteroskedasticity-robust standard errors, on the other hand, leads to asymptotically valid inferences
even for the case where the functional form of the heteroskedasticity is not known. It therefore seems
that for real world applications the above statement is true.
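The efficiency argument in this answer can be illustrated with a small Monte Carlo sketch. Everything in it is an assumption made for illustration and not taken from the textbook: the data-generating process, the variance function var(u|X) = X^2 (treated as known), the sample size, and the seed. Both estimators should be centered on the true slope, while WLS with the correct weights should show the smaller sampling variance.

```python
import random
import statistics


def ols_fit(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


def simulate(reps=400, n=100, beta0=1.0, beta1=2.0, seed=0):
    """Collect OLS and WLS slope estimates under the assumed var(u|X) = X^2."""
    rng = random.Random(seed)
    ols_slopes, wls_slopes = [], []
    for _ in range(reps):
        x = [rng.uniform(1.0, 5.0) for _ in range(n)]
        y = [beta0 + beta1 * xi + xi * rng.gauss(0.0, 1.0) for xi in x]
        ols_slopes.append(ols_fit(x, y)[1])
        # WLS: divide every observation by X_i. In the transformed regression
        # of Y_i/X_i on 1/X_i, the intercept is the original slope beta1.
        x_t = [1.0 / xi for xi in x]
        y_t = [yi / xi for xi, yi in zip(x, y)]
        wls_slopes.append(ols_fit(x_t, y_t)[0])
    return ols_slopes, wls_slopes


ols_slopes, wls_slopes = simulate()
print("mean OLS slope:", statistics.mean(ols_slopes))
print("mean WLS slope:", statistics.mean(wls_slopes))
print("variance ratio WLS/OLS:",
      statistics.variance(wls_slopes) / statistics.variance(ols_slopes))
```

The sketch deliberately uses the true variance function. With a misspecified weight function the WLS advantage is no longer guaranteed, which is the practical point the answer makes.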
$\sigma_u^2$ (homoskedasticity): this fails since $\mathrm{var}(u_i \mid X_i) = X_i^4$; and
5. The conditional distribution of $u_i$ given $X_i$ is normal (normal errors): this holds since, given $X_i$, $u_i$ is
perfectly normal, so to speak.
(b) Since the model is heteroskedastic, WLS offers efficiency gains.
(c) You would weight each observation by $1/X_i^2$, i.e., regress $Y_i/X_i^2$ on $1/X_i$.
2) (Requires Appendix material) This question requires you to work with Chebychev's Inequality.
(a) State Chebychev's Inequality.
(b) Chebychev's Inequality is sometimes stated in the form "The probability that a random variable is further
than k standard deviations from its mean is less than $1/k^2$." Deduce this form. (Hint: choose $\delta$ artfully.)
(c) If X is distributed N(0,1), what is the probability that X is two standard deviations from its mean? Three?
What is the Chebychev bound for these values?
(d) It is sometimes said that the Chebychev inequality is not "sharp." What does that mean?
Answer: (a) $\Pr(|V - \mu_V| \geq \delta) \leq \mathrm{var}(V)/\delta^2$ for any random variable V and any $\delta > 0$.
(b) Choose $\delta = k\sigma_V$. Then $\Pr(|V - \mu_V| \geq k\sigma_V) \leq \sigma_V^2/(k^2\sigma_V^2) = 1/k^2$.
(c) 0.046 and 0.0027, respectively. (The smallest/largest z-value in Table 1 of the textbook is -2.99/2.99.
Using these values, the second number becomes 0.0028.) Chebychev's inequality gives 0.25 and 0.11,
respectively.
(d) This means that, for some distributions, the probability that a random variable is further
than k standard deviations away from its mean is much less than $1/k^2$.
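The gap between the exact normal tail probabilities and the Chebychev bound in part (c) can be checked directly. The short sketch below uses only Python's standard library; the function names are illustrative choices, not from the textbook.

```python
import math


def normal_two_sided_tail(k):
    """Pr(|Z| >= k) for Z ~ N(0, 1), via the complementary error function."""
    return math.erfc(k / math.sqrt(2.0))


def chebychev_bound(k):
    """Chebychev upper bound on Pr(|V - mu_V| >= k * sigma_V)."""
    return 1.0 / k ** 2


for k in (2, 3):
    print(k, round(normal_two_sided_tail(k), 4), round(chebychev_bound(k), 2))
```

For the normal distribution the bound is far above the exact probability, which is what "not sharp" refers to in part (d).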
3) For this question you may assume that linear combinations of normal variates are themselves normally
distributed. Let a, b, and c be non-zero constants.
(a) X and Y are independently distributed as $N(a, \sigma^2)$. What is the distribution of (bX + cY)?
(b) If $X_1, \dots, X_n$ are distributed i.i.d. as $N(a, \sigma_X^2)$, what is the distribution of $\frac{1}{n}\sum_{i=1}^{n} X_i$?
(c) Draw this distribution for different values of n. What is the asymptotic distribution of this statistic?
(d) Comment on the relationship between your diagram and the concept of consistency.
(e) Let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. What is the distribution of $\sqrt{n}(\bar{X} - a)$? Does your answer depend on n?
Answer: (a) $E(bX + cY) = bE(X) + cE(Y) = a(b + c)$; $\mathrm{var}(bX + cY) = (b^2 + c^2)\sigma^2$.
Hence (bX + cY) is distributed $N(a(b + c), \sigma^2(b^2 + c^2))$.
(b) From (a) it follows that this is distributed as $N(a, \sigma_X^2/n)$.
(c) The curves will be normal curves centered on a, but becoming spike-like as n grows.
(d) The diagram shows that, as n grows, the probability distribution concentrates on a. The probability of
observing a value of $\frac{1}{n}\sum_{i=1}^{n} X_i$ different from a becomes small as n grows. This is consistency.
(e) $\sqrt{n}(\bar{X} - a)$ is distributed $N(0, \sigma_X^2)$. This does not depend on n, in contrast to the large-sample
non-normal case where this distribution is only approached as n grows.
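4) Consider the model $Y_i = \beta_1 X_i + u_i$, where the $X_i$ and the $u_i$ are mutually independent i.i.d. random variables
with finite fourth moment and $E(u_i) = 0$.
(a) Let $\hat{\beta}_1$ denote the OLS estimator of $\beta_1$. Show that
$$\sqrt{n}(\hat{\beta}_1 - \beta_1) = \frac{\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i}{\frac{1}{n}\sum_{i=1}^{n} X_i^2}.$$
(b) What is the mean and the variance of $\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i$?
Answer: (a) The OLS estimator is
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} X_i Y_i}{\sum_{i=1}^{n} X_i^2};$$
substituting the model for $Y_i$ and re-arranging terms then gives the above expression.
(b) The mean is zero and the variance is obtained from
$$\mathrm{var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i\right) = \frac{1}{n}\, n\, \mathrm{var}(X_i u_i) = \sigma_u^2 E(X_i^2).$$
If the Central Limit Theorem holds, then this will be distributed $N(0, \sigma_u^2 E(X_i^2))$.
(c) Let $\sqrt{n}(\hat{\beta}_1 - \beta_1) = x_N / b_N$, where $x_N = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} X_i u_i$ and $b_N = \frac{1}{n}\sum_{i=1}^{n} X_i^2$.
$x_N$ approaches x, which is distributed $N(0, \sigma_u^2 E(X_i^2))$, in distribution, and
$b_N$ approaches $b = E(X_i^2)$ in probability. It follows that $x_N/b_N$ approaches $x/b$ in distribution, which is
distributed $N(0, \sigma_u^2 / E(X_i^2))$ (Slutsky's theorem).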
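5) (Requires Appendix material) If the Gauss-Markov conditions hold, then OLS is BLUE. In addition, assume
here that X is nonrandom. Your textbook proves the Gauss-Markov theorem by using the simple regression
model $Y_i = \beta_0 + \beta_1 X_i + u_i$ and assuming a linear estimator $\tilde{\beta}_1 = \sum_{i=1}^{n} a_i Y_i$. Substitution of the simple regression
model into this expression then results in two conditions for the unbiasedness of the estimator:
$$\sum_{i=1}^{n} a_i = 0 \quad \text{and} \quad \sum_{i=1}^{n} a_i X_i = 1.$$
The variance of the estimator is
$$\sigma_u^2 \sum_{i=1}^{n} a_i^2.$$
Different from your textbook, use the Lagrangian method to minimize the variance subject to the two
constraints. Show that the resulting weights correspond to the OLS weights.
Answer: Define the Lagrangian as
$$L = \sigma_u^2 \sum_{i=1}^{n} a_i^2 - \lambda_1 \sum_{i=1}^{n} a_i - \lambda_2 \left(\sum_{i=1}^{n} a_i X_i - 1\right).$$
To obtain the first order conditions, take the (n+2) derivatives with respect to the n weights and the two
Lagrange multipliers and set these to zero.
$$\frac{\partial L}{\partial a_i} = 0 = 2 a_i \sigma_u^2 - \lambda_1 - \lambda_2 X_i; \quad i = 1, \dots, n$$
$$\frac{\partial L}{\partial \lambda_1} = 0 = \sum_{i=1}^{n} a_i$$
$$\frac{\partial L}{\partial \lambda_2} = 0 = \sum_{i=1}^{n} a_i X_i - 1$$
Using the summation operator on both sides of the first equation and bringing the first constraint into
play then gives $\lambda_1 = -\lambda_2 \bar{X}$. Using this result in the first equation to eliminate the first Lagrange
multiplier results in the following conditions for the n weights: $2 a_i \sigma_u^2 = \lambda_2 (X_i - \bar{X})$. To bring the second
constraint into play, multiply both sides by $X_i$ and use the summation operator on both sides again:
$$2 \sigma_u^2 \sum_{i=1}^{n} a_i X_i = \lambda_2 \sum_{i=1}^{n} (X_i - \bar{X}) X_i \quad \text{or} \quad 2 \sigma_u^2 = \lambda_2 \sum_{i=1}^{n} (X_i - \bar{X})^2.$$
Solving for the second Lagrange multiplier, $\lambda_2 = \dfrac{2\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$, and substituting
into $2 a_i \sigma_u^2 = \lambda_2 (X_i - \bar{X})$ then gives
$$2 a_i \sigma_u^2 = \frac{2 \sigma_u^2 (X_i - \bar{X})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \quad \text{and after simplifying} \quad a_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$
But these are the OLS weights, since the OLS slope estimator is defined as follows:
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X}) Y_i}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \sum_{i=1}^{n} w_i Y_i, \quad \text{where } w_i = \frac{X_i - \bar{X}}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$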
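The result of the Lagrangian derivation can be checked numerically. The sketch below uses hypothetical regressor values and a hand-picked perturbation direction (both are assumptions for illustration): it verifies that the OLS weights satisfy the two unbiasedness constraints and that another set of weights satisfying the same constraints has a larger sum of squares, and hence a larger variance.

```python
def ols_weights(x):
    """Weights a_i = (X_i - Xbar) / sum_j (X_j - Xbar)^2."""
    xbar = sum(x) / len(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [(xi - xbar) / sxx for xi in x]


x = [1.0, 2.0, 4.0, 7.0, 11.0]   # hypothetical regressor values
a = ols_weights(x)

sum_a = sum(a)                                   # first constraint: 0
sum_ax = sum(ai * xi for ai, xi in zip(a, x))    # second constraint: 1
ssq_ols = sum(ai ** 2 for ai in a)

# Direction d with sum(d) = 0 and sum(d * x) = 0 for the x above, so the
# perturbed weights still satisfy both unbiasedness constraints.
d = [2.0, -3.0, 1.0, 0.0, 0.0]
a_alt = [ai + 0.01 * di for ai, di in zip(a, d)]
ssq_alt = sum(ai ** 2 for ai in a_alt)

print("sum a_i        :", sum_a)
print("sum a_i X_i    :", sum_ax)
print("sum a_i^2 (OLS):", ssq_ols)
print("sum a_i^2 (alt):", ssq_alt)
```

Since the variance of the linear estimator is proportional to the sum of squared weights, the larger value for the perturbed weights is a concrete instance of the BLUE property.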
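6) Your textbook states that an implication of the Gauss-Markov theorem is that the sample average, $\bar{Y}$, is the
most efficient linear estimator of $E(Y_i)$ when $Y_1, \dots, Y_n$ are i.i.d. with $E(Y_i) = \mu_Y$ and $\mathrm{var}(Y_i) = \sigma_Y^2$. This follows
from the regression model with no slope and the fact that the OLS estimator is BLUE.
Provide a proof by assuming a linear estimator in the Y's, $\tilde{\mu} = \sum_{i=1}^{n} a_i Y_i$.
(a) State the condition under which this estimator is unbiased.
(b) Derive the variance of this estimator.
(c) Minimize this variance subject to the constraint (condition) derived in (a) and show that the sample mean is
BLUE.
Stock/Watson 2e -- CVC2 8/23/06 -- Page 412
Answer: (a) $E(\tilde{\mu}) = E\left(\sum_{i=1}^{n} a_i Y_i\right) = \sum_{i=1}^{n} a_i E(Y_i) = \mu_Y \sum_{i=1}^{n} a_i$. Hence for this to be unbiased, $\sum_{i=1}^{n} a_i = 1$.
(b) $\mathrm{var}(\tilde{\mu}) = E\left(\sum_{i=1}^{n} a_i Y_i - \mu_Y\right)^2 = E\left(\sum_{i=1}^{n} a_i (Y_i - \mu_Y)\right)^2 = \sum_{i=1}^{n} a_i^2 E(Y_i - \mu_Y)^2 = \sigma_Y^2 \sum_{i=1}^{n} a_i^2$.
(c) Define the Lagrangian as $L = \sigma_Y^2 \sum_{i=1}^{n} a_i^2 - \lambda\left(\sum_{i=1}^{n} a_i - 1\right)$. The first order conditions are
$$\frac{\partial L}{\partial a_i} = 0 = 2 \sigma_Y^2 a_i - \lambda; \quad i = 1, \dots, n$$
$$\frac{\partial L}{\partial \lambda} = 0 = \sum_{i=1}^{n} a_i - 1.$$
Summing the first equation over i gives $2 \sigma_Y^2 \sum_{i=1}^{n} a_i = n\lambda$, so that $\lambda = 2\sigma_Y^2/n$. Substituting back,
$2 \sigma_Y^2 a_i = 2\sigma_Y^2/n$; i = 1, ..., n, or $a_i = 1/n$; i = 1, ..., n. Since these are also the OLS weights, then OLS is BLUE.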
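$$E(W^2) = \int_{-\infty}^{\infty} w^2 f(w)\,dw = \int_{-\infty}^{-\delta} w^2 f(w)\,dw + \int_{-\delta}^{\delta} w^2 f(w)\,dw + \int_{\delta}^{\infty} w^2 f(w)\,dw$$
$$\geq \int_{-\infty}^{-\delta} w^2 f(w)\,dw + \int_{\delta}^{\infty} w^2 f(w)\,dw \geq \delta^2 \left[\int_{-\infty}^{-\delta} f(w)\,dw + \int_{\delta}^{\infty} f(w)\,dw\right] = \delta^2 \Pr(|W| \geq \delta),$$
where the first equality is the definition of $E(W^2)$, the second equality holds because the range of
integration divides up the real line, the first inequality holds because the term that was dropped is
nonnegative, the second inequality holds because $w^2 \geq \delta^2$ over the range of integration, and the final
equality holds by the definition of $\Pr(|W| \geq \delta)$. Substituting $W = V - \mu_V$ into the final expression, noting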
(a) Write the weighted regression as $\tilde{Y}_i = \beta_0 \tilde{X}_{0i} + \beta_1 \tilde{X}_{1i} + \tilde{u}_i$. How would you construct $\tilde{Y}_i$, $\tilde{X}_{0i}$ and $\tilde{X}_{1i}$?
(b) Prove that $\tilde{u}_i$ is homoskedastic.
(c) Which coefficient is the intercept in the modified regression model? Which is the slope?
(d) When interpreting the regression results, which of the two equations should you use, the original or the
modified model?
Answer: (a) $\tilde{Y}_i = \dfrac{Y_i}{X_i}$, $\tilde{X}_{0i} = \dfrac{1}{X_i}$, and $\tilde{X}_{1i} = \dfrac{X_i}{X_i} = 1$.
(b) $\mathrm{var}(\tilde{u}_i \mid X_i) = \mathrm{var}\left(\dfrac{u_i}{X_i} \,\Big|\, X_i\right) = \dfrac{\mathrm{var}(u_i \mid X_i)}{X_i^2}$, which is constant because $\mathrm{var}(u_i \mid X_i)$ is proportional to $X_i^2$.
(c) The coefficient on $\tilde{X}_{1i}$ is now the intercept, while the coefficient on $\tilde{X}_{0i}$ is the slope.
(d) The modified model is simply used to obtain estimates of the original model. The modified model
should therefore not be used for interpretation.
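The role swap in part (c) is easy to see on a tiny noise-free example. In the sketch below the coefficient values and the X data are hypothetical; because there is no error term, the transformed regression recovers the original coefficients exactly (up to floating-point error), with the original slope appearing as the intercept and the original intercept as the slope on 1/X.

```python
def ols_fit(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - slope * xbar, slope


beta0, beta1 = 3.0, 2.0                       # hypothetical original coefficients
x = [1.0, 2.0, 4.0, 5.0, 8.0]                 # hypothetical regressor values
y = [beta0 + beta1 * xi for xi in x]          # noise-free for an exact check

y_tilde = [yi / xi for xi, yi in zip(x, y)]   # Y_i / X_i
x0_tilde = [1.0 / xi for xi in x]             # 1 / X_i, which carries beta0

intercept_t, slope_t = ols_fit(x0_tilde, y_tilde)
print("intercept of modified model (original slope):", intercept_t)
print("slope of modified model (original intercept):", slope_t)
```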
9) (Requires Appendix material) Your textbook considers various distributions such as the standard normal, t, $\chi^2$,
and F distribution, and relationships between them.
(a) Using statistical tables, give examples that the following relationship holds: $F_{n_1,\infty} = \dfrac{\chi^2_{n_1}}{n_1}$.
(b) $t_\infty$ is distributed standard normal, and the square of the t-distribution with $n_2$ degrees of freedom equals
the value of the F distribution with $(1, n_2)$ degrees of freedom. Why does this relationship between the t and F
distribution hold?
Answer: (a) For example, the critical value at the 10% significance level for the F-distribution is $F_{30,\infty} = 1.34$. The
critical value at the 10% significance level for the $\chi^2_{30}$ distribution is 40.26, and dividing by 30 results in 1.34.
(b) The textbook states that if $W_1$ and $W_2$ are independent random variables with chi-squared
distributions and respective degrees of freedom $n_1$ and $n_2$, then the random variable
$$F = \frac{W_1/n_1}{W_2/n_2}$$
has an F distribution with $(n_1, n_2)$ degrees of freedom. This distribution is denoted $F_{n_1,n_2}$. For the
t-distribution, the following holds: Let Z have a standard normal distribution, let W have a $\chi^2_m$
distribution, and let Z and W be independently distributed. Then the random variable
$$t = \frac{Z}{\sqrt{W/m}}$$
has a Student t distribution with m degrees of freedom, denoted $t_m$. Squaring this term gives $t^2 = \dfrac{Z^2}{W/m}$.
But if $Z_1, Z_2, \dots, Z_n$ are n i.i.d. standard normal random variables, then the random variable
$$W = \sum_{i=1}^{n} Z_i^2$$
has a chi-squared distribution with n degrees of freedom. Hence $Z^2$, the square of a standard normal
variable, has a chi-squared distribution with one degree of freedom. This gives $t^2 = \dfrac{Z^2/1}{W/m} = F_{1,m}$.
10) Consider estimating a consumption function from a large cross-section sample of households. Assume that
households at lower income levels do not have as much discretion for consumption variation as households
with high income levels. After all, if you live below the poverty line, then almost all of your income is spent on
necessities, and there is little room to save. On the other hand, if your annual income was $1 million, you could
save quite a bit if you were a frugal person, or spend it all, if you prefer. Sketch what the scatterplot between
consumption and income would look like in such a situation. What functional form do you think could
approximate the conditional variance var(u_i | Income)?
Answer: See the accompanying figure. var(u_i | Income) could be a + b × Income or a + b × Income^2. Hence there would
be heteroskedasticity.
Answer: A
5) The multiple regression model can be written in matrix form as follows:
A) Y = X .
B) Y = X + U.
C) Y = X + U.
D) $Y = X\beta + U$.
Answer: D
6) The linear multiple regression model can be represented in matrix notation as $Y = X\beta + U$, where X is of order
n × (k+1). k represents the number of
A) regressors.
B) observations.
C) regressors excluding the constant regressor for the intercept.
D) unknown regression coefficients.
Answer: C
+ ui, i = 1,, n.
, i = 1,, n.
i i
C) Yi = X + ui, i = 1,, n.
i
D) Yi = X
+ ui, i = 1,, n.
Answer: D
8) The assumption that X has full column rank implies that
A) the number of observations equals the number of regressors.
B) binary variables are absent from the list of regressors.
C) there is no perfect multicollinearity.
D) none of the regressors appear in natural logarithm form.
Answer: C
9) One implication of the extended least squares assumptions in the multiple regression model is that
A) feasible GLS should be used for estimation.
B) $E(U \mid X) = I_n$.
C) $X'X$ is singular.
D) the conditional distribution of U given X is $N(0_n, \sigma_u^2 I_n)$.
Answer: D
10) One of the properties of the OLS estimator is
^
A) X = 0 k+1 .
B) that the coefficient vector
C) $X'(Y - X\hat{\beta}) = 0_{k+1}$.
D) (X X)-1 = X Y
Answer: C
n
11) Minimization of
^
i=1
A) X Y = X .
^
B) X = 0 k+1 .
C) $X'(Y - X\hat{\beta}) = 0_{k+1}$.
D) R = r.
Answer: C
12) The Gauss-Markov theorem for multiple regression proves that
A) MX is an idempotent matrix.
B) the OLS estimator is BLUE.
C) the OLS residuals and predicted values are orthogonal.
D) the variance-covariance matrix of the OLS estimator is $\sigma_u^2 (X'X)^{-1}$.
Answer: B
C) = Y - Y.
D) =
+ (X X)-1 X U
Answer: B
16) The heteroskedasticity-robust estimator of $\Sigma_{\sqrt{n}(\hat{\beta} - \beta)}$ is obtained
A) from $(X'X)^{-1}X'U$.
B) by replacing the population moments in its definition by the identity matrix.
C) from feasible GLS estimation.
D) by replacing the population moments in its definition by sample moments.
Answer: D
17) A joint hypothesis that is linear in the coefficients and imposes a number of restrictions can be written as
A) $(X'X)^{-1}X'Y$.
B) $R\beta = r$.
^
C) .
D) $R\beta = 0$.
Answer: B
18) Let there be q joint hypotheses to be tested. Then the dimension of r in the expression
$R\beta = r$ is
A) q × 1.
B) q × (k+1).
C) (k+1) × 1.
D) q.
Answer: A
19) The formulation $R\beta = r$ to test a hypothesis
A) allows for restrictions involving both multiple regression coefficients and single regression coefficients.
B) is F-distributed in large samples.
C) allows only for restrictions involving multiple regression coefficients.
D) allows for testing linear as well as nonlinear hypotheses.
Answer: A
^
^
^
is distributed N( ,
is distributed N( ,
^ ), where
is distributed N( ,
),where
X
),where
X
-1
/n = Q X
^
n( - )
^=
^
2
u I(k+1).
-1
Q X /n.
2
-1
u (X X) .
26) The extended least squares assumptions in the multiple regression model include four assumptions from
Chapter 6 (ui has conditional mean zero; (Xi,Yi), i = 1,..., n are i.i.d. draws from their joint distribution; Xi and ui
have nonzero finite fourth moments; there is no perfect multicollinearity). In addition, there are two further
assumptions, one of which is
A) heteroskedasticity of the error term.
B) serial correlation of the error term.
C) the conditional distribution of ui given Xi is normal.
D) invertibility of the matrix of regressors.
Answer: C
27) The OLS estimator for the multiple regression model in matrix form is
A) $(X'X)^{-1}X'Y$
B) $X(X'X)^{-1}X' - P_X$
C) $(X'X)^{-1}X'U$
D) $(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$
Answer: A
28) To prove that the OLS estimator is BLUE requires the following assumption
A) (Xi,Yi), i = 1,..., n are i.i.d. draws from their joint distribution
B) Xi and ui have nonzero finite fourth moments
C) the conditional distribution of ui given Xi is normal
D) none of the above
Answer: D
29) The TSLS estimator is
A) $(X'X)^{-1}X'Y$
B) $(X'Z(Z'Z)^{-1}Z'X)^{-1}X'Z(Z'Z)^{-1}Z'Y$
C) $(X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$
D) $(X'P_Z)^{-1}P_Z Y$
Answer: B
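30) The homoskedasticity-only F-statistic is
A) $\dfrac{(R\hat{\beta} - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - r)/q}{s_{\hat{u}}^2}$
B) $\dfrac{(R\hat{\beta} - r)'[R(X'X)^{-1}R']^{-1}(R\hat{\beta} - r)}{s_{\hat{u}}^2}$
C) $\dfrac{(R\hat{\beta} - r)'[R\hat{\Sigma}_{\hat{\beta}}R']^{-1}(R\hat{\beta} - r)}{q}$
D) $\dfrac{\hat{U}'P_Z\hat{U}}{\hat{U}'M_Z\hat{U}}$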
Answer: A
these parameters have been estimated, they can then be used to calculate $\hat{\Omega}$, the estimator of $\Omega$. The
feasible GLS estimator is defined as $\hat{\beta}^{GLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$.
joint
distribution;
Xi and ui have nonzero finite fourth moments;
$\mathrm{var}(u_i \mid X_i) = \sigma_u^2$ (homoskedasticity);
$\sigma_u^2 I_n$, the Gauss-Markov conditions for
multiple regression. If these hold, then OLS is BLUE. If assumptions 5 and 6 do not hold, but
assumptions 1 to 4 still hold, then OLS is consistent and asymptotically normally distributed. Small
sample statistics can be derived for the case where the errors are i.i.d. and normally distributed,
conditional on X.
The GLS assumptions are
1.
$E(U \mid X) = 0_n$;
2.
3.
4.
The major differences between the two sets of assumptions relevant to the estimators themselves are that
(i) GLS allows for homoskedastic errors to be serially correlated (dropping assumption 2 of OLS list),
and (ii) there is the possibility that the errors are heteroskedastic (adding assumption 2 to GLS list). For
the case of independent sampling, replacing $E(UU' \mid X) = \Omega(X)$ with $E(UU' \mid X) = \sigma_u^2 I_n$ turns the GLS
since the estimator typically cannot be computed. The result also holds if an estimator of $\Omega$ exists.
However, for the feasible GLS estimator to be consistent, the first GLS assumption must apply, which is
much stronger than the first OLS assumption, particularly in time series applications. It is therefore
possible for the OLS estimator to be consistent while the GLS estimator is not consistent.
2) Give several economic examples of how to test various joint linear hypotheses using matrix notation. Include
specifications of $R\beta = r$ where you test for (i) all coefficients other than the constant being zero, (ii) a subset of
coefficients being zero, and (iii) equality of coefficients. Talk about the possible distributions involved in
finding critical values for your hypotheses.
Answer: Answers will vary by student. Many restrictions involve the equality of coefficients across different
types of entities in cross-sections (stability).
Using earnings functions, students may suggest testing for the presence of regional effects, as in the
textbook example at the end of Chapter 5 (exercises). The textbook tested jointly for the presence of
interaction effects in the student achievement example at the end of Chapter 6. Students may want to test
for the equality of returns to education and on-the-job training. The panel chapter allowed for the
presence of fixed effects, the presence of which can be tested for. Testing for constant returns to scale in
production functions is also frequently mentioned.
Consider the multiple regression model with k regressors plus the constant. Let R be of order q × (k+1),
where q is the number of restrictions. Then to test (i) for all coefficients other than the constant to be
zero, $H_0: \beta_1 = 0, \beta_2 = 0, \dots, \beta_k = 0$ vs. $H_1: \beta_j \neq 0$ for at least one j, j = 1, ..., k, you have $R = [0_{k \times 1} \;\; I_k]$ and
$r = 0_{k \times 1}$. In large samples, the test will produce the overall regression F-statistic, which has an $F_{k,\infty}$
distribution. In case (ii), reorder the variables so that the regressors with non-zero coefficients appear
first, followed by the regressors with coefficients that are hypothesized to be zero. This leads to the
following formulation
$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_{k-q} X_{k-q,i} + \beta_{k-q+1} X_{k-q+1,i} + \dots + \beta_k X_{ki} + u_i,$$
i = 1, ..., n. $R = [0_{q \times (k-q+1)} \;\; I_q]$ and $r = 0_{q \times 1}$. In large samples, the test will produce an F-statistic, which
has an $F_{q,\infty}$ distribution. In (iii), assume that the task at hand is to test the equality of two coefficients,
say $H_0: \beta_1 = \beta_2$ vs. $H_1: \beta_1 \neq \beta_2$, as in section 5.8 of the textbook.
Then $R = [0 \;\; 1 \;\; -1 \;\; 0 \;\; \dots \;\; 0]$, r = 0 and q = 1. This is a single restriction, and the F-statistic is the square of
the corresponding t-statistic. Hence critical values can be found either from $F_{q,\infty}$ or from the standard
normal table, after taking the square root.
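The three restriction matrices described above can be written out mechanically. The following sketch builds R and r for each case as plain Python lists; the helper names and the example values of k and q are assumptions made for illustration, and coefficients are ordered as (beta_0, beta_1, ..., beta_k).

```python
def restriction_all_slopes_zero(k):
    """Case (i): R = [0_{k x 1}  I_k], r = 0_{k x 1}."""
    R = [[1.0 if j == i + 1 else 0.0 for j in range(k + 1)] for i in range(k)]
    return R, [0.0] * k


def restriction_last_q_zero(k, q):
    """Case (ii): R = [0_{q x (k-q+1)}  I_q], r = 0_{q x 1}."""
    R = [[1.0 if j == k - q + 1 + i else 0.0 for j in range(k + 1)]
         for i in range(q)]
    return R, [0.0] * q


def restriction_equal(k, a, b):
    """Case (iii): a single restriction beta_a - beta_b = 0."""
    row = [0.0] * (k + 1)
    row[a], row[b] = 1.0, -1.0
    return [row], [0.0]


def matvec(R, beta):
    """Compute the product R beta."""
    return [sum(rij * bj for rij, bj in zip(row, beta)) for row in R]


k = 4
R_all, r_all = restriction_all_slopes_zero(k)
R_sub, r_sub = restriction_last_q_zero(k, q=2)
R_eq, r_eq = restriction_equal(k, 1, 2)

beta_null = [5.0, 0.7, 0.7, 0.0, 0.0]   # satisfies (ii) and (iii), not (i)
print(matvec(R_sub, beta_null) == r_sub)   # restriction (ii) holds
print(matvec(R_eq, beta_null) == r_eq)     # restriction (iii) holds
print(matvec(R_all, beta_null) == r_all)   # restriction (i) does not hold
```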
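3) Define the GLS estimator and discuss its properties when $\Omega$ is known. Why is this estimator sometimes called
infeasible GLS? What happens when $\Omega$ is unknown? What would the $\Omega$ matrix look like for the case of
independent sampling with heteroskedastic errors, where $\mathrm{var}(u_i \mid X_i) = c\,h(X_i) = \sigma^2 X_{1i}^2$? Since the inverse of the
error variance-covariance matrix is needed to compute the GLS estimator, find $\Omega^{-1}$. The textbook shows that
the original model $Y = X\beta + U$ will be transformed into $\tilde{Y} = \tilde{X}\beta + \tilde{U}$, where $\tilde{Y} = FY$, $\tilde{X} = FX$, and $\tilde{U} = FU$, and
$F'F = \Omega^{-1}$. Find F in the above case, and describe what effect the transformation has on the original data.
Answer: $\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$. The key point for the GLS estimator with known $\Omega$ is that $\Omega$ is used to
create a transformed regression model such that the resulting error term satisfies the Gauss-Markov
conditions. In that case, GLS is BLUE. However, since $\Omega$ is typically unknown, the estimator cannot be
calculated, and is therefore sometimes referred to as infeasible GLS. If $\Omega$ is unknown, then a feasible GLS
estimator can be calculated if $\Omega$ is a known function of a number of parameters which can be estimated.
Once the parameters have been estimated, they can then be used to calculate $\hat{\Omega}$. The feasible GLS estimator is then
$\hat{\beta}^{GLS} = (X'\hat{\Omega}^{-1}X)^{-1}(X'\hat{\Omega}^{-1}Y)$.
For the heteroskedastic case above,
$$E(UU' \mid X) = \Omega(X) = \sigma^2 \begin{pmatrix} X_{11}^2 & 0 & \cdots & 0 \\ 0 & X_{12}^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_{1n}^2 \end{pmatrix}, \quad \Omega^{-1}(X) = \frac{1}{\sigma^2} \begin{pmatrix} \frac{1}{X_{11}^2} & 0 & \cdots & 0 \\ 0 & \frac{1}{X_{12}^2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{X_{1n}^2} \end{pmatrix},$$
$$F = \frac{1}{\sigma} \begin{pmatrix} \frac{1}{X_{11}} & 0 & \cdots & 0 \\ 0 & \frac{1}{X_{12}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{X_{1n}} \end{pmatrix},$$
so that the transformation divides each observation by $\sigma X_{1i}$.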
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$$
you decide to write the multiple regression model in deviations from mean form. Show what the X matrix, the
(X'X) matrix, and the X'Y matrix would look like now.
(Hint: use small letters to indicate deviations from mean, i.e., $z_i = Z_i - \bar{Z}$, and note that
$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \hat{u}_i$$
$$\bar{Y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{X}_1 + \hat{\beta}_2 \bar{X}_2,$$
so that $y_i = \hat{\beta}_1 x_{1i} + \hat{\beta}_2 x_{2i} + \hat{u}_i$.)
(d) Show that the slope for the population growth rate is given by
$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} y_i x_{1i} \sum_{i=1}^{n} x_{2i}^2 - \sum_{i=1}^{n} y_i x_{2i} \sum_{i=1}^{n} x_{1i} x_{2i}}{\sum_{i=1}^{n} x_{1i}^2 \sum_{i=1}^{n} x_{2i}^2 - \left(\sum_{i=1}^{n} x_{1i} x_{2i}\right)^2}$$
(e) The various sums needed to calculate the OLS estimates are given below:
$\sum_{i=1}^{n} x_{1i}^2 = 0.0122$; $\sum_{i=1}^{n} x_{2i}^2 = 0.6422$; $\sum_{i=1}^{n} y_i^2 = 8.3103$;
$\sum_{i=1}^{n} y_i x_{1i} = -0.2304$; $\sum_{i=1}^{n} y_i x_{2i} = 1.5676$; $\sum_{i=1}^{n} x_{1i} x_{2i} = -0.0520$
Find the numerical values for the effect of population growth and the saving rate on per capita income and
interpret these.
(f) Indicate how you would find the intercept in the above case. Is this coefficient of interest in the
interpretation of the determinants of per capita income? If not, then why estimate it?
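Answer: (a)
$$X = \begin{pmatrix} 1 & X_{11} & X_{21} \\ 1 & X_{12} & X_{22} \\ \vdots & \vdots & \vdots \\ 1 & X_{1n} & X_{2n} \end{pmatrix}, \quad \text{and} \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{pmatrix}.$$
(b) You would expect the population growth rate to have a negative coefficient, and the saving rate to
have a positive coefficient. The order of X'X is 3 × 3.
(c)
$$X = \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ \vdots & \vdots \\ x_{1n} & x_{2n} \end{pmatrix}, \quad X'X = \begin{pmatrix} \sum_{i=1}^{n} x_{1i}^2 & \sum_{i=1}^{n} x_{1i} x_{2i} \\ \sum_{i=1}^{n} x_{1i} x_{2i} & \sum_{i=1}^{n} x_{2i}^2 \end{pmatrix}, \quad X'Y = \begin{pmatrix} \sum_{i=1}^{n} y_i x_{1i} \\ \sum_{i=1}^{n} y_i x_{2i} \end{pmatrix}.$$
(d)
$$\begin{pmatrix} \hat{\beta}_1 \\ \hat{\beta}_2 \end{pmatrix} = (X'X)^{-1} X'Y = \frac{1}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2} \begin{pmatrix} \sum x_{2i}^2 & -\sum x_{1i} x_{2i} \\ -\sum x_{1i} x_{2i} & \sum x_{1i}^2 \end{pmatrix} \begin{pmatrix} \sum y_i x_{1i} \\ \sum y_i x_{2i} \end{pmatrix},$$
where all sums run over i = 1, ..., n. Hence
$$\hat{\beta}_1 = \frac{\sum y_i x_{1i} \sum x_{2i}^2 - \sum y_i x_{2i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}, \quad \hat{\beta}_2 = \frac{\sum y_i x_{2i} \sum x_{1i}^2 - \sum y_i x_{1i} \sum x_{1i} x_{2i}}{\sum x_{1i}^2 \sum x_{2i}^2 - \left(\sum x_{1i} x_{2i}\right)^2}.$$
(e)
$$\hat{\beta}_1 = \frac{-0.2304 \times 0.6422 - (1.5676 \times (-0.0520))}{0.0122 \times 0.6422 - (-0.0520)^2} = -12.953$$
$$\hat{\beta}_2 = \frac{1.5676 \times 0.0122 - ((-0.2304) \times (-0.0520))}{0.0122 \times 0.6422 - (-0.0520)^2} = 1.393$$
A reduction of the population growth rate by one percent increases the per capita income relative to the
United States by roughly 0.13. An increase in the saving rate by ten percent increases per capita income
relative to the United States by roughly 0.14.
(f) The first order condition for the OLS estimator in the case of k = 2 is
$$\sum_{i=1}^{n} Y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} X_{1i} + \hat{\beta}_2 \sum_{i=1}^{n} X_{2i},$$
which, after dividing by n, results in $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}_1 - \hat{\beta}_2 \bar{X}_2$. The
intercept is only of interest if there are observations close to the origin, which is not the case here. If it is
set to zero, then the regression is forced through the origin, instead of being allowed to choose a level.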
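The arithmetic in part (e) can be reproduced from the reported sums. The sketch below is a plain-Python version of the two-regressor formulas in deviations-from-means form; the function and argument names are illustrative. Because the sums are themselves rounded, the computed slopes should agree with the reported values only up to rounding.

```python
def two_regressor_slopes(s11, s22, s12, sy1, sy2):
    """OLS slopes in deviations-from-means form.

    s11 = sum x1^2, s22 = sum x2^2, s12 = sum x1*x2,
    sy1 = sum y*x1, sy2 = sum y*x2.
    """
    det = s11 * s22 - s12 ** 2           # determinant of X'X
    b1 = (sy1 * s22 - sy2 * s12) / det
    b2 = (sy2 * s11 - sy1 * s12) / det
    return b1, b2


b1, b2 = two_regressor_slopes(
    s11=0.0122, s22=0.6422, s12=-0.0520, sy1=-0.2304, sy2=1.5676
)
print("population growth slope:", round(b1, 3))
print("saving rate slope      :", round(b2, 3))
```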
5) In Chapter 10 of your textbook, panel data estimation was introduced. Panel data consist of observations on the
same n entities at two or more time periods T. For two variables, you have
(Xit, Yit), i = 1,..., n and t = 1,..., T
where n could be the U.S. states. The example in Chapter 10 used annual data from 1982 to 1988 for the fatality
rate and beer taxes. Estimation by OLS, in essence, involved stacking the data.
(a) What would the variance-covariance matrix of the errors look like in this case if you allowed for
homoskedasticity-only standard errors? What is its order? Use an example of a linear regression with one
regressor of 4 U.S. states and 3 time periods.
(b) Does it make sense that errors in New Hampshire, say, are uncorrelated with errors in Massachusetts
during the same time period (contemporaneously)? Give examples why this correlation might not be zero.
(c) If this correlation was known, could you find an estimator which was more efficient than OLS?
Answer: (a) Under the extended least squares assumptions, $E(UU' \mid X) = \sigma_u^2 I_n$.
In the above example of 4 U.S. states and 3 time periods, the identity matrix will be of order 12 × 12, or
(nT) × (nT) in general. Specifically, $E(UU' \mid X) = \sigma_u^2 I_{12}$, a diagonal matrix with $\sigma_u^2$ in each of the 12 diagonal
positions and zeros elsewhere.
(b) It is reasonable to assume that a shock to an adjacent state would have an effect on its neighboring
state, particularly when the shock affects the larger of the two such as the case in Massachusetts. Other
examples may be Texas and Arkansas, Michigan and Indiana, California and Arizona, New York and
New Jersey, etc. A negative oil price shock, which affects the demand for automobiles produced in
Michigan, will have repercussions for suppliers located not only in Michigan, but also elsewhere.
(c) In case of a known variance-covariance matrix of the error terms, the GLS estimator
$\hat{\beta}^{GLS} = (X'\Omega^{-1}X)^{-1}(X'\Omega^{-1}Y)$ could be used. The variance-covariance matrix would be of the form
(There is a subtle issue here for the case of a feasible GLS estimator, where the variances and covariances
have to be estimated. It can be shown, in that case, that the GLS estimator does not exist unless n ≤ T,
which is not the case for most panels. It is easier to see that the variance-covariance matrix is singular for
n > T if the data is stacked by time period.)
$\hat{\beta} = (X'X)^{-1}X'Y$.
Show that the estimator does not exist if there are fewer observations than the number of explanatory variables,
including the constant. What is the rank of X'X in this case?
Answer: In order for a matrix to be invertible, it must have full rank. Since X'X is of order (k + 1) × (k + 1), then in
order to invert X'X, it must have rank (k+1). In the case of a product such as X'X, the rank is less than or
equal to the rank of X' or X, whichever is smaller. X is of order n × (k + 1), and assuming that there is no
perfect multicollinearity, will have either rank n or rank (k+1), whichever is the smaller of the two. Hence
if there are fewer observations than the number of explanatory variables (including the constant), then
the rank of X will be n (< k+1), and the rank of X'X is also n (< k+1). Hence X'X does not have full rank,
and therefore cannot be inverted. The OLS estimator does not exist as a result.
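$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad X = \begin{pmatrix} X_{11} \\ X_{12} \\ \vdots \\ X_{1n} \end{pmatrix}, \quad \text{and} \quad \beta = (\beta_1).$$
Then $X'Y = \sum_{i=1}^{n} X_{1i} Y_i$, and $X'X = \sum_{i=1}^{n} X_{1i}^2$. Hence
$$\hat{\beta} = (X'X)^{-1} X'Y = \frac{\sum_{i=1}^{n} X_{1i} Y_i}{\sum_{i=1}^{n} X_{1i}^2}.$$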
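3) Write the following three linear equations in matrix format Ax = b, where x is a 3 × 1 vector containing q, p, and
y, A is a 3 × 3 matrix of coefficients, and b is a 3 × 1 vector of constants.
q = 5 + 3p - 2y
q = 10 - p + 10y
p = 6y
Answer:
$$A = \begin{pmatrix} -3 & 1 & 2 \\ 1 & 1 & -10 \\ 1 & 0 & -6 \end{pmatrix}, \quad x = \begin{pmatrix} p \\ q \\ y \end{pmatrix}, \quad b = \begin{pmatrix} 5 \\ 10 \\ 0 \end{pmatrix}, \quad \text{or} \quad \begin{pmatrix} -3 & 1 & 2 \\ 1 & 1 & -10 \\ 1 & 0 & -6 \end{pmatrix} \begin{pmatrix} p \\ q \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 10 \\ 0 \end{pmatrix}.$$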
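As a check on the matrix form, the system Ax = b can be solved exactly. The sketch below uses Cramer's rule with rational arithmetic from Python's standard library; the helper names are illustrative. The solution can then be substituted back into the three original equations.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def cramer3(A, b):
    """Solve a 3 x 3 system Ax = b exactly using Cramer's rule."""
    d = det3(A)
    solution = []
    for col in range(3):
        # Replace column `col` of A by the vector b.
        Ak = [[b[r] if c == col else A[r][c] for c in range(3)] for r in range(3)]
        solution.append(Fraction(det3(Ak), d))
    return solution


A = [[-3, 1, 2], [1, 1, -10], [1, 0, -6]]
b = [5, 10, 0]
p, q, y = cramer3(A, b)
print("p =", p, " q =", q, " y =", y)
```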
-2
3
4) Let Y = 10 and X =
2
2
1
1
1
1
1
p
5
q = 10 .
y
0
0
1
3
-1
2
5) A =
a11 a12
a21 a22
,B=
b11 b12
c c c
, and C = 11 12 13
b21 b22
c21 c22 c23
a11 + b11
a12 + b12
a21 + b21
a22 + b22
a11 a21
A =
a12 a22
a11 + b11
a21 + b21
a12 + b12
a22 + b22
b11 b21
a +b
a +b
, A + B = 11 11 21 21 .
b12 b22
a12 + b12 a22 + b22
,B=
(AC) =
, (A+B) =
c21
a11 a12
,
c22 , A =
a21 a22
c23
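6) Write the following four restrictions in the form $R\beta = r$, where the hypotheses are to be tested simultaneously.
$\beta_3 = 2\beta_5$,
$\beta_1 + \beta_2 = 1$,
$\beta_4 = 0$,
$\beta_2 = -\beta_6$.
Can you write the following restriction $\beta_2 = -\dfrac{\beta_3}{\beta_1}$ in the same format?
Answer:
$$\begin{pmatrix} 0 & 0 & 0 & 1 & 0 & -2 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \\ \beta_5 \\ \beta_6 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}.$$
The restriction $\beta_2 = -\dfrac{\beta_3}{\beta_1}$ cannot be written in this format because it is nonlinear in the coefficients.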
7) Using the model $Y = X\beta + U$, and the extended least squares assumptions, derive the OLS estimator $\hat{\beta}$. Discuss
the conditions under which X'X is invertible.
Answer: The derivation copies the relevant parts of section 16.1 of the textbook. The model is $Y = X\beta + U$, where
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & X_{11} & \cdots & X_{k1} \\ 1 & X_{12} & \cdots & X_{k2} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{1n} & \cdots & X_{kn} \end{pmatrix}, \quad \text{and} \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}.$$
Y is the n × 1 dimensional vector of n observations on the dependent variable, X is the n × (k + 1)
dimensional matrix of n observations on the k+1 regressors (including the constant regressor for the
intercept), U is the n × 1 dimensional vector of the n error terms, and $\beta$ is the (k+1) × 1 dimensional vector
of the k+1 unknown regression coefficients.
The extended least squares assumptions are:
$E(u_i \mid X_i) = 0$ ($u_i$ has conditional mean zero);
$(X_i, Y_i)$, i = 1, ..., n are independently and identically distributed (i.i.d.) draws from their joint
distribution;
$X_i$ and $u_i$ have nonzero finite fourth moments;
X has full column rank (there is no perfect multicollinearity);
$\mathrm{var}(u_i \mid X_i) = \sigma_u^2$ (homoskedasticity);
the conditional distribution of $u_i$ given $X_i$ is normal (normal errors).
The OLS estimator minimizes the sum of squared prediction mistakes, $\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \dots - b_k X_{ki})^2$.
It is derived by taking the derivative of the sum
of squared prediction mistakes with respect to each element of the coefficient vector, setting these
derivatives to zero, and solving for the estimator $\hat{\beta}$. The derivative with respect to $b_j$
is the jth element of the k+1 dimensional vector $-2X'(Y - Xb)$, where b is the k+1 dimensional
vector consisting of $b_0, \dots, b_k$. There are k+1 such derivatives, each corresponding to an element of b.
Combined, these yield the system of k+1 equations that constitute the first order conditions for the OLS
estimator that, when set to zero, define the OLS estimator $\hat{\beta}$. That is,
$$X'(Y - X\hat{\beta}) = 0_{k+1},$$
or, equivalently, $X'Y = X'X\hat{\beta}$. Solving this system of equations yields the OLS estimator $\hat{\beta}$ in matrix
form:
$$\hat{\beta} = (X'X)^{-1}X'Y,$$
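8) Prove that under the extended least squares assumptions the OLS estimator $\hat{\beta}$ is unbiased and that its
variance-covariance matrix is $\sigma_u^2 (X'X)^{-1}$.
Answer: Start the proof by relating the OLS estimator to the errors:
$$\hat{\beta} = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + U) = \beta + (X'X)^{-1}X'U.$$
To prove the unbiasedness of the OLS estimator, take the conditional expectation of both sides of the
expression:
$$E(\hat{\beta} \mid X) = \beta + E[(X'X)^{-1}X'U \mid X] = \beta + (X'X)^{-1}X'E(U \mid X).$$
Since $E(U \mid X) = 0_n$, it follows that $E(\hat{\beta} \mid X) = \beta$.
The conditional variance-covariance matrix is
$$\mathrm{var}(\hat{\beta} \mid X) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)' \mid X] = E[(X'X)^{-1}X'UU'X(X'X)^{-1} \mid X] = (X'X)^{-1}X'E(UU' \mid X)X(X'X)^{-1}.$$
Since $E(UU' \mid X) = \sigma_u^2 I_n$, we have
$$\mathrm{var}(\hat{\beta} \mid X) = \sigma_u^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma_u^2 (X'X)^{-1}.$$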
9) For the OLS estimator $\hat{\beta} = (X'X)^{-1}X'Y$ to exist, X'X must be invertible. This is the case when X has full rank.
What is the rank of a matrix? What is the rank of the product of two matrices? Is it possible that X could have
rank n? What would be the rank of X'X in the case n < (k+1)? Explain intuitively why the OLS estimator does not
exist in that situation.
Answer: The rank of a matrix is the maximum number of linearly independent rows or columns. In general, in the
case of a rectangular matrix, the maximum number of linearly independent columns is also equal to the
maximum number of linearly independent rows. In the case of X, it can be, at most, either n or (k+1),
whichever is smaller. The rank of a product of two matrices will be, at most, the minimum of the ranks of
the two matrices of the product. In the case of X'X, both matrices will have, at most, either rank n or
(k+1), whichever is smaller. Since X'X is a square matrix of order (k+1) × (k+1), it must have full rank in
order to be invertible. In the absence of perfect multicollinearity, the rank will be (k+1) as long as (k+1) ≤
n. If there are fewer observations than regressors (including the constant), then the rank will be n. Except
for the special case where there are exactly as many observations as regressors (including the constant),
X'X will not have full rank in this case, and cannot be inverted. Intuitively you have to have as many
independent equations as there are unknowns to find a unique solution. This is not the case when you
have n < (k+1).
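The rank argument can be made concrete with a small example. The sketch below uses hypothetical integer data with k + 1 = 3 regressors: with only n = 2 observations the determinant of X'X is zero, so the inverse does not exist, while adding a third linearly independent observation makes X'X invertible.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


X_short = [[1, 2, 5], [1, 3, 7]]             # n = 2 < k + 1 = 3
X_full = [[1, 2, 5], [1, 3, 7], [1, 4, 4]]   # n = 3 = k + 1

det_short = det3(matmul(transpose(X_short), X_short))
det_full = det3(matmul(transpose(X_full), X_full))
print("det(X'X) with n = 2:", det_short)
print("det(X'X) with n = 3:", det_full)
```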
10) In order for a matrix A to have an inverse, its determinant cannot be zero. Derive the determinant of the
following matrices:
$$A = \begin{pmatrix} 3 & 6 \\ -2 & 1 \end{pmatrix}, \quad B = \begin{pmatrix} 1 & -1 & 2 \\ 1 & 0 & 3 \\ 4 & 0 & 2 \end{pmatrix}, \quad X'X \text{ where } X = (1 \;\; 10)$$
Answer: det(A) = 15, det(B) = -10, det(X'X) = 0.
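These three determinants can be verified with a few lines of code. The sketch below is a direct implementation of the 2 × 2 and 3 × 3 determinant formulas; the helper names are illustrative.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = m
    return a * d - b * c


def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[3, 6], [-2, 1]]
B = [[1, -1, 2], [1, 0, 3], [4, 0, 2]]
X = [1, 10]                                # the 1 x 2 row vector X
XtX = [[xi * xj for xj in X] for xi in X]  # the 2 x 2 outer product X'X

print(det2(A), det3(B), det2(XtX))
```

The zero determinant of X'X reflects the rank argument from the previous question: X has a single row, so X'X has rank one and cannot be inverted.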
11)
Your textbook shows that the following matrix (Mx = In - Px ) is a symmetric idempotent matrix.
1
1
Consider a different Matrix A, which is defined as follows: A = I and = 1
n
...
1
a. Show what the elements of A look like.
b. Show that A is a symmetric idempotent matrix
c. Show that A = 0.
^
d. Show that AU= U , where U is the vector of OLS residuals from a multiple regression.
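Answer: a.
$$A = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}
- \frac{1}{n}\begin{pmatrix} 1 & 1 & \cdots & 1 \\ 1 & 1 & \cdots & 1 \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 1 & \cdots & 1 \end{pmatrix}
= \begin{pmatrix} 1-\frac{1}{n} & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1-\frac{1}{n} & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1-\frac{1}{n} \end{pmatrix}$$
b. $A$ is symmetric, since $A' = I' - \frac{1}{n}(\iota\iota')' = I - \frac{1}{n}\iota\iota' = A$. It is idempotent, since
$$AA = \left(I - \frac{1}{n}\iota\iota'\right)\left(I - \frac{1}{n}\iota\iota'\right) = I - \frac{2}{n}\iota\iota' + \frac{1}{n^2}\iota\iota'\iota\iota' = I - \frac{1}{n}\iota\iota' = A,$$
where
$$\iota\iota'\iota\iota' = \begin{pmatrix} n & n & \cdots & n \\ n & n & \cdots & n \\ \vdots & \vdots & \ddots & \vdots \\ n & n & \cdots & n \end{pmatrix} = n\,\iota\iota' \quad \text{and} \quad \iota'\iota = n.$$
c. $A\iota = \left(I - \frac{1}{n}\iota\iota'\right)\iota = \iota - \frac{1}{n}\iota\iota'\iota = \iota - \iota = 0$, since $\iota'\iota = n$.
d. $A\hat{U} = \left(I - \frac{1}{n}\iota\iota'\right)\hat{U} = \hat{U} - \frac{1}{n}\iota\iota'\hat{U} = \hat{U}$, since $\iota'\hat{U} = 0$.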
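Parts b through d can be checked on a small case. The sketch below is an illustration in plain Python with exact fractions, under two assumptions that are not in the original: $n = 4$ is chosen arbitrarily, and `u_hat` is a made-up residual vector that sums to zero, as OLS residuals do when the regression includes a constant.

```python
from fractions import Fraction

n = 4  # arbitrary small sample size for illustration

# A = I - (1/n) * iota iota': 1 - 1/n on the diagonal, -1/n elsewhere.
A = [[Fraction(int(i == j)) - Fraction(1, n) for j in range(n)] for i in range(n)]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

iota = [1] * n
# Hypothetical residuals; they sum to zero, so iota'u_hat = 0.
u_hat = [Fraction(3, 2), Fraction(-1, 2), Fraction(-2), Fraction(1)]

is_symmetric = A == [list(col) for col in zip(*A)]   # A' = A
is_idempotent = matmul(A, A) == A                    # AA = A
A_iota = matvec(A, iota)                             # should be the zero vector
A_u = matvec(A, u_hat)                               # should equal u_hat
print(is_symmetric, is_idempotent, A_iota == [0] * n, A_u == u_hat)
```

Exact fractions are used so that the equality checks are not affected by floating-point rounding.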
12) Write down, in general, the variance-covariance matrix for the multiple regression error term $U$. Using the
assumptions $\mathrm{cov}(u_i, u_j \mid X_i, X_j) = 0$ and $\mathrm{var}(u_i \mid X_i) = \sigma_u^2$, show that the
variance-covariance matrix can be written as $\sigma_u^2 I_n$.
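Answer:
$$\text{var-cov}\left(\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix} \,\Big|\, X\right)
= E\left[\begin{pmatrix} u_1 - E(u_1) \\ u_2 - E(u_2) \\ \vdots \\ u_n - E(u_n) \end{pmatrix}
\begin{pmatrix} u_1 - E(u_1) & u_2 - E(u_2) & \cdots & u_n - E(u_n) \end{pmatrix} \,\Big|\, X\right]$$
$$= E\left[\begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}
\begin{pmatrix} u_1 & u_2 & \cdots & u_n \end{pmatrix} \,\Big|\, X\right]
= E\left[\begin{pmatrix} u_1^2 & u_1 u_2 & \cdots & u_1 u_n \\ u_2 u_1 & u_2^2 & \cdots & u_2 u_n \\ \vdots & \vdots & \ddots & \vdots \\ u_n u_1 & u_n u_2 & \cdots & u_n^2 \end{pmatrix} \,\Big|\, X\right]$$
$$= \begin{pmatrix} \sigma_u^2 & 0 & \cdots & 0 \\ 0 & \sigma_u^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_u^2 \end{pmatrix}
= \sigma_u^2 I_n$$
The second equality uses $E(u_i \mid X) = 0$; the last step uses the two assumptions stated in the question.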
13) Consider the following symmetric and idempotent matrix $A$:
$$A = I_n - \frac{1}{n}\iota\iota' \quad \text{and} \quad \iota = \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}$$
a. Show that by postmultiplying this matrix by the vector $Y$ (the LHS variable of the OLS regression),
you convert all observations of $Y$ into deviations from the mean.
b. Derive the expression $Y'AY$. What is the order of this expression? Under what other name have
you encountered this expression before?
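Answer: a. $\frac{1}{n}\iota'Y = \bar{Y}$. Given this result, if you premultiply $Y$ by $A$, you get
$$AY = \left(I - \frac{1}{n}\iota\iota'\right)Y = Y - \iota\bar{Y} = \begin{pmatrix} Y_1 - \bar{Y} \\ Y_2 - \bar{Y} \\ \vdots \\ Y_n - \bar{Y} \end{pmatrix}$$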
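For part b, a sketch of one possible derivation follows. It relies only on the symmetry and idempotency of $A$ stated in the question and on the result for $AY$ in part a; it is offered as a worked illustration rather than as the official answer.

```latex
\begin{aligned}
Y'AY &= Y'AAY && \text{(idempotency: } AA = A\text{)} \\
     &= Y'A'AY && \text{(symmetry: } A' = A\text{)} \\
     &= (AY)'(AY) \\
     &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 .
\end{aligned}
```

Since $Y'$ is $1 \times n$, $A$ is $n \times n$, and $Y$ is $n \times 1$, the product is of order $1 \times 1$, a scalar. The sum of squared deviations of $Y$ from its mean is usually called the total sum of squares (TSS).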
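where
$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad
X = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad
\beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad
U = \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{pmatrix}$$
Given the following information on population growth rates (Y) and education (X) for 86 countries:
$$\sum_{i=1}^{n} Y_i = 1.594, \quad \sum_{i=1}^{n} X_i = 449.6, \quad \sum_{i=1}^{n} Y_i^2 = 0.03982, \quad
\sum_{i=1}^{n} X_i^2 = 3{,}022.76, \quad \sum_{i=1}^{n} X_i Y_i = 6.4697$$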
a)
b)
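Answer: a.
$$X'X = \begin{pmatrix} n & \sum_{i=1}^{n} X_i \\ \sum_{i=1}^{n} X_i & \sum_{i=1}^{n} X_i^2 \end{pmatrix}
= \begin{pmatrix} 86 & 449.6 \\ 449.6 & 3022.76 \end{pmatrix}, \qquad
X'Y = \begin{pmatrix} \sum_{i=1}^{n} Y_i \\ \sum_{i=1}^{n} X_i Y_i \end{pmatrix}
= \begin{pmatrix} 1.594 \\ 6.4697 \end{pmatrix}$$
$$(X'X)^{-1} = \frac{1}{86 \times 3022.76 - 449.6^2}\begin{pmatrix} 3022.76 & -449.6 \\ -449.6 & 86 \end{pmatrix}$$
$$(X'X)^{-1}X'Y = \begin{pmatrix} 0.0331 \\ -0.0028 \end{pmatrix}$$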
b. According to these results, five more years of education will lower population growth rates by roughly
one percentage point ($5 \times (-0.0028) = -0.014$).
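The arithmetic in part a can be reproduced directly from the five sums given in the question. The sketch below applies the $2 \times 2$ inverse formula by hand in plain Python; the variable names are illustrative choices.

```python
# Sums given in the question (n = 86 countries).
n = 86
sum_y, sum_x = 1.594, 449.6
sum_x2, sum_xy = 3022.76, 6.4697

# Determinant of X'X = [[n, sum_x], [sum_x, sum_x2]].
det_xtx = n * sum_x2 - sum_x ** 2

# (X'X)^{-1} X'Y, written out element by element.
beta0 = (sum_x2 * sum_y - sum_x * sum_xy) / det_xtx   # intercept
beta1 = (n * sum_xy - sum_x * sum_y) / det_xtx        # slope on education

print(round(beta0, 4), round(beta1, 4))
print(round(5 * beta1, 3))  # implied change in Y for five more years of education
```

The intercept comes out close to 0.033 and the slope close to -0.0028, matching the answer above up to rounding.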
15) You have obtained data on test scores and student-teacher ratios in region A and region B of your state. Region
B, on average, has lower student-teacher ratios than region A. You decide to run the following regression
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
where $X_1$ is the class size in region A, $X_2$ is the difference in class size between regions A and B, and $X_3$
is the class size in region B. Your regression package shows a message indicating that it cannot estimate the
above equation. What is the problem here and how can it be fixed? Explain the problem in terms of the rank of
the $X$ matrix.
Answer: There is perfect multicollinearity here, in that $X_2 = X_1 - X_3$; hence the $X$ matrix (and the $X'X$
matrix) does not have full rank (rank = 3 here, not 4). If $X'X$ is singular, you cannot invert it, since its
determinant is zero. Dropping one of the three explanatory variables allows you to estimate the above equation.
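The rank deficiency can be made concrete with a small numerical illustration. In the sketch below, written in plain Python, the class sizes are invented for the example. Because $X_2 = X_1 - X_3$, the nonzero weight vector $c = (0, 1, -1, -1)'$ gives $Xc = 0$, and therefore $X'Xc = 0$, which is exactly what it means for $X'X$ to be singular.

```python
# Hypothetical class sizes for five observations.
x1 = [22, 25, 19, 30, 27]               # class size in region A
x3 = [18, 20, 17, 24, 21]               # class size in region B
x2 = [a - b for a, b in zip(x1, x3)]    # difference, so X2 = X1 - X3 exactly

# Columns of X: constant, X1, X2, X3.
X = [[1, a, d, b] for a, d, b in zip(x1, x2, x3)]

# Nonzero weights encoding the linear dependence X1 - X2 - X3 = 0.
c = [0, 1, -1, -1]

Xc = [sum(v * w for v, w in zip(row, c)) for row in X]
XtX = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]
XtXc = [sum(v * w for v, w in zip(row, c)) for row in XtX]

print(Xc)    # every entry is zero: the columns of X are linearly dependent
print(XtXc)  # every entry is zero: X'X maps a nonzero vector to zero
```

Dropping any one of the three class-size columns removes the dependence, which is the fix described in the answer.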