
BEE2006

UNIVERSITY OF EXETER

BUSINESS SCHOOL

May/June 2012

STATISTICS AND ECONOMETRICS

Module Convenors: Dr. Paulo M.D.C. Parente, Dr. Ana Fernandes

Duration: TWO HOURS

Answer ONLY ONE question from SECTION A, ONLY ONE question from SECTION B and BOTH questions from SECTION C. Use a separate answer booklet for each section.

Materials to be supplied: Statistical Tables

Instructions (please read before starting): Write in a clear, legible manner in ink/ballpoint. Do not use pencils or erasable pens. Approved calculators are permitted. Only one sheet (2 sides A4) of notes made exclusively by the student may be consulted (no material distributed by the teacher in any form is allowed). Whenever conducting a test, use a 5% significance level unless stated otherwise. Also be sure to state the null and alternative hypotheses, the null distribution (with degrees of freedom), the rejection criterion (critical values and rejection region) and the outcome. If you are asked to derive something, give all intermediate steps as well. Do not answer questions with a yes or no only, but carefully justify your answer.

Section A - Answer only one question


Question 1

Consider the following model to explain child birth weight in terms of various factors:

$$\mathit{bwght} = \beta_0 + \beta_1\,\mathit{cigs} + \beta_2\,\mathit{parity} + \beta_3\,\mathit{faminc} + \beta_4\,\mathit{motheduc} + \beta_5\,\mathit{fatheduc} + u,$$

where $u \sim N(0, \sigma^2)$ and the variables in the model are:

bwght = birth weight in pounds;
cigs = average number of cigarettes the mother smoked per day during pregnancy;
parity = the birth order of the child;
faminc = annual family income;
motheduc = years of schooling for the mother;
fatheduc = years of schooling for the father.

(a) (6 Marks) Does this regression model necessarily imply a causal relationship between the child's birth weight and the regressors cigs, parity, faminc, motheduc and fatheduc? Justify your answer.

(b) (5 Marks) Interpret $\beta_3$.

(c) (6 Marks) Using data from the US 1988 National Health Interview Survey, the following results were obtained:

$$\widehat{\mathit{bwght}} = \underset{(3.7285)}{114.524} - \underset{(0.1104)}{0.5959}\,\mathit{cigs} + \underset{(0.6594)}{1.7876}\,\mathit{parity} + \underset{(0.0366)}{0.0560}\,\mathit{faminc} - \underset{(0.3199)}{0.3705}\,\mathit{motheduc} + \underset{(0.2826)}{0.4724}\,\mathit{fatheduc}, \qquad (1)$$

$n = 1191$, $TSS = 482722.355$, $SSR = 464040.052$,

where TSS is the Total Sum of Squares, SSR is the Sum of Squared Residuals, and the standard errors of the estimated coefficients are reported in brackets. Test the individual significance of motheduc and fatheduc at the 10% level.

(d) (6 Marks) Test the significance of the overall regression.

(e) (6 Marks) Let $\tilde{u}$ denote the residual from regressing bwght on cigs, parity and faminc, and consider the following regression:

$$\tilde{u} = \underset{(3.7285)}{0.9456} - \underset{(0.1104)}{0.0019}\,\mathit{cigs} - \underset{(0.6594)}{0.0447}\,\mathit{parity} - \underset{(0.0366)}{0.011}\,\mathit{faminc} - \underset{(0.3199)}{0.3705}\,\mathit{motheduc} + \underset{(0.2826)}{0.4724}\,\mathit{fatheduc},$$

$R^2 = 0.0024$.

Are motheduc and fatheduc jointly significant?

(f) (6 Marks) The $R^2$ of the regression of the squared residuals of (1) on cigs, parity, faminc, motheduc and fatheduc and their respective squares is 0.0029. Test for heteroskedasticity.

Question 2

We are interested in investigating how the price of a house depends on the characteristics of the house in Boston, US. We consider the model

$$\log(\mathit{price}) = \beta_0 + \beta_1\,\mathit{sqrft} + \beta_2\,\mathit{bdrms} + u,$$

where $u \sim N(0, \sigma^2)$ and the variables in the model are:

price = house price, in thousands of dollars;
sqrft = size of house in square feet;
bdrms = number of bedrooms.

(a) (5 Marks) Interpret $\beta_2$.

(b) (6 Marks) Using data collected from the Boston Globe during 1990, the following results were obtained:

$$\widehat{\log(\mathit{price})} = \underset{(0.09704)}{4.76603} + \underset{(0.000040)}{0.00038}\,\mathit{sqrft} + \underset{(0.02964)}{0.02888}\,\mathit{bdrms},$$

$n = 88$, $R^2 = 0.5883$.

(Standard errors of the estimated coefficients are reported in brackets.) Test whether the size of the house in square feet has a significant positive effect on log(price).

(c) (6 Marks) Test the overall significance of the regression.

(d) (6 Marks) We are interested in estimating and obtaining a confidence interval for the percentage change in price when a 150-square-foot bedroom is added to a house. In decimal form, this is $\theta_1 = 150\beta_1 + \beta_2$. Estimate $\theta_1$ and construct a 95% confidence interval for $\theta_1$, given that the estimated covariance between the OLS estimators for $\beta_1$ and $\beta_2$ is 0.000000681.

(e) (6 Marks) We now include the square of bdrms in the regression model:

$$\widehat{\log(\mathit{price})} = \underset{(0.27108)}{5.07139} + \underset{(0.000040)}{0.00038}\,\mathit{sqrft} - \underset{(0.13573)}{0.13086}\,\mathit{bdrms} + \underset{(0.01657)}{0.01999}\,\mathit{bdrms}^2, \qquad (2)$$

$n = 88$, $R^2 = 0.5883$, $SSR = 3.2434$.

Test whether the number of bedrooms affects the price of the house, taking into account that the $R^2$ of the restricted model is 0.568.

(f) (6 Marks) Now we are interested in studying whether the regression model differs between colonial houses and non-colonial houses. The regression for non-colonial houses yields

$$\widehat{\log(\mathit{price})} = \underset{(0.63578)}{6.12642} + \underset{(0.00008)}{0.00033}\,\mathit{sqrft} - \underset{(0.37576)}{0.76368}\,\mathit{bdrms} + \underset{(0.05902)}{0.11269}\,\mathit{bdrms}^2,$$

$n = 27$, $R^2 = 0.6366$, $SSR = 0.94035$.

Running a regression for colonial houses we obtain

$$\widehat{\log(\mathit{price})} = \underset{(0.39637)}{4.7786} + \underset{(0.00005)}{0.0004}\,\mathit{sqrft} + \underset{(0.18493)}{0.01041}\,\mathit{bdrms} + \underset{(0.02126)}{0.00229}\,\mathit{bdrms}^2,$$

$n = 61$, $R^2 = 0.6090$, $SSR = 2.021$.

Test whether the regression function is identical for colonial and non-colonial houses.
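As an illustration of the arithmetic behind part (f), the following is a minimal sketch of a Chow-type F statistic. It rests on several assumptions that the paper does not state explicitly: regression (2) is taken to be the pooled regression over all 88 houses (so its SSR of 3.2434 plays the role of the restricted SSR), each regression has k = 4 parameters, and the error variance is the same in both groups. The function name `chow_f` is hypothetical.

```python
def chow_f(ssr_pooled, ssr_1, ssr_2, n, k):
    """Chow F statistic for equality of all k coefficients across two groups.

    ssr_pooled: SSR of the single regression on all n observations (restricted).
    ssr_1, ssr_2: SSRs of the separate group regressions (unrestricted).
    """
    ssr_unrestricted = ssr_1 + ssr_2
    df_num = k              # number of restrictions tested
    df_den = n - 2 * k      # degrees of freedom of the unrestricted model
    return ((ssr_pooled - ssr_unrestricted) / df_num) / (ssr_unrestricted / df_den)


# Figures quoted in the question; treating (2) as the pooled model is an assumption.
f_stat = chow_f(ssr_pooled=3.2434, ssr_1=0.94035, ssr_2=2.021, n=88, k=4)
print(round(f_stat, 3))
```

Under these assumptions the statistic would be compared with the 5% critical value of the F distribution with (4, 80) degrees of freedom from the supplied tables.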

Section B - Answer only one question


Question 1

(a) To study the effect of women's education (schooling) on fertility, we estimate model (3) below, where the dependent variable, kids, is the number of children born to women aged between 35 and 54 and educ denotes the years of schooling. We also include as regressors age and its squared term, agesq; a binary variable that takes the value of one if the individual is black and zero otherwise, black; a binary variable that takes the value of one if the individual lived in a rural area at the age of 16, othrural; and a binary variable taking the value of one if the individual lived in a small city at the age of sixteen and zero otherwise, smcity.

$$\mathit{kids} = \beta_0 + \beta_1\,\mathit{educ} + \beta_2\,\mathit{age} + \beta_3\,\mathit{age}^2 + \beta_4\,\mathit{black} + \beta_5\,\mathit{othrural} + \beta_6\,\mathit{smcity} + u \qquad (3)$$

One could argue that education, educ, is not an exogenous determinant of fertility. Women's education could be correlated with unobservable characteristics that are jointly determined with fertility. We have two instrumental variable candidates for education: the individual's father's years of education, feduc, and the individual's mother's years of education, meduc. We estimate a number of models, provided below, using OLS and Two Stage Least Squares (2SLS).

Model 1: OLS, using observations 1-1129
Dependent variable: kids

            Coefficient   Std. Error   t-ratio   p-value
constant    8.11296       3.06963      2.6430    0.0083
educ        0.134841      0.0181137    7.4442    0.0000
age         0.551360      0.139837     3.9429    0.0001
agesq       0.00596589    0.00158168   3.7719    0.0002
black       0.862121      0.168723     5.1097    0.0000
othrural    0.207259      0.158015     1.3116    0.1899
smcity      0.186718      0.143372     1.3023    0.1931

R2            0.092255    Adjusted R2    0.087400
F(6, 1122)    19.00493    P-value(F)     3.65e-21

Model 2: OLS, using observations 1-1129
Dependent variable: educ

            Coefficient   Std. Error   t-ratio   p-value
constant    14.0525       4.36629      3.2184    0.0013
age         0.237050      0.199523     1.1881    0.2351
agesq       0.00267332    0.00225641   1.1848    0.2364
black       0.431187      0.242351     1.7792    0.0755
othrural    0.463964      0.225901     2.0538    0.0402
smcity      0.186039      0.204700     0.9088    0.3636
meduc       0.182272      0.0219009    8.3226    0.0000
feduc       0.218522      0.0251017    8.7055    0.0000

R2            0.275637    Adjusted R2    0.271114
F(7, 1121)    60.93821    P-value(F)     2.91e-74

Model 3: OLS, using observations 1-1129
Dependent variable: kids

                    Coefficient   Std. Error   t-ratio   p-value
constant            7.63497       3.15309      2.4214    0.0156
educ                0.155673      0.0361375    4.3078    0.0000
age                 0.542543      0.140496     3.8616    0.0001
agesq               0.00587699    0.00158769   3.7016    0.0002
black               0.859666      0.168805     5.0927    0.0000
othrural            0.229508      0.161544     1.4207    0.1557
smcity              0.195752      0.144047     1.3589    0.1744
Model 2 Residuals   0.0278269     0.0417661    0.6663    0.5054

R2            0.092614    Adjusted R2    0.086948
F(7, 1121)    16.34528    P-value(F)     1.35e-20

Model 4: 2SLS, using observations 1-1129
Dependent variable: kids
Instrumented: educ
Instruments: constant age agesq black othrural smcity meduc feduc

            Coefficient   Std. Error   z         p-value
constant    7.63497       3.15417      2.4206    0.0155
educ        0.155673      0.0361498    4.3063    0.0000
age         0.542543      0.140544     3.8603    0.0001
agesq       0.00587699    0.00158824   3.7003    0.0002
black       0.859666      0.168863     5.0909    0.0000
othrural    0.229508      0.161599     1.4202    0.1555
smcity      0.195752      0.144096     1.3585    0.1743

R2            0.091781    Adjusted R2    0.086924
F(6, 1122)    12.84828    P-value(F)     4.45e-14

Sargan over-identification test
Null hypothesis: all instruments are valid
Test statistic for over-identification: LM = 0.0582575
with p-value = 0.809272

(i) (5 Marks) Specify the equation for educ and explain why the parameters of that equation can be estimated by OLS.

(ii) (6 Marks) Use the relevant output from above to test for instrumental variable relevance and assess whether meduc and feduc are suitable instruments for educ.

(iii) (6 Marks) What do you conclude regarding the result of Sargan's over-identification test (provided at the end of the output for Model 4)?

(iv) (6 Marks) Using the relevant output from above, conduct Hausman's endogeneity test. Provide the null hypothesis, the alternative hypothesis and the numerical value of the test statistic. What do you conclude regarding the endogeneity of educ?

(v) (6 Marks) Since there is no heteroskedasticity, the usual standard errors are reported in all estimated models. Taking this into consideration, and given your decision regarding Hausman's endogeneity test, which is your preferred estimate of the parameter $\beta_1$? Why?

(b) (6 Marks) Consider a simple model to estimate the effect of computer ownership on the average mark of graduating students at a large UK university:

$$\mathit{MARK} = \beta_0 + \beta_1\,\mathit{PC} + u.$$

Is it reasonable to assume that PC ownership is likely to be uncorrelated with $u$? Explain.

Question 2

(a) (5 Marks) Consider the multiple regression model:

$$y_t = \beta_0 + \beta_1 x_{t1} + \dots + \beta_k x_{tk} + u_t.$$

Assume that the explanatory variables, $x_{tj}$, are strictly exogenous. Further, $u_t$ follows an AR($q$) process:

$$u_t = \rho_1 u_{t-1} + \rho_2 u_{t-2} + \dots + \rho_q u_{t-q} + e_t.$$

Explain how you would test for serial correlation.

(b) (6 Marks) Specify and explain the meaning of the contemporaneous exogeneity assumption for explanatory variables in time series analysis.

(c) Consider the following partial adjustment model:

$$y_t^* = \gamma_0 + \gamma_1 x_t + e_t,$$
$$y_t - y_{t-1} = \lambda\,(y_t^* - y_{t-1}) + a_t, \qquad 0 < \lambda < 1,$$

where $y_t^*$ is the desired growth in firm inventories and $y_t$ is the actual (observed) growth. $x_t$ represents the growth in firm sales. The parameter $\gamma_1$ measures the effect of $x_t$ on $y_t^*$.

(i) (6 Marks) Explain what the second equation describes and how you would interpret the parameter $\lambda$.

(ii) (6 Marks) Show that we can write:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + u_t.$$

In particular, provide the expressions for the $\beta$s in terms of the $\gamma$s and $\lambda$, and find $u_t$ in terms of $e_t$ and $a_t$.

(iii) (6 Marks) If $E(e_t \mid x_t, y_{t-1}, x_{t-1}, \dots) = 0$ and $E(a_t \mid x_t, y_{t-1}, x_{t-1}, \dots) = 0$, and all series are weakly dependent, how would you estimate the $\beta$s in the model of part (ii) above? Explain.

(iv) (6 Marks) If $\hat{\beta}_1 = 0.7$ and $\hat{\beta}_2 = 0.2$, what are the estimates of $\gamma_1$ and $\lambda$?
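The mapping asked about in parts (ii) and (iv) can be sketched numerically. Substituting the first equation of the partial adjustment model into the second gives $y_t = \lambda\gamma_0 + (1-\lambda)\,y_{t-1} + \lambda\gamma_1 x_t + \lambda e_t + a_t$, so that $\beta_1 = 1 - \lambda$ and $\beta_2 = \lambda\gamma_1$. The short Python sketch below inverts that mapping; the function name `recover_structural` is hypothetical and the code assumes $\lambda \neq 0$.

```python
def recover_structural(beta1_hat, beta2_hat):
    """Invert beta1 = 1 - lambda and beta2 = lambda * gamma1.

    Returns (gamma1_hat, lambda_hat). Assumes lambda is non-zero,
    i.e. beta1_hat != 1, which holds when 0 < lambda < 1.
    """
    lam = 1.0 - beta1_hat
    gamma1 = beta2_hat / lam
    return gamma1, lam


# Values given in part (iv).
gamma1_hat, lam_hat = recover_structural(0.7, 0.2)
print(round(gamma1_hat, 4), round(lam_hat, 4))
```

The same substitution shows that the composite error is $u_t = \lambda e_t + a_t$ and the intercept is $\beta_0 = \lambda\gamma_0$.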

Section C - Answer both questions


Question 1

Are the following statements correct? (Justify your answers carefully.)

(a) (5 Marks) From asymptotic theory we learn that, under appropriate conditions, the error terms in a regression model will be approximately normally distributed if the sample size is sufficiently large.

(b) (5 Marks) In a random sample under the assumption of homoskedasticity, the generalised least squares estimator and the ordinary least squares estimator are identical.

(c) (5 Marks) Suppose that we want to estimate the effect of several variables on annual saving and that we have a panel data set on individuals collected on January 20, 2000, and January 20, 2002. If we include a year dummy for 2002 and use first differencing, we can also include age in the original model.

(d) (5 Marks) We can use first differences when we have independent cross sections in two years.

Question 2

(10 Marks) Consider the linear regression model

$$y_i = \beta x_i + u_i, \qquad i = 1, \dots, n,$$
$$E(u_i \mid x_i) = 0, \qquad \mathrm{var}(u_i \mid x_i) = \sigma^2,$$

where the observations $\{(y_i, x_i),\ i = 1, \dots, n\}$ are independent. Let

$$b = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}.$$

Show that

$$\mathrm{var}(b \mid x_1, \dots, x_n) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2},$$

justifying all the steps of the derivation.
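A numerical sanity check of the variance formula in Question 2 can be sketched with a small Monte Carlo simulation. This is not a substitute for the derivation the question asks for. The regressor values, the true slope, the error standard deviation and the use of normal errors are all assumptions chosen purely for illustration, and `simulate_var_b` is a hypothetical helper.

```python
import random


def simulate_var_b(x, beta, sigma, reps, seed=0):
    """Sample variance of b = sum(y_i x_i) / sum(x_i^2) over repeated draws of u.

    The regressors x are held fixed across replications, so the result
    approximates var(b | x_1, ..., x_n).
    """
    rng = random.Random(seed)
    sxx = sum(xi * xi for xi in x)
    draws = []
    for _ in range(reps):
        # Generate y_i = beta * x_i + u_i with independent N(0, sigma^2) errors.
        y = [beta * xi + rng.gauss(0.0, sigma) for xi in x]
        draws.append(sum(yi * xi for yi, xi in zip(y, x)) / sxx)
    mean_b = sum(draws) / reps
    return sum((b - mean_b) ** 2 for b in draws) / (reps - 1)


x = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative fixed regressors
sigma = 2.0                      # illustrative error standard deviation
simulated = simulate_var_b(x, beta=1.5, sigma=sigma, reps=20000)
theoretical = sigma ** 2 / sum(xi * xi for xi in x)
print(simulated, theoretical)
```

With a large number of replications the two printed values should be close, which is what the formula predicts when the errors are independent and homoskedastic.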

[end of paper]
