
Final test Wednesday, 11 May 2011

Practical Statistics I, 2010/11

(This page has been made available in advance; please read it carefully.)

This paper consists of two parts: a data part and a questions part. The pages of the first (data) part are numbered with roman numerals; this part contains a description of the data and other background information. The questions are in the second part, whose pages are numbered with arabic numerals. Answer as many questions as you can, using the spaces provided on the question sheets. If you run out of room for an answer, continue on the back of the respective page. When you have finished, or when I announce the end of the test, submit your R work as described in the last question. The assessment will be based mainly on your written submission. The electronic submission will be used only for verification and possibly for bonus marks, but failure to submit it within a few minutes after the end of the test may be penalised.

Before the test starts

Please come early to have sufficient time to log in. Start R, remove all objects from the workspace using the menu item Misc->Remove all objects or the command rm(list=ls(all=TRUE)), change the working directory to a folder on your p:-drive, and load the package psistudent. If the package psistudent does not load successfully at the first attempt, quit R and try again after a couple of minutes. Once you have done the above, you should not leave R until you have finished. For some exercises you may need additional package(s); you may load them at any time during the test. You must write your name (legibly) before you start answering any questions.

When the test ends

When I announce the end of the test, you should stop working immediately, tear off the last page of the paper, close your script (ready to be collected) and follow the instructions in the last question. Any attempt to continue working will be classed as cheating. Failure to hand in your script as soon as you are asked will be penalised with a mark of zero for the test.

Important notes

Mobile phones should be switched off during the test. Communication (by any means) with other people is strictly forbidden. In particular, you must not log into any online accounts (email, Facebook, etc.) or send any information anywhere. Opening the web pages of such sites will be deemed attempted cheating. Attempts at communication and other forms of cheating will be penalised with a zero mark for the test. Further disciplinary action may be taken if deemed appropriate. The test is open book in a very general sense: you may use any materials you need as long as you do not violate the above restrictions. Do not ask for help during the test; you will not get any. If you get stuck, continue with another question.


Final test Data part


Graphics: you are not required to draw graphs on paper (but you may if you wish). Produce the plots on the computer and write down only the appropriate comments. Please do not ask questions like "What shall I write here?". You are expected to give competent answers to questions; figuring out what is needed is an essential part of the assessment. Similarly, you should be able to determine whether a question requires computing or analytical work (or both). If you are not sure whether something is allowed, you should assume that it is not.


1. The volume (i.e., the effective wood production in cubic meters), height (in meters), and diameter (in meters, measured at 1.37 m above the ground) are recorded for 31 black cherry trees in the Allegheny National Forest in Pennsylvania. They were collected to find an estimate for the volume of a tree (and therefore for the timber yield), given its height and diameter. For each tree the volume $y$ and the value of $x = d^2 h$ are recorded, where $d$ and $h$ are the diameter and height of the tree. The data are printed below and are available in the usual way in the file cherrytrees.txt.

Tree  Diameter (m)  Height (m)  Volume (m³)
   1          0.21       21.30         0.29
   2          0.22       19.80         0.29
   3          0.22       19.20         0.29
   4          0.27       21.90         0.46
   5          0.27       24.70         0.53
   6          0.27       25.30         0.56
   7          0.28       20.10         0.44
   8          0.28       22.90         0.52
   9          0.28       24.40         0.64
  10          0.28       22.90         0.56
  11          0.29       24.10         0.69
  12          0.29       23.20         0.59
  13          0.29       23.20         0.61
  14          0.30       21.00         0.60
  15          0.30       22.90         0.54
  16          0.33       22.60         0.63
  17          0.33       25.90         0.96
  18          0.34       26.20         0.78
  19          0.35       21.60         0.73
  20          0.35       19.50         0.71
  21          0.36       23.80         0.98
  22          0.36       24.40         0.90
  23          0.37       22.60         1.03
  24          0.41       21.90         1.08
  25          0.41       23.50         1.21
  26          0.44       24.70         1.57
  27          0.44       25.00         1.58
  28          0.45       24.40         1.65
  29          0.46       24.40         1.46
  30          0.46       24.40         1.44
  31          0.52       26.50         2.18
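For working without the data file, the printed table can be typed into R directly. This is a minimal sketch: the layout of cherrytrees.txt is not described here, so the commented-out read.table() call assumes a header line and whitespace-separated columns, and the column names are chosen for illustration.

```r
# Cherry tree data transcribed from the printed table
# (diameter and height in meters, volume in cubic meters).
# If cherrytrees.txt is at hand, a call such as
#   cherry <- read.table("cherrytrees.txt", header = TRUE)
# may replace the manual entry (assumed layout: header line, whitespace-separated).
diameter <- c(0.21, 0.22, 0.22, 0.27, 0.27, 0.27, 0.28, 0.28, 0.28, 0.28, 0.29,
              0.29, 0.29, 0.30, 0.30, 0.33, 0.33, 0.34, 0.35, 0.35, 0.36, 0.36,
              0.37, 0.41, 0.41, 0.44, 0.44, 0.45, 0.46, 0.46, 0.52)
height <- c(21.3, 19.8, 19.2, 21.9, 24.7, 25.3, 20.1, 22.9, 24.4, 22.9, 24.1,
            23.2, 23.2, 21.0, 22.9, 22.6, 25.9, 26.2, 21.6, 19.5, 23.8, 24.4,
            22.6, 21.9, 23.5, 24.7, 25.0, 24.4, 24.4, 24.4, 26.5)
volume <- c(0.29, 0.29, 0.29, 0.46, 0.53, 0.56, 0.44, 0.52, 0.64, 0.56, 0.69,
            0.59, 0.61, 0.60, 0.54, 0.63, 0.96, 0.78, 0.73, 0.71, 0.98, 0.90,
            1.03, 1.08, 1.21, 1.57, 1.58, 1.65, 1.46, 1.44, 2.18)
cherry <- data.frame(Diameter = diameter, Height = height, Volume = volume)
cherry$x <- cherry$Diameter^2 * cherry$Height   # predictor x = d^2 h
head(cherry)
```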


Final test Questions part Your Name:

Practical Statistics I, 2010/11 Wednesday, 11 May 2011

1. (30 marks)

(a) Create a vector x1 containing a random sample of length 100 from $t_3$, Student's t distribution with 3 degrees of freedom. Give the sample mean, variance, standard deviation, lower quartile, median, and upper quartile (alternatively, you may give the command(s) used to calculate them).
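A minimal sketch of part (a). The seed is an arbitrary choice that only makes the run reproducible; note that quantile() uses R's default definition (type 7), and other quartile definitions can give slightly different values.

```r
set.seed(2011)                  # arbitrary seed, only for reproducibility
x1 <- rt(100, df = 3)           # random sample of size 100 from t_3
c(mean = mean(x1), variance = var(x1), sd = sd(x1))
quantile(x1, probs = c(0.25, 0.50, 0.75))   # lower quartile, median, upper quartile
```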

(b) Pretending that the distribution of x1 is unknown, do the following:

i. Perform a t-test of the hypothesis that the mean of the population distribution is 0.2 versus a two-sided alternative. Write down the null and alternative hypotheses and report the results of the test.
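With $H_0: \mu = 0.2$ against $H_1: \mu \ne 0.2$, the one-sample t-test is available as t.test(). A sketch, regenerating x1 with the same arbitrary seed so that it runs on its own:

```r
set.seed(2011)                  # arbitrary seed
x1 <- rt(100, df = 3)
# H0: mu = 0.2 versus H1: mu != 0.2 (two-sided is also the default alternative)
tt <- t.test(x1, mu = 0.2, alternative = "two.sided")
tt   # t statistic, degrees of freedom, p-value and a 95% confidence interval for mu
```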

ii. Produce a normal qq-plot of x1. Does it support a hypothesis that x1 is a sample from a normal distribution? Explain with a single sentence or phrase.
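A sketch of the normal Q-Q plot (same arbitrary seed). For a heavy-tailed sample such as one from $t_3$, the points typically bend below the reference line on the left and above it on the right.

```r
set.seed(2011)                  # arbitrary seed
x1 <- rt(100, df = 3)
qqnorm(x1)                      # sample quantiles against standard normal quantiles
qqline(x1)                      # reference line through the first and third quartiles
```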

iii. Produce a qq-plot to check whether x1 is a sample from Student's t distribution with 3 degrees of freedom. Explain your conclusion.
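qqplot() compares two sets of quantiles, so one option is to plot the sample against $t_3$ quantiles evaluated at the plotting positions ppoints(n). A sketch (same arbitrary seed); points lying close to the line $y = x$ support the $t_3$ hypothesis.

```r
set.seed(2011)                  # arbitrary seed
x1 <- rt(100, df = 3)
# Theoretical t_3 quantiles at the standard plotting positions
theo <- qt(ppoints(length(x1)), df = 3)
qqplot(theo, x1, xlab = "t_3 quantiles", ylab = "Sample quantiles")
abline(0, 1)                    # line y = x
```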

iv. What test would you use to test the null hypothesis that the sample is from the $t_3$ distribution? Explain your choice of test. Perform the test and report its results.
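Because the null distribution $t_3$ is fully specified (no parameters are estimated from the data), the one-sample Kolmogorov-Smirnov test is a natural candidate. In R the extra argument df = 3 is passed on to pt(). A sketch:

```r
set.seed(2011)                  # arbitrary seed
x1 <- rt(100, df = 3)
# One-sample KS test of H0: x1 comes from the t_3 distribution
ks <- ks.test(x1, "pt", df = 3)
ks                              # statistic D and p-value
```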


2. (40 marks) Item 1 in the data part of this paper describes data collected for the purpose of predicting timber yield and gives some background information.

(a) It was decided to fit the following model:

$$Y = \beta x + U, \qquad (1)$$

where $x = d^2 h$.

i. The predictor, $x$, is a non-linear function of two observed variables. Is the model a simple linear regression model? Explain.

ii. What physical reasons justify a linear relationship between $y$ and $d^2 h$ and the omission of the intercept from the model?

iii. Fit the model specified by equation (1). Report the estimates of the parameter and of the standard deviation of the error term $U$. Write down the equation of the fitted line.

iv. Plot the residuals versus the predictor. What does this plot tell you about the quality of the fit?
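Model (1) has no intercept, which R's formula syntax expresses with `- 1`. The sketch below re-enters the tabulated data so that it runs on its own, fits the model, and draws the residual plot; the residual standard error reported by summary() is used as the estimate of the standard deviation of $U$. Variable names are chosen for illustration.

```r
diameter <- c(0.21, 0.22, 0.22, 0.27, 0.27, 0.27, 0.28, 0.28, 0.28, 0.28, 0.29,
              0.29, 0.29, 0.30, 0.30, 0.33, 0.33, 0.34, 0.35, 0.35, 0.36, 0.36,
              0.37, 0.41, 0.41, 0.44, 0.44, 0.45, 0.46, 0.46, 0.52)
height <- c(21.3, 19.8, 19.2, 21.9, 24.7, 25.3, 20.1, 22.9, 24.4, 22.9, 24.1,
            23.2, 23.2, 21.0, 22.9, 22.6, 25.9, 26.2, 21.6, 19.5, 23.8, 24.4,
            22.6, 21.9, 23.5, 24.7, 25.0, 24.4, 24.4, 24.4, 26.5)
volume <- c(0.29, 0.29, 0.29, 0.46, 0.53, 0.56, 0.44, 0.52, 0.64, 0.56, 0.69,
            0.59, 0.61, 0.60, 0.54, 0.63, 0.96, 0.78, 0.73, 0.71, 0.98, 0.90,
            1.03, 1.08, 1.21, 1.57, 1.58, 1.65, 1.46, 1.44, 2.18)
x <- diameter^2 * height            # predictor x = d^2 h
fit1 <- lm(volume ~ x - 1)          # model (1): regression through the origin
beta_hat <- coef(fit1)[["x"]]       # estimated slope
sigma_hat <- summary(fit1)$sigma    # residual standard error, estimates sd(U)
cat(sprintf("fitted line: y = %.4f x   (sigma = %.4f)\n", beta_hat, sigma_hat))
plot(x, resid(fit1), xlab = "x = d^2 h", ylab = "Residuals")
abline(h = 0, lty = 2)              # residuals should scatter evenly around 0
```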


(b) In order to see whether it would be better to include an intercept in the model, the following model is considered:

$$Y = \alpha + \beta x + \varepsilon, \qquad (2)$$

where $x$ is as in (1).

i. Fit model (2) and write down the equation of the fitted model.

ii. Test whether the intercept is 0.

iii. What proportion of the variability of the volume is explained by the model?

iv. Is there statistical evidence to suggest dropping the intercept from the model? Explain.

v. Plot the residuals versus the predictor. What does this plot tell you about the quality of the fit? Is there any improvement in comparison with the model without intercept?

vi. A logger fells a cherry tree with diameter 50 cm and height 20 m. Compute a 95% prediction interval for the volume of lumber obtainable from this tree.
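Parts (b) i to vi can be sketched in one pass: the coefficient table from summary() contains the t-test of a zero intercept, r.squared gives the proportion of explained variability, and predict() with interval = "prediction" gives the interval for a new tree. The data are re-entered so the sketch runs on its own; the new tree has $d = 0.5$ m and $h = 20$ m, so $x = 0.5^2 \times 20 = 5$.

```r
diameter <- c(0.21, 0.22, 0.22, 0.27, 0.27, 0.27, 0.28, 0.28, 0.28, 0.28, 0.29,
              0.29, 0.29, 0.30, 0.30, 0.33, 0.33, 0.34, 0.35, 0.35, 0.36, 0.36,
              0.37, 0.41, 0.41, 0.44, 0.44, 0.45, 0.46, 0.46, 0.52)
height <- c(21.3, 19.8, 19.2, 21.9, 24.7, 25.3, 20.1, 22.9, 24.4, 22.9, 24.1,
            23.2, 23.2, 21.0, 22.9, 22.6, 25.9, 26.2, 21.6, 19.5, 23.8, 24.4,
            22.6, 21.9, 23.5, 24.7, 25.0, 24.4, 24.4, 24.4, 26.5)
volume <- c(0.29, 0.29, 0.29, 0.46, 0.53, 0.56, 0.44, 0.52, 0.64, 0.56, 0.69,
            0.59, 0.61, 0.60, 0.54, 0.63, 0.96, 0.78, 0.73, 0.71, 0.98, 0.90,
            1.03, 1.08, 1.21, 1.57, 1.58, 1.65, 1.46, 1.44, 2.18)
x <- diameter^2 * height                 # predictor x = d^2 h
fit2 <- lm(volume ~ x)                   # model (2): intercept and slope
summary(fit2)                            # "(Intercept)" row: t-test of a zero intercept
r2 <- summary(fit2)$r.squared            # proportion of variability explained
plot(x, resid(fit2), xlab = "x = d^2 h", ylab = "Residuals")
abline(h = 0, lty = 2)
# 95% prediction interval for a tree with d = 0.5 m and h = 20 m
new_tree <- data.frame(x = 0.5^2 * 20)
pi95 <- predict(fit2, newdata = new_tree, interval = "prediction", level = 0.95)
pi95                                     # columns: fit, lwr, upr
```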


3. (30 marks) Let $p$ be the area of the region $S$ bounded by the x-axis ($y = 0$), the vertical lines $x = 0$ and $x = 1$, and the curve $y = e^{x-1}$. Notice that $S$ is part of the unit square whose vertices are at (0, 0), (0, 1), (1, 1), and (1, 0). Do the following using $n = 100$.

(a) Generate a random sample, $u_1, \ldots, u_n$, of size $n$ from the uniform distribution on the unit square. Store the sample in an $n \times 2$ matrix us, where the $i$-th row of us contains $u_i$ (i.e. the x-coordinate of $u_i$ is in us[i,1] and the y-coordinate of $u_i$ is in us[i,2]). Write down $u_1$ and $u_{50}$.
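Independent U(0,1) coordinates give points uniformly distributed on the unit square. A minimal sketch; the seed is arbitrary and only makes the run reproducible.

```r
set.seed(2011)                  # arbitrary seed
n <- 100
# matrix() fills column by column: the first n draws become the x-coordinates,
# the next n draws the y-coordinates
us <- matrix(runif(2 * n), nrow = n, ncol = 2)
us[1, ]                         # u_1
us[50, ]                        # u_50
```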

(b) Create a vector v such that v[i] is equal to 1 or 0 depending on whether or not the point $u_i$ is under the curve $y = e^{x-1}$.

(c) Using u and/or v, estimate the area of the region S.
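Parts (b) and (c) follow directly. Since the square has area 1, the proportion of points falling under the curve (a hit-or-miss estimate) estimates $p$. A self-contained sketch that regenerates the sample:

```r
set.seed(2011)                  # arbitrary seed
n <- 100
us <- matrix(runif(2 * n), nrow = n, ncol = 2)
# v[i] = 1 if u_i lies under the curve y = exp(x - 1), 0 otherwise
v <- as.numeric(us[, 2] < exp(us[, 1] - 1))
p_hat <- mean(v)                # proportion of hits estimates the area of S
p_hat
```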

(d) Calculate the sample variance and sample standard deviation of v.

(e) Using (c) and (d), calculate a 95% confidence interval for $p$ using a large-sample approximation by the normal distribution.
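Because v is a 0/1 vector, its sample mean is the estimate of $p$ and $s/\sqrt{n}$ (with $s$ the sample standard deviation of v) estimates the standard error, so the large-sample interval is $\hat p \pm z_{0.975}\, s/\sqrt{n}$. A sketch under the same assumptions:

```r
set.seed(2011)                  # arbitrary seed
n <- 100
us <- matrix(runif(2 * n), nrow = n, ncol = 2)
v <- as.numeric(us[, 2] < exp(us[, 1] - 1))
p_hat <- mean(v)
s2 <- var(v)                    # sample variance of v
s <- sd(v)                      # sample standard deviation of v
z <- qnorm(0.975)               # about 1.96
ci_p <- p_hat + c(-1, 1) * z * s / sqrt(n)   # normal-approximation 95% CI for p
ci_p
```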


(f) Determine the value of $p$ analytically (it is a function of the number $e$, the base of the natural logarithm). Using this and the confidence interval for $p$ obtained before, obtain a confidence interval for $e$. Repeat the simulation with a large value of $n$ and report the number of correct digits of your estimate of $e$.
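Integrating gives $p = \int_0^1 e^{x-1}\,dx = 1 - e^{-1}$, so $e = 1/(1 - p)$. Since this map is increasing in $p$, applying it to the endpoints of the interval for $p$ gives an interval for $e$. The sketch below wraps the simulation in a helper mc_e() (a name chosen here for illustration) so it can be rerun with a large $n$; mc_e(100) reproduces the $n = 100$ case. The Monte Carlo error decreases only like $1/\sqrt{n}$, so each additional correct digit needs roughly 100 times more points.

```r
e_from_p <- function(p) 1 / (1 - p)   # inverts p = 1 - exp(-1); increasing in p

mc_e <- function(n) {
  us <- matrix(runif(2 * n), nrow = n, ncol = 2)
  v <- as.numeric(us[, 2] < exp(us[, 1] - 1))
  p_hat <- mean(v)
  half <- qnorm(0.975) * sd(v) / sqrt(n)
  # Mapping the endpoints of the CI for p through e_from_p gives a CI for e
  list(e_hat = e_from_p(p_hat),
       ci_e = e_from_p(c(p_hat - half, p_hat + half)))
}

set.seed(2011)                  # arbitrary seed
res <- mc_e(1e6)
res$e_hat
res$ci_e
abs(res$e_hat - exp(1))         # absolute error against the true value of e
```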


4. (20 marks (bonus)) Write an R function myks (or give an algorithm) that generates a random sample from the Kolmogorov-Smirnov distribution with $m$ degrees of freedom (this is the distribution of the statistic $D_m$; see e.g. Notes, p. 35-). For example, myks(100,10) should produce a random sample of length 100 from the Kolmogorov-Smirnov distribution with 10 degrees of freedom.
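A simulation sketch, assuming $D_m$ denotes the two-sided one-sample statistic $\sup_x |F_m(x) - F(x)|$ for a sample of size $m$ from a continuous distribution $F$. Under that assumption the statistic is distribution-free, so it suffices to simulate from U(0,1), where the supremum is attained at the jump points of the empirical distribution function.

```r
# myks(n, m): n independent draws of D_m, the one-sample KS statistic for sample size m.
# Assumes D_m = sup_x |F_m(x) - F(x)| with F continuous; simulated with F = U(0,1).
myks <- function(n, m) {
  replicate(n, {
    u <- sort(runif(m))            # order statistics of a U(0,1) sample
    i <- seq_len(m)
    # Compare each u_(i) with the empirical cdf just after (i/m) and just before ((i-1)/m) the jump
    max(i / m - u, u - (i - 1) / m)
  })
}

set.seed(2011)                     # arbitrary seed
d <- myks(100, 10)
summary(d)
```

An equivalent but slower alternative is replicate(n, ks.test(runif(m), "punif")$statistic), which computes the same statistic through the built-in test.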


5. (0 marks) This question does not carry marks on its own, but it is compulsory. Save the R workspace and the command history file when quitting R. Email me the files .RData and .Rhistory as attachments to a single email. The subject line of the email must be "psi test". Multiple emails, emails with wrong subject lines, emails from non-University accounts, and late emails may be penalised. Do not manipulate these files after the end of the exam; this is cheating and will be penalised accordingly. Please remain seated until all scripts are collected and I tell you that you may leave.


End of paper
