
STA 215 Test #2 Solutions

Last Name: ________tions_________ First Name:________Solu ______ Student #:_______________________

TA (circle one):    Eman        Thomas      Wael        Sudipta     Narges

Tutorial (circle):  Wed 11-12   Mon 4-5     Tues 9-10   Mon 4-5     Wed 9-10
                    Wed 12-1    Tues 12-1   Tues 10-11  Mon 5-6     Wed 10-11
                    Wed 1-2     Tues 1-2
                    Wed 5-6     Wed 4-5
Time allowed: 50 minutes.

Aids: non-programmable calculator

Check that you have all the consecutively numbered pages of this test. Please give all probabilities
and proportions to four decimal places unless they are unnecessary zeroes.

Best marks go to best answers, as a general rule, particularly where some explanation is requested, so
try to be complete but also clear and concise; a lot of nonsense can decrease your grade.
Show your work and answer in the space provided (or indicate clearly where to look), and in
ink. Pencil may be used, but then re-marks will not be allowed.

Marks are shown in brackets at the end of the question parts, and are distributed as follows:

Question   1    2    3    4    Total
Max        5    15   10   10   40
Grade

Good luck!


1) Below is some R Commander output from a regression analysis performed on some human
subjects. Each person's heat output during a particular exercise was plotted against their mass (in
kg), as shown below.
[5]

Call:
lm(formula = Heat ~ Mass, data = Dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-19.964 -10.898   1.169   7.530  29.036

Coefficients:
            Estimate Std. Error t value
(Intercept) 129.8182     7.3561   17.65
Mass          3.8574     0.1924   20.05

Residual standard error: 14.81 on 21 DF
R-squared: 0.9503
F-statistic: 401.8 on 1 and 21 DF
p-value: 3.587e-15

[Scatterplot: Heat Output (vertical axis, ticks 200 to 350) against Mass (horizontal axis, ticks 20 to 50)]

a) Give the equation of best fit obtained from the software. (1)

Predicted Heat = 129.8182 + 3.8574 × Mass

b) Something was minimized to obtain this equation. Explain what was minimized in nonstatistical language, as though you are speaking to a first-year science student.

The sum of the squared vertical distances between the data points and the fitted line. (1)
Using the fitted model from the R output,
c) Predict the heat output of a 35 kg person.

Predicted Heat = 129.8182 + 3.8574(35) = 264.8272. (1)

d) Predict the body mass of a person with a heat output of 250.

This cannot be done with this model (it predicts heat output from mass, not mass from heat output). (1)

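e) My mass is roughly 90 kg. What heat output would you expect me to have?

We should not make this prediction: 90 kg is far outside the range of masses in the data (extrapolation). (1)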
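As a hedged illustration of parts c) and e), the fitted line can be evaluated in a few lines of Python. The coefficients are copied from the R output; the function name `predict_heat` and the 20 to 50 kg range (read off the tick marks of the mass axis) are assumptions made for this sketch, not part of the original solutions.

```python
INTERCEPT = 129.8182        # (Intercept) estimate from the R output
SLOPE = 3.8574              # Mass estimate from the R output
MASS_RANGE = (20.0, 50.0)   # assumed: approximate range of the mass axis


def predict_heat(mass_kg):
    """Return (predicted heat, whether mass lies inside the assumed data range)."""
    predicted = INTERCEPT + SLOPE * mass_kg
    in_range = MASS_RANGE[0] <= mass_kg <= MASS_RANGE[1]
    return predicted, in_range


for mass in (35, 90):
    heat, ok = predict_heat(mass)
    note = "" if ok else " (extrapolation: outside the observed masses)"
    print(f"mass = {mass} kg -> predicted heat = {heat:.4f}{note}")
```

The value printed for 90 kg only shows that the line can be evaluated there; because it lies well outside the assumed range, it should not be read as a trustworthy prediction.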

2) Below is some output from a regression analysis performed on a dataset containing the age and
systolic blood pressure measurement for 30 patients. These patients were a random sample from
all of the patients at a medical clinic in Toronto.
[15]
The regression equation is
blood_pressure = (A) + 0.971 (B)

Predictor   Coef    SE Coef   T      P
Constant    98.71   10.00     9.87   0.000
age         0.971   0.2102    4.62   0.000

S = 17.3137   R-Sq = 43.2%

Analysis of Variance
Source           DF   SS        MS       F
Regression       1    6394.0    6394.0   21.33
Residual Error   28   8393.4    299.8
Total            29   14787.5

Unusual Observations
Obs   age    blood_pressure
2     47.0   220.00

[Figure: Scatterplot of blood_pressure vs age; blood_pressure axis ticks 100 to 220, age axis ticks 10 to 70]

a) Two items have been replaced with letters. Fill in what they should be.

(A) - 98.71 (1)
(B) - age (1)

b) What does the estimated slope of 0.971 tell us about the relationship between blood
pressure and age? Be specific; a qualitative answer will not suffice here.
For each additional year of age, (1)
blood pressure is predicted to increase by 0.971 units, on average. (1)

c) __43.2%__ (1) of the variation in __Blood Pressure__ (1) can be explained by the relationship
with __Age__ (1). Fill in the blanks and be specific.
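The 43.2% figure can be cross-checked against the Analysis of Variance table. The following is a minimal sketch, with variable names chosen here for illustration, that recomputes R-Sq, S, F and the slope's T statistic from the sums of squares and coefficients printed in the output.

```python
import math

# Values copied from the regression output
ss_regression = 6394.0
ss_residual = 8393.4
ss_total = 14787.5
df_residual = 28

r_squared = ss_regression / ss_total         # proportion of variation explained
ms_residual = ss_residual / df_residual      # mean square for residual error
s = math.sqrt(ms_residual)                   # residual standard deviation S
f_stat = (ss_regression / 1) / ms_residual   # F = MS(regression) / MS(residual)
t_slope = 0.971 / 0.2102                     # T = Coef / SE Coef for age

print(f"R-Sq = {100 * r_squared:.1f}%")
print(f"S    = {s:.4f}")
print(f"F    = {f_stat:.2f}")
print(f"T    = {t_slope:.2f} (T squared = {t_slope ** 2:.2f})")
```

With a single predictor, the square of the slope's T statistic should agree with F up to rounding in the printed coefficients, which gives one more consistency check on the table.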

d) R has identified one unusual observation. This observation has: (circle one each) (1)
i) High leverage

True or False

ii) High influence

True or False

iii) Large residual

True or False
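To make the size of this observation's residual concrete, here is a rough sketch that compares the observed blood pressure for observation 2 with the value fitted by the original model. Dividing the residual by S is only an informal yardstick assumed for this illustration (it ignores leverage, so it is not the software's standardized residual), and the helper name `fitted_bp` is hypothetical.

```python
S = 17.3137  # residual standard deviation from the regression output


def fitted_bp(age, intercept=98.71, slope=0.971):
    """Fitted blood pressure from the original model (outlier included)."""
    return intercept + slope * age


# Unusual observation 2: age 47.0, blood_pressure 220.00
residual = 220.00 - fitted_bp(47.0)
ratio = residual / S  # informal: residual measured in units of S

# Slope with the outlier (0.971) versus the refitted slope without it (0.973)
slope_change = 0.973 - 0.971

print(f"residual = {residual:.3f}")
print(f"residual / S = {ratio:.2f}")
print(f"change in slope after removal = {slope_change:.3f}")
```

A residual several times larger than S, together with a slope that barely moves when the point is removed, is the kind of evidence the three True/False items ask about.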

e) Above is a plot of the residuals, with the outlier removed (it turns out this was a data entry
error). Do you see any problems? If so, list the problems in order of importance. If not, state
why you came to this conclusion.
. (1) (1).

After removing the outlier, the new model is:

blood_pressure = 96.24 + 0.973 age

f) A researcher wants to use this new model to predict the blood pressure of all Toronto
residents between the ages of 18-70. Is this an appropriate use of regression? Explain why or
why not using terminology from class. (2)

(1) (1)( ).

Using the new model:


g) Predict the blood pressure for an 80 year-old patient.

We should not make this prediction: 80 is outside the range of ages in the data (extrapolation). (1)

h) Predict the age of a patient with a blood pressure of 160.

This cannot be done with this model (it predicts blood pressure from age, not age from blood pressure). (1)

i) Predict the blood pressure for a 38 year-old patient.

blood_pressure = 96.24 + 0.973(38) = 133.214. (1)

3) Choose one phrase from list A and however many phrases from list B that best describe each
situation. You may use some phrases more than once, or not at all, or you may combine them.
Indeed, some don't even make sense.
[10]
List A

Survey/Sample
Observational Study
Experiment

List B

Simple Random Sample
Stratified sample
Cluster sample
Systematic sample
Multistage sample
Factorial Experiment
Cross-sectional Study
Retrospective
Prospective
Cohort Study
Case-Control Study
Completely Randomized Experiment
Randomized Block Experiment

a) A UTM employee wants to see if the program a student chooses has an effect on whether
they stay in school until graduation. She looks through the last ten years of graduation
records, noting what programs they enrolled in and whether they graduated or not.
This is a(n)

___Obs. Study__________ (Choose from list A)

In particular, it is a _______Retrospective Cohort Study________ (List B)


b) An engineer wants to measure the mercury content of a nearby lake. She first converts the
lake into a 10x10 grid, and numbers each cell from 1 to 100. She then marks every 12th cell,
and rows out to obtain some water from each.
This is a(n)

______ Survey/Sample ___ (List A)

In particular, it is a ________Systematic___________________ (List B)
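A systematic sample like the one in part b) can be sketched as follows. The function name `systematic_sample` is hypothetical, and starting at cell 12 is an assumption: the question only says that every 12th cell is marked, whereas in practice the starting cell is often chosen at random.

```python
def systematic_sample(n_cells, step, start=None):
    """Return every `step`-th cell number from 1..n_cells, beginning at `start`."""
    if start is None:
        start = step  # assumed: begin counting at the first multiple of `step`
    return list(range(start, n_cells + 1, step))


# 10x10 grid numbered 1 to 100, every 12th cell marked
cells = systematic_sample(100, 12)
print(cells)
print(len(cells), "cells sampled")
```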

c) A researcher is given a dataset containing physician reports for all patients in the U.K. Using
this dataset, he randomly samples 200 patients with Crohn's disease and 200 patients
without Crohn's to determine if exercise levels can predict onset of the disease.
This is a(n)

_____ Obs. Study _________ (List A)

In particular, it is a _______(Retrospective) Case-Control Study__________ (List B)

d) A researcher wants to determine whether eating breakfast improves a student's
performance in school. She allocates 50 students to eat breakfast and 50 not to eat breakfast,
ensuring that each treatment group has an equal number of males and females. She
measures their performance on a weekly test and compares the two groups.
This is a(n)

_____Experiment_________ (List A)

In particular, it is a _______Completely Randomized Exp.______ (List B)


e) A researcher wants to figure out what proportion of Americans support Obama in the
upcoming election. Using the voters list, he randomly chooses 5 States, then 5 towns from
each state, then five streets from each town. He then drives to each chosen street and asks
everyone which candidate they will support.
This is a(n)

_____Survey/Sample_____ (List A)

In particular, it is a _______Multistage Cluster sample_________ (List B)
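The staged selection in part e) can be sketched in code. Everything about the sampling frame below (10 states, 8 towns per state, 6 streets per town, and all names) is invented for illustration; only the 5-5-5 selection at each stage comes from the question.

```python
import random


def multistage_sample(frame, n_per_stage, rng):
    """frame: {state: {town: [streets]}}; sample states, then towns, then streets."""
    chosen = []
    for state in rng.sample(sorted(frame), n_per_stage):
        towns = frame[state]
        for town in rng.sample(sorted(towns), n_per_stage):
            for street in rng.sample(towns[town], n_per_stage):
                chosen.append((state, town, street))
    return chosen


# Hypothetical frame with invented names
frame = {
    f"state{s}": {
        f"town{s}.{t}": [f"street{s}.{t}.{k}" for k in range(6)]
        for t in range(8)
    }
    for s in range(10)
}

streets = multistage_sample(frame, 5, random.Random(0))
print(len(streets), "streets chosen")
```

Each chosen street then acts as a cluster, since everyone on it is asked, which is why the answer combines "Multistage" with "Cluster sample".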



4) A researcher wants to investigate whether different forms of exercise can be used to help
hyperactive children. A group of 90 children is divided into two groups according to age - those
aged 9-12 and those aged 5-9. Within each age group the children are randomly assigned to one
of three groups. The first group will just do their normal physical activities. The second group
will be given an additional moderately demanding exercise routine. The third group will be
given an additional exercise routine that is very strenuous in nature. At the end of a four-month
period parents will be asked to evaluate their children's progress as either {None, Low or High}.
Identify all the key design elements, such as:
[10]
a) the factors, levels, and treatments
Factor: Physical Activity (1)
Levels: Normal, Moderate, Strenuous (1)
Treatments: same as factor levels (1) since only 1 factor design
b) any blocking variables (if present)

Age group (1)

c) response variable(s)

Progress (1)

d) use of blinding

None (1): the students surely know which exercise regime they are doing, and we assume the parents
would know this as well.
e) possible improvements to the design (2; 1 mark for any of the following)
Blind the parents to the Tx by having students do exercise at school
Control for caloric intake
Block by sex as well as age
Evaluate results using a numerical metric instead of categories
Use more than 90 children / replicate the experiment (1 mk max for both)
<anything else that demonstrates the student's understanding of the principles of
control, blinding, blocking, or randomization>

f) We could use side-by-side boxplots to compare the progress of children across the different
exercise groups.
True or False? False: the response is categorical.

g) If we notice a significant difference between the proportion of High progress from each
group, we can infer a cause-effect relationship. True or False? False: there is no blinding.
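The blocked randomization described in question 4 can be sketched as below. The 45/45 split between the age blocks is an assumption (the question gives only the total of 90 children), and the child ID numbers and the function name `assign_within_blocks` are hypothetical.

```python
import random
from collections import Counter

TREATMENTS = ["Normal", "Moderate", "Strenuous"]


def assign_within_blocks(blocks, rng):
    """blocks: {block name: [child ids]}; returns {child id: (block, treatment)}."""
    assignment = {}
    for block, children in blocks.items():
        shuffled = children[:]   # copy so the caller's list is untouched
        rng.shuffle(shuffled)    # random order within the block
        for i, child in enumerate(shuffled):
            # cycling through treatments keeps group sizes equal within a block
            assignment[child] = (block, TREATMENTS[i % len(TREATMENTS)])
    return assignment


# Assumed split: 45 children in each age block
blocks = {"5-9": list(range(1, 46)), "9-12": list(range(46, 91))}
assignment = assign_within_blocks(blocks, random.Random(1))

counts = Counter(assignment.values())
for (block, treatment), n in sorted(counts.items()):
    print(block, treatment, n)
```

Randomizing separately inside each age group is what makes age a blocking variable rather than a second factor.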
