
STAT 505 Assessment #3 Luciana Echarren

1. (Data from J&W Exercise 1.6.) The data in air.dat are 42 measurements of air-pollution variables in
Los Angeles. The columns are x1 = wind, x2 = solar radiation, x3 = CO, x4 = NO, x5 = NO2, x6 = O3, and
x7 = HC.

(a) Using notation consistent with our lessons, explain briefly what $X_{ij}$ and $\mu_j$ represent in terms of the
variables here. Similarly, explain briefly what $\sigma_{jk}$ represents. What does it mean if $\sigma_{jk} = 0$?

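$X_{ij}$ represents the $i$-th observation (measurement) of the $j$-th variable; for example, $X_{i3}$ is the CO reading in the $i$-th of the 42 measurements.

$\mu_j$ represents the population mean of the $j$-th variable.

$\sigma_{jk}$ represents the population covariance between the $j$-th and $k$-th variables. If $\sigma_{jk} = 0$, variables $j$ and $k$ are uncorrelated (no linear association); under multivariate normality, this also means they are independent.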

(b) Considering only solar radiation, CO (carbon monoxide), and NO2 (nitrogen dioxide), can these
variables reasonably be assumed to be multivariate normal? Justify your answer with appropriate plots.

The Q-Q plot deviates slightly from a straight line, but not enough to indicate that the data depart
from a multivariate normal distribution, so the assumption is reasonable.
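A common way to build a multivariate normality Q-Q plot is to plot the ordered squared Mahalanobis distances $d_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})'\mathbf{S}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})$ against chi-square quantiles with $p = 3$ df. The sketch below assumes that approach (the answer above does not say which Q-Q plot was drawn) and plotting positions $(i - 0.5)/n$. Because the air.dat values are not reproduced here, it runs on hypothetical toy rows; all function names are illustrative.

```python
import math

# Hypothetical toy rows (NOT the air.dat values); the three columns stand in
# for solar radiation, CO, and NO2.
toy_rows = [[1, 2, 3], [2, 1, 4], [3, 5, 2], [4, 3, 6], [5, 6, 5], [6, 4, 1]]


def mean_and_cov(rows):
    """Sample mean vector and covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[j] - means[j]) * (r[k] - means[k]) for r in rows) / (n - 1)
            for k in range(p)] for j in range(p)]
    return means, cov


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    aug = [list(map(float, a[i])) + [float(b[i])] for i in range(m)]
    for col in range(m):
        piv = max(range(col, m), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(m):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def chi2_cdf_3df(x):
    """Exact CDF of the chi-square distribution with 3 degrees of freedom."""
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_ppf_3df(q):
    """Chi-square(3) quantile found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf_3df(mid) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def chi2_qq_points(rows):
    """Pairs of (chi-square quantile, ordered squared Mahalanobis distance)."""
    means, cov = mean_and_cov(rows)
    d2 = []
    for r in rows:
        diff = [x - m for x, m in zip(r, means)]
        d2.append(sum(u * v for u, v in zip(diff, solve(cov, diff))))
    d2.sort()
    n = len(rows)
    return [(chi2_ppf_3df((i - 0.5) / n), d) for i, d in enumerate(d2, start=1)]


points = chi2_qq_points(toy_rows)
for q, d in points:
    print(f"{q:7.3f}  {d:7.3f}")
```

Points close to the 45-degree line support multivariate normality. A handy sanity check: with the $n - 1$ divisor, the squared distances always sum to $(n-1)p$, which is $5 \times 3 = 15$ for the toy rows.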

(c) Provide a matrix of scatterplots for the three variables above. Also, report the numeric correlation
for each pair of variables. Is any pair significantly correlated? Answer this with separate confidence
intervals of correlation. Use a Bonferroni adjustment for multiplicity so that your confidence for all
intervals simultaneously is 95%.
Hypotheses: $H_0: \rho = 0$ vs. $H_a: \rho \neq 0$ at the $\alpha = 0.05$ level.

CO and solar radiation: $r_{23} = 0.183$, p-value = 0.247
NO2 and solar radiation: $r_{25} = 0.116$, p-value = 0.465
CO and NO2: $r_{35} = 0.557$, p-value < 0.001

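Since the p-value for CO and NO2 (< 0.001) is below 0.05, and also below the Bonferroni-adjusted level
0.05/3 ≈ 0.0167, we reject the null hypothesis and conclude that these two variables are significantly
correlated. The other two pairs are not significantly correlated.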
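The p-values above correspond to the usual test statistic $t = r\sqrt{(n-2)/(1-r^2)}$ with $n - 2 = 40$ df. The Python standard library has no t distribution, so this sketch (helper names are hypothetical) integrates the t density with Simpson's rule to get a two-sided p-value. It uses the rounded correlations, so results can differ slightly from the reported values in the third decimal.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * (integral of the density from 0 to |t|), Simpson's rule."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3)


def corr_test(r, n):
    """t statistic and two-sided p-value for H0: rho = 0."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, two_sided_p(t, df)


pairs = {"r23": 0.183, "r25": 0.116, "r35": 0.557}
alpha_bonf = 0.05 / 3  # Bonferroni level for three pairwise tests
results = {name: corr_test(r, 42) for name, r in pairs.items()}
for name, (t, p) in results.items():
    print(f"{name}: t = {t:.3f}, p = {p:.4f}, significant at 0.05/3: {p < alpha_bonf}")
```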

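Fisher's z-transformation of each sample correlation:

$$z_{23} = \tfrac{1}{2}\ln\left(\frac{1+0.183}{1-0.183}\right) = 0.185, \qquad z_{25} = \tfrac{1}{2}\ln\left(\frac{1+0.116}{1-0.116}\right) = 0.117, \qquad z_{35} = \tfrac{1}{2}\ln\left(\frac{1+0.557}{1-0.557}\right) = 0.628$$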
Bonferroni multiplier (3 intervals): $t_{41,\,0.05/(2 \cdot 3)} = 2.4962$

Interval for the transformed $\rho_{35}$: $0.628 \pm \dfrac{2.4962}{\sqrt{42}} \Rightarrow (0.243,\ 1.013)$

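Back-transform the interval:

$$\left(\frac{e^{2(0.243)}-1}{e^{2(0.243)}+1},\ \frac{e^{2(1.013)}-1}{e^{2(1.013)}+1}\right) = (0.238,\ 0.767)$$

Because this interval for $\rho_{35}$ excludes 0, CO and NO2 are significantly positively correlated.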
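The same Bonferroni-adjusted Fisher-z intervals can be produced for all three pairs at once. The hand computation above uses a $t_{41}$ multiplier with $\sqrt{42}$; this sketch instead assumes the common large-sample form $z \pm z_{\alpha/(2m)}/\sqrt{n-3}$, with a normal quantile from `statistics.NormalDist`. For $\rho_{35}$ the two forms give nearly the same interval. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_bonferroni_ci(r, n, alpha=0.05, m=3):
    """Back-transformed Fisher-z interval for rho, Bonferroni-adjusted for m intervals.

    Assumes the large-sample form z +/- z_{alpha/(2m)} / sqrt(n - 3).
    """
    z = math.atanh(r)  # same as 0.5 * ln((1 + r) / (1 - r))
    mult = NormalDist().inv_cdf(1 - alpha / (2 * m))
    half = mult / math.sqrt(n - 3)
    # tanh(x) = (e^{2x} - 1) / (e^{2x} + 1), the back-transformation
    return math.tanh(z - half), math.tanh(z + half)


corrs = {"rho23": 0.183, "rho25": 0.116, "rho35": 0.557}
cis = {name: fisher_bonferroni_ci(r, 42) for name, r in corrs.items()}
for name, (lo, hi) in cis.items():
    print(f"{name}: ({lo:.3f}, {hi:.3f}), excludes 0: {lo > 0 or hi < 0}")
```

Only the interval for $\rho_{35}$ excludes 0, which agrees with the p-value conclusion.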

2. Measurements of biochemical oxygen demand (Y1) and suspended solids (Y2) were obtained from the
discharge of n = 11 municipal wastewater treatment plants into the rivers of Wisconsin.

Assume that these data are sampled from a bivariate normal population with mean vector $\boldsymbol{\mu}$ and
covariance matrix $\boldsymbol{\Sigma}$.

(a) Find the sample mean vector and the sample covariance matrix.

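$$\bar{\mathbf{y}} = \begin{bmatrix} 34.64 \\ 33.18 \end{bmatrix}, \qquad \mathbf{S} = \begin{bmatrix} 109.3 & 120.4 \\ 120.4 & 363.8 \end{bmatrix}$$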

(b) Find 95% confidence intervals for the population means using

i. One-at-a-time multiplier


$\mu_1:\ 34.64 \pm 2.228\,\dfrac{10.45}{\sqrt{11}} \Rightarrow (27.62,\ 41.66)$ and $\mu_2:\ 33.18 \pm 2.228\,\dfrac{19.07}{\sqrt{11}} \Rightarrow (20.37,\ 45.99)$
We can be 95% confident that the interval (27.62, 41.66) contains the population mean $\mu_1$ of Y1, and,
separately, 95% confident that (20.37, 45.99) contains the population mean $\mu_2$ of Y2.

ii. Bonferroni multiplier

$\mu_1:\ 34.64 \pm 2.634\,\dfrac{10.45}{\sqrt{11}} \Rightarrow (26.34,\ 42.94)$

$\mu_2:\ 33.18 \pm 2.634\,\dfrac{19.07}{\sqrt{11}} \Rightarrow (18.04,\ 48.33)$

iii. Simultaneous confidence region multiplier

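$\mu_1:\ 34.64 \pm \sqrt{\dfrac{2(10)}{9}(4.257)}\,\dfrac{10.45}{\sqrt{11}} \Rightarrow (24.94,\ 44.33)$

$\mu_2:\ 33.18 \pm \sqrt{\dfrac{2(10)}{9}(4.257)}\,\dfrac{19.07}{\sqrt{11}} \Rightarrow (15.50,\ 50.86)$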
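The three sets of intervals differ only in the multiplier applied to $s_j/\sqrt{n}$. This sketch recomputes them from the summary statistics in part (a). The critical values 2.228, 2.634, and 4.257 are copied from the answers above rather than computed, because the Python standard library has no t or F quantile functions; variable names are illustrative.

```python
import math

# Summary statistics from part (a)
n, p = 11, 2
means = [34.64, 33.18]
sds = [10.45, 19.07]  # square roots of the diagonal of S

# Critical values as quoted above (from t and F tables, not computed here)
t_one = 2.228   # t_{10, 0.025}
t_bonf = 2.634  # t_{10, 0.05/(2*2)}
f_crit = 4.257  # F_{2, 9} upper 0.05 point

multipliers = {
    "one-at-a-time": t_one,
    "Bonferroni": t_bonf,
    "simultaneous": math.sqrt(p * (n - 1) / (n - p) * f_crit),
}


def intervals(mult):
    """mean_j +/- mult * s_j / sqrt(n) for each variable."""
    return [(m - mult * s / math.sqrt(n), m + mult * s / math.sqrt(n))
            for m, s in zip(means, sds)]


results = {name: intervals(mult) for name, mult in multipliers.items()}
for name, ivs in results.items():
    print(name, [(round(lo, 2), round(hi, 2)) for lo, hi in ivs])
```

The intervals widen from one-at-a-time to Bonferroni to the simultaneous ($T^2$-based) multiplier, which is the price paid for stronger joint coverage.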

(c) Compute the sample correlation r between oxygen demand and suspended solids.

$$r_{12} = \frac{s_{12}}{\sqrt{s_{11}\,s_{22}}} = \frac{120.37}{\sqrt{109.25 \times 363.76}} = 0.604$$
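The same division generalizes to the full correlation matrix $\mathbf{R} = \mathbf{D}^{-1/2}\mathbf{S}\mathbf{D}^{-1/2}$, where $\mathbf{D}$ holds the diagonal of $\mathbf{S}$. A minimal sketch using the covariance entries from this part (the helper name is hypothetical):

```python
import math

# Sample covariance matrix with the extra decimals used in part (c)
S = [[109.25, 120.37],
     [120.37, 363.76]]


def cov_to_corr(cov):
    """Correlation matrix: divide each s_jk by sqrt(s_jj * s_kk)."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


R = cov_to_corr(S)
print(round(R[0][1], 3))
```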

(d) Test $H_0: \rho = 0$ against $H_a: \rho \neq 0$ at the $\alpha = 0.01$ level. What are your conclusions?

$$t = r_{12}\sqrt{\frac{n-2}{1-r_{12}^2}} = 0.604\sqrt{\frac{9}{1-0.604^2}} = 2.27,$$

with 9 df and a p-value of 0.0492. Since the p-value > 0.01, we cannot reject the null hypothesis that the
correlation is zero at the 0.01 significance level.
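An equivalent check compares $t$ with the two-sided 0.01 critical value instead of using the p-value. The sketch hardcodes $t_{9,\,0.005} \approx 3.250$ from a standard t table, since the Python standard library has no t quantile function.

```python
import math

r, n = 0.604, 11
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Two-sided alpha = 0.01 critical value t_{9, 0.005}, taken from a t table
t_crit = 3.250
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.2f}, critical value = {t_crit}, reject H0: {reject}")
```

Since $|t| = 2.27 < 3.250$, the decision matches the p-value conclusion.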

(e) Give a 95% confidence interval for $\rho$.

$$z_{12} = \tfrac{1}{2}\ln\left(\frac{1+0.604}{1-0.604}\right) = 0.6994$$

$$0.6994 \pm \frac{1.96}{\sqrt{8}} \;\Rightarrow\; (0.0064,\ 1.39)$$

Back-transform the interval:

$$\left(\frac{e^{2(0.0064)}-1}{e^{2(0.0064)}+1},\ \frac{e^{2(1.39)}-1}{e^{2(1.39)}+1}\right) = (0.0064,\ 0.88)$$
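The part (e) arithmetic can be reproduced with `math.atanh` and `math.tanh`, which compute exactly $\tfrac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$ and $\frac{e^{2z}-1}{e^{2z}+1}$. A minimal sketch with $n = 11$, so the standard error is $1/\sqrt{8}$:

```python
import math

r, n = 0.604, 11
z = math.atanh(r)                # 0.5 * ln((1 + r) / (1 - r))
half = 1.96 / math.sqrt(n - 3)   # 95% normal multiplier over sqrt(n - 3)
lo, hi = math.tanh(z - half), math.tanh(z + half)  # back-transform both endpoints
print(f"z = {z:.4f}, interval for rho = ({lo:.4f}, {hi:.3f})")
```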
SAS CODE