
STA 6166, Section 8489, Fall 2007

Final Exam Part II


Due 13 December 2007

RAMIN SHAMSHIRI
UFID#: 9021-3353

C- The following experiment on reproductive fitness in ospreys was conducted in 1970-1980. Review the description of the experiment and then answer the following questions.

1. Suppose location was expected to have an effect on reproductive fitness but was not of direct interest to the researcher. Should s/he simply ignore the location aspect in the analysis and use CRD with Year as the factor of interest? Explain.
Answer: Ignoring the location effect, the data are arranged by year as below:

Year   Fitness index (10 observations per year)                               Mean     SD      Var
1970   3.53   4.27   3.82   3.28   5.12   2.85   2.6    2.42   2.76   2.18    3.283    0.918   0.843
1976   12.32  13.18  9.03   18.67  13.91  13.88  16.42  8.92   6.95   10.49   12.377   3.611   13.04
1982   36.49  29.06  19.12  30.39  23.98  21.69  31.15  28.01  16.5   19.72   25.611   6.389   40.82

With the following hypotheses:
H0: $\mu_{1970} = \mu_{1976} = \mu_{1982}$
H1: At least one of the means is not equal to the others
$\alpha$ = any reasonable level (0.05 or 0.01)

Using one-way ANOVA for this hypothesis test, we need the assumptions below:
1- The populations from which the samples were obtained must be normally or approximately normally distributed.
2- The samples must be independent.
3- The variances of the populations must be equal.

Using the Levene test for homogeneity of variance, we get an F-value of 13.03, which leads to a p-value less than 0.0001; thus we conclude that the variances of the populations are not equal. The variance column of the table above confirms this result. Since at least one of the assumptions of one-way ANOVA is not met here, we probably cannot trust the result of this test.

A one-way ANOVA for this hypothesis gives:
Test F-value = 69.13
Test P-value < 0.0001
Critical F-value = 3.53

The test F-value is larger than the critical F-value (the P-value is very small, less than any reasonable significance level $\alpha$), thus we reject the null hypothesis and conclude that at least one of the years differs in its mean value. This result ignores the location effect.
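The one-way ANOVA statistic quoted above can be checked by hand from the table. The following is a minimal sketch in plain Python, not the SAS procedure used for the exam; the names `index_by_year` and `one_way_f` are illustrative only.

```python
from statistics import mean

# Fitness index by year, pooled over both locations (values from the table above).
index_by_year = {
    1970: [3.53, 4.27, 3.82, 3.28, 5.12, 2.85, 2.6, 2.42, 2.76, 2.18],
    1976: [12.32, 13.18, 9.03, 18.67, 13.91, 13.88, 16.42, 8.92, 6.95, 10.49],
    1982: [36.49, 29.06, 19.12, 30.39, 23.98, 21.69, 31.15, 28.01, 16.5, 19.72],
}


def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on the groups."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group size times squared distance
    # of the group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_stat, df1, df2 = one_way_f(list(index_by_year.values()))
print(f"F = {f_stat:.2f} on ({df1}, {df2}) degrees of freedom")
```

With 3 years and 30 observations the statistic has 2 and 27 degrees of freedom, and the computed value should agree with the reported 69.13 up to rounding.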

Considering the location effect, we first need to know whether location had any effect on the data observed within the same year. The data and hypotheses can be written as below:

Location   Observations                        Mean     Var       F-value   P-value   F-crit
GAR1970    2.85   2.6    2.42   2.76   2.18    2.562    0.0724    17.41     0.0031    5.3176
MAS1970    3.53   4.27   3.82   3.28   5.12    4.004    0.52473

Location   Observations                        Mean     Var       F-value   P-value   F-crit
GAR1976    13.88  16.42  8.92   6.95   10.49   11.332   14.527    0.82      0.391     5.317
MAS1976    12.32  13.18  9.03   18.67  13.91   13.422   12.085

Location   Observations                        Mean     Var       F-value   P-value   F-crit
GAR1982    21.69  31.15  28.01  16.5   19.72   23.414   36.3475   1.209     0.303     5.317
MAS1982    36.49  29.06  19.12  30.39  23.98   27.808   43.4365

H0: $\mu_{MAS1970} = \mu_{GAR1970}$   H1: $\mu_{MAS1970} \neq \mu_{GAR1970}$   Result: P-value = 0.0031 => reject H0
H0: $\mu_{MAS1976} = \mu_{GAR1976}$   H1: $\mu_{MAS1976} \neq \mu_{GAR1976}$   Result: P-value = 0.391 => fail to reject H0
H0: $\mu_{MAS1982} = \mu_{GAR1982}$   H1: $\mu_{MAS1982} \neq \mu_{GAR1982}$   Result: P-value = 0.303 => fail to reject H0

Based on the F-values and P-values, location had a significant effect only in the first year of data collection (1970). For the other years (1976 and 1982), location did not have a significant effect. Since location does affect reproductive fitness in at least one year, the researcher should not ignore the location aspect and use a CRD with Year as the only factor of interest; as shown here, that analysis will not reveal the true effects of both Year and Location on reproductive fitness. The researcher should instead use an RCBD and treat this problem as a block design in which the blocks have more than t experimental units. This design controls for the effect of the two locations.
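The three within-year location comparisons can be reproduced with a two-group F statistic on 1 and 8 degrees of freedom. This is a sketch under the assumption that each comparison was an ordinary one-way ANOVA on the two locations; `cells`, `two_group_f`, and `location_f` are illustrative names, and the critical value is the one quoted in the tables, not recomputed here.

```python
from statistics import mean, variance

# Fitness index by year and location (values from the tables above).
cells = {
    1970: {"GAR": [2.85, 2.6, 2.42, 2.76, 2.18],
           "MAS": [3.53, 4.27, 3.82, 3.28, 5.12]},
    1976: {"GAR": [13.88, 16.42, 8.92, 6.95, 10.49],
           "MAS": [12.32, 13.18, 9.03, 18.67, 13.91]},
    1982: {"GAR": [21.69, 31.15, 28.01, 16.5, 19.72],
           "MAS": [36.49, 29.06, 19.12, 30.39, 23.98]},
}

F_CRIT = 5.3176  # critical value quoted in the text for (1, 8) df


def two_group_f(a, b):
    """F statistic comparing the means of two groups (1 numerator df)."""
    grand = mean(a + b)
    # With two groups the between-group df is 1, so MS_between = SS_between.
    ms_between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    # Pooled within-group variance.
    ms_within = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (
        len(a) + len(b) - 2
    )
    return ms_between / ms_within


location_f = {yr: two_group_f(c["GAR"], c["MAS"]) for yr, c in cells.items()}
for yr, f_value in location_f.items():
    decision = "reject H0" if f_value > F_CRIT else "fail to reject H0"
    print(f"{yr}: F = {f_value:.2f} => {decision}")
```

Only the 1970 statistic exceeds the quoted critical value, which matches the conclusion that location mattered in 1970 but not in 1976 or 1982.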

2. Review the attached output and choose the most appropriate analysis for this data.

(There are four different ANOVA in the output) Explain your choice including specifically what aspects of the analyses led to your decision and why the other analyses were inappropriate. At a minimum, you should discuss the intentions of the scientist and assumptions of the alternative models.
Answer: Reviewing the four different outputs, I would choose the fourth one, for the reasons given below for each analysis:

1- One-Way ANOVA on index with year
The assumptions for this test are that the error terms are independent and normally distributed with constant variance. This one-way ANOVA tests the hypotheses below:
H0: $\mu_{1970} = \mu_{1976} = \mu_{1982}$
H1: At least one of the means is not equal to the others
The assumption of homogeneity of variance is not met here: according to the output, the F-value from Levene's test is 13.03 with 2 numerator degrees of freedom, leading to a p-value smaller than any reasonable significance level. Thus we reject the null hypothesis of equality of variances (H0: $\sigma^2_{1970} = \sigma^2_{1976} = \sigma^2_{1982}$).

Since the assumption of homogeneous variance is not met here, it is not appropriate to use one-way ANOVA. Moreover, as mentioned in the answer to the previous question, this method does not account for the location effect. Regardless of these facts, this test leads to rejection of the null hypothesis of equal mean reproductive fitness across years. (Reject H0: $\mu_{1970} = \mu_{1976} = \mu_{1982}$)
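Levene's test is itself a one-way ANOVA applied to a measure of spread around each group mean. The sketch below assumes the variant based on absolute deviations from the group mean; the source does not say which variant the software used, but this one appears to reproduce the reported F-value. The names `index_by_year` and `levene_f` are illustrative.

```python
from statistics import mean

# Fitness index by year, pooled over both locations.
index_by_year = {
    1970: [3.53, 4.27, 3.82, 3.28, 5.12, 2.85, 2.6, 2.42, 2.76, 2.18],
    1976: [12.32, 13.18, 9.03, 18.67, 13.91, 13.88, 16.42, 8.92, 6.95, 10.49],
    1982: [36.49, 29.06, 19.12, 30.39, 23.98, 21.69, 31.15, 28.01, 16.5, 19.72],
}


def levene_f(groups):
    """Levene statistic: one-way ANOVA F on |y - group mean| (assumed variant)."""
    # Replace every observation by its absolute deviation from its group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_all = [v for g in z for v in g]
    z_bar = mean(z_all)
    ss_between = sum(len(g) * (mean(g) - z_bar) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    df_between = len(z) - 1
    df_within = len(z_all) - len(z)
    return (ss_between / df_between) / (ss_within / df_within)


levene = levene_f(list(index_by_year.values()))
print(f"Levene F = {levene:.2f}")
```

A large value here reflects the pattern already visible in the variance column: the spread grows sharply from 1970 to 1982.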

2- RCBD on index with location as block
This method can account for the effect of location on the fitness index, but we need to check whether its assumptions are met. The assumptions for RCBD are that the blocks are selected independently, the treatments are randomly assigned to the experimental units within a block, the variances are homogeneous across treatments, and each population is approximately normally distributed. According to the outputs, the assumption of approximate normality is met. The Shapiro-Wilk and Kolmogorov-Smirnov tests, for example, have high p-values of 0.69 and 0.11 respectively, so neither rejects the null hypothesis of a normal distribution. The Q-Q plot and box plot show the same result.

Checking the assumption of homogeneity of variance from the plots of residuals against treatments, we can see that the spread of the model residuals differs between years, indicating that the assumption of homogeneous variance between treatments is not met.

The hypothesis of homogeneity of variance is also rejected by Levene's test, which has an F-value of 2.70, leading to a P-value of 0.045 < 0.05.

Since the assumptions of RCBD are not met, it is not appropriate to rely on the results of this analysis as reported in the attached output.

3- RCBD on Log10(index)

Due to the problem of unequal variance among factor levels, it may be useful to perform the analysis on transformed values of the observations, which may satisfy the assumption of equal variances. If $\sigma$ is proportional to the mean, we can use the logarithm of the $y_{ij}$. Checking the assumption of normality, the Shapiro-Wilk and Kolmogorov-Smirnov tests both have large P-values, so neither rejects the null hypothesis of a normal distribution. The Q-Q plot and box plot confirm this result graphically.
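The motivation for the log transform can be illustrated with the yearly summaries: when the standard deviation is roughly proportional to the mean, the ratio SD/mean is about constant across groups. The sketch below is only a rough check that pools both locations within each year; it is not the RCBD residual analysis from the output, and the names `cv`, `raw_ratio`, and `log_ratio` are illustrative.

```python
from math import log10
from statistics import mean, stdev, variance

# Fitness index by year, pooled over both locations.
index_by_year = {
    1970: [3.53, 4.27, 3.82, 3.28, 5.12, 2.85, 2.6, 2.42, 2.76, 2.18],
    1976: [12.32, 13.18, 9.03, 18.67, 13.91, 13.88, 16.42, 8.92, 6.95, 10.49],
    1982: [36.49, 29.06, 19.12, 30.39, 23.98, 21.69, 31.15, 28.01, 16.5, 19.72],
}

# Coefficient of variation (SD / mean) for each year on the raw scale.
cv = {yr: stdev(ys) / mean(ys) for yr, ys in index_by_year.items()}

# Sample variances on the raw scale and on the log10 scale.
raw_var = {yr: variance(ys) for yr, ys in index_by_year.items()}
log_var = {yr: variance([log10(y) for y in ys]) for yr, ys in index_by_year.items()}

# Largest-to-smallest variance ratio on each scale.
raw_ratio = max(raw_var.values()) / min(raw_var.values())
log_ratio = max(log_var.values()) / min(log_var.values())

for yr in index_by_year:
    print(f"{yr}: SD/mean = {cv[yr]:.3f}, var(log10 y) = {log_var[yr]:.4f}")
print(f"variance ratio: raw = {raw_ratio:.1f}, log10 = {log_ratio:.2f}")
```

Because this check ignores the location blocks, it explains why the transform was a natural candidate but does not replace the residual plots from the fitted RCBD model.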

However, the variances are still not homogeneous, as the uneven spread of the residuals in the residual plots shows.

Since the assumption of homogeneity of variance is not met, the test result from this procedure cannot be trusted either.

4- RCBD on index - unequal variances for each year
This method provides a more appropriate procedure for making inference on this problem. The assumption of normality is met: the Shapiro-Wilk and Kolmogorov-Smirnov P-values are both large enough that we fail to reject the null hypothesis of normality. The corresponding Q-Q plot and box plot also show graphically that the populations are approximately normally distributed. The plots of wtresid*Pred and wtresid*year show that the weighted residuals have homogeneous variance. Since all the assumptions of RCBD are met here, the results of this analysis can be trusted more than those of the other three analyses.


3. Based on your decision in (2), state the statistical model you chose. Be sure to identify all terms in the model.
Answer: The model that I have selected is the Randomized Complete Block Design (RCBD), which has the following equation:

$$Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$

Where:
$\mu$ is the grand mean of the fitness index. Its estimate from all 30 observations at the two sites over the 3 experimental years is $412.71 / 30 \approx 13.757$.
$\tau_i$ is the effect due to the ith treatment. Here our treatments are the years. We have three years, so we have $\tau_1$, $\tau_2$ and $\tau_3$.
$\beta_j$ is the effect due to the jth block. In this model, our blocks are the two locations, GAR and MAS, so we have $\beta_1$ and $\beta_2$.
$\varepsilon_{ij}$ is the error term. The error terms are independent observations from an approximately normal distribution with mean 0 and constant variance $\sigma^2$.
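As a rough illustration of the terms in this model, the grand mean, the year (treatment) effects, and the location (block) effects can be estimated from the cell data by simple averaging. This sketch assumes the balanced layout of 5 observations per year and location; it is not the weighted fit from the SAS output, and the names `cells`, `year_effect`, and `loc_effect` are illustrative.

```python
from statistics import mean

# Fitness index for each (year, location) cell, 5 observations per cell.
cells = {
    (1970, "GAR"): [2.85, 2.6, 2.42, 2.76, 2.18],
    (1970, "MAS"): [3.53, 4.27, 3.82, 3.28, 5.12],
    (1976, "GAR"): [13.88, 16.42, 8.92, 6.95, 10.49],
    (1976, "MAS"): [12.32, 13.18, 9.03, 18.67, 13.91],
    (1982, "GAR"): [21.69, 31.15, 28.01, 16.5, 19.72],
    (1982, "MAS"): [36.49, 29.06, 19.12, 30.39, 23.98],
}

years = sorted({yr for yr, _ in cells})
locations = sorted({loc for _, loc in cells})


def year_mean(year):
    """Mean of all observations taken in the given year."""
    return mean([y for (yr, _), ys in cells.items() if yr == year for y in ys])


def location_mean(location):
    """Mean of all observations taken at the given location."""
    return mean([y for (_, loc), ys in cells.items() if loc == location for y in ys])


# Estimated grand mean, then effects as deviations from it.
grand_mean = mean([y for ys in cells.values() for y in ys])
year_effect = {yr: year_mean(yr) - grand_mean for yr in years}
loc_effect = {loc: location_mean(loc) - grand_mean for loc in locations}

print(f"grand mean = {grand_mean:.3f}")
for yr in years:
    print(f"year {yr} effect = {year_effect[yr]:+.3f}")
for loc in locations:
    print(f"location {loc} effect = {loc_effect[loc]:+.3f}")
```

In a balanced design the estimated year effects sum to zero, as do the location effects, which is the usual identifiability constraint for this model.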

4. Given the model you chose, test the hypotheses of interest to the scientist. State the hypotheses being tested. For each set of hypotheses (if there are more than one), give the equation of the test statistic you are using and its distribution. From the output, give the value of the test statistic, the associated degrees of freedom, the p-value for the test, and your conclusion. State the conclusion in terms of the problem under study (reject the null hypothesis is NOT sufficient here). If you have multiple hypotheses, also discuss your choice of method for controlling the experiment-wise error rate.
Answer: The main hypothesis that the scientists are testing is whether the ban of DDT led to a recovery in the fitness of the osprey, that is, whether mean fitness is higher after the ban than before it. In terms of the yearly means, the sets of hypotheses that the scientists are interested in testing are:

H0: $\mu_{1970} \ge \mu_{1976}$   H1: $\mu_{1970} < \mu_{1976}$ (Claim)
H0: $\mu_{1976} \ge \mu_{1982}$   H1: $\mu_{1976} < \mu_{1982}$ (Claim)
H0: $\mu_{1970} \ge \mu_{1982}$   H1: $\mu_{1970} < \mu_{1982}$ (Claim)
H0: $\mu_{1970} = \mu_{1976} = \mu_{1982}$   H1: At least one of the means is not equal to the others

Using ANOVA test for RCBD, we will have a table of results as below:

The F statistic has an F distribution with $t - 1$ degrees of freedom for the numerator and $(t - 1)(b - 1)$ degrees of freedom for the denominator, where t is the number of treatments and b is the number of blocks. From the SAS outputs, we have:

The F-value is equal to 92.57, leading to a P-value less than 0.0001, which rejects the null hypothesis of equality of means between years. The numerator degrees of freedom are 2 and the denominator degrees of freedom are 11.7. Using the Tukey test to find out where the differences lie, we have the following hypotheses:

H0: $\mu_{1970} = \mu_{1976}$   H1: $\mu_{1970} \neq \mu_{1976}$ (Claim)
H0: $\mu_{1976} = \mu_{1982}$   H1: $\mu_{1976} \neq \mu_{1982}$ (Claim)
H0: $\mu_{1970} = \mu_{1982}$   H1: $\mu_{1970} \neq \mu_{1982}$ (Claim)

Testing these hypotheses with Tukey, we have the following result from SAS:

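The procedure for the Tukey test is:

$$W = q_{\alpha}(t, \nu)\,\sqrt{\frac{s_W^2}{n}}$$

where $q_{\alpha}(t, \nu)$ is the critical value of the studentized range for t treatment means and $\nu$ error degrees of freedom, $s_W^2$ is the error mean square, and n is the sample size for each treatment. Two treatment means are declared different when their absolute difference exceeds W.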

Conclusion:
Considering the p-values from the SAS output for our analyses, we conclude that the ban of DDT has led to a recovery of fitness since 1972. In other words, we reject the null hypothesis H0: $\mu_{1970} = \mu_{1976} = \mu_{1982}$ and conclude that there is sufficient evidence that the mean fitness index differs among the three years, with fitness increasing from 1970 to 1976 to 1982.
