
Lecture notes

Basic Principles of Experimental Designs


The basic principles of experimental designs are randomization, replication and
local control. These principles make a valid test of significance possible. Each of
them is described briefly in the following subsections.
(1) Randomization. The first principle of an experimental design is randomization, which
is a random process of assigning treatments to the experimental units. The random
process implies that every possible allotment of treatments has the same probability. An
experimental unit is the smallest division of the experimental material, and a treatment
means an experimental condition whose effect is to be measured and compared. The
purpose of randomization is to remove bias and other sources of extraneous variation
which are not controllable. Another advantage of randomization (accompanied by
replication) is that it forms the basis of any valid statistical test. Hence the treatments
must be assigned at random to the experimental units. Randomization is usually done
by drawing numbered cards from a well-shuffled pack of cards, by drawing numbered
balls from a well-shaken container, or by using tables of random numbers.
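The random allotment described above can be sketched in a few lines of Python; the function name and the 3-treatment, 12-plot layout are hypothetical, chosen only for illustration:

```python
import random

def randomize(treatments, n_units):
    """Randomly allot treatments to experimental units so that every
    possible (equally replicated) allotment is equally likely."""
    assert n_units % len(treatments) == 0, "equal replication assumed"
    reps = n_units // len(treatments)
    allotment = treatments * reps        # each treatment replicated equally
    random.shuffle(allotment)            # the random assignment itself
    return {unit: trt for unit, trt in enumerate(allotment, start=1)}

random.seed(1)
layout = randomize(["A", "B", "C"], 12)  # hypothetical: 3 treatments, 12 plots
```

Shuffling a deck of numbered cards and `random.shuffle` do the same job: every ordering of the treatments over the units is equally probable.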
(2) Replication. The second principle of an experimental design is replication, which is a
repetition of the basic experiment. In other words, it is a complete run of all the
treatments to be tested in the experiment. In all experiments some variation is
introduced because the experimental units, such as individuals or plots of
land in agricultural experiments, cannot be physically identical. This type of variation can
be reduced by using a number of experimental units. We therefore perform the
experiment more than once, i.e., we repeat the basic experiment. An individual
repetition is called a replicate. The number, shape and size of replicates depend
upon the nature of the experimental material. Replication is used
(i) to secure a more accurate estimate of the experimental error, a term which represents
the differences that would be observed if the same treatments were applied several
times to the same experimental units;
(ii) to decrease the experimental error and thereby increase precision, which is
a measure of the variability of the experimental error; and
(iii) to obtain a more precise estimate of the mean effect of a treatment, since the
standard error of a treatment mean is σ/√r, where r denotes the number of replications.
(3) Local Control. It has been observed that all extraneous sources of variation
are not removed by randomization and replication. This necessitates a
refinement in the experimental technique. In other words, we need to choose a
design in such a manner that all extraneous sources of variation are brought
under control. For this purpose we make use of local control, a term referring to
the amount of balancing, blocking and grouping of the experimental units.
Balancing means that the treatments should be assigned to the experimental
units in such a way that the result is a balanced arrangement of the treatments.
Blocking means that like experimental units should be collected together to form
a relatively homogeneous group. A block is also a replicate. The main purpose of
the principle of local control is to increase the efficiency of an experimental
design by decreasing the experimental error. The point to remember here is that
the term local control should not be confused with the word control. The word
control in experimental design is used for a treatment which does not itself receive
any treatment, but which we need in order to find out the effectiveness of the other
treatments through comparison.
What is Hypothesis Testing?
A statistical hypothesis is an assumption about a population parameter. This
assumption may or may not be true. Hypothesis testing refers to the formal
procedures used by statisticians to accept or reject statistical hypotheses.
Statistical Hypotheses
The best way to determine whether a statistical hypothesis is true would be to examine
the entire population. Since that is often impractical, researchers typically examine a
random sample from the population. If the sample data are not consistent with the statistical
hypothesis, the hypothesis is rejected.
There are two types of statistical hypotheses.

Null hypothesis. The null hypothesis, denoted by H0, is usually the hypothesis
that sample observations result purely from chance.

Alternative hypothesis. The alternative hypothesis, denoted by H1 or Ha, is the
hypothesis that sample observations are influenced by some non-random cause.
For example, suppose our investigation is to test the yield of two varieties of cowpea:
Variety A
Variety B
The hypothesis statement for H0 is: yield of VA = VB, while Ha states that the yield of VA ≠
VB.
To test this we need to conduct an experiment to test the two hypotheses.
The two cowpea varieties have to be grown in an experimental field using proper experimental
procedures, an appropriate experimental design, agronomic practices etc., and
useful data collected on grain yield.
Having got the yield data, it has to be tested at an appropriate level of significance, t0.05
or t0.01.
Suppose the test of significance shows that VA produced a yield significantly higher or
lower than VB; we therefore have to reject H0 and accept Ha.
Hypothesis Tests
Statisticians follow a formal process to determine whether to reject a null hypothesis,
based on sample data. This process, called hypothesis testing, consists of four steps.

State the hypotheses. This involves stating the null and alternative hypotheses.
The hypotheses are stated in such a way that they are mutually exclusive. That
is, if one is true, the other must be false.

Formulate an analysis plan. The analysis plan describes how to use sample data
to evaluate the null hypothesis. The evaluation often focuses on a single test
statistic.

Analyze sample data. Find the value of the test statistic (mean score, proportion,
t-score, z-score, etc.) described in the analysis plan.

Interpret results. Apply the decision rule described in the analysis plan. If the
value of the test statistic is unlikely, based on the null hypothesis, reject the null
hypothesis.
Decision Errors
Two types of errors can result from a hypothesis test.

Type I error. A Type I error occurs when the researcher rejects a null hypothesis
when it is true. The probability of committing a Type I error is called the
significance level. This probability is also called alpha, and is often denoted by
α.
E.g., given two varieties of maize that both yield approx. 2000 kg/ha:
H0 states V1 = V2 (True);
Ha states V1 ≠ V2 (False).
We may conduct an experiment to collect data on their potential yields.
If the data conclude that V1 is higher than V2, we have committed a Type I
error. The truth is that both yield equally, which is what H0 states; rejecting this
true situation constitutes the error.

Type II error. A Type II error occurs when the researcher accepts a null
hypothesis that is false. The probability of committing a Type II error is called
beta, and is often denoted by β. The probability of not committing a Type II error
is called the power of the test.

E.g., given two varieties of maize where V1 yields approx. 2000 kg/ha and V2 yields
1000 kg/ha:

H0 states V1 = V2 (False);

Ha states V1 ≠ V2 (True).

We may conduct an experiment to collect data on their potential yields.

If the data conclude that the yield of V1 is equal to V2, we have committed a Type II
error: we have failed to reject a false H0. In other words, by accepting H0 we
committed a Type II error.
Assumptions for hypothesis testing

1. Random sampling: We assume that our samples were taken randomly,
to represent the population and without bias.
2. Homogeneous variance: In a situation where more populations are
involved, we assume that the populations have a common or homogeneous
variance. If we assume otherwise, we would not be able to compare
them, since the differences among them as indicated by the test value could
be due to differences among variances.
3. Normal distribution: We assume that the population(s) are normally
distributed.
Steps in testing a hypothesis:
1. State the hypothesis, not only in symbols but also in words.
2. State the assumptions about the experimental error: that it is normally
and independently distributed, with a mean of zero and a variance
equal to the population variance.
3. Select the sample size, n, knowing that the larger it is, the closer
the distribution is to the normal.
4. Set the level of significance, α, the risk, i.e. the probability of a Type I
error. Indicate whether the test is one-tailed or two-tailed.
5. Select the appropriate statistic, e.g. t-test, F-test, etc.
6. Determine the critical value. It is necessary to write down the critical
absolute value of t, F, or χ².
7. Collect your data and calculate your statistic.
8. Reject or accept H0.
9. Interpret the results in terms of the experimental materials.
10. Conclude and make recommendations where necessary.

T-testing

The t-test is used to compare two means, from either two populations or two sample
sets. There are about four different t-tests, and it is important to know which type to use
in any given situation.
Types of t-test
1. Sample versus a standard whose population mean μ and standard
deviation σ are known.
2. Sample versus a standard with known population mean μ and unknown
standard deviation σ.
3. Paired t-test, or test for paired observations.
4. Test for the difference between two means, or test for unpaired observations
(population mean μ and standard deviation σ unknown).
Calculations will be illustrated using the first three t-tests mentioned above.
1. Sample versus a standard whose population mean μ and standard
deviation σ are known.
Let us assume that the average yield of maize in Limpopo is 50 kg/ha with a
standard deviation of 8 kg. If the government encourages the use of improved
cultural practices, one may like to evaluate the farmers' progress after five years.
Suppose one takes a random sample of 64 farms and finds an average yield of
60 kg/ha.
Question: Has the activity of government made any significant impact on maize yield?
μ = 50 kg/ha
σ = 8 kg
State your hypotheses:
H0 states μ1 = μ2: the average yield of maize before and after the government campaign
did not differ.
Ha states μ1 ≠ μ2: the average yield of maize before and after the government campaign
did differ.
Given n = sample size = 64, α = 0.05, a two-tailed test, x̄ = 60 and μ = 50:
t = (x̄ − μ)/(σ/√n) = (60 − 50)/(8/√64) = 10/(1.0) = 10
Inference: look at the critical region set earlier.
Since t cal > t tab, we reject H0 with 5% risk of a Type I error and conclude that the mean
yield of maize in Limpopo after the government campaign was higher than the former average.
That is to say, the increase of 10 kg/ha (60 − 50) was more than what we can ascribe
or attribute to chance.
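As a quick check, the calculation for this example can be reproduced directly with the same numbers as above:

```python
from math import sqrt

# Limpopo maize example: mu = 50 kg/ha, sigma = 8 kg, n = 64 farms, x-bar = 60
mu, sigma, n, xbar = 50, 8, 64, 60
t = (xbar - mu) / (sigma / sqrt(n))    # (60 - 50) / (8 / 8)
```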
2. Sample versus a standard with known population mean μ and unknown
standard deviation σ.
Let us take the average yield of canola to be 10 t/ha. Suppose an agronomist
introduces a new cultural technique and wants to evaluate it against the original
to determine whether there is a yield advantage.
Assume the average yield of canola, determined over several years and locations, to be
10 t/ha. The standard deviation is not known.
Given the data below:

Trial:        1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16
Yield (t/ha): 8  10 12 11 9  14 8  10 11 13 9  8  10 12 13 12

To solve this we have to estimate the population σ from the sample by calculating s, the
sample standard deviation, since σ is not known.
t-test: t = (x̄ − μ)/(s/√n)
H0 states μ1 = μ2: the average yields of canola under the conventional and new
practices do not differ.
Ha states μ1 ≠ μ2: the average yields of canola under the
conventional and new practices do differ.
Given n = sample size = 16,
α = 0.05; the test is two-tailed, so we use 0.025.
Decision rule: t tab 0.025 (n − 1) = 2.131.
Reject H0 if t cal > t tab.
Calculations
First calculate s, the sample standard deviation, since σ is not known.
SS = 8² + 10² + … + 12² − (170)²/16 = 1862 − 1806.25 = 55.75
s² = 55.75/15 = 3.72
s = √3.72 = 1.93
x̄ = 170/16 = 10.63, μ = 10
t cal = (x̄ − μ)/(s/√n) = (10.63 − 10)/(1.93/√16) = 0.63/0.482 = 1.30
Inference
t cal = 1.30
t tab = 2.131
Since t cal < t tab, we fail to reject H0; in other words, we accept H0 and conclude that
the influence of the two cultural practices on canola yield does not differ.
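A minimal sketch reproducing this calculation from the trial data, using Python's standard `statistics` module (whose `stdev` divides by n − 1, as required for the sample standard deviation):

```python
from math import sqrt
from statistics import mean, stdev

yields = [8, 10, 12, 11, 9, 14, 8, 10, 11, 13, 9, 8, 10, 12, 13, 12]
mu = 10                                # long-run canola average (t/ha)
n = len(yields)
xbar = mean(yields)                    # 170 / 16 = 10.625
s = stdev(yields)                      # sample SD (n - 1 in the denominator)
t_cal = (xbar - mu) / (s / sqrt(n))
t_tab = 2.131                          # t(0.025, df = 15) from the table
reject = t_cal > t_tab                 # False: fail to reject H0
```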
3. Paired t-test, or test for paired observations.
We use this t-test when the observations are paired on the basis of time or stage of
data collection.
An agronomist may be interested in comparing the leaf areas of cowpea grown
under intercropping with maize and under pure stands, sampled once a
week for 10 weeks.

Week   Leaf area (cm²)                  Difference (d)
       Pure stand      Intercrop
1      2               4                2
2      3               5                2
3      4               6                2
4      5               8                3
5      6               9                3
6      7               10               3
7      8               12               4
8      9               10               1
9      10              11               1
10     11              13               2
Total  65              88               23
Mean   6.5             8.8              2.3

The criterion for pairing is week.
t cal = (d̄ − d0)/s(d̄), where
d̄ = mean of the differences (2.3),
d0 = population mean difference = 0, and
s(d̄) = sd/√n = standard error of the differences.
H0 states d̄ = 0: the average leaf areas of cowpea in pure stand and intercrop do
not differ.
Ha states d̄ ≠ 0: the average leaf areas of cowpea in pure stand and intercrop
do differ.
Decision rule: t tab 0.025 (n − 1) = t tab 0.025 (9) = 2.262.
Reject H0 if t cal > t tab.
t = (d̄ − d0)/s(d̄)
First calculate the standard error of the differences:
SS = 2² + 2² + … + 2² − (23)²/10 = 61 − 52.9 = 8.1
sd² = 8.1/9 = 0.90
sd = √0.90 = 0.95
s(d̄) = sd/√n = 0.95/√10 = 0.30
Since d̄ = 2.3, t cal = (2.3 − 0)/0.30 = 7.67
Inference: t cal = 7.67
t tab = 2.262
Since t cal > t tab, we reject H0 at 0.05 and conclude that the leaf area of cowpea
differed significantly with cropping system.
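Computing the paired test directly from the leaf-area table gives the same verdict; this sketch uses the standard library only:

```python
from math import sqrt
from statistics import mean, stdev

pure  = [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]     # weekly leaf areas, pure stand
inter = [4, 5, 6, 8, 9, 10, 12, 10, 11, 13]  # weekly leaf areas, intercrop
d = [i - p for p, i in zip(pure, inter)]     # paired weekly differences
n = len(d)
d_bar = mean(d)                              # mean difference
se = stdev(d) / sqrt(n)                      # standard error of the differences
t_cal = (d_bar - 0) / se                     # H0: mean difference = 0
t_tab = 2.262                                # t(0.025, df = 9)
```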
Decision Rules
The analysis plan includes decision rules for rejecting the null hypothesis. In practice,
statisticians describe these decision rules in two ways: with reference to a P-value or
with reference to a region of acceptance.

P-value. The strength of evidence in support of a null hypothesis is measured by
the P-value. Suppose the test statistic is equal to S. The P-value is the
probability of observing a test statistic as extreme as S, assuming the null
hypothesis is true. If the P-value is less than the significance level, we reject the
null hypothesis.

Region of acceptance. The region of acceptance is a range of values. If the test
statistic falls within the region of acceptance, the null hypothesis is not rejected.
The region of acceptance is defined so that the chance of making a Type I error
is equal to the significance level.
The set of values outside the region of acceptance is called the region of
rejection. If the test statistic falls within the region of rejection, the null
hypothesis is rejected. In such cases, we say that the hypothesis has been
rejected at the α level of significance.
These approaches are equivalent. Some statistics texts use the P-value approach;
others use the region of acceptance approach. In subsequent lessons, this tutorial will
present examples that illustrate each approach.
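The equivalence of the two decision rules can be illustrated for a z statistic using `statistics.NormalDist`; the observed value z = 2.10 is hypothetical:

```python
from statistics import NormalDist

alpha = 0.05
z = 2.10                                   # hypothetical observed z statistic

p_value = 2 * (1 - NormalDist().cdf(z))    # two-tailed P-value
reject_by_p = p_value < alpha

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # acceptance region: |z| <= 1.96
reject_by_region = abs(z) > z_crit
```

The two rules always agree: the P-value falls below α exactly when the statistic falls outside the region of acceptance.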
One-Tailed and Two-Tailed Tests
A test of a statistical hypothesis where the region of rejection is on only one side of the
sampling distribution is called a one-tailed test. For example, suppose the null
hypothesis states that the mean is less than or equal to 10. The alternative hypothesis
would be that the mean is greater than 10. The region of rejection would consist of a
range of numbers located on the right side of the sampling distribution; that is, a set
of numbers greater than 10.
A test of a statistical hypothesis where the region of rejection is on both sides of the
sampling distribution is called a two-tailed test. For example, suppose the null
hypothesis states that the mean is equal to 10. The alternative hypothesis would be that
the mean is less than 10 or greater than 10. The region of rejection would consist of a
range of numbers located on both sides of the sampling distribution; that is, the region of
rejection would consist partly of numbers that were less than 10 and partly of numbers
that were greater than 10.
Unknown Population Values
When we are testing a hypothesis we usually don't know the parameters of the
population.
That is, most of the time we don't know the mean and standard deviation of an entire
population. So the t-test is exactly like the z-test computationally, but instead of using the
standard deviation from the population we use the standard deviation from the sample.
The formula is:
t = (X̄ − μ)/s(X̄), where s(X̄) = s/√n
The standard deviation from the sample (s), when used to estimate a population in this
way, is computed differently than the standard deviation of the population. Recall that
the sample standard deviation is "s" and is computed with n − 1 in the denominator (see
the prior lesson). Most of the time you will be given this value, but in the homework packet
there are problems where you must compute it yourself.
The t-distribution
There are several conceptual differences when the statistic uses the standard deviation
from the sample instead of the population. When we use a sample to estimate the
population, the sample will be much smaller than the population. Because of this, the
distribution will not be as regular or "normal" in shape. It will tend to be flatter and more
spread out than the population distribution, and so not as "normal" in shape as a larger
set of values would yield. In fact, the t-distribution is a family of distributions (like the
z-distribution) that vary as a function of sample size. The larger the sample size, the more
normal in shape the distribution will be. Thus, the critical value that cuts off 5% of the
distribution will be different than for the z-score. Since the distribution is more spread
out, a higher value on the scale will be needed to cut off just 5% of the distribution.
The practical result of doing a t-test is that 1) there is a difference in the formula
notation, and 2) the critical values will vary depending on the size of the sample we are
using. Thus, all the steps you have already learned stay the same, but when
the problem gives the standard deviation from the sample (s) instead of the population
(σ), you write the formula with "t" instead of "z", and you use a different table to find the
critical value.
The t-table
Critical values for the t-test will vary depending on the sample size we are using, as
usual on whether the test is one-tailed or two-tailed, and on the alpha level. These critical
values are in the appendices at the back of your book (see page 27 in your text). Notice that
we have one- and two-tail columns at the top and degrees of freedom (df) down the side.
Degrees of freedom are a way of accounting for the sample size. For this test, df = n − 1.
Cross-index the correct column with the degrees of freedom you compute. Note that this
is a table of critical values rather than a table of areas like the z-table.
Also note that as n approaches infinity, the t-distribution approaches the z-distribution.
If you look at the bottom row (at the infinity symbol) you will see all the critical values
for the z-test we learned on the last exam.
Confidence Intervals
If we reject the null with our hypothesis test, we can compute a confidence interval.
Confidence intervals are a way to estimate the parameters of the unknown population.
Since our decision to reject the null means that there are two populations instead of just
the one we know about, confidence intervals give us an idea about the mean of the new,
unknown population.
See the Confidence Interval demonstration on the web page or click here
http://faculty.uncfsu.edu/dwallace/sci.html for the rest of the lesson.
Difference Between Z-test and T-test
Z-test vs T-test
Sometimes, measuring every single item is just not practical. That is why we
developed and use statistical methods to solve problems. The most practical way to do
it is to measure just a sample of the population. Some methods test hypotheses by
comparison. Two of the better-known statistical hypothesis tests are the t-test and the
z-test. Let us try to break down the two.
The t-test is a statistical hypothesis test. In such a test, the test statistic follows a Student's
t-distribution if the null hypothesis is true. The t-statistic was introduced by W.S.
Gossett under the pen name "Student". The t-test is also referred to as the "Student
t-test". It is very likely that the t-test is the most commonly used statistical data analysis
procedure for hypothesis testing, since it is straightforward and easy to use. Additionally,
it is flexible and adaptable to a broad range of circumstances.
There are various t-tests, and the two most commonly applied are the one-sample and
paired-sample t-tests. One-sample t-tests are used to compare a sample mean with
a known population mean. Two-sample t-tests, on the other hand, are used to compare
either independent samples or dependent samples.
The t-test is best applied, at least in theory, when you have a limited sample size (n < 30),
as long as the variables are approximately normally distributed and the variation of scores
in the two groups is not reliably different. It is also appropriate when you do not know the
population standard deviation. If the standard deviation is known, it would be
best to use another type of statistical test, the z-test. The z-test is also applied to
compare sample and population means to see whether there is a significant difference
between them. Z-tests always use the normal distribution and are ideally applied when the
standard deviation is known. Z-tests are often applied when certain conditions are met;
otherwise, other statistical tests like t-tests are applied in substitute. Z-tests are often
applied to large samples (n > 30). When the t-test is used with large samples, it
becomes very similar to the z-test. There are fluctuations that may occur in t-test
sample variances that do not exist in z-tests. Because of this, there are differences in
the two tests' results.
Summary:
1. The z-test is a statistical hypothesis test that follows a normal distribution, while the t-test
follows a Student's t-distribution.
2. The t-test is appropriate when you are handling small samples (n < 30), while the z-test is
appropriate when you are handling moderate to large samples (n > 30).
3. The t-test is more adaptable than the z-test, since the z-test will often require certain conditions
to be met to be reliable. Additionally, the t-test has variants that will suit most needs.
4. T-tests are more commonly used than z-tests.
5. Z-tests are preferred to t-tests when the standard deviations are known.
B. Weaver (27-May-2011), z- and t-tests
Hypothesis Testing Using z- and t-tests
In hypothesis testing, one attempts to answer the following question: If the null
hypothesis is assumed to be true, what is the probability of obtaining the observed result,
or any more extreme result that is favourable to the alternative hypothesis?¹ In order to
tackle this question, at least in the context of z- and t-tests, one must first understand two
important concepts: 1) sampling distributions of statistics, and 2) the central limit theorem.
Sampling Distributions
Imagine drawing (with replacement) all possible samples of size n from a population,
and for each sample calculating a statistic, e.g., the sample mean. The frequency distribution
of those sample means would be the sampling distribution of the mean (for samples of size n
drawn from that particular population).
Normally, one thinks of sampling from relatively large populations, but the concept of a
sampling distribution can be illustrated with a small population. Suppose, for example,
that our population consisted of the following 5 scores: 2, 3, 4, 5, and 6. The population
mean μ = 4, and the population standard deviation (dividing by N) σ = 1.414.
If we drew (with replacement) all possible samples of n = 2 from this population, we would
end up with the 25 samples shown in Table 1.
Table 1: All possible samples of n = 2 from a population of 5 scores.

Sample #  First  Second  Sample | Sample #  First  Second  Sample
          Score  Score   Mean   |           Score  Score   Mean
1         2      2       2      | 14        4      5       4.5
2         2      3       2.5    | 15        4      6       5
3         2      4       3      | 16        5      2       3.5
4         2      5       3.5    | 17        5      3       4
5         2      6       4      | 18        5      4       4.5
6         3      2       2.5    | 19        5      5       5
7         3      3       3      | 20        5      6       5.5
8         3      4       3.5    | 21        6      2       4
9         3      5       4      | 22        6      3       4.5
10        3      6       4.5    | 23        6      4       5
11        4      2       3      | 24        6      5       5.5
12        4      3       3.5    | 25        6      6       6
13        4      4       4      |

Mean of the sample means = 4.000
SD of the sample means = 1.000
(SD calculated with division by N)

¹ That probability is called a p-value. It is really a conditional probability: it is
conditional on the null hypothesis being true.
The 25 sample means from Table 1 are plotted below in Figure 1 (a histogram). This
distribution of sample means is called the sampling distribution of the mean for samples
of n = 2 from the population of interest (i.e., our population of 5 scores).
Figure 1: Sampling distribution of the mean for samples of n = 2 from a population of
N = 5.
I suspect the first thing you noticed about this figure is that it is peaked in the middle and
symmetrical about the mean. This is an important characteristic of sampling distributions,
and we will return to it in a moment.
You may have also noticed that the standard deviation reported in the figure legend is
1.02, whereas I reported SD = 1.000 in Table 1. Why the discrepancy? Because I used the
population SD formula (with division by N) to compute SD = 1.000 in Table 1, but SPSS
used the sample SD formula (with division by n − 1) when computing the SD it plotted
alongside the histogram. The population SD is the correct one to use in this case, because
I have the entire population of 25 samples in hand.
The Central Limit Theorem (CLT)
If I were a mathematical statistician, I would now proceed to work through derivations,
proving the following statements:
[Figure 1 histogram: Distribution of Sample Means ("Sampling Distribution of the Mean");
Mean = 4.00, Std. Dev = 1.02, N = 25.]
1. The mean of the sampling distribution of the mean = the population mean.
2. The SD of the sampling distribution of the mean = the standard error (SE) of the
mean = the population standard deviation divided by the square root of the sample size.
Putting these statements into symbols:
μ(X̄) = μ (the mean of the sample means = the population mean) (1.1)
σ(X̄) = σ/√n (the SE of the mean = the population SD over the square root of n) (1.2)
But alas, I am not a mathematical statistician. Therefore, I will content myself with telling
you that these statements are true (those of you who do not trust me, or are simply curious,
may consult a mathematical stats textbook), and with pointing to the example we started this
chapter with.
For that population of 5 scores, μ = 4 and σ = 1.414. As shown in Table 1, μ(X̄) = μ = 4,
and σ(X̄) = 1.000. According to equation (1.2), if we divide the population SD by the square
root of the sample size, we should obtain the standard error of the mean. So let's give it a try:
σ/√n = 1.414/√2 = 0.9998 ≈ 1 (1.3)
When I performed the calculation in Excel and did not round σ to 3 decimals, the
solution worked out to 1 exactly. In the Excel worksheet that demonstrates this, you may
also change the values of the 5 population scores, and you should observe that
σ(X̄) = σ/√n for any set of 5 scores you choose. Of course, these demonstrations do not
prove the CLT (see the aforementioned math-stats books if you want proof), but they should
reassure you that it does indeed work.
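The Table 1 demonstration is easy to replicate without Excel; this sketch enumerates all 25 samples of n = 2 and checks both CLT statements:

```python
from itertools import product
from statistics import mean, pstdev

population = [2, 3, 4, 5, 6]
samples = list(product(population, repeat=2))  # all 25 samples of n = 2
sample_means = [mean(s) for s in samples]

mu = mean(population)                  # population mean = 4
sigma = pstdev(population)             # population SD, division by N (= 1.414...)
se_direct = pstdev(sample_means)       # SD of the 25 sample means
se_formula = sigma / 2 ** 0.5          # sigma / sqrt(n) with n = 2
```

Changing `population` to any other set of 5 scores leaves the two standard errors equal, as the CLT promises.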
What the CLT tells us about the shape of the sampling distribution
The central limit theorem also provides us with some very helpful information about the
shape of the sampling distribution of the mean. Specifically, it tells us the conditions under
which the sampling distribution of the mean is normally distributed, or at least approximately
normal, where "approximately" means close enough to treat as normal for practical purposes.
The shape of the sampling distribution depends on two factors: the shape of the population
from which you sampled, and the sample size. I find it useful to think about the two extremes:
1. If the population from which you sample is itself normally distributed, then the sampling
distribution of the mean will be normal, regardless of sample size. Even for sample size =
1, the sampling distribution of the mean will be normal, because it will be an exact copy of
the population distribution.
2. If the population from which you sample is extremely non-normal, the sampling
distribution of the mean will still be approximately normal given a large enough sample
size (e.g., some authors suggest sample sizes of 300 or greater).
So the general principle is that the more the population shape departs from normal, the
greater the sample size must be to ensure that the sampling distribution of the mean is
approximately normal. This tradeoff is illustrated in the following figure, which uses colour
to represent the shape of the sampling distribution (purple = non-normal, red = normal,
with the other colours representing points in between).
Does n have to be ≥ 30?
Some textbooks say that one should have a sample size of at least 30 to ensure that
the sampling distribution of the mean is approximately normal. The example we started
with (i.e., samples of n = 2 from a population of 5 scores) suggests that this is not correct
(see Figure 1). Here is another example that makes the same point. The figure on the left,
which shows the age distribution for all students admitted to the Northern Ontario School
of Medicine in its first 3 years of operation, is treated as the population. The figure on the
right shows the distribution of means for 10,000 samples of size 16 drawn from that
population. Notice that despite the severe positive skew in the population, the distribution
of sample means is near enough to normal for the normal approximation to be useful.
Field Layout and Experimentation
Field layout depends on:
1. Land availability. The land must be sufficient to contain the trial under
investigation.
2. The experimental design to use.
3. The slope.
4. The number of treatments to investigate.
Blocking
Blocking depends on:
1. The type of design to use.
2. The slope of the land. Block across the slope and not along the slope in the
case of an RCBD.
Marking
Requirements:
Rope,
Measuring tape, and
T-markers.
Procedure:
1. Use the rope to align the plots on a straight line and use the T-markers to
demarcate dimensions at the end of each replication or plot.
2. Mark out the length and breadth of each plot with tape and rope.
3. Make provision for alleyways between plots and replications. The amount of
alleyway permissible depends on the size of the plot and the treatments under
investigation. Soil fertility trials would need wider alleyways to avoid or reduce
drift from one adjacent plot to another.
SAMPLING TECHNIQUES
Why do we sample?
Practically, it is impossible to collect data or information from all the items in a population, due to
the cost, labour, energy or other resources that would be needed. We need to collect a sample
that represents the population. Therefore an appropriate sampling technique must be used.
Sampling considerations
The sample size must be large enough to represent the population mean
A representative sample must be collected
The sample must not be biased
Sampling techniques
The most commonly used techniques in agricultural experimentation include:
Random sampling
Systematic sampling
Stratified sampling
Cluster sampling
Random sampling
In a simple random sample, every item in the population has an equal chance of being sampled.
This minimises bias and simplifies the analysis of results. In practice, random numbers are
generated and assigned to the units in the population before sampling is carried out. The random
numbers can be generated from, for example, MS Excel. It is the least biased of all sampling
techniques: the population is not partitioned and no restrictions are imposed.
Advantages:
Can be used with large populations
Avoids bias
Disadvantages:
Samples that do not represent the population can be selected by chance.
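As a minimal sketch (not part of the original notes), simple random sampling of numbered plots can be done with Python's standard library; the plot numbers, sample size and seed below are hypothetical:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units so that every unit has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical example: select 5 of 50 numbered field plots.
chosen = simple_random_sample(range(1, 51), 5, seed=42)
```

Seeding the generator makes the otherwise random selection reproducible, which helps when a randomisation plan must be documented.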
Systematic sampling
This involves arranging the study population according to some order and selecting sample units at regular
intervals through the population. Systematic sampling involves a random start and then proceeds with the
selection of sample units at a regular interval.
Advantages:
It is more straightforward than random sampling
Disadvantages:
It is more biased, as non-representative samples can be selected
S#"a#i*ied samplin
This method is used #hen the population or sampling is in categories or strata. The population to be sampled is mapped into distinct
classes or strata. %ach stratum is sampled using any of the techniques mentioned above. The number sampled in each group should
be in proportion to its *no#n size in the parent population. +or example to sample a cereal silo for storage beetle, one may have to
divide the silo into top, medium and bottom strata. "n each stratum, random sampling technique can be used to collect the samples.
'dvantages:
"t can be used #ith random or systematic sampling
it can generate results #hich are more representative of the #hole population
"t is very flexible and easy to use
!orrelations and comparisons can be made bet#een strata
)isadvantages
,. Requires selection of relevant stratification variables #hich can be difficult.
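Proportional allocation across strata, with simple random sampling within each stratum, can be sketched as below; the silo strata and their sizes are hypothetical:

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    pop_size = sum(len(s) for s in strata.values())
    out = {}
    for name, units in strata.items():
        n = round(total_n * len(units) / pop_size)  # proportional allocation
        out[name] = rng.sample(list(units), n)      # random within stratum
    return out

# Hypothetical silo divided into strata of 40, 60 and 100 sampling units.
silo = {"top": range(0, 40), "middle": range(40, 100), "bottom": range(100, 200)}
samples = stratified_sample(silo, 20, seed=7)
```

With these sizes, the 20 samples are split 4 / 6 / 10 across the three strata, in proportion to stratum size.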
Introduction to Single-Factor Experiments
Knowledge of experimental design is necessary for the selection of simple designs that give control
of variability and enable the researcher to attain the required precision. We have already
discussed certain factors which are important in selecting an experimental design. The three
most important among these are:
type and number of treatments,
degree of precision desired,
size of uncontrollable variations.
We generally classify scientific experiments into two broad categories, namely, single-factor
experiments and multi-factor experiments. In a single-factor experiment, only one factor varies
while others are kept constant. In these experiments, the treatments consist solely of different
levels of the single variable factor. Our focus in this section is on single-factor experiments.
In multi-factor experiments (also referred to as factorial experiments), two or more factors vary
simultaneously. The experimental designs commonly used for both types of experiments are
classified as:
Complete Block Designs
- completely randomised (CRD)
- randomised complete block (RCB)
- latin square (LS)
Incomplete Block Designs
- lattice
- group balanced block
In a complete block design, each block contains all the treatments, while in an incomplete block
design not all treatments may be present. Complete block designs are suited for a small
number of treatments, while incomplete block designs are used when the number of treatments
is large.
7.4.2 Complete Block Designs
We will discuss here three basic designs which come under the category of complete block
designs, namely CRD, RCB, and LS.
The layout of the designs will be illustrated with the example of a modified research protocol on
the 'Evaluation of four Gliricidia accessions in intensive food production' (Atta-Krah, pers.
comm.). The objective of the protocol is to evaluate top potential Gliricidia accessions under
intensive feed garden conditions. The plot size is 8 x 5 m, with 3 rows or columns of an
accession in each plot. The available area is capable of containing a maximum of 16 plots.
Completely Randomised Design (CRD)
This is the simplest design. In CRD, each experimental unit has an equal chance of receiving a
certain treatment. The completely randomised design for p treatments with r replications will
have rp plots. Each of the p treatments is assigned at random to a fraction of the plots (r/rp),
without any restriction. As stated above, if we have four Gliricidia accessions designated as A,
B, C and D and we evaluate them using four replications in CRD, it is quite likely that any one
of the accessions, say A, may occupy the first four plots of the 16 plots, as illustrated in the
following hypothetical layout:
A A A A
B C C D
D B C B
D C B D
A useful assumption for the application of this design is homogeneity of the land or among the
experimental materials. This design is rarely used in most trials involving woody vegetation, but
could be used under laboratory and possibly greenhouse conditions.
The total source of variation (error) is made up of differences between treatments and within
treatments.
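The unrestricted randomisation of a CRD can be sketched in a few lines of Python (an illustrative sketch, not part of the original protocol; the accession labels A-D, four replications and the seed are taken as assumptions):

```python
import random

def crd_layout(treatments, reps, seed=None):
    """Assign each treatment to `reps` plots completely at random (CRD)."""
    rng = random.Random(seed)
    plots = [t for t in treatments for _ in range(reps)]  # rp plot labels
    rng.shuffle(plots)     # no restriction: any allotment is equally likely
    return plots           # plots[i] is the treatment on plot i

# Four accessions with four replications -> 16 plots.
layout = crd_layout(["A", "B", "C", "D"], 4, seed=3)
```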
Randomised Complete Block Design (RCBD)
One possibility that could arise in the design or layout of alley farming trials is differences in the
cultural practices or crop-rotation history of the portions of land available for the study.
Alternatively, there could be a natural fertility gradient or, in the case of pest studies, differences
in prevailing wind direction. If any of these heterogeneities are known to exist, one can classify or
group the area into large homogeneous units, called blocks, to which the treatments can then be
applied by randomization.
The Randomised Complete Block Design (RCBD) is characterized by the presence of equally sized
blocks, each containing all of the treatments. The randomised block design for p treatments with
r replications has rp plots arranged into r blocks, with p plots in each block. Each of the p
treatments is assigned at random to one plot in each block. The allocation of a treatment in a
block is done independently of the other blocks.
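The independent within-block randomisation just described can be sketched as follows (illustrative Python; the seed and labels are hypothetical):

```python
import random

def rcbd_layout(treatments, blocks, seed=None):
    """Randomise the full set of treatments independently within each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(blocks):
        block = list(treatments)
        rng.shuffle(block)      # randomisation is independent per block
        layout.append(block)
    return layout

# Four accessions in four blocks (e.g. one block per cropping history).
plan = rcbd_layout(["A", "B", "C", "D"], 4, seed=11)
```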
A layout for 16 accession plots, grouped in 4 blocks, may be as follows:
PREVIOUS CROPPING HISTORY   BLOCK   ACCESSIONS
Fallow                        1     A C B D
Maize                         2     A B D C
Gmelina                       3     B D A C
Maize/Gmelina                 4     B C A D
The arrangement of blocks does not have to be in a square. The above arrangement can also
be placed as follows:
A C B D   A B D C   B D A C   B C A D
ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ ZZ
where ZZ represents 3 columns or rows of an accession.
The actual field plot arrangement, with three columns of each accession for the first two blocks,
could be as follows:
<-----BLOCK 1-----> <-----BLOCK 2----->
a a a c c c b b b d d d a a a b b b d d d c c c
a a a c c c b b b d d d a a a b b b d d d c c c
a a a c c c b b b d d d a a a b b b d d d c c c
a a a c c c b b b d d d a a a b b b d d d c c c
a a a c c c b b b d d d a a a b b b d d d c c c
a a a c c c b b b d d d a a a b b b d d d c c c
The total source of variation may be categorized as differences between blocks, differences
between treatments, and the interaction between blocks and treatments. The latter is usually taken
as the error term for testing differences in treatments.
The Randomized Complete Block Design (RCB) is the most commonly used, particularly
because of its flexibility and robustness. However, it becomes less efficient as the number of
treatments increases, mainly because block size increases in proportion to the number of
treatments. This makes it difficult to maintain homogeneity within a block.
In RCB, missing plots (values) leading to unbalanced designs were problematic at one time.
However, this is not much of a problem now due to the availability of improved estimation
methods, for example the use of generalized linear models. For situations with fewer than three
missing values, one can still use the traditional computational procedure of the RCB design.
Latin Square Design (LS)
The Randomised Complete Block design is useful for eliminating the contribution of one source
of variation only. In contrast, the Latin Square Design can handle two sources of variation
among experimental units. In a Latin Square Design, every treatment occurs only once in each row
and each column. In the previous example, cropping history was the only source of variation in
the four large blocks. Suppose that, in addition to this, we have a fertility gradient at right angles
to the 'cropping history'.
One may tackle this problem by using a Latin Square Design. Each treatment (in this case, the
Gliricidia accessions) is applied in 'each' cropping history as well as in 'each' fertility gradient. In
our example, restriction on space allows us to have a maximum of only 16 plots, when, say, 64
might have been ideal. The randomization process has to be performed in such a way that each
accession appears once, and only once, in each row (cropping history) and in each column
(fertility gradient). The layout will be as follows:
CROPPING        FERTILITY GRADIENT
HISTORY         1  2  3  4
Fallow          C  B  D  A
Maize           B  D  A  C
Gmelina         A  C  B  D
Maize/Gmelina   D  A  C  B
The four blocks correspond to the four different cropping histories. The Latin Square (LS) design
thus minimises the effect of differences in fertility status within each block. The total sources of
variation are made up of row, column and treatment differences, and experimental error.
For field trials, the plot layout must be a square. This condition imposes a severe restriction on
the site as well as on the number of treatments that can be handled at any one time. However,
the principle can be extended to animal experimentation, where a physically square arrangement
does not necessarily exist. For instance, if the intention is to assess the nutritional effects of the
accessions when fed to animals, the latter could be divided into four age and four size classes.
The LS arrangement will thus be used to ensure that each age class and size class receives
one and only one of each accession type.
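A randomised Latin square of this kind can be generated programmatically. The sketch below (illustrative Python, not from the original notes) builds a cyclic square and then shuffles its rows and columns, which preserves the once-per-row, once-per-column property:

```python
import random

def latin_square(treatments, seed=None):
    """Random Latin square: every treatment once per row and per column."""
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic base square: row i is the treatment list rotated by i.
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    rng.shuffle(square)                         # permute rows
    cols = list(range(n))
    rng.shuffle(cols)                           # permute columns
    return [[row[c] for c in cols] for row in square]

sq = latin_square(["A", "B", "C", "D"], seed=5)
```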
The LS design can be replicated, leading to what is commonly referred to as 'Replicated Latin
Squares'. These Latin squares may be linked as shown below:
CROPPING   FERTILITY GRADIENT
HISTORY    C B D A    D C B A
           A C B D    A B D C
           B D A C    C D A B
           D A C B    B A C D
In the case above, the two squares have the same set of rows (cropping histories),
leading to an increased number of degrees of freedom for the error term. The rows are said to be
linked. If, on the other hand, the rows are not linked, a 'Rows Within Squares' source of variability
replaces the ordinary 'Row' source of variation.
An additional restriction (source of variation) imposed on a basic LS design would lead to what
is called a 'Graeco-Latin Square Design'.
7.4.3 Incomplete Block Designs
One precondition for both the RCB and LS designs is that all treatments must appear in all
blocks and all rows (for RCB) or columns (for LS). Sometimes, with a large number of treatments
(say 20 accessions), each requiring relatively large plot sizes, this condition may not be
practicable. The Latin Square and RCB then fail to reduce the effect of heterogeneity. Designs
in which the blocking principle is followed but the condition of having all the treatments in all
blocks is not met are called Incomplete Block designs. In Incomplete Block situations, the use of
several small blocks with fewer treatments results in gains in precision, but at the expense of a
loss of information on comparisons within blocks. The analysis of data for incomplete block
designs is more complex than for RCB and LS. Thus, where computation facilities are limited,
incomplete block designs should be considered a last resort.
Among incomplete block designs, lattice designs are commonly used in species and variety
testing. These are more complex designs beyond the scope of this paper, but they are covered in a
number of textbooks cited at the end of this paper. It is always advisable to consult a statistician
when using incomplete block designs.
7.5 Experimental designs: multi-factor experiments
7.5.1 Factorial Treatments
7.5.2 Nested Treatments/Nested Designs
7.5.3 Nested-Factorial Treatments
7.5.4 Split-Plot Arrangement
7.5.5 Multi-Factor, Incomplete Block Designs
We have so far concentrated on only one factor (i.e., one accession or other treatment).
However, more than one factor will often need to be studied simultaneously. Such experiments
are known as factorial experiments. The treatments in factorial experiments consist of two or
more levels of each of two or more factors of production.
7.5.1 Factorial Treatments
Suppose we are interested in studying the yield of an agricultural crop in an alley farm where
four different leguminous tree species and three cultural methods are of interest. The
leguminous tree species could be Acacia sp, Cassia sp, Leucaena sp, and Gliricidia sp.
The cultural treatments could include two weedings, one weeding and no weeding; the
agricultural crop is maize planted between hedgerows of the same tree species. For a complete
factorial set of treatments, each level of each factor must occur together with each level of
every other factor. Thus, in the present case, we ensure that each cultural method is applied to
each tree species. Since there are 4 species and 3 cultural methods, the total number of
treatments will equal 12. In reality, what we have here is 12 treatments, with each treatment
being made up of 2 factors having 4 and 3 levels, respectively. One might say, in this case, that
the factors are crossed.
This is not an 'experimental design' but rather a 'treatment design', because the 12 treatment
combinations could be applied to any of the designs discussed previously. If we take the
simplest design, the unrestricted (completely) randomized design, and four replications, then the
conduct of an experiment with 4 leguminous species and 3 cultural methods will imply the
randomization of 12 treatments in 48 plots. If it is a block design, we will have to ensure that
each of the 12 treatments appears in all the blocks.
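The complete set of 4 x 3 treatment combinations can be enumerated directly (an illustrative Python sketch; the level names follow the example in the text):

```python
from itertools import product

species = ["Acacia", "Cassia", "Leucaena", "Gliricidia"]
weeding = ["two weedings", "one weeding", "no weeding"]

# A complete factorial crosses every level of one factor with every
# level of the other: 4 species x 3 cultural methods = 12 treatments.
treatments = list(product(species, weeding))
n_plots = len(treatments) * 4   # with 4 replications: 48 plots
```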
The advantages of the factorial arrangement are many. One major advantage is the reduction in
the number of experiments; a second is the possibility of studying the interactions among the
various factors. A significant interaction implies that the effect of changes in one factor may
depend on the level of the other factor. If this happens, interpretation of the results has to be done
cautiously to avoid inaccurate general statements about the individual factors.
7.5.2 Nested Treatments/Nested Designs
The situation discussed above can be extended to two or more locations, and the results
combined using the Combined Analysis Procedure. However, it does at times happen that
species may be location specific, in which case the 4 leguminous tree species utilised in a
particular location may not be suitable at other locations. One approach would then be to use 4
different species in each location. Or, a particular tree species may not appear in all the
locations. This structure of treatments falls under the category of Nested Designs (or better,
Nested Treatments). The tree species are said to be nested in locations, not crossed as in a
factorial treatment. It is necessary to emphasize that this nested-treatment arrangement can be
applied to any of the basic designs, such as CRD, RCB and LS.
7.5.3 Nested-Factorial Treatments
This type of treatment arrangement is followed when some factors in the same experiment are
crossed (as in factorial treatments) while others are nested. For instance, if we impose three
fertilizer levels on the trees nested in the example above, a nested-factorial treatment
arrangement is obtained, provided the same fertilizer levels are used for all trees and locations.
7.5.4 Split-Plot Arrangement
Split-plot experiments are factorial experiments in which the levels of one factor, for example
tree species, are assigned at random to large plots. The large plots are then divided into small
plots known as 'sub-plots' or 'split plots', and the levels of the second factor, say cultural
practices, are assigned at random to the small plots within the large plots.
This arrangement is often useful when we wish to combine certain treatments (as in factorial
and nested arrangements), some of which require larger plots than others for practical and
administrative convenience. Examples are situations requiring the spraying of insecticides,
irrigation, tillage trials, etc. Usually, the treatment on which maximum information is desired is
placed in the split plot, that is, in the smallest plot.
It is important to emphasize that the split plot is not a design as such but rather refers to the
manner in which treatments are allocated to the plots. A split-plot arrangement in an RCB
design will usually have two error terms: one for testing the treatments in the large plots (not
efficient) and the other for the sub-plot treatments and interactions (very efficient).
A split-plot design can be further extended to accommodate a third factor through division of each
sub-plot into sub-sub-plots. This is then called a split-split-plot arrangement.
7.5.5 Multi-Factor, Incomplete Block Designs
Although factorial experiments provide opportunities to examine interactions among various
factors, they are difficult to conduct when the number of factors and their levels is large.
Consider a situation involving 3 factors, each of which has 4 levels, making a total of 4^3 or 64
treatment combinations. The conduct of this experiment would require very large blocks if we
employed a randomised block design. Obviously, in field plot experimentation this could be a
major defect.
To overcome this difficulty, fractional factorial or confounding designs can be used. In a
fractional factorial design, only a fraction of the complete set of factorial treatment combinations
is included. Here the main focus is on selecting and testing only those treatment combinations
which are more important. The fractional factorial design is used in exploratory trials, where the
main objective is to examine the interactions between factors. In a confounding design, all the
treatment combinations of the factors and levels under study are tested, with blocks containing
less than a full replication of the treatment combinations.
The two procedures do not allow equal evaluation of all the effects and interactions. Depending
on what is being confounded, some effects may not be estimable at all. This problem can be
resolved through a conscious and objective selection of the input variables. With the limited
number of variables in alley farming research, the need for confounding may not be as great as
the need for fractional replications and/or balanced incomplete blocks. To use fractional
factorial or confounding designs, the assistance of a statistician is a must.
7.6 Notes on laying out field plots
7.6.1 Discards and Sample Units
7.6.2 Soil Heterogeneity
7.6.3 Plot Orientation
7.6.4 Plot Shape and Size
7.6.5 Selection of Experimental Site
7.6.6 Guidelines in Recording Data
7.6.1 Discards and Sample Units
As in any field crop experiment, not all the areas in alley farming experimental plots need to be
observed during data collection. If we are comparing two or more hedgerow species for their
effectiveness in enhancing soil fertility, the following possibilities in layout, subject to land
restriction, could arise:
[Figure: layout arrangements (i)-(iv), subject to land restriction]
The arrangement in (i) provides two plots for each hedgerow species for soil nutrient or crop
yield studies. One whole plot is discarded between the last row of a species and the first row of
another species. If land is not limiting, this arrangement is ideal. Some practitioners will even go
further, sampling or observing only the area surrounding the middle hedgerow, i.e., one half-plot
to the left and one half-plot to the right of the middle hedgerow of the same species.
For the assessment of the hedgerows themselves, the middle hedgerow constitutes an ideal
sampling unit. However, in most practical situations, particularly where the hedgerow species
are spaced widely apart from each other, examination of all hedgerows may be acceptable.
Arrangements (ii) and (iii) have been found useful when land is particularly limiting. In situation
(iii), the area to sample lies between the two rows as marked in treatment A.
The arrangement in (iv) is not recommended, but has been used under serious land limitation
and species availability situations. The sample plots are half-plots. For consistency, either the
right-hand or left-hand side of the hedgerow should always be chosen. Remember that the
hedgerow species at the edges cannot be studied reliably. Given enough replications, these
could be ignored. If the interest is in the yield of the hedge crop itself, then the arrangement in
(iv) is very much appropriate, with the sampling unit being the inner rows of the hedges. This
implies ideally a minimum of three rows per species for effective assessment.
The areas marked 'x' in the illustrations are usually planted with the agricultural crops, but not
assessed. Unplanted gaps are not recommended, as they are likely to aggravate the edge
effects.
7.6.2 Soil Heterogeneity
For long-term experiments involving perennial crops such as hedgerow species, agronomists
have recognised the need to establish the nature and extent of soil heterogeneity through
'blank' trials before the conduct of the actual trial. This involves planting a bulk crop on the
experimental field and monitoring its performance. Alternatively, if one is familiar with the
cropping history of an area, this could be taken into account while laying out the trial, so as
to eliminate the delay in planting trials. When planting on farmers' plots, the knowledge base
of the farmer should not be ignored.
7.6.3 Plot Orientation
Irregularly sloped areas should be avoided, but there is no objection to the use of areas with a
near-constant slope, provided the plots run up and down the slope. The same principle applies
on a fertility gradient. For trials on terraces, one should ensure that all the treatments (except in
incomplete block situations) appear on the same terrace, so that a terrace can be regarded as
a block (Rao and Roger, 1991).
7.6.4 Plot Shape and Size
In alley farming, plot shapes are more likely to be square or rectangular than any other shape.
A square plot exposes the least number of plants to the edge effect. Avoid circular plots; on
sloping ground, circular plots tend to become ellipses. As regards plot size, plots that are too small
yield unreliable results. On the other hand, excessively large plots waste time and resources.
7.6.5 Selection of Experimental Site
The most important factor in selecting an experimental site is its representativeness of the area.
It should be of appropriate shape and size for the conduct of the experiment. The land and soil
characteristics, as well as past cultural practices, should be known as far as possible. It should
have access to a road and be distant from environmental modifiers.
Mean separation: Multiple comparisons [ST&D Ch.8, except 8.3]
5.1 Basic concepts
In the analysis of variance, the null hypothesis that is tested is always that all means are equal. If
the F statistic is not significant, we fail to reject H0 and there is nothing more to do, except
possibly redo the experiment, taking measures to make it more sensitive. If H0 is rejected, then
we conclude that at least one mean is significantly different from at least one other mean. The
overall ANOVA gives no indication of which means are significantly different. If there are only
two treatments, there is no problem; but if there are more than two treatments, the problem
remains of needing to determine which means are significantly different. This is the process of
mean separation.
Mean separation takes two general forms:
1. Planned, single degree of freedom F tests (orthogonal contrasts, last topic)
2. Multiple comparison tests that are suggested by the data itself (this topic).
Of these two methods, orthogonal F tests are preferred because they are more powerful than
multiple comparison tests (i.e. they are more sensitive to differences than are multiple
comparison tests). As you saw in the last topic, however, contrasts are not always appropriate
because they must satisfy a number of strict constraints:
1. Contrasts are planned comparisons, so the researcher must have a priori knowledge about
which comparisons are most interesting. This prior knowledge, in fact, determines the
treatment structure of the experiment.
2. The set of contrasts must be orthogonal.
3. The researcher is limited to making, at most, (t - 1) comparisons.
Very often, however, there is no such prior knowledge. The treatment levels do not fall into
meaningful groups, and the researcher is left with no choice but to carry out a sequence of
multiple, unconstrained comparisons for the purpose of ranking and discriminating means. The
different methods of multiple comparisons allow the researcher to do just that. There are many
such methods, the details of which form the bulk of this topic, but generally speaking each
involves more than one comparison among three or more means, and they are particularly useful
in those experiments where there are no particular relationships among the treatment means.
5.2 Error rates
Selection of the most appropriate multiple comparison test is heavily influenced by the error
rate. Recall that a Type I error occurs when one incorrectly rejects a true H0. The Type I
error rate is the fraction of times a Type I error is made. In a single comparison (imagine a
simple t test), this is the value alpha. When comparing three or more treatment means, however,
there are at least two different rates of Type I error:
Comparison-wise Type I error rate (CER)
This is the number of Type I errors divided by the total number of comparisons.
Experiment-wise Type I error rate (EER)
This is the number of experiments in which at least one Type I error occurs, divided by the total
number of experiments.
Suppose the experimenter conducts 100 experiments with 5 treatments each. In each experiment
there is a total of 10 possible pairwise comparisons that can be made:
Total possible pairwise comparisons (p) = t(t - 1)/2
For t = 5, p = (1/2)(5)(4) = 10
i.e. T1 vs. T2, T3, T4, T5; T2 vs. T3, T4, T5; T3 vs. T4, T5; T4 vs. T5
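The count of p = t(t - 1)/2 pairwise comparisons can be checked by enumerating the pairs (an illustrative Python sketch):

```python
from itertools import combinations

t = 5  # number of treatment means
pairs = list(combinations(range(1, t + 1), 2))  # all unordered pairs Ti, Tj
p = len(pairs)                                  # t(t-1)/2 = 10 for t = 5
```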
With 100 such experiments, therefore, there are a total of 1,000 possible pairwise comparisons.
Suppose that there are no true differences among the treatments (i.e. H0 is true) and that in each
of the 100 experiments, one Type I error is made. Then the CER over all experiments is:
CER = (100 mistakes) / (1000 comparisons) = 0.1 or 10%
The EER is:
EER = (100 experiments with mistakes) / (100 experiments) = 1 or 100%.
The EER is the probability of making at least one Type I error in the experiment. As the number
of means (and therefore the number of possible comparisons) increases, the chance of making at
least one Type I error approaches 1. To preserve a low experiment-wise error rate, then, the
comparison-wise error rate must be held extremely low. Conversely, to maintain a reasonable
comparison-wise error rate, the experiment-wise error rate will inflate.
The relative importance of controlling these two Type I error rates depends on the objectives of
the study, and different multiple comparison procedures have been developed based on different
philosophies of controlling these two kinds of error. In situations where incorrectly rejecting one
comparison may jeopardize the entire experiment, or where the consequence of incorrectly
rejecting one comparison is as serious as incorrectly rejecting a number of comparisons, the
control of the experiment-wise error rate is more important. On the other hand, when one erroneous
conclusion will not affect other inferences in an experiment, the comparison-wise error rate is
more pertinent.
The experiment-wise error rate is always larger than the comparison-wise error rate. It is
difficult to compute the exact experiment-wise error rate because, for a given data set, Type I
errors are not independent. But it is possible to compute an upper bound for the EER by
assuming that the probability of a Type I error for any single comparison is alpha and is independent
of all other comparisons. In that case:
Upper bound EER = 1 - (1 - alpha)^p, where p = t(t - 1)/2, as before.
So for 10 treatments and alpha = 0.05, the upper bound of the EER is 0.9
(EER = 1 - (1 - 0.05)^45 = 0.90 or 90%).
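The upper bound can be evaluated numerically (an illustrative Python sketch, not part of the original notes):

```python
def upper_bound_eer(t, alpha=0.05):
    """Upper bound on the EER, assuming the p = t(t-1)/2 pairwise
    comparisons are independent tests, each at level alpha."""
    p = t * (t - 1) // 2
    return 1 - (1 - alpha) ** p

eer = upper_bound_eer(10)   # t = 10 -> p = 45, bound is about 0.90
```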
The situation is more complicated than this, however. Suppose there are 10 treatments and one
shows a significant effect while the other 9 are approximately equal. Such a situation is
indicated graphically below:
[Figure: treatment means plotted against treatment number (1-10); nine means are approximately equal and one stands clearly apart]
A simple ANOVA will probably reject H0, so the experimenter will want to determine which
specific means are different. Even though one mean is truly different, there is still a chance of
making a Type I error in each pairwise comparison among the 9 similar treatments. An upper
bound on this probability is computed by setting t = 9 in the above formula, giving a result of
0.84. That is, the experimenter will incorrectly conclude that two truly similar effects are
actually different 84% of the time. This is called the experiment-wise error rate under a partial
null hypothesis, the partial null hypothesis in this case being that the subset of nine treatment
means are all equal to one another.
So we can distinguish between the EER under the complete null hypothesis, in which all
treatment means are equal, and the EER under a partial null hypothesis, in which some means are
equal but some differ. Because of this fact, SAS subdivides the error rates into the following
four categories:
CER = comparison-wise error rate
EERC = experiment-wise error rate under a complete null hypothesis (standard EER)
EERP = experiment-wise error rate under a partial null hypothesis
MEER = maximum experiment-wise error rate under any complete or partial null hypothesis.
5.3 Multiple comparison tests
Statistical methods for making two or more inferences while controlling the Type I error rates
(CER, EERC, EERP, MEER) are called simultaneous inference methods. The material in this
section is based primarily on ST&D chapter 8 and on the SAS/STAT manual (GLM Procedure).
The basic techniques of multiple comparisons fall into two groups:
1. Fixed-range tests: those which provide confidence intervals and tests of hypotheses.
2. Multiple-range tests: those which provide only tests of hypotheses.
To illustrate the various procedures, we will use the data from two different experiments given
in Table 4-1 (previous class, equal replication) and Table 5-1 (below, unequal replication). The
ANOVAs for these experiments are given in Tables 4-2 and 5-2.
Table 5-1. Weight gains (lb/animal/day) as affected by three different feeding
rations. CRD with unequal replications.

Treatment   Observations                                n    Total   Mean
Control     1.21 1.19 1.17 1.23 1.29 1.14               6    7.23    1.20
Feed-A      1.34 1.41 1.38 1.29 1.36 1.42 1.37 1.32     8    10.89   1.36
Feed-B      1.45 1.45 1.51 1.39 1.44                    5    7.24    1.45
Feed-C      1.31 1.32 1.28 1.35 1.41 1.27 1.37          7    9.31    1.33
Overall                                                 26   34.67   1.33
Table 5-2. ANOVA of data in Table 5-1.

Source of Variation   df   Sum of Squares   Mean Squares   F
Total                 25   0.2202
Treatment              3   0.1709           0.05696        25.41
Exp. error            22   0.0493           0.00224
5.3.1 Fixed-range tests
These tests provide a single range for making all possible pairwise comparisons in experiments
with equal replications across treatment groups (i.e. in balanced designs). Many fixed-range
procedures are available, and considerable controversy exists as to which procedure is most
appropriate. We will present four commonly used procedures, moving from the less conservative
to the more conservative: LSD, Dunnett, Tukey, and Scheffé. Other pairwise tests are discussed
in the SAS manual.
5.3.1.1. The repeated t and least significant difference: LSD
One of the oldest, simplest, and most widely misused multiple pairwise comparison tests is the least significant difference (LSD) test. The LSD is based on the t-test (ST&D 101); in fact, it is simply a sequence of many t-tests. Recall the formula for the t statistic:

    t = (Ȳ − μ) / s_Ȳ,  where  s_Ȳ = s / √r

This t statistic is distributed according to a t distribution with (r − 1) degrees of freedom. The LSD test declares the difference between means Ȳi and Ȳj of treatments i and j to be significant when:
    |Ȳi − Ȳj| > LSD, where

    LSD = t_{α/2, df MSE} · √( MSE (1/r1 + 1/r2) )   for unequal r  (SAS calls this a repeated t test)

    LSD = t_{α/2, df MSE} · √( 2·MSE / r )           for equal r    (SAS calls this an LSD test)

where MSE = pooled s², which can be calculated by PROC ANOVA or PROC GLM.
The quantity under the square root is called the standard error of the difference, or SED. As an example, here are the calculations for Table 4-1. Note that the significance level selected for pairwise comparisons does not have to conform to the significance level of the overall F test. To compare procedures across the examples to come, we will use a common α = 0.05.
From Table 4-1, MSE = 0.0086 with 16 df.

    LSD = t_{0.025, 16} · √( 2·MSE / r ) = 2.120 · √( 2(0.0086) / 5 ) = 0.1243
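The calculation above is easy to script. A minimal sketch, using the tabled t value 2.120 (t_{0.025,16}) as quoted in the text rather than recomputing it, so that only the standard library is needed:

```python
# LSD for the equal-replication experiment (Table 4-1):
# MSE = 0.0086 with 16 error df, r = 5 replications per treatment.
import math

mse, r, t_crit = 0.0086, 5, 2.120   # t_crit = t_{0.025,16}, tabled value
lsd = t_crit * math.sqrt(2 * mse / r)
print(f"LSD = {lsd:.4f}")           # ≈ 0.1243, as in the text
```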
So, if the absolute difference between any two treatment means is more than 0.1243, the treatments are said to be significantly different at the 5% level. As the number of treatments increases, it becomes more and more difficult, just from a logistical point of view, to identify those pairs of treatments that are significantly different. A systematic procedure for comparison and ranking begins by arranging the means in descending or ascending order, as shown below:

Control    4.19
HCl        3.87
Propionic  3.73
Butyric    3.64
Once the means are so arranged, compare the largest with the smallest mean. If these two means are significantly different, compare the next largest mean with the smallest. Repeat this process until a non-significant difference is found. Label these two means, and any means in between, with a common lowercase letter. Repeat the process with the next smallest mean, etc. Ultimately, you will arrive at a mean separation table like the one shown below:

Table 5.5
Treatment   Mean   LSD
Control     4.19   a
HCl         3.87   b
Propionic   3.73   c
Butyric     3.64   c

Pairs of treatments that are not significantly different from one another share the same letter. For the above example, we draw the following conclusions at the 5% level:
- All acids reduced shoot growth.
- The reduction was more severe with butyric and propionic acid than with HCl.
- We do not have evidence to conclude that propionic acid differs in its effect from butyric acid.
When all the treatments are equally replicated, note that only one LSD value is required to test all six possible pairwise comparisons between treatment means. This is not true in cases of unequal replication, where different LSD values must be calculated for each comparison involving different numbers of replications.
For the second data set (Table 5.1), we find the 5% LSD for comparing the Control with Feed-B to be:

    LSD = t_{0.025, 22} · √( MSE (1/r1 + 1/r2) ) = 2.074 · √( 0.00224 (1/6 + 1/5) ) = 0.0594
The other re"uired ESDKs are0


* vs. Control 9 '.'631 * vs. B9 '.'6H'
* vs. C 9 '.'6'C B vs. C9 '.'6I6
C vs. Control 9 '.'6<H
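All of these unequal-replication LSDs come from the same formula with different pairs of r values. A stdlib-only sketch, again plugging in the tabled t value 2.074 quoted in the text (differences in the last decimal versus the text's values are rounding):

```python
# Pairwise LSDs for the unequal-replication feed data of Table 5.1:
# MSE = 0.00224 with 22 df, t_{0.025,22} = 2.074 (tabled value).
import math
from itertools import combinations

reps = {"Control": 6, "Feed-A": 8, "Feed-B": 5, "Feed-C": 7}
mse, t_crit = 0.00224, 2.074

lsd = {}
for g1, g2 in combinations(reps, 2):
    lsd[(g1, g2)] = t_crit * math.sqrt(mse * (1 / reps[g1] + 1 / reps[g2]))

for pair, value in lsd.items():
    print(f"{pair[0]} vs {pair[1]}: LSD = {value:.4f}")
```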
Using these values, we can construct a mean separation table:

Treatment   Mean   LSD
Feed-B      1.45   a
Feed-A      1.36   b
Feed-C      1.33   b
Control     1.20   c
Thus, at the 5% level, we conclude that all feeds cause significantly greater weight gain than the control, that Feed-B causes the highest weight gain, and that Feeds A and C are equally effective.
One advantage of the LSD procedure is its ease of application. Additionally, it is easily used to construct confidence intervals for mean differences. The (1 − α) confidence limits of the quantity (μA − μB) are given by:

    (1 − α) CI for (μA − μB) = (ȲA − ȲB) ± LSD

Because fewer comparisons are involved, the LSD test is much safer when the means to be compared are selected in advance of the experiment, although hardly anyone ever does this. The test is primarily intended for use when there is no predetermined structure to the treatments. If a large number of means are to be compared, and the ones compared are selected after the ANOVA so that the comparisons target the means with the most different values, the actual error rate will be much higher than predicted.
The LSD test is the only test for which the comparison-wise error rate equals α. This is often regarded as too liberal (i.e. too ready to reject H0). It has been suggested that the EER can be maintained at α by performing the overall ANOVA test at the α level and making further comparisons if and only if the F test is significant (Fisher's Protected LSD test). However, it was later demonstrated that this assertion is false if there are more than three means. In those cases, a preliminary F test controls only the EERC, not the EERP.
5.3.1.2. Dunnett's Method
In certain experiments, one may desire only to compare a control with each of the other treatments, such as comparing a standard variety or chemical with several new ones. Dunnett's method performs such an analysis while holding the maximum experimentwise error rate under any complete or partial null hypothesis (MEER) to a level not exceeding the stated α.
In this method, a t* value is calculated for each comparison. The tabular t* value for determining statistical significance, however, is not the Student's t but a special t* given in Appendix Tables A-9a and A-9b (ST&D p. 624-625). Let Ȳ0 represent the control mean with r0 replications; then:

    DLSD = t*_{α/2, df MSE} · √( MSE (1/r0 + 1/ri) )   for unequal r (r0 ≠ ri)

    DLSD = t*_{α/2, df MSE} · √( 2·MSE / r )           for equal r (r0 = ri)
From the seed treatment experiment in Table 4-1, MSE = 0.0086 with 16 df, and the number of comparisons is p = 3. By Table A-9b, t*_{α/2, 16} = 2.59, so:

    DLSD = t*_{α/2, df MSE} · √( 2·MSE / r ) = 2.59 · √( 2(0.0086) / 5 ) = 0.1519

(Note that DLSD = 0.152 > LSD = 0.124.)
This provides the least significant difference between a control and any other treatment. Note that the smallest difference between the control and any acid treatment is:

    Control − HCl = 4.19 − 3.87 = 0.32

Since this difference is larger than DLSD, it is significant; and all other differences, being larger, are also significant. The 95% simultaneous confidence intervals for all three differences are computed as:

    (1 − α) CI for (μ0 − μi) = (Ȳ0 − Ȳi) ± DLSD

The limits of these differences are:

    Control − HCl       = 0.32 ± 0.15
    Control − Propionic = 0.46 ± 0.15
    Control − Butyric   = 0.55 ± 0.15

We have 95% confidence that the three ranges will simultaneously include the true differences.
When treatments are not equally replicated, as in the feed ration experiment, there are different DLSD values for each of the comparisons. To compare the Control with Feed-C, first note that t*_{0.025, 22} = 2.517 (from SAS; by Table A-9b, t* is 2.54 or 2.51 for 20 and 24 df, respectively):

    DLSD = t*_{α/2, df MSE} · √( MSE (1/r0 + 1/ri) ) = 2.517 · √( 0.00224 (1/6 + 1/7) ) = 0.0663

Since |Ȳ0 − ȲC| = 0.125 is larger than 0.0663, the difference is significant. All other differences with the control, being larger than this, are also significant.
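Both Dunnett calculations can be reproduced with a few lines of arithmetic. A stdlib-only sketch with the tabled t* values quoted above hand-entered as constants (this is not a general Dunnett implementation):

```python
# Dunnett least significant differences for both data sets, using the
# tabled t* values from the text: 2.59 (16 df) and 2.517 (22 df).
import math

# Equal replication (Table 4-1): MSE = 0.0086, r = 5
dlsd_equal = 2.59 * math.sqrt(2 * 0.0086 / 5)

# Unequal replication (Table 5.1): Control (r=6) vs Feed-C (r=7), MSE = 0.00224
dlsd_ctrl_vs_c = 2.517 * math.sqrt(0.00224 * (1 / 6 + 1 / 7))

print(f"DLSD (equal r)      = {dlsd_equal:.4f}")     # ≈ 0.1519
print(f"DLSD (Ctrl vs Feed-C) = {dlsd_ctrl_vs_c:.4f}")  # ≈ 0.0663

# Observed |Control - Feed-C| difference from the Table 5.1 totals:
diff = abs(7.23 / 6 - 9.31 / 7)   # 0.125 -> larger than DLSD, significant
```

For raw data, `scipy.stats.dunnett` (SciPy 1.11 or later) carries out the full test, including simultaneous confidence intervals, without consulting the t* tables.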
5.3.1.3. Tukey's w procedure
Tukey's test was designed specifically for pairwise comparisons. This test, sometimes called the "honestly significant difference" (HSD) test, controls the MEER when the sample sizes are equal. Instead of t or t*, it uses the statistic q_{α, p, df MSE} obtained from Table A-8. The Tukey critical values are larger than those of Dunnett because the Tukey family of contrasts is larger (all possible pairs of means instead of just comparisons to a control). The critical difference in this method is labeled w:
    w = q_{α, p, df MSE} · √( (MSE/2) (1/r1 + 1/r2) )   for unequal r

    w = q_{α, p, df MSE} · √( MSE / r )                 for equal r
Aside from the new critical value, things look basically the same as before, except notice that here we do not multiply MSE by a factor of 2, because Table A-8 already includes the factor √2 in its values. For example, for p = 2, df = ∞ (equivalent to the standard normal distribution Z), and α = 5%, the critical value is 2.77, which is equal to 1.96 × √2.
Considering the seed treatment data (Table 4-1), q_{0.05}(4, 16) = 4.05, and:

    w = q_{α, p, df MSE} · √( MSE / r ) = 4.05 · √( 0.0086 / 5 ) = 0.1680

(Note that w = 0.1680 > DLSD = 0.1519 > LSD = 0.1243.)
By this method, the means separation table for the Table 4-1 data looks like:

Treatment   Mean   w
Control     4.19   a
HCl         3.87   b
Propionic   3.73   b c
Butyric     3.64   c

Like the LSD and Dunnett's methods, this test detects significant differences between the control and all other treatments. But unlike the LSD method, it detects no significant difference between the HCl and Propionic treatments (compare with Table 5.5). This reflects the lower power of this test.
#or une"ual r, as in the feedin$ experi!ent in Ta(le 6.3, the contrast (eteen the Control ith
#eed@C ould (e tested usin$0
" '.'6,3<, 554 9 3.C3 w 9 3.C3

9 '.'I35
Since 9
C Cont
Y Y F 7 '.156 is lar$er than '.'I31, it is si$nificant. *s in the ESD, the only pairise
co!parison that is not si$nificant is that (eteen #eed C 3
33' . 1 =
C
Y
4 and #eed * 3 3H1 . 1 =
A
Y 4.
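Rather than reading q from Table A-8, the studentized-range quantile can be computed directly. A sketch using `scipy.stats.studentized_range` (available in SciPy 1.7 and later); tabled values are the same quantiles rounded to two decimals:

```python
# Reproduce Tukey's w for the seed treatment data (Table 4-1) by computing
# the studentized-range quantile instead of looking it up in Table A-8.
import math
from scipy.stats import studentized_range

alpha, p, df = 0.05, 4, 16        # 4 treatments, 16 error df
mse, r = 0.0086, 5

q_crit = studentized_range.ppf(1 - alpha, p, df)   # ≈ 4.05 (Table A-8 value)
w = q_crit * math.sqrt(mse / r)
print(f"q = {q_crit:.2f}, w = {w:.3f}")            # w ≈ 0.168
```

From raw data, `scipy.stats.tukey_hsd` performs the whole analysis in one call, using the Tukey-Kramer adjustment when replication is unequal.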
5.3.1.4. Scheffé's F test for pairwise comparisons
Scheffé's test is compatible with the overall ANOVA F test in the sense that it never declares a contrast significant if the overall F test is nonsignificant. Scheffé's test controls the MEER for ANY set of contrasts, including all possible pairwise and group comparisons. Since this procedure controls MEER while allowing for a larger number of comparisons, it is less sensitive (i.e. more conservative) than the other multiple comparison procedures.
The Scheffé critical difference (SCD) has a structure similar to that described for the previous tests, scaling the critical F value for its statistic:

    SCD = √( df_trt · F_{α, df_trt, df MSE} ) · √( MSE (1/r1 + 1/r2) )   for unequal r

    SCD = √( df_trt · F_{α, df_trt, df MSE} ) · √( 2·MSE / r )           for equal r
For the seed treatment data (Table 4-1), MSE = 0.0086 with df_trt = 3, df_MSE = 16, and r = 5:

    SCD_0.05 = √( 3 · 3.24 ) · √( 2(0.0086) / 5 ) = 0.1829

(Note that SCD = 0.1829 > w = 0.1680 > DLSD = 0.1519 > LSD = 0.1243.)

Again, if the difference between a pair of means is greater than SCD, that difference will be declared significant at the given α level, while holding MEER below α. The table of means separations:

Treatment   Mean   Fs
Control     4.19   a
HCl         3.87   b
Propionic   3.73   b c
Butyric     3.64   c
When the means to be compared are based on unequal replications, a different SCD is required for each comparison. For the animal feed experiment, the critical difference for the contrast between the Control and Feed-C is:

    SCD_{0.05, (3, 22)} = √( 3 · 3.05 ) · √( 0.00224 (1/6 + 1/7) ) = 0.0796

Since |ȲCont − ȲC| = 0.125 is larger than 0.0796, it is significant. Scheffé's procedure is also readily used for interval estimation:

    (1 − α) CI for (μ0 − μi) = (Ȳ0 − Ȳi) ± SCD

The resulting intervals are simultaneous in that the probability is at least (1 − α) that all of them are true simultaneously.
5.3.1.5. Scheffé's F test for group comparisons
The most important use of Scheffé's test is for arbitrary comparisons among groups of means. We use the word "arbitrary" here because, unlike the group comparisons using contrasts, group comparisons using Scheffé's test do not have to be orthogonal, nor are they limited to (t − 1) questions. If you are interested only in testing the differences between all pairs of means, the Scheffé method is not the best choice; Tukey's is better because it is more sensitive while controlling MEER. But if you want to "mine" your data by making all possible comparisons (pairwise and group comparisons) while still controlling MEER, Scheffé's is the way to go.
To make comparisons among groups of means, you first define a contrast, as in Topic 4:

    Q = Σ ci Ȳi, with the constraint that Σ ci = 0 (or Σ ri ci = 0 for unequal r)

We will reject the null hypothesis (H0) that the contrast Q = 0 if the absolute value of Q is larger than a critical value Fs. This is the general form for Scheffé's test:

    Critical value Fs = √( df_trt · F_{α, df_trt, df MSE} ) · √( MSE · Σ (ci² / ri) )

Note that the previous expressions for Scheffé pairwise comparisons (5.3.1.4) are for the particular contrast 1 vs. −1. If we want to compare the control to the average of the three acid treatments in Table 4-1, the contrast coefficients are [+3, −1, −1, −1].
In this case Q is calculated by multiplying the coefficients by the means of the respective treatments:

    Q = Σ ci Ȳi = 3(4.190) + (−1)(3.868) + (−1)(3.728) + (−1)(3.640) = 1.334

The critical value Fs_{0.05, (3, 16)} for this contrast is:

    Fs = √( 3 · 3.24 ) · √( 0.0086 · (3² + (−1)² + (−1)² + (−1)²) / 5 ) = 0.4479

Since Q = 1.334 > 0.4479 = Fs, we reject H0. The average of the control (4.190 mg) is significantly different from the average of the three acid treatments (3.745 mg).
Again, with Scheffé's method, you can test any conceivable set of contrasts, even if they number more than (t − 1) questions and are not orthogonal. The price you pay for this freedom, however, is very low sensitivity; Scheffé's is the most conservative method of comparing means. So if Scheffé's declares a difference to be real, you can believe it.
Remember that in these contrasts we are using means, not totals.
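The group-contrast calculation above can be sketched in a few lines. Stdlib-only, with the tabled F value 3.24 (F_{0.05,3,16}) entered as a constant from the text:

```python
# Scheffé test of the group contrast "control vs. average of the three
# acids" (coefficients +3, -1, -1, -1) for the Table 4-1 data.
import math

means = [4.190, 3.868, 3.728, 3.640]   # Control, HCl, Propionic, Butyric
coefs = [3, -1, -1, -1]                # sums to zero, as a contrast must
mse, r, df_trt, f_crit = 0.0086, 5, 3, 3.24

Q = sum(c * m for c, m in zip(coefs, means))
Fs = math.sqrt(df_trt * f_crit) * math.sqrt(mse * sum(c * c for c in coefs) / r)

print(f"Q = {Q:.3f}, critical Fs = {Fs:.4f}")   # 1.334 > 0.4479 -> reject H0
```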
5.3.2. Multiple-stage tests
Before we start: multiple range tests should only be used with balanced designs, since they are inefficient with unbalanced ones.
The methods discussed so far are all "fixed-range" tests, so called because they use a single, fixed value to test hypotheses and build simultaneous confidence intervals. If one forfeits the ability to build simultaneous confidence intervals with a single value, it is possible to obtain simultaneous hypothesis tests of greater power using multiple-stage tests (MSTs). MSTs come in both step-up (first comparing closest means, then more distant means) and step-down (the reverse) varieties; but only the step-down methods, which are more widely used, are available in SAS.
The best known MSTs are the Duncan and the Student-Newman-Keuls (SNK) methods. Both use the studentized range statistic (q) and, hence, also go by the name multiple range tests. With means arranged from the lowest to the highest, a multiple-range test provides critical distances or ranges that become smaller as the pairwise means to be compared become closer together in the array. Such a strategy allows the researcher to allocate test sensitivity where it is most needed: in discriminating neighboring means.
The idea of step-down MSTs is this: the more means (i.e. treatments) are compared, the smaller the probability that they are all the same. The general strategy is as follows. First, the maximum and minimum means are compared pairwise using the largest critical value, since the comparison involves all the means. If this H0 is accepted, the procedure stops. Otherwise, the analysis continues by comparing pairwise the two sets of next-most-extreme means (i.e. Ȳ1 vs. Ȳt−1, and Ȳ2 vs. Ȳt) using a smaller critical value, because the groups are now smaller. This process is repeated with closer and closer pairs of means until one reaches the set of (t − 1) pairs of adjacent means, compared pairwise using the smallest critical value. The larger the range of the ranks, the larger the tabled critical point.
In summary, MSTs:
- Allow simultaneous hypothesis tests of greater power by forfeiting the ability to construct simultaneous confidence intervals.
- Include Duncan, Student-Newman-Keuls (SNK), and REGWQ.
- All use the studentized range statistic (q), and all are result-guided.
[Figure: graphical depiction of the general strategy of step-down MSTs. The means are arranged highest (Ȳ1) to lowest (Ȳ5); the significance level of each of the 10 possible pairwise comparisons is indicated by an α.]
The general strategy:

    α1 > α2 > α3 > ... > α(t−1)

"Confidence" is replaced by the concept of "protection levels": if a difference is detected at one level of the test, the researcher is justified in separating means at a finer resolution with less protection (i.e. with a higher α).
5.3.2.1. Duncan's multiple range test (Table A-7)
The test is identical to the LSD for adjacent means in an array but requires progressively larger values for significance between means as they are more widely separated in the array. For groups of two means, it uses the same α value as the LSD.
It controls the CER at the α level but has a high Type I error rate (MEER). Its operating characteristics appear similar to those of Fisher's unprotected LSD at level α. Since the latter test is easier to compute, easier to explain, and applicable to unequal sample sizes, Duncan's method is not recommended by SAS. The higher power of Duncan's method compared to Tukey's is, in fact, due to its higher Type I error rate (Einot and Gabriel 1975). Duncan's test used to be the most popular method, but many journals no longer accept it.
To compute the Duncan critical ranges (Rp), use the following expression, plugging in the appropriate values of the studentized range statistic (q) at the protection level αp = 1 − (1 − α)^(p−1):

    Rp = q_{αp, p, df MSE} · √( MSE / r )

The procedure is to compute a set of critical values by using ST&D Table A-7. For the seed treatment data in Table 4-1:
p                2      3      4
q_0.05(p, 16)    3.00   3.15   3.23
Rp               0.124  0.131  0.134

Note that the critical difference for p = 2 is the same as for the LSD test!
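The protection levels and critical ranges above can be sketched as follows, using the tabled Duncan q values from Table A-7 as constants (stdlib only; the αp formula is the standard Duncan protection level stated above):

```python
# Duncan protection levels and critical ranges Rp for Table 4-1:
# MSE = 0.0086, r = 5, 16 error df.
import math

alpha, mse, r = 0.05, 0.0086, 5
q_duncan = {2: 3.00, 3: 3.15, 4: 3.23}   # ST&D Table A-7, 16 df

alpha_p = {p: 1 - (1 - alpha) ** (p - 1) for p in q_duncan}    # protection levels
Rp = {p: q * math.sqrt(mse / r) for p, q in q_duncan.items()}  # critical ranges

for p in q_duncan:
    print(f"p={p}: alpha_p={alpha_p[p]:.4f}  Rp={Rp[p]:.3f}")
# Rp: 0.124, 0.131, 0.134 -- and Rp for p=2 equals the LSD (0.124)
```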
5.3.2.2. The Student-Newman-Keuls (SNK) test
The Student-Newman-Keuls (SNK) test is more conservative than Duncan's in that its Type I error rate is smaller. This is because SNK simply uses α as the significance level at all stages of testing, again stopping the analysis at the highest level of non-significance. Because α is lower than Duncan's variable significance values, the power of SNK is generally lower than that of Duncan's test. SNK is often accepted by journals that do not accept Duncan's test.
The SNK test controls the EERC at the α level, but it behaves poorly in terms of the EERP and MEER (Einot and Gabriel 1975). To see this, consider ten population means that cluster in five pairs, such that means within pairs are equal but there are large differences between pairs (e.g. Ȳ1 = Ȳ2, Ȳ3 = Ȳ4, ..., Ȳ9 = Ȳ10). In such a case, all subset homogeneity hypotheses for three or more means are rejected. The SNK method then comes down to five independent tests, one for each pair, each conducted at the α level. The probability of at least one false rejection is:

    1 − (1 − 0.05)^5 = 0.23

As the number of means increases, the MEER approaches 1. Therefore, the SNK method is not recommended by SAS, since it does not control well the maximum experimentwise error rate under any partial null hypothesis (e.g. the clustered configuration just described).
The procedure is to compute a set of critical values by using ST&D Table A-8, first comparing the maximum and minimum means; if that range is not significant, the procedure stops, and otherwise testing continues with the next-smaller ranges.

    Wp = q_{α}(p, df MSE) · √( MSE / r )

For unequal r, use the same correction as in Tukey's test (5.3.1.3).
For the Table 4-1 data:

p                2      3      4      (Note that for p = t, Wp = Tukey's w,
q_0.05(p, 16)    3.00   3.65   4.05    and for p = 2, Wp = LSD.)
Wp               0.124  0.151  0.168
Table 5.9
Treatment   Mean   Wp
Control     4.19   a
HCl         3.87   b
Propionic   3.73   c
Butyric     3.64   c
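The SNK critical ranges Wp can be computed instead of tabled. A sketch using `scipy.stats.studentized_range` (SciPy 1.7 or later) in place of Table A-8; note that W2 reproduces the LSD and W4 reproduces Tukey's w, as stated above:

```python
# SNK critical ranges Wp for the Table 4-1 data:
# MSE = 0.0086, r = 5, 16 error df, alpha = 0.05 at every stage.
import math
from scipy.stats import studentized_range

alpha, mse, r, df = 0.05, 0.0086, 5, 16

Wp = {p: studentized_range.ppf(1 - alpha, p, df) * math.sqrt(mse / r)
      for p in (2, 3, 4)}

for p, w in Wp.items():
    print(f"p={p}: Wp = {w:.3f}")   # 0.124, 0.151, 0.168
```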
5.3.2.3. The REGWQ method
A variety of MSTs that control the MEER have been proposed, but these methods are not as well known as those of Duncan and SNK. An approach developed by Ryan, Einot and Gabriel, and Welsch (REGW) sets:

    αp = 1 − (1 − α)^(p/t)  for p < t − 1,  and  αp = α  for p ≥ t − 1.

The REGWQ method performs the comparisons using a range test. This method appears to be among the most powerful step-down multiple range tests and is recommended by SAS for equal replication (i.e. balanced designs).
Assuming the sample means have been arranged in descending order from Ȳ1 to Ȳk, the homogeneity of means Ȳi, ..., Ȳj, with i < j, is rejected by REGWQ if:

    Ȳi − Ȳj ≥ q_{αp}(p, df MSE) · √( MSE / r )    (use ST&D Table A-8)
For the Table 4-1 data:

p                 2       3      4
αp                0.025   0.05   0.05
q_{αp}(p, 16)     3.49    3.65   4.05
Critical value    0.145   0.151  0.168

For p = t and p = t − 1 the critical value is the same as in SNK, but it is larger for p < t − 1. Note that the difference between HCl and Propionic is significant with SNK but not significant with REGWQ (3.87 − 3.73 < 0.145).
Table 5.10
Treatment   Mean   REGWQ
Control     4.19   a
HCl         3.87   b
Propionic   3.73   b c
Butyric     3.64   c
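The REGWQ levels and critical values above can be sketched by combining the αp rule with studentized-range quantiles (`scipy.stats.studentized_range`, SciPy 1.7 or later) in place of Table A-8:

```python
# REGWQ significance levels and critical values for the Table 4-1 data:
# t = 4 means, MSE = 0.0086, r = 5, 16 error df.
import math
from scipy.stats import studentized_range

alpha, t, mse, r, df = 0.05, 4, 0.0086, 5, 16

crit = {}
for p in (2, 3, 4):
    # alpha_p = alpha for p >= t-1, else 1 - (1-alpha)**(p/t)
    a_p = alpha if p >= t - 1 else 1 - (1 - alpha) ** (p / t)
    crit[p] = studentized_range.ppf(1 - a_p, p, df) * math.sqrt(mse / r)
    print(f"p={p}: alpha_p={a_p:.4f}  critical value={crit[p]:.3f}")
# 0.145, 0.151, 0.168 -- HCl vs Propionic (0.14) now falls short of 0.145
```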
5.4. Conclusions and recommendations
There are at least twenty other parametric procedures available for multiple comparisons, not to mention the many non-parametric and multivariate methods. There is no consensus as to which procedure is the most appropriate to recommend to all users. One main difficulty in comparing the procedures is the different kinds of Type I error rates used, namely experiment-wise versus comparison-wise. All this is to say that the difference in performance of any two procedures is likely due more to the different underlying philosophies of Type I error control than to the specific techniques used.
To a large extent, the choice of a procedure is subjective and hinges on a choice between a comparison-wise error rate (such as LSD) and an experiment-wise error rate (such as Tukey's and Scheffé's tests).
Some suggested rules of thumb:
1. When in doubt, use Tukey. Tukey's method is a good general technique for carrying out all pairwise comparisons, enabling you to rank means and put them into significance groups, while controlling MEER.
2. Use Dunnett's (more powerful than Tukey's) if you only wish to compare each treatment level to a control.
3. Use Scheffé's if you wish to test a set of non-orthogonal group comparisons, OR if you wish to carry out group comparisons in addition to all possible pairwise comparisons. MEER will be controlled in both cases.
The SAS manual makes the following additional recommendation: for controlling MEER for all pairwise comparisons, use REGWQ for balanced designs and Tukey for unbalanced designs.
One final point to note is that severely unbalanced designs can yield very strange results, regardless of the means separation method. To illustrate this, consider the example on page 200 of ST&D. In this example, an experiment with four treatments (A, B, C, and D) has responses in the order A > B > C > D. A and D each have 2 replications, while B and C each have 11. The strange result: the extreme means (A and D) are found to be not significantly different, but the intermediate means (B and C) are.
Chi-Square Test
Chi-square is a statistical test commonly used to compare observed data with data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test is always testing what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed results.
Uses of Chi-Square
1. Test of a fixed-ratio hypothesis, e.g. a genetic ratio
2. Test of independence in a contingency table
3. Test of homogeneity of ratios

The formula for calculating chi-square (χ²) is:

    χ² = Σ (o − e)² / e

That is, chi-square is the sum of the squared differences between the observed (o) and the expected (e) data (or the deviation, d), divided by the expected data, over all possible categories.
For example, suppose that a cross between two pea plants yields a population of 880 plants, 639 with green seeds and 241 with yellow seeds. You are asked to propose the genotypes of the parents. Your hypothesis is that the allele for green is dominant to the allele for yellow and that the parent plants were both heterozygous for this trait. If your hypothesis is true, then the predicted ratio of offspring from this cross would be 3:1 (based on Mendel's laws), as predicted from the results of the Punnett square (Figure B.1).

Figure B.1: Punnett square. Predicted offspring from the cross between green- and yellow-seeded plants. Green (G) is dominant (3/4 green : 1/4 yellow).
To calculate χ², first determine the number expected in each category. If the ratio is 3:1 and the total number of observed individuals is 880, then the expected numerical values should be 660 green and 220 yellow.

Chi-square requires that you use numerical values, not percentages or ratios.

Then calculate χ² using the formula, as shown in Table B.1. Note that we get a value of 2.668 for χ². But what does this number mean? Here's how to interpret the χ² value:
1. Determine the degrees of freedom (df). Degrees of freedom can be calculated as the number of categories in the problem minus 1. In our example, there are two categories (green and yellow); therefore, there is 1 degree of freedom.
2. Determine a relative standard to serve as the basis for accepting or rejecting the hypothesis. The relative standard commonly used in biological research is p > 0.05. The p value is the probability that the deviation of the observed from the expected is due to chance alone (no other forces acting). Using this standard, you would expect any deviation to be due to chance alone 5% of the time or less.
3. Refer to a chi-square distribution table. Using the appropriate degrees of freedom, locate the value closest to your calculated chi-square in the table, and determine the p (probability) value associated with your chi-square and degrees of freedom. In this case (χ² = 2.668), the p value is about 0.10, which means that there is a 10% probability that any deviation from the expected results is due to chance only. Based on our standard, this is within the range of acceptable deviation. In terms of your hypothesis for this example, the observed chi-square is not significantly different from the expected. The observed numbers are consistent with those expected under Mendel's law.
Step-by-Step Procedure for Testing Your Hypothesis and Calculating Chi-Square
1. State the hypothesis being tested and the predicted results. Gather the data by conducting the proper experiment (or, if working genetics problems, use the data provided in the problem).
2. Determine the expected numbers for each observational class. Remember to use numbers, not percentages.

Chi-square should not be calculated if the expected value in any category is less than 5.

3. Calculate χ² using the formula. Complete all calculations to three significant digits. Round off your answer to two significant digits.
4. Use the chi-square distribution table to determine the significance of the value.
a. Determine the degrees of freedom and locate the value in the appropriate column.
b. Locate the value closest to your calculated χ² on that degrees-of-freedom (df) row.
c. Move up the column to determine the p value.
5. State your conclusion in terms of your hypothesis.
a. If the calculated χ² is greater than the table value χ²_tab, reject your hypothesis: the deviation is too large to be due to chance alone.
b. If the calculated χ² is less than χ²_tab, accept your hypothesis: the deviation is small enough to be due to chance only.

The chi-square test will be used to test for the "goodness of fit" between observed and expected data from several laboratory investigations in this lab manual.
Table B.1. Calculating Chi-Square

                     Green   Yellow
Observed (o)         639     241
Expected (e)         660     220
Deviation (o − e)    −21      21
Deviation² (d²)      441     441
d²/e                 0.668   2.00

    χ² = Σ d²/e = 2.668
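Table B.1 is easy to verify by hand or in code. A stdlib-only sketch, carried at full precision (the table rounds 441/220 to 2.00, giving 2.668; unrounded the statistic is about 2.67); the constant 3.841 is the tabled critical chi-square for 1 df at the 0.05 level:

```python
# Goodness-of-fit computation for the pea-seed cross of Table B.1.
observed = {"green": 639, "yellow": 241}
expected = {"green": 660, "yellow": 220}   # 3:1 ratio applied to 880 plants

chi2_stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1            # 2 categories -> 1 degree of freedom

print(f"chi-square = {chi2_stat:.3f} on {df} df")
# 2.673 < 3.841 -> deviation attributable to chance; the hypothesis stands
```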
Adapted by Anne F. Maben from "Statistics for the Social Sciences" by Vicki Sharp
The chi"s+uare (&) test is used to determine whether there is a significant difference between the expected
fre+uencies and the observed fre+uencies in one or more categories. 6o the number of individuals or
ob/ects that fall in each category differ significantly from the number you would expect. &s this difference
between the expected and observed due to sampling error, or is it a real difference.
.hi31Huare Test /eHuirements
1. Duantitative data or counts.
$. there must be two or more categories.
(. &ndependent observations.
< de+uate sample size (at least 11).
5. Random sample.
Expected Frequencies
When you find the value for chi-square, you determine whether the observed frequencies differ significantly from the expected frequencies. You find the expected frequencies for chi-square in these ways:
1. You hypothesize that all the frequencies are equal in each category. For example, you might expect that half of the entering freshman class of 200 at Tech College will be identified as women and half as men. You figure the expected frequency by dividing the number in the sample by the number of categories. In this example, where there are 200 entering freshmen and two categories, male and female, you divide your sample of 200 by 2, the number of categories, to get 100 (the expected frequency) in each category.
2. You determine the expected frequencies on the basis of some prior knowledge. Let's use the Tech College example again, but this time pretend we have prior knowledge of the frequencies of men and women in each category from last year's entering class, when 60% of the freshmen were men and 40% were women. This year you might expect that 60% of the total would be men and 40% would be women. You find the expected frequencies by multiplying the sample size by each of the hypothesized population proportions. If the freshman total were 200, you would expect 120 to be men (60% × 200) and 80 to be women (40% × 200).

Now let's take a situation, find the expected frequencies, and use the chi-square test to solve the problem.
Example
Thai, the manager of a car dealership, did not want to stock cars that were bought less frequently because of their unpopular color. The five colors that she ordered were red, yellow, green, blue, and white. According to Thai, the expected frequencies, or number of customers choosing each color, should follow the percentages of last year. She felt 20% would choose yellow, 30% would choose red, 10% would choose green, 10% would choose blue, and 30% would choose white. She then took a random sample of 150 customers and asked them their color preferences. The results of this poll are shown in Table 1 under the column labeled "observed frequencies."
5oour 1bser"e% <x#e$te% ,o+e . or % ,o+e.
2
%
2
,o+e.
2
3e or
%
2
3e

Yeo' 35 30
6e% 50 45
>reen 30 15
:ue 10 15
2!ite 25 45
A
2
B
The expected frequencies in Table 1 are figured from last year's percentages. Based on the percentages for last year, we would expect 20% to choose yellow. Figure the expected frequency for yellow by taking 20% of the 150 customers, getting an expected frequency of 30 people for this category. For the color red we would expect 30% of 150, or 45 people, to fall in this category. Using this method, Thai figured out the expected frequencies: 30, 45, 15, 15, and 45. Obviously, there are discrepancies between the colors preferred by customers in the poll taken by Thai and the colors preferred by the customers who bought their cars last year. Most striking is the difference in the green and white colors. If Thai were to follow the results of her poll, she would stock twice as many green cars as if she were to follow the customer color preference for green based on last year's sales. In the case of white cars, she would stock half as many this year.
What to do? Thai needs to know whether or not the discrepancies between last year's choices (expected frequencies) and this year's preferences on the basis of her poll (observed frequencies) demonstrate a real change in customer color preferences. It could be that the differences are simply a result of the random sample she chanced to select. If so, then the population of customers really has not changed from last year as far as color preferences go. The null hypothesis states that there is no significant difference between the expected and observed frequencies. The alternative hypothesis states that they are different. The level of significance (the point at which you can say with 95% confidence that the difference is NOT due to chance alone) is set at 0.05 (the standard for most science experiments). The chi-square formula used on these data is:
χ² = Σ (O - E)² / E

where O is the Observed Frequency in each category
      E is the Expected Frequency in the corresponding category
      Σ is "sum of"
      df is the "degrees of freedom" (n - 1)
      χ² is Chi Square
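The formula translates directly into code. As a minimal Python sketch (the function name is ours):

```python
# Chi-square statistic as defined above: the sum of (O - E)^2 / E
# over all categories.
def chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))
```

For the car-color data this gives chi_square([35, 50, 30, 10, 25], [30, 45, 15, 15, 45]) ≈ 26.94.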
PROCEDURE

We are now ready to use our formula for χ² and find out if there is a significant difference
between the observed and expected frequencies for the customers in choosing cars. We will
set up a worksheet; then you will follow the directions to form the columns and solve the formula.

1. Directions for Setting Up Worksheet for Chi Square
Category    O     E    (O - E)   (O - E)²   (O - E)² / E
yellow     35    30       5         25          0.83
red        50    45       5         25          0.56
green      30    15      15        225         15.00
blue       10    15      -5         25          1.67
white      25    45     -20        400          8.89

                                      χ² = 26.95
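The worksheet arithmetic can be checked with a short Python sketch (variable names are ours; the counts come from Table 1):

```python
# Check of the worksheet above: observed and expected counts from Table 1.
observed = {"yellow": 35, "red": 50, "green": 30, "blue": 10, "white": 25}
expected = {"yellow": 30, "red": 45, "green": 15, "blue": 15, "white": 45}

total = 0.0
for color in observed:
    o, e = observed[color], expected[color]
    term = (o - e) ** 2 / e
    total += term
    print(f"{color:8} {o:3} {e:3} {o - e:4} {(o - e) ** 2:5} {term:6.2f}")

# The exact sum is 26.94; the worksheet rounds each term first and reports 26.95.
print(f"chi-square = {total:.2f}")
```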
2. After calculating the Chi Square value, find the "Degrees of Freedom." (DO NOT SQUARE
THE NUMBER YOU GET, NOR FIND THE SQUARE ROOT; THE NUMBER YOU GET FROM
COMPLETING THE CALCULATIONS AS ABOVE IS "CHI SQUARE.")
Degrees of freedom (df) refers to the number of values that are free to vary after a restriction has
been placed on the data. For instance, if you have four numbers with the restriction that their sum has
to be 50, then three of these numbers can be anything; they are free to vary, but the fourth number
definitely is restricted. For example, the first three numbers could be 15, 20, and 5, adding up to 40; then
the fourth number has to be 10 in order that they sum to 50. The degrees of freedom for these values are
then three. The degrees of freedom here is defined as N - 1, the number in the group minus one restriction
(4 - 1).
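The four-numbers example above can be expressed as a tiny Python sketch (the variable names are ours):

```python
# Four numbers constrained to sum to 50: three vary freely, the fourth is forced.
free = [15, 20, 5]        # chosen freely, summing to 40
forced = 50 - sum(free)   # must be 10; it is not free to vary
df = 4 - 1                # N - 1 = 3 degrees of freedom
print(forced, df)         # 10 3
```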
3. Find the table value for Chi Square. Begin by finding the df found in step 2 along the left-hand
side of the table. Run your finger across the proper row until you reach the predetermined level
of significance (.05) at the column heading on the top of the table. The table value for Chi
Square in the correct box of 4 df and the 0.05 level of significance is 9.49.
4. If the calculated chi-square value for the set of data you are analyzing (26.95) is equal to or
greater than the table value (9.49), reject the null hypothesis: there IS a significant
difference between the data sets that cannot be due to chance alone. If the number you
calculate is LESS than the number you find on the table, then you can probably say that any
differences are due to chance alone.
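The decision rule in step 4 is a single comparison. A Python sketch (the function name is ours):

```python
# Decision rule from step 4: reject H0 when the statistic meets or
# exceeds the critical (table) value at the chosen significance level.
def decide(chi_square_value, table_value):
    if chi_square_value >= table_value:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(26.94, 9.49))  # reject the null hypothesis
```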
In this situation, the rejection of the null hypothesis means that the differences between the
expected frequencies (based upon last year's car sales) and the observed frequencies (based upon this
year's poll taken by Thai) are not due to chance. That is, they are not due to chance variation in the
sample Thai took; there is a real difference between them. Therefore, in deciding what color
autos to stock, it would be to Thai's advantage to pay careful attention to the results of her poll!
The steps in using the chi-square test may be summarized as follows:

Chi-Square Test Summary
1. Write the observed frequencies in column O.
2. Figure the expected frequencies and write them in column E.
3. Use the formula to find the chi-square value.
4. Find the df (N - 1).
5. Find the table value (consult the Chi Square Table).
6. If your chi-square value is equal to or greater than the table value, reject the null
hypothesis: differences in your data are not due to chance alone. For example, the reasons
observed frequencies in a fruit fly genetic breeding lab did not match expected frequencies
could be due to such influences as:
• Mate selection (certain flies may prefer certain mates)
• Too small a sample size
• Incorrect identification of male or female flies
• The wrong genetic cross was sent from the lab
• The flies were mixed in the bottle (carrying unexpected alleles)
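The six summary steps can be combined into one routine. A Python sketch (names are ours; the critical values are the 0.05 column of the chi-square table for df 1 through 5):

```python
# The six summary steps as one routine.
# Critical values at the 0.05 level for df = 1..5, read from the table.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.82, 4: 9.49, 5: 11.07}

def chi_square_test(observed, expected):
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # steps 1-3
    df = len(observed) - 1                                            # step 4
    critical = CRITICAL_05[df]                                        # step 5
    return stat, df, stat >= critical                                 # step 6

stat, df, reject = chi_square_test([35, 50, 30, 10, 25], [30, 45, 15, 15, 45])
# df = 4, stat ≈ 26.94, reject = True
```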
Table B.2
Chi-Square Distribution

Degrees of                               Probability (p)
Freedom
(df)     0.95   0.90   0.80   0.70   0.50   0.30   0.20   0.10   0.05   0.01   0.001
 1       0.004  0.02   0.06   0.15   0.46   1.07   1.64   2.71   3.84   6.64   10.83
 2       0.10   0.21   0.45   0.71   1.39   2.41   3.22   4.60   5.99   9.21   13.82
 3       0.35   0.58   1.01   1.42   2.37   3.66   4.64   6.25   7.82   11.34  16.27
 4       0.71   1.06   1.65   2.20   3.36   4.88   5.99   7.78   9.49   13.28  18.47
 5       1.14   1.61   2.34   3.00   4.35   6.06   7.29   9.24   11.07  15.09  20.52
 6       1.63   2.20   3.07   3.83   5.35   7.23   8.56   10.64  12.59  16.81  22.46
 7       2.17   2.83   3.82   4.67   6.35   8.38   9.80   12.02  14.07  18.48  24.32
 8       2.73   3.49   4.59   5.53   7.34   9.52   11.03  13.36  15.51  20.09  26.12
 9       3.32   4.17   5.38   6.39   8.34   10.66  12.24  14.68  16.92  21.67  27.88
10       3.94   4.86   6.18   7.27   9.34   11.78  13.44  15.99  18.31  23.21  29.59

                        Nonsignificant                       Significant