Analysis of Variance, in its simplest form, can be thought of as an extension of the two-sample t-test to more than two samples. (Further methods in Chapter 8 of Business Statistics.)

As an example: Using the 2-sample t-test we have tested to see whether there was any difference between the size of invoices in a company's Leeds and Bradford stores. Using Analysis of Variance we can, simultaneously, investigate invoices from as many towns as we wish, assuming that sufficient data is available.

Problem: Why can't we just carry out repeated t-tests on pairs of the variables? If many independent tests are carried out pairwise then the probability of being correct for the combined results is greatly reduced. For example, if we compare the average marks of two students at the end of a semester to see if their mean scores are significantly different we would have, at a 5% level, 0.95 probability of being correct. Comparing more students:

Students                     2       3                 n                       10
Pairwise tests               1       3                 n(n-1)/2                45
P(all correct)               0.95    0.95^3 = 0.857    0.95^(n(n-1)/2)         0.95^45 = 0.10
P(at least one incorrect)    0.05    0.143             1 - 0.95^(n(n-1)/2)     0.90
Obviously this situation is not acceptable.

Solution: We therefore need methods of analysis which allow the variation between all n means to be tested simultaneously, giving an overall probability of 0.95 of being correct at the 5% level. This type of analysis is referred to as Analysis of Variance, or 'ANOVA' for short. In general, it:
    measures the overall variation within a variable;
    finds the variation between its group means;
    combines these to calculate a single test statistic;
    uses this to carry out a hypothesis test in the usual manner.
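The multiple-testing probabilities quoted above can be checked with a few lines of Python. This is a minimal sketch which, like the text, assumes that the pairwise tests are independent; the function name `pairwise_error_rates` is illustrative only.

```python
from math import comb


def pairwise_error_rates(groups, alpha=0.05):
    """Return (pairwise tests, P(all correct), P(at least one incorrect))."""
    tests = comb(groups, 2)                  # n(n-1)/2 distinct pairs
    p_all_correct = (1 - alpha) ** tests     # assumes independent tests
    return tests, p_all_correct, 1 - p_all_correct


for groups in (2, 3, 10):
    tests, p_ok, p_wrong = pairwise_error_rates(groups)
    print(f"{groups:>2} students, {tests:>2} tests: "
          f"P(all correct) = {p_ok:.3f}, P(at least one incorrect) = {p_wrong:.3f}")
```

With ten students the chance of at least one incorrect conclusion is already about 0.90, which is why a single simultaneous test is preferred.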
In general, Analysis of Variance, ANOVA, compares the variation between groups and the variation within samples by analysing their variances.

One-way ANOVA: Is there any difference between the average sales at various departmental stores within a company?

Two-way ANOVA: Is there any difference between the average sales at various stores within a company and/or the types of department? The overall variation is split 'two ways'.
One-way ANOVA
The total variation (SST) is split into:
    Variation due to difference between the groups, i.e. between the group means (SSG).
    Residual (error) variation not due to difference between the group means (SSE).

Two-way ANOVA
The total variation (SST) is split into:
    Variation due to difference between the groups, i.e. between the group means (SSG).
    Residual (error) variation not due to difference between the main group means (SSE1), which is split in turn into:
        Variation due to difference between the block means, i.e. second group means (SSBl).
        Genuine residual (error) variation not due to difference between either set of group means (SSE).
where SST = Total Sum of Squares; SSG = Treatment Sum of Squares between the groups; SSBl = Blocks Sum of Squares; SSE = Sum of Squares of Errors. (At this stage just think of 'sums of squares' as being a measure of variation.)

The method of measuring this variation is variance, which is standard deviation squared.

Total variance = between groups variance + variance due to the errors

It follows that:

Total sum of squares (SST) = Sum of squares between the groups (SSG) + Sum of squares due to the errors (SSE)
If we find any two of the three sums of squares then the other can be found by difference. In practice we calculate SST and SSG and then find SSE by difference. Since the method is much easier to understand with a numerical example, it will be explained in stages using theory and a numerical example simultaneously.
Example 1
One important factor in selecting software for word processing and database management systems is the time required to learn how to use a particular system. In order to evaluate three database management systems, a firm devised a test to see how many training hours were needed for five of its word processing operators to become proficient in each of three systems.

System A    16   19   14   13   18   hours
System B    16   17   13   12   17   hours
System C    24   22   19   18   22   hours
Using a 5% significance level, is there any difference between the training time needed for the three systems?

In this case the 'groups' are the three database management systems. These account for some, but not all, of the total variance. Some, however, is not explained by the difference between them. The residual variance is referred to as that due to the "errors".

Total variance = between systems variance + variance due to the errors

It follows that:

Total sum of squares (SST) = Sum of squares between systems (SSSys) + Sum of squares of errors (SSE)

The 'square' for each case is $(x - \bar{x})^2$ where $x$ is the value for that case and $\bar{x}$ is the mean. The 'total sum of squares' is therefore $\sum (x - \bar{x})^2$. The classical method for calculating this sum is to tabulate the values; subtract the mean from each value; square the results; and finally sum the squares. The use of a statistical calculator is preferable! In the lecture on summary statistics we saw that the standard deviation is calculated by:

$$ s_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad \text{or} \quad s_{n-1} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} $$

(the $\sigma_n$ and $\sigma_{n-1}$ keys on the calculator). Both methods give exactly the same value for the total sum of squares, since $\sum (x - \bar{x})^2 = n\,s_n^2 = (n-1)\,s_{n-1}^2$.

Total SS
Input all the data individually and output the values for $n$, $\bar{x}$ and $\sigma_n$ from the calculator in SD mode. Use these values to calculate $\sigma_n^2$ and $n\sigma_n^2$.

n     mean     σ_n      σ_n²     nσ_n²
15    17.33    3.419    11.69    175.3 = SS Total
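For readers without a statistical calculator, the total sum of squares can be reproduced in Python. The sketch below uses the training hours from Example 1 and the standard library's `pvariance`, which divides by n and therefore plays the role of $\sigma_n^2$; the variable names are illustrative.

```python
from statistics import fmean, pvariance

# Training hours from Example 1, one list per database management system.
hours = {
    "A": [16, 19, 14, 13, 18],
    "B": [16, 17, 13, 12, 17],
    "C": [24, 22, 19, 18, 22],
}

all_hours = [h for system in hours.values() for h in system]
n = len(all_hours)
grand_mean = fmean(all_hours)
var_n = pvariance(all_hours)      # population variance (divisor n)
ss_total = n * var_n              # equals the sum of (x - mean)^2

print(f"n = {n}, mean = {grand_mean:.2f}, variance = {var_n:.2f}, SST = {ss_total:.1f}")
```

The classical route, `sum((x - grand_mean) ** 2 for x in all_hours)`, gives the same total.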
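SSSys
Calculate $n$ and $\bar{x}$ for each of the systems separately:

            n    mean
System A    5    16
System B    5    15
System C    5    21

SS for Systems
Inputting the system means as frequency data (each mean with frequency n = 5) gives:

n     mean     σ_n      σ_n²     nσ_n²
15    17.33    2.625    6.889    103.3 = SSSys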
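The between-systems sum of squares can be checked in the same way. This sketch weights each system's squared deviation from the overall mean by its sample size, which is equivalent to entering the means as frequency data; the names used are illustrative.

```python
from statistics import fmean

hours = {
    "A": [16, 19, 14, 13, 18],
    "B": [16, 17, 13, 12, 17],
    "C": [24, 22, 19, 18, 22],
}

grand_mean = fmean([h for system in hours.values() for h in system])

# Each group mean counts once for every observation in that group.
ss_systems = sum(
    len(values) * (fmean(values) - grand_mean) ** 2
    for values in hours.values()
)

for name, values in hours.items():
    print(f"System {name}: n = {len(values)}, mean = {fmean(values):.0f}")
print(f"SSSys = {ss_systems:.1f}")
```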
Using the ANOVA table
Below is the general format of an analysis of variance table. If you find it helpful then make use of it, otherwise just work with the numbers, as in the example that follows.

General ANOVA Table (for k groups, total sample size N)

Source            S.S.    d.f.               M.S.S.               F
Between groups    SSG     k - 1              MSG = SSG/(k - 1)    F = MSG/MSE
Errors            SSE     (N - 1) - (k - 1)  MSE = SSE/(N - k)
Total             SST     N - 1

Method
    Fill in the total sum of squares, SST, and the between groups sum of squares, SSG, after calculation;
    find the sum of squares due to the errors, SSE, by difference;
    the degrees of freedom, d.f., for the total and the groups are one less than the total number of values and the number of groups respectively;
    find the error degrees of freedom by difference;
    the mean sum of squares, M.S.S., is found in each case by dividing the sum of squares, SS, by the corresponding degrees of freedom.
    The test statistic, F, is the ratio of the mean sum of squares due to the differences between the group means and that due to the errors.
In this example: (k = 3 systems, N = 15 values)

Source             S.S.     d.f.           M.S.S.               F
Between systems    103.3    3 - 1 = 2      103.3/2 = 51.65      51.65/6.00 = 8.61
Errors              72.0    14 - 2 = 12    72.0/12 = 6.00
Total              175.3    15 - 1 = 14
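The whole one-way table can be assembled programmatically by following the method described above: compute SST and the between-groups sum of squares, then obtain the error terms by difference. The function `one_way_anova` below is a hypothetical helper written for these notes, not a library routine.

```python
def one_way_anova(groups):
    """Build a one-way ANOVA table from a list of samples (one list per group)."""
    values = [x for group in groups for x in group]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total

    sst = sum((x - grand_mean) ** 2 for x in values)
    ssg = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    sse = sst - ssg                              # found by difference

    df_groups = k - 1
    df_errors = (n_total - 1) - df_groups        # found by difference
    msg, mse = ssg / df_groups, sse / df_errors
    return {"SSG": ssg, "SSE": sse, "SST": sst,
            "df_groups": df_groups, "df_errors": df_errors,
            "MSG": msg, "MSE": mse, "F": msg / mse}


table = one_way_anova([
    [16, 19, 14, 13, 18],   # System A
    [16, 17, 13, 12, 17],   # System B
    [24, 22, 19, 18, 22],   # System C
])
print(f"SSG = {table['SSG']:.1f}, SSE = {table['SSE']:.1f}, F = {table['F']:.2f}")
```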
The hypothesis test
The methodology for this hypothesis test is similar to that described last week.

The null hypothesis, H0, is that all the group means are equal. H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ etc.
The alternative hypothesis, H1, is that at least two of the group means are different.
The significance level is as stated, or 5% by default.
The critical value is from the F-tables, $F_\alpha(\nu_1, \nu_2)$, with the two degrees of freedom from the groups, $\nu_1$, and the errors, $\nu_2$.
The test statistic is the F-value calculated from the sample in the ANOVA table.
The conclusion is reached by comparing the test statistic with the critical value and rejecting the null hypothesis if the test statistic is the larger of the two.

Example 1 (cont.)
H0: $\mu_A = \mu_B = \mu_C$
H1: At least two of the means are different.
Critical value: $F_{0.05}(2, 12) = 3.89$ (degrees of freedom from 'between systems' and 'errors').
Test statistic: 8.61
Conclusion: T.S. > C.V. so reject H0. There is a difference between the mean learning times for at least two of the three database management systems.
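The decision rule amounts to a single comparison. The sketch below stores the 5% critical values quoted in these notes in a small lookup table rather than computing them; `F_CRITICAL_5PCT` and `f_test_conclusion` are illustrative names, and any other degrees of freedom would need to be read from the F-tables and added by hand.

```python
# 5% critical values quoted in these notes, keyed by (groups d.f., errors d.f.).
F_CRITICAL_5PCT = {(2, 12): 3.89, (2, 8): 4.46, (4, 8): 3.84}


def f_test_conclusion(test_statistic, df_groups, df_errors):
    """Return (reject_h0, critical_value) for a 5% F-test."""
    critical_value = F_CRITICAL_5PCT[(df_groups, df_errors)]
    return test_statistic > critical_value, critical_value


reject_h0, critical_value = f_test_conclusion(8.61, df_groups=2, df_errors=12)
verdict = "reject H0" if reject_h0 else "do not reject H0"
print(f"T.S. = 8.61, C.V. = {critical_value}: {verdict}")
```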
Where does any difference lie?
We can calculate a critical difference, CD, which depends on the MSE, the sample sizes and the significance level, such that any difference between means which exceeds the CD is significant and any less than it is not. The critical difference formula is:

$$ CD = t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} $$

where t has the error degrees of freedom and one tail, and MSE is taken from the ANOVA table.
Example 1 (cont.)

$$ CD = t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = 1.78 \sqrt{6.00 \left( \frac{1}{5} + \frac{1}{5} \right)} = 2.76 $$
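The critical difference and the resulting pairwise comparisons can be sketched as follows. The t-value of 1.78 and the MSE of 6.00 are those used in the calculation above; the system means are the sample means of the Example 1 data, and the helper name `critical_difference` is illustrative.

```python
from itertools import combinations
from math import sqrt


def critical_difference(t_value, mse, n1, n2):
    """CD = t * sqrt(MSE * (1/n1 + 1/n2))."""
    return t_value * sqrt(mse * (1 / n1 + 1 / n2))


system_means = {"A": 16, "B": 15, "C": 21}   # mean training hours, 5 operators each
cd = critical_difference(t_value=1.78, mse=6.00, n1=5, n2=5)

# A pair of systems differs significantly if its mean difference exceeds the CD.
significant_pairs = [
    (a, b) for a, b in combinations(system_means, 2)
    if abs(system_means[a] - system_means[b]) > cd
]
print(f"CD = {cd:.2f}")
print("Significantly different pairs:", significant_pairs)
```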
From the samples $\bar{x}_A = 16$, $\bar{x}_B = 15$, $\bar{x}_C = 21$. System C takes significantly longer to learn than Systems A and B: its mean exceeds theirs by 5 and 6 hours respectively, both greater than the CD of 2.76, whereas A and B differ by only 1 hour.

Two-way ANOVA
In the above example it might have been reasonable to suggest that the five Operators might have different learning speeds and were therefore responsible for some of the variation in the time needed to master the three Systems. By extending the analysis from one-way ANOVA to two-way ANOVA we can find out whether Operator variability is a significant factor or whether the differences found previously were just due to the Systems.

Example 2

                 Operators
             1     2     3     4     5
System A    16    19    14    13    18
System B    16    17    13    12    17
System C    24    22    19    18    22
Again we ask the same question: using a 5% level, is there any difference between the training time for the three systems?

We can use the Operator variation just to explain some of the unexplained error, thereby reducing it ('blocked' design), or we can consider it in a similar manner to the System variation in the last example in order to see if there is a difference between the Operators.

In the first case the 'groups' are the three database management systems and the 'blocks' being used to reduce the error are the different operators, who themselves may differ in speed of learning. In the second we have a second set of groups: the Operators.

Total variance = between systems variance + between operators variance + variance of errors

So:

Total sum of squares (SST) = Sum of squares between systems (SSSys) + Sum of squares between operators (SSOps) + Sum of squares of errors (SSE)
In 2-way ANOVA we find SST, SSSys and SSOps and then find SSE by difference:

SSE = SST - SSSys - SSOps

We already have SST and SSSys from 1-way ANOVA but still need to find SSOps.
Operator    System A    System B    System C    Mean
1           16          16          24          18.67
2           19          17          22          19.33
3           14          13          19          15.33
4           13          12          18          14.33
5           18          17          22          19.00
From Example 1: SST = 175.3 and SSSys = 103.3.

SSOps
Inputting the Operator means as frequency data (each mean with frequency n = 3) gives:

n = 15, mean = 17.33, $\sigma_n$ = 2.078 and $n\sigma_n^2$ = 64.7 = SSOps
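The between-operators sum of squares follows the same pattern as the between-systems one, with each operator mean weighted by the three systems that operator was timed on. A minimal sketch, with illustrative variable names:

```python
from statistics import fmean

# Training hours per operator on Systems A, B and C (Example 2).
hours_by_operator = {
    1: [16, 16, 24],
    2: [19, 17, 22],
    3: [14, 13, 19],
    4: [13, 12, 18],
    5: [18, 17, 22],
}

grand_mean = fmean([h for row in hours_by_operator.values() for h in row])
operator_means = {op: fmean(row) for op, row in hours_by_operator.items()}

ss_operators = sum(
    len(hours_by_operator[op]) * (mean - grand_mean) ** 2
    for op, mean in operator_means.items()
)

for op, mean in operator_means.items():
    print(f"Operator {op}: mean = {mean:.2f}")
print(f"SSOps = {ss_operators:.1f}")
```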
Two-way ANOVA table, including both Systems and Operators:

Source               S.S.     d.f.           M.S.S.               F
Between Systems      103.3    3 - 1 = 2      103.3/2 = 51.65      51.65/0.91 = 56.76
Between Operators     64.7    5 - 1 = 4      64.7/4 = 16.18       16.18/0.91 = 17.78
Errors                 7.3    14 - 6 = 8     7.3/8 = 0.91
Total                175.3    15 - 1 = 14
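The two-way table can be built with the same by-difference method. The function `two_way_anova` below is a hypothetical helper for a design with one observation per system and operator combination. Because it carries full precision throughout, its F values (about 56.4 and 17.6) differ slightly from the hand calculation, which rounds the error mean square to 0.91 before dividing; the conclusions are unaffected.

```python
def two_way_anova(table):
    """table[i][j] is the observation for group i (row) and block j (column)."""
    rows, cols = len(table), len(table[0])
    n_total = rows * cols
    grand_mean = sum(map(sum, table)) / n_total

    sst = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_rows = cols * sum((sum(row) / cols - grand_mean) ** 2 for row in table)
    ss_cols = rows * sum((sum(col) / rows - grand_mean) ** 2 for col in zip(*table))
    sse = sst - ss_rows - ss_cols                    # found by difference

    df_rows, df_cols = rows - 1, cols - 1
    df_errors = (n_total - 1) - df_rows - df_cols    # found by difference
    mse = sse / df_errors
    return {"SS_rows": ss_rows, "SS_cols": ss_cols, "SSE": sse, "SST": sst,
            "df_errors": df_errors, "MSE": mse,
            "F_rows": (ss_rows / df_rows) / mse,
            "F_cols": (ss_cols / df_cols) / mse}


result = two_way_anova([
    [16, 19, 14, 13, 18],   # System A, Operators 1 to 5
    [16, 17, 13, 12, 17],   # System B
    [24, 22, 19, 18, 22],   # System C
])
print(f"SSE = {result['SSE']:.1f}, MSE = {result['MSE']:.2f}")
print(f"F (systems) = {result['F_rows']:.2f}, F (operators) = {result['F_cols']:.2f}")
```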
Hypothesis test (1) for Systems
H0: $\mu_A = \mu_B = \mu_C$
H1: At least two of them are different.
Critical value: $F_{0.05}(2, 8) = 4.46$
Test statistic: 56.76 (Notice how the test statistic has increased, from 8.61, with the use of the more powerful two-way ANOVA.)
Conclusion: T.S. > C.V. so reject H0. There is a difference between at least two of the mean times needed for training on the different systems.
Using CD of 1.12 (see overhead): C takes significantly longer to learn than both A and B (differences of 5 and 6 hours), while A and B do not differ significantly (1 hour).
Hypothesis test (2) for Operators
H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
H1: At least two of them are different.
Critical value: $F_{0.05}(4, 8) = 3.84$
Test statistic: 17.78
Conclusion: T.S. > C.V. so reject H0. There is a difference between at least two of the Operators in the mean time needed for learning the systems.
Using CD of 1.45 calculated as previously (see overhead): Operators 3 and 4 are significantly quicker learners than Operators 1, 2 and 5.
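The two critical differences quoted above can be reproduced with the same formula as in Example 1. The notes do not state the t-value used; the sketch below assumes t = 1.86, the one-tailed 5% value for 8 degrees of freedom from standard t-tables, together with the rounded MSE of 0.91.

```python
from math import sqrt


def critical_difference(t_value, mse, n1, n2):
    """CD = t * sqrt(MSE * (1/n1 + 1/n2))."""
    return t_value * sqrt(mse * (1 / n1 + 1 / n2))


T_VALUE = 1.86   # assumption: one-tailed 5% t-value with 8 error d.f.
MSE = 0.91       # error mean square from the two-way table

cd_systems = critical_difference(T_VALUE, MSE, 5, 5)     # each system mean uses 5 values
cd_operators = critical_difference(T_VALUE, MSE, 3, 3)   # each operator mean uses 3 values

print(f"CD for systems   = {cd_systems:.2f}")
print(f"CD for operators = {cd_operators:.2f}")
```

The smallest gap between the slower operators (1, 2 and 5) and the quicker ones (3 and 4) is 18.67 - 15.33 = 3.34 hours, comfortably above the operator CD.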
Table 4    Critical values of F, 5% level
(columns: degrees of freedom for the groups, ν1; rows: degrees of freedom for the errors, ν2)

ν2      ν1 = 1     ν1 = 2     ν1 = 3
1       161.40     199.50     215.70
2        18.51      19.00      19.16
3        10.13       9.55       9.28
4         7.71       6.94       6.59
5         6.61       5.79       5.41
6         5.99       5.14       4.76
7         5.59       4.74       4.35
8         5.32       4.46       4.07
9         5.12       4.26       3.86
10        4.96       4.10       3.71
11        4.84       3.98       3.59
12        4.75       3.89       3.49
13        4.67       3.81       3.41
14        4.60       3.74       3.34
15        4.54       3.68       3.29
16        4.49       3.63       3.24
17        4.45       3.59       3.20
18        4.41       3.55       3.16
19        4.38       3.52       3.13
20        4.35       3.49       3.10
21        4.32       3.47       3.07
22        4.30       3.44       3.05
23        4.28       3.42       3.03
24        4.26       3.40       3.01
25        4.24       3.39       2.99
26        4.23       3.37       2.98
27        4.21       3.35       2.96
28        4.20       3.34       2.95
29        4.18       3.33       2.93
30        4.17       3.32       2.92
40        4.08       3.23       2.84
60        4.00       3.15       2.76
120       3.92       3.07       2.68
∞         3.84       3.00       2.60