
ANALYSIS OF VARIANCE (ANOVA)

Analysis of Variance, in its simplest form, can be thought of as an extension of the two-sample t-test to more than two samples. (Further methods are given in Chapter 8 of Business Statistics.)

As an example: Using the 2-sample t-test we have tested to see whether there was any difference between the size of invoices in a company's Leeds and Bradford stores. Using Analysis of Variance we can, simultaneously, investigate invoices from as many towns as we wish, assuming that sufficient data is available.

Problem: Why can't we just carry out repeated t-tests on pairs of the variables? If many independent tests are carried out pairwise then the probability of being correct for the combined results is greatly reduced. For example, if we compare the average marks of two students at the end of a semester to see if their mean scores are significantly different we would have, at a 5% level, 0.95 probability of being correct. Comparing more students:

Students                     2       3                 n                      10
Pairwise tests               1       3                 n(n-1)/2               45
P(all correct)               0.95    0.95^3 = 0.857    0.95^(n(n-1)/2)        0.95^45 = 0.10
P(at least one incorrect)    0.05    0.143             1 - 0.95^(n(n-1)/2)    0.90
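The figures for repeated pairwise tests can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `pairwise_test_risk` is made up for this example, and it assumes, as the text does, that the tests are independent and each is correct with probability 0.95.

```python
from math import comb

def pairwise_test_risk(n_groups, p_correct=0.95):
    """Return (number of pairwise tests, P(all correct), P(at least one incorrect))."""
    n_tests = comb(n_groups, 2)        # n(n-1)/2 distinct pairs
    p_all = p_correct ** n_tests       # assumes the tests are independent
    return n_tests, p_all, 1 - p_all

for n in (2, 3, 10):
    tests, p_all, p_err = pairwise_test_risk(n)
    print(f"{n:>2} students: {tests:>2} tests, "
          f"P(all correct) = {p_all:.3f}, "
          f"P(at least one incorrect) = {p_err:.3f}")
```

With ten students there are already 45 pairwise tests, and the chance that every one of them is correct falls to about 0.10.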

Obviously this situation is not acceptable.

Solution: We therefore need to use methods of analysis which will allow the variation between all n means to be tested simultaneously, giving an overall probability of 0.95 of being correct at the 5% level. This type of analysis is referred to as Analysis of Variance, or 'ANOVA' in short. In general, it:

    measures the overall variation within a variable;
    finds the variation between its group means;
    combines these to calculate a single test statistic;
    uses this to carry out a hypothesis test in the usual manner.

In general, Analysis of Variance, ANOVA, compares the variation between groups and the variation within samples by analysing their variances.

One-way ANOVA: Is there any difference between the average sales at various departmental stores within a company?

Two-way ANOVA: Is there any difference between the average sales at various stores within a company and/or the types of department? The overall variation is split 'two ways'.

One-way ANOVA

Total variation (SST)
    Variation due to difference between the groups, i.e. between the group means (SSG)
    Residual (error) variation not due to difference between the group means (SSE)

Two-way ANOVA

Total variation (SST)
    Variation due to difference between the groups, i.e. between the group means (SSG)
    Residual (error) variation not due to difference between the main group means (SSE1)
        Variation due to difference between the block means, i.e. second group means (SSBl)
        Genuine residual (error) variation not due to difference between either set of group means (SSE)

where SST = Total Sum of Squares; SSG = Treatment Sum of Squares between the groups; SSBl = Blocks Sum of Squares; SSE = Sum of Squares of Errors. (At this stage just think of 'sums of squares' as being a measure of variation.)

The method of measuring this variation is variance, which is standard deviation squared.

Total variance = between groups variance + variance due to the errors

It follows that:

Total sum of squares (SST) = Sum of squares between the groups (SSG) + Sum of squares due to the errors (SSE)

If we find any two of the three sums of squares then the other can be found by difference. In practice we calculate SST and SSG and then find SSE by difference. Since the method is much easier to understand with a numerical example, it will be explained in stages using theory and a numerical example simultaneously.

Example 1

One important factor in selecting software for word processing and database management systems is the time required to learn how to use a particular system. In order to evaluate three database management systems, a firm devised a test to see how many training hours were needed for five of its word processing operators to become proficient in each of three systems.

System A    16    19    14    13    18    hours
System B    16    17    13    12    17    hours
System C    24    22    19    18    22    hours

Using a 5% significance level, is there any difference between the training time needed for the three systems?

In this case the 'groups' are the three database management systems. These account for some, but not all, of the total variance. Some, however, is not explained by the difference between them. The residual variance is referred to as that due to the 'errors'.

Total variance = between systems variance + variance due to the errors.

It follows that:

Total sum of squares (SST) = Sum of squares between systems (SSSys) + Sum of squares of errors (SSE)

In practice we transpose this equation and use:

SSE = SST - SSSys

Calculation of Sums of Squares

The 'square' for each case is $(x - \bar{x})^2$ where $x$ is the value for that case and $\bar{x}$ is the mean. The 'total sum of squares' is therefore $\sum (x - \bar{x})^2$.

The classical method for calculating this sum is to tabulate the values; subtract the mean from each value; square the results; and finally sum the squares. The use of a statistical calculator is preferable! In the lecture on summary statistics we saw that the standard deviation is calculated by:

$$s_n = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} \quad\text{so}\quad \sum (x - \bar{x})^2 = n s_n^2$$

with $s_n$ from the calculator using [xσn], or

$$s_{n-1} = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} \quad\text{so}\quad \sum (x - \bar{x})^2 = (n-1) s_{n-1}^2$$

with $s_{n-1}$ from the calculator using [xσn-1]. Both methods give exactly the same value for the total sum of squares.

Total SS

Input all the data individually and output the values for $n$, $\bar{x}$ and $\sigma_n$ (the calculator's label for $s_n$) from the calculator in SD mode. Use these values to calculate $\sigma_n^2$ and $n\sigma_n^2$.

n     $\bar{x}$    $\sigma_n$    $\sigma_n^2$    $n\sigma_n^2$
15    17.33        3.419         11.69           175.3 = SS Total

SSSys

Calculate $n$ and $\bar{x}$ for each of the management systems separately:

            n    $\bar{x}$
System A    5    16
System B    5    15
System C    5    21

Input as frequency data (n = 5) in your calculator and output $n$, $\bar{x}$ and $\sigma_n$.

SS for Systems

n     $\bar{x}$    $\sigma_n$    $\sigma_n^2$    $n\sigma_n^2$
15    17.33        2.625         6.889           103.3 = SSSys

SSE is found by difference:

SSE = SST - SSSys = 175.3 - 103.3 = 72.0
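The same sums of squares can be checked directly from the definitions rather than with a calculator. The following is a minimal Python sketch under that assumption; the helper names `mean` and `sum_of_squares` and the dictionary `hours` are illustrative and not part of the original handout.

```python
def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sum_of_squares(values, centre):
    """Sum of squared deviations of the values from a given centre."""
    return sum((x - centre) ** 2 for x in values)

# Training hours from Example 1, one list per system
hours = {
    "A": [16, 19, 14, 13, 18],
    "B": [16, 17, 13, 12, 17],
    "C": [24, 22, 19, 18, 22],
}

all_values = [x for group in hours.values() for x in group]
grand_mean = mean(all_values)

# Total sum of squares: every value against the overall mean
sst = sum_of_squares(all_values, grand_mean)

# Between-systems sum of squares: each system mean against the overall
# mean, weighted by the number of values in that system
ss_sys = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in hours.values())

# Error sum of squares, found by difference
sse = sst - ss_sys

print(f"SST = {sst:.1f}, SSSys = {ss_sys:.1f}, SSE = {sse:.1f}")
# prints: SST = 175.3, SSSys = 103.3, SSE = 72.0
```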

Using the ANOVA table

Below is the general format of an analysis of variance table. If you find it helpful then make use of it, otherwise just work with the numbers, as in the example that follows.

General ANOVA Table (for k groups, total sample size N)

Source            S.S.    d.f.                 M.S.S.               F
Between groups    SSG     k - 1                MSG = SSG/(k - 1)    F = MSG/MSE
Errors            SSE     (N - 1) - (k - 1)    MSE = SSE/(N - k)
Total             SST     N - 1

Method

    Fill in the total sum of squares, SST, and the between groups sum of squares, SSG, after calculation;
    find the sum of squares due to the errors, SSE, by difference;
    the degrees of freedom, d.f., for the total and the groups are one less than the total number of values and the number of groups respectively;
    find the error degrees of freedom by difference;
    the mean sum of squares, M.S.S., is found in each case by dividing the sum of squares, SS, by the corresponding degrees of freedom.

The test statistic, F, is the ratio of the mean sum of squares due to the differences between the group means and that due to the errors.

In this example: (k = 3 systems, N = 15 values)

Source             S.S.     d.f.           M.S.S.             F
Between systems    103.3    3 - 1 = 2      103.3/2 = 51.65    51.65/6.00 = 8.61
Errors              72.0    14 - 2 = 12    72.0/12 = 6.00
Total              175.3    15 - 1 = 14
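The steps of the method can be written as a short function. This is a sketch only: the function name `one_way_anova_table` and the dictionary keys are invented for illustration, and the inputs are the rounded sums of squares quoted in the text.

```python
def one_way_anova_table(ss_total, ss_groups, k, n_total):
    """Fill in a one-way ANOVA table from SST and the between-groups SS.

    k is the number of groups and n_total the total number of values.
    """
    ss_error = ss_total - ss_groups      # errors found by difference
    df_groups = k - 1
    df_total = n_total - 1
    df_error = df_total - df_groups      # also found by difference
    ms_groups = ss_groups / df_groups    # mean sum of squares, groups
    ms_error = ss_error / df_error       # mean sum of squares, errors
    return {
        "SSE": ss_error,
        "df_groups": df_groups,
        "df_error": df_error,
        "df_total": df_total,
        "MSG": ms_groups,
        "MSE": ms_error,
        "F": ms_groups / ms_error,
    }

# Example 1: three systems, fifteen values
table = one_way_anova_table(175.3, 103.3, k=3, n_total=15)
print(f"MSG = {table['MSG']:.2f}, MSE = {table['MSE']:.2f}, F = {table['F']:.2f}")
```

Finding both SSE and the error degrees of freedom by difference mirrors the hand method, so the function needs only the two sums of squares that are calculated directly.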

The hypothesis test

The methodology for this hypothesis test is similar to that described last week.

The null hypothesis, H0, is that all the group means are equal. H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4$ etc.
The alternative hypothesis, H1, is that at least two of the group means are different.
The significance level is as stated, or 5% by default.
The critical value is from the F-tables, $F_\alpha(\nu_1, \nu_2)$, with the two degrees of freedom from the groups, $\nu_1$, and the errors, $\nu_2$.
The test statistic is the F-value calculated from the sample in the ANOVA table.
The conclusion is reached by comparing the test statistic with the critical value and rejecting the null hypothesis if the test statistic is the larger of the two.

Example 1 (cont.)

H0: $\mu_A = \mu_B = \mu_C$
H1: At least two of the means are different.
Critical value: $F_{0.05}(2, 12)$ = 3.89 (degrees of freedom from 'between systems' and 'errors').
Test statistic: 8.61
Conclusion: T.S. > C.V. so reject H0. There is a difference between the mean learning times for at least two of the three database management systems.
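The decision rule itself is a single comparison. As a hedged illustration, the sketch below stores the three 5% critical values that this handout uses in a small dictionary; `F_CRIT_5PCT` and `f_test_decision` are hypothetical names, and a fuller version would look values up in a complete F table or a statistics library.

```python
# 5% critical values quoted in this handout, keyed by
# (numerator d.f., denominator d.f.)
F_CRIT_5PCT = {
    (2, 12): 3.89,
    (2, 8): 4.46,
    (4, 8): 3.84,
}

def f_test_decision(f_stat, df_num, df_den, table=F_CRIT_5PCT):
    """Return (critical value, True if H0 is rejected)."""
    crit = table[(df_num, df_den)]
    return crit, f_stat > crit

crit, reject = f_test_decision(8.61, 2, 12)
print(f"T.S. = 8.61, C.V. = {crit}: "
      f"{'reject H0' if reject else 'do not reject H0'}")
# prints: T.S. = 8.61, C.V. = 3.89: reject H0
```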

Where does any difference lie?

We can calculate a critical difference, CD, which depends on the MSE, the sample sizes and the significance level, such that any difference between means which exceeds the CD is significant and any less than it is not. The critical difference formula is:

$$CD = t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$

t has the error degrees of freedom and one tail. MSE is taken from the ANOVA table.

Example 1 (cont.)

$$CD = t \sqrt{MSE \left( \frac{1}{n_1} + \frac{1}{n_2} \right)} = 1.78 \sqrt{6.00 \left( \frac{1}{5} + \frac{1}{5} \right)} = 2.76$$

From the samples $\bar{x}_A = 16$, $\bar{x}_B = 15$, $\bar{x}_C = 21$. System C takes significantly longer to learn than Systems A and B, which are similar.

Two-way ANOVA

In the above example it might have been reasonable to suggest that the five Operators might have different learning speeds and were therefore responsible for some of the variation in the time needed to master the three Systems. By extending the analysis from one-way ANOVA to two-way ANOVA we can find out whether Operator variability is a significant factor or whether the differences found previously were just due to the Systems.

Example 2

                  Operators
            1     2     3     4     5
System A    16    19    14    13    18
System B    16    17    13    12    17
System C    24    22    19    18    22

Again we ask the same question: using a 5% level, is there any difference between the training time for the three systems?

We can use the Operator variation just to explain some of the unexplained error, thereby reducing it ('blocked' design), or we can consider it in a similar manner to the System variation in the last example in order to see if there is a difference between the Operators.

In the first case the 'groups' are the three database management systems and the 'blocks' being used to reduce the error are the different operators, who themselves may differ in speed of learning. In the second we have a second set of groups: the Operators.

Total variance = between systems variance + between operators variance + variance of errors.

So:

Total sum of squares (SST) = Sum of squares between systems (SSSys) + Sum of squares between operators (SSOps) + Sum of squares of errors (SSE)

In 2-way ANOVA we find SST, SSSys and SSOps, and then find SSE by difference.

SSE = SST - SSSys - SSOps

We already have SST and SSSys from 1-way ANOVA but still need to find SSOps.

                  Operators
            1        2        3        4        5        Means
System A    16       19       14       13       18       16.00
System B    16       17       13       12       17       15.00
System C    24       22       19       18       22       21.00
Means       18.67    19.33    15.33    14.33    19.00    17.33

From Example 1: SST = 175.3 and SSSys = 103.3

SSOps

Inputting the Operator means as frequency data (n = 3) gives:

n = 15, $\bar{x}$ = 17.33, $\sigma_n$ = 2.076 and $n\sigma_n^2$ = 64.7 = SSOps

Two-way ANOVA table, including both Systems and Operators:

Source               S.S.     d.f.           M.S.S.             F
Between Systems      103.3    3 - 1 = 2      103.3/2 = 51.65    51.65/0.91 = 56.76
Between Operators     64.7    5 - 1 = 4      64.7/4 = 16.18     16.18/0.91 = 17.78
Errors                 7.3    14 - 6 = 8     7.3/8 = 0.91
Total                175.3    15 - 1 = 14
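The whole two-way calculation can be reproduced from the raw data. The sketch below is illustrative rather than the handout's own method: it works from the definitions instead of calculator output, and the variable names are assumptions. Because it carries full precision, its F values (about 56.4 and 17.6) differ slightly from the table's 56.76 and 17.78, which divide by the rounded error mean square 0.91.

```python
# Training hours: rows are Systems A, B, C; columns are Operators 1 to 5
hours = [
    [16, 19, 14, 13, 18],
    [16, 17, 13, 12, 17],
    [24, 22, 19, 18, 22],
]

r = len(hours)         # number of systems
c = len(hours[0])      # number of operators
grand_mean = sum(map(sum, hours)) / (r * c)
system_means = [sum(row) / c for row in hours]
operator_means = [sum(col) / r for col in zip(*hours)]

# Sums of squares
sst = sum((x - grand_mean) ** 2 for row in hours for x in row)
ss_sys = c * sum((m - grand_mean) ** 2 for m in system_means)
ss_ops = r * sum((m - grand_mean) ** 2 for m in operator_means)
sse = sst - ss_sys - ss_ops            # errors found by difference

# Degrees of freedom
df_sys, df_ops = r - 1, c - 1
df_err = (r * c - 1) - df_sys - df_ops

# Mean sums of squares and the two F statistics
ms_err = sse / df_err
f_sys = (ss_sys / df_sys) / ms_err
f_ops = (ss_ops / df_ops) / ms_err

print(f"SSOps = {ss_ops:.1f}, SSE = {sse:.1f}, MSE = {ms_err:.3f}")
print(f"F(systems) = {f_sys:.2f}, F(operators) = {f_ops:.2f}")
```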

Hypothesis test (1) for Systems

H0: $\mu_A = \mu_B = \mu_C$
H1: At least two of them are different.
Critical value: $F_{0.05}(2, 8)$ = 4.46
Test statistic: 56.76 (Notice how the test statistic has increased with the use of the more powerful two-way ANOVA.)
Conclusion: T.S. > C.V. so reject H0. There is a difference between at least two of the mean times needed for training on the different systems.

Using CD of 1.12 (see overhead): C takes significantly longer to learn than A and/or B.

Hypothesis test (2) for Operators

H0: $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$
H1: At least two of them are different.
Critical value: $F_{0.05}(4, 8)$ = 3.84
Test statistic: 17.78
Conclusion: T.S. > C.V. so reject H0. There is a difference between at least two of the Operators in the mean time needed for learning the systems.

Using CD of 1.45 calculated as previously (see overhead): Operators 3 and 4 are significantly quicker learners than Operators 1, 2 and 5.
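The critical differences quoted in both examples follow from the same formula, CD = t × sqrt(MSE × (1/n1 + 1/n2)). The sketch below is a hedged illustration: `critical_difference` is a made-up name, the value t = 1.78 (12 d.f.) is the one used in Example 1, and t = 1.86 (8 d.f., one tail, 5%) is an assumed t-table value that the text does not state explicitly.

```python
from math import sqrt

def critical_difference(t_crit, mse, n1, n2):
    """Smallest difference between two group means that is significant."""
    return t_crit * sqrt(mse * (1 / n1 + 1 / n2))

# Example 1 (one-way): MSE = 6.00, five values per system
cd_one_way = critical_difference(1.78, 6.00, 5, 5)

# Example 2 (two-way): MSE = 0.91, t = 1.86 is an assumed table value
cd_systems = critical_difference(1.86, 0.91, 5, 5)    # 5 values per system
cd_operators = critical_difference(1.86, 0.91, 3, 3)  # 3 values per operator

print(f"CD one-way = {cd_one_way:.2f}")      # 2.76
print(f"CD systems = {cd_systems:.2f}")      # 1.12
print(f"CD operators = {cd_operators:.2f}")  # 1.45

# Compare every pair of system means against the two-way CD
system_means = {"A": 16.0, "B": 15.0, "C": 21.0}
names = sorted(system_means)
significant_pairs = [
    (a, b)
    for i, a in enumerate(names)
    for b in names[i + 1:]
    if abs(system_means[a] - system_means[b]) > cd_systems
]
print(significant_pairs)  # [('A', 'C'), ('B', 'C')]
```

Only the pairs involving System C exceed the critical difference, which agrees with the conclusion that C takes significantly longer to learn.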

Table 4

PERCENTAGE POINTS OF THE F DISTRIBUTION

α = 5%

Columns: numerator degrees of freedom. Rows: denominator degrees of freedom.

 d.f.       1        2        3        4        5        6        7        8        9
   1     161.40   199.50   215.70   224.60   230.20   234.00   236.80   238.90   240.50
   2      18.51    19.00    19.16    19.25    19.30    19.33    19.35    19.37    19.38
   3      10.13     9.55     9.28     9.12     9.01     8.94     8.89     8.85     8.81
   4       7.71     6.94     6.59     6.39     6.26     6.16     6.09     6.04     6.00
   5       6.61     5.79     5.41     5.19     5.05     4.95     4.88     4.82     4.77
   6       5.99     5.14     4.76     4.53     4.39     4.28     4.21     4.15     4.10
   7       5.59     4.74     4.35     4.12     3.97     3.87     3.79     3.73     3.68
   8       5.32     4.46     4.07     3.84     3.69     3.58     3.50     3.44     3.39
   9       5.12     4.26     3.86     3.63     3.48     3.37     3.29     3.23     3.18
  10       4.96     4.10     3.71     3.48     3.33     3.22     3.14     3.07     3.02
  11       4.84     3.98     3.59     3.36     3.20     3.09     3.01     2.95     2.90
  12       4.75     3.89     3.49     3.26     3.11     3.00     2.91     2.85     2.80
  13       4.67     3.81     3.41     3.18     3.03     2.92     2.83     2.77     2.71
  14       4.60     3.74     3.34     3.11     2.96     2.85     2.76     2.70     2.65
  15       4.54     3.68     3.29     3.06     2.90     2.79     2.71     2.64     2.59
  16       4.49     3.63     3.24     3.01     2.85     2.74     2.66     2.59     2.54
  17       4.45     3.59     3.20     2.96     2.81     2.70     2.61     2.55     2.49
  18       4.41     3.55     3.16     2.93     2.77     2.66     2.58     2.51     2.46
  19       4.38     3.52     3.13     2.90     2.74     2.63     2.54     2.48     2.42
  20       4.35     3.49     3.10     2.87     2.71     2.60     2.51     2.45     2.39
  21       4.32     3.47     3.07     2.84     2.68     2.57     2.49     2.42     2.37
  22       4.30     3.44     3.05     2.82     2.66     2.55     2.46     2.40     2.34
  23       4.28     3.42     3.03     2.80     2.64     2.53     2.44     2.37     2.32
  24       4.26     3.40     3.01     2.78     2.62     2.51     2.42     2.36     2.30
  25       4.24     3.39     2.99     2.76     2.60     2.49     2.40     2.34     2.28
  26       4.23     3.37     2.98     2.74     2.59     2.47     2.39     2.32     2.27
  27       4.21     3.35     2.96     2.73     2.57     2.46     2.37     2.31     2.25
  28       4.20     3.34     2.95     2.71     2.56     2.45     2.36     2.29     2.24
  29       4.18     3.33     2.93     2.70     2.55     2.43     2.35     2.28     2.22
  30       4.17     3.32     2.92     2.69     2.53     2.42     2.33     2.27     2.21
  40       4.08     3.23     2.84     2.61     2.45     2.34     2.25     2.18     2.12
  60       4.00     3.15     2.76     2.53     2.37     2.25     2.17     2.10     2.04
 120       3.92     3.07     2.68     2.45     2.29     2.17     2.09     2.02     1.96
   ∞       3.84     3.00     2.60     2.37     2.21     2.10     2.01     1.94     1.88
