Analysis of Variance

ANALYSIS OF VARIANCE (ANOVA)
ANOVA is a technique for testing the significance of the differences among more than two sample
means.
Assumptions in ANOVA:
1. Each of the samples is drawn from a normal population.
2. The variances of the populations from which the samples have been drawn are equal.
3. The variation of each value around its mean is independent of the variation of every other
value.
Basic steps in ANOVA:
1. Determine one estimate of the population variance from the variance among the sample means.
2. Determine a second estimate of the population variance from the variance within the samples.
3. Compare the two estimates. If they are approximately equal in value, accept the null
hypothesis.
Analysis of Variance Table (One-Way Classification)

Null hypothesis: The samples are drawn from the same population.

Source of variation   Sum of squares   Degrees of freedom   Mean square            F-ratio
Between samples       SSC              ν1 = K - 1           MSC = SSC / (K - 1)    F_C = MSC / MSE
Within samples        SSE              ν2 = N - K           MSE = SSE / (N - K)

Here, K = number of samples and N = total number of items in the given data.

Conclusion: If F_C < F_T, the difference is not significant.
If F_C > F_T, the difference is significant.
(F_C is the calculated F-ratio and F_T is the tabulated value of F for (ν1, ν2) degrees of freedom.)

Calculation Procedure:
1. Sum of all items: $T = \sum X_1 + \sum X_2 + \cdots$
2. Correction factor: $\text{C.F.} = \dfrac{T^2}{N}$
3. Total sum of squares: $\text{TSS} = \sum X_1^2 + \sum X_2^2 + \cdots - \dfrac{T^2}{N}$
4. Sum of squares between samples: $\text{SSC} = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \cdots - \dfrac{T^2}{N}$
5. $\text{MSC} = \dfrac{\text{SSC}}{\text{df}} = \dfrac{\text{SSC}}{K - 1}$
6. Sum of squares within samples: $\text{SSE} = \text{TSS} - \text{SSC}$
7. $\text{MSE} = \dfrac{\text{SSE}}{\text{df}} = \dfrac{\text{SSE}}{N - K}$
1. A common test was given to a number of students taken at random from a particular class of
each of the four departments concerned, to assess the significance of possible variation in
performance. Carry out an analysis of variance on the following data. (Take the level of
significance as 5%.)

Departments
  C     M     E     I
  9    12    17    13
 10    13    17    12
 13    11    15    12
  9    14     9    18
  9     5     7    15
Solution: Here N = 20, K = 4 and n = 5 (number of items in each sample).

          Sample I         Sample II        Sample III       Sample IV
         X1     X1²       X2     X2²       X3     X3²       X4     X4²
          9      81       12     144       17     289       13     169
         10     100       13     169       17     289       12     144
         13     169       11     121       15     225       12     144
          9      81       14     196        9      81       18     324
          9      81        5      25        7      49       15     225
Total    50     512       55     655       65     933       70    1006

Step 1: Sum of all items: $T = \sum X_1 + \sum X_2 + \sum X_3 + \sum X_4 = 50 + 55 + 65 + 70 = 240$
Step 2: Correction factor: $\text{C.F.} = \dfrac{T^2}{N} = \dfrac{(240)^2}{20} = 2880$
Step 3: Total sum of squares (TSS) = sum of squares of all items - C.F.
$= \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - \dfrac{T^2}{N} = 512 + 655 + 933 + 1006 - 2880 = 226$
Step 4: Sum of squares between samples:
$\text{SSC} = \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \dfrac{(\sum X_3)^2}{n_3} + \dfrac{(\sum X_4)^2}{n_4} - \dfrac{T^2}{N}$
$= \dfrac{(50)^2}{5} + \dfrac{(55)^2}{5} + \dfrac{(65)^2}{5} + \dfrac{(70)^2}{5} - 2880 = 2930 - 2880 = 50$
Step 5: Mean square between samples: $\text{MSC} = \dfrac{\text{SSC}}{\text{df}} = \dfrac{50}{4 - 1} = 16.67$
Step 6: Sum of squares within samples: $\text{SSE} = \text{TSS} - \text{SSC} = 226 - 50 = 176$
Step 7: Mean square within samples: $\text{MSE} = \dfrac{\text{SSE}}{\text{df}} = \dfrac{176}{20 - 4} = 11$

Source of variation   Sum of squares   Degrees of freedom          Mean square                    F-ratio
Between samples       SSC = 50         ν1 = K - 1 = 3              MSC = SSC / (K - 1) = 16.67    F_C = MSC / MSE = 1.515
Within samples        SSE = 176        ν2 = N - K = 20 - 4 = 16    MSE = SSE / (N - K) = 11

Tabulated value of F for (3, 16) df at 5% level of significance = 3.24
Calculated value = 1.515
Conclusion: Calculated value < Tabulated value.
So, we accept the null hypothesis.
Therefore, the samples could have come from the same population.
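The calculation procedure above can be sketched in a few lines of Python. This is only an
illustrative sketch: the function name one_way_anova and the dictionary keys are assumptions made
for this example, not part of the notes. It follows the seven steps literally (correction factor,
TSS, SSC, SSE, then the mean squares) and is applied to the department data of Problem 1.

```python
def one_way_anova(samples):
    """One-way ANOVA by the correction-factor method.

    `samples` is a list of samples, each a list of numbers.
    (Hypothetical helper written for illustration only.)
    """
    k = len(samples)                                   # K, number of samples
    n_total = sum(len(s) for s in samples)             # N, total number of items
    grand_total = sum(sum(s) for s in samples)         # Step 1: T
    cf = grand_total ** 2 / n_total                    # Step 2: C.F. = T^2 / N
    tss = sum(x * x for s in samples for x in s) - cf  # Step 3: TSS
    ssc = sum(sum(s) ** 2 / len(s) for s in samples) - cf  # Step 4: SSC
    sse = tss - ssc                                    # Step 6: SSE
    msc = ssc / (k - 1)                                # Step 5: MSC
    mse = sse / (n_total - k)                          # Step 7: MSE
    return {
        "CF": cf, "TSS": tss, "SSC": ssc, "SSE": sse,
        "MSC": msc, "MSE": mse, "F": msc / mse,
        "df": (k - 1, n_total - k),
    }


# Data of Problem 1, one list per department.
departments = {
    "C": [9, 10, 13, 9, 9],
    "M": [12, 13, 11, 14, 5],
    "E": [17, 17, 15, 9, 7],
    "I": [13, 12, 12, 18, 15],
}
result = one_way_anova(list(departments.values()))
for name in ("CF", "TSS", "SSC", "SSE", "MSC", "MSE", "F"):
    print(f"{name:>3} = {result[name]:.3f}")
print("df  =", result["df"])
```

Run on the department data, the sketch gives SSC = 50, SSE = 176 and F ≈ 1.515 with (3, 16) df,
which agrees with the hand calculation. The tabulated value of F still has to be looked up
separately.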
2. Three different machines are used for a production. On the basis of the outputs, set up a
one-way ANOVA table and test whether the machines are equally effective.

Outputs
Machine I   Machine II   Machine III
   10            9            20
   15            7            16
   11            5            10
   10            6            14

Given that the value of F at 5% level of significance for (2, 9) df is 4.26.
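Solution:
Null hypothesis: The three machines are equally effective.
Here N = 12, K = 3 and n = 4. The machine totals are 46, 27 and 60, so T = 133.
$\text{C.F.} = \dfrac{T^2}{N} = \dfrac{(133)^2}{12} = 1474.08$
$\text{TSS} = 546 + 191 + 952 - 1474.08 = 214.92$
$\text{SSC} = \dfrac{(46)^2}{4} + \dfrac{(27)^2}{4} + \dfrac{(60)^2}{4} - 1474.08 = 1611.25 - 1474.08 = 137.17$
$\text{SSE} = \text{TSS} - \text{SSC} = 214.92 - 137.17 = 77.75$

Source of variation   Sum of squares   Degrees of freedom         Mean square                    F-ratio
Between samples       SSC = 137.17     ν1 = K - 1 = 3 - 1 = 2     MSC = SSC / (K - 1) = 68.58    F_C = MSC / MSE = 7.94
Within samples        SSE = 77.75      ν2 = N - K = 12 - 3 = 9    MSE = SSE / (N - K) = 8.64

Conclusion: Calculated value (7.94) > Tabulated value (4.26), so we reject the null hypothesis.
The three machines are not equally effective.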
Analysis of Variance for the Two-Way Classification Model:
When two independent factors might affect the variable of interest, a test can be designed so that
an analysis of variance tests the effects of both factors simultaneously. Such a test is called a
two-way classification ANOVA (or a two-factor ANOVA).
Analysis of Variance Table (Two-Way Classification)

Source of variation                         Sum of squares   Degrees of freedom   Mean square                      F-ratio
Between columns (k = number of columns)     SSC              k - 1                MSC = SSC / (k - 1)              F_C = MSC / MSE
Between rows (r = number of rows)           SSR              r - 1                MSR = SSR / (r - 1)              F_R = MSR / MSE
Residual (or) error                         SSE              (k - 1)(r - 1)       MSE = SSE / ((r - 1)(k - 1))

Conclusion: If F_C < F_T (or F_R < F_T), the null hypothesis is accepted.
If F_C > F_T (or F_R > F_T), the null hypothesis is rejected.
1. Three breeds of cattle A, B and C were fed on 4 different rations P, Q, R and S. The following
table gives the gains in weight. Test whether there is any significant difference between breeds
and between rations at 5% level of significance.

Breed     Rations
          P    Q    R    S
1         6    3    2    9
2         1    3    8    7
3         7    3    5    2

Null hypothesis: i) There is no significant difference between breeds.
ii) There is no significant difference between rations.
Breed     Rations                  Total
          P    Q    R    S
1         6    3    2    9          20
2         1    3    8    7          19
3         7    3    5    2          17
Total    14    9   15   18          56 (T)
Step 1: Total T = 56
Step 2: Correction factor: $\text{C.F.} = \dfrac{T^2}{N} = \dfrac{(56)^2}{12} = 261.33$
Step 3: SSC = sum of squares between columns (rations)
$= \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \cdots - \dfrac{T^2}{N}$
$= \dfrac{(14)^2}{3} + \dfrac{(9)^2}{3} + \dfrac{(15)^2}{3} + \dfrac{(18)^2}{3} - 261.33 = 14$
Step 4: SSR = sum of squares between rows (breeds)
$= \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \cdots - \dfrac{T^2}{N}$
$= \dfrac{(20)^2}{4} + \dfrac{(19)^2}{4} + \dfrac{(17)^2}{4} - 261.33 = 1.17$
Step 5: Total sum of squares (TSS) = sum of squares of all values - C.F.
$= \sum X_1^2 + \sum X_2^2 + \cdots - \dfrac{T^2}{N}$
$= (6)^2 + (3)^2 + (2)^2 + (9)^2 + \cdots + (2)^2 - 261.33 = 340 - 261.33 = 78.67$
Step 6: SSE = residual
$= \text{TSS} - (\text{SSC} + \text{SSR}) = 78.67 - (14 + 1.17) = 63.5$
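Source of variation                         Sum of squares   Degrees of freedom     Mean square                              F-ratio
Between columns (k = number of columns)     SSC = 14         k - 1 = 4 - 1 = 3      MSC = SSC / (k - 1) = 4.67               F_C = MSE / MSC = 2.27
Between rows (r = number of rows)           SSR = 1.17       r - 1 = 3 - 1 = 2      MSR = SSR / (r - 1) = 0.585              F_R = MSE / MSR = 18.09
Residual (or) error                         SSE = 63.5       (k - 1)(r - 1) = 6     MSE = SSE / ((r - 1)(k - 1)) = 10.58

Since MSC and MSR are both smaller than MSE here, each ratio is formed with the larger mean square
in the numerator, and the degrees of freedom are interchanged accordingly.
Tabulated value: i) (6, 3) df at 5% level is 8.94
ii) (6, 2) df at 5% level is 19.3
Conclusion: i) CV < TV. Accept H0: there is no significant difference between rations.
ii) CV < TV. Accept H0: there is no significant difference between breeds.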
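The six steps of the two-way calculation can be sketched in Python as follows. This is a minimal
sketch for one observation per cell; the function name two_way_anova and the dictionary keys are
assumptions made for this example. It returns the sums of squares and mean squares, with the
F-ratios formed as MSC/MSE and MSR/MSE as in the general table (the worked solution above inverts
a ratio when it falls below 1).

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell) by the correction-factor method.

    `table[i][j]` is the observation in row i and column j.
    (Hypothetical helper written for illustration only.)
    """
    r = len(table)                      # number of rows
    k = len(table[0])                   # number of columns
    n_total = r * k                     # N
    grand_total = sum(sum(row) for row in table)          # Step 1: T
    cf = grand_total ** 2 / n_total                       # Step 2: C.F.
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    row_totals = [sum(row) for row in table]
    ssc = sum(t * t for t in col_totals) / r - cf         # Step 3: SSC
    ssr = sum(t * t for t in row_totals) / k - cf         # Step 4: SSR
    tss = sum(x * x for row in table for x in row) - cf   # Step 5: TSS
    sse = tss - (ssc + ssr)                               # Step 6: SSE
    msc = ssc / (k - 1)
    msr = ssr / (r - 1)
    mse = sse / ((r - 1) * (k - 1))
    return {
        "CF": cf, "SSC": ssc, "SSR": ssr, "TSS": tss, "SSE": sse,
        "MSC": msc, "MSR": msr, "MSE": mse,
        "F_C": msc / mse, "F_R": msr / mse,
    }


# Gains in weight: rows are breeds 1-3, columns are rations P, Q, R, S.
gains = [
    [6, 3, 2, 9],
    [1, 3, 8, 7],
    [7, 3, 5, 2],
]
out = two_way_anova(gains)
for name in ("CF", "SSC", "SSR", "TSS", "SSE", "MSC", "MSR", "MSE"):
    print(f"{name:>3} = {out[name]:.3f}")
```

For the cattle data this gives SSC = 14, SSR ≈ 1.17, TSS ≈ 78.67 and SSE = 63.5, matching Steps 3
to 6.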
2. The following data represent the number of units of production per day turned out by 5
different workers using 4 different types of machines.

Workers   Machine type
           A    B    C    D
1         44   38   47   36
2         46   40   52   43
3         34   36   44   32
4         43   38   46   33
5         38   42   49   39

a) Test whether the mean production is the same for the different machine types.
b) Test whether the 5 men differ with respect to mean productivity.
Null hypothesis: i) The mean productivity is the same for the four different machines.
ii) The 5 men do not differ with respect to mean productivity.
To simplify the arithmetic, code the data by subtracting 40 from each value.
Workers   Machine type               Total
           A    B    C    D
1          4   -2    7   -4            5
2          6    0   12    3           21
3         -6   -4    4   -8          -14
4          3   -2    6   -7            0
5         -2    2    9   -1            8
Total      5   -6   38  -17       T = 20
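Coding the data by subtracting a constant does not change any of the sums of squares, which is why
the coded table can be used directly. A short derivation, under the assumption that every value is
coded as $y = x - a$ (here $a = 40$), with $N = rk$ observations, column totals $C_j$ and grand
total $T$ of the original data:

```latex
\begin{align*}
\sum y &= T - Na, \qquad \sum y^2 = \sum x^2 - 2aT + Na^2 \\
\text{TSS}_y &= \sum y^2 - \frac{(T - Na)^2}{N}
  = \Bigl(\sum x^2 - 2aT + Na^2\Bigr) - \Bigl(\frac{T^2}{N} - 2aT + Na^2\Bigr)
  = \sum x^2 - \frac{T^2}{N} = \text{TSS}_x \\
\text{SSC}_y &= \sum_{j=1}^{k} \frac{(C_j - ra)^2}{r} - \frac{(T - Na)^2}{N}
  = \Bigl(\sum_{j=1}^{k} \frac{C_j^2}{r} - 2aT + Na^2\Bigr) - \Bigl(\frac{T^2}{N} - 2aT + Na^2\Bigr)
  = \text{SSC}_x
\end{align*}
```

The same argument with row totals gives $\text{SSR}_y = \text{SSR}_x$, so SSE and both F-ratios are
unchanged as well.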
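Step 1: Total T = 20
Step 2: Correction factor: $\text{C.F.} = \dfrac{T^2}{N} = \dfrac{400}{20} = 20$
Step 3: SSC = sum of squares between columns (machines)
$= \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \cdots - \dfrac{T^2}{N}$
$= \dfrac{(5)^2}{5} + \dfrac{(-6)^2}{5} + \dfrac{(38)^2}{5} + \dfrac{(-17)^2}{5} - 20 = 338.8$
Step 4: SSR = sum of squares between rows (workers)
$= \dfrac{(\sum X_1)^2}{n_1} + \dfrac{(\sum X_2)^2}{n_2} + \cdots - \dfrac{T^2}{N}$
$= \dfrac{(5)^2}{4} + \dfrac{(21)^2}{4} + \dfrac{(-14)^2}{4} + \dfrac{(0)^2}{4} + \dfrac{(8)^2}{4} - 20 = 161.5$
Step 5: Total sum of squares (TSS) = sum of squares of all values - C.F.
$= \sum X_1^2 + \sum X_2^2 + \cdots - \dfrac{T^2}{N} = 594 - 20 = 574$
Step 6: SSE = residual
$= \text{TSS} - (\text{SSC} + \text{SSR}) = 574 - 338.8 - 161.5 = 73.7$
Step 7: MSC = mean square between columns $= \dfrac{\text{SSC}}{\text{df}} = \dfrac{338.8}{3} = 112.933$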
Source of variation                         Sum of squares   Degrees of freedom      Mean square                              F-ratio
Between columns (k = number of columns)     SSC = 338.8      k - 1 = 4 - 1 = 3       MSC = SSC / (k - 1) = 112.933            F_C = MSC / MSE = 18.39
Between rows (r = number of rows)           SSR = 161.5      r - 1 = 5 - 1 = 4       MSR = SSR / (r - 1) = 40.375             F_R = MSR / MSE = 6.574
Residual (or) error                         SSE = 73.7       (k - 1)(r - 1) = 12     MSE = SSE / ((r - 1)(k - 1)) = 6.142

Tabulated value: i) (3, 12) df at 5% level is 3.49
ii) (4, 12) df at 5% level is 3.26
Conclusion: i) CV > TV. Reject H0: the mean productivity is not the same for the four different
types of machines.
ii) CV > TV. Reject H0: the workers differ with respect to mean productivity.
3. The following table gives monthly sales (in thousand rupees) of a certain firm in three states
by its four salesmen.

States    Salesmen
           I    II   III   IV
A          6     5     3    8
B          8     9     6    5
C         10     7     8    7

Set up the analysis of variance table and test whether there is any significant difference
i) between sales by the firm's salesmen and ii) between sales in the three states.
Solution:
Null hypothesis: i) There is no significant difference between the sales by the firm's salesmen.
ii) There is no significant difference between sales in the three states.
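Column totals (salesmen): 24, 21, 17, 20. Row totals (states): 22, 28, 32. T = 82, N = 12.
$\text{C.F.} = \dfrac{(82)^2}{12} = 560.33$, $\text{TSS} = 602 - 560.33 = 41.67$

Source of variation                         Sum of squares   Degrees of freedom     Mean square                              F-ratio
Between columns (k = number of columns)     SSC = 8.33       k - 1 = 4 - 1 = 3      MSC = SSC / (k - 1) = 2.778              F_C = MSE / MSC = 1.24
Between rows (r = number of rows)           SSR = 12.67      r - 1 = 3 - 1 = 2      MSR = SSR / (r - 1) = 6.333              F_R = MSR / MSE = 1.84
Residual (or) error                         SSE = 20.67      (k - 1)(r - 1) = 6     MSE = SSE / ((r - 1)(k - 1)) = 3.444

Since MSC is smaller than MSE, F_C is formed as MSE / MSC with (6, 3) df.
Tabulated value: i) (6, 3) df at 5% level is 8.94
ii) (2, 6) df at 5% level is 5.14
Conclusion: i) CV < TV. There is no significant difference between the sales by the salesmen at
5% level of significance.
ii) CV < TV. There is no significant difference between sales in the three states.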
Design of Experiments
Aim of the Design of Experiments:
A statistical experiment in any field is performed to verify a particular hypothesis. For example, an
agricultural experiment may be performed to verify the claim that a particular manure increases the
yield of paddy. Here the quantity of manure used and the amount of yield are the two variables
directly involved. They are called experimental variables.
Apart from these two, there are other variables, such as the fertility of the soil, the quantity of
seed used and the amount of rainfall, which also affect the yield of paddy. Such variables are called
extraneous variables.
The main aim of the design of experiments is to control the extraneous variables and hence to
minimize the experimental error, so that the results of the experiment can be attributed only to the
experimental variables.
Basic Principle of Experimental Design:
Randomization, Replication, Local control

1. Randomization: Since it is not possible to eliminate completely the contribution of extraneous
variables to the value of the response variable, we try to control it by randomization.
The group of experimental units (plots of the same size) in which the manure is used is called the
experimental group, and the other group of plots, in which the manure is not used and which
provides a basis for comparison, is called the control group.
We select the plots for the experimental and control groups in a random manner, which provides
the most effective way of eliminating any unknown bias in the experiment.
2. Replication:
It means Repetition. It is essential to carry out more than one test on each manure in order to
estimate the amount of the experimental error and hence to get some idea of the precision of
the estimates of the manure effects.
3. Local control:
To provide adequate control of extraneous variables, another essential principle used in the
experimental design is the local control. This includes techniques such as grouping, blocking
and balancing of the experimental units used in the experimental design.
By grouping, we mean combining sets of homogeneous plots into groups, so that different
manures may be used in different groups. The number of plots in different groups need not be
the same.
By blocking, we mean assigning the same number of plots in different blocks. The plots in the
same block may be assumed to be relatively homogeneous. We can use as many fertilizers as
the number of plots in a block in a random way.
By balancing, we mean adjusting the grouping and blocking procedures and assigning the
fertilizers so that a balanced configuration is obtained.
Basic Designs of Experiments:
1. Completely Randomised Design (C.R.D.) (One-factor classification)
Suppose that we wish to compare h treatments and that N plots are available for the
experiment. Let the ith treatment be repeated $n_i$ times, so that $n_1 + n_2 + \cdots + n_h = N$.
The plots to which the different treatments are to be given are found by the following
randomisation principle. The plots are numbered from 1 to N serially. N identical cards are
taken, numbered from 1 to N and shuffled thoroughly. Randomly draw $n_1$ cards; the
numbers on these $n_1$ cards give the numbers of the plots to which the first treatment is to be
given, and so on. This design is called a CRD. It is used when the plots are homogeneous.
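The card-drawing procedure above can be mimicked with a shuffled list. The following is a small
sketch only: the function name crd_allocation, its seed parameter and the replication counts in
the example call are assumptions made for illustration.

```python
import random


def crd_allocation(replications, seed=None):
    """Allocate plots 1..N to treatments for a completely randomised design.

    `replications[i]` is n_i, the number of plots for treatment i + 1.
    (Hypothetical helper; `seed` only makes the shuffle repeatable.)
    """
    rng = random.Random(seed)
    n_plots = sum(replications)             # N = n_1 + n_2 + ... + n_h
    cards = list(range(1, n_plots + 1))     # N cards numbered 1..N
    rng.shuffle(cards)                      # shuffle thoroughly
    allocation = {}
    start = 0
    for treatment, n_i in enumerate(replications, start=1):
        # The next n_i cards drawn give the plots for this treatment.
        allocation[treatment] = sorted(cards[start:start + n_i])
        start += n_i
    return allocation


# Example with h = 3 treatments repeated 3, 4 and 5 times (N = 12 plots).
allocation = crd_allocation([3, 4, 5], seed=1)
for treatment, plots in allocation.items():
    print(f"Treatment {treatment}: plots {plots}")
```

Every plot number appears exactly once, and treatment i receives exactly $n_i$ plots.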
2. Randomised Block Design (R.B.D.) (Two-factor classification)
Let us consider an agricultural experiment in which we wish to test the effect of k
fertilizing treatments on the yield of a crop. We assume that we have some information about
the soil fertility of the plots. We then divide the plots into h blocks according to the soil
fertility, each block containing k plots. Thus the plots in each block will be as homogeneous
as possible.
Within each block, the k treatments are given to the k plots in a perfectly random
manner, such that each treatment occurs only once in any block. The same k treatments
are repeated from block to block.
Null hypothesis: Rows and columns are homogeneous.
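The within-block randomisation of an RBD amounts to an independent random ordering of the k
treatments in each of the h blocks. A minimal sketch, assuming hypothetical names rbd_layout and
seed and made-up treatment labels:

```python
import random


def rbd_layout(treatments, n_blocks, seed=None):
    """Return one random ordering of the treatments per block.

    Each block contains every treatment exactly once.
    (Hypothetical helper; `seed` only makes the result repeatable.)
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)    # the same k treatments in every block
        rng.shuffle(order)          # independent randomisation per block
        layout.append(order)
    return layout


# Example: k = 3 treatments in h = 4 blocks.
layout = rbd_layout(["T1", "T2", "T3"], n_blocks=4, seed=0)
for number, block in enumerate(layout, start=1):
    print(f"Block {number}: {block}")
```

Because each block is shuffled separately, differences between blocks (for example in soil
fertility) are kept apart from differences between treatments.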
3. Latin Square Design (L.S.D.) (Three-factor classification)
We consider an agricultural experiment in which $n^2$ plots are taken and arranged in the form
of an n x n square, such that the plots in each row are as homogeneous as possible with
respect to one factor of classification (say, soil fertility) and the plots in each column are
as homogeneous as possible with respect to another factor of classification (say, seed quality).
The n treatments are given to these plots such that each treatment occurs only once in each row
and only once in each column.
The various possible arrangements obtained in this manner are known as Latin squares of order
n. Here rows, columns and letters stand for the three factors, say fertility, seed quality and
treatment respectively.
Null hypothesis: Rows, columns and letters are homogeneous.
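One simple way to obtain a Latin square of order n is to start from a cyclic arrangement and then
permute its rows and columns at random. The sketch below uses that construction as an assumption;
the notes do not prescribe a method, and this one does not draw uniformly from all Latin squares
of order n. The names latin_square and is_latin are hypothetical.

```python
import random


def latin_square(treatments, seed=None):
    """Build a Latin square by shuffling the rows and columns of a cyclic square.

    (Illustrative construction only; `seed` makes the result repeatable.)
    """
    rng = random.Random(seed)
    n = len(treatments)
    # Cyclic square: row i is the treatment list shifted left by i places.
    square = [[treatments[(i + j) % n] for j in range(n)] for i in range(n)]
    rng.shuffle(square)                     # random permutation of the rows
    columns = list(range(n))
    rng.shuffle(columns)                    # random permutation of the columns
    return [[row[c] for c in columns] for row in square]


def is_latin(square):
    """Check that each symbol occurs once in every row and every column."""
    n = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return len(symbols) == n and rows_ok and cols_ok


square = latin_square(["A", "B", "C", "D"], seed=2)
for row in square:
    print(" ".join(row))
print("Latin square:", is_latin(square))
```

Permuting rows and columns keeps the defining property, since each treatment still occurs once in
every row and once in every column.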
Comparison of RBD & LSD:
1. The number of replications of each treatment is equal to the number of treatments in LSD,
whereas there is no such restrictions on treatments and replication in RBD.
2. LSD can be performed on a square field, while RBD can be performed either on a square field
or a rectangular field.
3. LSD is known to be suitable for the case when the number of treatments is between 5 and 12,
whereas RBD can be used for any number of treatments.
4. The main advantage of LSD is that it controls the effect of two extraneous variables, whereas
RBD controls the effect of only one extraneous variable. Hence the experimental error is
reduced to a larger extent in LSD than in RBD.
1. Three varieties A, B, C of a crop are tested in an RBD with four replications. The plot yields in
pounds are as follows.

A 6    C 5    A 8     B 9
C 8    A 4    B 6     C 9
B 7    B 6    C 10    A 6

Analyse the experimental yields and state your conclusion.
Solution:
Null hypothesis: There is no significant difference between varieties (rows) and between
blocks (columns).
Taking each column of the layout as a block and rearranging the yields by variety:

Variety   Block 1   Block 2   Block 3   Block 4   Total
A            6         4         8         6        24
B            7         6         6         9        28
C            8         5        10         9        32
Total       21        15        24        24        84 (T)

Source of variation                         Sum of squares   Degrees of freedom     Mean square                              F-ratio
Between columns (k = number of columns)     SSC = 18         k - 1 = 4 - 1 = 3      MSC = SSC / (k - 1) = 6                  F_C = MSC / MSE = 3.6
Between rows (r = number of rows)           SSR = 8          r - 1 = 3 - 1 = 2      MSR = SSR / (r - 1) = 4                  F_R = MSR / MSE = 2.4
Residual (or) error                         SSE = 10         (k - 1)(r - 1) = 6     MSE = SSE / ((r - 1)(k - 1)) = 1.667

Tabulated value: i) (3, 6) df at 5% level of significance is 4.76
ii) (2, 6) df at 5% level of significance is 5.14
Conclusion: i) CV < TV. There is no significant difference between blocks.
ii) CV < TV. There is no significant difference between varieties.
2. The following data resulted from an experiment to compare three burners B1, B2 and B3. A
Latin square design was used, as the tests were made on 3 engines and were spread over 3 days.

          Engine 1    Engine 2    Engine 3
Day 1     B1 - 16     B2 - 17     B3 - 20
Day 2     B2 - 16     B3 - 21     B1 - 15
Day 3     B3 - 15     B1 - 12     B2 - 13

Test the hypothesis that there is no difference between the burners.
