[Figure: Chi-square distribution curves for df = 2, df = 5 and df = 10. Values shown on the plots: P(lower) = .9999, P(upper) = .0001, Chi-square = 25.74; P(lower) = 1.0000, P(upper) = 1.00E-05, Chi-square = 41.30.]
The tables for the χ² distribution for degrees of freedom from 1 to 100 are given at the back of
the book. The degrees of freedom are indicated in the first column. For any given degrees of
freedom, the χ² value in the table is the value that has the area to its right indicated by the
subscript of χ² in the top row. For example, for 10 degrees of freedom the χ² value that has
10 percent of the area to its right is 15.9871. The χ² distribution is generally used for the
Goodness of Fit Test or the Test of Independence.
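The table lookup described above can be checked numerically. The following is a minimal sketch, not part of the book: the helper name `chi2_upper_tail` is hypothetical, and it relies on the standard closed-form expressions for the right-tail area of a χ² distribution with whole-number degrees of freedom, using only the Python standard library.

```python
import math


def chi2_upper_tail(x, df):
    """Area to the right of x under a chi-square curve with integer df >= 1.

    Hypothetical helper for illustration; uses the closed-form series for
    whole-number degrees of freedom.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(c) * sum_{r=1}^{(df-1)/2} c^(2r-1) / (1*3*...*(2r-1)),
    # where c = sqrt(x) and phi is the standard normal density.
    c = math.sqrt(x)
    phi = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term = c
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + 2.0 * phi * total


# Table value quoted in the text: 10 degrees of freedom, 10 percent area to the right.
area = chi2_upper_tail(15.9871, 10)
print(f"area to the right of 15.9871 with df = 10: {area:.4f}")
```

The printed area should be very close to 0.10, the right-tail area the table associates with 15.9871 at 10 degrees of freedom.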
Note that a general convention is to consider n sufficiently large for the Chi-square test if all
the expected frequencies (fe) are at least 5. If the expected frequency for any cell falls below 5,
it is better to combine that cell with another category. The formula to calculate Chi-square is
given below:
χ² = Σ (fo - fe)² / fe
where fo is the observed frequency and the sum runs over all categories. This statistic is
distributed as χ² with k - 1 degrees of freedom, where k is the number of categories. We have to
subtract one because only five of the six frequencies can be arbitrarily determined once the total
is fixed; the sixth is determined when the five others and the total are given. Let us do the
calculations indicated by the formula as follows:
fo - fe:   -7   10   -6   -5   8   0
Total χ² = 13.7 (the calculated χ²)
The MegaStat result is given below. In Excel, create two columns: one for the observed and one
for the expected frequencies. Then go to Chi-square/Crosstab, select Goodness of Fit, fill in the
input section of the dialogue box, and enter 0 for the number of parameters estimated to get the
results. The following table shows all the calculations we did above using a calculator. It gives
a calculated test statistic exactly equal to the value we obtained.
Goodness of Fit Test

observed   expected     O - E   (O - E)² / E
      13     20.000    -7.000          2.450
      30     20.000    10.000          5.000
      14     20.000    -6.000          1.800
      15     20.000    -5.000          1.250
      28     20.000     8.000          3.200
      20     20.000     0.000          0.000
     120    120.000     0.000         13.700

13.70 chi-square
5 df
.0176 p-value
The p-value of .0176 indicates that we can reject the Null hypothesis at the 5% level but not at the 1% level.
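The goodness of fit calculation for equal expected frequencies can be reproduced in a few lines. This is a minimal sketch using the observed counts from the MegaStat table; the variable names are illustrative and not taken from the book.

```python
# Observed frequencies (fo) for the six categories, from the MegaStat table.
observed = [13, 30, 14, 15, 28, 20]

# Equal expected frequencies (fe): the total of 120 spread evenly over 6 categories.
expected = [sum(observed) / len(observed)] * len(observed)

# Each category contributes (fo - fe)^2 / fe to the test statistic.
contributions = [(fo - fe) ** 2 / fe for fo, fe in zip(observed, expected)]
chi_square = sum(contributions)

# Degrees of freedom: k - 1, with k = 6 categories.
df = len(observed) - 1

print("contributions:", [round(c, 3) for c in contributions])
print("chi-square:", round(chi_square, 2), "df:", df)
```

The contributions and their sum should match the last column of the MegaStat output, giving a statistic of 13.70 with 5 degrees of freedom.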
2. The χ² Test of Goodness of Fit in the Case of Unequal Expected Frequencies
Example 2: A recent (hypothetical) national survey of people between 25 and 50 years of age who
had hospital admissions during a two-year period showed that 40% had 1 admission only, 20% had
2 admissions, 14% had 3 admissions, 10% had 4 admissions, 8% had 5 admissions, 6% had 6
admissions and only 2% had 7 or more admissions. The mayor of a small city claims that his city
is much healthier than the national average. He even cites the percentages for the two extreme
categories. He says that 44% of the local population in the given age group have only one
hospital admission (compared to 40% nationally) and that the percentage with 6 or more
admissions is only 5%, compared to 8% nationally. His claim was in fact based on a sample of 400
randomly selected people in the specified age group who were interviewed by a local newspaper.
It was revealed that 176 people had only 1 admission, 75 had 2 admissions, 50 had 3 admissions,
44 had 4 admissions, 35 had 5 admissions, 15 had 6 admissions and only 5 had 7 or more
admissions. Is the claim of the mayor valid? Test at 5% and 10%.
Looking at the two extreme categories, the mayor's claim seems to have strong evidence. But
statisticians at the local university wanted to test the claim using more scientific methods. Do
the overall data support the mayor's claim?
The Null hypothesis in this case is that the percentages in all the categories (number of
hospital admissions) in the local population are the same as in the national population. The
alternative hypothesis is that the local and national patterns (or percentages) are different.
We will obtain the expected frequencies by multiplying the percentages in the national survey by
the total number of observations in the local survey. For example, the expected frequency for
only one admission is 0.40 × 400 = 160 (assuming equality between local and national
percentages). The following table will make it clear.
Admissions           1       2       3       4       5       6      7+   Total
National %          40      20      14      10       8       6       2     100
fe                 160      80      56      40      32      24       8     400
fo                 176      75      50      44      35      15       5     400
fo - fe             16      -5      -6       4       3      -9      -3       0
(fo - fe)²         256      25      36      16       9      81       9     ---
(fo - fe)² / fe  1.600   0.313   0.643   0.400   0.281   3.375   1.125   7.737
The calculated test statistic is 7.737 and the degrees of freedom are 7 - 1 = 6.
For this df the table gives χ²_.10 = 10.6446 and χ²_.05 = 12.5916. Since 7.737 is below both
critical values, the Null hypothesis of no difference between the national and local populations
with respect to the number of hospital admissions cannot be rejected even at the 10% level. The
mayor's claim was found to lack strong evidence in the data when the scientific hypothesis
testing method was applied, although initially it seemed to have some support.
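The steps of Example 2 can be sketched in code as follows. The percentages, counts and critical values are the ones quoted in the text; the variable names are illustrative only.

```python
# National percentages for 1, 2, 3, 4, 5, 6 and 7+ admissions.
national_pct = [40, 20, 14, 10, 8, 6, 2]

# Local sample counts (fo) for the same categories.
observed = [176, 75, 50, 44, 35, 15, 5]
n = sum(observed)  # 400 people interviewed

# Expected counts (fe) under the Null: national share times the local sample size.
expected = [pct * n / 100 for pct in national_pct]

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
df = len(observed) - 1  # 7 categories, so 6 degrees of freedom

# Critical values quoted in the text for df = 6.
critical = {0.10: 10.6446, 0.05: 12.5916}
for alpha, cutoff in critical.items():
    decision = "reject" if chi_square > cutoff else "do not reject"
    print(f"alpha = {alpha}: chi-square = {chi_square:.3f}, critical = {cutoff} -> {decision} H0")
```

Because the statistic falls below both critical values, the sketch arrives at the same decision as the text at the 10% and 5% levels.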
To the computer it does not matter whether the case is that of equal expected frequencies or
unequal expected frequencies. The process is the same.
Goodness of Fit Test

observed   expected     O - E   (O - E)² / E
     176    160.000    16.000          1.600
      75     80.000    -5.000          0.313
      50     56.000    -6.000          0.643
      44     40.000     4.000          0.400
      35     32.000     3.000          0.281
      15     24.000    -9.000          3.375
       5      8.000    -3.000          1.125
     400    400.000     0.000          7.737

7.74 chi-square
6 df
.2580 p-value
Income      A     B     C     D   Total
High       18    14    12     6      50
Middle     52    70   100    78     300
Low        20    26    58    46     150
Total      90   110   170   130     500
We have the observed frequencies and need to find the expected frequencies. After that, the
formula for the test statistic is the same as in the Goodness of Fit test. The formula for the
expected frequencies is based on the Null hypothesis that the rows and columns are independent
of each other.
If fe_ij denotes the expected frequency in cell (i,j), then
fe_ij = (Row i total × Column j total) / Grand Total
For example, the expected frequency in cell (1,1), the upper left corner cell, would be
50 × 90 / 500 = 9, whereas the observed frequency is 18. It is also customary to show both types
of frequencies in the same table so that pairwise differences can be easily calculated. The row
and column totals for the observed and expected frequencies must be identical. Therefore, if you
have to round, keep this in mind.
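The expected frequency formula can be applied to every cell at once. The sketch below is illustrative only: it takes the observed counts from the income-by-grade table in this section, and the dictionary layout and variable names are assumptions made for the example.

```python
# Observed counts: one entry per income level, columns are letter grades A, B, C, D.
observed = {
    "High":   [18, 14, 12, 6],
    "Middle": [52, 70, 100, 78],
    "Low":    [20, 26, 58, 46],
}

row_totals = {income: sum(counts) for income, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected frequency in each cell: row total * column total / grand total.
expected = {
    income: [row_totals[income] * col / grand_total for col in col_totals]
    for income in observed
}

for income, values in expected.items():
    print(income, values)
```

The first value in the High row reproduces the worked example, 50 × 90 / 500 = 9, and each row and column of expected frequencies adds up to the same total as the observed counts.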
Table of observed and expected frequencies of Income level by Letter Grade
(expected frequencies in parentheses)

                    Income
Grade    High       Middle       Low        Total
A        18 (9)     52 (54)      20 (27)       90
B        14 (11)    70 (66)      26 (33)      110
C        12 (17)    100 (102)    58 (51)      170
D         6 (13)    78 (78)      46 (39)      130
Total    50         300          150          500
                           A        B        C        D    Total
HIGH   Observed           18       14       12        6       50
       Expected         9.00    11.00    17.00    13.00    50.00
       O - E            9.00     3.00    -5.00    -7.00     0.00
       (O - E)² / E     9.00     0.82     1.47     3.77    15.06
MED    Observed           52       70      100       78      300
       Expected        54.00    66.00   102.00    78.00   300.00
       O - E           -2.00     4.00    -2.00     0.00     0.00
       (O - E)² / E     0.07     0.24     0.04     0.00     0.36
LOW    Observed           20       26       58       46      150
       Expected        27.00    33.00    51.00    39.00   150.00
       O - E           -7.00    -7.00     7.00     7.00     0.00
       (O - E)² / E     1.81     1.48     0.96     1.26     5.52
Total  Observed           90      110      170      130      500
       Expected        90.00   110.00   170.00   130.00   500.00
       O - E            0.00     0.00     0.00     0.00     0.00
       (O - E)² / E    10.89     2.55     2.47     5.03    20.93

20.93 chi-square
6 df
.0019 p-value
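The test of independence output can be reproduced with a short script. This is a sketch under stated assumptions: the counts are the observed frequencies of income level by letter grade, the degrees of freedom follow the usual (rows - 1) × (columns - 1) rule, and the p-value line uses the closed form of the χ² right-tail area that holds specifically for 6 degrees of freedom.

```python
import math

# Observed counts: rows are income levels, columns are grades A, B, C, D.
observed = [
    [18, 14, 12, 6],     # High
    [52, 70, 100, 78],   # Middle
    [20, 26, 58, 46],    # Low
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Sum (fo - fe)^2 / fe over all cells, keeping the subtotal for each income level.
row_contributions = []
for i, row in enumerate(observed):
    row_sum = 0.0
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand_total
        row_sum += (fo - fe) ** 2 / fe
    row_contributions.append(row_sum)

chi_square = sum(row_contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail area for df = 6 only: exp(-x/2) * (1 + x/2 + (x/2)^2 / 2).
half = chi_square / 2
p_value = math.exp(-half) * (1 + half + half ** 2 / 2)

print("row contributions:", [round(r, 2) for r in row_contributions])
print(f"chi-square = {chi_square:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The row subtotals should agree with the Total column of the MegaStat output (15.06, 0.36 and 5.52), and the overall statistic, degrees of freedom and p-value with the reported 20.93, 6 and .0019.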