Chi-Square Tests

COMPARING MULTIPLE PROPORTIONS, TEST OF INDEPENDENCE AND GOODNESS OF FIT
Quantitative Analysis
Mayank Patel, CFA
Learning Objectives
In this chapter, you learn how to:
Test the equality of population proportions for three or more populations
Test the independence of two categorical variables
Test whether a probability distribution for a population follows a specific historical or theoretical probability distribution
Contingency Tables
Useful in situations involving multiple population proportions
Used to classify sample observations according to two or more characteristics
Also called a cross-classification table
χ² Test for the Differences Among More Than Two Proportions
Extend the χ² test to the case with more than two independent populations:
H0: π1 = π2 = ... = πc
H1: Not all of the πj are equal (j = 1, 2, ..., c)
The Chi-Square Test Statistic
The chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
where:
fo = observed frequency in a particular cell of the 2 x c table
fe = expected frequency in a particular cell if H0 is true
χ² for the 2 x c case has (2 - 1)(c - 1) = c - 1 degrees of freedom
Assumed: each cell in the contingency table has an expected frequency of at least 1
Computing the Overall Proportion
The overall proportion is:
$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$$
Expected cell frequencies for the c categories are calculated as in the 2 x 2 case (n_j p̄ for the first-row cell and n_j(1 - p̄) for the second-row cell of column j), and the decision rule is the same:
Decision Rule:
If χ² > χ²_U, reject H0; otherwise, do not reject H0,
where χ²_U is from the chi-square distribution with c - 1 degrees of freedom.
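A minimal Python sketch of the 2 x c computation, using only the standard library. The function name `chi_square_2xc` is hypothetical (not from any statistics package); it pools the proportion p̄, builds the expected counts n_j p̄ and n_j(1 - p̄) for each column, and sums the cell contributions. The usage line applies it to the patient-records counts from the example that follows.

```python
def chi_square_2xc(successes, sizes):
    """Chi-square statistic for H0: pi_1 = pi_2 = ... = pi_c (hypothetical helper).

    successes[j] is X_j, the first-row count in column j; sizes[j] is n_j.
    Returns (chi2, df, expected), where expected[j] = (fe_row1, fe_row2).
    """
    if len(successes) != len(sizes) or len(sizes) < 2:
        raise ValueError("need one success count and one sample size per group")
    p_bar = sum(successes) / sum(sizes)  # overall proportion X / n
    chi2 = 0.0
    expected = []
    for x, n in zip(successes, sizes):
        fe_1 = n * p_bar        # expected first-row count if H0 is true
        fe_2 = n * (1 - p_bar)  # expected second-row count if H0 is true
        chi2 += (x - fe_1) ** 2 / fe_1 + ((n - x) - fe_2) ** 2 / fe_2
        expected.append((fe_1, fe_2))
    df = len(sizes) - 1  # (2 - 1)(c - 1)
    return chi2, df, expected


# Patient-records survey: "Yes" counts and sample sizes per organization
chi2, df, expected = chi_square_2xc([410, 295, 335], [500, 500, 500])
print(round(chi2, 2), df)  # 64.12 2
```

Computed without intermediate rounding, the statistic is about 64.12; the 64.1196 in the example is the sum of cell contributions that were each rounded first, so the two agree.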
χ² Test with More Than Two Proportions: Example
The sharing of patient records is a controversial issue in health care. A survey of 500 respondents asked whether they objected to their records being shared by insurance companies, by pharmacies, and by medical researchers. The results are summarized in the following table:

Organization
Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers
Yes | 410 | 295 | 335
No | 90 | 205 | 165

The overall proportion is:
$$\bar{p} = \frac{X_1 + X_2 + X_3}{n_1 + n_2 + n_3} = \frac{410 + 295 + 335}{500 + 500 + 500} = \frac{1040}{1500} = 0.6933$$
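Observed (fo) and expected (fe) frequencies, with fe = 500 x 1040/1500 = 346.667 for "Yes" and fe = 500 x 460/1500 = 153.333 for "No":
Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers
Yes | fo = 410, fe = 346.667 | fo = 295, fe = 346.667 | fo = 335, fe = 346.667
No | fo = 90, fe = 153.333 | fo = 205, fe = 153.333 | fo = 165, fe = 153.333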

Cell contributions (fo - fe)²/fe:
Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers
Yes | 11.571 | 7.700 | 0.3926
No | 26.159 | 17.409 | 0.888
The chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 64.1196$$

H0: π1 = π2 = π3
H1: Not all of the πj are equal (j = 1, 2, 3)
Decision Rule: If χ² > 5.991, reject H0; otherwise, do not reject H0.
χ²_U = 5.991 is from the chi-square distribution with 2 degrees of freedom at the .05 level of significance.
Conclusion: Since 64.1196 > 5.991, you reject H0 and conclude that the proportion of respondents who object to their records being shared differs for at least one of the three organizations.
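With 2 degrees of freedom the chi-square upper-tail area has the closed form P(χ² > x) = e^(-x/2), so the critical value χ²_U = -2 ln α and the p-value can be checked without a table. A small sketch that holds for 2 d.f. only; the helper names are hypothetical:

```python
import math


def chi2_sf_df2(x):
    """P(chi-square with 2 d.f. > x), which equals exp(-x / 2) for x >= 0."""
    return math.exp(-x / 2)


def chi2_critical_df2(alpha):
    """Upper critical value for 2 d.f.: solve exp(-x / 2) = alpha for x."""
    return -2 * math.log(alpha)


alpha = 0.05
crit = chi2_critical_df2(alpha)
stat = 64.1196  # test statistic from the patient-records example
print(round(crit, 3))  # 5.991
print(stat > crit)  # True, so reject H0
print(chi2_sf_df2(stat))  # p-value for the example
```

The p-value is on the order of 10^-14, far below .05, which agrees with rejecting H0.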
The Marascuilo Procedure
The Marascuilo procedure enables you to make comparisons between all pairs of groups.
First, compute the observed differences pj - pj' among all c(c - 1)/2 pairs.
Second, compute the corresponding critical range for the Marascuilo procedure.

Critical Range for the Marascuilo Procedure:
$$\text{Critical Range} = \sqrt{\chi^2_U}\,\sqrt{\frac{p_j(1 - p_j)}{n_j} + \frac{p_{j'}(1 - p_{j'})}{n_{j'}}}$$

Compute a different critical range for each pairwise comparison of sample proportions.
Compare each of the c(c - 1)/2 pairs of sample proportions against its corresponding critical range.
Declare a specific pair significantly different if the absolute difference in the sample proportions, |pj - pj'|, is greater than its critical range.

Example
Object to Record Sharing | Insurance Companies | Pharmacies | Medical Researchers
Yes | 410 | 295 | 335
No | 90 | 205 | 165
Sample proportion of "Yes" | p1 = 0.82 | p2 = 0.59 | p3 = 0.67

MARASCUILO TABLE (χ²_U = 5.991, n1 = n2 = n3 = 500)
Pair of proportions | Absolute difference | Critical range
Group 1 vs. Group 2 | 0.23 | 0.0683
Group 1 vs. Group 3 | 0.15 | 0.0665
Group 2 vs. Group 3 | 0.08 | 0.0745
Conclusion: Since each absolute difference is greater than its critical range, you conclude that each proportion is significantly different from the other two.
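The pairwise comparisons can be sketched in Python as follows. `marascuilo` is a hypothetical helper, not a library API; it takes the sample proportions, the sample sizes, and the upper-tail critical value χ²_U (5.991 here, for 2 d.f. at α = .05) and applies the critical-range formula to each of the c(c - 1)/2 pairs.

```python
import math
from itertools import combinations


def marascuilo(proportions, sizes, chi2_upper):
    """Marascuilo pairwise comparisons of c sample proportions (hypothetical helper).

    Returns one (j, k, abs_diff, critical_range, significant) tuple per pair,
    with groups numbered from 1 as in the table above.
    """
    results = []
    for j, k in combinations(range(len(proportions)), 2):
        pj, pk = proportions[j], proportions[k]
        diff = abs(pj - pk)
        # sqrt(chi2_U) * sqrt(pj(1 - pj)/nj + pk(1 - pk)/nk)
        se = math.sqrt(pj * (1 - pj) / sizes[j] + pk * (1 - pk) / sizes[k])
        crit = math.sqrt(chi2_upper) * se
        results.append((j + 1, k + 1, diff, crit, diff > crit))
    return results


r = marascuilo([0.82, 0.59, 0.67], [500, 500, 500], 5.991)
for j, k, diff, crit, sig in r:
    print(f"|p{j} - p{k}| = {diff:.2f}, critical range = {crit:.4f}, significant: {sig}")
```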
χ² Test of Independence
Similar to the χ² test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns
H0: The two categorical variables are independent (i.e., there is no relationship between them)
H1: The two categorical variables are dependent (i.e., there is a relationship between them)
The chi-square test statistic is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$
where:
fo = observed frequency in a particular cell of the r x c table
fe = expected frequency in a particular cell if H0 is true
χ² for the r x c case has (r - 1)(c - 1) degrees of freedom
Assumed: each cell in the contingency table has an expected frequency of at least 1
Expected Cell Frequencies
Expected cell frequencies:
$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$
where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size
Decision Rule
The decision rule is: If χ² > χ²_U, reject H0; otherwise, do not reject H0,
where χ²_U is from the chi-square distribution with (r - 1)(c - 1) degrees of freedom.
Example: Test of Independence
The meal plan selected by 200 students is shown below:
Number of meals per week
Class Standing | 20/week | 10/week | none | Total
Fresh. | 24 | 32 | 14 | 70
Soph. | 22 | 26 | 12 | 60
Junior | 10 | 14 | 6 | 30
Senior | 14 | 16 | 10 | 40
Total | 70 | 88 | 42 | 200

The hypotheses to be tested are:
H0: Meal plan and class standing are independent (i.e., there is no relationship between them)
H1: Meal plan and class standing are dependent (i.e., there is a relationship between them)

Expected cell frequencies if H0 is true:
Example for one cell (Junior, 20/wk):
$$f_e = \frac{\text{row total} \times \text{column total}}{n} = \frac{30 \times 70}{200} = 10.5$$
Number of meals per week
Class Standing | 20/wk | 10/wk | none | Total
Fresh. | 24.5 | 30.8 | 14.7 | 70
Soph. | 21.0 | 26.4 | 12.6 | 60
Junior | 10.5 | 13.2 | 6.3 | 30
Senior | 14.0 | 17.6 | 8.4 | 40
Total | 70 | 88 | 42 | 200
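The expected-frequency table can be reproduced with a short Python sketch. `expected_frequencies` is a hypothetical helper that applies f_e = row total x column total / n to every cell of the observed meal-plan table (rows Fresh., Soph., Junior, Senior; columns 20/week, 10/week, none).

```python
def expected_frequencies(observed):
    """Expected counts under independence: row total * column total / n (hypothetical helper)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Fresh., Soph., Junior, Senior; columns: 20/week, 10/week, none
observed = [
    [24, 32, 14],
    [22, 26, 12],
    [10, 14, 6],
    [14, 16, 10],
]
for row in expected_frequencies(observed):
    print([round(v, 1) for v in row])
```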

The test statistic value is:
$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(24 - 24.5)^2}{24.5} + \frac{(32 - 30.8)^2}{30.8} + \cdots + \frac{(10 - 8.4)^2}{8.4} = 0.709$$
χ²_U = 12.592 for α = .05 from the chi-square distribution with (4 - 1)(3 - 1) = 6 degrees of freedom
The test statistic is χ² = 0.709; χ²_U with 6 d.f. = 12.592.
Decision Rule: If χ² > 12.592, reject H0; otherwise, do not reject H0.
[Figure: chi-square distribution with 6 d.f.; "Reject H0" region (α = 0.05) to the right of χ²_U = 12.592, "Do not reject H0" to the left]
Here, χ² = 0.709 < χ²_U = 12.592, so do not reject H0.
Conclusion: there is insufficient evidence that meal plan and class standing are related.
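A self-contained sketch of the whole test of independence for the meal-plan data. `chi_square_independence` is a hypothetical name, and the critical value 12.592 is the table value quoted above rather than something the code computes.

```python
def chi_square_independence(observed):
    """Return (chi2, df) for the chi-square test of independence on an r x c table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # expected count if H0 is true
            chi2 += (fo - fe) ** 2 / fe
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df


observed = [[24, 32, 14], [22, 26, 12], [10, 14, 6], [14, 16, 10]]
chi2, df = chi_square_independence(observed)
print(round(chi2, 3), df)  # 0.709 6
print(chi2 > 12.592)  # False, so do not reject H0 at alpha = .05
```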
Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
1. State the null and alternative hypotheses.
H0: The population follows a multinomial distribution with specified probabilities for each of the k categories
Ha: The population does not follow a multinomial distribution with specified probabilities for each of the k categories

2. Select a random sample and record the observed frequency, fi, for each of the k categories.
3. Assuming H0 is true, compute the expected frequency, ei, in each category by multiplying the category probability by the sample size.

4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories
Note: The test statistic has a chi-square distribution with k - 1 df provided that the expected frequencies are 5 or more for all categories.

5. Rejection rule:
p-value approach: Reject H0 if p-value < α
Critical value approach: Reject H0 if χ² > χ²_α, where α is the significance level and there are k - 1 degrees of freedom
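Steps 2 through 5 can be sketched in Python as below. `multinomial_gof` is a hypothetical helper; it computes e_i = p_i n, refuses to proceed when any e_i is under 5 (the condition in the note above), and returns the statistic with k - 1 degrees of freedom. The usage line applies it to the Finger Lakes Homes sales in the example that follows.

```python
def multinomial_gof(observed, probs, min_expected=5.0):
    """Chi-square goodness-of-fit statistic for a multinomial population (hypothetical helper).

    observed[i] is f_i and probs[i] is the hypothesized probability of category i.
    Returns (chi2, df, expected) with df = k - 1.
    """
    if len(observed) != len(probs):
        raise ValueError("need one hypothesized probability per category")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("hypothesized probabilities must sum to 1")
    n = sum(observed)
    expected = [p * n for p in probs]  # step 3: e_i = p_i * n
    if min(expected) < min_expected:
        raise ValueError("an expected frequency is below 5; the chi-square approximation may not hold")
    chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))  # step 4
    return chi2, len(observed) - 1, expected


# Finger Lakes Homes: Colonial, Log, Split-Level, A-Frame; H0: all four proportions are .25
chi2, df, expected = multinomial_gof([30, 20, 35, 15], [0.25] * 4)
print(chi2, df, expected)  # 10.0 3 [25.0, 25.0, 25.0, 25.0]
```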
Multinomial Distribution Goodness of Fit Test
Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures four models of prefabricated homes: a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine whether previous customer purchases indicate a preference in the style selected.
The number of homes sold of each model for 100 sales over the past two years is shown below.
Model | Colonial | Log | Split-Level | A-Frame
# Sold | 30 | 20 | 35 | 15
Hypotheses
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not pC = .25, pL = .25, pS = .25, and pA = .25
where:
pC = population proportion that purchase a colonial
pL = population proportion that purchase a log cabin
pS = population proportion that purchase a split-level
pA = population proportion that purchase an A-frame

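Rejection Rule
With α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom, χ²_.05 = 7.815.
Reject H0 if p-value < .05 or χ² > 7.815.
[Figure: chi-square distribution with 3 d.f.; "Do Not Reject H0" to the left of 7.815, "Reject H0" to the right]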
Expected Frequencies
e1 = .25(100) = 25
e2 = .25(100) = 25
e3 = .25(100) = 25
e4 = .25(100) = 25
Test Statistic
$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$
Conclusion Using the p-Value Approach
Area in Upper Tail | .10 | .05 | .025 | .01 | .005
χ² Value (df = 3) | 6.251 | 7.815 | 9.348 | 11.345 | 12.838
Because χ² = 10 is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.
The p-value < α, so we can reject the null hypothesis.
Actual p-value is .0186.
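The actual p-value can be computed rather than bracketed from the table. Python's standard library has no chi-square distribution, so this sketch uses the closed-form upper-tail series that holds for an integer number of degrees of freedom (with `math.erfc` for odd df); `chi2_sf` is a hypothetical name. In practice a library routine such as SciPy's `scipy.stats.chi2.sf` would usually be used instead; that choice is an assumption, not something the slides specify.

```python
import math


def chi2_sf(x, df):
    """Upper-tail area P(X > x) for a chi-square variable with integer df >= 1.

    Uses the closed-form series that exists for integer degrees of freedom,
    so only the standard library is needed (hypothetical helper).
    """
    if not isinstance(df, int) or df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{i=1}^{(df-1)/2} x^(i - 1/2) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)  # the i = 1 term
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)  # the next term gains a factor x / (2i + 1)
    return total


print(f"{chi2_sf(10, 3):.4f}")     # 0.0186, the Finger Lakes p-value
print(f"{chi2_sf(7.815, 3):.4f}")  # 0.0500, since 7.815 is the .05 critical value for 3 d.f.
print(f"{chi2_sf(1.6, 3):.4f}")    # 0.6594
```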
Conclusion Using the Critical Value Approach
χ² = 10 > 7.815
We reject, at the .05 level of significance, the assumption that there is no home style preference.
Goodness of Fit Test: Normal Distribution
1. State the null and alternative hypotheses.
H0: The population has a normal distribution
Ha: The population does not have a normal distribution
2. Select a random sample and:
a. Compute the mean and standard deviation.
b. Define intervals of values so that the expected frequency is at least 5 for each interval.
c. For each interval, record the observed frequencies.
3. Compute the expected frequency, ei, for each interval. (Multiply the sample size by the probability of a normal random variable being in the interval.)

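4. Compute the value of the test statistic.
$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$
5. Reject H0 if χ² > χ²_α, where α is the significance level and there are k - 3 degrees of freedom (that is, k - p - 1 with p = 2 parameters, the mean and standard deviation, estimated from the sample).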

Example: IQ Computers
IQ Computers (one better than HP?) manufactures and sells a general purpose microcomputer. As part of a study to evaluate sales personnel, management wants to determine, at a .05 significance level, if the annual sales volume (number of units sold by a salesperson) follows a normal probability distribution.

A simple random sample of 30 of the salespeople was taken, and their numbers of units sold are listed below.
33 64 83
43 65 84
44 66 85
45 68 86
52 70 91
52 72 92
56 73 94
58 73 98
63 74 102
64 75 105
(mean = 71, standard deviation = 18.54)
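The summary statistics quoted under the data can be checked with Python's `statistics` module. Note that `statistics.stdev` uses the n - 1 divisor, which matches the 18.54 shown.

```python
import statistics

# Units sold by the 30 sampled salespeople (values from the table above)
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
mean = statistics.mean(units_sold)
sd = statistics.stdev(units_sold)  # sample standard deviation, n - 1 divisor
print(f"n = {len(units_sold)}, mean = {mean:.2f}, sd = {sd:.2f}")  # n = 30, mean = 71.00, sd = 18.54
```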

Hypotheses
H0: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.
Ha: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54.

Interval Definition
To satisfy the requirement of an expected frequency of at least 5 in each interval, we will divide the normal distribution into 30/5 = 6 equal-probability intervals.

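[Figure: normal curve centered at 71 divided into six areas of 1.00/6 = .1667 each]
Boundaries: 71 - .97(18.54) = 53.02, 71 - .43(18.54) = 63.03, 71, 71 + .43(18.54) = 78.97, and 71 + .97(18.54) = 88.98.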
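A sketch of computing the equal-probability boundaries directly with `statistics.NormalDist` (Python 3.8+) instead of from rounded z-values. Because the slide rounds z to ±.43 and ±.97, the exact cut points differ slightly (about 53.06, 63.01, 71.00, 78.99, and 88.94); for this sample no observation falls between the two versions, so the interval counts are unchanged.

```python
from statistics import NormalDist

mean, sd, k = 71.0, 18.54, 6
fitted = NormalDist(mu=mean, sigma=sd)

# k - 1 cut points that split the fitted normal into k areas of 1/k each
cuts = [fitted.inv_cdf(i / k) for i in range(1, k)]
print([round(c, 2) for c in cuts])
```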

Observed and Expected Frequencies
Interval i | fi | ei | fi - ei
Less than 53.02 | 6 | 5 | 1
53.02 to 63.03 | 3 | 5 | -2
63.03 to 71.00 | 6 | 5 | 1
71.00 to 78.97 | 5 | 5 | 0
78.97 to 88.98 | 4 | 5 | -1
More than 88.98 | 6 | 5 | 1
Total | 30 | 30 |
Rejection Rule
With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f. (where k = number of categories and p = number of population parameters estimated), χ²_.05 = 7.815.
Reject H0 if p-value < .05 or χ² > 7.815.
Test Statistic
$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$$
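The observed counts and the test statistic can be reproduced from the raw data with the standard library's `bisect` module, using the interval edges from the table above. This is a sketch, not the slides' own procedure; the degrees of freedom follow the k - p - 1 rule with p = 2.

```python
import bisect

units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]
edges = [53.02, 63.03, 71.00, 78.97, 88.98]  # interval boundaries from the table above

observed = [0] * (len(edges) + 1)
for x in units_sold:
    observed[bisect.bisect_right(edges, x)] += 1  # index of the interval containing x

n, k = len(units_sold), len(observed)
expected = [n / k] * k  # equal-probability intervals: 30 / 6 = 5 each
chi2 = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
p = 2  # parameters estimated from the sample: mean and standard deviation
df = k - p - 1
print(observed, round(chi2, 3), df)  # [6, 3, 6, 5, 4, 6] 1.6 3
```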
Conclusion Using the p-Value Approach
Area in Upper Tail | .90 | .10 | .05 | .025 | .01
χ² Value (df = 3) | .584 | 6.251 | 7.815 | 9.348 | 11.345
Because χ² = 1.600 is between .584 and 6.251 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10.
The p-value > α, so we cannot reject the null hypothesis.
There is little evidence to support rejecting the assumption that the population is normally distributed with μ = 71 and σ = 18.54.
Actual p-value is .6594.
