
COMPARING MULTIPLE PROPORTIONS, TEST OF INDEPENDENCE AND GOODNESS OF FIT
Quantitative Analysis
Mayank Patel, CFA

Learning Objectives
In this chapter, you learn how to:
- Test the equality of population proportions for three or more populations
- Test the independence of two categorical variables
- Test whether the probability distribution of a population follows a specific historical or theoretical probability distribution

Contingency Tables
- Useful in situations involving multiple population proportions
- Used to classify sample observations according to two or more characteristics
- Also called a cross-classification table

χ² Test for the Differences Among More Than Two Proportions
Extend the χ² test to the case with more than two independent populations:
H0: π1 = π2 = ... = πc
H1: Not all of the πj are equal (j = 1, 2, ..., c)

The Chi-Square Test Statistic
The chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:
fo = observed frequency in a particular cell of the 2 x c table
fe = expected frequency in a particular cell if H0 is true
χ² for the 2 x c case has (2 - 1)(c - 1) = c - 1 degrees of freedom

Assumed: each cell in the contingency table has an expected frequency of at least 1

Computing the Overall Proportion
The overall proportion is:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{X}{n}$$

Expected cell frequencies for the c categories are calculated as in the 2 x 2 case, and the decision rule is the same:

Decision Rule:
If χ² > χ²U, reject H0; otherwise, do not reject H0

where χ²U is the upper-tail critical value from the chi-square distribution with c - 1 degrees of freedom

χ² Test with More Than Two Proportions: Example
The sharing of patient records is a controversial issue in health care. A survey of 500 respondents asked whether they objected to their records being shared by insurance companies, by pharmacies, and by medical researchers. The results are summarized in the following table:

                            Organization
Object to Record Sharing    Insurance Companies   Pharmacies   Medical Researchers
Yes                         410                   295          335
No                           90                   205          165

The overall proportion is:

$$\bar{p} = \frac{X_1 + X_2 + \cdots + X_c}{n_1 + n_2 + \cdots + n_c} = \frac{410 + 295 + 335}{500 + 500 + 500} = 0.6933$$

Observed (fo) and expected (fe) frequencies:

                            Organization
Object to Record Sharing    Insurance Companies   Pharmacies      Medical Researchers
Yes                         fo = 410              fo = 295        fo = 335
                            fe = 346.667          fe = 346.667    fe = 346.667
No                          fo = 90               fo = 205        fo = 165
                            fe = 153.333          fe = 153.333    fe = 153.333

Contribution of each cell to the test statistic, (fo - fe)²/fe:

                            Organization
Object to Record Sharing    Insurance Companies   Pharmacies   Medical Researchers
Yes                         11.571                 7.700       0.3926
No                          26.159                17.409       0.888

The chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = 64.1196$$

H0: π1 = π2 = π3
H1: Not all of the πj are equal (j = 1, 2, 3)

Decision Rule:
If χ² > χ²U, reject H0; otherwise, do not reject H0

χ²U = 5.991 is from the chi-square distribution with 2 degrees of freedom (α = 0.05).

Conclusion: Since 64.1196 > 5.991, you reject H0 and conclude that the proportion of respondents who object to their records being shared is not the same across the three organizations; at least one proportion differs.
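
The arithmetic in this example can be checked with a short script. What follows is a minimal sketch in standard-library Python, not part of the original slides: the function name chi_square_statistic is an illustrative assumption, and the critical value 5.991 is copied from the slide rather than computed.

```python
# Observed counts from the patient-records example (rows: Yes, No;
# columns: insurance companies, pharmacies, medical researchers).
observed = [
    [410, 295, 335],
    [90, 205, 165],
]


def chi_square_statistic(table):
    """Return (statistic, expected) for a table of observed counts.

    Expected counts are row total * column total / n, which for a
    2 x c table equals the overall proportion times each group size.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    return statistic, expected


statistic, expected = chi_square_statistic(observed)

# Tabulated upper-tail critical value for 2 degrees of freedom,
# taken from the slide (assumed here, not computed).
critical_value = 5.991
reject_h0 = statistic > critical_value

print(f"chi-square = {statistic:.4f}, reject H0: {reject_h0}")
```

Summing unrounded cell contributions gives a value that differs from the slide's 64.1196 only in the fourth decimal place, because the slide adds contributions that were rounded first.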

The Marascuilo Procedure
The Marascuilo procedure enables you to make comparisons between all pairs of groups.
First, compute the observed differences pj - pj' among all c(c - 1)/2 pairs.
Second, compute the corresponding critical range for the Marascuilo procedure.

Critical Range for the Marascuilo Procedure:

$$\text{Critical range} = \sqrt{\chi^2_U}\,\sqrt{\frac{p_j(1 - p_j)}{n_j} + \frac{p_{j'}(1 - p_{j'})}{n_{j'}}}$$

- Compute a different critical range for each pairwise comparison of sample proportions.
- Compare each of the c(c - 1)/2 pairs of sample proportions against its corresponding critical range.
- Declare a specific pair significantly different if the absolute difference in the sample proportions |pj - pj'| is greater than its critical range.

The Marascuilo Procedure: Example

                            Organization
Object to Record Sharing    Insurance Companies   Pharmacies   Medical Researchers
Yes                         410                   295          335
No                           90                   205          165
Sample proportion (Yes)     p1 = 0.82             p2 = 0.59    p3 = 0.67

MARASCUILO TABLE

Proportions             Absolute Difference   Critical Range
| Group 1 - Group 2 |   0.23                  0.06831808
| Group 1 - Group 3 |   0.15                  0.0664689
| Group 2 - Group 3 |   0.08                  0.074485617

Conclusion: Since each absolute difference is greater than its critical range, you conclude that each proportion is significantly different from the other two.
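
The Marascuilo table can be reproduced along the following lines. This is a hedged sketch, not the slides' own tool: the function name marascuilo and the dictionary layout are assumptions made for illustration, and the critical value 5.991 (2 degrees of freedom) is copied from the earlier slide.

```python
from itertools import combinations
from math import sqrt

# Sample proportions objecting to record sharing, and group sizes,
# as given in the example.
proportions = {
    "Insurance Companies": 0.82,
    "Pharmacies": 0.59,
    "Medical Researchers": 0.67,
}
sample_sizes = {name: 500 for name in proportions}

# Upper-tail chi-square critical value with c - 1 = 2 degrees of freedom,
# taken from the slide (assumed, not computed here).
critical_value = 5.991


def marascuilo(proportions, sample_sizes, critical_value):
    """Return one (group_a, group_b, abs_diff, critical_range, significant)
    tuple for each of the c(c - 1)/2 pairs of groups."""
    results = []
    for a, b in combinations(proportions, 2):
        pa, pb = proportions[a], proportions[b]
        abs_diff = abs(pa - pb)
        critical_range = sqrt(critical_value) * sqrt(
            pa * (1 - pa) / sample_sizes[a] + pb * (1 - pb) / sample_sizes[b]
        )
        results.append((a, b, abs_diff, critical_range, abs_diff > critical_range))
    return results


results = marascuilo(proportions, sample_sizes, critical_value)
for a, b, diff, crit, significant in results:
    print(f"{a} vs {b}: |diff| = {diff:.2f}, critical range = {crit:.4f}, "
          f"significant: {significant}")
```

The critical ranges agree with the table to about four decimal places; the remaining digits depend on how many decimals of the critical value are used.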

χ² Test of Independence
Similar to the χ² test for equality of more than two proportions, but extends the concept to contingency tables with r rows and c columns
H0: The two categorical variables are independent (i.e., there is no relationship between them)
H1: The two categorical variables are dependent (i.e., there is a relationship between them)

The chi-square test statistic is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e}$$

where:
fo = observed frequency in a particular cell of the r x c table
fe = expected frequency in a particular cell if H0 is true
χ² for the r x c case has (r - 1)(c - 1) degrees of freedom

Assumed: each cell in the contingency table has an expected frequency of at least 1

Expected Cell Frequencies
Expected cell frequencies:

$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$

where:
row total = sum of all frequencies in the row
column total = sum of all frequencies in the column
n = overall sample size

Decision Rule
The decision rule is:
If χ² > χ²U, reject H0; otherwise, do not reject H0
where χ²U is from the chi-square distribution with (r - 1)(c - 1) degrees of freedom

Example: Test of Independence
The meal plan selected by 200 students is shown below:

                  Number of meals per week
Class Standing    20/week   10/week   none   Total
Fresh.            24        32        14      70
Soph.             22        26        12      60
Junior            10        14         6      30
Senior            14        16        10      40
Total             70        88        42     200

The hypothesis to be tested is:
H0: Meal plan and class standing are independent (i.e., there is no relationship between them)
H1: Meal plan and class standing are dependent (i.e., there is a relationship between them)

Expected cell frequencies if H0 is true:

$$f_e = \frac{\text{row total} \times \text{column total}}{n}$$

Example for one cell (Junior, 20/week):

$$f_e = \frac{30 \times 70}{200} = 10.5$$

                  Number of meals per week
Class Standing    20/wk    10/wk    none    Total
Fresh.            24.5     30.8     14.7     70
Soph.             21.0     26.4     12.6     60
Junior            10.5     13.2      6.3     30
Senior            14.0     17.6      8.4     40
Total             70       88       42      200

The test statistic value is:

$$\chi^2 = \sum_{\text{all cells}} \frac{(f_o - f_e)^2}{f_e} = \frac{(24 - 24.5)^2}{24.5} + \frac{(32 - 30.8)^2}{30.8} + \cdots + \frac{(10 - 8.4)^2}{8.4} = 0.709$$

χ²U = 12.592 for α = .05 from the chi-square distribution with (4 - 1)(3 - 1) = 6 degrees of freedom

The test statistic is χ² = 0.709; χ²U with 6 d.f. = 12.592.
Decision Rule:
If χ² > 12.592, reject H0; otherwise, do not reject H0

[Figure: chi-square distribution with the rejection region (α = 0.05) to the right of χ²U = 12.592]

Here, χ² = 0.709 < χ²U = 12.592, so do not reject H0.
Conclusion: there is insufficient evidence that meal plan and class standing are related.
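
The meal-plan calculation can be sketched in standard-library Python as follows. The helper names are illustrative assumptions, not from the slides. The p-value helper relies on the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom, so it applies here (6 d.f.) but is not a general-purpose routine; the critical value 12.592 is copied from the slide.

```python
from math import exp, factorial

# Observed counts: rows are Fresh., Soph., Junior, Senior;
# columns are 20/week, 10/week, none.
observed = [
    [24, 32, 14],
    [22, 26, 12],
    [10, 14, 6],
    [14, 16, 10],
]


def independence_test(table):
    """Return (statistic, degrees_of_freedom, expected) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected frequency under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(table, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def chi2_upper_tail_even_df(x, df):
    """Upper-tail area of a chi-square distribution; valid for even df only."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    half = x / 2
    return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))


statistic, df, expected = independence_test(observed)
p_value = chi2_upper_tail_even_df(statistic, df)
critical_value = 12.592  # tabulated value for 6 d.f. at alpha = .05 (from the slide)
reject_h0 = statistic > critical_value

print(f"chi-square = {statistic:.3f}, df = {df}, reject H0: {reject_h0}")
```

The expected table produced by the sketch should match the one shown in the example, including the 10.5 computed for the Junior, 20/week cell.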

Hypothesis (Goodness of Fit) Test for Proportions of a Multinomial Population
1. State the null and alternative hypotheses.
   H0: The population follows a multinomial distribution with specified probabilities for each of the k categories
   Ha: The population does not follow a multinomial distribution with specified probabilities for each of the k categories

2. Select a random sample and record the observed frequency, fi, for each of the k categories.
3. Assuming H0 is true, compute the expected frequency, ei, in each category by multiplying the category probability by the sample size.

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

where:
fi = observed frequency for category i
ei = expected frequency for category i
k = number of categories

Note: The test statistic has a chi-square distribution with k - 1 df provided that the expected frequencies are 5 or more for all categories.

5. Rejection rule:
   p-value approach: Reject H0 if p-value < α
   Critical value approach: Reject H0 if χ² > χ²α
   where α is the significance level and there are k - 1 degrees of freedom

Multinomial Distribution Goodness of Fit Test

Example: Finger Lakes Homes (A)
Finger Lakes Homes manufactures four models of prefabricated homes: a two-story colonial, a log cabin, a split-level, and an A-frame. To help in production planning, management would like to determine if previous customer purchases indicate that there is a preference in the style selected.

The number of homes sold of each model for 100 sales over the past two years is shown below.

Model     Colonial   Log   Split-Level   A-Frame
# Sold    30         20    35            15

Hypotheses
H0: pC = pL = pS = pA = .25
Ha: The population proportions are not pC = .25, pL = .25, pS = .25, and pA = .25
where:
pC = proportion of the population that purchases a colonial
pL = proportion of the population that purchases a log cabin
pS = proportion of the population that purchases a split-level
pA = proportion of the population that purchases an A-frame

Rejection Rule
Reject H0 if p-value < .05 or χ² > 7.815,
with α = .05 and k - 1 = 4 - 1 = 3 degrees of freedom.

[Figure: chi-square distribution with "Do Not Reject H0" to the left of 7.815 and "Reject H0" to the right]

Expected Frequencies
e1 = .25(100) = 25    e2 = .25(100) = 25
e3 = .25(100) = 25    e4 = .25(100) = 25

Test Statistic

$$\chi^2 = \frac{(30 - 25)^2}{25} + \frac{(20 - 25)^2}{25} + \frac{(35 - 25)^2}{25} + \frac{(15 - 25)^2}{25} = 1 + 1 + 4 + 4 = 10$$

Conclusion Using the p-Value Approach

Area in Upper Tail    .10     .05     .025    .01      .005
χ² Value (df = 3)     6.251   7.815   9.348   11.345   12.838

Because χ² = 10 is between 9.348 and 11.345, the area in the upper tail of the distribution is between .025 and .01.
The p-value < α = .05, so we can reject the null hypothesis.
Actual p-value is .0186.

Conclusion Using the Critical Value Approach
χ² = 10 > 7.815
We reject, at the .05 level of significance, the assumption that there is no home style preference.
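
The Finger Lakes Homes test can be reproduced with the sketch below. It uses only the Python standard library; the function names are illustrative assumptions, and the p-value helper uses the closed-form upper tail of a chi-square distribution with exactly 3 degrees of freedom, so it is not a general chi-square routine.

```python
from math import erfc, exp, pi, sqrt

# Homes sold by model (colonial, log cabin, split-level, A-frame)
# and the probabilities specified under H0.
observed = [30, 20, 35, 15]
hypothesized = [0.25, 0.25, 0.25, 0.25]


def goodness_of_fit_statistic(observed, probabilities):
    """Chi-square goodness-of-fit statistic for multinomial proportions."""
    n = sum(observed)
    expected = [p * n for p in probabilities]
    statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    return statistic, expected


def chi2_upper_tail_df3(x):
    """Upper-tail area of a chi-square distribution with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


statistic, expected = goodness_of_fit_statistic(observed, hypothesized)
p_value = chi2_upper_tail_df3(statistic)
alpha = 0.05
reject_h0 = p_value < alpha

print(f"chi-square = {statistic:.1f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Both approaches in the example lead to the same decision here: the statistic exceeds 7.815 and the p-value falls below .05.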

Goodness of Fit Test: Normal Distribution
1. State the null and alternative hypotheses.
   H0: The population has a normal distribution
   Ha: The population does not have a normal distribution
2. Select a random sample and
   a. Compute the mean and standard deviation.
   b. Define intervals of values so that the expected frequency is at least 5 for each interval.
   c. For each interval, record the observed frequencies.
3. Compute the expected frequency, ei, for each interval. (Multiply the sample size by the probability of a normal random variable being in the interval.)

4. Compute the value of the test statistic.

$$\chi^2 = \sum_{i=1}^{k} \frac{(f_i - e_i)^2}{e_i}$$

5. Reject H0 if χ² > χ²α (where α is the significance level and there are k - 3 degrees of freedom).

Example: IQ Computers
IQ Computers (one better than HP?) manufactures and sells a general purpose microcomputer. As part of a study to evaluate sales personnel, management wants to determine, at a .05 significance level, if the annual sales volume (number of units sold by a salesperson) follows a normal probability distribution.

A simple random sample of 30 of the salespeople was taken and their numbers of units sold are listed below.

33   43   44   45   52   52   56   58   63   64
64   65   66   68   70   72   73   73   74   75
83   84   85   86   91   92   94   98  102  105

(mean = 71, standard deviation = 18.54)

Hypotheses
H0: The population of number of units sold has a normal distribution with mean 71 and standard deviation 18.54.
Ha: The population of number of units sold does not have a normal distribution with mean 71 and standard deviation 18.54.

Interval Definition
To satisfy the requirement of an expected frequency of at least 5 in each interval, we will divide the normal distribution into 30/5 = 6 equal probability intervals.

[Figure: normal curve divided into six equal-probability intervals, each with area 1.00/6 = .1667]

Interval boundaries:
53.02 = 71 - .97(18.54)
63.03 = 71 - .43(18.54)
71
78.97 = 71 + .43(18.54)
88.98 = 71 + .97(18.54)
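
Equal-probability boundaries like these can be obtained from the inverse normal CDF. The sketch below uses statistics.NormalDist from the Python standard library (Python 3.8 or later); the variable names are illustrative assumptions. Because the slide rounds the z-values to .43 and .97, the computed boundaries differ from the slide's by a few hundredths.

```python
from statistics import NormalDist

# Sample mean and standard deviation from the IQ Computers example.
sample_mean = 71
sample_sd = 18.54
num_intervals = 6  # 30 observations / expected frequency of 5 per interval

fitted = NormalDist(mu=sample_mean, sigma=sample_sd)

# The boundaries are the quantiles at 1/6, 2/6, ..., 5/6, so that each
# of the six intervals has probability 1/6 under the fitted normal.
boundaries = [fitted.inv_cdf(i / num_intervals) for i in range(1, num_intervals)]

for b in boundaries:
    print(f"{b:.2f}")
```

The later steps of the example keep the slide's rounded boundaries, which classify the 30 observations in the same way.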

Observed and Expected Frequencies

Interval            fi    ei    fi - ei
Less than 53.02      6     5       1
53.02 to 63.03       3     5      -2
63.03 to 71.00       6     5       1
71.00 to 78.97       5     5       0
78.97 to 88.98       4     5      -1
More than 88.98      6     5       1
Total               30    30

Rejection Rule
With α = .05 and k - p - 1 = 6 - 2 - 1 = 3 d.f. (where k = number of categories and p = number of population parameters estimated), χ².05 = 7.815.
Reject H0 if p-value < .05 or χ² > 7.815.

Test Statistic

$$\chi^2 = \frac{(1)^2}{5} + \frac{(-2)^2}{5} + \frac{(1)^2}{5} + \frac{(0)^2}{5} + \frac{(-1)^2}{5} + \frac{(1)^2}{5} = 1.600$$

Conclusion Using the p-Value Approach

Area in Upper Tail    .90    .10     .05     .025    .01
χ² Value (df = 3)     .584   6.251   7.815   9.348   11.345

Because χ² = 1.600 is between .584 and 6.251 in the Chi-Square Distribution Table, the area in the upper tail of the distribution is between .90 and .10.
The p-value > α = .05, so we cannot reject the null hypothesis.
There is little evidence to support rejecting the assumption that the population is normally distributed with μ = 71 and σ = 18.54.
Actual p-value is .6594.
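
The whole IQ Computers test can be retraced with the sketch below, written in standard-library Python. It reuses the interval boundaries exactly as given in the example rather than recomputing them; the variable and function names are illustrative assumptions, and the p-value helper is the closed-form upper tail for exactly 3 degrees of freedom.

```python
from bisect import bisect_right
from math import erfc, exp, pi, sqrt
from statistics import mean, stdev

# Units sold by the 30 sampled salespeople.
units_sold = [
    33, 43, 44, 45, 52, 52, 56, 58, 63, 64,
    64, 65, 66, 68, 70, 72, 73, 73, 74, 75,
    83, 84, 85, 86, 91, 92, 94, 98, 102, 105,
]

# Interval boundaries as given in the example (six equal-probability intervals).
boundaries = [53.02, 63.03, 71.00, 78.97, 88.98]

# Count the observations falling in each interval.
observed = [0] * (len(boundaries) + 1)
for value in units_sold:
    observed[bisect_right(boundaries, value)] += 1

# Each interval has probability 1/6, so every expected frequency is n/6.
expected = [len(units_sold) / len(observed)] * len(observed)
statistic = sum((f - e) ** 2 / e for f, e in zip(observed, expected))

# Degrees of freedom: k - p - 1, with p = 2 parameters (mean and standard
# deviation) estimated from the sample.
df = len(observed) - 2 - 1


def chi2_upper_tail_df3(x):
    """Upper-tail area of a chi-square distribution with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


p_value = chi2_upper_tail_df3(statistic)
reject_h0 = p_value < 0.05

print(f"mean = {mean(units_sold):.0f}, sd = {stdev(units_sold):.2f}")
print(f"observed = {observed}, chi-square = {statistic:.3f}, p-value = {p_value:.4f}")
```

The sample standard deviation uses the n - 1 divisor, which is what reproduces the 18.54 quoted in the example.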
