[Figure: classification of data. Data are Quantitative (Discrete or Continuous) or Qualitative. A Nominal variable is… : 1 pop., 2 pop., 2 or more pop.; Proportion, Independence]
EPI809/Spring 2008
Chi-Square Tables
Degrees of freedom = 5 - 1 = 4
Right-tail area = α = 0.05
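Looking up a chi-square table means finding the right-tail critical value for a given df and α. As a minimal sketch using only the Python standard library (the helper names `chi2_cdf` and `chi2_critical` are hypothetical), the table entries can be approximated from the series expansion of the regularized lower incomplete gamma function plus bisection. It is meant for the moderate values that appear in printed tables, not for extreme tails.

```python
import math


def chi2_cdf(x: float, df: int) -> float:
    """Lower-tail probability P(X <= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(a, z)
    with a = df / 2 and z = x / 2. Adequate for table-sized values of x.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # n = 0 term of the series
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)  # next term: z^n / (a (a+1) ... (a+n))
        total += term
        n += 1
    return math.exp(-z + a * math.log(z) - math.lgamma(a)) * total


def chi2_critical(alpha: float, df: int) -> float:
    """Right-tail critical value: the x with P(X > x) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_cdf(hi, df) > alpha:  # widen until the tail area drops below alpha
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # A small table: rows are df, columns are right-tail areas.
    alphas = (0.05, 0.01, 0.001)
    print("df  " + "  ".join(f"{a:>7}" for a in alphas))
    for df in range(1, 6):
        print(f"{df:<3} " + "  ".join(f"{chi2_critical(a, df):7.3f}" for a in alphas))
```

For the lookup above (df = 4, right-tail area 0.05) the sketch gives about 9.488, the value a printed table lists in that cell. Each row is larger than the one before it, in line with the critical value growing with df.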
Characteristics of the Chi-Square Distribution
… it is positively skewed
… it is non-negative
… it is based on degrees of freedom
… when the degrees of freedom change, a new distribution is created
… e.g.
Copyright © 2004 McGraw-Hill Ryerson Limited. All rights reserved.
[Figure: chi-square density curves for df = 3, df = 5, and df = 10]
Summing up the properties of the χ² distribution:
The χ² distribution ranges from zero to some positive value, i.e., from ‘no difference’ to some ‘big difference’.
The χ² distribution is not symmetrical but skewed to the right, from zero to a large positive χ². Chi-square looks at differences from zero. Its value depends on the number of comparisons made, that is, the number of df. Note that the critical value of chi-square gets bigger as the df get bigger, simply because the more comparisons are made, the more likely you are to find differences; the df correct for this.
There are many different χ² distributions. Like the t distribution, χ² varies with degrees of freedom.
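The properties above (non-negative, skewed to the right, a different curve for each df) can be checked by simulation. This is a rough sketch, assuming the standard construction of a χ² variable with df degrees of freedom as the sum of df squared standard normal draws. The helper names and the seed are illustrative. Theory gives mean = df and skewness √(8/df), so the skew should shrink as df grows.

```python
import random
import statistics


def sample_chi2(df: int, n: int, rng: random.Random) -> list[float]:
    """Draw n chi-square values, each the sum of df squared standard normal draws."""
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df)) for _ in range(n)]


def skewness(xs: list[float]) -> float:
    """Sample skewness: mean cubed deviation divided by the cubed population std dev."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3


rng = random.Random(809)  # fixed seed so the run is reproducible
results = {}
for df in (3, 5, 10):  # the same df values as the figure
    xs = sample_chi2(df, 30_000, rng)
    results[df] = (min(xs), statistics.fmean(xs), skewness(xs))
    low, mean, skew = results[df]
    print(f"df={df:>2}  min={low:.3f}  mean={mean:.2f}  skew={skew:.2f}")
```

Every draw is non-negative, each mean lands near its df, and the skewness is positive but drops from df = 3 to df = 10, which is the "new distribution for each df" point in numbers.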
CHI SQUARE APPLICATIONS (χ²)
• Test of independence
• Test between proportions
• Test of normality
Basic Assumption of the Null Hypothesis
• There is no difference in the population; the difference you observe is just chance variation in your sample.
• We are comparing observed values (the frequency actually observed in our sample, written "fo") to a set of frequencies expected by chance (written "fe").
χ² test for the difference between two proportions
… the test statistic is:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
1. Determine Appropriate Test
α is a predetermined value. By convention:
• α = .05
• α = .01
• α = .001
3. Determine the Hypothesis: Whether There Is an Association or Not
H0: The two variables are independent
Ha: The two variables are associated
4. Calculating Test Statistics
The test contrasts the observed frequencies in each cell of a contingency table with the expected frequencies.
The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e., the nominal variables are unrelated).
For two unrelated variables, the expected frequency of a cell is the product of its row and column frequencies divided by the number of cases:
$$F_e = \frac{F_r F_c}{N}$$
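The rule Fe = Fr Fc / N can be applied to every cell at once. A minimal sketch in plain Python; the function name `expected_frequencies` is hypothetical, and the counts are the gun-control table from the example in this section (rows Democrat, Republican; columns Favor, Neutral, Oppose).

```python
def expected_frequencies(table: list[list[float]]) -> list[list[float]]:
    """Fe = Fr * Fc / N for every cell of a contingency table (rows x columns)."""
    row_totals = [sum(row) for row in table]          # Fr for each row
    col_totals = [sum(col) for col in zip(*table)]    # Fc for each column
    n = sum(row_totals)                               # N, the number of cases
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


observed = [[10, 10, 30],   # Democrat: Favor, Neutral, Oppose
            [15, 15, 10]]   # Republican
for row in expected_frequencies(observed):
    print(["%.2f" % fe for fe in row])
```

The expected counts keep the table's row and column totals (they sum to N = 90); only their split across cells changes to what independence would predict.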
4. Calculating Test Statistics
$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$$
5. Determine Degrees of Freedom
df = (R - 1)(C - 1), where R is the number of rows and C the number of columns
6. Compare the computed test statistic against a tabled/critical value
The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable under H0.
The critical tabled values are based on sampling distributions of the Pearson chi-square statistic.
If the calculated χ² is greater than the tabled χ² value, reject H0.
Example
Bivariate Frequency Table or Contingency Table
              Favor   Neutral   Oppose    f row
Democrat         10        10       30       50
Republican       15        15       10       40
f column         25        25       40   n = 90
(f row = row frequency; f column = column frequency)
1. Determine Appropriate Test
Alpha of .05
3. Determine The Hypothesis
4. Calculating Test Statistics
χ² = 11.03
5. Determine Degrees of Freedom
df = (R - 1)(C - 1) = (2 - 1)(3 - 1) = 2
6. Compare computed test statistic against a tabled/critical value
α = 0.05
df = 2
Critical tabled value = 5.991
Test statistic, 11.03, exceeds critical value
Null hypothesis is rejected
Democrats & Republicans differ significantly in their opinions on gun control issues
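Steps 4 to 6 for this example can be strung together in a few lines. A sketch in plain Python, assuming the tabled critical value 5.991 (α = 0.05, df = 2) from the slide rather than computing it; the function name `chi_square_independence` is hypothetical.

```python
def chi_square_independence(table):
    """Return (Pearson chi-square statistic, df) for an R x C table of counts."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n  # Fe = Fr * Fc / N
            stat += (fo - fe) ** 2 / fe              # (Fo - Fe)^2 / Fe
    df = (len(table) - 1) * (len(table[0]) - 1)      # df = (R - 1)(C - 1)
    return stat, df


gun_control = [[10, 10, 30],   # Democrat: Favor, Neutral, Oppose
               [15, 15, 10]]   # Republican
stat, df = chi_square_independence(gun_control)
CRITICAL_05_DF2 = 5.991        # tabled value for alpha = 0.05, df = 2 (from the slide)
print(f"chi-square = {stat:.3f}, df = {df}")
print("reject H0" if stat > CRITICAL_05_DF2 else "fail to reject H0")
```

The unrounded statistic is 11.025, matching the SPSS output; the slides round it to 11.03.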
SPSS Output for Gun Control Example
Chi-Square Tests
                                 Value    df   Asymp. Sig. (2-sided)
Pearson Chi-Square              11.025a    2   .004
Likelihood Ratio                11.365     2   .003
Linear-by-Linear Association     8.722     1   .003
N of Valid Cases                    90
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 11.11.
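Several lines of this output can be reproduced by hand. A sketch, assuming the likelihood-ratio row is the usual G statistic 2 Σ O ln(O/E), and using the fact that for df = 2 only the chi-square right-tail area has the closed form exp(-x/2). The Linear-by-Linear row depends on the category scores SPSS assigns and is not reproduced here. All names are illustrative.

```python
import math

observed = [[10, 10, 30],   # Democrat: Favor, Neutral, Oppose
            [15, 15, 10]]   # Republican
row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)
expected = [[fr * fc / n for fc in col_totals] for fr in row_totals]

# Flatten to (observed, expected) pairs, one per cell.
cells = [(fo, fe) for o_row, e_row in zip(observed, expected) for fo, fe in zip(o_row, e_row)]
pearson = sum((fo - fe) ** 2 / fe for fo, fe in cells)
likelihood_ratio = 2 * sum(fo * math.log(fo / fe) for fo, fe in cells if fo > 0)  # G statistic
min_expected = min(fe for _, fe in cells)
cells_below_5 = sum(1 for _, fe in cells if fe < 5)


def p_value_df2(x: float) -> float:
    """Right-tail area of chi-square with df = 2 (valid for df = 2 only)."""
    return math.exp(-x / 2)


print(f"Pearson chi-square  {pearson:.3f}  p = {p_value_df2(pearson):.3f}")
print(f"Likelihood ratio    {likelihood_ratio:.3f}  p = {p_value_df2(likelihood_ratio):.3f}")
print(f"{cells_below_5} cells with expected count < 5; minimum expected count {min_expected:.2f}")
```

The footnote's check matters: the chi-square approximation behind the "Asymp. Sig." column is usually trusted only when expected counts are not too small, and here the smallest is 11.11.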
Another case...
Approval for President Obama by Race
              BLACKS   WHITES
APPROVE           69      156
DISAPPROVE        21      144
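The slide gives only the counts for this case. As a hedged sketch, the Pearson statistic for a 2x2 table can be computed either with the general Σ (fo - fe)² / fe formula or with the algebraically equivalent shortcut N(ad - bc)² / (R1 R2 C1 C2); no Yates continuity correction is applied. The function names are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for [[a, b], [c, d]] via N(ad - bc)^2 / (R1 R2 C1 C2)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_general(table):
    """Same statistic as sum((fo - fe)^2 / fe) with fe = row total * column total / N."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))


# Rows: approve, disapprove; columns: Blacks, Whites.
approval = [[69, 156],
            [21, 144]]
shortcut = chi_square_2x2(69, 156, 21, 144)
general = chi_square_general(approval)
df = (2 - 1) * (2 - 1)  # a 2x2 table has 1 degree of freedom
print(f"shortcut = {shortcut:.3f}, general = {general:.3f}, df = {df}")
```

The two formulas agree, which is a handy cross-check when working a 2x2 table by hand. The result would then be compared with the tabled χ² value for df = 1.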
The formula for χ² is:
$$\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$$
OR, sometimes written:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where fo is the observed frequency of each category in each cell of a table. O or fo is what we observe from our sample, the observed frequency. NOTE that χ² works with frequencies in each cell.
Observed fo 77 41 32
Expected fe 50 50 50
Step 1. Hypothesis:
Null = the proportions preferring each beer should be equal if the beers are indeed equal and if preferences are not influenced by the label. Here, chance would predict 50 people in each group if the label did not matter. If the label does not matter, the ratio of O to E should be the same across all 3 comparisons; that is, the O : E ratios in each column should be the same. Our alternative hypothesis is that preferences will follow the status of the beers: beer 1 > beer 2 > beer 3.
Step 2. The Distribution:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Beer        Hi Priced   Med Priced   Lo Priced
Observed           77           41          32
Expected           50           50          50
O - E              27           -9         -18
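The beer table can be carried through to the statistic. A minimal sketch of the goodness-of-fit calculation, assuming equal expected counts (150 / 3 = 50) under the null and borrowing the tabled value 5.991 for α = 0.05, df = 2 that this section uses elsewhere; the variable names are illustrative.

```python
observed = {"Hi Priced": 77, "Med Priced": 41, "Lo Priced": 32}
n = sum(observed.values())
expected = n / len(observed)  # equal preference under H0: 150 / 3 = 50 per beer

contributions = {beer: (fo - expected) ** 2 / expected for beer, fo in observed.items()}
chi_square = sum(contributions.values())
df = len(observed) - 1        # k - 1 categories

for beer, value in contributions.items():
    print(f"{beer:<11} O-E = {observed[beer] - expected:+.0f}  (O-E)^2/E = {value:.2f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
print("reject H0 at alpha = 0.05" if chi_square > 5.991 else "fail to reject H0")
```

The Hi Priced beer alone contributes 27² / 50 = 14.58 of the total 22.68, so most of the departure from "label does not matter" comes from that column.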
…to determine the mean and standard deviation of the frequency distribution
- Compute the z-value for the lower and upper class limits of each class
- Determine fe for each category
- Use the chi-square goodness-of-fit test to determine whether fo coincides with fe
Attention
• Suppose we know the mean and standard deviation of the population but wish to find whether some sample data conform to the normal distribution:
d.f. = k - 1
• On the other hand, if we don't know the mean and standard deviation of the population but wish to test whether some sample data follow the normal distribution:
d.f. = k - p - 1
where k is the number of categories (classes) and p is the number of population parameters being estimated from the sample data.
Goodness-of-Fit Test: Normality
Foundation is reported in the
following frequency distribution
Class            Frequency
<$6                     20
$6 up to $8             60
$8 up to $10           140
$10 up to $12          120
$12 up to $14           90
>$14                    70
Total                  500
$$z = \frac{X - \mu}{\sigma} = \frac{6 - 10}{2} = -2.00$$
Now… find the probability of a z-value less than -2.00
Class            Frequency   Area
<$6                     20    .02
$6 up to $8             60    .14
$8 up to $10           140    .34
$10 up to $12          120    .34
$12 up to $14           90    .14
>$14                    70    .02
Total                  500
$$f_e = (0.0228)(500) = 11.40$$
The other expected frequencies are computed similarly.
Class             fo    Area       fe   (fo - fe)²/fe
<$6               20     .02    11.40        6.49
$6 up to $8       60     .14    67.95         .93
$8 up to $10     140     .34   170.65        5.50
$10 up to $12    120     .34   170.65       15.03
$12 up to $14     90     .14    67.95        7.16
>$14              70     .02    11.40      301.22
Total            500              500      336.33
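The whole normality calculation (z-values for the class limits, class areas, expected frequencies, and the statistic) can be sketched in a few lines, assuming μ = 10 and σ = 2 as in the z computation above and using `math.erf` for the standard normal CDF in place of a printed z table. Because exact areas replace the table-rounded ones (for example 0.02275 instead of .0228), the total comes out near 337.3 rather than exactly 336.33; the conclusion is unchanged. Names are illustrative.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


MEAN, SD = 10.0, 2.0                     # mean and standard deviation used on the slides
limits = [6, 8, 10, 12, 14]              # class boundaries in dollars
observed = [20, 60, 140, 120, 90, 70]    # <$6, $6-8, $8-10, $10-12, $12-14, >$14
n = sum(observed)

z = [(x - MEAN) / SD for x in limits]    # -2, -1, 0, 1, 2
cdf = [0.0] + [normal_cdf(v) for v in z] + [1.0]
areas = [hi - lo for lo, hi in zip(cdf, cdf[1:])]  # probability of each class
expected = [p * n for p in areas]

chi_square = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
for fo, p, fe in zip(observed, areas, expected):
    print(f"{fo:>4} {p:.4f} {fe:8.2f} {(fo - fe) ** 2 / fe:8.2f}")
print(f"chi-square = {chi_square:.2f}")
```

The last class (>$14) dominates the statistic: 70 observed against about 11.4 expected.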
Step 1 H0: The observations follow the normal distribution
H1: The observations do NOT follow the normal distribution
Step 2 α = 0.05
Step 4 χ² = 336.33
H0 is rejected.
The observations do NOT follow the normal distribution
A contingency table is used to investigate
whether two traits or characteristics
are related
… each observation is classified according to two criteria
…the usual hypothesis testing procedure is used
of the person involved in the accident?
Sex        Location                  Total
           Work    Home    Other
Male         60      20       10        90
Female       20      30       10        60
Total        80      50       20       150
The expected frequency for the work-male intersection is computed as (90)(80)/150 = 48.
Similarly, you can compute the expected frequencies for the other cells.
Step 1 H0: Gender and location are NOT related
H1: Gender and location are related
Step 2 α = 0.05
H0 is rejected if χ² > 5.991, df = 2
Step 3
(…there are (3 - 1)(2 - 1) = 2 degrees of freedom)
$$\chi^2 = \frac{(60 - 48)^2}{48} + \cdots + \frac{(10 - 8)^2}{8} = 16.667$$
H0 is rejected.
Gender and location are related!
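The "…" in Step 3 stands for the remaining cells. A sketch that fills in every cell's expected frequency and contribution, using `fractions.Fraction` so the total comes out exactly (50/3 ≈ 16.667) instead of as a rounded float. The dictionary layout is an illustrative choice; the comparison value 5.991 is the one stated in Step 2.

```python
from fractions import Fraction

observed = {
    "Male":   {"Work": 60, "Home": 20, "Other": 10},
    "Female": {"Work": 20, "Home": 30, "Other": 10},
}
rows = {sex: sum(cells.values()) for sex, cells in observed.items()}
cols = {loc: sum(observed[sex][loc] for sex in observed) for loc in observed["Male"]}
n = sum(rows.values())

chi_square = Fraction(0)
for sex, cells in observed.items():
    for loc, fo in cells.items():
        fe = Fraction(rows[sex] * cols[loc], n)  # e.g. Male/Work: 90 * 80 / 150 = 48
        part = (fo - fe) ** 2 / fe
        chi_square += part
        print(f"{sex:<6} {loc:<5} fo={fo:>3} fe={float(fe):5.1f} (fo-fe)^2/fe={float(part):.3f}")

df = (len(rows) - 1) * (len(cols) - 1)
print(f"chi-square = {float(chi_square):.3f}, df = {df}")
print("reject H0" if chi_square > Fraction("5.991") else "fail to reject H0")
```

The cell breakdown shows that the Work and Home columns (Male/Work, Female/Work, Male/Home, Female/Home) carry nearly all of the 16.667, while the Other column adds less than 1.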