
Sociology 5811:

Lecture 16: Crosstabs 2


Measures of Association
Plus Differences in Proportions
Copyright © 2005 by Evan Schofer
Please copy or distribute without
permission
Announcements
• Final project proposals due Nov 15
• Get started now!!!
• Find a dataset
• Figure out what hypotheses you might test
• Today: Wrap up Crosstabs
• If time remains, we’ll discuss project ideas…
Review: Chi-square Test
• Chi-Square test is a test of independence
• Null hypothesis: the two categorical variables are
statistically independent
• There is no relationship between them
• H0: Gender and political party are independent
• Alternate hypothesis: the variables are related,
not independent of each other
• H1: Gender and political party are not independent
• Test is based on comparing the observed cell
values with the values you’d expect if there were
no relationship between variables.
Review: Expected Cell Values
• If two variables are independent, cell values will
depend only on row & column marginals
– Marginals reflect frequencies… And, if frequency is
high, all cells in that row (or column) should be high
• The formula for the expected value in a cell is:
$$E_{ij} = \frac{(R_i)(C_j)}{N}$$
• Ri and Cj are the row and column marginals
• N is the total sample size
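A minimal Python sketch of the expected-value formula, applied to the gender-by-party counts used in this lecture's example. The function name `expected_counts` and the list-of-rows table layout are illustrative choices, not part of the course materials.

```python
def expected_counts(observed):
    """Expected cell counts under independence: E_ij = R_i * C_j / N."""
    row_totals = [sum(row) for row in observed]        # R_i
    col_totals = [sum(col) for col in zip(*observed)]  # C_j
    n = sum(row_totals)                                # N
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Democrat, Republican. Columns: Women, Men.
observed = [[27, 10], [16, 15]]
expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Each expected count depends only on the marginals, so a row or column with a large total pushes up every expected value in it.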
Review: Chi-square Test
• The Chi-square formula:
$$\chi^2 = \sum_{i=1}^{R} \sum_{j=1}^{C} \frac{(E_{ij} - O_{ij})^2}{E_{ij}}$$
• Where:
• R = total number of rows in the table
• C = total number of columns in the table
• Eij = the expected frequency in row i, column j
• Oij = the observed frequency in row i, column j
– Assumption for test: Large N (>100)
– Degrees of freedom for the critical value of Chi-square: (R-1)(C-1).
Chi-square Test of Independence
• Example: Gender and Political Views
– Let’s pretend that N of 68 is sufficient

              Women        Men
Democrat      O11: 27      O12: 10
              E11: 23.4    E12: 13.6
Republican    O21: 16      O22: 15
              E21: 19.6    E22: 11.4
Chi-square Test of Independence
• Compute (Eij – Oij)²/Eij for each cell

              Women                       Men
Democrat      (23.4 – 27)²/23.4 = .55     (13.6 – 10)²/13.6 = .95
Republican    (19.6 – 16)²/19.6 = .66     (11.4 – 15)²/11.4 = 1.14
Chi-Square Test of Independence
• Finally, sum up to compute the Chi-square
• χ² = .55 + .95 + .66 + 1.14 = 3.30
• What is the critical value for α = .05?
• Degrees of freedom: (R-1)(C-1) = (2-1)(2-1) = 1
• According to Knoke, p. 509: Critical value is 3.84
• Question: Can we reject H0?
• No. χ² of 3.30 is less than the critical value
• We cannot conclude that there is a relationship between
gender and political party affiliation.
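The chi-square calculation for this example can be sketched in Python as follows. The function name `chi_square` is hypothetical. The code uses unrounded expected counts, so its result can differ in the last digit from a hand calculation based on rounded values. The critical value 3.84 is the one cited from Knoke above.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a crosstab."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count
            stat += (e_ij - o_ij) ** 2 / e_ij
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


# Rows: Democrat, Republican. Columns: Women, Men.
stat, dof = chi_square([[27, 10], [16, 15]])
CRITICAL_VALUE = 3.84  # alpha = .05, 1 degree of freedom (from the lecture)
reject_h0 = stat > CRITICAL_VALUE
print(round(stat, 2), dof, reject_h0)
```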
Chi-square Test of Independence
• Weaknesses of chi-square tests:
• 1. If the sample is very large, we almost always
reject H0.
• Even tiny covariations are statistically significant
• But, they may not be socially meaningful differences
• 2. It doesn’t tell us how strong the relationship is
• It doesn’t tell us if it is a large, meaningful difference or a
very small one
• It is only a test of “independence” vs. “dependence”
• Measures of Association address this shortcoming.
Measures of Association
• Separate from the issue of independence,
statisticians have created measures of association
– They are measures that tell us how strong the
relationship is between two variables
• Weak Association vs. Strong Association

Weak Association            Strong Association
        Women   Men                 Women   Men
Dem.    51      49          Dem.    100     0
Rep.    49      51          Rep.    0       100


Crosstab Association: Yule’s Q
• #1: Yule’s Q
– Appropriate only for 2x2 tables (2 rows, 2 columns)
• Label cell frequencies a through d:

    a   b
    c   d

• Formula:
$$Q = \frac{bc - ad}{bc + ad}$$
• Recall that extreme values along the “diagonal”
(cells a & d) or the “off-diagonal” (b & c)
indicate a strong relationship.
• Yule’s Q captures that in a measure
• 0 = no association. -1, +1 = strong association
Crosstab Association: Yule’s Q
• Rule of Thumb for interpreting Yule’s Q:
• Bohrnstedt & Knoke, p. 150

Absolute value of Q     Strength of Association
0 to .24                “virtually no relationship”
.25 to .49              “weak relationship”
.50 to .74              “moderate relationship”
.75 to 1.0              “strong relationship”


Crosstab Association: Yule’s Q
• Example: Gender and Political Party Affiliation

        Women     Men
Dem     a = 27    b = 10
Rep     c = 16    d = 15

• Calculate “bc”: bc = (10)(16) = 160
• Calculate “ad”: ad = (27)(15) = 405

$$Q = \frac{bc - ad}{bc + ad} = \frac{160 - 405}{160 + 405} = \frac{-245}{565} = -.43$$

• -.43 = “weak association”
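A short Python sketch of Yule's Q using this lecture's sign convention, (bc - ad)/(bc + ad). The helper `strength_label` is hypothetical. It applies the rule-of-thumb cutoffs from Bohrnstedt & Knoke, and treating values that fall between the printed bands (e.g. .245) as belonging to the lower band is an assumption.

```python
def yules_q(a, b, c, d):
    """Yule's Q for a 2x2 table with cells a, b (top row) and c, d (bottom row)."""
    return (b * c - a * d) / (b * c + a * d)


def strength_label(q):
    """Rule-of-thumb label based on the absolute value of Q."""
    size = abs(q)
    if size < 0.25:
        return "virtually no relationship"
    if size < 0.50:
        return "weak relationship"
    if size < 0.75:
        return "moderate relationship"
    return "strong relationship"


# Gender and party example: a = 27, b = 10, c = 16, d = 15.
q = yules_q(27, 10, 16, 15)
print(round(q, 2), strength_label(q))
```

The function divides by bc + ad, so it fails when both products are zero.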
Crosstab Association
• Final remarks:
• You have a variety of possible measures to assess
association among variables. Which one should
you use?
• Yule’s Q and Phi require a 2x2 table
• Larger ordered tables: use Gamma, Tau-c, Somers’ d
• Ideally, report more than one to show that your findings are
robust.
Odds Ratios
• Odds ratios are a powerful way of analyzing
relationships in crosstabs
• Many advanced categorical data analysis techniques are
based on odds ratios
• Review: What is a probability?
• p(A) = # of outcomes that are “A” divided by total number
of outcomes
• To convert a frequency distribution to a probability
distribution, simply divide frequency by N
• The same can be done with crosstabs: Cell frequency over
N is probability.
Odds Ratios
• If total N = 68, probability of drawing cases is:

        Women      Men
Dem     27 / 68    10 / 68
Rep     16 / 68    15 / 68

        Women      Men
Dem     .397       .147
Rep     .235       .221


Odds Ratios
• Odds are similar to probability… but not quite
• Odds of A = Number of outcomes that are A,
divided by number of outcomes that are not A
– Note: The denominator is different from that of a probability
• Ex: Probability of rolling 1 on a 6-sided die = 1/6
• Odds of rolling a 1 on a six-sided die = 1/5
• Odds can also be calculated from probabilities:
$$\text{odds}_i = \frac{p_i}{1 - p_i}$$
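The conversion between probability and odds can be sketched in Python with exact fractions, using the die example above. The function names are illustrative. The inverse conversion, p = odds / (1 + odds), is not stated on the slide but follows from rearranging the formula.

```python
from fractions import Fraction


def odds_from_probability(p):
    """odds = p / (1 - p)"""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse of the formula above: p = odds / (1 + odds)."""
    return odds / (1 + odds)


p_one = Fraction(1, 6)                   # probability of rolling a 1
odds_one = odds_from_probability(p_one)  # odds of rolling a 1
print(odds_one)                          # prints 1/5
```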
Odds Ratios
• Conditional odds = odds of being in one category
of a variable within a specific category of
another variable
– Example: For women, what are the odds of being a Democrat?
– Instead of overall odds of being a Democrat, conditional
odds are about a particular subgroup in a table

        Women    Men
Dem     27       10
Rep     16       15

• Conditional odds of being a Democrat for women: 27 / 16 = 1.69
• Conditional odds of being a Democrat for men: 10 / 15 = .67
• Note: Odds for women are different from odds for men
Odds Ratios
• If variables in a crosstab are independent, their
conditional odds are equal
• Odds of falling into one category or another are same for all
values of other variable
• If variables in a crosstab are associated,
conditional odds differ
• Odds can be compared by making a ratio
• Ratio is equal to 1 if odds are the same for two groups
• Ratios much greater or less than 1 indicate very different
odds.
Odds Ratios
• Formula for Odds Ratio in 2x2 table:

        Women     Men
Dem     a = 27    b = 10
Rep     c = 16    d = 15

$$OR^{XY} = \frac{b/d}{a/c} = \frac{bc}{ad}$$

• Ex: OR = (10)(16) / (27)(15) = 160 / 405 = .395


• Interpretation: men have .395 times the odds of
being a Democrat compared to women
• Inverted value (1/.395 ≈ 2.5) indicates that women’s odds of
being a Democrat are about 2.5 times men’s odds
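A small Python sketch of conditional odds and the odds ratio for the gender and party table. The function name `conditional_odds` and the variable names are illustrative assumptions.

```python
def conditional_odds(in_category, not_in_category):
    """Odds of being in a category within one subgroup (one column of the table)."""
    return in_category / not_in_category


# Cells: a = Dem women, b = Dem men, c = Rep women, d = Rep men.
a, b, c, d = 27, 10, 16, 15

odds_women = conditional_odds(a, c)  # odds of Democrat among women: a / c
odds_men = conditional_odds(b, d)    # odds of Democrat among men: b / d

odds_ratio = odds_men / odds_women   # (b/d) / (a/c), which equals bc / ad
inverted = 1 / odds_ratio            # women's odds relative to men's

print(round(odds_women, 2), round(odds_men, 2))
print(round(odds_ratio, 3), round(inverted, 2))
```

A ratio of exactly 1 would mean both groups have the same odds. Any zero in cell c or d makes a conditional odds undefined, and a zero in cell a makes the ratio undefined, so this sketch assumes no such empty cells.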
Odds Ratios: Final Remarks
• 1. Cells with zeros cause problems for odds ratios
• Ratios with zero in denominator are undefined.
• Thus, you need to have full cells
• 2. Odds ratios can be used to measure association
• Indeed, Yule’s Q is based on them
• 3. Odds ratios form the basis for most advanced
categorical data analysis techniques
• For now it may be easier to use Yule’s Q, etc. But, if you
need to do advanced techniques, you will use odds ratios.
Association: Other Measures
• Phi (φ)
• Very similar to Yule’s Q
• Only for 2x2 tables, ranges from –1 to 1, 0 = no assoc.
• Gamma (G)
• Based on a very different method of calculation
• Not limited to 2x2 tables
• Requires ordered variables
• Tau-c (τc) and Somers’ d (dyx)
• Same basic principle as Gamma
• Several Others discussed in Knoke, Norusis.
Crosstab Association: Gamma
• Gamma, like Q, is based on comparing
“diagonal” to “off-diagonal” cases.
– But, it does so differently
• Jargon:
• Concordant pairs: Pairs of cases where one case
is higher on both variables than another case
• Discordant pairs: Pairs of cases for which the
first case (when compared to a second) is higher
on one variable but lower on another
Crosstab Association: Gamma
• Example: Approval of candidates
– Cases in “Love Trees/Love Guns” cell make
concordant pairs with cases lower on both

             Hate Trees    Trees = OK    Love Trees
Love Guns    1205          603           71
Guns = OK    659           1498          452
Hate Guns    431           467           1120

• All 71 individuals can be a pair with everyone in the lower cells. Just multiply!
• (71)(659 + 1498 + 431 + 467) = 216,905 conc. pairs
Crosstab Association: Gamma
• More possible concordant pairs
– The “Love Guns/Trees = OK” cell and the “Guns = OK/Love
Trees” cell can also have concordant pairs

             Hate Trees    Trees = OK    Love Trees
Love Guns    1205          603           71
Guns = OK    659           1498          452
Hate Guns    431           467           1120

• The 603 can pair with all those that score lower on approval for Guns & Trees:
(603)(659 + 431) = 657,270 conc. pairs
• The 452 can pair lower too!
(452)(431 + 467) = 405,896 conc. pairs
Crosstab Association: Gamma
• Discordant pairs: Pairs where a first person ranks
higher on one dimension (e.g. approval of Trees)
but lower on the other (e.g., app. of Guns)

             Hate Trees    Trees = OK    Love Trees
Love Guns    1205          603           71
Guns = OK    659           1498          452
Hate Guns    431           467           1120

• The top-left cell is higher on Guns but lower on Trees than those in the lower right. They make pairs:
(1205)(1498 + 452 + 467 + 1120) = 4,262,085 discordant pairs
Crosstab Association: Gamma
• If all pairs are concordant or all pairs are
discordant, the variables are strongly related
• If there are an equal number of discordant and
concordant pairs, the variables are weakly
associated.
• Formula for Gamma:
$$G = \frac{n_s - n_d}{n_s + n_d}$$
• ns = number of concordant pairs
• nd = number of discordant pairs
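Counting pairs by hand is tedious, so here is a minimal Python sketch that counts every concordant and discordant pair in the Guns and Trees table and then computes Gamma. The function name and the low-to-high ordering of rows and columns are assumptions of this sketch; the slides list the Guns rows from "Love" down to "Hate", so the rows are reversed here.

```python
def concordant_discordant(table):
    """Count concordant (ns) and discordant (nd) pairs.

    Rows and columns must both be ordered from low to high.
    Each cell is paired with every cell in a higher row: a higher column
    gives concordant pairs, a lower column gives discordant pairs.
    """
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        ns += table[i][j] * table[k][m]
                    elif m < j:
                        nd += table[i][j] * table[k][m]
    return ns, nd


# Rows: Hate Guns, Guns = OK, Love Guns (low to high).
# Columns: Hate Trees, Trees = OK, Love Trees (low to high).
table = [
    [431, 467, 1120],
    [659, 1498, 452],
    [1205, 603, 71],
]
ns, nd = concordant_discordant(table)
gamma = (ns - nd) / (ns + nd)
print(ns, nd, round(gamma, 2))
```

For this table the discordant pairs far outnumber the concordant ones, so Gamma comes out clearly negative (roughly -.61): people who approve of Guns tend to disapprove of Trees, and vice versa.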
Crosstab Association: Gamma
• Calculation of Gamma is typically done by
computer
• Zero indicates no association
• +1 = strong positive association
• -1 = strong negative association
• It is possible to do hypothesis tests on Gamma
• To determine if population gamma differs from zero
• Requirements: random sample, N > 50
• See Knoke, p. 155-6.
Tests for Difference in Proportions
• Another approach to small (2x2) tables:
• Instead of making a crosstab, you can just think
about the proportion of people in a given category
• More similar to T-test than a Chi-square test
• Ex: Do you approve of Pres. Bush? (Yes/No)
• Sample: N = 86 women, 80 men
• Proportion of women that approve: PW = .70
• Proportion of men that approve: PM = .78
• Issue: Do the populations of men/women differ?
• Or are the differences just due to sampling variability
Tests for Difference in Proportions
• Hypotheses:
• Again, the typical null hypothesis is that there are
no differences between groups
• Which is equivalent to statistical independence
• H0: Proportion women = proportion men
• H1: Proportion women not = proportion men
• Note: One-tailed directional hypotheses can also be used.
Tests for Difference in Proportions
• Strategy: Figure out the sampling distribution for
differences in proportions
• Statisticians have determined relevant info:
• 1. If samples are “large”, the sampling
distribution of difference in proportions is normal
– The Z-distribution can be used for hypothesis tests
• 2. A Z-value can be calculated using the formula:
$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}}$$
Tests for Difference in Proportions
• Standard error can be estimated as:
$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
• Where:
$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2}$$
Difference in Proportions: Example
• Q: Do you approve of Pres. Bush? (Yes/No)
• Sample: N = 86 women, 80 men
• Women: N = 86, PW = .70
• Men: N = 80, PM = .78
• Total N is “Large”: 166 people
– So, we can use a Z-test
• Use α = .05, two-tailed critical Z = 1.96
Difference in Proportions: Example
• Use formula to calculate Z-value
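$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{-.08}{\hat{\sigma}_{(P_1 - P_2)}}$$
• And, estimate the Standard Error as:
$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{P_{both}(1 - P_{both})} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$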
Difference in Proportions: Example
• First: Calculate Pboth:
$$P_{both} = \frac{N_1 P_1 + N_2 P_2}{N_1 + N_2} = \frac{86(.70) + 80(.78)}{86 + 80} = \frac{60.2 + 62.4}{166} = .739$$
Difference in Proportions: Example
• Plug in Pboth = .739:
$$\hat{\sigma}_{(P_1 - P_2)} = \sqrt{.739(1 - .739)} \sqrt{\frac{N_1 + N_2}{N_1 N_2}} = \sqrt{.193} \sqrt{\frac{86 + 80}{(86)(80)}}$$
$$\hat{\sigma}_{(P_1 - P_2)} = .4392 \sqrt{\frac{166}{6880}} = (.4392)(.1553) = .0682$$
Difference in Proportions: Example
• Finally, plug in S.E. and calculate Z:
$$Z = \frac{P_1 - P_2}{\hat{\sigma}_{(P_1 - P_2)}} = \frac{.70 - .78}{.0682} = \frac{-.08}{.0682} = -1.17$$
Difference in Proportions: Example
• Results:
• Critical Z = 1.96
• Observed Z = -1.17
• The absolute value of the observed Z (1.17) is less than 1.96
• Conclusion: We can’t reject null hypothesis
– Women and Men do not clearly differ in approval of
Bush
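The whole difference-in-proportions test can be sketched in a few lines of Python. The function name `two_proportion_z` and its return layout are illustrative assumptions. The formulas are the pooled-proportion standard error and Z-value from the slides above, computed without intermediate rounding.

```python
from math import sqrt


def two_proportion_z(p1, n1, p2, n2):
    """Return (Z, standard error, pooled proportion) for two sample proportions."""
    p_both = (n1 * p1 + n2 * p2) / (n1 + n2)
    se = sqrt(p_both * (1 - p_both)) * sqrt((n1 + n2) / (n1 * n2))
    z = (p1 - p2) / se
    return z, se, p_both


# Women: N = 86, P = .70. Men: N = 80, P = .78.
z, se, p_both = two_proportion_z(0.70, 86, 0.78, 80)

CRITICAL_Z = 1.96  # alpha = .05, two-tailed
reject_h0 = abs(z) > CRITICAL_Z
print(round(p_both, 3), round(se, 4), round(z, 2), reject_h0)
```

Because the test is two-tailed, the comparison uses the absolute value of Z against the critical value.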
