
This article was downloaded by: [University of Alberta]

On: 7 January 2009


Access details: Access Details: [subscription number 713587337]
Publisher Informa Healthcare
Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House,
37-41 Mortimer Street, London W1T 3JH, UK

Encyclopedia of Biopharmaceutical Statistics


Publication details, including instructions for authors and subscription information:
http://www.informaworld.com/smpp/title~content=t713172960

Analysis of 2 × K Tables
Shiva Gautam
Harvard Medical School, Boston, Massachusetts, U.S.A.
Online Publication Date: 25 August 2004

To cite this Section: Gautam, Shiva (2004) 'Analysis of 2 × K Tables', Encyclopedia of Biopharmaceutical Statistics, 1:1, 1–7




Analysis of 2 × K Tables
Shiva Gautam
Harvard Medical School, Boston, Massachusetts, U.S.A.


INTRODUCTION
Data in 2 × K contingency tables are encountered frequently in biomedical, epidemiological, social, and behavioral studies. The variable represented by the two rows is called the row variable, and the variable represented by the K columns is called the column variable. (Whether a data set is displayed as a 2 × K or a K × 2 table is a matter of convenience.) Depending on the research design, both the row and column variables may be outcome (response) variables, or only one of them may be. More specifically, an observation may have been simultaneously categorized into one of the two rows and into one of the K columns, or it may have been first drawn from a given category of one variable (row or column) and then classified into one of the categories of the other variable (column or row). For example, setting aside the pros and cons of the study designs, consider a study to evaluate the association between smoking and lung cancer. The investigator may choose a design in which he/she first selects two groups of people according to whether or not they have lung cancer. Then each subject is classified into one of the smoking history categories (e.g., nonsmoker, light smoker, heavy smoker, etc.). Alternatively, the investigator may first select people according to smoking status, and then classify each subject in each smoking group according to whether or not he/she has lung cancer. Finally, the investigator may select a fixed number of subjects and then simultaneously classify them into one of the two lung cancer categories and into one of the several smoking categories. In many situations, the same computational procedures can be used to analyze the data regardless of the study design.
In the analysis of a 2 × K table it is important to distinguish between a nominal table and an ordinal table. The data from the lung cancer and smoking study described above give rise to an ordinal 2 × K table, because the columns of the table (e.g., nonsmoker, light smoker, heavy smoker, etc.) follow an (increasing) ordering or hierarchy, in the sense that any one category is at either a higher or a lower level than each of the remaining categories. Such an ordering among categories is sometimes called simple ordering. A 2 × K table with no ordering, in the sense that no category is at a higher or a lower level than any of the remaining categories, is called a nominal table. For example, a table showing low-birthweight babies (low birthweight = yes or no) by ethnicity (Asians, Blacks, Hispanics, Whites, etc.) is a 2 × K nominal table.
This paper presents some existing methods of analyzing data in 2 × K nominal and ordinal tables, and then discusses some recently developed methods for 2 × K ordinal tables as an extension of the Pearson chi-square test.

Encyclopedia of Biopharmaceutical Statistics
DOI: 10.1081/E-EBS-120023105
Copyright © 2004 by Marcel Dekker, Inc. All rights reserved.

ANALYSIS OF 2 × K NOMINAL TABLES
Let $n_{ij}$ denote the number of observations in the ith row (i = 1, 2) and jth column (j = 1, 2, . . ., K) as displayed in Table 1. Also, let
$$n_{i+} = \sum_{j=1}^{K} n_{ij}, \qquad n_{+j} = \sum_{i=1}^{2} n_{ij}, \qquad n = \sum_{i=1}^{2}\sum_{j=1}^{K} n_{ij} = \sum_{i=1}^{2} n_{i+} = \sum_{j=1}^{K} n_{+j}.$$
The columns of the table, for the time being, are assumed to be nominal.


Table 1 A 2 × K contingency table

                     Column
Row        1       2       ...     K       Total
1          n11     n12     ...     n1K     n1+
2          n21     n22     ...     n2K     n2+
Total      n+1     n+2     ...     n+K     n

Pearson's Chi-Square Test
Perhaps the most popular method for analyzing 2 × K nominal tables is Pearson's chi-square procedure, introduced by Karl Pearson in 1900. The Pearson chi-square test statistic is defined as
$$X^2 = \sum_{i=1}^{2}\sum_{j=1}^{K} \frac{(n_{ij} - \hat{y}_{ij})^2}{\hat{y}_{ij}}, \tag{1}$$
where $\hat{y}_{ij} = n_{i+} n_{+j} / n$.
The statistic $X^2$ is asymptotically distributed as a chi-square variate with (K − 1) degrees of freedom. A large value of $X^2$ provides evidence against the null hypothesis. The null hypothesis is often stated as "there is no association between the row and column variables." Depending on the research question, the null hypothesis could be that the distribution of proportions in each row (two populations) is the same, or that the column proportions (K populations) are the same. As usual, the decision to reject (or not to reject) the null hypothesis is based on the p-value. Agresti[1] is an excellent source on chi-square analysis of two-way nominal categorical tables.
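The Pearson statistic defined above can be computed directly from the cell counts. The following is a minimal sketch in plain Python; the function name `pearson_chi_square` and the counts in `example` are hypothetical and serve only to illustrate the arithmetic. It assumes every column total is positive.

```python
def pearson_chi_square(table):
    """Return (X2, df) for a 2 x K table given as two equal-length lists of counts.

    Assumes every column total is positive, so all expected counts are nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # n_i+ * n_+j / n
            x2 += (n_ij - expected) ** 2 / expected
    return x2, len(col_totals) - 1


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = [[20, 30, 50],
           [30, 30, 40]]
x2, df = pearson_chi_square(example)
print(f"X2 = {x2:.3f} on {df} degrees of freedom")
```

The p-value would then be read from the chi-square distribution with `df` degrees of freedom, for instance with a statistical package.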


Likelihood Ratio Chi-Square
The likelihood ratio chi-square statistic is also used to make inferences from a nominal contingency table. It is defined as
$$G^2 = 2\sum_{i=1}^{2}\sum_{j=1}^{K} n_{ij}\log\left(n_{ij}/\hat{y}_{ij}\right), \tag{2}$$
where $\hat{y}_{ij} = n_{i+} n_{+j} / n$.
For large n, $G^2$ also has a chi-square distribution with (K − 1) degrees of freedom. Hence $X^2$ and $G^2$ analyses of a given data set in a 2 × K nominal table will generally yield similar results for large n.

Logistic Regression
When the two rows of a 2 × K table represent a response, logistic regression may be used to analyze the data by modeling the probability of response (e.g., present vs. absent). Let p denote the probability of the response represented by the first row. Define dummy variables $X_2, X_3, \ldots, X_K$ such that $X_j = 1$ if the observation is from the jth category (j = 2, 3, . . ., K), and $X_j = 0$ otherwise. The logistic regression model can be represented as
$$\operatorname{logit}(p) = \hat{\beta}_1 + \hat{\beta}_2 X_2 + \cdots + \hat{\beta}_K X_K, \tag{3}$$
where $\operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right)$.
Note that $\hat{\beta}_1$ is the log odds of responding in row 1 for column 1 (the reference column), that is, when $X_2, X_3, \ldots, X_K$ all equal 0. In other words, $\hat{\beta}_1 = \log(n_{11}/n_{21})$. From the above equation, $\hat{\beta}_j$ is the excess of the log odds of responding in row 1 for the jth column over the log odds for the first column. In other words, $\exp(\hat{\beta}_j)$ represents an odds ratio (the odds of response in the jth column compared to the odds of response in the first column).
[Note: Suppose $p_1$ denotes the probability of lung cancer for a smoker, and $p_2$ denotes the probability of lung cancer for a nonsmoker. Then the odds in favor of lung cancer for a smoker are given by $p_1/(1 - p_1)$, and the odds in favor of lung cancer for a nonsmoker by $p_2/(1 - p_2)$. The ratio $p_1(1 - p_2)/[p_2(1 - p_1)]$ is referred to as the odds ratio, which is often used as a direct or indirect measure of the relative risk of a disease (cancer) with an exposure (smoking) relative to the same disease without the exposure. Sample odds ratios are calculated using the observed proportions instead of probabilities.]
One of the advantages of using logistic regression is that it quantifies the magnitude of association. Furthermore, the effect of any additional variables can be adjusted for in the model. For example, while evaluating the association of low birthweight (yes, no) and race (Blacks, Hispanics, Whites, Others), investigators may want to adjust for the effect of a covariate (e.g., weight of mothers). Because of these and some other desirable properties, and the ease of interpretation of its coefficients, the logistic regression procedure[2,3] is widely used to model binary response data from biomedical and other studies.

Maximal Correlation and Pearson's Chi-Square
Consider Table 1 as a sample from a bivariate distribution of a row variable U and a column variable V. Let U = 1 if an observation is classified into the first row, and U = 0 if it is classified into the second row. Let $V = s_j$ if an observation is classified into the jth category (j = 1, 2, . . ., K), where $s_j$ is a real number. The value $s_j$ taken by the variable V corresponding to the jth column of Table 1 will be referred to as a score hereafter. Let $r^2\{s_1, s_2, \ldots, s_K\}$ denote the square of Pearson's correlation between U and V for a given set of scores $\{s_1, s_2, \ldots, s_K\}$. Let $r_{\max}^2$ denote the maximum of $r^2\{s_1, s_2, \ldots, s_K\}$ over all possible sets of scores. Then it can be shown[4,5] that
$$X^2 = n r_{\max}^2, \tag{4}$$
where $X^2$ is Pearson's chi-square statistic.
It can be shown that the maximum $r_{\max}^2$ is attained by the set of scores $\{n_{11}/n_{+1}, n_{12}/n_{+2}, \ldots, n_{1K}/n_{+K}\}$ or by any set of scores obtained from a linear transformation of these scores.[5] If $r_{\max} = \sqrt{r_{\max}^2}$, then it can be shown that $r_{\max}^2 = (r_{\max})^2 = (r_{\min})^2$, where $r_{\min} = -r_{\max}$. Thus $r_{\max}$ is the maximum possible correlation between the row and column variables, and is always nonnegative.
As mentioned earlier, $X^2$ is simply a significance test and does not provide information on the magnitude of association between the row and column variables. As evident from Eq. 4, a large value of chi-square may result from a large sample size even when the association is weak. However, the maximum possible correlation $r_{\max} = \sqrt{X^2/n}$ can be used as a measure of association, as it meets several criteria outlined by Goodman and Kruskal for a measure of association.[6] As $\sqrt{X^2/n}$ is also equal to Cramér's V and the φ coefficient, the maximal correlation gives a new meaning to these quantities, which are incorporated into the output of some statistical packages. Note that $0 \le r_{\max} \le 1$, and $r_{\max} = 1$ if and only if the observations in each column are contributed by only one of the two rows.[5]

Regression and Pearson's Chi-Square
Consider U and V as the row and column variables of Table 1. Furthermore, define K − 1 dummy variables $X_2, X_3, \ldots, X_K$ corresponding to the second, third, . . ., and Kth category, respectively (these are the same variables defined earlier in the context of logistic regression). Consider the following fitted line from the linear regression of U on $X_2, X_3, \ldots, X_K$:
$$\hat{U} = \hat{\alpha}_1 + \hat{\alpha}_2 X_2 + \hat{\alpha}_3 X_3 + \cdots + \hat{\alpha}_K X_K. \tag{5}$$
If R denotes the multiple correlation coefficient, then it can be shown that[5]
$$nR^2 = X^2. \tag{6}$$
It follows from Eqs. 4 and 6 that $r_{\max}^2 = R^2$. It may be desirable to collapse the columns of a 2 × K table without losing much information.[7] Gautam and Kimeldorf showed that two columns of a 2 × K table can be collapsed into one if their regression coefficients are equal.[5] Similarly, if a regression coefficient equals zero, then the corresponding column can be collapsed into the column representing the intercept. Therefore, one can test the hypothesis that only a given subset of categories is responsible for the association between the variables. A post hoc analysis may also be performed to find the reduced table obtained by collapsing categories. This is analogous to testing a hypothesis about a subset of regression parameters in the multiple linear regression setting.

Exact Tests
Inferences drawn from the analyses of 2 × K tables described above are based on large-sample theory. When the entries $n_{ij}$ of Table 1 are small, the large-sample approximation may be unreliable, and the p-value is instead obtained directly. For example, exact tests such as Fisher's exact test for 2 × 2 tables can be extended to a 2 × K table. An exact test for a 2 × K table is often performed by generating all possible tables, or by randomly generating some large number of tables (e.g., 10,000 tables), with the same marginal totals as the observed table, assuming the null hypothesis is true. The p-value is the proportion of tables yielding values of a statistic (e.g., $X^2$) that are equal to or larger than the value of the same statistic obtained from the observed table.

An Example
Consider Table 2 from Helmes and Fekken.[8] The table is also reproduced in Agresti.[1] The table classifies psychiatric patients by their diagnosis and by whether their treatment prescribed drugs.
The Pearson chi-square statistic from Table 2 is $X^2$ = 84.180 (p < 0.0001, df = 4). This suggests an association between the diagnosis and whether or not a patient's treatment prescribed drugs. Table 2 shows that a schizophrenic patient is most likely to be treated with drugs, followed by a patient diagnosed with active disorder. Patients with neurosis or personality disorder have an almost 50% chance of being prescribed drugs (18 of 37 and 47 of 99, respectively), whereas a patient classified as having special symptoms is not likely to be treated with a drug. The Pearson chi-square test rejects the hypothesis that these proportions are the same in the population. In other words, there seems to be an association between the diagnosis and whether or not the treatment prescribed a drug. The last row of the table shows the odds ratio of being prescribed a drug for a given diagnosis, compared with the odds of being prescribed a drug for a patient diagnosed as schizophrenic.

Table 2 Diagnosis of patients and whether their treatment prescribed drugs

                                                Diagnosis
Treatment     Schizophrenia   Active disorder   Neurosis   Personality disorder   Special symptoms   Total
Drugs         105             12                18         47                     0                  182
No drugs      8               2                 19         52                     13                 94
Total         113             14                37         99                     13                 276
% Drugs       92.92           85.71             48.65      47.47                  0
Odds ratio    1.0             0.46              0.07       0.07                   0

Source: Ref. [8].
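The relationship between the Pearson chi-square and the maximal correlation can be checked numerically on the counts of Table 2. The sketch below is illustrative only: the helper names `pearson_chi_square` and `squared_correlation` are hypothetical, and the computed chi-square should land very close to the value quoted in the text (small differences in the last digits can arise from rounding). It compares n·r² under the maximal scores (the proportion of row-1 observations in each column) with n·r² under equally spaced scores.

```python
from math import sqrt

# Counts from Table 2: first row = drugs, second row = no drugs.
table2 = [[105, 12, 18, 47, 0],
          [8, 2, 19, 52, 13]]


def pearson_chi_square(table):
    """Pearson X2 for a 2 x K table (all column totals assumed positive)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (n_ij - expected) ** 2 / expected
    return x2


def squared_correlation(table, scores):
    """Squared Pearson correlation between U (1 for row 1, 0 for row 2) and V = scores[j]."""
    col_totals = [a + b for a, b in zip(*table)]
    n = sum(col_totals)
    mean_u = sum(table[0]) / n
    mean_v = sum(c * s for c, s in zip(col_totals, scores)) / n
    cov = sum(a * s for a, s in zip(table[0], scores)) / n - mean_u * mean_v
    var_u = mean_u * (1 - mean_u)
    var_v = sum(c * s * s for c, s in zip(col_totals, scores)) / n - mean_v ** 2
    return cov ** 2 / (var_u * var_v)


n = sum(map(sum, table2))
x2 = pearson_chi_square(table2)
r_max = sqrt(x2 / n)

# Maximal scores: proportion of row-1 observations in each column.
max_scores = [a / (a + b) for a, b in zip(*table2)]

print(f"X2 = {x2:.3f}, r_max = {r_max:.4f}")
print(f"n * r^2 with maximal scores = {n * squared_correlation(table2, max_scores):.3f}")
print(f"n * r^2 with scores 1..5    = {n * squared_correlation(table2, [1, 2, 3, 4, 5]):.3f}")
```

With the maximal scores, n·r² reproduces the Pearson statistic; any other set of scores that is not a linear transformation of them gives a smaller value.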


Logistic Regression
Let
X2 = 1 if Active Disorder, 0 otherwise;
X3 = 1 if Neurosis, 0 otherwise;
X4 = 1 if Personality Disorder, 0 otherwise;
X5 = 1 if Special Symptoms, 0 otherwise.

Table 3 shows the results from the logistic regression model presented in Eq. 3.
A quick inspection of Table 3 reveals that the entries in the last column are the odds ratios given in the last row of Table 2, except for the entry corresponding to the constant. The entry corresponding to the constant is simply the odds of being prescribed a drug in the schizophrenia category (105/8 = 13.125). The logistic regression analysis shows that at least one odds ratio is different from one, thus indicating an association between the row and column variables. Some caution must be taken while analyzing data in tables with zero entries. A small number is often added to such entries. As an alternative, exact logistic regression analysis can be performed.
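Because the model in Eq. 3 has one parameter per column, its estimates can be written in closed form from the cell counts. The sketch below, under that assumption, recomputes the coefficients, standard errors, and odds ratios for the schizophrenia (reference), active disorder, neurosis, and personality disorder columns of Table 2, which can be compared with the entries of Table 3. The helper names are hypothetical. The special symptoms column is left out because its zero count makes the log odds undefined, which is presumably why the fitted coefficient for X5 is so large in magnitude and has an enormous standard error.

```python
from math import exp, log, sqrt

# (drugs, no drugs) counts from Table 2; schizophrenia is the reference column.
reference = (105, 8)
others = {
    "X2 (Active disorder)": (12, 2),
    "X3 (Neurosis)": (18, 19),
    "X4 (Personality disorder)": (47, 52),
}


def log_odds(cell):
    yes, no = cell
    return log(yes / no)


def inv_sum(*cells):
    """Sum of reciprocal counts: large-sample variance of a log odds or log odds ratio."""
    return sum(1 / count for cell in cells for count in cell)


constant = log_odds(reference)
constant_se = sqrt(inv_sum(reference))

estimates = {}
for name, cell in others.items():
    beta = log_odds(cell) - constant      # log odds ratio versus the reference column
    se = sqrt(inv_sum(cell, reference))
    estimates[name] = (beta, se, exp(beta))

print(f"Constant: b = {constant:.3f}, SE = {constant_se:.3f}, exp(b) = {exp(constant):.3f}")
for name, (beta, se, odds_ratio) in estimates.items():
    print(f"{name}: b = {beta:.3f}, SE = {se:.3f}, exp(b) = {odds_ratio:.3f}")
```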
Table 3 Results from logistic regression on data in Table 2

Variable    β̂          SE(β̂)         p-value    Exp(β̂)
X2          −0.783     0.847         0.356      0.457
X3          −2.629     0.493         0.000      0.072
X4          −2.676     0.418         0.000      0.069
X5          −23.777    11,147.524    0.998      0.000
Constant    2.575      0.367         0.000      13.125

A large value of chi-square is indicative of an association between diagnosis and whether the treatment prescribed drugs, but it does not indicate whether the statistical significance reflects a strong association or is an artifact of a weaker association combined with a large sample size. As discussed above, the maximal correlation can shed further light on this association. From the relationship between the Pearson chi-square and $r_{\max}$, we have $r_{\max} = \sqrt{X^2/n} = \sqrt{84.180/276} = 0.5523$. This shows that the observed significant association is not solely due to a large sample size. A natural question that may arise is whether some of the categories can be combined without losing much information.[7] An investigator may further be interested in determining whether this association is mostly due to only a few selected categories of the table. Gautam and Kimeldorf[5] used a multiple regression procedure to compute the contribution of a category to the total association, noting that $r_{\max}$ is equal to the multiple correlation coefficient R obtained from the regression of the row variable U (U = 1 if first row and U = 0 if second row) on the dummy variables X2, X3, X4, and X5 defined earlier. They showed that if a 2 × 3 table is obtained by collapsing columns 1 and 2 together, collapsing columns 3 and 4 together, and leaving column 5 as it is, then the maximal correlation from this reduced table is 0.5511 (chi-square = 83.824), which is about 99.8% of the maximal correlation from the original 2 × 5 table.
If the table is reduced to a 2 × 2 table by collapsing the last four columns into one column, then the maximal correlation from this 2 × 2 table is 0.4740 (chi-square = 62.011). The reduction in the correlation is 0.0783, and the corresponding reduction in chi-square is 22.169 (p-value < 0.001). Furthermore, Gautam and Kimeldorf argue that this significance is due to the sample size rather than an indication of a weakening of the association due to the collapsing of the categories.[5] The two columns of this table classify patients into schizophrenia and nonschizophrenia groups, and the 2 × 2 table still retains about 86% of the information provided by the original 2 × 5 table. Hence the association between row and column variables in the original table is basically the association between a diagnosis of schizophrenia (yes or no) and whether the treatment prescribed a drug (yes or no). In terms of odds ratios, the odds of being on prescription drugs for a person diagnosed with schizophrenia are 14.66 times the odds of being on prescription drugs for a person diagnosed with a nonschizophrenia category (95% CI: 6.71–32.04).
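The collapsing step and the odds ratio for the resulting 2 × 2 table can be sketched as follows. The function names are hypothetical, and the Wald interval on the log scale is an assumption about how the quoted confidence interval was obtained; it does, however, reproduce the quoted limits to two decimals.

```python
from math import exp, log, sqrt

table2 = [[105, 12, 18, 47, 0],   # drugs
          [8, 2, 19, 52, 13]]     # no drugs


def collapse(table, groups):
    """Merge columns; `groups` lists the column indices that form each new column."""
    return [[sum(row[j] for j in group) for group in groups] for row in table]


def pearson_chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (n_ij - expected) ** 2 / expected
    return x2


def odds_ratio_with_ci(table, z=1.96):
    """Odds ratio of a 2 x 2 table with a Wald interval computed on the log scale."""
    (a, c), (b, d) = table
    estimate = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return estimate, exp(log(estimate) - z * se), exp(log(estimate) + z * se)


n = sum(map(sum, table2))
schizophrenia_vs_rest = collapse(table2, [[0], [1, 2, 3, 4]])

r_full = sqrt(pearson_chi_square(table2) / n)
r_2x2 = sqrt(pearson_chi_square(schizophrenia_vs_rest) / n)
retained = r_2x2 / r_full

estimate, lower, upper = odds_ratio_with_ci(schizophrenia_vs_rest)
print(f"maximal correlation, 2 x 2 table: {r_2x2:.4f} ({100 * retained:.0f}% of the original)")
print(f"odds ratio = {estimate:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

The same `collapse` helper can be called with `[[0, 1], [2, 3], [4]]` to form the 2 × 3 table discussed above.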

ANALYSIS OF 2 × K ORDERED TABLES
Pearson's chi-square procedure and other tests developed for analyzing data in 2 × K nominal tables do not incorporate the information on ordering among the columns of the table. These tests are not directed toward any specific alternative hypothesis. In analyzing data in a 2 × K ordered table, investigators will want to use as much of the information provided by the data as possible, and will often want to determine whether the null hypothesis can be rejected against a specific alternative hypothesis (e.g., increasing response across the columns). A test that utilizes ordering information will have increased power compared to a test for nominal tables.[1]
Methods for analyzing data in 2 × K ordered tables may be broadly classified into two groups, namely, methods that assign numerical scores to the ordered categories and methods that do not. Methods that assign numerical scores may further be divided into two subgroups. In the first subgroup, scores are chosen a priori and the analysis is carried out using these scores, whereas in the second subgroup, scores are extracted from the data and thus are functions of the observed data.


Methods Without Scores


One of the widely used methods for analyzing data in 2 × K ordered tables with the two rows representing two populations is the Mann–Whitney or, equivalently, the Wilcoxon rank sum procedure.[9] Dykstra et al.[10] proposed a likelihood ratio statistic for testing whether one population with an ordered outcome is larger than the other in the sense of likelihood ratio ordering. The proportional odds model is another method used to analyze 2 × K ordered data without assigning scores to the ordinal categories.[11]
Methods with Scores
Several methods of data analysis for 2 × K ordered tables assign order-preserving scores. The Cochran–Armitage–Mantel trend test is widely used to evaluate the trend in proportions.[12–14] This trend test requires the assignment of order-preserving scores. Another widely used method that utilizes order-preserving scores is logistic regression. The scores are chosen a priori by the investigator, and often, equally spaced scores (e.g., 1, 2, 3, . . ., K) are assigned to the ordered categories. In some situations, the ordered categories are defined by actual intervals or actual quantities (e.g., dose level), in which case the scores may be chosen as the midpoints of the intervals or the numerical values used to define the categories. The trend test is simply a test of significance and does not provide the magnitude of the association. The slope from the logistic regression is a function of the odds ratio [exp(slope) = odds ratio].[2,3] It is worth noting that the test statistics for the trend test and logistic regression are equivalent for a given set of scores. Hence the p-value from the trend test under the null hypothesis of no trend and the p-value for the logistic regression under the null hypothesis of a unit odds ratio are the same. When the two rows represent two independent samples, one can compare the row means for a given set of scores using Student's t-test. Again, the p-value from the t-test will be equal to the p-value from the trend test or the logistic regression. Finally, the test of zero (Pearson) correlation between the row and column variables also yields the same p-value.
Let the row variable be denoted by U and the column variable by V, with the category scores as its values. If r = corr(U, V) is the correlation between U and V for a given set of order-preserving scores, then the test statistic for the trend test and logistic regression is equal to $n r^2$, and the test statistic for the t-test is
$$t = \frac{\sqrt{n-2}\,r}{\sqrt{1-r^2}}$$
(where n is the number of observations). Therefore, if linear regression instead of logistic regression is used, one obtains the same value of the test statistic and the same p-value when testing the null hypothesis of zero slope. Although the research questions being asked and the designs generating a 2 × K ordered table may differ, a common computational procedure may be employed to obtain the statistical significance. Graubard and Korn listed several equivalent test statistics for a given set of scores.[15]
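For a given set of scores, the quantities just described all follow from the correlation r between the row indicator and the column scores. The sketch below is illustrative: the function names and the counts in `hypothetical` are not from the article, the p-value uses the large-sample chi-square distribution with one degree of freedom, and the t statistic uses the textbook form for a test of zero correlation.

```python
from math import erfc, sqrt


def correlation(table, scores):
    """Pearson correlation between U (1 for row 1, 0 for row 2) and the column scores."""
    col_totals = [a + b for a, b in zip(*table)]
    n = sum(col_totals)
    mean_u = sum(table[0]) / n
    mean_v = sum(c * s for c, s in zip(col_totals, scores)) / n
    cov = sum(a * s for a, s in zip(table[0], scores)) / n - mean_u * mean_v
    var_u = mean_u * (1 - mean_u)
    var_v = sum(c * s * s for c, s in zip(col_totals, scores)) / n - mean_v ** 2
    return cov / sqrt(var_u * var_v)


def trend_test(table, scores):
    """Return (n * r^2, large-sample p-value, t statistic) for the given scores."""
    n = sum(map(sum, table))
    r = correlation(table, scores)
    statistic = n * r * r
    p_value = erfc(sqrt(statistic / 2))        # upper tail of chi-square with 1 df
    t = r * sqrt(n - 2) / sqrt(1 - r * r)      # textbook t statistic for zero correlation
    return statistic, p_value, t


# Hypothetical counts for three ordered categories.
hypothetical = [[10, 15, 25],
                [30, 20, 10]]
statistic, p_value, t = trend_test(hypothetical, [1, 2, 3])
print(f"n * r^2 = {statistic:.3f}, p = {p_value:.5f}, t = {t:.3f}")
```

Because a correlation is unchanged by a linear transformation of the scores, `[1, 2, 3]` and `[10, 20, 30]` give the same statistic; only the relative spacing of the scores matters.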
Choice of Scores
As discussed, several methods (e.g., the trend test) of analyzing data in 2 × K ordered tables use order-preserving scores. Even the Mann–Whitney (or Wilcoxon rank sum) test, which is apparently a non-score-based method, is equivalent to a score-based method: the t-test with midranks as category scores is equivalent to the Wilcoxon rank sum test. Cochran noted that any set of scores gives a valid test.[12] However, different sets of scores may yield different results.[15]
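The midrank score of a column is the average rank of the observations in that column when all n observations are ranked by column. A small sketch, with a hypothetical function name and hypothetical counts:

```python
def midrank_scores(table):
    """Midrank score for each column of a 2 x K table."""
    col_totals = [a + b for a, b in zip(*table)]
    scores, below = [], 0
    for c in col_totals:
        # Observations in this column occupy ranks below + 1, ..., below + c.
        scores.append(below + (c + 1) / 2)
        below += c
    return scores


# Hypothetical counts: every column total is 6.
hypothetical = [[4, 3, 1],
                [2, 3, 5]]
print(midrank_scores(hypothetical))  # [3.5, 9.5, 15.5]
```

Midranks depend only on the column totals, so columns with very different totals receive unequally spaced scores even when the categories look evenly spaced.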
Iso-Chi-Square Test
Gautam et al. proposed the Iso-chi-square test, which is an extension of Pearson's chi-square test.[16] This test also addresses the issue of arbitrariness of the scores assigned to the columns of 2 × K ordered tables. It was shown earlier that the Pearson chi-square test statistic is equal to $n r_{\max}^2$, where the maximum is taken over all possible column scores (with no restriction on the scores). The Iso-chi-square statistic is given by the same expression, but the maximum is taken over all possible order-preserving scores. There is a closed-form solution for the maximal scores, given by isotonic regression. However, it is not necessary to extract the maximal scores. The Iso-chi-square is equal to the Pearson chi-square obtained either from the original 2 × K table or from a reduced table obtained by collapsing certain adjacent categories. The null distribution of the Iso-chi-square is therefore a mixture of chi-square distributions with 1 to K − 1 degrees of freedom. Exact p-values can be obtained by generating tables with the given marginal totals. Gautam et al. listed 5% and 1% cut-off values for several values of K.[16]
Because it reports the maximal value of the test statistic, Iso-chi-square avoids an arbitrary assignment of scores to the ordered categories. In cases where there is no clear indication of what scores are to be used, Agresti[1] suggests a sensitivity analysis with a few sets of scores. Iso-chi-square in effect considers all possible sets of order-preserving scores. It is therefore also related to the maximal t-statistic and the maximal trend statistic. Iso-chi-square, which utilizes all possible scores, is equivalent to a method proposed by Dykstra et al.[10] that does not utilize order-preserving scores, and thus links traditional statistical methods with correspondence analysis or dual scaling.[17] It was also pointed out in an earlier section that the Pearson chi-square test statistic is equivalent to the maximal correlation obtained without order restriction on the scores.
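The collapsing of adjacent categories can be sketched with the pool-adjacent-violators algorithm, a standard way of computing an isotonic regression. This is an illustration under stated assumptions rather than the authors' implementation: it handles only the case where the row-1 proportion is expected to be nondecreasing across the columns (swap the two rows for the opposite direction), the function names and counts are hypothetical, and it does not compute the mixture null distribution or the tabulated cut-off values.

```python
def pearson_chi_square(table):
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, n_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            x2 += (n_ij - expected) ** 2 / expected
    return x2


def pool_adjacent_violators(table):
    """Group adjacent columns so the row-1 proportions are nondecreasing across groups."""
    blocks = []  # each block: [row-1 count, column total, column indices]
    for j, (a, b) in enumerate(zip(*table)):
        blocks.append([a, a + b, [j]])
        # Merge while the previous block has a strictly larger proportion than the last one.
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            a2, c2, idx2 = blocks.pop()
            blocks[-1][0] += a2
            blocks[-1][1] += c2
            blocks[-1][2] += idx2
    return blocks


def iso_chi_square_increasing(table):
    """Pearson chi-square of the table collapsed by pooling, plus the column groups used."""
    blocks = pool_adjacent_violators(table)
    collapsed = [[a for a, c, _ in blocks], [c - a for a, c, _ in blocks]]
    return pearson_chi_square(collapsed), [idx for _, _, idx in blocks]


# Hypothetical counts: the row-1 proportions are 0.50, 0.25, 0.71, so columns 0 and 1 are pooled.
hypothetical = [[10, 5, 25],
                [10, 15, 10]]
statistic, groups = iso_chi_square_increasing(hypothetical)
print(f"groups = {groups}, statistic = {statistic:.3f}")
```

When the observed proportions already follow the assumed order, no columns are pooled and the statistic equals the ordinary Pearson chi-square; when they run entirely against it, all columns are pooled and the statistic is zero.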


An Example
Consider Table 4, which classifies maternal drinking and congenital sex organ malformation of babies.[15]

Table 4 Maternal alcohol consumption and congenital sex organ malformation of the children

                 Alcohol consumption (average number of drinks per day)
Malformation     0          <1         1–2      3–5      ≥6
Absent           17,066     14,464     788      126      37
Present          48         38         5        1        1

If the two-sample Wilcoxon rank sum test is used, then the p-value = 0.56, which is also the p-value from the trend test with midranks as category scores. If equally spaced scores {1, 2, 3, 4, 5} are used, then the p-value = 0.20 (from the trend test, t-test, logistic regression, linear regression, and correlation analysis). In an example such as this, the mid-values of the intervals perhaps represent the underlying continuous measure. Graubard and Korn used scores of 0, 0.5, 1.5, 4.0, and 7.0 (somewhat arbitrary), which yield a p-value equal to 0.01.[15] Iso-chi-square analysis for this data set yields a p-value of 0.02. These are exact p-values.

Stochastic Ordering
Stochastic ordering, in the context of a 2 × K ordered table, is defined as the cumulative distribution function (CDF) of one of the rows not crossing the distribution function of the other. In terms of the entries of Table 1,
$$F_{1j} = \sum_{t=1}^{j} n_{1t}/n_{1+} \quad \text{and} \quad F_{2j} = \sum_{t=1}^{j} n_{2t}/n_{2+}.$$
If $F_{2j} \le F_{1j}$ for all j = 1, 2, . . ., K, then row 2 is stochastically larger than row 1 (in the observed sample data). It is interesting to note that row 2 is stochastically larger than row 1 if and only if all order-preserving scores yield a larger (or equal) mean for row 2 than the corresponding mean for row 1. Kimeldorf et al. computed the minimum and maximum t-statistics, and argued that if the minimum t-statistic is positive and significant (or the maximum t-statistic is negative and significant), then any set of order-preserving scores will produce a significant result.[18] This could be used as evidence of stochastic ordering in a given 2 × K table. Berger et al. proposed the convex hull test for ordered categorical data.[19]
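The sample version of this definition can be checked directly from the cumulative proportions of the two rows. The sketch below uses the counts of Table 4; the function names are hypothetical, and the check is descriptive only (it says nothing about statistical significance).

```python
# Counts from Table 4, ordered by increasing alcohol consumption.
absent = [17066, 14464, 788, 126, 37]
present = [48, 38, 5, 1, 1]


def cumulative_proportions(counts):
    """Empirical CDF of a row across the ordered columns."""
    total, running, out = sum(counts), 0, []
    for c in counts:
        running += c
        out.append(running / total)
    return out


def is_stochastically_larger(row_a, row_b):
    """True if the CDF of row_a never exceeds the CDF of row_b (sample data only)."""
    cdf_a = cumulative_proportions(row_a)
    cdf_b = cumulative_proportions(row_b)
    return all(fa <= fb for fa, fb in zip(cdf_a, cdf_b))


print([round(f, 3) for f in cumulative_proportions(absent)])
print([round(f, 3) for f in cumulative_proportions(present)])
print(is_stochastically_larger(present, absent))
```

In these data the cumulative proportions of the "Present" row lie at or below those of the "Absent" row in every column, so in the sample the "Present" row is stochastically larger.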

CONCLUSION
In this article some existing methods of analyzing data in 2 × K (K > 2) contingency tables are discussed. Pearson's chi-square test statistic, which is widely used to analyze nominal data, is shown to be related to the maximal correlation. Using the maximal correlation, an investigator may determine whether only a few categories contribute to the observed association. This relationship between the chi-square and the maximal correlation may also shed light on whether a large value of the chi-square test statistic is due only to a large sample size.
The paper also discusses methods of analysis for 2 × K ordered tables. Some of these methods use order-preserving scores and others do not. Several of the methods that utilize scores are equivalent to each other. As these methods are directed toward a particular alternative hypothesis, they have more power in general than the methods that do not utilize such scores. Also, these methods are computationally simple. However, the scores chosen are often arbitrary. In many situations the columns may provide some indication (e.g., interval, actual dose of a drug, etc.) of which scores it makes sense to use. But in a situation where the columns are defined as "low," "medium," and "high," it may be difficult to come up with a set of scores. In such situations, the Iso-chi-square method may be useful. Iso-chi-square may be considered a natural extension of the Pearson chi-square to the 2 × K ordered table, in the sense that if this procedure is applied to 2 × K nominal tables, the test statistic is Pearson's chi-square test statistic. Also, Iso-chi-square may be considered a link between methods that do and do not utilize order-preserving scores.
All the 2 × K tables discussed here are assumed to have simple ordering. There may be other types of tables where the ordering between two categories is not simple. For example, parental drinking or smoking may be classified as "neither parent," "mother only," "father only," and "both parents." The level of the first or the last category has a distinct hierarchy compared with any other category. However, such a hierarchy between the second and the third categories is not defined. Similarly, some 2 × K tables may have mixed categories (both nominal and ordinal categories) or may have open-ended categories.[20,21] The method of Iso-chi-square may be extended to such situations. Gautam presented a case where the choice of an open-ended category may influence the statistical conclusion in the context of the trend test and argued that such an analysis may be misleading.[22]


ACKNOWLEDGMENTS


The author would like to express his sincere thanks to Roger Davis, ScD, for his valuable comments. This research was supported in part by grant RR 01032 to the Beth Israel Deaconess Medical Center General Clinical Research Center from the National Institutes of Health.

REFERENCES
1. Agresti, A. Categorical Data Analysis, 2nd Ed.; Wiley: New York, 2002.
2. Hosmer, D.W.; Lemeshow, S. Applied Logistic Regression, 2nd Ed.; Wiley: New York, 2000.
3. Collett, D. Modeling Binary Data; Chapman & Hall: London, 1991.
4. Haberman, S.J. Test for independence in two-way contingency tables based on canonical correlation and linear-by-linear interaction. Ann. Stat. 1981, 9, 1178–1186.
5. Gautam, S.; Kimeldorf, G. Some results on the maximal correlation in 2 × K contingency tables. Am. Stat. 1999, 53 (4), 336–341.
6. Goodman, L.A.; Kruskal, W.H. Measures of association for cross-classifications. J. Am. Stat. Assoc. 1954, 49, 732–764.
7. Bishop, Y.M.M.; Fienberg, S.E.; Holland, P.W. Discrete Multivariate Analysis: Theory and Practice; The MIT Press: Cambridge, 1995.
8. Helmes, E.; Fekken, G.C. Effects of psychotropic drugs and psychiatric illness on vocational aptitude and interest assessment. J. Clin. Psychol. 1986, 42, 569–576.
9. Conover, W.J. Practical Nonparametric Statistics, 2nd Ed.; Wiley: New York, 1980.
10. Dykstra, R.; Kocher, S.; Robertson, T. Inference for likelihood ratio ordering in the two sample problem. J. Am. Stat. Assoc. 1995, 90, 1034–1040.
11. McCullagh, P. Regression models for ordinal data. J. R. Stat. Soc., Ser. B 1980, 42, 109–142.
12. Cochran, W.G. Some methods of strengthening the common chi-square test. Biometrics 1954, 10, 417–451.
13. Armitage, P. Tests for linear trend in proportion and frequency. Biometrics 1955, 11, 375–386.
14. Mantel, N. Chi-square test with one degree of freedom: Extension of the Mantel–Haenszel procedure. J. Am. Stat. Assoc. 1963, 58, 690–700.
15. Graubard, B.I.; Korn, E.L. Choice of column scores for testing independence in ordered 2 × K contingency tables. Biometrics 1987, 43, 471–476.
16. Gautam, S.; Singh, H.; Sampson, A. Iso-chi-square testing in 2 × K ordered tables. Can. J. Stat. 2002, 29, 609–629.
17. Nishisato, S. Analysis of Categorical Data: Dual Scaling and Its Applications; University of Toronto Press: Toronto, 1980.
18. Kimeldorf, G.; Sampson, A.; Whitaker, L. Min max scoring for two sample ordinal data. J. Am. Stat. Assoc. 1992, 87, 241–247.
19. Berger, V.W.; Permutt, T.; Ivanova, A. The convex hull test for ordered categorical data. Biometrics 1998, 54, 1541–1550.
20. Gautam, S. Test for linear trend in 2 × K ordered tables with open-ended categories. Biometrics 1997, 53, 1163–1169.
21. Gautam, S. Analysis of mixed categorical data in 2 × K contingency tables. Stat. Med. 2002, 21, 1471–1484.
22. Gautam, S.; Ashikaga, T. Assessing the effect of open-ended category on the trend in 2 × K ordered tables. J. Data Sci. 2003, 1, 167–183.
