To cite this article: Carl F. Falk & Victoria Savalei (2011) The Relationship Between
Unstandardized and Standardized Alpha, True Reliability, and the Underlying Measurement Model,
Journal of Personality Assessment, 93:5, 445-453, DOI: 10.1080/00223891.2011.594129
Download by: [Australian Catholic University] Date: 08 October 2017, At: 03:09
Journal of Personality Assessment, 93(5), 445–453, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 0022-3891 print / 1532-7752 online
DOI: 10.1080/00223891.2011.594129
Popular computer programs print 2 versions of Cronbach's alpha: unstandardized alpha, α, based on the covariance matrix, and standardized alpha, α_R, based on the correlation matrix. Sources that accurately describe the theoretical distinction between the 2 coefficients are lacking, which can lead to the misconception that the differences between α_R and α are unimportant and to the temptation to report the larger coefficient. We explore the relationship between α_R and α and the reliability of the standardized and unstandardized composite under 3 popular measurement models; we clarify the theoretical meaning of each coefficient and conclude that researchers should choose an appropriate reliability coefficient based on theoretical considerations. We also illustrate that α_R and α estimate the reliability of different composite scores, and in most cases cannot be substituted for one another.
Cronbach's alpha (Cronbach, 1951) is by far the most frequently reported reliability coefficient. Decades of research report on the various properties and uses of coefficient alpha (for reviews see Brennan, 2001; Cronbach & Shavelson, 2004; Li, Rosenthal, & Rubin, 1996; Rodriguez & Maeda, 2006), and a plethora of articles and books have advised researchers how to compute and interpret alpha (e.g., Cortina, 1993; Henson, 2001; Lord & Novick, 1968; John & Soto, 2007; McDonald, 1999; Novick & Lewis, 1967; Nunnally & Bernstein, 1994; Onwuegbuzie & Daniel, 2002; Romano, Kromrey, & Hibbard, 2010; Schmitt, 1996; Streiner, 2003a, 2003b; Traub & Rowley, 1991). Numerous articles have also explored the relationship between coefficient alpha and various other measures of reliability (e.g., Callender & Osburn, 1979; Cronbach & Azuma, 1962; Cudeck, 1980; Jackson, 1979; Osburn, 2000; Raju, 1977; Raykov, 1997a; Zinbarg, Revelle, Yovel, & Li, 2005).

Despite the popularity of coefficient alpha, its continued use by researchers is also somewhat controversial. Some psychometricians argue against the use of coefficient alpha in favor of alternative indexes of reliability (e.g., Green & Yang, 2009; Revelle & Zinbarg, 2009; Sijtsma, 2009a, 2009b). For example, under certain measurement models, coefficient alpha tends to be an underestimate of reliability (Guttman, 1945; Lord & Novick, 1968; Raykov, 1997b; Zimmerman, Zumbo, & Lalonde, 1993), although in many cases this underestimation might not be serious (e.g., Raykov, 1997b). If researchers use an underestimate of a measure's reliability to disattenuate correlations, this could result in inflated estimates of the resulting correlations (Schmitt, 1996). Under other conditions, coefficient alpha might actually be an overestimate of reliability (Raykov, 2001; Zimmerman et al., 1993; cf. Zinbarg et al., 2005).

However, we believe that coefficient alpha is useful when its assumptions are understood and met, and many practitioners continue to rely on it as an index of reliability. Under certain measurement model assumptions, alpha has an intuitive interpretation that is easy to understand: it is the average of all possible split-half reliability estimates (using Rulon's method; Cronbach, 1951; Novick & Lewis, 1967). Thus, alpha was initially popularized perhaps because it removed the indeterminacy of deciding how to split a test in half, which was the most popular method for determining reliability from a single test administration before Cronbach's (1951) paper. In addition, it turns out that alpha is an accurate estimate of reliability under measurement models typically assumed by popular split-half reliability estimation methods.

Researchers who wish to report coefficient alpha are faced with the choice between two varieties of this coefficient: unstandardized alpha, α, based on the covariance matrix of the items, and standardized alpha, α_R, based on the correlation matrix of the items. Knowing the difference between these coefficients is particularly important because popular computer software such as SPSS, SAS, and other programs readily print both coefficients. However, the relationship between these coefficients is not well understood and researchers sometimes report the wrong coefficient given the analyses they perform (e.g., Davis, 1980; Fresko, Kfir, & Nasser, 1997; Stone & Yoder, 2001). The decision of which version of alpha to use depends on whether researchers decide to standardize a test's items before adding the items to form a composite score. This decision to standardize or leave items in raw form should be based on substantive considerations, and this article focuses primarily on the choice of an appropriate reliability index after such a decision has been made.

Received March 26, 2010; Revised October 30, 2010.
Address correspondence to Carl F. Falk, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada; Email: cffalk@psych.ubc.ca

Although unstandardized alpha is by far the most researched coefficient, only a small body of research has explored
standardized alpha (e.g., Hayashi & Kamata, 2005; Sun et al., 2007). Not only is the relationship between α_R and α understudied, but confusion exists as to whether or not the two coefficients are estimates of the same lower bound to reliability, and, even more fundamentally, whether or not the reliability of a scale changes when its items are standardized. For instance, one article incorrectly claims that the true reliability of a measure is the same for both standardized and unstandardized observed scores (Osburn, 2000, p. 348; cited 68 times according to Google Scholar, 2010). The relationship between α_R and α is also not well understood. For example, one frequently cited expository article on coefficient alpha incorrectly states that standardized alpha is always greater than or equal to unstandardized alpha (Cortina, 1993; cited 1,186 times according to Google Scholar, 2010). We provide an example where this is not true. It is essential that researchers have access to accurate information regarding the proper interpretation of reliability coefficients such as α_R and α, and the relationship between them. This will enable researchers to make informed decisions as to whether either type of alpha is an appropriate reliability estimate for their data. The main goal of this article is to clarify the interpretation and the properties of α_R and α, and to show that in many cases these coefficients cannot be substitutes for each other. We will show that the proper interpretation of each coefficient depends on the measurement model assumed for the data. Finally, we will state recommendations for the use of each coefficient.

DEFINITION OF UNSTANDARDIZED AND STANDARDIZED ALPHA

Unstandardized alpha is simply alpha computed on the covariance matrix of items, whereas standardized alpha is alpha computed on the correlation matrix of items. The unstandardized alpha coefficient, based on a covariance matrix, Σ, with k items, is given by:

    α = [k/(k − 1)] (1 − Σ_{i=1}^k σ_i² / σ_Y²) = k² σ̄_ij / σ_Y²,    (1)

where σ_Y² is the variance of the composite made up by adding raw item scores, Y = X_1 + X_2 + ... + X_k, σ_i² is the variance of the ith item, and σ̄_ij is the average covariance of the items.¹

¹The variance of the composite can be obtained by either computing the composite and taking its variance, or by summing up all the elements in the covariance matrix Σ.

A correlation matrix, R, is obtained from the covariance matrix by dividing each element by the product of the corresponding variables' standard deviations (which is equivalent to standardizing each item and computing the resulting standardized items' covariance matrix).² The standardized alpha coefficient, based on the correlation matrix, is given by:

    α_R = [k/(k − 1)] (1 − k / (Σ_{i=1}^k Σ_{j≠i} r_ij + k)) = k² r̄_ij / σ_{Y_Z}²,    (2)

where r_ij corresponds to the correlation between the ith and jth item, r̄_ij is the average correlation, and σ_{Y_Z}² is the variance of the composite of standardized items. The leftmost expressions in Equations 1 and 2 are computational formulas and are most familiar to applied researchers, whereas the rightmost expressions provide simpler representations that will be more intuitive for understanding how α_R and α relate to the true reliability of standardized and unstandardized composites. Equations 1 and 2 were defined on population matrices. The corresponding sample definitions can be obtained by using sample covariance and correlation matrices. Throughout most of this article, we do not distinguish between sample and population coefficients, but we briefly address the effect of sampling variability where appropriate.

²Throughout the article, we assume that the test's items are measured continuously or at least using a sufficient number of categories so that the true score model holds approximately at the item level (five to seven categories is usually deemed enough). For the case of binary items or items with few response options, although one can compute a reliability coefficient from the tetrachoric or polychoric correlation matrix, this coefficient estimates the reliability of a composite made of the underlying continuous responses to the items, not of the observed categorical items. Thus, this is not the estimate of interest. To model the observed categorical items appropriately requires switching to the item response theory framework, where the concept of reliability is replaced with the concept of information.

ALPHA, TRUE RELIABILITY, AND THE UNDERLYING MEASUREMENT MODEL

Given that coefficients α and α_R can be quite different when computed on a particular sample, researchers might sometimes be faced with the tempting choice to report the larger of the two coefficients. Unfortunately, methodological articles sometimes inadvertently encourage this practice by implying that both coefficients estimate the same population coefficient (i.e., Osburn, 2000). However, except under the most restrictive measurement model, the population values of α and α_R are not the same. Additionally, the true reliabilities of measurement instruments based on unstandardized and standardized scores are also not necessarily the same. The choice between α_R and α should depend on whether standardized or unstandardized scores are summed to form the scale. Moreover, whether the corresponding alpha coefficient is an accurate measure of reliability for the chosen scale will depend on the hypothesized measurement model for the items (e.g., McDonald, 1999; Novick & Lewis, 1967). Researchers who wish to interpret alpha as an accurate estimate of reliability are implicitly assuming a measurement model for their items. To further complicate things, this measurement model might vary for standardized and unstandardized item scores. It is possible, for example, that α is an accurate estimate of the reliability of the scale composed of unstandardized items, whereas α_R is only a lower bound on the reliability of the corresponding scale composed of standardized items. We now elaborate on these points by considering three popular measurement models from classical test theory: parallel tests, tau-equivalent tests, and congeneric tests (for an introduction to classical test theory, see Crocker & Algina, 1986; Lord & Novick, 1968; Nunnally & Bernstein, 1994).

Consider a composite made up of original item scores, Y = X_1 + X_2 + ... + X_k, which we refer to as the unstandardized composite. Under classical test theory, the composite's total score can be decomposed as Y = T_Y + E_Y, where T_Y is the true score, also described as the underlying construct or factor being measured, and E_Y is the error score. Here, error is conceptualized as unsystematic and unrelated to the construct being measured. (For a deeper discussion of the concept of true and error scores and their interpretation, see, e.g., Bentler, 2009; Novick, 1966; Steyer, 1989; Zinbarg et al., 2005.)
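Equations 1 and 2 are straightforward to compute. The following plain-Python sketch implements both coefficients from a covariance matrix (the function names are ours, not from the article):

```python
def alpha_unstd(cov):
    # Equation 1: alpha = [k/(k-1)] * (1 - sum of item variances / composite variance),
    # where the composite variance is the sum of all elements of the covariance matrix.
    k = len(cov)
    var_composite = sum(sum(row) for row in cov)
    var_items = sum(cov[i][i] for i in range(k))
    return k / (k - 1) * (1 - var_items / var_composite)

def to_corr(cov):
    # Rescale a covariance matrix into a correlation matrix by dividing each
    # element by the product of the corresponding standard deviations.
    k = len(cov)
    sd = [cov[i][i] ** 0.5 for i in range(k)]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]

def alpha_std(cov):
    # Equation 2: standardized alpha is simply alpha computed on the correlation matrix.
    return alpha_unstd(to_corr(cov))
```

Applied to a parallel covariance matrix (equal variances and equal covariances), the two functions return the same value, consistent with the article's first numerical example.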
TABLE 1. A comparison of the assumptions and features of three popular classical test theory measurement models.

Model          | True Scores                                          | Error Scores                 | Identifying Features                                                          | True Reliability
---------------|------------------------------------------------------|------------------------------|-------------------------------------------------------------------------------|--------------------------
Parallel       | T_i = T; var(T_i) = σ_T² for all i                   | var(E_i) = σ_E² for all i    | All item variances equal; all item covariances equal; all correlations equal  | ρ = k² σ_T² / σ_Y²
Tau-equivalent | T_i = T; var(T_i) = σ_T² for all i                   | σ_E1² ≠ σ_E2² ≠ ... ≠ σ_Ek²  | Unequal item variances; all item covariances equal                             | ρ = k² σ_T² / σ_Y²
Congeneric     | T_i = b_i T; b_1² σ_T² ≠ b_2² σ_T² ≠ ... ≠ b_k² σ_T² | σ_E1² ≠ σ_E2² ≠ ... ≠ σ_Ek²  | Unequal item variances; unequal item covariances                               | ρ = (Σ_{i=1}^k b_i)² σ_T² / σ_Y²

Note. Each item is assumed to follow the model X_i = T_i + E_i. Assumptions regarding the true and error scores distinguish the three measurement models.
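The three rows of Table 1 can all be generated from a single construction, Σ = b bᵀ σ_T² + diag(error variances), by constraining the loadings b and the error variances. A plain-Python sketch (the helper name is ours; the parallel and tau-equivalent values match the article's later numerical examples, while the congeneric loadings are an arbitrary illustration):

```python
def make_cov(loadings, var_t, error_vars):
    # Congeneric structure: Sigma_ij = b_i * b_j * var_T, plus error variance on the diagonal.
    # Parallel: all loadings equal AND all error variances equal.
    # Tau-equivalent: all loadings equal, error variances free to differ.
    k = len(loadings)
    return [[loadings[i] * loadings[j] * var_t + (error_vars[i] if i == j else 0.0)
             for j in range(k)]
            for i in range(k)]

parallel = make_cov([1, 1, 1, 1], 2.0, [3, 3, 3, 3])    # equal variances (5), equal covariances (2)
tau_eq = make_cov([1, 1, 1, 1], 2.0, [9, 2, 1, 6])      # unequal variances, equal covariances
congeneric = make_cov([1.0, 0.8, 0.5, 0.3], 2.0, [3, 3, 3, 3])  # unequal covariances as well
```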
The reliability of this composite is defined as the ratio of true [...]

[...] Y_Z = T_{Y_Z} + E_{Y_Z}. The standardized composite's true score and error score are the sum of the standardized items' true scores, T_{Y_Z} = Σ_{i=1}^k T_{iZ}, and error scores, E_{Y_Z} = Σ_{i=1}^k E_{iZ}. The standardized composite's reliability, ρ_R, is also the ratio of its true [...]

Parallel Items

[...] var(E_i) = σ_E² for all i.⁵ That is, each item is measuring the same underlying true score to the same extent, and error variances are equal across the items. The covariance matrix for such a model will have equal variances, σ_X² = σ_T² + σ_E², and equal covariances (see Table 1 for a summary of the assumptions and identifying features of each measurement model). In a finite sample, item variances will only be approximately equal and item covariances will only be approximately equal. Statistical tests are available for checking the assumptions of a parallel tests model (e.g., Reuterberg & Gustafsson, 1992).

⁵All three measurement models discussed in this article allow for different intercepts among the items; however, item intercepts are omitted for simplicity and because they do not affect any of the reliability computations.

The covariance matrix for parallel items can be decomposed into reliable and error variance components as follows:⁶

    Σ = [σ_T² + σ_E² on the diagonal; σ_T² off the diagonal]
      = Σ_T + Σ_E,    (5)

where every element of Σ_T equals σ_T² and Σ_E = diag(σ_E², ..., σ_E²).

⁶The form of this matrix is obtained by applying the rules of covariance algebra to find the variances of individual items and simplify the covariances between X_i = T + E_i and X_j = T + E_j (e.g., Bollen, 1989).

The total reliable variance of the composite is then the sum of all elements in the first term of this decomposition: σ_{T_Y}² = k² σ_T². The total variance of the composite is the sum of all the elements in both the reliable and the error terms of this decomposition: σ_Y² = k² σ_T² + k σ_E². The reliability of the test is then the ratio of the true variance to the total variance, or

    ρ = k² σ_T² / (k² σ_T² + k σ_E²) = k² σ_T² / σ_Y².

Compare the rightmost term to the rightmost term of Equation 1. Because all item covariances are equal, the average item covariance will be equal to σ_T². Therefore, in the case of parallel tests, unstandardized alpha is equal to the actual reliability of the composite, α = ρ. This relationship holds in the population. This means that for a finite sample, α will be a consistent estimate of the true reliability of the composite.

When items from the parallel model are standardized, the standardized items' covariances can be obtained from the original covariances by dividing each element of Σ by the product of the corresponding standard deviations. The resulting covariance matrix of the standardized items is just the correlation matrix of the raw data. Because the original items are parallel and have equal variances and equal covariances, all the off-diagonal elements (i.e., interitem correlations) will be equal to one another, and its diagonal elements will all be 1. Thus, the standardized items will also follow a parallel tests model. Moreover, because all items are rescaled by the same amount, standardization does not change the ratio of true score to error variance, and hence the reliability of the unstandardized composite is the same as that of the standardized composite, ρ = ρ_R, and unstandardized alpha is equal to standardized alpha, α = α_R. Because under a parallel tests model alpha is equal to reliability, all four of these coefficients are equal in the population (see Table 2 for a comparison of all four coefficients under each measurement model). This means that if items are parallel, either alpha can be used to estimate reliability in the sample, and the actual scale reliability remains the same. Although sample estimates might differ from one another, they will estimate the same population quantity.

Tau-Equivalent Items

The tau-equivalent measurement model still assumes that any item score can be decomposed into its true and error part: X_i = T + E_i, where each item is measuring the same true score and to the same extent. However, the error variances are no longer assumed to be equal for each item; that is, σ_E1² ≠ σ_E2² ≠ ... ≠ σ_Ek². This measurement model results in a covariance matrix where item covariances are equal, but item variances are not equal (see Table 1). The covariance matrix can be decomposed as follows:

    Σ = [σ_T² + σ_Ei² on the diagonal; σ_T² off the diagonal]
      = Σ_T + Σ_E,    (6)

where every element of Σ_T equals σ_T² and Σ_E = diag(σ_E1², ..., σ_Ek²).

In this case, the amount of reliable variance in the composite is the same as that for parallel items, σ_{T_Y}² = k² σ_T², as all true score variances are still equal to one another. Although error variances are no longer equal, the reliability of the composite is still the ratio of true score variance to total variance: ρ = k² σ_T² / σ_Y², because the total variance incorporates the different item error variances (see Footnote 1). Because all item covariances are equal, the average item covariance used in Equation 1 will again be equal to σ_T² and it is still the case that α = ρ. In finite samples, α will be an accurate estimate of reliability of the unstandardized composite.

Because item variances under the tau-equivalent model are not equal to one another, when tau-equivalent items are standardized each item will be divided by a different standard deviation. The resulting correlation matrix will consist of off-diagonal elements that are no longer equal to one another, which means that the standardized items no longer follow the tau-equivalent model. The reliability of the unstandardized composite will not equal the reliability of the standardized composite because the true and error variances have changed in a nontrivial way after standardization. Additionally, α_R is no longer an accurate estimate of the standardized composite's reliability, α_R ≠ ρ_R, nor is it equal to unstandardized alpha, α ≠ α_R. Actually, the model obtained after standardizing tau-equivalent items is a congeneric measurement model, which we discuss next.

Congeneric Items

Under the congeneric measurement model, items are assumed to have the following decomposition: X_i = b_i T + E_i. That is, items no longer measure the underlying true score to the same extent, but some items are more highly correlated with the true score than others. In addition, as in the tau-equivalent model, error variances are also not assumed equal.
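The point that standardizing tau-equivalent items produces unequal correlations, and hence a congeneric pattern, can be checked directly. A plain-Python sketch using the tau-equivalent covariance matrix from the article's second numerical example:

```python
# Tau-equivalent: all covariances equal (2), item variances unequal (11, 4, 3, 8).
cov = [[11, 2, 2, 2],
       [2, 4, 2, 2],
       [2, 2, 3, 2],
       [2, 2, 2, 8]]
sd = [cov[i][i] ** 0.5 for i in range(4)]
corr = [[cov[i][j] / (sd[i] * sd[j]) for j in range(4)] for i in range(4)]

# The six off-diagonal correlations are no longer equal: the standardized
# items follow a congeneric model, not a tau-equivalent one.
off_diag = [round(corr[i][j], 2) for i in range(4) for j in range(i + 1, 4)]
```

The resulting values (.30, .35, .21, .58, .35, .41) are the correlations that reappear in the article's standardized numerical example.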
TABLE 2. Three popular classical test theory measurement models, the corresponding standardized item measurement model if the items are standardized, and the relationship between unstandardized alpha, standardized alpha, and the true reliability of each composite.

Original Model | Standardized Model           | Relationships
---------------|------------------------------|--------------------------------------
Parallel       | Parallel                     | α = ρ; α_R = ρ_R; ρ = ρ_R; α = α_R
Tau-equivalent | Congeneric                   | α = ρ; α_R ≠ ρ_R; ρ ≠ ρ_R; α ≠ α_R
Congeneric     | A different congeneric model | α ≠ ρ; α_R ≠ ρ_R; ρ ≠ ρ_R; α ≠ α_R

Note. The preceding relationships hold in the population. In any given finite sample, these relationships will hold approximately.
A congeneric model is basically a one-factor model, and the coefficients b_i are also called factor loadings. The covariance matrix of such items can be identified by noting that item variances and covariances are unequal to each other (see Table 1). The covariance matrix of such items can be partitioned into reliable variance and error variance as follows:

    Σ = [b_i² σ_T² + σ_Ei² on the diagonal; b_i b_j σ_T² off the diagonal]
      = Σ_T + Σ_E,    (7)

where the (i, j)th element of Σ_T is b_i b_j σ_T² and Σ_E = diag(σ_E1², ..., σ_Ek²).

The total reliable variance in the composite is the sum of everything in the first term of this decomposition, which simplifies to σ_{T_Y}² = (Σ_{i=1}^k b_i)² σ_T². The reliability of the composite is, by definition, the ratio of the reliable variance in the composite to the total variance of the composite: ρ = (Σ_{i=1}^k b_i)² σ_T² / σ_Y². Coefficient alpha computed on the congeneric population covariance matrix will produce α = k² avg(b_i b_j) σ_T² / σ_Y², where avg(b_i b_j) is the average of the products b_i b_j across item pairs. This follows from Equation 1 by noting that avg(b_i b_j) σ_T² is the average item covariance. Because the numerators for the reliability of the unstandardized composite and unstandardized alpha are different, α is in general not equal to ρ. Actually, it can be shown that α is smaller than the true reliability, ρ, because k² avg(b_i b_j) ≤ (Σ_{i=1}^k b_i)² (Guttman, 1945). [...] α_R ≤ ρ_R. Finally, because all four coefficients (α, α_R, ρ, ρ_R) are different (see Table 2), nothing can be said about the relative size of α and ρ_R. Also, we do not know how α_R compares to ρ. Thus, if one uses a sample estimate of α_R to accompany an unstandardized composite, one cannot make any meaningful statements with regard to the accuracy of this coefficient as an estimate of the unstandardized composite's true reliability.

NUMERICAL EXAMPLES

We now give several numerical examples that illustrate the three measurement models and the coefficients α, α_R, ρ, and ρ_R. First, consider the following population covariance matrix and its decomposition into true and error variance components:

    Σ =  5  2  2  2        Σ_T =  2  2  2  2        Σ_E =  3  0  0  0
         2  5  2  2               2  2  2  2               0  3  0  0
         2  2  5  2               2  2  2  2               0  0  3  0
         2  2  2  5               2  2  2  2               0  0  0  3

    Σ = Σ_T + Σ_E

Because all item covariances are equal and all item variances are equal, the items follow a parallel tests model. This is an example of the theoretical decomposition in Equation 5. For these data, the average covariance is 2, the total variance of the composite (sum of every element in Σ) is 44, and α = 4² · 2 / 44 = .73, using Equation 1. The true reliability of the [...]
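The arithmetic for this first example can be replicated in a few lines (a plain-Python sketch; the variable names are ours):

```python
cov = [[5, 2, 2, 2],
       [2, 5, 2, 2],
       [2, 2, 5, 2],
       [2, 2, 2, 5]]
k = 4
total = sum(sum(row) for row in cov)       # composite variance: sum of all elements = 44
diag = sum(cov[i][i] for i in range(k))    # sum of item variances = 20
avg_cov = (total - diag) / (k * (k - 1))   # average covariance = 2
alpha = k ** 2 * avg_cov / total           # Equation 1, rightmost form: 32/44
rho = k ** 2 * 2.0 / total                 # true reliability: sum of Sigma_T (= 32) over 44
# Under the parallel model, alpha equals the true reliability (about .73).
```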
Using the same computations as just shown but applied to these matrices, we obtain ρ_R = α_R = .73. All four coefficients are equal to one another and either alpha would be an accurate measure of the true reliability of the unstandardized or standardized composites.

As the next example, consider the following covariance matrix and its decomposition:

    Σ =  11  2  2  2       Σ_T =  2  2  2  2        Σ_E =  9  0  0  0
          2  4  2  2              2  2  2  2               0  2  0  0
          2  2  3  2              2  2  2  2               0  0  1  0
          2  2  2  8              2  2  2  2               0  0  0  6

    Σ = Σ_T + Σ_E

The items follow a tau-equivalent model and Σ is an example of the theoretical decomposition in Equation 6, where all item covariances are equal but item variances are not equal. For these data, the average covariance is 2, the total variance of the composite (sum of every element in Σ) is 50, and α = 4² · 2 / 50 = .64. The sum of all the elements in Σ_T is 2 · 16 = 32, and ρ = 32/50 = .64. Again, α = ρ, meaning that the equation for alpha gives the correct reliability computation for tau-equivalent items.

The situation changes when the tau-equivalent data just shown are standardized:

    R =  1    .30  .35  .21      R_T =  .18  .30  .35  .21      R_E =  .82  0    0    0
         .30  1    .58  .35             .30  .50  .58  .35             0    .50  0    0
         .35  .58  1    .41             .35  .58  .67  .41             0    0    .33  0
         .21  .35  .41  1               .21  .35  .41  .25             0    0    0    .75

    R = R_T + R_E

The resulting standardized items now follow a congeneric model. The covariances (i.e., item correlations) are no longer equal. The decomposition no longer resembles an example of Equation 6, but rather of Equation 7. Identifying a congeneric model and arriving at the decomposition is not trivial, and requires factor analysis. For our purposes, we assume that the decomposition of components is known. The average covariance is now .37, and the sum of everything in R is 8.4, so that α_R = 4²(.37)/8.4 = .70. The actual reliability, however, is ρ_R = 6/8.4 = .71, where the numerator is the sum of all the elements in the reliable component of the decomposition. Therefore, α_R is a slight underestimate of the standardized composite's reliability. Note also because α_R > α, a researcher might be tempted to report α_R. This is particularly true given that α (.64) is somewhat low, but α_R (.70) meets a common cutoff criterion (e.g., Nunnally, 1978). However, α_R is not an accurate measure of the unstandardized composite's reliability.

Finally, our last example concerns the following covariance matrix:

    Σ =  17.52  15.77    .41    .41
         15.77  17.52    .41    .41
           .41    .41    .98    .01
           .41    .41    .01    .98

    Σ = Σ_T + Σ_E, where

    Σ_T =  15.77  15.77    .41    .41      Σ_E =  1.75  0     0    0
           15.77  15.77    .41    .41             0     1.75  0    0
             .41    .41    .01    .01             0     0     .97  0
             .41    .41    .01    .01             0     0     0    .97

This matrix follows a congeneric model and is an example of the theoretical decomposition in Equation 7. Here, α = 4²(2.9)/71.84 = .65, which is a rather substantial underestimate of the actual reliability, ρ = 66.4/71.84 = .92. The corresponding correlation matrix and its decomposition are given by:

    R =  1    .9   .1   .1       R_T =  .9   .9   .1   .1       R_E =  .1  0   0    0
         .9   1    .1   .1              .9   .9   .1   .1              0   .1  0    0
         .1   .1   1    .01             .1   .1   .01  .01             0   0   .99  0
         .1   .1   .01  1               .1   .1   .01  .01             0   0   0    .99

    R = R_T + R_E

For this decomposition, α_R = 4²(.22)/6.62 = .53, and the actual reliability of the standardized composite is ρ_R = 4.44/6.62 = .67. This example illustrates that the reliabilities of the unstandardized and standardized composites can be quite different (.92 vs. .67), unlike the claim of Osburn (2000) that they are always the same. Also, both α and α_R are serious underestimates of the actual corresponding reliabilities: a difference of .27 for the unstandardized composite and .14 for the standardized composite. Such large underestimates are more likely to occur when item correlations or covariances are quite different from one another. Of course, Items 3 and 4 do not seem to belong on the scale, and Items 1 and 2 seem to be duplicating content. However, such a situation could occur in practice. For example, Items 3 and 4 might have simply not yielded enough variation to correlate highly with the other items in the matrix or variable scaling might have been different. Researchers should always examine item-total correlations as well as the actual correlation matrix of the items to detect problems with the scale, even if alpha is acceptable. Correlations that are extremely unequal might lead the researchers to consider other estimates of reliability or dropping low-correlating items.
Finally, this example shows that α can be greater than α_R, contradicting Cortina (1993). Although it is tempting to think that this is because the data's correlation matrix is so extreme, the Appendix proves that a set of standard deviations leading to α > α_R can be found for any correlation matrix.

DISCUSSION AND RECOMMENDATIONS

The process by which researchers decide on an appropriate estimate of internal consistency reliability is twofold. First, researchers should decide whether to estimate internal consistency from the covariance matrix or correlation matrix on the basis of how the composite score is calculated. In general, if researchers intend to sum raw scores, the covariance matrix should be used to determine internal consistency. If researchers intend to sum standardized scores, the correlation matrix is more appropriate for determining internal consistency. Second, researchers should decide whether alpha is an appropriate estimate of reliability for that composite, based on the underlying measurement model that is either assumed or is shown to approximately hold for the data. Neither of these decisions is trivial. Although this article primarily focuses on the latter issue, which assumes that researchers have selected a composite that is of interest, we now briefly discuss the first issue.

There are several reasons that researchers might wish to standardize items before combining them. In general, standardizing items will give equal weight to each item in constructing the composite. If items are left in raw form, the items with higher variance will have higher weight in determining the distribution of scores across individuals. Thus, standardizing items before creating a composite can be useful in cases where the items use different metrics or for some other reason where equal weighting of items is desired. For instance, equal weighting can often be superior to alternative methods of combining scores such as the use of regression-based weights (e.g., Dana & Dawes, 2004; Dawes, 1979).

In some instances, equal variances are already presumed by the measurement instrument. For instance, personality researchers often use self-report Likert items of the same format (Paulhus & Vazire, 2007). Thus, items are theoretically already on the same metric, and in this situation, leaving the scores in raw form might make the most sense. Researchers might also wish to leave scores in the original metric, or follow a test's official scoring procedure, especially in cases where this metric has a meaningful interpretation. For example, this is sometimes the case in clinical applications of scales where cutoff scores or change scores are empirically demonstrated to map onto a level of psychopathology or an actual meaningful change in the lives of patients (e.g., Kazdin, 1999; Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999).

The second decision concerns what measurement model the items are assumed to follow, whether they be standardized or unstandardized. If data follow a parallel or tau-equivalent model, then the researcher can use and interpret coefficient alpha as an accurate estimate of reliability. Researchers might informally inspect the covariance or correlation matrix to see if item covariances or correlations are approximately equal (meeting the assumptions of a tau-equivalent model). This practice could also help researchers identify item heterogeneity and multidimensional items. Formally, researchers might also test whether their data meet the assumptions of any of the classical test theory measurement models (e.g., Reuterberg & Gustafsson, 1992). In our experience, it is unlikely that a test will meet the stringent assumptions of a parallel or tau-equivalent measurement model.

If it is the case that data follow a congeneric model, then it is more appropriate to interpret alpha as a lower bound to reliability as it tends to underestimate the actual reliability of the scale. Instead of using coefficient alpha, researchers might obtain a better estimate of the reliability of a composite by using an alternative approach (e.g., Bentler, 2009; Green & Yang, 2009; Revelle & Zinbarg, 2009; Sijtsma, 2009a). For instance, researchers could compute coefficient rho, which requires estimating a factor analytic model for the data (Bentler, 2006; Green & Yang, 2009; Raykov, 1997a, 1997b; Yang & Green, 2010). However, we qualify this advice by observing that, first, coefficient alpha is often not a serious underestimate of true reliability unless the weights b_i are very heterogeneous, there are few test items, and items contain a lot of error (Raykov, 1997b). Second, alternatives to alpha also tend to be either not readily accessible to researchers or require advanced statistical training, and additional research is required before an ideal alternative candidate emerges (Revelle & Zinbarg, 2009; Sijtsma, 2009b). In contrast, computation of alpha requires minimal statistical training and researchers deciding to report alpha can be sure that others will easily recognize and interpret it. Thus, we believe that coefficient alpha might be more useful than some recent theorists seem to imply. Regardless of which reliability estimate is used, researchers might also wish to report a confidence interval for the estimate (e.g., Fan & Thompson, 2001; Raykov, 1998b; Raykov & Shrout, 2002).

Finally, it should be noted that it is possible for data to not follow any of the three classical test theory measurement models we have presented. For example, in some cases errors might be correlated with each other. This can occur when items share similar wording or tap another theoretically uninteresting construct. During scale construction researchers sometimes use a confirmatory factor analysis model and correlate errors to achieve model fit; if so, such a measurement model is different from any of the three considered. In a related case, researchers might hypothesize (or find empirical evidence) that a test measures a general factor that represents the underlying construct of interest, along with other factors (e.g., bifactor models). In both these cases, alpha will generally be an overestimate of the reliable variance due to the general factor and the coefficient ω_h or structural equation techniques are preferred as estimates of the general factor's reliability (McDonald, 1999; Raykov, 1998a; Zinbarg et al., 2005). Typically in personality research it is assumed that there is some underlying construct or trait individuals possess that causes them to respond in a certain way on a test's items. If this assumption is not plausible and a test's items are formative rather than reflective (e.g., socioeconomic status is a composite consisting of measures such as education and income; socioeconomic status does not cause income), it is not necessary that items be positively correlated (Bollen & Lennox, 1991). In this case, other measures of reliability (e.g., test-retest) are more appropriate than internal consistency coefficients such
sionality, and items that do not correlate positively with other as alpha and rho.
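The two decisions discussed above can be made concrete with a small numerical sketch. The code below is only an illustration: the four-item data, loadings, and variable names are invented for this example, and numpy is assumed to be available. It builds both a raw-sum and an equal-weighted standardized composite, computes unstandardized alpha from the covariance matrix and standardized alpha from the correlation matrix, and prints the inter-item covariances a researcher might informally inspect for approximate tau-equivalence.

```python
import numpy as np

def cronbach_alpha(X):
    """Unstandardized alpha, computed from the item covariance matrix."""
    S = np.cov(X, rowvar=False)
    k = S.shape[0]
    return k / (k - 1) * (1 - np.trace(S) / S.sum())

def standardized_alpha(X):
    """Standardized alpha: the same formula applied to the correlation matrix."""
    R = np.corrcoef(X, rowvar=False)
    k = R.shape[0]
    return k / (k - 1) * (1 - k / R.sum())  # trace of R equals k

# Simulate four congeneric items with unequal loadings (hence unequal
# variances), so the two alphas and the two composites differ.
rng = np.random.default_rng(0)
n = 500
loadings = np.array([1.0, 1.0, 2.0, 0.5])
trait = rng.normal(size=(n, 1))
X = trait * loadings + rng.normal(size=(n, 4))

# Informal inspection: are the off-diagonal covariances roughly equal?
print(np.round(np.cov(X, rowvar=False), 2))

# The two coefficients estimate the reliability of different composites:
print("unstandardized alpha:", round(cronbach_alpha(X), 3))
print("standardized alpha:  ", round(standardized_alpha(X), 3))

# Raw-sum composite vs. equal-weighted composite of z-scored items.
raw_composite = X.sum(axis=1)
z_composite = ((X - X.mean(axis=0)) / X.std(axis=0, ddof=1)).sum(axis=1)
```

With unequal loadings such as these, the printed covariances are visibly unequal, signaling that a tau-equivalent model (and hence alpha's accuracy) is in doubt.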
REFERENCES

Bentler, P. M. (1968). Alpha-maximized factor analysis (Alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33, 335–345.
Bentler, P. M. (2006). EQS 6 structural equation program manual. Encino, CA: Multivariate Software.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74, 137–143.
Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.
Bollen, K. A., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.
Brennan, R. L. (2001). An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement, 38, 295–317.
Callender, J. C., & Osburn, H. G. (1979). An empirical comparison of coefficient alpha, Guttman's lambda-2, and MSPLIT maximized split-half reliability estimates. Journal of Educational Measurement, 16, 89–99.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY: Holt, Rinehart, & Winston.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., & Azuma, H. (1962). Internal-consistency reliability formulas applied to randomly sampled single-factor tests: An empirical comparison. Educational and Psychological Measurement, 22, 645–665.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391–418.
Cudeck, R. (1980). A comparative study of indices for internal consistency. Journal of Educational Measurement, 17, 117–130.
Dana, J., & Dawes, R. M. (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics, 29, 317–331.
Davis, M. H. (1980). A multidimensional approach to individual differences in empathy. JSAS Catalog of Selected Documents in Psychology, 10, 85.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571–582.
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517–531.
Fresko, B., Kfir, D., & Nasser, F. (1997). Predicting teacher commitment. Teaching and Teacher Education, 13, 429–438.
Google Scholar. (2010). http://scholar.google.com. Retrieved August 21, 2010.
Green, S. B., & Yang, Y. (2009). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74, 121–135.
Li, H., Rosenthal, R., & Rubin, D. B. (1996). Reliability of measurement in psychology: From Spearman-Brown to maximal reliability. Psychological Methods, 1, 98–107.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Novick, M. R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1–18.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32, 1–13.
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Nunnally, J. C., & Bernstein, I. (1994). Psychometric theory. New York, NY: McGraw-Hill.
Onwuegbuzie, A. J., & Daniel, L. G. (2002). A framework for reporting and interpreting internal consistency reliability estimates. Measurement and Evaluation in Counseling and Development, 35, 89–103.
Osburn, H. G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5, 343–355.
Paulhus, D. L., & Vazire, S. (2007). The self-report method. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 224–239). New York, NY: Guilford.
Raju, N. S. (1977). A generalization of coefficient alpha. Psychometrika, 42, 549–565.
Raykov, T. (1997a). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173–184.
Raykov, T. (1997b). Scale reliability, Cronbach's coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329–353.
Raykov, T. (1998a). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22, 375–385.
Raykov, T. (1998b). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric measures. Applied Psychological Measurement, 22, 369–374.
Raykov, T. (2001). Bias of coefficient α for fixed congeneric measures with correlated errors. Applied Psychological Measurement, 25, 69–76.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using a structural equation modeling approach. Structural Equation Modeling, 9, 195–212.
Reuterberg, S. E., & Gustafsson, J. E. (1992). Confirmatory factor analysis and reliability: Testing measurement model assumptions. Educational and Psychological Measurement, 52, 795–811.
Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the GLB: Comments on Sijtsma. Psychometrika, 74, 145–154.
Rodriguez, M. C., & Maeda, Y. (2006). Meta-analysis of coefficient alpha. Psychological Methods, 11, 306–322.
Romano, J. L., Kromrey, J. D., & Hibbard, S. T. (2010). Confidence interval methods for coefficient alpha. Educational and Psychological Measurement, 70, 376–393.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.
Seber, G. A. F., & Lee, A. J. (2003). Linear regression analysis. Hoboken, NJ: Wiley.
Sijtsma, K. (2009a). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74, 107–120.
Sijtsma, K. (2009b). Reliability beyond theory and into practice. Psychometrika, 74, 169–173.
Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: Representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25–60.
Streiner, D. L. (2003a). Being inconsistent about consistency: When coefficient alpha does and doesn't matter. Journal of Personality Assessment, 80, 217–222.
Streiner, D. L. (2003b). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80, 99–103.
Stone, W. L., & Yoder, P. J. (2001). Predicting spoken language level in children with autism spectrum disorders. Autism, 5, 341–361.
Sun, W., Chou, C. P., Stacy, A. W., Ma, H., Unger, J., & Gallaher, P. (2007). SAS and SPSS macros to calculate standardized Cronbach's alpha using the upper bound of the phi coefficient for dichotomous items. Behavior Research Methods, 39, 71–81.
Traub, R. E., & Rowley, G. L. (1991). Understanding reliability. Educational Measurement: Issues and Practice, 10, 37–45.
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of reliability. Structural Equation Modeling, 17, 66–81.
Zimmerman, D. W., Zumbo, B. D., & Lalonde, C. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational and Psychological Measurement, 53, 33–49.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ω_H: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123–133.

in Equation A.1 and can be rewritten as follows:

\[
\alpha_G = \frac{k}{k-1}\left(1 - \frac{\mathbf{w}'\mathbf{w}}{\mathbf{w}'R\mathbf{w}}\right), \tag{A.2}
\]

where \(\mathbf{w} = D^{1/2}\mathbf{1}\). That is, \(\alpha_G\) is the alpha coefficient corresponding to a weighted composite, where \(\mathbf{w}\) is the vector of the weights. When \(D = \Sigma_D\), the weights are simply the standard deviations of the variables, or the square roots of each element of the diagonal of \(\Sigma\). When \(D = I\), the weights are all equal to 1, the square root of each diagonal element of \(R\).

Maximizing Equation A.2 is the same as maximizing

\[
\phi = \frac{\mathbf{w}'R\mathbf{w}}{\mathbf{w}'\mathbf{w}}. \tag{A.3}
\]

The maximum of the ratio of two quadratic forms is a well-known problem, and the maximum is achieved when \(\mathbf{w}\) is the eigenvector of \(R\) corresponding to the largest eigenvalue (e.g., Seber & Lee, 2003). Nonetheless, we show this by taking the derivative of Equation A.3 and setting it to zero:

\[
\frac{\partial \phi}{\partial \mathbf{w}} = \frac{2R\mathbf{w}(\mathbf{w}'\mathbf{w}) - 2\mathbf{w}(\mathbf{w}'R\mathbf{w})}{(\mathbf{w}'\mathbf{w})^2} = 0,
\]

which is equivalent to solving

\[
\left(R - \frac{\mathbf{w}'R\mathbf{w}}{\mathbf{w}'\mathbf{w}} I\right)\mathbf{w} = \mathbf{0}.
\]
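This eigenvector result is easy to check numerically. The sketch below is only an illustration (numpy is assumed, and the 3 × 3 correlation matrix is invented): because the generalized alpha of Equation A.2 is an increasing function of the ratio in Equation A.3, the leading eigenvector of R should yield a larger alpha than any other weight vector, including the equal weights that correspond to standardized alpha.

```python
import numpy as np

def alpha_weighted(w, R):
    """Generalized alpha of Equation A.2 for weight vector w and correlation matrix R."""
    k = len(w)
    return k / (k - 1) * (1 - (w @ w) / (w @ R @ w))

# A made-up 3 x 3 correlation matrix for illustration.
R = np.array([[1.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

# The ratio w'Rw / w'w (Equation A.3) is maximized by the eigenvector of R
# with the largest eigenvalue; eigh returns eigenvalues in ascending order,
# so the last column of eigvecs is the one we want.
eigvals, eigvecs = np.linalg.eigh(R)
w_max = eigvecs[:, -1]
alpha_max = alpha_weighted(w_max, R)

# Equal weights reproduce standardized alpha; it cannot exceed alpha_max.
alpha_std = alpha_weighted(np.ones(3), R)
assert alpha_std <= alpha_max

# No random weight vector should do better either.
rng = np.random.default_rng(1)
for _ in range(1000):
    w = rng.normal(size=3)
    assert alpha_weighted(w, R) <= alpha_max + 1e-12
```

Since alpha_weighted depends on w only through the Rayleigh quotient w'Rw / w'w, scaling w has no effect, which is why the unit-length eigenvector can be used directly.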