
Journal of Personality Assessment

ISSN: 0022-3891 (Print) 1532-7752 (Online) Journal homepage: http://www.tandfonline.com/loi/hjpa20

The Relationship Between Unstandardized and Standardized Alpha, True Reliability, and the Underlying Measurement Model

Carl F. Falk & Victoria Savalei

To cite this article: Carl F. Falk & Victoria Savalei (2011) The Relationship Between
Unstandardized and Standardized Alpha, True Reliability, and the Underlying Measurement Model,
Journal of Personality Assessment, 93:5, 445-453, DOI: 10.1080/00223891.2011.594129

To link to this article: http://dx.doi.org/10.1080/00223891.2011.594129

Published online: 22 Aug 2011.


Download by: [Australian Catholic University] Date: 08 October 2017, At: 03:09
Journal of Personality Assessment, 93(5), 445–453, 2011
Copyright © Taylor & Francis Group, LLC
ISSN: 0022-3891 print / 1532-7752 online
DOI: 10.1080/00223891.2011.594129

STATISTICAL DEVELOPMENTS AND APPLICATIONS

The Relationship Between Unstandardized and Standardized Alpha, True Reliability, and the Underlying Measurement Model

CARL F. FALK AND VICTORIA SAVALEI

Department of Psychology, The University of British Columbia, Canada

Popular computer programs print 2 versions of Cronbach's alpha: unstandardized alpha, $\alpha_\Sigma$, based on the covariance matrix, and standardized alpha, $\alpha_R$, based on the correlation matrix. Sources that accurately describe the theoretical distinction between the 2 coefficients are lacking, which can lead to the misconception that the differences between $\alpha_R$ and $\alpha_\Sigma$ are unimportant and to the temptation to report the larger coefficient. We explore the relationship between $\alpha_R$ and $\alpha_\Sigma$ and the reliability of the standardized and unstandardized composite under 3 popular measurement models; we clarify the theoretical meaning of each coefficient and conclude that researchers should choose an appropriate reliability coefficient based on theoretical considerations. We also illustrate that $\alpha_R$ and $\alpha_\Sigma$ estimate the reliability of different composite scores, and in most cases cannot be substituted for one another.

Received March 26, 2010; Revised October 30, 2010.
Address correspondence to Carl F. Falk, Department of Psychology, University of British Columbia, 2136 West Mall, Vancouver, BC V6T 1Z4, Canada; Email: cffalk@psych.ubc.ca

Cronbach's alpha (Cronbach, 1951) is by far the most frequently reported reliability coefficient. Decades of research report on the various properties and uses of coefficient alpha (for reviews see Brennan, 2001; Cronbach & Shavelson, 2004; Li, Rosenthal, & Rubin, 1996; Rodriguez & Maeda, 2006), and a plethora of articles and books have advised researchers how to compute and interpret alpha (e.g., Cortina, 1993; Henson, 2001; Lord & Novick, 1968; John & Soto, 2007; McDonald, 1999; Novick & Lewis, 1967; Nunnally & Bernstein, 1994; Onwuegbuzie & Daniel, 2002; Romano, Kromrey, & Hibbard, 2010; Schmitt, 1996; Streiner, 2003a, 2003b; Traub & Rowley, 1991). Numerous articles have also explored the relationship between coefficient alpha and various other measures of reliability (e.g., Callender & Osburn, 1979; Cronbach & Azuma, 1962; Cudeck, 1980; Jackson, 1979; Osburn, 2000; Raju, 1977; Raykov, 1997a; Zinbarg, Revelle, Yovel, & Li, 2005).

Despite the popularity of coefficient alpha, its continued use by researchers is also somewhat controversial. Some psychometricians argue against the use of coefficient alpha in favor of alternative indexes of reliability (e.g., Green & Yang, 2009; Revelle & Zinbarg, 2009; Sijtsma, 2009a, 2009b). For example, under certain measurement models, coefficient alpha tends to be an underestimate of reliability (Guttman, 1945; Lord & Novick, 1968; Raykov, 1997b; Zimmerman, Zumbo, & Lalonde, 1993), although in many cases this underestimation might not be serious (e.g., Raykov, 1997b). If researchers use an underestimate of a measure's reliability to disattenuate correlations, this could result in inflated estimates of the resulting correlations (Schmitt, 1996). Under other conditions, coefficient alpha might actually be an overestimate of reliability (Raykov, 2001; Zimmerman et al., 1993; cf. Zinbarg et al., 2005). However, we believe that coefficient alpha is useful when its assumptions are understood and met, and many practitioners continue to rely on it as an index of reliability. Under certain measurement model assumptions, alpha has an intuitive interpretation that is easy to understand: it is the average of all possible split-half reliability estimates (using Rulon's method; Cronbach, 1951; Novick & Lewis, 1967). Thus, alpha was initially popularized perhaps because it removed the indeterminacy of deciding how to split a test in half, which was the most popular method for determining reliability from a single test administration before Cronbach's (1951) paper. In addition, it turns out that alpha is an accurate estimate of reliability under measurement models typically assumed by popular split-half reliability estimation methods.

Researchers who wish to report coefficient alpha are faced with the choice between two varieties of this coefficient: unstandardized alpha, $\alpha_\Sigma$, based on the covariance matrix of the items, and standardized alpha, $\alpha_R$, based on the correlation matrix of the items. Knowing the difference between these coefficients is particularly important because popular computer software such as SPSS, SAS, and other programs readily print both coefficients. However, the relationship between these coefficients is not well understood and researchers sometimes report the wrong coefficient given the analyses they perform (e.g., Davis, 1980; Fresko, Kfir, & Nasser, 1997; Stone & Yoder, 2001). The decision of which version of alpha to use depends on whether researchers decide to standardize a test's items before adding the items to form a composite score. This decision to standardize or leave items in raw form should be based on substantive considerations, and this article focuses primarily on the choice of an appropriate reliability index after such a decision has been made.

Although unstandardized alpha is by far the most researched coefficient, only a small body of research has explored


standardized alpha (e.g., Hayashi & Kamata, 2005; Sun et al., 2007). Not only is the relationship between $\alpha_R$ and $\alpha_\Sigma$ understudied, but confusion exists as to whether or not the two coefficients are estimates of the same lower bound to reliability, and, even more fundamentally, whether or not the reliability of a scale changes when its items are standardized. For instance, one article incorrectly claims that the true reliability of a measure is the same for both standardized and unstandardized observed scores (Osburn, 2000, p. 348; cited 68 times according to Google Scholar, 2010). The relationship between $\alpha_R$ and $\alpha_\Sigma$ is also not well understood. For example, one frequently cited expository article on coefficient alpha incorrectly states that standardized alpha is always greater than or equal to unstandardized alpha (Cortina, 1993; cited 1,186 times according to Google Scholar, 2010). We provide an example where this is not true. It is essential that researchers have access to accurate information regarding the proper interpretation of reliability coefficients such as $\alpha_R$ and $\alpha_\Sigma$, and the relationship between them. This will enable researchers to make informed decisions as to whether either type of alpha is an appropriate reliability estimate for their data. The main goal of this article is to clarify the interpretation and the properties of $\alpha_R$ and $\alpha_\Sigma$, and to show that in many cases these coefficients cannot be substitutes for each other. We will show that the proper interpretation of each coefficient depends on the measurement model assumed for the data. Finally, we will state recommendations for the use of each coefficient.

DEFINITION OF UNSTANDARDIZED AND STANDARDIZED ALPHA

Unstandardized alpha is simply alpha computed on the covariance matrix of items, whereas standardized alpha is alpha computed on the correlation matrix of items. The unstandardized alpha coefficient, based on a covariance matrix $\Sigma$ with $k$ items, is given by:

$$\alpha_\Sigma = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_Y^2}\right) = \frac{k^2\,\bar{\sigma}_{ij}}{\sigma_Y^2}, \tag{1}$$

where $\sigma_Y^2$ is the variance of the composite made up by adding raw item scores, $Y = X_1 + X_2 + \cdots + X_k$, $\sigma_i^2$ is the variance of the $i$th item, and $\bar{\sigma}_{ij}$ is the average covariance of the items.¹

A correlation matrix, $R$, is obtained from the covariance matrix by dividing each element by the product of the corresponding variables' standard deviations (which is equivalent to standardizing each item and computing the resulting standardized items' covariance matrix).² The standardized alpha coefficient, based on the correlation matrix, is given by:

$$\alpha_R = \frac{k}{k-1}\left(1 - \frac{k}{\sum_{i=1}^{k}\sum_{j=1,\,j\neq i}^{k} r_{ij} + k}\right) = \frac{k^2\,\bar{r}_{ij}}{\sigma_{Y_Z}^2}, \tag{2}$$

where $r_{ij}$ corresponds to the correlation between the $i$th and $j$th item, $\bar{r}_{ij}$ is the average correlation, and $\sigma_{Y_Z}^2$ is the variance of the composite of standardized items. The leftmost expressions in Equations 1 and 2 are computational formulas and are most familiar to applied researchers, whereas the rightmost expressions provide simpler representations that will be more intuitive for understanding how $\alpha_R$ and $\alpha_\Sigma$ relate to the true reliability of standardized and unstandardized composites. Equations 1 and 2 were defined on population matrices. The corresponding sample definitions can be obtained by using sample covariance and correlation matrices. Throughout most of this article, we do not distinguish between sample and population coefficients, but we briefly address the effect of sampling variability where appropriate.

¹ The variance of the composite can be obtained by either computing the composite and taking its variance, or by summing up all the elements in the covariance matrix $\Sigma$.

² Throughout the article, we assume that the test's items are measured continuously or at least using a sufficient number of categories so that the true score model holds approximately at the item level (five to seven categories is usually deemed enough). For the case of binary items or items with few response options, although one can compute a reliability coefficient from the tetrachoric or polychoric correlation matrix, this coefficient estimates the reliability of a composite made of the underlying continuous responses to the items, not of the observed categorical items. Thus, this is not the estimate of interest. To model the observed categorical items appropriately requires switching to the item response theory framework, where the concept of reliability is replaced with the concept of information.

ALPHA, TRUE RELIABILITY, AND THE UNDERLYING MEASUREMENT MODEL

Given that coefficients $\alpha_\Sigma$ and $\alpha_R$ can be quite different when computed on a particular sample, researchers might sometimes be faced with the tempting choice to report the larger of the two coefficients. Unfortunately, methodological articles sometimes inadvertently encourage this practice by implying that both coefficients estimate the same population coefficient (i.e., Osburn, 2000). However, except under the most restrictive measurement model, the population values of $\alpha_\Sigma$ and $\alpha_R$ are not the same. Additionally, the true reliabilities of measurement instruments based on unstandardized and standardized scores are also not necessarily the same. The choice between $\alpha_R$ and $\alpha_\Sigma$ should depend on whether standardized or unstandardized scores are summed to form the scale. Moreover, whether the corresponding alpha coefficient is an accurate measure of reliability for the chosen scale will depend on the hypothesized measurement model for the items (e.g., McDonald, 1999; Novick & Lewis, 1967). Researchers who wish to interpret alpha as an accurate estimate of reliability are implicitly assuming a measurement model for their items. To further complicate things, this measurement model might vary for standardized and unstandardized item scores. It is possible, for example, that $\alpha_\Sigma$ is an accurate estimate of the reliability of the scale composed of unstandardized items, whereas $\alpha_R$ is only a lower bound on the reliability of the corresponding scale composed of standardized items.

We now elaborate on these points by considering three popular measurement models from classical test theory: parallel tests, tau-equivalent tests, and congeneric tests (for an introduction to classical test theory, see Crocker & Algina, 1986; Lord & Novick, 1968; Nunnally & Bernstein, 1994).

Consider a composite made up of original item scores, $Y = X_1 + X_2 + \cdots + X_k$, which we refer to as the unstandardized composite. Under classical test theory, the composite's total score can be decomposed as $Y = T_Y + E_Y$, where $T_Y$ is the true score, also described as the underlying construct or factor being measured, and $E_Y$ is the error score. Here, "error" is conceptualized as unsystematic and unrelated to the construct being measured. (For a deeper discussion of the concept of true and

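TABLE 1. A comparison of the assumptions and features of three popular classical test theory measurement models.

| Measurement Model | True Scores | Error Scores | Identifying Features | True Reliability |
|---|---|---|---|---|
| Parallel | $T_i = T$; $\mathrm{var}(T_i) = \sigma_T^2$ for all $i$ | $\mathrm{var}(E_i) = \sigma_E^2$ for all $i$ | All item variances equal; all item covariances equal; all correlations equal | $\rho_\Sigma = k^2\sigma_T^2 / \sigma_Y^2$ |
| Tau-equivalent | $T_i = T$; $\mathrm{var}(T_i) = \sigma_T^2$ for all $i$ | $\sigma_{E_1}^2 \neq \sigma_{E_2}^2 \neq \cdots \neq \sigma_{E_k}^2$ | Unequal item variances; all item covariances equal | $\rho_\Sigma = k^2\sigma_T^2 / \sigma_Y^2$ |
| Congeneric | $T_i = b_i T$; $b_1^2\sigma_T^2 \neq b_2^2\sigma_T^2 \neq \cdots \neq b_k^2\sigma_T^2$ | $\sigma_{E_1}^2 \neq \sigma_{E_2}^2 \neq \cdots \neq \sigma_{E_k}^2$ | Unequal item variances; unequal item covariances | $\rho_\Sigma = \left(\sum_{i=1}^{k} b_i\right)^2 \sigma_T^2 / \sigma_Y^2$ |

Note. Each item is assumed to follow the model: $X_i = T_i + E_i$. Assumptions regarding the true and error scores distinguish the three measurement models.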
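
The identifying features and true-reliability column of Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`implied_covariance`, `identify_model`, `true_reliability`) are hypothetical, errors are assumed uncorrelated, and the classifier applies to exact population matrices rather than noisy sample estimates. The parallel and tau-equivalent parameter values are taken from the article's numerical examples; the congeneric loadings are made up for illustration.

```python
def implied_covariance(loadings, error_vars, var_t=1.0):
    """Covariance matrix implied by X_i = b_i*T + E_i with uncorrelated errors."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] * var_t + (error_vars[i] if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def identify_model(sigma, tol=1e-9):
    """Classify a population covariance matrix by the identifying features in Table 1."""
    k = len(sigma)
    variances = [sigma[i][i] for i in range(k)]
    covariances = [sigma[i][j] for i in range(k) for j in range(i)]
    equal_var = max(variances) - min(variances) <= tol
    equal_cov = max(covariances) - min(covariances) <= tol
    if equal_cov and equal_var:
        return "parallel"
    if equal_cov:
        return "tau-equivalent"
    # Unequal covariances are consistent with a congeneric model, but this
    # check does not verify that a one-factor structure actually holds.
    return "congeneric"


def true_reliability(loadings, error_vars, var_t=1.0):
    """Last column of Table 1: (sum of b_i)^2 * var(T) / var(Y)."""
    sigma = implied_covariance(loadings, error_vars, var_t)
    var_y = sum(map(sum, sigma))  # variance of a sum = sum of all matrix elements
    return sum(loadings) ** 2 * var_t / var_y


# Parallel and tau-equivalent values follow the article's examples (var(T) = 2);
# the congeneric loadings below are hypothetical.
for b, e, vt in [([1, 1, 1, 1], [3, 3, 3, 3], 2.0),
                 ([1, 1, 1, 1], [9, 2, 1, 6], 2.0),
                 ([2, 1, 1, 1], [3, 3, 3, 3], 1.0)]:
    print(identify_model(implied_covariance(b, e, vt)), round(true_reliability(b, e, vt), 2))
```

With sample data the equality checks would need a statistical test rather than a fixed tolerance, so the classifier is best read as a restatement of the table, not as a model-selection tool.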

error scores and their interpretation, see, e.g., Bentler, 2009; Novick, 1966; Steyer, 1989; Zinbarg et al., 2005).

The reliability of this composite is defined as the ratio of true score variance to total variance:

$$\rho_\Sigma = \frac{\sigma_{T_Y}^2}{\sigma_{T_Y}^2 + \sigma_{E_Y}^2} = \frac{\sigma_{T_Y}^2}{\sigma_Y^2}, \tag{3}$$

where $\sigma_{T_Y}^2$ is the variance of $T_Y$, $\sigma_{E_Y}^2$ is the variance of $E_Y$, and $\sigma_Y^2$ is the total variance of $Y$. Because this decomposition into true and error scores is purely hypothetical, and the variance of the true score is not known, Equation 3 does not provide a way to estimate reliability. To obtain empirical reliability estimates, it is necessary to focus on the $k$ component measurements $X_i$ that make up the composite. These measurements can be across time, across judges or observers, or across items. In this article, we conceptualize the $X_i$s as individual items making up a scale score $Y$. In this case, the reliability coefficient in Equation 3 is known as internal consistency reliability.

Because the reliability of a composite score can be assessed by examining the relationships among the components, it is important to understand how the true and error scores for the composite are related to the true and error scores for each individual test item, or component. Each item is also hypothesized to be the sum of a true score and an error term, $X_i = T_i + E_i$; notice that the true scores measured by each item are not assumed to be the same in the most general case, although two of the three measurement models discussed next will make this assumption. The sum of all the items' true scores forms the composite's true score: $T_Y = \sum_{i=1}^{k} T_i$. Similarly, the sum of all the items' error scores forms the composite's error score: $E_Y = \sum_{i=1}^{k} E_i$. Additional assumptions about the properties of the $T_i$s and $E_i$s will define the three measurement models considered here (see also Table 1). The equation for the reliability of the composite, and whether coefficient alpha accurately captures this reliability, depends on the measurement model used.

Now, consider the standardized composite created by standardizing each individual item and then adding them together, $Y_Z = Z_1 + Z_2 + \cdots + Z_k$.³ Because each standardized item consists of a true and error score in a similar fashion as the unstandardized composite, $Z_i = T_{iZ} + E_{iZ}$, the standardized composite can also be decomposed into a true score and error score, $Y_Z = T_{Y_Z} + E_{Y_Z}$. The standardized composite's true score and error score are the sum of the standardized items' true scores, $T_{Y_Z} = \sum_{i=1}^{k} T_{iZ}$, and error scores, $E_{Y_Z} = \sum_{i=1}^{k} E_{iZ}$. The standardized composite's reliability, $\rho_R$, is also the ratio of its true variance to its total variance:

$$\rho_R = \frac{\sigma_{T_{Y_Z}}^2}{\sigma_{T_{Y_Z}}^2 + \sigma_{E_{Y_Z}}^2} = \frac{\sigma_{T_{Y_Z}}^2}{\sigma_{Y_Z}^2}. \tag{4}$$

However, the reliability of the standardized composite will not necessarily equal the reliability of the unstandardized composite, $\rho_R \neq \rho_\Sigma$. This is because rescaling the individual components to create standardized items also rescales the composite's true and error scores in a way that can change their relative variances. As was the case with the unstandardized composite, the reliability of the standardized composite, and whether it is accurately estimated by standardized alpha, will depend on the measurement model assumed for the standardized items. Interestingly, assuming a model for the unstandardized items forces a model on the standardized items, but this model might not be the same.

³ If the total scale score is standardized after raw item scores have been added, this standardization does not change the scale's reliability, and unstandardized alpha can still be used.

In what follows, we discuss three measurement models for unstandardized items and how item standardization affects each model. Briefly, if a parallel tests model is assumed for the unstandardized items, a parallel tests model also holds for the standardized items, and the reliabilities for the two composites are equal. If a tau-equivalent tests model is assumed for the unstandardized items, a congeneric model holds for the standardized items, and the reliabilities for the two composites are not equal. Finally, if a congeneric tests model is assumed for the unstandardized items, it forces a congeneric tests model on the standardized items, but it is a different congeneric model, and the reliabilities for the two composites are again not equal. Whichever set of scores is considered (unstandardized or standardized), the corresponding coefficient alpha ($\alpha_\Sigma$ or $\alpha_R$) is equal to the corresponding reliability only when the parallel or tau-equivalent model holds for those scores. In the case of a congeneric model, alpha is generally an underestimate of the corresponding reliability coefficient.⁴ We now examine the details of these claims.

⁴ We are assuming no correlated errors. The important case in which the errors are correlated is discussed by Raykov (1998a, 2001).

Parallel Items

In a parallel tests model, each item is assumed to have the following structure: $X_i = T + E_i$, where $\mathrm{var}(T) = \sigma_T^2$ and

$\mathrm{var}(E_i) = \sigma_E^2$ for all $i$.⁵ That is, each item is measuring the same underlying true score to the same extent, and error variances are equal across the items. The covariance matrix for such a model will have equal variances, $\sigma_X^2 = \sigma_T^2 + \sigma_E^2$, and equal covariances (see Table 1 for a summary of the assumptions and identifying features of each measurement model). In a finite sample, item variances will only be approximately equal and item covariances will only be approximately equal. Statistical tests are available for checking the assumptions of a parallel tests model (e.g., Reuterberg & Gustafsson, 1992).

⁵ All three measurement models discussed in this article allow for different intercepts among the items; however, item intercepts are omitted for simplicity and because they do not affect any of the reliability computations.

The covariance matrix for parallel items can be decomposed into reliable and error variance components as follows:⁶

$$\Sigma = \begin{pmatrix} \sigma_T^2 + \sigma_E^2 & \cdots & \sigma_T^2 \\ \vdots & \ddots & \vdots \\ \sigma_T^2 & \cdots & \sigma_T^2 + \sigma_E^2 \end{pmatrix} = \begin{pmatrix} \sigma_T^2 & \cdots & \sigma_T^2 \\ \vdots & \ddots & \vdots \\ \sigma_T^2 & \cdots & \sigma_T^2 \end{pmatrix} + \begin{pmatrix} \sigma_E^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_E^2 \end{pmatrix}. \tag{5}$$

⁶ The form of this matrix is obtained by applying the rules of covariance algebra to find the variances of individual items and simplify the covariances between $X_i = T + E_i$ and $X_j = T + E_j$ (e.g., Bollen, 1989).

The total reliable variance of the composite is then the sum of all elements in the first term of this decomposition: $\sigma_{T_Y}^2 = k^2\sigma_T^2$. The total variance of the composite is the sum of all the elements in both the reliable and the error terms of this decomposition: $\sigma_Y^2 = k^2\sigma_T^2 + k\sigma_E^2$. The reliability of the test is then the ratio of the true variance to the total variance, or $\rho_\Sigma = \frac{k^2\sigma_T^2}{k^2\sigma_T^2 + k\sigma_E^2} = \frac{k^2\sigma_T^2}{\sigma_Y^2}$. Compare the rightmost term to the rightmost term of Equation 1. Because all item covariances are equal, the average item covariance will be equal to $\sigma_T^2$. Therefore, in the case of parallel tests, unstandardized alpha is equal to the actual reliability of the composite, $\alpha_\Sigma = \rho_\Sigma$. This relationship holds in the population. This means that for a finite sample, $\alpha_\Sigma$ will be a consistent estimate of the true reliability of the composite.

When items from the parallel model are standardized, the standardized items' covariances can be obtained from the original covariances by dividing each element of $\Sigma$ by the product of the corresponding standard deviations. The resulting covariance matrix of the standardized items is just the correlation matrix of the raw data. Because the original items are parallel and have equal variances and equal covariances, all the off-diagonal elements (i.e., interitem correlations) will be equal to one another, and its diagonal elements will all be 1. Thus, the standardized items will also follow a parallel tests model. Moreover, because all items are rescaled by the same amount, standardization does not change the ratio of true score to error variance, and hence the reliability of the unstandardized composite is the same as that of the standardized composite, $\rho_\Sigma = \rho_R$, and unstandardized alpha is equal to standardized alpha, $\alpha_\Sigma = \alpha_R$. Because under a parallel tests model alpha is equal to reliability, all four of these coefficients are equal in the population (see Table 2 for a comparison of all four coefficients under each measurement model). This means that if items are parallel, either alpha can be used to estimate reliability in the sample, and the actual scale reliability remains the same. Although sample estimates might differ from one another, they will estimate the same population quantity.

Tau-Equivalent Items

The tau-equivalent measurement model still assumes that any item score can be decomposed into its true and error part: $X_i = T + E_i$, where each item is measuring the same true score and to the same extent. However, the error variances are no longer assumed to be equal for each item; that is, $\sigma_{E_1}^2 \neq \sigma_{E_2}^2 \neq \cdots \neq \sigma_{E_k}^2$. This measurement model results in a covariance matrix where item covariances are equal, but item variances are not equal (see Table 1). The covariance matrix can be decomposed as follows:

$$\Sigma = \begin{pmatrix} \sigma_T^2 + \sigma_{E_1}^2 & \cdots & \sigma_T^2 \\ \vdots & \ddots & \vdots \\ \sigma_T^2 & \cdots & \sigma_T^2 + \sigma_{E_k}^2 \end{pmatrix} = \begin{pmatrix} \sigma_T^2 & \cdots & \sigma_T^2 \\ \vdots & \ddots & \vdots \\ \sigma_T^2 & \cdots & \sigma_T^2 \end{pmatrix} + \begin{pmatrix} \sigma_{E_1}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{E_k}^2 \end{pmatrix}. \tag{6}$$

In this case, the amount of reliable variance in the composite is the same as that for parallel items, $\sigma_{T_Y}^2 = k^2\sigma_T^2$, as all true score variances are still equal to one another. Although error variances are no longer equal, the reliability of the composite is still the ratio of true score variance to total variance: $\rho_\Sigma = \frac{k^2\sigma_T^2}{\sigma_Y^2}$, because the total variance incorporates the different item error variances (see Footnote 1). Because all item covariances are equal, the average item covariance used in Equation 1 will again be equal to $\sigma_T^2$ and it is still the case that $\alpha_\Sigma = \rho_\Sigma$. In finite samples, $\alpha_\Sigma$ will be an accurate estimate of reliability of the unstandardized composite.

Because item variances under the tau-equivalent model are not equal to one another, when tau-equivalent items are standardized each item will be divided by a different standard deviation. The resulting correlation matrix will consist of off-diagonal elements that are no longer equal to one another, which means that the standardized items no longer follow the tau-equivalent model. The reliability of the unstandardized composite will not equal the reliability of the standardized composite because the true and error variances have changed in a nontrivial way after standardization. Additionally, $\alpha_R$ is no longer an accurate estimate of the standardized composite's reliability, $\alpha_R \neq \rho_R$, nor is it equal to unstandardized alpha, $\alpha_\Sigma \neq \alpha_R$. Actually, the model obtained after standardizing tau-equivalent items is a congeneric measurement model, which we discuss next.

Congeneric Items

Under the congeneric measurement model, items are assumed to have the following decomposition: $X_i = b_i T + E_i$. That is, items no longer measure the underlying true score to the same extent, but some items are more highly correlated with the true score than others. In addition, as in the tau-equivalent model, error variances are also not assumed equal. A congeneric model

TABLE 2. Three popular classical test theory measurement models, the corresponding standardized item measurement model if the items are standardized, and the relationship between unstandardized alpha, standardized alpha, and the true reliability of each composite.

| Unstandardized Item Measurement Model | Corresponding Standardized Item Measurement Model | Unstandardized and Standardized Alpha | Unstandardized Alpha and the Reliability of the Unstandardized Composite | Standardized Alpha and the Reliability of the Standardized Composite | Reliability of the Unstandardized and Standardized Composites |
|---|---|---|---|---|---|
| Parallel | Parallel | $\alpha_\Sigma = \alpha_R$ | $\alpha_\Sigma = \rho_\Sigma$ | $\alpha_R = \rho_R$ | $\rho_\Sigma = \rho_R$ |
| Tau-equivalent | Congeneric | $\alpha_\Sigma \neq \alpha_R$ | $\alpha_\Sigma = \rho_\Sigma$ | $\alpha_R \neq \rho_R$ | $\rho_\Sigma \neq \rho_R$ |
| Congeneric | A different congeneric model | $\alpha_\Sigma \neq \alpha_R$ | $\alpha_\Sigma \neq \rho_\Sigma$ | $\alpha_R \neq \rho_R$ | $\rho_\Sigma \neq \rho_R$ |

Note. The preceding relationships hold in the population. In any given finite sample, these relationships will hold approximately.
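
The relationships in Table 2 can be checked numerically with a short Python sketch. It is an illustration only, assuming uncorrelated errors and known population parameters; `alpha` and `four_coefficients` are hypothetical helper names, not functions from any statistics package. The standardized loadings use $b_i / \sigma_{X_i}$ as described in the text, and the tau-equivalent parameters (true variance 2, error variances 9, 2, 1, 6) are the ones from the article's second numerical example.

```python
import math


def alpha(matrix):
    """Coefficient alpha from a covariance or correlation matrix (computational form)."""
    k = len(matrix)
    total = sum(map(sum, matrix))                 # variance of the composite
    trace = sum(matrix[i][i] for i in range(k))   # sum of item variances
    return k / (k - 1) * (1 - trace / total)


def four_coefficients(loadings, error_vars, var_t=1.0):
    """Return (alpha_sigma, alpha_r, rho_sigma, rho_r) for X_i = b_i*T + E_i."""
    k = len(loadings)
    sd = [math.sqrt(b * b * var_t + e) for b, e in zip(loadings, error_vars)]
    sigma = [[loadings[i] * loadings[j] * var_t + (error_vars[i] if i == j else 0.0)
              for j in range(k)] for i in range(k)]
    corr = [[sigma[i][j] / (sd[i] * sd[j]) for j in range(k)] for i in range(k)]
    std_loadings = [b / s for b, s in zip(loadings, sd)]  # b_i / sd(X_i)
    rho_sigma = sum(loadings) ** 2 * var_t / sum(map(sum, sigma))
    rho_r = sum(std_loadings) ** 2 * var_t / sum(map(sum, corr))
    return alpha(sigma), alpha(corr), rho_sigma, rho_r


# Tau-equivalent raw items: alpha_sigma equals rho_sigma, but the standardized
# items are congeneric, so alpha_r falls below rho_r.
a_s, a_r, r_s, r_r = four_coefficients([1, 1, 1, 1], [9, 2, 1, 6], var_t=2.0)
print(f"alpha_sigma={a_s:.3f} alpha_r={a_r:.3f} rho_sigma={r_s:.3f} rho_r={r_r:.3f}")
```

Passing equal loadings and equal error variances reproduces the parallel row, where all four values coincide; unequal loadings reproduce the congeneric row.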

is basically a one-factor model, and the coefficients $b_i$ are also called factor loadings. The covariance matrix of such items can be identified by noting that item variances and covariances are unequal to each other (see Table 1). The covariance matrix of such items can be partitioned into reliable variance and error variance as follows:

$$\Sigma = \begin{pmatrix} b_1^2\sigma_T^2 + \sigma_{E_1}^2 & \cdots & b_1 b_k \sigma_T^2 \\ \vdots & \ddots & \vdots \\ b_1 b_k \sigma_T^2 & \cdots & b_k^2\sigma_T^2 + \sigma_{E_k}^2 \end{pmatrix} = \begin{pmatrix} b_1^2\sigma_T^2 & \cdots & b_1 b_k \sigma_T^2 \\ \vdots & \ddots & \vdots \\ b_1 b_k \sigma_T^2 & \cdots & b_k^2\sigma_T^2 \end{pmatrix} + \begin{pmatrix} \sigma_{E_1}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{E_k}^2 \end{pmatrix}. \tag{7}$$

The total reliable variance in the composite is the sum of everything in the first term of this decomposition, which simplifies to $\sigma_{T_Y}^2 = \left(\sum_{i=1}^{k} b_i\right)^2 \sigma_T^2$. The reliability of the composite is, by definition, the ratio of the reliable variance in the composite to the total variance of the composite: $\rho_\Sigma = \frac{\left(\sum_{i=1}^{k} b_i\right)^2 \sigma_T^2}{\sigma_Y^2}$. Coefficient alpha computed on the congeneric population covariance matrix will produce: $\alpha_\Sigma = \frac{k^2\,\overline{b_i b_j}\,\sigma_T^2}{\sigma_Y^2}$. This follows from Equation 1 by noting that $\overline{b_i b_j}\,\sigma_T^2$ is the average item covariance. Because the numerators for the reliability of the unstandardized composite and unstandardized alpha are different, $\alpha_\Sigma$ is in general not equal to $\rho_\Sigma$. Actually, it can be shown that $\alpha_\Sigma$ is smaller than the true reliability, $\alpha_\Sigma \leq \rho_\Sigma$, because $k^2\,\overline{b_i b_j} \leq \left(\sum_{i=1}^{k} b_i\right)^2$ (Guttman, 1945). That is, $\alpha_\Sigma$ is an underestimate of reliability when the composite consists of congeneric items. This relationship holds exactly in the population, and will tend to hold in large samples. In the previous section, it was shown that if tau-equivalent items are standardized, the resulting standardized items will follow a congeneric model. The results of this section imply that when unstandardized items follow a tau-equivalent model, $\alpha_R$ will be an underestimate of the standardized composite's true reliability.

When items from a congeneric scale are standardized, the resulting correlation matrix again follows a congeneric model, but with a different set of coefficients. The new coefficients are given by: $b_i^{*} = b_i / \sigma_{X_i}$, where the $b_i$s are the coefficients corresponding to unstandardized items and $\sigma_{X_i}$ is each item's standard deviation. The change in the relative size of these coefficients implies that the reliabilities are different for the unstandardized and standardized composites, $\rho_\Sigma \neq \rho_R$. As before, standardized alpha is a lower bound on the reliability of the standardized composite, $\alpha_R \leq \rho_R$. Finally, because all four coefficients ($\alpha_\Sigma$, $\alpha_R$, $\rho_\Sigma$, $\rho_R$) are different (see Table 2), nothing can be said about the relative size of $\alpha_\Sigma$ and $\alpha_R$. Also, we do not know how $\alpha_R$ compares to $\rho_\Sigma$. Thus, if one uses a sample estimate of $\alpha_R$ to accompany an unstandardized composite, one cannot make any meaningful statements with regard to the accuracy of this coefficient as an estimate of the unstandardized composite's true reliability.

NUMERICAL EXAMPLES

We now give several numerical examples that illustrate the three measurement models and the coefficients $\alpha_\Sigma$, $\alpha_R$, $\rho_\Sigma$, and $\rho_R$. First, consider the following population covariance matrix and its decomposition into true and error variance components:

$$\Sigma = \begin{pmatrix} 5 & 2 & 2 & 2 \\ 2 & 5 & 2 & 2 \\ 2 & 2 & 5 & 2 \\ 2 & 2 & 2 & 5 \end{pmatrix} = \Sigma_T + \Sigma_E = \begin{pmatrix} 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \end{pmatrix} + \begin{pmatrix} 3 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 3 \end{pmatrix}$$

Because all item covariances are equal and all item variances are equal, the items follow a parallel tests model. This is an example of the theoretical decomposition in Equation 5. For these data, the average covariance is 2, the total variance of the composite (sum of every element in $\Sigma$) is 44, and $\alpha_\Sigma = \frac{4^2 \cdot 2}{44} = .73$, using Equation 1. The true reliability of the unstandardized composite, $\rho_\Sigma$, is obtained by summing every element of $\Sigma_T$ and dividing it by the sum of every element of $\Sigma$, so that $\rho_\Sigma = \frac{2 \cdot 16}{44} = .73$. Thus, $\alpha_\Sigma = \rho_\Sigma$, as it should for parallel items. The corresponding correlation matrix and its decomposition are given by:

$$R = \begin{pmatrix} 1 & .4 & .4 & .4 \\ .4 & 1 & .4 & .4 \\ .4 & .4 & 1 & .4 \\ .4 & .4 & .4 & 1 \end{pmatrix} = \begin{pmatrix} .4 & .4 & .4 & .4 \\ .4 & .4 & .4 & .4 \\ .4 & .4 & .4 & .4 \\ .4 & .4 & .4 & .4 \end{pmatrix} + \begin{pmatrix} .6 & 0 & 0 & 0 \\ 0 & .6 & 0 & 0 \\ 0 & 0 & .6 & 0 \\ 0 & 0 & 0 & .6 \end{pmatrix}$$

Using the same computations as just shown but applied to these matrices, we obtain $\alpha_R = \rho_R = .73$. All four coefficients are equal to one another and either alpha would be an accurate measure of the true reliability of the unstandardized or standardized composites.

As the next example, consider the following covariance matrix and its decomposition:

$$\Sigma = \begin{pmatrix} 11 & 2 & 2 & 2 \\ 2 & 4 & 2 & 2 \\ 2 & 2 & 3 & 2 \\ 2 & 2 & 2 & 8 \end{pmatrix} = \Sigma_T + \Sigma_E = \begin{pmatrix} 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \\ 2 & 2 & 2 & 2 \end{pmatrix} + \begin{pmatrix} 9 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 6 \end{pmatrix}$$

The items follow a tau-equivalent model, and this is an example of the theoretical decomposition in Equation 6, where all item covariances are equal but item variances are not equal. For these data, the average covariance is 2, the total variance of the composite (sum of every element in $\Sigma$) is 50, and $\alpha_\Sigma = \frac{4^2 \cdot 2}{50} = .64$. The sum of all the elements in $\Sigma_T$ is $2 \cdot 16 = 32$, and $\rho_\Sigma = \frac{32}{50} = .64$. Again, $\alpha_\Sigma = \rho_\Sigma$, meaning that the equation for alpha gives the correct reliability computation for tau-equivalent items.

The situation changes when the tau-equivalent data just shown are standardized:

$$R = \begin{pmatrix} 1 & .30 & .35 & .21 \\ .30 & 1 & .58 & .35 \\ .35 & .58 & 1 & .41 \\ .21 & .35 & .41 & 1 \end{pmatrix} = \begin{pmatrix} .18 & .30 & .35 & .21 \\ .30 & .5 & .58 & .35 \\ .35 & .58 & .67 & .41 \\ .21 & .35 & .41 & .25 \end{pmatrix} + \begin{pmatrix} .82 & 0 & 0 & 0 \\ 0 & .5 & 0 & 0 \\ 0 & 0 & .33 & 0 \\ 0 & 0 & 0 & .75 \end{pmatrix}$$

The resulting standardized items now follow a congeneric model. The covariances (i.e., item correlations) are no longer equal. The decomposition no longer resembles an example of Equation 6, but rather of Equation 7. Identifying a congeneric model and arriving at the decomposition is not trivial, and requires factor analysis. For our purposes, we assume that the decomposition of components is known. The average covariance is now .37, and the sum of everything in $R$ is 8.4, so that $\alpha_R = \frac{4^2(.37)}{8.4} = .70$. The actual reliability, however, is $\rho_R = \frac{6}{8.4} = .71$, where the numerator is the sum of all the elements in the reliable component of the decomposition. Therefore, $\alpha_R$ is a slight underestimate of the standardized composite's reliability. Note also because $\alpha_R > \alpha_\Sigma$, a researcher might be tempted to report $\alpha_R$. This is particularly true [...] is not an accurate measure of the unstandardized composite's reliability.

Finally, our last example concerns the following covariance matrix:

$$\Sigma = \begin{pmatrix} 17.52 & 15.77 & .41 & .41 \\ 15.77 & 17.52 & .41 & .41 \\ .41 & .41 & .98 & .01 \\ .41 & .41 & .01 & .98 \end{pmatrix} = \Sigma_T + \Sigma_E = \begin{pmatrix} 15.77 & 15.77 & .41 & .41 \\ 15.77 & 15.77 & .41 & .41 \\ .41 & .41 & .01 & .01 \\ .41 & .41 & .01 & .01 \end{pmatrix} + \begin{pmatrix} 1.75 & 0 & 0 & 0 \\ 0 & 1.75 & 0 & 0 \\ 0 & 0 & .97 & 0 \\ 0 & 0 & 0 & .97 \end{pmatrix}$$

This matrix follows a congeneric model and is an example of the theoretical decomposition in Equation 7. Here, $\alpha_\Sigma = \frac{4^2 \cdot 2.9}{71.84} = .65$, which is a rather substantial underestimate of the actual reliability, $\rho_\Sigma = \frac{66.4}{71.84} = .92$. The corresponding correlation matrix and its decomposition are given by:

$$R = \begin{pmatrix} 1 & .9 & .1 & .1 \\ .9 & 1 & .1 & .1 \\ .1 & .1 & 1 & .01 \\ .1 & .1 & .01 & 1 \end{pmatrix} = \begin{pmatrix} .9 & .9 & .1 & .1 \\ .9 & .9 & .1 & .1 \\ .1 & .1 & .01 & .01 \\ .1 & .1 & .01 & .01 \end{pmatrix} + \begin{pmatrix} .1 & 0 & 0 & 0 \\ 0 & .1 & 0 & 0 \\ 0 & 0 & .99 & 0 \\ 0 & 0 & 0 & .99 \end{pmatrix}$$

For this decomposition, $\alpha_R = \frac{4^2(.22)}{6.62} = .53$, and the actual reliability of the standardized composite is $\rho_R = \frac{4.44}{6.62} = .67$. This example illustrates that the reliabilities of the unstandardized and standardized composites can be quite different (.92 vs. .67), unlike the claim of Osburn (2000) that they are always the same. Also, both $\alpha_\Sigma$ and $\alpha_R$ are serious underestimates of the actual corresponding reliabilities: a difference of .27 for the unstandardized composite and .14 for the standardized composite. Such large underestimates are more likely to occur when item correlations or covariances are quite different from one another. Of course, Items 3 and 4 do not seem to belong on the scale, and Items 1 and 2 seem to be duplicating content. However, such a situation could occur in practice. For example, Items 3 and 4 might have simply not yielded enough variation to correlate highly with the other items in the matrix or variable scaling might have been different. Researchers should always examine item-total correlations as well as the actual correlation matrix of the items to detect problems with the scale, even if alpha is acceptable. Correlations that are
given that that  (.64) is somewhat low, but R (.70) meets a extremely unequal might lead the researchers to consider other
common cutoff criterion (e.g., Nunnally, 1978). However, R estimates of reliability or dropping low-correlating items.
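The arithmetic in the two worked examples above can be checked with a short script. What follows is only a minimal sketch in plain Python, not code from the article: the helper names (`total`, `alpha`, `rho`, `rescale`, `four_coefficients`) are hypothetical, and the split of each covariance matrix into a true-score part and error variances is taken as known, as the text assumes. Alpha is computed as $k^2$ times the average off-diagonal element divided by the sum of all elements, and reliability as the sum of the true-score matrix divided by the sum of the full matrix.

```python
from math import sqrt

def total(m):
    """Sum of every element of a square matrix (list of rows)."""
    return sum(sum(row) for row in m)

def alpha(m):
    """Coefficient alpha: k^2 * (average off-diagonal element) / (sum of all elements)."""
    k = len(m)
    off_diag = total(m) - sum(m[i][i] for i in range(k))
    return k * k * (off_diag / (k * (k - 1))) / total(m)

def rho(m_true, m):
    """Composite reliability: sum of the true-score part over the sum of the full matrix."""
    return total(m_true) / total(m)

def rescale(m, variances):
    """Divide element (i, j) by sd_i * sd_j, i.e. standardize the items."""
    k = len(m)
    return [[m[i][j] / sqrt(variances[i] * variances[j]) for j in range(k)]
            for i in range(k)]

def four_coefficients(true_part, error_variances):
    """Return (alpha, rho, alpha_R, rho_R) for a known decomposition."""
    k = len(true_part)
    cov = [[true_part[i][j] + (error_variances[i] if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    variances = [cov[i][i] for i in range(k)]
    corr = rescale(cov, variances)
    corr_true = rescale(true_part, variances)
    return alpha(cov), rho(true_part, cov), alpha(corr), rho(corr_true, corr)

# Tau-equivalent example: every true-score (co)variance equals 2.
TAU_TRUE = [[2.0] * 4 for _ in range(4)]
tau = four_coefficients(TAU_TRUE, [9, 2, 1, 6])

# Congeneric example from the last covariance matrix in the text.
CONG_TRUE = [[15.77, 15.77, 0.41, 0.41],
             [15.77, 15.77, 0.41, 0.41],
             [0.41, 0.41, 0.01, 0.01],
             [0.41, 0.41, 0.01, 0.01]]
cong = four_coefficients(CONG_TRUE, [1.75, 1.75, 0.97, 0.97])

print([round(x, 2) for x in tau])   # alpha, rho, alpha_R, rho_R
print([round(x, 2) for x in cong])
```

Rounded to two decimals, the two printed lists should agree with the values reported in the text (.64, .64, .70, .71 for the tau-equivalent data and .65, .92, .53, .67 for the congeneric data), even though the script standardizes with unrounded correlations.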
Finally, this example shows that $\alpha$ can be greater than $\alpha_R$, contradicting Cortina (1993). Although it is tempting to think that this is because the data's correlation matrix is so extreme, the Appendix proves that a set of standard deviations leading to $\alpha \geq \alpha_R$ can be found for any correlation matrix.

DISCUSSION AND RECOMMENDATIONS

The process by which researchers decide on an appropriate estimate of internal consistency reliability is twofold. First, researchers should decide whether to estimate internal consistency from the covariance matrix or correlation matrix on the basis of how the composite score is calculated. In general, if researchers intend to sum raw scores, the covariance matrix should be used to determine internal consistency. If researchers intend to sum standardized scores, the correlation matrix is more appropriate for determining internal consistency. Second, researchers should decide whether alpha is an appropriate estimate of reliability for that composite, based on the underlying measurement model that is either assumed or is shown to approximately hold for the data. Neither of these decisions is trivial. Although this article primarily focuses on the latter issue, which assumes that researchers have selected a composite that is of interest, we now briefly discuss the first issue.

There are several reasons that researchers might wish to standardize items before combining them. In general, standardizing items will give equal weight to each item in constructing the composite. If items are left in raw form, the items with higher variance will have higher weight in determining the distribution of scores across individuals. Thus, standardizing items before creating a composite can be useful in cases where the items use different metrics or for some other reason where equal weighting of items is desired. For instance, equal weighting can often be superior to alternative methods of combining scores such as the use of regression-based weights (e.g., Dana & Dawes, 2004; Dawes, 1979).

In some instances, equal variances are already presumed by the measurement instrument. For instance, personality researchers often use self-report Likert items of the same format (Paulhus & Vazire, 2007). Thus, items are theoretically already on the same metric, and in this situation, leaving the scores in raw form might make the most sense. Researchers might also wish to leave scores in the original metric, or follow a test's official scoring procedure, especially in cases where this metric has a meaningful interpretation. For example, this is sometimes the case in clinical applications of scales where cutoff scores or change scores are empirically demonstrated to map onto a level of psychopathology or an actual meaningful change in the lives of patients (e.g., Kazdin, 1999; Kendall, Marrs-Garcia, Nath, & Sheldrick, 1999).

The second decision concerns what measurement model the items are assumed to follow, whether they be standardized or unstandardized. If data follow a parallel or tau-equivalent model, then the researcher can use and interpret coefficient alpha as an accurate estimate of reliability. Researchers might informally inspect the covariance or correlation matrix to see if item covariances or correlations are approximately equal (meeting the assumptions of a tau-equivalent model). This practice could also help researchers identify item heterogeneity, multidimensionality, and items that do not correlate positively with other items. Formally, researchers might also test whether their data meet the assumptions of any of the classical test theory measurement models (e.g., Reuterberg & Gustafsson, 1992). In our experience, it is unlikely that a test will meet the stringent assumptions of a parallel or tau-equivalent measurement model.

If it is the case that data follow a congeneric model, then it is more appropriate to interpret alpha as a lower bound to reliability as it tends to underestimate the actual reliability of the scale. Instead of using coefficient alpha, researchers might obtain a better estimate of the reliability of a composite by using an alternative approach (e.g., Bentler, 2009; Green & Yang, 2009; Revelle & Zinbarg, 2009; Sijtsma, 2009a). For instance, researchers could compute coefficient rho, which requires estimating a factor analytic model for the data (Bentler, 2006; Green & Yang, 2009; Raykov, 1997a, 1997b; Yang & Green, 2010). However, we qualify this advice by observing that, first, coefficient alpha is often not a serious underestimate of true reliability unless the weights $b_i$ are very heterogeneous, there are few test items, and items contain a lot of error (Raykov, 1997b). Second, alternatives to alpha also tend to be either not readily accessible to researchers or require advanced statistical training, and additional research is required before an ideal alternative candidate emerges (Revelle & Zinbarg, 2009; Sijtsma, 2009b). In contrast, computation of alpha requires minimal statistical training and researchers deciding to report alpha can be sure that others will easily recognize and interpret it. Thus, we believe that coefficient alpha might be more useful than some recent theorists seem to imply. Regardless of which reliability estimate is used, researchers might also wish to report a confidence interval for the estimate (e.g., Fan & Thompson, 2001; Raykov, 1998b; Raykov & Shrout, 2002).

Finally, it should be noted that it is possible for data to not follow any of the three classical test theory measurement models we have presented. For example, in some cases errors might be correlated with each other. This can occur when items share similar wording or tap another theoretically uninteresting construct. During scale construction researchers sometimes use a confirmatory factor analysis model and correlate errors to achieve model fit; if so, such a measurement model is different from any of the three considered. In a related case, researchers might hypothesize (or find empirical evidence) that a test measures a general factor that represents the underlying construct of interest, along with other factors (e.g., bifactor models). In both these cases, alpha will generally be an overestimate of the reliable variance due to the general factor and the coefficient $\omega_h$ or structural equation techniques are preferred as estimates of the general factor's reliability (McDonald, 1999; Raykov, 1998a; Zinbarg et al., 2005). Typically in personality research it is assumed that there is some underlying construct or trait individuals possess that causes them to respond in a certain way on a test's items. If this assumption is not plausible and a test's items are formative rather than reflective (e.g., socioeconomic status is a composite consisting of measures such as education and income; socioeconomic status does not cause income), it is not necessary that items be positively correlated (Bollen & Lennox, 1991). In this case, other measures of reliability (e.g., test–retest) are more appropriate than internal consistency coefficients such as alpha and rho.
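The first decision described in this section, matching the coefficient to the way the composite is scored, can be illustrated with a small sketch. It is an illustration under stated assumptions rather than a procedure from the article: the data are invented, the function names (`alpha_from_scores`, `zscore`) are hypothetical, and alpha is computed from the familiar variance form $\frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_{\text{total}}^2}\right)$. Applied to raw item scores this gives unstandardized alpha; applied to z-scored items it gives standardized alpha, because the variance of the summed z scores equals the sum of the correlation matrix.

```python
from statistics import mean, pstdev, pvariance

def alpha_from_scores(items):
    """Alpha of the summed composite; `items` is a list of k lists of n scores."""
    k = len(items)
    totals = [sum(person) for person in zip(*items)]  # composite score per person
    item_var = sum(pvariance(x) for x in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

def zscore(x):
    """Standardize one item to mean 0 and (population) variance 1."""
    m, s = mean(x), pstdev(x)
    return [(v - m) / s for v in x]

# Invented data: 3 items, 6 respondents; item 3 uses a much wider metric.
scores = [
    [1, 2, 3, 4, 5, 3],
    [2, 2, 4, 4, 5, 3],
    [10, 30, 20, 50, 40, 30],
]

alpha_raw = alpha_from_scores(scores)                       # composite of raw scores
alpha_std = alpha_from_scores([zscore(x) for x in scores])  # composite of z scores

print(round(alpha_raw, 3), round(alpha_std, 3))
```

In the raw composite the wide-metric item dominates the total, so the two coefficients describe different composites and need not be close; whichever composite is actually used for scoring determines which coefficient is relevant.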
CONCLUSION

This article clarified the relationship between standardized alpha and unstandardized alpha. We have corrected misconceptions in previous work by demonstrating that the reliability of a composite of unstandardized scores is in general not the same as the reliability of the composite of the standardized scores. We also showed that the measurement model for a set of unstandardized scores does not in general imply the same measurement model for the standardized scores, which has implications for the relationship between unstandardized alpha, standardized alpha, and true composite reliabilities. When deciding which estimate of reliability to use, we hope that researchers will consider both how the composite score is computed (raw vs. standardized) and what measurement model is likely to describe the data.

REFERENCES

Bentler, P. M. (1968). Alpha-maximized factor analysis (Alphamax): Its relation to alpha and canonical factor analysis. Psychometrika, 33, 335–345.
Bentler, P. M. (2006). EQS 6 structural equation program manual. Encino, CA: Multivariate Software.
Bentler, P. M. (2009). Alpha, dimension-free, and model-based internal consistency reliability. Psychometrika, 74, 137–143.
Bollen, K. A. (1989). Structural equations with latent variables. New York, NY: Wiley.
Bollen, K. A., & Lennox, R. (1991). Conventional wisdom on measurement: A structural equation perspective. Psychological Bulletin, 110, 305–314.
Brennan, R. L. (2001). An essay on the history and future of reliability from the perspective of replications. Journal of Educational Measurement, 38, 295–317.
Callender, J. C., & Osburn, H. G. (1979). An empirical comparison of coefficient alpha, Guttman's lambda2, and MSPLIT maximized split-half reliability estimates. Journal of Educational Measurement, 16, 89–99.
Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98–104.
Crocker, L., & Algina, J. (1986). Introduction to classical and modern test theory. New York, NY: Holt, Rinehart, & Winston.
Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297–334.
Cronbach, L. J., & Azuma, H. (1962). Internal-consistency reliability formulas applied to randomly sampled single-factor tests: An empirical comparison. Educational and Psychological Measurement, 22, 645–665.
Cronbach, L. J., & Shavelson, R. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391–418.
Cudeck, R. (1980). A comparative study of indices for internal consistency. Journal of Educational Measurement, 17, 117–130.
Dana, J., & Dawes, R. M. (2004). The superiority of simple alternatives to regression for social science predictions. Journal of Educational and Behavioral Statistics, 29, 317–331.
Davis, M. H. (1980). A multidimensional approach to individual differences in empathy. JSAS Catalog of Selected Documents in Psychology, 10, 85.
Dawes, R. M. (1979). The robust beauty of improper linear models in decision making. American Psychologist, 34, 571–582.
Fan, X., & Thompson, B. (2001). Confidence intervals about score reliability coefficients, please: An EPM guidelines editorial. Educational and Psychological Measurement, 61, 517–531.
Fresko, B., Kfir, D., & Nasser, F. (1997). Predicting teacher commitment. Teacher and Teacher Education, 13, 429–438.
Google Scholar. (2010). http://scholar.google.com. Retrieved on August 21, 2010.
Green, S. B., & Yang, Y. (2009). Commentary on coefficient alpha: A cautionary tale. Psychometrika, 74, 121–135.
Guttman, L. (1945). A basis for analyzing test–retest reliability. Psychometrika, 10, 255–282.
Hayashi, K., & Kamata, A. (2005). A note on the estimator of the alpha coefficient for standardized variables under normality. Psychometrika, 70, 579–586.
Henson, R. K. (2001). Understanding internal consistency reliability estimates: A conceptual primer on coefficient alpha. Measurement and Evaluation in Counseling and Development, 34, 177–189.
Jackson, P. H. (1979). A note on the relation between coefficient alpha and Guttman's split-half lower bounds. Psychometrika, 44, 251–252.
John, O. P., & Soto, C. J. (2007). The importance of being valid: Reliability and the process of construct validation. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 461–494). New York, NY: Guilford.
Kazdin, A. E. (1999). The meanings and measurement of clinical significance. Journal of Consulting and Clinical Psychology, 67, 332–339.
Kendall, P. C., Marrs-Garcia, A., Nath, S. R., & Sheldrick, R. C. (1999). Normative comparisons for the evaluation of clinical significance. Journal of Consulting and Clinical Psychology, 67, 285–299.
Li, H., Rosenthal, R., & Rubin, D. B. (1996). Reliability of measurement in psychology: From Spearman-Brown to maximal reliability. Psychological Methods, 1, 98–107.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
Novick, M. R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3, 1–18.
Novick, M. R., & Lewis, C. (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32, 1–13.
Nunnally, J. C. (1978). Psychometric theory. New York, NY: McGraw-Hill.
Nunnally, J. C., & Bernstein, I. (1994). Psychometric theory. New York, NY: McGraw-Hill.
Onwuegbuzie, A. J., & Daniel, L. G. (2002). A framework for reporting and interpreting internal consistency reliability estimates. Measurement and Evaluation in Counseling and Development, 35, 89–103.
Osburn, H. G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5, 343–355.
Paulhus, D. L., & Vazire, S. (2007). The self-report method. In R. W. Robins, R. C. Fraley, & R. F. Krueger (Eds.), Handbook of research methods in personality psychology (pp. 224–239). New York, NY: Guilford.
Raju, N. S. (1977). A generalization of coefficient alpha. Psychometrika, 42, 549–565.
Raykov, T. (1997a). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173–184.
Raykov, T. (1997b). Scale reliability, Cronbach's coefficient alpha, and violations of essential tau-equivalence with fixed congeneric components. Multivariate Behavioral Research, 32, 329–353.
Raykov, T. (1998a). Coefficient alpha and composite reliability with interrelated nonhomogeneous items. Applied Psychological Measurement, 22, 375–385.
Raykov, T. (1998b). A method for obtaining standard errors and confidence intervals of composite reliability for congeneric measures. Applied Psychological Measurement, 22, 369–374.
Raykov, T. (2001). Bias of coefficient α for fixed congeneric measures with correlated errors. Applied Psychological Measurement, 25, 69–76.
Raykov, T., & Shrout, P. E. (2002). Reliability of scales with general structure: Point and interval estimation using a structural equation modeling approach. Structural Equation Modeling, 9, 195–212.
Reuterberg, S. E., & Gustafsson, J. E. (1992). Confirmatory factor analysis and reliability: Testing measurement model assumptions. Educational and Psychological Measurement, 52, 795–811.
Revelle, W., & Zinbarg, R. E. (2009). Coefficients alpha, beta, omega and the GLB: Comments on Sijtsma. Psychometrika, 74, 145–154.
Rodriguez, M. C., & Maeda, Y. (2006). Meta-analysis of coefficient alpha. Psychological Methods, 11, 306–322.
Romano, J. L., Kromrey, J. D., & Hibbard, S. T. (2010). Confidence interval methods for coefficient alpha. Educational and Psychological Measurement, 70, 376–393.
Schmitt, N. (1996). Uses and abuses of coefficient alpha. Psychological Assessment, 8, 350–353.
Seber, G. A. F., & Lee, A. J. (2003). Linear regression analysis. Hoboken, NJ: Wiley.
Sijtsma, K. (2009a). On the use, the misuse, and the very limited usefulness of Cronbach's alpha. Psychometrika, 74, 107–120.
Sijtsma, K. (2009b). Reliability beyond theory and into practice. Psychometrika, 74, 169–173.
Steyer, R. (1989). Models of classical psychometric test theory as stochastic measurement models: Representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25–60.
Streiner, D. L. (2003a). Being inconsistent about consistency: When coefficient alpha does and doesn't matter. Journal of Personality Assessment, 80, 217–222.
Streiner, D. L. (2003b). Starting at the beginning: An introduction to coefficient alpha and internal consistency. Journal of Personality Assessment, 80, 99–103.
Stone, W. L., & Yoder, P. J. (2001). Predicting spoken language level in children with autism spectrum disorders. Autism, 5, 341–361.
Sun, W., Chou, C. P., Stacy, A. W., Ma, H., Unger, J., & Gallaher, P. (2007). SAS and SPSS macros to calculate standardized Cronbach's alpha using the upper bound of the phi coefficient for dichotomous items. Behavior Research Methods, 39, 71–81.
Traub, R. E., & Rowley, G. L. (1991). Understanding reliability. Educational Measurement: Issues and Practice, 10, 37–45.
Yang, Y., & Green, S. B. (2010). A note on structural equation modeling estimates of reliability. Structural Equation Modeling, 17, 66–81.
Zimmerman, D. W., Zumbo, B. D., & Lalonde, C. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational and Psychological Measurement, 53, 33–49.
Zinbarg, R. E., Revelle, W., Yovel, I., & Li, W. (2005). Cronbach's α, Revelle's β, and McDonald's ω_H: Their relations with each other and two alternative conceptualizations of reliability. Psychometrika, 70, 123–133.

APPENDIX

We sketch an outline of the proof that alpha is maximized when items' standard deviations are set to elements of the first eigenvector of the items' correlation matrix (see also Bentler, 1968). Let $\Sigma$ be the $k \times k$ covariance matrix and $D_\sigma$ be the diagonal matrix with variances of the variables on the diagonal:

$$
\Sigma = \begin{pmatrix} \sigma_1^2 & \cdots & \sigma_{1k} \\ \vdots & \ddots & \vdots \\ \sigma_{1k} & \cdots & \sigma_k^2 \end{pmatrix},
\qquad
D_\sigma = \begin{pmatrix} \sigma_1^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_k^2 \end{pmatrix}.
$$

The correlation matrix is given by $R = D_\sigma^{-1/2} \Sigma D_\sigma^{-1/2}$. Using matrix algebra, we can write a general expression for coefficient alpha in the form:

$$
\alpha_G = \frac{k}{k-1}\left(1 - \frac{l' D l}{l' D^{1/2} R D^{1/2} l}\right), \tag{A.1}
$$

where $D$ is some diagonal matrix and $l$ is the $k \times 1$ vector of all 1s, or the summing vector. This expression reduces to unstandardized alpha when the diagonal of $D$ contains the item variances, $D = D_\sigma$, and it reduces to standardized alpha when the diagonal of $D$ is all ones, $D = I$. The optimization problem is to find a diagonal matrix $D$ that will maximize the coefficient in Equation A.1, which can be rewritten as follows:

$$
\alpha_G = \frac{k}{k-1}\left(1 - \frac{w'w}{w'Rw}\right), \tag{A.2}
$$

where $w = D^{1/2} l$. That is, $\alpha_G$ is the alpha coefficient corresponding to a weighted composite, where $w$ is the vector of the weights. When $D = D_\sigma$, the weights are simply the standard deviations of the variables, or the square roots of each element of the diagonal of $\Sigma$. When $D = I$, the weights are all equal to 1, the square root of each diagonal element of $R$.

Maximizing Equation A.2 is the same as maximizing:

$$
\lambda = \frac{w'Rw}{w'w}. \tag{A.3}
$$

The maximum of the ratio of two quadratic forms is a well-known problem, and the maximum is achieved when $w$ is the eigenvector of $R$ corresponding to the largest eigenvalue (e.g., Seber & Lee, 2003). Nonetheless, we show this by taking the derivative of Equation A.3 and setting it to zero:

$$
\frac{\partial \lambda}{\partial w} = \frac{2Rw(w'w) - 2w(w'Rw)}{(w'w)^2} = 0,
$$

which is equivalent to solving

$$
\left(R - \frac{w'Rw}{w'w}\, I\right) w = 0.
$$

This eigen-equation holds whenever $w$ is an eigenvector of $R$. Assuming $R$ is full rank, as would be expected in most applications, there are $k$ solutions. Arranging the eigenvalues of $R$ in decreasing order, $\lambda_1 \geq \cdots \geq \lambda_k$, the values of Equation A.2 at these local extrema are:

$$
\alpha_{G,i} = \frac{k}{k-1}\left(1 - \frac{w'w}{\lambda_i\, w'w}\right) = \frac{k}{k-1}\left(1 - \frac{1}{\lambda_i}\right). \tag{A.4}
$$

Clearly, $\alpha_{G,1}$ is the maximum. In other words, coefficient alpha is maximized when the weights of the composite are set to be elements of the first eigenvector of $R$. We note that if the elements of $R$ are all positive, as would be expected from any correlation matrix for which computing an estimate of internal consistency reliability is meaningful, the first eigenvector will contain all-positive elements and the optimal weights will be positive. Finally, these optimal weights, $w$, are the principal component loadings corresponding to the first principal component (standardized to have variance 1) computed on the matrix of z scores. That is, if $z_i$ is the $k \times 1$ vector of z scores for person $i$, then each person's score on the first principal component is given by $p_i = w'z_i$. In other words, a composite with maximal alpha can be formed by performing a principal component analysis, standardizing the test's items, and then using the loadings from the first principal component as each standardized item's weight in computing the new composite.
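The result in Equation A.4 can be checked numerically. The sketch below is an illustration only and makes several assumptions of its own: it uses the rounded correlation matrix from the article's last example, finds the first eigenvector by simple power iteration (a swapped-in numerical method; the Appendix itself argues analytically), and uses hypothetical helper names (`matvec`, `first_eigen`, `weighted_alpha`). It compares alpha for unit weights, which is standardized alpha, with alpha for weights equal to the first eigenvector.

```python
def matvec(m, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def first_eigen(r, iters=200):
    """Largest eigenvalue and its unit-length eigenvector via power iteration.

    Assumes r is symmetric with positive entries, so starting from a vector
    of ones converges to the all-positive first eigenvector.
    """
    w = [1.0] * len(r)
    for _ in range(iters):
        rw = matvec(r, w)
        norm = sum(x * x for x in rw) ** 0.5
        w = [x / norm for x in rw]
    lam = sum(a * b for a, b in zip(w, matvec(r, w)))  # Rayleigh quotient, w'w = 1
    return lam, w

def weighted_alpha(r, w):
    """Equation A.2: alpha of the composite whose item weights are w."""
    k = len(r)
    ww = sum(x * x for x in w)
    wrw = sum(a * b for a, b in zip(w, matvec(r, w)))
    return k / (k - 1) * (1 - ww / wrw)

# Rounded correlation matrix from the last example in the text.
R = [[1.0, 0.9, 0.1, 0.1],
     [0.9, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.01],
     [0.1, 0.1, 0.01, 1.0]]

lam1, w1 = first_eigen(R)
alpha_std = weighted_alpha(R, [1.0] * 4)  # unit weights: standardized alpha
alpha_max = weighted_alpha(R, w1)         # weights from the first eigenvector

print(round(alpha_std, 3), round(alpha_max, 3), round(lam1, 3))
```

Under these assumptions, `alpha_max` should match $\frac{k}{k-1}(1 - 1/\lambda_1)$ and should be at least as large as `alpha_std`, which is the inequality the Appendix establishes for any correlation matrix.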
