
Factor Analysis for Categorical Data
Author(s): D. J. Bartholomew
Source: Journal of the Royal Statistical Society. Series B (Methodological), Vol. 42, No. 3 (1980), pp. 293-321
Published by: Wiley-Blackwell for the Royal Statistical Society
Stable URL: http://www.jstor.org/stable/2985165
Accessed: 30/10/2012 08:49

J. R. Statist. Soc. B (1980), 42, No. 3, pp. 293-321

Factor Analysis for Categorical Data


By D. J. BARTHOLOMEW
London School of Economics and Political Science

[Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, May 21st, 1980, Professor P. WHITTLE in the Chair]

SUMMARY

The method of factor analysis is widely used as an exploratory tool to reduce the dimensionality of multivariate data. The fact that the standard model is strictly applicable only when the manifest variables are scaled is a serious limitation in social science, where the variables are often categorical. In this paper we aim to provide a theoretical framework within which methods for the factor analysis of categorical data can be devised and compared. Discussion is restricted to the case of ordered categories where the latent variables are continuous. It is argued that the choice of model should be made from a restricted set which includes two existing models as special cases. A new method is proposed together with a simple approximate technique of fitting for the one-factor model. The paper concludes with an evaluation of existing methods and makes some suggestions about the direction which future research should take.

Keywords: FACTOR ANALYSIS; LATENT STRUCTURE ANALYSIS; MULTIVARIATE ANALYSIS; CATEGORICAL DATA; MULTI-DIMENSIONAL CONTINGENCY TABLES; DATA REDUCTION; SCALING; ORDINAL DATA

1. THE BACKGROUND

AN important object of much multivariate analysis is to reduce the dimensionality of the data. This is particularly desirable in the exploratory stages of an investigation, both to provide an intelligible summary and to suggest fruitful lines for model building. When the variables are continuous and measured on a common scale, principal component analysis often serves this purpose. Factor analysis achieves much the same end by setting up a model in which the observed variables are related to a smaller set of latent variables and to an "error". Neither of these methods is directly applicable to categorical variables, yet the need for data reduction in such circumstances is no less pressing. This is particularly true in the social sciences, where much of the data arising is categorical. The aim of this paper is to provide a framework for the development of methods for use when all the variables are measured on an ordered categorical scale. In the process we shall show how earlier approaches for dichotomous variables arise as special cases of our general formulation.

We assume that we have a simple random sample of size N whose members are cross-classified on p categorical ordered variables. Ordering is very common, but it can always be achieved, at the loss of some information, by reducing each dimension to a dichotomy. The sample data can be set out in a multi-way contingency table whose cell frequencies have a joint multinomial distribution. There is no requirement that any marginal frequencies be fixed. Our aim is to determine whether the p-variate representation of the original contingency table can be replaced, without significant loss of information, by one in a smaller number of dimensions. We shall argue that the theoretical framework for this to be done already exists in latent structure analysis, of which normal factor analysis is a special case. Latent structure analysis has received little attention from statisticians. It appears to have originated with Lazarsfeld, a sociologist, and is expounded in Lazarsfeld and Henry (1968). A more recent discussion of some aspects from a statistical point of view is contained
in Goodman (1978) and a useful introduction is provided by Fielding (1977). Sociologists have used latent structure analysis as a tool for investigating and measuring attributes. The raw data in such cases consist of the responses by individuals to questions designed to elicit the attitude under investigation. The interest then lies in whether the multidimensional responses are consistent with the existence of one (or more) underlying attitude scales. Similar problems are encountered by psychologists concerned with abilities rather than attitudes. A key statistical reference is Lord and Novick (1968). Some of this work shares the theoretical foundation of latent structure analysis; Lazarsfeld and Henry (1968) point out that Guttman scaling arises as a special case of their model. Although these two fields of application share a common mathematical structure, their ways of proceeding are somewhat different. When scaling abilities one starts from the supposition that an ability (such as general intelligence) exists and is capable of being measured on a numerical scale. Response variables are therefore selected which are believed to be related to the underlying dimension of ability. Similar conditions may also apply with attitudes, but in sociological enquiries it is more usual to proceed in the reverse direction. That is, the responses are given and the aim is to discover whether there is evidence of one or more underlying dimensions which could account for the response pattern. This is the path usually followed in principal component analysis and exploratory factor analysis, and is the one we shall adopt here.

Latent variable models have not found wide favour with statisticians. Partly this is because of the tedious and somewhat arbitrary methods used for fitting the models. This difficulty has been largely overcome by using computers to implement more efficient estimation procedures. More serious have been doubts about the apparent subjectiveness and arbitrariness of the methods used for interpreting the results. Possibly this is because the substantive and technical aspects are so closely intertwined that only an expert in the applied field can deploy the methods effectively. Latent structure models have certainly arisen to meet real needs in psychology and sociology, and there is a growing interest among economists (see, for example, Aigner and Goldberger, 1977). I would argue that such models are implicit in most qualitative analyses of social phenomena and that it is the business of statisticians to make their nature explicit and, as far as possible, quantitative. Attention by statisticians is therefore overdue, but in the present paper our aim is more modest. It is to take the relatively neglected area of categorical data and consider how best to carry out what may reasonably be described as factor analysis.

This is not the first attempt to treat the factor analysis of categorical data. Some earlier attempts have aimed, by some means or other, to bring the problem within the framework of the standard normal theory common factor model. Muthen (1978) is the latest in a group of papers, including Bock and Lieberman (1970) and Christofferson (1975), which deal with the case where all variables are dichotomous. In essence, they do this by supposing that the $2^p$ contingency table arises from a p-variate multinormal distribution by grouping each dimension into two categories. The underlying variables are then assumed to have the linear structure of the factor model. This approach is very closely related to that of Lazarsfeld and Henry (1968) and Lord and Novick (1968). All of these models arise as special cases of our general approach. McDonald (1969) proposed a method for analysing multi-category data and he also reviewed much of the earlier work. Like us, he made the latent structure model his starting point. McDonald's method does not utilize the ordering of the categories; neither does that of Bock (1972), which is based on a multivariate logistic model.

In this paper the aim has been to approach the problem from first principles, and the emphasis is on fundamentals rather than computational techniques. Much more remains to be done, especially on the computational side, but the results achieved so far are encouraging.

2. THE MATHEMATICAL FRAMEWORK

2.1. Terminology and Notation

We begin by setting the problem in its general context and then introduce categorical data as a special case. The variables which we observe will be called manifest and are denoted
by $x = (x_1, x_2, \ldots, x_p)^T$. A latent structure model supposes these variables to be related to a set of q unobservable latent variables denoted by $y = (y_1, y_2, \ldots, y_q)^T$. For the model to be practically useful, q needs to be much smaller than p. The relationship between x and y is stochastic and may be expressed by a conditional probability function $\pi(x \mid y)$, being the distribution of x given y. This will be a density or probability mass function according as x is continuous or categorical. The problem is to infer something about y from the observed values of x. Let $p(y)$ denote the joint distribution of the y's and $f(x)$ that of the x's; then the two are related by
$$f(x) = \int_R p(y)\,\pi(x \mid y)\,dy, \qquad (1)$$

where R is the range space of the latent variables. After x has been observed our knowledge about y is given by

$$p(y \mid x) = p(y)\,\pi(x \mid y)/f(x). \qquad (2)$$

The data reduction we are seeking is thus achieved from the fact that the distribution of y is of smaller dimension than that of x. In practice we may well be content with some suitable summary measure of the conditional distribution of y, such as $E(y \mid x)$.

In the case of the multi-way contingency table, x will identify a cell of the table and f(x) will be its multinomial probability. We shall label the categories along each dimension by 0, 1, 2, ..., 0 being the "lowest" level, 1 the next and so on. Thus, for example, the designation (0, 2, 1, 3) refers to the cell where the first variable is at level 0, the second at level 2, the third at level 1 and the fourth at level 3. A distinctive feature of our model is that the latent variables are continuous; $p(y)$ and $p(y \mid x)$ are thus densities. This choice is based on the fact that most latent variables which arise in social science discourse are thought of as being continuous. For example, quality of life, standard of living, political hue and aggressiveness are all regarded in this way. The case where the latent variables are better treated as categorical may be handled by latent class analysis, for which see Goodman (1978).

2.2. Assumptions

Little progress can be made without some assumptions about the various functions which we have defined. The first assumption that we make is that the y's are independent, that is

$$p(y) = \prod_{i=1}^{q} p(y_i).$$

There is no completely compelling reason for this assumption, but it makes the analysis easier to carry out and interpret. It therefore seems reasonable to adopt it until practical considerations dictate otherwise. The second assumption is about the form of $p(y_i)$. We shall argue below that this distribution is essentially arbitrary and that the choice may be made to suit our convenience. For this reason we have made it uniform on (0, 1). The justification for this assertion requires us to look more closely at the nature of a latent variable. There seem to be two distinct cases, as follows:

(a) The latent variable may be "real" in the sense that it could, in principle, be measured directly. An example would be some sensitive quantity like personal wealth. To avoid asking the direct question we might ask a battery of questions about possessions and life-style in the hope that they might enable us to identify the underlying scale. The distribution of the variable, wealth, is certainly not arbitrary in this case and it would be quite inappropriate to assume that it was uniform. Such cases seem quite rare. More commonly we have the second case.

(b) The latent variable is not "real", meaning that it could not be measured directly, even in principle. It is a mental construct used to facilitate economy of thought. Attitudes and abilities largely come into this category.
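The claim that $p(y_i)$ may be taken to be uniform on (0, 1) rests on a standard fact, the probability integral transform: if a continuous latent variable z has distribution function F on whatever scale it was first conceived, then y = F(z) is an order-preserving rescaling and is uniform on (0, 1). A minimal Python sketch, assuming purely for illustration that z is standard normal on its original scale; the function names are hypothetical:

```python
import math
import random


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def to_uniform_scale(values):
    """Rescale latent values drawn from N(0, 1) by y = Phi(z).

    Phi is increasing, so the ordering of individuals is unchanged, while the
    rescaled values are uniformly distributed on (0, 1)."""
    return [normal_cdf(z) for z in values]


random.seed(1)
z = [random.gauss(0.0, 1.0) for _ in range(50_000)]
y = to_uniform_scale(z)

# Order is preserved: transforming the sorted z's gives a non-decreasing list.
y_sorted = to_uniform_scale(sorted(z))

# Each tenth of (0, 1) should receive roughly a tenth of the rescaled values.
deciles = [sum(1 for v in y if k / 10 <= v < (k + 1) / 10) / len(y) for k in range(10)]
print([round(d, 3) for d in deciles])
```

Any other increasing map onto (0, 1) would serve equally well for ordinal purposes; the uniform target is simply the convenient one.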

Since there is no "natural" scale in such cases we are at liberty to construct one to suit our convenience. Since ordering is the highest level of measurement available in the manifest variables, it seems reasonable to ask for no more than an ordinal level of measurement of the latent variables also. Such a scale is arbitrary to the extent that any monotonic transformation of the chosen scale would serve equally well. Thus whatever the distribution of the latent variable on a chosen scale, it can always be given a desired distribution, such as the uniform, by an appropriate monotonic change of scale.

The remaining function to be specified is $\pi(x \mid y)$. We make two assumptions about this. The crucial assumption, which is fundamental to the rationale of the method, is that of conditional independence. We assume that
$$\pi(x \mid y) = \prod_{i=1}^{p} \pi_i(x_i \mid y). \qquad (3)$$

This means that the observed dependence among the x's is wholly explained by their dependence on the y's. Eliminating variation in the latter removes the inter-dependence of the x's. In that sense the association among the x's is fully explained by their dependence on the latent variables. This gives formal expression to the hypothesis that the observed variables are describable in terms of a smaller number of latent dimensions. If (3) were not true it would imply that there was some other latent variable exerting a common influence on the x's. Under the assumptions made so far, (1) becomes
$$f(x) = \int_0^1 \cdots \int_0^1 \prod_{i=1}^{p} \pi_i(x_i \mid y)\,dy = E \prod_{i=1}^{p} \pi_i(x_i \mid y). \qquad (4)$$
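For a $2^p$ table with a single latent variable, (4) together with the dichotomous form (5) given in Section 3 reduces every cell probability to a one-dimensional integral over y. The sketch below evaluates these integrals with a midpoint rule, assuming q = 1 and y uniform on (0, 1); the function name, the grid size and the three response functions in the usage lines are arbitrary illustrations, not fitted values.

```python
import itertools


def cell_probabilities(response_fns, n_grid=2000):
    """Return {x: f(x)} for every cell x in {0,1}^p, using (4) with q = 1.

    response_fns[i] is a callable giving pi_i(y), the probability of a positive
    response on variable i at latent position y. Conditional independence (3)
    lets the integrand factorise into one term per variable. The integral over
    (0, 1) is approximated by the midpoint rule on n_grid points."""
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    p = len(response_fns)
    probs = {}
    for x in itertools.product((0, 1), repeat=p):
        total = 0.0
        for y in grid:
            term = 1.0
            for xi, pi_fn in zip(x, response_fns):
                prob = pi_fn(y)
                term *= prob if xi == 1 else 1.0 - prob
            total += term
        probs[x] = total / n_grid
    return probs


# Three illustrative response functions, all increasing in y.
fns = [lambda y: 0.2 + 0.6 * y, lambda y: 0.25 + 0.5 * y, lambda y: 0.1 + 0.3 * y]
f = cell_probabilities(fns)
print(round(sum(f.values()), 10))  # 1.0
```

With constant response functions the integrand no longer depends on y and the cell probabilities reduce to products of the marginals, which is the complete-independence case.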

The choice of the form of the response function $\pi_i(x_i \mid y)$, sometimes called the trace function, is the final step in the specification of the model.

3. THE CHOICE OF RESPONSE FUNCTION

For the application to contingency tables we shall suppose, initially, that each variable is dichotomous. This requirement will be relaxed in Section 7. In this case we may write

$$\pi_i(x_i \mid y) = \{\pi_i(y)\}^{x_i}\{1 - \pi_i(y)\}^{1 - x_i}, \qquad (x_i = 0, 1); \qquad (5)$$

$\pi_i(y)$ is thus the conditional probability of a response at the "upper" level on the ith manifest variable (also spoken of as a positive response).

We begin by listing some properties which the function $\pi_i(y)$ should possess and then consider whether functions exist which meet the specification. Let $\mathcal{F}$ denote the family of acceptable functions; then we claim that $\mathcal{F}$ should possess the following properties:

(i) $0 \le \pi(y) \le 1$ for $0 \le y_i \le 1$ $(i = 1, 2, \ldots, q)$, for all $\pi(y) \in \mathcal{F}$.
(ii) $\pi(y)$ should be monotonic non-increasing or non-decreasing in each argument.
(iii) If $\pi(y) \in \mathcal{F}$, then the function obtained by replacing any sub-set of the elements of y by their complements should also belong to $\mathcal{F}$.
(iv) If $\pi(y) \in \mathcal{F}$, then $1 - \pi(y) \in \mathcal{F}$.
(v) $\pi(y) = \text{constant} \in \mathcal{F}$.
(vi) If q = 1, the function with $\pi(y) = 1$ for $y_0 < y \le 1$ and $\pi(y) = 0$ for $0 \le y \le y_0$ belongs to $\mathcal{F}$ for all $y_0$ $(0 < y_0 < 1)$.

Less formally we may add

(vii) $\pi(y)$ should be capable of describing a wide variety of shapes.
(viii) The number of parameters in $\pi(y)$ should be small.
(ix) The expectations of (4) should be easy to evaluate.

Condition (i) merely reflects the fact that $\pi(y)$ is a probability; ideally, we would not wish to impose artificial constraints on the parameters to ensure this. Condition (ii) is not essential but
most latent variables considered in practice are such that the probability of response increases, or decreases, in step with changes in the variable. Condition (iii) reflects the arbitrariness of the direction in which the latent variable is measured. For example, we can equally well measure the political spectrum from left to right or vice versa. Condition (iv) is important and arises from the arbitrariness of the ordering in the direction of the categories. Answering "yes" to a question is the same as answering "no" to its negation. This condition ensures that the outcome of the analysis does not depend on which choice we make. Conditions (v) and (vi) ensure that two special cases are included: (v) is the case of complete independence, when no reduction in dimensionality is possible; (vi), which is less important, is the case of a perfect scale (Guttman scale). All those with y above $y_0$ respond positively and all those below negatively.

There is no difficulty about finding functions which satisfy (i) to (vi), but most do not meet (vii) to (ix), and there is a natural conflict between (vii) and (viii). It is not, of course, possible to express $\pi_i(y)$ as a linear function of its parameters without violating (i). Instead we shall consider the class of functions given by

$$G\{\pi_i(y)\} = \alpha_{i0} + \sum_{j=1}^{q} \alpha_{ij} H(y_j), \qquad (i = 1, 2, \ldots, p). \qquad (6)$$

The coefficients $\{\alpha_{ij}\}$ can thus be interpreted in the usual way as factor loadings. We must select the functions G and H to meet the conditions set out above. If we choose H so that $H(y_j) = -H(1 - y_j)$, condition (iii) is satisfied, the effect being to change the sign of the coefficient of $y_j$ when the direction of measurement of $y_j$ is changed. Conditions (i) and (iv) imply that $G^{-1}$ must have the form of the distribution function of a random variable distributed symmetrically about zero. This is equivalent to requiring that $G(v) = -G(1 - v)$, so that both G and H must be selected from the same class of functions.

In practice the choice is very limited, the commonly used functions being the logit ($\operatorname{logit} v = \log\{v/(1 - v)\}$) and the probit ($\operatorname{probit} v = \Phi^{-1}(v)$, where $\Phi$ is the standard normal distribution function). Another possibility is the inverse Cauchy distribution function. The simplest choice is $H(v) = G(v) = v - \tfrac{1}{2}$, but this violates condition (i). The complementary log log function is ruled out by conditions (iv) and (iii). Lord and Novick (1968), who discussed the case q = 1, used the logit for G and the probit for H. Bock and Lieberman (1970) used the probit in both cases. We shall argue that there are good reasons for preferring the logit function for both G and H. Our choice may therefore be expressed as

$$\operatorname{logit} \pi_i(y) = \log\frac{\pi_i(y)}{1 - \pi_i(y)} = \log\frac{\pi_i}{1 - \pi_i} + \sum_{j=1}^{q} \alpha_{ij}\log\frac{y_j}{1 - y_j} \qquad (7a)$$

or

$$\pi_i(y) = \frac{\pi_i \prod_{j=1}^{q} y_j^{\alpha_{ij}}}{\pi_i \prod_{j=1}^{q} y_j^{\alpha_{ij}} + (1 - \pi_i)\prod_{j=1}^{q} (1 - y_j)^{\alpha_{ij}}} \qquad (7b)$$

$(0 < \pi_i < 1;\ -\infty < \alpha_{ij} < +\infty;\ i = 1, 2, \ldots, p,\ j = 1, 2, \ldots, q)$, so that $\alpha_{i0}$ of (6) is $\operatorname{logit} \pi_i$.

We shall refer to this as the logit model. If we transform to a new scale for the latent variables given by $u_j = \log\{y_j/(1 - y_j)\}$, then

$$\pi_i(u) = \frac{\pi_i}{\pi_i + (1 - \pi_i)\exp\left(-\sum_{j=1}^{q} \alpha_{ij} u_j\right)}. \qquad (7c)$$
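A direct transcription of (7b) and (7c) in Python may help fix ideas; it is a sketch with hypothetical function names, not code from this paper. The checks confirm that the two forms agree when $u_j = \log\{y_j/(1 - y_j)\}$, that $\pi_i(y) = \pi_i$ at the median point $y_j = \tfrac12$, that replacing $\pi_i$ by $1 - \pi_i$ and each $\alpha_{ij}$ by $-\alpha_{ij}$ gives $1 - \pi_i(y)$ (condition (iv)), and that $E(u^2)$ for uniform y is close to $\pi^2/3 \approx 3.2899$. The midpoint rule used for that last integral is an arbitrary choice.

```python
import math
from typing import Sequence


def response_7b(y: Sequence[float], pi_i: float, alpha_i: Sequence[float]) -> float:
    """pi_i(y) from (7b); requires 0 < y_j < 1 and 0 < pi_i < 1."""
    num = pi_i
    den = 1.0 - pi_i
    for yj, a in zip(y, alpha_i):
        num *= yj ** a
        den *= (1.0 - yj) ** a
    return num / (num + den)


def response_7c(u: Sequence[float], pi_i: float, alpha_i: Sequence[float]) -> float:
    """pi_i(u) from (7c), with u_j = log{y_j / (1 - y_j)}."""
    s = sum(a * uj for uj, a in zip(u, alpha_i))
    return pi_i / (pi_i + (1.0 - pi_i) * math.exp(-s))


y = [0.2, 0.7]
alpha = [1.5, -0.8]
u = [math.log(v / (1.0 - v)) for v in y]
print(response_7b(y, 0.3, alpha), response_7c(u, 0.3, alpha))  # the two forms agree
print(response_7b([0.5, 0.5], 0.3, alpha))  # equals pi_i = 0.3 up to rounding
print(1 - response_7b(y, 0.3, alpha), response_7b(y, 0.7, [-a for a in alpha]))  # condition (iv)

# E(u^2) for y uniform on (0, 1), by the midpoint rule; the exact value is pi^2/3.
n = 200_000
second_moment = sum(math.log(v / (1 - v)) ** 2 for v in ((k + 0.5) / n for k in range(n))) / n
print(second_moment, math.pi ** 2 / 3)
```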

If the y's are uniform, the u's have a logistic distribution with mean 0 and variance $\sigma^2 = \pi^2/3$ (here $\pi$ denotes the constant 3.14159..., not a response probability). This transformation emphasizes the point already made about the arbitrariness in the choice of $p(y)$ and $\pi(y)$; the choice is, in fact, a joint one. The model is incapable of distinguishing between
having $\pi_i(y)$ as in (7b) with $y_j$ uniform and having $\pi_i(u)$ as in (7c) with $u_j$ logistic. Any attempt to interpret particular forms of the function in physical terms is therefore incapable of receiving empirical support. It may easily be verified that the functions given by (7b) satisfy conditions (i) to (v), but (vi) only holds for $y_0 = \tfrac{1}{2}$, in the limit as $\alpha \to \infty$. The parameter $\pi_i$ has a direct and useful interpretation. It is the value of $\pi_i(y)$ when $y_j = \tfrac{1}{2}$ for all j, and hence it is the probability of a positive response for an individual at the median position on each latent dimension. In that sense, therefore, it is the typical probability of response on the ith dimension.

4. EXPECTED FREQUENCIES, y-SCORES AND THE FIT OF THE MODEL

In order to fit the model we have to evaluate the expectations given by (4). For the $2^p$ table this involves the determination of integrals of the form

$$\int_0^1 \cdots \int_0^1 \prod_{i:\,x_i = 1} \pi_i(y) \prod_{j:\,x_j = 0} \{1 - \pi_j(y)\}\,dy, \qquad (8)$$

where the integrand contains p factors, one for each manifest variable. Explicit expressions can be found when q = 1 for α = 1 and 2. However, it is a straightforward matter to evaluate (8) numerically, and this is the method we have used in the examples which follow. The computational time increases rapidly with q.

As in factor analysis we may wish to go further and derive the analogue of "factor scores". This may be thought of as finding a mapping of the cells of the contingency table into q-dimensional Euclidean space. We approach this problem via the conditional distribution of y given x. For the $2^p$ table this has the form
$$p(y \mid x) = \prod_{i=1}^{p} \{\pi_i(y)\}^{x_i}\{1 - \pi_i(y)\}^{1 - x_i} \Big/ f(x). \qquad (9)$$

This tells us how y is distributed given x. No single value of y is associated with a given x, but we may reasonably take some measure of location of the distribution as a typical value of y for that x. Since y is uniformly distributed, the element $y_r$ in y may be interpreted as the quantile of the distribution on the rth latent dimension on which the individual stands. The expectation $E(y_r \mid x)$ is thus the expected proportion of the population "below" an individual with manifest vector x. We shall define this to be the y-score of the individual on dimension r. It requires the evaluation of integrals of the form

$$\int_0^1 \cdots \int_0^1 y_r \prod_{i:\,x_i = 1} \pi_i(y) \prod_{j:\,x_j = 0} \{1 - \pi_j(y)\}\,dy, \qquad (10)$$

which may be obtained numerically.

Having fitted the model it is useful to have some measure of how successful we have been. In principal component analysis we do this by computing the proportion of the total variation which is accounted for by each of the components. In a multi-way contingency table a similar measure may be based on measures of goodness of fit. This could be chi-squared, but we have preferred to use
$$A = 2\sum_i O_i \ln(O_i/E_i) \qquad (11)$$

because it is a linear transform of the log-likelihood; $O_i$ and $E_i$ are the observed and expected frequencies, and the summation is taken over all cells of the table. As a base-line for comparison we take the value of A when the E's are calculated on the assumption of complete independence. This is a measure of the total departure from independence which we hope the model will explain; denote it by $A_0$. Let $A_q$ be the same quantity when the E's are those obtained by fitting a model with q latent variables; then the ratio $(A_0 - A_q)/A_0$ is a measure of how much of the original departure from independence is accounted for by the fitted model. If the model parameters
are fitted by an efficient method, A will have, approximately, a $\chi^2$ distribution with $2^p - (\text{number of parameters}) - 1$ degrees of freedom, and the goodness of fit may be judged by this means.

5. SOME BASIC RESULTS

Since we wish to reduce the dimensionality of the data as much as possible, a natural way to proceed is to take values of q in order, beginning with q = 1, and stop as soon as a good enough fit is obtained. However, increasing q increases the number of parameters to be fitted, and there comes a point where the model is under-identified. As the literature on factor analysis testifies, the question of identifiability has subtle features, and the same is true of our model. The case q = 1 presents the fewest problems. It is useful, therefore, to have the following theorem, which helps to show whether a one-factor model has any prospect of fitting the data.

Let us denote by $R_{ij}$ the cross-product ratio formed from the expected frequencies when the table is collapsed over all dimensions except i and j. That is

$$R_{ij} = \frac{E\{\pi_i(y)\pi_j(y)\}\,E[\{1 - \pi_i(y)\}\{1 - \pi_j(y)\}]}{E[\pi_i(y)\{1 - \pi_j(y)\}]\,E[\pi_j(y)\{1 - \pi_i(y)\}]}. \qquad (12)$$

($R_{ij}$ will play a key role later on.)

Theorem 1. If $\pi_i(y)$ and $\pi_j(y)$ are both monotonic non-increasing or both monotonic non-decreasing, then $R_{ij} - 1 \ge 0$; otherwise $R_{ij} - 1 \le 0$, with equality only if at least one of $\pi_i(y)$ and $\pi_j(y)$ is constant.
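Proof.

$$R_{ij} - 1 = \frac{E\pi_i(y)\pi_j(y) - E\pi_i(y)\,E\pi_j(y)}{E[\pi_i(y)\{1 - \pi_j(y)\}]\,E[\pi_j(y)\{1 - \pi_i(y)\}]},$$

hence the sign of $R_{ij} - 1$ is the same as that of $d_{ij} = E\pi_i(y)\pi_j(y) - E\pi_i(y)\,E\pi_j(y)$. Now

$$d_{ij} = \int_0^1 \pi_i(y)\{\pi_j(y) - E\pi_j(y)\}\,dy.$$

Suppose first that $\pi_j(y)$ is monotonic non-decreasing; then we can find $y = y^*$ such that $\pi_j(y) \ge E\pi_j(y)$ for $y \ge y^*$ and $\pi_j(y) < E\pi_j(y)$ for $y < y^*$, so that $d_{ij}$ may be written

$$d_{ij} = \int_0^{y^*} \pi_i(y)\{\pi_j(y) - E\pi_j(y)\}\,dy + \int_{y^*}^1 \pi_i(y)\{\pi_j(y) - E\pi_j(y)\}\,dy.$$

If $\pi_i(y)$ is also monotonic non-decreasing, then on $(0, y^*)$ the bracket is negative and $\pi_i(y) \le \pi_i(y^*)$, while on $(y^*, 1)$ the bracket is non-negative and $\pi_i(y) \ge \pi_i(y^*)$, so that

$$d_{ij} \ge \pi_i(y^*)\int_0^{y^*} \{\pi_j(y) - E\pi_j(y)\}\,dy + \pi_i(y^*)\int_{y^*}^1 \{\pi_j(y) - E\pi_j(y)\}\,dy = 0.$$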

If both functions are monotonic non-increasing, a similar argument leads to the same conclusion; otherwise the inequality is reversed. Equality obviously occurs only when one or both functions are constant.

The practical relevance of this theorem is as follows. If we reverse the order of the categories on dimension i, the probability of a positive response will be $1 - \pi_i(y)$ instead of $\pi_i(y)$. If $\pi_i(y)$ was formerly decreasing, the corresponding probability for that dimension with the categories reversed will be increasing. In the one-factor case, therefore, it must be possible to reverse the category order on some of the dimensions so that the response functions either all increase or all decrease. If this is done, all the $(R_{ij} - 1)$'s will be positive. For a one-factor model to be appropriate it is thus necessary (but not sufficient) that an ordering of the categories on the manifest dimensions exists such that all the cross-product ratios are greater than one. Suppose, for example, that a table with p = 4 has $R_{12} > 1$; $R_{13}, R_{14}, R_{23}, R_{24} < 1$; $R_{34} > 1$. Reversing the categories on dimension i replaces $R_{ij}$ by $1/R_{ij}$, and so changes the sign of $R_{ij} - 1$, for every j; a ratio is left unchanged if both of its dimensions are reversed. In this case reversing the order on dimensions 1 and 2 will produce a set of positive values. In practice, of course, the $R_{ij}$'s have to be estimated from the sample cross-product ratios, and so the signs cannot be determined with certainty. Nevertheless, if all or most of the $(R_{ij} - 1)$'s can be made positive, it is worth trying a one-factor model.
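This check can be carried out mechanically for a $2^p$ table. In the sketch below (function names and data layout are our own, hypothetical choices) the sample cross-product ratio for each pair of variables is formed from the table collapsed over the other dimensions, and every subset of dimensions is then tried for reversal, using the rule that a ratio is inverted exactly when one, but not both, of its dimensions is reversed. The exhaustive search costs $2^p$ steps and is meant only for modest p; tables with empty collapsed cells would need some adjustment that is not shown here.

```python
import itertools


def sample_cross_product_ratios(counts, p):
    """counts maps each cell x (a tuple in {0,1}^p) to its observed frequency.

    Returns {(i, j): R_ij} for i < j, where R_ij = n11 n00 / (n10 n01) is the
    cross-product ratio of the table collapsed over all other dimensions.
    Assumes no collapsed 2 x 2 cell is zero."""
    ratios = {}
    for i, j in itertools.combinations(range(p), 2):
        n = [[0, 0], [0, 0]]
        for x, c in counts.items():
            n[x[i]][x[j]] += c
        ratios[(i, j)] = n[1][1] * n[0][0] / (n[1][0] * n[0][1])
    return ratios


def best_reversal(ratios, p):
    """Try every subset of dimensions to reverse; return (count, subset) for
    the first subset that makes the largest number of ratios exceed one."""
    best = (-1, ())
    for mask in range(2 ** p):
        flipped = {k for k in range(p) if mask >> k & 1}
        above_one = 0
        for (i, j), r in ratios.items():
            inverted = (i in flipped) != (j in flipped)
            if (1.0 / r if inverted else r) > 1.0:
                above_one += 1
        if above_one > best[0]:
            best = (above_one, tuple(sorted(flipped)))
    return best


# The p = 4 pattern from the text (dimensions numbered from 0 here):
# R12 > 1; R13, R14, R23, R24 < 1; R34 > 1. The values are illustrative only.
R = {(0, 1): 1.8, (0, 2): 0.6, (0, 3): 0.7, (1, 2): 0.5, (1, 3): 0.8, (2, 3): 2.0}
print(best_reversal(R, 4))  # (6, (0, 1)): reverse the first two dimensions
```

Reversing dimensions 3 and 4 instead would serve equally well, since reversing every dimension leaves all ratios unchanged; the search simply reports the first subset it finds.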

The following theorem holds for all members of the chosen family of response functions. It provides the basis for the approach to estimation proposed in the next Section, and it also serves to display a feature which links various methods of factor analysis together.

Theorem 2.

$$E\pi_i(y)\pi_j(y) - E\pi_i(y)\,E\pi_j(y) = \tau^2\,(G^{-1})'(\alpha_{i0})\,(G^{-1})'(\alpha_{j0}) \sum_{k=1}^{q} \alpha_{ik}\alpha_{jk} + \text{terms of the 4th degree in the } \alpha\text{'s},$$

where $\tau^2 = EH^2(y_j)$ $(i, j = 1, 2, \ldots, p;\ i \ne j)$.

The proof is based on a straightforward Taylor expansion of the response function and term by term integration. The left-hand side is the predicted covariance between $x_i$ and $x_j$, and the theorem shows that this has a simple form if the departure from complete independence, as measured by the α's, is small. The covariances in the normal theory factor model have the form $\sum_{k=1}^{q} \alpha_{ik}\alpha_{jk}$, and this suggests that we should look for sample functions which are estimates of the quantities

$$\frac{E\pi_i(y)\pi_j(y) - E\pi_i(y)\,E\pi_j(y)}{\tau^2\,(G^{-1})'(\alpha_{i0})\,(G^{-1})'(\alpha_{j0})}. \qquad (13)$$

If such functions can be found, they would have the same structure (for small α's) as in the normal case, and hence known methods of estimation could be used. If we take the logit form for G and H, it turns out that

$$\frac{R_{ij} - 1}{\sigma^2} = \sum_{k=1}^{q} \alpha_{ik}\alpha_{jk} + \text{terms of 4th degree}, \qquad (14)$$

where $\sigma^2 = E\log^2\{y/(1 - y)\} = \pi^2/3 = 3.289868$. The sample cross-product ratio can be used to estimate $R_{ij}$ and hence the α's via (14). Since cross-product ratios are the "natural" measures of association in $2^p$ contingency tables, it is satisfying to find them arising in this context, thus supporting the choice of the logit function. If the probit functions are chosen, then (13) is a first approximation to the tetrachoric correlation coefficients, which may be taken as partly justifying the heuristic method which carries out a normal factor analysis on these coefficients.
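Both Theorem 1 and the approximation (14) can be examined numerically for q = 1. The sketch below (hypothetical names; a midpoint rule chosen for simplicity) evaluates $R_{ij}$ of (12) under the logit model, written in the form (7c), and returns $(R_{ij} - 1)/\alpha_i\alpha_j\sigma^2$, which (14) says should be close to one when the α's are small. It is an illustrative check under these assumptions, not the computation behind Table 1.

```python
import math

SIGMA2 = math.pi ** 2 / 3  # sigma^2 = E log^2{y/(1-y)} for y uniform


def pi_logit(y, pi_i, alpha):
    """One-factor logit response function (7c), evaluated at u = logit(y)."""
    u = math.log(y / (1.0 - y))
    return pi_i / (pi_i + (1.0 - pi_i) * math.exp(-alpha * u))


def model_R(pi_i, pi_j, a_i, a_j, n=20_000):
    """R_ij of (12) for q = 1; the common factor 1/n cancels in the ratio."""
    e11 = e10 = e01 = e00 = 0.0
    for k in range(n):
        y = (k + 0.5) / n
        fi, fj = pi_logit(y, pi_i, a_i), pi_logit(y, pi_j, a_j)
        e11 += fi * fj
        e10 += fi * (1.0 - fj)
        e01 += (1.0 - fi) * fj
        e00 += (1.0 - fi) * (1.0 - fj)
    return e11 * e00 / (e10 * e01)


def approximation_ratio(pi_i, pi_j, a_i, a_j):
    """(R_ij - 1) / (alpha_i alpha_j sigma^2): close to 1 for small alphas by (14)."""
    return (model_R(pi_i, pi_j, a_i, a_j) - 1.0) / (a_i * a_j * SIGMA2)


# Theorem 1: same-direction response functions give R > 1, opposite give R < 1.
print(model_R(0.3, 0.6, 1.0, 0.5) > 1, model_R(0.3, 0.6, 1.0, -0.5) < 1)
for a in (0.1, 0.5, 1.0):
    print(a, round(approximation_ratio(0.5, 0.5, a, a), 3))
```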
6. FITTING THE ONE-FACTOR LOGIT MODEL TO THE $2^p$ TABLE

6.1. Methods

Since the logit and probit functions are very similar, one would expect both versions of the general model we have considered to give similar results and to involve about the same amount of calculation. The logit function is easier to compute than the probit, but this advantage is likely to be fairly marginal. Bock and Lieberman (1970) developed a maximum likelihood method for the probit model and illustrated it on two examples with p = 5 and q = 1. The method involved extensive numerical integration, and they suggested that it would not be feasible for p in excess of 10 or 12. Christofferson (1975) found a faster method using a least squares fit of $E\pi_i(y)$ and $E\pi_i(y)\pi_j(y)$ $(i, j = 1, 2, \ldots, p)$ to their sample estimates. Muthen (1978) has made further improvements on this method by substantially reducing the amount of numerical integration required. It appears from this work that little information is lost by using only the first and second order margins for estimation. Programs could be provided for the logit model using the same methods, and they would presumably involve similar amounts of computation. However, the logit model has an important property which often makes it possible to obtain a simple approximate solution when q = 1. This solution also offers a good starting point for an iterative procedure by which it can be improved. The basis of the method rests on the fact that the approximation given by (14) is remarkably good even when the α's are far beyond the range
within which they can be described as "small". Table 1 gives values of $(R_{ij} - 1)/\alpha_i\alpha_j\sigma^2$ for various combinations of $(\pi_i, \pi_j)$ and $(\alpha_i, \alpha_j)$. The approximation is to be judged by the closeness of the ratios to 1 (the second subscript on α has been dropped).
TABLE 1

Values of $(R_{ij} - 1)/\alpha_i\alpha_j\sigma^2$. The entries are unchanged if $(\pi_i, \pi_j)$ is replaced by $(1 - \pi_i, 1 - \pi_j)$ and $(\alpha_i, \alpha_j)$ by $(\alpha_j, \alpha_i)$.
(7ri, 7ri) (ire (irI) (1)10) i) (20, 20)

1- icj)

(2,2) (2,1) (2, i) (1, 1) (1, i) (j, i) (j, 4)

-) (-h,

(41

i)

0.942 0.801 0.614 0.912 0.846 0.935 0.917 0.971 0.994

1.192 0.944 0.668 1.063 0.934 1.011 0.965 1.001 1.000

0.984 0.850 0.644 0.988 0.912 1.015 0.977 1.016 1.004

1.119 1.008 0.731 1.245 1.125 1.263 1.139 1.119 1.018

1.280 1.196 0.820 1.576 1.372 1.535 1.288 1.188 1.003

The worst cases occur when $\alpha_i$ and $\alpha_j$ are far apart and when $\pi_i$ and $\pi_j$ are small. Only positive values of $\alpha_i$ and $\alpha_j$ have been considered, for reasons which will emerge below. $R_{ij} - 1$ is not the only function of the expectations which has $\alpha_i\alpha_j\sigma^2$ as the first term of its expansion. The same is true, for example, of $\ln R_{ij}$. Unfortunately the approximation is much less good for this, and for other functions which we have investigated.

The basis of our method of estimation is to find estimates α and π such that (a) the cross-product ratios for the model are as close as possible to those observed and (b) the marginal proportions of the model and the data agree exactly. The α's are found iteratively, using the result of (14) as a starting point. We proceed by the following steps.

(1) Find a vector α such that $\alpha_i\alpha_j$ is as close as possible to the estimated values of $c_{ij} = (R_{ij} - 1)/\sigma^2$ for $i, j = 1, 2, \ldots, p$; $i \ne j$.
(2) Find π by equating $E\pi_i(y)$ $(i = 1, 2, \ldots, p)$ to the corresponding marginal proportion, using the vector α obtained in step 1.
(3) Improve the estimate of α by a method to be described.
(4) Re-estimate π using the improved α.
(5) Repeat the cycle until π and α (or A) converge.

If many cycles of the iteration are required, the amount of numerical integration required will prevent the method being used on a routine basis with present-day computers if p is, say, greater than 10. However, we shall show that the first approximation is often quite adequate for practical purposes. This can be obtained rapidly on a computer with no practical limit on p. The calculation of the expected frequencies from the estimated parameters does require numerical integration (with all methods), and this may take a considerable time for large p.

The method requires $p \ge 3$; if $p > 3$ it is not possible to reproduce the $c_{ij}$'s exactly, so we must find an α such that the distance between the $c_{ij}$'s and the $\alpha_i\alpha_j$'s is as small, in some sense, as possible. Precisely this problem arises in normal theory factor analysis, in which context the $c_{ij}$'s are covariances and the α's are factor loadings. One solution, also applicable here, is based on minimizing

$$\sum_{i=1}^{p}\sum_{\substack{j=1\\ j \ne i}}^{p} (c_{ij} - \alpha_i\alpha_j)^2.$$
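For one factor, the least-squares criterion above can be minimized by a simple coordinate-wise scheme: with the other loadings held fixed, the sum is a quadratic in $\alpha_k$ whose minimizer is $\alpha_k = \sum_{j \ne k} c_{kj}\alpha_j / \sum_{j \ne k}\alpha_j^2$. The sketch below cycles through these updates until the loadings settle. It is our own illustrative sketch of the criterion, not Harman's minres algorithm; the function names, starting values and stopping rule are arbitrary assumptions.

```python
def fit_one_factor_ls(c, n_sweeps=10_000, tol=1e-13):
    """Minimise sum_{i != j} (c_ij - a_i a_j)^2 by exact coordinate updates.

    c is a symmetric p x p list of lists; its diagonal is ignored.
    Each update sets a_k to the minimiser of the criterion with the other
    loadings fixed, so the criterion never increases."""
    p = len(c)
    a = [1.0] * p  # arbitrary positive start, suited to the case C_i > 0
    for _ in range(n_sweeps):
        largest_change = 0.0
        for k in range(p):
            num = sum(c[k][j] * a[j] for j in range(p) if j != k)
            den = sum(a[j] ** 2 for j in range(p) if j != k)
            new = num / den
            largest_change = max(largest_change, abs(new - a[k]))
            a[k] = new
        if largest_change < tol:
            break
    return a


def criterion(c, a):
    """The sum of squares being minimised."""
    p = len(c)
    return sum((c[i][j] - a[i] * a[j]) ** 2 for i in range(p) for j in range(p) if i != j)


true_alpha = [1.0, 0.8, 0.6, 0.5]
c = [[ai * aj for aj in true_alpha] for ai in true_alpha]  # exact one-factor products
est = fit_one_factor_ls(c)
print([round(v, 6) for v in est], criterion(c, est))
```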

shall call it the rowand column method.

in It is an iterative known for procedure as the"minres" method and will be found, example, Harman (1970). which isas follows. Analternative method isboth intuitively appealing andeasy toapply We that the with hasthe Consider matrix elements p).Thismatrix property {eiGej}(i,j-1, 2,...,

= (Rowi total) x (Column ] total)/Grand total. (i,j)thelement (15) If we regard thecij'sas estimates oftheoff-diagonal elements we can treat theestimation for matrix such that infact, as oneoffinding elements this Wecan, problem diagonal (15)holds. The x'swe seekwilltherefore element. the ensure that(15) holdsfor every diagonal satisfy equations
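$$\alpha_i^2=\frac{\left(C_i+\alpha_i^2\right)^2}{C+\sum_{j=1}^{p}\alpha_j^2}\qquad(i=1,2,\ldots,p),\tag{16}$$

where

$$C_i=\sum_{\substack{j=1\\ j\neq i}}^{p}c_{ij},\qquad C=\sum_{i=1}^{p}C_i.$$

These equations are equivalent to

$$\alpha_i\Bigl(\sum_{j=1}^{p}\alpha_j-\alpha_i\Bigr)=C_i\qquad(i=1,2,\ldots,p).\tag{17}$$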

They have considerable appeal in their own right as a means of estimation, since they result from equating the off-diagonal row totals to their observed values. Their solution is simplest if the $\alpha$'s are all of the same sign. On this question we have the following lemma.

Lemma. The two $\alpha$-vectors which satisfy (17) have elements all of the same sign if and only if $C_i>0$ for all $i$.

Before proceeding with the method of solution it is therefore necessary to ensure that $C_i>0$ $(i=1,2,\ldots,p)$, by changing the order of categories if necessary.

Writing $A=\sum_{i=1}^{p}\alpha_i$, (17) becomes

$$\alpha_i(A-\alpha_i)=C_i\quad\text{or}\quad\alpha_i^2-A\alpha_i+C_i=0,\tag{18}$$

which yields

$$\alpha_i=\tfrac12A\pm\tfrac12\left(A^2-4C_i\right)^{1/2}\qquad(i=1,2,\ldots,p).\tag{19}$$

If we sum both sides of this expression over $i$ we shall obtain an equation for $A$. Once this is solved, (19) will provide estimates of $\alpha_i$ $(i=1,2,\ldots,p)$. There is an ambiguity of sign involved in (19) which leads to two alternative equations for $A$. Taking the minus sign in (19) for every $i$, $A$ is the real root, where it exists, of

$$p-2=\sum_{i=1}^{p}\left(1-4C_i/A^2\right)^{1/2}.\tag{20}$$

Otherwise, taking the plus sign for one element (labelled $p$), $A$ is the real root of

$$p-2=\sum_{i=1}^{p-1}\left(1-4C_i/A^2\right)^{1/2}-\left(1-4C_p/A^2\right)^{1/2}.\tag{21}$$
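The row and column method thus reduces to a one-dimensional root search. A minimal Python sketch (hypothetical function name) solves (20) by bisection, using the facts that its right-hand side increases with $A$ and that every root in (19) is real once $A\geq 2(\max_i C_i)^{1/2}$, and then takes the minus sign in (19). It simply raises an error where the paper would turn to (21) or reorder categories to make every $C_i$ positive.

```python
import numpy as np

def row_column_alphas(c, tol=1e-13):
    """Row-and-column estimates: solve (20) for A, then alpha_i from (19) with the minus sign."""
    c = np.asarray(c, dtype=float)
    p = c.shape[0]
    C = c.sum(axis=1) - np.diag(c)              # C_i = sum_{j != i} c_ij (diagonal ignored)
    if p < 3 or np.any(C <= 0):
        raise ValueError("need p >= 3 and every C_i > 0 (reorder categories first)")

    def f(A):                                   # sum_i (1 - 4 C_i / A^2)^(1/2) - (p - 2)
        return np.sqrt(np.clip(1.0 - 4.0 * C / A**2, 0.0, None)).sum() - (p - 2)

    lo = 2.0 * np.sqrt(C.max())                 # smallest A for which all roots in (19) are real
    if f(lo) > 0:
        raise ValueError("(20) has no real root; the alternative equation (21) applies")
    hi = 2.0 * lo
    while f(hi) < 0:                            # f increases towards 2 as A grows
        hi *= 2.0
    while hi - lo > tol * hi:                   # bisection on the increasing function f
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    A = 0.5 * (lo + hi)
    return 0.5 * (A - np.sqrt(np.clip(A**2 - 4.0 * C, 0.0, None)))
```

For $c_{ij}$ built exactly as $\alpha_i\alpha_j$ from a positive $\alpha$ in which no element exceeds half the total, such as $(0.4, 0.5, 0.6, 0.7)$, the function returns that vector; for perturbed $c_{ij}$ its output still satisfies (17).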

It is not immediately obvious in what sense this procedure minimizes the distance between the $c_{ij}$'s and the products $\alpha_i\alpha_j$. This becomes apparent when we observe that the same estimating equations result from maximizing

$$\sum_{i=1}^{p}\sum_{\substack{j=1\\ j\neq i}}^{p}\left\{c_{ij}\ln(\alpha_i\alpha_j)-\alpha_i\alpha_j\right\}\tag{22}$$

with respect to the $\alpha$'s. The greatest possible value of (22), attained when $\alpha_i\alpha_j=c_{ij}$ for every $i\neq j$, is

$$\sum_{i=1}^{p}\sum_{\substack{j=1\\ j\neq i}}^{p}c_{ij}\ln(c_{ij}/e),$$

so by making (22) as near to this value as we can we are achieving the best fit in a certain sense. For the solution of (17) to be a maximum it is necessary for $C_i>0$ $(i=1,2,\ldots,p)$. This also ensures that the $\alpha$'s will all have the same sign and thus that the argument of the logarithm in (22) is positive.

Having obtained $\alpha$ we next estimate the $\pi_i$'s from

$$N\int_0^1\pi_i(y)\,dy=N_i\qquad(i=1,2,\ldots,p),\tag{23}$$

where $N_i$ is the number of positive responses on dimension $i$. These equations may be solved iteratively by the usual Newton-Raphson method, using $\pi_i=N_i/N$ as a starting value.

The method so far rests on the supposition that $c_{ij}=\alpha_i\alpha_j$. This is only an approximation, so we next write $c_{ij}=\alpha_i\alpha_j\theta_{ij}$, where $\theta_{ij}$ depends weakly on $\pi_i$, $\pi_j$, $\alpha_i$ and $\alpha_j$ but will usually be close to 1. Using the estimates of $\alpha$ and $\pi$ already obtained we next estimate $\theta_{ij}$ by

$$\hat\theta_{ij}=c_{ij}(\hat\pi_i,\hat\pi_j,\hat\alpha_i,\hat\alpha_j)\big/\hat\alpha_i\hat\alpha_j\qquad(i,j=1,2,\ldots,p;\ i\neq j),\tag{24}$$

$c_{ij}(\pi_i,\pi_j,\alpha_i,\alpha_j)$ being the value of $c_{ij}$ when the parameters are given the same values as the preliminary estimates. The cycle of estimation is now repeated by replacing the starting values $c_{ij}$ by $c_{ij}/\hat\theta_{ij}$.

It can happen, as one of the examples below shows, that the estimates of $\alpha$ do not appear to converge at all. The possibility of this is apparent from the fact that there is nothing in the model to prevent one or more $\alpha$'s being infinite. In such a case an iterative procedure starting from finite values may never terminate. This feature is not as serious as it might seem. When an $\alpha$ is large, greater than 2 say, big changes in $\alpha$ produce only small changes in the shape of $\pi_i(y)$ and hence in the overall fit of the model. From the point of view of interpretation all that matters is that the $\alpha$ in question is "large". In practice, therefore, no difficulty arises if we stop the iteration as soon as no worth-while improvement in the fit is obtained. In most cases we have investigated, where the $\alpha$'s turn out to be small and of the same order of magnitude, convergence of the parameter estimates is rapid. This is especially true of $\pi$.

6.2. Examples

To illustrate the use of the one-factor logit model and to compare it with the probit method we shall give the results of fitting the model to seven sets of data. Two of these were used by Bock and Lieberman (1970), Christoffersson (1975) and Muthén (1978). They relate to 1000 cases on each of Sections VI and VII of the Law School Admission Test (LSAT). Background details and the original data are in Bock and Lieberman (1970). The results of fitting the logit and probit models are given in Table 2. For the logit we give the first approximation obtained from (14) and the final estimates after iteration. For the probit model we give Bock and Lieberman's maximum likelihood estimates, Muthén's generalized least squares (GLS) estimates and Muthén's unweighted least squares (ULS) estimates. The latter are obtained by doing a standard factor analysis on the tetrachoric correlations obtained from the table. We have re-parameterized the probit model to conform with (6), as explained later. For Section VI the fit by all methods is excellent, with A almost equal to its expectation. The differences between the various parameter estimates are negligible and would have no effect on the interpretation of the factor. On these grounds there is therefore nothing to choose between the logit and probit models. In the case of Section VII the fit is less good and there is greater variation in the estimates but, again, these are not sufficient to affect the interpretation of the analysis.
TABLE 2
Comparison of parameter estimates and goodness of fit for the LSAT data using the probit and logit models

                Logit                          Probit
                First          Final           Maximum       Muthén    Muthén
                approximation  estimate        likelihood    (GLS)     (ULS)
Section VI
  α1            0.460          0.410           0.418         0.417     0.402
  α2            0.431          0.424           0.433         0.455     0.448
  α3            0.516          0.538           0.537         0.510     0.550
  α4            0.401          0.391           0.404         0.457     0.402
  α5            0.373          0.351           0.359         0.380     0.345
  π1            0.941          0.938           0.924         0.925     0.924
  π2            0.731          0.730           0.709         0.707     0.709
  π3            0.562          0.563           0.552         0.555     0.552
  π4            0.785          0.784           0.763         0.762     0.763
  π5            0.887          0.885           0.870         0.870     0.870
  A             21.24          21.17           21.28
Section VII
  α1            0.663          0.604           0.560         0.588     0.609
  α2            0.574          0.581           0.648         0.667     0.598
  α3            0.898          0.907           0.986         0.959     0.922
  α4            0.455          0.480           0.465         0.462     0.480
  α5            0.444          0.420           0.411         0.413     0.430
  π1            0.876          0.870           0.828         0.828     0.828
  π2            0.687          0.688           0.658         0.657     0.658
  π3            0.848          0.849           0.772         0.775     0.772
  π4            0.620          0.621           0.606         0.606     0.606
  π5            0.868          0.865           0.843         0.843     0.843
  A             33.1           32.21           31.59

From the computational point of view the first approximation is the simplest and can easily be carried out with a pocket calculator for problems of this size. In that case the integration in the solution of (23) can be avoided by using a normal approximation given by

$$\hat\pi_i=\Phi\left\{\left(1+\hat\alpha_i^2\right)^{1/2}\Phi^{-1}(N_i/N)\right\}.\tag{25}$$
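To show what (25) saves, the sketch below gives both routes for one dimension. The exact route solves (23) by Newton-Raphson on the logit scale, with 64-point Gauss-Legendre quadrature on (0, 1). It assumes that the one-factor logit response function has the form $\operatorname{logit}\pi_i(y)=\operatorname{logit}\pi_i+\alpha_i\operatorname{logit}y$ with $y$ uniform on (0, 1); that form is our assumption for illustration and should be checked against (6). The approximate route is a direct transcription of (25). All names are hypothetical.

```python
import math
from statistics import NormalDist

import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

# 64-point Gauss-Legendre rule mapped from (-1, 1) onto (0, 1)
_nodes, _wts = np.polynomial.legendre.leggauss(64)
Y, W = 0.5 * (_nodes + 1.0), 0.5 * _wts

def expected_marginal(pi, alpha):
    """E pi_i(y) = int_0^1 pi_i(y) dy, ASSUMING logit pi_i(y) = logit pi_i + alpha * logit(y)."""
    return float(np.sum(W * expit(logit(pi) + alpha * logit(Y))))

def solve_pi(prop, alpha, tol=1e-12, max_iter=50):
    """Newton-Raphson solution of (23) for pi_i, started from pi_i = N_i / N."""
    a = logit(prop)                             # iterate on a = logit(pi_i)
    for _ in range(max_iter):
        p_y = expit(a + alpha * logit(Y))
        g = np.sum(W * p_y) - prop              # E pi_i(y) - N_i / N
        dg = np.sum(W * p_y * (1.0 - p_y))      # derivative of E pi_i(y) with respect to a
        step = g / dg
        a -= step
        if abs(step) < tol:
            break
    return float(expit(a))

def pi_normal_approx(prop, alpha):
    """Approximation (25): pi_i = Phi{(1 + alpha_i^2)^(1/2) Phi^{-1}(N_i / N)}, no integration."""
    nd = NormalDist()
    return nd.cdf(math.sqrt(1.0 + alpha ** 2) * nd.inv_cdf(prop))
```

With illustrative inputs $N_i/N=0.924$ and $\alpha_i=0.460$, `pi_normal_approx` returns about 0.943; `solve_pi` gives a value of similar size under the assumed response function, at the cost of a quadrature at every Newton step.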

The computing effort required for the final estimates depends on how many cycles of the iteration are necessary and this, in turn, depends on the accuracy required. No exact comparisons have yet been made with the various probit methods, but our method seems likely to be faster than the maximum likelihood method and slower than Muthén's GLS method. The computation of the expected frequencies and the y-scores involves calculations of the same order for both models, though here the logit has the slight advantage of being easier to calculate than the probit.

In social, as opposed to psychometric and educational, applications $p$ is often quite small, and what is wanted is a simple method of extracting one or two factors and a way of providing a scale of measurement for the latent variables. We therefore give five further examples of this kind, in which our main aim will be to see how good the first approximation is and to illustrate problems which may arise in the fitting process. The sets of data used are as follows.

Set I is taken from Lombard and Doering (1947) and relates to knowledge about cancer. A sample of 1729 individuals were classified on four dimensions concerning sources of general knowledge, each having two categories as follows: (1) Radio/no radio; (2) Newspapers/no newspapers; (3) Solid reading/no solid reading; (4) Lectures/no lectures. A fifth variable was whether or not the respondent had a good knowledge of cancer. Here we shall look only at the first four variables, to see whether there is evidence of a single latent variable which, we might anticipate, would have to do with how well informed people are in general.

Sets II and III are from Solomon (1961) and concern attitudes to science expressed by 2982 young people. They were divided into two equal groups on the basis of their IQ (High = II, Low = III). Attitudes were elicited in the form of positive or negative responses to four questions. The data and the questions are reproduced in Plackett (1974), which also contains Set I.

Set IV is a $2^5$ table from Upton (1978), where the questions, from a survey on entry to the EEC, were such as might be expected to relate to a latent political left/right variable.

Set V is taken from a study of mobility of the elderly, and it is included here as an example which gives rise to problems in fitting.

TABLE 3
Parameter estimates and goodness of fit of the logit model for the five cases described in the text
         I                II               III†              IV                V
         First   Final    First   Final    First   Final     First   Final     First   Final
α1       0.444   0.445    0.169   0.195    0.168    0.164    0.962   0.986     2.757   1.695
α2       1.228   1.550    0.448   0.400    0.097    0.161    0.351   0.397     2.682   2.382
α3       0.864   0.860    0.818   1.068    1.143   15.225    0.546   0.571     0.275   0.457
α4       0.506   0.456    0.217   0.223    0.242    0.168    0.998   1.074     0.386   0.594
α5                                                           0.493   0.520
π1       0.213   0.212    0.818   0.819    0.839    0.839    0.704   0.707     0.008   0.037
π2       0.604   0.620    0.174   0.178    0.169    0.167    0.454   0.453     0.526   0.524
π3       0.461   0.464    0.646   0.664    0.526    0.732    0.469   0.467     0.237   0.221
π4       0.057   0.061    0.543   0.543    0.446    0.448    0.703   0.709     0.411   0.402
π5                                                           0.389   0.387
A        23.71   19.15    11.80   11.10    17.03    12.92    89.83   90.40     33.31   21.62
Degrees
of freedom   7                7                7                 21                7

† After 15 cycles.

In all cases except II the value of A suggests that a further factor might be involved, but the reduction in A from the case of complete independence is always very substantial. The first approximation is good on the whole, though there are several marked discrepancies to which we turn in a moment.

In case III the estimates for $\alpha$ were not converging and the iteration was stopped after 15 cycles. At this point the value of A had virtually converged. In other words, the fit at this stage was hardly affected by changing the parameters. The extreme case, $\alpha=\infty$, is equivalent to what is called a Heywood case in factor analysis, where there is no uncertainty about the response once the value of the latent variable is fixed. It is this kind of response function which is assumed in Guttman scaling, and there seems to be no reason for regarding it as anomalous in our model. In this case our first approximation is less good, but it still shows the same basic pattern.

Case IV is interesting in that, although the fit of the one-factor model is poor, the first approximation actually provides a slightly better fit than the full iteration.

Case V arises from a $2^4$ table where one cross-product ratio was very large (273) whereas the others were in the range 2-4. Here it took 26 iterations to converge and there was a marked oscillation from one cycle to the next in the early stages. Even here, the first approximation gives the broad outline of the solution, and the estimates of $\pi$ are particularly good. We have other examples with very large cross-product ratios, including one for $p=7$, where the iteration appears to diverge with many elements of $\alpha$ tending to zero. This appears to be because some of the $\pi_i(y)$'s have the Guttman form (see condition (vi) of Section 3) with $y_0$ near 0 or 1. Neither the logit nor the probit model can adequately cope with that situation.

A full analysis requires the calculation of expected frequencies and y-scores. These are given for Case I in Table 4.
TABLE 4
Fit of the one-factor model to Lombard and Doering's data on cancer knowledge

Cell     Observed     Independence     Fitted       y-score
         frequency    frequency†       frequency
0000       477          279.1            466.5       0.212
0001        12           23.6             16.1       0.304
0010       150          251.8            156.4       0.384
0011        11           21.3              8.6       0.475
0100       231          359.2            250.1       0.522
0101        13           30.4             18.9       0.615
0110       378          324.1            355.5       0.700
0111        45           27.4             44.0       0.797
1000        63           87.3             67.1       0.304
1001         7            7.4              3.0       0.394
1010        32           78.8             35.8       0.475
1011         4            6.7              2.4       0.567
1100        94          112.4             78.8       0.615
1101        12            9.5              7.5       0.711
1110       169          101.4            182.9       0.796
1111        31            8.6             34.4       0.889
Total     1729         1729.0           1729.0
A                       380.57            19.15

† These are the expected frequencies on the assumption of complete independence between the manifest variables.

Although the one-factor model is barely adequate, it is a great improvement over the "independence" fit. The y-scores provide a useful ranking of the cells according to the degree of knowledge which the cell members exhibit.
7. MANIFEST VARIABLES WITH MORE THAN TWO CATEGORIES

7.1. Specification of the Response Function

When there are more than two categories on any dimension we need a response function to specify the probability of falling into each category. We do this by defining the cumulative probability

$$\pi_{is}(y)=\Pr\{\text{an individual with latent position } y \text{ falls in category } s \text{ or higher}\},\tag{26}$$

with $\pi_{i0}(y)=1$ and $\pi_{ic}(y)=0$, where $c$ is the number of categories. The probability that an individual falls into category $s$ on the $i$th dimension is thus

$$\pi_{is}(y)-\pi_{i,s+1}(y).\tag{27}$$

We suppose that the cumulative function has the same logit form as we used for the dichotomy, with parameters $\pi_{is}$ and $\alpha_i$. The $\alpha$'s do not depend on $s$. They measure departure from independence and, as such, should be independent of the number of categories. This fact also ensures that the differences (27) are non-negative, since for fixed $y$ every cumulative logit is shifted by the same amount, so $\pi_{is}(y)$ decreases with $s$ whenever $\pi_{is}$ does. The same approach could be used with the probit or any other response function of the form given by (6).
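A short sketch of (26) and (27) for one dimension follows. It assumes (our assumption for illustration, not a quotation of (6)) the cumulative logit form $\operatorname{logit}\pi_{is}(y)=\operatorname{logit}\pi_{is}+\alpha_i\operatorname{logit}y$ with $y$ on (0, 1); under that form $\pi_{is}$ is the value of $\pi_{is}(y)$ at $y=\tfrac12$. The function name is hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

def category_probs(y, pis, alpha):
    """Probabilities (27) of categories 0..r at latent position y in (0, 1).

    pis = (pi_i1, ..., pi_ir) must be strictly decreasing. ASSUMED cumulative form:
    logit pi_is(y) = logit pi_is + alpha * logit(y), with the same alpha for every s.
    """
    shifted = expit(logit(np.asarray(pis, dtype=float)) + alpha * logit(y))
    cum = np.concatenate(([1.0], shifted, [0.0]))   # pi_i0(y) = 1, ..., pi_ic(y) = 0
    return cum[:-1] - cum[1:]                       # pi_is(y) - pi_i,s+1(y)
```

Because one $\alpha$ shifts every cumulative logit equally, the differences stay non-negative at every $y$; if $\alpha$ were allowed to vary with $s$, the cumulative curves could cross and (27) could go negative.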
7.2. Fitting the Model

The approximate method already given for the $2^p$ table can easily be extended to the general case. At the cost of neglecting some information, the $\alpha$'s can be estimated using the method of Section 6 by reducing each dimension to a dichotomy. This should be done so as to make the marginal frequencies as nearly equal as possible. More efficient methods would, of course, be desirable. Once the $\alpha$'s have been found, the full set of $\pi_{is}$'s can be estimated by equating each marginal cumulative frequency to its expectation.

If the number of manifest variables and categories is at all large, the computation of the expected frequencies and y-scores is a substantial undertaking, even for the $2^p$ table. It is thus desirable to have a systematic approach to the calculations, and this is provided by the result of Theorem 3. The statement of the theorem requires some further definitions and terminology.

Suppose that dimension $i$ has $r_i+1$ categories which, in accordance with our earlier convention, are denoted by the levels $(0,1,2,\ldots,r_i)$. Each cell of the table is identified by a sequence of levels which can be generated symbolically by forming a Kronecker (direct) product of the vectors of levels. Thus, for example, if we had two dimensions with levels $(0,1,2)$ and $(0,1,2,3)$, the listing of the cells is

$$\begin{bmatrix}0\\1\\2\end{bmatrix}\times\begin{bmatrix}0\\1\\2\\3\end{bmatrix}=\begin{bmatrix}00&01&02&03&10&11&12&13&20&21&22&23\end{bmatrix}'.\tag{28}$$

The order of the cells depends on the order in which we take the vectors. This is arbitrary, but once selected it must be adhered to throughout. It is convenient to rank the vectors in increasing order of $r_i$, as in the above example, and to multiply out from the right-hand end.

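In general, for a $p$-dimensional table, the listing of the cells is given by forming the product

$$\begin{bmatrix}0\\1\\\vdots\\r_1\end{bmatrix}\times\begin{bmatrix}0\\1\\\vdots\\r_2\end{bmatrix}\times\cdots\times\begin{bmatrix}0\\1\\\vdots\\r_p\end{bmatrix}.$$

An $E$ in front of such an expression means that the expectation is to be taken of all cell frequencies designated by the product. Finally, we define the $(r+1)\times(r+1)$ matrix $A_r$ as follows:

$$A_r=\begin{bmatrix}1&-1&0&\cdots&0&0\\0&1&-1&\cdots&0&0\\\vdots&&&\ddots&&\vdots\\0&0&0&\cdots&1&-1\\0&0&0&\cdots&0&1\end{bmatrix}.$$

This matrix forms the first differences of the elements of any column vector which it premultiplies. In particular, if it premultiplies a vector of cumulative probabilities such as those given by (26), it yields the category probabilities of (27). The theorem is as follows.

Theorem 3. The expected cell frequencies can be computed from the formula

$$E\left\{\begin{bmatrix}0\\1\\\vdots\\r_1\end{bmatrix}\times\cdots\times\begin{bmatrix}0\\1\\\vdots\\r_p\end{bmatrix}\right\}=N\left[A_{r_1}\times A_{r_2}\times\cdots\times A_{r_p}\right]E\left\{\begin{bmatrix}1\\\pi_{11}(y)\\\vdots\\\pi_{1r_1}(y)\end{bmatrix}\times\cdots\times\begin{bmatrix}1\\\pi_{p1}(y)\\\vdots\\\pi_{pr_p}(y)\end{bmatrix}\right\}.$$

(Note that the absence of the "×" sign before $E$ on the right-hand side implies matrix multiplication of the standard kind.)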

Proof. The marginal probabilities that an individual falls into the various categories on the $i$th dimension, given $y$, are given by

$$A_{r_i}\begin{bmatrix}1\\\pi_{i1}(y)\\\vdots\\\pi_{ir_i}(y)\end{bmatrix}.\tag{29}$$

Thus the probability of falling into any cell of the table is obtained by multiplying together the relevant marginal probabilities for that cell (since the events are independent, given $y$). This is achieved for all cells together by forming the Kronecker product of the vectors (29), taken over $i$ in the same order as in the statement of the theorem. Using the standard result that

$$(A\mathbf{x})\times(B\mathbf{y})\times\cdots=(A\times B\times\cdots)(\mathbf{x}\times\mathbf{y}\times\cdots),$$

the result then follows on taking expectations with respect to $y$.

The theorem identifies all the expectations which have to be evaluated. In all there are

$$(r_1+1)(r_2+1)\cdots(r_p+1)-1$$

integrals to be calculated. The vector of expectations is then converted into one of expected frequencies by pre-multiplying by the Kronecker product of the differencing matrices. The theorem includes, as a special case, the $2^p$ table obtained by setting $r_i=1$ for all $i$.

The y-score on dimension $s$ for cell $\mathbf{x}$ is given by

$$\hat y_s(\mathbf{x})=E(y_s\mid\mathbf{x})=\int_0^1\!\!\cdots\!\int_0^1 y_s\,p(\mathbf{y}\mid\mathbf{x})\,d\mathbf{y}=N\int_0^1\!\!\cdots\!\int_0^1 y_s\,p(\mathbf{x}\mid\mathbf{y})\,d\mathbf{y}\Big/N\int_0^1\!\!\cdots\!\int_0^1 p(\mathbf{x}\mid\mathbf{y})\,d\mathbf{y}.\tag{30}$$

Theorem 3 provides a formula for the denominator of this expression. The numerator can be found by multiplying the Kronecker product after the $E$ by $y_s$ and then evaluating the expression as before. As an illustration consider a $2\times3$ table, with one dichotomous and one three-category dimension:

$$E\left\{\begin{bmatrix}1\\\pi_1(y)\end{bmatrix}\times\begin{bmatrix}1\\\pi_{21}(y)\\\pi_{22}(y)\end{bmatrix}\right\}=E\begin{bmatrix}1\\\pi_{21}(y)\\\pi_{22}(y)\\\pi_1(y)\\\pi_1(y)\pi_{21}(y)\\\pi_1(y)\pi_{22}(y)\end{bmatrix},$$

and each element of this must be calculated by numerical integration. The resulting vector is now pre-multiplied by

$$A_1\times A_2=\begin{bmatrix}1&-1\\0&1\end{bmatrix}\times\begin{bmatrix}1&-1&0\\0&1&-1\\0&0&1\end{bmatrix}=\begin{bmatrix}1&-1&0&-1&1&0\\0&1&-1&0&-1&1\\0&0&1&0&0&-1\\0&0&0&1&-1&0\\0&0&0&0&1&-1\\0&0&0&0&0&1\end{bmatrix}$$

(and multiplied by $N$) to give the expected cell frequencies.
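Theorem 3 translates directly into array code: `numpy.kron` builds both the Kronecker product of the differencing matrices and, at each quadrature node, the product of the cumulative vectors, so the expectations and the numerator of (30) come out of the same loop. The sketch below (hypothetical names, one latent variable with uniform density on (0, 1), 48-point Gauss-Legendre quadrature) again assumes the cumulative logit form $\operatorname{logit}\pi_{is}(y)=\operatorname{logit}\pi_{is}+\alpha_i\operatorname{logit}y$; any other response function could be substituted in `cumulative_vector`.

```python
from functools import reduce

import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(p):
    return np.log(p) - np.log1p(-p)

def diff_matrix(r):
    """The (r + 1) x (r + 1) differencing matrix A_r: 1 on the diagonal, -1 just above it."""
    return np.eye(r + 1) - np.eye(r + 1, k=1)

def cumulative_vector(y, pis, alpha):
    """[1, pi_i1(y), ..., pi_ir(y)], ASSUMING logit pi_is(y) = logit pi_is + alpha * logit(y)."""
    pis = np.asarray(pis, dtype=float)
    return np.concatenate(([1.0], expit(logit(pis) + alpha * logit(y))))

def expected_freqs_and_yscores(dims, N, n_nodes=48):
    """dims: one (pis, alpha) pair per manifest variable, in the chosen cell order.

    Returns the expected cell frequencies of Theorem 3 and the y-scores E(y | x) of (30).
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    ys, ws = 0.5 * (nodes + 1.0), 0.5 * weights          # quadrature rule on (0, 1)
    D = reduce(np.kron, [diff_matrix(len(pis)) for pis, _ in dims])
    denom = np.zeros(D.shape[0])                         # E{product of cumulative vectors}
    numer = np.zeros(D.shape[0])                         # same, integrand multiplied by y
    for y, w in zip(ys, ws):
        v = reduce(np.kron, [cumulative_vector(y, pis, a) for pis, a in dims])
        denom += w * v
        numer += w * y * v
    freqs = N * (D @ denom)
    yscores = (D @ numer) / (D @ denom)
    return freqs, yscores
```

For the $2\times3$ illustration, `np.kron(diff_matrix(1), diff_matrix(2))` reproduces the $6\times6$ matrix $A_1\times A_2$, and with every $\alpha_i=0$ the expected frequencies reduce to those of complete independence and every y-score to 0.5.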

8. EXTENSIONS AND EVALUATION

8.1. More than One Latent Variable

The methods of fitting the one-factor logit model extend in a natural way to the case of several factors. Theorem 2 holds for all $q$, and hence any method for fitting the normal factor model to the off-diagonal elements of a covariance matrix can be used to provide approximate estimates for the $\alpha$'s. The new method given in Section 6.1 can also be used in the following way. First we fit the parameters $\boldsymbol\alpha_{\cdot1}=(\alpha_{11},\alpha_{21},\ldots,\alpha_{p1})$ as already described. Next we construct the residuals $\{\hat c_{ij}-\hat\alpha_{i1}\hat\alpha_{j1}\}$. These will include negative quantities, so signs must be changed to render the row totals positive. A second set of parameters $\boldsymbol\alpha_{\cdot2}$ is then fitted to the residuals and $\pi$ is re-estimated. For example, in the case of Upton's (1978) data (Case IV, Table 3) the resulting estimates are

$$\hat{\boldsymbol\alpha}_{\cdot1}=(0.962,\ 0.351,\ 0.546,\ 0.998,\ 0.493),$$
$$\hat{\boldsymbol\alpha}_{\cdot2}=(-0.518,\ 0.241,\ 0.158,\ -0.518,\ 0.520),$$
$$\hat{\boldsymbol\pi}=(0.704,\ 0.454,\ 0.469,\ 0.703,\ 0.389).$$

The estimate of $\boldsymbol\alpha_{\cdot1}$ is very close to that for the one-factor model; A has been reduced from 89.83 to 61.47, which is still a poor fit. An iterative procedure could now be used to improve the estimates, but the feasibility of this remains to be explored. Calculations of $c_{ij}$ for two-factor models, similar to those given in Table 1, suggest that the approximation given by Theorem 2 is more restricted in its usefulness than when $q=1$. Further investigation is required.

With more than two factors, however, the probit model offers computational advantages. This is because the expectations $\{E\pi_i(\mathbf y)\pi_j(\mathbf y)\}$ can always be reduced to bivariate integrals, however many latent variables there are. In fact it may easily be shown that
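$$E\pi_i(\mathbf y)=\Phi(\beta_{i0}),\tag{31}$$

$$E\pi_i(\mathbf y)\pi_j(\mathbf y)=\int_{-\infty}^{\beta_{i0}}\int_{-\infty}^{\beta_{j0}}\phi(z_1,z_2;\rho_{ij})\,dz_1\,dz_2\qquad(i\neq j),\tag{32}$$

where $\phi$ is the standard bivariate normal density with correlation coefficient

$$\rho_{ij}=\sum_{k=1}^{q}\beta_{ik}\beta_{jk},\tag{33}$$

and

$$\beta_{ik}=\alpha_{ik}\Big/\Bigl(1+\sum_{l=1}^{q}\alpha_{il}^2\Bigr)^{1/2}\qquad(k=0,1,2,\ldots,q).$$

The various estimation methods proposed by Bock and Lieberman (1970), Christoffersson (1975) and Muthén (1978) for this model are based essentially on (31) and (32). Their estimates of $\{\beta_{ik}\}$ can easily be converted into estimates of $\{\alpha_{ik}\}$ which, in turn, would be good approximations to the corresponding parameters of the logit model. From the point of view of interpretation, and for reasons given below, we prefer to use the logit parameterization, for which

$$\pi_i=G(\alpha_{i0})=G\Bigl\{\beta_{i0}\Big/\Bigl(1-\sum_{l=1}^{q}\beta_{il}^2\Bigr)^{1/2}\Bigr\},\qquad\alpha_{ih}=\beta_{ih}\Big/\Bigl(1-\sum_{l=1}^{q}\beta_{il}^2\Bigr)^{1/2}\quad(h=1,2,\ldots,q).\tag{34}$$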
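The reduction to bivariate integrals in (31) to (33) is easy to check numerically. The sketch below (hypothetical names; NumPy plus the standard library's `statistics.NormalDist`) maps $\alpha$ to $\beta$ by (33), inverts that map as in (34), and evaluates the bivariate normal integral of (32) as the one-dimensional integral of $\phi(z)\,\Phi\{(b-\rho z)/(1-\rho^2)^{1/2}\}$ by the midpoint rule; it assumes $|\rho|<1$ and $\beta_{i0}>-8$.

```python
from statistics import NormalDist

import numpy as np

Phi = NormalDist().cdf

def betas(alpha0, alphas):
    """(33): beta_ik = alpha_ik / (1 + sum_l alpha_il^2)^(1/2) for k = 0, 1, ..., q."""
    alphas = np.asarray(alphas, dtype=float)
    s = np.sqrt(1.0 + np.sum(alphas ** 2))
    return alpha0 / s, alphas / s

def alphas_from_betas(beta0, beta):
    """Inverse map, as in (34): alpha_ik = beta_ik / (1 - sum_l beta_il^2)^(1/2)."""
    beta = np.asarray(beta, dtype=float)
    s = np.sqrt(1.0 - np.sum(beta ** 2))
    return beta0 / s, beta / s

def bivariate_normal_cdf(a, b, rho, n=2000, lower=-8.0):
    """P(Z1 <= a, Z2 <= b) for standard normals with correlation rho (|rho| < 1).

    Integrates phi(z) * Phi((b - rho z) / sqrt(1 - rho^2)) over (lower, a) by the midpoint rule.
    """
    h = (a - lower) / n
    z = lower + (np.arange(n) + 0.5) * h
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    cond = np.array([Phi(t) for t in (b - rho * z) / np.sqrt(1.0 - rho ** 2)])
    return float(np.sum(phi * cond) * h)
```

With the probit form consistent with (31) to (33), $\pi_i(\mathbf y)=\Phi(\alpha_{i0}+\sum_k\alpha_{ik}y_k)$ and $\mathbf y\sim N(\mathbf 0,I)$, a Monte Carlo average of $\pi_i(\mathbf y)\pi_j(\mathbf y)$ agrees with `bivariate_normal_cdf(beta_i0, beta_j0, rho_ij)` up to simulation error, whatever the number of factors.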

8.2. Comparisons with other Methods

It is not unusual to find factor analyses carried out on covariance matrices regardless of the form of the distribution of the manifest variables. For the $2^p$ table it is perfectly possible to treat the indicator variables $\{x_i\}$ just like any others and to perform a factor analysis on the estimated covariance matrix. Such an analysis can be roughly interpreted in terms of our model, when the $\alpha_{ik}$ $(k=1,2,\ldots,q)$ are small, as follows. From Theorem 2, a factor model fitted to the off-diagonal elements will estimate the quantities $\{G'(\alpha_{i0})\alpha_{ik}\}$ $(k=1,2,\ldots,q)$. The variance of $x_i$ is $G(\alpha_{i0})\{1-G(\alpha_{i0})\}+O(\alpha^2)$. For the logit function $G'(v)=G(v)\{1-G(v)\}$, and hence the sample variances estimate $G'(\alpha_{i0})$ and therefore the $\alpha$'s are determined. It is doubtful whether such a procedure has any practical value.

We have already noted that the logit and probit models are likely to give similar numerical results. At the conceptual level there are considerable advantages in developing both models by means of the latent structure arguments used here. The traditional "factor analysis" approach assumes that there are two tiers of latent variables. First, there is supposed to be a latent variable underlying each dichotomy; a positive response is then observed if that variable exceeds a threshold value. Secondly, these variables are related to the second tier of latent variables by the usual common factor model. This may be plausible in some applications, but with dichotomies based on house ownership, trade union membership and suchlike, the notion of an underlying latent variable and its associated threshold is somewhat artificial. When we add to this the argument that the forms of the distributions of the latent variables are essentially arbitrary, the usual model appears as no more than a convenient fiction. It is for these reasons that we prefer the parameterization in (34), which has a more robust interpretation. Similar considerations apply to the hybrid model of Lord and Novick (1968), in which $G$ is a logit and $H$ a probit function. Sanathanan and Blumenthal (1978) have given a maximum likelihood method for estimating its parameters similar to the EM algorithm. There is clearly room for further study of the numerical aspects of all models in the light of the current, and likely future, state of computer technology.
It is unfortunate that no other suitable response function has come to light for which the various integrals have simple explicit forms. If we are prepared to abandon the symmetry conditions we could consider such functions as

$$\pi_i(y)=y^{\alpha_i}\quad\text{or}\quad\pi_i(y)=1-(1-y)^{\alpha_i}\qquad\text{for } q=1.\tag{35}$$

These models can be fitted very easily but, with only one parameter, they are not sufficiently flexible for most purposes. Introducing further parameters quickly destroys their simplicity. The approximate method of fitting our logit model seems to come nearest to combining simplicity and flexibility. Whether or not it is good enough to be generally useful for $q>1$ requires further investigation. In the meantime the various methods for the probit model are available.
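The first function in (35) shows why such models are easy to fit. Taking $y$ uniform on (0, 1), as the integrals $\int_0^1\cdots dy$ used earlier imply, every cell probability of the $2^p$ table is a finite sum: expand $\prod_{i:\,x_i=0}(1-y^{\alpha_i})$ and use $\int_0^1 y^a\,dy=1/(1+a)$. A sketch with hypothetical names:

```python
from itertools import combinations, product

def cell_prob_power_model(x, alpha):
    """P(x) for the 2^p table when pi_i(y) = y**alpha_i (first form of (35)), y ~ U(0, 1).

    Expands the product over {i: x_i = 0} of (1 - y**alpha_i) by inclusion-exclusion
    and integrates each term with int_0^1 y**a dy = 1 / (1 + a).
    """
    a_pos = sum(a for xi, a in zip(x, alpha) if xi == 1)   # exponent from positive responses
    a_neg = [a for xi, a in zip(x, alpha) if xi == 0]      # factors (1 - y**a) to expand
    total = 0.0
    for k in range(len(a_neg) + 1):
        for subset in combinations(a_neg, k):
            total += (-1) ** k / (1.0 + a_pos + sum(subset))
    return total

def all_cell_probs(alpha):
    """Probabilities of every cell of the 2^p table, keyed by response pattern."""
    return {x: cell_prob_power_model(x, alpha) for x in product((0, 1), repeat=len(alpha))}
```

In particular $E\pi_i(y)=1/(1+\alpha_i)$ and $E\pi_i(y)\pi_j(y)=1/(1+\alpha_i+\alpha_j)$, so no numerical integration is needed at all; the price, as noted above, is a single parameter per dimension.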
ACKNOWLEDGEMENTS

The approach on which this paper is based was first outlined in a paper read at the Society's conference at Oxford in March 1979. I am grateful to several participants for suggestions, and especially to Dr J. A. Anderson, whose remarks led to a major change of direction. The referees and other readers of an earlier version have also led to many suggestions for improvements. The method of fitting the logit model in Section 6 has been programmed in FORTRAN by J. Tomenson, to whom I owe a special debt.
REFERENCES

AIGNER, D. J. and GOLDBERGER, A. S. (eds) (1977). Latent Variables in Socio-economic Models. Amsterdam: North-Holland.
BOCK, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
BOCK, R. D. and LIEBERMAN, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179-197.
CHRISTOFFERSSON, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
FIELDING, A. (1977). Latent structure models. In The Analysis of Survey Data, Vol. 1: Exploring Data Structures (C. A. O'Muircheartaigh and C. Payne, eds), pp. 125-157. London: Wiley.

312

Discussion of thePaper by Professor Bartholomew

[No. 3,

GOODMAN, L. A. (1978). Analyzing Qualitative/Categorical Data: Log-Linear Models and Latent Structure Analysis. Reading, Mass.: Addison-Wesley.
HARMAN, H. H. (1970). Modern Factor Analysis, 2nd ed. Chicago: University of Chicago Press.
LAZARSFELD, P. F. and HENRY, N. W. (1968). Latent Structure Analysis. New York: Houghton Mifflin.
LOMBARD, H. L. and DOERING, C. R. (1947). Treatment of the four-fold table by partial association and partial correlation as it relates to public health problems. Biometrics, 3, 123-128.
LORD, F. M. and NOVICK, M. R. (1968). Statistical Theories of Mental Test Scores. Reading, Mass.: Addison-Wesley.
McDONALD, R. P. (1969). The common factor analysis of multi-category data. Brit. J. Math. Statist. Psych., 22, 165-175.
MUTHÉN, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
PLACKETT, R. L. (1974). The Analysis of Categorical Data. High Wycombe: Griffin.
SANATHANAN, L. and BLUMENTHAL, S. (1978). The logistic model and estimation of latent structure. J. Amer. Statist. Ass., 73, 794-799.
SOLOMON, H. (1961). Classification procedures based on dichotomous response vectors. In Studies in Item Analysis and Prediction (H. Solomon, ed.), pp. 177-186. Stanford: Stanford University Press.
UPTON, G. J. G. (1978). The Analysis of Cross-tabulated Data. London: Wiley.

DISCUSSION OF PROFESSOR BARTHOLOMEW'S PAPER

Professor MURRAY AITKIN (University of Lancaster): I am pleased to propose the vote of thanks for Professor David Bartholomew's paper. The subject of latent variable models is of rapidly increasing practical importance and unifies a number of apparently unconnected statistical areas. Professor Bartholomew notes in Section 1 that latent variable models have not found wide favour with statisticians, partly because of the difficulty of fitting the models, and that attention to such models by statisticians is overdue. His paper tonight takes an important step towards developing proper latent variable models for categorical data.

The basis of the models is set out in Section 2. An early distinction is made between continuous and discrete latent variables, though the conditional independence model has been used with both types of latent variable. Given the properties (i)-(vi) in Section 3, the choice of response function comes down essentially to a probit/logit choice for both the manifest variables and the latent variables. Bartholomew chooses the logit/logit model, for reasons which are not entirely clear. Sections 4 and 5 discuss in detail the fitting of the model. Here the cross-ratios play an important role. The LSAT example discussed shows that the method for the logit model gives estimates consistent with the ML estimates for the probit model of Bock and Lieberman. The method of computing is practicable, though it is not clear that it gives efficient estimates.

Latent variable models are natural candidates for ML estimation by the EM algorithm (Dempster, Laird and Rubin, 1977). The logit model is unsuitable for EM, but the probit model is very suitable, as the sufficient statistics in the "complete data" model are just the usual regression sums of squares and cross-products. DLR pointed out that ML estimation in the normal factor model could be achieved by EM using simple back-and-forward least squares computations, and John Hinde at Lancaster has developed a GENSTAT macro for exploratory factor analysis using EM. Hasselblad, Stead and Creason, in a note to appear in Biometrics, point out that the standard probit analysis model for a dose-response curve can be fitted by EM. The combination of these two approaches allows the estimation of the parameters in the probit/probit latent variable model by ML using an EM algorithm. I am currently completing joint work with Darrell Bock on this procedure.

Professor Bartholomew notes that categorical latent variables may be treated by latent class analysis. While continuous latent variables are more natural choices for abilities and attitudes, it is of some interest that the simple latent class model also provides a scaling of the cell entries on a scale which is essentially continuous. This is of practical value because the computations for fitting the latent class model are the same as those for a general mixture model, using an EM algorithm, and can easily be done in GLIM. The cancer knowledge example (Set I) provides a simple illustration.

The latent class model used is a two-component multinomial mixture. There are two classes of people: well-informed and badly informed. In Professor Bartholomew's notation, $y$ in (1) is a Bernoulli variable, with $P(y=1)=\lambda$, $P(y=0)=1-\lambda$. The response function of (5) is then just

$$f(\mathbf{x}\mid y)=\prod_{i}\pi_{iy}^{x_i}(1-\pi_{iy})^{1-x_i},$$

the $y$ suffix indicating that there are two sets of $\pi_i$ in the two latent classes. The conditional probability function of $y$ given $\mathbf{x}$ in (2) is then

$$P(y=1\mid\mathbf{x})=\frac{\lambda f(\mathbf{x}\mid y=1)}{\lambda f(\mathbf{x}\mid y=1)+(1-\lambda)f(\mathbf{x}\mid y=0)},$$

a monotone function of the likelihood ratio for the two components, and similarly for $P(y=0\mid\mathbf{x})$. The EM algorithm begins with starting values for the probabilities of latent class membership, most simply by assigning each cell to one of the two classes. Parameter estimates are then obtained in the M-step from the conditional independence model. These are substituted into the likelihood ratio to give new probabilities of class membership in the E-step. The sequence of steps continues till convergence to the ML estimates of the $\pi_{iy}$. This is accomplished very simply in GLIM with a small macro. At convergence, the probabilities of latent class membership have also converged, and these also provide a ranking of cells from "most well informed" to "least well informed". In addition, for the conditional independence model, the log of the ratio of the probabilities of class membership is a linear function of the $x_i$: a linear discriminant function, whose coefficients are the log-odds ratios for the $i$th item. This discriminant function can also be used to scale the individual cells. A small or zero coefficient indicates that the corresponding item does not discriminate between well and poorly informed classes, and it can be dropped from the scale.

In the cancer data, the two-class model gives a goodness-of-fit value A of 15.4, using one extra parameter, so it fits as well as the one-factor model. The discriminant function is

$$1.43x_1+3.62x_2+2.35x_3+1.61x_4,$$

assigning most weight to newspapers, next to solid reading, and least to radio and lectures. These coefficients are very similar in relative magnitude to the factor loadings in Table 3, column 2. The discriminant score and the estimated probability of belonging to the well-informed group are shown in Table D1, together with the cell code and the estimated factor score from Table 4.
TABLE D1

Cell     Probability of being     Factor score (y)     Discriminant score
         well-informed
0000        0.029                    0.212                  0.00
0001        0.131                    0.304                  1.61
0010        0.241                    0.384                  2.35
0011        0.613                    0.475                  3.96
0100        0.531                    0.522                  3.62
0101        0.850                    0.615                  5.23
0110        0.922                    0.700                  5.97
0111        0.983                    0.797                  7.58
1000        0.112                    0.304                  1.43
1001        0.386                    0.394                  3.04
1010        0.569                    0.475                  3.78
1011        0.868                    0.567                  5.39
1100        0.825                    0.615                  5.05
1101        0.959                    0.711                  6.66
1110        0.980                    0.796                  7.40
1111        0.996                    0.889                  9.01
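The EM scheme described in this contribution can be sketched in a few lines of NumPy. This is a plain Python sketch with hypothetical names, not the GLIM macro referred to. It starts by assigning cells with three or more positive responses to one class, alternates the M-step (weighted conditional-independence fits within each class) and the E-step (class-membership probabilities from the likelihood ratio), and stops when the log-likelihood no longer increases. The discriminant coefficients are then the log-odds ratios $\operatorname{logit}p_{1i}-\operatorname{logit}p_{0i}$.

```python
from itertools import product

import numpy as np

def logit(p):
    return np.log(p) - np.log1p(-p)

def latent_class_em(cells, counts, n_iter=1000, tol=1e-10):
    """Two-class latent class model for a 2^p table, fitted by EM.

    cells: (K, p) array of 0/1 response patterns; counts: (K,) observed frequencies.
    Returns lam = P(class 1), item probabilities p0, p1 (P(x_i = 1 | class)),
    the posterior P(class 1 | x) for each cell, and the log-likelihood.
    """
    X = np.asarray(cells, dtype=float)
    n = np.asarray(counts, dtype=float)
    k = X.sum(axis=1)
    post = np.where(k > np.median(k), 0.9, 0.1)     # crude starting assignment to classes
    ll_old = -np.inf
    for _ in range(n_iter):
        # M-step: conditional-independence fits within each class, weighted by membership
        w1, w0 = n * post, n * (1.0 - post)
        lam = w1.sum() / n.sum()
        p1, p0 = (w1 @ X) / w1.sum(), (w0 @ X) / w0.sum()
        # E-step: membership probabilities from the likelihood ratio of the two components
        f1 = np.prod(p1 ** X * (1.0 - p1) ** (1.0 - X), axis=1)
        f0 = np.prod(p0 ** X * (1.0 - p0) ** (1.0 - X), axis=1)
        mix = lam * f1 + (1.0 - lam) * f0
        post = lam * f1 / mix
        ll = float(n @ np.log(mix))
        if ll - ll_old < tol:                       # EM never decreases the log-likelihood
            break
        ll_old = ll
    return lam, p0, p1, post, ll

# Usage with the observed frequencies of Table 4 (cells 0000, 0001, ..., 1111)
cells = np.array(list(product((0, 1), repeat=4)))
counts = np.array([477, 12, 150, 11, 231, 13, 378, 45, 63, 7, 32, 4, 94, 12, 169, 31])
lam, p0, p1, post, ll = latent_class_em(cells, counts)
coef = logit(p1) - logit(p0)                        # discriminant coefficients (log-odds ratios)
```

By construction the log posterior odds are linear in $\mathbf{x}$ with coefficients `coef`; the values obtained can be compared with Table D1, though the sketch makes no claim to reproduce them to the digits shown.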

A plot of the factor score againstthediscriminant score shows a verynearlylinearrelationship. Similar results are obtainedfor theLSAT data ofBock and Lieberman. For Section6,thevalueofA for themixture model after12 iterations is 23 9, witha discriminant function of 1 66x1+ 1 48x2+ 191x3 + 1.32x4 + 1 26x5. For Section7, A is 35 5 after12 iterations withdiscriminant function 1 76xi +2 02x2+2 67x3+ 1 48x4+ 1 37x5. The goodness-of-fit ofbothmodelscan be improved byfurther iterations, without essentially changing the discriminant function.

314

Bartholomew Discussionof thePaper by Professor

[No. 3,

I referred at the beginning of these comments to the value of latent variable models in unifying apparently unconnected areas. I should like to conclude with an example: principal component regression. It is common practice, or at least a commonly advocated practice, to extract principal components from a highly correlated set of predictors, and then regress the response on a suitable subset of the principal components. A latent variable model, the MIMIC model of Jöreskog and Goldberger (1975), makes clear what principal component regression is trying to achieve. Underlying the set of predictors x is a set of latent variables z. The x are conditionally independent given z, and, given z, the response y is independent of x and depends on z through a regression model:

$$x_i \mid z_i \sim N(\theta + \Lambda z_i, \Psi), \qquad y_i \mid z_i \sim N(\gamma + \beta' z_i, \tau^2), \qquad z_i \sim N(0, I),$$

independently over i, with y_i independent of x_i given z_i, z_i distributed as N(0, I) marginally, and Ψ a diagonal matrix. The model can again be fitted by ML using an EM algorithm, and a GENSTAT program for this is under development by John Hinde.

In conclusion, I have much pleasure in proposing the vote of thanks for this stimulating and important paper.

Dr A. M. SKENE (University of Nottingham): Professor Bartholomew has described an approach to the analysis of ordinal data which is a very welcome addition to the rather sparse body of theory which exists at present. The full impact in this area must of course await future developments which release some of the computational constraints imposed by the present estimation procedure. The restriction to one or perhaps two latent variables must appear to be a very severe restriction to those accustomed to using normal theory factor analysis. However, accepting this limitation and aware of the arbitrary nature of both p(y) and π(x | y), I looked at possible uses of this logit latent structure model. There are two areas of application: modelling and data reduction.

A single continuous latent variable may provide a very good explanation for the pattern observed in a multidimensional contingency table, particularly if there are extra-statistical arguments to support the existence of such a variable. On the other hand, latent variables tend to be mental constructs, and thus equally valid arguments can usually be made for a discrete formulation. The flexibility of latent class analysis as described by Goodman (1978) suggests that the discrete formulation should be our starting point, with the logit model being adopted when latent class analysis reveals latent classes having a clear ordering.

The logit model effects data reduction by replacing x by E(y | x), the factor scores. It follows from the conditional independence of the manifest variables, given y, that it is a trivial matter to calculate p(y | x^(1)), where x^(1) is any subvector of x. This, coupled with the fact that parameter estimation only requires knowledge of the one- and two-way margins of the manifest variables, leads to the observation that the logit model's applicability is unaffected by missing data. Once p(y | x) has been obtained, however, its mean may be totally inappropriate as a summary measure. Fig. D1 displays two instances of p(y | x) for the Lombard and Doering data. The problem is that of finding suitable summary measures for heavily skewed distributions.

FIG. D1. Conditional distributions for the Lombard and Doering data: (i) p(y | x' = (0, 0, 0, 0)); (ii) p(y | x' = (0, 1, 1, 0)). Horizontal axis: y score.

The choice of summary statistic is even more complicated when two latent variables are fitted, and it is certainly dangerous here to make much of the analogy with normal factor analysis. Effective data reduction means striking a balance between reducing dimension and retaining the information which is relevant to a specific objective. The real value of factor scores or conditional distributions cannot be judged in the abstract, and it is pointless debating the meaning of these quantities. Their ultimate value must be judged by, for example, the accuracy of the final predictive equation or the insights gained into the subject of the analysis. This point is relevant to all latent structure models and can be illustrated by the following model used for medical diagnosis. Given diseases D_i, i = 1, ..., I, and symptom vector S, one possible formulation of p(S | D_i) is the latent class model
$$p(S \mid D_i) = \sum_{j=1}^{n} \prod_{k=1}^{K} p_k(S_k \mid C_j)\, p(C_j \mid D_i). \qquad (1)$$

Conditional upon latent class C_j, we assume that the symptoms are mutually independent and independent of D_i. The parameters of this model, viz. the parameters of p_k(· | ·) and the probabilities p(C_j | D_i), j = 1, ..., n; i = 1, ..., I, can be estimated using training data (Skene, 1978), and, given a particular realisation of S, say T, the disease probabilities p(D_i | T) ∝ p(T | D_i) p(D_i) can be computed. Equation (1) makes no claim to be a representation of the absolute truth. In any particular application it stands or falls by its ability to diagnose patients correctly.

There is a second way of writing this model. Given T, we may first calculate

$$p(C_j \mid T) \propto \prod_{k} p_k(T_k \mid C_j)\, p(C_j), \qquad \text{where} \quad p(C_j) = \sum_{i} p(C_j \mid D_i)\, p(D_i),$$

and then calculate

$$p(D_i \mid T) = \sum_{j} p(D_i \mid C_j)\, p(C_j \mid T).$$
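The two ways of writing this diagnostic model can be checked against each other numerically. The Python sketch below assumes binary symptoms (T_k = 1 meaning present) and uses invented probabilities purely for illustration; the function names are hypothetical. The first function applies equation (1) and Bayes' theorem directly, the second goes through the latent-class posterior p(C_j | T), and the two agree because, given C_j, the symptoms carry no further information about D_i.

```python
import numpy as np


def class_likelihoods(p_sym_given_c, t):
    """prod_k p_k(T_k | C_j) for each class j, with binary symptoms t (1 = present)."""
    per_symptom = np.where(t == 1, p_sym_given_c, 1.0 - p_sym_given_c)  # shape (n, K)
    return per_symptom.prod(axis=1)                                       # shape (n,)


def posterior_direct(prior_d, p_c_given_d, p_sym_given_c, t):
    """p(D_i | T) from equation (1): p(T | D_i) = sum_j prod_k p_k(T_k | C_j) p(C_j | D_i)."""
    lik_d = p_c_given_d @ class_likelihoods(p_sym_given_c, t)  # p(T | D_i), shape (I,)
    post = lik_d * prior_d
    return post / post.sum()


def posterior_two_step(prior_d, p_c_given_d, p_sym_given_c, t):
    """Same quantity via the latent classes: first p(C_j | T), then p(D_i | T)."""
    p_c = prior_d @ p_c_given_d                          # p(C_j) = sum_i p(C_j | D_i) p(D_i)
    p_c_given_t = class_likelihoods(p_sym_given_c, t) * p_c
    p_c_given_t /= p_c_given_t.sum()                     # step 1: data reduction to p(C_j | T)
    p_d_given_c = p_c_given_d * prior_d[:, None] / p_c   # Bayes: p(D_i | C_j), shape (I, n)
    return p_d_given_c @ p_c_given_t                     # step 2: sum_j p(D_i | C_j) p(C_j | T)


# Invented numbers: I = 3 diseases, n = 3 latent classes, K = 4 symptoms.
prior_d = np.array([0.5, 0.3, 0.2])
p_c_given_d = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.8, 0.1],
                        [0.2, 0.2, 0.6]])
p_sym_given_c = np.array([[0.9, 0.1, 0.2, 0.3],
                          [0.2, 0.8, 0.7, 0.1],
                          [0.5, 0.5, 0.1, 0.9]])
t = np.array([1, 0, 0, 1])

direct = posterior_direct(prior_d, p_c_given_d, p_sym_given_c, t)
two_step = posterior_two_step(prior_d, p_c_given_d, p_sym_given_c, t)
print(np.round(direct, 4), np.round(two_step, 4))
```

Only the vector p(C_j | T) produced in step 1 is carried into step 2, so the second function makes the data-reduction stage explicit without changing the answer.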

The probabilities p(C_j | T), j = 1, ..., n, define a probability distribution over the latent classes, and this alone is used in calculating the disease probabilities. This particular formulation makes the two steps of the classification much clearer: the first step of data reduction is followed by the use of the transformed data. However, this formulation is also very seductive, as it exposes the latent classes and raises the possibility that they might have real meaning. Such emphasis is, in the main, unwarranted. Professor Bartholomew has, in effect, described a rather different way of doing this first step of data reduction. The ultimate test of this particular model is whether effective predictions or a good understanding of particular data sets result.

I have much pleasure in seconding the vote of thanks.

The vote of thanks was passed by acclamation.

Mr C. J. SKINNER (University of Southampton): I should also like to thank Professor Bartholomew for a very interesting paper. I particularly enjoyed the discussion of the response function in Section 3, and I note that certain latent class models may also be included in the general formulation of equation (6) if H becomes a discrete-valued function.

My main comments concern the suggestion in this paper that the logit model is preferable to the numerically similar probit model, and I should like to offer a few words in defence of the probit model. One reason given in Section 6 for preferring the logit model is that it provides a simple approximate solution (at least when q = 1). This solution may, however, be viewed as an iterated analogue of the simple heuristic solution for the probit model, where the estimated c_ij's correspond to the tetrachoric correlations. In fact, if one attempts to iterate the heuristic method in a corresponding manner, one finds that successive iterations give an identical solution because, under the probit parameterisation, E π_i(y) does not depend on the factor loadings and the corresponding θ_ij's are all unity. One advantage of such a non-iterated two-stage procedure is that conventional factor analysis packages, or more sophisticated correlation structure packages such as LISREL (Jöreskog and Sörbom, 1978), may be used with categorical variables or with combinations of categorical and continuous variables. Available empirical evidence, as in Table 2, suggests that point estimates obtained by the heuristic procedure are very close to the full maximum likelihood estimates. A supposed problem with the heuristic procedure, as stated for example by Muthén (1978), is that of obtaining a statistical test of model fit. However, a chi-squared goodness-of-fit test may be obtained by direct analogy with Section 4 of this paper, where the computation of the test statistic requires the evaluation of a number of multivariate normal integrals. Perhaps a more difficult problem with the heuristic method is that of obtaining standard errors. As Professor Aitkin has noted, this problem is not referred to in this paper, and it would be interesting to know whether the method in Section 6 can be simply adapted to give standard errors. Finally, I find in teaching this subject that the probit model provides an easily understood modification of normal theory factor analysis, and I find it valuable to demonstrate simple links between contingency table analysis and continuous variable analysis.

Dr G. J. G. UPTON (University of Essex): Professor Bartholomew has set up an elegant mathematical superstructure which fills me, at least, with awe and wonder. I will therefore confine my comments to a discussion of the results that he has obtained.

Table 4 includes the y-scores for the Lombard and Doering data, which have apparently been obtained by numerical integration of (10). These scores are, however, simply connected to the estimates of the α parameters. Using subscripts i, j, k and l, each taking the values 0 or 1 according to the cell definition, I note that an excellent fit to the y-scores is given by
$$4.895\, y_{ijkl} = 1.024 + \alpha_{1i} + \alpha_{2j} + \alpha_{3k} + \alpha_{4l}.$$

I have been unable to find an explanation for this equation, but the fit is too good to be accidental.

I am unhappy about the scant attention paid by Professor Bartholomew to the interpretation of his results. In particular, I cannot believe that my book has been so widely read that it is unnecessary to define the dimensions for the manifest data set IV. These are referendum votes (for or against entry into the Common Market), political allegiance in 1975, amount of schooling, union membership and social class. The order of the categories of union membership given is the reverse of that in my book. The positive α-values place the minimally-schooled anti-Common Market working-class Labour union member at the left-hand end of the axis, which is reasonable. However, it is distinctly surprising that the strongest contrasts are those related to referendum vote and union membership rather than political allegiance.

In Section 8 Professor Bartholomew fits a second latent factor to the referendum data. For this factor it is the minimally-schooled anti-Common Market non-union middle-class Conservative who is at one end of the axis. Could this be an age dimension?

However, I am unhappy about the assumption that one of the dimensions found in the two-factor model will of necessity be the dimension found in the one-factor model. In a case where there were two latent dimensions, I would really have felt that the single dimension found by fitting the one-factor model is more likely to be an over-worked hybrid lying between the two. My hypothesis could be tested by creating a data set which was derived, without random variation, from two latent variables, and then fitting the single latent variable model.

Mr G. J. A. STERN (I.C.L.): Many branches of science seem to follow the course of the cosmological theories suggested in the following lines, which consist of a couplet by Pope and a modern sequel:

Nature and Nature's laws lay hid in night,
God said, "Let Newton be," and all was light.
It did not last, the Devil shouting "Ho!
Let Einstein be!" restored the status quo.
Darkness is followed by a clarifying theory, which is followed by sophistication leading to incomprehensibility and possibly meaninglessness, which is followed by worse darkness, so that clarification follows a course in the shape of an upside-down U-curve.

It seems to me that Spearman's G meant something: many of the original measures of intelligence were highly correlated variates, and I would suggest that their linear combinations led to a single factor. Likewise, with principal components, where the combination has a meaning, as is often the case, it means something. It is harder to see what the full factor model means, with all sorts of non-orthogonality and the impossibility of expressing the factors in terms of the original variates (except by estimation).


If a factor analysis type of theory were to be applied at all, it should be, I suggest, to precise data where there are many readings, so that precise estimates of parameters can be made. I would suggest that the cancer data, for example, are far from that. Can people really recall accurately from which combination of radio, papers, etc. they got their cancer knowledge? I suggest that another sample would yield different answers, which would greatly alter the estimates of the parameters. Moreover, even the one-factor model is fitting eight parameters and a variate to what are really sixteen points, and a two-factor model would be still worse, leading to estimates with (as I believe) a huge variance.

I don't know what the answer is, even after playing with the data of Table 4, but would venture these suggestions:

(1) In fact, even the independence fit is not all that bad, having regard to the imprecision of the data. At most, I suggest, a slight modification of this assumption is needed.
(2) More than one factor should not be considered, for the above reasons.
(3) Possibly the model would be more convincing if y had a physical meaning, perhaps related to the correlation between the answers to the questions.
(4) If parameters as well as y are needed, I would suggest that the α's should not be used, but only the …

In conclusion, I suggest that with social multivariate data we are often trying to explain, comment on, or look at rather imprecise figures in a way which adds to people's understanding of what the data are saying. Has this been achieved here? I think that quite a few assumptions have been built into the theory whose implications the user will often not fully comprehend, and so in many cases it will be hard for the user to know what has been achieved. Certainly I think this is a core on which a clearer and simpler theory could be built, and I would hope that this will be done. The foregoing expresses my own view.

Dr P. M. E. ALTHAM (Cambridge University): I would like to thank Professor Bartholomew for a useful and stimulating paper, and make two brief points.

(i) Although I should perhaps think further about Dr Upton's comments before I speak, my impression is that I would not find it too hard to interpret the latent structure models to a social scientist, and certainly not harder than interpreting a loglinear model with complicated high-order interactions.

(ii) Latent structure models possess the following feature, which I find attractive. The essential feature of the model is that, for the observable variables x_1, ..., x_p, which are generally discrete, we postulate the existence of latent variables y such that, given y, x_1, ..., x_p are independent. Thus the joint distribution of any subset of x_1, ..., x_p has the same structure as that of x_1, ..., x_p. This seems a desirable property in social science applications, where the number of observable x's may not be very clearly defined: the social scientist would probably want to include or exclude extra x's, or "questions", without drastically altering his model. This "invariance" feature is not shared by loglinear analysis, although it must of course be recognised that loglinear and latent structure analyses are addressing rather different problems, albeit for the same type of data. A consequence of this property, as pointed out already by Dr Skene, is that Professor Bartholomew's logit model is "unaffected" by missing data; I only wish to put the positive advantages of the property more strongly.

The following contributions were received in writing, after the meeting.

Professor E. B. ANDERSEN (University of Copenhagen): It has been very stimulating to read Professor Bartholomew's new unifying approach to latent structure analysis. The key issue is, of course, how to model in latent space.
Although Professor Bartholomew argues very forcefully for always having a uniform distribution of the latent variable, it is important to note that many arguments demand that we consider latent distributions with parameters. We may thus be interested in comparing several latent distributions, or we may be interested in changes in a latent variable over time. In such cases a statistical analysis will usually take the form of a comparison of the parameters of different latent distributions. One of the models mentioned by Professor Bartholomew, with G a logit and H a probit, was considered in a paper by Andersen and Madsen (1977), and it has recently been extended to cover the type of comparisons I mentioned above (Andersen, 1980). If one compares the approach of Professor Bartholomew with the results just mentioned, it appears that the α-parameters of model (6) play different roles. Some of them are connected with the manifest variables, and some of them relate more to the parameters of the latent variables. For an interpretation of the results of an analysis it may be worth the effort to make such a distinction.


As an example, if we consider the G-logit, H-probit model with one latent variable, α_i0 will combine the item parameter of the simple logistic item characteristic curve model (or Rasch model) and the mean of the latent variable, while α_i1 is a constant equal to the standard deviation of the latent variable.

Dr J. A. ANDERSON (University of Newcastle upon Tyne): Professor Bartholomew's paper is doubly welcome because it combines two important topics, multivariate categorical data analysis and factor analysis. His paper provides a very helpful summary of the necessary properties of models for factor analysis and contains the interesting results on the expectation of R_ii. I agree entirely that these factor models are important and useful, but I still retain some preference for the probit model.

The method of estimation suggested here is to fit the one-way margins exactly and to optimize a measure of the goodness of fit of the two-way margins. A similar approach has been established for the probit model by Mackenzie (1976), with the advantages that (i) equation (25) is exact and (ii) maximum likelihood estimation for the (α_ij) conditional on the estimates for the (α_i) is feasible for many more dimensions (p = 10); these estimates can be shown to be asymptotically efficient, and standard errors can be derived. Bock and Lieberman's (1970) limitation, p < 12, refers to the simultaneous estimation of (α_i) and (α_ij), and may in any case be superseded by better methods of optimization.

A more fundamental concern about the logit model, when there is more than one factor, relates to rotatability. In continuous factor analysis with, say, k factors, it is possible to estimate only a k-dimensional factor space (Lawley and Maxwell, 1971). The choice of factors within this space is determined subjectively or by external criteria. Rotation of the factor space corresponds to rotation of the factor loadings. The probit model for categorical factor analysis has exactly the same property. However, the logit model appears not to be rotatable since, unlike the normal case, independent homoscedastic logistic variates are not invariant under rotation. Since the probit and logit models are so close in other respects, I am concerned that there is an approximate rotatability in the logit model which would lead to numerical instability in the estimation procedure unless recognized and dealt with as in the continuous factor case. This field has been neglected in the statistical literature, and Professor Bartholomew is to be congratulated both on his results and on stimulating our interest.

Mr C. L. F. ATTFIELD (University of Bristol): I found the paper particularly stimulating, as the topic is not one with which I am altogether familiar, and so the paper served to introduce me to the previous attempts to solve the problem which, as Professor Bartholomew shows, can be viewed as special cases of his more general approach. I was impressed by the accuracy of the parameter estimates obtained by the "logit first approximation" method for the one-factor case which, Professor Bartholomew states, can be obtained on a pocket calculator. The approximation should prove an invaluable tool in the preliminary analysis of categorical data.

It would be interesting to see the result of relaxing the assumption of independence of the latent variables. In working with unobservable economic variables I find independence very difficult to justify. Would it be possible to work the analysis through without imposing the independence condition and then construct a test for independence?

I would argue with Professor Bartholomew's remark in Section 2 that latent variables which are "real", i.e. can in principle be measured directly, such as "personal wealth", are quite rare. On the contrary, most latent variables in economic theory are of exactly this form, e.g. permanent income, the expected rate of inflation, anticipated investment. It is true that in the majority of these cases there is no problem, because the latent variables can be associated with manifest variables (or a transformation of them) which can be assumed to be continuous and distributed as multivariate normal. The models can then be estimated using the GLS procedure due to Browne (1974) or the maximum likelihood method due to Jöreskog outlined in Jöreskog and Sörbom (1974).

Professor H. GOLDSTEIN (University of London): I found Professor Bartholomew's paper interesting and useful, but I am a little puzzled by the importance he attaches to assumption (iv) in Section 3. This seems to me to lead to an unnecessarily restrictive set of models, and I fail to see why, in general, 1 - π_i(y) should belong to the same family of functions as π_i(y). For example, with a correct/incorrect response to an examination question, one would normally expect a correct response, say to a multiple-choice question, to be obtained by way of different mental processes from an incorrect one, and I would not therefore expect (iv) to hold. I have argued elsewhere (Goldstein, 1980) that the complementary log-log function (which does not satisfy (iv) but does, incidentally, satisfy (ii)) may in some circumstances be more appropriate than the logit or probit for examination-type data. Whilst I would accept Professor Bartholomew's justification of (iv) for many kinds of data, it would seem unfortunate if it were to be used where more realistic functions are available.

On the topic of conditional independence (referred to as "local independence" in much of the psychometric literature), Professor Bartholomew asserts in Section 2.2 that, if conditional independence were not true, this would imply that some other latent variable was exerting an influence on the manifest variables. I am not sure I agree. Suppose we have determined the latent space and choose a set of individuals at a single point in this space, that is, all having the same set of values on the latent variables. If we consider a 2^p table of responses, then conditional independence means that the response probabilities in this table depend only on the margins of the table and, through these, on the latent variables. This, however, seems to be a rather strong assumption, and even if conditional independence did not hold, we might still be able to relate the appropriate "interaction" probabilities in this table, via additional parameters (loadings), to the same set of latent variables. In practice we could presumably attempt this only if we had replicate independent observations on individuals, although this is a difficult thing to achieve in the social sciences. The dimensionality of the latent space is concerned with the between-individual variation, whereas the within-individual dependencies will determine how many parameters are associated with each latent variable.

I would like to endorse strongly what Professor Bartholomew says about the care needed in interpreting results from latent variable models. The history of factor analysis seems to be full of what are essentially convenient mathematical devices being confused with substantive reality. One way of inculcating a proper caution is, of course, to show that different but reasonable models, including some which do not satisfy (iv), can lead to quite different interpretations of a common data set. I very much hope that Professor Bartholomew will give some further thought to this issue.

The AUTHOR replied later, in writing, as follows.

The discussion has raised many fundamental questions which merit a more extended reply than the present limit on time and space allows. I am most grateful to those who submitted contributions, and the following remarks are intended as a first, incomplete contribution to what will, I hope, be a continuing discussion.

Several speakers have drawn attention to the possibility of using a latent class model in which the latent variable is categorical. As Mr Skinner pointed out, such a model with two latent classes arises as a special case of our general model. Thus suppose we choose

$$\pi_i(y) = \begin{cases} \pi_{i1}, & 0 \le y < y_0, \\ \pi_{i2}, & y_0 \le y \le 1. \end{cases} \qquad (A)$$

Then y_0 is the proportion of the population in latent class 1. Models with three or more ordered latent classes arise similarly. The function in (A) satisfies all the conditions of Section 3, though it is not expressible in the form (6), and hence the parameters are not interpretable in the same way. In so far as (A) can approximate the logit (or probit) functions, or vice versa, we would expect the latent class model to give fits similar to ours. It is reassuring to find from Professor Aitkin's calculations that this is indeed the case. It is worth noting that the coefficients of his discriminant functions in his examples will repay detailed study. In particular, in the example on cancer knowledge they exhibit the same pattern as the α's in our model, indicating that written sources of information carry more weight than what is heard through radio or lectures.

The choice between a latent class and a latent variable model may be made on grounds of realism or of computational convenience. My own experience suggests that continuous latent variables are almost always more realistic but, in any case, as we have just shown, both kinds of model can be accommodated within the one framework. Professor Aitkin's case is thus essentially one for using functions like (A). He claims computational advantages which stem from the existence of computer programmes which can easily be adapted for fitting the model. There may well be short-term advantages here, but the main thrust of the paper was to look at a wide class of models within a framework which allows their relative merits to be assessed. I would caution against the premature adoption of second-best models until the full range of options has been thoroughly explored and evaluated.

From a practical point of view there seems to be little at stake in the argument over whether a probit, logit or hybrid version of the model should be used. My own decision to investigate the logit model was largely stimulated by the discovery of its close link with the cross-product ratios. The latter seem more natural measures of association than tetrachoric correlations, and the remarkably good approximation provided by (14) may be indicative of some deeper connection yet to be uncovered. At the very least it justifies further study. In the long run I see no obstacle to making the general logit version of the model as easily applicable as the probit is now. The simplicity of the approximate method of fitting the one-factor logit model already promises well.

A number of questions were raised about the choice of p(y) and π(x | y). However, the point made in the paper about the essential arbitrariness involved in this choice does not seem to have been fully taken. Another way of making the point is to observe that any change of variable in the integral of (1) leaves f(x) unchanged. Since f(x) is all that can be estimated, there is no empirical means of distinguishing between any of the combinations of functions which lead to a given f(x). It is thus pointless to argue about the most realistic form of p(y); the question is purely one of convenience. By the same token, analyses which depend in an essential way on any particular form of p(y) have little meaning. For this reason I am doubtful about the consequences of Dr E. B. Andersen's proposal to make the parameters of p(y) functions of time.

Several speakers mentioned, directly or by implication, the lack of standard errors and the disregard of questions of efficiency. This omission was not entirely due to lack of space but arises in part from misgivings about the relevance of such concepts in an analysis which is exploratory and descriptive. The overall goodness-of-fit test is certainly useful. How far standard errors contribute to the interpretation of the analysis over and above that seems more questionable. The root of the matter has to do with how far it is sensible to consider repeated sampling from a population which, in a certain sense, has no real existence.

Dr J. A. Anderson, Dr Skene and Dr Upton all raise matters concerning the case q > 1. There was no space to develop this in the paper, and their remarks have been noted for future use. For small α's the normal theory carries over by Theorem 2, but this is scarcely adequate for general use. The probit model has real advantages here, and the closeness of the logit and probit models means that the estimates of the parameters of one can easily be transformed to give estimates of the other.

No one commented on Section 7, which shows how any model of the family can be fitted to tables with several ordered categories. This brings a much wider range of data within the scope of the method, and further work is in progress to implement it.

Apart from these general issues, a number of specific points were raised, and these are dealt with, in order, below.

Professor Aitkin's example involving the use of principal components in regression is a good instance of the benefits of setting problems in a general latent variable framework. His example is used to illustrate the same point in Bartholomew (1981).

Dr Skene is right to point out that the mean may not be a good measure of location for the posterior distribution, but the remark has to be interpreted in the light of the arbitrariness of p(y). By choosing y to be uniform, we ensured that E(y | x) had a "distribution-free" interpretation as the expected quantile of an individual chosen at random from those with a given x.

Mr Skinner's remarks about the relationship between the probit and logit estimation methods are illuminating. I would part company with him in seeing it as a virtue of the probit model that it can be easily linked with the normal factor model. For teaching purposes I think that this obscures rather than reveals the common underlying structure shared by all latent variable models.

Dr Upton's formula for the y-scores is intriguing but not altogether surprising. It is another indication of the simple common structure which emerges, for example, in Theorem 2. For sufficiently small α's, the y-scores will satisfy an equation of the form
$$A\,y = B + \sum_{i=1}^{p} \alpha_i x_i.$$
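Dr Upton's observation and the approximate linear relation above can be explored numerically. The Python sketch below assumes, for illustration only, a one-factor model of logit form, logit π_i(y) = α_i0 + α_i1 logit(y) with y uniform on (0, 1); this parameterisation and all the α values used are our assumptions, not figures taken from the paper's tables. It computes the y-scores E(y | x) for every response pattern with a midpoint-rule integral and then fits a straight line in X = Σ α_i1 x_i. Under this assumed model the posterior of y depends on x only through X (the α_i0 terms cancel on normalisation), so the only approximation being tested is the linearity itself.

```python
from itertools import product

import numpy as np


def y_scores(alpha0, alpha1, n_grid=20000):
    """E(y | x) for every 0/1 pattern x, assuming (illustratively)
    logit pi_i(y) = alpha0[i] + alpha1[i] * logit(y), with y ~ Uniform(0, 1)."""
    y = (np.arange(n_grid) + 0.5) / n_grid                 # midpoint grid on (0, 1)
    a0 = np.asarray(alpha0, dtype=float)[:, None]
    a1 = np.asarray(alpha1, dtype=float)[:, None]
    eta = a0 + a1 * np.log(y / (1.0 - y))                   # linear predictor, shape (p, n_grid)
    log_pi = -np.logaddexp(0.0, -eta)                       # log pi_i(y), numerically stable
    log_1m_pi = -np.logaddexp(0.0, eta)                     # log(1 - pi_i(y))
    scores = {}
    for x in product((0, 1), repeat=len(alpha0)):
        xa = np.array(x, dtype=float)[:, None]
        log_lik = (xa * log_pi + (1.0 - xa) * log_1m_pi).sum(axis=0)
        w = np.exp(log_lik - log_lik.max())                 # unnormalised posterior of y
        scores[x] = float((w * y).sum() / w.sum())          # posterior mean = y-score
    return scores


def linear_fit(alpha1, scores):
    """Least-squares line E(y | x) ~ b + c * X, with X = sum_i alpha1[i] * x_i.
    Returns (b, c, largest absolute residual)."""
    big_x = np.array([np.dot(alpha1, x) for x in scores])
    ey = np.array(list(scores.values()))
    c, b = np.polyfit(big_x, ey, 1)                         # polyfit returns slope first
    resid = ey - (b + c * big_x)
    return b, c, float(np.abs(resid).max())


alpha0 = [0.5, -0.2, 0.1, 0.0]      # hypothetical intercepts
small = [0.2, 0.3, 0.25, 0.15]      # hypothetical small slopes
large = [1.5, 2.0, 1.0, 1.8]        # hypothetical larger slopes
for a1 in (small, large):
    b, c, worst = linear_fit(a1, y_scores(alpha0, a1))
    print(f"slopes={a1}: y-score ~ {b:.3f} + {c:.3f} X, max |residual| = {worst:.1e}")
```

With these hypothetical values the residuals for the small slopes should be much smaller than for the larger ones; how good the linear form is for the fitted Lombard and Doering parameters could only be judged with the actual estimates.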

It is surprising that this linear form is so good when the α's are not small. This example encourages the hope of finding a general linear approximation which would avoid the need to evaluate the integrals required for the y-scores. I share his regret at the scant attention paid in the paper to the interpretation of the results which, again, was solely due to lack of space. I am grateful to Dr Upton and other contributors for helping to remedy the deficiency.

As always, Mr Stern puts us on our guard against undue sophistication. In this case, however, I would dispute all of his conclusions. The study of cancer knowledge was not concerned with asking people about the source of their knowledge of cancer. The point of trying to find latent variables which could be identified with "knowledge about things in general" was to see how this related to knowledge of cancer.

Drs Altham and Skene drew attention to an important property of the model which is extremely useful in social applications, not least in the regression problem referred to by Professor Aitkin. There are usually a great many possible manifest variables, from which those included in the study are often chosen in an arbitrary fashion. As these contributors imply, the arbitrary manner of this choice does not invalidate the model. It is for this reason that the "naive" interpretation of linear combinations of manifest variables recommended by Mr Stern may be less simple than appears at first sight.

I was interested to hear from Dr J. A. Anderson about Mackenzie's work and would commend it to Professors Aitkin and Bock, since it would be interesting to see how it compares with their method.

Mr Attfield raised two matters which deserve more attention than is possible here. Among economists, latent variables such as those he mentions are, I suspect, firmly established in the language of the theory. They will often be correlated, and it is natural to want to express the analysis in terms which will relate to economic theory. This can be done, in principle, for the models given here, but I would prefer to work with orthogonal latent variables at the first stage and then subsequently transform them to dimensions which are economically meaningful. On the question of "real" latent variables, I do not think that there is any substantial difference between us. I defined a real latent variable as one which could, in principle, be measured. Mr Attfield regards a variable as real if economists believe it has meaning in economic discourse. The two definitions are not equivalent.

Professor Goldstein's points are, likewise, fundamental. Whatever our differences, I fully endorse his penultimate sentence. It is certainly possible to conceive of a situation in which p(y) does not satisfy assumption (iv) of Section 3. The question at issue may be put as follows: does the labelling of the categories convey any information which is relevant to the analysis? If not, then condition (iv) follows. Even if this argument is not accepted, there are substantial empirical grounds for adopting (iv). Suppose we take some simple function not satisfying (iv), such as m(y) = y^a. Then the function m(y) = 1 - (1 - y)^a might appear to serve equally well. Which form should we choose? With p dimensions to the table there are 2^p possible combinations, and it would be a formidable task to investigate all of them and choose the best. Invoking condition (iv) reduces the options to one. If we decide to use a function not satisfying (iv), we have to find some extraneous grounds for preferring a particular form. In view of all that we have said about the arbitrariness involved, this seems to be a formidable task.

The crucial assumption of conditional independence is more in the nature of a definition than an empirical assumption. It is a formal statement of what we mean when we say that the variation among the x's is completely explained by their dependence on the y's. If the latent variables are constructs, the assumption could never be tested empirically, and so it does not seem appropriate to speak of it in terms of being true or false. There is clearly much more to be said on this, and on many of the other issues raised, and I hope that the debate will continue.
REFERENCES IN THE DISCUSSION

ANDERSEN, E. B. (1980). Comparing latent distributions. Psychometrika, 45, 121-134.
ANDERSEN, E. B. and MADSEN, M. (1977). Estimating the parameters of the latent population distribution. Psychometrika, 42, 357-374.
BARTHOLOMEW, D. J. (1981). Posterior analysis of the factor model. Brit. J. Math. Statist. Psychol., to appear.
BROWNE, M. W. (1974). Generalized least squares estimators in the analysis of covariance structures. S. Afr. Statist. J., 8, 1-24. Reprinted in Latent Variables in Socio-economic Models (D. J. Aigner and A. S. Goldberger, eds), pp. 205-226. Amsterdam: North-Holland.
DEMPSTER, A. P., LAIRD, N. M. and RUBIN, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with Discussion). J. R. Statist. Soc. B, 39, 1-38.
GOLDSTEIN, H. (1980). Dimensionality, bias, independence and measurement scale problems in latent trait test score models. Brit. J. Math. Statist. Psychol., in press.
HASSELBLAD, V., STEAD, A. G. and CREASON, J. P. (1980). Maximum likelihood estimation for multiple probit analysis. Biometrics, to appear.
JÖRESKOG, K. G. and GOLDBERGER, A. S. (1975). Estimation of a model with multiple indicators and multiple causes of a single latent variable. J. Amer. Statist. Ass., 70, 631-639.
JÖRESKOG, K. G. and SÖRBOM, D. (1974). Statistical models and methods for analysis of longitudinal data. In Latent Variables in Socioeconomic Models (D. J. Aigner and A. S. Goldberger, eds), pp. 285-325. Amsterdam: North-Holland.
JÖRESKOG, K. G. and SÖRBOM, D. (1978). LISREL IV. A general computer program for estimation of linear structural models by maximum likelihood method. User's guide. Dept of Statistics, University of Uppsala.
LAWLEY, D. N. and MAXWELL, A. E. (1971). Factor Analysis as a Statistical Method, 2nd ed. London: Butterworth.
MCKENZIE, J. R. (1976). Some statistical identification problems. Ph.D. Thesis, Oxford University.
SKENE, A. M. (1978). Discrimination using latent structure models. In COMPSTAT 1978 (Corsten and Hermans, eds). Vienna: Physica-Verlag.
