FACTOR ANALYSIS IN A TEST-DEVELOPMENT PROGRAM
BY J. P. GUILFORD
University of Southern California

The development of tests to meet the apparent requirements of selection and classification of personnel to fit particular assignments has traditionally proceeded along two lines. New tests have been (1) based upon the work-sample principle or (2) designed to measure hypothetical traits that are thought to be important as a result of a psychological job analysis. There is a third basis upon which test production and refinement can be accomplished, namely, the approach through factor analysis.

The aptitude-test development early in the Army Air Forces psychological program in World War II followed the traditional routes. The reasons were several. The great pressure for a valid classification battery at an early date favored the more direct and expedient steps and the exploitation of obvious leads. The factorial approach requires for its full exploitation an accumulation of intercorrelational information regarding tests and criteria. As this type of information grew, the basis for an increasing use of factor-analysis theory and practice was strengthened. There was, to be sure, much previous knowledge of factors, primarily from the work of Thurstone. As things turned out, it would have been very profitable to have utilized this as a starting point. The prior knowledge was by no means ignored, but the test-development program had been planned primarily from the traditional points of view.

In this article the emphasis will be upon the presentation of factor theory as a basis for rational test development and the advantages of factorial methods over other approaches. The discussion will be confined to one type of factor analysis (the Thurstone centroid method with rotation of reference axes), since experience has shown that that method gives the most psychologically meaningful results. Some illustrative findings from the AAF psychological program will be cited, with their implications. The discussion will begin with brief comments on the work-sample and the job-analysis modes of operation as a background for a presentation of the factorial approach. The three procedures can best be compared in answer to the question, "Why are tests valid?"

I. WHY ARE TESTS VALID?

1. Work-sample tests. A work-sample test ordinarily presents a task that obviously resembles the features of a job or of some elementary component of the job. In the AAF classification battery, a good example of a work-sample test is known as the Complex Coordination Test. Developed between the two World Wars, it was apparently designed to present a task analogous to that of a pilot operating an airplane in flight.[1] The adjustment of a pilot's stick and rudder bar in response to changing signals confronting the examinee in this test is clearly similar in many respects to the activity of a pilot in controlling an airplane. Another test, known as Dial Reading, and still another, known as Table Reading, seem to duplicate in part the tasks of pilot, of navigator, and of bombardier, all of whom must read dials and tables quickly and accurately as parts of their duties.

[1] For full descriptions of tests mentioned in this article, see the AAF Reports (1).

Work-sample tests rarely fail to exhibit some degree of validity. The extent of the validity will depend upon the completeness of the simulation of the job task and the relative importance of the task to the job as a whole. There is no question but that the most complete aptitude test would be the job itself. The best test of aptitude for learning to pilot an airplane would be to train the applicant in flying an airplane. A closely simulated task would be his performance in a good synthetic trainer for pilots. Both of these tests would be costly in terms of time, equipment, and, in the one case, possibly human lives. For these reasons, simpler, less expensive tests are to be recommended.

The use of a simpler work-sample test also presents some disadvantages. Even rather complex work-sample tests rarely cover the entire job for which selection is being made. A pilot, whether civilian or military, must be able to do more than maneuver an airplane safely. In order to cover all significant aspects of his job, other work samples must be brought into the picture. It may be supposed that if a sufficient number of work samples are included in a battery, covering all critical activities of any specialist, a fairly high degree of prediction would be attained. The number of activities in jobs, however, is rather large. The number of tests required to simulate them would be correspondingly large. The amount of duplication found in the complete coverage of a single job in this manner would probably be excessive and the practice would thus be wasteful. Even if only one work sample were required for each job, there would have to be as many tests as there are jobs or distinct types of jobs. That number is enormous.

While the validity of a work-sample test is usually taken for granted, there are sometimes puzzling results. No one would be amazed at the relatively high validity of the Complex Coordination test for selecting pilot trainees, with a typical validity coefficient of .39 (within the range of talent prevailing at the time aviation students were classified for aircrew training). It was designed as a pilot test, and the obvious similarity would seem to account for its predictive value in this connection. But it also proved to be equally valid for other assignments when the criteria had much less similarity to the test: a validity coefficient of .38 for air mechanics (when the criterion was a composite of academic grades and N was 300), and a validity coefficient of .40 for flexible gunners (when the criterion was a final academic examination and N was 173). It had lower, but substantial, correlations with accuracy of pistol firing (.30 when N was 350) and with carbine firing (.25 when N was also 350). Surely such versatility for a test needs to be explained. If it can be valid for predicting such disparate activities, ranging from academic grades to pistol firing, some explanation is demanded.

Similarly puzzling are the correlations of Complex Coordination with other tests which it does not at all resemble superficially. It was found to correlate almost as high with some paper-and-pencil tests (by name, Dial and Table Reading, .35, Instrument Comprehension, .36, and Mechanical Principles, .32) as it did with other psychomotor tests (.26 with Discrimination Reaction Time, .38 with Rotary Pursuit, and .32 with Rudder Control). Such facts leave one with a craving to understand these unexpected communalities of the Complex Coordination test. Such findings are not by any means confined to this test. It is my contention that only by making a study of the intercorrelations of tests and criteria are we able to understand such mysteries. It is also contended that the understanding of communalities among tests leads to further enlightened progress in test development and in the discovery of new variables in human personality, as subsequent discussion will show.

2. Tests based upon psychological job analysis. A second answer to the question is that tests are valid because they measure hypothetical traits observed in making a job analysis. It requires some amount of job analysis, to be sure, to enable one to arrive at work-sample tests, but this hardly deserves the name 'job analysis' from the point of view of the psychological technologist. Psychological job analysis is designed to arrive at less superficial results. Its goal is to break a criterion activity down until the significant psychological abilities and traits have been identified. The abilities and traits so observed may be designated by common recognized psychological terminology.

In the analysis of the training of an aircraft pilot, such familiar categories as memory, attention, perception, judgment, comprehension, foresight, planning, coordination, fear, apprehension, and temperament were applied. Other traits that were used had little precedent in psychology, but, on the other hand, are more natural to the lay observer of aviation performance: "sense of sustentation, feel of controls, appropriateness of controls used," and the like. These categories could, no doubt, be further analyzed from the 'armchair' and could be translated into familiar psychological terms, such as tactual and kinesthetic perception.

Tests designed on the basis of this type of job analysis are variously successful. When they succeed, one has the basis for believing that the observation of the trait, its definition, and its translation into a test have all been correct. If the tests turn out to be invalid for the purpose intended, the natural conclusion is that there was failure at one of the steps; which one is not immediately apparent. Factor analysis of such tests has shown, however, that a valid test constructed on this basis may actually be valid in spite of, rather than because of, the procedure followed. Some examples will support this statement. A test that was designed to measure supposed traits of speed of decision and reaction was found to measure factors identified as speed of perception, awareness of spatial relations, and psychomotor precision.[2] Further factor analysis may show that the supposed traits of speed of decision and of reaction may actually be represented by this test, but the evidence indicates that its entire validities for predicting the pilot, navigator, and bombardier criteria can be attributed to the three factors named. As another example, a test developed to measure practical judgment was found upon analysis to measure factors identified as verbal comprehension, mechanical experience, and reasoning, each to a higher degree than it measured a factor that could be called judgment. Tests developed to measure mechanical comprehension very often measure the power to visualize almost as much as they do a mechanical factor, and this mechanical factor is one of knowledge or experience rather than one of comprehension.

[2] Definitions of factors mentioned in this article will be found in reference (1), Report No. 5, and more briefly in reference (3).

The prevailing approach to aptitude-test development to date has been closest to that of job analysis. Pre-war progress had gone very little beyond the procedures so well set forth by Hull about twenty years ago (4). Most of the success thus far achieved must be accredited to this mode of operation. From what is to follow, however, it will
be seen that this approach has been largely a blind one and has progressed more by trial and error than by virtue of real understanding of what the requirements of prediction actually are and of what specific tests have to offer to meet those requirements.

3. Tests based upon factor analysis. The third answer to the question "Why are tests valid?" is that they measure in common with the job criterion certain fundamental factors. These variables must not be confused with the usual job-analysis categories, for they are derived in a very different manner. The remainder of this paper will be devoted to a brief account of the assumptions and theory underlying factor analysis as applied to test validity, and to the many advantages of this approach.

II. RESUME OF FACTOR THEORY

1. Fundamental assumptions. The first assumption of the factorial approach is that tests and criteria alike can be statistically analyzed into a limited number of basic traits that additively make up the total variance of each test or criterion. The term 'variance' may be regarded, roughly and simply, as merely a more exact expression for the idea of 'individual differences.' It should be emphasized that the analysis is statistical rather than observational in the ordinary sense. One reason why more of the factors have not been detected without the use of statistical procedures is that they are not obvious to surface inspection. For the most part, owing to the extreme complexity of the person observed and of his activities, they have eluded even the sophisticated observer. They are, however, usually recognizable and acceptable to most observers when they are pointed out after having been discovered by statistical analysis. The hiatus between a list of ordinary job-analysis traits and a list of statistically derived factors is most striking. They coincide at some points, but are generally divergent.

To express the first assumption more exactly, the total variance of any test or of any criterion can be subdivided into the following component variances: (1) a number of common factors (common in the sense that they appear in more than one test or criterion); (2) a possible specific factor (a factor that appears consistently in the same variable from time to time but not in other variables); and (3) error variance. Mathematically, if we let the total variance of a test or criterion be equal to 1.00, the same ideas can be expressed by the equation

$$\sigma_x^2 = a_x^2 + b_x^2 + \dots + n_x^2 + s_x^2 + e_x^2 = 1.00 \tag{1}$$

in which
$\sigma_x^2$ = the total variance of test X,
$a_x^2$ = the proportion of variance contributed by factor A,
$b_x^2$ = the proportion contributed by factor B,
$n_x^2$ = the proportion contributed by the nth factor,
$s_x^2$ = the proportion of specific variance, and
$e_x^2$ = the proportion that is error variance.

Thurstone has further defined the sum of the common-factor variance as the communality, which is described by the equation

$$h_x^2 = a_x^2 + b_x^2 + \dots + n_x^2 \tag{2}$$

The reliability of a test (its proportion of non-error variance) is described by the equation

$$r_{xx} = a_x^2 + b_x^2 + \dots + n_x^2 + s_x^2 = 1 - e_x^2 \tag{3}$$

It might be added here that what may appear to be specific variance in a test may prove to be one or more additional common factors when the test's relationships to new types of tests are taken into account. This statement does not preclude the possibility of genuine specific factors. In practice we have little interest in them, since nothing can be predicted from them outside the tests having them.

A second important assumption is that the intercorrelation of any two variables depends upon the factors that they have in common, and the factor weights or loadings. The general equation to describe this idea is

$$r_{jx} = a_j a_x + b_j b_x + c_j c_x + \dots + n_j n_x \tag{4}$$

in which
$r_{jx}$ is the correlation between test X and criterion J,
$a_j$ = the loading for factor A in criterion J,
$a_x$ = the loading for factor A in test X,
and other symbols have analogous meanings.

A validity coefficient is therefore regarded as a summation of the cross products of the factor loadings of factors that test and criterion have in common. A factor loading is identical with the correlation of a measured variable with a factor.

From equation (4), several deductions can be drawn. It can be seen that the validity of a test or of a test battery can be maximized by taking several steps. One of these is to make sure that every factor that has a weight in the criterion also has a weight of like algebraic sign in the test or battery. Increasing the present validity of a test battery is best assured by adding coverage of an additional factor, or factors, not now represented. Another step toward maximal validity would be to increase the loading of a factor already measured in either test or criterion. Since a criterion is usually a relatively fixed quantity, except as there are improved measures of it, this mode of improvement would have to come from improved tests. There are limitations to improvement from this direction, for maximal validity also requires optimal weighting of the various factors in keeping with their loadings in the criterion.

2. Some general illustrations. Let us apply the two fundamental assumptions just stated to some general illustrative situations of tests and a criterion. In Fig. 1 are shown in graphic form the component variances of three tests and of a criterion J. The criterion has variances in four independent common factors, A, B, C, and D. Test 1 has variances in factors A and C; test 2 in factors A, C, and G; and test 3 in factors B and D. The proportions of these variances are indicated by sizes of areas of rectangles. Their composition of a total, along with specific and error variances, illustrates the summative equation previously given (equation 1).

[Figure 1: bar diagrams, one each for Test 1, Test 2, Test 3, and Criterion J, divided into component variances.]
FIG. 1. Diagrams of the segregation of variances of tests and a criterion into component variances. Letters A to G stand for variances in common factors. S stands for a specific variance, and E for error variance, with subscripts of each in keeping with the total variable.

It can be seen that all three tests have something in common with the criterion and may be regarded as being valid for the prediction of this criterion;
tests 1 and 2 by reason of sharing factors A and C, and test 3 by reason of sharing factors B and D. The degrees of validity can be readily estimated from equation (4), as follows:

$$r_{1j} = a_j a_1 + c_j c_1 = (.4)(.6) + (.4)(.6) = .48,$$
$$r_{2j} = a_j a_2 + c_j c_2 = (.4)(.4) + (.4)(.35) = .30,$$
$$r_{3j} = b_j b_3 + d_j d_3 = (.3)(.7) + (.5)(.5) = .46.$$

Tests 1 and 3 are about equally valid, but for totally different reasons. Test 2 is valid by reason of the same factors as test 1, but its validity is much lower by reason of lower loadings in those factors. Test 2's high loading in factor G would make it a good contributor to the prediction of any criterion also loaded in that factor.

Since all three tests are valid against criterion J, let us consider the feasibility of combining them by pairs and note the relation of their common factors to multiple prediction. In this connection we must consider the intercorrelations of the tests. These can also be estimated from the common-factor loadings by using equation (4). They are: $r_{12} = .45$, $r_{13} = .00$, and $r_{23} = .00$. Knowledge of the traditional multiple-regression equation would lead us to expect that the combining of tests 1 and 2 would bring about little improvement over the use of test 1 alone. It would also lead us to expect that the combination of tests 1 and 3 would lead to a very material improvement in prediction of the criterion over that of the best single test, namely, test 1. These expectations are fully borne out by the results, and by reference to the diagrams the reasons will perhaps be more apparent. The multiple correlations are: $R_{j.12} = .49$, which is just .01 higher than the simple correlation $r_{1j}$; $R_{j.13} = .66$, which is .18 higher than the same zero-order correlation, $r_{1j}$; and $R_{j.23} = .55$, which is some improvement over the simple correlation $r_{3j}$. Here the reasons for desiring low correlations between tests when they both correlate positively with the same criterion which they are combined to predict are very clear.

As a matter of secondary interest here, if the variance in criterion J that is represented by $s_j^2$ turns out to be another common factor, it would be desirable to determine its nature and to develop a good test of it that could be added to the battery. Let us say that $s_j^2$ is equal to .14, or 14 percent of the total variance of the criterion. Then the loading corresponding to it, if the variance belongs to a single factor, would be the square root of .14, or .37. It would be well worth while to find a test, which, even if loaded only to the same extent with the factor, would add .14 to $r_{jx}$, the test validity.

The illustrations of factorial principles just given are in terms of simple, fictitious variables. Similar examples will be presented later in connection with actual AAF tests and criteria.

3. The general nature of factors. Before listing the advantages of developing tests on the basis of factorial knowledge, it is perhaps necessary to say something regarding the perennial issue as to the status of factors. Are they primary abilities, mathematical artifacts, real variables in personality, or culturally determined unities in behavior?

Operationally, factors are discovered from the systematic manner in which measures of individual differences intercorrelate. There seems no room for doubt as to the mathematical order so revealed, though there may be questions about specific findings in specific studies. Any finding is subject to further verification. Having found clusters of measures of individual differences that have much more in common with each
other than they have with other measures, the psychologist is usually not satisfied until he finds underlying 'reason,' or until he aligns the finding with other concepts, usually verbal; he tries to name the factor. The naming depends upon features that the clustering measures seem to have in common and that are unique to them. Attaching a meaningful label facilitates communication and systematic thinking. Any label or definition of a factor should be regarded as a hypothesis, in the same manner that any trait name is a hypothesis.

The least that can be said for factors is that they are convenient and dependable reference variables, derived by known operations that can be duplicated. This would seem to meet the best scientific requirements for research procedure. The dependability of factors has been amply demonstrated. The same factor will appear with comparable loadings in the same tests time after time. The factor known as perceptual speed was found to have loadings in a test entitled Speed of Identification (matching airplanes) as follows: .64, .58, .66, .67, .62, .65, .69, and .65. Analyses were based upon different groups of aviation students and involved different test batteries. A factor called spatial relations had loadings in the Complex Coordination test of .56, .50, .47, .47, .52, .52, .50, .46, and .46.[3] A general-reasoning factor appeared in Arithmetic Reasoning tests with loadings of .47, .47, .56, .50, .48, .47, .68, and .56. A factor denoted as visualization had loadings in a test called Mechanical Principles of .52, .49, .50, .39, .50, and .54. The verbal factor appeared in Reading Comprehension tests with loadings of .68, .63, .65, .65, .68, .69, and .73. The perceptual-speed factor was found correlated with the Complex Coordination test to the extent of .22, .22, .19, .20, .26, .22, .22, .28, .19, .25, and .14 in different analyses. The perceptual-speed loadings in the Mechanical Principles test were .00, .01, .03, .13, .12, .12, and .01 in as many analyses.

[3] This spatial-relations factor is not identical with the space factor of Thurstone. AAF results show that his S factor is probably a composite of spatial relations and visualization. There are times when without enough definitive tests for a factor in a battery that is analyzed two factors may refuse to separate. This type of failure is not unique to factor analysis.

At this point it must be admitted that the criteria for rotation of reference axes in factor analysis are not completely objective. There is much room for the judgment of the investigator to play a part. Indeed, experience has shown that completely 'blind' rotations are not desirable. Use must sometimes be made of previous findings. But there are limits imposed upon subjective judgment by the observance of the goals of positive manifold and simple structure. Within these limits one cannot take extreme liberties. If more tests were factorially pure, those objective requirements alone would probably suffice. Some of the consistency reported above may thus be attributed to the use of cues from previous analyses. It is difficult for one who is experienced in factorial procedures, however, to believe that those consistencies could have been achieved without permission of the configuration of the factor structure itself. It would be a severe challenge to anyone to produce a very different set of consistent results if he worked within the limits of simple structure and positive manifold.

Factors, then, seem to be real dimensions of human personality when personality is defined as a phenomenon of individual differences. Factors are discovered by a set of mathematical operations from objective data, plus other operations that can be prescribed. They
are reproducible by the same operations from data derived from new samples and even under somewhat varied conditions. They can usually be associated with verbal symbols which have psychological significance. They are 'primary' only in the sense of being reference variables which have a high degree of mutual independence. They probably correspond to observable variables or facts of biological or social origin or of joint bio-social origin. Which of these genetic cases applies will have to be decided for each factor. Some AAF results tend to show that genuine experience factors can be isolated, e.g., a mechanical-experience factor and a mathematical-background factor. The status of other factors is still an open question.

III. ADVANTAGES OF THE FACTORIAL APPROACH

1. Precision. The factorial approach provides an exact, quantitative picture of tests and criteria in terms of stable categories. The precision feature has already been suggested by the equations and by Fig. 1. More concrete illustrations are given in Figs. 2 and 3. Figure 2 shows the proportions of the factor variances in three of the AAF classification tests: the Reading Comprehension, Dial and Table Reading, and Discrimination Reaction Time tests. The first two of these are printed tests and the third is a psychomotor test. Although the Reading Comprehension test is primarily verbal, the verbal-comprehension factor takes up less than half of its total variance. Other factors that contribute materially are mechanical experience, two reasoning factors, and visualization. The presence of the mechanical variable here is understandable, though it could not have been decided with any assurance from inspection of the test. The reading selections were on technical material. Right answers to some items could be facilitated by the fact that the examinee had had acquaintance with mechanical concepts. The test had been constructed with the aim of requiring inferences from what was read, hence the reasoning variances. There was no attempt to measure visualization with this test, but there happened to be descriptive material in which comprehension was presumably facilitated by virtue of good visualizing ability. The small variances in other identifiable factors were less than two percent each. The sum of all common-factor variances fully reached the test's reliability coefficient, so there was no specific or unknown non-error variance.

Dial and Table Reading was designed as a work-sample test.[4] It is very complex factorially, which increases its chances of being valid but which makes its total scores ambiguous as to meaning. When an individual has a high score in it, we do not know whether he is particularly good in number ability, spatial ability, or perceptual-speed ability, or any combination of these. Only a small proportion of this test's non-error variance is still to be accounted for, as indicated by the portion labeled "U" in Fig. 2.

[4] It is interesting to note that this test was developed first as two separate tests, but they were later combined when it was found that they functioned very similarly in predicting criteria. Factor analysis revealed that they had almost identical functional content.

The Discrimination Reaction Time test was designed to measure the job-analysis trait or traits of speed of decision and reaction. Its leading variance proved to be in spatial relations, followed by variance in psychomotor precision, perceptual speed and visualization. With a reliability of .92, this test has considerable unknown non-error variance to be accounted for.[5] Some of this unknown variance may, indeed, be identifiable as speed of decision or speed of reaction. Whatever it is, however, it is not needed to account for this test's obtained validities for aircrew predictions.

[5] Since this test is an outgrowth of a very old laboratory technique, the finding that it is quite complex factorially should be of special interest to the experimental psychologist. No longer can the latter feel secure as to the nature of his experimental variables. Ambiguity of meaning of measurements is a much greater problem than has been generally recognized.

[Figure 2: bar diagrams of component variances for Reading Comprehension, Dial and Table Reading, and Discrimination Reaction Time.]
FIG. 2. Diagrams of the component variances of three Army Air Forces classification tests. The letters stand for:
V: verbal-comprehension factor
ME: mechanical-experience factor
R1: reasoning I (general-reasoning) factor
R2: reasoning II (common to analogies tests) factor
Vi: visualization factor
O: other common factors, each with variance too small to mention separately
U: unknown common-factor or specific-factor variances
E: error variances
N: numerical factor
S1: space I (spatial-relations) factor
P: perceptual-speed factor
MB: mathematical-background factor
M2: memory II (visual-memory) factor
PM2: psychomotor II (precision) factor

Figure 3 shows two criteria analyzed into contributing variances of different kinds: common-factor, unknown, and error. The pilot criterion became much better known than the navigator criterion because many more experimental tests had been validated against it. There was opportunity to estimate the contribution of variances of some twenty-seven common factors to the pilot criterion, whereas for the navigator criterion there was opportunity to account for only 11 factors. It is estimated that some 52 per cent of the variance of the pilot pass-fail training criterion can be accounted for by contributions from twenty-three orthogonal factors. If we accept the estimate of reliability of the pilot criterion as being .80 (it is probably less than that) we can see that perhaps an addition of 28 per cent of the total variance could be predicted if we had the proper tests to measure the as yet unknown factors. Nine known factors account for about 56 per cent of the navigator criterion. How much the additional known common factors would increase this figure is a question still to be answered. It is safe to say that more of the non-error variance in the navigator criterion has been accounted for than in the pilot criterion. A comparison of the two criteria shows that they have very little in common; only the spatial-relations and perceptual-speed factors are material contributors to both criteria. This is a circumstance that is very favorable for differential selection, that is, classification.

[Figure 3: bar diagrams of component variances for the Pilot Criterion and the Navigator Criterion.]
FIG. 3. Diagrams of the component variances of pilot and navigator training criteria. Letter symbols are as defined with Fig. 2 except for some additional ones:
PI: pilot-interest factor
M4: memory IV (content-memory) factor
M3: memory III (picture-symbol association) factor
PM1: psychomotor I (coordination) factor
LE: length-estimation factor

2. Economy. The economy of the factorial approach lies in the relatively small number of variables with which one must deal in covering predictions.
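One way to see how a few factor variables can cover many predictions is that, under equation (4), every validity and every test intercorrelation follows from a small table of loadings. The following is a minimal Python sketch, not part of the original article, that reworks the fictitious illustration of section II (tests 1 to 3 and criterion J). The names `LOADINGS`, `correlation`, and `multiple_r` are hypothetical, and the two-predictor multiple-correlation formula is the standard textbook one, assumed here because the article does not spell it out.

```python
import math

# Fictitious loadings read off the worked example in section II.
# Test 2's loading on factor G is not stated in the text, so it is left out;
# criterion J and tests 1 and 3 have no G loading, so no result below depends on it.
LOADINGS = {
    "J": {"A": 0.4, "B": 0.3, "C": 0.4, "D": 0.5},
    "test1": {"A": 0.6, "C": 0.6},
    "test2": {"A": 0.4, "C": 0.35},
    "test3": {"B": 0.7, "D": 0.5},
}


def correlation(x, y):
    """Equation (4): sum of cross products of loadings on shared factors."""
    shared = LOADINGS[x].keys() & LOADINGS[y].keys()
    return sum(LOADINGS[x][f] * LOADINGS[y][f] for f in shared)


def multiple_r(criterion, p, q):
    """Multiple correlation of a criterion with two predictors (assumed standard formula)."""
    r_p = correlation(criterion, p)
    r_q = correlation(criterion, q)
    r_pq = correlation(p, q)
    r_squared = (r_p**2 + r_q**2 - 2 * r_p * r_q * r_pq) / (1 - r_pq**2)
    return math.sqrt(r_squared)


# Single-test validities against criterion J, rounded to two decimals.
validities = {t: round(correlation("J", t), 2) for t in ("test1", "test2", "test3")}

# Multiple correlations for each pair of tests.
pairs = (("test1", "test2"), ("test1", "test3"), ("test2", "test3"))
battery = {pair: round(multiple_r("J", *pair), 2) for pair in pairs}

print(validities)  # {'test1': 0.48, 'test2': 0.3, 'test3': 0.46}
print(battery)     # pairs give 0.49, 0.66, 0.55 respectively
```

The rounded values agree with the figures quoted in section II: validities of .48, .30, and .46, and multiple correlations of .49, .66, and .55. The pair of tests that share no factors (1 and 3) gains the most, which is the arithmetic behind preferring tests that cover different factors.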

The number of distinct factors seems to be much larger than many have anticipated. In spite of this, however, by the use of factors the number of working variables is materially reduced, as compared with the number of tests generally employed. During the war, the AAF classification battery was composed of about twenty tests yielding as many scores, each of which received a weight (including some zero weights) in deriving a composite aptitude score for each AAF aircrew specialty. These twenty scores covered only about eight of the factors of the pilot criterion. This fact is eloquent of the wastage from overlapping of coverage of necessary variables. At the same rate, it would have required about sixty tests in order to cover the known factors of the pilot criterion. With a battery of pure tests, one for each factor, the number of tests can be reduced to the number of factors. Had aviation psychologists known what all the factors were at the beginning, their efforts could have been directed to the construction of far fewer than the one to two hundred tests that were actually designed and constructed. In future test-development programs, the AAF factorial findings should provide considerable guidance toward an economical continuation.

3. Understanding of test validity. The factorial approach enables us to understand why each test is valid for the prediction of some criteria and why it is not valid for the prediction of others. Figure 4 shows how the validity coefficients of each of two tests for the selection of pilots and navigators can be made intelligible. Equation (4) was applied to the factor loadings that test and criterion had in common in each case. The pilot validity of the Reading Comprehension test, it can be seen in Fig. 4, is almost entirely accounted for by secondary factors in the test: mechanical experience, visualization, and spatial relations. The navigator validity of the same test is much greater and is accounted for mainly by other factors: verbal, reasoning, and number. In both instances just mentioned, the obtained validity is less than .01 greater than that expected from the factors and their loadings. The pilot validity of the Dial and Table Reading test is mainly attributable to its space and perceptual-speed variances. The predicted validity is less than .02 short of the obtained validity. The navigator validity for this test is much greater, and the additional communality is to be found in the number, reasoning, and mathematical-background factors. In this instance, the predicted validity is nearly .02 greater than the obtained validity. All of these discrepancies are probably well within the limits of sampling errors. In a study of the closeness of obtained pilot validities to predicted validities, involving 90 tests, the median discrepancy was about .02. The correlation between predicted and obtained validity coefficients was .81. These results indicate merely the goodness of fit of data to the factor loadings estimated in tests and criterion. Loadings in the criterion had been estimated largely from validity coefficients; loadings in the tests from intercorrelations of tests.

4. Construction of univocal tests. A univocal test has its non-error variance confined to one common factor. In other words, it is factorially pure. The advantages of having univocal tests are many, particularly if the factors are themselves unrelated.⁶ It should be apparent from what has been said thus far that factorially impure tests often

⁶ In this discussion it will be assumed that the factors are orthogonal, i.e., uncorrelated. There were some minor intercorrelations among factors, but not enough to detract from the arguments here presented.
[Figure 4: horizontal bar diagrams on a linear validity scale running from .0 to .6. Panels, top to bottom: pilot validity of Reading Comprehension; navigator validity of Reading Comprehension; pilot validity of Dial and Table Reading; navigator validity of Dial and Table Reading; pilot validity of Memory for Tactical Plans.]

FIG. 4. Diagrams of factorial contributions to the validity coefficients of three tests for predicting pilot and navigator training criteria. Linear validity scale is shown at the top and bottom of the series of bars. The cross-hatched portions represent validity unaccounted for by known factors. Letter symbols have the same meanings as in Figs. 2 and 3.
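
The segments of each bar in Fig. 4 can be illustrated with a short sketch. It assumes that equation (4), which is not reproduced in this part of the article, is the usual orthogonal-factor formula in which the correlation between a test and a criterion is the sum of the products of their loadings on the factors they share. The function names, the factor names, the loadings, and the obtained validity below are all hypothetical and are not AAF results.

```python
# Sketch only: every number here is hypothetical, not taken from the article.

def factor_contributions(test_loadings, criterion_loadings):
    """Return the product of loadings for each factor shared by test and criterion.

    Assumes orthogonal (uncorrelated) factors, as the article does in footnote 6.
    """
    shared = sorted(set(test_loadings) & set(criterion_loadings))
    return {name: test_loadings[name] * criterion_loadings[name] for name in shared}


def predicted_validity(test_loadings, criterion_loadings):
    """Sum the per-factor contributions: the validity expected from known factors."""
    return sum(factor_contributions(test_loadings, criterion_loadings).values())


# Hypothetical loadings of one test and one criterion on named factors.
test = {"spatial": 0.40, "perceptual_speed": 0.50, "number": 0.30}
criterion = {"spatial": 0.30, "perceptual_speed": 0.20, "mechanical": 0.25}

contributions = factor_contributions(test, criterion)
predicted = predicted_validity(test, criterion)
obtained = 0.25  # hypothetical empirical validity coefficient
unaccounted = obtained - predicted  # plays the role of a cross-hatched segment

for name, value in contributions.items():
    print(f"{name}: {value:.2f}")
print(f"predicted validity: {predicted:.2f}")
print(f"unaccounted for: {unaccounted:.2f}")
```

In this sketch the test's "number" loading and the criterion's "mechanical" loading add nothing to the predicted validity, because a factor must be present in both test and criterion to contribute a product.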

contain variances that are unrelated to the criterion. The verbal and reasoning variances in the Reading Comprehension test and the number and reasoning variances in the Dial and Table Reading test add nothing to those tests for pilot selection. There is evidence that the verbal variance is even negatively correlated with the pilot criterion for the AAF aircrew training population. Vocabulary tests, which are most univocal for the verbal factor, consistently correlated negatively with the pilot criterion. An invalid variance in any test will lower the amount of correlation of the test with the criterion. It attenuates a correlation to the same extent as would a like amount of error variance. A factor with negative validity will lower the correlation still more. From this point of view, it would seem important to rid a test of all common-factor variance except that in one factor.

Classification of personnel (and vocational guidance comes under this general heading of classification) requires univocal tests for most effective results. Differential prediction among general vocational choices which themselves possess different degrees of mutual independence is an exacting process. When vocational outlines are blurred and overlapping and when test scores by which we hope to sort individuals into vocational categories also overlap to a large extent, we are bound to make unsatisfactory decisions.

If we set up a single composite aptitude score for each job, the maximal degree of independence among aptitude scores is demanded. If each aptitude score is a weighted composite from a battery, maximal uniqueness can be achieved for each composite when pure tests are used. It was estimated from known common-factor loadings that the pilot and navigator criteria would correlate less than .20. It was impossible to determine this correlation empirically, for the conditions were never adequate. A correlation of .20 indicates an extreme degree of independence and hence much opportunity for useful classification of trainees. With the use of tests, most of which were factorially complex, however, it was highly unlikely that this much independence between aptitude composites could be achieved. The actual correlations between pilot and navigator aptitude scores were usually of the order of .50.

Even a few pure tests in a battery will help a great deal, if they are properly chosen. For the use of single test scores in profiles, as in clinical vocational guidance, however, the ideal would be to have all tests univocal. Interpretations would then be much clearer and individual profile patterns would more nearly approach uniqueness.

The production of uni-factor tests is facilitated by the psychologist's awareness of the nature of factors. As experience with the factors increases, their contours become sharper and their features better defined. The purification of a test is a double process: maximizing desired variance and minimizing undesired variance. Certain factors, such as verbal, numerical, and perceptual speed, are among the chief stowaways. It becomes relatively easy to tell beforehand when these factors are most likely to creep into a new test, and steps necessary to keep them out can be taken. Item-analysis procedures can sometimes be used effectively in this connection, particularly when there are known pure tests of either the desired or the undesired factors that are involved in the new test.

5. Discovering new valid factors. It was stated earlier that one of the ways of improving validity of an aptitude score is to introduce an additional factor not previously covered in predicting a certain criterion. There has been a tradition that after combining the best four or five tests in a battery it is rare that one more test increases the multiple correlation with that criterion. The reason is that the additional test merely duplicates factors already covered. The remedy is to discover and to measure some new factor that is valid, that is, one that correlates with the criterion. Knowing what factors are already covered, one can avoid them in new tests and one has a much better idea whether a new test is likely to contribute something really new.

There are two ways in which one is led to new factors. One of these is to find that there is much unknown non-error variance in a test. This variance might be specific to the test, but probably is not. Such a finding is always a challenge to the factorist to see whether or not this variance is common to other tests or is common to some criterion. Excluding the features of the test that can be attributed to already better known factors, what does this test with unknown variance have that is unique? Several hypotheses are called to mind. New tests are developed, one or more with the attempt to bring out or to stress each supposed new factor. Analysis will show in which of these directions the greatest new communality lies. An example of this in the AAF was the discovery of the spatial-relations factor in the Complex Coordination test. This test was known to correlate with certain printed tests, which suggested that
it had a substantial amount of some intellectual variance. The leading hypotheses were (1) that this was mostly a matter of awareness of spatial arrangements and (2) that it was an ability to integrate a multitude of sensory impressions so as to yield a coordinated movement response. New printed tests were developed to fit the two hypotheses: space tests and integration tests. After analyses of these and other tests, it was decided that the first hypothesis was correct. The important practical outcome was not that a new valid factor was added to the list, but that a factor already included in the battery was better identified and it was found that this factor could be measured as well or even better by means of printed tests, with a great saving of time and equipment as compared to the psychomotor test that also measured it. Another outcome, however, was the discovery of three new factors in the integration tests, two of which are probably valid for pilot selection.

A second indication of the presence of a new factor is that the validity of a test is not fully accounted for by known factors. This was true of the Mechanical Principles test. Of all the mechanical tests developed, this one, which is similar to the Bennett test (2), had greatest pilot validity. Its validity was materially higher than that of a Mechanical Information test which was purest for the only factor that is unique to mechanical tests, the factor identified as mechanical experience. In preliminary analyses the additional valid factor in Mechanical Principles became recognized as visualization. With new tests constructed with the intent to measure visualization brought into the picture, the visualization hypothesis gained ground and was never contradicted by later results. The test called Memory for Tactical Plans, which is represented in Fig. 4, was found to have a pilot validity of .19, of which the visualization factor can account for only .06. If a single factor accounts for the remaining .13 of the total validity, it must have not only a relatively high variance in the pilot criterion but also in the test. Because the test was developed as a memory test, stressing memory for verbal content (instructions for a mock mission), it is tempting to adopt this as a hypothesis. It was the only memory test of its kind developed and analyzed. It did not correlate materially with other memory tests, so this factor is either a radically different kind of memory or it is a non-memory factor. The content-memory hypothesis would be the one to follow up. In Fig. 3, this factor has been tentatively identified as M4 (Memory IV, since three others were identified as memory factors) and it has been given an estimated variance of .06.⁷

⁷ In the AAF reports, this factor was designated as integration I, merely because it was found in some integration tests. The identity of integration I and memory IV still lacks direct verification. The tests saturated with integration I all require the remembering of rather complicated instructions.

6. Objective job analysis. Ordinary job analysis, by direct observation or otherwise, taxes the best powers of the experienced psychologist. It may be the traditional psychological categories that are at fault. It may be a 'halo' effect, which applies to the psychological evaluation of jobs as well as to the rating of persons. However this may be, the list of factors does not coincide very well with the usual list of job-analysis categories.

A comparison of the two approaches to job analysis is decidedly in favor of the type represented in Fig. 3, where proportions of the total variance of the pilot and navigator criteria are segregated and identified. The approach is
exact, objective, and dependable. Others who use the same steps would be likely to arrive at similar results, though there may be differences of opinion as to naming and definition of some of the factors. The categories at least have referents. This type of analysis is possible for any job for which there are suitable criteria. The best procedure would be to correlate each criterion with a number of tests each of which features one common factor. Here, again, univocal tests would be most useful. Lacking the opportunity for this complete procedure, the job analyst could probably improve his inspectional methods materially by becoming well acquainted with the factors and using them as his reference categories. Once the factors become known, it is not very difficult to recognize by inspection their probable presence in either a new test or a new job criterion.

7. Prediction of test validities and other intercorrelations. When a job criterion becomes known in terms of common factors and their loadings, and when any new test has been analyzed and its loadings in corresponding factors have been determined, the validity of the test for predicting this criterion can be predicted by the use of equation (4). This statement should be modified by saying that the minimum validity can be estimated, allowing for the chance that there is unknown valid variance in the test. This feature of analytical methods became quite practical in the case of a few AAF tests that had not been correlated with the pilot criterion, and even more tests that had not been validated against the navigator criterion owing to the lack of opportunity. It is frequently feasible to gather the information needed for making a prediction of validity when it is not easy to go through the regular routine of validation by correlation of test with criterion. The experience of reasonably close predictions with so many tests in the AAF lends some confidence to the practice of prediction, at least as a preliminary step. Later actual validation would be desirable.

What holds for the prediction of correlations between tests and criteria also holds for the intercorrelations among tests and among criteria. One would not need so often to forecast the correlation between two tests that had not been administered to the same population, though the possibility is real when each has been analyzed and a liberal portion of the non-error variance of each test has been identified. Of much more practical use is the prediction of the correlation between two criteria. An illustration of this was mentioned earlier, involving pilot and navigator criteria. Rarely does one have the opportunity to evaluate the same sample of individuals in two different jobs. Yet, for the sake of knowing how much differential prediction is possible, and how much independence to expect between the composite aptitude scores used to predict the two job criteria, some estimate of the intercorrelation is required. This estimate can be made using the common-factor loadings in equation (4).

8. The assembly of test batteries. From the knowledge of factorial composition of a criterion and of available tests, one can proceed to put together an appropriate battery with combining weights, with some assurance that one is working upon a rather dependable foundation. If one has used a collection of uni-factor tests in order to assess the factorial composition of the criterion, then of course a good battery could be selected from among their number. The advantages claimed here would be more apparent if it were desired to use tests that had not been previously correlated with the criterion but whose factorial compositions are
known. On a smaller scale, the advantage would be felt when it became necessary for any reason to substitute a new test for one in the battery. If the substituted test has a very similar factorial composition to that of the test it replaces, one can have considerable confidence that it will function in the same manner as the test replaced. This will be true even when the two tests superficially appear to be different. Without factorial knowledge, bad substitutions are undoubtedly made at times because it is assumed that superficial resemblances carry with them functional identities. Functional identities can be assured only through the duplication of factor variances.

SUMMARY

In this article it was pointed out that although tests that are developed on the basis of the work-sample principle are generally valid for predicting particular criteria which they resemble in a significant manner, they tend to be costly and of limited or unknown general applicability. Tests based upon the more common principles of job analysis are variously successful. Either when they are successful or when they are not, one is left very much in the dark as to the reasons, and progress is to a large degree of the trial-and-error type. Job-analysis categories are usually hazy in contour and seriously overlapping, thus introducing much wasteful effort. Tests developed to measure one supposed psychological entity are frequently found by factor analysis to be measuring something quite different. The factor-analysis approach to the problems of test development is proposed because it provides a rational, objective procedure and a meaningful, operationally defined, and dependable set of reference categories. Among its advantages the following are proposed for consideration:

1. It provides a precise segregation of what is measured in either a test or a job criterion into component variances.
2. It provides an economical procedure, eliminating wastage due to overlapping categories, and reducing the number of variables needed to encompass an enormous variety of individual differences.
3. It enables us to understand why a test is valid for prediction of behavior of a certain class and why it is not valid for prediction of behavior in some other class.
4. It leads to univocal or factorially pure tests, which are not only more economical for the coverage of aptitudes but are also more manageable in combinations and more meaningful in vocational guidance.
5. It leads to the discovery of new factors that are valid. The improvement of predictions depends upon this.
6. It provides an objective means of job analysis. Even if the opportunity does not exist for the statistical analysis of a job's requirements by correlation of its criteria with tests of known factorial composition, the knowledge and use of factorial categories will be a material aid to the observer of job activities.
7. Test validities may be predicted in advance of empirical validation procedures, where it is important to have this information early and when factorial compositions of both test and criterion are known. This kind of prediction also applies to the intercorrelations among tests and among criteria.
8. Knowledge of factorial composition of tests and of criteria is sufficient basis for the compilation of aptitude batteries. Knowing a job in terms of factors and their loadings, we can write a prescription of the tests and their weights needed to predict success in it.

In general terms, proceeding on the basis of factorial knowledge means
working with considerable light whereas test development or selection of tests without it means working in considerable darkness.

REFERENCES

1. Army Air Forces Aviation Psychology Program Research Reports. Reports No. 4-5. Washington, D. C.: Government Printing Office, 1947.
2. BENNETT, G. K., & FRY, D. E. Mechanical comprehension test. New York: Psychological Corporation.
3. GUILFORD, J. P. The discovery of aptitude and achievement variables. Science, 1947, 106, 279-282.
4. HULL, C. L. Aptitude testing. Yonkers: World Book Co., 1928.
