JOHN P. CAMPBELL
University of Minnesota and Human Resources Research Organization
JEFFREY J. McHENRY, LAURESS L. WISE
American Institutes for Research
The research reported here was sponsored by the U.S. Army Research Institute for
the Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All statements
expressed in this paper are those of the authors and do not necessarily reflect official
opinions of the U.S. Army Research Institute or the Department of the Army.
Jeffrey McHenry is now at the Allstate Research and Planning Center, Menlo Park,
CA.
314 PERSONNEL PSYCHOLOGY
If all the rating scales are used separately and the job (MOS) spe-
cific measures are aggregated at the task or instructional module level,
there are approximately 200 criterion scores on each individual. Adding
them all up into a composite is a bit too atheoretical, and developing
a reliable and homogeneous measure of the general factor violates the
basic notion that performance is multidimensional. A more formal way
to model performance is to think in terms of its latent structure, postu-
late what that might be, and then resort to a confirmatory analysis. Un-
fortunately, much more is known about predictor constructs than about
job performance constructs. There are volumes of research on the for-
mer, and almost none on the latter. For personnel psychologists it is
almost second nature to talk about predictors in terms of theories and
constructs. However, on the performance side, only a few people have
even raised the issue (e.g., Dunnette, 1963; James, 1973; Wallace, 1965).
To proceed we used the previous literature that did exist (Borman,
Motowidlo, Rose, & Hanser, 1985), the collective judgment of the project
staff, data from the Project A pilot and field tests, and preliminary anal-
yses from the concurrent validation sample to formulate a target model.
The target performance model was then subjected to what might be de-
scribed as a "quasi" confirmatory analysis using data from the Project A
concurrent validation sample (described below). The purpose was to
consider whether a single model of the latent structure of job perfor-
mance would fit the data for all nine jobs. It is the results from these
analyses that are reported here.
Procedure
TABLE 1
Job Performance Criterion Measures Used in Project A
Concurrent Validation Samples
A Job History Questionnaire, which asks for information about frequency and
recency of performance of the MOS-specific tasks.
Work Environment Description Questionnaire—a 141-item questionnaire
assessing situational/environmental characteristics, leadership climate,
and reward practices.
The administrative measures were grouped into five scales on the basis
of content, and no attempts were made to further reduce these scales at
this point.
Analytic Steps
The analysis had three major steps. The first step was to determine a
basic array of criterion scores that would constitute the input to the
confirmatory analysis. In unaggregated form, there were simply too many
scores to analyze.
Results
Individual task tests from the job sample (15 tasks) and job knowl-
edge (30 tasks) measures were grouped by six research staff members
into "functional content categories" on the basis of similarity of task con-
tent. The 30 tasks originally sampled for measurement in each job were
clustered into 8-15 categories per MOS. Each of the training knowl-
edge items was similarly grouped into a specific content category. Ten
of the categories were common to some or all of the jobs (e.g., first aid,
basic weapons, field techniques). Each MOS, except Infantryman, also
had two to five performance categories that were unique, or job specific.
The Infantryman position is unique in that much of the job content is
composed of the so-called common tasks, and it is difficult to make the
distinction between job-specific tasks and common tasks for this MOS.
Next, scores were computed for each content category within each
of the three sets of measures. For the hands-on measure, the functional
category score was the mean percentage of successfully completed steps
across all of the tasks assigned to that category. For the job knowledge
test and the training knowledge test, the functional category score was
the percentage of items within that category that were answered cor-
rectly.
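The two scoring rules just described can be sketched in a few lines. This is a minimal illustration with hypothetical data; the function names are ours, not Project A's:

```python
# Minimal sketch of the category scoring rules described above. The function
# names and the example data are hypothetical, not drawn from Project A.

def hands_on_category_score(tasks):
    """Mean percentage of successfully completed steps across all tasks
    assigned to a functional content category. Each task is a list of
    booleans, one per scored step (True = step performed correctly)."""
    pct_per_task = [100.0 * sum(steps) / len(steps) for steps in tasks]
    return sum(pct_per_task) / len(pct_per_task)

def written_category_score(items_correct):
    """Percentage of the category's test items answered correctly."""
    return 100.0 * sum(items_correct) / len(items_correct)

# A soldier completes 3 of 4 steps on one task and 2 of 2 on another:
hands_on = hands_on_category_score([[True, True, True, False], [True, True]])  # 87.5
# The same soldier answers 3 of 4 knowledge items in the category correctly:
written = written_category_score([True, False, True, True])                    # 75.0
```

Note that the hands-on score averages percentages task by task, so each task in a category carries equal weight regardless of its number of steps, whereas the written score pools items directly.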
After category scores were computed, they were subjected to a series
of exploratory analyses via principal components. Separate analyses
were executed for each type of measure within each job. There were
several common features in the results. First, the unique or specific
categories for each job/MOS tended to load on different factors than the
common categories. Second, the factors that emerged from the common
categories tended to be fairly similar across the nine different jobs and
across the three methods.
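The exploratory step can be approximated as follows: extract principal components from the correlation matrix of category scores for one method within one job. The data here are simulated stand-ins, not Project A data:

```python
import numpy as np

# Synthetic stand-in for the exploratory analyses described above: principal
# components of the correlations among functional category scores for one
# measurement method within one job. All numbers are simulated.
rng = np.random.default_rng(0)
scores = rng.normal(size=(200, 6))        # 200 soldiers x 6 category scores
scores[:, 3:] += 0.8 * scores[:, :3]      # induce shared variance across categories

R = np.corrcoef(scores, rowvar=False)     # 6 x 6 correlations among categories
eigvals, eigvecs = np.linalg.eigh(R)      # eigendecomposition of R
order = np.argsort(eigvals)[::-1]         # sort components by variance explained
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs * np.sqrt(eigvals)     # component loadings on each category
```

Inspecting which categories load on the same components, separately for each job and method, is what would reveal the common versus job-specific groupings reported above.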
Using these exploratory principal components analyses as a guide,
the following set of content categories was identified:
1. Basic military skills (field techniques, basic weapons operation, weap-
ons, navigation, customs and laws).
The next step was to generate a target model for the latent structure
of job performance that could be tested for goodness of fit within each of
the nine jobs. As a starting point, the nine intercorrelation matrices for
the basic array of criterion scores were each subjected to an exploratory
factor analysis. Several consistent results were observed. As expected,
there was the general prominence of "method" components, specifically
one methods component for the ratings and one methods component
for the written tests. The emergence of method components was con-
sistent with prior findings (e.g., Landy & Farr, 1980). Also, there was a
consistent correspondence between the administrative indexes and the
three Army-wide rating components. The awards and certificates item
from the administrative measures loaded together with the Army-wide
effort/leadership rating component; the Article 15 and promotion rate
items loaded with the personal discipline component (most of the vari-
ance in promotion rate was thought to be due to retarded advancement
associated with disciplinary problems); and the physical readiness scale
loaded on the fitness/bearing component.
On the basis of findings from this last set of exploratory empirical
analyses, a revised model was constructed to account for the correlations
among the performance measures. This model included the five job
performance constructs shown in Figure 1. In addition, a "paper-and-
pencil test" methods component and a "ratings" method component
were retained.
Several issues remained before the model could be tested for good-
ness of fit within the nine Batch A jobs. One was whether the job-specific
BARS rating scales were measuring job-specific technical knowledge
and skill, or effort and leadership, or both. The intercorrelations among
the performance components suggested that these rating scales were
measuring both of these performance constructs, though they seemed
to correlate more highly with other measures of effort and leadership
than with measures of job-specific technical knowledge and skill.
Another issue was whether it was necessary to posit hands-on and
administrative measures method components to account for the inter-
correlations within each of these sets of measures. The average inter-
correlation among the scores within each of these sets was not particu-
larly high. Therefore, for the sake of parsimony, these two additional
methods components were not made part of the model.
The next step was to conduct separate tests of goodness of fit of this
target model within each of the nine jobs using LISREL VI (Joreskog
& Sorbom, 1981). In conducting a confirmatory analysis with LISREL,
it is necessary to specify the structure of three different parameter
matrices: Lambda-Y, the hypothesized factor structure matrix (a matrix
of regression coefficients for predicting the observed variables from the
underlying latent constructs); Theta-Epsilon, the matrix of uniqueness
or error components (and intercorrelations); Psi, the matrix of covari-
ances among the factors. In these analyses, the diagonal elements of
Psi (i.e., the factor variances) were set to 1.0, forcing a "standardized"
solution. This meant that the off-diagonal elements in Psi would rep-
resent the correlations among and between the performance constructs
and method factors. The model further specified that the correlations
between the two method factors and each performance construct should be zero.
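Under these specifications, the model-implied correlation matrix is Lambda times Psi times Lambda-transpose, plus Theta-Epsilon. A minimal numerical sketch follows; it uses two constructs and a single ratings-method factor rather than the full model, and all loading values are hypothetical, not Project A estimates:

```python
import numpy as np

# Hypothetical miniature of the parameter matrices described above: two
# performance constructs (F1, F2), one ratings-method factor (M), and four
# observed scores. All values are illustrative, not Project A estimates.
Lambda = np.array([
    [0.7, 0.0, 0.4],   # rating tapping construct F1, plus the ratings method
    [0.0, 0.6, 0.4],   # rating tapping construct F2, plus the ratings method
    [0.8, 0.0, 0.0],   # written test tapping construct F1
    [0.0, 0.7, 0.0],   # written test tapping construct F2
])
Psi = np.array([       # factor variances fixed at 1.0 ("standardized" solution);
    [1.0, 0.5, 0.0],   # the constructs may correlate, but the method factor
    [0.5, 1.0, 0.0],   # is constrained to be uncorrelated with each construct
    [0.0, 0.0, 1.0],
])

common = Lambda @ Psi @ Lambda.T          # common (construct + method) variance
Theta = np.diag(1.0 - np.diag(common))    # uniquenesses chosen so diag(Sigma) = 1
Sigma = common + Theta                    # model-implied correlation matrix
```

Fixing the diagonal of Psi at 1.0 is what makes the off-diagonal elements interpretable directly as correlations among the factors, as the text notes.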
[Figure 1: Model of the latent structure of job performance, showing the five performance constructs (including job-specific technical knowledge and skill, effort and leadership, personal discipline, and physical fitness/military bearing) together with the written-test and ratings method components. The graphic itself is not recoverable from the source.]
TABLE 2
Goodness-of-Fit Indexes Using a Separate Model for Each Job
(Columns: MOS, Root Mean Square Residual, Chi-Square, df, p. The table entries were not recoverable from the source.)
This effectively defined the method factor as that portion of the com-
mon variance among measures from the same method that was not pre-
dictable from (i.e., correlated with) any of the other related factor or
performance construct scores.
As may often be the case, some problems were encountered in fitting
the hypothesized model for several of the jobs. Solutions were obtained
with some factor loadings greater than one and with negative unique-
ness estimates for the corresponding observed variables. Also, estimates
of the correlations among the performance constructs occasionally ex-
ceeded unity. These problems necessitated a certain amount of ad hoc
cutting and fitting in the form of computing the squared multiple corre-
lation (SMC) for predicting each observed variable from all of the other
variables and setting the uniqueness estimates (i.e., Theta-Epsilon diag-
onal) to 1.0 minus this SMC. This approach eliminated all factor loadings
and correlations greater than 1.0. In most cases, a second "iteration"
was performed to adjust the initial Theta-Epsilon estimates so that the
diagonal of the estimated correlation matrix would be as close to 1.0 as
possible.
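The SMC fix described above has a convenient closed form: each variable's squared multiple correlation with the remaining observed variables is 1 minus the reciprocal of the corresponding diagonal element of the inverse correlation matrix. A small sketch with a hypothetical three-variable correlation matrix:

```python
import numpy as np

# Sketch of the ad hoc fix described above, using a hypothetical 3-variable
# correlation matrix. The squared multiple correlation (SMC) of variable i
# with all other variables equals 1 - 1 / (R^-1)[i, i], so the fixed
# uniqueness (Theta-Epsilon diagonal) is 1 - SMC.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])

R_inv = np.linalg.inv(R)
smc = 1.0 - 1.0 / np.diag(R_inv)   # squared multiple correlations
theta_diag = 1.0 - smc             # uniqueness estimates fixed at 1 - SMC
```

Because the SMC is a lower bound on each variable's communality, fixing uniquenesses this way caps the common variance the model can assign and rules out loadings or factor correlations exceeding 1.0.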
Table 2 shows the value of chi-square for each job based on a good-
ness-of-fit comparison of the actual correlations among the observed
variables and the correlations estimated from Lambda-Y, Theta-Epsilon,
and Psi. The goodness of fit is distributed as chi-square, with degrees of
freedom dependent on the number of observed variables and the num-
ber of parameters estimated. The expected value of chi-square equals
the degrees of freedom when the correlations among the observed
variables do not reject the model. These chi-square val-
ues should be interpreted with considerable caution. The approach used
was not purely confirmatory because the hypothesized target model was
based in part on analyses of these same data. In addition, LISREL was
TABLE 3
Factor Loadings for Single Model Across All Jobs^a
Written Method
JK Tech - 49 29 54 71 30 42 49 49
JK Soldier -16 51 29 40 53 25 28 60 60
JK Safety -07 49 07 52 26 28 35 52 52
JK Comm 00 11 19 38 - - - 41 41
JK Vehicle - - - 19 62 _b _ 20 20
JK Identify -05 20 12 17 - 10 - 25 25
TK Tech Skill - 54 65 64 49 71 45 53 53
TK Basic Skill 44 68 58 61 25 66 50 60 60
TK Safety 34 51 49 57 18 56 30 59 59
TK Comm 51 46 60 _ 20 36 20 50 50
TK Vehicle 38 51 17 60 45 _b 17 46 46
Note: HO = Hands-on; JK = Job Knowledge Test; TK = Training Knowledge Test; AW
= Army-wide Ratings; MOS = Job-specific Ratings.
"Decimals are omitted.
Vehicle content was merged into the Core Technical factor for 64C.
•^These loadings were constrained to be equal across all MOS.
TABLE 4
Uniqueness Estimates, Single Model Across All Jobs
(The table body was not recoverable from the source; the surviving column fragments indicate the performance constructs and method components as columns.)
Because the model generalized across the nine jobs sampled from this
population, and the constructs are based on measures carefully developed
to be content valid, it seems safe to ascribe a degree of construct validity to them.
REFERENCES
Borman WC, Motowidlo SJ, Rose SR, Hanser LM. (1985). Development of a model of sol-
dier effectiveness (ARI Technical Report 741). Alexandria, VA: U.S. Army Research
Institute for the Behavioral and Social Sciences.
Campbell CH, Ford P, Rumsey MG, Pulakos ED, Borman WC, Felker DB, de Vera MV,
    Riegelhaupt BJ. (1990). Development of multiple job performance measures in a
    representative sample of jobs. PERSONNEL PSYCHOLOGY, 43, 277-300.
Campbell JP. (1986a, August). When the textbook goes operational. Paper presented at the
    94th Annual Convention of the American Psychological Association, Washington,
    DC.
Campbell JP (Ed.). (1986b). Improving the selection, classification, and utilization of Army
    enlisted personnel: Annual report, 1986 fiscal year (ARI Technical Report 813101).
    Alexandria, VA: Army Research Institute for the Behavioral and Social Sciences.
Campbell JP (Ed.). (1987). Improving the selection, classification, and utilization of Army
    enlisted personnel: Annual report, 1985 fiscal year (ARI Technical Report 746).
    Alexandria, VA: Army Research Institute for the Behavioral and Social Sciences.
Campbell JP, Dunnette MD, Lawler EE, Weick KE. (1970). Managerial behavior, perfor-
    mance, and effectiveness. New York: McGraw-Hill.
Dunnette MD. (1963). A modified model for selection research. Journal of Applied Psy-
    chology, 47, 317-323.
James LR. (1973). Criterion models and construct validity for criteria. Psychological
Bulletin, 80, 75-83.
Joreskog KG, Sorbom D. (1981). LISREL VI: Analysis of linear structural relationships by
    maximum likelihood and least squares methods. Uppsala, Sweden: University of Uppsala.
Landy FJ, Farr JL. (1980). Performance rating. Psychological Bulletin, 87, 72-107.
McHenry JJ, Hough LM, Toquam JL, Hanson MA, Ashworth S. (1990). Project A validity
    results: The relationship between predictor and criterion domains. PERSONNEL
    PSYCHOLOGY, 43, 335-354.
Schmidt FL, Kaplan LB. (1971). Composite vs. multiple criteria: A review and resolution
    of the controversy. PERSONNEL PSYCHOLOGY, 24, 419-434.
Wallace SR. (1965). Criteria for what? American Psychologist, 20, 411-418.
Young WY, Houston JS, Harris JH, Hoffman RG, Wise LL. (1990). Large-scale predictor
    validation in Project A: Data collection procedures and data base preparation.
    PERSONNEL PSYCHOLOGY, 43, 301-311.