
PERSONNEL PSYCHOLOGY

1990, 43, 313-333

MODELING JOB PERFORMANCE IN A POPULATION OF JOBS

JOHN P. CAMPBELL
University of Minnesota and Human Resources Research Organization
JEFFREY J. McHENRY, LAURESS L. WISE
American Institutes for Research

The Army Selection and Classification Project has produced a compre-
hensive examination of job performance in 19 entry-level Army jobs
(Military Occupational Specialties) sampled from the existing popula-
tion of entry-level positions. Multiple methods of job analysis and crite-
rion measurement were utilized in a subsample of nine jobs to generate
over 200 performance indicators, which were then used to assess per-
formance in a combined sample of 9,430 job incumbents. An iterative
procedure involving a series of content analyses and principal compo-
nents analyses was used to develop a basic array of up to 32 criterion
scores for each job. This basic set of scores formed the starting point
of an attempt to model the latent structure of performance in this pop-
ulation of jobs. After alternative models were proposed for the latent
structure, the models were submitted to a goodness-of-fit test via LIS-
REL VI. After accounting for two components of method variance, a
five-factor solution was judged as the best fit. The implications of the
results and the modeling procedure for future personnel research are
discussed.

Previous papers in this issue have discussed predictor development,
criterion development, and data editing and preparation. This paper
is intended to illustrate further the usefulness of theory, even a very
sketchy one, in applied research. It describes the Project A attempt
to model job performance in this population of jobs and to increase
understanding of the previously described criterion measures. Recall
that multiple methods were used to assess individuals on a wide array
of performance components and that considerable care was taken in the
task analysis and critical incident analysis to build in as much content
validity as possible.

The research reported here was sponsored by the U.S. Army Research Institute for
the Behavioral and Social Sciences, Contract No. MDA903-82-C-0531. All statements
expressed in this paper are those of the authors and do not necessarily reflect official
opinions of the U.S. Army Research Institute or the Department of the Army.
Jeffrey McHenry is now at the Allstate Research and Planning Center, Menlo Park,
CA.

COPYRIGHT © 1990 PERSONNEL PSYCHOLOGY, INC.


The Initial Framework

The overall criterion development work was guided by a particular
model of performance, the background for which is contained in Dun-
nette (1963), Wallace (1965), and J. P. Campbell, Dunnette, Lawler, and
Weick (1970). Performance is defined as observable things people do
(i.e., behaviors) that are relevant for the goals of the organization. The
behaviors that constitute performance can be scaled in terms of the level
of performance they represent. Further, individual performance behav-
iors exhibit sufficient patterns of covariation to yield reasonable factor
solutions. There is not one outcome, one factor, or one anything that
can be pointed to and labeled as job performance. Job performance re-
ally is multidimensional. A distinction is also made between performance
and the outcomes or results of performance, which J. P. Campbell et al.
(1970) called effectiveness. For example, if a manager exhibits stellar per-
formance behaviors, profits could go up, or they might not. For Project A
the overall goal was to describe the content and the structure, or dimen-
sionality, of performance in entry-level Army jobs.

Two General Factors

For the population of entry-level positions, two major types of job
performance components were postulated. The first is composed of
components specific to a particular job. That is, measures of such com-
ponents would reflect specific technical competence or specific job be-
haviors that are not required for other jobs. The second kind of per-
formance factor includes components that are defined and measured in
the same way for every job. These are referred to as non-job-specific or
Army-wide criterion factors.
For the job-specific components, we anticipated that there would be
a relatively small number of distinguishable factors of technical perfor-
mance that would be functions of different abilities or skills and that
would be reflected by different task content. The Army-wide concept
incorporates the basic notion that total performance is much more than
specific task or technical proficiency. It might include such things as
contributions to teamwork, continual self-development, support for the
norms and customs of the organization, and perseverance in the face of
adversity. In sum, the working model of total performance with which
the project began viewed performance as multidimensional within the
two broad categories of factors.

Factors Versus a Composite

Saying that performance is multidimensional does not preclude using
just one index of an individual's contributions to make a specific person-
nel decision (e.g., select/not select, promote/not promote). As argued by
Schmidt and Kaplan (1971), it seems quite reasonable for the organiza-
tion to scale the importance of each major performance factor relative
to a particular personnel decision that must be made and to combine the
weighted factor scores into a composite that represents the total contri-
bution or utility of an individual's performance within the context of that
decision. That is, the way in which performance information is weighted
and combined is a value judgment on the organization's part. The deter-
mination of the specific combinational rules (e.g., simple sum, weighted
sum, non-linear combination) that best reflect what the organization is
trying to accomplish is a matter for research.
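
To make this concrete, the sketch below shows one way a decision-specific
composite could be formed from standardized factor scores. It is purely
illustrative; the factor names, weights, and scores are hypothetical and are
not taken from Project A.

```python
# Hypothetical sketch: weighting standardized performance factor scores into
# a single composite for one personnel decision. Names, weights, and scores
# are illustrative only.
factor_scores = {                 # z-scores for one individual
    "core_technical": 0.8,
    "effort_leadership": 0.3,
    "personal_discipline": -0.2,
}

# The weights encode the organization's value judgment for this decision;
# a different decision (e.g., promotion vs. retention) could use different
# weights or even a non-linear combination rule.
weights = {
    "core_technical": 0.4,
    "effort_leadership": 0.4,
    "personal_discipline": 0.2,
}

composite = sum(weights[k] * factor_scores[k] for k in weights)
print(round(composite, 2))        # 0.4*0.8 + 0.4*0.3 + 0.2*(-0.2) = 0.40
```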

The Latent Structure

If all the rating scales are used separately and the job (MOS) spe-
cific measures are aggregated at the task or instructional module level,
there are approximately 200 criterion scores on each individual. Adding
them all up into a composite is a bit too atheoretical, and developing
a reliable and homogeneous measure of the general factor violates the
basic notion that performance is multidimensional. A more formal way
to model performance is to think in terms of its latent structure, postu-
late what that might be, and then resort to a confirmatory analysis. Un-
fortunately, much more is known about predictor constructs than about
job performance constructs. There are volumes of research on the for-
mer, and almost none on the latter. For personnel psychologists it is
almost second nature to talk about predictors in terms of theories and
constructs. However, on the performance side, only a few people have
even raised the issue (e.g., Dunnette, 1963; James, 1973; Wallace, 1965).
To proceed we used the previous literature that did exist (Borman,
Motowidlo, Rose, & Hanser, 1985), the collective judgment of the project
staff, data from the Project A pilot and field tests, and preliminary anal-
yses from the concurrent validation sample to formulate a target model.
The target performance model was then subjected to what might be de-
scribed as a "quasi" confirmatory analysis using data from the Project A
concurrent validation sample (described below). The purpose was to
consider whether a single model of the latent structure of job perfor-
mance would fit the data for all nine jobs. It is the results from these
analyses that are reported here.

Procedure

Samples and Measures

As described in a previous paper (Young, Houston, Harris, Hoffman,
& Wise, 1990), the final versions of the criterion measures were admin-
istered to the concurrent validation sample of 400-600 people in each
of the nine jobs (MOS) in Batch A. Again, the distinction between the
Batch A (9 MOS) and Batch Z (10 MOS) jobs is that not all of the criterion
measures developed for Batch A were also developed for Batch Z. Budget
constraints dictated that the job-specific measures could only be devel-
oped for a limited number of jobs. The complete array of performance
measures is shown again in Table 1.
As described in more detail in C. H. Campbell et al. (1990), each job
sample, or hands-on measure, consisted of 15 major job tasks, sampled
from the population of tasks composing the whole job, and each task
consisted in turn of a number of critical steps, with each step scored
pass or fail. The number of steps within a task varied widely from a
half-dozen up to a maximum of 62. The job knowledge test consisted
of 3 to 8 items for each of 30 job tasks sampled from the population of
tasks, yielding total tests of 180-220 items. Fifteen of the 30 tasks were
also measured hands-on. The content of the training knowledge test
was designed to match both the Plan of Instruction (POI) in Advanced
Individual (technical) Training (AIT) and the task content of the MOS as
portrayed by the results of the job analysis survey conducted as part of the
Army Occupational Survey Program. Test lengths ranged from 120-180
items. Whereas the training measure would eventually be administered
at the conclusion of advanced skills training in a longitudinal design, it
was administered in the concurrent validation as simply another test of
job knowledge.
The rating scales that were administered included the 10 Army-wide
(i.e., the scales were the same for all jobs) behaviorally anchored scales;
from 8-13 job-specific behaviorally anchored scales; ratings of perfor-
mance on each of the 15 tasks measured hands-on; and a 40-item combat
performance prediction questionnaire. Overall ratings of general effec-
tiveness as a soldier and potential for being an effective NCO were also
obtained.
The performance indicators contained in administrative records,
some of which were obtained via self-report questionnaire, included such
indicators as number of letters and certificates received, physical readi-
ness test score, Articles 15 and other disciplinary actions, and the M16
rifle qualification level. File data were also used to construct a promo-
tion rate score (relative to expected rate for a given length of service).

TABLE 1
Job Performance Criterion Measures Used in Project A
Concurrent Validation Samples

Measures Used for All MOS

Paper-and-pencil test of Training Achievement developed for each of the 19
  MOS (130-210 items each).
Five performance indicators from administrative records:
  Total number of awards and letters of recommendation.
  Physical fitness qualification.
  Number of disciplinary infractions.
  Rifle marksmanship qualification score.
  Promotion rate (in deviation units).
Eleven behaviorally anchored rating scales designed to measure factors of
  non-job-specific performance (e.g., giving peer leadership and support,
  maintaining equipment, self-discipline).
Single scale rating of overall job performance.
Single scale rating of NCO (i.e., leadership, supervision) potential.
A 40-item summated rating scale for the assessment of expected combat
  performance.

Measures Used Only for Batch A MOS

From 6 to 13 MOS-specific behaviorally anchored rating scales intended to
  reflect job-specific technical and task proficiency.
Job sample (hands-on) measures of MOS-specific task proficiency. Individual
  is assessed on each of 15 major job tasks.
Paper-and-pencil job knowledge tests (150-200 items) designed to measure
  task-specific job knowledge on 30 major job tasks. Fifteen of the tasks
  were also measured hands-on.
Rating scale measures of specific task performance on the 15 tasks also
  measured with the knowledge tests and the hands-on measures.

Situational Measures Included in Criterion Battery

A Job History Questionnaire, which asks for information about frequency and
  recency of performance of the MOS-specific tasks.
Work Environment Description Questionnaire—a 141-item questionnaire
  assessing situational/environmental characteristics, leadership climate,
  and reward performance.

The administrative measures were grouped into five scales on the basis
of content, and no attempts were made to further reduce these scales at
this point.

Analytic Steps

The analysis had three major steps. The first step was to determine a
basic array of criterion scores that would constitute the input to the con-
firmatory analysis. In unaggregated form, there were simply too many

variables to theorize about. The reduction was accomplished by a series
of principal components analyses and expert judgment content analy-
ses. The next step was to specify a theory, or target matrix, that could
be subjected to LISREL. On the basis of the results of step one, several
alternative models were proposed, which were tested for goodness of fit.
The final step was to determine how much modification was necessary
to fit an overall model across all jobs, or MOS.

Results

Reduction of the Hands-On and Written Test Variables

Individual task tests from the job sample (15 tasks) and job knowl-
edge (30 tasks) measures were grouped by six research staff members
into "functional content categories" on the basis of similarity of task con-
tent. The 30 tasks originally sampled for measurement in each job were
clustered into 8-15 categories per MOS. Each of the training knowl-
edge items was similarly grouped into a specific content category. Ten
of the categories were common to some or all of the jobs (e.g., first aid,
basic weapons, field techniques). Each MOS, except Infantryman, also
had two to five performance categories that were unique, or job specific.
The Infantryman position is unique in that much of the job content is
composed of the so-called common tasks, and it is difficult to make the
distinction between job-specific tasks and common tasks for this MOS.
Next, scores were computed for each content category within each
of the three sets of measures. For the hands-on measure, the functional
category score was the mean percentage of successfully completed steps
across all of the tasks assigned to that category. For the job knowledge
test and the training knowledge test, the functional category score was
the percentage of items within that category that were answered cor-
rectly.
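
A minimal sketch of these two scoring rules is given below, assuming a
hypothetical long-format data layout; the column names are illustrative,
not the actual Project A file structure.

```python
# Sketch of the functional category scoring rules described above.
# The DataFrame layouts ("soldier", "category", "task", etc.) are hypothetical.
import pandas as pd

def hands_on_category_scores(steps: pd.DataFrame) -> pd.DataFrame:
    """steps: one row per (soldier, category, task, step) with pass in {0, 1}.
    Returns a soldiers x categories table; each cell is the mean percentage
    of steps completed successfully across the tasks in that category."""
    task_pct = steps.groupby(["soldier", "category", "task"])["pass"].mean()
    return task_pct.groupby(["soldier", "category"]).mean().unstack("category")

def knowledge_category_scores(items: pd.DataFrame) -> pd.DataFrame:
    """items: one row per (soldier, category, item) with correct in {0, 1}.
    The category score is simply the percentage of items answered correctly."""
    return items.groupby(["soldier", "category"])["correct"].mean().unstack("category")
```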
After category scores were computed, they were subjected to a series
of exploratory analyses via principal components. Separate analyses
were executed for each type of measure within each job. There were
several common features in the results. First, the unique or specific
categories for each job/MOS tended to load on different factors than the
common categories. Second, the factors that emerged from the common
categories tended to be fairly similar across the nine different jobs and
across the three methods.
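
The sketch below shows how one such exploratory analysis, per method
within each job, could be run; scikit-learn is an assumption of the
illustration (the original analyses predate it), and `scores` stands for a
soldiers-by-categories matrix of the kind built in the previous sketch.

```python
# One exploratory principal components analysis per measurement method per
# job, run on standardized category scores. Illustrative only.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def component_loadings(scores: np.ndarray, n_components: int) -> np.ndarray:
    """scores: soldiers x categories. Returns a categories x components
    loading matrix (eigenvectors scaled by the square roots of the
    eigenvalues of the correlation matrix)."""
    z = StandardScaler().fit_transform(scores)      # analyze correlations
    pca = PCA(n_components=n_components).fit(z)
    return pca.components_.T * np.sqrt(pca.explained_variance_)
```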
Using these exploratory principal components analyses as a guide,
the following set of content categories was identified:
1. Basic military skills (field techniques, basic weapons operation, navi-
gation, customs and laws).

2. Safety/survival (first aid techniques, nuclear-biological-chemical war-
fare defensive measures, basic safety techniques).
3. Communications (radio operation).
4. Vehicle maintenance.
5. Identification of friendly/enemy aircraft and vehicles.
6. Technical skills (specific to the job).
These represent a further aggregation of the functional categories,
which, as noted above, were determined by expert judgment. These two
steps are described in greater detail in J. P. Campbell (1987).

Reduction of the Rating Variables

As noted in a previous paper (C. H. Campbell et al., 1990), the in-
dividual rating scales were reasonably reliable; however, the different
scales exhibited intercorrelations varying from moderate to high. Fur-
ther reduction in the number of scales was aimed at reducing redundancy
and collinearity. Empirical factor analyses of the Army-wide rating scales
suggested three factors. These were:
1. Effort/Leadership: including effort and general competence in per-
forming job tasks, peer leadership and support, and self-develop-
ment.
2. Maintaining Personal Discipline: including maintaining self-control,
exhibiting integrity in day-to-day affairs, and following regulations.
3. Fitness and Bearing: including physical fitness and maintaining prop-
er military bearing and appearance.
Similar exploratory factor analyses were conducted for the job-spe-
cific BARS scales, and two factors within each job were identified. The
first consisted of scales reflecting performance that seemed to be most
central to the specific technical content of each job. The second factor
included the rating scales that seemed to reflect more tangential or less
central performance components. Again the final formulation of factors
was based on a combination of empirical and judgmental considerations.
The results of these analyses are not shown here, both to save space and
because, in the final analysis, the distinction between the two factors was
not maintained. However, the details and results of the analyses are
given in J. P. Campbell (1986b).
The reliability, intercorrelations, and distributional properties of the
task-specific ratings were also examined for each of the 15 tasks also
tested hands-on. In general, these scales were less reliable than either
the Army-wide or the job-specific behavioral scales. Supervisors and
peers often reported that they had never had an opportunity to observe
their ratees' performance on many of the tasks, leading to a significant

missing data problem. Consequently, the task ratings were dropped
from the analyses.
The individual items in the combat performance prediction battery
also were subjected to a principal components analysis. Two components
seemed to emerge from an analysis on the combined sample (J. P. Camp-
bell, 1986b). The first consisted of items depicting exemplary effort, skill,
or dependability under stressful conditions. The second factor consisted
of items portraying failure to follow instructions and lack of discipline
under stressful conditions.

The Final Array

Based on the above exploratory analyses, the reduced array of crite-
rion variables consisted of:
6 hands-on content category scores
6 job knowledge content category scores
6 training knowledge content category scores
3 Army-wide rating factors
2 job-specific rating factors
2 combat performance prediction rating factors
1 NCO potential rating
1 overall effectiveness rating
5 administrative indexes
An individual criterion intercorrelation matrix was calculated for
each job. These results are also reported in more detail in J. P. Campbell
(1986b).

The Target Model

The next step was to generate a target model for the latent structure
of job performance that could be tested for goodness of fit within each of
the nine jobs. As a starting point, the nine intercorrelation matrixes for
the basic array of criterion scores were each subjected to an exploratory
factor analysis. Several consistent results were observed. As expected,
there was the general prominence of "method" components, specifically
one methods component for the ratings and one methods component
for the written tests. The emergence of method components was con-
sistent with prior findings (e.g., Landy & Farr, 1980). Also, there was a
consistent correspondence between the administrative indexes and the
three Army-wide rating components. The awards and certificates item
from the administrative measures loaded together with the Army-wide
effort/leadership rating component; the Article 15 and promotion rate

items loaded with the personal discipline component (most of the vari-
ance in promotion rate was thought to be due to retarded advancement
associated with disciplinary problems); and the physical readiness scale
loaded on the fitness/bearing component.
On the basis of findings from this last set of exploratory empirical
analyses, a revised model was constructed to account for the correlations
among the performance measures. This model included the five job
performance constructs shown in Figure 1. In addition, a "paper-and-
pencil test" methods component and a "ratings" method component
were retained.
Several issues remained before the model could be tested for good-
ness of fit within the nine Batch A jobs. One was whether the job-specific
BARS rating scales were measuring job-specific technical knowledge
and skill, or effort and leadership, or both. The intercorrelations among
the performance components suggested that these rating scales were
measuring both of these performance constructs, though they seemed
to correlate more highly with other measures of effort and leadership
than with measures of job-specific technical knowledge and skill.
Another issue was whether it was necessary to posit hands-on and
administrative measures method components to account for the inter-
correlations within each of these sets of measures. The average inter-
correlation among the scores within each of these sets was not particu-
larly high. Therefore, for the sake of parsimony, these two additional
methods components were not made part of the model.

Confirming the Model Within Each Job

The next step was to conduct separate tests of goodness of fit of this
target model within each of the nine jobs using LISREL VI (Joreskog
& Sorbom, 1981). In conducting a confirmatory analysis with LISREL,
it is necessary to specify the structure of three different parameter ma-
trixes: Lambda-Y, the hypothesized factor structure matrix (a matrix
of regression coefficients for predicting the observed variables from the
underlying latent constructs); Theta-Epsilon, the matrix of uniqueness
or error components (and intercorrelations); Psi, the matrix of covari-
ances among the factors. In these analyses, the diagonal elements of
Psi (i.e., the factor variances) were set to 1.0, forcing a "standardized"
solution. This meant that the off-diagonal elements in Psi would rep-
resent the correlations among and between the performance constructs
and method. The model further specified that the correlations between
the two method factors and each performance construct should be zero.
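
In covariance-structure terms, the program searches for parameter values
that make the model-implied correlation matrix, Lambda Psi Lambda' +
Theta-Epsilon, reproduce the observed correlations. The numpy sketch below
writes out that identity and the root mean square residual fit index; the
function names and dimensions are assumptions of the illustration, not
LISREL code.

```python
# Sketch of the covariance structure being fit: with diag(Psi) fixed at 1.0
# (standardized solution), the implied correlation matrix is
#     Sigma = Lambda @ Psi @ Lambda.T + Theta_Epsilon.
import numpy as np

def implied_correlations(lam, psi, theta_diag):
    """lam: (p x k) loadings on 5 constructs + 2 method factors;
    psi: (k x k) factor correlation matrix with unit diagonal and zeros
    between each method factor and the performance constructs;
    theta_diag: (p,) uniquenesses (Theta-Epsilon diagonal)."""
    return lam @ psi @ lam.T + np.diag(theta_diag)

def rms_residual(observed_r, implied_r):
    """Root mean square residual over the off-diagonal elements, the
    descriptive fit index reported in Table 2."""
    off = ~np.eye(observed_r.shape[0], dtype=bool)
    return float(np.sqrt(np.mean((observed_r - implied_r)[off] ** 2)))
```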
[Figure 1. The five job performance constructs and their definitions: Core
Technical Proficiency; General Soldiering (Task) Proficiency; Effort and
Leadership; Maintaining Personal Discipline; and Physical Fitness and
Military Bearing. The remainder of the figure is not recoverable from the
source.]

TABLE 2
Goodness-of-Fit Indexes Using a Separate Model for Each Job

                                       Root Mean
MOS                                Square Residual  Chi-Square   df     p
11B Infantryman                         .061           326.2    227   .02
13B Cannon Crewman                      .057           350.0    322   .14
19E Tank Crewman                        .065           170.0    348   .999
31C Radio/Teletype Operator             .069           369.2    375   .58
63B Vehicle/Generator Mechanic          .060           332.1    296   .07
64C Motor Transport Operator            .058           280.1    247   .07
71L Administrative Clerk                .067           232.6    249   .77
91A Medical Specialist                  .061           277.1    275   .45
95B Military Police                     .052           470.0    374   .001

This effectively defined the method factor as that portion of the com-
mon variance among measures from the same method that was not pre-
dictable from (i.e., correlated with) any of the other related factor or
performance construct scores.
As may often be the case, some problems were encountered in fitting
the hypothesized model for several of the jobs. Solutions were obtained
with some factor loadings greater than one and with negative unique-
ness estimates for the corresponding observed variables. Also, estimates
of the correlations among the performance constructs occasionally ex-
ceeded unity. These problems necessitated a certain amount of ad hoc
cutting and fitting in the form of computing the squared multiple corre-
lation (SMC) for predicting each observed variable from all of the other
variables and setting the uniqueness estimates (i.e., Theta-Epsilon diag-
onal) to 1.0 minus this SMC. This approach eliminated all factor loadings
and correlations greater than 1.0. In most cases, a second "iteration"
was performed to adjust the initial Theta-Epsilon estimates so that the
diagonal of the estimated correlation matrix would be as close to 1.0 as
possible.
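
The SMC step can be written compactly: for a nonsingular correlation
matrix R, the squared multiple correlation of variable i on all the others
is 1 - 1/[R^-1]_ii, so the fixed uniqueness 1 - SMC_i is simply
1/[R^-1]_ii. A brief sketch (an illustration, not the original code):

```python
# Fixing the Theta-Epsilon diagonal at 1 - SMC, as described above.
import numpy as np

def fixed_uniquenesses(r: np.ndarray) -> np.ndarray:
    """r: (p x p) observed correlation matrix (nonsingular).
    SMC_i = 1 - 1/[inv(R)]_ii, so the uniqueness 1 - SMC_i = 1/[inv(R)]_ii."""
    return 1.0 / np.diag(np.linalg.inv(r))
```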
Table 2 shows the value of chi-square for each job based on a good-
ness-of-fit comparison of the actual correlations among the observed
variables and the correlations estimated from Lambda-Y, Theta-Epsilon,
and Psi. The goodness-of-fit statistic is distributed as chi-square, with
degrees of freedom dependent on the number of observed variables and
the number of parameters estimated. If the model holds, the expected
value of chi-square equals the degrees of freedom, so a value near the
degrees of freedom is a sign that the correlations among the observed
variables do not reject the model. These chi-square val-
ues should be interpreted with considerable caution. The approach used
was not purely confirmatory because the hypothesized target model was
based in part on analyses of these same data. In addition, LISREL was

"told" that the TTieta-Epsilon (uniqueness) parameters were all fixed,


and therefore did not use up any degrees of freedom estimating these
parameters; in fact, these values were estimated entirely from the data.

Confirmation of the Overall Model

The results of quasi confirmatory procedures applied to the perfor-
mance measures from each job generally supported a common structure
of job performance. The procedures also yielded reasonably similar esti-
mates of the intercorrelations among the constructs and of the loadings
of the observed variables on these constructs across the nine jobs (J. P.
Campbell, 1986b). The final step was to determine whether the varia-
tion in some of these parameters across jobs could be attributed to sam-
pling variation. The specific model that we explored stated that (a) the
correlation among factors was invariant across jobs and (b) the loadings
of all of the Army-wide measures on the performance constructs and on
the rating method factor were also constant across jobs.
The proposed overall model was a relatively stringent test of a com-
mon latent structure. For example, it was quite possible that selectivity
differences in the different jobs would lead to differences in the appar-
ent measurement precision of the common instruments or to differences
in the correlations between the constructs. This would tend to make it
appear that the different jobs required different performance models,
when in fact they do not.
The LISREL VI multi-groups option requires that the number of
observed variables be the same for each job. However, virtually every
job was missing scores on at least one of the five construct categories
for at least one of the measurement methods. To handle this problem,
the Theta-Epsilon error estimates for these variables were set at 1.00,
and the observed correlations between these variables and all the other
variables were set to zero. It was thus necessary to count the number of
"observed" correlations that we generated in this manner and subtract
this number from the degrees of freedom when determining the signifi-
cance of the chi-square goodness-of-fit statistic.
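
As a check on this degrees-of-freedom bookkeeping, the tail probability of
the adjusted statistic can be computed directly; the snippet below (scipy
is an assumption of the illustration) reproduces the approximate
significance level reported in the next paragraph.

```python
# Tail probability of the overall goodness-of-fit statistic after the df
# adjustment described above (chi-square = 2508.1 on 2,403 df).
from scipy.stats import chi2

print(round(chi2.sf(2508.1, df=2403), 2))  # ~0.07, the reported p value
```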
The overall model fit extremely well. The root mean square residual
was .047, and the chi-square was 2508.1. There were 2403 degrees of
freedom after adjusting for missing variables and the use of the data
in estimating uniquenesses. This yields a significance level of .07, not
enough to reject the model. Tables 3 and 4 show the factor loadings and
uniqueness for each job under this constrained model. Table 5 shows the
final mapping of the criterion measures on the five performance factors.
Scores on each factor were obtained simply by summing scores within

TABLE 3
Factor Loadings for Single Model Across All Jobs^a

                     Job Codes for Nine Military Occupational Specialties
Construct/Factor      11B   13B   19E   31C   63B   64C   71L   91A   95B
Core Technical
  HO Tech Skill        -    59    43    58    46    27    71    54    29
  JK Tech Skill        -    71    79    76    57    72    70    74    37
  TK Tech Skill        -    66    70    54    73    55    68    85    42
  MOS Tech Rating      -    21    12    16    25    01    12    05   -02
General Soldiering
  HO Basic Skill      52    66    44    52    16    51    57    35    58
  HO Safety           20    44    31    36    10    49    30    50    41
  HO Comm             06    12    37    52     -     -     -     -    43
  HO Vehicle           -     -     -    15    21    ^b     -     -    27
  JK Basic Skill      95    50    79    64    42    69    66    69    49
  JK Safety           69    36    75    45    53    66    57    65    42
  JK Comm             35    25    59    51     -     -     -     -    39
  JK Vehicle           -     -     -    28    37    ^b     -    07    34
  JK Identify         43    21    34    36     -    12     -    39    18
  TK Basic Skill      81    40    67    33    70    50    42    40    38
  TK Safety           57    34    45    40    63    43    31    62    34
  TK Comm             51    21    31     -    42    29    17     -    23
  TK Vehicle          35    22    06    17    65    ^b    32    36    21
Effort/Leadership
  Eff/Ldr Rating^c    76    76    76    76    76    76    76    76    76
  MOS Core Rating     59    33    54    50    45    62    43    62    62
  MOS N-Core Rating   77    59    33    45    59    48    47    58    58
  Combat Exmp^c       72    72    72    72    72    72    72    72    72
  Combat Prob^c       44    44    44    44    44    44    44    44    44
  Awards/Cert^c       26    26    26    26    26    26    26    26    26
  Overall Rating^c    48    48    48    48    48    48    48    48    48
Discipline
  Discipline Rating^c 69    69    69    69    69    69    69    69    69
  Combat Prob^c       25    25    25    25    25    25    25    25    25
  Articles 15^c      -48   -48   -48   -48   -48   -48   -48   -48   -48
  Promotion Rate^c    52    52    52    52    52    52    52    52    52
  Overall Rating^c    28    28    28    28    28    28    28    28    28
Fitness/Bearing
  Fitness Rating^c    82    82    82    82    82    82    82    82    82
  Phys Readiness^c    37    37    37    37    37    37    37    37    37
Ratings Method
  AW Ratings^c        56    56    56    56    56    56    56    56    56
  MOS Ratings^c       61    61    61    61    61    61    61    61    61
  Combat Ratings^c    42    42    42    42    42    42    42    42    42
Written Method
  JK Tech              -    49    29    54    71    30    42    49    49
  JK Soldier         -16    51    29    40    53    25    28    60    60
  JK Safety          -07    49    07    52    26    28    35    52    52
  JK Comm             00    11    19    38     -     -     -    41    41
  JK Vehicle           -     -     -    19    62    ^b     -    20    20
  JK Identify        -05    20    12    17     -    10     -    25    25
  TK Tech Skill        -    54    65    64    49    71    45    53    53
  TK Basic Skill      44    68    58    61    25    66    50    60    60
  TK Safety           34    51    49    57    18    56    30    59    59
  TK Comm             51    46    60     -    20    36    20    50    50
  TK Vehicle          38    51    17    60    45    ^b    17    46    46

Note: HO = Hands-on; JK = Job Knowledge Test; TK = Training Knowledge Test;
AW = Army-wide Ratings; MOS = Job-specific Ratings.
^a Decimals are omitted.
^b Vehicle content was merged into the Core Technical factor for 64C.
^c These loadings were constrained to be equal across all MOS.

each measurement method, standardizing, and then taking the single
sum across methods.

Criterion Intercorrelations

Before computing the performance factor intercorrelations, five
residual scores were created from the five criterion factors in the fol-
lowing manner. A paper-and-pencil "methods" factor score was com-
puted by first summing the two paper-and-pencil knowledge tests (job
knowledge and training content knowledge scores) and then partialing
out the variance due to the correlation of the total paper-and-pencil test
score with all non-paper-and-pencil criterion measures (e.g., hands-on
scores, rating scores, and administrative records scores). This residual
was defined as the paper-and-pencil method score. This variable was in
turn partialed from the Core Technical Proficiency criterion score and
from the General Task Proficiency score, creating two residual scores.
A similar procedure was used to create a ratings method factor score,
which was in turn partialed from the Effort/Leadership, Personal Dis-
cipline, and Physical Fitness/Military Bearing scores, thereby creating
three more residual scores.
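
The partialing operation itself is ordinary least squares residualization.
A sketch follows; the variable names are hypothetical, and the same helper
would be applied once per method factor and criterion score as described
above.

```python
# Residualizing a criterion factor score on a method factor score by OLS.
import numpy as np

def partial_out(criterion: np.ndarray, method_score: np.ndarray) -> np.ndarray:
    """Return the residual of `criterion` after removing its linear
    relationship with `method_score` (1-D arrays over the same soldiers)."""
    x = np.column_stack([np.ones_like(method_score), method_score])
    beta, *_ = np.linalg.lstsq(x, criterion, rcond=None)
    return criterion - x @ beta

# e.g., core_technical_resid = partial_out(core_technical, paper_pencil_method)
```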
The five criterion factor scores, the five residual criterion scores, the
single rating obtained from the overall performance rating scales, and
the total score from the hands-on test were used to generate a 12 x

TABLE 4
Uniqueness Estimates for Single Model Across All Jobs^a

                     Job Codes for Nine Military Occupational Specialties
Factor Score          11B   13B   19E   31C   63B   64C   71L   91A   95B
HO Tech Skill          -    62    79    62    76    91    44    68    90
HO Basic Skill        72    58    80    70    95    73    64    87    67
HO Safety             95    84    90    87    95    73    90    75    81
HO Comm               95    95    86    71     -     -     -     -    82
HO Vehicle             -     -     -    95    95    ^b     -     -    93
JK Tech Skill          -    23    28    13    15    32    28    16    60
JK Basic Skill        10    44    28    40    48    41    44    47    40
JK Safety             48    56    41    49    62    44    55    26    54
JK Comm               85    91    57    55     -     -     -     -    67
JK Vehicle             -     -     -    87    44    ^b     -    95    85
JK Identify           71    90    84    81     -    95     -    64    90
TK Tech Skill          -    25    10    24    18    17    27    19    54
TK Basic Skill        13    37    20    52    41    31    58    83    49
TK Safety             54    62    54    51    55    51    80    29    54
TK Comm               46    75    48     -    77    78    92     -    70
TK Vehicle            75    68    95    61    31    ^b    86    86    75
Overall Rating^c      18    18    18    18    18    18    18    18    18
Eff/Ldr Rating^c      09    09    09    09    09    09    09    09    09
Discipline Rating^c   17    17    17    17    17    17    17    17    17
Phys Fit Rating^c     05    05    05    05    05    05    05    05    05
MOS Core Rating       18    34    22    24    18    18    18    18    25
MOS N-Core Rating     05    24    46    37    05    05    05    05    27
Combat Exmp^c         26    26    26    26    26    26    26    26    26
Combat Prob^c         29    29    29    29    29    29    29    29    29
Awards/Cert^c         93    93    93    93    93    93    93    93    93
Phys Readiness^c      83    83    83    83    83    83    83    83    83
Articles 15^c         77    77    77    77    77    77    77    77    77
Promotion Rate^c      70    70    70    70    70    70    70    70    70

^a Decimals are omitted.
^b Vehicle content was merged into the Core Technical factor for 64C.
^c These estimates were constrained to be equal across all MOS.

12 matrix of criterion intercorrelations for each MOS in Batch A. The
averages of these correlations across MOS are shown in Table 6.
The intercorrelations of the five criterion factors are in the upper
left quadrant, the intercorrelations among the five residual scores are
in the lower right quadrant, and the cross correlations are in the upper
right. Keep in mind that to create the residual scores the paper-and-
pencil method factor was partialed from the first two criterion factors
and the ratings method factor was partialed from the last three criterion
factors. Also, the first two factors contain items from both the knowledge
tests and hands-on tests, and the last three factors all contain both ratings
TABLE 5
Final Mapping of the Criterion Measures on the Five Performance Factors
[The body of this table is not recoverable from the source. It marked, with
X's, which of the basic criterion scores (hands-on, knowledge test, rating,
and administrative index scores) were assigned to each of the five
performance factors and to the written and ratings method factors.]

TABLE 6
Average Criterion Intercorrelations Across the Nine Batch A MOS
[The body of this 12 x 12 table is not recoverable from the source; the
values discussed in the text below are taken from it.]

and administrative measures. Some noteworthy features of this 12 x 12
matrix are the following.
Intercorrelations of the factor pairs that confound measurement
method (e.g., 1 with 2 or 3 with 4) are higher, as expected, than those that
do not confound method (e.g., 1 with 3 or 2 with 4). However, they do
not seem so high that collapsing the five factors into some smaller num-
ber would be justified. In fact, as illustrated in the next paper (McHenry,
Hough, Toquam, Hanson, & Ashworth, 1990), Factors 1 and 2, which in-
tercorrelate .531 on the average, yield different profiles of correlations
with the tests in the predictor battery.
The correlation of the overall performance rating scale with the total
hands-on test score is low (.203), but it is not zero. With an average reli-
ability of .61 for the rating and .53 (split-half) for the hands-on, the inter-
correlation becomes .36 when corrected for attenuation. Consequently,
there is a substantial proportion of common variance between the two
measures, but by no means do they assess the same things. Assuming
for the moment that the reliable variance in each measure is relevant
to performance, a reasonable conclusion is that while performance on a
standardized job sample is a significant component of performance, it is
by no means all of it.
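
The correction applied here is the classic correction for attenuation,
r_c = r_xy / sqrt(r_xx * r_yy); plugging in the reported values is a
one-line check:

```python
# Correction for attenuation with the reliabilities reported above
# (.61 for the rating, .53 split-half for the hands-on total).
r_xy, r_xx, r_yy = 0.203, 0.61, 0.53
print(round(r_xy / (r_xx * r_yy) ** 0.5, 2))  # 0.36
```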
The correlations of the residualized third factor (Effort/Leadership
residual) with the Core Technical factor, the residual Core Technical fac-
tor, the General Task Proficiency factor, the overall rating scale, and
the hands-on total score are similar in magnitude. Also, as compared
with the raw score correlation, the correlations of the Effort/Leadership
residual scores with the Core Technical and General Task Proficiency
factors go up while the correlations with Personal Discipline and Phys-
ical Fitness go down. Residualizing Factor 3 (by removing the rating
method factor) does seem to change the nature of the factor and makes
it more reflective of task performance as measured by the hands-on or
job samples and knowledge tests.
In general, these intercorrelations seem to behave in very lawful ways
and are consistent with a multi-dimensional model of performance.

Summary and Discussion

Several aspects of the final structure are noteworthy. First, in spite
of some confounding of factor content with measurement method, the
latent performance structure appears to be composed of very distinct
components. It is reasonable to expect that the different performance
constructs would be predicted by different things, such that validity gen-
eralization may not exist across the performance constructs within a job.

A different predictor battery might be selected, depending on which per-
formance factor was emphasized in the criterion measure. If this is so,
formance factor was emphasized in the criterion measure. If this is so,
there is a genuine question of how the performance constructs should
be weighted in forming an overall appraisal of performance for use in
personnel decisions.
It is tempting to infer that Effort/Leadership and Maintaining Per-
sonal Discipline, particularly the latter, reflect aspects of performance
that are under motivational control and consequently may be better pre-
dicted by personality or interest measures than by measures of ability
or skill (J. P. Campbell, 1986a). This leads to the question of whether
choice behaviors such as showing up on time, staying out of trouble, and
expending extra effort under adverse conditions are functions of state or
trait variables. Project A has considerable data to focus on the question.
It is also interesting that the residual score for Factor 3 becomes more
like a measure of task knowledge and performance than was the raw
score. It may be the case that raters cannot separate evidence of effort
and leadership contributions from technical task competence when they
are asked to aggregate an individual's task performance retrospectively
and provide an evaluation of it. If the degree to which an individual
exhibits a characteristic effort level and consistency of performance is
not task specific, then halo might indeed be substantive variance and not
error.
Given the high degree of consistency in the structure of the perfor-
mance measures across jobs, it is worth asking to what extent this per-
formance model generalizes to even wider domains of jobs. Some lim-
itations appear likely. The "general soldiering skills" constructs would
almost surely be quite different outside the military, but it also seems
quite possible that there is a general or non-job-specific task factor in
virtually any population of jobs. For example, virtually all college profes-
sors "teach" and "serve on committees." It is also likely that the physical
fitness and military bearing construct would be different for nonmilitary
occupations. The remaining constructs—technical skill, effort and lead-
ership, and personal discipline—all appear to be basic components of
almost any job.
In generalizing to a wider domain, it is reasonable to suppose that
other latent structures would fit other "populations" of jobs. For exam-
ple, jobs that are not organized into units and that involve a great deal
of written or oral communication (e.g., sales jobs) might have a different
performance structure. It is tempting to suggest that a common latent
structure defines a population of jobs and then ask how many distinguish-
able latent structures exist in the world of work. However, such questions
go well beyond the present finding, which is that a single structure did
seem to fit the jobs studied. Since the five-factor solution is stable across

jobs sampled from this population, and the constructs are based on mea-
sures carefully developed to be content valid, it seems safe to ascribe a
degree of construct validity to them.

REFERENCES
Borman WC, Motowidlo SJ, Rose SR, Hanser LM. (1985). Development of a model of
    soldier effectiveness (ARI Technical Report 741). Alexandria, VA: U.S. Army
    Research Institute for the Behavioral and Social Sciences.
Campbell CH, Ford P, Rumsey MG, Pulakos ED, Borman WC, Felker DB, de Vera MV,
    Riegelhaupt BJ. (1990). Development of multiple job performance measures in a
    representative sample of jobs. PERSONNEL PSYCHOLOGY, 43, 277-300.
Campbell JP. (1986a, August). When the textbook goes operational. Paper presented at
    the 94th Annual Convention of the American Psychological Association,
    Washington, DC.
Campbell JP (Ed.). (1986b). Improving the selection, classification, and utilization of
    Army enlisted personnel: Annual report, 1986 fiscal year (ARI Technical Report
    813101). Alexandria, VA: Army Research Institute for the Behavioral and Social
    Sciences.
Campbell JP (Ed.). (1987). Improving the selection, classification, and utilization of
    Army enlisted personnel: Annual report, 1985 fiscal year (ARI Technical Report
    746). Alexandria, VA: Army Research Institute for the Behavioral and Social
    Sciences.
Campbell JP, Dunnette MD, Lawler EE, Weick KE. (1970). Managerial behavior,
    performance, and effectiveness. New York: McGraw-Hill.
Dunnette MD. (1963). A modified model for selection research. Journal of Applied
    Psychology, 47, 317-323.
James LR. (1973). Criterion models and construct validity for criteria. Psychological
    Bulletin, 80, 75-83.
Joreskog KG, Sorbom D. (1981). LISREL VI: Analysis of linear structural relationships
    by maximum likelihood and least squares methods. Uppsala, Sweden: University of
    Uppsala.
Landy FJ, Farr JL. (1980). Performance rating. Psychological Bulletin, 87, 72-107.
McHenry JJ, Hough LM, Toquam JL, Hanson MA, Ashworth S. (1990). Project A
    validity results: The relationship between predictor and criterion domains.
    PERSONNEL PSYCHOLOGY, 43, 335-354.
Schmidt FL, Kaplan LB. (1971). Composite vs. multiple criteria: A review and
    resolution of the controversy. PERSONNEL PSYCHOLOGY, 24, 419-434.
Wallace SR. (1965). Criteria for what? American Psychologist, 20, 411-418.
Young WY, Houston JS, Harris JH, Hoffman RG, Wise LL. (1990). Large-scale predictor
    validation in Project A: Data collection procedures and data base preparation.
    PERSONNEL PSYCHOLOGY, 43, 301-311.
