
Selecting Pilots With

Crew Resource Management Skills


Jerry W. Hedge, Kenneth T. Bruskiewicz, Walter C. Borman,
Mary Ann Hanson, and Kristi K. Logan
Personnel Decisions Research Institutes, Inc.
Minneapolis
Frederick M. Siem
Air Force Research Laboratory
Brooks Air Force Base, TX
For years, pilot selection has focused primarily on the identification of individuals with superior flying skills and abilities. More recently, the aviation community has become increasingly aware that successful completion of a flight or mission requires not only flying skills but the ability to work well in a crew situation. This project involved development and validation of a crew resource management (CRM) skills test for Air Force transport pilots. A significant relation was found between the CRM skills test and behavior-based ratings of aircraft commander CRM performance, and the implications of these findings for CRM-based selection and training are discussed.
Training programs and research studies on aircrew performance have increased in
popularity since it was first shown that problems with aircrew coordination have
become a primary cause of aircraft mishaps (e.g., Cooper, White, & Lauber, 1980;
Leedom, 1990). During the past 20 years, formal training programs have evolved in
both the commercial and military aviation communities, aimed at addressing what
has commonly become known as crew resource management (CRM) problems.
CRM has been defined as the effective utilization of all available resources (information, equipment, and people) to achieve safe and efficient flight operation (see Driskell & Adams, 1992; Lauber, 1984). Recent research in CRM
points to the importance of certain knowledge, skills, and abilities related to crew
THE INTERNATIONAL JOURNAL OF AVIATION PSYCHOLOGY, 10(4), 377-392
Copyright 2000, Lawrence Erlbaum Associates, Inc.
Requests for reprints should be sent to Jerry W. Hedge, Personnel Decisions Research Institutes, Inc.,
43 Main Street, SE, Suite 405, Minneapolis, MN 55414. E-mail: jerryh@pdi-corp.com
interaction, including such things as communication, problem solving, decision
making, interpersonal skills, situation awareness, and leadership (Franz, Prince,
Cannon-Bowers, & Salas, 1990; Helmreich & Foushee, 1993).
CRM researchers and practitioners typically have taken a training approach to improving crew performance, and since the late 1970s CRM training programs have become an increasingly important part of the aviation industry. Reflecting on this widespread acceptance of CRM training, Helmreich, Wiener, and Kanki (1993) suggested that the challenge of the next decade will be continued refinement of CRM training programs. These authors also noted that refinements in pilot selection strategies may be needed; they encouraged research aimed at identifying a more sophisticated strategy centering on selecting those pilots with attributes associated with effective team performance, as well as strong individual skills. They suggested that a successful approach will most likely consist of not only exemplary technical competence and motivation, but also interpersonal skills that can enhance group processes.
PILOT SELECTION
The prediction of pilot performance has played a prominent role in aviation psy-
chology for over half a century. Over the years, research has shown that cognitive,
psychomotor, and biodata instruments have been among the best predictors of pilot
performance, whereas personality measures have tended to be less predictive. For
example, Hunter and Burke (1992) conducted a meta-analytic investigation of pre-
dictors of pilot performance and found that pilot performance correlated with cog-
nitive tests (r = .19), psychomotor tests (r = .30), biodata inventories (r = .26), and
personality inventories (r = .12). More recently, Martinussen (1996) summarized
validation results from 50 studies and found relatively comparable results.
Damos (1997), in a discussion of the current state of pilot selection, reinforced such findings, noting that commercially available personality tests have been shown to have minimal predictive validity when pass/fail from U.S. military undergraduate pilot training is used as a criterion. Damos suggested, however, that despite the historically low validities associated with personality measures, promising results from foreign air carriers and advances in personality models suggest that further research may be warranted.
Rather than continuing to pursue measurement of personality or related
noncognitive traits using the more traditional personality inventory methodol-
ogy, we chose to examine a more direct approach to measuring crew coordina-
tion skills. Our aim was to develop and validate a CRM skills test.
SITUATIONAL JUDGMENT TEST (SJT)
One particularly promising methodology for measuring the individual differences that seem to be important for optimal CRM performance is the SJT, which presents respondents with a series of job-relevant situations and asks them to indicate which of several alternative actions would be most effective and which would be least effective in each situation. These tests are based on the premise that there are important and often subtle differences between the behavior of effective and ineffective persons as they respond to problems or dilemmas confronted in the course of carrying out their job responsibilities, and that such differences will be reflected in their responses to similar situations presented in a testing format.
SJTs have often been developed to predict performance in supervisory or mana-
gerial jobs (e.g., Motowidlo, Dunnette, & Carter, 1990; Mowry, 1957). However,
other applications include the prediction of insurance agent turnover (Dalessio,
1992), success in telephone sales positions (Phillips, 1992), and performance in
police work (Du Bois & Watson, 1950).
The individual differences or constructs targeted by SJTs historically have focused on managerial or interpersonal skills. Mandell (1950), for example, described the Administrative Judgment Test as providing a broad understanding of administration processes. The How Supervise? test (File, 1945) was designed to measure knowledge of human relations principles in supervising employees. Mowry's (1957) Supervisors' Problems Test was intended to tap supervisory insight or judgment. Sternberg and colleagues (e.g., Sternberg, Wagner, Williams, & Horvath, 1995) speculated that what they call tacit knowledge tests (highly similar to the SJT concept) actually measure practical intelligence or "street smarts": the kind of generalized practical knowledge important for dealing with the problems encountered in a job or, more broadly, in a career.
Although there is not a great deal of research on situational judgment tests, several relevant studies do provide some insight into the SJT's relations with other measures. In the ability domain, most research on SJTs has focused on general cognitive ability, or g. Some have suspected that SJTs are merely measures of g, but in a format that is more face valid for predicting job performance than are typical ability tests. Empirical evidence, however, suggests this is not the case. Sternberg et al. (1995), working with measures of tacit knowledge, reported near-zero correlations between measures of general cognitive ability and their business management tacit knowledge test. Similarly, Motowidlo et al. (1990) found low correlations (i.e., .05 or less) between cognitive ability and their SJTs in samples of telecommunications company managers.
Regarding personality, there has been a limited amount of research linking SJTs and personality constructs. For example, Mowry (1957) found that scores on the Supervisors' Problems Test were relatively highly correlated with scores on the F scale (r = .49), indicating that more democratic (i.e., less authoritarian) supervisors obtained higher scores. Tenopyr (1969) found that scores on the Leadership Evaluation and Development Scale correlated r = .37 with leader consideration scores on the Leadership Opinion Questionnaire in a sample of production managers. Hanson and Borman (1993) reported correlations of around r = .20 between their SJT and composites measuring dominance, dependability, and work orientation. Finally, Bosshardt and Cochran (1996) found correlations close to .20 between an SJT and some personality scales, including communications, service orientation, and self-insight.
OVERVIEW OF RESEARCH EFFORT
Our objective, then, was to develop and validate a skills test targeting the CRM attributes critical for success as an Air Force pilot. This CRM skills test, the Situational Test of Aircrew Response Styles (STARS), was designed to measure problem solving, decision making, knowledge of how to respond to challenging situations, communication, aircrew management, and interpersonal effectiveness: attributes highly relevant to successful performance in the CRM aspects of pilot jobs. The resulting test would present respondents with realistic but difficult aircrew situations and five possible responses to each situation, and ask them to identify the response that would be most effective and the response that would be least effective in that situation.
METHOD
Development of the CRM Skills Test
Participants. Experienced aircrews from C-130 transport aircraft units (i.e.,
basic crews are composed of an aircraft commander, copilot, navigator, flight engi-
neer, and loadmaster) in the Air National Guard (ANG) and Air Force Reserve
(AFR) served as subject matter experts during the development phase of this re-
search project. In addition, junior Air Force officers and Air Force Academy ca-
dets, with little or no flying experience (novices), participated in the response op-
tion generation and response option scaling phases of test development
discussed later. In all, 398 individuals (240 experts and 158 novices) participated
in the test development workshops.
Procedure. Development of the STARS involved four primary steps: (a) situation generation, (b) response option generation, (c) item review, and (d) response option scaling. Across a 10-month period, development workshops were conducted at 22 Air Force sites within the continental United States. In situation generation workshops, crewmembers were asked to write brief descriptions of challenging and realistic CRM situations that they had faced or might face. During response option generation workshops, participants were asked to write a one- to three-sentence response describing how they would deal with each situation. For item review workshops, experienced crewmembers met in small groups to review the situations and responses for clarity, completeness, and realism. During response option scaling workshops, novices (junior officers and Air Force Academy cadets) and experienced ANG and AFR aircraft commanders read each situation, rated the effectiveness of each response option on a scale ranging from 1 (highly ineffective) to 7 (highly effective), and identified the single most and single least effective response option for each item.
The outcome of the development work was a set of difficult situations targeted
toward performance-relevant CRM skills and a representative sampling of the
kinds of actions pilots might take in these situations. Each situation included a set
of five response options ranging from very effective to relatively ineffective. In ad-
dition, effectiveness data were collected from both expert and novice raters, and
the effectiveness data generated by these groups were analyzed to identify similar-
ities and differences between the expert and novice groups. Statistical compari-
sons both within and between the groups allowed us to select a final set of 60 items
to be used in the validation effort and to develop a scoring key for the test.
Development of a scoring key for the CRM skills test. For each item in the CRM skills test, aircraft commanders are asked to indicate which of the five possible response options is the most effective response to that situation and which is the least effective response in that situation. Each of the response options has an effectiveness value that was derived from ratings made by a select panel of highly experienced pilots (i.e., the expert sample previously described). These effectiveness values were generated by computing the mean effectiveness of each response option across all of these experts. Examinees then received, as their item-level scores, the expert-derived mean effectiveness values of the response options they identified as most effective.

Similar procedures were followed for respondents' choices of the least effective responses. Then, an overall effectiveness score was computed for each situation by subtracting the effectiveness value associated with the least effective response selected from the effectiveness value associated with the most effective response selected (i.e., overall situation score = most-effective score - least-effective score). This overall score reflects an examinee's ability to identify both the most and least effective responses and was used in all STARS analyses.
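For illustration, the scoring procedure described above can be sketched in a few lines of code (a minimal sketch; the item, the option labels, and the effectiveness values are hypothetical, not actual STARS data):

```python
# Sketch of the STARS item-scoring rule: each response option carries a mean
# effectiveness value derived from expert ratings, and the item score is the
# value of the option chosen as "most effective" minus the value of the
# option chosen as "least effective". All values here are made up.

def score_item(expert_means, most_choice, least_choice):
    """Overall situation score = most-effective score - least-effective score."""
    return expert_means[most_choice] - expert_means[least_choice]

# Hypothetical item: five options (a-e) with expert-derived mean effectiveness.
expert_means = {"a": 6.2, "b": 4.8, "c": 3.1, "d": 2.0, "e": 5.5}

# An examinee who identifies the strongest ("a") and weakest ("d") options
# scores high; reversed choices yield a negative score.
print(score_item(expert_means, "a", "d"))  # 4.2
print(score_item(expert_means, "d", "a"))  # -4.2
```

Summing these item-level difference scores across items yields the overall STARS score used in the analyses.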
Development of CRM Performance Rating Scales
We also developed a set of special "for research only" performance rating scales designed to measure aircraft commander performance on the constructs related to the CRM aspects of the job, such as crew coordination, communication, teamwork, and problem solving.
Participants. Experienced C-130 aircrews from eight ANG bases in the continental United States served as subject matter experts during this phase of the project. In all, 115 individuals participated in the two types of rating scale development workshops described in the following section.
Procedure. Performance example generation workshops were conducted at four ANG bases with a total of 50 experienced C-130 crewmembers (32% were pilots and 70% were officers). After being presented with a general description of the purpose of the workshop and the advantages of behavior-based rating scales, these aircrews (who averaged over 2,000 hr of C-130 flying time) were asked to write examples of highly effective, average, and ineffective aircraft commander CRM behaviors. They were then provided with special forms on which to write their performance examples. These forms were designed to help participants focus their writing of examples on specific aircraft commander behaviors, while avoiding extraneous detail. As each participant generated initial performance examples, the researcher monitoring the workshop provided feedback to help them tailor their writing styles to effective production of performance examples.
A total of 415 performance examples were collected from these 50 individuals.
Each performance example was then edited for clarity and grammar. Redundant or
inappropriate examples were eliminated, and the remaining 326 examples were
content analyzed. Thirteen CRM performance dimensions were identified from
the content analysis and defined in preparation for the next series of workshops.
These 13 dimensions included (a) facilitating teamwork, (b) responsibility and ac-
countability, (c) motivating crewmembers, (d) advocacy and assertiveness, (e)
communicating, (f) planning and organizing, (g) coordinating and directing
crewmembers, (h) problem solving and mission analysis, (i) decision making, (j)
stress management and tolerance, (k) attention and vigilance, (l) training and
coaching crewmembers, and (m) disciplining.
A second set of workshops was then held with 65 highly experienced C-130
crewmembers (34% were pilots and 58% were officers) at four ANG bases. The
326 performance examples were separated into two booklets of 163 examples each
to accommodate time and endurance constraints. These crewmembers (who aver-
aged over 3,300 hr of C-130 flying time) were asked to assign each performance
example in their booklet to 1 of the 13 performance dimensions identified in the
content analysis. They were also asked to rate (on a 1 to 7 scale) each performance example in terms of the effectiveness of the aircraft commander's behavior depicted in the example. These data were then analyzed in terms of the percentage of C-130 crewmembers assigning each example to a particular category and the level of effectiveness for each performance example, as rated by all workshop participants.
Results of the analyses of these categorization and rating data provided the in-
formation necessary to construct a set of behavior-based aircraft commander CRM
rating scales. Using an interrater agreement cutoff of 40%, the 13 performance di-
mensions were examined to assess the number of performance examples assigned
to each category. In all, 215 of the 326 performance examples met or exceeded the
40% cutoff. Because so few examples were assigned to 3 of the 13 dimensions (advocacy and assertiveness, three examples; stress management and tolerance, five; attention and vigilance, four), these 3 dimensions were dropped from further analysis. Of the remaining 10 dimensions, 3
were combined with other dimensions (motivating was combined with disciplin-
ing, planning and organizing was combined with communicating to form facilitat-
ing information flow, and problem solving was combined with decision making).
In each case, examples associated with the former dimension tended to anchor the
high end of the performance effectiveness continuum, whereas examples assigned
to the latter dimension tended to anchor the lower end of the rating scale. The 7 fi-
nal dimensions were labeled (a) facilitating teamwork, (b) responsibility and ac-
countability, (c) motivating and disciplining crewmembers, (d) training and
coaching crewmembers, (e) coordinating and directing crewmembers, (f) facilitat-
ing information flow, and (g) problem solving and decision making.
Finally, to construct rating scales for these seven dimensions, the mean effec-
tiveness rating associated with each example assigned to a particular dimension
was used to design behavior summary statements. Once completed, each rating
scale consisted of a label, a definition of the dimension to be rated, and three sets of
behavior summary statements defining the high, midrange, and low levels of ef-
fectiveness on that dimension.
CRM Skills Test Validation
Participants. In all, 792 aircrew members at 13 ANG and AFR units participated in the concurrent validation data collection effort, either as a rater, a ratee, or both (in most cases, aircraft commanders served as both raters and ratees). STARS data were obtained from 280 aircraft commanders; 95% of the ratees were White, 98% were men, and the mean age of these ratees was 39 years. In addition, the sample was highly experienced in flying the C-130 transport aircraft, having logged an average of 2,736 hr of C-130 flying time. Rating data were collected from 731 raters. Just over 92% of the raters were White; 97% were men. The mean rater age was 38 years, and raters averaged almost 2,500 hr of C-130 flying time. As discussed more thoroughly later, at each base our target set of ratees was all available aircraft commanders. Our target set of raters was all crewmembers (copilots, navigators, flight engineers, loadmasters, radio operators, and other aircraft commanders) who had firsthand knowledge of the flying skills of each aircraft commander. We also asked each aircraft commander being rated by his or her fellow crewmembers to provide a self-rating.
Rater training videotape. To increase the likelihood of obtaining accurate and standardized performance ratings using our aircraft commander rating scales, we developed a brief (10 min) rater training program and videotaped its oral presentation by a member of our project team. The videotaped training program was developed to be used at each test site by the test administrator, as part of the overall administration package. The three primary objectives of the rater training program were to (a) emphasize the purpose of the project, namely, to evaluate the STARS as a predictor of aircraft commander CRM performance; (b) underscore the notion that the rating scales were developed based on extensive input from knowledgeable aircrews; and (c) stress the importance of providing accurate ratings as a cornerstone of overall validation success.
The last segment of the training was devoted to a discussion of typical errors that raters sometimes commit, often unintentionally. Raters were warned about (a) letting overall impressions of an aircraft commander affect their specific ratings of performance on each dimension, (b) letting their ratings be influenced by things that are not part of an individual's job performance, and (c) the tendency to sometimes give the same ratings to all persons rated. In general, then, this rater training emphasized the need to focus on each individual ratee's strengths and weaknesses and, with the help of the rating scale anchors, provide accurate ratings of each aircraft commander's CRM performance.
Procedure. On arrival at the base, the project trainer briefed a small group of
relevant personnel. This group frequently included the Wing or Squadron Com-
mander, the training or flight safety officer, and the designated test administrator.
The short briefing covered the purpose of the project and the specific requirements
associated with the data collection effort at their unit.
Once the briefing was completed, the project trainer met separately with the test administrator and discussed the test instruments and procedures in detail. Administration of the CRM rating scales was also discussed, as well as the importance of collecting ratings of each aircraft commander's performance from each crew position. We requested that, whenever possible, crewmembers be asked to rate up to five aircraft commanders, and that aircraft commanders be asked to rate their own performance and the performance of three other aircraft commanders with whom they were familiar. In addition, it was suggested that the test administrators make certain that raters had flown with these aircraft commanders a sufficient number of times in the last 6 to 9 months to be familiar with their CRM performance. Also, the importance of obtaining accurate ratings was emphasized, as well as how to improve rating accuracy with the use of the videotaped rater training program. The test administrator was then shown the videotape, and again the need for all raters to view it before completing the ratings was emphasized.
A typical administration session procedurally followed these steps. First, the
assembled group of participants was introduced to the project briefly, and each
person read a one-page project summary. Second, the test administrator distrib-
uted the rating scales and had the raters watch the training videotape. Third, back-
ground information sheets were distributed and completed. Fourth, the ratings
were completed by all raters, and the STARS booklets were completed by all
ratees (many aircraft commanders completed both forms). All test booklets, rating
scales, and background information sheets were then collected and processed.
RESULTS
Development of the Operational CRM Skills Test
Recall that at the completion of STARS development, 60 items had been selected
for administration to the concurrent validation sample. However, because of practi-
cal constraints, we believed that it was unrealistic to assume that a 2 hr to 3 hr CRM
skills test would be acceptable for operational use. Consequently, we identified a
subset of the STARS items to use in our final validity analyses using a combination
of preliminary empirical and rational assessment.
Therefore, a subset of STARS items was chosen based on factor analytic results, item-level validities, and rational considerations. This operational STARS was produced by choosing items that had (a) factor loadings of at least .25 on one of the two factors from a two-factor solution, and (b) item-level validities of at least .15 with ratings from at least one of our rating sources. Using these criteria, an operational version of the STARS was developed that contained 13 of the original 60 items. Although this represents more than a 75% item reduction from the original STARS, we believe it is a relatively good representation of the STARS content. For comparison purposes, and as a way to distinguish the two STARS versions, the original 60-item STARS will be referred to as the "R & D STARS," and the 13-item version will be referred to as the "operational STARS."
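The two screening criteria just described amount to a simple conjunctive filter over item statistics. A sketch (the item numbers, loadings, and validities below are fabricated for illustration and are not the actual STARS item statistics):

```python
# Hypothetical per-item statistics: (maximum absolute loading on the two
# factors, best item-level validity across rating sources).
items = {
    1: (0.41, 0.22),
    2: (0.18, 0.30),   # fails the loading criterion (< .25)
    3: (0.33, 0.09),   # fails the validity criterion (< .15)
    4: (0.27, 0.16),
}

def retain(loading, validity, min_loading=0.25, min_validity=0.15):
    """Keep an item only if it loads at least .25 on one factor AND shows an
    item-level validity of at least .15 with some rating source."""
    return loading >= min_loading and validity >= min_validity

operational = [i for i, (l, v) in items.items() if retain(l, v)]
print(operational)  # [1, 4]
```

In the study itself, applying both criteria (plus rational review) reduced the pool from 60 items to 13.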
Descriptive Statistics and Reliabilities for the R & D and
Operational CRM Skills Test
The mean, standard deviation, minimum, and maximum values for the 60-item R & D STARS and the 13-item operational STARS are presented in Table 1, along with internal consistency reliability estimates (using coefficient alpha). As can be seen, the operational STARS has a slightly lower mean value and somewhat more variability. In addition, the internal consistency reliability estimate for the R & D STARS is .87, suggesting an acceptable level of reliability; the estimate for the operational STARS is lower (which is not surprising given the smaller number of items), but still respectable at .69.
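Coefficient alpha can be computed directly from item-level scores: k/(k - 1) times one minus the ratio of summed item variances to the variance of total scores. A generic sketch (the example data are fabricated, not STARS responses):

```python
# Cronbach's coefficient alpha from a matrix of item scores.
def cronbach_alpha(scores):
    """scores: one row per examinee, each row a list of k item scores."""
    k = len(scores[0])
    columns = list(zip(*scores))            # transpose to per-item columns

    def pvar(xs):                           # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(row) for row in scores]
    return (k / (k - 1)) * (1 - sum(pvar(c) for c in columns) / pvar(totals))

# Illustrative data: 4 examinees x 3 items.
data = [[2.1, 1.8, 2.0], [1.2, 0.9, 1.1], [1.9, 1.7, 1.6], [0.8, 1.0, 0.7]]
print(round(cronbach_alpha(data), 2))  # 0.98
```

As the formula makes explicit, alpha shrinks as k drops (other things equal), which is why the 13-item operational STARS shows a lower estimate than the 60-item version.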
Descriptive Statistics for the Rating Sources
Seven rating sources were represented in the rating data collected; the five primary
crew positions of aircraft commander, copilot, navigator, flight engineer, and
loadmaster, plus radio operator (an additional crew position in rescue squadrons),
and self-ratings. Table 2 displays descriptive information computed by aggregating
ratings on each aircraft commander within rating sources and across the seven di-
mensions of CRM performance.
These CRM ratings have rather similar distributions across each of the seven
different rating sources, with the exception of copilot ratings and self-ratings.
Means ranged from 4.96 for ratings made by the flight engineers, to 5.56 for air-
craft commander self-ratings. Recalling that the maximum possible score for the
performance ratings was seven, it is clear that these ratings are negatively skewed.
Given the extensive flying experience possessed by this sample of aircraft com-
manders, it is not clear how much inflation is present in the ratings and how much
the ratings reflect true performance levels.
Formation of Rating Source Composites
In addition to keeping the seven rating sources separate for subsequent analyses, we
formed one additional rater composite, an overall rating score. The overall rating
composite was computed by aggregating across six of the seven rating sources. Be-
cause of the higher mean ratings associated with aircraft commander self-ratings,
and the conclusions in the literature suggesting greater leniency in such ratings, we
TABLE 1
Descriptive Statistics and Reliability Estimates for the Research and Operational STARS

STARS                     N^a     M     SD   Minimum   Maximum   Reliability^b
R & D (60 items)          280    1.84   .37    .05       2.42        .87
Operational (13 items)    280    1.68   .46    .35       2.36        .69

Note. STARS = Situational Test of Aircrew Response Styles.
^a This sample size reflects the number of aircraft commanders who took the STARS.
^b Coefficient alpha.
excluded this source from the overall rating composite. Thus, a ratee score was computed by summing across all raters who rated a particular ratee (except self-raters) and dividing by the number of raters.
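This composite is just a mean over non-self ratings. A minimal sketch (the rater records and values are hypothetical):

```python
# Sketch of the overall rating composite: average all ratings of a given
# aircraft commander across rater sources, dropping the self-rating because
# of its documented leniency. Values below are illustrative only.
ratings_for_ratee = [
    ("copilot", 5.4), ("navigator", 4.9), ("flight_engineer", 5.1),
    ("loadmaster", 5.6), ("self", 6.2),    # self-rating excluded below
]

def overall_composite(ratings):
    """Mean across all raters of a ratee, excluding the self-rating."""
    others = [r for source, r in ratings if source != "self"]
    return sum(others) / len(others)

print(round(overall_composite(ratings_for_ratee), 2))  # 5.25
```

Excluding the self-rating keeps the composite from being pulled upward by the most lenient source.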
Validity of the Operational CRM Skills Test
The STARS predictor data were randomly split into a developmental sample (45%) and a cross-validation sample (55%). This cross-validation procedure was followed to provide a better estimate of the true degree of the relation between the CRM skills test and CRM job performance; results obtained from a single sample likely capitalize on chance factors operating in that group and lead to a maximization of the correlation (see Ghiselli, Campbell, & Zedeck, 1981). The developmental sample was used to select the 13 items for the operational STARS. Table 3 shows the uncorrected correlations between the operational STARS and the eight
TABLE 2
Descriptive Statistics for the Crew Resource Management Rating Sources

Source                N^a     M     SD   Minimum   Maximum
Aircraft commander    214    5.17   .84    1.00      7.00
Copilot               112    5.42   .94    2.71      7.00
Navigator             223    5.04   .98    1.43      7.00
Flight engineer       233    4.96   .95    1.43      6.93
Loadmaster            223    5.18   .86    2.14      6.86
Radio operator         21    5.03   .73    3.71      7.00
Self-rating           182    5.56   .59    3.86      7.00

^a This sample size reflects the number of aircraft commanders rated by each of the rating sources (e.g., 233 aircraft commanders were rated by at least 1 flight engineer).
TABLE 3
STARS Validity Coefficients in the Cross-Validation Sample, Uncorrected and Corrected for Criterion Unreliability

STARS                     Overall    AC     CP    NAV    FE     LM     RO    Self-Rating
Uncorrected
  Operational (13 items)   .19*     .13   -.10   .14    .07    .33*   .32*     -.06
  R & D (60 items)         .14*     .02    .18   .08    .04    .23*   .32*      .11
Corrected
  Operational (13 items)   .29*     .16   -.10   .17    .10    .42*   .42*     -.06
  R & D (60 items)         .22*     .02    .18   .09    .05    .31*   .41*      .11

Note. STARS = Situational Test of Aircrew Response Styles; AC = aircraft commander; CP = copilot; NAV = navigator; FE = flight engineer; LM = loadmaster; RO = radio operator.
*p < .05.
rating source composites in the cross-validation sample. The results indicate that predictor-criterion correlations are statistically significant for three of the rating composites (overall ratings, loadmaster, and radio operator) and approach significance for both the aircraft commander and navigator composites.

The uncorrected validities ranged from a low of -.10 for copilot ratings to .33 for the ratings provided by loadmasters. The validities for the copilot and self-rating composites were both negative and stood apart from all other source perspectives. The correlation between the operational STARS and the overall rating composite (r = .19) suggests a significant relation between how aircraft commanders scored on the STARS and ratings of those aircraft commanders' CRM performance across all rating sources.

The highest validity, and one of the most intriguing findings, is associated with the view from the rear of the aircraft. Loadmasters' ratings correlated .33 with aircraft commander scores on the operational STARS, more than double the size of the validities found with any of the primary cockpit crew positions. One interpretation of this finding is that being removed from the cockpit may allow a more objective perspective on the performance of the aircraft commander.
Corrected CRM Skills Test Validities
The uncorrected correlations presented in Table 3 underestimate the true relations between the STARS and the rating source composites to the extent that the ratings are unreliable. To obtain a better estimate of the true validities of the STARS, we used a standard correction formula (see Ghiselli et al., 1981) to adjust the validities for criterion unreliability. The corrected validities are also presented in Table 3, and range from .15 to .49. For comparison purposes, both the uncorrected and corrected validities of the 60-item R & D STARS are also included.
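The standard correction divides the observed validity by the square root of the criterion's reliability. A sketch of the arithmetic (the reliability value of .43 below is an illustrative assumption, not a figure reported in this article):

```python
import math

def correct_for_criterion_unreliability(r_xy, r_yy):
    """Corrected validity = observed r / sqrt(criterion reliability),
    the classical correction for attenuation in the criterion only."""
    return r_xy / math.sqrt(r_yy)

# Hypothetical example: an observed validity of .19 combined with an assumed
# criterion reliability of .43 yields a corrected validity of about .29.
print(round(correct_for_criterion_unreliability(0.19, 0.43), 2))  # 0.29
```

Because the divisor is less than 1, corrected validities are always at least as large in magnitude as the observed values.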
DISCUSSION
This study examined the notion of developing a CRM skills test for use as part of a pilot selection test battery. Previous studies using the SJT methodology have focused on predicting performance in supervisory and managerial jobs, thus targeting qualities such as interpersonal insight, managerial skill, practical intelligence, and judgment. SJTs present respondents with realistic, job-related situations, and respondents are asked what should be done to handle each situation effectively. Just as in real life, where situations are complex and there is often no single answer that is clearly correct while all others are clearly wrong, the aircrew situations presented in the CRM skills test are difficult and complex. In addition, these situations do not typically involve responses that are simply correct or incorrect, but rather responses or courses of action that vary along a continuum of effectiveness.
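One common way to score SJT items whose options vary along an effectiveness continuum is to credit each chosen option with an expert-judged effectiveness value. The sketch below is a generic illustration of that approach, not the scoring procedure actually used for the STARS; the item, options, and values are invented.

```python
# Hypothetical SJT item: each response option carries an effectiveness
# value on a 1-7 scale derived from subject-matter-expert judgments.
ITEM_EFFECTIVENESS = {
    "declare an emergency and divert": 6.5,
    "poll the crew, then decide": 5.8,
    "continue and monitor fuel": 3.1,
    "delegate the decision to the copilot": 2.0,
}

def score_responses(chosen_options: list[str],
                    keys: list[dict[str, float]]) -> float:
    """Sum the expert effectiveness values of the options a respondent
    chose, one option per item; higher totals reflect better judgment."""
    return sum(key[choice] for choice, key in zip(chosen_options, keys))

total = score_responses(["poll the crew, then decide"], [ITEM_EFFECTIVENESS])
print(total)  # 5.8
```

Because every option earns partial credit, this scoring preserves the continuum-of-effectiveness property rather than collapsing responses into right or wrong.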
Thus, it seems reasonable to expect that a situational judgment approach would be particularly well suited for predicting those aspects of the aircraft commander's performance not related to the traditional "flying the aircraft" aspects of performance. The results of the validation study showed a significant relation between performance on the CRM skills test and aircraft commander job performance. The CRM skills test was developed to tap interpersonal effectiveness, problem solving, communication, coordination, and decision making, the attributes most critical for success as an Air Force aircraft commander.
Historically, the preponderance of pilot selection validation research has relied on training scores as a criterion measure. Martinussen (1996), for example, concluded that only three general categories of pilot performance measurement existed in sufficient quantities to include in her meta-analyses: (a) pass/fail in pilot training, (b) ratings of pilot performance during training, and (c) course grades from pilot training. Damos (1997) noted this lack of real-time performance data as a serious weakness in pilot research. This study attempted to broaden this criterion domain by focusing on pilot job performance. By obtaining ratings from other aircraft commanders, copilots, navigators, flight engineers, loadmasters, and even radio operators, we were able to gain broad and varied perspectives on aircraft commander CRM performance.
A relatively large body of research evidence has accumulated in support of the multiple-rater perspective. Borman (1991) noted that each source has advantages in providing performance information. Experienced supervisors have reasonably good norms for performance because they have typically seen relatively large numbers of employees working on the job and thus have well-calibrated views of different performance levels. Peers are usually privy to the most performance information regarding their fellow workers: because group members often work in close proximity (especially in team-oriented work environments), they are able to observe each other's technical and interpersonal job behaviors and therefore should be in an excellent position to rate their coworkers. Subordinates are likely to have especially relevant information about their supervisors' leadership skills.
The findings of this research support this multiple-perspective viewpoint. In addition, the fact that one of the largest correlations with the CRM skills test was associated with the loadmaster position reinforces the notion that subordinates may do a good job of assessing leadership. Certainly, a weakness of this study was the lack of a superordinate perspective. Performance data gathered by independent observers such as check airmen, instructors, or supervisors could have provided additional insight into aircraft commander CRM behavior.
One recommendation we would offer for future research, which we were unable to accomplish in this study, is to investigate the incremental validity of the STARS over cognitive and psychomotor tests alone. We believe a CRM skills test may offer significant increases in prediction beyond that provided by currently used measures.
PRACTICAL APPLICATIONS
In the future, airline pilot selection procedures may depend on assessment of CRM skills and aptitude just as they currently rely primarily on evaluation of technical flying skills. Chidester (1993) noted that the reductions in the number of military and high-time civilian aviators during the 1990s would lead to an increased need to screen less experienced pilot applicants. Whereas current procedures are based on selecting pilots with experience and evidence of technical competence, future selection boards may need to rely more on tests of aptitude, that is, the potential to perform effectively. Moreover, selection decisions may be made not only on technical aptitude but also on aptitude for effective crew coordination.
Although the primary objective of this study was to develop a CRM selection
test to help identify pilots with superior crew coordination skills, the STARS tech-
nology could also be used as a relatively inexpensive means of CRM training de-
livery. For example, STARS-like items could be used in a group setting as training
stimulus materials, whereby junior and senior crewmembers could meet and dis-
cuss how to apply good CRM skills to difficult situations. Chidester (1993),
among others, suggested that much can be learned from examining successful
crew performance.
Although other training applications can also be envisioned, we briefly offer one additional example. In some situations, the STARS test could be used as an evaluation tool. The original 60-item STARS was developed to tap a variety of CRM content areas. As such, if viewed from a training perspective, once an individual completes the STARS, his or her strengths and weaknesses in the various CRM content areas will have been identified, and those areas in which deficiencies are found can be targeted for additional training. Similarly, the STARS could be used as a pretest-posttest measure of CRM training program effectiveness, with changes then made to either the individual's training goals or the training program content to stress particular areas of CRM training in the future.
Training applications aside, we remain convinced that in today's competitive commercial aviation environment, selecting aircrews based on both flying skills and CRM skills will increase the chances that the efficiency and effectiveness of air transportation will continue to improve. Widespread use of a test such as the STARS would be an effective means of including CRM in the selection process.
ACKNOWLEDGMENTS
All statements expressed in this article are those of the authors and do not represent
official opinions of the U.S. Air Force. Frederick M. Siem is currently affiliated
with People Research, The Boeing Company.
REFERENCES
Borman, W. C. (1991). Job behavior, performance, and effectiveness. In M. Dunnette & L. Hough (Eds.), Handbook of industrial and organizational psychology (pp. 271-326). Palo Alto, CA: Consulting Psychologists Press.
Bosshardt, M. R., & Cochran, C. C. (1996). Development and validation of a selection system for financial advisors (Institute Rep. No. 276). Minneapolis, MN: Personnel Decisions Research Institutes.
Chidester, T. R. (1993). Critical issues for CRM training and research. In E. L. Wiener, B. G. Kanki, & R. L. Helmreich (Eds.), Cockpit resource management (pp. 315-336). San Diego, CA: Academic.
Cooper, G. E., White, M. D., & Lauber, J. K. (Eds.). (1980). Resource management on the flightdeck: Proceedings of a NASA/industry workshop (NASA CP-2120). Moffett Field, CA: NASA-Ames Research Center.
Dalessio, A. T. (1992, May). Predicting insurance agent turnover using a video-based situational judgment test. Paper presented at the 7th Annual Conference of the Society for Industrial and Organizational Psychology, Montreal, Canada.
Damos, D. L. (1997). Pilot selection batteries: Shortcomings and perspectives. The International Journal of Aviation Psychology, 6, 199-209.
Driskell, J. E., & Adams, R. J. (1992). Crew resource management: An introductory handbook. Washington, DC: U.S. Department of Transportation, Federal Aviation Administration, Research Development Service.
Du Bois, P. H., & Watson, R. I. (1950). The selection of patrolmen. Journal of Applied Psychology, 34, 90-95.
File, Q. W. (1945). The measurement of supervisory quality in industry. Journal of Applied Psychology, 30, 323-337.
Franz, T. M., Prince, C., Cannon-Bowers, J. A., & Salas, E. (1990). The identification of aircrew coordination skills. Proceedings of the 12th Symposium on Psychology in the Department of Defense (pp. 97-101). Springfield, VA: National Technical Information Services.
Ghiselli, E. E., Campbell, J. P., & Zedeck, S. (1981). Measurement theory for the behavioral sciences (Rev. ed.). San Francisco: Freeman.
Hanson, M. A., & Borman, W. C. (1993). Development and construct validation of the Situational Judgment Test (Institute Rep. No. 230). Minneapolis, MN: Personnel Decisions Research Institutes.
Helmreich, R. L., & Foushee, H. C. (1993). Why crew resource management? Empirical and theoretical bases of human factors training in aviation. In E. L. Wiener, B. G. Kanki, & R. L. Helmreich (Eds.), Cockpit resource management (pp. 3-45). San Diego, CA: Academic.
Helmreich, R. L., Wiener, E. L., & Kanki, B. G. (1993). The future of crew resource management in the cockpit and elsewhere. In E. L. Wiener, B. G. Kanki, & R. L. Helmreich (Eds.), Cockpit resource management (pp. 479-500). San Diego, CA: Academic.
Hunter, D. R., & Burke, E. F. (1992, June). Meta analysis of aircraft pilot selection measures (ARI Research Note 92-51). Alexandria, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
Lauber, J. K. (1984). Resource management in the cockpit. Air Line Pilot, 53, 20-23.
Leedom, D. K. (1990). Aircrew coordination training and evaluation for army rotary wing aircrews: Summary of research for fiscal year 1990. Fort Rucker, AL: Army Research Institute.
Mandell, M. M. (1950). The administrative judgment test. Journal of Applied Psychology, 34, 145-147.
Martinussen, M. (1996). Psychological measures as predictors of pilot performance: A meta-analysis. The International Journal of Aviation Psychology, 6, 1-20.
Motowidlo, S. J., Dunnette, M. M., & Carter, G. W. (1990). An alternative selection procedure: The low-fidelity simulation. Journal of Applied Psychology, 76, 640-647.
Mowry, H. W. (1957). A measure of supervisory quality. Journal of Applied Psychology, 41, 405-408.
Phillips, J. F. (1992). Predicting sales skills. Journal of Business and Psychology, 7, 151-160.
Sternberg, R. J., Wagner, R. K., Williams, W. M., & Horvath, J. A. (1995). Testing common sense. American Psychologist, 50, 912-927.
Tenopyr, M. L. (1969). The comparative validity of selected leadership scales relative to success in production management. Personnel Psychology, 22, 77-85.
Manuscript first received October 1998